SPO600 Lab1 (Pt.1) - Calculation of the execution time of a script

In this lab, we are asked to calculate the execution time of a script of 6502 CPU instructions. We assume the clock speed is 1 MHz, so each cycle takes 1 microsecond.

Meanwhile, while studying the 6502 CPU instructions on another website, I noted that there are different types of addressing. Immediate addressing means we use a value itself instead of a memory address (for example #$07). "Zeropage" addressing means the address is a single byte from $00 to $FF, i.e. the memory is located in page 0. "Absolute" addressing means we give a full 16-bit memory address in the form of $00FF. "Indirect" addressing means we give the address of a memory location, and that location and the next one hold the low byte and high byte of the target address. It works like a pointer.
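To make the pointer idea concrete, here is a small Python sketch of how the effective address could be worked out for an instruction like STA ($40),Y. It is only a toy model: the `memory` dictionary and the `indirect_indexed` function are names I made up for illustration, and the pointer values ($40/$41 holding $0200) are taken from the lab script.

```python
# Toy model of 6502 memory: only the two zero-page bytes used as a pointer.
# Hypothetical names; the values mirror the pointer set up in the lab script.
memory = {0x40: 0x00, 0x41: 0x02}  # low byte first, then high byte

def indirect_indexed(zp_addr, y):
    """Effective address of an instruction such as STA (zp_addr),Y."""
    low = memory[zp_addr]
    high = memory[(zp_addr + 1) & 0xFF]   # the next zero-page byte
    base = low | (high << 8)              # 16-bit pointer value
    return (base + y) & 0xFFFF            # add the Y index

print(hex(indirect_indexed(0x40, 0x00)))  # 0x200
print(hex(indirect_indexed(0x40, 0x10)))  # 0x210
```

Changing the byte stored at $41 moves the whole 256-byte window to another page, which is what the lab script does with INC $41.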

The original script of Lab 1:

        lda #$00        ; set a pointer in memory location $40 to point to $0200
        sta $40         ; ... low byte ($00) goes in address $40
        lda #$02
        sta $41         ; ... high byte ($02) goes into address $41
        lda #$07        ; colour number
        ldy #$00        ; set index to 0
loop:   sta ($40),y     ; set pixel colour at the address (pointer)+Y
        iny             ; increment index
        bne loop        ; continue until done the page (256 pixels)
        inc $41         ; increment the page
        ldx $41         ; get the current page number
        cpx #$06        ; compare with 6
        bne loop        ; continue until done all pages

Performance calculation for the original code:


We note that the script takes 11325 cycles to fill the entire screen with colour. At 1 MHz, that is about 11.3 ms.
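The total can be re-derived with a short Python sketch. The per-instruction cycle counts are the standard NMOS 6502 timings, and I am assuming that no branch crosses a page boundary (the code sits at $0600 to $0618), so a taken BNE costs 3 cycles and a not-taken one costs 2. The constant names are my own.

```python
# Standard NMOS 6502 cycle counts for the instructions in the original script.
# Assumption: no branch crosses a page boundary, so a taken BNE costs 3 cycles.
SETUP = 2 + 3 + 2 + 3 + 2 + 2          # lda#, sta zp, lda#, sta zp, lda#, ldy#
STA_IND_Y, INY = 6, 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2
INC_ZP, LDX_ZP, CPX_IMM = 5, 3, 2

PAGES, PIXELS_PER_PAGE = 4, 256
CLOCK_HZ = 1_000_000                   # 1 MHz, as assumed in the lab

# Inner loop: every pixel costs STA + INY; BNE falls through once per page.
inner = PAGES * (PIXELS_PER_PAGE * (STA_IND_Y + INY)
                 + (PIXELS_PER_PAGE - 1) * BNE_TAKEN + BNE_NOT_TAKEN)
# Outer loop: INC/LDX/CPX once per page; the last BNE falls through.
outer = (PAGES * (INC_ZP + LDX_ZP + CPX_IMM)
         + (PAGES - 1) * BNE_TAKEN + BNE_NOT_TAKEN)

total_cycles = SETUP + inner + outer
print(total_cycles)                                   # 11325
print(f"{total_cycles / CLOCK_HZ * 1000:.3f} ms")
```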

Memory cost:
The machine code takes 25 bytes (listed below), and one 2-byte pointer is stored at $40 and $41, so the program uses 27 bytes in total. It also uses the X and Y registers and the accumulator.

Machine code:
0600: a9 00 85 40 a9 02 85 41 a9 07 a0 00 91 40 c8 d0 
0610: fb e6 41 a6 41 e0 06 d0 f3 
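As a quick sanity check on the byte count, the dump above can be parsed with a few lines of Python. This is only a sketch: the variable names are mine, and the 2 bytes for the pointer are added by hand.

```python
# The machine code dump from the post, copied as text.
dump = """
0600: a9 00 85 40 a9 02 85 41 a9 07 a0 00 91 40 c8 d0
0610: fb e6 41 a6 41 e0 06 d0 f3
"""

# Drop the "address:" prefix of each line and decode the hex bytes.
hex_bytes = "".join(line.split(":")[1] for line in dump.strip().splitlines())
code = bytes.fromhex(hex_bytes)

POINTER_BYTES = 2                      # the pointer stored at $40 and $41
code_size = len(code)
print(code_size)                       # 25
print(code_size + POINTER_BYTES)       # 27
```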

Improve the performance

Looking at the performance calculation, the first idea that came to me was to reduce the cycles spent on the INY and BNE instructions. In the original, the Y register counts up and wraps around 4 times, so INY and BNE each run 1024 times, which is inefficient. Writing to all four pages in each pass cuts that to 256 times.

A modified version:
        ldx #$00        ; set pointer1 in memory location $40 to point to $0200
        stx $40         ; ... low byte ($00) goes in address $40
        lda #$02
        sta $41         ; ... high byte ($02) goes into address $41
                        ; set pointer2 in memory location $42 to point to $0300
        stx $42         ; ... low byte ($00) goes in address $42
        lda #$03
        sta $43         ; ... high byte ($03) goes into address $43
                        ; set pointer3 in memory location $44 to point to $0400
        stx $44         ; ... low byte ($00) goes in address $44
        lda #$04
        sta $45         ; ... high byte ($04) goes into address $45
                        ; set pointer4 in memory location $46 to point to $0500
        stx $46         ; ... low byte ($00) goes in address $46
        lda #$05
        sta $47         ; ... high byte ($05) goes into address $47
        lda #$07        ; colour number
        ldy #$00        ; set index to 0
loop:   sta ($40),y     ; set pixel colour at the address (pointer1)+Y
        sta ($42),y     ; set pixel colour at the address (pointer2)+Y
        sta ($44),y     ; set pixel colour at the address (pointer3)+Y
        sta ($46),y     ; set pixel colour at the address (pointer4)+Y
        iny             ; increment index
        bne loop        ; continue until done the page (256 pixels)

Performance calculation for the revised version:




The revised version takes 7461 cycles, which is 34.1% fewer than the original version's 11325 cycles.
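Here is a rough Python sketch of where the 7461 comes from. It uses the standard NMOS 6502 cycle counts and assumes no branch crosses a page boundary; the constant names are my own.

```python
# Standard NMOS 6502 cycle counts; assumes no branch crosses a page boundary.
LDX_IMM = LDA_IMM = LDY_IMM = 2
STX_ZP = STA_ZP = 3
STA_IND_Y, INY = 6, 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2

ORIGINAL_CYCLES = 11325    # figure quoted in the post for the original script
ITERATIONS = 256           # Y runs through one full page

# Pointer 1 needs ldx/stx/lda/sta; pointers 2 to 4 need stx/lda/sta each.
setup = (LDX_IMM + STX_ZP + LDA_IMM + STA_ZP) + 3 * (STX_ZP + LDA_IMM + STA_ZP)
setup += LDA_IMM + LDY_IMM             # colour number and index

# Each pass does four stores and one INY; the final BNE falls through.
loop = (ITERATIONS * (4 * STA_IND_Y + INY)
        + (ITERATIONS - 1) * BNE_TAKEN + BNE_NOT_TAKEN)

revised_cycles = setup + loop
saving_pct = 100 * (ORIGINAL_CYCLES - revised_cycles) / ORIGINAL_CYCLES

print(revised_cycles)            # 7461
print(round(saving_pct, 1))      # 34.1
```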

However, the program is now 41 bytes and uses four 2-byte pointers. In total, it costs 49 bytes, plus the X and Y registers and the accumulator.

Further Modification

The version above uses indirect addressing in the STA instructions. A faster version uses absolute addressing indexed by Y, which needs no pointers in memory.

        lda #$07        ; colour number
        ldy #$00        ; set index to 0
loop:   sta $0200,y     ; set pixel colour at $0200 + Y
        sta $0300,y     ; set pixel colour at $0300 + Y
        sta $0400,y     ; set pixel colour at $0400 + Y
        sta $0500,y     ; set pixel colour at $0500 + Y
        iny             ; increment index
        bne loop        ; continue until done the page (256 pixels)

Calculated execution time:

This version takes 6403 cycles, which is about 43.5% fewer than the original version's 11325 cycles.
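One way to see the difference from the pointer version is to count only what changed. On a standard NMOS 6502, STA with absolute,Y addressing takes 5 cycles while STA with (zeropage),Y takes 6, and the pointer setup disappears. The Python sketch below works from the 7461-cycle figure of the pointer version; the names are my own.

```python
POINTER_VERSION_CYCLES = 7461   # four-pointer version from the post
ORIGINAL_CYCLES = 11325         # original script from the post

ITERATIONS, STORES_PER_PASS = 256, 4
STA_IND_Y, STA_ABS_Y = 6, 5     # standard NMOS 6502 timings
SETUP_WITH_POINTERS = 38        # ldx/stx/lda/sta pointer setup plus lda/ldy
SETUP_WITHOUT_POINTERS = 4      # only lda #$07 and ldy #$00 remain

# One cycle saved on every store, plus the pointer setup that is no longer needed.
loop_saving = ITERATIONS * STORES_PER_PASS * (STA_IND_Y - STA_ABS_Y)
setup_saving = SETUP_WITH_POINTERS - SETUP_WITHOUT_POINTERS

absolute_cycles = POINTER_VERSION_CYCLES - loop_saving - setup_saving
saving_pct = 100 * (ORIGINAL_CYCLES - absolute_cycles) / ORIGINAL_CYCLES

print(loop_saving, setup_saving)   # 1024 34
print(absolute_cycles)             # 6403
print(round(saving_pct, 1))        # 43.5
```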

The program size is 19 Bytes.

The fastest version

The fastest version I can think of is to store the colour value to every pixel address directly, without using any pointer or loop.

The code would be like:

    lda #$07 ; colour number
    sta $0200 ; set the color of the 1st pixel to $07
    sta $0201 ; set the color of the 2nd pixel to $07
......
    sta $05ff ; set the color of the 1024th pixel to $07

Calculated execution time:
This version will be the fastest. It takes 4098 cycles, which is 63.8% fewer than the original code.
I think an old Chinese saying, "the wisest often appears ordinary", may be used to describe this way of coding.
However, the memory usage of this program is huge. It takes 1024*3 + 2 = 3074 bytes.
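Nobody would type 1024 STA lines by hand, so here is a small Python sketch that could generate the unrolled source and recompute the numbers. It assumes the standard NMOS 6502 costs: LDA immediate is 2 cycles and 2 bytes, STA absolute is 4 cycles and 3 bytes. The variable names are my own.

```python
ORIGINAL_CYCLES = 11325                      # original script from the post
FIRST_PIXEL, END_OF_SCREEN = 0x0200, 0x0600  # pixels live at $0200..$05ff

# Build the unrolled program: one load, then one store per pixel.
lines = ["lda #$07"] + [f"sta ${addr:04x}"
                        for addr in range(FIRST_PIXEL, END_OF_SCREEN)]
stores = len(lines) - 1

cycles = 2 + 4 * stores        # LDA immediate + STA absolute per pixel
size_bytes = 2 + 3 * stores    # 2-byte LDA + 3-byte STA per pixel
saving_pct = 100 * (ORIGINAL_CYCLES - cycles) / ORIGINAL_CYCLES

print(lines[1], lines[-1])     # sta $0200 sta $05ff
print(cycles, size_bytes)      # 4098 3074
print(round(saving_pct, 1))    # 63.8
```

Printing "\n".join(lines) gives source text that can be pasted into the assembler.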


