SPO600 Lab1 (Pt.1) - Calculation of the execution time of a script
In this lab, we are asked to calculate the execution time of a short program written in 6502 CPU instructions. We assume a clock speed of 1 MHz, so one cycle takes one microsecond.
While studying the 6502 instruction set on another website, I noted that there are several addressing modes. Immediate addressing uses a literal value instead of a memory address (for example, lda #$07). "Zeropage" addressing uses a one-byte address from $00 to $FF, i.e. the memory is located in page 0 (for example, sta $40). "Absolute" addressing uses a full two-byte address such as $0200. "Indirect" addressing takes the address of a memory location; that location and the one after it hold the low and high bytes of the target address. It works like a pointer.
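To make the pointer idea concrete, here is a small Python model of how the sta ($40),y form used in this lab finds its target. It is only a sketch and not 6502 code: the memory array and the function name indirect_indexed_address are my own illustration.

```python
# A toy model of 6502 indirect indexed addressing, as in `sta ($40),y`.
# `memory` and `indirect_indexed_address` are hypothetical names for illustration.
memory = bytearray(0x10000)  # 64 KiB address space

# The pointer lives in zero page: low byte first, then high byte.
memory[0x40] = 0x00  # low byte of $0200
memory[0x41] = 0x02  # high byte of $0200


def indirect_indexed_address(mem, zp_addr, y):
    """Return the effective address for an operand of the form (zp_addr),y."""
    low = mem[zp_addr]
    high = mem[(zp_addr + 1) & 0xFF]  # the pointer's second byte stays in page 0
    return (((high << 8) | low) + y) & 0xFFFF


# Store colour 7 at the pointer target plus an index of 5.
target = indirect_indexed_address(memory, 0x40, 5)
memory[target] = 0x07
print(f"${target:04x}")
```

With the pointer set to $0200, changing Y from $00 to $ff walks through one whole page of the screen.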
The original Script of Lab 1:
lda #$00 ; set a pointer in memory location $40 to point to $0200
sta $40 ; ... low byte ($00) goes in address $40
lda #$02
sta $41 ; ... high byte ($02) goes into address $41
lda #$07 ; colour number
ldy #$00 ; set index to 0
loop: sta ($40),y ; set pixel colour at the address (pointer)+Y
iny ; increment index
bne loop ; continue until done the page (256 pixels)
inc $41 ; increment the page
ldx $41 ; get the current page number
cpx #$06 ; compare with 6
bne loop ; continue until done all pages
Performance calculation for the original code:
We note that the script takes 11325 cycles to fill the entire screen with colour. At 1 MHz that is about 11.3 ms.
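The 11325 figure can be reproduced with a short Python sketch. The per-instruction cycle costs are taken from the standard 6502 timing tables, and I assume no branch crosses a page boundary, which holds for code assembled at $0600. The variable names are my own.

```python
# Cycle costs from the standard 6502 timing tables.
LDA_IMM = 2        # lda #imm
LDY_IMM = 2        # ldy #imm
STA_ZP = 3         # sta zeropage
STA_IND_Y = 6      # sta (zp),y always takes 6 cycles
INY = 2
INC_ZP = 5         # inc zeropage
LDX_ZP = 3         # ldx zeropage
CPX_IMM = 2        # cpx #imm
BNE_TAKEN = 3      # assumption: the branch does not cross a page boundary
BNE_NOT_TAKEN = 2

PAGES = 4
PIXELS_PER_PAGE = 256
CLOCK_HZ = 1_000_000

# Two lda/sta pairs for the pointer, then lda #$07 and ldy #$00.
setup = 2 * (LDA_IMM + STA_ZP) + LDA_IMM + LDY_IMM

# Inner loop for one page: the last bne falls through.
inner = (PIXELS_PER_PAGE * (STA_IND_Y + INY)
         + (PIXELS_PER_PAGE - 1) * BNE_TAKEN
         + BNE_NOT_TAKEN)

# Outer loop bookkeeping: inc/ldx/cpx per page, bne taken on all but the last page.
outer = (PAGES * (INC_ZP + LDX_ZP + CPX_IMM)
         + (PAGES - 1) * BNE_TAKEN
         + BNE_NOT_TAKEN)

total_cycles = setup + PAGES * inner + outer
elapsed_ms = total_cycles / CLOCK_HZ * 1000

print(total_cycles, "cycles,", round(elapsed_ms, 3), "ms")
```

Almost all of the time goes into the inner loop (4 x 2815 cycles), which is why the later versions focus on it.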
Memory cost:
The machine code takes 25 bytes (listed below), and one 2-byte pointer is stored at $40 and $41, so the program uses 27 bytes of memory in total. It also uses the X and Y registers and the accumulator.
Machine code:
0600: a9 00 85 40 a9 02 85 41 a9 07 a0 00 91 40 c8 d0
0610: fb e6 41 a6 41 e0 06 d0 f3
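As a quick check of the byte count, the hex dump above can be parsed in Python. This is just a sketch; the variable names are my own.

```python
# The hex dump of the original program, copied from the listing above.
hex_dump = """\
0600: a9 00 85 40 a9 02 85 41 a9 07 a0 00 91 40 c8 d0
0610: fb e6 41 a6 41 e0 06 d0 f3
"""

code = bytearray()
for line in hex_dump.splitlines():
    _address, data = line.split(":")
    code += bytes.fromhex(data)  # fromhex ignores the spaces between bytes

code_size = len(code)
pointer_bytes = 2  # the zero-page pointer at $40 and $41
total_bytes = code_size + pointer_bytes

print(code_size, "bytes of code,", total_bytes, "bytes in total")
```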
Improve the performance
Looking at the performance calculation table, the first idea that comes to me is to reduce the cycles spent on the INY and BNE instructions. In the original code the Y register counts up and wraps around 4 times, once per page, so this loop overhead is paid 1024 times. If each pass through the loop stores to all four pages, Y only has to wrap around once.
The cost is memory: the modified program is 41 bytes and uses 4 pointers of 2 bytes each, so it takes 49 bytes in total, plus the X and Y registers and the accumulator. A modified version:
ldx #$00 ; set a pointer1 in memory location $40 to point to $0200
stx $40 ; ... low byte ($00) goes in address $40
lda #$02
sta $41 ; ... high byte ($02) goes into address $41
; set a pointer2 in memory location $42 to point to $0300
stx $42 ; ... low byte ($00) goes in address $42
lda #$03
sta $43 ; ... high byte ($03) goes into address $43
; set a pointer3 in memory location $44 to point to $0400
stx $44 ; ... low byte ($00) goes in address $44
lda #$04
sta $45 ; ... high byte ($04) goes into address $45
; set a pointer4 in memory location $46 to point to $0500
stx $46 ; ... low byte ($00) goes in address $46
lda #$05
sta $47 ; ... high byte ($05) goes into address $47
lda #$07 ; colour number
ldy #$00 ; set index to 0
loop: sta ($40),y ; set pixel colour at the address (pointer)+Y
sta ($42),y ; set pixel colour at the address (pointer)+Y
sta ($44),y ; set pixel colour at the address (pointer)+Y
sta ($46),y ; set pixel colour at the address (pointer)+Y
iny ; increment index
bne loop ; continue until done the page (256 pixels)
Performance calculation for the revised version:
The revised version takes 7461 cycles, which is 34.1% fewer than the 11325 cycles of the original version.
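The figure for the revised version can be checked the same way. This is a sketch using cycle costs from the standard 6502 timing tables and assuming the branch stays within one page; the helper loop_cycles is my own name.

```python
def loop_cycles(body, iterations, taken=3, not_taken=2):
    """Cycles for a loop whose body is followed by bne; the last bne falls through."""
    return iterations * body + (iterations - 1) * taken + not_taken


# Setup: ldx #/stx zp, lda #/sta zp, then three groups of stx zp, lda #, sta zp,
# and finally lda #$07 and ldy #$00.
setup = (2 + 3) + (2 + 3) + 3 * (3 + 2 + 3) + 2 + 2

# Loop body: four sta (zp),y at 6 cycles each, plus iny at 2 cycles.
body = 4 * 6 + 2

revised = setup + loop_cycles(body, 256)
original = 11325
saving = (original - revised) / original

print(revised, "cycles,", round(saving * 100, 1), "% fewer than the original")
```

The loop now runs 256 times instead of 1024, so the INY and BNE overhead is paid only a quarter as often.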
Further Modification
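The version above uses indirect indexed addressing in the STA instructions, which costs 6 cycles per store. A faster version uses absolute addressing indexed by Y, which costs 5 cycles per store and needs no pointers in zero page.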
lda #$07 ; colour number
ldy #$00 ; set index to 0
loop: sta $0200,y ; set pixel colour at $0200 +Y
sta $0300,y ; set pixel colour at $0300 +Y
sta $0400,y ; set pixel colour at $0400 +Y
sta $0500,y ; set pixel colour at $0500 +Y
iny ; increment index
bne loop ; continue until done the page (256 pixels)
Calculated execution time:
This version takes 6403 cycles, which is about 43.5% fewer than the 11325 cycles of the original version.
The program size is 19 Bytes.
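One way to see where the saving over the four-pointer version comes from is to split it into a loop part and a setup part. The sketch below assumes the standard timings of 6 cycles for sta (zp),y and 5 cycles for sta abs,y; the names are my own.

```python
ITERATIONS = 256
STORES_PER_ITERATION = 4
STA_IND_Y = 6   # sta (zp),y
STA_ABS_Y = 5   # sta abs,y

# Each of the 1024 stores is one cycle cheaper.
loop_saving = ITERATIONS * STORES_PER_ITERATION * (STA_IND_Y - STA_ABS_Y)

# Setup shrinks from the pointer initialisation (38 cycles) to lda # and ldy # (4 cycles).
setup_saving = 38 - (2 + 2)

pointer_version = 7461
absolute_version = pointer_version - loop_saving - setup_saving

print(loop_saving, setup_saving, absolute_version)
```

So nearly all of the improvement comes from the cheaper store, and only a little from dropping the pointer setup.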
The fastest version
The fastest version I can think of is to load the color value into the memory without using any pointer or loop.
The code would be like:
lda #$07 ; colour number
sta $0200 ; set the color of the 1st pixel to $07
sta $0201 ; set the color of the 2nd pixel to $07
......
sta $05ff ; set the color of the 1024th pixel to $07
Calculated execution time:
This version is the fastest. It takes 4098 cycles (2 for the LDA and 4 for each of the 1024 STA instructions), which is 63.8% fewer than the original code.
I think an old Chinese saying, "the wisest often appears ordinary", may be used to describe this way of coding.
However, the memory usage of this program is huge. It takes 1024*3 + 2 = 3074 bytes.
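Nobody would type 1024 STA lines by hand, so here is a hedged Python sketch that generates the unrolled listing and checks the cycle and size figures. It assumes the screen occupies $0200 to $05ff as in the code above, 4 cycles and 3 bytes per sta abs, and 2 cycles and 2 bytes for lda #imm; the names are my own.

```python
SCREEN_START = 0x0200
SCREEN_END = 0x05FF  # inclusive

# One lda followed by one absolute sta per pixel.
lines = ["lda #$07"]
lines += [f"sta ${addr:04x}" for addr in range(SCREEN_START, SCREEN_END + 1)]

stores = len(lines) - 1
cycles = 2 + stores * 4      # lda # is 2 cycles, sta abs is 4 cycles
size_bytes = 2 + stores * 3  # lda # is 2 bytes, sta abs is 3 bytes

print(lines[1], "...", lines[-1])
print(cycles, "cycles,", size_bytes, "bytes")
```

This makes the trade-off clear: the unrolled version saves cycles by spending about 3 KB of memory on code.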