Missing Cycles by Pasi 'Albert' Ojala (po87553@cs.tut.fi albert@cc.tut.fi) Missing Cycles -------------- [all timings are in PAL, the principle applies to NTSC too] Everybody knows that there are 63 cycles available to the C64 processor on each scan line, except for one which only provides 23 cycles (later referred to as a "bad" scan line). But what happens when we add sprites and why ? In the C64, the VIC (video interface controller) has much more to do than just showing graphics on the screen. It also handles the memory refresh. On each scanline, it has to refresh five rows in the memory matrix and fetch fourty bytes of graphics data. The VIC does all of this during the cycles (phase 1) that the processor is not using the memory. These cycles, however, are not sufficient when the VIC also needs to access the character and color codes for the next row. The memory bus can't be used by the CPU and the VIC at the same time, so CPU access to the bus must be denied to allow the VIC to fetch its data. Fortunately, the VIC bus (12-bit wide) allows the character (8 bits) and color (4 bits) codes to be fetched at the same time. _Understanding how sprites work_ If there are sprites on the screen, the VIC needs even more cycles to fetch all of the graphics data. Scan lines are time divided so that there is enough time for all action during one line. On each line, the sprite image pointers are fetched during phase 1. If the sprite is to be displayed on that line, the three bytes of image data are fetched right after that. Out of these three fetches, two take place during phase 2 of the clock, so the processor will lose these. On average, two clock cycles are lost for each sprite that is displayed on that line. But how is it possible for all eight sprites to only take 16-19 cycles (depending on the timing) when we have observed that one sprite requires three cycles? And why do sprites 0, 2, 4, 6 and 7 together take up as many cycles as all eight sprites ? The answer may be found in the way the VIC tells the CPU that it needs additional cycles. _The BA signal_ When the VIC wants to use the bus, the BA (Bus Available) signal goes inactive. This will happen three cycles before the bus must be released ! During these three cycles, the CPU must complete all memory accesses or delay them until it has the bus again. The CPU either completes the current instruction in the remaining cycles or sits and waits for the bus to become available again. It can't execute a new instruction as long as it doesn't have the bus. This is why cycles seem to be lost (besides those stolen directly for the sprites). Usually, all 8 sprites take 17 cycles while one sprite takes three cycles. However, the CPU may continue to execute an instruction if it does not use the bus. _Theory and speculation_ Let's suppose that all the sprites are enabled and on the same scan line. Then, the VIC steals 16 cycles (2 cycles for each sprite) for the memory fetches and 3 cycles as overhead for the BA signal, for a total of 19 cycles. However, it will be usually less because the CPU will use some of the cycles when the bus request is pending. If we now disable sprite 4, no cycles are released for the CPU's use. This is because during the previous sprite 4 data fetch, the VIC already signals that it needs the bus for the sprite 5 data fetch and BA stays low (Refer to the timing chart). Thus, the CPU never sees BA go high during sprite 4 and 2 cycles are still lost. Accordingly, if we only turn off sprites 1, 3 and 5 we get no cycles back from the VIC. So in time-critical raster routines, always use sprites in order. _What can we do with this feature ?_ How can this be useful? A good use is for synchronization. Normally, before the CPU starts to execute the raster interrupt code, it's executing an instruction of undefined cycle-length. This execution time varies from two to seven cycles. With a sprite, you can do the synchronization with a minimal effort using a DEC or INC instruction in the right place. If the processor is early, it has to wait for the bus, otherwise it will continue to execute cycles from the instruction. I have never experimented with any other instruction than DEC/INC, but some others should work also. You need an instruction which has a cycle that do not need the bus to be available. e.g. INC $3fff will increase the value during the fifth cycle and do not need the bus for that. _A demo program_ The enclosed program includes a short raster color routine to demonstrate this strict timing and synchronization. The background color is changed 12 times on each line. The electron beam runs over eight pixels during one cycle, so the timing must be precise. -------------------------------------------------------------------------- _Table for PAL VIC timing for the Missing cycles_ 012345678901234567890123456789012345678901234567890123456789012 cycles Normal scan line, 0 sprites ggggggggggggggggggggggggggggggggggggggggrrrrr p p p p p p p p phi-1 VIC phi-2 VIC xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx phi-2 6510 63 cycles available Normal scan line, 8 sprites ggggggggggggggggggggggggggggggggggggggggrrrrr pspspspspspspsps phi-1 VIC ssssssssssssssss phi-2 VIC xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxXXX phi-2 6510 46-49 cycles available Normal scan line, 4 sprites ggggggggggggggggggggggggggggggggggggggggrrrrr psp psp psp psp phi-1 VIC ss ss ss ss phi-2 VIC xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxXXX xx phi-2 6510 48-51 cycles available Bad scan line, 0 sprites ggggggggggggggggggggggggggggggggggggggggrrrrr p p p p p p p p phi-1 VIC cccccccccccccccccccccccccccccccccccccccc phi-2 VIC xxxxxxxxxxxxxxxxxxxxxxx phi-2 6510 23 cycles available Bad scan line, 8 sprites ggggggggggggggggggggggggggggggggggggggggrrrrr pspspspspspspsps phi-1 VIC cccccccccccccccccccccccccccccccccccccccc ssssssssssssssss phi-2 VIC xxxxXXX phi-2 6510 4-7 cycles available g= grafix data fetch (character images or graphics data) r= refresh p= sprite image pointer fetch c= character and color CODE fetch during a bad scan line s= sprite data fetch x= processor executing instructions X= processor executing an instruction, bus request pending Observe! The left edge of the chart is not the left edge of the screen nor the left edge of the beam, but the sprite x-coordinate 0. If you have opened the borders, you know what I mean. A sprite can be moved left from the coordinate 0 by using x-values greater than 500. ___________ | _______ |<-- Maximum sized video screen ||| | | ||| |<-- Normal C64 screen ||| | | |||_______| | || | ||__________| ^ Sprite coordinate 0 -------------------------------------------------------------------------- Demonstration program for missing cycles COLOR0= $CE00 ; Place for color bar 0 COLOR1= $CF00 ; Place for color bar 1 RASTER= $FA ; Line for the raster interrupt DUMMY= $CFFF ; Timing variable *= $C000 SEI ; Disable interrupts LDA #$7F ; Disable timer interrupts STA $DC0D LDA #$01 ; Enable raster interrupts STA $D01A STA $D015 ; Enable Sprite 0 LDA #IRQ STA $0315 LDA #$1B STA $D011 LDA #RASTER ; Set interrupt position (inc. 9th bit) STA $D012 LDA #RASTER-20 ; Sprite will just reach the interrupt position STA $D001 ; when it is positioned 20 lines earlier LDX #51 LDY #0 STA $D017 ; No Y-enlargement LOOP0 LDA COL,X ; Create color bars PHA AND #15 STA COLOR0,X STA COLOR0+52,Y STA COLOR0+104,X STA COLOR0+156,Y PLA LSR LSR LSR LSR STA COLOR1,X STA COLOR1+52,Y STA COLOR1+104,X STA COLOR1+156,Y INY DEX BPL LOOP0 CLI ; Enable interrupts RTS ; Return IRQ NOP ; Wait a bit NOP NOP NOP LDY #103 ; 104 lines of colors (some of them not visible) ; Reduce for NTSC, 55 ? INC DUMMY ; Handles the synchronization with the help of the DEC DUMMY ; sprite and the 6-clock instructions ; Add a NOP for NTSC FIRST LDX COLOR0,Y ; Do the color effects SECOND LDA COLOR1,Y STA $D020 STX $D020 STA $D020 STX $D020 STA $D020 STX $D020 STA $D020 STX $D020 STA $D020 STX $D020 STA $D020 STX $D020 ; Add a NOP for NTSC (one line = 65 cycles) LDA #0 ; Throw away 2 cycles (total loop = 63 cycles) DEY BPL FIRST ; Loop for 104 lines STA $D020 LDA #103 ; For subtraction DEC FIRST+1 ; Move the bars BPL OVER STA FIRST+1 OVER SEC SBC FIRST+1 STA SECOND+1 LDA #1 ; Ack the raster interrupt STA $D019 JMP $EA31 ; Jump to the standard irq handler COL BYT $09,$90,$09,$9B,$00,$99,$2B,$08,$90,$29,$8B,$08,$9C,$20,$89,$AB BYT $08,$9C,$2F,$80,$A9,$FB,$08,$9C,$2F,$87,$A0,$F9,$7B,$18,$0C,$6F BYT $07,$61,$40,$09,$6B,$48,$EC,$0F,$67,$41,$E1,$30,$09,$6B,$48,$EC BYT $3F,$77,$11,$11 ; Two color bars --------------------------------------------------------------------------