AVR timing imprecise

#1 · February 1, 2024, 3:42 pm

While debugging a time-critical application and comparing it to a real ATtiny85, I discovered a slight deviation in timing. The program itself is simple and uses the timer 0 overflow interrupt. The assembler code produced by avr-gcc is

__vector_5:
  push r1  ; 
  push r0  ; 
  in r0,__SREG__   ; ,
  push r0  ; 
  clr __zero_reg__   ; 
  push r24   ; 
 ;  main.c:211:   mean = TCNT0;
  in r24,0x32  ;  _1, MEM[(volatile uint8_t *)82B]
 ;  main.c:211:   mean = TCNT0;
  sts mean,r24   ;  mean, _1

It basically reads TCNT0 and stores it into a variable named 'mean'. That variable is output to a display in the main function (that shouldn't matter). The clock is set to 1 MHz and there is no prescaler for the timer. Once the overflow interrupt triggers, some time passes before TCNT0 is read, and that time differs between simulide and the real chip.

The value of mean is 14 (sometimes 15) when running in simulide and 16 (sometimes 17) on a real ATtiny85. Summing up the cycles within the above ISR up to and including the read of TCNT0 gives 11 (push needs 2 cycles; in and clr need 1 cycle each). According to the ATtiny85 documentation, the interrupt execution response time is 4 cycles minimum. There is probably another cycle (or even two) for finishing the instruction being executed when the interrupt occurs. Hence, 16 (or 17) cycles should be the correct value, but simulide is faster. Could it be that the interrupt execution response time in simulide is too low (2 instead of 4 cycles)?
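
The count of 11 can be reproduced with a small host-side C sketch. It is only an illustration of the arithmetic above: the names insn, isr_prologue and prologue_cycles are made up for this example, and the cycle costs are the ones stated in this post (push = 2, in = 1, clr = 1).

```c
#include <stddef.h>

/* One instruction of the ISR prologue together with its cycle cost. */
struct insn {
    const char *text;
    unsigned cycles;
};

/* Instructions of __vector_5 up to and including the read of TCNT0.
 * Cycle costs are the ones assumed in the post. */
const struct insn isr_prologue[] = {
    { "push r1",          2 },
    { "push r0",          2 },
    { "in r0,__SREG__",   1 },
    { "push r0",          2 },
    { "clr __zero_reg__", 1 },
    { "push r24",         2 },
    { "in r24,0x32",      1 },  /* reads TCNT0 */
};

/* Sum the cycle costs of all listed instructions. */
unsigned prologue_cycles(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < sizeof isr_prologue / sizeof isr_prologue[0]; i++)
        total += isr_prologue[i].cycles;
    return total;
}
```

Calling prologue_cycles() returns 11, so anything beyond that in the measured TCNT0 value has to come from the interrupt entry itself.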

#2 · February 1, 2024, 3:58 pm

I will have a look, but I think the response time is 4 cycles in the simulation:
2 cycles to store the current PC on the stack.
2 cycles for the jump to ISR.

#3 · February 1, 2024, 4:33 pm

The documentation says

The interrupt execution response for all the enabled AVR interrupts is four clock cycles minimum. After four clock
cycles the Program Vector address for the actual interrupt handling routine is executed. During this four clock cycle
period, the Program Counter is pushed onto the Stack. The vector is normally a jump to the interrupt routine, and
this jump takes three clock cycles.

That sounds like pushing the PC onto the stack already takes 4 cycles, which could well be the case for a 16-bit value if pushing a single byte already takes 2 cycles. But if the jump to the interrupt routine takes another 3 cycles, it would take 7 cycles before the first instruction in the interrupt routine is executed. That's more than I observe on real hardware. I also wonder where those 3 cycles come from. An rjmp takes only 2 cycles.

#4 · February 1, 2024, 4:35 pm

I think you are right, interrupt is using a "common" CALL to ISR which adds only 2 cycles.
I have to check it more carefully, but it seems that this is the problem.

#5 · February 1, 2024, 5:01 pm

Quote from arcachofo on February 1, 2024, 4:35 pm

I think you are right, interrupt is using a "common" CALL to ISR which adds only 2 cycles.

I found it. It would be 3 cycles if it were a jmp, but the ATtiny85 doesn't have a jmp instruction, just an rjmp instruction, which takes two bytes and two cycles. Hence, the first instruction in the ISR should be executed 6 cycles after the interrupt occurs. That also assumes that no instruction is interrupted, because that could increase the time by 1 or 2 cycles. It looks like this isn't the case, maybe because the overflow is synchronized with the clock. In total, it should be 4 + 2 + 11 = 17 cycles, which is unfortunately one more than the 16 usually measured on the real hardware.
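
As a rough sketch of this budget in C (host-side, purely arithmetic): the constant names and the function expected_tcnt0 are hypothetical, and the parameter pending_insn_cycles is an assumption that models the extra cycles needed to finish an instruction that is already executing when the interrupt fires.

```c
/* Cycle budget from the overflow interrupt to the read of TCNT0.
 * All three terms come from the discussion in this thread, not from
 * a measurement. */
enum {
    RESPONSE_CYCLES    = 4,   /* datasheet: minimum interrupt response time */
    VECTOR_RJMP_CYCLES = 2,   /* rjmp in the vector table */
    PROLOGUE_CYCLES    = 11   /* ISR instructions up to and including in r24,0x32 */
};

/* Expected TCNT0 reading, assuming the timer runs without prescaler and
 * pending_insn_cycles extra cycles are spent finishing the interrupted
 * instruction (0 if no instruction is interrupted). */
unsigned expected_tcnt0(unsigned pending_insn_cycles)
{
    return RESPONSE_CYCLES + VECTOR_RJMP_CYCLES + PROLOGUE_CYCLES
           + pending_insn_cycles;
}
```

With pending_insn_cycles = 0 this gives 17, which matches the occasional 17 on the real chip but not the usual 16, so at least one of the three terms is probably counted differently by the hardware.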

#6 · February 1, 2024, 5:05 pm

While coding ATmega8 in assembly, I used to see an interrupt call as an RCALL (3 cycles) + RJMP (2 cycles).

I think I will also have a close look at it.

#7 · February 1, 2024, 5:18 pm

A "normal" bare JUMP takes 2 cycles on most CPUs: one to set the PC and another to start executing code at the new PC.
A CALL is usually a "push PC to Stack" + a JUMP.
Then, depending on how long it takes to "find" the address, it can add cycles.

If an RCALL takes 3 cycles and RJMP takes 2, I tend to think that the "Push to Stack" takes only 1.
Then the JUMP to ISR takes 3 instead of 2 (don't know why) + 1 for "Push to Stack" = 4.

That is how I'm starting to see it.
In any case jumping to the ISR takes 4 cycles in total and simulide is only adding 2 (a "bare" JUMP).
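
A quick way to check this idea against the numbers from the first post is the C sketch below. It assumes that the ISR-entry cost is the only difference between simulide and the real chip; the function names are made up for this example.

```c
/* Difference in ISR-entry cost between hardware and simulator, in cycles. */
int entry_offset(int hw_entry_cycles, int sim_entry_cycles)
{
    return hw_entry_cycles - sim_entry_cycles;
}

/* Predict the TCNT0 reading on hardware from the reading in the simulator,
 * assuming only the entry cost differs (an assumption, not a verified fact). */
int predicted_hw_reading(int sim_reading, int hw_entry_cycles, int sim_entry_cycles)
{
    return sim_reading + entry_offset(hw_entry_cycles, sim_entry_cycles);
}
```

With 4 cycles on hardware and 2 in simulide, the simulated readings 14 and 15 map to 16 and 17, which are the values reported for the real ATtiny85.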

#8 · February 1, 2024, 5:32 pm

Then the JUMP to ISR takes 3 instead of 2 (don't know why)

Could it be that these 3 cycles are counted from the moment the interrupt is triggered?
Because after the interrupt is triggered the CPU still executes one more instruction before jumping to ISR.
Maybe this is the extra cycle for JUMP to ISR?

#9 · February 1, 2024, 5:48 pm

Quote from arcachofo on February 1, 2024, 5:32 pm

Then the JUMP to ISR takes 3 instead of 2 (don't know why)

Could it be that these 3 cycles are counted from the moment the interrupt is triggered?
Because after the interrupt is triggered the CPU still executes one more instruction before jumping to ISR.
Maybe this is the extra cycle for JUMP to ISR?

In this case, wouldn't the cycle of this normally executed instruction be counted twice: once before the jump (as if an interrupt didn't occur) and once during the jump (when an interrupt occurs)?

Perhaps this extra cycle sets a flag (or the like) to signal that the called subroutine is expected to be ended with RETI, not RET.

#10 · February 1, 2024, 6:01 pm

Perhaps this extra cycle sets a flag (or the like) to signal that the called subroutine is expected to be ended with RETI, not RET.

Yes, I think something like that makes more sense.
I was thinking about the Global Interrupt Enable bit which is cleared when entering the interrupt and set at RETI.
But then RETI should take 1 cycle more than RET, which is not the case.

In any case, you can try a simple solution:
In file: src/microsim/cores/avr/avrcore.cpp
Add: "m_retCycles = 4;" to the constructor.

I need to change the name of that variable because it is misleading, but you can try it and see if it solves the mismatch.