The featured mage of this blog post is by Gerd Altmann from Pixabay
What is the overhead imposed by the millis()
interrupt? And can we get rid of it?
millis()
and micros()
are the two functions returning the number of milliseconds and microseconds since the Arduino program was started. With these functions, you can measure time intervals between two events. But beware, the counters roll over after some time. Since the time counters use unsigned 32-bit numbers, micros()
will go from UINT32_MAX
to zero after roughly 70 minutes, millis()
does that after 49 days. Fortunately, it is nevertheless possible to determine the length of time intervals correctly if they are not too long.
In an ideal world, these timekeeping functions would not impose any computational costs. In the real world, however, they do. The function calls cost some time and, moreover, the timekeeping in the background that is performed by the TIM0_OVF
interrupt service routine (ISR) incurs some costs.
We want to find out how much time the interrupt handler that counts the milliseconds needs. Since no other code can be executed while an interrupt is serviced, time-critical functions (such as bit-banging I/O functions) are very sensitive to it. Looking into the source code of the Arduino core, one finds in the file wiring.c
the following code:
#if defined(TIM0_OVF_vect) ISR(TIM0_OVF_vect) #else ISR(TIMER0_OVF_vect) #endif { // copy these to local variables so they can be stored in registers // (volatile variables must be read from memory on every access) unsigned long m = timer0_millis; unsigned char f = timer0_fract; m += MILLIS_INC; f += FRACT_INC; if (f >= FRACT_MAX) { f -= FRACT_MAX; m += 1; } timer0_fract = f; timer0_millis = m; timer0_overflow_count++; }
Now, one can try to determine the runtime of this ISR either by looking at the generated code and count instruction cycles or by measuring the timing empirically. I did both.
First, I inserted two port manipulation statements (such as PORTC |= 0x01
and PORTC &= ~0x01
) in the beginning and the end of the routine and measured the time using a logic analyzer. The result was that one could see blips of length 3.13 µs. Then, using the triggering mechanism, I looked for longer intervals and found that sometimes the routine needs 3.44 µs. But there are no longer execution times. This looks completely plausible because sometimes, the if-branch will be executed, which needs a bit more time.
So this is it? Definitely not! When looking into the generated assembler code, one notices that there is some code before the first port manipulation command is executed and some after. It is mainly a series of push
instructions before (in order to save registers temporarily) and a series of pop instructions afterward. Before we go into that, let us count the cycles between the two port manipulation instructions that I inserted.
0000067a <__vector_16>: __vector_16(): wiring.c:47 { 67a: 1f 92 push r1 [2] 67c: 0f 92 push r0 [2] 67e: 0f b6 in r0, 0x3f [1] 680: 0f 92 push r0 [2] 682: 11 24 eor r1, r1 [1] 684: 2f 93 push r18 [2] 686: 3f 93 push r19 [2] 688: 8f 93 push r24 [2] 68a: 9f 93 push r25 [2] 68c: af 93 push r26 [2] 68e: bf 93 push r27 [2] wiring.c:48 PORTC |= 0x01; 690: 40 9a sbi 0x08, 0 [2] wiring.c:51 unsigned long m = timer0_millis; 692: 80 91 1e 01 lds r24, 0x011E ; <timer0_millis> [2] 696: 90 91 1f 01 lds r25, 0x011F ; <timer0_millis+0x1> [2] 69a: a0 91 20 01 lds r26, 0x0120 ; <timer0_millis+0x2> [2] 69e: b0 91 21 01 lds r27, 0x0121 ; <timer0_millis+0x3> [2] /wiring.c:52 unsigned char f = timer0_fract; 6a2: 30 91 14 01 lds r19, 0x0114 ; <timer0_fract> [2] wiring.c:55 f += FRACT_INC; 6a6: 23 e0 ldi r18, 0x03 ; 3 [1] 6a8: 23 0f add r18, r19 [1] wiring.c:56 if (f >= FRACT_MAX) { 6aa: 2d 37 cpi r18, 0x7D ; 125 [1] 6ac: 60 f5 brcc .+88 ; 0x706 <__vector_16+0x8c> [wc: 2] wiring.c:54 m += MILLIS_INC; 6ae: 01 96 adiw r24, 0x01 ; 1 6b0: a1 1d adc r26, r1 6b2: b1 1d adc r27, r1 wiring.c:61 timer0_fract = f; 6b4: 20 93 14 01 sts 0x0114, r18 ; <timer0_fract> [2] wiring.c:62 timer0_millis = m; 6b8: 80 93 1e 01 sts 0x011E, r24 ; <timer0_millis> [2] 6bc: 90 93 1f 01 sts 0x011F, r25 ; <timer0_millis+0x1> [2] 6c0: a0 93 20 01 sts 0x0120, r26 ; <timer0_millis+0x2> [2] 6c4: b0 93 21 01 sts 0x0121, r27 ; <timer0_millis+0x3> [2] wiring.c:63 timer0_overflow_count++; 6c8: 80 91 15 01 lds r24, 0x0115 ; <timer0_overflow_count> [2] 6cc: 90 91 16 01 lds r25, 0x0116 ; <timer0_overflow_count+0x1> [2] 6d0: a0 91 17 01 lds r26, 0x0117 ; <timer0_overflow_count+0x2> [2] 6d4: b0 91 18 01 lds r27, 0x0118 ; <timer0_overflow_count+0x3> [2] 6d8: 01 96 adiw r24, 0x01 ; 1 [2] 6da: a1 1d adc r26, r1 [1] 6dc: b1 1d adc r27, r1 [1] 6de: 80 93 15 01 sts 0x0115, r24 ; <timer0_overflow_count> [2] 6e2: 90 93 16 01 sts 0x0116, r25 ; <timer0_overflow_count+0x1> [2] 6e6: a0 93 17 01 sts 0x0117, r26 ; <timer0_overflow_count+0x2> [2] 6ea: b0 93 18 01 sts 0x0118, r27 ; <timer0_overflow_count+0x3> [2] wiring.c:64 PORTC &= ~0x01; 6ee: 40 98 cbi 0x08, 0 [2] wiring.c:66 } 6f0: bf 91 pop r27 [2] 6f2: af 91 pop r26 [2] 6f4: 9f 91 pop r25 [2] 6f6: 8f 91 pop r24 [2] 6f8: 3f 91 pop r19 [2] 6fa: 2f 91 pop r18 [2] 6fc: 0f 90 pop r0 [2] 6fe: 0f be out 0x3f, r0 [1] 700: 0f 90 pop r0 [2] 702: 1f 90 pop r1 [2] 704: 18 95 reti [4] wiring.c:57 f -= FRACT_MAX; 706: 26 e8 ldi r18, 0x86 ; 134 [1] 708: 23 0f add r18, r19 [1] wiring.c:58 m += 1; 70a: 02 96 adiw r24, 0x02 ; 2 [2] 70c: a1 1d adc r26, r1 [1] 70e: b1 1d adc r27, r1 [1] 710: d1 cf rjmp .-94 ; 0x6b4 <__vector_16+0x3a> [2]
In the worst case (when the if-branch is executed), one ends up with 53 execution cycles. One has to add 2 cycles from the CBI instruction (clearing the port bit), which makes it 55. Since we have 16 cycles per µs, this leads to 55/16=3.4375 µs, which is close enough to what I have measured. Remember for later, 2 of these cycles can be subtracted since they are caused by the measurement.
So what about the prologue and the epilogue? Adding those cycles up leads to another 43 cycles. And this is still not everything. The data sheet for the ATmega328 states under the heading Interrupt Response Time:
The interrupt execution response for all the enabled AVR interrupts is four clock cycles minimum. After four clock cycles the program vector address for the actual interrupt handling routine is executed. During this four clock cycle period, the Program Counter is pushed onto the Stack. The vector is normally a jump to the interrupt routine, and this jump takes three clock cycles. If an interrupt occurs during execution of a multi-cycle instruction, this instruction is completed before the interrupt is served.
ATmega48A/48PA/88A/88PA/168A/168PA/328/328PA Datasheet
In other words, we have another 4+3 cycles and if a 4-cycle instruction is executed when the interrupt happens, one may have to wait for another 3 cycles before the interrupt is served. In other words, we have in the worst case 10 cycles before the interrupt service routine starts to execute. Adding this all up leads to 53+43+10 = 106 cycles, which is equivalent to 6.625 µs when using a 16 MHz clock.
This is a very small amount of time when compared with the history of mankind. But it is a very large chunk of time when time-critical operations are performed. For instance, if we want to receive data serially over a UART line with 115200 bps, then each bit takes 8.68 µs. With the millis interrupt running, this will probably lead to missed bits.
So, what could be a solution in this case? If the timing is very tight, it is always an option to disable the millis interrupt: TIMSK0 = 0
. If you do this, millis()
and micros()
, as well as the sibling functions delay()
and delayMicroseconds()
cannot be used anymore.
If you need to do timekeeping nevertheless and accuracy is not a requirement, you could employ the watchdog timer interrupt. Of course, you have to write an ISR, but this can be a very short one, simply incrementing a variable. If accuracy is of concern, you could use a hardware solution (i.e., an RTC) instead.
If you want to use delay routines with the millis interrupt disabled, then the ones in util/delay.h
may be of help. _delay_ms
and _delay_us
are similar to delay
and delayMicroseconds
. However, they expect float arguments and these have to be constants.
0 Comments
1 Pingback