The featured mage of this blog post is by Gerd Altmann from Pixabay

What is the overhead imposed by the millis() interrupt? And can we get rid of it?

millis() and micros() are the two functions returning the number of milliseconds and microseconds since the Arduino program was started. With these functions, you can measure time intervals between two events. But beware, the counters roll over after some time. Since the time counters use unsigned 32-bit numbers, micros() will go from UINT32_MAX to zero after roughly 70 minutes, millis() does that after 49 days. Fortunately, it is nevertheless possible to determine the length of time intervals correctly if they are not too long.

In an ideal world, these timekeeping functions would not impose any computational costs. In the real world, however, they do. The function calls cost some time and, moreover, the timekeeping in the background that is performed by the TIM0_OVF interrupt service routine (ISR) incurs some cost.

We want to find out how much time the interrupt handler that counts the milliseconds needs. Since no other code can be executed while an interrupt is serviced, time critical functions (such as bit-banging I/O functions) are very sensitive to it. Looking into the source code of the Arduino core, one finds in the file wiring.c the following code:

#if defined(TIM0_OVF_vect)
ISR(TIM0_OVF_vect)
#else
ISR(TIMER0_OVF_vect)
#endif
{
	// copy these to local variables so they can be stored in registers
	// (volatile variables must be read from memory on every access)
	unsigned long m = timer0_millis;
	unsigned char f = timer0_fract;

	m += MILLIS_INC;
	f += FRACT_INC;
	if (f >= FRACT_MAX) {
		f -= FRACT_MAX;
		m += 1;
	}

	timer0_fract = f;
	timer0_millis = m;
	timer0_overflow_count++;

}

Now, one can try to determine the runtime of this ISR either by looking at the generated code and count instruction cycles or one measures the timing empirically. I did both.

First, I inserted two port manipulation statements (such as PORTC |= 0x01 and PORTC &= ~0x01) in the beginning and the end of the routine and measured the time using a logic analyzer. The result was that one could see blips of length 3.13 µs. Then, using the triggering mechanism, I looked for longer intervals and found that sometimes the routine needs 3.44 µs. But there are no longer execution times. This looks completely plausible because sometimes, the if-branch will be executed, which needs a bit more time.

So this is it? Definitely not! When looking into the generated assembler code, one notices that there is some code before the first port manipulation command is executed and some after. It is mainly a series of push instructions before (in order to save registers temporarily) and a series of pop instructions afterwards. Before we go into that, let us count the cycles between the two port manipulation instructions that I inserted.

0000067a <__vector_16>:
__vector_16():
wiring.c:47
{
 67a:	1f 92       	push	r1        [2]
 67c:	0f 92       	push	r0        [2]
 67e:	0f b6       	in      r0, 0x3f  [1]
 680:	0f 92       	push	r0        [2] 
 682:	11 24       	eor     r1, r1    [1]
 684:	2f 93       	push	r18       [2]
 686:	3f 93       	push	r19       [2]
 688:	8f 93       	push	r24       [2]
 68a:	9f 93       	push	r25       [2]
 68c:	af 93       	push	r26       [2]
 68e:	bf 93       	push	r27       [2]
wiring.c:48
  PORTC |= 0x01;
 690:	40 9a       	sbi	0x08, 0	  [2]
wiring.c:51
	unsigned long m = timer0_millis;
 692:	80 91 1e 01 	lds	r24, 0x011E	; <timer0_millis>              [2]
 696:	90 91 1f 01 	lds	r25, 0x011F	; <timer0_millis+0x1>          [2]
 69a:	a0 91 20 01 	lds	r26, 0x0120	; <timer0_millis+0x2>          [2]
 69e:	b0 91 21 01 	lds	r27, 0x0121	; <timer0_millis+0x3>          [2]
/wiring.c:52
	unsigned char f = timer0_fract;
 6a2:	30 91 14 01 	lds	r19, 0x0114	; <timer0_fract>               [2]
wiring.c:55
	f += FRACT_INC;
 6a6:	23 e0       	ldi	r18, 0x03	; 3                            [1]
 6a8:	23 0f       	add	r18, r19                                       [1]
wiring.c:56
	if (f >= FRACT_MAX) {
 6aa:	2d 37       	cpi	r18, 0x7D	; 125                          [1]
 6ac:	60 f5       	brcc	.+88     	; 0x706 <__vector_16+0x8c>     [wc: 2]
wiring.c:54
	m += MILLIS_INC;
 6ae:	01 96       	adiw	r24, 0x01	; 1
 6b0:	a1 1d       	adc	r26, r1
 6b2:	b1 1d       	adc	r27, r1
wiring.c:61
	timer0_fract = f;
 6b4:	20 93 14 01 	sts	0x0114, r18	; <timer0_fract>               [2]
wiring.c:62
	timer0_millis = m;
 6b8:	80 93 1e 01 	sts	0x011E, r24	; <timer0_millis>              [2]
 6bc:	90 93 1f 01 	sts	0x011F, r25	; <timer0_millis+0x1>          [2]
 6c0:	a0 93 20 01 	sts	0x0120, r26	; <timer0_millis+0x2>          [2]
 6c4:	b0 93 21 01 	sts	0x0121, r27	; <timer0_millis+0x3>          [2]
wiring.c:63
	timer0_overflow_count++;
 6c8:	80 91 15 01 	lds	r24, 0x0115	; <timer0_overflow_count>      [2]
 6cc:	90 91 16 01 	lds	r25, 0x0116	; <timer0_overflow_count+0x1>  [2]
 6d0:	a0 91 17 01 	lds	r26, 0x0117	; <timer0_overflow_count+0x2>  [2]
 6d4:	b0 91 18 01 	lds	r27, 0x0118	; <timer0_overflow_count+0x3>  [2]
 6d8:	01 96       	adiw	r24, 0x01	; 1                            [2]
 6da:	a1 1d       	adc	r26, r1                                        [1]
 6dc:	b1 1d       	adc	r27, r1                                        [1]
 6de:	80 93 15 01 	sts	0x0115, r24	; <timer0_overflow_count>      [2]
 6e2:	90 93 16 01 	sts	0x0116, r25	; <timer0_overflow_count+0x1>  [2]
 6e6:	a0 93 17 01 	sts	0x0117, r26	; <timer0_overflow_count+0x2>  [2]
 6ea:	b0 93 18 01 	sts	0x0118, r27	; <timer0_overflow_count+0x3>  [2]
wiring.c:64
 PORTC &= ~0x01;
 6ee:	40 98       	cbi	0x08, 0	    [2]
wiring.c:66
}
 6f0:	bf 91       	pop	r27         [2]
 6f2:	af 91       	pop	r26         [2]
 6f4:	9f 91       	pop	r25         [2]
 6f6:	8f 91       	pop	r24         [2]
 6f8:	3f 91       	pop	r19         [2]
 6fa:	2f 91       	pop	r18         [2]
 6fc:	0f 90       	pop	r0          [2]
 6fe:	0f be       	out	0x3f, r0    [1]
 700:	0f 90       	pop	r0          [2]
 702:	1f 90       	pop	r1          [2]
 704:	18 95       	reti                [4]
wiring.c:57
		f -= FRACT_MAX;
 706:	26 e8       	ldi	r18, 0x86	; 134                          [1]
 708:	23 0f       	add	r18, r19                                       [1]
wiring.c:58
		m += 1; 
 70a:	02 96       	adiw	r24, 0x02	; 2                            [2]
 70c:	a1 1d       	adc	r26, r1                                        [1]
 70e:	b1 1d       	adc	r27, r1                                        [1]
 710:	d1 cf       	rjmp	.-94     	; 0x6b4 <__vector_16+0x3a>     [2]

In the worst case (when the if-branch is executed), one ends up with 53 execution cycles. One has to add 2 cycles from the CBI instruction (clearing the port bit), which makes it 55. Since we have 16 cycles per µs, this leads to 55/16=3.4375 µs, which is close enough to what I have measured. Remember for later, 2 of these cycles can be subtracted since they are caused by the measurement.

So what about the prologue and the epilogue? Adding those cycles up leads to another 43 cycles. And this is still not everything. The data sheet for the ATmega328 states under the heading Interrupt Response Time:

The interrupt execution response for all the enabled AVR interrupts is four clock cycles minimum. After four clock cycles the program vector address for the actual interrupt handling routine is executed. During this four clock cycle period, the Program Counter is pushed onto the Stack. The vector is normally a jump to the interrupt routine, and this jump takes three clock cycles. If an interrupt occurs during execution of a multi-cycle instruction, this instruction is completed before the interrupt is served.

ATmega48A/48PA/88A/88PA/168A/168PA/328/328PA Datasheet

In other words, we have another 4+3 cycles and if a 4-cycle instruction is executed when the interrupt happens, one may have to wait for another 3 cycles before the interrupt is served. In other words, we have in the worst case 10 cycles before the interrupt service routine starts to execute. Adding this all up leads to 53+43+10 = 106 cycles, which is equivalent to 6.625 µs when using a 16 MHz clock.

This is a very small amount of time when compared with the history of mankind. But it is a very large chunk of time when time-critical operations are performed. For instance, if we want to receive data serially over an UART line with 115200 bps, then each bit takes 8.68 µs. With the millis interrupt running, this will probably lead to missed bits.

So, what could be a solution in this case? If the timing is very tight, it is always an option to disable the millis interrupt: TIMSK0 = 0. If you do this, millis() and micros(), as well as the sibling functions delay() and delayMicroseconds() cannot be used anymore.

If you need to do timekeeping nevertheless and accuracy is not a requirement, you could employ the watchdog timer interrupt. Of course, you have to write an ISR, but this can be a very short one, simply incrementing a variable. If accuracy is of concern, you could use a hardware solution (i.e., an RTC) instead.

If you want to use delay routines with the millis interrupt disabled, then the ones in util/delay.h may be of help. _delay_ms and _delay_us are similar to delay and delayMicroseconds. However, they expect float arguments and these have to be constants.