Is any Input Available?

The featured image of this post is by rawpixel.com – de.freepik.com

The SoftwareSerial class has the available() method, which returns the number of characters that have already been received but not yet read. This is very similar to what the standard Serial.available() method offers. There is an interesting difference, though: a call to SoftwareSerial.available() is significantly slower than a call to Serial.available(). We will look for the deeper reason for this strange behavior, and I will show you three ways to fix it.

EDIT: The problem will vanish with Arduino version 1.8.17.

The observation

While I was working on a serial library that uses a single line for input and output, I noticed that a call to SoftwareSerial.available() takes a lot of time (from an MCU perspective). I measured the time and compared it to the standard method using the following sketch:

#include <SoftwareSerial.h>

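SoftwareSerial SoftSerial(8,9);   // RX = pin 8, TX = pin 9

void setup()
{
  SoftSerial.begin(19200);
  Serial.begin(19200);
  DDRC = 0x03;                    // PC0 and PC1 as outputs for the logic analyzer
}

void loop() {
  PORTC = 0x01;                   // PC0 high while Serial.available() runs
  Serial.available();
  PORTC = 0;
  PORTC = 0x02;                   // PC1 high while SoftSerial.available() runs
  SoftSerial.available();
  PORTC = 0;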
  delay(100);
}

Connecting a logic analyzer to the port pins PC0 and PC1 revealed the following: a call to the SoftwareSerial method is ten times slower than a call to the standard method!

Time for calling Serial.available() and SoftSerial.available()

The reason

Of course, this raises the question of whether this is unavoidable. Fortunately, the entire source code is available, so we can find out what is behind it. The implementation of the standard method can be found in HardwareSerial.cpp in the Arduino core directory:

int HardwareSerial::available(void)
{
  return ((unsigned int)(SERIAL_RX_BUFFER_SIZE +
                         _rx_buffer_head -
                         _rx_buffer_tail)) % SERIAL_RX_BUFFER_SIZE;
}

The implementation of the available method in the SoftwareSerial library looks very similar:

int SoftwareSerial::available()
{
  if (!isListening())
    return 0;

  return (_receive_buffer_tail + _SS_MAX_RX_BUFF -
          _receive_buffer_head) % _SS_MAX_RX_BUFF;
}

The main difference appears to be the additional if-statement with a call to the isListening() method. The body of this method contains just a single comparison and cannot be responsible for a tenfold slow-down, though. So, the remaining difference is the type cast to unsigned int present in the standard method and missing in the SoftwareSerial method. Could that make such a huge difference?

It can indeed. The generated assembly code shows that the modulo operation in the HardwareSerial class compiles to the following (the left operand of the modulo operation is already in r24:r25, and the right operand is the compile-time constant 64):

  andi r24, 0x3F ; 63
  eor r25, r25

This is pretty clever and works for all cases where the left operand is an unsigned integer and the right operand is a positive power of two. In contrast, the code generated for the SoftwareSerial class looks as follows:

  ldi r22, 0x40 ; 64
  ldi r23, 0x00 ; 0
  call 0xdae ; 0xdae <__divmodhi4>

So, here again, the left operand is already in register pair r24:r25. Then the number 64 is loaded into register pair r22:r23 and the modulo subroutine is called (note that AVRs do not have any hardware division instructions). The simple masking operation is not enough here because, by the C++ integer promotion rules, all summands are converted to int, so the whole expression is a signed value, and for a negative value the masking optimization would not yield the correct result. Doing integer division in software is costly, so the 15 µs this call takes is not a surprise. However, the call to the general integer modulo function is not necessary at all, since the left operand can never become negative.

A potential problem

Can this behavior lead to problems? Most of the time, one will probably never notice that the call to SoftwareSerial.available() consumes more time than necessary. However, when timing becomes tight, one may run into problems.

One problematic scenario could look as follows. The program receives bytes at 57600 bps, so a single bit takes 17.36 µs. The interrupt routine for receiving bytes is written such that it only returns once it is already inside the stop bit, which leaves less than 17.36 µs per received byte for the main program. If the interrupt routine returns too late, the remaining time is not enough to call available() and read() once for each received byte. In other words, after some time the receive buffer will probably overrun. To demonstrate this, I have written the following sketch:

#include <SoftwareSerial.h>

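SoftwareSerial mySerial = SoftwareSerial(8, 9);   // RX = pin 8, TX = pin 9

boolean available;
unsigned long sum = 0;

void setup()
{
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWrite(LED_BUILTIN, HIGH);   // LED on: test is running
  DDRC |= 0x03;                      // PC0/PC1 as logic analyzer outputs
  mySerial.begin(57600);

  // Read bytes until SoftwareSerial reports a receive buffer overflow
  while (!mySerial.overflow()) {
    PORTC = 0x01;                    // PC0 high during available()
    available = mySerial.available();
    PORTC = 0x00;
    if (available) {
      PORTC = 0x02;                  // PC1 high during read()
      sum += mySerial.read();
      PORTC = 0x00;
    }
  }
  digitalWrite(LED_BUILTIN, LOW);    // LED off: overflow detected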
}

void loop() { }

On the host side, a Python script (using the pyserial module) generates the bytes and sends them to the Arduino board through an FT232R interface:

#!/usr/bin/env python3
import serial

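serialport = '/dev/cu.usbserial-1410'   # FT232R adapter; adjust for your system

# Open the port once (57600 bps, 1 stop bit) and send '1' bytes back-to-back forever
ser = serial.Serial(serialport, 57600, stopbits=serial.STOPBITS_ONE)
while True:
    ser.write(b'1')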

After setting the compile-time constant _SS_MAX_RX_BUFF to 8 to force an early failure, and connecting the logic analyzer as described above, I got this picture after 11 bytes had been received:

Buffer overrun after 11 bytes have been received

It is interesting to see that the execution times of the available and read methods are stretched by the execution of the interrupt routines. In any case, it is easy to see that after 11 bytes have been received, we have a buffer overrun: only 3 bytes have been read so far, and a ring buffer of size 8 can hold at most 7 unread bytes.

If we reduce the communication speed to 38400 bps, the same problem occurs a few bytes later. At 19200 bps, no buffer overrun happens in our setup. Similarly, if we use 1.5 stop bits, there is no problem even at 57600 bps.

Three ways to fix the problem

I submitted a pull request fixing the problem on October 18, 2021, so hopefully this issue will go away soon. Until then, here are three possible ways to fix it.

  1. Instead of checking whether new input is available, you can call the read() method (which you would do anyway). If the return value is -1, no new byte has been received. Note, though, that read() returns an int (a signed two-byte value), so store the result in an int, not in a byte or char, before comparing it with -1!
  2. You can directly modify the available method in the SoftwareSerial class, adding the same (unsigned int) cast that HardwareSerial uses (provided you can find where the SoftwareSerial library is stored in your installation).
  3. You can define a new class MySoftwareSerial (perhaps in a new library), which inherits everything from the SoftwareSerial class, but overrides the available method.

With any of the proposed fixes, we can receive at 57600 bps. At 115200 bps, however, we still run into a buffer overrun very soon.

Summary

The available method of the standard SoftwareSerial library takes much more time than necessary, which might lead to buffer overruns when communicating at 38400 bps or faster. Apparently nobody has noticed this since the library was adopted as the standard library a couple of years ago. Hopefully, the issue will be solved soon by incorporating my pull request. Until then, you can use one of the three proposed fixes above.

EDIT: The pull request has been accepted, so the problem should not be present any longer in Arduino version 1.8.17 and later.

Categories: Hack, Insight, Library

2 Comments

  1. The key thing that caused the slowdown was the fact that the value in the parentheses (the one that’s being cast to unsigned int) would otherwise be an int. X % Y where Y is a power of 2 simplifies to X & (Y-1) when X is unsigned. The same is not true when it is signed.

    It doesn’t help nearly as much as what you found, but you can save another 8 bytes by casting that to a uint8_t instead of uint16_t. A few lines before that in read() you can save 6 bytes by casting the calculated position in the ring buffer before the % _RX_BUFFER_MA

    • Hi Spence,

      congrats on being the one who posted the first comment ever to my blog ;-).

      You are right! There are probably hundreds of places where the Arduino core could be streamlined to be more efficient, time- and space-wise. Currently, the IDE 2.0 seems to have higher priority, but we have now been waiting almost a year for a semi-final version. I have no idea what is going on.

Copyright © 2022 Arduino Craft Corner
