The featured image of this post is by Albert Guillaume – Gil Blas, 24 décembre 1895, Public Domain.

When you develop a tool for an undocumented protocol, it is not surprising that you run into situations you did not anticipate. This is exactly what I experienced while developing the hardware debugger dw-link, which connects debugWIRE MCUs to the GDB debugger. Although a substantial part of the debugWIRE protocol has been reverse engineered, I still encountered plenty of surprises: split-personality MCUs, stuck-at-one bits in program counters, secret I/O addresses, half-legal opcodes, and more.

The debugWIRE protocol gives access to the on-chip debugging (OCD) features of some AVR MCUs. It uses the RESET line as an asynchronous single-wire connection. To enable it, one has to program a particular fuse (the DWEN fuse) using the ISP protocol and power-cycle the MCU afterwards. Getting the MCU back to its normal state involves leaving debugWIRE mode and then unprogramming the DWEN fuse.

Split personality MCUs

The type of an MCU can be determined by querying the device signature. There is a signature query command in ISP mode and one in debugWIRE mode. One would expect that these queries examine the same register and return the same result. Most of the time, they do that.

However, the ATmega48A, ATmega88A, ATmega168A, and ATmega328 that I own have split personalities. They correctly identify themselves when queried in ISP mode. In debugWIRE mode, however, they pose as P-types, i.e., as ATmega48PA, ATmega88PA, ATmega168PA, and ATmega328P, respectively. Of course, it does not make any difference for the debugger whether we have a 328 or a 328P. However, it is a bit confusing, and it actually confuses MPLAB X when debugging such an MCU.
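The mismatch can be detected by comparing the two query results. The following is only a sketch, not dw-link code: it assumes that the 16-bit debugWIRE signature corresponds to the two low bytes of the three-byte ISP signature, and it uses the commonly listed signature values for the ATmega328 (0x1E9514) and ATmega328P (0x1E950F). The helper names are made up for illustration.

```python
# Commonly listed ISP signatures for the 328 pair (assumption, not from dw-link).
ISP_SIGNATURES = {0x1E9514: "ATmega328", 0x1E950F: "ATmega328P"}


def signatures_agree(isp_signature: int, dw_signature: int) -> bool:
    """Compare the 24-bit ISP signature with the 16-bit debugWIRE signature.

    Assumes the debugWIRE value equals the two low bytes of the ISP value.
    """
    return (isp_signature & 0xFFFF) == (dw_signature & 0xFFFF)


def describe(isp_signature: int, dw_signature: int) -> str:
    """Return a short human-readable verdict for one MCU."""
    name = ISP_SIGNATURES.get(isp_signature, "unknown MCU")
    if signatures_agree(isp_signature, dw_signature):
        return f"{name}: signatures agree"
    return f"{name}: debugWIRE reports 0x{dw_signature:04X} (split personality)"


# An ATmega328 that poses as a 328P in debugWIRE mode:
print(describe(0x1E9514, 0x950F))
```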

Some MCUs have stuck-at-one bits in their program counter

When running unit tests for my debugger on all the debugWIRE MCUs sitting on my bench, I came across a few ATmega48 and ATmega88 chips (without the A-suffix), produced more than 10 years ago, which exhibit a very strange and undocumented behavior. While the more than 30 other MCU types passed the tests, including the siblings of these MCUs with an A-suffix and a PA-suffix, these MCUs failed. It turns out that the program counter of these MCUs has some unused bits that are stuck at one. Looking at what the Atmel-ICE does with these MCUs, it became clear that one mainly has to set the hardware breakpoint register to the right value and internally adjust the PC by ignoring the garbage bits.

Unfortunately, those MCUs also push the garbled PC onto the stack when they make a function call or service an interrupt. This is not a problem for the MCU, since on returning it ignores the garbage bits anyway. And it is not a problem for the MPLAB X debugger. However, GDB gets confused when trying to perform a stack backtrace. Worse, when single-stepping over a function, GDB inspects the stack in order to find out where to put a temporary breakpoint. Since the address is garbage, GDB will not continue. For these reasons, debugging these MCUs with GDB seems to be of limited value, and dw-link rejects them.
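Ignoring the garbage bits amounts to masking the PC down to the number of bits needed to address the flash. Here is a minimal sketch of that idea, assuming the stuck bits are exactly the ones above the highest address bit the flash needs (the text above does not say which bits are affected). The function names are illustrative, not taken from dw-link.

```python
def pc_mask(flash_bytes: int) -> int:
    """Mask that keeps only the PC bits needed to address the flash.

    AVR program memory is word-addressed, and flash sizes are powers of two,
    so the mask is simply (number of words - 1).
    """
    flash_words = flash_bytes // 2
    return flash_words - 1


def clean_pc(raw_pc: int, flash_bytes: int) -> int:
    """Drop the unused (possibly stuck-at-one) upper bits of a PC value."""
    return raw_pc & pc_mask(flash_bytes)


# ATmega48: 4 KB flash = 2048 words, so 11 significant PC bits.
print(hex(clean_pc(0xF123, 4 * 1024)))   # upper bits assumed stuck at one
# ATmega88: 8 KB flash = 4096 words, so 12 significant PC bits.
print(hex(clean_pc(0xF123, 8 * 1024)))
```

The debugger can apply such a mask to the PC it reads back. The return addresses on the stack, however, are read by GDB itself as ordinary memory, which is why the masking alone does not rescue backtraces.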

Note that these MCUs have the same device signature code as the ones with an A-suffix and the migration guide by Microchip on migrating from ATmegaX8 to ATmegaX8A does not say anything about this change of behavior. By the way, some older MCUs had the same peculiarity, e.g., the ATmega16. In this case, though, the data sheet contained a hint about that.

Half-legal opcodes

There are some opcodes that according to the official documentation do not have any meaning. In addition, some opcodes are only supported on some MCU architectures. For example, hardware multiplication is only supported on the ATmegas. Similarly, according to the official documentation, 32-bit jump and call instructions are only supported on MCUs that have more than 8 kB flash memory. On small MCUs, the 16-bit relative jump and call instructions are enough to reach each location in flash memory.

As it turns out, the 32-bit jump and call instructions also work on small MCUs, contrary to what the official documentation tells us. They are not very useful, since the shorter and faster relative jump and call instructions can reach every flash location, but the 32-bit instructions work nevertheless. It would probably have been more work to “disable” these instructions on small MCUs than to use the same hardware logic as on MCUs with larger flash memory. Why Atmel/Microchip tells its users that these instructions are not supported instead of saying that they are not useful, I have no idea.
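The claim that relative jumps reach every flash location on small MCUs can be checked with a few lines of arithmetic. The 16-bit-wide RJMP/RCALL instructions carry a 12-bit signed word offset k (-2048 to 2047), and the target is PC + k + 1. The sketch below assumes that the program counter wraps around at the end of flash on an 8 kB device; the names are illustrative.

```python
def reachable_targets(pc: int, flash_words: int) -> set[int]:
    """Word addresses reachable by one relative jump from pc.

    Assumes the PC wraps around modulo the flash size (in words).
    """
    return {(pc + 1 + k) % flash_words for k in range(-2048, 2048)}


FLASH_WORDS_8K = 8 * 1024 // 2   # 8 kB of flash = 4096 words

# From a few sample locations, check that every word address is reachable.
all_reachable = all(
    len(reachable_targets(pc, FLASH_WORDS_8K)) == FLASH_WORDS_8K
    for pc in (0, 1, 2047, 2048, 4095)
)
print(all_reachable)
```

With 4096 distinct offsets and 4096 words of flash, each offset lands on a different address, so nothing is out of reach. On a 16 kB device the same offsets would cover only half of the flash.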

There was a little bit of completely unfounded hope that hardware multiplication (or some part of it) might also work on ATtiny MCUs. However, as experiments showed, on ATtinys these opcodes are simply no-ops.

Two-word instructions at breakpoints

Breakpoints are implemented by inserting a BREAK instruction into the program code. When restarting from a breakpoint, one could replace the BREAK instruction with the original instruction, single-step, replace the original instruction with the BREAK again, and then continue. In order to minimize flash memory wear, one can instead execute the original instruction “offline” in a special debugWIRE instruction register.

That works with the ordinary one-word instructions quite well. For two-word instructions, the official Microchip documentation states that one should refrain from inserting breakpoints at these locations, implying that this could create problems. Indeed, RikusW noted in his reverse engineering notes about debugWIRE:

Seems that its not possible to execute a 32 bit instruction this way. The Dragon reflash the page to remove the SW BP, SS and then reflash again with the SW BP!!!

I noticed that this is still the case, i.e., MPLAB X together with Atmel-ICE still reprograms the page twice when restarting from a breakpoint at a two-word instruction.

Now, what happens if we load the first word of the two-word instruction into the instruction register, set the program counter to where the original instruction was located, and then call for an “offline” execution? It turns out, at least on the examples I tried, that the MCU does the only sensible thing, namely, loading the second word and executing the two-word instruction as if the first word had been stored at the original location. Since this is the most elegant solution, I was very much tempted to implement it in the hardware debugger. Since Atmel and Microchip had decided to implement the reflashing-twice solution, however, I suspected that there might be corner cases or MCUs for which the offline execution of two-word instructions does not work, and so I refrained from adopting this solution.

Instead, I decided to simulate the execution of these instructions in the hardware debugger, which is at least as fast and saves two reprogramming operations, thereby reducing flash wear. It is not clear to me why Atmel did not choose this solution.
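To illustrate what such a simulation involves, here is a rough sketch. It is not the dw-link implementation. The opcode patterns for the four two-word instructions (JMP, CALL, LDS, STS) follow the AVR instruction set manual; the Cpu class is a hypothetical stand-in for target state that a real debugger would read and write over debugWIRE, and the two-byte return address assumes an MCU with a 16-bit PC.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Hypothetical stand-in for the target state accessed via debugWIRE."""
    pc: int = 0                                   # word address
    sp: int = 0x04FF                              # example stack pointer
    regs: list = field(default_factory=lambda: [0] * 32)
    mem: dict = field(default_factory=dict)       # data space, byte-addressed


def is_two_word(opcode: int) -> bool:
    """True for the first word of JMP, CALL, LDS, or STS."""
    return (opcode & 0xFE0C) == 0x940C or (opcode & 0xFC0F) == 0x9000


def simulate_two_word(cpu: Cpu, opcode: int, operand: int) -> None:
    """Apply the effect of a two-word instruction located at cpu.pc."""
    reg = (opcode >> 4) & 0x1F                    # register field of LDS/STS
    if (opcode & 0xFE0F) == 0x9000:               # LDS Rd, k
        cpu.regs[reg] = cpu.mem.get(operand, 0)
        cpu.pc += 2
    elif (opcode & 0xFE0F) == 0x9200:             # STS k, Rr
        cpu.mem[operand] = cpu.regs[reg]
        cpu.pc += 2
    elif (opcode & 0xFE0C) == 0x940C:             # JMP k or CALL k
        high = ((opcode >> 3) & 0x3E) | (opcode & 1)   # upper address bits
        target = (high << 16) | operand
        if opcode & 0x0002:                       # CALL: push return address
            ret = cpu.pc + 2
            cpu.mem[cpu.sp] = ret & 0xFF          # low byte is pushed first
            cpu.mem[cpu.sp - 1] = (ret >> 8) & 0xFF
            cpu.sp -= 2
        cpu.pc = target
    else:
        raise ValueError("not a two-word instruction")


cpu = Cpu(pc=0x0010)
simulate_two_word(cpu, 0x940E, 0x0200)            # CALL 0x0200
print(hex(cpu.pc), hex(cpu.sp))
```

None of these instructions touches the status register, which is what makes simulating them in the debugger straightforward.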

Changing the DWEN fuse can be unsuccessful, sometimes

In order to change into the debugWIRE state, one needs to program the DWEN fuse in the high fuse byte. According to the datasheets, this is done by enabling ISP programming and then programming the fuse. This works indeed for almost all cases. However, I own an ATmega48 and an ATmega168 (no A-suffix), which both show an extremely strange behavior.

These two MCUs accept the fuse programming commands, but the fuse is unchanged afterwards. Interestingly, once the low fuse byte has been programmed, programming the high fuse byte always succeeds afterwards. This only changes when the MCUs are disconnected from the power supply (and all pins are shorted).

In order to mitigate this behavior, dw-link always programs the low fuse byte before the high fuse byte.
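The workaround can be sketched as follows. This is an illustration only: QuirkyTarget is a made-up model of the observed behavior rather than a real programmer API, and the DWEN bit mask is passed in as a parameter because the bit position depends on the MCU (0x40 in the example is an assumption for the ATmegaX8 family). A programmed fuse reads as 0.

```python
class QuirkyTarget:
    """Model of an MCU that ignores high-fuse writes until the low fuse
    has been written once since power-up (hypothetical stand-in)."""

    def __init__(self, lfuse: int, hfuse: int) -> None:
        self.lfuse, self.hfuse = lfuse, hfuse
        self._low_written = False

    def write_lfuse(self, value: int) -> None:
        self.lfuse = value
        self._low_written = True

    def write_hfuse(self, value: int) -> None:
        if self._low_written:          # otherwise the write is silently lost
            self.hfuse = value


def program_dwen(target: QuirkyTarget, dwen_mask: int) -> bool:
    """Program DWEN, writing the low fuse first; return True on success."""
    target.write_lfuse(target.lfuse)                  # rewrite unchanged value
    target.write_hfuse(target.hfuse & ~dwen_mask & 0xFF)
    return (target.hfuse & dwen_mask) == 0            # verify by reading back


mcu = QuirkyTarget(lfuse=0x62, hfuse=0xDF)
print(program_dwen(mcu, dwen_mask=0x40), hex(mcu.hfuse))
```

Rewriting the low fuse with its current value costs one extra ISP command and leaves the configuration unchanged, so it is harmless on MCUs that do not show the quirk.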

debugWIRE communication speed

When a debugWIRE session is started, the communication speed is MCU clock speed divided by 128. If the MCU uses a 16 MHz clock, communication speed is 125 kbps. If it runs on a 1 MHz clock, it is roughly 8 kbps. While 125 kbps is reasonably fast, 8 kbps leads to sluggish behavior of the debugger, in particular when single-stepping or loading a binary.
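The numbers above follow directly from the clock divisor. The small sketch below reproduces them and adds a rough lower bound for the transfer time, assuming 10 bits on the wire per byte (one start bit, eight data bits, one stop bit) and ignoring all protocol overhead. Both function names are illustrative.

```python
def debugwire_bps(f_cpu_hz: int, divisor: int = 128) -> float:
    """Communication speed in bits per second for a given MCU clock."""
    return f_cpu_hz / divisor


def raw_transfer_seconds(n_bytes: int, bps: float, bits_per_byte: int = 10) -> float:
    """Lower bound for moving n_bytes over the line (no protocol overhead)."""
    return n_bytes * bits_per_byte / bps


for f_cpu in (16_000_000, 1_000_000):
    bps = debugwire_bps(f_cpu)
    print(f_cpu, bps, raw_transfer_seconds(1024, bps))
```

At 1 MHz, merely pushing 1 KB across the wire takes well over a second before any command overhead is counted, which matches the sluggish feel when loading a binary.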

It is possible to change the communication speed, as documented by RikusW. However, it is a bit complicated, since after each break condition on the debugWIRE line, whether sent by the debugger or generated by the target, the communication speed is reset to its original value. To complicate matters even more, it is set to half the original speed when a break condition is used to stop the execution of a program on the MCU. And this happens only with my debugger, not with Atmel-ICE. I will try to find out what kind of magic Atmel-ICE uses to prevent that.

Atmel-ICE always sets communication speed to 250 kbps. I also tried that. However, there appeared to be the occasional communication error. So, dw-link uses only 125 kbps, under which it appears to be pretty stable.
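Picking a speed then boils down to choosing the smallest clock divisor that stays at or below the limit. The sketch below assumes that the selectable divisors are the powers of two from 1 to 128, which is my reading of the reverse-engineering notes and not something verified here. The cap of 125 kbps is the value mentioned above; the function name is made up.

```python
# Assumed set of selectable clock divisors (powers of two from 1 to 128).
DIVISORS = (1, 2, 4, 8, 16, 32, 64, 128)


def pick_divisor(f_cpu_hz: int, max_bps: int = 125_000) -> int:
    """Return the smallest divisor whose speed does not exceed max_bps.

    Falls back to the largest divisor if even that one is too fast.
    """
    for divisor in sorted(DIVISORS):
        if f_cpu_hz / divisor <= max_bps:
            return divisor
    return max(DIVISORS)


for f_cpu in (16_000_000, 8_000_000, 1_000_000, 128_000):
    d = pick_divisor(f_cpu)
    print(f_cpu, d, f_cpu / d)
```

Because the speed falls back after every break condition, such a choice would have to be reapplied each time the target stops.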

The I/O address of the DWDR

The debugWIRE data register (DWDR) is used to communicate with the environment. It is mentioned in the data sheet of every MCU that supports the debugWIRE interface. However, only in a few cases do the data sheets tell you at which I/O address you will find the DWDR. Sometimes it has a different name, as in the AT90PWM1 data sheet, in which it is called MONDR (short for monitor data register). Sometimes you will find it in the AVR I/O include files or in the ATDF description files that are shipped together with Microchip Studio and MPLAB X. For the ATtiny2313 and the ATtiny4313, I could not find anything, though. Fortunately, in the code of dwire-debug and DebugWireDebuggerProgrammer, the I/O address of the ATtiny2313 is given as 0x1F, and that works perfectly.

Usually, an MCU family will use the same I/O address for the DWDR, e.g., the ATmegaX8 family has the DWDR at I/O address 0x31 and the ATtinyX5 family uses 0x22. For this reason, after having determined the I/O address of the DWDR for the ATtiny2313, I thought the ATtiny4313 would use the same I/O address. As it turns out, the ATtiny4313 is the only exception among all AVR MCUs in that it uses a different I/O address for the DWDR than the rest of the MCUs in its family. It uses 0x27, which I found out by simply trying out different I/O addresses.
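In a debugger, this translates into a per-family table with per-device overrides. The following sketch uses only the four addresses mentioned above; the family key "ATtinyX313" and the function name are made up for illustration.

```python
# DWDR I/O addresses per MCU family (values from the text above).
DWDR_BY_FAMILY = {
    "ATmegaX8": 0x31,
    "ATtinyX5": 0x22,
    "ATtinyX313": 0x1F,     # hypothetical family key; value is the ATtiny2313's
}

# Devices that deviate from their family.
DWDR_BY_DEVICE = {
    "ATtiny4313": 0x27,
}


def dwdr_address(device: str, family: str) -> int:
    """Look up the DWDR I/O address, preferring a device-specific entry."""
    if device in DWDR_BY_DEVICE:
        return DWDR_BY_DEVICE[device]
    return DWDR_BY_FAMILY[family]


print(hex(dwdr_address("ATtiny2313", "ATtinyX313")))
print(hex(dwdr_address("ATtiny4313", "ATtinyX313")))
```

Checking the device table first keeps the exception in one place and leaves the family default untouched.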

What I have learned

Never assume that some piece of hardware or software will work on an AVR MCU simply because it works on a similar MCU. Just buy a sample and test. And never let yourself be fooled into believing that two MCUs behave identically (with respect to their essential functional specification) simply because they have the same device signature code.

On top of that, it is, of course, a really interesting challenge to deal with the undocumented features of an MCU family. What remains unclear to me, though, is why Atmel (and now Microchip) kept the debugWIRE protocol a secret.