random kludges

notes for my future self

ARM ITM

The open source SWO/ITM/tracing scene has just exploded.

Every few years I run into the same frustration: I need to automate or repeat the capture of trace data from a target MCU, with command line and open source tools, and without printf. I do a sweep of the internet and find nothing that ticks both boxes. But now, all of a sudden, it feels like there is an abundance of tools in this niche!

The example code in this post again uses LibOpenCM3 and the STM32F411-Discovery board, plus my new favourite: orbuculum.

ITM printf

The ‘hello world’ of ITM.

Target source code

First, in the target source code, redirect file output to ITM by implementing the _write callback:

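extern "C" {
#include <libopencm3/cm3/itm.h>

/* Called by the C library for every write(), and therefore by printf(). */
int _write(int file, char *buf, int len)
{
	(void)file;	/* every file descriptor goes to stimulus port 0 */
	int i;
	for (i = 0; i < len; i++)
	{
		/* The port reads as 0 while its FIFO cannot accept data. */
		while (ITM_STIM32(0) == 0);
		ITM_STIM8(0) = buf[i];
	}
	return i;
}
}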
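A possible refinement, sketched for the host rather than the target: CMSIS's ITM_SendChar checks that the ITM and the port are enabled before it waits on the FIFO, presumably so that firmware does not spin forever when no debugger has switched tracing on. The FakeItm type and itm_write name below are made up for illustration; on target the three queries would be reads of ITM_TCR, ITM_TER and the stimulus port.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical host-side stand-in for the ITM registers.
struct FakeItm {
    bool itmena = false;   // models the ITMENA bit of ITM_TCR
    uint32_t ter = 0;      // models ITM_TER, one enable bit per port
    std::string sent;      // bytes "sent" through the stimulus port

    bool enabled() const { return itmena; }
    bool port_enabled(unsigned port) const { return ((ter >> port) & 1u) != 0; }
    bool fifo_ready(unsigned) const { return true; }  // never busy in this fake
    void put8(unsigned, uint8_t b) { sent.push_back(static_cast<char>(b)); }
};

// Same loop as _write, but it returns early when tracing is off
// instead of waiting on a port that nobody has enabled.
template <typename Itm>
int itm_write(Itm &itm, unsigned port, const char *buf, int len)
{
    assert(buf != nullptr && len >= 0 && port < 32);
    if (!itm.enabled() || !itm.port_enabled(port))
        return len;        // report success, drop the data
    for (int i = 0; i < len; i++) {
        while (!itm.fifo_ready(port))
            ;              // wait for room in the stimulus FIFO
        itm.put8(port, static_cast<uint8_t>(buf[i]));
    }
    return len;
}
```

Whether dropping the data silently is the right choice depends on the project; returning an error from _write would be the other obvious option.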

OpenOCD conf

The ITM module can be enabled directly in the target software, or via the debugger probe configuration:

source [find interface/stlink-v2.cfg]
transport select hla_swd
adapter_khz 4000
source [find target/stm32f4x.cfg]
tpiu config internal itm.data uart off 84000000
itm port 0 on
init
reset run

Here itm port 0 on enables stimulus port 0, the one the ITM_STIM* macros in the source above write to.

The TPIU clock is the ARM (STM?) “TRACECLK”, which on this device is the same as HCLK, i.e. the AHB1 clock, i.e. the main CPU clock. Getting it wrong corrupts the decoding of the ITM datastream. Getting it almost right seems to be good enough: using 82MHz instead of 84MHz does not seem to corrupt the data. (Perhaps this is just because the printf data stream is slow in comparison?)
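One possible explanation for the 82MHz observation, sketched as arithmetic: with the uart pin protocol the SWO bit rate is the trace clock divided by an integer prescaler, so a wrong clock figure shifts the real baud rate by about the same ratio. The 2 MHz baud rate used in the note below and the round-to-nearest rule are assumptions for illustration, not necessarily what OpenOCD picks for this adapter, and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Divider a debugger could program for the wanted baud rate, based on the
// clock it was told about (the TPIU prescaler register holds divider - 1).
uint32_t swo_divider(uint32_t declared_clk_hz, uint32_t wanted_baud)
{
    assert(wanted_baud > 0);
    uint32_t div = (declared_clk_hz + wanted_baud / 2) / wanted_baud;
    return div ? div : 1;
}

// Baud rate the pin really runs at, given the real trace clock.
double actual_baud(uint32_t real_clk_hz, uint32_t divider)
{
    assert(divider > 0);
    return static_cast<double>(real_clk_hz) / divider;
}

// Relative mismatch between the baud rate the probe samples at and the
// baud rate the target actually transmits at.
double baud_error(uint32_t real_clk_hz, uint32_t declared_clk_hz,
                  uint32_t wanted_baud)
{
    uint32_t div = swo_divider(declared_clk_hz, wanted_baud);
    return actual_baud(real_clk_hz, div) / wanted_baud - 1.0;
}
```

With a real clock of 84 MHz, a declared clock of 82 MHz and an assumed 2 MHz baud rate, baud_error(84000000, 82000000, 2000000) comes out at roughly 2.4%. A mismatch of that size is often small enough for asynchronous serial framing to still decode, which would fit the observation.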

Assuming the target device has already been flashed with the ITM-printf-enabled source code, run openocd -f <the above openocd conf>. This will dump the ITM datastream to the file itm.data.
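To get a feel for what ends up in itm.data, here is a minimal sketch of a decoder that pulls out the payload of one stimulus port. It follows the ITM packet layout as described in the ARMv7-M Architecture Reference Manual (low two header bits give the payload size, bit 2 separates software from hardware packets, the upper five bits are the port) and simply skips sync, overflow and timestamp packets. The function name is made up, and a real decoder such as orbuculum handles far more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Collect the payload bytes of software (stimulus port) packets for one
// port from a raw ITM byte stream, ignoring everything else.
std::string itm_port_bytes(const std::vector<uint8_t> &raw, unsigned port)
{
    assert(port < 32);                    // ITM has 32 stimulus ports
    std::string out;
    std::size_t i = 0;
    while (i < raw.size()) {
        uint8_t hdr = raw[i++];
        unsigned ss = hdr & 0x03u;
        if (ss != 0) {
            // Source packet: 1, 2 or 4 payload bytes follow the header.
            std::size_t size = std::size_t{1} << (ss - 1);
            if (i + size > raw.size())
                break;                    // truncated packet at end of data
            bool software = (hdr & 0x04u) == 0;
            if (software && static_cast<unsigned>(hdr >> 3) == port)
                for (std::size_t k = 0; k < size; k++)
                    out.push_back(static_cast<char>(raw[i + k]));
            i += size;
        } else if (hdr == 0x00) {
            // Sync packet: a run of zero bytes terminated by 0x80.
            while (i < raw.size() && raw[i] == 0x00)
                i++;
            if (i < raw.size() && raw[i] == 0x80)
                i++;
        } else if (hdr & 0x80u) {
            // Timestamp or extension packet: skip the continuation bytes,
            // the last of which has bit 7 clear.
            while (i < raw.size() && (raw[i] & 0x80u))
                i++;
            if (i < raw.size())
                i++;
        }
        // Anything else is a single-byte packet (e.g. overflow, 0x70).
    }
    return out;
}
```

For the printf example only port 0 matters, so itm_port_bytes(data, 0) should give back the printed text, assuming the capture starts on a packet boundary.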

Orbuculum

After starting openocd, run:

$ orbuculum -b outputs/ -c 0,out,"%c" -f path/to/itm.data

Now outputs/out is a FIFO carrying the printf output from the target. Read it with:

$ cat outputs/out

Datadumping

printf() is nice, but slow. Since ITM is just a data pipe, data can be dumped directly without formatting. The problem is that in raw data nothing marks the beginning and end of e.g. an array. To mark them, use a second stream for synchronization:

	for (auto a: buffer) {
		while (ITM_STIM32(1) == 0);
		ITM_STIM16(1) = a;
	}
	while (ITM_STIM32(2) == 0);
	ITM_STIM8(2) = '\n';

Remember to enable the new streams, e.g. in the openocd.cfg:

...
itm port 0 on
itm port 1 on
itm port 2 on
...

Orbuculum runs the channel processing in separate threads, so mixing two channels into one output does not work. Instead of printing with orbuculum, use orbcat.

Start orbuculum. It automatically serves the ITM data over TCP to any client that connects:

$ orbuculum -f path/to/itm.data

Then start orbcat (which automatically connects to that TCP socket in orbuculum):

$ ./ofiles/orbcat  -c1,"%5d " -c2,"%c"
    0   235   143   126    92    85   189    69 
    0   129    29    82   136   207    96    58 
...

Here orbcat prints channel 1 as a 5-character wide number (and a space), and channel 2 as characters, all mixed into one output. So the contents of buffer in the above example are printed on one line.
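The mixing can be sketched on the host like this: decoded events are formatted in arrival order, channel 1 with "%5d " and channel 2 with "%c", so the newline written to channel 2 ends the row. This only illustrates the idea and is not how orbcat is implemented; ItmEvent and mix_channels are made-up names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// One decoded stimulus port write: which port, and the value written.
struct ItmEvent {
    unsigned port;
    uint32_t value;
};

// Format events in arrival order: channel 1 as "%5d ", channel 2 as "%c".
// Events on other channels are ignored.
std::string mix_channels(const std::vector<ItmEvent> &events)
{
    std::string out;
    char tmp[16];
    for (const ItmEvent &e : events) {
        if (e.port == 1) {
            int n = std::snprintf(tmp, sizeof tmp, "%5d ",
                                  static_cast<int>(e.value));
            assert(n > 0 && n < static_cast<int>(sizeof tmp));
            out += tmp;
        } else if (e.port == 2) {
            out += static_cast<char>(e.value);
        }
    }
    return out;
}
```

Because the delimiter travels in the same ordered stream as the data, the row boundaries survive only as long as both channels are processed by the same consumer in arrival order.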

Memory tracing

Once the above is configured correctly, memory can be traced with the DWT (Data Watchpoint and Trace unit).

Figure out the memory address of the variable to trace, and add it to the openocd.cfg. Reminder: don’t take the address of a variable on the stack and trace that. There will be noise in the output :) Unless of course the variable is on the stack of main().

...
mww 0xE0001020 0x2001ffdc
mww 0xE0001028 0x0000000d
init
reset run

This writes DWT_COMP0 with the memory address to watch, and DWT_FUNC0 with instructions to create a hardware event whenever that address is written to.
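The same two writes could also be done from the target code. A minimal sketch that builds the register/value pairs so it can be checked on a host; the addresses and the 0xd function value are the ones from the mww lines, everything else (names, the alignment check) is an assumption of the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint32_t DWT_COMP0_ADDR          = 0xE0001020;  // address to watch
constexpr uint32_t DWT_FUNCTION0_ADDR      = 0xE0001028;  // what to do on a match
constexpr uint32_t DWT_FUNC_TRACE_ON_WRITE = 0x0000000d;  // value used in the conf

// One 32-bit register write: where, and what.
struct RegWrite {
    uint32_t reg;
    uint32_t value;
};

// Register writes that arm comparator 0 for writes to watched_addr.
// Assumes a word-sized, word-aligned variable (assumption of this sketch).
std::array<RegWrite, 2> dwt_watch_writes(uint32_t watched_addr)
{
    assert((watched_addr & 0x3u) == 0);
    return {{
        {DWT_COMP0_ADDR, watched_addr},
        {DWT_FUNCTION0_ADDR, DWT_FUNC_TRACE_ON_WRITE},
    }};
}
```

On target, each pair would be applied with something like MMIO32(w.reg) = w.value from libopencm3, after the debugger (or the firmware) has enabled trace.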

(More info on where the DWT registers are in memory is in the Cortex-M4 Technical Reference Manual. More info on the contents of the DWT registers is in the ARMv7-M Architecture Reference Manual.)

Orbuculum will dump this info into a fifo named hwevent, if the -b option (i.e. the directory to create the fifos in) is given.

Notes

Much more can be done with orbuculum - the links above include examples of loading an execution trace into kcachegrind, or plotting with graphviz. Or orbtop for a top-like monitor of what is executing on the target.

Also, orbuculum seems to be under heavy development. I guess there will be nice new features available next time I read this :)

All the registers can be written via OpenOCD, on the target, or via gdb. Check out the gdb macros that come with the orbuculum sources!
