What is actually sent/loaded to a microcontroller / STM32?
That is a lot of questions...
So a simple, technically functional, STM32 program:
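.thumb
.globl _start
_start:
.word 0x20000100    @ vector 0: initial stack pointer value
.word reset         @ vector 1: address of the reset handler (linker fills it in)
.word 0x12345678    @ marker values, easy to spot in a hex dump
.word 0xAABBCCDD
.thumb_func         @ mark reset as Thumb code so its address gets bit 0 set
reset:
nop
nop
nop
b reset             @ loop forever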
Build it and then see what we see:
$ arm-none-eabi-as so.s -o so.o
$ arm-none-eabi-ld -Ttext=0x08000000 so.o -o so.elf
$ arm-none-eabi-objcopy so.elf -O srec so.srec
$ arm-none-eabi-objcopy so.elf -O ihex so.hex
$ arm-none-eabi-objdump -D so.elf > so.list
Actually, start with this one, the object file before linking:
$ arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <_start>:
0: 20000100 andcs r0, r0, r0, lsl #2
4: 00000000 andeq r0, r0, r0
8: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
c: aabbccdd bge feef3388 <reset+0xfeef3378>
00000010 <reset>:
10: 46c0 nop ; (mov r8, r8)
12: 46c0 nop ; (mov r8, r8)
14: 46c0 nop ; (mov r8, r8)
16: e7fb b.n 10 <reset>
The choice of the ELF file format is not arbitrary, but in some sense it is: there are other file formats that could be used, or you could invent a new one, but ELF is quite useful for many architectures. The GNU tools, at least for ARM, default to ELF, and as we see above the object files are in ELF format as well.
The assembler has converted the assembly language into machine code as best it can. The .word reset line is not an instruction; it is me asking for the address of the label reset to be placed there, because this is the vector table you need to boot the processor. The assembler does not know that address yet, which is why offset 0x4 shows 0x00000000 above. The linker fills in the externs and other gaps that the assembler/compiler doesn't know at compile time. So, after linking, we can see the output in the so.list file I created:
Disassembly of section .text:
08000000 <_start>:
8000000: 20000100 andcs r0, r0, r0, lsl #2
8000004: 08000011 stmdaeq r0, {r0, r4}
8000008: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
800000c: aabbccdd bge 6ef3388 <_stack+0x6e73388>
08000010 <reset>:
8000010: 46c0 nop ; (mov r8, r8)
8000012: 46c0 nop ; (mov r8, r8)
8000014: 46c0 nop ; (mov r8, r8)
8000016: e7fb b.n 8000010 <reset>
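To make the linker's job on that .word reset slot concrete: in the ARM ELF ABI this fixup is an R_ARM_ABS32 relocation, computed as (S + A) | T, where S is the symbol's final address, A is the addend already sitting in the slot (0 in the object dump), and T is 1 when the target is a Thumb function, which is what .thumb_func provides. Below is a minimal Python sketch of just that arithmetic with the numbers from this example; apply_abs32 is a hypothetical helper name, not something the toolchain exposes.

```python
import struct

def apply_abs32(section: bytearray, offset: int, symbol_addr: int, thumb_func: bool) -> int:
    """Patch a 32-bit little-endian slot the way an R_ARM_ABS32 fixup does: (S + A) | T."""
    # A: the addend is the value already stored in the slot (REL-style relocation).
    (addend,) = struct.unpack_from("<I", section, offset)
    value = ((symbol_addr + addend) | int(thumb_func)) & 0xFFFFFFFF
    struct.pack_into("<I", section, offset, value)
    return value

# First two words of the unlinked .text: the stack value and the empty slot for `reset`.
text = bytearray(struct.pack("<II", 0x20000100, 0))
text_base = 0x08000000           # from -Ttext=0x08000000
reset_addr = text_base + 0x10    # reset sits 0x10 bytes into .text (see the object dump)
print(hex(apply_abs32(text, 4, reset_addr, thumb_func=True)))  # 0x8000011
```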
When you read the documentation from ST and ARM you find that the STM32 family is, so far, based on ARM Cortex-M cores, and pretty much all of the flavors they make boot the same way. The 32-bit value at address zero in the processor's memory space is loaded into the stack pointer for you; nice feature, but I won't spend more time on it. The 32-bit word at address 0x00000004 in the processor's memory space is the address of the reset handler: when the processor comes out of reset, that address points at the code to run. Those are instructions, so machine code; the vector table itself is just vectors (addresses). For reasons I won't go into in detail (it marks the target as Thumb code, the only instruction set a Cortex-M runs), the LSB has to be a one, so for reset at 0x08000010 the vector is 0x08000011.
And you can see that the toolchain has done what I asked and put the machine code for those no-ops and the branch in there.
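A rough Python sketch of what the core does with the first two words, assuming the 24 bytes of this example are what sits at the start of the vector table. reset_state is a made-up name and this only models the two reads, not the hardware:

```python
import struct

# The 24 bytes of this example: vector table (4 words) followed by the Thumb code.
IMAGE = bytes.fromhex("00010020 11000008 78563412 ddccbbaa c046c046 c046fbe7")

def reset_state(table: bytes):
    """Return (initial SP, first instruction address) from a Cortex-M vector table."""
    initial_sp, reset_vector = struct.unpack_from("<II", table, 0)
    if not reset_vector & 1:
        # Bit 0 clear would request ARM state, which a Cortex-M does not have;
        # on real hardware the core faults instead of running the handler.
        raise ValueError(f"reset vector {reset_vector:#010x} lacks the Thumb bit")
    # Bit 0 is not part of the address: the fetch starts at the even address.
    return initial_sp, reset_vector & ~1

sp, pc = reset_state(IMAGE)
print(hex(sp), hex(pc))  # 0x20000100 0x8000010
```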
Now for the processor to do what we want we have to have all of these bytes in a place where the processor gets them when it fetches/reads those addresses.
When you read the documentation you find that, for the bootloader and perhaps other reasons, they can/will remap what is presented to the processor as memory at address 0x00000000. In the normal operating mode (booting from main flash), the flash at address 0x08000000 is mirrored at address 0x00000000. So if I put 0x08000011 at address 0x08000004, the processor also sees it at 0x00000004. After reset the processor finds that value in the vector table, starts fetching instructions from 0x08000010, and if we do everything right it finds our instructions and runs them.
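Here is a toy Python model of that mirroring, assuming a hypothetical 64 KiB part booted from main flash. Real parts differ in flash size and in how the alias window is sized, so FLASH_SIZE and BootFromFlashBus are illustrative only:

```python
import struct

FLASH_BASE = 0x08000000
FLASH_SIZE = 64 * 1024   # assumption: a hypothetical 64 KiB part

class BootFromFlashBus:
    """Toy bus: flash at FLASH_BASE, also visible at 0x00000000 (main-flash boot)."""

    def __init__(self, image: bytes):
        self.flash = bytearray(b"\xff" * FLASH_SIZE)   # erased flash reads as 0xFF
        self.flash[: len(image)] = image

    def read32(self, addr: int) -> int:
        if addr < FLASH_SIZE:                                  # alias window at 0x00000000
            offset = addr
        elif FLASH_BASE <= addr < FLASH_BASE + FLASH_SIZE:     # flash at its real address
            offset = addr - FLASH_BASE
        else:
            raise ValueError(f"nothing mapped at {addr:#010x} in this toy model")
        return struct.unpack_from("<I", self.flash, offset)[0]

IMAGE = bytes.fromhex("00010020 11000008 78563412 ddccbbaa c046c046 c046fbe7")
bus = BootFromFlashBus(IMAGE)
print(hex(bus.read32(0x00000004)), hex(bus.read32(0x08000004)))  # 0x8000011 0x8000011
```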
When you read the documentation for the chip you find that the flash is inside the part and there are a couple-three ways to program the flash, that is, to write bytes to it. One is a serial/UART route: you wire the boot pin(s) high or low, reset the part, and it goes into the built-in bootloader that ST puts in there (not ours). You then exchange formatted packets with the part, and with that protocol you can ask it to write data to certain addresses in the flash. So the vector table and the machine code are what we would need to write:
8000000: 20000100
8000004: 08000011
8000008: 12345678
800000c: aabbccdd
8000010: 46c0
8000012: 46c0
8000014: 46c0
8000016: e7fb
These are hex numbers, the address on the left, data on the right. THAT is what we need to transfer.
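Two details hide in that table: the Cortex-M is little-endian, so 0x20000100 lands in flash as the bytes 00 01 00 20, and the Thumb opcodes are 16-bit halfwords. The Python sketch below lays the table out as bytes and then wraps them the way I understand the Write Memory command (0x31) of ST's USART bootloader from application note AN3155: the command byte plus its complement, then the address MSB first plus an XOR checksum, then N-1, the data, and an XOR checksum. It only builds the frames; it does not open a serial port, wait for ACK/NACK, or erase the flash first (a separate command). to_image and write_memory_frames are hypothetical names, so check AN3155 for your part before relying on this.

```python
import struct

# The address/value table above: 32-bit vector-table words, then 16-bit Thumb opcodes.
TABLE = [
    (0x08000000, 0x20000100, 4),
    (0x08000004, 0x08000011, 4),
    (0x08000008, 0x12345678, 4),
    (0x0800000C, 0xAABBCCDD, 4),
    (0x08000010, 0x46C0, 2),
    (0x08000012, 0x46C0, 2),
    (0x08000014, 0x46C0, 2),
    (0x08000016, 0xE7FB, 2),
]

def to_image(table, base):
    """Lay the values out little-endian, contiguously, starting at `base`."""
    out = bytearray()
    for addr, value, size in table:
        if addr != base + len(out):
            raise ValueError(f"gap or overlap at {addr:#010x}")
        out += struct.pack("<I" if size == 4 else "<H", value)
    return bytes(out)

# Framing per my reading of ST AN3155. Not done here: the host first sends 0x7F so the
# bootloader can detect the baud rate, and after every frame it waits for ACK (0x79)
# or NACK (0x1F).
CMD_WRITE_MEMORY = 0x31

def xor_all(data: bytes) -> int:
    result = 0
    for b in data:
        result ^= b
    return result

def write_memory_frames(addr: int, data: bytes):
    """The three frames of one Write Memory command, in sending order."""
    if not 1 <= len(data) <= 256:
        raise ValueError("one Write Memory command carries 1..256 bytes")
    command = bytes([CMD_WRITE_MEMORY, CMD_WRITE_MEMORY ^ 0xFF])   # opcode + complement
    address = addr.to_bytes(4, "big")                               # MSB first
    payload = bytes([len(data) - 1]) + data                         # N-1, then the data
    return [command,
            address + bytes([xor_all(address)]),
            payload + bytes([xor_all(payload)])]

image = to_image(TABLE, 0x08000000)
for frame in write_memory_frames(0x08000000, image):
    print(frame.hex(" "))
```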
Some of the STM32s have a USB bootloader, and all have a JTAG-like debug interface, SWD, which is what you use with an ST-LINK and such. On a lot of development boards you are actually talking to another microcontroller, and that microcontroller is the one that uses SWD to talk to the MCU you are developing for.
Then you can wrap all kinds of host-side IDE software around these interfaces, for this part or for other on-board features, etc.
There are MANY files that qualify as "binary" files.
All of the ones I created above contain or describe the bytes and addresses we need to put into the flash. But as you saw above, when I disassembled the program in the ELF file, the labels _start and reset are in the output. How is that possible if they are not in any way used by the processor? Because these types of binaries also carry information like that for debugging, disassembling, or other similar reasons.
hexdump -C so.elf
00000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 28 00 01 00 00 00 00 00 00 08 34 00 00 00 |..(.........4...|
00000020 d0 01 01 00 00 02 00 05 34 00 20 00 01 00 28 00 |........4. ...(.|
00000030 06 00 05 00 01 00 00 00 00 00 01 00 00 00 00 08 |................|
00000040 00 00 00 08 18 00 00 00 18 00 00 00 05 00 00 00 |................|
00000050 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00010000 00 01 00 20 11 00 00 08 78 56 34 12 dd cc bb aa |... ....xV4.....|
00010010 c0 46 c0 46 c0 46 fb e7 41 13 00 00 00 61 65 61 |.F.F.F..A....aea|
00010020 62 69 00 01 09 00 00 00 06 02 09 01 00 00 00 00 |bi..............|
00010030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00010040 00 00 00 08 00 00 00 00 03 00 01 00 00 00 00 00 |................|
00010050 00 00 00 00 00 00 00 00 03 00 02 00 01 00 00 00 |................|
00010060 00 00 00 00 00 00 00 00 04 00 f1 ff 06 00 00 00 |................|
00010070 11 00 00 08 00 00 00 00 02 00 01 00 0c 00 00 00 |................|
00010080 00 00 00 08 00 00 00 00 00 00 01 00 0f 00 00 00 |................|
00010090 10 00 00 08 00 00 00 00 00 00 01 00 21 00 00 00 |............!...|
000100a0 18 00 01 08 00 00 00 00 10 00 01 00 12 00 00 00 |................|
000100b0 18 00 01 08 00 00 00 00 10 00 01 00 20 00 00 00 |............ ...|
000100c0 18 00 01 08 00 00 00 00 10 00 01 00 59 00 00 00 |............Y...|
000100d0 00 00 00 08 00 00 00 00 10 00 01 00 2c 00 00 00 |............,...|
000100e0 18 00 01 08 00 00 00 00 10 00 01 00 38 00 00 00 |............8...|
000100f0 18 00 01 08 00 00 00 00 10 00 01 00 40 00 00 00 |............@...|
00010100 18 00 01 08 00 00 00 00 10 00 01 00 47 00 00 00 |............G...|
00010110 18 00 01 08 00 00 00 00 10 00 01 00 4c 00 00 00 |............L...|
00010120 00 00 08 00 00 00 00 00 10 00 01 00 53 00 00 00 |............S...|
00010130 18 00 01 08 00 00 00 00 10 00 01 00 00 73 6f 2e |.............so.|
00010140 6f 00 72 65 73 65 74 00 24 64 00 24 74 00 5f 5f |o.reset.$d.$t.__|
00010150 62 73 73 5f 73 74 61 72 74 5f 5f 00 5f 5f 62 73 |bss_start__.__bs|
00010160 73 5f 65 6e 64 5f 5f 00 5f 5f 62 73 73 5f 73 74 |s_end__.__bss_st|
00010170 61 72 74 00 5f 5f 65 6e 64 5f 5f 00 5f 65 64 61 |art.__end__._eda|
00010180 74 61 00 5f 65 6e 64 00 5f 73 74 61 63 6b 00 5f |ta._end._stack._|
00010190 5f 64 61 74 61 5f 73 74 61 72 74 00 00 2e 73 79 |_data_start...sy|
000101a0 6d 74 61 62 00 2e 73 74 72 74 61 62 00 2e 73 68 |mtab..strtab..sh|
000101b0 73 74 72 74 61 62 00 2e 74 65 78 74 00 2e 41 52 |strtab..text..AR|
000101c0 4d 2e 61 74 74 72 69 62 75 74 65 73 00 00 00 00 |M.attributes....|
000101d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000101f0 00 00 00 00 00 00 00 00 1b 00 00 00 01 00 00 00 |................|
00010200 06 00 00 00 00 00 00 08 00 00 01 00 18 00 00 00 |................|
00010210 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 |................|
00010220 21 00 00 00 03 00 00 70 00 00 00 00 00 00 00 00 |!......p........|
00010230 18 00 01 00 14 00 00 00 00 00 00 00 00 00 00 00 |................|
00010240 01 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 |................|
00010250 00 00 00 00 00 00 00 00 2c 00 01 00 10 01 00 00 |........,.......|
00010260 04 00 00 00 07 00 00 00 04 00 00 00 10 00 00 00 |................|
00010270 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
00010280 3c 01 01 00 60 00 00 00 00 00 00 00 00 00 00 00 |<...`...........|
00010290 01 00 00 00 00 00 00 00 11 00 00 00 03 00 00 00 |................|
000102a0 00 00 00 00 00 00 00 00 9c 01 01 00 31 00 00 00 |............1...|
000102b0 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
You can see the strings _start and reset in the file. And if you look, you can see the 0x12345678 and 0xAABBCCDD values in there (stored little-endian, so 78 56 34 12 and dd cc bb aa), with the vector 0x08000011 and the machine code before and after them. I put those values in there to make them super easy to find:
00010000 00 01 00 20 11 00 00 08 78 56 34 12 dd cc bb aa |... ....xV4.....|
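Because the part is little-endian, searching a file for one of those marker words means searching for its bytes in reverse order. A small Python sketch; find_word is a made-up helper and the sample data is just the row quoted above:

```python
import struct

def find_word(blob: bytes, value: int) -> int:
    """Offset of a 32-bit little-endian value in `blob`, or -1 if absent."""
    return blob.find(struct.pack("<I", value))

# The hexdump row at file offset 0x10000 of so.elf.
ROW_OFFSET = 0x10000
row = bytes.fromhex("00 01 00 20 11 00 00 08 78 56 34 12 dd cc bb aa")
for marker in (0x08000011, 0x12345678, 0xAABBCCDD):
    print(hex(marker), "at file offset", hex(ROW_OFFSET + find_word(row, marker)))
```

Run against the whole file, find_word(open("so.elf", "rb").read(), 0x12345678) should land on 0x10008, the same place the hexdump shows.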
Intel HEX and Motorola S-record are/were competing formats from back in the day that are still used by some tools; you could carry these files around and a ROM programmer would burn their contents into the ROM. They are ASCII files like these:
cat so.hex
:020000040800F2
:10000000000100201100000878563412DDCCBBAA94
:08001000C046C046C046FBE7F4
:0400000508000000EF
:00000001FF
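Intel HEX records are simple enough to decode by hand: a colon, then byte count, 16-bit offset, record type, data, and a checksum chosen so all the bytes sum to zero mod 256. Type 04 supplies the upper 16 address bits (0x0800 here), type 05 the start address, and type 01 marks the end. A minimal Python sketch under those assumptions; it handles only the record types this file uses:

```python
def parse_ihex(lines):
    """Return ({absolute address: byte}, start address) from Intel HEX lines."""
    memory, upper, start = {}, 0, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"not an Intel HEX record: {line!r}")
        raw = bytes.fromhex(line[1:])
        # count, offset (2 bytes), type, data..., checksum; all bytes sum to 0 mod 256.
        if len(raw) < 5 or len(raw) != raw[0] + 5 or sum(raw) & 0xFF:
            raise ValueError(f"bad length or checksum: {line!r}")
        count, offset, rtype = raw[0], int.from_bytes(raw[1:3], "big"), raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:                          # data record
            for i, b in enumerate(data):
                memory[upper + offset + i] = b
        elif rtype == 0x01:                        # end of file
            break
        elif rtype == 0x04:                        # extended linear address (upper 16 bits)
            upper = int.from_bytes(data, "big") << 16
        elif rtype == 0x05:                        # start linear address (entry point)
            start = int.from_bytes(data, "big")
        else:
            raise ValueError(f"record type {rtype:#04x} not handled in this sketch")
    return memory, start

SO_HEX = """\
:020000040800F2
:10000000000100201100000878563412DDCCBBAA94
:08001000C046C046C046FBE7F4
:0400000508000000EF
:00000001FF
"""
mem, entry = parse_ihex(SO_HEX.splitlines())
print(hex(min(mem)), hex(max(mem)), hex(entry))  # 0x8000000 0x8000017 0x8000000
```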
cat so.srec
S00A0000736F2E7372656338
S31508000000000100201100000878563412DDCCBBAA86
S30D08000010C046C046C046FBE7E6
S70508000000F2
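With the S-record file, the S3 lines describe the address and data that need to go into the flash to run. The S0 line is a header (here just the file name, so.srec, in ASCII) and the S7 line ends the file and gives the start address, 0x08000000; that is additional information the processor does not need.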
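Decoding S-records by hand works much the same way: the byte count covers address, data, and checksum, the address width depends on the record type (S3 and S7 use 32-bit addresses), and the checksum is the ones' complement of the low byte of the sum. A Python sketch that only handles the record types needed here (S5/S6 count records are rejected):

```python
# Address field width in bytes for the record types this sketch accepts.
ADDR_LEN = {"0": 2, "1": 2, "2": 3, "3": 4, "7": 4, "8": 3, "9": 2}

def parse_srec(lines):
    """Return ({address: byte}, header bytes, start address) from S-record lines."""
    memory, header, start = {}, b"", None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if len(line) < 4 or line[0] != "S" or line[1] not in ADDR_LEN:
            raise ValueError(f"unsupported record: {line!r}")
        rtype, raw = line[1], bytes.fromhex(line[2:])
        # Count covers address + data + checksum; checksum is ~(sum of the rest) & 0xFF.
        if raw[0] != len(raw) - 1 or (~sum(raw[:-1]) & 0xFF) != raw[-1]:
            raise ValueError(f"bad count or checksum: {line!r}")
        n = ADDR_LEN[rtype]
        address = int.from_bytes(raw[1:1 + n], "big")
        data = raw[1 + n:-1]
        if rtype == "0":
            header = data                        # free-form header, here the file name
        elif rtype in ("1", "2", "3"):
            for i, b in enumerate(data):
                memory[address + i] = b
        else:                                    # S7/S8/S9: end of file + start address
            start = address
    return memory, header, start

SO_SREC = [
    "S00A0000736F2E7372656338",
    "S31508000000000100201100000878563412DDCCBBAA86",
    "S30D08000010C046C046C046FBE7E6",
    "S70508000000F2",
]
mem, header, start = parse_srec(SO_SREC)
print(header.decode(), hex(min(mem)), len(mem), hex(start))  # so.srec 0x8000000 24 0x8000000
```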
arm-none-eabi-objcopy so.elf -O binary so.bin
hexdump -C so.bin
00000000 00 01 00 20 11 00 00 08 78 56 34 12 dd cc bb aa |... ....xV4.....|
00000010 c0 46 c0 46 c0 46 fb e7 |.F.F.F..|
00000018
This form of binary file, with this toolchain, is a memory image that needs to go into the processor byte for byte. But there is no address, debugging, or other information like that in it; the user has to know where this data goes (here 0x08000000).
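To turn such a memory image back into something self-describing, the address has to come from you. The Python sketch below wraps a .bin in Intel HEX records given a base address the caller supplies; the names are mine, 16 data bytes per record is just a common choice, and since the .bin carries no entry point it emits no type 05 start record:

```python
def ihex_record(rtype: int, offset: int, data: bytes = b"") -> str:
    """One Intel HEX line: count, 16-bit offset, type, data, two's-complement checksum."""
    body = bytes([len(data)]) + offset.to_bytes(2, "big") + bytes([rtype]) + data
    return ":" + (body + bytes([-sum(body) & 0xFF])).hex().upper()

def bin_to_ihex(image: bytes, base: int, per_record: int = 16):
    """Wrap a raw memory image in Intel HEX; `base` is the address the user must supply."""
    lines, upper, pos = [], None, 0
    while pos < len(image):
        addr = base + pos
        if addr >> 16 != upper:                  # new 64 KiB window: emit a type 04 record
            upper = addr >> 16
            lines.append(ihex_record(0x04, 0, upper.to_bytes(2, "big")))
        # Keep each data record inside the current 64 KiB window.
        n = min(per_record, 0x10000 - (addr & 0xFFFF), len(image) - pos)
        lines.append(ihex_record(0x00, addr & 0xFFFF, image[pos:pos + n]))
        pos += n
    lines.append(ihex_record(0x01, 0))           # end of file
    return lines

so_bin = bytes.fromhex("000100201100000878563412ddccbbaac046c046c046fbe7")
print("\n".join(bin_to_ihex(so_bin, 0x08000000)))
```

With so.bin and 0x08000000 this reproduces the data records of so.hex shown earlier, just without the :04...05 start-address line.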
These kinds of things are true for all processors; the details are specific to each chip/core: how it starts, whether it has a vector table, the machine code itself, and, if the memory is in the part, how to get the code in there. If the memory is outside the part, the documentation describes the buses so that you can interface the part to some memory or your own logic, so that when it reads/fetches the instructions at some address you provide the bytes from that address.
A very flexible file format like ELF lets you carry around the "binary" (the program) plus debug and other info, and then use tools like objcopy to convert it to file formats that other tools, which don't support ELF, might use. The ELF file format is something you can google, and it is pretty simple: you don't even need a library, just read the file.
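As an example of "just read the file": the program headers tell a loader or flash tool which bytes go where. This Python sketch handles only 32-bit little-endian ELF and walks the PT_LOAD entries using the standard ELF32 header offsets; load_segments is a hypothetical name, and a real tool would also validate e_machine, sizes, and so on:

```python
import struct

PT_LOAD = 1

def load_segments(elf: bytes):
    """Yield (physical address, bytes) for each PT_LOAD segment of a 32-bit LE ELF."""
    if elf[:4] != b"\x7fELF" or elf[4] != 1 or elf[5] != 1:
        raise ValueError("this sketch only handles 32-bit little-endian ELF")
    # Program header table location and shape, from the ELF32 file header.
    (e_phoff,) = struct.unpack_from("<I", elf, 0x1C)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x2A)
    for i in range(e_phnum):
        (p_type, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_flags, p_align) = struct.unpack_from(
            "<8I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_filesz:
            # p_paddr is where the bytes must end up (flash here); p_offset is in the file.
            yield p_paddr, elf[p_offset:p_offset + p_filesz]

if __name__ == "__main__":
    with open("so.elf", "rb") as f:
        for addr, data in load_segments(f.read()):
            print(hex(addr), data.hex(" "))
```

For the so.elf above the program header (at file offset 0x34 in the hexdump) describes a single segment: 0x18 bytes from file offset 0x10000, to be placed at 0x08000000.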
Not all MCUs have a bootloader built in; some only have one way in. Some folks develop their own bootloaders that use other interfaces, or the same interfaces in different ways; you can create (and I have created) a bootloader for the STM32 that works over the UART using a different protocol than the built-in one.
With the newer fear of security and "secure boot", the newer STM32s may not let you use the bootloader even though it is in the part; you have to unlock it through another way in before you can use it.
The STM32s only have a couple-three ways in, so all the tools you find are coming in through one of those interfaces, and/or the product has its own bootloader and you are coming in through that (think AVRs on Arduinos).