Hardening STM32 code
A few weeks ago I went to an STM32 training day. These trainings target people with embedded & microcontroller experience who want to discover the STM32 chips & development tools. After the course you have a good idea of the capabilities & roadmap of the full STM32 product range. More interesting, you know how to set up the full development suite & flash your first design to a Nucleo-64 board. Nice for a hobbyist like me: a really cool day and … it’s free!
But at the end of the day I had an interesting chat with one of the trainers about the STM32 security features. ST’s point of view is “this is not a secure target, and you should not rely on STM32 for critical hardware. We have various other secure chips”. OK, fair point for high-end products with secrets to protect from billion-dollar laboratories. But in my view, low-end products deserve a minimum of care regarding security. There is no need to deliberately sell insecure stuff. The coming IoT (Internet Of shiT) is a good example of why we should care …
Let’s see how we can harden the code generated for the STM32 to keep attackers out.
The dev kit:
The board is an official NUCLEO-64 kit; the CPU is an STM32L476 (details HERE).
The tool used to compile the design is System Workbench for MCU, from AC6 (details HERE):
The bad design:
Here is the “Hello World!” of buffer overflows: LED-blinking spaghetti code from a trainee, with an unchecked string buffer boundary. ARM keeps the return address of a call in the link register (LR) instead of pushing it on the stack, so one level of call is not enough. You need 2 nested function calls to get a return address pushed on the stack.
The result is a blinking LED (GPIO_PIN_5) with a long period of:
Exploiting the bad code:
The bad thing happens in PigCode2: if more than 60 bytes are written to bufferTmp, the write starts smashing the stack and you can take control of the PC. I won’t detail this step; Google it if needed (link HERE).
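The trainee’s listing is not reproduced in this text. The sketch below shows the length check that PigCode2 lacks, assuming bufferTmp is a 60-byte char array filled from a NUL-terminated string. BUFFER_TMP_SIZE and copy_bounded are hypothetical names. The idea is to truncate the copy to the buffer size, so it can never run past the buffer into the saved registers and return address.

```c
#include <stddef.h>
#include <string.h>

#define BUFFER_TMP_SIZE 60u  /* assumed size of bufferTmp (hypothetical name) */

/* Hypothetical bounded replacement for an unchecked strcpy into bufferTmp.
   Copies at most dst_size - 1 characters and always NUL-terminates, so the
   write never reaches the saved R4 / return address above the buffer.
   Returns the number of characters actually copied. */
static size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0u) {
        return 0u;
    }
    size_t len = strlen(src);
    if (len > dst_size - 1u) {
        len = dst_size - 1u;  /* truncate instead of overflowing */
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

With such a check, a 68-byte crafted string is cut to 59 characters plus the terminator, and the stack frame stays intact.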
The rogue code blinks the LED too, but with a period of 2x291 ms, slightly faster. It runs from the stack and calls back into the chip’s HAL functions HAL_GPIO_TogglePin (0x0800055A) and HAL_Delay (0x080002A0).
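The two HAL addresses and the 291 ms delay never appear verbatim in the payload bytes listed further down. What follows is an interpretation of those bytes, decoded as Thumb-2 instructions; it is not taken from the ASM screenshot.

The decoding reads MOVW/MOVT pairs that load R6 = 0xB6EF055B and 0xBEEF into the upper half of R7. EOR R6, R6, R7 then gives 0x0800055B, which is HAL_GPIO_TogglePin with the Thumb bit set. The same pattern gives 0x080002A1 for HAL_Delay, and a MOVW R0, #0x123 supplies the 291 ms.

One plausible motive for the XOR is that 0x0800055B contains a 0x00 byte, which would cut a string copy short. The sketch checks that arithmetic on a PC. It assumes R7’s lower half is zero when the payload runs, which the text does not show. halfword_le, thumb2_mov_imm16 and xor_target are hypothetical helper names.

```c
#include <stdint.h>

/* Read one little-endian 16-bit Thumb halfword from raw payload bytes. */
static uint16_t halfword_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Extract the 16-bit immediate of a Thumb-2 MOVW (T3) or MOVT (T1).
   hw1 = 11110 i 10x100 imm4, hw2 = 0 imm3 Rd imm8, imm16 = imm4:i:imm3:imm8. */
static uint16_t thumb2_mov_imm16(uint16_t hw1, uint16_t hw2)
{
    uint16_t imm4 = hw1 & 0xFu;
    uint16_t i    = (hw1 >> 10) & 0x1u;
    uint16_t imm3 = (hw2 >> 12) & 0x7u;
    uint16_t imm8 = hw2 & 0xFFu;
    return (uint16_t)((imm4 << 12) | (i << 11) | (imm3 << 8) | imm8);
}

/* Rebuild a call target: R6 = MOVT:MOVW, R7 = MOVT:0 (assumption: R7[15:0] == 0),
   then EOR R6, R6, R7 as the payload does before BLX R6. */
static uint32_t xor_target(uint16_t r6_low, uint16_t r6_high, uint16_t r7_high)
{
    uint32_t r6 = ((uint32_t)r6_high << 16) | r6_low;
    uint32_t r7 = (uint32_t)r7_high << 16;
    return r6 ^ r7;
}
```

For example, the bytes 40 F2 5B 56 decode to MOVW R6, #0x55B, and 40 F2 23 10 decode to MOVW R0, #0x123 (291 decimal).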
The ARM assembly (in Thumb mode! thanks Polymorf 🙂) is:
The stack & ASM look like this:
sp = 0x20017fc8, and the rogue code is at 0x20017f8c.
The payload, including the stack smashing, is 60+4+4 bytes:
\xA8\xB0\x20\x21\x4F\xF0\x90\x40\x40\xF2\x5B\x56\xCB\xF2\xEF\x66
\xCB\xF6\xEF\x67\x86\xEA\x07\x06\xB0\x47\x40\xF2\x23\x10\x40\xF2
\xA1\x26\xCB\xF2\xEF\x66\xCB\xF6\xEF\x67\x86\xEA\x07\x06\xB0\x47
\xE7\xE7!!!!!!!!AARRRR\x8D\x7F\x01\x20
Focus on the last bytes (the return address is 0x20017f8c with the Thumb bit set, i.e. 0x20017f8d, stored little-endian):
...ARM ASM end][PAD 10 B][R4][addr. of injected code]
...\x47\xE7\xE7!!!!!!!!AARRRR\x8D\x7F\x01\x20
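The layout arithmetic can be checked with a small host-side sketch. The payload is 50 bytes of Thumb code, then padding up to the end of the 60-byte bufferTmp, then 4 bytes for the saved R4, then the return address. That address is the rogue code’s address with bit 0 set, because a Cortex-M only executes Thumb code and a value popped into the PC with bit 0 clear causes a fault. The Cortex-M is little-endian, hence the byte order.

build_payload is a hypothetical name. The padding is all “!” here, while the original uses “!!!!!!!!AA”; any non-zero bytes would do, assuming the overflow goes through a string copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE    60u  /* bufferTmp */
#define SAVED_R4    4u   /* saved R4 slot, overwritten with "RRRR" */
#define RET_ADDR    4u   /* saved return address */
#define PAYLOAD_LEN (BUF_SIZE + SAVED_R4 + RET_ADDR)  /* 60 + 4 + 4 = 68 */

/* Hypothetical helper: code + padding + fake R4 + Thumb return address.
   Returns 0 on success, -1 if the code does not fit in the buffer. */
static int build_payload(uint8_t out[PAYLOAD_LEN],
                         const uint8_t *code, size_t code_len,
                         uint32_t code_addr)
{
    if (code_len > BUF_SIZE) {
        return -1;
    }
    memcpy(out, code, code_len);
    /* Pad to the end of bufferTmp with non-zero bytes (no 0x00 terminator). */
    memset(out + code_len, '!', BUF_SIZE - code_len);
    memset(out + BUF_SIZE, 'R', SAVED_R4);  /* value popped into R4 */
    uint32_t ret = code_addr | 1u;          /* set the Thumb bit */
    for (size_t k = 0; k < RET_ADDR; k++) {
        /* little-endian: least significant byte first */
        out[BUF_SIZE + SAVED_R4 + k] = (uint8_t)(ret >> (8u * k));
    }
    return 0;
}
```

With code_addr = 0x20017f8c, the last four bytes come out as \x8D\x7F\x01\x20, matching the payload above.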
C code:
If you run the original code with this crafted string, the LED blinks faster: the code runs fine, but from the stack. The original code no longer controls the CPU.
Note: this simple POC cannot receive the payload from outside the CPU; it is stored in flash. The result would be the same if the payload were sent by another device connected to the STM32.
Let me introduce the XN bit!
The x86 architecture has an interesting feature, the NX bit. It gives fine-grained control over memory pages and, for example, lets you forbid code execution from the stack. A few minutes of Google point you to the ARM equivalent, the XN (“eXecute Never”) bit. On Cortex-M, XN is set per memory region through the memory protection unit (MPU). For the STM32, ST wrote application note AN4838 (link HERE).
According to this document, the SRAM zone starts at 0x20000000 and ends at 0x3FFFFFFF (512 MB of address space), of which the STM32L476 uses 128 KB. This memory zone should not contain code and must be protected against code execution. The following code does the job:
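The original listing is not reproduced in this text. Here is a sketch of the register values involved, assuming the Armv7-M MPU layout (RBAR/RASR) and a single region 0 covering 0x20000000 to 0x3FFFFFFF.

- The region is 2^29 bytes, so the RASR SIZE field is 29 - 1 = 28.
- XN is bit 28.
- AP = 0b011 keeps full read/write access.

The C/B attributes are an assumption; they matter little on a Cortex-M4 core, which has no data cache. The helper names are hypothetical.

On the target, the values go to the CMSIS MPU registers. The MPU is then enabled with PRIVDEFENA, so the default memory map stays valid outside region 0. Only the arithmetic part runs on a PC.

```c
#include <stdint.h>

/* Armv7-M MPU fields (Cortex-M4). */
#define RBAR_VALID  (1u << 4)   /* take the region number from RBAR[3:0] */
#define RASR_XN     (1u << 28)  /* eXecute Never */
#define RASR_AP_RW  (3u << 24)  /* AP = 0b011: full read/write access */
#define RASR_C      (1u << 17)  /* cacheable (assumed attribute) */
#define RASR_B      (1u << 16)  /* bufferable (assumed attribute) */
#define RASR_ENABLE 1u

#define SRAM_ZONE_BASE 0x20000000u  /* start of the SRAM zone */
#define SRAM_ZONE_LOG2 29u          /* 0x20000000..0x3FFFFFFF = 2^29 bytes */

/* RBAR value: region base address + VALID + region number. */
static uint32_t mpu_rbar(uint32_t base, uint32_t region)
{
    return (base & ~0x1Fu) | RBAR_VALID | (region & 0xFu);
}

/* RASR value: read/write, never-execute region of 2^log2_size bytes. */
static uint32_t mpu_rasr_no_exec(uint32_t log2_size)
{
    return RASR_XN | RASR_AP_RW | RASR_C | RASR_B
         | ((log2_size - 1u) << 1)  /* SIZE field: region = 2^(SIZE+1) */
         | RASR_ENABLE;
}

/* The base must be a multiple of the region size (32 B .. 4 GB). */
static int mpu_region_aligned(uint32_t base, uint32_t log2_size)
{
    if (log2_size < 5u || log2_size > 32u) {
        return 0;
    }
    uint64_t size = (uint64_t)1 << log2_size;
    return ((uint64_t)base % size) == 0u;
}

#if defined(STM32L476xx)
#include "stm32l4xx.h"  /* CMSIS device header: MPU, MPU_CTRL_*_Msk, __DSB, __ISB */
/* Target only: program region 0, then enable the MPU with the default map. */
static void mpu_forbid_sram_exec(void)
{
    MPU->RBAR = mpu_rbar(SRAM_ZONE_BASE, 0u);
    MPU->RASR = mpu_rasr_no_exec(SRAM_ZONE_LOG2);
    MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
    __DSB();
    __ISB();
}
#endif
```

On the board, mpu_forbid_sram_exec() would be called early in main(), before any external data lands in RAM. Any function deliberately placed in SRAM would fault too.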
If we run the malicious payload again, the execution path looks like this:
The PC moves from flash to SRAM (the stack zone), but when the CPU fetches the first instruction from the stack it raises a fault exception, which ends in the dead-loop handler code. Nice.
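A fault handler can check that the fault really comes from the XN region. It reads the MemManage Fault Status Register (MMFSR), the low byte of SCB->CFSR on Armv7-M. An instruction fetch from an XN region sets IACCVIOL (bit 0).

MemManage is disabled by default. Unless MEMFAULTENA is set in SCB->SHCSR, the fault escalates to HardFault, but CFSR still records the cause. The sketch below is host-testable. mm_cause_t and classify_memmanage are hypothetical names; on the target you would pass it SCB->CFSR.

```c
#include <stdint.h>

/* MMFSR bits, Armv7-M (CFSR[7:0]). */
#define MMFSR_IACCVIOL  (1u << 0)  /* instruction fetch from an XN or no-access region */
#define MMFSR_DACCVIOL  (1u << 1)  /* data access violation */
#define MMFSR_MUNSTKERR (1u << 3)  /* fault while unstacking on exception return */
#define MMFSR_MSTKERR   (1u << 4)  /* fault while stacking on exception entry */
#define MMFSR_MMARVALID (1u << 7)  /* MMFAR holds the faulting data address */

typedef enum {
    MM_NONE,
    MM_EXEC_VIOLATION,  /* what the XN-protected stack payload triggers */
    MM_DATA_VIOLATION,
    MM_STACKING_ERROR,
    MM_OTHER
} mm_cause_t;

/* Classify the MemManage part of a CFSR value (hypothetical helper). */
static mm_cause_t classify_memmanage(uint32_t cfsr)
{
    uint32_t mmfsr = cfsr & 0xFFu;
    if (mmfsr == 0u) {
        return MM_NONE;
    }
    if (mmfsr & MMFSR_IACCVIOL) {
        return MM_EXEC_VIOLATION;
    }
    if (mmfsr & MMFSR_DACCVIOL) {
        return MM_DATA_VIOLATION;
    }
    if (mmfsr & (MMFSR_MSTKERR | MMFSR_MUNSTKERR)) {
        return MM_STACKING_ERROR;
    }
    return MM_OTHER;
}
```

Instead of spinning forever, a handler could log MM_EXEC_VIOLATION and reset the chip with NVIC_SystemReset().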
The cost of this mitigation is *only* 176 bytes of flash and 0 bytes of RAM!
ROP & missing stack protector:
With the XN bit it’s now impossible to pass and execute a payload from the stack. Great. But you can still control the execution path by smashing the stack. This means you can build a ROP chain, reusing snippets of code already in flash, and do nasty things. Again, Google it for a full explanation (link HERE).
To protect against this, a stack protector must be used. The canary is a well-proven solution. The official development kit uses GCC, so adding the stack protector should be simple: add -fstack-protector-strong at the compilation stage and link with the “ssp” library.
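The principle can be simulated on a PC. A guard word sits between the local buffer and the saved registers. It is copied from __stack_chk_guard on function entry and compared on exit; on a mismatch, __stack_chk_fail is called instead of returning.

The sketch models the frame as a struct, so the layout is fixed by C rather than by the compiler. It illustrates the principle only and is not the code GCC emits. sim_frame_t and the sim_* functions are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated frame of PigCode2, lowest address first: an overflow of buf
   runs upward into canary, then saved_r4, then saved_lr. */
typedef struct {
    uint8_t  buf[60];   /* bufferTmp */
    uint32_t canary;    /* slot added by -fstack-protector-* */
    uint32_t saved_r4;
    uint32_t saved_lr;  /* return address */
} sim_frame_t;

static const uint32_t sim_guard = 0x595E9FBDu;  /* stands in for __stack_chk_guard */

/* Prologue: clear the frame and store the guard in the canary slot. */
static void sim_prologue(sim_frame_t *f)
{
    memset(f, 0, sizeof *f);
    f->canary = sim_guard;
}

/* Unchecked copy into buf, clamped to the struct so the simulation itself
   stays well defined; bytes past 60 hit the canary first. */
static void sim_unchecked_copy(sim_frame_t *f, const uint8_t *src, size_t len)
{
    if (len > sizeof *f) {
        len = sizeof *f;
    }
    memcpy((uint8_t *)f, src, len);
}

/* Epilogue check: 1 when the canary was smashed (GCC's code would call
   __stack_chk_fail() here instead of returning through saved_lr). */
static int sim_epilogue_detects_smash(const sim_frame_t *f)
{
    return f->canary != sim_guard;
}
```

An attacker who overflows buf must overwrite the canary before reaching the return address. The check can only be passed by writing the exact guard value, which is why that value should not be guessable.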
Adding the compilation option for the canaries:
Adding the link option to include libssp:
But it doesn’t work. Compilation is OK, but libssp is missing:
Linking the design fails because libssp is missing, and the file cannot be found anywhere in the development kit’s folders.
The poor man’s STM32 libssp:
(or how to improve (awfully) System Workbench by adding your own ssp lib)
A minimal libssp needs to export 2 symbols: __stack_chk_guard and __stack_chk_fail. So let’s build something functional for this POC.
A simple but fully working home-brew libssp can be done easily:
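The original file is not reproduced in this text. Here is a minimal sketch of such a libssp.c, assuming the GCC convention for bare-metal ARM: instrumented functions read the global __stack_chk_guard and call __stack_chk_fail() on a mismatch.

The guard constant is an arbitrary example. A fixed value can be read from a leaked firmware image, so a hardened build would rather seed it from early startup code, for instance from the STM32L476 hardware RNG, before any instrumented function is entered. The endless loop mirrors the dead-loop fault handler; resetting the chip is another option.

```c
#include <stdint.h>

/* Guard value copied into each protected frame by GCC-generated prologues
   and compared in the epilogues. Arbitrary POC constant (assumption). */
uintptr_t __stack_chk_guard = 0x595E9FBDu;

/* Called by an epilogue when the canary no longer matches. It must never
   return, otherwise the function would pop the smashed return address. */
__attribute__((noreturn)) void __stack_chk_fail(void)
{
    for (;;) {
        /* Dead loop: park the CPU, like the default fault handler. */
    }
}
```

Built with the target flags listed next and archived with arm-none-eabi-ar, this gives the libssp.a the linker was missing.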
The libssp.o object file must be built for your target. Copy a GCC call from the usual build process and adapt it. Example for the STM32L476 chip:
-mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16
-Os -g3 -Wall
-fmessage-length=0 -ffunction-sections -c
-o "libssp.o" "libssp.c"
Then, convert the object into a static library:
arm-none-eabi-ar.exe rcs libssp.a libssp.o
A screenshot of the full process:
And you must add the project folder to the library search path at the linking stage:
Unsecured code without the stack protector:
And with the stack protector:
The stack only grows by 4 bytes, for the 32-bit canary. Adding -fstack-protector-strong and linking with a simple libssp costs only 72 bytes, as the “-strong” heuristic only protects the functions that are worth it (for example, those with local arrays). If you are one of the paranoid, -fstack-protector-all costs 752 extra bytes. That’s not much for closing the stack-smashing route to ROP in your design.
For the XN bit, it’s up to you to handle it the right way: take the sample code, adapt it to your particular STM32 chip, and you’re done. This mitigates the MOST DANGEROUS software security issue in embedded systems, the stack buffer overflow.
For the stack protector, my piggy hack works for a POC. But it would be great if a clean libssp dedicated to the STM32 were added to the System Workbench AC6 tools. As they usually say: “please contact your local ST sales representative” and make a feature request ;).
By following the classic recipes given on this page, you can mitigate the 2 major threats to your STM32 design: stack-based buffer overflows and ROP. In the embedded world, that is enough to mitigate nearly all software attacks, and it forces the dark side of the Force to deploy more invasive attacks on the chip itself to dump or control it.
I found application note AN4729 *VERY* interesting, but I have no further information on it (link HERE). If someone digs into this solution, I would be happy to get more technical details on this new feature.