Introduction - Why FOTA?
There are several reasons why updates, and the ability to update, have become much more important in recent years. Primarily, system complexity has skyrocketed. A simple internet-enabled switch contains a complete IP stack and most likely a TLS implementation. On the application level, there might be an MQTT client or some other kind of protocol. To use TLS, you need certificates. Most certificates have an expiration date, and you will need to renew them. You might also need to replace a certificate for other reasons, for example because it got stolen. Replacing certificates is one of the good reasons to be able to update your devices.
All this functionality (protocols, security, IP…) will be provided by libraries. Overall, the code of this simple device will end up being tens to hundreds of thousands of lines long. It seems reasonable to assume that there will be one or more bugs hidden somewhere in there. When your device is sold and out in the field, someone might find such a bug, and suddenly all your devices have a known and documented security vulnerability. If you are lucky, an attacker can only make one device crash, but if you are out of luck, they can take over your whole fleet of devices and execute a DDoS attack against your backend or, worse, someone else’s.
There are certain things you can do to keep such an attack from becoming a nightmare, but it is nearly impossible to avoid these kinds of bugs entirely. One example of this is the recently discovered flaw in curl, a widely used library and tool: CVE-2023-38545. According to curl’s website, there are 20 billion curl installations worldwide. The upcoming EU RED regulation will also require all IoT devices sold in the European Union to have some sort of functionality to update their firmware, precisely with the goal of increasing the security of deployed devices after they have been manufactured.
Last but not least, you may also want to add further functionality to your devices after they have been sold, either because it wasn’t done in time, because you had a great idea after the launch or maybe some competitor has features that your devices should also have.
With the “Why?” answered, read on to learn how we ported an existing golang client library to C and used it on tiny microcontrollers.
CMender
In 2019 we were working on an ESP32-based, WiFi-connected smart home device and needed a way to update its firmware remotely. Somebody recommended mender for its server-side features, and it did indeed look great. The issue was that the client was written in golang, and there was no way to run that on our resource-constrained MCU. So, we wrote our own.
We have decided to make the code open source and write this article to provide some background information. You can find the code on GitHub. This client was developed and put into production many years ago, but even though there’s an official C++ client in the works, there might still be use for ours. Read on to find out why 🙂
Over-the-air update client requirements
So let’s start with the basics. What do we expect the OTA client to do and how does it work?
No interference
The client must not interfere with the “normal” code running on the device. The same is true vice versa: Even when the “normal” code is broken - e.g., when there is a bug in the cloud client that prevents it from being able to connect - we still must be able to update the firmware to fix the bug. While this can’t be guaranteed, there are a few things you can do to change the likelihood in your favor, for example by letting the OTA client run in its own thread instead of integrating into a global event loop.
Speed and Reliability
An update should be installable while the system is running without an impact on usability. In other words: we quickly want to reboot into the latest version. We also must be able to roll back to the previous version if either the bootloader can’t load the new firmware, or the new firmware can’t initialize properly. Both things can be achieved using the common A/B partitioning scheme. We simply have two partitions and install the new firmware into the inactive one. Many systems like mcuboot or Espressif’s bootloader support that already, so we need the cmender client to integrate into those using platform-specific code.
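To make the rollback idea concrete, here is a minimal sketch of the A/B bookkeeping a bootloader performs. It is not the logic of mcuboot or Espressif’s bootloader; all names (`boot_state`, `activate_update`, `confirm_update`, `bootloader_select`) are hypothetical, and a real bootloader keeps this state in flash rather than in a struct in RAM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; real bootloaders persist this in flash. */
enum slot { SLOT_A = 0, SLOT_B = 1 };

struct boot_state {
    enum slot active;     /* slot the bootloader will start next */
    bool pending_verify;  /* true until the new firmware confirms itself */
    bool boot_attempted;  /* true once the unconfirmed slot was booted */
};

static enum slot other_slot(enum slot s)
{
    return s == SLOT_A ? SLOT_B : SLOT_A;
}

/* Called after the new image has been written to the inactive slot. */
static void activate_update(struct boot_state *bs)
{
    assert(!bs->pending_verify); /* never stack two unconfirmed updates */
    bs->active = other_slot(bs->active);
    bs->pending_verify = true;
    bs->boot_attempted = false;
}

/* Called by the new firmware once it has initialized properly. */
static void confirm_update(struct boot_state *bs)
{
    bs->pending_verify = false;
    bs->boot_attempted = false;
}

/* Called on every boot: returns the slot to start, rolling back if the
 * previously booted update never confirmed itself. */
static enum slot bootloader_select(struct boot_state *bs)
{
    if (bs->pending_verify) {
        if (bs->boot_attempted) {
            /* second boot without confirmation: go back to the old slot */
            bs->active = other_slot(bs->active);
            bs->pending_verify = false;
            bs->boot_attempted = false;
        } else {
            bs->boot_attempted = true;
        }
    }
    return bs->active;
}
```

The update client only has to write the image, call something like `activate_update` and reboot; everything else is decided at boot time, which is why the platform-specific part of the client can stay small.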
Another form of reliability is reducing the likelihood of the code failing at runtime. A great example of that is avoiding dynamic memory allocation unless the number and size of objects genuinely aren’t known at compile time.
ROM and RAM Usage
We want to run this on MCUs like the ESP32 or even the ESP8266. Those are limited in both flash and RAM size. That means that code must be kept small and RAM must be used carefully. It’s also a good idea to mark data that doesn’t need to be modified at runtime using the const keyword. This reduces RAM usage on a system that can read and execute data from flash, because the loader doesn’t have to copy read-only variables from flash to RAM.
Dependencies
In this blog post we’ve mentioned “platform-specific code” a few times. What we mean by that is that we have abstracted everything that is platform-specific into an exchangeable platform library within the cmender client. This is very useful to make the client work on as many platforms as possible without the need to include the code for all of these platforms in the core logic of the client. Examples for this are ways to load/store data or to log messages to the console.
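One common way to express such a boundary in C is a table of function pointers that the core logic calls through. The sketch below illustrates the idea only; it is not cmender’s actual interface (which may equally well be resolved at link time), and `platform_ops`, `core_save_artifact_name` and the RAM-backed test platform are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform interface: the core only calls through this table. */
struct platform_ops {
    int (*store_write)(const char *key, const void *data, size_t len);
    int (*store_read)(const char *key, void *buf, size_t cap, size_t *len);
    void (*log)(const char *msg);
};

/* Core logic: knows nothing about flash, files or consoles. */
static int core_save_artifact_name(const struct platform_ops *p, const char *name)
{
    assert(p != NULL && name != NULL);
    int err = p->store_write("artifact-name", name, strlen(name) + 1);
    if (err)
        p->log("failed to store artifact name");
    return err;
}

/* Host/test platform: a single statically allocated slot in RAM. */
static char ram_slot[64];
static size_t ram_len;

static int ram_write(const char *key, const void *data, size_t len)
{
    (void)key; /* this toy store has only one slot */
    if (len > sizeof(ram_slot))
        return -1;
    memcpy(ram_slot, data, len);
    ram_len = len;
    return 0;
}

static int ram_read(const char *key, void *buf, size_t cap, size_t *len)
{
    (void)key;
    if (ram_len > cap)
        return -1;
    memcpy(buf, ram_slot, ram_len);
    *len = ram_len;
    return 0;
}

static void ram_log(const char *msg)
{
    (void)msg; /* a real port would print to the console here */
}

/* const: the table itself never changes at runtime */
static const struct platform_ops ram_platform = { ram_write, ram_read, ram_log };
```

Porting to a new target then means providing another `platform_ops` table (for example one backed by NVS or a filesystem) without touching the core function.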
Disclaimers
tinygo
You may ask why we didn’t use tinygo to compile the original client for our MCU. Well first, we didn’t know about it back then and up to this day have never used it. We’d also have to take a close look at the RAM and flash usage of tinygo programs because we were VERY constrained in that due to the requirements of the rest of the software running on the device. The A/B partition scheme needed for FOTA certainly didn’t help with that.
The official C++ client
It didn’t exist back then, and at the time of writing this article it didn’t have a stable release yet. There actually were discussions about C clients back then, but they were progressing very slowly and nobody had started writing any code yet. The mender team needs to invest more work because they have many customers with very different needs. For our use case, we were able to write a working client within a couple of weeks though. After that it took a few months of testing and writing unit tests to make it production ready.
That being said, it doesn’t look like their current C++ client is ready to use without Linux and CMake - therefore it is not platform independent. This means we’d have to work quite a bit on their code. Also, most of our MCU-based work is entirely written in C, and adding a C++ component is not ideal. They also seem to make use of dynamic memory allocation and libstdc++ which might cause resource and reliability issues on our devices.
Version
The development happened many years ago, so all of this is based on mender 1.5. Some aspects may have changed since then, so keep this in mind.
Where to even start?
By reading the Device API documentation. There we can see that the API is actually very simple. We must authenticate using device-specific credentials, we have to provide status updates via the inventory API and we have to poll for updates. Easy, right? Well yes, if you don’t want any of the features that make the official client awesome: things like rollback, or handling all kinds of race conditions such as a deployment being deleted from the server while the client is installing it. There is a lot of knowledge and experience in their code, and starting from scratch and implementing every feature one by one would not have allowed us to create a production-ready client within a few months.
So, what’s the obvious thing to do here? Copy all of the relevant go files, rename them to C files and convert them line by line until they compile - duh. 🤷‍♂️
Manual transpilation
This might sound crazy at first but is surprisingly simple. That’s because C and golang are very similar languages. Neither supports OOP on a language level (especially not inheritance), neither supports exceptions so you do the typical if (err) { return err; } and they both have a very similar syntax. Let’s look at an example:
GO
func (m *MenderAuthManager) GenerateKey() error {
    if err := m.keyStore.Generate(); err != nil {
        log.Errorf("failed to generate device key: %v", err)
        return errors.Wrapf(err, "failed to generate device key")
    }

    if err := m.keyStore.Save(); err != nil {
        log.Errorf("failed to save device key: %s", err)
        return NewFatalError(err)
    }

    return nil
}
C
mender_err_t mender_authmgr_generate_key(struct mender_authmgr *m)
{
    mender_err_t err;

    err = mender_keystore_generate(m->keystore);
    if (err) {
        LOGE("failed to generate device key: %u", err);
        return err;
    }

    err = mender_keystore_save(m->keystore);
    if (err) {
        LOGE("failed to save device key: %u", err);
        return MENDER_ERR_FATAL(err);
    }

    return MERR_NONE;
}
So, the process goes like this:
- convert the syntax
- declare variables
- convert log statements
- convert return statements
mender_err_t is a uint32_t, and we use its most significant bit to indicate whether the error is fatal. We also have to create a new enum value for every error case because we don’t want to store strings in return values.
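The error encoding described above could look roughly like this. The names `mender_err_t`, `MENDER_ERR_FATAL`, `MERR_NONE` and `MERR_UNKNOWN` appear in the snippets in this article, but their definitions here, the numeric values, `MERR_KEYSTORE_SAVE` and the helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t mender_err_t;

/* Assumed layout: bit 31 flags "fatal", the low 31 bits hold the code. */
#define MENDER_ERR_FATAL_BIT (UINT32_C(1) << 31)

/* One enum value per error case instead of an error string.
 * The numeric values are illustrative. */
enum {
    MERR_NONE = 0,
    MERR_UNKNOWN = 1,
    MERR_KEYSTORE_SAVE = 2, /* hypothetical */
};

static inline mender_err_t mender_err_make_fatal(mender_err_t err)
{
    assert(err != MERR_NONE); /* "no error" cannot be fatal */
    return err | MENDER_ERR_FATAL_BIT;
}
#define MENDER_ERR_FATAL(err) mender_err_make_fatal(err)

static inline bool mender_err_is_fatal(mender_err_t err)
{
    return (err & MENDER_ERR_FATAL_BIT) != 0;
}

/* Strips the fatal flag so the code can be compared against the enum. */
static inline mender_err_t mender_err_code(mender_err_t err)
{
    return err & ~MENDER_ERR_FATAL_BIT;
}
```

Because `MERR_NONE` is zero and any flagged value is non-zero, the usual `if (err)` check keeps working for both fatal and non-fatal errors.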
Results
We did this for the most crucial parts like state.go and it worked out very well. The code does look a little bit weird in terms of data types because we have to pass around and store a lot of “interface” and callback pointers. Luckily, all of that is allocated statically so we don’t have to think about lifetimes. However, this ended up in gorgeous code like this 🙈 :
mender_create(
    &mender, &store, &authmgr, &stack, &client, &dev, &iv_data,
    CONFIG_PROJECT_FIRMWARE_VERSION,
    CONFIG_MENDER_DEVICE_TYPE,
    CONFIG_MENDER_SERVER_URL,
    CONFIG_MENDER_UPDATE_POLL_INTERVAL,
    get_earliest_update_time,
    CONFIG_MENDER_INVENTORY_POLL_INTERVAL,
    CONFIG_MENDER_RETRY_POLL_INTERVAL);
Unit tests
The original client has quite a few useful unit tests. Unfortunately, they weren’t flexible enough to run against our C client. Instead, we rewrote all of them in C. Go’s test framework is too different from cmocka, so transpilation was not an option here. The tests are heavily mock-based, so we had to write a ton of mocks as well. With cmocka that’s easy, but it is a lot of boring copy-and-paste work.
Here’s an example:
GO
func TestStateInventoryUpdate(t *testing.T) {
    ius := inventoryUpdateState
    ctx := new(StateContext)

    s, _ := ius.Handle(ctx, &stateTestController{
        inventoryErr: errors.New("some err"),
    })
    assert.IsType(t, &CheckWaitState{}, s)

    s, _ = ius.Handle(ctx, &stateTestController{})
    assert.IsType(t, &CheckWaitState{}, s)

    // no artifact name should fail
    s, _ = ius.Handle(ctx, &stateTestController{
        inventoryErr: errNoArtifactName,
    })
    assert.IsType(t, &ErrorState{}, s)
}
C
static void test_state_inventory_update(void **state __unused)
{
    mender_statemachine_create(&sm, store, mender);
    will_return_always(mender_time_now_test, FAKE_TIME);

    /* error */
    mender_inventory_refresh_expect(mender, MERR_UNKNOWN);
    sm.current_state = MENDER_STATE_INVENTORY_UPDATE;
    assert_int_equal(mender_statemachine_run_once(&sm), MERR_NONE);
    assert_int_equal(sm.current_state, MENDER_STATE_INVENTORY_UPDATE_ASYNC);
    assert_int_equal(sm.last_error, MERR_UNKNOWN);
    assert_int_equal(sm.next_state_update, 0);
    assert_int_equal(mender_statemachine_run_once(&sm), MERR_NONE);
    assert_int_equal(sm.current_state, MENDER_STATE_CHECK_WAIT);
    assert_int_equal(sm.next_state_update, 0);

    /* success */
    mender_inventory_refresh_expect(mender, MERR_NONE);
    sm.current_state = MENDER_STATE_INVENTORY_UPDATE;
    assert_int_equal(mender_statemachine_run_once(&sm), MERR_NONE);
    assert_int_equal(sm.current_state, MENDER_STATE_INVENTORY_UPDATE_ASYNC);
    assert_int_equal(sm.last_error, MERR_NONE);
    assert_int_equal(sm.next_state_update, 0);
    assert_int_equal(mender_statemachine_run_once(&sm), MERR_NONE);
    assert_int_equal(sm.current_state, MENDER_STATE_CHECK_WAIT);
    assert_int_equal(sm.next_state_update, 0);

    /* no artifact name should fail */
    mender_inventory_refresh_expect(mender, MERR_NO_ARTIFACT_NAME);
    sm.current_state = MENDER_STATE_INVENTORY_UPDATE;
    assert_int_equal(mender_statemachine_run_once(&sm), MERR_NONE);
    assert_int_equal(sm.current_state, MENDER_STATE_INVENTORY_UPDATE_ASYNC);
    assert_int_equal(sm.last_error, MERR_NO_ARTIFACT_NAME);
    assert_int_equal(sm.next_state_update, 0);
    assert_int_equal(mender_statemachine_run_once(&sm), MERR_NONE);
    assert_int_equal(sm.current_state, MENDER_STATE_ERROR);
    assert_int_equal(sm.next_state_update, 0);
}
You may notice that we call the state machine more often than the original code. That’s because we need many additional states due to our lack of threads which we’ll explain in the next section.
Async code
The state machine of the original mender client was just intended to represent logical states like update check and update fetch. The actions themselves, like downloading an update, were blocking implementations. We run cmender’s state machine on a separate thread rather than integrating it into an existing event loop (which is still supported anyway). However, we ran into esp-idf bugs where a blocking socket operation can stall the entire system, so keeping the blocking calls was not an option. Instead, we made the whole client async. In C that means lots of state machines and callback functions.
We have a few modules like the http client which are self-contained, so they maintain their own async state machines and simply run a callback with the result once they’re done. For the main mender state machine we must add additional states that signal that we’re currently waiting for an async callback. So, for example, in addition to the update_check state we also have an update_check_async state. This is pretty straightforward: wherever the original code made a blocking call, we split the function in two, add a new state and advance the state machine from the callback that tells us that the async operation has completed. In other words: We did manually what languages with async-await support like Rust do automatically on a compiler level.
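The splitting of a blocking step into a state plus an `_async` state can be sketched as follows. This is a simplified illustration, not cmender’s code: the struct layout, `sm_run_once`, `on_update_check_done` and the fake `start_update_check` operation are hypothetical. In this sketch the callback only records the result, and the next run of the state machine performs the transition, which mirrors the two `run_once` calls per step in the unit test above.

```c
#include <assert.h>
#include <stdbool.h>

enum sm_state {
    STATE_IDLE,
    STATE_UPDATE_CHECK,
    STATE_UPDATE_CHECK_ASYNC, /* extra state: waiting for the callback */
    STATE_UPDATE_FETCH,
    STATE_ERROR,
};

struct statemachine;
typedef void (*check_done_cb)(struct statemachine *sm, int err,
                              bool update_available);

struct statemachine {
    enum sm_state state;
    bool async_done;       /* set by the callback */
    int last_error;
    bool update_available;
    /* hypothetical non-blocking operation: starts the request and returns */
    void (*start_update_check)(struct statemachine *sm, check_done_cb cb);
};

/* Second half of the former blocking function: records the result. */
static void on_update_check_done(struct statemachine *sm, int err,
                                 bool update_available)
{
    assert(sm->state == STATE_UPDATE_CHECK_ASYNC);
    sm->last_error = err;
    sm->update_available = update_available;
    sm->async_done = true;
}

static void sm_run_once(struct statemachine *sm)
{
    switch (sm->state) {
    case STATE_UPDATE_CHECK:
        /* switch state first: the callback may fire synchronously */
        sm->state = STATE_UPDATE_CHECK_ASYNC;
        sm->async_done = false;
        sm->start_update_check(sm, on_update_check_done);
        break;
    case STATE_UPDATE_CHECK_ASYNC:
        if (!sm->async_done)
            break; /* still waiting, nothing to do */
        if (sm->last_error)
            sm->state = STATE_ERROR;
        else
            sm->state = sm->update_available ? STATE_UPDATE_FETCH
                                             : STATE_IDLE;
        break;
    default:
        break;
    }
}

/* Stand-in for an async HTTP client: remembers the callback for later. */
static check_done_cb pending_cb;

static void fake_start_update_check(struct statemachine *sm, check_done_cb cb)
{
    (void)sm;
    pending_cb = cb;
}
```

Running the state machine while the request is in flight is harmless: it stays in the `_ASYNC` state until the callback has delivered a result.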
Resource usage optimizations
On an ESP32 things mostly worked but weren’t ideal. For another customer we had to run this on an ESP8266 though, where things were tighter. So, here are all the optimizations we had to do for one or both platforms.
SSL
We had to learn a lot about how SSL works to be able to understand the issues and do the necessary fine tuning. Basically, depending on the server configuration and the certificates used, SSL can be very memory and CPU hungry. So much so that the ESP32 needed tens of seconds to get past the SSL handshake, all while needing so much RAM that there wouldn’t be any left for the main application. Since we didn’t want to ship useless devices that update themselves without running anything else, we had to improve the situation somehow.
Ciphers
First, we chose cipher suites and curves that need as little RAM as possible while still being secure. After some benchmarking those turned out to be (for mbedtls) MBEDTLS_TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384 and MBEDTLS_ECP_DP_SECP256R1. We chose matching certificates that support these.
Handshake
The handshake is also an issue because large certificates might get transferred. We solved this by using a custom CA whose public key is compiled into the firmware, so the server doesn’t have to send a long certificate chain.
Max Fragment Length negotiation (MFLN)
By default, SSL needs two 16 KB buffers - one for sending and one for receiving. That’s obviously too big for poor little MCUs. Luckily, there’s an awesome feature called MFLN, which allows using buffers as small as 512 bytes each. On the server side we had to update the openssl version for that since it was a very recent feature at the time. The mbedtls client already supported it and that worked well - except for the handshake. At the time mbedtls simply didn’t fragment handshake messages and only fragmented the actual payload. Our workaround was making sure that all the handshake messages just happen to fit into 512 bytes. Using smaller certificates and not sending a huge CA chain certainly helped with that.
Only one handshake at a time
While the mender client never needs to do multiple requests at the same time, we also had main app code with a long running MQTT (over https) connection. Luckily, the high memory usage only happens during the handshake due to cryptography reasons. After that we’re back to our 512 byte buffers. So, all we had to do was to make sure that we’d never have both clients do the handshake at the same time. Thanks to the mbedtls api being designed in a way that we know when the handshake is happening, a mutex shared between both the mender client and the MQTT client was all we needed to solve this.
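A minimal sketch of that shared lock is shown below. It uses a POSIX mutex so that it runs on a desktop; on the MCU the equivalent would be an RTOS mutex. The wrapper `guarded_handshake` and the callback type are hypothetical; in a real client the callback would be the place where the TLS library’s handshake function (for mbedtls, `mbedtls_ssl_handshake`) is called.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One lock shared by every TLS client in the firmware (assumed design). */
static pthread_mutex_t handshake_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical callback that performs the actual TLS handshake. */
typedef int (*handshake_fn)(void *ctx);

/* Runs the handshake while holding the shared lock, so only one client
 * at a time pays the temporary handshake memory cost. */
static int guarded_handshake(handshake_fn do_handshake, void *ctx)
{
    assert(do_handshake != NULL);

    if (pthread_mutex_lock(&handshake_lock) != 0)
        return -1;
    int ret = do_handshake(ctx);
    pthread_mutex_unlock(&handshake_lock);
    return ret;
}

/* Demo handshake: just counts how often it was called. */
static int demo_handshake(void *ctx)
{
    int *calls = ctx;
    (*calls)++;
    return 0;
}
```

Both the update client and the MQTT client would route their handshakes through the same wrapper; once the handshake has finished, the lock is released and the connections run in parallel with their small buffers.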
Uncompressed updates
Originally mender only supported gzip-compressed updates. The issue with gzip is that it needs many kilobytes of RAM during decompression due to the sliding window approach (the DEFLATE window can be up to 32 KB). Since MCU updates are tiny anyway, we decided to add support for uncompressed updates. Since mender-artifact is both a CLI tool and a library that’s used for server-side verification, it’s the only repository that needed a change.
Side note: The author of the PR is marked as a ghost because the author originally used a separate account for work called “mzimmermanngcx” which was later deleted because they wanted to use their personal account instead.
ED25519 auth requests
The mender client needs to sign certain requests using an RSA key. This is a custom application layer protocol that’s not part of SSL. On the ESP8266 this was quite slow, so we added support for using ED25519 instead. For unknown reasons this didn’t get merged, but somebody else did the same work a year later, and that version did get merged. 🤷‍♂️
Closing words
Are we happy with the cmender client? Mostly, yes. It seems to work great in production, we have a bunch of unit tests, and we have a Linux port for easy testing. The code does look a little bit weird due to the number of callbacks used. Without async-await we probably couldn’t produce a substantially better design though, so we’ll leave it at that.
The code is fully cross platform because many things - from socket communication to cryptography - are implemented within a platform library. So have fun trying it out yourself. 🙂