The case for using Rust for Automotive software

A massive fire broke out at the Notre-Dame cathedral on 15th April 2019. The fire started below the roof and spread to consume the complete roof and damaged the upper walls. The building was undergoing maintenance at the time. The exact cause of the fire remains unknown, but officials think it might have been a short-circuit.

Let’s assume for now that the cause was electrical. I think the root cause is more likely to have been a loose connection than a short circuit. Given the importance of the site and the fire risk posed by timber that has stood since the 13th century, there must have been adequate safety mechanisms in place to detect a short circuit and trip the breaker. A short circuit is easy to detect. The failure happens almost instantaneously, and if the fuses, breakers, and wires are rated appropriately, the power is disconnected fast enough to prevent a fire from starting.

A loose connection behaves differently. The contact resistance increases and causes localized sparks and heating. Over time, the area becomes hot enough to ignite any combustible substance nearby. There is no increase in the current drawn, so none of the overcurrent safety mechanisms can detect such a failure. A loose connection is usually the result of careless installation or inferior quality components. Just imagine, a building centuries old being brought down because someone did not take care when tightening a screw or when splicing wires. It is challenging to detect such errors, and even the most experienced worker can make the occasional mistake.

What does this have to do with software? I want to make an analogy with software failures and the two types of electrical failures mentioned above.

Software faults manifest in a couple of ways. The first type of failure is the one that can be persuaded to occur by improving test coverage and by writing specific tests to ensure the sunny paths and the error paths are exercised.

The second type of fault is far more elusive. This type of error occurs due to data races and memory corruption. These result in one-off failures that are incredibly difficult or even impossible to reproduce under test conditions. Trying to observe the failure changes what we are trying to find, and the failure ceases to happen. The use-after-free weakness CWE-416 is a manifestation of such an error, and there were 127 vulnerabilities due to CWE-416 from January to July 2019 alone. [2]

We can compare the first type of fault with short circuits and the second with loose connections. The former causes an observable failure that can be detected and corrected. Improved testing and coverage analysis can give good confidence that these errors don’t reach production. The latter lies hidden until it becomes the cause of a catastrophic failure.

We waste hundreds of hours trying to reproduce and fix bugs caused by such issues. If we do manage to find the fault, it is after a lot of time and effort. In some cases, we don’t, and we write this off as a “one-off” issue that is not reproducible. Of course, Murphy ensures that the error happens again at the most inopportune time.

Writing safe multi-threaded programs is hard. Multi-threaded applications running on multiple cores can cause even more issues if the developer does not understand the nuances of writing such software.

Let us now put ourselves in the shoes of a developer who needs to write system software in an embedded automotive system. Let’s assume a system based on Adaptive Autosar, which uses C++14 as the primary programming language.

[1] provides a guideline for the use of C++14 in Adaptive Autosar. There are 258 pages of instructions that extend the scope of the MISRA guidelines. So in addition to understanding the new features of C++14, the developer also needs to be aware of which elements of the language are allowed, and how to use them.

Checks that can be automated are done using static code analyzers that parse the source code and report any potential errors. These errors have to be reviewed and corrected. Assuming the developer has gone through all these hoops, are we sure that this results in code that is free from data races or memory errors? Here is a quote from “3.2 Limitations” of [1].

“If the user of this document uses parallel computing, C++ standard libraries or develops security-related software, then they are responsible for applying their own guidelines for these topics” (Guidelines for the use of the C++14 language in critical and safety-related systems)

So if your software is multi-threaded and uses multiple cores, you are on your own, and Adaptive Autosar does not have any guidance for these aspects. Most modern system software is multi-threaded and usually runs on multiple cores.

I think the Adaptive Autosar spec is short-sighted in fixing C++ as the standard language for development. Selecting C for classic Autosar made sense, as it was targeted at microcontrollers. However, Adaptive Autosar is meant for larger systems, and allowing flexibility in the implementation language would have been a good idea. I hope future versions of the spec will consider that. However, I digress; let’s get back to the main topic.

No one is perfect; even experienced programmers make mistakes. Relying on static code analysis and manual reviews seems to be a duct-taped solution, one that we have gotten so used to that we don’t see the absurdity of it.

There has to be a better way to write system software that is free from data races and memory errors. How can we ensure that no loose connections exist? If we can achieve this, we can focus on fixing the short circuits that are easier to test for in a controlled environment.

C and C++ have been the de facto systems programming languages for decades. While C has remained relatively constant, C++ has evolved considerably, becoming the language of choice for high-performance software.

There is increasing interest in moving to more modern languages for systems programming. Google’s Go has been very successful in persuading developers to leave the C/C++ comfort zone to develop system software. Docker and Kubernetes, both developed in Go and both very popular, are great examples.

There is no doubt that Go is suitable for developing server-class system software, but there are some aspects of it that don’t make it ideal for embedded systems. One is that it requires a runtime, and the second is that it is garbage collected. The former means that you cannot develop bare metal software using Go. The latter means that you lack control over memory deallocation, and this may lead to unexpected pauses during garbage collection. You don’t want that to happen when rendering a frame of video for a rear-view camera.

There is another language that started its life at almost the same time: Rust. It began in 2006 as a personal project of Mozilla employee Graydon Hoare, and Mozilla started sponsoring it in 2009. Mozilla then took a bold step to build their new Servo browser engine using Rust, evolving the language in parallel with the development of the browser. Rust has been stable since version 1.0 in 2015, and the 2018 edition has since been released. Go had an immediate uptake due to the support from Google. Rust did not see similar rapid growth, and I believe this is a good thing. Like a dish cooked in sous-vide style, Rust has had time to cook slowly, developing the right flavours and still retaining moisture. The safety, speed, and concurrency goals of Rust are not compromised no matter how the language evolves.

Rust guarantees that software written in safe Rust is memory safe and free from data races, a bold claim to make but surprisingly something that it achieves. Also, the language is very expressive, taking concepts from functional programming languages and making a system software developer feel like the SICP Wizard, a feeling that is quite rare in system software development.
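To make the data-race claim a little more concrete, here is a minimal sketch of several threads updating one shared counter. The function name and the numbers are my own, chosen only for illustration. The point is that the value can only be reached through the lock; handing the threads a plain mutable reference or a non-thread-safe `Rc` instead would be rejected by the compiler.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each add 1 to a shared counter
/// `increments` times. Hypothetical helper, for illustration only.
fn count_in_parallel(workers: usize, increments: u64) -> u64 {
    // Arc gives shared ownership across threads; Mutex serialises access.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The lock guard is the only way to reach the value.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every worker before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Calling `count_in_parallel(4, 1000)` always yields the same total, because no increment can be lost to an unsynchronised write.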

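There are mechanisms by which you can step outside these guarantees (unsafe blocks). Inside an unsafe block, the compiler permits a small set of extra operations that it cannot verify, such as dereferencing raw pointers. Unsafe blocks provide a clear marker for such areas, and when something goes wrong, you know where to look.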
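As a rough sketch of what such a marker looks like in practice, the hypothetical helper below wraps one raw pointer read in a safe function. The name `read_unchecked` is mine, not something from a library. The unsafe part is a single line, and the check that justifies it sits right next to it.

```rust
/// Reads element `index` of `data` through a raw pointer.
/// Hypothetical helper, for illustration only.
fn read_unchecked(data: &[u32], index: usize) -> Option<u32> {
    if index >= data.len() {
        return None;
    }
    // SAFETY: `index` was checked against `data.len()` just above,
    // so the pointer arithmetic stays inside the slice.
    let value = unsafe { *data.as_ptr().add(index) };
    Some(value)
}
```

Callers only see a safe function returning an `Option`; if this code ever misbehaves, the `unsafe` keyword tells you which line to review first.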

Memory safety and freedom from data races: do we lose out on performance? You will be pleasantly surprised that this is not the case. The Rust compiler performs its ownership and borrowing checks at compile time, so they do not incur any runtime overhead. Performance is comparable to C++. Rust aims for zero-cost abstractions.
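Here is a small sketch of what "zero-cost abstraction" means for everyday code: the same computation written once with iterator adaptors and once as a hand-written loop. The function names are made up for this example, and I have not measured anything here; whether the two compile to identical machine code depends on the compiler version and optimisation level.

```rust
/// Sum of the squares of the even values, written with iterator adaptors.
fn sum_even_squares_iter(values: &[u32]) -> u32 {
    values
        .iter()
        .filter(|&&v| v % 2 == 0)
        .map(|&v| v * v)
        .sum()
}

/// The same computation as a hand-written loop.
fn sum_even_squares_loop(values: &[u32]) -> u32 {
    let mut total = 0;
    for &v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}
```

Both functions return the same result for the same input; the idea behind the design goal is that choosing the higher-level style should not cost you anything at runtime.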

As all the checks are at compile time, one does tend to spend more time arguing with the compiler. Once the code compiles, you are pretty sure it is going to work assuming, of course, that you have coded the logic correctly. You do not need to run a compiler and a static code analyzer. With Rust, they are the same.

This guarantee is a wonderful thing for system software and especially embedded software. Ensuring the code is free from data races or memory corruption at compile time saves debugging time. Running and debugging embedded software on a remote target always takes additional effort, and solving fundamental issues at compile time increases developer productivity.

Rust does not need a heavyweight runtime or a garbage collector. You can write bare metal software using Rust. You can seamlessly interface with C code in both directions. You can even write a Linux kernel module in Rust if you want to be on the bleeding edge. [3]
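To illustrate what working without a garbage collector looks like, here is a minimal sketch in which a value is released at the exact point where it goes out of scope. The `FrameBuffer` type and the event log are hypothetical and exist only to make the order of events visible.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical frame buffer that records when it is released.
struct FrameBuffer {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for FrameBuffer {
    // Runs at the moment the value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("released {}", self.name));
    }
}

/// Returns the sequence of events recorded while two buffers live and die.
fn render_two_frames() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _front = FrameBuffer { name: "front", log: Rc::clone(&log) };
        {
            let _back = FrameBuffer { name: "back", log: Rc::clone(&log) };
            log.borrow_mut().push("rendering".to_string());
        } // `_back` is dropped here
        log.borrow_mut().push("presenting".to_string());
    } // `_front` is dropped here
    let events = log.borrow().clone();
    events
}
```

The release points are fixed by the scopes in the source code, so there is no collector that could decide to pause the program in the middle of a frame.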

Rust is opinionated in the naming conventions. While some people may not like this, I think this is great for an organization as you do not need to waste your time defining coding conventions and style guides. You also get a ready-to-use code formatter.

Rust comes with a package manager and build-system called Cargo. You don’t have to fuss around with build files to start building your code. Cargo supports the concept of features that can be enabled to include or exclude code during builds conditionally.
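As a small sketch of conditional compilation with features, the snippet below provides two versions of one function and lets the build decide which one exists. The feature name `rear_camera` is an assumption of mine; in a real crate it would be declared in the `[features]` section of `Cargo.toml` and switched on with `cargo build --features rear_camera`.

```rust
/// Compiled only when the hypothetical `rear_camera` feature is enabled.
#[cfg(feature = "rear_camera")]
fn camera_backend() -> &'static str {
    "rear camera enabled"
}

/// Fallback that is compiled when the feature is not enabled.
#[cfg(not(feature = "rear_camera"))]
fn camera_backend() -> &'static str {
    "rear camera disabled"
}
```

Only one of the two definitions ends up in the binary, so code for hardware that a given variant does not have never reaches the target.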

The Rust programming language, in my opinion, is headed in the right direction to become a suitable language for system software development. There is a strange feeling of craftsmanship when writing Rust, one that I personally attribute to the novelty of the language. I have yet to see whether that feeling lasts, as I am still learning.

However, as with any new language, there are hurdles for potential Rust developers to cross.

Some may think that Rust is too strict and that it takes away freedom from the developer. I feel that the safety of Rust gives us more freedom to solve the problem at hand rather than worrying about data races and memory corruption. The language continuously forces you to write good quality code.

Getting started

Changing something as fundamental as the programming language is difficult. It is highly unlikely that one fine day you decide to switch to using a new programming language for your ongoing project. One approach is to start with smaller parts of the system to evaluate the experience. You could build standalone executables or libraries using Rust for such evaluations. Rust is well suited to developing libraries with a C API. [4]
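A function with a C calling convention could look roughly like the sketch below. The name `sum_bytes` and its signature are hypothetical. A real library would additionally mark the function with a `no_mangle` attribute so that the symbol keeps its name for the C linker; I have left that out because its exact spelling differs between Rust editions.

```rust
use std::slice;

/// Sums `len` bytes starting at `data`; returns 0 for a null pointer.
/// Hypothetical C-callable function, for illustration only.
///
/// # Safety
/// `data` must be null or point to at least `len` readable bytes.
pub unsafe extern "C" fn sum_bytes(data: *const u8, len: usize) -> u32 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller promises that `data` points to `len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}
```

From the C side this would appear as a function taking a `const uint8_t *` and a `size_t` and returning a `uint32_t`; a tool such as cbindgen [4] can generate the matching header.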

Even if you don’t succeed in using Rust for production programs, learning it may help you write safer software in other languages. Asking “would this be allowed in Rust?” is a good question for a mental checklist.

Clock speeds are hitting limits, and Moore’s Law is just barely being kept alive by increasing the number of CPU cores in a processor. Multi-core systems are the norm going forward, but sadly, a vast majority of software and software developers have not kept up. I firmly believe that Rust is an excellent language for complex embedded software, especially automotive system software that requires additional safety guarantees. I hope that system software developers see this potential and try out Rust. I believe there is scope to build an equivalent to Adaptive Autosar with Rust.

The Rust programming language book [5] is the official documentation and learning material for the language. This is a great book for learning Rust; however, for experienced C/C++ programmers, I recommend the book “Programming Rust” from O’Reilly Media [6]. This book quickly gets to the point without spending time on programming basics.

Are you already using Rust and love it or hate it? I am curious to learn about your experience.

Some interesting Rust projects

References

  1. Guidelines for the use of the C++14 language in critical and safety-related systems. AUTOSAR Adaptive Platform. Just google this to find a link.
  2. https://www.cvedetails.com/vulnerability-list/cweid-416/vulnerabilities.html
  3. https://github.com/tsgates/rust.ko
  4. https://github.com/eqrion/cbindgen
  5. https://doc.rust-lang.org/book/
  6. Programming Rust: Fast, Safe Systems Development, O’Reilly Media ISBN: 0636920040385
  7. Awesome Rust, a curated list of Rust code and resources. https://github.com/rust-unofficial/awesome-rust