Introduction
Welcome to 4SE02: Rust for Embedded Systems! 🦀
This course, taught by Guillaume Duc and Samuel Tardieu, is part of the Embedded Systems program at Télécom Paris. Throughout these labs, you’ll discover how Rust’s safety guarantees and zero-cost abstractions make it an excellent choice for embedded development.
What You’ll Learn
In these practical exercises, you’ll:
- Master Rust fundamentals through hands-on coding
- Build real embedded applications for microcontrollers
- Work with LED matrices, serial communication, and real-time systems
- Experience the power of Rust’s type system in preventing bugs at compile time
Course Materials
- Offline access: Download the complete lab materials: book.tar.xz
- Lecture notes: Access the slides by following this link
This content is reserved for students of the Institut Polytechnique de Paris.
ⓒ 2020-2026 Guillaume Duc and Samuel Tardieu – all rights reserved
Setting Up Your Development Environment
Welcome to your first step in Rust embedded development! In this section, we’ll set up all the tools you need to start building amazing embedded applications. Don’t worry if you’re new to Rust—we’ll guide you through each step.
Git Repository Setup
Before diving into code, let’s get your project repository ready.
❎ Join the course group: Request to join the 4SE02/2526 group on the Telecom GitLab using this link.
Once approved by the instructors, you’ll have your own personal repository to store all your practical work. This is where your embedded Rust journey begins!
Installing rustup - Your Rust Toolchain Manager
rustup is Rust’s official toolchain installer and version manager. Think of it as your Swiss Army knife for managing Rust installations—it handles compiler versions, cross-compilation targets, and all the tools you’ll need.
❎ Install rustup using one of these methods:
- From your Linux distribution’s package manager, or
- From the rustup.rs website
If you choose the website installation, remember to reload your shell environment afterwards so that the Rust tools are added to your PATH.
💡 Platform flexibility: While this course assumes you’re using GNU/Linux, the Rust ecosystem works great on macOS and Windows too. Feel free to use another OS, but note that we can only provide support for GNU/Linux environments.
💡 Storage tip for school computers: Running out of disk quota? Use the local directory /home/users/LOGIN (replace LOGIN with your username). Set the RUSTUP_HOME environment variable to /home/users/LOGIN/rustup in your shell configuration files to store all Rust data there instead of ~/.rustup. Remember to delete ~/.rustup to free up space. You’ll need to repeat this setup if you log in to a different computer.
Understanding Rust Toolchain Versions
Rust’s compilation toolchain comes in three flavors, each serving a different purpose:
- stable: The production-ready version, rigorously tested and updated every six weeks. This is what you’ll use for this course.
- beta: The testing ground for the next stable release. Great for early adopters who want to test upcoming features.
- nightly: The bleeding edge with experimental features. Some features here will eventually make it to stable, while others are just experiments.
By default, rustup installs the latest stable version. Let’s verify your installation:
$ rustup show
Default host: x86_64-unknown-linux-gnu
rustup home: /usr/local/rustup
installed toolchains
--------------------
stable-x86_64-unknown-linux-gnu (active, default)
active toolchain
----------------
name: stable-x86_64-unknown-linux-gnu
active because: it's the default toolchain
installed targets:
x86_64-unknown-linux-gnu
On a development system, you may find several toolchains and targets installed, for example:
$ rustup show
Default host: x86_64-unknown-linux-gnu
rustup home: /home/sam/.rustup
installed toolchains
--------------------
stable-x86_64-unknown-linux-gnu
beta-x86_64-unknown-linux-gnu
nightly-x86_64-unknown-linux-gnu (active, default)
active toolchain
----------------
name: nightly-x86_64-unknown-linux-gnu
active because: it's the default toolchain
installed targets:
riscv32imac-unknown-none-elf
riscv32imafc-unknown-none-elf
riscv32imc-unknown-none-elf
thumbv7em-none-eabihf
thumbv7m-none-eabi
thumbv8m.main-none-eabihf
x86_64-unknown-linux-gnu
You can update all installed components with rustup update.
❎ Ensure you’re up to date: Run rustup update to get the latest stable version. This is especially important if you already had rustup installed before this course.
Your Rust Toolkit
When you install a Rust toolchain, you get a powerful set of tools:
- cargo: Your command center! This all-in-one tool orchestrates compilation, testing, documentation, and more. You’ll primarily interact with Rust through cargo commands like cargo build, cargo test, and cargo run.
- rustc: The Rust compiler itself (though you’ll rarely call it directly, since cargo does that for you).
- rustdoc: Automatically generates beautiful documentation from your code comments.
- rustfmt: Formats your code according to Rust community standards.
- clippy: A smart linter that catches common mistakes and suggests idiomatic improvements.
Choose Your Code Editor
You’re free to use any editor you prefer! Unless you have already picked an editor (Emacs, Neovim, Lapce, Helix, …), we suggest that you use:
Visual Studio Code with these extensions:
- rust-analyzer: Provides intelligent code completion, inline error checking, and navigation. It’s like having a Rust expert looking over your shoulder!
- Error Lens (optional): Displays errors inline as you type. Helpful but can feel a bit intrusive, so try it and see if you like it.
Code Quality: Formatting and Linting
Rust’s community values consistent, high-quality code. Two tools help you achieve this effortlessly:
Clippy: Your Rust Mentor 🦀
Clippy is an intelligent linter that identifies anti-patterns, inefficient code, and suggests more idiomatic Rust approaches.
⚠️ Important: Run cargo clippy regularly on your code. Address its suggestions or adjust your code until Clippy is satisfied. Think of it as pair programming with an experienced Rustacean!
You have some flexibility:
- Disable specific warnings when you have a good reason: Use #[allow(clippy::some_lint_name)] on specific items (be ready to justify this choice!)
- Enable stricter checks for even better code quality: Run cargo clippy -- -D clippy::pedantic (lint flags go after the -- separator so that they are passed on to Clippy rather than interpreted by Cargo)
Rustfmt: Consistent Formatting Made Easy
⚠️ Keep your code formatted: Simply run cargo fmt to automatically format your code according to Rust’s standard style. No debates about formatting: Rust has agreed on one style for everyone!
Pro tip: Run both cargo fmt and cargo clippy before every commit to keep your code clean and professional.
Your First Rust Program: Fibonacci
Let’s dive into Rust with a classic programming exercise: computing Fibonacci numbers! This hands-on introduction will get you comfortable with Rust’s syntax, tools, and workflows.
Creating Your First Project
Time to create your first Rust project using Cargo.
❎ Create a new project: Run cargo new fibo in your terminal.
This creates a new directory called fibo with a complete binary project structure. (If you wanted a library instead, you’d use cargo new --lib.)
Navigate into your new project directory and let’s explore what Cargo created for you:
Project Structure
- Cargo.toml: The project manifest. It contains metadata (name, authors, version) and will list any dependencies you add.
- Cargo.lock: Generated after compilation, this file locks down the exact versions of dependencies used. This ensures anyone can reproduce your exact build, even months later.
- src/: Your source code lives here. Right now, it just has main.rs with a “Hello, world!” program.
All these files should be committed to version control (git). Cargo even initializes a git repository for you (unless you’re already in one) complete with a .gitignore file.
After compilation, a target/ directory will appear containing build artifacts and binaries. This directory is large and regeneratable, so it’s already in .gitignore—never commit it!
Building and Running
Let’s build your project:
❎ Compile the project: Run cargo build
By default, this creates a debug build—slower but easier to debug—in target/debug/fibo.
❎ Run your program: Execute ./target/debug/fibo and observe the “Hello, world!” output from src/main.rs.
💡 Shortcut: Instead of building and running separately, use cargo run to compile (if needed) and execute in one command!
💡 Release builds: For optimized production code, use cargo build --release or cargo run --release. Release builds are significantly faster but take longer to compile.
Implementing Fibonacci Recursively
Now for the fun part—let’s implement the classic Fibonacci function! As a refresher, the Fibonacci sequence is defined as:
- fibo(0) = 0
- fibo(1) = 1
- fibo(n) = fibo(n-1) + fibo(n-2) for n > 1
❎ Implement the recursive Fibonacci function with this signature:
fn fibo(n: u32) -> u32 {
    // TODO: Your implementation here
}
💡 Rust tip: Remember that if is an expression in Rust: it returns a value! This means you can write if condition { value1 } else { value2 } without explicit return statements. Embrace this functional style!
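To make the tip concrete without giving away the exercise, here is a minimal sketch on unrelated functions. The names `sign` and `parity_label` are hypothetical and exist only for this illustration.

```rust
/// Returns -1, 0 or 1 depending on the sign of `n`.
fn sign(n: i32) -> i32 {
    // The whole if/else chain is a single expression. Because it is the
    // last expression of the function (no trailing semicolon), its value
    // is the return value.
    if n < 0 {
        -1
    } else if n == 0 {
        0
    } else {
        1
    }
}

/// An `if` expression can also be used on the right-hand side of a `let`.
fn parity_label(n: u32) -> String {
    let label = if n % 2 == 0 { "even" } else { "odd" };
    format!("{n} is {label}")
}
```

Both branches of an `if` used as an expression must have the same type, which is why each branch of `sign` ends in an integer.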
Displaying the Sequence
❎ Create a loop in main() that displays Fibonacci values from 0 to 42:
fibo(0) = 0
fibo(1) = 1
fibo(2) = 1
fibo(3) = 2
fibo(4) = 3
fibo(5) = 5
...
fibo(42) = 267914296
Once working, try running in both debug and release modes to see the dramatic speed difference! Release mode should be much faster.
Making It Fast: Iterative Implementation
While elegant, recursive Fibonacci is notoriously slow for larger numbers. Let’s fix that with iteration.
❎ Reimplement fibo iteratively while keeping the same function signature.
Hints to help you succeed:
- Declare variables to track previous Fibonacci numbers
- Use mut to make variables mutable
- Create a loop without using the index: name it _ to avoid compiler warnings about unused variables
- You can return early for base cases (when n < 2)
This version should be significantly faster than the recursive one, even in debug mode!
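The hints above can be seen at work on a different, simpler computation. The function `power_of_two` below is a hypothetical helper chosen only to show the syntax (mutable variable, ignored loop index, early return); it is not the Fibonacci solution.

```rust
/// Computes 2^n by repeated doubling. Valid for n <= 63, since the
/// result must fit in a `u64`.
fn power_of_two(n: u32) -> u64 {
    // Early return for the base case.
    if n == 0 {
        return 1;
    }
    // `mut` is required because `result` is modified inside the loop.
    let mut result: u64 = 1;
    // `_` tells the compiler that the loop index is intentionally unused.
    for _ in 0..n {
        result *= 2;
    }
    result
}
```

The early return is not strictly needed here, since the loop body runs zero times when `n` is 0; it is shown only because the same shape is convenient for base cases.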
Handling Overflow: When Numbers Get Too Big
Let’s explore what happens when Fibonacci numbers exceed what a u32 can hold.
❎ Change the limit from 42 to 50 and run your program.
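Notice something strange at fibo(48)? fibo(47) = 2971215073 is the last value that fits in a u32. This is integer overflow: the next number is too large to fit in a u32. A debug build detects the overflow and panics, while a release build silently wraps around and prints nonsensical numbers.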
Rust provides several elegant ways to handle this:
- Use larger integers: Switch from u32 to u64 (easy but just delays the problem)
- Saturated arithmetic: Operations that hit a boundary (min or max) just stay at that boundary
- Checked arithmetic: Operations that would overflow return an error instead of producing wrong results
Let’s explore options 2 and 3 to see Rust’s safety features in action!
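As a quick preview, the sketch below applies three standard `u32` methods to the same operands. `saturating_add` and `checked_add` are the methods you will look up in the next sections; `wrapping_add` is not mentioned in this lab and is included only as the explicit spelling of wrap-around. The function name `add_flavours` is hypothetical.

```rust
/// Returns the wrapping, saturating and checked sums of `a` and `b`.
fn add_flavours(a: u32, b: u32) -> (u32, u32, Option<u32>) {
    (
        a.wrapping_add(b),   // wraps around modulo 2^32
        a.saturating_add(b), // clamps at u32::MAX
        a.checked_add(b),    // None when the sum does not fit
    )
}
```

With operands that fit, all three agree: `add_flavours(1, 2)` gives `(3, 3, Some(3))`. With `u32::MAX` and `1` they differ: `(0, u32::MAX, None)`.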
Saturated Arithmetic: Staying Within Bounds
Saturating operations “clamp” at the type’s maximum value when overflow would occur.
❎ Find the saturating_add method in the u32 documentation.
❎ Replace your addition with saturated addition and observe the results.
💡 Type suffixes: You can specify numeric literal types with suffixes like 1u32, 42i64, or 3.14f32.
Notice that results stay monotonic (always increasing) but plateau at u32::MAX (2³²-1). The values are wrong, but at least they don’t wrap around wildly!
Checked Arithmetic: Detecting Errors
Checked operations return None when overflow occurs instead of producing incorrect values.
❎ Find the checked_add method in the u32 documentation.
❎ Replace saturated addition with checked_add() followed by .unwrap() to extract the value.
Run your program—it should panic with a runtime error when overflow occurs. Not graceful, but at least it doesn’t silently produce wrong answers!
Handling Overflow Gracefully with Option
Let’s make overflow handling explicit and elegant using Rust’s Option type.
❎ Change the function signature to return Option<u32>:
fn fibo(n: u32) -> Option<u32> {
    // TODO: Return None if overflow would occur,
    // Some(result) otherwise
}
Now your function can communicate “this result doesn’t fit in a u32” by returning None.
❎ Update your main function to stop the loop when a computation fails (returns None).
You can use either:
- A match expression to handle Some(value) and None cases
- An if let Some(value) = fibo(n) statement for cleaner code when you only care about the success case
Perfect! Your program now accurately computes Fibonacci numbers and stops gracefully when values become too large for u32.
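The two ways of consuming an `Option` mentioned above can be sketched on a different computation, so as not to spoil the exercise. All function names below are hypothetical; `checked_mul` is a standard `u32` method.

```rust
/// Doubles `x`, or returns `None` if the result does not fit in a `u32`.
fn checked_double(x: u32) -> Option<u32> {
    x.checked_mul(2)
}

/// Counts how many times 1 can be doubled before a `u32` overflows,
/// using `match` to stop the loop on `None`.
fn doublings_from_one() -> u32 {
    let mut value = 1;
    let mut count = 0;
    loop {
        match checked_double(value) {
            Some(next) => {
                value = next;
                count += 1;
            }
            None => break,
        }
    }
    count
}

/// Uses `if let` when the success case is the interesting one.
fn describe(x: u32) -> String {
    if let Some(doubled) = checked_double(x) {
        format!("{x} doubles to {doubled}")
    } else {
        format!("{x} cannot be doubled in a u32")
    }
}
```

`match` forces you to write both arms, which is useful when `None` needs its own action (here, `break`); `if let` keeps the code shorter when the `None` case is a simple fallback.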
Leveraging the Ecosystem: Using Crates
Now let’s explore one of Rust’s superpowers: its vibrant ecosystem of reusable libraries called crates.
What Are Crates?
A crate is a package of Rust code that can be either:
- A binary crate: An executable program (like the fibo project you created)
- A library crate: Reusable code that other projects can import
Your fibo project is a binary crate with source code in src/main.rs. The Rust community shares thousands of crates on crates.io, making it easy to add powerful functionality to your projects.
Adding Command-Line Argument Parsing
Let’s enhance your fibo program with professional command-line argument parsing using the popular clap crate.
❎ Add clap as a dependency by adding these lines to your Cargo.toml file:
[dependencies]
clap = { version = "4.5.58", features = ["derive"] }
This tells Cargo to:
- Fetch clap version 4.5.58 or newer (but stay below 5.0.0)
- Enable the derive feature (not enabled by default), which allows using #[derive(Parser)] for cleaner code
Want to learn more about version specifications? Check out Cargo’s dependency documentation.
💡 You can also use cargo add clap -F derive on the command line instead of editing the Cargo.toml file by hand.
Using Clap in Your Code
❎ Import the Parser trait at the top of main.rs:
use clap::Parser;
❎ Create a command-line interface using the clap documentation to match this usage pattern:
Compute Fibonacci suite values
Usage: fibo [OPTIONS] <VALUE>
Arguments:
<VALUE> The maximal number to print the fibo value of
Options:
-v, --verbose Print intermediate values
-m, --min <NUMBER> The minimum number to compute
-h, --help Print help
💡 Automatic dependency management: When you specify clap in Cargo.toml, Cargo automatically downloads it along with all of its dependencies, then compiles everything when you build your project. It’s that simple!
The exact dependency versions used are recorded in Cargo.lock, ensuring anyone can rebuild your project with identical dependencies—even years later.
Maintaining Code Quality
Before considering your work complete, let’s ensure it meets Rust community standards:
❎ Run Clippy to catch common mistakes and get suggestions: cargo clippy
- Address any warnings or suggestions it provides
❎ Format your code according to Rust conventions: cargo fmt
💡 Best practice: Run cargo fmt and cargo clippy before every commit to keep your code clean and professional. Many developers configure their editors to run these automatically!
Practice Problems: Mastering Rust Concepts 🧩
These exercises will sharpen your understanding of Rust’s unique features. Create a “problems” project in your repository to work through them.
Lifetimes: Understanding Ownership and Borrowing
Lifetimes are one of Rust’s most distinctive features. They ensure references stay valid without runtime overhead. Let’s explore them through practical problems.
Understanding the trim Method
The trim method on strings removes leading and trailing whitespace. Its signature uses lifetime elision:
fn trim(&self) -> &str;
This is shorthand for the explicit form:
fn trim<'a>(&'a self) -> &'a str;
The lifetime 'a connects the input and output: the returned string slice borrows from the string it came from, so it cannot outlive that string.
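The same relationship can be written out on a user-defined function. The sketch below uses a hypothetical `first_word` helper with the lifetime spelled explicitly; elision would infer exactly the same signature, and Clippy would normally suggest removing the annotation.

```rust
/// Returns the first whitespace-separated word of `text`, or "" if none.
/// The lifetime is written out for illustration only.
fn first_word<'a>(text: &'a str) -> &'a str {
    // `split_whitespace` yields sub-slices of `text`, so each item
    // borrows from the same data and carries the lifetime 'a.
    text.split_whitespace().next().unwrap_or("")
}
```

Because the result borrows from `text`, the compiler refuses any use of the returned slice after the string it points into has been dropped.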
Problem 1: Who Owns the String?
This code looks reasonable, but it won’t compile. Can you figure out why?
fn ret_string() -> String {
String::from(" A String object ")
}
fn main() {
let s = ret_string().trim();
assert_eq!(s, "A String object");
}
Think about it: What’s the lifetime of s? Who owns the underlying string with spaces? Every value in Rust has exactly one owner—when the owner goes out of scope, the value is dropped.
❎ Fix this code so it compiles and s holds the trimmed string.
💡 Hint: You can reuse the same variable name with shadowing!
Problem 2: Choosing Between Alternatives
Sometimes a function returns one of several borrowed values. How do lifetimes work in this case?
❎ Add appropriate lifetime annotations to make this function compile:
fn choose_str(s1: &str, s2: &str, select_s1: bool) -> &str {
if select_s1 { s1 } else { s2 }
}
Important constraint: At call time, s1 and s2 may have different lifetimes. We don’t want to artificially constrain them to have the same lifetime—that would be too restrictive.
Think carefully about what lifetime the return value should have!
Problem 3: Building an Owned-Or-Ref (OOR) Type
This is a meatier challenge that combines enums, generics, and smart pointer traits.
⚠️ For this problem, don’t peek at the standard Cow type: solve it yourself first, then compare your solution!
The goal: Create an OOR type that can efficiently store either a String (owned) or a &str (borrowed), avoiding unnecessary copies when the string already exists.
Step 1: Define the Enum
❎ Create an OOR enum with two variants:
- Owned: stores a String
- Borrowed: stores a &str
You’ll need a generic lifetime parameter. What does it represent? (Think about the lifetime of borrowed data!)
Step 2: Implement Deref
❎ Implement the Deref trait so that OOR dereferences to &str.
Consider: What’s the lifetime of the resulting &str? Why is your choice always safe?
❎ Test it: Verify you can call &str methods directly on OOR objects.
Step 3: Implement DerefMut
This gets trickier!
❎ Implement DerefMut for OOR.
Challenge: If you have a Borrowed variant, you can’t get a &mut str from an immutable &str. You’ll need to convert to an Owned variant with a cloned String first!
Step 4: Comprehensive Test
❎ Verify your implementation passes this test:
// Check Deref for both variants of OOR
let s1 = OOR::Owned(String::from(" Hello, world. "));
assert_eq!(s1.trim(), "Hello, world.");
let mut s2 = OOR::Borrowed(" Hello, world! ");
assert_eq!(s2.trim(), "Hello, world!");
// Check choose
let s = choose_str(&s1, &s2, true);
assert_eq!(s.trim(), "Hello, world.");
let s = choose_str(&s1, &s2, false);
assert_eq!(s.trim(), "Hello, world!");
// Check DerefMut, a borrowed string should become owned
assert!(matches!(s1, OOR::Owned(_)));
assert!(matches!(s2, OOR::Borrowed(_)));
unsafe {
for c in s2.as_bytes_mut() {
if *c == b'!' {
*c = b'?';
}
}
}
assert!(matches!(s2, OOR::Owned(_)));
assert_eq!(s2.trim(), "Hello, world?");
What’s happening here? Notice how s2 starts as Borrowed but becomes Owned when we need mutable access. This is the “clone-on-write” pattern!
These problems will deepen your understanding of Rust’s ownership system. Take your time, think through each step, and don’t hesitate to experiment! 🦀
Building a Virtual Machine in Rust 🤖
Time for a fun challenge! You’re going to build an interpreter for a custom virtual machine. This exercise will strengthen your Rust skills while exploring how computers execute programs at a fundamental level.
What You’ll Create
Your virtual machine will:
- Execute a simple instruction set
- Manage memory and registers
- Process control flow (jumps, conditionals)
- Demonstrate Rust’s power for systems programming
Getting Started
Everything you need is here:
- Machine specification: The complete instruction set and architecture
- Your program template: What you need to implement and how to get started
This is a great opportunity to see how Rust’s type system helps you build reliable, low-level systems. Ready to build your own CPU in software? Let’s go! 🚀
Virtual Machine Architecture 🏗️
Let’s define the architecture of your virtual machine! It’s intentionally simple, making it a great learning project while still being interesting to implement.
The Machine Model
Your VM is a classic von Neumann architecture with these characteristics:
Memory
- Size: 4096 bytes (4KB)
- Address range: 0 to 4095
- Usage: Stores both program code and data (unified memory space)
- Access: 32-bit reads and writes, no alignment required
- Byte order: Little-endian (least significant byte first)
Registers
- Count: 16 general-purpose registers
- Names: r0 through r15
- Width: 32 bits each
- Special: r0 is the Instruction Pointer (IP): it holds the address of the next instruction to execute
💡 Little-endian explained: When storing a 32-bit value like 0x12345678, the least significant byte (0x78) goes in the lowest address, then 0x56, then 0x34, then 0x12 in the highest address.
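The byte order described above matches what the standard library's `to_le_bytes` and `from_le_bytes` methods produce. The wrapper functions below are hypothetical and only serve to make the conversion explicit; how you use these methods in your machine is up to you.

```rust
/// Splits a 32-bit value into its four bytes, lowest address first.
fn to_memory_bytes(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

/// Rebuilds a 32-bit value from four bytes stored lowest address first.
fn from_memory_bytes(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}
```

For instance, `to_memory_bytes(0x12345678)` is `[0x78, 0x56, 0x34, 0x12]`, and `from_memory_bytes` undoes the operation.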
Execution Model: The Fetch-Decode-Execute Cycle
Each step of execution follows this pattern:
- Fetch & Decode: Read the instruction at address IP and decode it
  - Variable-length instructions! Each instruction component (like reg_a) is exactly one byte
- Advance IP: Move IP to point just after the decoded instruction and its arguments
- Execute: Perform the decoded instruction’s operation
This is the classic CPU execution model—the same pattern real processors use!
Error Handling: When Things Go Wrong
Your VM should detect these error conditions and return an error (not panic!):
- ❌ Invalid instruction opcode at IP
- ❌ Instruction doesn’t fit entirely in memory
- ❌ Instruction references an invalid register (> r15)
- ❌ Instruction accesses an invalid memory address (≥ 4096)
Important: Return a Result::Err for these cases—don’t panic! Once an error occurs, the VM should not be used again.
🦀 Rust philosophy: Recoverable errors return
Result, unrecoverable errors panic. VM execution errors are recoverable—the host program can handle them gracefully.
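A minimal sketch of this philosophy, applied to one of the error conditions listed above (invalid register). The `VmError` type and both function names are hypothetical; the provided template defines its own `Error` type, which you should use instead.

```rust
/// Hypothetical error type, for illustration only.
#[derive(Debug, PartialEq)]
enum VmError {
    InvalidRegister(u8),
}

/// Converts an instruction byte into a register index, rejecting
/// anything above r15 with an error instead of panicking.
fn register_index(byte: u8) -> Result<usize, VmError> {
    if byte < 16 {
        Ok(usize::from(byte))
    } else {
        Err(VmError::InvalidRegister(byte))
    }
}

/// Reads a register; `?` forwards the error to the caller.
fn read_register(regs: &[u32; 16], byte: u8) -> Result<u32, VmError> {
    let index = register_index(byte)?;
    Ok(regs[index])
}
```

The indexing in `read_register` cannot panic, because `register_index` only ever returns values below 16.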
The Instruction Set 🔧
Your VM has a minimalist instruction set—just 8 instructions! Don’t let the simplicity fool you; this is enough to write interesting programs.
| Instruction | Opcode | Arguments | Effect |
|---|---|---|---|
| move if | 1 | rᵢ rⱼ rₖ | if rₖ ≠ 0 then rᵢ ← rⱼ |
| store | 2 | rᵢ rⱼ | mem[rᵢ] ← rⱼ |
| load | 3 | rᵢ rⱼ | rᵢ ← mem[rⱼ] |
| loadimm | 4 | rᵢ L H | rᵢ ← extend(signed(H L)) |
| sub | 5 | rᵢ rⱼ rₖ | rᵢ ← rⱼ - rₖ |
| out | 6 | rᵢ | output char(rᵢ) |
| exit | 7 | | exit the program |
| out number | 8 | rᵢ | output decimal(rᵢ) |
Understanding the Examples
All examples below assume these initial register values:
- r1 = 10
- r2 = 25
- r3 = 0x1234ABCD
- r4 = 0
- r5 = 65
All other registers are unused in examples.
When you see 1 1 2 3, it means the instruction consists of 4 consecutive bytes: 1, 1, 2, and 3.
Instruction Details
move if
Format: 1 rᵢ rⱼ rₖ
Operation: Conditional move—if register rₖ contains a non-zero value, copy rⱼ into rᵢ; otherwise do nothing.
Examples:
- 1 1 2 3: Since r3 = 0x1234ABCD (non-zero), r1 becomes 25 (value of r2)
- 1 1 2 4: Since r4 = 0, nothing happens and r1 stays unchanged
💡 This is your conditional instruction! Use it for implementing if-statements and loops.
store
Format: 2 rᵢ rⱼ
Operation: Store the 32-bit value from register rⱼ into memory starting at the address in register rᵢ, using little-endian byte order.
Example:
- 2 2 3: Stores r3 (0x1234ABCD) at addresses [25, 26, 27, 28]:
  - Address 25 ← 0xCD (least significant byte)
  - Address 26 ← 0xAB
  - Address 27 ← 0x34
  - Address 28 ← 0x12 (most significant byte)
load
Format: 3 rᵢ rⱼ
Operation: Load a 32-bit value from memory at the address in register rⱼ into register rᵢ, interpreting bytes as little-endian.
Example:
- 3 1 2: Loads from addresses [25, 26, 27, 28] into r1:
  - If memory contains [0xCD, 0xAB, 0x34, 0x12]
  - Then r1 becomes 0x1234ABCD
💡 load and store are mirror operations: one writes to memory, the other reads from it.
loadimm
4 rᵢ L H: interpret H and L respectively as the high-order and the low-order bytes of a 16-bit signed value, sign-extend it to 32 bits, and store it into register rᵢ.
Examples:
- 4 1 0x11 0x70: store 0x00007011 into register r1
- 4 1 0x11 0xd0: store 0xffffd011 into register r1
Note how sign extension transforms a positive 16-bit value (0x7011 == 28689) into a positive 32-bit value (0x00007011 == 28689) and a negative 16-bit value (0xd011 == -12271) into a negative 32-bit value (0xffffd011 == -12271).
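One possible way to express this sign extension in Rust is to go through the signed 16-bit type, as in the sketch below. The function name `sign_extend` is hypothetical, and other formulations (for example with shifts) are equally valid.

```rust
/// Builds a signed 16-bit value from its low and high bytes, then
/// sign-extends it to 32 bits and returns the raw register content.
fn sign_extend(low: u8, high: u8) -> u32 {
    // Little-endian order: the low byte comes first.
    let value = i16::from_le_bytes([low, high]);
    // Widening i16 to i32 preserves the sign; the final cast to u32
    // keeps the same 32 bits and only changes how they are interpreted.
    i32::from(value) as u32
}
```

With the two examples above, `sign_extend(0x11, 0x70)` gives 0x00007011 and `sign_extend(0x11, 0xd0)` gives 0xffffd011.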
sub
5 rᵢ rⱼ rₖ: store the content of register rⱼ minus the content of register rₖ into register rᵢ
Arithmetic wraps around in case of overflow. For example, 0 - 1 returns 0xffffffff, and 0 - 0xffffffff returns 1.
Examples:
- 5 10 2 1: store 15 into r10 (register r2, which holds 25, minus register r1, which holds 10).
- 5 10 4 1: store -10 (0xfffffff6) into r10 (register r4, which holds 0, minus register r1, which holds 10).
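The wrap-around behaviour described for sub corresponds to the standard `wrapping_sub` method on `u32`. The sketch below checks it against the values quoted in this section; the wrapper name `vm_sub` is hypothetical.

```rust
/// Subtraction with wrap-around on overflow. It never panics, in debug
/// or release builds, unlike the plain `-` operator in a debug build.
fn vm_sub(a: u32, b: u32) -> u32 {
    a.wrapping_sub(b)
}
```

For example, `vm_sub(25, 10)` is 15 and `vm_sub(0, 10)` is 0xfffffff6, the two's complement representation of -10.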
out
6 rᵢ: display the character whose Unicode value is stored in the 8 low bits of register rᵢ on the standard output.
Examples:
- 6 5: output “A” since the 8 low bits of register r5 contain 65, which is the Unicode code point for “A”.
- 6 3: output “Í” since the 8 low bits of register r3 contain 0xCD, which is the Unicode code point for “Í”.
Note: you have to convert the content into a char and display this char.
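A minimal sketch of that conversion, assuming you keep only the 8 low bits and rely on the standard `char::from(u8)` conversion, which maps a byte to the code point of the same value. The function name is hypothetical.

```rust
/// Converts the 8 low bits of a register into the character with that
/// Unicode code point (U+0000 to U+00FF).
fn low_byte_as_char(reg: u32) -> char {
    char::from((reg & 0xff) as u8)
}
```

Going through `char` matters for values above 127: printing the character 'Í' produces the two UTF-8 bytes 0xC3 0x8D, whereas writing the raw byte 0xCD would produce invalid UTF-8.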
exit
7: exit the current program
Example:
7: get out.
out number
8 rᵢ: output the signed number stored in register rᵢ in decimal.
Example:
- 8 5: output “65” since register r5 contains 65.
- 8 3: output “305441741” since register r3 contains 0x1234ABCD.
Note
Some common operations are absent from this instruction set. For example, there is no add operation, but a+b can be computed as a-(0-b). There are no jump or conditional jump operations either; they can be emulated by manipulating the value stored in register r0 (IP).
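The identity a+b = a-(0-b) also holds with 32-bit wrap-around arithmetic, which can be checked in plain Rust. The sketch below mirrors the two sub instructions a VM program would issue; the function name `add_via_sub` is hypothetical.

```rust
/// Computes a + b using only wrapping subtraction, as a program for the
/// virtual machine would have to do.
fn add_via_sub(a: u32, b: u32) -> u32 {
    // First sub: a temporary register receives 0 - b.
    let minus_b = 0u32.wrapping_sub(b);
    // Second sub: a - (0 - b), which equals a + b modulo 2^32.
    a.wrapping_sub(minus_b)
}
```

The result matches `a.wrapping_add(b)` for all inputs, including those that overflow, such as `add_via_sub(u32::MAX, 1)`, which yields 0.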
Your program
Your program will contain both an application and a library:
- The library allows other programs to embed your virtual machine
- The application lets you run programs written for the virtual machine from the command line.
You are given an archive file which contains (in a vm project):
- Cargo.toml: the initial configuration file
- src/main.rs: the main program for the application, which loads a binary file with machine code and executes it
- src/lib.rs: the entry point for the interpreter library, which contains your implementation of the virtual machine
- src/tests/: a directory with many tests, ranging from individual instruction tests to complex tests
- src/examples/: some examples for the virtual machine that you can run when your interpreter is complete
Tests and examples are accompanied by their disassembled counterpart to help you understand what happens (*.bin is the program for the virtual machine, *.dis is the disassembly).
Start by adding the vm Cargo project to your repository and ensure that you can build the program even though it doesn’t do anything useful yet and will contain many warnings:
$ cargo build
You can see the tests fail (hopefully this is a temporary situation) by running:
$ cargo test
Program structure
At any time, make sure that the program and the tests compile, even if they don’t pass successfully yet. In particular, you are not allowed to rename the Machine and Error types, although you will need to modify them to implement this assignment. Similarly, the already documented methods must be kept with their signatures unchanged because they will be used in automated tests.
❎ After creating a new interpreter through interpreter::Machine::new(), the following methods must be implemented:
- step_on(): takes a descriptor implementing Write (for the out and out number instructions), and executes just one instruction
- step(): similar to step_on(), but writes to the standard output
- run_on(): takes a Write-implementing descriptor and runs until the program terminates
- run(): similar to run_on(), but writes to the standard output
- memory() and regs(): return a reference to the current memory and registers content
- set_reg(): sets the value of a register
Do not hesitate to add variants to the Error enumeration to ease debugging. Also, you can implement additional functions on Machine if it helps divide the work.
As far as Machine::new() is concerned, you might be interested in looking at slice::copy_from_slice().
Writing things to the user
For the out and out number opcodes, you will have to write things to a file descriptor (respectively a character and a number). This can be done with the write!() macro, which lets you write into any object whose type implements the Write trait.
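As an illustration of `write!()` with a generic `Write` target, the sketch below writes a character and a signed number into an in-memory `Vec<u8>`, which implements `std::io::Write`. The function names are hypothetical, and how errors are mapped to your own `Error` type is left out.

```rust
use std::io::{self, Write};

/// Writes the character stored in the 8 low bits of `reg`.
fn emit_char<W: Write>(out: &mut W, reg: u32) -> io::Result<()> {
    write!(out, "{}", char::from((reg & 0xff) as u8))
}

/// Writes the content of `reg` as a signed decimal number.
fn emit_number<W: Write>(out: &mut W, reg: u32) -> io::Result<()> {
    write!(out, "{}", reg as i32)
}

/// Collects the output in memory instead of printing it.
fn demo() -> io::Result<Vec<u8>> {
    let mut buffer: Vec<u8> = Vec::new();
    emit_char(&mut buffer, 65)?;
    emit_number(&mut buffer, 0xfffffff6)?;
    Ok(buffer)
}
```

Writing to a `Vec<u8>` is what makes output testable: a test can compare the buffer with the expected bytes, while the command-line program passes the standard output instead.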
Suggested work program
Several tests are provided in the tests directory:
- assignment.rs contains all the examples shown in the specification. You should try to concentrate on this one first and implement instructions in the same order as in the specification (and the test) until you pass this test. You can run only this test by using cargo test --test assignment.
- basic_operations.rs checks that all instructions are implemented correctly. For example, it will attempt to read and write past the virtual machine memory, or use an invalid register, and check that you do not allow it.
- complex_execution.rs will load binary images and execute them using your virtual machine.
How to debug more easily
In order to ease debugging, you can use two existing crates, log and pretty_env_logger.
log provides you with a set of macros that let you format debugging information with different severities:
- log::info!(…) is for regular information
- log::debug!(…) is for data you’d like to see when debugging
- log::trace!(…) is for more verbose cases
- …
See the documentation for complete information.
pretty_env_logger is a back-end for log which gives you nice colored messages and is configured through environment variables.
You can initialize it at the beginning of your main program by calling pretty_env_logger::init(). Then, you can set an environment variable to determine the severities you want to see:
$ RUST_LOG=debug cargo run mytest.bin
You’ll then see all messages with severity debug and above. Once again, the documentation is online.
💡 Note on the Result type
You might notice a redefinition of the Result type:
type Result<T, E = Error> = std::result::Result<T, E>;
This defines a local Result type whose second generic parameter has a default value: your own Error type. It means that you can write Result<T> instead of Result<T, Error> for the return type of your functions. Also, a user of your library will be able to reference such a type as interpreter::Result<T> instead of interpreter::Result<T, interpreter::Error>.
This kind of shortcut is very common in Rust. For example, the std::io module defines:
type Result<T> = std::result::Result<T, std::io::Error>;
so that you can use std::io::Result<usize> for an I/O operation which returns a number of bytes instead of std::result::Result<usize, std::io::Error>.
Similarly, the std::fmt module goes even further and defines
type Result = std::result::Result<(), std::fmt::Error>;
so that you can use std::fmt::Result (without generic parameters) in a formatting operation instead of std::result::Result<(), std::fmt::Error>.
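A small self-contained sketch of an alias with a defaulted error parameter. The module name `interpreter` follows the text above, but the `Error::TooLarge` variant and both functions are hypothetical and are not part of the provided template.

```rust
mod interpreter {
    /// Hypothetical error type, for illustration only.
    #[derive(Debug, PartialEq)]
    pub enum Error {
        TooLarge(u32),
    }

    /// Alias whose second parameter defaults to this module's `Error`.
    pub type Result<T, E = Error> = std::result::Result<T, E>;

    /// `Result<u8>` here means `std::result::Result<u8, Error>`.
    pub fn to_byte(value: u32) -> Result<u8> {
        if value <= u32::from(u8::MAX) {
            Ok(value as u8)
        } else {
            Err(Error::TooLarge(value))
        }
    }
}

/// The default can still be overridden by naming another error type.
fn to_byte_or_message(value: u32) -> interpreter::Result<u8, String> {
    interpreter::to_byte(value).map_err(|error| format!("{error:?}"))
}
```

Keeping the second parameter, rather than fixing it as `std::io` does, means the alias can still be used where a different error type is needed, as in `to_byte_or_message`.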
LED Matrix Lab: Rust in the Real World 🚀
Welcome to the main event! In this comprehensive lab, you’ll build a real embedded application that controls an LED matrix display. This is where Rust truly shines—combining safety with the performance needed for embedded systems.
What You’ll Build
You’re about to recreate what’s done in C in the 4SE07 bare board programming lab (French), but with the power and safety of Rust. We’ll use higher-level abstractions while skipping unnecessary complexity, letting you focus on the interesting parts.
By the end of this lab, you’ll have:
- Direct hardware control through Rust
- Real-time image display on an LED matrix
- Serial communication handling
- Understanding of embedded Rust patterns
Let’s get started! 🦀
Initial Setup: Preparing Your Embedded Toolkit
Before diving into embedded development, we need to install some specialized tools. Think of these as your embedded Rust toolbox—each tool serves a specific purpose in the development workflow.
Essential Tools Installation
Let’s install the tools you’ll need for embedded development.
❎ Install the following tools using the instructions below.
cargo-binutils: Binary Inspection Tools
cargo-binutils provides helpful subcommands like cargo size to inspect your compiled binaries—crucial for embedded work where every byte counts! It requires an additional LLVM component:
$ rustup component add llvm-tools
$ cargo install cargo-binutils
probe-rs: Your Hardware Communication Bridge
These powerful tools let you flash programs onto your microcontroller and debug them:
$ cargo install probe-rs-tools
💡 Linux users: On Debian and Ubuntu systems, you may need to install the libudev-dev package for probe-rs to work correctly. Run sudo apt-get install libudev-dev if you encounter issues.
Creating Your Project
Time to create the project structure for your LED matrix controller!
❎ Create a new library project called tp-led-matrix in your git repository.
Not sure about the arguments? Use cargo new --help to see the option for creating a library project (hint: it’s --lib).
Development Workflow Expectations
Throughout this lab, maintain high code quality with these practices:
- Compile frequently: Verify your code builds without warnings after each change
- Format consistently: Run cargo fmt to keep formatting perfect
- Catch issues early: Use cargo clippy to get expert suggestions on improving your code
These aren’t just suggestions—they’re professional Rust development practices!
Going Bare Metal: The no_std Environment
Embedded systems don’t have operating systems or standard libraries. Your program runs directly on the hardware! We need to tell Rust we’re working in this “bare metal” environment.
❎ Declare no_std in your library by adding this inner attribute to src/lib.rs:
#![no_std]
💡 Why #! instead of #?: The ! makes this an inner attribute that applies to the entire module (your library), rather than to a specific item. That’s why it goes at the very top of your file!
This tells Rust: “We’re not using the standard library—we’re working directly with hardware.” Welcome to embedded development!
Building Visual Data Structures
Before we can display anything on the LED matrix, we need to create the fundamental data types for representing visual information. Think of this as building the vocabulary your program will use to “speak” to the display.
Module Organization
❎ Create a public image module in your project.
All the types in this section will live in this module. We’re building two key structures:
- Color: Represents a single RGB pixel with red, green, and blue components
- Image: Represents a complete 8×8 image made of 64 colored pixels
Later, we’ll reexport these from the library’s top-level module for easier use. For now, just create them in the image module—don’t reexport anything yet.
Ready? Let’s build your first embedded data structures! 🎨
The Color Type: Representing RGB Pixels
Every pixel on the LED matrix displays a color made from mixing red, green, and blue light. Let’s create a Rust type to represent this!
Basic Color Structure
❎ Create an image::Color structure with three unsigned byte fields for the primary colors: r, g, and b.
Making Color Efficient with Traits
Since a Color is just 3 bytes, copying it is extremely cheap—much faster than borrowing in many cases!
❎ Derive Copy and Clone for Color.
💡 Copy types can be duplicated by simply copying bits, which is perfect for small types. Note that Copy requires Clone, so you need both traits.
A Sensible Default
What’s the default color when you create a new Color? Black (all zeros) makes perfect sense—it’s the absence of light.
❎ Derive Default for Color to get this behavior automatically.
Primary Color Constants
Let’s define some helpful constants for the primary colors.
❎ Implement three public constants on Color:
- Color::RED
- Color::GREEN
- Color::BLUE
Initialize each with the appropriate RGB values (full intensity for one component, zero for others).
⚠️ Module organization tip: If you put your image module code in a file named image.rs, don’t wrap it in pub mod image { … } inside that file! That would create a nested image::image module. The file image.rs already defines the image module, so just put the module’s contents directly in the file.
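One possible shape for the Color type described in this section is sketched below. The Debug, PartialEq, and Eq derives and the public fields are assumptions added to make the sketch easy to test; the lab only asks for Copy, Clone, and Default.

```rust
/// One RGB pixel; each component is an unsigned byte.
/// `Default` yields all zeros, i.e. black.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

impl Color {
    /// Full-intensity primaries: one component at 255, the others at 0.
    pub const RED: Color = Color { r: 255, g: 0, b: 0 };
    pub const GREEN: Color = Color { r: 0, g: 255, b: 0 };
    pub const BLUE: Color = Color { r: 0, g: 0, b: 255 };
}
```

Since Color is Copy, assigning it to another variable leaves the original usable, which keeps later pixel manipulation free of explicit clone() calls.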
Gamma Correction: Making Colors Look Right
Human perception of brightness isn’t linear—we’re much more sensitive to changes in dark colors than bright ones. LED matrices need gamma correction to display colors that look natural to our eyes.
We’ve prepared a gamma correction table that works perfectly with your LED matrix. It maps each input brightness value (0-255) to a perceptually corrected output value.
❎ Add a gamma module containing:
- The gamma correction table from the link above
- A function pub fn gamma_correct(x: u8) -> u8 that returns the corrected value from the table
❎ Implement gamma correction for Color by adding this method:
pub fn gamma_correct(&self) -> Self
This method should apply gamma::gamma_correct to all three color components (r, g, b) and return a new corrected Color.
💡 The &self parameter means this is a method called on a Color instance: my_color.gamma_correct(). The Self return type is shorthand for Color: it returns the same type as the receiver.
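The sketch below shows how the Color method can delegate to the gamma module. The course table is not reproduced here: as a stand-in, gamma::gamma_correct computes a plain power-law curve with an assumed exponent of 2.2, using std floating-point functions on a host machine, so its values will likely differ from the provided table.

```rust
mod gamma {
    /// Stand-in for the course-provided lookup table (assumption):
    /// a power-law curve with exponent 2.2, computed with `std`.
    pub fn gamma_correct(x: u8) -> u8 {
        let normalized = f32::from(x) / 255.0;
        (normalized.powf(2.2) * 255.0).round() as u8
    }
}

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

impl Color {
    /// Corrects each component independently and returns a new `Color`.
    pub fn gamma_correct(&self) -> Self {
        Color {
            r: gamma::gamma_correct(self.r),
            g: gamma::gamma_correct(self.g),
            b: gamma::gamma_correct(self.b),
        }
    }
}
```

With a real table, the body of gamma::gamma_correct would reduce to a single array lookup indexed by x.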
Color Arithmetic: Making Colors Vibrant
Imagine you want to dim a color to 50% brightness, or brighten it to 150%. We can do this elegantly by implementing multiplication and division operations on colors!
The Challenge of no_std
Without the standard library, we lose access to some floating-point operations like f32::round(). We’ll use the micromath crate to get these back.
❎ Add the micromath crate to your project’s dependencies.
❎ Import micromath::F32Ext in your image module to gain access to floating-point operations.
Implementing Color Multiplication
Let’s implement the * operator for Color multiplied by f32. For example, Color::RED * 0.5 should give you a half-brightness red.
❎ Implement the core::ops::Mul<f32> trait on Color.
Your implementation should:
- Multiply each RGB component by the floating-point value
- Round to the nearest integer (use the round() method from F32Ext)
- Clamp values to stay within 0-255 range (hint: f32::clamp() is helpful!)
- Return a new Color with the adjusted components
Consider writing a helper function to handle one component at a time—it’ll make your code cleaner.
Implementing Color Division
Division should work similarly: Color::BLUE / 2.0 gives you half-intensity blue.
❎ Implement the core::ops::Div<f32> trait on Color.
💡 Smart implementation: You can implement division in terms of multiplication! color / x is the same as color * (1.0 / x). Reuse your multiplication code for a cleaner implementation.
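The two operators can be sketched as follows. This is a host-side version under stated assumptions: it relies on the inherent f32::round from std, whereas on the no_std target the same call would resolve through micromath::F32Ext; the helper name scale is hypothetical.

```rust
use core::ops::{Div, Mul};

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

/// Scales one component, rounds to nearest, and clamps to 0..=255.
fn scale(component: u8, factor: f32) -> u8 {
    (f32::from(component) * factor).round().clamp(0.0, 255.0) as u8
}

impl Mul<f32> for Color {
    type Output = Color;

    fn mul(self, factor: f32) -> Color {
        Color {
            r: scale(self.r, factor),
            g: scale(self.g, factor),
            b: scale(self.b, factor),
        }
    }
}

impl Div<f32> for Color {
    type Output = Color;

    /// Division reuses multiplication: `color / x == color * (1.0 / x)`.
    fn div(self, divisor: f32) -> Color {
        self * (1.0 / divisor)
    }
}
```

Clamping before the cast makes the saturation explicit instead of relying on the behavior of the as conversion for out-of-range values.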
Excellent! Your Color type is now complete and ready to create beautiful displays! 🎨
🦀 Advanced Note: Traits, Visibility, and Namespace Pollution
When you write use micromath::F32Ext;, you bring the F32Ext trait into scope. This trait defines methods like round() on the f32 type. Importing the trait makes these methods available—but also adds F32Ext to your namespace.
If you want the methods but don’t want the name cluttering your namespace, there’s a clever trick:
// Import the F32Ext trait methods without importing the name itself
use micromath::F32Ext as _;
The as _ means “bring this into scope but don’t bind it to any name.” The methods still work, but F32Ext itself isn’t part of your namespace. Neat!
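The same mechanism can be tried on a host without micromath. In this sketch, the ext module and its Halve trait are hypothetical stand-ins for micromath and F32Ext.

```rust
mod ext {
    /// Hypothetical extension trait standing in for `micromath::F32Ext`.
    pub trait Halve {
        fn halve(self) -> Self;
    }

    impl Halve for f32 {
        fn halve(self) -> f32 {
            self / 2.0
        }
    }
}

// Brings the trait's methods into scope without binding the name `Halve`.
use ext::Halve as _;

pub fn half_of(x: f32) -> f32 {
    // Resolves to `Halve::halve` even though `Halve` itself is not nameable here.
    x.halve()
}
```

Writing Halve anywhere outside the ext module would fail to compile, which is exactly the point: the methods are available, the name is not.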
The Image Type: Working with 8×8 Displays
Now that we have pixels (Color), let’s build a complete image! Our LED matrix is 8×8 pixels, so we need a structure to hold all 64 pixels together.
Basic Image Structure
❎ Create a public image::Image structure containing a single unnamed field: an array of 64 Color pixels.
Structures with unnamed fields (called tuple structs) are declared like this:
struct Image([Color; 64]);
With this definition, if im is an Image, then im.0 accesses the underlying array. It’s like tuple field access (.0 for first field, .1 for second, etc.).
Creating Solid-Color Images
Let’s add a convenient constructor for images filled with a single color.
❎ Implement a public associated function on Image:
pub fn new_solid(color: Color) -> Self
This should return an Image where all 64 pixels are set to the given color.
The Default Trait
The Default trait is perfect for images—a default image should be all black pixels. Unfortunately, Rust has a technical limitation: it can’t automatically derive Default for arrays longer than 32 elements. No problem—we’ll implement it manually!
❎ Manually implement the Default trait for Image.
Your implementation should return an image filled with the default color (which is black, since Color defaults to all zeros).
💡 You can use your new_solid function here: Self::new_solid(Color::default())
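A minimal sketch of the constructor and the manual Default implementation could look like this; the extra derives on Color are assumptions added for testing.

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

pub struct Image([Color; 64]);

impl Image {
    /// Builds an image where every pixel has the same color.
    pub fn new_solid(color: Color) -> Self {
        // `[value; N]` repeats a `Copy` value N times.
        Image([color; 64])
    }
}

impl Default for Image {
    /// A default image is all black, since `Color::default()` is all zeros.
    fn default() -> Self {
        Self::new_solid(Color::default())
    }
}
```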
Accessing Individual Pixels
We want intuitive pixel access using syntax like my_image[(row, col)]. Rust’s Index and IndexMut traits make this possible, and they accept any type as an index—a (usize, usize) tuple is perfect for our 2D grid!
❎ Implement core::ops::Index<(usize, usize)> for Image with output type Color.
This enables reading pixels: let pixel = image[(2, 3)];
❎ Implement core::ops::IndexMut<(usize, usize)> for Image.
This enables writing pixels: image[(2, 3)] = Color::RED;
💡 Note: IndexMut doesn’t specify an output type because it must match the one from Index. You can only implement IndexMut on types that also implement Index with the same index type.
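Tuple indexing can be sketched as below. The row-major layout (pixel at row * 8 + col) and the explicit bounds check are assumptions of this sketch; the check prevents an out-of-range column from silently landing in the next row.

```rust
use core::ops::{Index, IndexMut};

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

pub struct Image([Color; 64]);

impl Index<(usize, usize)> for Image {
    type Output = Color;

    fn index(&self, (row, col): (usize, usize)) -> &Color {
        assert!(row < 8 && col < 8, "pixel index out of range");
        &self.0[row * 8 + col]
    }
}

impl IndexMut<(usize, usize)> for Image {
    // No `Output` here: it is inherited from the `Index` implementation.
    fn index_mut(&mut self, (row, col): (usize, usize)) -> &mut Color {
        assert!(row < 8 && col < 8, "pixel index out of range");
        &mut self.0[row * 8 + col]
    }
}
```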
Row Access for Display Scanning
LED matrices typically display images one row at a time. Let’s provide efficient access to entire rows!
❎ Add a row accessor method to Image:
pub fn row(&self, row: usize) -> &[Color]
This should return a slice referencing the pixels in the specified row.
💡 Lifetimes and safety: Notice how the returned reference borrows from self? Rust ensures the reference can’t outlive the image: automatic memory safety with zero runtime cost!
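Assuming the pixels are stored row by row, the accessor reduces to slicing the underlying array, as in this sketch:

```rust
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

pub struct Image([Color; 64]);

impl Image {
    /// Returns the 8 pixels of `row` as a slice borrowed from `self`.
    /// Panics if `row` is 8 or more, because the range is out of bounds.
    pub fn row(&self, row: usize) -> &[Color] {
        &self.0[row * 8..(row + 1) * 8]
    }
}
```

No pixel is copied: the returned slice points into the image, and the elided lifetime ties it to the borrow of self.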
Creating a Gradient for Testing
Let’s build a visual test pattern: a gradient that fades from a color to black.
❎ Implement a gradient constructor:
pub fn gradient(color: Color) -> Self
Each pixel should contain the reference color divided by (1 + row * row + col). Use the pixel access methods you just implemented (image[(row, col)]) to build this programmatically.
This creates a nice visual pattern perfect for testing your display!
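Putting the pieces together, a gradient constructor might be sketched as follows. It is a host-side version: division uses std rounding instead of micromath, and the row-major indexing is an assumption.

```rust
use core::ops::{Div, Index, IndexMut};

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

impl Div<f32> for Color {
    type Output = Color;

    fn div(self, d: f32) -> Color {
        let f = |c: u8| (f32::from(c) / d).round().clamp(0.0, 255.0) as u8;
        Color { r: f(self.r), g: f(self.g), b: f(self.b) }
    }
}

pub struct Image([Color; 64]);

impl Index<(usize, usize)> for Image {
    type Output = Color;

    fn index(&self, (row, col): (usize, usize)) -> &Color {
        &self.0[row * 8 + col]
    }
}

impl IndexMut<(usize, usize)> for Image {
    fn index_mut(&mut self, (row, col): (usize, usize)) -> &mut Color {
        &mut self.0[row * 8 + col]
    }
}

impl Image {
    /// Each pixel is the reference color divided by (1 + row * row + col).
    pub fn gradient(color: Color) -> Self {
        let mut image = Image([Color::default(); 64]);
        for row in 0..8 {
            for col in 0..8 {
                image[(row, col)] = color / (1 + row * row + col) as f32;
            }
        }
        image
    }
}
```

The divisor grows quadratically with the row, so the pattern fades much faster from top to bottom than from left to right.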
Viewing Images as Raw Bytes
From the 4SE07 lab, we know we’ll receive image data from the serial port byte by byte. It would be much easier if we could view our Image as raw bytes too!
Understanding Memory Layout
Rust is allowed to reorder, pad, or otherwise rearrange struct fields for optimization. Right now, we don’t know how Color is organized in memory. Maybe each field uses 32 bits instead of 8? Maybe g comes before r? We need to take control of the memory layout.
❎ Add a repr(C) attribute to Color.
This forces Rust to use C-compatible representation, which guarantees:
- Each field is exactly 8 bits (one byte)
- Fields are packed with one-byte alignment
- Fields appear in the order we declared: r, then g, then b
Perfect for hardware interfacing!
Ensuring Image Layout
For Image, we’re in good shape. Rust guarantees that arrays are laid out according to their element type’s size and alignment. With our repr(C) on Color, this means the three bytes of pixel 0 are immediately followed by the three bytes of pixel 1, and so on—exactly what we need!
However, we must ensure Image uses the same representation as its inner array.
❎ Add a repr(transparent) attribute to Image.
This tells Rust: “Use the exact same memory layout as your single non-zero-sized field.” The Image wrapper becomes zero-cost!
Implementing Byte Access
Now let’s implement traits that let us view an Image as an array of 192 bytes (8 rows × 8 columns × 3 bytes per pixel).
❎ Implement AsRef<[u8; 192]> for Image.
You’ll need to use core::mem::transmute() to reinterpret self as a reference to a byte array. This is an unsafe function because we’re telling Rust “trust me, I know this is safe”—and with our repr attributes, it genuinely is!
⚠️ Unsafe code: transmute is powerful but dangerous. Only use it when you’ve carefully ensured memory layouts match, as we have here with our repr attributes.
❎ Implement AsMut<[u8; 192]> for Image the same way.
This provides mutable byte access for filling the image from serial data.
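The byte views can be sketched as below. The two compile-time assertions are an addition of this sketch, not a lab requirement: they make the build fail if the layout assumptions behind the transmutes ever stop holding.

```rust
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

#[repr(transparent)]
pub struct Image([Color; 64]);

// Layout checks: 64 pixels * 3 bytes, with byte alignment.
const _: () = assert!(core::mem::size_of::<Image>() == 192);
const _: () = assert!(core::mem::align_of::<Image>() == 1);

impl AsRef<[u8; 192]> for Image {
    fn as_ref(&self) -> &[u8; 192] {
        // SAFETY: `Image` is `repr(transparent)` over `[Color; 64]` and
        // `Color` is `repr(C)` with three `u8` fields, so an `Image` is
        // exactly 192 initialized bytes with alignment 1.
        unsafe { core::mem::transmute::<&Image, &[u8; 192]>(self) }
    }
}

impl AsMut<[u8; 192]> for Image {
    fn as_mut(&mut self) -> &mut [u8; 192] {
        // SAFETY: same layout argument; any byte pattern is a valid `Color`.
        unsafe { core::mem::transmute::<&mut Image, &mut [u8; 192]>(self) }
    }
}
```

Writing bytes 3, 4, and 5 through the mutable view therefore updates the r, g, and b fields of pixel 1.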
🎉 Congratulations! You now have a rock-solid Image type with:
- Safe pixel access
- Efficient row access
- Raw byte conversion for hardware communication
This solid foundation will make the rest of the lab much smoother. Great work! 🦀
Reexporting Types for Easy Access
Your library users will want to use your Color and Image types. Let’s make their lives easier by reexporting these types at the library’s top level!
Why Reexport?
Without reexporting, users would need to write:
use tp_led_matrix::image::Color;
use tp_led_matrix::image::Image;
That’s verbose! By reexporting from lib.rs, they can simply write:
use tp_led_matrix::{Color, Image};
Much cleaner!
Implementation
❎ Add public re-exports in lib.rs using pub use to expose Color and Image at the crate root.
This is a common Rust pattern—organize your internal modules however makes sense for implementation, then present a clean, flat API to users. Best of both worlds! 🎯
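The effect of a re-export can be tried in a single file, with an inline module standing in for src/image.rs. The public tuple field on Image is an assumption made only so that the sketch can build an image from outside the module.

```rust
// Stand-in for src/image.rs: the module where the types are defined.
pub mod image {
    #[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
    pub struct Color {
        pub r: u8,
        pub g: u8,
        pub b: u8,
    }

    pub struct Image(pub [Color; 64]);
}

// What lib.rs would contain: expose both types at the crate root.
pub use image::{Color, Image};
```

Both paths name the same type, so image::Color and the re-exported Color are interchangeable.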
Running on Real Hardware: Embedded Mode 🎯
Now comes the exciting part—running your Rust code on actual hardware! We’re moving from simulation to the real world, where your program will control a physical LED matrix on an IoT board.
The Journey Ahead
Getting your code onto the board requires several setup steps, but don’t worry—once configured, Cargo handles everything automatically. Here’s our roadmap:
- Configure the toolchain: Set up Rust to generate ARM microcontroller code
- Upload to the board: Flash your program using Segger JLink tools
- Display something: Make your first pixels light up!
- Optimize the setup: Streamline your development workflow
- Configure peripherals: Access the hardware through Rust’s Hardware Abstraction Layer
- Light the LED matrix: Bring your display to life with GPIO control
Each step builds on the previous one, taking you from “empty project” to “working LED matrix display.” Let’s get started! 🚀
Configuring the Toolchain: Cross-Compiling for ARM
Time to teach Rust how to generate code for your microcontroller! We’ll set up cross-compilation so your programs can run on the ARM Cortex-M4F processor.
Step 1: Installing the ARM Target
Your board uses a STM32L475VGT6 microcontroller with a Cortex-M4F core (the F means it has a hardware floating-point unit—nice!). We need to download the corresponding compilation target.
❎ Add the ARM target using rustup:
$ rustup target add thumbv7em-none-eabihf
This downloads the standard library and compiler components needed for ARM Cortex-M4F chips.
Setting the Default Target
Rather than specifying this target every time we build, let’s make it the default for this project.
❎ Create .cargo/config.toml in your project root with:
[build]
target = "thumbv7em-none-eabihf" # Cortex-M4F/M7F (with FPU)
Now every cargo build will automatically cross-compile for ARM!
❎ Verify it works: Run cargo build and notice the new target/thumbv7em-none-eabihf directory containing your ARM binaries.
Step 2: Building an Executable Program
We can compile a library, but we need an actual runnable program. For embedded systems, this requires:
- Linker script: Tells the linker where code and data go in memory
- Linker arguments: Configures the linking process
- Main program: Your entry point
- Panic handler: What to do when something goes wrong
Sounds like a lot, but the Rust ecosystem makes it straightforward!
Using the Cortex-M Runtime Crate
We could write our own linker script from scratch (like in the 4SE07 lab), but why reinvent the wheel? The cortex-m-rt crate provides everything we need:
- A complete linker script (link.x)
- The #[entry] attribute to mark your main function
- A proper vector table for Cortex-M processors
The linker script includes a memory.x file that describes your chip’s memory layout. We’ll provide this small configuration file.
❎ Add the runtime dependency:
$ cargo add cortex-m-rt
❎ Create memory.x in your project root (next to Cargo.toml):
MEMORY
{
FLASH : ORIGIN = 0x08000000, LENGTH = 1M
RAM : ORIGIN = 0x20000000, LENGTH = 96K
}
This tells the linker where your chip’s flash memory (for code) and RAM (for data) are located.
Configuring the Linker
We need to tell the linker to use the link.x script provided by cortex-m-rt.
❎ Add this section to .cargo/config.toml:
[target.'cfg(all(target_arch = "arm", target_os = "none"))']
rustflags = ["-C", "link-arg=-Tlink.x"]
This applies to all ARM bare-metal targets—exactly what we need!
Adding Peripheral Access
The cortex-m-rt linker scripts need a vector table specific to your microcontroller. We’ll get this from embassy-stm32, which provides complete STM32 support.
❎ Add Embassy STM32 support:
$ cargo add embassy-stm32 --features stm32l475vg
❎ Add critical section support (required by Embassy):
$ cargo add cortex-m --features critical-section-single-core
💡 What’s a critical section? It’s a piece of code that must run atomically (without interruption). Embassy needs a way to implement these for safety.
Writing the Main Program
While a crate can have only one library, it can have multiple executables (binaries). Let’s create our main program!
❎ Configure the executable in Cargo.toml:
[[bin]]
name = "tp-led-matrix"
The double brackets [[bin]] indicate a list item—you could add more executables if needed.
❎ Create src/main.rs with this minimal embedded program:
#![no_std]
#![no_main]

use cortex_m_rt::entry;
use embassy_stm32 as _; // Links Embassy (provides the vector table)

#[panic_handler]
fn panic_handler(_panic_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[entry]
fn main() -> ! {
    panic!("The program stopped");
}
Let’s break this down:
- #![no_std]: We’re not using the standard library
- #![no_main]: Our entry point isn’t the normal fn main()
- #[entry]: Marks our actual entry point (provided by cortex-m-rt)
- -> !: The “never” type: our program runs forever or panics, it never returns
- #[panic_handler]: Defines what happens on panic (here: infinite loop)
💡 Alternative: Instead of writing your own panic handler, you can use the panic-halt crate which does the same thing.
Building Your Embedded Program
Time to compile!
❎ Build in both modes:
$ cargo build # Debug mode
$ cargo build --release # Release mode (optimized)
Checking Binary Size
On embedded systems, code size matters! Let’s see how big our binaries are.
❎ Check the size with the traditional tool:
$ arm-none-eabi-size target/thumbv7em-none-eabihf/debug/tp-led-matrix
$ arm-none-eabi-size target/thumbv7em-none-eabihf/release/tp-led-matrix
Those paths are painful to type! Fortunately, there’s a better way:
❎ Use cargo-size for convenience:
$ cargo size # Debug mode
$ cargo size --release # Release mode
This builds the binary if needed, then shows its size. Much nicer!
Generating Customized Documentation
Here’s a pro tip: The online docs for embassy-stm32 show all STM32 microcontrollers, which can be overwhelming. Generate documentation specifically for your chip!
❎ Generate custom documentation:
$ cargo doc --open
This creates docs tailored to your dependencies and feature flags, only showing what’s actually available on your STM32L475VGT6, and opens the documentation in your browser (thanks to --open).
Try searching for a method like gamma_correct to see your own documented code!
💡 Keep it updated: Rerun cargo doc after updating dependencies or making significant code changes. It’s smart: it only regenerates what changed.
Great! You now have a complete embedded Rust development environment. Your code compiles to ARM, you have proper documentation, and you’re ready to flash it onto hardware! 🚀
Uploading the program to the board using Segger JLink tools
Even though this program does nothing, we want to upload it to the board. For this, we will use the Segger JLink tool suite, as explained in the 4SE07 lab.
❎ Ensure that you have either one of arm-none-eabi-gdb or gdb-multiarch installed on your system. If this is not the case, install it before proceeding.
❎ In a dedicated terminal, launch JLinkGDBServer -device STM32L475VG.
We need to configure gdb so that it connects to the JLinkGDBServer program and uploads the program.
❎ Create a jlink.gdb gdb script containing the commands to connect to JLinkGDBServer, upload and run the debugged program:
target extended-remote :2331
load
mon reset
c
We would like cargo run to automatically launch gdb with the script we just wrote. Fortunately, the runner can be configured as well!
❎ In .cargo/config.toml, add the following to the conditional target section you created earlier:
runner = "arm-none-eabi-gdb -q -x jlink.gdb"
⚠ On some systems, one must use gdb-multiarch instead of arm-none-eabi-gdb; check which executable is available.
❎ Upload and run your program using cargo run while your board is connected. You should be able to interrupt gdb using ctrl-c and see that you are indeed looping in the panic handler function.
Congratulations: you are running your first embedded Rust program on a real board.
Displaying Output: Real-Time Transfer (RTT) 📡
Your program runs on the board, but how do you see what it’s doing? Enter RTT (Real-Time Transfer)—a clever protocol from Segger that lets your microcontroller communicate with your computer through in-memory buffers.
How RTT Works
RTT uses shared memory that the JLink debugging probe continuously scans. It transfers data between your microcontroller and your host computer—fast, efficient, and perfect for debugging!
Setting Up RTT in Rust
The Rust embedded ecosystem provides excellent RTT support through two crates:
- rtt-target: Implements the RTT protocol and provides rprintln!() for formatted output (like println! but over RTT)
- panic-rtt-target: A panic handler that sends panic messages over RTT so you can see exactly what went wrong
❎ Add RTT crates as dependencies:
$ cargo add rtt-target panic-rtt-target
Wiring It Up
❎ Remove your manual panic handler from src/main.rs and import the RTT panic handler:
use panic_rtt_target as _;
The as _ means we’re importing it just to link it in—we don’t need to reference it directly.
❎ Import RTT printing macros and update your main function:
use rtt_target::{rtt_init_print, rprintln};
#[entry]
fn main() -> ! {
    rtt_init_print!();
    rprintln!("Hello, world!");
    panic!("The program stopped");
}
Seeing the Output
❎ Start the RTT client in a terminal:
$ JLinkRTTClient
(Or JLinkRTTClientExe depending on your installation)
This connects to the running JLinkGDBServer and displays output from your board.
❎ Flash and run your program:
$ cargo run --release
You should now see “Hello, world!” followed by the panic message in the RTT client terminal!
🎉 Success! Now you can debug embedded programs just like regular Rust programs—with print statements and panic messages. This will make development so much easier!
Optimizing the setup
We will take some steps to ease our development process and save some time later.
Reduce binary size
Using cargo size and cargo size --release, we can see that the binary produced in release mode is much smaller than the one produced in debug mode. Note that size doesn’t display the debug information, since it is never stored in the target memory.
We would like to use --release to keep an optimized binary, but we would also like to keep the debug information in case we need to use gdb, or to have a better backtrace in case of panic. Fortunately, we can do that with cargo and require that the release profile:
- keeps debug symbols;
- uses link-time-optimization (LTO) to optimize the produced binary even further;
- generates objects one by one to get an even better optimization.
❎ To do so, add the following section to your program Cargo.toml:
[profile.release]
debug = true # symbols are nice and they don't increase the size on the target
lto = true # better optimizations
codegen-units = 1 # better optimizations
From now on, we will always use --release when building binaries and those will be optimized fully and contain debugging symbols.
Make it simpler to run the program
Even though we have configured cargo run so that it runs gdb automatically and uploads our program, we still have to start JLinkGDBServer and JLinkRTTClient. Fortunately, the probe-rs and knurling-rs projects make it easy to develop embedded Rust programs:
- probe-rs lets you manipulate the probes connected to your computer, such as the probe located on your IoT-node board.
- defmt (for deferred formatting) is a logging library and set of tools that lets you log events from your embedded programs and transmit them in an efficient binary format. The formatting for developer consumption is done by tools running on the host rather than on the target.
- probe-rs run is able to get defmt traces using an RTT channel and decode and format them.
Many other programs such as cargo flash or cargo embed exist, but we will not need them here.
❎ Stop the Segger JLink tools. Using the probe-rs executable, check if the probe on your board is properly detected.
❎ Use probe-rs run with the appropriate parameters instead of gdb to upload your program onto the board and run it. Replace your runner in .cargo/config.toml by:
runner = "probe-rs run --chip stm32l475vgtx"
❎ Using cargo run --release, look at your program being compiled, uploaded and run on your board. You should see the messages sent over RTT on your screen.
⚠ You can use ctrl-c to quit probe-rs run.
Use defmt for logging
Instead of using RTT directly, we will use defmt to have a better and efficient logging system.
❎ Remove the rtt-target and panic-rtt-target from your dependencies in Cargo.toml.
❎ Add the defmt and defmt-rtt dependencies to your Cargo.toml.
❎ Add the panic-probe dependency to your Cargo.toml with the print-defmt feature.
defmt-rtt is the RTT transport library for defmt. panic-probe with the print-defmt feature will indicate to probe-rs run the panic message to display using defmt and will tell it to stop in case of a panic.
❎ defmt uses a special section in your executable. In .cargo/config.toml, add the following to your existing rustflags in order to include the provided linker file fragment: "-C", "link-arg=-Tdefmt.x".
❎ Modify your code in src/main.rs to include the following changes:
- Write use panic_probe as _; instead of panic_rtt_target to use the panic-probe crate.
- Write use defmt_rtt as _; to link with the defmt-rtt library.
- Remove use of rtt_target items.
- Remove rtt_init_print!(), and replace rprintln!() with defmt::info!() to print a message.
❎ Run your program using cargo run --release. Notice that you see the panic information, but you do not see the “Hello, world!” message.
By default, defmt only prints errors. The available log levels are trace, debug, info, warn, and error. If you want to see the messages of level info and above (info, warn, and error), you must set the DEFMT_LOG environment variable when building and when running the program. Only the appropriate information will be included at build time and displayed at run time.
❎ Build and run your program using DEFMT_LOG=info cargo run --release. You will see the “Hello, world!” message. Note that you could also have used DEFMT_LOG=trace or DEFMT_LOG=debug if you add more verbose error messages.
❎ Set up the default log level by telling cargo to set the DEFMT_LOG environment variable when using cargo commands. You can do this by adding an [env] section in .cargo/config.toml:
[env]
DEFMT_LOG = "info"
⚠️ Changing the [env] section of .cargo/config.toml will not recompile the program with the new options. Make sure that you use cargo clean when you change the DEFMT_LOG variable.
🎉 Your environment is now fully set up in an efficient way. If needed, you can revert to using gdb and Segger JLink tools, but that should be reserved for extreme cases.
Configuring the Hardware: Unleashing Performance ⚡
So far, your board is running with default settings—using a slow and imprecise 4MHz internal oscillator with most peripherals sleeping. Let’s wake it up and run at full speed!
Understanding Hardware Abstraction Layers
Several crates work together to give you safe, high-level access to your STM32L475VGT6’s hardware:
- cortex-m: Common functionality for all ARM Cortex-M processors (not specific to STM32)
- stm32-metapac: A Peripheral Access Crate (PAC) providing low-level register access for all STM32 chips. You don’t need to add this explicitly because the HAL includes it.
- embassy-stm32: The Hardware Abstraction Layer (HAL) that provides safe, high-level APIs. This is what you’ll use!
Think of it like layers: PAC provides raw register access, HAL builds safe abstractions on top, and you build your application on the HAL.
Setting Up Clock Configuration
Let’s import what we need for clock configuration.
❎ Add imports to main.rs:
use embassy_stm32::rcc::*;
use embassy_stm32::Config;
Running at Maximum Speed
Your STM32L475VGT6 can run at 80MHz—let’s use that full power! The STM32L475VGT6 microcontroller can be clocked from several sources:
- HSE (High-Speed External) clock: an external crystal, oscillator, or other precise clock source. Unfortunately, our board is not fitted with a high-speed crystal/oscillator. Although we could use the precise clock signal coming from the debug probe, it would only work when the probe is powered (i.e., during debugging). 😕
- HSI (High-Speed Internal) clock: an internal 16MHz RC oscillator. It is not very precise and depends on temperature. 🙅
- MSI (Multi-Speed Internal) clock: another internal oscillator whose frequency can be set to several values between 100kHz and 48MHz. It is not precise, but it can be trimmed to within ~0.25% if a precise low-speed oscillator is present. And guess what? We have an LSE (Low-Speed External) oscillator on our board, running at 32.768kHz, so we can use it to stabilize the MSI. 🙂
- PLL (Phase-Locked Loop): this is a system that takes a clock signal as input, pre-divides it, and multiplies it. This faster clock can then be divided again by three different values (P, Q and R). The result of the division by R can be used as a system clock. 🤩
We’ll configure the PLL to take a 4MHz MSI clock as input, divide it by 1, multiply it by 40, then divide by 2 to get 80MHz. We’ll also enable the LSE clock with its default configuration (32.768kHz); the HAL will detect this and automatically use it to stabilize the MSI clock.
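The arithmetic of this clock chain can be checked on a host with a small helper. The function name pll_r_output_hz and the intermediate variable names are hypothetical; they are not part of embassy-stm32.

```rust
/// Frequency on the PLL "R" output, in Hz: source / prediv * mul / divr.
pub fn pll_r_output_hz(source_hz: u32, prediv: u32, mul: u32, divr: u32) -> u32 {
    let pll_input = source_hz / prediv; // 4 MHz / 1 = 4 MHz
    let multiplied = pll_input * mul; // 4 MHz * 40 = 160 MHz
    multiplied / divr // 160 MHz / 2 = 80 MHz
}
```

With the values chosen here, pll_r_output_hz(4_000_000, 1, 40, 2) evaluates to 80_000_000, which matches the 80MHz target.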
❎ Replace your main() function with this clock-configured version:
#[entry]
fn main() -> ! {
    defmt::info!("defmt correctly initialized");
    // Setup the clocks at 80MHz using MSI, stabilized by the LSE:
    // 4MHz (MSI) / 1 * 40 / 2 = 80MHz. The flash wait
    // states will be configured accordingly.
    let mut config = Config::default();
    config.rcc.msi = Some(MSIRange::RANGE4M); // MSI at 4MHz
    config.rcc.ls = LsConfig::default_lse(); // LSE at 32.768kHz
    config.rcc.pll = Some(Pll {
        source: PllSource::MSI, // 4MHz
        prediv: PllPreDiv::DIV1, // 4MHz / 1 = 4MHz
        mul: PllMul::MUL40, // 4MHz / 1 * 40 = 160MHz
        divp: None,
        divq: None,
        divr: Some(PllRDiv::DIV2), // 4MHz / 1 * 40 / 2 = 80MHz
    });
    config.rcc.sys = Sysclk::PLL1_R;
    embassy_stm32::init(config);
    panic!("Everything configured");
}
What’s happening here?
- We create a default Config and customize the clock settings.
- We select MSI at 4MHz and enable the LSE, which the HAL uses to stabilize (trim) the MSI.
- We enable the PLL, set its source to the MSI, and don’t pre-divide (DIV1) the source.
- The PLL multiplies 4MHz by 40 (=160MHz).
- We then divide by 2 to get our target 80MHz.
- We select the R output of the PLL as our system clock source.
- embassy_stm32::init(config) applies all these settings and configures flash wait states automatically.
🎉 Congratulations! Your microcontroller now runs at 80MHz instead of the default 4MHz—that’s 20× faster, with a clock precision within ~0.25% instead of the default ~1%-3% for the MSI alone. Your program does the same thing as before, but now you have the performance headroom for real-time tasks like driving an LED matrix.
Time to make those LEDs shine! 💡
GPIO and the LED matrix
We will now configure and program our LED matrix. It uses 13 GPIO pins on three different ports.
HAL and peripherals
The embassy_stm32::init() function that you have used earlier returns a value of type Peripherals. This is a large structure which contains every peripheral available on the microcontroller.
❎ Store the peripherals in a variable named p:
let p = embassy_stm32::init(config);
In this variable, you will find for example a field named PB0 (p.PB0). This field has type embassy_stm32::Peri<'static, embassy_stm32::peripherals::PB0>: this is the type of the pin B0. Each pin will have its own type, which means that you will not use one instead of another by mistake.
HAL and GPIO configuration
A pin is configured through types found in the embassy_stm32::gpio module. For example, you can configure pin PB0 as an output with an initial low state and a very high switching speed by doing:
// pin will be of type Output<'static>
let mut pin = Output::new(p.PB0, Level::Low, Speed::VeryHigh);
// Set output to high
pin.set_high();
// Set output to low
pin.set_low();
If pin is dropped, it will be automatically deconfigured and set back as an input.
🦀 The lifetime parameter 'a in Output<'a> represents the lifetime of the pin that we have configured as output. In our case, the lifetime is 'static as we work directly with the pins themselves. But sometimes, you get the pin from a structure which has a limited lifetime, and this is reflected in 'a.
Matrix module
❎ Create a public matrix module.
❎ In the matrix module, import embassy_stm32::gpio::* as well as tp_led_matrix::{Color, Image} (from your library) and define the Matrix structure. It is fully given here to avoid a tedious manual copy operation, as well as all the functions you will have to implement on a Matrix:
pub struct Matrix<'a> {
sb: Output<'a>,
lat: Output<'a>,
rst: Output<'a>,
sck: Output<'a>,
sda: Output<'a>,
rows: [Output<'a>; 8],
}
impl Matrix<'_> {
/// Create a new matrix from the control registers and the individual
/// unconfigured pins. SB and LAT will be set high by default, while
/// other pins will be set low. After 100ms, RST will be set high, and
/// the bank 0 will be initialized by calling `init_bank0()` on the
/// newly constructed structure.
/// The pins will be set to very high speed mode.
#[expect(clippy::too_many_arguments)] // Necessary to avoid a Clippy warning
pub fn new(
pa2: Peri<'static, PA2>,
pa3: Peri<'static, PA3>,
pa4: Peri<'static, PA4>,
pa5: Peri<'static, PA5>,
pa6: Peri<'static, PA6>,
pa7: Peri<'static, PA7>,
pa15: Peri<'static, PA15>,
pb0: Peri<'static, PB0>,
pb1: Peri<'static, PB1>,
pb2: Peri<'static, PB2>,
pc3: Peri<'static, PC3>,
pc4: Peri<'static, PC4>,
pc5: Peri<'static, PC5>,
) -> Self {
// Configure the pins, with the correct speed and their initial state
todo!()
}
/// Make a brief high pulse of the SCK pin
fn pulse_sck(&mut self) {
todo!()
}
/// Make a brief low pulse of the LAT pin
fn pulse_lat(&mut self) {
todo!()
}
/// Send a byte on SDA starting with the MSB and pulse SCK high after each bit
fn send_byte(&mut self, pixel: u8) {
todo!()
}
/// Send a full row of bytes in BGR order and pulse LAT low. Gamma correction
/// must be applied to every pixel before sending them. The previous row must
/// be deactivated and the new one activated.
pub fn send_row(&mut self, row: usize, pixels: &[Color]) {
todo!()
}
/// Initialize bank0 by temporarily setting SB to low and sending 144 one bits,
/// pulsing SCK high after each bit and pulsing LAT low at the end. SB is then
/// restored to high.
fn init_bank0(&mut self) {
todo!()
}
/// Display a full image, row by row, as fast as possible.
pub fn display_image(&mut self, image: &Image) {
// Do not forget that image.row(n) gives access to the content of row n,
// and that self.send_row() uses the same format.
todo!()
}
}
❎ Implement all those functions.
You can refer to 4SE07 notes for GPIO connections (in French) and the operation of the LED Matrix controller (in French).
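The bit order expected by `send_byte()` can be tried on the host before touching the pins. This is a minimal sketch: `bits_msb_first` is a hypothetical helper that only computes the sequence of SDA levels; the real method would set SDA and pulse SCK for each of them.

```rust
/// Levels to put on SDA for one byte, most significant bit first
/// (hypothetical host-side helper, no GPIO involved).
pub fn bits_msb_first(byte: u8) -> [bool; 8] {
    let mut bits = [false; 8];
    for (i, bit) in bits.iter_mut().enumerate() {
        // 0x80 selects bit 7; shifting right by `i` walks down to bit 0.
        *bit = (byte & (0x80u8 >> i)) != 0;
    }
    bits
}
```

For example, `bits_msb_first(0x80)` starts with `true` and is followed by seven `false` values.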
Note that you need to maintain the reset signal low for 100ms. How can you do that? Keep reading.
Implementing a delay
Since you do not use an operating system (yet!), you need to do some looping to implement a delay. Fortunately, the embassy-time crate can be used for this. By cooperating with the embassy-stm32 crate, it can provide you with some timing functionalities:
❎ Add the embassy-time crate as a dependency with feature tick-hz-32_768: this will configure a timer at a 32768Hz frequency, which will give you sub-millisecond precision. You will also have to enable the generic-queue-8 feature since we don’t use the full Embassy executor at this stage. Note that embassy-time knows nothing about the microcontroller you use, it needs a timer to run on.
❎ Add the time-driver-any feature to the embassy-stm32 dependency. This tells the HAL to put a timer at the disposal of the embassy-time crate.
The Rust embedded working group has defined common traits to work on embedded systems. One of those traits is DelayNs in the embedded-hal crate, which is implemented by the embassy_time::Delay singleton of embassy-time. You can use it as shown below:
❎ Add the embedded-hal dependency.
❎ Import the DelayNs trait in your matrix.rs, as well as the Delay singleton from embassy-time:
use embedded_hal::delay::DelayNs as _;
use embassy_time::Delay;
You can then use the following statement to wait for 100ms:
Delay.delay_ms(100);
🦀 Note on singletons
Delay is a singleton: this is a type which has only one value. Here, Delay is declared as:
struct Delay;
which means that the type Delay has only one value, which occupies 0 bytes in memory, also called Delay. Here, the Delay type is used to implement the DelayNs trait from the embedded-hal crate:
impl embedded_hal::delay::DelayNs for Delay {
fn delay_ms(&mut self, ms: u32) { … }
…
}
You might have noticed that self is not used in delay_ms, but the implementation has to conform to the way the trait has been defined. When you later write Delay.delay_ms(100), you create a new instance (which contains nothing) of the type Delay, on which you mutably call delay_ms(100).
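The zero-size claim can be observed on a host machine. The sketch below uses stand-ins that are assumptions made for illustration: `DelayMs` plays the role of `embedded_hal::delay::DelayNs`, and `HostDelay` plays the role of `embassy_time::Delay`, sleeping with `std::thread::sleep`.

```rust
use std::mem::size_of;
use std::time::Duration;

/// Hypothetical stand-in for the `DelayNs` trait.
pub trait DelayMs {
    fn delay_ms(&mut self, ms: u32);
}

/// A unit struct: the type has a single value, also written `HostDelay`.
pub struct HostDelay;

impl DelayMs for HostDelay {
    fn delay_ms(&mut self, ms: u32) {
        // `self` carries no data; it only selects this implementation.
        std::thread::sleep(Duration::from_millis(u64::from(ms)));
    }
}

/// Number of bytes occupied by a `HostDelay` value.
pub fn singleton_size() -> usize {
    size_of::<HostDelay>()
}
```

Writing `HostDelay.delay_ms(1)` builds a fresh value of the unit type on the spot and borrows it mutably for the duration of the call, which mirrors what `Delay.delay_ms(100)` does.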
Main program
❎ In your main program, build an image made of a gradient of blue and display it in a loop on the matrix. Since it is necessary for the display to go fast, do not forget to run your program in release mode, as we have been doing for a while now. Don’t forget that Image values have a .row() method which can be handy here.
Are you seeing a nice gradient? If you do, congratulations, you have programmed your first peripheral in bare board mode with the help of a HAL. 👏
(if not, add traces using defmt)
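One possible way to compute a gradient is sketched below. It is an assumption-laden illustration: the `Color` struct here is a local stand-in with `r`, `g` and `b` fields, and the result is a plain 8×8 array rather than your `Image` type, so you will need to adapt it to your own library.

```rust
/// Local stand-in for the `Color` type of your library (assumed fields).
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

/// Build an 8x8 gradient going from black (first pixel) to full blue (last pixel).
pub fn blue_gradient() -> [[Color; 8]; 8] {
    let mut rows = [[Color::default(); 8]; 8];
    for (y, row) in rows.iter_mut().enumerate() {
        for (x, pixel) in row.iter_mut().enumerate() {
            let index = y * 8 + x; // pixel index, from 0 to 63
            pixel.b = (index * 255 / 63) as u8; // scaled to 0..=255
        }
    }
    rows
}
```

Other gradients (per row, per column) work just as well; the point is only to get an image whose orientation is easy to recognize on the matrix.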
Real-Time Control: Precision Timing 🕐
Great work getting the LED matrix displaying images! Now let’s take it to the next level by adding precise timing control. Instead of displaying as fast as possible, we’ll implement smooth, controlled animations with professional-quality timing.
What We’ll Build
In this section, you’ll transform your basic display into a sophisticated real-time system with:
- Embassy executor: Bring in Rust’s async/await for embedded systems
- Controlled line timing: Display each row at precise intervals for smooth 80 FPS rendering
- Timed image changes: Automatically cycle through images with perfect timing
- Serial communication: Receive new images over the serial port in real-time
- Triple buffering: Ensure buttery-smooth transitions without tearing or flicker
Why Real-Time Matters
Real-time systems aren’t just about speed—they’re about predictability. Your LED matrix needs consistent timing to avoid flicker and provide smooth animations. Embassy’s async framework makes this surprisingly elegant in Rust!
Ready to make your display professional-grade? Let’s dive in! 🚀
Embassy executor
The Embassy framework and particularly its executor will help us decouple tasks and resources.
Add the Embassy executor as a dependency
❎ Add the embassy-executor dependency to your Cargo.toml file with the following features:
- arch-cortex-m in order to select the asynchronous executor adapted to our architecture
- executor-thread to enable the default executor (“thread mode” is the “normal” processor mode, as opposed to “interrupt mode”)
- defmt to enable debugging messages
Since we now use the full executor, the generic-queue-8 feature can be removed from embassy-time. The timers will use the features provided by the Embassy executor.
Embassy main program
❎ Add the embassy_executor::main attribute to your main function (instead of the previous entry attribute) and make it async, as seen in class and in Embassy documentation. Check that you can still execute your code as you did before. The main() function must take a Spawner parameter, which will be used to create tasks.
❎ Modify the Matrix::new() method so that it becomes asynchronous. Replace the use of the blocking delay by a call to one of the Timer asynchronous functions.
For example you could use Timer::after() and give it an appropriate Duration, or use Timer::after_millis() directly.
Check that your program works correctly, including after unplugging and replugging your board in order to deinitialize the led matrix.
⚠ Right now, your main function executes a busy loop at its end. This is not a problem right now, because you don’t have other asynchronous tasks running. However, as soon as you spawn any other asynchronous task, you will have to make sure that you don’t keep the busy loop, otherwise those asynchronous tasks won’t be able to execute as your main function will never relinquish control to the executor. Your main function will have to either terminate and return nothing, or wait forever on a future (for example using core::future::pending().await).
Controlled line change
In this part, we will start using a periodic ticker to run some tasks at designated times. For example, we want to display frames at a pace of 80 FPS (frames per second) as it is most pleasant for the eyes to not have frequencies below 70Hz. Since each line of the matrix should get the same display time, we will call a display task 80×8=640 times per second. This display task will display the next line.
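The resulting row period can be computed on the host with the standard library. This is a small sketch: `row_period` is a hypothetical helper and uses `std::time::Duration`, whereas on the board you will work with `embassy_time::Duration`.

```rust
use std::time::Duration;

/// Time allotted to each row so that `rows` rows are refreshed
/// `frames_per_second` times per second (hypothetical helper).
pub fn row_period(frames_per_second: u32, rows: u32) -> Duration {
    Duration::from_secs(1) / (frames_per_second * rows)
}
```

With 80 frames per second and 8 rows, each row stays lit for 1/640th of a second, which is 1.5625ms.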
Blinking led
In order to check that you do not block the system, you want to create a new asynchronous task which will make the green led blink.
❎ Comment out your matrix display loop. You will reenable it later.
❎ Create a new task blinker as an asynchronous function with attribute embassy_executor::task. This function:
- receives the green led port (PB14) as an argument
- initializes the port as an output
- loops infinitely while displaying this pattern:
- three quick green flashes
- a longer pause
Don’t forget that you can use asynchronous functions from Timer as you did just before.
❎ Using the Spawner object passed to your main program, spawn the blinker task.
❎ Check that the green led displays the expected pattern repeatedly.
❎ Reenable the matrix display loop (after you have spawned the new task).
You should no longer see your green led blink: your matrix display loop never returns and never suspends itself as an asynchronous task would do while waiting for the time to switch to the next line. We will take care of that.
Controlled pace
We want to make an asynchronous task whose job is to take care of displaying the lines of the led matrix at the right pace in order to get a smooth 80Hz display. For this we will need to build the following elements:
- An asynchronous task that will be spawned from main().
- A Matrix instance to give to this task: we already have it!
- A Ticker object to maintain a steady 80Hz rate.
- A way to be able to modify the Image displayed on the matrix from other tasks, such as main(). We will need to use a Mutex from the crate embassy-sync to protect the Image being accessed from several places.
Let’s build this incrementally.
Asynchronous display task
❎ Make a new display asynchronous task taking a Matrix instance as an argument, and copy the current display loop inside. Put an infinite loop around, as we do not want to leave the display task, ever! Add what is needed to make it work (such as a static Image). Spawn the display task from main().
Note that you have to supply a lifetime, as your Matrix type gets one. Fortunately, 'static will work, as this is the lifetime of the ports you configured from your Peripherals object.
Check that your program still works. Still, no green led blinking yet. Both the blinker and display asynchronous tasks run on the same executor, but the display task never relinquishes control to the executor.
Ticking
❎ In your display task, create a Ticker object which will tick every time it should display a new line. 8 lines, 80 Hz, that gives? You got it! Don’t hesitate to use the convenience methods such as Duration::from_hz().
You now want Matrix::display_image() to use this ticker.
❎ Add a ticker parameter to display_image(). You just want to use it, not take ownership of it, so you need a reference. Since you note that the ticker’s next() method requires a &mut self, you need to receive the ticker as a mutable reference as well.
❎ Make display_image() an asynchronous function, since it needs to wait for the ticker to tick.
❎ In display_image(), wait until the ticker ticks before displaying a row, so that rows are evenly spaced every 1/640th of a second.
❎ In display(), pass a mutable reference to the ticker to display_image().
If everything goes well, you should see both the image on your led matrix and the green led pattern. Neat, eh?
Image change
Right now, the display task does more than displaying something, as it takes care of the Image itself. It should only access it when needed, but creating and modifying the image should not be its responsibility. Let’s fix that.
Sharing an Image between tasks
We will create a shared Image, protected by a mutex. However, you have to understand how Embassy’s mutexes work first.
Embassy asynchronous mutexes
Embassy’s mutexes cannot use spin locks, as spin locks loop forever until they get the lock. If Embassy did this, it would block the current asynchronous task, and thus the whole executor.
Embassy’s mutexes are asynchronous-friendly, and will yield when they cannot lock the resource immediately. However, to implement it, Embassy still needs a real mutex (which Embassy calls a “raw mutex”, or “blocking mutex”) for a very short critical section.
Since all our tasks are running on the same executor, they will never try to lock the raw mutex at the same time. It means that we can safely use the ThreadModeRawMutex as raw mutex.
Creating the shared image object
So we want to create a global (static) Image object protected by a Mutex using internally a ThreadModeRawMutex.
❎ Import embassy_sync::mutex::Mutex and embassy_sync::blocking_mutex::raw::ThreadModeRawMutex.
❎ Declare a new global (static) IMAGE object of type Mutex<ThreadModeRawMutex, Image> and initialize it… but with what?
Creating the initial image
Initialization of static variables is done before any code starts to execute. The compiler must know what data to put in the global IMAGE object.
We could try to use:
static IMAGE: Mutex<ThreadModeRawMutex, Image> = Mutex::new(Image::new_solid(Color::GREEN));
but the compiler will complain:
error[E0015]: cannot call non-const fn `tp_led_matrix::Image::new_solid` in statics
|
| static IMAGE: Mutex<ThreadModeRawMutex, Image> = Mutex::new(Image::new_solid(Color::GREEN));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Indeed, it cannot execute the call to Image::new_solid() before any code even starts to execute. However, there is an easy solution here! 💡
The code of Image::new_solid() is likely simple (if it is not, fix it):
impl Image {
pub fn new_solid(color: Color) -> Self {
Image([color; 64])
}
}
Indeed, this is so simple that this could be done at compilation time if the function were a const one. const functions, when given constant parameters, can be replaced by their result at compilation time.
By adding the const keyword:
impl Image {
pub const fn new_solid(color: Color) -> Self {
Image([color; 64])
}
}
the compiler will now be able to create the data structure for the mutex containing the image with the green constant at compilation time, and place it in the .data section.
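The same mechanism can be tried on a host machine. In the sketch below, `std::sync::Mutex` is used as a stand-in for Embassy's asynchronous `Mutex` (its `new()` is also a `const fn`), and `Color` and `Image` are simplified local types, so this is an illustration under those assumptions rather than the code of the lab.

```rust
use std::sync::Mutex; // host-side stand-in for embassy_sync::mutex::Mutex

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

impl Color {
    pub const GREEN: Color = Color { r: 0, g: 255, b: 0 };
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Image(pub [Color; 64]);

impl Image {
    /// `const fn`: can be evaluated by the compiler in a `static` initializer.
    pub const fn new_solid(color: Color) -> Self {
        Image([color; 64])
    }
}

/// Built at compile time, no code runs to initialize it.
pub static IMAGE: Mutex<Image> = Mutex::new(Image::new_solid(Color::GREEN));

/// Copy the shared image out so that the lock is held only briefly.
pub fn snapshot() -> Image {
    *IMAGE.lock().unwrap()
}
```

Removing the `const` keyword from `new_solid()` in this sketch brings back the E0015 error shown earlier.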
Putting it together
❎ Add the const keyword to the Image::new_solid() function and initialize the IMAGE object. You may want to add a new constant Color, such as BLACK, even though it may be useful at the beginning to look at a visible image.
❎ Modify the display task so that, before being displayed, each image is copied locally in order not to keep the mutex locked for a long time.
Changing images dynamically
❎ Modify the main task so that the IMAGE object is modified, every second or so, by another one.
Don’t make things complicated. You should notice that your display changes every second, while being pleasant to look at. The green led should blink its pattern at the same time.
This is starting to look nice.
Serial port
As was done in the 4SE07 lab, we want to be able to send image data from the serial port. We will configure the serial port, then write a decoding task to handle received bytes.
Fortunately, this is much simpler to do using Rust and Embassy.
The procedure
Of course, we will create a serial_receiver asynchronous task. This task will:
- receive the peripherals needed to configure the serial port
- receive the image bytes
- update the shared image by copying the received bytes
- loop back to receiving the image bytes
Receiving the image efficiently
How can we receive the image most efficiently? How will we handle incomplete images, or extra bytes sent after the image?
The first byte sent is a marker (0xff), we must wait for it. Then we should receive 192 bytes, none of which should be a 0xff. We want to receive all bytes in one DMA (direct memory access) transaction. But what will happen if an image is incomplete?
In this case, another image will follow, starting with a 0xff. In our buffer, we will have:
<------------------------ 192 --------------------->
| o | … | o | o | o | o | 0xff | n | n | n | … | n |
<---------- P ---------->      <-------- N -------->
where o belongs to the original image, and n to the new image (N bytes received). In this case, we should rotate the data P+1 places to the left (or N places to the right, which is equivalent) so that the new image data n is put at the beginning of the buffer, in order to have
<------------------------ 192 --------------------->
| n | n | n | … | n | o | … | o | o | o | o | 0xff |
<-------- N --------X---------- P ---------->
We just need to receive the 192-N bytes starting after the N bytes, and check again that there is no 0xff in the buffer. If this is the case, we have a full image, otherwise we rotate again, etc.
Note that the initial situation, after receiving the 0xff marker, is similar to having N being 0, there is no need to special case it.
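The rotation step can be experimented with on the host using the slice methods `rotate_left()` and `rotate_right()` from the standard library. The sketch below is an illustration: `realign`, `IMAGE_BYTES` and `MARKER` are hypothetical names chosen for this example.

```rust
pub const IMAGE_BYTES: usize = 192;
pub const MARKER: u8 = 0xff;

/// If `buf` contains a marker, move the bytes following the last marker to
/// the front of the buffer and return how many there are (N).
/// Return `None` when the buffer holds no marker, i.e. a full image.
pub fn realign(buf: &mut [u8; IMAGE_BYTES]) -> Option<usize> {
    // Searching from the end, the position found is the number of
    // bytes that follow the last marker.
    let n = buf.iter().rev().position(|&b| b == MARKER)?;
    // Rotating right by N is the same as rotating left by P+1.
    buf.rotate_right(n);
    Some(n)
}
```

After `realign()` returns `Some(n)`, the marker and the stale bytes sit at `buf[n..]`, which is exactly the area that the next reception overwrites.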
The task
❎ Create the serial_receiver task. This task receives several peripherals: the USART1 peripheral, the serial port pins, and the DMA channel to use for the reception.
By looking at the figure 29 on page 339 of the STM32L4x5 reference manual, you will see that the DMA channel for reception (RX) of USART1 is DMA1_CH5.
Since we do not need to transmit anything, we do not need to configure the transmission (TX) side of the serial port, and we do not need to attribute a DMA channel for emission. Embassy supports this configuration out of the box and provides a UartRx structure, which is the reception (RX) side of a serial port, leaving the transmission side untouched and unconfigured.
❎ Create the UartRx device. Also, don’t forget to configure the baudrate to 38400.
Note that UartRx::new() expects a _irq parameter. This is a convention for Embassy to ensure at compile time that you have properly declared that the corresponding IRQ is forwarded to the HAL using the bind_interrupts!() macro.
bind_interrupts!(struct Irqs {
USART1 => usart::InterruptHandler<USART1>;
});
Irqs is the singleton that needs to be passed as the _irq parameter of UartRx::new().
⚠ Depending on your version of Embassy, the order of the parameters for UartRx::new() may be different. Choose carefully the version of Embassy you’re using in the documentation.
The logic
❎ Implement the reception logic, and update the shared image when the bytes for a full image have been received.
Some tips:
- Use the algorithm shown in “Receiving the image efficiently” above:
  1. Create a buffer to hold 192 bytes.
  2. Wait for the 0xff marker: you have then received N=0 image bytes at this stage.
  3. Receive the missing 192-N bytes starting at offset N of the buffer.
  4. If, looking from the end, you find a 0xff in the buffer at position K counting from the end (that is, K bytes follow it):
     - Rotate the buffer right by K positions.
     - Set N to K and go back to step 3.
     Otherwise, you have a full image: you can update the shared image and go back to step 2.
- To update the shared image from the received bytes, you can extract it from the static mutex-protected IMAGE object, then request the &mut [u8; 192] view of the image with .as_mut(), since you have implemented AsMut<[u8; 192]> on Image. You can then use an assignment to update the image content from the buffer you have received.
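The numbered steps above can be prototyped on the host before moving to the board. The sketch below is an assumption-based model: it reads from any blocking `std::io::Read` source instead of an asynchronous `UartRx`, and `read_image`, `IMAGE_BYTES` and `MARKER` are hypothetical names.

```rust
use std::io::Read;

pub const IMAGE_BYTES: usize = 192;
pub const MARKER: u8 = 0xff;

/// Read one complete image from `src`, resynchronizing on every marker.
pub fn read_image<R: Read>(src: &mut R) -> std::io::Result<[u8; IMAGE_BYTES]> {
    // Step 1: buffer for a whole image.
    let mut buf = [0u8; IMAGE_BYTES];
    // Step 2: wait for the marker, one byte at a time.
    let mut byte = [0u8; 1];
    loop {
        src.read_exact(&mut byte)?;
        if byte[0] == MARKER {
            break;
        }
    }
    let mut n = 0; // image bytes already in place at the start of `buf`
    loop {
        // Step 3: receive the missing bytes in one go.
        src.read_exact(&mut buf[n..])?;
        // Step 4: look for a marker, starting from the end.
        match buf.iter().rev().position(|&b| b == MARKER) {
            Some(k) => {
                // K bytes of a newer image follow the marker: keep them.
                buf.rotate_right(k);
                n = k;
            }
            None => return Ok(buf),
        }
    }
}
```

On the board, the two `read_exact()` calls would become awaited reads on the serial port, and the returned buffer would be copied into the shared image.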
❎ Start the serial_receiver task from main(). Check that you can display data received from the serial port.
Congratulations, your project rocks!
Triple buffering
Our current handling of the image received on the serial port is not very satisfying. As soon as we have received a full image, we update the shared image: it means that the next rows to be displayed will come from the newer image while some rows on the LED matrix may have come from the older image.
⚠ You do not have to implement double-buffering. You have to understand how it works, but you only need to implement triple-buffering.
What is double-buffering?
In older computers, drawing something was performed directly in the screen buffer (also called the video RAM) as memory was tight. It meant that some artifacts could easily be perceived unless extreme caution was observed. For example, if an image was displayed by a beam going from the top to the bottom of the screen, drawing a shape starting from the bottom of the screen would make the bottom half of the shape appear before the top half does. On the other hand, drawing from the top to the bottom at the same pace as the refreshing beam would display consistent pictures.
As memory became more affordable, people started to draw the next image to display into a back buffer. This process lets software draw things in an order which is not correlated with the beam displaying the image (for example objects far away then nearer objects). Once the new image is complete, it can be transferred into the front buffer (the video RAM) while ensuring that the transfer does not cross the beam, which requires synchronization with the hardware. This way, only full images are displayed in a consistent way.
On some hardware, both buffers fit in video RAM. In this case, switching buffers is done by modifying a hardware register at the appropriate time.
Double-buffering in our project
We already implement part of the double-buffering method in our code: we prepare the next image in a separate buffer while the current one is being displayed in a loop. We could modify our code (⚠ again, you do not need to implement double-buffering, this is only an example, you’ll implement triple-buffering) so that the image switching takes place at the appropriate time:
- Make the new image a shared resource next_image rather than a local resource.
- Add a shared boolean switch_requested to the Shared state, and set it in receive_byte when the new image is complete.
- Have the display task check the switch_requested boolean after displaying the last row of the current image, and swap the image and next_image if this is the case and reset switch_requested.
By locking next_image and switch_requested for the shortest possible time, the receive_byte task would prevent the display task from running for very short periods. However, we could still run into an issue in the following scenario:
- The last byte of the next image is received just as the current image starts displaying.
- We set switch_requested to request the image switch, but this will happen after the whole current image has been displayed (roughly 1/80 seconds later, or 12.5ms).
- The speed of the serial port is 38400 bits per second, and a byte requires 10 symbols (start, 8 bits, stop).
- It means that while the current image is being displayed, about 48 bytes of the next-next image can be received.
Where can we store those bytes? If we store them in next_image, we will alter a buffer which has been fully drawn but not displayed yet, so we cannot do this. Obviously, we cannot store them in image either. There is nothing we can do there.
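The number of bytes that can arrive while one frame is displayed follows directly from the serial speed and the frame rate. The sketch below is a quick host-side check; `bytes_per_frame` is a hypothetical helper. With the 80 FPS refresh rate used in this lab it gives 48 bytes per frame, and a slower 60 FPS display would let 64 bytes through.

```rust
/// Whole bytes that can be received during one frame (hypothetical helper).
/// `symbols_per_byte` is 10 for one start bit, 8 data bits and one stop bit.
pub fn bytes_per_frame(baud: u32, symbols_per_byte: u32, frames_per_second: u32) -> u32 {
    let bytes_per_second = baud / symbols_per_byte;
    bytes_per_second / frames_per_second
}
```

Either way, this is a sizeable part of a 192-byte image, which is why it needs somewhere to go.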
Triple buffering
We need a third buffer: one buffer is the one currently being displayed, one buffer is the next fully completed image ready to be displayed, and one buffer is the work area where we build the currently incomplete image.
In order to avoid copying whole images around, we would like to work with buffer references and switch those references. Should we use dynamic memory allocation? ☠ Certainly not.
The heapless crate
The heapless crate contains several data structures that can be used in environments where dynamic memory allocation is not available or not desirable:
- heapless::Vec<T> has an interface quite similar to std::vec::Vec<T> except that those vectors have a fixed capacity, which means that the push operation returns a Result indicating if the operation succeeded or failed (in which case it returns the element we tried to push).
- Other structures such as BinaryHeap, IndexMap, IndexSet, String, etc. act closely like the standard library ones.
- heapless::pool is a module for defining lock-free memory pools which allocate and reclaim fixed size objects: this is the one we are interested in.
Using a pool
By using a static pool of Image types named POOL, we will be able to manipulate values of type Box<POOL>: this type represents a reference to an image from the pool. Box<POOL> implements Deref<Target = Image> as well as DerefMut, so we will be able to use such a type instead of a reference to an Image. Also, we can easily swap two Box<POOL> objects instead of exchanging whole image contents.
A pool is declared globally by using the heapless::box_pool!() macro as described in the heapless::pool documentation. The BoxBlock<Image> represents the space occupied by an image and will be managed by the pool. Then the .alloc() method can be used to retrieve some space to be used through a Box<POOL> smart pointer. Dropping such a Box<POOL> will return the space to the pool.
box_pool!(POOL: Image);
…
// Code to put in the main function:
// Statically reserve space for three `Image` objects, and let them
// be managed by the pool `POOL`.
unsafe {
#[expect(clippy::declare_interior_mutable_const)]
const BLOCK: BoxBlock<Image> = BoxBlock::new();
static mut MEMORY: [BoxBlock<Image>; 3] = [BLOCK; 3];
// By default, taking a mutable reference to static data is forbidden.
// We want to allow it.
#[expect(static_mut_refs)]
for block in &mut MEMORY {
POOL.manage(block);
}
}
- This pool can hand out Box<POOL> through POOL.alloc(model) which returns a Result<Box<POOL>, Image> initialized from model:
  - Either the pool could return an object (Ok(…)).
  - Or the pool had no free object, in which case the model is returned with the error: Err(model).
- When it is no longer used, a Box<POOL> can be returned to the pool just by dropping it.
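The `Ok(…)` / `Err(model)` behavior of a fixed-capacity pool can be illustrated with a toy model that runs on the host. This sketch is not how `heapless` implements its pools, and every name in it (`ToyPool`, `release`, the simplified `Image`) is an assumption made for the example: slots are identified by an index instead of a `Box<POOL>`, and they must be released by hand instead of on drop.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Image(pub [u8; 192]);

/// Toy pool with room for exactly three images, no dynamic allocation.
pub struct ToyPool {
    slots: [Option<Image>; 3],
}

impl ToyPool {
    pub const fn new() -> Self {
        Self { slots: [None; 3] }
    }

    /// Store `model` in a free slot and return the slot index,
    /// or hand `model` back if every slot is taken.
    pub fn alloc(&mut self, model: Image) -> Result<usize, Image> {
        match self.slots.iter().position(Option::is_none) {
            Some(index) => {
                self.slots[index] = Some(model);
                Ok(index)
            }
            None => Err(model),
        }
    }

    /// Free a slot; dropping a `Box<POOL>` does this automatically.
    pub fn release(&mut self, index: usize) {
        self.slots[index] = None;
    }
}
```

A fourth `alloc()` on a full `ToyPool` returns `Err(model)`, and succeeds again as soon as one slot has been released.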
We will build a pool containing the space for three images:
- When we receive a 0xff on the serial port to indicate a new image, we will draw an image from the pool and start filling its data until we have all the bytes.
- When an image is complete, the serial receiver will hand it to the display task.
- The display task starts by waiting for an image coming from the serial receiver and starts displaying it repeatedly.
- If a new image arrives from the serial receiver after the last line of the current image is displayed, the display task replaces the current image by the new one. This drops the image that was just displayed, and it is then automatically returned to the pool.
We see why, in the worst case, three images might coexist at the same time:
- The display task may be displaying image 1.
- The serial receiver has finished receiving image 2 and has stored it so that the display task can pick it up when it is done displaying image 1.
- The serial receiver has started the reception of image 3.
❎ Declare a pool named POOL handing out Image objects using the box_pool!() macro.
❎ In the main() function, before starting the display or serial_receiver task, reserve memory for 3 Image (using the unsafe block shown above) and hand those three areas to the pool to be managed.
Using Embassy’s Signal
To pass an image from the serial receiver to the display task, we can use the Signal data structure from the embassy_sync crate. The Signal structure is interesting:
- It acts like a queue with at most one item.
- Reading from the queue waits asynchronously until an item is available and returns it.
- Writing to the queue overwrites (and drops) the current item if there is one.
This is exactly the data structure we need to pass information from the serial receiver to the display task. We will make a global NEXT_IMAGE static variable which will be a Signal to exchange Box<POOL> objects (each Box<POOL> contains an Image) between the serial_receiver and the display tasks.
A Signal needs to use a raw mutex internally. Here, a ThreadModeRawMutex similar to the one we used before can be used.
❎ Declare a NEXT_IMAGE static object as described above.
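The "queue with at most one item" behavior can be modeled on the host to get a feel for it. The sketch below is a blocking, simplified stand-in and not Embassy's implementation: `MailSlot` is a hypothetical type built on `std::sync::Mutex`, and it offers no asynchronous `wait()`.

```rust
use std::sync::Mutex;

/// Host-side model of a one-item mailbox (hypothetical type).
pub struct MailSlot<T>(Mutex<Option<T>>);

impl<T> MailSlot<T> {
    /// `const fn`, so a `MailSlot` can be stored in a `static`.
    pub const fn new() -> Self {
        Self(Mutex::new(None))
    }

    /// Store `value`, dropping any item that was still waiting.
    pub fn signal(&self, value: T) {
        *self.0.lock().unwrap() = Some(value);
    }

    /// Take the waiting item, if any, leaving the slot empty.
    pub fn try_take(&self) -> Option<T> {
        self.0.lock().unwrap().take()
    }
}
```

When two values are signaled in a row, only the second one is ever seen by the reader: the first is dropped, which for a `Box<POOL>` means going back to the pool.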
Displaying the image
You want to modify the display task so that:
- It waits until an image is available from NEXT_IMAGE and stores it into the local image variable.
- Then in an infinite loop:
  - It displays the image it has received. image is of type Box<POOL>, but since Box<POOL> implements Deref<Target = Image>, &image can be used in a context where an &Image would be required.
  - If there is a new image available from NEXT_IMAGE, then image is replaced by it. This will drop the older Box<POOL> object, which will be made available to the pool again automatically.
NEXT_IMAGE.wait() returns a Future which will eventually return the next image available in NEXT_IMAGE:
- Awaiting this future using .await will block until an image is available. This might be handy to get the initial image.
- If you import futures::FutureExt into your scope, then you get additional methods on Future implementations. One of them is .now_or_never(), which returns an Option: either None if the Future does not resolve immediately (without waiting), or Some(…) if the result is available immediately. You could use this to check if a new image is available from NEXT_IMAGE, and if it is replace the current image.
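Conceptually, checking a future "now or never" means polling it exactly once and giving up if it is not ready. The sketch below illustrates that idea with the standard library only; `poll_once` and `NoopWake` are hypothetical names, and this is not the implementation used by the futures crate.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing: we never intend to poll a second time.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll `fut` a single time and return its result only if it is ready.
pub fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}
```

A future that is not ready is simply dropped here, so this approach only suits futures that can safely be abandoned and recreated later.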
❎ Add the futures crate as a dependency in your Cargo.toml. By default, the futures crate requires std; you have to specify default-features = false when importing it, or add it using cargo add futures --no-default-features.
❎ Rewrite display() to do what is described above.
You now want to check that it works by using an initial image before modifying the serial receiver. To do so, you will build an initial image and put it inside NEXT_IMAGE so that it gets displayed.
❎ At the end of the main() function, get an image from the pool, containing a red gradient, by using the POOL.alloc() method.
❎ Send this image containing a gradient to the NEXT_IMAGE queue by using the signal method of the queue.
You should see the gradient on the screen.
❎ Now, check that new images are correctly displayed:
- Surround the code above with an infinite loop.
- Inside the loop, add an asynchronous delay of 1 second after sending the image to NEXT_IMAGE.
- Still inside the loop, repeat those three steps (get an image from the pool, send it to the display task through NEXT_IMAGE, and wait for one second) in another color.
If you see two images alternating every second, you have won: your display task is working, with proper synchronization. Time to modify the serial receiver.
Receiving new images
Only small modifications are needed to the serial receiver:
- When you receive the first 0xff indicating a new image, get an image from the pool (you can initialize it from the default image, Image::default()). You may panic if you don’t get one as we have shown that three image buffers should be enough for the program to work.
- Receive bytes directly in the image buffer, that you can access with image.as_mut() (remember, you implemented the AsMut trait on Image).
- When the image is complete, signal its existence to NEXT_IMAGE.
❎ Implement the steps above.
❎ Remove the static IMAGE object which is not used anymore.
❎ Remove the image switching in main(), as you don’t want to interfere with displaying the images received from the serial port. You may keep one initial image though, to display something before you receive the first image through the serial port.
❎ Check that you can display images coming from the serial port. Congratulations, you are now using triple buffering without copying large quantities of data around.
Bonus Level: Advanced Features 🌟
Congratulations on making it this far! This bonus section is completely optional—you can achieve the maximum grade without completing it, as long as the core requirements are perfect.
But here’s the exciting part: These bonus tasks are not only fun challenges that deepen your embedded Rust skills, but they can also earn you additional points if you haven’t quite reached the maximum grade yet.
What’s Available
Ready to level up? Choose from these advanced features:
1. Dedicated Executor
Implement priority-based task scheduling with a dedicated executor for your display task. This ensures glitch-free rendering even when the system is under heavy load. Perfect for understanding real-time scheduling!
2. Screen Saver
Every great display deserves a screen saver! Build animated patterns that activate after a period of inactivity. Express your creativity while learning about state management.
3. Text Drawing
What if your screen saver could display scrolling text or messages? Implement pixel-based text rendering to take your LED matrix to the next level.
Why Do These?
Beyond potential grade points, these challenges will:
- Deepen your understanding of async Rust
- Teach you advanced embedded patterns
- Give you impressive portfolio pieces
- Most importantly: They’re genuinely fun! 🎮
Pick what interests you and enjoy the journey! 🚀
Dedicated executor
Until now, we used only one executor in thread mode (the regular mode in which the processor runs, as opposed to interrupt mode). It means that Embassy’s executor will execute one asynchronous task until it yields, then the other, then the other, and so on. If for any reason one task requires a bit more time than expected, you might delay other tasks such as the display task. In this case, you might notice a short glitch on the display.
To prevent this, we will use a dedicated interrupt executor to run our display task. In this scenario, when it is time to display a new line on the display, an interrupt will be raised and the executor will resume the display task while still in interrupt mode, interrupting the rest of the program.
You will have to choose an unused hardware interrupt, and:
- configure it to the priority you want to use, with regard to other interrupts in the system
- start the executor, telling it to tell its tasks to raise this interrupt by software (pend the interrupt, as in make it pending) when they have progress to signal
- call the executor’s on_interrupt() method in the ISR, so that the executor knows that it must poll its tasks
Those are three easy tasks. We will choose interrupt UART4, and set it to priority level Priority::P6:
❎ Add the executor-interrupt feature to the embassy-executor dependency in Cargo.toml.
❎ Create a static DISPLAY_EXECUTOR global variable, with type InterruptExecutor.
❎ Choose an unused interrupt (pick UART4, whose number is available as embassy_stm32::interrupt::UART4), configure it with an arbitrary priority (use Priority::P6). Start the DISPLAY_EXECUTOR and associate it with this interrupt. Use the returned spawner to spawn the display task.
❎ Write an ISR for this interrupt, and redirect the event to the executor:
#[interrupt]
unsafe fn UART4() {
    unsafe {
        DISPLAY_EXECUTOR.on_interrupt();
    }
}
Note that ISRs are unsafe functions, as doing the wrong thing in an interrupt routine might lock up the system.
At this stage, you might notice that your code does not compile: the NEXT_IMAGE data structure uses a ThreadModeRawMutex as its internal mutex. Such a mutex, as its name indicates, can only be used to synchronize tasks running in thread mode, not in interrupt mode.
❎ Use a CriticalSectionRawMutex as an internal mutex for NEXT_IMAGE, because such a mutex is usable to synchronize code running in interrupt mode with code running in thread mode.
Your display should now be as beautiful as ever.
Screen saver
What should your LED matrix do when you do not send anything on the serial port? Wouldn’t it be great to have a screen saver, which automatically runs when nothing is sent, and does not get in the way otherwise?
You will have to create a new screensaver task, which will trigger an image change when nothing is being received on the serial port for a while.
Recording image changes
You don’t want the screen saver to run if data is being received. Let’s record the arrival of new images.
❎ Declare a static NEW_IMAGE_RECEIVED Signal object containing an Instant.
❎ When a new image is received in serial_receiver, signal the current date to NEW_IMAGE_RECEIVED.
Implementing the screensaver task
❎ Implement a screensaver task and start it on the thread-mode (regular) executor.
In this task, you may for example, in an infinite loop:
- Read the date of the last image received without waiting.
- If any image has been received, wait until one second after this date and continue the loop. This way, you effectively do not display anything until the serial port has been idle for one second.
- Display your screensaver image (get one from the pool and set it to NEXT_IMAGE).
- Wait for one second.
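The decision taken at each turn of that loop can be written as a small pure function, which is easy to check on the host before wiring it to Embassy. This is only a sketch: `Action`, `next_action` and `IDLE_MS` are hypothetical names, and plain millisecond counts stand in for `Instant` values.

```rust
/// What the screensaver loop should do next.
#[derive(Debug, PartialEq)]
enum Action {
    /// An image arrived recently: sleep until this time (in ms), then
    /// go back to the top of the loop.
    WaitUntil(u64),
    /// Nothing was received: display a screensaver image, then wait.
    ShowScreensaver,
}

/// Idle time required before the screensaver may take over.
const IDLE_MS: u64 = 1_000;

/// `last_image_ms` models the non-blocking read of the last arrival
/// date: `Some(t)` if an image was signalled since the previous check,
/// `None` otherwise.
fn next_action(last_image_ms: Option<u64>) -> Action {
    match last_image_ms {
        Some(received_at) => Action::WaitUntil(received_at + IDLE_MS),
        None => Action::ShowScreensaver,
    }
}
```

An image received at 2500 ms therefore keeps the screensaver silent until 3500 ms, and any newer image seen at the next check pushes that deadline further.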
You can even be more creative and use alternating images every second.
Note that both the serial port code and the screensaver run in thread-mode. The NEW_IMAGE_RECEIVED should only require a ThreadModeRawMutex for its internal synchronization. Check that you haven’t used a CriticalSectionRawMutex as it does not require one.
Drawing things
The screensaver feature was nice, but the screensaver could be more entertaining. What if it could display scrolling text, such as “This Rust 4SE02 project will get me a good grade”?
Fortunately, one crate can help you do that: embedded-graphics. Provided you do the proper interfacing with your hardware, this crate will let you draw all kinds of shapes, and even display text.
Interfacing with your hardware: the embedded module
You have already decoupled the logical representation of your LED
matrix (the Image type) from the physical one (the Matrix
type). This will make your job easier, as you will only have to
interface the Image type with the embedded-graphics crate: once
you have an Image you can display it on your hardware by putting it
into NEXT_IMAGE.
❎ Create an embedded module in your library. This module will
contain anything needed to interface the drawing primitives of the
embedded-graphics crate with your Image type.
First you’ll have to choose a pixel representation that
embedded-graphics can use and which is appropriate for your
display. Since you can already display RGB colors with 8 bits of data
for each component, the Rgb888 color type seems appropriate.
❎ Implement From<Rgb888> for your Color type. That will be useful
when drawing on your Image, to build a proper Color value.
Now, you need to implement the DrawTarget trait for your Image type.
This trait is the one which does the real drawing. You will only
implement the minimal functionality and use the provided defaults for
the rest.
❎ Implement DrawTarget for Image:
- The Color type will be Rgb888.
- You can use Infallible as your Error type, because drawing into an Image never fails.
- When you implement draw_iter(), make sure that you only set the pixels whose coordinates belong to the image (x and y both in 0..8). This method can be called with a larger image, for example a large text, and you will only display a portion of it.
- If you need to convert a Rgb888 into a Color, do not forget that you can use .into() because you implemented From<Rgb888> for Color.
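The clipping rule for draw_iter() can be tried on the host without the embedded-graphics crate. The sketch below is not the DrawTarget trait itself: the tuple-based `Image`, `draw_clipped` and `clipped_demo` are hypothetical stand-ins that only illustrate how out-of-range coordinates (including negative ones) get skipped.

```rust
/// Toy 8x8 image of RGB triples, standing in for the real `Image`.
struct Image([[(u8, u8, u8); 8]; 8]);

impl Image {
    /// Set every pixel whose coordinates fall inside the image and
    /// silently skip all the others.
    fn draw_clipped<I>(&mut self, pixels: I)
    where
        I: IntoIterator<Item = ((i32, i32), (u8, u8, u8))>,
    {
        for ((x, y), color) in pixels {
            // Coordinates are signed: text scrolled to the left has x < 0.
            if (0..8).contains(&x) && (0..8).contains(&y) {
                self.0[y as usize][x as usize] = color;
            }
        }
    }
}

/// Draw three red pixels, two of which are off-screen, and count how
/// many pixels of the image ended up red.
fn clipped_demo() -> usize {
    let mut image = Image([[(0, 0, 0); 8]; 8]);
    let red = (255, 0, 0);
    image.draw_clipped([((-3, 2), red), ((7, 7), red), ((8, 0), red)]);
    image.0.iter().flatten().filter(|&&p| p == red).count()
}
```

Only the pixel at (7, 7) is inside the image, so `clipped_demo()` returns 1. Testing the signed coordinates before converting them to indices is what makes a negative x harmless.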
Upgrading the screensaver
You can now use the drawing primitives of embedded-graphics to
create images in your screensaver instead of using gradients.
❎ Modify your screensaver so that it creates interesting images using the drawing primitives.
For example, you could add another local variable in addition to the color index, such as a shape index, and draw a square, a triangle, a circle, and a solid color. Ideally, those color and shape indices would use cycle sizes which are coprime, to maximize the displayed combinations.
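To see why coprime cycle sizes matter, you can count the combinations on the host. The helper below is a hypothetical illustration (`distinct_combinations` is not part of the lab code): both indices advance together and wrap around independently.

```rust
use std::collections::HashSet;

/// Count the distinct (color, shape) pairs seen when both indices are
/// incremented at each tick and wrap around with their own cycle size.
fn distinct_combinations(colors: usize, shapes: usize) -> usize {
    let mut seen = HashSet::new();
    for tick in 0..colors * shapes {
        seen.insert((tick % colors, tick % shapes));
    }
    seen.len()
}
```

With 3 colors and 4 shapes (coprime), all 12 combinations are displayed. With 4 colors and 6 shapes, only 12 of the 24 possible combinations ever appear, because the pair repeats after lcm(4, 6) = 12 ticks.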
When this works, commit and push your code.
Drawing text
The next step is to display scrolling text from the screensaver. Yes, that means forgetting about the shapes that you just designed, they were used to familiarize yourself with the library.
A Text object represents some text that can later be drawn into
anything implementing DrawTarget (such as an Image). It uses a
character style, which can be built using MonoTextStyle::new() from a
font and a color. And the ibm437 crate provides a great
IBM437_8X8_REGULAR font which will be perfect for your LED matrix.
The idea is to wait for 60ms (instead of one second) after you have
displayed an image to make some text scroll to the next position if no
new image has been received. To make the text scroll to the left, you
will position it at a negative x offset: since you display pixels
whose x is in 0..8, decreasing the x position of the start of
the text will make it go left.
❎ Modify the screensaver task so that it gets called every
60ms. You need a precise timing if you want the scrolling to be
pleasant.
❎ Modify the screensaver task such that, when it wants to display something:
- A Text object is built with a text such as “Hello 4SE02”, and placed at an x position whose value is kept in an offset local variable. You can use the color you want, or make the color cycle.
- The text is drawn into an image coming from the pool, and displayed through NEXT_IMAGE.
- Decrease the offset local variable, except if the end of the text has reached the 0 x coordinate, in which case offset must be reset to display the text again (find the appropriate value so that it is nice for the eyes). Note: the Text object has methods to check its bounding box (the smallest rectangle in which it fits).
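The offset update described in the last step can be isolated in a pure function. This is one possible sketch, not the expected solution: `next_offset` and `DISPLAY_WIDTH` are hypothetical names, the text width is assumed to come from the bounding box of the Text object, and restarting just past the right edge is only one choice of reset value.

```rust
/// Width of the LED matrix in pixels.
const DISPLAY_WIDTH: i32 = 8;

/// Compute the x offset to use for the next frame, given the current
/// offset and the width in pixels of the rendered text.
fn next_offset(offset: i32, text_width: i32) -> i32 {
    if offset + text_width <= 0 {
        // The end of the text has reached column 0: restart with the
        // text just past the right edge (assumed reset value).
        DISPLAY_WIDTH
    } else {
        // Otherwise scroll one pixel to the left.
        offset - 1
    }
}
```

For an 88-pixel-wide text (for instance 11 characters in an 8-pixel-wide font), the offset goes from 8 down to -88, then jumps back to 8.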
❎ Modify the screensaver task so that if a new image has been
received on the serial port, the offset of the text is reset so that
next time the screensaver displays something it will start from the
beginning of the text.
Note: you might have to adapt your DrawTarget trait implementation for
Image, for example if the text appears upside down.
Make it even prettier if you wish, commit, push.
🦀 Congratulations, you have reached the end of this lab! 🦀
Bridge: running the VM in the background
You now have two cool pieces of software:
- a small virtual machine interpreter you wrote in Rust,
- an Embassy-powered embedded program that drives an LED matrix in real time.
Let’s make them work together.
In this section, we will run the VM in the background while the LED matrix keeps doing its job (displaying images, receiving pixels from UART, etc.). If everything goes well, you’ll be able to:
- keep sending images to the board as before,
- simultaneously observe VM output in a serial terminal,
- and confirm the display remains stable (no glitches).
Advanced / optional part
This bridge is intentionally more advanced than the rest of the lab. It is meant for students who want to go further into:
- Rust in no_std environments,
- cooperative execution models,
- and real embedded integration constraints (ownership of peripherals, backpressure, time-slicing CPU work).
If you are short on time, it is OK to stop after the main LED-matrix lab.
What makes it interesting
The constraints are the interesting part:
- no code copy: you will modify your VM crate in place (one shared implementation),
- the VM library must be usable in no_std (embedded) environments,
- it must integrate nicely with an Embassy application,
- it must not perturb the LED matrix real-time behavior,
- the VM output must go to the serial port.
This section is a bit more open-ended than the previous ones: you will have to make design choices. But we’ll keep the spirit of the lab: small steps, frequent builds, and a clear target.
What we will build
We will:
- reshape the VM crate so that the library is no_std, and provides an API that returns output bytes via a small buffer,
- add the VM crate as a dependency of the LED matrix project (by path, to keep one shared codebase),
- embed a prebuilt VM binary in the firmware using include_bytes!(),
- spawn a new task vm_runner that repeatedly executes this program one step at a time, yielding often,
- forward the VM output to USART1 TX, while leaving the existing USART1 RX image receiver intact.
Prerequisites
- You have a working tp-led-matrix project (in Embassy async mode).
- You have a working vm project from the earlier lab.
We will assume you have both available locally as Cargo crates.
Exact paths do not matter, as long as you can reference one crate from the other using a relative path = "..." dependency.
Making the VM embeddable
In the first part of the course, you wrote a VM that runs on the desktop.
Now that we are bridging the VM and the Embassy-based embedded program, we will evolve this VM crate so that:
- it becomes no_std-compatible (library part),
- it gains a single execution API that returns output bytes via a small buffer,
- and it still behaves the same as before when used from the command line.
This work happens now, as part of the bridge.
Context
The VM crate you wrote earlier was written for the desktop first, and therefore uses std by default.
In order to embed it in tp-led-matrix (which is #![no_std]), the VM library part must be made no_std compatible.
This bridge is intentionally done in this order:
- change step_on so it returns output bytes via a small buffer (and adapt step/run/CLI/tests accordingly)
- then make the VM library no_std
- embed it into the Embassy firmware
However, having no_std is not enough: we also need the VM to play well with an embedded program that uses async I/O.
A background task that runs the VM must:
- execute a small amount of work,
- yield back to the executor,
- execute again later,
- without ever monopolizing the CPU.
We will therefore:
- make the VM library compile in no_std mode (using a Cargo std feature enabled by default),
- expose a single-step API that writes output into a caller-provided buffer,
- implement a cooperative runner (run_budget_on) so we can time-slice the VM in the embedded application.
Step 1: change step_on to use a buffer
Do this refactor first, while your VM still builds as a normal desktop project.
Why now?
- it’s easier to adapt step/run, the CLI and the tests while std is still available everywhere,
- and once the signature no longer mentions std::io::Write, making the library no_std becomes mostly mechanical.
The suggested signature is:
pub fn step_on(&mut self, out: &mut [u8]) -> Result<(bool, usize)>
which adds to the return value the number of bytes that have been written to the out buffer (or 0 if no output instruction has been used).
After this step, your VM core should no longer need a Write to produce output: each instruction will instead fill a small caller-provided byte buffer.
Step 2: making the VM library no_std
Now we need to make sure it can compile in no_std so it can run on the microcontroller.
Your starting point (student version) typically has:
- an execution API (Machine::step/run),
- output written to stdout,
- tests/examples built for the host.
We want:
- the library to build in no_std,
- the desktop CLI and tests to keep working,
- and the same VM crate to be usable as a dependency from tp-led-matrix.
This is why we will not create a copy of the VM crate: we will evolve the same codebase to support both worlds.
Since we are going async-only, the std/no_std split is now mostly about:
- whether you can use the standard library (std),
- what runtime you run on,
- and what output device you target.
In practice:
- desktop: CLI + stdout output
- embedded: Embassy + UART output
You will still typically use a std feature enabled by default so the CLI remains easy to run on the desktop.
This pattern is widely used in the ecosystem (many crates do the same), and once you have it, your VM becomes reusable in any embedded project.
Once that is done, you should be able to build the VM library in no_std with:
cargo build --lib --no-default-features
The std feature (what stays on the host)
We want one crate that:
- can be used as a no_std library from the firmware,
- but still provides the host-facing convenience API used by the original VM lab.
The common Cargo pattern is:
- define a std feature, enabled by default,
- compile in no_std when default features are disabled.
This typically looks like:
default = ["std"]
std = []
plus #![cfg_attr(not(feature = "std"), no_std)] at the crate root.
❎ Keep these parts available without the std feature:
- the VM core (Machine, instruction decoding/execution, memory/register access),
- the new buffer-based API (step_on(out: &mut [u8]), run_budget_on, …),
- any helper types that only depend on core.
❎ Keep these parts only when std is enabled:
- the host convenience methods that write directly to stdout:
  - step() (uses stdout internally)
  - run() (uses stdout internally)
- any error variants / conversions that mention std::io.
In other words: the embedded firmware never calls step() / run(), it only uses step_on(&mut [u8]) and forwards the returned bytes to UART TX.
Example: keep step() and run() behind std
Here is the idea for host convenience methods implemented on top of the new buffer-based step_on:
#[cfg(feature = "std")]
impl Machine {
    // Execute one instruction and forward any output it produced to stdout.
    pub fn step(&mut self) -> Result<bool> {
        use std::io::Write;
        // 11 bytes hold the longest per-instruction output ("-2147483648").
        let mut out_buf = [0u8; 11];
        let (exited, n) = self.step_on(&mut out_buf)?;
        if n != 0 {
            let mut stdout = std::io::stdout().lock();
            stdout.write_all(&out_buf[..n])?;
            stdout.flush()?;
        }
        Ok(exited)
    }

    // Run the program until it exits.
    pub fn run(&mut self) -> Result<()> {
        while !self.step()? {}
        Ok(())
    }
}
(Exact return types/signatures depend on what your original VM lab required, but the pattern is the same: step()/run() call step_on(&mut [u8]), then forward the produced bytes to stdout.)
Step 3: return output bytes via a buffer
We will make VM output explicit: each instruction may produce some bytes, and the VM will write them into a small buffer provided by the caller.
Key observation:
- opcode 6 (out) outputs one UTF-8 encoded character: at most 4 bytes
- opcode 8 (out number) outputs an i32 as ASCII: at most "-2147483648" → 11 bytes
So if the caller provides, e.g., a 12-byte buffer, the per-instruction output is always bounded.
The new execution API
⚠️ This changes the VM API compared to the initial VM lab.
In particular, you will change the signature of step_on (and any helper runners you add) so that it no longer takes a std::io::Write. Instead, it produces bytes into a small buffer, and the caller decides how to forward them (stdout on the host, UART TX on the board, etc.).
This also means that the host convenience methods (step, run, etc.) must be adapted: they can no longer forward a Write into step_on. They must call the new buffer-based step_on, then write &out_buf[..n] to stdout.
❎ Expose a stepping API:
pub fn step_on(&mut self, out: &mut [u8]) -> Result<(bool, usize)>
where:
- the returned bool is true if the VM has exited
- the returned usize is the number of bytes written into out for that instruction
Inside step_on, everything stays the same except opcodes 6 and 8:
- For opcode 6 (out): encode the character into a local 4-byte buffer and copy it into out, then return Ok((exited, n)).
- For opcode 8 (out number): format into a local small buffer (itoa) and copy it into out, then return Ok((exited, n)).
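The two encodings above need nothing beyond core, so they can be written as small helpers that also work in no_std. This is a sketch under assumptions: `write_char` and `write_i32` are hypothetical names, not part of the VM lab API, and both expect `out` to be large enough (they panic otherwise).

```rust
/// Copy the UTF-8 encoding of `c` into `out` and return its length
/// (1 to 4 bytes). Panics if `out` is shorter than the encoding.
fn write_char(c: char, out: &mut [u8]) -> usize {
    let mut local = [0u8; 4];
    let encoded = c.encode_utf8(&mut local).as_bytes();
    out[..encoded.len()].copy_from_slice(encoded);
    encoded.len()
}

/// Write `value` in decimal ASCII into `out` and return the number of
/// bytes used (at most 11). Panics if `out` is too short.
fn write_i32(value: i32, out: &mut [u8]) -> usize {
    // Collect the digits in reverse order; a u32 has at most 10 digits.
    let mut digits = [0u8; 10];
    let mut count = 0;
    // `unsigned_abs` also handles i32::MIN, whose opposite overflows i32.
    let mut magnitude = value.unsigned_abs();
    loop {
        digits[count] = b'0' + (magnitude % 10) as u8;
        count += 1;
        magnitude /= 10;
        if magnitude == 0 {
            break;
        }
    }
    let mut n = 0;
    if value < 0 {
        out[0] = b'-';
        n = 1;
    }
    // Emit the digits most significant first.
    for &digit in digits[..count].iter().rev() {
        out[n] = digit;
        n += 1;
    }
    n
}
```

With an 11-byte buffer, `write_i32(i32::MIN, &mut buf)` returns 11 and fills the buffer with "-2147483648", which is the worst case mentioned earlier.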
Where is backpressure now?
The caller is responsible for sending &out[..n] to the final output (stdout on the desktop, UART TX on the board).
This removes the need for the VM to depend on std::io::Write.
Budgeted execution
❎ Implement a cooperative runner:
pub fn run_budget_on(&mut self, out: &mut [u8], budget: u32) -> Result<(bool, usize)>
This runs up to budget steps and returns accumulated bytes written.
(You can also keep a thin helper that just loops on step_on if you prefer.)
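One detail deserves attention when accumulating output over several steps: the buffer may fill up before the budget is spent. The sketch below shows one way to handle it, by stopping early when fewer than 11 bytes remain. `ToyMachine` is a hypothetical stand-in for your VM (it emits one byte of a message per step), `MAX_STEP_OUTPUT` is an assumed constant name, and the `()` error type replaces your real error type.

```rust
/// Largest output of a single instruction: "-2147483648" is 11 bytes.
const MAX_STEP_OUTPUT: usize = 11;

/// Toy stand-in for the real VM: outputs one byte of `message` per step,
/// then reports that it has exited.
struct ToyMachine {
    message: &'static [u8],
    pc: usize,
}

impl ToyMachine {
    /// Execute one step, writing its output at the start of `out`.
    fn step_on(&mut self, out: &mut [u8]) -> Result<(bool, usize), ()> {
        if self.pc == self.message.len() {
            return Ok((true, 0));
        }
        out[0] = self.message[self.pc];
        self.pc += 1;
        Ok((false, 1))
    }

    /// Run at most `budget` steps, appending their output to `out`.
    /// Returns early if the machine exits or if `out` might not be able
    /// to hold the output of one more instruction.
    fn run_budget_on(&mut self, out: &mut [u8], budget: u32) -> Result<(bool, usize), ()> {
        let mut written = 0;
        for _ in 0..budget {
            if out.len() - written < MAX_STEP_OUTPUT {
                break;
            }
            let (exited, n) = self.step_on(&mut out[written..])?;
            written += n;
            if exited {
                return Ok((true, written));
            }
        }
        Ok((false, written))
    }
}
```

With the message "hello" and a 32-byte buffer, a budget of 3 returns `Ok((false, 3))` with "hel" in the buffer, and a following call with a budget of 10 returns `Ok((true, 2))` with "lo". The caller then forwards `&out[..written]` and yields before the next call.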
Step 4: cooperative stepping
Even with buffered output, the VM still does CPU work between outputs.
The simplest way to integrate a CPU-bound algorithm into an async system is:
- split it into very small units of work (here: a single step_on()),
- run a fixed number of steps (“budget”),
- then yield back to the executor.
That is exactly what run_budget_on() gives you.
In the firmware, you will combine it with a small delay or yield_now() so the VM never monopolizes the CPU.
Why make this change in the library?
Because in an embedded environment, you often want to place policy decisions outside of the library:
- the library executes “pure compute” plus formatted output,
- the application decides how much CPU time to give it.
This keeps the VM reusable.
Optional: reduce formatting overhead
Your VM uses write!(out, ...) for each character / number.
In embedded environments, formatting can be expensive. Two easy techniques:
- pick a “budget” small enough that formatting overhead does not block the system,
- prefer output programs that don’t print too fast.
Later in this section we will hook this output to UART TX.
Depending on the VM crate from tp-led-matrix
Now that the VM library exposes an embeddable building block (a budgeted runner), we will use it from the LED matrix firmware.
Add the dependency
In your LED-matrix project Cargo.toml:
❎ Add a dependency on the VM crate by path.
Important: this must point to your existing VM crate directory, not to a copy. From now on, there should be exactly one shared implementation of the VM library crate in the repository, used both by:
- the desktop VM CLI/tests,
- and the embedded tp-led-matrix firmware.
Using a path = "..." dependency is what guarantees you are editing/running the same code in both contexts.
Important naming detail:
- The package name of the VM project is vm (that is the name = "vm" under [package] in the VM crate’s Cargo.toml). This is what Cargo uses to find the crate on disk / in the workspace.
- The library crate name is interpreter (that is the name = "interpreter" under [lib]). This is the name you write in Rust code: use interpreter::Machine;.
Those two names are allowed to differ.
In tp-led-matrix, we want the dependency key to match the name we will use in Rust code (interpreter), and we want to be explicit that the underlying package we’re pointing at is the VM package (vm).
Also:
- your VM crate is std by default, so in embedded you must disable default features (to turn std off).
Your dependency entry should therefore look like:
interpreter = { package = "vm", path = "../vm", default-features = false }
(Adjust the relative path if needed.)
❎ cargo build the embedded firmware.
At this stage, you are only compiling the VM library for your embedded target. Nothing is executed yet.
If this fails with errors mentioning std, go back to the previous page and finish the no_std conversion of the VM crate.
Sanity check
In tp-led-matrix/src/main.rs (or wherever your async main lives):
❎ Add a tiny, compile-only check that the type is visible, for instance:
use interpreter::Machine;
Do not instantiate it yet, as we will do this in a new async task.
Embedding a VM program with include_bytes!()
We want the firmware to boot and immediately start running a VM program.
Because embedded systems may not have a filesystem, we will embed the bytecode in the firmware at compile time.
Choose a program
Your VM repository ships example programs (such as hello_world.bin or 99bottles.bin). Pick a small one first.
For example: vm/examples/hello_world.bin (relative to the root of your VM project)
Embed the bytes
In your firmware code, define:
static PROGRAM: &[u8] = include_bytes!("../../vm/examples/hello_world.bin");
Notes:
- the path is relative to the Rust source file where the macro is located,
- include_bytes!() returns &'static [u8; N]; the coercion into &'static [u8] happens automatically.
Acceptance criteria
- the program bytes are visible as a &'static [u8] constant,
- the firmware builds.
We will actually execute the VM in the next page.
Printing VM output on the serial port
The VM produces output through the out and out number instructions.
We want this output to go to a serial terminal on the host.
But there is a catch: your firmware currently uses USART1 RX with DMA to receive images.
We must:
- keep the RX task intact,
- enable TX,
- ensure the VM can print without blocking for long periods.
Strategy: a small UART TX task + a channel
The most robust and student-friendly approach is to keep UART TX owned by one dedicated task, and send VM output bytes to that task through a bounded channel.
- The VM task stays simple: it just calls step_on(&mut out_buf) and forwards any produced bytes.
- The serial TX task owns UartTx and is the only place that touches the hardware.
- The channel capacity provides backpressure: if the terminal is slow, the channel fills up and the VM automatically pauses when trying to send more bytes.
This design avoids having to implement a custom Future with Pin/poll just to adapt a UART driver.
What “backpressure” means
A UART is slower than the CPU. Sometimes it cannot accept a new byte right now because its internal hardware buffer is full.
Backpressure is the mechanism that prevents data loss in this situation:
- when the UART cannot accept more data, the channel send(...).await call does not complete yet,
- so the VM runner task is paused (it yields to the executor),
- and it automatically resumes later when the UART has transmitted enough bytes.
This is exactly what we want:
- no bytes are dropped,
- VM output stays in-order,
- other Embassy tasks (display, RX, etc.) can continue running while the VM is waiting.
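Backpressure is easy to observe on the host with a bounded channel from the standard library. This is only an analogy: `std::sync::mpsc::sync_channel` is not the Embassy channel, `try_send` is used here so that the "channel is full" situation is visible instead of suspending, and `backpressure_demo` is a hypothetical name.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Fill a bounded channel of capacity 2, observe that a third byte is
/// refused, then drain one byte and try again.
/// Returns (third send refused, first byte received, retry accepted).
fn backpressure_demo() -> (bool, u8, bool) {
    let (tx, rx) = sync_channel::<u8>(2);
    tx.try_send(b'a').unwrap();
    tx.try_send(b'b').unwrap();
    // The channel is full: the producer cannot make progress right now.
    let refused = matches!(tx.try_send(b'c'), Err(TrySendError::Full(_)));
    // The consumer (the "UART") takes one byte out, in order.
    let first = rx.recv().unwrap();
    // There is room again, so the producer can resume.
    let accepted = tx.try_send(b'c').is_ok();
    (refused, first, accepted)
}
```

`backpressure_demo()` returns `(true, b'a', true)`. In the firmware, the refused send is replaced by an awaited send that suspends the VM runner task until the TX task has taken bytes out of the channel.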
Important: sharing USART1
On STM32, a single peripheral instance (USART1) cannot be safely split into two independent “owners” (one RX and one TX) unless the HAL explicitly supports it.
There are two valid approaches:
1. Preferred (single owner): configure USART1 once, then split the driver into RX and TX halves and pass them to the respective tasks.
2. Fallback (two UARTs): keep the existing RX setup on USART1, and use a different UART peripheral for VM output.
In this lab, we will aim for (1). You will need to inspect the Embassy STM32 UART API for your version.
❎ Modify your serial initialization so that:
- UART is created once in main,
- you obtain a TX handle and an RX handle,
- RX goes to the existing serial_receiver task,
- TX will be moved into the serial TX task (which will later receive the VM output).
If your current code uses UartRx::new(...), you will likely need to move to a constructor that returns a full Uart and then split it.
How to use split() in practice
The exact constructor name depends on your Embassy version, but the pattern is always the same:
- Create a full UART instance once (in main) with both pins and both DMA channels.
- Split it into a transmit half and a receive half.
- Move each half into the task that owns it.
Typical pattern
- In main, instead of building a UartRx, build a full Uart (or BufferedUart) instance.
- Immediately call .split() (or sometimes .split_rx_tx() depending on the API).
Conceptually:
let uart = Uart::new(...);
let (tx, rx) = uart.split();
Then:
- spawn serial_receiver(rx, ...) for the LED-matrix image receiver,
- pass tx to the serial TX task (e.g. vm_serial_tx(tx)),
- the VM runner task will not touch the UART directly.
Important ownership rule
After calling split(), do not keep using the original uart object: both halves now own the peripheral state they need.
DMA note
If your setup uses DMA for RX (as in the image receiver), you will typically:
- give the RX half the RX DMA channel,
- provide a TX DMA channel for the TX half.
The key idea is: instantiate once, split once, then move halves to tasks.
VM output with a channel (backpressure)
The VM core produces per-instruction output bytes into a small buffer via Machine::step_on(&mut [u8]).
So on the firmware side we:
- run the VM in an async task,
- and forward the produced bytes to a dedicated UART TX task through a bounded channel.
The channel provides backpressure: when it is full, send().await will suspend the VM runner task until the UART catches up.
❎ Implement VM output using a channel and a TX task:
- Create a bounded Channel (for example Channel<..., u8, 256>).
- Create a vm_serial_tx task that:
  - owns UartTx,
  - receives bytes from the channel,
  - writes them to UART (tx.write(&[b]).await).
- In your VM runner task:
  - create a small output buffer (11 bytes is enough),
  - call let (exited, n) = machine.step_on(&mut out_buf)?;
  - for &out_buf[..n], send().await each byte to the channel.
No bytes are dropped: if the UART cannot accept more data yet, the VM runner naturally pauses.
Acceptance criteria
- You can open a serial terminal and see VM output.
- The LED matrix still refreshes correctly.
- The serial_receiver still works.
Running the VM as an async background task
Now we have all the pieces:
- a VM library that can run for a bounded instruction budget per call,
- an embedded VM program included with include_bytes!(),
- a UART TX half and a bounded channel to provide backpressure.
Let’s write the vm_runner task.
The vm_runner task
❎ Create a new async task vm_runner that:
- creates a VM Machine from the included bytes,
- creates a small per-instruction output buffer (11 bytes is enough),
- runs the machine in a loop using step_on (or run_budget_on) and forwards produced bytes to UART TX through the bounded channel,
- yields between budgets so other tasks always get CPU time.
Pseudo-code:
- create machine
- loop (“run one VM instance”):
  - loop (“budget”):
    - run one step_on
    - forward out_buf[..n] into the TX channel
    - if exited == true: stop running this VM instance (break out of the program loop)
  - Timer::after_millis(0).await (preferred: always available) or Timer::after_millis(1).await
- restart the VM (create a fresh machine)
Note: depending on your Embassy version and dependencies, yield_now() (provided by the embassy-futures crate) may or may not be available in your project. The portable solution is a zero-duration timer.
Choosing a budget
The budget is how you control CPU usage.
Start with something conservative, for example:
BUDGET = 50 instructions
Then:
- if the VM output is too slow: increase budget,
- if the display glitches: decrease budget.
This is exactly the kind of real-time trade-off you will face in real embedded systems.
Do not perturb the LED matrix
Your display task likely runs on a dedicated interrupt executor (if you did that bonus part) or at least relies on timely polling.
To keep it stable:
- never run the VM in a tight loop without yielding,
- let UART backpressure suspend the VM task (for example by awaiting on a bounded channel send),
- do not lock shared resources for long.
Error handling
If step_on() returns an error (bad program, invalid opcode, etc.):
- print the error (on serial),
- restart the VM.
In an embedded system, a “crash and restart the subsystem” strategy is often completely fine.
Acceptance criteria
- At boot, the board immediately starts printing VM output.
- You can still upload images on the LED matrix.
- There are no visible glitches on the display.
Checklist and possible improvements
Minimal checklist
You are done when all of the following are true:
- the VM library builds in no_std (with default-features = false),
- the VM library exposes an async budgeted stepping API (run_budget) usable from async firmware,
- the LED matrix firmware includes a .bin VM program using include_bytes!(),
- the firmware runs the VM in a background async task,
- VM output is visible on a serial terminal (complete, in-order, no dropped bytes), matching the desktop output for the same .bin program,
- the LED matrix remains responsive and does not glitch,
- the serial image receiver keeps working.
Improvements
If you want to go further:
1. Recommended: UART TX task + bounded channel
In practice, the most robust solution is to own UART TX in a single dedicated task and send outgoing bytes to it through a bounded channel (e.g. embassy_sync::channel::Channel).
Why this is the recommended approach:
- it avoids tricky low-level Future/poll adapter code,
- it keeps peripheral ownership clear (one task owns UartTx),
- it still provides backpressure (when the channel is full, the VM waits),
- it is easy to extend (batching, prefixes, multiple producers, etc.).
If you want higher throughput, the TX task can also batch bytes before calling tx.write(...).
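One possible batching strategy is to wait for the first byte, then grab whatever is already queued, up to a maximum batch size, without waiting. The sketch below models it with the standard library; next_batch and batches_for are hypothetical names, and each returned batch stands for one call to tx.write(...) in the firmware.

```rust
use std::sync::mpsc::{sync_channel, Receiver};

/// Block for the first byte, then take what is already queued (up to `max`
/// bytes in total) without waiting. Returns None once all senders are gone
/// and the channel is empty.
pub fn next_batch(rx: &Receiver<u8>, max: usize) -> Option<Vec<u8>> {
    let first = rx.recv().ok()?;
    let mut batch = vec![first];
    while batch.len() < max {
        match rx.try_recv() {
            Ok(b) => batch.push(b),
            Err(_) => break, // empty or disconnected: send what we have
        }
    }
    Some(batch)
}

/// Queue `bytes` up front, then drain them batch by batch.
pub fn batches_for(bytes: &[u8], max: usize) -> Vec<Vec<u8>> {
    let (tx, rx) = sync_channel::<u8>(bytes.len().max(1));
    for &b in bytes {
        tx.send(b).unwrap();
    }
    drop(tx); // no more producers
    let mut writes = Vec::new();
    while let Some(batch) = next_batch(&rx, max) {
        writes.push(batch);
    }
    writes
}
```

With seven queued bytes and a maximum batch size of 3, this produces three writes of 3, 3 and 1 bytes instead of seven single-byte writes.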
2. Smarter yielding inside the VM
Instead of yielding after a fixed budget:
- yield after each out instruction,
- or yield when a certain amount of output was produced.
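Both variants can be expressed with a small counter that tracks the output produced since the last yield. OutputYield is a hypothetical helper, not part of the VM library: a threshold of 1 yields after every output-producing instruction, and a larger threshold yields once enough bytes have accumulated.

```rust
/// Decides when to yield based on produced output rather than on the
/// number of executed instructions.
pub struct OutputYield {
    threshold: usize,
    pending: usize,
}

impl OutputYield {
    pub fn new(threshold: usize) -> Self {
        OutputYield { threshold, pending: 0 }
    }

    /// Record that the last step produced `n` bytes. Returns true when the
    /// runner should yield now, and resets the counter in that case.
    pub fn record(&mut self, n: usize) -> bool {
        self.pending += n;
        if self.pending >= self.threshold {
            self.pending = 0;
            true
        } else {
            false
        }
    }
}
```

A program that computes for a long time without producing output would never trigger this policy, so it is safer to keep an instruction budget as an upper bound alongside it.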
3. Run multiple VM instances
Spawn multiple vm_runner tasks with different programs and prefixes, for example:
[vm1] ...
[vm2] ...
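Since VM output arrives in small chunks, the prefix has to be inserted at the start of each line, not at the start of each chunk. A minimal sketch, where LinePrefixer is a hypothetical helper and a Vec<u8> stands in for the TX channel:

```rust
/// Inserts a prefix at the beginning of every output line, remembering
/// across chunks whether the next byte starts a new line.
pub struct LinePrefixer {
    prefix: &'static str,
    at_line_start: bool,
}

impl LinePrefixer {
    pub fn new(prefix: &'static str) -> Self {
        LinePrefixer { prefix, at_line_start: true }
    }

    /// Append `chunk` to `out`, adding the prefix at each line start.
    pub fn push(&mut self, chunk: &[u8], out: &mut Vec<u8>) {
        for &b in chunk {
            if self.at_line_start {
                out.extend_from_slice(self.prefix.as_bytes());
                self.at_line_start = false;
            }
            out.push(b);
            if b == b'\n' {
                self.at_line_start = true;
            }
        }
    }
}
```

If several VM tasks send individual bytes to the same channel, their lines can still interleave on the terminal. Sending one complete line per channel message is one way to avoid that.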
4. “VM controls the image”
A fun integration project:
- define a memory-mapped “framebuffer” region in VM memory,
- have a task periodically read it and update the LED matrix image.
This makes your interpreted program drive the hardware.
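As a starting point, here is one way the reading side could look. All the specifics are assumptions made for the illustration: the framebuffer base address FB_BASE, an 8×8 matrix, a row-major layout with three bytes (red, green, blue) per pixel, and the Rgb type, which you would replace with your own color and image types.

```rust
pub const FB_BASE: usize = 0x100; // hypothetical address in VM memory
pub const ROWS: usize = 8;
pub const COLS: usize = 8;

#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Rgb {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

/// Copy the framebuffer region out of VM memory, or return None if the
/// memory is too small to contain it.
pub fn read_framebuffer(memory: &[u8]) -> Option<[[Rgb; COLS]; ROWS]> {
    let region = memory.get(FB_BASE..FB_BASE + ROWS * COLS * 3)?;
    let mut image = [[Rgb::default(); COLS]; ROWS];
    for (i, px) in region.chunks_exact(3).enumerate() {
        image[i / COLS][i % COLS] = Rgb { r: px[0], g: px[1], b: px[2] };
    }
    Some(image)
}
```

The periodic task would call such a function between VM budgets, so that it never observes the memory while an instruction is being executed.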