
Introduction

Welcome to 4SE02: Rust for Embedded Systems! 🦀

This course, taught by Guillaume Duc and Samuel Tardieu, is part of the Embedded Systems program at Télécom Paris. Throughout these labs, you’ll discover how Rust’s safety guarantees and zero-cost abstractions make it an excellent choice for embedded development.

What You’ll Learn

In these practical exercises, you’ll:

  • Master Rust fundamentals through hands-on coding
  • Build real embedded applications for microcontrollers
  • Work with LED matrices, serial communication, and real-time systems
  • Experience the power of Rust’s type system in preventing bugs at compile time

Course Materials

This content is reserved for students of the Institut Polytechnique de Paris.

© 2020-2026 Guillaume Duc and Samuel Tardieu – all rights reserved

Setting Up Your Development Environment

Welcome to your first step in Rust embedded development! In this section, we’ll set up all the tools you need to start building amazing embedded applications. Don’t worry if you’re new to Rust—we’ll guide you through each step.

Git Repository Setup

Before diving into code, let’s get your project repository ready.

Join the course group: Request to join the 4SE02/2526 group on the Telecom GitLab using this link.

Once approved by the instructors, you’ll have your own personal repository to store all your practical work. This is where your embedded Rust journey begins!

Installing rustup - Your Rust Toolchain Manager

rustup is Rust’s official toolchain installer and version manager. Think of it as your Swiss Army knife for managing Rust installations—it handles compiler versions, cross-compilation targets, and all the tools you’ll need.

Install rustup using one of these methods:

  • From your Linux distribution’s package manager, or
  • From the rustup.rs website

If you choose the website installation, remember to reload your shell environment afterwards so that the Rust tools are added to your PATH.

💡 Platform flexibility: While this course assumes you’re using GNU/Linux, the Rust ecosystem works great on macOS and Windows too. Feel free to use another OS, but note that we can only provide support for GNU/Linux environments.

💡 Storage tip for school computers: Running out of disk quota? Use the local directory /home/users/LOGIN (replace LOGIN with your username). Set the RUSTUP_HOME environment variable to /home/users/LOGIN/rustup in your shell configuration files to store all Rust data there instead of ~/.rustup. Remember to delete ~/.rustup to free up space. You’ll need to repeat this setup if you log in to a different computer.

Understanding Rust Toolchain Versions

Rust’s compilation toolchain comes in three flavors, each serving a different purpose:

  • stable: The production-ready version, rigorously tested and updated every six weeks. This is what you’ll use for this course.
  • beta: The testing ground for the next stable release. Great for early adopters who want to test upcoming features.
  • nightly: The bleeding edge with experimental features. Some features here will eventually make it to stable, while others are just experiments.

By default, rustup installs the latest stable version. Let’s verify your installation:

$ rustup show
Default host: x86_64-unknown-linux-gnu
rustup home:  /usr/local/rustup

installed toolchains
--------------------
stable-x86_64-unknown-linux-gnu (active, default)

active toolchain
----------------
name: stable-x86_64-unknown-linux-gnu
active because: it's the default toolchain
installed targets:
  x86_64-unknown-linux-gnu

On a development system, you may find several toolchains, targets, and components installed, for example:

$ rustup show
Default host: x86_64-unknown-linux-gnu
rustup home:  /home/sam/.rustup

installed toolchains
--------------------
stable-x86_64-unknown-linux-gnu
beta-x86_64-unknown-linux-gnu
nightly-x86_64-unknown-linux-gnu (active, default)

active toolchain
----------------
name: nightly-x86_64-unknown-linux-gnu
active because: it's the default toolchain
installed targets:
  riscv32imac-unknown-none-elf
  riscv32imafc-unknown-none-elf
  riscv32imc-unknown-none-elf
  thumbv7em-none-eabihf
  thumbv7m-none-eabi
  thumbv8m.main-none-eabihf
  x86_64-unknown-linux-gnu

You can update all installed components with rustup update.

Ensure you’re up to date: Run rustup update to get the latest stable version. This is especially important if you already had rustup installed before this course.

Your Rust Toolkit

When you install a Rust toolchain, you get a powerful set of tools:

  • cargo: Your command center! This all-in-one tool orchestrates compilation, testing, documentation, and more. You’ll primarily interact with Rust through cargo commands like cargo build, cargo test, and cargo run.
  • rustc: The Rust compiler itself (though you’ll rarely call it directly—cargo does that for you).
  • rustdoc: Automatically generates beautiful documentation from your code comments.
  • rustfmt: Formats your code according to Rust community standards.
  • clippy: A smart linter that catches common mistakes and suggests idiomatic improvements.

Choose Your Code Editor

You’re free to use any editor you prefer! Unless you already have a favorite editor (Emacs, Neovim, Lapce, Helix, …), we suggest that you use:

Visual Studio Code with these extensions:

  • rust-analyzer: Provides intelligent code completion, inline error checking, and navigation. It’s like having a Rust expert looking over your shoulder!
  • Error Lens (optional): Displays errors inline as you type. Helpful but can feel a bit intrusive—try it and see if you like it.

Code Quality: Formatting and Linting

Rust’s community values consistent, high-quality code. Two tools help you achieve this effortlessly:

Clippy: Your Rust Mentor 🦀

Clippy is an intelligent linter that identifies anti-patterns, inefficient code, and suggests more idiomatic Rust approaches.

⚠️ Important: Run cargo clippy regularly on your code. Address its suggestions or adjust your code until Clippy is satisfied. Think of it as pair programming with an experienced Rustacean!

You have some flexibility:

  • Disable specific warnings when you have a good reason: Use #[allow(clippy::some_lint_name)] on specific items (be ready to justify this choice!)
  • Enable stricter checks for even better code quality: Run cargo clippy -- -D clippy::pedantic

Rustfmt: Consistent Formatting Made Easy

⚠️ Keep your code formatted: Simply run cargo fmt to automatically format your code according to Rust’s standard style. No debates about formatting—Rust has agreed on one style for everyone!

Pro tip: Run both cargo fmt and cargo clippy before every commit to keep your code clean and professional.

Your First Rust Program: Fibonacci

Let’s dive into Rust with a classic programming exercise: computing Fibonacci numbers! This hands-on introduction will get you comfortable with Rust’s syntax, tools, and workflows.

Creating Your First Project

Time to create your first Rust project using Cargo.

Create a new project: Run cargo new fibo in your terminal.

This creates a new directory called fibo with a complete binary project structure. (If you wanted a library instead, you’d use cargo new --lib.)

Navigate into your new project directory and let’s explore what Cargo created for you:

Project Structure

  • Cargo.toml: The project manifest. It contains metadata (name, authors, version) and will list any dependencies you add.
  • Cargo.lock: Generated after compilation, this file locks down the exact versions of dependencies used. This ensures anyone can reproduce your exact build—even months later.
  • src/: Your source code lives here. Right now, it just has main.rs with a “Hello, world!” program.

All these files should be committed to version control (git). Cargo even initializes a git repository for you (unless you’re already in one) complete with a .gitignore file.

After compilation, a target/ directory will appear containing build artifacts and binaries. This directory is large and regeneratable, so it’s already in .gitignore—never commit it!

Building and Running

Let’s build your project:

Compile the project: Run cargo build

By default, this creates a debug build—slower but easier to debug—in target/debug/fibo.

Run your program: Execute ./target/debug/fibo and observe the “Hello, world!” output from src/main.rs.

💡 Shortcut: Instead of building and running separately, use cargo run to compile (if needed) and execute in one command!

💡 Release builds: For optimized production code, use cargo build --release or cargo run --release. Release builds are significantly faster but take longer to compile.

Implementing Fibonacci Recursively

Now for the fun part—let’s implement the classic Fibonacci function! As a refresher, the Fibonacci sequence is defined as:

  • fibo(0) = 0
  • fibo(1) = 1
  • fibo(n) = fibo(n-1) + fibo(n-2) for n > 1

Implement the recursive Fibonacci function with this signature:

fn fibo(n: u32) -> u32 {
    // TODO: Your implementation here
}

💡 Rust tip: Remember that if is an expression in Rust—it returns a value! This means you can write if condition { value1 } else { value2 } without explicit return statements. Embrace this functional style!
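To illustrate the tip above, here is one possible recursive sketch (the signature comes from the exercise; the body shows the expression-oriented style, with no `return` keyword):

```rust
fn fibo(n: u32) -> u32 {
    // `if` is an expression: the whole body evaluates to a value.
    if n < 2 { n } else { fibo(n - 1) + fibo(n - 2) }
}

fn main() {
    println!("fibo(10) = {}", fibo(10)); // prints 55
}
```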

Displaying the Sequence

Create a loop in main() that displays Fibonacci values from 0 to 42:

fibo(0) = 0
fibo(1) = 1
fibo(2) = 1
fibo(3) = 2
fibo(4) = 3
fibo(5) = 5
...
fibo(42) = 267914296

Once working, try running in both debug and release modes to see the dramatic speed difference! Release mode should be much faster.

Making It Fast: Iterative Implementation

While elegant, recursive Fibonacci is notoriously slow for larger numbers. Let’s fix that with iteration.

Reimplement fibo iteratively while keeping the same function signature.

Hints to help you succeed:

  • Declare variables to track previous Fibonacci numbers
  • Use mut to make variables mutable
  • Create a loop without using the index: name it _ to avoid compiler warnings about unused variables
  • You can return early for base cases (when n < 2)

This version should be significantly faster than the recursive one, even in debug mode!
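One way the hints above can come together (a sketch, not the only valid shape):

```rust
fn fibo(n: u32) -> u32 {
    if n < 2 {
        return n; // early return for the base cases
    }
    let (mut prev, mut curr) = (0u32, 1u32); // mutable trackers
    // The loop index is unused, so it is named `_`.
    for _ in 2..=n {
        let next = prev + curr;
        prev = curr;
        curr = next;
    }
    curr
}

fn main() {
    println!("fibo(42) = {}", fibo(42)); // prints 267914296
}
```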

Handling Overflow: When Numbers Get Too Big

Let’s explore what happens when Fibonacci numbers exceed what a u32 can hold.

Change the limit from 42 to 50 and run your program.

Notice something strange between fibo(47) and fibo(48)? The numbers suddenly become nonsensical! This is integer overflow—when a number is too large to fit in a u32, it wraps around.

Rust provides several elegant ways to handle this:

  1. Use larger integers: Switch from u32 to u64 (easy but just delays the problem)
  2. Saturated arithmetic: Operations that hit a boundary (min or max) just stay at that boundary
  3. Checked arithmetic: Operations that would overflow return an error instead of producing wrong results

Let’s explore options 2 and 3 to see Rust’s safety features in action!

Saturated Arithmetic: Staying Within Bounds

Saturating operations “clamp” at the type’s maximum value when overflow would occur.

Find the saturating_add method in the u32 documentation.

Replace your addition with saturated addition and observe the results.

💡 Type suffixes: You can specify numeric literal types with suffixes like 1u32, 42i64, or 3.14f32.

Notice that results stay monotonic (always increasing) but plateau at u32::MAX (2³²-1). The values are wrong, but at least they don’t wrap around wildly!
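A quick demonstration of the clamping behaviour of saturating_add from the standard library:

```rust
fn main() {
    assert_eq!(10u32.saturating_add(20), 30);         // in range: normal result
    assert_eq!(u32::MAX.saturating_add(1), u32::MAX); // would overflow: clamps at the maximum
}
```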

Checked Arithmetic: Detecting Errors

Checked operations return None when overflow occurs instead of producing incorrect values.

Find the checked_add method in the u32 documentation.

Replace saturated addition with checked_add() followed by .unwrap() to extract the value.

Run your program—it should panic with a runtime error when overflow occurs. Not graceful, but at least it doesn’t silently produce wrong answers!
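The behaviour of checked_add and unwrap() in isolation:

```rust
fn main() {
    assert_eq!(10u32.checked_add(20), Some(30));  // fits: Some(result)
    assert_eq!(u32::MAX.checked_add(1), None);    // would overflow: None
    let v = 10u32.checked_add(20).unwrap();       // unwrap() panics on None
    assert_eq!(v, 30);
}
```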

Handling Overflow Gracefully with Option

Let’s make overflow handling explicit and elegant using Rust’s Option type.

Change the function signature to return Option<u32>:

fn fibo(n: u32) -> Option<u32> {
    // TODO: Return None if overflow would occur,
    //       Some(result) otherwise
}

Now your function can communicate “this result doesn’t fit in a u32” by returning None.

Update your main function to stop the loop when a computation fails (returns None).

You can use either:

  • A match expression to handle Some(value) and None cases
  • An if let Some(value) = fibo(n) statement for cleaner code when you only care about the success case

Perfect! Your program now accurately computes Fibonacci numbers and stops gracefully when values become too large for u32.
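A sketch of how the pieces can fit together, using the ? operator to propagate None from checked_add and if let in the loop (one possible solution shape, assuming an iterative implementation):

```rust
fn fibo(n: u32) -> Option<u32> {
    if n < 2 {
        return Some(n);
    }
    let (mut prev, mut curr) = (0u32, 1u32);
    for _ in 2..=n {
        let next = prev.checked_add(curr)?; // `?` returns None on overflow
        prev = curr;
        curr = next;
    }
    Some(curr)
}

fn main() {
    for n in 0..=50 {
        if let Some(value) = fibo(n) {
            println!("fibo({n}) = {value}");
        } else {
            break; // first value that does not fit in a u32
        }
    }
}
```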

Leveraging the Ecosystem: Using Crates

Now let’s explore one of Rust’s superpowers: its vibrant ecosystem of reusable libraries called crates.

What Are Crates?

A crate is a package of Rust code that can be either:

  • A binary crate: An executable program (like the fibo project you created)
  • A library crate: Reusable code that other projects can import

Your fibo project is a binary crate with source code in src/main.rs. The Rust community shares thousands of crates on crates.io, making it easy to add powerful functionality to your projects.

Adding Command-Line Argument Parsing

Let’s enhance your fibo program with professional command-line argument parsing using the popular clap crate.

Add clap as a dependency by adding these lines to your Cargo.toml file:

[dependencies]
clap = { version = "4.5.58", features = ["derive"] }

This tells Cargo to:

  • Fetch clap version 4.5.58 or newer (but stay below 5.0.0)
  • Enable the derive feature (not enabled by default), which allows using #[derive(Parser)] for cleaner code

Want to learn more about version specifications? Check out Cargo’s dependency documentation.

💡 You can also use cargo add clap -F derive on the command line instead of editing the Cargo.toml file by hand.

Using Clap in Your Code

Import the Parser trait at the top of main.rs:

use clap::Parser;

Create a command-line interface using the clap documentation to match this usage pattern:

Compute Fibonacci suite values

Usage: fibo [OPTIONS] <VALUE>

Arguments:
  <VALUE>  The maximal number to print the fibo value of

Options:
  -v, --verbose       Print intermediate values
  -m, --min <NUMBER>  The minimum number to compute
  -h, --help          Print help

💡 Automatic dependency management: When you specify clap in Cargo.toml, Cargo automatically downloads it along with all of its dependencies, then compiles everything when you build your project. It’s that simple!

The exact dependency versions used are recorded in Cargo.lock, ensuring anyone can rebuild your project with identical dependencies—even years later.

Maintaining Code Quality

Before considering your work complete, let’s ensure it meets Rust community standards:

Run Clippy to catch common mistakes and get suggestions: cargo clippy

  • Address any warnings or suggestions it provides

Format your code according to Rust conventions: cargo fmt

💡 Best practice: Run cargo fmt and cargo clippy before every commit to keep your code clean and professional. Many developers configure their editors to run these automatically!

Practice Problems: Mastering Rust Concepts 🧩

These exercises will sharpen your understanding of Rust’s unique features. Create a “problems” project in your repository to work through them.

Lifetimes: Understanding Ownership and Borrowing

Lifetimes are one of Rust’s most distinctive features. They ensure references stay valid without runtime overhead. Let’s explore them through practical problems.

Understanding the trim Method

The trim method on strings removes leading and trailing whitespace. Its signature uses lifetime elision:

fn trim(&self) -> &str;

This is shorthand for the explicit form:

fn trim<'a>(&'a self) -> &'a str;

The lifetime 'a connects the input and output: the returned string slice lives exactly as long as the string it came from.

Problem 1: Who Owns the String?

This code looks reasonable, but it won’t compile. Can you figure out why?

fn ret_string() -> String {
    String::from("  A String object  ")
}

fn main() {
    let s = ret_string().trim();
    assert_eq!(s, "A String object");
}

Think about it: What’s the lifetime of s? Who owns the underlying string with spaces? Every value in Rust has exactly one owner—when the owner goes out of scope, the value is dropped.

Fix this code so it compiles and s holds the trimmed string.

💡 Hint: You can reuse the same variable name with shadowing!
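Shadowing in general looks like this (a generic illustration, not the fix itself, which you should still work out):

```rust
fn main() {
    let x = "  padded  ";
    let x = x.trim(); // shadowing: a new binding named `x` replaces the old one
    assert_eq!(x, "padded");
}
```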

Problem 2: Choosing Between Alternatives

Sometimes a function returns one of several borrowed values. How do lifetimes work in this case?

Add appropriate lifetime annotations to make this function compile:

fn choose_str(s1: &str, s2: &str, select_s1: bool) -> &str {
    if select_s1 { s1 } else { s2 }
}

Important constraint: At call time, s1 and s2 may have different lifetimes. We don’t want to artificially constrain them to have the same lifetime—that would be too restrictive.

Think carefully about what lifetime the return value should have!

Problem 3: Building an Owned-Or-Ref (OOR) Type

This is a meatier challenge that combines enums, generics, and smart pointer traits.

⚠️ For this problem, don’t peek at the standard Cow type—solve it yourself first, then compare your solution!

The goal: Create an OOR type that can efficiently store either a String (owned) or a &str (borrowed), avoiding unnecessary copies when the string already exists.

Step 1: Define the Enum

Create an OOR enum with two variants:

  • Owned: stores a String
  • Borrowed: stores a &str

You’ll need a generic lifetime parameter. What does it represent? (Think about the lifetime of borrowed data!)

Step 2: Implement Deref

Implement the Deref trait so that OOR dereferences to &str.

Consider: What’s the lifetime of the resulting &str? Why is your choice always safe?

Test it: Verify you can call &str methods directly on OOR objects.

Step 3: Implement DerefMut

This gets trickier!

Implement DerefMut for OOR.

Challenge: If you have a Borrowed variant, you can’t get a &mut str from an immutable &str. You’ll need to convert to an Owned variant with a cloned String first!

Step 4: Comprehensive Test

Verify your implementation passes this test:

// Check Deref for both variants of OOR
let s1 = OOR::Owned(String::from("  Hello, world.  "));
assert_eq!(s1.trim(), "Hello, world.");
let mut s2 = OOR::Borrowed("  Hello, world!  ");
assert_eq!(s2.trim(), "Hello, world!");

// Check choose
let s = choose_str(&s1, &s2, true);
assert_eq!(s.trim(), "Hello, world.");
let s = choose_str(&s1, &s2, false);
assert_eq!(s.trim(), "Hello, world!");

// Check DerefMut, a borrowed string should become owned
assert!(matches!(s1, OOR::Owned(_)));
assert!(matches!(s2, OOR::Borrowed(_)));
unsafe {
    for c in s2.as_bytes_mut() {
        if *c == b'!' {
            *c = b'?';
        }
    }
}
assert!(matches!(s2, OOR::Owned(_)));
assert_eq!(s2.trim(), "Hello, world?");

What’s happening here? Notice how s2 starts as Borrowed but becomes Owned when we need mutable access. This is the “clone-on-write” pattern!


These problems will deepen your understanding of Rust’s ownership system. Take your time, think through each step, and don’t hesitate to experiment! 🦀

Building a Virtual Machine in Rust 🤖

Time for a fun challenge! You’re going to build an interpreter for a custom virtual machine. This exercise will strengthen your Rust skills while exploring how computers execute programs at a fundamental level.

What You’ll Create

Your virtual machine will:

  • Execute a simple instruction set
  • Manage memory and registers
  • Process control flow (jumps, conditionals)
  • Demonstrate Rust’s power for systems programming

Getting Started

Everything you need is here:

This is a great opportunity to see how Rust’s type system helps you build reliable, low-level systems. Ready to build your own CPU in software? Let’s go! 🚀

Virtual Machine Architecture 🏗️

Let’s define the architecture of your virtual machine! It’s intentionally simple, making it a great learning project while still being interesting to implement.

The Machine Model

Your VM is a classic von Neumann architecture with these characteristics:

Memory

  • Size: 4096 bytes (4KB)
  • Address range: 0 to 4095
  • Usage: Stores both program code and data (unified memory space)
  • Access: 32-bit reads and writes, no alignment required
  • Byte order: Little-endian (least significant byte first)

Registers

  • Count: 16 general-purpose registers
  • Names: r0 through r15
  • Width: 32 bits each
  • Special: r0 is the Instruction Pointer (IP)—it holds the address of the next instruction to execute

💡 Little-endian explained: When storing a 32-bit value like 0x12345678, the least significant byte (0x78) goes in the lowest address, then 0x56, then 0x34, then 0x12 in the highest address.
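The standard library can do this conversion for you; to_le_bytes and from_le_bytes may come in handy when implementing memory accesses:

```rust
fn main() {
    let value: u32 = 0x12345678;
    // Least significant byte first:
    assert_eq!(value.to_le_bytes(), [0x78, 0x56, 0x34, 0x12]);
    // And back again:
    assert_eq!(u32::from_le_bytes([0x78, 0x56, 0x34, 0x12]), 0x12345678);
}
```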

Execution Model: The Fetch-Decode-Execute Cycle

Each step of execution follows this pattern:

  1. Fetch & Decode: Read the instruction at address IP and decode it

    • Instructions are variable-length! The opcode and each argument (such as a register number) occupy exactly one byte each
  2. Advance IP: Move IP to point just after the decoded instruction and its arguments

  3. Execute: Perform the decoded instruction’s operation

This is the classic CPU execution model—the same pattern real processors use!

Error Handling: When Things Go Wrong

Your VM should detect these error conditions and return an error (not panic!):

  1. ❌ Invalid instruction opcode at IP
  2. ❌ Instruction doesn’t fit entirely in memory
  3. ❌ Instruction references an invalid register (> r15)
  4. ❌ Instruction accesses an invalid memory address (≥ 4096)

Important: Return a Result::Err for these cases—don’t panic! Once an error occurs, the VM should not be used again.

🦀 Rust philosophy: Recoverable errors return Result, unrecoverable errors panic. VM execution errors are recoverable—the host program can handle them gracefully.

The Instruction Set 🔧

Your VM has a minimalist instruction set—just 8 instructions! Don’t let the simplicity fool you; this is enough to write interesting programs.

Instruction   Opcode   Arguments   Effect
move if       1        rᵢ rⱼ rₖ    if rₖ ≠ 0 then rᵢ ← rⱼ
store         2        rᵢ rⱼ       mem[rᵢ] ← rⱼ
load          3        rᵢ rⱼ       rᵢ ← mem[rⱼ]
loadimm       4        rᵢ L H      rᵢ ← extend(signed(H L))
sub           5        rᵢ rⱼ rₖ    rᵢ ← rⱼ - rₖ
out           6        rᵢ          output char(rᵢ)
exit          7                    exit the program
out number    8        rᵢ          output decimal(rᵢ)

Understanding the Examples

All examples below assume these initial register values:

  • r1 = 10
  • r2 = 25
  • r3 = 0x1234ABCD
  • r4 = 0
  • r5 = 65

All other registers are unused in examples.

When you see 1 1 2 3, it means the instruction consists of 4 consecutive bytes: 1, 1, 2, and 3.

Instruction Details

move if

Format: 1 rᵢ rⱼ rₖ

Operation: Conditional move—if register rₖ contains a non-zero value, copy rⱼ into rᵢ; otherwise do nothing.

Examples:

  • 1 1 2 3: Since r3 = 0x1234ABCD (non-zero), r1 becomes 25 (value of r2)
  • 1 1 2 4: Since r4 = 0, nothing happens—r1 stays unchanged

💡 This is your conditional instruction! Use it for implementing if-statements and loops.

store

Format: 2 rᵢ rⱼ

Operation: Store the 32-bit value from register rⱼ into memory starting at the address in register rᵢ, using little-endian byte order.

Example:

  • 2 2 3: Stores r3 (0x1234ABCD) at addresses [25, 26, 27, 28]:
    • Address 25 ← 0xCD (least significant byte)
    • Address 26 ← 0xAB
    • Address 27 ← 0x34
    • Address 28 ← 0x12 (most significant byte)

load

Format: 3 rᵢ rⱼ

Operation: Load a 32-bit value from memory at the address in register rⱼ into register rᵢ, interpreting bytes as little-endian.

Example:

  • 3 1 2: Loads from addresses [25, 26, 27, 28] into r1:
    • If memory contains [0xCD, 0xAB, 0x34, 0x12]
    • Then r1 becomes 0x1234ABCD

💡 load and store are mirror operations—one writes to memory, the other reads from it.

loadimm

4 rᵢ L H: interpret H and L respectively as the high-order and the low-order bytes of a 16-bit signed value, sign-extend it to 32 bits, and store it into register rᵢ.

Examples:

  • 4 1 0x11 0x70: store 0x00007011 into register r1
  • 4 1 0x11 0xd0: store 0xffffd011 into register r1

Note how sign extension transforms a positive 16 bit value (0x7011 == 28689) into a positive 32 bit value (0x00007011 == 28689) and a negative 16 bit value (0xd011 == -12271) into a negative 32-bit value (0xffffd011 == -12271).
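In Rust, sign extension falls out of the integer casts: converting i16 to i32 sign-extends, so assembling the bytes and casting is enough. A minimal sketch:

```rust
fn main() {
    // 4 1 0x11 0xd0: L = 0x11, H = 0xd0, giving the 16-bit value 0xd011
    let value = u16::from_le_bytes([0x11, 0xd0]);
    let extended = (value as i16) as i32; // i16 -> i32 sign-extends
    assert_eq!(extended, -12271);
    assert_eq!(extended as u32, 0xffffd011);

    // 4 1 0x11 0x70: positive 16-bit value, extended with zeroes
    let value = u16::from_le_bytes([0x11, 0x70]);
    assert_eq!((value as i16) as i32, 0x00007011);
}
```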

sub

5 rᵢ rⱼ rₖ: store the content of register rⱼ minus the content of register rₖ into register rᵢ

Arithmetic wraps around in case of overflow. For example, 0 - 1 returns 0xffffffff, and 0 - 0xffffffff returns 1.
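The wrapping_sub method from the standard library implements exactly this wrap-around behaviour:

```rust
fn main() {
    assert_eq!(0u32.wrapping_sub(1), 0xffffffff);  // 0 - 1 wraps around
    assert_eq!(0u32.wrapping_sub(0xffffffff), 1);  // and back
    assert_eq!(25u32.wrapping_sub(10), 15);        // no wrapping in range
}
```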

Examples:

  • 5 10 2 1: store 15 into r10 (r2 = 25 minus r1 = 10).
  • 5 10 4 1: store -10 (0xfffffff6) into r10 (r4 = 0 minus r1 = 10).

out

6 rᵢ: display the character whose unicode value is stored in the 8 low bits of register rᵢ on the standard output.

Examples:

  • 6 5: output “A” since the 8 low bits of register r5 contain 65 which is the unicode codepoint for “A”.
  • 6 3: output “Í” since the 8 low bits of register r3 contain 0xCD which is the unicode codepoint for “Í”.

Note: you have to convert the content into a char and display this char.
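One way to perform that conversion, masking the 8 low bits and using the From<u8> implementation for char:

```rust
fn main() {
    let r3: u32 = 0x1234ABCD;
    let c = char::from((r3 & 0xff) as u8); // keep the 8 low bits, then convert
    assert_eq!(c, 'Í');                    // 0xCD is the codepoint for 'Í'

    let r5: u32 = 65;
    assert_eq!(char::from((r5 & 0xff) as u8), 'A');
}
```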

exit

7: exit the current program

Example:

  • 7: get out.

out number

8 rᵢ: output the signed number stored in register rᵢ in decimal.

Examples:

  • 8 5: output “65” since register r5 contains 65.
  • 8 3: output “305441741” since register r3 contains 0x1234ABCD.

Note

Note that some common operations are absent from this instruction set. For example, there is no add operation, however a+b can be replaced by a-(0-b). Also, there are no jump or conditional jump operations. Those can be replaced by manipulating the value stored in register r0 (IP).
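The a-(0-b) trick can be checked directly with wrapping arithmetic:

```rust
fn main() {
    let (a, b) = (25u32, 10u32);
    // a + b == a - (0 - b), with wrap-around arithmetic
    let sum = a.wrapping_sub(0u32.wrapping_sub(b));
    assert_eq!(sum, 35);
}
```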

Your program

Your program will contain both an application and a library:

  • The library allows other programs to embed your virtual machine
  • The application lets you run programs written for the virtual machine from the command line.

You are given an archive file which contains (in a vm project):

  • Cargo.toml: the initial configuration file
  • src/main.rs: the main program for the application, which loads a binary file with machine code and executes it
  • src/lib.rs: the entry point for the interpreter library which contains your implementation of the virtual machine
  • src/tests/: a directory with many tests, ranging from individual instructions tests to complex tests
  • src/examples/: some examples for the virtual machines that you can run when your interpreter is complete

Tests and examples are accompanied by their disassembled counterpart to help you understand what happens (*.bin is the program for the virtual machine, *.dis is the disassembly).

Start by adding the vm Cargo project to your repository and ensure that you can build the program even though it doesn’t do anything useful yet and will contain many warnings:

$ cargo build

You can see the tests fail (hopefully this is a temporary situation) by running:

$ cargo test

Program structure

At any time, make sure that the program and the tests compile, even if they don’t pass successfully yet. In particular, you are not allowed to rename the Machine and Error types, although you will need to modify them to implement this assignment. Similarly, the already documented methods must be kept without modifying their signatures because they will be used in automated tests.

❎ After creating a new interpreter through interpreter::Machine::new(), the following methods must be implemented:

  • step_on(): takes a descriptor implementing Write (for the out and out number instructions) and executes just one instruction
  • step(): similar to step_on(), but writes on the standard output
  • run_on(): takes a Write-implementing descriptor and runs until the program terminates
  • run(): similar to run_on(), but writes on the standard output
  • memory() and regs(): return a reference to the current memory and register contents
  • set_reg(): set the value of a register

Do not hesitate to add variants to the Error enumeration to ease debugging. Also, you can implement additional methods on Machine if it helps dividing the work.

As far as Machine::new() is concerned, you might be interested in looking at slice::copy_from_slice().

Writing things to the user

For the out and out number opcodes, you will have to write things to a file descriptor (respectively a character and a number). This can be done with the write!() macro, which lets you write into any object whose type implements the Write trait.
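For example, writing to a generic Write destination might look like this (out_char is a hypothetical helper for illustration, not part of the required Machine API):

```rust
use std::io::Write;

// Hypothetical helper: write the character held in the 8 low bits of a register.
fn out_char<W: Write>(w: &mut W, reg: u32) -> std::io::Result<()> {
    write!(w, "{}", char::from((reg & 0xff) as u8))
}

fn main() {
    let mut buf: Vec<u8> = Vec::new(); // Vec<u8> implements Write
    out_char(&mut buf, 65).unwrap();
    assert_eq!(buf, b"A");
}
```

Because the methods take any Write implementor, the automated tests can capture your VM’s output in a buffer while step() and run() use the standard output.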

Suggested work program

Several tests are provided in the tests directory:

  • assignment.rs contains all the examples shown in the specification. You should try to concentrate on this one first and implement instructions in the same order as in the specification (and the test) until you pass this test. You can run only this test by using cargo test --test assignment.
  • basic_operations.rs checks that all instructions are implemented correctly. For example, it will attempt to read and write past the virtual machine memory, or use an invalid register, and check that you do not allow it.
  • complex_execution.rs will load binary images and execute them using your virtual machine.

How to debug more easily

In order to ease debugging, you can use two existing crates, log and pretty_env_logger.

log provides you with a set of macros for formatting debugging information with different severities:

  • log::info!(…) is for regular information
  • log::debug!(…) is for data you’d like to see when debugging
  • log::trace!(…) is for more verbose cases

See the documentation for complete information.

pretty_env_logger is a back-end for log which gives you nice colored messages and is configured through environment variables. You can initialize it at the beginning of your main program by calling pretty_env_logger::init(). Then, you can set an environment variable to determine the severities you want to see:

$ RUST_LOG=debug cargo run mytest.bin

You’ll then see all messages with severity debug and above. Once again, the documentation is online.

💡 Note on the Result type

You might notice a redefinition of the Result type:

type Result<T, E = Error> = std::result::Result<T, E>;

This defines a local Result type whose second generic parameter has a default value: your own Error type. It means that you can write Result<T> instead of Result<T, Error> for the return type of your functions. Also, a user of your library will be able to reference such a type as interpreter::Result<T> instead of interpreter::Result<T, interpreter::Error>.

This kind of shortcut is very common in Rust. For example, the std::io module defines:

type Result<T, E = std::io::Error> = std::result::Result<T, E>;

so that you can use std::io::Result<usize> for an I/O operation which returns a number of bytes instead of the longer std::result::Result<usize, std::io::Error>.

Similarly, the std::fmt module goes even further and defines

type Result<T = (), E = std::fmt::Error> = std::result::Result<T, E>;

so that you can use std::fmt::Result (without generic parameters) in a formatting operation instead of std::result::Result<(), std::fmt::Error>.

LED Matrix Lab: Rust in the Real World 🚀

Welcome to the main event! In this comprehensive lab, you’ll build a real embedded application that controls an LED matrix display. This is where Rust truly shines—combining safety with the performance needed for embedded systems.

What You’ll Build

You’re about to recreate what’s done in C in the 4SE07 bare board programming lab (French), but with the power and safety of Rust. We’ll use higher-level abstractions while skipping unnecessary complexity, letting you focus on the interesting parts.

By the end of this lab, you’ll have:

  • Direct hardware control through Rust
  • Real-time image display on an LED matrix
  • Serial communication handling
  • Understanding of embedded Rust patterns

Let’s get started! 🦀

Initial Setup: Preparing Your Embedded Toolkit

Before diving into embedded development, we need to install some specialized tools. Think of these as your embedded Rust toolbox—each tool serves a specific purpose in the development workflow.

Essential Tools Installation

Let’s install the tools you’ll need for embedded development.

Install the following tools using the instructions below.

cargo-binutils: Binary Inspection Tools

cargo-binutils provides helpful subcommands like cargo size to inspect your compiled binaries—crucial for embedded work where every byte counts! It requires an additional LLVM component:

$ rustup component add llvm-tools
$ cargo install cargo-binutils

probe-rs: Your Hardware Communication Bridge

These powerful tools let you flash programs onto your microcontroller and debug them:

$ cargo install probe-rs-tools

💡 Linux users: On Debian and Ubuntu systems, you may need to install the libudev-dev package for probe-rs to work correctly. Run sudo apt-get install libudev-dev if you encounter issues.

Creating Your Project

Time to create the project structure for your LED matrix controller!

Create a new library project called tp-led-matrix in your git repository.

Not sure about the arguments? Use cargo new --help to see the option for creating a library project (hint: it’s --lib).

Development Workflow Expectations

Throughout this lab, maintain high code quality with these practices:

  • Compile frequently: Verify your code builds without warnings after each change
  • Format consistently: Run cargo fmt to keep formatting perfect
  • Catch issues early: Use cargo clippy to get expert suggestions on improving your code

These aren’t just suggestions—they’re professional Rust development practices!

Going Bare Metal: The no_std Environment

Embedded systems don’t have operating systems or standard libraries. Your program runs directly on the hardware! We need to tell Rust we’re working in this “bare metal” environment.

Declare no_std in your library by adding this inner attribute to src/lib.rs:

#![no_std]

💡 Why #! instead of #?: The ! makes this an inner attribute that applies to the entire module (your library), rather than to a specific item. That’s why it goes at the very top of your file!

This tells Rust: “We’re not using the standard library—we’re working directly with hardware.” Welcome to embedded development!

Building Visual Data Structures

Before we can display anything on the LED matrix, we need to create the fundamental data types for representing visual information. Think of this as building the vocabulary your program will use to “speak” to the display.

Module Organization

Create a public image module in your project.

All the types in this section will live in this module. We’re building two key structures:

  • Color: Represents a single RGB pixel with red, green, and blue components
  • Image: Represents a complete 8×8 image made of 64 colored pixels

Later, we’ll reexport these from the library’s top-level module for easier use. For now, just create them in the image module—don’t reexport anything yet.

Ready? Let’s build your first embedded data structures! 🎨

The Color Type: Representing RGB Pixels

Every pixel on the LED matrix displays a color made from mixing red, green, and blue light. Let’s create a Rust type to represent this!

Basic Color Structure

Create an image::Color structure with three unsigned byte fields for the primary colors: r, g, and b.

Making Color Efficient with Traits

Since a Color is just 3 bytes, copying it is extremely cheap—much faster than borrowing in many cases!

Derive Copy and Clone for Color.

💡 Copy types can be duplicated by simply copying bits—perfect for small types. Note that Copy requires Clone, so you need both traits.

A Sensible Default

What’s the default color when you create a new Color? Black (all zeros) makes perfect sense—it’s the absence of light.

Derive Default for Color to get this behavior automatically.

Primary Color Constants

Let’s define some helpful constants for the primary colors.

Implement three public constants on Color:

  • Color::RED
  • Color::GREEN
  • Color::BLUE

Initialize each with the appropriate RGB values (full intensity for one component, zero for others).

⚠️ Module organization tip: If you put your image module code in a file named image.rs, don’t wrap it in pub mod image { … } inside that file! That would create a nested image::image module. The file image.rs already defines the image module—just put the module’s contents directly in the file.
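Putting the steps above together, a sketch of the structure could look like this (the constant values follow the "full intensity for one component, zero for the others" rule):

```rust
/// A single RGB pixel. Copying 3 bytes is cheaper than borrowing,
/// hence the Copy derive; Default gives black (all zeros).
#[derive(Clone, Copy, Default)]
pub struct Color {
    pub r: u8,
    pub g: u8,
    pub b: u8,
}

impl Color {
    pub const RED: Color = Color { r: 255, g: 0, b: 0 };
    pub const GREEN: Color = Color { r: 0, g: 255, b: 0 };
    pub const BLUE: Color = Color { r: 0, g: 0, b: 255 };
}

fn main() {
    let c = Color::default();
    assert_eq!((c.r, c.g, c.b), (0, 0, 0)); // default is black
    assert_eq!(Color::RED.r, 255);
    assert_eq!(Color::BLUE.b, 255);
}
```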

Gamma Correction: Making Colors Look Right

Human perception of brightness isn’t linear—we’re much more sensitive to changes in dark colors than bright ones. LED matrices need gamma correction to display colors that look natural to our eyes.

We’ve prepared a gamma correction table that works perfectly with your LED matrix. It maps each input brightness value (0-255) to a perceptually corrected output value.

Add a gamma module containing:

  • The gamma correction table from the link above
  • A function pub fn gamma_correct(x: u8) -> u8 that returns the corrected value from the table

Implement gamma correction for Color by adding this method:

pub fn gamma_correct(&self) -> Self

This method should apply gamma::gamma_correct to all three color components (r, g, b) and return a new corrected Color.

💡 The &self parameter means this is a method called on a Color instance: my_color.gamma_correct(). The Self return type is shorthand for Color—it returns the same type as the receiver.
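Here is a host-runnable sketch of the wiring only. The real lookup table comes from the course link above; a computed 2.2-gamma curve stands in for it here, and on the target you would index into the table instead:

```rust
mod gamma {
    /// Stand-in for `GAMMA_TABLE[x as usize]`; the lab uses the
    /// provided table, not this formula.
    pub fn gamma_correct(x: u8) -> u8 {
        ((x as f32 / 255.0).powf(2.2) * 255.0).round() as u8
    }
}

#[derive(Clone, Copy)]
pub struct Color { pub r: u8, pub g: u8, pub b: u8 }

impl Color {
    /// Apply gamma correction to each component independently.
    pub fn gamma_correct(&self) -> Self {
        Color {
            r: gamma::gamma_correct(self.r),
            g: gamma::gamma_correct(self.g),
            b: gamma::gamma_correct(self.b),
        }
    }
}

fn main() {
    let c = Color { r: 0, g: 128, b: 255 }.gamma_correct();
    assert_eq!(c.r, 0);   // 0 stays 0
    assert_eq!(c.b, 255); // 255 stays 255
    assert!(c.g < 128);   // mid values darken under gamma correction
}
```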

Color Arithmetic: Making Colors Vibrant

Imagine you want to dim a color to 50% brightness, or brighten it to 150%. We can do this elegantly by implementing multiplication and division operations on colors!

The Challenge of no_std

Without the standard library, we lose access to some floating-point operations like f32::round(). We’ll use the micromath crate to get these back.

Add the micromath crate to your project’s dependencies.

Import micromath::F32Ext in your image module to gain access to floating-point operations.

Implementing Color Multiplication

Let’s implement the * operator for Color multiplied by f32. For example, Color::RED * 0.5 should give you a half-brightness red.

Implement the core::ops::Mul<f32> trait on Color.

Your implementation should:

  • Multiply each RGB component by the floating-point value
  • Round to the nearest integer (use the round() method from F32Ext)
  • Clamp values to stay within 0-255 range (hint: f32::clamp() is helpful!)
  • Return a new Color with the adjusted components

Consider writing a helper function to handle one component at a time—it’ll make your code cleaner.

Implementing Color Division

Division should work similarly: Color::BLUE / 2.0 gives you half-intensity blue.

Implement the core::ops::Div<f32> trait on Color.

💡 Smart implementation: You can implement division in terms of multiplication! color / x is the same as color * (1.0 / x). Reuse your multiplication code for cleaner implementation.
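A host-runnable sketch of both operators follows. On the target, round() comes from micromath::F32Ext; here the equivalent std methods stand in, and the scale helper is an illustrative choice, not part of the lab's required API:

```rust
use core::ops::{Div, Mul};

#[derive(Clone, Copy, PartialEq, Debug)]
pub struct Color { pub r: u8, pub g: u8, pub b: u8 }

// Helper: scale one component, round, and clamp into the u8 range.
fn scale(component: u8, factor: f32) -> u8 {
    (component as f32 * factor).round().clamp(0.0, 255.0) as u8
}

impl Mul<f32> for Color {
    type Output = Color;
    fn mul(self, rhs: f32) -> Color {
        Color { r: scale(self.r, rhs), g: scale(self.g, rhs), b: scale(self.b, rhs) }
    }
}

impl Div<f32> for Color {
    type Output = Color;
    fn div(self, rhs: f32) -> Color {
        self * (1.0 / rhs) // division reuses multiplication
    }
}

fn main() {
    let red = Color { r: 255, g: 0, b: 0 };
    assert_eq!(red * 0.5, Color { r: 128, g: 0, b: 0 }); // 127.5 rounds to 128
    assert_eq!(red * 2.0, red);                          // clamped at 255
    assert_eq!(red / 2.0, red * 0.5);
}
```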

Excellent! Your Color type is now complete and ready to create beautiful displays! 🎨


🦀 Advanced Note: Traits, Visibility, and Namespace Pollution

When you write use micromath::F32Ext;, you bring the F32Ext trait into scope. This trait defines methods like round() on the f32 type. Importing the trait makes these methods available—but also adds F32Ext to your namespace.

If you want the methods but don’t want the name cluttering your namespace, there’s a clever trick:

// Import the F32Ext trait methods without importing the name itself
use micromath::F32Ext as _;

The as _ means “bring this into scope but don’t bind it to any name.” The methods still work, but F32Ext itself isn’t part of your namespace. Neat!

The Image Type: Working with 8×8 Displays

Now that we have pixels (Color), let’s build a complete image! Our LED matrix is 8×8 pixels, so we need a structure to hold all 64 pixels together.

Basic Image Structure

Create a public image::Image structure containing a single unnamed field: an array of 64 Color pixels.

Structures with unnamed fields (called tuple structs) are declared like this:

struct Image([Color; 64]);

With this definition, if im is an Image, then im.0 accesses the underlying array. It’s like tuple field access (.0 for first field, .1 for second, etc.).

Creating Solid-Color Images

Let’s add a convenient constructor for images filled with a single color.

Implement a public associated function on Image:

pub fn new_solid(color: Color) -> Self

This should return an Image where all 64 pixels are set to the given color.

The Default Trait

The Default trait is perfect for images—a default image should be all black pixels. Unfortunately, Rust has a technical limitation: it can’t automatically derive Default for arrays longer than 32 elements. No problem—we’ll implement it manually!

Manually implement the Default trait for Image.

Your implementation should return an image filled with the default color (which is black, since Color defaults to all zeros).

💡 You can use your new_solid function here: Self::new_solid(Color::default())
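A sketch of new_solid and the manual Default implementation together (the Color definition is abbreviated from the earlier steps):

```rust
#[derive(Clone, Copy, Default, PartialEq, Debug)]
pub struct Color { pub r: u8, pub g: u8, pub b: u8 }

pub struct Image([Color; 64]);

impl Image {
    /// All 64 pixels set to the same color (Color is Copy, so the
    /// array-repeat syntax works).
    pub fn new_solid(color: Color) -> Self {
        Image([color; 64])
    }
}

// Manual impl: Default cannot be derived for arrays of more than
// 32 elements.
impl Default for Image {
    fn default() -> Self {
        Self::new_solid(Color::default())
    }
}

fn main() {
    let image = Image::default();
    assert_eq!(image.0[63], Color::default()); // all black
    let red = Image::new_solid(Color { r: 255, g: 0, b: 0 });
    assert_eq!(red.0[0].r, 255);
}
```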

Accessing Individual Pixels

We want intuitive pixel access using syntax like my_image[(row, col)]. Rust’s Index and IndexMut traits make this possible, and they accept any type as an index—a (usize, usize) tuple is perfect for our 2D grid!

Implement core::ops::Index<(usize, usize)> for Image with output type Color.

This enables reading pixels: let pixel = image[(2, 3)];

Implement core::ops::IndexMut<(usize, usize)> for Image.

This enables writing pixels: image[(2, 3)] = Color::RED;

💡 Note: IndexMut doesn’t specify an output type because it must match the one from Index. You can only implement IndexMut on types that also implement Index with the same index type.
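A sketch of both trait implementations, assuming a row-major layout with 8 pixels per row (the layout choice is an assumption consistent with the row accessor described next):

```rust
use core::ops::{Index, IndexMut};

#[derive(Clone, Copy, Default, PartialEq, Debug)]
pub struct Color { pub r: u8, pub g: u8, pub b: u8 }

pub struct Image([Color; 64]);

impl Index<(usize, usize)> for Image {
    type Output = Color;
    fn index(&self, index: (usize, usize)) -> &Color {
        let (row, col) = index;
        &self.0[row * 8 + col] // row-major: 8 pixels per row
    }
}

impl IndexMut<(usize, usize)> for Image {
    fn index_mut(&mut self, index: (usize, usize)) -> &mut Color {
        let (row, col) = index;
        &mut self.0[row * 8 + col]
    }
}

fn main() {
    let mut image = Image([Color::default(); 64]);
    image[(2, 3)] = Color { r: 255, g: 0, b: 0 }; // write via IndexMut
    assert_eq!(image[(2, 3)].r, 255);             // read via Index
    assert_eq!(image[(0, 0)], Color::default());
}
```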

Row Access for Display Scanning

LED matrices typically display images one row at a time. Let’s provide efficient access to entire rows!

Add a row accessor method to Image:

pub fn row(&self, row: usize) -> &[Color]

This should return a slice referencing the pixels in the specified row.

💡 Lifetimes and safety: Notice how the returned reference borrows from self? Rust ensures the reference can’t outlive the image—automatic memory safety with zero runtime cost!
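A sketch, again assuming the row-major layout: row number row occupies indices row*8 through row*8+7 of the underlying array, so a simple slice does the job:

```rust
#[derive(Clone, Copy, Default, PartialEq, Debug)]
pub struct Color { pub r: u8, pub g: u8, pub b: u8 }

pub struct Image([Color; 64]);

impl Image {
    /// Borrow the 8 pixels of one row as a slice.
    pub fn row(&self, row: usize) -> &[Color] {
        &self.0[row * 8..(row + 1) * 8]
    }
}

fn main() {
    let mut pixels = [Color::default(); 64];
    pixels[8] = Color { r: 7, g: 0, b: 0 }; // first pixel of row 1
    let image = Image(pixels);
    assert_eq!(image.row(1).len(), 8);
    assert_eq!(image.row(1)[0].r, 7);
}
```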

Creating a Gradient for Testing

Let’s build a visual test pattern: a gradient that fades from a color to black.

Implement a gradient constructor:

pub fn gradient(color: Color) -> Self

Each pixel should contain the reference color divided by (1 + row * row + col). Use the pixel access methods you just implemented (image[(row, col)]) to build this programmatically.

This creates a nice visual pattern perfect for testing your display!

Viewing Images as Raw Bytes

From the 4SE07 lab, we know we’ll receive image data from the serial port byte by byte. It would be much easier if we could view our Image as raw bytes too!

Understanding Memory Layout

Rust is allowed to reorder, pad, or otherwise rearrange struct fields for optimization. Right now, we don’t know how Color is organized in memory. Maybe each field uses 32 bits instead of 8? Maybe g comes before r? We need to take control of the memory layout.

Add a repr(C) attribute to Color.

This forces Rust to use C-compatible representation, which guarantees:

  • Each field is exactly 8 bits (one byte)
  • Fields are packed with one-byte alignment
  • Fields appear in the order we declared: r, then g, then b

Perfect for hardware interfacing!

Ensuring Image Layout

For Image, we’re in good shape. Rust guarantees that arrays are laid out according to their element type’s size and alignment. With our repr(C) on Color, this means the three bytes of pixel 0 are immediately followed by the three bytes of pixel 1, and so on—exactly what we need!

However, we must ensure Image uses the same representation as its inner array.

Add a repr(transparent) attribute to Image.

This tells Rust: “Use the exact same memory layout as your single non-zero-sized field.” The Image wrapper becomes zero-cost!

Implementing Byte Access

Now let’s implement traits that let us view an Image as an array of 192 bytes (8 rows × 8 columns × 3 bytes per pixel).

Implement AsRef<[u8; 192]> for Image.

You’ll need to use core::mem::transmute() to reinterpret self as a reference to a byte array. This is an unsafe function because we’re telling Rust “trust me, I know this is safe”—and with our repr attributes, it genuinely is!

⚠️ Unsafe code: transmute is powerful but dangerous. Only use it when you’ve carefully ensured memory layouts match, as we have here with our repr attributes.

Implement AsMut<[u8; 192]> for Image the same way.

This provides mutable byte access for filling the image from serial data.
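A host-runnable sketch of both implementations, relying on the repr attributes above to make the transmute sound (the minimal Color/Image definitions are repeated here so the example stands alone):

```rust
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Color { pub r: u8, pub g: u8, pub b: u8 }

#[repr(transparent)]
pub struct Image([Color; 64]);

impl AsRef<[u8; 192]> for Image {
    fn as_ref(&self) -> &[u8; 192] {
        // SAFETY: repr(C) on Color and repr(transparent) on Image
        // guarantee exactly 192 contiguous bytes (64 pixels x 3).
        unsafe { core::mem::transmute(self) }
    }
}

impl AsMut<[u8; 192]> for Image {
    fn as_mut(&mut self) -> &mut [u8; 192] {
        // SAFETY: same layout argument as above.
        unsafe { core::mem::transmute(self) }
    }
}

fn main() {
    let mut image = Image([Color { r: 1, g: 2, b: 3 }; 64]);
    let bytes: &[u8; 192] = image.as_ref();
    assert_eq!(bytes[0..6], [1, 2, 3, 1, 2, 3]); // r, g, b in order
    image.as_mut()[0] = 9; // mutate pixel 0's red component
    assert_eq!(image.0[0].r, 9);
}
```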


🎉 Congratulations! You now have a rock-solid Image type with:

  • Safe pixel access
  • Efficient row access
  • Raw byte conversion for hardware communication

This solid foundation will make the rest of the lab much smoother. Great work! 🦀

Reexporting Types for Easy Access

Your library users will want to use your Color and Image types. Let’s make their lives easier by reexporting these types at the library’s top level!

Why Reexport?

Without reexporting, users would need to write:

use tp_led_matrix::image::Color;
use tp_led_matrix::image::Image;

That’s verbose! By reexporting from lib.rs, they can simply write:

use tp_led_matrix::{Color, Image};

Much cleaner!

Implementation

Add public re-exports in lib.rs using pub use to expose Color and Image at the crate root.

This is a common Rust pattern—organize your internal modules however makes sense for implementation, then present a clean, flat API to users. Best of both worlds! 🎯
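A sketch of the resulting src/lib.rs (module names as used in this lab; this fragment only compiles alongside the corresponding module files):

```rust
#![no_std]

pub mod gamma;
pub mod image;

// Flat API: users can now write `use tp_led_matrix::{Color, Image};`
// instead of spelling out the `image` module path.
pub use image::{Color, Image};
```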

Running on Real Hardware: Embedded Mode 🎯

Now comes the exciting part—running your Rust code on actual hardware! We’re moving from simulation to the real world, where your program will control a physical LED matrix on an IoT board.

The Journey Ahead

Getting your code onto the board requires several setup steps, but don’t worry—once configured, Cargo handles everything automatically. Here’s our roadmap:

  1. Configure the toolchain: Set up Rust to generate ARM microcontroller code
  2. Upload to the board: Flash your program using Segger JLink tools
  3. Display something: Make your first pixels light up!
  4. Optimize the setup: Streamline your development workflow
  5. Configure peripherals: Access the hardware through Rust’s Hardware Abstraction Layer
  6. Light the LED matrix: Bring your display to life with GPIO control

Each step builds on the previous one, taking you from “empty project” to “working LED matrix display.” Let’s get started! 🚀

Configuring the Toolchain: Cross-Compiling for ARM

Time to teach Rust how to generate code for your microcontroller! We’ll set up cross-compilation so your programs can run on the ARM Cortex-M4F processor.

Step 1: Installing the ARM Target

Your board uses a STM32L475VGT6 microcontroller with a Cortex-M4F core (the F means it has a hardware floating-point unit—nice!). We need to download the corresponding compilation target.

Add the ARM target using rustup:

$ rustup target add thumbv7em-none-eabihf

This downloads the standard library and compiler components needed for ARM Cortex-M4F chips.

Setting the Default Target

Rather than specifying this target every time we build, let’s make it the default for this project.

Create .cargo/config.toml in your project root with:

[build]
target = "thumbv7em-none-eabihf" # Cortex-M4F/M7F (with FPU)

Now every cargo build will automatically cross-compile for ARM!

Verify it works: Run cargo build and notice the new target/thumbv7em-none-eabihf directory containing your ARM binaries.

Step 2: Building an Executable Program

We can compile a library, but we need an actual runnable program. For embedded systems, this requires:

  1. Linker script: Tells the linker where code and data go in memory
  2. Linker arguments: Configures the linking process
  3. Main program: Your entry point
  4. Panic handler: What to do when something goes wrong

Sounds like a lot, but the Rust ecosystem makes it straightforward!

Using the Cortex-M Runtime Crate

We could write our own linker script from scratch (like in the 4SE07 lab), but why reinvent the wheel? The cortex-m-rt crate provides everything we need:

  • A complete linker script (link.x)
  • The #[entry] attribute to mark your main function
  • A proper vector table for Cortex-M processors

The linker script includes a memory.x file that describes your chip’s memory layout. We’ll provide this small configuration file.

Add the runtime dependency:

$ cargo add cortex-m-rt

Create memory.x in your project root (next to Cargo.toml):

MEMORY
{
  FLASH : ORIGIN = 0x08000000, LENGTH = 1M
  RAM   : ORIGIN = 0x20000000, LENGTH = 96K
}

This tells the linker where your chip’s flash memory (for code) and RAM (for data) are located.

Configuring the Linker

We need to tell the linker to use the link.x script provided by cortex-m-rt.

Add this section to .cargo/config.toml:

[target.'cfg(all(target_arch = "arm", target_os = "none"))']
rustflags = ["-C", "link-arg=-Tlink.x"]

This applies to all ARM bare-metal targets—exactly what we need!

Adding Peripheral Access

The cortex-m-rt linker scripts need a vector table specific to your microcontroller. We’ll get this from embassy-stm32, which provides complete STM32 support.

Add Embassy STM32 support:

$ cargo add embassy-stm32 --features stm32l475vg

Add critical section support (required by Embassy):

$ cargo add cortex-m --features critical-section-single-core

💡 What’s a critical section? It’s a piece of code that must run atomically (without interruption). Embassy needs a way to implement these for safety.

Writing the Main Program

While a crate can have only one library, it can have multiple executables (binaries). Let’s create our main program!

Configure the executable in Cargo.toml:

[[bin]]
name = "tp-led-matrix"

The double brackets [[bin]] indicate a list item—you could add more executables if needed.

Create src/main.rs with this minimal embedded program:

#![no_std]
#![no_main]

use cortex_m_rt::entry;
use embassy_stm32 as _;   // Links Embassy (provides the vector table)

#[panic_handler]
fn panic_handler(_panic_info: &core::panic::PanicInfo) -> ! {
    loop {}
}

#[entry]
fn main() -> ! {
    panic!("The program stopped");
}

Let’s break this down:

  • #![no_std]: We’re not using the standard library
  • #![no_main]: Our entry point isn’t the normal fn main()
  • #[entry]: Marks our actual entry point (provided by cortex-m-rt)
  • -> !: The “never” type—our program runs forever or panics, it never returns
  • #[panic_handler]: Defines what happens on panic (here: infinite loop)

💡 Alternative: Instead of writing your own panic handler, you can use the panic-halt crate which does the same thing.

Building Your Embedded Program

Time to compile!

Build in both modes:

$ cargo build              # Debug mode
$ cargo build --release    # Release mode (optimized)

Checking Binary Size

On embedded systems, code size matters! Let’s see how big our binaries are.

Check the size with the traditional tool:

$ arm-none-eabi-size target/thumbv7em-none-eabihf/debug/tp-led-matrix
$ arm-none-eabi-size target/thumbv7em-none-eabihf/release/tp-led-matrix

Those paths are painful to type! Fortunately, there’s a better way:

Use cargo-size for convenience:

$ cargo size              # Debug mode
$ cargo size --release    # Release mode

This builds the binary if needed, then shows its size. Much nicer!

Generating Customized Documentation

Here’s a pro tip: The online docs for embassy-stm32 show all STM32 microcontrollers, which can be overwhelming. Generate documentation specifically for your chip!

Generate custom documentation:

$ cargo doc --open

This creates docs tailored to your dependencies and feature flags, showing only what is actually available on your STM32L475VGT6, and opens the result in your browser (thanks to --open).

Try searching for a method like gamma_correct to see your own documented code!

💡 Keep it updated: Rerun cargo doc after updating dependencies or making significant code changes. It’s smart—it only regenerates what changed.

Great! You now have a complete embedded Rust development environment. Your code compiles to ARM, you have proper documentation, and you’re ready to flash it onto hardware! 🚀

Uploading the program to the board using Segger JLink tools

Even though this program does nothing, we want to upload it to the board. For this, we will use the Segger JLink tool suite, as explained in the 4SE07 lab.

❎ Ensure that you have either one of arm-none-eabi-gdb or gdb-multiarch installed on your system. If this is not the case, install it before proceeding.

❎ In a dedicated terminal, launch JLinkGDBServer -device STM32L475VG.

We need to configure gdb so that it connects to the JLinkGDBServer program and uploads the program.

❎ Create a jlink.gdb gdb script containing the commands to connect to JLinkGDBServer, upload and run the debugged program:

target extended-remote :2331
load
mon reset
c

We would like cargo run to automatically launch gdb with the script we just wrote. Fortunately, the runner can be configured as well!

❎ In .cargo/config.toml, add the following to the conditional target section you created earlier:

runner = "arm-none-eabi-gdb -q -x jlink.gdb"

⚠ On some systems, you must use gdb-multiarch instead of arm-none-eabi-gdb; check which executable is available.

❎ Upload and run your program using cargo run while your board is connected. You should be able to interrupt gdb using ctrl-c and see that you are indeed looping in the panic handler function.

Congratulations: you are running your first embedded Rust program on a real board.

Displaying Output: Real-Time Transfer (RTT) 📡

Your program runs on the board, but how do you see what it’s doing? Enter RTT (Real-Time Transfer)—a clever protocol from Segger that lets your microcontroller communicate with your computer through in-memory buffers.

How RTT Works

RTT uses shared memory that the JLink debugging probe continuously scans. It transfers data between your microcontroller and your host computer—fast, efficient, and perfect for debugging!

Setting Up RTT in Rust

The Rust embedded ecosystem provides excellent RTT support through two crates:

  • rtt-target: Implements the RTT protocol and provides rprintln!() for formatted output (like println! but over RTT)
  • panic-rtt-target: A panic handler that sends panic messages over RTT so you can see exactly what went wrong

Add RTT crates as dependencies:

$ cargo add rtt-target panic-rtt-target

Wiring It Up

Remove your manual panic handler from src/main.rs and import the RTT panic handler:

use panic_rtt_target as _;

The as _ means we’re importing it just to link it in—we don’t need to reference it directly.

Import RTT printing macros and update your main function:

use rtt_target::{rtt_init_print, rprintln};

#[entry]
fn main() -> ! {
    rtt_init_print!();
    rprintln!("Hello, world!");
    panic!("The program stopped");
}

Seeing the Output

Start the RTT client in a terminal:

$ JLinkRTTClient

(Or JLinkRTTClientExe depending on your installation)

This connects to the running JLinkGDBServer and displays output from your board.

Flash and run your program:

$ cargo run --release

You should now see “Hello, world!” followed by the panic message in the RTT client terminal!

🎉 Success! Now you can debug embedded programs just like regular Rust programs—with print statements and panic messages. This will make development so much easier!

Optimizing the setup

We will take some steps to ease our development process and save some time later.

Reduce binary size

Using cargo size and cargo size --release, we can see that the binary produced in release mode is much smaller than the one produced in debug mode. Note that size doesn’t count the debug information, since it is never stored in the target memory.

We would like to use --release to keep an optimized binary, but we would also like to keep the debug information in case we need to use gdb, or to get a better backtrace in case of panic. Fortunately, cargo lets us require that the release profile:

  • keeps debug symbols;
  • uses link-time-optimization (LTO) to optimize the produced binary even further;
  • generates objects one by one to get an even better optimization.

❎ To do so, add the following section to your program Cargo.toml:

[profile.release]
debug = true      # symbols are nice and they don't increase the size on the target
lto = true        # better optimizations
codegen-units = 1 # better optimizations

From now on, we will always use --release when building binaries and those will be optimized fully and contain debugging symbols.

Make it simpler to run the program

Even though we have configured cargo run so that it runs gdb automatically and uploads our program, we still have to start JLinkGDBServer and JLinkRTTClient. Fortunately, the probe-rs and knurling-rs projects make it easy to develop embedded Rust programs:

  • probe-rs lets you manipulate the probes connected to your computer, such as the probe located on your IoT-node board.
  • defmt (for deferred formatting) is a logging library and set of tools that lets you log events from your embedded programs and transmit them in an efficient binary format. The formatting for developer consumption is done by tools running on the host rather than on the target. probe-rs run can receive defmt traces over an RTT channel and decode and format them.

Many other programs such as cargo flash or cargo embed exist, but we will not need them here.

❎ Stop the Segger JLink tools. Using the probe-rs executable, check if the probe on your board is properly detected.

❎ Use probe-rs run with the appropriate parameters instead of gdb to upload your program onto the board and run it. Replace your runner in .cargo/config.toml by:

runner = "probe-rs run --chip stm32l475vgtx"

❎ Using cargo run --release, look at your program being compiled, uploaded and run on your board. You should see the messages sent over RTT on your screen.

⚠ You can use ctrl-c to quit probe-rs run.

Use defmt for logging

Instead of using RTT directly, we will use defmt to have a better and efficient logging system.

❎ Remove the rtt-target and panic-rtt-target from your dependencies in Cargo.toml.

❎ Add the defmt and defmt-rtt dependencies to your Cargo.toml.

❎ Add the panic-probe dependency to your Cargo.toml with the print-defmt feature.

defmt-rtt is the RTT transport library for defmt. panic-probe with the print-defmt feature makes the panic message available to probe-rs run through defmt and tells it to stop in case of a panic.

defmt uses a special section in your executable. In .cargo/config.toml, add the following to your existing rustflags in order to include the provided linker file fragment: "-C", "link-arg=-Tdefmt.x".

❎ Modify your code in src/main.rs to include the following changes:

  • Write use panic_probe as _; instead of panic_rtt_target to use the panic-probe crate.
  • Write use defmt_rtt as _; to link with the defmt-rtt library.
  • Remove use of rtt_target items.
  • Remove rtt_init_print!(), and replace rprintln!() with defmt::info!() to print a message.
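After these modifications, a sketch of src/main.rs (before the clock configuration added later in the lab) might look like:

```rust
#![no_std]
#![no_main]

use cortex_m_rt::entry;
use defmt_rtt as _;      // defmt transport over RTT
use embassy_stm32 as _;  // links Embassy (provides the vector table)
use panic_probe as _;    // panic handler reported through probe-rs

#[entry]
fn main() -> ! {
    defmt::info!("Hello, world!");
    panic!("The program stopped");
}
```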

❎ Run your program using cargo run --release. Notice that you see the panic information, but you do not see the “Hello, world!” message.

By default, defmt only prints errors. The various log levels are trace, debug, info, warn, and error. If you want to see the messages of level info and above (info, warn, and error), you must set the DEFMT_LOG environment variable when building and when running the program. Only the appropriate information will be included at build time and displayed at run time.

❎ Build and run your program using DEFMT_LOG=info cargo run --release. You will see the “Hello, world!” message. Note that you could also use DEFMT_LOG=trace or DEFMT_LOG=debug if you add more verbose log messages.

❎ Setup the default log level by telling cargo to set the DEFMT_LOG environment variable when using cargo commands. You can do this by adding a [env] section in .cargo/config.toml:

[env]
DEFMT_LOG = "info"

⚠️ Changing the [env] section of .cargo/config.toml will not recompile the program with the new options. Make sure that you use cargo clean when you change the DEFMT_LOG variable.

🎉 Your environment is now fully set up in an efficient way. If needed, you can revert to using gdb and the Segger JLink tools, but that should be reserved for extreme cases.

Configuring the Hardware: Unleashing Performance ⚡

So far, your board is running with default settings—using a slow and imprecise 4MHz internal oscillator with most peripherals sleeping. Let’s wake it up and run at full speed!

Understanding Hardware Abstraction Layers

Several crates work together to give you safe, high-level access to your STM32L475VGT6’s hardware:

  • cortex-m: Common functionality for all ARM Cortex-M processors (not specific to STM32)

  • stm32-metapac: A Peripheral Access Crate (PAC) providing low-level register access for all STM32 chips. You don’t need to add this explicitly—the HAL includes it.

  • embassy-stm32: The Hardware Abstraction Layer (HAL) that provides safe, high-level APIs. This is what you’ll use!

Think of it like layers: PAC provides raw register access, HAL builds safe abstractions on top, and you build your application on the HAL.

Setting Up Clock Configuration

Let’s import what we need for clock configuration.

Add imports to main.rs:

use embassy_stm32::rcc::*;
use embassy_stm32::Config;

Running at Maximum Speed

Your STM32L475VGT6 can run at 80MHz—let’s use that full power! The STM32L475VGT6 microcontroller can be clocked from several sources:

  • HSE (High-Speed External) clock: an external crystal, oscillator, or other precise clock source. Unfortunately, our board is not fitted with a high-speed crystal/oscillator. Although we could use the precise clock signal coming from the debug probe, it would only work when the probe is powered (i.e., during debugging). 😕
  • HSI (High-Speed Internal) clock: an internal 16MHz RC oscillator. It is not very precise and depends on temperature. 🙅
  • MSI (Multi-Speed Internal) clock: another internal oscillator whose frequency can be set to several values between 100kHz and 48MHz. It is not precise, but it can be trimmed to within ~0.25% if a precise low-speed oscillator is present. And guess what? We have an LSE (Low-Speed External) oscillator on our board, running at 32.768kHz, so we can use it to stabilize the MSI. 🙂
  • PLL (Phase-Locked Loop): this is a system that takes a clock signal as input, pre-divides it, and multiplies it. The resulting faster clock can then be divided again by three different values (P, Q, and R). The result of the division by R can be used as a system clock. 🤩

We’ll configure the PLL to take a 4MHz MSI clock as input, divide it by 1, multiply it by 40, then divide by 2 to get 80MHz. We’ll also enable the LSE clock with its default configuration (32.768kHz); the HAL will detect this and automatically use it to stabilize the MSI clock.

Replace your main() function with this clock-configured version:

#[entry]
fn main() -> ! {
    defmt::info!("defmt correctly initialized");

    // Setup the clocks at 80MHz using MSI, stabilized by the LSE:
    // 4MHz (MSI) / 1 * 40 / 2 = 80MHz. The flash wait
    // states will be configured accordingly.
    let mut config = Config::default();
    config.rcc.msi = Some(MSIRange::RANGE4M); // MSI at 4MHz
    config.rcc.ls = LsConfig::default_lse();  // LSE at 32.768kHz
    config.rcc.pll = Some(Pll {
        source: PllSource::MSI,  // 4MHz
        prediv: PllPreDiv::DIV1, // 4MHz / 1 = 4MHz
        mul: PllMul::MUL40,      // 4MHz / 1 * 40 = 160MHz
        divp: None,
        divq: None,
        divr: Some(PllRDiv::DIV2), // 4MHz / 1 * 40 / 2 = 80MHz
    });
    config.rcc.sys = Sysclk::PLL1_R;
    embassy_stm32::init(config);

    panic!("Everything configured");
}

What’s happening here?

  • We create a default Config and customize the clock settings.
  • We select MSI at 4MHz and enable the LSE, which the HAL uses to stabilize (trim) the MSI.
  • We enable the PLL, set its source to the MSI, and don’t pre-divide (DIV1) the source.
  • The PLL multiplies 4MHz by 40 (=160MHz).
  • We then divide by 2 to get our target 80MHz.
  • We select the R output of the PLL as our system clock source.
  • embassy_stm32::init(config) applies all these settings and configures flash wait states automatically.

🎉 Congratulations! Your microcontroller now runs at 80MHz instead of the default 4MHz—that’s 20× faster, with a clock precision within ~0.25% instead of the default ~1%-3% for the MSI alone. Your program does the same thing as before, but now you have the performance headroom for real-time tasks like driving an LED matrix.

Time to make those LEDs shine! 💡

GPIO and the LED matrix

We will now configure and program our LED matrix. It uses 13 GPIOs spread over three different ports.

HAL and peripherals

The embassy_stm32::init() function that you have used earlier returns a value of type Peripherals. This is a large structure which contains every peripheral available on the microcontroller.

❎ Store the peripherals in a variable named p:

    let p = embassy_stm32::init(config);

In this variable, you will find for example a field named PB0 (p.PB0). This field has type embassy_stm32::Peri<'static, embassy_stm32::peripherals::PB0>: this is the type of the pin B0. Each pin will have its own type, which means that you will not use one instead of another by mistake.

HAL and GPIO configuration

A pin is configured through types found in the embassy_stm32::gpio module. For example, you can configure pin PB0 as an output with an initial low state and a very high switching speed by doing:

   // pin will be of type Output<'static>
   let mut pin = Output::new(p.PB0, Level::Low, Speed::VeryHigh);
   // Set output to high
   pin.set_high();
   // Set output to low
   pin.set_low();

If pin is dropped, it will be automatically deconfigured and set back as an input.

🦀 The lifetime parameter 'a in Output<'a> represents the lifetime of the pin that we have configured as output. In our case, the lifetime is 'static as we work directly with the pins themselves. But sometimes, you get the pin from a structure which has a limited lifetime, and this is reflected in 'a.

Matrix module

❎ Create a public matrix module.

❎ In the matrix module, import embassy_stm32::gpio::* as well as tp_led_matrix::{Color, Image} (from your library), and define the Matrix structure. You will also need embassy_stm32::Peri and the pin types from embassy_stm32::peripherals used in Matrix::new(). The structure is fully given here to avoid a tedious manual copy operation, as well as all the functions you will have to implement on a Matrix:

pub struct Matrix<'a> {
    sb: Output<'a>,
    lat: Output<'a>,
    rst: Output<'a>,
    sck: Output<'a>,
    sda: Output<'a>,
    rows: [Output<'a>; 8],
}

impl Matrix<'_> {
    /// Create a new matrix from the control registers and the individual
    /// unconfigured pins. SB and LAT will be set high by default, while
    /// other pins will be set low. After 100ms, RST will be set high, and
    /// the bank 0 will be initialized by calling `init_bank0()` on the
    /// newly constructed structure.
    /// The pins will be set to very high speed mode.
    #[expect(clippy::too_many_arguments)] // Necessary to avoid a Clippy warning
    pub fn new(
        pa2: Peri<'static, PA2>,
        pa3: Peri<'static, PA3>,
        pa4: Peri<'static, PA4>,
        pa5: Peri<'static, PA5>,
        pa6: Peri<'static, PA6>,
        pa7: Peri<'static, PA7>,
        pa15: Peri<'static, PA15>,
        pb0: Peri<'static, PB0>,
        pb1: Peri<'static, PB1>,
        pb2: Peri<'static, PB2>,
        pc3: Peri<'static, PC3>,
        pc4: Peri<'static, PC4>,
        pc5: Peri<'static, PC5>,
    ) -> Self {
        // Configure the pins, with the correct speed and their initial state
        todo!()
    }

    /// Make a brief high pulse of the SCK pin
    fn pulse_sck(&mut self) {
        todo!()
    }

    /// Make a brief low pulse of the LAT pin
    fn pulse_lat(&mut self) {
        todo!()
    }

    /// Send a byte on SDA starting with the MSB and pulse SCK high after each bit
    fn send_byte(&mut self, pixel: u8) {
        todo!()
    }

    /// Send a full row of bytes in BGR order and pulse LAT low. Gamma correction
    /// must be applied to every pixel before sending them. The previous row must
    /// be deactivated and the new one activated.
    pub fn send_row(&mut self, row: usize, pixels: &[Color]) {
        todo!()
    }

    /// Initialize bank0 by temporarily setting SB to low and sending 144 one bits,
    /// pulsing SCK high after each bit and pulsing LAT low at the end. SB is then
    /// restored to high.
    fn init_bank0(&mut self) {
        todo!()
    }

    /// Display a full image, row by row, as fast as possible.
    pub fn display_image(&mut self, image: &Image) {
        // Do not forget that image.row(n) gives access to the content of row n,
        // and that self.send_row() uses the same format.
        todo!()
    }
}

❎ Implement all those functions.

You can refer to 4SE07 notes for GPIO connections (in French) and the operation of the LED Matrix controller (in French).

Note that you need to maintain the reset signal low for 100ms. How can you do that? Keep reading.

Implementing a delay

Since you do not use an operating system (yet!), you need to do some looping to implement a delay. Fortunately, the embassy-time crate can be used for this. By cooperating with the embassy-stm32 crate, it will be able to provide you with some timing functionalities:

❎ Add the embassy-time crate as a dependency with the tick-hz-32_768 feature: this will configure a timer at a 32768Hz frequency, which will give you sub-millisecond precision. You will also have to enable the generic-queue-8 feature since we don’t use the full Embassy executor at this stage. Note that embassy-time knows nothing about the microcontroller you use; it needs a timer to run on.

❎ Add the time-driver-any feature to the embassy-stm32 dependency. This will tell the HAL to put a timer at the disposal of the embassy-time crate.

The Rust embedded working group has defined common traits for working on embedded systems. One of those traits is DelayNs, in the embedded-hal crate, which is implemented by the Delay singleton of the embassy-time crate. You can use it as shown below:

❎ Add the embedded-hal dependency.

❎ Import the DelayNs trait in your matrix.rs, as well as the Delay singleton from embassy-time:

use embedded_hal::delay::DelayNs as _;
use embassy_time::Delay;

You can then use the following statement to wait for 100ms:

   Delay.delay_ms(100);

🦀 Note on singletons

Delay is a singleton: this is a type which has only one value. Here, Delay is declared as:

struct Delay;

which means that the type Delay has only one value, which occupies 0 bytes in memory, also called Delay. Here, the Delay type is used to implement the DelayNs trait from the embedded-hal crate:

impl embedded_hal::delay::DelayNs for Delay {
    fn delay_ms(&mut self, ms: u32) { … }
    …
}

You might have noticed that self is not used in delay_ms, but the implementation has to conform to the way the trait has been defined. When you later write Delay.delay_ms(100), you create a new instance (which contains nothing) of the type Delay, on which you mutably call delay_ms(100).

Main program

❎ In your main program, build an image made of a gradient of blue and display it in a loop on the matrix. Since it is necessary for the display to go fast, do not forget to run your program in release mode, as we have been doing for a while now. Don’t forget that Image values have a .row() method which can be handy here.

Are you seeing a nice gradient? If you do, congratulations, you have programmed your first peripheral in bare board mode with the help of a HAL. 👏

(if not, add traces using defmt)

Real-Time Control: Precision Timing 🕐

Great work getting the LED matrix displaying images! Now let’s take it to the next level by adding precise timing control. Instead of displaying as fast as possible, we’ll implement smooth, controlled animations with professional-quality timing.

What We’ll Build

In this section, you’ll transform your basic display into a sophisticated real-time system with:

  1. Embassy executor: Bring in Rust’s async/await for embedded systems
  2. Controlled line timing: Display each row at precise intervals for smooth 80 FPS rendering
  3. Timed image changes: Automatically cycle through images with perfect timing
  4. Serial communication: Receive new images over the serial port in real-time
  5. Triple buffering: Ensure buttery-smooth transitions without tearing or flicker

Why Real-Time Matters

Real-time systems aren’t just about speed—they’re about predictability. Your LED matrix needs consistent timing to avoid flicker and provide smooth animations. Embassy’s async framework makes this surprisingly elegant in Rust!

Ready to make your display professional-grade? Let’s dive in! 🚀

Embassy executor

The Embassy framework and particularly its executor will help us decouple tasks and resources.

Add the Embassy executor as a dependency

❎ Add the embassy-executor dependency to your Cargo.toml file with the following features:

  • arch-cortex-m in order to select the asynchronous executor adapted to our architecture
  • executor-thread to enable the default executor (“thread mode” is the “normal” processor mode, as opposed to “interrupt mode”)
  • defmt to enable debugging messages

Since we now use the full executor, the generic-queue-8 feature can be removed from embassy-time. The timers will use the features provided by the Embassy executor.

Embassy main program

❎ Add the embassy_executor::main attribute to your main function (instead of the previous entry attribute) and make it async, as seen in class and in Embassy documentation. Check that you can still execute your code as you did before. The main() function must take a Spawner parameter, which will be used to create tasks.

❎ Modify the Matrix::new() method so that it becomes asynchronous. Replace the use of the blocking delay by a call to one of the asynchronous Timer functions.

For example you could use Timer::after() and give it an appropriate Duration, or use Timer::after_millis() directly.

Check that your program works correctly, including after unplugging and replugging your board in order to deinitialize the led matrix.

⚠ Right now, your main function executes a busy loop at its end. This is not a problem yet, because you don’t have other asynchronous tasks running. However, as soon as you spawn another asynchronous task, you will have to get rid of the busy loop, otherwise that task won’t be able to execute: your main function would never relinquish control to the executor. Your main function will have to either terminate and return nothing, or wait forever on a future (for example using core::future::pending().await).

Controlled line change

In this part, we will start using a periodic ticker to run some tasks at designated times. For example, we want to display frames at a pace of 80 FPS (frames per second), since refresh rates below 70Hz are unpleasant to the eye. Since each line of the matrix should get the same display time, we will call a display task 80×8=640 times per second. Each call will display the next line.

Blinking led

In order to check that you do not block the system, you want to create a new asynchronous task which will make the green led blink.

❎ Comment out your matrix display loop. You will reenable it later.

❎ Create a new task blinker as an asynchronous function with attribute embassy_executor::task. This function:

  • receives the green led port (PB14) as an argument
  • initialize the port as an output
  • loops infinitely while displaying this pattern:
    • three quick green flashes
    • a longer pause

Don’t forget that you can use asynchronous functions from Timer as you did just before.

❎ Using the Spawner object passed to your main program, spawn the blinker task.

❎ Check that the green led displays the expected pattern repeatedly.

❎ Reenable the matrix display loop (after you have spawned the new task).

You should no longer see your green led blink: your matrix display loop never returns and never suspends itself, as an asynchronous task would do while waiting for the time to switch to the next line. We will take care of that.

Controlled pace

We want to make an asynchronous task whose job is to take care of displaying the lines of the led matrix at the right pace in order to get a smooth 80Hz display. For this we will need the following elements:

  • An asynchronous task that will be spawned from main()
  • A Matrix instance to give to this task – we already have it!
  • A Ticker object to maintain a steady 80Hz rate.
  • A way to be able to modify the Image displayed on the matrix from other tasks, such as main(). We will need to use a Mutex from the crate embassy-sync to protect the Image being accessed from several places.

Let’s build this incrementally.

Asynchronous display task

❎ Make a new display asynchronous task taking a Matrix instance as an argument, and copy the current display loop inside. Put an infinite loop around it, as we do not want to leave the display task, ever! Add what is needed to make it work (such as a static Image). Spawn the display task from main().

Note that you have to supply a lifetime, as your Matrix type gets one. Fortunately, 'static will work, as this is the lifetime of the ports you configured from your Peripherals object.

Check that your program still works. Still, no green led blinking yet. Both the blinker and display asynchronous tasks run on the same executor, but the display task never relinquishes control to the executor.

Ticking

❎ In your display task, create a Ticker object which will tick every time it should display a new line. 8 lines, 80 Hz, that gives? You got it! Don’t hesitate to use the convenience methods such as Duration::from_hz().

You now want Matrix::display_image() to use this ticker.

❎ Add a ticker parameter to display_image(). You just want to use it, not take ownership of it, so you need a reference. Since you note that the ticker’s next() method requires a &mut self, you need to receive the ticker as a mutable reference as well.

❎ Make display_image() an asynchronous function, since it needs to wait for the ticker to tick.

❎ In display_image(), wait until the ticker ticks before displaying a row, so that rows are evenly spaced every 1/640th of a second.

❎ In display(), pass a mutable reference to the ticker to display_image().

If everything goes well, you should see both the image on your led matrix and the green led pattern. Neat, eh?

Image change

Right now, the display task does more than displaying something, as it takes care of the Image itself. It should only access the image when needed; creating and modifying it should not be its responsibility. Let’s fix that.

Sharing an Image between tasks

We will create a shared Image, protected by a mutex. However, you have to understand how Embassy’s mutexes work first.

Embassy asynchronous mutexes

Embassy’s mutexes cannot use spin locks, as spin locks loop forever until they get the lock. If Embassy did this, it would block the current asynchronous task, and thus the whole executor.

Embassy’s mutexes are asynchronous-friendly, and will yield when they cannot lock the resource immediately. However, to implement it, Embassy still needs a real mutex (which Embassy calls a “raw mutex”, or “blocking mutex”) for a very short critical section.

Since all our tasks are running on the same executor, they will never try to lock the raw mutex at the same time. It means that we can safely use the ThreadModeRawMutex as raw mutex.

Creating the shared image object

So we want to create a global (static) Image object protected by a Mutex using internally a ThreadModeRawMutex.

❎ Import embassy_sync::mutex::Mutex and embassy_sync::blocking_mutex::raw::ThreadModeRawMutex.

❎ Declare a new global (static) IMAGE object of type Mutex<ThreadModeRawMutex, Image> and initialize it… but with what?

Creating the initial image

Initialization of static variables is done before any code starts to execute. The compiler must know what data to put in the global IMAGE object.

We could try to use:

static IMAGE: Mutex<ThreadModeRawMutex, Image> = Mutex::new(Image::new_solid(Color::GREEN));

but the compiler will complain:

error[E0015]: cannot call non-const fn `tp_led_matrix::Image::new_solid` in statics
   |
   | static IMAGE: Mutex<ThreadModeRawMutex, Image> = Mutex::new(Image::new_solid(Color::GREEN));
   |                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Indeed, it cannot execute the call to Image::new_solid() before any code starts to execute. However, there is an easy solution here! 💡

The code of Image::new_solid() is likely simple (if it is not, fix it):

impl Image {
    pub fn new_solid(color: Color) -> Self {
        Image([color; 64])
    }
}

Indeed, this is so simple that this could be done at compilation time if the function were a const one. const functions, when given constant parameters, can be replaced by their result at compilation time.

By adding the const keyword:

impl Image {
    pub const fn new_solid(color: Color) -> Self {
        Image([color; 64])
    }
}

the compiler will now be able to create the data structure for the mutex containing the image with the green constant at compilation time, and place it in the .data section.

Putting it together

❎ Add the const keyword to the Image::new_solid() function and initialize the IMAGE object. You may want to add a new constant Color, such as BLACK, even though it may be useful at the beginning to look at a visible image.

❎ Modify the display task so that, before being displayed, each image is copied locally in order not to keep the mutex locked for a long time.

Changing images dynamically

❎ Modify the main task so that the IMAGE object is replaced, every second or so, by a different image.

Don’t make things complicated. You should notice that your display changes every second while remaining pleasant to look at. The green led should blink its pattern at the same time.

This is starting to look nice.

Serial port

As was done in the 4SE07 lab, we want to be able to send image data from the serial port. We will configure the serial port, then write a decoding task to handle received bytes.

Fortunately, this will be much simpler to do using Rust and Embassy.

The procedure

Of course, we will create a serial_receiver asynchronous task. This task will:

  1. receive the peripherals needed to configure the serial port
  2. receive the image bytes
  3. update the shared image by copying the received bytes
  4. loop back to step 2

Receiving the image efficiently

How can we receive the image most efficiently? How will we handle incomplete images, or extra bytes sent after the image?

The first byte sent is a marker (0xff), we must wait for it. Then we should receive 192 bytes, none of which should be a 0xff. We want to receive all bytes in one DMA (direct memory access) transaction. But what will happen if an image is incomplete?

In this case, another image will follow, starting with a 0xff. In our buffer, we will have:

<------------------------ 192 --------------------->
| o | … | o | o | o | o | 0xff | n | n | n | … | n |
<---------- P ---------->      <-------- N -------->

where o belongs to the original image, and n to the new image (N bytes received). In this case, we should rotate the data P+1 places to the left (or N places to the right, which is equivalent) so that the new image data n is put at the beginning of the buffer, in order to have

<------------------------ 192 --------------------->
| n | n | n | … | n | o | … | o | o | o | o | 0xff |
<-------- N -------->     <---------- P ---------->

We just need to receive the 192-N missing bytes starting after the first N bytes, and check again that there is no 0xff in the buffer. If there is none, we have a full image; otherwise we rotate again, etc.

Note that the initial situation, right after receiving the 0xff marker, is the same as having N equal to 0: there is no need to special-case it.

The task

❎ Create the serial_receiver task. This task receives several peripherals: the USART1 peripheral, the serial port pins, and the DMA channel to use for the reception.

By looking at the figure 29 on page 339 of the STM32L4x5 reference manual, you will see that the DMA channel for reception (RX) of USART1 is DMA1_CH5.

Since we do not need to transmit anything, we do not need to configure the transmission (TX) side of the serial port, nor to assign a DMA channel for transmission. Embassy supports this configuration out of the box and provides a UartRx structure, which represents the reception (RX) side of a serial port, leaving the transmission side untouched and unconfigured.

❎ Create the UartRx device. Also, don’t forget to configure the baudrate to 38400.

Note that UartRx::new() expects a _irq parameter. This is a convention for Embassy to ensure at compile time that you have properly declared that the corresponding IRQ is forwarded to the HAL using the bind_interrupts!() macro.

bind_interrupts!(struct Irqs {
    USART1 => usart::InterruptHandler<USART1>;
});

Irqs is the singleton that needs to be passed as the _irq parameter of UartRx::new().

⚠ Depending on your version of Embassy, the order of the parameters for UartRx::new() may be different. Choose carefully the version of Embassy you’re using in the documentation.

The logic

❎ Implement the reception logic, and update the shared image when the bytes for a full image have been received.

Some tips:

  • Use the algorithm shown in “Receiving the image efficiently” above:

    1. Create a buffer to hold 192 bytes
    2. Wait for the 0xff marker — you have then received N=0 image bytes at this stage
    3. Receive the missing 192-N bytes starting at offset N of the buffer
    4. If, looking from the end, you find a 0xff in the buffer at position K:
       • Shift the buffer right by K positions
       • Set N to K and go back to step 3
       Otherwise, you have a full image: you can update the shared image and go back to step 2.
  • To update the shared image from the received bytes, you can extract it from the static mutex-protected IMAGE object, then request the &mut [u8] view of the image with .as_mut(), since you have implemented AsMut<[u8; 192]> on Image. You can then use an assignment to update the image content from the buffer you have received.

❎ Start the serial_receiver task from main(). Check that you can display data received from the serial port.

Congratulations, your project rocks!

Triple buffering

Our current handling of the image received on the serial port is not very satisfying. As soon as we have received a full image, we update the shared image: it means that the next rows to be displayed will come from the newer image while some rows on the LED matrix may have come from the older image.

⚠ You do not have to implement double-buffering. You have to understand how it works, but you only need to implement triple-buffering.

What is double-buffering?

In older computers, drawing something was performed directly in the screen buffer (also called the video RAM) as memory was tight. It meant that some artifacts could easily be perceived unless extreme caution was observed. For example, if an image was displayed by a beam going from the top to the bottom of the screen, drawing a shape starting from the bottom of the screen would make the bottom half of the shape appear before the top half does. On the other hand, drawing from the top to the bottom at the same pace as the refreshing beam would display consistent pictures.

As memory became more affordable, people started to draw the next image to display into a back buffer. This process lets software draw things in an order which is not correlated with the beam displaying the image (for example objects far away then nearer objects). Once the new image is complete, it can be transferred into the front buffer (the video RAM) while ensuring that the transfer does not cross the beam, which requires synchronization with the hardware. This way, only full images are displayed in a consistent way.

On some hardware, both buffers fit in video RAM. In this case, switching buffer at the appropriate time is done by modifying a hardware register at the appropriate time.

Double-buffering in our project

We already implement part of the double-buffering method in our code: we prepare the next image in a separate buffer while the current one is being displayed in a loop. We could modify our code (⚠ again, you do not need to implement double-buffering, this is only an example, you’ll implement triple-buffering) so that the image switching takes place at the appropriate time:

  • Make the new image a shared resource next_image rather than a local resource.
  • Add a shared boolean switch_requested next to the shared images, and set it in the serial_receiver task when the new image is complete.
  • Have the display task check the switch_requested boolean after displaying the last row of the current image, and swap the image and next_image if this is the case and reset switch_requested.

By locking next_image and switch_requested for the shortest possible time, the serial_receiver task would only prevent the display task from running for very short periods. However, we could still run into an issue in the following scenario:

  • The last byte of the next image is received just as the current image starts displaying.
  • We set switch_requested to request the image switch, but the switch will only happen after the whole current image has been displayed (roughly 1/80 second later, or 12.5ms).
  • The speed of the serial port is 38400 bits per second, and a byte requires 10 symbols (start, 8 bits, stop), so at most 3840 bytes arrive per second.
  • It means that while the current image is being displayed, about 48 bytes of the next-next image can be received.

Where can we store those bytes? If we store them in next_image, we will alter a buffer which has been fully drawn but not displayed yet, so we cannot do this. We obviously cannot store them in image either. With only two buffers, there is nothing we can do.

Triple buffering

We need a third buffer: one buffer is the one currently being displayed, one buffer is the next fully completed image ready to be displayed, and one buffer is the work area where we build the currently incomplete image.

In order to avoid copying whole images around, we would like to work with buffer references and switch those references. Should we use dynamic memory allocation? ☠ Certainly not.

The heapless crate

The heapless crate contains several data structures that can be used in environments where dynamic memory allocation is not available or not desirable:

  • heapless::Vec<T> has an interface quite similar to std::vec::Vec<T> except that those vectors have a fixed capacity, which means that the push operation returns a Result indicating if the operation succeeded or failed (in which case it returns the element we tried to push).
  • Other structures such as BinaryHeap, IndexMap, IndexSet, String, etc. act closely like the standard library ones.
  • heapless::pool is a module for defining lock-free memory pools which allocate and reclaim fixed size objects: this is the one we are interested in.

Using a pool

By using a static pool of Image types named POOL, we will be able to manipulate values of type Box<POOL>: this type represents a reference to an image from the pool. Box<POOL> implements Deref<Target = Image> as well as DerefMut, so we will be able to use such a type instead of a reference to an Image. Also, we can easily swap two Box<POOL> objects instead of exchanging whole image contents.

A pool is declared globally by using the heapless::box_pool!() macro as described in the heapless::pool documentation. A BoxBlock<Image> represents the space occupied by an image and will be managed by the pool. Then the .alloc() method can be used to retrieve some space to be used through a Box<POOL> smart pointer. Dropping such a Box<POOL> will return the space to the pool.

  box_pool!(POOL: Image);
  …
  // Code to put in the main function:
  // Statically reserve space for three `Image` objects, and let them
  // be managed by the pool `POOL`.
  unsafe {
    #[expect(clippy::declare_interior_mutable_const)]
    const BLOCK: BoxBlock<Image> = BoxBlock::new();
    static mut MEMORY: [BoxBlock<Image>; 3] = [BLOCK; 3];
    // By default, taking mutable references to static data is forbidden.
    // We want to allow it here.
    #[expect(static_mut_refs)]
    for block in &mut MEMORY {
      POOL.manage(block);
    }
  }
  • This pool can hand out Box<POOL> through POOL.alloc(model), which returns a Result<Box<POOL>, Image> initialized from model:
    • Either the pool could return an object (Ok(…)).
    • Or the pool had no free object, in which case the model is returned with the error: Err(model).
  • When it is no longer used, a Box<POOL> can be returned to the pool just by dropping it.

We will build a pool containing the space for three images:

  • When we receive a 0xff on the serial port to indicate a new image, we will draw an image from the pool and start filling its data until we have all the bytes.
  • When an image is complete, the serial receiver will hand it to the display task.
  • The display task starts by waiting for an image coming from the serial receiver and starts displaying it repeatedly.
  • If a new image arrives from the serial receiver after the last line of the current image is displayed, the display task replaces the current image by the new one. This drops the image that was just displayed, and it is then automatically returned to the pool.

We see why, in the worst case, three images might coexist at the same time:

  • The display task may be displaying image 1.
  • The serial receiver has finished receiving image 2 and has stored it so that the display task can pick it up when it is done displaying image 1.
  • The serial receiver has started the reception of image 3.

❎ Declare a pool named POOL handing out Image objects using the box_pool!() macro.

❎ In the main() function, before starting the display or serial_receiver task, reserve memory for 3 Image (using the unsafe block shown above) and hand those three areas to the pool to be managed.

Using Embassy’s Signal

To pass an image from the serial receiver to the display task, we can use the Signal data structure from the embassy_sync crate. The Signal structure is interesting:

  • It acts like a queue with at most one item.
  • Reading from the queue waits asynchronously until an item is available and returns it.
  • Writing to the queue overwrites (and drops) the current item if there is one.

This is exactly the data structure we need to pass information from the serial receiver to the display task. We will make a global NEXT_IMAGE static variable which will be a Signal to exchange Box<POOL> objects (each Box<POOL> contains an Image) between the serial_receiver and the display tasks.

A Signal needs to use a raw mutex internally. Here, a ThreadModeRawMutex similar to the one we used before can be used.

❎ Declare a NEXT_IMAGE static object as described above.
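For reference, the declaration could be sketched as follows (assuming the embassy-sync and heapless 0.8 module paths; adjust to your versions):

```rust
use embassy_sync::blocking_mutex::raw::ThreadModeRawMutex;
use embassy_sync::signal::Signal;
use heapless::pool::boxed::Box;

// At most one pending image; writing overwrites (and drops) the previous one.
static NEXT_IMAGE: Signal<ThreadModeRawMutex, Box<POOL>> = Signal::new();
```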

Displaying the image

You want to modify the display task so that:

  • It waits until an image is available from NEXT_IMAGE and stores it into the local image variable.
  • Then in an infinite loop:
    • It displays the image it has received. image is of type Box<POOL>, but since Box<POOL> implements Deref<Target = Image>, &image can be used in a context where an &Image would be required.
    • If there is a new image available from NEXT_IMAGE, then image is replaced by it. This will drop the older Box<POOL> object, which will be made available to the pool again automatically.

NEXT_IMAGE.wait() returns a Future which will eventually return the next image available in NEXT_IMAGE:

  • Awaiting this future using .await will suspend the task until an image is available. This might be handy to get the initial image.
  • If you import futures::FutureExt into your scope, then you get additional methods on Future implementations. One of them is .now_or_never(), which returns an Option: either None if the Future does not resolve immediately (without waiting), or Some(…) if the result is available immediately. You could use this to check whether a new image is available from NEXT_IMAGE, and if it is, replace the current image.
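Put together, the display loop could be sketched like this. Here matrix and ticker are the objects your existing display task already uses, and display_image stands in for whatever drawing routine your Matrix type provides:

```rust
use futures::FutureExt; // provides .now_or_never() on Futures

// Wait for the initial image, then display it until a new one arrives.
let mut image = NEXT_IMAGE.wait().await;
loop {
    // &image coerces to &Image through Deref<Target = Image>.
    matrix.display_image(&image, &mut ticker).await;
    if let Some(new_image) = NEXT_IMAGE.wait().now_or_never() {
        // Replacing image drops the old Box<POOL>,
        // which is automatically returned to the pool.
        image = new_image;
    }
}
```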

❎ Add the futures crate as a dependency in your Cargo.toml. By default, the futures crate requires std; you have to specify default-features = false when importing it, or add it using cargo add futures --no-default-features.

❎ Rewrite display() to do what is described above.

You now want to check that it works by using an initial image before modifying the serial receiver. To do so, you will build an initial image and put it inside NEXT_IMAGE so that it gets displayed.

❎ At the end of the main() function, get an image from the pool, containing a red gradient, by using the POOL.alloc() method.

❎ Send this image containing a gradient to the NEXT_IMAGE queue by using the signal method of the queue.

You should see the gradient on the screen.

❎ Now, check that new images are correctly displayed:

  • Surround the code above with an infinite loop.
  • Inside the loop, add an asynchronous delay of 1 second after sending the image to NEXT_IMAGE.
  • Still inside the loop, repeat those three steps (get an image from the pool, send it to the display task through NEXT_IMAGE, and wait for one second) in another color.
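The three steps, wrapped in a loop, might look like this (Image::gradient is a hypothetical constructor, and RED/BLUE are assumed colors; use whatever your Image type provides to build a gradient):

```rust
use embassy_time::Timer;

loop {
    // Red gradient for one second…
    let image = POOL.alloc(Image::gradient(RED)).unwrap();
    NEXT_IMAGE.signal(image);
    Timer::after_secs(1).await;

    // …then a blue gradient for one second.
    let image = POOL.alloc(Image::gradient(BLUE)).unwrap();
    NEXT_IMAGE.signal(image);
    Timer::after_secs(1).await;
}
```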

If you see two images alternating every second, you have won: your display task is working, with proper synchronization. Time to modify the serial receiver.

Receiving new images

Only small modifications are needed to the serial receiver:

  • When you receive the first 0xff indicating a new image, get an image from the pool (you can initialize it from the default image, Image::default()). You may panic if you don’t get one as we have shown that three image buffers should be enough for the program to work.
  • Receive bytes directly in the image buffer, that you can access with image.as_mut() (remember, you implemented the AsMut trait on Image).
  • When the image is complete, signal its existence to NEXT_IMAGE.

❎ Implement the steps above.

❎ Remove the static IMAGE object which is not used anymore.

❎ Remove the image switching in main(), as we don’t want to interfere with displaying the images received from the serial port. You may keep one initial image though, to display something before you receive the first image through the serial port.

❎ Check that you can display images coming from the serial port. Congratulations, you are now using triple buffering without copying large quantities of data around.

Bonus Level: Advanced Features 🌟

Congratulations on making it this far! This bonus section is completely optional—you can achieve the maximum grade without completing it, as long as the core requirements are perfect.

But here’s the exciting part: These bonus tasks are not only fun challenges that deepen your embedded Rust skills, but they can also earn you additional points if you haven’t quite reached the maximum grade yet.

What’s Available

Ready to level up? Choose from these advanced features:

1. Dedicated Executor

Implement priority-based task scheduling with a dedicated executor for your display task. This ensures glitch-free rendering even when the system is under heavy load. Perfect for understanding real-time scheduling!

2. Screen Saver

Every great display deserves a screen saver! Build animated patterns that activate after a period of inactivity. Express your creativity while learning about state management.

3. Text Drawing

What if your screen saver could display scrolling text or messages? Implement pixel-based text rendering to take your LED matrix to the next level.

Why Do These?

Beyond potential grade points, these challenges will:

  • Deepen your understanding of async Rust
  • Teach you advanced embedded patterns
  • Give you impressive portfolio pieces
  • Most importantly: They’re genuinely fun! 🎮

Pick what interests you and enjoy the journey! 🚀

Dedicated executor

Until now, we used only one executor in thread mode (the regular mode in which the processor runs, as opposed to interrupt mode). It means that Embassy’s executor will execute one asynchronous task until it yields, then the other, then the other, and so on. If for any reason one task requires a bit more time than expected, you might delay other tasks such as the display task. In this case, you might notice a short glitch on the display.

To prevent this, we will use a dedicated interrupt executor to run our display task. In this scenario, when it is time to display a new line on the display, an interrupt will be raised and the executor will resume the display task while still in interrupt mode, interrupting the rest of the program.

You will have to choose an unused hardware interrupt, and:

  • configure it to the priority you want to use, with regard to other interrupts in the system
  • start the executor, configuring it so that its tasks raise this interrupt by software (pend the interrupt, as in make it pending) when they have progress to signal
  • call the executor’s on_interrupt() method in the ISR, so that the executor knows that it must poll its tasks

Those are three easy tasks. We will choose interrupt UART4, and set it to priority level Priority::P6:

❎ Add the executor-interrupt feature to the embassy-executor dependency in Cargo.toml.

❎ Create a static DISPLAY_EXECUTOR global variable, with type InterruptExecutor.

❎ Choose an unused interrupt (pick UART4, whose number is available as embassy_stm32::interrupt::UART4), configure it with an arbitrary priority (use Priority::P6). Start the DISPLAY_EXECUTOR and associate it with this interrupt. Use the returned spawner to spawn the display task.
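These steps might be sketched as follows in main() (exact paths and generics vary between embassy versions):

```rust
use embassy_executor::InterruptExecutor;
use embassy_stm32::interrupt;
use embassy_stm32::interrupt::{InterruptExt, Priority};

static DISPLAY_EXECUTOR: InterruptExecutor = InterruptExecutor::new();

// In main(): give UART4 an arbitrary priority and start the executor on it.
interrupt::UART4.set_priority(Priority::P6);
let display_spawner = DISPLAY_EXECUTOR.start(interrupt::UART4);
display_spawner.spawn(display(/* … */)).unwrap();
```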

❎ Write an ISR for this interrupt, and redirect the event to the executor:

#[interrupt]
unsafe fn UART4() {
    unsafe {
        DISPLAY_EXECUTOR.on_interrupt();
    }
}

Note that ISRs are unsafe functions, as doing the wrong thing in an interrupt routine might lock up the system.

At this stage, you might notice that your code does not compile: the NEXT_IMAGE data structure uses a ThreadModeRawMutex as its internal mutex. Such a mutex, as its name indicates, can only be used to synchronize tasks running in thread mode, not in interrupt mode.

❎ Use a CriticalSectionRawMutex as an internal mutex for NEXT_IMAGE, because such a mutex is usable to synchronize code running in interrupt mode with code running in thread mode.

Your display should now be as beautiful as ever.

Screen saver

What should your LED matrix do when you do not send anything on the serial port? Wouldn’t it be great to have a screen saver, which automatically runs when nothing is sent, and does not get in the way otherwise?

You will have to create a new screensaver task, which will trigger an image change when nothing is being received on the serial port for a while.

Recording image changes

You don’t want the screen saver to run if data is being received. Let’s record new images arrival.

❎ Declare a static NEW_IMAGE_RECEIVED Signal object containing an Instant.

❎ When a new image is received in serial_receiver, signal the current date to the NEW_IMAGE_RECEIVED queue.
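A sketch of the declaration and the signaling site, assuming embassy-time's Instant:

```rust
use embassy_sync::blocking_mutex::raw::ThreadModeRawMutex;
use embassy_sync::signal::Signal;
use embassy_time::Instant;

// Date of the latest received image; writing overwrites any unread value.
static NEW_IMAGE_RECEIVED: Signal<ThreadModeRawMutex, Instant> = Signal::new();

// In serial_receiver, once a complete image has been received:
NEW_IMAGE_RECEIVED.signal(Instant::now());
```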

Implementing the screensaver task

❎ Implement a screensaver task and start it on the thread-mode (regular) executor.

In this task, you may for example, in an infinite loop:

  • Read the date of the last image received without waiting.
  • If any image has been received, wait until one second after this date and continue the loop. This way, you effectively do not display anything until the serial port has been idle for one second.
  • Display your screensaver image (get one from the pool and set it to NEXT_IMAGE).
  • Wait for one second.

You can even be more creative and use alternating images every second.

Note that both the serial port code and the screensaver run in thread mode, so NEW_IMAGE_RECEIVED only requires a ThreadModeRawMutex for its internal synchronization. Check that you haven’t used a CriticalSectionRawMutex, which is not needed here.

Drawing things

The screensaver feature was nice, but the screensaver could be more entertaining. What if it could display scrolling text, such as “This Rust 4SE02 project will get me a good grade”?

Fortunately, one crate can help you do that: embedded-graphics. Provided you do the proper interfacing with your hardware, this crate will let you draw all kinds of shapes, and even display text.

Interfacing with your hardware: the embedded module

You have already decoupled the logical representation of your LED matrix (the Image type) from the physical one (the Matrix type). This will make your job easier, as you will only have to interface the Image type with the embedded-graphics crate: once you have an Image you can display it on your hardware by putting it into NEXT_IMAGE.

❎ Create an embedded module in your library. This module will contain anything needed to interface the drawing primitives of the embedded-graphics crate with your Image type.

First you’ll have to choose a pixel representation that embedded-graphics can use and which is appropriate for your display. Since you can already display RGB colors with 8 bits data for each component, the Rgb888 color type seems appropriate.

❎ Implement From<Rgb888> for your Color type. That will be useful when drawing on your Image, to build a proper Color value.
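Assuming your Color type has three public u8 components (adapt to your actual definition), the conversion is straightforward:

```rust
use embedded_graphics::pixelcolor::{Rgb888, RgbColor};

impl From<Rgb888> for Color {
    fn from(color: Rgb888) -> Self {
        // The RgbColor trait provides per-component accessors returning u8.
        Color { r: color.r(), g: color.g(), b: color.b() }
    }
}
```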

Now, you need to implement the DrawTarget trait for your Image type. This trait is the one which does the real drawing. You will only implement the minimal functionality and use the provided defaults for the rest.

❎ Implement DrawTarget for Image:

  • The Color type will be Rgb888.
  • You can use Infallible as your Error type, because drawing into an Image never fails.
  • When you implement draw_iter(), make sure that you only set the pixels whose coordinates belong to the image (x and y both in 0..8). This method can be called with a larger image, for example a large text, and you will only display a portion of it.
  • If you need to convert a Rgb888 into a Color, do not forget that you can use .into() because you implemented From<Rgb888> for Color.
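A sketch of the implementation; it assumes Image is indexable by (row, column) as in earlier parts of the lab, and that DrawTarget's required Dimensions supertrait is satisfied through OriginDimensions:

```rust
use core::convert::Infallible;
use embedded_graphics::pixelcolor::Rgb888;
use embedded_graphics::prelude::*;

impl DrawTarget for Image {
    type Color = Rgb888;
    type Error = Infallible; // drawing into an Image never fails

    fn draw_iter<I>(&mut self, pixels: I) -> Result<(), Self::Error>
    where
        I: IntoIterator<Item = Pixel<Rgb888>>,
    {
        for Pixel(point, color) in pixels {
            // Only set pixels that belong to the 8×8 matrix.
            if (0..8).contains(&point.x) && (0..8).contains(&point.y) {
                self[(point.y as usize, point.x as usize)] = color.into();
            }
        }
        Ok(())
    }
}

// DrawTarget requires Dimensions; OriginDimensions provides it for free.
impl OriginDimensions for Image {
    fn size(&self) -> Size {
        Size::new(8, 8)
    }
}
```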

Upgrading the screensaver

You can now use the drawing primitives of embedded-graphics to create images in your screensaver instead of using gradients.

❎ Modify your screensaver so that it creates interesting images using the drawing primitives.

For example, you could add another local variable in addition to the color index, such as a shape index, and draw a square, a triangle, a circle, and a solid color. Ideally, those color and shape indices would use cycle sizes which are coprime, to maximize the displayed combinations.

When this works, commit and push your code.

Drawing text

The next step is to display scrolling text from the screensaver. Yes, that means forgetting about the shapes that you just designed: they were there to familiarize you with the library.

A Text object represents some text that can later be drawn into anything implementing DrawTarget (such as an Image). It uses a character style, which can be built using MonoTextStyle::new() from a font and a color. And the ibm437 crate provides a great IBM437_8X8_REGULAR font which will be perfect for your LED matrix.

The idea is to wait for 60ms (instead of one second) after you have displayed an image to make some text scroll to the next position if no new image has been received. To make the text scroll to the left, you will position it at a negative x offset: since you display pixels whose x is in 0..8, decreasing the x position of the start of the text will make it go left.

❎ Modify the screensaver task so that it runs every 60ms. You need precise timing if you want the scrolling to be pleasant.

❎ Modify the screensaver task such that, when it wants to display something:

  • A Text object is built with a text such as “Hello 4SE02”, and placed at an x position whose value is kept in an offset local variable. You can use the color you want, or make the color cycle.
  • The text is drawn into an image coming from the pool, and displayed through NEXT_IMAGE.
  • Decrease the offset local variable, except if the end of the text has reached the 0 x coordinate, in which case offset must be reset to display the text again (find the appropriate value so that it is nice for the eyes). Note: the Text object has methods to check its bounding box (the smallest rectangle in which it fits).
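The drawing part of these steps might be sketched as follows, assuming an i32 offset local variable and the ibm437 font mentioned above (the baseline y coordinate and the reset value are guesses to adjust by eye):

```rust
use embedded_graphics::mono_font::MonoTextStyle;
use embedded_graphics::pixelcolor::Rgb888;
use embedded_graphics::prelude::*;
use embedded_graphics::text::Text;
use ibm437::IBM437_8X8_REGULAR;

let style = MonoTextStyle::new(&IBM437_8X8_REGULAR, Rgb888::GREEN);
// The y coordinate positions the text baseline; adjust for your font.
let text = Text::new("Hello 4SE02", Point::new(offset, 7), style);
let mut image = POOL.alloc(Image::default()).unwrap();
text.draw(&mut *image).unwrap(); // Error = Infallible, so this never fails
NEXT_IMAGE.signal(image);

// Scroll left; restart once the text has fully left the screen.
if offset + text.bounding_box().size.width as i32 <= 0 {
    offset = 8; // re-enter from the right edge
} else {
    offset -= 1;
}
```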

❎ Modify the screensaver task so that if a new image has been received on the serial port, the offset of the text is reset so that next time the screensaver displays something it will start from the beginning of the text.

Note: you might have to adapt your DrawTarget trait implementation for Image, for example if the text appears upside down.

Make it even prettier if you wish, commit, push.

🦀 Congratulations, you have reached the end of this lab! 🦀

Bridge: running the VM in the background

You now have two cool pieces of software: the VM you wrote in the first part of the course, and the Embassy-based tp-led-matrix firmware.

Let’s make them work together.

In this section, we will run the VM in the background while the LED matrix keeps doing its job (displaying images, receiving pixels from UART, etc.). If everything goes well, you’ll be able to:

  • keep sending images to the board as before,
  • simultaneously observe VM output in a serial terminal,
  • and confirm the display remains stable (no glitches).

Advanced / optional part

This bridge is intentionally more advanced than the rest of the lab. It is meant for students who want to go further into:

  • Rust in no_std environments,
  • cooperative execution models,
  • and real embedded integration constraints (ownership of peripherals, backpressure, time-slicing CPU work).

If you are short on time, it is OK to stop after the main LED-matrix lab.

What makes it interesting

The constraints are the interesting part:

  • no code copy: you will modify your VM crate in place (one shared implementation),
  • the VM library must be usable in no_std (embedded) environments,
  • it must integrate nicely with an Embassy application,
  • it must not perturb the LED matrix real-time behavior,
  • the VM output must go to the serial port.

This section is a bit more open-ended than the previous ones: you will have to make design choices. But we’ll keep the spirit of the lab: small steps, frequent builds, and a clear target.

What we will build

We will:

  1. reshape the VM crate so that the library is no_std, and provides an API that returns output bytes via a small buffer,
  2. add the VM crate as a dependency of the LED matrix project (by path, to keep one shared codebase),
  3. embed a prebuilt VM binary in the firmware using include_bytes!(),
  4. spawn a new task vm_runner that repeatedly executes this program one step at a time, yielding often,
  5. forward the VM output to USART1 TX, while leaving the existing USART1 RX image receiver intact.

Prerequisites

  • You have a working tp-led-matrix project (in Embassy async mode).
  • You have a working vm project from the earlier lab.

We will assume you have both available locally as Cargo crates.

Exact paths do not matter, as long as you can reference one crate from the other using a relative path = "..." dependency.

Making the VM embeddable

In the first part of the course, you wrote a VM that runs on the desktop.

Now that we are bridging the VM and the Embassy-based embedded program, we will evolve this VM crate so that:

  • it becomes no_std-compatible (library part),
  • it gains a single execution API that returns output bytes via a small buffer,
  • and it still behaves the same as before when used from the command line.

This work happens now, as part of the bridge.

Context

The VM crate you wrote earlier was written for the desktop first, and therefore uses std by default.

In order to embed it in tp-led-matrix (which is #![no_std]), the VM library part must be made no_std compatible.

This bridge is intentionally done in this order:

  1. change step_on so it returns output bytes via a small buffer (and adapt step/run/CLI/tests accordingly)
  2. then make the VM library no_std
  3. embed it into the Embassy firmware

However, having no_std is not enough: we also need the VM to play well with an embedded program that uses async I/O.

A background task that runs the VM must:

  • execute a small amount of work,
  • yield back to the executor,
  • execute again later,
  • without ever monopolizing the CPU.

We will therefore:

  1. make the VM library compile in no_std mode (using a Cargo std feature enabled by default),
  2. expose a single-step API that writes output into a caller-provided buffer,
  3. implement a cooperative runner (run_budget_on) so we can time-slice the VM in the embedded application.

Step 1: change step_on to use a buffer

Do this refactor first, while your VM still builds as a normal desktop project.

Why now?

  • it’s easier to adapt step/run, the CLI and the tests while std is still available everywhere,
  • and once the signature no longer mentions std::io::Write, making the library no_std becomes mostly mechanical.

The suggested signature is:

pub fn step_on(&mut self, out: &mut [u8]) -> Result<(bool, usize)>

which adds to the return value the number of bytes that have been written to the out buffer (or 0 if the instruction produced no output).

After this step, your VM core should no longer need a Write to produce output: each instruction will instead fill a small caller-provided byte buffer.

Step 2: making the VM library no_std

Now we need to make sure it can compile in no_std so it can run on the microcontroller.

Your starting point (student version) typically has:

  • an execution API (Machine::step / run),
  • output written to stdout,
  • tests/examples built for the host.

We want:

  • the library to build in no_std,
  • the desktop CLI and tests to keep working,
  • and the same VM crate to be usable as a dependency from tp-led-matrix.

This is why we will not create a copy of the VM crate: we will evolve the same codebase to support both worlds.

Since we are going async-only, the std/no_std split is now mostly about:

  • whether you can use the standard library (std),
  • what runtime you run on,
  • and what output device you target.

In practice:

  • desktop: CLI + stdout output
  • embedded: Embassy + UART output

You will still typically use a std feature enabled by default so the CLI remains easy to run on the desktop.

This pattern is widely used in the ecosystem (many crates do the same), and once you have it, your VM becomes reusable in any embedded project.

Once that is done, you should be able to build the VM library in no_std with:

cargo build --lib --no-default-features

The std feature (what stays on the host)

We want one crate that:

  • can be used as a no_std library from the firmware,
  • but still provides the host-facing convenience API used by the original VM lab.

The common Cargo pattern is:

  • define a std feature, enabled by default,
  • compile in no_std when default features are disabled.

This typically looks like:

  • default = ["std"]
  • std = []
  • plus #![cfg_attr(not(feature = "std"), no_std)] at the crate root.
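Concretely, the feature section of the VM crate's Cargo.toml might look like this (a sketch; the names follow the standard Cargo convention described above):

```toml
[features]
default = ["std"]   # `cargo build` on the host keeps std enabled
std = []            # no extra dependencies: just a compilation switch
```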

❎ Keep these parts available without the std feature:

  • the VM core (Machine, instruction decoding/execution, memory/register access),
  • the new buffer-based API (step_on(out: &mut [u8]), run_budget_on, …),
  • any helper types that only depend on core.

❎ Keep these parts only when std is enabled:

  • the host convenience methods that write directly to stdout:
    • step() (uses stdout internally)
    • run() (uses stdout internally)
  • any error variants / conversions that mention std::io.

In other words: the embedded firmware never calls step() / run(), it only uses step_on(&mut [u8]) and forwards the returned bytes to UART TX.

Example: keep step() and run() behind std

Here is the idea for host convenience methods implemented on top of the new buffer-based step_on:

#![allow(unused)]
fn main() {
#[cfg(feature = "std")]
impl Machine {
    pub fn step(&mut self) -> Result<bool> {
        use std::io::Write;

        let mut out_buf = [0u8; 11];
        let (exited, n) = self.step_on(&mut out_buf)?;

        if n != 0 {
            let mut stdout = std::io::stdout().lock();
            stdout.write_all(&out_buf[..n])?;
            stdout.flush()?;
        }

        Ok(exited)
    }

    pub fn run(&mut self) -> Result<()> {
        while !self.step()? {}
        Ok(())
    }
}
}

(Exact return types/signatures depend on what your original VM lab required, but the pattern is the same: step()/run() call step_on(&mut [u8]), then forward the produced bytes to stdout.)

Step 3: return output bytes via a buffer

We will make VM output explicit: each instruction may produce some bytes, and the VM will write them into a small buffer provided by the caller.

Key observation:

  • opcode 6 (out) outputs one UTF-8 encoded character: at most 4 bytes
  • opcode 8 (out number) outputs an i32 as ASCII: at most "-2147483648" → 11 bytes

So if the caller provides, e.g., a 12-byte buffer, the per-instruction output is always bounded.

The new execution API

⚠️ This changes the VM API compared to the initial VM lab.

In particular, you will change the signature of step_on (and any helper runners you add) so that it no longer takes a std::io::Write. Instead, it produces bytes into a small buffer, and the caller decides how to forward them (stdout on the host, UART TX on the board, etc.).

This also means that the host convenience methods (step, run, etc.) must be adapted: they can no longer forward a Write into step_on. They must call the new buffer-based step_on, then write &out_buf[..n] to stdout.

❎ Expose a stepping API:

  • pub fn step_on(&mut self, out: &mut [u8]) -> Result<(bool, usize)>

where:

  • the returned bool is true if the VM has exited
  • the returned usize is the number of bytes written into out for that instruction

Inside step_on, everything stays the same except opcodes 6 and 8:

  • For opcode 6 (out): encode the character into a local 4-byte buffer and copy it into out, then return Ok((exited, n)).
  • For opcode 8 (out number): format into a local small buffer (itoa) and copy it into out, then return Ok((exited, n)).
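To make the bounds concrete, here is a host-testable sketch of the two encodings using only core (your step_on can use the itoa crate instead of the manual decimal formatting shown here):

```rust
// Encode one char as UTF-8 into `out`, returning the byte count (at most 4).
fn encode_char(c: char, out: &mut [u8]) -> usize {
    let mut buf = [0u8; 4];
    let encoded = c.encode_utf8(&mut buf);
    out[..encoded.len()].copy_from_slice(encoded.as_bytes());
    encoded.len()
}

// Encode an i32 in ASCII decimal into `out`;
// the worst case "-2147483648" needs 11 bytes.
fn encode_i32(mut value: i32, out: &mut [u8]) -> usize {
    let mut buf = [0u8; 11];
    let negative = value < 0;
    let mut i = buf.len();
    loop {
        i -= 1;
        // Remainder carries the sign of the dividend, so take its magnitude;
        // this also handles i32::MIN, whose absolute value overflows i32.
        buf[i] = b'0' + (value % 10).unsigned_abs() as u8;
        value /= 10;
        if value == 0 {
            break;
        }
    }
    if negative {
        i -= 1;
        buf[i] = b'-';
    }
    let n = buf.len() - i;
    out[..n].copy_from_slice(&buf[i..]);
    n
}
```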

Where is backpressure now?

The caller is responsible for sending &out[..n] to the final output (stdout on the desktop, UART TX on the board).

This removes the need for the VM to depend on std::io::Write.

Budgeted execution

❎ Implement a cooperative runner:

  • pub fn run_budget_on(&mut self, out: &mut [u8], budget: u32) -> Result<(bool, usize)>

This runs up to budget steps and returns the accumulated number of bytes written. (You can also keep a thin helper that just loops on step_on if you prefer.)
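One possible shape for this runner, built on top of the buffer-based step_on; stopping early when fewer than 11 bytes of room remain is a design choice that guarantees the next instruction's worst-case output always fits:

```rust
impl Machine {
    /// Run up to `budget` instructions, accumulating their output in `out`.
    /// Returns (exited, total number of bytes written).
    pub fn run_budget_on(&mut self, out: &mut [u8], budget: u32) -> Result<(bool, usize)> {
        let mut written = 0;
        for _ in 0..budget {
            let (exited, n) = self.step_on(&mut out[written..])?;
            written += n;
            // Stop if the VM exited, or if the next instruction's worst-case
            // output (11 bytes) might not fit; the caller just calls again.
            if exited || out.len() - written < 11 {
                return Ok((exited, written));
            }
        }
        Ok((false, written))
    }
}
```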

Step 4: cooperative stepping

Even with buffered output, the VM still does CPU work between outputs.

The simplest way to integrate a CPU-bound algorithm into an async system is:

  • split it into very small units of work (here: a single step_on()),
  • run a fixed number of steps (“budget”),
  • then yield back to the executor.

That is exactly what run_budget_on() gives you.

In the firmware, you will combine it with a small delay or yield_now() so the VM never monopolizes the CPU.

Why make this change in the library?

Because in an embedded environment, you often want to place policy decisions outside of the library:

  • the library executes “pure compute” plus formatted output,
  • the application decides how much CPU time to give it.

This keeps the VM reusable.

Optional: reduce formatting overhead

Your VM uses write!(out, ...) for each character / number.

In embedded environments, formatting can be expensive. Two easy techniques:

  • pick a “budget” small enough that formatting overhead does not block the system,
  • prefer output programs that don’t print too fast.

Later in this section we will hook this output to UART TX.

Depending on the VM crate from tp-led-matrix

Now that the VM library exposes an embeddable building block (a budgeted runner), we will use it from the LED matrix firmware.

Add the dependency

In your LED-matrix project Cargo.toml:

❎ Add a dependency on the VM crate by path.

Important: this must point to your existing VM crate directory, not to a copy. From now on, there should be exactly one shared implementation of the VM library crate in the repository, used both by:

  • the desktop VM CLI/tests,
  • and the embedded tp-led-matrix firmware.

Using a path = "..." dependency is what guarantees you are editing/running the same code in both contexts.

Important naming detail:

  • The package name of the VM project is vm (that is the name = "vm" under [package] in the VM crate’s Cargo.toml). This is what Cargo uses to find the crate on disk / in the workspace.
  • The library crate name is interpreter (that is the name = "interpreter" under [lib]). This is the name you write in Rust code: use interpreter::Machine;.

Those two names are allowed to differ.

In tp-led-matrix, we want the dependency key to match the name we will use in Rust code (interpreter), and we want to be explicit that the underlying package we’re pointing at is the VM package (vm).

Also:

  • your VM crate is std by default, so in embedded you must disable default features (to turn std off).

Your dependency entry should therefore look like:

  • interpreter = { package = "vm", path = "../vm", default-features = false }

(Adjust the relative path if needed.)

Build the embedded firmware with cargo build.

At this stage, you are only compiling the VM library for your embedded target. Nothing is executed yet.

If this fails with errors mentioning std, go back to the previous page and finish the no_std conversion of the VM crate.

Sanity check

In tp-led-matrix/src/main.rs (or wherever your async main lives):

❎ Add a tiny, compile-only check that the type is visible, for instance:

  • use interpreter::Machine;

Do not instantiate it yet, as we will do this in a new async task.

Embedding a VM program with include_bytes!()

We want the firmware to boot and immediately start running a VM program.

Because embedded systems may not have a filesystem, we will embed the bytecode in the firmware at compile time.

Choose a program

Your VM repository ships example programs (such as hello_world.bin or 99bottles.bin). Pick a small one first.

For example: vm/examples/hello_world.bin (relative to the root of your VM project)

Embed the bytes

In your firmware code, define:

  • static PROGRAM: &[u8] = include_bytes!("../../vm/examples/hello_world.bin");

Notes:

  • the path is relative to the Rust source file where the macro is located,
  • include_bytes!() returns &'static [u8; N]; the coercion into &'static [u8] happens automatically.

Acceptance criteria

  • the program bytes are visible as a &'static [u8] constant,
  • the firmware builds.

We will actually execute the VM in the next page.

Printing VM output on the serial port

The VM produces output through the out and out number instructions.

We want this output to go to a serial terminal on the host.

But there is a catch: your firmware currently uses USART1 RX with DMA to receive images.

We must:

  • keep the RX task intact,
  • enable TX,
  • ensure the VM can print without blocking for long periods.

Strategy: a small UART TX task + a channel

The most robust and student-friendly approach is to keep UART TX owned by one dedicated task, and send VM output bytes to that task through a bounded channel.

  • The VM task stays simple: it just calls step_on(&mut out_buf) and forwards any produced bytes.
  • The serial TX task owns UartTx and is the only place that touches the hardware.
  • The channel capacity provides backpressure: if the terminal is slow, the channel fills up and the VM automatically pauses when trying to send more bytes.

This design avoids having to implement a custom Future with Pin/poll just to adapt a UART driver.

What “backpressure” means

A UART is slower than the CPU. Sometimes it cannot accept a new byte right now because its internal hardware buffer is full.

Backpressure is the mechanism that prevents data loss in this situation:

  • when the UART cannot accept more data, the channel send(...).await call does not complete yet,
  • so the VM runner task is paused (it yields to the executor),
  • and it automatically resumes later when the UART has transmitted enough bytes.

This is exactly what we want:

  • no bytes are dropped,
  • VM output stays in-order,
  • other Embassy tasks (display, RX, etc.) can continue running while the VM is waiting.

Important: sharing USART1

On STM32, a single peripheral instance (USART1) cannot be safely split into two independent “owners” (one RX and one TX) unless the HAL explicitly supports it.

There are two valid approaches:

  1. Preferred (single owner): configure USART1 once, then split the driver into RX and TX halves and pass them to the respective tasks.
  2. Fallback (two UARTs): keep the existing RX setup on USART1, and use a different UART peripheral for VM output.

In this lab, we will aim for (1). You will need to inspect the Embassy STM32 UART API for your version.

❎ Modify your serial initialization so that:

  • UART is created once in main,
  • you obtain a TX handle and an RX handle,
  • RX goes to the existing serial_receiver task,
  • TX will be moved into the VM output object (later used by the VM task).

If your current code uses UartRx::new(...), you will likely need to move to a constructor that returns a full Uart and then split it.

How to use split() in practice

The exact constructor name depends on your Embassy version, but the pattern is always the same:

  1. Create a full UART instance once (in main) with both pins and both DMA channels.
  2. Split it into a transmit half and a receive half.
  3. Move each half into the task that owns it.

Typical pattern

  • In main, instead of building a UartRx, build a full Uart (or BufferedUart) instance.
  • Immediately call .split() (or sometimes .split_rx_tx() depending on the API).

Conceptually:

  • let uart = Uart::new(...);
  • let (tx, rx) = uart.split();

Then:

  • spawn serial_receiver(rx, ...) for the LED-matrix image receiver,
  • pass tx to the serial TX task (e.g. vm_serial_tx(tx)),
  • the VM runner task will not touch the UART directly.

Important ownership rule

After calling split(), do not keep using the original uart object: both halves now own the peripheral state they need.

DMA note

If your setup uses DMA for RX (as in the image receiver), you will typically:

  • give the RX half the RX DMA channel,
  • provide a TX DMA channel for the TX half.

The key idea is: instantiate once, split once, then move halves to tasks.

VM output with a channel (backpressure)

The VM core produces per-instruction output bytes into a small buffer via Machine::step_on(&mut [u8]).

So on the firmware side we:

  • run the VM in an async task,
  • and forward the produced bytes to a dedicated UART TX task through a bounded channel.

The channel provides backpressure: when it is full, send().await will suspend the VM runner task until the UART catches up.

❎ Implement VM output using a channel and a TX task:

  1. Create a bounded Channel (for example Channel<..., u8, 256>).
  2. Create a vm_serial_tx task that:
    • owns UartTx,
    • receives bytes from the channel,
    • writes them to UART (tx.write(&[b]).await).
  3. In your VM runner task:
    • create a small output buffer (11 bytes is enough),
    • call let (exited, n) = machine.step_on(&mut out_buf)?;
  • for each byte in out_buf[..n], send().await it to the channel.

No bytes are dropped: if the UART cannot accept more data yet, the VM runner naturally pauses.
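The channel plus TX task can be sketched as follows (hedged: the `UartTx` generic parameters and the mutex flavor depend on your Embassy version, and `VM_OUT` / `vm_serial_tx` are names chosen here for illustration):

```rust
// Hedged sketch: a static bounded channel and a TX task that owns UartTx.
use embassy_stm32::{mode::Async, usart::UartTx};
use embassy_sync::blocking_mutex::raw::ThreadModeRawMutex;
use embassy_sync::channel::Channel;

static VM_OUT: Channel<ThreadModeRawMutex, u8, 256> = Channel::new();

#[embassy_executor::task]
async fn vm_serial_tx(mut tx: UartTx<'static, Async>) {
    loop {
        let b = VM_OUT.receive().await;   // wait for the next VM output byte
        tx.write(&[b]).await.unwrap();    // push it out over UART
    }
}

// In the VM runner task, after each step:
//   let (exited, n) = machine.step_on(&mut out_buf)?;
//   for &b in &out_buf[..n] {
//       VM_OUT.send(b).await;  // suspends the VM when the channel is full
//   }
```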

Acceptance criteria

  • You can open a serial terminal and see VM output.
  • The LED matrix still refreshes correctly.
  • The serial_receiver still works.

Running the VM as an async background task

Now we have all the pieces:

  • a VM library that can run for a bounded instruction budget per call,
  • an embedded VM program included with include_bytes!(),
  • a UART TX half and a bounded channel to provide backpressure.

Let’s write the vm_runner task.

The vm_runner task

❎ Create a new async task vm_runner that:

  1. creates a VM Machine from the included bytes,
  2. creates a small per-instruction output buffer (11 bytes is enough),
  3. runs the machine in a loop using step_on (or run_budget_on) and forwards produced bytes to UART TX through the bounded channel,
  4. yields between budgets so other tasks always get CPU time.

Pseudo-code:

  • create machine
  • loop (“run one VM instance”):
    • loop (“budget”):
      • run one step_on
      • forward out_buf[..n] into the TX channel
      • if exited == true: stop running this VM instance (break out of the program loop)
    • Timer::after_millis(0).await (preferred: always available) or Timer::after_millis(1).await
    • restart the VM (create a fresh machine)

Note: depending on your Embassy version, a yield_now() function may or may not be available (in current versions it lives in embassy_futures). The portable solution is a zero-duration timer.
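The pseudo-code above can be sketched in Rust as follows (hedged: `Machine`, `step_on` and the `.bin` path come from your own VM library and project layout, the channel is the bounded TX channel from the previous section, and `BUDGET` is a tunable constant):

```rust
// Hedged sketch of vm_runner following the pseudo-code above.
use embassy_sync::{blocking_mutex::raw::ThreadModeRawMutex, channel::Channel};
use embassy_time::Timer;

static PROGRAM: &[u8] = include_bytes!("../program.bin"); // adapt the path
const BUDGET: usize = 50;

#[embassy_executor::task]
async fn vm_runner(out: &'static Channel<ThreadModeRawMutex, u8, 256>) {
    loop {
        // Run one VM instance from a fresh machine.
        let mut machine = Machine::new(PROGRAM);
        let mut out_buf = [0u8; 11];
        'program: loop {
            for _ in 0..BUDGET {
                match machine.step_on(&mut out_buf) {
                    Ok((exited, n)) => {
                        for &b in &out_buf[..n] {
                            out.send(b).await; // backpressure happens here
                        }
                        if exited { break 'program; }
                    }
                    Err(_e) => break 'program, // VM error: restart below
                }
            }
            // Portable yield: a zero-duration timer works on all versions.
            Timer::after_millis(0).await;
        }
        // Loop back around: restart with a fresh machine.
    }
}
```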

Choosing a budget

The budget is how you control CPU usage.

Start with something conservative, for example:

  • BUDGET = 50 instructions

Then:

  • if the VM output is too slow: increase budget,
  • if the display glitches: decrease budget.

This is exactly the kind of real-time trade-off you will face in real embedded systems.
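To see concretely what the budget controls, here is a tiny plain-Rust simulation (no Embassy; `MockMachine` and `slices_for` are invented names, not part of the course VM library). The program's total work is cut into budget-sized slices, and the number of slices is the number of times the task yields to the rest of the system:

```rust
// Plain-Rust simulation of budgeted stepping.
struct MockMachine { remaining: u32 }

impl MockMachine {
    /// Execute one instruction; returns true once the program has exited.
    fn step(&mut self) -> bool {
        self.remaining -= 1;
        self.remaining == 0
    }
}

/// Run a program of `total` instructions in `budget`-sized slices and
/// return how many slices (i.e. scheduling opportunities) it took.
fn slices_for(total: u32, budget: u32) -> u32 {
    let mut m = MockMachine { remaining: total };
    let mut slices = 0;
    'program: loop {
        for _ in 0..budget {
            if m.step() {
                slices += 1;
                break 'program;
            }
        }
        slices += 1; // in firmware, the task would yield here
    }
    slices
}

fn main() {
    // With a budget of 50, a 120-instruction program yields twice and
    // exits during the third slice.
    assert_eq!(slices_for(120, 50), 3);
    println!("slices = {}", slices_for(120, 50));
}
```

A bigger budget means fewer yields (faster VM output, more display risk); a smaller budget means the opposite.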

Do not perturb the LED matrix

Your display task likely runs on a dedicated interrupt executor (if you did that bonus part) or at least relies on timely polling.

To keep it stable:

  • never run the VM in a tight loop without yielding,
  • let UART backpressure suspend the VM task (for example by awaiting on a bounded channel send),
  • do not lock shared resources for long.

Error handling

If step_on() returns an error (bad program, invalid opcode, etc.):

  • print the error (on serial),
  • restart the VM.

In an embedded system, a “crash and restart the subsystem” strategy is often completely fine.

Acceptance criteria

  • At boot, the board immediately starts printing VM output.
  • You can still upload images on the LED matrix.
  • There are no visible glitches on the display.

Checklist and possible improvements

Minimal checklist

You are done when all of the following are true:

  • the VM library builds in no_std (with default-features = false),
  • the VM library exposes an async budgeted stepping API (run_budget) usable from async firmware,
  • the LED matrix firmware includes a .bin VM program using include_bytes!(),
  • the firmware runs the VM in a background async task,
  • VM output is visible on a serial terminal (complete, in-order, no dropped bytes), matching the desktop output for the same .bin program,
  • the LED matrix remains responsive and does not glitch,
  • the serial image receiver keeps working.

Improvements

If you want to go further:

1. A dedicated UART TX task (with optional batching)

In practice, the most robust solution is to own UART TX in a single dedicated task and send outgoing bytes to it through a bounded channel (e.g. embassy_sync::channel::Channel).

Why this is the recommended approach:

  • it avoids tricky low-level Future/poll adapter code,
  • it keeps peripheral ownership clear (one task owns UartTx),
  • it still provides backpressure (when the channel is full, the VM waits),
  • it is easy to extend (batching, prefixes, multiple producers, etc.).

If you want higher throughput, the TX task can also batch bytes before calling tx.write(...).
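The batching idea can be sketched in plain Rust (here a `VecDeque` stands in for the embassy-sync channel, and `drain_batch` is an invented name; in firmware the single write would be `tx.write(&batch[..n]).await`):

```rust
use std::collections::VecDeque;

// Plain-Rust sketch of TX batching: drain whatever is already queued, up
// to the batch buffer size, then issue a single write for the whole batch.
fn drain_batch(queue: &mut VecDeque<u8>, batch: &mut [u8]) -> usize {
    let mut n = 0;
    while n < batch.len() {
        match queue.pop_front() {
            Some(b) => { batch[n] = b; n += 1; }
            None => break, // nothing else pending: flush what we have
        }
    }
    n
}

fn main() {
    let mut queue: VecDeque<u8> = (b'a'..=b'j').collect(); // 10 bytes
    let mut batch = [0u8; 4];
    let mut writes = 0;
    while !queue.is_empty() {
        let n = drain_batch(&mut queue, &mut batch);
        writes += 1; // one UART write per batch instead of one per byte
        assert!(n > 0 && n <= batch.len());
    }
    assert_eq!(writes, 3); // 10 bytes in batches of 4 -> 3 writes, not 10
    println!("writes = {writes}");
}
```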

2. Smarter yielding inside the VM

Instead of yielding after a fixed budget:

  • yield after each out instruction,
  • or yield when a certain amount of output was produced.

3. Run multiple VM instances

Spawn multiple vm_runner tasks with different programs and prefixes, for example:

  • [vm1] ...
  • [vm2] ...

4. “VM controls the image”

A fun integration project:

  • define a memory-mapped “framebuffer” region in VM memory,
  • have a task periodically read it and update the LED matrix image.

This makes your interpreted program drive the hardware.
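One way to start is a simple snapshot function; the sketch below is plain Rust with an invented layout (the offset, image size and RGB convention are illustrative — pick your own memory-map convention for the real project):

```rust
// Plain-Rust sketch of the memory-mapped framebuffer idea.
const FB_OFFSET: usize = 0x0F00;   // hypothetical start of the framebuffer
const FB_SIZE: usize = 8 * 8 * 3;  // 8x8 pixels, 3 bytes (RGB) per pixel

/// Snapshot the framebuffer region of VM memory into an image buffer that
/// a display task can then push to the LED matrix.
fn read_framebuffer(vm_memory: &[u8], image: &mut [u8; FB_SIZE]) {
    image.copy_from_slice(&vm_memory[FB_OFFSET..FB_OFFSET + FB_SIZE]);
}

fn main() {
    let mut vm_memory = vec![0u8; 0x1000];
    vm_memory[FB_OFFSET] = 0xFF; // the VM program "draws" a red component
    let mut image = [0u8; FB_SIZE];
    read_framebuffer(&vm_memory, &mut image);
    assert_eq!(image[0], 0xFF);
    println!("first pixel byte = {:#x}", image[0]);
}
```

A periodic task would call such a snapshot on a timer and hand the image to the display, so the interpreted program never touches the hardware directly.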