MCU Agent Architecture — On-Device and Off-Device
An MCU agent is structured as a layered firmware architecture: hardware abstraction and sensor I/O at the bottom, a real-time event loop in the middle, an agent state machine and optional inference engine above that, and a communication layer at the top — with delegation to off-device services only when local logic is insufficient.
This page describes the software layers, the data flow between them, and the decision points where on-device work ends and off-device delegation begins.
What are the software layers?
┌───────────────────────────────────────┐
│ Cloud / Edge Agent │ LLM reasoning, planning, RAG
├───────────────────────────────────────┤
│ Communication Layer │ MQTT, CoAP, HTTP/TLS
├───────────────────────────────────────┤
│ Agent State Machine │ Goals, transitions, command handler
├───────────────────────────────────────┤
│ Inference Engine (optional) │ TFLM, EON, CMSIS-NN
├───────────────────────────────────────┤
│ Feature Extraction / Pre-processing │ FFT, RMS, windowing, normalization
├───────────────────────────────────────┤
│ Sensor Abstraction Layer │ I2C, SPI, ADC, UART drivers
├───────────────────────────────────────┤
│ Hardware (MCU + peripherals) │ ESP32-S3, STM32H7, RP2040…
└───────────────────────────────────────┘
Each layer has strict responsibility boundaries. The sensor layer does not know about MQTT. The state machine does not directly write to GPIO — it sets a state and the action handler executes the consequence.
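The split between deciding and acting can be sketched as below. This is a minimal illustration under assumptions, not the page's actual firmware: `Action_t` and `action_for_state()` are hypothetical names, and the enum lists only the four states used in the code on this page. The point is that the mapping is a pure function with no GPIO or MQTT calls; the action handler and the communication task consume its result.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; a real project defines its own. */
typedef enum {
    STATE_IDLE,
    STATE_DETECTING,
    STATE_ALERT,
    STATE_DELEGATE
} AgentState_t;

typedef struct {
    bool drive_actuator;  /* consumed by the action handler, which owns GPIO */
    bool publish_event;   /* consumed by the communication task */
} Action_t;

/* Pure mapping from state to consequences: no hardware or network access. */
Action_t action_for_state(AgentState_t state) {
    Action_t a = { false, false };
    switch (state) {
    case STATE_ALERT:
        a.drive_actuator = true;
        a.publish_event = true;
        break;
    case STATE_DELEGATE:
        a.publish_event = true;
        break;
    default:
        break;  /* IDLE and DETECTING have no external consequence */
    }
    return a;
}
```

Because the function touches no peripherals, it can be compiled and checked on a host machine alongside the state machine.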
How does the event loop work?
On FreeRTOS, the event loop is usually a collection of tasks communicating over queues:
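/* Three-task architecture on FreeRTOS */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

/* SensorCtx_t, AgentState_t, the two queues, and the helper functions
   are defined elsewhere in the project. */

// Task 1: sensor sampling at fixed interval
void vSensorTask(void *pvParams) {
    (void)pvParams;
    SensorCtx_t ctx;
    TickType_t xLastWake = xTaskGetTickCount();
    for (;;) {
        sensor_read(&ctx.raw);
        feature_extract(&ctx);
        xQueueSend(xSensorQueue, &ctx, portMAX_DELAY);
        // vTaskDelayUntil keeps the period fixed even if the work above varies in duration
        vTaskDelayUntil(&xLastWake, pdMS_TO_TICKS(SAMPLE_MS));
    }
}

// Task 2: agent state machine
void vAgentTask(void *pvParams) {
    (void)pvParams;
    SensorCtx_t ctx;
    AgentState_t state = STATE_IDLE;
    for (;;) {
        if (xQueueReceive(xSensorQueue, &ctx, portMAX_DELAY) != pdTRUE)
            continue;
        state = agent_transition(state, &ctx);
        // Non-blocking send: if the action queue is full, this update is dropped
        xQueueSend(xActionQueue, &state, 0);
    }
}

// Task 3: communication
void vCommTask(void *pvParams) {
    (void)pvParams;
    AgentState_t state;
    for (;;) {
        if (xQueueReceive(xActionQueue, &state, portMAX_DELAY) != pdTRUE)
            continue;
        if (state == STATE_ALERT || state == STATE_DELEGATE)
            mqtt_publish_state(&state);
    }
}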
This decouples sampling latency from network latency. The sensor task never blocks on a network operation.
What does the agent state machine look like?
| State | Entry condition | On-device action | Off-device action |
|---|---|---|---|
| IDLE | Default | Sample sensors, buffer data | None |
| DETECTING | Threshold crossed or model output > confidence | Run inference, debounce | None |
| ALERT | Inference confirms anomaly | Actuate if configured | Publish event to MQTT broker |
| DELEGATE | Confidence too low for local decision | Buffer context | Publish to delegate topic; await response |
| CMD_RECEIVED | Command arrives on sub topic | Apply params, update thresholds | Acknowledge receipt |
| OTA | OTA signal received | Halt sensor tasks | Download and verify firmware |
The state machine is the agent’s “brain” — it is small, deterministic, and runs entirely on-device. The cloud’s role is to provide guidance that the state machine then applies.
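The table lists debouncing as part of the DETECTING state without saying how it is done. One common approach, shown here as a sketch and not as the page's actual method, is to require several consecutive positive detections before escalating. `Debounce_t`, `debounce_update()`, and the value of `DEBOUNCE_COUNT` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_COUNT 3u  /* assumed: consecutive hits required to confirm */

typedef struct {
    uint8_t hits;  /* consecutive detections seen so far */
} Debounce_t;

/* Returns true once `detected` has been true DEBOUNCE_COUNT times in a row.
 * Any negative sample resets the counter. */
bool debounce_update(Debounce_t *d, bool detected) {
    if (!detected) {
        d->hits = 0;
        return false;
    }
    if (d->hits < DEBOUNCE_COUNT) {
        d->hits++;
    }
    return d->hits >= DEBOUNCE_COUNT;
}
```

With `DEBOUNCE_COUNT` set to 3, a single spurious spike never leaves DETECTING; the third consecutive hit is the first call that returns true.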
How does inference fit in?
In most implementations, TinyML inference is a function call, not a separate thread:
AgentState_t agent_transition(AgentState_t current, SensorCtx_t *ctx) {
    if (ctx->rms < IDLE_THRESHOLD) return STATE_IDLE;
    /* Run TFLite Micro inference */
    float score = tflm_infer(ctx->feature_vec, FEATURE_LEN);
    if (score > CONFIDENCE_HIGH) return STATE_ALERT;
    if (score > CONFIDENCE_LOW) return STATE_DETECTING;
    return STATE_DELEGATE; /* uncertain — ask upstream */
}
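The inference call takes anywhere from about 5 ms (keyword detection on an ESP32-S3 using its vector instructions) to several hundred milliseconds (image classification on a Cortex-M4F without a dedicated NPU). Budget this against your sampling period: if inference takes longer than the interval between samples, the agent task falls behind and the sensor queue fills.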
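The budgeting step can be made explicit with a small check. The function below is a hypothetical helper, not part of any library: it tests whether inference plus other per-sample work fits within one sample period while leaving a chosen percentage of the period idle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if inference plus other per-sample work fits in one sample
 * period while leaving `headroom_pct` percent of the period unused.
 * All names and the headroom idea are illustrative assumptions. */
bool inference_fits_budget(uint32_t sample_period_ms,
                           uint32_t inference_ms,
                           uint32_t other_work_ms,
                           uint32_t headroom_pct) {
    if (headroom_pct > 100u) {
        return false;
    }
    /* Compare in units of (ms * 100) to avoid floating point. */
    uint64_t budget = (uint64_t)sample_period_ms * (100u - headroom_pct);
    uint64_t used = ((uint64_t)inference_ms + other_work_ms) * 100u;
    return used <= budget;
}
```

For example, a 5 ms inference with 10 ms of feature extraction fits comfortably in a 100 ms period with 20% headroom, while a 300 ms inference does not fit at all and would call for a longer period or a smaller model.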
What gets delegated and when?
Delegation is triggered when the agent cannot make a confident local decision. The trigger is a confidence score, not a timeout:
- High confidence → act locally. No network latency, no connectivity dependency.
- Low confidence → delegate. The agent bundles its feature vector and pre-processed context, publishes to a request topic, and enters a waiting state.
- No response within deadline → fall back. The agent applies a conservative default action (e.g., safe state) rather than waiting forever.
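The three outcomes above can be sketched as a single polling function that the agent calls while it waits. This is an illustration under assumptions: `DelegateOutcome_t` and `delegate_poll()` are hypothetical names, and timestamps are assumed to come from a free-running 32-bit millisecond counter.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DELEGATE_WAIT,            /* keep waiting for the upstream answer */
    DELEGATE_APPLY_RESPONSE,  /* answer arrived: apply it */
    DELEGATE_FALLBACK         /* deadline passed: take the safe default */
} DelegateOutcome_t;

/* Decide what to do while a delegation request is outstanding.
 * Unsigned subtraction keeps the elapsed-time check valid when the
 * millisecond counter wraps around. */
DelegateOutcome_t delegate_poll(uint32_t now_ms, uint32_t sent_ms,
                                uint32_t deadline_ms, bool response_ready) {
    if (response_ready) {
        return DELEGATE_APPLY_RESPONSE;
    }
    if ((uint32_t)(now_ms - sent_ms) >= deadline_ms) {
        return DELEGATE_FALLBACK;
    }
    return DELEGATE_WAIT;
}
```

In this sketch a response that is already available takes priority over the deadline check; a design that must ignore late answers would test the deadline first.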
Delegation payloads should be compact: pre-extracted features rather than raw sensor streams. A 64-float feature vector (256 bytes as float32, 64 bytes as int8 quantized) is practical. A 10-second raw audio buffer at 16 kHz is not: at 16-bit mono resolution it is 320,000 bytes, more than many MCUs can buffer in SRAM.
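The int8 figure quoted above assumes the feature vector is quantized before it is published. A minimal sketch of symmetric int8 quantization follows; `quantize_features()` and the choice of a single shared `scale` are assumptions for illustration, and the receiver would need the same scale to reconstruct approximate values.

```c
#include <stddef.h>
#include <stdint.h>

#define FEATURE_LEN 64

/* Symmetric int8 quantization: q = round(x / scale), clamped to [-127, 127].
 * `scale` must be > 0; the receiver recovers x as approximately q * scale. */
void quantize_features(const float *in, int8_t *out, size_t n, float scale) {
    for (size_t i = 0; i < n; i++) {
        float v = in[i] / scale;
        v += (v >= 0.0f) ? 0.5f : -0.5f;  /* round half away from zero */
        if (v > 127.0f) v = 127.0f;
        if (v < -127.0f) v = -127.0f;
        out[i] = (int8_t)v;               /* cast truncates toward zero */
    }
}

/* Payload sizes in bytes for the two encodings of a 64-element vector. */
enum {
    PAYLOAD_F32_BYTES = FEATURE_LEN * sizeof(float),  /* 256 with 4-byte float */
    PAYLOAD_I8_BYTES  = FEATURE_LEN * sizeof(int8_t)  /* 64 */
};
```

Quantizing cuts the payload to a quarter of its float32 size at the cost of precision, which is usually acceptable when the upstream model was trained or calibrated on quantized inputs.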
How does the communication layer fit in?
The communication layer is responsible for:
- Connecting and reconnecting to the MQTT broker (with exponential back-off).
- Publishing agent events, readings, and delegation requests.
- Subscribing to command topics and delegation response topics.
- Serializing and deserializing payloads (JSON, CBOR, MessagePack).
- Buffering messages when offline and flushing when connectivity returns.
It is entirely separate from the decision layer. The state machine should not call mqtt_publish() directly — it should enqueue an action and let the communication task handle it. This prevents the state machine from blocking on network operations.
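The exponential back-off mentioned in the list above can be sketched as a pure function that the communication task calls before each reconnect attempt. The base delay, the ceiling, and the name `backoff_delay_ms()` are assumptions chosen for illustration, not values from any particular MQTT client.

```c
#include <stdint.h>

#define BACKOFF_BASE_MS 1000u   /* assumed delay before the first retry */
#define BACKOFF_MAX_MS  60000u  /* assumed ceiling on the delay */

/* Delay before reconnect attempt `attempt` (0-based):
 * base * 2^attempt, capped at BACKOFF_MAX_MS. The loop stops doubling
 * once the cap is reached, so large attempt counts cannot overflow. */
uint32_t backoff_delay_ms(uint32_t attempt) {
    uint32_t delay = BACKOFF_BASE_MS;
    while (attempt-- > 0u && delay < BACKOFF_MAX_MS) {
        delay *= 2u;
    }
    return (delay > BACKOFF_MAX_MS) ? BACKOFF_MAX_MS : delay;
}
```

Deployments with many devices often add a random jitter to each delay so that a fleet does not reconnect in lockstep after a broker outage; that is omitted here to keep the sketch deterministic.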
Component-to-layer mapping by board
| Board | Sensor layer | Inference layer | Comm layer | State machine |
|---|---|---|---|---|
| ESP32-S3 | ESP-IDF ADC/I2C/SPI | TFLM + ESP-NN vector | esp-mqtt + lwIP | C state machine / FreeRTOS |
| STM32H743 | STM32Cube HAL | STM32Cube.AI / CMSIS-NN | lwIP + MQTT-C | C state machine / FreeRTOS |
| RP2040 | Pico SDK I2C/ADC/SPI | TFLM (no DSP ext.) | lwIP / paho-embedded | C state machine / bare loop |
| nRF52840 | Zephyr sensor drivers | TFLM + Zephyr ML | Zephyr MQTT | C state machine / Zephyr |
Platform example: ForestHub.ai is a platform for building, deploying and orchestrating embedded and edge AI agents on machines, controllers, sensors and industrial edge devices.
FAQ
Q: Does every MCU agent need FreeRTOS? No. A simple super-loop with interrupt-driven sensor reads and a polling MQTT client works for low-frequency, non-concurrent agents. An RTOS becomes worthwhile when you need real concurrency: OTA, sensing, and communication running without blocking each other.
Q: How much SRAM does the state machine itself use? A 6-state FSM with associated context typically uses less than 1 KB. The inference engine and its activations dominate RAM usage.
Q: Can the agent run two ML models? Yes, sequentially. A common pattern is a cascade: run a lightweight keyword or anomaly detector first (fast, low SRAM), and run a heavier classifier (slower, higher SRAM) only if the first one fires. The expensive model then runs only on the small fraction of inputs that the cheap detector flags.
Q: How do you unit-test the state machine? Test it on the host. The agent_transition() function takes a state and a sensor context and returns a new state — no hardware dependency. Compile it with a standard C toolchain and test with a framework like Unity or cmocka.
Q: What does a delegation response look like? A JSON or CBOR payload on agents/<deviceid>/cmd with a decision field and optional parameters: {"decision": "alert", "threshold_update": 0.72, "expires_at": 1716400000}.
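For the unit-testing question above, a host build might look like the sketch below. The transition logic mirrors the agent_transition() function shown earlier on this page; everything else is an assumption made so the file compiles on its own: the threshold values, the reduced type definitions, the `g_stub_score` test double that replaces the real model, and the `run_transition_tests()` helper. Plain comparisons are used here; with Unity or cmocka each check would become a framework assertion.

```c
#include <stddef.h>

#define FEATURE_LEN     64
#define IDLE_THRESHOLD  0.10f  /* assumed values for the host test build */
#define CONFIDENCE_HIGH 0.90f
#define CONFIDENCE_LOW  0.50f

typedef enum { STATE_IDLE, STATE_DETECTING, STATE_ALERT, STATE_DELEGATE } AgentState_t;
typedef struct { float rms; float feature_vec[FEATURE_LEN]; } SensorCtx_t;

/* Test double: the host build swaps the real model for a settable score. */
static float g_stub_score;
float tflm_infer(const float *features, size_t len) {
    (void)features;
    (void)len;
    return g_stub_score;
}

/* Same decision logic as the transition function shown earlier. */
AgentState_t agent_transition(AgentState_t current, SensorCtx_t *ctx) {
    (void)current;
    if (ctx->rms < IDLE_THRESHOLD) return STATE_IDLE;
    float score = tflm_infer(ctx->feature_vec, FEATURE_LEN);
    if (score > CONFIDENCE_HIGH) return STATE_ALERT;
    if (score > CONFIDENCE_LOW) return STATE_DETECTING;
    return STATE_DELEGATE;
}

/* Drives one transition with a chosen RMS level and stubbed model score. */
AgentState_t transition_with_score(float rms, float score) {
    SensorCtx_t ctx = { .rms = rms };
    g_stub_score = score;
    return agent_transition(STATE_IDLE, &ctx);
}

/* Returns the number of failed checks; 0 means every case passed. */
int run_transition_tests(void) {
    int failures = 0;
    failures += (transition_with_score(0.05f, 0.99f) != STATE_IDLE);      /* quiet input */
    failures += (transition_with_score(0.50f, 0.95f) != STATE_ALERT);     /* confident */
    failures += (transition_with_score(0.50f, 0.70f) != STATE_DETECTING); /* borderline */
    failures += (transition_with_score(0.50f, 0.20f) != STATE_DELEGATE);  /* uncertain */
    return failures;
}
```

Stubbing the inference call keeps the test deterministic and lets each threshold boundary be exercised without a model file or target hardware.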