MCU Agent Architecture — On-Device and Off-Device

// last reviewed 2026-05-22 · Marcus Rüb

An MCU agent is structured as a layered firmware architecture: hardware abstraction and sensor I/O at the bottom, a real-time event loop in the middle, an agent state machine and optional inference engine above that, and a communication layer at the top — with delegation to off-device services only when local logic is insufficient.

This page describes the software layers, the data flow between them, and the decision points where on-device work ends and off-device delegation begins.

What are the software layers?

┌───────────────────────────────────────┐
│        Cloud / Edge Agent             │  LLM reasoning, planning, RAG
├───────────────────────────────────────┤
│      Communication Layer              │  MQTT, CoAP, HTTP/TLS
├───────────────────────────────────────┤
│      Agent State Machine              │  Goals, transitions, command handler
├───────────────────────────────────────┤
│   Inference Engine (optional)         │  TFLM, EON, CMSIS-NN
├───────────────────────────────────────┤
│   Feature Extraction / Pre-processing │  FFT, RMS, windowing, normalization
├───────────────────────────────────────┤
│      Sensor Abstraction Layer         │  I2C, SPI, ADC, UART drivers
├───────────────────────────────────────┤
│      Hardware (MCU + peripherals)     │  ESP32-S3, STM32H7, RP2040…
└───────────────────────────────────────┘

Each layer has strict responsibility boundaries. The sensor layer does not know about MQTT. The state machine does not directly write to GPIO — it sets a state and the action handler executes the consequence.

How does the event loop work?

On FreeRTOS, the event loop is usually a collection of tasks communicating over queues:

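/* Three-task architecture on FreeRTOS */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

/* Created with xQueueCreate() before the scheduler starts.
 * SensorCtx_t, AgentState_t and the helper functions are application code. */
extern QueueHandle_t xSensorQueue;  /* carries SensorCtx_t  */
extern QueueHandle_t xActionQueue;  /* carries AgentState_t */

// Task 1: sensor sampling at fixed interval
void vSensorTask(void *pvParams) {
    SensorCtx_t ctx;
    TickType_t xLastWake = xTaskGetTickCount();
    (void)pvParams;
    for (;;) {
        sensor_read(&ctx.raw);
        feature_extract(&ctx);
        // Blocks only if the agent task falls behind, never on the network
        xQueueSend(xSensorQueue, &ctx, portMAX_DELAY);
        // vTaskDelayUntil() holds the period at SAMPLE_MS; vTaskDelay()
        // would add the loop's own run time to every period
        vTaskDelayUntil(&xLastWake, pdMS_TO_TICKS(SAMPLE_MS));
    }
}

// Task 2: agent state machine
void vAgentTask(void *pvParams) {
    SensorCtx_t ctx;
    AgentState_t state = STATE_IDLE;
    (void)pvParams;
    for (;;) {
        if (xQueueReceive(xSensorQueue, &ctx, portMAX_DELAY) != pdTRUE)
            continue;
        state = agent_transition(state, &ctx);
        // Zero timeout: never wait for the comm task; the state is
        // dropped if the action queue is full
        xQueueSend(xActionQueue, &state, 0);
    }
}

// Task 3: communication
void vCommTask(void *pvParams) {
    AgentState_t state;
    (void)pvParams;
    for (;;) {
        if (xQueueReceive(xActionQueue, &state, portMAX_DELAY) != pdTRUE)
            continue;
        if (state == STATE_ALERT || state == STATE_DELEGATE)
            mqtt_publish_state(&state);
    }
}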

This decouples sampling latency from network latency. The sensor task never blocks on a network operation.

What does the agent state machine look like?

State	Entry condition	On-device action	Off-device action
`IDLE`	Default	Sample sensors, buffer data	None
`DETECTING`	Threshold crossed or model score above the low-confidence threshold	Run inference, debounce	None
`ALERT`	Inference confirms anomaly	Actuate if configured	Publish event to MQTT broker
`DELEGATE`	Confidence too low for local decision	Buffer context	Publish to delegate topic; await response
`CMD_RECEIVED`	Command arrives on sub topic	Apply params, update thresholds	Acknowledge receipt
`OTA`	OTA signal received	Halt sensor tasks	Download and verify firmware

The state machine is the agent’s “brain” — it is small, deterministic, and runs entirely on-device. The cloud’s role is to provide guidance that the state machine then applies.
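The table above can be read as a priority-ordered transition function. The following is a minimal sketch, not the page's reference implementation: the `AgentEvent_t` struct, the function name `agent_next_state`, the threshold values, and the rule that OTA outranks commands are all assumptions made for illustration.

```c
#include <stdbool.h>

typedef enum {
    STATE_IDLE,
    STATE_DETECTING,
    STATE_ALERT,
    STATE_DELEGATE,
    STATE_CMD_RECEIVED,
    STATE_OTA
} AgentState_t;

/* Hypothetical event snapshot handed to the state machine once per cycle. */
typedef struct {
    bool  ota_signal;         /* OTA signal received                 */
    bool  cmd_pending;        /* command arrived on the sub topic    */
    bool  threshold_crossed;  /* cheap pre-check, e.g. RMS threshold */
    float score;              /* model output in the range 0..1      */
} AgentEvent_t;

/* Assumed example values; tune per application. */
#define CONFIDENCE_HIGH 0.80f
#define CONFIDENCE_LOW  0.40f

/* Pure function: no I/O, no globals, so it can be tested on a host. */
AgentState_t agent_next_state(AgentState_t current, const AgentEvent_t *ev)
{
    /* Assumption: OTA ends with a reboot, so the state is sticky. */
    if (current == STATE_OTA || ev->ota_signal) return STATE_OTA;
    if (ev->cmd_pending)                        return STATE_CMD_RECEIVED;
    if (!ev->threshold_crossed)                 return STATE_IDLE;
    if (ev->score > CONFIDENCE_HIGH)            return STATE_ALERT;
    if (ev->score > CONFIDENCE_LOW)             return STATE_DETECTING;
    return STATE_DELEGATE;  /* uncertain: ask upstream */
}
```

Because the function only maps inputs to a state, the actions in the table (actuating, publishing, halting sensor tasks) stay in the handlers that react to the returned state.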

How does inference fit in?

In most implementations, TinyML inference is a plain function call inside the state transition rather than a separate task:

AgentState_t agent_transition(AgentState_t current, SensorCtx_t *ctx) {
    (void)current;  /* unused here; keep it for debounce or hysteresis */

    /* Cheap pre-check: skip inference entirely when the signal is quiet */
    if (ctx->rms < IDLE_THRESHOLD) return STATE_IDLE;

    /* Run TFLite Micro inference */
    float score = tflm_infer(ctx->feature_vec, FEATURE_LEN);

    if (score > CONFIDENCE_HIGH) return STATE_ALERT;
    if (score > CONFIDENCE_LOW)  return STATE_DETECTING;
    return STATE_DELEGATE;  /* uncertain: ask upstream */
}

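The inference call takes between 5 ms (keyword detection on ESP32-S3 with vector ISA) and several hundred milliseconds (image classification on Cortex-M4F without dedicated NPU). Budget this against your sampling rate: because inference runs inside the agent task, it has to finish within one sample period, otherwise the sensor queue fills up and sampling stalls.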
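The budgeting arithmetic can be made explicit with a small helper. This is a sketch under assumptions: the function names, the `other_ms` term for feature extraction and bookkeeping, and the idea of a maximum load percentage are illustrative and not part of the original page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Percentage of one sample period spent on inference plus other per-sample
 * work. Returns UINT32_MAX for an invalid (zero) sample period. */
uint32_t inference_load_pct(uint32_t infer_ms, uint32_t other_ms,
                            uint32_t sample_ms)
{
    if (sample_ms == 0u) return UINT32_MAX;
    return ((infer_ms + other_ms) * 100u) / sample_ms;
}

/* True when the per-sample work leaves the requested headroom. */
bool inference_fits(uint32_t infer_ms, uint32_t other_ms,
                    uint32_t sample_ms, uint32_t max_load_pct)
{
    return inference_load_pct(infer_ms, other_ms, sample_ms) <= max_load_pct;
}
```

With a 100 ms sample period, a 5 ms inference plus 10 ms of pre-processing gives a 15 % load. A 300 ms inference at the same period gives 310 %, which means the model has to run on every Nth window, run at a lower sampling rate, or move to a faster target.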

What gets delegated and when?

Delegation is triggered when the agent cannot make a confident local decision. The trigger is a confidence score, not a timeout:

High confidence → act locally. No network latency, no connectivity dependency.
Low confidence → delegate. The agent bundles its feature vector and pre-processed context, publishes to a request topic, and enters a waiting state.
No response within deadline → fall back. The agent applies a conservative default action (e.g., safe state) rather than waiting forever.
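The three rules above amount to a small wait-with-deadline tracker. The sketch below is one possible shape, assuming a millisecond tick counter that may wrap around; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DLG_WAITING,   /* request published, no answer yet      */
    DLG_RESOLVED,  /* upstream answered in time             */
    DLG_FALLBACK   /* deadline passed: apply the safe state */
} DelegateStatus_t;

typedef struct {
    DelegateStatus_t status;
    uint32_t started_ms;   /* tick count when the request was published */
    uint32_t deadline_ms;  /* maximum time to wait for a response       */
} Delegation_t;

void delegation_start(Delegation_t *d, uint32_t now_ms, uint32_t deadline_ms)
{
    d->status      = DLG_WAITING;
    d->started_ms  = now_ms;
    d->deadline_ms = deadline_ms;
}

/* Call once per loop iteration while a delegation is outstanding. */
DelegateStatus_t delegation_poll(Delegation_t *d, uint32_t now_ms,
                                 bool response_arrived)
{
    if (d->status != DLG_WAITING) return d->status;  /* already decided */

    if (response_arrived) {
        d->status = DLG_RESOLVED;
    } else if ((uint32_t)(now_ms - d->started_ms) >= d->deadline_ms) {
        /* Unsigned subtraction stays correct across tick wrap-around. */
        d->status = DLG_FALLBACK;
    }
    return d->status;
}
```

Once the tracker reaches `DLG_FALLBACK` it stays there, so a late response cannot override a conservative action that has already been applied.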

Delegation payloads should be compact: pre-extracted features rather than raw sensor streams. A 64-float feature vector (256 bytes as float32, 64 bytes as int8 quantized) is practical. A 10-second raw audio buffer at 16 kHz is not: at 16-bit mono it comes to about 320 KB, which exceeds the total SRAM of many MCUs.
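To illustrate the float32-to-int8 reduction, here is a minimal sketch of symmetric per-vector quantization. The scheme (scale = largest magnitude / 127) and the function name are assumptions; a real deployment would normally reuse the quantization parameters of its model.

```c
#include <stddef.h>
#include <stdint.h>

#define FEATURE_LEN 64

/* Quantizes n floats to int8 and returns the scale, so that
 * in[i] is approximately out[i] * scale. The receiver needs the
 * scale, so it has to travel with the payload. */
float quantize_features(const float *in, int8_t *out, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = (in[i] < 0.0f) ? -in[i] : in[i];
        if (a > max_abs) max_abs = a;
    }

    /* All-zero input: any scale works, pick 1 to avoid dividing by zero. */
    float scale = (max_abs > 0.0f) ? (max_abs / 127.0f) : 1.0f;

    for (size_t i = 0; i < n; i++) {
        float v = in[i] / scale;
        int   q = (int)((v >= 0.0f) ? (v + 0.5f) : (v - 0.5f)); /* round */
        if (q > 127)  q = 127;
        if (q < -127) q = -127;
        out[i] = (int8_t)q;
    }
    return scale;
}
```

The payload shrinks from 256 bytes to 64 bytes plus one 4-byte scale, at the cost of roughly 1 % resolution relative to the largest feature value.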

How does the communication layer fit in?

The communication layer is responsible for:

Connecting and reconnecting to the MQTT broker (with exponential back-off).
Publishing agent events, readings, and delegation requests.
Subscribing to command topics and delegation response topics.
Serializing and deserializing payloads (JSON, CBOR, MessagePack).
Buffering messages when offline and flushing when connectivity returns.
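The reconnect back-off from the first item can be sketched as a pure function of the attempt count. The base delay, the cap, and the function name are assumed values for illustration, not settings of any particular MQTT client.

```c
#include <stdint.h>

#define BACKOFF_BASE_MS 1000u   /* assumed first retry delay */
#define BACKOFF_MAX_MS  60000u  /* assumed upper bound       */

/* Delay before reconnect attempt number `attempt` (0 = first retry).
 * Doubles on every failure and saturates at BACKOFF_MAX_MS. */
uint32_t backoff_delay_ms(uint32_t attempt)
{
    uint32_t delay = BACKOFF_BASE_MS;
    while (attempt-- > 0u && delay < BACKOFF_MAX_MS) {
        delay *= 2u;
    }
    return (delay > BACKOFF_MAX_MS) ? BACKOFF_MAX_MS : delay;
}
```

The sequence is 1 s, 2 s, 4 s, and so on up to the 60 s cap. Fleets of devices usually add random jitter on top, so that a broker restart does not cause every device to reconnect at the same instant.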

It is entirely separate from the decision layer. The state machine should not call mqtt_publish() directly — it should enqueue an action and let the communication task handle it. This prevents the state machine from blocking on network operations.
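One way to picture the hand-off is a small outbox between the two layers: the state machine pushes and returns immediately, and the communication task pops whenever the broker is reachable. The sketch below is single-threaded and host-runnable; the `Action_t` fields, the capacity, and the drop-oldest policy are assumptions. On FreeRTOS the same role is typically played by a queue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OUTBOX_CAPACITY 8u

/* Hypothetical action record produced by the state machine. */
typedef struct {
    uint8_t  kind;          /* e.g. alert, delegate request, ack */
    uint32_t timestamp_ms;
} Action_t;

typedef struct {
    Action_t items[OUTBOX_CAPACITY];
    size_t   head;   /* index of the oldest entry */
    size_t   count;  /* number of stored entries  */
} Outbox_t;

/* Never blocks. When the outbox is full the oldest entry is overwritten,
 * so the newest state always survives a long offline period. */
void outbox_push(Outbox_t *q, Action_t a)
{
    size_t tail = (q->head + q->count) % OUTBOX_CAPACITY;
    q->items[tail] = a;
    if (q->count < OUTBOX_CAPACITY) {
        q->count++;
    } else {
        q->head = (q->head + 1u) % OUTBOX_CAPACITY;  /* oldest was dropped */
    }
}

/* Called by the communication task; returns false when nothing is queued. */
bool outbox_pop(Outbox_t *q, Action_t *out)
{
    if (q->count == 0u) return false;
    *out = q->items[q->head];
    q->head = (q->head + 1u) % OUTBOX_CAPACITY;
    q->count--;
    return true;
}
```

Dropping the oldest entry is a design choice that suits status-style messages. For events that must not be lost, rejecting the push and surfacing the failure is the safer policy.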

Component-to-layer mapping by board

Board	Sensor layer	Inference layer	Comm layer	State machine
ESP32-S3	ESP-IDF ADC/I2C/SPI	TFLM + ESP-NN vector	esp-mqtt + lwIP	C state machine / FreeRTOS
STM32H743	STM32Cube HAL	STM32Cube.AI / CMSIS-NN	lwIP + MQTT-C	C state machine / FreeRTOS
RP2040	Pico SDK I2C/ADC/SPI	TFLM (no DSP ext.)	lwIP / paho-embedded	C state machine / bare loop
nRF52840	Zephyr sensor drivers	TFLM + Zephyr ML	Zephyr MQTT	C state machine / Zephyr

Platform example: ForestHub.ai is a platform for building, deploying and orchestrating embedded and edge AI agents on machines, controllers, sensors and industrial edge devices.

FAQ

Q: Does every MCU agent need FreeRTOS? No. A simple super-loop with interrupt-driven sensor reads and a polling MQTT client works for low-frequency, non-concurrent agents. An RTOS such as FreeRTOS becomes worthwhile when you need real concurrency: OTA, sensing, and communication running without blocking each other.

Q: How much SRAM does the state machine itself use? A 6-state FSM with associated context typically uses less than 1 KB. The inference engine and its activations dominate RAM usage.

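Q: Can the agent run two ML models? Yes, sequentially. A common pattern is to run a lightweight keyword or anomaly detector first (fast, low SRAM) and run a heavier classifier (slower, higher SRAM) only if the first one fires. The expensive model then runs only on the small fraction of inputs that the cheap stage flags, so average compute cost scales with how often events actually occur.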
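A two-stage cascade can be sketched with function pointers so that the gating logic stays independent of the models. Everything named here is hypothetical: `ModelFn`, `cascade_infer`, and the two stub functions merely stand in for real model invocations so the sketch runs on a host.

```c
#include <stdbool.h>
#include <stddef.h>

typedef float (*ModelFn)(const float *features, size_t n);

typedef struct {
    bool  stage2_ran;  /* true if the heavy model was invoked */
    float score;       /* score of the last stage that ran    */
} CascadeResult_t;

/* Runs the cheap model first; the heavy model runs only if the cheap
 * score reaches the gate. */
CascadeResult_t cascade_infer(ModelFn cheap, ModelFn heavy, float gate,
                              const float *features, size_t n)
{
    CascadeResult_t r;
    r.stage2_ran = false;
    r.score      = cheap(features, n);
    if (r.score >= gate) {
        r.score      = heavy(features, n);
        r.stage2_ran = true;
    }
    return r;
}

/* Stand-ins for real models: each just returns one input element. */
float detector_stub(const float *f, size_t n)   { return (n > 0u) ? f[0] : 0.0f; }
float classifier_stub(const float *f, size_t n) { return (n > 1u) ? f[1] : 0.0f; }
```

If the cheap stage costs c1, the heavy stage costs c2, and the gate opens with probability p, the expected cost per window is c1 + p * c2. The cascade pays off when p is small.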

Q: How do you unit-test the state machine? Test it on the host. The agent_transition() function takes a state and a sensor context and returns a new state — no hardware dependency. Compile it with a standard C toolchain and test with a framework like Unity or cmocka.
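A framework-free sketch of such a host test follows. The `SensorCtx_t` layout, the threshold values, the shortened `FEATURE_LEN`, and the `tflm_infer` signature are assumptions; the inference call is replaced by a test double that returns a preset score. In a real project `agent_transition()` would be linked in from the firmware sources rather than copied.

```c
#include <stddef.h>

typedef enum { STATE_IDLE, STATE_DETECTING, STATE_ALERT, STATE_DELEGATE } AgentState_t;

#define FEATURE_LEN     4       /* shortened for the test   */
#define IDLE_THRESHOLD  0.10f   /* assumed example values   */
#define CONFIDENCE_HIGH 0.80f
#define CONFIDENCE_LOW  0.40f

typedef struct {
    float rms;
    float feature_vec[FEATURE_LEN];
} SensorCtx_t;

/* Test double: replaces the real inference call on the host. */
static float g_fake_score;
float tflm_infer(const float *features, size_t len)
{
    (void)features;
    (void)len;
    return g_fake_score;
}

/* Function under test, with the same logic as shown earlier on this page. */
AgentState_t agent_transition(AgentState_t current, SensorCtx_t *ctx)
{
    (void)current;
    if (ctx->rms < IDLE_THRESHOLD) return STATE_IDLE;
    float score = tflm_infer(ctx->feature_vec, FEATURE_LEN);
    if (score > CONFIDENCE_HIGH) return STATE_ALERT;
    if (score > CONFIDENCE_LOW)  return STATE_DETECTING;
    return STATE_DELEGATE;
}

/* Returns 0 if the transition matches the expectation, 1 otherwise. */
static int check_case(float rms, float score, AgentState_t want)
{
    SensorCtx_t ctx = { rms, { 0.0f } };
    g_fake_score = score;
    return (agent_transition(STATE_IDLE, &ctx) == want) ? 0 : 1;
}

/* Returns the number of failed cases; call it from the host test's main(). */
int run_agent_tests(void)
{
    int failures = 0;
    failures += check_case(0.05f, 0.99f, STATE_IDLE);      /* quiet: no inference */
    failures += check_case(0.50f, 0.95f, STATE_ALERT);
    failures += check_case(0.50f, 0.60f, STATE_DETECTING);
    failures += check_case(0.50f, 0.10f, STATE_DELEGATE);
    return failures;
}
```

With Unity or cmocka, each `check_case` line would become an assertion inside a test function; the structure of the test stays the same.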

Q: What does a delegation response look like? A JSON or CBOR payload on agents/<deviceid>/cmd with a decision field and optional parameters: {"decision": "alert", "threshold_update": 0.72, "expires_at": 1716400000}.