MCU Agent Architecture — On-Device and Off-Device

// last reviewed 2026-05-22 · Marcus Rüb

An MCU agent is structured as a layered firmware architecture: hardware abstraction and sensor I/O at the bottom, a real-time event loop in the middle, an agent state machine and optional inference engine above that, and a communication layer at the top. Work is delegated to off-device services only when local logic is insufficient.

This page describes the software layers, the data flow between them, and the decision points where on-device work ends and off-device delegation begins.

What are the software layers?

┌───────────────────────────────────────┐
│        Cloud / Edge Agent             │  LLM reasoning, planning, RAG
├───────────────────────────────────────┤
│      Communication Layer              │  MQTT, CoAP, HTTP/TLS
├───────────────────────────────────────┤
│      Agent State Machine              │  Goals, transitions, command handler
├───────────────────────────────────────┤
│   Inference Engine (optional)         │  TFLM, EON, CMSIS-NN
├───────────────────────────────────────┤
│   Feature Extraction / Pre-processing │  FFT, RMS, windowing, normalization
├───────────────────────────────────────┤
│      Sensor Abstraction Layer         │  I2C, SPI, ADC, UART drivers
├───────────────────────────────────────┤
│      Hardware (MCU + peripherals)     │  ESP32-S3, STM32H7, RP2040…
└───────────────────────────────────────┘

Each layer has strict responsibility boundaries. The sensor layer does not know about MQTT. The state machine does not directly write to GPIO — it sets a state and the action handler executes the consequence.

How does the event loop work?

On FreeRTOS, the event loop is usually a collection of tasks communicating over queues:

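/* Three-task architecture on FreeRTOS */
#include "FreeRTOS.h"   /* ESP-IDF prefixes these headers with freertos/ */
#include "task.h"
#include "queue.h"

/* Both queues are created with xQueueCreate() before the scheduler starts.
   SensorCtx_t and AgentState_t come from the application's own headers. */
extern QueueHandle_t xSensorQueue;   /* carries SensorCtx_t items */
extern QueueHandle_t xActionQueue;   /* carries AgentState_t items */

// Task 1: sensor sampling at a fixed interval
void vSensorTask(void *pvParams) {
    (void)pvParams;
    SensorCtx_t ctx;
    TickType_t xLastWake = xTaskGetTickCount();
    for (;;) {
        sensor_read(&ctx.raw);
        feature_extract(&ctx);
        /* Blocks only if the agent task has fallen behind and the queue is full. */
        xQueueSend(xSensorQueue, &ctx, portMAX_DELAY);
        /* vTaskDelayUntil() keeps the period fixed; a plain vTaskDelay() would
           add the time spent reading and extracting to every cycle. */
        vTaskDelayUntil(&xLastWake, pdMS_TO_TICKS(SAMPLE_MS));
    }
}

// Task 2: agent state machine
void vAgentTask(void *pvParams) {
    (void)pvParams;
    SensorCtx_t ctx;
    AgentState_t state = STATE_IDLE;
    for (;;) {
        xQueueReceive(xSensorQueue, &ctx, portMAX_DELAY);
        state = agent_transition(state, &ctx);
        /* Zero timeout: the state machine never waits on the comm task. */
        xQueueSend(xActionQueue, &state, 0);
    }
}

// Task 3: communication
void vCommTask(void *pvParams) {
    (void)pvParams;
    AgentState_t state;
    for (;;) {
        xQueueReceive(xActionQueue, &state, portMAX_DELAY);
        /* Only these two states produce off-device traffic here. */
        if (state == STATE_ALERT || state == STATE_DELEGATE)
            mqtt_publish_state(&state);
    }
}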

This decouples sampling latency from network latency. The sensor task never blocks on a network operation.

What does the agent state machine look like?

| State        | Entry condition                                | On-device action                | Off-device action                         |
|--------------|------------------------------------------------|---------------------------------|-------------------------------------------|
| IDLE         | Default                                        | Sample sensors, buffer data     | None                                      |
| DETECTING    | Threshold crossed or model output > confidence | Run inference, debounce         | None                                      |
| ALERT        | Inference confirms anomaly                     | Actuate if configured           | Publish event to MQTT broker              |
| DELEGATE     | Confidence too low for local decision          | Buffer context                  | Publish to delegate topic; await response |
| CMD_RECEIVED | Command arrives on sub topic                   | Apply params, update thresholds | Acknowledge receipt                       |
| OTA          | OTA signal received                            | Halt sensor tasks               | Download and verify firmware              |

The state machine is the agent’s “brain” — it is small, deterministic, and runs entirely on-device. The cloud’s role is to provide guidance that the state machine then applies.
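
As a rough illustration, the table above could be captured in C as an enum plus a read-only lookup table. This is only a sketch: `STATE_CMD_RECEIVED`, `STATE_OTA`, `StateInfo_t`, and `agent_state_info()` are hypothetical names chosen for this example, and the `needs_network` column is simply read off the "Off-device action" column.

```c
#include <stdbool.h>
#include <stddef.h>

/* States from the table above. STATE_CMD_RECEIVED and STATE_OTA are
   assumed spellings; the page's code only names the first four. */
typedef enum {
    STATE_IDLE = 0,
    STATE_DETECTING,
    STATE_ALERT,
    STATE_DELEGATE,
    STATE_CMD_RECEIVED,
    STATE_OTA,
    STATE_COUNT
} AgentState_t;

/* One row per state: a printable name and whether the state has an
   off-device action (anything other than "None" in the table). */
typedef struct {
    const char *name;
    bool        needs_network;
} StateInfo_t;

static const StateInfo_t k_state_info[STATE_COUNT] = {
    [STATE_IDLE]         = { "IDLE",         false },
    [STATE_DETECTING]    = { "DETECTING",    false },
    [STATE_ALERT]        = { "ALERT",        true  },
    [STATE_DELEGATE]     = { "DELEGATE",     true  },
    [STATE_CMD_RECEIVED] = { "CMD_RECEIVED", true  },
    [STATE_OTA]          = { "OTA",          true  },
};

/* Returns the table row for a state, or NULL for an out-of-range value. */
const StateInfo_t *agent_state_info(AgentState_t s)
{
    if ((int)s < 0 || (int)s >= (int)STATE_COUNT) {
        return NULL;
    }
    return &k_state_info[s];
}
```

Keeping the table `const` lets the linker place it in flash, so it adds nothing to the state machine's RAM footprint.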

How does inference fit in?

In most implementations, TinyML inference is a function call, not a separate thread:

AgentState_t agent_transition(AgentState_t current, SensorCtx_t *ctx) {
    if (ctx->rms < IDLE_THRESHOLD) return STATE_IDLE;

    /* Run TFLite Micro inference */
    float score = tflm_infer(ctx->feature_vec, FEATURE_LEN);

    if (score > CONFIDENCE_HIGH) return STATE_ALERT;
    if (score > CONFIDENCE_LOW)  return STATE_DETECTING;
    return STATE_DELEGATE;  /* uncertain — ask upstream */
}

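The inference call takes between 5 ms (keyword detect on ESP32-S3 with vector ISA) and several hundred milliseconds (image classification on Cortex-M4F without dedicated NPU). Because inference runs synchronously inside agent_transition(), a call that takes longer than the sampling period lets the sensor queue back up and stalls or drops samples, so budget it carefully against your sampling rate.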
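
One way to make that budget explicit is a small check that can run in a host test or at boot. The sketch below is an assumption-laden illustration: `inference_fits_budget()` and the headroom parameter are hypothetical, and the timing figures you pass in would come from your own measurements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when feature extraction plus inference fits inside one
   sampling period while leaving `headroom_pct` percent of the period
   free for other tasks. All times are in milliseconds. */
bool inference_fits_budget(uint32_t sample_ms, uint32_t feature_ms,
                           uint32_t infer_ms, uint32_t headroom_pct)
{
    if (sample_ms == 0u || headroom_pct >= 100u) {
        return false;   /* no usable budget */
    }
    /* 64-bit intermediates so large periods cannot overflow. */
    uint64_t usable_ms = (uint64_t)sample_ms * (100u - headroom_pct) / 100u;
    uint64_t needed_ms = (uint64_t)feature_ms + infer_ms;
    return needed_ms <= usable_ms;
}
```

With a 100 ms sampling period and 20% headroom, a 5 ms keyword model fits easily, while a 300 ms image classifier does not. In that case either the sampling period has to grow or the heavy model should run only on selected windows.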

What gets delegated and when?

Delegation is triggered when the agent cannot make a confident local decision. The trigger is a confidence score, not a timeout.

Delegation payloads should be compact: pre-extracted features rather than raw sensor streams. A 64-float feature vector (256 bytes as float32, 64 bytes as int8 quantized) is practical. A 10-second raw audio buffer at 16 kHz (320 KB at 16-bit samples) is not.
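
The int8 figure above assumes some quantization scheme, which the page does not specify. A minimal sketch using symmetric per-vector quantization follows; `quantize_features_i8()` and `DelegatePayload_t` are hypothetical names. Note that the receiver needs the scale to reconstruct the values, so this particular payload is 68 bytes rather than 64 on typical 32-bit ABIs.

```c
#include <stddef.h>
#include <stdint.h>

#define FEATURE_LEN 64

/* Hypothetical wire struct: one float scale plus 64 int8 values. */
typedef struct {
    float  scale;
    int8_t q[FEATURE_LEN];
} DelegatePayload_t;

/* Symmetric quantization: the largest magnitude maps to 127 and every
   value becomes q = round(x / scale). Returns the scale, so the receiver
   can recover x as approximately q * scale. */
float quantize_features_i8(const float *in, int8_t *out, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = (in[i] < 0.0f) ? -in[i] : in[i];
        if (a > max_abs) {
            max_abs = a;
        }
    }
    /* An all-zero vector gets scale 1.0 to avoid dividing by zero. */
    float scale = (max_abs > 0.0f) ? (max_abs / 127.0f) : 1.0f;

    for (size_t i = 0; i < n; i++) {
        float r = in[i] / scale;
        /* Round half away from zero without pulling in libm. */
        int q = (int)((r >= 0.0f) ? (r + 0.5f) : (r - 0.5f));
        if (q > 127)  q = 127;
        if (q < -127) q = -127;
        out[i] = (int8_t)q;
    }
    return scale;
}
```

A caller would fill `payload.scale = quantize_features_i8(features, payload.q, FEATURE_LEN)` and hand the struct to the serializer. Whether the accuracy loss is acceptable depends on the upstream model and would need to be checked against real data.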

How does the communication layer fit in?

The communication layer is responsible for:

  1. Connecting and reconnecting to the MQTT broker (with exponential back-off).
  2. Publishing agent events, readings, and delegation requests.
  3. Subscribing to command topics and delegation response topics.
  4. Serializing and deserializing payloads (JSON, CBOR, MessagePack).
  5. Buffering messages when offline and flushing when connectivity returns.

It is entirely separate from the decision layer. The state machine should not call mqtt_publish() directly — it should enqueue an action and let the communication task handle it. This prevents the state machine from blocking on network operations.
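
Item 1 in the list above mentions exponential back-off. A minimal sketch of the delay calculation is shown here; the base delay, the ceiling, and the 25% jitter are assumed values, and both function names are hypothetical. The random value is passed in by the caller so the logic stays testable on the host.

```c
#include <stdint.h>

#define BACKOFF_BASE_MS  1000u   /* assumed delay before the first retry */
#define BACKOFF_MAX_MS  60000u   /* assumed ceiling */

/* Delay before reconnect attempt number `attempt` (0-based):
   base * 2^attempt, capped at BACKOFF_MAX_MS. */
uint32_t backoff_delay_ms(uint32_t attempt)
{
    uint32_t delay = BACKOFF_BASE_MS;
    /* Stop doubling once the cap is reached, so the value cannot overflow. */
    while (attempt-- > 0u && delay < BACKOFF_MAX_MS) {
        delay *= 2u;
    }
    return (delay > BACKOFF_MAX_MS) ? BACKOFF_MAX_MS : delay;
}

/* Adds between 0 and 25% extra delay, derived from a caller-supplied
   random value, so that a fleet of devices does not reconnect in
   lockstep after a broker outage. */
uint32_t backoff_with_jitter_ms(uint32_t attempt, uint32_t rnd)
{
    uint32_t delay = backoff_delay_ms(attempt);
    return delay + (rnd % (delay / 4u + 1u));
}
```

The communication task would reset its attempt counter to zero after a successful connection and increment it after each failure.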

Component-to-layer mapping by board

| Board     | Sensor layer         | Inference layer         | Comm layer           | State machine              |
|-----------|----------------------|-------------------------|----------------------|----------------------------|
| ESP32-S3  | ESP-IDF ADC/I2C/SPI  | TFLM + ESP-NN vector    | esp-mqtt + LwIP      | C state machine / FreeRTOS |
| STM32H743 | STM32Cube HAL        | STM32Cube.AI / CMSIS-NN | LwIP + MQTT-C        | C state machine / FreeRTOS |
| RP2040    | Pico SDK I2C/ADC/SPI | TFLM (no DSP ext.)      | lwIP / paho-embedded | C state machine / bare loop |
| nRF52840  | Zephyr sensor drivers | TFLM + Zephyr ML       | Zephyr MQTT          | C state machine / Zephyr   |

Platform example: ForestHub.ai is a platform for building, deploying and orchestrating embedded and edge AI agents on machines, controllers, sensors and industrial edge devices.

FAQ

Q: Does every MCU agent need FreeRTOS? No. A simple super-loop with interrupt-driven sensor reads and a polling MQTT client works for low-frequency, non-concurrent agents. An RTOS such as FreeRTOS or Zephyr becomes worthwhile when you need real concurrency: OTA, sensing, and communication running without blocking each other.

Q: How much SRAM does the state machine itself use? A 6-state FSM with associated context typically uses less than 1 KB. The inference engine and its activations dominate RAM usage.

Q: Can the agent run two ML models? Yes, sequentially. A common pattern: run a lightweight keyword/anomaly detector first (fast, low SRAM). Only if it fires, run a heavier classifier (slower, higher SRAM). The cascade spends the heavy model's compute only on the windows the cheap detector has already flagged.
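
The cascade pattern can be sketched with two function pointers and a gate threshold. Everything here is illustrative: `Cascade_t`, `cascade_infer()`, and the two `demo_` models are hypothetical stand-ins for real inference calls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Any model is reduced to "features in, score out" for this sketch. */
typedef float (*ModelFn)(const float *features, size_t len);

typedef struct {
    ModelFn detector;        /* cheap model, runs on every window */
    ModelFn classifier;      /* expensive model, runs only after a hit */
    float   gate_threshold;  /* detector score needed to open the gate */
} Cascade_t;

/* Runs the detector, and the classifier only if the detector fires.
   Returns the classifier score, or -1.0f when the gate stayed closed.
   *ran_classifier reports whether the expensive model was executed. */
float cascade_infer(const Cascade_t *c, const float *features, size_t len,
                    bool *ran_classifier)
{
    *ran_classifier = false;
    if (c->detector(features, len) < c->gate_threshold) {
        return -1.0f;
    }
    *ran_classifier = true;
    return c->classifier(features, len);
}

/* Placeholder models for host testing: they just echo one feature. */
static float demo_detector(const float *f, size_t len)
{
    return (len > 0) ? f[0] : 0.0f;
}

static float demo_classifier(const float *f, size_t len)
{
    return (len > 1) ? f[1] : 0.0f;
}
```

A caller would build `Cascade_t c = { demo_detector, demo_classifier, 0.5f };` and replace the placeholders with wrappers around the real models.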

Q: How do you unit-test the state machine? Test it on the host. The agent_transition() function takes a state and a sensor context and returns a new state — no hardware dependency. Compile it with a standard C toolchain and test with a framework like Unity or cmocka.
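
A host-side test might look like the sketch below. It uses plain `assert()` to stay self-contained; with Unity or cmocka the same checks would go through that framework's assertion macros. The threshold values, the `SensorCtx_t` layout, and the fake `tflm_infer()` are assumptions made for the test, and the function under test is copied here only so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for the test build; real firmware defines its own. */
#define FEATURE_LEN      4
#define IDLE_THRESHOLD   0.10f
#define CONFIDENCE_LOW   0.40f
#define CONFIDENCE_HIGH  0.80f

typedef enum {
    STATE_IDLE, STATE_DETECTING, STATE_ALERT, STATE_DELEGATE
} AgentState_t;

typedef struct {
    float rms;
    float feature_vec[FEATURE_LEN];
} SensorCtx_t;

/* Test double: returns whatever score the test sets, counts the calls. */
static float g_fake_score;
static int   g_infer_calls;

float tflm_infer(const float *features, size_t len)
{
    (void)features;
    (void)len;
    g_infer_calls++;
    return g_fake_score;
}

/* Function under test. In a real project this would be linked in from
   the firmware source rather than copied. */
AgentState_t agent_transition(AgentState_t current, SensorCtx_t *ctx)
{
    (void)current;
    if (ctx->rms < IDLE_THRESHOLD) return STATE_IDLE;
    float score = tflm_infer(ctx->feature_vec, FEATURE_LEN);
    if (score > CONFIDENCE_HIGH) return STATE_ALERT;
    if (score > CONFIDENCE_LOW)  return STATE_DETECTING;
    return STATE_DELEGATE;  /* uncertain: ask upstream */
}

/* Helper: one transition from IDLE with a given RMS and model score. */
static AgentState_t step(float rms, float score)
{
    SensorCtx_t ctx = { .rms = rms };
    g_fake_score = score;
    return agent_transition(STATE_IDLE, &ctx);
}

/* Call this from the test runner's main(). */
void run_agent_transition_tests(void)
{
    g_infer_calls = 0;
    assert(step(0.05f, 0.99f) == STATE_IDLE);
    assert(g_infer_calls == 0);   /* quiet input skips inference entirely */
    assert(step(0.50f, 0.90f) == STATE_ALERT);
    assert(step(0.50f, 0.60f) == STATE_DETECTING);
    assert(step(0.50f, 0.20f) == STATE_DELEGATE);
    assert(g_infer_calls == 3);
}
```

Replacing `tflm_infer()` with a test double is what removes the hardware dependency: the test controls the score directly and never loads a model.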

Q: What does a delegation response look like? A JSON or CBOR payload on agents/<deviceid>/cmd with a decision field and optional parameters: {"decision": "alert", "threshold_update": 0.72, "expires_at": 1716400000}.
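
Once the payload has been decoded by whichever JSON or CBOR library the firmware uses, applying it could look roughly like this. The sketch assumes the three fields from the example above, a device clock that is synchronized to Unix time, and a valid threshold range of 0 to 1 exclusive. `DelegationResponse_t`, `AgentConfig_t`, and `apply_delegation_response()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Decoded form of the response; decoding itself is out of scope here. */
typedef struct {
    char     decision[16];
    bool     has_threshold;      /* threshold_update is optional */
    float    threshold_update;
    uint32_t expires_at;         /* Unix time in seconds */
} DelegationResponse_t;

typedef struct {
    float confidence_high;       /* threshold used by the state machine */
    bool  alert_requested;       /* upstream decided this is an alert */
} AgentConfig_t;

/* Validates the whole response first, then applies it. Returns false and
   leaves cfg untouched if the response is stale or out of range. */
bool apply_delegation_response(const DelegationResponse_t *r, uint32_t now,
                               AgentConfig_t *cfg)
{
    if (now >= r->expires_at) {
        return false;   /* guidance has expired */
    }
    if (r->has_threshold &&
        (r->threshold_update <= 0.0f || r->threshold_update >= 1.0f)) {
        return false;   /* reject implausible thresholds (assumed range) */
    }
    if (r->has_threshold) {
        cfg->confidence_high = r->threshold_update;
    }
    cfg->alert_requested = (strcmp(r->decision, "alert") == 0);
    return true;
}
```

Checking `expires_at` on the device means a response that was delayed in a broker queue cannot overwrite thresholds with outdated guidance.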