MQTT for MCU Agents — Topics, QoS, Brokers, Payloads
MQTT is the dominant messaging protocol for MCU agents because it is lightweight (minimum 2-byte overhead per message), broker-mediated (devices never connect to each other directly), and supports QoS levels that match the reliability constraints of constrained devices.
This page covers topic design, QoS selection, broker choices, TLS overhead, and payload format trade-offs for MCU agents specifically. General MQTT tutorials exist elsewhere; this focuses on the decisions that matter when your client has 256–512 KB of SRAM.
What topic structure should MCU agents use?
A well-designed topic tree separates device identity, message direction, and message type. A flat scheme like data/temperature does not scale across hundreds of devices. A hierarchical scheme enables fine-grained ACL (access control list) rules and efficient wildcard subscriptions.
Recommended structure:
agents/{device_id}/event ← device → broker (unsolicited events)
agents/{device_id}/telemetry ← device → broker (periodic heartbeat)
agents/{device_id}/cmd ← broker → device (commands, param updates)
agents/{device_id}/delegate/req ← device → broker (delegation request)
agents/{device_id}/delegate/resp ← broker → device (delegation response)
agents/{device_id}/ota ← broker → device (OTA signals)
agents/{device_id}/status ← device → broker (retained; last known state)
Fleet-level subscriptions for monitoring:
agents/+/event ← all events from all agents
agents/+/status ← all status messages
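agents/factory-a/# ← everything under the factory-a level (a named group only works if the ID level encodes the group or the tree has a separate group level)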
Use {device_id} values that are stable, unique, and not user-readable PII. A UUID4 or a hash of the hardware MAC address works well.
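The topic tree above can be sketched as a small helper that builds topic strings into a caller-supplied buffer, which avoids heap allocation on the device. The function name `agent_topic` and the return-code convention are assumptions made for illustration; the check that rejects `/`, `+` and `#` in the ID is an extra safeguard, since those characters are reserved by MQTT.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds "agents/<device_id>/<suffix>" into out (hypothetical helper).
 * Returns 0 on success, -1 on invalid input or if the buffer is too small. */
static int agent_topic(char *out, size_t out_len,
                       const char *device_id, const char *suffix)
{
    if (out == NULL || out_len == 0 || device_id == NULL || suffix == NULL) {
        return -1;
    }
    if (device_id[0] == '\0') {
        return -1;
    }
    /* Level separators and wildcards must not appear inside the ID. */
    for (const char *p = device_id; *p != '\0'; ++p) {
        if (*p == '/' || *p == '+' || *p == '#') {
            return -1;
        }
    }
    int n = snprintf(out, out_len, "agents/%s/%s", device_id, suffix);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Calling `agent_topic(buf, sizeof buf, "a1b2c3", "delegate/req")` fills `buf` with `agents/a1b2c3/delegate/req`. Building each topic once at boot and caching it keeps `snprintf` out of the publish path.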
Which QoS level should you use?
| QoS | Semantics | MCU implications |
|---|---|---|
| QoS 0 | At most once (fire and forget) | Lowest overhead. Use for periodic telemetry where loss is acceptable. |
| QoS 1 | At least once (acknowledged) | Broker ACKs each message; device retries until ACK. Use for events and alerts. Requires SRAM for the outbound queue. |
| QoS 2 | Exactly once (4-way handshake) | High protocol overhead (4 messages per publish). Rarely necessary on MCUs; adds latency and RAM. |
Default recommendation:
- Telemetry / heartbeat: QoS 0.
- Events / alerts / delegation requests: QoS 1.
- Commands to device: QoS 1 (the broker queues them for an offline device only if the device connected with a persistent session).
- OTA triggers: QoS 1 with retained flag = false.
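One way to keep these defaults in a single place is a lookup from message type to publish options. This is a minimal sketch covering only the messages the device itself publishes; the enum and struct names are hypothetical, and QoS 1 for the retained status message is an assumption chosen to match the Last Will settings used later on this page.

```c
/* Message types the device publishes (hypothetical names). */
typedef enum {
    MSG_TELEMETRY,
    MSG_EVENT,
    MSG_DELEGATE_REQ,
    MSG_STATUS
} agent_msg_t;

typedef struct {
    int qos;     /* 0 or 1; QoS 2 is deliberately not used */
    int retain;  /* 1 = broker keeps the last message for new subscribers */
} publish_opts_t;

/* Maps a message type to the default QoS and retain flag. */
static publish_opts_t publish_opts_for(agent_msg_t type)
{
    switch (type) {
    case MSG_TELEMETRY:    return (publish_opts_t){ .qos = 0, .retain = 0 };
    case MSG_EVENT:        return (publish_opts_t){ .qos = 1, .retain = 0 };
    case MSG_DELEGATE_REQ: return (publish_opts_t){ .qos = 1, .retain = 0 };
    case MSG_STATUS:       return (publish_opts_t){ .qos = 1, .retain = 1 };
    }
    /* Unknown type: fall back to acknowledged, non-retained delivery. */
    return (publish_opts_t){ .qos = 1, .retain = 0 };
}
```

Commands and OTA triggers flow from broker to device, so their QoS is set by the backend publisher and by the QoS the device requests when it subscribes.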
Broker comparison for MCU agent deployments
| Broker | Language | Scalability | MCU edge deployment | TLS | MQTT 5.0 | License |
|---|---|---|---|---|---|---|
| Eclipse Mosquitto | C | Thousands of connections | Yes (runs on Raspberry Pi, gateway hardware) | Yes | Yes (v1.6+) | EPL 2.0 / EDL 1.0 |
| EMQX | Erlang/OTP | Millions of connections | Gateway / cloud only | Yes | Yes | Apache 2.0 (Community) |
| HiveMQ | Java | Up to 200M connections (claimed) | Cloud / on-premise only | Yes | Yes | Commercial |
| NanoMQ | C (async I/O) | Designed for edge gateways | Yes (lightweight, 200 KB–3 MB footprint) | Yes | Yes | MIT |
For typical MCU agent deployments:
- Mosquitto on a local gateway is the simplest production setup for a device fleet up to ~10,000 connections. Configuration is a single text file.
- EMQX is appropriate when you need shared subscriptions, a rules engine, built-in data bridging to Kafka/InfluxDB, and horizontal scaling.
- NanoMQ is worth considering when the broker itself must run on a constrained edge gateway (a GL.iNet router, an industrial gateway with 256 MB RAM).
TLS on constrained devices
TLS is non-negotiable for production deployments. The cost on an MCU:
| Overhead | Typical value |
|---|---|
| Initial TLS handshake | 1–3 seconds on Cortex-M4 @ 168 MHz |
| Heap for active TLS session | 40–80 KB (mbedTLS / wolfSSL) |
| CPU during record encryption | <5% at typical MQTT rates (1 msg/sec) |
Recommendations:
- Use TLS 1.2 minimum; TLS 1.3 is supported by wolfSSL and mbedTLS 3.x and reduces handshake latency.
- Use pre-shared keys (TLS-PSK) on the most constrained devices (Cortex-M0) — faster handshake, lower code size than certificate-based TLS.
- Session resumption (TLS session tickets or session ID) avoids a full handshake on reconnect. Enable it on both client and broker.
- Mutual TLS (mTLS) provides device identity at the transport layer. Store the device private key in a secure element if your threat model requires it.
Payload format trade-offs
| Format | Overhead | Encoding required | Human readable | Notes |
|---|---|---|---|---|
| JSON | High (verbose) | No (text) | Yes | Easiest to debug; use for low-frequency events |
| CBOR | Low (~40–50% smaller than JSON) | Yes | No | RFC 8949 (obsoletes RFC 7049); well-supported; good default for MCU agents |
| MessagePack | Low (similar to CBOR) | Yes | No | Slightly simpler format; not an IETF standard, unlike CBOR |
| Protobuf | Very low (schema-driven) | Yes | No | Best efficiency; requires schema management |
| Raw binary (custom) | Minimal | Custom | No | Risky; no schema; hard to evolve |
For most MCU agents: CBOR strikes the best balance. A 128-byte CBOR payload versus a 220-byte JSON payload across thousands of devices at QoS 1 measurably reduces broker load and cellular data costs.
A minimal event payload, shown in annotated JSON form for readability (on the wire the same map is CBOR-encoded):
{
"id": "esp32s3-node-01", // device identifier
"ts": 1716400000, // unix timestamp (from NTP or RTC)
"evt": "threshold_exceeded", // event type string or enum
"val": 87.4, // sensor value
"seq": 4821 // sequence number for gap detection
}
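To make the wire format concrete, the sketch below encodes the same five-field map as CBOR by hand. It is a minimal illustration rather than a full encoder: it handles only unsigned integers up to 32 bits, text strings, maps and single-precision floats, and every name in it (`cbor_buf_t`, `encode_event`, and so on) is hypothetical. A production device would more likely use an existing CBOR library. Encoding `val` as a 32-bit float is an assumption that trades precision for four bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t *buf;
    size_t cap;
    size_t len;
    int overflow;  /* set once any write does not fit */
} cbor_buf_t;

static void cbor_put(cbor_buf_t *b, const uint8_t *src, size_t n)
{
    if (b->overflow || n > b->cap - b->len) { b->overflow = 1; return; }
    memcpy(b->buf + b->len, src, n);
    b->len += n;
}

/* Writes a CBOR head: 3-bit major type plus the shortest encoding of v. */
static void cbor_head(cbor_buf_t *b, uint8_t major, uint32_t v)
{
    uint8_t h[5];
    size_t n;
    h[0] = (uint8_t)(major << 5);
    if (v < 24) {
        h[0] |= (uint8_t)v; n = 1;
    } else if (v <= 0xFF) {
        h[0] |= 24; h[1] = (uint8_t)v; n = 2;
    } else if (v <= 0xFFFF) {
        h[0] |= 25; h[1] = (uint8_t)(v >> 8); h[2] = (uint8_t)v; n = 3;
    } else {
        h[0] |= 26;
        h[1] = (uint8_t)(v >> 24); h[2] = (uint8_t)(v >> 16);
        h[3] = (uint8_t)(v >> 8);  h[4] = (uint8_t)v;
        n = 5;
    }
    cbor_put(b, h, n);
}

static void cbor_uint(cbor_buf_t *b, uint32_t v)    { cbor_head(b, 0, v); }
static void cbor_map(cbor_buf_t *b, uint32_t pairs) { cbor_head(b, 5, pairs); }

static void cbor_text(cbor_buf_t *b, const char *s)
{
    size_t n = strlen(s);
    cbor_head(b, 3, (uint32_t)n);
    cbor_put(b, (const uint8_t *)s, n);
}

static void cbor_float32(cbor_buf_t *b, float f)
{
    uint32_t bits;
    uint8_t out[5];
    memcpy(&bits, &f, sizeof bits);  /* reinterpret IEEE 754 bits safely */
    out[0] = 0xFA;                   /* major type 7, single-precision float */
    out[1] = (uint8_t)(bits >> 24); out[2] = (uint8_t)(bits >> 16);
    out[3] = (uint8_t)(bits >> 8);  out[4] = (uint8_t)bits;
    cbor_put(b, out, sizeof out);
}

/* Encodes the example event; returns the encoded length, or 0 on overflow. */
static size_t encode_event(uint8_t *out, size_t cap)
{
    cbor_buf_t b = { out, cap, 0, 0 };
    cbor_map(&b, 5);
    cbor_text(&b, "id");  cbor_text(&b, "esp32s3-node-01");
    cbor_text(&b, "ts");  cbor_uint(&b, 1716400000u);
    cbor_text(&b, "evt"); cbor_text(&b, "threshold_exceeded");
    cbor_text(&b, "val"); cbor_float32(&b, 87.4f);
    cbor_text(&b, "seq"); cbor_uint(&b, 4821u);
    return b.overflow ? 0 : b.len;
}
```

Most of the encoded bytes are the string keys and the event name. Replacing them with small integer keys and an enum value, as the `evt` comment suggests, shrinks the payload further at the cost of needing a shared key table.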
Last Will and Testament
Set a Last Will and Testament (LWT) on connect. If the device disconnects uncleanly (power loss, network failure), the broker publishes the LWT message automatically:
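esp_mqtt_client_config_t cfg = {
    .broker.address.uri = BROKER_URI,
    /* Published by the broker if this client drops without a clean DISCONNECT */
    .session.last_will.topic = "agents/" DEVICE_ID "/status",
    .session.last_will.msg = "{\"online\":false}",
    .session.last_will.qos = 1,
    .session.last_will.retain = true,  /* late subscribers still see the offline state */
};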
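After each successful connect, the agent publishes {"online":true} to its status topic with retain=true. The LWT publishes {"online":false} on unexpected disconnect. Because the broker discards the LWT on a clean disconnect, an agent that shuts down deliberately should publish {"online":false} itself before disconnecting. Fleet monitoring reads the retained status topic to know device state without polling.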
FAQ
Q: Should devices use a shared MQTT client or one per task?
One client per device. Within the device, publish from a dedicated communication task; other tasks enqueue messages to it. Never call mqtt_publish() from an ISR or from the sensor task directly.
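The enqueue pattern can be sketched as a fixed-size outbox that producer tasks push into and the communication task drains. This version is a plain single-threaded ring buffer meant only to show the data flow; all names and sizes are hypothetical. On a real RTOS the kernel's own queue primitive (for example a FreeRTOS queue) would take its place, because this sketch has no locking.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OUTBOX_SLOTS       8    /* assumed depth; size it to your SRAM budget */
#define OUTBOX_TOPIC_MAX   64
#define OUTBOX_PAYLOAD_MAX 128

typedef struct {
    char topic[OUTBOX_TOPIC_MAX];
    uint8_t payload[OUTBOX_PAYLOAD_MAX];
    size_t payload_len;
    int qos;
} outbox_msg_t;

typedef struct {
    outbox_msg_t slots[OUTBOX_SLOTS];
    size_t head;   /* index of the oldest message */
    size_t count;  /* number of queued messages */
} outbox_t;

/* Producer side: copies the message in. Returns -1 if full or oversized. */
static int outbox_push(outbox_t *q, const char *topic,
                       const void *payload, size_t len, int qos)
{
    if (q->count == OUTBOX_SLOTS) return -1;
    if (strlen(topic) >= OUTBOX_TOPIC_MAX || len > OUTBOX_PAYLOAD_MAX) return -1;

    outbox_msg_t *m = &q->slots[(q->head + q->count) % OUTBOX_SLOTS];
    strcpy(m->topic, topic);  /* length checked above */
    memcpy(m->payload, payload, len);
    m->payload_len = len;
    m->qos = qos;
    q->count++;
    return 0;
}

/* Consumer side (communication task): removes the oldest message. */
static int outbox_pop(outbox_t *q, outbox_msg_t *out)
{
    if (q->count == 0) return -1;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % OUTBOX_SLOTS;
    q->count--;
    return 0;
}
```

With these sizes the outbox costs roughly 1.6 KB of static RAM. A full queue returns an error instead of blocking, so the sensor task decides whether to drop the sample or retry.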
Q: How do I handle MQTT reconnects on a device with intermittent Wi-Fi?
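Implement an exponential back-off reconnect loop with a cap (e.g., max 60 seconds). ESP-IDF’s esp-mqtt reconnects automatically by default, but at a fixed interval (network.reconnect_timeout_ms) rather than with back-off, and raises MQTT_EVENT_DISCONNECTED so the application can react. For true exponential back-off, disable the automatic reconnect and trigger reconnects from your own timer.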
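The delay calculation itself is small. The following sketch doubles the previous delay up to a cap; the function name and the convention that a current delay of 0 means "first retry" are assumptions. Random jitter, which is commonly added so that a fleet does not reconnect in lockstep after a broker restart, is left out to keep the example deterministic.

```c
#include <stdint.h>

/* Returns the next reconnect delay in milliseconds.
 * current_ms == 0 means no retry has happened yet. */
static uint32_t backoff_next_ms(uint32_t current_ms,
                                uint32_t base_ms, uint32_t cap_ms)
{
    if (current_ms == 0) {
        return base_ms < cap_ms ? base_ms : cap_ms;
    }
    /* Compare against cap/2 so the doubling below cannot overflow. */
    if (current_ms > cap_ms / 2) {
        return cap_ms;
    }
    return current_ms * 2;
}
```

Starting from a 1-second base with a 60-second cap, successive calls yield 1, 2, 4, 8, 16, 32, 60, 60 seconds. Reset the stored delay to 0 after a successful connect.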
Q: Is MQTT 5.0 worth using on MCU agents?
The most useful MQTT 5.0 features for MCU agents: message expiry interval (discard stale events that arrive late), reason codes (better error diagnosis), topic aliases (reduce repeated long topic strings). Both esp-mqtt and Zephyr’s MQTT client support MQTT 5.0.
Q: What port should I use for MQTT over TLS?
Port 8883 is the standard. Port 443 (HTTPS) is sometimes used to traverse restrictive firewalls; HiveMQ and EMQX both support WebSocket+TLS on 443.
Q: How many subscriptions can a device hold?
It depends on the broker. Most brokers support 10–100 subscriptions per client easily. On the device side, each subscription adds a string match per incoming message, which is not a significant CPU cost at MCU-level message rates.
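The per-message string match mentioned above can be sketched as a topic-filter matcher that handles the `+` (single level) and `#` (multi level) wildcards. The function name is hypothetical, the filter is assumed to be well formed (with `#` only as the last level), and the special treatment of topics beginning with `$` is not handled. Client libraries normally do this matching internally, so the code is mainly useful for routing messages to handlers inside the application.

```c
#include <stdbool.h>

/* Returns true if topic matches the subscription filter. */
static bool topic_matches(const char *filter, const char *topic)
{
    while (*filter != '\0') {
        if (*filter == '#') {
            return true;  /* matches all remaining levels */
        }
        if (*filter == '+') {
            /* Consume exactly one topic level (which may be empty). */
            while (*topic != '\0' && *topic != '/') {
                topic++;
            }
            filter++;
            continue;
        }
        if (*filter != *topic) {
            /* "a/b/#" also matches the parent level "a/b". */
            return *topic == '\0' && filter[0] == '/' &&
                   filter[1] == '#' && filter[2] == '\0';
        }
        filter++;
        topic++;
    }
    return *topic == '\0';
}
```

The match is a single pass over both strings with no allocation, so its cost grows only with topic length and the number of filters checked.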