M3SHD
A Hub-and-Spoke Multi-Agent AI Mesh: Architecture, Dispatch, and Security
We present M3SHD, a hub-and-spoke multi-agent AI collaboration mesh that connects heterogeneous compute nodes (desktop machines, a VPS hub, and consumer smartphones) through a central FastAPI server backed by SQLite in WAL mode and a Server-Sent Events (SSE) event bus. The system addresses four persistent gaps in contemporary multi-agent frameworks: reliance on cloud-only infrastructure, absence of execution auditability, static role assignment, and the lack of an intelligence layer above the model. M3SHD's dispatch engine combines a rolling-window circuit breaker, UCB-based agent reputation scoring, capability-based task routing, and DAG dependency resolution, allocating work through a pull model in which nodes claim tasks instead of a central scheduler assigning them. The security model layers Fernet encryption with MultiFernet key rotation, PBKDF2-SHA256 key derivation, a granular RBAC system with 17 discrete permissions, per-agent token budgets, and an append-only audit log. The agent intelligence layer provides FTS5-indexed encrypted memory, confidence-triggered verification routing, self-evolving configuration proposals, and a structured debate protocol for high-stakes outputs. Physical deployment runs on Raspberry Pi 5 mesh nodes housed in custom 3D-printed N0D3 enclosures with automated provisioning.
Introduction
Multi-agent AI frameworks have matured rapidly, but most share a common assumption: agents run within a single trust boundary, on homogeneous infrastructure, managed by a central orchestrator with full visibility into all nodes. This assumption simplifies coordination but creates practical constraints that limit real-world deployment.
Four gaps persist across the leading open-source frameworks (AutoGPT, CrewAI, LangGraph, MetaGPT, AutoGen):
- Cloud-only architectures lock compute behind proprietary APIs, creating cost and privacy barriers. Operators cannot use hardware they already own — desktop workstations, spare mobile devices, or edge nodes.
- Opaque execution provides no audit trail. When a task fails or produces incorrect output, there is no evidence linking the output to the specific agent, model call, and inputs that produced it. This rules out compliance use cases and makes failures difficult to reproduce and debug.
- Static role assignment wastes capacity. Frameworks assign roles at initialization; agents idle when their specialty is not required rather than claiming available work based on declared capabilities and current load.
- No intelligence layer above the model. Confidence scoring, adaptive routing, reputation management, and self-improvement must be rebuilt from scratch by every application. Frameworks provide no primitives for these concerns.
M3SHD was designed to address all four gaps simultaneously. The design prioritizes deployability on hardware the operator already owns, cryptographic accountability for every execution, dynamic work allocation based on live node state, and an intelligence layer that improves agent performance without operator intervention. The following sections document the resulting architecture.
Architecture
M3SHD uses a hub-and-spoke topology. A central Commander server coordinates task dispatch, event routing, and persistence. N0D3S connect outbound via HTTPS and SSE; the Commander never initiates connections to nodes. Because every connection originates at the node, the design works behind NAT, CGNAT, and restrictive firewalls without VPN configuration for any node type. Desktop and Pi 5 N0D3S additionally use Tailscale to isolate mesh-internal traffic.
2.1 Commander Server
The Commander runs FastAPI with SQLite in WAL mode on a Hetzner VPS. All state is persisted to SQLite with explicit PRAGMA tuning: WAL mode with a 10 MB checkpoint threshold, a 10,000-page cache, a 256 MB mmap, and synchronous=NORMAL, which accepts a small durability window on power loss in exchange for higher write throughput, a tradeoff appropriate for this workload. Litestream continuously replicates WAL frames to Cloudflare R2, giving a recovery point objective on the order of Litestream's sync interval (one second by default) without requiring a secondary database process.
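The PRAGMA settings above can be applied when each connection is opened. The following is a minimal sketch using Python's standard sqlite3 module; the function name open_commander_db is hypothetical, and converting the 10 MB checkpoint threshold into a page count assumes the threshold is configured through wal_autocheckpoint.

```python
import os
import sqlite3
import tempfile


def open_commander_db(path: str) -> sqlite3.Connection:
    """Open a connection with the PRAGMA settings described above (sketch)."""
    conn = sqlite3.connect(path)
    # PRAGMA arguments cannot be bound as SQL parameters; every value below is
    # an internal constant, never user input.
    conn.execute("PRAGMA journal_mode=WAL")
    # wal_autocheckpoint counts pages, so convert the 10 MB threshold into pages.
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    conn.execute(f"PRAGMA wal_autocheckpoint={10 * 1024 * 1024 // page_size}")
    # A positive cache_size is interpreted as a page count.
    conn.execute("PRAGMA cache_size=10000")
    conn.execute(f"PRAGMA mmap_size={256 * 1024 * 1024}")
    # NORMAL syncs less often than FULL; in WAL mode the database stays consistent,
    # but the most recent commits can roll back after a power failure.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


if __name__ == "__main__":
    db = open_commander_db(os.path.join(tempfile.mkdtemp(), "commander.db"))
    print(db.execute("PRAGMA journal_mode").fetchone()[0])
```

WAL mode requires a file-backed database, which is why the demo uses a temporary directory rather than ":memory:".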
The Commander exposes a REST API for task submission, agent management, and state queries, and an SSE event bus at /events/stream to which all N0D3S subscribe for real-time dispatch signals.
2.2 N0D3S
Desktop N0D3S run the mesh daemon, a Python process that maintains a persistent SSE connection to the Commander, claims tasks through atomic SQL transactions on the Commander, executes them by invoking the Claude CLI as a subprocess (not through a direct API wrapper), and posts results back to the Commander. Subprocess execution lets the daemon inherit the operator's Claude authentication context and apply per-task system prompt overrides without holding API state in the daemon process itself.
Each node registers its slot capacity at connection time. The daemon implements exponential backoff on network failure and graceful task draining before shutdown, ensuring in-progress work completes before the node disconnects.
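A minimal sketch of the reconnect and shutdown behavior described above, assuming a threaded Python daemon. The names backoff_delay and TaskDrainer are hypothetical, and the full-jitter variant of exponential backoff (a random delay up to the exponential bound) is a design choice made here, not one the paper specifies.

```python
import random
import threading
from typing import Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    return rng.uniform(0.0, min(cap, base * (2 ** min(attempt, 32))))


class TaskDrainer:
    """Counts in-flight tasks so shutdown can wait for them to finish."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._in_flight = 0
        self._accepting = True

    def try_start(self) -> bool:
        """Register a new task; refused once draining has begun."""
        with self._cond:
            if not self._accepting:
                return False
            self._in_flight += 1
            return True

    def finish(self) -> None:
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout: Optional[float] = None) -> bool:
        """Stop accepting work, then block until in-flight tasks complete.

        Returns False if the timeout expires first."""
        with self._cond:
            self._accepting = False
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

On a shutdown signal, a daemon built this way would call drain() before closing its SSE connection, so no new claims start while in-flight tasks finish.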
2.3 Mobile N0D3S
The N0D3 Flutter application allows iOS devices to function as compute nodes. The mobile N0D3 communicates with the Commander exclusively through the REST API and SSE stream; it has no shell access and executes no local processes. Claude API calls are made directly from the app using the Anthropic Dart SDK, with SSE streaming for incremental token delivery. The app maintains an offline task queue for network interruptions and performs graceful task handoff to other available nodes when the device disconnects mid-execution.
2.4 iMessage Bridge
The iMessage bridge is a multithreaded Python daemon running on a macOS host. One thread polls chat.db via SQLite for new messages from an allowlisted set of senders. A second thread classifies message intent and submits tasks to the Commander REST API. A third thread monitors task completion events from the SSE stream. A fourth thread delivers replies via osascript. Magic byte verification prevents binary payloads from being interpreted as task text. The result is that any iMessage client — phone, Mac, iPad — can submit and monitor mesh tasks through natural language.
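One way to implement the magic byte check is to compare the start of each payload against known binary file signatures before treating it as task text. This sketch assumes payloads arrive as bytes; the signature table and the additional UTF-8 and NUL-byte checks are illustrative assumptions, not the bridge's documented rules.

```python
# Hypothetical signature table; the bridge's actual list is not documented.
BINARY_SIGNATURES: tuple = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF
    b"GIF89a",             # GIF
    b"%PDF-",              # PDF
    b"PK\x03\x04",         # ZIP container (also DOCX, XLSX, ...)
)


def is_task_text(payload: bytes) -> bool:
    """Accept a payload as task text only if it carries no known binary signature,
    contains no NUL bytes, and decodes as UTF-8."""
    if payload.startswith(BINARY_SIGNATURES):  # bytes.startswith accepts a tuple
        return False
    if b"\x00" in payload:
        return False
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```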
2.5 Database Layer
All Commander state is stored in a single SQLite database. The schema covers tasks, agents, agent memory, task dependencies, task templates, plugins, plugin call logs, agent evolution proposals, federation routes, reputation scores, debate votes, and an append-only audit log. INTEGER primary keys are used throughout. Timestamps are stored as ISO-8601 TEXT in UTC. The schema is versioned with migration scripts that use PRAGMA table_info to guard idempotently against re-running completed migrations.
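The idempotent migration guard can be sketched as follows. The sketch uses pragma_table_info, the table-valued form of PRAGMA table_info, so the table name can be bound as a parameter; the migration itself (adding a hypothetical tasks.confidence column) is an invented example.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # pragma_table_info is the table-valued form of PRAGMA table_info; unlike the
    # PRAGMA statement, it accepts bound parameters.
    row = conn.execute(
        "SELECT 1 FROM pragma_table_info(?) WHERE name = ?", (table, column)
    ).fetchone()
    return row is not None


def migrate_add_task_confidence(conn: sqlite3.Connection) -> bool:
    """Hypothetical migration: add tasks.confidence. Returns False if already applied."""
    if column_exists(conn, "tasks", "confidence"):
        return False
    with conn:
        conn.execute("ALTER TABLE tasks ADD COLUMN confidence REAL")
    return True
```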
Task Dispatch Engine
Task dispatch uses a pull model to avoid centralized scheduling complexity. When a task enters the pending state, the Commander broadcasts a task_available SSE event. N0D3S with available slots and matching capabilities attempt an atomic claim via a SQL transaction with optimistic locking. The first node to commit the claim owns the task; the other nodes' claims are rejected, and those nodes wait for the next task_available event. Four subsystems refine this baseline dispatch loop.
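A minimal sketch of the atomic claim, assuming a tasks table with status and claimed_by columns (hypothetical names). It realizes optimistic locking as a compare-and-set: the UPDATE re-checks status = 'pending' in its WHERE clause, so only one of several racing claimants can change the row.

```python
import sqlite3


def try_claim(conn: sqlite3.Connection, task_id: int, node_id: str) -> bool:
    """Attempt to claim a pending task; True only for the single winning claimant.

    A second claimant's UPDATE matches zero rows because status is no longer
    'pending', so it cannot overwrite the first claim."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE tasks SET status = 'claimed', claimed_by = ? "
            "WHERE id = ? AND status = 'pending'",
            (node_id, task_id),
        )
    return cur.rowcount == 1
```

A version-number column is a common alternative when more than the status field must be guarded against concurrent writers.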
3.1 Circuit Breaker
Each N0D3 maintains a rolling window of its 20 most recent task outcomes. When the failure rate within this window exceeds 50%, the circuit breaker opens and the node is excluded from new task claims for a 120-second cooldown period. After the cooldown, the circuit enters a half-open state: the node is eligible to claim a single task, and the result determines whether the circuit closes (normal operation) or opens again.
This pattern prevents a degraded node — due to API rate limits, hardware load, or a failed dependency — from continuing to claim tasks it cannot complete, which would delay the overall queue and reduce apparent system throughput.
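The breaker logic can be sketched as a small state machine with closed, open, and half-open states. This is an illustration under stated assumptions: the failure rate is evaluated only once the window holds 20 outcomes, a successful half-open probe clears the window, and the clock is injectable for testing. None of these details are specified in the text above.

```python
import time
from collections import deque
from typing import Callable


class CircuitBreaker:
    """Rolling-window breaker: last 20 outcomes, opens above 50% failures, 120 s cooldown."""

    def __init__(self, window: int = 20, threshold: float = 0.5, cooldown: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.outcomes: deque = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.state = "closed"
        self.opened_at = 0.0
        self.probe_in_flight = False

    def can_claim(self) -> bool:
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown:
                return False
            self.state = "half_open"
        if self.state == "half_open":
            if self.probe_in_flight:
                return False  # half-open allows exactly one probe task
            self.probe_in_flight = True
        return True

    def record(self, success: bool) -> None:
        if self.state == "half_open":
            self.probe_in_flight = False
            if success:
                self.state = "closed"
                self.outcomes.clear()  # assumption: start a fresh window after recovery
            else:
                self._open()
            return
        self.outcomes.append(success)
        # Assumption: only evaluate the failure rate once the window is full.
        if len(self.outcomes) == self.outcomes.maxlen:
            failure_rate = self.outcomes.count(False) / len(self.outcomes)
            if failure_rate > self.threshold:
                self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
```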
3.2 Reputation Scoring (UCB)
Each agent maintains a reputation score updated after every task completion. The scoring function is derived from the upper-confidence-bound (UCB1) formula used in bandit problems [29]: it balances exploitation (routing to historically high-performing agents) with exploration (giving lower-ranked agents opportunities to demonstrate improvement). Task complexity and domain are factored into the score update — a failure on an out-of-domain task penalizes less than a failure on a core-capability task.
The dispatcher preferentially routes tasks to agents with higher reputation scores when multiple capable nodes are eligible. Reputation scores are stored in the database and persist across daemon restarts, so the mesh accumulates routing knowledge over its operational lifetime.
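A minimal UCB1 sketch, assuming rewards in [0, 1]. Each agent's score is its mean reward plus c * sqrt(2 ln N / n_i), where N is the total number of completed tasks and n_i the number handled by agent i. The domain adjustment is a hypothetical stand-in for the paper's weighting: an out-of-domain failure earns a partial reward of 0.5, so it lowers an agent's mean less than a core-capability failure does. Task complexity is not modeled here.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class AgentStats:
    pulls: int = 0
    reward_sum: float = 0.0


def update(stats: AgentStats, success: bool, in_domain: bool,
           out_of_domain_failure_reward: float = 0.5) -> None:
    # Hypothetical weighting: an out-of-domain failure still earns partial reward.
    if success:
        reward = 1.0
    else:
        reward = 0.0 if in_domain else out_of_domain_failure_reward
    stats.pulls += 1
    stats.reward_sum += reward


def ucb1_score(stats: AgentStats, total_pulls: int, c: float = 1.0) -> float:
    if stats.pulls == 0:
        return math.inf  # untried agents are explored first
    mean = stats.reward_sum / stats.pulls
    return mean + c * math.sqrt(2 * math.log(total_pulls) / stats.pulls)


def pick_agent(stats_by_agent: Dict[str, AgentStats]) -> str:
    total = sum(s.pulls for s in stats_by_agent.values())
    return max(stats_by_agent, key=lambda a: ucb1_score(stats_by_agent[a], total))
```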
3.3 Capability-Based Routing
Workers declare a set of capability tags at registration time (for example: code_review, research, summarization). Tasks similarly declare required capabilities. The dispatcher only broadcasts task_available events to nodes whose declared capabilities satisfy the task requirements. This eliminates spurious claim attempts from nodes that could not complete the task regardless, reducing contention on busy meshes.
The Plugin SDK extends the capability model to tool use. Workers that have registered a plugin (web search, file summarization, memory augmentation, or operator-defined custom tools) declare corresponding capability tags. Tasks that require tool use are routed only to plugin-capable nodes.
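Capability matching reduces to a subset test: a node is eligible when the task's required tags are a subset of its declared tags, and plugin-provided tools participate simply as additional tags. In this sketch, the node names and the web_search plugin tag are hypothetical.

```python
from typing import Dict, List, Set


def eligible_nodes(required: Set[str], nodes: Dict[str, Set[str]]) -> List[str]:
    """Return the nodes whose declared capabilities include every required tag."""
    return sorted(name for name, caps in nodes.items() if required <= caps)


if __name__ == "__main__":
    mesh = {
        "pi-01": {"summarization"},
        "desk-01": {"code_review", "research", "web_search"},  # web_search via a plugin
    }
    print(eligible_nodes({"research", "web_search"}, mesh))
```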
3.4 Task Dependencies and DAG Resolution
Tasks declare dependencies through a task_deps join table referencing parent task IDs. Before accepting a task into the dispatch queue, the Commander validates that the declared dependency graph contains no cycles using topological sort. Tasks that fail this check are rejected at submission time with an explicit error, preventing deadlock at runtime.
When a task completes, the Commander queries its dependency graph and automatically dispatches any child tasks whose remaining dependencies are now satisfied. This enables arbitrarily complex workflows — data collection feeding parallel analysis, parallel analysis feeding synthesis — to be encoded as task graphs and executed without orchestration code in the application layer.
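The cycle check and the completion-time fan-out can both be sketched over (parent, child) edge pairs, assumed here to be loaded from task_deps. Cycle detection uses Kahn's algorithm, one standard form of topological sort: if some node never reaches zero remaining in-degree, the graph contains a cycle.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (parent_task_id, child_task_id)


def has_cycle(edges: Iterable[Edge]) -> bool:
    """Kahn's algorithm: repeatedly remove zero in-degree nodes; leftovers mean a cycle."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed != len(nodes)


def ready_children(completed: int, edges: Iterable[Edge], done: Set[int]) -> List[int]:
    """Children of `completed` whose parents are all in `done` (which includes `completed`)."""
    edges = list(edges)
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return sorted({c for p, c in edges if p == completed and parents[c] <= done})
```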
Security Model
Encryption at Rest
Agent memory entries are encrypted at rest using Fernet symmetric encryption (AES-128-CBC with HMAC-SHA256 authentication). Key management uses the MultiFernet pattern: a primary encryption key and one or more retired keys are active simultaneously, allowing key rotation without decrypting and re-encrypting the entire memory corpus. All key material is derived from a master secret using PBKDF2-SHA256 with 100,000 iterations, providing resistance to offline brute-force attacks on exported key material. The plaintext key is never written to disk or logged.
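A sketch of the key derivation and rotation flow using the Python cryptography package, whose Fernet, MultiFernet, and PBKDF2HMAC classes correspond to the primitives named above. The master secret and the per-key salts are demo values and assumptions; the paper does not document how salts are chosen.

```python
import base64

from cryptography.fernet import Fernet, MultiFernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC


def derive_fernet_key(master_secret: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    """PBKDF2-SHA256 -> 32 raw bytes -> url-safe base64, the key format Fernet expects."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=iterations)
    return base64.urlsafe_b64encode(kdf.derive(master_secret))


# Demo values only: in deployment the master secret comes from an environment
# variable, and these per-generation salts are hypothetical.
MASTER = b"demo-master-secret"
old_key = derive_fernet_key(MASTER, b"m3shd-key-v1")
new_key = derive_fernet_key(MASTER, b"m3shd-key-v2")

# MultiFernet encrypts with the first key and tries every key when decrypting,
# so entries written under the retired key remain readable after rotation.
memory_cipher = MultiFernet([Fernet(new_key), Fernet(old_key)])

legacy_entry = Fernet(old_key).encrypt(b"Commander runs on a Hetzner VPS")
assert memory_cipher.decrypt(legacy_entry) == b"Commander runs on a Hetzner VPS"

# rotate() re-encrypts one token under the primary key, so re-keying can happen
# lazily (for example on read) instead of as a single bulk pass.
rotated_entry = memory_cipher.rotate(legacy_entry)
```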
Authentication and Key Derivation
Agent API keys are stored as hash:salt pairs derived via PBKDF2-SHA256. The plaintext key is shown exactly once, at agent creation; the Commander retains only the hash and salt, and each key presented in a later request is verified against them. JWT sessions use HS256 with short-lived access tokens and refresh token rotation. All secrets are injected at runtime via environment variables, so no credentials appear in source code or container images.
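A standard-library sketch of hash:salt API key storage and verification. It assumes hex encoding, a 16-byte random salt, and the same 100,000-iteration count used for memory keys; none of these parameters are stated for API keys, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 100_000  # assumption: mirrors the memory-key iteration count


def create_api_key() -> Tuple[str, str]:
    """Return (plaintext key shown once, stored 'hash:salt' string)."""
    key = secrets.token_urlsafe(32)
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", key.encode("utf-8"), salt, ITERATIONS)
    return key, f"{digest.hex()}:{salt.hex()}"


def verify_api_key(presented: str, stored: str) -> bool:
    digest_hex, salt_hex = stored.split(":")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", presented.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(candidate.hex(), digest_hex)
```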
RBAC — Role-Based Access Control
The permission system defines 17 discrete permissions covering task submission, task claiming, memory read/write, plugin invocation, agent administration, audit log access, and federation relay. Four role presets are provided as starting points: worker (claim and execute tasks), mobile (API-only task interaction, no shell access), commander (full task management, no admin), and admin (all permissions). Custom role configurations can combine any subset of the 17 permissions.
A critical security property of the RBAC implementation: an agent with an empty or unrecognized permission set receives zero access — not default access. This empty-deny default means misconfigured agents fail closed, preventing accidental capability grants through configuration errors.
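The empty-deny property can be expressed directly in the permission check. The permission names below are hypothetical and only an illustrative subset of the 17; the role contents are likewise assumptions based on the preset descriptions above.

```python
from typing import Optional

# Illustrative subset with hypothetical names; the paper does not list all 17.
ALL_PERMISSIONS = frozenset({
    "task.submit", "task.read", "task.claim", "task.cancel",
    "memory.read", "memory.write", "plugin.invoke",
    "agent.admin", "audit.read", "federation.relay",
})

ROLE_PRESETS = {
    "worker": frozenset({"task.read", "task.claim", "memory.read", "memory.write", "plugin.invoke"}),
    "mobile": frozenset({"task.submit", "task.read"}),
    "commander": frozenset({"task.submit", "task.read", "task.claim", "task.cancel", "memory.read"}),
    "admin": ALL_PERMISSIONS,
}


def permissions_for(role: str) -> frozenset:
    # Unrecognized role names resolve to the empty set, never to a default role.
    return ROLE_PRESETS.get(role, frozenset())


def is_allowed(granted: Optional[frozenset], permission: str) -> bool:
    """Empty-deny check: None or an empty set grants nothing, and names outside
    the known permission set are never honored."""
    if not granted:
        return False
    return permission in ALL_PERMISSIONS and permission in granted
```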
Per-Agent Token Budgets
Each agent is assigned monthly, daily, and per-task token budget limits. The budget guard evaluates these limits before the dispatch loop forwards a task to a node. Tasks that would exceed any budget threshold are not dispatched to that agent — they remain in the queue for dispatch to another eligible node or are returned to the submitter if no eligible node has budget remaining. Budget consumption is tracked in the database and reported in the audit log.
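A sketch of the budget guard, assuming the dispatcher has a token estimate for each task before forwarding it (how that estimate is produced is not described here). The Budget fields and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Budget:
    monthly_limit: int
    daily_limit: int
    per_task_limit: int
    used_this_month: int = 0
    used_today: int = 0


def within_budget(b: Budget, estimated_tokens: int) -> bool:
    """True only if the task fits under all three limits."""
    return (
        estimated_tokens <= b.per_task_limit
        and b.used_today + estimated_tokens <= b.daily_limit
        and b.used_this_month + estimated_tokens <= b.monthly_limit
    )


def pick_budgeted_agent(candidates: List[str], budgets: Dict[str, Budget],
                        estimated_tokens: int) -> Optional[str]:
    """First eligible candidate with budget remaining, or None so the task
    can be returned to the submitter."""
    for agent in candidates:
        if within_budget(budgets[agent], estimated_tokens):
            return agent
    return None
```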
Transport Security and Headers
All Commander traffic is TLS 1.3 via Caddy with automatic ACME certificate management. Desktop nodes additionally communicate over Tailscale for mesh-internal traffic. The iMessage bridge operates entirely locally — message content does not leave the host without explicit Commander submission. SecurityHeadersMiddleware injects Content-Security-Policy (strict, nonce-based for inline scripts), Strict-Transport-Security (max-age=31536000, includeSubDomains), X-Frame-Options: DENY, X-Content-Type-Options: nosniff, and Referrer-Policy: same-origin on every response.
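A pure-ASGI sketch of a headers middleware that appends the listed headers to each HTTP response-start message. The exact CSP directives are assumptions beyond the paper's "strict, nonce-based" description, and storing the per-response nonce in scope["state"] is one possible way to expose it to templates (Starlette's request.state reads from that dictionary).

```python
import asyncio
import secrets


class SecurityHeadersMiddleware:
    """Pure-ASGI sketch: append security headers to every HTTP response."""

    def __init__(self, app) -> None:
        self.app = app

    async def __call__(self, scope, receive, send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)  # lifespan/websocket pass through untouched
            return
        nonce = secrets.token_urlsafe(16)
        # Expose the nonce so templates can render <script nonce="...">.
        scope.setdefault("state", {})["csp_nonce"] = nonce
        csp = f"default-src 'self'; script-src 'self' 'nonce-{nonce}'; object-src 'none'; base-uri 'none'"
        extra = [
            (b"content-security-policy", csp.encode("ascii")),
            (b"strict-transport-security", b"max-age=31536000; includeSubDomains"),
            (b"x-frame-options", b"DENY"),
            (b"x-content-type-options", b"nosniff"),
            (b"referrer-policy", b"same-origin"),
        ]

        async def send_with_headers(message) -> None:
            if message["type"] == "http.response.start":
                message = {**message, "headers": [*message.get("headers", []), *extra]}
            await send(message)

        await self.app(scope, receive, send_with_headers)


if __name__ == "__main__":
    async def hello(scope, receive, send):
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"hello"})

    async def receive():
        return {"type": "http.request", "body": b""}

    async def show(message):
        print(message["type"], message.get("headers", ""))

    asyncio.run(SecurityHeadersMiddleware(hello)({"type": "http"}, receive, show))
```

With FastAPI, a class of this shape can be registered through app.add_middleware(SecurityHeadersMiddleware).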
Input Validation
Path parameters are validated against ^[a-zA-Z0-9_.-]{1,64}$ before any database access; the dot is included to support agent identifiers that follow dot-notation conventions. All request bodies are validated by Pydantic V2 models enforcing field types, ranges, and format constraints. All SQL uses parameterized queries; no query in the codebase is built by interpolating values into SQL strings.
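A sketch of the path-parameter check and a parameterized lookup. It uses re.fullmatch rather than re.match with ^...$, because in Python's re module $ also matches just before a trailing newline, so "abc\n" would pass an anchored match. The agents table layout and function names are hypothetical.

```python
import re
import sqlite3

# Same character class and length as the pattern in the text; fullmatch anchors both ends.
PATH_PARAM = re.compile(r"[a-zA-Z0-9_.-]{1,64}")


def validate_path_param(value: str) -> str:
    if PATH_PARAM.fullmatch(value) is None:
        raise ValueError(f"invalid identifier: {value!r}")
    return value


def get_agent(conn: sqlite3.Connection, agent_id: str):
    agent_id = validate_path_param(agent_id)
    # Parameterized: the driver binds agent_id, so it is never spliced into the SQL text.
    return conn.execute("SELECT id, name FROM agents WHERE name = ?", (agent_id,)).fetchone()
```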
Audit Logging
Every state-changing operation — task status transitions, agent configuration changes, permission grants, plugin invocations, federation relays — writes a row to the audit_log table. Each row records: timestamp, agent_id, action, resource_type, resource_id, before/after state snapshot (JSON), and a provenance chain hash linking to the previous audit entry. The table is append-only; no UPDATE or DELETE operations are permitted on it. The hash chain allows external verification that the audit log has not been retroactively modified.
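A minimal sketch of a hash-chained, append-only audit table in SQLite. Column names follow the fields listed above, but the canonical JSON encoding, SHA-256 chaining, all-zero genesis hash, and trigger-based blocking of UPDATE and DELETE are assumptions about one way to realize the design.

```python
import hashlib
import json
import sqlite3

SCHEMA = """
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    ts TEXT NOT NULL, agent_id TEXT NOT NULL, action TEXT NOT NULL,
    resource_type TEXT NOT NULL, resource_id TEXT NOT NULL,
    snapshot TEXT NOT NULL,  -- JSON {"before": ..., "after": ...}
    prev_hash TEXT NOT NULL, row_hash TEXT NOT NULL
);
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
"""

GENESIS_HASH = "0" * 64
COLUMNS = ("ts", "agent_id", "action", "resource_type", "resource_id", "snapshot")


def _chain_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so verification recomputes identical bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(conn, ts, agent_id, action, resource_type, resource_id, before, after) -> str:
    record = {"ts": ts, "agent_id": agent_id, "action": action,
              "resource_type": resource_type, "resource_id": resource_id,
              "snapshot": json.dumps({"before": before, "after": after}, sort_keys=True)}
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading the chain head
    try:
        head = conn.execute("SELECT row_hash FROM audit_log ORDER BY id DESC LIMIT 1").fetchone()
        prev_hash = head[0] if head else GENESIS_HASH
        row_hash = _chain_hash(prev_hash, record)
        conn.execute(
            "INSERT INTO audit_log (ts, agent_id, action, resource_type, resource_id,"
            " snapshot, prev_hash, row_hash) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (*(record[c] for c in COLUMNS), prev_hash, row_hash),
        )
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    return row_hash


def verify_chain(conn) -> bool:
    """Recompute every link. An edited row, or one removed from the middle, breaks
    the chain; truncating the tail is only detectable against an external head hash."""
    expected_prev = GENESIS_HASH
    rows = conn.execute(
        "SELECT ts, agent_id, action, resource_type, resource_id, snapshot,"
        " prev_hash, row_hash FROM audit_log ORDER BY id"
    )
    for *values, prev_hash, row_hash in rows:
        record = dict(zip(COLUMNS, values))
        if prev_hash != expected_prev or _chain_hash(prev_hash, record) != row_hash:
            return False
        expected_prev = row_hash
    return True
```

The triggers stop ordinary writes, while the chain catches modifications made by anyone able to bypass them, provided a recent head hash is kept outside the database for comparison.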
Security Audit History
Agent Intelligence
M3SHD implements a set of patterns from the multi-agent research literature that operate above the model layer — they improve outcomes without modifying the underlying language model or requiring fine-tuning.
Encrypted Memory with FTS5 Search
The agent_memory table functions as a shared world model. Agents read from memory at task start and write to it by emitting structured [REMEMBER] markers in their output, which the daemon intercepts and stores. Memory entries are encrypted with Fernet before storage, and their decrypted content is indexed in an FTS5 virtual table, enabling full-text keyword search without storing plaintext in the main table. Because an FTS5 index stores the tokenized terms themselves, the index is not covered by the Fernet encryption. A nightly consolidation sweep identifies contradictory entries, merges duplicates, and flags low-confidence memories for review. Memory persists across daemon restarts, model upgrades, and node replacements.
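A sketch of marker interception, assuming one memory per line in the form [REMEMBER] <text> and assuming the daemon strips marker lines from the delivered output; the exact marker grammar is not specified in the text above, and split_memories is a hypothetical name.

```python
import re
from typing import List, Tuple

# Hypothetical marker grammar: one memory per line, "[REMEMBER] <text>".
REMEMBER_LINE = re.compile(r"\s*\[REMEMBER\]\s*(.+?)\s*$")


def split_memories(output: str) -> Tuple[str, List[str]]:
    """Return (output without marker lines, memory texts to encrypt and store)."""
    kept, memories = [], []
    for line in output.splitlines():
        match = REMEMBER_LINE.match(line)
        if match:
            memories.append(match.group(1))
        else:
            kept.append(line)
    return "\n".join(kept), memories
```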
Confidence-Based Verification Routing
Every agent output includes a structured confidence score in the range [0.0, 1.0]. The daemon parses this score before posting the result to the Commander. Outputs scoring below 0.7 are automatically routed to a second eligible agent for independent verification before delivery to the task submitter. The verification agent receives the original task, the first agent's output, and a prompt instructing it to identify errors rather than produce a new answer. This operationalizes the metacognitive verification model described by Steyvers and Peters [33] without requiring changes to the base language model.
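A sketch of the routing decision, assuming the score appears in the output as a line such as CONFIDENCE: 0.82 (a hypothetical format). Treating a missing or out-of-range score as low confidence is a fail-closed choice made here, not one the paper states.

```python
import re
from typing import Optional

CONFIDENCE_RE = re.compile(r"CONFIDENCE:\s*([01](?:\.\d+)?)\b")  # hypothetical format
THRESHOLD = 0.7


def parse_confidence(output: str) -> Optional[float]:
    match = CONFIDENCE_RE.search(output)
    if match is None:
        return None
    value = float(match.group(1))
    return value if 0.0 <= value <= 1.0 else None


def needs_verification(output: str) -> bool:
    score = parse_confidence(output)
    # Missing or malformed scores are treated as low confidence (fail closed).
    return score is None or score < THRESHOLD


def verification_prompt(task: str, first_output: str) -> str:
    return (f"Original task:\n{task}\n\nFirst agent's answer:\n{first_output}\n\n"
            "Identify any errors in the answer above. Do not write a new answer.")
```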
Self-Evolving Agent Configurations
A background evolution process runs on a configurable schedule over a rolling 7-day performance window. It aggregates task outcome data per agent — success rate, confidence distribution, verification trigger rate, operator corrections — and submits this data to Claude with a prompt requesting specific, actionable amendments to the agent's system prompt and configuration parameters. The proposed changes are stored in the agent_evolution table for operator review. Accepted proposals are applied to the agent configuration; rejected proposals are logged with the rejection reason for future analysis.
Debate Protocol
Tasks submitted with the debate: true flag are dispatched to three independent agents simultaneously. Each agent produces a complete answer without access to the others' outputs. A designated moderator agent then receives all three answers and identifies points of agreement, points of disagreement, and the relative strength of the arguments. The moderator synthesizes a final answer from the convergent positions and logs the full debate transcript. The debate protocol is implemented as described in the multi-agent debate framework [32] and is most effective on factual and analytical tasks where answer correctness can be evaluated.
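The fan-out and moderation steps can be sketched with a thread pool, treating each agent as a callable from task text to answer. The callables stand in for real dispatches, and run_debate is a hypothetical name; in the test, the moderator is a trivial majority-vote stub rather than the LLM synthesis the protocol describes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Debater = Callable[[str], str]
Moderator = Callable[[str, List[str]], str]


def run_debate(task: str, debaters: List[Debater], moderator: Moderator) -> dict:
    """Collect independent answers in parallel, then hand all of them to a moderator.

    Returns a transcript dict suitable for logging."""
    if len(debaters) != 3:
        raise ValueError("this protocol uses exactly three debaters")
    with ThreadPoolExecutor(max_workers=len(debaters)) as pool:
        # Each debater receives only the task text, never another debater's answer.
        answers = list(pool.map(lambda debater: debater(task), debaters))
    final = moderator(task, answers)
    return {"task": task, "answers": answers, "final": final}
```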
Collective Voting
Configuration changes — model tier assignments, prompt amendments, plugin additions, federation policy changes — are submitted as proposals to the votes table. Agents cast weighted votes, with vote weight proportional to the agent's current reputation score. The operator sees the aggregated result — vote distribution, reasoning from supporting and opposing agents — and applies or rejects the proposal. This creates a governance layer that scales with mesh size without requiring operator involvement in routine configuration decisions.
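A sketch of the reputation-weighted tally presented to the operator. The dictionary shapes and the tally name are hypothetical; the result is advisory, matching the text above, since the operator still applies or rejects the proposal.

```python
from typing import Dict


def tally(votes: Dict[str, bool], reputation: Dict[str, float]) -> dict:
    """votes maps agent_id -> True (support) or False (oppose); weight = current reputation."""
    support = sum(reputation.get(agent, 0.0) for agent, vote in votes.items() if vote)
    oppose = sum(reputation.get(agent, 0.0) for agent, vote in votes.items() if not vote)
    total = support + oppose
    return {
        "support": support,
        "oppose": oppose,
        "support_share": support / total if total else 0.0,
    }
```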
Model Routing
Task complexity is classified at dispatch time using a rule-based classifier that analyzes prompt structure, required output format, declared capabilities, and historical success rates per model tier. Simple tasks are directed to lighter model tiers; complex or high-stakes tasks are directed to more capable tiers. The classification rules are explicit and editable — operators can inspect and adjust the routing logic without retraining a model. The routing decision and the rationale are recorded in the audit log for each dispatched task.
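A sketch of an explicit, editable rule set of the kind described above. The tier names (light, standard, heavy), the 4,000-character prompt threshold, and the 0.8 success-rate escalation rule are all hypothetical; the function returns a rationale string so the decision can be written to the audit log.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass
class TaskFeatures:
    prompt: str
    output_format: str            # e.g. "text", "json", "code"
    capabilities: FrozenSet[str]
    high_stakes: bool = False


def classify(task: TaskFeatures, tier_success: Dict[str, float]) -> Tuple[str, str]:
    """Return (tier, rationale). All thresholds here are illustrative."""
    if task.high_stakes:
        return "heavy", "flagged high-stakes"
    if task.output_format == "code" or "code_review" in task.capabilities:
        tier, why = "standard", "code output or code_review capability"
    elif len(task.prompt) > 4000:
        tier, why = "standard", "long prompt"
    else:
        tier, why = "light", "short prompt, plain output"
    # Escalate one tier when this tier's historical success rate is poor.
    if tier_success.get(tier, 1.0) < 0.8:
        escalated = {"light": "standard", "standard": "heavy"}[tier]
        return escalated, f"{why}; escalated: {tier} success rate below 0.8"
    return tier, why
```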
Hardware Deployment
Pi 5 Mesh Nodes
Physical mesh nodes run on Raspberry Pi 5 hardware. The Pi 5 runs concurrent daemon processes comfortably, idles at under 5 W, and has a small physical footprint, which makes it well suited to an always-on node. Each node boots from a prepared SD card image containing the mesh daemon, its Python dependencies, Tailscale, and a systemd service unit that starts the daemon on boot and restarts it on failure.
N0D3 Enclosures
Physical nodes are housed in the N0D3 enclosure — a custom 3D-printed case designed specifically for the Pi 5 form factor. The enclosure provides structured airflow for thermal management, a recessed panel for cable management, and a mounting pattern compatible with standard rack and wall-mount hardware. N0D3 enclosures are printed in PETG for thermal stability.
Provisioning
Node provisioning is scripted end-to-end: SD card imaging, first-boot configuration, Tailscale authentication, daemon installation, and Commander registration are all driven by a single provisioning script. A new physical node can be added to an existing mesh without manual SSH configuration steps. The Commander validates the node's registration credentials and capability declaration before it begins receiving task dispatch events.
Network Topology
Pi 5 N0D3S and desktop N0D3S communicate with the Commander over Tailscale, providing encrypted mesh-internal traffic without requiring port forwarding or VPN gateway configuration. Mobile N0D3S (N0D3 Flutter app) communicate directly with the Commander's public HTTPS endpoint. The Commander itself is exposed through Caddy with automatic TLS, with no direct exposure of the FastAPI process to the public internet.
Comparison with Existing Frameworks
The table below compares M3SHD against five leading open-source multi-agent frameworks across fifteen capability dimensions. Ratings reflect publicly documented features as of April 2026.
| Capability | M3SHD | AutoGPT | CrewAI | LangGraph | MetaGPT | AutoGen |
|---|---|---|---|---|---|---|
| Hub-and-spoke topology | YES | NO | NO | PARTIAL | NO | PARTIAL |
| Heterogeneous hardware nodes | YES | NO | NO | NO | NO | NO |
| Mobile device as compute node | YES | NO | NO | NO | NO | NO |
| iMessage natural language interface | YES | NO | NO | NO | NO | NO |
| Circuit breaker with rolling window | YES | NO | NO | NO | NO | NO |
| Capability-based task routing | YES | NO | PARTIAL | PARTIAL | PARTIAL | PARTIAL |
| DAG task dependency resolution | YES | NO | PARTIAL | YES | NO | PARTIAL |
| Agent reputation scoring (UCB) | YES | NO | NO | NO | NO | NO |
| Encrypted agent memory (FTS5) | YES | PARTIAL | PARTIAL | PARTIAL | NO | PARTIAL |
| Confidence-based verification routing | YES | NO | NO | NO | PARTIAL | NO |
| Self-evolving agent configurations | YES | NO | NO | NO | NO | NO |
| Debate protocol | YES | NO | NO | NO | NO | YES |
| Granular RBAC (17 permissions) | YES | NO | NO | NO | NO | NO |
| Per-agent token budgets | YES | NO | NO | NO | NO | NO |
| Cross-mesh federation with hop limits | YES | NO | NO | NO | NO | NO |
M3SHD achieves YES across all 15 dimensions. AutoGen and LangGraph are the nearest competitors, each with 1 YES rating: AutoGen for the debate protocol (plus 4 PARTIAL: hub topology, capability routing, DAG dependencies, and agent memory) and LangGraph for DAG dependencies (plus 3 PARTIAL: hub topology, capability routing, and agent memory). The remaining frameworks score 0 YES across these dimensions.
Future Work
- Android worker client — The Flutter codebase is cross-platform; an Android build of the N0D3 app would extend the mobile compute pool to the largest smartphone market by volume without requiring a separate implementation.
- Learned task router — The current model tier classifier is rule-based and explicit. A lightweight classifier trained on historical task outcomes per model tier could improve routing precision, particularly for tasks that fall ambiguously between tiers.
- Federated memory consolidation — The nightly consolidation sweep currently operates within a single hub's memory corpus. Extending it across federated meshes, with RBAC-governed read access, would allow knowledge to propagate between M3SHD deployments.
- Plugin SDK documentation and registry — The plugin interface is implemented and functional but not yet formally documented. A published plugin registry and development guide would allow operators to share plugins across deployments.
- Benchmark harness publication — The internal test suite covers dispatch, security, and intelligence layer behaviors. Publishing this as a standalone evaluation harness would enable objective comparison against frameworks that add similar capabilities in future releases.
References
- 1. Anthropic. Claude 3.7 Sonnet System Card — Anthropic, February 2025. anthropic.com/claude/sonnet
- 2. Anthropic / Block. Model Context Protocol Specification — MCP Spec, November 2024. modelcontextprotocol.io/specification
- 3. Wang et al. A Survey on Large Language Model based Autonomous Agents — arXiv 2308.11432. arxiv.org/abs/2308.11432
- 4. Xi et al. The Rise and Potential of LLM-Based Agents: A Survey — arXiv 2309.07864. arxiv.org/abs/2309.07864
- 5. Park et al. Generative Agents: Interactive Simulacra of Human Behavior — CHI 2023, arXiv 2304.03442. arxiv.org/abs/2304.03442
- 6. Hong et al. MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework — arXiv 2308.00352. arxiv.org/abs/2308.00352
- 7. Wu et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation — arXiv 2308.08155. arxiv.org/abs/2308.08155
- 8. Chase. LangChain — GitHub. github.com/langchain-ai/langchain
- 9. Joao Moura. CrewAI — GitHub. github.com/joaomdmoura/crewAI
- 10. Significant Gravitas. AutoGPT — GitHub. github.com/Significant-Gravitas/AutoGPT
- 11. FastAPI — tiangolo.com. fastapi.tiangolo.com
- 12. SQLite WAL Mode Documentation — SQLite.org. sqlite.org/wal.html
- 13. Litestream — litestream.io. litestream.io
- 14. MemoryOS: A Memory-Based Operating System for AI Agents — arXiv 2506.06326. arxiv.org/abs/2506.06326
- 15. A-MEM: Agentic Memory for LLM Agents — arXiv 2502.12110, February 2025. arxiv.org/abs/2502.12110
- 16. Tulving. Episodic and Semantic Memory — Academic Press, 1972.
- 17. OWASP Top Ten 2021 — OWASP Foundation. owasp.org/Top10
- 18. PBKDF2 — NIST SP 800-132. csrc.nist.gov
- 19. Riverpod — riverpod.dev. riverpod.dev
- 20. GoRouter — pub.dev/packages/go_router. pub.dev/packages/go_router
- 21. Firebase Cloud Messaging — Google. firebase.google.com/docs/cloud-messaging
- 22. Tailscale — tailscale.com. tailscale.com
- 23. Caddy Server — caddyserver.com. caddyserver.com
- 24. Cloudflare R2 — cloudflare.com/developer-platform/r2. cloudflare.com
- 25. Dynamic Scheduling in Heterogeneous Computing Environments — IEEE TC, March 2025.
- 26. LoRASA: Low-Rank Agent-Specific Adaptation — arXiv 2502.05573, February 2025. arxiv.org/abs/2502.05573
- 27. TT-LoRA MoE: Unifying PEFT and Sparse Mixture-of-Experts — arXiv 2504.21190. arxiv.org/abs/2504.21190
- 28. AgentNet: Decentralized Evolutionary Coordination — arXiv 2504.00587. arxiv.org/abs/2504.00587
- 29. DRF: LLM-AGENT Dynamic Reputation Filtering — arXiv 2509.05764. arxiv.org/abs/2509.05764
- 30. MasRouter: Learning to Route LLMs for Multi-Agent Systems — arXiv 2502.11133. arxiv.org/abs/2502.11133
- 31. Decentralized Adaptive Task Allocation — Nature Scientific Reports, February 2025. nature.com
- 32. Multi-Agent Debate: Can LLM Agents Really Debate? — arXiv 2511.07784. arxiv.org/abs/2511.07784
- 33. Metacognition and Uncertainty in Humans and LLMs — Steyvers & Peters, Psychological Science 2025. journals.sagepub.com
- 34. Emergent Coordination in Multi-Agent Language Models — arXiv 2510.05174. arxiv.org/abs/2510.05174
- 35. Evaluating Theory of Mind in LLM-Based Multi-Agent Systems — arXiv 2603.00142. arxiv.org/abs/2603.00142
- 36. Multi-Agent Evolve: LLM Self-Improve through Co-evolution — arXiv 2510.23595. arxiv.org/abs/2510.23595
- 37. State of Agentic iOS Engineering in 2026 — Dimillian/Medium. dimillian.medium.com
- 38. Microsoft Agent Governance Toolkit — April 2026. opensource.microsoft.com
- 39. AI Agent Index 2025 — arXiv 2602.17753. arxiv.org/html/2602.17753v1
- 40. Claude Agent SDK Overview — Anthropic. platform.claude.com
- 41. AutoGen GitHub — Microsoft. github.com/microsoft/autogen
- 42. CrewAI Concepts — Official Docs. docs.crewai.com
- 43. LangGraph Overview — LangChain. langchain.com/langgraph
- 44. Matrix: Meta Multi-Agent for Synthetic Data — arXiv 2511.21686. arxiv.org/abs/2511.21686
- 45. Galileo Human-in-the-Loop Guide. galileo.ai
- 46. Agent Budget Guard MCP Server — earezki.com. earezki.com
- 47. Cryptographic Verifiability of End-to-End AI Pipelines — IWSPA 2025 (ACM), arXiv 2503.22573. arxiv.org/html/2503.22573v1
- 48. Towards AGI: Self-Evolving Agent — arXiv 2601.11658. arxiv.org/abs/2601.11658
- 49. MiRA: Subgoal-driven Framework for Long-Horizon LLM Agents — arXiv 2603.19685. arxiv.org/abs/2603.19685
- 50. Orchestrating Confidence-Aware Routing for Multi-Agent Collaboration — arXiv 2601.04861. arxiv.org/abs/2601.04861
Demo: mesh.demobygrit.com (public read-only dashboard) | Contact: fred@gritwerk.com