Whitepaper — April 2026

M3SHD

A Hub-and-Spoke Multi-Agent AI Mesh: Architecture, Dispatch, and Security

● Fred Wojo / GritWerk ● fredwojo@gmail.com ● mesh.demobygrit.com

Abstract

We present M3SHD, a hub-and-spoke multi-agent AI collaboration mesh connecting heterogeneous compute nodes — desktop machines, a VPS hub, and consumer smartphones — through a central FastAPI server with SQLite WAL storage and SSE event bus. The system addresses four persistent gaps in contemporary multi-agent frameworks: reliance on cloud-only infrastructure, absence of execution auditability, static role assignment, and lack of an intelligence layer above the model. M3SHD's dispatch engine combines a rolling-window circuit breaker, UCB-based agent reputation scoring, capability-based task routing, and DAG dependency resolution to allocate work across nodes with no central scheduler. The security model layers Fernet encryption with MultiFernet key rotation, PBKDF2-SHA256 key derivation, a granular RBAC system with 17 discrete permissions, per-agent token budgets, and an append-only audit log. The agent intelligence layer provides FTS5-indexed encrypted memory, confidence-triggered verification routing, self-evolving configuration proposals, and a structured debate protocol for high-stakes outputs. Physical deployment runs on Raspberry Pi 5 mesh nodes housed in custom 3D-printed N0D3 enclosures with automated provisioning.

Section 01

Introduction

Multi-agent AI frameworks have matured rapidly, but most share a common assumption: agents run within a single trust boundary, on homogeneous infrastructure, managed by a central orchestrator with full visibility into all nodes. This assumption simplifies coordination but creates practical constraints that limit real-world deployment.

Four gaps persist across the leading open-source frameworks (AutoGPT, CrewAI, LangGraph, MetaGPT, AutoGen):

Cloud-only architectures lock compute behind proprietary APIs, creating cost and privacy barriers. Operators cannot use hardware they already own — desktop workstations, spare mobile devices, or edge nodes.
Opaque execution provides no audit trail. When a task fails or produces incorrect output, there is no evidence linking the output to the specific agent, model call, and inputs that produced it. This rules out compliance use cases and makes failures difficult to reproduce and debug.
Static role assignment wastes capacity. Frameworks assign roles at initialization; agents idle when their specialty is not required rather than claiming available work based on declared capabilities and current load.
No intelligence layer above the model. Confidence scoring, adaptive routing, reputation management, and self-improvement must be rebuilt from scratch by every application. Frameworks provide no primitives for these concerns.

M3SHD was designed to address all four gaps simultaneously. The design prioritizes deployability on hardware the operator already owns, cryptographic accountability for every execution, dynamic work allocation based on live node state, and an intelligence layer that improves agent performance without operator intervention. The following sections document the resulting architecture.

Design principle: The mesh should behave like a team, not a pipeline. Tasks route to the right specialist automatically. Agents that underperform are penalized in routing decisions. The system can debate its own conclusions before delivering them. Shared memory grows continuously. Agent configurations evolve based on observed outcomes.

Section 02

Architecture

M3SHD uses a hub-and-spoke topology. A central Commander server coordinates task dispatch, event routing, and persistence. N0D3S connect outbound via HTTPS and SSE; the Commander never initiates connections to nodes. Because every connection originates from the node, nodes behind NAT, CGNAT, or restrictive firewalls can join without port forwarding, and no VPN is required for basic connectivity. Desktop agents additionally use Tailscale to isolate mesh-internal traffic.

2.1 Commander Server

The Commander runs FastAPI with SQLite in WAL mode on a Hetzner VPS. All state is persisted to SQLite with explicit PRAGMA tuning: WAL mode with a 10MB checkpoint threshold, 10,000-page cache, 256MB mmap, and synchronous NORMAL for the write-performance-to-durability tradeoff appropriate for this workload. Litestream replicates WAL frames to Cloudflare R2 continuously, providing sub-second recovery point objective without requiring a secondary database process.
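
The PRAGMA tuning above can be sketched with Python's built-in sqlite3 module. This is an illustration only: the function name is hypothetical, the 4096-byte page size is assumed, and mapping the "10MB checkpoint threshold" onto wal_autocheckpoint (which is expressed in pages) is an assumption rather than something the Commander source confirms.

```python
import os
import sqlite3
import tempfile

PAGE_SIZE = 4096  # assumption: SQLite's usual default page size


def open_commander_db(path: str) -> sqlite3.Connection:
    """Open a database file and apply the tuning described in the text."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    # wal_autocheckpoint counts pages: 10 MB / 4 KB = 2560 pages (assumed mapping).
    conn.execute(f"PRAGMA wal_autocheckpoint={10 * 1024 * 1024 // PAGE_SIZE}")
    # A positive cache_size is a page count; a negative value would mean KiB.
    conn.execute("PRAGMA cache_size=10000")
    conn.execute(f"PRAGMA mmap_size={256 * 1024 * 1024}")
    return conn


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "commander.db")
    db = open_commander_db(db_path)
    print(db.execute("PRAGMA journal_mode").fetchone()[0])
```

WAL mode requires a file-backed database; an in-memory database reports its journal mode as "memory" instead.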

The Commander exposes a REST API for task submission, agent management, and state queries, and an SSE event bus at /events/stream that all N0D3S subscribe to for real-time dispatch signals.

2.2 N0D3S

Desktop N0D3S run the mesh daemon — a Python process that maintains a persistent SSE connection to the Commander, claims tasks via atomic SQL transactions, executes them by invoking the Claude CLI as a subprocess (not through a direct API wrapper), and posts results back to the Commander. Subprocess execution allows the daemon to inherit the operator's Claude authentication context and apply per-task system prompt overrides without maintaining API state in the daemon process itself.

Each node registers its slot capacity at connection time. The daemon implements exponential backoff on network failure and graceful task draining before shutdown, ensuring in-progress work completes before the node disconnects.
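
The reconnect behavior can be illustrated with a small delay calculator. The base delay, cap, and jitter parameters below are assumed values for illustration; the source does not state which ones the daemon uses.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: float = 0.0, rng=random.random) -> float:
    """Delay in seconds before reconnect attempt number `attempt` (0-based).

    The delay doubles on each attempt up to `cap`. `jitter` adds up to that
    fraction of the delay at random so nodes do not reconnect in lockstep.
    """
    exponent = min(attempt, 32)  # bound the exponent to keep the arithmetic small
    delay = min(cap, base * (2 ** exponent))
    return delay + delay * jitter * rng()


if __name__ == "__main__":
    for n in range(8):
        print(n, backoff_delay(n))
```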

2.3 Mobile N0D3S

The N0D3 Flutter application allows iOS devices to function as compute nodes. The mobile N0D3 communicates with the Commander exclusively through the REST API and SSE stream — it has no shell access and executes no local processes. Claude API calls are made directly from the app using the Anthropic Dart SDK with real SSE streaming for token delivery. The app maintains an offline task queue for network interruptions and performs graceful task handoff to other available nodes when the device disconnects mid-execution.

2.4 iMessage Bridge

The iMessage bridge is a multithreaded Python daemon running on a macOS host. One thread polls chat.db via SQLite for new messages from an allowlisted set of senders. A second thread classifies message intent and submits tasks to the Commander REST API. A third thread monitors task completion events from the SSE stream. A fourth thread delivers replies via osascript. Magic byte verification prevents binary payloads from being interpreted as task text. The result is that any iMessage client — phone, Mac, iPad — can submit and monitor mesh tasks through natural language.

2.5 Database Layer

All Commander state is stored in a single SQLite database. The schema covers tasks, agents, agent memory, task dependencies, task templates, plugins, plugin call logs, agent evolution proposals, federation routes, reputation scores, debate votes, and an append-only audit log. INTEGER primary keys are used throughout. Timestamps are stored as ISO-8601 TEXT in UTC. The schema is versioned with migration scripts that use PRAGMA table_info to guard idempotently against re-running completed migrations.
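
An idempotent migration guard of the kind described above might look like the following sketch. The priority column and the function names are hypothetical. The table-valued form pragma_table_info is used here so that the table name can be bound as a parameter rather than interpolated.

```python
import sqlite3


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` already has a column named `column`."""
    rows = conn.execute("SELECT name FROM pragma_table_info(?)", (table,))
    return any(name == column for (name,) in rows)


def migrate_add_priority(conn: sqlite3.Connection) -> bool:
    """Hypothetical migration: add tasks.priority unless it already exists.

    Returns True when the migration ran, False when it was skipped.
    """
    if has_column(conn, "tasks", "priority"):
        return False  # already applied; re-running is a no-op
    conn.execute("ALTER TABLE tasks ADD COLUMN priority INTEGER NOT NULL DEFAULT 0")
    conn.commit()
    return True


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
    print(migrate_add_priority(db), migrate_add_priority(db))
```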

Section 03

Task Dispatch Engine

Task dispatch uses a pull model to avoid centralized scheduling complexity. When a task enters the pending state, the Commander broadcasts a task_available SSE event. N0D3S with available slots and matching capabilities attempt an atomic claim via a SQL transaction with optimistic locking. The first node to commit the claim owns the task; other claimants retry on the next available event. Four subsystems refine this baseline dispatch loop.
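
The claim step can be sketched as a single conditional UPDATE: the WHERE clause only matches while the task is still pending, so exactly one claimant sees a changed row. The table layout and column names (status, claimed_by) are assumptions for illustration, not the actual schema.

```python
import sqlite3


def try_claim(conn: sqlite3.Connection, task_id: int, node_id: str) -> bool:
    """Attempt to claim a pending task; True only for the node that wins."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE tasks SET status = 'claimed', claimed_by = ? "
            "WHERE id = ? AND status = 'pending'",
            (node_id, task_id),
        )
    # rowcount is 1 if this call changed the row, 0 if another node got there first.
    return cur.rowcount == 1


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, claimed_by TEXT)")
    db.execute("INSERT INTO tasks (id, status) VALUES (1, 'pending')")
    print(try_claim(db, 1, "node-a"), try_claim(db, 1, "node-b"))
```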

3.1 Circuit Breaker

Each N0D3 maintains a rolling window of its 20 most recent task outcomes. When the failure rate within this window exceeds 50%, the circuit breaker opens and the node is excluded from new task claims for a 120-second cooldown period. After the cooldown, the circuit enters a half-open state: the node is eligible to claim a single task, and the result determines whether the circuit closes (normal operation) or opens again.

This pattern prevents a degraded node — due to API rate limits, hardware load, or a failed dependency — from continuing to claim tasks it cannot complete, which would delay the overall queue and reduce apparent system throughput.
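
A minimal sketch of the breaker follows, using the window size, threshold, and cooldown quoted above. The class and method names are hypothetical. One assumption is worth flagging: the failure rate is measured against the full window size, so a node with only a handful of recorded outcomes cannot trip the breaker on a single failure.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, window=20, threshold=0.5, cooldown=120.0, clock=time.monotonic):
        self.outcomes = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.opened_at = None  # None means the circuit is closed
        self.probing = False   # True while the single half-open task is in flight

    def can_claim(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at < self.cooldown:
            return False       # still cooling down
        if self.probing:
            return False       # half-open: only one probe task at a time
        self.probing = True
        return True

    def record(self, success: bool) -> None:
        if self.probing:       # result of the half-open probe decides the state
            self.probing = False
            if success:
                self.opened_at = None
                self.outcomes.clear()
            else:
                self.opened_at = self.clock()
            return
        self.outcomes.append(success)
        failures = self.outcomes.count(False)
        # Assumption: rate is failures / window size, not failures / samples seen.
        if failures > self.threshold * self.outcomes.maxlen:
            self.opened_at = self.clock()
```

Injecting the clock keeps the breaker testable without waiting for the real cooldown to elapse.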

3.2 Reputation Scoring (UCB)

Each agent maintains a reputation score updated after every task completion. The scoring function is derived from the upper-confidence-bound (UCB1) formula used in bandit problems [29]: it balances exploitation (routing to historically high-performing agents) with exploration (giving lower-ranked agents opportunities to demonstrate improvement). Task complexity and domain are factored into the score update — a failure on an out-of-domain task penalizes less than a failure on a core-capability task.

The dispatcher preferentially routes tasks to agents with higher reputation scores when multiple capable nodes are eligible. Reputation scores are stored in the database and persist across daemon restarts, so the mesh accumulates routing knowledge over its operational lifetime.
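
For reference, the textbook UCB1 score is the observed mean reward plus an exploration bonus that shrinks as an agent accumulates attempts. The sketch below implements only that baseline; the complexity and domain weighting described above is not modeled, and the function names and the scaling constant c are assumptions.

```python
import math


def ucb1_score(successes: float, attempts: int, total_attempts: int, c: float = 1.0) -> float:
    """Mean success rate plus the UCB1 exploration bonus."""
    if attempts == 0:
        return math.inf  # untried agents are always explored first
    mean = successes / attempts
    bonus = c * math.sqrt(2.0 * math.log(total_attempts) / attempts)
    return mean + bonus


def pick_agent(stats: dict) -> str:
    """stats maps agent name -> (successes, attempts); returns the top-scoring agent."""
    total = sum(attempts for _, attempts in stats.values())
    return max(stats, key=lambda name: ucb1_score(stats[name][0], stats[name][1], total))


if __name__ == "__main__":
    print(pick_agent({"veteran": (90, 100), "newcomer": (1, 2)}))
```

With equal attempt counts the bonus is identical for both agents, so the higher success rate wins; with very unequal counts, the less-tested agent receives a larger bonus and may be chosen despite a lower mean.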

3.3 Capability-Based Routing

Workers declare a set of capability tags at registration time (for example: code_review, research, summarization). Tasks similarly declare required capabilities. The dispatcher only broadcasts task_available events to nodes whose declared capabilities satisfy the task requirements. This eliminates spurious claim attempts from nodes that could not complete the task regardless, reducing contention on busy meshes.

The Plugin SDK extends the capability model to tool use. Workers that have registered a plugin (web search, file summarization, memory augmentation, or operator-defined custom tools) declare corresponding capability tags. Tasks that require tool use are routed only to plugin-capable nodes.
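
Capability matching reduces to a subset test: a node is eligible when the task's required tags are all among the node's declared tags. The sketch below is illustrative; the node names and the web_search tag are made up for the example.

```python
def eligible_nodes(required: set, nodes: dict) -> list:
    """Return the names of nodes whose declared capabilities cover `required`."""
    return sorted(name for name, caps in nodes.items() if required <= caps)


if __name__ == "__main__":
    mesh = {
        "pi-1": {"research"},
        "desk-1": {"research", "code_review", "web_search"},
    }
    print(eligible_nodes({"research", "web_search"}, mesh))
```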

3.4 Task Dependencies and DAG Resolution

Tasks declare dependencies through a task_deps join table referencing parent task IDs. Before accepting a task into the dispatch queue, the Commander validates that the declared dependency graph contains no cycles using topological sort. Tasks that fail this check are rejected at submission time with an explicit error, preventing deadlock at runtime.

When a task completes, the Commander queries its dependency graph and automatically dispatches any child tasks whose remaining dependencies are now satisfied. This enables arbitrarily complex workflows — data collection feeding parallel analysis, parallel analysis feeding synthesis — to be encoded as task graphs and executed without orchestration code in the application layer.
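
Both steps can be sketched over an in-memory mapping of task ID to parent IDs, standing in for the task_deps table. Cycle detection here uses Kahn's algorithm, one common way to perform a topological sort; the source does not say which variant the Commander uses, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def has_cycle(deps: dict) -> bool:
    """deps maps task_id -> set of parent task_ids. True if the graph has a cycle."""
    nodes = set(deps) | {p for parents in deps.values() for p in parents}
    indegree = {n: len(deps.get(n, ())) for n in nodes}
    children = defaultdict(list)
    for child, parents in deps.items():
        for parent in parents:
            children[parent].append(child)
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Any node never reaching indegree 0 sits on a cycle.
    return visited != len(nodes)


def newly_ready(deps: dict, completed: set, finished: int) -> list:
    """Children of `finished` whose parents are now all complete."""
    done = completed | {finished}
    return sorted(t for t, parents in deps.items()
                  if finished in parents and parents <= done)


if __name__ == "__main__":
    graph = {3: {1, 2}, 4: {3}}  # task 3 needs 1 and 2; task 4 needs 3
    print(has_cycle(graph), newly_ready(graph, {1}, 2))
```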

Federation relay: Tasks that exceed local mesh capacity can be forwarded to federated Commander instances. The federation protocol enforces a maximum hop count (default: 3) to prevent routing loops, and each relay hop is recorded in the audit log with the originating Commander identity.

Section 04

Security Model

Encryption at Rest

Agent memory entries are encrypted at rest using Fernet symmetric encryption (AES-128-CBC with HMAC-SHA256 authentication). Key management uses the MultiFernet pattern: a primary encryption key and one or more retired keys are active simultaneously, allowing key rotation without decrypting and re-encrypting the entire memory corpus. All key material is derived from a master secret using PBKDF2-SHA256 with 100,000 iterations, providing resistance to offline brute-force attacks on exported key material. The plaintext key is never written to disk or logged.

Authentication and Key Derivation

Agent API keys are stored as hash:salt pairs derived via PBKDF2-SHA256. The plaintext key is presented exactly once at agent creation; subsequent requests are verified against the stored hash. JWT sessions use HS256 with short-lived access tokens and refresh token rotation. All secrets are injected at runtime via environment variables; no credentials appear in source code or container images.
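
The hash:salt scheme can be sketched with the standard library alone. The iteration count is assumed to match the 100,000 quoted for master-key derivation, and the key length, salt length, and hex encoding are illustrative choices rather than confirmed details.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumption: same count as quoted for key derivation


def create_api_key() -> tuple:
    """Return (plaintext_key, stored_record). The plaintext is shown only once."""
    key = secrets.token_urlsafe(32)
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", key.encode(), salt, ITERATIONS)
    return key, f"{digest.hex()}:{salt.hex()}"


def verify_api_key(presented: str, stored: str) -> bool:
    """Re-derive the hash from the presented key and compare in constant time."""
    digest_hex, salt_hex = stored.split(":")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", presented.encode(), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)


if __name__ == "__main__":
    plaintext, record = create_api_key()
    print(verify_api_key(plaintext, record))
```

hmac.compare_digest is used for the comparison so that the time taken does not reveal how many leading characters matched.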

RBAC — Role-Based Access Control

The permission system defines 17 discrete permissions covering task submission, task claiming, memory read/write, plugin invocation, agent administration, audit log access, and federation relay. Four role presets are provided as starting points: worker (claim and execute tasks), mobile (API-only task interaction, no shell access), commander (full task management, no admin), and admin (all permissions). Custom role configurations can combine any subset of the 17 permissions.

A critical security property of the RBAC implementation: an agent with an empty or unrecognized permission set receives zero access — not default access. This empty-deny default means misconfigured agents fail closed, preventing accidental capability grants through configuration errors.
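
The fail-closed behavior can be illustrated as follows. The permission names and the contents of the role preset shown are hypothetical; the actual 17 permissions are not enumerated in this paper.

```python
# Hypothetical permission names, for illustration only.
ROLE_PRESETS = {
    "worker": frozenset({"task.claim", "task.execute"}),
}


def permissions_for(role: str) -> frozenset:
    """Unknown roles map to the empty set instead of falling back to a default."""
    return ROLE_PRESETS.get(role, frozenset())


def is_allowed(granted, permission: str) -> bool:
    """Deny when the permission set is missing or empty; otherwise check membership."""
    if not granted:
        return False
    return permission in granted


if __name__ == "__main__":
    print(is_allowed(permissions_for("worker"), "task.claim"))
    print(is_allowed(permissions_for("typo-role"), "task.claim"))
```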

Per-Agent Token Budgets

Each agent is assigned monthly, daily, and per-task token budget limits. The budget guard evaluates these limits before the dispatch loop forwards a task to a node. Tasks that would exceed any budget threshold are not dispatched to that agent — they remain in the queue for dispatch to another eligible node or are returned to the submitter if no eligible node has budget remaining. Budget consumption is tracked in the database and reported in the audit log.
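
A budget guard along these lines might be sketched as below. It assumes the dispatcher has an up-front token estimate for each task, which the source does not describe; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    per_task: int
    daily: int
    monthly: int


@dataclass
class Usage:
    today: int = 0
    this_month: int = 0


def within_budget(budget: Budget, usage: Usage, estimated_tokens: int) -> bool:
    """True only if the task fits under all three limits."""
    return (
        estimated_tokens <= budget.per_task
        and usage.today + estimated_tokens <= budget.daily
        and usage.this_month + estimated_tokens <= budget.monthly
    )


def first_agent_with_budget(candidates, estimated_tokens: int):
    """candidates: iterable of (agent_id, Budget, Usage). None means none can take it."""
    for agent_id, budget, usage in candidates:
        if within_budget(budget, usage, estimated_tokens):
            return agent_id
    return None
```

A None result corresponds to the case where the task stays queued or is returned to the submitter.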

Transport Security and Headers

All Commander traffic is TLS 1.3 via Caddy with automatic ACME certificate management. Desktop nodes additionally communicate over Tailscale for mesh-internal traffic. The iMessage bridge operates entirely locally — message content does not leave the host without explicit Commander submission. SecurityHeadersMiddleware injects Content-Security-Policy (strict, nonce-based for inline scripts), Strict-Transport-Security (max-age=31536000, includeSubDomains), X-Frame-Options: DENY, X-Content-Type-Options: nosniff, and Referrer-Policy: same-origin on every response.

Input Validation

Path parameters are validated against ^[a-zA-Z0-9_.-]{1,64}$ before any database access — the dot is required to support agent identifiers that follow dot-notation conventions. All request bodies are validated by Pydantic V2 models enforcing field types, ranges, and format constraints. All SQL uses parameterized queries throughout the codebase; string interpolation into SQL is structurally absent.
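
In Python, one detail matters when applying this pattern: with re.match, the $ anchor also matches just before a trailing newline, so the check is safer written with fullmatch. The sketch below shows the difference; the helper name is hypothetical.

```python
import re

ID_PATTERN = re.compile(r"[a-zA-Z0-9_.-]{1,64}")


def is_valid_id(value: str) -> bool:
    """True if the whole string consists of 1 to 64 allowed characters."""
    return ID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    print(is_valid_id("agent.worker-01"))   # accepted
    print(is_valid_id("agent/worker"))      # rejected: slash is not allowed
    # The anchored form with re.match lets a trailing newline through:
    print(re.match(r"^[a-zA-Z0-9_.-]{1,64}$", "agent\n") is not None)
    print(is_valid_id("agent\n"))           # fullmatch rejects it
```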

Audit Logging

Every state-changing operation — task status transitions, agent configuration changes, permission grants, plugin invocations, federation relays — writes a row to the audit_log table. Each row records: timestamp, agent_id, action, resource_type, resource_id, before/after state snapshot (JSON), and a provenance chain hash linking to the previous audit entry. The table is append-only; no UPDATE or DELETE operations are permitted on it. The hash chain allows external verification that the audit log has not been retroactively modified.
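
The provenance chain can be sketched as a hash over the previous entry's hash plus the current entry's canonical JSON. SHA-256, the genesis value, and the in-memory list standing in for the audit_log table are all assumptions; the source does not name the hash function.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first entry


def entry_hash(prev_hash: str, entry: dict) -> str:
    """Hash the previous hash together with a canonical encoding of the entry."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_entry(log: list, entry: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "prev_hash": prev, "hash": entry_hash(prev, entry)})


def verify_chain(log: list) -> bool:
    """Recompute every link; any edited or reordered row breaks the chain."""
    prev = GENESIS
    for row in log:
        if row["prev_hash"] != prev or row["hash"] != entry_hash(prev, row["entry"]):
            return False
        prev = row["hash"]
    return True
```

An external verifier only needs the rows themselves to recompute the chain, which is what makes retroactive modification detectable.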

Security Audit History

Three independent security audits have been completed. The first audit identified SQL injection and hardcoded credential issues. The second audit identified IDOR vulnerabilities, an authentication bypass in the task claim flow, the empty-allow RBAC default, and weak JWT signing. The third audit identified no critical or high-severity findings; four medium-severity items were reviewed and accepted as known risks. All critical and high findings from the first two audits were remediated before the subsequent version was released.

Section 05

Agent Intelligence

M3SHD implements a set of patterns from the multi-agent research literature that operate above the model layer — they improve outcomes without modifying the underlying language model or requiring fine-tuning.

Encrypted Memory with FTS5 Search

The agent_memory table functions as a shared world model. Agents read from memory at task start and write to it by emitting structured [REMEMBER] markers in their output, which the daemon intercepts and stores. Memory entries are encrypted with Fernet before storage and indexed in an FTS5 virtual table over their decrypted content, allowing full-text search without storing plaintext in the main table. A nightly consolidation sweep identifies contradictory entries, merges duplicates, and flags low-confidence memories for review. Memory persists across daemon restarts, model upgrades, and node replacements.

Confidence-Based Verification Routing

Every agent output includes a structured confidence score in the range [0.0, 1.0]. The daemon parses this score before posting the result to the Commander. Outputs scoring below 0.7 are automatically routed to a second eligible agent for independent verification before delivery to the task submitter. The verification agent receives the original task, the first agent's output, and a prompt instructing it to identify errors rather than produce a new answer. This operationalizes the metacognitive verification model described by Steyvers and Peters [33] without requiring changes to the base language model.
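
The routing decision itself is a threshold test. In the sketch below, the 0.7 threshold comes from the text; the function names and the wording of the verification prompt are hypothetical.

```python
VERIFY_THRESHOLD = 0.7


def next_step(confidence: float) -> str:
    """Return 'verify' for scores below the threshold, otherwise 'deliver'."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence!r}")
    return "verify" if confidence < VERIFY_THRESHOLD else "deliver"


def build_verification_prompt(task: str, first_output: str) -> str:
    """Hypothetical prompt: ask the second agent to find errors, not to re-answer."""
    return (
        "Review the answer below for errors. Do not write a new answer.\n\n"
        f"Task:\n{task}\n\nAnswer under review:\n{first_output}\n"
    )


if __name__ == "__main__":
    print(next_step(0.69), next_step(0.7))
```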

Self-Evolving Agent Configurations

A background evolution process runs on a configurable schedule over a rolling 7-day performance window. It aggregates task outcome data per agent — success rate, confidence distribution, verification trigger rate, operator corrections — and submits this data to Claude with a prompt requesting specific, actionable amendments to the agent's system prompt and configuration parameters. The proposed changes are stored in the agent_evolution table for operator review. Accepted proposals are applied to the agent configuration; rejected proposals are logged with the rejection reason for future analysis.

Debate Protocol

Tasks submitted with the debate: true flag are dispatched to three independent agents simultaneously. Each agent produces a complete answer without access to the others' outputs. A designated moderator agent then receives all three answers and identifies points of agreement, points of disagreement, and the relative strength of the arguments. The moderator synthesizes a final answer from the convergent positions and logs the full debate transcript. The debate protocol is implemented as described in the multi-agent debate framework [32] and is most effective on factual and analytical tasks where answer correctness can be evaluated.

Collective Voting

Configuration changes (model tier assignments, prompt amendments, plugin additions, federation policy changes) are submitted as proposals to the votes table. Agents cast weighted votes, with vote weight proportional to the agent's current reputation score. The operator sees the aggregated result, including the vote distribution and the reasoning from supporting and opposing agents, and applies or rejects the proposal. This creates a governance layer that scales with mesh size while limiting operator involvement in routine configuration decisions to a final approve-or-reject step.
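
A reputation-weighted tally can be sketched as follows. The input shapes and the decision to give agents without a reputation entry zero weight are assumptions for illustration.

```python
def tally(votes: list, reputation: dict) -> dict:
    """votes: list of (agent_id, approve) pairs; weight = the agent's reputation."""
    support = sum(reputation.get(agent, 0.0) for agent, approve in votes if approve)
    oppose = sum(reputation.get(agent, 0.0) for agent, approve in votes if not approve)
    total = support + oppose
    return {
        "support": support,
        "oppose": oppose,
        "support_share": support / total if total else 0.0,
    }


if __name__ == "__main__":
    rep = {"a1": 2.0, "a2": 1.0, "a3": 1.0}
    print(tally([("a1", True), ("a2", False), ("a3", False)], rep))
```

In this example one high-reputation agent in favor balances two lower-reputation agents against, which is the intended effect of weighting by reputation rather than counting heads.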

Model Routing

Task complexity is classified at dispatch time using a rule-based classifier that analyzes prompt structure, required output format, declared capabilities, and historical success rates per model tier. Simple tasks are directed to lighter model tiers; complex or high-stakes tasks are directed to more capable tiers. The classification rules are explicit and editable — operators can inspect and adjust the routing logic without retraining a model. The routing decision and the rationale are recorded in the audit log for each dispatched task.

Section 06

Hardware Deployment

Pi 5 Mesh Nodes

Physical mesh nodes run on Raspberry Pi 5 hardware. The Pi 5 runs multiple concurrent daemon processes comfortably, idles at under 5W, and has a small physical footprint, which suits an always-on node. Each node boots from a prepared SD card image containing the mesh daemon, its Python dependencies, Tailscale, and a systemd service unit that starts the daemon on boot and restarts it on failure.

N0D3 Enclosures

Physical nodes are housed in the N0D3 enclosure — a custom 3D-printed case designed specifically for the Pi 5 form factor. The enclosure provides structured airflow for thermal management, a recessed panel for cable management, and a mounting pattern compatible with standard rack and wall-mount hardware. N0D3 enclosures are printed in PETG for thermal stability.

Provisioning

Node provisioning is scripted end-to-end: SD card imaging, first-boot configuration, Tailscale authentication, daemon installation, and Commander registration are all driven by a single provisioning script. A new physical node can be added to an existing mesh without manual SSH configuration steps. The Commander validates the node's registration credentials and capability declaration before it begins receiving task dispatch events.

Network Topology

Pi 5 N0D3S and desktop N0D3S communicate with the Commander over Tailscale, providing encrypted mesh-internal traffic without requiring port forwarding or VPN gateway configuration. Mobile N0D3S (N0D3 Flutter app) communicate directly with the Commander's public HTTPS endpoint. The Commander itself is exposed through Caddy with automatic TLS, with no direct exposure of the FastAPI process to the public internet.

Section 07

Comparison with Existing Frameworks

The table below compares M3SHD against five leading open-source multi-agent frameworks across fifteen capability dimensions. Ratings reflect publicly documented features as of April 2026.

Capability	M3SHD	AutoGPT	CrewAI	LangGraph	MetaGPT	AutoGen
Hub-and-spoke topology	YES	NO	NO	PARTIAL	NO	PARTIAL
Heterogeneous hardware nodes	YES	NO	NO	NO	NO	NO
Mobile device as compute node	YES	NO	NO	NO	NO	NO
iMessage natural language interface	YES	NO	NO	NO	NO	NO
Circuit breaker with rolling window	YES	NO	NO	NO	NO	NO
Capability-based task routing	YES	NO	PARTIAL	PARTIAL	PARTIAL	PARTIAL
DAG task dependency resolution	YES	NO	PARTIAL	YES	NO	PARTIAL
Agent reputation scoring (UCB)	YES	NO	NO	NO	NO	NO
Encrypted agent memory (FTS5)	YES	PARTIAL	PARTIAL	PARTIAL	NO	PARTIAL
Confidence-based verification routing	YES	NO	NO	NO	PARTIAL	NO
Self-evolving agent configurations	YES	NO	NO	NO	NO	NO
Debate protocol	YES	NO	NO	NO	NO	YES
Granular RBAC (17 permissions)	YES	NO	NO	NO	NO	NO
Per-agent token budgets	YES	NO	NO	NO	NO	NO
Cross-mesh federation with hop limits	YES	NO	NO	NO	NO	NO

M3SHD achieves YES across all 15 dimensions. AutoGen and LangGraph are the nearest competitors, each with 1 YES rating: debate protocol for AutoGen and DAG dependency resolution for LangGraph. AutoGen additionally has 4 PARTIAL ratings and LangGraph has 3. The remaining frameworks score 0 YES across these dimensions, with between 1 and 3 PARTIAL ratings each.

Observation: The combination of capability routing, circuit breaking, reputation scoring, and DAG dependency resolution in the dispatch engine is not replicated in any framework in the comparison. These four mechanisms interact — a node that trips its circuit breaker is excluded from capability matching; a reputation-penalized node loses dispatch priority even when eligible. The composite behavior is more useful than any single mechanism in isolation.

Section 08

Future Work

Android worker client — The Flutter codebase is cross-platform; an Android build of the N0D3 app would extend the mobile compute pool to the largest smartphone market by volume without requiring a separate implementation.
Learned task router — The current model tier classifier is rule-based and explicit. A lightweight classifier trained on historical task outcomes per model tier could improve routing precision, particularly for tasks that fall ambiguously between tiers.
Federated memory consolidation — The nightly consolidation sweep currently operates within a single hub's memory corpus. Extending it across federated meshes, with RBAC-governed read access, would allow knowledge to propagate between M3SHD deployments.
Plugin SDK documentation and registry — The plugin interface is implemented and functional but not yet formally documented. A published plugin registry and development guide would allow operators to share plugins across deployments.
Benchmark harness publication — The internal test suite covers dispatch, security, and intelligence layer behaviors. Publishing this as a standalone evaluation harness would enable objective comparison against frameworks that add similar capabilities in future releases.

References

1. Anthropic. Claude 3.7 Sonnet System Card — Anthropic, February 2025. anthropic.com/claude/sonnet
2. Anthropic / Block. Model Context Protocol Specification — MCP Spec, November 2024. modelcontextprotocol.io/specification
3. Wang et al. A Survey on Large Language Model based Autonomous Agents — arXiv 2308.11432. arxiv.org/abs/2308.11432
4. Xi et al. The Rise and Potential of LLM-Based Agents: A Survey — arXiv 2309.07864. arxiv.org/abs/2309.07864
5. Park et al. Generative Agents: Interactive Simulacra of Human Behavior — CHI 2023, arXiv 2304.03442. arxiv.org/abs/2304.03442
6. Hong et al. MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework — arXiv 2308.00352. arxiv.org/abs/2308.00352
7. Wu et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation — arXiv 2308.08155. arxiv.org/abs/2308.08155
8. Chase. LangChain — GitHub. github.com/langchain-ai/langchain
9. Joao Moura. CrewAI — GitHub. github.com/joaomdmoura/crewAI
10. Significant Gravitas. AutoGPT — GitHub. github.com/Significant-Gravitas/AutoGPT
11. FastAPI — tiangolo.com. fastapi.tiangolo.com
12. SQLite WAL Mode Documentation — SQLite.org. sqlite.org/wal.html
13. Litestream — litestream.io. litestream.io
14. MemoryOS: A Memory-Based Operating System for AI Agents — arXiv 2506.06326. arxiv.org/abs/2506.06326
15. A-MEM: Agentic Memory for LLM Agents — arXiv 2502.12110, February 2025. arxiv.org/abs/2502.12110
16. Tulving. Episodic and Semantic Memory — Academic Press, 1972.
17. OWASP Top Ten 2021 — OWASP Foundation. owasp.org/Top10
18. PBKDF2 — NIST SP 800-132. csrc.nist.gov
19. Riverpod — riverpod.dev. riverpod.dev
20. GoRouter — pub.dev/packages/go_router. pub.dev/packages/go_router
21. Firebase Cloud Messaging — Google. firebase.google.com/docs/cloud-messaging
22. Tailscale — tailscale.com. tailscale.com
23. Caddy Server — caddyserver.com. caddyserver.com
24. Cloudflare R2 — cloudflare.com/developer-platform/r2. cloudflare.com
25. Dynamic Scheduling in Heterogeneous Computing Environments — IEEE TC, March 2025.
26. LoRASA: Low-Rank Agent-Specific Adaptation — arXiv 2502.05573, February 2025. arxiv.org/abs/2502.05573
27. TT-LoRA MoE: Unifying PEFT and Sparse Mixture-of-Experts — arXiv 2504.21190. arxiv.org/abs/2504.21190
28. AgentNet: Decentralized Evolutionary Coordination — arXiv 2504.00587. arxiv.org/abs/2504.00587
29. DRF: LLM-AGENT Dynamic Reputation Filtering — arXiv 2509.05764. arxiv.org/abs/2509.05764
30. MasRouter: Learning to Route LLMs for Multi-Agent Systems — arXiv 2502.11133. arxiv.org/abs/2502.11133
31. Decentralized Adaptive Task Allocation — Nature Scientific Reports, February 2025. nature.com
32. Multi-Agent Debate: Can LLM Agents Really Debate? — arXiv 2511.07784. arxiv.org/abs/2511.07784
33. Metacognition and Uncertainty in Humans and LLMs — Steyvers & Peters, Psychological Science 2025. journals.sagepub.com
34. Emergent Coordination in Multi-Agent Language Models — arXiv 2510.05174. arxiv.org/abs/2510.05174
35. Evaluating Theory of Mind in LLM-Based Multi-Agent Systems — arXiv 2603.00142. arxiv.org/abs/2603.00142
36. Multi-Agent Evolve: LLM Self-Improve through Co-evolution — arXiv 2510.23595. arxiv.org/abs/2510.23595
37. State of Agentic iOS Engineering in 2026 — Dimillian/Medium. dimillian.medium.com
38. Microsoft Agent Governance Toolkit — April 2026. opensource.microsoft.com
39. AI Agent Index 2025 — arXiv 2602.17753. arxiv.org/html/2602.17753v1
40. Claude Agent SDK Overview — Anthropic. platform.claude.com
41. AutoGen GitHub — Microsoft. github.com/microsoft/autogen
42. CrewAI Concepts — Official Docs. docs.crewai.com
43. LangGraph Overview — LangChain. langchain.com/langgraph
44. Matrix: Meta Multi-Agent for Synthetic Data — arXiv 2511.21686. arxiv.org/abs/2511.21686
45. Galileo Human-in-the-Loop Guide. galileo.ai
46. Agent Budget Guard MCP Server — earezki.com. earezki.com
47. Cryptographic Verifiability of End-to-End AI Pipelines — IWSPA 2025 (ACM), arXiv 2503.22573. arxiv.org/html/2503.22573v1
48. Towards AGI: Self-Evolving Agent — arXiv 2601.11658. arxiv.org/abs/2601.11658
49. MiRA: Subgoal-driven Framework for Long-Horizon LLM Agents — arXiv 2603.19685. arxiv.org/abs/2603.19685
50. Orchestrating Confidence-Aware Routing for Multi-Agent Collaboration — arXiv 2601.04861. arxiv.org/abs/2601.04861

M3SHD is built by Fred Wojo / GritWerk.
Demo: mesh.demobygrit.com (public read-only dashboard) | Contact: fred@gritwerk.com