M3SHD
Multi-Agent AI Mesh — Full System Documentation
Executive Summary
M3SHD is a multi-agent collaboration mesh that lets a single human operator command a distributed team of AI agents from anywhere — iMessage, a PWA dashboard, a native iOS app, or a desktop browser. The system spans six heterogeneous nodes across desktop, mobile, and cloud, coordinated by a lightweight hub that routes tasks without running AI inference itself. All intelligence lives at the edges.
The six nodes are Archon (M2 Mac Mini, primary AI and iMessage bridge), Rex (Intel Mac Mini, research), Crucible (iMac 27", overflow compute), N0D3-1 (iPhone 14 Pro Max, mobile AI worker), N0D3-2 (iPhone 12 Pro, second mobile worker), and the M3SHD Commander node (dispatch and monitoring command center). Desktop nodes connect over Tailscale; mobile nodes connect via the public HTTPS endpoint.
The platform grew through eight iterative phases, each building on a tested foundation before proceeding. The codebase now spans 20+ test files and covers security, intelligence, observability, mobile, and cross-hub federation.
Agents maintain persistent encrypted memory: they remember facts across tasks, auto-extract key findings tagged [REMEMBER], and have those memories automatically injected into future task prompts. Tasks can be chained into pipelines with dependency graphs, so the output of a research task flows automatically into a summarization task when the first completes. Task templates allow one-tap dispatch of recurring workflows with variable substitution. A plugin system exposes tools (web search, file summary, notification, memory enhancement) to agents via a structured lifecycle. Self-evolving agents track their own performance and submit prompt amendment proposals for operator review.
The AGI-adjacent intelligence layer adds capabilities uncommon in production agent systems at this scale: metacognition (confidence scoring with auto-verification on low-confidence outputs), smart model routing (Haiku/Sonnet/Opus selected per task complexity), natural language mesh control, agent reputation scoring via UCB formula, intrinsic motivation for idle agents, adversarial self-improvement, a shared world model (entity graph auto-extracted from task outputs), the debate protocol (two agents argue, synthesis merges the strongest positions), cryptographic provenance (HMAC-SHA256 signed task chains), memory consolidation, and collective voting on ambiguous decisions.
The platform provides RBAC (17 permissions, 4 role presets), multi-user support (admin/user/viewer with PBKDF2 sessions), third-party API keys with rate limits and daily caps, cross-mesh federation (hub-to-hub relay with overflow routing and loop prevention), FCM push notifications (Firebase HTTP v1, silent wakeup on task dispatch), a demo mode (public read-only dashboard), an MCP server (17 tools, wired into Claude Desktop and Claude Code natively), voice command dispatch (Siri Shortcuts webhook), a 3D WebGL mesh visualization, and Android build support for both apps.
Architecture Overview
M3SHD follows a hub-and-spoke topology extended with peer federation. The hub is a stateless relay and persistence layer that does not run AI inference. All intelligence lives at the edges: worker processes on individual machines invoke Claude CLI as a subprocess and report results back to the hub via REST API, while mobile workers invoke the Claude API directly. Desktop workers additionally have access to the full plugin system, memory store, and intelligence layer. This design means the hub can run on a modest VPS (256MB RAM, 0.5 CPU) while workers leverage the full compute of their host machines.
The hub exposes over 80 API endpoints over HTTPS, authenticated by RBAC-scoped agent tokens or session cookies. Real-time event delivery uses Server-Sent Events (SSE), which avoids WebSocket complexity while providing push notifications for all system events. The SSE bus implements backpressure via per-subscriber queue caps of 500 events and sends a keepalive every 30 seconds.
Workers connect to the hub over Tailscale (desktop nodes) or the public HTTPS endpoint (mobile nodes and federated peer hubs). The Claude CLI on desktop workers runs with --dangerously-skip-permissions; the bearer token and Tailscale network boundary are the security perimeter. Mobile workers are API-only — no shell, no filesystem, no tool use.
                     ┌─────────────────────────────────────────────┐
                     │             M3SHD Commander Hub             │
                     │             mesh.demobygrit.com             │
                     │                                             │
                     │  FastAPI + SSE + SQLite WAL                 │
                     │  Task Queue + Agent Registry                │
                     │  RBAC + Multi-User Auth                     │
                     │  Agent Memory (FTS5, encrypted)             │
                     │  Task Dependencies + Pipelines              │
                     │  Plugin System (4 built-in tools)           │
                     │  Self-Evolving Agents                       │
                     │  AGI Intelligence Layer                     │
                     │  Analytics Dashboard                        │
                     │  Webhook System + Federation                │
                     │  MCP Server (17 tools)                      │
                     │  3D WebGL Mesh Visualization                │
                     └──────────────────┬──────────────────────────┘
                                        │
           ┌────────────────────────────┼────────────────────────────┐
           │                            │                            │
    ┌──────┴──────┐              ┌──────┴──────┐            ┌────────┴────────┐
    │ Mesh Daemon │              │ Rex Worker  │            │ Crucible Worker │
    │ (M2 Mac)    │              │ (Intel Mac) │            │ (iMac 27")      │
    │             │              │             │            │                 │
    │ Archon AI   │              │ Rex AI      │            │ Worker AI       │
    │ iMsg Bridge │              │ Research    │            │ Overflow        │
    │ Full Tools  │              │ File Ops    │            │ Research        │
    │ 5 slots     │              │ 2 slots     │            │ 3 slots         │
    └─────────────┘              └─────────────┘            └─────────────────┘
N0D3-1 / N0D3-2 (iPhones) ──── Claude API ──────────► Hub
M3SHD Commander App ──────────── dispatch/monitor ───► Hub
Peer Hubs ────────────────────── federation relay ───► Hub
The mesh daemon on the M2 Mac Mini is the critical integration point. It runs four persistent threads plus a session bridge daemon: T1 reads iMessage chat.db and relays texts to the hub; T2 polls the hub for agent messages and delivers them via osascript; T3 is the Archon AI watcher that responds using Claude; T4 is a Rex agent thread; and the session bridge daemon listens to SSE and writes broadcast files that Claude Code hooks can read between task polls.
Hub Server
The hub is implemented as a FastAPI application in app/main.py. It serves authentication, API routing, SSE broadcasting, the web UI, and the admin control plane.
The hub supports three parallel auth mechanisms: (1) RBAC agent tokens (m3shd_ag_ prefix) with scoped permissions per the RBAC system, (2) third-party API keys (m3shd_key_ prefix) with per-key rate limits and daily caps, and (3) browser sessions via PBKDF2-hashed user accounts. The master token bypasses RBAC (all permissions). All token comparisons use hmac.compare_digest.
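As a rough illustration of the prefix scheme and constant-time comparison described above, the following sketch classifies a presented bearer token. The function name `classify_token`, the placeholder `MASTER_TOKEN` value, and the returned labels are hypothetical; only the two prefixes and the use of `hmac.compare_digest` come from the description.

```python
import hmac

MASTER_TOKEN = "example-master-token"  # hypothetical placeholder, not a real secret


def classify_token(presented: str) -> str:
    """Return a coarse credential type for a presented bearer token."""
    # compare_digest runs in time independent of where the inputs differ,
    # so a caller cannot learn the token byte by byte from response timing.
    if hmac.compare_digest(presented.encode(), MASTER_TOKEN.encode()):
        return "master"
    if presented.startswith("m3shd_ag_"):
        return "agent"      # RBAC-scoped agent token
    if presented.startswith("m3shd_key_"):
        return "api_key"    # third-party key with rate limits and daily caps
    return "unknown"
```

A real lookup would then load the matching record and compare the stored secret with `hmac.compare_digest` as well; that step is omitted here.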
SecurityHeadersMiddleware injects on every response: Content-Security-Policy, Strict-Transport-Security (max-age 31536000; includeSubDomains), X-Frame-Options: DENY, X-Content-Type-Options: nosniff, Referrer-Policy: strict-origin-when-cross-origin, and Permissions-Policy: camera=(), microphone=(), geolocation=(). Login rate limiting is enforced at /api/login: 5 failures per 5-minute window per IP returns 429.
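The login limit (5 failures per 5-minute window per IP) can be pictured as a sliding window of failure timestamps. This is a minimal in-memory sketch, not the hub's implementation; the class name `LoginRateLimiter` and its methods are hypothetical, and a caller would return HTTP 429 when `is_blocked` is true.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300  # 5-minute window, from the description above
MAX_FAILURES = 5


class LoginRateLimiter:
    """Sliding-window counter of failed logins per client IP (hypothetical)."""

    def __init__(self):
        self._failures = defaultdict(deque)

    def _prune(self, ip, now):
        # Drop timestamps that have aged out of the window.
        window = self._failures[ip]
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        return window

    def record_failure(self, ip, now=None):
        now = time.monotonic() if now is None else now
        self._prune(ip, now).append(now)

    def is_blocked(self, ip, now=None):
        now = time.monotonic() if now is None else now
        return len(self._prune(ip, now)) >= MAX_FAILURES
```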
Four background tasks run on a schedule: (1) the agent sweeper marks workers offline if their heartbeat exceeds the configurable timeout, (2) the zombie task sweeper resets tasks stuck in running or assigned for more than 10 minutes back to queued, (3) the agent evolution sweeper runs every 6 hours to analyze per-agent performance and generate prompt amendment proposals, and (4) the task deadline sweeper runs every 60 seconds to check for overdue tasks, increment their urgency, and fire ntfy alerts.
The SSE bus broadcasts across 30+ event types covering all system activity. Keepalive events fire every 30 seconds. Per-subscriber queue caps at 500 events enforce backpressure. Plugin lifecycle hooks run on task_created, task_completed, task_failed, agent_online, and agent_offline events.
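The queue cap and keepalive behaviour can be sketched with bounded `asyncio.Queue` objects. Everything named here (`EventBus`, `format_sse`, `sse_stream`) is hypothetical, and dropping the newest event when a subscriber's queue is full is an assumption; the source states only the 500-event cap and the 30-second keepalive.

```python
import asyncio
import json

QUEUE_CAP = 500          # per-subscriber cap, from the description above
KEEPALIVE_SECONDS = 30


class EventBus:
    def __init__(self, cap=QUEUE_CAP):
        self._cap = cap
        self._subscribers = set()

    def subscribe(self):
        queue = asyncio.Queue(maxsize=self._cap)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)

    def publish(self, event):
        """Offer the event to every subscriber; return how many dropped it."""
        dropped = 0
        for queue in self._subscribers:
            try:
                queue.put_nowait(event)
            except asyncio.QueueFull:
                dropped += 1  # assumption: a slow subscriber loses the event
        return dropped


def format_sse(event_type, data):
    # Wire format: an "event:" line, a "data:" line, then a blank line.
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"


async def sse_stream(queue, keepalive=KEEPALIVE_SECONDS):
    """Yield SSE frames, emitting a comment line when the queue stays idle."""
    while True:
        try:
            event_type, data = await asyncio.wait_for(queue.get(), timeout=keepalive)
            yield format_sse(event_type, data)
        except asyncio.TimeoutError:
            yield ": keepalive\n\n"  # lines starting with ":" are ignored by clients
```

Because the publisher never awaits a slow subscriber, one stalled dashboard cannot hold up delivery to the others.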
Worker System
The generic worker (mesh-worker.py) is the mechanism by which any machine joins the mesh with a single command. It uses only urllib.request for HTTP and threading for concurrency — no FastAPI dependency.
The worker lifecycle: register with capabilities, enter a poll loop with heartbeats, check active task count and system load average before fetching tasks, spawn a daemon thread per task, invoke Claude CLI via subprocess, stream partial output to the hub via POST /api/tasks/{id}/stream, post final output and cost log, and report done.
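    import subprocess

    # Run the Claude CLI non-interactively: the prompt goes in on stdin and
    # stdout/stderr are captured as text for posting back to the hub.
    result = subprocess.run(
        ["claude", "--print", "--dangerously-skip-permissions"],
        input=prompt,
        capture_output=True,
        text=True,
        timeout=self.claude_timeout,  # raises subprocess.TimeoutExpired on overrun
    )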
- Memory integration. After task completion, the worker scans output for [REMEMBER] blocks and POSTs them to /api/agents/{id}/memory. On task pickup, relevant memories are already injected into the prompt by the hub.
- Plugin tool calls. Workers scan output for [TOOL_CALL] ... [/TOOL_CALL] markers and invoke the corresponding plugin tool, incorporating the result into the next Claude invocation.
- Capability routing. The required_capability column on tasks ensures a task dispatched for code_review only goes to agents that advertise that capability.
- Escalation. After three consecutive failures, the worker triggers an escalation. Rex escalates to Archon, Archon escalates to the user. Daily token budgets cap spending at 500,000 tokens per agent.
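The escalation rule in the last item can be sketched as a small counter. The class name `FailureTracker`, the lowercase agent identifiers, and resetting the counter after an escalation fires are assumptions made for illustration.

```python
from typing import Optional

# Escalation chain from the description: Rex -> Archon -> user.
ESCALATION_CHAIN = {"rex": "archon", "archon": "user"}
FAILURE_THRESHOLD = 3


class FailureTracker:
    """Counts consecutive task failures for one agent (hypothetical helper)."""

    def __init__(self, agent: str):
        self.agent = agent
        self.consecutive_failures = 0

    def record(self, succeeded: bool) -> Optional[str]:
        """Return the escalation target when the threshold is hit, else None."""
        if succeeded:
            self.consecutive_failures = 0
            return None
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            self.consecutive_failures = 0  # assumption: start counting afresh
            return ESCALATION_CHAIN.get(self.agent, "user")
        return None
```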
Mesh Daemon
The mesh daemon (mesh-daemon.py) runs on the M2 Mac Mini and integrates four subsystems: the iMessage relay, the hub-to-iMessage relay, the Archon AI watcher, and the Rex agent. All four threads share a shutdown_event and are monitored by a health supervisor.
A standalone session-bridge.py daemon runs alongside the mesh daemon. It maintains a persistent SSE connection to the hub, and whenever a broadcast-worthy event arrives (new human message, agent output, system alert), it writes a timestamped file to ~/m3shdup/broadcasts/. Active Claude Code sessions check this directory between task polls via the broadcast-check.sh UserPromptSubmit hook. The mesh-post.sh script allows posting from any terminal session to hub chat.
The resilient_loop wrapper restarts a crashed thread after 5 seconds of backoff, up to 10 times. After all threads exhaust their budget, the main health supervisor exits and the outer start-mesh.sh shell loop relaunches the entire daemon in 5 seconds. Individual crash recovery: 5 seconds. Full process recovery: 10 seconds. The daemon auto-starts via .zprofile on terminal open. A lockfile prevents duplicate instances.
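A wrapper with this restart behaviour might look like the sketch below. The signature is an assumption (the source names `resilient_loop` and `shutdown_event` but does not show either); the defaults mirror the 5-second backoff and the budget of 10 restarts.

```python
import threading


def resilient_loop(target, shutdown_event, max_restarts=10, backoff=5.0):
    """Run target() and restart it after a crash until the budget is spent.

    Returns the number of restarts used if target() eventually exits cleanly.
    Re-raises the last exception once the restart budget is exhausted.
    """
    restarts = 0
    while not shutdown_event.is_set():
        try:
            target()
            return restarts
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise
            # Event.wait doubles as an interruptible sleep: a shutdown
            # request ends the backoff early instead of blocking for 5 s.
            shutdown_event.wait(backoff)
    return restarts
```

Each subsystem would run in its own thread, for example `threading.Thread(target=resilient_loop, args=(relay_fn, shutdown_event), daemon=True)`, where `relay_fn` stands in for one of the four loops.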
iMessage Bridge
The iMessage bridge reads Apple's chat.db in read-only mode and sends replies via AppleScript's osascript. It is the most unconventional component in the system.
- Thread 1 (iMessage to Hub) polls chat.db every 3 seconds, filtering for incoming messages from the user's number. Bridge-echo loops are prevented by checking for the relay prefix regex. Image attachments undergo path restriction, symlink resolution, 20MB cap, and magic byte verification (JPEG, PNG, GIF, WebP, HEIC/HEIF). Failed POSTs are queued in a deque and retried on the next cycle.
- Thread 2 (Hub to iMessage) polls the hub every 5 seconds for new agent messages and delivers them via osascript. The message content is always passed via argv, never interpolated into AppleScript, which prevents injection. Messages over 3,000 characters are truncated at a word boundary.
- Schema version detection. The bridge includes an iMessage schema abstraction layer that detects the chat.db schema version at startup and normalizes reads accordingly, handling schema differences across macOS versions without requiring code changes.
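Two of the checks above lend themselves to a short sketch: magic byte verification for attachments and word-boundary truncation for outgoing messages. The function names are hypothetical, and the HEIF brand list is a non-exhaustive assumption; the signatures for JPEG, PNG, GIF, and WebP are the standard ones.

```python
MAX_MESSAGE_CHARS = 3000

# Assumption: a non-exhaustive set of ISO-BMFF brands used by HEIC/HEIF files.
HEIF_BRANDS = (b"heic", b"heix", b"hevc", b"mif1", b"msf1")


def sniff_image(header: bytes):
    """Identify an image type from its first 12 bytes, or return None."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"
    if header[4:8] == b"ftyp" and header[8:12] in HEIF_BRANDS:
        return "heif"
    return None


def truncate_at_word(text: str, limit: int = MAX_MESSAGE_CHARS) -> str:
    """Shorten text to at most `limit` characters without splitting a word."""
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit + 1)
    if cut <= 0:
        cut = limit  # no space to break on: fall back to a hard cut
    return text[:cut].rstrip()
```

Checking content bytes rather than the file extension means a renamed file is still rejected.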
Command Interface
The user communicates with the mesh by texting commands from an iPhone. The command parser recognizes 14 core command patterns:
| Command | Mode | Description |
|---|---|---|
| status | status | Show agent online/offline states and active task counts |
| tasks | tasks | List the 10 most recent tasks with status icons |
| @context | context_get | Show all shared context key-value pairs |
| @context key=val | context_set | Set a shared context entry |
| @rex <text> | rex_message | Dispatch a task directly to Rex |
| approve | approval | Approve the most recent pending approval |
| reject | approval | Reject the most recent pending approval |
| pending | pending_approvals | List all pending approval requests |
| costs / @costs | costs | Show today's token usage and cost by agent |
| escalations | escalations | List open escalations |
| do <text> | task | Execute a task via Archon with full context |
| auto <goal> | autonomous | Decompose goal into subtasks, dispatch to Rex |
| summary | summary | Summarize recent task results |
| (anything else) | chat | Free-form conversation with Archon |
Autonomous mode decomposes a high-level goal into 2–5 concrete subtasks via Claude, dispatches each to Rex in parallel, and delivers a summary with a tasks link for tracking. Voice commands via Siri Shortcuts provide an additional dispatch path — POST /api/voice/dispatch parses natural language and routes to the matching task template.
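The command table can be read as a small parser that maps an incoming text to a mode and an argument. The sketch below is illustrative only: the function name `parse_command`, case-insensitive matching, and the shape of the returned tuple are assumptions, while the commands and mode names are taken from the table.

```python
import re

# Commands that take no argument, mapped to the mode names from the table.
EXACT_COMMANDS = {
    "status": "status",
    "tasks": "tasks",
    "@context": "context_get",
    "approve": "approval",
    "reject": "approval",
    "pending": "pending_approvals",
    "costs": "costs",
    "@costs": "costs",
    "escalations": "escalations",
    "summary": "summary",
}

# Commands followed by free text.
PREFIX_COMMANDS = (("@rex ", "rex_message"), ("do ", "task"), ("auto ", "autonomous"))

CONTEXT_SET_RE = re.compile(r"@context\s+(\w+)=(.*)", re.IGNORECASE | re.DOTALL)


def parse_command(text: str):
    """Return (mode, argument) for one incoming message."""
    stripped = text.strip()
    lowered = stripped.lower()
    if lowered in EXACT_COMMANDS:
        # For approvals the argument carries the decision word itself.
        arg = lowered if EXACT_COMMANDS[lowered] == "approval" else ""
        return EXACT_COMMANDS[lowered], arg
    match = CONTEXT_SET_RE.fullmatch(stripped)
    if match:
        return "context_set", f"{match.group(1)}={match.group(2)}"
    for prefix, mode in PREFIX_COMMANDS:
        if lowered.startswith(prefix):
            return mode, stripped[len(prefix):].strip()
    return "chat", stripped  # anything else is free-form conversation
```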
Task Dispatch Engine
The dispatcher (app/dispatch.py) implements capacity-aware routing with circuit-breaker protection, extended with reputation-based selection and deadline urgency.
- If assign_to is specified, verify capacity and closed circuit. If not, fall through.
- Query available workers for all online/busy agents with spare capacity advertising the required capability.
- Filter out agents with open circuit breakers.
- Apply reputation scoring (UCB formula across per-capability success histories).
- Select highest-reputation agent with available capacity.
- If no worker is available, create task with status=queued.
The circuit breaker for an agent trips after 3 consecutive failures. Cooldown: 60 seconds. Half-open state on recovery: one task allowed through; success closes the circuit, failure trips it again immediately.
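The closed, open, and half-open states can be sketched as a small per-agent state machine. The class and method names are hypothetical, and the injectable `clock` parameter exists only to make the example testable; the threshold and cooldown are the values given above.

```python
import time


class CircuitBreaker:
    def __init__(self, threshold=3, cooldown=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at = None        # None means the circuit is closed
        self._probe_in_flight = False

    def allow(self) -> bool:
        """Return True if a task may be routed to this agent right now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at < self.cooldown:
            return False              # open: still cooling down
        if self._probe_in_flight:
            return False              # half-open: only one probe at a time
        self._probe_in_flight = True
        return True

    def record_success(self):
        self._failures = 0
        self._opened_at = None
        self._probe_in_flight = False

    def record_failure(self):
        if self._probe_in_flight:
            # A failed probe re-opens the circuit at once.
            self._opened_at = self._clock()
            self._probe_in_flight = False
            return
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()
```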
Each agent maintains per-capability success/failure history. The UCB (Upper Confidence Bound) formula balances exploitation (agents with high success rates) with exploration (agents that haven't been tried recently for a given capability). New agents begin with a neutral prior.
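One way to realise this is a UCB1-style score: the observed success rate plus an exploration bonus that shrinks as an agent accumulates trials. The source does not give the exact formula, so the sketch below is an assumption; it models the neutral prior as one pseudo-success and one pseudo-failure per agent, and the function names are hypothetical.

```python
import math


def ucb_score(successes: int, failures: int, total_trials: int) -> float:
    """UCB1 score with a neutral prior of one success and one failure."""
    n = successes + failures + 2           # trials including the two pseudo-counts
    mean = (successes + 1) / n             # 0.5 for a brand-new agent
    bonus = math.sqrt(2 * math.log(total_trials) / n)
    return mean + bonus


def select_agent(history: dict) -> str:
    """Pick an agent for one capability.

    history maps agent name -> (successes, failures) for that capability.
    """
    total = sum(s + f + 2 for s, f in history.values())
    return max(history, key=lambda agent: ucb_score(*history[agent], total))
```

With equal trial counts the bonus terms cancel and the higher success rate wins; an agent with few trials gets a large bonus and is tried even if its peers have solid records.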
Tasks created with a deadline timestamp are monitored by the 60-second sweeper. Overdue tasks have their urgency level incremented (low → medium → high → critical) and trigger ntfy escalation alerts. Priority is bumped automatically on urgency escalation.
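A single pass of the deadline sweeper might look like the following sketch. The task dictionaries, the function name `sweep_deadlines`, skipping finished tasks, and treating a larger `priority` number as more urgent are all assumptions; the urgency ladder is the one described above.

```python
URGENCY_LEVELS = ("low", "medium", "high", "critical")


def sweep_deadlines(tasks, now):
    """Escalate overdue tasks in place; return the ids that need an alert.

    Each task is a dict with id, deadline (epoch seconds or None), status,
    urgency, and priority.
    """
    alerts = []
    for task in tasks:
        deadline = task.get("deadline")
        if deadline is None or task["status"] in ("done", "failed"):
            continue
        if now <= deadline:
            continue
        level = URGENCY_LEVELS.index(task["urgency"])
        if level < len(URGENCY_LEVELS) - 1:
            task["urgency"] = URGENCY_LEVELS[level + 1]
            task["priority"] += 1  # assumption: higher number means more urgent
        alerts.append(task["id"])
    return alerts
```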
Safety and Control Layer
M3SHD provides five interlocking safety mechanisms: the approval queue, the escalation chain, cost tracking, RBAC, and metacognition.
- Approvals. Any agent can create an approval request via POST /api/approvals. The user responds with approve or reject. Expired approvals are automatically cleaned up.
- Escalations. Three consecutive task failures trigger an escalation. Rex escalates to Archon, Archon escalates to the user. Each escalation record includes agent, target, task ID, and reason.
- Cost Tracking. Every Claude invocation logs estimated token usage. Daily limit: 500,000 tokens per agent. The budget ntfy alert fires at $1/day per agent.
- ntfy Alerting. Four alert types: agent-down (once per outage), task-failed (on third attempt), escalation (on escalation creation), and budget ($1/day threshold). Fire-and-forget via run_in_executor, so alerts never block the request path.
- Metacognition. Every task result carries a confidence score (0.0–1.0). Tasks with confidence below 0.7 are automatically submitted for verification by a second agent before the result is accepted. Operators can see confidence scores in the analytics dashboard.
Web UI
The web UI is a dark-themed, mobile-first PWA built without any JavaScript framework. CSS custom properties handle theming; JetBrains Mono is the primary typeface; the brand gradient is amber → purple → emerald.
- Dashboard. Agent cards showing name, machine, online/offline status, and task utilization. Status summary bar shows total online agents, active tasks, and pending approvals.
- Chat. Full-screen chat interface with color-coded message bubbles: amber (user), purple (Archon), emerald (Rex). Real-time SSE updates. 16px input font prevents iOS Safari zoom-on-focus.
- Tasks. Kanban-style task list. Template chips allow one-tap dispatch of seed templates. Each task card shows title, assignee, status badge, confidence score, and deadline indicator if set.
- Logs. Live log stream fed by SSE. Timestamped and color-coded by event type.
- 3D Mesh Visualization. WebGL force graph. Nodes colored by status (green=online, amber=busy, red=offline) and sized by capacity. Active tasks render as glowing particles traveling between nodes. The graph auto-rotates and is interactive.
- Analytics. Admin-only tab. Charts for task throughput by status, agent, and day; cost trends; uptime and memory counts per agent.
- Demo Mode. When enabled, GET /demo serves a public read-only dashboard. Rate limited at 30 req/min per IP.
N0D3 Mobile Worker
N0D3 is a Flutter iOS app that turns any iPhone or iPad into a live M3SHD worker node. The second instance, N0D3-2, runs on an iPhone 12 Pro.
The app implements the full worker contract: register, heartbeat, poll, execute, report. It calls the Claude API directly (api.anthropic.com) using the user's API key stored in iOS Keychain via flutter_secure_storage. State management via Flutter Riverpod (Notifier / AsyncNotifier). GoRouter navigation with three routes: splash, setup, and main.
N0D3 uses anthropic_sdk_dart for real SSE streaming via client.messages.createStream(). The onChunk callback forwards partial output to the hub in real-time as the model generates, delivering genuine token-by-token updates to the dashboard and Commander app.
- Offline Task Queue. If the hub is unreachable when a task completes, the result is saved locally. On reconnect, a sync sweep POSTs all pending results.
- Capabilities. research, summarize, chat, triage. Max 1 concurrent task. Text-in, text-out only.
- Background Mode. FCM silent push (Firebase Cloud Messaging) wakes the app the moment a matching task is dispatched, reducing task start latency from minutes to seconds.
- Lifecycle-Aware Heartbeats. 10s (WiFi, foreground), 30s (backgrounded), 60s (cellular).
- Task Handoff. On 30 seconds of failed reconnection during active task execution: save partial output locally, reconnect, call POST /api/tasks/{id}/handoff. Hub suspends original task and creates continuation for the next available agent.
- UI Design. Glassmorphism: frosted glass cards via BackdropFilter, gradient borders, live status indicator. All colors via MeshColors and MeshGradient in theme.dart.
| Config | Value |
|---|---|
| Bundle ID | com.gritwerk.meshNode |
| Minimum iOS | 15.0 |
| Display Name | N0D3 |
| Network | NSAllowsLocalNetworking: true |
| Android APK | 49MB |
M3SHD Commander App
The M3SHD Commander is a five-tab native iOS command center (25 Dart files). It registers with maxConcurrent: 0 — the dispatcher never assigns it tasks to execute.
| Tab | Path | Screen |
|---|---|---|
| 0 | / | Dashboard — agent grid, online status, utilization |
| 1 | /chat | Chat — full mesh chat, keyboard padding fixed |
| 2 | /tasks | Tasks — create tasks, template chips, view queue; FAB above tab bar |
| 3 | /logs | Logs — filtered SSE log stream |
| 4 | /settings | Settings — hub URL, token, commander name |
- settingsProvider: NotifierProvider<SettingsNotifier, AppSettings>
- hubTokenProvider: FutureProvider<String> (iOS Keychain)
- hubConnectionProvider: NotifierProvider (heartbeat + SSE lifecycle)
- agentsProvider: AsyncNotifierProvider (fetches immediately on connect)
- messagesProvider and tasksProvider: real-time via SSE
SSE integration reconnects with exponential backoff. The hub connection provider tears down the SSE stream on device-offline and reconnects on return. All colors from MeshColors.*. Touch targets: 44px minimum throughout.
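The reconnect schedule can be illustrated independently of the app's Dart code. The source does not state the base delay, growth factor, or ceiling, so the values below are placeholders and the generator name is hypothetical.

```python
import itertools


def backoff_delays(base=1.0, factor=2.0, cap=30.0):
    """Yield reconnect delays that grow geometrically up to a ceiling."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First seven delays with the placeholder parameters.
schedule = list(itertools.islice(backoff_delays(), 7))
```

A client would reset the generator after a successful connection so that the next outage starts again from the base delay.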
| Config | Value |
|---|---|
| Bundle ID | com.gritwerk.m3shdup |
| Minimum iOS | 15.0 |
| Background modes | fetch, remote-notification |
| Network | NSAllowsLocalNetworking: true |
Task Handoff System
The task handoff system ensures no work is lost when an agent disconnects mid-task.
Endpoint. POST /api/tasks/{id}/handoff accepts optional partial_output. The endpoint: loads the original task; updates status to suspended, storing partial output in the output field (capped at 32KB, with ownership check); creates a continuation task at bumped priority with a prompt prepended by CONTINUE FROM PREVIOUS AGENT'S PARTIAL WORK:; dispatches via the standard dispatcher; and broadcasts a task_handoff SSE event.
N0D3 triggers handoff automatically on 30 seconds of failed reconnection. Desktop workers can call it explicitly when approaching token budget limits. The required_capability of the original task is preserved in the handoff task so the continuation lands on a capable agent.
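The record-keeping part of the handoff can be sketched as a pure function that produces the suspended original and its continuation. The function name, the dictionary fields, the prompt layout after the prefix, and a priority bump of +1 are assumptions; the ownership check, dispatch, and SSE broadcast are left out.

```python
MAX_PARTIAL_BYTES = 32 * 1024
CONTINUE_PREFIX = "CONTINUE FROM PREVIOUS AGENT'S PARTIAL WORK:"


def build_handoff(task: dict, partial_output: str = ""):
    """Return (suspended_task, continuation_task) for a mid-task handoff."""
    # Cap on encoded size; errors="ignore" discards a multi-byte character
    # that the byte slice may have cut in half.
    capped = partial_output.encode("utf-8")[:MAX_PARTIAL_BYTES].decode(
        "utf-8", errors="ignore"
    )
    suspended = dict(task, status="suspended", output=capped)
    continuation = {
        "prompt": f"{CONTINUE_PREFIX}\n{capped}\n\n{task['prompt']}",
        "priority": task["priority"] + 1,  # assumption: higher is more urgent
        # Preserved so the continuation lands on a capable agent.
        "required_capability": task.get("required_capability"),
        "status": "queued",
    }
    return suspended, continuation
```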
Agent Memory System
The agent memory system gives agents persistent, searchable, encrypted memory across tasks.
Schema. The agent_memory table enforces UNIQUE(agent_id, key). Values are encrypted at rest using the hub's secret key. An FTS5 virtual table (agent_memory_fts) stores plaintext copies for full-text search.
- POST /api/agents/{id}/memory: store or update a memory entry
- GET /api/agents/{id}/memory: list all memories for an agent
- GET /api/agents/{id}/memory/search?q=<query>: FTS5 full-text search
- DELETE /api/agents/{id}/memory/{key}: delete a specific entry
Workers scan task output for [REMEMBER] key: value [/REMEMBER] blocks and POST them automatically. When the hub creates a task prompt, get_memory_context(agent_id, task_text) performs an FTS5 search against the task text and prepends top-K matching memories. Agents thus "remember" relevant prior facts without explicit operator configuration.
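Extraction of the [REMEMBER] blocks can be sketched with a regular expression. The function name and the rule that the key ends at the first colon are assumptions; the marker syntax is the one shown above.

```python
import re

# The key runs up to the first colon; the value is everything after it
# up to the closing marker, so values may themselves contain colons.
REMEMBER_RE = re.compile(
    r"\[REMEMBER\]\s*([^:\[\]]+?)\s*:\s*(.*?)\s*\[/REMEMBER\]",
    re.DOTALL,
)


def extract_memories(output: str) -> dict:
    """Return {key: value} for every [REMEMBER] block in a task output."""
    return {key: value for key, value in REMEMBER_RE.findall(output)}
```

A worker would then POST each pair to /api/agents/{id}/memory; if a key appears twice in one output, the later block wins in this sketch.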
A "sleep" function sweeps all agent memories, merges duplicates, resolves contradictions (later entry wins unless confidence scores differ), and discovers cross-agent patterns. Results are written back as consolidated memory entries tagged with source: consolidation. FTS5 query strings are sanitized to alphanumeric words before reaching the FTS engine, preventing malformed syntax from crashing the shared database connection.
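The sanitization step can be sketched as follows. Keeping only alphanumeric words matches the description; quoting each word and joining with OR are assumptions added here, because a bare uppercase AND, OR, NOT, or NEAR would otherwise be parsed as an FTS5 operator. The function name and the term limit are hypothetical.

```python
import re


def sanitize_fts_query(raw: str, max_terms: int = 16) -> str:
    """Reduce arbitrary text to a safe FTS5 MATCH expression.

    Returns an empty string when no usable words remain; callers should
    skip the search in that case rather than run an empty MATCH.
    """
    words = re.findall(r"[A-Za-z0-9]+", raw)[:max_terms]
    # Double-quoted strings are always treated as literal phrases by FTS5.
    return " OR ".join(f'"{word}"' for word in words)
```

The result would be passed as a bound parameter, for example `... WHERE agent_memory_fts MATCH ?`, never concatenated into the SQL text.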
Task Dependencies and Pipelines
Tasks can declare dependencies on other tasks, forming execution DAGs.
Schema. task_deps junction table (parent_task_id, child_task_id). Circular dependency prevention uses a recursive CTE that walks the ancestor chain before insertion. Tasks created with depends_on: [id1, id2] start in queued state regardless of dispatcher availability.
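The ancestor walk can be sketched against an in-memory SQLite database. The table and column names follow the schema above; the function name `would_create_cycle` is hypothetical. Adding an edge parent -> child closes a loop exactly when the child is already an ancestor of the parent.

```python
import sqlite3


def would_create_cycle(conn, parent_id: int, child_id: int) -> bool:
    """True if inserting (parent_id, child_id) into task_deps would form a cycle."""
    if parent_id == child_id:
        return True
    row = conn.execute(
        """
        WITH RECURSIVE ancestors(id) AS (
            SELECT parent_task_id FROM task_deps WHERE child_task_id = :parent
            UNION  -- UNION (not UNION ALL) deduplicates, so the walk terminates
            SELECT d.parent_task_id
            FROM task_deps AS d
            JOIN ancestors AS a ON d.child_task_id = a.id
        )
        SELECT 1 FROM ancestors WHERE id = :child LIMIT 1
        """,
        {"parent": parent_id, "child": child_id},
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_deps (parent_task_id INTEGER, child_task_id INTEGER)")
conn.executemany("INSERT INTO task_deps VALUES (?, ?)", [(1, 2), (2, 3)])
```

With the chain 1 -> 2 -> 3 loaded, making task 3 a parent of task 1 is rejected, while adding a direct edge from 1 to 3 is allowed.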
Auto-dispatch. When a task reaches done, check_and_dispatch_dependents() runs. It finds all child tasks whose parents are all in done state, injects parent output into the child prompt, and dispatches. This chains arbitrarily deep without operator involvement.
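The readiness check can be illustrated with plain data structures in place of the database. Both function names and the prompt layout are hypothetical; the rule itself (a child runs once every parent is done, with parent output injected) is the one described above.

```python
def ready_dependents(done_task_id, deps, status):
    """Return child task ids that became runnable when done_task_id finished.

    deps is a list of (parent_id, child_id) pairs; status maps id -> state.
    """
    children = {child for parent, child in deps if parent == done_task_id}
    ready = []
    for child in sorted(children):
        parents = [p for p, c in deps if c == child]
        if status.get(child) == "queued" and all(status.get(p) == "done" for p in parents):
            ready.append(child)
    return ready


def inject_parent_output(child_prompt, parent_outputs):
    """Prepend each parent's output to the child prompt (layout is assumed)."""
    sections = [f"[Output of task {tid}]\n{out}" for tid, out in sorted(parent_outputs.items())]
    return "\n\n".join(sections + [child_prompt])
```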
Pipelines. POST /api/pipelines accepts a list of task definitions and wires them sequentially: task N's completion dispatches task N+1 with N's output injected. Use cases: research → summarize → notify; code_write → code_review → deploy.
Task Templates
Task templates allow one-tap dispatch of recurring workflows with variable substitution.
Schema. task_templates table with fields: id, name, description, prompt_template (with {variable} placeholders), capability, priority, created_by.
Seed templates: Research ({topic}), Summarize ({target}), QA ({target}), Code Review ({target}), Write ({topic}).
- POST /api/templates: create a template
- GET /api/templates: list all templates
- DELETE /api/templates/{id}: remove a template
- POST /api/templates/{id}/dispatch: dispatch with variable substitution
The Commander app shows a horizontal template chip row on the Tasks screen. Tapping a chip opens a bottom sheet for variable input, then dispatches. The voice dispatch endpoint also matches natural language to the most appropriate template.
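Variable substitution for a prompt_template can be sketched with a regular expression rather than `str.format`, which would also evaluate attribute and index lookups inside the braces. The function name and the choice to raise on a missing variable are assumptions.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{(\w+)\}")


def render_template(prompt_template: str, variables: dict) -> str:
    """Fill {variable} placeholders; raise KeyError if any value is missing."""
    needed = set(PLACEHOLDER_RE.findall(prompt_template))
    missing = sorted(needed - set(variables))
    if missing:
        raise KeyError(f"missing template variables: {', '.join(missing)}")
    return PLACEHOLDER_RE.sub(lambda m: str(variables[m.group(1)]), prompt_template)
```

For example, rendering "Research {topic} in depth" with topic set to "SQLite FTS5" gives "Research SQLite FTS5 in depth".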
Plugin System
The plugin system allows extending agent capabilities with structured tools callable during task execution.
Architecture. app/plugins.py implements PluginManager with three registries: tool functions, lifecycle hooks, and capability declarations. Plugin files in the plugins/ directory expose a setup(manager) function that registers with the manager on hub startup.
- web_search — searches the web and returns structured results
- file_summary — summarizes a file at a given path (hub-side, path-sanitized)
- notify — fires an ntfy push notification from within a task
- memory_enhance — performs an FTS5 search against agent memories and returns matches
Tool invocation. Workers scan task output for [TOOL_CALL] {"tool": "web_search", "query": "..."} [/TOOL_CALL] markers, POST to /api/plugins/{tool}/invoke, and incorporate the result into the next Claude invocation. Plugin responses strip filesystem paths from the output.
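Parsing the markers on the worker side can be sketched as follows. The function name and the decision to skip malformed payloads silently are assumptions; the marker syntax and the JSON shape with a "tool" field are taken from the example above.

```python
import json
import re

TOOL_CALL_RE = re.compile(r"\[TOOL_CALL\]\s*(\{.*?\})\s*\[/TOOL_CALL\]", re.DOTALL)


def extract_tool_calls(output: str):
    """Return a list of (tool_name, arguments) found in a task output."""
    calls = []
    for raw in TOOL_CALL_RE.findall(output):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # assumption: malformed calls are ignored, not fatal
        tool = payload.pop("tool", None)
        if isinstance(tool, str):
            calls.append((tool, payload))
    return calls
```

Each pair would then be POSTed to /api/plugins/{tool}/invoke with the remaining fields as the request body.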