// LLM knowledge base for Hermes Agent (NousResearch) + the Pi Harness deployment on claws-mac-mini
// claws-mac-mini deployment (launchd · Codex OAuth + Gemma-4 fallback · self-heal patches · tool surface). Companion prose at hermes-agent-guide and hermes-pi-harness-guide.
┌────────────── MESSAGE INGRESS ──────────────┐
│ Telegram · Discord · Slack · WhatsApp │
│ Signal · Matrix · SMS · CLI · voice · … │
└────────────────────┬────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ GATEWAY SERVICE (long-running) │
│ multi-platform routing · per-user sessions │
│ cron dispatch · systemd/launchd restart │
└────────────────────┬────────────────────────┘
▼
┌─────────────────────────────────────────────┐
│ AGENT CORE (claw.py) 6-step loop │
│ 1 load → 2 LLM → 3 tools → 4 stream → 5 persist → 6 learn
└────┬─────────────┬──────────────┬───────────┘
▼ ▼ ▼
17 toolsets MCP servers Model tier
(web · fs · (stdio · HTTP OpenRouter · Anthropic
browser · · OAuth 2.1) OpenAI · Nous · Copilot
code · TTS …) (+ Codex OAuth, Gemma-4 local on Pi harness)
│
▼
SKILLS Official · Trusted · Community · Custom
│
▼
~/.hermes/
.env · config.yaml · SOUL.md
state.db (FTS5) · skills/ · memories/
sessions/ · logs/ · cron/
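The six-step loop in the diagram can be sketched as a toy turn handler. Everything below is illustrative; none of these names mirror claw.py's real API.

```python
# Toy sketch of the six-step loop (load → LLM → tools → stream → persist → learn).
# Function and field names are hypothetical, not Hermes internals.

def run_turn(history, user_message, llm, tools):
    history.append({"role": "user", "content": user_message})    # 1 load context
    reply, tool_calls = llm(history)                             # 2 LLM call
    while tool_calls:                                            # 3 run tools, re-ask
        for name, args in tool_calls:
            history.append({"role": "tool", "content": tools[name](**args)})
        reply, tool_calls = llm(history)
    print(reply)                                                 # 4 stream response
    history.append({"role": "assistant", "content": reply})      # 5 persist turn
    # 6 learn: a real core would offer skill extraction / memory updates here
    return reply

# Minimal fake LLM: requests one tool call, then answers from its result.
def fake_llm(history):
    if not any(m["role"] == "tool" for m in history):
        return None, [("echo", {"text": "hi"})]
    return "done: " + history[-1]["content"], []

result = run_turn([], "hello", fake_llm, {"echo": lambda text: text.upper()})
# result == "done: HI"
```

The point of the sketch is the shape: tool execution loops back into the LLM until no more calls are requested, and persistence happens once per turn.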
Long-running entry point. Multi-platform routing, per-user session isolation, cron trigger dispatch. Supervised by systemd (Linux / WSL2) or launchd (macOS). On the Pi Harness: launchd label ai.hermes.gateway.
The thinking loop. Six steps per turn: load context → LLM call → tool execution → stream response → persist + learn. Persist step updates state.db, token usage, Honcho, and offers skill extraction.
Provider-agnostic LLM call shim. Normalises auth + streaming across OpenRouter · Anthropic · OpenAI · Ollama · vLLM · Nous · Copilot. The agent core never speaks to a provider directly.
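One way such a shim can normalise providers is a table of per-provider base URLs and auth sources behind a single request builder. A minimal sketch, assuming OpenAI-compatible chat-completions endpoints (true of OpenRouter and Ollama); the function and key names are illustrative, not the actual shim.

```python
# Hypothetical provider table: one call surface, per-provider adapters behind it.
PROVIDERS = {
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "auth_env": "OPENROUTER_API_KEY"},
    "ollama":     {"base_url": "http://127.0.0.1:11434/v1",    "auth_env": None},  # local, no key
}

def build_request(provider, model, messages):
    cfg = PROVIDERS[provider]
    headers = {"content-type": "application/json"}
    if cfg["auth_env"]:
        # placeholder: a real shim would resolve the key from .env at runtime
        headers["authorization"] = f"Bearer ${{{cfg['auth_env']}}}"
    return {"url": cfg["base_url"] + "/chat/completions",
            "headers": headers,
            "body": {"model": model, "messages": messages, "stream": True}}

req = build_request("ollama", "gemma", [{"role": "user", "content": "hi"}])
# req["url"] == "http://127.0.0.1:11434/v1/chat/completions"
```

Because the agent core only ever sees `build_request`-style output, swapping providers is a config change, not a code change.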
Top-level command — hermes run for an interactive session, hermes gateway run for the service, plus the slash command family in-chat.
What makes Hermes stateful. After each turn, the core offers to extract reusable skills and update Honcho/Mem0 memories. The same install grows more personal over weeks.
Provider aggregator — single API key, many upstreams. Default recommendation for new installs because it covers most model choices in one place.
Claude model family — Opus, Sonnet, Haiku. First-class support; used by the autoresearch loop escalation policy (Sonnet → Opus 4.6 at 0.92).
Supported via API key. Distinct from the Codex OAuth path used on the Pi harness.
Nous Research's hosted endpoint. Project-default path for Nous-branded installs.
Local inference paths. vLLM for throughput, Ollama for ergonomics, llama-server for Gemma-class quantised models on Apple silicon.
Available as a provider for licensed users. Useful when the existing Copilot entitlement covers inference cost.
Pi-harness primary. chatgpt.com/backend-api/codex via ChatGPT session — entitlement-funded rather than API-billed. Not a supported surface; returns empty response.output in bursts. Covered by the Gemma self-heal.
Pi-harness fallback. gemma-4-e4b-it-Q4_K_M.gguf on llama-server at 127.0.0.1:8080. OpenAI-compatible chat-completions endpoint. Always up; covers Codex upstream blips.
Config toggle in config.yaml. Lets Hermes route low-stakes turns to a cheap model (Gemma on the Pi harness) while keeping the primary model for harder work. One knob, two models.
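A hedged sketch of what the toggle might look like in config.yaml; the model names come from the Pi-harness block below, but the exact key names are not guaranteed to match the shipped schema.

```yaml
model:
  primary: openai-codex/gpt-5.4            # harder work stays on the primary
smart_routing:
  enabled: true
  cheap_model: gemma-4-e4b-it-Q4_K_M.gguf  # low-stakes turns go local
```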
Search providers: Firecrawl, Exa, Tavily. Config-selectable per install.
Playwright-backed browser sessions. Inactivity timeout 120s in the config reference. Browserbase as an external option.
Read / write / patch files in the active working directory. Gated by the session's cwd + the permission model.
Runs code through the chosen terminal backend — local shell, Docker, SSH, Modal, Daytona, Singularity.
Image understanding + generation. FAL.ai common for image gen; vision inputs piped through the provider's multimodal path.
Voice out (ElevenLabs class) + voice in (Whisper class). Pairs with the iOS Pi harness voice input node.
Decomposes a task into subtasks; calls other toolsets in sequence. Explicit planner toolset rather than emergent behaviour.
Schedule recurring agent runs. Standard cron expressions plus per-run cost caps + pause/resume control.
Bridge to Home Assistant for physical-world automations. Entity discovery + service calls from chat.
Where code execution runs: local, Docker, SSH, Modal, Daytona, Singularity (HPC). Configurable per session. Pi harness uses local with a 180s per-call timeout.
Firecrawl · Exa · Tavily · Browserbase · FAL.ai · ElevenLabs · Home Assistant. API keys held in .env; consumed by the matching toolset.
0 9 * * *-style expressions stored in ~/.hermes/cron/. Per-run cost cap + pause/resume. Triggers flow back through the agent core like any other message.
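A plausible shape for one stored job, reading `0 9 * * *` left to right as minute · hour · day-of-month · month · day-of-week (so: every day at 09:00). Field names here are illustrative, not the shipped schema.

```yaml
# Hypothetical job file under ~/.hermes/cron/
schedule: "0 9 * * *"      # daily at 09:00 local
prompt: "Summarise overnight changes"
cost_cap_usd: 0.50         # per-run cost cap
paused: false              # flip to pause without deleting
```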
External tools plugged in via the open MCP standard. Three transports supported: stdio, HTTP, and OAuth 2.1. Any MCP server that works with Claude Code works with Hermes.
stdio (local subprocess), HTTP (hosted), OAuth 2.1 (hosted with user auth). Pi harness uses stdio for filesystem and OAuth-bridged HTTP via mcp-remote for gtm.
mcp_servers: block in config.yaml. Each entry names a server, its transport command, and description. Gateway starts the subprocess on boot.
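The two Pi-harness servers described below suggest a shape like this; the commands are taken verbatim from this document, but the surrounding key names are an assumption, not the verified schema.

```yaml
mcp_servers:
  filesystem:
    transport: stdio
    command: npx -y @modelcontextprotocol/server-filesystem ~/.hermes/hermes-agent
    description: Read/write under the Hermes install
  gtm:
    transport: stdio
    command: npx -y mcp-remote https://gtm-mcp.stape.ai/mcp
    description: OAuth-bridged Google Tag Manager MCP
```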
Four-tier library: Official, Trusted, Community, Custom. Skills are lazy — only attached when matched. Stored under ~/.hermes/skills/.
Security pipeline for skills: static scan → quarantine → policy check → user confirm → deploy. Custom taps hook into each phase.
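The five-phase pipeline with per-phase custom taps can be sketched as a small state machine. The phase names come from the text; the hook mechanism and stand-in checks are purely illustrative.

```python
# Toy sketch of the skill security pipeline:
# static scan → quarantine → policy check → user confirm → deploy.

PHASES = {
    "static_scan":  lambda s: "rm -rf" not in s["body"],   # trivial stand-in scanner
    "quarantine":   lambda s: True,                        # hold until later phases pass
    "policy_check": lambda s: s["tier"] in ("official", "trusted", "community", "custom"),
    "user_confirm": lambda s: s.get("confirmed", False),
    "deploy":       lambda s: True,
}

def run_pipeline(skill, hooks=()):
    for phase in ("static_scan", "quarantine", "policy_check", "user_confirm", "deploy"):
        for hook in hooks:                 # custom taps fire at every phase
            hook(phase, skill)
        if not PHASES[phase](skill):
            return f"rejected at {phase}"
    return "deployed"

print(run_pipeline({"body": "echo hi", "tier": "community", "confirmed": True}))
# → deployed
```

Quarantine-first for community skills falls out naturally: a skill that fails `user_confirm` or `policy_check` never reaches the deploy phase.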
Author → scan → test → publish → pin. Community skills land in quarantine first; promote after policy check + confirmation.
Socket-mode. Pi harness runs claude_code_slack as the Slack handle on a Blue Highlighted Text workspace.
Bot-gateway. Native support in the platform family; shares channel-routing semantics with Slack.
Supported surfaces sharing the one-internal-protocol design. Voice channels pair with STT/TTS toolsets.
Secrets. API keys for every provider + external tool. Backed up separately from config.yaml for safety.
Runtime config — model.*, smart_routing, mcp_servers, platform_toolsets, compression, memory, session_reset, delegation. See Pi harness block for the concrete shape.
Persona + style guide read on every turn. Sibling to OpenClaw's SOUL.md; shared convention across the two projects.
Session history + token usage + semantic index. SQLite with FTS5 for full-text recall. Ground truth for what "happened" in any session.
Honcho (dialectic memory) + optional Mem0. Distinct from state.db — memories are distilled facts; state.db is raw history. Memory limits per user: 2200-char memory, 1375-char profile on the Pi harness.
Installed skill bundles. Each skill is a folder with its own SKILL.md. Skills hub manages the registry; this directory holds the bytes.
Per-session working state. Paired with state.db for durability; survives Gateway restarts.
Gateway + agent logs. Pi harness splits into gateway.log, gateway.error.log, and errors.log — the last is what you tail during self-heal debugging.
Apple-silicon Mac mini. Tailscale IP 100.82.244.127. OS Darwin 25.2.0 arm64. User claw. Python 3.11 venv under ~/.hermes/hermes-agent/venv.
LaunchAgent at ~/Library/LaunchAgents/ai.hermes.gateway.plist. LimitLoadToSessionType=Aqua — gateway needs a logged-in desktop session. Restart: launchctl kickstart -k gui/$(id -u)/ai.hermes.gateway.
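A minimal sketch of the plist shape. Only the label, the session-type limit, and the gateway command come from this document; `ProgramArguments` layout, the venv python path, and `KeepAlive` are plausible assumptions, not a dump of the real file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>ai.hermes.gateway</string>
  <key>LimitLoadToSessionType</key><string>Aqua</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/claw/.hermes/hermes-agent/venv/bin/python</string>
    <string>-m</string><string>hermes_cli.main</string>
    <string>gateway</string><string>run</string><string>--replace</string>
  </array>
  <key>KeepAlive</key><true/>
</dict>
</plist>
```

`LimitLoadToSessionType=Aqua` is why the gateway dies when nobody is logged in at the desktop: launchd only loads Aqua-scoped agents inside a GUI login session.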
python -m hermes_cli.main gateway run --replace. One process; binds the Slack Socket Mode app, loads MCP servers, opens the session store, starts a 60s cron ticker.
Primary model openai-codex / gpt-5.4; smart_routing cheap_model gemma-4-e4b-it-Q4_K_M.gguf; compression summary model google/gemini-3-flash-preview; session_reset daily at 04:00 local; mcp_servers filesystem + remote gtm via Stape; platform_toolsets.slack = [hermes-slack, filesystem].
Two-patch ramp on top of upstream. Patch 1: content flattener — folds Codex's list-of-parts input into plain text so Gemma's /v1/chat/completions accepts it. Patch 2: sliding-window trim, tool-call token stripper, retry with [system, last_user, last_assistant] minimal envelope, graceful final message so Slack never shows "Max retries exceeded".
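Patch 1's behaviour can be reconstructed from the description: fold a list-of-parts `content` field into one plain string so a stricter `/v1/chat/completions` backend accepts the message. This is a sketch of the idea, not the actual patch source.

```python
# Illustrative content flattener: OpenAI/Codex-style list-of-parts → plain text.
def flatten_content(messages):
    out = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):   # e.g. [{"type": "input_text", "text": ...}, ...]
            content = "".join(part.get("text", "") for part in content
                              if isinstance(part, dict))
        out.append({**msg, "content": content or ""})
    return out

msgs = [{"role": "user", "content": [{"type": "input_text", "text": "hello "},
                                     {"type": "input_text", "text": "world"}]}]
flatten_content(msgs)
# → [{'role': 'user', 'content': 'hello world'}]
```

Patch 2's minimal-envelope retry follows the same spirit: on failure, resend only `[system, last_user, last_assistant]` so the fallback model gets a payload it can always handle.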
npx -y @modelcontextprotocol/server-filesystem ~/.hermes/hermes-agent. Rooted at the Hermes repo — agent can read/write anywhere under its own install.
npx -y mcp-remote https://gtm-mcp.stape.ai/mcp. OAuth-bridged Google Tag Manager MCP. Lets the harness read + mutate GTM container state — the bridge between Hermes and the autoresearch loop.
Installed on the Mac mini's PATH for the gateway to reach: gh (GitHub CLI) · gitingest (repo → LLM-friendly digest) · repo-digest (wrapper that glues both into one call). Auth via GITHUB_TOKEN in the plist env.
Every edit to config.yaml or run_agent.py creates a timestamped backup — e.g. run_agent.py.bak-2026-04-18-155055. Enables deterministic rollback after a bad patch.
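The backup discipline amounts to one copy with a timestamp suffix in the `name.bak-YYYY-MM-DD-HHMMSS` shape shown above. A minimal sketch of such a helper; the harness's actual mechanism may differ.

```python
# Illustrative pre-edit backup: config.yaml → config.yaml.bak-2026-04-18-155055 style.
import shutil
import time
from pathlib import Path

def backup(path):
    src = Path(path)
    stamp = time.strftime("%Y-%m-%d-%H%M%S")
    dst = src.with_name(f"{src.name}.bak-{stamp}")
    shutil.copy2(src, dst)   # copy2 preserves mtime/permissions
    return dst
```

Rollback is then deterministic: pick the backup with the latest stamp and copy it back over the live file.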
curl -fsSL https://…/install.sh | bash. One-line install creates the Python venv under ~/.hermes, installs hermes-agent, sets up the config dir, adds the hermes binary to PATH.
Interactive session in the TUI. Good for quick work + debugging the install. hermes gateway run starts the always-on service instead.
Diagnoses the install. Checks Python venv, config, provider auth, MCP subprocesses, session store.
In-chat slash commands. /compact tightens context; /new resets the session; /think toggles thinking depth; /memory inspects memories; /skill previews skill state.
Pi harness gateway restart. launchctl kickstart -k gui/$(id -u)/ai.hermes.gateway. The -k kills the current process first so the reload is immediate.
Pi-harness wrapper around gh + gitingest. One-shot repo → digest path + metadata. Installed at ~/bin/repo-digest, on the gateway's PATH.
Upstream prose guide for Hermes Agent. Tabs: Overview · Install · CLI · Skills · MCP · Models · Platforms · Architecture · Glossary + FAQ. Source of truth for generic Hermes phrasing.
Deployment-specific guide for the Pi harness on claws-mac-mini. Runbook, self-heal patch history, backup discipline, shell tool surface, model stack.
Sibling personal-AI control plane. Shares conceptual shape (gateway · channels · skills · agents) but uses a different runtime. OpenClaw is a node/npm-delivered daemon; Hermes is a Python package.
Cloudflare-edge session orchestrator for Claude Code. Complementary to the Pi harness — same "host the coding agent somewhere reliable" problem, different substrate.
Hill-climb loop used with Hermes on the Pi harness to drive autoresearch. The gtm MCP server is the bridge; the escalation policy (Sonnet → Opus 4.6 @ 0.92) is the shared model-routing convention.