This is a submission for the Hermes Agent Challenge: Write About Hermes Agent
Most AI integrations are stateless. Every request starts cold.
Hermes Agent is different — it remembers.
This guide walks you through spinning up Hermes locally and building a minimal agent that accumulates memory across sessions. No vector database. No RAG pipeline. Just a session ID.
Prerequisites
- Docker or Python 3.11+
- Basic familiarity with REST APIs
- 15 minutes
Step 1: Run Hermes Locally
# via Docker
docker pull nousresearch/hermes-agent
docker run -p 11434:11434 nousresearch/hermes-agent
Verify it's alive:
curl http://localhost:11434/health
# {"status":"ok"}
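If you start the container from a script, the first request can arrive before the server is listening. The following is a minimal sketch of a readiness check that polls the health endpoint shown above. The helper names (`fetch_health`, `wait_until_healthy`) and the retry defaults are illustrative assumptions, not part of Hermes.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str = "http://localhost:11434/health") -> dict:
    """GET the health endpoint from Step 1 and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    probe: Callable[[], dict] = fetch_health,
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Return True once the probe reports {"status": "ok"}, False if it never does."""
    for attempt in range(attempts):
        try:
            body = probe()
            if isinstance(body, dict) and body.get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body: not ready yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy()` with no arguments probes the local server. Passing a custom `probe` lets you exercise the retry logic without a running container.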
Step 2: Your First Stateful Chat
Hermes exposes an OpenAI-compatible /v1/chat/completions endpoint. The magic is one header: X-Hermes-Session-Id.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="hermes",
)

SESSION_ID = "my-first-agent"

def chat(message: str) -> str:
    response = client.chat.completions.create(
        model="hermes",
        messages=[{"role": "user", "content": message}],
        extra_headers={"X-Hermes-Session-Id": SESSION_ID},
    )
    return response.choices[0].message.content
Now send two messages in separate calls — no shared history in the request body:
print(chat("My name is Alex and I'm building a todo app in Go."))
# "Nice to meet you, Alex! ..."
print(chat("What language am I using?"))
# "You're using Go, as you mentioned earlier."
The second request carried only the new question. Hermes answered from memory because both calls sent the same session ID.
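Because memory is keyed by the header, one way to keep sessions from getting mixed up in a larger program is to bind each session ID to its own function instead of relying on a module-level constant. This is a sketch under that assumption: `make_chat` and `FakeCompletions` are hypothetical names, and the fake client exists only so the wiring can be checked without a running server.

```python
from types import SimpleNamespace
from typing import Callable


def make_chat(client, session_id: str, model: str = "hermes") -> Callable[[str], str]:
    """Return a chat(message) function that always sends the given session ID."""
    def chat(message: str) -> str:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": message}],
            extra_headers={"X-Hermes-Session-Id": session_id},
        )
        return response.choices[0].message.content
    return chat


class FakeCompletions:
    """Offline stand-in for client.chat.completions; records the headers it receives."""
    def __init__(self):
        self.seen_headers = []

    def create(self, *, model, messages, extra_headers):
        self.seen_headers.append(extra_headers)
        reply = SimpleNamespace(content=f"echo: {messages[-1]['content']}")
        return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


completions = FakeCompletions()
fake_client = SimpleNamespace(chat=SimpleNamespace(completions=completions))

# Two independent conversations, each pinned to its own session ID.
alex = make_chat(fake_client, "user:alex")
sam = make_chat(fake_client, "user:sam")
```

With the real `OpenAI` client from this step in place of `fake_client`, `alex` and `sam` would talk to two separate memories.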
Step 3: Feed Events Over Time
Hermes memory compounds. The more you feed it, the richer its understanding becomes. Feed events as structured facts, each stating what changed and why:
events = [
    "Decision: Switched auth from JWT to session cookies. Reason: race condition in token refresh under concurrent requests caused 2% of users to be logged out.",
    "Decision: Removed Redis cache layer. Reason: cache invalidation bugs caused stale data in production. Replaced with direct DB reads.",
    "Decision: Added rate limiting to /api/search. Reason: one customer was generating 40% of total API load.",
]
for event in events:
    chat(event)
# Now ask about the accumulated history
print(chat("What are the biggest reliability concerns in this codebase?"))
# Hermes synthesizes across all three events
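If events come from several places, a small helper can keep them in the same "Decision: ... Reason: ..." shape as the strings above. This is only a sketch: the `Event` class and its field names are assumptions made for illustration, and Hermes itself receives plain text either way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str     # e.g. "Decision"
    summary: str  # what changed
    reason: str   # why it changed

    def to_message(self) -> str:
        """Render the event as 'Kind: summary. Reason: reason.'"""
        # Strip any trailing period so the template does not double it.
        summary = self.summary.strip().rstrip(".")
        reason = self.reason.strip().rstrip(".")
        return f"{self.kind}: {summary}. Reason: {reason}."


rate_limit = Event(
    kind="Decision",
    summary="Added rate limiting to /api/search",
    reason="one customer was generating 40% of total API load.",
)
message = rate_limit.to_message()
```

Each rendered message can then be passed to `chat(...)` exactly like the hand-written strings.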
Step 4: Register a Cron Job
Hermes has a built-in scheduler. Register a recurring autonomous task:
import httpx

response = httpx.post(
    "http://localhost:11434/api/jobs",
    headers={"Authorization": "Bearer hermes"},
    json={
        "name": "daily-standup",
        "schedule": "0 9 * * 1-5",
        "prompt": (
            "You are the project memory agent. Using what you remember "
            "from recent activity, generate a concise standup summary: "
            "what changed, why, and what to watch."
        ),
    },
)
response.raise_for_status()  # fail loudly if the job was not accepted
That's it. Hermes now runs this prompt every weekday at 9:00 (the schedule 0 9 * * 1-5 reads as minute 0, hour 9, Monday through Friday), drawing on whatever it has accumulated in memory, with no external database or retrieval pipeline.
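Cron strings are easy to get subtly wrong, so it can help to check one locally before registering the job. The sketch below assumes the scheduler follows classic five-field cron semantics with Sunday as 0. It supports only `*`, single numbers, ranges, and comma lists, and `cron_matches` is a hypothetical helper, not a Hermes API.

```python
from datetime import datetime


def _expand(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field ('*', '5', '1-5', '1,3,5') into the values it allows."""
    if field == "*":
        return set(range(lo, hi + 1))
    values: set[int] = set()
    for part in field.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
        else:
            start = end = int(part)
        if not (lo <= start <= end <= hi):
            raise ValueError(f"cron value out of range: {part!r}")
        values.update(range(start, end + 1))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` (to the minute) satisfies a 5-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    minute, hour, dom, month, dow = fields
    # Cron counts weekdays 0-6 with Sunday as 0; isoweekday() gives Monday=1..Sunday=7.
    cron_dow = when.isoweekday() % 7
    dom_ok = when.day in _expand(dom, 1, 31)
    dow_ok = cron_dow in _expand(dow, 0, 6)
    # Classic cron rule: if both day fields are restricted, either one matching is enough.
    if dom != "*" and dow != "*":
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return (
        when.minute in _expand(minute, 0, 59)
        and when.hour in _expand(hour, 0, 23)
        and when.month in _expand(month, 1, 12)
        and day_ok
    )
```

For example, `cron_matches("0 9 * * 1-5", datetime(2024, 1, 1, 9, 0))` is true because 1 January 2024 was a Monday, while the same time on Saturday 6 January does not match. The check says nothing about time zones, which depend on how the scheduler is configured.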
What Just Happened
| Concept | How Hermes Handles It |
|---|---|
| Memory | Persistent per session ID — no client-side history needed |
| Scheduling | Native /api/jobs endpoint with cron syntax |
| API surface | OpenAI-compatible — drop-in for existing code |
| Cost | Each request carries only the new message, not a growing transcript |
Session ID Design Patterns
Session IDs are namespaces. Make them intentional:
# Per-user memory
session_id = f"user:{user_id}"
# Per-repository institutional memory
session_id = f"repo:{owner}/{repo_name}"
# Per-customer support history
session_id = f"support:{customer_id}"
Sessions never bleed into each other. repo:facebook/react and repo:your-team/backend are completely isolated brains.
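Isolation only holds if two different things can never produce the same ID string. One way to guard against that is to build IDs through a single helper that rejects separators inside the parts. The `session_id` function and its allowed character set below are assumptions chosen for illustration; Hermes may accept a wider range of values.

```python
import re

# Assumption: a conservative character set that is safe in an HTTP header value
# and excludes the ':' and '/' separators used to join the parts.
_NAMESPACE = re.compile(r"[a-z][a-z0-9-]*")
_PART = re.compile(r"[A-Za-z0-9._-]+")


def session_id(namespace: str, *parts: str) -> str:
    """Build 'namespace:part/part', rejecting input that could collide with another ID."""
    if not _NAMESPACE.fullmatch(namespace):
        raise ValueError(f"bad namespace: {namespace!r}")
    if not parts:
        raise ValueError("at least one part is required")
    for part in parts:
        if not _PART.fullmatch(part):
            raise ValueError(f"bad session ID part: {part!r}")
    return f"{namespace}:{'/'.join(parts)}"


repo_session = session_id("repo", "facebook", "react")
user_session = session_id("user", "42")
```

Rejecting `/` inside a part means `("repo", "a/b", "c")` and `("repo", "a", "b/c")` cannot both map to `repo:a/b/c`.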
What to Build Next
- Give each user their own session ID → per-user personalization without a user profile database
- Feed GitHub commits into a session over time → a codebase that explains its own history
- Schedule daily analysis jobs → autonomous agents that surface insights without being asked
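As a starting point for the commit-history idea, the sketch below turns the output of `git log --pretty=format:%h%x09%s` (short hash, a tab, then the subject line) into one event string per commit. The function name `commits_to_events` and the "Commit <hash>: <subject>" wording are assumptions; any consistent phrasing should work.

```python
def commits_to_events(log_text: str) -> list[str]:
    """Turn tab-separated `git log` lines into one event string per commit."""
    events = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between entries
        short_hash, _, subject = line.partition("\t")
        events.append(f"Commit {short_hash}: {subject.strip()}")
    return events


sample_log = "a1b2c3d\tFix token refresh race\n\n9f8e7d6\tRemove Redis cache layer\n"
commit_events = commits_to_events(sample_log)
```

Each string in `commit_events` can be sent through `chat(...)` under a `repo:` session ID, the same way the decision events were fed in Step 3.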
The pattern scales to anything that benefits from an AI that remembers what it's seen before — which turns out to be almost everything worth building.