How to Build a Persistent AI Agent with Hermes in 15 Minutes


This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

Most AI integrations are stateless. Every request starts cold.
Hermes Agent is different — it remembers.

This guide walks you through spinning up Hermes locally and building a minimal agent that accumulates memory across sessions. No vector database. No RAG pipeline. Just a session ID.

Prerequisites

Docker or Python 3.11+
Basic familiarity with REST APIs
15 minutes

Step 1: Run Hermes Locally

# via Docker
docker pull nousresearch/hermes-agent
docker run -p 11434:11434 nousresearch/hermes-agent

Verify it's alive:

curl http://localhost:11434/health
# {"status":"ok"}
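A container can take a few seconds to start accepting connections, so a script that calls it immediately may fail. Below is a minimal readiness check, assuming the /health endpoint returns the JSON shown above. The names `fetch_health` and `wait_for_health` and their parameters are hypothetical helpers, not part of Hermes.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_health(url: str) -> dict:
    """GET the health URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_health(
    url: str = "http://localhost:11434/health",  # URL taken from the curl call above
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], dict] = fetch_health,
) -> bool:
    """Poll until the server reports {"status": "ok"} or attempts run out."""
    for attempt in range(attempts):
        try:
            body = fetch(url)
            if isinstance(body, dict) and body.get("status") == "ok":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not reachable yet, or the body was not valid JSON
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_health()` before the first chat request keeps startup races out of the rest of the script. The `fetch` parameter exists only so the polling logic can be exercised without a running server.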

Step 2: Your First Stateful Chat

Hermes exposes an OpenAI-compatible /v1/chat/completions endpoint. Statefulness comes from a single request header: X-Hermes-Session-Id.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="hermes",
)

SESSION_ID = "my-first-agent"

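def chat(message: str) -> str:
    # Only the new message is sent; the header selects the session to use.
    response = client.chat.completions.create(
        model="hermes",
        messages=[{"role": "user", "content": message}],
        extra_headers={"X-Hermes-Session-Id": SESSION_ID},
    )
    # content is typed as optional in the SDK, so fall back to an empty string.
    return response.choices[0].message.content or ""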

Now send two messages in separate calls — no shared history in the request body:

print(chat("My name is Alex and I'm building a todo app in Go."))
# "Nice to meet you, Alex! ..."

print(chat("What language am I using?"))
# "You're using Go, as you mentioned earlier."

Each request carried a single message. Hermes answered the second one correctly because both calls used the same session ID.
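For contrast, here is roughly what a stateless integration has to do to get the same answer: keep the transcript on the client and resend it with every call. The `ClientSideHistory` class is a hypothetical illustration, not part of Hermes or the OpenAI SDK.

```python
from dataclasses import dataclass, field


@dataclass
class ClientSideHistory:
    """Transcript a stateless client must store and resend on each request."""

    messages: list[dict[str, str]] = field(default_factory=list)

    def build_request(self, user_message: str) -> list[dict[str, str]]:
        """Append the new user turn and return the full payload to send."""
        self.messages.append({"role": "user", "content": user_message})
        return list(self.messages)

    def record_reply(self, reply: str) -> None:
        """Store the assistant turn so the next request includes it."""
        self.messages.append({"role": "assistant", "content": reply})


history = ClientSideHistory()
first = history.build_request("My name is Alex and I'm building a todo app in Go.")
history.record_reply("Nice to meet you, Alex!")
second = history.build_request("What language am I using?")

# The payload grows with every turn: one message, then three.
print(len(first), len(second))  # 1 3
```

With the session header, the request body stays at one message per call and the bookkeeping above is unnecessary on the client.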

Step 3: Feed Events Over Time

Hermes memory compounds: every message sent under a session ID adds to what later calls can draw on. Feed events as structured facts:

events = [
    "Decision: Switched auth from JWT to session cookies. Reason: race condition in token refresh under concurrent requests caused 2% of users to be logged out.",
    "Decision: Removed Redis cache layer. Reason: cache invalidation bugs caused stale data in production. Replaced with direct DB reads.",
    "Decision: Added rate limiting to /api/search. Reason: one customer was generating 40% of total API load.",
]

for event in events:
    chat(event)

# Now ask about the accumulated history
print(chat("What are the biggest reliability concerns in this codebase?"))
# Hermes synthesizes across all three events
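Keeping events in one consistent shape makes them easier to generate from code. The sketch below assumes the "Decision: ... Reason: ..." layout used in the list above; the `Decision` dataclass and `to_event` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    what: str  # the change that was made
    why: str   # the reason behind it


def to_event(decision: Decision) -> str:
    """Render a decision in the 'Decision: ... Reason: ...' layout."""
    # Strip whitespace and a trailing period so the output has exactly one.
    what = decision.what.strip().rstrip(".")
    why = decision.why.strip().rstrip(".")
    return f"Decision: {what}. Reason: {why}."


event = to_event(
    Decision(
        what="Added rate limiting to /api/search",
        why="one customer was generating 40% of total API load",
    )
)
print(event)
```

Each resulting string can be passed to the `chat` function from Step 2 in the same way as the hand-written events.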

Step 4: Register a Cron Job

Hermes has a built-in scheduler. Register a recurring autonomous task:

import httpx

# Register the job with the scheduler.
response = httpx.post(
    "http://localhost:11434/api/jobs",
    headers={"Authorization": "Bearer hermes"},
    json={
        "name": "daily-standup",
        "schedule": "0 9 * * 1-5",  # 09:00, Monday through Friday
        "prompt": (
            "You are the project memory agent. Using what you remember "
            "from recent activity, generate a concise standup summary: "
            "what changed, why, and what to watch."
        ),
    },
)
response.raise_for_status()  # fail loudly if the request was rejected

That's it. Hermes now runs this prompt every weekday at 9am, drawing from whatever it has accumulated in memory — no external database, no retrieval pipeline.
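Cron expressions are easy to get subtly wrong, so it can help to check one locally before registering it. The following is a simplified sketch of standard five-field cron matching. It is not Hermes code, and whether Hermes uses exactly this dialect (or which time zone it applies) is an assumption. It does not handle names such as MON, the value 7 for Sunday, or the special rule cron applies when both day-of-month and day-of-week are restricted.

```python
from datetime import datetime


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field ('*', 'a', 'a-b', 'a,b', '*/n', 'a-b/n') into values."""
    values: set[int] = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start_text, end_text = rng.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return values


def matches(expr: str, when: datetime) -> bool:
    """Return True if a five-field cron expression fires at the given minute."""
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7  # cron: 0 = Sunday ... 6 = Saturday
    return (
        when.minute in expand_field(minute, 0, 59)
        and when.hour in expand_field(hour, 0, 23)
        and when.day in expand_field(day, 1, 31)
        and when.month in expand_field(month, 1, 12)
        and cron_weekday in expand_field(weekday, 0, 6)
    )


schedule = "0 9 * * 1-5"
print(matches(schedule, datetime(2024, 1, 1, 9, 0)))  # Monday 09:00 -> True
print(matches(schedule, datetime(2024, 1, 6, 9, 0)))  # Saturday 09:00 -> False
```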

What Just Happened

Concept	How Hermes Handles It
Memory	Persistent per session ID — no client-side history needed
Scheduling	Native `/api/jobs` endpoint with cron syntax
API surface	OpenAI-compatible — drop-in for existing code
Cost	Memory stays bounded — not a growing transcript

Session ID Design Patterns

Session IDs are namespaces. Make them intentional:

# Per-user memory
session_id = f"user:{user_id}"

# Per-repository institutional memory
session_id = f"repo:{owner}/{repo_name}"

# Per-customer support history
session_id = f"support:{customer_id}"

Sessions never bleed into each other. repo:facebook/react and repo:your-team/backend are completely isolated brains.
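Because the session ID travels in an HTTP header, building it in one place and rejecting odd characters avoids accidental collisions or malformed requests. This is a defensive sketch: `make_session_id` is a hypothetical helper, and the allowed character set is an assumption chosen for safety, not a documented Hermes rule.

```python
import re

# Conservative character sets (an assumption, not a Hermes requirement).
_NAMESPACE = re.compile(r"[a-z][a-z0-9-]*")
_PART = re.compile(r"[A-Za-z0-9._-]+")


def make_session_id(namespace: str, *parts: str) -> str:
    """Join a namespace and identifier parts into 'namespace:part/part'."""
    if not _NAMESPACE.fullmatch(namespace):
        raise ValueError(f"invalid namespace: {namespace!r}")
    if not parts:
        raise ValueError("at least one identifier part is required")
    for part in parts:
        if not _PART.fullmatch(part):
            raise ValueError(f"invalid identifier part: {part!r}")
    return f"{namespace}:{'/'.join(parts)}"


print(make_session_id("user", "42"))                 # user:42
print(make_session_id("repo", "facebook", "react"))  # repo:facebook/react
```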

What to Build Next

Give each user their own session ID → per-user personalization without a user profile database
Feed GitHub commits into a session over time → a codebase that explains its own history
Schedule daily analysis jobs → autonomous agents that surface insights without being asked
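As a starting point for the commit-feeding idea, the sketch below turns commit log text into event strings. It assumes the text came from `git log --pretty=format:%h%x09%s` (short hash, a tab, then the subject line). The `commits_to_events` helper and the sample log are hypothetical.

```python
def commits_to_events(log_text: str) -> list[str]:
    """Turn tab-separated 'hash<TAB>subject' lines into event strings."""
    events = []
    for line in log_text.splitlines():
        commit_hash, _, subject = line.partition("\t")
        if not commit_hash.strip() or not subject.strip():
            continue  # skip blank lines and lines without the expected shape
        events.append(f"Commit {commit_hash.strip()}: {subject.strip()}")
    return events


# Made-up sample input for illustration only.
sample_log = "a1b2c3d\tSwitch auth to session cookies\n\ne4f5a6b\tRemove Redis cache layer\n"
for event in commits_to_events(sample_log):
    print(event)
```

Each string could then be sent through the `chat` function from Step 2 under a per-repository session ID.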

The pattern scales to anything that benefits from an AI that remembers what it's seen before — which turns out to be almost everything worth building.