
# Why Mnemonic Exists
Mnemonic is a personal exploration into making AI agent coordination deterministic. It grew out of real frustrations I hit while using Claude Code for day-to-day development work.
**Work in Progress**

Mnemonic is under active development, and the ideas behind it are still evolving. This page evolves with it; what you read today may look different tomorrow.
## The Backstory
When I started playing around with Claude Code, I immediately ran into issues, the biggest being that it seemed to always "forget" the instructions and guidelines I'd given it. I soon learned about memory files and started loading up my CLAUDE.md doc with all types of @ references: general agent behavior, documentation guidelines, coding guidelines, testing guidelines, you name it.
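For flavor, a memory file from that era looked something like the sketch below. The paths are made up for illustration; `@` is Claude Code's import syntax for pulling other files into context.

```markdown
# CLAUDE.md

@docs/agent-behavior.md
@docs/documentation-guidelines.md
@docs/coding-guidelines.md
@docs/testing-guidelines.md
```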
That helped with smaller projects, but for anything more than a couple of code files, it, too, quickly began to show cracks. All those guidelines ate up a large share of the context window, and as the context grew even larger, I started to hit the "lost in the middle" problem.
So I began paring my memory files down to the essentials. In some cases that helped, but it didn't truly resolve the issue. Say I'm only working on the CI build, touching YAML build definitions and shell scripts: every memory file unrelated to that work still gets loaded into context.
## MCP to the rescue. Kinda?
About that time I also learned about MCP servers. I started using the Knowledge Graph Memory Server to hold the things that applied to all my projects instead of including them directly in each one, and instructed Claude to query the memory server when a guideline was needed. Now, that was starting to show some promise: only the examples, guidelines, and behaviors needed for the work at hand got loaded into context.
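Hooking up a memory server like that is a small MCP configuration entry. Something like the following, based on the reference memory server's README; double-check the package name against current docs:

```json
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-memory"]
    }
  }
}
```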
## Introducing Agents
But long sessions started showing the same problems: lost-in-the-middle behavior, and contexts so large that even proactive /compact commands didn't do much. So I'd have to clear the context and start over. Then I learned about creating agents (or subagents) that Claude can delegate tasks to, and I started defining agents Claude could use for specific types of work: a go software agent, a bats test agent, and an api architect agent, to name a few.
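In Claude Code, a subagent is just a markdown file with YAML frontmatter (typically under `.claude/agents/`). A pared-down sketch of the bats test agent, with fields trimmed for illustration:

```markdown
---
name: bats-test-agent
description: Writes and maintains BATS test suites for shell scripts
tools: Read, Write, Bash
---

You are a BATS testing specialist. Write focused, deterministic
tests and follow the team's testing guidelines.
```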
## Too Much Context... Again
As I did this, I realized that with all the code examples and such, the markdown files defining the agents were getting crazy huge. Subagents have their own context, true, but I didn't want to fill it with things an agent might not need for a particular task. Then I got the idea: "Hey, why don't I put all the examples in the memory server and let the subagent query for them when it needs them?" I gave it a shot with a couple of agents and, wow, I was onto something. At least I think so.
Now, all this time I was a software engineering manager. I had recently introduced my team to Go and was encouraging them to use Claude Code (or whatever tool they liked). One thing teams struggle with is consistency: across codebases, documentation, testing rigor, and so on. What if Claude helped manage that? When anyone on the team started a solution with Claude's help, Claude would produce consistent results no matter who was driving.

But how? The memory server I was using wasn't shareable, at least not to my knowledge. Each team member would need to stand up their own memory server, load in the docs and examples, and keep it updated as guidelines evolved or new ones were added. You can see how this would get unwieldy very quickly. No, I wanted a shared memory server, and I resigned myself to the fact that I was going to have to write one.
## But (I Thought) I Found a Solution
While researching what it would take to write a shared memory server, I stumbled across campfirein/cipher by Byterover. Now, I like to sling code as much as anyone, but I'm also lazy in that I don't want to reinvent the wheel, and I thought I'd found my wheel! It seemed to have what I needed for a production-grade memory server:
- Memory is stored in a vector database
- Chat history is stored in Postgres
This meant Cipher itself could be load balanced for high availability, and should things go really wrong, we could have a disaster recovery plan in place without losing our examples, guidelines, or even chat history!
**Bad news**

Cipher was just too difficult to work with and lacked the tooling I was looking for to "prime the pump," so to speak. Plus, the project has since been archived.
**Good news**

I found topoteretes/cognee! It has better documentation, is much easier to get running, and has APIs and tooling for loading the server!
## The Initial Idea
Here's what I'm thinking:
- Define a set of agents that perform specific tasks.
- Create examples and guidelines and load them into the shared memory server (Cognee).
- Reference the examples and guidelines from each agent definition via queries (sketched below).
- Distribute the agent definitions to my team.
And if all works as I hope, everyone's agents will work the same and use the same examples and guidelines, loading them only when they are needed. This also solves the issue of the memory servers getting out of sync since there is only one, and gives us one place to add more examples and other things that need to be remembered across projects, engineers, and even across teams.
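Concretely, "reference via queries" means an agent's prompt tells it to pull context on demand instead of carrying it inline. A hypothetical excerpt from such a definition (the wording is mine, not Cognee's exact API):

```markdown
---
name: go-software-agent
description: Writes idiomatic Go following the team's shared guidelines
---

Before writing any code, query the shared memory server for the
relevant guidelines (for example, "Go error handling guidelines" or
"Go project layout") and follow what it returns.
```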
```mermaid
graph TB
    subgraph Team["Development Team"]
        subgraph Engineer1["Engineer 1"]
            IDE1["Claude Desktop"]
            subgraph Agents1["Subagents"]
                Go1["go software agent"]
                API1["api architect agent"]
                Test1["bats test agent"]
            end
            IDE1 --> Agents1
        end
        subgraph Engineer2["Engineer 2"]
            IDE2["Roocode"]
            subgraph Agents2["Subagents"]
                Go2["go software agent"]
                API2["api architect agent"]
                CI2["devops agent"]
            end
            IDE2 --> Agents2
        end
        subgraph Engineer3["Engineer 3"]
            IDE3["Claude Code"]
            subgraph Agents3["Subagents"]
                Go3["go software agent"]
                Doc3["documentation agent"]
                Test3["bats test agent"]
            end
            IDE3 --> Agents3
        end
    end
    Cognee["Shared Memory MCP Server<br/>(SSE port 4000)"]
    Agents1 -->|"Query for<br/>examples,<br/>guidelines"| Cognee
    Agents2 -->|"Query for<br/>API patterns,<br/>best practices"| Cognee
    Agents3 -->|"Query for<br/>testing templates,<br/>doc standards"| Cognee
    style Team fill:#f8fafc,stroke:#64748b,stroke-width:2px
    style Engineer1 fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style Engineer2 fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style Engineer3 fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style IDE1 fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style IDE2 fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style IDE3 fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style Agents1 fill:#ccfbf1,stroke:#0d9488,stroke-width:1px
    style Agents2 fill:#ccfbf1,stroke:#0d9488,stroke-width:1px
    style Agents3 fill:#ccfbf1,stroke:#0d9488,stroke-width:1px
    style Go1 fill:#ccfbf1,stroke:#0d9488
    style API1 fill:#ccfbf1,stroke:#0d9488
    style Test1 fill:#ccfbf1,stroke:#0d9488
    style Go2 fill:#ccfbf1,stroke:#0d9488
    style API2 fill:#ccfbf1,stroke:#0d9488
    style CI2 fill:#ccfbf1,stroke:#0d9488
    style Go3 fill:#ccfbf1,stroke:#0d9488
    style Doc3 fill:#ccfbf1,stroke:#0d9488
    style Test3 fill:#ccfbf1,stroke:#0d9488
    style Cognee fill:#fecaca,stroke:#dc2626,stroke-width:3px
```
## Why Mnemonic Evolved
The initial approach proved the concepts. Shared patterns work. Specialist agents work. A shared knowledge graph works.
But there's one problem I couldn't solve: non-deterministic agent delegation.
Even with explicit delegation tables in my global CLAUDE.md file ("if the user asks for X, delegate to agent Y"), Claude would still sometimes forget. It would try to write BATS tests itself instead of using the bats-test-agent, or explore code when it should have delegated immediately. The rules helped, but they weren't reliable.
The fundamental issue is that LLMs are non-deterministic by nature. Asking an LLM to make routing decisions means those decisions will sometimes be wrong, forgotten, or inconsistent. No amount of prompt engineering fully solves this.
## The Mnemonic Solution
Mnemonic takes a different approach: routing decisions are made by code, not by the LLM.
```mermaid
graph TB
    subgraph MN ["Mnemonic Architecture"]
        direction LR
        User2[User Request] --> ACE_CLI[Mnemonic CLI]
        ACE_CLI -->|"REST API"| Mnemonic[Mnemonic Service]
        Mnemonic -->|"Code-based<br/>routing"| Router[Routing Engine]
        Router --> MainACE[Claude Code]
        MainACE -->|"Executes route"| Agents2[Specialist Agents]
        Mnemonic --- DataLayer[(Postgres + PGVector + Neo4j)]
    end
    style User2 fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style ACE_CLI fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style Mnemonic fill:#ccfbf1,stroke:#0d9488,stroke-width:2px
    style Router fill:#dcfce7,stroke:#22c55e,stroke-width:2px
    style MainACE fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style Agents2 fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style DataLayer fill:#fef3c7,stroke:#f59e0b,stroke-width:2px
```
Mnemonic's routing engine, which is plain code in a purpose-built service, decides which agent to use. Deterministically.
## Mnemonic: Built From the Ground Up
I'm building Mnemonic from scratch. It's a purpose-built service designed specifically for agent coordination:
| What Mnemonic Stores | Purpose |
|---|---|
| Agent Definitions | System prompts, allowed tools, model preferences |
| Routing Rules | Code-based rules that determine which agent handles which request |
| Patterns | Reusable context documents (code examples, guidelines, best practices) |
| Pattern Associations | Links between patterns and agents with relevance scores |
The data architecture:
- PostgreSQL for relational data and ACID transactions
- PGVector for semantic similarity search on patterns
- Neo4j for knowledge graph relationships between patterns, agents, and concepts
This isn't just swapping one memory server for another. Mnemonic is designed to make routing deterministic and auditable. The routing engine evaluates rules in priority order using code, with no LLM involved in the decision. When you ask Mnemonic to "write BATS tests," the routing engine matches that request to the bats-test-agent via configured rules, not via an LLM interpretation that might vary.
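To make that concrete, here's a minimal sketch of priority-ordered, code-based routing. The rule shape, keywords, and matching logic are my illustration, not Mnemonic's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// RoutingRule maps request keywords to a specialist agent.
type RoutingRule struct {
	Priority int      // lower numbers are evaluated first
	Keywords []string // substrings that trigger the rule
	Agent    string   // agent that handles the request
}

// Route picks an agent deterministically: same request, same answer.
func Route(rules []RoutingRule, request string) (string, bool) {
	sort.Slice(rules, func(i, j int) bool { return rules[i].Priority < rules[j].Priority })
	req := strings.ToLower(request)
	for _, r := range rules {
		for _, kw := range r.Keywords {
			if strings.Contains(req, kw) {
				return r.Agent, true
			}
		}
	}
	return "", false // no rule matched; the caller decides the fallback
}

func main() {
	rules := []RoutingRule{
		{Priority: 1, Keywords: []string{"bats"}, Agent: "bats-test-agent"},
		{Priority: 2, Keywords: []string{"api", "endpoint"}, Agent: "api-architect-agent"},
	}
	agent, _ := Route(rules, "write BATS tests for the install script")
	fmt.Println(agent) // bats-test-agent, every single time
}
```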
**RAG with Deterministic Orchestration**

Mnemonic's pattern retrieval works like RAG: semantic search finds relevant context to enrich prompts. The difference is that Mnemonic also handles routing (which agent gets the request) through code-based rules, not LLM decisions. Think of it as RAG with deterministic orchestration.
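For the retrieval half, here's a sketch of semantic pattern lookup against PGVector, assuming a hypothetical `patterns` table with an `embedding` column and a query embedding computed elsewhere:

```go
package retrieval

import (
	"database/sql"
	"fmt"
	"strings"

	_ "github.com/lib/pq" // Postgres driver
)

// TopPatterns returns the titles of the k patterns most similar to
// the query embedding. <=> is pgvector's cosine-distance operator.
func TopPatterns(db *sql.DB, queryEmbedding []float32, k int) ([]string, error) {
	// pgvector accepts vector literals like '[0.1,0.2,...]'.
	parts := make([]string, len(queryEmbedding))
	for i, v := range queryEmbedding {
		parts[i] = fmt.Sprintf("%g", v)
	}
	vec := "[" + strings.Join(parts, ",") + "]"

	rows, err := db.Query(
		`SELECT title FROM patterns ORDER BY embedding <=> $1::vector LIMIT $2`,
		vec, k)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var titles []string
	for rows.Next() {
		var t string
		if err := rows.Scan(&t); err != nil {
			return nil, err
		}
		titles = append(titles, t)
	}
	return titles, rows.Err()
}
```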
Main Claude's role in Mnemonic is purely execution: it doesn't decide who does the work; it coordinates what the routing engine determines. The LLM's creativity goes into the work itself, not into deciding who should do it.
## The Phased Approach
Mnemonic is being built in phases:
**Phase 1: Claude Code Integration (Current)**
- Mnemonic provides routing decisions via REST API (a hypothetical request is sketched after this list)
- Mnemonic CLI orchestrates execution through Claude Code
- Patterns retrieved from Mnemonic's knowledge graph
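As a sketch of Phase 1 from the CLI's side: ask the service for a routing decision over REST, then execute it through Claude Code. The endpoint path and response shape here are hypothetical:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type routeResponse struct {
	Agent    string   `json:"agent"`    // e.g. "bats-test-agent"
	Patterns []string `json:"patterns"` // pattern IDs to load as context
}

func main() {
	body, _ := json.Marshal(map[string]string{
		"request": "write BATS tests for the install script",
	})
	resp, err := http.Post("http://localhost:8080/v1/route", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var route routeResponse
	if err := json.NewDecoder(resp.Body).Decode(&route); err != nil {
		panic(err)
	}
	// The CLI would now invoke Claude Code with route.Agent and the
	// retrieved patterns; the LLM executes, it does not choose.
	fmt.Println("routing to:", route.Agent)
}
```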
**Phase 2: Direct API Integration (Future)**
- Direct integration with Anthropic API
- Removes Claude Code as intermediary
- More control over context and execution
**Phase 3: Authentication and Authorization (Enterprise)**
- Team-level access control
- Pattern governance
- Audit logging
## Things that go without saying
This is meant to reinforce a spec-engineering approach to agentic coding. Processes like BMAD, OpenSpec, SpecFlow, or your own custom spec-engineering workflow can be adapted easily: just update your agent definitions to use this approach for memory (context) management.
## To Summarize
Mnemonic is a purpose-built service that makes routing deterministic. No more hoping the LLM remembers to use the right agent. The routing engine decides, and it decides the same way every time.
I'll be tracking my progress, both achievements and dead-ends, as I go. So follow along for the ride!