Why Mnemonic Exists

Mnemonic is a personal exploration into making AI agent coordination deterministic. It grew out of real frustrations I hit while using Claude Code for day-to-day development work.

Work in Progress

Mnemonic is under active development and the ideas behind it are still evolving. This page evolves with it -- what you read today may look different tomorrow.

The Backstory

When I started playing around with Claude Code, I immediately ran into issues, the biggest being that it seemed to always "forget" the instructions and guidelines I'd give it. I soon learned about memory files and started loading up my CLAUDE.md doc with all types of @ references: general agent behavior, documentation guidelines, coding guidelines, testing guidelines, you name it.

That helped with smaller projects, but for projects that were more than a couple of code files, it too quickly began to show cracks. All those guidelines ate up a good majority of the context, and as the context grew even larger, I started to run into the "lost in the middle" problem.

So I began paring my memory files down to only the essentials. In some cases it helped, but it didn't truly resolve the issue. Say I'm only working on the CI build, touching nothing but YAML build definitions and shell scripts: all the memory files unrelated to those tasks still get loaded into context.

MCP to the Rescue. Kinda?

Around that time I also learned about MCP servers. I started using the Knowledge Graph Memory Server to hold the things applicable to all my projects instead of including them directly in each one, and instructed Claude to query the memory server whenever a guideline was needed. Now, that was starting to show some promise: only the examples, guidelines, and behaviors needed for the work at hand got loaded into context.

Introducing Agents

But long sessions would start to show the same issues: lost-in-the-middle problems, and contexts that grew so large that even proactive /compact commands didn't do much, so I'd have to clear the context and start over. Then I learned about creating agents (or subagents) that Claude can delegate tasks to, and I started defining agents for specific types of work: a go software agent, a bats test agent, and an api architect agent, to name a few.

Too Much Context...Again

As I was doing this, I realized that with all the code examples and such, the markdown files that defined the agents were getting crazy huge. Subagents have their own context, true, but I didn't want to fill it up with things an agent may not need for a particular task. Then I got the idea: "hey, why don't I put all the examples in the memory server, and let the subagent query for them when it needs them?" I gave it a shot with a couple of agents and, wow! I was onto something. At least I think so.

Now, all this time I was a software engineering manager. I had recently introduced my team to Go and was encouraging them to use Claude Code (or whatever tool they liked). One of the things teams struggle with is consistency: across codebases, documentation, testing rigor, and so on. What if I had Claude help manage that? When anyone on the team started a solution and used Claude to help them, Claude would produce consistent results no matter who was using it.

But how do I do this? The memory server I was using wasn't shareable, at least not to my knowledge. That meant each team member would need to stand up their own memory server, add the docs and examples, and keep it all updated as guidelines evolved or new ones were developed. I think you can see how this could get unwieldy very quickly. No, I wanted a memory server that is shared, and I resigned myself to the fact that I was going to have to write one.

But (I thought) I Found A Solution

While researching what it would take to write a shared memory server, I stumbled across campfirein/cipher by Byterover. Now, I like to sling code as much as anyone, but I'm also lazy in that I don't want to reinvent the wheel. I may have found my wheel! It seemed to have what I needed for a production-grade memory server:

  • Memory is stored in a vector database
  • Chat history is stored in Postgres

This means Cipher itself can be load balanced for HA, and should things go really badly, we can have a DR plan in place without losing our examples, guidelines, or even chat history!


Bad news

Cipher was just too difficult to work with and lacked the tooling I was looking for to "prime the pump," so to speak. Plus, it seems the project has been archived.

Good news

I found topoteretes/cognee! It has better documentation, is much easier to get running, and has APIs and tooling for loading the server!


The Initial Idea

Here's what I'm thinking:

  1. Define a set of agents that perform specific tasks.
  2. Create examples and guidelines and load them into the shared memory server (Cognee).
  3. Reference those examples and guidelines from each agent definition via memory queries.
  4. Distribute the agent definitions to my team.

And if it all works as I hope, everyone's agents will behave the same and use the same examples and guidelines, loading them only when they are needed. This also solves the problem of memory servers drifting out of sync, since there is only one, and it gives us a single place to add more examples and anything else that needs to be remembered across projects, engineers, and even teams.

graph TB
    subgraph Team["Development Team"]
        subgraph Engineer1["Engineer 1"]
            IDE1["Claude Desktop"]
            subgraph Agents1["Subagents"]
                Go1["go software agent"]
                API1["api architect agent"]
                Test1["bats test agent"]
            end
            IDE1 --> Agents1
        end

        subgraph Engineer2["Engineer 2"]
            IDE2["Roocode"]
            subgraph Agents2["Subagents"]
                Go2["go software agent"]
                API2["api architect agent"]
                CI2["devops agent"]
            end
            IDE2 --> Agents2
        end

        subgraph Engineer3["Engineer 3"]
            IDE3["Claude Code"]
            subgraph Agents3["Subagents"]
                Go3["go software agent"]
                Doc3["documentation agent"]
                Test3["bats test agent"]
            end
            IDE3 --> Agents3
        end
    end

    Agents1 -->|"Query for<br/>examples,<br/>guidelines"| Cognee
    Agents2 -->|"Query for<br/>API patterns,<br/>best practices"| Cognee
    Agents3 -->|"Query for<br/>testing templates,<br/>doc standards"| Cognee

    Cognee["Shared Memory MCP Server<br/>(SSE port 4000)"]

    style Team fill:#f8fafc,stroke:#64748b,stroke-width:2px
    style Engineer1 fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style Engineer2 fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style Engineer3 fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style IDE1 fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style IDE2 fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style IDE3 fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style Agents1 fill:#ccfbf1,stroke:#0d9488,stroke-width:1px
    style Agents2 fill:#ccfbf1,stroke:#0d9488,stroke-width:1px
    style Agents3 fill:#ccfbf1,stroke:#0d9488,stroke-width:1px
    style Go1 fill:#ccfbf1,stroke:#0d9488
    style API1 fill:#ccfbf1,stroke:#0d9488
    style Test1 fill:#ccfbf1,stroke:#0d9488
    style Go2 fill:#ccfbf1,stroke:#0d9488
    style API2 fill:#ccfbf1,stroke:#0d9488
    style CI2 fill:#ccfbf1,stroke:#0d9488
    style Go3 fill:#ccfbf1,stroke:#0d9488
    style Doc3 fill:#ccfbf1,stroke:#0d9488
    style Test3 fill:#ccfbf1,stroke:#0d9488
    style Cognee fill:#fecaca,stroke:#dc2626,stroke-width:3px

Why Mnemonic Evolved

The initial approach proved the concepts. Shared patterns work. Specialist agents work. A shared knowledge graph works.

But there's one problem I couldn't solve: non-deterministic agent delegation.

Even with explicit delegation tables in my global CLAUDE.md file - "if user asks for X, delegate to agent Y" - Claude would still sometimes forget. It would try to write BATS tests itself instead of using the bats-test-agent. It would explore code when it should just delegate immediately. The rules helped, but they weren't reliable.

The fundamental issue is that LLMs are non-deterministic by nature. Asking an LLM to make routing decisions means those decisions will sometimes be wrong, forgotten, or inconsistent. No amount of prompt engineering fully solves this.

The Mnemonic Solution

Mnemonic takes a different approach: routing decisions are made by code, not by the LLM.

graph TB
    subgraph MN ["Mnemonic Architecture"]
        direction LR
        User2[User Request] --> ACE_CLI[Mnemonic CLI]
        ACE_CLI -->|"REST API"| Mnemonic[Mnemonic Service]
        Mnemonic -->|"Code-based<br/>routing"| Router[Routing Engine]
        Router --> MainACE[Claude Code]
        MainACE -->|"Executes route"| Agents2[Specialist Agents]
        Mnemonic --- DataLayer[(Postgres + PGVector + Neo4j)]
    end

    style User2 fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style ACE_CLI fill:#e0f2fe,stroke:#0284c7,stroke-width:2px
    style Mnemonic fill:#ccfbf1,stroke:#0d9488,stroke-width:2px
    style Router fill:#dcfce7,stroke:#22c55e,stroke-width:2px
    style MainACE fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style Agents2 fill:#f1f5f9,stroke:#64748b,stroke-width:1px
    style DataLayer fill:#fef3c7,stroke:#f59e0b,stroke-width:2px

Mnemonic's routing engine, plain code in a purpose-built service, decides which agent to use. Deterministic, every time.

Mnemonic: Built From the Ground Up

I'm building Mnemonic from scratch. It's a purpose-built service designed specifically for agent coordination:

What Mnemonic Stores | Purpose
---------------------|--------
Agent Definitions    | System prompts, allowed tools, model preferences
Routing Rules        | Code-based rules that determine which agent handles which request
Patterns             | Reusable context documents (code examples, guidelines, best practices)
Pattern Associations | Links between patterns and agents with relevance scores

The data architecture:

  • PostgreSQL for relational data and ACID transactions
  • PGVector for semantic similarity search on patterns
  • Neo4j for knowledge graph relationships between patterns, agents, and concepts
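
To make this concrete, here's a rough sketch of those four stores as Go types. The names and fields are my own illustration, not Mnemonic's actual schema:

// Illustrative Go types for the four stores above.
// Names and fields are assumptions, not Mnemonic's real schema.
package mnemonic

// Agent is a specialist agent definition.
type Agent struct {
    Name         string   // e.g. "bats-test-agent"
    SystemPrompt string   // instructions that shape the agent's behavior
    AllowedTools []string // tools the agent is permitted to use
    Model        string   // preferred model for this agent's work
}

// RoutingRule maps request text to an agent, evaluated by priority.
type RoutingRule struct {
    Priority int      // lower values are evaluated first
    Keywords []string // all must appear in the request to match
    Agent    string   // name of the agent that handles the request
}

// Pattern is a reusable context document.
type Pattern struct {
    ID        string
    Title     string
    Body      string    // code example, guideline, or best practice
    Embedding []float32 // vector used for semantic search (PGVector)
}

// PatternAssociation links a pattern to an agent with a relevance score.
type PatternAssociation struct {
    PatternID string
    AgentName string
    Relevance float64 // higher means more relevant to that agent
}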

This isn't just swapping one memory server for another. Mnemonic is designed to make routing deterministic and auditable. The routing engine evaluates rules in priority order using code - no LLM involved in the decision. When you ask Mnemonic to "write BATS tests," the routing engine matches that to the bats-test-agent via configured rules, not via an LLM interpretation that might vary.
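
Here's a minimal sketch of what that code-based routing could look like, reusing the RoutingRule type from the sketch above (the fallback agent name is made up):

// Minimal deterministic router: evaluate rules in priority order and
// return the first match. Same request in, same agent out, every time.
package mnemonic

import (
    "sort"
    "strings"
)

// Route returns the agent for the highest-priority rule whose keywords
// all appear in the request, or a fallback when nothing matches.
func Route(rules []RoutingRule, request string) string {
    sort.SliceStable(rules, func(i, j int) bool {
        return rules[i].Priority < rules[j].Priority
    })
    req := strings.ToLower(request)
    for _, rule := range rules {
        matched := true
        for _, kw := range rule.Keywords {
            if !strings.Contains(req, strings.ToLower(kw)) {
                matched = false
                break
            }
        }
        if matched {
            return rule.Agent
        }
    }
    return "general-agent" // hypothetical fallback
}

With a rule like {Priority: 10, Keywords: []string{"bats", "test"}, Agent: "bats-test-agent"}, the request "write BATS tests for install.sh" routes to bats-test-agent on every single call. No sampling, no drift.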

RAG with Deterministic Orchestration

Mnemonic's pattern retrieval works like RAG - semantic search finds relevant context to enrich prompts. The difference is that Mnemonic also handles routing (which agent gets the request) through code-based rules, not LLM decisions. Think of it as RAG with deterministic orchestration.
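
On the retrieval side, a PGVector similarity query might look something like the sketch below. The patterns table and embedding column are assumptions, and the query embedding would come from whatever embedding model the service uses:

// Illustrative pattern retrieval using pgvector's cosine-distance
// operator (<=>). Table and column names are assumptions; the *sql.DB
// must be opened with a registered Postgres driver.
package mnemonic

import (
    "database/sql"
    "fmt"
    "strings"
)

// TopPatterns returns the titles of the k patterns most similar to the
// query embedding.
func TopPatterns(db *sql.DB, queryEmbedding []float32, k int) ([]string, error) {
    // pgvector accepts vector literals in '[x,y,z]' form.
    parts := make([]string, len(queryEmbedding))
    for i, v := range queryEmbedding {
        parts[i] = fmt.Sprintf("%g", v)
    }
    vec := "[" + strings.Join(parts, ",") + "]"

    rows, err := db.Query(
        "SELECT title FROM patterns ORDER BY embedding <=> $1::vector LIMIT $2",
        vec, k)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var titles []string
    for rows.Next() {
        var title string
        if err := rows.Scan(&title); err != nil {
            return nil, err
        }
        titles = append(titles, title)
    }
    return titles, rows.Err()
}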

Main Claude's role in Mnemonic is purely execution - it doesn't decide who does the work, it just coordinates what the routing engine determines. The LLM's creativity goes into the work itself, not into deciding who should do the work.

The Phased Approach

Mnemonic is being built in phases:

Phase 1: Claude Code Integration (Current)

  • Mnemonic provides routing decisions via REST API
  • Mnemonic CLI orchestrates execution through Claude Code
  • Patterns retrieved from Mnemonic's knowledge graph
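
To give a feel for the flow, here's a hypothetical Phase 1 round trip. The /route endpoint and the JSON shape are illustrations of the idea, not Mnemonic's actual API:

// Hypothetical Phase 1 round trip: ask the Mnemonic service for a
// routing decision, then hand the chosen agent and patterns to Claude
// Code. Endpoint and field names are illustrative assumptions.
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

type routeRequest struct {
    Prompt string `json:"prompt"`
}

type routeResponse struct {
    Agent    string   `json:"agent"`    // e.g. "bats-test-agent"
    Patterns []string `json:"patterns"` // pattern IDs to load as context
}

func main() {
    payload, err := json.Marshal(routeRequest{
        Prompt: "write BATS tests for install.sh",
    })
    if err != nil {
        panic(err)
    }

    resp, err := http.Post("http://localhost:8080/route",
        "application/json", bytes.NewReader(payload))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var rr routeResponse
    if err := json.NewDecoder(resp.Body).Decode(&rr); err != nil {
        panic(err)
    }
    fmt.Printf("route to %s with %d patterns\n", rr.Agent, len(rr.Patterns))
}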

Phase 2: Direct API Integration (Future)

  • Direct integration with Anthropic API
  • Removes Claude Code as intermediary
  • More control over context and execution

Phase 3: Authentication and Authorization (Enterprise)

  • Team-level access control
  • Pattern governance
  • Audit logging

Things that go without saying

This is meant to reinforce a spec-engineering approach to agentic coding. Things like BMAD, OpenSpec, SpecFlow, or your own custom spec-engineering process could easily adopt this approach to memory (context) management by updating your agent definitions.

To Summarize

Mnemonic is a purpose-built service that makes routing deterministic. No more hoping the LLM remembers to use the right agent. The routing engine decides, and it decides the same way every time.

I'll be tracking my progress, both achievements and dead-ends, as I go. So follow along for the ride!