Open Claude Code, do an hour of good work, then clear the context. Everything it learned about your project in that hour is gone. The decision you talked it out of, the convention it finally internalized, the bug you explained three times... all of it resets to zero the moment the window closes. Stock Claude Code has the memory of a goldfish. Brilliant for one conversation, a stranger the next.
KongCode is the thing that fixes that. It is a permanent, graph-backed memory layer for Claude Code. Every session it watches what happens, pulls out the parts worth keeping, and stores them in a graph database running on your own machine. The next session, before you have typed a word, the relevant pieces are already back in context. Corrections you made, preferences you stated, decisions with the reasoning attached, procedures that worked. It is the difference between an assistant who meets you fresh every morning and one who actually remembers yesterday.
You asked for the honest version, so this is a review, not a brochure. What it does, why it is worth running, and the costs that are real instead of the ones that sound dramatic. Both halves are true at once, and the second half is the one nobody puts on a landing page.
What it actually does
The concrete version. When you correct it... “no, we use this pattern here, not that one”... that correction gets stored as the highest-signal thing in the system and comes back the next time the topic surfaces. When you mention you like terse commit messages or you do not want mocks in your tests, that becomes a preference it carries forward. When it works out a multi-step fix that actually lands, that becomes a skill it can pull back up when the same shape of problem shows up months later. You stop re-explaining yourself. The context you built stops evaporating at the end of the day.
None of this is manual. You are not writing memory files or tagging things to remember. A background daemon does the extraction on its own, every few thousand tokens, while you work on something else. That is the whole pitch in one line. Memory that accumulates without you babysitting it.
How it works, without the hand-waving
Underneath, it is two processes. A long-lived daemon holds the database, the embedding model, and the reranker in one pool of RAM, and a thin per-session client talks to it over a local socket. Run three Claude sessions at once and they share that one daemon instead of each spinning up its own copy of a 420 MB model. One kitchen, three cooks, not three kitchens.
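A minimal sketch of that shape, not KongCode's actual code. Every name here (SharedModel, Handler, start_daemon, client_embed) is hypothetical, the "model" is a stand-in that returns the text length, and a loopback TCP port stands in for whatever local socket the real daemon uses. The only point is that the expensive object is built once and every client reuses it.

```python
import json
import socket
import socketserver
import threading


class SharedModel:
    """Stand-in for the expensive embedding model (hypothetical)."""

    loads = 0  # counts how many times the "model" was constructed

    def __init__(self):
        SharedModel.loads += 1

    def embed(self, text):
        # Toy embedding: a one-element vector holding the text length.
        return [float(len(text))]


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # One JSON request per line in, one JSON response per line out.
        request = json.loads(self.rfile.readline())
        vector = self.server.model.embed(request["text"])
        self.wfile.write((json.dumps({"vector": vector}) + "\n").encode("utf-8"))


def start_daemon():
    # Port 0 asks the OS for any free loopback port.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), Handler)
    server.daemon_threads = True
    server.model = SharedModel()  # loaded once, shared by every session
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def client_embed(address, text):
    # The thin per-session client: connect, ask, read one line back.
    with socket.create_connection(address) as conn:
        conn.sendall((json.dumps({"text": text}) + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())["vector"]
```

Three calls to client_embed against one start_daemon() play the role of the three sessions: all of them are answered by the same SharedModel instance.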
Retrieval is not just “embed the question, grab the five nearest vectors.” It embeds your prompt, pulls candidates by similarity, scores them on a handful of signals, reranks the top of the pile with a cross-encoder that reads each candidate against the actual question, then walks the graph one hop out to drag in neighbors the raw similarity search missed. That last step is the part a flat vector store cannot do, and it is where the graph earns its name. There are 1014 tests guarding the invariants that hold all of this together, including one that checks every graph edge against its declared type before a change can ship.
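The stages above can be sketched in a few lines. This is an illustration under loud assumptions, not the real engine: Memory, cosine, and retrieve are hypothetical names, the multi-signal scoring is collapsed into plain cosine similarity, and the cross-encoder is replaced by whatever rerank callable you pass in.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Memory:
    id: str
    text: str
    vector: list
    neighbors: list = field(default_factory=list)  # ids one graph hop away


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, store, rerank, k_candidates=20, k_final=5):
    """store maps id -> Memory; rerank maps a Memory to a relevance score."""
    # Stage 1: pull candidates by vector similarity.
    by_similarity = sorted(
        store.values(), key=lambda m: cosine(query_vec, m.vector), reverse=True
    )
    candidates = by_similarity[:k_candidates]

    # Stage 2: rerank the top of the pile (stand-in for the cross-encoder).
    top = sorted(candidates, key=rerank, reverse=True)[:k_final]

    # Stage 3: walk one hop out and add neighbors the search missed.
    seen = {m.id for m in top}
    expanded = list(top)
    for memory in top:
        for neighbor_id in memory.neighbors:
            if neighbor_id in store and neighbor_id not in seen:
                seen.add(neighbor_id)
                expanded.append(store[neighbor_id])
    return expanded
```

The third stage is the one to look at. A memory whose vector sits nowhere near the query can still come back, because something that did match points at it.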
The retrieval quality is not a guess. This pipeline is ported straight from kongclaw, the retrieval engine I built and benchmarked before KongCode existed, and it runs the same way here. On LongMemEval that design scores 98.2% Recall@5, which means the right memory lands in the top five results better than ninety-eight times out of a hundred. The only asterisk is the ordinary one... it is a single benchmark, and the top number assumes the cross-encoder reranker is loaded. That is the whole footnote.
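For anyone who has not met the metric, here is one common way to compute Recall@k, assuming a single correct memory per question. The function name and the input shape are mine, and the benchmark's own scoring script may define the details differently.

```python
def recall_at_k(ranked_ids_per_query, relevant_id_per_query, k=5):
    """Fraction of queries whose correct memory appears in the top k results."""
    if not ranked_ids_per_query:
        raise ValueError("need at least one query")
    if len(ranked_ids_per_query) != len(relevant_id_per_query):
        raise ValueError("one relevant id is required per query")
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids_per_query, relevant_id_per_query)
        if relevant in ranked[:k]
    )
    return hits / len(ranked_ids_per_query)
```

A query counts as a hit if the right memory is anywhere in the first k results, so the metric says nothing about whether it came first or fifth.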
One more thing worth knowing about the scoring. It starts on a hand-tuned baseline, seven signals weighted by feel. Once you have enough history for it to mean anything, a small learned model called ACAN takes over, trained on which of your own retrieved memories actually got used. The ranking gets more yours the longer you run it.
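The handoff can be pictured like this. It is a sketch with invented numbers: the seven weights, the MIN_HISTORY threshold, and the function names are all hypothetical, and the learned model is just any callable that maps the signals to a score.

```python
# Hypothetical weights for the seven baseline signals; the real values differ.
BASELINE_WEIGHTS = (0.30, 0.20, 0.15, 0.10, 0.10, 0.10, 0.05)

# Hypothetical cutoff: how much usage history the learned model needs.
MIN_HISTORY = 500


def baseline_score(signals):
    """Weighted sum of the hand-tuned signals."""
    if len(signals) != len(BASELINE_WEIGHTS):
        raise ValueError("expected one value per baseline signal")
    return sum(w * s for w, s in zip(BASELINE_WEIGHTS, signals))


def score(signals, learned_model=None, history_size=0):
    """Use the learned model once there is enough history, else the baseline."""
    if learned_model is not None and history_size >= MIN_HISTORY:
        return learned_model(signals)
    return baseline_score(signals)
```

With no model or too little history, score falls back to the weighted sum, so a fresh install ranks the same way on day one as it would with the learned path switched off.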
Now the part the brochure leaves out
It costs money to run, and the bill is invisible. The extraction that fills the graph needs a language model to do it, so the daemon shells out to your own authenticated Claude CLI in the background... on startup, every five minutes, and after every session ends. Each run chews through five to fifteen queued items, billed against your normal usage. Nothing tells you it is happening. You are paying for memory in tokens, quietly, while you are looking at something else. That is a fair trade if you live in long projects across many sessions. It is pure waste if you fire up Claude to write one shell script and never come back.
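To put rough bounds on that, here is the arithmetic for the timer alone. It assumes every five-minute run finds a full queue, which a quiet session presumably will not, and it leaves out the startup and end-of-session runs. The tokens-per-item figure is something you would have to measure yourself, so it is a required argument with no default.

```python
def timer_runs_per_hour(interval_minutes=5):
    """How many times the periodic trigger fires in an hour."""
    return 60 // interval_minutes


def items_per_hour(interval_minutes=5, items_per_run=(5, 15)):
    """Low and high bounds on items processed per hour by the timer alone."""
    runs = timer_runs_per_hour(interval_minutes)
    low, high = items_per_run
    return runs * low, runs * high


def tokens_per_hour(tokens_per_item, interval_minutes=5, items_per_run=(5, 15)):
    """Scale the item bounds by a token cost you have measured yourself."""
    low, high = items_per_hour(interval_minutes, items_per_run)
    return low * tokens_per_item, high * tokens_per_item
```

Twelve runs an hour at five to fifteen items each comes to between 60 and 180 items an hour while there is still work in the queue.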
The memory is lossy, and it is not the same twice. Extraction is quality-gated on purpose, so weak-signal material gets dropped before it can fill the graph with noise. The flip side is that the same conversation, run through extraction twice, can produce different memories depending on how strong the signals read that day. It is not a transcript and it is not trying to be. It keeps what looked durable and lets the rest go, which means now and then it lets go of something you wish it had kept. Fidelity over volume is the right call, but it is a call, and it has a price.
The foundation was learned on the job. We have said this in plain language already, and it belongs in any honest review too. The database layer was vibe coded. More than fifteen distinct SurrealQL bugs across the version history, and the audits keep turning up more... queries that crashed on inputs we never guarded, edge types declared one way in the schema and written another way in the code, a type-coercion bug in the supersede path that was corrupting rows until we caught and healed it live in 0.7.96. 817 lines of schema that we have been auditing line by line, release after release, precisely because we do not trust ourselves to have gotten it right by feel. The intelligence sitting on top of it, the daemon and the extraction and the retrieval and the roughly sixteen thousand lines of engine, that part was engineered with intent. The dirt floor it all stands on was poured by people who were still learning to pour concrete, and who keep going back to patch it.
It lives on one machine and it can die with that machine. The entire point is that nothing touches the cloud on the read path. Your embeddings, your graph, your retrieval, all of it local. The cost of that privacy is that your memory is a directory on one disk. No sync, no team sharing, no backup unless you make one yourself. There is a native export command in the box for exactly this, but it does not run itself. If that drive dies and you never ran the export, the agent that knew your project for six months is a stranger again. Privacy and durability pull in opposite directions, and KongCode picked privacy. You should know that is the trade before you depend on it.
So what is it, honestly
It is the most serious attempt I have seen at giving a coding agent real memory. A graph design pulled from actual memory research instead of a quickstart, a retrieval pipeline with more than one stage and a benchmark behind it, and a test suite in the four figures guarding the parts that have to stay true. It is also pre-1.0, it spends your tokens behind your back, it forgets things on purpose, and it stands on a database layer we are still hardening release after release. None of those cancel the others out. They are all the same system.
It earns its keep if you work in long-running projects and you are tired of re-teaching the same context every session. It is overhead you will never feel the value of if you treat Claude like a calculator. The honest pitch is not that this is magic. The honest pitch is that it is a real memory system with real costs, and we would rather tell you what they are than bury them under a benchmark. The whole thing is on github.com/42U/kongcode, costs and all.