On April 19, a KongBrain user (@rjhilgefort) opened our first external feature request. Six days later, on April 25, it shipped as four stacked PRs and the issue was closed. Here is what landed and why the work was bigger than “add a config flag.”
The ask
KongBrain shipped with local BGE-M3 embeddings via node-llama-cpp. Local embeddings are great for offline use and private setups, but they run serially through one model. A user with two agents across 15+ Discord channels accumulated about 3,400 turns in three weeks and started seeing 1,700+ embedding timeouts per week, escalating exponentially. The 30-second timeout we shipped in v0.4.4 helped at the edge, but the root cause was structural: serial local embeddings cannot keep up with high-volume multi-agent setups.
The request was reasonable: let users plug in OpenAI (or any compatible API) for embeddings, keep the local default. The issue included a concrete config sketch and even flagged the migration concern. Read it on GitHub.
Why this isn't a one-line config flip
Different embedding models produce vectors in different vector spaces. A BGE-M3 vector and a text-embedding-3-small vector are not interchangeable, even at the same dimensionality. Cosine similarity between them is meaningless: a number with no signal.
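A toy illustration of the point, in TypeScript. This is not KongBrain code: the two "models" here are hypothetical, and model B's space is simply model A's space rotated by 90 degrees. Real embedding spaces differ in far less regular ways, but the effect is the same. Each space is internally consistent, and a comparison across them says nothing about the text.

```typescript
// Plain cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical "model A" places some text at (1, 0).
const textInA = [1, 0];
// Hypothetical "model B" uses the same geometry rotated by 90 degrees,
// so the very same text lands at (0, 1).
const textInB = [0, 1];

// Same text, same model: similarity is 1.
const sameModel = cosine(textInA, textInA);
// Same text, different models: similarity is 0, which reflects the
// rotation between the spaces and nothing about the text.
const crossModel = cosine(textInA, textInB);
```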
That matters because KongBrain's HNSW indexes hold embeddings across eight tables (turn, concept, memory, artifact, identity_chunk, skill, reflection, monologue). If two providers' vectors silently mix in the same index, every retrieval result is corrupted. A user who flipped from BGE-M3 to OpenAI would not get an error. They would get nonsense recall, slowly, and trust their assistant less without ever knowing why.
So the architectural invariant the feature had to preserve was simple to state and load-bearing to enforce: at any moment, every cosine query reads vectors from exactly one provider. The OpenAI adapter itself is the easy part. The hard part is making sure no write path or read path can ever break that invariant.
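One way to make that invariant hard to break is to route every similarity search through a single helper that refuses to build an unscoped query. The sketch below is an assumption about how such a helper could look, not KongBrain's actual code: buildCosineQuery, the embedding field name, and the bindings shape are all hypothetical. Only the table names, the embedding_provider column, and the vector::similarity::cosine call come from this post.

```typescript
// The eight vector-bearing tables named in this post.
const VECTOR_TABLES = [
  "turn", "concept", "memory", "artifact",
  "identity_chunk", "skill", "reflection", "monologue",
] as const;
type VectorTable = (typeof VECTOR_TABLES)[number];

interface ScopedQuery {
  sql: string;
  bindings: { query: number[]; provider: string };
}

// Hypothetical helper: builds a cosine search that is always scoped to
// exactly one provider. The "embedding" field name is an assumption.
function buildCosineQuery(
  table: VectorTable,
  queryVector: number[],
  providerId: string,
  limit: number = 10,
): ScopedQuery {
  if (providerId.length === 0) {
    throw new Error("refusing to build a cosine query without a provider");
  }
  const safeLimit = Math.max(1, Math.floor(limit));
  const sql =
    `SELECT *, vector::similarity::cosine(embedding, $query) AS score ` +
    `FROM ${table} WHERE embedding_provider = $provider ` +
    `ORDER BY score DESC LIMIT ${safeLimit}`;
  return { sql, bindings: { query: queryVector, provider: providerId } };
}
```

Because the table argument is a closed union type and the provider is a required parameter, a call site cannot express "search everything" by accident; it has to be written on purpose.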
Four PRs, stacked in order
We landed the feature as a stacked sequence so each piece could be reviewed and reverted independently:
- PR-A: extract the EmbeddingService interface. Pure refactor. The existing BGE-M3 implementation became LocalEmbeddingService. A new createEmbeddingService(config) factory at the single construction site. 27+ files importing the type kept working unchanged. Zero behavior change, all 439 tests passing.
- PR-B: tag every embedding write, filter every search by provider. Added an embedding_provider column to all eight vector-bearing tables. An idempotent backfill on every startup tags pre-existing rows as local-bge-m3. Five internal write paths stamp the active provider; 12 internal cosine queries filter by it; seven external write sites and ten external read sites were audited and patched. After the patch, every vector::similarity::cosine call site in the codebase scopes to a provider. This is the PR that made the OpenAI adapter safe to ship.
- PR-C: the openai-compat embedding provider. A new OpenAICompatEmbeddingService implementing the interface from PR-A. Fetch-based, no SDK dependency, batches 96 inputs per request, retries 429s and 5xxs with exponential backoff and jitter, hard-fails on 401/403/404 with a useful error message. The provider ID encodes the model and dimension (openai-compat-text-embedding-3-small-1024d) so vectors written today can be distinguished from the same model at a different output dim later.
- PR-D: re-embed migration tool. A CLI for users who want to switch providers and re-embed their existing graph. Closes the migration concern flagged in the original issue.
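The provider ID scheme from PR-C and the write-side tagging from PR-B can be sketched together. This is a minimal sketch under stated assumptions: the EmbeddingService shape, the "local" provider name in the config type, and the providerIdFor and stampProvider helpers are hypothetical. The two ID formats (local-bge-m3 and openai-compat-text-embedding-3-small-1024d) and the embedding_provider column are the ones named above.

```typescript
// Hypothetical config type modeled on the config shown in this post.
// The "local" provider name is an assumption.
type EmbeddingConfig =
  | { provider: "local" }
  | {
      provider: "openai-compat";
      dimensions: number;
      openaiCompat: { model: string; baseURL: string; apiKeyEnv: string };
    };

// Assumed minimal interface; the real EmbeddingService may differ.
interface EmbeddingService {
  readonly providerId: string;
  embed(texts: string[]): Promise<number[][]>;
}

// Derive the provider ID. For openai-compat it encodes both the model and
// the output dimension, so the same model at another dimension gets a
// different ID and its vectors never share a search scope.
function providerIdFor(config: EmbeddingConfig): string {
  switch (config.provider) {
    case "local":
      return "local-bge-m3";
    case "openai-compat":
      return `openai-compat-${config.openaiCompat.model}-${config.dimensions}d`;
  }
}

// Tag a row with the active provider before it is written.
function stampProvider<T extends object>(
  row: T,
  service: Pick<EmbeddingService, "providerId">,
): T & { embedding_provider: string } {
  return { ...row, embedding_provider: service.providerId };
}
```

Deriving the ID from config in one place means the write path and the read path cannot drift apart: both ask the same function which provider is active.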
Why “openai-compat” and not “openai”
The OpenAI /v1/embeddings shape is implemented identically by Azure OpenAI, Together, Anyscale, vLLM, LM Studio, Ollama (compat endpoint), DeepInfra, and Fireworks. One implementation covers all of them. Switching backends is a baseURL change. Users self-serve.
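For readers who have not called the endpoint directly, here is a rough sketch of the request shape together with the batching and retry behavior described for PR-C. It is not the code from the PR. The function names, option names, default retry count, and base delay are assumptions; the fetch implementation is injected so the sketch has no dependencies. Note also that the dimensions request field is honored by OpenAI's text-embedding-3 models, and other compatible servers may ignore or reject it.

```typescript
// Minimal structural type for the fetch function we need (an assumption
// made so the sketch can be exercised without a network).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface OpenAICompatOptions {
  baseURL: string;      // e.g. "https://api.openai.com/v1"
  model: string;
  dimensions: number;
  apiKey: string;
  fetchImpl: FetchLike;
  maxRetries?: number;  // hypothetical default below
  baseDelayMs?: number; // hypothetical default below
}

const BATCH_SIZE = 96; // inputs per request, as described for PR-C

async function embedBatch(texts: string[], opts: OpenAICompatOptions): Promise<number[][]> {
  const maxRetries = opts.maxRetries ?? 5;
  const baseDelayMs = opts.baseDelayMs ?? 500;
  for (let attempt = 0; ; attempt++) {
    const res = await opts.fetchImpl(`${opts.baseURL}/embeddings`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${opts.apiKey}`,
      },
      body: JSON.stringify({ model: opts.model, input: texts, dimensions: opts.dimensions }),
    });
    if (res.ok) {
      const body = (await res.json()) as { data: { index: number; embedding: number[] }[] };
      // Order by index so outputs line up with inputs.
      return [...body.data].sort((a, b) => a.index - b.index).map((d) => d.embedding);
    }
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= maxRetries) {
      // Non-retryable statuses such as 401/403/404 fail immediately.
      throw new Error(`Embedding request to ${opts.baseURL} failed with HTTP ${res.status}`);
    }
    // Exponential backoff with full jitter.
    const delayMs = Math.random() * baseDelayMs * Math.pow(2, attempt);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
}

// Split the inputs into batches and embed them one batch at a time.
async function embedAll(texts: string[], opts: OpenAICompatOptions): Promise<number[][]> {
  const out: number[][] = [];
  for (let i = 0; i < texts.length; i += BATCH_SIZE) {
    out.push(...(await embedBatch(texts.slice(i, i + BATCH_SIZE), opts)));
  }
  return out;
}
```

In a Node 18+ process, passing the global fetch as fetchImpl should be enough to point this at any of the backends listed above by changing only baseURL.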
The config:
{
  "embedding": {
    "provider": "openai-compat",
    "dimensions": 1024,
    "openaiCompat": {
      "model": "text-embedding-3-small",
      "baseURL": "https://api.openai.com/v1",
      "apiKeyEnv": "OPENAI_API_KEY"
    }
  }
}
The API key never appears in config. apiKeyEnv names the env var that holds it.
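Resolving the key at startup might look like the following. This is a sketch, not the shipped code: resolveApiKey and its error message are hypothetical, and the environment is passed in as a plain object so the lookup is easy to exercise in isolation.

```typescript
// Look up the API key in the environment variable named by apiKeyEnv.
// Fails loudly if it is unset or empty, and never logs the key itself.
function resolveApiKey(
  apiKeyEnv: string,
  env: Record<string, string | undefined>,
): string {
  const key = env[apiKeyEnv];
  if (key === undefined || key.length === 0) {
    throw new Error(
      `Embedding provider needs an API key, but the environment variable ` +
        `${apiKeyEnv} is not set.`,
    );
  }
  return key;
}
```

In Node, the call would be resolveApiKey(config.embedding.openaiCompat.apiKeyEnv, process.env). Naming the variable in the error message tells the user exactly what to export without echoing any secret.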
The trail is public
This is what we mean when we say the AI work is open. The issue, the four PRs, the diffs, the test counts, the audit notes are all on github.com/42U/kongbrain. If a future contributor wonders why every cosine query in the repo has an AND embedding_provider = $provider clause, the answer is sitting in PR-B's description.
Six days from a stranger's issue to a merged feature is not a heroic timeline. It is what should happen when the abstraction is close to right, the architectural invariant is named explicitly before the adapter ships, and the work happens in public so reviewers can see what they are signing off on.