We Rebuilt Everything. Here's What Happened.
Five services became one binary. Insert throughput hit 580K vectors per second. And we didn't change a single API endpoint.
TL;DR
We consolidated five services into one native binary. Insert latency dropped from ~30ms to 12ms. Health checks from ~15ms to 0.5ms. Our engine now inserts at 580K vec/s — 25× faster than zvec (Proxima). Same API, nothing breaks.
Here's an uncomfortable truth about building infrastructure: sometimes the architecture you're proud of is the thing slowing you down.
We built EmergentDB the “right way” — separate services for indexing, coordination, and caching. Clean boundaries. Independent scaling. Textbook microservice design. And every single vector operation paid for it with network hops, JSON serialization, and round-trip latency that had nothing to do with actual computation.
So we threw it all out and started over.
The problem nobody talks about
Our production stack had five services talking to each other over the network. The vector index was split across multiple processes. A coordination layer routed requests between them. A caching layer sat on top. Every insert, every search, every delete — multiple network hops.
We profiled an insert operation. 12ms was actual work. 18ms was overhead. Serialization. DNS resolution. TCP round-trips between containers sitting on the same machine. More than half the latency was the architecture itself.
[Chart: Latency comparison, server-side latency (ms), lower is better]
[Chart: Where the time goes, insert breakdown: 30ms before vs 12ms after]
[Chart: Infrastructure reduction]
One binary to rule them all
We unified everything into a single native process. The vector index, the API layer, the coordination logic — same memory space. What used to be a network round-trip became a function call.
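A sketch of what that change means in code. The types and names here are illustrative, not EmergentDB's actual internals: the point is that the API layer used to serialize a request and push it through a socket, and now it holds the index and calls it directly.

```rust
// Illustrative only: in the old design this was an HTTP round-trip to a
// separate index service; in the unified binary it is a plain method call.
struct Index {
    vectors: Vec<Vec<f32>>,
}

impl Index {
    fn insert(&mut self, v: Vec<f32>) -> usize {
        self.vectors.push(v);
        self.vectors.len() - 1
    }
}

// The API layer now owns the index in the same address space:
// no serialization, no DNS lookup, no socket.
struct Api {
    index: Index,
}

impl Api {
    fn handle_insert(&mut self, v: Vec<f32>) -> usize {
        self.index.insert(v) // a function call, not a network hop
    }
}
```

Because the call never leaves the process, the compiler is free to inline it, which no RPC framework can do.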
Insert latency dropped from ~30ms to 12ms. Health checks from ~15ms to 0.5ms. Deployment went from orchestrating five containers to shipping one image. Our CI pipeline became a single step.
“Same computation. Same algorithms. 60% less latency — just by removing the architecture.”
Then we benchmarked it against zvec
Zvec is built on Proxima, Alibaba's battle-tested vector search engine. It's fast, lightweight, and embeds directly into your application. We wanted to know how our new engine stacked up in a fair fight — in-process, no HTTP, same hardware.
The results were not what we expected.
[Chart: Insert throughput, vectors per second by dataset size, higher is better]
[Chart: P50 search latency, milliseconds by dataset size, lower is better]
Full benchmark
| Engine | N | Insert/s | P50 | P99 | QPS |
|---|---|---|---|---|---|
| zvec-flat | 1K | 23,184 | 0.26ms | 0.35ms | 3,206 |
| bolt-native | 1K | 627,190 | 0.30ms | 0.45ms | 3,212 |
| zvec-flat | 10K | 24,124 | 1.75ms | 1.95ms | 569 |
| bolt-native | 10K | 569,899 | 1.01ms | 1.27ms | 960 |
| zvec-hnsw | 100K | 22,102 | 5.53ms | 7.53ms | 178 |
| bolt-native | 100K | 581,540 | 11.37ms | 12.36ms | 87 |
Apples-to-apples: in-process, no HTTP. 1536-dim vectors. Same hardware.
Insert: bolt is 25× faster — 580K vec/s vs zvec's 23K. Our flat index is a raw memcpy into a pre-allocated BLAS-aligned buffer. Zero overhead.
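A minimal sketch of that insert path, with hypothetical names. EmergentDB's real buffer is BLAS-aligned; a plain `Vec<f32>` stands in for it here. The shape is what matters: allocate the full buffer once, then every insert is a single contiguous copy with no allocation on the hot path.

```rust
// Hypothetical flat-index insert, illustrative of the technique only.
struct FlatIndex {
    dim: usize,
    len: usize,
    data: Vec<f32>, // pre-allocated, contiguous, row-major storage
}

impl FlatIndex {
    fn with_capacity(dim: usize, cap: usize) -> Self {
        // One allocation up front so inserts never reallocate.
        FlatIndex { dim, len: 0, data: vec![0.0; dim * cap] }
    }

    fn insert(&mut self, v: &[f32]) -> usize {
        assert_eq!(v.len(), self.dim);
        let start = self.len * self.dim;
        // The hot path is one contiguous copy -- effectively a memcpy.
        self.data[start..start + self.dim].copy_from_slice(v);
        self.len += 1;
        self.len - 1
    }
}
```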
Search at small scale: comparable — at 1K vectors, both engines return results in under a millisecond. At 10K, bolt is actually faster (1.01ms vs 1.75ms).
Search at 100K: zvec pulls ahead — for now. Its HNSW index answers in 5.53ms vs bolt's 11.37ms. But EmergentDB already discovered the optimal insert strategy on its own, and it'll do the same for search.
The trade-off is clear: bolt is a write-optimized monster that's competitive on reads up to 10K vectors. At larger scale, the search engine will adapt — that's what self-optimizing means.
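The scaling behavior in the table falls out of the algorithm. A flat index answers a query with a linear scan, O(N·dim) per query, so it is fast at 1K–10K vectors and falls behind a graph index like HNSW at 100K. A minimal, illustrative scan (not EmergentDB's implementation):

```rust
// Brute-force flat search: score every stored vector against the query.
// `data` is a contiguous row-major buffer of N vectors of length `dim`.
fn flat_search(data: &[f32], dim: usize, query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = data
        .chunks_exact(dim)
        .enumerate()
        // Inner-product score per vector: the O(N * dim) cost lives here.
        .map(|(i, v)| (i, v.iter().zip(query).map(|(a, b)| a * b).sum::<f32>()))
        .collect();
    // Highest inner product first, then keep the top k.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```

Every query touches every vector, which is exactly why the scan stays competitive up to ~10K and an HNSW graph, which visits only a logarithmic neighborhood, wins at 100K.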
What we learned
Your architecture is a latency budget
Every service boundary is a tax on every operation. Splitting makes sense when services have different trust boundaries or need to scale independently. Ours didn't. Every request touched every service.
The first deploy always breaks something unexpected
Ours failed on a DNS resolution edge case. Service-to-service communication that worked in the old setup needed slightly different network handling. Small fix, long diagnosis.
Check your persistence assumptions
Our production data volumes were empty. The old system kept everything in memory and never wrote to disk. Data survived because services rarely restarted. We got lucky. The new system auto-checkpoints.
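A hedged sketch of what auto-checkpointing amounts to; the file format and function names here are ours, not EmergentDB's. Flush the in-memory buffer to disk and fsync, so a restart rebuilds state from the checkpoint instead of relying on the process never dying.

```rust
use std::fs::File;
use std::io::{Read, Write};

// Illustrative checkpoint: persist the raw f32 buffer as little-endian bytes.
fn checkpoint(path: &str, data: &[f32]) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    for x in data {
        f.write_all(&x.to_le_bytes())?;
    }
    // fsync so the bytes reach disk, not just the OS page cache.
    f.sync_all()
}

// Restore reverses the encoding on startup.
fn restore(path: &str) -> std::io::Result<Vec<f32>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    Ok(bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```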
Being honest about trade-offs builds trust
Zvec's HNSW beats us at 100K search today. We published the numbers anyway. The database found the best insert strategy on its own — it'll do the same for search. That's the whole point of self-optimizing infrastructure.
Questions
Things people have asked about this change.
Does this change anything about the API?
No. Every endpoint, request format, and response format is identical. If you have working code against EmergentDB today, it keeps working.
Why compare against zvec specifically?
Zvec is one of the best in-process vector databases available. It's built on Proxima (Alibaba), it's fast, and it's the kind of engine we respect. Benchmarking against strong competition keeps us honest.
What about search at 100K+ vectors?
Zvec's HNSW index is faster there today. But EmergentDB is self-optimizing — it already discovered the best strategy for inserts, and it'll do the same for searches. The database adapts. We'll publish those benchmarks when they're ready.
Isn't a single service a single point of failure?
The old system had one too — the coordination service every request flowed through. Recovery is faster now. A restart brings everything back in seconds. True HA is on the roadmap.
What's next
The self-optimizing engine already found the fastest insert strategy. Next, it'll tackle search at scale — discovering the right algorithm the same way it did for writes. And a single lightweight binary makes multi-region deployment realistic for the first time. Stay tuned.
This is live in production now.
580K vec/s. Same API. Try it.