We Rebuilt Everything. Here's What Happened.
Five services became one binary. Insert throughput hit 580K vectors per second. And we didn't change a single API endpoint.
TL;DR
We consolidated five services into one native binary. Insert latency dropped from ~30ms to 12ms. Health checks from ~15ms to 0.5ms. Our engine now inserts at 580K vec/s — 25× faster than zvec (Proxima). Same API, nothing breaks.
Here's an uncomfortable truth about building infrastructure: sometimes the architecture you're proud of is the thing slowing you down.
We built EmergentDB the “right way” — separate services for indexing, coordination, and caching. Clean boundaries. Independent scaling. Textbook microservice design. And every single vector operation paid for it with network hops, JSON serialization, and round-trip latency that had nothing to do with actual computation.
So we threw it all out and started over.
The problem nobody talks about
Our production stack had five services talking to each other over the network. The vector index was split across multiple processes. A coordination layer routed requests between them. A caching layer sat on top. Every insert, every search, every delete — multiple network hops.
We profiled an insert operation. 12ms was actual work. 18ms was overhead. Serialization. DNS resolution. TCP round-trips between containers sitting on the same machine. More than half the latency was the architecture itself.
[Chart: Latency comparison, server-side latency (ms), lower is better]
[Chart: Where the time goes, insert breakdown: 30ms before vs 12ms after]
[Chart: Infrastructure reduction]
One binary to rule them all
We unified everything into a single native process. The vector index, the API layer, the coordination logic — same memory space. What used to be a network round-trip became a function call.
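A sketch of what that change means in code. The types and names here are illustrative, not EmergentDB's actual internals: the point is that the API layer used to serialize a request and push it through a socket, and now it holds the index and calls it directly.

```rust
// Illustrative only: in the old design this was an HTTP round-trip to a
// separate index service; in the unified binary it is a plain method call.
struct Index {
    vectors: Vec<Vec<f32>>,
}

impl Index {
    fn insert(&mut self, v: Vec<f32>) -> usize {
        self.vectors.push(v);
        self.vectors.len() - 1
    }
}

// The API layer now owns the index in the same address space:
// no serialization, no DNS lookup, no socket.
struct Api {
    index: Index,
}

impl Api {
    fn handle_insert(&mut self, v: Vec<f32>) -> usize {
        self.index.insert(v) // a function call, not a network hop
    }
}
```

Because the call never leaves the process, the compiler is free to inline it, which no RPC framework can do.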
Insert latency dropped from ~30ms to 12ms. Health checks from ~15ms to 0.5ms. Deployment went from orchestrating five containers to shipping one image. Our CI pipeline became a single step.
“Same computation. Same algorithms. 60% less latency — just by removing the architecture.”
Then we benchmarked it against zvec
Zvec is built on Proxima, Alibaba's battle-tested vector search engine. It's fast, lightweight, and embeds directly into your application. We wanted to know how our new engine stacked up in a fair fight — in-process, no HTTP, same hardware.
The results were not what we expected.
[Chart: Insert throughput, vectors per second by dataset size, higher is better]
[Chart: P50 search latency, milliseconds by dataset size, lower is better]
Full benchmark
| Engine | N | Insert/s | P50 | P99 | QPS |
|---|---|---|---|---|---|
| zvec-flat | 1K | 23,184 | 0.26ms | 0.35ms | 3,206 |
| bolt-native | 1K | 627,190 | 0.30ms | 0.45ms | 3,212 |
| zvec-flat | 10K | 24,124 | 1.75ms | 1.95ms | 569 |
| bolt-native | 10K | 569,899 | 1.01ms | 1.27ms | 960 |
| zvec-hnsw | 100K | 22,102 | 5.53ms | 7.53ms | 178 |
| bolt-native | 100K | 581,540 | 11.37ms | 12.36ms | 87 |
Apples-to-apples: in-process, no HTTP. 1536-dim vectors. Same hardware.
Insert: bolt is 25× faster — 580K vec/s vs zvec's 23K. Our flat index is a raw memcpy into a pre-allocated BLAS-aligned buffer. Zero overhead.
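A minimal sketch of that insert path, with hypothetical names. EmergentDB's real buffer is BLAS-aligned; a plain `Vec<f32>` stands in for it here. The shape is what matters: allocate the full buffer once, then every insert is a single contiguous copy with no allocation on the hot path.

```rust
// Hypothetical flat-index insert, illustrative of the technique only.
struct FlatIndex {
    dim: usize,
    len: usize,
    data: Vec<f32>, // pre-allocated, contiguous, row-major storage
}

impl FlatIndex {
    fn with_capacity(dim: usize, cap: usize) -> Self {
        // One allocation up front so inserts never reallocate.
        FlatIndex { dim, len: 0, data: vec![0.0; dim * cap] }
    }

    fn insert(&mut self, v: &[f32]) -> usize {
        assert_eq!(v.len(), self.dim);
        let start = self.len * self.dim;
        // The hot path is one contiguous copy -- effectively a memcpy.
        self.data[start..start + self.dim].copy_from_slice(v);
        self.len += 1;
        self.len - 1
    }
}
```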
Search at small scale: comparable — at 1K vectors, both engines return results in under a millisecond. At 10K, bolt is actually faster (1.01ms vs 1.75ms).
Search at 100K: zvec pulls ahead — for now. Its HNSW index answers in 5.53ms vs bolt's 11.37ms. But EmergentDB already discovered the optimal insert strategy on its own, and it'll do the same for search.
The trade-off is clear: bolt is a write-optimized monster that's competitive on reads up to 10K vectors. At larger scale, the search engine will adapt — that's what self-optimizing means.
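The scaling behavior in the table falls out of the algorithm. A flat index answers a query with a linear scan, O(N·dim) per query, so it is fast at 1K–10K vectors and falls behind a graph index like HNSW at 100K. A minimal, illustrative scan (not EmergentDB's implementation):

```rust
// Brute-force flat search: score every stored vector against the query.
// `data` is a contiguous row-major buffer of N vectors of length `dim`.
fn flat_search(data: &[f32], dim: usize, query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = data
        .chunks_exact(dim)
        .enumerate()
        // Inner-product score per vector: the O(N * dim) cost lives here.
        .map(|(i, v)| (i, v.iter().zip(query).map(|(a, b)| a * b).sum::<f32>()))
        .collect();
    // Highest inner product first, then keep the top k.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}
```

Every query touches every vector, which is exactly why the scan stays competitive up to ~10K and an HNSW graph, which visits only a logarithmic neighborhood, wins at 100K.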
What we learned
Your architecture is a latency budget
Every service boundary is a tax on every operation. Splitting makes sense when services have different trust boundaries or need to scale independently. Ours didn't. Every request touched every service.
The first deploy always breaks something unexpected
Ours failed on a DNS resolution edge case. Service-to-service communication that worked in the old setup needed slightly different network handling. Small fix, long diagnosis.
Check your persistence assumptions
Our production data volumes were empty. The old system kept everything in memory and never wrote to disk. Data survived because services rarely restarted. We got lucky. The new system auto-checkpoints.
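A hedged sketch of what auto-checkpointing amounts to; the file format and function names here are ours, not EmergentDB's. Flush the in-memory buffer to disk and fsync, so a restart rebuilds state from the checkpoint instead of relying on the process never dying.

```rust
use std::fs::File;
use std::io::{Read, Write};

// Illustrative checkpoint: persist the raw f32 buffer as little-endian bytes.
fn checkpoint(path: &str, data: &[f32]) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    for x in data {
        f.write_all(&x.to_le_bytes())?;
    }
    // fsync so the bytes reach disk, not just the OS page cache.
    f.sync_all()
}

// Restore reverses the encoding on startup.
fn restore(path: &str) -> std::io::Result<Vec<f32>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    Ok(bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```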
Being honest about trade-offs builds trust
Zvec's HNSW beats us at 100K search today. We published the numbers anyway. The database found the best insert strategy on its own — it'll do the same for search. That's the whole point of self-optimizing infrastructure.
Questions
Things people have asked about this change.
Does this change anything about the API?
No. Every endpoint, request format, and response format is identical. If you have working code against EmergentDB today, it keeps working.
Why compare against zvec specifically?
Zvec is one of the best in-process vector databases available. It's built on Proxima (Alibaba), it's fast, and it's the kind of engine we respect. Benchmarking against strong competition keeps us honest.
What about search at 100K+ vectors?
Zvec's HNSW index is faster there today. But EmergentDB is self-optimizing — it already discovered the best strategy for inserts, and it'll do the same for searches. The database adapts. We'll publish those benchmarks when they're ready.
Isn't a single service a single point of failure?
The old system had one too — the coordination service every request flowed through. Recovery is faster now. A restart brings everything back in seconds. True HA is on the roadmap.
What's next
The self-optimizing engine already found the fastest insert strategy. Next, it'll tackle search at scale — discovering the right algorithm the same way it did for writes. And a single lightweight binary makes multi-region deployment realistic for the first time. Stay tuned.
This is live in production now.
580K vec/s. Same API. Try it.