Fast enough for real-time. Efficient enough for scale.
DeltaMemory is built in Rust on a custom storage engine: 50ms median queries, 3,714x token compression, and a 97% cost reduction compared with re-processing raw context.
LoCoMo benchmark results
Head-to-head comparison on the LoCoMo long-term conversation memory benchmark across all evaluation categories.
| Method | Single-Hop | Multi-Hop | Open Domain | Temporal | Overall | Latency (p50) |
|---|---|---|---|---|---|---|
| DeltaMemory | 91.5% | 87.5% | 90.5% | 82.2% | 89.0% | 50ms |
| Memobase v0.0.37 | 70.9% | 46.9% | 77.2% | 85.1% | 75.8% | 800ms |
| Zep | 74.1% | 66.0% | 67.7% | 79.8% | 75.1% | 632ms |
| Memobase v0.0.32 | 63.8% | 52.1% | 71.8% | 80.4% | 70.9% | 900ms |
| Mem0-Graph | 65.7% | 47.2% | 75.7% | 58.1% | 68.4% | 2.6s |
| Mem0 | 67.1% | 51.2% | 72.9% | 55.5% | 66.9% | 1.4s |
| LangMem | 62.2% | 47.9% | 71.1% | 23.4% | 58.1% | 18.0s |
| OpenAI | 63.8% | 42.9% | 62.3% | 21.7% | 52.9% | 466ms |
All methods were evaluated under identical conditions, using GPT-4.1 as the base model.
Under the hood
Every layer of the stack is designed for speed and efficiency.
3,714x Token Compression
Raw conversations are compressed into structured facts and a knowledge graph. 26 million tokens of conversation history become 7,000 tokens of structured memory. Your agents recall what matters without re-processing entire histories.
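The compression figure is simple arithmetic on the numbers quoted above; a minimal sketch (integer division gives the rounded ratio):

```rust
// Back-of-envelope compression math using the figures quoted above.
fn compression_ratio(raw_tokens: u64, compressed_tokens: u64) -> u64 {
    raw_tokens / compressed_tokens
}

fn main() {
    let raw = 26_000_000u64;   // tokens of raw conversation history
    let compressed = 7_000u64; // tokens of structured memory
    // 26,000,000 / 7,000 = 3,714x
    println!("{}x compression", compression_ratio(raw, compressed));
}
```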
50ms Query Latency
Hybrid retrieval combines HNSW vector search, BM25 keyword matching, and graph traversal in a single query. Reciprocal Rank Fusion merges results. The entire pipeline completes in under 50ms at the median.
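Reciprocal Rank Fusion itself is a small algorithm. A minimal sketch of the merge step, assuming three ranked result lists (the fact names here are hypothetical, and `k = 60` is the constant from the standard RRF formulation, not necessarily DeltaMemory's setting):

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each retriever contributes 1 / (k + rank)
/// for every document it returns; scores are summed across retrievers,
/// then documents are sorted by fused score.
fn rrf_merge(ranked_lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in ranked_lists {
        for (rank, doc) in list.iter().enumerate() {
            // ranks are 1-based in the RRF formula
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut merged: Vec<_> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}

fn main() {
    // Hypothetical top results from the three retrieval paths.
    let vector = vec!["fact_a", "fact_b", "fact_c"];  // HNSW vector search
    let keyword = vec!["fact_b", "fact_a", "fact_d"]; // BM25 keyword match
    let graph = vec!["fact_b", "fact_c"];             // graph traversal
    let fused = rrf_merge(&[vector, keyword, graph], 60.0);
    // fact_b ranks highly in all three lists, so it wins the fusion.
    println!("{:?}", fused);
}
```

Documents that appear near the top of several lists accumulate the highest fused score, which is why RRF needs no score normalization across heterogeneous retrievers.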
Rust from the Ground Up
The storage engine is built from scratch in Rust. Write-ahead logging for crash recovery, B-tree sorted storage for fast reads, and parallel HNSW index rebuilding. No garbage collection pauses. No runtime overhead.
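The write-ahead-log idea can be illustrated in a few lines. This is a sketch, not DeltaMemory's on-disk format: an in-memory `Vec<u8>` stands in for the log file, the record contents are hypothetical, and a real engine would fsync before acknowledging:

```rust
use std::io::{Cursor, Read, Write};

/// Append a length-prefixed record to the log. Durability comes from
/// writing (and syncing) the log *before* mutating in-memory state.
fn wal_append(log: &mut Vec<u8>, record: &[u8]) {
    log.write_all(&(record.len() as u32).to_le_bytes()).unwrap();
    log.write_all(record).unwrap();
    // A real engine would fsync the log file here before acknowledging.
}

/// Crash recovery: replay every complete record in order.
fn wal_replay(log: &[u8]) -> Vec<Vec<u8>> {
    let mut cur = Cursor::new(log);
    let mut records = Vec::new();
    let mut len_buf = [0u8; 4];
    while cur.read_exact(&mut len_buf).is_ok() {
        let len = u32::from_le_bytes(len_buf) as usize;
        let mut rec = vec![0u8; len];
        cur.read_exact(&mut rec).unwrap();
        records.push(rec);
    }
    records
}

fn main() {
    let mut log = Vec::new();
    wal_append(&mut log, b"set user:42 name=Ada");
    wal_append(&mut log, b"set user:42 lang=Rust");
    // Recovery reconstructs every acknowledged write from the log alone.
    println!("{} records recovered", wal_replay(&log).len());
}
```

Because every write is logged before it is applied, replaying the log after a crash reconstructs exactly the acknowledged state.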
Scales with Your Agents
Per-user session isolation with fine-grained locking lets requests from different users proceed concurrently while requests for the same user are serialized. Background fact extraction and batch embedding generation keep throughput high.
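The per-user locking scheme can be sketched with standard-library primitives. The `Sessions` type, the `session_for` helper, and the user names below are hypothetical illustrations, not DeltaMemory's API:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Registry mapping each user id to its own lock: different users take
/// different mutexes and run concurrently; two requests for the same
/// user contend on one mutex and serialize.
type Sessions = Arc<Mutex<HashMap<String, Arc<Mutex<Vec<String>>>>>>;

fn session_for(sessions: &Sessions, user: &str) -> Arc<Mutex<Vec<String>>> {
    sessions
        .lock()
        .unwrap()
        .entry(user.to_string())
        .or_insert_with(|| Arc::new(Mutex::new(Vec::new())))
        .clone() // registry lock is released here; only the per-user lock remains
}

fn main() {
    let sessions: Sessions = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = ["alice", "bob", "alice"]
        .iter()
        .map(|user| {
            let sessions = sessions.clone();
            let user = user.to_string();
            thread::spawn(move || {
                let session = session_for(&sessions, &user);
                session.lock().unwrap().push(format!("memory for {user}"));
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Both of alice's writes landed, serialized on her session lock.
    println!("{:?}", session_for(&sessions, "alice").lock().unwrap());
}
```

The key design point is that the global registry lock is held only long enough to fetch (or create) a user's lock, so cross-user contention stays near zero.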
What this means for your agents
Performance that translates directly into better user experiences and lower operating costs.
Real-Time Responses
Memory retrieval completes in 50ms at the median. Your agents respond with full context without adding noticeable delay to conversations.
Lower Costs at Scale
Instead of re-sending entire conversation histories to your LLM, DeltaMemory delivers only the relevant facts. 97% fewer tokens processed per query.
Always-On Reliability
Every memory write is durable before acknowledgment. If anything goes wrong, recovery is automatic. No data loss, no manual intervention.