Introducing DeltaMemory: Cognitive memory for production AI agents.

Performance

Fast enough for real-time. Efficient enough for scale.

DeltaMemory is built in Rust with a custom storage engine: 50ms p50 queries, 3,714x token compression, and 97% cost reduction versus raw context re-processing.

Book Demo
50ms
p50 query latency
Rust-powered retrieval engine
3,714x
Token compression
26M tokens become 7K structured facts
97%
Cost reduction
vs. raw token re-processing
<1ms
Core operations
Write-ahead log + in-memory index

LoCoMo benchmark results

Head-to-head comparison on the LoCoMo long-term conversation memory benchmark across all evaluation categories.

| Method | Single-Hop | Multi-Hop | Open Domain | Temporal | Overall | Latency (p50) |
| --- | --- | --- | --- | --- | --- | --- |
| DeltaMemory | 91.5% | 87.5% | 90.5% | 82.2% | 89% | 50ms |
| Memobase v0.0.37 | 70.9% | 46.9% | 77.2% | 85.1% | 75.8% | 800ms |
| Zep | 74.1% | 66% | 67.7% | 79.8% | 75.1% | 632ms |
| Memobase v0.0.32 | 63.8% | 52.1% | 71.8% | 80.4% | 70.9% | 900ms |
| Mem0-Graph | 65.7% | 47.2% | 75.7% | 58.1% | 68.4% | 2.6s |
| Mem0 | 67.1% | 51.2% | 72.9% | 55.5% | 66.9% | 1.4s |
| LangMem | 62.2% | 47.9% | 71.1% | 23.4% | 58.1% | 18.0s |
| OpenAI | 63.8% | 42.9% | 62.3% | 21.7% | 52.9% | 466ms |

Evaluated on the LoCoMo long-term conversation memory benchmark. All methods tested under identical conditions using GPT-4.1 as the base model.

Under the hood

Every layer of the stack is designed for speed and efficiency.

3,714x Token Compression

Raw conversations are compressed into structured facts and a knowledge graph. 26 million tokens of conversation history become 7,000 tokens of structured memory. Your agents recall what matters without re-processing entire histories.
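The headline ratio follows directly from the two numbers above. A back-of-envelope sketch in Rust (the function name is illustrative, not part of DeltaMemory's API):

```rust
// Illustrative arithmetic only: 26M raw tokens reduced to 7K tokens
// of structured memory yields the quoted ~3,714x compression figure.
fn compression_ratio(raw_tokens: u64, stored_tokens: u64) -> u64 {
    raw_tokens / stored_tokens
}

fn main() {
    let ratio = compression_ratio(26_000_000, 7_000);
    println!("{ratio}x"); // 3714x
}
```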

50ms Query Latency

Hybrid retrieval combines HNSW vector search, BM25 keyword matching, and graph traversal in a single query. Reciprocal Rank Fusion merges results. The entire pipeline completes in under 50ms at the median.
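Reciprocal Rank Fusion can be sketched in a few lines. This is a minimal illustration of the standard RRF formula, not DeltaMemory's actual code; each retriever (vector, keyword, graph) returns document IDs best-first, and every document scores the sum of 1/(k + rank) across lists, with k = 60 as the commonly used constant:

```rust
use std::collections::HashMap;

// Minimal RRF sketch: fuse several best-first ranked lists into one.
// Function name and inputs are illustrative, not DeltaMemory's API.
fn rrf_merge(ranked_lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in ranked_lists {
        for (rank, doc) in list.iter().enumerate() {
            // enumerate() is 0-based; the standard formula uses 1-based ranks.
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let vector = vec!["a", "b", "c"];  // e.g. HNSW vector hits
    let keyword = vec!["b", "a", "d"]; // e.g. BM25 keyword hits
    let graph = vec!["b", "c"];        // e.g. graph-traversal hits
    let fused = rrf_merge(&[vector, keyword, graph], 60.0);
    // "b" ranks highly in all three lists, so it wins the fusion.
    println!("{}", fused[0].0);
}
```

A document that appears near the top of several retrievers beats one that dominates only a single retriever, which is why RRF needs no score normalization across the three very different ranking signals.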

Rust from the Ground Up

The storage engine is built from scratch in Rust. Write-ahead logging for crash recovery, B-tree sorted storage for fast reads, and parallel HNSW index rebuilding. No garbage collection pauses. No runtime overhead.
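The write-ahead discipline is: log the mutation durably first, then apply it to the read path; after a crash, replay the log. A toy sketch under stated assumptions (an in-memory `Vec` stands in for the fsync'd log file, and a `BTreeMap` for the sorted storage; this is not DeltaMemory's engine):

```rust
use std::collections::BTreeMap;

// Toy write-ahead-log store: durability before acknowledgment.
struct Store {
    wal: Vec<String>,                // stands in for an fsync'd log file
    index: BTreeMap<String, String>, // B-tree sorted storage for fast reads
}

impl Store {
    fn put(&mut self, key: &str, value: &str) {
        // 1. Log the mutation first (real code would fsync here).
        self.wal.push(format!("PUT {key}={value}"));
        // 2. Only then apply it to the in-memory index and acknowledge.
        self.index.insert(key.to_string(), value.to_string());
    }

    // Crash recovery: rebuild the index by replaying the surviving log.
    fn recover(wal: Vec<String>) -> Self {
        let mut index = BTreeMap::new();
        for entry in &wal {
            if let Some(rest) = entry.strip_prefix("PUT ") {
                if let Some((k, v)) = rest.split_once('=') {
                    index.insert(k.to_string(), v.to_string());
                }
            }
        }
        Store { wal, index }
    }
}

fn main() {
    let mut store = Store { wal: Vec::new(), index: BTreeMap::new() };
    store.put("prefers", "dark mode");
    // Simulate a crash: only the log survives, then replay it.
    let recovered = Store::recover(store.wal);
    println!("{:?}", recovered.index.get("prefers"));
}
```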

Scales with Your Agents

Per-user session isolation with fine-grained locking means concurrent access for different users while serializing requests for the same user. Background fact extraction and batch embedding generation keep throughput high.
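Per-user serialization typically means one lock per user ID rather than one global lock. A minimal sketch of that pattern (names are illustrative, not DeltaMemory's implementation):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// One lock per user ID: requests for different users proceed
// concurrently; requests for the same user serialize on its lock.
struct SessionLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl SessionLocks {
    fn new() -> Self {
        SessionLocks { locks: Mutex::new(HashMap::new()) }
    }

    // Fetch (or lazily create) the lock guarding this user's session.
    fn lock_for(&self, user_id: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(user_id.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

fn main() {
    let locks = SessionLocks::new();
    let a1 = locks.lock_for("alice");
    let a2 = locks.lock_for("alice");
    let b = locks.lock_for("bob");
    // Same user maps to the same lock; different users to independent locks.
    println!("{} {}", Arc::ptr_eq(&a1, &a2), Arc::ptr_eq(&a1, &b));
}
```

The outer map lock is held only long enough to fetch the per-user `Arc`, so the fine-grained locks, not the registry, govern actual request concurrency.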

What this means for your agents

Performance that translates directly into better user experiences and lower operating costs.

01

Real-Time Responses

Memory retrieval completes in 50ms. Your agents respond with full context without adding noticeable delay to conversations.

02

Lower Costs at Scale

Instead of re-sending entire conversation histories to your LLM, DeltaMemory delivers only the relevant facts. 97% fewer tokens processed per query.

03

Always-On Reliability

Every memory write is durable before acknowledgment. If anything goes wrong, recovery is automatic. No data loss, no manual intervention.

Early Access

Give your agents memory that compounds.

DeltaMemory is available to select design partners and enterprise teams. Book a demo to see how persistent, cognitive memory works with your AI agents.

Book a Demo