Introducing DeltaMemory: Cognitive memory for production AI agents.

Performance

Fast enough for real-time. Efficient enough for scale.

DeltaMemory is built in Rust with a custom storage engine: 50ms p50 queries, 3,714x token compression, and 97% cost reduction versus raw context re-processing.

Book Demo
50ms
p50 query latency
Rust-powered retrieval engine
3,714x
Token compression
26M tokens become 7K structured facts
97%
Cost reduction
vs. raw token re-processing
<1ms
Core operations
Write-ahead log + in-memory index

LoCoMo benchmark results

Head-to-head comparison on the LoCoMo long-term conversation memory benchmark across all evaluation categories.

| Method | Single-Hop | Multi-Hop | Open Domain | Temporal | Overall | Latency (p50) |
| --- | --- | --- | --- | --- | --- | --- |
| DeltaMemory | 91.5% | 87.5% | 90.5% | 82.2% | 89% | 50ms |
| Memobase v0.0.37 | 70.9% | 46.9% | 77.2% | 85.1% | 75.8% | 800ms |
| Zep | 74.1% | 66% | 67.7% | 79.8% | 75.1% | 632ms |
| Memobase v0.0.32 | 63.8% | 52.1% | 71.8% | 80.4% | 70.9% | 900ms |
| Mem0-Graph | 65.7% | 47.2% | 75.7% | 58.1% | 68.4% | 2.6s |
| Mem0 | 67.1% | 51.2% | 72.9% | 55.5% | 66.9% | 1.4s |
| LangMem | 62.2% | 47.9% | 71.1% | 23.4% | 58.1% | 18.0s |
| OpenAI | 63.8% | 42.9% | 62.3% | 21.7% | 52.9% | 466ms |

Evaluated on the LoCoMo long-term conversation memory benchmark. All methods tested under identical conditions using GPT-4.1 as the base model.

Under the hood

Every layer of the stack is designed for speed and efficiency.

3,714x Token Compression

Raw conversations are compressed into structured facts and a knowledge graph. 26 million tokens of conversation history become 7,000 tokens of structured memory. Your agents recall what matters without re-processing entire histories.
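The headline ratio follows directly from the two numbers above. A back-of-envelope sketch in Rust (the function name is illustrative, not part of DeltaMemory's API):

```rust
// Illustrative arithmetic only: 26M raw tokens reduced to 7K tokens
// of structured memory yields the quoted ~3,714x compression figure.
fn compression_ratio(raw_tokens: u64, stored_tokens: u64) -> u64 {
    raw_tokens / stored_tokens
}

fn main() {
    let ratio = compression_ratio(26_000_000, 7_000);
    println!("{ratio}x"); // 3714x
}
```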

50ms Query Latency

Hybrid retrieval combines HNSW vector search, BM25 keyword matching, and graph traversal in a single query. Reciprocal Rank Fusion merges results. The entire pipeline completes in under 50ms at the median.
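Reciprocal Rank Fusion can be sketched in a few lines. This is a minimal illustration of the standard RRF formula, not DeltaMemory's actual code; each retriever (vector, keyword, graph) returns document IDs best-first, and every document scores the sum of 1/(k + rank) across lists, with k = 60 as the commonly used constant:

```rust
use std::collections::HashMap;

// Minimal RRF sketch: fuse several best-first ranked lists into one.
// Function name and inputs are illustrative, not DeltaMemory's API.
fn rrf_merge(ranked_lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in ranked_lists {
        for (rank, doc) in list.iter().enumerate() {
            // enumerate() is 0-based; the standard formula uses 1-based ranks.
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let vector = vec!["a", "b", "c"];  // e.g. HNSW vector hits
    let keyword = vec!["b", "a", "d"]; // e.g. BM25 keyword hits
    let graph = vec!["b", "c"];        // e.g. graph-traversal hits
    let fused = rrf_merge(&[vector, keyword, graph], 60.0);
    // "b" ranks highly in all three lists, so it wins the fusion.
    println!("{}", fused[0].0);
}
```

A document that appears near the top of several retrievers beats one that dominates only a single retriever, which is why RRF needs no score normalization across the three very different ranking signals.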

Rust from the Ground Up

The storage engine is built from scratch in Rust. Write-ahead logging for crash recovery, B-tree sorted storage for fast reads, and parallel HNSW index rebuilding. No garbage collection pauses. No runtime overhead.
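The write-ahead discipline is: log the mutation durably first, then apply it to the read path; after a crash, replay the log. A toy sketch under stated assumptions (an in-memory `Vec` stands in for the fsync'd log file, and a `BTreeMap` for the sorted storage; this is not DeltaMemory's engine):

```rust
use std::collections::BTreeMap;

// Toy write-ahead-log store: durability before acknowledgment.
struct Store {
    wal: Vec<String>,                // stands in for an fsync'd log file
    index: BTreeMap<String, String>, // B-tree sorted storage for fast reads
}

impl Store {
    fn put(&mut self, key: &str, value: &str) {
        // 1. Log the mutation first (real code would fsync here).
        self.wal.push(format!("PUT {key}={value}"));
        // 2. Only then apply it to the in-memory index and acknowledge.
        self.index.insert(key.to_string(), value.to_string());
    }

    // Crash recovery: rebuild the index by replaying the surviving log.
    fn recover(wal: Vec<String>) -> Self {
        let mut index = BTreeMap::new();
        for entry in &wal {
            if let Some(rest) = entry.strip_prefix("PUT ") {
                if let Some((k, v)) = rest.split_once('=') {
                    index.insert(k.to_string(), v.to_string());
                }
            }
        }
        Store { wal, index }
    }
}

fn main() {
    let mut store = Store { wal: Vec::new(), index: BTreeMap::new() };
    store.put("prefers", "dark mode");
    // Simulate a crash: only the log survives, then replay it.
    let recovered = Store::recover(store.wal);
    println!("{:?}", recovered.index.get("prefers"));
}
```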

Scales with Your Agents

Per-user session isolation with fine-grained locking means concurrent access for different users while serializing requests for the same user. Background fact extraction and batch embedding generation keep throughput high.
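Per-user serialization typically means one lock per user ID rather than one global lock. A minimal sketch of that pattern (names are illustrative, not DeltaMemory's implementation):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// One lock per user ID: requests for different users proceed
// concurrently; requests for the same user serialize on its lock.
struct SessionLocks {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
}

impl SessionLocks {
    fn new() -> Self {
        SessionLocks { locks: Mutex::new(HashMap::new()) }
    }

    // Fetch (or lazily create) the lock guarding this user's session.
    fn lock_for(&self, user_id: &str) -> Arc<Mutex<()>> {
        let mut map = self.locks.lock().unwrap();
        map.entry(user_id.to_string())
            .or_insert_with(|| Arc::new(Mutex::new(())))
            .clone()
    }
}

fn main() {
    let locks = SessionLocks::new();
    let a1 = locks.lock_for("alice");
    let a2 = locks.lock_for("alice");
    let b = locks.lock_for("bob");
    // Same user maps to the same lock; different users to independent locks.
    println!("{} {}", Arc::ptr_eq(&a1, &a2), Arc::ptr_eq(&a1, &b));
}
```

The outer map lock is held only long enough to fetch the per-user `Arc`, so the fine-grained locks, not the registry, govern actual request concurrency.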

What this means for your agents

Performance that translates directly into better user experiences and lower operating costs.

01

Real-Time Responses

Memory retrieval completes in 50ms. Your agents respond with full context without adding noticeable delay to conversations.

02

Lower Costs at Scale

Instead of re-sending entire conversation histories to your LLM, DeltaMemory delivers only the relevant facts. 97% fewer tokens processed per query.

03

Always-On Reliability

Every memory write is durable before acknowledgment. If anything goes wrong, recovery is automatic. No data loss, no manual intervention.

Early Access

Give your agents memory that compounds.

DeltaMemory is available to select design partners and enterprise teams. Book a demo to see how persistent, cognitive memory works with your AI agents.

Book a Demo