How AI Agents Forget (and Why That's a Feature)
Most AI memory systems hoard everything. DeltaMemory's salience decay lets agents forget gracefully, keeping context sharp and responses relevant.

Your AI agent remembers that a user ordered pizza on March 3rd. It also remembers their name, their job title, their daughter's birthday, and the fact that they switched from iPhone to Android six months ago.
When the user asks for a restaurant recommendation, which of those memories should surface?
If you said "all of them," you have the same problem most AI memory systems have. They remember everything with equal weight, forever. The context window fills up with stale trivia, and the agent loses the signal in the noise.
Human memory does not work this way. We forget things. And that forgetting is not a bug. It is how we stay focused.
The problem with perfect recall
Most approaches to AI memory treat it like a database. Store everything. Retrieve by similarity. Hope the vector search returns the right stuff.
This works fine when you have a handful of memories. It breaks down at scale. An agent that has had thousands of conversations with a user accumulates tens of thousands of memories. Many of them are outdated. Some contradict each other. A lot of them are just noise.
When you stuff all of that into a context window, three things happen:
1. The agent gets confused by contradictory information
2. Important recent context gets buried under old irrelevant memories
3. You burn tokens (and money) on context that adds no value
The fix is not better retrieval. The fix is better forgetting.
How salience decay works
Every memory in DeltaMemory has a salience score between 0 and 1. Think of it as "how important is this right now." A brand new memory starts with high salience. Over time, that score decays.
The math is simple: exponential decay. current_salience = stored_salience × e^(−decay_rate × age_in_days). If that sounds familiar, it is the same curve that describes radioactive decay, capacitor discharge, and — not coincidentally — human forgetting.
With the default decay rate, a memory retains about 90% of its salience after one day. After a week, it is down to roughly 50%. After a month, it is below 5%.
This means a memory from yesterday about the user's flight delay will naturally outrank a memory from three weeks ago about their lunch order. No special logic needed. The math handles it.
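The curve above can be reproduced in a few lines. One caveat: the decay rate used here is an assumption, back-derived from the "about 90% after one day" figure quoted above; it is not taken from DeltaMemory's documentation, and the real default may differ.

```python
import math

# 0.105/day is an assumed rate chosen so that e^(-0.105 * 1) ≈ 0.90,
# matching the "90% after one day" figure; not DeltaMemory's documented default.
DECAY_RATE = 0.105

def current_salience(stored: float, age_days: float) -> float:
    """Stored salience discounted exponentially by age in days."""
    return stored * math.exp(-DECAY_RATE * age_days)

for days in (1, 7, 30):
    print(f"day {days:2d}: salience {current_salience(1.0, days):.3f}")
```

With this rate, a fresh memory holds about 0.90 after one day, roughly half after a week, and falls under 0.05 after a month, which lines up with the numbers quoted above.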
Access keeps memories alive
Decay alone would be too aggressive. Some memories are old but still important. The user's name. Their company. Their preference for concise responses.
DeltaMemory handles this the same way your brain does: access refreshes salience. Every time a memory is retrieved and used in a response, its importance gets reinforced. Memories that keep being useful stay alive. Memories that never get recalled fade away.
This creates a natural feedback loop. The agent surfaces a memory. If it was relevant, the conversation flows naturally and the memory gets reinforced. If it was irrelevant, it does not get accessed again and continues to decay. Over time, the agent's memory self-curates.
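One way to sketch that feedback loop is below. The specifics here are assumptions for illustration: the size of the reinforcement boost, the "boost toward 1.0" formula, and the choice to restart the age clock on access are all invented, not DeltaMemory's documented mechanics.

```python
import math
from dataclasses import dataclass

DECAY_RATE = 0.105         # per day (assumed, as above)
REINFORCEMENT_BOOST = 0.3  # hypothetical fraction of the gap to 1.0 restored per access

@dataclass
class Memory:
    text: str
    salience: float   # stored score as of the last update
    age_days: float   # days since that update

    def current_salience(self) -> float:
        return self.salience * math.exp(-DECAY_RATE * self.age_days)

    def access(self) -> None:
        # Fold the decay into the stored score, then nudge it back toward 1.0.
        decayed = self.current_salience()
        self.salience = min(1.0, decayed + REINFORCEMENT_BOOST * (1.0 - decayed))
        self.age_days = 0.0  # the decay clock restarts at the access time

name = Memory("User's name is Priya", salience=1.0, age_days=60.0)
lunch = Memory("Ordered a salad for lunch", salience=1.0, age_days=60.0)
name.access()  # referenced in a reply; the lunch memory never is
```

After the single access, the two-month-old name memory sits well above the equally old lunch memory, which keeps decaying untouched.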
The prune threshold
Eventually, a memory's salience drops below a configurable threshold. At that point, DeltaMemory marks it for cleanup. The memory is not deleted immediately — it gets tombstoned in the storage layer and cleaned up during the next compaction cycle.
This is important for two reasons. First, it keeps the active memory set small and fast to search. Second, it means the agent's context window is not competing with thousands of irrelevant memories for attention.
You can tune the decay rate and prune threshold per use case. A customer support agent might use aggressive decay because last week's ticket is rarely relevant to today's issue. A healthcare agent might use slow decay because patient history matters for months or years.
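A minimal sketch of the cleanup pass might look like this. The 0.05 threshold is a hypothetical value for a configurable setting, and the dict shape of a memory record is invented for the example; only the tombstone-then-compact behavior comes from the description above.

```python
import math

DECAY_RATE = 0.105       # per day (assumed, as above)
PRUNE_THRESHOLD = 0.05   # hypothetical value for the configurable threshold

def mark_for_cleanup(memories: list[dict]) -> list[dict]:
    """Tombstone entries whose decayed salience fell below the threshold.
    Tombstoned entries stay in storage until the next compaction cycle."""
    for m in memories:
        score = m["salience"] * math.exp(-DECAY_RATE * m["age_days"])
        m["tombstoned"] = score < PRUNE_THRESHOLD
    return memories

memories = mark_for_cleanup([
    {"text": "flight delayed", "salience": 1.0, "age_days": 1.0},
    {"text": "lunch order",    "salience": 1.0, "age_days": 30.0},
])
```

The day-old memory stays live; the month-old one is marked and will disappear at the next compaction, keeping the active set small.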
Why not just use a recency filter?
You could skip all of this and just retrieve the N most recent memories. Some systems do exactly that. It is simple and it works for short-lived conversations.
But it fails for long-term relationships. A user who mentioned their daughter's birthday three months ago expects the agent to remember it when the date comes around. A pure recency filter would have dropped that memory weeks ago.
Salience decay with access reinforcement handles this naturally. If the agent has referenced the birthday in previous conversations (maybe when suggesting gift ideas), that memory's salience stays high despite its age. It survives because it has proven useful, not because it is recent.
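To make the contrast concrete, here is a toy comparison of the two retrieval strategies. Every number in the memory set is invented for illustration; "age" on the salience path means days since the memory was last accessed, since access reinforcement restarts its decay clock.

```python
import math

DECAY_RATE = 0.105  # per day (assumed)

# (text, days_since_created, days_since_last_access, stored_salience)
# All values are invented for this example.
memories = [
    ("daughter's birthday is June 12", 90, 7, 0.9),   # old, but reinforced on access
    ("ordered a salad for lunch",      21, 21, 1.0),  # never re-accessed
    ("flight delayed yesterday",        1,  1, 1.0),
]

def decayed(m):
    _, _, age_days, stored = m
    return stored * math.exp(-DECAY_RATE * age_days)

# Pure recency: rank by creation date, keep the two newest.
top2_recency = [m[0] for m in sorted(memories, key=lambda m: m[1])[:2]]
# Salience decay: rank by decayed score, keep the two strongest.
top2_salience = [m[0] for m in sorted(memories, key=decayed, reverse=True)[:2]]
```

The recency filter keeps the flight delay and the lunch order and drops the three-month-old birthday. The salience ranking keeps the birthday instead, because its recent access left it with a healthy decayed score.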
Batch processing at scale
When you are running memory for thousands of users, computing salience on every query would be expensive. DeltaMemory handles this with batch computation. The salience calculator processes arrays of (salience, timestamp) pairs in a single pass, and the decay operation runs as a background task rather than blocking retrieval.
The decay endpoint lets you trigger this explicitly, or you can let it run on a schedule. Either way, the retrieval path stays fast because it is working with pre-computed scores, not calculating decay in real time.
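The single-pass idea can be sketched with NumPy. This is not DeltaMemory's internal code, just a vectorized version of the same decay formula applied to whole arrays at once, the shape of work a background task or explicit decay trigger would do.

```python
import numpy as np

DECAY_RATE = 0.105  # per day (assumed)

def batch_decay(saliences: np.ndarray, ages_days: np.ndarray) -> np.ndarray:
    """Decay every memory in one vectorized pass; no per-memory Python loop."""
    return saliences * np.exp(-DECAY_RATE * ages_days)

# Pre-compute scores for three memories of mixed age in a single call.
scores = batch_decay(np.array([1.0, 0.8, 0.6]), np.array([1.0, 7.0, 30.0]))
```

Retrieval then ranks against the pre-computed scores, so the query path never pays for the exponentials.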
Forgetting is a feature
The instinct when building AI memory is to keep everything. More data is better, right?
Not when your context window is finite. Not when your users expect fast, relevant responses. Not when stale information actively degrades the quality of your agent's output.
Controlled forgetting — salience decay with access reinforcement and configurable thresholds — gives your agent the same advantage that human memory provides: the ability to focus on what matters right now, while letting the rest gracefully fade away.
Your agent does not need perfect recall. It needs the right memories at the right time.