Why this matters
Follow-up turns stay productive
Without the window signal, every turn is a rerun. The agent pays tokens to receive facts it already has. Window-awareness means turn N+1 can actually be progress.
Exploration reaches further
“Show me more like this” becomes a real affordance. The system understands that more means different, not more of the same.
Token budgets last longer
Redundant results are the silent budget-killer in multi-turn agents. Window-awareness converts those tokens into new signal.
The user doesn't notice anything
Good mechanism, invisible surface. The user sees an agent that remembers, not a retrieval system that’s doing plumbing.
How it works, from the outside
The retrieval system accepts an optionalhistory field — a list of prior conversation turns. When present, the system inspects the history before executing the query and builds an exclusion set: the entities, paths, and structural points the agent has already encountered. That set becomes an input to the query itself, not a post-filter.
What it looks like across the three calls
Window-awareness is a property of the retrieval layer — so it applies uniformly todiscover, interpret, and search (the three calls documented in Progressive Disclosure).
Discover
The menu the agent reads. With window-awareness, the menu never re-surfaces items from prior turns.Interpret
When the agent asks for a brief on specific items, window-awareness prunes the accompanying graph context — the brief focuses on the parts of those items the agent hasn’t read yet.Search
For the synthesized-answer surface, window-awareness narrows the graph context that feeds the synthesis LLM. The answer ends up covering the gap between what the agent knows and what the user just asked.Compared to plain retrieval
When does it fire?
| Caller passes | What happens |
|---|---|
No history | Window-awareness is a no-op. Results are whatever the raw retrieval returns. |
Empty history array | Same as above — no-op. |
Non-empty history | System extracts what’s been seen and pushes exclusion into the query. |
Edge cases
What if the agent genuinely wants to re-see something?
What if the agent genuinely wants to re-see something?
Pass the item’s stable identifier directly through
interpret. The exclusion applies to what was delivered in prior turns, not to what the agent is explicitly asking for now. An explicit reference always wins over an implicit “already seen” signal.What if the history mentions something only in passing?
What if the history mentions something only in passing?
Window-awareness is conservative — it only excludes items the agent actually received, not everything that co-occurred in the history text. A passing reference to a concept doesn’t lock the agent out of retrieving that concept later; only having received it as retrieval output does.
What if the phrasing differs between history and the graph?
What if the phrasing differs between history and the graph?
Matching happens on structural identity, not surface text. If the history mentioned “JWT refresh” and the graph stores it as “Token Refresh Policy”, the system matches them as the same underlying entity and correctly excludes it.
Does it work across sessions?
Does it work across sessions?
Today, window-awareness operates on whatever
history the caller passes in. Cross-session memory is the caller’s responsibility — but since the history contract is just a list of turns, plugging in a session store is straightforward.Does history inflate latency?
Does history inflate latency?
There’s a small upfront cost to extract the exclusion set from history. In exchange, downstream retrieval fetches fewer redundant items and returns more relevant ones. On multi-turn conversations, the net latency is lower, not higher, because the agent converges faster.
How this connects to progressive disclosure
Window-awareness and progressive disclosure are two sides of the same principle: treat the agent’s context window as a scarce, stateful resource.- Progressive disclosure gives the agent control over how much it retrieves per turn.
- Window-awareness ensures what gets retrieved is fresh signal, not redelivery.
Implications for your prompts
Don’t write prompts that try to solve this yourself. Patterns to drop:- “Don’t repeat anything you’ve already told me.” The system handles this; the prompt adds nothing but noise to the model.
- “Here’s what we’ve covered so far: …” If the history is in the window, it’s already available. Summarizing it for the retrieval system is redundant.
- Hand-rolled dedup in your tool-calling loop. Skip it. The retrieval layer has better signal than a set-of-strings comparison.
- Pass full recent history when retrieving. The richer the history, the sharper the exclusion.
- Trust the next-page behavior. When the user says “show me more,” the system naturally reaches deeper.
- Use
interpretwith explicit ids when you want to re-surface something. The explicit path bypasses the exclusion.
Next steps
- Progressive Disclosure — the three-call gradient that pairs with window-awareness to keep agents sharp across long conversations.
- Claude Code Extension — how the three window-aware tools install into Claude Code.

