Window-Aware Retrieval

The agent’s context window is a running record of what it has already learned. Every turn, it accumulates more: user messages, tool outputs, retrieved chunks, its own earlier responses. By turn 5 of a conversation, a sizable portion of the window is already about the topic you’re still retrieving against. Window-aware retrieval treats the window as first-class input. When the caller passes conversation history along with a query, the retrieval system reads what’s already there and pushes down an exclusion so the next response delivers fresh signal instead of restating what the agent already has.

  Without window-awareness              With window-awareness

  Turn 1  ──► [A, B, C, D, E]          Turn 1  ──► [A, B, C, D, E]
  Turn 2  ──► [A, B, C, D, E]          Turn 2  ──► [F, G, H, I, J]
          (same results, already       (reaches past what's known,
           in the agent's window)       returns the next layer down)

Why this matters

Follow-up turns stay productive

Without the window signal, every turn is a rerun. The agent pays tokens to receive facts it already has. Window-awareness means turn N+1 can actually be progress.

Exploration reaches further

“Show me more like this” becomes a real affordance. The system understands that more means different, not more of the same.

Token budgets last longer

Redundant results are the silent budget-killer in multi-turn agents. Window-awareness converts those tokens into new signal.

The user doesn't notice anything

Good mechanism, invisible surface. The user sees an agent that remembers, not a retrieval system that’s doing plumbing.

How it works, from the outside

The retrieval system accepts an optional history field — a list of prior conversation turns. When present, the system inspects the history before executing the query and builds an exclusion set: the entities, paths, and structural points the agent has already encountered. That set becomes an input to the query itself, not a post-filter.

                        ┌─────────────────────────┐
  question + history ──►│   WINDOW EXTRACTION     │
                        │  • what's been seen     │
                        └──────────┬──────────────┘
                                   │
                                   ▼
                        ┌─────────────────────────┐
                        │   RETRIEVAL             │
                        │  • fetch top-N NOT in   │
                        │    the exclusion set    │
                        │  • expand search as     │
                        │    needed to hit N      │
                        └──────────┬──────────────┘
                                   │
                                   ▼
                        ┌─────────────────────────┐
                        │   RESULTS               │
                        │  • N fresh items, no    │
                        │    redelivery           │
                        └─────────────────────────┘

The key word is push-down. The exclusion is applied during retrieval, not after. When the top-5 matches overlap with the window, the system reaches for positions 6–10 to replace them — so the caller gets a full page of genuinely new content, not a half-empty page of what’s left after filtering.

Push-down matters. A post-filter that runs after results come back would leave you with partial pages: request 50 results, receive 10 after dedup, call again, receive 0 the next time because the top matches were already in the window. Window-aware retrieval avoids that by pulling forward before filtering.

What it looks like across the three calls

Window-awareness is a property of the retrieval layer — so it applies uniformly to discover, interpret, and search (the three calls documented in Progressive Disclosure).

Discover

The menu the agent reads. With window-awareness, the menu never re-surfaces items from prior turns.

Turn 1 — history empty
  discover("auth service refactor")
  ► Auth Service                         [99%]
  ► Auth Service > Tokens                [88%]
  ► Auth Service > Refresh Policy        [85%]
  ► Auth Service > Errors                [82%]
  ► Auth Service > Circuit Breaker       [78%]

Turn 2 — window contains top 3 from above
  discover("auth service refactor")
  ► Auth Service > Errors                [82%]   ← next layer
  ► Auth Service > Circuit Breaker       [78%]
  ► Auth Service > Metrics               [71%]
  ► Auth Service > Dependency Injection  [67%]
  ► Auth Service > Logging               [64%]

Interpret

When the agent asks for a brief on specific items, window-awareness prunes the accompanying graph context — the brief focuses on the parts of those items the agent hasn’t read yet.

Search

For the synthesized-answer surface, window-awareness narrows the graph context that feeds the synthesis LLM. The answer ends up covering the gap between what the agent knows and what the user just asked.

Compared to plain retrieval

  Plain retrieval              Window-aware retrieval

  • Blind to the window.       • Treats the window as input.
  • Same query →               • Same query + evolved window
    same results.                → evolved results.
  • Deduplication is the       • No dedup needed — results
    caller's problem.            never overlapped in the first
                                 place.
  • Multi-turn agents          • Multi-turn agents stay
    accumulate redundant          sharp across arbitrary
    chunks.                      conversation length.

When does it fire?

Caller passes	What happens
No `history`	Window-awareness is a no-op. Results are whatever the raw retrieval returns.
Empty `history` array	Same as above — no-op.
Non-empty `history`	System extracts what’s been seen and pushes exclusion into the query.

Critically, it fires automatically. Builders don’t need to format the exclusion, maintain a dedup cache, or prompt the agent to avoid repetition. Pass the history; the system does the rest.

Edge cases

What if the agent genuinely wants to re-see something?

Pass the item’s stable identifier directly through interpret. The exclusion applies to what was delivered in prior turns, not to what the agent is explicitly asking for now. An explicit reference always wins over an implicit “already seen” signal.

What if the history mentions something only in passing?

Window-awareness is conservative — it only excludes items the agent actually received, not everything that co-occurred in the history text. A passing reference to a concept doesn’t lock the agent out of retrieving that concept later; only having received it as retrieval output does.

What if the phrasing differs between history and the graph?

Matching happens on structural identity, not surface text. If the history mentioned “JWT refresh” and the graph stores it as “Token Refresh Policy”, the system matches them as the same underlying entity and correctly excludes it.

Does it work across sessions?

Today, window-awareness operates on whatever history the caller passes in. Cross-session memory is the caller’s responsibility — but since the history contract is just a list of turns, plugging in a session store is straightforward.

Does history inflate latency?

There’s a small upfront cost to extract the exclusion set from history. In exchange, downstream retrieval fetches fewer redundant items and returns more relevant ones. On multi-turn conversations, the net latency is lower, not higher, because the agent converges faster.

How this connects to progressive disclosure

Window-awareness and progressive disclosure are two sides of the same principle: treat the agent’s context window as a scarce, stateful resource.

Progressive disclosure gives the agent control over how much it retrieves per turn.
Window-awareness ensures what gets retrieved is fresh signal, not redelivery.

A system with progressive disclosure but without window-awareness will produce shorter redundant pages. A system with window-awareness but without progressive disclosure will produce lean but still-too-large single-shot retrievals. Together they give you the shape that makes multi-turn agents actually work.

Implications for your prompts

Don’t write prompts that try to solve this yourself. Patterns to drop:

“Don’t repeat anything you’ve already told me.” The system handles this; the prompt adds nothing but noise to the model.
“Here’s what we’ve covered so far: …” If the history is in the window, it’s already available. Summarizing it for the retrieval system is redundant.
Hand-rolled dedup in your tool-calling loop. Skip it. The retrieval layer has better signal than a set-of-strings comparison.

Patterns that work with window-awareness:

Pass full recent history when retrieving. The richer the history, the sharper the exclusion.
Trust the next-page behavior. When the user says “show me more,” the system naturally reaches deeper.
Use interpret with explicit ids when you want to re-surface something. The explicit path bypasses the exclusion.

Next steps

Progressive Disclosure — the three-call gradient that pairs with window-awareness to keep agents sharp across long conversations.
Claude Code Extension — how the three window-aware tools install into Claude Code.

Getting Started

Copass Context

Agent Router

Collaboration

Developer Tools

Cookbooks

Security

Window-Aware Retrieval

Why this matters

Follow-up turns stay productive

Exploration reaches further

Token budgets last longer

The user doesn't notice anything

How it works, from the outside

What it looks like across the three calls

Discover

Interpret

Search

Compared to plain retrieval

When does it fire?

Edge cases

How this connects to progressive disclosure

Implications for your prompts

Next steps

Getting Started

Copass Context

Agent Router

Collaboration

Developer Tools

Cookbooks

Security

​Why this matters

Follow-up turns stay productive

Exploration reaches further

Token budgets last longer

The user doesn't notice anything

​How it works, from the outside

​What it looks like across the three calls

​Discover

​Interpret

​Search

​Compared to plain retrieval

​When does it fire?

​Edge cases

​How this connects to progressive disclosure

​Implications for your prompts

​Next steps

Why this matters

How it works, from the outside

What it looks like across the three calls

Discover

Interpret

Search

Compared to plain retrieval

When does it fire?

Edge cases

How this connects to progressive disclosure

Implications for your prompts

Next steps