The problem with “just send the chunks”
Today’s default retrieval shape is some variation of: embed the query, fetch the top-k chunks, paste them all into the prompt, repeat every turn. Three costs follow.
Token tax
Every turn pays for every chunk, even the irrelevant ones. Budgets evaporate before the agent has done any real work.
Duplicate delivery
The window already contains what the agent read last turn. A naive retriever can’t see the window — so it serves the same content twice.
Flat relevance
“Top 10” is flat. The agent can’t tell which hit is worth reading deeply vs. which is peripheral. Everything arrives at the same resolution.
First principles
Four ideas, stated plainly. The rest of the system is the consequences.
1. The agent is the decision-maker
The retriever doesn’t know which sub-path of a topic the agent needs. The agent does — it has the user’s actual goal. The retriever’s job is to present options, not to guess the answer.
2. Retrieval is a gradient, not a step
A call can be cheap and broad or expensive and specific. It shouldn’t be one or the other. A good system offers the whole axis and lets each turn pick the right point.
3. The window has memory — respect it
The agent’s context window is a record of what it has already learned. If the retriever is blind to the window, it will re-surface the same facts turn after turn, and the agent’s context fills with duplicates instead of new signal.
4. Structure before prose
Retrieving a paragraph is expensive — in tokens, in latency, in cognitive load for the agent. Retrieving a pointer to a paragraph is cheap. Most of the time the agent just needs to know where the answer is, not what it is. Show structure first. Fetch prose only when the agent has decided prose is needed.
Progressive disclosure as the consequence
Stack those four principles and a specific shape falls out — three operations, each narrower and more committed than the last: discover, interpret, and search.
Compare
A single user turn against a codebase that mentions checkout in 200 places:
- Naive RAG: the top 10 of those 200 mentions arrive as full chunks, all at the same resolution, relevant or not; next turn, many of the same chunks arrive again.
- Progressive disclosure: a cheap discover call returns a ranked menu of pointers; the agent picks the one or two paths that match the user’s goal and pays for prose only there.
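The three-call shape can be sketched in a few lines of Python. The tool names (discover, interpret, search) come from this doc; the client object and every signature below are illustrative assumptions, not the real Copass API:

```python
# Hypothetical client interface -- the real tools run over MCP or the CLI,
# and their exact signatures are not shown here.

def progressive_retrieve(client, query, token_budget):
    # 1. Cheap and broad: a ranked menu of pointers, not prose.
    menu = client.discover(query)

    # 2. The agent, not the retriever, picks the few items worth reading.
    picks = menu[:3]

    # 3. Narrow and committed: fetch prose only for the picked pointers,
    #    stopping when the token budget runs out.
    passages, spent = [], 0
    for item in picks:
        passage = client.interpret(item["id"])
        if spent + passage["tokens"] > token_budget:
            break
        passages.append(passage)
        spent += passage["tokens"]

    # 4. Most expensive, so last resort: a full search only if the menu
    #    produced nothing the budget could afford.
    return passages if passages else client.search(query)
```

The point of the sketch is the ordering: structure first, prose on demand, full search only as an escalation.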
What this enables
Multi-turn agents that stay sharp
Context doesn’t fill with sludge over a long conversation. Every turn’s retrieval is incremental.
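The incremental property can be sketched as a window-aware filter. Here `seen_ids` is a stand-in assumption for however the system tracks what has already been delivered, not Copass's actual mechanism:

```python
def incremental(menu, seen_ids):
    """Window-aware filtering (sketch): drop items the agent has already
    been served, so each turn delivers only new signal."""
    fresh = [item for item in menu if item["id"] not in seen_ids]
    # Record what we just delivered so later turns skip it.
    seen_ids.update(item["id"] for item in fresh)
    return fresh
```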
Interactive UIs
Discover is fast enough for typeahead, hover cards, and live previews. You can render a “related context” panel without blocking.
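One way a UI can exploit that speed, sketched with Python's standard thread pool; `discover` here is any callable returning a menu of pointers (an assumed shape, not the real API):

```python
from concurrent.futures import ThreadPoolExecutor

def related_panel(discover, query, pool):
    """Kick off a discover call off the UI thread; the panel renders when
    the future resolves, so typing and hovering never block on retrieval."""
    return pool.submit(discover, query)
```

A hover card would attach a done-callback or poll the future rather than waiting synchronously.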
Cost-aware orchestration
Production agents can budget per-turn: “2 discovers per turn, 1 interpret, 0.2 searches on average.” The three surfaces give you the right knobs.
Graceful degradation
Out of budget? Drop the search. Still running? Drop the interpret. Discover alone is often enough to route the agent to the right code.
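The budget-and-degrade logic above can be folded into one small planner. The drop order (search first, then interpret, discover last) follows the text; the cost units are invented for illustration:

```python
# Relative cost units -- made-up numbers, not real pricing.
COST = {"discover": 1, "interpret": 5, "search": 20}

def plan_turn(budget):
    """Pick the surfaces this turn can afford, cheapest first, so running
    low degrades gracefully instead of failing outright."""
    plan, remaining = [], budget
    for op in ("discover", "interpret", "search"):
        if COST[op] <= remaining:
            plan.append(op)
            remaining -= COST[op]
    return plan
```

With a shrinking budget the plan sheds search first, then interpret, and keeps discover until the end.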
Implications for how you build
If you take these principles seriously, your agents change shape.
- Don’t prefetch. Don’t front-load “just in case” retrieval into the system prompt. Let the agent call discover on demand.
- Teach the agent to pick. The agent’s job on a discover response is selection — scanning the menu, picking 1–3 items. Prompt it to do this deliberately.
- Pin follow-ups. When the user asks “tell me more about X,” pass X’s identifier back to interpret instead of running a fresh discover. Cheaper and more focused.
- Skip steps when you can. If the user’s question is self-contained, skip discover and go straight to search. Progressive doesn’t mean linear.
- Trust the window. The system will avoid redelivering what the agent has seen. Don’t write prompts that second-guess that — it adds confusion for no gain.
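The pin-and-skip rules suggest a routing shape like this; all names and parameters here are hypothetical, a sketch of the decision rather than an implementation:

```python
def route(query, pinned_id=None, self_contained=False):
    """Routing sketch for the rules above:
    - follow-up about a known item -> interpret it directly (a "pin")
    - self-contained question      -> skip discover, go straight to search
    - otherwise                    -> start from a cheap discover menu
    """
    if pinned_id is not None:
        return ("interpret", pinned_id)
    if self_contained:
        return ("search", query)
    return ("discover", query)
```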
When not to use it
Progressive disclosure shines when the agent is exploratory, multi-turn, or token-constrained. It’s less interesting when:
- The question is tiny and self-contained. Go straight to search.
- The answer is always the same document. Read the document directly, skip retrieval entirely.
- The surface area is small enough that everything fits in the window anyway.
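As a toy guard, the checklist reduces to two checks; the parameter names and the whole function are made up for illustration:

```python
def worth_progressive_disclosure(corpus_tokens, window_tokens, multi_turn):
    """Toy guard: if everything fits in the window, inline it all; if the
    conversation is a one-shot question, plain search is enough."""
    if corpus_tokens <= window_tokens:
        return False   # small surface area: put it all in the window
    return multi_turn  # exploratory, multi-turn work is where it pays off
```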
In practice
The progressive disclosure shape is exposed via three Copass tools: discover, interpret, and search. They’re available over MCP to any agent client — Claude Code, Cursor, and beyond — and as CLI commands for scripting.
See Claude Code Extension for the agent-facing setup.
Next steps
- Claude Code Extension — the three-call pattern wired into Claude Code.
- Window-Aware Retrieval — the complementary property that makes multi-turn retrieval actually work.
- Overview — how scope, retrieval, and ingestion fit together.

