Context engineering is the discipline of deciding what information an agent sees, in what order, at what cost. It’s prompt engineering’s serious cousin. And most systems are doing it wrong. Copass treats context as a scarce resource and gives the agent a gradient — cheap structure at the top, expensive synthesis at the bottom — so the agent pays for depth only where it commits. We call this progressive disclosure.

The problem with “just send the chunks”

Today’s default retrieval shape is some variation of:
  User question ──► vector search ──► top-N chunks ──► stuffed into prompt
It works. It’s also a trap at scale. Three failures compound:

Token tax

Every turn pays for every chunk, even the irrelevant ones. Budgets evaporate before the agent has done any real work.

Duplicate delivery

The window already contains what the agent read last turn. A naive retriever can’t see the window — so it serves the same content twice.

Flat relevance

“Top 10” is flat. The agent can’t tell which hit is worth reading deeply vs. which is peripheral. Everything arrives at the same resolution.
Those aren’t implementation bugs — they’re baked into the “retrieve-then-stuff” shape. You can’t tune your way out of them. You have to rethink what the call actually does.
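
For concreteness, the retrieve-then-stuff shape compresses to a few lines. A minimal sketch; vectorSearch, buildPrompt, and llm are hypothetical stand-ins, not a real API:

  // Retrieve-then-stuff in one function. Every turn pays for every chunk.
  type Chunk = { path: string; text: string };
  declare function vectorSearch(q: string, opts: { topN: number }): Promise<Chunk[]>;
  declare function buildPrompt(q: string, chunks: Chunk[]): string;
  declare function llm(prompt: string): Promise<string>;

  async function answerNaive(question: string): Promise<string> {
    const chunks = await vectorSearch(question, { topN: 50 });
    // Everything arrives at full resolution, whether or not the window
    // already contains it. All three failures above live on this line.
    return llm(buildPrompt(question, chunks));
  }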

First principles

Four ideas, stated plainly. The rest of the system falls out as their consequences.

1. The agent is the decision-maker

The retriever doesn’t know which sub-path of a topic the agent needs. The agent does — it has the user’s actual goal. The retriever’s job is to present options, not to guess the answer.
  Naive:        question  ─────────────► chunks        (retriever guesses)
  Progressive:  question  ──► menu  ──► agent picks  ──► depth
Give the agent a menu, let it pick, then go deep where it committed. The agent’s token budget and the user’s actual intent are the right place to make the decision — not a similarity threshold inside the retriever.
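
As a sketch, with illustrative names rather than Copass’s actual API, the progressive loop looks like this:

  // Menu first, then depth where the agent committed. Illustrative names.
  type MenuItem = { id: string; path: string };  // a pointer, ~15 tokens
  declare function discover(q: string): Promise<MenuItem[]>;   // cheap menu
  declare function interpret(ids: string[]): Promise<string>;  // depth on picks
  declare function agentPick(goal: string, menu: MenuItem[]): string[];

  async function answer(goal: string): Promise<string> {
    const menu = await discover(goal);     // retriever presents options
    const picks = agentPick(goal, menu);   // the agent, holding the goal, decides
    return interpret(picks);               // pay for depth only where it committed
  }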

2. Retrieval is a gradient, not a step

A call can be cheap and broad or expensive and specific, and it shouldn’t be forced to be one or the other. A good system offers the whole axis and lets each turn pick the right point.
  Cheap ◄─────────────────────────────────────────► Expensive
  Broad                                             Specific

  •───────────•───────────•───────────•───────────•
  menu        pinned      targeted    synthesized  full
              items       brief       paragraph    narrative
Every surface in Copass lives somewhere on that axis. The agent doesn’t “retrieve” — it navigates the gradient.
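
One way to make the axis concrete is a resolution type that each call declares up front. A sketch; only the menu and brief token figures match the worked comparison later on this page, the rest are assumptions:

  type Resolution = "menu" | "pinned" | "brief" | "paragraph" | "narrative";

  // Rough token cost at each point on the gradient. Assumed values, except
  // menu and brief, which match the comparison below.
  const approxTokens: Record<Resolution, number> = {
    menu: 800,
    pinned: 200,
    brief: 400,
    paragraph: 1_000,
    narrative: 4_000,
  };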

3. The window has memory — respect it

The agent’s context window is a record of what it has already learned. If the retriever is blind to the window, it will re-surface the same facts turn after turn, and the agent’s context fills with duplicates instead of new signal.
  Turn 1  │  window = ∅
  ────────┼────────────────────────────────────
  discover│  returns  [A, B, C, D, E, …]
          │  agent reads A, B
  ────────┼────────────────────────────────────
  Turn 2  │  window = {A, B}
  ────────┼────────────────────────────────────
  discover│  returns  [C, D, E, F, G, …]   ← not [A, B, …] again
          │  reaches "further" because A, B are already known
Window-awareness isn’t a polish feature. It’s what makes follow-up turns productive instead of a treadmill.
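
A minimal sketch of window-aware discovery, assuming a hypothetical rank function that returns item ids in relevance order:

  declare function rank(q: string): Promise<string[]>;  // item ids, best first

  // Filter out what the window already holds, then serve the next slice.
  async function discover(q: string, windowIds: Set<string>): Promise<string[]> {
    const hits = await rank(q);
    return hits.filter((id) => !windowIds.has(id)).slice(0, 10);
  }

  // Turn 1: discover(q, new Set())           -> [A, B, C, ...]
  // Turn 2: discover(q, new Set(["A", "B"])) -> [C, D, E, ...]  no re-delivery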

4. Structure before prose

Retrieving a paragraph is expensive — in tokens, in latency, in cognitive load for the agent. Retrieving a pointer to a paragraph is cheap. Most of the time the agent just needs to know where the answer is, not what it is. Show structure first. Fetch prose only when the agent has decided prose is needed.
  Structure                          Prose
  ─────────                          ─────

  Checkout Service                   "The checkout confirmation flow
  ├── Stripe Webhooks                 switched from polling to Stripe
  │   ├── retry policy                webhooks in Q3. The handler lives
  │   └── dead-letter queue           in services/webhooks.ts and retries
  └── Logs                            with exponential backoff on 5xx…"

  ~15 tokens                         ~250 tokens
  cheap                              16× more expensive
The structure is often all the agent needs to pick its next step. Prose is the second pass, not the first.
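
In data-structure terms: the node is the cheap pointer, and its prose is a separate, explicit fetch. An illustrative shape, not Copass’s schema:

  type Node = {
    path: string;        // "Checkout Service / Stripe Webhooks / retry policy"
    children: string[];  // more pointers; traversal stays cheap
    fetchProse(): Promise<string>;  // the ~250-token paragraph, paid only on demand
  };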

Progressive disclosure as the consequence

Stack those four principles and a specific shape falls out — three operations, each narrower and more committed than the last:
                    ┌────────────────────────────────┐
                    │        DISCOVER                │
                    │  • menu of structural hits     │
                    │  • ranked, no prose            │
                    │  • window-aware                │
                    │  • cheap (sub-second)          │
                    └──────────────┬─────────────────┘
                                   │ agent picks

                    ┌────────────────────────────────┐
                    │        INTERPRET               │
                    │  • brief pinned to picks       │
                    │  • grounded in real source     │
                    │  • medium cost, medium latency │
                    └──────────────┬─────────────────┘
                                   │ agent commits

                    ┌────────────────────────────────┐
                    │        SEARCH                  │
                    │  • full synthesized answer     │
                    │  • structure + prose           │
                    │  • heaviest call               │
                    └────────────────────────────────┘
Each call operates over the same underlying knowledge graph — it’s not three separate retrieval systems. What varies is the resolution: how much of the graph gets rendered into tokens. The agent traverses the gradient deliberately, one step at a time.
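
As an interface, the three operations over the one graph might look like this sketch (the shapes are illustrative):

  type MenuItem = { id: string; path: string };

  interface ProgressiveRetrieval {
    discover(q: string, window?: Set<string>): Promise<MenuItem[]>; // menu, no prose
    interpret(picks: string[]): Promise<string>;                    // pinned brief
    search(q: string): Promise<string>;                             // full synthesis
  }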

Compare

A single user turn against a codebase that mentions checkout in 200 places:

Naive RAG

retrieve("checkout issue", top_n=50)
→ 50 chunks × 400 tokens = 20,000 tokens
→ 90% irrelevant to *this* question
→ 30% already in the window
Costly. Noisy. Redundant.

Progressive disclosure

discover("checkout issue")
→ 30 path pointers ≈ 800 tokens

agent picks 2 items, runs interpret
→ 1 brief ≈ 400 tokens

total: ~1,200 tokens, all relevant
16× leaner. Higher signal. No re-delivery.
The agent ends up with a smaller, more relevant context — not because the system retrieved less, but because the system let the agent decide when to pay more.

What this enables

Multi-turn agents that stay sharp

Context doesn’t fill with sludge over a long conversation. Every turn’s retrieval is incremental.

Interactive UIs

Discover is fast enough for typeahead, hover cards, and live previews. You can render a “related context” panel without blocking.

Cost-aware orchestration

Production agents can budget per-turn: “2 discovers per turn, 1 interpret, 0.2 searches on average.” The three surfaces give you the right knobs.

Graceful degradation

Out of budget? Drop the search. Still over budget? Drop the interpret. Discover alone is often enough to route the agent to the right code.
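
A sketch of that degradation ladder, choosing the deepest call a turn can still afford. The token thresholds are assumptions:

  // Drop from the expensive end first: search, then interpret, then discover.
  function deepestAffordable(tokensLeft: number): "search" | "interpret" | "discover" | "skip" {
    if (tokensLeft > 5_000) return "search";     // heaviest call
    if (tokensLeft > 1_500) return "interpret";  // out of budget? drop the search
    if (tokensLeft > 800) return "discover";     // still over? drop the interpret
    return "skip";                               // answer from the window alone
  }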

Implications for how you build

If you take these principles seriously, your agents change shape.
  • Don’t prefetch. Don’t front-load “just in case” retrieval into the system prompt. Let the agent call discover on demand.
  • Teach the agent to pick. The agent’s job on a discover response is selection — scanning the menu, picking 1–3 items. Prompt it to do this deliberately.
  • Pin follow-ups. When the user asks “tell me more about X,” pass X’s identifier back to interpret instead of running a fresh discover. Cheaper and more focused (see the routing sketch after this list).
  • Skip steps when you can. If the user’s question is self-contained, skip discover and go straight to search. Progressive doesn’t mean linear.
  • Trust the window. The system will avoid redelivering what the agent has seen. Don’t write prompts that second-guess that — it adds confusion for no gain.
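
Put together, per-turn routing might look like this sketch. All names are illustrative: isSelfContained is a hypothetical classifier, and the rest mirror the earlier sketches:

  type MenuItem = { id: string; path: string };
  declare function discover(q: string): Promise<MenuItem[]>;
  declare function interpret(ids: string[]): Promise<string>;
  declare function search(q: string): Promise<string>;
  declare function agentPick(goal: string, menu: MenuItem[]): string[];
  declare function isSelfContained(q: string): boolean;

  async function routeTurn(q: string, pinnedId?: string): Promise<string> {
    if (pinnedId) return interpret([pinnedId]);      // "tell me more about X"
    if (isSelfContained(q)) return search(q);        // progressive, not linear
    const picks = agentPick(q, await discover(q));   // otherwise: menu, then pick
    return interpret(picks);
  }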

When not to use it

Progressive disclosure shines when the agent is exploratory, multi-turn, or token-constrained. It’s less interesting when:
  • The question is tiny and self-contained. Go straight to search.
  • The answer is always the same document. Read the document directly, skip retrieval entirely.
  • The surface area is small enough that everything fits in the window anyway.
Don’t turn a one-line question into a three-hop retrieval chain to prove a point.

In practice

The progressive disclosure shape is exposed via three Copass tools: discover, interpret, and search. They’re available over MCP to any agent client — Claude Code, Cursor, and beyond — and as CLI commands for scripting. See Claude Code Extension for the agent-facing setup.
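
For example, a custom MCP client could call the discover tool like this. A sketch using the official TypeScript MCP SDK; the server launch command and the argument shape are assumptions, not Copass documentation:

  import { Client } from "@modelcontextprotocol/sdk/client/index.js";
  import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

  // Placeholder launch command; substitute whatever starts your Copass MCP server.
  const transport = new StdioClientTransport({ command: "copass", args: ["mcp"] });
  const client = new Client({ name: "example-client", version: "1.0.0" });
  await client.connect(transport);

  // Tool names match this page; the argument key is an assumption.
  const menu = await client.callTool({ name: "discover", arguments: { query: "checkout issue" } });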
