Documentation Index

Fetch the complete documentation index at: https://docs.copass.com/llms.txt

Use this file to discover all available pages before exploring further.

The ComputeRouter is unified managed compute for agents. Your agent runs fast and cheap; you don’t pick hosts, regions, or interconnect topologies. The Router handles compute provisioning, lifecycle, idle suspension, fail-over, and cost attribution per agent run. It’s the third axis of Copass’s three-Router architecture:
  • AgentRouter: who runs (the agent runtime)
  • ContextRouter: what they know (the data and retrieval layer)
  • ComputeRouter: where they run (the silicon under the runtime)
For Anthropic and Google agents, the vendor handles compute. For self-hosted open-weights agents (Hermes), ComputeRouter handles it across multiple providers, with cost-aware routing.

What you can do

  • Run open-weights agent runtimes (Hermes today) on managed compute without provisioning servers yourself.
  • Pick a compute provider per agent (Daytona, Fly Sprites) — or let cost-aware routing pick per call.
  • Get long-lived sandboxes that auto-suspend on idle and resume on next invocation. No cold-start tax on every turn.
  • Audit compute time per (user_id, sandbox_id, agent_id, run_id) through the same credit ledger as token usage.
  • Fail over between providers automatically when one is degraded.

Supported compute providers

Daytona
  Best for: latency-sensitive workloads, generous free tier, fast warm-start (~30 ms). Archive-resume model fits long-lived per-(user, sandbox) instances.
  Key trade-off: slightly higher per-hour cost than Sprites at scale.

Fly Sprites
  Best for: cost-sensitive workloads, L3 egress allowlists, multi-region. The default for high-throughput agents.
  Key trade-off: slightly higher cold-start latency than Daytona.
Both providers run the same Hermes runtime, expose the same OpenAI-compatible API surface, and stream the same neutral event types back to AgentRouter. From your code, the provider is one field on the agent row — everything else is identical.

Cost-aware routing

When you don’t pin a provider, ComputeRouter picks per call by policy:
Agent profile         →    Provider chosen
─────────────────────────────────────────
latency-sensitive     →    Daytona
cost-sensitive        →    Fly Sprites
unhealthy primary     →    fail over to secondary
region-pinned         →    provider that serves that region
The policy is configurable on the agent row. Default is latency-aware (favor warm Daytona instances; spill to Sprites when caps are hit).
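The policy table above can be read as a pure selection function. A minimal sketch, assuming a boolean health signal per provider; the type names and function are illustrative, not the actual ComputeRouter API:

```typescript
// Illustrative sketch of per-call provider selection; names are assumptions.
type Provider = 'daytona' | 'fly_sprites';

interface CallProfile {
  latencySensitive: boolean; // latency-sensitive → Daytona, else cost-sensitive → Sprites
}

interface Health {
  daytona: boolean;
  fly_sprites: boolean;
}

function pickProvider(profile: CallProfile, health: Health): Provider {
  const primary: Provider = profile.latencySensitive ? 'daytona' : 'fly_sprites';
  const secondary: Provider = primary === 'daytona' ? 'fly_sprites' : 'daytona';
  // unhealthy primary → fail over to the secondary
  return health[primary] ? primary : secondary;
}
```

Region-pinned calls would additionally filter candidates to providers serving that region before this selection runs.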

Why this exists

Two breaks in the existing agent runtime model push toward a separate compute axis:
  1. Open-weights agents need real compute. Hermes (Nous Research, MIT-licensed, OpenAI-compatible HTTP API, SQLite-on-disk memory) needs to run on a host you provision. Vendor-managed agent platforms don’t help here — you own the runtime, you own the silicon.
  2. The compute axis is orthogonal to the runtime axis. Hermes-on-Daytona vs. Hermes-on-Sprites is a compute decision; Hermes vs. some-future-open-weights-agent is a runtime decision. Treating them as one axis would force a combinatorial enum that breaks every time you add either.
ComputeRouter is the seam for the compute axis. You select the runtime on backend; you select the compute on compute_provider. Two orthogonal fields, picked independently.
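The two orthogonal fields can be sketched as independent unions. The field names and values come from this page; the surrounding type is an illustrative assumption, not the published SDK type:

```typescript
// Illustrative sketch: runtime axis and compute axis as independent fields.
type Backend = 'hermes';                          // runtime axis
type ComputeProvider = 'daytona' | 'fly_sprites'; // compute axis

interface ModelSettings {
  backend: Backend;
  compute_provider?: ComputeProvider; // pin a provider...
  compute_policy?: 'cost-aware';      // ...or let routing pick per call
}

// Adding a runtime or a provider extends one union, not a combined enum.
const pinned: ModelSettings = { backend: 'hermes', compute_provider: 'daytona' };
const routed: ModelSettings = { backend: 'hermes', compute_policy: 'cost-aware' };
```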

What ComputeRouter handles

  • Provisioning: spins up a compute sandbox per (user_id, sandbox_id). Long-lived; auto-stops on idle (default 15 min); archives after a window (default 7 days); resumes on next invocation.
  • Cost attribution: every sandbox carries {user_id, sandbox_id, agent_id, run_id} metadata tags. Compute time bills to the right account through the credit ledger: two rows per run (token cost + compute cost), joined on run_id.
  • Auth: per-turn DEK injection; credentials never bake into the sandbox image. The Authorization: Bearer token rotates per call.
  • Egress: sandbox network is locked down to the Copass MCP host. No general internet. L3 allowlists on Sprites.
  • Kill-switch: inherits the platform-wide gate (global pause, per-user pause, per-provider pause). Pause the user, pause the platform, or pause Daytona but not Sprites.
  • Concurrency caps: hard per-user limits prevent runaway sandboxes. Max-runtime ceilings prevent silent cost accumulation.
  • Region selection: provider chooses the region nearest the calling user, or pins explicitly when compliance requires.
  • Fallback: if the primary provider is unhealthy, ComputeRouter fails over to the secondary automatically.
You never call ComputeRouter directly. AgentRouter resolves the agent’s ANS address, reads its compute_provider, and asks ComputeRouter to ensure a session.
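The two-rows-per-run attribution described above can be sketched as a join on run_id. A hypothetical ledger shape; field names beyond the documented metadata tags are assumptions:

```typescript
// Hypothetical ledger rows: one token-cost row and one compute-cost row per run.
interface LedgerRow {
  run_id: string;
  user_id: string;
  kind: 'token' | 'compute';
  credits: number;
}

// Total cost of a run = its token row plus its compute row, joined on run_id.
function runCost(ledger: LedgerRow[], runId: string): number {
  return ledger
    .filter((row) => row.run_id === runId)
    .reduce((sum, row) => sum + row.credits, 0);
}
```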

Three paths

Via the Concierge

  • “Create a Hermes agent that runs on the cheapest compute available.”
  • “Show me my active compute sessions.”
  • “Pause my Hermes agents on Daytona — keep Sprites running.”

Via the CLI

# Pin a provider
copass agent create support \
  --backend hermes \
  --compute-provider daytona \
  --prompt-file ./prompts/support.md

# Or let cost-aware routing pick
copass agent create support \
  --backend hermes \
  --compute-policy cost-aware \
  --prompt-file ./prompts/support.md

# Inspect active sessions
copass compute sessions list

Via the SDK

// Pin a provider
await client.agents.create(sandboxId, {
  slug: 'support',
  model_settings: {
    backend: 'hermes',
    compute_provider: 'daytona',
  },
});

// Cost-aware routing
await client.agents.create(sandboxId, {
  slug: 'support',
  model_settings: {
    backend: 'hermes',
    compute_policy: 'cost-aware',
  },
});

// Inspect compute sessions for an agent
const sessions = await client.compute.sessions.list(sandboxId, { agent_slug: 'support' });
The compute axis is one extra field; everything else is the same agent CRUD.

How an agent run uses ComputeRouter

router.run({ provider: 'hermes', ... })
        │
        ▼
AgentRouter resolves the agent's ANS address
        │
        ▼
Reads model_settings.compute_provider (or applies the policy)
        │
        ▼
ComputeRouter ensures a session
        │   • finds an active sandbox for (user_id, sandbox_id)
        │   • resumes from archive if needed
        │   • spins up a new one if none exists
        │   • applies kill-switch and concurrency caps
        ▼
Hermes runtime on the provisioned compute
        │
        ▼
Stream of neutral AgentEvents back to your client
The whole thing happens transparently. From your code, it’s the same router.run({...}) call as any other AgentRouter invocation.
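The ensure-a-session step can be sketched as reuse-resume-or-create behind the gates. Illustrative logic only; the states and helper shape are assumptions, not ComputeRouter internals:

```typescript
type SandboxState = 'active' | 'suspended' | 'archived';

interface Sandbox {
  user_id: string;
  sandbox_id: string;
  state: SandboxState;
}

// Illustrative ensure-session: apply gates, then reuse, resume, or spin up.
function ensureSession(
  existing: Sandbox | undefined,
  ids: { user_id: string; sandbox_id: string },
  activeCount: number,
  caps: { maxPerUser: number },
  paused: boolean,
): Sandbox {
  if (paused) throw new Error('kill-switch engaged');                     // global/user/provider pause
  if (existing?.state === 'active') return existing;                      // warm sandbox: reuse
  if (existing) return { ...existing, state: 'active' };                  // idle or archived: resume
  if (activeCount >= caps.maxPerUser) throw new Error('concurrency cap'); // hard per-user limit
  return { ...ids, state: 'active' };                                     // none exists: spin up
}
```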

Common patterns

Latency-sensitive customer chat

Pin compute_provider: 'daytona' for an agent that handles live customer conversations. Daytona’s warm-start budget keeps response times tight.

Throughput-heavy backend processing

Pin compute_provider: 'fly_sprites' for batch-style agents. Lower per-hour cost; cold-start latency doesn’t matter when the workload runs on a queue.

Mixed workload, policy-driven

compute_policy: 'cost-aware' lets ComputeRouter pick per call. Latency-tagged turns route to Daytona; everything else hits Sprites.

Compliance routing

Region-pin via compute_region when data-residency requires a specific geography. Both providers support multi-region.

What it explicitly is not

  • Not a generic GPU compute marketplace. ComputeRouter is for agent runtimes — Hermes today, peers later. ML training jobs, batch inference at scale, custom CUDA kernels — not the target. Buyers who need raw GPU SKU access go direct to the providers.
  • Not a sandbox for untrusted user code. That’s E2B’s niche. Hermes is our code; the isolation requirement is per-tenant data isolation, not adversarial-code isolation.
  • Not a replacement for vendor-managed agent compute. Anthropic and Google still run their own compute when you select provider: 'anthropic' or provider: 'google' on AgentRouter. ComputeRouter only enters the picture when the agent runtime is one we self-host.
  • Not user-OAuth into the providers. Compute provider accounts are platform-wide, with metadata tags for cost attribution. You don’t bring your own Daytona or Fly account.

Next steps