Documentation Index

Fetch the complete documentation index at: https://docs.copass.com/llms.txt

Use this file to discover all available pages before exploring further.

The ComputeRouter is unified managed compute for agents. Your agent runs fast and cheap; you don’t pick hosts, regions, or interconnect topologies. The Router handles compute provisioning, lifecycle, idle suspension, fail-over, and cost attribution per agent run. It’s the third axis of Copass’s three-Router architecture:
  • AgentRouter: who runs (the agent runtime)
  • ContextRouter: what they know (the data and retrieval layer)
  • ComputeRouter: where they run (the silicon under the runtime)
For Anthropic and Google agents, the vendor handles compute. For self-hosted open-weights agents (Hermes), ComputeRouter handles it across multiple providers, with cost-aware routing.

What you can do

  • Run open-weights agent runtimes (Hermes today) on managed compute without provisioning servers yourself.
  • Pick a compute provider per agent (Daytona, Fly Sprites) — or let cost-aware routing pick per call.
  • Get long-lived sandboxes that auto-suspend on idle and resume on next invocation. No cold-start tax on every turn.
  • Audit compute time per (user_id, sandbox_id, agent_id, run_id) through the same credit ledger as token usage.
  • Fail over between providers automatically when one is degraded.

Supported compute providers

Daytona
  Best for: latency-sensitive workloads, generous free tier, fast warm-start (~30 ms). Archive-resume model fits long-lived per-(user, sandbox) instances.
  Key trade-off: slightly higher per-hour cost than Sprites at scale.

Fly Sprites
  Best for: cost-sensitive workloads, L3 egress allowlists, multi-region. The default for high-throughput agents.
  Key trade-off: slightly higher cold-start latency than Daytona.
Both providers run the same Hermes runtime, expose the same OpenAI-compatible API surface, and stream the same neutral event types back to AgentRouter. From your code, the provider is one field on the agent row — everything else is identical.

Cost-aware routing

When you don’t pin a provider, ComputeRouter picks per call by policy:
Agent profile         →    Provider chosen
─────────────────────────────────────────
latency-sensitive     →    Daytona
cost-sensitive        →    Fly Sprites
unhealthy primary     →    fail over to secondary
region-pinned         →    provider that serves that region
The policy is configurable on the agent row. Default is latency-aware (favor warm Daytona instances; spill to Sprites when caps are hit).
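The policy table above can be read as a pure selection function. A minimal sketch, assuming a boolean health signal per provider; the type names and function are illustrative, not the actual ComputeRouter API:

```typescript
// Illustrative sketch of per-call provider selection; names are assumptions.
type Provider = 'daytona' | 'fly_sprites';

interface CallProfile {
  latencySensitive: boolean; // latency-sensitive → Daytona, else cost-sensitive → Sprites
}

interface Health {
  daytona: boolean;
  fly_sprites: boolean;
}

function pickProvider(profile: CallProfile, health: Health): Provider {
  const primary: Provider = profile.latencySensitive ? 'daytona' : 'fly_sprites';
  const secondary: Provider = primary === 'daytona' ? 'fly_sprites' : 'daytona';
  // unhealthy primary → fail over to the secondary
  return health[primary] ? primary : secondary;
}
```

Region-pinned calls would additionally filter candidates to providers serving that region before this selection runs.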

Why this exists

Two breaks in the existing agent runtime model push toward a separate compute axis:
  1. Open-weights agents need real compute. Hermes (Nous Research, MIT-licensed, OpenAI-compatible HTTP API, SQLite-on-disk memory) needs to run on a host you provision. Vendor-managed agent platforms don’t help here — you own the runtime, you own the silicon.
  2. The compute axis is orthogonal to the runtime axis. Hermes-on-Daytona vs. Hermes-on-Sprites is a compute decision; Hermes vs. some-future-open-weights-agent is a runtime decision. Treating them as one axis would force a combinatorial enum that breaks every time you add either.
ComputeRouter is the seam for the compute axis. You select the runtime on backend; you select the compute on compute_provider. Two orthogonal fields, picked independently.
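The two orthogonal fields can be sketched as independent unions. The field names and values come from this page; the surrounding type is an illustrative assumption, not the published SDK type:

```typescript
// Illustrative sketch: runtime axis and compute axis as independent fields.
type Backend = 'hermes';                          // runtime axis
type ComputeProvider = 'daytona' | 'fly_sprites'; // compute axis

interface ModelSettings {
  backend: Backend;
  compute_provider?: ComputeProvider; // pin a provider...
  compute_policy?: 'cost-aware';      // ...or let routing pick per call
}

// Adding a runtime or a provider extends one union, not a combined enum.
const pinned: ModelSettings = { backend: 'hermes', compute_provider: 'daytona' };
const routed: ModelSettings = { backend: 'hermes', compute_policy: 'cost-aware' };
```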

What ComputeRouter handles

  • Provisioning: spins up a compute sandbox per (user_id, sandbox_id). Long-lived; auto-stops on idle (default 15 min); archives after a window (default 7 days); resumes on next invocation.
  • Cost attribution: every sandbox carries {user_id, sandbox_id, agent_id, run_id} metadata tags. Compute time bills to the right account through the credit ledger: two rows per run (token cost + compute cost), joined on run_id.
  • Auth: per-turn DEK injection; credentials never bake into the sandbox image. The Authorization: Bearer token rotates per call.
  • Egress: sandbox network is locked down to the Copass MCP host. No general internet. L3 allowlists on Sprites.
  • Kill-switch: inherits the platform-wide gate (global pause, per-user pause, per-provider pause). Pause the user, pause the platform, or pause Daytona but not Sprites.
  • Concurrency caps: hard per-user limits prevent runaway sandboxes. Max-runtime ceilings prevent silent cost accumulation.
  • Region selection: provider chooses the region nearest the calling user, or pins explicitly when compliance requires.
  • Fallback: if the primary provider is unhealthy, ComputeRouter fails over to the secondary automatically.
You never call ComputeRouter directly. AgentRouter resolves the agent’s ANS address, reads its compute_provider, and asks ComputeRouter to ensure a session.
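The two-rows-per-run attribution described above can be sketched as a join on run_id. A hypothetical ledger shape; field names beyond the documented metadata tags are assumptions:

```typescript
// Hypothetical ledger rows: one token-cost row and one compute-cost row per run.
interface LedgerRow {
  run_id: string;
  user_id: string;
  kind: 'token' | 'compute';
  credits: number;
}

// Total cost of a run = its token row plus its compute row, joined on run_id.
function runCost(ledger: LedgerRow[], runId: string): number {
  return ledger
    .filter((row) => row.run_id === runId)
    .reduce((sum, row) => sum + row.credits, 0);
}
```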

Three paths

Via the Concierge

  • “Create a Hermes agent that runs on the cheapest compute available.”
  • “Show me my active compute sessions.”
  • “Pause my Hermes agents on Daytona — keep Sprites running.”

Via the CLI

# Pin a provider
copass agent create support \
  --backend hermes \
  --compute-provider daytona \
  --prompt-file ./prompts/support.md

# Or let cost-aware routing pick
copass agent create support \
  --backend hermes \
  --compute-policy cost-aware \
  --prompt-file ./prompts/support.md

# Inspect active sessions
copass compute sessions list

Via the SDK

// Pin a provider
await client.agents.create(sandboxId, {
  slug: 'support',
  model_settings: {
    backend: 'hermes',
    compute_provider: 'daytona',
  },
});

// Cost-aware routing
await client.agents.create(sandboxId, {
  slug: 'support',
  model_settings: {
    backend: 'hermes',
    compute_policy: 'cost-aware',
  },
});

// Inspect compute sessions for an agent
const sessions = await client.compute.sessions.list(sandboxId, { agent_slug: 'support' });
The compute axis is one extra field; everything else is the same agent CRUD.

How an agent run uses ComputeRouter

router.run({ provider: 'hermes', ... })
        │
        ▼
AgentRouter resolves the agent's ANS address
        │
        ▼
Reads model_settings.compute_provider (or applies the policy)
        │
        ▼
ComputeRouter ensures a session
        │   • finds an active sandbox for (user_id, sandbox_id)
        │   • resumes from archive if needed
        │   • spins up a new one if none exists
        │   • applies kill-switch and concurrency caps
        ▼
Hermes runtime on the provisioned compute
        │
        ▼
Stream of neutral AgentEvents back to your client
The whole thing happens transparently. From your code, it’s the same router.run({...}) call as any other AgentRouter invocation.
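The ensure-a-session step can be sketched as reuse-resume-or-create behind the gates. Illustrative logic only; the states and helper shape are assumptions, not ComputeRouter internals:

```typescript
type SandboxState = 'active' | 'suspended' | 'archived';

interface Sandbox {
  user_id: string;
  sandbox_id: string;
  state: SandboxState;
}

// Illustrative ensure-session: apply gates, then reuse, resume, or spin up.
function ensureSession(
  existing: Sandbox | undefined,
  ids: { user_id: string; sandbox_id: string },
  activeCount: number,
  caps: { maxPerUser: number },
  paused: boolean,
): Sandbox {
  if (paused) throw new Error('kill-switch engaged');                     // global/user/provider pause
  if (existing?.state === 'active') return existing;                      // warm sandbox: reuse
  if (existing) return { ...existing, state: 'active' };                  // idle or archived: resume
  if (activeCount >= caps.maxPerUser) throw new Error('concurrency cap'); // hard per-user limit
  return { ...ids, state: 'active' };                                     // none exists: spin up
}
```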

Common patterns

Latency-sensitive customer chat

Pin compute_provider: 'daytona' for an agent that handles live customer conversations. Daytona’s warm-start budget keeps response times tight.

Throughput-heavy backend processing

Pin compute_provider: 'fly_sprites' for batch-style agents. Lower per-hour cost; cold-start latency doesn’t matter when the workload runs on a queue.

Mixed workload, policy-driven

compute_policy: 'cost-aware' lets ComputeRouter pick per call. Latency-tagged turns route to Daytona; everything else hits Sprites.

Compliance routing

Region-pin via compute_region when data-residency requires a specific geography. Both providers support multi-region.

What it explicitly is not

  • Not a generic GPU compute marketplace. ComputeRouter is for agent runtimes — Hermes today, peers later. ML training jobs, batch inference at scale, custom CUDA kernels — not the target. Buyers who need raw GPU SKU access go direct to the providers.
  • Not a sandbox for untrusted user code. That’s E2B’s niche. Hermes is our code; the isolation requirement is per-tenant data isolation, not adversarial-code isolation.
  • Not a replacement for vendor-managed agent compute. Anthropic and Google still run their own compute when you select provider: 'anthropic' or provider: 'google' on AgentRouter. ComputeRouter only enters the picture when the agent runtime is one we self-host.
  • Not user-OAuth into the providers. Compute provider accounts are platform-wide, with metadata tags for cost attribution. You don’t bring your own Daytona or Fly account.

Next steps