ADK 2.0 — Multi-Tier Claude Router for Gemini Enterprise

One surface.
Every Claude,
routed right.

RouteMind classifies every query with Gemini 3.1 Flash-Lite and deterministically dispatches it across the Claude spectrum: Haiku, Sonnet, Opus, or Fable. Deterministic guards enforce compliance and context limits before any model runs. The target is a 30%+ cut in blended cost versus an all-Sonnet baseline with no loss in answer quality, all inside Gemini Enterprise.

30%+
Blended cost reduction
≤800ms
P95 routing overhead
85%+
Routing accuracy
routemind · reasoning-engine
USER
"Prove the sum of the first n odd numbers equals n²."
CLASSIFIER (Gemini 3.1 Flash-Lite)
Detected: multi-step mathematical proof → deep reasoning required.
Decision: claude_opus
01 · Classify task typology
02 · Emit RouteResultSchema (JSON)
03 · Deterministic dispatch node → edge jump
04 · Stream from claude-opus-4-8
OPUS — routed in 121ms
Base case n=1: 1 = 1². Inductive step: assume ∑(2k−1) over k=1..n equals n²; adding the next odd number, 2n+1, gives n² + 2n + 1 = (n+1)²…
The Challenge

One model for everything is
the wrong default.

Sending a "capital of France" query to a premium reasoning model burns money. Sending a multi-step proof to a fast model burns quality. A single fixed model can't win on both axes.

Single-Model Default

Pay Premium, Everywhere

The legacy sonnet_qa_adk agent dispatched every query to one Claude model — overpaying on trivial traffic while under-serving the hardest questions.

  • Simple Q&A pays premium token rates
  • Complex reasoning never escalated to a stronger tier
  • No cost/latency/quality optimization per query
  • Zero transparency into model selection
Intelligent Routing

The Right Model, Every Time

A cheap Flash-Lite classifier reads each query and a deterministic dispatch node performs the edge jump — the system, not a second LLM, decides.

  • High-volume traffic → fast, cheap Claude Haiku
  • Hard reasoning reserved for Opus; frontier-tier creative work for Fable
  • Guards block ZDR/PII from non-compliant tiers
  • Every response carries its routing reasoning
Core Capabilities

Built for cost, latency,
and quality at once.

Deterministic Graph Routing

A Gemini 3.1 Flash-Lite classifier emits a structured tag; a zero-LLM Python function node parses it and the ADK 2.0 WorkflowAgent performs the edge jump. Routing accuracy and precision become fully testable.

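# RouteResultSchema: guaranteed structured output from the classifier
from typing import Literal

from pydantic import BaseModel


class RouteResultSchema(BaseModel):
    # The only four tags the dispatch node knows how to jump to
    selected_model: Literal[
        "claude_haiku", "claude_sonnet",
        "claude_opus", "claude_fable",
    ]
    # Free-text justification, surfaced in response metadata
    routing_reasoning: str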
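To make the zero-LLM dispatch step concrete, here is a minimal, stdlib-only sketch of the function node. It parses the classifier's JSON, checks the tag against the same four values as RouteResultSchema, and maps it to a Model Garden model ID. The names TAG_TO_MODEL and dispatch, and the choice to fall back to Sonnet on malformed output, are hypothetical illustrations, not part of the ADK API or the RouteMind code.

```python
import json

# Hypothetical mapping from RouteResultSchema tags to Model Garden model IDs
TAG_TO_MODEL = {
    "claude_haiku": "claude-haiku-4-5",
    "claude_sonnet": "claude-sonnet-4-6",
    "claude_opus": "claude-opus-4-8",
    "claude_fable": "claude-fable-5",
}

# Assumption: a middle tier is a safe default when the classifier misbehaves
DEFAULT_TAG = "claude_sonnet"


def dispatch(classifier_json: str) -> tuple[str, str]:
    """Parse the classifier output and pick a model ID without calling any LLM.

    Returns (model_id, routing_reasoning).
    """
    try:
        payload = json.loads(classifier_json)
    except json.JSONDecodeError:
        return TAG_TO_MODEL[DEFAULT_TAG], "fallback: unparseable classifier output"
    if not isinstance(payload, dict):
        return TAG_TO_MODEL[DEFAULT_TAG], "fallback: classifier output is not an object"
    tag = payload.get("selected_model")
    # isinstance check first so an unhashable value (e.g. a list) cannot crash the lookup
    if not isinstance(tag, str) or tag not in TAG_TO_MODEL:
        return TAG_TO_MODEL[DEFAULT_TAG], f"fallback: unknown tag {tag!r}"
    return TAG_TO_MODEL[tag], str(payload.get("routing_reasoning", ""))
```

Because the node is plain Python, every branch (valid tag, unknown tag, broken JSON) can be unit-tested without touching a model endpoint, which is what makes routing precision measurable.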

The Full Claude Spectrum

Four tiers on one Agent Platform — Haiku 4.5, Sonnet 4.6, Opus 4.8, Fable 5 — all via the global Model Garden endpoint, with a cheap Gemini Flash-Lite brain choosing between them.

Full Routing Transparency

Every response carries selected_model, routing_reasoning, and routing_latency_ms — surfaced in metadata and logged for evaluation.

Loop-Safe Escalation

On a downstream 4xx/5xx or timeout, the cascade escalates Haiku → Sonnet → Opus — bounded to one hop per turn, so it never loops.
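A minimal sketch of the one-hop escalation rule, assuming a synchronous client. call_model and DownstreamError are hypothetical stand-ins for whatever client and error type the deployment uses; leaving Fable out of the cascade and re-raising on an Opus failure are assumptions, since the text only specifies Haiku → Sonnet → Opus.

```python
# Escalation order from the text; Fable is left out of the cascade (an assumption)
ESCALATION = ["claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-8"]


class DownstreamError(Exception):
    """Hypothetical error a client raises on a 4xx/5xx response."""


def run_with_escalation(model_id, prompt, call_model):
    """Call model_id once; on failure, retry exactly one tier higher.

    call_model(model_id, prompt) -> str is a hypothetical client function.
    Returns (model_actually_used, response_text).
    """
    try:
        return model_id, call_model(model_id, prompt)
    except (DownstreamError, TimeoutError):
        if model_id not in ESCALATION or model_id == ESCALATION[-1]:
            raise  # Opus, or a tier outside the cascade, has nowhere to go
        next_model = ESCALATION[ESCALATION.index(model_id) + 1]
    # One hop only: a second failure propagates to the caller instead of looping
    return next_model, call_model(next_model, prompt)
```

The bound is structural rather than a counter: the second call sits outside the try block, so there is no code path that can escalate twice in one turn.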

Deterministic Guards

Zero-LLM rules bracket the classifier. Before it runs, they move prompts that exceed Haiku's 200K-token context window to a larger tier and force PII/HIPAA/GDPR traffic to a ZDR-safe tier. At dispatch, they keep biology/medical queries away from the refusal-prone Fable tier. Every guard decision is logged and testable.
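The guards can be sketched as plain functions that run before and after the classifier. This is illustrative only: the keyword patterns, the 4-characters-per-token estimate, and the Sonnet fallback for gated Fable requests are hypothetical placeholders (a production system would use a real tokenizer and a proper PII/DLP detector). The 200K threshold and the "force Opus on ZDR/PII" rule come from the reference architecture.

```python
import re

HAIKU_CONTEXT_TOKENS = 200_000  # the ">200K" threshold in the architecture diagram

# Hypothetical keyword patterns; not a substitute for real PII/DLP detection
COMPLIANCE_RE = re.compile(r"\b(ssn|hipaa|gdpr|patient|passport)\b", re.IGNORECASE)
BIOMED_RE = re.compile(r"\b(biology|medical|pathogen|dosage|clinical)\b", re.IGNORECASE)


def estimate_tokens(text: str) -> int:
    # Crude assumption of ~4 characters per token; swap in a real tokenizer
    return len(text) // 4


def pre_classifier_guards(prompt: str, zdr_marked: bool = False):
    """Return (forced_tag or None, guard_flags). No LLM is called."""
    flags = []
    forced = None
    if estimate_tokens(prompt) > HAIKU_CONTEXT_TOKENS:
        flags.append("context_guard")
        forced = "claude_sonnet"  # move oversized prompts off Haiku
    if zdr_marked or COMPLIANCE_RE.search(prompt):
        flags.append("compliance_guard")
        forced = "claude_opus"  # compliance overrides the context rule
    return forced, flags


def gate_fable(tag: str, prompt: str) -> str:
    """Dispatch-time gate: keep biology/medical prompts off Fable."""
    if tag == "claude_fable" and BIOMED_RE.search(prompt):
        return "claude_sonnet"  # hypothetical fallback tier
    return tag
```

When pre_classifier_guards returns a forced tag, the classifier call can be skipped entirely, which saves its cost and latency on exactly the traffic where the answer is already fixed.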

The Decision Matrix

Every query, mapped to
its optimal model.

The classifier weighs task typology, context volume, and latency sensitivity; the dispatch node then routes on its tag deterministically. Here's the policy.

| Query category | Triggers | Routed model | Cost per 1M tokens (input · output) | Why |
|---|---|---|---|---|
| Simple / Factual / Search | Short queries, current events, greetings, summaries | claude-haiku-4-5 | $0.80 · $4.00 | Speed + cost; fastest Claude tier |
| Analytical / Code | Code gen & refactor, technical writing, doc review | claude-sonnet-4-6 | $3.30 · $16.50 | Strong coding + reasoning balance |
| Advanced Reasoning / Math | Multi-step proofs, complex design, deep logic | claude-opus-4-8 | $5.50 · $27.50 | Maximum step-wise reasoning depth |
| Classifier (routing brain) | Every inbound query | gemini-3.1-flash-lite | $0.25 · $1.50 | Cheapest per-query classification |
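To see how the prices above turn into a blended saving, here is a small worked calculation. The traffic mix (60% Haiku, 30% Sonnet, 10% Opus), the 500-in / 300-out token sizes, and the 50-token classifier output are hypothetical inputs chosen for illustration, not measured RouteMind data; system-prompt overhead is ignored.

```python
# USD per 1M tokens (input, output), taken from the decision matrix
PRICES = {
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-6": (3.30, 16.50),
    "claude-opus-4-8": (5.50, 27.50),
    "gemini-3.1-flash-lite": (0.25, 1.50),
}


def query_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def blended_reduction(mix, tokens_in, tokens_out, classifier_out=50):
    """Fractional saving of routed traffic vs. sending everything to Sonnet.

    mix maps model -> share of traffic (shares sum to 1). Every query also
    pays for one classifier call that reads the same prompt.
    """
    baseline = query_cost("claude-sonnet-4-6", tokens_in, tokens_out)
    routed = sum(share * query_cost(m, tokens_in, tokens_out) for m, share in mix.items())
    routed += query_cost("gemini-3.1-flash-lite", tokens_in, classifier_out)
    return 1 - routed / baseline


# Hypothetical traffic mix and query size, not measured data
mix = {"claude-haiku-4-5": 0.6, "claude-sonnet-4-6": 0.3, "claude-opus-4-8": 0.1}
print(f"{blended_reduction(mix, tokens_in=500, tokens_out=300):.1%}")  # 35.8%
```

Under these assumptions the classifier adds about 3% of the Sonnet baseline per query, so the saving depends mostly on how much traffic actually lands on Haiku.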
How It Works

From query to answer in four steps.

A deterministic pipeline (classify, dispatch, execute, stream) with the system in control of every routing decision.

Step 01
Query In

Gemini Enterprise forwards a StreamRunRequest with the full session path and state.

Step 02
Classify

Gemini 3.1 Flash-Lite emits a structured RouteResultSchema decision.

Step 03
Dispatch

A zero-LLM function node parses the tag; the graph jumps to the right specialist.

Step 04
Stream Back

The specialist response streams via SSE to Gemini Enterprise with routing metadata.
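As a sketch of Step 04, the generator below wraps specialist text chunks as Server-Sent Events frames and ends with one frame carrying the routing metadata. The event name routing_metadata and the JSON payload shape are hypothetical; the real Gemini Enterprise wire format may differ. Only the SSE framing itself (an optional "event:" line, a "data:" line, a blank-line terminator) follows the standard.

```python
import json
from typing import Optional


def sse_frame(data: dict, event: Optional[str] = None) -> str:
    """Serialize one SSE frame: optional 'event:' line, one 'data:' line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines, so the payload always fits on one data: line
    lines.append("data: " + json.dumps(data, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"


def stream_with_metadata(chunks, selected_model, routing_reasoning, latency_ms):
    """Yield text chunks as SSE frames, then a final frame with routing metadata.

    Event names and payload keys here are illustrative assumptions.
    """
    for chunk in chunks:
        yield sse_frame({"text": chunk})
    yield sse_frame(
        {
            "selected_model": selected_model,
            "routing_reasoning": routing_reasoning,
            "routing_latency_ms": latency_ms,
        },
        event="routing_metadata",
    )
```

Sending metadata as a trailing, separately named event lets clients that only render text ignore it while logging pipelines pick it up.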

Targets & KPIs

Measured, not promised.

Every goal is tied to a measurable target and validated against a golden evaluation dataset.

≥30%
Blended cost reduction vs. all-Sonnet
≥85%
Routing accuracy (exact-match)
≤3s
P95 latency, simple queries
100%
ZDR-marked prompts kept off Fable
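Two of these KPIs can be scored directly from a golden dataset. The sketch below assumes each golden case carries the prompt, a human-labeled expected tier, and a ZDR flag; GoldenCase and evaluate are hypothetical names, and route stands in for the full guards + classifier + dispatch pipeline.

```python
from dataclasses import dataclass


@dataclass
class GoldenCase:
    prompt: str
    expected_model: str  # tier a human labeled as correct
    zdr_marked: bool = False


def evaluate(cases, route):
    """Score a router against a golden dataset.

    route(prompt, zdr_marked) -> selected tag. Returns (exact-match accuracy,
    fraction of ZDR-marked prompts kept off Fable).
    """
    hits = 0
    zdr_total = zdr_safe = 0
    for case in cases:
        chosen = route(case.prompt, case.zdr_marked)
        hits += chosen == case.expected_model
        if case.zdr_marked:
            zdr_total += 1
            zdr_safe += chosen != "claude_fable"
    accuracy = hits / len(cases) if cases else 0.0
    zdr_rate = zdr_safe / zdr_total if zdr_total else 1.0
    return accuracy, zdr_rate
```

A release gate would then check accuracy >= 0.85 and zdr_rate == 1.0; unlike accuracy, the ZDR target tolerates no misses.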
Reference Architecture

Production-grade on
ADK 2.0 + Agent Platform.

Deployed to a single Agent Platform instance and exposed in Gemini Enterprise. No monkeypatches: session paths are handled at the entrypoint and state is propagated natively. Deterministic guards enforce compliance and context limits before any model is invoked.

reference-architecture.txt
ADK 2.0
User (Gemini Enterprise chat)
   │  bi-dir streaming / SSE
   ▼
Gemini Enterprise (Discovery Engine gateway)
   │  StreamRunRequest { newMessage, sessionId(full path), stateDelta }
   ▼
Agent Platform  [us-central1]
   ├─ ENTRYPOINT: extract /sessions/ segment  (no monkeypatch)
   ├─ ADK 2.0 Runner  (run_async, native state_delta)
   │
   ├─ WORKFLOW GRAPH
   │     START → [context_guard]    >200K tokens: Sonnet, never Haiku
   │           → [compliance_guard] force Opus if ZDR/PII
   │           → [Classifier: gemini-3.1-flash-lite] → RouteResultSchema
   │           → [Dispatch Node] Fable gating + edge jump
   │
   └─ Specialist leaf nodes (Model Garden MaaS, global):
         ├─ claude-haiku-4-5
         ├─ claude-sonnet-4-6
         ├─ claude-opus-4-8
         └─ claude-fable-5  (bounded, gated)
   ▼
Specialist response → SSE → Gemini Enterprise → User
   + metadata: { selected_model, routing_reasoning, guard_flags, routing_latency_ms }
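The entrypoint's "extract /sessions/ segment" step amounts to simple string handling, sketched below. It assumes the full path has the shape projects/…/locations/…/reasoningEngines/…/sessions/{id}; extract_session_id is a hypothetical helper name, and passing bare IDs through unchanged is an illustrative choice.

```python
def extract_session_id(session_path: str) -> str:
    """Return the segment after the last '/sessions/' in a full session path.

    A bare ID (no '/sessions/' segment) is returned unchanged.
    """
    marker = "/sessions/"
    if marker not in session_path:
        return session_path
    tail = session_path.rsplit(marker, 1)[1]
    # Keep only the ID itself, dropping any trailing path segments
    session_id = tail.split("/", 1)[0]
    if not session_id:
        raise ValueError(f"empty session id in {session_path!r}")
    return session_id
```

Doing this once at the entrypoint means the ADK Runner only ever sees a plain session ID, which is what removes the need to monkeypatch session handling.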
Interactive Examples

Try it — see the router
think in real time.

Each prompt below is tagged with the tier the router should select. Copy any of them to test routing against your live deployment.

Haiku
"What is the capital of France?"
Haiku
"Hey! What's the weather like in Tokyo right now?"
Haiku
"Translate 'thank you' into Japanese, Spanish, and Arabic."
Sonnet
"Write a Python function that parses a CSV and returns a sorted list of unique email domains."
Sonnet
"Review this Python class and suggest refactors for testability and SOLID compliance."
Sonnet
"Explain the difference between BFS and DFS, and when to prefer each in a graph problem."
Opus
"Prove by induction that the sum of the first n odd numbers equals n²."
Opus
"Design a globally consistent, low-latency distributed key-value store with conflict resolution. Walk through the CAP theorem tradeoffs."
Opus
"Five people each own a different pet, speak a different language, and drink a different beverage. Given these 12 clues, determine who owns the fish."
Fable
"Write an opening chapter for a science-fiction novel set on a generation ship where AI has replaced all governance. Tone: literary, unsettling."
Fable
"Write a two-page screenplay scene: a morally grey detective interrogates a suspect they secretly believe is innocent."
Fable
"Create a fully realised mythology for a world where mathematics is sentient and numbers have political factions. Include creation myth, cosmology, and key conflicts."

Ship the right model
for every query.

Clone the repo, set your project, and deploy a multi-model router to Agent Platform in minutes. Fully open, fully testable.