RouteMind — Intelligent Claude Model Routing on ADK 2.0

The Challenge

One model for everything is the wrong default.

Sending a "capital of France" query to a premium reasoning model burns money. Sending a multi-step proof to a fast model burns quality. A single fixed model can't win on both axes.

Single-Model Default

Pay Premium, Everywhere

The legacy sonnet_qa_adk agent dispatched every query to one Claude model — overpaying on trivial traffic while under-serving the hardest questions.

Simple Q&A pays premium token rates
Complex reasoning never escalated to a stronger tier
No cost/latency/quality optimization per query
Zero transparency into model selection

Intelligent Routing

The Right Model, Every Time

A cheap Flash-Lite classifier reads each query and a deterministic dispatch node performs the edge jump — the system, not a second LLM, decides.

High-volume traffic → fast, cheap Claude Haiku
Hard reasoning reserved for Opus; frontier creative work for Fable
Guards block ZDR/PII from non-compliant tiers
Every response carries its routing reasoning

Core Capabilities

Built for cost, latency, and quality at once.

Deterministic Graph Routing

A Gemini 3.1 Flash-Lite classifier emits a structured tag; a zero-LLM Python function node parses it and the ADK 2.0 WorkflowAgent performs the edge jump. Routing accuracy and precision become fully testable.

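    # RouteResultSchema: guaranteed structured output from the classifier
    from typing import Literal

    from pydantic import BaseModel


    class RouteResultSchema(BaseModel):
        # Exactly one of the four Claude specialist tiers
        selected_model: Literal[
            "claude_haiku", "claude_sonnet",
            "claude_opus", "claude_fable",
        ]
        # Short justification, surfaced in the response metadata
        routing_reasoning: str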
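The zero-LLM dispatch step can be sketched as a plain function. This is a minimal sketch, not RouteMind's actual node: it assumes the classifier output arrives as a JSON string shaped like RouteResultSchema, uses only the standard library so it runs without Pydantic, and the names `dispatch`, `MODEL_IDS`, and the Sonnet fallback are hypothetical.

```python
import json
from typing import Literal, get_args

# Tags mirror RouteResultSchema.selected_model.
RouteTag = Literal["claude_haiku", "claude_sonnet", "claude_opus", "claude_fable"]

# Hypothetical tag -> Model Garden model ID table, using the names on this page.
MODEL_IDS: dict[str, str] = {
    "claude_haiku": "claude-haiku-4-5",
    "claude_sonnet": "claude-sonnet-4-6",
    "claude_opus": "claude-opus-4-8",
    "claude_fable": "claude-fable-5",
}

FALLBACK_TAG = "claude_sonnet"  # assumption: safe default for unusable output


def dispatch(raw_classifier_output: str) -> tuple[str, str]:
    """Parse the classifier's JSON and pick a specialist, with no LLM call.

    Returns (model_id, routing_reasoning). Malformed JSON, a missing field,
    or an unknown tag falls back to FALLBACK_TAG instead of re-asking a model.
    """
    try:
        payload = json.loads(raw_classifier_output)
        tag = payload["selected_model"]
        reasoning = str(payload.get("routing_reasoning", ""))
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return MODEL_IDS[FALLBACK_TAG], "fallback: unparseable classifier output"
    if tag not in get_args(RouteTag):
        return MODEL_IDS[FALLBACK_TAG], f"fallback: unknown tag {tag!r}"
    return MODEL_IDS[tag], reasoning
```

Because the function is pure, every tag and every failure path can be unit-tested without calling a model.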

The Full Claude Spectrum

Four tiers on one Agent Platform — Haiku 4.5, Sonnet 4.6, Opus 4.8, Fable 5 — all via the global Model Garden endpoint, with a cheap Gemini Flash-Lite brain choosing between them.

Full Routing Transparency

Every response carries selected_model, routing_reasoning, and routing_latency_ms — surfaced in metadata and logged for evaluation.
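Producing that metadata amounts to timing the classifier call and emitting one structured record. A minimal sketch, assuming `classify` is any callable returning `(selected_model, routing_reasoning)`; the function name `route_with_metadata` and the logger name are hypothetical.

```python
import json
import logging
import time
from typing import Callable

logger = logging.getLogger("routemind.routing")  # hypothetical logger name


def route_with_metadata(query: str, classify: Callable[[str], tuple[str, str]]) -> dict:
    """Run the classifier, time it, log and return the routing record.

    `classify` stands in for the Flash-Lite classifier call.
    """
    start = time.perf_counter()
    selected_model, routing_reasoning = classify(query)
    routing_latency_ms = (time.perf_counter() - start) * 1000.0
    record = {
        "selected_model": selected_model,
        "routing_reasoning": routing_reasoning,
        "routing_latency_ms": round(routing_latency_ms, 2),
    }
    # One JSON line per decision keeps logs easy to join with eval data.
    logger.info(json.dumps(record))
    return record
```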

Loop-Safe Escalation

On a downstream 4xx/5xx error or timeout, the cascade escalates Haiku → Sonnet → Opus, bounded to one hop per turn so it never loops.
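The one-hop bound is easy to enforce when the escalated call sits outside any retry loop. A minimal sketch, assuming model calls go through an injected `call_model(model_id, prompt)` callable; `DownstreamError`, `ESCALATE_TO`, and `call_with_escalation` are hypothetical names, not ADK APIs.

```python
from typing import Callable

# Escalation ladder from this page; Opus has no entry, so it is the top tier.
ESCALATE_TO: dict[str, str] = {
    "claude-haiku-4-5": "claude-sonnet-4-6",
    "claude-sonnet-4-6": "claude-opus-4-8",
}


class DownstreamError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"downstream returned HTTP {status}")
        self.status = status


def call_with_escalation(
    model_id: str, prompt: str, call_model: Callable[[str, str], str]
) -> tuple[str, str]:
    """Call model_id; on a 4xx/5xx or timeout, retry once on the next tier.

    Bounded to one hop: if the escalated call also fails, its error
    propagates instead of cascading again.
    """
    try:
        return model_id, call_model(model_id, prompt)
    except (DownstreamError, TimeoutError) as err:
        if isinstance(err, DownstreamError) and not 400 <= err.status < 600:
            raise
        next_model = ESCALATE_TO.get(model_id)
        if next_model is None:
            raise  # already at the top tier
        return next_model, call_model(next_model, prompt)
```

Keeping the second call out of a loop, rather than counting retries, makes "never loops" a structural property that a reader can check by inspection.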

Deterministic Guards

Zero-LLM rules bracket the classifier. Before it runs, they route prompts that exceed Haiku's context window to Sonnet and force PII/HIPAA/GDPR traffic to a ZDR-safe tier; at dispatch, they keep the refusal-prone Fable tier away from biology and medical queries. Every guard decision is logged and testable.
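The two pre-classifier guards reduce to ordered, deterministic checks. A minimal sketch, assuming a crude 4-characters-per-token estimate and toy regexes for PII (SSN-like numbers, email addresses); the thresholds come from the architecture diagram, while `run_guards` and the patterns are hypothetical and far weaker than a real compliance filter.

```python
import re
from typing import Optional

HAIKU_CONTEXT_LIMIT_TOKENS = 200_000  # threshold from the architecture diagram
CHARS_PER_TOKEN = 4  # assumption: rough estimate; use a real tokenizer in practice

# Assumption: toy PII detectors for illustration only.
PII_PATTERNS = (
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # SSN-like numbers
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),      # email addresses
)


def run_guards(prompt: str, zdr_required: bool = False) -> tuple[Optional[str], list[str]]:
    """Deterministic guards that run before the classifier.

    Returns (forced_model, guard_flags); forced_model is None when the
    classifier is free to choose.
    """
    forced: Optional[str] = None
    flags: list[str] = []
    # context_guard: move oversized prompts off Haiku to Sonnet.
    if len(prompt) / CHARS_PER_TOKEN > HAIKU_CONTEXT_LIMIT_TOKENS:
        forced = "claude-sonnet-4-6"
        flags.append("context_guard")
    # compliance_guard runs second, so ZDR/PII forcing to Opus wins.
    if zdr_required or any(p.search(prompt) for p in PII_PATTERNS):
        forced = "claude-opus-4-8"
        flags.append("compliance_guard")
    return forced, flags
```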

The Decision Matrix

Every query, mapped to its optimal model.

The classifier weighs task type, context volume, and latency sensitivity and emits a routing tag; the dispatch node then routes deterministically. Here's the policy:

Query Category	Triggers	Routed Model	Cost per 1M tokens (input · output)	Why
Simple / Factual / Search	Short queries, current events, greetings, summaries	claude-haiku-4-5	$0.80 · $4.00	Speed + cost; fastest Claude tier
Analytical / Code	Code gen & refactor, technical writing, doc review	claude-sonnet-4-6	$3.30 · $16.50	Strong coding + reasoning balance
Advanced Reasoning / Math	Multi-step proofs, complex design, deep logic	claude-opus-4-8	$5.50 · $27.50	Maximum step-wise reasoning depth
Classifier (routing brain)	Every inbound query	gemini-3.1-flash-lite	$0.25 · $1.50	Cheapest per-query classification
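The prices in the matrix are enough to estimate blended savings against an all-Sonnet baseline. A minimal sketch under stated assumptions: every tier sees the same input and output token counts, the classifier reads the full prompt and emits about 50 tokens, and the traffic mix is hypothetical. Fable is left out because the matrix lists no price for it.

```python
# USD per 1M tokens as (input, output), copied from the decision matrix.
PRICES: dict[str, tuple[float, float]] = {
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-6": (3.30, 16.50),
    "claude-opus-4-8": (5.50, 27.50),
    "gemini-3.1-flash-lite": (0.25, 1.50),
}
CLASSIFIER = "gemini-3.1-flash-lite"
BASELINE = "claude-sonnet-4-6"


def query_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Cost in USD of one call with the given token counts."""
    price_in, price_out = PRICES[model]
    return (in_tokens * price_in + out_tokens * price_out) / 1_000_000


def blended_savings(mix: dict[str, float], in_tokens: int, out_tokens: int,
                    classifier_out_tokens: int = 50) -> float:
    """Fractional saving of routed traffic vs. an all-Sonnet baseline.

    `mix` maps model -> share of traffic (shares must sum to 1). The
    classifier reads every query, so its cost is always added.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    routed = sum(share * query_cost(m, in_tokens, out_tokens) for m, share in mix.items())
    routed += query_cost(CLASSIFIER, in_tokens, classifier_out_tokens)
    baseline = query_cost(BASELINE, in_tokens, out_tokens)
    return 1.0 - routed / baseline
```

With a hypothetical 70/25/5 Haiku/Sonnet/Opus split at 1,000 input and 500 output tokens per query, this comes to roughly a 47% saving. Routing everything to Sonnet comes out slightly negative, since the classifier call is then pure overhead.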

How It Works

From query to answer in five steps.

A deterministic loop — classify, dispatch, execute, stream — with the system in control of every routing decision.

Step 01

Query In

Gemini Enterprise forwards a StreamRunRequest with the full session path and state.

Step 02

Classify

Gemini 3.1 Flash-Lite emits a structured RouteResultSchema decision.

Step 03

Dispatch

A zero-LLM function node parses the tag; the graph jumps to the right specialist.

Step 04

Execute

The selected Claude specialist generates the answer on Model Garden.

Step 05

Stream Back

The specialist response streams via SSE to Gemini Enterprise with routing metadata.
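Attaching routing metadata to the stream only requires following the Server-Sent Events wire format: optional `event:` line, `data:` line, blank line to end the event. A minimal sketch; the function name `format_sse` and the event name `route_metadata` are hypothetical, not part of the Gemini Enterprise contract.

```python
import json
from typing import Optional


def format_sse(payload: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events message.

    json.dumps without indent emits a single line (embedded newlines are
    escaped), so a single `data:` field is enough. The trailing blank line
    terminates the event.
    """
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append("data: " + json.dumps(payload, separators=(",", ":")))
    return "\n".join(lines) + "\n\n"
```

For example, `format_sse({"selected_model": "claude_haiku"}, "route_metadata")` yields an event the client can distinguish from ordinary text chunks by its event name.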

Targets & KPIs

Measured, not promised.

Every goal is tied to a measurable target and validated against a golden evaluation dataset.

≥30%

Blended cost reduction vs. all-Sonnet

≥85%

Routing accuracy (exact-match)

≤3s

P95 latency, simple queries

100%

ZDR-marked prompts kept off Fable
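The accuracy target, and the per-tier precision mentioned under Deterministic Graph Routing, can be scored with a few lines against the golden dataset. A minimal sketch, assuming golden and predicted tags are aligned lists; `routing_metrics` and `ACCURACY_TARGET` are hypothetical names.

```python
from collections import Counter

ACCURACY_TARGET = 0.85  # the ≥85% exact-match KPI


def routing_metrics(golden: list[str], predicted: list[str]) -> tuple[float, dict[str, float]]:
    """Exact-match routing accuracy plus per-tier precision.

    Precision for a tier = correct predictions of that tier divided by
    all predictions of that tier.
    """
    if len(golden) != len(predicted) or not golden:
        raise ValueError("golden and predicted must be non-empty and equal length")
    pairs = list(zip(golden, predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    predicted_counts = Counter(predicted)
    correct_counts = Counter(p for g, p in pairs if g == p)
    precision = {tier: correct_counts[tier] / n for tier, n in predicted_counts.items()}
    return accuracy, precision
```

Per-tier precision matters here because a low precision on the Opus tier means paying premium rates for queries that did not need it, even when overall accuracy looks healthy.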

Reference Architecture

Production-grade on ADK 2.0 + Agent Platform.

Deployed to a single Agent Platform, exposed in Gemini Enterprise. No monkeypatches — session paths handled at the entrypoint, state propagated natively. Deterministic guards enforce compliance and context limits before any model is invoked.
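Handling the full session path at the entrypoint needs only string handling, with no changes to the runner. A minimal sketch, assuming the gateway sends a resource path whose last `sessions/` segment holds the ID; the example path layout and the name `extract_session_id` are illustrative, not taken from the ADK or Agent Platform APIs.

```python
def extract_session_id(session_path: str) -> str:
    """Return the ID that follows the last 'sessions/' segment of a path.

    A bare ID (no 'sessions/' segment) is returned unchanged, so the
    entrypoint accepts both forms.
    """
    marker = "/sessions/"
    padded = "/" + session_path.strip("/")
    idx = padded.rfind(marker)
    if idx == -1:
        return session_path.strip("/")
    session_id = padded[idx + len(marker):].split("/", 1)[0]
    if not session_id:
        raise ValueError(f"empty session id in {session_path!r}")
    return session_id
```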

User (Gemini Enterprise chat)
   │  bi-dir streaming / SSE
   ▼
Gemini Enterprise (Discovery Engine gateway)
   │  StreamRunRequest { newMessage, sessionId(full path), stateDelta }
   ▼
Agent Platform  [us-central1]
   ├─ ENTRYPOINT: extract /sessions/ segment  (no monkeypatch)
   ├─ ADK 2.0 Runner  (run_async, native state_delta)
   │
   ├─ WORKFLOW GRAPH
   │     START → [context_guard]    route to Sonnet (off Haiku) if >200K tokens
   │           → [compliance_guard] force Opus if ZDR/PII
   │           → [Classifier: gemini-3.1-flash-lite] → RouteResultSchema
   │           → [Dispatch Node] Fable gating + edge jump
   │
   └─ Specialist leaf nodes (Model Garden MaaS, global):
         ├─ claude-haiku-4-5
         ├─ claude-sonnet-4-6
         ├─ claude-opus-4-8
         └─ claude-fable-5  (bounded, gated)
   ▼
Specialist response → SSE → Gemini Enterprise → User
   + metadata: { selected_model, routing_reasoning, guard_flags, latency_ms }
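The diagram places Fable gating inside the dispatch node, after the classifier has chosen. A minimal sketch of that check, assuming a keyword regex for biology/medical topics and demotion to Opus; the topic list, the fallback tier, `gate_fable`, and the flag names are hypothetical, and a keyword list is a crude stand-in for whatever policy RouteMind actually applies.

```python
import re

# Hypothetical topic filter; a production policy would need far more care.
FABLE_BLOCKED_TOPICS = re.compile(
    r"\b(biolog\w*|medic\w*|clinical|diagnos\w*|patients?)\b", re.IGNORECASE
)
FABLE_FALLBACK = "claude_opus"  # assumption: demote to the strongest non-Fable tier


def gate_fable(tag: str, prompt: str, zdr_marked: bool) -> tuple[str, list[str]]:
    """Keep ZDR-marked and biology/medical prompts off the Fable tier.

    Returns (final_tag, guard_flags); non-Fable tags pass through untouched.
    """
    if tag != "claude_fable":
        return tag, []
    flags: list[str] = []
    if zdr_marked:
        flags.append("fable_blocked_zdr")
    if FABLE_BLOCKED_TOPICS.search(prompt):
        flags.append("fable_blocked_topic")
    return (FABLE_FALLBACK if flags else tag), flags
```

The `zdr_marked` branch is the kind of check the "ZDR-marked prompts kept off Fable" KPI would exercise, and the returned flags can feed the `guard_flags` field in the response metadata.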

Interactive Examples

Try it: see where the router sends each prompt.

Each prompt below is labeled with the tier the classifier is expected to choose. Copy the prompts to test against your live deployment.

Haiku

"What is the capital of France?"

Haiku

"Hey! What's the weather like in Tokyo right now?"

Haiku

"Translate 'thank you' into Japanese, Spanish, and Arabic."

Sonnet

"Write a Python function that parses a CSV and returns a sorted list of unique email domains."

Sonnet

"Review this Python class and suggest refactors for testability and SOLID compliance."

Sonnet

"Explain the difference between BFS and DFS, and when to prefer each in a graph problem."

Opus

"Prove by induction that the sum of the first n odd numbers equals n²."

Opus

"Design a globally consistent, low-latency distributed key-value store with conflict resolution. Walk through the CAP theorem tradeoffs."

Opus

"Five people each own a different pet, speak a different language, and drink a different beverage. Given these 12 clues, determine who owns the fish."

Fable

"Write an opening chapter for a science-fiction novel set on a generation ship where AI has replaced all governance. Tone: literary, unsettling."

Fable

"Write a two-page screenplay scene: a morally grey detective interrogates a suspect they secretly believe is innocent."

Fable

"Create a fully realised mythology for a world where mathematics is sentient and numbers have political factions. Include creation myth, cosmology, and key conflicts."
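The labeled prompts above can also seed the golden evaluation dataset. A minimal sketch, assuming rows are stored as `(prompt, expected_tag)` pairs; `GOLDEN_EXAMPLES` and `validate_golden` are hypothetical names, and the tags are the four values allowed by RouteResultSchema.

```python
from collections import Counter

# The four tags defined by RouteResultSchema.selected_model.
VALID_TAGS = {"claude_haiku", "claude_sonnet", "claude_opus", "claude_fable"}

# (prompt, expected tag) rows copied from the example prompts above.
GOLDEN_EXAMPLES: list[tuple[str, str]] = [
    ("What is the capital of France?", "claude_haiku"),
    ("Hey! What's the weather like in Tokyo right now?", "claude_haiku"),
    ("Translate 'thank you' into Japanese, Spanish, and Arabic.", "claude_haiku"),
    ("Write a Python function that parses a CSV and returns a sorted list "
     "of unique email domains.", "claude_sonnet"),
    ("Review this Python class and suggest refactors for testability and "
     "SOLID compliance.", "claude_sonnet"),
    ("Explain the difference between BFS and DFS, and when to prefer each "
     "in a graph problem.", "claude_sonnet"),
    ("Prove by induction that the sum of the first n odd numbers equals n².",
     "claude_opus"),
    ("Design a globally consistent, low-latency distributed key-value store "
     "with conflict resolution. Walk through the CAP theorem tradeoffs.",
     "claude_opus"),
    ("Five people each own a different pet, speak a different language, and "
     "drink a different beverage. Given these 12 clues, determine who owns "
     "the fish.", "claude_opus"),
    ("Write an opening chapter for a science-fiction novel set on a "
     "generation ship where AI has replaced all governance. Tone: literary, "
     "unsettling.", "claude_fable"),
    ("Write a two-page screenplay scene: a morally grey detective "
     "interrogates a suspect they secretly believe is innocent.",
     "claude_fable"),
    ("Create a fully realised mythology for a world where mathematics is "
     "sentient and numbers have political factions. Include creation myth, "
     "cosmology, and key conflicts.", "claude_fable"),
]


def validate_golden(rows: list[tuple[str, str]]) -> Counter:
    """Reject empty prompts or unknown tags; return the row count per tier."""
    for prompt, tag in rows:
        if not prompt.strip():
            raise ValueError("golden row has an empty prompt")
        if tag not in VALID_TAGS:
            raise ValueError(f"unknown tag {tag!r} for prompt {prompt[:40]!r}")
    return Counter(tag for _, tag in rows)
```

Running each row through the classifier and comparing tags yields the exact-match accuracy the KPI section targets; a balanced per-tier count keeps one tier from dominating the score.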

One surface. Every Claude, routed right.

Ship the right model for every query.
