RouteMind classifies every query with Gemini 3.1 Flash-Lite and deterministically dispatches it across the Claude spectrum: Haiku, Sonnet, Opus, or Fable. Deterministic guards enforce compliance and context limits before any model runs. The target is a 30%+ cut in blended cost with no measurable quality loss, all inside Gemini Enterprise.
Sending a "capital of France" query to a premium reasoning model burns money. Sending a multi-step proof to a fast model burns quality. A single fixed model can't win on both axes.
The legacy sonnet_qa_adk agent dispatched every query to one Claude model — overpaying on trivial traffic while under-serving the hardest questions.
A cheap Flash-Lite classifier reads each query and a deterministic dispatch node performs the edge jump — the system, not a second LLM, decides.
A Gemini 3.1 Flash-Lite classifier emits a structured tag; a zero-LLM Python function node parses it, and the ADK 2.0 WorkflowAgent performs the edge jump. Because the dispatch step is plain code, routing accuracy and precision can be measured with ordinary tests against labeled queries.
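The parse-and-dispatch step can be sketched in plain Python. This is a minimal illustration, not the project's code and not the ADK API: the JSON shape, the `route` field, the tag names, and the Sonnet default are all assumptions; only the model IDs come from the routing table.

```python
import json

# Hypothetical tag -> model mapping. The tag names are assumptions;
# the model IDs are the ones listed in the routing policy.
ROUTES = {
    "simple": "claude-haiku-4-5",
    "analytical": "claude-sonnet-4-6",
    "advanced": "claude-opus-4-8",
}
DEFAULT_MODEL = "claude-sonnet-4-6"  # assumed fallback for unusable classifier output


def dispatch(classifier_output: str) -> str:
    """Map the classifier's JSON output to a model ID without calling an LLM."""
    try:
        payload = json.loads(classifier_output)
    except json.JSONDecodeError:
        return DEFAULT_MODEL
    # Anything that is not a JSON object with a string "route" falls back.
    tag = payload.get("route") if isinstance(payload, dict) else None
    if not isinstance(tag, str):
        return DEFAULT_MODEL
    return ROUTES.get(tag, DEFAULT_MODEL)
```

Because `dispatch` is a pure function of the classifier output, the same tag always yields the same model, and malformed output degrades to a known default instead of raising.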
Four tiers on one Agent Platform — Haiku 4.5, Sonnet 4.6, Opus 4.8, Fable 5 — all via the global Model Garden endpoint, with a cheap Gemini Flash-Lite brain choosing between them.
Every response carries selected_model, routing_reasoning, and routing_latency_ms — surfaced in metadata and logged for evaluation.
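One way to attach these three fields is to time the routing call and wrap the result in a small record. The sketch below is illustrative only: the `RoutingMetadata` class and the `timed_route` helper are hypothetical names, and the classifier is passed in as a plain callable so the example stays self-contained.

```python
import time
from dataclasses import asdict, dataclass
from typing import Callable, Tuple


@dataclass
class RoutingMetadata:
    selected_model: str
    routing_reasoning: str
    routing_latency_ms: float


def timed_route(query: str, classify: Callable[[str], Tuple[str, str]]) -> RoutingMetadata:
    """Run a classifier callable and record how long the routing decision took."""
    start = time.perf_counter()
    model, reasoning = classify(query)  # classify returns (model_id, reasoning)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return RoutingMetadata(model, reasoning, round(elapsed_ms, 2))


# Usage with a stand-in classifier; asdict() gives a JSON-ready dict for logging.
meta = timed_route("hello", lambda q: ("claude-haiku-4-5", "greeting"))
record = asdict(meta)
```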
On a downstream 4xx/5xx or timeout, the cascade escalates Haiku → Sonnet → Opus — bounded to one hop per turn, so it never loops.
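The one-hop bound can be sketched as a retry that is deliberately not wrapped in a second `try`. This is a sketch under assumptions: `DownstreamError` and the `call_model` callable are hypothetical stand-ins for whatever the real model client raises and exposes.

```python
from typing import Callable, Tuple

# Escalation order from the text: Haiku -> Sonnet -> Opus.
ESCALATION = {
    "claude-haiku-4-5": "claude-sonnet-4-6",
    "claude-sonnet-4-6": "claude-opus-4-8",
}


class DownstreamError(Exception):
    """Hypothetical stand-in for a 4xx/5xx response from the model endpoint."""


def call_with_escalation(
    model: str, prompt: str, call_model: Callable[[str, str], str]
) -> Tuple[str, str]:
    """Call the model; on failure, escalate exactly one tier and try once more."""
    try:
        return model, call_model(model, prompt)
    except (DownstreamError, TimeoutError):
        next_model = ESCALATION.get(model)
        if next_model is None:  # already at the top tier, nothing to escalate to
            raise
        # Single retry: a second failure propagates, so the cascade cannot loop.
        return next_model, call_model(next_model, prompt)
```

The function returns the model that actually answered alongside the response, so the escalation stays visible in the routing metadata.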
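Zero-LLM rules bracket the classifier. Before it runs, a context guard routes prompts over the 200K threshold off Haiku's window to Sonnet, and a compliance guard forces PII/HIPAA/GDPR traffic to a ZDR-safe (zero data retention) tier, Opus. At dispatch, refusal-prone Fable is gated away from biology and medical queries. Every guard decision is logged and testable.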
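A minimal sketch of such rules follows. The regular expressions are toy patterns chosen for illustration, not a real PII or topic detector; the function names, the caller-supplied token count, and the Opus fallback for gated Fable traffic are assumptions. The 200K threshold and the Sonnet/Opus targets follow the architecture diagram.

```python
import re
from typing import List, Optional, Tuple

HAIKU_CONTEXT_LIMIT = 200_000  # threshold shown as ">200K" in the architecture diagram

# Toy patterns for illustration only; real compliance detection needs far more.
COMPLIANCE_PATTERN = re.compile(
    r"\b\d{3}-\d{2}-\d{4}\b|\b(HIPAA|GDPR|patient record)\b", re.IGNORECASE
)
FABLE_BLOCKED_TOPICS = re.compile(r"\b(biology|medical)\b", re.IGNORECASE)


def run_guards(prompt: str, token_count: int) -> Tuple[Optional[str], List[str]]:
    """Return (forced_model or None, guard_flags) without invoking any model."""
    flags: List[str] = []
    forced: Optional[str] = None
    if token_count > HAIKU_CONTEXT_LIMIT:
        flags.append("context_guard")
        forced = "claude-sonnet-4-6"
    if COMPLIANCE_PATTERN.search(prompt):
        flags.append("compliance_guard")
        forced = "claude-opus-4-8"  # compliance runs second, so it wins over context
    return forced, flags


def gate_fable(model: str, prompt: str) -> str:
    """Keep Fable away from biology/medical prompts (fallback tier is assumed)."""
    if model == "claude-fable-5" and FABLE_BLOCKED_TOPICS.search(prompt):
        return "claude-opus-4-8"
    return model
```

Returning the flags alongside the forced model lets every guard decision be logged and asserted on in tests.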
The classifier evaluates task typology, context volume, and latency sensitivity — then routes deterministically. Here's the policy.
| Query Category | Triggers | Routed Model | Cost / 1M (in·out) | Why |
|---|---|---|---|---|
| Simple / Factual / Search | Short queries, current events, greetings, summaries | claude-haiku-4-5 | $0.80 · $4.00 | Speed + cost; fastest Claude tier |
| Analytical / Code | Code gen & refactor, technical writing, doc review | claude-sonnet-4-6 | $3.30 · $16.50 | Strong coding + reasoning balance |
| Advanced Reasoning / Math | Multi-step proofs, complex design, deep logic | claude-opus-4-8 | $5.50 · $27.50 | Maximum step-wise reasoning depth |
| Classifier (routing brain) | Every inbound query | gemini-3.1-flash-lite | $0.25 · $1.50 | Cheapest per-query classification |
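To see how the table translates into blended cost, the arithmetic can be sketched as below. The prices are the ones in the table; the traffic mix (60% Haiku, 30% Sonnet, 10% Opus) and the token counts per query are invented for illustration and are not measurements.

```python
# Prices per 1M tokens (input, output), taken from the routing table.
PRICES = {
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-6": (3.30, 16.50),
    "claude-opus-4-8": (5.50, 27.50),
    "gemini-3.1-flash-lite": (0.25, 1.50),
}


def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one query on one model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def blended_cost(mix: dict, input_tokens: int, output_tokens: int,
                 classifier_output_tokens: int) -> float:
    """Expected cost per query: weighted specialist cost plus the classifier call."""
    routed = sum(share * query_cost(model, input_tokens, output_tokens)
                 for model, share in mix.items())
    classifier = query_cost("gemini-3.1-flash-lite", input_tokens,
                            classifier_output_tokens)
    return routed + classifier


# Hypothetical traffic mix and token sizes (assumptions, not measured data).
HYPOTHETICAL_MIX = {
    "claude-haiku-4-5": 0.6,
    "claude-sonnet-4-6": 0.3,
    "claude-opus-4-8": 0.1,
}
baseline = query_cost("claude-sonnet-4-6", 1000, 500)   # everything on Sonnet
routed = blended_cost(HYPOTHETICAL_MIX, 1000, 500, 20)  # routed traffic
savings = 1 - routed / baseline
print(f"baseline ${baseline:.5f}, routed ${routed:.5f}, savings {savings:.1%}")
```

Under this made-up mix the routed cost is about $0.00735 per query against $0.01155 for all-Sonnet, a saving of roughly 36%. The result is sensitive to the share of traffic that lands on Haiku, so real savings depend on the actual query distribution.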
A deterministic loop — classify, dispatch, execute, stream — with the system in control of every routing decision.
Gemini Enterprise forwards a StreamRunRequest with the full session path and state.
Gemini 3.1 Flash-Lite emits a structured RouteResultSchema decision.
A zero-LLM function node parses the tag; the graph jumps to the right specialist.
The specialist response streams via SSE to Gemini Enterprise with routing metadata.
Every goal is tied to a measurable target and validated against a golden evaluation dataset.
Deployed to a single Agent Platform, exposed in Gemini Enterprise. No monkeypatches — session paths handled at the entrypoint, state propagated natively. Deterministic guards enforce compliance and context limits before any model is invoked.
User (Gemini Enterprise chat)
   │ bi-dir streaming / SSE
   ▼
Gemini Enterprise (Discovery Engine gateway)
   │ StreamRunRequest { newMessage, sessionId(full path), stateDelta }
   ▼
Agent Platform [us-central1]
   ├─ ENTRYPOINT: extract /sessions/ segment (no monkeypatch)
   ├─ ADK 2.0 Runner (run_async, native state_delta)
   │
   ├─ WORKFLOW GRAPH
   │    START → [context_guard] off-Haiku to Sonnet if >200K
   │          → [compliance_guard] force Opus if ZDR/PII
   │          → [Classifier: gemini-3.1-flash-lite] → RouteResultSchema
   │          → [Dispatch Node] Fable gating + edge jump
   │
   └─ Specialist leaf nodes (Model Garden MaaS, global):
        ├─ claude-haiku-4-5
        ├─ claude-sonnet-4-6
        ├─ claude-opus-4-8
        └─ claude-fable-5 (bounded, gated)
   ▼
Specialist response → SSE → Gemini Enterprise → User
   + metadata: { selected_model, routing_reasoning, guard_flags, latency_ms }
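The entrypoint step, pulling the session ID out of the full session path, might look like the sketch below. The function name and the example path layout are assumptions; the only detail taken from the diagram is that the ID follows a `/sessions/` segment.

```python
def extract_session_id(session_path: str) -> str:
    """Return the segment after '/sessions/', or the input if it is already a bare ID."""
    _, found, tail = session_path.partition("/sessions/")
    if not found:
        return session_path  # no '/sessions/' segment: treat as a bare session ID
    # Drop anything after the ID, such as a trailing sub-resource.
    return tail.split("/", 1)[0]


# Hypothetical full path; the real resource layout may differ.
full_path = "projects/demo/locations/global/sessions/abc123"
session_id = extract_session_id(full_path)
```

Normalizing the ID once at the entrypoint keeps the rest of the graph working with bare session IDs, which is what allows the design to avoid patching the runner.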
Click any prompt below to watch the classifier evaluate and route it. Copy prompts to test against your live deployment.
Clone the repo, set your project, and deploy a multi-model router to Agent Platform in minutes. Fully open, fully testable.