We ran 574 unique task prompts across 11 categories and 40 subcategories. Every prompt was sent to both the adaptively routed specialist and the GPT-4o generalist baseline. No synthetic data, no cherry-picking — just transparent head-to-head results from real API calls.
Performance varies by task type. Routing is strongest on high-level reasoning tasks (analysis, planning, explanation), where it wins 95% to 100% of head-to-head comparisons and choosing the right specialist model matters most. In closer categories such as code generation (56% win rate), the case for routing rests on cost savings at comparable quality.
| Category | Tasks | Win Rate | Δ Quality |
|---|---|---|---|
| Analysis | 21 | 100% | +20.5 pts |
| Planning | 15 | 100% | +21.2 pts |
| Explanation | 21 | 95% | +15.4 pts |
| Summarization | 40 | 78% | +9.8 pts |
| Creative Writing | 43 | 69% | +9.0 pts |
| Math Reasoning | 105 | 66% | — |
| Data Extraction | 45 | 60% | +6.8 pts |
| Email Writing | 20 | 55% | +4.5 pts |
| Code Generation | 166 | 56% | — |
| Translation | 54 | 52% | +2.7 pts |
| Comparison | 14 | 67% | +11.1 pts |
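
As a rough illustration of how the two table columns could be derived from paired results, here is a minimal sketch. It assumes each task yields one quality score for the routed model and one for the baseline, that a "win" means the routed score is strictly higher, and that Δ Quality is the mean score difference in points. The `PairedResult` and `CategorySummary` shapes and the `summarize` function are hypothetical names, not part of the published benchmark tooling.

```typescript
// Hypothetical shape of one head-to-head result; field names are assumptions.
interface PairedResult {
  category: string;
  routedScore: number;   // quality score of the routed specialist
  baselineScore: number; // quality score of the GPT-4o baseline
}

interface CategorySummary {
  tasks: number;
  winRate: number;      // fraction of tasks where routed beat the baseline
  deltaQuality: number; // mean(routed - baseline), in points
}

// Aggregate per-category task count, win rate, and mean quality difference.
function summarize(results: PairedResult[]): Record<string, CategorySummary> {
  const totals: Record<string, { tasks: number; wins: number; deltaSum: number }> = {};
  results.forEach((r) => {
    const t = totals[r.category] || { tasks: 0, wins: 0, deltaSum: 0 };
    t.tasks += 1;
    if (r.routedScore > r.baselineScore) t.wins += 1; // ties count as non-wins here
    t.deltaSum += r.routedScore - r.baselineScore;
    totals[r.category] = t;
  });

  const summary: Record<string, CategorySummary> = {};
  Object.keys(totals).forEach((category) => {
    const t = totals[category];
    summary[category] = {
      tasks: t.tasks,
      winRate: t.wins / t.tasks,
      deltaQuality: t.deltaSum / t.tasks,
    };
  });
  return summary;
}
```

How ties are scored and how quality is measured are assumptions in this sketch; the benchmark itself may define them differently.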
Replace your existing model calls with a single Orob endpoint. The protocol classifies your prompt, selects the optimal model from the network, executes it, and returns the result — while the routing graph learns from every outcome.

    // Before: single generalist, fixed cost
    const res = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    });

    // After: intelligent routing across the network
    const res = await orob.route({ prompt });
    // Mistral Small for summaries · GPT-4.1 Nano for extraction
    // DeepSeek for reasoning · Codestral for code

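To make the classify, select, execute sequence concrete, here is a toy sketch of what a router of this kind might do internally. It is not Orob's implementation: the keyword classifier, the `ROUTES` table, and the `classify` and `routeSketch` functions are all hypothetical, and the model names are display labels copied from the comment above rather than real API identifiers. The model call is injected as a callback, so the sketch works with a stub or with a real async client.

```typescript
type TaskType = "summarization" | "extraction" | "reasoning" | "code";

// Assignments mirror the comment above; the strings are labels, not API model IDs.
const ROUTES: Record<TaskType, string> = {
  summarization: "Mistral Small",
  extraction: "GPT-4.1 Nano",
  reasoning: "DeepSeek",
  code: "Codestral",
};

// Toy keyword classifier standing in for a real prompt classifier (assumption).
function classify(prompt: string): TaskType {
  const p = prompt.toLowerCase();
  if (/\b(summarize|summary)\b/.test(p)) return "summarization";
  if (/\b(extract|parse)\b/.test(p)) return "extraction";
  if (/\b(function|code|implement|bug)\b/.test(p)) return "code";
  return "reasoning"; // fallback when no keyword matches
}

// Classify the prompt, select a model, then hand both to the caller's executor.
// If callModel is async, `result` is a Promise the caller can await.
function routeSketch<T>(
  prompt: string,
  callModel: (model: string, prompt: string) => T
): { task: TaskType; model: string; result: T } {
  const task = classify(prompt);
  const model = ROUTES[task];
  const result = callModel(model, prompt);
  return { task, model, result };
}
```

In this static version the routing table never changes. The learning step, where outcomes feed back into the assignments, is the part a bandit would replace.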
These specializations were not manually programmed. A Thompson sampling bandit explored the model space and converged on these assignments using real performance data. The routing graph continues to update from live traffic.
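For readers unfamiliar with Thompson sampling, the sketch below shows the textbook Beta-Bernoulli version for a single task type. It assumes each outcome is reduced to a binary success or failure and that every model starts from a uniform Beta(1, 1) prior. The production system may well use a different reward model, and all names here (`Arm`, `chooseArm`, `recordOutcome`) are hypothetical.

```typescript
interface Arm {
  model: string;
  successes: number; // outcomes judged good
  failures: number;  // outcomes judged bad
}

// Gamma(k, 1) for integer k >= 1, as a sum of k exponential draws.
function sampleGammaInt(k: number, rng: () => number): number {
  let sum = 0;
  for (let i = 0; i < k; i++) {
    sum += -Math.log(1 - rng()); // 1 - rng() lies in (0, 1], so the log is finite
  }
  return sum;
}

// Beta(a, b) for integer a, b >= 1, built from two gamma draws.
function sampleBeta(a: number, b: number, rng: () => number): number {
  const x = sampleGammaInt(a, rng);
  const y = sampleGammaInt(b, rng);
  return x / (x + y);
}

// Draw one plausible success rate per arm from its posterior and pick the
// arm with the highest draw. Returns the index of the chosen arm.
function chooseArm(arms: Arm[], rng: () => number = Math.random): number {
  let best = 0;
  let bestDraw = -1;
  arms.forEach((arm, i) => {
    const draw = sampleBeta(arm.successes + 1, arm.failures + 1, rng);
    if (draw > bestDraw) {
      best = i;
      bestDraw = draw;
    }
  });
  return best;
}

// Update the chosen arm's posterior with the observed outcome.
function recordOutcome(arm: Arm, success: boolean): void {
  if (success) arm.successes += 1;
  else arm.failures += 1;
}
```

Because each choice is a random draw from the posterior, models with little data still get picked occasionally (exploration), while models with a strong track record are picked most of the time (exploitation). The integer-only gamma sampler keeps the sketch dependency-free; a statistics library would be a better fit for large counts.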
Every benchmark result is validated through multiple independent signals. No single metric determines the outcome — the protocol blends them to build a complete picture of model performance for each task type.
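One simple way to blend several signals into a single score is a weighted average, sketched below. The source does not say which signals are used or how they are combined, so this is purely an assumption for illustration: the `Signal` shape, the `blend` function, and the example signal names are hypothetical, and each value is assumed to be normalized to the range 0 to 1 beforehand.

```typescript
interface Signal {
  name: string;   // hypothetical label, e.g. "judge" or "format-check"
  value: number;  // assumed to be normalized to [0, 1]
  weight: number; // relative importance, >= 0
}

// Weighted average of the signal values; throws if every weight is zero.
function blend(signals: Signal[]): number {
  let weighted = 0;
  let total = 0;
  signals.forEach((s) => {
    weighted += s.value * s.weight;
    total += s.weight;
  });
  if (total === 0) {
    throw new Error("at least one signal needs a positive weight");
  }
  return weighted / total;
}
```

With this scheme no single signal decides the outcome unless its weight dominates. A low score on one signal can be offset by high scores on the others.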