Your agent keeps working when OpenAI goes down, your quota runs out, or a model gets deprecated.
Solve Track 04 · Reliability. Agent-Routing is a multi-provider LLM router with per-task fallback cascades, session-level circuit breakers, budget boundaries, token compression, and prompt injection filters. It ensures agent systems never freeze due to single API timeouts. Extracted from 18 months of production Agentic OS.
Relying on a single LLM API is a single point of failure. If you build an agent system that only speaks to OpenAI, you are at the mercy of rate limits, server outages, billing caps, and network timeouts. When a customer is waiting for a response and OpenAI returns a 503 error, your entire agent pipeline crashes.
Furthermore, different tasks require different models. Running simple text cleanup on GPT-4o is a waste of money; running code writing on a weak model leads to errors. Agent-Routing resolves this by wrapping model requests in a resilient cascade. It routes tasks based on priority chains, tracks errors to temporarily disable down services, and estimates costs beforehand to enforce budget limits.
code, content, ui, or simple). The router matches the class to a pre-defined fallback chain. For example, code cascades through OpenAI → Gemini → Anthropic → Ollama, while simple runs Ollama first to keep costs at zero.
OPEN. All subsequent requests for the next 5 minutes bypass that provider entirely, saving time.
Each provider cycles through three states. This prevents a broken endpoint from constantly adding timeout latency to requests:
Default priority chains optimized for pricing, speed, and capability:
| Task Class | Cascade Order | Fail-Safe Backup |
|---|---|---|
| Code | gpt-4o → claude-3-5-sonnet → gemini-1.5-pro | ollama (qwen2.5-coder:7b) |
| Content | gemini-1.5-flash → gpt-4o-mini → claude-3-5-haiku | ollama (llama3:8b) |
| Simple | gpt-4o-mini → gemini-1.5-flash | ollama (gemma2:2b) |
git clone https://github.com/shubham0086/Agent-Routing cd Agent-Routing npm install # Run the failover simulator node demo/failover.js
import { Router } from 'agent-routing';
const keys = {
openai: process.env.OPENAI_API_KEY,
anthropic: process.env.ANTHROPIC_API_KEY,
gemini: process.env.GEMINI_API_KEY
};
const router = new Router(keys, 'http://localhost:11434', 'openai');
// Send chat task with failover security and USD budget limits
const response = await router.chat("Write a fast JSON parser", {
taskClass: 'code',
budget: 0.05, // limit call to max 5 cents
temperature: 0.2
});
Agent-Routing represents the resilient backend gateway of the autonomy ladder. It implements Pattern 02 (Multi-Provider LLM Routing) in Agentic Patterns. The capstone platform AgentKernel routes all of its API traffic through this client to protect agent routines from timeouts.
A failover cascade hides model-specific format differences. If a task fails on OpenAI and shifts to Gemini, differences in system prompt support or tool call schemas can cause downstream errors. The router normalizes responses into a unified message structure, but when designing complex multi-agent tools, you must test fallback models to ensure they understand your function descriptions. Resiliency requires slightly simpler interfaces.