Model Router (Cost Tiers) Pattern
Demonstrates intelligent routing of requests to different models based on task complexity to optimize cost and quality.
Route every LLM request to the right model for the job, so you stop paying premium prices for economy-grade tasks.
Edition
Community: runs on the OSS / Community SDK edition.
What this demonstrates
Difficulty: Starter 🟢 · LLM
- Summary: Cost-aware provider routing and selection
- Scenario: Route requests to the cheapest capable provider
`tech_tags` in manifest: `LLM` (example id `cost-routing` in `conformance/examples_manifest.json`).
Prerequisites
- SDK: Use an installed SDK tree (`NXUSKIT_SDK_DIR`, `NXUSKIT_LIB_PATH` as needed); `test-examples.sh` resolves Go/Rust/Python deps from that tree only. See README.md, `scripts/setup-sdk.sh`, and `scripts/test-examples.sh`.
- Languages in this example: go, rust (paths under this directory; Python may live under a sibling `python/` or shared reference per Language Implementations).
- Models: Set cloud provider API keys and/or run Ollama locally when you execute the Run steps (interactive flags like `--help`/`--verbose` are documented below).
Key nxusKit Features Demonstrated
| Feature | Description |
|---|---|
| Provider Abstraction | Same interface for all models enables seamless tier switching |
| Request Portability | ChatRequest works identically across model tiers |
| Token Normalization | Consistent token usage tracking for cost calculations |
Provider Compatibility: Works with any provider supporting multiple models (OpenAI, Ollama, etc.)
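To illustrate why normalized token counts matter for cost calculations, here is a minimal sketch of a per-request cost estimate. `TokenUsage`, `Pricing`, and `estimate_cost_usd` are hypothetical names invented for this illustration, not nxusKit types, and the prices are placeholders rather than real provider rates.

```rust
/// Normalized token counts for one request.
/// Hypothetical struct for illustration; not the SDK's own type.
#[derive(Debug, Clone, Copy)]
pub struct TokenUsage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

/// Prices in USD per one million tokens.
/// Placeholder values only; look up real rates from your provider.
#[derive(Debug, Clone, Copy)]
pub struct Pricing {
    pub input_per_million: f64,
    pub output_per_million: f64,
}

/// Estimate the cost of a request from its token usage.
pub fn estimate_cost_usd(usage: TokenUsage, pricing: Pricing) -> f64 {
    let input = usage.prompt_tokens as f64 * pricing.input_per_million / 1_000_000.0;
    let output = usage.completion_tokens as f64 * pricing.output_per_million / 1_000_000.0;
    input + output
}

fn main() {
    let usage = TokenUsage { prompt_tokens: 1_200, completion_tokens: 300 };

    // Placeholder prices chosen only to show the relative difference between tiers.
    let economy = Pricing { input_per_million: 1.0, output_per_million: 2.0 };
    let premium = Pricing { input_per_million: 10.0, output_per_million: 20.0 };

    println!("economy: ${:.6}", estimate_cost_usd(usage, economy));
    println!("premium: ${:.6}", estimate_cost_usd(usage, premium));
}
```

Because the token counts have the same shape regardless of which model served the request, one function can price every tier; only the `Pricing` value changes.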
Pattern Overview
Not all prompts need the most expensive model. This pattern analyzes prompt characteristics and routes requests to an appropriate cost tier, balancing quality requirements with cost efficiency.
Cost Tiers
| Tier | Model | Use Case | Cost |
|---|---|---|---|
| Economy | gpt-4o-mini | Simple queries, lookups | $ |
| Standard | gpt-4o | General tasks, explanations | $$ |
| Premium | gpt-4-turbo | Complex analysis, reasoning | $$$ |
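The tier table can be expressed as a small enum. This is a sketch, not the example's actual source: `CostTier` and `as_str` appear in the library usage later in this page, but the lowercase strings returned by `as_str` and the `model` helper are assumptions made for illustration. The model names come from the table.

```rust
/// Cost tiers from the table, listed from cheapest to most expensive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CostTier {
    Economy,
    Standard,
    Premium,
}

impl CostTier {
    /// Human-readable tier name (the exact strings are an assumption).
    pub fn as_str(self) -> &'static str {
        match self {
            CostTier::Economy => "economy",
            CostTier::Standard => "standard",
            CostTier::Premium => "premium",
        }
    }

    /// Model to request for this tier (hypothetical helper; names from the table).
    pub fn model(self) -> &'static str {
        match self {
            CostTier::Economy => "gpt-4o-mini",
            CostTier::Standard => "gpt-4o",
            CostTier::Premium => "gpt-4-turbo",
        }
    }
}

fn main() {
    for tier in [CostTier::Economy, CostTier::Standard, CostTier::Premium] {
        println!("{} -> {}", tier.as_str(), tier.model());
    }
}
```

Keeping the mapping in one `match` means swapping a tier's model is a one-line change, and the compiler flags any tier left unmapped.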
Classification Heuristics
The example uses simple heuristics to classify tasks:
- Premium tier triggers:
  - Keywords: “analyze”, “compare”, “evaluate”, “synthesize”, “critique”
  - Prompt length > 1000 characters
- Standard tier:
  - Prompt length > 200 characters (without premium keywords)
- Economy tier:
  - Short, simple prompts (200 characters or fewer, no premium keywords)
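The heuristics above can be sketched as a single function. This is an illustrative reimplementation under stated assumptions, not the example's actual source: matching is assumed to be case-insensitive and substring-based, and length is counted in Unicode characters rather than bytes. The constant names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CostTier {
    Economy,
    Standard,
    Premium,
}

// Keywords and thresholds taken from the list above; constant names are hypothetical.
const PREMIUM_KEYWORDS: [&str; 5] = ["analyze", "compare", "evaluate", "synthesize", "critique"];
const PREMIUM_MIN_CHARS: usize = 1000;
const STANDARD_MIN_CHARS: usize = 200;

/// Classify a prompt into a cost tier without calling any API.
pub fn classify_task(prompt: &str) -> CostTier {
    // Assumption: keywords match case-insensitively, anywhere in the prompt.
    let lower = prompt.to_lowercase();
    let has_keyword = PREMIUM_KEYWORDS.iter().any(|kw| lower.contains(*kw));

    // Count characters, not bytes, so multi-byte text is measured fairly.
    let chars = prompt.chars().count();

    if has_keyword || chars > PREMIUM_MIN_CHARS {
        CostTier::Premium
    } else if chars > STANDARD_MIN_CHARS {
        CostTier::Standard
    } else {
        CostTier::Economy
    }
}

fn main() {
    let prompts = [
        "What is the capital of France?",
        "Compare Rust and Go for building network services.",
    ];
    for prompt in prompts {
        println!("{:?}: {}", classify_task(prompt), prompt);
    }
}
```

Substring matching is deliberately crude: it also fires on words that merely contain a keyword, which is one reason the thresholds and keyword list need calibration before production use.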
Real-World Application
Cost-optimized AI platform, budget-aware inference.
Technologies
LLM
Language Implementations
| Language | Path | Status |
|---|---|---|
| Rust | rust/ | Available |
| Go | go/ | Available |
Attach an installed SDK (NXUSKIT_SDK_DIR). See the repository README.md and scripts/test-examples.sh.
```bash
# From `/examples/patterns/cost-routing`:

# Rust
cd rust && cargo build

# Go
cd go && make build
```

Library usage

```rust
use cost_routing::{classify_task, routed_chat, CostTier};

// Classify without making an API call
let tier = classify_task(prompt);
println!("Would use: {} tier", tier.as_str());

// Or route and execute
let result = routed_chat(&provider, prompt).await?;
println!("Used {} tier", result.tier.as_str());
```

```go
// Classify without making an API call
tier := ClassifyTask(prompt)
fmt.Println("Would use:", tier.Name(), "tier")

// Or route and execute
result, err := RoutedChat(ctx, provider, prompt)
if err != nil {
    log.Fatal(err) // do not read result when routing failed
}
fmt.Println("Used", result.Tier.Name(), "tier")
```

```bash
# Run the Rust example
cd rust
cargo run

# Run the Go example
cd go
go run .
```

Interactive Modes
All examples support debugging flags:

```bash
# Verbose mode - show raw HTTP request/response data
cargo run -- --verbose   # Rust
go run . --verbose       # Go

# Step mode - pause at each step with explanations
cargo run -- --step      # Rust
go run . --step          # Go

# Combined mode
cargo run -- --verbose --step
```

Or use environment variables:

```bash
export NXUSKIT_VERBOSE=1
export NXUSKIT_STEP=1
```

Testing
```bash
# Rust
cd rust && cargo test

# Go
cd go && go test -v
```

Advanced Classification
For production, consider more sophisticated approaches:
- Intent classification: Use a small model to classify intent first
- Historical data: Learn from past prompt/tier success rates
- User preferences: Allow users to specify quality requirements
- Dynamic pricing: Adjust tiers based on current API pricing
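As one example of the approaches above, user preferences can be layered on top of the heuristic result as a quality floor. This is a minimal sketch: `apply_quality_floor` is a hypothetical helper, and it relies on the enum variants being declared from cheapest to most expensive so that the derived ordering reflects tier rank.

```rust
/// Variants are declared cheapest first, so the derived `Ord` ranks
/// Economy < Standard < Premium.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CostTier {
    Economy,
    Standard,
    Premium,
}

/// Raise the classified tier to the user's minimum, if one was requested.
/// Hypothetical helper; the floor never lowers a tier the classifier chose.
pub fn apply_quality_floor(classified: CostTier, min_tier: Option<CostTier>) -> CostTier {
    match min_tier {
        Some(floor) => classified.max(floor),
        None => classified,
    }
}

fn main() {
    // A short prompt classified as Economy, from a user who asked for at least Standard.
    let routed = apply_quality_floor(CostTier::Economy, Some(CostTier::Standard));
    println!("routed to {:?}", routed);
}
```

Treating the preference as a floor rather than a fixed choice keeps the cost savings for users who express no preference while never downgrading a prompt the classifier considered complex.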
Production Considerations
- Calibration: Tune thresholds based on your specific use cases
- Monitoring: Track accuracy and cost savings by tier
- Override capability: Allow manual tier selection for edge cases
- A/B testing: Validate that routing decisions improve outcomes
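The monitoring and override points above could be combined in a small bookkeeping type. This is a sketch under assumptions: `RoutingStats` and its methods are hypothetical names, and it only counts requests in memory; a real deployment would likely export these counters to a metrics system.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum CostTier {
    Economy,
    Standard,
    Premium,
}

/// In-memory counters for routing decisions (hypothetical type).
#[derive(Debug, Default)]
pub struct RoutingStats {
    requests: HashMap<CostTier, u64>,
    overrides: u64,
}

impl RoutingStats {
    /// Pick the final tier, honoring a manual selection if present,
    /// and record the decision. Returns the tier actually used.
    pub fn record(&mut self, classified: CostTier, manual: Option<CostTier>) -> CostTier {
        let chosen = manual.unwrap_or(classified);
        if chosen != classified {
            // Count only manual selections that changed the outcome.
            self.overrides += 1;
        }
        *self.requests.entry(chosen).or_insert(0) += 1;
        chosen
    }

    /// Number of requests served by the given tier.
    pub fn count(&self, tier: CostTier) -> u64 {
        self.requests.get(&tier).copied().unwrap_or(0)
    }

    /// Number of requests where a manual selection replaced the classifier's choice.
    pub fn overrides(&self) -> u64 {
        self.overrides
    }
}

fn main() {
    let mut stats = RoutingStats::default();
    stats.record(CostTier::Economy, None);
    stats.record(CostTier::Economy, Some(CostTier::Premium));
    println!(
        "economy={} premium={} overrides={}",
        stats.count(CostTier::Economy),
        stats.count(CostTier::Premium),
        stats.overrides()
    );
}
```

A high override count for a particular tier is a useful calibration signal: it suggests the thresholds are routing prompts lower or higher than users expect.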