> **Documentation index:** fetch the complete index at https://to11.ai/docs/llms.txt to discover all available pages before exploring further.
# Providers
The gateway supports multiple LLM providers through a pluggable adapter system. Each provider is registered in the configuration file with its models and base URL.
## Supported Providers
Native adapters (dedicated request/response translation):
| Provider | Format | Streaming | Tool Calls | Structured Output |
|---|---|---|---|---|
| OpenAI | OpenAI chat completions | Fast-path SSE | Yes | json_object, json_schema |
| Anthropic | Anthropic messages | Normalized SSE | Yes (auto-translated) | text only |
| Cohere | Cohere v2 chat | Normalized SSE | — | — |
OpenAI-compatible (config-only, no code changes needed):
| Provider | Base URL | Streaming | Tool Calls | Structured Output |
|---|---|---|---|---|
| xAI | api.x.ai | Fast-path SSE | Yes | json_object, json_schema |
| Mistral | api.mistral.ai | Fast-path SSE | Yes | json_object |
| Groq | api.groq.com | Fast-path SSE | Yes | json_object |
| Together AI | api.together.xyz | Fast-path SSE | Yes | json_object |
| Fireworks AI | api.fireworks.ai | Fast-path SSE | Yes | json_object |
| DeepSeek | api.deepseek.com | Fast-path SSE | Yes | json_object |
| Perplexity | api.perplexity.ai | Fast-path SSE | — | — |
| OpenRouter | openrouter.ai | Fast-path SSE | Yes | json_object, json_schema |
| Cerebras | api.cerebras.ai | Fast-path SSE | Yes | json_object |
| SambaNova | api.sambanova.ai | Fast-path SSE | Yes | — |
| AI21 | api.ai21.com | Fast-path SSE | Yes | — |
| DeepInfra | api.deepinfra.com | Fast-path SSE | Yes | json_object |
| NVIDIA NIM | integrate.api.nvidia.com | Fast-path SSE | Yes | json_object |
| Nebius | api.studio.nebius.ai | Fast-path SSE | Yes | json_object |
| Novita AI | api.novita.ai | Fast-path SSE | Yes | — |
| Moonshot | api.moonshot.cn | Fast-path SSE | Yes | — |
Cloud platforms (uncomment in config, custom auth):
| Provider | Auth Type | Streaming | Tool Calls | Structured Output |
|---|---|---|---|---|
| Azure OpenAI | api-key header | Fast-path SSE | Yes | json_object, json_schema |
| Google Gemini | query_param | Fast-path SSE | Yes | json_object |
| AWS Bedrock | Bearer | Fast-path SSE | Yes | json_object |
Self-hosted / local (uncomment in config):
| Provider | Default URL | Streaming | Notes |
|---|---|---|---|
| Ollama | localhost:11434 | Fast-path SSE | No auth required |
| vLLM | localhost:8000 | Fast-path SSE | No auth required |
| Databricks | per-workspace | Fast-path SSE | Workspace-specific URL |
| HuggingFace | api-inference.huggingface.co | Fast-path SSE | TGI endpoints |
| Anyscale | api.endpoints.anyscale.com | Fast-path SSE | — |
| Cloudflare Workers AI | per-account | Fast-path SSE | Account ID in URL |
## Configuration
Each provider is registered as a named table under `[providers.*]` in `gateway.toml`. The gateway uses passthrough authentication: the caller's `Authorization` header is forwarded directly to the upstream provider.
### Native adapters

```toml
[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o", "gpt-4o-mini", "o1", "o3-mini", "text-embedding-3-small", "text-embedding-3-large", "tts-1", "tts-1-hd", "whisper-1", "dall-e-3", "dall-e-2"]
timeout_ms = 30000

[providers.anthropic]
base_url = "https://api.anthropic.com/v1"
credential = "env::ANTHROPIC_API_KEY"
models = ["claude-sonnet-4-6", "claude-haiku-4-5-20251001", "claude-opus-4-6"]
timeout_ms = 60000

[providers.cohere]
base_url = "https://api.cohere.com"
credential = "env::COHERE_API_KEY"
models = ["command-r-plus", "command-r", "command-light", "embed-english-v3.0", "embed-multilingual-v3.0", "rerank-english-v3.0", "rerank-multilingual-v3.0"]
timeout_ms = 30000
```
### OpenAI-compatible providers

These providers speak the OpenAI `/chat/completions` format. No code changes are needed: just add the config block and set the API key.

```toml
[providers.xai]
base_url = "https://api.x.ai/v1"
credential = "env::XAI_API_KEY"
models = ["grok-2", "grok-2-mini"]
timeout_ms = 30000

[providers.mistral]
base_url = "https://api.mistral.ai/v1"
credential = "env::MISTRAL_API_KEY"
models = ["mistral-large-latest", "mistral-medium-latest", "mistral-small-latest", "open-mistral-nemo", "mistral-embed"]
timeout_ms = 30000

[providers.groq]
base_url = "https://api.groq.com/openai/v1"
credential = "env::GROQ_API_KEY"
models = ["llama-3.3-70b-versatile", "llama-3.1-8b-instant", "mixtral-8x7b-32768", "gemma2-9b-it"]
timeout_ms = 15000

[providers.together]
base_url = "https://api.together.xyz/v1"
credential = "env::TOGETHER_API_KEY"
models = ["meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo", "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo", "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", "mistralai/Mixtral-8x22B-Instruct-v0.1"]
timeout_ms = 30000

[providers.fireworks]
base_url = "https://api.fireworks.ai/inference/v1"
credential = "env::FIREWORKS_API_KEY"
models = ["accounts/fireworks/models/llama-v3p1-405b-instruct", "accounts/fireworks/models/llama-v3p1-70b-instruct", "accounts/fireworks/models/mixtral-8x22b-instruct"]
timeout_ms = 30000

[providers.deepseek]
base_url = "https://api.deepseek.com/v1"
credential = "env::DEEPSEEK_API_KEY"
models = ["deepseek-chat", "deepseek-reasoner"]
timeout_ms = 60000

[providers.perplexity]
base_url = "https://api.perplexity.ai"
credential = "env::PERPLEXITY_API_KEY"
models = ["sonar", "sonar-pro", "sonar-reasoning", "sonar-reasoning-pro"]
timeout_ms = 30000

[providers.openrouter]
base_url = "https://openrouter.ai/api/v1"
credential = "env::OPENROUTER_API_KEY"
models = ["openai/gpt-4o", "anthropic/claude-sonnet-4-6", "google/gemini-2.5-pro-preview-06-05", "meta-llama/llama-3.1-405b-instruct"]
timeout_ms = 30000

[providers.cerebras]
base_url = "https://api.cerebras.ai/v1"
credential = "env::CEREBRAS_API_KEY"
models = ["llama-3.3-70b", "llama-3.1-8b"]
timeout_ms = 15000

[providers.sambanova]
base_url = "https://api.sambanova.ai/v1"
credential = "env::SAMBANOVA_API_KEY"
models = ["Meta-Llama-3.1-405B-Instruct", "Meta-Llama-3.1-70B-Instruct", "Meta-Llama-3.1-8B-Instruct"]
timeout_ms = 30000

[providers.ai21]
base_url = "https://api.ai21.com/studio/v1"
credential = "env::AI21_API_KEY"
models = ["jamba-1.5-large", "jamba-1.5-mini"]
timeout_ms = 30000

[providers.deepinfra]
base_url = "https://api.deepinfra.com/v1/openai"
credential = "env::DEEPINFRA_API_KEY"
models = ["meta-llama/Meta-Llama-3.1-405B-Instruct", "meta-llama/Meta-Llama-3.1-70B-Instruct", "mistralai/Mixtral-8x22B-Instruct-v0.1"]
timeout_ms = 30000

[providers.nvidia-nim]
base_url = "https://integrate.api.nvidia.com/v1"
credential = "env::NVIDIA_NIM_API_KEY"
models = ["meta/llama-3.1-405b-instruct", "meta/llama-3.1-70b-instruct"]
timeout_ms = 30000

[providers.nebius]
base_url = "https://api.studio.nebius.ai/v1"
credential = "env::NEBIUS_API_KEY"
models = ["meta-llama/Meta-Llama-3.1-405B-Instruct", "meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms = 30000

[providers.novita-ai]
base_url = "https://api.novita.ai/v3/openai"
credential = "env::NOVITA_AI_API_KEY"
models = ["meta-llama/llama-3.1-405b-instruct", "meta-llama/llama-3.1-70b-instruct"]
timeout_ms = 30000

[providers.moonshot]
base_url = "https://api.moonshot.cn/v1"
credential = "env::MOONSHOT_API_KEY"
models = ["moonshot-v1-128k", "moonshot-v1-32k", "moonshot-v1-8k"]
timeout_ms = 30000
```
### Cloud platforms

These require user-specific configuration (workspace URLs, account IDs). Replace the placeholder values before enabling.

```toml
[providers.azure-openai]
base_url = "https://YOUR_RESOURCE.openai.azure.com/openai"
credential = "env::AZURE_OPENAI_API_KEY"
auth_type = "api_key_header"
models = ["my-gpt4o-deployment"]
timeout_ms = 30000

[providers.google-gemini]
base_url = "https://generativelanguage.googleapis.com/v1beta/openai"
credential = "env::GEMINI_API_KEY"
auth_type = "query_param"
models = ["gemini-2.5-pro-preview-06-05", "gemini-2.5-flash-preview-05-20", "gemini-2.0-flash", "gemini-2.0-flash-lite", "text-embedding-004"]
timeout_ms = 30000

[providers.aws-bedrock]
base_url = "https://bedrock-mantle.us-east-1.api.aws/v1"
credential = "env::BEDROCK_API_KEY"
models = ["anthropic.claude-sonnet-4-6-v1", "anthropic.claude-haiku-4-5-20251001-v1", "meta.llama3-1-405b-instruct-v1"]
timeout_ms = 60000

[providers.databricks]
base_url = "https://YOUR_WORKSPACE.databricks.com/serving-endpoints"
credential = "env::DATABRICKS_TOKEN"
models = ["databricks-meta-llama-3-1-405b-instruct"]
timeout_ms = 30000

[providers.huggingface]
base_url = "https://api-inference.huggingface.co/v1"
credential = "env::HF_API_KEY"
models = ["meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms = 30000

[providers.anyscale]
base_url = "https://api.endpoints.anyscale.com/v1"
credential = "env::ANYSCALE_API_KEY"
models = ["meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms = 30000

[providers.cloudflare-ai]
base_url = "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/ai/v1"
credential = "env::CLOUDFLARE_API_KEY"
models = ["@cf/meta/llama-3.1-70b-instruct"]
timeout_ms = 30000
```
### Self-hosted / local providers

```toml
[providers.ollama]
base_url = "http://localhost:11434/v1"
credential = "none" # Ollama doesn't require auth
models = ["llama3.1", "mistral", "codellama"]
timeout_ms = 60000

[providers.vllm]
base_url = "http://localhost:8000/v1"
credential = "none" # vLLM doesn't require auth
models = ["meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms = 60000
```
## Model Routing

The gateway routes requests based on the `model` field in the request body. It looks up which provider registered that model name and forwards the request there.
### Automatic routing

```json
{ "model": "gpt-4o", "messages": [...] }
```

The gateway matches `gpt-4o` to the `openai` provider and routes accordingly.
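The lookup described above can be sketched roughly as follows. This is an illustrative Python rendering, not the gateway's actual internals; the `PROVIDERS` table mirrors the shape of the `[providers.*]` config:

```python
# Minimal sketch of model-based routing: find which provider registered a model.
PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1",
               "models": ["gpt-4o", "gpt-4o-mini"]},
    "anthropic": {"base_url": "https://api.anthropic.com/v1",
                  "models": ["claude-sonnet-4-6"]},
}

def route(model: str) -> str:
    """Return the name of the provider that registered `model`."""
    for name, cfg in PROVIDERS.items():
        if model in cfg["models"]:
            return name
    raise LookupError(f"no provider registered for model {model!r}")
```

If no provider has registered the model name, the lookup fails and the request cannot be routed.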
### Explicit provider hints

Force a specific provider with the `x-genai-provider` header:

```
x-genai-provider: anthropic
```

Or use a `provider::model` prefix:

```json
{ "model": "anthropic::claude-sonnet-4-6", "messages": [...] }
```

If both the header and the prefix are present, they must agree or the request is rejected with `400 Bad Request`.
## Format Translation

The gateway handles format translation automatically when the caller's SDK format differs from the upstream provider's format.
### OpenAI to Anthropic

When an OpenAI-format request is routed to Anthropic:

- Messages with `role: "system"` are extracted and sent as Anthropic's top-level `system` field
- `max_tokens` is forwarded directly
- `tools` definitions are translated from OpenAI `parameters` to Anthropic `input_schema`
- The response is translated back to the OpenAI `choices[].message` format
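The request-side steps above can be sketched roughly as follows. `openai_to_anthropic` is an illustrative name, and the default `max_tokens` is an assumption for the sketch, not gateway behavior:

```python
def openai_to_anthropic(req: dict) -> dict:
    """Sketch of the OpenAI -> Anthropic request translation: system
    messages move to the top-level `system` field, the rest pass through."""
    system = [m["content"] for m in req["messages"] if m["role"] == "system"]
    out = {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),  # Anthropic requires max_tokens
        "messages": [m for m in req["messages"] if m["role"] != "system"],
    }
    if system:
        out["system"] = "\n".join(system)
    return out
```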
### Anthropic to OpenAI

When an Anthropic-format request is routed to OpenAI:

- The top-level `system` field is converted to a `{"role": "system"}` message
- Content blocks are normalized
- The response is translated to the Anthropic `content[]` format
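A rough sketch of the reverse request translation, again with illustrative names and a simplified view of content-block normalization (only `text` blocks handled):

```python
def anthropic_to_openai(req: dict) -> dict:
    """Sketch of the Anthropic -> OpenAI request translation: the top-level
    `system` field becomes a system message; block content is flattened."""
    messages = []
    if "system" in req:
        messages.append({"role": "system", "content": req["system"]})
    for m in req["messages"]:
        content = m["content"]
        if isinstance(content, list):  # content blocks -> plain text
            content = "".join(b.get("text", "") for b in content)
        messages.append({"role": m["role"], "content": content})
    return {"model": req["model"], "messages": messages}
```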
### Streaming translation

| Client SDK | Upstream | Path | Why |
|---|---|---|---|
| OpenAI | OpenAI | Fast path | Wire formats match |
| OpenAI | xAI | Fast path | xAI uses OpenAI-compatible SSE |
| Anthropic | OpenAI | Normalized | Incompatible SSE wire formats |
| Anthropic | Anthropic | Normalized | Anthropic ingress uses normalized path |
Both paths produce identical telemetry and apply the same output guardrail checks.
## API Key Handling

The gateway uses passthrough authentication. Your `Authorization` header is forwarded directly to the upstream provider. The gateway does not read, store, or manage provider API keys.
## Extended Thinking

Anthropic extended thinking blocks (the `thinking` content type) are preserved in Anthropic-format responses. When a response is serialized to OpenAI format, thinking blocks are omitted, since OpenAI has no equivalent.
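The omission rule amounts to filtering by block type when flattening content for an OpenAI-format response. A minimal sketch with an illustrative helper name:

```python
def to_openai_text(content_blocks: list[dict]) -> str:
    """Flatten Anthropic content blocks to OpenAI message text,
    dropping `thinking` blocks, which have no OpenAI equivalent."""
    return "".join(b["text"] for b in content_blocks if b["type"] == "text")
```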
## Adding a New Provider

Providers that follow the OpenAI `/chat/completions` format (like xAI) can use the `OpenAiAdapter` directly: just add a `[providers.<name>]` table to the config. No code changes are needed.

For providers with a different wire format, implement the `ProviderAdapter` trait. See the Gateway Development guide for details on implementing a new adapter.
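Conceptually, the adapter contract pairs a request translation with a response translation. The Python sketch below only illustrates that shape; the gateway's actual `ProviderAdapter` trait is defined in its own codebase, and its real methods may differ:

```python
from abc import ABC, abstractmethod

class ProviderAdapter(ABC):
    """Illustrative rendering of an adapter contract: translate requests
    into the provider's wire format and responses back out of it."""

    @abstractmethod
    def translate_request(self, request: dict) -> dict: ...

    @abstractmethod
    def translate_response(self, response: dict) -> dict: ...

class PassthroughAdapter(ProviderAdapter):
    """Degenerate adapter for providers whose wire format already matches."""
    def translate_request(self, request: dict) -> dict:
        return request
    def translate_response(self, response: dict) -> dict:
        return response
```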
## Provider Limitations

Not all endpoints are available for every provider. Feature support varies by provider family:
| Feature | OpenAI / xAI | Anthropic |
|---|---|---|
| Chat completions | Yes | Yes |
| Embeddings | Yes | No |
| Image generation | Yes | No |
| Audio transcription | Yes | No |
| Audio speech (TTS) | Yes | No |
| Files API | Yes | No |
| Token counting | No | Yes |
| `json_object` / `json_schema` response format | Yes | No |
Requests for unsupported features return `400 Bad Request`.
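The gate can be pictured as a per-provider capability set; the feature names and `check_feature` helper below are illustrative, not the gateway's real identifiers:

```python
# Sketch of a capability gate matching the table above (names illustrative).
SUPPORTED = {
    "openai": {"chat", "embeddings", "images", "audio", "tts", "files"},
    "anthropic": {"chat", "token_counting"},
}

def check_feature(provider: str, feature: str) -> None:
    """Raise (the gateway would return 400 Bad Request) if `provider`
    does not support `feature`."""
    if feature not in SUPPORTED.get(provider, set()):
        raise ValueError(f"400 Bad Request: {feature!r} not supported by {provider!r}")
```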