Providers

The gateway supports multiple LLM providers through a pluggable adapter system. Each provider is registered in the configuration file with its models and base URL.

Supported Providers

Native adapters (dedicated request/response translation):
| Provider  | Format                  | Streaming      | Tool Calls            | Structured Output        |
|-----------|-------------------------|----------------|-----------------------|--------------------------|
| OpenAI    | OpenAI chat completions | Fast-path SSE  | Yes                   | json_object, json_schema |
| Anthropic | Anthropic messages      | Normalized SSE | Yes (auto-translated) | text only                |
| Cohere    | Cohere v2 chat          | Normalized SSE |                       |                          |

OpenAI-compatible (config-only, no code changes needed):
| Provider     | Base URL                 | Streaming     | Tool Calls | Structured Output        |
|--------------|--------------------------|---------------|------------|--------------------------|
| xAI          | api.x.ai                 | Fast-path SSE | Yes        | json_object, json_schema |
| Mistral      | api.mistral.ai           | Fast-path SSE | Yes        | json_object              |
| Groq         | api.groq.com             | Fast-path SSE | Yes        | json_object              |
| Together AI  | api.together.xyz         | Fast-path SSE | Yes        | json_object              |
| Fireworks AI | api.fireworks.ai         | Fast-path SSE | Yes        | json_object              |
| DeepSeek     | api.deepseek.com         | Fast-path SSE | Yes        | json_object              |
| Perplexity   | api.perplexity.ai        | Fast-path SSE |            |                          |
| OpenRouter   | openrouter.ai            | Fast-path SSE | Yes        | json_object, json_schema |
| Cerebras     | api.cerebras.ai          | Fast-path SSE | Yes        | json_object              |
| SambaNova    | api.sambanova.ai         | Fast-path SSE | Yes        |                          |
| AI21         | api.ai21.com             | Fast-path SSE | Yes        |                          |
| DeepInfra    | api.deepinfra.com        | Fast-path SSE | Yes        | json_object              |
| NVIDIA NIM   | integrate.api.nvidia.com | Fast-path SSE | Yes        | json_object              |
| Nebius       | api.studio.nebius.ai     | Fast-path SSE | Yes        | json_object              |
| Novita AI    | api.novita.ai            | Fast-path SSE | Yes        |                          |
| Moonshot     | api.moonshot.cn          | Fast-path SSE | Yes        |                          |

Cloud platforms (uncomment in config, custom auth):
| Provider      | Auth Type      | Streaming     | Tool Calls | Structured Output        |
|---------------|----------------|---------------|------------|--------------------------|
| Azure OpenAI  | api-key header | Fast-path SSE | Yes        | json_object, json_schema |
| Google Gemini | query_param    | Fast-path SSE | Yes        | json_object              |
| AWS Bedrock   | Bearer         | Fast-path SSE | Yes        | json_object              |

Self-hosted / local (uncomment in config):
| Provider              | Default URL                  | Streaming     | Notes                  |
|-----------------------|------------------------------|---------------|------------------------|
| Ollama                | localhost:11434              | Fast-path SSE | No auth required       |
| vLLM                  | localhost:8000               | Fast-path SSE | No auth required       |
| Databricks            | per-workspace                | Fast-path SSE | Workspace-specific URL |
| HuggingFace           | api-inference.huggingface.co | Fast-path SSE | TGI endpoints          |
| Anyscale              | api.endpoints.anyscale.com   | Fast-path SSE |                        |
| Cloudflare Workers AI | per-account                  | Fast-path SSE | Account ID in URL      |

Configuration

Each provider is registered as a named table under [providers.*] in gateway.toml. The gateway uses passthrough authentication — the caller’s Authorization header is forwarded directly to the upstream provider.

Native adapters

[providers.openai]
base_url    = "https://api.openai.com/v1"
credential  = "env::OPENAI_API_KEY"
models      = ["gpt-4o", "gpt-4o-mini", "o1", "o3-mini", "text-embedding-3-small", "text-embedding-3-large", "tts-1", "tts-1-hd", "whisper-1", "dall-e-3", "dall-e-2"]
timeout_ms  = 30000

[providers.anthropic]
base_url    = "https://api.anthropic.com/v1"
credential  = "env::ANTHROPIC_API_KEY"
models      = ["claude-sonnet-4-6", "claude-haiku-4-5-20251001", "claude-opus-4-6"]
timeout_ms  = 60000

[providers.cohere]
base_url    = "https://api.cohere.com"
credential  = "env::COHERE_API_KEY"
models      = ["command-r-plus", "command-r", "command-light", "embed-english-v3.0", "embed-multilingual-v3.0", "rerank-english-v3.0", "rerank-multilingual-v3.0"]
timeout_ms  = 30000

OpenAI-compatible providers

These providers speak the OpenAI /chat/completions format. No code changes needed — just add the config block and set the API key.
[providers.xai]
base_url    = "https://api.x.ai/v1"
credential  = "env::XAI_API_KEY"
models      = ["grok-2", "grok-2-mini"]
timeout_ms  = 30000

[providers.mistral]
base_url    = "https://api.mistral.ai/v1"
credential  = "env::MISTRAL_API_KEY"
models      = ["mistral-large-latest", "mistral-medium-latest", "mistral-small-latest", "open-mistral-nemo", "mistral-embed"]
timeout_ms  = 30000

[providers.groq]
base_url    = "https://api.groq.com/openai/v1"
credential  = "env::GROQ_API_KEY"
models      = ["llama-3.3-70b-versatile", "llama-3.1-8b-instant", "mixtral-8x7b-32768", "gemma2-9b-it"]
timeout_ms  = 15000

[providers.together]
base_url    = "https://api.together.xyz/v1"
credential  = "env::TOGETHER_API_KEY"
models      = ["meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo", "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo", "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", "mistralai/Mixtral-8x22B-Instruct-v0.1"]
timeout_ms  = 30000

[providers.fireworks]
base_url    = "https://api.fireworks.ai/inference/v1"
credential  = "env::FIREWORKS_API_KEY"
models      = ["accounts/fireworks/models/llama-v3p1-405b-instruct", "accounts/fireworks/models/llama-v3p1-70b-instruct", "accounts/fireworks/models/mixtral-8x22b-instruct"]
timeout_ms  = 30000

[providers.deepseek]
base_url    = "https://api.deepseek.com/v1"
credential  = "env::DEEPSEEK_API_KEY"
models      = ["deepseek-chat", "deepseek-reasoner"]
timeout_ms  = 60000

[providers.perplexity]
base_url    = "https://api.perplexity.ai"
credential  = "env::PERPLEXITY_API_KEY"
models      = ["sonar", "sonar-pro", "sonar-reasoning", "sonar-reasoning-pro"]
timeout_ms  = 30000

[providers.openrouter]
base_url    = "https://openrouter.ai/api/v1"
credential  = "env::OPENROUTER_API_KEY"
models      = ["openai/gpt-4o", "anthropic/claude-sonnet-4-6", "google/gemini-2.5-pro-preview-06-05", "meta-llama/llama-3.1-405b-instruct"]
timeout_ms  = 30000

[providers.cerebras]
base_url    = "https://api.cerebras.ai/v1"
credential  = "env::CEREBRAS_API_KEY"
models      = ["llama-3.3-70b", "llama-3.1-8b"]
timeout_ms  = 15000

[providers.sambanova]
base_url    = "https://api.sambanova.ai/v1"
credential  = "env::SAMBANOVA_API_KEY"
models      = ["Meta-Llama-3.1-405B-Instruct", "Meta-Llama-3.1-70B-Instruct", "Meta-Llama-3.1-8B-Instruct"]
timeout_ms  = 30000

[providers.ai21]
base_url    = "https://api.ai21.com/studio/v1"
credential  = "env::AI21_API_KEY"
models      = ["jamba-1.5-large", "jamba-1.5-mini"]
timeout_ms  = 30000

[providers.deepinfra]
base_url    = "https://api.deepinfra.com/v1/openai"
credential  = "env::DEEPINFRA_API_KEY"
models      = ["meta-llama/Meta-Llama-3.1-405B-Instruct", "meta-llama/Meta-Llama-3.1-70B-Instruct", "mistralai/Mixtral-8x22B-Instruct-v0.1"]
timeout_ms  = 30000

[providers.nvidia-nim]
base_url    = "https://integrate.api.nvidia.com/v1"
credential  = "env::NVIDIA_NIM_API_KEY"
models      = ["meta/llama-3.1-405b-instruct", "meta/llama-3.1-70b-instruct"]
timeout_ms  = 30000

[providers.nebius]
base_url    = "https://api.studio.nebius.ai/v1"
credential  = "env::NEBIUS_API_KEY"
models      = ["meta-llama/Meta-Llama-3.1-405B-Instruct", "meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms  = 30000

[providers.novita-ai]
base_url    = "https://api.novita.ai/v3/openai"
credential  = "env::NOVITA_AI_API_KEY"
models      = ["meta-llama/llama-3.1-405b-instruct", "meta-llama/llama-3.1-70b-instruct"]
timeout_ms  = 30000

[providers.moonshot]
base_url    = "https://api.moonshot.cn/v1"
credential  = "env::MOONSHOT_API_KEY"
models      = ["moonshot-v1-128k", "moonshot-v1-32k", "moonshot-v1-8k"]
timeout_ms  = 30000

Cloud platform providers

These require user-specific configuration (workspace URLs, account IDs). Replace the placeholder values before enabling.
[providers.azure-openai]
base_url    = "https://YOUR_RESOURCE.openai.azure.com/openai"
credential  = "env::AZURE_OPENAI_API_KEY"
auth_type   = "api_key_header"
models      = ["my-gpt4o-deployment"]
timeout_ms  = 30000

[providers.google-gemini]
base_url    = "https://generativelanguage.googleapis.com/v1beta/openai"
credential  = "env::GEMINI_API_KEY"
auth_type   = "query_param"
models      = ["gemini-2.5-pro-preview-06-05", "gemini-2.5-flash-preview-05-20", "gemini-2.0-flash", "gemini-2.0-flash-lite", "text-embedding-004"]
timeout_ms  = 30000

[providers.aws-bedrock]
base_url    = "https://bedrock-mantle.us-east-1.api.aws/v1"
credential  = "env::BEDROCK_API_KEY"
models      = ["anthropic.claude-sonnet-4-6-v1", "anthropic.claude-haiku-4-5-20251001-v1", "meta.llama3-1-405b-instruct-v1"]
timeout_ms  = 60000

[providers.databricks]
base_url    = "https://YOUR_WORKSPACE.databricks.com/serving-endpoints"
credential  = "env::DATABRICKS_TOKEN"
models      = ["databricks-meta-llama-3-1-405b-instruct"]
timeout_ms  = 30000

[providers.huggingface]
base_url    = "https://api-inference.huggingface.co/v1"
credential  = "env::HF_API_KEY"
models      = ["meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms  = 30000

[providers.anyscale]
base_url    = "https://api.endpoints.anyscale.com/v1"
credential  = "env::ANYSCALE_API_KEY"
models      = ["meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms  = 30000

[providers.cloudflare-ai]
base_url    = "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/ai/v1"
credential  = "env::CLOUDFLARE_API_KEY"
models      = ["@cf/meta/llama-3.1-70b-instruct"]
timeout_ms  = 30000

Self-hosted / local providers

[providers.ollama]
base_url    = "http://localhost:11434/v1"
credential  = "none"   # Ollama doesn't require auth
models      = ["llama3.1", "mistral", "codellama"]
timeout_ms  = 60000

[providers.vllm]
base_url    = "http://localhost:8000/v1"
credential  = "none"    # vLLM doesn't require auth
models      = ["meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms  = 60000

Model Routing

The gateway routes requests based on the model field in the request body. It looks up which provider registered that model name and forwards the request there.

Automatic routing

{ "model": "gpt-4o", "messages": [...] }
The gateway matches gpt-4o to the openai provider and routes accordingly.
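
A minimal sketch of automatic routing from the client side, using the OpenAI Python SDK pointed at the gateway. The base URL below is a placeholder for wherever your gateway is deployed, and it assumes the gateway exposes an OpenAI-compatible /v1/chat/completions ingress.

import os
from openai import OpenAI

# Point the SDK at the gateway instead of the provider directly.
client = OpenAI(
    base_url="http://localhost:8080/v1",   # placeholder gateway address
    api_key=os.environ["OPENAI_API_KEY"],  # forwarded upstream (passthrough auth)
)

# "gpt-4o" is registered under [providers.openai], so the gateway routes
# this request to OpenAI.
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

Switching to a model registered under a different provider requires no client changes beyond the model name and the API key you forward.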

Explicit provider hints

Force a specific provider with the x-genai-provider header:
x-genai-provider: anthropic
Or use a provider::model prefix:
{ "model": "anthropic::claude-sonnet-4-6", "messages": [...] }
If both header and prefix are present, they must agree or the request is rejected with 400 Bad Request.
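
Both hint styles, sketched with the OpenAI Python SDK against a placeholder gateway address. With passthrough auth, the key you hand the SDK must be valid for the provider the request ultimately reaches.

import os
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",      # placeholder gateway address
    api_key=os.environ["ANTHROPIC_API_KEY"],  # forwarded to Anthropic
)

# Option 1: force the provider with the x-genai-provider header.
resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={"x-genai-provider": "anthropic"},
)

# Option 2: use the provider::model prefix. If both the header and the
# prefix are sent, they must agree or the gateway returns 400 Bad Request.
resp = client.chat.completions.create(
    model="anthropic::claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
)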

Cross-Format Translation

The gateway handles format translation automatically when the caller’s SDK format differs from the upstream provider’s format.

OpenAI to Anthropic

When an OpenAI-format request is routed to Anthropic:
  • messages with role: "system" are extracted and sent as Anthropic’s top-level system field
  • max_tokens is forwarded directly
  • tools definitions are translated from OpenAI parameters to Anthropic input_schema
  • Response is translated back to OpenAI choices[].message format
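
From the caller's side this translation is invisible; a sketch using the OpenAI Python SDK to reach a Claude model through the gateway (placeholder base URL; the comments restate the translation steps listed above):

import os
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",      # placeholder gateway address
    api_key=os.environ["ANTHROPIC_API_KEY"],  # forwarded to Anthropic
)

# OpenAI-format request: the gateway lifts the system message into Anthropic's
# top-level `system` field, forwards max_tokens, and rewrites the tool's
# `parameters` schema as `input_schema`.
resp = client.chat.completions.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)

# The Anthropic response arrives translated back to choices[].message.
print(resp.choices[0].message)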

Anthropic to OpenAI

When an Anthropic-format request is routed to OpenAI:
  • Top-level system is converted to a {"role": "system"} message
  • Content blocks are normalized
  • Response is translated to Anthropic content[] format
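
The reverse direction, sketched with the Anthropic Python SDK. This assumes the gateway serves an Anthropic-format /v1/messages ingress at the placeholder base URL, and it uses the SDK's auth_token option so the credential is sent as an Authorization header for passthrough.

import os
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",         # placeholder gateway address
    auth_token=os.environ["OPENAI_API_KEY"],  # forwarded to OpenAI
)

# Anthropic-format request: the gateway converts the top-level `system` into a
# {"role": "system"} message and translates the OpenAI reply back into
# Anthropic content[] blocks.
resp = client.messages.create(
    model="gpt-4o",
    max_tokens=512,
    system="You are a terse assistant.",
    messages=[{"role": "user", "content": "Summarize passthrough auth in one line."}],
)
print(resp.content[0].text)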

Streaming translation

| Client SDK | Upstream  | Path       | Why                                   |
|------------|-----------|------------|---------------------------------------|
| OpenAI     | OpenAI    | Fast path  | Wire formats match                    |
| OpenAI     | xAI       | Fast path  | xAI uses OpenAI-compatible SSE        |
| Anthropic  | OpenAI    | Normalized | Incompatible SSE wire formats         |
| Anthropic  | Anthropic | Normalized | Anthropic ingress uses normalized path |

Both paths produce identical telemetry and apply the same output guardrail checks.
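
Streaming is requested the same way in either case; which path the gateway uses depends only on the upstream. A minimal sketch with the OpenAI Python SDK (placeholder gateway address):

import os
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",   # placeholder gateway address
    api_key=os.environ["OPENAI_API_KEY"],
)

# OpenAI upstream -> fast-path SSE; an Anthropic upstream would be re-emitted
# through the normalized path with no client-side changes.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()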

API Key Handling

The gateway uses passthrough authentication. Your Authorization header is forwarded directly to the upstream provider. The gateway does not read, store, or manage provider API keys.
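
Concretely, the key you already hold for the upstream provider goes in the Authorization header of the request you send to the gateway; a sketch with the requests library, assuming an OpenAI-compatible ingress at a placeholder address:

import os
import requests

GATEWAY = "http://localhost:8080"  # placeholder for your gateway deployment

resp = requests.post(
    f"{GATEWAY}/v1/chat/completions",
    headers={
        # Forwarded verbatim to the upstream provider; never stored by the gateway.
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])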

Extended Thinking

Anthropic extended thinking blocks (thinking content type) are preserved in Anthropic-format responses. When the response is serialized to OpenAI format, thinking blocks are omitted since OpenAI has no equivalent.
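
A sketch of reading thinking blocks through the gateway with the Anthropic Python SDK; the base URL is a placeholder, and the thinking parameter is Anthropic's standard extended-thinking option rather than anything gateway-specific:

import os
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8080",            # placeholder gateway address
    auth_token=os.environ["ANTHROPIC_API_KEY"],  # forwarded as Authorization
)

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Is 9973 prime?"}],
)

# Thinking blocks are preserved in the Anthropic-format response; a caller
# using the OpenAI format would receive only the text blocks.
for block in resp.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print(block.text)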

Adding a New Provider

Providers that follow the OpenAI /chat/completions format (like xAI) can use the OpenAiAdapter directly — just add a [providers.name] table to the config. No code changes needed. For providers with a different wire format, implement the ProviderAdapter trait. See the Gateway Development guide for details on implementing a new adapter.

Provider Limitations

Not all endpoints are available for every provider. Feature support differs between OpenAI-compatible providers and Anthropic:
| Feature                                   | OpenAI / xAI | Anthropic |
|-------------------------------------------|--------------|-----------|
| Chat completions                          | Yes          | Yes       |
| Embeddings                                | Yes          | No        |
| Image generation                          | Yes          | No        |
| Audio transcription                       | Yes          | No        |
| Audio speech (TTS)                        | Yes          | No        |
| Files API                                 | Yes          | No        |
| Token counting                            | No           | Yes       |
| json_object / json_schema response format | Yes          | No        |

Requests for unsupported features return 400 Bad Request.
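
For example, an embeddings call through the gateway only succeeds when the model resolves to a provider that supports the endpoint; a sketch with the OpenAI Python SDK (placeholder gateway address):

import os
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",   # placeholder gateway address
    api_key=os.environ["OPENAI_API_KEY"],
)

# Routed to OpenAI, which supports the embeddings endpoint.
emb = client.embeddings.create(
    model="text-embedding-3-small",
    input="gateway routing test",
)
print(len(emb.data[0].embedding))

# The same call with a model that routes to Anthropic would be rejected
# by the gateway with 400 Bad Request.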