> **Documentation index:** fetch the complete index at https://to11.ai/docs/llms.txt to discover all available pages before exploring further.
# Providers
The gateway supports multiple LLM providers through a pluggable adapter system. Each provider is registered in the configuration file with its models and base URL.
## Supported Providers
Native adapters (dedicated request/response translation):
| Provider | Format | Streaming | Tool Calls | Structured Output |
|---|---|---|---|---|
| OpenAI | OpenAI chat completions | Fast-path SSE | Yes | json_object, json_schema |
| Anthropic | Anthropic messages | Normalized SSE | Yes (auto-translated) | text only |
| Cohere | Cohere v2 chat | Normalized SSE | — | — |
OpenAI-compatible (config-only, no code changes needed):
| Provider | Base URL | Streaming | Tool Calls | Structured Output |
|---|---|---|---|---|
| xAI | api.x.ai | Fast-path SSE | Yes | json_object, json_schema |
| Mistral | api.mistral.ai | Fast-path SSE | Yes | json_object |
| Groq | api.groq.com | Fast-path SSE | Yes | json_object |
| Together AI | api.together.xyz | Fast-path SSE | Yes | json_object |
| Fireworks AI | api.fireworks.ai | Fast-path SSE | Yes | json_object |
| DeepSeek | api.deepseek.com | Fast-path SSE | Yes | json_object |
| Perplexity | api.perplexity.ai | Fast-path SSE | — | — |
| OpenRouter | openrouter.ai | Fast-path SSE | Yes | json_object, json_schema |
| Cerebras | api.cerebras.ai | Fast-path SSE | Yes | json_object |
| SambaNova | api.sambanova.ai | Fast-path SSE | Yes | — |
| AI21 | api.ai21.com | Fast-path SSE | Yes | — |
| DeepInfra | api.deepinfra.com | Fast-path SSE | Yes | json_object |
| NVIDIA NIM | integrate.api.nvidia.com | Fast-path SSE | Yes | json_object |
| Nebius | api.studio.nebius.ai | Fast-path SSE | Yes | json_object |
| Novita AI | api.novita.ai | Fast-path SSE | Yes | — |
| Moonshot | api.moonshot.cn | Fast-path SSE | Yes | — |
Cloud platforms (uncomment in config, custom auth):
| Provider | Auth Type | Streaming | Tool Calls | Structured Output |
|---|---|---|---|---|
| Azure OpenAI | api-key header | Fast-path SSE | Yes | json_object, json_schema |
| Google Gemini | query_param | Fast-path SSE | Yes | json_object |
| AWS Bedrock | Bearer | Fast-path SSE | Yes | json_object |
Self-hosted / local (uncomment in config):
| Provider | Default URL | Streaming | Notes |
|---|---|---|---|
| Ollama | localhost:11434 | Fast-path SSE | No auth required |
| vLLM | localhost:8000 | Fast-path SSE | No auth required |
| Databricks | per-workspace | Fast-path SSE | Workspace-specific URL |
| HuggingFace | api-inference.huggingface.co | Fast-path SSE | TGI endpoints |
| Anyscale | api.endpoints.anyscale.com | Fast-path SSE | — |
| Cloudflare Workers AI | per-account | Fast-path SSE | Account ID in URL |
## Configuration
Each provider is registered as a named table under `[providers.*]` in `gateway.toml`. The gateway uses passthrough authentication: the caller's `Authorization` header is forwarded directly to the upstream provider.
### Native adapters

```toml
[providers.openai]
base_url = "https://api.openai.com/v1"
credential = "env::OPENAI_API_KEY"
models = ["gpt-4o", "gpt-4o-mini", "o1", "o3-mini", "text-embedding-3-small", "text-embedding-3-large", "tts-1", "tts-1-hd", "whisper-1", "dall-e-3", "dall-e-2"]
timeout_ms = 30000

[providers.anthropic]
base_url = "https://api.anthropic.com/v1"
credential = "env::ANTHROPIC_API_KEY"
models = ["claude-sonnet-4-6", "claude-haiku-4-5-20251001", "claude-opus-4-6"]
timeout_ms = 60000

[providers.cohere]
base_url = "https://api.cohere.com"
credential = "env::COHERE_API_KEY"
models = ["command-r-plus", "command-r", "command-light", "embed-english-v3.0", "embed-multilingual-v3.0", "rerank-english-v3.0", "rerank-multilingual-v3.0"]
timeout_ms = 30000
```
### OpenAI-compatible providers

These providers speak the OpenAI `/chat/completions` format. No code changes are needed: just add the config block and set the API key.

```toml
[providers.xai]
base_url = "https://api.x.ai/v1"
credential = "env::XAI_API_KEY"
models = ["grok-2", "grok-2-mini"]
timeout_ms = 30000

[providers.mistral]
base_url = "https://api.mistral.ai/v1"
credential = "env::MISTRAL_API_KEY"
models = ["mistral-large-latest", "mistral-medium-latest", "mistral-small-latest", "open-mistral-nemo", "mistral-embed"]
timeout_ms = 30000

[providers.groq]
base_url = "https://api.groq.com/openai/v1"
credential = "env::GROQ_API_KEY"
models = ["llama-3.3-70b-versatile", "llama-3.1-8b-instant", "mixtral-8x7b-32768", "gemma2-9b-it"]
timeout_ms = 15000

[providers.together]
base_url = "https://api.together.xyz/v1"
credential = "env::TOGETHER_API_KEY"
models = ["meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo", "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo", "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", "mistralai/Mixtral-8x22B-Instruct-v0.1"]
timeout_ms = 30000

[providers.fireworks]
base_url = "https://api.fireworks.ai/inference/v1"
credential = "env::FIREWORKS_API_KEY"
models = ["accounts/fireworks/models/llama-v3p1-405b-instruct", "accounts/fireworks/models/llama-v3p1-70b-instruct", "accounts/fireworks/models/mixtral-8x22b-instruct"]
timeout_ms = 30000

[providers.deepseek]
base_url = "https://api.deepseek.com/v1"
credential = "env::DEEPSEEK_API_KEY"
models = ["deepseek-chat", "deepseek-reasoner"]
timeout_ms = 60000

[providers.perplexity]
base_url = "https://api.perplexity.ai"
credential = "env::PERPLEXITY_API_KEY"
models = ["sonar", "sonar-pro", "sonar-reasoning", "sonar-reasoning-pro"]
timeout_ms = 30000

[providers.openrouter]
base_url = "https://openrouter.ai/api/v1"
credential = "env::OPENROUTER_API_KEY"
models = ["openai/gpt-4o", "anthropic/claude-sonnet-4-6", "google/gemini-2.5-pro-preview-06-05", "meta-llama/llama-3.1-405b-instruct"]
timeout_ms = 30000

[providers.cerebras]
base_url = "https://api.cerebras.ai/v1"
credential = "env::CEREBRAS_API_KEY"
models = ["llama-3.3-70b", "llama-3.1-8b"]
timeout_ms = 15000

[providers.sambanova]
base_url = "https://api.sambanova.ai/v1"
credential = "env::SAMBANOVA_API_KEY"
models = ["Meta-Llama-3.1-405B-Instruct", "Meta-Llama-3.1-70B-Instruct", "Meta-Llama-3.1-8B-Instruct"]
timeout_ms = 30000

[providers.ai21]
base_url = "https://api.ai21.com/studio/v1"
credential = "env::AI21_API_KEY"
models = ["jamba-1.5-large", "jamba-1.5-mini"]
timeout_ms = 30000

[providers.deepinfra]
base_url = "https://api.deepinfra.com/v1/openai"
credential = "env::DEEPINFRA_API_KEY"
models = ["meta-llama/Meta-Llama-3.1-405B-Instruct", "meta-llama/Meta-Llama-3.1-70B-Instruct", "mistralai/Mixtral-8x22B-Instruct-v0.1"]
timeout_ms = 30000

[providers.nvidia-nim]
base_url = "https://integrate.api.nvidia.com/v1"
credential = "env::NVIDIA_NIM_API_KEY"
models = ["meta/llama-3.1-405b-instruct", "meta/llama-3.1-70b-instruct"]
timeout_ms = 30000

[providers.nebius]
base_url = "https://api.studio.nebius.ai/v1"
credential = "env::NEBIUS_API_KEY"
models = ["meta-llama/Meta-Llama-3.1-405B-Instruct", "meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms = 30000

[providers.novita-ai]
base_url = "https://api.novita.ai/v3/openai"
credential = "env::NOVITA_AI_API_KEY"
models = ["meta-llama/llama-3.1-405b-instruct", "meta-llama/llama-3.1-70b-instruct"]
timeout_ms = 30000

[providers.moonshot]
base_url = "https://api.moonshot.cn/v1"
credential = "env::MOONSHOT_API_KEY"
models = ["moonshot-v1-128k", "moonshot-v1-32k", "moonshot-v1-8k"]
timeout_ms = 30000
```
### Cloud platforms

These require user-specific configuration (workspace URLs, account IDs). Replace the placeholder values before enabling.

```toml
[providers.azure-openai]
base_url = "https://YOUR_RESOURCE.openai.azure.com/openai"
credential = "env::AZURE_OPENAI_API_KEY"
auth_type = "api_key_header"
models = ["my-gpt4o-deployment"]
timeout_ms = 30000

[providers.google-gemini]
base_url = "https://generativelanguage.googleapis.com/v1beta/openai"
credential = "env::GEMINI_API_KEY"
auth_type = "query_param"
models = ["gemini-2.5-pro-preview-06-05", "gemini-2.5-flash-preview-05-20", "gemini-2.0-flash", "gemini-2.0-flash-lite", "text-embedding-004"]
timeout_ms = 30000

[providers.aws-bedrock]
base_url = "https://bedrock-mantle.us-east-1.api.aws/v1"
credential = "env::BEDROCK_API_KEY"
models = ["anthropic.claude-sonnet-4-6-v1", "anthropic.claude-haiku-4-5-20251001-v1", "meta.llama3-1-405b-instruct-v1"]
timeout_ms = 60000

[providers.databricks]
base_url = "https://YOUR_WORKSPACE.databricks.com/serving-endpoints"
credential = "env::DATABRICKS_TOKEN"
models = ["databricks-meta-llama-3-1-405b-instruct"]
timeout_ms = 30000

[providers.huggingface]
base_url = "https://api-inference.huggingface.co/v1"
credential = "env::HF_API_KEY"
models = ["meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms = 30000

[providers.anyscale]
base_url = "https://api.endpoints.anyscale.com/v1"
credential = "env::ANYSCALE_API_KEY"
models = ["meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms = 30000

[providers.cloudflare-ai]
base_url = "https://api.cloudflare.com/client/v4/accounts/YOUR_ACCOUNT_ID/ai/v1"
credential = "env::CLOUDFLARE_API_KEY"
models = ["@cf/meta/llama-3.1-70b-instruct"]
timeout_ms = 30000
```
### Self-hosted / local providers

```toml
[providers.ollama]
base_url = "http://localhost:11434/v1"
credential = "none" # Ollama doesn't require auth
models = ["llama3.1", "mistral", "codellama"]
timeout_ms = 60000

[providers.vllm]
base_url = "http://localhost:8000/v1"
credential = "none" # vLLM doesn't require auth
models = ["meta-llama/Meta-Llama-3.1-70B-Instruct"]
timeout_ms = 60000
```
## Model Routing

The gateway routes requests based on the `model` field in the request body. It looks up which provider registered that model name and forwards the request there.
### Automatic routing

```json
{ "model": "gpt-4o", "messages": [...] }
```

The gateway matches `gpt-4o` to the `openai` provider and routes accordingly.
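The lookup described above can be sketched roughly as follows. This is an illustrative Python rendering, not the gateway's actual internals; the `PROVIDERS` table mirrors the shape of the `[providers.*]` config:

```python
# Minimal sketch of model-based routing: find which provider registered a model.
PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1",
               "models": ["gpt-4o", "gpt-4o-mini"]},
    "anthropic": {"base_url": "https://api.anthropic.com/v1",
                  "models": ["claude-sonnet-4-6"]},
}

def route(model: str) -> str:
    """Return the name of the provider that registered `model`."""
    for name, cfg in PROVIDERS.items():
        if model in cfg["models"]:
            return name
    raise LookupError(f"no provider registered for model {model!r}")
```

If no provider has registered the model name, the lookup fails and the request cannot be routed.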
### Explicit provider hints

Force a specific provider with the `x-genai-provider` header:

```
x-genai-provider: anthropic
```

Or use a `provider::model` prefix:

```json
{ "model": "anthropic::claude-sonnet-4-6", "messages": [...] }
```

If both the header and the prefix are present, they must agree or the request is rejected with `400 Bad Request`.
## Format Translation

The gateway handles format translation automatically when the caller's SDK format differs from the upstream provider's format.
### OpenAI to Anthropic

When an OpenAI-format request is routed to Anthropic:

- Messages with `role: "system"` are extracted and sent as Anthropic's top-level `system` field
- `max_tokens` is forwarded directly
- `tools` definitions are translated from OpenAI `parameters` to Anthropic `input_schema`
- The response is translated back to the OpenAI `choices[].message` format
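The request-side steps above can be sketched roughly as follows. `openai_to_anthropic` is an illustrative name, and the default `max_tokens` is an assumption for the sketch, not gateway behavior:

```python
def openai_to_anthropic(req: dict) -> dict:
    """Sketch of the OpenAI -> Anthropic request translation: system
    messages move to the top-level `system` field, the rest pass through."""
    system = [m["content"] for m in req["messages"] if m["role"] == "system"]
    out = {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),  # Anthropic requires max_tokens
        "messages": [m for m in req["messages"] if m["role"] != "system"],
    }
    if system:
        out["system"] = "\n".join(system)
    return out
```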
### Anthropic to OpenAI

When an Anthropic-format request is routed to OpenAI:

- The top-level `system` field is converted to a `{"role": "system"}` message
- Content blocks are normalized
- The response is translated to the Anthropic `content[]` format
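A rough sketch of the reverse request translation, again with illustrative names and a simplified view of content-block normalization (only `text` blocks handled):

```python
def anthropic_to_openai(req: dict) -> dict:
    """Sketch of the Anthropic -> OpenAI request translation: the top-level
    `system` field becomes a system message; block content is flattened."""
    messages = []
    if "system" in req:
        messages.append({"role": "system", "content": req["system"]})
    for m in req["messages"]:
        content = m["content"]
        if isinstance(content, list):  # content blocks -> plain text
            content = "".join(b.get("text", "") for b in content)
        messages.append({"role": m["role"], "content": content})
    return {"model": req["model"], "messages": messages}
```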
### Streaming translation

| Client SDK | Upstream | Path | Why |
|---|---|---|---|
| OpenAI | OpenAI | Fast path | Wire formats match |
| OpenAI | xAI | Fast path | xAI uses OpenAI-compatible SSE |
| Anthropic | OpenAI | Normalized | Incompatible SSE wire formats |
| Anthropic | Anthropic | Normalized | Anthropic ingress uses normalized path |
Both paths produce identical telemetry and apply the same output guardrail checks.
## API Key Handling

The gateway uses passthrough authentication. Your `Authorization` header is forwarded directly to the upstream provider. The gateway does not read, store, or manage provider API keys.
## Extended Thinking

Anthropic extended thinking blocks (the `thinking` content type) are preserved in Anthropic-format responses. When a response is serialized to OpenAI format, thinking blocks are omitted, since OpenAI has no equivalent.
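The omission rule amounts to filtering by block type when flattening content for an OpenAI-format response. A minimal sketch with an illustrative helper name:

```python
def to_openai_text(content_blocks: list[dict]) -> str:
    """Flatten Anthropic content blocks to OpenAI message text,
    dropping `thinking` blocks, which have no OpenAI equivalent."""
    return "".join(b["text"] for b in content_blocks if b["type"] == "text")
```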
## Adding a New Provider

Providers that follow the OpenAI `/chat/completions` format (like xAI) can use the `OpenAiAdapter` directly: just add a `[providers.<name>]` table to the config. No code changes are needed.

For providers with a different wire format, implement the `ProviderAdapter` trait. See the Gateway Development guide for details on implementing a new adapter.
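Conceptually, the adapter contract pairs a request translation with a response translation. The Python sketch below only illustrates that shape; the gateway's actual `ProviderAdapter` trait is defined in its own codebase, and its real methods may differ:

```python
from abc import ABC, abstractmethod

class ProviderAdapter(ABC):
    """Illustrative rendering of an adapter contract: translate requests
    into the provider's wire format and responses back out of it."""

    @abstractmethod
    def translate_request(self, request: dict) -> dict: ...

    @abstractmethod
    def translate_response(self, response: dict) -> dict: ...

class PassthroughAdapter(ProviderAdapter):
    """Degenerate adapter for providers whose wire format already matches."""
    def translate_request(self, request: dict) -> dict:
        return request
    def translate_response(self, response: dict) -> dict:
        return response
```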
## Provider Limitations

Not all endpoints are available for every provider. Feature support varies by provider family:
| Feature | OpenAI / xAI | Anthropic |
|---|---|---|
| Chat completions | Yes | Yes |
| Embeddings | Yes | No |
| Image generation | Yes | No |
| Audio transcription | Yes | No |
| Audio speech (TTS) | Yes | No |
| Files API | Yes | No |
| Token counting | No | Yes |
| `json_object` / `json_schema` response format | Yes | No |
Requests for unsupported features return `400 Bad Request`.
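The gate can be pictured as a per-provider capability set; the feature names and `check_feature` helper below are illustrative, not the gateway's real identifiers:

```python
# Sketch of a capability gate matching the table above (names illustrative).
SUPPORTED = {
    "openai": {"chat", "embeddings", "images", "audio", "tts", "files"},
    "anthropic": {"chat", "token_counting"},
}

def check_feature(provider: str, feature: str) -> None:
    """Raise (the gateway would return 400 Bad Request) if `provider`
    does not support `feature`."""
    if feature not in SUPPORTED.get(provider, set()):
        raise ValueError(f"400 Bad Request: {feature!r} not supported by {provider!r}")
```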