Documentation Index

Fetch the complete documentation index at: https://to11.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

API Reference

The gateway exposes chat ingress endpoints, model discovery, embeddings, token counting, image generation, audio (transcription + TTS), and file management. Each chat endpoint accepts requests in its native SDK format, normalises them internally, and responds in the caller’s format.

Health check

GET /health  ->  200 OK

Endpoints

Endpoint                     Method  Description
/v1/chat/completions         POST    OpenAI / xAI chat completions
/v1/messages                 POST    Anthropic messages API
/v1/responses                POST    Canonical gateway endpoint (OpenAI / xAI / Anthropic)
/v1/models                   GET     List all configured models
/v1/models/{model_id}        GET     Get single model info
/v1/embeddings               POST    Embeddings (OpenAI-compatible providers only)
/v1/messages/count_tokens    POST    Token counting (Anthropic only)
/v1/images/generations       POST    Image generation (OpenAI-compatible providers only)
/v1/audio/transcriptions     POST    Audio transcription (OpenAI, multipart/form-data)
/v1/audio/speech             POST    Text-to-speech (OpenAI-compatible providers only)
/v1/files                    POST    File upload (OpenAI-compatible providers only)
/v1/files                    GET     List files
/v1/files/{file_id}          GET     Retrieve file metadata
/v1/files/{file_id}/content  GET     Download file content
/v1/files/{file_id}          DELETE  Delete a file
The model field determines which upstream provider handles the request. In addition to plain model names (e.g. gpt-4o), the model field supports namespace prefixes for explicit routing:
  • function::name — route to a named function defined in your gateway config
  • route::name — route to a named route
  • provider::model — route to a specific provider (e.g. anthropic::claude-sonnet-4-6)

Headers

Authorization

The gateway passes your API key through to the upstream provider. Use the Authorization header as you normally would with the provider:
Authorization: Bearer sk-...

SDK detection

The gateway auto-detects the caller’s SDK format from the endpoint. Override with:
x-genai-sdk: openai | anthropic | xai

Provider hints

Force routing to a specific provider:
x-genai-provider: openai | anthropic | xai
Or use a provider::model prefix in the model field:
{ "model": "anthropic::claude-sonnet-4-6", "messages": [...] }
If both header and prefix are present, they must agree or the request is rejected.

Chat completions

OpenAI format

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 256
  }'
Response:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Paris." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 14, "completion_tokens": 3, "total_tokens": 17 }
}

Anthropic format

curl http://localhost:4000/v1/messages \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Response:
{
  "id": "msg-abc123",
  "type": "message",
  "role": "assistant",
  "content": [{ "type": "text", "text": "Hello!" }],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 10, "output_tokens": 5 }
}

Cross-format routing

Send an OpenAI-format request to an Anthropic model — the gateway translates the request and returns an OpenAI-format response:
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
System messages are correctly forwarded: the gateway extracts {"role": "system"} messages and passes them as Anthropic’s top-level system field.
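The system-message hoisting can be pictured with this minimal sketch (a hypothetical helper, not the gateway's internals; joining multiple system messages with blank lines is an assumption):

```python
def to_anthropic_body(openai_messages: list[dict]) -> dict:
    """Lift OpenAI-style {"role": "system"} messages into Anthropic's
    top-level system field, leaving user/assistant turns in messages."""
    system_parts = [m["content"] for m in openai_messages if m["role"] == "system"]
    body = {"messages": [m for m in openai_messages if m["role"] != "system"]}
    if system_parts:
        body["system"] = "\n\n".join(system_parts)
    return body
```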

Streaming

Add "stream": true to any request to receive Server-Sent Events:
curl -N http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Count to 3"}],
    "stream": true
  }'
The gateway streams in the caller’s native SSE format. See Streaming for details on fast-path vs normalised-path behaviour.
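On the client side, each SSE event arrives as a data: line carrying a JSON chunk, and OpenAI-style streams terminate with data: [DONE]. A minimal parser for such lines might look like this (illustrative only; real clients should prefer the provider SDK or an SSE library):

```python
import json

def parse_sse_lines(lines):
    """Yield decoded JSON chunks from raw SSE lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and event:/comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)
```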

Structured output

The response_format field is supported for OpenAI models:
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "List 3 colors as JSON"}],
  "response_format": { "type": "json_object" }
}
Supported types: text, json_object, json_schema. Anthropic models do not support json_object or json_schema — requests with these formats routed to Anthropic return 400 Bad Request.

Tool calls

The tools field is forwarded to the upstream provider. For Anthropic models, tool definitions are automatically translated to the input_schema format:
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  }]
}

Models

List all configured models or retrieve a single model. These are synthetic responses built from the gateway config — no upstream call is made. No authentication required.
# List all models
curl http://localhost:4000/v1/models
{
  "object": "list",
  "data": [
    { "id": "gpt-4o", "object": "model", "created": 0, "owned_by": "openai" },
    { "id": "claude-sonnet-4-6", "object": "model", "created": 0, "owned_by": "anthropic" }
  ]
}
# Get a single model
curl http://localhost:4000/v1/models/gpt-4o
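Because the response is synthetic, it can be assembled directly from configuration. A sketch of that assembly, assuming a simple provider-to-models mapping (the gateway's actual config schema may differ):

```python
def build_models_response(config: dict[str, list[str]]) -> dict:
    """Build an OpenAI-style /v1/models list from a provider -> models map."""
    return {
        "object": "list",
        "data": [
            {"id": model_id, "object": "model", "created": 0, "owned_by": provider}
            for provider, models in config.items()
            for model_id in models
        ],
    }
```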

Embeddings

Passthrough proxy to OpenAI-compatible providers. The request is routed by model name. Anthropic models return 400 Bad Request.
curl -X POST http://localhost:4000/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "hello world"}'
Input text is scanned by the security pipeline (PII + blocklist) when security is enabled. Embedding model names must be listed in the provider’s models config array.

Token counting

Passthrough proxy to Anthropic’s /messages/count_tokens endpoint. Only Anthropic models are supported — OpenAI models return 400 Bad Request.
curl -X POST http://localhost:4000/v1/messages/count_tokens \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "How many tokens is this?"}]
  }'
Response: {"input_tokens": 14}

Image generation

Passthrough proxy to OpenAI-compatible providers’ DALL-E endpoint. Anthropic models return 400 Bad Request.
curl -X POST http://localhost:4000/v1/images/generations \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "dall-e-3", "prompt": "a white cat", "n": 1, "size": "1024x1024"}'
The prompt field is scanned by the security pipeline when enabled. Image model names must be listed in the provider’s models config array.

Audio transcription

Passthrough proxy to OpenAI’s Whisper endpoint. The request must be multipart/form-data containing an audio file. No security scanning is applied (binary audio input).
curl -X POST http://localhost:4000/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@recording.mp3 \
  -F model=whisper-1

Audio speech (TTS)

Passthrough proxy to OpenAI’s text-to-speech endpoint. Anthropic models return 400 Bad Request.
curl -X POST http://localhost:4000/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "input": "Hello world", "voice": "alloy"}' \
  --output speech.mp3
The input field is scanned by the security pipeline when enabled.

Files

Passthrough proxy to OpenAI’s file management endpoints. Supports upload (multipart), list, retrieve, download, and delete. Anthropic models return 400 Bad Request. No security scanning is applied.
# Upload a file
curl -X POST http://localhost:4000/v1/files \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F purpose=assistants \
  -F file=@data.jsonl

# List files
curl http://localhost:4000/v1/files \
  -H "Authorization: Bearer $OPENAI_API_KEY"

# Retrieve file metadata
curl http://localhost:4000/v1/files/file-abc123 \
  -H "Authorization: Bearer $OPENAI_API_KEY"

# Download file content
curl http://localhost:4000/v1/files/file-abc123/content \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  --output downloaded.jsonl

# Delete a file
curl -X DELETE http://localhost:4000/v1/files/file-abc123 \
  -H "Authorization: Bearer $OPENAI_API_KEY"

Error responses

Scenario                                                        Status
Unknown model                                                   404 Not Found
Security violation (input PII, blocklist, or output guardrail)  400 Bad Request
Provider returned an error                                      502 Bad Gateway
Unsupported response_format for provider                        400 Bad Request
Error body format:
{ "error": { "message": "<description>" } }