API Reference

Velahub API

Velahub is a unified gateway for Claude, GPT, Gemini, Voyage and DeepSeek. Point any OpenAI- or Anthropic-compatible SDK at the endpoint below and authenticate with a vh-live-... key — everything else stays the same.

Base URL

OpenAI-compatible SDKs

https://www.velahub.ai/v1

Anthropic SDK

https://www.velahub.ai

OpenAI-family SDKs append /chat/completions to the base URL, so they need the /v1 prefix. The Anthropic SDK appends /v1/messages itself, so its base URL is just the host.

The API is also reachable at the apex domain https://velahub.ai — handy for tools that don't follow redirects or that strip the Authorization header on a host change.
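
To make the base URL rule concrete, the sketch below joins each documented base URL with the path that the corresponding SDK family appends. The dictionary and function names are illustrative only and are not part of any Velahub SDK.

```python
# Base URLs and path suffixes exactly as described in the Base URL section.
BASE_URLS = {
    "openai": "https://www.velahub.ai/v1",
    "anthropic": "https://www.velahub.ai",
}
SDK_SUFFIXES = {
    "openai": "/chat/completions",   # appended by OpenAI-family SDKs
    "anthropic": "/v1/messages",     # appended by the Anthropic SDK
}


def resolved_url(dialect: str) -> str:
    """Return the full URL an SDK of the given dialect ends up calling."""
    return BASE_URLS[dialect].rstrip("/") + SDK_SUFFIXES[dialect]


print(resolved_url("openai"))
print(resolved_url("anthropic"))
```

Both dialects end up under /v1 on the same host, which is why only the OpenAI-style base URL carries the /v1 segment itself.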


Quickstart

1. Sign in and create an API key in the dashboard.
2. Copy the key; it starts with vh-live-. Treat it like a password and never commit it to source control.
3. Top up your wallet on the billing page. Calls deduct from this balance based on upstream token usage (see Models for current pricing).
4. Point your SDK's base URL at https://www.velahub.ai/v1 (OpenAI dialect) or https://www.velahub.ai (Anthropic dialect).
5. Make a request. Every response includes an x-velahub-request-id header and per-call cost / token headers (see Response headers below).

There is no separate org / project / region setup. One key works against every supported model. Use X-Velahub-* headers to opt into gateway features like fallback, max-price caps or BYOK.
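
For readers not using an SDK, the quickstart can be sketched with the Python standard library alone. The helper below only builds the request; sending it is left commented out because it performs a real, billed call. The function name and the VELAHUB_API_KEY environment variable are assumptions made for this example.

```python
import json
import urllib.request

OPENAI_BASE_URL = "https://www.velahub.ai/v1"  # OpenAI-dialect base URL from this page


def build_chat_request(api_key, model, prompt, extra_headers=None):
    """Build (but do not send) an OpenAI-dialect chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Optional gateway features are plain X-Velahub-* request headers.
    headers.update(extra_headers or {})
    return urllib.request.Request(
        f"{OPENAI_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Usage (hypothetical environment variable name; makes a real call):
# import os
# req = build_chat_request(os.environ["VELAHUB_API_KEY"], "gpt-4o", "hi")
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers["x-velahub-request-id"])
#     print(json.load(resp))
```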

Authentication

All API calls require a Bearer token in the Authorization header. The token is the API key you generated in the dashboard.

Production keys start with vh-live- and bill against the workspace wallet.
Account and management endpoints (inference-key management, usage, generations, token counting, BYOK) require a management key (vh-mgmt-...) rather than an inference API key; see Management keys.
Keys can be revoked at any time from the dashboard. Revocation takes effect within a few seconds.
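
Because the two key types are not interchangeable, a client can catch mix-ups before a request leaves the process. The following is a minimal sketch that recognises only the two prefixes named on this page; the helper names are hypothetical, and any other key prefix that may exist is treated as an error here.

```python
def key_kind(api_key: str) -> str:
    """Classify a key by its documented prefix."""
    if api_key.startswith("vh-live-"):
        return "inference"
    if api_key.startswith("vh-mgmt-"):
        return "management"
    raise ValueError("unrecognised key prefix")


def auth_header(api_key: str, *, management: bool = False) -> dict:
    """Return the Bearer header, refusing a key of the wrong kind."""
    expected = "management" if management else "inference"
    kind = key_kind(api_key)
    if kind != expected:
        raise ValueError(f"{expected} endpoint called with a {kind} key")
    return {"Authorization": f"Bearer {api_key}"}
```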

Models

Velahub routes to the model you specify by id. The same id works regardless of which dialect you call — under the hood the gateway translates between OpenAI- and Anthropic-style request shapes.

Anthropic — claude-opus-4-*, claude-sonnet-4-*, claude-haiku-4-*.
OpenAI — gpt-4o, gpt-4o-mini, o1, o3, dall-e-3, tts-1, whisper-1.
Google — gemini-2.5-pro, gemini-2.5-flash.
Voyage — embeddings such as voyage-3 and voyage-3-lite.
DeepSeek — deepseek-chat, deepseek-reasoner.

For the live catalogue, prices and capabilities call GET /v1/models or browse the Models page.
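
As a rough illustration of how the id families above map to providers, the sketch below matches a model id against the patterns listed on this page. It is a static snapshot for illustration only: the table is an assumption copied from the list above (the Voyage entry is explicitly non-exhaustive), and the live catalogue remains the authoritative source.

```python
from fnmatch import fnmatchcase

# Patterns copied from the list above; not a substitute for GET /v1/models.
FAMILIES = {
    "Anthropic": ["claude-opus-4-*", "claude-sonnet-4-*", "claude-haiku-4-*"],
    "OpenAI": ["gpt-4o", "gpt-4o-mini", "o1", "o3", "dall-e-3", "tts-1", "whisper-1"],
    "Google": ["gemini-2.5-pro", "gemini-2.5-flash"],
    "Voyage": ["voyage-3", "voyage-3-lite"],
    "DeepSeek": ["deepseek-chat", "deepseek-reasoner"],
}


def provider_for(model_id: str):
    """Return the provider whose pattern matches model_id, or None."""
    for provider, patterns in FAMILIES.items():
        if any(fnmatchcase(model_id, pattern) for pattern in patterns):
            return provider
    return None


print(provider_for("claude-sonnet-4-6"))
```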

Streaming

Pass "stream": true in the body to receive a Server-Sent Events stream. Velahub forwards upstream events unchanged so existing SSE clients keep working.

In addition, the gateway emits a final velahub.usage event right before the stream closes. It carries the canonical request id, the billed model and per-call cost and token counts:

event: velahub.usage
data: {"request_id":"...","model":"gpt-4o","cost_microcents":12345,"usage":{"input_tokens":120,"output_tokens":420,"cache_read_tokens":0,"cache_creation_tokens":0,"reasoning_tokens":0}}

Clients that don't care about cost can safely ignore unknown event types. The values match the x-velahub-* headers returned on non-streaming calls.
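
A client that does want the cost figures can pick the velahub.usage event out of the stream with a small Server-Sent Events parser. The sketch below works on an iterable of already-decoded text lines and assumes each event's JSON payload arrives in its data field; the function names and the sample request id are made up for the example.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, one leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)
    if data:
        # Lenient: also emit a trailing event that lacks its final blank line.
        yield event, "\n".join(data)


def extract_usage(lines):
    """Return the parsed velahub.usage payload, or None if it never arrives."""
    for event, data in iter_sse_events(lines):
        if event == "velahub.usage":
            return json.loads(data)
    return None


sample = [
    'data: {"choices":[{"delta":{"content":"hi"}}]}',
    "",
    "event: velahub.usage",
    'data: {"request_id":"req_example","model":"gpt-4o","cost_microcents":12345,'
    '"usage":{"input_tokens":120,"output_tokens":420}}',
    "",
]
usage = extract_usage(sample)
print(usage["cost_microcents"], usage["usage"]["output_tokens"])
```

Events with other names pass through iter_sse_events untouched, so the same loop can feed an existing delta handler.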

Routing

Every request is routed across providers by a quality × fair-share algorithm. You can steer it per API key (in the dashboard under Routing) or per request with the X-Velahub-Route header:

auto (default) — balances quality and price automatically.
cheapest — bias toward the lowest-priced channel for the model.
fastest — bias toward the lowest-latency channel.
pool:<slug> — pin requests to a specific line pool (Direct / Turbo / Economy).

Set a per-key default in the dashboard; the X-Velahub-Route header overrides it for a single call.

# Per-request override — route this call to the cheapest channel.
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Route: cheapest" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}'
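
When the route comes from configuration or user input, it can help to validate it before building the header. This is a sketch under stated assumptions: the accepted values are the four forms listed above, and the pool slug pattern (lowercase letters, digits and hyphens, for example pool:turbo) is a guess, since the exact slug syntax is not specified here.

```python
import re

_SIMPLE_ROUTES = {"auto", "cheapest", "fastest"}
# Assumption: pool slugs consist of lowercase letters, digits and hyphens.
_POOL_ROUTE = re.compile(r"pool:[a-z0-9][a-z0-9-]*")


def route_header(route: str = "auto") -> dict:
    """Return the X-Velahub-Route header for a documented route value."""
    if route in _SIMPLE_ROUTES or _POOL_ROUTE.fullmatch(route):
        return {"X-Velahub-Route": route}
    raise ValueError(f"unsupported route: {route!r}")


print(route_header("cheapest"))
```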

Max-price cap

Send X-Velahub-Max-Price-Microcents to reject a call before it runs if its estimated cost exceeds the cap. Useful for agents and long-running jobs where you want a hard ceiling per call.

Unit: microcents. 1,000,000 = 1 cent, 100,000,000 = $1.
The estimate uses your declared max_tokens and the upstream price for the model.
A capped call fails fast with HTTP 403 and never bills.

# Cap this call at 5 cents (5,000,000 microcents).
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Max-Price-Microcents: 5000000" \
  -d '{"model":"gpt-4o","max_tokens":1024,"messages":[{"role":"user","content":"hi"}]}'
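
Microcent values are easy to get wrong by a few zeros, so a small conversion helper may be worthwhile. The sketch below uses Decimal to avoid floating-point rounding and follows the unit definition above (100,000,000 microcents per dollar); the function names are illustrative.

```python
from decimal import Decimal

MICROCENTS_PER_CENT = 1_000_000
MICROCENTS_PER_USD = 100 * MICROCENTS_PER_CENT  # 100,000,000


def usd_to_microcents(usd: str) -> int:
    """Convert a dollar amount given as a string (e.g. "0.05") to microcents."""
    return int(Decimal(usd) * MICROCENTS_PER_USD)


def microcents_to_usd(microcents: int) -> Decimal:
    """Convert microcents (e.g. a cost header value) back to dollars."""
    return Decimal(microcents) / MICROCENTS_PER_USD


def max_price_header(usd_cap: str) -> dict:
    """Build the max-price header for a per-call cap expressed in dollars."""
    return {"X-Velahub-Max-Price-Microcents": str(usd_to_microcents(usd_cap))}


print(max_price_header("0.05"))
```

Passing amounts as strings keeps values such as "0.05" exact, which a float literal would not guarantee.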

Fallback models

Send a comma-separated X-Velahub-Fallback-Models header to have the gateway retry the call against the next model in the list if the primary model returns 5xx or is unavailable.

Order matters. The gateway tries each model left-to-right.
Only transient upstream failures (5xx, timeout, overload) trigger the fallback. 4xx errors from the primary are returned as-is.
The actually-billed model is reported in x-velahub-billed-model.
Streaming responses fall back only if the failure happens before the first byte is sent — once the stream starts it cannot be retried.

curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Fallback-Models: gpt-4o, gpt-4o-mini" \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"hi"}]}'
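
The fallback rules can be illustrated with a small client-side simulation. This is not the gateway's implementation, only a model of the behaviour described above: models are tried left to right, only transient failures move on to the next model, and 4xx responses are returned as-is. The call argument is a hypothetical stand-in that returns an HTTP status per model (a timeout or overload would surface as a 5xx in this model), and what happens when every model fails is an assumption.

```python
def parse_fallback_header(value: str) -> list:
    """Split a comma-separated X-Velahub-Fallback-Models value into model ids."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_transient(status: int) -> bool:
    """Treat any 5xx status as a transient upstream failure."""
    return 500 <= status <= 599


def run_with_fallback(primary: str, fallbacks: list, call):
    """Try primary, then each fallback in order; return (billed_model, status).

    `call(model)` is a hypothetical callable returning an HTTP status code.
    """
    chain = [primary, *fallbacks]
    status = None
    for model in chain:
        status = call(model)
        if not is_transient(status):
            # Success, or a 4xx that is returned without trying further models.
            return model, status
    # Assumption: if every model fails, the last failure is what gets reported.
    return chain[-1], status


statuses = {"primary-model": 503, "gpt-4o": 200}
print(run_with_fallback(
    "primary-model",
    parse_fallback_header("gpt-4o, gpt-4o-mini"),
    lambda model: statuses.get(model, 500),
))
```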

Auto-translate

Send X-Velahub-Translate: 1 to have the gateway translate non-English user messages into English before forwarding to the model, then translate the reply back to the user's language.

Detection is automatic — English content is passed through unchanged.
System prompts are never modified.
Streaming is supported; deltas are buffered into sentence-level chunks before translation.
Tool / function calls and structured-output JSON are preserved verbatim.

# Translate user content while keeping tool calls intact.
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Translate: 1" \
  -d '{
    "model":"claude-sonnet-4-6",
    "stream":true,
    "messages":[{"role":"user","content":"weather in SF?"}],
    "tools":[{"type":"function","function":{"name":"get_weather",
              "parameters":{"type":"object","properties":{"city":{"type":"string"}}}}}]
  }'
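
To give a feel for what sentence-level buffering means for a streaming client, here is a rough sketch that regroups arbitrary text deltas into whole sentences. It is an illustration of the idea only, not the gateway's algorithm: the sentence boundary rule (., ! or ? followed by whitespace) is an assumption and will mis-split abbreviations.

```python
import re

# Assumption: a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(deltas):
    """Buffer streamed text deltas and yield one complete sentence at a time."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _SENTENCE_BREAK.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip():
        # Flush whatever is left when the stream ends.
        yield buffer.strip()


for chunk in sentence_chunks(["It is sun", "ny. Bring ", "a hat! Enjoy"]):
    print(chunk)
```

The practical consequence is that translated text reaches the client in sentence-sized bursts rather than token by token.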

Bring your own key (BYOK)

Already have a contract with OpenAI, Anthropic or another upstream? Upload your key to Velahub and route traffic through it with X-Velahub-BYOK: 1. You still get Velahub's logging, headers and dashboards, but the upstream bill goes to your account.

Keys are encrypted at rest and only decrypted in memory for the call.
Velahub charges a flat per-call gateway fee for BYOK requests; no margin on tokens.
You can upload one key per provider per workspace and rotate it at any time.

# Upload a key (one-time, via a management key — see Management keys).
curl https://www.velahub.ai/v1/byok/keys \
  -H "Authorization: Bearer vh-mgmt-..." \
  -H "Content-Type: application/json" \
  -d '{"provider":"openai","api_key":"sk-...","label":"my OpenAI"}'

# Use the uploaded key on a regular API call.
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-BYOK: 1" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}'

Management keys

Management keys (vh-mgmt-...) authenticate the account / management API programmatically — manage inference keys, query usage / generations, and upload BYOK keys — without exposing your browser session. Mint and revoke them in the dashboard under Management keys.

A management key authenticates the account endpoints only; it cannot make LLM calls — use a regular API key (vh-live-...) for inference.
A regular API key (vh-live-...) calls models but cannot manage your account.

# Create a new inference API key with a management key.
curl https://www.velahub.ai/v1/management-keys \
  -H "Authorization: Bearer vh-mgmt-..." \
  -H "Content-Type: application/json" \
  -d '{"name":"new inference key"}'

# Query your recent usage with a management key.
curl https://www.velahub.ai/v1/usage/recent \
  -H "Authorization: Bearer vh-mgmt-..."

Response headers

Every response — streaming or not — carries a uniform set of Velahub headers so you can attribute cost, debug routing and reconcile billing without parsing the body.

x-velahub-request-id        # Canonical id; quote it in support requests.
x-velahub-cost-microcents   # Total cost of this call in microcents.
x-velahub-tokens-input
x-velahub-tokens-output
x-velahub-tokens-cache-read
x-velahub-tokens-cache-creation
x-velahub-tokens-reasoning
x-velahub-billed-model      # May differ from requested model if fallback fired.

On streaming calls the cost / token headers are sent with the initial response, then refined in the final velahub.usage SSE event.
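
Collecting these headers into one typed record makes cost attribution and logging simpler. The sketch below accepts any mapping of header names to values; it assumes the numeric headers hold base-10 integers and treats absent ones as 0, neither of which is stated above. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallStats:
    request_id: str
    billed_model: str
    cost_microcents: int
    tokens_input: int
    tokens_output: int
    tokens_cache_read: int
    tokens_cache_creation: int
    tokens_reasoning: int


def parse_velahub_headers(headers) -> CallStats:
    """Build a CallStats record from a mapping of response headers."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}

    def number(suffix: str) -> int:
        # Assumption: integer-valued headers; a missing header counts as 0.
        return int(h.get(f"x-velahub-{suffix}", 0))

    return CallStats(
        request_id=h.get("x-velahub-request-id", ""),
        billed_model=h.get("x-velahub-billed-model", ""),
        cost_microcents=number("cost-microcents"),
        tokens_input=number("tokens-input"),
        tokens_output=number("tokens-output"),
        tokens_cache_read=number("tokens-cache-read"),
        tokens_cache_creation=number("tokens-cache-creation"),
        tokens_reasoning=number("tokens-reasoning"),
    )
```

Comparing billed_model with the model that was requested is a quick way to detect that a fallback fired.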

Rate limits & errors

Velahub passes upstream rate limits through and adds a thin layer of its own based on your plan. Limits and current usage are reported on the dashboard. The gateway uses standard HTTP status codes; the error field in the JSON body always carries a human-readable reason.

Common error responses:

401: missing or invalid API key. Check that the Authorization header has the form Bearer vh-live-... (or Bearer vh-mgmt-... on management endpoints).
402: wallet balance is zero or below the per-call minimum. Top up on the billing page.
403 with error.code = max_price_exceeded: the estimated cost is above your X-Velahub-Max-Price-Microcents cap.
403 with error.code = quota_exceeded: workspace daily / monthly cap reached.
404: unknown model id, or unknown request id on the generations endpoint.
429: upstream or gateway rate limit hit. Retry after the interval given in the Retry-After header.
5xx: upstream is unhealthy. If you set X-Velahub-Fallback-Models, the gateway automatically retries the request against the next model in the list.
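
The status codes above map naturally onto a small client-side decision function. The sketch below is one possible shape, with several assumptions called out: the error body is taken to be {"error": {"code": ...}} (inferred from the error.code notation), Retry-After is read as a number of seconds with a default delay when it is missing or not numeric, and the action names are invented for the example.

```python
def retry_delay(headers, default: float = 1.0) -> float:
    """Read Retry-After as seconds; fall back to `default` if absent or not numeric."""
    value = {k.lower(): v for k, v in headers.items()}.get("retry-after")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default


def next_step(status: int, headers=None, body=None):
    """Map a gateway error response to (action, retry_delay_or_None)."""
    headers, body = headers or {}, body or {}
    error = body.get("error")
    # Assumption: error is an object carrying a machine-readable "code".
    code = error.get("code") if isinstance(error, dict) else None
    if status == 401:
        return "check_api_key", None
    if status == 402:
        return "top_up_wallet", None
    if status == 403 and code == "max_price_exceeded":
        return "raise_price_cap", None
    if status == 403 and code == "quota_exceeded":
        return "wait_for_quota_reset", None
    if status == 404:
        return "check_model_or_request_id", None
    if status == 429:
        return "retry", retry_delay(headers)
    if 500 <= status <= 599:
        return "retry_or_use_fallback", None
    return "unhandled", None


print(next_step(429, {"Retry-After": "2"}))
```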