API Reference

Velahub API

Velahub is a unified gateway for Claude, GPT, Gemini, Voyage and DeepSeek. Point any OpenAI- or Anthropic-compatible SDK at the endpoint below and authenticate with a vh-live-... key — everything else stays the same.

Base URL
  • OpenAI-compatible SDKs: https://www.velahub.ai/v1
  • Anthropic SDK: https://www.velahub.ai

OpenAI-family SDKs append /chat/completions to the base URL, so they need the /v1 prefix. The Anthropic SDK appends /v1/messages itself, so its base URL is just the host.
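
To make the difference between the two base URLs concrete, here is a minimal Python sketch of how each SDK family arrives at its final endpoint. The helper name endpoint_url is hypothetical and only illustrates the joining; the hosts and paths are the ones named above.

```python
def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and a path the way an SDK would (hypothetical helper)."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# OpenAI-style SDKs append /chat/completions, so the base must already end in /v1.
OPENAI_BASE = "https://www.velahub.ai/v1"
# The Anthropic SDK appends /v1/messages itself, so the base is the bare host.
ANTHROPIC_BASE = "https://www.velahub.ai"

openai_url = endpoint_url(OPENAI_BASE, "/chat/completions")
anthropic_url = endpoint_url(ANTHROPIC_BASE, "/v1/messages")

print(openai_url)
print(anthropic_url)
```

Both calls end up under /v1 on the same host; only the place where the prefix is added differs.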

The API is also reachable at the apex domain https://velahub.ai — handy for tools that don't follow redirects or that strip the Authorization header on a host change.

Quickstart

  1. Sign in and create an API key in the dashboard.
  2. Copy the key — it starts with vh-live-. Treat it like a password and never commit it to source control.
  3. Top up your wallet on the billing page. Calls deduct from this balance based on upstream token usage — see Models for current pricing.
  4. Point your SDK's base URL at https://www.velahub.ai/v1 (OpenAI dialect) or https://www.velahub.ai (Anthropic dialect).
  5. Make a request. Every response includes an x-velahub-request-id header and per-call cost / token headers — see Response headers below.

There is no separate org / project / region setup. One key works against every supported model. Use X-Velahub-* headers to opt into gateway features like fallback, max-price caps or BYOK.

Authentication

All API calls require a Bearer token in the Authorization header. The token is the API key you generated in the dashboard.

  • Production keys start with vh-live- and bill against the workspace wallet.
  • Account and management endpoints (managing inference keys, usage, generations, token counting, BYOK) authenticate with a management key (vh-mgmt-...), not an inference API key. See Management keys.
  • Keys can be revoked at any time from the dashboard. Revocation takes effect within a few seconds.
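
As an illustration of the two key types, the following Python sketch builds the request headers and rejects a key whose prefix does not match the kind of endpoint being called. The function name auth_headers and the client-side prefix check are assumptions made for this example; the gateway performs its own validation.

```python
def auth_headers(api_key: str, management: bool = False) -> dict:
    """Build Bearer-token headers, checking the key prefix on the client side.

    Hypothetical helper: inference calls expect a vh-live- key,
    management calls expect a vh-mgmt- key.
    """
    expected_prefix = "vh-mgmt-" if management else "vh-live-"
    if not api_key.startswith(expected_prefix):
        raise ValueError(f"expected a key starting with {expected_prefix!r}")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Placeholder keys for illustration only; never hard-code real keys.
inference_headers = auth_headers("vh-live-example")
management_headers = auth_headers("vh-mgmt-example", management=True)
print(inference_headers["Authorization"])
```

In real code the key would typically come from an environment variable or a secrets manager rather than a literal.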

Models

Velahub routes to the model you specify by id. The same id works regardless of which dialect you call — under the hood the gateway translates between OpenAI- and Anthropic-style request shapes.

  • Anthropic: claude-opus-4-*, claude-sonnet-4-*, claude-haiku-4-*.
  • OpenAI: gpt-4o, gpt-4o-mini, o1, o3, dall-e-3, tts-1, whisper-1.
  • Google: gemini-2.5-pro, gemini-2.5-flash.
  • Voyage: embeddings such as voyage-3 and voyage-3-lite.
  • DeepSeek: deepseek-chat, deepseek-reasoner.

For the live catalogue, prices and capabilities call GET /v1/models or browse the Models page.

Streaming

Pass "stream": true in the body to receive a Server-Sent Events stream. Velahub forwards upstream events unchanged so existing SSE clients keep working.

In addition, the gateway emits a final velahub.usage event right before the stream closes. It carries the canonical request id, the billed model, and the per-call cost and token counts (the payload is wrapped across lines here for readability):

event: velahub.usage
data: {"request_id":"...","model":"gpt-4o","cost_microcents":12345,
       "usage":{"input_tokens":120,"output_tokens":420,
                "cache_read_tokens":0,"cache_creation_tokens":0,"reasoning_tokens":0}}

Clients that don't care about cost can safely ignore unknown event types. The values match the x-velahub-* headers returned on non-streaming calls.
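
A client that does want the cost data has to pick the velahub.usage event out of the stream. The Python sketch below shows one way to do that over already-decoded SSE lines. The function names, the sample request id and the sample chunk are illustrative assumptions, and the parser handles only the event and data fields.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def _field_value(line: str, field: str) -> str:
    """Return the value after 'field:', dropping one optional leading space."""
    value = line[len(field) + 1:]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from decoded SSE lines."""
    event_type, data_parts = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates one event
            if data_parts:
                yield event_type, "\n".join(data_parts)
            event_type, data_parts = "message", []
        elif line.startswith("event:"):
            event_type = _field_value(line, "event")
        elif line.startswith("data:"):
            data_parts.append(_field_value(line, "data"))
        # comment lines and other fields are ignored in this sketch
    if data_parts:  # lenient: flush an event not followed by a blank line
        yield event_type, "\n".join(data_parts)


def extract_usage(lines: Iterable[str]) -> Optional[dict]:
    """Return the parsed velahub.usage payload, or None if it never arrives."""
    for event_type, data in iter_sse_events(lines):
        if event_type == "velahub.usage":
            return json.loads(data)
    return None


sample_stream = [
    'data: {"choices":[{"delta":{"content":"hi"}}]}',
    "",
    "event: velahub.usage",
    'data: {"request_id":"req_example","model":"gpt-4o","cost_microcents":12345,'
    '"usage":{"input_tokens":120,"output_tokens":420}}',
    "",
]
usage = extract_usage(sample_stream)
print(usage["cost_microcents"], usage["usage"]["output_tokens"])
```

Unknown event types simply fall through the loop, which matches the advice that clients can ignore events they do not recognise.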

Routing

Every request is routed across providers by a quality × fair-share algorithm. You can steer it per API key (in the dashboard under Routing) or per request with the X-Velahub-Route header:

  • auto (default) — balances quality and price automatically.
  • cheapest — bias toward the lowest-priced channel for the model.
  • fastest — bias toward the lowest-latency channel.
  • pool:<slug> — pin requests to a specific line pool (Direct / Turbo / Economy).

Set a per-key default in the dashboard; the X-Velahub-Route header overrides it for a single call.

# Per-request override — route this call to the cheapest channel.
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Route: cheapest" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}'
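
The precedence between the per-key default and the per-request header can be sketched in a few lines of Python. This is a client-side model of the documented behaviour, not the gateway's implementation. The function names are hypothetical, and the pattern used for pool slugs (lowercase letters, digits and hyphens) is an assumption, as is the example slug.

```python
import re
from typing import Optional

FIXED_ROUTES = {"auto", "cheapest", "fastest"}
# Assumption: pool slugs consist of lowercase letters, digits and hyphens.
POOL_ROUTE = re.compile(r"pool:[a-z0-9-]+")


def is_valid_route(route: str) -> bool:
    """True for auto, cheapest, fastest, or a pool:<slug> value."""
    return route in FIXED_ROUTES or POOL_ROUTE.fullmatch(route) is not None


def effective_route(key_default: Optional[str], header: Optional[str]) -> str:
    """The header wins over the per-key default; 'auto' applies if neither is set."""
    route = header or key_default or "auto"
    if not is_valid_route(route):
        raise ValueError(f"unsupported route: {route!r}")
    return route


print(effective_route(None, None))
print(effective_route("fastest", "cheapest"))
print(effective_route("pool:economy", None))  # hypothetical slug
```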

Max-price cap

Send X-Velahub-Max-Price-Microcents to reject a call before it runs if its estimated cost exceeds the cap. Useful for agents and long-running jobs where you want a hard ceiling per call.

  • Unit: microcents. 1,000,000 = 1 cent, 100,000,000 = $1.
  • The estimate uses your declared max_tokens and the upstream price for the model.
  • A capped call fails fast with HTTP 403 and never bills.

# Cap this call at 5 cents (5,000,000 microcents).
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Max-Price-Microcents: 5000000" \
  -d '{"model":"gpt-4o","max_tokens":1024,"messages":[{"role":"user","content":"hi"}]}'
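
Because microcents are easy to get wrong by a factor of 100, a small conversion helper can be useful. The Python sketch below uses the unit definition given above (1,000,000 microcents = 1 cent); the function names are hypothetical, and Decimal is used to avoid floating-point rounding.

```python
from decimal import Decimal

MICROCENTS_PER_CENT = 1_000_000
MICROCENTS_PER_DOLLAR = 100 * MICROCENTS_PER_CENT  # 100,000,000


def dollars_to_microcents(dollars: str) -> int:
    """Convert a dollar amount given as a string (e.g. '0.05') to microcents."""
    return int(Decimal(dollars) * MICROCENTS_PER_DOLLAR)


def microcents_to_dollars(microcents: int) -> Decimal:
    """Convert microcents back to dollars without float rounding."""
    return Decimal(microcents) / MICROCENTS_PER_DOLLAR


def max_price_header(dollars: str) -> dict:
    """Build the max-price header for a per-call ceiling expressed in dollars."""
    return {"X-Velahub-Max-Price-Microcents": str(dollars_to_microcents(dollars))}


print(max_price_header("0.05"))
print(microcents_to_dollars(12345))
```

Passing the amount as a string keeps values such as 0.05 exact, which a float literal would not guarantee.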

Fallback models

Send a comma-separated X-Velahub-Fallback-Models header to have the gateway retry the call against the next model in the list if the primary model returns 5xx or is unavailable.

  • Order matters. The gateway tries each model left-to-right.
  • Only transient upstream failures (5xx, timeout, overload) trigger the fallback. 4xx errors from the primary are returned as-is.
  • The actually-billed model is reported in x-velahub-billed-model.
  • Streaming responses fall back only if the failure happens before the first byte is sent; once the stream has started, the call cannot be retried.

curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Fallback-Models: gpt-4o, gpt-4o-mini" \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"hi"}]}'
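
The fallback rules can be modelled client-side to reason about which model ends up serving a call. The Python sketch below is a simplified simulation, not the gateway's code. It considers HTTP status codes only (timeouts and overload signals are left out), and the names fallback_header, is_transient and resolve are hypothetical.

```python
from typing import Callable, List, Tuple


def fallback_header(models: List[str]) -> dict:
    """Build the comma-separated fallback header; order is significant."""
    return {"X-Velahub-Fallback-Models": ", ".join(models)}


def is_transient(status: int) -> bool:
    """Only 5xx responses count as transient in this simplified model."""
    return 500 <= status <= 599


def resolve(primary: str, fallbacks: List[str],
            status_for: Callable[[str], int]) -> Tuple[str, int]:
    """Try models left to right and stop at the first non-transient result.

    status_for stands in for an upstream call and returns an HTTP status.
    A 4xx from any model is returned as-is; if every model fails with 5xx,
    the last attempt is returned.
    """
    attempt = (primary, status_for(primary))
    for model in fallbacks:
        if not is_transient(attempt[1]):
            break
        attempt = (model, status_for(model))
    return attempt


# Simulated upstream health: the primary is down, the first fallback is up.
statuses = {"gpt-5": 503, "gpt-4o": 200, "gpt-4o-mini": 200}
served = resolve("gpt-5", ["gpt-4o", "gpt-4o-mini"], statuses.__getitem__)
print(fallback_header(["gpt-4o", "gpt-4o-mini"]))
print(served)
```

In this simulation the model in the returned tuple plays the role of the value reported in x-velahub-billed-model.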

Auto-translate

Send X-Velahub-Translate: 1 to have the gateway translate non-English user messages into English before forwarding to the model, then translate the reply back to the user's language.

  • Detection is automatic — English content is passed through unchanged.
  • System prompts are never modified.
  • Streaming is supported; deltas are buffered into sentence-level chunks before translation.
  • Tool / function calls and structured-output JSON are preserved verbatim.

# Translate a non-English user message while keeping tool calls intact.
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Translate: 1" \
  -d '{
    "model":"claude-sonnet-4-6",
    "stream":true,
    "messages":[{"role":"user","content":"¿Qué tiempo hace en San Francisco?"}],
    "tools":[{"type":"function","function":{"name":"get_weather",
              "parameters":{"type":"object","properties":{"city":{"type":"string"}}}}}]
  }'

Bring your own key (BYOK)

Already have a contract with OpenAI, Anthropic or another upstream? Upload your key to Velahub and route traffic through it with X-Velahub-BYOK: 1. You still get Velahub's logging, headers and dashboards, but the upstream bill goes to your account.

  • Keys are encrypted at rest and only decrypted in memory for the call.
  • Velahub charges a flat per-call gateway fee for BYOK requests; no margin on tokens.
  • You can upload one key per provider per workspace and rotate it at any time.

# Upload a key (one-time, with a management key; see Management keys).
curl https://www.velahub.ai/v1/byok/keys \
  -H "Authorization: Bearer vh-mgmt-..." \
  -H "Content-Type: application/json" \
  -d '{"provider":"openai","api_key":"sk-...","label":"my OpenAI"}'

# Use the uploaded key on a regular API call.
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-BYOK: 1" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}'

Management keys

Management keys (vh-mgmt-...) authenticate the account / management API programmatically — manage inference keys, query usage / generations, and upload BYOK keys — without exposing your browser session. Mint and revoke them in the dashboard under Management keys.

  • A management key authenticates the account endpoints only; it cannot make LLM calls — use a regular API key (vh-live-...) for inference.
  • A regular API key (vh-live-...) calls models but cannot manage your account.

# Create a new inference API key with a management key.
curl https://www.velahub.ai/v1/management-keys \
  -H "Authorization: Bearer vh-mgmt-..." \
  -H "Content-Type: application/json" \
  -d '{"name":"new inference key"}'

# Query your recent usage with a management key.
curl https://www.velahub.ai/v1/usage/recent \
  -H "Authorization: Bearer vh-mgmt-..."

Response headers

Every response — streaming or not — carries a uniform set of Velahub headers so you can attribute cost, debug routing and reconcile billing without parsing the body.

x-velahub-request-id        # Canonical id; quote it in support requests.
x-velahub-cost-microcents   # Total cost of this call in microcents.
x-velahub-tokens-input
x-velahub-tokens-output
x-velahub-tokens-cache-read
x-velahub-tokens-cache-creation
x-velahub-tokens-reasoning
x-velahub-billed-model      # May differ from requested model if fallback fired.

On streaming calls the cost / token headers are sent with the initial response, then refined in the final velahub.usage SSE event.
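
For cost attribution it is often convenient to turn these headers into a typed record. The Python sketch below does that for a plain header mapping. The class and function names are hypothetical, and it assumes the numeric headers are base-10 integers, treating a missing header as 0.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CallMetadata:
    request_id: str
    billed_model: str
    cost_microcents: int
    tokens_input: int
    tokens_output: int
    tokens_cache_read: int
    tokens_cache_creation: int
    tokens_reasoning: int


def parse_velahub_headers(headers: Mapping[str, str]) -> CallMetadata:
    """Collect the x-velahub-* response headers into one record."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}

    def count(suffix: str) -> int:
        # Assumption: numeric headers are plain integers; missing means 0.
        return int(h.get(f"x-velahub-{suffix}", "0"))

    return CallMetadata(
        request_id=h.get("x-velahub-request-id", ""),
        billed_model=h.get("x-velahub-billed-model", ""),
        cost_microcents=count("cost-microcents"),
        tokens_input=count("tokens-input"),
        tokens_output=count("tokens-output"),
        tokens_cache_read=count("tokens-cache-read"),
        tokens_cache_creation=count("tokens-cache-creation"),
        tokens_reasoning=count("tokens-reasoning"),
    )


# Sample values for illustration only.
sample_headers = {
    "X-Velahub-Request-Id": "req_example",
    "X-Velahub-Billed-Model": "gpt-4o-mini",
    "X-Velahub-Cost-Microcents": "12345",
    "X-Velahub-Tokens-Input": "120",
    "X-Velahub-Tokens-Output": "420",
}
meta = parse_velahub_headers(sample_headers)
print(meta.billed_model, meta.cost_microcents)
```

Comparing meta.billed_model with the requested model is a simple way to detect that a fallback fired.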

Rate limits & errors

Velahub passes upstream rate limits through and adds a thin layer of its own based on your plan. Limits and current usage are reported on the dashboard. The gateway uses standard HTTP status codes; the error field in the JSON body always carries a human-readable reason, and some errors also carry a machine-readable error.code, as listed below.

Common error responses:

  • 401: missing or invalid API key. Check that the Authorization header is Bearer vh-live-... (or Bearer vh-mgmt-... on management endpoints).
  • 402: wallet balance is zero or below the per-call minimum. Top up on the billing page.
  • 403 with error.code = max_price_exceeded: the estimated cost is above your X-Velahub-Max-Price-Microcents cap.
  • 403 with error.code = quota_exceeded: the workspace daily or monthly cap has been reached.
  • 404: unknown model id, or unknown request id on the generations endpoint.
  • 429: upstream or gateway rate limit hit. Retry after the interval given in the Retry-After header.
  • 5xx: upstream is unhealthy. If you set X-Velahub-Fallback-Models, the gateway automatically retries the request against the next model in the list.
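
The error list above maps naturally onto a small client-side decision function. The Python sketch below is one possible mapping, not an official client. The action labels and the one-second default delay are assumptions, and a Retry-After value is parsed only when it is a number of seconds (the header may also carry an HTTP date, which this sketch does not handle).

```python
from typing import Optional, Tuple

DEFAULT_DELAY_SECONDS = 1.0  # assumption: fallback delay when none is given


def next_action(status: int, error_code: Optional[str] = None,
                retry_after: Optional[str] = None) -> Tuple[str, float]:
    """Map a gateway response to (action, delay_seconds)."""
    if status == 429:
        try:
            delay = float(retry_after) if retry_after is not None else DEFAULT_DELAY_SECONDS
        except ValueError:
            delay = DEFAULT_DELAY_SECONDS  # e.g. an HTTP-date value
        return "retry", delay
    if 500 <= status <= 599:
        return "retry_or_use_fallback", DEFAULT_DELAY_SECONDS
    if status == 401:
        return "check_api_key", 0.0
    if status == 402:
        return "top_up_wallet", 0.0
    if status == 403 and error_code == "max_price_exceeded":
        return "raise_cap_or_lower_max_tokens", 0.0
    if status == 403 and error_code == "quota_exceeded":
        return "wait_for_quota_reset", 0.0
    if status == 404:
        return "check_model_or_request_id", 0.0
    return "fail", 0.0


print(next_action(429, retry_after="7"))
print(next_action(403, error_code="max_price_exceeded"))
print(next_action(503))
```

Only the 429 and 5xx branches suggest retrying; the other responses indicate a problem that repeating the same request will not fix.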