Velahub API
Velahub is a unified gateway for Claude, GPT, Gemini, Voyage and DeepSeek. Point any OpenAI- or Anthropic-compatible SDK at the endpoint below and authenticate with a vh-live-... key — everything else stays the same.
OpenAI dialect: https://www.velahub.ai/v1
Anthropic dialect: https://www.velahub.ai
OpenAI-family SDKs append /chat/completions to the base URL, so they need the /v1 prefix. The Anthropic SDK appends /v1/messages itself, so its base URL is just the host.
The API is also reachable at the apex domain https://velahub.ai — handy for tools that don't follow redirects or that strip the Authorization header on a host change.
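The difference between the two base URLs is easiest to see by composing the final request URL for each dialect. The sketch below does only that; the helper name `request_url` is hypothetical, and the suffixes are the ones each SDK family is described as appending.

```python
# Base URLs as documented; the suffix is what each SDK family appends itself.
BASE_URLS = {
    "openai": "https://www.velahub.ai/v1",
    "anthropic": "https://www.velahub.ai",
}
SDK_SUFFIXES = {
    "openai": "/chat/completions",
    "anthropic": "/v1/messages",
}


def request_url(dialect: str) -> str:
    """Return the full URL an SDK of the given dialect ends up calling."""
    return BASE_URLS[dialect].rstrip("/") + SDK_SUFFIXES[dialect]


print(request_url("openai"))     # https://www.velahub.ai/v1/chat/completions
print(request_url("anthropic"))  # https://www.velahub.ai/v1/messages
```

Both dialects end up under /v1 on the same host; only the place where the prefix is added differs.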
Quickstart
- Sign in and create an API key in the dashboard.
- Copy the key. It starts with vh-live-. Treat it like a password and never commit it to source control.
- Top up your wallet on the billing page. Calls deduct from this balance based on upstream token usage; see Models for current pricing.
- Point your SDK's base URL at https://www.velahub.ai/v1 (OpenAI dialect) or https://www.velahub.ai (Anthropic dialect).
- Make a request. Every response includes an x-velahub-request-id header and per-call cost / token headers; see Response headers below.
There is no separate org / project / region setup. One key works against every supported model. Use X-Velahub-* headers to opt into gateway features like fallback, max-price caps or BYOK.
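As a minimal sketch of the final quickstart step, the snippet below builds an OpenAI-dialect chat request with Python's standard library. The helper name `build_chat_request` is hypothetical, and the key is left as the elided placeholder. The request is constructed but not sent, so the sketch runs without a real key or network access.

```python
import json
import urllib.request


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the gateway."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        "https://www.velahub.ai/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("vh-live-...", "gpt-4o", "hi")

# To actually send it, swap in a real key and uncomment:
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("x-velahub-request-id"))
#     print(json.load(resp))
```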
Authentication
All API calls require a Bearer token in the Authorization header. The token is the API key you generated in the dashboard.
- Production keys start with vh-live- and bill against the workspace wallet.
- Account / management endpoints (manage inference keys, usage, generations, token counting, BYOK) authenticate with a management key (vh-mgmt-...), not an inference API key; see Management keys.
- Keys can be revoked at any time from the dashboard. Revocation takes effect within a few seconds.
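Because the two key types are distinguished by prefix, a client can catch a mismatched key before a request is sent. The sketch below is one way to do that; `key_kind` and `auth_header` are hypothetical helper names, not part of any Velahub SDK.

```python
def key_kind(key: str) -> str:
    """Classify a key by its documented prefix."""
    if key.startswith("vh-live-"):
        return "inference"
    if key.startswith("vh-mgmt-"):
        return "management"
    return "unknown"


def auth_header(key: str, expected: str) -> dict:
    """Return the Bearer header, refusing a key of the wrong kind."""
    kind = key_kind(key)
    if kind != expected:
        raise ValueError(f"expected a {expected} key, got a {kind} key")
    return {"Authorization": f"Bearer {key}"}


print(auth_header("vh-live-...", expected="inference"))
# {'Authorization': 'Bearer vh-live-...'}
```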
Models
Velahub routes to the model you specify by id. The same id works regardless of which dialect you call — under the hood the gateway translates between OpenAI- and Anthropic-style request shapes.
- Anthropic: claude-opus-4-*, claude-sonnet-4-*, claude-haiku-4-*.
- OpenAI: gpt-4o, gpt-4o-mini, o1, o3, dall-e-3, tts-1, whisper-1.
- Google: gemini-2.5-pro, gemini-2.5-flash.
- Voyage: embeddings such as voyage-3 and voyage-3-lite.
- DeepSeek: deepseek-chat, deepseek-reasoner.
For the live catalogue, prices and capabilities call GET /v1/models or browse the Models page.
Streaming
Pass "stream": true in the body to receive a Server-Sent Events stream. Velahub forwards upstream events unchanged so existing SSE clients keep working.
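For clients without an SSE library, a small parser is enough to consume the stream. The sketch below groups raw lines into (event, data) pairs following the standard SSE framing. The sample chunks assume the OpenAI chat-completions streaming shape (choices[0].delta.content, terminated by [DONE]); other upstreams emit different payloads.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Group raw SSE lines into (event_name, data) pairs.

    A blank line ends an event; several data: lines are joined with newlines.
    """
    event: Optional[str] = None
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = None, []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if data:  # stream ended without a trailing blank line
        yield event, "\n".join(data)


# Sample stream in the assumed OpenAI chunk shape.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}', "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}', "",
    "data: [DONE]", "",
]

text = ""
for event, data in iter_sse_events(sample):
    if data == "[DONE]":
        break
    text += json.loads(data)["choices"][0]["delta"].get("content", "")
print(text)  # Hello
```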
In addition, the gateway emits a final velahub.usage event right before the stream closes. It carries the canonical request id, the billed model and per-call cost and token counts:
event: velahub.usage
data: {"request_id":"...","model":"gpt-4o","cost_microcents":12345,
  "usage":{"input_tokens":120,"output_tokens":420,
  "cache_read_tokens":0,"cache_creation_tokens":0,"reasoning_tokens":0}}
Clients that don't care about cost can safely ignore unknown event types. The values match the x-velahub-* headers returned on non-streaming calls.
Routing
Every request is routed across providers by a quality × fair-share algorithm. You can steer it per API key (in the dashboard under Routing) or per request with the X-Velahub-Route header:
- auto (default) — balances quality and price automatically.
- cheapest — bias toward the lowest-priced channel for the model.
- fastest — bias toward the lowest-latency channel.
- pool:<slug> — pin requests to a specific line pool (Direct / Turbo / Economy).
Set a per-key default in the dashboard; the X-Velahub-Route header overrides it for a single call.
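A client that sets the route per call may want to validate the value first. The sketch below accepts the three named routes and the pool:<slug> form. The slug pattern (lowercase letters, digits, hyphens) and the example slug `turbo` are assumptions, since the exact slug format is not specified here.

```python
import re

NAMED_ROUTES = {"auto", "cheapest", "fastest"}
# Assumption: pool slugs are lowercase letters, digits and hyphens.
POOL_ROUTE = re.compile(r"pool:[a-z0-9-]+")


def route_header(route: str = "auto") -> dict:
    """Return the X-Velahub-Route header for a single call."""
    if route in NAMED_ROUTES or POOL_ROUTE.fullmatch(route):
        return {"X-Velahub-Route": route}
    raise ValueError(f"unsupported route: {route!r}")


print(route_header("cheapest"))    # {'X-Velahub-Route': 'cheapest'}
print(route_header("pool:turbo"))  # {'X-Velahub-Route': 'pool:turbo'}
```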
# Per-request override: route this call to the cheapest channel.
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Route: cheapest" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}'
Max-price cap
Send X-Velahub-Max-Price-Microcents to reject a call before it runs if its estimated cost exceeds the cap. Useful for agents and long-running jobs where you want a hard ceiling per call.
- Unit: microcents. 1,000,000 = 1 cent, 100,000,000 = $1.
- The estimate uses your declared max_tokens and the upstream price for the model.
- A capped call fails fast with HTTP 403 and never bills.
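The unit conversion is easy to get wrong by a factor of 100, so a worked sketch may help. The conversion constants follow the list above. The estimate formula and the per-token prices are assumptions for illustration only: the gateway's exact formula is not documented here, and the prices are placeholders, not Velahub's.

```python
MICROCENTS_PER_CENT = 1_000_000
MICROCENTS_PER_DOLLAR = 100 * MICROCENTS_PER_CENT  # 100,000,000


def dollars_to_microcents(dollars: float) -> int:
    return round(dollars * MICROCENTS_PER_DOLLAR)


def microcents_to_dollars(microcents: int) -> float:
    return microcents / MICROCENTS_PER_DOLLAR


def estimate_microcents(input_tokens: int, max_tokens: int,
                        input_usd_per_mtok: float, output_usd_per_mtok: float) -> int:
    """Hypothetical worst-case estimate: prompt tokens at the input price plus
    max_tokens at the output price. $1 per million tokens = 100 microcents per token."""
    return round(input_tokens * input_usd_per_mtok * 100
                 + max_tokens * output_usd_per_mtok * 100)


cap = dollars_to_microcents(0.05)  # a 5 cent cap
estimate = estimate_microcents(120, 1024, input_usd_per_mtok=2.5, output_usd_per_mtok=10.0)
print(cap)              # 5000000
print(estimate)         # 1054000
print(estimate <= cap)  # True
```

Under this assumed formula, lowering max_tokens is the most direct way to bring a call under the cap.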
# Cap this call at 5,000,000 microcents (5 cents).
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Max-Price-Microcents: 5000000" \
  -d '{"model":"gpt-4o","max_tokens":1024,"messages":[{"role":"user","content":"hi"}]}'
Fallback models
Send a comma-separated X-Velahub-Fallback-Models header to have the gateway retry the call against the next model in the list if the primary model returns 5xx or is unavailable.
- Order matters. The gateway tries each model left-to-right.
- Only transient upstream failures (5xx, timeout, overload) trigger the fallback. 4xx errors from the primary are returned as-is.
- The actually-billed model is reported in x-velahub-billed-model.
- Streaming responses fall back only if the failure happens before the first byte is sent; once the stream starts it cannot be retried.
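The rules above can be simulated client-side to see which model would end up serving a call. This is a sketch of the documented behaviour, not the gateway's implementation; `call_with_fallback` and the `send` callback are hypothetical stand-ins for one upstream attempt.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def parse_fallback_header(value: str) -> List[str]:
    """Split a comma-separated X-Velahub-Fallback-Models value into model ids."""
    return [m.strip() for m in value.split(",") if m.strip()]


def call_with_fallback(
    primary: str,
    fallbacks: Sequence[str],
    send: Callable[[str], int],
) -> Tuple[str, Optional[int]]:
    """Try models left to right and return (model, status) of the last attempt.

    `send` returns an HTTP status code or raises TimeoutError.
    """
    model, status = primary, None
    for model in [primary, *fallbacks]:
        try:
            status = send(model)
        except TimeoutError:
            status = None  # a timeout counts as a transient failure
            continue
        if not 500 <= status <= 599:
            return model, status  # success or 4xx: returned as-is, no fallback
    return model, status  # every model failed transiently


# Simulated upstream: the primary is down, the first fallback answers.
statuses = {"gpt-5": 503, "gpt-4o": 200, "gpt-4o-mini": 200}
result = call_with_fallback(
    "gpt-5", parse_fallback_header("gpt-4o, gpt-4o-mini"), statuses.__getitem__
)
print(result)  # ('gpt-4o', 200)
```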
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Fallback-Models: gpt-4o, gpt-4o-mini" \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"hi"}]}'
Auto-translate
Send X-Velahub-Translate: 1 to have the gateway translate non-English user messages into English before forwarding to the model, then translate the reply back to the user's language.
- Detection is automatic — English content is passed through unchanged.
- System prompts are never modified.
- Streaming is supported; deltas are buffered into sentence-level chunks before translation.
- Tool / function calls and structured-output JSON are preserved verbatim.
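Sentence-level buffering explains why translated streams arrive in larger pieces than raw token deltas. The sketch below shows one way such buffering could work: it accumulates deltas and releases text only at sentence boundaries. It is an illustration under that assumption, not Velahub's implementation, and the boundary rule (., ! or ? followed by whitespace) is deliberately simplistic.

```python
import re
from typing import Iterable, Iterator

# A sentence ends at ., ! or ? followed by whitespace (simplified rule).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(deltas: Iterable[str]) -> Iterator[str]:
    """Buffer streamed deltas and yield complete sentences."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    if buffer.strip():
        yield buffer  # flush whatever remains when the stream ends


deltas = ["Il fait ", "beau. Pre", "nez un ", "parapluie! ", "Merci"]
print(list(sentence_chunks(deltas)))
# ['Il fait beau.', 'Prenez un parapluie!', 'Merci']
```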
# Translate user content while keeping tool calls intact.
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Translate: 1" \
  -d '{
    "model":"claude-sonnet-4-6",
    "stream":true,
    "messages":[{"role":"user","content":"weather in SF?"}],
    "tools":[{"type":"function","function":{"name":"get_weather",
      "parameters":{"type":"object","properties":{"city":{"type":"string"}}}}}]
  }'
Bring your own key (BYOK)
Already have a contract with OpenAI, Anthropic or another upstream? Upload your key to Velahub and route traffic through it with X-Velahub-BYOK: 1. You still get Velahub's logging, headers and dashboards, but the upstream bill goes to your account.
- Keys are encrypted at rest and only decrypted in memory for the call.
- Velahub charges a flat per-call gateway fee for BYOK requests; no margin on tokens.
- You can upload one key per provider per workspace and rotate it at any time.
# Upload a key (one-time, via a management key; see Management keys).
curl https://www.velahub.ai/v1/byok/keys \
  -H "Authorization: Bearer vh-mgmt-..." \
  -H "Content-Type: application/json" \
  -d '{"provider":"openai","api_key":"sk-...","label":"my OpenAI"}'
# Use the uploaded key on a regular API call.
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-BYOK: 1" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hi"}]}'
Management keys
Management keys (vh-mgmt-...) authenticate the account / management API programmatically — manage inference keys, query usage / generations, and upload BYOK keys — without exposing your browser session. Mint and revoke them in the dashboard under Management keys.
- A management key authenticates the account endpoints only; it cannot make LLM calls. Use a regular API key (vh-live-...) for inference.
- A regular API key (vh-live-...) calls models but cannot manage your account.
# Create a new inference API key with a management key.
curl https://www.velahub.ai/v1/management-keys \
  -H "Authorization: Bearer vh-mgmt-..." \
  -H "Content-Type: application/json" \
  -d '{"name":"new inference key"}'
# Query your recent usage with a management key.
curl https://www.velahub.ai/v1/usage/recent \
  -H "Authorization: Bearer vh-mgmt-..."
Response headers
Every response — streaming or not — carries a uniform set of Velahub headers so you can attribute cost, debug routing and reconcile billing without parsing the body.
x-velahub-request-id # Canonical id; quote it in support requests.
x-velahub-cost-microcents # Total cost of this call in microcents.
x-velahub-tokens-input
x-velahub-tokens-output
x-velahub-tokens-cache-read
x-velahub-tokens-cache-creation
x-velahub-tokens-reasoning
x-velahub-billed-model # May differ from requested model if fallback fired.
On streaming calls the cost / token headers are sent with the initial response, then refined in the final velahub.usage SSE event.
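For cost attribution, the headers can be folded into one record and, on streaming calls, overlaid with the final velahub.usage payload. The sketch below assumes that header values are decimal integer strings and that each token header maps to the same-named field in the usage event. The request id `req_example` and the header numbers are placeholders.

```python
import json

# Assumed mapping from header names to velahub.usage field names.
HEADER_FIELDS = {
    "x-velahub-cost-microcents": "cost_microcents",
    "x-velahub-tokens-input": "input_tokens",
    "x-velahub-tokens-output": "output_tokens",
    "x-velahub-tokens-cache-read": "cache_read_tokens",
    "x-velahub-tokens-cache-creation": "cache_creation_tokens",
    "x-velahub-tokens-reasoning": "reasoning_tokens",
}


def usage_from_headers(headers: dict) -> dict:
    """Collect the Velahub headers into one flat usage record."""
    lowered = {k.lower(): v for k, v in headers.items()}
    usage = {
        "request_id": lowered.get("x-velahub-request-id"),
        "model": lowered.get("x-velahub-billed-model"),
    }
    for header, field in HEADER_FIELDS.items():
        if header in lowered:
            usage[field] = int(lowered[header])
    return usage


def refine_with_usage_event(usage: dict, event_data: str) -> dict:
    """Overlay the final velahub.usage payload onto header-derived values."""
    payload = json.loads(event_data)
    refined = dict(usage)
    refined.update({k: v for k, v in payload.items() if k != "usage"})
    refined.update(payload.get("usage", {}))
    return refined


headers = {
    "X-Velahub-Request-Id": "req_example",
    "X-Velahub-Billed-Model": "gpt-4o",
    "X-Velahub-Cost-Microcents": "12000",
    "X-Velahub-Tokens-Input": "120",
    "X-Velahub-Tokens-Output": "400",
}
event_data = (
    '{"request_id":"req_example","model":"gpt-4o","cost_microcents":12345,'
    '"usage":{"input_tokens":120,"output_tokens":420}}'
)

initial = usage_from_headers(headers)
final = refine_with_usage_event(initial, event_data)
print(initial["cost_microcents"], final["cost_microcents"])  # 12000 12345
```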
Rate limits & errors
Velahub passes upstream rate limits through and adds a thin layer of its own based on your plan. Limits and current usage are reported on the dashboard. The gateway uses standard HTTP status codes; the error field in the JSON body always carries a human-readable reason.
Common error responses:
- 401: missing or invalid API key. Check that the Authorization header is Bearer vh-live-....
- 402: wallet balance is zero or below the per-call minimum. Top up on the billing page.
- 403 with error.code = max_price_exceeded: the estimated cost is above your X-Velahub-Max-Price-Microcents cap.
- 403 with error.code = quota_exceeded: workspace daily / monthly cap reached.
- 404: unknown model id or unknown request id on the generations endpoint.
- 429: upstream or gateway rate limit hit. Retry after the Retry-After header value.
- 5xx: upstream is unhealthy. The same request will be retried automatically if you set X-Velahub-Fallback-Models.
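A client can turn these responses into a small decision table. The sketch below maps a status code (plus error.code and Retry-After where relevant) to a suggested action. The action names are hypothetical labels, and the one-second default delay is an assumption for the case where Retry-After is missing or is not a number of seconds.

```python
from typing import Optional, Tuple


def next_action(
    status: int,
    error_code: Optional[str] = None,
    retry_after: Optional[str] = None,
) -> Tuple[str, Optional[float]]:
    """Return (action, delay_seconds) for a gateway response."""
    if 200 <= status < 300:
        return ("ok", None)
    if status == 401:
        return ("check_api_key", None)
    if status == 402:
        return ("top_up_wallet", None)
    if status == 403:
        if error_code == "max_price_exceeded":
            return ("raise_cap_or_lower_max_tokens", None)
        if error_code == "quota_exceeded":
            return ("wait_for_quota_reset", None)
        return ("forbidden", None)
    if status == 404:
        return ("check_model_or_request_id", None)
    if status == 429:
        try:
            delay = float(retry_after)
        except (TypeError, ValueError):
            delay = 1.0  # assumed default when the header is absent or a date
        return ("retry", delay)
    if 500 <= status <= 599:
        return ("retry_or_fallback", None)
    return ("unexpected", None)


print(next_action(429, retry_after="3"))                  # ('retry', 3.0)
print(next_action(403, error_code="max_price_exceeded"))  # ('raise_cap_or_lower_max_tokens', None)
```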