Velahub — Unified AI model gateway

SDK examples

OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    api_key="vh-live-...",
    base_url="https://www.velahub.ai/v1",
)
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role":"user","content":"hello"}],
)
print(resp.choices[0].message.content)

Anthropic Python SDK

from anthropic import Anthropic

client = Anthropic(
    api_key="vh-live-...",
    base_url="https://www.velahub.ai",
)
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role":"user","content":"hello"}],
)
print(resp.content[0].text)

Streaming with curl

curl -N https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","stream":true,"messages":[{"role":"user","content":"write a poem"}]}'
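With "stream": true the reply arrives as server-sent events. Here is a minimal sketch of reassembling the text, assuming the standard OpenAI chat.completion.chunk shape (choices[].delta.content) and the data: [DONE] terminator. collect_chat_stream is a hypothetical helper name.

```python
import json
from typing import Iterable


def collect_chat_stream(lines: Iterable[str]) -> str:
    """Join the text deltas of an OpenAI-style SSE chat stream (first choice only)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI dialect
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if choice.get("index", 0) == 0 and content:
                parts.append(content)
    return "".join(parts)
```

Feed it the decoded lines of the response body. With the official OpenAI Python SDK you can skip this step, because passing stream=True to client.chat.completions.create already yields parsed chunks.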

Multimodal

Image, audio and embedding endpoints follow the standard OpenAI shape and are billed per upstream call.

# Image generation
curl https://www.velahub.ai/v1/images/generations \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"dall-e-3","prompt":"a cat","n":1,"size":"1024x1024"}'

# Text-to-speech
curl https://www.velahub.ai/v1/audio/speech \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -o speech.mp3 \
  -d '{"model":"tts-1","input":"hello world","voice":"alloy"}'

# Speech-to-text (Whisper)
curl https://www.velahub.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer vh-live-..." \
  -H "X-Velahub-Model: whisper-1" \
  -F file=@audio.mp3 \
  -F response_format=json

# Embeddings
curl https://www.velahub.ai/v1/embeddings \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"voyage-3","input":"hello world"}'

Set X-Velahub-Model when the request body can't carry a model field (for example multipart uploads to /v1/audio/transcriptions).
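A minimal sketch of that rule as a header builder, assuming JSON requests always carry model in the body. velahub_request_headers is a hypothetical helper, not part of any Velahub SDK. For multipart uploads the HTTP client generates Content-Type itself (it must include the boundary), so the helper leaves it out.

```python
from typing import Optional


def velahub_request_headers(api_key: str, *, json_body: bool, model: Optional[str] = None) -> dict:
    """Build request headers for a Velahub call (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if json_body:
        # JSON bodies carry "model" themselves; only the content type is needed.
        headers["Content-Type"] = "application/json"
    elif model is None:
        raise ValueError("non-JSON requests need a model for X-Velahub-Model")
    else:
        # Multipart uploads: name the model in a header instead of the body.
        headers["X-Velahub-Model"] = model
    return headers
```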

Claude Code

Anthropic's claude CLI reads ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN from the environment — set both to point it at Velahub.

# Point Claude Code at Velahub
export ANTHROPIC_BASE_URL="https://www.velahub.ai"
export ANTHROPIC_AUTH_TOKEN="vh-live-..."

# Verify it works
claude --version
claude "hello, who am I talking to?"
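If you'd rather not export the variables in your shell profile, a launcher can set them for the claude process only. This is a minimal sketch, and velahub_claude_env is a hypothetical helper name.

```python
import os


def velahub_claude_env(api_key: str) -> dict:
    """Return a copy of the current environment that points Claude Code at Velahub."""
    env = dict(os.environ)  # copy, so the parent process is left untouched
    env["ANTHROPIC_BASE_URL"] = "https://www.velahub.ai"  # bare host, no /v1
    env["ANTHROPIC_AUTH_TOKEN"] = api_key
    return env
```

For example, subprocess.run(["claude", "--version"], env=velahub_claude_env("vh-live-...")) runs the CLI with both variables set without exporting them globally.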

Velahub speaks the full Anthropic /v1/messages dialect, including tools, vision and streaming, so every Claude Code feature works unchanged.

Claude Code can burn through tokens fast on long sessions. Pair it with an X-Velahub-Max-Price-Microcents cap (see below) to bound the cost of any single call.

Codex / OpenAI CLI

The official codex and openai CLIs both read OPENAI_API_KEY and OPENAI_BASE_URL. Set them once and every command flows through Velahub.

export OPENAI_API_KEY="vh-live-..."
export OPENAI_BASE_URL="https://www.velahub.ai/v1"

# Codex
codex --model gpt-4o "refactor this function"

# Plain OpenAI CLI
openai api chat.completions.create -m gpt-4o-mini -g user "say hi"

Any tool built on top of the OpenAI Python or Node SDK (Aider, sgpt, llm, …) works the same way — just set the same two environment variables.
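If a tool seems to bypass Velahub, check what it resolves from the environment. Below is a simplified sketch of the lookup the OpenAI Python SDK performs: it falls back to https://api.openai.com/v1 when OPENAI_BASE_URL is unset. Individual tools may layer their own flags or config files on top, and resolve_openai_config is a hypothetical name.

```python
import os
from typing import Mapping, Optional, Tuple

OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_openai_config(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (base_url, api_key) as an OpenAI-SDK-based tool would see them."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Unset or empty OPENAI_BASE_URL means traffic goes straight to OpenAI.
    base_url = env.get("OPENAI_BASE_URL") or OPENAI_DEFAULT_BASE_URL
    return base_url.rstrip("/"), api_key  # trailing slash trimmed for easy comparison
```

A quick check is resolve_openai_config()[0] == "https://www.velahub.ai/v1".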

VS Code · Cline

Cline is a VS Code agent extension. Configure it to talk to Velahub via the OpenAI-compatible dialect:

1. Install the Cline extension in VS Code.
2. Open the Cline sidebar and click the settings gear icon.
3. Set API Provider to OpenAI Compatible and fill in:
   - Base URL: https://www.velahub.ai/v1
   - API Key: your vh-live-... key
   - Model ID: any Velahub model id, e.g. gpt-4o or claude-sonnet-4-6
4. Save and start a new task; Cline now talks to Velahub.

Watch the per-call cost on the recent generations page if you're iterating on prompts.

VS Code · Continue

Continue lets you mix providers in one config. Set apiBase per model — use the /v1 base for provider: openai entries and the bare host for provider: anthropic entries.

// ~/.continue/config.json
{
  "models": [
    {
      "title": "Claude Sonnet 4.6 (via Velahub)",
      "provider": "anthropic",
      "model": "claude-sonnet-4-6",
      "apiKey": "vh-live-...",
      "apiBase": "https://www.velahub.ai"
    },
    {
      "title": "GPT-4o (via Velahub)",
      "provider": "openai",
      "model": "gpt-4o",
      "apiKey": "vh-live-...",
      "apiBase": "https://www.velahub.ai/v1"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Tab Autocomplete",
    "provider": "openai",
    "model": "gpt-4o-mini",
    "apiKey": "vh-live-...",
    "apiBase": "https://www.velahub.ai/v1"
  }
}

tabAutocompleteModel is called on every keystroke, so prefer a small / cheap model like gpt-4o-mini there.
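The per-provider base URL rule is easy to get wrong when a config lists many models. Here is a small generator sketch for config entries. The function names are hypothetical, and only the two providers used above are handled.

```python
VELAHUB_HOST = "https://www.velahub.ai"


def velahub_api_base(provider: str) -> str:
    """apiBase for a Continue model entry, following the rule above."""
    if provider == "openai":
        return f"{VELAHUB_HOST}/v1"  # OpenAI-dialect entries use the /v1 base
    if provider == "anthropic":
        return VELAHUB_HOST  # Anthropic-dialect entries use the bare host
    raise ValueError(f"no Velahub base URL rule for provider {provider!r}")


def continue_model(title: str, provider: str, model: str, api_key: str) -> dict:
    """One entry for the "models" list in ~/.continue/config.json."""
    return {
        "title": title,
        "provider": provider,
        "model": model,
        "apiKey": api_key,
        "apiBase": velahub_api_base(provider),
    }
```

Passing json.dumps({"models": [continue_model(...), ...]}, indent=2) produces entries shaped like the config above.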

Zed

Zed's assistant supports custom api_url per provider block. Configure both openai and anthropic to route through Velahub:

// ~/.config/zed/settings.json
{
  "language_models": {
    "openai": {
      "api_url": "https://www.velahub.ai/v1",
      "available_models": [
        { "name": "gpt-4o",      "max_tokens": 128000 },
        { "name": "gpt-4o-mini", "max_tokens": 128000 }
      ]
    },
    "anthropic": {
      "api_url": "https://www.velahub.ai",
      "available_models": [
        { "name": "claude-sonnet-4-6", "max_tokens": 200000 },
        { "name": "claude-haiku-4-5",  "max_tokens": 200000 }
      ]
    }
  },
  "assistant": {
    "default_model": {
      "provider": "anthropic",
      "model":    "claude-sonnet-4-6"
    },
    "version": "2"
  }
}

Restart Zed after editing settings.json. The assistant picker will then list every model you declared under available_models.
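To add these blocks to an existing settings.json without retyping it, a merge script can help. This is a minimal sketch that assumes the file is plain JSON. Zed's settings file may contain comments, which json.loads rejects, so strip them first or edit by hand. Only one model per provider is listed to keep it short, and the function names are hypothetical.

```python
import json
from pathlib import Path

VELAHUB_ZED_PROVIDERS = {
    "openai": {
        "api_url": "https://www.velahub.ai/v1",
        "available_models": [{"name": "gpt-4o", "max_tokens": 128000}],
    },
    "anthropic": {
        "api_url": "https://www.velahub.ai",
        "available_models": [{"name": "claude-sonnet-4-6", "max_tokens": 200000}],
    },
}


def merge_velahub_providers(settings: dict) -> dict:
    """Return a copy of Zed settings with the Velahub provider blocks merged in."""
    merged = json.loads(json.dumps(settings))  # deep copy via a JSON round trip
    models = merged.setdefault("language_models", {})
    for provider, block in VELAHUB_ZED_PROVIDERS.items():
        # update() replaces api_url and available_models but keeps other keys.
        models.setdefault(provider, {}).update(block)
    return merged


def update_settings_file(path: Path) -> None:
    settings = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(merge_velahub_providers(settings), indent=2) + "\n")
```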

Cursor

Cursor's Models setting accepts a custom OpenAI base URL. Use Velahub there to route Cursor's chat / Cmd-K / Composer traffic through your wallet.

1. Open Cursor Settings → Models.
2. Scroll to OpenAI API Key, paste your vh-live-... key, and click Verify.
3. Expand OpenAI Base URL (under the key field), enable the toggle and set the URL to https://www.velahub.ai/v1.
4. Under Model Names, enable the OpenAI-compatible models you want (e.g. gpt-4o, gpt-4o-mini).
5. Disable Cursor's built-in models or set the Velahub-routed ones as default. That's it.

Cursor's Background agents and Privacy mode still apply on top of Velahub. Make sure your team's policy allows routing through a third-party gateway before enabling this in shared workspaces.

Anthropic models in Cursor go through Cursor's own infra and can't be redirected — only the OpenAI-compatible slot is user-configurable. Use Claude Code or Continue if you need Velahub-routed Claude inside an editor.

Curl recipes

When something looks off, drop down to curl. The recipes below mirror what the SDKs do under the hood — they're the fastest way to confirm the gateway, your key and a given model are all healthy.

Health check

curl -s https://www.velahub.ai/healthz | jq
# {
#   "status": "ok",
#   "db": true,
#   "tunnel": true
# }
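For scripted monitoring, here is the same check in Python. This sketch assumes the payload keeps the shape shown above: a status field plus boolean component flags such as db and tunnel. Any other boolean field is treated as a component check too. The function names are hypothetical.

```python
import json
import urllib.request

HEALTH_URL = "https://www.velahub.ai/healthz"


def is_healthy(payload: dict) -> bool:
    """True when status is "ok" and every boolean component flag is true."""
    if payload.get("status") != "ok":
        return False
    return all(value for value in payload.values() if isinstance(value, bool))


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, is_healthy(fetch_health()) can drive an uptime probe.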

List models

No auth required. Browse the live catalogue exactly as the dashboard does.

curl -s https://www.velahub.ai/v1/models \
  | jq '.data[] | {id, provider: .velahub.provider}'
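Here is the jq filter above rewritten as a Python function, for scripts that want the catalogue as data. This sketch assumes each entry carries id and an optional velahub.provider field, as the filter implies. Entries without velahub get provider None, matching jq's null. No Authorization header is sent, since the list is public.

```python
import json
import urllib.request

MODELS_URL = "https://www.velahub.ai/v1/models"


def summarize_models(catalogue: dict) -> list:
    """Python equivalent of: jq '.data[] | {id, provider: .velahub.provider}'."""
    return [
        {"id": entry.get("id"), "provider": (entry.get("velahub") or {}).get("provider")}
        for entry in catalogue.get("data", [])
    ]


def fetch_catalogue(url: str = MODELS_URL, timeout: float = 10.0) -> dict:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```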

Non-streaming chat completion

curl -i https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "max_tokens": 100,
    "messages": [{"role":"user","content":"Reply with the single word: pong."}]
  }'

The response includes the x-velahub-* cost and token headers — useful for budgeting and for confirming which upstream model was billed.
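When you call the API from code instead of curl -i, you can pull the same headers off the response. This sketch keeps every header with the x-velahub- prefix. It doesn't assume any specific header name, since those aren't listed here, and velahub_response_headers is a hypothetical name.

```python
from typing import Mapping


def velahub_response_headers(headers: Mapping[str, str]) -> dict:
    """Keep only the x-velahub-* response headers, keyed in lower case.

    HTTP header names are case-insensitive, so they are compared lower-cased.
    """
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("x-velahub-")
    }
```

With urllib.request.urlopen, pass resp.headers, which supports .items().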

Streaming Anthropic call

curl -N https://www.velahub.ai/v1/messages \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 500,
    "stream": true,
    "messages": [{"role":"user","content":"Count from 1 to 5, one per line."}]
  }'

The -N flag disables curl's output buffering so you can see the SSE deltas land in real time. The last Anthropic event is always message_stop, and the gateway follows it immediately with a final velahub.usage event.
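Outside curl, you can consume the same stream line by line. This is a minimal parser sketch. It assumes velahub.usage arrives as a named SSE event (event: velahub.usage) whose data line is JSON. Its fields aren't documented here, so the payload is returned as decoded JSON without further interpretation. Text comes from content_block_delta events carrying a text_delta, per the Anthropic streaming format. read_messages_stream is a hypothetical name.

```python
import json
from typing import Iterable, Optional, Tuple


def read_messages_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Collect text deltas and the trailing velahub.usage payload from an SSE stream."""
    text, usage, event = [], None, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = json.loads(line[len("data:"):].strip())
            if event == "content_block_delta" and data.get("delta", {}).get("type") == "text_delta":
                text.append(data["delta"]["text"])
            elif event == "velahub.usage":
                usage = data  # assumed JSON; shape not documented here
        elif not line:
            event = None  # a blank line ends the current SSE event
    return "".join(text), usage
```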

Cap the cost of a single call

# Reject the call if its estimated cost exceeds 10 cents.
# Value is in microcents (1 cent = 1,000,000 microcents, so 10 cents = 10000000).
curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Max-Price-Microcents: 10000000" \
  -d '{
    "model": "gpt-4o",
    "max_tokens": 4000,
    "messages": [{"role":"user","content":"hi"}]
  }'
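Converting a budget into the header value is a common source of off-by-a-factor mistakes. Here is a sketch using Decimal to avoid float rounding, assuming the gateway expects a whole number of microcents as in the example above. The function names are hypothetical.

```python
from decimal import Decimal

MICROCENTS_PER_CENT = 1_000_000  # 1 cent = 1,000,000 microcents


def dollars_to_microcents(dollars: str) -> int:
    """Convert a dollar amount, given as a string, to whole microcents."""
    value = Decimal(dollars) * 100 * MICROCENTS_PER_CENT
    if value != value.to_integral_value():
        raise ValueError("amount is finer than one microcent")
    return int(value)


def max_price_header(dollars: str) -> dict:
    return {"X-Velahub-Max-Price-Microcents": str(dollars_to_microcents(dollars))}
```

With the OpenAI or Anthropic Python SDKs, pass the result through the extra_headers argument of the create call, for example extra_headers=max_price_header("0.10").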

Auto-translate user content

curl https://www.velahub.ai/v1/chat/completions \
  -H "Authorization: Bearer vh-live-..." \
  -H "Content-Type: application/json" \
  -H "X-Velahub-Translate: 1" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 200,
    "messages": [{"role":"user","content":"今天天气怎么样？"}]
  }'

With X-Velahub-Translate: 1 the gateway translates non-English user content into English before sending it upstream and translates the model's reply back. Tool / function calls are preserved verbatim. See Claude Code for an end-to-end setup.

Run the full smoke test

The repo ships scripts/smoke-test.sh, which exercises every dialect (chat, messages, streaming, embeddings, …) against a live key. Run it after rotating a key or before reporting a bug.

# Run all checks with your key.
VELAHUB_KEY=vh-live-... ./scripts/smoke-test.sh

# Expected tail of the output:
# -- Test 5: /v1/messages — Anthropic dialect, streaming --
#   OK  message_start (claude-haiku-4-5)
#   OK  39 content delta(s)
#   OK  message_stop
#
# 7/7 checks passed
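To gate a CI job on the smoke test, parse its summary line. This sketch assumes the script prints the "N/M checks passed" line to stdout in the format shown above. It doesn't rely on the script's exit code, and the helper names are hypothetical.

```python
import os
import re
import subprocess

SUMMARY_RE = re.compile(r"^(\d+)/(\d+) checks passed$")


def parse_smoke_summary(output: str) -> tuple:
    """Return (passed, total) from the last "N/M checks passed" line."""
    for line in reversed(output.splitlines()):
        match = SUMMARY_RE.match(line.strip())
        if match:
            return int(match.group(1)), int(match.group(2))
    raise ValueError("no 'N/M checks passed' line in output")


def smoke_test_passed(key: str, script: str = "./scripts/smoke-test.sh") -> bool:
    """Run the smoke test with VELAHUB_KEY set and report whether every check passed."""
    env = dict(os.environ, VELAHUB_KEY=key)
    proc = subprocess.run([script], env=env, capture_output=True, text=True)
    passed, total = parse_smoke_summary(proc.stdout)
    return passed == total
```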

Look up a single call

# Fetch the canonical record for one request id.
curl https://www.velahub.ai/v1/generations/<request-id> \
  -H "Authorization: Bearer vh-mgmt-..."
# Returns the billed model, status, token counts and cost
# as well as duration_ms and whether the call used BYOK.
# Authenticate with a management key (vh-mgmt-...), not an inference API key.
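Here is the same lookup from Python using only the standard library. In this sketch, generation_request and fetch_generation are hypothetical names. The request id is URL-quoted in case it contains reserved characters. The prefix check simply enforces the management-key requirement stated above.

```python
import json
import urllib.parse
import urllib.request

GENERATIONS_URL = "https://www.velahub.ai/v1/generations/"


def generation_request(request_id: str, mgmt_key: str) -> urllib.request.Request:
    """Build the GET request for one generation record."""
    if not mgmt_key.startswith("vh-mgmt-"):
        # This endpoint wants a management key, not an inference key (vh-live-...).
        raise ValueError("use a management key (vh-mgmt-...) for /v1/generations")
    url = GENERATIONS_URL + urllib.parse.quote(request_id, safe="")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {mgmt_key}"})


def fetch_generation(request_id: str, mgmt_key: str) -> dict:
    with urllib.request.urlopen(generation_request(request_id, mgmt_key), timeout=10) as resp:
        return json.load(resp)
```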