SDK examples
OpenAI Python SDK
from openai import OpenAI
client = OpenAI(
    api_key="vh-live-...",
    base_url="https://www.velahub.ai/v1",
)
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
Anthropic Python SDK
from anthropic import Anthropic
client = Anthropic(
    api_key="vh-live-...",
    base_url="https://www.velahub.ai",
)
resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.content[0].text)
Streaming with curl
curl https://www.velahub.ai/v1/chat/completions \
-H "Authorization: Bearer vh-live-..." \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o","stream":true,"messages":[{"role":"user","content":"write a poem"}]}'
Multimodal
Image, audio and embedding endpoints follow the standard OpenAI shape and are billed per upstream call.
# Image generation
curl https://www.velahub.ai/v1/images/generations \
-H "Authorization: Bearer vh-live-..." \
-H "Content-Type: application/json" \
-d '{"model":"dall-e-3","prompt":"a cat","n":1,"size":"1024x1024"}'
# Text-to-speech
curl https://www.velahub.ai/v1/audio/speech \
-H "Authorization: Bearer vh-live-..." \
-H "Content-Type: application/json" \
-o speech.mp3 \
-d '{"model":"tts-1","input":"hello world","voice":"alloy"}'
# Speech-to-text (Whisper)
curl https://www.velahub.ai/v1/audio/transcriptions \
-H "Authorization: Bearer vh-live-..." \
-H "X-Velahub-Model: whisper-1" \
-F file=@audio.mp3 \
-F response_format=json
# Embeddings
curl https://www.velahub.ai/v1/embeddings \
-H "Authorization: Bearer vh-live-..." \
-H "Content-Type: application/json" \
-d '{"model":"voyage-3","input":"hello world"}'
Set X-Velahub-Model when the request body can't carry a model field (for example, multipart uploads to /v1/audio/transcriptions).
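The multipart case can also be sketched in Python. The following is a minimal, standard-library-only illustration that builds (but does not send) the same transcription request as the curl recipe, carrying the model in the X-Velahub-Model header. The function name build_transcription_request and the generic application/octet-stream part type are assumptions made for this example, not part of Velahub's documented API.

```python
import uuid
import urllib.request

BASE_URL = "https://www.velahub.ai/v1"  # base URL used in the examples on this page


def build_transcription_request(api_key, audio_bytes, filename="audio.mp3", model="whisper-1"):
    """Build (but do not send) a multipart POST to /v1/audio/transcriptions.

    Hypothetical helper: the model travels in the X-Velahub-Model header
    because the multipart body has no JSON "model" field.
    """
    boundary = uuid.uuid4().hex
    # First part: the audio file itself.
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    # Second part: response_format=json, then the closing boundary.
    format_part = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="response_format"\r\n\r\n'
        "json"
        f"\r\n--{boundary}--\r\n"
    ).encode()
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=file_head + audio_bytes + format_part,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Velahub-Model": model,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_transcription_request("vh-live-...", b"<audio bytes>")
print(req.full_url)
```

To actually send it, pass the returned object to urllib.request.urlopen with a real key and real audio bytes.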
Claude Code
Anthropic's claude CLI reads ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN from the environment — set both to point it at Velahub.
# Point Claude Code at Velahub
export ANTHROPIC_BASE_URL="https://www.velahub.ai"
export ANTHROPIC_AUTH_TOKEN="vh-live-..."
# Verify it works
claude --version
claude "hello, who am I talking to?"
Velahub speaks the full Anthropic /v1/messages dialect, including tools, vision and streaming, so every Claude Code feature works unchanged.
Claude Code can burn through tokens fast on long sessions. Pair it with an X-Velahub-Max-Price-Microcents cap (see below) to bound a single call's cost.
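Since the cap is expressed in microcents, it is easy to be off by a few zeros. The sketch below only computes the header value, using the conversion given in the curl recipe on this page (1 cent = 1,000,000 microcents). The helper name max_price_header is hypothetical.

```python
MICROCENTS_PER_CENT = 1_000_000  # conversion stated in the cost-cap curl recipe


def max_price_header(cents):
    """Return the header dict that caps a single call at `cents` cents.

    Hypothetical helper: it only formats the value and sends nothing.
    """
    if cents <= 0:
        raise ValueError("cap must be positive")
    microcents = round(cents * MICROCENTS_PER_CENT)
    return {"X-Velahub-Max-Price-Microcents": str(microcents)}


# A 10-cent cap, matching the value used in the curl recipe.
print(max_price_header(10))  # {'X-Velahub-Max-Price-Microcents': '10000000'}
```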
Codex / OpenAI CLI
The official codex and openai CLIs both read OPENAI_API_KEY and OPENAI_BASE_URL. Set them once and every command flows through Velahub.
export OPENAI_API_KEY="vh-live-..."
export OPENAI_BASE_URL="https://www.velahub.ai/v1"
# Codex
codex --model gpt-4o "refactor this function"
# Plain OpenAI CLI
openai api chat.completions.create -m gpt-4o-mini -g user "say hi"
Any tool built on top of the OpenAI Python or Node SDK (Aider, sgpt, llm, …) works the same way: just set the same two environment variables.
VS Code · Cline
Cline is a VS Code agent extension. Configure it to talk to Velahub via the OpenAI-compatible dialect:
- Install the Cline extension in VS Code.
- Open the Cline sidebar and click the settings gear icon.
- Set API Provider to OpenAI Compatible and fill in:
  - Base URL: https://www.velahub.ai/v1
  - API Key: your vh-live-... key
  - Model ID: any Velahub model id, e.g. gpt-4o or claude-sonnet-4-6
- Save and start a new task — Cline now talks to Velahub.
Watch the per-call cost on the recent generations page if you're iterating on prompts.
VS Code · Continue
Continue lets you mix providers in one config. Set apiBase per model — use the /v1 base for provider: openai entries and the bare host for provider: anthropic entries.
// ~/.continue/config.json
{
"models": [
{
"title": "Claude Sonnet 4.6 (via Velahub)",
"provider": "anthropic",
"model": "claude-sonnet-4-6",
"apiKey": "vh-live-...",
"apiBase": "https://www.velahub.ai"
},
{
"title": "GPT-4o (via Velahub)",
"provider": "openai",
"model": "gpt-4o",
"apiKey": "vh-live-...",
"apiBase": "https://www.velahub.ai/v1"
}
],
"tabAutocompleteModel": {
"title": "Tab Autocomplete",
"provider": "openai",
"model": "gpt-4o-mini",
"apiKey": "vh-live-...",
"apiBase": "https://www.velahub.ai/v1"
}
}
tabAutocompleteModel is called on every keystroke, so prefer a small, cheap model like gpt-4o-mini there.
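The apiBase rule for Continue (the /v1 base for provider: openai, the bare host for provider: anthropic) is easy to get backwards when editing the file by hand. As a rough sketch, a small generator can apply it consistently. The helper names api_base and model_entry are made up for this example, and only the two provider values shown in the config above are handled.

```python
import json

VELAHUB_HOST = "https://www.velahub.ai"  # host used throughout this page


def api_base(provider):
    """Return the apiBase to use for a given Continue provider value."""
    if provider == "openai":
        return f"{VELAHUB_HOST}/v1"  # OpenAI-style entries need the /v1 base
    if provider == "anthropic":
        return VELAHUB_HOST  # Anthropic-style entries use the bare host
    raise ValueError(f"no Velahub base known for provider {provider!r}")


def model_entry(title, provider, model, api_key):
    """Build one entry for the "models" array of config.json."""
    return {
        "title": title,
        "provider": provider,
        "model": model,
        "apiKey": api_key,
        "apiBase": api_base(provider),
    }


entry = model_entry("GPT-4o (via Velahub)", "openai", "gpt-4o", "vh-live-...")
print(json.dumps(entry, indent=2))
```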
Zed
Zed's assistant supports custom api_url per provider block. Configure both openai and anthropic to route through Velahub:
// ~/.config/zed/settings.json
{
"language_models": {
"openai": {
"api_url": "https://www.velahub.ai/v1",
"available_models": [
{ "name": "gpt-4o", "max_tokens": 128000 },
{ "name": "gpt-4o-mini", "max_tokens": 128000 }
]
},
"anthropic": {
"api_url": "https://www.velahub.ai",
"available_models": [
{ "name": "claude-sonnet-4-6", "max_tokens": 200000 },
{ "name": "claude-haiku-4-5", "max_tokens": 200000 }
]
}
},
"assistant": {
"default_model": {
"provider": "anthropic",
"model": "claude-sonnet-4-6"
},
"version": "2"
}
}
Restart Zed after editing settings.json. The assistant picker will then list every model you declared under available_models.
Cursor
Cursor's Models setting accepts a custom OpenAI base URL. Use Velahub there to route Cursor's chat / Cmd-K / Composer traffic through your wallet.
- Open Cursor Settings → Models.
- Scroll to OpenAI API Key, paste your vh-live-... key, and click Verify.
- Expand OpenAI Base URL (under the key field), enable the toggle and set the URL to https://www.velahub.ai/v1.
- Under Model Names, enable the OpenAI-compatible models you want (e.g. gpt-4o, gpt-4o-mini).
- Disable Cursor's built-in models or set the Velahub-routed ones as default. That's it.
Cursor's Background agents and Privacy mode still apply on top of Velahub. Make sure your team's policy allows routing through a third-party gateway before enabling this in shared workspaces.
Anthropic models in Cursor go through Cursor's own infra and can't be redirected — only the OpenAI-compatible slot is user-configurable. Use Claude Code or Continue if you need Velahub-routed Claude inside an editor.
Curl recipes
When something looks off, drop down to curl. The recipes below mirror what the SDKs do under the hood — they're the fastest way to confirm the gateway, your key and a given model are all healthy.
Health check
curl -s https://www.velahub.ai/healthz | jq
# {
# "status": "ok",
# "db": true,
# "tunnel": true
# }
List models
No auth required. Browse the live catalogue exactly as the dashboard does.
curl -s https://www.velahub.ai/v1/models \
| jq '.data[] | {id, provider: .velahub.provider}'
Non-streaming chat completion
curl -i https://www.velahub.ai/v1/chat/completions \
-H "Authorization: Bearer vh-live-..." \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"max_tokens": 100,
"messages": [{"role":"user","content":"Reply with the single word: pong."}]
}'
Because of the -i flag, curl prints the response headers along with the body. They include the x-velahub-* cost and token headers, which are useful for budgeting and for confirming which upstream model was billed.
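When the same check is scripted, the gateway headers can be picked out by prefix. This is a minimal sketch: the individual header names in the sample are placeholders invented for illustration, since this page only documents the x-velahub-* prefix. The function accepts any mapping with an items() method, such as the headers object of a urllib response.

```python
def velahub_headers(headers):
    """Return only the x-velahub-* headers, with lower-cased names."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("x-velahub-")
    }


# Placeholder header names: only the x-velahub- prefix comes from this page.
sample = {
    "Content-Type": "application/json",
    "X-Velahub-Example-Cost": "1234",
    "x-velahub-example-tokens": "42",
}
for name, value in sorted(velahub_headers(sample).items()):
    print(f"{name}: {value}")
```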
Streaming Anthropic call
curl -N https://www.velahub.ai/v1/messages \
-H "Authorization: Bearer vh-live-..." \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-sonnet-4-6",
"max_tokens": 500,
"stream": true,
"messages": [{"role":"user","content":"Count from 1 to 5, one per line."}]
}'
The -N flag disables curl's output buffering so you can watch the SSE deltas arrive in real time. The last Anthropic event is always message_stop, and velahub.usage follows immediately after it.
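For readers consuming the stream from code rather than curl, here is a small sketch of an SSE reader that joins the text deltas and keeps whatever arrives as velahub.usage. It assumes velahub.usage is delivered as a named SSE event with a JSON payload. The payload in the sample is a placeholder, since its fields are not described on this page, and the function names are hypothetical.

```python
import json


def parse_sse(lines):
    """Yield (event_name, payload_dict) pairs from an iterable of SSE text lines."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and (event or data):
            # A blank line terminates one SSE event.
            payload = json.loads("\n".join(data)) if data else {}
            yield event, payload
            event, data = None, []
    if event or data:  # flush a final event that lacks a trailing blank line
        yield event, (json.loads("\n".join(data)) if data else {})


def collect_text(lines):
    """Return (full_text, usage_payload_or_None) for one streamed reply."""
    text, usage = [], None
    for event, payload in parse_sse(lines):
        delta = payload.get("delta", {})
        if event == "content_block_delta" and delta.get("type") == "text_delta":
            text.append(delta["text"])
        elif event == "velahub.usage":  # assumed to arrive as a named event
            usage = payload
    return "".join(text), usage


sample = [
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"1"}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"2"}}',
    "",
    "event: message_stop",
    'data: {"type":"message_stop"}',
    "",
    "event: velahub.usage",
    'data: {"placeholder": true}',
    "",
]
print(collect_text(sample))
```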
Cap the cost of a single call
# Reject the call if its estimated cost exceeds 10 cents.
# The value is in microcents (1 cent = 1,000,000 microcents), so 10 cents = 10000000.
curl https://www.velahub.ai/v1/chat/completions \
-H "Authorization: Bearer vh-live-..." \
-H "Content-Type: application/json" \
-H "X-Velahub-Max-Price-Microcents: 10000000" \
-d '{
"model": "gpt-4o",
"max_tokens": 4000,
"messages": [{"role":"user","content":"hi"}]
}'
Auto-translate user content
curl https://www.velahub.ai/v1/chat/completions \
-H "Authorization: Bearer vh-live-..." \
-H "Content-Type: application/json" \
-H "X-Velahub-Translate: 1" \
-d '{
"model": "claude-sonnet-4-6",
"max_tokens": 200,
"messages": [{"role":"user","content":"今天天气怎么样?"}]
}'
With X-Velahub-Translate: 1, the gateway translates non-English user content into English before sending it upstream and translates the model's reply back. Tool / function calls are preserved verbatim. See Claude Code for an end-to-end setup.
Run the full smoke test
The repo ships a scripts/smoke-test.sh script that exercises every dialect (chat, messages, streaming, embeddings, …) against a live key. Use it after rotating a key or before reporting a bug.
# Run all checks with your key.
VELAHUB_KEY=vh-live-... ./scripts/smoke-test.sh
# Expected tail of the output:
# -- Test 5: /v1/messages — Anthropic dialect, streaming --
# OK message_start (claude-haiku-4-5)
# OK 39 content delta(s)
# OK message_stop
#
# 7/7 checks passed
Look up a single call
# Fetch the canonical record for one request id.
curl https://www.velahub.ai/v1/generations/<request-id> \
-H "Authorization: Bearer vh-mgmt-..."
# Returns the billed model, status, token counts and cost
# as well as duration_ms and whether the call used BYOK.
# Authenticate with a management key (vh-mgmt-...), not an inference API key.
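The same lookup can be sketched in Python. The helper below builds the request without sending it and refuses keys that do not start with vh-mgmt-, on the assumption that the key prefixes used in the examples on this page reliably distinguish management keys from inference keys. The function name is hypothetical.

```python
import urllib.parse
import urllib.request

GENERATIONS_URL = "https://www.velahub.ai/v1/generations/"  # endpoint from the recipe above


def build_generation_lookup(request_id, mgmt_key):
    """Build (but do not send) a GET for the record of one request id."""
    # Assumption: management keys always carry the vh-mgmt- prefix.
    if not mgmt_key.startswith("vh-mgmt-"):
        raise ValueError("use a management key (vh-mgmt-...), not an inference API key")
    url = GENERATIONS_URL + urllib.parse.quote(request_id, safe="")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {mgmt_key}"})


req = build_generation_lookup("example-request-id", "vh-mgmt-...")
print(req.get_method(), req.full_url)
```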