RouteCat API Documentation
The open-source AI inference gateway. Access any model, pay per token with Bitcoin Lightning, powered by community GPU providers.
RouteCat is an open-source gateway that routes AI inference requests to a decentralized network of GPU providers running Owlrun nodes. It exposes an OpenAI-compatible API, so any existing client or library works out of the box.
Providers earn Bitcoin for serving requests. Users pay per token with Lightning Network micropayments — no subscriptions, no vendor lock-in.
Base URL: https://route.cat
All endpoints are relative to this base URL. The API follows the OpenAI specification where applicable, so you can use the official openai Python/JS libraries by setting base_url (baseURL in JavaScript) to https://route.cat/v1.
Quick Start
Get up and running in three steps.
Get an API key
Register for a free key. You get 10 free requests per day with no payment required.
curl -X POST https://route.cat/v1/auth/register \
-H "Content-Type: application/json" \
-d '{"name": "my app"}'
Send a request
Use your API key to call the chat completions endpoint.
curl https://route.cat/v1/chat/completions \
-H "Authorization: Bearer rc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1:8b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Read the response
You get back a standard OpenAI-compatible response.
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "llama3.1:8b",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 11,
"completion_tokens": 9,
"total_tokens": 20
}
}
You can use the official openai Python library by setting base_url="https://route.cat/v1" and your RouteCat API key.
Authentication
All authenticated requests require a valid API key sent via the Authorization header using the Bearer scheme.
Authorization: Bearer rc_your_key_here
Key Format
API keys always start with the rc_ prefix followed by a random alphanumeric string. Keep your key secret — anyone with it can make requests on your behalf.
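A minimal sketch for loading the key from an environment variable and failing fast on an obviously malformed value. It assumes the variable is named ROUTECAT_API_KEY (the name used in the OpenCode guide) and checks only what this page states about the format: the rc_ prefix followed by alphanumeric characters. The length of the random part is not documented, so it is not checked.

```python
import os
import re

# Documented format: "rc_" followed by a random alphanumeric string.
# The length of the random part is not documented, so it is not checked.
_KEY_PATTERN = re.compile(r"rc_[A-Za-z0-9]+")


def looks_like_routecat_key(key: str) -> bool:
    """Return True if `key` matches the documented rc_ format."""
    return _KEY_PATTERN.fullmatch(key) is not None


def load_api_key(env_var: str = "ROUTECAT_API_KEY") -> str:
    """Read the API key from the environment and sanity-check its format."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not looks_like_routecat_key(key):
        # Never echo the key itself into logs or error messages.
        raise ValueError(f"{env_var} does not look like a RouteCat key (rc_...)")
    return key


def auth_header(key: str) -> dict:
    """Build the Authorization header using the Bearer scheme."""
    return {"Authorization": f"Bearer {key}"}
```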
Access Tiers
| Tier | Daily Quota | Rate Limit | Requirements |
|---|---|---|---|
| Free | 10 requests/day | 60 req/min | API key only |
| Paid | Unlimited (balance-based) | 60 req/min | API key + sats balance |
All API keys are rate-limited to 60 requests per minute. Exceeding this limit returns a 429 status code. Free-tier keys are additionally capped at 10 requests per day.
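To avoid 429 responses, a client can throttle itself below the documented 60 requests per minute. The sketch below is a simple sliding-window limiter kept entirely on the client side, with an injectable clock so it can be tested without sleeping. It does not track the free tier's 10 requests per day, and the gateway's own accounting (window boundaries, what counts as a request) may differ.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window`-second period."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def _evict(self, now: float) -> None:
        # Drop timestamps that have left the rolling window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0 if none)."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits within the limit."""
        if self.wait_time() > 0:
            return False
        self._sent.append(self.clock())
        return True
```

Before each API call, check try_acquire(); if it returns False, sleep for wait_time() seconds and try again.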
Endpoints
POST /v1/chat/completions
Generate a model response for a conversation. This is the primary endpoint and is compatible with the OpenAI Chat Completions API.
Request Body
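| Parameter | Type | Description |
|---|---|---|
| model | string | Model ID (e.g. llama3.1:8b, mistral:7b, gemma2:9b). Use the /v1/models endpoint to list available models. |
| messages | array | List of message objects, each with a role ("system", "user", or "assistant") and content (string). |
| temperature | number | Sampling temperature. Default: 0.7. |
| stream | boolean | If true, the response is streamed using Server-Sent Events (SSE). See Streaming. |
| … | | Default: 1.0. |
Example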
curl https://route.cat/v1/chat/completions \
-H "Authorization: Bearer rc_your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.1:8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in one sentence."}
],
"temperature": 0.5,
"max_tokens": 100
}'
from openai import OpenAI
client = OpenAI(
base_url="https://route.cat/v1",
api_key="rc_your_key_here",
)
response = client.chat.completions.create(
model="llama3.1:8b",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in one sentence."},
],
temperature=0.5,
max_tokens=100,
)
print(response.choices[0].message.content)
Response
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1713000000,
"model": "llama3.1:8b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing uses qubits that can exist in multiple states simultaneously, enabling it to solve certain problems exponentially faster than classical computers."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 24,
"completion_tokens": 28,
"total_tokens": 52
}
}
GET /v1/models
Returns a list of models currently available on the network, along with per-token pricing information.
No authentication required.
Example Response
{
"object": "list",
"data": [
{
"id": "llama3.1:8b",
"object": "model",
"owned_by": "community",
"routecat_pricing": {
"input_per_1m_tokens_usd": 0.05,
"output_per_1m_tokens_usd": 0.10,
"input_per_1m_tokens_sats": 50,
"output_per_1m_tokens_sats": 100
}
},
{
"id": "mistral:7b",
"object": "model",
"owned_by": "community",
"routecat_pricing": {
"input_per_1m_tokens_usd": 0.04,
"output_per_1m_tokens_usd": 0.08,
"input_per_1m_tokens_sats": 40,
"output_per_1m_tokens_sats": 80
}
}
]
}
The routecat_pricing object is a RouteCat extension not present in the standard OpenAI response. It shows the current per-token cost in both USD and satoshis.
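Because routecat_pricing is returned with each model, a client can compare rates before choosing one. A small sketch that estimates a request's cost in sats for every listed model and returns the cheapest. It assumes the response shape shown above and an expected token count that you supply; fetching the list is left to your HTTP client.

```python
from typing import Any, Dict, List, Tuple


def estimate_sats(pricing: Dict[str, float], tokens_in: int, tokens_out: int) -> float:
    """Estimated cost of one request in sats, from per-1M-token rates."""
    return (tokens_in * pricing["input_per_1m_tokens_sats"]
            + tokens_out * pricing["output_per_1m_tokens_sats"]) / 1_000_000


def cheapest_model(models_response: Dict[str, Any],
                   tokens_in: int, tokens_out: int) -> Tuple[str, float]:
    """Return (model_id, estimated_sats) for the cheapest priced model."""
    priced: List[Tuple[str, float]] = [
        (m["id"], estimate_sats(m["routecat_pricing"], tokens_in, tokens_out))
        for m in models_response["data"]
        if "routecat_pricing" in m  # skip entries without RouteCat pricing
    ]
    if not priced:
        raise ValueError("no priced models in response")
    return min(priced, key=lambda item: item[1])
```

With the example response above and 1,000 input plus 1,000 output tokens, mistral:7b comes out cheapest at about 0.12 sats.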
POST /v1/auth/register
Create a new API key. No authentication required.
Request Body
Example
curl -X POST https://route.cat/v1/auth/register \
-H "Content-Type: application/json" \
-d '{"name": "my app"}'
{
"api_key": "rc_a1b2c3d4e5f6g7h8i9j0",
"quota_daily": 10
}
Store your API key securely. It is only shown once at creation time and cannot be retrieved later.
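Since the key is shown only once, it is worth persisting it immediately. A standard-library sketch that calls /v1/auth/register and writes the returned api_key to a file readable only by the current user. The file path is your choice (the one in the usage line is hypothetical), and the 0600 mode is a POSIX permission that Windows largely ignores.

```python
import json
import os
import urllib.request

REGISTER_URL = "https://route.cat/v1/auth/register"


def register(name: str) -> dict:
    """POST /v1/auth/register and return the parsed JSON response."""
    req = urllib.request.Request(
        REGISTER_URL,
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def save_api_key(key: str, path: str) -> None:
    """Write the key to `path`, creating it with owner-only (0600) permissions."""
    # O_EXCL refuses to overwrite an existing file, so a saved key is never clobbered.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key + "\n")
```

Usage: save_api_key(register("my app")["api_key"], os.path.expanduser("~/.routecat_key")).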
POST /v1/auth/topup
Generate a Lightning Network invoice to add satoshis to your account balance. Requires authentication.
Request Body
Example
curl -X POST https://route.cat/v1/auth/topup \
-H "Authorization: Bearer rc_your_key_here" \
-H "Content-Type: application/json" \
-d '{"amount_sats": 500}'
{
"invoice": "lnbc5u1pjk...",
"amount_sats": 500,
"expires_at": "2026-04-13T12:30:00Z"
}
Pay the returned Lightning invoice with any compatible wallet. Your balance updates automatically once the payment confirms.
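Lightning invoices expire, so a client that requests a top-up should check expires_at before handing the invoice to a wallet. A sketch that builds the /v1/auth/topup request with the standard library and checks the expiry of a response shaped like the example above. Paying the invoice is up to your wallet and is not shown.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Optional

TOPUP_URL = "https://route.cat/v1/auth/topup"


def build_topup_request(api_key: str, amount_sats: int) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/auth/topup request."""
    if amount_sats <= 0:
        raise ValueError("amount_sats must be positive")
    return urllib.request.Request(
        TOPUP_URL,
        data=json.dumps({"amount_sats": amount_sats}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def invoice_expired(topup_response: dict, now: Optional[datetime] = None) -> bool:
    """True if the invoice's expires_at (ISO 8601, UTC 'Z') is not in the future."""
    # datetime.fromisoformat() accepts a trailing 'Z' only from Python 3.11, so normalize it.
    expires = datetime.fromisoformat(topup_response["expires_at"].replace("Z", "+00:00"))
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires
```

To send it: with urllib.request.urlopen(build_topup_request(key, 500)) as resp: invoice = json.load(resp).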
Check your current account balance and remaining free-tier requests. Requires authentication.
Example Response
{
"balance_sats": 4200,
"free_remaining": 7
}
| Field | Type | Description |
|---|---|---|
| balance_sats | integer | Your paid balance in satoshis |
| free_remaining | integer | Free-tier requests remaining today (resets at midnight UTC) |
GET /v1/stats
Public endpoint returning current gateway statistics. No authentication required.
Example Response
{
"nodes_online": 12,
"jobs_24h": 3847,
"tokens_24h": 2841500,
"btc_usd": 84250.00,
"version": "0.5.2",
"commit": "816ea5c"
}
| Field | Description |
|---|---|
| nodes_online | Number of GPU provider nodes currently connected |
| jobs_24h | Inference jobs completed in the last 24 hours |
| tokens_24h | Total tokens generated in the last 24 hours |
| btc_usd | Current BTC/USD exchange rate used for pricing |
| version | Gateway software version |
| commit | Git commit hash of the running build |
Public audit log of recent inference jobs. Data is anonymized — no user-identifiable information is included. No authentication required.
Example Response
{
"jobs": [
{
"id": "job_x7k9m2",
"model": "llama3.1:8b",
"tokens_in": 45,
"tokens_out": 128,
"duration_ms": 1850,
"cost_sats": 2,
"timestamp": "2026-04-13T10:05:32Z"
}
],
"total": 3847,
"page": 1
}
The audit log provides transparency into how the network is being used. It is useful for providers monitoring demand and for anyone verifying the gateway's operation.
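For providers gauging demand, the job entries can be summarized per model. A sketch over a response shaped like the example above. It only aggregates the page it is given (the pagination parameters are not documented here), and its throughput figure assumes duration_ms is the job's wall-clock time.

```python
from collections import defaultdict
from typing import Any, Dict


def summarize_jobs(audit: Dict[str, Any]) -> Dict[str, Dict[str, float]]:
    """Per-model job count, token totals, sats spent and output tokens/second."""
    acc: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"jobs": 0, "tokens_in": 0, "tokens_out": 0, "cost_sats": 0, "duration_ms": 0}
    )
    for job in audit["jobs"]:
        s = acc[job["model"]]
        s["jobs"] += 1
        for name in ("tokens_in", "tokens_out", "cost_sats", "duration_ms"):
            s[name] += job[name]
    summary: Dict[str, Dict[str, float]] = {}
    for model, s in acc.items():
        seconds = s["duration_ms"] / 1000
        summary[model] = {
            **s,
            # Assumes duration_ms is wall-clock time for the whole job.
            "tokens_out_per_s": s["tokens_out"] / seconds if seconds else 0.0,
        }
    return summary
```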
Integrations
RouteCat works with any tool that supports the OpenAI API. Below you'll find step-by-step guides for the most popular ones. In every case, you only need two things:
API Key — get one for free at route.cat (Account section, "Create API Key")
Base URL — https://route.cat/v1
To see which models are available right now, visit route.cat → Pricing or call GET /v1/models. Use the exact model name shown there (e.g. qwen2.5:7b, llama3.1:8b).
VS Code — Continue
Continue is a free, open-source AI coding assistant for VS Code and JetBrains. It gives you autocomplete, chat, and inline editing — all powered by the model of your choice.
1. Install Continue
Open VS Code, go to the Extensions panel (Ctrl+Shift+X), search Continue, and click Install.
2. Open Continue settings
Click the Continue icon in the sidebar (the wand icon), then click the gear icon at the bottom of the Continue panel. This opens ~/.continue/config.yaml.
3. Add RouteCat as a provider
Replace or add the following in your config.yaml:
models:
  - name: RouteCat
    provider: openai
    model: qwen2.5:7b
    apiBase: https://route.cat/v1
    apiKey: rc_YOUR_KEY
4. Start chatting
Open the Continue panel, select RouteCat from the model dropdown, and ask a question. That's it!
You can add multiple models — just repeat the block with a different model value. Check the available models at route.cat/pricing.
Cursor
Cursor is an AI-first code editor built on VS Code. You can point it to RouteCat instead of the built-in models.
1. Open model settings
Go to Cursor Settings (Ctrl+Shift+J or click the gear icon) → Models.
2. Add an OpenAI-compatible model
Click + Add model, then choose OpenAI-compatible. Fill in:
Model name: qwen2.5:7b
API Base URL: https://route.cat/v1
API Key: rc_YOUR_KEY
3. Select and use
Select the new model from the model selector in the chat panel or inline edit. Start coding.
Open WebUI
Open WebUI is a self-hosted web interface that looks and works like ChatGPT. You can connect it to RouteCat to chat with any available model through a familiar interface.
1. Open connections
In Open WebUI, click your profile icon (bottom-left) → Settings → Connections.
2. Add an OpenAI connection
Under OpenAI API, click + and fill in:
URL: https://route.cat/v1
API Key: rc_YOUR_KEY
3. Save and chat
Click Save. The available models will appear in the model selector. Start a new chat, pick a model, and go.
Install it with one command: docker run -d -p 3000:8080 ghcr.io/open-webui/open-webui:main. Then open localhost:3000.
ChatBox
ChatBox is a free desktop app for Windows, Mac, and Linux. It's the easiest way to use RouteCat if you just want to chat — no terminal, no code.
1. Download and install
Download from chatboxai.app and install it.
2. Configure the AI provider
Open Settings (gear icon) → AI Model Provider. Select OpenAI API Compatible and fill in:
API Domain: https://route.cat
API Path: /v1/chat/completions
API Key: rc_YOUR_KEY
Model: qwen2.5:7b
3. Chat away
Click Save and start a new conversation. Works exactly like ChatGPT.
OpenCode (CLI)
OpenCode is a terminal-based AI coding assistant with a text user interface (TUI). It supports custom providers, making it easy to use RouteCat for AI-assisted coding directly in your terminal.
1. Install OpenCode
# Quick install (Linux/macOS)
curl -fsSL https://opencode.ai/install | bash
# macOS (Homebrew)
brew install anomalyco/tap/opencode
# npm
npm i -g opencode-ai@latest
# Windows (Scoop)
scoop install opencode
2. Set your API key
Store your RouteCat API key as an environment variable. Add this to your ~/.bashrc, ~/.zshrc, or equivalent:
export ROUTECAT_API_KEY="rc_YOUR_KEY"
3. Configure
Create an opencode.json at your project root (or globally at ~/.config/opencode/opencode.json):
{
"provider": {
"routecat": {
"npm": "@ai-sdk/openai-compatible",
"name": "RouteCat",
"options": {
"baseURL": "https://route.cat/v1",
"apiKey": "{env:ROUTECAT_API_KEY}"
},
"models": {
"qwen2.5:7b": { "name": "Qwen 2.5 7B" },
"llama3.1:8b": { "name": "Llama 3.1 8B" }
}
}
},
"model": "routecat/qwen2.5:7b"
}
4. Run
opencode
OpenCode will start in interactive mode. Ask it to edit files, explain code, or write tests — all powered by RouteCat.
The {env:ROUTECAT_API_KEY} syntax tells OpenCode to read the key from your environment — so your key never ends up committed to a repository. You can add more models to the models block at any time; check available ones at route.cat/pricing.
Any OpenAI-compatible app
RouteCat works with any tool that supports the OpenAI API. If your app has a settings page for "OpenAI" or "Custom API", you can point it to RouteCat. Here's the universal pattern:
| Setting | Value |
|---|---|
| API Base URL | https://route.cat/v1 |
| API Key | rc_YOUR_KEY |
| Model | Any model from the pricing table (e.g. qwen2.5:7b) |
Environment variables
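Many tools also accept configuration via environment variables. The name of the base-URL variable varies by tool: some read OPENAI_API_BASE, while the current official OpenAI SDKs read OPENAI_BASE_URL. Setting both is harmless:
export OPENAI_API_BASE=https://route.cat/v1
export OPENAI_BASE_URL=https://route.cat/v1
export OPENAI_API_KEY=rc_YOUR_KEY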
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="https://route.cat/v1",
api_key="rc_YOUR_KEY",
)
response = client.chat.completions.create(
model="qwen2.5:7b",
messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Node.js (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://route.cat/v1",
apiKey: "rc_YOUR_KEY",
});
const response = await client.chat.completions.create({
model: "qwen2.5:7b",
messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
Always use the exact model name as shown in the pricing table, including the tag (e.g. qwen2.5:7b, not qwen2.5). If the model name doesn't match a connected node, you'll get a 503 error.
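One way to catch a mismatched name before it turns into a 503 is to check it against the /v1/models list first. A sketch that takes the parsed response (shape as in the /v1/models example) and, if the exact ID is missing, suggests tagged IDs that share the same base name. A listed model can still go offline between the check and the request.

```python
from typing import Any, Dict, List


def resolve_model(name: str, models_response: Dict[str, Any]) -> str:
    """Return `name` if it is an exact model ID, else raise with suggestions."""
    ids: List[str] = [m["id"] for m in models_response.get("data", [])]
    if name in ids:
        return name
    base = name.split(":", 1)[0]
    # e.g. "qwen2.5" -> suggest "qwen2.5:7b" if that is what the network lists
    suggestions = [i for i in ids if i.split(":", 1)[0] == base]
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise LookupError(f"Model {name!r} is not currently available.{hint}")
```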
Streaming
When stream: true is set in a chat completion request, the response is delivered as Server-Sent Events (SSE). Each event contains a chunk of the response, allowing you to display tokens as they are generated.
SSE Format
Each line is prefixed with data: followed by a JSON object. The stream ends with a data: [DONE] sentinel.
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" there"}}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"finish_reason":"stop"}]}
data: [DONE]
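Without an SDK, this format is simple to parse by hand. A sketch that consumes an iterable of already-decoded lines (from whatever HTTP client you use), skips anything that is not a data: line, stops at the [DONE] sentinel, and yields the content deltas of the first choice. It handles only the fields shown in the example.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from SSE lines until the data: [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                yield content
```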
Parsing in Python
from openai import OpenAI
client = OpenAI(
base_url="https://route.cat/v1",
api_key="rc_your_key_here",
)
stream = client.chat.completions.create(
model="llama3.1:8b",
messages=[{"role": "user", "content": "Tell me a joke"}],
stream=True,
)
for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Parsing in JavaScript
const response = await fetch("https://route.cat/v1/chat/completions", {
method: "POST",
headers: {
"Authorization": "Bearer rc_your_key_here",
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "llama3.1:8b",
messages: [{ role: "user", content: "Tell me a joke" }],
stream: true,
}),
});
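const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // holds a partial line left over from the previous chunk

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // the last element may be an incomplete line
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice(6).trim();
    if (payload === "[DONE]") continue;
    const data = JSON.parse(payload);
    const content = data.choices[0]?.delta?.content;
    // process.stdout is Node.js-specific; in a browser, append to the page instead.
    if (content) process.stdout.write(content);
  }
}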
Billing
RouteCat uses a simple per-token billing model. You pay only for what you use, with no monthly subscriptions or minimums.
How It Works
- Top up your balance by paying a Lightning invoice (see /v1/auth/topup).
- Use tokens — each request deducts from your balance based on the model's per-token rate.
- Providers earn — 95% of the token cost goes to the GPU provider who served the request.
- Gateway fee — RouteCat takes a 5% fee to cover infrastructure and development.
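The arithmetic behind a single request, as a sketch: the cost is input and output tokens multiplied by the model's per-1M-token rates, and the documented 95/5 split divides it between provider and gateway. Amounts are kept as fractional sats because this page does not say how the gateway rounds (or whether it accounts in millisatoshis), so treat rounding as unknown.

```python
from dataclasses import dataclass

PROVIDER_SHARE = 0.95  # documented: 95% goes to the GPU provider
GATEWAY_SHARE = 0.05   # documented: 5% gateway fee


@dataclass
class Charge:
    total_sats: float
    provider_sats: float
    gateway_sats: float


def charge_for(tokens_in: int, tokens_out: int,
               input_per_1m_sats: float, output_per_1m_sats: float) -> Charge:
    """Cost of one request and how it is split, in unrounded sats."""
    total = (tokens_in * input_per_1m_sats + tokens_out * output_per_1m_sats) / 1_000_000
    return Charge(total, total * PROVIDER_SHARE, total * GATEWAY_SHARE)
```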
Pricing
Each model has its own per-token rate, visible via the /v1/models endpoint. Prices are set in USD and converted to satoshis using the current BTC/USD exchange rate (updated every 5 minutes).
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| llama3.1:8b | $0.05 / ~50 sats | $0.10 / ~100 sats |
| mistral:7b | $0.04 / ~40 sats | $0.08 / ~80 sats |
| gemma2:9b | $0.06 / ~60 sats | $0.12 / ~120 sats |
Prices shown in sats are approximate and depend on the current BTC/USD rate.
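The conversion itself is simple: one bitcoin is 100,000,000 sats, so a USD price divided by the BTC/USD rate and multiplied by 10^8 gives sats. A sketch; the btc_usd value can come from /v1/stats, and the gateway's own rounding is not documented.

```python
SATS_PER_BTC = 100_000_000


def usd_to_sats(usd: float, btc_usd: float) -> float:
    """Convert a USD amount to sats at the given BTC/USD rate."""
    if btc_usd <= 0:
        raise ValueError("btc_usd must be positive")
    return usd / btc_usd * SATS_PER_BTC
```

For example, $0.05 is 50 sats at $100,000 per BTC and about 59 sats at $84,250 per BTC, which is why the sats figures above are only approximate.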
Spending Caps
If your balance reaches zero during a request, the response will still complete — you are not cut off mid-generation. However, subsequent requests will return a 402 error until you top up. Free-tier users receive 10 requests per day regardless of balance.
The gateway fetches the BTC/USD rate every 5 minutes. The rate used for a given request is the rate in effect at the time the request is processed. You can check the current rate via /v1/stats.
Provider Guide
Anyone with a GPU can earn Bitcoin by serving inference requests on the RouteCat network. Providers run Owlrun, a lightweight agent that connects to the gateway and processes jobs using locally-installed models via Ollama.
Prerequisites
- A machine with a GPU (NVIDIA recommended, 8GB+ VRAM)
- A Lightning Network address for receiving payouts
Setup
Install Ollama
Ollama provides the local model runtime. Install it and pull the models you want to serve.
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b
ollama pull mistral:7b
Install Owlrun
Download the Owlrun binary for your platform from the releases page, or build from source.
# Download latest release (Linux amd64)
wget https://github.com/routecat/owlrun/releases/latest/download/owlrun-linux-amd64
chmod +x owlrun-linux-amd64
sudo mv owlrun-linux-amd64 /usr/local/bin/owlrun
Configure & Run
Point Owlrun at the RouteCat gateway and provide your Lightning address for payouts.
owlrun \
--gateway https://route.cat \
--lightning-address you@walletofsatoshi.com \
--ollama-url http://localhost:11434
Earnings & Payouts
Providers earn 95% of the per-token cost for every request they serve. Earnings accumulate until they reach the payout threshold (1,000 sats by default), at which point the gateway automatically sends a Lightning payment to your configured address.
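To make the threshold behavior concrete, here is a toy model of provider earnings: each served job credits 95% of its cost, and once the accrued amount reaches the threshold (1,000 sats by default) a payout is triggered. The ProviderLedger class and the choice to pay out the full accrued amount are illustrative assumptions, not the gateway's or Owlrun's actual code.

```python
from dataclasses import dataclass, field
from typing import List

PROVIDER_SHARE = 0.95  # documented provider share of each job's cost


@dataclass
class ProviderLedger:
    """Hypothetical accrual model for one provider's earnings, in sats."""
    threshold_sats: float = 1_000
    accrued_sats: float = 0.0
    payouts: List[float] = field(default_factory=list)

    def record_job(self, job_cost_sats: float) -> None:
        """Credit the provider's share of one job and pay out if due."""
        self.accrued_sats += job_cost_sats * PROVIDER_SHARE
        if self.accrued_sats >= self.threshold_sats:
            # Assumption: the whole accrued amount is sent in one Lightning payment.
            self.payouts.append(self.accrued_sats)
            self.accrued_sats = 0.0
```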
Pull popular models to maximize your chances of being selected for jobs. Check /v1/stats to see current network demand.
Errors
The API uses standard HTTP status codes. All error responses include a JSON body with an error field describing the issue.
Error Response Format
{
"error": "Insufficient balance. Please top up your account."
}
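Status Codes
| Code | Meaning |
|---|---|
| 402 | Insufficient balance. Top up your account via /v1/auth/topup. |
| 429 | Rate limit exceeded (60 requests per minute). Wait for the Retry-After header interval before retrying. |
| 500 | Internal gateway error. |
| 503 | No connected node is currently serving the requested model. |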
For 429 and 503 errors, implement exponential backoff with a maximum of 3 retries. For 500 errors, a single retry is usually sufficient.
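A sketch of that retry policy, independent of any HTTP library: send is any zero-argument callable you supply that returns a (status, headers, body) tuple. It retries 429 and 503 up to three times with exponential backoff, retries a 500 once, and prefers the Retry-After header when present (assuming it is given as a number of seconds). The base delay and jitter are arbitrary choices.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple

# (status code, headers, body) as returned by the caller-supplied `send`
Response = Tuple[int, Dict[str, str], bytes]

# Documented policy: up to 3 retries for 429/503, a single retry for 500.
MAX_RETRIES = {429: 3, 503: 3, 500: 1}


def _retry_after_seconds(headers: Dict[str, str]) -> Optional[float]:
    """Return Retry-After in seconds if given as an integer, else None."""
    for name, value in headers.items():
        if name.lower() == "retry-after" and value.strip().isdigit():
            return float(value.strip())
    return None  # absent, or in the HTTP-date form this sketch does not parse


def send_with_retry(send: Callable[[], Response],
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, retrying retryable statuses; return the final response."""
    retries = 0
    while True:
        status, headers, body = send()
        if retries >= MAX_RETRIES.get(status, 0):
            return status, headers, body  # success, non-retryable, or out of retries
        delay = _retry_after_seconds(headers)
        if delay is None:
            # Exponential backoff (1s, 2s, 4s with the default base) plus jitter.
            delay = base_delay * (2 ** retries) + random.uniform(0, base_delay / 2)
        sleep(delay)
        retries += 1
```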
RouteCat is open source. Contribute on GitHub.