RouteCat API Documentation

The open-source AI inference gateway. Access any model, pay per token with Bitcoin Lightning, powered by community GPU providers.

RouteCat is an open-source gateway that routes AI inference requests to a decentralized network of GPU providers running Owlrun nodes. It exposes an OpenAI-compatible API, so any existing client or library works out of the box.

Providers earn Bitcoin for serving requests. Users pay per token with Lightning Network micropayments — no subscriptions, no vendor lock-in.

Base URL

https://route.cat

All endpoints are relative to this base URL. The API follows the OpenAI specification where applicable, so you can use the official openai Python/JS libraries by setting their base URL to https://route.cat/v1 (the SDK base URL includes the /v1 path segment).

Quick Start

Get up and running in three steps.

Get an API key

curl

curl -X POST https://route.cat/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"name": "my app"}'

Send a request

Use your API key to call the chat completions endpoint.

curl

curl https://route.cat/v1/chat/completions \
  -H "Authorization: Bearer rc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Read the response

You get back a standard OpenAI-compatible response.

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "llama3.1:8b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 11,
    "completion_tokens": 9,
    "total_tokens": 20
  }
}

Tip

You can use the official openai Python library by setting base_url="https://route.cat/v1" and your RouteCat API key.

Authentication

All authenticated requests require a valid API key sent via the Authorization header using the Bearer scheme.

Header

Authorization: Bearer rc_your_key_here

Key Format

API keys always start with the rc_ prefix followed by a random alphanumeric string. Keep your key secret — anyone with it can make requests on your behalf.
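
As a small illustration, a client can sanity-check a key before sending it. This sketch assumes only what is stated above (an rc_ prefix followed by alphanumeric characters); the helper name auth_headers is hypothetical and not part of any RouteCat library.

```python
import re

# Assumption: a key is "rc_" followed by one or more ASCII letters or digits.
# The key length is not documented, so it is not checked here.
_KEY_PATTERN = re.compile(r"rc_[A-Za-z0-9]+")


def auth_headers(api_key: str) -> dict[str, str]:
    """Return request headers that carry the key in the Bearer scheme."""
    if not _KEY_PATTERN.fullmatch(api_key):
        raise ValueError("API key does not look like an rc_ key")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early on a malformed key avoids spending a request on a guaranteed 401.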

Access Tiers

Tier	Daily Quota	Rate Limit	Requirements
Free	10 requests/day	60 req/min	API key only
Paid	Unlimited (balance-based)	60 req/min	API key + sats balance

Rate Limiting

All API keys are rate-limited to 60 requests per minute. Exceeding this limit returns a 429 status code. Free-tier keys are additionally capped at 10 requests per day.
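
To stay under the limit, a client can throttle itself before the gateway has to. The following is a minimal sliding-window sketch; the class name and the windowing logic are illustrative assumptions, since the gateway's own counting algorithm is not documented here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard allowing at most `limit` calls per `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self._sent = deque()    # timestamps of recently sent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._sent.append(self.clock())
```

A caller would sleep for wait_time() seconds, send the request, then call record(). A 429 can still occur if the server counts differently, so this complements rather than replaces retry handling.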

Endpoints

POST /v1/chat/completions

Generate a model response for a conversation. This is the primary endpoint and is compatible with the OpenAI Chat Completions API.

Request Body

model string required

The model ID to use (e.g. llama3.1:8b, mistral:7b, gemma2:9b). Use the /v1/models endpoint to list available models.

messages array required

An array of message objects. Each message has a role ("system", "user", or "assistant") and content (string).

temperature number optional

Sampling temperature between 0 and 2. Lower values are more focused, higher values more creative. Defaults to 0.7.

max_tokens integer optional

Maximum number of tokens to generate. Defaults to the model's maximum context length.

stream boolean optional

If true, response is streamed using Server-Sent Events (SSE). See Streaming.

top_p number optional

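Nucleus sampling threshold between 0 and 1. The model samples only from the smallest set of tokens whose cumulative probability reaches top_p. Defaults to 1.0.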

Example

curl

curl https://route.cat/v1/chat/completions \
  -H "Authorization: Bearer rc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one sentence."}
    ],
    "temperature": 0.5,
    "max_tokens": 100
  }'

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://route.cat/v1",
    api_key="rc_your_key_here",
)

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in one sentence."},
    ],
    temperature=0.5,
    max_tokens=100,
)

print(response.choices[0].message.content)

Response

JSON

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1713000000,
  "model": "llama3.1:8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses qubits that can exist in multiple states simultaneously, enabling it to solve certain problems exponentially faster than classical computers."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 28,
    "total_tokens": 52
  }
}

GET /v1/models

Returns a list of models currently available on the network, along with per-token pricing information.

No authentication required.

Example Response

JSON

{
  "object": "list",
  "data": [
    {
      "id": "llama3.1:8b",
      "object": "model",
      "owned_by": "community",
      "routecat_pricing": {
        "input_per_1m_tokens_usd": 0.05,
        "output_per_1m_tokens_usd": 0.10,
        "input_per_1m_tokens_sats": 50,
        "output_per_1m_tokens_sats": 100
      }
    },
    {
      "id": "mistral:7b",
      "object": "model",
      "owned_by": "community",
      "routecat_pricing": {
        "input_per_1m_tokens_usd": 0.04,
        "output_per_1m_tokens_usd": 0.08,
        "input_per_1m_tokens_sats": 40,
        "output_per_1m_tokens_sats": 80
      }
    }
  ]
}

The routecat_pricing object is a RouteCat extension not present in the standard OpenAI response. It shows the current price per 1M input and output tokens in both USD and satoshis.
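
The pricing fields can be combined with the usage block of a chat completion response to estimate what a request cost. The sketch below uses the field names from the example responses in this document; the function name estimate_cost is hypothetical, and how the gateway rounds billed amounts is not documented, so treat the result as an estimate.

```python
def estimate_cost(usage: dict, pricing: dict) -> dict:
    """Estimate one request's cost from its usage block and a model's routecat_pricing."""
    per_million = 1_000_000
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    usd = (prompt * pricing["input_per_1m_tokens_usd"]
           + completion * pricing["output_per_1m_tokens_usd"]) / per_million
    sats = (prompt * pricing["input_per_1m_tokens_sats"]
            + completion * pricing["output_per_1m_tokens_sats"]) / per_million
    return {"usd": usd, "sats": sats}


# Values copied from the llama3.1:8b examples in this document.
pricing = {
    "input_per_1m_tokens_usd": 0.05,
    "output_per_1m_tokens_usd": 0.10,
    "input_per_1m_tokens_sats": 50,
    "output_per_1m_tokens_sats": 100,
}
usage = {"prompt_tokens": 24, "completion_tokens": 28, "total_tokens": 52}
cost = estimate_cost(usage, pricing)
print(f"{cost['usd']:.6f} USD, {cost['sats']:.4f} sats")
```

Input and output tokens are priced separately, so a request with a long prompt and a short answer costs less than its total token count alone would suggest.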

POST /v1/auth/register

Create a new API key. No authentication required.

Request Body

name string optional

A label for this key. Helps you identify it later.

Example

Request

curl -X POST https://route.cat/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{"name": "my app"}'

Response

{
  "api_key": "rc_a1b2c3d4e5f6g7h8i9j0",
  "quota_daily": 10
}

Important

Store your API key securely. It is only shown once at creation time and cannot be retrieved later.
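
Because the key cannot be retrieved again, it is worth writing it somewhere safe as soon as the register call returns. One possible approach, sketched here with hypothetical helper names, is a file that only the current user can read. The file mode applies on POSIX systems; other platforms may need a different mechanism.

```python
import os
from pathlib import Path


def save_api_key(api_key: str, path: Path) -> None:
    """Write the key to `path` with owner-only permissions (POSIX mode 0600)."""
    # O_EXCL makes the call fail instead of overwriting an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(api_key + "\n")


def load_api_key(path: Path) -> str:
    """Read the key back, dropping the trailing newline."""
    return path.read_text(encoding="utf-8").strip()
```

Refusing to overwrite is deliberate: replacing the file would silently discard a key that can no longer be recovered.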

POST /v1/auth/topup

Generate a Lightning Network invoice to add satoshis to your account balance. Requires authentication.

Request Body

amount_sats integer required

Amount in satoshis to add to your balance. Minimum: 100 sats.

Example

Request

curl -X POST https://route.cat/v1/auth/topup \
  -H "Authorization: Bearer rc_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"amount_sats": 500}'

Response

{
  "invoice": "lnbc5u1pjk...",
  "amount_sats": 500,
  "expires_at": "2026-04-13T12:30:00Z"
}

Pay the returned Lightning invoice with any compatible wallet. Your balance updates automatically once the payment confirms.
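
The same call can be made from Python with only the standard library. This is a sketch under the assumptions stated in this section (endpoint path, amount_sats field, 100 sat minimum); the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://route.cat"
MIN_TOPUP_SATS = 100  # minimum stated above


def build_topup_request(api_key: str, amount_sats: int) -> urllib.request.Request:
    """Build (but do not send) the POST request for a top-up invoice."""
    if amount_sats < MIN_TOPUP_SATS:
        raise ValueError(f"amount_sats must be at least {MIN_TOPUP_SATS}")
    body = json.dumps({"amount_sats": amount_sats}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/auth/topup",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def request_invoice(api_key: str, amount_sats: int) -> dict:
    """Send the request and return the parsed JSON response."""
    req = build_topup_request(api_key, amount_sats)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Calling request_invoice("rc_your_key_here", 500) would return a dictionary with the invoice, amount_sats, and expires_at fields shown above. Validating the amount locally avoids a round trip that would end in a 400.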

GET /v1/auth/balance

Check your current account balance and remaining free-tier requests. Requires authentication.

Example Response

JSON

{
  "balance_sats": 4200,
  "free_remaining": 7
}

Field	Type	Description
`balance_sats`	integer	Your paid balance in satoshis
`free_remaining`	integer	Free-tier requests remaining today (resets at midnight UTC)
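
A client can use these two fields to decide whether the next request is likely to be accepted and, if not, when the free quota returns. The sketch below assumes a request is accepted while either the paid balance or the free quota is non-zero, which is how the 402 status is described in the Errors section; both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def can_send_request(balance: dict) -> bool:
    """True if the paid balance or today's free quota is not yet used up."""
    return balance["balance_sats"] > 0 or balance["free_remaining"] > 0


def seconds_until_free_reset(now: datetime) -> float:
    """Seconds from `now` (a timezone-aware datetime) to the next midnight UTC."""
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()
```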

GET /v1/stats

Public endpoint returning current gateway statistics. No authentication required.

Example Response

JSON

{
  "nodes_online": 12,
  "jobs_24h": 3847,
  "tokens_24h": 2841500,
  "btc_usd": 84250.00,
  "version": "0.5.2",
  "commit": "816ea5c"
}

Field	Description
`nodes_online`	Number of GPU provider nodes currently connected
`jobs_24h`	Inference jobs completed in the last 24 hours
`tokens_24h`	Total tokens generated in the last 24 hours
`btc_usd`	Current BTC/USD exchange rate used for pricing
`version`	Gateway software version
`commit`	Git commit hash of the running build

GET /v1/audit

Public audit log of recent inference jobs. Data is anonymized — no user-identifiable information is included. No authentication required.

Example Response

JSON

{
  "jobs": [
    {
      "id": "job_x7k9m2",
      "model": "llama3.1:8b",
      "tokens_in": 45,
      "tokens_out": 128,
      "duration_ms": 1850,
      "cost_sats": 2,
      "timestamp": "2026-04-13T10:05:32Z"
    }
  ],
  "total": 3847,
  "page": 1
}

The audit log provides transparency into how the network is being used. It is useful for providers monitoring demand and for anyone verifying the gateway's operation.
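
As an example of what a provider might do with this data, the sketch below groups audit entries by model and derives output throughput. It relies only on the fields in the example response; the function name is hypothetical, and it assumes duration_ms covers the whole job.

```python
from collections import defaultdict


def summarize_jobs(jobs: list[dict]) -> dict[str, dict]:
    """Aggregate audit entries per model: job count, output tokens, sats, tokens/second."""
    totals = defaultdict(
        lambda: {"jobs": 0, "tokens_out": 0, "cost_sats": 0, "duration_ms": 0}
    )
    for job in jobs:
        entry = totals[job["model"]]
        entry["jobs"] += 1
        entry["tokens_out"] += job["tokens_out"]
        entry["cost_sats"] += job["cost_sats"]
        entry["duration_ms"] += job["duration_ms"]

    summary = {}
    for model, entry in totals.items():
        seconds = entry["duration_ms"] / 1000
        summary[model] = {
            "jobs": entry["jobs"],
            "tokens_out": entry["tokens_out"],
            "cost_sats": entry["cost_sats"],
            # Guard against a zero duration to avoid dividing by zero.
            "tokens_per_second": entry["tokens_out"] / seconds if seconds else 0.0,
        }
    return summary
```

Passing the jobs array from the response above yields one entry for llama3.1:8b with roughly 69 output tokens per second.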

Integrations

RouteCat works with any tool that supports the OpenAI API. Below you'll find step-by-step guides for the most popular ones. In every case, you only need two things:

What you need

API Key — get one for free at route.cat (Account section, "Create API Key")
Base URL — https://route.cat/v1

To see which models are available right now, visit route.cat → Pricing or call GET /v1/models. Use the exact model name shown there (e.g. qwen2.5:7b, llama3.1:8b).

VS Code — Continue

Continue is a free, open-source AI coding assistant for VS Code and JetBrains. It gives you autocomplete, chat, and inline editing — all powered by the model of your choice.

1. Install Continue

Open VS Code, go to the Extensions panel (Ctrl+Shift+X), search Continue, and click Install.

2. Open Continue settings

Click the Continue icon in the sidebar (the wand icon), then click the gear icon at the bottom of the Continue panel. This opens ~/.continue/config.yaml.

3. Add RouteCat as a provider

Replace or add the following in your config.yaml:

~/.continue/config.yaml

models:
  - name: RouteCat
    provider: openai
    model: qwen2.5:7b
    apiBase: https://route.cat/v1
    apiKey: rc_YOUR_KEY

4. Start chatting

Open the Continue panel, select RouteCat from the model dropdown, and ask a question. That's it!

Tip

You can add multiple models — just repeat the block with a different model value. Check the available models at route.cat/pricing.

Cursor

Cursor is an AI-first code editor built on VS Code. You can point it to RouteCat instead of the built-in models.

1. Open model settings

Go to Cursor Settings (Ctrl+Shift+J or click the gear icon) → Models.

2. Add an OpenAI-compatible model

Click + Add model, then choose OpenAI-compatible. Fill in:

Cursor Settings

Model name:    qwen2.5:7b
API Base URL:  https://route.cat/v1
API Key:       rc_YOUR_KEY

3. Select and use

Select the new model from the model selector in the chat panel or inline edit. Start coding.

Open WebUI

Open WebUI is a self-hosted web interface that looks and works like ChatGPT. You can connect it to RouteCat to chat with any available model through a familiar interface.

1. Open connections

In Open WebUI, click your profile icon (bottom-left) → Settings → Connections.

2. Add an OpenAI connection

Under OpenAI API, click + and fill in:

Open WebUI → Connections

URL:      https://route.cat/v1
API Key:  rc_YOUR_KEY

3. Save and chat

Click Save. The available models will appear in the model selector. Start a new chat, pick a model, and go.

Don't have Open WebUI?

Install it with one command: docker run -d -p 3000:8080 ghcr.io/open-webui/open-webui:main. Then open localhost:3000.

ChatBox

ChatBox is a free desktop app for Windows, Mac, and Linux. It's the easiest way to use RouteCat if you just want to chat — no terminal, no code.

1. Download and install

Download from chatboxai.app and install it.

2. Configure the AI provider

Open Settings (gear icon) → AI Model Provider. Select OpenAI API Compatible and fill in:

ChatBox Settings

API Domain:  https://route.cat
API Path:    /v1/chat/completions
API Key:     rc_YOUR_KEY
Model:       qwen2.5:7b

3. Chat away

Click Save and start a new conversation. Works exactly like ChatGPT.

OpenCode (CLI)

OpenCode is a terminal-based AI coding assistant with a text user interface (TUI). It supports custom providers, making it easy to use RouteCat for AI-assisted coding directly in your terminal.

1. Install OpenCode

Terminal

# Quick install (Linux/macOS)
curl -fsSL https://opencode.ai/install | bash

# macOS (Homebrew)
brew install anomalyco/tap/opencode

# npm
npm i -g opencode-ai@latest

# Windows (Scoop)
scoop install opencode

2. Set your API key

Store your RouteCat API key as an environment variable. Add this to your ~/.bashrc, ~/.zshrc, or equivalent:

Shell

export ROUTECAT_API_KEY="rc_YOUR_KEY"

3. Configure

Create an opencode.json at your project root (or globally at ~/.config/opencode/opencode.json):

opencode.json

{
  "provider": {
    "routecat": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "RouteCat",
      "options": {
        "baseURL": "https://route.cat/v1",
        "apiKey": "{env:ROUTECAT_API_KEY}"
      },
      "models": {
        "qwen2.5:7b": { "name": "Qwen 2.5 7B" },
        "llama3.1:8b": { "name": "Llama 3.1 8B" }
      }
    }
  },
  "model": "routecat/qwen2.5:7b"
}

4. Run

Terminal

opencode

OpenCode will start in interactive mode. Ask it to edit files, explain code, or write tests — all powered by RouteCat.

Tip

The {env:ROUTECAT_API_KEY} syntax tells OpenCode to read the key from your environment — so your key never ends up committed to a repository. You can add more models to the models block at any time; check available ones at route.cat/pricing.

Any OpenAI-compatible app

RouteCat works with any tool that supports the OpenAI API. If your app has a settings page for "OpenAI" or "Custom API", you can point it to RouteCat. Here's the universal pattern:

Setting	Value
API Base URL	`https://route.cat/v1`
API Key	`rc_YOUR_KEY`
Model	Any model from the pricing table (e.g. `qwen2.5:7b`)

Environment variables

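Many tools also accept configuration via environment variables. The variable name for the base URL differs between tools: current versions of the official openai SDKs read OPENAI_BASE_URL, while older SDK versions and some third-party tools read OPENAI_API_BASE. Setting both is harmless.

Shell

export OPENAI_BASE_URL=https://route.cat/v1
export OPENAI_API_BASE=https://route.cat/v1
export OPENAI_API_KEY=rc_YOUR_KEY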

Python (OpenAI SDK)

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://route.cat/v1",
    api_key="rc_YOUR_KEY",
)

response = client.chat.completions.create(
    model="qwen2.5:7b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Node.js (OpenAI SDK)

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://route.cat/v1",
  apiKey: "rc_YOUR_KEY",
});

const response = await client.chat.completions.create({
  model: "qwen2.5:7b",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

Important

Always use the exact model name as shown in the pricing table, including the tag (e.g. qwen2.5:7b, not qwen2.5). If the model name doesn't match a connected node, you'll get a 503 error.

Streaming

When stream: true is set in a chat completion request, the response is delivered as Server-Sent Events (SSE). Each event contains a chunk of the response, allowing you to display tokens as they are generated.

SSE Format

Each line is prefixed with data: followed by a JSON object. The stream ends with a data: [DONE] sentinel.

SSE Stream

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" there"}}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]
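
If you are not using an SDK, the format above can be parsed by hand. The following is a minimal sketch that takes the stream as an iterable of text lines and yields the content fragments; the function name iter_sse_content is hypothetical, and it handles only the data: lines and the [DONE] sentinel described here.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a chat completion SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and any other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":" there"}}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

The final chunk carries an empty delta with only finish_reason, which is why the parser checks for content before yielding.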

Parsing in Python

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://route.cat/v1",
    api_key="rc_your_key_here",
)

stream = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True,
)

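for chunk in stream:
    # Skip chunks that carry no text, such as the final chunk with only finish_reason.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)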

Parsing in JavaScript

JavaScript

const response = await fetch("https://route.cat/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer rc_your_key_here",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "llama3.1:8b",
    messages: [{ role: "user", content: "Tell me a joke" }],
    stream: true,
  }),
});

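const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // holds a partial line until the rest of it arrives

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // A network chunk can end mid-line, so decode in streaming mode and
  // parse only complete lines; the unfinished tail stays in the buffer.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop();

  for (const line of lines) {
    if (line.startsWith("data: ") && line.trim() !== "data: [DONE]") {
      const data = JSON.parse(line.slice(6));
      const content = data.choices?.[0]?.delta?.content;
      // process.stdout is Node.js only; in a browser, append to the page instead.
      if (content) process.stdout.write(content);
    }
  }
}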

Billing

RouteCat uses a simple per-token billing model. You pay only for what you use, with no monthly subscriptions or minimums.

How It Works

Top up your balance by paying a Lightning invoice (see /v1/auth/topup).
Use tokens — each request deducts from your balance based on the model's per-token rate.
Providers earn — 95% of the token cost goes to the GPU provider who served the request.
Gateway fee — RouteCat takes a 5% fee to cover infrastructure and development.
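
The split described above is simple arithmetic. A short sketch, with a hypothetical function name and using Decimal to avoid floating-point drift on fractional sats:

```python
from decimal import Decimal

PROVIDER_SHARE = Decimal("0.95")  # provider's share of each request's cost


def split_cost(cost_sats: Decimal) -> tuple[Decimal, Decimal]:
    """Return (provider_sats, gateway_sats) for a request costing `cost_sats`."""
    provider = cost_sats * PROVIDER_SHARE
    gateway = cost_sats - provider  # the remaining 5%
    return provider, gateway


provider, gateway = split_cost(Decimal("100"))
print(provider, gateway)
```

For a request costing 100 sats, the provider receives 95 and the gateway keeps 5. How the gateway rounds fractional sats is not specified here.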

Pricing

Each model has its own per-token rate, visible via the /v1/models endpoint. Prices are set in USD and converted to satoshis using the current BTC/USD exchange rate (updated every 5 minutes).

Model	Input (per 1M tokens)	Output (per 1M tokens)
`llama3.1:8b`	$0.05 / ~50 sats	$0.10 / ~100 sats
`mistral:7b`	$0.04 / ~40 sats	$0.08 / ~80 sats
`gemma2:9b`	$0.06 / ~60 sats	$0.12 / ~120 sats

Prices shown in sats are approximate and depend on the current BTC/USD rate.

Spending Caps

If your balance reaches zero during a request, the response will still complete — you are not cut off mid-generation. However, subsequent requests will return a 402 error until you top up. Free-tier users receive 10 requests per day regardless of balance.

BTC/USD Conversion

The gateway fetches the BTC/USD rate every 5 minutes. The rate used for a given request is the rate in effect at the time the request is processed. You can check the current rate via /v1/stats.
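
The conversion itself is a single division. The sketch below shows why the satoshi figures in the pricing table are approximate: the same USD price maps to a different number of sats as the rate moves. The function name is hypothetical, and the rate of 100,000 is an assumed round figure that happens to reproduce the table values.

```python
SATS_PER_BTC = 100_000_000


def usd_to_sats(amount_usd: float, btc_usd: float) -> float:
    """Convert a USD amount to satoshis at the given BTC/USD rate."""
    if btc_usd <= 0:
        raise ValueError("btc_usd must be positive")
    return amount_usd / btc_usd * SATS_PER_BTC


# $0.05 per 1M input tokens at two different rates.
print(round(usd_to_sats(0.05, 100_000), 2))  # assumed round rate
print(round(usd_to_sats(0.05, 84_250), 2))   # rate from the /v1/stats example
```

At 100,000 USD per BTC the price is 50 sats per 1M tokens; at the 84,250 rate shown in the /v1/stats example it is about 59 sats.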

Provider Guide

Anyone with a GPU can earn Bitcoin by serving inference requests on the RouteCat network. Providers run Owlrun, a lightweight agent that connects to the gateway and processes jobs using locally-installed models via Ollama.

Prerequisites

A machine with a GPU (NVIDIA recommended, 8GB+ VRAM)
A Lightning Network address for receiving payouts

Setup

Install Ollama

Ollama provides the local model runtime. Install it and pull the models you want to serve.

Shell

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b
ollama pull mistral:7b

Install Owlrun

Download the Owlrun binary for your platform from the releases page, or build from source.

Shell

# Download latest release (Linux amd64)
wget https://github.com/routecat/owlrun/releases/latest/download/owlrun-linux-amd64
chmod +x owlrun-linux-amd64
sudo mv owlrun-linux-amd64 /usr/local/bin/owlrun

Configure & Run

Point Owlrun at the RouteCat gateway and provide your Lightning address for payouts.

Shell

owlrun \
  --gateway https://route.cat \
  --lightning-address you@walletofsatoshi.com \
  --ollama-url http://localhost:11434

Earnings & Payouts

Providers earn 95% of the per-token cost for every request they serve. Earnings accumulate until they reach the payout threshold (1,000 sats by default), at which point the gateway automatically sends a Lightning payment to your configured address.
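
To make the accumulation rule concrete, here is a toy model of a provider's pending earnings. It is an illustration only, not the gateway's implementation: the class name is hypothetical, and it assumes the full pending amount is paid out once the threshold is reached, which this guide does not specify.

```python
from decimal import Decimal

PROVIDER_SHARE = Decimal("0.95")        # provider's share, as stated above
PAYOUT_THRESHOLD_SATS = Decimal(1000)   # default threshold, as stated above


class EarningsLedger:
    """Tracks pending earnings and reports when a payout would be triggered."""

    def __init__(self, threshold: Decimal = PAYOUT_THRESHOLD_SATS):
        self.threshold = threshold
        self.pending = Decimal(0)

    def credit(self, request_cost_sats: Decimal) -> Decimal:
        """Add the provider's share of one request; return the payout, or 0 if none."""
        self.pending += request_cost_sats * PROVIDER_SHARE
        if self.pending < self.threshold:
            return Decimal(0)
        # Assumption: the whole pending amount is sent and the ledger resets.
        payout, self.pending = self.pending, Decimal(0)
        return payout
```

With the default threshold, serving requests worth 1,000 sats in total leaves 950 sats pending; the next 100 sats of requests would push the pending amount to 1,045 sats and trigger a payout.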

Tip

Pull popular models to maximize your chances of being selected for jobs. Check /v1/stats for overall network activity and /v1/audit to see which models recent jobs used.

Errors

The API uses standard HTTP status codes. All error responses include a JSON body with an error field describing the issue.

Error Response Format

JSON

{
  "error": "Insufficient balance. Please top up your account."
}

Status Codes

200 Request succeeded.

400 Bad request. The request body is malformed or missing required fields.

401 Unauthorized. API key is missing, invalid, or revoked.

402 Payment required. Your balance is zero and free-tier quota is exhausted. Top up via /v1/auth/topup.

404 Not found. The requested model or endpoint does not exist.

429 Rate limited. You have exceeded 60 requests per minute. Wait for the interval given in the Retry-After response header before retrying.

500 Internal server error. Something went wrong on the gateway. Please try again or report the issue.

503 Service unavailable. No provider nodes are currently online for the requested model. Try again shortly or choose a different model.

Retry Strategy

For 429 and 503 errors, implement exponential backoff with a maximum of 3 retries. For 500 errors, a single retry is usually sufficient.
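
The retry guidance above could be sketched as follows. The send callable is a hypothetical stand-in for whatever HTTP client you use and is assumed to return a (status, headers, body) tuple; the one-second base delay is also an assumption, as no specific delays are prescribed here.

```python
import time

# Maximum retries per status code, following the guidance above.
MAX_RETRIES = {429: 3, 503: 3, 500: 1}


def call_with_retries(send, sleep=time.sleep, base_delay: float = 1.0):
    """Call send() until it succeeds or the retry budget for its status is spent."""
    attempt = 0
    while True:
        status, headers, body = send()
        if attempt >= MAX_RETRIES.get(status, 0):
            return status, headers, body  # success, non-retryable, or out of retries

        # Exponential backoff: base_delay, then 2x, then 4x.
        delay = base_delay * (2 ** attempt)
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            try:
                delay = float(retry_after)  # honour the server's hint when numeric
            except ValueError:
                pass  # non-numeric value (e.g. an HTTP date): keep the backoff delay
        sleep(delay)
        attempt += 1
```

Passing sleep as a parameter keeps the function testable without real waiting. Adding a small random jitter to each delay is a common refinement when many clients retry at once.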

RouteCat is open source. Contribute on GitHub.