OpenAI API Proxy in Python: How to Build One (and When to Use HiAPI Instead)

If you're shipping anything on top of the OpenAI API, sooner or later you'll want a server between your app and api.openai.com. That server is what people call an OpenAI API proxy — a thin pass-through that adds the things OpenAI itself doesn't give you: per-customer keys, rate limits, logging, retries, key rotation.

This recipe walks through a working FastAPI proxy in under 100 lines, then shows where the pattern breaks down — and what to use instead when you need more than just OpenAI behind that one endpoint.

What you'll need

Python 3.10+
An OpenAI API key (the upstream one your proxy holds; clients never see it).
For the multi-provider section: a HiAPI key from the HiAPI dashboard — sign up and grab sk-....

The proxy itself depends only on fastapi, uvicorn, and httpx. The client snippet later in Step 1 also uses the openai package.

pip install fastapi uvicorn httpx

Step 1 — A minimal pass-through proxy

The simplest proxy reads the upstream key from an env var, validates an internal key sent by your client, and forwards the body. We use httpx.AsyncClient so a single worker can keep many upstream requests in flight at once instead of blocking on each one.

# proxy.py
import os
import httpx
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import StreamingResponse

UPSTREAM = "https://api.openai.com"
UPSTREAM_KEY = os.environ["OPENAI_API_KEY"]
INTERNAL_KEYS = set(os.environ["INTERNAL_KEYS"].split(","))  # comma-separated

app = FastAPI()
client = httpx.AsyncClient(timeout=60.0)


def auth(request: Request) -> str:
    header = request.headers.get("authorization", "")
    if not header.startswith("Bearer "):
        raise HTTPException(401, "missing bearer token")
    key = header.removeprefix("Bearer ").strip()
    if key not in INTERNAL_KEYS:
        raise HTTPException(401, "invalid internal key")
    return key


@app.post("/v1/{path:path}")
async def forward(path: str, request: Request):
    tenant = auth(request)
    body = await request.body()
    upstream_url = f"{UPSTREAM}/v1/{path}"

    # Replace the client's internal key with the real upstream key
    headers = {
        "Authorization": f"Bearer {UPSTREAM_KEY}",
        "Content-Type": request.headers.get("content-type", "application/json"),
    }

    upstream = await client.send(
        client.build_request("POST", upstream_url, content=body, headers=headers),
        stream=True,
    )

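    async def relay():
        # aiter_bytes() undoes any gzip/deflate applied upstream. We don't forward
        # Content-Encoding, so the client must receive decoded bytes.
        try:
            async for chunk in upstream.aiter_bytes():
                yield chunk
        finally:
            await upstream.aclose()  # release the upstream connection when done

    # Stream the response back so SSE / long generations don't time out
    return StreamingResponse(
        relay(),
        status_code=upstream.status_code,
        media_type=upstream.headers.get("content-type"),
    )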

Run it:

OPENAI_API_KEY=sk-upstream-key \
INTERNAL_KEYS=tenant-a,tenant-b \
uvicorn proxy:app --host 0.0.0.0 --port 8000

Now a client points the OpenAI SDK at your proxy instead of OpenAI directly:

from openai import OpenAI

client = OpenAI(api_key="tenant-a", base_url="http://localhost:8000/v1")
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)

The proxy strips the client's tenant-a key, swaps in the real upstream key, and forwards the body untouched. Streaming chat completions, embeddings, and any other POST endpoint under /v1/ go through the same handler; GET routes such as /v1/models would need their own route.

Step 2 — Add what OpenAI doesn't give you

The pass-through is the easy part. The reason to run a proxy is usually one of these:

Per-tenant rate limits. Hold a counter in Redis keyed by tenant, and reject (or queue) requests once a tenant exceeds its limit for the current window. Without this, one noisy customer can consume your whole OpenAI quota.

Audit logging. Log tenant, path, status, latency_ms, tokens_in, tokens_out. Never log the upstream Authorization header — strip it before any structured log call.

Key rotation. If UPSTREAM_KEY is loaded from a secret manager and refreshed on a timer, you can rotate OpenAI keys without redeploying the proxy. Clients keep using their tenant key; the upstream key swaps under them.

Cost guardrails. Inspect the request body (json.loads(body)) and reject if max_tokens or model is outside policy — e.g., block gpt-4o for free-tier tenants.
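
The cost-guardrail check can be sketched as a small function that runs on the raw body before forwarding. This is a minimal sketch, not a definitive policy engine: the POLICY table, the plan names, the token caps, and the PolicyViolation exception are all hypothetical names introduced here for illustration.

```python
import json

# Hypothetical policy table (an assumption): allowed models and a
# max_tokens cap per plan. Replace with your own plans and limits.
POLICY = {
    "free": {"models": {"gpt-4o-mini"}, "max_tokens": 512},
    "paid": {"models": {"gpt-4o-mini", "gpt-4o"}, "max_tokens": 4096},
}


class PolicyViolation(Exception):
    """Raised when a request body falls outside the tenant's plan."""


def check_policy(body: bytes, plan: str) -> None:
    """Validate a JSON request body against the plan; raise on violation."""
    policy = POLICY[plan]
    try:
        payload = json.loads(body)
    except ValueError as exc:  # covers invalid JSON and invalid UTF-8
        raise PolicyViolation("body is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise PolicyViolation("body must be a JSON object")

    model = payload.get("model")
    if model not in policy["models"]:
        raise PolicyViolation(f"model {model!r} is not allowed on plan {plan!r}")

    max_tokens = payload.get("max_tokens")
    if max_tokens is not None:
        if not isinstance(max_tokens, int) or max_tokens > policy["max_tokens"]:
            raise PolicyViolation(
                f"max_tokens must be an integer <= {policy['max_tokens']}"
            )
```

In the Step 1 handler you would call this after reading the body, catch PolicyViolation, and turn it into an HTTPException with a 4xx status. Apply it only to JSON routes: multipart uploads would fail the JSON parse.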

Here's the rate-limit hook added to the handler above:

from collections import defaultdict
import time

_window: dict[str, list[float]] = defaultdict(list)
_LIMIT = 60  # requests per 60s

def check_rate_limit(tenant: str):
    now = time.time()
    _window[tenant] = [t for t in _window[tenant] if now - t < 60]
    if len(_window[tenant]) >= _LIMIT:
        raise HTTPException(429, "rate limit exceeded")
    _window[tenant].append(now)

Call check_rate_limit(tenant) right after auth(). (For real traffic, move the counter into Redis with INCR + EXPIRE so it survives restarts and scales horizontally.)
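
The INCR + EXPIRE approach can be sketched as a fixed-window counter. The sketch assumes a client object with redis-py-style incr(key) and expire(key, seconds) methods. FakeRedis is a hypothetical in-memory stand-in so the example runs without a Redis server, and the ratelimit: key prefix is likewise an assumption.

```python
import time


class FakeRedis:
    """Hypothetical in-memory stand-in exposing only incr/expire."""

    def __init__(self):
        self._data = {}

    def incr(self, key):
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def expire(self, key, seconds):
        return True  # expiry is a no-op here; real Redis deletes the key


def allow_request(store, tenant, limit=60, window=60, now=None):
    """Fixed-window limiter: one counter per tenant per window."""
    now = time.time() if now is None else now
    bucket = int(now // window)            # index of the current window
    key = f"ratelimit:{tenant}:{bucket}"   # a new key for every window
    count = store.incr(key)                # atomic increment in real Redis
    if count == 1:
        # First hit in this window: let the key clean itself up later.
        store.expire(key, window * 2)
    return count <= limit
```

Note the behavioral difference from the in-memory version above: that one is a sliding window, while a fixed window lets a tenant burst up to twice the limit across a window boundary. If that matters, a sorted-set sliding log is the usual alternative. Inside an async handler you would want an async Redis client rather than a blocking one.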

Step 3 — Where the proxy pattern breaks

The proxy in Step 1 works because OpenAI's surface is one shape: POST /v1/chat/completions, body in, response out, all in seconds. The moment you want to add a second provider, the assumptions break:

Google Veo, HiAPI HappyHorse, Seedance — video models take minutes, not seconds. A synchronous proxy that holds the connection open will hit every timeout in your stack (load balancer, CDN, browser).
Image models — each provider names the same parameter differently: qwen-image-2-0 takes aspect_ratio, GPT Image takes size. Your "thin pass-through" now needs per-provider request translation.
Auth headers — OpenAI uses Authorization: Bearer, but other vendors use x-api-key headers or query string keys. More if-statements in your forwarder.
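
To make the auth-header point concrete, here is a rough sketch of what the forwarder starts to look like once it handles more than one auth style. The provider names vendor-a and vendor-b, the key query parameter, and the AUTH_STYLES table are hypothetical; real vendors each document their own scheme.

```python
from urllib.parse import urlencode

# Hypothetical provider table (an assumption for illustration only).
AUTH_STYLES = {
    "openai": "bearer",
    "vendor-a": "x-api-key",
    "vendor-b": "query",
}


def apply_auth(provider: str, url: str, key: str) -> tuple[str, dict]:
    """Return (url, headers) with the key attached the way the provider expects."""
    style = AUTH_STYLES[provider]
    if style == "bearer":
        return url, {"Authorization": f"Bearer {key}"}
    if style == "x-api-key":
        return url, {"x-api-key": key}
    if style == "query":
        # Key travels in the query string, so it can also leak into access logs.
        sep = "&" if "?" in url else "?"
        return f"{url}{sep}{urlencode({'key': key})}", {}
    raise ValueError(f"unknown auth style {style!r}")
```

Every new provider adds a row and possibly a branch, and the same growth repeats for request parameters, response shapes, and timeouts.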

You can keep building. Or you can use a service that already did this.

Step 4 — One unified endpoint instead of N proxies

HiAPI exposes one async task endpoint that covers OpenAI-family models (GPT Image), Google (Veo), Qwen, FLUX, Seedance, Wan, and others. The shape is the same regardless of provider:

# 1. Create — returns taskId immediately, generation runs async on our side
curl -X POST "https://api.hiapi.ai/v1/tasks" \
  -H "Authorization: Bearer sk-YOUR_HIAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-image-2-0",
    "input": {
      "prompt": "a glass data center at golden hour, isometric",
      "aspect_ratio": "16:9"
    }
  }'
# => {"code":0,"message":"ok","data":{"taskId":"tk-hiapi-01HZTQ..."}}

Then either poll for the result:

curl "https://api.hiapi.ai/v1/tasks/tk-hiapi-01HZTQ..." \
  -H "Authorization: Bearer sk-YOUR_HIAPI_KEY"
# => when status=="success": {"data":{"status":"success","output":[{"url":"https://..."}]}}

Or — better for production — register a callback so we POST you the result when it's ready:

curl -X POST "https://api.hiapi.ai/v1/tasks" \
  -H "Authorization: Bearer sk-YOUR_HIAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "happyhorse-1-0",
    "callback": { "url": "https://your.app/hiapi/cb", "when": "final" },
    "input": { "prompt": "..." }
  }'

In Python:

import asyncio
import os

import httpx

HIAPI_KEY = os.environ["HIAPI_KEY"]
HDRS = {"Authorization": f"Bearer {HIAPI_KEY}", "Content-Type": "application/json"}

async def generate(model: str, input_payload: dict) -> str:
    async with httpx.AsyncClient(timeout=30.0) as c:
        r = await c.post("https://api.hiapi.ai/v1/tasks",
                         headers=HDRS,
                         json={"model": model, "input": input_payload})
        r.raise_for_status()  # surface 401/5xx here instead of a KeyError below
        task_id = r.json()["data"]["taskId"]

        # Poll (or use callbacks in production)
        while True:
            r = await c.get(f"https://api.hiapi.ai/v1/tasks/{task_id}", headers=HDRS)
            r.raise_for_status()
            data = r.json()["data"]
            if data["status"] == "success":
                return data["output"][0]["url"]
            if data["status"] == "fail":
                raise RuntimeError(data.get("error"))
            await asyncio.sleep(2)

Swap qwen-image-2-0 for gpt-image-2-pro, seedance-2-0, or any model on the HiAPI models page — the call shape is the same. The reason a unified endpoint matters is exactly what makes the Step-1 proxy painful at scale: the provider differences live on our side, not in your code.
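
If you do poll, it is worth bounding the loop so a stuck task cannot hang a worker forever. The following is a sketch under assumptions: fetch is a hypothetical callable that returns the data object from the task-detail response, and the timeout and backoff values are arbitrary defaults rather than HiAPI recommendations. The terminal statuses (success, fail) come from the status enum listed in the production checklist.

```python
import asyncio
import time
from typing import Awaitable, Callable

TERMINAL = {"success", "fail"}


async def wait_for_task(
    fetch: Callable[[], Awaitable[dict]],
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    max_interval_s: float = 15.0,
) -> dict:
    """Poll until the task reaches a terminal status or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    delay = interval_s
    while True:
        data = await fetch()
        if data["status"] in TERMINAL:
            return data
        if time.monotonic() + delay > deadline:
            raise TimeoutError(
                f"task still {data['status']!r} after {timeout_s}s"
            )
        await asyncio.sleep(delay)
        # Back off gently so long video jobs don't generate constant traffic.
        delay = min(delay * 1.5, max_interval_s)
```

The caller still decides what a fail status means. This helper only guarantees that the wait ends.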

Auth failures look like this. A wrong or expired key returns HTTP 401 with {"error":{"code":"permission_denied","type":"hiapi_error","request_id":"..."}}. Keep the request_id when you log — it's how support traces the call. The full reference is on the Authentication page.
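
A small helper makes it harder to forget the request_id when logging failures. This sketch assumes only the error body shape shown above. The function name and the "unknown" fallback are hypothetical choices.

```python
import json


def describe_hiapi_error(status_code: int, body: str) -> str:
    """Build a one-line log message that always carries the request_id."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = {}  # body was not JSON (e.g. an HTML error page)
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    code = err.get("code", "unknown")
    request_id = err.get("request_id", "unknown")
    return f"HTTP {status_code} code={code} request_id={request_id}"
```

Log the returned string alongside your tenant id. Do not log the request headers themselves.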

Production checklist

Use callbacks, not polling. Polling thousands of tasks burns RPS for no reason. Pass callback.url + "when": "final" and we POST you the result. See Create Task.
Handle idempotency. Callback delivery can retry. Treat your callback handler as idempotent — key on taskId.
Don't trust the output URL forever. Each successful task returns an output[].url with an expireAt. Persist the bytes to your own storage if you need them past that.
Strip secrets from logs. Same rule as the Step-1 proxy: never log Authorization. With HiAPI it's just one upstream key instead of N, but the rule still holds.
Status values are flat. Get Task Detail returns a status from the enum queued | handling | archiving | success | fail. Only success and fail are terminal.
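
The idempotency item can be sketched as a deduplicating handler keyed on taskId. The callback payload shape used here (top-level taskId and status fields) is an assumption made for illustration, so check the Create Task reference for the real body. CallbackDeduper and on_result are hypothetical names.

```python
from typing import Callable


class CallbackDeduper:
    """Process each finished task at most once, even if delivery retries."""

    def __init__(self) -> None:
        # In-process set for illustration; use a database unique constraint
        # or similar shared store when running more than one worker.
        self._done: set[str] = set()

    def handle(self, payload: dict, on_result: Callable[[dict], None]) -> bool:
        """Return True if the payload was processed, False if skipped."""
        task_id = payload["taskId"]  # assumed field name
        if payload.get("status") not in {"success", "fail"}:
            return False             # not a terminal state, nothing to do
        if task_id in self._done:
            return False             # duplicate delivery, already handled
        on_result(payload)
        # Mark as done only after on_result succeeds, so a crash mid-way
        # lets a retried delivery try again.
        self._done.add(task_id)
        return True
```

Your web handler would return a 2xx response in both the processed and the duplicate case, so the sender stops retrying.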

FAQ

Do I need a proxy at all if I use HiAPI? Only for the same reasons you'd want one in front of any vendor: per-tenant keys, rate limits, audit logs. The point of HiAPI is that you're no longer writing per-provider translation logic inside that proxy — it's a thin auth + logging layer, not a smart router.

Can I use the OpenAI Python SDK against HiAPI? No — HiAPI's /v1/tasks is async-by-design (you get a taskId, not a final response). OpenAI's SDK expects a synchronous chat-completions shape. Use httpx directly (the snippet above) or our Python SDK.

Is there a cheaper plan for testing? Pricing is per-task per-model and shown on the pricing page. There's no monthly minimum — you pay for what you generate.

How do I rotate keys without downtime? Generate a new key in the dashboard, deploy it to your proxy / secret store, then revoke the old one. HiAPI keys can coexist — you can run both in parallel during the rollout window.

What does "model not available" mean? If you see MODEL_UNAVAILABLE from /v1/tasks, the model id is wrong or your key doesn't have access. Use the bare model id (e.g., qwen-image-2-0, not qwen/qwen-image-2-0) and check the models page for the current list.

If you've been keeping a custom OpenAI proxy alive just to add a second image or video model behind it, that's the moment to switch. The proxy stays useful for tenant management; the multi-provider problem belongs on the API side.