
When an image generation request fails with HTTP 429, or your batch stalls at exactly the same concurrency every time, you are hitting a rate limit, not a bug. Every production image API gates traffic on at least one of three axes: requests per minute (RPM), requests per day (RPD), and concurrency (tasks in flight at once). The difference matters when you debug.

RPM: how many POST calls you can fire inside a rolling 60-second window. Burst-friendly. Resets fast.

For image generation specifically, the bottleneck is almost always concurrency, not RPM. A single 1024×1024 task can take 8–30 seconds end-to-end, so 10 parallel tasks ≠ 10 RPM. Treating the two as the same thing is the most common cause of "my retries make it worse."
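To see why concurrency and RPM are different numbers, here is a back-of-the-envelope sketch. It assumes every concurrency slot stays busy and every task takes the same time, which real workloads will not do exactly; the function name is made up for illustration.

```python
def max_tasks_per_minute(concurrency: int, seconds_per_task: float) -> float:
    """Upper bound on tasks completed per minute when all slots stay full."""
    return concurrency * 60.0 / seconds_per_task


# 10 parallel tasks, using the 8-30 second range quoted above.
fast = max_tasks_per_minute(10, 8)
slow = max_tasks_per_minute(10, 30)
print(f"10 slots at 8 s/task:  up to {fast:.0f} tasks/min")
print(f"10 slots at 30 s/task: up to {slow:.0f} tasks/min")
```

The same 10 slots can correspond to very different request rates depending on task duration, which is why a retry policy tuned to RPM alone tends to misfire on image workloads.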
If you fire more than N POST /v1/tasks calls at once, the provider accepts the first N and the rest come back 429. Concurrency caps on most providers are not documented per key; they are inferred from your tier.

Calling gpt-image-2 and flux-1-pro from the same key does not mean they share one quota.

Walk through the fix steps below in order. Each one isolates a different axis.
Read the error body, not just the status. A real 429 from HiAPI looks like this:
{
  "error": {
    "code": "rate_limit_exceeded",
    "type": "hiapi_error",
    "request_id": "req_01HXXXXXXXXXXXXXXX"
  }
}
Grab the request_id. That's the single piece of information that lets support trace exactly which limit you hit.
Distinguish 429 from 401. A 401 with {"error":{"code":"permission_denied","type":"hiapi_error","request_id":...}} is not a rate limit — it's an auth or entitlement failure. Don't add backoff to a 401; it will never become a 200 with patience.
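A minimal sketch of that triage, assuming the error body has the shape shown above. The `classify` helper and its return keys are hypothetical, not part of any SDK; it only pulls out the fields worth logging and marks whether backoff can help.

```python
import json


def classify(status: int, body: str) -> dict:
    """Extract the loggable error fields and decide whether a retry can help."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        parsed = {}
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        error = {}
    return {
        "status": status,
        "code": error.get("code"),
        "request_id": error.get("request_id"),  # hand this to support
        "retryable": status == 429,             # back off on 429 only, never on 401
    }


sample = '{"error":{"code":"rate_limit_exceeded","type":"hiapi_error","request_id":"req_01HXXXXXXXXXXXXXXX"}}'
print(classify(429, sample))
```

Logging the `request_id` on every failure, retryable or not, means you already have it when you need to open a ticket.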
Cap your client-side concurrency before you cap your retries. A simple semaphore is enough:
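import asyncio
import httpx

sem = asyncio.Semaphore(4)  # match this to your provider's concurrency cap

async def submit(client: httpx.AsyncClient, payload: dict) -> httpx.Response:
    # At most 4 POSTs are in flight at once; extra callers wait here.
    async with sem:
        return await client.post("https://api.hiapi.ai/v1/tasks", json=payload)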
Start at 4. Tune up only after a full batch runs without a single 429.
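If you want to convince yourself the semaphore really holds the line before pointing it at a live API, a sketch like the following can help. It swaps the real POST for a short sleep and records the peak number of jobs in flight; `run_batch` and `fake_task` are illustrative names only.

```python
import asyncio


async def run_batch(n_tasks: int, cap: int) -> int:
    """Run n_tasks fake jobs behind a semaphore and return the peak in flight."""
    sem = asyncio.Semaphore(cap)
    in_flight = 0
    peak = 0

    async def fake_task() -> None:
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the real POST
            in_flight -= 1

    await asyncio.gather(*(fake_task() for _ in range(n_tasks)))
    return peak


print(asyncio.run(run_batch(n_tasks=20, cap=4)))  # peak never exceeds the cap
```

The same counter, attached to the real `submit` function, gives you a cheap way to confirm what concurrency your client actually reached during a batch.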
Use exponential backoff with full jitter on 429 only.
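import asyncio
import random

async def backoff(attempt: int) -> None:
    # Upper bound doubles per attempt (1s, 2s, 4s, 8s, ...) and is capped at 30s.
    # Full jitter: sleep a random time in [0, bound) to avoid a thundering herd.
    delay = min(30, 2 ** attempt) * random.random()
    await asyncio.sleep(delay)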
Cap retries at 5. After that, surface the failure — do not retry forever.
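One way to wire the backoff and the retry cap together is sketched below. It assumes a hypothetical zero-argument coroutine function `send` that performs the POST and returns an object with a `status_code` attribute (an `httpx.Response` has one). The `sleep` parameter exists only so the loop can be exercised without real waiting.

```python
import asyncio
import random

MAX_RETRIES = 5


async def submit_with_retry(send, max_retries=MAX_RETRIES, sleep=asyncio.sleep):
    """Call send() until it returns a non-429 response or retries run out."""
    for attempt in range(max_retries + 1):
        response = await send()
        if response.status_code != 429:
            # Success, or an error such as 401 that backoff will not fix.
            return response
        if attempt == max_retries:
            break
        # Full jitter: random delay below an exponentially growing, capped bound.
        delay = min(30, 2 ** attempt) * random.random()
        await sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

Raising after the last retry, rather than returning the final 429, makes the failure hard to ignore in calling code.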
Honor Retry-After when present. If the response has a Retry-After header, that's the provider telling you exactly when the window resets. Use it instead of your computed backoff for that one retry.
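A sketch of how that preference might look in code. The HTTP specification allows `Retry-After` to be either a number of seconds or an HTTP date, so the parser handles both; which form HiAPI sends is not stated here, and the helper names are illustrative. `headers` is assumed to be a mapping with a `.get` method (`httpx.Headers` matches names case-insensitively, while a plain dict does not).

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Parse a Retry-After value (seconds or HTTP date); None if unusable."""
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_delay(headers, attempt: int) -> float:
    """Use the server's hint when present, else exponential backoff with full jitter."""
    hinted = retry_after_seconds(headers.get("Retry-After"))
    if hinted is not None:
        return hinted
    return min(30, 2 ** attempt) * random.random()
```

Returning `None` for a malformed header, instead of raising, lets the caller fall through to the computed backoff without special handling.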
Separate keys per environment. If staging and production share a key, your concurrency cap is shared too. One key per environment + one per high-volume worker pool is the cheapest fix to "limits feel lower than the docs say."
HiAPI fronts a POST /v1/tasks endpoint that accepts your job, returns a task id immediately, and you poll for completion. This split matters for rate-limit math: the create call is cheap and short-lived, so RPM caps are generous; the heavy work is gated by concurrency on the model side, where it belongs. When a downstream provider is congested, the create call still succeeds — your task just waits in the queue, and GET /v1/tasks/{id} returns status: "queued" until a slot opens. You never see a 429 from transient downstream pressure; you only see one if you exceed your own key's quota.
This is the practical reason teams move from a direct provider integration to HiAPI when their image workload outgrows a single account: one Bearer key, one rate-limit budget to reason about, and bursts are absorbed by the queue instead of by your retry loop.
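The create-then-poll flow described above can be sketched as a small loop. This is an illustration under assumptions: `fetch_status` is a hypothetical coroutine that would wrap GET /v1/tasks/{id} and return the `status` string, and any status other than "queued" or "processing" is treated as terminal because the terminal status names are not listed here.

```python
import asyncio

PENDING = {"queued", "processing"}  # the non-terminal statuses named in this article


async def wait_for_task(fetch_status, task_id, interval=2.0, timeout=300.0,
                        sleep=asyncio.sleep):
    """Poll until the task leaves a pending status or the time budget is spent."""
    waited = 0.0
    while True:
        status = await fetch_status(task_id)
        if status not in PENDING:
            return status  # assumed terminal
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still {status} after {waited:.0f}s")
        await sleep(interval)
        waited += interval
```

Polling is a GET, so it does not occupy a concurrency slot on the create side, but it is still traffic; keep the interval at a second or more rather than spinning.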
Use this to confirm your key is healthy and your concurrency settings are sane. It submits one image task to gpt-image-2 and prints the JSON response, which includes the task id:
curl -s -X POST https://api.hiapi.ai/v1/tasks \
  -H "Authorization: Bearer $HIAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "input": {
      "prompt": "a single red apple on a wooden table, soft daylight",
      "size": "1024x1024"
    }
  }'
Expected responses:
200 with {"id":"task_...","status":"queued"} → key works, quota fine.
429 with {"error":{"code":"rate_limit_exceeded",...}} → you are over RPM, RPD, or concurrency. Slow down per the fix steps above.
401 with {"error":{"code":"permission_denied",...}} → auth failure, not a rate limit. Check your key.

Repeat this call from a quick seq 1 10 | xargs -n1 -P10 curl ... loop to find your concurrency ceiling. The number of calls accepted before the first 429 approximates your effective cap.
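The same ceiling probe can be sketched in Python if you would rather tally status codes than read shell output. Here `submit` is a hypothetical coroutine function that sends one create request and returns the integer HTTP status; nothing in this snippet is part of an official client.

```python
import asyncio
from collections import Counter


async def probe(submit, parallel: int = 10) -> Counter:
    """Fire `parallel` submissions at once and tally the status codes returned."""
    statuses = await asyncio.gather(*(submit() for _ in range(parallel)))
    return Counter(statuses)
```

With a real `submit`, `tally = asyncio.run(probe(submit))` gives `tally[200]` as a rough estimate of the concurrency cap and `tally[429]` as the overflow. Every accepted probe is a real task, so keep the prompt cheap and the run short.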
Related topics: tasks that stay queued or processing longer than expected; the /v1/tasks flow; and model_not_available errors, an adjacent error class that's often misread as a rate limit.

Is there a published RPM number for gpt-image-2 on HiAPI?
Limits are tier-based and can change as providers tune their own caps. Check your dashboard for your current effective limits rather than hard-coding a number from a blog post.
Why do retries sometimes make 429s worse?
A naive sleep(1) retry across N parallel workers turns one 429 into N synchronized retries that all hit again. Use full jitter (sleep * random.random()), not fixed delay.
Does HiAPI charge for 429 responses?
No. Failed task creations (429, 401, 4xx in general) are not billed. You are only charged for tasks that successfully produce output.
Does increasing concurrency client-side raise my cap?
No. Concurrency caps are server-side per key. Client-side parallelism only determines whether you stay under the cap or trip it.
What's the difference between RPM and TPM?
Image APIs gate on requests (RPM), not tokens (TPM) — there's no meaningful "token" unit for an image job. If you see a TPM limit in your dashboard, it applies to text/chat endpoints, not /v1/tasks image jobs.
Should I use a single key with high concurrency or many keys with low concurrency?
For image workloads, one key per logical worker pool is the simplest mental model. Many keys under one billing account don't actually multiply your quota; they share it.
Can I request a higher concurrency cap?
Yes, once you've shown sustained usage at your current tier. Send support the request_id of a representative 429 and they can trace which axis you're hitting.
Why didn't I see a Retry-After header?
Not all providers send one. When absent, fall back to exponential backoff with jitter starting at 1s. When present, trust it for that retry attempt.