Image Generation API Rate Limits Explained: What to Expect and How HiAPI Handles Bursts

hiapi

What "rate limit" actually means for image APIs

When an image generation request fails with HTTP 429 or your batch suddenly stalls at exactly the same concurrency every time, you are hitting a rate limit — not a bug. Every production image API gates traffic on at least one of three axes, and the difference matters when you debug:

RPM (requests per minute): how many POST calls you can fire inside a rolling 60-second window. Burst-friendly. Resets fast.
RPD (requests per day): a daily ceiling, usually tied to your plan tier. Resets at UTC midnight (or on your account's billing day).
Concurrency / in-flight tasks: how many image jobs can be running at the same time. Even if RPM is healthy, a task submitted beyond the cap (the 5th when the cap is 4, for example) gets queued or rejected.

For image generation specifically, the bottleneck is almost always concurrency, not RPM — a single 1024×1024 task can take 8–30 seconds end-to-end, so 10 parallel tasks ≠ 10 RPM. Treating these the same is the most common cause of "my retries make it worse."
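
A rough way to see why the two axes differ is to convert a concurrency cap into the request rate it can sustain. The sketch below assumes a hypothetical cap of 4 in-flight tasks and uses the 8–30 second latency range quoted above; the function name is illustrative and not part of any HiAPI SDK.

```python
def sustained_tasks_per_minute(concurrency_cap: int, seconds_per_task: float) -> float:
    """Upper bound on tasks completed per minute when every slot stays busy."""
    if concurrency_cap <= 0 or seconds_per_task <= 0:
        raise ValueError("cap and latency must be positive")
    return concurrency_cap * 60.0 / seconds_per_task


# Hypothetical cap of 4; latencies taken from the 8-30 s range in the text.
for latency in (8, 30):
    rate = sustained_tasks_per_minute(4, latency)
    print(f"cap=4, {latency}s per task -> about {rate:.0f} tasks/min")
```

Under these assumptions a cap of 4 sustains somewhere between 8 and 30 tasks per minute, so an RPM allowance above that range is never the limit you actually hit.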

Common causes ranked by hit rate

You're parallelizing past the concurrency cap. Most likely cause. Your loop fires 50 POST /v1/tasks calls at once, the provider accepts the first N, the rest come back 429. Concurrency caps on most providers are not documented per-key — they are inferred from your tier.
A retry storm after a transient error. A 5xx from one task triggers an immediate retry, every other in-flight task does the same, and your effective concurrency doubles inside one second.
Per-model RPM caps you didn't know about. A high-tier general key may still be capped per model, especially for new or premium models. Hitting gpt-image-2 and flux-1-pro from the same key does not mean they share one quota.
Daily quota exhausted mid-day. Less common but the most confusing: your requests work, then start failing at around the same wall-clock time every day and recover after the daily reset. That pattern points to RPD.
A noisy neighbor on a shared sub-account. If you provision keys per environment but they all share one billing account, staging traffic eats production quota.

Fix steps

Walk these in order. Each one isolates a different axis.

Read the error body, not just the status. A real 429 from HiAPI looks like this:
```
{
  "error": {
    "code": "rate_limit_exceeded",
    "type": "hiapi_error",
    "request_id": "req_01HXXXXXXXXXXXXXXX"
  }
}
```
Grab the request_id. That's the single piece of information that lets support trace exactly which limit you hit.
Distinguish 429 from 401. A 401 with {"error":{"code":"permission_denied","type":"hiapi_error","request_id":...}} is not a rate limit — it's an auth or entitlement failure. Don't add backoff to a 401; it will never become a 200 with patience.
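
The first two steps can be folded into one small helper that reads the error body and decides whether a retry makes sense. This is a minimal sketch: the JSON shape is the one shown above, while the function name and the action labels are assumptions made for illustration.

```python
import json
from typing import Optional, TypedDict


class Verdict(TypedDict):
    action: str                 # "ok", "retry_with_backoff", or "do_not_retry"
    code: Optional[str]         # error.code from the body, if any
    request_id: Optional[str]   # keep this for support tickets


def classify_response(status: int, body_text: str) -> Verdict:
    """Map a status code and raw body to a retry decision."""
    try:
        parsed = json.loads(body_text)
        err = parsed.get("error", {}) if isinstance(parsed, dict) else {}
    except json.JSONDecodeError:
        err = {}
    if not isinstance(err, dict):
        err = {}

    if 200 <= status < 300:
        action = "ok"
    elif status == 429:
        action = "retry_with_backoff"   # the only case where waiting helps
    else:
        action = "do_not_retry"         # 401 and friends: fix the key or entitlement
    return {"action": action, "code": err.get("code"), "request_id": err.get("request_id")}
```

Logging the returned request_id on every non-ok verdict keeps the information support needs close to the failure.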

Cap your client-side concurrency before you cap your retries. A simple semaphore is enough:

```
import asyncio
import httpx

sem = asyncio.Semaphore(4)  # match this to your provider's concurrency cap

async def submit(client, payload):
    # At most 4 requests are in flight at once; the rest wait here.
    async with sem:
        return await client.post("https://api.hiapi.ai/v1/tasks", json=payload)
```

Start at 4. Tune up only after a full batch runs without a single 429.

Use exponential backoff with full jitter on 429 only.

```
import asyncio
import random

async def backoff(attempt):
    # Full jitter: sleep a random time in [0, cap), where cap doubles per
    # attempt (1s, 2s, 4s, 8s, ...) up to 30s. attempt starts at 0.
    delay = min(30, 2 ** attempt) * random.random()
    await asyncio.sleep(delay)
```

Cap retries at 5. After that, surface the failure — do not retry forever.

Honor Retry-After when present. If the response has a Retry-After header, that's the provider telling you exactly when the window resets. Use it instead of your computed backoff for that one retry.
Separate keys per environment. If staging and production share a key, your concurrency cap is shared too. One key per environment + one per high-volume worker pool is the cheapest fix to "limits feel lower than the docs say."
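
The concurrency cap, the jittered backoff, the retry ceiling, and Retry-After can be combined into one loop. The sketch below is one possible arrangement, not HiAPI's recommended client: `send` is a hypothetical callable that performs the POST and returns `(status_code, headers)`, and only the seconds form of Retry-After is handled.

```python
import asyncio
import random
from typing import Awaitable, Callable, Mapping, Tuple

MAX_RETRIES = 5  # after this many retries, surface the failure
Send = Callable[[], Awaitable[Tuple[int, Mapping[str, str]]]]


def retry_delay(attempt: int, headers: Mapping[str, str]) -> float:
    """Prefer Retry-After (in seconds); otherwise use full-jitter backoff."""
    raw = headers.get("Retry-After") or headers.get("retry-after")
    if raw is not None:
        try:
            return max(0.0, float(raw))
        except ValueError:
            pass  # HTTP-date form is not parsed in this sketch
    return random.uniform(0, min(30, 2 ** attempt))


async def submit_with_retry(send: Send, sem: asyncio.Semaphore,
                            sleep=asyncio.sleep) -> Tuple[int, int]:
    """Return (final status, retries used); raise if still 429 after MAX_RETRIES."""
    for attempt in range(MAX_RETRIES + 1):
        async with sem:                 # hold a slot only while the request runs
            status, headers = await send()
        if status != 429:               # back off on 429 only
            return status, attempt
        if attempt < MAX_RETRIES:
            await sleep(retry_delay(attempt, headers))  # slot is released while waiting
    raise RuntimeError(f"still rate limited after {MAX_RETRIES} retries")

# With httpx, `send` could wrap client.post(...) and return
# (response.status_code, response.headers).
```

Releasing the semaphore before sleeping is a deliberate choice here: a worker that is waiting out a backoff does not occupy a slot that another request could use. Create the semaphore inside your running event loop so it works on older Python versions too.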

How HiAPI handles bursts

HiAPI exposes a POST /v1/tasks endpoint that accepts your job and returns a task id immediately; you then poll for completion. This split matters for rate-limit math: the create call is cheap and short-lived, so RPM caps are generous; the heavy work is gated by concurrency on the model side, where it belongs. When a downstream provider is congested, the create call still succeeds. Your task just waits in the queue, and GET /v1/tasks/{id} returns status: "queued" until a slot opens. You never see a 429 from transient downstream pressure; you only see one if you exceed your own key's quota.
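
The create-then-poll flow described above might be wrapped like this. It is a sketch under assumptions: `get_status` is a hypothetical callable that performs GET /v1/tasks/{id} and returns the status string, "queued" and "processing" are the only pending states named in this article, and any other value is treated as terminal because the final status names are not given here.

```python
import asyncio
from typing import Awaitable, Callable

PENDING = {"queued", "processing"}  # pending states mentioned in this article


async def wait_for_task(get_status: Callable[[str], Awaitable[str]],
                        task_id: str,
                        interval: float = 2.0,
                        timeout: float = 300.0,
                        sleep=asyncio.sleep) -> str:
    """Poll until the task leaves the pending states, or give up after `timeout`.

    The timeout counts time spent sleeping between polls, not request latency.
    """
    waited = 0.0
    while True:
        status = await get_status(task_id)
        if status not in PENDING:
            return status               # assumed terminal: anything not pending
        if waited >= timeout:
            raise TimeoutError(f"{task_id} still {status} after {waited:.0f}s")
        await sleep(interval)
        waited += interval
```

Polling at a fixed, modest interval keeps the GET traffic small relative to the create calls, so the poll loop itself is unlikely to be what pushes a key over its RPM budget.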

This is the practical reason teams move from a direct provider integration to HiAPI when their image workload outgrows a single account: one Bearer key, one rate-limit budget to reason about, and bursts are absorbed by the queue instead of by your retry loop.

Minimal verification example

Use this to confirm your key is healthy and your concurrency settings are sane. It submits one image task to gpt-image-2 and prints the task id:

```
curl -s -X POST https://api.hiapi.ai/v1/tasks \
  -H "Authorization: Bearer $HIAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-2",
    "input": {
      "prompt": "a single red apple on a wooden table, soft daylight",
      "size": "1024x1024"
    }
  }'
```

Expected responses:

200 with {"id":"task_...","status":"queued"} → key works, quota fine.
429 with {"error":{"code":"rate_limit_exceeded",...}} → you are over RPM, RPD, or concurrency. Slow down per the fix steps above.
401 with {"error":{"code":"permission_denied",...}} → auth failure, not a rate limit. Check your key.

Repeat this call from a quick seq 1 10 | xargs -n1 -P10 curl ... loop to find your concurrency ceiling: if some of the ten parallel requests come back 429, the number that were accepted approximates your effective cap.

Related guides

HiAPI task hang and timeout diagnosis — when a task stays in queued or processing longer than expected.
Why your HiAPI task callback isn't firing — async callback troubleshooting for the /v1/tasks flow.
Resolving model_not_available errors — adjacent error class that's often misread as a rate limit.
Model catalog — per-model pricing and endpoint type.
Pricing — tier-by-tier breakdown.

FAQ

Is there a published RPM number for gpt-image-2 on HiAPI? Limits are tier-based and can change as providers tune their own caps. Check your dashboard for your current effective limits rather than hard-coding a number from a blog post.

Why do retries sometimes make 429s worse? A naive sleep(1) retry across N parallel workers turns one 429 into N synchronized retries that all hit again. Use full jitter (sleep * random.random()), not fixed delay.

Does HiAPI charge for 429 responses? No. Failed task creations (429, 401, 4xx in general) are not billed. You are only charged for tasks that successfully produce output.

Does increasing concurrency client-side raise my cap? No. Concurrency caps are server-side per key. Client-side parallelism only determines whether you stay under the cap or trip it.

What's the difference between RPM and TPM? Image APIs gate on requests (RPM), not tokens (TPM) — there's no meaningful "token" unit for an image job. If you see a TPM limit in your dashboard, it applies to text/chat endpoints, not /v1/tasks image jobs.

Should I use a single key with high concurrency or many keys with low concurrency? For image workloads, one key per logical worker pool is the simplest mental model. Many keys under one billing account don't actually multiply your quota — they share it.

Can I request a higher concurrency cap? Yes, once you've shown sustained usage at your current tier. Send support the request_id of a representative 429 and they can trace which axis you're hitting.

Why didn't I see a Retry-After header? Not all providers send one. When absent, fall back to exponential backoff with jitter starting at 1s. When present, trust it for that retry attempt.