A working curl + Python guide for catalog packshots, lifestyle shots, variants and bilingual banners at $0.025 per image — including the size schema quirks no other hiapi image model has.

qwen-image-2.0 is Alibaba's second-generation text-to-image model, shipped through the hiapi /v1/tasks endpoint at $0.025 per image with a default 2K output. Two things make it interesting for ecommerce: its Chinese-text rendering is the cleanest in any current text-to-image model at this price band, and its prompt obedience holds up across long, descriptive product briefs — the kind a cross-border listing actually needs.
This guide is the working version of that workflow. Every API call is what you would paste into a terminal, the prompt templates are the ones we test-rendered on the live endpoint, and the per-image price is the live number from HiAPI pricing. Nothing here is theoretical: the cost figures assume the actual size values qwen-image-2.0 accepts, not the marketing ones.
Three things matter when you are pushing 200 SKUs at a time onto a marketplace listing.
For prompt-by-prompt tuning experiments and a six-recipe walkthrough that is style-led rather than ecommerce-led, see the companion piece qwen-image-2.0 prompt recipes. This one stays focused on the catalog use case.
qwen-image-2.0 ships under the unified /v1/tasks interface, which is the only image endpoint hiapi still serves. There are three quirks in the size schema worth getting right before you write any code:

- The field is `size`, not `aspect_ratio`, not `resolution`, not `image_size`. Other models on hiapi (nano-banana, flux variants) accept `aspect_ratio`; qwen-image-2.0 does not.
- The separator in `size` is `*`, not `x`. `"size": "2688x1536"` returns a 400; `"size": "2688*1536"` works.
- Only five `size` values are accepted. They are all 2K:

| Aspect | size value | Use |
|---|---|---|
| 16:9 (~1.75:1) | 2688*1536 | Hero shots, banners, listing covers |
| 4:3 (~1.37:1) | 2368*1728 | Editorial spreads, two-column ad units |
| 1:1 | 2048*2048 | Square packshots, marketplace thumbnails |
| 3:4 (~0.73:1) | 1728*2368 | Mobile-first product pages |
| 9:16 (~0.57:1) | 1536*2688 | Vertical video covers, TikTok/Reels stills |
If you submit any other value, the API responds with the literal allowlist in the error:

```json
{"code": 400, "message": "invalid input: size: value must be one of '2688*1536', '2368*1728', '2048*2048', '1728*2368', '1536*2688'"}
```
Treat that error message as the authoritative source of truth — if you see a new value land there in the future, the platform has expanded the schema.
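
Because the allowlist is short, it can be checked client-side before spending a request. The following is a minimal sketch, assuming the five values in the table above are the complete list; `SIZE_BY_ASPECT`, `size_for` and `check_size` are illustrative names, not part of any hiapi SDK.

```python
# Allowlist copied from the table above; update it if the API's 400 message
# ever starts listing additional values.
SIZE_BY_ASPECT = {
    "16:9": "2688*1536",
    "4:3": "2368*1728",
    "1:1": "2048*2048",
    "3:4": "1728*2368",
    "9:16": "1536*2688",
}
ALLOWED_SIZES = frozenset(SIZE_BY_ASPECT.values())


def size_for(aspect):
    """Return the size string for an aspect label such as "1:1"."""
    if aspect not in SIZE_BY_ASPECT:
        raise ValueError(
            f"unsupported aspect {aspect!r}; choose from {sorted(SIZE_BY_ASPECT)}"
        )
    return SIZE_BY_ASPECT[aspect]


def check_size(size):
    """Raise early on values the endpoint would reject with a 400."""
    if size not in ALLOWED_SIZES:
        # The most common mistake is the 'x' separator, so call it out.
        hint = " (use '*' as the separator, not 'x')" if "x" in size.lower() else ""
        raise ValueError(f"size {size!r} is not in the allowlist{hint}")
    return size
```

Calling `check_size(size)` at the top of a submit function turns a wasted round trip into a local exception with the same information the API would have returned.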
The whole call surface is two endpoints: POST /v1/tasks to enqueue, GET /v1/tasks/{taskId} to poll. Here is the minimum that produced the packshot below.
```bash
curl -s -X POST https://api.hiapi.ai/v1/tasks \
  -H "Authorization: Bearer $HIAPI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-image-2.0",
    "input": {
      "prompt": "A clear glass cosmetic dropper bottle on a clean off-white seamless paper background, soft daylight from camera left, gentle elliptical shadow, the label reads 玫瑰精华水 in bold serif Chinese with a small line of English Rose Essence Water below, the liquid inside has a faint pink tint, studio product photography, sharp focus on the label, no other props, square 1:1 composition",
      "size": "2048*2048"
    }
  }'
```
Response:

```json
{"code": 200, "data": {"taskId": "tk-hiapi-01KW8R..."}, "message": "success"}
```

Poll with:

```bash
curl -s https://api.hiapi.ai/v1/tasks/tk-hiapi-01KW8R... \
  -H "Authorization: Bearer $HIAPI_TOKEN"
```
The status field walks through handling → archiving → success. Total wall-clock time on our test runs was 25–45 seconds per image at 2K. When status is success, output[0].url holds the image — but it points at temp.hiapi.ai and carries an expireAt Unix timestamp, so download it immediately. Do not embed it on a page or pin it in a database; it will 404 within hours.
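
The polling rules above reduce to a small decision on each response. The sketch below assumes `expireAt` sits next to `url` inside `output[0]` and is expressed in seconds (the text only says it is a Unix timestamp, so check a real response first). `next_action` and the sample payload, including its URL, are hypothetical.

```python
import time
from typing import Optional, Tuple


def next_action(data: dict, now: Optional[float] = None) -> Tuple[str, Optional[str]]:
    """Map the `data` object of one poll response to (action, url)."""
    now = time.time() if now is None else now
    status = data.get("status")
    if status == "success":
        out = data["output"][0]
        expire_at = out.get("expireAt")  # assumed location and unit (seconds)
        if expire_at is not None and expire_at <= now:
            return "expired", None       # too late: resubmit the task
        return "download", out["url"]    # fetch now and store your own copy
    if status == "fail":
        return "failed", None
    return "wait", None                  # handling / archiving: poll again


# Placeholder payload shaped like the fields described above.
sample = {
    "status": "success",
    "output": [{"url": "https://temp.hiapi.ai/example.png", "expireAt": 2000}],
}
print(next_action(sample, now=1000))  # ('download', 'https://temp.hiapi.ai/example.png')
```

Keeping the decision in a pure function makes it easy to unit-test without touching the network.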
This packshot prompt, run on the live endpoint, produces a glass dropper bottle with the Chinese characters 玫瑰精华水 rendered cleanly across the label and the English Rose Essence Water in a smaller weight directly underneath. The model places both lines in the canonical label region without any layout hints in the prompt — the "the label reads X" phrasing is enough.
This is the loop we ran to generate the images on this page. It accepts a list of (slug, prompt, size) tuples, submits them concurrently up to a small limit, polls each task to completion, downloads the result, and converts it to WebP. The script writes the files to a local out/ directory; swap that step for an upload to your own storage. The pattern is intentionally boring (requests plus concurrent.futures) because the model is the interesting part.
```python
import concurrent.futures as cf
import io
import os
import time

import requests
from PIL import Image

TOKEN = os.environ["HIAPI_TOKEN"]
BASE = "https://api.hiapi.ai/v1/tasks"
H = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}


def submit(prompt, size):
    """Enqueue one render and return its task id."""
    r = requests.post(BASE, headers=H, json={
        "model": "qwen-image-2.0",
        "input": {"prompt": prompt, "size": size},
    }, timeout=60)
    r.raise_for_status()
    return r.json()["data"]["taskId"]


def wait(task_id, deadline_s=240):
    """Poll every 4 seconds until the task succeeds, fails, or times out."""
    deadline = time.time() + deadline_s
    while time.time() < deadline:
        r = requests.get(f"{BASE}/{task_id}", headers=H, timeout=30)
        r.raise_for_status()
        d = r.json()["data"]
        if d["status"] == "success":
            return d["output"][0]["url"]
        if d["status"] == "fail":
            raise RuntimeError(f"task {task_id} failed: {d.get('error')}")
        time.sleep(4)
    raise TimeoutError(f"task {task_id} not done in {deadline_s}s")


def render_one(item):
    """Submit, wait, download, and re-encode one job as WebP bytes."""
    slug, prompt, size = item
    task_id = submit(prompt, size)
    url = wait(task_id)
    resp = requests.get(url, timeout=60)  # temporary URL: download right away
    resp.raise_for_status()
    buf = io.BytesIO()
    Image.open(io.BytesIO(resp.content)).convert("RGB").save(buf, "WEBP", quality=88, method=6)
    return slug, buf.getvalue()


JOBS = [
    ("packshot-rose-water", "A clear dropper bottle, label 玫瑰精华水…", "2048*2048"),
    ("packshot-tea-canister", "A pale grey ceramic tea canister, label 龙井…", "2048*2048"),
    # … 200 more rows from your sheet …
]

os.makedirs("out", exist_ok=True)  # the output directory must exist before writing
with cf.ThreadPoolExecutor(max_workers=6) as pool:
    # pool.map yields results in JOBS order and re-raises the first error it hits
    for slug, webp in pool.map(render_one, JOBS):
        with open(f"out/{slug}.webp", "wb") as f:
            f.write(webp)
        print(f"saved {slug}.webp ({len(webp)} bytes)")
```
A few small notes on this that are not obvious from the code itself.
- max_workers=6 is the sweet spot we observed. At 8–10 you start to see queue depth show up as extra latency rather than throughput. The model side appears to fair-share concurrency per account.

These are the prompt shapes that survived our internal A/B against more verbose alternatives. They are written for English speakers using Chinese product labels, the most common cross-border case.
The default product shot. Specify backdrop color in plain English, label text in literal Chinese with the font family in English, and one source of light.
A pale grey ceramic tea canister on a clean off-white seamless paper background. The label reads 龙井 in bold black serif Chinese with the small English subtitle Longjing Green Tea below. Soft daylight from camera right, gentle elliptical shadow, no other props. Sharp focus on the label, studio product photography, square 1:1 composition.
The two things doing the work are "reads X in bold serif Chinese" (qwen-image-2.0 places this where it expects a label) and "sharp focus on the label" (makes the model treat the text as the focal point, not decorative).
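
With 200 SKUs the template is usually filled from a spreadsheet row rather than typed by hand. Here is a minimal sketch of that step; the function name and its parameters are illustrative, and the wording simply mirrors the packshot template above.

```python
def packshot_prompt(product, label_zh, label_en, light_side="right"):
    """Fill the packshot template with one SKU's product and label text."""
    return (
        f"{product} on a clean off-white seamless paper background. "
        f"The label reads {label_zh} in bold black serif Chinese with the "
        f"small English subtitle {label_en} below. "
        f"Soft daylight from camera {light_side}, gentle elliptical shadow, "
        "no other props. Sharp focus on the label, studio product "
        "photography, square 1:1 composition."
    )


# Reproduces the tea-canister prompt shown above.
print(packshot_prompt("A pale grey ceramic tea canister", "龙井", "Longjing Green Tea"))
```

Only the product description and the two label strings vary per row, so the phrases that anchor the label stay identical across the whole catalog.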
For lifestyle photography, describe the scene first and place the product into it last, not the other way around. The model is more likely to drift on the scene than on the product, so anchor the scene first.
A wooden Japanese-style table near a wide window in late-morning daylight. The window casts a soft rectangle of light onto the table. In the center of the table is the pale grey ceramic tea canister from the previous shot, with a small wisp of steam rising from a cup beside it. Loose tea leaves are scattered to the right. Wabi-sabi aesthetic, soft shadows, shallow depth of field with focus on the canister. 16:9 widescreen composition.
On our test render the canister carries the characters 龙井 with the English Longjing Green Tea directly below — the same label discipline as the packshot — and the steam, scattered leaves and window light all land as described.
When variants share an identical product geometry, you can render the whole grid in one call rather than three. This is cheaper, and the lighting matches across cells by construction.
Three pairs of low-top canvas sneakers shot from a slight top-down angle, arranged in one horizontal row on a clean light-grey seamless background. From left to right: dusty rose, sage green, ivory cream. Identical sneaker silhouette, identical lighting, identical shadow. Each pair faces the camera with toes pointing forward. Studio product photography, 1:1 composition, evenly spaced.
This pattern saves you from rendering each variant separately and then trying to match the lighting in post. The constraint "identical sneaker silhouette, identical lighting, identical shadow" is what holds the row together.
For Shopee, Lazada, and Taobao co-branded campaigns you usually need both a Chinese headline and an English subtitle in the same image. qwen-image-2.0 is the only model on hiapi that gets both right in the same render.
A horizontal ecommerce promotional banner on a soft pink gradient background. The main headline reads 双十一大促 in large bold black sans-serif Chinese in the center-left. Below the Chinese headline in smaller dark grey English serif: Double 11 Sale — Up to 50% Off. To the right is a clear glass dropper bottle of pink serum with a small white label. Subtle confetti texture in the background, clean and modern, 16:9 widescreen.
The trick that makes this consistently work is putting the Chinese first in the prompt and giving it the "large bold" qualifier, then giving the English the "smaller" qualifier. If you reverse the order the model tends to make the English headline-sized too. The product label inside the banner is the place to ease off — on our test render qwen picked a generic "Vitamin C Serum" label for the bottle because we did not spell the characters out; for branded campaigns, either include the literal label text in the prompt or composite the real label in post.
For composite assets that look like an Amazon or Tmall listing tile — product, headline, price, badges — the prompt has to read like a layout brief, not a photo brief.
An ecommerce listing card on a white background. In the top two-thirds, a clear glass dropper bottle of pink rose-essence water labeled 玫瑰精华水 / Rose Essence Water. In the bottom third, on the left, the price 99 元 in large bold black Chinese characters with a small English subtitle Rose Essence 30ml below. On the right, a small green badge that reads 新品 in white. Clean modern layout, soft drop shadow under the product, 4:3 composition.
For listings, render in 4:3 (2368*1728) rather than 1:1 — most marketplace listing components are 4:3 in their hero slot and you will get awkward crops at square.
The last pattern is the one that changes the unit economics. Once each render costs under three cents, you can stop pre-rendering banners for marketing campaigns and render them per customer on demand: "Hi 王女士, your monthly bundle is ready" with the customer's actual name baked into the banner. Cache by name hash, set a TTL, and you have personalized banner imagery for the cost of a small CDN bill.
The constraint is latency. 25–45 seconds per render is too slow for synchronous request flow but perfectly fine for an email send or a push notification — generate at queue time, embed when the message goes out.
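
The cache-by-name-hash idea can be sketched with an in-memory dictionary. This is an illustration only: `BannerCache`, `render_fn`, and the default TTL are hypothetical, and a production setup would more likely use Redis or object storage keyed the same way.

```python
import hashlib
import time


class BannerCache:
    """Tiny TTL cache keyed by a hash of (template id, customer name)."""

    def __init__(self, render_fn, ttl_s=30 * 24 * 3600, clock=time.time):
        self._render = render_fn   # callable(name) -> image bytes
        self._ttl = ttl_s
        self._clock = clock        # injectable so expiry can be tested
        self._store = {}           # key -> (expires_at, image bytes)

    @staticmethod
    def key(template_id, name):
        # Hashing keeps raw customer names out of cache keys and file paths.
        raw = f"{template_id}\x00{name.strip()}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, template_id, name):
        k = self.key(template_id, name)
        hit = self._store.get(k)
        if hit is not None and hit[0] > self._clock():
            return hit[1]              # fresh entry: no render, no cost
        image = self._render(name)     # slow path: one paid render
        self._store[k] = (self._clock() + self._ttl, image)
        return image
```

In this sketch `render_fn` would wrap the submit, wait, and download steps from the batch script, and `get` would be called from the queue worker that assembles the email or push notification, never from a synchronous page request.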
For balance, these are the four jobs we still hand off to other models on the same /v1/tasks endpoint.

- nano-banana-pro instead, it has a better grasp of "casual" composition.
- flux-schnell returns in ~5 seconds at $0.005/image but does not render Chinese text. Pick by latency budget first.

Here are the numbers for the workflow end to end, for a representative case.

| Step | Quantity | Unit cost | Total |
|---|---|---|---|
| Hero / packshot per SKU | 200 × 1 | $0.025 | $5.00 |
| Lifestyle per SKU | 200 × 1 | $0.025 | $5.00 |
| Variant grid per SKU (~3 variants in one render) | 200 × 1 | $0.025 | $5.00 |
| Bilingual banner per category (10 categories) | 10 × 1 | $0.025 | $0.25 |
| Listing card per SKU | 200 × 1 | $0.025 | $5.00 |
| Total image-gen cost | 810 renders | — | $20.25 |
At current hiapi pricing, the entire image budget for a 200-SKU cross-border launch lands around $20. Premium models for the same job (counting from $0.04 per image and up) land $30–$140 with longer per-image runtime. That's the gap qwen-image-2.0 was designed to fill: Chinese-aware ecommerce work at the cost band where you stop counting renders.
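
The totals in the table are easy to re-derive, which helps when the SKU count or the price changes. This short check uses the quantities and the $0.025 price from the table; the $0.04 figure is the lower bound quoted above for premium models.

```python
from decimal import Decimal

PRICE = Decimal("0.025")  # qwen-image-2.0, per image

# Render counts per step, taken from the table above.
renders = {
    "packshot": 200,
    "lifestyle": 200,
    "variant grid": 200,
    "bilingual banner": 10,
    "listing card": 200,
}

total_renders = sum(renders.values())
print(total_renders)                     # 810
print(total_renders * PRICE)             # 20.250
print(total_renders * Decimal("0.04"))   # 32.40
```

At $0.04 per image the same 810 renders come to $32.40, which is where the low end of the premium range comes from.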
flux-schnell and follow the same /v1/tasks shape; see Generating Ecommerce Product Images with FLUX.1 schnell for the comparison.

The whole point of the unified /v1/tasks endpoint is that switching models is changing one string. Once you have the loop in place for qwen-image-2.0, the same Python script renders against any image model on hiapi by editing the "model" field and matching the size schema for that model.
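
In practice, "matching the size schema" means the input object differs slightly per model. Below is a sketch of one way to isolate that difference. This guide only establishes that other models accept `aspect_ratio`, so the `"1:1"` value format and the field set for those models are assumptions to verify against each model's schema; `build_payload` is an illustrative name.

```python
# The five size strings qwen-image-2.0 accepts, keyed by aspect label.
QWEN_SIZES = {
    "16:9": "2688*1536", "4:3": "2368*1728", "1:1": "2048*2048",
    "3:4": "1728*2368", "9:16": "1536*2688",
}


def build_payload(model, prompt, aspect):
    """Build a /v1/tasks request body, hiding the per-model size field."""
    if model == "qwen-image-2.0":
        size_fields = {"size": QWEN_SIZES[aspect]}   # pixel string, '*' separator
    else:
        size_fields = {"aspect_ratio": aspect}       # assumed format for other models
    return {"model": model, "input": {"prompt": prompt, **size_fields}}


print(build_payload("qwen-image-2.0", "A tea canister", "1:1"))
# {'model': 'qwen-image-2.0', 'input': {'prompt': 'A tea canister', 'size': '2048*2048'}}
```

With the payload built in one place, the submit and polling code stays identical across models.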