Generating Ecommerce Product Images with qwen-image-2.0 via the hiapi API

A working curl + Python guide for catalog packshots, lifestyle shots, variants and bilingual banners at $0.025 per image — including the size schema quirks no other hiapi image model has.

hiapi

$0.025 cost per image

~25–45 s end-to-end time

$20 catalog cost for 200 SKUs


qwen-image-2.0 is Alibaba's second-generation text-to-image model, served through the hiapi /v1/tasks endpoint at $0.025 per image with a default 2K output. Two things make it interesting for ecommerce: its Chinese-text rendering is the cleanest of any current text-to-image model in this price band, and its prompt adherence holds up across long, descriptive product briefs, which is the kind of brief a cross-border listing actually needs.

This guide is the working version of that workflow. Every API call is what you would paste into a terminal, the prompt templates are the ones we test-rendered on the live endpoint, and the per-image price is the live number from the hiapi pricing page. Nothing here is theoretical: the cost figures assume the actual size values qwen-image-2.0 accepts, not the marketing ones.

Why qwen-image-2.0 for cross-border product images

Three things matter when you are pushing 200 SKUs at a time onto a marketplace listing.

Chinese-text rendering. Most ecommerce listings that sell into mainland China, Hong Kong, Taiwan or Southeast Asia need readable Chinese on packaging, banners or labels. qwen-image-2.0 is the only model on hiapi where you can write a prompt like "the bottle label reads 玫瑰精华水 in bold serif Chinese" and the model will render those exact glyphs in the exact place you describe.
Prompt obedience on long product briefs. Ecommerce briefs are long: material, color hex, surface finish, lighting direction, shadow shape, backdrop color, camera angle. qwen-image-2.0 holds together as the brief grows — the failure mode is usually one missed prop, not a hallucinated product.
Cost band that survives variant explosion. A typical SKU page wants 4–6 shots (packshot, lifestyle, top-down, scale comparison, two variants). At $0.025 per image, that's $0.10–$0.15 per SKU. A 200-SKU drop costs $20–$30. That's the band that makes "generate everything from scratch" cheaper than commissioning a partial photoshoot and filling gaps with stock.

For prompt-by-prompt tuning experiments and a six-recipe walkthrough that is style-led rather than ecommerce-led, see the companion piece qwen-image-2.0 prompt recipes. This one stays focused on the catalog use case.

The input schema for qwen-image-2.0 on hiapi

qwen-image-2.0 ships under the unified /v1/tasks interface, which is the only image endpoint hiapi still serves. Three size-schema quirks are worth getting right before you write any code:

The size parameter is named size — not aspect_ratio, not resolution, not image_size. Other models on hiapi (nano-banana, flux variants) accept aspect_ratio; qwen-image-2.0 does not.
The separator inside size is *, not x. "size": "2688x1536" returns a 400; "size": "2688*1536" works.
Only five size values are accepted. They are all 2K:

Aspect	size value	Use
16:9 (~1.75:1)	`2688*1536`	Hero shots, banners, listing covers
4:3 (~1.37:1)	`2368*1728`	Editorial spreads, two-column ad units
1:1	`2048*2048`	Square packshots, marketplace thumbnails
3:4 (~0.73:1)	`1728*2368`	Mobile-first product pages
9:16 (~0.57:1)	`1536*2688`	Vertical video covers, TikTok/Reels stills
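Because a wrong separator or an off-list size only surfaces as a 400 after the request is sent, it can help to normalize sizes on the client first. The sketch below is one way to do that; `QWEN_SIZES` and `normalize_size` are hypothetical names, and the aspect labels are the nominal ones from the table, not exact ratios.

```python
# Allowed qwen-image-2.0 sizes on hiapi, keyed by nominal aspect label
# (values copied from the table above; the labels are approximate ratios).
QWEN_SIZES = {
    "16:9": "2688*1536",
    "4:3": "2368*1728",
    "1:1": "2048*2048",
    "3:4": "1728*2368",
    "9:16": "1536*2688",
}
ALLOWED = set(QWEN_SIZES.values())


def normalize_size(value: str) -> str:
    """Return a size string qwen-image-2.0 accepts, or raise ValueError.

    Accepts an aspect label ("16:9"), a '*'-separated size ("2688*1536"),
    or an 'x'-separated size ("2688x1536"), which is rewritten to use '*'.
    """
    value = value.strip()
    if value in QWEN_SIZES:
        return QWEN_SIZES[value]
    candidate = value.lower().replace("x", "*")
    if candidate in ALLOWED:
        return candidate
    raise ValueError(f"unsupported size {value!r}; use one of {sorted(ALLOWED)}")
```

Rejecting unknown sizes locally means a typo in a 200-row SKU sheet fails fast, before any task is enqueued.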

If you submit any other value, the API responds with the literal allowlist in the error:

{"code": 400, "message": "invalid input: size: value must be one of '2688*1536', '2368*1728', '2048*2048', '1728*2368', '1536*2688'"}

Treat that error message as the source of truth: if a new value appears there in the future, the platform has expanded the schema.
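If you want to notice schema changes automatically, one option is to parse the allowlist out of that 400 message and diff it against the values you already know. This is a minimal sketch that assumes the message keeps the exact shape shown above (`... must be one of 'a', 'b', ...`); `parse_allowed_sizes` and `new_sizes` are hypothetical helpers.

```python
import re
from typing import List, Set

# The five values documented above.
KNOWN_QWEN_SIZES = {"2688*1536", "2368*1728", "2048*2048", "1728*2368", "1536*2688"}


def parse_allowed_sizes(message: str) -> List[str]:
    """Extract the single-quoted allowlist from a size-validation error.

    Returns an empty list if the message does not contain "must be one of".
    """
    _, sep, tail = message.partition("must be one of")
    if not sep:
        return []
    return re.findall(r"'([^']+)'", tail)


def new_sizes(message: str) -> Set[str]:
    """Sizes the API now advertises that this code does not know about yet."""
    return set(parse_allowed_sizes(message)) - KNOWN_QWEN_SIZES
```

Logging a non-empty `new_sizes(...)` result is a cheap way to learn that the schema grew without polling any documentation.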

A working request

The whole call surface is two endpoints: POST /v1/tasks to enqueue and GET /v1/tasks/{taskId} to poll. Here is the minimal request that produced the packshot described below.

curl -s -X POST https://api.hiapi.ai/v1/tasks \
  -H "Authorization: Bearer $HIAPI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-image-2.0",
    "input": {
      "prompt": "A clear glass cosmetic dropper bottle on a clean off-white seamless paper background, soft daylight from camera left, gentle elliptical shadow, the label reads 玫瑰精华水 in bold serif Chinese with a small line of English Rose Essence Water below, the liquid inside has a faint pink tint, studio product photography, sharp focus on the label, no other props, square 1:1 composition",
      "size": "2048*2048"
    }
  }'

Response:

{"code": 200, "data": {"taskId": "tk-hiapi-01KW8R..."}, "message": "success"}

Poll with:

curl -s https://api.hiapi.ai/v1/tasks/tk-hiapi-01KW8R... \
  -H "Authorization: Bearer $HIAPI_TOKEN"

The status field walks through handling → archiving → success. Total wall-clock time on our test runs was 25–45 seconds per image at 2K. When status is success, output[0].url holds the image — but it points at temp.hiapi.ai and carries an expireAt Unix timestamp, so download it immediately. Do not embed it on a page or pin it in a database; it will 404 within hours.

This packshot prompt, run on the live endpoint, produces a glass dropper bottle with the Chinese characters 玫瑰精华水 rendered cleanly across the label and the English Rose Essence Water in a smaller weight directly underneath. The model places both lines in the canonical label region without any layout hints in the prompt — the "the label reads X" phrasing is enough.
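The status and expiry rules from the polling step can be folded into one small check that runs on each poll response. This is a sketch, not hiapi's client: `result_url` is a hypothetical helper, and it assumes `expireAt` sits next to `url` inside `output[0]`, which you should confirm against a real payload.

```python
import time
from typing import Optional

IN_PROGRESS = {"handling", "archiving"}


def result_url(task: dict, now: Optional[float] = None, margin_s: float = 60.0) -> Optional[str]:
    """Return the temp image URL once the task succeeded and the URL is still fresh.

    Returns None while the task is still in progress (keep polling).
    Raises RuntimeError for any other non-success status, or if the URL is
    already expired or will expire within margin_s seconds.
    """
    status = task.get("status")
    if status in IN_PROGRESS:
        return None
    if status != "success":
        raise RuntimeError(f"task ended with status {status!r}")
    out = task["output"][0]
    # ASSUMPTION: expireAt is a Unix timestamp stored beside url in output[0].
    expire_at = out.get("expireAt")
    current = time.time() if now is None else now
    if expire_at is not None and current + margin_s >= expire_at:
        raise RuntimeError("temp URL has expired or is about to; re-render the task")
    return out["url"]
```

The safety margin is there so a URL that is seconds from expiry is treated as already gone, rather than failing halfway through the download.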

A Python loop that turns a SKU sheet into uploaded packshots

This is the loop we ran to generate the images on this page. It accepts a list of (slug, prompt, size) tuples, submits them concurrently up to a small limit, polls each task to completion, downloads the result, converts it to webp, and writes it to out/; swap that final write for an upload to your own storage. The pattern is intentionally boring (requests plus concurrent.futures) because the model is the interesting part.

import os, time, io, requests, concurrent.futures as cf
from PIL import Image

TOKEN = os.environ["HIAPI_TOKEN"]
BASE = "https://api.hiapi.ai/v1/tasks"
H = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}


def submit(prompt, size):
    # Enqueue one render and return the taskId to poll.
    r = requests.post(BASE, headers=H, json={
        "model": "qwen-image-2.0",
        "input": {"prompt": prompt, "size": size},
    }, timeout=60)
    r.raise_for_status()
    return r.json()["data"]["taskId"]


def wait(task_id, deadline_s=240):
    # Poll every 4 s until the task succeeds, fails, or the deadline passes.
    deadline = time.time() + deadline_s
    while time.time() < deadline:
        r = requests.get(f"{BASE}/{task_id}", headers=H, timeout=30)
        r.raise_for_status()
        d = r.json()["data"]
        if d["status"] == "success":
            return d["output"][0]["url"]
        if d["status"] == "fail":
            raise RuntimeError(f"task {task_id} failed: {d.get('error')}")
        time.sleep(4)
    raise TimeoutError(f"task {task_id} not done in {deadline_s}s")


def render_one(item):
    slug, prompt, size = item
    task_id = submit(prompt, size)
    url = wait(task_id)
    # The temp URL expires: download it once, immediately.
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    buf = io.BytesIO()
    # Re-encode the PNG as webp (quality 88, slowest/best compression method).
    Image.open(io.BytesIO(resp.content)).convert("RGB").save(buf, "WEBP", quality=88, method=6)
    return slug, buf.getvalue()


JOBS = [
    ("packshot-rose-water",  "A clear dropper bottle, label 玫瑰精华水…", "2048*2048"),
    ("packshot-tea-canister","A pale grey ceramic tea canister, label 龙井…", "2048*2048"),
    # … 200 more rows from your sheet …
]

os.makedirs("out", exist_ok=True)  # open() below fails if the folder is missing
with cf.ThreadPoolExecutor(max_workers=6) as pool:
    for slug, webp in pool.map(render_one, JOBS):
        with open(f"out/{slug}.webp", "wb") as f:
            f.write(webp)
        print(f"saved {slug}.webp ({len(webp)} bytes)")

A few notes that are not obvious from the code itself:

max_workers=6 is the sweet spot we observed. At 8–10 you start to see queue depth show up as extra latency rather than throughput. The model side appears to fair-share concurrency per account.
We convert to webp at quality 88 before storing. A 2K png from this endpoint is roughly 3–4 MB; the webp lands at 200–400 KB and is visually identical at product-page sizes. It is cheaper to ship, and webp copes better than JPEG with the fine texture the model renders.
The temp URL is only good for one download. If your upload step fails (S3 outage, network blip) you have to re-render — there is no second chance to fetch the same task output.
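The note above assumes download and upload happen as one step. One way to avoid paying for a second render when only the upload failed is to keep the downloaded bytes in memory and retry the upload from them, re-rendering only when the render or download itself fails. The sketch below shows that split; `render_then_store` is a hypothetical helper, and `render` and `store` are callables you supply (for example a wrapper around `render_one` and your S3 upload).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def render_then_store(render: Callable[[], bytes],
                      store: Callable[[bytes], T],
                      render_attempts: int = 2,
                      store_attempts: int = 3,
                      backoff_s: float = 1.0) -> T:
    """Render (submit + poll + one-shot download), then retry only the upload.

    Only `render` touches the one-shot temp URL, so a storage failure is
    retried from the in-memory bytes instead of triggering a new render.
    """
    last_error: Optional[Exception] = None
    image: Optional[bytes] = None
    for _ in range(render_attempts):
        try:
            image = render()
            break
        except Exception as exc:  # failed task, timeout, or failed download
            last_error = exc
    if image is None:
        raise RuntimeError(f"render failed after {render_attempts} attempts") from last_error

    for attempt in range(store_attempts):
        try:
            return store(image)
        except Exception as exc:  # storage outage, network blip
            last_error = exc
            if attempt + 1 < store_attempts:
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
    raise RuntimeError(f"store failed after {store_attempts} attempts") from last_error
```

The trade-off is memory: at 200–400 KB per webp, holding a handful of in-flight images per worker is negligible.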

Six prompt patterns that work for ecommerce

These are the prompt shapes that survived our internal A/B against more verbose alternatives. They are written for English speakers using Chinese product labels — the most common cross-border case.

Pattern 1 — Single-product packshot with Chinese label

The default product shot. Specify backdrop color in plain English, label text in literal Chinese with the font family in English, and one source of light.

A pale grey ceramic tea canister on a clean off-white seamless paper background. The label reads 龙井 in bold black serif Chinese with the small English subtitle Longjing Green Tea below. Soft daylight from camera right, gentle elliptical shadow, no other props. Sharp focus on the label, studio product photography, square 1:1 composition.

The two things doing the work are "reads X in bold serif Chinese" (qwen-image-2.0 places this where it expects a label) and "sharp focus on the label" (makes the model treat the text as the focal point, not decorative).
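If you are filling prompts from a SKU sheet, Pattern 1 can be kept as a template so the working phrases stay fixed while the product fields change. This is a minimal sketch; `packshot_prompt` and its parameters are hypothetical, and with the defaults it reproduces the tea-canister prompt above word for word.

```python
def packshot_prompt(product: str, label_zh: str, label_en: str,
                    backdrop: str = "clean off-white seamless paper",
                    light_from: str = "camera right",
                    font: str = "bold black serif") -> str:
    """Pattern 1: plain-English backdrop, literal Chinese label with the font
    named in English, a single light source, and focus pinned to the label."""
    return (
        f"{product} on a {backdrop} background. "
        f"The label reads {label_zh} in {font} Chinese with the small English subtitle {label_en} below. "
        f"Soft daylight from {light_from}, gentle elliptical shadow, no other props. "
        "Sharp focus on the label, studio product photography, square 1:1 composition."
    )
```

Keeping "reads X in ... Chinese" and "sharp focus on the label" as fixed strings means a spreadsheet edit cannot accidentally drop the two phrases doing the work.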

Pattern 2 — Lifestyle scene built around an empty plate

For lifestyle photography, describe the scene first and place the product into it last, not the other way around. The model is more likely to drift on the scene than on the product, so anchor the scene first.

A wooden Japanese-style table near a wide window in late-morning daylight. The window casts a soft rectangle of light onto the table. In the center of the table is the pale grey ceramic tea canister from the previous shot, with a small wisp of steam rising from a cup beside it. Loose tea leaves are scattered to the right. Wabi-sabi aesthetic, soft shadows, shallow depth of field with focus on the canister. 16:9 widescreen composition.

On our test render the canister carries the characters 龙井 with the English Longjing Green Tea directly below — the same label discipline as the packshot — and the steam, scattered leaves and window light all land as described.
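The scene-first rule for Pattern 2 is easy to lose when prompts are assembled by code, so one option is to make the ordering structural. In this hypothetical `lifestyle_prompt` helper, scene sentences always come first, the product placement after them, and style and framing last.

```python
from typing import List


def lifestyle_prompt(scene: List[str], product: str, details: List[str], style: str,
                     composition: str = "16:9 widescreen composition") -> str:
    """Pattern 2: anchor the scene first, then place the product into it.

    Each part becomes one sentence; trailing periods and spaces are stripped
    so parts can be written with or without their own full stop.
    """
    parts = [*scene, product, *details, style, composition]
    return " ".join(part.rstrip(". ") + "." for part in parts)
```

For example, passing the two table-and-window sentences as `scene` and the canister sentence as `product` yields the Pattern 2 prompt in the same order as above.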

Pattern 3 — Color variant grid in a single render

When variants share identical product geometry, you can render the whole grid in one call rather than three: it is cheaper, and the lighting matches across cells by construction.

Three pairs of low-top canvas sneakers shot from a slight top-down angle, arranged in one horizontal row on a clean light-grey seamless background. From left to right: dusty rose, sage green, ivory cream. Identical sneaker silhouette, identical lighting, identical shadow. Each pair faces the camera with toes pointing forward. Studio product photography, 1:1 composition, evenly spaced.

This pattern saves you from rendering each variant separately and then trying to match the lighting in post. The constraint "identical sneaker silhouette, identical lighting, identical shadow" is what holds the row together.
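A variant grid is mostly a fixed sentence with a color list dropped in, which makes it a natural template. The sketch below assumes you keep grids to two to four variants (an assumption, not a documented model limit); `variant_grid_prompt` is a hypothetical helper, and with the sneaker inputs it reproduces the prompt above exactly.

```python
from typing import List

COUNT_WORDS = {2: "Two", 3: "Three", 4: "Four"}  # assumed practical grid sizes


def variant_grid_prompt(product_plural: str, silhouette: str, colors: List[str],
                        backdrop: str = "clean light-grey seamless", extra: str = "") -> str:
    """Pattern 3: one render, one row, left-to-right colors, and the
    'identical silhouette/lighting/shadow' constraint that holds the row together."""
    if len(colors) not in COUNT_WORDS:
        raise ValueError("keep variant grids to 2-4 colors")
    extra_part = f"{extra} " if extra else ""
    return (
        f"{COUNT_WORDS[len(colors)]} {product_plural} shot from a slight top-down angle, "
        f"arranged in one horizontal row on a {backdrop} background. "
        f"From left to right: {', '.join(colors)}. "
        f"Identical {silhouette} silhouette, identical lighting, identical shadow. "
        f"{extra_part}"
        "Studio product photography, 1:1 composition, evenly spaced."
    )
```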

Pattern 4 — Bilingual promo banner

For Shopee, Lazada, and Taobao co-branded campaigns you usually need both a Chinese headline and an English subtitle in the same image. qwen-image-2.0 is the only model on hiapi that gets both right in the same render.

A horizontal ecommerce promotional banner on a soft pink gradient background. The main headline reads 双十一大促 in large bold black sans-serif Chinese in the center-left. Below the Chinese headline in smaller dark grey English serif: Double 11 Sale — Up to 50% Off. To the right is a clear glass dropper bottle of pink serum with a small white label. Subtle confetti texture in the background, clean and modern, 16:9 widescreen.

The trick that makes this work consistently is putting the Chinese first in the prompt with the "large bold" qualifier, then giving the English the "smaller" qualifier. If you reverse the order, the model tends to make the English headline-sized too.

The product label inside the banner is the place to ease off. On our test render, qwen picked a generic "Vitamin C Serum" label for the bottle because we did not spell the characters out; for branded campaigns, either include the literal label text in the prompt or composite the real label in post.
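Both banner rules (Chinese first with "large bold", English second with "smaller", and a spelled-out label for branded work) can be locked into a template. This is a sketch; `bilingual_banner_prompt` and its `label_text` parameter are hypothetical names.

```python
from typing import Optional


def bilingual_banner_prompt(headline_zh: str, subtitle_en: str, product: str,
                            background: str = "soft pink gradient",
                            label_text: Optional[str] = None) -> str:
    """Pattern 4: Chinese headline first and 'large bold', English subtitle
    second and 'smaller'. Pass label_text to spell out the product label
    instead of letting the model pick a generic one."""
    label = (f" with a white label that reads {label_text}" if label_text
             else " with a small white label")
    return (
        f"A horizontal ecommerce promotional banner on a {background} background. "
        f"The main headline reads {headline_zh} in large bold black sans-serif Chinese in the center-left. "
        f"Below the Chinese headline in smaller dark grey English serif: {subtitle_en}. "
        f"To the right is {product}{label}. "
        "Subtle confetti texture in the background, clean and modern, 16:9 widescreen."
    )
```

Pair the result with the 16:9 size, `2688*1536`, when you submit it.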

Pattern 5 — Cross-border marketplace listing card

For composite assets that look like an Amazon or Tmall listing tile — product, headline, price, badges — the prompt has to read like a layout brief, not a photo brief.

A square ecommerce listing card on a white background. In the top two-thirds, a clear glass dropper bottle of pink rose-essence water labeled 玫瑰精华水 / Rose Essence Water. In the bottom third, on the left, the price 99 元 in large bold black Chinese characters with a small English subtitle Rose Essence 30ml below. On the right, a small green badge that reads 新品 in white. Clean modern layout, soft drop shadow under the product, 4:3 composition.

For listings, render in 4:3 (2368*1728) rather than 1:1 — most marketplace listing components are 4:3 in their hero slot and you will get awkward crops at square.
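Because a listing card is a layout brief, a template that names each region before its contents keeps the structure stable across SKUs. The sketch below returns a `(prompt, size)` pair with the 4:3 size baked in, so it drops straight into the `JOBS` rows used by the loop earlier. `listing_card_job` is a hypothetical helper; it omits any shape word and lets the size value set the aspect.

```python
from typing import Tuple

LISTING_SIZE = "2368*1728"  # 4:3, as recommended above for listing hero slots


def listing_card_job(product: str, price_zh: str, price_subtitle_en: str,
                     badge_zh: str, badge_color: str = "green") -> Tuple[str, str]:
    """Pattern 5: describe the card region by region (top two-thirds,
    bottom-left price, bottom-right badge), then return it with a 4:3 size."""
    prompt = (
        "An ecommerce listing card on a white background. "
        f"In the top two-thirds, {product}. "
        f"In the bottom third, on the left, the price {price_zh} in large bold black Chinese characters "
        f"with a small English subtitle {price_subtitle_en} below. "
        f"On the right, a small {badge_color} badge that reads {badge_zh} in white. "
        "Clean modern layout, soft drop shadow under the product, 4:3 composition."
    )
    return prompt, LISTING_SIZE
```

Usage: `("card-rose-water", *listing_card_job("a clear glass dropper bottle of pink rose-essence water labeled 玫瑰精华水 / Rose Essence Water", "99 元", "Rose Essence 30ml", "新品"))` yields a `(slug, prompt, size)` tuple.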

Pattern 6 — Generated-on-demand for personalization

The last pattern is the one that changes the unit economics. Once each render costs under three cents, you can stop pre-rendering banners for marketing campaigns and render them per customer at request time: "Hi 王女士, your monthly bundle is ready" with the customer's actual name baked into the banner. Cache by name hash, set a TTL, and you have personalized banner imagery for the cost of a small CDN bill.

The constraint is latency. 25–45 seconds per render is too slow for synchronous request flow but perfectly fine for an email send or a push notification — generate at queue time, embed when the message goes out.
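The "cache by name hash, set a TTL" advice can be sketched as a small wrapper around whatever render function you already have. This is an in-memory illustration only: `BannerCache` and `render_fn` are hypothetical, the one-week default TTL is an arbitrary assumption, and a production version would store the re-hosted webp (never the expiring temp URL) in your own storage or CDN.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple


class BannerCache:
    """Cache rendered banner bytes keyed by a SHA-256 hash of the customer name.

    render_fn(name) is expected to run the submit/poll/download loop and
    return image bytes. Entries are re-rendered once their TTL has passed.
    """

    def __init__(self, render_fn: Callable[[str], bytes],
                 ttl_s: float = 7 * 24 * 3600,  # assumed default: one week
                 clock: Callable[[], float] = time.time):
        self._render = render_fn
        self._ttl = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # key -> (expires_at, image)

    @staticmethod
    def key(name: str) -> str:
        # Hashing keeps raw customer names out of cache keys and object paths.
        return hashlib.sha256(name.strip().encode("utf-8")).hexdigest()

    def get(self, name: str) -> bytes:
        k = self.key(name)
        now = self._clock()
        hit = self._entries.get(k)
        if hit is not None and hit[0] > now:
            return hit[1]
        image = self._render(name)  # 25-45 s: call this at queue time, not in a web request
        self._entries[k] = (now + self._ttl, image)
        return image
```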

What qwen-image-2.0 is not good at

For balance, here are four jobs we still hand off to other models on the same /v1/tasks endpoint.

Hand-held shots with motion blur. Studio packshots are the sweet spot. If you need something that looks unposed (UGC-style, slight motion, imperfect framing), try nano-banana-pro instead; it has a better grasp of "casual" composition.
Pixel-identical product variants. qwen-image-2.0 will render similar variants across separate calls, but if you need the exact same dropper bottle in five different lighting setups, you need a model with reference-image conditioning. None of the pure text-to-image models on hiapi guarantee pixel identity across calls.
Photo-realistic human faces at full body. Faces at close-up are fine; full-body or group shots tend to soften facial detail. For people-led ecommerce — apparel try-ons, makeup demos — pair qwen-image-2.0 with a reference-conditioned video model for the action shots.
Interactive turnaround. If you need image generation in the inner loop of a synchronous request (the user is waiting at the screen), 25–45 seconds is too long. flux-schnell returns in ~5 seconds at $0.005/image but does not render Chinese text. Pick by latency budget first.
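"Pick by latency budget first" can be written down as a two-step filter: drop models that are too slow, then drop models that cannot render Chinese if you need it, and take the cheapest survivor. The profiles below use only the figures quoted in this guide (upper end of the qwen-image-2.0 range) and should be refreshed from the hiapi pricing page; `ModelProfile` and `pick_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelProfile:
    name: str
    typical_latency_s: float  # rough wall-clock figure quoted in this guide
    price_usd: float
    renders_chinese: bool


# Figures from this guide only; check current pricing and latency yourself.
PROFILES: List[ModelProfile] = [
    ModelProfile("flux-schnell", 5, 0.005, False),
    ModelProfile("qwen-image-2.0", 45, 0.025, True),
]


def pick_model(latency_budget_s: float, needs_chinese_text: bool) -> str:
    """Filter by latency budget first, then by Chinese-text need; cheapest wins."""
    fits = [p for p in PROFILES if p.typical_latency_s <= latency_budget_s]
    if needs_chinese_text:
        fits = [p for p in fits if p.renders_chinese]
    if not fits:
        raise LookupError("no listed model meets this latency budget and text requirement")
    return min(fits, key=lambda p: p.price_usd).name
```

A `LookupError` here is the useful signal: a synchronous flow that needs Chinese text has no fit among these two, which points toward the generate-at-queue-time approach from Pattern 6.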

What a 200-SKU drop actually costs

Putting numbers on the workflow end-to-end, for a representative case.

Step	Quantity	Unit cost	Total
Hero / packshot per SKU	200 × 1	$0.025	$5.00
Lifestyle per SKU	200 × 1	$0.025	$5.00
Variant grid per SKU (~3 variants in one render)	200 × 1	$0.025	$5.00
Bilingual banner per category (10 categories)	10 × 1	$0.025	$0.25
Listing card per SKU	200 × 1	$0.025	$5.00
Total image-gen cost	810 renders	—	$20.25

At current hiapi pricing, the entire image budget for a 200-SKU cross-border launch lands around $20. Premium models for the same job start at $0.04 per image, which is already $32.40 for 810 renders, and land in the roughly $32–$140 range with longer per-image runtimes. That's the gap qwen-image-2.0 fills: Chinese-aware ecommerce work at a cost where you stop counting renders.
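The table reduces to renders = SKUs × shots per SKU + categories × shots per category, times the per-image price. A small calculator makes it easy to rerun for your own shot list; `launch_cost` is a hypothetical helper, and it uses `Decimal` because 0.025 has no exact binary floating-point representation.

```python
from decimal import Decimal
from typing import Tuple

QWEN_PRICE = Decimal("0.025")  # USD per image, as quoted in this guide


def launch_cost(skus: int, shots_per_sku: int, categories: int = 0,
                shots_per_category: int = 1,
                price: Decimal = QWEN_PRICE) -> Tuple[int, Decimal]:
    """Return (render count, total USD) for a catalog drop."""
    renders = skus * shots_per_sku + categories * shots_per_category
    return renders, renders * price
```

With the table's inputs, `launch_cost(200, 4, 10)` gives 810 renders and $20.25; the same shot list at the $0.04 premium floor, `launch_cost(200, 4, 10, price=Decimal("0.04"))`, gives $32.40.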

Where to go next

Read the qwen-image-2.0 prompt recipes for six style-led recipes that are not ecommerce-specific.
Use the hiapi pricing page as your source of truth for any cost number — qwen-image-2.0 is currently $0.025 per image but the page is authoritative.
For latency-sensitive flows that don't need Chinese text, swap to flux-schnell and follow the same /v1/tasks shape — see Generating Ecommerce Product Images with FLUX.1 schnell for the comparison.

The whole point of the unified /v1/tasks endpoint is that switching models is changing one string. Once you have the loop in place for qwen-image-2.0, the same Python script renders against any image model on hiapi by editing the "model" field and matching the size schema for that model.
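The one part of the request that does not carry over between models is the size field. A sketch of a per-model body builder, keeping one aspect label in your SKU sheet and mapping it at submit time, might look like this. `build_task` is hypothetical, and the `aspect_ratio` branch assumes other hiapi image models accept a "W:H" string; confirm each model's accepted values before relying on it.

```python
from typing import Any, Dict

QWEN_SIZES = {"16:9": "2688*1536", "4:3": "2368*1728", "1:1": "2048*2048",
              "3:4": "1728*2368", "9:16": "1536*2688"}


def build_task(model: str, prompt: str, aspect: str) -> Dict[str, Any]:
    """Build a /v1/tasks request body, translating one aspect label into
    each model's own size field."""
    if model == "qwen-image-2.0":
        if aspect not in QWEN_SIZES:
            raise ValueError(f"qwen-image-2.0 has no size for aspect {aspect!r}")
        size_field = {"size": QWEN_SIZES[aspect]}
    else:
        # ASSUMPTION: other hiapi image models take aspect_ratio as "W:H".
        size_field = {"aspect_ratio": aspect}
    return {"model": model, "input": {"prompt": prompt, **size_field}}
```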
