Video
Alibaba

Wan 2.6

Generate HD video up to 15 seconds with native synced audio and lip-sync, from text or a single image.

From 750 HGcoins / generation · pay per generation, no subscription

What it's for

Where Wan 2.6 shines

Talking Clips

Generate short video with native dialogue and lip-sync in a single pass, no separate audio pipeline.

Image to Motion

Bring a single still image to life as a moving clip using image-to-video mode.

HD Shorts

Produce 720p or 1080p clips up to 15 seconds straight from a text prompt.

Strengths

  • Runs both text-to-video and image-to-video, the latter guided by a single reference image
  • Generates native synchronized audio with lip-sync in the same pass, with no separate audio step
  • Outputs HD video at 720p and 1080p
  • Supports clip lengths of 5, 10, or 15 seconds

Trade-offs

  • Priced per second of output, so longer clips cost proportionally more
  • 1080p costs more than 720p
  • Image-to-video accepts at most one reference image
  • No aspect-ratio control is exposed in the HexGen form
Specs

At a glance

Type: Text-to-video and image-to-video
Vendor: Alibaba (Wan)
Resolution: 720p or 1080p
Duration: 5, 10, or 15 seconds
Reference images: Up to 1 (optional, for image-to-video)
Audio: Native synchronized audio with lip-sync
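
The constraints in the specs above can be checked before submitting a job. The following is a minimal sketch, not HexGen's or Alibaba's actual API: the class name `Wan26Settings` and its field names are hypothetical, and only the allowed values (720p or 1080p, 5/10/15 seconds, at most one reference image) come from this page.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the spec list on this page.
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_DURATIONS = {5, 10, 15}


@dataclass(frozen=True)
class Wan26Settings:
    """Hypothetical container for one generation's settings (illustrative only)."""

    prompt: str
    resolution: str
    duration_seconds: int
    # A single optional reference image; a plain path or URL string here.
    reference_image: Optional[str] = None

    def __post_init__(self) -> None:
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
        if self.duration_seconds not in ALLOWED_DURATIONS:
            raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)} seconds")

    @property
    def mode(self) -> str:
        # Supplying a reference image selects image-to-video; otherwise text-to-video.
        return "image-to-video" if self.reference_image else "text-to-video"


settings = Wan26Settings("Waves crash against dark rocks at sunset", "1080p", 10)
print(settings.mode)
```

Because the reference image is a single optional field rather than a list, the "at most one image" limit is enforced by the shape of the data itself.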

About Wan 2.6

Wan 2.6 is Alibaba's latest Wan video model, built for cost-effective HD video that arrives with sound already attached. It runs in two modes: text-to-video, where a written prompt is all you need, and image-to-video, where a single reference image guides the motion. You pick the resolution and duration, and Wan 2.6 renders picture and audio together in one pass.

What sets it apart is native synchronized audio. Dialogue, sound effects, and lip-sync are generated together with the picture, so there is no separate audio track to stitch on afterward. Output is available at 720p and 1080p, in clip lengths of 5, 10, or 15 seconds.

On HexGen you choose your resolution and duration, add an optional reference image for image-to-video, and generate. Pricing is per second of output and tiered by resolution, so a short 720p clip costs less than a longer 1080p one. You always know what you are paying for before you run.

Prompt ideas

Starting points

Copy, tweak, and run. Good prompts get you most of the way there.

A barista in a cozy cafe looks up at the camera and says good morning, steam rising from the espresso machine, warm light, soft ambient chatter in the background.

Waves crash against dark rocks at sunset, seagulls calling overhead, slow cinematic push-in over the shoreline.

A golden retriever bounds across a sunny park chasing a red ball, leaves crunching underfoot, bright cheerful daytime scene.

Pricing
From 750 HGcoins / generation · ≈ $0.75

Pay only for what you render. 1 USD = 1,000 HGcoins. HGcoins never expire, and failed runs are refunded automatically.
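
The conversion above is simple arithmetic. As a rough sketch, the helpers below apply the stated rate of 1 USD = 1,000 HGcoins; the function names are made up for illustration, and the budget estimate assumes every generation costs the 750 HGcoin starting price, which longer or 1080p clips will exceed.

```python
HGCOINS_PER_USD = 1_000  # exchange rate stated on this page


def hgcoins_to_usd(hgcoins: int) -> float:
    """Convert an HGcoin amount to US dollars."""
    return hgcoins / HGCOINS_PER_USD


def usd_to_hgcoins(usd: float) -> int:
    """Convert US dollars to whole HGcoins."""
    return round(usd * HGCOINS_PER_USD)


def generations_for_budget(usd: float, hgcoins_per_generation: int) -> int:
    """Number of whole generations a USD budget covers at a fixed per-generation price."""
    return usd_to_hgcoins(usd) // hgcoins_per_generation


print(f"${hgcoins_to_usd(750):.2f}")      # prints $0.75
print(generations_for_budget(10.0, 750))  # prints 13
```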

Compare

Wan 2.6 vs other models

How Wan 2.6 stacks up against other video models in the HexGen catalog. Rankings are relative to the models listed here.

Model | Quality | Speed | Cost | Choose it when
Wan 2.6 (this model), Alibaba | Best | Fast | Mid cost | Pick Wan 2.6 when you want HD video with native synced audio and lip-sync in a single pass, from text or one image.
Kuaishou | Great | Fast | Mid cost | A capable Kuaishou video model for general text-to-video and image-to-video work.
ByteDance | Great | Fastest | Lower cost | ByteDance's video model for faster, lower-cost clip generation.

Bottom line: pick Wan 2.6 when you want HD video with native synced audio and lip-sync in a single pass, from text or one image. Otherwise one of the other models above will fit better.

Frequently asked questions

What is Wan 2.6?

Wan 2.6 is Alibaba's video model that generates HD video with native synchronized audio and lip-sync. It runs in text-to-video mode and image-to-video mode.