What inputs does it need?

For text-to-video, just a written prompt. For image-to-video, upload one image to animate, or two images to use as the first and last frame. No more than two reference images are accepted.

Does it generate sound?

Yes, it generates native synchronized audio alongside the video. Google describes the audio as an experimental feature, so it may be unavailable on some videos.

What resolutions and aspect ratios are supported?

Output is available at 720p, 1080p, and 4K, in landscape 16:9 or portrait 9:16. Square and other ratios are not offered.

How does pricing work?

Pricing is a flat rate per video, not per second. The cost depends on two choices: the tier (Lite, Fast, or Quality) and the output resolution (720p, 1080p, or 4K).

How do I run it on HexGen?

Pick Veo 3.1 in the generator, choose a tier and resolution, enter your prompt or upload reference image(s), select 16:9 or 9:16, and generate.

Video

Google

Veo 3.1

Google's Veo 3.1 turns text or images into video with native synchronized audio, up to 4K.

From 480 HGcoins / generation · pay per generation, no subscription

Examples

Made with Veo 3.1

Sample outputs. Open in Studio to generate your own.

What it's for

Where Veo 3.1 shines

Text to Video

Write a prompt and Veo 3.1 generates a complete clip with synchronized dialogue, sound effects, and ambient audio.

Animate a Still

Upload one image and the model brings it to life as motion video in 16:9 or 9:16.

Frame Transitions

Provide two images as the first and last frame to direct a controlled transition between two shots.

Strengths

Handles both text-to-video and image-to-video from one model
Generates native synchronized audio with dialogue, sound effects, and ambient sound alongside the video
Outputs up to 4K resolution
Three tiers (Lite, Fast, Quality) to balance speed and cost against fidelity
Image-to-video supports one image to animate or two images for first-and-last-frame control
Works in both landscape 16:9 and portrait 9:16

Trade-offs

Only two aspect ratios are offered, 16:9 and 9:16, with no square or other ratios
Image-to-video requires uploading reference image(s), and no more than two are accepted
Output duration is not user-selectable in this configuration
Audio is described by Google as an experimental feature and may be unavailable on some videos

Specs

At a glance

Type

Video generation (text-to-video and image-to-video)

Vendor

Google (DeepMind)

Model tiers

Lite, Fast, Quality

Resolution

720p, 1080p, 4K

Aspect ratios

16:9, 9:16

Reference images

0-2 (1 to animate, 2 for first + last frame)
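
The limits in this spec table can be checked before a job is submitted. The following is a minimal sketch in Python, not HexGen's actual API: the `VeoRequest` class, its field names and defaults, and the `generation_mode` function are all hypothetical, and only the allowed values come from the table above.

```python
from dataclasses import dataclass

# Allowed values taken from the spec table; everything else here is illustrative.
TIERS = {"Lite", "Fast", "Quality"}
RESOLUTIONS = {"720p", "1080p", "4K"}
ASPECT_RATIOS = {"16:9", "9:16"}
MAX_REFERENCE_IMAGES = 2


@dataclass(frozen=True)
class VeoRequest:
    """Hypothetical container for the options a user picks in the generator."""
    prompt: str
    tier: str = "Fast"
    resolution: str = "1080p"
    aspect_ratio: str = "16:9"
    reference_images: tuple = ()


def generation_mode(request: VeoRequest) -> str:
    """Validate a request against the listed specs and name the mode it implies."""
    if request.tier not in TIERS:
        raise ValueError(f"unknown tier: {request.tier!r}")
    if request.resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {request.resolution!r}")
    if request.aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {request.aspect_ratio!r}")

    count = len(request.reference_images)
    if count > MAX_REFERENCE_IMAGES:
        raise ValueError("no more than two reference images are accepted")
    if count == 0:
        # Text-to-video needs a written prompt.
        if not request.prompt.strip():
            raise ValueError("text-to-video requires a prompt")
        return "text-to-video"
    if count == 1:
        return "image-to-video (animate a still)"
    return "image-to-video (first + last frame)"
```

For example, `generation_mode(VeoRequest(prompt="a harbor at dusk"))` returns `"text-to-video"`, while a request with `aspect_ratio="1:1"` raises `ValueError` because square output is not offered.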

About Veo 3.1

Veo 3.1 is Google DeepMind's video generation model, available on HexGen for both text-to-video and image-to-video work. Describe a scene in words or hand it a reference image, and it produces a finished clip complete with native synchronized audio, including dialogue, sound effects, and ambient sound generated alongside the picture.

The model outputs up to 4K resolution and supports both landscape 16:9 and portrait 9:16 framing, so the same idea can be shaped for a widescreen edit or a vertical feed. Image-to-video accepts a single image to animate, or two images used as the first and last frame for a controlled transition from one shot to another.

Three tiers (Lite, Fast, and Quality) let you trade speed and cost against fidelity, so quick drafts and polished final renders both have a home. Pricing is a flat per-video rate determined by the tier you pick and the output resolution, which keeps the cost of a clip predictable before you hit generate.

Prompt ideas

Starting points

Copy, tweak, and run. Good prompts get you most of the way there.

A barista steam-frothing milk in a sunlit cafe, close-up on the cup, with the hiss of the steam wand and quiet morning chatter in the background. 16:9, 4K.

Animate this photo of a still harbor at dusk: gentle water ripples, a slow drifting boat, and the soft call of distant gulls. 9:16.

Transition from an empty stage to the same stage filled with light and a single dancer mid-spin, using the two reference images as first and last frame.

Pricing

From 480 HGcoins / generation · ≈ $0.48

Pay only for what you render. 1 USD = 1,000 HGcoins. HGcoins never expire and failed runs refund automatically.
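
The conversion above is simple arithmetic, sketched here in Python for budgeting a batch of renders. The function names are hypothetical, and the 480 HGcoin figure is the starting price quoted on this page; the actual price of a given clip depends on the tier and resolution selected.

```python
from decimal import Decimal

# Exchange rate stated on this page: 1 USD = 1,000 HGcoins.
HGCOINS_PER_USD = Decimal(1000)


def hgcoins_to_usd(coins: int) -> Decimal:
    """Convert an HGcoin amount to US dollars, rounded to the cent."""
    return (Decimal(coins) / HGCOINS_PER_USD).quantize(Decimal("0.01"))


def batch_cost_hgcoins(price_per_generation: int, generations: int) -> int:
    """Total HGcoins for a number of generations at one flat per-video price."""
    if price_per_generation < 0 or generations < 0:
        raise ValueError("price and generation count must be non-negative")
    return price_per_generation * generations


print(hgcoins_to_usd(480))                          # 0.48
print(hgcoins_to_usd(batch_cost_hgcoins(480, 10)))  # 4.80
```

Using `Decimal` rather than floats keeps the dollar figures exact to the cent.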

Compare

Veo 3.1 vs other models

Veo 3.1 sits at the high end of HexGen's video lineup, with native audio and 4K output. Here is how it lines up against two other video models in the catalog.

Model	Vendor	Quality	Speed	Cost	Choose it when
Veo 3.1 (this model)	Google	Best	Fast	Higher cost	Pick Veo 3.1 when you want the highest fidelity with native synchronized audio and up to 4K, and you can choose a tier to manage cost.
Kling 3.0	Kuaishou	Best	Fast	Higher cost	A strong Kuaishou alternative for high-end video generation.
Seedance 2.0	ByteDance	Great	Fastest	Mid cost	A ByteDance option when speed and lower cost matter more than maximum fidelity.

Bottom line: pick Veo 3.1 when you want the highest fidelity with native synchronized audio and up to 4K, and you can choose a tier to manage cost. Otherwise one of the other models above will fit better.

Frequently asked questions

Veo 3.1 generates video from a text prompt or from reference images, producing clips up to 4K with native synchronized audio that includes dialogue, sound effects, and ambient sound.