What inputs does it need?

A text prompt is required. You can optionally add up to 4 reference images (image files only) to steer an image-to-video result. Audio and video reference inputs are not exposed on HexGen.

What resolutions, aspect ratios, and durations are available?

Resolutions are 720p, 1080p, and 4K. Aspect ratios are 16:9 and 9:16. Clip durations are 4, 6, 8, or 10 seconds, with 10 seconds the maximum.

Does it generate audio?

Gemini Omni is marketed as native-audio video generation. However, Google's launch announcement noted that synchronized audio output was still being tested, so audio behavior may be limited.

How is it priced?

Pricing is per clip, set by a resolution-by-duration matrix. 720p and 1080p share one tier and 4K sits in a higher tier; within each tier, the price rises with clip length.
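The tier structure described above can be pictured as a small lookup. This is only a sketch: the tier names `standard` and `4k`, the function `clip_price`, and every number in `EXAMPLE_MATRIX` are hypothetical placeholders chosen for illustration, not HexGen's actual prices or API.

```python
# Tier mapping taken from the pricing description: 720p and 1080p share a tier,
# 4K sits in a higher one. The tier names themselves are made up for this sketch.
TIER_BY_RESOLUTION = {"720p": "standard", "1080p": "standard", "4K": "4k"}


def clip_price(matrix: dict, resolution: str, duration_s: int) -> int:
    """Look up a per-clip price in a {tier: {duration_s: price}} matrix."""
    tier = TIER_BY_RESOLUTION[resolution]
    return matrix[tier][duration_s]


# PLACEHOLDER numbers for illustration only. These are NOT real HexGen prices;
# they only mimic the shape: price rises with duration, and 4K costs more.
EXAMPLE_MATRIX = {
    "standard": {4: 10, 6: 15, 8: 20, 10: 25},
    "4k": {4: 20, 6: 30, 8: 40, 10: 50},
}

# 720p and 1080p resolve to the same cell; 4K resolves to a higher one.
same_tier = clip_price(EXAMPLE_MATRIX, "720p", 6) == clip_price(EXAMPLE_MATRIX, "1080p", 6)
```

The real per-clip price for a given resolution and duration is whatever HexGen shows when you select those settings.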

How do I run it on HexGen?

Open the model in HexGen, write your prompt, optionally add up to 4 reference images, then pick resolution, aspect ratio, and duration before generating your clip.

Video

Google

Gemini Omni Video

Google's Gemini Omni turns a text prompt, optionally guided by up to 4 reference images, into video at resolutions up to 4K.

Open in Studio | View examples

From 1050 HGcoins per generation · pay per generation, no subscription

Examples

Created with Gemini Omni Video

Sample outputs. Open Studio to generate your own.

What it's for

Where Gemini Omni Video excels

Social Clips

Generate short 4 to 10 second clips in vertical 9:16 for mobile feeds or landscape 16:9 for wider placements.

Image To Video

Attach up to 4 reference images to guide the motion and look of a generated clip from existing stills.

4K Hero Shots

Render higher-resolution footage up to 4K from a single text prompt for polished marketing moments.

Strengths

Generates video directly from a written text prompt, with the prompt as the required input
Supports image-to-video using up to 4 optional reference images
Outputs at resolutions up to 4K
Selectable clip durations of 4, 6, 8, or 10 seconds
Handles both landscape 16:9 and portrait 9:16 aspect ratios
Built by Google on the Gemini Omni multimodal foundation

Trade-offs

Aspect ratios are limited to 16:9 and 9:16, with no square or other ratios
Maximum clip length is 10 seconds
Reference inputs accept image files only; no audio or video references are exposed
4K output and longer clips cost more, since price scales with both resolution and duration
Native synchronized audio was still in testing at Google's launch, so audio behavior may be limited despite native-audio marketing

Specs

At a glance

Type

Text-to-video / image-to-video

Vendor

Google (Gemini Omni)

Resolution

720p, 1080p, 4K

Aspect ratios

16:9, 9:16

Duration

4s, 6s, 8s, 10s

Reference images

Optional, up to 4 (images only)

Pricing

Per clip, tiered by resolution and duration (4K costs more)
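The limits in the specs above can be expressed as a small pre-flight check before you open Studio. This is a minimal sketch: `ClipSettings` and `validate` are hypothetical names made up for illustration, not part of any HexGen or Google API; only the allowed values come from the specs.

```python
from dataclasses import dataclass, field

# Allowed values as listed in the specs above.
RESOLUTIONS = {"720p", "1080p", "4K"}
ASPECT_RATIOS = {"16:9", "9:16"}
DURATIONS_S = {4, 6, 8, 10}
MAX_REFERENCE_IMAGES = 4


@dataclass
class ClipSettings:
    """Hypothetical container for one generation request (not a HexGen type)."""
    prompt: str
    resolution: str
    aspect_ratio: str
    duration_s: int
    reference_images: list = field(default_factory=list)  # image file paths


def validate(settings: ClipSettings) -> list:
    """Return a list of problems; an empty list means the settings fit the specs."""
    problems = []
    if not settings.prompt.strip():
        problems.append("a text prompt is required")
    if settings.resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {settings.resolution}")
    if settings.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {settings.aspect_ratio}")
    if settings.duration_s not in DURATIONS_S:
        problems.append(f"unsupported duration: {settings.duration_s}s")
    if len(settings.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    return problems


# Example: a square aspect ratio is not offered, so this reports one problem.
print(validate(ClipSettings("A paper boat on a calm pond", "4K", "1:1", 10)))
```

Returning a list of problems rather than raising on the first one lets you report every out-of-range setting at once.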

About Gemini Omni Video

Gemini Omni is Google's multimodal video model, built on the Gemini family's multimodal architecture. On HexGen it turns a written prompt into video, and you can optionally guide the result with up to 4 reference images for an image-to-video workflow. The prompt is the required input, so you can start from words alone or pair them with visuals.

You control the output to fit where it will run. Pick a resolution up to 4K, choose landscape 16:9 or portrait 9:16, and set the clip length to 4, 6, 8, or 10 seconds. That makes it a flexible choice for short social clips, vertical mobile content, and higher-resolution hero footage from a single tool.

Gemini Omni is marketed as native-audio video generation. Note that Google's launch announcement described synchronized audio output as still being tested at release, so audio behavior may be limited. Pricing on HexGen is per clip, set by a resolution-by-duration matrix: 720p and 1080p share one tier, 4K sits in a higher tier, and within each tier the price rises with clip length.

Prompt ideas

Starting points

Copy, tweak, and run. A good prompt does most of the work.

A neon-lit Tokyo street at night in the rain, camera slowly pushing forward past glowing storefronts, reflections on wet pavement, cinematic 16:9.

Close-up of fresh coffee being poured into a white mug, steam rising in soft morning light, vertical 9:16 for a cafe promo.

A paper boat sailing across a calm pond as autumn leaves drift down, gentle ripples, warm late-afternoon sun, slow dolly shot.

Pricing

1050

HGcoins per generation · ≈ $1.05

Pay only for what you generate. 1 USD = 1,000 HGcoins. HGcoins never expire, and failed generations are refunded automatically.
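The dollar figure shown next to the coin price follows directly from the stated rate of 1 USD = 1,000 HGcoins. A minimal sketch of that conversion, where the function name `hgcoins_to_usd` is a hypothetical helper and not something HexGen provides:

```python
from decimal import Decimal

# Rate stated on the page: 1 USD = 1,000 HGcoins.
HGCOINS_PER_USD = Decimal(1000)


def hgcoins_to_usd(hgcoins: int) -> Decimal:
    """Convert an HGcoin amount to US dollars, rounded to cents."""
    return (Decimal(hgcoins) / HGCOINS_PER_USD).quantize(Decimal("0.01"))


print(hgcoins_to_usd(1050))  # prints 1.05
```

`Decimal` is used instead of floats so that cent amounts stay exact.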

Open in Studio | View top-up packs

Compare

Gemini Omni Video vs. other models

Gemini Omni Video is Google's multimodal video model offering text-to-video and image-to-video with output up to 4K. Here is how it sits next to two other video models in the HexGen catalog.

Model	Quality	Speed	Cost	Choose when
Gemini Omni Video (Google, this model)	Excellent	Fast	High cost	Pick this for Google's multimodal pipeline when you want up to 4K output and image-guided video in both landscape and portrait.
Kling 2.6 (Kuaishou)	Very good	Fast	Medium cost	A capable Kuaishou video alternative when you want a different motion model in the catalog.
Seedance 2.0 (ByteDance)	Very good	Ultra-fast	Low cost	Lean toward ByteDance Seedance when speed and lower cost matter more than 4K output.

In short: choose Gemini Omni Video when you want Google's multimodal pipeline with up to 4K output and image-guided video in both landscape and portrait. Otherwise, one of the other models above will be a better fit.

Frequently asked questions

What is Gemini Omni Video?

It is Google's Gemini Omni multimodal video model, which generates video from a text prompt. The prompt is the required input, and you can optionally guide the result with up to 4 reference images.