Happy Horse rewards structure over verbosity. The model has what its prompt guide calls a "prompt budget": past roughly 60 words, faces go generic, motion gets mushy, and lip-sync drifts. The fix is the six-part formula, the same skeleton Alibaba's ATH team built the model around.
This guide adapts that formula for AI influencer UGC video specifically: talking-head Reels, sponsored lip-sync ads, multilingual variants, multi-shot mini-stories, and atmospheric mood pieces. Every template is copy-paste ready and built to slot into the OmniGems AI pipeline alongside GPT-Image-2 persona anchors.
For background on what Happy Horse is and why we run it as the default video model, see the Happy Horse pillar guide.
The Six-Part Formula
Every Happy Horse prompt has six blocks. Order matters. Block-by-block:
- Subject – who or what is on screen, with persona invariants restated
- Action – what they do, as a single fluid motion phrase
- Environment – setting, lighting, time of day
- Style/Composition – aspect ratio, framing, visual tone
- Camera Motion – explicit move or static framing
- Audio – voiceover script, language, ambient bed
Skip a block and the model fills it with a generic default. Always provide all six, even if the answer is "static, no camera motion" or "no voiceover, ambient only."
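The six blocks can be assembled mechanically. Here is a minimal Python sketch of that assembly; `build_prompt` and `BLOCKS` are illustrative names, not part of any official Happy Horse SDK, and the helper refuses to emit a prompt with a missing block rather than let the model fill it with a default:

```python
# Illustrative six-block prompt builder (hypothetical helper, not an official API).
BLOCKS = ("Subject", "Action", "Environment", "Style", "Camera", "Audio")

def build_prompt(**blocks: str) -> str:
    """Join the six blocks in formula order; fail loudly on gaps."""
    missing = [b for b in BLOCKS if not blocks.get(b.lower())]
    if missing:
        # A skipped block would be filled with a generic default by the model.
        raise ValueError(f"missing blocks: {missing}")
    return " ".join(f"{b}: {blocks[b.lower()].rstrip('.')}." for b in BLOCKS)

prompt = build_prompt(
    subject="Same persona as reference, same face, same hair",
    action="Speaking directly to camera, natural blinks",
    environment="Sunlit Brooklyn cafe window seat, golden hour",
    style="9:16 vertical, casual iPhone-style",
    camera="Locked-off medium close-up, eye level",
    audio="No voiceover, ambient cafe sound only",
)
```

Because the helper validates all six keys, "static, no camera motion" and "no voiceover, ambient only" still have to be written out, which is exactly the discipline the formula asks for.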
Why Block Order Matters
The model parses prompts left-to-right and weights early blocks higher. Subject and Action carry the most quality budget. If you bury the persona invariants under decorative environment description, the persona drifts. Lead with who and what; let environment, style, and camera fall into place after.
The Prompt Budget
Aim for 40–60 words total across all six blocks. Twenty is too thin (the model fills gaps unpredictably). Eighty is too dense (quality dilutes across blocks).
The discipline that gets you there: one specific noun and one specific adjective per block. Not "a beautiful young woman with stunning features in a lovely outfit" – that's four adjectives doing the work of one good noun. Try "26-year-old, olive skin, cream turtleneck": three concrete details, one modifier each, done.
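A budget check is trivial to automate. This sketch uses a plain whitespace word count against the 40–60 thresholds from this guide; `check_budget` is a hypothetical helper, and a real counter might want to ignore block labels:

```python
def check_budget(prompt: str, low: int = 40, high: int = 60) -> str:
    """Flag prompts outside the 40-60 word sweet spot."""
    n = len(prompt.split())  # naive whitespace word count
    if n < low:
        return f"too thin ({n} words): the model fills gaps unpredictably"
    if n > high:
        return f"too dense ({n} words): quality dilutes across blocks"
    return f"within budget ({n} words)"
```

Run it on every prompt before generating; a clip is cheaper to fix in text than in render credits.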
Template 1: Talking-Head Reel
The bread and butter. Persona speaks to camera, 9:16, 8–12 seconds, single shot, conversational tone.
Subject: Same persona as reference image, same face, same hair. Action: Speaking directly to camera, slight head movement, natural blinks. Environment: Sunlit Brooklyn café window seat, soft golden hour light. Style: 9:16 vertical, casual iPhone-style, slight handheld drift. Camera: Locked-off medium close-up, eye level. Audio: Female voiceover, English, conversational tone – "Honestly? This one product changed my whole morning routine."
49 words. Within budget. Every block has one specific noun and one specific modifier. Pass the GPT-Image-2 persona anchor as the reference image and the model holds the face.
What to Vary
- Audio script – swap the line, keep everything else
- Environment – swap "Brooklyn café" for "Tokyo subway platform" or "Seoul rooftop at night"
- Time of day – swap "golden hour" for "blue hour" or "harsh midday"
- Wardrobe – restate the wardrobe in Subject if you're swapping it; the model needs the cue
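Treating the template as a dict makes the "swap one block, keep everything else" move mechanical. In this sketch, `swap_block` is a hypothetical helper and the environments come from the list above:

```python
# Base talking-head template as a dict of the six blocks.
base = {
    "subject": "Same persona as reference image, same face, same hair",
    "action": "Speaking directly to camera, slight head movement, natural blinks",
    "environment": "Sunlit Brooklyn cafe window seat, soft golden hour light",
    "style": "9:16 vertical, casual iPhone-style, slight handheld drift",
    "camera": "Locked-off medium close-up, eye level",
    "audio": "Female voiceover, English, conversational tone",
}

def swap_block(template: dict, block: str, value: str) -> dict:
    """Return a copy of the template with exactly one block replaced."""
    if block not in template:
        raise KeyError(f"unknown block: {block}")
    return {**template, block: value}

variants = [
    swap_block(base, "environment", env)
    for env in ("Tokyo subway platform", "Seoul rooftop at night")
]
```

Every variant differs from the base in exactly one key, which keeps output comparisons meaningful.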
Template 2: Sponsored UGC Ad with Lip-Sync
The format brands actually pay for. Persona on camera, holding the product, delivering the brand line.
Subject: Same persona as reference, same face, holding [product reference image] in right hand. Action: Showing product to camera, smiling, speaking the brand line. Environment: Bright kitchen counter, morning natural light through window. Style: 9:16 vertical, polished UGC, slight handheld. Camera: Medium close-up, locked, eye level. Audio: Female voiceover, English, warm and confident – "Three weeks in and I'm not going back."
53 words. Pass two reference images (persona anchor + product still). The model handles multi-image input cleanly.
Lip-Sync Tips
- Quote the script verbatim in the Audio block – paraphrasing it in the prompt produces drifted lip-sync
- Specify the language explicitly even if it's English – the model uses it to select phoneme-level lip patterns
- For brand names with unusual pronunciation, write them phonetically in a parenthetical:
"Try our new Nuance (NEW-AHNS) cream"
Template 3: Multilingual Localized Variant
Same persona, same scene, different language. This is where Happy Horse compounds – generate four language variants of one ad from one prompt skeleton.
Subject: Same persona as reference, same face, same wardrobe. Action: Speaking directly to camera, holding product, light smile. Environment: Same kitchen counter as English variant, morning light. Style: 9:16 vertical, polished UGC. Camera: Medium close-up, locked. Audio: Female voiceover, Japanese, warm and confident – "三週間使って、もう戻れない。" ("Three weeks in and I'm not going back.")
The only blocks that change between language variants are the script inside Audio and the language label. Subject, Action, Environment, Style, Camera stay identical. This is why one Happy Horse generation per language replaces an entire reshoot.
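The fan-out is a one-liner once the skeleton is a dict. In this sketch, `base_ad`, `SCRIPTS`, and `localize` are illustrative names rather than an official API; the English and Japanese lines come from the templates above, and the German line is my own translation of the English script:

```python
# One ad skeleton, fanned out into language variants. Only Audio changes.
base_ad = {
    "subject": "Same persona as reference, same face, same wardrobe",
    "action": "Speaking directly to camera, holding product, light smile",
    "environment": "Kitchen counter, morning natural light",
    "style": "9:16 vertical, polished UGC",
    "camera": "Medium close-up, locked",
    "audio": "",  # filled in per language below
}

SCRIPTS = {
    "English": '"Three weeks in and I\'m not going back."',
    "Japanese": '"三週間使って、もう戻れない。"',
    "German": '"Drei Wochen dabei, und ich gehe nicht mehr zurück."',
}

def localize(skeleton: dict, language: str) -> dict:
    """Swap in the localized script and language label; lock everything else."""
    audio = f"Female voiceover, {language}, warm and confident: {SCRIPTS[language]}"
    return {**skeleton, "audio": audio}

variants = {lang: localize(base_ad, lang) for lang in SCRIPTS}
```

One generation per entry in `SCRIPTS` replaces a reshoot per market.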
Supported Languages with Strong Lip-Sync
English, Mandarin, Cantonese, Japanese, Korean, German, French. For other languages the model still generates audio, but lip-sync quality degrades – see the Happy Horse vs Sora 2 vs Veo 3 breakdown.
Template 4: Multi-Shot Mini-Story
A 15-second beat with setup → action → payoff. Compress the sequence into a single fluid motion phrase in the Action block – multi-step prose breaks the cuts.
Subject: Same persona as reference, casual loungewear. Action: Opens fridge, pours iced matcha into glass, walks to window, looks at camera with raised eyebrow. Environment: Sunlit Brooklyn loft, late morning. Style: 9:16 vertical, three-shot cut, polished UGC. Camera: Shot 1 wide on fridge, shot 2 medium on pour, shot 3 close on look-to-camera. Audio: Ambient morning kitchen sounds, no voiceover, soft lo-fi music bed.
68 words – slightly over budget, but multi-shot inherently needs more. The trick: enumerate the shots inside Camera, not Action. Action describes the persona's continuous motion; Camera describes how the camera observes it.
Why This Works
Happy Horse trains on multi-shot sequences but parses the persona's motion as one trajectory. If you split the trajectory across multiple sentences in Action, the model treats each sentence as an independent generation request and continuity breaks. One Action sentence, one persona motion, one continuous beat, even when the camera cuts.
Template 5: Atmospheric Mood Piece
Slower, cinematic, non-speaking. Used for brand-establishing posts and influencer-launch announcements.
Subject: Same persona as reference, charcoal turtleneck, contemplative. Action: Walking slowly through coffee shop, pausing at window, gazing out. Environment: Tokyo coffee shop, blue hour, neon reflections in puddles outside. Style: 9:16 vertical, cinematic, color-graded teal-and-amber. Camera: Steadicam glide following persona, slow dolly-in to medium close-up at window. Audio: Ambient café sound, distant rain, lo-fi instrumental – no voiceover.
64 words. This format leans into Happy Horse's strengths: atmospherics, fabric dynamics, geometric consistency in reflections, cinema-grade color grading.
When to Use
- Influencer launch posts (introducing the persona to the feed)
- Campaign opening clips (set the mood before the talking-head ad lands)
- Sponsored brand films where the persona is the subject of the cinematography, not the speaker
Common Prompt Mistakes
- Bloated Subject blocks – "a beautiful young woman with cascading auburn hair, piercing blue eyes, a warm smile, wearing a stunning cream-colored turtleneck" eats half the budget. Compress: "26-year-old, auburn hair, cream turtleneck."
- Multi-step Action prose – "She opens the door, walks to the table, sits down, picks up a book, then opens it" produces broken cuts. Compress: "Opens door, sits at table reading."
- Decorative cinematography – "stunning, breathtaking, professional film look" is noise. The model wants concrete cinematography vocabulary: "locked-off medium close-up, eye level, slight handheld drift."
- Skipping Audio – if you don't specify it, you get random ambient. Always describe at least the audio bed, even on non-speaking clips: "ambient café sound, no voiceover."
- Vague script references – "speaking the brand line" with no quoted script in the Audio block produces TTS-quality lip-sync. Always quote the script verbatim and label the language explicitly.
- Restating the persona anchor description in text – pass the anchor as a reference image; in Subject, just write "Same persona as reference, same face, same hair." The image carries the heavy load.
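Two of these mistakes, the missing verbatim script and the missing language label, are mechanically checkable before you spend a generation. This is a hypothetical pre-flight lint sketch, not a real Happy Horse tool; the language list mirrors the strong lip-sync set above:

```python
import re

# Languages with strong lip-sync support, per the list earlier in this guide.
STRONG_LIPSYNC = ("English", "Mandarin", "Cantonese", "Japanese",
                  "Korean", "German", "French")

def lint_audio(audio: str) -> list[str]:
    """Return a list of problems found in an Audio block (empty = clean)."""
    problems = []
    if not re.search(r'"[^"]+"', audio):
        problems.append("no verbatim script in quotes")
    if not any(lang in audio for lang in STRONG_LIPSYNC):
        problems.append("no explicit language label")
    return problems
```

A non-empty return means fix the prompt, not the output.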
Prompt Iteration Workflow
The single-change-per-pass discipline that works for image generation works for video too:
- Generate the base clip with the full six-block prompt
- Lock five blocks; vary one
- Compare output to base; keep what works
- Move to next block; vary that one
- Stop iterating when you have a clip that ships
This is how series content stays coherent across 30+ daily Reels. Same persona anchor, same prompt skeleton, one variable at a time. Trying to vary three blocks at once produces unpredictable output and a folder of unusable takes.
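The lock-five-vary-one loop can be written down directly. In this sketch, `iterate_block` is a hypothetical helper, and `generate` and `score` are stubs standing in for the real render call and your own review of the clip:

```python
def iterate_block(base: dict, block: str, candidates: list,
                  generate, score) -> dict:
    """Lock five blocks, vary one, keep the best-scoring variant."""
    best, best_score = base, score(generate(base))
    for value in candidates:
        trial = {**base, block: value}   # exactly one block differs from base
        s = score(generate(trial))
        if s > best_score:
            best, best_score = trial, s  # keep what works, discard the rest
    return best

# Stub demo: "generate" returns the prompt itself; "score" prefers blue hour.
demo_base = {"environment": "soft golden hour light", "camera": "locked-off"}
best = iterate_block(
    demo_base, "environment",
    ["blue hour", "harsh midday"],
    generate=lambda p: p,
    score=lambda clip: "blue" in clip["environment"],
)
```

In practice `score` is you watching the takes; the point is that only one variable moves per pass, so you always know what caused a change.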
How OmniGems AI Uses This Formula
Inside the OmniGems AI Studio, the influencer's persona brief auto-generates the Subject block. The creator's content schedule defines the Action and Audio blocks. Style and Camera defaults are set per platform (9:16 for Reels/TikTok/Shorts, 16:9 for YouTube long-form). The creator only writes the Action and Audio variation; the rest is templated.
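That templating pattern can be sketched as per-platform defaults merged with creator-supplied blocks. Everything here is an assumption about how such a pipeline might be wired, not OmniGems's actual implementation; the default values are drawn from the templates in this guide:

```python
# Hypothetical per-platform Style/Camera defaults (illustrative values).
PLATFORM_DEFAULTS = {
    "reels":   {"style": "9:16 vertical, casual iPhone-style",
                "camera": "Locked-off medium close-up, eye level"},
    "tiktok":  {"style": "9:16 vertical, polished UGC",
                "camera": "Medium close-up, locked"},
    "youtube": {"style": "16:9 widescreen, cinematic",
                "camera": "Slow dolly-in to medium close-up"},
}

def fill_template(platform: str, subject: str, action: str,
                  environment: str, audio: str) -> dict:
    """Merge creator-supplied blocks with the platform's Style/Camera defaults."""
    return {"subject": subject, "action": action, "environment": environment,
            **PLATFORM_DEFAULTS[platform], "audio": audio}
```

The creator's daily input shrinks to Action and Audio; Subject comes from the persona brief and the rest is configuration.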
This is what turns Happy Horse from a powerful video model into a content-pipeline component. Discipline at the prompt level scales the discipline at the persona level.
Next Steps
- For why we picked Happy Horse over Sora 2 and Veo 3, see Happy Horse vs Sora 2 vs Veo 3
- For the persona anchor workflow that feeds image-to-video, see GPT-Image-2 for AI Influencers
- For aspect ratios and platform formats, see Best Aspect Ratios for Social Platforms
- For image-side prompt structure, see How to Write Prompts for AI Influencer Content
Start Generating
Try the six-part formula inside the OmniGems AI Studio. Persona anchor handled, video pipeline integrated, model routing per clip available, posting agent and token launch in the same flow.