Google’s flagship video generation model Veo3 has been around for some time, and if you are a frequent user of TikTok or other video platforms, you have almost certainly stumbled upon the Yeti, Bigfoot or talking-baby videos created with it. They feature quite realistic movement as well as lip-synced speech and sound effects. In terms of AI video generation, it is the benchmark at the moment.
However, it has problems keeping a character consistent over more than one prompt or video generation. If you look very closely, the Yeti or Bigfoot varies between videos from the same account.
A model named “nano banana” was hyped in the AI scene when it appeared on LMArena and showed excellent results in terms of realism, consistency and quality. It was rumored that Google was behind this new flagship model, and last week we got the confirmation: https://developers.googleblog.com/en/introducing-gemini-2-5-flash-image/ . What sets the model apart from many competitors is its ability to keep characters consistent and to make very fast, context-aware edits (for the record: similar workflows would also be possible with the flux and flux-kontext models). See this example of character-consistent images of me made with the Past Forward tool:


Which AI tools you need:
- AI Studio from Google for image generation with Gemini 2.5 Flash Image
- Google Flow for the Veo3 video generation (also possible in the Gemini app)
- (Alternatively, the Google models you need are also available on fal.ai)
Character
A random character I created using flux 1.1 ultra. (You can use whatever image model you feel most comfortable with; Midjourney obviously still gives the best results.)

Product
For testing I chose the Jack Daniels gummy bears I had previously generated with AI:

Step 1: Combining both in a product scene using Gemini 2.5 Flash Image:

Generate an realistic image like in a advertising campaign of the person in the image provided sitting in a forest in front of a campfire. in the back we can see his tent – he is obviously on a camping trip. the man is eating gummy bears from the bag shown in the other image. the brand and the visual of the gummy bear bag should be clearly recognizable like in a product ad.
This is the outcome (first try):

You can easily adapt the scene to your needs with simple prompts like “remove the whiskey bottle” or “change the sweater color to green”.
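If you would rather script Step 1 than click through AI Studio, the same request can be sent through the Gemini API. The following is only a minimal sketch, assuming the `google-genai` and `pillow` packages are installed and an API key is set in the `GEMINI_API_KEY` environment variable. The model ID, the function name `generate_image` and the file names are my assumptions for illustration; check the current Gemini API documentation for the exact model ID.

```python
from pathlib import Path

# Assumption: model ID at the time of writing; verify it in the Gemini API docs.
MODEL_ID = "gemini-2.5-flash-image-preview"

# The Step 1 prompt, copied verbatim from above.
SCENE_PROMPT = (
    "Generate an realistic image like in a advertising campaign of the person "
    "in the image provided sitting in a forest in front of a campfire. in the "
    "back we can see his tent – he is obviously on a camping trip. the man is "
    "eating gummy bears from the bag shown in the other image. the brand and "
    "the visual of the gummy bear bag should be clearly recognizable like in "
    "a product ad."
)


def generate_image(prompt, image_paths, out_path):
    """Send a prompt plus reference images and save the first returned image."""
    # Imported lazily so this file loads even without the third-party packages.
    from google import genai  # pip install google-genai
    from PIL import Image  # pip install pillow

    client = genai.Client()  # picks up the API key from the environment
    images = [Image.open(p) for p in image_paths]
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=[prompt, *images],
    )
    # The response can mix text and image parts; keep the first image part.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Path(out_path).write_bytes(part.inline_data.data)
            return Path(out_path)
    raise RuntimeError("The response contained no image part")


# Example usage (hypothetical file names):
# generate_image(SCENE_PROMPT, ["character.png", "gummy_bears.png"], "scene.png")
# generate_image("remove the whiskey bottle", ["scene.png"], "scene_v2.png")
```

The second call in the usage comment mirrors the follow-up edits: the generated scene goes back in as the only reference image, together with a short edit prompt.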
Step 2: Creating the product video with Veo3
We now take the image as the input reference for the Veo3 video. With the prompt we bring the image to life and add a voice-over to the video, like in a real advertisement. For more advanced use cases you can also use JSON prompting and the tool I created especially for this: https://veo3json.moweco.com/
A man eating the jack Daniels gummy bears from his bag sitting in the forest in front of a campfire saying: “enjoy real freedom with the new whiskey flavored gummy bears”
camera: professional like in an advertising campaign, slowly moving towards the man sitting
light: natural light, evening mood
sequence 1:
man eating from the bag and then saying “enjoy real freedom with the new whiskey flavored gummy bears” and smiling. 0-6s
sequence 2: big product shot of the gummy bear bag on white background. on the right top side we see then a big yellow background insert “Available now”
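For the JSON prompting mentioned earlier, the prompt above could be structured roughly as follows. This is only a sketch: Veo3 has no official JSON schema that I know of, so the field names (`description`, `camera`, `light`, `sequences`, `timing`) and the helper `build_veo_prompt` are assumptions of mine, and they are not necessarily what the veo3json tool produces. The JSON string is simply pasted into the prompt field in place of the free text.

```python
import json


def build_veo_prompt(description, camera, light, sequences):
    """Return the prompt as a JSON string with numbered sequences."""
    prompt = {
        "description": description,
        "camera": camera,
        "light": light,
        # Number the sequences from 1 so the order is explicit for the model.
        "sequences": [
            {"index": i, **seq} for i, seq in enumerate(sequences, start=1)
        ],
    }
    return json.dumps(prompt, indent=2, ensure_ascii=False)


SLOGAN = "enjoy real freedom with the new whiskey flavored gummy bears"

veo_prompt = build_veo_prompt(
    description=(
        "A man eating the jack Daniels gummy bears from his bag sitting in "
        f'the forest in front of a campfire saying: "{SLOGAN}"'
    ),
    camera=(
        "professional like in an advertising campaign, "
        "slowly moving towards the man sitting"
    ),
    light="natural light, evening mood",
    sequences=[
        {
            "action": f'man eating from the bag and then saying "{SLOGAN}" and smiling',
            "timing": "0-6s",
        },
        {
            "action": (
                "big product shot of the gummy bear bag on white background. "
                "on the right top side we see then a big yellow background "
                'insert "Available now"'
            ),
        },
    ],
)

print(veo_prompt)
```

Keeping camera, light and each sequence in separate fields makes it easier to change a single aspect between attempts without rewriting the whole prompt.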
Unfortunately, Google Flow wouldn’t let me upload and use realistic images of people (because of country restrictions in Europe). So you might want to use a VPN or, like me, use the fal.ai Veo3 endpoint for the creation (this used to work with the Gemini app as well).
This is my result (first attempt; I could have created more versions to optimize it, to get rid of the typo in the end frame or to get the product image exactly right in the end frame, but this is just to demonstrate what a first version looks like):
The result is far from a real advertisement someone would use, but I just wanted to briefly show the process in general. The Veo3 output in particular needs more refinement to get a really decent result. Unfortunately, the character consistency also breaks here. But with some more tweaks I am sure you can get advertising-quality results with these tools.
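For the fal.ai route, a call through the official Python client could look roughly like this. Treat it as a sketch under assumptions: it needs the `fal-client` package and a `FAL_KEY` environment variable, and the endpoint ID as well as the argument names `prompt` and `image_url` are my assumptions, so verify them on the model page on fal.ai before running it. The helper names are made up for illustration.

```python
# Assumption: endpoint ID for Veo3 image-to-video; verify it on fal.ai.
FAL_ENDPOINT = "fal-ai/veo3/image-to-video"


def build_arguments(prompt, image_url):
    """Collect the request arguments (names are assumptions, see above)."""
    return {"prompt": prompt, "image_url": image_url}


def generate_video(image_path, prompt, endpoint=FAL_ENDPOINT):
    """Upload the Step 1 image, run the endpoint and return the raw result."""
    # Imported lazily so this file loads even without the package installed.
    import fal_client  # pip install fal-client

    # Upload the local image so the endpoint can fetch it by URL.
    image_url = fal_client.upload_file(image_path)
    # subscribe() submits the job and blocks until it has finished.
    return fal_client.subscribe(
        endpoint,
        arguments=build_arguments(prompt, image_url),
    )


# Example usage (hypothetical file name):
# result = generate_video("scene.png", "A man eating the gummy bears ...")
# print(result)  # inspect the response to find the video URL
```

The function returns the raw response instead of picking out a field, because the exact response layout depends on the endpoint.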