Google DeepMind released Veo 3 in May 2026, and we got access early enough to run real tests before writing anything about it. To be clear about what 'tests' means here: we generated footage against several brand video briefs we had in-house, covering product reveals, background ambiance, abstract transitions, and a few attempts at simple character scenes. Some of it was immediately usable. Some of it was not. Here's what we found.
What's Genuinely New in Veo 3
Native audio is the headline feature and it's real. Veo 3 generates synchronized audio — ambient sound, sound effects, basic dialogue — alongside the video. Previous AI video models produced silent footage that required separate audio work. For certain types of content, like a product video with ambient environmental sound or a simple voiceover narrative, this changes the workflow meaningfully. The audio quality is not broadcast-ready, but it's good enough to inform post-production direction.
1080p output at up to around 8 seconds per clip is the other meaningful upgrade. Earlier Veo versions and competitors like Sora often struggled above 720p or produced noticeable artifacts at higher resolutions. At 1080p, Veo 3's footage holds up well for web and social use. It is not broadcast quality, but for digital placements it's workable.
What Brands Can Realistically Use It For
B-roll is where Veo 3 earns its place in a production workflow right now. Abstract transitions, product close-ups with no text or complex branding, environmental establishing shots, and textural background loops are all areas where the 8-second clip limit doesn't matter and character consistency isn't a requirement. We generated several usable background video segments for a client's social campaign within an hour. The conventional route would have meant half a day of shooting, or paying to license stock footage.
Simple product reveal sequences also work. A product emerging from darkness, a packaging shot with controlled camera movement, close-ups of liquids or textured materials: Veo 3 handles these with enough quality for social placements. Prompt-to-output time is short enough to iterate through several directions quickly, which has real value at the concept stage.
What Still Requires Human Direction
Character consistency is the main limitation. Generate the same character across two clips and they look like different people. For any brand video that tells a story with a recurring protagonist, you cannot rely on Veo 3 alone. You either need to accept that limitation and structure the video around it, or you need a human shoot for the character-driven parts.
Camera control is another gap. You can describe camera movement in your prompt — 'slow push in', 'arc shot from left' — but the model's execution of these instructions is inconsistent. Controlled camera work for product hero shots, the kind where you really need a smooth dolly move at a specific speed, still requires physical production. There's also no way to specify lens characteristics meaningfully, which matters for brand visual language.
Lip sync and dialogue are technically present in Veo 3 but nowhere near reliable enough for branded content. We tested a simple testimonial format and the lip sync degraded noticeably after the first couple of seconds. For now, avoid shots that show characters speaking on camera.
Veo 3 in a Production Workflow
The workflows where Veo 3 actually saves time and money look like this: supplementing a human shoot with AI-generated B-roll to extend edit options, generating multiple background and environmental variants quickly for client approval, and building out short-form social content where production volume is high and the polish required per clip is moderate. Using it as a replacement for a full production shoot is not realistic at this stage.
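As a rough illustration of the "multiple variants for client approval" workflow, the sketch below expands a few prompt fragments into every combination, so each direction can be generated and reviewed side by side. The fragment text and the `build_variants` helper are hypothetical and only produce prompt strings; they do not call Veo 3 or any Google API.

```python
from itertools import product

# Hypothetical prompt fragments; replace with wording from your own brief.
SUBJECTS = [
    "matte black water bottle on wet slate",
    "matte black water bottle on warm sand",
]
LIGHTING = ["soft dawn light", "hard studio rim light"]
CAMERA = ["slow push in", "static locked-off shot"]


def build_variants(subjects, lighting, camera):
    """Return one prompt string per combination of the three fragment lists."""
    return [
        f"{s}, {l}, {c}"
        for s, l, c in product(subjects, lighting, camera)
    ]


variants = build_variants(SUBJECTS, LIGHTING, CAMERA)
for i, prompt in enumerate(variants, start=1):
    # Numbered labels make it easier to refer to a variant in client feedback.
    print(f"variant {i:02d}: {prompt}")
```

Two options on each of three axes gives eight prompts. In practice you would likely prune the grid before generating, since every extra axis multiplies the review load.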
Comparison: Veo 3 vs. Sora vs. Runway Gen-3
| Criterion | Veo 3 | Sora | Runway Gen-3 |
|---|---|---|---|
| Max resolution | 1080p | 1080p | 1080p |
| Native audio | Yes | No | No |
| Max clip length | ~8 seconds | ~20 seconds (extended) | ~10 seconds |
| Photorealism | Very good | Excellent | Good |
| Character consistency | Weak | Medium | Medium |
| Camera control | Limited | Better | Good |
| Commercial access | Google AI Studio / Vertex | ChatGPT Pro / API | Runway subscription |
| Best for brands | B-roll, product shots | Cinematic concepts | Creative transitions |
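If you want to apply the table to a specific brief, one option is to treat it as data and filter on hard requirements. The sketch below copies three rows from the table above; the clip lengths are the approximate figures listed there, and the `shortlist` helper and its parameter names are our own illustration rather than anything from the vendors.

```python
# Values copied from the comparison table; clip lengths are approximate.
MODELS = {
    "Veo 3": {"max_clip_seconds": 8, "native_audio": True, "camera_control": "Limited"},
    "Sora": {"max_clip_seconds": 20, "native_audio": False, "camera_control": "Better"},
    "Runway Gen-3": {"max_clip_seconds": 10, "native_audio": False, "camera_control": "Good"},
}


def shortlist(min_clip_seconds=0, need_native_audio=False):
    """Names of models that meet the hard requirements of a brief."""
    return [
        name
        for name, spec in MODELS.items()
        if spec["max_clip_seconds"] >= min_clip_seconds
        and (spec["native_audio"] or not need_native_audio)
    ]


print(shortlist(need_native_audio=True))
print(shortlist(min_clip_seconds=10))
```

A filter like this only handles the measurable rows. Photorealism and character consistency are judgment calls that still need test footage against the actual brief.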
Our Honest Assessment After Initial Tests
Veo 3 is the most production-relevant AI video model we've tested to date. The native audio alone changes how you think about using AI video for certain content types. But it's not a shoot replacement and it's not close to one. Treat it as a powerful addition to a production toolkit rather than a substitute for one. The brands that get value from it now are the ones that use it for high-volume, moderate-quality digital content — social feeds, product demos, supplementary B-roll — not for flagship campaign films.
Frequently Asked Questions
How do I access Veo 3?
Via Google AI Studio and Google Vertex AI as of May 2026. Access is rolling out; availability may depend on your region and Google Cloud account status.
Can Veo 3 generate longer videos than 8 seconds?
Individual clips are currently around 8 seconds. Longer videos require stitching multiple clips together in post-production, which introduces the consistency problem between clips.
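To make the stitching step concrete, here is a minimal sketch that works out how many clips of roughly 8 seconds a target runtime needs and prepares the inputs for ffmpeg's concat demuxer. The file names are placeholders, the script only prints the list file and command rather than running ffmpeg, and the stream-copy approach assumes every clip shares the same codec, resolution, and frame rate.

```python
import math
import shlex

CLIP_SECONDS = 8  # approximate per-clip length described above


def clips_needed(target_seconds, clip_seconds=CLIP_SECONDS):
    """Number of clips required to cover target_seconds, ignoring transitions."""
    return math.ceil(target_seconds / clip_seconds)


def concat_list(paths):
    """Text of an ffmpeg concat list file, one file line per clip.

    Assumes the paths contain no single quotes.
    """
    return "".join(f"file '{p}'\n" for p in paths)


def concat_command(list_path, output_path):
    """argv for a stream-copy concat (no re-encode)."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", list_path, "-c", "copy", output_path,
    ]


# Placeholder names for a 30-second social spot.
clips = [f"clip_{i:02d}.mp4" for i in range(1, clips_needed(30) + 1)]
print(concat_list(clips), end="")
print(shlex.join(concat_command("clips.txt", "spot_30s.mp4")))
```

A 30-second spot needs four clips, which means three cut points where character or lighting mismatches can show. Joining the files is the easy part; planning cuts so those mismatches read as intentional is where the editing time goes.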
Is Veo 3 output usable for TV advertising?
Not yet, in our assessment. For web and social placements it's workable. For broadcast, the resolution and production control limitations make it unsuitable as the primary content source.
How does the native audio generation actually work?
Veo 3 generates audio directly from the same prompt used for the video, so the sound is synchronized with the scene. You can describe the audio environment in your prompt or let the model infer it from the visual content.
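One way to keep the audio description explicit is to build each prompt from separate visual, camera, and audio fields. The sketch below is a hypothetical helper of our own: the `ShotBrief` class and the "Camera:" and "Audio:" labels are a house convention for keeping prompts readable, not documented Veo 3 syntax, and nothing here calls the model.

```python
from dataclasses import dataclass


@dataclass
class ShotBrief:
    """One shot, split into the parts we describe separately in a prompt."""

    visual: str
    camera: str = ""
    audio: str = ""  # leave empty to let the model infer sound from the visuals

    def to_prompt(self):
        """Join the non-empty parts into a single prompt string."""
        parts = [self.visual.strip()]
        if self.camera.strip():
            parts.append(f"Camera: {self.camera.strip()}")
        if self.audio.strip():
            parts.append(f"Audio: {self.audio.strip()}")
        # Strip trailing periods so the joined sentence has exactly one per part.
        return ". ".join(p.rstrip(".") for p in parts) + "."


explicit = ShotBrief(
    visual="Rain on a neon-lit street at night",
    camera="slow push in",
    audio="steady rainfall, distant traffic",
)
inferred = ShotBrief(visual="Rain on a neon-lit street at night")

print(explicit.to_prompt())
print(inferred.to_prompt())
```

Generating both versions of the same shot is a quick way to check whether spelling out the audio actually improves on what the model infers by itself.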
- B-roll: Supplementary footage used to support the main narrative in a video (establishing shots, product close-ups, environmental scenes).
- Character consistency: The ability of an AI video model to reproduce the same character appearance reliably across multiple generated clips.
- Native audio generation: Generating synchronized audio (ambient sound, effects, dialogue) as part of the same model pass that produces video, rather than adding it separately.
- Prompt-to-output: The full process from writing a text description of a scene to receiving the generated video, including model processing time.
- Vertex AI: Google Cloud's machine learning platform, which provides API access to models including Veo 3 for enterprise and developer use.
We're actively building Veo 3 into production workflows for select client projects. If you want to explore what AI video can do for your brand's content calendar, let's talk.