Product Videos with Kling AI: Our Production Notes from 2026

We've generated over 900 product video clips with Kling AI across various versions. These are our actual production notes -- what changed from 1.6 to 2.0, what product categories work, what doesn't, and what a realistic production costs.

Kling 2.0 is a meaningful jump from 1.6 -- motion physics and product stability are both better
Cosmetics, packaged goods, and fashion accessories are the strongest use cases
Complex mechanical products and fast action still produce unreliable results
Realistic cost: per final polished second of product video, all-in

We started using Kling AI in mid-2024 when it was still in early access and producing outputs that were impressive but unpredictable. By early 2026, it's become our primary production tool for product motion video. That shift didn't happen because of any single update -- it happened because the model got consistently better at the specific things that matter for product work: keeping products recognizable through a motion sequence, handling reflective surfaces without the warping artifacts we saw in earlier versions, and giving us enough prompt control to produce predictable results at scale. These are our honest notes.

Kling 1.6 vs. 2.0: what actually changed

Kling 1.6 was good at generating atmospheric product motion -- a bottle drifting through a misty environment, a cosmetics product catching light in a slow rotation. Where it struggled: any situation where the product needed to stay geometrically consistent through the motion, and any shot with liquid physics (pour shots, splashes) that needed to look physically realistic.

Kling 2.0 improved both of those areas. Product geometry holds better through camera moves and product motion. Liquid physics is more convincing -- we've produced several pour shots for a beverage client that required minimal compositing work to use. The frame consistency (what Kling calls temporal consistency) is noticeably better, meaning you don't get the subtle shape-drift across a 5-second clip that was common in 1.6. Generation speed also improved -- the same clip that took 12-15 minutes on 1.6 typically finishes in 6-10 minutes on 2.0.

Use cases that work well

Cosmetics and skincare are our strongest Kling use case. Bottles, tubes, and jars move beautifully. Liquid textures, serums catching light, cream textures spreading -- Kling handles these better than any other model we've tested at production volume. We've delivered full Instagram Reels campaigns for three skincare brands using Kling as the primary generation engine, and the results have been commercially used without issue.

Packaged goods -- boxed products, bottles, canned items -- work reliably for camera orbit shots and product reveals. The key is keeping the prompt focused on camera movement rather than product movement. A prompt describing the camera orbiting around a stationary product produces more consistent results than asking the product itself to move.

Fashion accessories (bags, shoes, watches) work well for surface and detail showcase. A leather bag rotating slowly to show texture, a watch reflecting ambient light on a marble surface -- Kling produces these well. Clothing on a model is less reliable: human motion and fabric physics together are still a weak spot.

What consistently fails

Complex mechanical products: anything with visible moving parts, buttons, mechanisms -- the AI generates plausible-looking geometry that doesn't match the actual product. We've stopped trying and now recommend 3D animation or traditional video for these.
Fast action sequences: product impact shots (a perfume bottle landing on a surface), product in rapid motion, anything implying speed or force -- temporal consistency breaks down and you get distortion artifacts.
Transparent products showing contents: a clear glass bottle with visible liquid contents is hard. Kling 2.0 is better than 1.6 at this but still unreliable enough that we add significant review time to these projects.
Multi-product scenes: two or more distinct products in the same frame, especially if they need to move independently. The model tends to blend the products in ways that are hard to predict.
Very small products or fine detail work: a piece of fine jewelry with engraved detail, a watch dial with text -- the model smooths and invents detail at small scales.

Prompt strategy for product motion

The single most useful prompt principle we've developed: describe the camera, not just the product. 'Camera slowly orbits the product from left to right, product stationary, soft studio lighting' produces more consistent results than 'product rotates showing all angles.' The model seems to handle camera motion better than object motion for product work.

Lighting description matters more than scene description. We spend more prompt tokens on describing the lighting setup (soft diffused fill, single hard spotlight from camera-left, warm golden hour side light) than on describing the environment. How the product behaves under light is what makes or breaks a product video.
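A minimal sketch of how the camera-first, lighting-heavy ordering could be templated so every clip in a campaign follows it. This is plain string assembly, not a Kling API call; the `ProductShot` class and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProductShot:
    """Hypothetical container for the parts of a product-motion prompt."""

    subject: str      # what the product is, e.g. "amber glass serum bottle"
    camera: str       # camera movement, stated first
    lighting: str     # lighting setup, described in detail
    scene: str = ""   # environment, optional and kept short

    def to_prompt(self) -> str:
        # Camera first, then an explicitly stationary subject, then lighting.
        # The scene goes last and is skipped when empty.
        parts = [self.camera, f"{self.subject} stationary", self.lighting]
        if self.scene:
            parts.append(self.scene)
        return ", ".join(parts)


shot = ProductShot(
    subject="product",
    camera="Camera slowly orbits the product from left to right",
    lighting="soft studio lighting",
)
print(shot.to_prompt())
# Camera slowly orbits the product from left to right, product stationary, soft studio lighting
```

Keeping the camera and lighting as separate fields makes it harder to slip back into "product rotates" phrasing when prompts are written in bulk.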

Negative prompts still matter even in 2026. Our standard negative prompt string for product work: 'distortion, warping, melting, text artifacts, blurry product, deformed packaging, inconsistent shape.' Including it reduces the rejection rate in our curation step from roughly 40% to roughly 20%.
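To put the rejection-rate change in batch terms, here is a rough sketch that estimates how many candidates survive curation on average. It assumes each candidate is rejected independently at the stated rate, which is a simplification; `expected_usable` is a hypothetical helper, and the batch sizes of 5 and 8 are the candidate counts mentioned in these notes.

```python
# Negative prompt string quoted from the paragraph above.
NEGATIVE_PROMPT = (
    "distortion, warping, melting, text artifacts, blurry product, "
    "deformed packaging, inconsistent shape"
)


def expected_usable(candidates: int, rejection_rate: float) -> float:
    """Average number of candidates that pass curation.

    Assumes every candidate is rejected independently with the same
    probability, which real batches only approximate.
    """
    if candidates < 0 or not 0.0 <= rejection_rate <= 1.0:
        raise ValueError("candidates must be >= 0 and rejection_rate in [0, 1]")
    return candidates * (1.0 - rejection_rate)


for rate in (0.40, 0.20):
    for n in (5, 8):
        usable = expected_usable(n, rate)
        print(f"rejection {rate:.0%}, {n} candidates -> {usable:.1f} usable on average")
```

Under these assumptions, a batch of 8 candidates yields about 4.8 usable clips at 40% rejection and about 6.4 at 20%.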

Turnaround time and cost reality

Raw generation cost per clip on Kling 2.0: roughly per 5-second clip depending on quality settings. We generate 5-8 candidates per finished clip, so generation cost per finished clip is Add our editing time (music sync, color grading, format export), and a single polished 5-second product video clip runs about total from our production to delivery. That's per finished second.
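The arithmetic behind a per-second figure can be sketched as follows: generation cost per candidate times the number of candidates, plus editing, divided by clip length. The function name and parameters are hypothetical, and the input values in the example are placeholders for illustration only, not actual rates.

```python
def cost_per_finished_second(
    cost_per_candidate: float,
    candidates: int,
    editing_cost: float,
    clip_seconds: float = 5.0,
) -> float:
    """All-in cost per finished second of video, in whatever currency is passed in."""
    if candidates < 1 or clip_seconds <= 0:
        raise ValueError("need at least one candidate and a positive clip length")
    generation_cost = cost_per_candidate * candidates   # every candidate is paid for
    total_cost = generation_cost + editing_cost         # music sync, grading, export
    return total_cost / clip_seconds


# Placeholder inputs (NOT real pricing): 1.0 per candidate, 8 candidates,
# 12.0 of editing time, for a 5-second clip.
example = cost_per_finished_second(cost_per_candidate=1.0, candidates=8, editing_cost=12.0)
print(f"{example:.2f} per finished second")
```

The point of writing it out is that the candidate count multiplies the generation cost, so moving from 5 to 8 candidates per finished clip changes the per-second figure even when editing time stays the same.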

Turnaround from brief to delivery: 2-3 days for a single clip, 5-7 days for a 10-clip campaign set. Clients who need same-day turnaround pay a rush premium. Same-day turnaround is genuinely possible for simple clips -- it's not rush-is-impossible, it's rush-means-less-iteration.

One honest limitation worth mentioning

Kling, like all AI video tools, requires you to accept some loss of control compared to traditional video production. When you direct a video shoot, you can iterate in real time. When you prompt an AI model, you're submitting a generation job and waiting. If the output isn't right, you revise the prompt and wait again. For clients who are used to being on set and making real-time decisions, this workflow requires an adjustment. The creative direction conversation has to happen before generation, not during it.

Temporal consistency: How well an AI video model maintains consistent object appearance, geometry, and color across frames. Low temporal consistency causes shape drift and texture flickering.
Camera motion prompting: Describing movement as the camera moving around a stationary subject rather than the subject moving. Generally produces more stable results in AI video generation.
Generation candidate: A single output from one generation run. Standard practice is generating multiple candidates per final clip and selecting the best, rather than accepting the first output.
Negative prompt: Instructions to an AI model about what to avoid in the output. In Kling, these are specified separately from the main prompt and help reduce artifacts and unwanted behaviors.
Motion amplitude: A parameter in Kling that controls how much movement occurs in a generated clip. Lower values produce more subtle, controlled motion; higher values generate more dynamic sequences that also carry higher risk of artifacts.

If you want to see Kling 2.0 output quality for your specific product category before committing to a production budget, we can run a small test generation against your brief. Send us a product photo and a description of the motion you want -- we'll turn it around in 24 hours.
