Kling 3.0 - Dialogue Guide Part 2

February 16, 2026

Kling 3.0 with Two Elements

While generating the coverage for this scene I started with a Medium Shot of Ji-Ah and Gun-Woo speaking. I generated a default Kling 3.0 video with both characters bound as Elements and the generated video was as expected.

However, when generating the single Medium Shot for Ji-Ah, I discovered an interesting Kling 3.0 behavior.

Ji-Ah Medium Shot | Shot 1 Prompt

When I switched to generating Ji-Ah's medium shots, I left BOTH character Elements bound. I figured Kling would ignore Gun-Woo's Element because he wasn't visible in the starting frame. I generated two outputs and the results are below.

Ji-Ah Medium Shot | Shot 1 "Output 1"

In Output 1, the shot begins at the referenced starting frame and Ji-Ah delivers the dialogue. But the camera slowly dollies backwards until we end up in an Over the Shoulder shot from behind Gun-Woo. I never prompted for that Over the Shoulder shot; Kling generated it on its own.

Ji-Ah Medium Shot | Shot 1 "Output 2"

In Output 2, the shot begins at the referenced starting frame, then Ji-Ah walks forward and sits in a chair. The camera ends in an Over the Shoulder shot of her, looking at Gun-Woo, who is sitting at a table. This end shot is again generated by Kling.

This was very surprising to me. I was expecting a static medium shot, but instead Kling seems "compelled" to feature both Elements in the scene. Considering that both Elements are "Characters", it feels like Kling invents a random "Two Shot" of both characters based on the starting frame, then generates the video as if it were a Kling 3.0 Default | First and Last Frame | Starting Image generation that ends on that Two Shot.

In the end, I would call this result undesirable. If you have the credits, letting Kling make up some coverage might lead to new ideas or discoveries that do work for the scene. But in my workflow I like to dictate every major key frame.

Kling 3.0 Single Element

In the most standard coverage of a dialogue scene, the camera doesn't move and neither do the actors. To get this very standard output, we stick to vanilla Kling 3.0 Video with a single Character Element and no multi-shot.

Kling's performances and the generated voice (from the Element) are quite good, and I'm happy enough with these results to start producing the first episode of Rogue School.

Ji-Ah Medium Shot | Shot 1

This is a straightforward Medium Shot where the camera and Ji-Ah are static. She delivers her lines with a pause in the middle, which leaves room in the edit to insert Gun-Woo's line. This is a default Kling 3.0 | First Frame | Dialogue generation.

Gun-Woo | Medium Shot

A static shot of Gun-Woo delivering his lines. In other generations with this starting frame, the camera boomed/jibbed down to tighten up his headroom, unprompted. In this generation it left the headroom the same as the starting frame, which is a bit too much, generally speaking.

Conclusion

In most dialogue scenes, camera movement and too much motion are distracting to the viewer, who just wants to understand the dramatic beats between the characters. Camera movement like a dolly in can elevate certain lines, but it is normally done selectively.

When generating the standard static coverage for a scene, sticking to Kling 3.0 default video with a single Element for "Singles" is the safest bet, and the results are quite good.
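If you track your shots in a script before generating, the rule above can be written down as a quick sanity check. This is only a sketch of my checklist in Python. It does not call Kling in any way, and the names in it (ShotPlan, check_static_single, and the fields) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ShotPlan:
    """Hypothetical record of how a shot will be set up before generating."""
    name: str
    visible_characters: list[str]          # who is actually in the starting frame
    bound_elements: list[str] = field(default_factory=list)
    multi_shot: bool = False


def check_static_single(shot: ShotPlan) -> list[str]:
    """Return warnings for a shot that is meant to be a static 'Single'."""
    warnings = []
    # Elements bound for characters who are not in the starting frame.
    extra = [e for e in shot.bound_elements if e not in shot.visible_characters]
    if extra:
        warnings.append("unbind Elements not in the starting frame: " + ", ".join(extra))
    if len(shot.bound_elements) != 1:
        warnings.append("a Single should bind exactly one Character Element")
    if shot.multi_shot:
        warnings.append("turn multi-shot off for static coverage")
    return warnings


# The setup that produced the unwanted Over the Shoulder shots.
risky = ShotPlan("Ji-Ah Medium Shot", ["Ji-Ah"], ["Ji-Ah", "Gun-Woo"])
# The setup that produced the static Single.
safe = ShotPlan("Ji-Ah Medium Shot", ["Ji-Ah"], ["Ji-Ah"])

for plan in (risky, safe):
    print(plan.name, check_static_single(plan))
```

The first plan mirrors the two-Element setup from earlier in this post and gets flagged, while the second passes with no warnings.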

Cheers,

Matt
