2025-10-20

Google released updated video generator Veo 3.1

Google has introduced updated versions of its video generator, Veo 3.1 and Veo 3.1 Fast, with improvements to image quality, sound, and control over content creation.

The main new feature is support for realistic dialogue and synchronized sound effects. According to the company, Veo 3.1 produces more natural audio and keeps characters more consistent across scenes, from facial expressions to movement and voice. For developers, advanced cinematic styles have been added, allowing more precise control over the look of the video.

Another important change is the ability to use up to three reference images of a character, object, or location to guide generation. This helps keep them consistent across different clips, which is especially useful for episodic or story-driven videos.

Google also introduced a scene extension feature, which makes it possible to create videos longer than 30 seconds. Previously, this was a hard limit. Each new clip now continues the previous one and is generated from its final second, which keeps the visuals continuous.
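The chaining idea behind scene extension can be illustrated with a small planning sketch. It does not call any Google API: the names `PlannedClip` and `plan_extensions` are hypothetical, and the clip length is a free parameter because the article does not state one. It also assumes each extension adds a full clip of new footage and is seeded from the last second of the timeline so far.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PlannedClip:
    start: float                                # where the clip begins on the final timeline (s)
    duration: float                             # new footage this clip contributes (s)
    seed_window: Optional[Tuple[float, float]]  # timeline span used as the seed, if any


def plan_extensions(target_seconds: float,
                    clip_seconds: float,
                    seed_seconds: float = 1.0) -> List[PlannedClip]:
    """Plan a chain of clips that together cover at least target_seconds.

    Hypothetical helper: clip_seconds is an assumed per-clip length, and
    seed_seconds=1.0 mirrors the "last second" mentioned in the article.
    """
    if target_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    if not 0 < seed_seconds <= clip_seconds:
        raise ValueError("seed_seconds must be in (0, clip_seconds]")

    # The first clip has no predecessor, so it has no seed window.
    clips = [PlannedClip(0.0, clip_seconds, None)]
    total = clip_seconds

    # Keep extending until the timeline reaches the requested length.
    while total < target_seconds:
        seed = (total - seed_seconds, total)  # final second of the footage so far
        clips.append(PlannedClip(total, clip_seconds, seed))
        total += clip_seconds
    return clips


if __name__ == "__main__":
    for clip in plan_extensions(target_seconds=40, clip_seconds=8):
        print(clip)
```

Each planned clip after the first records which slice of the existing footage would seed it, which is the property that gives the extended video its visual continuity.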

In addition, Veo 3.1 can create a transition between two videos that takes both image and audio into account, producing a single scene. The Veo 3.1 and Veo 3.1 Fast models are available through the Gemini API in Google AI Studio and Vertex AI. Pricing is unchanged from Veo 3.

For regular users, the updated model is available through Gemini and Flow. The Veo 3.1 Fast version is aimed at faster and cheaper creation of videos in 9:16 format at 1080p resolution, which is convenient for social networks and mobile devices.

Google had earlier integrated Veo 3 into YouTube Shorts, letting users create short videos from a text description alone. With Veo 3.1, the results become more realistic, with full sound, dialogue, and visual dynamics, and no need for filming or editing.