Question 26
Section 1
Which Google foundation model is purpose-built for text-to-video generation?
Correct answer: A
Explanation
Veo is Google’s foundation model for video generation. It is purpose-built to turn text prompts into video, which distinguishes it from models focused on images, text, or speech.
Why each option is right or wrong
A. Veo
Veo is Google’s video-generation foundation model, trained specifically to generate video from text prompts rather than still images or text-only output. That sets it apart from image-focused models such as Imagen and from general multimodal models, making it the only option here that is purpose-built for text-to-video generation.
B. Imagen
Imagen is associated with text-to-image generation, not text-to-video output.
C. Gemma
Gemma is a family of lightweight open models that produce text output; it is not designed for video generation.
D. Chirp
Chirp is used for speech/audio understanding tasks rather than generating video from prompts.