March 12, 2025 4:03 PM
Credit: VentureBeat made with Midjourney
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
Google’s latest open source AI model Gemma 3 isn’t the only big news from the Alphabet subsidiary today.
No, in fact, the spotlight may have been stolen by Google’s Gemini 2.0 Flash with native image generation, a new experimental model available for free to users of Google AI Studio and to developers through Google’s Gemini API.
It marks the first time a major U.S. tech company has shipped multimodal image generation directly within a model to consumers. Most other AI image generation tools were diffusion models (image specific ones) hooked up to large language models (LLMs), requiring a bit of interpretation between two models to derive an image that the user asked for in a text prompt.
By contrast, Gemini 2.0 Flash can generate images natively within the same model that the user types text prompts into, theoretically allowing for greater accuracy and more capabilities — and the early indications are this is entirely true.
Gemini 2.0 Flash, first unveiled in December 2024 but without the native image generation capability switched on for users, integrates multimodal input, reasoning, and natural language understanding to generate images alongside text.
Th...