Google Introduces Gemini 2.0

Over the past year, we have made remarkable strides in the field of artificial intelligence, setting new benchmarks and unlocking possibilities that were previously unimaginable. Today, we are thrilled to announce the release of the first model in the Gemini 2.0 family: Gemini 2.0 Flash. This experimental model is a testament to our commitment to pushing the boundaries of AI technology. Designed as our workhorse model, Gemini 2.0 Flash combines low latency with enhanced performance, leveraging cutting-edge innovations to operate seamlessly at scale.

Showcasing the Future: Gemini 2.0’s Agentic Research

Beyond the release of Gemini 2.0 Flash, we are excited to share the latest from our agentic research. The prototypes we are showcasing, enabled by Gemini 2.0's native multimodal capabilities, aim to redefine what AI can accomplish in both virtual and physical environments. These advancements represent a leap forward in AI-driven applications and interactions.

Gemini 2.0 Flash: A New Benchmark

Gemini 2.0 Flash builds on the success of its predecessor, 1.5 Flash, which has been widely embraced by developers. The new model outperforms 1.5 Pro on key benchmarks at twice the speed. It supports multimodal inputs such as images, video, and audio, and can deliver multimodal outputs: for example, it can natively generate images combined with text and produce multilingual audio through steerable text-to-speech (TTS). It can also natively call tools like Google Search, execute code, and invoke third-party user-defined functions.
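
To make the multimodal-input side of this concrete, here is a minimal sketch using the google-generativeai Python SDK. The model name "gemini-2.0-flash-exp", the placeholder API key, and the chart.png file are illustrative assumptions, not details confirmed in this post.

```python
# Minimal multimodal request sketch; the model name and input file
# are assumptions for illustration.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Experimental model name assumed for the release described above.
model = genai.GenerativeModel("gemini-2.0-flash-exp")

# Multimodal input: one image plus a text prompt in a single request.
image = PIL.Image.open("chart.png")
response = model.generate_content(
    [image, "Summarize the trend in this chart in two sentences."]
)
print(response.text)
```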

These innovations reflect our goal of delivering safe and accessible models into users’ hands as swiftly as possible. Over the past month, early experimental versions of Gemini 2.0 have been shared with developers, who provided invaluable feedback. This iterative process has been instrumental in refining the model’s capabilities.

Gemini 2.0 Flash is now available to developers via the Gemini API in Google AI Studio and Vertex AI. Multimodal input and text output capabilities are accessible to all developers, while text-to-speech and native image generation are currently offered to early-access partners. General availability is scheduled for January, alongside additional model sizes.

To empower developers in creating dynamic and interactive applications, we are also releasing a new Multimodal Live API. This API supports real-time audio and video streaming inputs and enables the use of multiple, combined tools. More details about Gemini 2.0 Flash and the Multimodal Live API can be found on our developer blog.
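
For developers curious what a real-time session might look like, here is a hedged sketch assuming the google-genai Python SDK's live module. The connect/send/receive method names, the config schema, and the "gemini-2.0-flash-exp" model name are assumptions rather than details confirmed in this post.

```python
# Hedged sketch of a streaming text session over the Multimodal Live API.
# Method names and config keys are assumptions based on the google-genai SDK.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    config = {"response_modalities": ["TEXT"]}  # audio/video also streamable
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Send one user turn, then stream the reply as it arrives.
        await session.send(
            input="Explain what this Live API is for.", end_of_turn=True
        )
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```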

Gemini 2.0 in the Gemini App: Enhancing User Experiences

Starting today, Gemini app users worldwide can access a chat-optimized version of Gemini 2.0 Flash by selecting it in the model drop-down menu on desktop and mobile web platforms. This feature will soon be integrated into the Gemini mobile app, allowing users to experience an even more helpful AI assistant. Early next year, we plan to expand Gemini 2.0’s availability across more Google products.

Unlocking Agentic Experiences with Gemini 2.0

The capabilities of Gemini 2.0 Flash go beyond traditional AI applications. Its advanced features, including native user-interface actions, multimodal reasoning, long-context understanding, complex instruction following, compositional function calling, and reduced latency, are designed to enable a new class of agentic experiences. These advancements pave the way for AI agents that can help users accomplish intricate tasks, both virtually and in the real world.
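
As one illustration of compositional function calling, the hedged sketch below wires two hypothetical user-defined tools into the google-generativeai SDK's automatic function-calling chat flow. The get_weather and create_reminder helpers are stubs invented here, and the "gemini-2.0-flash-exp" model name is likewise an assumption.

```python
# Sketch of compositional function calling; both tools are hypothetical
# stubs, and the model name is assumed for the experimental release.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Return a short weather summary for a city (stubbed for illustration)."""
    return f"Sunny and 22 C expected in {city}."

def create_reminder(text: str, time: str) -> str:
    """Create a reminder (stubbed); returns a confirmation string."""
    return f"Reminder set for {time}: {text}"

model = genai.GenerativeModel(
    "gemini-2.0-flash-exp",
    tools=[get_weather, create_reminder],
)
chat = model.start_chat(enable_automatic_function_calling=True)

# A single request that composes both tools: check the weather first,
# then schedule a reminder based on the result.
reply = chat.send_message(
    "If tomorrow looks sunny in Lisbon, set a 9am reminder to go for a run."
)
print(reply.text)
```

Because the SDK can execute the Python callables itself, the model's intermediate tool calls are resolved automatically before the final text reply is returned.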

We are exploring this potential with a series of prototypes, such as:

  • Project Astra: A research prototype that explores the capabilities of a universal AI assistant, enhanced by multimodal understanding.
  • Project Mariner: A prototype focused on advancing human-agent interaction, starting with browser-based tasks.
  • Jules: An AI-powered code agent designed to assist developers.

These projects are still in the early stages, but we are eager to learn from trusted testers and iterate based on their feedback. Our goal is to make these capabilities widely available in future products.

Project Astra: A Universal AI Assistant

Since its introduction at I/O, Project Astra has provided valuable insights into how a universal AI assistant could function in practice. Built with Gemini 2.0, the latest version of Astra includes several enhancements:

  • Improved Dialogue: Astra now supports multiple languages, including mixed-language conversations, and has a better understanding of accents and uncommon words.
  • Enhanced Tool Use: With Gemini 2.0, Astra can utilize Google Search, Lens, and Maps, making it a more versatile everyday assistant.
  • Better Memory: Astra now offers up to 10 minutes of in-session memory and can recall past conversations, enabling a more personalized user experience.
  • Reduced Latency: Streaming capabilities and native audio understanding allow Astra to respond at the speed of human conversation.

We are also expanding Astra’s reach to new form factors, such as prototype glasses, and increasing the scope of our trusted tester program to gather broader feedback.

Project Mariner: Redefining Browser-Based AI Assistance

Project Mariner, built with Gemini 2.0, explores the future of AI agents in the context of web browsers. This research prototype can understand and reason across browser content, including text, code, images, and forms. By leveraging an experimental Chrome extension, Mariner can complete tasks such as navigating websites and filling out forms.

When evaluated against the WebVoyager benchmark, Project Mariner achieved a state-of-the-art result of 83.5% for end-to-end web tasks. While still in its early stages, the project highlights the growing potential of browser-based AI agents.

To ensure safety, Mariner includes safeguards such as limiting actions to the active browser tab and requiring user confirmation for sensitive tasks. Trusted testers are currently evaluating Mariner, and we are engaging with the web ecosystem to address potential risks and ensure responsible development.

Building Responsibly in the Agentic Era

As we develop these groundbreaking technologies, we remain committed to responsible innovation. We recognize the complexities and potential risks associated with AI agents, and we are taking a cautious, exploratory approach. Our efforts include:

  • Working with our Responsibility and Safety Committee to identify and mitigate risks.
  • Enhancing our AI-assisted red-teaming capabilities to optimize models for safety.
  • Implementing rigorous evaluations of Gemini 2.0’s multimodal outputs.
  • Designing privacy controls to give users greater control over their data.

Looking Ahead: Gemini 2.0 and Beyond

The release of Gemini 2.0 Flash and the unveiling of our agentic research prototypes mark an exciting milestone in the evolution of AI. These innovations reflect our vision for the future—a future where AI serves as a powerful tool to enhance human capabilities and make the world more connected and efficient.

As we continue to explore the possibilities of AI, our commitment to safety, responsibility, and user empowerment will remain at the core of our efforts. We look forward to sharing more advancements and expanding the reach of Gemini 2.0 in the months ahead.