Gemini 1.5 Pro can now hear

Precise News

Google’s update to Gemini 1.5 Pro gives the model ears.
During its Google Next event, Google also announced it’ll make Gemini 1.5 Pro available to the public for the first time through its platform to build AI applications, Vertex AI.
This new version of Gemini Pro, which is supposed to be the middle-weight model of the Gemini family, already surpasses the biggest and most powerful model, Gemini Ultra, in performance.
Gemini 1.5 Pro can understand complicated instructions and eliminates the need to fine-tune models, Google claims.
Gemini 1.5 Pro is not available to people without access to Vertex AI.
Right now, most people encounter Gemini language models through the Gemini chatbot.
Gemini Ultra powers the Gemini Advanced chatbot, and while it is powerful and also able to understand long commands, it’s not as fast as Gemini 1.5 Pro.
Gemini 1.5 Pro is not the only large AI model from Google getting an update.


Google's latest update gives Gemini 1.5 Pro ears. The model can now listen to uploaded audio files, such as earnings calls or the audio track of a video, and extract information from them without referring to a written transcript.

At its Google Next event, Google also announced that it will make Gemini 1.5 Pro available to the public for the first time through Vertex AI, its platform for building AI applications. Gemini 1.5 Pro was first announced in February.

This new version of Gemini Pro, positioned as the middleweight model in the Gemini family, already outperforms Gemini Ultra, the largest and most powerful model in the lineup. According to Google, Gemini 1.5 Pro can understand complicated instructions and eliminates the need to fine-tune models.

Gemini 1.5 Pro is not available to people without access to Vertex AI. For now, most people encounter Gemini language models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot; although it is powerful and can also understand long commands, it is not as fast as Gemini 1.5 Pro.

Gemini 1.5 Pro is not the only large AI model from Google getting an update. Imagen 2, Google's text-to-image generation model, will add inpainting and outpainting, which let users add or remove elements in an image. Google is also making its SynthID digital watermarking feature available for all images created with Imagen models. SynthID embeds an invisible watermark in an image that a detection tool can read to identify where the image came from.

Many of Imagen's new features, particularly inpainting and outpainting, have already appeared in other text-to-image models such as Stability AI's Stable Cascade and Getty's Generative AI by iStock, and they are also more widely available to consumers on newer Samsung Galaxy phones.

Google says that, as part of its public preview, it is also adding the ability to ground its AI responses in Google Search so that they reflect current information. That is not always a given with answers generated by large language models. Google has deliberately blocked Gemini from answering questions related to the 2024 US election.

Gemini recently came under fire for generating historically inaccurate images of people.
