Gemini 1.5 Pro now has 'Native Audio Understanding,' can convert lectures into quizzes

Now accessible in over 180 countries.

Home » News

2 min. read

Published on April 9, 2024

by Devesh Beri

published on April 9, 2024

Share this article

Improve this guide

Readers help support MSpoweruser. We may get a commission if you buy through our links.

Key notes

Google AI opens access to powerful language model Gemini 1.5 Pro in 180+ countries.
Gemini 1.5 Pro gains the ability to understand and process audio directly.
Developers gain more control over model behavior with system instructions and JSON mode.

Google AI has released a major update to its LLM, Gemini 1.5 Pro. The first major part of the announcement is that previously only available to a limited group, Gemini 1.5 Pro is now accessible in over 180 countries through Google AI Studio’s public preview.

This tool now has a 1 million context window, which lets developers to analyze vast amounts of information for superior understanding. All this comes after Google rebrands Duet AI for Devs as Gemini Code Assist.

The other exciting addition (at least for me) is Gemini 1.5 Pro’s native audio understanding capability. This “first-ever” feature allows the model to directly process spoken language. Developers can upload audio files, like lectures or meetings, and Gemini will extract valuable insights.

“You can upload a recording of a lecture, like 117,000+ token lecture from Jeff Dean, and Gemini 1.5 Pro can turn it into a quiz with an answer key.

The update also gives developers with greater control and functionality. “System instructions” let users define specific roles, formats, and goals for the model, turning its responses to their unique needs. And, “JSON mode” allows structured data extraction from text or images, perfect for tasks requiring organized information.

Instruct the model to only output JSON objects. This mode enables structured data extraction from text or images. You can get started with cURL, and Python SDK support is coming soon.

Google AI has also released a next-generation text embedding model alongside Gemini 1.5 Pro. This model offers better retrieval performance and surpasses existing options in its class.

Google is also said to be developing a new in-house designed CPU chip named Axion after successfully working up Tensor chips.

More here.

Devesh Beri

Tech Journalist

These are the things that motivate me - creating informative and helpful content, pursuing my passion for motorsports and music, engaging in expeditions, maintaining a healthy lifestyle, and spending time with my adorable cat Taco.

User forum

0 messages

Sort by:

Leave a Reply Cancel reply