Gemini 1.5 Pro now has 'Native Audio Understanding,' can convert lectures into quizzes

Now accessible in over 180 countries.

Reading time icon 2 min. read


Readers help support MSpoweruser. We may get a commission if you buy through our links. Tooltip Icon

Read our disclosure page to find out how can you help MSPoweruser sustain the editorial team Read more

Key notes

  • Google AI opens access to powerful language model Gemini 1.5 Pro in 180+ countries.
  • Gemini 1.5 Pro gains the ability to understand and process audio directly.
  • Developers gain more control over model behavior with system instructions and JSON mode.

Google AI has released a major update to its LLM, Gemini 1.5 Pro. The first major part of the announcement is that previously only available to a limited group, Gemini 1.5 Pro is now accessible in over 180 countries through Google AI Studio’s public preview. 

This tool now has a 1 million context window, which lets developers to analyze vast amounts of information for superior understanding. All this comes after Google rebrands Duet AI for Devs as Gemini Code Assist.

The other exciting addition (at least for me) is Gemini 1.5 Pro’s native audio understanding capability. This “first-ever” feature allows the model to directly process spoken language. Developers can upload audio files, like lectures or meetings, and Gemini will extract valuable insights.

You can upload a recording of a lecture, like 117,000+ token lecture from Jeff Dean, and Gemini 1.5 Pro can turn it into a quiz with an answer key.

The update also gives developers with greater control and functionality. “System instructions” let users define specific roles, formats, and goals for the model, turning its responses to their unique needs. And, “JSON mode” allows structured data extraction from text or images, perfect for tasks requiring organized information.

Instruct the model to only output JSON objects. This mode enables structured data extraction from text or images. You can get started with cURL, and Python SDK support is coming soon.

Google AI has also released a next-generation text embedding model alongside Gemini 1.5 Pro. This model offers better retrieval performance and surpasses existing options in its class.

Google is also said to be developing a new in-house designed CPU chip named Axion after successfully working up Tensor chips.

More here.

User forum

0 messages