Gemini 2.5 Gets New Real-time Audio Feature and More - Check Details Here

After ElevenLabs and Speechify, Google's Gemini now has voice features of its own


June 4, 2025 – Google has expanded its Gemini 2.5 model with advanced audio capabilities, now available in preview through Google AI Studio and Vertex AI. These updates include real-time voice interactions, emotion-aware responses, and customizable speech generation across more than 24 languages.

The new native audio dialog feature allows Gemini 2.5 to engage in real-time conversations, directly generating audio responses without converting from text. Users can adjust tone, accent, and speaking style using natural language prompts. With the real-time audio feature, Gemini 2.5 can detect emotions in a user’s voice and respond appropriately, enhancing the naturalness of interactions.


Additionally, the controllable text-to-speech (TTS) functionality enables the generation of speech with multiple distinct voices, allowing for dynamic multi-speaker dialogues. Users can fine-tune delivery speed, emotional expression, and pronunciation, providing precise control over the audio output.
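For developers, a multi-speaker request might be assembled roughly as follows. This is a minimal sketch, not Google's reference code: the helper name `build_multi_speaker_tts_request` is hypothetical, and the JSON field names (`responseModalities`, `speechConfig`, `multiSpeakerVoiceConfig`) and the voice names `Kore` and `Puck` are assumptions based on the public Gemini API documentation, so check them against the current API reference before use. The sketch only builds the request body and does not call the API.

```python
import json


def build_multi_speaker_tts_request(script, speaker_voices):
    """Build a JSON-serializable body for a multi-speaker speech request.

    script: dialogue text; natural-language style directions can be included.
    speaker_voices: maps each speaker name used in the script to a voice name.
    Field names are assumed from the Gemini API docs and may change.
    """
    if not speaker_voices:
        raise ValueError("at least one speaker is required")

    # One entry per speaker, pairing the speaker label with a prebuilt voice.
    speaker_configs = [
        {
            "speaker": speaker,
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
        }
        for speaker, voice in speaker_voices.items()
    ]

    return {
        "contents": [{"parts": [{"text": script}]}],
        "generationConfig": {
            # Ask for audio output rather than text.
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {
                    "speakerVoiceConfigs": speaker_configs
                }
            },
        },
    }


if __name__ == "__main__":
    # Tone and pacing are steered by the plain-language instruction up front.
    script = (
        "Read this as a relaxed, upbeat podcast intro.\n"
        "Host: Welcome back to the show!\n"
        "Guest: Thanks for having me."
    )
    body = build_multi_speaker_tts_request(
        script, {"Host": "Kore", "Guest": "Puck"}
    )
    print(json.dumps(body, indent=2))
```

The resulting body would be sent to a speech-capable Gemini 2.5 preview model with an API key; the model name and endpoint are left out here because the article does not specify them.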

Google has integrated these audio features into products such as NotebookLM’s Audio Overviews and Project Astra. All AI-generated audio is embedded with SynthID, Google’s watermarking technology, so that it can be identified as AI-generated.

Developers can access these features through the Gemini API, with broader availability planned in the coming weeks. These enhancements position Gemini 2.5 as a competitive option among AI tools for building immersive, voice-driven experiences.
