Gemini 2.5 Gets New Real-time Audio Feature and More - Check Details Here
After ElevenLabs and Speechify, Gemini too has jumped in with its own feature
June 4, 2025 – Google has expanded its Gemini 2.5 model with advanced audio capabilities, now available in preview through Google AI Studio and Vertex AI. These updates include real-time voice interactions, emotion-aware responses, and customizable speech generation across more than 24 languages.
The new native audio dialog feature allows Gemini 2.5 to engage in real-time conversations, directly generating audio responses without converting from text. Users can adjust tone, accent, and speaking style using natural language prompts. With the real-time audio feature, Gemini 2.5 can detect emotions in a user’s voice and respond appropriately, enhancing the naturalness of interactions.
Additionally, the controllable text-to-speech (TTS) functionality enables the generation of speech with multiple distinct voices, allowing for dynamic multi-speaker dialogues. Users can fine-tune delivery speed, emotional expression, and pronunciation, providing precise control over the audio output.
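As a rough illustration of what steering a multi-speaker dialogue with natural language might look like, here is a minimal Python sketch that assembles a text prompt from per-speaker style notes and a short transcript. The helper name build_dialogue_prompt, the speaker names, and the prompt wording are all hypothetical; this is not Google's API or a documented prompt format, and the actual call to the Gemini API is deliberately left out.

```python
from typing import Dict, List, Tuple


def build_dialogue_prompt(styles: Dict[str, str], turns: List[Tuple[str, str]]) -> str:
    """Combine per-speaker style notes and a transcript into one text prompt.

    Hypothetical helper: the prompt layout is an assumption for illustration,
    not a format required by any TTS model.
    """
    # Refuse transcripts that mention a speaker with no style direction.
    unknown = {speaker for speaker, _ in turns} - set(styles)
    if unknown:
        raise ValueError(f"no style given for: {sorted(unknown)}")

    # One plain-language direction per speaker, e.g. tone, pace, or accent.
    directions = " ".join(
        f"Make {name} sound {style}." for name, style in styles.items()
    )
    # One "Speaker: line" row per turn, in order.
    transcript = "\n".join(f"{speaker}: {line}" for speaker, line in turns)
    return f"{directions}\n\n{transcript}"


prompt = build_dialogue_prompt(
    {"Host": "upbeat and fast-paced", "Guest": "calm, with a slight whisper"},
    [("Host", "Welcome back to the show!"), ("Guest", "Thanks, glad to be here.")],
)
print(prompt)
```

A string like this would then be sent as the text input to a speech model; how voices are assigned to each speaker name depends on the API and is not shown here.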
Google already uses these audio features in products such as NotebookLM’s Audio Overviews and Project Astra. All AI-generated audio is embedded with SynthID, Google’s watermarking technology, so that it can be identified as AI-generated.
Developers can access these features through the Gemini API, with broader availability planned in the coming weeks. These additions position Gemini 2.5 as a competitor to other AI voice tools for building immersive, voice-driven experiences.