NVIDIA and Google partner to optimize new Google Gemma on NVIDIA GPUs

Published on February 22, 2024

by Devesh Beri

Key notes

Microsoft moves away from NVIDIA, while Google partners with the company to optimize its new AI model.
NVIDIA’s TensorRT-LLM speeds up Google’s Gemma on various platforms, including local PCs.
Developers gain access to tools for fine-tuning and deploying Gemma for specific needs.

While Microsoft recently announced its decision to shift away from NVIDIA GPUs in favor of its custom chips, Google has taken the opposite approach, collaborating with NVIDIA to optimize its new lightweight language model, Gemma, on NVIDIA GPUs.

Unlike traditional large language models (LLMs), which require immense computational resources, Gemma is comparatively small: it comes in 2-billion-parameter and 7-billion-parameter versions while still offering impressive capabilities.
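To put those parameter counts in perspective, here is a rough back-of-the-envelope sketch of how much memory the model weights alone would occupy. It assumes the nominal 2 billion and 7 billion figures quoted above (actual checkpoints may differ) and standard bytes-per-parameter sizes for common precisions; the function name is purely illustrative, and the estimate ignores activations, caches, and runtime overhead.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Assumption: nominal parameter counts from the article, not exact checkpoint sizes.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "fp16") -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in [("Gemma 2B", 2_000_000_000), ("Gemma 7B", 7_000_000_000)]:
        print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights at fp16")
```

By this estimate, the smaller model's weights could fit within the memory of many consumer graphics cards, which is part of why running it on a local PC is plausible.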

This collaboration aims to significantly improve the accessibility and performance of Gemma, making it faster and more widely available across various platforms.

The optimization centers on NVIDIA’s TensorRT-LLM, an open-source library for LLM inference that enables faster performance on NVIDIA GPUs in data centers, cloud environments, and even personal computers equipped with NVIDIA RTX GPUs. The collaboration targets over 100 million NVIDIA RTX GPUs globally, as well as cloud platforms featuring H100 and the upcoming H200 GPUs.

NVIDIA’s AI Enterprise suite, including the NeMo framework and TensorRT-LLM, empowers developers to fine-tune and deploy Gemma for specific use cases.

Users can interact with Gemma directly through the NVIDIA AI Playground and, soon, through the Chat with RTX demo, which will let them personalize chatbots with their own data.

With Microsoft distancing itself from NVIDIA, Google’s move to optimize its technology on NVIDIA GPUs suggests that the partnership between the two companies may grow stronger. This could lead to further advances in AI and language modeling, benefiting developers and users alike.

Additionally, the focus on local processing through RTX GPUs gives users greater control over their data and privacy, potentially addressing concerns associated with cloud-based LLM services.
