NVIDIA and Google partner to optimize new Google Gemma on NVIDIA GPUs




Key notes

  • Microsoft ditches NVIDIA, while Google embraces it to optimize its new AI model.
  • NVIDIA’s TensorRT-LLM speeds up Google’s Gemma on various platforms, including local PCs.
  • Developers gain access to tools for fine-tuning and deploying Gemma for specific needs.

While Microsoft recently announced its decision to shift away from NVIDIA GPUs in favor of its custom chips, Google has taken the opposite approach, collaborating with NVIDIA to optimize its new lightweight language model, Gemma, on NVIDIA GPUs.

Gemma is a lightweight language model developed by Google. Unlike traditional large language models (LLMs) that require immense computational resources, Gemma boasts a smaller size (2 billion and 7 billion parameter versions) while offering impressive capabilities.

This collaboration aims to significantly improve the accessibility and performance of Gemma, making it faster and more widely available across various platforms.

At the center of the effort is NVIDIA's TensorRT-LLM, an open-source library that optimizes LLM inference, enabling faster performance on NVIDIA GPUs in data centers, cloud environments, and even personal computers equipped with NVIDIA RTX GPUs. The collaboration targets over 100 million NVIDIA RTX GPUs globally, as well as cloud platforms featuring H100 and upcoming H200 GPUs.
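As a rough illustration of why Gemma's smaller sizes make local inference on consumer RTX GPUs plausible, here is a back-of-the-envelope estimate of the VRAM needed to hold the model weights at a few common precisions. The precision levels mirror typical quantization options; the figures are approximations and ignore activation and KV-cache memory:

```python
# Back-of-the-envelope VRAM needed to hold model weights alone.
# Illustrative only: real deployments (e.g. via TensorRT-LLM) also
# need memory for activations and the key-value cache.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gib(num_params: float, precision: str) -> float:
    """Approximate GiB required to store the weights at a given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30

for name, params in [("Gemma 2B", 2e9), ("Gemma 7B", 7e9)]:
    for precision in ("fp16", "int8", "int4"):
        print(f"{name} @ {precision}: ~{weight_memory_gib(params, precision):.1f} GiB")
```

By this estimate, even the 7B model at fp16 (roughly 13 GiB of weights) fits on a high-end RTX card, and 8-bit or 4-bit quantization brings it comfortably within reach of mid-range consumer GPUs.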

NVIDIA’s AI Enterprise suite, including the NeMo framework and TensorRT-LLM, empowers developers to fine-tune and deploy Gemma for specific use cases.

Users can directly interact with Gemma through the NVIDIA AI Playground and, soon, through the Chat with RTX demo, allowing them to personalize chatbots with their data.

With Microsoft distancing itself from NVIDIA, Google's move to optimize its technology on NVIDIA GPUs suggests a potential strengthening of the partnership between the two companies. This could lead to further advancements in AI and language modeling, benefiting developers and users alike.

Additionally, focusing on local processing through RTX GPUs empowers users with greater control over their data and privacy, potentially addressing concerns associated with cloud-based LLM services.

