Nvidia H100's dominance in machine learning benchmarks remains unchallenged

Nvidia H100 is the current market leader


Key notes

  • Nvidia’s H100 system dominates MLPerf’s new AI benchmarks for fine-tuning LLMs and GNNs.
  • Using 11,616 GPUs, Nvidia set records in five out of nine benchmarks, outpacing Google and Intel.
  • Nvidia also achieved a 27% improvement in GPT-3 training times with software optimizations and flash attention.

Nvidia has been dominating the AI chip market for quite some time, and that dominance is not unfounded. The tech giant’s H100 system is the current market leader, and so far no competitor has managed to seriously challenge it.

MLPerf, one of the most widely used benchmark suites for measuring AI chips’ performance, has just launched a new set of tests. They cover fine-tuning large language models (LLMs) and training graph neural networks (GNNs), and in these tests Nvidia’s H100 systems set new records.

Nvidia’s largest submission used 11,616 H100 GPUs, the biggest system ever tested in the MLPerf benchmarks. It achieved top performance across all nine benchmarks and set records in five of them, as this report details.

Competitors such as Google and Intel also entered their AI accelerators but were outperformed. Google’s TPU systems showed significant speed improvements and Intel’s GPUs made notable progress, yet neither could match Nvidia’s largest H100 system.

Nvidia also reported a 27% improvement in GPT-3 training times compared with its June 2023 submissions, thanks to several software optimizations: better use of 8-bit floating point (FP8) operations, more efficient power management of the compute engines, and improved communication among GPUs.
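
For a sense of what the FP8 piece looks like in practice, here is a minimal sketch using NVIDIA’s Transformer Engine library; the library choice, layer sizes, and scaling recipe are illustrative assumptions, since Nvidia’s actual MLPerf software stack is not described at this level of detail.

```python
# Minimal FP8 sketch with NVIDIA Transformer Engine (requires a
# Hopper-class GPU such as the H100). The layer sizes and recipe
# are illustrative assumptions, not details from the MLPerf run.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# HYBRID recipe: E4M3 format for forward pass, E5M2 for gradients.
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)

layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(8, 1024, device="cuda")

# Inside this context, supported ops run their matmuls in FP8,
# cutting memory traffic and raising Tensor Core throughput.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.sum().backward()
```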

They also implemented flash attention, an algorithm that speeds up transformer networks by minimizing reads and writes to GPU memory; this contributed to a 10% reduction in training times.
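
To illustrate the core idea (a conceptual NumPy sketch, not Nvidia’s fused CUDA kernel), flash attention walks the keys and values in tiles and keeps a running softmax, so the full N×N attention score matrix never has to be written out to main memory:

```python
# Conceptual sketch of flash attention's tiling trick in NumPy.
# The real FlashAttention is a fused GPU kernel; this only shows how
# an online (running) softmax avoids materializing the full N x N
# score matrix.
import numpy as np

def flash_attention_sketch(Q, K, V, block=64):
    n, d = Q.shape
    out = np.zeros((n, d))
    row_max = np.full(n, -np.inf)   # running max score per query row
    row_sum = np.zeros(n)           # running softmax denominator per row

    for start in range(0, n, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = Q @ Kb.T / np.sqrt(d)       # scores for this tile only

        new_max = np.maximum(row_max, scores.max(axis=1))
        scale = np.exp(row_max - new_max)    # rescale earlier partials
        p = np.exp(scores - new_max[:, None])

        row_sum = row_sum * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ Vb
        row_max = new_max

    return out / row_sum[:, None]

# Sanity check: matches naive attention on random data.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
weights = np.exp(S - S.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
assert np.allclose(flash_attention_sketch(Q, K, V), weights @ V)
```

Because each tile of scores is consumed immediately and only the small running statistics are kept, memory traffic scales with the sequence length rather than its square, which is where the speedup comes from.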
