NVIDIA Dominates MLPerf Training v5.1 Benchmarks


In today’s rapidly evolving technological landscape, advancing artificial intelligence (AI) through innovative model training is crucial for enhancing the capabilities of intelligent systems. Achieving this requires significant advances across many components: graphics processing units (GPUs), central processing units (CPUs), network interface cards (NICs), scale-up and scale-out networking, system architectures, and an extensive suite of software and algorithms.

In the latest round of MLPerf Training, v5.1, a series of rigorous industry-standard tests used to evaluate AI training performance, NVIDIA emerged victorious across all seven categories. These tests encompass a wide range of AI applications, including large language models (LLMs), image generation, recommender systems, computer vision, and graph neural networks.

NVIDIA’s accomplishment is particularly noteworthy as it is the only platform to have submitted results for every test, highlighting the robust programmability of its GPUs and the maturity and flexibility of its CUDA software stack.

### NVIDIA Blackwell Ultra: A New Era of GPU Architecture

One of the standout performers in this benchmark round was NVIDIA’s GB300 NVL72 rack-scale system, powered by the new NVIDIA Blackwell Ultra GPU architecture. This system made waves following a record-setting performance in the recent MLPerf Inference round.

Compared to its predecessor, the Hopper architecture, the Blackwell Ultra-based GB300 NVL72 demonstrated over four times the performance in Llama 3.1 405B pretraining and nearly five times the performance in Llama 2 70B LoRA fine-tuning, all while using the same number of GPUs.

These impressive gains can be attributed to the architectural advancements of Blackwell Ultra, which include new Tensor Cores capable of delivering 15 petaflops of NVFP4 AI compute, twice the compute capability for attention layers, and 279 GB of HBM3e memory. These improvements, combined with innovative new training methods, allow the architecture to fully leverage its significant NVFP4 compute performance.

To further enhance the performance of these systems, NVIDIA introduced the Quantum-X800 InfiniBand platform. Making its MLPerf debut, this platform is the industry’s first end-to-end 800 Gb/s networking solution, effectively doubling scale-out networking bandwidth over the previous generation.

### Unlocking Performance with NVFP4 for LLM Training

A key factor in NVIDIA’s success in this benchmark round was the use of NVFP4 precision for calculations, marking a first in the history of MLPerf Training. This approach involves performing computations on data using fewer bits, which allows for faster computation rates. However, this comes with the challenge of maintaining accuracy, as lower precision means less information is available in each calculation.

NVIDIA addressed this challenge by innovating across the entire stack to adopt FP4 precision for training LLMs. The NVIDIA Blackwell GPU is capable of executing FP4 calculations, including the NVIDIA-designed NVFP4 format, at double the rate of FP8. The Blackwell Ultra architecture elevates this to three times the rate, significantly boosting the AI compute performance of the GPUs.
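To make the trade-off concrete, here is a minimal sketch of block-scaled 4-bit quantization, the general idea behind formats like NVFP4: each small block of values shares one scale factor, and each value is rounded to the nearest 4-bit representable number. The block size, value grid, and scaling rule below are simplified assumptions for illustration, not NVIDIA’s specification.

```python
# Simplified block-scaled 4-bit quantization sketch (illustrative only,
# not the actual NVFP4 format).
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1-style magnitudes

def quantize_block(block):
    """Quantize one block: pick a shared scale, round each value to the grid."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0  # map the largest magnitude onto the grid's top value
    out = []
    for x in block:
        mag = min(FP4_VALUES, key=lambda v: abs(abs(x) / scale - v))
        out.append(mag * scale * (1 if x >= 0 else -1))
    return out

data = [0.12, -0.9, 0.33, 2.5, -1.7, 0.05, 0.6, -0.25]
deq = quantize_block(data)
err = max(abs(a - b) for a, b in zip(data, deq))
print(deq)
print(f"max abs error: {err:.3f}")
```

Because each value is stored in only 4 bits plus one shared scale per block, memory traffic and matrix-multiply cost drop sharply; the rounding error shown above is what the software stack must keep from degrading model accuracy.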

Notably, NVIDIA is the only platform to have submitted MLPerf Training results using FP4 precision while adhering to the benchmark’s stringent accuracy standards.

### Scaling New Heights with NVIDIA Blackwell

In a remarkable demonstration of its capabilities, NVIDIA set a new time-to-train record for the Llama 3.1 405B model, achieving this milestone in just 10 minutes with the coordinated effort of over 5,000 Blackwell GPUs. This performance was 2.7 times faster than the best Blackwell-based submission from the previous round, made possible by efficiently scaling to more than twice the number of GPUs and leveraging NVFP4 precision to significantly enhance the effective performance of each GPU.

To further illustrate the per-GPU performance improvements, NVIDIA also submitted results this round using 2,560 Blackwell GPUs, achieving a training time of 18.79 minutes, 45% faster than the prior round's submission, which used 2,496 GPUs.
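As a rough illustration of how those figures combine, the overall speedup can be normalized by the change in GPU count to estimate the per-GPU gain (a back-of-the-envelope sketch using the numbers above; the variable names and the simple linear-scaling assumption are mine, not NVIDIA's methodology):

```python
# Figures from the article: this round vs. the prior round's submission.
prev_gpus = 2496            # prior-round GPU count
new_gpus, new_time_min = 2560, 18.79   # this round: GPUs, minutes
speedup = 1.45              # "45% faster" read as prior_time / new_time

# Implied prior-round training time.
prev_time_min = new_time_min * speedup

# Per-GPU effective gain: overall speedup normalized by the GPU-count change.
per_gpu_gain = speedup * (prev_gpus / new_gpus)

print(f"implied prior time: {prev_time_min:.1f} min")  # ~27.2 min
print(f"per-GPU gain: {per_gpu_gain:.2f}x")            # ~1.41x
```

Under this simple model, nearly all of the round-over-round improvement comes from higher per-GPU throughput rather than from the modest increase in GPU count.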

### Breaking New Ground with New Benchmarks

In addition to its stellar performance on existing tests, NVIDIA set records on two new tests: Llama 3.1 8B and FLUX.1. Llama 3.1 8B, a smaller yet highly capable LLM, replaced the long-standing BERT-large model in the benchmark suite. NVIDIA set a record training time of 5.2 minutes with submissions using up to 512 Blackwell Ultra GPUs.

FLUX.1, a cutting-edge image generation model, supplanted the Stable Diffusion v2 benchmark, with NVIDIA being the only platform to submit results. Using 1,152 Blackwell GPUs, NVIDIA set a record time of 12.5 minutes for training.

NVIDIA maintained its lead on existing tests for graph neural networks, object detection, and recommender systems, demonstrating its comprehensive capabilities across a wide range of AI applications.

### A Thriving Ecosystem of Partners

The extensive participation of NVIDIA’s ecosystem in this benchmark round underscores the company’s broad influence. Submissions from 15 organizations, including industry giants like ASUSTeK, Dell Technologies, Hewlett Packard Enterprise, and Lenovo, as well as academic institutions like the University of Florida, highlight the widespread adoption and trust in NVIDIA’s technology.

NVIDIA’s commitment to innovation, with significant performance gains achieved on an annual basis across pretraining, post-training, and inference, is paving the way for new levels of intelligence and accelerating the adoption of AI technologies.

For those interested in exploring NVIDIA’s performance data further, more information is available on the Data Center Deep Learning Product Performance Hub and the Performance Explorer pages.

In conclusion, NVIDIA’s sweeping success in the latest MLPerf Training benchmarks underscores its leadership in AI technology and its unwavering dedication to pushing the boundaries of what’s possible in AI training and performance. As the field of artificial intelligence continues to evolve, NVIDIA’s innovations are set to play a pivotal role in shaping the future of intelligent systems.

For more detailed insights and data, you can visit NVIDIA’s official blog at blogs.nvidia.com.

Neil S