NVIDIA Blackwell Ultra has set new records in the MLPerf Inference benchmarks, demonstrating leading performance on AI workloads. Inference is the stage at which a trained AI system produces predictions and responses. Faster inference means more tokens served per second per chip, which translates into lower cost per query and greater revenue potential for organizations deploying these systems.
The NVIDIA GB300 NVL72 rack-scale system, built on the new Blackwell Ultra architecture, was benchmarked in MLPerf Inference v5.1 and delivered up to 45 percent higher performance on the DeepSeek-R1 test than the GB200 NVL72 system based on the original Blackwell architecture. Compared with Blackwell, Blackwell Ultra offers 1.5x the AI compute, 2x faster attention-layer processing, and up to 288 GB of HBM3e memory per GPU, helping it host larger and more complex models.
NVIDIA also set the fastest scores in several new data center tests including DeepSeek-R1, Llama 3.1 405B Interactive, Llama 3.1 8B, and Whisper. It continues to lead in performance per GPU across all MLPerf data center categories.
These results stem from tight hardware-software co-design. Blackwell Ultra adds hardware acceleration for NVFP4, NVIDIA's 4-bit floating-point data format, which delivers higher throughput while preserving accuracy better than comparable low-precision formats. Using the TensorRT Model Optimizer and the TensorRT-LLM inference library, NVIDIA quantized models such as DeepSeek-R1 and the Llama family to NVFP4, gaining performance without meaningful accuracy loss.
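To make the idea of a block-scaled 4-bit floating-point format concrete, here is a minimal sketch of quantize/dequantize round-tripping in that spirit. The E2M1 value grid and the per-16-element scale factor are stated assumptions about how such formats generally work; the actual NVFP4 encoding (including its scale-factor representation) is NVIDIA's own and is not reproduced here.

```python
# Sketch of block-scaled 4-bit float quantization (illustrative, not NVFP4 itself).

# Values representable by a 4-bit E2M1 float (sign + 2 exponent + 1 mantissa bits).
_POS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_GRID = sorted(_POS + [-v for v in _POS if v != 0.0])

BLOCK = 16  # number of elements sharing one scale factor (assumed block size)

def quantize_dequantize(xs):
    """Round each block of values to the nearest scaled E2M1 grid point."""
    out = []
    for i in range(0, len(xs), BLOCK):
        block = xs[i:i + BLOCK]
        amax = max(abs(v) for v in block) or 1.0
        scale = amax / 6.0  # map the block's largest magnitude onto the top grid value
        for v in block:
            q = min(E2M1_GRID, key=lambda g: abs(v / scale - g))
            out.append(q * scale)  # dequantize back to full precision
    return out

vals = [0.1 * k for k in range(-8, 8)]
deq = quantize_dequantize(vals)
max_err = max(abs(a - b) for a, b in zip(vals, deq))
```

With only 16 representable magnitudes per block, the per-block scale is what keeps the error small: values near the block maximum land almost exactly on grid points, which is why outlier-heavy tensors benefit from small block sizes.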
Serving a large language model involves two distinct phases: prefill, which processes the incoming prompt and produces the first output token, and decode, which generates the remaining tokens one at a time. Because the two phases stress hardware differently, NVIDIA uses a technique called disaggregated serving, which runs them on separate pools of GPUs so each can be optimized independently. This was key to nearly doubling per-GPU performance in certain tests.
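The split described above can be sketched in a few lines. This is a toy model only: the worker functions, the `KVCache` hand-off object, and the fake token strings are all invented for illustration and stand in for real model execution and cache transfer over the interconnect.

```python
# Toy sketch of disaggregated serving: prefill and decode as separate workers.
from dataclasses import dataclass

@dataclass
class KVCache:
    # Stand-in for the attention key/value tensors handed between the two phases.
    tokens: list

def prefill_worker(prompt: str) -> tuple:
    """Process the whole prompt in one pass; emit the KV cache plus first token."""
    toks = prompt.split()
    first_token = f"<re:{toks[-1]}>"  # stand-in for the model's first output token
    return KVCache(toks + [first_token]), first_token

def decode_worker(cache: KVCache, max_new: int) -> list:
    """Generate tokens one at a time, reusing the transferred KV cache."""
    out = []
    for _ in range(max_new):
        tok = f"tok{len(cache.tokens)}"  # stand-in autoregressive step
        cache.tokens.append(tok)
        out.append(tok)
    return out

cache, first = prefill_worker("hello world")
rest = decode_worker(cache, max_new=3)
response = [first] + rest
```

The design point is that prefill is compute-bound (one large batched pass over the prompt) while decode is bound by memory bandwidth and latency (one small step per token), so giving each phase its own hardware pool lets both be scaled and tuned on their own terms.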
This round also marked the first MLPerf submissions using NVIDIA's Dynamo inference framework. Major technology companies, cloud providers, and universities submitted their own entries on NVIDIA Blackwell and Hopper platforms, reflecting broad industry adoption. For organizations deploying NVIDIA technology through cloud providers or server manufacturers, this performance leadership translates into higher throughput and lower total cost.
To learn more, you can read the detailed post on the NVIDIA Technical Blog or visit the NVIDIA DGX Cloud Performance Explorer to see performance details and create custom reports.
Original article and image: https://blogs.nvidia.com/blog/mlperf-inference-blackwell-ultra/