Microsoft Azure has just introduced a new series of virtual machines called NDv6 GB300. These virtual machines use the first production cluster of NVIDIA GB300 NVL72 supercomputers, which are specifically designed to handle some of the most advanced artificial intelligence tasks by OpenAI.
This new supercomputer setup is extremely powerful. It brings together over four thousand six hundred NVIDIA Blackwell Ultra graphics processing units (GPUs), which are linked with advanced networking technology from NVIDIA called Quantum X800 InfiniBand. Microsoft has used innovative engineering in both memory and networking to give this system the massive computing power needed for training and running large and complex AI models.
The heart of this technology is the NVIDIA GB300 NVL72 system. Each rack uses liquid cooling to keep everything working efficiently and includes seventy two NVIDIA Blackwell Ultra GPUs along with thirty six NVIDIA Grace CPUs. Each virtual machine can access a huge amount of fast memory and has enough power to handle some of the most advanced reasoning and generative AI tasks.
NVIDIA's technology helps these GPUs work together as a single unit, providing a large shared memory space. This is essential for running very large AI models. The system supports fast communication between GPUs and uses the latest software and training formats like NVFP4 for efficient performance. In recent benchmarks, this setup achieved record high processing speeds for massive AI models compared to previous NVIDIA GPUs.
To connect thousands of GPUs into a single supercomputer, Azure uses a smart two level networking system. Inside each rack, NVIDIA NVLink Switch creates fast connections so all seventy two GPUs can share memory and data quickly. Across the entire cluster, NVIDIA Quantum X800 connects all of the racks together so they work as a unified supercomputer. This ensures fast and reliable communication between all four thousand six hundred and eight GPUs.
Building this powerful cluster required Microsoft to rethink every part of its data centers, including cooling, power, and software. This new milestone means that Microsoft Azure is ready to handle the largest and most demanding AI projects, opening the door for future innovation from companies like OpenAI. As Microsoft continues to expand its use of NVIDIA GPUs, even more advancements in artificial intelligence are expected.
Original article and image: https://blogs.nvidia.com/blog/microsoft-azure-worlds-first-gb300-nvl72-supercomputing-cluster-openai/
