Microsoft and Nvidia announce they are teaming up to create an “AI supercomputer” using Azure infrastructure combined with Nvidia’s GPU accelerators, network kit and its software stack.
The target market will be companies looking to train and deploy large, state-of-the-art AI models at scale.
According to Nvidia, the project will be a multi-year effort that will aim to deliver “one of the most powerful AI supercomputers in the world,” and the move will also make Azure the first public cloud to integrate Nvidia’s AI software stack.
To build this system, the pair will use the Azure cloud platform’s GPU-based ND- and NC-series virtual machines, and the project will involve bringing “tens of thousands” of Nvidia’s A100 GPUs, plus its latest H100 GPUs, to the platform.
Nvidia says Microsoft’s Azure is the first public cloud to integrate its Quantum-2 InfiniBand network switches for its AI-optimized virtual machines. Current Azure instances feature 200 Gbps Quantum InfiniBand and A100 GPUs, but future ones will have 400 Gbps Quantum-2 InfiniBand and the new H100 GPUs.
The two companies didn’t offer any timeline for this project to become operational, and it seems that this so-called AI supercomputer may not exist as a single, discrete machine at all.
Reading between the lines, it looks like Microsoft is simply adding a set of AI-optimized instances to Azure with the latest hardware and software from Nvidia, which customers will chain together as needed for their specific project.
We asked Nvidia for clarification, and a spokesperson told us: “All new features are in Azure instances, but the configuration is such that enterprises will be able to scale these instances to supercomputing status.” So it’s more of an AI cloud service, in this case.
The Nvidia spokesperson added, “Customers can acquire resources like they normally would with a real supercomputer. For both, you have a software layer that reserves resources. For now, that’s in the cloud and not on a dedicated supercomputer. The most important thing is scalability. The resource on Azure can be scaled up to supercomputer standards with the same AI software, the same network capacity and the same compute nodes.”
The partnership will also see Nvidia use Azure resources to conduct research into generative AI models, which the company describes as a “rapidly emerging area” of AI in which foundation models such as Megatron-Turing NLG 530B are the basis of new algorithms capable of synthesizing text, computer code, images and even video.
The model in question was developed by a team from Microsoft and Nvidia using Nvidia’s Megatron-LM transformer framework and Microsoft’s DeepSpeed deep learning optimization library, which the pair will also work to optimize further.
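For readers unfamiliar with DeepSpeed, it is driven by a JSON-style configuration that controls batch size, mixed precision and memory-saving optimizations. Below is a minimal sketch of such a configuration; the specific values are illustrative assumptions, not details from the announcement.

```python
# Illustrative DeepSpeed-style configuration for large-model training.
# The keys below are real DeepSpeed config options; the values are
# example choices, not the settings used for Megatron-Turing NLG 530B.
ds_config = {
    "train_batch_size": 2048,           # global batch size across all GPUs
    "fp16": {"enabled": True},          # mixed precision, suited to A100/H100
    "zero_optimization": {"stage": 2},  # ZeRO stage 2 partitions optimizer
                                        # state and gradients across GPUs
    "gradient_clipping": 1.0,           # cap gradient norm for stability
}

# In practice this dict (or an equivalent JSON file) is passed to
# deepspeed.initialize(model=..., config=ds_config) alongside the model.
```

The ZeRO optimization stage is the key memory-saving lever here: higher stages shard more training state across GPUs, which is what lets clusters of instances like Azure’s ND series train models far larger than any single GPU’s memory.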
Scott Guthrie, Microsoft’s executive vice president for its Cloud + AI group, said AI will power the next wave of automation in business, allowing organizations to do more with less.
“Our collaboration with Nvidia unlocks the world’s most scalable supercomputer platform, delivering industry-leading AI capabilities for every business on Microsoft Azure.”
Nvidia Vice President of Enterprise Computing Manuvir Das said the partnership aims to provide researchers and companies with infrastructure and software to harness AI.
“The breakthrough of foundation models has triggered a tidal wave of research, fostered new startups and enabled new enterprise applications,” he said.
To that end, the AI supercomputer/cloud service will support a wide range of applications and services, including Microsoft DeepSpeed and Nvidia’s AI Enterprise software suite. The latter is already certified and supported on Azure instances with A100 GPUs, while support for Azure instances with new H100 GPUs will be added in a future software release, Nvidia said. ®