
The power of infrastructure purpose-built for AI


Apr 26, 2023

This article is part of a VB Lab Insights series on AI sponsored by Microsoft and Nvidia.

Don’t miss additional articles in this series providing new industry insights, trends and analysis on how AI is transforming organizations. Find them all here.  

Advances in powerful computing and the cloud are at the heart of the AI revolution. With faster, more efficient processing of vast amounts of data, companies can break through the boundaries that previously limited model sizes, leading to innovation that will change the world. It’s a revolution open to organizations of every size, given the affordability and versatility of the cloud.

At NVIDIA GTC 2023, held March 20-23, leaders from Microsoft showcased the ways AI infrastructure is crucial to AI innovation, in the keynote, in demos and on many panels. Customers and partners dug deep into how they’re using AI to transform and disrupt their industries, while Microsoft shared exclusive company announcements and showcased the recently released Azure OpenAI Service, the NVIDIA H100 GPU-powered VM series for generative AI, the new NVIDIA Omniverse Cloud on Azure, integration with Microsoft 365 and more.

Bringing AI into the physical world with autonomous driving

The Microsoft session, “Accelerate AI Innovation with Unmatched Cloud Scale and Performance,” featured conversations with Alex Kendall from Wayve, Nidhi Chappell from Microsoft and Manuvir Das from NVIDIA, focusing on why AI infrastructure is crucial for advancing AI innovation, the leaps forward in Azure AI infrastructure powered by NVIDIA and more.

First up was Alex Kendall, CEO of Wayve, a leader in bringing autonomous driving to market.

It’s widely accepted that autonomous driving will fundamentally change the way people and goods move around cities. “But in order to realize the promise of autonomous driving,” Kendall explained, “we need to have an AI system which has the intelligence to make safe and robust decisions in the complex urban environments we drive in.”

Wayve is building an artificial intelligence system that learns to drive based on what it sees, with the onboard intelligence needed to make decisions in real time without requiring a set of rules or maps, bringing AI into the physical world. That requires the latest advances in machine learning and a large-scale foundational neural network, trained with self-supervised learning to handle very diverse sets of data.

The scale of video data makes it an especially complicated task. Vehicles collect terabytes of data per minute from each car’s cameras and radar, which is then sent to Wayve’s training infrastructure on Azure, where the company trains billion-parameter neural networks.

“This is a scale that autonomous driving hasn’t seen yet — and we want to push the boundaries here,” Kendall said. “That’s where our partnership with Microsoft has been fantastic because we’ve been working on supercomputing technologies that really make this possible both from a software infrastructure perspective as well as creating the right data and compute nodes to be able to train at this level of scale.”

Current successful results will only compound as the scale of data and compute increases, he added.

“We see more and more of our fleet partners’ vehicles contribute experience to our learning as we use new generative AI methods to be able to generate, re-simulate or create experiences that we can train from, or by scaling our simulation platform,” Kendall said. “It operates in a really adaptable way in Azure as well that lets us both create synthetic training data, while new experiences train multi-agent reinforcement learning systems, and validate that our system achieves the performance that we expect before we deploy it on public roads.”

Azure infrastructure levels up again

Nidhi Chappell, General Manager of HPC, AI, SAP and Confidential Computing at Microsoft, brought news about the Microsoft AI initiatives addressing the growing compute demand for AI training. High-end training is also very sensitive to performance at scale, because a single job runs synchronously across thousands of GPUs.

“Microsoft AI initiatives are part of that revolution,” Chappell said. “We have a vision to embed AI in all of our products and we want to leverage the best infrastructure for these AI workloads.”

But building a full AI-optimized solution stack on premises can be impractical and costly for most companies. Azure AI infrastructure, powered by NVIDIA accelerated computing and networking technologies and combined with Microsoft’s overall AI solution, was architected to address these challenges for customers of all sizes.

“Our design philosophy is to provide the optimal AI infrastructure configuration of the next-generation CPU, compute GPU and storage,” she explained. NVIDIA solutions are utilized and interconnected with the NVIDIA Quantum InfiniBand, she added, “to allow unprecedented scale in cloud surrounded by first-in-class AI platform services.”

Chappell also explained that Microsoft is currently the only provider to offer a pay-as-you-go infrastructure stack, so customers pay only for what they need, giving organizations the best performance at the scale of their choosing.

A variety of Azure VM series and SKU configurations is designed for small- to mid-scale AI workloads, and even allows virtual machines to be used for GPU workloads. These VMs offer hardware-based security enhancements to protect AI models and data in use.

NVIDIA and Azure bring AI to companies of all sizes

Manuvir Das, Vice President of Enterprise Compute at NVIDIA, spoke about the ways the company’s partnership with Microsoft is crucial to the evolution of Azure.

“At its core, NVIDIA is the company of accelerated computing,” Das said, “whether that’s AI, data science or data processing.”

Das continued: “More and more customers have gravitated to Azure for their cloud computing platform. So the [Microsoft] partnership has been all about how do we take this new accelerated computing model and place it into Azure, the cloud where all these companies are choosing to do their work.”

Deep integration of NVIDIA AI software and hardware into Azure has put speed into the hands of an increasing number of companies, no matter where they are on the AI timeline. It’s a particularly powerful tool for enterprises that aren’t doing grand-scale projects, meaning they don’t have large engineering teams or need the most powerful platform. But to operationalize AI, they need more power than simple turnkey solutions offer. NVIDIA AI Enterprise packages AI software for a broad array of use cases and integrates these seamlessly into Azure.

For more details on these conversations, watch the full NVIDIA GTC session, “Accelerate AI Innovation,” on demand, featuring Alex Kendall from Wayve, Nidhi Chappell, Microsoft, and Manuvir Das, NVIDIA.

Introducing a new family of powerful, scalable VMs

A new powerful and scalable VM is in town. Microsoft introduced the ND v5 H100 series, powered by NVIDIA H100 Tensor Core GPUs. It allows companies to scale models from eight to thousands of NVIDIA H100 GPUs interconnected on a single fabric, backed by NVIDIA Quantum-2 InfiniBand networking.
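
As a rough illustration of that scaling range, the sketch below estimates how many VMs a given GPU count implies. It assumes eight GPUs per VM, which the "eight to thousands" range suggests but the announcement described here does not spell out. The function name `vms_needed` and the sample GPU counts are hypothetical, chosen only for illustration.

```python
# Assumption: each ND v5 H100 VM exposes eight H100 GPUs.
GPUS_PER_VM = 8


def vms_needed(total_gpus: int, gpus_per_vm: int = GPUS_PER_VM) -> int:
    """Return the smallest number of VMs that supplies at least total_gpus GPUs."""
    if total_gpus < 1 or gpus_per_vm < 1:
        raise ValueError("GPU counts must be positive")
    # Integer ceiling division: a partially used VM still counts as a whole VM.
    return -(-total_gpus // gpus_per_vm)


if __name__ == "__main__":
    # Hypothetical job sizes, from a single VM up to a multi-thousand-GPU cluster.
    for gpus in (8, 64, 1024, 4000):
        print(f"{gpus} GPUs -> {vms_needed(gpus)} VMs")
```

Under that assumption, a job that needs thousands of GPUs spans hundreds of VMs, which is why the single interconnect fabric matters as much as the GPUs themselves.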

“Customers will see significant performance improvement on their OAI models over our last generation of ND series that was based on NVIDIA A100 GPUs,” said Chappell. “With the introduction of these VMs there is going to be groundbreaking research for generative AI applications both for training as well as for inference.”

Microsoft worked closely with NVIDIA to update the entire architecture to absorb these new GPUs, offering a diverse portfolio that makes them fully available to the entire developer community.

“It’s the same programming model to access all of this stuff whether they’re doing training or inference, and so it opens up all kinds of use cases,” Das explained.

For an in-depth look at the latest updates for Azure’s ND series based on NVIDIA H100 GPUs, watch the NVIDIA GTC session: Azure’s Purpose-Built AI Infrastructure using the Latest NVIDIA GPU Accelerators featuring Matt Vegas, Principal Product Manager at Microsoft.

NVIDIA and Microsoft announce industrial metaverse and AI supercomputing service collabs

Elsewhere in the conference, Microsoft and NVIDIA demonstrated a new collaboration in action: NVIDIA Omniverse Cloud.

Microsoft Azure will host two new cloud offerings from NVIDIA: NVIDIA Omniverse Cloud, a platform-as-a-service giving instant access to a full-stack environment to design, develop, deploy and manage industrial metaverse applications; and NVIDIA DGX Cloud, an AI supercomputing service that gives enterprises immediate access to the infrastructure and software needed to train advanced models for generative AI and other groundbreaking applications.

Connecting Omniverse to Azure Digital Twins and Internet of Things allows companies to link real-time data from sensors in the physical world to digital replicas that automatically respond to changes in their physical environments. Azure provides the cloud infrastructure and capabilities needed to deploy enterprise services at scale, including security, identity and storage.

Omniverse will also be integrated into the Microsoft productivity suite, starting with Microsoft Teams, SharePoint and OneDrive. Omniverse will be powered by Microsoft Azure, giving Microsoft 365 and Azure users their own Omniverse.

Omniverse Cloud, powered by NVIDIA OVX computing systems, will be available on Azure in the second half of the year, while DGX Cloud will be available running in Azure beginning next quarter, providing enterprises with dedicated clusters of NVIDIA DGX AI supercomputing and software, rented on a monthly basis.

For more information on the ways companies can build, operationalize and deploy AI-enabled products and services, plus industry insights, trends and analysis to help business and IT decision-makers capture the revolutionary power of AI, head to Microsoft and NVIDIA’s Make AI Your Reality hub at VB Lab.

VB Lab Insights content is created in collaboration with a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
