TECHnalysis Research Services: Blogs, March 19, 2024

March 19, 2024
Nvidia Advances GenAI Adoption

By Bob O'Donnell

What do you do when you’re the hottest thing in town?

That’s the question that Nvidia, along with a large and growing crowd of tech partners, customers, and investors, has been pondering in advance of Nvidia’s first in-person GTC conference in five years.

In short, the company gave two very different answers. First, it unveiled a next-generation reinvention of the underlying architecture that has made its GPU chips such an incredibly important part of the GenAI revolution. Second, it announced an expansive set of tools and partnerships to make the often-overwhelming process of putting generative AI applications into production much easier for businesses of all types.

Along the way, Nvidia also emphasized its growing ambitions to become a software and services company with the release of AI Enterprise 5.0 and the new NIM (Nvidia Inference Microservices) that are included within it. The company also managed to expand the range of applications as well as the industries to which it’s working to bring critical GenAI compute solutions, including health care, heavy industry, automotive, robotics, manufacturing, telecommunications (6G), weather forecasting, and more.

On the chips side, the big news from GTC was the Blackwell GPU architecture, named for the trailblazing African American mathematician David Blackwell. This is the company’s first big advance in chip design since the debut of the Hopper architecture two years ago. Blackwell offers several important improvements over its predecessor, particularly with regard to the performance and power efficiency of the chip. Specifically, Nvidia said that Blackwell offers 20 PetaFLOPS of AI performance and is 4x faster on AI training workloads, 30x faster on AI inferencing workloads and, most notably, up to 25x more power efficient than its predecessor.

Physically, the 208 billion transistor Blackwell design consists of two processing elements—each of which is as big as the 4nm manufacturing equipment will allow—connected via an ultra-high-speed link called NV-HBI that transfers data at 10 TB/sec. The chip also supports up to 192 GB of HBM3e memory.

Within the chip, an important advancement made with Blackwell is a second-generation transformer engine. This allows each micro-tensor within the main tensor processing units to be monitored in real time, thereby enabling support for 4-bit floating point AI calculations in conjunction with the company’s TensorRT-LLM and NeMo Megatron AI frameworks. Practically speaking, by reducing these calculations from 8-bit on previous generations to 4-bit, Nvidia can double the compute performance and the model sizes that Blackwell supports with this single change. Some might argue that this makes it an apples-to-oranges comparison, but the fact that it can handle bigger models is testament to the real-world benefits.
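To see why halving the precision doubles the model size a chip can hold, a rough back-of-the-envelope sketch helps. It uses the 192 GB memory figure quoted above and counts model weights only. The function name and the simplification that nothing else occupies memory are assumptions made for illustration, not anything Nvidia has published.

```python
def max_params_billions(memory_gb: float, bits_per_param: int) -> float:
    """Upper bound on parameters (in billions) whose weights alone fit in memory.

    Assumption: only the weights are counted. Real deployments also need
    room for activations, caches, and runtime overhead.
    """
    bytes_per_param = bits_per_param / 8
    # Treating 1 GB as 10**9 bytes, GB divided by bytes-per-parameter
    # gives the parameter count directly in billions.
    return memory_gb / bytes_per_param


HBM_GB = 192  # memory capacity quoted in the article

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{max_params_billions(HBM_GB, bits):.0f}B parameters")
```

Under these assumptions, moving from 8-bit to 4-bit weights takes the ceiling from roughly 192 billion to roughly 384 billion parameters per chip. That is the doubling described above.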

As powerful as a single Blackwell GPU may be, in the new era of Mixture of Experts (MoE) AI “supermodels” that can include over 10 trillion parameters and deal with more than 32,000 input tokens, there’s an essential need for connecting large numbers of GPUs together. That’s where the company’s new NVLink 5.0 technology kicks in, as it allows up to 576 GPUs to be linked together at speeds up to 1.8 TB/sec.
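A quick estimate shows why a single GPU is not enough for models of this size. The sketch below takes the 10 trillion parameter figure and the 192 GB per-GPU memory figure from this column and computes the minimum number of GPUs needed just to store the weights. The function name is hypothetical, and the estimate deliberately ignores activations, caches, and any redundancy, so real deployments would need more.

```python
import math


def min_gpus_for_weights(params: float, bits_per_param: int, gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory can hold the weights alone."""
    weight_bytes = params * bits_per_param / 8
    gpu_bytes = gpu_memory_gb * 1e9  # 1 GB taken as 10**9 bytes
    return math.ceil(weight_bytes / gpu_bytes)


PARAMS = 10e12       # 10 trillion parameters, as cited in the article
GPU_MEMORY_GB = 192  # per-GPU memory, as cited in the article

for bits in (8, 4):
    gpus = min_gpus_for_weights(PARAMS, bits, GPU_MEMORY_GB)
    print(f"{bits}-bit weights: at least {gpus} GPUs")
```

Even at 4-bit precision the weights alone span dozens of GPUs, which is why the speed of the links between them matters so much.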

As it did with the previous generation, Nvidia has also put together a “superchip” that combines its latest GPUs with its Arm-based CPU designs. That latest superchip version is called the Grace Blackwell 200 (or GB200 for short), and it includes two Blackwell GPUs and a single Grace CPU. These GB200s are being packaged together in a variety of configurations and will sit at the heart of the company’s new NVL72 rack server designs as well as the next-generation DGX SuperPod. Another way to get access to multiple Blackwell GPUs is via the HGX B200 server board, which incorporates eight Blackwell GPUs (called B200s) onto a single card for smaller server designs.

Once again, connectivity is critical for all of these systems, so the company also introduced a new range of switches—including the InfiniBand-based Quantum-X800 switch and the Ethernet-based Spectrum-X800. Both leverage the company’s BlueField technology to speed the process of feeding data across the data center into the GPUs for processing.

Not surprisingly, every major cloud provider and server manufacturer announced that services or systems with the Blackwell-based design would be coming shortly, as all the major IT companies are leveraging Nvidia’s technology for their own use. Similarly, an impressive array of software vendors announced they would be supporting Blackwell and these latest designs in the next-generation versions of their applications.

Speaking of software, the big software news from Nvidia was the introduction of microservices called NIM that are part of the AI Enterprise 5.0 release. These microservices are web-native containers that run on top of the company’s CUDA software platform and are specifically designed to make the process of creating and developing GenAI applications that can leverage CUDA and Nvidia’s hardware much easier.

While perhaps not as exciting as the latest hardware designs, this is actually significantly more important in the long run for several reasons. First, it’s supposed to make it faster and more efficient for companies to move from GenAI experiments and POCs (proofs of concept) into real-world production. There simply aren’t enough data scientists and GenAI programming experts to go around, so many companies that have been eager to deploy GenAI have been limited by technical challenges. As a result, it’s great to see Nvidia helping ease this process.

Second, these new microservices allow for the creation of an entire new revenue stream and business strategy for Nvidia because they can be licensed on a per GPU/per hour basis (as well as other variations). This could prove to be an important, long-lasting, and more diversified means of generating income for Nvidia, so even though it’s early days, this is going to be important to watch.

From a pragmatic perspective, many of these microservices focus on important capabilities designed to make Nvidia hardware-accelerated GenAI applications much more compelling. For example, several of these NIM services, which Nvidia calls CUDA-X, focus on the process of integrating existing corporate data into applications. The data formatting and ingestion process has been problematic for many organizations, so solutions built by Nvidia along with a number of software companies that are focused on data management tools are important. In a related way, the NeMo Retriever microservices include capabilities to integrate important new refinement technologies such as RAG (Retrieval Augmented Generation) into customized applications so that they can better use that enterprise data.
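For readers unfamiliar with RAG, the basic idea can be sketched in a few lines. The toy example below is not NeMo Retriever or any Nvidia API. It is a minimal illustration that scores a handful of made-up company documents against a question using simple word-overlap similarity, then places the best match into the prompt that would be sent to a language model. All function names and sample documents are hypothetical, and production systems typically use learned embeddings and a vector database instead of word counts.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Prepend the retrieved context to the question (the 'augmented' step)."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical enterprise documents, invented for this illustration.
DOCS = [
    "The travel policy allows economy airfare for domestic trips.",
    "Quarterly revenue is reported by the finance team every April.",
    "Laptops are refreshed every three years by the IT department.",
]

print(build_prompt("When are laptops refreshed?", DOCS))
```

The point of the pattern is that the model answers from the company’s own data supplied at query time, without being retrained on that data.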

On top of these individual microservices, Nvidia also talked about the idea of an AI Foundry, where it could help its customers piece together the various microservice containers they need, pair those containers with the customers’ own data sets, and build a customized GenAI application. This is extremely important because most organizations need help doing this. Nvidia can leverage the learnings it has gathered in building its own models to help the app creation process along the way. The fact that it will also be able to make some money on that process is a very nice benefit.

As has become typical with Nvidia CEO Jensen Huang’s keynotes, there was an absolute firehose of information that extended some of these announcements even further, particularly into more real-world applications across industries. Ultimately, though, what became clear is that Nvidia is taking its role as the GenAI industry engine very seriously and, far from resting on its laurels, is pushing itself forward as quickly as it can.

The new hardware advancements represent important new steps in keeping the crazy pace of innovation in GenAI moving forward as fast as possible. The Blackwell platform is also the first GPU platform that was designed and built in the GenAI era, and some of the design tweaks clearly reflect the specific demands of very large LLMs. Even more importantly, the new software applications and microservices look to position the company as an even wider and more important GenAI industry enabler for the long term. Plus, as the absolute abundance of partner announcements made at the event demonstrates, a huge portion of the tech industry clearly sees Nvidia as the company they’re going to watch and work with for some time to come.

Here's a link to the original column: https://www.linkedin.com/pulse/nvidia-advances-genai-adoption-bob-o-donnell-bmwrc/

Bob O’Donnell is the president and chief analyst of TECHnalysis Research, LLC, a market research firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on LinkedIn at Bob O’Donnell or on Twitter @bobodtech.