January 8, 2025
By Bob O'Donnell
In what was undoubtedly the most eagerly anticipated, most closely watched, and most highly attended CES keynote of all time, Nvidia CEO Jensen Huang managed to once again unveil an impressively wide-ranging set of announcements across many of the hottest topics in tech, including AI, robotics, autonomous cars and more.
Clad in a Las Vegas-glitz version of his trademark black leather jacket, the tech industry leader worked his way through the company’s latest GeForce RTX 50 Series graphics cards, new Nemotron AI foundation model families, and AI blueprints for AI-powered agents. He also touted extensions to the company’s Omniverse digital twin/simulation platform that extend AI into the physical world, new safety certifications for its autonomous driving platform, and a new mini desktop-sized AI supercomputer called Project Digits that’s powered by a Grace Blackwell superchip. Needless to say, it was all a lot to take in.
One of the most intriguing—but probably least understood—of all the announcements was a set of foundation models and platform capabilities that the company is calling Cosmos. Specifically defined as a set of world foundation models, advanced tokenizers, safety guardrails and an accelerated video processing pipeline, Cosmos is designed to bring the training capabilities and advanced outcomes of generative AI from the digital world into the physical one. In other words, instead of having GenAI create new digital outputs built from its training across billions of documents, images, and other digital content, Cosmos can help generate new physical actions—let’s call them analog outputs—by leveraging data it’s been trained on from digitally simulated environments.
While the concept is complex, the real-world results are both simple and profound. For applications like robotics, autonomous vehicles, and other mechanical systems, this means that Cosmos can help these systems react to physical stimuli in more accurate, safe, and helpful ways. For example, humanoid-style robots can be taught to physically emulate the most effective or safest way to perform a given task—whether it’s flipping an omelet or picking up and putting away a part on a production line. Similarly, an autonomous car can be taught to react dynamically to different types of situations and environments.
Much of this type of training is currently going on, but a huge portion of it is done manually, with human beings filmed performing the same action hundreds of times, or with autonomous cars driven millions of miles. Plus, even after that’s done, thousands of people spend enormous amounts of time hand-labeling and tagging those videos. With Cosmos, these types of training methods can be automated, dramatically reducing costs, saving time, and improving the range of data that’s used for the training process.
The way it works is that Cosmos acts as a type of extension to Nvidia’s Omniverse digital simulation environment: it takes the digital physics of the models and systems created in Omniverse and translates them into predictions about physical actions in the real world. While that may seem like a subtle distinction, it’s a critically important one, because it’s what allows Cosmos to generate its GenAI-powered physical outputs. At the heart of Cosmos is a series of what are called world foundation models, built from millions of hours of video content, that have an understanding of the physical world. Cosmos essentially takes the digital models of physical objects and environments created in Omniverse, feeds them into these world foundation models, and generates photorealistic video outputs of how those objects and environments are predicted to behave in the real world. These videos, in turn, serve as synthetic data sources that can be used to train the models running in robotic systems, autonomous cars and other GPU-powered mechanical systems. The end result is systems that can react more effectively across a wide range of different environments.
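To make that pipeline concrete, here is a minimal sketch in Python of what a Cosmos-style synthetic-data workflow might look like. Every class and function name below is a hypothetical placeholder invented for illustration, not Nvidia’s actual Cosmos or Omniverse API; the real interfaces will differ.

```python
# Hypothetical sketch of a Cosmos-style synthetic-data pipeline.
# All names here are illustrative placeholders, not real Nvidia APIs.

from dataclasses import dataclass
from typing import List


@dataclass
class SimulatedScene:
    """A digital twin exported from a simulator (e.g., an Omniverse stage)."""
    description: str      # text description of objects and layout
    physics_params: dict  # masses, friction coefficients, joint limits, etc.


@dataclass
class VideoClip:
    """A photorealistic rollout predicted by a world foundation model."""
    frames: list          # placeholder for per-frame image tensors
    action_labels: list   # per-frame actions, auto-generated (no hand labeling)


class WorldFoundationModel:
    """Stand-in for a pretrained world model with an 'understanding' of physics."""

    def predict_rollouts(self, scene: SimulatedScene, n: int) -> List[VideoClip]:
        # A real system would sample n plausible video futures conditioned
        # on the scene's physics; this stub just returns empty clips.
        return [VideoClip(frames=[], action_labels=[]) for _ in range(n)]


def generate_training_set(scenes: List[SimulatedScene],
                          wfm: WorldFoundationModel,
                          rollouts_per_scene: int = 100) -> List[VideoClip]:
    """Replace manual filming and hand-labeling with synthetic, pre-labeled video."""
    dataset: List[VideoClip] = []
    for scene in scenes:
        dataset.extend(wfm.predict_rollouts(scene, rollouts_per_scene))
    return dataset


if __name__ == "__main__":
    kitchen = SimulatedScene(description="robot arm flipping an omelet",
                             physics_params={"pan_mass_kg": 0.9})
    clips = generate_training_set([kitchen], WorldFoundationModel())
    print(f"Generated {len(clips)} synthetic clips for policy training.")
```

The key design point the sketch tries to capture is that the expensive human steps described above (filming repeated demonstrations, then hand-tagging the footage) collapse into a single automated loop, because the world model emits both the video and its labels at the same time.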
One other important note is that Nvidia is making its Cosmos world foundation models available for free to encourage further development and experimentation in the fields of robotics and autonomous vehicles.
In the short term, the immediate impact of Cosmos will be limited, as it’s primarily targeted at the relatively small group of developers working on advanced robotics and autonomous vehicle applications. Longer term, however, the impact could be profound, as it’s expected to dramatically speed up the development of these product categories and improve the accuracy and safety of these applications. More importantly, it shows how Nvidia continues to look ahead to and plan for bigger tech trends like robotics. It also highlights the ongoing but little-recognized trend of Nvidia transforming itself into a software company that builds platforms for these new applications. For those wondering where the company is headed and how it should be able to continue its impressive growth, these are intriguing and important signs.
Here's a link to the original column: https://www.linkedin.com/pulse/nvidia-brings-genai-physical-world-cosmos-bob-o-donnell-dwnjc/
Bob O’Donnell is the president and chief analyst of TECHnalysis Research, LLC, a market research firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on LinkedIn at Bob O’Donnell or on Twitter @bobodtech.