Computer Vision · June 1, 2026

NVIDIA Cosmos 3: the open-source model teaching machines to understand the physical world

NVIDIA has introduced Cosmos 3, an open vision model that helps robots, autonomous vehicles, and industrial systems build representations of the real world so they can plan and act safely.

Key Takeaways

1. Cosmos 3 is open-source and can be adapted to specific use cases
2. The model builds 3D representations of the environment from video rather than only classifying images
3. It improves robot motion planning accuracy by 45% over the previous best model on the RoboNav-2026 benchmark
4. NVIDIA positions it as the foundation for all physical AI: robots, drones, autonomous vehicles
5. The model understands physical properties like gravity, friction, and collisions

Cosmos 3 is not just another computer vision model. While most vision models specialize in classifying images or generating visual content, Cosmos 3 has a different goal: building an internal representation of the physical world that allows machines to plan real actions.

1. What makes Cosmos 3 different?

The fundamental difference is that Cosmos 3 does not just "see" the world; it understands it. While a classification model can tell you there is a table in an image, Cosmos 3 understands:

  • Where the table is in 3D space
  • What is on the table and roughly how much it weighs
  • What would happen if you push an object to the edge of the table
  • Where a robot can move without hitting the table

A model that classifies images is an observer. A model that understands the physics of the world is a potential actor.

Learned physical properties

Cosmos 3 demonstrates implicit understanding of fundamental physical properties:

  • **Gravity**: Correctly predicts how objects fall
  • **Friction**: Understands that a box on a steep enough ramp will slide
  • **Collisions**: Anticipates what happens when two objects interact
  • **Occlusion**: Knows that an object behind another still exists
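
To make two of these properties concrete, the sketch below computes closed-form reference answers for a falling object and a box on a ramp, the kind of ground truth a physical prediction could be checked against. It is not Cosmos 3 code and says nothing about how the model works internally; the function names, the example friction coefficient, and the idealized assumptions (no air resistance, a rigid box, static Coulomb friction) are illustrative only.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def fall_time(height_m: float) -> float:
    """Time in seconds for an object dropped from rest to fall height_m (no air drag)."""
    if height_m < 0:
        raise ValueError("height must be non-negative")
    return math.sqrt(2.0 * height_m / G)


def box_slides(ramp_angle_deg: float, static_friction: float) -> bool:
    """A box at rest on a ramp starts to slide when tan(angle) exceeds the
    static friction coefficient."""
    return math.tan(math.radians(ramp_angle_deg)) > static_friction


# Illustrative values, not measurements from any benchmark.
print(round(fall_time(1.0), 3))   # drop from 1 m
print(box_slides(10.0, 0.5))      # shallow ramp: friction holds the box
print(box_slides(35.0, 0.5))      # steep ramp: the box slides
```

With a friction coefficient of 0.5, the box stays put on the 10-degree ramp and slides on the 35-degree one, which is why "a box on a ramp will slide" only holds once the slope is steep enough.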

2. Performance and benchmarks

📊 On the RoboNav-2026 benchmark, Cosmos 3 improves motion planning accuracy by 45% over the previous best model. In autonomous navigation, it reduces simulated collision incidents by 67%.

These numbers are impressive, but the real impact lies in real-world applications.

3. Practical applications

Autonomous driving

Cosmos 3 can process information from multiple cameras and LiDAR sensors to build a complete representation of the driving environment. This includes predicting the movement of pedestrians, cyclists, and other vehicles.
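
As a point of comparison for what "predicting movement" means, here is a minimal sketch of the simplest possible predictor: constant-velocity extrapolation. It is a common baseline, not the method Cosmos 3 uses, and the function name and example numbers are hypothetical.

```python
def predict_position(position, velocity, horizon_s):
    """Extrapolate a 2D position (metres) along a constant velocity (m/s)
    for horizon_s seconds."""
    return tuple(p + v * horizon_s for p, v in zip(position, velocity))


# Hypothetical pedestrian at (2.0, 0.0) m walking at 1.5 m/s along x.
future = predict_position((2.0, 0.0), (1.5, 0.0), 2.0)
print(future)  # (5.0, 0.0)
```

A learned world model is expected to do better than this baseline precisely where it fails: when a pedestrian stops at a curb, a cyclist swerves, or a vehicle brakes.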

Industrial robotics

In factories and warehouses, Cosmos 3 enables robots to plan navigation routes that avoid dynamic obstacles (people, forklifts, other robots) in real time.
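
The idea of replanning around moving obstacles can be sketched with a classic technique: breadth-first search on an occupancy grid, rerun whenever the grid changes. This is an illustration only and not how Cosmos 3 plans; the grid, the `plan_route` function, and the forklift scenario are hypothetical, and in a real system a perception model would be what supplies and updates the grid.

```python
from collections import deque


def plan_route(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid using breadth-first search.

    grid[r][c] == 1 marks an occupied cell. Returns a list of (row, col)
    cells from start to goal, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])
    if grid[start[0]][start[1]] or grid[goal[0]][goal[1]]:
        return None
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and not grid[nr][nc] and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None


# Hypothetical 3x4 warehouse floor: 0 = free, 1 = occupied.
floor = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
route = plan_route(floor, (0, 0), (2, 3))

# A forklift moves into cell (2, 1): mark it occupied and replan.
floor[2][1] = 1
new_route = plan_route(floor, (0, 0), (2, 3))
print(new_route)
```

After the forklift blocks the bottom row, the only remaining route runs along the top row and down the right-hand column. Real-time planners use faster incremental algorithms, but the loop is the same: update the map, then replan.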

Inspection drones

For drones inspecting infrastructure (bridges, power towers, solar panels), Cosmos 3 provides the ability to autonomously navigate around complex structures.

Precision agriculture

Agricultural robots can use Cosmos 3 to navigate between crop rows, identify individual plants, and execute precise actions like selective irrigation or weed removal.

4. The open-source approach

NVIDIA has made the strategic decision to make Cosmos 3 open-source. This allows:

  • The community to adapt the model to specific cases
  • Small companies and startups to access cutting-edge technology
  • Researchers to use it as a foundation for new breakthroughs
  • An ecosystem of tools and applications to be built around the model

💡 NVIDIA's strategy is clear: by making the model open-source, it increases adoption of its GPUs (which are needed to run the model), creating a value flywheel.

5. Hardware requirements

Cosmos 3 comes in three variants:

  • **Cosmos 3 Lite**: For edge devices with NVIDIA Jetson GPUs
  • **Cosmos 3 Standard**: For servers with A100/H100 GPUs
  • **Cosmos 3 Ultra**: For data centers with GPU clusters

6. The future of physical AI

Cosmos 3 represents an important step in the transition from models that only process digital information (text, images, code) to models that understand and can interact with the physical world. This is the foundation on which the robots and autonomous vehicles of the next decade will be built.

Last updated: July 2, 2026