NVIDIA Vera Rubin: the next leap in AI infrastructure
Unveiled at CES 2026, NVIDIA's Vera Rubin is a rack-scale platform built for the agentic-AI era, where inference and reasoning workloads dominate. Rather than a single faster GPU, it co-designs CPU, GPU, networking and cooling to act as one machine at data-center scale.
The flagship NVL72 rack pairs 72 Rubin GPUs with 36 Vera CPUs, a ratio of two GPUs per CPU. NVIDIA positions it as a generational jump over Grace Blackwell: by the company's figures, training large mixture-of-experts models takes roughly a quarter as many GPUs, and inference cost per token falls to around a tenth at the platform level. Each Rubin GPU moves to HBM4 memory and is built for trillion-parameter reasoning models.
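To make those ratios concrete, here is a rough back-of-the-envelope sketch in Python. Only the rack composition (72 GPUs, 36 CPUs) and the approximate 4x and 10x factors come from the figures above; the baseline GPU count and baseline cost are hypothetical placeholders chosen for illustration, not NVIDIA numbers, and the function names are invented for this example.

```python
import math

# Rack composition as stated for the NVL72.
RUBIN_GPUS_PER_RACK = 72
VERA_CPUS_PER_RACK = 36

# NVIDIA's approximate claims relative to Grace Blackwell.
TRAINING_GPU_REDUCTION = 4.0     # "roughly 4x" fewer GPUs for MoE training
INFERENCE_COST_REDUCTION = 10.0  # "around 10x" lower cost per token


def gpus_per_cpu() -> float:
    """GPU-to-CPU ratio inside one rack."""
    return RUBIN_GPUS_PER_RACK / VERA_CPUS_PER_RACK


def scaled_gpu_count(baseline_gpus: int) -> float:
    """GPUs needed if the claimed training reduction held exactly."""
    return baseline_gpus / TRAINING_GPU_REDUCTION


def racks_needed(gpu_count: float) -> int:
    """Whole NVL72 racks required to host a given number of GPUs."""
    return math.ceil(gpu_count / RUBIN_GPUS_PER_RACK)


def scaled_cost(baseline_cost: float) -> float:
    """Per-token cost if the claimed inference reduction held exactly."""
    return baseline_cost / INFERENCE_COST_REDUCTION


if __name__ == "__main__":
    # Hypothetical baselines, for illustration only.
    baseline_gpus = 1000   # assumed GPU count for a prior-generation training job
    baseline_cost = 2.0    # assumed cost per million tokens, arbitrary units

    rubin_gpus = scaled_gpu_count(baseline_gpus)
    print(f"GPUs per CPU in one rack: {gpus_per_cpu():.0f}")
    print(f"GPUs after a 4x reduction: {rubin_gpus:.0f}")
    print(f"NVL72 racks to host them: {racks_needed(rubin_gpus)}")
    print(f"Cost after a 10x reduction: {scaled_cost(baseline_cost):.2f}")
```

Under those assumed baselines, a job that used 1,000 GPUs would need about 250, which fits in four racks. Real savings will depend on the model, the workload and how closely the claimed factors hold in practice.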
Partner systems are expected to ship in the second half of 2026 across major cloud and OEM platforms.