NVIDIA Vera Rubin: What is it, what's new, and when you can get it
Written by
Chief Technology Officer (CTO) at Civo
NVIDIA's infrastructure roadmap moves fast, and the next major milestone is already here. The NVIDIA Vera Rubin platform is the company's next-generation AI compute architecture, the successor to Blackwell, and it's shaping up to be one of the most significant leaps forward in AI infrastructure NVIDIA has ever shipped.
Whether you're planning your next training cluster, scaling inference pipelines, or building the infrastructure to power autonomous agents, Vera Rubin is worth understanding now. In this blog, we break down what it is, what's genuinely new, and when you can start getting access.
What is NVIDIA Vera Rubin?
Vera Rubin is NVIDIA's next-generation AI platform, succeeding the Blackwell GPU family. But calling it a "new GPU" undersells what it actually is. Vera Rubin is a full data center platform: a collection of six co-designed chips built to work together as a unified AI infrastructure system.
Those six chips are:
- Rubin GPU: The next-generation GPU architecture, the compute core of the platform
- Vera CPU: A new ARM-based CPU designed specifically for AI-first workloads
- NVLink 6 Switch: For high-bandwidth chip-to-chip interconnects
- ConnectX-9 SuperNIC: For high-speed networking
- BlueField-4 DPU: For data processing and security offload
- Spectrum-6 Ethernet Switch: For fabric-level networking at rack scale
The design philosophy NVIDIA calls "extreme co-design" means these chips weren't optimized separately and assembled into a server. They were architected together from the ground up (GPU, CPU, networking, security, and cooling) as a single integrated system. It's the same approach NVIDIA introduced with Blackwell NVL72, but Rubin takes it further with a new in-house CPU, doubled NVLink bandwidth, and HBM4 memory.
“Vera Rubin is a generational leap — seven breakthrough chips, five racks, one giant supercomputer — built to power every phase of AI… The agentic AI inflection point has arrived with Vera Rubin kicking off the greatest infrastructure buildout in history.”
Jensen Huang, Founder and CEO of NVIDIA (Source: NVIDIA press release)
Why the name Vera Rubin?
NVIDIA has a tradition of naming its GPU architectures after pioneering scientists. Vera Rubin is named after the American astronomer whose observations of galaxy rotation curves provided some of the first compelling evidence for dark matter. Her work showed that galaxies contain five to ten times more mass than what's visible, fundamentally reshaping our understanding of the universe. It's a fitting tribute for a platform designed to unlock a new era of AI capability.
What's new: The key components explained
The Rubin GPU
The Rubin GPU is the successor to the NVIDIA Blackwell architecture, and it brings major improvements aimed squarely at the challenges of large-scale AI inference.
Rubin attacks the cost of inference from three distinct angles:
Research systems such as Splitwise and DistServe have shown the benefit of separating prefill and decode workloads onto specialized hardware, with Splitwise reporting up to 1.4x higher throughput at 20% lower cost. Rubin is built with that insight baked into the architecture.
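To see what those two figures mean together, here is a rough back-of-the-envelope sketch. It only combines the 1.4x throughput and 20% cost numbers quoted above; the baseline throughput and cost values are hypothetical placeholders, not measurements of any real system.

```python
# Illustrative arithmetic only. The 1.4x and 20% figures come from the text
# above; the baseline numbers are hypothetical placeholders.

def throughput_per_cost(throughput: float, cost: float) -> float:
    """Return throughput delivered per unit of cost."""
    return throughput / cost

baseline_throughput = 1000.0  # tokens per second (hypothetical)
baseline_cost = 100.0         # cost units per hour (hypothetical)

# Apply the quoted improvements for a prefill/decode-separated cluster.
split_throughput = baseline_throughput * 1.4  # "up to 1.4x higher throughput"
split_cost = baseline_cost * (1 - 0.20)       # "20% lower cost"

gain = throughput_per_cost(split_throughput, split_cost) / throughput_per_cost(
    baseline_throughput, baseline_cost
)
print(f"Throughput per unit of cost improves by about {gain:.2f}x")
```

Under these assumptions the two improvements compound: more tokens served for less spend works out to roughly 1.75x the throughput per unit of cost. The result does not depend on the placeholder baseline values, since they cancel out in the ratio.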
The Vera CPU
In earlier generations, NVIDIA GPUs were often paired with x86 CPUs from other manufacturers. Blackwell's GB200 systems pair B200 GPUs with NVIDIA's own ARM-based Grace CPU, which came from a separate design effort. With Vera Rubin, NVIDIA designed the GPU and CPU together as part of a single platform.
The Vera CPU is purpose-built for AI-first workloads. It's designed specifically to handle the sequential reasoning patterns that modern AI agents rely on, and it works seamlessly with Rubin GPUs via high-speed interconnects. Critically, the Vera CPU can also operate as a standalone data center CPU; it isn't dependent on being paired with a Rubin GPU.
The practical benefit is eliminating one of the most frustrating bottlenecks in AI infrastructure: GPUs sitting idle while workloads move through orchestration and decision layers. With Vera handling that coordination efficiently, Rubin GPUs spend more time doing what they're built to do.
The Groq LPU Integration
One of the bigger surprises from NVIDIA GTC 2026 was the announcement of Groq LPU integration into dedicated LPX racks. While the Rubin GPU handles heavy compute workloads, Groq's LPU (Language Processing Unit) is designed for lightning-fast token generation.
Combining these two approaches is how NVIDIA is targeting a 35x improvement in inference performance per watt, a figure that, if it holds at production scale, would represent a step-change in the economics of serving AI models.
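As a rough illustration of what a performance-per-watt multiplier means in practice, the sketch below treats performance per watt as tokens generated per joule. Only the 35x target comes from the text above; the baseline efficiency and the power budget are hypothetical placeholders chosen to make the arithmetic easy to follow.

```python
# Illustrative arithmetic only. The 35x figure is NVIDIA's stated target;
# every other number here is a hypothetical placeholder.

PERF_PER_WATT_GAIN = 35.0

def joules_per_token(tokens_per_joule: float) -> float:
    """Energy spent on each generated token."""
    return 1.0 / tokens_per_joule

def tokens_per_second(power_watts: float, tokens_per_joule: float) -> float:
    """Token rate achievable within a fixed power budget."""
    return power_watts * tokens_per_joule

baseline_tokens_per_joule = 0.5  # hypothetical baseline efficiency
power_budget_watts = 100_000.0   # hypothetical fixed power budget

improved_tokens_per_joule = baseline_tokens_per_joule * PERF_PER_WATT_GAIN

baseline_rate = tokens_per_second(power_budget_watts, baseline_tokens_per_joule)
improved_rate = tokens_per_second(power_budget_watts, improved_tokens_per_joule)

print(f"Baseline: {baseline_rate:,.0f} tokens/s, "
      f"{joules_per_token(baseline_tokens_per_joule):.3f} J/token")
print(f"Improved: {improved_rate:,.0f} tokens/s, "
      f"{joules_per_token(improved_tokens_per_joule):.3f} J/token")
```

The point of the sketch is that the same multiplier can be read two ways: at a fixed power budget the token rate rises by the gain factor, or at a fixed token rate the energy per token falls by the same factor.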
How it's configured: From individual chips to full racks
One of the strengths of the Vera Rubin platform is its flexibility. Organizations don't have to go all-in on a full rack deployment from day one. Supported configurations include:
- Individual Rubin chips: For teams integrating at the component level
- HGX Rubin: The GPU and interconnect technology available to OEM partners and cloud providers for custom builds
- DGX Rubin: NVIDIA's fully integrated, turnkey AI platform using Rubin GPUs and Vera CPUs, ready to deploy out of the box
- NVL72: A rack-scale configuration combining 72 Rubin GPUs with 36 Vera CPUs and high-performance networking
NVL72 is the headline configuration. It's one of the first confirmed Vera Rubin deployments and is designed specifically for large-scale training and inference workloads at rack scale.
Why does this matter? The bigger picture
The timing of Vera Rubin isn't accidental. The industry is moving beyond the phase where AI models are simply trained and deployed. Increasingly, models are expected to act, reason, and operate autonomously, and that changes everything about what AI infrastructure needs to support.
Jensen Huang highlighted at CES 2026 that AI inference is no longer a simple request-response exchange. With the rise of reasoning models and test-time scaling, inference has become a "thinking process": the model generates long chains of thought, tries different approaches, and iterates before producing a final answer. As Huang put it: "the longer it thinks, oftentimes it produces a better answer."
The consequence? The number of tokens generated per inference request is growing by roughly 5x every single year. At the same time, the cost of serving those tokens needs to fall just as rapidly to keep AI deployment economically viable at scale.
Vera Rubin is NVIDIA's answer to both sides of that equation: more compute where it matters, at a substantially lower cost per token.
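The two sides of that equation can be sketched with simple compounding. The snippet below assumes tokens per request grow 5x per year, as quoted above, and compares a few hypothetical yearly rates of decline in cost per token. The function name and the decline rates are illustrative assumptions, not figures from NVIDIA.

```python
# Illustrative compounding only. The 5x yearly token growth is the figure
# quoted above; the cost-decline rates are hypothetical scenarios.

def relative_cost_per_request(
    years: int, token_growth: float, cost_decline: float
) -> float:
    """Cost of one request relative to year 0.

    token_growth: yearly multiplier on tokens generated per request.
    cost_decline: yearly factor by which cost per token is divided.
    """
    return (token_growth / cost_decline) ** years

TOKEN_GROWTH = 5.0

for decline in (1.0, 2.0, 5.0):
    costs = [relative_cost_per_request(y, TOKEN_GROWTH, decline) for y in range(4)]
    formatted = ", ".join(f"{c:.2f}x" for c in costs)
    print(f"Cost per token falls {decline:.0f}x/year -> cost per request: {formatted}")
```

In this simplified model, the cost per request stays flat only when cost per token falls as fast as token counts rise. If cost per token did not fall at all, a request would cost 125 times as much after three years.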
“Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof… With our annual cadence of delivering a new generation of AI supercomputers — and extreme codesign across six new chips — Rubin takes a giant leap toward the next frontier of AI.”
Jensen Huang, Founder and CEO of NVIDIA (Source: NVIDIA press release)
When can you get NVIDIA Vera Rubin?
NVIDIA has confirmed that Rubin-based systems are in production, with partner and service provider availability expected in the second half of 2026. Early configurations including the NVL72 will be among the first to enter the market.
How to prepare now
Rubin represents the next major evolution in AI infrastructure architecture, designed for rack-scale systems capable of running massive agentic workloads. But as with every generational shift in compute, the organizations that benefit most will be the ones that plan ahead rather than wait for broad availability.
That means evaluating current workloads against the new architecture's strengths, identifying which teams will be early adopters, and securing capacity before allocations fill up.
At Civo, we have confirmed early access availability for NVIDIA Vera Rubin, with delivery from Q1 2027 and pricing from $11.00/hr. Allocations are limited. If you're planning for Rubin, now is the time to secure your capacity.
The workloads coming next (autonomous agents, real-time reasoning systems, and large-scale inference pipelines) will require a new generation of infrastructure to support them. Preparing now ensures your team is ready when that infrastructure arrives.
Dinesh Majrekar is Chief Technology Officer at Civo, where he leads the company’s technology strategy and platform development. His work focuses on building scalable cloud infrastructure and advancing the technologies that power the Civo platform.
Before becoming CTO, Dinesh served as Director of Innovation at Civo and held senior leadership roles at ServerChoice. His experience spans infrastructure architecture, platform engineering, and large-scale operations across hosting, cloud, and cybersecurity environments.