What role does the Vera CPU play in the Vera Rubin platform?

The Vera CPU is built for AI-first workloads and is tightly integrated with Rubin GPUs over high-speed links. It supports data-intensive processing and lets teams run complex workloads at rack scale while getting the most out of Rubin GPU compute.

What is NVL72 and why is it significant?

NVL72 is a rack-scale configuration that combines 72 Rubin GPUs with 36 Vera CPUs and high-performance networking. It is one of the first confirmed Vera Rubin configurations designed for large-scale training and inference, operating as a single tightly coupled system rather than a collection of separate GPUs.

What is the difference between NVIDIA DGX and HGX for Rubin?

DGX systems are fully integrated AI platforms from NVIDIA, ready to use out of the box. HGX provides the same GPU and interconnect technology but lets partners and OEMs build custom configurations. With Rubin, both DGX and HGX can deploy Rubin GPUs and Vera CPUs depending on whether you want a turnkey system or a flexible setup.

Should we reserve now or wait for general availability?

Waiting carries real time-to-market risk. Initial Blackwell lines sold out within weeks of availability, and Vera Rubin demand is expected to follow. Reserving now secures your place in the first wave and locks your pricing for the term. You still wait for Q1 2027 delivery, but against a confirmed allocation rather than an open-ended queue.

NVIDIA Vera Rubin: What it is, what's new, and when you can get it

NVIDIA's infrastructure roadmap moves fast, and the next major milestone is already here. The NVIDIA Vera Rubin platform is the company's next-generation AI compute architecture, the successor to Blackwell, and it's shaping up to be one of the most significant leaps forward in AI infrastructure NVIDIA has ever shipped.

Whether you're planning your next training cluster, scaling inference pipelines, or building the infrastructure to power autonomous agents, Vera Rubin is worth understanding now. In this blog, we break down what it is, what's genuinely new, and when you can start getting access.

Civo has early access to Vera Rubin - reserve now

What is NVIDIA Vera Rubin?

Vera Rubin is NVIDIA's next-generation AI platform, succeeding the Blackwell GPU family. But calling it a "new GPU" undersells what it actually is. Vera Rubin is a full data center platform: six co-designed chips built to work together as a unified AI infrastructure system.

Those six chips are:

Rubin GPU: The next-generation GPU architecture, the compute core of the platform
Vera CPU: A new ARM-based CPU designed specifically for AI-first workloads
NVLink 6 Switch: For high-bandwidth chip-to-chip interconnects
ConnectX-9 SuperNIC: For high-speed networking
BlueField-4 DPU: For data processing and security offload
Spectrum-6 Ethernet Switch: For fabric-level networking at rack scale

The design philosophy NVIDIA calls "extreme co-design" means these chips weren't optimized separately and then assembled into a server. GPU, CPU, networking, security, and cooling were architected together from the ground up as a single integrated system. It's the same approach NVIDIA introduced with Blackwell NVL72, but Rubin takes it further with a new in-house CPU, doubled NVLink bandwidth, and HBM4 memory.

“Vera Rubin is a generational leap — seven breakthrough chips, five racks, one giant supercomputer — built to power every phase of AI… The agentic AI inflection point has arrived with Vera Rubin kicking off the greatest infrastructure buildout in history.”

Jensen Huang, Founder and CEO of NVIDIA (Source: NVIDIA press release)

Why the name Vera Rubin?

NVIDIA has a tradition of naming its GPU architectures after pioneering scientists. Vera Rubin is named after the American astronomer whose observations of galaxy rotation curves provided some of the first compelling evidence for dark matter. Her work showed that galaxies contain five to ten times more mass than what's visible, fundamentally reshaping our understanding of the universe. It's a fitting tribute for a platform designed to unlock a new era of AI capability.

What's new: The key components explained

The Rubin GPU

The Rubin GPU is the successor to the NVIDIA Blackwell architecture, and it brings major improvements aimed squarely at the challenges of large-scale AI inference.

Rubin attacks the cost of inference from three distinct angles:

What changes	Why it matters
Lower precision compute: Rubin's 3rd-generation Transformer Engine adds hardware-level support for NVFP4 (4-bit floating point), enabling inference at much lower numerical precision without meaningful quality loss.	Lower precision dramatically increases tokens per watt and per GPU, reducing inference cost while maintaining model accuracy.
Removing the memory bottleneck: Long-reasoning models generate massive token sequences stored as a KV cache. Rubin increases memory bandwidth and capacity to keep these models fed.	Higher bandwidth and larger memory pools prevent the KV cache from becoming the dominant limiter in long-context and chain-of-thought inference.
Disaggregated inference: NVIDIA introduces the CPX processor, dedicated to prompt processing (prefill), while the Rubin GPU focuses on token generation (decode).	Separating prefill and decode allows higher utilization, enabling operators to serve more concurrent requests with fewer GPUs.
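To make the first two rows concrete, here is a rough back-of-the-envelope sketch of how KV cache size scales with context length and numerical precision. The model shape below is hypothetical (it is not a Rubin or NVIDIA figure), and the 4-bit row ignores the small overhead of scaling factors that block-scaled formats carry. Whether a given serving stack actually stores its KV cache at 4 bits is also an assumption here.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_value: float, batch: int = 1) -> float:
    """Approximate KV cache size for a decoder-only transformer.

    Each token stores one key vector and one value vector per layer and
    per KV head, hence the leading factor of 2.
    """
    values_per_token = 2 * layers * kv_heads * head_dim
    return values_per_token * tokens * bytes_per_value * batch


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
CONTEXT_TOKENS = 131_072  # a 128k-token reasoning trace

GIB = 2 ** 30
for label, width in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    size = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, width)
    print(f"{label}: {size / GIB:.0f} GiB per sequence")
# 16-bit: 40 GiB per sequence
# 8-bit: 20 GiB per sequence
# 4-bit: 10 GiB per sequence
```

The cache grows linearly with both sequence length and bytes per value, which is why long chain-of-thought traces put pressure on memory capacity and bandwidth, and why lower precision helps.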

Research systems such as Splitwise and DistServe showed that running prefill and decode on separate, specialized hardware can pay off, with Splitwise reporting up to 1.4x higher throughput at 20% lower cost. Rubin is built with that insight baked into the architecture.
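The prefill/decode split can be pictured with a toy scheduler. This is a minimal sketch under our own assumptions, not NVIDIA's implementation. The names `Request`, `DisaggregatedScheduler`, and `run_until_idle` are hypothetical, and real systems also have to move the KV cache between the two pools, which is reduced here to handing a request from one queue to the other.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    request_id: int
    prompt_tokens: int
    max_new_tokens: int
    generated: int = 0
    prefilled: bool = False


class DisaggregatedScheduler:
    """Toy model: one pool handles prefill, a separate pool handles decode."""

    def __init__(self) -> None:
        self.prefill_queue: deque = deque()
        self.decode_queue: deque = deque()
        self.finished: list = []

    def submit(self, req: Request) -> None:
        self.prefill_queue.append(req)

    def step_prefill(self) -> None:
        # Process one whole prompt, then hand the request (standing in
        # for its KV cache) over to the decode pool.
        if self.prefill_queue:
            req = self.prefill_queue.popleft()
            req.prefilled = True
            self.decode_queue.append(req)

    def step_decode(self) -> None:
        # Every active request emits one token per decode iteration.
        still_running: deque = deque()
        while self.decode_queue:
            req = self.decode_queue.popleft()
            req.generated += 1
            if req.generated >= req.max_new_tokens:
                self.finished.append(req)
            else:
                still_running.append(req)
        self.decode_queue = still_running


def run_until_idle(sched: DisaggregatedScheduler) -> int:
    """Advance both pools in lockstep and return the number of steps taken."""
    steps = 0
    while sched.prefill_queue or sched.decode_queue:
        sched.step_prefill()
        sched.step_decode()
        steps += 1
    return steps


sched = DisaggregatedScheduler()
sched.submit(Request(1, prompt_tokens=4000, max_new_tokens=3))
sched.submit(Request(2, prompt_tokens=200, max_new_tokens=2))
print(run_until_idle(sched), [r.request_id for r in sched.finished])
```

Because the two phases sit behind separate queues, a long prompt waiting for prefill does not stall token generation for requests that are already decoding.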

The Vera CPU

Earlier NVIDIA platforms often paired GPUs with x86 CPUs from other manufacturers. Blackwell systems such as GB200 pair B200 GPUs with NVIDIA's ARM-based Grace CPU, which is built on off-the-shelf Arm core designs. With Vera Rubin, NVIDIA designed both the GPU and the CPU cores in-house.

The Vera CPU is purpose-built for AI-first workloads. It's designed specifically to handle the sequential reasoning patterns that modern AI agents rely on, and it works seamlessly with Rubin GPUs via high-speed interconnects. Critically, the Vera CPU can also operate as a standalone data center CPU; it isn't dependent on being paired with a Rubin GPU.

The practical benefit is eliminating one of the most frustrating bottlenecks in AI infrastructure: GPUs sitting idle while workloads move through orchestration and decision layers. With Vera handling that coordination efficiently, Rubin GPUs spend more time doing what they're built to do.
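A simple way to see why this matters: if GPU work and host-side orchestration run strictly one after the other, GPU utilization is just the GPU's share of the total time. The sketch below uses made-up timings to illustrate the relationship. The function name and the numbers are our assumptions, not measurements of Vera or Rubin.

```python
def gpu_utilization(gpu_seconds: float, host_seconds: float) -> float:
    """Fraction of wall-clock time the GPU is busy when GPU work and
    host-side orchestration alternate with no overlap."""
    return gpu_seconds / (gpu_seconds + host_seconds)


# Hypothetical agent step: 8 s of GPU work, 2 s of CPU-side coordination.
print(f"{gpu_utilization(8.0, 2.0):.1%}")  # 80.0%

# Halving the coordination time lifts utilization without touching the GPU.
print(f"{gpu_utilization(8.0, 1.0):.1%}")  # 88.9%
```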

The Groq LPU Integration

One of the bigger surprises from NVIDIA GTC 2026 was the announcement of Groq LPU integration into dedicated LPX racks. While the Rubin GPU handles heavy compute workloads, Groq's LPU (Language Processing Unit) is designed for lightning-fast token generation.

Combining the two approaches is how NVIDIA is targeting a 35x improvement in inference performance per watt. If that figure holds at production scale, it would represent a step change in the economics of serving AI models.

How it's configured: From individual chips to full racks

One of the strengths of the Vera Rubin platform is its flexibility. Organizations don't have to go all-in on a full rack deployment from day one. Supported configurations include:

Individual Rubin chips: For teams integrating at the component level
HGX Rubin: The GPU and interconnect technology available to OEM partners and cloud providers for custom builds
DGX Rubin: NVIDIA's fully integrated, turnkey AI platform using Rubin GPUs and Vera CPUs, ready to deploy out of the box
NVL72: A rack-scale configuration combining 72 Rubin GPUs with 36 Vera CPUs and high-performance networking

NVL72 is the headline configuration. It's one of the first confirmed Vera Rubin deployments and is designed specifically for large-scale training and inference workloads at rack scale.
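For capacity planning, the rack shape reduces to simple arithmetic. The sketch below encodes only the 72-GPU, 36-CPU counts given above. The `RackConfig` class and its methods are hypothetical helpers of our own, not part of any NVIDIA tooling.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RackConfig:
    name: str
    gpus: int
    cpus: int

    @property
    def gpus_per_cpu(self) -> float:
        return self.gpus / self.cpus

    def racks_for(self, gpus_needed: int) -> int:
        """Whole racks required to supply at least `gpus_needed` GPUs."""
        return math.ceil(gpus_needed / self.gpus)


NVL72 = RackConfig("NVL72", gpus=72, cpus=36)

print(NVL72.gpus_per_cpu)    # 2.0
print(NVL72.racks_for(500))  # 7
```

Since NVL72 operates as a single tightly coupled system, capacity is naturally planned in whole racks rather than individual GPUs.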

Why does this matter? The bigger picture

The timing of Vera Rubin isn't accidental. The industry is moving beyond the phase where AI models are simply trained and deployed. Increasingly, models are expected to act, reason, and operate autonomously, and that changes everything about what AI infrastructure needs to support.

Jensen Huang highlighted at CES 2026 that AI inference is no longer a simple request-response exchange. With the rise of reasoning models and test-time scaling, inference has become a "thinking process": the model generates long chains of thought, tries different approaches, and iterates before producing a final answer. As Huang put it: "the longer it thinks, oftentimes it produces a better answer."

The consequence? The number of tokens generated per inference request is growing by roughly 5x every single year. At the same time, the cost of serving those tokens needs to fall just as rapidly to keep AI deployment economically viable at scale.

Vera Rubin is NVIDIA's answer to both sides of that equation: more compute where it matters, at a substantially lower cost per token.
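The arithmetic behind that equation is worth spelling out. Serving spend per request is tokens generated multiplied by cost per token, so if tokens grow 5x a year, cost per token has to fall by the same factor just to keep spend flat. Here is a small sketch. The 5x growth figure comes from the text above, while the cost-decline scenarios are hypothetical.

```python
def relative_spend(years: int, token_growth: float, cost_decline: float) -> float:
    """Spend per request relative to today.

    token_growth: factor by which tokens per request grow each year.
    cost_decline: factor by which cost per token falls each year.
    """
    return (token_growth / cost_decline) ** years


for decline in (1.0, 2.0, 5.0):
    spend = relative_spend(years=3, token_growth=5.0, cost_decline=decline)
    print(f"cost per token falls {decline:g}x/yr -> spend after 3 years: {spend:g}x")
# cost per token falls 1x/yr -> spend after 3 years: 125x
# cost per token falls 2x/yr -> spend after 3 years: 15.625x
# cost per token falls 5x/yr -> spend after 3 years: 1x
```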

“Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof… With our annual cadence of delivering a new generation of AI supercomputers — and extreme codesign across six new chips — Rubin takes a giant leap toward the next frontier of AI.”

Jensen Huang, Founder and CEO of NVIDIA (Source: NVIDIA press release)

When can you get NVIDIA Vera Rubin?

NVIDIA has confirmed that Rubin-based systems are in production, with partner and service provider availability expected in the second half of 2026. Early configurations including the NVL72 will be among the first to enter the market.

How to prepare now

Rubin represents the next major evolution in AI infrastructure architecture, designed for rack-scale systems capable of running massive agentic workloads. But as with every generational shift in compute, the organizations that benefit most will be the ones that plan ahead rather than wait for broad availability.

That means evaluating current workloads against the new architecture's strengths, identifying which teams will be early adopters, and securing capacity before allocations fill up.

The workloads coming next (autonomous agents, real-time reasoning systems, and large-scale inference pipelines) will require a new generation of infrastructure to support them. Preparing now ensures your team is ready when that infrastructure arrives.

Reserve your Vera Rubin capacity today

At Civo, we have been preparing for this next phase of AI infrastructure for some time. That is why we are proud to be one of the first commercial cloud providers with confirmed early access availability for NVIDIA Vera Rubin infrastructure, with delivery from Q1 2027.

Vera Rubin infrastructure at Civo will be available from $11.00/hr, giving organizations early access to one of the biggest shifts in AI infrastructure in years.

If you are planning large-scale AI training, inference, or next-generation AI platforms, now is the time to start thinking about what your infrastructure strategy looks like for 2027 and beyond. Reserve your Vera Rubin capacity by contacting the Civo sales team. First come, first served!

FAQs

What is the NVIDIA Vera Rubin platform?

The NVIDIA Vera Rubin platform is the next generation of AI compute architecture from NVIDIA. It brings together Rubin GPUs, Vera CPUs, networking, and system-scale components into a unified infrastructure for training, inference, and reasoning workloads. The Rubin GPU handles large models and long-context reasoning with high throughput, while the tightly integrated Vera CPU supports data-intensive processing at rack scale. For a full breakdown of how Rubin compares to the current generation, see our Vera Rubin vs Blackwell B200 guide.

How does Vera Rubin compare to the B200, and is it worth the premium?

Vera Rubin is a full generation ahead of Blackwell: 5x inference performance, 50% more memory per GPU, and nearly 3x the memory bandwidth. Because complex reasoning workloads finish in fewer GPU-hours, the higher hourly rate can mean a lower total cost per job than older hardware running longer. Our Vera Rubin vs B200 comparison has the full spec breakdown. If you are weighing the current generation, the B200 vs H100 deep dive is also worth a read.
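To illustrate the cost-per-job point, here is a minimal worked example. It takes the $11.00/hr starting price quoted earlier and assumes it applies per GPU-hour, which the pricing line does not state. The $4.00/hr baseline rate and the 100 GPU-hour job are purely hypothetical, and 5x is the headline inference figure rather than a guaranteed speedup for any given workload.

```python
def job_cost(hourly_rate: float, baseline_gpu_hours: float, speedup: float = 1.0) -> float:
    """Total cost of a job that needs `baseline_gpu_hours` on the older
    hardware and runs `speedup` times faster on the newer hardware."""
    return hourly_rate * baseline_gpu_hours / speedup


def breakeven_speedup(new_rate: float, old_rate: float) -> float:
    """Speedup at which the newer hardware costs the same per job."""
    return new_rate / old_rate


OLD_RATE = 4.00   # hypothetical rate for the older GPU, in $/hr
NEW_RATE = 11.00  # "from" price quoted above, assumed to be per GPU-hour

print(job_cost(OLD_RATE, 100.0))               # 400.0
print(job_cost(NEW_RATE, 100.0, speedup=5.0))  # 220.0
print(breakeven_speedup(NEW_RATE, OLD_RATE))   # 2.75
```

Under these assumptions the newer hardware is cheaper per job whenever the real speedup on your workload exceeds the ratio of the two hourly rates, so it is worth benchmarking before committing.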

Related Articles

Our key takeaways from NVIDIA GTC 2026

NVIDIA Rubin GPU vs. NVIDIA Vera CPU

NVIDIA Vera Rubin vs. NVIDIA Blackwell (B200) GPU
