NVIDIA Rubin GPU vs. NVIDIA Vera CPU

NVIDIA's Vera Rubin lineup is shaping up to be one of the company's most promising releases this year. In a previous blog, we discussed how NVIDIA Vera Rubin stacks up against the Blackwell series.

The Vera Rubin release is exciting not just because the tech giant is releasing an even more capable GPU, but because it is the first time the company is producing a CPU specifically geared toward “agentic reasoning.”

In this blog, we take a look at the two sides of NVIDIA's latest release, Vera and Rubin, draw a distinction between the two, and provide some clarity on which you should be most excited for.

What is Vera Rubin?

Vera Rubin is NVIDIA's successor to the Blackwell generation. It is not a single chip, but a full data center platform made up of six co-designed chips: the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch.

The design philosophy behind the platform is what NVIDIA calls "extreme co-design." Rather than optimizing each chip independently and assembling them into a server, NVIDIA architected the GPU, CPU, networking, security, and cooling together as a single system.

This is the same approach NVIDIA introduced with Blackwell NVL72, but Rubin takes it further with a new CPU, doubled NVLink bandwidth, and HBM4 memory.

In previous generations, NVIDIA either paired its GPUs with x86 CPUs from other manufacturers, such as Intel Xeon, or with its own Grace CPU, which is built on licensed Arm Neoverse cores. The Grace Blackwell (GB200) platform, for example, pairs B200 GPUs with Arm-based Grace CPUs.

With Vera Rubin, NVIDIA goes a step further and designs the CPU cores in-house as well. Rubin refers to the GPU architecture, while Vera refers to a new Arm-based CPU with custom NVIDIA cores, built to handle data movement, orchestration, and agentic workloads either alongside the Rubin GPU or independently as a standalone data center CPU.

What is the NVIDIA Vera CPU?

Vera is special because it is NVIDIA's first CPU designed as a standalone processor to compete directly with traditional data center-grade CPUs like Intel Xeon and AMD EPYC. While NVIDIA has shipped CPUs before with its Grace lineup, those were designed primarily as companions to NVIDIA GPUs.

Spec	Vera CPU
Cores	88 custom NVIDIA Olympus cores
Threads	176 via Spatial Multithreading
Memory	Up to 1.5 TB LPDDR5X
NVLink C2C bandwidth	1.8 TB/s
FP8 support	Native
PCIe	Gen6 / CXL 3.1

NVIDIA designed Vera specifically for the demands of agentic AI, where the CPU no longer just supports a model but actively drives it. Agentic workloads require the CPU to handle orchestration, tool use, code execution, and data movement at scale.
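To make this division of labor concrete, here is a minimal sketch of an agent loop in which the CPU drives the model. Everything in it is hypothetical: `call_model` stands in for a request to a GPU-hosted model and `run_tool` stands in for CPU-side tool execution. Neither is an NVIDIA API; the point is only that the control flow, tool calls, and scheduling of many concurrent sessions all run on the CPU.

```python
import asyncio

async def call_model(history: list[str]) -> str:
    """Hypothetical stand-in for GPU inference: requests a tool once, then answers."""
    await asyncio.sleep(0)  # placeholder for network and GPU latency
    if not any(h.startswith("tool:") for h in history):
        return "CALL word_count"
    return "ANSWER " + history[-1].removeprefix("tool:")

def run_tool(name: str, task: str) -> str:
    """Hypothetical CPU-side tool (in practice: parsing, code execution, data movement)."""
    if name == "word_count":
        return str(len(task.split()))
    raise ValueError(f"unknown tool: {name}")

async def agent(task: str, max_steps: int = 4) -> str:
    """CPU-driven loop: alternate model calls and tool calls until an answer appears."""
    history = [f"task:{task}"]
    for _ in range(max_steps):
        reply = await call_model(history)
        if reply.startswith("ANSWER "):
            return reply.removeprefix("ANSWER ")
        tool_name = reply.removeprefix("CALL ")
        history.append("tool:" + run_tool(tool_name, task))
    return "gave up"

async def main() -> list[str]:
    # Many agent sessions run concurrently; the CPU schedules all of them.
    tasks = ["count these words", "how many words are here now"]
    return await asyncio.gather(*(agent(t) for t in tasks))

if __name__ == "__main__":
    print(asyncio.run(main()))  # ['3', '6']
```

In a real deployment the GPU only sees the `call_model` requests; every other step in the loop is CPU work, which is the load Vera is positioned to absorb.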

What is the NVIDIA Rubin GPU?

The Rubin GPU is designed for large-scale AI inference and training in data centers. It is the first NVIDIA GPU to use HBM4 memory, replacing the HBM3e used in the Blackwell generation. Key specs of the Rubin GPU include:

Spec	Rubin GPU
Process node	TSMC N3
Transistor count	336 billion
Memory	288 GB HBM4
Memory bandwidth	22 TB/s
NVLink bandwidth (per GPU)	3.6 TB/s
NVFP4 inference	50 PFLOPS
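One practical consequence of the 288 GB figure is how large a model's weights can be while still fitting on a single GPU. The sketch below is back-of-the-envelope arithmetic that counts weights only and ignores KV cache, activations, and runtime overhead. The NVFP4 byte count assumes NVIDIA's published layout of 4-bit values plus one 8-bit scale per 16-value block, and ignores the small per-tensor scale.

```python
# Rough upper bound on model size that fits in one Rubin GPU's 288 GB of HBM4,
# counting weights only (no KV cache, activations, or runtime overhead).
HBM_BYTES = 288e9  # 288 GB, decimal gigabytes as in the spec table

# Bytes per parameter. NVFP4 assumes 4-bit values plus one 8-bit scale
# per 16-value block (0.5 + 1/16 bytes); treat it as an approximation.
BYTES_PER_PARAM = {
    "BF16": 2.0,
    "FP8": 1.0,
    "NVFP4": 0.5 + 1 / 16,
}

def max_params(fmt: str, capacity: float = HBM_BYTES) -> float:
    """Largest parameter count whose weights fit in `capacity` bytes."""
    return capacity / BYTES_PER_PARAM[fmt]

for fmt in BYTES_PER_PARAM:
    print(f"{fmt:>6}: ~{max_params(fmt) / 1e9:.0f}B parameters")
```

This works out to roughly 144 billion parameters in BF16, 288 billion in FP8, and 512 billion in NVFP4, which is one reason lower precision matters so much for serving large models on a single GPU.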

The Rubin GPU exists because inference is getting harder. With reasoning models, AI inference is no longer a quick request and response; models now generate long chains of thought, try different approaches, and iterate before producing an answer.

NVIDIA's headline claim is that, for agentic AI inference, Rubin delivers one-tenth the cost per million tokens of Blackwell, and that it can train mixture-of-experts (MoE) models with one-fourth the number of GPUs. NVIDIA attributes these gains in large part to a new Transformer Engine with hardware-accelerated NVFP4 support.
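Cost per million tokens is the hourly cost of a system divided by how many millions of tokens it produces in that hour. The sketch below uses purely hypothetical prices and throughputs (not measured or published figures) to show what a 10x reduction implies: because both the hourly price and the throughput change between generations, a tenfold drop requires tokens per dollar to improve tenfold.

```python
def cost_per_million_tokens(price_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per 1M generated tokens for a system billed by the hour."""
    tokens_per_hour = tokens_per_second * 3600
    return price_per_hour / tokens_per_hour * 1_000_000

# Hypothetical numbers for illustration only; these are not benchmark results.
baseline = cost_per_million_tokens(price_per_hour=10.0, tokens_per_second=1_000)
# A 10x cost reduction at the same hourly price needs 10x the throughput...
faster = cost_per_million_tokens(price_per_hour=10.0, tokens_per_second=10_000)
# ...while 5x the throughput at twice the hourly price is only 2.5x cheaper.
pricier = cost_per_million_tokens(price_per_hour=20.0, tokens_per_second=5_000)

print(f"baseline: ${baseline:.3f}, faster: ${faster:.3f}, pricier: ${pricier:.3f}")
```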

Rubin does not ship as a standalone GPU. It is part of a broader platform. The flagship configuration is the Vera Rubin NVL72, a 100% liquid-cooled rack that combines 72 Rubin GPUs with 36 Vera CPUs, cooled with 45°C water.
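Multiplying the per-chip figures quoted in this post gives a rough sense of the NVL72 rack's aggregate resources. This is simple arithmetic on the numbers above, not an official rack spec sheet, and it assumes every Vera CPU carries the maximum 1.5 TB of LPDDR5X.

```python
# Per-chip figures from the spec tables in this post.
RUBIN_HBM4_GB = 288
RUBIN_NVFP4_PFLOPS = 50
RUBIN_NVLINK_TBPS = 3.6
VERA_LPDDR5X_TB = 1.5  # assumes the maximum memory configuration

GPUS, CPUS = 72, 36  # Vera Rubin NVL72 configuration

rack = {
    "GPUs per CPU": GPUS / CPUS,
    "HBM4 total (TB)": GPUS * RUBIN_HBM4_GB / 1000,
    "LPDDR5X total (TB)": CPUS * VERA_LPDDR5X_TB,
    "NVFP4 inference (EFLOPS)": GPUS * RUBIN_NVFP4_PFLOPS / 1000,
    "Aggregate NVLink (TB/s)": GPUS * RUBIN_NVLINK_TBPS,
}
for name, value in rack.items():
    print(f"{name}: {value:g}")
```

That comes to two GPUs per CPU, about 20.7 TB of HBM4, 54 TB of LPDDR5X, 3.6 EFLOPS of NVFP4 inference, and roughly 260 TB/s of aggregate NVLink bandwidth per rack.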

A smaller configuration, the DGX Rubin NVL8, packs eight Rubin GPUs into a liquid-cooled 2U system for training, inference, and post-training workloads. Both systems are optimized for NVIDIA's software stack, including NIM inference microservices and the NeMo framework.

NVIDIA Rubin GPU vs. NVIDIA Vera CPU

So far, we have established some of the functional differences between the two devices, but what do the spec sheets say?

Spec	Vera CPU	Rubin GPU
What it is	A standalone data center CPU	A data center GPU for AI training and inference
Architecture	88 custom NVIDIA Olympus cores	336 billion transistors, TSMC N3 process
Threads	176 via Spatial Multithreading	Not publicly disclosed (at the time of publication)
Memory type	LPDDR5X	HBM4 (first NVIDIA GPU to use HBM4)
Memory capacity	Up to 1.5 TB	288 GB per GPU
Memory bandwidth	1.2 TB/s	22 TB/s
NVLink bandwidth	1.8 TB/s (NVLink-C2C)	3.6 TB/s (NVLink 6, per GPU)
FP8 support	Yes (first CPU with native FP8)	Yes
Confidential computing	Supported	Supported via rack-scale trusted execution
PCIe	Gen6 / CXL 3.1	Gen6
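Reading the capacity and bandwidth rows together is revealing. Dividing capacity by bandwidth gives the minimum time needed to read every byte once at peak bandwidth, a rough proxy for how quickly each chip can stream its entire memory. The figures are the peak values from the table; sustained bandwidth is lower in practice.

```python
# Time to read every byte of memory once at peak bandwidth: capacity / bandwidth.
# Uses the peak figures from the table above; real workloads achieve less than peak.
def full_sweep_seconds(capacity_gb: float, bandwidth_gb_per_s: float) -> float:
    return capacity_gb / bandwidth_gb_per_s

rubin = full_sweep_seconds(capacity_gb=288, bandwidth_gb_per_s=22_000)  # HBM4
vera = full_sweep_seconds(capacity_gb=1_500, bandwidth_gb_per_s=1_200)  # LPDDR5X

print(f"Rubin GPU: {rubin * 1000:.1f} ms per full pass over HBM4")
print(f"Vera CPU:  {vera * 1000:.0f} ms per full pass over LPDDR5X")
```

The result is about 13 ms for the Rubin GPU versus 1.25 s for the Vera CPU. Since each memory-bound decode step reads the model weights roughly once, this helps explain why weights live in the GPU's HBM4, while Vera's larger but slower pool suits capacity-heavy work.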

Other notable differences

Aside from raw specs, NVIDIA has made several platform-level optimizations across the Vera Rubin lineup:

Energy efficiency and cooling

Vera CPU's LPDDR5X memory subsystem delivers 1.2 TB/s of bandwidth at under 50 W of memory power, using SOCAMM modules.
The Vera Rubin NVL72 is 100% liquid-cooled, using 45°C water with no chillers required.
NVIDIA states this allows data centers to allocate up to 10% more power budget directly to compute instead of cooling.
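The memory power figure can be turned into the efficiency metrics usually used to compare memory systems. Because NVIDIA states power as an upper bound ("under 50 W"), the results below are a floor on bandwidth per watt and a ceiling on energy per bit.

```python
# Vera's memory subsystem: 1.2 TB/s at under 50 W (figures from this post).
# Power is an upper bound, so GB/s per watt is a floor and pJ/bit is a ceiling.
BANDWIDTH_BYTES_PER_S = 1.2e12
MEMORY_POWER_W = 50.0

gb_per_s_per_watt = BANDWIDTH_BYTES_PER_S / 1e9 / MEMORY_POWER_W
picojoules_per_bit = MEMORY_POWER_W / (BANDWIDTH_BYTES_PER_S * 8) * 1e12

print(f"at least {gb_per_s_per_watt:.0f} GB/s per watt")
print(f"at most {picojoules_per_bit:.2f} pJ per bit transferred")
```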

Precision and compute scope

Both processors support FP8, but only the Rubin GPU supports FP4 through its Transformer Engine.
FP4 inference is the primary way NVIDIA is driving down cost per token on the Rubin platform.
Vera's native FP8 support allows it to handle lighter AI workloads like agentic tool execution and data preprocessing without offloading to a GPU.
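NVIDIA describes NVFP4 as 4-bit floating-point (E2M1) values grouped into small blocks that share a higher-precision scale factor, which is what lets FP4 keep acceptable accuracy. The sketch below is a simplified software model of that idea, assuming 16-value blocks. It keeps each block's scale in full precision, whereas NVIDIA's format stores it in FP8 and adds a per-tensor scale, and it breaks rounding ties toward the smaller value rather than following the hardware's rounding mode. It is a teaching aid, not the Transformer Engine's implementation.

```python
# E2M1 (4-bit float) magnitudes: 1 sign bit, 2 exponent bits, 1 mantissa bit.
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # assumed number of values sharing one scale factor

def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Scale a block so its max |x| maps to 6.0, then snap each value to the E2M1 grid."""
    amax = max(abs(x) for x in block)
    scale = amax / FP4_E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        # Nearest representable magnitude; ties go to the smaller grid value.
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes

def fake_quantize(values: list[float]) -> list[float]:
    """Quantize then dequantize block by block to expose the rounding error FP4 adds."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, codes = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(c * scale for c in codes)
    return out

weights = [0.02 * i - 0.3 for i in range(32)]  # two blocks of made-up weights
restored = fake_quantize(weights)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max abs error over {len(weights)} values: {max_err:.4f}")
```

The per-block scale is the key design choice: an outlier only coarsens the 16 values in its own block rather than the whole tensor.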

Software stack

Both are optimized for the NVIDIA AI Enterprise software suite, including NIM inference microservices and the NeMo framework.
The Vera CPU is additionally positioned for reinforcement learning sandboxing, where NVIDIA claims a single 256-CPU rack can sustain over 22,500 concurrent RL or agent sandbox environments.
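The sandbox figure is easier to interpret per chip. Dividing NVIDIA's claim across the rack works out to roughly one sandbox per Olympus core. The memory line assumes every CPU has the maximum 1.5 TB configuration, which the claim itself does not specify.

```python
# NVIDIA's claim: one 256-CPU rack sustains 22,500+ concurrent RL/agent sandboxes.
SANDBOXES = 22_500
CPUS_PER_RACK = 256
CORES_PER_CPU = 88         # Olympus cores per Vera CPU
THREADS_PER_CPU = 176      # via Spatial Multithreading
MEMORY_PER_CPU_GB = 1_500  # assumes the maximum 1.5 TB LPDDR5X configuration

per_cpu = SANDBOXES / CPUS_PER_RACK
print(f"{per_cpu:.1f} sandboxes per CPU")
print(f"{per_cpu / CORES_PER_CPU:.2f} per core, {per_cpu / THREADS_PER_CPU:.2f} per thread")
print(f"~{MEMORY_PER_CPU_GB / per_cpu:.0f} GB of memory per sandbox")
```

That is about 88 sandboxes per CPU, or one per core and half a hardware thread each, with roughly 17 GB of memory per sandbox under the stated assumption.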

Standalone availability

The Vera CPU ships as a standalone product in single- and dual-socket server configurations from Dell, HPE, Lenovo, and Supermicro.
The Rubin GPU only ships as part of multi-GPU system configurations like the NVL72 or NVL8.

Summary

You’ve probably heard it a hundred times this week, but as AI adoption speeds up, organizations are looking for ways to optimize GPU usage and make inference cheaper. The Vera Rubin lineup aims to make this a reality.

In this post, we distinguished between the Vera CPU and the Rubin GPU, highlighting their differences and the reasons each one exists.

If you’re looking to learn more about previous generations of NVIDIA GPUs, here are some resources:

Run Vera Rubin workloads on Civo

Whether you're planning large-scale inference, agentic AI, or MoE model training, Civo has confirmed early access availability for NVIDIA Vera Rubin infrastructure, with delivery from Q1 2027 and pricing from $11.00/hr.

Allocations are limited and offered on a first-come, first-served basis. Contact the Civo sales team.

Related Articles

NVIDIA Vera Rubin vs. NVIDIA Blackwell (B200) GPU

A100 vs. L40s vs. H100 vs. H200 GH superchips: A comparison of NVIDIA’s next-gen GPUs

Comparing NVIDIA's B200 and H100: A deep dive into next-gen AI performance
