NVIDIA Rubin GPU vs. NVIDIA Vera CPU

5 minutes reading time

Written by

Jubril Oyetunji

Technical Writer @ Civo

The Vera Rubin lineup is shaping up to be a promising release from NVIDIA in the first half of the year. In a previous blog, we discussed how NVIDIA Vera Rubin stacks up against the Blackwell series.

The release is exciting not just because NVIDIA is shipping an even more capable GPU, but because it is the first time the company has built a CPU specifically geared toward “agentic reasoning.”

In this blog, we take a look at the two sides of NVIDIA's latest release, Vera and Rubin, draw a distinction between the two, and provide some clarity on which you should be most excited for. 

What is Vera Rubin?

Vera Rubin is NVIDIA's successor to the Blackwell generation. It is not a single chip, but a full data center platform made up of six co-designed chips: the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch.

The design philosophy behind the platform is what NVIDIA calls "extreme co-design." Rather than optimizing each chip independently and assembling them into a server, NVIDIA architected the GPU, CPU, networking, security, and cooling together as a single system. 

This is the same approach NVIDIA introduced with Blackwell NVL72, but Rubin takes it further with a new CPU, doubled NVLink bandwidth, and HBM4 memory.

In previous generations, NVIDIA paired its GPUs either with x86 CPUs from other manufacturers or with its own Grace CPU, which is built around Arm's Neoverse core designs. The Blackwell lineup, for example, uses B200 GPUs alongside Arm-based Grace CPUs.

With Vera Rubin, NVIDIA designed both the GPU and the CPU cores in-house. Rubin refers to the GPU architecture, while Vera refers to a new Arm-based CPU built on custom Olympus cores to handle data movement, orchestration, and agentic workloads alongside the Rubin GPU, or independently as a standalone data center CPU.

What is the NVIDIA Vera CPU? 

Vera is special because it is NVIDIA's first CPU designed as a standalone processor to compete directly with traditional data center-grade CPUs like Intel Xeon and AMD EPYC. While NVIDIA has shipped CPUs before with its Grace lineup, those were always tightly paired with NVIDIA GPUs.

| Spec | Vera CPU |
| --- | --- |
| Cores | 88 custom NVIDIA Olympus cores |
| Threads | 176 via Spatial Multithreading |
| Memory | Up to 1.5 TB LPDDR5X |
| NVLink C2C bandwidth | 1.8 TB/s |
| FP8 support | Native |
| PCIe | Gen6 / CXL 3.1 |

It is important to note here that NVIDIA designed Vera specifically for the demands of agentic AI, where the CPU is no longer just supporting a model but actively driving it. Agentic workloads require the CPU to handle orchestration, tool use, code execution, and data movement at scale. 

What is the NVIDIA Rubin GPU?

The Rubin GPU is designed for large-scale AI inference and training in data centers. It is the first NVIDIA GPU to use HBM4 memory, replacing the HBM3e used in the Blackwell generation. Additional Rubin GPU specs include:

| Spec | Rubin GPU |
| --- | --- |
| Process node | TSMC N3 |
| Transistor count | 336 billion |
| Memory | 288 GB HBM4 |
| Memory bandwidth | 22 TB/s |
| NVLink bandwidth (per GPU) | 3.6 TB/s |
| FP4 inference | 50 PFLOPS |

The Rubin GPU exists because inference is getting harder. With reasoning models, AI inference is no longer a quick request and response; models now generate long chains of thought, try different approaches, and iterate before producing an answer. 

NVIDIA's headline claim is that Rubin delivers one-tenth the cost per million tokens compared to Blackwell for agentic AI inference, and can train MoE models with one-fourth the number of GPUs. It achieves this through a new Transformer Engine with hardware-accelerated NVFP4 support.
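To make the headline ratios concrete, here is a minimal sketch of the arithmetic. The baseline figures (a Blackwell cost of $2.00 per million tokens and a 1,024-GPU training cluster) are hypothetical placeholders chosen only for illustration. The one-tenth and one-fourth ratios are NVIDIA's claims as quoted above, not measured results.

```python
import math

# Ratios quoted from NVIDIA's claims above (not independent measurements).
COST_RATIO_VS_BLACKWELL = 1 / 10  # cost per million tokens, agentic inference
GPU_RATIO_VS_BLACKWELL = 1 / 4    # GPUs needed to train an MoE model


def projected_rubin_cost(blackwell_cost_per_m_tokens: float) -> float:
    """Scale a Blackwell cost per million tokens by the claimed ratio."""
    return blackwell_cost_per_m_tokens * COST_RATIO_VS_BLACKWELL


def projected_rubin_gpus(blackwell_gpus: int) -> int:
    """Scale a Blackwell GPU count by the claimed ratio, rounding up."""
    return math.ceil(blackwell_gpus * GPU_RATIO_VS_BLACKWELL)


# Hypothetical baselines, for illustration only.
baseline_cost = 2.00   # USD per million tokens on Blackwell (assumed)
baseline_gpus = 1024   # Blackwell GPUs in a training cluster (assumed)

print(f"Projected Rubin cost: ${projected_rubin_cost(baseline_cost):.2f} per million tokens")
print(f"Projected Rubin GPUs: {projected_rubin_gpus(baseline_gpus)}")
```

Real savings will depend on the model, batch size, and precision used, so treat these as upper-bound projections under NVIDIA's stated conditions.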

Rubin does not ship as a standalone GPU. It is part of a broader platform. The flagship configuration is the Vera Rubin NVL72, a 100% liquid-cooled rack that combines 72 Rubin GPUs with 36 Vera CPUs, cooled with 45°C water. 

A smaller configuration, the DGX Rubin NVL8, packs eight Rubin GPUs into a liquid-cooled 2U system for training, inference, and post-training workloads. Both systems are optimized for NVIDIA's software stack, including NIM inference microservices and the NeMo framework.
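As a rough sanity check on what these configurations add up to, the sketch below multiplies the per-chip figures quoted in this article by the chip counts. These are naive sums, assuming 1 TB = 1,000 GB; the `RackConfig` type and `rack_totals` helper are illustrative names, and NVIDIA's own system-level numbers may differ. The article does not state a CPU count for the NVL8, so it is left unset.

```python
from dataclasses import dataclass
from typing import Optional

# Per-chip figures quoted in this article.
HBM4_GB_PER_GPU = 288
FP4_PFLOPS_PER_GPU = 50
LPDDR5X_TB_PER_CPU = 1.5  # an "up to" figure, so CPU memory totals are upper bounds


@dataclass(frozen=True)
class RackConfig:
    name: str
    gpus: int
    cpus: Optional[int] = None  # None when the CPU count is not stated


def rack_totals(cfg: RackConfig) -> dict:
    """Naive aggregate capacity and FP4 throughput for a configuration."""
    return {
        "hbm4_tb": cfg.gpus * HBM4_GB_PER_GPU / 1000,
        "fp4_pflops": cfg.gpus * FP4_PFLOPS_PER_GPU,
        "lpddr5x_tb": None if cfg.cpus is None else cfg.cpus * LPDDR5X_TB_PER_CPU,
    }


nvl72 = RackConfig("Vera Rubin NVL72", gpus=72, cpus=36)
nvl8 = RackConfig("DGX Rubin NVL8", gpus=8)

for cfg in (nvl72, nvl8):
    print(cfg.name, rack_totals(cfg))
```

For the NVL72 this works out to about 20.7 TB of HBM4, 3,600 PFLOPS of FP4 inference, and up to 54 TB of LPDDR5X, with two Rubin GPUs per Vera CPU.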

NVIDIA Rubin GPU vs. NVIDIA Vera CPU

So far, we have established some of the functional differences between the two devices, but what do the spec sheets say? 

| Spec | Vera CPU | Rubin GPU |
| --- | --- | --- |
| What it is | A standalone data center CPU | A data center GPU for AI training and inference |
| Architecture | 88 custom NVIDIA Olympus cores | 336 billion transistors, TSMC N3 process |
| Threads | 176 via Spatial Multithreading | Not publicly disclosed (at the time of publication) |
| Memory type | LPDDR5X | HBM4 (first NVIDIA GPU to use HBM4) |
| Memory capacity | Up to 1.5 TB | 288 GB per GPU |
| Memory bandwidth | 1.2 TB/s | 22 TB/s |
| NVLink bandwidth | 1.8 TB/s (NVLink C2C) | 3.6 TB/s (NVLink, per GPU) |
| FP8 support | Yes (first CPU with native FP8) | Yes |
| Confidential computing | Supported | Supported via rack-scale trusted execution |
| PCIe | Gen6 / CXL 3.1 | Gen6 |
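The memory rows are easier to compare as ratios. The sketch below uses only figures quoted in this article and assumes 1 TB = 1,000 GB. The "sweep time" is an idealized number (time to read the entire memory once at peak bandwidth), included only to show the capacity-versus-bandwidth trade-off; real workloads will not hit peak bandwidth.

```python
# Figures quoted in this article; 1 TB is treated as 1,000 GB.
VERA_CORES = 88
VERA_THREADS = 176
VERA_MEM_GB = 1500            # "up to 1.5 TB" LPDDR5X
VERA_MEM_BW_GB_PER_S = 1200   # 1.2 TB/s
RUBIN_MEM_GB = 288            # HBM4 per GPU
RUBIN_MEM_BW_GB_PER_S = 22000 # 22 TB/s


def sweep_time_ms(capacity_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time to read the whole memory once at peak bandwidth."""
    return capacity_gb / bandwidth_gb_per_s * 1000


threads_per_core = VERA_THREADS // VERA_CORES
bandwidth_ratio = RUBIN_MEM_BW_GB_PER_S / VERA_MEM_BW_GB_PER_S
capacity_ratio = VERA_MEM_GB / RUBIN_MEM_GB

print(f"Vera threads per core: {threads_per_core}")
print(f"Rubin has {bandwidth_ratio:.1f}x Vera's memory bandwidth")
print(f"Vera has up to {capacity_ratio:.1f}x Rubin's memory capacity")
print(f"Vera sweep time:  {sweep_time_ms(VERA_MEM_GB, VERA_MEM_BW_GB_PER_S):.0f} ms")
print(f"Rubin sweep time: {sweep_time_ms(RUBIN_MEM_GB, RUBIN_MEM_BW_GB_PER_S):.1f} ms")
```

In short, Vera trades bandwidth for capacity (about 5x the memory), while Rubin trades capacity for bandwidth (about 18x the throughput), which matches their respective roles of orchestration and model execution.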

Other notable differences 

Beyond the raw specs, NVIDIA has made targeted optimizations across the Vera Rubin line:

Energy efficiency and cooling

  • Vera CPU's LPDDR5X memory subsystem delivers 1.2 TB/s of bandwidth at under 50W of memory power, using SOCAMM modules.
  • The Vera Rubin NVL72 is 100% liquid-cooled, using 45°C water with no chillers required.
  • NVIDIA states this allows data centers to allocate up to 10% more power budget directly to compute instead of cooling.

Precision and compute scope

  • Both processors support FP8, but only the Rubin GPU supports FP4 through its Transformer Engine.
  • FP4 inference is the primary way NVIDIA is driving down cost per token on the Rubin platform.
  • Vera's native FP8 support allows it to handle lighter AI workloads like agentic tool execution and data preprocessing without offloading to a GPU.
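One way to see why FP4 matters for cost per token is to look at weight storage alone. The sketch below is a simplified estimate: it assumes a hypothetical 200-billion-parameter model and counts only raw weight bits, ignoring the scaling-factor overhead of formats like NVFP4 as well as KV cache and activations. The 288 GB figure is the per-GPU HBM4 capacity quoted in this article.

```python
HBM4_GB_PER_GPU = 288  # per-GPU capacity quoted in this article


def weight_footprint_gb(num_params: float, bits_per_param: int) -> float:
    """Raw weight storage in GB (1 GB = 1e9 bytes), ignoring format overhead."""
    return num_params * bits_per_param / 8 / 1e9


def fits_on_one_gpu(footprint_gb: float) -> bool:
    """Whether the weights alone fit in a single GPU's HBM4."""
    return footprint_gb <= HBM4_GB_PER_GPU


num_params = 200e9  # hypothetical model size, for illustration only

for label, bits in (("FP16", 16), ("FP8", 8), ("FP4", 4)):
    gb = weight_footprint_gb(num_params, bits)
    print(f"{label}: {gb:.0f} GB of weights, fits on one GPU: {fits_on_one_gpu(gb)}")
```

Halving the bits per weight halves both the memory needed and the bytes moved per token, which is where much of the throughput gain comes from, provided the model tolerates the lower precision.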

Software stack

  • Both are optimized for the NVIDIA AI Enterprise software suite, including NIM inference microservices and the NeMo framework.
  • The Vera CPU is additionally positioned for reinforcement learning sandboxing, where NVIDIA claims a single 256-CPU rack can sustain over 22,500 concurrent RL or agent sandbox environments.
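The sandbox figure is easier to interpret per CPU. A quick calculation, using only the numbers quoted in this article and treating "over 22,500" as a lower bound, suggests the claim works out to roughly one sandbox environment per Olympus core:

```python
# Figures quoted in this article; 22,500 is treated as a lower bound.
SANDBOXES_PER_RACK = 22_500
CPUS_PER_RACK = 256
CORES_PER_CPU = 88

sandboxes_per_cpu = SANDBOXES_PER_RACK / CPUS_PER_RACK
sandboxes_per_core = sandboxes_per_cpu / CORES_PER_CPU

print(f"Sandboxes per CPU:  {sandboxes_per_cpu:.1f}")
print(f"Sandboxes per core: {sandboxes_per_core:.2f}")
```

This is an inference from the published numbers, not a statement from NVIDIA about how sandboxes are scheduled.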

Standalone availability

  • The Vera CPU ships as a standalone product in dual and single-socket server configurations from Dell, HPE, Lenovo, and Supermicro.
  • The Rubin GPU only ships as part of multi-GPU system configurations like the NVL72 or NVL8.

Summary

You’ve probably heard it a hundred times this week, but as AI adoption speeds up, organizations are looking for ways to optimize GPU usage and make inference cheaper. The Vera Rubin line-up looks to make this a reality. 

In this post, we distinguished between the Vera CPU and the Rubin GPU, highlighting their differences and the reasons each one exists.

If you’re looking to learn more about previous generations of NVIDIA GPUs, here are some resources:

Jubril Oyetunji is a DevOps engineer and technical writer with a strong focus on cloud-native technologies and open-source tools. His work centers on creating practical tutorials that help developers better understand platforms such as Kubernetes, NGINX, Rust, and Go.

As a contract technical writer, Jubril authored an extensive library of technical guides covering cloud-native infrastructure and modern development workflows. Many of his tutorials achieved strong search rankings, helping developers around the world learn and adopt emerging technologies.
