NVIDIA Vera Rubin vs. NVIDIA Blackwell (B200) GPU
Written by
Technical Writer at Civo
Since NVIDIA invented the GPU (graphics processing unit) in 1999, demand for GPU compute has skyrocketed. At CES 2026, CEO Jensen Huang announced the company's latest GPU, named after the astronomer Vera Rubin. The announcement comes only two years after NVIDIA unveiled its Blackwell lineup.
In this blog, we'll explore the Vera Rubin platform and compare the Rubin GPU against the NVIDIA B200 from the Blackwell lineup.
What is the NVIDIA Vera Rubin?
Vera Rubin is NVIDIA's next-generation GPU architecture, the successor to the Blackwell family.
💡 The name, Vera Rubin, follows NVIDIA's tradition of naming GPU architectures after pioneering scientists. In this case, they have selected Vera Rubin, the American astronomer whose observations of galaxy rotation curves provided some of the first compelling evidence for dark matter. Her work showed that galaxies contain five to ten times more mass than what's visible, fundamentally reshaping our understanding of the universe.
Throughout this blog, we'll refer to the architecture as Rubin. In practice, NVIDIA uses "Vera Rubin" to describe both the GPU architecture and a full data center platform that includes CPUs, networking, and interconnects alongside the Rubin GPU.
When you see "Vera Rubin NVL72" or "DGX Rubin," those are systems built on Rubin GPUs. The flagship configuration is the Vera Rubin NVL72, a 100% liquid-cooled rack combining 72 Rubin GPUs with 36 Vera CPUs. A smaller configuration, the DGX Rubin NVL8, packs eight Rubin GPUs into a liquid-cooled 2U system.
Why is the NVIDIA Rubin important?
Each release from NVIDIA sets out to improve a certain aspect of computation. At CES 2026, Jensen Huang highlighted that AI inference is no longer a simple one-shot request-response. With the rise of reasoning models and test-time scaling, inference has become a “thinking process”, whereby the model generates long chains of thought, tries different approaches, and iterates before producing a final answer. As Huang put it, "the longer it thinks, oftentimes it produces a better answer."
According to Huang, test-time scaling is causing the number of tokens generated per inference request to grow by roughly 5x every single year. At the same time, the race to the next frontier of AI means the cost of last-generation tokens drops by about 10x per year as newer, more efficient models and hardware replace them.
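To see how the two trends Huang describes might interact, here is a rough back-of-the-envelope sketch. It assumes both rates compound once per year and that the price decline applies to the tokens a request actually consumes. The starting values (`base_tokens`, `base_cost_per_million`) are invented for illustration and are not NVIDIA figures.

```python
def project_request_cost(years, base_tokens=1_000, base_cost_per_million=10.0,
                         token_growth=5.0, cost_decline=10.0):
    """Return one (year, tokens, cost_per_million, cost_per_request) row per year.

    token_growth and cost_decline are the ~5x and ~10x yearly factors quoted
    in the text; every other number is a hypothetical starting point.
    """
    rows = []
    for year in range(years + 1):
        tokens = base_tokens * token_growth ** year
        cost_per_million = base_cost_per_million / cost_decline ** year
        cost_per_request = tokens / 1_000_000 * cost_per_million
        rows.append((year, tokens, cost_per_million, cost_per_request))
    return rows


if __name__ == "__main__":
    for year, tokens, per_million, per_request in project_request_cost(3):
        print(f"year {year}: {tokens:>9,.0f} tokens/request, "
              f"${per_million:.4f} per 1M tokens, ${per_request:.6f} per request")
```

Under these assumptions, the cost of a single request falls by half each year (5x more tokens at one-tenth the price), which only holds as long as per-token prices keep falling faster than token counts grow.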
Making inference cheaper
With demand for inference rising and reasoning models generating ever-longer outputs, Rubin is designed to attack this problem from three angles:
💡 NVIDIA introduced a new processor class, the CPX, a specialized processor designed specifically for the prefill stage of LLM inference. Traditional GPUs handle both the ‘prefill’ and ‘decode’ phases, but research (Splitwise and DistServe) showed that separating these workloads onto specialized hardware, an approach called disaggregated inference, yields up to 1.4x higher throughput at 20% lower cost.
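The prefill/decode split can be illustrated with a toy sketch. This is not how NVIDIA, Splitwise, or DistServe implement it. The class names (`PrefillWorker`, `DecodeWorker`), the `serve` function, and the stand-in "model" are all hypothetical, and the only point is to show the two phases running on separate worker pools with the KV cache handed from one to the other.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    prompt_tokens: list
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)
    output_tokens: list = field(default_factory=list)


class PrefillWorker:
    """Processes the whole prompt in one pass and builds the KV cache."""

    def run(self, request):
        # Stand-in for attention keys/values: one cache entry per prompt token.
        request.kv_cache = [("kv", tok) for tok in request.prompt_tokens]
        return request


class DecodeWorker:
    """Generates one token at a time, extending the KV cache at every step."""

    def run(self, request):
        for _ in range(request.max_new_tokens):
            # Toy "model": the next token is simply the current cache length.
            next_token = len(request.kv_cache)
            request.output_tokens.append(next_token)
            request.kv_cache.append(("kv", next_token))
        return request


def serve(request, prefill_pool, decode_pool):
    """Route each phase to its own pool, passing the KV cache between them."""
    index = len(request.prompt_tokens)
    prefill_worker = prefill_pool[index % len(prefill_pool)]
    decode_worker = decode_pool[index % len(decode_pool)]
    request = prefill_worker.run(request)   # phase 1: prefill pool
    return decode_worker.run(request)       # phase 2: decode pool


if __name__ == "__main__":
    done = serve(Request(prompt_tokens=[1, 2, 3], max_new_tokens=2),
                 prefill_pool=[PrefillWorker()],
                 decode_pool=[DecodeWorker(), DecodeWorker()])
    print(done.output_tokens, len(done.kv_cache))
```

In a real deployment the handoff is the expensive part, since the KV cache has to move between devices. That is one reason interconnect bandwidth matters for this design.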
An introduction to the NVIDIA Rubin
The Rubin GPU is the flagship GPU of NVIDIA's Vera Rubin platform, designed specifically for large-scale AI inference and training workloads in data centers.
At a high level, the Rubin lineup represents a shift away from simply maximizing raw compute and toward optimizing the full inference pipeline, including memory access, interconnect bandwidth, and long-context reasoning efficiency. Key characteristics of the Rubin GPU include:
- Next-generation process node: The Rubin GPU is manufactured on TSMC's N3 process, enabling higher transistor density and improved performance-per-watt compared to Blackwell's 4NP node.
- HBM4 memory subsystem: With 288 GB of HBM4 and up to 22 TB/s of memory bandwidth, the Rubin GPU is designed to keep long-context and reasoning-heavy models fed without stalling on memory access.
- Optimized for low-precision inference: The Rubin GPU is tightly coupled with NVIDIA's 3rd-generation Transformer Engine, providing native hardware support for NVFP4 to maximize throughput and efficiency during inference.
- High-bandwidth scale-up interconnect: Support for sixth-generation NVLink enables up to 3.6 TB/s of bidirectional bandwidth per GPU, allowing Rubin GPUs to operate as part of tightly coupled, rack-scale systems.
- Designed for disaggregated systems: Rather than operating in isolation, the Rubin GPU is intended to work alongside specialized processors like CPX, with different parts of the inference pipeline mapped to the hardware best suited to each stage.
Unlike previous generations, the Rubin GPU is not positioned as a general-purpose accelerator for every workload. Instead, it is purpose-built for the realities of modern AI systems: long-running inference, large KV caches, and reasoning models that trade time and tokens for higher-quality outputs.
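To put the memory figures above in context, the sketch below estimates a memory-bandwidth ceiling for token generation. It uses the 288 GB and 22 TB/s values quoted above and assumes a dense model whose weights are all read once per generated token at batch size 1. It ignores KV-cache reads, the scale-factor overhead of block formats such as NVFP4, compute limits, and batching, so treat the result as a loose upper bound. The 70-billion-parameter model is a hypothetical example.

```python
HBM_CAPACITY_GB = 288.0     # per-GPU HBM4 capacity quoted in the text
HBM_BANDWIDTH_TB_S = 22.0   # per-GPU memory bandwidth quoted in the text


def weight_bytes(params_billion, bits_per_param=4):
    """Approximate size of the model weights in bytes."""
    return params_billion * 1e9 * bits_per_param / 8


def fits_in_hbm(params_billion, bits_per_param=4, capacity_gb=HBM_CAPACITY_GB):
    """True if the weights alone fit in a single GPU's memory."""
    return weight_bytes(params_billion, bits_per_param) <= capacity_gb * 1e9


def decode_ceiling_tokens_per_s(params_billion, bits_per_param=4,
                                bandwidth_tb_s=HBM_BANDWIDTH_TB_S):
    """Upper bound on tokens/s if every weight is read once per token."""
    return bandwidth_tb_s * 1e12 / weight_bytes(params_billion, bits_per_param)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        ceiling = decode_ceiling_tokens_per_s(70, bits)   # hypothetical 70B model
        print(f"{bits:>2}-bit weights: fits={fits_in_hbm(70, bits)}, "
              f"ceiling ~{ceiling:,.0f} tokens/s")
```

Halving the bits per parameter doubles the ceiling, which is the intuition behind pairing high memory bandwidth with low-precision formats for inference.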
NVIDIA Rubin vs. NVIDIA Blackwell (B200) GPU
It is important to note that MLPerf results for the Rubin GPU are not yet available. MLPerf is an industry-standard benchmark suite maintained by MLCommons that provides standardized, reproducible performance measurements for ML training and inference across hardware platforms.
It is widely regarded as the closest thing to an apples-to-apples comparison in this space. Until Rubin submissions appear, the numbers above are NVIDIA's own published specs rather than independently verified benchmarks.
NVIDIA made additional performance claims at GTC 2026. The company states that the Rubin GPU can train MoE models with one-fourth the number of GPUs compared to Blackwell, and that it delivers one-tenth the cost per million tokens for agentic AI inference. Additionally, when deployed alongside NVIDIA Groq 3 LPX racks, the Vera Rubin NVL72 is said to deliver up to 35x the inference performance per watt of Blackwell for trillion-parameter models.
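A cost-per-million-tokens claim depends on both price and throughput, so it helps to see how the metric is usually computed. The sketch below is a generic calculation, not NVIDIA's methodology. The function names and every price and throughput value in the example are hypothetical placeholders.

```python
def cost_per_million_tokens(hourly_price_usd, tokens_per_second):
    """Cost in USD to generate one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


def required_throughput_ratio(old_hourly_price, new_hourly_price,
                              cost_reduction=10.0):
    """Throughput multiple the new hardware needs to hit a target cost reduction."""
    return cost_reduction * new_hourly_price / old_hourly_price


if __name__ == "__main__":
    # Hypothetical numbers only: a $5/hr GPU at 2,000 tokens/s versus
    # a $10/hr GPU that must reach one-tenth the cost per million tokens.
    old_cost = cost_per_million_tokens(5.0, 2_000)
    ratio = required_throughput_ratio(5.0, 10.0)
    new_cost = cost_per_million_tokens(10.0, 2_000 * ratio)
    print(f"old: ${old_cost:.4f}/1M tokens, new: ${new_cost:.4f}/1M tokens, "
          f"throughput needed: {ratio:.0f}x")
```

The takeaway is that a 10x cost reduction requires more than a 10x throughput gain whenever the newer hardware is priced higher per hour.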
Summary
NVIDIA's latest GPU lineup arrives at a time when demand for compute and inference is growing exponentially. Cheaper, faster inference lowers the barrier for organizations building on AI, whether that means training the next frontier model or serving millions of inference requests at a fraction of the current cost.
With Rubin in production and MLPerf results still to come, the real test will be how these specs translate to real-world workloads. If you’re looking to learn more about previous generations of NVIDIA GPUs, here are some resources:
- A100 vs. L40s vs. H100 vs. H200 GH Superchips
- NVIDIA’s B200 vs. H100
- Inside Civo’s launch of NVIDIA Blackwell B200 cloud compute
Reserve your NVIDIA Vera Rubin capacity today
If the Vera Rubin vs Blackwell comparison has you thinking about your next infrastructure decision, Civo has confirmed early access availability for NVIDIA Vera Rubin, with delivery from Q1 2027 and pricing from $11.00/hr.
Allocations are limited and offered on a first-come, first-served basis. Contact Civo to discuss your options.
Technical Writer at Civo
Jubril Oyetunji is a DevOps engineer and technical writer with a strong focus on cloud-native technologies and open-source tools. His work centers on creating practical tutorials that help developers better understand platforms such as Kubernetes, NGINX, Rust, and Go.
As a contract technical writer, Jubril authored an extensive library of technical guides covering cloud-native infrastructure and modern development workflows. Many of his tutorials achieved strong search rankings, helping developers around the world learn and adopt emerging technologies.