NVIDIA Vera Rubin vs. NVIDIA Blackwell (B200) GPU
Since NVIDIA invented the GPU (graphics processing unit) in 1999, demand for its chips has skyrocketed. At CES 2026, CEO Jensen Huang announced the company's latest GPU architecture, named after Vera Rubin. This follows the announcement of the Blackwell lineup only two years ago.
In this blog, we'll explore what the industry knows about Vera Rubin so far and compare its published specs against the NVIDIA B200 from the Blackwell lineup.
What is the NVIDIA Vera Rubin?
Vera Rubin is NVIDIA's next-generation GPU architecture, the successor to the Blackwell family.
💡 The name follows NVIDIA's tradition of naming GPU architectures after pioneering scientists, in this case Vera Rubin, the American astronomer whose observations of galaxy rotation curves provided some of the first compelling evidence for dark matter. Her work showed that galaxies contain five to ten times more mass than what's visible, fundamentally reshaping our understanding of the universe.
Throughout this blog, we'll refer to the architecture as Rubin. In practice, NVIDIA uses “Vera Rubin” to describe both the GPU architecture and a full data-center platform that includes CPUs, networking, and interconnects alongside the R100 GPU.
Within this architecture, the R100 has been announced as the first GPU product. When you see "Vera Rubin NVL72" or "DGX Rubin," those are systems that use R100 GPUs based on the Rubin architecture.
Why is the NVIDIA Rubin important?
Each release from NVIDIA sets out to improve a certain aspect of computation. At CES 2026, Jensen Huang highlighted that AI inference is no longer a simple one-shot request-response. With the rise of reasoning models and test-time scaling, inference has become a “thinking process”, whereby the model generates long chains of thought, tries different approaches, and iterates before producing a final answer. As Huang put it, "the longer it thinks, oftentimes it produces a better answer."
According to Huang, test-time scaling is causing the number of tokens generated per inference request to grow by roughly 5x every single year. At the same time, the race to the next frontier of AI means the cost of last-generation tokens drops by about 10x per year as newer, more efficient models and hardware replace them.
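To make those two rates concrete, here is a quick back-of-the-envelope projection. The 5x and 10x factors are the ones Huang quoted; the starting token count and price are hypothetical placeholders, chosen only to show the trend:

```python
# Back-of-the-envelope projection of the two rates quoted above:
# tokens per request grow ~5x per year, while the cost of
# last-generation tokens falls ~10x per year. Starting values are
# arbitrary placeholders, purely for illustration.

tokens_per_request = 1_000       # hypothetical starting point
cost_per_million_tokens = 10.0   # hypothetical starting price (USD)

for year in range(4):
    cost_per_request = tokens_per_request * cost_per_million_tokens / 1e6
    print(f"year {year}: {tokens_per_request:>12,} tokens/request, "
          f"${cost_per_million_tokens:.4f}/M tokens, "
          f"${cost_per_request:.4f}/request")
    tokens_per_request *= 5        # test-time scaling: more thinking tokens
    cost_per_million_tokens /= 10  # cheaper tokens each year
```

Note the net effect: with tokens growing 5x while unit cost falls 10x, the cost per request roughly halves each year, but only if hardware and models actually deliver that 10x drop. Sustaining it is exactly the treadmill Rubin is built for.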
Making inference cheaper
Combining the demand for inference with long reasoning models, Rubin is designed to attack this problem from three angles: disaggregating the inference pipeline onto specialized hardware, pushing precision down to NVFP4, and widening the memory and interconnect paths that feed each GPU. The first of these is the most novel:
💡 NVIDIA introduced a new processor class, the CPX, designed specifically for the prefill stage of LLM inference. Traditional GPUs handle both the prefill and decode phases, but research (Splitwise and DistServe) showed that separating these phases onto specialized hardware, a technique known as disaggregated inference, yields up to 1.4x higher throughput at 20% lower cost.
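As a rough illustration of what disaggregation means in practice, here is a minimal sketch of routing the two phases to separate worker pools. This is not NVIDIA's scheduler or API; every class and name below is hypothetical:

```python
# Minimal sketch of disaggregated inference: route the compute-bound
# prefill phase and the memory-bound decode phase to separate worker
# pools. Every class and name here is hypothetical, for illustration only.
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    max_new_tokens: int


class PrefillWorker:
    """Stands in for a compute-dense, CPX-style prefill device."""

    def run(self, req: Request) -> dict:
        # Process the entire prompt in one pass and return the KV cache
        # that the decode pool will need.
        return {"kv_cache": f"kv({req.prompt!r})"}


class DecodeWorker:
    """Stands in for a bandwidth-rich, R100-style decode device."""

    def run(self, req: Request, state: dict) -> str:
        # Generate tokens one at a time, re-reading the handed-off
        # KV cache (state["kv_cache"]) on every step.
        return " ".join(f"tok{i}" for i in range(req.max_new_tokens))


def serve(req: Request, prefill: PrefillWorker, decode: DecodeWorker) -> str:
    state = prefill.run(req)       # phase 1: compute-bound prefill
    return decode.run(req, state)  # phase 2: memory-bound decode


print(serve(Request("What is dark matter?", 8), PrefillWorker(), DecodeWorker()))
```

In the system NVIDIA describes, the prefill pool would map to CPX-class processors and the decode pool to R100s, with the KV cache handed off between them.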
An introduction to the NVIDIA Rubin (R100)
The R100 is the first GPU built on NVIDIA’s Rubin architecture and is designed specifically for large-scale AI inference and training workloads in data centers.
At a high level, the R100 represents a shift away from simply maximizing raw compute and toward optimizing the full inference pipeline, including memory access, interconnect bandwidth, and long-context reasoning efficiency. Key characteristics of the R100 include:
- Next-generation process node: R100 is manufactured on TSMC’s N3 process, enabling higher transistor density and improved performance-per-watt compared to Blackwell’s 4NP node.
- HBM4 memory subsystem: With 288 GB of HBM4 and up to 22 TB/s of memory bandwidth, the R100 is designed to keep long-context and reasoning-heavy models fed without stalling on memory access (see the throughput sketch after this list).
- Optimized for low-precision inference: R100 is tightly coupled with NVIDIA’s 3rd-generation Transformer Engine, providing native hardware support for NVFP4 to maximize throughput and efficiency during inference.
- High-bandwidth scale-out interconnect: Support for next-generation NVLink enables up to 3.6 TB/s of bidirectional bandwidth per GPU, allowing R100s to operate as part of tightly coupled, rack-scale systems.
- Designed for disaggregated systems: Rather than operating in isolation, the R100 is intended to work alongside specialized processors like CPX, with different parts of the inference pipeline mapped to the hardware best suited to each stage.
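To see why the memory and precision points above matter so much, consider a rough roofline-style estimate: during decode, each generated token must stream the model's weights from HBM, so bandwidth caps throughput. The 22 TB/s figure is NVIDIA's published spec; the 70B-parameter model size is a hypothetical choice, and the estimate ignores KV-cache traffic, batching, and kernel efficiency:

```python
# Rough upper bound on decode throughput for a memory-bound model.
# Each new token must stream the full set of weights from HBM, so:
#   tokens/s <= memory_bandwidth / weight_bytes
# This ignores KV-cache reads, batching, and kernel efficiency.

HBM4_BANDWIDTH = 22e12  # bytes/s; R100's quoted "up to 22 TB/s"
PARAMS = 70e9           # hypothetical 70B-parameter model

for fmt, bytes_per_param in [("FP16", 2.0), ("FP8", 1.0), ("NVFP4", 0.5)]:
    weight_bytes = PARAMS * bytes_per_param
    ceiling = HBM4_BANDWIDTH / weight_bytes
    print(f"{fmt:>5}: {weight_bytes / 1e9:>5.0f} GB of weights -> "
          f"<= {ceiling:,.0f} tokens/s (batch size 1)")
```

Each halving of bytes per parameter doubles the ceiling, which is why native NVFP4 support is as important to Rubin's inference economics as the raw bandwidth number.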
Unlike previous generations, the R100 is not positioned as a general-purpose accelerator for every workload. Instead, it is purpose-built for the realities of modern AI systems: long-running inference, large KV caches, and reasoning models that trade time and tokens for higher-quality outputs.
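Since large KV caches are central to that positioning, it helps to see the scale involved. A quick sizing sketch, using hypothetical, roughly Llama-70B-like dimensions (the layer count, head count, and head size are assumptions, not R100-specific figures):

```python
# KV-cache size for a single sequence:
#   2 (K and V) * layers * kv_heads * head_dim * bytes_per_value * seq_len
# Dimensions below are hypothetical, roughly Llama-70B-like,
# chosen only to show the order of magnitude involved.

layers, kv_heads, head_dim = 80, 8, 128
bytes_per_value = 2       # FP16/BF16 cache entries
seq_len = 128_000         # one long reasoning trace

kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * seq_len
print(f"KV cache per sequence: {kv_bytes / 1e9:.1f} GB")  # ~41.9 GB
```

At roughly 42 GB per 128k-token sequence, a handful of concurrent long-context requests can fill even a 288 GB HBM4 budget, which is why KV-cache capacity and hand-off dominate so many of these design decisions.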
NVIDIA Rubin (R100) vs. NVIDIA Blackwell (B200) GPU
| Specification | NVIDIA Rubin R100 | NVIDIA Blackwell B200 |
|---|---|---|
| Process node | TSMC N3 | TSMC 4NP |
| Memory | 288 GB HBM4 | 192 GB HBM3e |
| Memory bandwidth | Up to 22 TB/s | 8 TB/s |
| NVLink bandwidth (bidirectional, per GPU) | Up to 3.6 TB/s | 1.8 TB/s |
| Low-precision inference | NVFP4, 3rd-gen Transformer Engine | FP4, 2nd-gen Transformer Engine |

It is important to note that MLPerf results for the R100 are not yet available. MLPerf is an industry-standard benchmark suite maintained by MLCommons that provides standardized, reproducible performance measurements for ML training and inference across hardware platforms. It is widely regarded as the closest thing to an apples-to-apples comparison in this space. Until R100 submissions appear, the numbers above are NVIDIA's own published specs rather than independently verified benchmarks.
Summary
NVIDIA's latest GPU lineup arrives with a promising premise in an age where demand for compute and inference is growing exponentially. Being able to serve that demand cheaper and faster lowers the barrier for organizations building on AI, whether that means training the next frontier model or serving millions of inference requests at a fraction of today's cost.
With Rubin in production and MLPerf results still to come, the real test will be how these specs translate to real-world workloads. If you’re looking to learn more about previous generations of NVIDIA GPUs, here are some resources:

- NVIDIA Blackwell B200 GPUs are now available on Civo, by Josh Mesout (4 August 2025)
- Inside Civo's launch of NVIDIA Blackwell B200 cloud compute, by Kendall Miller (11 September 2025)
- Comparing NVIDIA's B200 and H100: A deep dive into next-gen AI performance, by Mostafa Ibrahim (2 June 2025)