NVIDIA Vera Rubin: Inside the $8.8 Million AI Supercomputer Powering the Next AI Era

How NVIDIA's next-generation Vera Rubin platform with 60 exaflops of AI compute is reshaping the AI infrastructure landscape — and what Malaysian tech businesses need to know.

At the GPU Technology Conference (GTC) 2026, NVIDIA CEO Jensen Huang unveiled what many are calling the company's most ambitious platform yet: Vera Rubin. Named after the American astronomer whose galaxy rotation measurements provided key evidence for dark matter, Vera Rubin represents a generational leap in AI infrastructure. It combines seven custom chips, a new CPU architecture, and record-breaking memory bandwidth into an AI factory that delivers up to 60 exaflops of compute power.

For Malaysian technology professionals, system integrators, and business leaders tracking AI infrastructure trends, Vera Rubin isn't just another hardware launch. It signals a fundamental shift in how AI workloads will be built, deployed, and scaled over the next decade. The platform's cost (up to US$8.8 million per rack) and the 485% increase in per-rack memory costs compared with Blackwell have already sent ripples through the global data center industry.

60 exaflops: AI compute per Vera Rubin POD (40 racks)

US$8.8M: cost per Vera Rubin NVL72 rack system

485%: increase in memory costs per rack vs. Blackwell

What Is NVIDIA Vera Rubin?

Vera Rubin is NVIDIA's next-generation AI computing platform, the direct successor to the Blackwell architecture that currently powers the world's largest AI training clusters. Unlike previous generational leaps that focused primarily on GPU improvements, Vera Rubin is a complete platform overhaul. It encompasses a new CPU (Vera), a new GPU architecture (Rubin), and five additional companion chips that handle networking, storage and security offload, and AI inference acceleration.

NVIDIA announced at GTC 2026 in March that the platform is in full production, confirming that all seven chips are being manufactured simultaneously. Initial shipments are expected in the second half of 2026, with major cloud providers and hyperscalers already positioning for deployment.

The Seven Chips Powering Vera Rubin

What sets Vera Rubin apart is its multi-chip design. Where previous generations leaned on the GPU for most of the work, Vera Rubin spreads it across seven distinct chips, each optimized for a specific function:

Vera CPU: NVIDIA's second-generation data center CPU (following Grace), built on a custom ARM-based architecture. Meta has already announced plans to deploy standalone Vera CPUs in production.
Rubin GPU: The flagship GPU architecture, delivering substantially higher FP8 and FP4 throughput compared to Blackwell.
NVLink Switch: Sixth-generation NVLink interconnect for GPU-to-GPU communication across the rack.
CX9 SmartNIC: Latest network interface card for network acceleration and virtualization.
BlueField 4: Next-generation data processing unit (DPU) for storage and security offload, enabling zero-trust architectures.
Groq 3 LPU: A Language Processing Unit that came to NVIDIA through its Groq acquisition, providing SRAM-packed acceleration that boosts "every layer of the AI model on every token."
Spectrum 5 Switch: Fifth-generation Ethernet switch providing 800 Gbps per port for large-scale AI fabric deployments.

Vera Rubin NVL72: The $8.8 Million Rack

The NVL72 is the base building block of the Vera Rubin platform: a single rack system integrating 72 Rubin GPUs with Vera CPUs, NVLink switches, and Groq 3 LPUs in a tightly coupled, liquid-cooled chassis. At US$7.8 million to US$8.8 million per rack, the NVL72 costs more than double the previous-generation Blackwell NVL72.

According to industry analysis, memory now accounts for nearly 25% of the total rack cost, up from approximately 5% in the Hopper generation. This is driven by the massive HBM4 memory stacks required. Supermicro has already demonstrated its Vera Rubin NVL72 rack with an all-new coolant delivery system designed to handle the thermal demands of the seven-chip platform.
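To make the rack-level figures above easier to compare, here is a minimal sketch that converts them into a cost per GPU and an implied memory spend per rack. It uses only the numbers quoted in this article. Treating "nearly 25%" as exactly 0.25 is an assumption, and dividing the rack price evenly across the 72 GPUs ignores the CPUs, switches, and cooling that are also part of the price.

```python
# Figures quoted in the article (analyst estimates, not NVIDIA list prices).
RACK_COST_LOW_USD = 7_800_000
RACK_COST_HIGH_USD = 8_800_000
GPUS_PER_RACK = 72
MEMORY_SHARE = 0.25  # assumption: "nearly 25%" treated as an upper bound


def per_gpu_cost(rack_cost_usd: float, gpus: int = GPUS_PER_RACK) -> float:
    """Rack cost split evenly across GPUs (ignores CPUs, switches, cooling)."""
    return rack_cost_usd / gpus


def memory_cost(rack_cost_usd: float, share: float = MEMORY_SHARE) -> float:
    """Memory spend implied by memory's share of the total rack cost."""
    return rack_cost_usd * share


for label, cost in (("low", RACK_COST_LOW_USD), ("high", RACK_COST_HIGH_USD)):
    print(f"{label} estimate: ${per_gpu_cost(cost):,.0f} per GPU, "
          f"${memory_cost(cost):,.0f} of memory per rack")
```

On these assumptions the rack works out to roughly US$108,000 to US$122,000 per GPU, with about US$2 million of each rack's price going to memory.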

60 Exaflops: The Vera Rubin POD

For hyperscale deployments, NVIDIA offers the Vera Rubin POD, a 40-rack AI factory supercomputer delivering 60 exaflops of AI compute. To put that in perspective, 60 exaflops exceeds the combined benchmark performance of the world's top 500 supercomputers, although the comparison is not like for like: AI exaflops are quoted at low precision, while the TOP500 ranking measures 64-bit LINPACK performance. A single POD consumes approximately 15-20 megawatts of power, with cooling systems requiring millions of litres of coolant circulation per hour.
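The POD figures can be broken down per rack and per GPU with simple division. The sketch below does that using only the numbers quoted here. It assumes compute and power are spread evenly across all 40 racks and that every rack carries 72 GPUs, which the announcement as summarised in this article does not state, so the results are rough averages rather than specifications.

```python
POD_EXAFLOPS = 60        # AI compute per POD, as quoted
RACKS_PER_POD = 40
GPUS_PER_RACK = 72       # assumption: every rack in the POD is an NVL72
POD_POWER_MW = (15, 20)  # approximate power range quoted for one POD


def exaflops_per_rack() -> float:
    """Average AI compute per rack, assuming an even split."""
    return POD_EXAFLOPS / RACKS_PER_POD


def petaflops_per_gpu() -> float:
    """Average AI compute per GPU (1 exaflop = 1,000 petaflops)."""
    return exaflops_per_rack() * 1000 / GPUS_PER_RACK


def kilowatts_per_rack(pod_megawatts: float) -> float:
    """Average power per rack (1 MW = 1,000 kW)."""
    return pod_megawatts * 1000 / RACKS_PER_POD


print(f"{exaflops_per_rack():.1f} exaflops per rack")
print(f"{petaflops_per_gpu():.1f} petaflops per GPU")
low, high = (kilowatts_per_rack(mw) for mw in POD_POWER_MW)
print(f"{low:.0f}-{high:.0f} kW per rack")
```

Under those assumptions each rack averages 1.5 exaflops and 375 to 500 kW, a power density that helps explain why the platform is liquid-cooled.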

Memory Costs Soar 485%: What Changed?

The most surprising revelation from GTC 2026 was the 485% increase in memory costs per rack compared to Blackwell. Three factors drive this:

HBM4 transition — Vera Rubin is the first NVIDIA platform to use HBM4 memory, with more complex stacking and interposer requirements.
Increased capacity per GPU — Each Rubin GPU carries more memory than Blackwell, enabling larger models to fit without sharding.
Memory-heavy workload shift — AI models have grown 100x in size over the past three years, requiring massive memory pools.
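A percentage increase is easy to misread, so it helps to convert it into a multiplier. The sketch below reads the 485% figure literally as an increase, meaning the new cost is 5.85 times the old one, not 4.85 times. The US$2.2 million memory figure is an assumption derived from this article's own numbers (about 25% of a US$8.8 million rack), and the resulting Blackwell estimate is an implied value, not a reported one.

```python
def increase_to_multiplier(percent_increase: float) -> float:
    """A p% increase means new = old * (1 + p / 100)."""
    return 1 + percent_increase / 100


def implied_previous_cost(new_cost: float, percent_increase: float) -> float:
    """Back out the earlier cost from the new cost and the stated increase."""
    return new_cost / increase_to_multiplier(percent_increase)


MEMORY_INCREASE_PCT = 485
# Assumption: memory is about 25% of the top-end US$8.8M rack estimate.
VERA_RUBIN_MEMORY_USD = 0.25 * 8_800_000

multiplier = increase_to_multiplier(MEMORY_INCREASE_PCT)
previous = implied_previous_cost(VERA_RUBIN_MEMORY_USD, MEMORY_INCREASE_PCT)
print(f"Multiplier: {multiplier:.2f}x")
print(f"Implied Blackwell memory cost per rack: ${previous:,.0f}")
```

Read this way, the quoted figures imply a Blackwell rack carried somewhere under US$400,000 of memory.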

What This Means for Malaysian Tech Businesses

While the Vera Rubin platform is aimed squarely at hyperscalers, its implications filter down to the Malaysian technology ecosystem:

AI-as-a-Service pricing will rise — Malaysian SMEs using cloud AI services should prepare for potential price increases of 20-40% for high-end GPU compute over the next 12-18 months.
Edge AI becomes more attractive — As data center AI compute becomes more expensive, the economics of on-device AI improve dramatically. Malaysian tech companies should invest in model optimization.
Cooling and infrastructure opportunities — The shift to liquid cooling creates opportunities for Malaysian engineering firms and data center operators in regional hyperscaler buildouts.
Partner ecosystem growth — Malaysian system integrators with expertise in liquid-cooled infrastructure and AI cluster deployment may find new opportunities.
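One way to reason about the model optimization point above is to estimate how much memory a model's weights need at different numeric precisions, since lower precision is a common route to fitting models on smaller or cheaper hardware. The sketch below is illustrative only: the 70-billion-parameter model is hypothetical, and the estimate covers weights alone, leaving out activations, caches, and runtime overhead.

```python
# Bits used to store each weight at common precisions.
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Weights-only memory in gigabytes (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[precision]
    return num_params * bits / 8 / 1e9


HYPOTHETICAL_PARAMS = 70e9  # assumption: an illustrative 70B-parameter model

for precision in BITS_PER_WEIGHT:
    size = weight_memory_gb(HYPOTHETICAL_PARAMS, precision)
    print(f"{precision}: {size:.0f} GB of weights")
```

Each halving of precision halves the weight footprint, which is why quantized models are often the first option to evaluate for edge or on-premise deployment. Accuracy can degrade at lower precision, so any such change needs to be validated on the actual workload.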

The Rosa and Feynman Roadmap

NVIDIA also previewed its longer-term roadmap at GTC 2026, announcing the upcoming Rosa CPU and Feynman GPU architectures — expected to follow Vera Rubin around 2028. This confirms NVIDIA's commitment to a two-year cadence for major platform refreshes.

"The Vera Rubin platform represents NVIDIA's most ambitious bet: that AI demand will continue growing exponentially, and that the industry needs a fundamentally rearchitected platform to meet it."

Frequently Asked Questions

When will NVIDIA Vera Rubin be available?

NVIDIA has confirmed that Vera Rubin is in full production across all seven chips, with initial shipments expected in the second half of 2026. Major cloud providers are expected to begin offering Vera Rubin instances in early 2027.

How much does a Vera Rubin NVL72 system cost?

Industry analysts estimate the Vera Rubin NVL72 rack system costs between US$7.8 million and US$8.8 million, depending on configuration — more than double the previous-generation Blackwell NVL72 rack.

What is the difference between Vera and Rubin?

Vera is NVIDIA's second-generation data center CPU (successor to Grace, ARM-based). Rubin is the new GPU architecture (successor to Blackwell). "Vera Rubin" is the complete platform combining both chips with five additional companion chips.

Will NVIDIA Vera Rubin affect cloud AI pricing for Malaysian SMEs?

Yes. As hyperscalers deploy Vera Rubin, they face 485% higher memory costs. This is expected to lead to 20-40% price increases for high-end GPU cloud instances over the next 12-18 months. Malaysian SMEs should explore model optimization and edge AI alternatives.
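For budgeting, the 20-40% range can be applied directly to a current cloud bill. The sketch below is a simple projection, not a forecast: the RM 10,000 monthly spend is a hypothetical example, and it assumes the increase applies to the whole bill, which overstates the impact if only part of the spend is high-end GPU compute.

```python
def projected_monthly_spend(current_spend: float,
                            low_pct: int = 20,
                            high_pct: int = 40) -> tuple[float, float]:
    """Return the (low, high) projected spend for a given price increase range."""
    low = current_spend * (100 + low_pct) / 100
    high = current_spend * (100 + high_pct) / 100
    return low, high


HYPOTHETICAL_SPEND_RM = 10_000  # assumption: illustrative monthly GPU cloud bill

low, high = projected_monthly_spend(HYPOTHETICAL_SPEND_RM)
print(f"Projected monthly spend: RM {low:,.0f} to RM {high:,.0f}")
print(f"Extra cost over 12 months: RM {(low - HYPOTHETICAL_SPEND_RM) * 12:,.0f} "
      f"to RM {(high - HYPOTHETICAL_SPEND_RM) * 12:,.0f}")
```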

What comes after Vera Rubin?

NVIDIA has previewed the Rosa CPU and Feynman GPU architectures, expected around 2028, confirming a two-year cadence for major platform refreshes.
