NVIDIA B200 — Blackwell firepower for frontier-scale AI.

The successor to the H100 throne. 180 GB of HBM3e, 8.0 TB/s of memory bandwidth, and a second-generation Transformer Engine let a single B200 hold a 200B+ parameter model once its weights are quantized to 4 bits. Spot rates start at $6.48/hr, a fraction of on-demand pricing for one of the most powerful GPUs on the market.

At a glance

B200 specifications.

Key hardware specs that determine what workloads this GPU handles.

VRAM: 180 GB (HBM3e memory)
Memory bandwidth: 8.0 TB/s (peak throughput)
TDP: 1000 W (thermal design power)
Architecture: Blackwell (NVIDIA GPU architecture)

Spot pricing

B200: live hourly rates.

Every provider offering this GPU on the spot market, sorted cheapest first. Prices are in USD per GPU-hour for spot instances.

Recommended models

AI models that run well on B200.

Tested model-GPU pairings with notes on why each is a good fit.

DeepSeek V3.2: a 671B MoE architecture that runs on a single multi-GPU B200 node with tensor parallelism. The 8 TB/s of bandwidth per GPU keeps expert weight reads fast enough for production latency.

Qwen 3 235B: fits on a single B200 with 4-bit weights (about 118 GB), with room left for large KV caches. No multi-GPU sharding needed: one card, one model. FP16 weights (about 470 GB) still need several GPUs.

Llama 4 Scout: 109B total parameters with 17B active per token (MoE). The B200's VRAM holds the full model in FP8 (about 109 GB) with headroom for batched serving; FP16 weights (about 218 GB) exceed a single card.
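Whether a model's weights fit on one card comes down to parameter count times bytes per parameter. The sketch below is a rough check only: it uses the 180 GB figure from this page, counts weights alone (KV cache, activations, and runtime overhead are extra), and the helper names `weight_gb` and `fits` are illustrative, not part of any library.

```python
# Rough weight-memory check: parameters x bytes per parameter.
# Uses decimal gigabytes; KV cache, activations, and runtime overhead are extra.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}
B200_VRAM_GB = 180.0  # per-GPU capacity quoted on this page


def weight_gb(params_billion: float, precision: str) -> float:
    """Weight footprint in GB for a model with the given parameter count."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, n_gpus: int = 1) -> bool:
    """True if the weights alone fit in the combined VRAM of n_gpus B200s."""
    return weight_gb(params_billion, precision) < n_gpus * B200_VRAM_GB


if __name__ == "__main__":
    # Total parameter counts (in billions) for the models listed above.
    models = {"DeepSeek V3.2": 671, "Qwen 3 235B": 235, "Llama 4 Scout": 109}
    for name, size in models.items():
        for precision in BYTES_PER_PARAM:
            gb = weight_gb(size, precision)
            verdict = "fits" if fits(size, precision) else "does not fit"
            print(f"{name} @ {precision}: {gb:.1f} GB, {verdict} on one B200")
```

Passing `n_gpus=8` to `fits` gives the same check for a full multi-GPU node. In practice, leave a healthy margin below the limit for the KV cache and serving overhead.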

Use cases

What the B200 is built for.

Running 200B+ models on a single GPU

Deploying Qwen 3 235B or a similar frontier model in FP16 takes about 470 GB for the weights alone, which means a multi-GPU cluster with NVLink. With native FP4 support and 180 GB of HBM3e, the B200 can hold the same model on one card at 4-bit precision (about 118 GB of weights), cutting infrastructure complexity and inter-GPU communication overhead, while spot pricing at $6.48/hr keeps costs manageable.

Fine-tuning 70B models

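Full fine-tuning of Llama 3.3 70B with Adam in mixed precision needs roughly 16 bytes per parameter for weights, gradients, and optimizer states, or about 1.1 TB before activations, so it still spans several GPUs even on B200s. What a single 180 GB card can handle is parameter-efficient fine-tuning: the frozen BF16 weights take about 140 GB, leaving roughly 40 GB for LoRA adapters, their optimizer states, and activations at modest batch sizes, with no distributed-training coordination to manage.

Long-context inference with 128K+ token windows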

Long-context models generate enormous KV caches: the cache grows linearly with sequence length and batch size, and for a large model it can reach tens of gigabytes per sequence at 128K tokens. The B200's 180 GB of VRAM and 8 TB/s of bandwidth sustain high-throughput serving at 128K+ context lengths, where GPUs with less memory run out of room for the cache.
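To get a feel for the memory pressure, the KV cache can be estimated as 2 (keys and values) x layers x KV heads x head dimension x tokens x bytes per element. The sketch below assumes an illustrative 70B-class configuration with grouped-query attention (80 layers, 8 KV heads, head dimension 128); these values are assumptions for the example, not measurements from any provider.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2,
                   batch_size: int = 1) -> int:
    """KV cache size in bytes: keys and values for every layer and token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    # Hypothetical 70B-class config with grouped-query attention (assumed values).
    layers, kv_heads, head_dim = 80, 8, 128
    for tokens in (8_192, 131_072, 1_048_576):
        gb = kv_cache_bytes(layers, kv_heads, head_dim, tokens) / 1e9
        print(f"{tokens:>9} tokens: {gb:.1f} GB of FP16 KV cache per sequence")
```

Under these assumptions, one 128K-token sequence needs about 43 GB of cache in FP16, so weights plus a handful of concurrent long sequences can fill a card quickly. A lower-precision cache or fewer KV heads shrinks the figure proportionally.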

FAQ

Common questions.

How does the B200 compare to the H100 for LLM inference?

The B200 delivers roughly 2.5x the inference throughput of an H100 on transformer workloads, thanks to the second-generation Transformer Engine and FP4 support; the exact gain depends on the model, precision, and batch size. Memory capacity jumps from 80 GB to 180 GB (2.25x), and bandwidth from 3.35 TB/s to 8.0 TB/s (about 2.4x). For models that fit on a single H100, the B200 serves more concurrent requests; for models that required multi-H100 setups, the B200 may run them on one card.

Is the B200 worth the price premium over the H200?

At $6.48/hr vs $4.71/hr for the H200, the B200 costs ~38% more per hour. But it delivers roughly 2x the compute throughput and 28% more VRAM (180 vs 141 GB). If your workload is compute-bound (high batch sizes, production throughput targets) and the 2x figure holds for it, the B200's cost per token is about 30% lower. If your model already fits in the H200's 141 GB and you cannot keep the extra compute busy, the H200 is the better value.
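The trade-off reduces to a ratio: relative cost per token is the price ratio divided by the throughput ratio. The sketch below uses the two hourly rates quoted above and treats the 2x speedup as an input you should replace with your own benchmark; the function names are illustrative.

```python
def relative_cost_per_token(price_a: float, price_b: float,
                            speedup_a_over_b: float) -> float:
    """Cost per token of GPU A relative to GPU B (below 1.0 means A is cheaper)."""
    return (price_a / price_b) / speedup_a_over_b


def break_even_speedup(price_a: float, price_b: float) -> float:
    """Throughput ratio at which both GPUs cost the same per token."""
    return price_a / price_b


if __name__ == "__main__":
    b200_rate, h200_rate = 6.48, 4.71  # USD per GPU-hour, from this page
    for speedup in (1.0, 1.5, 2.0):  # assumed B200-over-H200 throughput ratios
        rel = relative_cost_per_token(b200_rate, h200_rate, speedup)
        print(f"speedup {speedup:.1f}x: B200 cost per token = {rel:.2f}x H200")
    print(f"break-even speedup: {break_even_speedup(b200_rate, h200_rate):.2f}x")
```

At these rates, the B200 wins on cost per token whenever it is more than about 1.38x faster on your workload; below that, the H200 is cheaper per token.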

What cooling and power infrastructure does the B200 require?

The B200 draws up to 1000W TDP and requires liquid cooling in most data center configurations. This is handled by cloud providers — when you rent a B200 spot instance, the provider manages power and thermal delivery. You don't need to worry about infrastructure; you pay the hourly rate and get a ready-to-use GPU.

Can I use the B200 for training, not just inference?

Yes. The B200's tensor cores support BF16 and FP8 for training as well as FP4 for low-precision inference. A single B200 can fine-tune 70B models with parameter-efficient methods such as LoRA, a job that previously needed several 80 GB cards; full fine-tuning with Adam still requires multiple GPUs. For pre-training at scale, B200 clusters with fifth-generation NVLink are built to scale across many GPUs. Spot pricing makes short training runs economical: rent 8 B200s for a few hours instead of committing to reserved instances.
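For sizing a training run, a common rule of thumb for full fine-tuning with Adam in mixed precision is about 16 bytes per parameter (BF16 weights and gradients, FP32 master weights, and two FP32 optimizer moments). The sketch below applies that rule and the $6.48/hr spot rate from this page; activations are not included, and the helper names are hypothetical.

```python
import math

# Mixed-precision Adam rule of thumb (assumption): 2 B weights + 2 B gradients
# + 4 B FP32 master weights + 4 B + 4 B optimizer moments = 16 B per parameter.
BYTES_PER_PARAM_FULL_FT = 16.0


def full_finetune_gb(params_billion: float,
                     bytes_per_param: float = BYTES_PER_PARAM_FULL_FT) -> float:
    """Memory in GB for weights, gradients, and optimizer state (no activations)."""
    return params_billion * bytes_per_param


def min_gpus(total_gb: float, vram_gb: float = 180.0) -> int:
    """Smallest GPU count whose combined VRAM covers total_gb."""
    return math.ceil(total_gb / vram_gb)


def run_cost_usd(n_gpus: int, hours: float, rate_per_gpu_hour: float = 6.48) -> float:
    """Spot cost of a run at a flat hourly rate per GPU."""
    return n_gpus * hours * rate_per_gpu_hour


if __name__ == "__main__":
    need = full_finetune_gb(70)
    print(f"70B full fine-tune: ~{need:.0f} GB, at least {min_gpus(need)} B200s")
    print(f"8 B200s for 3 hours: ${run_cost_usd(8, 3):.2f}")
```

By this estimate a 70B full fine-tune needs on the order of 1.1 TB, which fits on an 8-GPU node with room for activations, and a three-hour run on 8 B200s at the quoted spot rate costs roughly $156.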

Explore

More GPUs.

A10G: 24GB GDDR6 · Ampere
A40: 48GB GDDR6 · Ampere
B300: 288GB HBM3e · Blackwell
L4: 24GB GDDR6 · Ada Lovelace
L40: 48GB GDDR6 · Ada Lovelace
MI300X: 192GB HBM3 · CDNA 3
All GPUs: Full marketplace with live pricing for every GPU
Smart Inference: Managed API routing · cheapest provider per request

Rent a B200. Right now.

Spot pricing, per-second billing, no commitment.

Browse the live marketplace, pick your GPU, deploy in one click. Credits from $10.
