NVIDIA B300 — the most memory you can fit on one card.

288 GB of HBM3e on a single accelerator. The B300 Blackwell Ultra pushes memory bandwidth to 12 TB/s and packs enough VRAM to hold a dense model of roughly 140B parameters in FP16, or a 400B model at 4-bit precision, without sharding. At $8.19/hr on spot, it's the only practical way to deploy the largest open-weight models on a single GPU.


At a glance

B300 specifications.

Key hardware specs that determine what workloads this GPU handles.

VRAM: 288 GB (HBM3e memory)
Memory bandwidth: 12.0 TB/s (peak throughput)
TDP: 1200 W (thermal design power)
Architecture: Blackwell (NVIDIA GPU architecture)

Spot pricing

B300: live hourly rates.

Every provider offering this GPU on the spot market, sorted cheapest first. Prices are in USD per GPU-hour for spot instances.

Recommended models

AI models that run well on B300.

Tested model-GPU pairings with notes on why each is a good fit.

Llama 4 Maverick: 400B MoE. The FP16 weights take about 800 GB, so a single card needs 4-bit quantization (about 200 GB of weights), which leaves room for batched KV caches. At that precision no tensor parallelism is needed, and single-card deployment eliminates inter-GPU latency.

DeepSeek V3.2: 671B MoE. The FP8 weights take about 671 GB, more than one card's 288 GB, so this model runs on a multi-GPU B300 node rather than a single card. The 12 TB/s per-card bandwidth keeps expert dispatch under 2ms even at high batch sizes.

Qwen 3 235B: FP8 deployment on one card (about 235 GB of weights) with 50+ GB headroom for KV cache, context batching, and LoRA adapters running concurrently. Full FP16 weights (about 470 GB) do not fit on a single card.
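A quick way to sanity-check a pairing is to multiply the total parameter count by the bytes each parameter takes. The sketch below does that for the three models above, assuming the usual widths of 2, 1, and 0.5 bytes for FP16, FP8, and 4-bit formats. It counts weights only, so KV cache and activations need extra headroom, and the function names are illustrative rather than part of any marketplace API.

```python
# Weights-only footprint: total parameters times bytes per parameter.
# Parameter counts are the totals named on this page; the byte widths
# are assumptions about the storage format, not measured values.
B300_VRAM_GB = 288

BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

MODELS_BILLION_PARAMS = {
    "Llama 4 Maverick": 400,
    "DeepSeek V3.2": 671,
    "Qwen 3 235B": 235,
}


def weights_gb(params_billion: float, precision: str) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes) at the given precision."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str,
         vram_gb: float = B300_VRAM_GB) -> bool:
    """True if the weights alone fit in VRAM; runtime memory is extra."""
    return weights_gb(params_billion, precision) <= vram_gb


if __name__ == "__main__":
    for name, params in MODELS_BILLION_PARAMS.items():
        for precision in BYTES_PER_PARAM:
            gb = weights_gb(params, precision)
            verdict = "fits" if fits(params, precision) else "does not fit"
            print(f"{name:18s} {precision}: {gb:6.1f} GB -> {verdict}")
```

Under these assumptions, a MoE model is sized by its total parameter count, because every expert has to be resident in memory even though only a few are active per token.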

Use cases

What the B300 is built for.

Single-GPU deployment of 400B+ parameter models

The B300 eliminates the biggest operational headache of frontier model deployment: multi-GPU coordination. Models like Llama 4 Maverick (400B MoE) that previously required 4-8 GPU clusters now run on one card at $8.19/hr when quantized to 4-bit (about 200 GB of weights). Fewer GPUs means simpler infrastructure, no tensor parallelism overhead, and predictable latency.

Research experimentation with full-precision frontier models

Academic and research teams running ablation studies on large models need fast iteration cycles. The B300 lets you load a dense model of up to roughly 140B parameters in full FP16, or a 200B+ model at FP8, modify architecture components, and re-run experiments without juggling multi-node orchestration, cutting experiment turnaround from days to hours.

Massive-context serving for retrieval-augmented generation

RAG pipelines processing 500K+ token contexts can generate KV caches that exceed 100 GB on their own. The B300's 288 GB VRAM holds both the model weights and enormous KV caches simultaneously, enabling true long-context retrieval without the latency penalty of offloading to CPU memory.
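To see where a figure like 100 GB comes from, here is a rough KV cache estimate. The model shape (80 layers, 8 KV heads, head dimension 128, FP16 cache) is an assumed 70B-class grouped-query-attention configuration chosen for illustration, and it is not tied to any specific model on this page.

```python
def kv_cache_gb(tokens: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_value: int = 2) -> float:
    """KV cache size in GB (1 GB = 1e9 bytes).

    Each token stores one key and one value vector (the factor of 2)
    per layer, each of size kv_heads * head_dim.
    """
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token_bytes / 1e9


# Assumed, illustrative shape: 80 layers, 8 KV heads, head_dim 128, FP16.
cache_gb = kv_cache_gb(tokens=500_000, layers=80, kv_heads=8, head_dim=128)

# Whatever the cache uses is no longer available for weights.
left_for_weights_gb = 288 - cache_gb

if __name__ == "__main__":
    print(f"KV cache: {cache_gb:.2f} GB")
    print(f"Left for weights on a 288 GB card: {left_for_weights_gb:.2f} GB")
```

With this shape, a single 500K-token context takes about 164 GB of cache, so the weights have to fit in the remaining 124 GB. A quantized cache or fewer KV heads would shrink that number proportionally.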

FAQ

Common questions.

B300 vs B200 — when does the extra memory matter?

The B300 adds 108 GB over the B200 (288 vs 180 GB) and costs $8.19/hr vs $6.48/hr — a 26% premium for 60% more VRAM. Choose the B300 when your model exceeds 180 GB in target precision, when you need massive KV cache headroom for long-context serving, or when you want to avoid multi-GPU setups entirely for 200-400B models. For models under 180 GB, the B200 is the better value.
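The comparison above can be reproduced with a few lines of arithmetic. This sketch uses only the prices and capacities quoted in the answer; the `pick_gpu` helper is a hypothetical rule of thumb based on weight size alone, not a marketplace feature.

```python
B300 = {"vram_gb": 288, "usd_per_hr": 8.19}
B200 = {"vram_gb": 180, "usd_per_hr": 6.48}


def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a / b - 1) * 100


price_premium_pct = pct_more(B300["usd_per_hr"], B200["usd_per_hr"])
vram_gain_pct = pct_more(B300["vram_gb"], B200["vram_gb"])

# Cost of one GB of VRAM for one hour on each card.
b300_usd_per_gb_hr = B300["usd_per_hr"] / B300["vram_gb"]
b200_usd_per_gb_hr = B200["usd_per_hr"] / B200["vram_gb"]


def pick_gpu(model_gb: float) -> str:
    """Hypothetical rule of thumb: smallest single card that holds the model."""
    if model_gb <= B200["vram_gb"]:
        return "B200"
    if model_gb <= B300["vram_gb"]:
        return "B300"
    return "multi-GPU"


if __name__ == "__main__":
    print(f"Price premium: {price_premium_pct:.0f}%")
    print(f"Extra VRAM:    {vram_gain_pct:.0f}%")
    print(f"B300: ${b300_usd_per_gb_hr:.4f} per GB-hour")
    print(f"B200: ${b200_usd_per_gb_hr:.4f} per GB-hour")
```

Per GB-hour the B300 works out cheaper, but that only matters if the workload actually uses the extra memory.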

Is the B300 available in multi-GPU configurations?

Yes. B300 NVLink clusters can scale to 8-GPU nodes with 2.3 TB of aggregate VRAM and inter-GPU bandwidth exceeding 1.8 TB/s. These configurations are designed for pre-training runs on trillion-parameter models. However, the B300's primary advantage is that it pushes the single-GPU frontier so far that most models don't need multi-GPU at all.

What models are too large even for a B300?

Dense models above roughly 140B parameters in FP16 need more than 288 GB for weights alone (a 400B dense model requires 800+ GB), so they still need multi-GPU setups or lower precision. Similarly, training runs with Adam optimizer states (which roughly triple the memory footprint) exceed 288 GB at around 50B parameters. For these workloads, a B300 cluster is the answer, not a single card.
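The limits can be estimated by turning the memory budget around. This is a simplified sketch: it treats the Adam footprint as three times the weights, as stated above, and ignores gradients, activations, and KV cache, so real limits are lower. All function names are illustrative.

```python
import math

VRAM_GB = 288


def max_params_billion(bytes_per_param: float,
                       footprint_multiplier: float = 1.0) -> float:
    """Largest dense model (billions of parameters) whose footprint fits."""
    return VRAM_GB / (bytes_per_param * footprint_multiplier)


def training_gb(params_billion: float, bytes_per_param: float = 2.0,
                adam_multiplier: float = 3.0) -> float:
    """Weights plus Adam's two moment buffers, using the 3x rule of thumb."""
    return params_billion * bytes_per_param * adam_multiplier


def gpus_needed(footprint_gb: float) -> int:
    """Minimum number of 288 GB cards, ignoring sharding overhead."""
    return math.ceil(footprint_gb / VRAM_GB)


if __name__ == "__main__":
    print(f"Max FP16 inference model: {max_params_billion(2.0):.0f}B")
    print(f"Max FP16 model with Adam: {max_params_billion(2.0, 3.0):.0f}B")
    print(f"200B with Adam: {training_gb(200):.0f} GB "
          f"-> at least {gpus_needed(training_gb(200))} cards")
```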

How does spot pricing work for such an expensive GPU?

B300 spot instances follow the same model as cheaper GPUs — you pay the hourly rate with per-second billing, accept the risk of preemption, and save 60-80% versus on-demand pricing. The interruption probability is typically low (under 5%) because B300 supply is still ramping and demand is concentrated in reserved contracts. Spot is a genuine bargain at this tier.
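As a rough illustration of per-second billing, the sketch below prices a job from its runtime in seconds and backs out the on-demand rate implied by the 60-80% savings range. It assumes the $8.19/hr spot rate quoted on this page and does not account for time lost to preemption.

```python
SPOT_USD_PER_HR = 8.19


def spot_cost(seconds: float, usd_per_hr: float = SPOT_USD_PER_HR) -> float:
    """Cost of a run billed per second at an hourly rate."""
    return usd_per_hr / 3600 * seconds


def implied_on_demand(spot_usd_per_hr: float, savings_fraction: float) -> float:
    """On-demand hourly rate implied by a given spot discount."""
    return spot_usd_per_hr / (1 - savings_fraction)


if __name__ == "__main__":
    print(f"90-minute job: ${spot_cost(90 * 60):.2f}")
    low = implied_on_demand(SPOT_USD_PER_HR, 0.60)
    high = implied_on_demand(SPOT_USD_PER_HR, 0.80)
    print(f"Implied on-demand rate: ${low:.2f} to ${high:.2f} per hour")
```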

Explore

More GPUs.

A10G: 24GB GDDR6 · Ampere
A40: 48GB GDDR6 · Ampere
B200: 180GB HBM3e · Blackwell
L4: 24GB GDDR6 · Ada Lovelace
L40: 48GB GDDR6X · Ada Lovelace
MI300X: 192GB HBM3 · CDNA 3

All GPUs: full marketplace with live pricing for every GPU.
Smart Inference: managed API routing · cheapest provider per request.

Rent a B300. Right now.

Spot pricing, per-second billing, no commitment.

Browse the live marketplace, pick your GPU, deploy in one click. Credits from $10.
