AIVory  ·  GPU Marketplace

Live spot pricing

NVIDIA B300 — the most memory you can fit on one card.

288 GB of HBM3e on a single accelerator. The B300 Blackwell Ultra pushes memory bandwidth to 12 TB/s and packs enough VRAM to hold a dense model of roughly 140B parameters in FP16, or a 400B-parameter model quantized to 4-bit, without sharding. At $8.19/hr on spot, it's one of the most practical ways to deploy the largest open-weight models on a single GPU.

At a glance

B300 specifications.

Key hardware specs that determine what workloads this GPU handles.

288GB
VRAM

HBM3e memory

12.0 TB/s
Memory Bandwidth

peak throughput

1200W
TDP

thermal design power

Blackwell
Architecture

NVIDIA GPU architecture

Spot pricing

B300: live hourly rates.

Every provider offering this GPU on the spot market, sorted cheapest first.


Prices in USD per GPU-hour · spot instances · sorted cheapest first

Use cases

What the B300 is built for.

  1. Single-GPU deployment of 400B-class models

    The B300 eliminates the biggest operational headache of frontier model deployment: multi-GPU coordination. Models like Llama 4 Maverick (400B MoE), which need roughly 800 GB in FP16 and are typically sharded across 4-8 GPUs, fit on one card at $8.19/hr once quantized to 4-bit (about 200 GB of weights). One GPU means simpler infrastructure, no tensor parallelism overhead, and more predictable latency.

  2. Research experimentation with full-precision frontier models

    Academic and research teams running ablation studies on 100B+ models need fast iteration cycles. The B300 holds FP16 weights for models up to roughly 140B parameters (2 bytes per parameter, before activations), so you can load the model on one card, modify architecture components, and re-run experiments without juggling multi-node orchestration. That can shorten experiment turnaround considerably.

  3. Massive-context serving for retrieval-augmented generation

    RAG pipelines processing 500K+ token contexts can generate KV caches that exceed 100 GB on their own. The B300's 288 GB of VRAM can hold both the model weights and a KV cache of that size at once, as long as the two together stay under 288 GB, enabling long-context retrieval without the latency penalty of offloading to CPU memory.
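
Whether a workload fits on one card comes down to simple arithmetic: weight memory is parameter count times bytes per parameter, and the KV cache grows linearly with context length. The sketch below is a rough estimate under stated assumptions. The function names are illustrative, the 70B-class model shape (80 layers, 8 KV heads, head dimension 128) is a hypothetical example, and activations and runtime overhead are ignored.

```python
# Back-of-envelope VRAM check: weights plus KV cache versus one 288 GB card.
# Model shapes below are illustrative assumptions, not vendor figures.

VRAM_GB = 288  # B300 capacity quoted on this page


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes): parameters x bytes each."""
    return params_billion * bytes_per_param


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache in GB: one key and one value vector per layer per token."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value / 1e9


def fits(total_gb: float, vram_gb: float = VRAM_GB) -> bool:
    """True if the estimate fits; ignores activations and runtime overhead."""
    return total_gb <= vram_gb


# A 400B-parameter model at three precisions.
for label, nbytes in [("FP16", 2.0), ("FP8", 1.0), ("4-bit", 0.5)]:
    w = weights_gb(400, nbytes)
    print(f"400B @ {label}: {w:.0f} GB, fits: {fits(w)}")

# Hypothetical 70B-class model with grouped-query attention
# (80 layers, 8 KV heads, head dimension 128) at a 500K-token context.
kv = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, tokens=500_000)
for label, nbytes in [("FP16", 2.0), ("FP8", 1.0)]:
    total = weights_gb(70, nbytes) + kv
    print(f"70B @ {label} + KV cache: {total:.1f} GB, fits: {fits(total)}")
```

Under these assumptions, a 400B model fits on one card only at 4-bit, and the 70B-class example fits alongside its 500K-token cache only when the weights are stored in FP8.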

FAQ

Common questions.

B300 vs B200 — when does the extra memory matter?

The B300 adds 108 GB over the B200 (288 vs 180 GB) and costs $8.19/hr vs $6.48/hr: a 26% premium for 60% more VRAM. Choose the B300 when your model exceeds 180 GB in target precision, when you need massive KV cache headroom for long-context serving, or when you want to avoid multi-GPU setups entirely for 200-400B models served at 8-bit or 4-bit precision. For workloads that fit in 180 GB, the B200 is the better value.
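
The comparison above can be checked, and applied to your own memory requirement, with a few lines of arithmetic. This is a minimal sketch using only the prices and capacities quoted on this page. The helper names are illustrative, and live spot prices will differ.

```python
# Compare the two spot prices quoted above per GB of VRAM.
# Prices and capacities are the figures from this page; live rates vary.

GPUS = {
    "B200": {"vram_gb": 180, "usd_per_hr": 6.48},
    "B300": {"vram_gb": 288, "usd_per_hr": 8.19},
}


def pct_more(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1) * 100


def usd_per_gb_hr(name: str) -> float:
    """Hourly price divided by VRAM capacity."""
    spec = GPUS[name]
    return spec["usd_per_hr"] / spec["vram_gb"]


def cheapest_fit(required_gb: float):
    """Cheapest single GPU whose VRAM covers required_gb, or None."""
    candidates = [(spec["usd_per_hr"], name)
                  for name, spec in GPUS.items()
                  if spec["vram_gb"] >= required_gb]
    return min(candidates)[1] if candidates else None


print(f"Price premium: {pct_more(8.19, 6.48):.0f}%")
print(f"Extra VRAM:    {pct_more(288, 180):.0f}%")
for name in GPUS:
    print(f"{name}: ${usd_per_gb_hr(name):.4f} per GB-hour")
print("150 GB workload ->", cheapest_fit(150))
print("200 GB workload ->", cheapest_fit(200))
```

Per GB of VRAM the B300 is the cheaper card, but that only matters if the workload actually needs more than 180 GB.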

Is the B300 available in multi-GPU configurations?

Yes. B300s are available in 8-GPU NVLink nodes with 2.3 TB of aggregate VRAM and 1.8 TB/s of NVLink bandwidth per GPU. These nodes are the building blocks for pre-training runs on trillion-parameter models. However, the B300's primary advantage is that it pushes the single-GPU frontier so far that many models don't need multi-GPU at all.

What models are too large even for a B300?

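Dense models above roughly 140B parameters in FP16 need more than 288 GB for weights alone (a 400B dense model needs about 800 GB), so they still require multi-GPU setups unless quantized. Training is far more demanding: with Adam in mixed precision, weights, gradients, and optimizer states together take roughly 16 bytes per parameter, so full training runs exceed 288 GB at around 18B parameters, before activations are even counted. For these workloads, a B300 cluster is the answer, not a single card.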
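
As a rough check on training memory, one common accounting for mixed-precision Adam is 16 bytes per parameter. The sketch below assumes that breakdown (it is an assumption, not a measurement of any particular framework), ignores activations, and uses illustrative helper names.

```python
import math

# Assumed per-parameter cost for mixed-precision training with Adam.
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam momentum": 4,
    "fp32 Adam variance": 4,
}
TRAIN_BYTES = sum(BYTES_PER_PARAM.values())  # 16 bytes per parameter

VRAM_GB = 288  # B300 capacity quoted on this page


def training_gb(params_billion: float) -> float:
    """Weights + gradients + optimizer state in GB, excluding activations."""
    return params_billion * TRAIN_BYTES


def max_trainable_params_billion(vram_gb: float = VRAM_GB) -> float:
    """Largest model (in billions of parameters) whose state fits in VRAM."""
    return vram_gb / TRAIN_BYTES


def gpus_needed(params_billion: float, vram_gb: float = VRAM_GB) -> int:
    """Minimum GPU count to hold the training state, assuming even sharding."""
    return math.ceil(training_gb(params_billion) / vram_gb)


print(f"Single-card limit: {max_trainable_params_billion():.0f}B parameters")
print(f"200B model state:  {training_gb(200):.0f} GB")
print(f"GPUs for 200B:     {gpus_needed(200)} (lower bound)")
```

The GPU count is a lower bound: real runs also need memory for activations and communication buffers, so practical cluster sizes are larger.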

How does spot pricing work for such an expensive GPU?

B300 spot instances follow the same model as cheaper GPUs: you pay the hourly rate with per-second billing, accept the risk of preemption, and save 60-80% versus on-demand pricing. The interruption probability is typically low (under 5%) because most B300 demand is concentrated in reserved contracts rather than on the spot market. Spot is a genuine bargain at this tier.
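
Per-second billing means a job is charged for exactly the time it runs at the hourly rate. The sketch below works through that arithmetic with the $8.19/hr figure from this page. The function names are illustrative, and the on-demand range is derived only from the 60-80% savings figure above, not from a quoted price.

```python
SPOT_USD_PER_HR = 8.19  # spot rate quoted on this page


def job_cost(seconds: int, usd_per_hr: float = SPOT_USD_PER_HR) -> float:
    """Cost of a job billed per second at an hourly rate."""
    return usd_per_hr * seconds / 3600


def implied_on_demand(spot_usd_per_hr: float, savings_fraction: float) -> float:
    """On-demand rate implied by a given spot discount (derived, not quoted)."""
    return spot_usd_per_hr / (1 - savings_fraction)


# A 95-minute run is billed for 5,700 seconds, not rounded up to 2 hours.
print(f"95-minute job: ${job_cost(95 * 60):.2f}")
print(f"Rounded to 2 hours it would be: ${job_cost(2 * 3600):.2f}")

# On-demand range implied by 60-80% savings.
low = implied_on_demand(SPOT_USD_PER_HR, 0.60)
high = implied_on_demand(SPOT_USD_PER_HR, 0.80)
print(f"Implied on-demand: ${low:.2f} to ${high:.2f} per hour")
```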

Rent a B300. Right now.

Spot pricing, per-second billing, no commitment.

Browse the live marketplace, pick your GPU, deploy in one click. Credits from $10.