HBM3e memory
AIVory · GPU Marketplace
Live spot pricing
NVIDIA B300: the most memory you can fit on one card.
288 GB of HBM3e on a single accelerator. The B300 Blackwell Ultra delivers 8 TB/s of memory bandwidth and packs enough VRAM to hold a 400B-parameter model at 4-bit precision (about 200 GB of weights), or a dense model of roughly 140B parameters in FP16, without sharding. At $8.19/hr on spot, it is one of the few practical ways to deploy the largest open-weight models on a single GPU.
At a glance
B300 specifications.
Key hardware specs that determine what workloads this GPU handles.
peak throughput
thermal design power
NVIDIA GPU architecture
Spot pricing
B300: live hourly rates.
Every provider offering this GPU on the spot market, sorted cheapest first.
Prices in USD per GPU-hour · spot instances · sorted cheapest first
Recommended models
AI models that run well on B300.
Tested model-GPU pairings with notes on why each is a good fit.
Use cases
What the B300 is built for.
- Single-GPU deployment of 400B+ parameter models
The B300 eliminates the biggest operational headache of frontier model deployment: multi-GPU coordination. Models like Llama 4 Maverick (400B total parameters, MoE), which previously required 4-8 GPU clusters, can run on one card at $8.19/hr when quantized to 4-bit weights (about 200 GB); at FP16 or FP8 the weights alone exceed 288 GB. Fewer GPUs means simpler infrastructure, no tensor parallelism overhead, and more predictable latency.
- Research experimentation with full-precision frontier models
Academic and research teams running ablation studies on large models need fast iteration cycles. The B300 lets you load models of up to roughly 140B parameters in FP16 (or 200B+ models in FP8), modify architecture components, and re-run experiments without juggling multi-node orchestration, which can cut experiment turnaround from days to hours.
- Massive-context serving for retrieval-augmented generation
RAG pipelines processing 500K+ token contexts generate KV caches that exceed 100 GB alone. The B300's 288 GB VRAM holds both the model weights and enormous KV caches simultaneously, enabling true long-context retrieval without the latency penalty of offloading to CPU memory.
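The KV-cache figure in the last use case above can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes a hypothetical transformer shape (80 layers, 8 KV heads with grouped-query attention, head dimension 128, FP16 cache values); these numbers are illustrative and not taken from any specific model on this page.

```python
def kv_cache_gb(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size in GB (1 GB = 1e9 bytes) for one sequence."""
    # Each layer stores two tensors (K and V) of shape tokens x kv_heads x head_dim.
    total_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value * tokens
    return total_bytes / 1e9

# Hypothetical model shape, chosen only for illustration.
cache_gb = kv_cache_gb(tokens=500_000, layers=80, kv_heads=8, head_dim=128)
print(f"KV cache for 500K tokens: {cache_gb:.1f} GB")
```

Under these assumptions a single 500K-token sequence needs about 164 GB of cache, which is consistent with the "exceeds 100 GB" claim. Whatever remains of the 288 GB after the cache is the budget for model weights.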
FAQ
Common questions.
B300 vs B200 — when does the extra memory matter?
The B300 adds 108 GB over the B200 (288 vs 180 GB) and costs $8.19/hr vs $6.48/hr: a 26% premium for 60% more VRAM. Choose the B300 when your model exceeds 180 GB in its target precision, when you need KV cache headroom for long-context serving, or when you want to avoid multi-GPU setups entirely for quantized 200-400B models. For models under 180 GB, the B200 is the better value.
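The premium and VRAM figures in this answer follow from simple arithmetic. The sketch below reproduces them from the prices and capacities quoted above; the dictionary layout and the per-GB-hour metric are illustrative additions, not marketplace fields.

```python
def pct_increase(new, old):
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100

# Figures quoted in the answer above.
b200 = {"vram_gb": 180, "usd_per_hr": 6.48}
b300 = {"vram_gb": 288, "usd_per_hr": 8.19}

price_premium = pct_increase(b300["usd_per_hr"], b200["usd_per_hr"])
vram_gain = pct_increase(b300["vram_gb"], b200["vram_gb"])

# Cost per GB of VRAM per hour, a rough value metric.
b200_usd_per_gb_hr = b200["usd_per_hr"] / b200["vram_gb"]
b300_usd_per_gb_hr = b300["usd_per_hr"] / b300["vram_gb"]

print(f"price premium: {price_premium:.0f}%, extra VRAM: {vram_gain:.0f}%")
print(f"B200: ${b200_usd_per_gb_hr:.4f}/GB-hr, B300: ${b300_usd_per_gb_hr:.4f}/GB-hr")
```

Per GB of VRAM the B300 is the cheaper card, but that only matters if the workload actually uses the extra memory.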
Is the B300 available in multi-GPU configurations?
Yes. B300 NVLink clusters can scale to 8-GPU nodes with 2.3 TB of aggregate VRAM and 1.8 TB/s of NVLink bandwidth per GPU. These configurations are designed for pre-training runs on trillion-parameter models. However, the B300's primary advantage is that it pushes the single-GPU frontier far enough that many models don't need multi-GPU at all.
What models are too large even for a B300?
Dense models above roughly 140B parameters in FP16 exceed 288 GB for weights alone (a 400B model needs about 800 GB), so they require either quantization or a multi-GPU setup. Similarly, training runs on 200B+ models with Adam optimizer states (which roughly triple the memory footprint) far exceed 288 GB. For these workloads, a B300 cluster is the answer, not a single card.
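A rough way to check whether a model fits is to multiply its parameter count by the bytes per parameter at the target precision. The sketch below does that for the 400B example and applies the 3x Adam rule of thumb from the answer above. It counts weights only, ignores activations and KV cache, and takes 1 GB as 1e9 bytes, so treat the results as lower bounds rather than exact requirements.

```python
B300_VRAM_GB = 288  # single-card capacity quoted on this page

# Bytes per parameter at common weight precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}

def weights_gb(params_billions, precision):
    """Weight memory in GB; excludes activations, KV cache, and runtime overhead."""
    return params_billions * BYTES_PER_PARAM[precision]

def max_params_billions(precision, vram_gb=B300_VRAM_GB):
    """Largest parameter count (in billions) whose weights alone fit in vram_gb."""
    return vram_gb / BYTES_PER_PARAM[precision]

for precision in BYTES_PER_PARAM:
    need = weights_gb(400, precision)
    verdict = "fits" if need <= B300_VRAM_GB else "does not fit"
    print(f"400B @ {precision}: {need:.0f} GB, {verdict} on one B300")

# Training estimate using the page's rule of thumb (Adam roughly triples the footprint).
training_gb = 3 * weights_gb(200, "fp16")
print(f"200B training, rough estimate: {training_gb:.0f} GB")
```

By this estimate a 400B model fits on one card only at 4-bit precision, and the FP16 ceiling for weights alone is about 144B parameters. Real training footprints depend on the optimizer's state precision and on activation memory, so the 3x multiplier is only a rough guide.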
How does spot pricing work for such an expensive GPU?
B300 spot instances follow the same model as cheaper GPUs — you pay the hourly rate with per-second billing, accept the risk of preemption, and save 60-80% versus on-demand pricing. The interruption probability is typically low (under 5%) because B300 supply is still ramping and demand is concentrated in reserved contracts. Spot is a genuine bargain at this tier.
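Per-second billing means a job is charged for exactly the seconds it runs. The sketch below estimates a job's cost at the $8.19/hr spot rate quoted on this page and back-calculates the on-demand rate implied by a 60-80% discount. It assumes the discount is measured against the on-demand price; the function names are illustrative and not part of any marketplace API.

```python
SPOT_USD_PER_HOUR = 8.19  # spot rate quoted on this page

def spot_cost_usd(seconds, usd_per_hour=SPOT_USD_PER_HOUR):
    """Cost of a run billed per second at the given hourly rate."""
    return usd_per_hour * seconds / 3600

def implied_on_demand_rate(spot_rate, discount):
    """On-demand hourly rate implied by a fractional discount (e.g. 0.6 for 60%)."""
    return spot_rate / (1 - discount)

# A job that runs for 1 hour, 30 minutes, and 30 seconds.
job_seconds = 90 * 60 + 30
job_cost = spot_cost_usd(job_seconds)
print(f"job cost: ${job_cost:.2f}")

low = implied_on_demand_rate(SPOT_USD_PER_HOUR, 0.60)
high = implied_on_demand_rate(SPOT_USD_PER_HOUR, 0.80)
print(f"implied on-demand range: ${low:.2f} to ${high:.2f} per hour")
```

Because spot instances can be preempted, a realistic budget should also account for any work repeated after an interruption, for example by checkpointing regularly.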
Rent a B300. Right now.
Spot pricing, per-second billing, no commitment.
Browse the live marketplace, pick your GPU, deploy in one click. Credits from $10.