NVIDIA RTX 4000 Ada — 20 GB in a single-slot whisper.

One of the smallest professional Ada cards, the RTX 4000 Ada packs 20 GB of GDDR6 and Ada tensor cores into a single-slot, 130-watt form factor. At $0.18/hr on spot, it is the cheapest Ada Lovelace GPU you can rent, with enough VRAM for 14B models quantized to INT8 and FP8 support that the older Ampere professional cards lack.


At a glance

RTX 4000 Ada specifications.

Key hardware specs that determine what workloads this GPU handles.

VRAM: 20 GB (GDDR6 memory)

Memory Bandwidth: 360 GB/s (peak throughput)

TDP: 130W (thermal design power)

Architecture: Ada Lovelace (NVIDIA GPU architecture)

Spot pricing

RTX 4000 Ada: live hourly rates.

Every provider offering this GPU on the spot market, sorted cheapest first.

Prices are in USD per GPU-hour for spot instances.

Recommended models

AI models that run well on RTX 4000 Ada.

Tested model-GPU pairings with notes on why each is a good fit.

Phi-4 14B

14B parameters in FP16 need ~28 GB, which exceeds the card's 20 GB. In INT8 the weights take ~14 GB, leaving ~6 GB of headroom. The Ada tensor cores handle INT8 inference natively with FP16-comparable quality.

Qwen 2.5 14B

14B parameters in INT8 fit with room for the KV cache. The RTX 4000 Ada's FP8 support can push throughput even higher for latency-insensitive batch workloads.

Mistral Nemo 12B

12B parameters in FP16 need ~24 GB, more than the 20 GB card holds. INT8 quantization drops the weights to ~12 GB, leaving ~8 GB free for serving overhead.
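The sizing figures for these models follow from a simple rule of thumb: weight memory is roughly parameter count times bytes per parameter. The sketch below reproduces that arithmetic. The function names are illustrative, and the estimate deliberately ignores KV cache, activations, and runtime overhead, so treat the headroom as an optimistic bound rather than a measured figure.

```python
# Rule-of-thumb weight footprint: parameters x bytes per parameter.
# Assumption: KV cache, activations, and framework overhead are NOT counted.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0}


def weight_gb(params_billion: float, precision: str) -> float:
    """Approximate weight size in GB (billions of params x bytes per param)."""
    return params_billion * BYTES_PER_PARAM[precision]


def headroom_gb(params_billion: float, precision: str, vram_gb: float = 20.0) -> float:
    """VRAM left after loading weights; a negative value means it does not fit."""
    return vram_gb - weight_gb(params_billion, precision)


if __name__ == "__main__":
    models = [("Phi-4 14B", 14), ("Qwen 2.5 14B", 14), ("Mistral Nemo 12B", 12)]
    for name, size in models:
        for precision in ("fp16", "int8"):
            print(
                f"{name:18s} {precision}: {weight_gb(size, precision):5.1f} GB weights, "
                f"headroom {headroom_gb(size, precision):+5.1f} GB"
            )
```

A negative headroom for the FP16 rows is the reason all three pairings rely on INT8 on a 20 GB card.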

Use cases

What the RTX 4000 Ada is built for.

Cheapest Ada Lovelace inference for 7B-14B models

At $0.18/hr, the RTX 4000 Ada matches the T4's price while delivering 2-3x the inference throughput thanks to Ada tensor cores and FP8 support. The extra 4 GB over the T4 (20 vs 16 GB) opens up 14B models that won't fit on the T4. For budget-conscious teams upgrading from Turing to Ada, this is the entry point.

Edge deployment and space-constrained servers

The single-slot, 130W form factor fits in compact 1U and 2U servers where full-size GPUs won't. Deploy inference at the edge (retail locations, branch offices, or embedded systems) without the power and cooling infrastructure that 300W+ data center cards demand. The RTX 4000 Ada brings Ada-class inference to places other GPUs can't physically reach.

Workstation-integrated AI for design and engineering

CAD engineers and designers running AI-assisted workflows (generative design, neural rendering, automated analysis) on their workstation need a card that handles both GPU compute and display. The RTX 4000 Ada's 20 GB VRAM, professional drivers, and display output make it a dual-purpose card that serves inference and drives monitors simultaneously.

FAQ

Common questions.

RTX 4000 Ada vs RTX A4500 — both have 20 GB, which is better?

The RTX 4000 Ada is Ada Lovelace; the RTX A4500 is Ampere. Ada delivers ~50% higher tensor throughput and adds FP8 support. The RTX A4500 has higher bandwidth (640 vs 360 GB/s) and draws more power (200W vs 130W). For inference throughput, the RTX 4000 Ada wins on tensor performance; for bandwidth-sensitive workloads (long-context serving), the RTX A4500's higher bandwidth may generate tokens faster. Pricing is similar ($0.18 vs $0.19/hr).
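To see why bandwidth matters for token generation, a common back-of-the-envelope bound assumes that single-stream decoding reads every weight once per generated token, so tokens per second cannot exceed memory bandwidth divided by weight size. The sketch below applies that assumption to a 14B model in INT8 (~14 GB) on both cards. It is an upper bound under that simplification, not a benchmark, and the function name is illustrative.

```python
def decode_tokens_per_sec_ceiling(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling: assumes each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    weights_gb = 14.0  # 14B parameters at 1 byte each (INT8), weights only
    for card, bandwidth in [("RTX 4000 Ada", 360.0), ("RTX A4500", 640.0)]:
        ceiling = decode_tokens_per_sec_ceiling(bandwidth, weights_gb)
        print(f"{card}: at most ~{ceiling:.1f} tokens/s per stream")
```

Batching changes the picture: with many concurrent requests the weights are reused across the batch, and tensor throughput starts to matter more than raw bandwidth.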

Is 20 GB enough VRAM for useful AI inference?

Yes. 20 GB handles most 7B models in FP16 and 14B models in INT8. This covers Mistral Nemo 12B, Phi-4 14B, Qwen 2.5 14B, and the entire sub-14B landscape. The cutoff is 27B models — those need 32 GB or 48 GB cards. If your use case fits in the 7B-14B range, 20 GB is the sweet spot for cost-effective serving.

Why is the RTX 4000 Ada so much cheaper than other Ada cards?

It's one of the smaller chips in the Ada professional lineup, with fewer CUDA cores, less VRAM, and lower power consumption than the 48 GB cards. Fewer raw resources mean lower manufacturing cost, and demand is lower from enterprise buyers who typically want 48 GB. For inference workloads that don't need 48 GB, this is a feature, not a limitation: you get Ada architecture at Turing pricing.

Can I run multiple RTX 4000 Ada cards for larger models?

Physically yes — the single-slot design allows high density. But multi-GPU inference via PCIe has limited bandwidth (~32 GB/s), and the RTX 4000 Ada lacks NVLink. Two RTX 4000 Ada cards (40 GB total, $0.36/hr) could theoretically run a 30B model with tensor parallelism, but the PCIe bottleneck makes latency unpredictable. A single RTX 5090 32 GB or A6000 48 GB is a better choice for models that exceed 20 GB.
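The two-card scenario can be checked with the same weights-only arithmetic. The sketch below assumes a 30B model in INT8 split evenly across GPUs by tensor parallelism and the $0.18/hr spot rate quoted on this page. The helper names are illustrative, and the estimate ignores KV cache and the communication buffers that tensor parallelism adds.

```python
def per_gpu_weights_gb(params_billion: float, bytes_per_param: float, num_gpus: int) -> float:
    """Weights per GPU, assuming an even split across all GPUs."""
    return params_billion * bytes_per_param / num_gpus


def hourly_cost(num_gpus: int, price_per_gpu_hour: float = 0.18) -> float:
    """Total spot cost per hour, rounded to cents."""
    return round(num_gpus * price_per_gpu_hour, 2)


if __name__ == "__main__":
    vram_gb = 20.0
    shard = per_gpu_weights_gb(30, 1.0, 2)  # 30B parameters, INT8, two cards
    print(f"Weights per GPU: {shard:.1f} GB, headroom {vram_gb - shard:.1f} GB")
    print(f"Cost: ${hourly_cost(2):.2f}/hr")
```

The weights fit on paper, but every layer's partial results have to cross PCIe, which is where the latency risk described above comes from.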

Explore

More GPUs.

A10G · 24GB GDDR6 · Ampere
A40 · 48GB GDDR6 · Ampere
B200 · 180GB HBM3e · Blackwell
B300 · 288GB HBM3e · Blackwell
L4 · 24GB GDDR6 · Ada Lovelace
L40 · 48GB GDDR6 · Ada Lovelace

All GPUs: full marketplace with live pricing for every GPU.

Smart Inference: managed API routing · cheapest provider per request.

Rent an RTX 4000 Ada. Right now.

Spot pricing, per-second billing, no commitment.

Browse the live marketplace, pick your GPU, deploy in one click. Credits from $10.
