# GPU API reference

API endpoints for browsing spot GPU offers, starting rentals, checking status, and terminating instances via the AIVory Smart Inference API.

Four endpoints for renting spot GPUs directly. All paths sit under `/v1/gpu`. Authentication uses the same Bearer token as chat completions.

```
Authorization: Bearer si_...
```

## GET /v1/gpu/offers

Lists available spot GPU offers across all providers, with current pricing and interruption probabilities.

```json
{
  "object": "list",
  "count": 79,
  "data": [
    {
      "gpu_model": "RTX 4090",
      "vram_gb": 24,
      "provider_id": "vast-ai",
      "region": "Quebec, CA",
      "price_per_gpu_hour": "0.287",
      "interruption_probability": 0.02,
      "spot": true
    }
  ]
}
```

### Query filters

| Parameter | Description |
| --- | --- |
| `?gpu_model=<name>` | Filter by GPU model (e.g. RTX 4090, H100) |
| `?provider=<id>` | Only offers from a specific provider |
| `?min_vram=<gb>` | Minimum VRAM in GB |
| `?max_price=<hourly>` | Maximum hourly price per GPU |

All filters stack.
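
As a minimal sketch of how the filters above combine, the following Python builds an offers URL from whichever filters are set and picks the cheapest offer from a response. The host in `BASE_URL` is a placeholder (the base URL is not stated here), and the helper names `offers_url`, `fetch_offers`, and `cheapest` are hypothetical. Note that `price_per_gpu_hour` is a string in the sample response, so it is converted before comparing.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder host: the API base URL is an assumption, not taken from the docs.
BASE_URL = "https://api.example.com"


def offers_url(gpu_model=None, provider=None, min_vram=None, max_price=None):
    """Build a GET /v1/gpu/offers URL containing only the filters that were set."""
    filters = {
        "gpu_model": gpu_model,
        "provider": provider,
        "min_vram": min_vram,
        "max_price": max_price,
    }
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}/v1/gpu/offers" + (f"?{query}" if query else "")


def fetch_offers(token, **filters):
    """Call the endpoint with the Bearer token and return the `data` list."""
    req = Request(offers_url(**filters), headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        return json.load(resp)["data"]


def cheapest(offers):
    """Return the offer with the lowest hourly price (prices arrive as strings)."""
    return min(offers, key=lambda o: float(o["price_per_gpu_hour"]))
```

For example, `fetch_offers("si_...", gpu_model="RTX 4090", min_vram=24)` would request only RTX 4090 offers with at least 24 GB of VRAM.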

## POST /v1/gpu/rent

Starts a GPU rental. A credit hold is placed for the minimum session cost before provisioning begins.

### Request body

```json
{
  "gpu_model": "RTX 4090",
  "provider_id": "vast-ai",
  "region": "Quebec, CA",
  "model_id": "meta-llama/Llama-3.3-70B-Instruct",
  "docker_image": "vllm/vllm-openai:latest",
  "ssh_key": "ssh-ed25519 AAAA... user@host",
  "fine_tuned": false
}
```

### Optional fields

| Field | Description |
| --- | --- |
| `ssh_key` | Your public SSH key. If provided, the instance is provisioned with SSH access for debugging or custom setup. |
| `fine_tuned` | Set to `true` if deploying a fine-tuned model variant. The provisioner uses an appropriate serving configuration. |

Returns the rental object with a `rental_id` and `status: "provisioning"`.
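
The request above could be assembled along these lines. This is a sketch under assumptions: `BASE_URL` is a placeholder host, the helper names are hypothetical, the `Content-Type: application/json` header is assumed rather than documented, and the five fields not listed as optional are treated as required.

```python
import json
from urllib.request import Request

# Placeholder host: the API base URL is an assumption, not taken from the docs.
BASE_URL = "https://api.example.com"


def rent_payload(gpu_model, provider_id, region, model_id, docker_image,
                 ssh_key=None, fine_tuned=False):
    """Build the POST /v1/gpu/rent body; ssh_key is left out unless supplied."""
    body = {
        "gpu_model": gpu_model,
        "provider_id": provider_id,
        "region": region,
        "model_id": model_id,
        "docker_image": docker_image,
        "fine_tuned": fine_tuned,
    }
    if ssh_key is not None:
        body["ssh_key"] = ssh_key
    return body


def rent_request(token, payload):
    """Wrap the payload in an authenticated POST request (not sent here)."""
    return Request(
        f"{BASE_URL}/v1/gpu/rent",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed, not documented
        },
        method="POST",
    )
```

Passing the result of `rent_request` to `urllib.request.urlopen` would send it; the response should then carry the `rental_id` needed for the status and termination endpoints.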

## GET /v1/gpu/rentals/{rental_id}

Returns the current status of a rental.

| Status | Meaning |
| --- | --- |
| `provisioning` | GPU is being set up |
| `running` | GPU is active and serving requests |
| `terminated` | GPU has been shut down |

## DELETE /v1/gpu/rentals/{rental_id}

Terminates a running rental. You are billed for the time used, and the remaining credit hold is released.
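
One way to use the status and termination endpoints together is to poll until the rental leaves `provisioning`, then issue the `DELETE` when finished. The sketch below is illustrative only: `BASE_URL` is a placeholder host, the helper names are hypothetical, and the polling interval and timeout are arbitrary defaults. The poller takes a `get_status` callable so it does not depend on the exact shape of the status response.

```python
import time
from urllib.request import Request

# Placeholder host: the API base URL is an assumption, not taken from the docs.
BASE_URL = "https://api.example.com"


def rental_request(token, rental_id, method="GET"):
    """Build a GET (status) or DELETE (terminate) request for one rental."""
    return Request(
        f"{BASE_URL}/v1/gpu/rentals/{rental_id}",
        headers={"Authorization": f"Bearer {token}"},
        method=method,
    )


def wait_until_running(get_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Poll get_status() until the rental is running.

    Returns True on "running", False on "terminated", and raises
    TimeoutError if neither status is seen within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "running":
            return True
        if status == "terminated":
            return False
        if time.monotonic() >= deadline:
            raise TimeoutError("rental did not leave provisioning in time")
        sleep(interval_s)
```

In practice `get_status` would send `rental_request(token, rental_id)` and read the status from the response, and `rental_request(token, rental_id, "DELETE")` would be sent once the work is done so that billing stops.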

## Next steps

- GPU billing: Credit holds, per-GPU pricing, and idle timeouts
- GPU overview: When to use GPU rental and the cold-start flow
- SDK compatibility: Which chat parameters are honored

---

Original URL: https://aivory.net/md/docs/smart-inference/gpu/api/index.md