Supplier Integration API

A lightweight REST API that GPU providers implement to list inventory, provision instances, and integrate with GPU.ai. Designed for small GPU farms and colocation facilities without existing APIs.

How it works

You implement three REST endpoints on your infrastructure. GPU.ai polls your inventory every 30 seconds, sends provision requests when customers order GPUs, and polls instance status for lifecycle tracking. You don't need to call GPU.ai — we call you.

  1. Contact integrations@gpu.ai to receive your client_id and client_secret
  2. Implement the three endpoints described below
  3. GPU.ai configures your base URL and starts polling automatically

Authentication

GPU.ai authenticates to your endpoints with Bearer tokens obtained via the OAuth2 client credentials flow: it exchanges your client_id and client_secret at its own token endpoint for a short-lived access token, then sends that token in the Authorization header on every request.

Authorization: Bearer eyJhbGciOiJS...

Tokens expire after 1 hour. GPU.ai handles token refresh automatically — your endpoints just need to validate the Bearer token on each request.
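A minimal sketch of the header parsing your endpoints would do on each request. This only extracts the token; the actual validation step (JWT signature check or token introspection) is up to your implementation and is not shown here.

```python
def extract_bearer_token(authorization_header):
    """Return the token from an 'Authorization: Bearer <token>' header, or None.

    Returns None for missing headers, non-Bearer schemes, and empty tokens,
    so callers can reply 401 on any None result.
    """
    if not authorization_header:
        return None
    parts = authorization_header.split(" ", 1)
    if len(parts) != 2 or parts[0] != "Bearer" or not parts[1].strip():
        return None
    return parts[1].strip()
```

Your handler would call this first, respond 401 Unauthorized when it returns None, and otherwise validate the extracted token before serving the request.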

Base URL

All endpoints are relative to your base URL, configured during onboarding. For example, if your base URL is https://api.acme-gpu.com, the availability endpoint would be at https://api.acme-gpu.com/v1/gpu/available.

Endpoints

GET /v1/gpu/available

Returns your current GPU offerings with pricing and availability. GPU.ai polls this every 30 seconds. Only return GPUs that are currently provisionable.

Response body

offerings (array, required)
Array of GPU offering objects (see schema below)
// 200 OK
{
  "offerings": [
    {
      "gpu_type": "h100_sxm",
      "gpu_count": 1,
      "vram_per_gpu_gb": 80,
      "cpu_cores": 24,
      "ram_gb": 128,
      "storage_gb": 500,
      "price_per_hour": 3.49,
      "tier": "on_demand",
      "region": "US",
      "datacenter_location": "US-East-1",
      "stock_status": "High",
      "available_count": 12
    }
  ]
}
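One way to satisfy the "only return GPUs that are currently provisionable" rule is to filter your internal inventory before building the response. A sketch, assuming each inventory record already carries an available_count field as in the example above:

```python
def build_availability_response(inventory):
    """Build the GET /v1/gpu/available body, dropping offerings with no stock.

    inventory: list of offering dicts in the schema shown above.
    """
    return {
        "offerings": [o for o in inventory if o.get("available_count", 0) > 0]
    }
```

Since GPU.ai polls every 30 seconds, this function should read from a cached inventory snapshot rather than querying every hypervisor on each request.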
POST /v1/instances

Provisions a new GPU instance. The startup_script field contains a bootstrap script that must be executed on instance boot — it establishes the SSH tunnel back to GPU.ai. Return immediately with your instance ID; GPU.ai polls status separately.

Request body

instance_id (string, required)
GPU.ai's internal instance ID for correlation
gpu_type (string, required)
Canonical GPU type (e.g. h100_sxm, a100_80gb)
gpu_count (integer, required)
Number of GPUs
tier (string, required)
on_demand or spot
region (string, optional)
Preferred region (best-effort)
ssh_public_keys (string[], optional)
SSH public keys to install on the instance
docker_image (string, optional)
Docker image to use
startup_script (string, optional)
Bootstrap script that must be executed on boot

Response body

upstream_id (string, required)
Your unique identifier for this instance
status (string, required)
Initial status (typically creating)
cost_per_hour (number, optional)
Actual hourly cost in USD
estimated_ready_seconds (integer, optional)
Estimated seconds until instance is running
datacenter_location (string, optional)
Actual datacenter placement
region (string, optional)
Actual region
// 201 Created
{
  "upstream_id": "sup-12345",
  "status": "creating",
  "cost_per_hour": 3.49,
  "estimated_ready_seconds": 60,
  "datacenter_location": "US-East-1",
  "region": "US"
}
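The "return immediately" requirement means the handler should only kick off provisioning and reply, never block until the instance boots. A sketch of that shape; create_instance is a hypothetical hook standing in for your async provisioning logic (queueing a job, calling your orchestrator, etc.):

```python
def handle_provision(body, create_instance):
    """Sketch of POST /v1/instances: start provisioning, reply at once.

    body: the parsed request body (schema above).
    create_instance: your hook; starts provisioning in the background and
    returns your internal instance ID without waiting for boot.
    """
    upstream_id = create_instance(
        gpu_type=body["gpu_type"],
        gpu_count=body["gpu_count"],
        tier=body["tier"],
        ssh_public_keys=body.get("ssh_public_keys", []),
        startup_script=body.get("startup_script"),
    )
    # GPU.ai tracks boot progress by polling GET /v1/instances/{id}.
    return 201, {"upstream_id": upstream_id, "status": "creating"}
```

Remember that startup_script must actually run on boot (e.g. via cloud-init or your image's init hook), since it establishes the SSH tunnel back to GPU.ai.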
GET /v1/instances/{id}

Returns the current status of a provisioned instance. GPU.ai polls this to track lifecycle transitions.

Response body

upstream_id (string, required)
Your instance identifier
status (string, required)
One of: creating, running, stopping, terminated, error
ip (string, optional)
Instance IP address (when running)
cost_per_hour (number, optional)
Current hourly cost
uptime_seconds (integer, optional)
Seconds since instance started running
// 200 OK
{
  "upstream_id": "sup-12345",
  "status": "running",
  "ip": "10.0.1.55",
  "cost_per_hour": 3.49,
  "uptime_seconds": 3600
}
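The spec lists the status values but not which transitions GPU.ai expects, so the map below is one plausible reading (creating can go straight to error, terminated is final, and so on), not an official state machine. Guarding your status updates with something like this keeps the polled lifecycle consistent:

```python
# Assumed transition map; the spec defines the statuses, not the edges.
VALID_TRANSITIONS = {
    "creating": {"running", "error", "terminated"},
    "running": {"stopping", "error", "terminated"},
    "stopping": {"terminated", "error"},
    "terminated": set(),          # final state
    "error": {"terminated"},      # errored instances can still be cleaned up
}

def is_valid_transition(old_status, new_status):
    """True if moving old_status -> new_status is allowed by the map above."""
    return new_status in VALID_TRANSITIONS.get(old_status, set())
```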
DELETE /v1/instances/{id}

Terminates a running instance and releases all resources. This must be idempotent — terminating an already-terminated instance should return 204 without error.

Returns 204 No Content on success (no response body).
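The idempotency requirement boils down to: terminate at most once, but always answer 204. A sketch, assuming a dict-like registry of instances keyed by your upstream ID (returning 204 for unknown IDs as well is one reasonable reading of "without error"):

```python
def handle_terminate(instance_id, store):
    """Sketch of DELETE /v1/instances/{id}: idempotent termination.

    store: dict-like registry mapping upstream_id -> instance record.
    """
    instance = store.get(instance_id)
    if instance is not None and instance["status"] != "terminated":
        # A real implementation releases GPUs/storage before flipping state.
        instance["status"] = "terminated"
    return 204  # always 204, even if already terminated or unknown
```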

GPU type identifiers

Use GPU.ai's canonical GPU type strings in your gpu_type fields. These must match exactly.

Identifier    GPU                VRAM
h200_sxm      NVIDIA H200 SXM    141 GB
h100_sxm      NVIDIA H100 SXM    80 GB
h100_pcie     NVIDIA H100 PCIe   80 GB
a100_80gb     NVIDIA A100        80 GB
l40s          NVIDIA L40S        48 GB
rtx_4090      NVIDIA RTX 4090    24 GB
rtx_3090      NVIDIA RTX 3090    24 GB
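Because the identifiers must match exactly (no case or naming variations), validating gpu_type values before they leave your availability endpoint is a cheap safeguard:

```python
# Canonical identifiers from the table above; matching is exact.
CANONICAL_GPU_TYPES = {
    "h200_sxm", "h100_sxm", "h100_pcie", "a100_80gb",
    "l40s", "rtx_4090", "rtx_3090",
}

def is_canonical_gpu_type(gpu_type):
    """True only for an exact match against GPU.ai's canonical identifiers."""
    return gpu_type in CANONICAL_GPU_TYPES
```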

Error handling

GPU.ai retries on 5xx errors with exponential backoff (up to 3 attempts). Return 429 with a Retry-After header if you need to rate limit. 4xx errors (except 429) are not retried.

If the availability endpoint returns an error, GPU.ai isolates the failure — your supplier is temporarily skipped while other suppliers continue serving. GPU.ai will resume polling your endpoint on the next 30-second cycle.
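If you need to return 429, the Retry-After value should tell GPU.ai when capacity actually frees up. A minimal fixed-window limiter sketch (the class and its parameters are illustrative, not part of the spec) that produces that value:

```python
import math
import time

class SimpleRateLimiter:
    """Fixed-window limiter: allow up to max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.window_start = 0.0
        self.count = 0

    def check(self, now=None):
        """Return (allowed, retry_after_seconds).

        When allowed is False, retry_after_seconds is the value to send
        in the Retry-After header alongside the 429 response.
        """
        now = time.monotonic() if now is None else now
        if now - self.window_start >= self.window_seconds:
            self.window_start = now   # start a fresh window
            self.count = 0
        if self.count < self.max_requests:
            self.count += 1
            return True, 0
        return False, math.ceil(self.window_start + self.window_seconds - now)
```

With the 30-second availability poll, a window sized around that cadence avoids throttling normal GPU.ai traffic while still bounding bursts.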

OpenAPI spec

The full OpenAPI 3.0.3 specification is available for download. Import it into Postman, Swagger UI, or your API client of choice to explore the endpoints interactively.

View openapi.yaml on GitHub →

Get started

Ready to integrate? Contact integrations@gpu.ai with your company name, datacenter locations, and GPU inventory. We'll provision your credentials and walk you through onboarding.