Supplier Integration API

A lightweight REST API that GPU providers implement to list inventory, provision instances, and integrate with GPU.ai. Designed for small GPU farms and colocation facilities without existing APIs.

How it works

You implement three REST endpoints on your infrastructure. GPU.ai polls your inventory every 30 seconds, sends provision requests when customers order GPUs, and polls instance status for lifecycle tracking. You don't need to call GPU.ai — we call you.

  1. Contact integrations@gpu.ai to receive your client_id and client_secret
  2. Implement the three endpoints described below
  3. GPU.ai configures your base URL and starts polling automatically

Authentication

GPU.ai authenticates to your endpoints with Bearer tokens obtained via the OAuth2 client credentials flow: it exchanges your client_id and client_secret at its own token endpoint for a short-lived access token, then sends that token in the Authorization header on every request.

Authorization: Bearer eyJhbGciOiJS...

Tokens expire after 1 hour. GPU.ai handles token refresh automatically — your endpoints just need to validate the Bearer token on each request.
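A minimal sketch of the header parsing your endpoints would do on each request. This only extracts the token; the actual validation step (JWT signature check or token introspection) is up to your implementation and is not shown here.

```python
def extract_bearer_token(authorization_header):
    """Return the token from an 'Authorization: Bearer <token>' header, or None.

    Returns None for missing headers, non-Bearer schemes, and empty tokens,
    so callers can reply 401 on any None result.
    """
    if not authorization_header:
        return None
    parts = authorization_header.split(" ", 1)
    if len(parts) != 2 or parts[0] != "Bearer" or not parts[1].strip():
        return None
    return parts[1].strip()
```

Your handler would call this first, respond 401 Unauthorized when it returns None, and otherwise validate the extracted token before serving the request.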

Base URL

All endpoints are relative to your base URL, configured during onboarding. For example, if your base URL is https://api.acme-gpu.com, the availability endpoint would be at https://api.acme-gpu.com/v1/gpu/available.

Endpoints

GET /v1/gpu/available

Returns your current GPU offerings with pricing and availability. GPU.ai polls this every 30 seconds. Only return GPUs that are currently provisionable.

Response body

offerings (array, required)
Array of GPU offering objects (see schema below)
// 200 OK
{
  "offerings": [
    {
      "gpu_type": "h100_sxm",
      "gpu_count": 1,
      "vram_per_gpu_gb": 80,
      "cpu_cores": 24,
      "ram_gb": 128,
      "storage_gb": 500,
      "price_per_hour": 3.49,
      "tier": "on_demand",
      "region": "US",
      "datacenter_location": "US-East-1",
      "stock_status": "High",
      "available_count": 12
    }
  ]
}
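One way to satisfy the "only return GPUs that are currently provisionable" rule is to filter your internal inventory before building the response. A sketch, assuming each inventory record already carries an available_count field as in the example above:

```python
def build_availability_response(inventory):
    """Build the GET /v1/gpu/available body, dropping offerings with no stock.

    inventory: list of offering dicts in the schema shown above.
    """
    return {
        "offerings": [o for o in inventory if o.get("available_count", 0) > 0]
    }
```

Since GPU.ai polls every 30 seconds, this function should read from a cached inventory snapshot rather than querying every hypervisor on each request.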
POST /v1/instances

Provisions a new GPU instance. The startup_script field contains a bootstrap script that must be executed on instance boot — it establishes the SSH tunnel back to GPU.ai. Return immediately with your instance ID; GPU.ai polls status separately.

Request body

instance_id (string, required)
GPU.ai's internal instance ID for correlation
gpu_type (string, required)
Canonical GPU type (e.g. h100_sxm, a100_80gb)
gpu_count (integer, required)
Number of GPUs
tier (string, required)
on_demand or spot
region (string, optional)
Preferred region (best-effort)
ssh_public_keys (string[], optional)
SSH public keys to install on the instance
docker_image (string, optional)
Docker image to use
startup_script (string, optional)
Bootstrap script that must be executed on boot

Response body

upstream_id (string, required)
Your unique identifier for this instance
status (string, required)
Initial status (typically creating)
cost_per_hour (number, optional)
Actual hourly cost in USD
estimated_ready_seconds (integer, optional)
Estimated seconds until instance is running
datacenter_location (string, optional)
Actual datacenter placement
region (string, optional)
Actual region
// 201 Created
{
  "upstream_id": "sup-12345",
  "status": "creating",
  "cost_per_hour": 3.49,
  "estimated_ready_seconds": 60,
  "datacenter_location": "US-East-1",
  "region": "US"
}
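The "return immediately" requirement means the handler should only kick off provisioning and reply, never block until the instance boots. A sketch of that shape; create_instance is a hypothetical hook standing in for your async provisioning logic (queueing a job, calling your orchestrator, etc.):

```python
def handle_provision(body, create_instance):
    """Sketch of POST /v1/instances: start provisioning, reply at once.

    body: the parsed request body (schema above).
    create_instance: your hook; starts provisioning in the background and
    returns your internal instance ID without waiting for boot.
    """
    upstream_id = create_instance(
        gpu_type=body["gpu_type"],
        gpu_count=body["gpu_count"],
        tier=body["tier"],
        ssh_public_keys=body.get("ssh_public_keys", []),
        startup_script=body.get("startup_script"),
    )
    # GPU.ai tracks boot progress by polling GET /v1/instances/{id}.
    return 201, {"upstream_id": upstream_id, "status": "creating"}
```

Remember that startup_script must actually run on boot (e.g. via cloud-init or your image's init hook), since it establishes the SSH tunnel back to GPU.ai.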
GET /v1/instances/{id}

Returns the current status of a provisioned instance. GPU.ai polls this to track lifecycle transitions.

Response body

upstream_id (string, required)
Your instance identifier
status (string, required)
One of: creating, running, stopping, terminated, error
ip (string, optional)
Instance IP address (when running)
cost_per_hour (number, optional)
Current hourly cost
uptime_seconds (integer, optional)
Seconds since instance started running
// 200 OK
{
  "upstream_id": "sup-12345",
  "status": "running",
  "ip": "10.0.1.55",
  "cost_per_hour": 3.49,
  "uptime_seconds": 3600
}
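The spec lists the status values but not which transitions GPU.ai expects, so the map below is one plausible reading (creating can go straight to error, terminated is final, and so on), not an official state machine. Guarding your status updates with something like this keeps the polled lifecycle consistent:

```python
# Assumed transition map; the spec defines the statuses, not the edges.
VALID_TRANSITIONS = {
    "creating": {"running", "error", "terminated"},
    "running": {"stopping", "error", "terminated"},
    "stopping": {"terminated", "error"},
    "terminated": set(),          # final state
    "error": {"terminated"},      # errored instances can still be cleaned up
}

def is_valid_transition(old_status, new_status):
    """True if moving old_status -> new_status is allowed by the map above."""
    return new_status in VALID_TRANSITIONS.get(old_status, set())
```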
DELETE /v1/instances/{id}

Terminates a running instance and releases all resources. This must be idempotent — terminating an already-terminated instance should return 204 without error.

Returns 204 No Content on success (no response body).
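The idempotency requirement boils down to: terminate at most once, but always answer 204. A sketch, assuming a dict-like registry of instances keyed by your upstream ID (returning 204 for unknown IDs as well is one reasonable reading of "without error"):

```python
def handle_terminate(instance_id, store):
    """Sketch of DELETE /v1/instances/{id}: idempotent termination.

    store: dict-like registry mapping upstream_id -> instance record.
    """
    instance = store.get(instance_id)
    if instance is not None and instance["status"] != "terminated":
        # A real implementation releases GPUs/storage before flipping state.
        instance["status"] = "terminated"
    return 204  # always 204, even if already terminated or unknown
```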

GPU type identifiers

Use GPU.ai's canonical GPU type strings in your gpu_type fields. These must match exactly.

Identifier    GPU                VRAM
h200_sxm      NVIDIA H200 SXM    141 GB
h100_sxm      NVIDIA H100 SXM    80 GB
h100_pcie     NVIDIA H100 PCIe   80 GB
a100_80gb     NVIDIA A100        80 GB
l40s          NVIDIA L40S        48 GB
rtx_4090      NVIDIA RTX 4090    24 GB
rtx_3090      NVIDIA RTX 3090    24 GB
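Because the identifiers must match exactly (no case or naming variations), validating gpu_type values before they leave your availability endpoint is a cheap safeguard:

```python
# Canonical identifiers from the table above; matching is exact.
CANONICAL_GPU_TYPES = {
    "h200_sxm", "h100_sxm", "h100_pcie", "a100_80gb",
    "l40s", "rtx_4090", "rtx_3090",
}

def is_canonical_gpu_type(gpu_type):
    """True only for an exact match against GPU.ai's canonical identifiers."""
    return gpu_type in CANONICAL_GPU_TYPES
```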

Error handling

GPU.ai retries on 5xx errors with exponential backoff (up to 3 attempts). Return 429 with a Retry-After header if you need to rate limit. 4xx errors (except 429) are not retried.

If the availability endpoint returns an error, GPU.ai isolates the failure — your supplier is temporarily skipped while other suppliers continue serving. GPU.ai will resume polling your endpoint on the next 30-second cycle.
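If you need to return 429, the Retry-After value should tell GPU.ai when capacity actually frees up. A minimal fixed-window limiter sketch (the class and its parameters are illustrative, not part of the spec) that produces that value:

```python
import math
import time

class SimpleRateLimiter:
    """Fixed-window limiter: allow up to max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.window_start = 0.0
        self.count = 0

    def check(self, now=None):
        """Return (allowed, retry_after_seconds).

        When allowed is False, retry_after_seconds is the value to send
        in the Retry-After header alongside the 429 response.
        """
        now = time.monotonic() if now is None else now
        if now - self.window_start >= self.window_seconds:
            self.window_start = now   # start a fresh window
            self.count = 0
        if self.count < self.max_requests:
            self.count += 1
            return True, 0
        return False, math.ceil(self.window_start + self.window_seconds - now)
```

With the 30-second availability poll, a window sized around that cadence avoids throttling normal GPU.ai traffic while still bounding bursts.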

OpenAPI spec

The full OpenAPI 3.0.3 specification is available for download. Import it into Postman, Swagger UI, or your API client of choice to explore the endpoints interactively.

View openapi.yaml on GitHub →

Get started

Ready to integrate? Contact integrations@gpu.ai with your company name, datacenter locations, and GPU inventory. We'll provision your credentials and walk you through onboarding.