Vol. 01 · Dispatches

Updated April 20, 2026

8 entries

Notes from the compute layer.

Field reports, market commentary, and the occasional announcement from the team building the aggregation layer for AI compute. We write when we have something honest to say.

01
Engineering · March 30, 2026

Training and inference need different infrastructure

Teams that optimize their GPU clusters for training often find them poorly suited for inference, and vice versa. Here's why the hardware requirements diverge, and how an aggregated supply model lets you spec each workload correctly without buying twice.

Aditya Reddy · 3 min read
02
Market · March 12, 2026

What to look for in a GPU cloud (besides the price tag)

Price per GPU-hour is table stakes. Here are the seven things that actually determine whether a provider will work for your workload, and why we built our pricing engine to expose every one of them.

John Nguyen · 3 min read
03
Market · March 1, 2026

The GPU shortage is over. The GPU market is just getting started.

H100 lead times have collapsed from 52 weeks to under 8. The shortage narrative is outdated; what's replacing it is a real, liquid, price-discovered market for AI compute. That's a much bigger deal.

Aryamaan Singhania · 3 min read
04
Research · February 21, 2026

DeepSeek and the open-source inference shift: where the bottleneck moves next

DeepSeek V3 trained for $5.6M; R1 added another $294K. When frontier-quality models cost single-digit millions and ship under open weights, the real moat moves from the model to the inference fleet, and that fleet has to be elastic.

Aditya Reddy · 3 min read
05
Engineering · January 28, 2026

NVIDIA Blackwell Ultra: what the GB300 NVL72 actually changes

1.5x the FP4 compute of the B200. 288GB of HBM3e per GPU. 8TB/s of memory bandwidth. The headline specs are real; here's what they mean for your training run, your serving fleet, and your next quarter's GPU bill.

Aditya Reddy · 3 min read
06
Engineering · December 18, 2025

Why bare metal still matters for training (and where it doesn't)

Virtualization overhead can cost you 10–15% of your GPU compute. At supercluster scale, that's millions in wasted spend. We surface bare-metal and virtualized capacity side by side so you can pick the right one for each workload.

William Han · 2 min read

What's next

Build on the compute layer.

Per-second billing. Sixty seconds from CLI to SSH.