Updated April 20, 2026
8 entries
Field reports, market commentary, and the occasional announcement from the team building the aggregation layer for AI compute. We write when we have something honest to say.
Why we started GPU.ai, what we believe about the compute layer, and the world we're trying to build toward — by the founders.
A strategic partnership giving GPU.ai customers first-look access to NovaCore's Hyderabad Blackwell cluster — and giving NovaCore tenants a single API into U.S. East Coast capacity. One compute layer. Two continents.
Teams that optimize their GPU clusters for training often find them poorly suited for inference — and vice versa. Here's why the hardware requirements diverge, and how an aggregated supply model lets you spec each workload correctly without buying twice.
Price per GPU-hour is table stakes. The seven things that actually determine whether a provider will work for your workload — and why we built our pricing engine to expose every one of them.
H100 lead times have collapsed from 52 weeks to under 8. The shortage narrative is outdated — what's replacing it is a real, liquid, price-discovered market for AI compute. That's a much bigger deal.
DeepSeek V3 trained for $5.6M. R1 added another $294K. When frontier-quality models cost single-digit millions and ship under open weights, the real moat moves from the model to the inference fleet — and that fleet has to be elastic.
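To make the shift concrete, run the arithmetic. A minimal sketch in Python: the training figures are the reported ones above, while the fleet size and blended hourly rate are our own illustrative assumptions, not anyone's actual bill.

```python
# Back-of-envelope: once a frontier-quality model costs single-digit
# millions to train, the serving fleet dominates total spend.
# Fleet size and hourly rate below are illustrative assumptions.

TRAIN_COST_V3 = 5.6e6      # DeepSeek V3 reported training cost (USD)
TRAIN_COST_R1 = 0.294e6    # R1 reported additional cost (USD)

FLEET_GPUS = 1_000         # assumed serving fleet size
PRICE_PER_GPU_HOUR = 2.00  # assumed blended $/GPU-hour
HOURS_PER_YEAR = 8_760

training_total = TRAIN_COST_V3 + TRAIN_COST_R1
serving_annual = FLEET_GPUS * PRICE_PER_GPU_HOUR * HOURS_PER_YEAR

print(f"one-time training: ${training_total/1e6:.1f}M")
print(f"annual serving:    ${serving_annual/1e6:.1f}M")
print(f"serving/training:  {serving_annual/training_total:.1f}x per year")
```

Under these assumptions the serving fleet burns roughly three times the entire training budget every year, which is why elasticity on the inference side matters more than shaving the training bill.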
1.5x the FP4 compute of B200. 288GB HBM3e per GPU. 8TB/s memory bandwidth. The headline specs are real — here's what they mean for your training run, your serving fleet, and your next quarter's GPU bill.
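One way to read the bandwidth number: for memory-bound decode, tokens per second per stream can't exceed HBM bandwidth divided by the bytes streamed per token. A rough roofline sketch, assuming a hypothetical 70B-parameter model served at FP8; it ignores KV-cache traffic and kernel efficiency, so treat it as a ceiling, not a forecast.

```python
# Roofline upper bound for single-stream decode on one GPU:
# each generated token must stream every weight byte through HBM,
# so tokens/s <= bandwidth / model_bytes. Ignores KV-cache traffic
# and kernel efficiency; a ceiling, not a forecast.
# Model size and precision are illustrative assumptions.

HBM_BANDWIDTH_TBS = 8.0   # 8 TB/s per GPU (headline spec)
HBM_CAPACITY_GB = 288     # 288 GB HBM3e per GPU (headline spec)

PARAMS_B = 70             # assumed 70B-parameter model
BYTES_PER_PARAM = 1       # assumed FP8 weights

model_gb = PARAMS_B * BYTES_PER_PARAM
assert model_gb <= HBM_CAPACITY_GB, "model must fit in one GPU's HBM"

ceiling_tok_s = (HBM_BANDWIDTH_TBS * 1000) / model_gb
print(f"model weights:  {model_gb} GB (fits in {HBM_CAPACITY_GB} GB)")
print(f"decode ceiling: ~{ceiling_tok_s:.0f} tokens/s per stream")
```

The capacity number matters for the same reason: 288GB means models that used to demand tensor parallelism across two or four GPUs can now live on one, and every interconnect hop you remove pushes real throughput closer to that ceiling.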
Virtualization overhead costs you 10–15% of your GPU compute. At supercluster scale, that's millions in wasted spend. We surface bare metal and virtualized capacity side-by-side so you can pick the right one for each workload.
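The dollar figure follows directly from the overhead percentage. A quick sketch with an assumed fleet size and blended rate; plug in your own numbers.

```python
# What a 10-15% virtualization tax costs at supercluster scale.
# Fleet size and hourly rate are illustrative assumptions.

FLEET_GPUS = 10_000
PRICE_PER_GPU_HOUR = 2.00  # assumed blended $/GPU-hour
HOURS_PER_YEAR = 8_760

annual_spend = FLEET_GPUS * PRICE_PER_GPU_HOUR * HOURS_PER_YEAR
for overhead in (0.10, 0.15):
    wasted = annual_spend * overhead
    print(f"{overhead:.0%} overhead on a ${annual_spend/1e6:.0f}M/yr "
          f"fleet = ${wasted/1e6:.1f}M wasted")
```

At these assumed rates the tax lands between $17.5M and $26.3M a year, which is the gap bare metal closes when the workload doesn't need a hypervisor.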
What's next
Per-second billing. Sixty seconds from CLI to SSH.