
Infrastructure·14 min read
The Real Cost of Running 1M Context Locally (Part 2 of 3)
VRAM and KV cache math for 7B, 13B, 70B, and 405B models at 1M tokens, scaled per user: a single crew member, the whole workforce, and the whole workforce plus guests. The numbers are uglier than the marketing suggests, and there is exactly one way out.
James Calder·April 7, 2026