Tag

KV cache VRAM

1 post tagged with this topic.


The Real Cost of Running 1M Context Locally (Part 2 of 3)

VRAM and KV cache math for 7B, 13B, 70B, and 405B models at 1M tokens, scaled per user: a single crew member, the whole workforce, and the whole workforce plus guests. The numbers are uglier than the marketing suggests, and there is exactly one way out.

James Calder · April 7, 2026