Tag

KV cache

3 posts tagged with this topic.

Edge AI17 min read

KV Cache Quantization at Sea (3/3)

Google's TurboQuant compresses the KV cache to 3 bits with no accuracy loss. Here is what that means for a vessel GPU rack.

Ethan Marsh·April 10, 2026

VRAM and KV cache math for models from 7B to 405B at 1M tokens. Per-user scaling for crew and guests. The numbers are ugly.

James Calder·April 7, 2026

Edge AI10 min read

Vercel ran the eval: a compressed doc file beat on-demand retrieval 100% to 53%. What that means for AI deployments on vessels.

Ethan Marsh·April 4, 2026