Edge AI17 min read
KV Cache Quantization at Sea (3/3)
Google's TurboQuant compresses the KV cache to 3 bits with no accuracy loss. Here is what that means for a vessel GPU rack.
Ethan Marsh·April 10, 2026
3 posts tagged with this topic.
Google's TurboQuant compresses the KV cache to 3 bits with no accuracy loss. Here is what that means for a vessel GPU rack.
Ethan Marsh·April 10, 2026
VRAM and KV cache math for models from 7B to 405B at 1M tokens. Per-user scaling for crew and guests. The numbers are ugly.
James Calder·April 7, 2026
Vercel ran the eval: a compressed doc file beat on-demand retrieval 100% to 53%. What that means for AI deployments on vessels.
Ethan Marsh·April 4, 2026