A yacht management company sent us a quote last month. Their integrator had proposed a pair of H100 SXM5 80GB cards for a single 85-meter sailing yacht, with a projected total installed cost (hardware, racks, power conditioning, cooling, and professional services) north of $160,000. The owner wanted to know if it was the right call.
The short answer: no. The longer answer is the subject of this guide.
Picking the wrong GPU for a vessel deployment is the kind of mistake that only becomes obvious a year later, when you are either paying to run silicon you do not need or replacing hardware that cannot handle the workload you actually put on it. I want to walk through how we actually make this decision at ShipboardAI, what the real specs look like on each of the three GPUs that matter for on-vessel AI today, and where we would put our own money.
The three questions that determine GPU choice
Before you look at a spec sheet, you answer three questions. The answers collapse the decision to one or two viable options in almost every case.
1. What is the largest model you need to run locally, and at what quantization? Your GPU memory budget is the single most important number in this conversation. A 70B parameter LLM at FP16 needs about 140GB of VRAM. The same model at 4-bit quantization (Q4_K_M or equivalent) fits in about 38–42GB depending on context window. Q8 lands somewhere in between. Once you pick a target model and quantization level, the VRAM ceiling is fixed and most of your options disappear.
2. How many concurrent inference requests do you need to handle? A guest concierge answering one question at a time is a completely different workload than a crew operations assistant handling twelve simultaneous agent chains during turndown. Concurrency drives throughput, and throughput drives either "you need more memory bandwidth per card" or "you need more cards." These are two different hardware decisions.
3. What is your redundancy posture? A single card deployment fails unrecoverably when the card fails. For a charter yacht where the AI is a nice-to-have, that may be fine. For a commercial vessel or a private yacht where the AI is operationally load-bearing, it is not. Two smaller cards with automatic failover beat one bigger card almost every time in the real world, even when the single bigger card has better peak performance on paper.
Write the answers down before you look at prices. The answers almost always eliminate two of the three options below.
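The VRAM arithmetic behind question one can be sketched as a quick sizing calculation. This is a rough model under stated assumptions: the bytes-per-parameter figures approximate common quantization formats including overhead, and the layer, head, and dimension defaults are illustrative, not exact figures for any specific model.

```python
# Approximate bytes per parameter, including quantization overhead (assumption:
# Q4 ~ Q4_K_M at roughly 4.6 bits/weight, Q8 at roughly 8.5 bits/weight).
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.07, "q4": 0.57}

def vram_gb(params_b: float, quant: str, context: int = 8192,
            layers: int = 80, kv_heads: int = 8, head_dim: int = 128) -> float:
    """Estimate VRAM in GB: quantized weights plus an FP16 KV cache."""
    weights = params_b * 1e9 * BYTES_PER_PARAM[quant]
    # KV cache: 2 tensors (K and V) x layers x kv_heads x head_dim x 2 bytes x context
    kv_cache = 2 * layers * kv_heads * head_dim * 2 * context
    return (weights + kv_cache) / 1e9

print(f"70B @ FP16: {vram_gb(70, 'fp16'):.0f} GB")  # ~143 GB: multi-GPU territory
print(f"70B @ Q4:   {vram_gb(70, 'q4'):.0f} GB")    # ~43 GB: fits a 48 GB card
```

Run it for your target model before the spec-sheet conversation starts; the quantization row you pick decides which cards are even in play.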
H100: when it is the right call
The NVIDIA H100 is the current flagship data center GPU. The SXM5 variant ships with 80GB of HBM3 memory, delivers about 3.35 TB/s of memory bandwidth, draws up to 700W in SXM5 form or 350W in the PCIe variant, and costs somewhere between $25,000 and $35,000 per card depending on who you buy from and when. The specs are real and the performance is real, but the H100 is the right answer for a narrow set of vessel deployments.
You want H100 when:
- You are running 70B+ models with long context at production throughput. An H100 can keep a quantized 70B model in memory with room for the KV cache at a useful context length, and the memory bandwidth means tokens come out fast enough for real interactive use. On an L40S you would be paging or dropping context. On an A100 80GB you would be close to the edge.
- You need to serve multiple concurrent heavy workloads. H100s shine when you are running batched inference across multiple sessions. The NVLink on SXM variants also lets you pool memory across cards for even larger models, though cross-card NVLink adds serious complexity to the cooling and power story on a vessel.
- You are building for the next three to five years. If your fleet is going to grow and you do not want to be ripping out hardware in eighteen months, H100 buys you more headroom than anything else you can physically put in a vessel rack today.
- You are running workloads beyond LLM inference. Computer vision at scale, fine-tuning, RAG pipelines with heavy embedding models on the same box. These stack up, and H100 handles the stack where L40S does not.
You do not want H100 when:
- You are a single-vessel deployment running one or two models under light concurrency. This is the 85-meter yacht scenario above. The H100 is massive overkill for what the workload actually demands. You will pay for bandwidth and flops you will never use.
- Your power and cooling budget is constrained. An SXM5 H100 is 700W sustained. On a vessel, that is a meaningful fraction of a typical rack's thermal budget before you account for the CPU, storage, and networking sitting next to it.
- You care about price per useful token. For pure inference workloads under moderate concurrency, the L40S delivers a better price/performance ratio by a significant margin. You pay the H100 premium for capabilities most single-vessel deployments do not touch.
The honest take: out of the last twenty ShipboardAI deployment discussions we have had, H100 was the right recommendation exactly twice. Both were commercial cruise operators with fleet-scale inference needs and serious concurrency.
L40S: the yacht sweet spot
The NVIDIA L40S is the card we recommend most often, and we recommend it for almost every single-vessel deployment. It ships with 48GB of GDDR6 memory, delivers about 864 GB/s of memory bandwidth, draws 350W, and costs somewhere between $7,500 and $9,500 per card depending on your integrator relationships.
The L40S is interesting because it is not an H100. NVIDIA built it as a universal data center GPU positioned between the workstation-class RTX 6000 Ada line and the full HBM-memory data center parts. For our use case that positioning is almost perfect.
You want L40S when:
- You are running 30B to 70B models at Q4 or Q8 quantization. A quantized 70B fits in 48GB with enough headroom for a reasonable context window. Llama 3.3 70B Instruct at Q4_K_M runs comfortably, and Qwen 2.5 72B fits the same way. Larger models such as Mistral Large (123B) do not fit on a single card even at Q4.
- Your concurrency is moderate. Think 2 to 8 simultaneous inference streams with reasonable latency, not 32 concurrent streams hitting the same model. The L40S will keep up.
- You want dual-card redundancy at a reasonable price. Two L40S cards at $16,000 to $19,000 of silicon cost gives you failover and enough combined memory to run a quantized 70B on either card independently. You can configure one as primary and one as warm standby, or you can load-balance across both. Either way you have real redundancy for less than half the cost of a single H100 SXM5.
- You have a mixed workload. Computer vision, embedding models, voice transcription (Whisper-large), and LLM inference all running on the same card. The L40S handles mixed workloads well because it has plenty of FP16 compute and enough memory to keep multiple models resident.
You do not want L40S when:
- You are running a 405B-class model. The card is too small. Llama 3.1 405B at Q4 still needs more than 48GB for the weights alone, plus context. You are back to H100 territory or a multi-card solution.
- You are running high-concurrency production inference. The memory bandwidth ceiling of GDDR6 will show up under pressure. If you need to serve dozens of concurrent sessions on the same model at low latency, you will feel the difference between GDDR6 and HBM3.
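The GDDR6 versus HBM3 gap is easy to put numbers on. To first order, single-stream decode is memory-bandwidth bound: every generated token streams the full weight set from VRAM, so tokens per second is capped at bandwidth divided by weight size. This sketch ignores KV-cache traffic, batching, and kernel efficiency, so treat it as an upper bound, not a benchmark.

```python
def decode_tokens_per_s(bandwidth_tb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound upper limit on single-stream decode speed."""
    return bandwidth_tb_s * 1000.0 / weights_gb  # TB/s -> GB/s, divided by GB

Q4_70B_GB = 40  # rough weight footprint of a Q4-quantized 70B model (assumption)

print(f"L40S (0.864 TB/s): {decode_tokens_per_s(0.864, Q4_70B_GB):.0f} tok/s")
print(f"H100 (3.35 TB/s):  {decode_tokens_per_s(3.35, Q4_70B_GB):.0f} tok/s")
```

Both numbers are comfortable for one interactive session; the gap only becomes decisive once many concurrent streams contend for the same bandwidth.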
For the 85-meter sailing yacht from the intro paragraph, the right answer was two L40S cards in a dual-card configuration, with the rest of the rack budget going into proper cooling, serious UPS, and local NVMe storage for the model weights and logs. The final install came in under half the proposed H100 budget and delivers substantially better operational characteristics.
A100: the reasonable backup plan
The A100 is the previous generation, a 2020-2021 era card that NVIDIA still sells in some channels and that has a large secondary market. It ships in 40GB or 80GB HBM2e variants, delivers about 1.6–2.0 TB/s of memory bandwidth depending on the specific SKU, draws 300-400W, and costs roughly $10,000–$15,000 new for the 80GB variant or as low as $5,000–$7,000 on the used market.
The A100 is the right answer in two specific situations.
First, when budget is the hard constraint. If the owner will not approve new H100 or L40S pricing and you need to get something deployed, an A100 80GB is still a competent card. It will run a quantized 70B model. It will handle moderate concurrency. It will work. You give up some headroom and some modern features (no FP8, weaker sparsity support, older NVLink), but the core inference workloads of a vessel deployment are ones the A100 was specifically designed for. It is not exciting, but it is reasonable.
Second, when you are building a redundant pair and you already have one A100 in the rack. A yacht that bought an A100 eighteen months ago is not doing anything wrong by adding a second A100 for failover instead of mixing card generations. Matched pairs simplify your operational story and your driver stack. We have seen fleets make this call specifically to avoid the hassle of managing two different generations of hardware in the same engine room.
Where A100 stops making sense: new single-card deployments where budget is not the constraint. For the same money as a new A100 80GB, the L40S gives you more modern features, better power efficiency, and a longer support horizon from NVIDIA. The L40S is just a better purchase unless you have specific reasons to prefer the older card.
Power and cooling: the reality check nobody writes down
GPU spec sheets do not tell you the truth about what it takes to run these cards on a vessel. Here is the part the vendor does not put in the quote.
Real power draw spikes above rated TDP. A 350W L40S can pull 380W or more during a batched inference spike. An H100 SXM5 rated at 700W has been measured pulling 750W under load. Design your power budget for 20% above nameplate. If you do not, you trip breakers at the worst possible time.
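The 20%-above-nameplate rule reduces to one line of arithmetic. The host overhead figure below is an assumption for a typical CPU, storage, and networking load in the same rack, not a measured value; substitute your own numbers.

```python
def rack_power_budget_w(card_tdp_w: float, n_cards: int,
                        host_overhead_w: float = 400.0,
                        margin: float = 0.20) -> float:
    """Design power for a GPU rack: cards plus host, with transient margin."""
    return (card_tdp_w * n_cards + host_overhead_w) * (1 + margin)

print(f"{rack_power_budget_w(350, 2):.0f} W")  # two L40S plus host
print(f"{rack_power_budget_w(700, 1):.0f} W")  # one H100 SXM5 plus host
```

Note that the dual-L40S rack and the single-SXM5 rack land on roughly the same design budget; the cooling question, not the power question, is what separates them.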
Cooling is the actual constraint on most vessels. A yacht's engine room is already thermally stressed. Adding a GPU rack means you are adding another meaningful heat load to a space that was probably not designed for it. The real question is not "does my vessel have 1.5 kW of spare power?" (almost every large yacht does). The real question is "does my vessel have the cooling capacity to remove 1.5 kW of heat from a specific rack location, continuously, in a tropical anchorage?" That one is harder to answer and almost always more expensive to fix.
Vibration is a real variable. GPUs are reasonably robust, but the surrounding infrastructure (spinning disks, connectors, mounting hardware) is not always. Every vessel GPU deployment we have done has been dual-mounted: vibration isolation at the rack level and again at the individual component level. Skipping this step is how you get intermittent failures six months after installation that are impossible to diagnose remotely.
Salt air wants to kill your hardware. Engine rooms are often the least-sealed spaces on a yacht. If you are installing GPU hardware in or near the engine room, you are installing it in a corrosive environment. Conformal coating on circuit boards, stainless fasteners, sealed enclosures, and positive-pressure ventilation with filtered intake air are all worth the money. None of this is in the vendor quote.
If the integrator proposing your deployment does not have a paragraph on each of these topics, get a different integrator.
The quiet recommendation for 90% of single-vessel deployments
Two L40S cards in a dual-card configuration. Sized for a quantized 70B model as the ceiling, with room to run embedding models and Whisper on the side. Configured for warm failover, not hot pooling. Deployed in a purpose-built enclosure with forced-air cooling, vibration isolation, and a UPS sized to bridge the window between a generator fault and genset restart.
Every word of that is the answer to a lesson learned the hard way by somebody else.
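The warm-failover logic in that configuration is simple enough to sketch. Everything specific here is a placeholder: the addresses, the health route, and the three-strikes threshold are illustrative assumptions, and a production install would put a real load balancer or proxy in front of the cards rather than hand-rolling this.

```python
import urllib.request

# Hypothetical addresses for the primary card and the warm standby.
PRIMARY = "http://10.0.0.11:8000/health"
STANDBY = "http://10.0.0.12:8000/health"
FAIL_THRESHOLD = 3  # consecutive failed checks before failing over

def healthy(url: str, timeout: float = 2.0) -> bool:
    """Poll an inference endpoint's health route; False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def choose_backend(failures: int, primary_ok: bool) -> tuple[str, int]:
    """Return (active_backend_url, updated_consecutive_failure_count)."""
    failures = 0 if primary_ok else failures + 1
    active = STANDBY if failures >= FAIL_THRESHOLD else PRIMARY
    return active, failures
```

The failure counter is the important part: failing over on a single missed health check turns every transient network blip into a backend flap, which is worse than a short outage.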
If your workload is materially larger than that (a 405B model, high-concurrency commercial inference, fleet-scale operations) we step up to H100. If your budget will not support L40S and you need to ship something, we step down to A100 80GB. But the sweet spot for almost every single-vessel deployment is two L40S cards, and the specific reason is that vessels need redundancy more than they need peak performance.
On a vessel, the best hardware is the hardware that keeps working when the other card fails, when the generator hiccups, when the engine room temperature climbs, when the anchorage is humid, and when the guest in cabin 4 asks their question anyway. Two midrange cards beat one top-end card at that job every time.
That is the quiet lesson of why cloud AI does not work at sea applied one layer down the stack. The environment decides the architecture. Everything else is optimization around what the environment will actually let you get away with.
Satellite bandwidth and how it shapes model selection is its own topic. For the hardening side of the same problem, see Marcus Hale's take on vessel fleet security; for how to think about which workloads are worth the memory budget, see Ethan Marsh's commentary on long-context models.
Sizing an on-vessel GPU deployment and not sure which card is right for your workload? Talk to us. We will tell you what we would actually buy for your vessel and why, even when the answer is smaller than what the big integrators are quoting.
