A yacht management company sent us a quote last month. Their integrator had proposed a pair of H100 SXM5 80GB cards for a single 85-meter sailing yacht, with a projected total installed cost (hardware, racks, power conditioning, cooling, and professional services) north of $160,000. The owner wanted to know if it was the right call.
The short answer: no. The longer answer is the subject of this guide.
Picking the wrong GPU for a vessel deployment is the kind of mistake that only becomes obvious a year later, when you are either paying to run silicon you do not need or replacing hardware that cannot handle the workload you actually put on it. I want to walk through how we actually make this decision at ShipboardAI, what the real specs look like on each of the three GPUs that matter for on-vessel AI today, and where we would put our own money.
The three questions that determine GPU choice
Before you look at a spec sheet, you answer three questions. The answers collapse the decision to one or two viable options in almost every case.
1. What is the largest model you need to run locally, and at what quantization? Your GPU memory budget is the single most important number in this conversation. A 70B parameter LLM at FP16 needs about 140GB of VRAM. The same model at 4-bit quantization (Q4_K_M or equivalent) fits in about 38–42GB depending on context window. Q8 lands somewhere in between. Once you pick a target model and quantization level, the VRAM ceiling is fixed and most of your options disappear.
2. How many concurrent inference requests do you need to handle? A guest concierge answering one question at a time is a completely different workload than a crew operations assistant handling twelve simultaneous agent chains during turndown. Concurrency drives throughput, and throughput drives either "you need more memory bandwidth per card" or "you need more cards." These are two different hardware decisions.
3. What is your redundancy posture? A single card deployment fails unrecoverably when the card fails. For a charter yacht where the AI is a nice-to-have, that may be fine. For a commercial vessel or a private yacht where the AI is operationally load-bearing, it is not. Two smaller cards with automatic failover beat one bigger card almost every time in the real world, even when the single bigger card has better peak performance on paper.
Write the answers down before you look at prices. The answers almost always eliminate two of the three options below.
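The VRAM arithmetic behind question one can be sketched as a quick sizing calculation. This is a rough model under stated assumptions: the bytes-per-parameter figures approximate common quantization formats including overhead, and the layer, head, and dimension defaults are illustrative, not exact figures for any specific model.

```python
# Approximate bytes per parameter, including quantization overhead (assumption:
# Q4 ~ Q4_K_M at roughly 4.6 bits/weight, Q8 at roughly 8.5 bits/weight).
BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.07, "q4": 0.57}

def vram_gb(params_b: float, quant: str, context: int = 8192,
            layers: int = 80, kv_heads: int = 8, head_dim: int = 128) -> float:
    """Estimate VRAM in GB: quantized weights plus an FP16 KV cache."""
    weights = params_b * 1e9 * BYTES_PER_PARAM[quant]
    # KV cache: 2 tensors (K and V) x layers x kv_heads x head_dim x 2 bytes x context
    kv_cache = 2 * layers * kv_heads * head_dim * 2 * context
    return (weights + kv_cache) / 1e9

print(f"70B @ FP16: {vram_gb(70, 'fp16'):.0f} GB")  # ~143 GB: multi-GPU territory
print(f"70B @ Q4:   {vram_gb(70, 'q4'):.0f} GB")    # ~43 GB: fits a 48 GB card
```

Run it for your target model before the spec-sheet conversation starts; the quantization row you pick decides which cards are even in play.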
H100: when it is the right call
The NVIDIA H100 is the current flagship data center GPU. The SXM5 variant ships with 80GB of HBM3 memory, delivers about 3.35 TB/s of memory bandwidth, draws up to 700W in SXM5 form or 350W in the PCIe variant, and costs somewhere between $25,000 and $35,000 per card depending on who you buy from and when. The specs are real and the performance is real, but the H100 is the right answer for a narrow set of vessel deployments.
You want H100 when:
- You are running 70B+ models with long context at production throughput. An H100 can keep a quantized 70B model in memory with room for the KV cache at a useful context length, and the memory bandwidth means tokens come out fast enough for real interactive use. On an L40S you would be paging or dropping context. On an A100 80GB you would be close to the edge.
- You need to serve multiple concurrent heavy workloads. H100s shine when you are running batched inference across multiple sessions. The NVLink on SXM variants also lets you pool memory across cards for even larger models, though cross-card NVLink adds serious complexity to the cooling and power story on a vessel.
- You are building for the next three to five years. If your fleet is going to grow and you do not want to be ripping out hardware in eighteen months, H100 buys you more headroom than anything else you can physically put in a vessel rack today.
- You are running workloads beyond LLM inference. Computer vision at scale, fine-tuning, RAG pipelines with heavy embedding models on the same box. These stack up, and H100 handles the stack where L40S does not.
You do not want H100 when:
- You are a single-vessel deployment running one or two models under light concurrency. This is the 85-meter yacht scenario above. The H100 is massive overkill for what the workload actually demands. You will pay for bandwidth and flops you will never use.
- Your power and cooling budget is constrained. An SXM5 H100 is 700W sustained. On a vessel, that is a meaningful fraction of a typical rack's thermal budget before you account for the CPU, storage, and networking sitting next to it.
- You care about price per useful token. For pure inference workloads under moderate concurrency, the L40S delivers a better price/performance ratio by a significant margin. You pay the H100 premium for capabilities most single-vessel deployments do not touch.
The honest take: out of the last twenty ShipboardAI deployment discussions we have had, H100 was the right recommendation exactly twice. Both were commercial cruise operators with fleet-scale inference needs and serious concurrency.
L40S: the yacht sweet spot
The NVIDIA L40S is the card we recommend most often, and we recommend it for almost every single-vessel deployment. It ships with 48GB of GDDR6 memory, delivers about 864 GB/s of memory bandwidth, draws 350W, and costs somewhere between $7,500 and $9,500 per card depending on your integrator relationships.
The L40S is interesting because it is not an H100. NVIDIA built it as a universal data center GPU positioned between the workstation-class RTX 6000 Ada line and the full HBM-memory data center parts. For our use case that positioning is almost perfect.
You want L40S when:
- You are running 30B to 70B models at Q4 or Q8 quantization. A quantized 70B fits in 48GB with enough headroom for a reasonable context window. Llama 3.3 70B Instruct at Q4_K_M runs comfortably, and Qwen 2.5 72B fits the same way. Larger models such as Mistral Large (123B) do not fit on a single card even at Q4.
- Your concurrency is moderate. Think 2 to 8 simultaneous inference streams with reasonable latency, not 32 concurrent streams hitting the same model. The L40S will keep up.
- You want dual-card redundancy at a reasonable price. Two L40S cards at $16,000 to $19,000 of silicon cost gives you failover and enough combined memory to run a quantized 70B on either card independently. You can configure one as primary and one as warm standby, or you can load-balance across both. Either way you have real redundancy for less than half the cost of a single H100 SXM5.
- You have a mixed workload. Computer vision, embedding models, voice transcription (Whisper-large), and LLM inference all running on the same card. The L40S handles mixed workloads well because it has plenty of FP16 compute and enough memory to keep multiple models resident.
You do not want L40S when:
- You are running a 405B-class model. The card is too small. Llama 3.1 405B at Q4 still needs more than 48GB for the weights alone, plus context. You are back to H100 territory or a multi-card solution.
- You are running high-concurrency production inference. The memory bandwidth ceiling of GDDR6 will show up under pressure. If you need to serve dozens of concurrent sessions on the same model at low latency, you will feel the difference between GDDR6 and HBM3.
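The GDDR6 versus HBM3 gap is easy to put numbers on. To first order, single-stream decode is memory-bandwidth bound: every generated token streams the full weight set from VRAM, so tokens per second is capped at bandwidth divided by weight size. This sketch ignores KV-cache traffic, batching, and kernel efficiency, so treat it as an upper bound, not a benchmark.

```python
def decode_tokens_per_s(bandwidth_tb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound upper limit on single-stream decode speed."""
    return bandwidth_tb_s * 1000.0 / weights_gb  # TB/s -> GB/s, divided by GB

Q4_70B_GB = 40  # rough weight footprint of a Q4-quantized 70B model (assumption)

print(f"L40S (0.864 TB/s): {decode_tokens_per_s(0.864, Q4_70B_GB):.0f} tok/s")
print(f"H100 (3.35 TB/s):  {decode_tokens_per_s(3.35, Q4_70B_GB):.0f} tok/s")
```

Both numbers are comfortable for one interactive session; the gap only becomes decisive once many concurrent streams contend for the same bandwidth.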
For the 85-meter sailing yacht from the intro paragraph, the right answer was two L40S cards in a dual-card configuration, with the rest of the rack budget going into proper cooling, serious UPS, and local NVMe storage for the model weights and logs. The final install came in under half the proposed H100 budget and delivers substantially better operational characteristics.
A100: the reasonable backup plan
The A100 is the previous generation, a 2020-2021 era card that NVIDIA still sells in some channels and that has a large secondary market. It ships in 40GB or 80GB HBM2e variants, delivers about 1.6–2.0 TB/s of memory bandwidth depending on the specific SKU, draws 300-400W, and costs roughly $10,000–$15,000 new for the 80GB variant or as low as $5,000–$7,000 on the used market.
The A100 is the right answer in two specific situations.
First, when budget is the hard constraint. If the owner will not approve new H100 or L40S pricing and you need to get something deployed, an A100 80GB is still a competent card. It will run a quantized 70B model. It will handle moderate concurrency. It will work. You give up some headroom and some modern features (no FP8, weaker sparsity support, older NVLink), but the core inference workloads of a vessel deployment are ones the A100 was specifically designed for. It is not exciting, but it is reasonable.
Second, when you are building a redundant pair and you already have one A100 in the rack. A yacht that bought an A100 eighteen months ago is not doing anything wrong by adding a second A100 for failover instead of mixing card generations. Matched pairs simplify your operational story and your driver stack. We have seen fleets make this call specifically to avoid the hassle of managing two different generations of hardware in the same engine room.
Where A100 stops making sense: new single-card deployments where budget is not the constraint. For the same money as a new A100 80GB, the L40S gives you more modern features, better power efficiency, and a longer support horizon from NVIDIA. The L40S is just a better purchase unless you have specific reasons to prefer the older card.
Power and cooling: the reality check nobody writes down
GPU spec sheets do not tell you the truth about what it takes to run these cards on a vessel. Here is the part the vendor does not put in the quote.
Real power draw spikes above rated TDP. A 350W L40S can pull 380W or more during a batched inference spike. An H100 SXM5 rated at 700W has been measured pulling 750W under load. Design your power budget for 20% above nameplate. If you do not, you trip breakers at the worst possible time.
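The 20%-above-nameplate rule reduces to one line of arithmetic. The host overhead figure below is an assumption for a typical CPU, storage, and networking load in the same rack, not a measured value; substitute your own numbers.

```python
def rack_power_budget_w(card_tdp_w: float, n_cards: int,
                        host_overhead_w: float = 400.0,
                        margin: float = 0.20) -> float:
    """Design power for a GPU rack: cards plus host, with transient margin."""
    return (card_tdp_w * n_cards + host_overhead_w) * (1 + margin)

print(f"{rack_power_budget_w(350, 2):.0f} W")  # two L40S plus host
print(f"{rack_power_budget_w(700, 1):.0f} W")  # one H100 SXM5 plus host
```

Note that the dual-L40S rack and the single-SXM5 rack land on roughly the same design budget; the cooling question, not the power question, is what separates them.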
Cooling is the actual constraint on most vessels. A yacht's engine room is already thermally stressed. Adding a GPU rack means you are adding another meaningful heat load to a space that was probably not designed for it. The real question is not "does my vessel have 1.5 kW of spare power?" (almost every large yacht does). The real question is "does my vessel have the cooling capacity to remove 1.5 kW of heat from a specific rack location, continuously, in a tropical anchorage?" That one is harder to answer and almost always more expensive to fix.
Vibration is a real variable. GPUs are reasonably robust, but the surrounding infrastructure (spinning disks, connectors, mounting hardware) is not always. Every vessel GPU deployment we have done has been dual-mounted: vibration isolation at the rack level and again at the individual component level. Skipping this step is how you get intermittent failures six months after installation that are impossible to diagnose remotely.
Salt air wants to kill your hardware. Engine rooms are often the least-sealed spaces on a yacht. If you are installing GPU hardware in or near the engine room, you are installing it in a corrosive environment. Conformal coating on circuit boards, stainless fasteners, sealed enclosures, and positive-pressure ventilation with filtered intake air are all worth the money. None of this is in the vendor quote.
If the integrator proposing your deployment does not have a paragraph on each of these topics, get a different integrator.
The quiet recommendation for 90% of single-vessel deployments
Two L40S cards in a dual-card configuration. Sized for a quantized 70B model as the ceiling, with room to run embedding models and Whisper on the side. Configured for warm failover, not hot pooling. Deployed in a purpose-built enclosure with forced-air cooling, vibration isolation, and a UPS sized to bridge the window between a generator fault and genset restart.
Every word of that is the answer to a lesson learned the hard way by somebody else.
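The warm-failover logic in that configuration is simple enough to sketch. Everything specific here is a placeholder: the addresses, the health route, and the three-strikes threshold are illustrative assumptions, and a production install would put a real load balancer or proxy in front of the cards rather than hand-rolling this.

```python
import urllib.request

# Hypothetical addresses for the primary card and the warm standby.
PRIMARY = "http://10.0.0.11:8000/health"
STANDBY = "http://10.0.0.12:8000/health"
FAIL_THRESHOLD = 3  # consecutive failed checks before failing over

def healthy(url: str, timeout: float = 2.0) -> bool:
    """Poll an inference endpoint's health route; False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def choose_backend(failures: int, primary_ok: bool) -> tuple[str, int]:
    """Return (active_backend_url, updated_consecutive_failure_count)."""
    failures = 0 if primary_ok else failures + 1
    active = STANDBY if failures >= FAIL_THRESHOLD else PRIMARY
    return active, failures
```

The failure counter is the important part: failing over on a single missed health check turns every transient network blip into a backend flap, which is worse than a short outage.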
If your workload is materially larger than that (a 405B model, high-concurrency commercial inference, fleet-scale operations) we step up to H100. If your budget will not support L40S and you need to ship something, we step down to A100 80GB. But the sweet spot for almost every single-vessel deployment is two L40S cards, and the specific reason is that vessels need redundancy more than they need peak performance.
On a vessel, the best hardware is the hardware that keeps working when the other card fails, when the generator hiccups, when the engine room temperature climbs, when the anchorage is humid, and when the guest in cabin 4 asks their question anyway. Two midrange cards beat one top-end card at that job every time.
That is the quiet lesson of why cloud AI does not work at sea applied one layer down the stack. The environment decides the architecture. Everything else is optimization around what the environment will actually let you get away with.
Satellite bandwidth and how it shapes model selection is its own topic. For the hardening side of the same problem, see Marcus Hale's take on vessel fleet security; for how to think about which workloads are worth the memory budget, see Ethan Marsh's commentary on long-context models.
Sizing an on-vessel GPU deployment and not sure which card is right for your workload? Talk to us. We will tell you what we would actually buy for your vessel and why, even when the answer is smaller than what the big integrators are quoting.
