The NVIDIA H200, announced in late 2023, is a refresh of the H100 built on the same Hopper architecture but with a significantly expanded memory subsystem. With 141 GB of HBM3e and roughly 43% more memory bandwidth than the H100, it makes it easier to fit very large models on a single GPU and delivers significant inference gains, especially in long-context scenarios. NVIDIA pitched it as a 'do the same work with fewer GPUs' upgrade, aimed particularly at LLM inference workloads. Until the Blackwell generation (B100/B200) ramps up, the H200 is the most important refresh in the backbone of frontier-model serving.
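The 'fits on a single GPU' claim can be checked with simple arithmetic. The sketch below is a minimal back-of-envelope calculator, not an official sizing tool: the only figure taken from the entry above is the 141 GB capacity, while the model shape (80 layers, 8 grouped-query KV heads, head dimension 128, as in Llama-2-70B-class models) and the FP16/INT8 byte sizes are illustrative assumptions.

```python
# Back-of-envelope memory sizing for a single H200 (141 GB HBM3e).
# Illustrative assumptions: FP16 = 2 bytes/element, INT8 = 1 byte/element.

def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights in GB (default: FP16, 2 bytes/param)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, batch: int = 1,
                bytes_per_elem: int = 2) -> float:
    """KV-cache memory in GB: 2 tensors (K and V) per layer,
    each of shape [batch, kv_heads, context_len, head_dim]."""
    return (2 * layers * kv_heads * head_dim
            * context_len * batch * bytes_per_elem) / 1e9

# Hypothetical 70B-parameter model, Llama-2-70B-like shape:
# 80 layers, 8 KV heads (GQA), head_dim 128, 32k-token context.
w_fp16 = weights_gb(70)                          # 140.0 GB
w_int8 = weights_gb(70, bytes_per_param=1)       # 70.0 GB
kv_32k = kv_cache_gb(80, 8, 128, 32_768)         # ~10.7 GB per sequence

# FP16 weights alone nearly fill the 141 GB card, leaving no room
# for the KV cache; with INT8 weights, the model plus several
# 32k-context sequences fits comfortably on one H200.
print(f"FP16 weights: {w_fp16:.1f} GB, INT8 weights: {w_int8:.1f} GB")
print(f"KV cache @32k, batch 1: {kv_32k:.1f} GB")
print(f"INT8 + batch-4 @32k total: {w_int8 + kv_cache_gb(80, 8, 128, 32_768, batch=4):.1f} GB")
```

Under these assumptions the INT8 configuration with a batch of four 32k-context sequences lands around 113 GB, inside the H200's 141 GB but beyond the H100's 80 GB, which is the long-context serving case the entry highlights.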
MEVZU N°124 · ISTANBUL · YEAR I, VOL. III
Glossary · Beginner · 2024
NVIDIA H200
A memory-expanded H100 refresh, optimised for long-context and very large models.
- EN — English term
- NVIDIA H200
- TR — Turkish term
- NVIDIA H200