AMD puts out new slottable GPU for AI-curious enterprises
AMD hopes to win over enterprise AI customers with a more affordable datacenter GPU that can drop into conventional air-cooled servers.

Announced on Thursday, the MI350P is the House of Zen's first PCIe-based Instinct accelerator since the MI210 debuted all the way back in 2022. Until now, AMD's best GPUs have only been available in packs of eight and used socketed OAM modules that weren't compatible with most server platforms. By contrast, the MI350P can slot into just about any 19-inch pizza box design that offers enough power and airflow, making it a much easier sell for enterprises dipping their toes into on-prem AI for the first time.

The 600-watt, dual-slot card is essentially an MI350X that's been cut in half. That means the CDNA-based GPU packs 4.6 petaFLOPS of FP4 compute and 144 GB of VRAM spread across four HBM3e stacks delivering a respectable 4 TB/s of memory bandwidth.

AMD supports configurations ranging from one to eight MI350Ps, though a lack of high-speed interconnects on these cards means chip-to-chip communication is limited to PCIe 5.0 speeds (128 GB/s bidirectional), which could limit the card's appeal for larger models.

AMD hasn't shared pricing for the cards just yet, but at least on paper, the MI350P is well positioned to compete with either Nvidia's H200 NVL or RTX Pro 6000 Blackwell PCIe cards. Compared to the 141 GB H200, the MI350P promises about 38 percent higher peak performance at FP8, while eking out a narrow VRAM capacity advantage. But the H200 does pull ahead on memory bandwidth: with six HBM3e stacks to the MI350P's four, the nearly two-year-old card's memory is still about 20 percent faster. Nvidia's H200 also supports high-speed chip-to-chip communications over NVLink, while the MI350P doesn't use AMD's equivalent Infinity Fabric interconnect.

However, all this assumes you can still find H200 NVLs in the wild. Since last summer, Nvidia has been pushing its RTX Pro 6000 Server cards on enterprise customers.
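Those H200 numbers are easy to sanity-check with back-of-envelope arithmetic. In the sketch below, the MI350P's FP8 rate is taken as half its quoted 4.6 petaFLOPS FP4 figure, and the H200 NVL's dense FP8 throughput and memory bandwidth are assumptions drawn from Nvidia's public spec sheet rather than from AMD's materials.

```python
# Back-of-envelope check on the H200 NVL comparison. The H200 NVL figures
# (1.67 PF dense FP8, 4.8 TB/s) are assumed from Nvidia's spec sheet.
mi350p = {"fp8_pflops": 2.3, "vram_gb": 144, "bw_tbs": 4.0}
h200_nvl = {"fp8_pflops": 1.67, "vram_gb": 141, "bw_tbs": 4.8}

fp8_uplift = mi350p["fp8_pflops"] / h200_nvl["fp8_pflops"] - 1
vram_edge = mi350p["vram_gb"] - h200_nvl["vram_gb"]
bw_gap = h200_nvl["bw_tbs"] / mi350p["bw_tbs"] - 1

print(f"FP8 uplift: {fp8_uplift:.0%}")       # ~38 percent, as quoted
print(f"VRAM edge: {vram_edge} GB")          # a narrow 3 GB advantage
print(f"H200 bandwidth lead: {bw_gap:.0%}")  # ~20 percent
```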
As of this writing, the RTX Pro 6000 is Nvidia's most powerful Blackwell-based accelerator offered in a PCIe form factor. Against the RTX Pro 6000, the MI350P's price becomes a bigger factor than performance. Workstation versions of the RTX Pro, which ditch the passive cooler for an active one, routinely sell for between $8,000 and $10,000 apiece, making it one of Nvidia's more affordable datacenter-class GPUs. Depending on how pricing shakes out, AMD may have to push hard to be competitive.

Having said that, the MI350P is still the better-specced part, delivering 2.3x higher peak FLOPS, 2.5x the memory bandwidth, and 50 percent more VRAM than the RTX Pro.

Now, all of this assumes peak FLOPS and memory bandwidth, which is rarely realistic. The tensors used by AI workloads are rarely the ideal shape for squeezing the maximum number of FLOPS out of a chip. This is why we run Maximum Achievable MatMul FLOPS (MAMF) and BabelStream memory bandwidth benchmarks as part of our AI test suite.

AMD seems to understand that peak FLOPS don't translate cleanly into real-world performance: in the marketing materials shared with El Reg prior to publication, the company compared the MI350P's theoretical performance against its real-world delivered performance. It'd be nice to see Nvidia and others adopt similar practices for accelerator performance claims, though we suspect getting everyone to agree on the best way to measure this might not be easy.

The MI350P's launch comes as AMD prepares to address a very different and likely more lucrative segment with its first rack-scale compute platform, codenamed Helios. That system is due out in the second half of the year and is aimed primarily at large hyperscale and neocloud deployments. It packs 72 of AMD's all-new MI455X GPUs into a single double-wide OCP rack that behaves like one enormous accelerator.
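The gap between peak and delivered FLOPS is easy to demonstrate: time a real matrix multiply and divide by the datasheet number. Here's a minimal sketch of that MAMF-style measurement, run on the host CPU with NumPy; the peak figure is a placeholder for illustration, not an MI350P spec.

```python
# Minimal sketch of a Maximum Achievable MatMul FLOPS (MAMF) style
# measurement: time a real GEMM and report the fraction of peak delivered.
import time

import numpy as np

def achieved_matmul_flops(n: int, iters: int = 10) -> float:
    """Return measured FLOPS for an n x n x n matmul (2*n^3 flops each)."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up, so one-time setup costs don't skew the timing
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    return (2 * n**3 * iters) / elapsed

PEAK_FLOPS = 1e12  # hypothetical datasheet peak; swap in your chip's number
measured = achieved_matmul_flops(1024)
print(f"Achieved: {measured / 1e9:.1f} GFLOPS "
      f"({measured / PEAK_FLOPS:.1%} of assumed peak)")
```

A full MAMF sweep repeats this across many matrix shapes, since awkward tensor shapes are exactly what drags achieved FLOPS below the datasheet peak.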
The platform will be AMD’s first crack at Nvidia’s NVL72 racks, which launched alongside its Blackwell generation nearly two years ago. ®