AMD shatters the Petaflop rack barrier with Instinct

Inventec and Super Micro chassis for AI cards

AMD released their Instinct cards today along with three interesting servers to house them. SemiAccurate thinks these datacenter beasts are going to be a healthy sales winner for the company.

The first two of these platforms are pretty ho-hum GPU compute boxes from Super Micro and Inventec. The SYS 1028GQ-TRT takes the usual Super Micro route of obvious names that indicate function. If you haven’t figured it out by now, this is a 1U dual Xeon box with 3x PCIe3 16x slots filled with Instinct cards. If you have seen one GPGPU server, you have seen most of them.

Same goes for Inventec’s K888 G3 with Radeon Instinct. As usual their names give nothing away, so we will tell you about this 2U box. Think 4x Instinct GPUs for a claimed 100TF of 16b FP math, backed by 2x Haswell or Broadwell Xeons. The larger size allows for the full 24 DIMMs and either 12 3.5″ or 24 2.5″ drives on a 12Gbps SAS bus. Once again nothing really amazing, just GPU compute with Instinct cards.

Inventec Falconwitch chassis

Four GPUs per cage, four cages per chassis

Fortunately the Inventec PS1816 Falconwitch server breaks that mold, actually shatters it quite comprehensively. AMD and Inventec were pretty tight-lipped about the details of the box, but there are some pretty big claims for this 4U server, starting with 400TF of 16b FP power. How? Modularity and tight packaging, starting with four modular server chassis across the front and disaggregated PCIe slots that can be configured as needed.

As the numbers suggest, there are 16 Vega based Instinct MI25 cards in this chassis, plus an unknown number of unknown CPUs. If you recall our earlier reveal of the Zen based Naples CPUs, you will remember that they have 128x PCIe3 lanes per device. Not all of those end up as slots; networking, SATA, and NVMe pull lanes from that count, so in a usable server arrangement the count is more likely to be 4-6 full 16x slots per socket plus storage and ethernet. That strongly implies a single Zen Naples socket (SP3), because Xeons, including the upcoming Purley, can’t come close to this count. We can’t get into speculation about other really interesting features that enable this, but stay tuned for some neat stuff in a few months. This is cool.
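The lane budget described above is simple arithmetic, and a minimal sketch makes it concrete. The 128-lane figure and the 4-6 slot range come from the article; the function name `lanes_left_for_io` is hypothetical, and the real Falconwitch lane allocation has not been disclosed.

```python
# Lane count per Naples CPU, as stated in the article.
NAPLES_PCIE_LANES = 128
# A full-width slot consumes 16 lanes.
LANES_PER_X16_SLOT = 16


def lanes_left_for_io(x16_slots: int, total_lanes: int = NAPLES_PCIE_LANES) -> int:
    """Return the lanes left for storage and networking after the x16 slots.

    Hypothetical helper for illustration only; it does not describe how
    AMD or Inventec actually allocate lanes.
    """
    used = x16_slots * LANES_PER_X16_SLOT
    if used > total_lanes:
        raise ValueError(
            f"{x16_slots} x16 slots need {used} lanes, only {total_lanes} available"
        )
    return total_lanes - used


if __name__ == "__main__":
    # The article's estimate of 4-6 full x16 slots per socket.
    for slots in (4, 5, 6):
        remaining = lanes_left_for_io(slots)
        print(f"{slots} x16 slots leave {remaining} lanes for storage and ethernet")
```

With six full slots only 32 lanes remain for everything else, which is why the usable slot count sits well below the theoretical eight.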

One thing we can speculate about is the disaggregation of the PCIe slots from the CPU. Remember the Avago/PLX PEX9700 and its 96 or so lanes of switching capacity? Sure you do, but go read that link carefully again just to humor me. Then think about the advanced switching and routing capabilities of the PCIe fabric on the PEX9700. The Falconwitch has configurable slots and topologies, something that would be impossible to do without the capabilities of the new PLX parts. The AMD slides call this Dynamic Resource Allocation and, well, it is apt. Get it? Pun pun.

Better than software configurable topologies is the fact that you can connect multiple Falconwitch chassis together with “Up to 144GB non blocking BW for P2P”. Remember the bit from the Instinct cards about Large BAR Support for MGPU Peer to Peer? 144GB of bandwidth strongly implies a PCIe based fabric between boxes, roughly 8x PCIe3 16x links’ worth. That PLX fabric must be stretched to its maximum, but this is going to be a very low latency P2P AI monster.
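The link count above can be sanity-checked with a rough calculation. This sketch assumes the slide's "144GB" means GB/s counted in one direction, and it accounts only for PCIe 3.0's 8 GT/s signalling rate and 128b/130b line encoding, not packet overhead. The function names are hypothetical.

```python
# PCIe 3.0 signalling rate per lane, in gigatransfers per second.
GT_PER_S_PER_LANE = 8.0
# PCIe 3.0 uses 128b/130b encoding: 128 payload bits per 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130
BITS_PER_BYTE = 8


def link_gb_per_s(lanes: int = 16) -> float:
    """Approximate one-direction bandwidth of a PCIe 3.0 link in GB/s.

    Ignores packet and protocol overhead, so real throughput is lower.
    """
    return GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes / BITS_PER_BYTE


def x16_links_for(target_gb_per_s: float) -> float:
    """Number of x16 links whose combined bandwidth equals the target."""
    return target_gb_per_s / link_gb_per_s(16)


if __name__ == "__main__":
    print(f"one x16 link:   {link_gb_per_s(16):.2f} GB/s per direction")
    print(f"eight x16 links: {8 * link_gb_per_s(16):.1f} GB/s per direction")
    print(f"links for 144 GB/s: {x16_links_for(144):.2f}")
```

Under these assumptions eight links come to about 126 GB/s and 144 GB/s lands closer to nine, so the link count should be read as a ballpark rather than an exact figure.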

Unfortunately, companies doing AI training on large data sets, and especially things like high res video training, think a mere 400TF is paltry power for their tasks. So if you take a bunch of Falconwitches and slap them in a rack, you get the Inventec Rack with Radeon Instinct. This little 3 Petaflop rack holds 120 Instinct GPUs, which works out to 7.5 chassis per rack for reasons we don’t understand, but we have shattered the Petaflop barrier at last.
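The claimed numbers hang together, as a quick check shows. This sketch uses only figures quoted in the article (100TF for four cards, 16 cards per Falconwitch, 120 cards per rack); the per-card rate is derived from those claims, not taken from an AMD spec sheet.

```python
# Figures quoted in the article.
K888_GPUS = 4
K888_FP16_TF = 100.0
FALCONWITCH_GPUS = 16
RACK_GPUS = 120

# Per-card FP16 rate implied by the K888 G3 claim (an inference, not a spec).
tf_per_gpu = K888_FP16_TF / K888_GPUS

# Scale the implied per-card rate up to a chassis and a rack.
falconwitch_tf = tf_per_gpu * FALCONWITCH_GPUS
rack_pf = tf_per_gpu * RACK_GPUS / 1000.0

# How many 16-card chassis it takes to hold 120 cards.
chassis_per_rack = RACK_GPUS / FALCONWITCH_GPUS

if __name__ == "__main__":
    print(f"implied per-card rate: {tf_per_gpu} TF")
    print(f"Falconwitch (16 cards): {falconwitch_tf} TF")
    print(f"rack (120 cards): {rack_pf} PF")
    print(f"chassis per rack: {chassis_per_rack}")
```

The 400TF and 3 Petaflop claims both follow from 25TF per card, and the odd 7.5 chassis figure is simply 120 divided by 16.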

Given what AMD and Inventec are claiming, you can start with a vanilla server like the Super Micro SYS 1028GQ-TRT or Inventec K888 G3 and have islands or roll your own interconnect. If you pick something like the Falconwitch, the interconnect comes with the box. Sure it will cost more, but the single system image of the result will undoubtedly make the whole thing much more powerful than the pieces; think no crushing latency between boxes, or at least much less. Anyone buying Instinct cards won’t buy just one, they will buy racks’ worth. This is a serious game changer for that crowd and we can’t wait to hear about all the details. S|A

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate