NVIDIA DGX A100: Debut Ampere Platform Delivers Five Petaflops of Performance

The DGX A100 system, which Jen-Hsun Huang recently pulled out of his kitchen oven during the GTC keynote, includes eight A100 GPUs, six NVLink 3.0 switches, nine Mellanox network controllers, two 64-core AMD EPYC Rome-generation processors, 1 TB of RAM, and 15 TB of NVMe SSD storage.


NVIDIA DGX A100 is the third generation of the company's computing systems, designed primarily for solving artificial intelligence problems. These systems are now built on the most advanced A100 Ampere family of GPUs, resulting in a dramatic increase in their performance, which has reached 5 petaflops. As a result, the DGX A100 is able to handle much more complex AI models and much larger amounts of data.

For the DGX A100 system, NVIDIA only lists the total amount of HBM2 memory: 320 GB. Simple arithmetic shows that each GPU carries 40 GB, and images of the new system indicate that this amount is distributed across six stacks. Memory bandwidth is also quoted: 12.4 TB/s for the entire DGX A100 system in aggregate.
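The per-GPU figures follow directly from the system totals. A minimal sketch of that arithmetic (variable names are illustrative, values are the article's):

```python
# Sanity check of the memory figures quoted for the DGX A100.
TOTAL_HBM2_GB = 320       # total HBM2 across the system
TOTAL_BW_TB_S = 12.4      # aggregate memory bandwidth, TB/s
NUM_GPUS = 8              # eight A100 GPUs

per_gpu_gb = TOTAL_HBM2_GB / NUM_GPUS      # memory per GPU
per_gpu_bw = TOTAL_BW_TB_S / NUM_GPUS      # bandwidth per GPU

print(per_gpu_gb)              # 40.0 GB per GPU
print(round(per_gpu_bw, 2))    # 1.55 TB/s per GPU
```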

Considering that the DGX-1 system, based on eight Tesla V100s, delivered one petaflops in mixed-precision calculations, and the DGX A100 is claimed to perform at five petaflops, it can be assumed that, in certain workloads, one Ampere GPU is five times faster than its Volta-based predecessor. In some cases the advantage grows to twenty-fold.


In total, the DGX A100 delivers peak performance of 10 peta-operations per second in integer (INT8) calculations, 5 petaflops in half-precision floating point (FP16) operations, and 156 teraflops in double-precision (FP64) operations. In addition, in TF32 tensor computing, the DGX A100 peaks at 2.5 petaflops. Recall that one teraflops is 10^12 floating-point operations per second, and one petaflops is 10^15.
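Expressed in raw operations per second, the quoted peaks look like this (a quick sketch using the unit definitions above; variable names are illustrative):

```python
# Converting the quoted peak figures to raw operations per second,
# using 1 teraflops = 10^12 and 1 petaflops = 10^15 FLOP/s.
TERA = 10**12
PETA = 10**15

fp16_flops = 5 * PETA      # 5 petaflops, half precision
fp64_flops = 156 * TERA    # 156 teraflops, double precision
tf32_flops = 2.5 * PETA    # 2.5 petaflops, TF32 tensor

# Half precision is roughly 32x the double-precision throughput.
print(fp16_flops / fp64_flops)
```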

An important feature of the NVIDIA A100 accelerators is the ability to divide the resources of one GPU into seven virtual segments. This allows you to significantly increase configuration flexibility in the same cloud segment. For example, one DGX A100 system with eight physical GPUs can act as 56 virtual GPUs. Multi-Instance GPU (MIG) technology allows you to select segments of different sizes both among the computing cores and in the cache memory and HBM2 type memory, and they will not compete with each other for bandwidth.
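The 56-instance figure is simply the per-GPU MIG limit multiplied across the system, which a one-line check confirms (names are illustrative):

```python
# MIG lets each A100 be split into up to seven isolated GPU instances.
GPUS_PER_SYSTEM = 8
MAX_MIG_INSTANCES_PER_GPU = 7

virtual_gpus = GPUS_PER_SYSTEM * MAX_MIG_INSTANCES_PER_GPU
print(virtual_gpus)   # 56 virtual GPUs per DGX A100
```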


It is worth noting that, compared to previous DGX systems, the design of the DGX A100 has changed somewhat. The SXM3 modules carrying A100 GPUs with HBM2 memory have noticeably more heat pipes in their heatsinks than the Tesla V100 modules of the Volta generation, although their ends are hidden from view by top covers. The practical limit for this design is 400 W of thermal output. This is also confirmed by the official specifications of the A100 in the SXM3 form factor, published today.

Next to the A100 GPUs, the motherboard houses six third-generation NVLink switches, which together provide bi-directional data transfer at 4.8 TB/s. NVIDIA also took their cooling seriously, judging by the full-profile heatsinks with heat pipes. Twelve NVLink channels are allocated to each GPU, and neighboring GPUs can exchange data at 600 GB/s.
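The aggregate switch bandwidth is consistent with the per-GPU figure: eight GPUs at 600 GB/s each gives the quoted 4.8 TB/s. A minimal check (names are illustrative):

```python
# Aggregate NVLink bandwidth from the per-GPU figure.
NUM_GPUS = 8
GB_S_PER_GPU = 600            # GB/s between neighboring GPUs

total_tb_s = NUM_GPUS * GB_S_PER_GPU / 1000
print(total_tb_s)             # 4.8 TB/s across the system
```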

The DGX A100 system also houses nine Mellanox ConnectX-6 HDR network controllers, each capable of transmitting data at up to 200 Gb/s; collectively they give the DGX A100 3.6 Tb/s of bi-directional throughput. The system also uses proprietary Mellanox technologies aimed at efficient scaling of computing systems with this architecture. Platform-level support for PCI Express 4.0 comes from the AMD EPYC Rome-generation processors, so this interface serves not only the A100 accelerators but also the NVMe SSDs.
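The aggregate network figure follows from counting both directions across all nine controllers; a short sketch of that arithmetic (names are illustrative):

```python
# Aggregate network throughput: nine ConnectX-6 HDR controllers
# at 200 Gb/s each, counted in both directions.
NUM_NICS = 9
GBPS_PER_NIC = 200
DIRECTIONS = 2

total_tbps = NUM_NICS * GBPS_PER_NIC * DIRECTIONS / 1000
print(total_tbps)   # 3.6 Tb/s bi-directional
```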


In addition to the DGX A100, NVIDIA has begun supplying its partners with HGX A100 boards, components of server systems that other manufacturers will build independently. One HGX A100 board can host either four or eight NVIDIA A100 GPUs. For its own needs, NVIDIA has already assembled the DGX SuperPOD, a cluster of 140 DGX A100 systems that delivers performance on the order of 700 petaflops in a fairly compact footprint. The company has promised guidance to partners wishing to build similar computing clusters based on the DGX A100. Notably, it took NVIDIA no more than a month to build the DGX SuperPOD, instead of the several months or even years typical for such projects.
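The SuperPOD figure follows directly from the per-system performance quoted earlier (a trivial check; names are illustrative):

```python
# DGX SuperPOD aggregate: 140 systems at 5 petaflops each.
NUM_SYSTEMS = 140
PFLOPS_PER_SYSTEM = 5

print(NUM_SYSTEMS * PFLOPS_PER_SYSTEM)   # 700 petaflops
```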


According to NVIDIA, deliveries of the DGX A100 have already begun at a price of $199,000 per unit. The company's partners are already deploying these systems in their cloud clusters, and the ecosystem already spans 26 countries, among them Vietnam and the UAE. In addition, GPUs based on the Ampere architecture will, quite predictably, be included in the Perlmutter supercomputer being built by Cray for the US Department of Energy, where they will sit alongside AMD EPYC Milan-generation CPUs with the Zen 3 architecture. The NVIDIA Ampere-based supercomputer nodes will reach the customer in the second half of the year, although the first units have already arrived at the US agency's laboratory.



Source: 3dnews.ru
