The NVIDIA A100 is the first GPU based on the Ampere architecture. It packs more than 54 billion transistors onto an 826 mm² die, is manufactured on TSMC's 7 nm process, and carries 40 GB of Samsung HBM2 memory.
The A100 adopts third-generation Tensor Cores and newly implements TF32 (Tensor Float 32) for AI. Its predecessor, the NVIDIA V100 (renamed from Tesla V100 at GTC 2019), was based on the Volta architecture.
TF32 operations reach 312 TFLOPS, 20 times the V100's single-precision (FP32) performance, with no code changes required. The Tensor Cores also support FP64 (double-precision floating point), lifting performance in HPC applications to 19.5 TFLOPS, 2.5 times that of Volta.
INT8 (8-bit integer) deep learning inference likewise reaches 1,248 TOPS, a 20-fold improvement.
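As a quick sanity check, the claimed speedup ratios line up with the V100's published peak figures. The V100 numbers below (15.7 TFLOPS FP32, 7.8 TFLOPS FP64) are not stated in the article and are assumed here:

```python
# A100 peak throughput figures quoted in the article.
A100_TF32_TFLOPS = 312.0   # TF32 Tensor Core
A100_FP64_TFLOPS = 19.5    # FP64 Tensor Core

# Assumed V100 peaks (not from the article).
V100_FP32_TFLOPS = 15.7
V100_FP64_TFLOPS = 7.8

# Ratios behind the "20x" and "2.5x" claims.
print(round(A100_TF32_TFLOPS / V100_FP32_TFLOPS, 1))  # ~19.9, i.e. roughly 20x
print(A100_FP64_TFLOPS / V100_FP64_TFLOPS)            # 2.5
```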
Multi-Instance GPU (MIG) technology can partition a single A100 into as many as seven GPU instances, allowing compute capacity to be matched to jobs of different sizes for optimal utilization and maximum return on investment.
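In practice, MIG partitions are managed through the `nvidia-smi` tool. A minimal configuration sketch (requires root and an A100; the `1g.5gb` profile shown is one of several instance sizes):

```shell
# Enable MIG mode on GPU 0 (may require a GPU reset to take effect).
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles that can be carved out.
sudo nvidia-smi mig -lgip

# Create a GPU instance with the 1g.5gb profile and a default compute instance (-C).
sudo nvidia-smi mig -cgi 1g.5gb -C

# Show the resulting MIG devices.
nvidia-smi -L
```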
Other features include third-generation NVLink, which doubles the GPU-to-GPU interconnect speed, and structural sparsity support, which doubles AI compute performance.
The A100 also powers the "DGX A100" system for supercomputing, which combines a total of eight A100s connected by NVLink. The DGX A100 carries 320 GB of HBM2 in total, with an aggregate memory bandwidth of 12.4 TB/s, and delivers 5 PFLOPS of AI performance per node. The DGX A100 ships at a price of $199,000.
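The DGX A100's aggregate figures follow from multiplying the per-GPU numbers by eight. The per-GPU bandwidth (1.555 TB/s) and the sparse FP16 Tensor peak (624 TFLOPS) below are assumed values not stated in the article:

```python
GPUS = 8                  # A100s per DGX A100 (from the article)
HBM2_PER_GPU_GB = 40      # from the article
BW_PER_GPU_TBS = 1.555    # assumed per-GPU HBM2 bandwidth (not in the article)
AI_PER_GPU_TFLOPS = 624   # assumed FP16 Tensor peak with sparsity (not in the article)

print(GPUS * HBM2_PER_GPU_GB)               # 320 GB total HBM2
print(round(GPUS * BW_PER_GPU_TBS, 1))      # ~12.4 TB/s aggregate bandwidth
print(GPUS * AI_PER_GPU_TFLOPS / 1000)      # ~5 PFLOPS of AI performance
```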
Posted by Mohit Sharma on May 15, 2020 in Technology