NVIDIA Corporation DGX A100 (AMD EPYC 7742, Tesla A100-SXM-80GB) |
SPEChpc 2021_sml_base = 7.78 |
SPEChpc 2021_sml_peak = 8.30 |
hpc2021 License: | 019 | Test Date: | Sep-2021 |
---|---|---|---|
Test Sponsor: | NVIDIA Corporation | Hardware Availability: | Jul-2020 |
Tested by: | NVIDIA Corporation | Software Availability: | Sep-2021 |
Benchmark result graphs are available in the PDF report.
Results appear in the order in which they were run. Bold underlined text indicates a median measurement.

Benchmark | Base Model | Base Ranks | Base Thrds/Rnk | Base Seconds (run 1) | Base Ratio (run 1) | Base Seconds (run 2) | Base Ratio (run 2) | Peak Model | Peak Ranks | Peak Thrds/Rnk | Peak Seconds (run 1) | Peak Ratio (run 1) | Peak Seconds (run 2) | Peak Ratio (run 2) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
605.lbm_s | ACC | 8 | 1 | 90.5 | 17.1 | 90.4 | 17.2 | ACC | 8 | 1 | 90.5 | 17.1 | 90.4 | 17.2 |
613.soma_s | ACC | 8 | 1 | 133 | 12.0 | 134 | 12.0 | ACC | 8 | 1 | 126 | 12.7 | 126 | 12.7 |
618.tealeaf_s | ACC | 8 | 1 | 616 | 3.33 | 616 | 3.33 | ACC | 8 | 1 | 532 | 3.85 | 532 | 3.86 |
619.clvleaf_s | ACC | 8 | 1 | 182 | 9.07 | 182 | 9.06 | ACC | 8 | 1 | 182 | 9.07 | 182 | 9.06 |
621.miniswp_s | ACC | 8 | 1 | 176 | 6.23 | 177 | 6.22 | ACC | 8 | 1 | 143 | 7.70 | 143 | 7.68 |
628.pot3d_s | ACC | 8 | 1 | 208 | 8.06 | 208 | 8.05 | ACC | 8 | 1 | 208 | 8.06 | 208 | 8.06 |
632.sph_exa_s | ACC | 8 | 1 | 639 | 3.60 | 641 | 3.59 | ACC | 8 | 1 | 639 | 3.60 | 641 | 3.59 |
634.hpgmgfv_s | ACC | 8 | 1 | 291 | 3.36 | 291 | 3.35 | ACC | 8 | 1 | 248 | 3.93 | 248 | 3.93 |
635.weather_s | ACC | 8 | 1 | 92.3 | 28.2 | 92.4 | 28.2 | ACC | 8 | 1 | 92.3 | 28.2 | 92.4 | 28.2 |
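
Each SPEC ratio is the benchmark's reference time divided by the measured time, and the overall SPEChpc 2021_sml_base and _sml_peak metrics are the geometric mean of one selected ratio per benchmark. The sketch below recomputes both metrics from the ratios in the results table. It assumes that, with two runs, the lower (slower) ratio of each pair is the one selected; the bold median markers did not survive extraction and the published ratios are rounded, so the recomputed values only agree with 7.78 and 8.30 to within a few hundredths. The names `BASE`, `PEAK` and `overall_score` are illustrative.

```python
import math

# Per-benchmark ratios (run 1, run 2) copied from the results table.
BASE = {
    "605.lbm_s": (17.1, 17.2),
    "613.soma_s": (12.0, 12.0),
    "618.tealeaf_s": (3.33, 3.33),
    "619.clvleaf_s": (9.07, 9.06),
    "621.miniswp_s": (6.23, 6.22),
    "628.pot3d_s": (8.06, 8.05),
    "632.sph_exa_s": (3.60, 3.59),
    "634.hpgmgfv_s": (3.36, 3.35),
    "635.weather_s": (28.2, 28.2),
}
PEAK = {
    "605.lbm_s": (17.1, 17.2),
    "613.soma_s": (12.7, 12.7),
    "618.tealeaf_s": (3.85, 3.86),
    "619.clvleaf_s": (9.07, 9.06),
    "621.miniswp_s": (7.70, 7.68),
    "628.pot3d_s": (8.06, 8.06),
    "632.sph_exa_s": (3.60, 3.59),
    "634.hpgmgfv_s": (3.93, 3.93),
    "635.weather_s": (28.2, 28.2),
}


def overall_score(runs: dict[str, tuple[float, float]]) -> float:
    """Geometric mean of one selected ratio per benchmark.

    Assumption: with two runs, the lower (slower) ratio is the one selected.
    """
    selected = [min(pair) for pair in runs.values()]
    log_sum = sum(math.log(r) for r in selected)
    return math.exp(log_sum / len(selected))


if __name__ == "__main__":
    print(f"base ~ {overall_score(BASE):.2f}")  # published: 7.78
    print(f"peak ~ {overall_score(PEAK):.2f}")  # published: 8.30
```

Averaging logarithms rather than multiplying nine ratios directly is the usual way to compute a geometric mean without overflow; for nine values either approach works.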
Hardware Summary | |
---|---|
Type of System: | SMP |
Compute Node: | DGX A100 |
Interconnect: | None |
Compute Nodes Used: | 1 |
Total Chips: | 2 |
Total Cores: | 128 |
Total Threads: | 256 |
Total Memory: | 2 TB |
Max. Peak Threads: | 1 |
Software Summary | |
---|---|
Compiler: | C/C++/Fortran: Version 21.9 of NVIDIA HPC SDK for Linux |
MPI Library: | OpenMPI Version 4.0.5 |
Other MPI Info: | None |
Other Software: | None |
Base Parallel Model: | ACC |
Base Ranks Run: | 8 |
Base Threads Run: | 1 |
Peak Parallel Models: | ACC |
Minimum Peak Ranks: | 8 |
Maximum Peak Ranks: | 8 |
Max. Peak Threads: | 1 |
Min. Peak Threads: | 1 |
Hardware | |
---|---|
Number of nodes: | 1 |
Uses of the node: | compute |
Vendor: | NVIDIA Corporation |
Model: | DGX A100 |
CPU Name: | AMD EPYC 7742 |
CPU(s) orderable: | 2 chips |
Chips enabled: | 2 |
Cores enabled: | 128 |
Cores per chip: | 64 |
Threads per core: | 2 |
CPU Characteristics: | Turbo Boost up to 3400MHz |
CPU MHz: | 2250 |
Primary Cache: | 32 KB I + 32 KB D on chip per core |
Secondary Cache: | 512 KB I+D on chip per core |
L3 Cache: | 256 MB I+D on chip per chip, 16 MB shared / 4 cores |
Other Cache: | None |
Memory: | 2 TB (32 x 64 GB 2Rx8 PC4-3200AA-R) |
Disk Subsystem: | OS: 2 TB U.2 NVMe SSD drive; Internal storage: 30 TB (8 x 3.84 TB U.2 NVMe SSD drives) |
Other Hardware: | None |
Accel Count: | 8 |
Accel Model: | Tesla A100-SXM-80GB |
Accel Vendor: | NVIDIA Corporation |
Accel Type: | GPU |
Accel Connection: | NVLink 3.0, NVSwitch 2.0, 600 GB/s |
Accel ECC enabled: | Yes |
Accel Description: | See Notes |
Adapter: | None |
Number of Adapters: | 0 |
Slot Type: | None |
Data Rate: | None |
Ports Used: | 0 |
Interconnect Type: | None |
Software | |
---|---|
Accelerator Driver: | NVIDIA UNIX x86_64 Kernel Module 470.57.02 |
Adapter: | None |
Adapter Driver: | None |
Adapter Firmware: | None |
Operating System: | Ubuntu 20.04 (kernel 4.12.14-94.41-default) |
Local File System: | xfs |
Shared File System: | None |
System State: | Run level 3 (multi-user) |
Other Software: | None |
Hardware | |
---|---|
Vendor: | N/A |
Model: | N/A |
Switch Model: | N/A |
Number of Switches: | 0 |
Number of Ports: | 0 |
Data Rate: | 0 |
Firmware: | 0 |
Topology: | N/A |
Primary Use: | N/A |
Software |
---|
Binaries were built and run within an NVHPC SDK 21.9, CUDA 11.4, Ubuntu 20.04 container, available from NVIDIA's NGC Catalog: https://ngc.nvidia.com/catalog/containers/nvidia:nvhpc
The config file option 'submit' was used. MPI jobs were started with the mpirun command. Individual ranks were bound to the CPU cores on the same NUMA node as their GPU using 'numactl', invoked from the following "bindACC.pl" Perl script:

```perl
my %core_map = (
  0=>48, 1=>56, 2=>16, 3=>24,
  4=>112, 5=>120, 6=>80, 7=>88
);
my %mem_map = (
  0=>3, 1=>3, 2=>1, 3=>1,
  4=>7, 5=>7, 6=>5, 7=>5,
);
my $rank = $ENV{OMPI_COMM_WORLD_LOCAL_RANK};
my $mrank = $rank % 8;
my $cplus = int($rank/8);
my $core = $core_map{$mrank} + $cplus;
my $mem = $mem_map{$mrank};
my $cmd = "numactl -C $core -m $mem ";
while (my $arg = shift) {
  $cmd .= "$arg ";
}
system($cmd);
```
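
The binding arithmetic in bindACC.pl can be traced with a short Python transcription: local rank `r` is pinned to core `core_map[r % 8] + r // 8` and its memory to NUMA node `mem_map[r % 8]`, so ranks 8 to 15 (if a run used them) would take the next core on the same NUMA node. This is only a sketch for reading the mapping; the helper name `numactl_command` is hypothetical, it builds the command string without executing it, and unlike the Perl `while (my $arg = shift)` loop it does not stop early at an argument that is `"0"` or empty.

```python
# Same tables as bindACC.pl: local rank (mod 8) -> CPU core / NUMA memory node.
CORE_MAP = {0: 48, 1: 56, 2: 16, 3: 24, 4: 112, 5: 120, 6: 80, 7: 88}
MEM_MAP = {0: 3, 1: 3, 2: 1, 3: 1, 4: 7, 5: 7, 6: 5, 7: 5}


def numactl_command(local_rank: int, argv: list[str]) -> str:
    """Hypothetical helper mirroring bindACC.pl; returns the command string only."""
    mrank = local_rank % 8   # selects one of the 8 core/memory slots
    cplus = local_rank // 8  # extra core offset for local ranks >= 8
    core = CORE_MAP[mrank] + cplus
    mem = MEM_MAP[mrank]
    # Like the Perl script, every token is followed by a space.
    cmd = f"numactl -C {core} -m {mem} "
    for arg in argv:
        cmd += f"{arg} "
    return cmd


if __name__ == "__main__":
    for r in range(8):
        print(r, numactl_command(r, ["./a.out"]))
```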
Detailed A100 Information from nvaccelinfo | |
---|---|
CUDA Driver Version: | 11040 |
NVRM version: | NVIDIA UNIX x86_64 Kernel Module 470.57.02 |
Device Number: | 0 |
Device Name: | NVIDIA A100-SXM-80GB |
Device Revision Number: | 8.0 |
Global Memory Size: | 85198045184 |
Number of Multiprocessors: | 108 |
Concurrent Copy and Execution: | Yes |
Total Constant Memory: | 65536 |
Total Shared Memory per Block: | 49152 |
Registers per Block: | 65536 |
Warp Size: | 32 |
Maximum Threads per Block: | 1024 |
Maximum Block Dimensions: | 1024, 1024, 64 |
Maximum Grid Dimensions: | 2147483647 x 65535 x 65535 |
Maximum Memory Pitch: | 2147483647B |
Texture Alignment: | 512B |
Clock Rate: | 1410 MHz |
Execution Timeout: | No |
Integrated Device: | No |
Can Map Host Memory: | Yes |
Compute Mode: | default |
Concurrent Kernels: | Yes |
ECC Enabled: | Yes |
Memory Clock Rate: | 1593 MHz |
Memory Bus Width: | 5120 bits |
L2 Cache Size: | 41943040 bytes |
Max Threads Per SMP: | 2048 |
Async Engines: | 3 |
Unified Addressing: | Yes |
Managed Memory: | Yes |
Concurrent Managed Memory: | Yes |
Preemption Supported: | Yes |
Cooperative Launch: | Yes |
Multi-Device: | Yes |
Default Target: | cc80 |
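
Several nvaccelinfo fields convert directly into more familiar units. The sketch below assumes the reported memory clock is transferred at double data rate (two transfers per clock cycle); under that assumption 1593 MHz x 2 x 5120 bits / 8 gives a theoretical peak of about 2039 GB/s per GPU. The byte-to-GiB/MiB conversions and the resident-thread count (SMs x max threads per SM) need no assumptions. All constant and function names are illustrative.

```python
# Values copied from the nvaccelinfo output for device 0.
GLOBAL_MEMORY_BYTES = 85_198_045_184
L2_CACHE_BYTES = 41_943_040
MEMORY_CLOCK_HZ = 1593e6
MEMORY_BUS_WIDTH_BITS = 5120
NUM_SMS = 108
MAX_THREADS_PER_SM = 2048

# Assumption: double data rate, i.e. two transfers per memory clock cycle.
TRANSFERS_PER_CLOCK = 2


def peak_memory_bandwidth_gbs() -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes per second)."""
    bytes_per_transfer = MEMORY_BUS_WIDTH_BITS / 8
    return MEMORY_CLOCK_HZ * TRANSFERS_PER_CLOCK * bytes_per_transfer / 1e9


def global_memory_gib() -> float:
    """Global memory in GiB (2**30 bytes)."""
    return GLOBAL_MEMORY_BYTES / 2**30


def l2_cache_mib() -> float:
    """L2 cache size in MiB (2**20 bytes)."""
    return L2_CACHE_BYTES / 2**20


def max_resident_threads() -> int:
    """Upper bound on simultaneously resident threads on one GPU."""
    return NUM_SMS * MAX_THREADS_PER_SM


if __name__ == "__main__":
    print(f"peak bandwidth ~ {peak_memory_bandwidth_gbs():.0f} GB/s")
    print(f"global memory  ~ {global_memory_gib():.2f} GiB")
    print(f"L2 cache       = {l2_cache_mib():.0f} MiB")
    print(f"resident threads per GPU = {max_resident_threads()}")
```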
Compiler | Benchmarks (base, peak) | Version |
---|---|---|
CC | 605.lbm_s, 613.soma_s, 618.tealeaf_s, 621.miniswp_s, 634.hpgmgfv_s | nvc 21.9-0 64-bit target on x86-64 Linux -tp zen |
CXXC | 632.sph_exa_s | nvc++ 21.9-0 64-bit target on x86-64 Linux -tp zen |
FC | 619.clvleaf_s, 628.pot3d_s, 635.weather_s | nvfortran 21.9-0 64-bit target on x86-64 Linux -tp zen |

Each compiler reports: NVIDIA Compilers and Tools Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Base Portability Flags | |
---|---|
621.miniswp_s: | -DUSE_KBA -DUSE_ACCELDIR |
632.sph_exa_s: | -DSPEC_USE_LT_IN_KERNELS --c++17 |
Base Optimization Flags | |
---|---|
C benchmarks: | -Mfprelaxed -Mnouniform -Mstack_arrays -fast -acc=gpu |
C++ benchmarks: | -Mfprelaxed -Mnouniform -Mstack_arrays -fast -acc=gpu |
Fortran benchmarks: | -Mfprelaxed -Mnouniform -Mstack_arrays -fast -acc=gpu |
Peak Portability Flags | |
---|---|
621.miniswp_s: | -DUSE_KBA -DUSE_ACCELDIR |
632.sph_exa_s: | -DSPEC_USE_LT_IN_KERNELS --c++17 |
Peak Optimization Flags | |
---|---|
605.lbm_s: | basepeak = yes |
613.soma_s: | -fast -O3 -acc=gpu -gpu=pinned |
618.tealeaf_s: | -fast -Msafeptr -acc=gpu |
621.miniswp_s: | -Mfprelaxed -Mnouniform -Mstack_arrays -fast -acc=gpu -gpu=pinned |
634.hpgmgfv_s: | -fast -acc=gpu -gpu=pinned -static-nvidia |
632.sph_exa_s: | basepeak = yes |
619.clvleaf_s: | basepeak = yes |
628.pot3d_s: | -Mstack_arrays -fast -acc=gpu |
635.weather_s: | basepeak = yes |