An important milestone for NERSC, the Department of Energy (DOE), HPE, and the HPC community

Perlmutter is the largest HPE Cray EX supercomputer shipped to date. It is the first in a series of HPE Cray EX supercomputers to be delivered to the DOE for important research in areas such as climate modeling, early-universe simulations, cancer research, high-energy physics, protein structure, materials science, and more. Over the next few years, HPE will deliver three exascale supercomputers to the DOE, culminating in the two-exaflop El Capitan system for Lawrence Livermore National Laboratory.

The HPE Cray EX supercomputer is HPE’s first exascale-class HPC system, and Perlmutter uses many of its exascale-era innovations. Designed from the ground up for converged HPC and AI workloads, the HPE Cray EX supports a heterogeneous computing model that allows compute blades with different CPU and GPU architectures to be mixed in the same system. This capability is important for NERSC’s usage models.

Perlmutter will be a heterogeneous system with both GPU-accelerated and CPU-only nodes, so it can serve NERSC users whose applications run on GPUs as well as those whose applications do not.

Phase 1 is the GPU-accelerated portion of Perlmutter. It contains 12 HPE Cray EX cabinets holding 1,500 AMD EPYC nodes, each with four NVIDIA A100 GPUs, for a total of 6,000 GPUs. Phase 2, due later this year, will add 3,000 dual-socket, CPU-only AMD EPYC nodes housed in 12 additional cabinets.

Perlmutter takes advantage of the HPE Slingshot interconnect, whose intelligent features let diverse workloads run simultaneously across the system. Slingshot provides novel adaptive routing, quality-of-service, and congestion-management features while retaining full Ethernet compatibility. Ethernet compatibility matters for NERSC’s use of the supercomputer because it opens new paths for connecting the system more broadly, both to NERSC’s internal file systems and to the external internet.

Perlmutter also connects directly to the Cray ClusterStor E1000, an all-flash, Lustre-based storage system. The ClusterStor E1000 is the fastest storage system of its kind, transferring data at more than 5 terabytes per second. With 35 petabytes of usable storage, the Perlmutter deployment is currently the largest of its kind in the world.

