
Supercomputers — A Complete Study (Aug 23, 2025)

1) Definition
A supercomputer is a computer system designed to deliver the highest computational performance available
at a given time, typically measured in floating‑point operations per second (FLOPS). Modern
leadership‑class machines achieve petascale (10^15 FLOPS) to exascale (≥10^18 FLOPS) performance.
They are built by tightly coupling tens of thousands to millions of compute cores with a high‑speed
interconnect, a parallel storage system, and a specialized software stack for massive parallelism.

Key idea: Supercomputers trade single‑thread speed for extreme parallelism (many nodes + accelerators)
and fast communication.
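
To make the FLOPS framing concrete, the following is a minimal back-of-the-envelope sketch (in C) of
how a theoretical peak is usually estimated from node count, clock, and FLOPs per cycle. Every input is
an assumed round number for illustration, not the specification of any real machine.

    /* Illustrative Rpeak estimate: nodes x cores/node x clock x FLOPs per cycle.
       All inputs are assumed round numbers, for illustration only. */
    #include <stdio.h>

    int main(void) {
        double nodes           = 10000.0;  /* assumed node count                  */
        double cores_per_node  = 128.0;    /* assumed cores per node              */
        double clock_ghz       = 2.0;      /* assumed sustained clock (GHz)       */
        double flops_per_cycle = 32.0;     /* assumed FLOPs/cycle/core (wide FMA) */

        double rpeak_gflops = nodes * cores_per_node * clock_ghz * flops_per_cycle;
        printf("Rpeak ~ %.2f PFLOPS\n", rpeak_gflops / 1e6);  /* GFLOPS -> PFLOPS */
        return 0;
    }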

2) Core Use Cases (What they’re used for)


• Climate & Weather: Global/regional climate modeling, hurricane prediction, monsoon forecasting,
data assimilation.
• Energy & Materials: Nuclear stockpile stewardship, fusion/fission modeling, combustion, batteries,
catalysts, quantum materials.
• Aerospace/Automotive (CFD & Multiphysics): Aerodynamics, aero‑thermal, crash simulations,
noise/vibration/harshness.
• Life Sciences & Health: Genomics, protein folding, molecular dynamics (e.g., drug discovery),
epidemiological models.
• Earth & Space: Seismic imaging, reservoir simulation, astrophysics, cosmology, gravitational waves,
space weather.
• AI/ML at Scale: Foundation model training, large‑scale inference, digital twins that couple HPC
simulations with AI surrogates.
• Security & National Missions: Cryptanalysis, cyber simulation ranges, weapons physics,
supply‑chain/critical‑infrastructure modeling.
• Finance & Industry: Risk Monte Carlo, optimization, demand forecasting, logistics and operations
research.

3) Architecture & Components (How they’re built)

3.1 Compute

• CPUs: Many-core processors (x86/ARM/Power) with wide vector units (AVX-512/SVE) and multi-socket
NUMA designs (a short vectorization sketch follows this list).

• Accelerators (GPUs/AI chips): NVIDIA, AMD, Intel Max; massively parallel SIMT cores + HBM (High
Bandwidth Memory).
• Node types: CPU‑only, GPU‑accelerated, or APUs (CPU+GPU in one package, e.g., AMD MI300A).
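
To illustrate the "wide vector units" point above, here is a short sketch of a loop written so the compiler
can map it onto AVX-512/SVE lanes; the array size and values are arbitrary, and the OpenMP simd pragma is
a standard, portable hint (enabled with e.g. -fopenmp-simd).

    /* AXPY loop written for vectorization: iterations are independent, so the
       compiler can issue wide SIMD instructions. N and the data are arbitrary. */
    #include <stdio.h>

    #define N 1024

    int main(void) {
        static double x[N], y[N];
        double a = 2.0;

        for (int i = 0; i < N; i++) { x[i] = (double)i; y[i] = 1.0; }

        #pragma omp simd            /* hint: this loop is safe to vectorize */
        for (int i = 0; i < N; i++)
            y[i] = a * x[i] + y[i];

        printf("y[N-1] = %.1f\n", y[N - 1]);   /* 2*1023 + 1 = 2047.0 */
        return 0;
    }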

3.2 Memory hierarchy

• On‑package HBM (TB/s bandwidth) → DDR/LPDDR → NVRAM/SSDs → Parallel file system.


• Latency hiding & locality: NUMA awareness, huge pages, cache-blocking (see the sketch after this list),
GPU Unified/Managed memory.
• Emerging: CXL memory pooling; NVLink/NVSwitch fabrics for peer‑to‑peer GPU memory access.
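
The cache-blocking technique mentioned above can be sketched as a tiled matrix multiply; N and BLOCK
below are illustrative values that would be tuned to the real cache sizes of a node.

    /* Blocked (tiled) matrix multiply: work on BLOCK x BLOCK tiles so the
       working set stays cache-resident instead of streaming from DRAM. */
    #include <stdio.h>
    #include <string.h>

    #define N     512
    #define BLOCK 64            /* illustrative tile size */

    static double A[N][N], B[N][N], C[N][N];

    static void dgemm_blocked(void) {
        memset(C, 0, sizeof C);
        for (int ii = 0; ii < N; ii += BLOCK)
            for (int kk = 0; kk < N; kk += BLOCK)
                for (int jj = 0; jj < N; jj += BLOCK)
                    for (int i = ii; i < ii + BLOCK; i++)
                        for (int k = kk; k < kk + BLOCK; k++) {
                            double aik = A[i][k];
                            for (int j = jj; j < jj + BLOCK; j++)
                                C[i][j] += aik * B[k][j];
                        }
    }

    int main(void) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { A[i][j] = 1.0; B[i][j] = 1.0; }
        dgemm_blocked();
        printf("C[0][0] = %.1f (expect %d)\n", C[0][0], N);  /* each entry sums N ones */
        return 0;
    }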

3.3 System interconnect & topology

• Interconnects: HPE Slingshot‑11, NVIDIA InfiniBand (HDR/NDR), custom (e.g., Tofu).


• Topologies: Dragonfly, Fat-Tree, torus/mesh; optimized for low latency and high bisection bandwidth.
• Collectives & RDMA: Offload engines (SHARP‑like), adaptive routing, Quality of Service.
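
A minimal example of the collectives mentioned above: a global sum with MPI_Allreduce, the kind of
operation that offload engines and adaptive routing are built to accelerate. Build and launch details
(mpicc, srun/mpirun) depend on the site's stack.

    /* Global sum across all ranks via MPI_Allreduce; its latency at scale is
       largely set by the interconnect and its collective offload support. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local  = (double)rank;   /* each rank contributes its own value */
        double global = 0.0;

        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %.0f\n", size, global);

        MPI_Finalize();
        return 0;
    }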

3.4 Storage (I/O)

• Parallel file systems: Lustre, IBM Storage Scale/GPFS, DAOS, Weka, EXAScaler.
• Burst buffers & tiers: SSD/NVMe layers in front of disk/object stores; checkpoints & fast restarts.
• I/O patterns: Checkpoint/restart, parallel HDF5/NetCDF, small‑file metadata stress (why IO500
matters).
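
The checkpoint/restart pattern above can be sketched with collective MPI-IO, where every rank writes its
slice of state into one shared file; the file name and per-rank buffer size are placeholders.

    /* Checkpoint sketch: each rank writes its slice of state at its own offset
       in a shared file. The collective call lets the MPI-IO layer aggregate
       requests before they hit the parallel file system (Lustre, GPFS, ...). */
    #include <mpi.h>

    #define COUNT 1024   /* doubles per rank, illustrative */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double state[COUNT];
        for (int i = 0; i < COUNT; i++) state[i] = rank + 0.001 * i;

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        MPI_Offset offset = (MPI_Offset)rank * COUNT * sizeof(double);
        MPI_File_write_at_all(fh, offset, state, COUNT, MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }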

3.5 Software stack

• OS & runtime: Linux variants (e.g., Cray OS, RHEL/TOSS), resource managers (Slurm/PBS/LSF),
containers (Apptainer/Singularity).
• Programming models: MPI, OpenMP, CUDA, HIP/ROCm, SYCL/DPC++, OpenACC, Kokkos/RAJA
portability layers (a hybrid MPI+OpenMP sketch follows this list).
• Math & science libs: BLAS/LAPACK, FFTW, PETSc, Trilinos, MKL/oneMKL, cuBLAS, rocBLAS, MAGMA.
• DevOps: Spack/Conda for builds, CI pipelines, performance profilers (Nsight, ROCm tools, VTune,
TAU, CrayPAT).
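
A minimal sketch of how the programming-model layers combine in practice: MPI ranks across nodes,
OpenMP threads within each node. The launcher, rank count, and OMP_NUM_THREADS setting are
site-specific and not shown here.

    /* Hybrid MPI + OpenMP "hello": each rank reports how many threads it runs. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided;
        /* Request a threading level compatible with OpenMP parallel regions. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            #pragma omp single
            printf("rank %d running %d OpenMP threads\n", rank, omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }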

3.6 Facility engineering

• Cooling: Direct liquid cooling (DLC), warm‑water loops; some immersion pilots. Thermal design is as
critical as FLOPS.
• Power: 10–30+ MW is typical for top systems; power capping, DVFS, and node power telemetry (a rough energy-cost sketch follows this list).
• Resilience & security: Checkpointing, redundant fabrics, isolation policies, zero‑trust/HPC security
guidance.
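
To give the power numbers a sense of scale, here is a rough energy-cost sketch; the 20 MW draw and the
electricity price are assumptions for illustration, not figures for any real facility.

    /* Back-of-the-envelope facility energy: sustained MW -> annual MWh -> cost.
       Both inputs are assumed values for illustration only. */
    #include <stdio.h>

    int main(void) {
        double draw_mw        = 20.0;          /* assumed average draw (MW)           */
        double hours_per_year = 24.0 * 365.0;
        double price_per_mwh  = 80.0;          /* assumed electricity price (USD/MWh) */

        double mwh_per_year  = draw_mw * hours_per_year;
        double cost_per_year = mwh_per_year * price_per_mwh;

        printf("%.0f MWh/year, ~%.1f million USD/year\n",
               mwh_per_year, cost_per_year / 1e6);
        return 0;
    }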

4) Performance Metrics (How to read/compare systems)


• HPL / LINPACK (TOP500): Solves dense linear systems; reports Rmax (sustained) vs Rpeak
(theoretical). The standard ranking.
• HPCG: Sparse, memory‑bound; better reflects many real apps than HPL’s compute‑bound kernel.

• HPL‑MxP / “HPL‑AI”: Mixed‑precision HPL that exercises tensor cores/AI features; highlights AI‑tilted
throughput.
• Graph500: Data‑intensive breadth‑first search on large graphs; measures traversed edges per
second (TEPS).
• IO500: Storage bandwidth + metadata performance; ranks production and research storage stacks.
• Green500: Energy efficiency (GFLOPS/W); crucial for TCO and sustainability.

Interview tip: Quote Rmax, Rpeak, cores/accelerators, interconnect, power (MW), and optionally HPCG/
Green500 rank to sound complete. Note that Rmax/Rpeak efficiency indicates how well a system turns
silicon into usable performance.
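
A quick worked example of that efficiency check, using the El Capitan figures quoted in Section 6 (Rmax
1,742 PF, Rpeak 2,746 PF, ~29.6 MW); the GFLOPS/W number is only a rough estimate of the Green500-style
metric, which is based on power measured during the HPL run.

    /* HPL efficiency (Rmax/Rpeak) and rough energy efficiency (GFLOPS/W),
       using the El Capitan figures cited later in this guide. */
    #include <stdio.h>

    int main(void) {
        double rmax_pf  = 1742.0;   /* sustained HPL (PFLOPS)    */
        double rpeak_pf = 2746.0;   /* theoretical peak (PFLOPS) */
        double power_mw = 29.6;     /* reported power (MW)       */

        double efficiency      = rmax_pf / rpeak_pf;
        double gflops_per_watt = (rmax_pf * 1e6) / (power_mw * 1e6);

        printf("HPL efficiency: %.1f%%\n", efficiency * 100.0);          /* ~63.4% */
        printf("Energy efficiency: ~%.1f GFLOPS/W\n", gflops_per_watt);  /* ~58.9  */
        return 0;
    }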

5) Types / Architectural Families


• Heterogeneous CPU+GPU: Dominant at the top end (e.g., AMD/NVIDIA/Intel GPUs paired with CPUs
via Slingshot or InfiniBand).
• CPU‑only vector‑heavy nodes: Large core counts + wide vectors (e.g., A64FX SVE vectors in Fugaku).
• APU‑centric nodes: Unified CPU+GPU memory (e.g., MI300A) simplifies programming and improves
locality.
• Cloud‑scale supercomputers: Hyperscaler clusters (Azure “Eagle”) using homogeneous GPU fleets
with elastic growth.
• Classic MPP clusters: Commodity CPU nodes with high‑speed interconnect; still common in
mid‑range HPC.
• Specialized fabrics/topologies: Custom interconnects (Tofu, proprietary routing) tuned to app
communication patterns.

6) Top 10 Supercomputers (June 2025 TOP500) — in decreasing order

Each entry lists the essentials you should be ready to cite in an interview: Rmax/Rpeak, cores, CPU/GPU &
interconnect, power, site/country, and one notable fact.

#1 El Capitan (LLNL, USA)

• Performance: Rmax 1,742 PF (1.742 EF); Rpeak 2,746 PF.


• Scale: 11,039,616 total cores.
• Architecture: HPE Cray EX255a with AMD 4th‑gen EPYC 24‑core CPUs + AMD Instinct MI300A
(APU) accelerators; Slingshot‑11 interconnect.
• Power: ~29.6 MW; strong energy efficiency (GF/W) for its class.
• Notable: Also #1 on HPCG (≈17.41 PF), indicating strong memory-bound performance; the third
system to reach exascale on HPL.

#2 Frontier (ORNL, USA)

• Performance: Rmax 1,353 PF (1.353 EF); Rpeak 2,055.72 PF.

• Scale: 9,066,176 cores.
• Architecture: HPE Cray EX235a; AMD EPYC (64C, 3rd‑gen) + AMD Instinct MI250X GPUs;
Slingshot‑11.
• Power: ~24.6 MW.
• Notable: First U.S. exascale‑class system; long‑running production for open science.

#3 Aurora (Argonne, USA)

• Performance: Rmax 1,012 PF (1.012 EF); Rpeak 1,980.01 PF.


• Scale: 9,264,128 cores.
• Architecture: HPE Cray EX – Intel Exascale Compute Blade; Intel Xeon CPU Max 9470 + Intel
Data Center GPU Max; Slingshot‑11.
• Power: ~38.7 MW.
• Notable: Intel GPU‑centric exascale architecture; key platform for exascale science + AI.

#4 JUPITER Booster (EuroHPC/JSC, Germany)

• Performance: Rmax 793.4 PF; Rpeak 930 PF.


• Scale: 4,801,344 cores.
• Architecture: Eviden BullSequana XH3000; NVIDIA GH200 Superchip nodes; quad‑rail NVIDIA
InfiniBand NDR200; RHEL.
• Power: ~13.1 MW.
• Notable: Preliminary/partial system during commissioning; Europe’s first exascale‑class program.

#5 Eagle (Microsoft Azure, USA)

• Performance: Rmax 561.2 PF; Rpeak 846.84 PF.


• Scale: 2,073,600 cores.
• Architecture: Azure NDv5: Intel Xeon Platinum 8480C + NVIDIA H100; NVIDIA InfiniBand NDR
fabric.
• Power: Not publicly listed in the TOP500 table.
• Notable: Cloud-scale supercomputer; demonstrates that a hyperscaler GPU fleet can place near the top of the TOP500.

#6 HPC6 (Eni S.p.A., Italy)

• Performance: Rmax 477.9 PF; Rpeak 606.97 PF.


• Scale: 3,143,520 cores.
• Architecture: HPE Cray EX235a; AMD EPYC (64C) + AMD Instinct MI250X; Slingshot‑11; RHEL 8.9.
• Power: ~8.46 MW.
• Notable: Industrial energy supercomputing (large‑scale reservoir/geoscience workloads typical for
the sector).

#7 Supercomputer Fugaku (RIKEN R‑CCS, Japan)

• Performance: Rmax 442.01 PF; Rpeak 537.21 PF.


• Scale: 7,630,848 cores.
• Architecture: Fujitsu A64FX (48C, SVE) CPU‑only; Tofu‑D custom interconnect.
• Power: ~29.9 MW.

• Notable: Long‑standing leadership system; repeated #1 on Graph500 (data‑intensive) and high
HPCG ranking.

#8 Alps (CSCS, Switzerland)

• Performance: Rmax 434.9 PF; Rpeak 574.84 PF.


• Scale: 2,121,600 cores.
• Architecture: HPE Cray EX254n; NVIDIA Grace (72C) + GH200 nodes; Slingshot‑11; Cray OS.
• Power: ~7.12 MW.
• Notable: Grace Hopper superchip architecture aimed at AI/HPC convergence.

#9 LUMI (EuroHPC/CSC, Finland)

• Performance: Rmax 379.7 PF; Rpeak 531.51 PF.


• Scale: 2,752,704 cores.
• Architecture: HPE Cray EX235a; AMD EPYC (64C) + AMD Instinct MI250X; Slingshot‑11.
• Power: ~7.11 MW.
• Notable: One of Europe’s flagship GPU‑accelerated systems for open science.

#10 Leonardo (EuroHPC/CINECA, Italy)

• Performance: Rmax 241.2 PF; Rpeak 306.31 PF.


• Scale: 1,824,768 cores.
• Architecture: Eviden BullSequana XH2000; Intel Xeon 8358 + NVIDIA A100 (SXM4, 64 GB);
quad‑rail HDR100 InfiniBand.
• Power: ~7.49 MW.
• Notable: Balanced CPU+GPU design with strong interconnect provisioning (quad‑rail) for
bandwidth‑hungry apps.

Reminder: Rankings and figures above reflect the June 2025 TOP500 release.

7) How to think about performance (practical heuristics)


• Is your app compute‑bound or memory‑/I/O‑bound? Choose HPL vs HPCG/IO500 analogies
accordingly.
• Strong vs. weak scaling: Strong scaling keeps the total problem size fixed (work per node shrinks as
nodes are added); weak scaling keeps the per-node size fixed (the global problem grows with node count).
• Roofline mental model: FLOP/s is limited by either peak compute or memory bandwidth; accelerators
raise both ceilings (see the sketch after this list).
• Efficiency check: Rmax/Rpeak (HPL) and power (GF/W) hint at how well the machine is utilized and
cooled.
• Interconnect matters: Collectives, bisection bandwidth, and topology awareness (e.g., process
placement) often gate scaling.
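
A compact sketch of the roofline heuristic from the list above: attainable FLOP/s is the minimum of peak
compute and memory bandwidth times arithmetic intensity. The peak and bandwidth figures are assumed
round numbers, not the specs of any particular accelerator.

    /* Roofline: attainable GFLOPS = min(peak, bandwidth * arithmetic intensity). */
    #include <stdio.h>

    static double roofline_gflops(double peak_gflops, double bw_gbs, double intensity) {
        double memory_bound = bw_gbs * intensity;   /* FLOPs limited by data traffic */
        return memory_bound < peak_gflops ? memory_bound : peak_gflops;
    }

    int main(void) {
        double peak = 50000.0;   /* assumed accelerator peak: 50 TFLOPS */
        double bw   = 3000.0;    /* assumed HBM bandwidth: 3 TB/s       */

        /* Low intensity (stencil/sparse-like) vs high intensity (dense GEMM-like). */
        printf("AI = 0.5 FLOP/byte -> %.0f GFLOPS (memory-bound)\n",
               roofline_gflops(peak, bw, 0.5));
        printf("AI = 50  FLOP/byte -> %.0f GFLOPS (compute-bound)\n",
               roofline_gflops(peak, bw, 50.0));
        return 0;
    }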

8) Interview cheat‑sheet answers
• Definition: “A supercomputer is a massively parallel computer achieving petascale-to-exascale
FLOPS, built from thousands of nodes, high‑speed interconnects, parallel storage, and a software
stack for distributed/accelerated computing.”
• Use cases: Climate/CFD/materials/MD/seismic/genomics/AI at scale.
• Components: Compute nodes (CPU/GPU/APU), HBM+DDR, Slingshot/InfiniBand/Tofu, Lustre/GPFS/
DAOS, MPI/OpenMP/CUDA/HIP/SYCL, Slurm, DLC cooling.
• Metrics: HPL (Rmax/Rpeak), HPCG, HPL‑MxP, Graph500, IO500, Green500; always mention power
and efficiency.
• Types: Heterogeneous GPU+CPU, CPU‑only vector‑heavy, APU nodes, cloud supercomputers, classic
MPP clusters; custom interconnects.
• Current Top 3 (Jun’25): #1 El Capitan (LLNL), #2 Frontier (ORNL), #3 Aurora (ANL). Be ready to quote
their Rmax and architectures.

9) Glossary
• Exascale: ≥10^18 FLOPS sustained on HPL.
• Rmax/Rpeak: Sustained/theoretical HPL performance.
• HBM: High Bandwidth Memory, on‑package DRAM for GPUs/CPUs.
• SIMD/SIMT: Vectorization (CPU) / GPU thread execution model.
• Collectives: MPI operations (all‑reduce/bcast) sensitive to fabric quality.
• Burst buffer: Fast SSD tier for checkpoints between memory and disk.

10) Further study (what to practice)


• Map one of your workloads onto two architectures (CPU‑only vs GPU‑accelerated) and explain the
tradeoffs.
• Draw the node‑to‑node data path (HBM↔GPU↔CPU↔NIC↔switch) and annotate bottlenecks.
• Use the roofline model to justify where optimization time should go (vectorization vs memory
locality vs communication).
• Compare two interconnects (Slingshot‑11 vs NDR InfiniBand) for collectives‑heavy apps.
• Read one IO500 entry and summarize the storage stack (FS, clients, bandwidth, metadata).

End of study guide.
