Supercomputers — A Complete Study (Aug 23, 2025)
1) Definition
A supercomputer is a computer system designed to deliver the highest computational performance available
at a given time, typically measured in floating‑point operations per second (FLOPS). Modern
leadership‑class machines achieve petascale (10^15 FLOPS) to exascale (≥10^18 FLOPS) performance.
They are built by tightly coupling tens of thousands to millions of compute cores with a high‑speed
interconnect, a parallel storage system, and a specialized software stack for massive parallelism.
Key idea: Supercomputers trade single‑thread speed for extreme parallelism (many nodes + accelerators)
and fast communication.
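To make these scales concrete, a rough back‑of‑envelope sketch (the per‑node throughput below is an illustrative assumption, not a figure for any specific machine):

```python
# Rough arithmetic: what "exascale" implies about parallelism.
# All figures are illustrative assumptions, not real system specs.

PER_NODE_TFLOPS = 100.0   # assumed sustained throughput of one GPU-accelerated node (TFLOP/s)
TARGET_EFLOPS = 1.0       # exascale target: 1 EFLOP/s = 10**18 FLOP/s

target_flops = TARGET_EFLOPS * 1e18
per_node_flops = PER_NODE_TFLOPS * 1e12

nodes_needed = target_flops / per_node_flops
print(f"Nodes needed at {PER_NODE_TFLOPS} TFLOP/s each: {nodes_needed:,.0f}")
# -> 10,000 nodes: hence the emphasis on fast interconnects and parallel software.
```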
2) Core Use Cases (What they’re used for)
     • Climate & Weather: Global/regional climate modeling, hurricane prediction, monsoon forecasting,
       data assimilation.
     • Energy & Materials: Nuclear stockpile stewardship, fusion/fission modeling, combustion, batteries,
       catalysts, quantum materials.
     • Aerospace/Automotive (CFD & Multiphysics): Aerodynamics, aero‑thermal, crash simulations,
       noise/vibration/harshness.
     • Life Sciences & Health: Genomics, protein folding, molecular dynamics (e.g., drug discovery),
       epidemiological models.
     • Earth & Space: Seismic imaging, reservoir simulation, astrophysics, cosmology, gravitational waves,
       space weather.
     • AI/ML at Scale: Foundation model training, large‑scale inference, digital twins that couple HPC
       simulations with AI surrogates.
     • Security & National Missions: Cryptanalysis, cyber simulation ranges, weapons physics,
       supply‑chain/critical‑infrastructure modeling.
     • Finance & Industry: Risk Monte Carlo, optimization, demand forecasting, logistics and operations
       research.
3) Architecture & Components (How they’re built)
3.1 Compute
     • CPUs: Many‑core processors (x86/ARM/Power) with wide vector units (AVX‑512/SVE) and multi‑socket
       NUMA designs.
     • Accelerators (GPUs/AI chips): NVIDIA, AMD Instinct, Intel Data Center GPU Max; massively parallel SIMT cores + HBM (High Bandwidth Memory).
    • Node types: CPU‑only, GPU‑accelerated, or APUs (CPU+GPU in one package, e.g., AMD MI300A).
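A minimal sketch of how theoretical peak follows from the core and vector‑unit parameters above; the socket count, clock, and FMA figures are assumptions for a hypothetical CPU node, not any particular product:

```python
# Theoretical peak FLOP/s of a hypothetical dual-socket CPU node.
# Formula: sockets * cores * clock * SIMD lanes * FMA units * 2 (an FMA counts as 2 FLOPs).
# All numbers below are assumptions for illustration.

sockets         = 2
cores           = 64     # cores per socket
clock_ghz       = 2.0    # sustained clock (GHz)
simd_lanes_fp64 = 8      # e.g., 512-bit vectors holding 64-bit doubles
fma_units       = 2      # FMA pipes per core
flops_per_fma   = 2      # multiply + add

peak_gflops = sockets * cores * clock_ghz * simd_lanes_fp64 * fma_units * flops_per_fma
print(f"Peak: {peak_gflops / 1000:.1f} TFLOP/s (FP64)")  # ~8.2 TFLOP/s for these assumptions
```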
3.2 Memory hierarchy
    • On‑package HBM (TB/s bandwidth) → DDR/LPDDR → NVRAM/SSDs → Parallel file system.
    • Latency hiding & locality: NUMA awareness, huge pages, cache‑blocking, GPU Unified/Managed
      memory.
    • Emerging: CXL memory pooling; NVLink/NVSwitch fabrics for peer‑to‑peer GPU memory access.
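As a small illustration of the cache‑blocking/locality idea above, a tiled matrix multiply that works on cache‑sized blocks (block and matrix sizes are arbitrary choices for the sketch):

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """Tiled matrix multiply: operate on cache-sized blocks to improve locality."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, block):
        for k in range(0, n, block):
            for j in range(0, n, block):
                # Each partial product touches only three block-sized tiles.
                C[i:i+block, j:j+block] += A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]
    return C

A = np.random.rand(512, 512)
B = np.random.rand(512, 512)
assert np.allclose(blocked_matmul(A, B), A @ B)  # same result, friendlier memory access pattern
```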
3.3 System interconnect & topology
    • Interconnects: HPE Slingshot‑11, NVIDIA InfiniBand (HDR/NDR), custom (e.g., Tofu).
    • Topologies: Dragonfly, Fat‑Tree, torus/mesh; optimize for low‑latency, high‑bisection bandwidth.
    • Collectives & RDMA: Offload engines (SHARP‑like), adaptive routing, Quality of Service.
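A minimal collective‑communication sketch with mpi4py, assuming an MPI library and the mpi4py package are installed (launched with, e.g., mpirun -n 4 python allreduce.py):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank holds a piece of the data; an all-reduce sums it across the fabric.
local = np.full(4, rank, dtype=np.float64)
total = np.empty_like(local)
comm.Allreduce(local, total, op=MPI.SUM)  # the latency/bandwidth of this step is what fabrics optimize

if rank == 0:
    print("Reduced vector:", total)
```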
3.4 Storage (I/O)
    • Parallel file systems: Lustre, IBM Storage Scale/GPFS, DAOS, Weka, EXAScaler.
    • Burst buffers & tiers: SSD/NVMe layers in front of disk/object stores; checkpoints & fast restarts.
    • I/O patterns: Checkpoint/restart, parallel HDF5/NetCDF, small‑file metadata stress (why IO500
      matters).
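A toy checkpoint/restart pattern in the spirit of the I/O bullets above; the directory path and file naming are invented for illustration, and production codes typically use parallel HDF5/NetCDF rather than per‑rank NumPy files:

```python
import os
import numpy as np

CKPT_DIR = "/scratch/burst_buffer/myjob"  # hypothetical fast-tier path

def checkpoint(step, state, rank):
    """Write this rank's state; a real code would also fsync and rotate old checkpoints."""
    os.makedirs(CKPT_DIR, exist_ok=True)
    np.save(os.path.join(CKPT_DIR, f"state_rank{rank}_step{step}.npy"), state)

def _step_of(fname):
    return int(fname.rsplit("_step", 1)[1].split(".")[0])

def restart(rank):
    """Return (step, state) from the newest checkpoint for this rank, or a fresh start."""
    try:
        files = [f for f in os.listdir(CKPT_DIR) if f.startswith(f"state_rank{rank}_")]
    except FileNotFoundError:
        files = []
    if not files:
        return 0, np.zeros(1024)           # no checkpoint yet: start fresh
    latest = max(files, key=_step_of)      # newest by step number, not by filename order
    return _step_of(latest), np.load(os.path.join(CKPT_DIR, latest))
```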
3.5 Software stack
    • OS & runtime: Linux variants (e.g., Cray OS, RHEL/TOSS), resource managers (Slurm/PBS/LSF),
      containers (Apptainer/Singularity).
    • Programming models: MPI, OpenMP, CUDA, HIP/ROCm, SYCL/DPC++, OpenACC, Kokkos/RAJA
      portability layers.
    • Math & science libs: BLAS/LAPACK, FFTW, PETSc, Trilinos, MKL/oneMKL, cuBLAS, rocBLAS, MAGMA.
    • DevOps: Spack/Conda for builds, CI pipelines, performance profilers (Nsight, ROCm tools, VTune,
      TAU, CrayPAT).
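A small sketch of how an application can discover its place in a Slurm allocation from Slurm's standard environment variables (SLURM_PROCID, SLURM_NTASKS, SLURM_CPUS_PER_TASK); the fallback defaults are assumptions so the script also runs outside a batch job:

```python
import os

# Slurm exports these to every task it launches; fall back to single-task defaults
# so the same script also runs on a laptop.
rank   = int(os.environ.get("SLURM_PROCID", 0))        # this task's global index
ntasks = int(os.environ.get("SLURM_NTASKS", 1))        # total tasks in the job step
cpus   = int(os.environ.get("SLURM_CPUS_PER_TASK", 1)) # CPUs assigned to this task

# A common pattern: pin the per-task thread count for OpenMP/BLAS before heavy imports.
os.environ.setdefault("OMP_NUM_THREADS", str(cpus))

print(f"task {rank}/{ntasks} with {cpus} CPUs per task")
```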
3.6 Facility engineering
    • Cooling: Direct liquid cooling (DLC), warm‑water loops; some immersion pilots. Thermal design is as
      critical as FLOPS.
    • Power: 10–30+ MW typical for top systems; power capping, DVFS, node power telemetry.
    • Resilience & security: Checkpointing, redundant fabrics, isolation policies, zero‑trust/HPC security
      guidance.
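For the checkpointing bullet, a quick worked formula: Young's approximation puts the optimal checkpoint interval near sqrt(2 × checkpoint cost × MTBF). The cost and MTBF inputs below are made‑up values for illustration:

```python
import math

# Young's approximation: checkpoint interval ~ sqrt(2 * C * M), where
#   C = time to write one checkpoint, M = mean time between failures (MTBF).
# Input values are illustrative assumptions.

checkpoint_cost_s = 300.0        # 5 minutes to flush state to the burst buffer
mtbf_s            = 24 * 3600.0  # one failure per day across the job's nodes

interval_s = math.sqrt(2 * checkpoint_cost_s * mtbf_s)
print(f"Checkpoint roughly every {interval_s / 3600:.1f} h")  # ~2.0 h for these inputs
```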
4) Performance Metrics (How to read/compare systems)
     • HPL / LINPACK (TOP500): Solves a large dense linear system; reports Rmax (sustained) vs Rpeak
       (theoretical peak). The standard TOP500 ranking benchmark.
    • HPCG: Sparse, memory‑bound; better reflects many real apps than HPL’s compute‑bound kernel.
     • HPL‑MxP / “HPL‑AI”: Mixed‑precision HPL that exercises tensor cores/AI features; highlights AI‑tilted
       throughput.
     • Graph500: Data‑intensive breadth‑first search on large graphs; measures traversed edges per
       second (TEPS).
     • IO500: Storage bandwidth + metadata performance; ranks production and research storage stacks.
     • Green500: Energy efficiency (GFLOPS/W); crucial for TCO and sustainability.
Interview tip: Quote Rmax, Rpeak, cores/accelerators, interconnect, power (MW), and optionally HPCG/
Green500 rank to sound complete. Note that Rmax/Rpeak efficiency indicates how well a system turns
silicon into usable performance.
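Worked with the El Capitan figures cited in section 6 (Rmax 1,742 PF, Rpeak 2,746 PF, ~29.6 MW), the efficiency arithmetic looks like this:

```python
# HPL efficiency and energy efficiency from published-style numbers.
# Figures are the approximate El Capitan values cited in section 6.

rmax_pflops  = 1742.0  # sustained HPL (PFLOP/s)
rpeak_pflops = 2746.0  # theoretical peak (PFLOP/s)
power_mw     = 29.6    # system power (MW)

hpl_efficiency  = rmax_pflops / rpeak_pflops               # fraction of peak actually delivered
gflops_per_watt = (rmax_pflops * 1e6) / (power_mw * 1e6)   # PFLOP/s -> GFLOP/s, MW -> W

print(f"Rmax/Rpeak efficiency: {hpl_efficiency:.1%}")            # ~63%
print(f"Energy efficiency:     {gflops_per_watt:.1f} GFLOPS/W")  # ~59 GFLOPS/W
```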
5) Types / Architectural Families
     • Heterogeneous CPU+GPU: Dominant at the top end (e.g., AMD/NVIDIA/Intel GPUs paired with CPUs
       via Slingshot or InfiniBand).
      • CPU‑only vector‑heavy nodes: Large core counts + wide vectors (e.g., Fugaku's A64FX with SVE).
     • APU‑centric nodes: Unified CPU+GPU memory (e.g., MI300A) simplifies programming and improves
       locality.
     • Cloud‑scale supercomputers: Hyperscaler clusters (Azure “Eagle”) using homogeneous GPU fleets
       with elastic growth.
     • Classic MPP clusters: Commodity CPU nodes with high‑speed interconnect; still common in
       mid‑range HPC.
     • Specialized fabrics/topologies: Custom interconnects (Tofu, proprietary routing) tuned to app
       communication patterns.
6) Top 10 Supercomputers (June 2025 TOP500) — in decreasing order
Each entry lists the essentials you should be ready to cite in an interview: Rmax/Rpeak, cores, CPU/GPU &
interconnect, power, site/country, and one notable fact.
#1 El Capitan (LLNL, USA)
     • Performance: Rmax 1,742 PF (1.742 EF); Rpeak 2,746 PF.
     • Scale: 11,039,616 total cores.
     • Architecture: HPE Cray EX255a with AMD 4th‑gen EPYC 24‑core CPUs + AMD Instinct MI300A
       (APU) accelerators; Slingshot‑11 interconnect.
     • Power: ~29.6 MW; strong energy efficiency (GF/W) for its class.
     • Notable: Also #1 on HPCG (≈17.41 PF), reflecting strong memory‑bound performance; the third
       system to record an exascale HPL result.
#2 Frontier (ORNL, USA)
     • Performance: Rmax 1,353 PF (1.353 EF); Rpeak 2,055.72 PF.
     • Scale: 9,066,176 cores.
     • Architecture: HPE Cray EX235a; AMD EPYC (64C, 3rd‑gen) + AMD Instinct MI250X GPUs;
       Slingshot‑11.
     • Power: ~24.6 MW.
     • Notable: The world's first exascale system on the TOP500 (debuted at #1 in June 2022); long‑running production machine for open science.
#3 Aurora (Argonne, USA)
     • Performance: Rmax 1,012 PF (1.012 EF); Rpeak 1,980.01 PF.
     • Scale: 9,264,128 cores.
     • Architecture: HPE Cray EX – Intel Exascale Compute Blade; Intel Xeon CPU Max 9470 + Intel
       Data Center GPU Max; Slingshot‑11.
     • Power: ~38.7 MW.
     • Notable: Intel GPU‑centric exascale architecture; key platform for exascale science + AI.
#4 JUPITER Booster (EuroHPC/JSC, Germany)
     • Performance: Rmax 793.4 PF; Rpeak 930 PF.
     • Scale: 4,801,344 cores.
     • Architecture: Eviden BullSequana XH3000; NVIDIA GH200 Superchip nodes; quad‑rail NVIDIA
       InfiniBand NDR200; RHEL.
     • Power: ~13.1 MW.
     • Notable: Preliminary/partial system during commissioning; Europe’s first exascale‑class program.
#5 Eagle (Microsoft Azure, USA)
     • Performance: Rmax 561.2 PF; Rpeak 846.84 PF.
     • Scale: 2,073,600 cores.
     • Architecture: Azure NDv5: Intel Xeon Platinum 8480C + NVIDIA H100; NVIDIA InfiniBand NDR
       fabric.
     • Power: Not publicly listed in the TOP500 table.
     • Notable: Cloud‑scale supercomputer; shows that a hyperscaler GPU fleet can place near the top of the TOP500.
#6 HPC6 (Eni S.p.A., Italy)
     • Performance: Rmax 477.9 PF; Rpeak 606.97 PF.
     • Scale: 3,143,520 cores.
     • Architecture: HPE Cray EX235a; AMD EPYC (64C) + AMD Instinct MI250X; Slingshot‑11; RHEL 8.9.
     • Power: ~8.46 MW.
     • Notable: Industrial energy supercomputing (large‑scale reservoir/geoscience workloads typical for
       the sector).
#7 Supercomputer Fugaku (RIKEN R‑CCS, Japan)
     • Performance: Rmax 442.01 PF; Rpeak 537.21 PF.
     • Scale: 7,630,848 cores.
     • Architecture: Fujitsu A64FX (48C, SVE) CPU‑only; Tofu‑D custom interconnect.
     • Power: ~29.9 MW.
    • Notable: Long‑standing leadership system; repeated #1 on Graph500 (data‑intensive) and high
      HPCG ranking.
#8 Alps (CSCS, Switzerland)
    • Performance: Rmax 434.9 PF; Rpeak 574.84 PF.
    • Scale: 2,121,600 cores.
    • Architecture: HPE Cray EX254n; NVIDIA Grace (72C) + GH200 nodes; Slingshot‑11; Cray OS.
    • Power: ~7.12 MW.
    • Notable: Grace Hopper superchip architecture aimed at AI/HPC convergence.
#9 LUMI (EuroHPC/CSC, Finland)
    • Performance: Rmax 379.7 PF; Rpeak 531.51 PF.
    • Scale: 2,752,704 cores.
    • Architecture: HPE Cray EX235a; AMD EPYC (64C) + AMD Instinct MI250X; Slingshot‑11.
    • Power: ~7.11 MW.
    • Notable: One of Europe’s flagship GPU‑accelerated systems for open science.
#10 Leonardo (EuroHPC/CINECA, Italy)
    • Performance: Rmax 241.2 PF; Rpeak 306.31 PF.
    • Scale: 1,824,768 cores.
    • Architecture: Eviden BullSequana XH2000; Intel Xeon 8358 + NVIDIA A100 (SXM4, 64 GB);
      quad‑rail HDR100 InfiniBand.
    • Power: ~7.49 MW.
    • Notable: Balanced CPU+GPU design with strong interconnect provisioning (quad‑rail) for
      bandwidth‑hungry apps.
     Reminder: Rankings and figures above reflect the June 2025 TOP500 release.
7) How to think about performance (practical heuristics)
    • Is your app compute‑bound or memory‑/I/O‑bound? Choose HPL vs HPCG/IO500 analogies
      accordingly.
    • Strong vs. weak scaling: Strong scaling fixes the total problem size (work per node shrinks as you
      add nodes); weak scaling fixes the work per node (the global problem grows with node count).
    • Roofline mental model: FLOP/s is limited by either peak compute or memory bandwidth; accelerators
      raise both ceilings (see the sketch after this list).
    • Efficiency check: Rmax/Rpeak (HPL) and power (GF/W) hint at how well the machine is utilized and
      cooled.
    • Interconnect matters: Collectives, bisection bandwidth, and topology awareness (e.g., process
      placement) often gate scaling.
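A minimal sketch of the roofline model referenced in the list above: attainable FLOP/s = min(peak compute, memory bandwidth × arithmetic intensity). The peak and bandwidth figures are assumptions for a hypothetical accelerator:

```python
# Roofline: attainable performance is capped by either peak compute or
# (memory bandwidth * arithmetic intensity). Hardware numbers are illustrative.

PEAK_TFLOPS = 50.0   # assumed FP64 peak of a hypothetical GPU (TFLOP/s)
HBM_BW_TBPS = 3.0    # assumed HBM bandwidth (TB/s)

def attainable_tflops(arithmetic_intensity_flops_per_byte):
    """Roofline ceiling for a kernel with the given FLOPs-per-byte ratio."""
    return min(PEAK_TFLOPS, HBM_BW_TBPS * arithmetic_intensity_flops_per_byte)

for name, ai in [("sparse mat-vec (~0.25 F/B)", 0.25),
                 ("stencil (~1 F/B)", 1.0),
                 ("dense matmul (~50 F/B)", 50.0)]:
    print(f"{name:28s} -> {attainable_tflops(ai):6.1f} TFLOP/s ceiling")
# Low-intensity kernels sit on the bandwidth slope (one reason HPCG << HPL);
# high-intensity kernels hit the compute roof.
```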
8) Interview cheat‑sheet answers
      • Definition: “A supercomputer is a massively parallel computer achieving petascale-to-exascale
        FLOPS, built from thousands of nodes, high‑speed interconnects, parallel storage, and a software
        stack for distributed/accelerated computing.”
      • Use cases: Climate/CFD/materials/MD/seismic/genomics/AI at scale.
      • Components: Compute nodes (CPU/GPU/APU), HBM+DDR, Slingshot/InfiniBand/Tofu, Lustre/GPFS/
        DAOS, MPI/OpenMP/CUDA/HIP/SYCL, Slurm, DLC cooling.
      • Metrics: HPL (Rmax/Rpeak), HPCG, HPL‑MxP, Graph500, IO500, Green500; always mention power
        and efficiency.
      • Types: Heterogeneous GPU+CPU, CPU‑only vector‑heavy, APU nodes, cloud supercomputers, classic
        MPP clusters; custom interconnects.
      • Current Top 3 (Jun’25): #1 El Capitan (LLNL), #2 Frontier (ORNL), #3 Aurora (ANL). Be ready to quote
        their Rmax and architectures.
9) Glossary
      • Exascale: ≥10^18 FLOPS sustained on HPL.
      • Rmax/Rpeak: Sustained/theoretical HPL performance.
      • HBM: High Bandwidth Memory, on‑package DRAM for GPUs/CPUs.
      • SIMD/SIMT: Vectorization (CPU) / GPU thread execution model.
      • Collectives: MPI operations (all‑reduce/bcast) sensitive to fabric quality.
      • Burst buffer: Fast SSD tier for checkpoints between memory and disk.
10) Further study (what to practice)
      • Map one of your workloads onto two architectures (CPU‑only vs GPU‑accelerated) and explain the
        tradeoffs.
      • Draw the node‑to‑node data path (HBM↔GPU↔CPU↔NIC↔switch) and annotate bottlenecks.
      • Use the roofline model to justify where optimization time should go (vectorization vs memory
        locality vs communication).
      • Compare two interconnects (Slingshot‑11 vs NDR InfiniBand) for collectives‑heavy apps.
      • Read one IO500 entry and summarize the storage stack (FS, clients, bandwidth, metadata).
End of study guide.