
Aum HPC Processor
Development under the National Supercomputing Mission

Sanjay Wandhekar
Senior Director, HoD, HPC Technologies Group, C-DAC
sanjayw@cdac.in
National Supercomputing Mission (NSM)
Brief about National Supercomputing Mission
• Supercomputing infrastructure in the country
• Indigenous supercomputing ecosystem in a phased manner: from “Assembly” to “Manufacturing” to “Design and Manufacturing” of supercomputers
  • Servers
  • HPC network
  • Software stack
  • HPC processor
  • Liquid cooling technologies
• Supercomputing applications of national interest
• Human resources for applications development and HPC maintenance

Motivation for India’s own HPC Processor - AUM
• Processor architecture suitable for both HPC and general-purpose computing – extracting maximum application-level performance
• Energy efficiency: Arm architecture
• Capability building with bargaining power
• Immunity from possible export restrictions to India in the future
• Technological sovereignty: designed and engineered in India
• Security (back doors etc.): highest priority for strategic sectors

HPC Processor development program
• Develop a competitive HPC processor for the HPC, AI and server markets
• Develop a complete ecosystem leveraging open-source components
  • Open-source software ecosystem
  • Reference boards
  • Reference server designs
• Build a pilot HPC system with > 1 PF compute power (a rough sizing sketch follows this list)
• Be ready with an exascale system design and subsystems based on the AUM processor
• Industry collaboration – SoC design, server designs, deployment and marketing of solutions based on the AUM processor
• Targeted at both the HPC and cloud markets
• Planned to be available in 2024
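For rough scale, here is a minimal back-of-envelope sketch of the node count behind the > 1 PF pilot target, assuming the ~10 TF/node CPU-only figure quoted later in this deck; the sustained/peak efficiency factor is an illustrative assumption, not a figure from the deck.

```c
/* Back-of-envelope sizing for a > 1 PF CPU-only pilot system.
 * Assumes ~10 TFLOPS (DP) per dual-socket AUM node, as quoted later in
 * this deck; the sustained/peak efficiency is an illustrative assumption. */
#include <stdio.h>

int main(void) {
    const double target_pflops  = 1.0;   /* pilot system target */
    const double node_tflops    = 10.0;  /* assumed DP peak per node */
    const double hpl_efficiency = 0.70;  /* assumed sustained/peak ratio */

    double peak_nodes      = target_pflops * 1000.0 / node_tflops;
    double sustained_nodes = peak_nodes / hpl_efficiency;

    printf("Nodes for 1 PF peak:       %.0f\n", peak_nodes);       /* 100  */
    printf("Nodes for ~1 PF sustained: %.0f\n", sustained_nodes);  /* ~143 */
    return 0;
}
```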

Some of the Architectural Decisions
• Best efficiency
  • Memory bandwidth
  • Easy to optimize (vector size)
  • Superior application-level performance / Watt
• Better I/O for data access
  • HBM and DDR
  • Many PCIe5 lanes
  • CXL for coherent accelerators
• Security features provision
• Superior application-level performance / Watt -> increase memory sub-system performance
• Need much better Bytes/Flop – target > 0.5 Byte/Flop
• No competition with specialized devices like GPUs – keep provision for GPUs for specialized applications

High Performance Conjugate Gradients (HPCG) Benchmark
| Rank | Site | Computer | HPL Rmax (Pflop/s) | Bytes/Flop | HPCG (Pflop/s) | Fraction of Peak |
|------|------|----------|--------------------|------------|----------------|------------------|
| 1 | RIKEN Center for Computational Science, Japan | Fugaku – A64FX 48C 2.2 GHz, Tofu D | 442.01 | 0.3 | 16 | 3.00% |
| 2 | DOE/SC/ORNL, USA | Summit – IBM POWER9 22C 3.07 GHz, dual-rail Mellanox EDR InfiniBand, NVIDIA Volta GV100 | 148.6 | <0.2 | 2.926 | 1.50% |
| 3 | DOE/SC/LBNL/NERSC, United States | Perlmutter – AMD EPYC 7763 64C 2.45 GHz, Slingshot-10, NVIDIA A100 SXM4 40 GB | 64.59 | <0.2 | 1.905 | 2.10% |
| 4 | DOE/NNSA/LLNL, USA | Sierra – IBM POWER9 22C 3.1 GHz, dual-rail Mellanox EDR InfiniBand, NVIDIA Volta GV100 | 94.64 | <0.2 | 1.796 | 1.40% |

K computer: Bytes/Flop = 0.5, HPCG = 5.2% of peak. Better Bytes/Flop, i.e. higher memory bandwidth, gives superior application-level performance.
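To make the link between Bytes/Flop and HPCG efficiency concrete, a minimal roofline-style sketch follows. The effective HPCG arithmetic intensity used (~0.1 flop/byte) is an assumed illustrative value, not a figure from this deck.

```c
/* Roofline-style bound for a bandwidth-bound code: fraction of peak
 * ~= min(1, arithmetic_intensity * machine_bytes_per_flop).
 * The HPCG intensity below is an assumed, illustrative value. */
#include <stdio.h>

int main(void) {
    const double hpcg_intensity = 0.1;  /* assumed effective flop/byte for HPCG */
    const struct { const char *name; double bytes_per_flop; } sys[] = {
        { "K computer",        0.5 },
        { "Fugaku",            0.3 },
        { "GPU-host (<= 0.2)", 0.2 },
    };

    for (int i = 0; i < 3; i++) {
        double frac = hpcg_intensity * sys[i].bytes_per_flop;
        if (frac > 1.0) frac = 1.0;  /* cannot exceed machine peak */
        printf("%-18s predicted HPCG fraction of peak <= %.1f%%\n",
               sys[i].name, 100.0 * frac);
    }
    return 0;
}
```

With these assumptions the bound comes out near the observed fractions in the table above (about 5% for the K computer, 3% for Fugaku, and under ~2% for the GPU-hosted systems), which is the motivation for the > 0.5 Byte/Flop target.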

C-DAC HPC SoC (A48Z) Block Diagram (48-Cores)

[Block diagram: 48 Arm Neoverse V1 (Zeus) cores on a coherent mesh network with system cache and a memory subsystem (HBM3, DDR5); PCIe Gen5 / CXL; a Cortex-M7 based MSCP security subsystem; and fully coherent die-to-die (D2D) and chip-to-chip (C2C) subsystems linking to the other chiplet and the other socket.]
C-DAC AUM ॐ Microprocessor – 96 Cores

[Block diagram: two A48Z chiplets (48 Zeus cores each) mounted on an interposer and joined by a D2D chiplet interconnect; each chiplet has HBM3 stacks (HBM3-5600/6400 PHY), 8 DDR5-5200 channels, and PCIe Gen5 / CXL lanes.]
AUM - HPC Processor Development

• 96-core HPC processor
• Armv8.4 architecture
• 96 MB L2 cache, 96 MB system cache
• 8-channel DDR5-5200 memory
• 64 GB HBM3-5600 memory
• 64/128 PCIe Gen5 lanes – CXL support for coherent accelerators / NICs
• SMP support up to 2 sockets
• Security features – secure boot and crypto support
• 5 nm technology node; chiplet-based architecture: 2 chiplets, 96 cores and up to 96 GB HBM3 memory in a socket
• Dual-socket server design with up to 4 industry-standard GPU accelerators for both HPC and AI applications (CPU-only node ~ 10 TF/node; see the sketch after this list)
• Indigenous software ecosystem for the Aum processor leveraging the open-source ecosystem
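A minimal sketch of where the per-socket and per-node double-precision peak figures come from, assuming each Neoverse V1 core can retire two 256-bit SVE FMA operations per cycle (16 DP flop/cycle/core); the per-core flop rate is an assumption about the core configuration, not a number stated in this deck.

```c
/* Rough DP peak for an AUM socket and a dual-socket ANANTA node.
 * Assumes 2 x 256-bit SVE FMA pipes per Neoverse V1 core, i.e.
 * 16 double-precision flop/cycle/core (treat as an assumption). */
#include <stdio.h>

int main(void) {
    const int    cores_per_socket  = 96;
    const double clock_ghz         = 3.0;   /* typical clock from this deck */
    const double dp_flop_per_cycle = 16.0;  /* assumed: 2 pipes x 4 lanes x 2 (FMA) */

    double socket_tflops = cores_per_socket * clock_ghz * dp_flop_per_cycle / 1000.0;
    double node_tflops   = 2.0 * socket_tflops;  /* dual-socket, CPU only */

    printf("Per-socket DP peak: %.1f TFLOPS\n", socket_tflops);  /* ~4.6 */
    printf("Dual-socket node:   %.1f TFLOPS (~10 TF/node)\n", node_tflops);
    return 0;
}
```

The result matches the 4.6+ TFLOPS per socket quoted in the comparison table below and the ~10 TF/node figure above.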

Specification Comparison

| | Fujitsu A64FX | C-DAC AUM HPC Processor |
|---|---|---|
| Fabrication Technology | 7 nm FF | TSMC 5 nm FF |
| Core Configuration | (48+4) cores, 2.2 GHz (typical) | 96 cores, 3.0 GHz (typical), 3.5+ GHz (turbo) |
| DDR Configuration | No DDR | 16 channels (32-bit) DDR5-5200, BW = 332.8 GB/s |
| HBM | 32 GB HBM2 (4 controllers), BW = 1 TB/s | 64 GB HBM3 (4 controllers), BW = 2.87 TB/s |
| PCIe | 16 PCIe Gen3 lanes | 64 PCIe Gen5 lanes |
| Power | Not known | 300 W (TDP) |
| Performance (DP) | 2.7 TFLOPS per socket | 4.6+ TFLOPS per socket |
| Bytes/Flop | 0.38 | 0.7 |
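As a quick check, the derived entries in the AUM column follow from the raw ones; the sketch below recomputes the DDR bandwidth and the Bytes/Flop ratio. The 1024-bit HBM3 stack width is a standard-HBM3 assumption rather than a figure from this deck.

```c
/* Recompute the AUM column's derived numbers from its raw specs. */
#include <stdio.h>

int main(void) {
    const double ddr_channels   = 16.0;    /* 32-bit DDR5-5200 channels */
    const double ddr_mt_s       = 5200.0;  /* mega-transfers per second */
    const double bytes_per_xfer = 4.0;     /* 32-bit channel width */
    const double hbm_stacks     = 4.0;     /* 4 HBM3 controllers */
    const double hbm_gt_s       = 5.6;     /* HBM3-5600 */
    const double hbm_width_bits = 1024.0;  /* assumed standard HBM3 stack width */
    const double dp_peak_gflops = 4600.0;  /* 4.6 TFLOPS per socket */

    double ddr_bw = ddr_channels * ddr_mt_s * bytes_per_xfer / 1000.0;  /* 332.8 GB/s */
    double hbm_bw = hbm_stacks * hbm_gt_s * hbm_width_bits / 8.0;       /* ~2867 GB/s */
    double bytes_per_flop = (ddr_bw + hbm_bw) / dp_peak_gflops;         /* ~0.70      */

    printf("DDR5 bandwidth: %.1f GB/s\n", ddr_bw);
    printf("HBM3 bandwidth: %.2f TB/s\n", hbm_bw / 1000.0);
    printf("Bytes/Flop:     %.2f\n", bytes_per_flop);
    return 0;
}
```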
C-DAC HPC System SW & Development Tools
• System software, development tools and utilities
  • HPC compiler (C and Fortran, multicore + accelerator, HPC & AI applications)
  • IDE for HPC & AI applications on Arm systems, supporting multiple parallel paradigms
  • Automatic parallelizer to generate parallel code for multicore / accelerators (an example of the kind of loop it targets follows this list)
  • Application debugger & profiler
  • Optimized math/AI libraries
  • Arm system monitor and utilities
  • Parallel runtime system
  • Secure access interface
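To make the compiler and auto-parallelizer bullets concrete, here is a minimal sketch of the kind of loop such a toolchain would vectorize (e.g. for SVE) and parallelize across cores; the OpenMP directive and the function are illustrative, not part of the C-DAC toolchain itself.

```c
/* A STREAM-triad style kernel: bandwidth-bound, trivially vectorizable,
 * and the sort of loop an auto-parallelizer or OpenMP spreads across
 * the 96 cores. Illustrative example only. */
#include <stddef.h>

void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double alpha, size_t n)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] + alpha * c[i];  /* 2 flops per 24 bytes moved */
    }
}
```

On a high Bytes/Flop part such as AUM, loops like this spend less time stalled on memory, which is the application-level performance/Watt argument made earlier in the deck.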
Dual Socket Compute Node: ANANTA
[Board diagram: two AUM sockets linked by a coherent C2C (CCIX) connection; each socket has 4 x HBM3 and banks of 4/8 DIMM slots; I/O includes x16 riser slots (PCIe/CXL and PCIe), an OCP 3.0 interface, 2 x M.2 NVMe, a network adapter, USB 3.0/2.0 and a debug port; board management via BMC (KVM and Redfish support), 1G/10G UTP ports, clock distribution, CPLD, and a power distribution block.]
Single Socket Compute Node: ANANTA
[Board diagram: a single AUM socket with 4 x HBM3 and banks of 4/8 DIMM slots; x16 riser slots (PCIe/CXL and PCIe), an OCP 3.0 interface, 2 x M.2 NVMe, a network adapter, USB 3.0/2.0 and a debug port; board management via BMC (KVM and Redfish support), 1G/10G UTP ports, clock distribution, CPLD, and a power distribution block.]
Summary: Aum Processor
• Competitive HPC processor for the HPC, AI and server markets
• Addresses strategic requirements
• Complete ecosystem
  • Open-source software ecosystem
  • Reference boards
  • Reference server designs – derivatives as per market requirements
  • Industry partners: OEMs / ODMs / solution providers
• Towards an indigenous exascale system, including processors
• Targeted markets: HPC/AI, cloud, storage, edge computing
• Planned to be available in 2024

Thank You

Market Comparison
| | Ampere Altra | SiPearl Rhea | C-DAC ॐ (AUM) |
|---|---|---|---|
| Cores | 80 Arm Neoverse N1 (Ares) cores | 72 Arm Neoverse V1 (Zeus) cores | 96 Arm Neoverse V1 (Zeus) cores |
| Cache | L1: 64 KB I / 64 KB D per core; L2: 1 MB per core; System cache: 32 MB | L1: 64 KB I / 64 KB D per core; L2: 1 MB unified per core; System cache: 128 MB | System cache: 96 MB; Snoop filter: 192 MB |
| Frequency | 3.0 GHz (base), 3.3 GHz (turbo) | 2.5 GHz (base), 3.0 GHz (turbo) | 3.0 GHz |
| HBM | No HBM | 96 GB HBM2E | 96 GB HBM3 |
| DDR | Up to 4 TB per socket | 4 channels DDR5 | 16 channels DDR5 |
| PCIe | 128 PCIe4 lanes | 104 PCIe5 lanes: up to 64 for coherent connectivity, remaining as PCIe5 | 128 PCIe5 lanes: up to 64 for coherent connectivity, remaining as PCIe5/CXL |
| TDP | 250 W | 320 W | 280 - 320 W |
| Node | TSMC 5 nm | TSMC 6 nm | TSMC 5 nm |
| Package | - | 2.5D | 2.5D |
| Release Year | 2020 | 2022/23 | 2023/24 |
