Evaluating HPX and Kokkos on RISC-V using an Astrophysics Application Octo-Tiger
Patrick Diehl
Joint work with: Gregor Daiß, Steven R. Brandt, Alireza Kheirkhahan,
Hartmut Kaiser, Christopher Taylor, and John Leidel
Louisiana State University
patrickdiehl@lsu.edu
April 25, 2024
Motivation
What is RISC-V?
RISC-V was introduced in 2015 as an open standard instruction set
architecture (ISA); RISC-V is an iteration on established reduced
instruction set computer (RISC) principles.
Why is it interesting for the HPC community?
The RISC-V ISA is completely open for use by anyone and is
royalty-free.
RISC-V is extensible; processor features can be added to provide
customized capabilities (i.e., cache management, SIMD, and vector
machine support are optional).
The European Processor Initiative (EPI), which aims to develop a
vendor-independent European CPU for high-performance computing,
has identified RISC-V as a target for future investment.
Overview
1 Astrophysical application
2 Software stack
    Octo-Tiger
    Kokkos
    HPX
3 Porting the software stack to RISC-V
4 In-house RISC-V Test System
5 Performance measurements
    Node-level scaling
    Distributed scaling
6 Energy consumption
7 Conclusion and Outlook
Astrophysical application
Example simulation
Astrophysical event: the merger of two stars. The flow on the surface
corresponds, in layman's terms, to the weather on the stars.
Software stack
Octo-Tiger
Open source astrophysics program1 simulating the evolution of star
systems, based on the fast multipole method on adaptive octrees.
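As an illustration of the adaptive octree idea, here is a schematic sketch only (not Octo-Tiger's actual data structure): a node either stores a sub-grid of cells or is refined into eight children.

#include <array>
#include <memory>
#include <vector>

// Schematic adaptive octree node: leaves hold a sub-grid of cell data,
// refined nodes hold eight children (one per octant).
struct OctreeNode {
    std::array<std::unique_ptr<OctreeNode>, 8> children; // empty on leaves
    std::vector<double> subgrid;                          // cell data on leaves
    int level = 0;

    bool is_leaf() const { return children[0] == nullptr; }

    // Turn this leaf into an interior node with eight children.
    void refine() {
        for (auto& child : children) {
            child = std::make_unique<OctreeNode>();
            child->level = level + 1;
        }
        subgrid.clear();
    }
};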
Modules
Hydro
Gravity
Radiation
Supports
Communication:
MPI/libfabric/LCI/GASNet +
OpenSHMEM
Backends: CUDA, HIP, SYCL
Reference
Marcello, Dominic C., et al. "Octo-Tiger: a new, 3D hydrodynamic code for stellar mergers that uses HPX
parallelization." Monthly Notices of the Royal Astronomical Society 504.4 (2021): 5345-5382.
Kokkos: C++ Performance Portability Programming
EcoSystem
Kokkos is a C++ library2 for writing performance portable applications
targeting all major HPC platforms
CPU
OpenMP
HPX
GPU
Native: CUDA & HIP
SYCL: CUDA & HIP
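A minimal usage sketch (not taken from Octo-Tiger) of how a Kokkos loop stays performance portable: the same source compiles for whichever backend, e.g. OpenMP, HPX, CUDA, HIP, or SYCL, is selected at build time.

#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1'000'000;
        Kokkos::View<double*> x("x", n); // allocated in the backend's memory space

        // Parallel loop dispatched to the enabled execution space.
        Kokkos::parallel_for("fill", n, KOKKOS_LAMBDA(const int i) {
            x(i) = 2.0 * i;
        });

        // Parallel reduction over the same range.
        double sum = 0.0;
        Kokkos::parallel_reduce("sum", n, KOKKOS_LAMBDA(const int i, double& acc) {
            acc += x(i);
        }, sum);
    }
    Kokkos::finalize();
    return 0;
}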
Reference
Trott, Christian R., et al. "Kokkos 3: Programming model extensions for the exascale era." IEEE Transactions on
Parallel and Distributed Systems 33.4 (2021): 805-817.
2 https://github.com/kokkos/kokkos
HPX
HPX is an open source C++ Standard Library for Concurrency and
Parallelism3.
Features
HPX exposes a uniform, standards-oriented API for ease of
programming parallel and distributed applications.
HPX provides unified syntax and semantics for local and remote
operations.
HPX exposes a uniform, flexible, and extendable performance counter
framework which can enable runtime adaptivity.
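A minimal sketch of that standards-oriented API (assuming a recent HPX installation; this is not code from Octo-Tiger): an asynchronous task returning a future, mirroring std::async.

#include <hpx/hpx_main.hpp> // wraps main() and starts the HPX runtime
#include <hpx/future.hpp>
#include <iostream>

int main() {
    // Launch a lightweight HPX task and obtain a future for its result.
    hpx::future<int> f = hpx::async([] { return 6 * 7; });
    std::cout << f.get() << '\n'; // prints 42
    return 0;
}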
Reference
Kaiser, Hartmut, et al. "HPX - The C++ Standard Library for Parallelism and Concurrency." Journal of Open Source
Software 5.53 (2020): 2352.
3 https://github.com/STEllAR-GROUP/hpx
Porting the software stack to RISC-V
Porting HPX
Most parts of HPX are implemented in ISO C++. However, small
portions of the runtime system are implemented in assembly.
The HPX context-switching implementation can optionally use
Boost.Context or a native, independently provided assembly
implementation for the targeted ISA. Note that HPX already relies
on Boost.
We had to make a single source code modification in the HPX
timer: the RISC-V HPX port implements timing using the RISC-V
RDTIME instruction. RDTIME is a pseudo-instruction that reads
the time Control and Status Register (CSR).
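A minimal sketch of reading the time CSR via RDTIME with GCC inline assembly (an illustration assuming an RV64 target, not the actual HPX code):

#include <cstdint>

// Read the 64-bit time CSR; RDTIME expands to a CSR read of 'time'.
static inline std::uint64_t rdtime() {
#if defined(__riscv) && (__riscv_xlen == 64)
    std::uint64_t ticks;
    asm volatile("rdtime %0" : "=r"(ticks));
    return ticks;
#else
    return 0; // non-RISC-V fallback, for illustration only
#endif
}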
Recall that, since we had an ISO C++ compiler and Boost support, the
code changes were minimal.
Porting Kokkos and Octo-Tiger
Kokkos
Building Kokkos required no changes to the code base, and GCC
compiled Kokkos without any issues.
However, Kokkos's CMake build system files required some minor
changes: the RISC-V architecture was not detected, and incorrect
compiler flags were added for the architecture and vectorization.
Octo-Tiger
Octo-Tiger needed no porting after HPX and Kokkos had been
ported.
Due to the abstraction levels provided by HPX and Kokkos, porting the
software stack was a walk in the park.
In-house RISC-V Test System
In-house RISC-V test system I
Image of the in-house cluster using two VisionFive2 open source
RISC-V single-board computers, each with a quad-core StarFive
JH7110 64-bit CPU and 8 GB LPDDR4 system memory.
The official image is based on an older Ubuntu version.
An Ubuntu Linux image based on 23.04 had the versions we need
for the Slurm integration and recent compilers.
The Ubuntu image does not support USB and PCIe on the
VisionFive2.
In-house RISC-V test system II
Two MILK-V desktop computers, each with a 64-core SOPHON
SG2042 64-bit CPU and 128 GB DDR system memory.
Linux OS: Fedora Linux 38
Slurm integration
GNU compiler collection
MPI
We have the full HPC stack on RISC-V, at least enough to run Kokkos
and HPX!
Performance measurements
Node-level scaling (MILK-V)
Figure: Processed sub-grids per second versus the number of cores (1 to 64) for the DWD initial mesh at refinement levels 10 and 11, each with and without optimizations; a secondary axis gives the corresponding GFLOP/s (0.47 to 61.33).
Distributed runs (Single-board computer)
Figure: Cells processed per second for the configurations 1-RISC, 1-Fugaku, 2-RISC-TCP, 2-RISC-MPI, and 2-Fugaku-MPI (values range from 91 to 1,091). For comparison, runs on one and two Supercomputer Fugaku nodes are shown (each using only four of the 48 available cores for a better comparison).
Distributed runs (MILK-V) I
Figure: Processed sub-grids per time step for the Level 11 initial mesh on one and two nodes, comparing RISC-V and A64FX (values range from 105.92 to 225.14).
Distributed runs (MILK-V) II
Figure: Processed sub-grids per time step for the v1309 scenario, comparing RISC-V and A64FX on one and 16 nodes (the two values shown are 740.31 and 635.74).
Energy consumption
How to measure energy consumption?
We want to compare the RISC-V boards and Supercomputer Fugaku for
the astrophysics application.
On Supercomputer Fugaku, the power consumption was measured
with the PowerAPI interface provided by RIKEN.
On the RISC-V boards, no hardware counters for power measurements
are present. Here, we attached a power meter to the USB power
source and measured the power consumption while running the Linux
command stress --cpu 4 and while running Octo-Tiger with four cores.
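Presumably (our reading of the setup, not stated explicitly on the slide), the reported energy then follows from the average metered power and the runtime:

E\,[\mathrm{Wh}] = \bar{P}\,[\mathrm{W}] \cdot t\,[\mathrm{s}] / 3600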
It would be nice to have hardware counters to get more sophisticated
measurements on RISC-V!
Energy consumption (Single-board computer)
Figure: Energy consumption in Wh for the configurations 1-RISC, 1-Fugaku, 2-RISC-TCP, 2-RISC-MPI, and 2-Fugaku-MPI (values range from 0.92 to 1.53 Wh). On Supercomputer Fugaku, the power consumption was measured using PowerAPI; due to missing hardware counters, it was measured with a power meter on the RISC-V boards.
Energy consumption (MILK-V)
Figure: Energy consumption in Wh for the Level 11 initial mesh on one and two nodes, comparing RISC-V and A64FX (values range from 1,854.7 to 2,908.3 Wh).
Recall that an A64FX node has 48 cores and a RISC-V node has 64 cores.
Conclusion and Outlook
Conclusion and Outlook
Conclusion
Porting the software stack was rather easy due to the advanced C++
compilers on RISC-V.
HPX and Octo-Tiger scaled from one up to four cores. However, more
RAM and more cores are needed for sophisticated benchmarking.
This work is licensed under a Creative Commons "Attribution-NonCommercial-ShareAlike 3.0 Unported" license.