Webinar: Getting started with Mahti
Jussi Enkovaara
Contents
• Overview of Mahti
• Running programs in Mahti
• Building programs in Mahti
• Technical details about Mahti
User documentation: docs.csc.fi
Getting access to Mahti
• All users need to apply for new services via the new CSC customer portal my.csc.fi
• The project manager of a CSC project needs to apply for the Mahti service in the CSC customer portal my.csc.fi
o The project manager can add CSC users to the project
o Users need to accept the terms and conditions
• Connect with ssh
o ssh <csc_username>@mahti.csc.fi
Mahti - overview
• 1404 compute nodes with next generation AMD Rome CPUs
• Two 64-core CPUs per node
• Each core can run 2 threads, thus applications can see 256 “cores” per node
• 2.6 GHz base frequency (maximum boost 3.3 GHz)
• 256 GB of memory per node
• About 180 000 cores in total
• Infiniband HDR interconnect between nodes
o 200 Gbit/s bandwidth
• Over 8 petabytes of work disk for data under active use
In customer use since August 2020
Storage in Mahti
• Similar disk system as in Puhti
• SCRATCH directories are of the form: /scratch/<project>
• PROJAPPL: /projappl/<project>
• Project names and other information can be found at my.csc.fi
• The csc-workspaces command can be used for listing the available directories in Mahti (see the example below)
• The disk areas of the different supercomputers are separate: home, projappl and scratch in Puhti cannot be directly accessed from Mahti.
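A quick orientation using only the commands and paths named above (the exact output of csc-workspaces may differ):

# List the disk areas available to your projects
csc-workspaces
# Work in the project's scratch area for data under active use
cd /scratch/<project>
# Project-specific application installations go under projappl
cd /projappl/<project>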
Moving data between Puhti and Mahti
• Data can be moved between supercomputers via Allas (see the sketch below)
o Recommended approach if the data should also be preserved for a longer time
• Data can also be copied directly with the rsync command
• For example, copy the directory my_results from Puhti to Mahti:
rsync -azP my_results <username>@mahti.csc.fi:/scratch/project_xxxxxxx
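For the Allas route, a sketch along these lines should work; the allas module and the a-put/a-get/a-list client tools are assumptions here, so check the Allas documentation at docs.csc.fi for the exact commands and object naming:

# On Puhti: upload the directory to Allas
module load allas
allas-conf <project>
a-put my_results
# On Mahti: list the stored objects and download the one you need
module load allas
allas-conf <project>
a-list
a-get <bucket>/my_results.tar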
Module system
• Similar module system as in Puhti
• The module system is hierarchical: the availability of modules can depend on the currently loaded modules (e.g. the compiler suite), as illustrated below
• List modules compatible with the current set
module avail
• List all available modules
module spider
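A short illustration of the hierarchy with Lmod; the module names below are only examples, and the exact names and versions on Mahti may differ:

module load gcc        # load a compiler suite
module load openmpi    # an MPI implementation compatible with the loaded compiler
module avail           # now lists only modules compatible with gcc + openmpi
module spider fftw     # searches all FFTW modules, regardless of what is loaded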
Running applications
• Scientific software installed by CSC is available via modules
• Due to the many cores within a node, many applications benefit from hybrid MPI/OpenMP parallelization
o Some applications benefit from simultaneous multithreading (SMT), i.e. two threads per core
o Simultaneous multithreading can also slow down applications
o Memory bound applications may benefit from using fewer than 128 cores per node
• The optimum ratio of MPI tasks to OpenMP threads per node depends heavily on the application and the input set, and should be tested before production runs (see the sketch below)
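A minimal sketch of such a test, assuming a batch script job.sh (hypothetical name) that sets OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK as in the examples later in this deck; the sbatch command-line options override the corresponding #SBATCH directives in the script:

# Try a few ways of splitting the 128 cores of a node between MPI tasks and OpenMP threads
for ntasks in 128 64 32 16; do
    sbatch --ntasks-per-node=$ntasks --cpus-per-task=$((128 / ntasks)) job.sh
done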
Batch job partitions in Mahti
Partition   Nodes    Time limit   Access
test        1-2      1 hour       All
medium      1-20     36 hours     All
large       20-200   36 hours     Scalability test
gc          1-700    36 hours     Scalability test
• Only full nodes are allocated in Mahti
• Jobs have access to all the cores and memory in a node, but may choose to run with fewer cores for better performance
• Billing is based on allocated nodes
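The current partition limits and your own jobs can be checked with standard Slurm commands, for example:

sinfo -o "%P %l %D %a"   # partition, time limit, number of nodes, availability
squeue -u $USER          # your queued and running jobs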
Access to large partition
• The project manager can apply for access to the large partition via my.csc.fi
• A 30-day test period is granted automatically
• During the test period the scalability and parallel performance of the code can be demonstrated
• The results are submitted for evaluation, and production access is granted if the performance is sufficient
• Detailed instructions at:
docs.csc.fi/accounts/how-to-access-mahti-large-partition/
Interactive pre/post-processing
• Jobs in the interactive partition can reserve 1-8 cores, and each core reserves 1.875 GB of memory
• Easy-to-use sinteractive -i tool
o By default, two cores and 24 hours
• The interactive partition can also be used via normal batch job scripts (see the sketch below)
• A user can reserve a maximum of 8 cores at a time
o Can be split into multiple small sessions
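A minimal batch script sketch for this use case; the partition name interactive and the program name are assumptions, so check docs.csc.fi for the exact partition name and limits:

#!/bin/bash
#SBATCH --job-name=postproc
#SBATCH --account=<project>
#SBATCH --partition=interactive   # partition name assumed, verify from the documentation
#SBATCH --time=04:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2         # at most 8 cores in total at a time
srun mypostprocessing <options>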
Pure MPI job
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=<project>
#SBATCH --partition=medium
#SBATCH --time=02:00:00
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=128
export OMP_NUM_THREADS=1
srun myprog <options>
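The same submission workflow applies to all of the batch scripts in this deck, using standard Slurm commands (the script name is illustrative):

sbatch mpi_job.sh   # submit the job script
squeue -u $USER     # check its state in the queue
sacct -j <jobid>    # accounting information during and after the run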
Hybrid MPI+OpenMP job
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=<project>
#SBATCH --partition=medium
#SBATCH --time=02:00:00
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=8
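# 16 MPI tasks x 8 OpenMP threads = 128 cores, i.e. one full Mahti node without SMT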
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun myprog <options>
Hybrid MPI+OpenMP job with SMT
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=<project>
#SBATCH --partition=medium
#SBATCH --time=02:00:00
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=16
#SBATCH --hint=multithread
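# --hint=multithread uses both hardware threads per core: 16 tasks x 16 threads = 256 threads per node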
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun myprog <options>
Affinity in hybrid MPI + OpenMP jobs
• By default, the operating system is allowed to move threads between CPU cores
• In many HPC applications it is beneficial to bind threads to cores by setting
export OMP_PLACES=cores
• The affinity can be printed to the job's stderr by setting
export OMP_AFFINITY_FORMAT="Process %P level %L thread %0.3n affinity %A"
export OMP_DISPLAY_AFFINITY=true
Building applications in Mahti
• Currently, the GNU, AMD and Intel compiler suites are available via modules (see the build sketch below)
• See the documentation for recommended compiler settings
• High performance libraries are available via modules
o Most libraries are provided both as single-threaded and multithreaded versions (with omp in the module version)
o For pure MPI applications, and for applications calling the libraries from multiple threads, use the single-threaded versions
• Note: the MKL library provided with the Intel compiler suite does not fully utilize AMD CPUs!
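A sketch of a typical build with the GNU suite; the module names and compiler flags here are illustrative, so check the recommended settings at docs.csc.fi:

module load gcc openmpi openblas                          # compiler, MPI and a BLAS/LAPACK library
mpicc  -O3 -march=znver2 -fopenmp -o myprog myprog.c      # hybrid MPI+OpenMP C code
mpif90 -O3 -march=znver2 -fopenmp -o myprog myprog.f90    # or Fortran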
Technical details
Mahti node
• Two 64-core AMD EPYC 7H12 (Rome) processors
o 2.6 GHz base frequency (3.3 GHz max boost)
o AVX2 vector instructions
• Cache hierarchy
Cache              L1                 L2                 L3
Size               32 kB              512 kB             16 MB
Private / shared   Private per core   Private per core   Shared among 4 cores
SMT threads share L1 and L2 caches
Hierarchical architecture
• A Mahti node has a highly hierarchical architecture
• Each CCD (Core Complex Die) contains two 4-core CCXs (Core CompleX)
• L3 cache is shared within a CCX
Hierarchical architecture
• A Mahti node has a highly hierarchical architecture
• Even though memory is shared between all cores, latency and bandwidth vary
                   Idle latency (ns)   Bandwidth (GB/s)
Within NUMA node   80                  41
Within socket      100-120             37-39
Between sockets    220                 21-22
Rank/thread placement
• Memory bound applications may benefit from running only a single MPI task / OpenMP thread per NUMA node or per CCX (L3 cache)
• Slurm places MPI tasks --cpus-per-task cores apart
• OMP_PLACES and OMP_PROC_BIND can be used for controlling the placement of OpenMP threads
• The srun option --cpu-bind=verbose can be used for printing out the binding of MPI tasks
• OMP_AFFINITY_FORMAT + OMP_DISPLAY_AFFINITY can be used for checking thread binding (see the combined example below)
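Putting the settings above together, the bindings of both MPI tasks and OpenMP threads can be checked with:

export OMP_PLACES=cores
export OMP_DISPLAY_AFFINITY=true
export OMP_AFFINITY_FORMAT="Process %P level %L thread %0.3n affinity %A"
srun --cpu-bind=verbose myprog <options>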
Example: single MPI task per NUMA node
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=<project>
#SBATCH --partition=medium
#SBATCH --time=02:00:00
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16
export OMP_NUM_THREADS=1
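# 8 tasks spaced 16 cores apart = one MPI task per NUMA node; with one thread each, only 8 cores per node do work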
export OMP_PLACES=cores
srun myprog <options>
Example: single MPI task per NUMA node, single thread per CCX (L3 cache)
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=<project>
#SBATCH --partition=medium
#SBATCH --time=02:00:00
#SBATCH --nodes=10
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=16
export OMP_NUM_THREADS=4
export OMP_PLACES=cores
export OMP_PROC_BIND=spread
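# Spreading 4 threads over the task's 16 cores places one thread per 4-core CCX, i.e. one per L3 cache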
srun myprog <options>
Network topology
• Infiniband HDR network
o Bandwidth of 200 Gbit/s and an MPI latency of ~1.3 us per link
• Dragonfly+ topology: 6 groups, each consisting of a separate fat tree
o Fat trees connected with all-to-all links
Questions?
• Up-to-date information about timetables, relevant changes for users etc.: research.csc.fi/dl2021-utilization
• CSC Customer portal: my.csc.fi
• User documentation: docs.csc.fi