How To Scale From
Workstation Through Cloud
To HPC In Cryo-EM
Processing
Presented by: Dr Lance Wilson
Presentation Outline
What will we discuss…
• Who am I? Where am I from?
• What is cryo-electron microscopy?
• How long does analysis take?
• What impact does hardware have?
• What options do you suggest?
• What processing opportunities exist?
• Next opportunity: Light sheet microscopy!
Who am I and Where
am I from?
The research context
Video of globe
MASSIVE
… is a data processing engine for Australian
science. It empowers researchers to unlock
impactful research discoveries within scientific data.
Integrative High Performance Computing
• usability by new HPC user communities over raw capacity;
• hardware suited to data processing;
• underpinning high performing wet, experimental and data-focused
laboratories, with growing data processing needs;
• workflows that increase return on investment in instruments and
experiments; and
• porosity and flexibility to serve specific requirements in growth usage
areas, such as the life sciences, machine learning etc.
FEI Titan Krios: Nationally funded project to develop environments for Cryo analysis
MMI Lattice Light Sheet: Nationally funded project to capture and preprocess LLS data
Synchrotron MX: Data Management, Integration between AS and MASSIVE M3
MASSIVE M3: Structural refinement and analysis
Professor Trevor Lithgow
ARC Australian Laureate Fellow
Discovery of new protein transport machines in bacteria, understanding the assembly of protein transport machines, and dissecting the effects of antimicrobial peptides on antibiotic-resistant “super-bugs”
Chamber details from the nanomachine that secretes the toxin that causes cholera.
Research and data by Dr. Iain Hay (Lithgow lab)
MASSIVE and Characterisation
Virtual Laboratory
How do we partner with researchers…
“Here is your
CD of data…”
“Your data is moving up to a data
management system in the cloud
where you have access to a range
of tools and services to start your
data analysis”
Rich Online Workbenches
How do we complement/replace workstations…
Atom Probe, Structural Biology, Bioinformatics, Cytometry, Cryo-Electron Microscopy, Neutron Beam Imaging, General Imaging Tool, Light Microscopy, General Scientific, X-ray
Cryo-EM:
• Titan Krios
• Cryo-EM PC: Collect movies, images
• MyTardis: Data storage, sharing
• MyData App: Raw frames
• MyData App: Corrected stills
• Picking
• Publication: DOI, Reuse
• Strudel: Desktop & Web
• Ctffind, Relion, Frealign
• Model
What is Cryo Electron
Microscopy?
… and some research context
CryoEM
Computed
Tomography
+
Photogrammetry
http://www.cmu.edu/me/xctf/xrayct/index.html
https://www.maximintegrated.com/en/app-notes/index.mvp/id/4682
How many projections does a clinical CT use?
1000!
http://www.upstate.edu/radiology/education/rsna/ct/rec
https://skfb.ly/TI89
3D reconstruction of the electron density of aE11 Fab’ polyC9 complex
What outcome are the
scientists looking for?
http://www.ebi.ac.uk/pdbe/entry/pdb/5k12
Disease resistance and drug targets
https://doi.org/10.1128/mBio.00395-17
Disease resistance and drug targets
https://doi.org/10.1002/cmdc.201900042
What is the time taken for analysis of a single dataset?
Paper Processing Time
Sample preparation and data collection:
• Protein Purification: 2 months
• Screening, Neg. Stain: 2 months
• Screening, Cryo: 2 months
• Cryo Collection: 3 x 2 days
Data processing:
• Motion correction: 1 hr
• CTF correction: 5 min
• 2D classification: 1 day
• 3D classification: 1 day
• Refinement: 3 days
• Molecular Model: 3 days
Equipment: Tecnai T12, JEOL JEM-1400, Talos Arctica, Titan Krios - K2, Talos - Falcon 3
8 Months
300 Steps
3 Months
How much are GPUs used in preprocessing?
Important quote:
“The structure
determination task is
further complicated by the
lack of information about
the relative orientations of
all particles and, in the
case of structural
variability in the sample,
also their assignment to a
structurally unique class.”
What are the major analysis steps?
Hint: All use GPUs and are done repeatedly.
• 2D Classification
• 3D Classification
• Refinement
What is 2D Classification?
2D classification results are a way to visualize how your 2D images will
cluster together into homogenous class averages without any model
required.
Source: https://github.com/cianfrocco-lab/Old-school-processing/wiki/2D-classification
2D Classification
The ML2D algorithm may be used to
simultaneously align and classify single-
particle images
An intrinsic characteristic of the ML
approach is that it does not assign
images to one particular class or
orientation. Instead, images are
compared with all references in all
possible orientations and probability
weights are calculated for each
possibility.
(Maximum Likelihood)
https://doi.org/10.1016/S0076-6879(10)82013-0
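The idea of weighting every class and orientation, rather than assigning each image to one, can be illustrated with a toy sketch. This is not the ML2D or Relion implementation: the "images" are short 1D lists, cyclic shifts stand in for orientations, and the Gaussian noise level `sigma` and flat prior are assumptions made only for illustration.

```python
import math


def rotations(signal):
    """All cyclic shifts of a 1D signal, standing in for orientations."""
    return [signal[k:] + signal[:k] for k in range(len(signal))]


def ml_weights(image, references, sigma=1.0):
    """Return weights[c][k]: probability weight of class c in orientation k.

    Assumes independent Gaussian noise with standard deviation `sigma`
    (a hypothetical value) and a flat prior over classes and orientations.
    """
    log_likes = []
    for ref in references:
        row = []
        for rotated in rotations(ref):
            sq_diff = sum((a - b) ** 2 for a, b in zip(image, rotated))
            row.append(-sq_diff / (2.0 * sigma ** 2))
        log_likes.append(row)
    # Subtract the largest value before exponentiating, for numerical stability.
    peak = max(max(row) for row in log_likes)
    unnorm = [[math.exp(v - peak) for v in row] for row in log_likes]
    total = sum(sum(row) for row in unnorm)
    return [[v / total for v in row] for row in unnorm]


if __name__ == "__main__":
    refs = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
    noisy = [0.1, 0.9, 0.0, 0.1]  # made-up noisy, shifted copy of refs[0]
    for c, row in enumerate(ml_weights(noisy, refs, sigma=0.5)):
        print(c, [round(w, 3) for w in row])
```

Every class and orientation keeps a non-zero weight, so a noisy image contributes a little to several class averages instead of being forced into one.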
Particle picking for next step of classification in 2D
Source: https://www.e-sciencecentral.org/articles/Table.php?xn=am/am-48-001&id=
Classification of selected particles in 2D
Source: https://www.e-sciencecentral.org/articles/Table.php?xn=am/am-48-001&id=
What is 3D Classification?
“The premise of 3D classification is to use an initial starting model
to sort your particles into 3 or more groups. By using the 3D
model, Relion uses maximum likelihood methods to create
homogenous classes of particles that will belong to a given
group.”
Source: https://github.com/cianfrocco-lab/Old-school-processing/wiki/Relion-3D-classification
What is the time taken for analysis of a single dataset?
Slow time from lead compound to clinic
Increasing prevalence of resistant bacteria in the clinic
UK AMR review: “Deaths by infection will be larger than both cancer
and diabetes combined by 2050…..”
A big problem is Methicillin-resistant Staphylococcus aureus (MRSA)
and Vancomycin-resistant Enterococci (VRE)…
What is the problem?
Antibiotics at crisis point
• Australian patient contracts MRSA
• Linezolid resistance
• MDR-MRSA isolated
• Genome sequenced (Illumina)
• 70S ribosomes isolated
• Ribosomes imaged on FEI Titan Krios
• Rational redesign of existing drug pharmacophores
• Linezolid
• Virginiamycin (Synercid)
• More tractable chemical synthesis of fully synthetic derivatives
• Thiostrepton
• Semi-synthetic modifications to help with drug solubility
Patient to Microscope
“Pragmatic Approach”
How long does an analysis take (and why)?
2017
• ~1-4 TB raw data set per sample, ~2000-5000 files
• Pipeline analysis with internal & external tools
• Requires large-memory GPU, > 8 GB
• Requires large system memory, > 64 GB
• Requires 200 - 400 CPU cores
• Parallel file reads and writes
2019
• ~1-10 TB raw data set per sample, ~2000-10000 files
• Requires large-memory GPU, > 16 GB
• Requires large system memory, > 18 GB
• Requires 30 - 130 CPU cores
• Parallel file reads and writes and high ops local cache
How long does each of the steps take?
Dataset: 2,500 images, 150,000 particles, 260 pixels
Task                | Submitted? | GPU? | Nodes | Time
Import              | No         | -    | -     | < 1 min
Motion Correction   | Yes        | Yes  | 3     | 20 min
CTF estimation      | Yes        | No   | 1     | 20 min
Manual Picking      | No         | -    | -     | ?
Autopicking         | Yes        | Yes  | 2     | 40 min
Particle Extraction | Yes        | No   | 1     | 10 min
2D Classification   | Yes        | Yes  | 2     | 10 min/iteration
3D Classification   | Yes        | Yes  | 1     | 10 min/iteration
3D Refine           | Yes        | Yes  | 2     | 5-10 min/iteration
Movie Refine        | Yes        | No   | 1     | 1 hour
Particle Polishing  | Yes        | No   | 1     | 1-2 hours
Mask Creation       | No         | -    | -     | 5-30 min
Postprocessing      | No         | -    | -     | < 1 min
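A rough way to read the step timings above is to add them up for a single pass through the pipeline. The sketch below does that under stated assumptions: ranges are taken at their upper end, "< 1 min" is counted as 1 minute, manual picking is left out because no time is given, and the iteration counts are hypothetical inputs chosen by the caller rather than figures from the slides.

```python
# Wall times in minutes for steps with a single quoted duration.
FIXED_STEPS_MIN = {
    "Import": 1,                # "< 1 min" counted as 1
    "Motion Correction": 20,
    "CTF estimation": 20,
    "Autopicking": 40,
    "Particle Extraction": 10,
    "Movie Refine": 60,
    "Particle Polishing": 120,  # upper end of "1-2 hours"
    "Mask Creation": 30,        # upper end of "5-30 min"
    "Postprocessing": 1,        # "< 1 min" counted as 1
}

# Minutes per iteration for the iterative steps.
PER_ITERATION_MIN = {
    "2D Classification": 10,
    "3D Classification": 10,
    "3D Refine": 10,            # upper end of "5-10 min/iteration"
}


def single_pass_minutes(iterations):
    """Total minutes for one pass; `iterations` maps step name to a count."""
    total = sum(FIXED_STEPS_MIN.values())
    for step, per_iter in PER_ITERATION_MIN.items():
        total += per_iter * iterations[step]
    return total


if __name__ == "__main__":
    # 25 iterations per step is an assumed, illustrative value.
    assumed = {step: 25 for step in PER_ITERATION_MIN}
    minutes = single_pass_minutes(assumed)
    print(f"{minutes} min = {minutes / 60:.1f} h for one pass")
```

Because the classification and refinement steps are done repeatedly, a full analysis is many such passes, not one.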
How long does an analysis take (and why)?
9 DAYS
GPU acceleration for 30 of 45
How long does an analysis take (and why)?
2D Classification
Does GPU architecture affect performance?
~2x per generation (Kepler -> Volta)
Does job layout matter between GPU architectures?
Volta peaks at lower MPI ranks, leaves more GPU RAM available
How does GPU RAM affect scientific outcomes?
K3 cameras produce 5k images
Workstation GPUs vs HPC GPUs?
~30% faster, even faster for higher precision
How long does an analysis take (and why)?
14 DAYS
GPU acceleration for 11 of 30
How long does an analysis take (and why)?
2D Classification
How long does each step take (and why)?
• 6 days of compute
• 3 days without extraction step
How long does each step take (and why)?
Extract particles
How long does each step take (and why)?
Zoom change
How long does each step take (and why)?
Two models for structure
How long does each step take (and why)?
Improving resolution
How fast can we process this data?
… hardware choices, do they matter?
Hardware Configurations
Configuration | CPU                   | RAM    | GPUs        | Storage
Workstation   | 8 cores               | 64 GB  | 4 x GTX1070 | SSD + NAS
Cloud (HPC)   | Dual socket, 36 cores | 384 GB | 3 x V100    | SSD + Lustre
DGX-1V        | Dual socket, 40 cores | 512 GB | 8 x V100    | SSD
Can we process faster? (Is HPC the option?)
2D Classification Step Runtime for Different Hardware
Hardware type | Job runtime (min)
Workstation   | 111.62
DGX1-V        | 29.75
1xm3g         | 67.03
2xm3g         | 43.85
3xm3g         | 38.53
4xm3g         | 39.10
Can we process faster? (Do options matter?)
Figure: 2D Classification Step Runtime for Different Hardware and Optimisations. Job runtime (min, axis 0-80) for each optimisation option (User config, Pre-read particles off, Parallel disk access off, MPI result combine off, Local scratch on), comparing hardware types 1xm3g and DGX1-V.
Can we process faster? (Do options matter?)
Figure: 2D Classification Step Runtime for Different Hardware and Optimisations. Job runtime (min, axis 0-80) for the same optimisation options, comparing hardware types 4xm3g and 1xm3g.
What is the ratio of cost to performance?
Cost and Performance Ratios for Workstation vs HPC
Hardware configuration | Performance ratio | Cost ratio
1 x GPU node           | 1.67              | 7.14
2 x GPU node           | 2.55              | 14.29
3 x GPU node           | 2.90              | 21.43
4 x GPU node           | 2.85              | 28.57
DGX-1V                 | 3.75              | 28.57
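The performance ratios in this chart appear to be the workstation runtime divided by each configuration's runtime from the 2D classification runtime chart. That pairing of runtimes to hardware is an inference, made because the division reproduces the chart values to two decimals. The sketch below redoes that arithmetic and adds a performance-per-cost figure, which is a derived quantity that does not appear on the slides.

```python
WORKSTATION_RUNTIME_MIN = 111.62

# 2D classification job runtimes in minutes (pairing with hardware is inferred).
RUNTIME_MIN = {
    "1 x GPU node": 67.03,
    "2 x GPU node": 43.85,
    "3 x GPU node": 38.53,
    "4 x GPU node": 39.10,
    "DGX-1V": 29.75,
}

# Cost relative to the workstation, as read from the chart.
COST_RATIO = {
    "1 x GPU node": 7.14,
    "2 x GPU node": 14.29,
    "3 x GPU node": 21.43,
    "4 x GPU node": 28.57,
    "DGX-1V": 28.57,
}


def performance_ratio(name):
    """How many times faster than the workstation this configuration ran."""
    return WORKSTATION_RUNTIME_MIN / RUNTIME_MIN[name]


def performance_per_cost(name):
    """Speed-up gained per unit of relative cost (workstation = 1.0)."""
    return performance_ratio(name) / COST_RATIO[name]


if __name__ == "__main__":
    for name in RUNTIME_MIN:
        print(f"{name}: {performance_ratio(name):.2f}x faster, "
              f"{performance_per_cost(name):.3f} speed-up per unit cost")
```

On these numbers every HPC configuration scores below 1.0 per unit cost, so the extra hardware buys shorter runtimes rather than cheaper ones.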
What retail options for compute exist?
Workstations + Software Support
How fast are the Relion developers' machines?
[lancew@m3-login1 job008]$ pwd
/scratch/br76/relion3tut/relion30_tutorial_precalculated_results/Class2D/job008
[lancew@m3-login1 job008]$ ls --full-time
total 2884
-rw-r--r-- 1 lancew br76 2777 2018-06-07 23:32:58.000000000 +1000 default_pipeline.star
-rw-r--r-- 1 lancew br76 840 2018-06-07 23:32:58.000000000 +1000 job_pipeline.star
-rw-r--r-- 1 lancew br76 448 2018-06-07 23:32:58.000000000 +1000 note.txt
-rw-r--r-- 1 lancew br76 0 2018-06-07 23:32:57.000000000 +1000 run.err
-rw-r--r-- 1 lancew br76 820224 2018-06-07 23:32:58.000000000 +1000 run_it025_classes.mrcs
-rw-r--r-- 1 lancew br76 740401 2018-06-07 23:32:57.000000000 +1000 run_it025_data.star
-rw-r--r-- 1 lancew br76 223301 2018-06-07 23:32:57.000000000 +1000 run_it025_model.star
-rw-r--r-- 1 lancew br76 5916 2018-06-07 23:32:58.000000000 +1000 run_it025_optimiser.star
-rw-r--r-- 1 lancew br76 458 2018-06-07 23:32:56.000000000 +1000 run_it025_sampling.star
-rw-r--r-- 1 lancew br76 1234 2018-06-07 23:32:57.000000000 +1000 run.job
-rw-r--r-- 1 lancew br76 307687 2018-06-07 23:32:58.000000000 +1000 run.out
-rw-r--r-- 1 lancew br76 820224 2018-06-07 23:32:58.000000000 +1000 run_unmasked_classes.mrcs
Why do these systems look attractive to researchers?
Source: https://www.exxactcorp.com/Relion-for-Cryo-EM-Solutions
DO MORE SCIENCE!
Have peace of mind, focus on what matters
most, knowing your system is backed by a 3
year warranty and support.
PLUG AND PLAY
Exxact systems are fully
turnkey, built to perform
right out of the box so
you avoid the drudgery of
configuration and setup.
Software:
System: CentOS 7, CUDA Toolkit v9.2, Evince, FFTW3, Gnuplot, MPICH v3.2.1
Research: Unblur v1.0.2, EMAN v2.21, IMOD v4.9.9, Relion 3, Motioncorr, MotionCor2, Gctf v1.06, CTFFind4.1.10, ResMap v1.1.4, Summovie v1.0.2, Chimera v1.13
PRECONFIGURED
“Relion 3.0 beta and v2.1 With example job
submission scripts, benchmarks, a fully
validated test suite, and the latest software
patches for quick implementation.”
Why do these systems look attractive to researchers?
Source: http://www.linuxvixion.com/wp-content/uploads/2016/04/linuxvixion-molecular-dynamics-2016-english.pdf
Source: https://aws.amazon.com/ec2/instance-types/p3/
What are the cloud options? …and cost?
$25/hr
$25/hr x 24 hr/day x 14 days = $8,400
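The $8400 figure is consistent with running a $25/hr instance continuously for 14 days. A minimal sketch of that arithmetic follows, assuming the instance is billed for every hour of the run and ignoring storage and data transfer charges, which the slides do not quantify.

```python
def cloud_compute_cost(rate_per_hour, days):
    """Cost of one instance running continuously, compute charges only."""
    hours = 24 * days
    return rate_per_hour * hours


if __name__ == "__main__":
    # $25/hr for a 14 day analysis, as on the slide.
    print(cloud_compute_cost(25, 14))
```

The same function can be reused for other run lengths, for example to compare a shorter analysis at the same hourly rate.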
Source: http://labphoto.tumblr.com/post/105112424006/cyanuric-triazide-or-2-4-6-triazido-1-3-5-triazine
What has changed in 2 years?
Mostly hardware … and a little ML
Options for Computation (2017)
Workstation
• Pro: Full user control
• Con: Limited by single machine
Cloud
• Pro: Scales easily
• Con: Cost, complexity, data movement
HPC
• Pro: Huge resources
• Con: Tightly controlled, shared
Options for Computation (2019)
Workstation
• Pro: Full user control
• Con: Limited by single machine
Cloud
• Pro: Scales easily
• Con: Cost, complexity, data movement
HPC
• Pro: Huge resources, purpose designed
• Con: Tightly controlled, shared
Containers
P100 -> V100
Figure: Relion - 3D Classification Step (2017). Processing time (mins, axis 0-90) vs number of cores (threads: 4, 8, 16) for K80 and DGX.
Figure: Relion - 3D Classification Step (2019). Processing time (mins, axis 0-90) vs number of cores (threads: 4, 8, 16) for K80, DGX and DGX1-V.
Figure: Relion Class 3D - MPI Task Effects (2017). Processing time (mins, axis 0-140) vs MPI tasks (3, 5, 9, 13, 17) for K80 and DGX.
Figure: Relion Class 3D - MPI Task Effects (2019). Processing time (mins, axis 0-140) vs MPI tasks (3, 5, 9, 13, 17) for K80, DGX and DGX-1V.
K80 = 24 x CPU, 256GB RAM, 4 x K80 GPU
DGX-1 = 32 x CPU, 512GB RAM, 8 x P100
DGX-1V = 40 x CPU, 512GB RAM, 8 x V100
Iteration Processing Time vs Number of DGX-1 Servers
Number of DGX-1 servers | Processing time for first iteration (mins)
1                       | 39.75
2                       | 19.92
3                       | 13.12
4                       | 9.88
Linear speed up!
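The "linear speed up" claim can be checked directly from the four first-iteration times. The sketch below computes speed-up relative to one server and the parallel efficiency (speed-up divided by server count); the function names are illustrative and not from any Relion tooling.

```python
# First-iteration processing time in minutes, keyed by number of DGX-1 servers.
ITERATION_MIN = {1: 39.75, 2: 19.92, 3: 13.12, 4: 9.88}


def speedup(servers):
    """Speed-up relative to a single DGX-1 server."""
    return ITERATION_MIN[1] / ITERATION_MIN[servers]


def efficiency(servers):
    """Parallel efficiency: 1.0 means perfectly linear scaling."""
    return speedup(servers) / servers


if __name__ == "__main__":
    for n in sorted(ITERATION_MIN):
        print(f"{n} servers: speed-up {speedup(n):.2f}, efficiency {efficiency(n):.2f}")
```

Efficiency stays within about 2% of 1.0 for all four server counts, which supports the slide's reading for this step and this range of servers.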
What has stayed the same in 2 years?
Mostly software … still a struggle
Software Dependencies
• Relion has two main parts
  • Internal programs
  • External programs
• List of dependencies
  • FFTW
  • FLTK
  • OpenMPI
How to accelerate a solution with HPC?
Using a real example (K2)
What are the optimisation options?
--j
The number of parallel threads to run on each CPU. We often use 4-6.
--dont_combine_weights_via_disc
By default large messages are passed between MPI processes through reading and writing of large files on the computer disk. By giving this option, the messages will be passed through the network instead. We often use this option.
--no_parallel_disc_io
By default, all MPI slaves read their own particles (from disk or into RAM). Use this option to have the master read all particles, and then send them all through the network. We do not often use this option.
--preread_images
By default, all particles are read from the computer disk in every iteration. Using this option, they are all read into RAM once, at the very beginning of the job instead. We often use this option if the machine has enough RAM (more than N*boxsize*boxsize*4 bytes) to store all N particles.
--scratch_dir
By default, particles are read every iteration from the location specified in the input STAR file. By using this option, all particles are copied to a scratch disk, from where they will be read (every iteration) instead. We often use this option if we don't have enough RAM to read in all the particles, but we have large enough fast SSD scratch disk(s) (e.g. mounted as /tmp).
Source: https://www3.mrc-lmb.cam.ac.uk/relion/index.php/Benchmarks_%26_computer_hardware
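The RAM rule quoted for --preread_images (N*boxsize*boxsize*4 bytes) is easy to evaluate before submitting a job. The sketch below applies it and picks between --preread_images and --scratch_dir as the quoted guidance suggests. The helper names, the free-RAM figure and the scratch path are hypothetical, and treating the "260 pixels" from the earlier dataset slide as the box size is an assumption.

```python
def particle_ram_bytes(n_particles, box_size):
    """RAM needed to hold all particles: N * boxsize * boxsize * 4 bytes."""
    return n_particles * box_size * box_size * 4


def io_flags(n_particles, box_size, free_ram_bytes, scratch_dir=None):
    """Choose I/O flags following the quoted guidance (illustrative only)."""
    flags = ["--dont_combine_weights_via_disc"]
    if particle_ram_bytes(n_particles, box_size) < free_ram_bytes:
        # Everything fits in memory: read the particles once at job start.
        flags.append("--preread_images")
    elif scratch_dir is not None:
        # Not enough RAM: copy particles to fast local scratch instead.
        flags += ["--scratch_dir", scratch_dir]
    return flags


if __name__ == "__main__":
    needed = particle_ram_bytes(150_000, 260)  # box size of 260 is assumed
    print(f"{needed / 1e9:.2f} GB needed")
    print(io_flags(150_000, 260, free_ram_bytes=64 * 10**9))
    print(io_flags(150_000, 260, free_ram_bytes=16 * 10**9, scratch_dir="/tmp"))
```

For 150,000 particles at a 260 pixel box this comes to roughly 40 GB, so the choice depends on how much of a node's RAM is actually free for particle data.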
What are the optimisation options?
Disable MPI for results
What are the optimisation options?
Very little difference
What are the optimisation options?
14% Slower
How are the CPUs used? … How many are needed?
CPU usage divided between I/O and analysis
GPU Performance Measurements
How efficient is the software…
Latest generation GPU hardware enables very detailed performance metrics.
Utilization rates report how busy each GPU is over time, and can be used to determine how much an application is using the GPUs in the system.
• GPU: Percent of time over the past sample period during which one or more kernels was executing on the GPU.
• Memory: Percent of time over the past sample period during which global (device) memory was being read or written.
• PCIe Rx and Tx Throughput in MB/s
https://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf
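One way to collect the GPU and memory utilisation figures described above is to poll nvidia-smi in CSV mode and parse the result. The sketch below is a possible approach, not the tooling used for the talk: the parsing helpers are hypothetical, and the sample text in the example is made up to show the expected column layout. PCIe throughput is not included here.

```python
import csv
import io
import subprocess

QUERY_FIELDS = ["index", "utilization.gpu", "utilization.memory",
                "memory.used", "power.draw"]
NVIDIA_SMI_CMD = ["nvidia-smi",
                  "--query-gpu=" + ",".join(QUERY_FIELDS),
                  "--format=csv,noheader,nounits"]


def parse_gpu_csv(text):
    """Parse CSV rows (one per GPU) into dictionaries of numbers."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue
        values = [field.strip() for field in record]
        rows.append({
            "index": int(values[0]),
            "gpu_util_pct": float(values[1]),
            "mem_util_pct": float(values[2]),
            "mem_used_mib": float(values[3]),
            "power_w": float(values[4]),
        })
    return rows


def mean_gpu_utilisation(rows):
    """Average GPU utilisation (percent) across all GPUs in one sample."""
    return sum(row["gpu_util_pct"] for row in rows) / len(rows)


def query_gpus():
    """Run nvidia-smi once; requires an NVIDIA driver on the machine."""
    result = subprocess.run(NVIDIA_SMI_CMD, capture_output=True,
                            text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    sample = "0, 40, 2, 12000, 50.0\n1, 20, 1, 11000, 48.5\n"  # made-up values
    print(mean_gpu_utilisation(parse_gpu_csv(sample)))
```

Calling `query_gpus()` at a fixed interval during a job gives a time series that can be compared against the job's wall time.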
What are the GPUs doing? … are they busy?
Relion3 Tutorial Dataset: 37%
What are the GPUs doing? … are they busy?
Relion3 Tutorial Dataset: 1.7%
What are the GPUs doing with memory?
Relion3 Tutorial Dataset: 12GB
How fast is data copied to memory?
Relion3 Tutorial Dataset
How much power is consumed? … global warming?
Relion3 Tutorial Dataset: 50W
How much cooling is needed? … global warming?
Relion3 Tutorial Dataset
Where are the opportunities for reducing “Time to Science”?
Relion 3 – CPU vs GPU?
Where is development heading…
• Limitations of memory
• Increased performance from new CPUs
However, the GPU acceleration is only available for cards from a single vendor, and it cannot use many of the large-scale computational resources available in existing centres, local clusters, or even researchers’ laptops. In addition, the memory available on typical GPUs limits the box sizes that can be used, which could turn into a severe bottleneck for large particles. For relion-3, we have developed a new general code path where CPU algorithms have been rewritten to mirror the GPU acceleration, which provides dual execution branches of the code that work very efficiently both on accelerators as well as the single-instruction, multiple-data vector units present on traditional CPUs.
https://www.biorxiv.org/content/biorxiv/early/2018/09/19/421123.full.pdf
Opportunities for speed up
❏ Cryosparc
❏ Cryolo
❏ Preprocessing tools
Source: https://cryosparc.com/
Source: http://sphire.mpg.de/wiki/doku.php?id=downloads:cryolo_1&redirect=1
Cryo-EM Processing Tool
https://github.com/Characterisation-Virtual-Laboratory/Cryo-EM-Processing-Tool
Next Generation: 4D+
Volumetric Imaging
Lattice light sheet microscopy
Mitochondrial DNA (green) escaping mitochondria (red) during cell death.
Dr Kate McArthur (Monash BDI) and Dr
Lachlan Whitehead (WEHI) and The
Advanced Imaging Centre at Janelia
Research Campus.
Credit: Steve Morton
Acknowledgments
Dr Matt Belousof
Hari Venogopal
Jafar Lie
Jay Van Schyndel
MASSIVE Partners
ARDC
THANK YOU
Final messages:
✓ HPC is the best option, if you have interactive queues.
✓ Cloud is good for one-off analysis.
✓ Workstations make great training/learning platforms.
For future questions: lance.wilson@monash.edu

More Related Content

PPTX
ESE presentation.pptx
PDF
KPI から生まれるアクセシビリティ
PDF
品質を落とさずにウォーターフォール開発から徐々にアジャイル開発へとシフトしてみる
PPTX
社会との接点・バウンダリーオブジェクト 「問いの価値」
PDF
noteの決して止まらないカイゼンを支える、 エンジニアリングへの挑戦
PPTX
Amazon SageMaker ML Governance 3つの機能紹介
PDF
Software Mining and Software Datasets
PDF
事業が対峙する現実からエンジニアリングを俯瞰する #devlove
ESE presentation.pptx
KPI から生まれるアクセシビリティ
品質を落とさずにウォーターフォール開発から徐々にアジャイル開発へとシフトしてみる
社会との接点・バウンダリーオブジェクト 「問いの価値」
noteの決して止まらないカイゼンを支える、 エンジニアリングへの挑戦
Amazon SageMaker ML Governance 3つの機能紹介
Software Mining and Software Datasets
事業が対峙する現実からエンジニアリングを俯瞰する #devlove

What's hot (13)

PDF
DeNAオリジナル ゲーム専用プラットフォーム Sakashoについて
PDF
Fluentd, Digdag, Embulkを用いたデータ分析基盤の始め方
PDF
OSSかな漢字変換『Egoistic Lily』の紹介&今後の展望
PPTX
Qiita Night 足場固めからやるマイクロサービス
PPTX
Machine Learning for Recommender Systems in the Job Market
PDF
Shinise maker minade_agile_2021_scrum_festo_saka
PDF
Els egipcis
PDF
オタクエンジニアを熱くさせる!モチベーションをあげるチームビルディング
PDF
JaSST Tokyo 2022 アジャイルソフトウェア開発への統計的品質管理の応用
PDF
リクルートにおけるデータのインフラ化への取組
PDF
10分でわかったつもりになるLean Analytics_10min lean analytics
PDF
ザ・ジェネラリスト #5000dai
PDF
情報処理技術者試験で学ぶ SAML
DeNAオリジナル ゲーム専用プラットフォーム Sakashoについて
Fluentd, Digdag, Embulkを用いたデータ分析基盤の始め方
OSSかな漢字変換『Egoistic Lily』の紹介&今後の展望
Qiita Night 足場固めからやるマイクロサービス
Machine Learning for Recommender Systems in the Job Market
Shinise maker minade_agile_2021_scrum_festo_saka
Els egipcis
オタクエンジニアを熱くさせる!モチベーションをあげるチームビルディング
JaSST Tokyo 2022 アジャイルソフトウェア開発への統計的品質管理の応用
リクルートにおけるデータのインフラ化への取組
10分でわかったつもりになるLean Analytics_10min lean analytics
ザ・ジェネラリスト #5000dai
情報処理技術者試験で学ぶ SAML
Ad

Similar to How to Scale from Workstation through Cloud to HPC in Cryo-EM Processing (20)

PDF
Collins seattle-2014-final
PPTX
Opportunities for X-Ray science in future computing architectures
PDF
Networked Data Storage and Analysis for the Wisconsin Regional Materials Network
PDF
Pycon9 dibernado
PPTX
Introduction to Machine Learning and Texture Analysis for Lesion Characteriza...
PPTX
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
PDF
Medical imaging Seminar Session 1
PDF
GPU Compute in Medical and Print Imaging
 
PDF
The Power of Scripting with a Desktop SEM
PPTX
MDC Connects: CryoEM in medicines discovery
PDF
Enabling Real Time Analysis & Decision Making - A Paradigm Shift for Experime...
PPT
Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility
PDF
G3 talk rld_2
PDF
Multi-GPU FFT Performance on Different Hardware
PDF
Ground truth generation in medical imaging: a crowdsourcing-based iterative a...
PPTX
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
PPTX
pptxonCyroelectromicroscopydetailed.pptx
PPTX
Opportunities for HPC in pharma R&D - main deck
PDF
Xomics brochure 2013
PDF
sem fe
Collins seattle-2014-final
Opportunities for X-Ray science in future computing architectures
Networked Data Storage and Analysis for the Wisconsin Regional Materials Network
Pycon9 dibernado
Introduction to Machine Learning and Texture Analysis for Lesion Characteriza...
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Medical imaging Seminar Session 1
GPU Compute in Medical and Print Imaging
 
The Power of Scripting with a Desktop SEM
MDC Connects: CryoEM in medicines discovery
Enabling Real Time Analysis & Decision Making - A Paradigm Shift for Experime...
Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility
G3 talk rld_2
Multi-GPU FFT Performance on Different Hardware
Ground truth generation in medical imaging: a crowdsourcing-based iterative a...
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
pptxonCyroelectromicroscopydetailed.pptx
Opportunities for HPC in pharma R&D - main deck
Xomics brochure 2013
sem fe
Ad

More from inside-BigData.com (20)

PDF
Major Market Shifts in IT
PDF
Preparing to program Aurora at Exascale - Early experiences and future direct...
PPTX
Transforming Private 5G Networks
PDF
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
PDF
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
PDF
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
PDF
HPC Impact: EDA Telemetry Neural Networks
PDF
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
PDF
Machine Learning for Weather Forecasts
PPTX
HPC AI Advisory Council Update
PDF
Fugaku Supercomputer joins fight against COVID-19
PDF
Energy Efficient Computing using Dynamic Tuning
PDF
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
PDF
State of ARM-based HPC
PDF
Versal Premium ACAP for Network and Cloud Acceleration
PDF
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
PDF
Scaling TCO in a Post Moore's Era
PDF
CUDA-Python and RAPIDS for blazing fast scientific computing
PDF
Introducing HPC with a Raspberry Pi Cluster
PDF
Overview of HPC Interconnects
Major Market Shifts in IT
Preparing to program Aurora at Exascale - Early experiences and future direct...
Transforming Private 5G Networks
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
HPC Impact: EDA Telemetry Neural Networks
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Machine Learning for Weather Forecasts
HPC AI Advisory Council Update
Fugaku Supercomputer joins fight against COVID-19
Energy Efficient Computing using Dynamic Tuning
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
State of ARM-based HPC
Versal Premium ACAP for Network and Cloud Acceleration
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Scaling TCO in a Post Moore's Era
CUDA-Python and RAPIDS for blazing fast scientific computing
Introducing HPC with a Raspberry Pi Cluster
Overview of HPC Interconnects

Recently uploaded (20)

PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PPTX
Cloud computing and distributed systems.
PDF
Empathic Computing: Creating Shared Understanding
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PPTX
Big Data Technologies - Introduction.pptx
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPT
Teaching material agriculture food technology
PPTX
Programs and apps: productivity, graphics, security and other tools
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
Electronic commerce courselecture one. Pdf
PDF
Machine learning based COVID-19 study performance prediction
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PPTX
sap open course for s4hana steps from ECC to s4
PDF
Encapsulation theory and applications.pdf
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Mobile App Security Testing_ A Comprehensive Guide.pdf
20250228 LYD VKU AI Blended-Learning.pptx
Cloud computing and distributed systems.
Empathic Computing: Creating Shared Understanding
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Big Data Technologies - Introduction.pptx
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Teaching material agriculture food technology
Programs and apps: productivity, graphics, security and other tools
Understanding_Digital_Forensics_Presentation.pptx
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Electronic commerce courselecture one. Pdf
Machine learning based COVID-19 study performance prediction
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Unlocking AI with Model Context Protocol (MCP)
Encapsulation_ Review paper, used for researhc scholars
Dropbox Q2 2025 Financial Results & Investor Presentation
sap open course for s4hana steps from ECC to s4
Encapsulation theory and applications.pdf

How to Scale from Workstation through Cloud to HPC in Cryo-EM Processing

  • 1. How To Scale From Workstation Through Cloud To HPC In Cryo-EM Processing Presented by: Dr Lance Wilson
  • 2. Presentation Outline What will we discuss… • Who am I? Where am I from? • What is cryo-electron microscopy? • How long does analysis take? • What impact does hardware have? • What options do you suggest? • What processing opportunities exist? • Next opportunity: Light sheet microscopy!
  • 3. Who am I and Where am I from? The research context
  • 7. MASSIVE … is a data processing engine for Australian science. It empowers researchers to unlock impactful research discoveries within scientific data. Integrative High Performance Computing • usability by new HPC user communities over raw capacity; • hardware suited to data processing; • underpinning high performing wet, experimental and data-focused laboratories, with growing data processing needs; • workflows that increase return on investment in instruments and experiments; and • porosity and flexibility to serve specific requirements in growth usage areas, such as the life sciences, machine learning etc.
  • 8. FEI Titan Krios Nationally funded project to develop environments for Cryo analysis MMI Lattice Light Sheet Nationally funded project to capture and preprocess LLS data Synchrotron MX Data Management, Integration between AS and MASSIVE M3 MASSIVE M3 Structural refinement and analysis Professor Trevor Lithgow ARC Australian Laureate Fellow Discovery of new protein transport machines in bacteria, understanding the assembly of protein transport machines, and dissecting the effects of anti- microbial peptides on anti-biotic resistant “super-bugs” Chamber details from the nanomachine that secretes the toxin that causes cholera. Research and data by Dr. Iain Hay (Lithgow lab)
  • 9. MASSIVE and Characterisation Virtual Laboratory How do we partner with researchers… HIGHLIGHT TEXT TO DRAW ATTENTION TO A CALL-OUT. “Here is your CD of data…” “Your data is moving up to a data management system in the cloud where you have access to a range of tools and services to start your data analysis”
  • 10. Rich Online Workbenches How do we complement/replace workstations… Atom Probe, Structural Biology, Bioinformatics, Cytometry, Cryo-Electron Microscopy, Neutron Beam Imaging, General Imaging Tool, Light Microscopy, General Scientific, X-ray
  • 11. Cryo-EM: Titan Krios Cryo-EM PC Collect movies, Images, MyTardis Data storage, sharing MyData App Raw frames MyData App Corrected Stills Picking Publication DOI, Reuse Strudel Desktop & Web Ctffind Relion Frealign Model
  • 12. What is Cryo Electron Microscopy? … and some research context
  • 16. How many projections does a clinical CT use? 1000! http://www.upstate.edu/radiology/education/rsna/ct/rec
  • 21. 3D reconstruction of the electron density of aE11 Fab’ polyC9 complex
  • 22. What outcome are the scientists looking for? http://www.ebi.ac.uk/pdbe/entry/pdb/5k12
  • 25. What is the time taken for analysis of a single dataset?
  • 26. Paper Processing Time Protein Purification Screening Neg. Stain Screening - Cryo Cryo Collection 2 months 2 months 2 months 3 x 2 days Tecnai T12 JEOL JEM-1400 Molecular Model 3 days Motion correction CTF correction 2D classification 3D classification Refinement Tecnai T12 Talos Arctica Titan Krios - K2 Talos - Falcon 3 1 hr 5 min 1 day 1 day 3 days Legend Time = Red Equipment = Blue Steps = Grey 8 Months 300 Steps 3 Months
  • 27. How much are GPUs used in preprocessing?
  • 28. Important quote: “The structure determination task is further complicated by the lack of information about the relative orientations of all particles and, in the case of structural variability in the sample, also their assignment to a structurally unique class.”
  • 29. What are the major analysis steps? Hint: All use GPUs and are done repeatedly. 2D Classification 3D Classification Refinement
  • 30. What is 2D Classification? 2D classification results are a way to visualize how your 2D images will cluster together into homogenous class averages without any model required. Source: https://github.com/cianfrocco-lab/Old-school-processing/wiki/2D-classification
  • 31. 2D Classification (Maximum Likelihood) The ML2D algorithm may be used to simultaneously align and classify single-particle images. An intrinsic characteristic of the ML approach is that it does not assign images to one particular class or orientation. Instead, images are compared with all references in all possible orientations and probability weights are calculated for each possibility. https://doi.org/10.1016/S0076-6879(10)82013-0
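The "probability weights for every class and orientation" idea can be illustrated with a toy example. This is a minimal sketch, not the ML2D implementation: it assumes 1D signals instead of images, circular shifts as a stand-in for in-plane rotations, and Gaussian noise with a known `sigma`. The function names are hypothetical.

```python
import math


def shift(signal, k):
    """Circularly shift a 1D signal by k samples (stand-in for a rotation)."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def probability_weights(image, references, sigma=1.0):
    """Return weights[c][k]: probability of class c at shift k.

    Every reference is compared in every orientation; no hard assignment
    is made. Weights are normalised over all (class, shift) pairs.
    """
    log_w = []
    for ref in references:
        row = []
        for k in range(len(ref)):
            candidate = shift(ref, k)
            sq_dist = sum((a - b) ** 2 for a, b in zip(image, candidate))
            row.append(-sq_dist / (2.0 * sigma ** 2))
        log_w.append(row)
    # Subtract the peak before exponentiating for numerical stability.
    peak = max(max(row) for row in log_w)
    unnorm = [[math.exp(v - peak) for v in row] for row in log_w]
    total = sum(sum(row) for row in unnorm)
    return [[v / total for v in row] for row in unnorm]


# Two toy references and one noisy "image" (illustrative values only).
references = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
image = [0.0, 0.9, 0.1, 0.0]  # close to reference 0 shifted by one sample
weights = probability_weights(image, references, sigma=0.5)
for c, row in enumerate(weights):
    print(c, [round(w, 4) for w in row])
```

In a full maximum-likelihood scheme these weights would then be used to form weighted class averages; that update step is omitted here.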
  • 32. Particle picking for next step of classification in 2D Source: https://www.e-sciencecentral.org/articles/Table.php?xn=am/am-48-001&id=
  • 33. Classification of selected particles in 2D Source: https://www.e-sciencecentral.org/articles/Table.php?xn=am/am-48-001&id=
  • 34. What is 3D Classification? “The premise of 3D classification is to use an initial starting model to sort your particles into 3 or more groups. By using the 3D model, Relion uses maximum likelihood methods to create homogenous classes of particles that will belong to a given group.” Source: https://github.com/cianfrocco-lab/Old-school-processing/wiki/Relion-3D-classification
  • 35. What is the time taken for analysis of a single dataset?
  • 36. What is the problem? Antibiotics at crisis point. Slow time from lead compound to clinic. Increasing prevalence of resistant bacteria in the clinic. UK AMR review: “Deaths by infection will be larger than both cancer and diabetes combined by 2050…” A big problem is Methicillin-resistant Staphylococcus aureus (MRSA) and Vancomycin-resistant Enterococci (VRE)…
  • 37. Patient to Microscope “Pragmatic Approach” Australian patient contracts MRSA; Linezolid resistance; MDR-MRSA isolated; Genome sequenced (Illumina); 70S ribosomes isolated; Ribosomes imaged on FEI Titan Krios • Rational redesign of existing drug pharmacophores • Linezolid • Virginiamycin (Synercid) • More tractable chemical synthesis of fully synthetic derivatives • Thiostrepton • Semi-synthetic modifications to help with drug solubility
  • 42. How long does an analysis take (and why?)? 2017: • ~1-4TB raw data set/sample, ~2000-5000 files • Pipeline analysis with internal & external tools • Require large memory GPU > 8GB • Require large system memory > 64GB • Require CPU cores 200 - 400 • Parallel file reads and writes. 2019: • ~1-10TB raw data set/sample, ~2000-10000 files • Require large memory GPU > 16GB • Require large system memory > 18GB • Require CPU cores 30 - 130 • Parallel file reads and writes and high ops local cache
  • 43. How long does each of the steps take? • 2,500 images • 150,000 particles • 260 pixels
      Task                 Submitted?  GPU?  Nodes  Time
      Import               No          -     -      < 1 min
      Motion Correction    Yes         Yes   3      20 min
      CTF estimation       Yes         No    1      20 min
      Manual Picking       No          -     -      ?
      Autopicking          Yes         Yes   2      40 min
      Particle Extraction  Yes         No    1      10 min
      2D Classification    Yes         Yes   2      10 min/iteration
      3D Classification    Yes         Yes   1      10 min/iteration
      3D Refine            Yes         Yes   2      5-10 min/iteration
      Movie Refine         Yes         No    1      1 hour
      Particle Polishing   Yes         No    1      1-2 hours
      Mask Creation        No          -     -      5-30 min
      Postprocessing       No          -     -      < 1 min
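To get a feel for how the step times add up, the figures on this slide can be summed for a single pass through the pipeline. This is a rough sketch under stated assumptions: the iteration count (25 per iterative step) is an assumption, not a figure from the slide; manual picking is excluded because its time is unknown; and "< 1 min" is treated as 0 to 1 minute.

```python
# Assumption: 25 iterations for each iterative step (not stated on the slide).
ASSUMED_ITERATIONS = 25

# (low, high) minutes for steps with a fixed duration, taken from the slide.
FIXED_STEPS_MIN = {
    "Import": (0, 1),
    "Motion Correction": (20, 20),
    "CTF estimation": (20, 20),
    "Autopicking": (40, 40),
    "Particle Extraction": (10, 10),
    "Movie Refine": (60, 60),
    "Particle Polishing": (60, 120),
    "Mask Creation": (5, 30),
    "Postprocessing": (0, 1),
}

# (low, high) minutes per iteration for the iterative steps.
PER_ITERATION_MIN = {
    "2D Classification": (10, 10),
    "3D Classification": (10, 10),
    "3D Refine": (5, 10),
}


def single_pass_minutes(iterations=ASSUMED_ITERATIONS):
    """Return (low, high) total minutes for one pass through every step."""
    low = sum(lo for lo, _ in FIXED_STEPS_MIN.values())
    high = sum(hi for _, hi in FIXED_STEPS_MIN.values())
    low += iterations * sum(lo for lo, _ in PER_ITERATION_MIN.values())
    high += iterations * sum(hi for _, hi in PER_ITERATION_MIN.values())
    return low, high


low, high = single_pass_minutes()
print(f"Single pass: {low / 60:.1f} to {high / 60:.1f} hours")
```

A single pass comes to well under a day on these numbers. The classification and refinement steps are typically repeated many times, which is one way the end-to-end figure can stretch to days.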
  • 45. How long does an analysis take (and why?)? 9 DAYS GPU acceleration for 30 of 45
  • 46. How long does an analysis take (and why?)? 2D Classification
  • 47. Does GPU architecture affect performance? ~2x per generation (Kepler -> Volta)
  • 48. Does job layout matter between GPU architectures? Volta peaks at lower MPI ranks, leaves more GPU RAM available
  • 49. How does GPU RAM affect scientific outcomes? K3 cameras produce 5k images
  • 50. Workstation GPUs vs HPC GPUs? ~30% faster, even faster for higher precision
  • 52. How long does an analysis take (and why?)? 14 DAYS GPU acceleration for 11 of 30
  • 53. How long does an analysis take (and why?)? 2D Classification
  • 54. How long does each step take (and why?)? • 6 days of compute • 3 days without extraction step
  • 55. How long does each step take (and why?)? Extract particles
  • 56. How long does each step take (and why?)?
  • 57. How long does each step take (and why?)? Zoom change
  • 58. How long does each step take (and why?)? Two models for structure
  • 59. How long does each step take (and why?)? Improving resolution
  • 60. How fast can we process this data? … hardware choices, do they matter?
  • 61. Hardware Configurations Workstation: 8 cores, 64GB RAM, 4 x GTX1070, SSD + NAS | Cloud (HPC): dual socket, 36 cores, 384GB RAM, 3 x V100, SSD + Lustre | DGX-1V: dual socket, 40 cores, 512GB RAM, 8 x V100, SSD
  • 62. Can we process faster? (Is HPC the option) [Bar chart: 2D Classification Step Runtime for Different Hardware; job runtime (min) by hardware configuration. Workstation 111.62, DGX1-V 29.75, 1xm3g 67.03, 2xm3g 43.85, 3xm3g 38.53, 4xm3g 39.10]
  • 63. Can we process faster? (do options matter?) [Bar chart: 2D Classification Step Runtime for Different Hardware and Optimisations; job runtime (min) by optimisation option (User config, Pre-read particles off, Parallel disk access off, MPI result combine off, Local scratch on); series: 1xm3g, DGX1-V]
  • 64. Can we process faster? (do options matter?) [Bar chart: 2D Classification Step Runtime for Different Hardware and Optimisations; job runtime (min) by optimisation option (User config, Pre-read particles off, Parallel disk access off, MPI result combine off, Local scratch on); series: 4xm3g, 1xm3g]
  • 65. What is the ratio of cost to performance? [Chart: Cost and Performance Ratios for Workstation vs HPC, by hardware configuration. Cost ratio: 1 x GPU node 7.14, 2 x GPU node 14.29, 3 x GPU node 21.43, 4 x GPU node 28.57, DGX-1V 28.57. Performance ratio: 1 x GPU node 1.67, 2 x GPU node 2.55, 3 x GPU node 2.90, 4 x GPU node 2.85, DGX-1V 3.75]
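The performance ratios on this slide can be reproduced from the 2D classification runtimes on the earlier runtime chart by dividing the workstation runtime by each HPC runtime. The sketch below assumes the runtime labels map to hardware as shown (workstation 111.62 min, DGX-1V 29.75 min, and so on), which is the assignment that reproduces the published ratios. The "performance per unit cost" column is an extra derived figure, not something shown on the slide.

```python
WORKSTATION_MIN = 111.62  # 2D classification runtime on the workstation

# Runtime in minutes per configuration (assumed label-to-hardware mapping).
RUNTIME_MIN = {
    "1 x GPU node": 67.03,
    "2 x GPU node": 43.85,
    "3 x GPU node": 38.53,
    "4 x GPU node": 39.10,
    "DGX-1V": 29.75,
}

# Cost relative to the workstation, as given on the slide.
COST_RATIO = {
    "1 x GPU node": 7.14,
    "2 x GPU node": 14.29,
    "3 x GPU node": 21.43,
    "4 x GPU node": 28.57,
    "DGX-1V": 28.57,
}


def performance_ratio(runtime_min, baseline_min=WORKSTATION_MIN):
    """Speed-up relative to the workstation (higher is faster)."""
    return baseline_min / runtime_min


for name, runtime in RUNTIME_MIN.items():
    perf = performance_ratio(runtime)
    per_cost = perf / COST_RATIO[name]
    print(f"{name:13s} perf {perf:.2f}  cost {COST_RATIO[name]:.2f}  perf/cost {per_cost:.3f}")
```

On these numbers every HPC configuration is faster than the workstation, but none delivers more speed-up than it costs for this single step, and adding a fourth GPU node gives no further gain over three.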
  • 66. What retail options for compute exist? Workstations + Software Support
  • 67. How fast are the Relion developers' machines?
      [lancew@m3-login1 job008]$ pwd
      /scratch/br76/relion3tut/relion30_tutorial_precalculated_results/Class2D/job008
      [lancew@m3-login1 job008]$ ls --full-time
      total 2884
      -rw-r--r-- 1 lancew br76 2777 2018-06-07 23:32:58.000000000 +1000 default_pipeline.star
      -rw-r--r-- 1 lancew br76 840 2018-06-07 23:32:58.000000000 +1000 job_pipeline.star
      -rw-r--r-- 1 lancew br76 448 2018-06-07 23:32:58.000000000 +1000 note.txt
      -rw-r--r-- 1 lancew br76 0 2018-06-07 23:32:57.000000000 +1000 run.err
      -rw-r--r-- 1 lancew br76 820224 2018-06-07 23:32:58.000000000 +1000 run_it025_classes.mrcs
      -rw-r--r-- 1 lancew br76 740401 2018-06-07 23:32:57.000000000 +1000 run_it025_data.star
      -rw-r--r-- 1 lancew br76 223301 2018-06-07 23:32:57.000000000 +1000 run_it025_model.star
      -rw-r--r-- 1 lancew br76 5916 2018-06-07 23:32:58.000000000 +1000 run_it025_optimiser.star
      -rw-r--r-- 1 lancew br76 458 2018-06-07 23:32:56.000000000 +1000 run_it025_sampling.star
      -rw-r--r-- 1 lancew br76 1234 2018-06-07 23:32:57.000000000 +1000 run.job
      -rw-r--r-- 1 lancew br76 307687 2018-06-07 23:32:58.000000000 +1000 run.out
      -rw-r--r-- 1 lancew br76 820224 2018-06-07 23:32:58.000000000 +1000 run_unmasked_classes.mrcs
  • 68. Why do these systems look attractive to researchers? Source: https://www.exxactcorp.com/Relion-for-Cryo-EM-Solutions DO MORE SCIENCE! Have peace of mind, focus on what matters most, knowing your system is backed by a 3 year warranty and support. PLUG AND PLAY Exxact systems are fully turnkey, built to perform right out of the box so you avoid the drudgery of configuration and setup.
  • 69. Why do these systems look attractive to researchers? Source: https://www.exxactcorp.com/Relion-for-Cryo-EM-Solutions Software: System: CentOS 7, CUDA Toolkit v9.2, Evince, FFTW3, Gnuplot, MPICH v3.2.1. Research: Unblur v1.0.2, EMAN v2.21, IMOD v4.9.9, Relion 3, Motioncorr, MotionCor2, Gctf v1.06, CTFFind4.1.10, ResMap v1.1.4, Summovie v1.0.2, Chimera v1.13. PRECONFIGURED “Relion 3.0 beta and v2.1 With example job submission scripts, benchmarks, a fully validated test suite, and the latest software patches for quick implementation.”
  • 70. Why do these systems look attractive to researchers? Source: http://www.linuxvixion.com/wp-content/uploads/2016/04/linuxvixion-molecular-dynamics-2016-english.pdf
  • 72. What are the cloud options? …and cost? $25/hr x 24 hr/day x 14 days = $8400 Source: https://aws.amazon.com/ec2/instance-types/p3/ Source: http://labphoto.tumblr.com/post/105112424006/cyanuric-triazide-or-2-4-6-triazido-1-3-5-triazine
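The cloud figure on this slide is a straight multiplication of the hourly rate by the wall-clock hours of a 14-day analysis. A minimal sketch, assuming the instance runs continuously for the whole period and ignoring storage and data transfer charges (the function name is hypothetical):

```python
def cloud_cost_usd(rate_per_hour, days, hours_per_day=24):
    """Cost of keeping one instance running continuously for `days` days."""
    return rate_per_hour * hours_per_day * days


# Figures from the slide: $25/hr for a 14-day analysis.
print(cloud_cost_usd(25, 14))  # 8400
# The 9-day analysis mentioned earlier in the deck, at the same rate.
print(cloud_cost_usd(25, 9))   # 5400
```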
  • 73. What has changed in 2 years? Mostly hardware … and a little ML
  • 74. Options for Computation (2017) Workstation: Pro: full user control; Con: limited by single machine. Cloud: Pro: scales easily; Con: cost, complexity, data movement. HPC: Pro: huge resources; Con: tightly controlled, shared.
  • 75. Options for Computation (2019) Workstation: Pro: full user control; Con: limited by single machine. Cloud: Pro: scales easily; Con: cost, complexity, data movement. HPC: Pro: huge resources, purpose designed; Con: tightly controlled, shared. Containers
  • 77. [Chart: Relion - 3D Classification Step (2017); processing time (mins) vs number of cores (threads): 4, 8, 16; series: K80, DGX] K80 = 24 x CPU, 256GB RAM, 4 x K80 GPU; DGX-1 = 32 x CPU, 512GB RAM, 8 x P100
  • 78. [Chart: Relion - 3D Classification Step (2019); processing time (mins) vs number of cores (threads): 4, 8, 16; series: K80, DGX, DGX1-V] K80 = 24 x CPU, 256GB RAM, 4 x K80 GPU; DGX-1 = 32 x CPU, 512GB RAM, 8 x P100; DGX-1V = 40 x CPU, 512GB RAM, 8 x V100
  • 79. [Chart: Relion Class 3D - MPI Task Effects (2017); processing time (mins) vs MPI tasks: 3, 5, 9, 13, 17; series: K80 Processing Time, DGX Processing Time] K80 = 24 x CPU, 256GB RAM, 4 x K80 GPU; DGX-1 = 32 x CPU, 512GB RAM, 8 x P100
  • 80. [Chart: Relion Class 3D - MPI Task Effects (2019); processing time (mins) vs MPI tasks: 3, 5, 9, 13, 17; series: K80, DGX, DGX-1V] K80 = 24 x CPU, 256GB RAM, 4 x K80 GPU; DGX-1 = 32 x CPU, 512GB RAM, 8 x P100; DGX-1V = 40 x CPU, 512GB RAM, 8 x V100
  • 81. [Chart: Iteration Processing Time vs Number of DGX-1 Servers; processing time for first iteration (mins): 1 server 39.75, 2 servers 19.92, 3 servers 13.12, 4 servers 9.88] Linear speed up! K80 = 24 x CPU, 256GB RAM, 4 x K80 GPU; DGX-1 = 32 x CPU, 512GB RAM, 8 x P100; DGX-1V = 40 x CPU, 512GB RAM, 8 x V100
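The "linear speed up" claim can be checked directly from the four timings on this slide. The sketch below computes speed-up (time on one server divided by time on n servers) and parallel efficiency (speed-up divided by n); an efficiency close to 1.0 means scaling is effectively linear. The variable and function names are illustrative.

```python
# First-iteration processing time (minutes) by number of DGX-1 servers,
# as read from the slide.
FIRST_ITERATION_MIN = {1: 39.75, 2: 19.92, 3: 13.12, 4: 9.88}


def scaling_table(times):
    """Return {n: (speedup, efficiency)} relative to the single-server time."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}


for n, (speedup, efficiency) in scaling_table(FIRST_ITERATION_MIN).items():
    print(f"{n} server(s): speed-up {speedup:.2f}x, efficiency {efficiency:.3f}")
```

Efficiencies slightly above 1.0 at three and four servers are within what measurement noise or cache effects could explain; the slide does not say which.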
  • 82. What has stayed the same in 2 years? Mostly software … still a struggle
  • 83. Software Dependencies • Relion has two main parts • Internal programs • External programs • List of dependencies • FFTW • FLTK • OpenMPI
  • 84. How to accelerate a solution with HPC? Using a real example (K2)
  • 85. What are the optimisation options?
      --j: The number of parallel threads to run on each CPU. We often use 4-6.
      --dont_combine_weights_via_disc: By default large messages are passed between MPI processes through reading and writing of large files on the computer disk. By giving this option, the messages will be passed through the network instead. We often use this option.
      --no_parallel_disc_io: By default, all MPI slaves read their own particles (from disk or into RAM). Use this option to have the master read all particles, and then send them all through the network. We do not often use this option.
      --preread_images: By default, all particles are read from the computer disk in every iteration. Using this option, they are all read into RAM once, at the very beginning of the job instead. We often use this option if the machine has enough RAM (more than N*boxsize*boxsize*4 bytes) to store all N particles.
      --scratch_dir: By default, particles are read every iteration from the location specified in the input STAR file. By using this option, all particles are copied to a scratch disk, from where they will be read (every iteration) instead. We often use this option if we don't have enough RAM to read in all the particles, but we have large enough fast SSD scratch disk(s) (e.g. mounted as /tmp).
      Source: https://www3.mrc-lmb.cam.ac.uk/relion/index.php/Benchmarks_%26_computer_hardware
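The RAM rule quoted for `--preread_images` (more than N*boxsize*boxsize*4 bytes) can be turned into a quick check. This is a sketch only: the helper names are hypothetical, and the example dataset (150,000 particles at 260 pixels) is borrowed from the timing slide earlier in the deck on the assumption that 260 pixels is the box size. The chosen flags are the ones listed above; the rest of the Relion command line is deliberately left out.

```python
def preread_bytes(n_particles, box_size):
    """RAM needed to hold all particles: N * boxsize * boxsize * 4 bytes."""
    return n_particles * box_size * box_size * 4


def io_options(n_particles, box_size, ram_bytes, scratch_dir=None, threads=4):
    """Pick I/O flags following the guidance quoted on the slide.

    Pre-read into RAM if the particles fit; otherwise fall back to a fast
    scratch disk if one is given. Headroom for the rest of the job is not
    modelled here, so treat a near-miss as "does not fit".
    """
    opts = ["--j", str(threads), "--dont_combine_weights_via_disc"]
    if preread_bytes(n_particles, box_size) < ram_bytes:
        opts.append("--preread_images")
    elif scratch_dir is not None:
        opts += ["--scratch_dir", scratch_dir]
    return opts


needed = preread_bytes(150_000, 260)
print(f"{needed / 1e9:.2f} GB needed to pre-read")  # 40.56 GB
print(io_options(150_000, 260, ram_bytes=64 * 10**9))
print(io_options(150_000, 260, ram_bytes=32 * 10**9, scratch_dir="/tmp"))
```

On these assumptions the example dataset needs about 40.6 GB, so it would fit in the 64GB workstation only with limited headroom, while the 384GB and 512GB nodes have ample room.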
  • 89. How are the CPUs used? … How many are needed?
  • 90. How are the CPUs used? … How many are needed?
  • 91. How are the CPUs used? … How many are needed? CPU usage divided between I/O and analysis
  • 92. GPU Performance Measurements How efficient is the software… Latest generation GPU hardware enables very detailed performance metrics. Utilization rates report how busy each GPU is over time, and can be used to determine how much an application is using the GPUs in the system. GPU: Percent of time over the past sample period during which one or more kernels was executing on the GPU. Memory: Percent of time over the past sample period during which global (device) memory was being read or written. PCIe Rx and Tx: Throughput in MB/s. https://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf
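The GPU and memory utilisation rates described on this slide can be sampled with `nvidia-smi` in query mode. The sketch below is one way to do that from Python, not the method used to produce the plots in this deck. It asks for `utilization.gpu`, `utilization.memory`, `memory.used` and `power.draw` in headerless CSV and parses the result; the helper names are hypothetical and the sample rows at the bottom are made up for illustration.

```python
import subprocess

QUERY_FIELDS = ["utilization.gpu", "utilization.memory", "memory.used", "power.draw"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        row = {}
        for name, value in zip(QUERY_FIELDS, values):
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = None  # e.g. "[N/A]" when a metric is unsupported
        rows.append(row)
    return rows


def sample_gpus():
    """Run nvidia-smi once and return the parsed metrics (needs an NVIDIA GPU)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


# Made-up sample output for two GPUs, so the parser can be tried without a GPU.
example = parse_gpu_csv("37, 2, 12288, 50.00\n0, 0, 3, [N/A]\n")
for gpu in example:
    print(gpu)
```

Calling `sample_gpus()` in a loop while a job runs gives a time series of the kind of utilisation, memory and power figures discussed on the following slides.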
  • 93. What are the GPUs doing? … are they busy?
  • 94. What are the GPUs doing? … are they busy?
  • 95. What are the GPUs doing? … are they busy? Relion3 Tutorial Dataset: 37%
  • 96. What are the GPUs doing? … are they busy? Relion3 Tutorial Dataset: 1.7%
  • 97. What are the GPUs doing with memory? Relion3 Tutorial Dataset: 12GB
  • 98. How fast is data copied to memory? Relion3 Tutorial Dataset
  • 99. How much power is consumed? … global warming? Relion3 Tutorial Dataset: 50W
  • 100. How much cooling is needed? … global warming? Relion3 Tutorial Dataset
  • 101. Where are the opportunities for reducing “Time to Science”?
  • 102. Relion 3 – CPU vs GPU? Where is development heading… Limitations of memory; increased performance from new CPUs. “However, the GPU acceleration is only available for cards from a single vendor, and it cannot use many of the large-scale computational resources available in existing centres, local clusters, or even researchers’ laptops. In addition, the memory available on typical GPUs limits the box sizes that can be used, which could turn into a severe bottleneck for large particles. For relion-3, we have developed a new general code path where CPU algorithms have been rewritten to mirror the GPU acceleration, which provides dual execution branches of the code that work very efficiently both on accelerators as well as the single-instruction, multiple-data vector units present on traditional CPUs.” https://www.biorxiv.org/content/biorxiv/early/2018/09/19/421123.full.pdf
  • 103. Opportunities for speed up ❏ Cryosparc ❏ Cryolo ❏ Preprocessing tools Source: https://cryosparc.com/
  • 104. Opportunities for speed up ❏ Cryosparc ❏ Cryolo ❏ Preprocessing tools Source: http://sphire.mpg.de/wiki/doku.php?id=downloads:cryolo_1&redirect=1
  • 105. Opportunities for speed up ❏ Cryosparc ❏ Cryolo ❏ Preprocessing tools
  • 106. Opportunities for speed up ❏ Cryosparc ❏ Cryolo ❏ Preprocessing tools https://github.com/Characterisation-Virtual-Laboratory/Cryo-EM-Processing-Tool
  • 110. Next Generation: 4D+ Volumetric Imaging Lattice light sheet microscopy
  • 112. Mitochondrial DNA (green) escaping mitochondria (red) during cell death. Dr Kate McArthur (Monash BDI) and Dr Lachlan Whitehead (WEHI) and The Advanced Imaging Centre at Janelia Research Campus. Credit: Steve Morton
  • 113. Acknowledgments: Dr Matt Belousof, Hari Venogopal, Jafar Lie, Jay Van Schyndel, MASSIVE Partners, ARDC
  • 114. THANK YOU Final messages: ✓ HPC is the best option, if you have interactive queues. ✓ Cloud is good for one-off analysis. ✓ Workstations make great training/learning platforms. For future questions: lance.wilson@monash.edu