2D Classification
3D Initial model
Christos	Gatsogiannis
30.	January		2019
Tissues Cells Organelles Macro(molecules)
1mm 100μm 10μm 1μm 100nm 10nm 10	Å 1	Å
LM
Electron tomography
SAXS
Electron crystallography & single particle TEM
X-ray crystallography
NMR
http://gem.loria.fr/gEMfitter/
TRPV1 (2013, Liao et al., Nature)
TRPV1 (2008, Moiseenkova-Bell et al., PNAS)
Electron cryo Microscopy
The END of “Blob”-ology!
http://www.emdatabank.org
• Correct for motion
• Estimate CTF
• Particle picking
• 2D Classification
• Get Initial Model
• 3D Refinement
• 3D Variability
• 3D Heterogeneity analysis
• Local Resolution/Filtering
From projection images ... to an initial model ... to a refined 3D structure
We need:
• different views
• average many images
Initial Assumption:
All particles have the same structure
and are linked by 3D rigid body transformations
(Frank,	1996)
Status	quo
CTF Picking Extraction
• Stack	of	extremely	noisy	particles	– low	signal-to-noise	ratio	(SNR)
• Difficult/impossible	to	
• assess	data	set	quality	and	homogeneity
• do	ab-initio 3D	reconstruction
➜ Need	to	improve	SNR
Suboptimal	dataset?
Suboptimal initial model?
Pitfalls
Particle Picking
Improving	SNR	by	averaging
Avg.	1 Avg.	10 Avg.	100 Avg.	1000 Avg.	10000
Average	of	many	images	of	the	same	view	of	an	identical	object
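The effect of averaging can be sketched numerically. The following is a minimal simulation, assuming a synthetic disc as the "particle" and additive Gaussian noise (both are illustrative assumptions, not data from the lecture). It shows the residual noise shrinking roughly with the square root of the number of averaged images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise-free "particle": a bright disc on a 32x32 grid.
yy, xx = np.mgrid[:32, :32]
signal = ((yy - 16) ** 2 + (xx - 16) ** 2 < 36).astype(float)


def noisy_copy(sigma=5.0):
    """One simulated image: the signal plus Gaussian noise (assumed noise model)."""
    return signal + rng.normal(0.0, sigma, signal.shape)


def residual_noise(n_images):
    """Standard deviation of (average - signal) after averaging n_images copies."""
    avg = np.mean([noisy_copy() for _ in range(n_images)], axis=0)
    return float(np.std(avg - signal))


# Same image counts as a subset of the "Avg. N" panels on the slide.
noise_by_n = {n: residual_noise(n) for n in (1, 10, 100, 1000)}
for n, s in noise_by_n.items():
    print(f"Avg. {n:5d}: residual noise std = {s:.3f}")
```

Every factor of 100 in the number of images should reduce the residual noise by about a factor of 10, provided all images really show the same view of the same object.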
Closer to reality
Shift and rotate every particle to match a reference = 2D alignment (x, y, α)
Different measures for the similarity of image pairs, e.g. cross-correlation (CCC) or sum of squared errors (SSE)
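The two similarity measures named above can be written down in a few lines. This is a sketch with hypothetical helper names (`ccc`, `sse`) and random test images. It uses the normalized form of the cross-correlation coefficient, which is one common definition and an assumption here.

```python
import numpy as np


def ccc(a, b):
    """Normalized cross-correlation coefficient; 1.0 means identical up to gain and offset."""
    a0 = a - a.mean()
    b0 = b - b.mean()
    return float((a0 * b0).sum() / np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum()))


def sse(a, b):
    """Sum of squared errors; 0.0 means pixel-wise identical."""
    return float(((a - b) ** 2).sum())


rng = np.random.default_rng(1)
reference = rng.normal(size=(16, 16))       # hypothetical reference image
same = reference.copy()                      # perfectly aligned copy
shifted = np.roll(reference, 3, axis=1)      # same content, but misaligned by 3 pixels

print("aligned:   CCC =", ccc(reference, same), " SSE =", sse(reference, same))
print("misaligned: CCC =", ccc(reference, shifted), " SSE =", sse(reference, shifted))
```

A high CCC and a low SSE both indicate a good match, so an alignment search maximizes the former or minimizes the latter.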
Too	noisy	to	use	as	reference	➜ use	average	of	all	particles	instead
Iterate 2D	alignment	until	result	converges
x, y, α
Iteration 1 Iteration 2 Iteration 3
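The iteration described above can be sketched as follows. This is a deliberately simplified version that assumes translations only (no rotation α) and cyclic shifts, with a synthetic disc as the particle. The function names are hypothetical and this is not the implementation of any particular package.

```python
import numpy as np


def best_shift(image, reference):
    """Cyclic (dy, dx) shift of `image` that maximizes its cross-correlation with `reference`."""
    cc = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(image))).real
    dy, dx = np.unravel_index(np.argmax(cc), cc.shape)
    return int(dy), int(dx)


def align_stack(stack, n_iter=5):
    """Reference-free alignment: use the current average of all particles as the reference."""
    aligned = stack.copy()
    for _ in range(n_iter):
        reference = aligned.mean(axis=0)
        aligned = np.stack(
            [np.roll(img, best_shift(img, reference), axis=(0, 1)) for img in aligned]
        )
    return aligned


# Synthetic test data (assumption): an off-center disc, randomly shifted, plus noise.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[:32, :32]
particle = ((yy - 16) ** 2 + (xx - 12) ** 2 < 25).astype(float)
stack = np.stack([
    np.roll(particle, tuple(int(s) for s in rng.integers(-4, 5, size=2)), axis=(0, 1))
    + rng.normal(0.0, 1.0, particle.shape)
    for _ in range(50)
])

aligned = align_stack(stack)
spread_before = float(stack.var(axis=0).mean())
spread_after = float(aligned.var(axis=0).mean())
print(f"mean per-pixel variance: before {spread_before:.3f}, after {spread_after:.3f}")
```

Each pass can only keep or lower the spread of the stack around its average, which is why the procedure converges. It converges to a local optimum, so the result still depends on the starting average.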
Avg. 10 single images
Avg. 50 single images
Avg. 100 single images
Avg. 1000 single images
Can	you	find	out	
who	is	hiding	here?
Iterative
2D	alignment
Averaging	does	not	help	when	we	have	multiple	objects/views
➜ Need	to	group	particles	based	on	their	view
How	to	group	particles?
Easy	... 2D	Multi-Reference	Alignment	(MRA)
...	but	in	a	real	world	we	do	not	know	the	groups	our	data	set	is	made	up	of
High	risk	of	model	bias	
http://blake.bcm.edu/ncmi/workshop_files/Model_bias.pdf
➜ 2D	clustering
K-means	clustering
Every	image	with	N	pixels	
can	be	considered	a	point	in	a	
N-dimensional	coordinate	system
The	higher	the	similarity	of	
a	pair	of	images,	the	closer	
the	representing	points	are.
Initialization steps (K=3)
1. User provides the number of expected clusters K
2. Split the data set randomly into K clusters
3. Calculate the average image per cluster ➜ new center location
Iterations (K=3)
4. Find out which center each point is closest to (MRA)
5. Calculate the new average of all images belonging to one cluster
6. Repeat 4-5 until the result converges
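The six steps can be condensed into a short NumPy sketch. It treats every image as a point in N-dimensional space and uses squared Euclidean distance in place of a full multi-reference alignment, which is a simplifying assumption. The synthetic three-group data set and the function name `kmeans` are illustrative only.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means on row vectors; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: K is given by the user; split the data set randomly into K clusters.
    labels = rng.permutation(len(points)) % k
    for _ in range(n_iter):
        # Steps 3 and 5: the average of each cluster is its new center.
        centers = np.stack([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else points[rng.integers(len(points))]   # re-seed an empty cluster
            for j in range(k)
        ])
        # Step 4: assign every point to its closest center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 6: stop when the assignment no longer changes.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centers


# Synthetic data (assumption): 3 groups of 40 "images" with 16 pixels each.
rng = np.random.default_rng(1)
true_centers = rng.normal(0.0, 5.0, size=(3, 16))
points = np.concatenate([c + rng.normal(0.0, 1.0, size=(40, 16)) for c in true_centers])

labels, centers = kmeans(points, k=3)
print("cluster sizes:", np.bincount(labels, minlength=3))
```

Changing `seed` changes the random initial split, and with it possibly the final clusters.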
Four	Directors	of	MPI-Dortmund
Early stage of iterations
Middle stage of iterations
Final stage of iterations
Weaknesses	of	K-means
K-Means	clustering	is	a	good	algorithm	because	it	is	simple	and fast.	
However,	it	is	not	perfect…
Need to guess the number of clusters K
The number of clusters is a critical parameter and can affect results considerably.
Sensitive to initial condition
Results depend dramatically on the initialization. The algorithm may be trapped in a local optimum ➜ Model bias problem
Not robust to outliers
Data points far from the centroid may pull the centroid away from the center (weakness of the arithmetic mean) ➜ Especially problematic for preferred orientations
Limited to circular clusters of similar size
K-means can hardly handle clusters of variable size/density.
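The outlier weakness is easy to see with a small worked example. The coordinates below are made-up numbers chosen only to keep the arithmetic readable.

```python
import numpy as np

# Four points of a tight cluster (hypothetical 2D coordinates).
cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# One point far away from the cluster.
outlier = np.array([[21.0, 21.0]])

centroid_clean = cluster.mean(axis=0)
centroid_with_outlier = np.vstack([cluster, outlier]).mean(axis=0)

print("centroid without outlier:", centroid_clean)
print("centroid with outlier:   ", centroid_with_outlier)
```

A single distant point moves the arithmetic mean from (0.5, 0.5) to (4.6, 4.6), which lies outside the original cluster.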
K=5
K=10
The	real	cryo-EM	world	is	far	too	noisy	for	K-means
ISAC – Iterative stable alignment and clustering
What can ISAC do better to overcome problems of K-means?
Need to guess the number of clusters K
➜ Ask for the number of images per group instead ➜ Equal-size K-means
Sensitive to initial condition
➜ Run 2D clustering multiple times starting from different initial conditions ➜ Keep reproducible classes only
Not robust to outliers
➜ Multiple 2D alignments within each cluster to identify heterogeneous clusters and outliers, which have high variation in alignment results ➜ Keep stable classes only
Limited to circular clusters of similar size
➜ Reject clusters that are too small (typical for outliers) and limit the maximum size
100	particles
img_per_grp=10,	
minimum_grp_size=3
Expected	K=10
Returned	K=17
ISAC	can	handle	the	real	cryo-EM	world!
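The equal-size idea can be illustrated with a capacity-limited assignment step. This is a greedy sketch under stated assumptions, not ISAC's actual implementation: the function `equal_size_assign`, the 2D random points, and the randomly chosen centers are all hypothetical. Only the numbers 100 particles and `img_per_grp=10` are taken from the slide.

```python
import numpy as np


def equal_size_assign(points, centers, max_size):
    """Greedy assignment in order of increasing distance; each center accepts at most max_size points."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = np.full(len(points), -1)
    counts = np.zeros(len(centers), dtype=int)
    for flat in np.argsort(dists, axis=None):
        i, j = divmod(int(flat), len(centers))   # point index, center index
        if labels[i] == -1 and counts[j] < max_size:
            labels[i] = j
            counts[j] += 1
    return labels


rng = np.random.default_rng(0)
n_particles, img_per_grp = 100, 10
k = n_particles // img_per_grp                   # expected K = 10
points = rng.normal(size=(n_particles, 2))
centers = points[rng.choice(n_particles, size=k, replace=False)]

labels = equal_size_assign(points, centers, max_size=img_per_grp)
group_sizes = np.bincount(labels, minlength=k)
print("expected K:", k, "group sizes:", group_sizes)
```

Because no group can grow beyond `img_per_grp`, a dominant view cannot swallow the rare views, which is the point of asking for images per group instead of K.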
Advantage	of	equal-size	K-means
Not	equal-size	classification	
No.	of	particles	470,000
K=	300
ISAC	2
No.	of	particles	40,000	(subset)
Images	per	group	200
320k	ptcls
✗
✓
Data	assessment
Quality	of	data	set?	➜ Details	of	classes,	contamination,	preferred	orientations?	
Information	about	protein	➜ Shape,	oligomer,	symmetry,	heterogeneity?
Cleaning	of	data	set	for	3D	refinement	+ higher	SNR	for	initial	3D	model
Movie from Chen et al. (2017) Nature Commun.
3D	Specimen
2D		Projections
2D Transforms are sections of the 3D Transform!
Fourier inversion
3D	density map
Projection theorem
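The projection theorem can be checked numerically on a toy volume. The sketch below assumes a random 16x16x16 array as the "specimen" and projects along the first array axis. It compares the 2D Fourier transform of that projection with the matching central section of the 3D transform.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.random((16, 16, 16))          # toy 3D specimen (assumption)

projection = volume.sum(axis=0)            # 2D projection along axis 0
ft_projection = np.fft.fft2(projection)    # 2D transform of the projection

ft_volume = np.fft.fftn(volume)            # 3D transform of the specimen
central_section = ft_volume[0, :, :]       # section through the origin, normal to axis 0

sections_match = bool(np.allclose(ft_projection, central_section))
print("2D transform equals central section of 3D transform:", sections_match)
```

Projections from other directions fill in other central sections, so enough different views populate the 3D transform, and Fourier inversion then gives the 3D density map.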
Ψ,Θ,Φ (Psi,Theta,Phi)
x,y - shifts
I	need	to	get	me	one	of	them	starting	models!
How	is	a	structure	obtained	from	images	of	its	2D	projections?	
This needs to be initialized somehow.
Two possibilities:
1. We know the structure from previous work (boring…)
2. We produce a rough estimate of the structure from the data without using the 3D refinement algorithm.
Image from Cheng et al. (2015) Cell 161: 438 – 449
How	to	overcome	a	greedy	algorithm	– Stochastic	Hill	climbing
Image from Cheng et al. (2015) Cell 161: 438 – 449
Image modified from Punjani et al. (2017) Nature Methods 14:290 – 296
Randomize order
If ccf is bigger than before, then take it and stop looking
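The rule "randomize order, take the first candidate with a bigger ccf, stop looking" can be sketched on a toy problem. Here a single in-plane angle stands in for the full orientation search, and `toy_ccf` is a made-up score with its maximum at 140 degrees. Both are assumptions for illustration.

```python
import math
import random


def stochastic_hill_climb_step(current, candidates, score, rng):
    """Visit candidates in random order and accept the first one that beats the current score."""
    best = score(current)
    order = list(candidates)
    rng.shuffle(order)                       # randomize order
    for cand in order:
        if score(cand) > best:               # bigger than before: take it and stop looking
            return cand
    return current                           # nothing better found


def toy_ccf(angle_deg):
    """Hypothetical similarity score, maximal at 140 degrees."""
    return math.cos(math.radians(angle_deg - 140))


rng = random.Random(0)
candidates = range(0, 360, 5)
angle = 0
path = [angle]
while True:
    nxt = stochastic_hill_climb_step(angle, candidates, toy_ccf, rng)
    if nxt == angle:
        break
    angle = nxt
    path.append(angle)

print("visited angles:", path)
```

Unlike a greedy search, the first improvement is accepted rather than the best one, so the path taken differs from run to run.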
The	original	concept	- PRIME
Elmlund et al. (2013) Structure 21:1299 – 1306
Image from Cheng et al. (2015) Cell 161: 438 – 449
Randomize order
If ccf is bigger than before, then take it and stop looking
Stochastic	hill	climbing	– When	things	go	awry
In some nice cases...
But sometimes...
Stochastic	hill	climbing	meets	genetic	algorithm
What's a genetic algorithm?
An optimization technique inspired by natural selection.
Initialize population ➜ Evaluate fitness ➜ Stop condition?
Yes ➜ Final result
No ➜ Selection ➜ Mutate and crossover ➜ Evaluate fitness
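The flowchart maps onto a short loop. The sketch below is a generic genetic algorithm on bit strings with the number of ones as a toy fitness. Truncation selection, single-point crossover, and all parameter values are assumptions chosen for brevity, and this is not how VIPER encodes 3D models.

```python
import random


def genetic_algorithm(fitness, n_genes, target, pop_size=30, n_generations=200,
                      mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initialize population
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(population, key=fitness)               # Evaluate fitness
    for _ in range(n_generations):
        if fitness(best) >= target:                   # Stop condition? Yes -> final result
            break
        # Selection: keep the fitter half as parents (assumed selection scheme)
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)
            child = mom[:cut] + dad[cut:]             # crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # mutate
            children.append(child)
        population = children
        champion = max(population, key=fitness)       # Evaluate fitness of the new generation
        if fitness(champion) > fitness(best):
            best = champion
    return best


result = genetic_algorithm(fitness=sum, n_genes=20, target=20)
print("best individual:", result, "fitness:", sum(result))
```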
How	is	this	implemented?
The	idea	behind	VIPER
Slide from Pawel Penczek
This can be used to calculate an alignment accuracy
Slide from Pawel Penczek
Generation 1
Six independent stochastic hill climbing runs
Start for generation 2 (here’s the genetic algorithm’s turn)
Generation 2
Slide from Pawel Penczek
How	is	this	implemented?
What	does	the	R in	RVIPER stand	for?
What do you get in the end?
Sorted errors
main0??: Full reproducible run
run0??: individual VIPER run
RVIPER
Reproducible initial 3D model
What	to	do	after	running	RVIPER
Moons
Mask for refinement
Another important requirement:
even angular coverage, without major gaps.
Handedness: You have a 50% chance of getting the right handedness from a VIPER run. It should not matter for further image processing.
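Why projections alone cannot decide the hand can be illustrated with a toy volume. The sketch assumes a random asymmetric 8x8x8 array and mirrors it along the projection direction: the mirrored volume is a different object, yet that projection is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.random((8, 8, 8))        # asymmetric toy density (assumption)
mirrored = volume[::-1, :, :]         # mirror image, i.e. the opposite hand

volumes_differ = not np.array_equal(volume, mirrored)
projections_equal = bool(np.allclose(volume.sum(axis=0), mirrored.sum(axis=0)))

print("volumes differ:", volumes_differ)
print("projections along the mirror axis are identical:", projections_equal)
```

Both hands explain the same projection data equally well, so the correct hand has to come from outside information, for example known structural features at sufficient resolution.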
„Possible explanations for intermodel variations include the model resolution, different handedness
of the electron density maps, rotational freedom of the NP molecules, and source of RNP samples
(viral particles versus cells).“
Filter the reference at the proper resolution!
Defocus 1.0	μm
1.1 MDa
EMD-2984
-nice distribution
-collect small dataset
-high defocus!
-high dose!
-state of the art picking
-check quality of class averages
Defocus 1.15	μm
464	kDa
Assumed	contamination
as	top-view	of	the	Tc-toxin	complex
Lee	et	al,		JMB,	2006
Separate the sheep from the goats!
Avoid assumptions!