Computer-Assisted
Structure Elucidation
Christoph Steinbeck
Species Metabolomes and How Little We Know
[Figure: bar chart on a logarithmic axis (1 to 100,000,000) with the categories Metabolites in Human, Metabolites in Microbes, Compounds in ChEBI V154, Metabolites in HMDB V3.6, Metabolites in Plants, Compounds in ChEMBL V23, Compounds in PubChem V8-2017. Values highlighted in successive builds: 80,000; 200,000; 2,000,000]
In a typical metabolome
measurement, less than 40% of
the features can be assigned to
known compounds.
Oliver Fiehn, UC Davis, USA
Internal communication
source: http://www.csfmetabolome.ca
There are known knowns; there are
things we know we know.
We also know there are known
unknowns; that is to say, we know
there are some things we do not know.
But there are also unknown
unknowns – the ones we don’t know
we don’t know.
—United States Secretary of Defense,
Donald Rumsfeld
Levels of Confidence
Reproduced from: Viant MR, Kurland IJ, Jones MR, Dunn WB (2017) How close are we to complete annotation of metabolomes? Current Opinion in Chemical Biology 36:64–69. doi: 10.1016/j.cbpa.2017.01.001
How many constitutional
isomers are we looking at?
C6H6
The 217 constitutional isomers of C6H6
How many constitutional
isomers are we looking at?
C10H16: 24,938 constitutional isomers
C13H16O3: > 2,000,000,000 constitutional isomers
C30H48O2: >> 10^12 constitutional isomers
Constitutional Isomers of C10H16
Generating molecular spaces:
Algorithms for non-redundant generation of molecular graphs
(Chemical) Space is big. You just won't
believe how vastly, hugely, mind-bogglingly
big it is. I mean, you may think it's a long
way down the road to the chemist's, but
that's just peanuts to space.
Douglas Adams, The Hitchhiker's Guide to the Galaxy
English humorist & science fiction novelist (1952 - 2001)
Computer-Assisted Structure Elucidation (CloudMet 2017)
Levels of Confidence
Reproduced from: Viant MR, Kurland IJ, Jones MR, Dunn WB (2017) How close are we to complete annotation of metabolomes? Current Opinion in Chemical Biology 36:64–69. doi: 10.1016/j.cbpa.2017.01.001
Computer-Assisted Structure
Elucidation with 2D NMR
• The only viable approach for anything beyond the simplest
problems.
1D Proton NMR
Exploded Pharmacy: HR-MS
yields information about elemental composition, such as C10H16
[Figure: the 10 carbon and 16 hydrogen atoms of C10H16, shown as unconnected atoms]
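What an elemental composition already tells us can be sketched in a few lines. The snippet below is a minimal illustration, not part of the talk: it parses a simple formula, computes the monoisotopic mass that HR-MS would report for the neutral molecule, and derives the number of double bond equivalents (rings plus double bonds). The function names are hypothetical, and the DBE formula used here only covers C, H, N and O.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def parse_formula(formula):
    """Turn a flat formula such as 'C10H16' into {'C': 10, 'H': 16}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(counts):
    """Mass of the neutral molecule built from the lightest isotopes."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())


def double_bond_equivalents(counts):
    """Rings plus double bonds for a CHNO formula: C - H/2 + N/2 + 1."""
    return counts.get("C", 0) - counts.get("H", 0) / 2 + counts.get("N", 0) / 2 + 1


counts = parse_formula("C10H16")
mass = monoisotopic_mass(counts)
dbe = double_bond_equivalents(counts)
print(f"C10H16: mass {mass:.4f} u, {dbe:g} double bond equivalents")
```

For C10H16 the three double bond equivalents mean that every candidate structure must contain a combination of three rings and/or double bonds, which is the first constraint on the puzzle.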
Experiments: J-Couplings: DEPT
• 13C-detected 1D experiment
• Number of protons attached
to each carbon is coded as
signal phase
• Combining information from
DEPT-135, DEPT-90 and
broadband-decoupled 13C NMR yields
a complete list of carbon
fragments in the molecule.
DEPT-90
Zoomed region of
DEPT-135
DEPT-135
Time required: 1-5 min
Acronyms:
DEPT: Distortionless Enhancement by Polarization Transfer
APT: Attached Proton Test
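The way the three spectra combine into a list of carbon fragments can be sketched as a small decision function. This is an illustrative sketch, not code from the talk: it relies on the standard DEPT phase behaviour (DEPT-135 shows CH and CH3 positive and CH2 negative, DEPT-90 shows CH only, quaternary carbons appear only in the broadband-decoupled spectrum). The function name and the peak labels are hypothetical.

```python
def carbon_type(in_broadband, dept90, dept135):
    """Classify one 13C signal.

    in_broadband: True if the signal appears in the broadband-decoupled spectrum.
    dept90, dept135: +1 for a positive signal, -1 for a negative one, 0 if absent.
    """
    if not in_broadband:
        raise ValueError("signal is not present in the broadband-decoupled spectrum")
    if dept90 == 0 and dept135 == 0:
        return "C"    # quaternary carbons vanish from both DEPT spectra
    if dept135 < 0:
        return "CH2"  # only CH2 is inverted in DEPT-135
    if dept90 > 0:
        return "CH"   # DEPT-90 retains CH only
    return "CH3"      # positive in DEPT-135, absent in DEPT-90


# Hypothetical peak list: label -> (in_broadband, DEPT-90 sign, DEPT-135 sign)
peaks = {
    "peak A": (True, 0, 0),
    "peak B": (True, +1, +1),
    "peak C": (True, 0, -1),
    "peak D": (True, 0, +1),
}
fragments = {label: carbon_type(*signs) for label, signs in peaks.items()}
print(fragments)
```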
After evaluation of DEPT experiment (or multiplicity edited
HSQC) heavy atoms are labeled with chemical shifts and
number of attached hydrogen atoms.
[Figure: carbon atoms labeled with 13C shifts (ppm): 31.5, 31.3, 38.0, 144.5, 20.9, 26.4, 23.0, 116.1, 47.2, 40.9]
Experiments: J-Couplings: HSQC
alias HMQC, CH-COSY, HETCOR
Acronyms:
HSQC: Heteronuclear Single Quantum Coherence
HMQC: Heteronuclear Multiple Quantum Coherence
• Cross-signals in HSQC spectra are caused
by CH couplings via one CH bond (1JCH,
~140 Hz).
• Experiment yields list of pairs of directly
bonded carbon and hydrogen atoms.
Experiments: J-Couplings: HH-COSY
Acronyms:
DQF-COSY: Double Quantum Filtered COrrelation Spectroscopy
• Shows 3JHH couplings (and a few others,
unfortunately)
• Proton-rich skeletons might be
elucidated by HH-COSY and HSQC alone
• Problem: quaternary carbons and
heteroatoms in the skeleton interrupt the coupling network
[Figure: gs-COSY spectrum without DQ filter, cross peaks via 3JHH; 2 scans/increment, 128 increments, 5 min experiment time]
Experiments: J-Couplings:TOCSY
Acronyms:
TOCSY: Total Correlation SpectroscopY
HOHAHA (HOmonuclear HArtmann-HAhn) transfer
• Correlates all protons in a scalar-coupled
spin system via a spin-lock pulse.
Experiments: J-Couplings: HMBC
Acronyms:
HMBC: Heteronuclear Multiple Bond Coherence
COLOC: COrrelation via LOng-range Couplings
• Cross-signals through scalar couplings
between carbon and hydrogen via 2 or
3 bonds (2JCH/3JCH, ~8 Hz).
• Problem: 2JCH and 3JCH couplings cannot be
distinguished.
• Problem: 4JCH/5JCH couplings, which
cannot be easily distinguished from
2JCH/3JCH couplings.
Experiments: J-Couplings-> HMBC
Experiments: J-Couplings-> INADEQUATE
Acronyms:
INADEQUATE: Incredible Natural Abundance DoublE QUAntum Transfer Experiment
• Powerful 13C-detected method
• Cross signals via 1-bond CC
couplings (1JCC, ~40 Hz)
• Problem: at a natural 13C abundance of about 1.1%, only 0.011 x 0.011 ~ 1/10000 (0.01%) of all bonded carbon pairs are 13C-13C.
• Large amount of substance needed
(10 mg/C-atom)
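The sensitivity problem is plain arithmetic, spelled out here as a short sketch. The 1.1% abundance is the rounded figure used on the slide; the variable names are illustrative.

```python
# Probability that two bonded carbons are both 13C at natural abundance.
abundance_13c = 0.011          # rounded natural abundance used on the slide
p_pair = abundance_13c ** 2    # both positions must carry 13C independently
one_in = 1 / p_pair

print(f"13C-13C pairs: {p_pair:.6f} of all C-C pairs, roughly 1 in {one_in:.0f}")
```

Only about one molecule in several thousand contributes to any given cross signal, which is why the experiment needs so much material.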
Experiments: J-Couplings: 1,1-ADEQUATE
Lamellarin H
Experiments: Dipolar Couplings: NOESY
Acronyms:
NOESY: Nuclear Overhauser Effect SpectroscopY
• Cross signals via dipolar couplings
through space between protons
that are less than 5 Å apart
• Signal intensity falls off with the sixth power of the distance: I ~ r^-6
• Important experiment for 3D
structure determination of
macromolecules.
After evaluation of HMBC, we are looking at a molecular
puzzle of pairs of carbon atoms that are either 1 or 2 bonds
apart (but we don’t know which is the case).
[Figure: atoms labeled with 13C shifts (ppm): 31.5, 31.3, 38.0, 144.5, 20.9, 26.4, 23.0, 116.1, 47.2, 40.9; and 1H shifts (ppm): 1.16, 2.34, 1.63, 2.19, 0.85, 1.27, 5.17, 2.06, 1.93]
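The "molecular puzzle" can be made concrete with a small sketch. Each HMBC cross signal links a carbon to a proton; HSQC tells us which carbon bears that proton, so the two carbons must be one or two bonds apart. The code below is a minimal illustration under that reading: it derives the carbon pairs and checks whether a candidate skeleton satisfies them. All names and the four-carbon toy data are hypothetical, not taken from the talk.

```python
from collections import deque


def hmbc_to_carbon_pairs(hmbc, proton_to_carbon):
    """Translate (carbon, proton) HMBC signals into pairs of carbons.

    proton_to_carbon comes from HSQC: it names the carbon each proton sits on.
    """
    pairs = set()
    for carbon, proton in hmbc:
        bearer = proton_to_carbon[proton]
        if bearer != carbon:
            pairs.add(frozenset((carbon, bearer)))
    return pairs


def bond_distance(adjacency, start, goal):
    """Number of bonds on the shortest path, or None if not connected."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for neighbour in adjacency[atom]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def satisfies_hmbc(adjacency, pairs, max_bonds=2):
    """True if every carbon pair is at most max_bonds apart in the candidate."""
    for pair in pairs:
        a, b = tuple(pair)
        dist = bond_distance(adjacency, a, b)
        if dist is None or dist > max_bonds:
            return False
    return True


# Toy data: proton Ha sits on carbon 1 and shows HMBC signals to carbons 2 and 3.
proton_to_carbon = {"Ha": 1}
pairs = hmbc_to_carbon_pairs([(2, "Ha"), (3, "Ha")], proton_to_carbon)

chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}   # C1-C2-C3-C4
other = {1: {4}, 4: {1, 2}, 2: {4, 3}, 3: {2}}   # C1-C4-C2-C3
print(satisfies_hmbc(chain, pairs), satisfies_hmbc(other, pairs))
```

Raising max_bonds is one simple way to tolerate the occasional 4JCH or 5JCH signal, at the price of a larger solution space.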
Structure Elucidation (CASE): Step by Step
HR-MS, et al. -> Gross Formula
Typical 1H and 13C chemical shifts -> Functional Groups
DEPT, 1JCH correlations, HH COSY, TOCSY -> Structural Fragments
HMBC, 1,1-ADEQUATE -> Constitution
3JHH couplings, NOE-Diff. spectra, HH NOESY, HH ROESY -> Relative Configuration
-> Complete Structure
[Figure: example for C10H16 with fragment counts 2x, 3x, 3x, 2x]
Computer-Assisted
Structure Elucidation
(CASE)
Computer-Assisted Structure Elucidation (CASE): Step by Step
Yes, it is the
same workflow
and you need a bit of cheminformatics behind the
scene to do the job
The Chemistry Development Kit (CDK): 

An Open Source Java Library for
Structural Cheminformatics

http://cdk.github.io
Structure Elucidation
versus
Identification/
Dereplication
Fast structure searches via
binning
40 years of
CASE research
• Sesami
• Assemble
• Houdini
Munk, M.E. et al., 1982. Computer-assisted structure elucidation. Fresenius' Zeitschrift für analytische Chemie, 313(6), pp.473–479.
LSD (Logic for Structure Determination)
• Command-line driven
• Takes spectral constraints as input
• Generates lists of connection tables
(molecules)
• Open Source
• Rocket-fast
• No early spectrum processing
Ab-Initio Structure Elucidation by 2D NMR
No.  13C CPD  1JCH (HMQC)  HMBC
1    148.24   -            4.0; 2.42; 2.28; 1.20
2    118.25   5.49         4.0; 2.28; 2.17
3    66.32    4.0          5.49
4    43.87    2.17         5.49; 4.0; 2.42; 1.32; 1.20; 0.86
5    41.40    2.14         5.49; 1.32; 0.86
6    38.32    -            2.42; 1.32; 1.20; 0.86
7    32.00    1.20/2.42
8    31.52    2.28         2.42; 1.20
9    26.52    1.32         0.86; 2.17
10   21.46    0.86         1.20; 1.32
2D peak picking table
Table with heavy-atom relations
Internally generated by CASE
program
Steinbeck, C. Computer-Assisted Structure Elucidation. In Handbook on Chemoinformatics.; Gasteiger, J. Ed.; Wiley-VCH: Weinheim, 2003; Vol. 2; pp. 1378-1406.
Limits to Growth
• Deterministic methods suffer from combinatorial explosion
• Prospective use of spectroscopic input information may make them error-intolerant
[Figure: number of constitutional isomers and calculation time versus number of heavy atoms]
C10H16 (10 heavy atoms): 24,938 constitutional isomers
C13H16O3 (16 heavy atoms): > 2,000,000,000 constitutional isomers
C30H48O2 (32 heavy atoms): >> 10^12 constitutional isomers
Deterministic Structure Generators: The LUCY Method
• Prospective use of spectral information for building isomers
• Needs 1D 13C, 2D HMQC, HMBC, HH COSY
• Example: Walk a decision tree while interpreting HMBC-Signals
N Atoms,
No Bonds
Inserted Bonds: 1 2 3 n
HMBC-derived relations
between Heteroatoms
Steinbeck, C.; Angewandte Chemie. International Ed. in English 1996, 35, 1984-1986.
LSD Input Syntax: Basics
⍺-Pinene Example
http://eos.univ-reims.fr/LSD/index_ENG.html
MULT 1 C 2 0
MULT 2 C 2 1
MULT 3 C 3 1
MULT 4 C 3 1
MULT 5 C 3 0
MULT 6 C 3 2
MULT 7 C 3 2
MULT 8 C 3 3
MULT 9 C 3 3
MULT 10 C 3 3
HMQC 2 2
HMQC 3 3
HMQC 4 4
HMQC 6 6
HMQC 7 7
HMQC 8 8
HMQC 9 9
HMQC 10 10
HMBC 1 6
HMBC 1 9
HMBC 2 3
HMBC 2 9
HMBC 3 6
HMBC 3 8
HMBC 3 9
HMBC 3 10
HMBC 4 6
HMBC 4 8
HMBC 4 10
HMBC 5 6
HMBC 5 8
HMBC 5 10
HMBC 7 6
HMBC 8 10
HMBC 9 3
HMBC 10 8
MULT block: atom definitions
HMQC block: 1JCH, redundant?
HMBC block: 2/3JCH, C-H long-range correlations
Concatenate those blocks in a text file
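Concatenating the three blocks can also be done programmatically. The sketch below is one way to do it, not part of LSD itself: it holds the α-pinene data from this slide in Python lists and joins them into the text of an input file. It assumes the MULT fields read as atom number, element, hybridisation and number of attached hydrogens, which is consistent with the data summing to C10H16. The helper name build_lsd_input is hypothetical, and a complete LSD file may need further commands described in the LSD manual.

```python
# α-Pinene data copied from the slide.
# MULT fields assumed to be: atom number, element, hybridisation, attached hydrogens.
MULT = [
    (1, "C", 2, 0), (2, "C", 2, 1), (3, "C", 3, 1), (4, "C", 3, 1), (5, "C", 3, 0),
    (6, "C", 3, 2), (7, "C", 3, 2), (8, "C", 3, 3), (9, "C", 3, 3), (10, "C", 3, 3),
]
HMQC = [2, 3, 4, 6, 7, 8, 9, 10]  # carbons that carry protons
HMBC = [
    (1, 6), (1, 9), (2, 3), (2, 9), (3, 6), (3, 8), (3, 9), (3, 10), (4, 6),
    (4, 8), (4, 10), (5, 6), (5, 8), (5, 10), (7, 6), (8, 10), (9, 3), (10, 8),
]


def build_lsd_input(mult, hmqc, hmbc):
    """Concatenate the three blocks in the order used on the slide."""
    lines = [f"MULT {n} {element} {hyb} {h}" for n, element, hyb, h in mult]
    lines += [f"HMQC {n} {n}" for n in hmqc]
    lines += [f"HMBC {a} {b}" for a, b in hmbc]
    return "\n".join(lines) + "\n"


text = build_lsd_input(MULT, HMQC, HMBC)

# Sanity check before running LSD: the atom list must match the gross formula.
carbons = sum(1 for _, element, _, _ in MULT if element == "C")
hydrogens = sum(h for _, _, _, h in MULT)

if __name__ == "__main__":
    with open("pinene.in", "w", encoding="utf-8") as handle:
        handle.write(text)
    print(f"wrote pinene.in for C{carbons}H{hydrogens}")
```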
LSD Usage
⍺-Pinene Example
$ lsd pinene.in
$ outlsd 7 < pinene.sol > pinene.sdf
http://eos.univ-reims.fr/LSD/index_ENG.html
• "$" is the command prompt - don't type this in
• lsd pinene.in will produce pinene.sol (LSD-specific output file)
• outlsd converts .sol into .sdf (standard molecular file format)
• Use e.g. MarvinView to view .sdf
LSD Usage
⍺-Pinene Example
MarvinView rendering of the two results that agree
with our input data from the previous slide
LSD Input Syntax: Advanced
Liu et al., C34H52O6, HRESI-MS (m/z 557.3833 [M+H]+)
ELIM 3 4
MULT 1 C 2 0
MULT 2 C 3 0
[… 35 more omitted …]
; known carbonyls
BOND 1 36
BOND 7 37
BOND 10 39
CARB L1 ; define list L1 containing all carbons
HETE L2
LIST L3 2 3 6 8 17 18 27 28
LIST L4 9 38 40
PROP L1 1 L2 - ; Every carbon atom
; can carry at most one
; heteroatom, but not
; two
PROP L4 0 L3 ; Every oxygen which is
; not an sp2 O has a
; partner from L3 (based
; on a conservative
; chemical shift
; inspection)
COSY 4 5
[… 4 more omitted …]
HMQC 4 4
[… more omitted …]
HMBC 2 4
[… 98 more omitted …]
http://eos.univ-reims.fr/LSD/index_ENG.html
Liu et al., C34H52O6, HRESI-MS (m/z 557.3833 [M+H]+)
InChIKey=ANWFPAAUCGPEBV-MOHJPFBDNA-N InChIKey=YAJAXOAXTCGOQA-HKOYGPOVNA-N
InChIKey of published compound #14 = YAJAXOAXTCGOQA-HKOYGPOVNA-N
LSD demo
http://eos.univ-reims.fr/LSD/index_ENG.html
http://eos.univ-reims.fr/LSD/MANUAL_ENG.html
Stochastic Search Methods
Algorithms known to tackle large search spaces:
• Simulated Annealing
  • Traveling Salesman Problem
  • Finding the solution structure of large biomolecules
  • Integrated Circuits Layout
  • Robotic Path Planning
• Genetic Algorithms
  • Protein Folding
  • Immune System Simulation
  • Computer-Aided Design
• Quite a number of other options ...
Simulated Annealing
Guided Walk in Constitution Space
Neighbors in constitution space
• are in close chemical distance to
each other
• are likely to be similar in their
spectroscopic properties
Simulated Annealing
Small steps on the constitution space landscape
Faulon, J.-L.; J. Chem. Inf. Comput. Sci., 36 (1996) 4, 731-40
Simulated Annealing
Evaluating a score for each point (constitution)
in structure space
Score Function
based on
Spectroscopic Fitness
$S_{total} = c_1 S_{HMBC} + c_2 S_{HHCOSY} + c_3 S_{Shift} + \dots + c_n S_{Features}$
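The weighted sum lends itself to a pluggable implementation. The sketch below is illustrative only and is not SENECA's code: each term is a function that maps a candidate to a number, and the total score is the weighted sum over whatever terms are registered. The candidate representation (precomputed counts of satisfied and expected signals) and all names are hypothetical.

```python
def total_score(candidate, terms):
    """Weighted sum over all registered score terms.

    terms maps a label to (weight, function); each function scores one aspect
    of the candidate, for example agreement with the HMBC signals.
    """
    return sum(weight * term(candidate) for weight, term in terms.values())


def fraction_satisfied(key):
    """Build a term returning the fraction of satisfied signals stored under key."""
    def term(candidate):
        satisfied, expected = candidate[key]
        return satisfied / expected
    return term


# Hypothetical weights c1, c2 and a toy candidate with precomputed signal counts.
terms = {
    "HMBC": (2.0, fraction_satisfied("hmbc")),
    "HHCOSY": (1.0, fraction_satisfied("cosy")),
}
candidate = {"hmbc": (9, 18), "cosy": (4, 4)}
example_score = total_score(candidate, terms)
print(example_score)
```

Adding a further property, such as a predicted chemical shift fit, means registering one more entry in terms without touching total_score.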
Simulated Annealing
Small steps on the constitution space landscape
Acceptance criterion: $p = \exp\left(-\frac{\delta f}{T}\right)$
Annealing schedule: $T_t = \alpha T_{t-1}$ with $0.9 < \alpha < 1$
[Figure: temperature T versus time t]
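The acceptance criterion and the annealing schedule fit into a few lines of code. The following is a generic sketch under stated assumptions, not the SENECA implementation: the score is maximised, δf is taken as the loss in score when moving to the candidate, and the "constitution space" is replaced by a toy landscape of integers so the example stays runnable. All names and parameter values are hypothetical.

```python
import math
import random


def acceptance_probability(delta_f, temperature):
    """p = exp(-delta_f / T) for worse candidates; better ones are always taken."""
    if delta_f <= 0:
        return 1.0
    return math.exp(-delta_f / temperature)


def anneal(initial, score, neighbour, t0=10.0, alpha=0.95, steps=2000, seed=0):
    """Guided walk that maximises score, cooling by T_t = alpha * T_(t-1)."""
    rng = random.Random(seed)
    current = best = initial
    temperature = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        delta_f = score(current) - score(candidate)  # > 0: candidate is worse
        if rng.random() < acceptance_probability(delta_f, temperature):
            current = candidate
            if score(current) > score(best):
                best = current
        temperature *= alpha
    return best


# Toy landscape standing in for constitution space: integers 0..100, optimum at 42.
def toy_score(x):
    return -(x - 42) ** 2


def toy_neighbour(x, rng):
    """Small step to an adjacent point, clipped to the allowed range."""
    return min(100, max(0, x + rng.choice((-1, 1))))


best = anneal(0, toy_score, toy_neighbour)
print(best, toy_score(best))
```

In a real CASE run the neighbour function would rearrange bonds in a molecular graph, and the score would be the spectroscopic fitness described on the earlier slide.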
Some Nice Properties of this SA Scheme
• Pluggable Target Function
  • If you can reliably calculate a measurable property
for a given constitution, it can be part of your
target function
  • Spectroscopic information
    • IR
    • UV-VIS
    • Other types of NMR experiments
    • MS fragmentation (?)
  • Additional knowledge
    • Good-List/Bad-List fragments
    • Drug Likeness
    • Natural Product Likeness
  • General System for Optimization in Constitution Space
• Artifacts only lead to slightly lower ranking of correct structure in hit list
Score convergence for SENECA SA run on Polycarpol
(32 heavy atoms), performed by 8 server processes
[Figure: score (axis 3000 to 8000) versus iteration (axis 0 to 335500)]
Polycarpol (C30H48O2).
Distributed computing
at its cheapest.
[Figure: architecture with Client, Gatekeeper and Distributed Server; labelled steps: Retrieve Server List, Submit Spectral Data, Collect Results]
Qualitative assessment:
Computational complexity of deterministic and
stochastic algorithms
Compound                LUCY    SENECA  Steps overall
α-Pinene (C10H16)       2 s     1 min   30,000
Eurabidiol (C15H28O2)   29 s    5 min   90,000
Polycarpol (C30H48O2)   33 min  12 min  350,000
[Figure: structures of α-pinene (C10H16), eurabidiol (C15H28O2) and polycarpol (C30H48O2)]
Deterministic vs SA Generation
[Figure: calculation time in seconds (0 to 2000) versus number of heavy atoms (0 to 30), for deterministic and SA generation]
C. Steinbeck, Journal of Chemical Information & Computer Sciences 2001, 41, 1500.
Ranking Solutions
• Heteroatom-rich/proton-poor skeletons
can yield many solutions (hundreds,
thousands)
• Possible ranking by
• Spectrum Similarity
• Natural Product Likeness
CONTENTS
Datasets
39162
113,425
Components for Molecule Curation
Component for Signature Generation
Statistics for Scoring
NP - Natural product
SM - Synthetic molecule
In the fragment contribution (Fragment_i),
✦ NP_i is the total number of molecules in the NP dataset in which
Fragment_i occurs,
✦ SM_i is the total number of molecules in the SM dataset in which
Fragment_i occurs,
✦ SM_t is the total number of molecules in the SM dataset,
✦ NP_t is the total number of molecules in the NP dataset,
✦ N is the number of fragments in the given molecule.
Jayaseelan KV, Moreno P, Truszkowski A, Ertl P & Steinbeck C (2012) Natural product-likeness
score revisited: an open-source, open-data implementation. BMC Bioinformatics 13, 106.
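The equations for this score are not part of the surviving slide text. The sketch below therefore rests on an assumption: that the fragment contribution is the logarithm of the ratio of a fragment's relative frequency in the NP dataset to its relative frequency in the SM dataset, and that the molecule's score is the sum of contributions divided by N, as we understand the cited paper. The natural logarithm, the function names and the handling of zero counts are choices made for this illustration only.

```python
import math


def fragment_contribution(np_i, sm_i, np_t, sm_t):
    """Assumed form: log((NP_i / SM_i) * (SM_t / NP_t)).

    Positive when the fragment is relatively more frequent among natural
    products, negative when it is more frequent among synthetic molecules.
    """
    if min(np_i, sm_i, np_t, sm_t) <= 0:
        # The ratio is undefined for zero counts; how to treat them is left open here.
        raise ValueError("all counts must be positive")
    return math.log((np_i / sm_i) * (sm_t / np_t))


def np_likeness(contributions):
    """Assumed form: sum of the fragment contributions divided by N."""
    return sum(contributions) / len(contributions)


# Hypothetical counts for three fragments of one molecule; both datasets hold 1000 entries.
counts = [(200, 100, 1000, 1000), (50, 50, 1000, 1000), (10, 40, 1000, 1000)]
score = np_likeness([fragment_contribution(*c) for c in counts])
print(f"NP-likeness (toy data): {score:.3f}")
```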
• Natural product-likeness classification, integrated into the
Taverna workflow tool
• (http://sourceforge.net/projects/np-likeness/).
• Included in second version of SENECA CASE
Availability
Publications
  • 6. In a typical metabolome measurement, less than 40% of the features can be assigned to known compounds. Oliver Fiehn, UC Davis, USA Internal communication
  • 10. There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say, we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know. —United States Secretary of Defense, Donald Rumsfeld
  • 11. Levels of Confidence Reproduced from: Viant MR, Kurland IJ, Jones MR, Dunn WB (2017) How close are we to complete annotation of metabolomes? Current Opinion in Chemical Biology 36:64–69. doi: 10.1016/j.cbpa.2017.01.001
  • 12. How many constitutional isomers are we looking at? C6H6
  • 13. The 217 constitutional isomers of C6H6
  • 14. How many constitutional isomers are we looking at? C10H16 C13H16O3 C30H48O2
  • 15. How many constitutional isomers are we looking at? C10H16 C13H16O3 C30H48O2 24938 constitutional isomers
  • 16. How many constitutional isomers are we looking at? C10H16 C13H16O3 C30H48O2 24938 constitutional isomers > 2,000,000,000 constitutional isomers
  • 17. How many constitutional isomers are we looking at? C10H16 C13H16O3 C30H48O2 24938 constitutional isomers > 2,000,000,000 constitutional isomers >> 1012 constitutional isomers
  • 19. Generating molecular spaces: Algorithms for non-redundant generation of molecular graphs
  • 20. Generating molecular spaces: Algorithms for non-redundant generation of molecular graphs (Chemical) Space is big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to space. Douglas Adams, The Hitchhiker's Guide to the Galaxy English humorist & science fiction novelist (1952 - 2001)
  • 22. Levels of Confidence Reproduced from: Viant MR, Kurland IJ, Jones MR, Dunn WB (2017) How close are we to complete annotation of metabolomes? Current Opinion in Chemical Biology 36:64–69. doi: 10.1016/j.cbpa.2017.01.001
  • 23. Computer-Assisted Structure Elucidation with 2D NMR • The only viable way for slightly more complex problems upwards.
  • 27. Exploded Pharmacy: HR-MS yields information about elemental composition, such as C10H16 H C C C C C CCC C C H H H H H H H H H H H H H H H
  • 28. Experiments: J-Couplings: DEPT • 13C-detekted 1D-Exp. •Number of protons attached to each carbon is coded as signal phase •Combining information from DEPT-135, DEPT-90 and bb- decoupled carbon nmr yields a complete list of carbon fragments in the molecule. DEPT-90 Zoomed region of DEPT-135 DEPT-135 Time required: 1-5 min Acronyms: DEPT: Distortionless Enhancement by Polarization Transfer APT: Attached Proton Test
  • 31. After evaluation of DEPT experiment (or multiplicity edited HSQC) heavy atoms are labeled with chemical shifts and number of attached hydrogen atoms. 31.5 ppm 31.3 ppm 38.0 ppm 144.5 ppm 20.9 ppm 26.4 ppm 23.0 ppm 116.1 ppm 47.2 ppm 40.9 ppm
  • 32. Experiments: J-Couplings: HSQC alias HMQC, CH-COSY, HETCOR. Acronyms: HSQC: Heteronuclear Single Quantum Coherence; HMQC: Heteronuclear Multiple Quantum Coherence • Cross signals in HSQC diagrams are caused by CH couplings via one CH bond (1JCH, ~140 Hz). • The experiment yields a list of pairs of directly bonded carbon and hydrogen atoms.
  • 34. Experiments: J-Couplings: HH-COSY. Acronyms: DQF-COSY: Double Quantum Filtered COrrelation SpectroscopY • 3JHH couplings (and a few others, unfortunately) • Proton-rich skeletons might be elucidated just by HH-COSY and HSQC • Problem: quaternary carbons, heteroatoms in the skeleton
  • 35. 3JHH gs-COSY without DQ-Filter 2 scans/increment, 128 increments 5 min experiment time Experiments: J-Couplings: HH-COSY
  • 37. Experiments: J-Couplings: TOCSY. Acronyms: TOCSY: Total Correlation SpectroscopY; HOHAHA: HOmonuclear HArtmann-HAhn transfer • Correlates all protons in a scalar-coupled spin system via a spin-lock pulse.
  • 38. Experiments: J-Couplings: HMBC. Acronyms: HMBC: Heteronuclear Multiple Bond Coherence; COLOC: COrrelation via LOng-range Couplings • Cross signals through scalar couplings between carbon and hydrogen via 2 or 3 bonds (2JCH/3JCH, ~8 Hz). • Problem: 2JCH and 3JCH couplings cannot be distinguished. • Problem: 4JCH/5JCH couplings also occur and cannot easily be distinguished from 2JCH/3JCH couplings.
  • 40. Experiments: J-Couplings: INADEQUATE. Acronyms: INADEQUATE: Incredible Natural Abundance DoublE QUAnTum CoherencE • Powerful 13C-detected method • Cross signals via 1-bond CC couplings (1JCC, ~40 Hz) • Problem: the probability of finding two adjacent 13C nuclei at natural abundance is 0.011 × 0.011 ≈ 1/10,000 (0.01%). • Large amount of substance needed (10 mg per C atom)
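The sensitivity problem of INADEQUATE is plain arithmetic and can be checked directly. The sketch below uses the 1.1% natural abundance of 13C quoted on the slide; the function name `pair_probability` is illustrative.

```python
def pair_probability(abundance=0.011):
    """Probability that two given neighbouring carbons are both 13C."""
    return abundance * abundance


p = pair_probability()
print(f"{p:.6f}")                 # fraction of molecules with a given 13C-13C pair
print(f"about 1 in {1 / p:.0f}")  # the slide rounds this to roughly 1 in 10,000
```

The exact product is 0.000121, that is about 1 molecule in 8,300 for any given carbon pair, which the slide rounds to 1/10,000.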
  • 42. Experiments: Dipolar Couplings: NOESY. Acronyms: NOESY: Nuclear Overhauser Effect SpectroscopY • Cross signals via dipolar couplings through space between protons that are less than 5 Å apart • Signal intensity falls off with the sixth power of the distance: $I \propto r^{-6}$ • Important experiment for 3D structure determination of macromolecules.
  • 43. After evaluation of HMBC, we are looking at a molecular puzzle of pairs of carbon atoms that are either 1 or 2 bonds apart (but we don’t know which is the case). 13C shifts: 31.5 ppm, 31.3 ppm, 38.0 ppm, 144.5 ppm, 20.9 ppm, 26.4 ppm, 23.0 ppm, 116.1 ppm, 47.2 ppm, 40.9 ppm. 1H shifts: 1.16 ppm, 2.34 ppm, 1.63 ppm, 2.19 ppm, 0.85 ppm, 1.27 ppm, 5.17 ppm, 2.06 ppm, 1.93 ppm
  • 44. Structure Elucidation (CASE): Step by Step. Experiments: HR-MS, et al.; typical 1H and 13C chemical shifts; DEPT, 1JCH correlations; HH COSY, TOCSY; HMBC; 1,1-ADEQUATE; 3JHH couplings; NOE difference spectra; HH NOESY, HH ROESY. Results, in order: Gross Formula (C10H16) → Functional Groups → Structural Fragments → Constitution → Relative Configuration → Complete Structure
  • 46. Computer-Assisted Structure Elucidation (CASE): Step by Step. Experiments: HR-MS, et al.; typical 1H and 13C chemical shifts; DEPT, 1JCH correlations; HH COSY, TOCSY; HMBC; 1,1-ADEQUATE; 3JHH couplings; NOE difference spectra; HH NOESY, HH ROESY. Results, in order: Gross Formula (C10H16) → Functional Groups → Structural Fragments → Constitution → Relative Configuration → Complete Structure
  • 47. Yes, it is the same workflow, and you need a bit of cheminformatics behind the scenes to do the job.
  • 48. The Chemistry Development Kit (CDK): An Open Source Java Library for Structural Cheminformatics. http://cdk.github.io
  • 52. Fast structure searches via binning
  • 53. 40 years of CASE research
  • 57. Munk, M.E. et al., 1982. Computer-assisted structure elucidation. Fresenius' Zeitschrift für analytische Chemie, 313(6), pp.473–479.
  • 59. LSD (Logic for Structure Determination) • Command-line driven • Takes spectral constraints as input • Generates lists of connection tables (molecules) • Open Source • Rocket-fast • No early spectrum processing
  • 60. Ab-Initio Structure Elucidation by 2D NMR. 2D peak picking table (shifts in ppm):
      No. | 13C CPD | 1JCH (HMQC) | HMBC
      1 | 148.24 | - | 4.0; 2.42; 2.28; 1.20
      2 | 118.25 | 5.49 | 4.0; 2.28; 2.17
      3 | 66.32 | 4.0 | 5.49
      4 | 43.87 | 2.17 | 5.49; 4.0; 2.42; 1.32; 1.20; 0.86
      5 | 41.40 | 2.14 | 5.49; 1.32; 0.86
      6 | 38.32 | - | 2.42; 1.32; 1.20; 0.86
      7 | 32.00 | 1.20/2.42 |
      8 | 31.52 | 2.28 | 2.42; 1.20
      9 | 26.52 | 1.32 | 0.86; 2.17
      10 | 21.46 | 0.86 | 1.20; 1.32
    Table with heavy-atom relations: internally generated by the CASE program. Steinbeck, C. Computer-Assisted Structure Elucidation. In Handbook on Chemoinformatics; Gasteiger, J., Ed.; Wiley-VCH: Weinheim, 2003; Vol. 2; pp. 1378-1406.
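How a CASE program might derive heavy-atom relations from such a peak table can be sketched as follows. This is an illustration under assumptions, not the implementation referred to on the slide: each HMBC proton shift is matched to the carbon that carries that proton according to the HMQC column, and the matching tolerance, data layout and function name are hypothetical choices.

```python
# Peak table from the slide: carbon no. -> (13C shift, attached 1H shifts, HMBC 1H shifts)
PEAK_TABLE = {
    1: (148.24, [], [4.0, 2.42, 2.28, 1.20]),
    2: (118.25, [5.49], [4.0, 2.28, 2.17]),
    3: (66.32, [4.0], [5.49]),
    4: (43.87, [2.17], [5.49, 4.0, 2.42, 1.32, 1.20, 0.86]),
    5: (41.40, [2.14], [5.49, 1.32, 0.86]),
    6: (38.32, [], [2.42, 1.32, 1.20, 0.86]),
    7: (32.00, [1.20, 2.42], []),
    8: (31.52, [2.28], [2.42, 1.20]),
    9: (26.52, [1.32], [0.86, 2.17]),
    10: (21.46, [0.86], [1.20, 1.32]),
}


def heavy_atom_relations(table, tolerance=0.01):
    """Return sorted (carbon, proton-bearing carbon) pairs implied by HMBC.

    A pair (a, b) means: carbon a shows an HMBC cross signal to a proton
    that, according to HMQC, is attached to carbon b.
    """
    relations = set()
    for carbon, (_shift, _attached, hmbc) in table.items():
        for h_shift in hmbc:
            for bearer, (_s, attached, _h) in table.items():
                if bearer == carbon:
                    continue  # ignore a carbon's own protons
                if any(abs(h_shift - a) <= tolerance for a in attached):
                    relations.add((carbon, bearer))
    return sorted(relations)


for pair in heavy_atom_relations(PEAK_TABLE):
    print(pair)
```

Each resulting pair corresponds to two heavy atoms that are one or two bonds apart. With overlapping proton shifts the matching becomes ambiguous and one HMBC signal would produce several alternative relations, which a real program has to keep track of.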
  • 67. Limits to Growth • Deterministic methods suffer from combinatorial explosion • Prospective use of spectroscopic input information may make them error-intolerant. Plot: number of constitutional isomers and calculation time vs. number of heavy atoms. C10H16 (10 heavy atoms): 24,938 constitutional isomers; C13H16O3 (16 heavy atoms): > 2,000,000,000 constitutional isomers; C30H48O2 (32 heavy atoms): >> 10^12 constitutional isomers
  • 68. Deterministic Structure Generators: The LUCY Method • Prospective use of spectral information for building isomers • Needs 1D 13C, 2D HMQC, HMBC, HH COSY • Example: walk a decision tree while interpreting HMBC signals (diagram: start with N atoms and no bonds; inserted bonds: 1, 2, 3, ..., n; HMBC-derived relations between heteroatoms). Steinbeck, C.; Angewandte Chemie, International Ed. in English 1996, 35, 1984-1986.
  • 69. LSD Input Syntax: Basics. ⍺-Pinene Example. http://eos.univ-reims.fr/LSD/index_ENG.html Concatenate the following blocks in a text file:
      ; Atom definitions
      MULT 1 C 2 0
      MULT 2 C 2 1
      MULT 3 C 3 1
      MULT 4 C 3 1
      MULT 5 C 3 0
      MULT 6 C 3 2
      MULT 7 C 3 2
      MULT 8 C 3 3
      MULT 9 C 3 3
      MULT 10 C 3 3
      ; 1JCH, redundant?
      HMQC 2 2
      HMQC 3 3
      HMQC 4 4
      HMQC 6 6
      HMQC 7 7
      HMQC 8 8
      HMQC 9 9
      HMQC 10 10
      ; 2/3JCH, C-H long-range correlations
      HMBC 1 6
      HMBC 1 9
      HMBC 2 3
      HMBC 2 9
      HMBC 3 6
      HMBC 3 8
      HMBC 3 9
      HMBC 3 10
      HMBC 4 6
      HMBC 4 8
      HMBC 4 10
      HMBC 5 6
      HMBC 5 8
      HMBC 5 10
      HMBC 7 6
      HMBC 8 10
      HMBC 9 3
      HMBC 10 8
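Writing such a file by hand is error-prone, so the three blocks can also be generated from a small data structure. The sketch below only emits the MULT, HMQC and HMBC commands shown on the slide. The function name `lsd_input` and the tuple layout are hypothetical, and the reading of the MULT fields (atom number, element, hybridization, number of attached hydrogens) is an interpretation of the example that should be checked against the LSD manual.

```python
def lsd_input(atoms, hmbc):
    """Build LSD input text.

    atoms: list of (index, element, hybridization, n_hydrogens)
    hmbc:  list of (atom, proton_bearing_atom) index pairs
    """
    # Atom definitions
    lines = [f"MULT {i} {el} {hyb} {nh}" for i, el, hyb, nh in atoms]
    # One HMQC line per proton-bearing atom, as in the slide's example
    lines += [f"HMQC {i} {i}" for i, _el, _hyb, nh in atoms if nh > 0]
    # Long-range C-H correlations
    lines += [f"HMBC {a} {b}" for a, b in hmbc]
    return "\n".join(lines) + "\n"


# First two atoms and first two correlations of the slide's example
atoms = [(1, "C", 2, 0), (2, "C", 2, 1)]
hmbc = [(1, 6), (1, 9)]
print(lsd_input(atoms, hmbc), end="")
```

Generating the HMQC block from the hydrogen counts keeps the atom definitions and the one-bond correlations consistent by construction.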
  • 73. LSD Usage. ⍺-Pinene Example. http://eos.univ-reims.fr/LSD/index_ENG.html
      $ lsd pinene.in
      $ outlsd 7 < pinene.sol > pinene.sdf
    The $ is the command prompt; don’t type it in. The first command will produce pinene.sol (an LSD-specific output file). The second converts the .sol file into .sdf (a standard molecular file format). Use e.g. MarvinView to view the .sdf file.
  • 74. LSD Usage. ⍺-Pinene Example. MarvinView rendering of the two results that are in accordance with our input data from the previous slide
  • 75. LSD Input Syntax: Advanced. Liu et al., C34H52O6, HRESI-MS (m/z 557.3833 [M+H]+). http://eos.univ-reims.fr/LSD/index_ENG.html
      ELIM 3 4
      MULT 1 C 2 0
      MULT 2 C 3 0
      [… 35 more omitted …]
      ; known carbonyls
      BOND 1 36
      BOND 7 37
      BOND 10 39
      CARB L1 ; define list L1 containing all carbons
      HETE L2
      LIST L3 2 3 6 8 17 18 27 28
      LIST L4 9 38 40
      PROP L1 1 L2 - ; Every carbon atom can carry at most one heteroatom
      PROP L4 0 L3 ; Every oxygen that is not an sp2 O has a partner from L3 (based on a conservative chemical shift inspection)
      COSY 4 5
      [… 4 more omitted …]
      HMQC 4 4
      [… more omitted …]
      HMBC 2 4
      [… 98 more omitted …]
  • 77. Liu et al., C34H52O6, HRESI-MS (m/z 557.3833 [M+H]+). InChIKey=ANWFPAAUCGPEBV-MOHJPFBDNA-N; InChIKey=YAJAXOAXTCGOQA-HKOYGPOVNA-N. InChIKey of published compound #14 = YAJAXOAXTCGOQA-HKOYGPOVNA-N, which matches the second key.
  • 79. Stochastic Search Methods. Algorithms known to tackle large search spaces: • Simulated Annealing (Traveling Salesman Problem, finding the solution structure of large biomolecules, integrated circuit layout, robotic path planning) • Genetic Algorithms (protein folding, immune system simulation, computer-aided design) • Quite a number of other options ...
  • 87. Simulated Annealing Guided Walk in Constitution Space Neighbors in constitution space • are in close chemical distance to each other • are likely to be similar in their spectroscopic properties
  • 88. Simulated Annealing Small steps on the constitution space landscape Faulon, J.-L.; J. Chem. Inf. Comput. Sci., 36 (1996) 4, 731-40
  • 89. Simulated Annealing: Evaluating a score for each point (constitution) in structure space. Score function based on spectroscopic fitness: $S_{\text{total}} = c_1 S_{\text{HMBC}} + c_2 S_{\text{HHCOSY}} + c_3 S_{\text{Shift}} + \dots + c_n S_{\text{Features}}$
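The weighted-sum score on this slide maps naturally onto a list of (weight, scoring function) pairs. The sketch below is a toy illustration: the structure is represented by a plain dictionary and the two scorers are hypothetical placeholders, not the scoring terms of any actual CASE program.

```python
def total_score(structure, terms):
    """Weighted sum of individual scores; terms is a list of (weight, function)."""
    return sum(weight * score(structure) for weight, score in terms)


# Hypothetical toy scorers operating on a dictionary of precomputed values
def hmbc_score(structure):
    return structure["hmbc_satisfied"]        # number of satisfied HMBC signals


def shift_score(structure):
    return -structure["shift_deviation_ppm"]  # penalize chemical shift deviation


candidate = {"hmbc_satisfied": 18, "shift_deviation_ppm": 2.5}
terms = [(100.0, hmbc_score), (10.0, shift_score)]
print(total_score(candidate, terms))
```

Because every term is just a function of the candidate structure, adding another source of information means appending one more pair to `terms`.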
  • 93. Simulated Annealing: Small steps on the constitution space landscape. Acceptance criterion: $p = \exp(-\delta f / T)$. Annealing schedule: $T_t = \alpha T_{t-1}$ with $0.9 < \alpha < 1$ (plot: temperature T vs. time t).
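The acceptance criterion and the annealing schedule can be sketched as two small functions. This assumes the standard Metropolis reading of the slide, p = exp(-δf/T), where δf is taken here as the amount by which the candidate is worse than the current structure; that sign convention and the function names are assumptions of this sketch.

```python
import math
import random


def accept(delta_f, temperature, rng=random):
    """Metropolis criterion: always accept improvements (delta_f <= 0),
    otherwise accept with probability p = exp(-delta_f / T)."""
    if delta_f <= 0:
        return True
    return rng.random() < math.exp(-delta_f / temperature)


def temperatures(t0, alpha, steps):
    """Geometric annealing schedule T_t = alpha * T_(t-1)."""
    t = t0
    for _ in range(steps):
        yield t
        t *= alpha


# Early on, worse structures are accepted often; later almost never.
for temperature in temperatures(t0=100.0, alpha=0.95, steps=5):
    p = math.exp(-10.0 / temperature)
    print(f"T = {temperature:7.2f}, p(accept delta_f = 10) = {p:.3f}")
```

At high temperature the walk can leave local optima by accepting worse structures; as T shrinks the search turns into a greedy refinement around the best region found.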
  • 100. Some Nice Properties of this SA Scheme • Pluggable target function: if you can reliably calculate a measurable property for a given constitution, it can be part of your target function • Spectroscopic information: IR, UV-VIS, other types of NMR experiments, MS fragmentation (?) • Additional knowledge: good-list/bad-list fragments, drug likeness, natural product likeness • General system for optimization in constitution space • Artifacts only lead to a slightly lower ranking of the correct structure in the hit list
  • 101. Score convergence for SENECA SA run on Polycarpol (C30H48O2, 32 heavy atoms), performed by 8 server processes (plot: score, 3000 to 8000, vs. iteration, 0 to 335,500). Distributed computing at its cheapest (diagram labels: Distributed Server, Client, Gatekeeper; Retrieve Server List, Submit Spectral Data, Collect Results).
  • 102. Qualitative assessment: Computational complexity of deterministic and stochastic algorithms
      Compound | LUCY | SENECA | Steps overall
      α-Pinene (C10H16) | 2 s | 1 min | 30,000
      Eurabidiol (C15H28O2) | 29 s | 5 min | 90,000
      Polycarpol (C30H48O2) | 33 min | 12 min | 350,000
    Plot: Deterministic vs SA generation time in seconds (0 to 2000) against number of heavy atoms (0 to 30). C. Steinbeck, Journal of Chemical Information & Computer Sciences 2001, 41, 1500.
  • 103. Ranking Solutions • Heteroatom-rich/proton-poor skeletons can yield many solutions (hundreds, thousands) • Possible ranking by • Spectrum Similarity • Natural Product Likeness
  • 118. Component for Signature Generation
  • 124. Statistics for Scoring (NP: natural product; SM: synthetic molecule). In the fragment contribution (Fragment_i): ✦ NP_i is the total number of molecules in the NP dataset in which Fragment_i occurs ✦ SM_i is the total number of molecules in the SM dataset in which Fragment_i occurs ✦ SM_t is the total number of molecules in the SM dataset ✦ NP_t is the total number of molecules in the NP dataset ✦ N is the number of fragments in the given molecule
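The formula these definitions belong to did not survive in the transcript. The sketch below assumes the form commonly given for the natural product-likeness score, Fragment_i = log((NP_i / SM_i) · (SM_t / NP_t)), with the molecule's score taken as the sum of its fragment contributions divided by N. Both the formula and the use of the natural logarithm are assumptions to be checked against the paper cited on the next slide, and the function names are illustrative.

```python
import math


def fragment_contribution(np_i, sm_i, np_t, sm_t):
    """Assumed form: log of the fragment's NP/SM frequency ratio,
    corrected for the different dataset sizes. Counts must be positive."""
    return math.log((np_i / sm_i) * (sm_t / np_t))


def np_likeness(fragment_counts, np_t, sm_t):
    """Mean fragment contribution over the N fragments of one molecule.

    fragment_counts: list of (NP_i, SM_i) pairs, one per fragment.
    """
    contributions = [
        fragment_contribution(np_i, sm_i, np_t, sm_t)
        for np_i, sm_i in fragment_counts
    ]
    return sum(contributions) / len(contributions)


# Hypothetical counts: one fragment enriched in natural products, one depleted
print(np_likeness([(200, 50), (10, 40)], np_t=1000, sm_t=1000))
```

A fragment that is equally frequent in both datasets contributes zero, fragments enriched in natural products contribute positive values, and fragments typical of synthetic molecules contribute negative ones.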
  • 125. Jayaseelan KV, Moreno P, Truszkowski A, Ertl P & Steinbeck C (2012) Natural product-likeness score revisited: an open-source, open-data implementation. BMC Bioinformatics 13, 106. • Implemented natural product-likeness classification and integrated it into the Taverna workflow tool (http://sourceforge.net/projects/np-likeness/). • Included in the second version of SENECA CASE