Revising the Topliss decision tree
…based on 30 years of medicinal chemistry literature
Noel O’Boyle and Roger Sayle
NextMove Software
Jonas Boström
AstraZeneca
248th ACS National Meeting
Aug 2014
http://www.acsmedchem.org/topliss.html
Topliss Tree for Substituted Phenyl
Topliss, J. G. Utilization of Operational Schemes for Analog Synthesis in
Drug Design. J. Med. Chem. 1972, 15, 1006–1011.
Features of the Topliss Tree
• Maximize the chances of synthesizing the most potent compound in the series as soon as possible
• Based on inferring Hansch structure-activity relationship from relative potencies of R groups
– Electronic (σ), hydrophobic (π), steric (Es)
• General scheme
– for any target
– for any scaffold
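For reference, the Hansch relationship behind the three parameters above is commonly written in the general linear form below. This is the textbook form, not an equation taken from the slides: C is the concentration needed for a standard biological response, and the coefficients a, ρ, δ and the constant c are fitted separately for each series.

```latex
\log\frac{1}{C} = a\,\pi + \rho\,\sigma + \delta\,E_s + c
```

The Topliss tree does not fit these coefficients explicitly; it infers their likely signs from the potency order of the first few analogs.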
ChEMBL Bioactivity database
• July 2008 - ChEMBL established with Wellcome Trust grant
– John Overington, EMBL-EBI
• Open access source of bioactivity data abstracted from the literature
– Chemical structures, activity values, activity type, assay description, journal article name, target
– www.ebi.ac.uk/chembl/
Gaulton et al. Nucleic Acids Res. 2012, 40, D1100
ChEMBL Bioactivity database
• ChEMBL 19 – July 2014
– 57k papers
• 94% from Bioorg. Med. Chem. Lett., J. Med. Chem., J. Nat. Prod., Bioorg. Med. Chem., Eur. J. Med. Chem., Antimicrob. Agents Chemother., Med. Chem. Res.
– 1.4 million compounds with 12 million activities
– 1.1 million assays against 10k targets
[Figure: Number of articles extracted from a particular year. x-axis: Year (1977 to 2012); y-axis: Count (0 to 5000)]
Matched (Molecular) series
• Recent concept in cheminformatics (*)
– … not so recent in medicinal chemistry
• Series of structural analogs
– same scaffold
– different R groups at a single position
* “Matching molecular series” introduced by Wawer and Bajorath, J. Med. Chem. 2011, 54, 2944
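The definition above (same scaffold, different R groups at a single position) can be sketched as a simple grouping step. This is a minimal illustration, not the authors' implementation: it assumes each molecule has already been split into a scaffold and an R group, it also groups by assay (an assumption), and every identifier and value in `records` is made up.

```python
from collections import defaultdict

# Hypothetical, pre-fragmented records: (assay_id, scaffold, r_group, pIC50).
# The scaffold labels and activity values are illustrative only.
records = [
    ("assay1", "scaffold_X", "Cl", 3.5),
    ("assay1", "scaffold_X", "F", 2.1),
    ("assay1", "scaffold_X", "NH2", 1.6),
    ("assay1", "scaffold_Y", "Cl", 4.0),
    ("assay2", "scaffold_X", "Cl", 5.2),
    ("assay2", "scaffold_X", "F", 5.0),
]


def find_matched_series(records, min_length=2):
    """Group records sharing an assay and scaffold into matched series."""
    groups = defaultdict(dict)
    for assay_id, scaffold, r_group, activity in records:
        groups[(assay_id, scaffold)][r_group] = activity
    # A lone compound is not a series, so drop groups below min_length.
    return {key: members for key, members in groups.items()
            if len(members) >= min_length}


series = find_matched_series(records)
for key, members in series.items():
    print(key, sorted(members))
```

Here `scaffold_Y` is dropped because only one R group was measured on it, leaving one series of length 3 and one of length 2.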
Matched Series of length 3
[Cl, F, NH2]
Matched Series of length 3
[4-Cl-Ph, 4-F-Ph, 4-NH2-Ph]
Ordered Matched Series
[4-Cl-Ph > 4-F-Ph > 4-NH2-Ph]
pIC50: 3.5, 2.1, 1.6
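Ordering a matched series amounts to sorting its members by activity. The sketch below uses the three pIC50 values from the slide; the function names are hypothetical, and the IC50 conversion assumes input in nanomolar.

```python
import math


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    # 1 nM = 1e-9 mol/L, so -log10(x * 1e-9) = 9 - log10(x).
    return 9.0 - math.log10(ic50_nm)


def order_series(activities):
    """Return R groups sorted from most to least active (highest pIC50 first)."""
    return sorted(activities, key=activities.get, reverse=True)


# pIC50 values as shown on the slide.
series = {"4-F-Ph": 2.1, "4-NH2-Ph": 1.6, "4-Cl-Ph": 3.5}
ordered = order_series(series)
print(" > ".join(ordered))  # 4-Cl-Ph > 4-F-Ph > 4-NH2-Ph
```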
[Figure: Matched series in ChEMBL19 IC50 binding assays. x-axis: Series length (2 to 50); y-axis: Frequency (log scale, 1 to 1,000,000)]
Length 2: 240,967 212,494
Length 3: 59,753 52,666
Length 4: 27,779 24,306
Length 5: 15,892 13,834
Length 6: 10,619 9,203
Method described in O’Boyle, Boström, Sayle, Gill. Using Matched Molecular Series as a Predictive Tool
To Optimize Biological Activity. J. Med. Chem. 2014, 57, 2704.
(ChEMBL16)
[Figure: R Groups sorted by frequency. y-axis: Frequency (0 to 9000)]
Analysis of the 16268 matched series
containing at least 4 substituted phenyls
Find R Groups that increase activity
Query: A > B
Series matching the query:
A > B > C
C > A > B
D > A > B > C
D > A > C > B
E > D > A > B
…

R Group   Observations   Obs that increase activity   % that increase activity
D         3              3                            100
E         1              1                            100
C         4              1                            25
…         …              …                            …

O’Boyle, Boström, Sayle, Gill. Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity. J. Med. Chem. 2014, 57, 2704.
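The counting behind the table on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the published Matsy code: each series is a list of R group labels ordered from most to least active, a series matches if it contains the query R groups in the same relative order, and an R group "increases activity" when it ranks above the best query member. The function names are hypothetical.

```python
from collections import Counter


def contains_in_order(series, query):
    """True if every query R group occurs in `series` in the same relative order."""
    positions = [series.index(r) for r in query if r in series]
    return len(positions) == len(query) and positions == sorted(positions)


def rgroup_stats(database, query):
    """Return (R group, observations, improvements, percent) rows for a query."""
    observed, improved = Counter(), Counter()
    for series in database:
        if not contains_in_order(series, query):
            continue
        best_query_pos = series.index(query[0])  # most active query member
        for pos, r_group in enumerate(series):
            if r_group in query:
                continue
            observed[r_group] += 1
            if pos < best_query_pos:  # ranked above the whole query
                improved[r_group] += 1
    rows = [(r, observed[r], improved[r], round(100 * improved[r] / observed[r]))
            for r in observed]
    # Highest percentage first, then most observations, then name.
    return sorted(rows, key=lambda row: (-row[3], -row[1], row[0]))


# The five ordered series listed on the slide (most active first).
database = [
    ["A", "B", "C"],
    ["C", "A", "B"],
    ["D", "A", "B", "C"],
    ["D", "A", "C", "B"],
    ["E", "D", "A", "B"],
]

rows = rgroup_stats(database, ["A", "B"])
for row in rows:
    print(*row)
```

Run on the five series from the slide, this gives the same counts as the table: D 3 3 100, E 1 1 100, C 4 1 25.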
Example
Example II
Topliss Decision Tree
Topliss Decision Tree
Topliss Decision Tree
Topliss Decision Tree
(1st if lower cutoff)
Topliss Decision Tree
Topliss Decision Tree
(20th)
Topliss Decision Tree
Topliss Decision Tree
(21st)
“Assuming that the –σ effect is
the most probable explanation…”
Topliss Decision Tree
Topliss Decision Tree
Topliss Decision Tree
Topliss Decision Tree
Matsy Decision Tree
Matsy Decision Tree (Take II)
Target specific subsets
4-Cl > H
Panels: Everything | Kinases | Class A GPCRs
Account for Lipophilic Efficiency
• ΔLiPE = ΔpIC50 – ΔLogP
• The “%>” value is based on the number of times a particular R group has greater pIC50
– i.e. ΔpIC50 > 0
• Redefine it to only include cases where the increase in pIC50 was larger than any increase in LogP
– i.e. ΔpIC50 > 0 and ΔLiPE > 0
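The redefinition above can be sketched as a stricter filter on the same counts. The ΔpIC50 and ΔLogP pairs below are made up for illustration and the function names are hypothetical; only the formula ΔLiPE = ΔpIC50 – ΔLogP and the two conditions come from the slide.

```python
def delta_lipe(delta_pic50, delta_logp):
    """Change in lipophilic efficiency: dLiPE = dpIC50 - dLogP."""
    return delta_pic50 - delta_logp


def percent_improved(pairs, require_lipe_gain=False):
    """Percentage of (dpIC50, dLogP) pairs counted as an improvement.

    By default an improvement is dpIC50 > 0. With require_lipe_gain=True
    the pair must also satisfy dLiPE > 0.
    """
    if not pairs:
        return 0.0
    hits = 0
    for d_pic50, d_logp in pairs:
        if d_pic50 <= 0:
            continue
        if require_lipe_gain and delta_lipe(d_pic50, d_logp) <= 0:
            continue
        hits += 1
    return 100.0 * hits / len(pairs)


# Hypothetical changes observed when one R group replaces another.
pairs = [(0.9, 0.7), (0.3, 0.7), (-0.2, 0.7), (1.2, 0.7)]

print(percent_improved(pairs))                          # 75.0
print(percent_improved(pairs, require_lipe_gain=True))  # 50.0
```

The second pair gains potency (ΔpIC50 = 0.3) but less than it gains lipophilicity, so it counts under the original definition and not under the redefined one.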
4-Cl > H
ΔLiPE > 0
ΔLiPE > 0
3,4-diCl > 4-Cl > H
Data-driven approach
• Not limited to the two trees in the Topliss paper
• All predictions backed by experimental data
– Can drill-down into the data, look at targets, scaffolds
– Can restrict experimental data used to particular targets, use in-house data rather than ChEMBL
• Does not explain why, only that it happens
Conclusions
• In the main, the Topliss Tree is supported by published data
– Largest difference is recommendation of 4-OMe rather than 4-OH
– Suggestion of 4-CF3 is also problematic
• We have generated the corresponding ‘Matsy Tree’ derived from experimental data
drag-and-drop interface to Matsy
Revising the Topliss decision tree
…based on 30 years of medicinal chemistry literature
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity
J. Med. Chem. 2014, 57, 2704.
Want to hear more?
Poster COMP 394
Tuesday 6:00-8:00pm Marriott Marquis
Interested in an evaluation copy of Matsy?
Come by our booth
noel@nextmovesoftware.com
