Visualization and manipulation of
Matched Molecular Series for
decision support
Noel O’Boyle and Roger Sayle
NextMove Software
250th ACS National Meeting, Boston
16th Aug 2015
Matched (Molecular) Pairs
[Cl, F]: example pair with values 1.6 and 3.5
Coined by Kenny and Sadowski in 2005*
Easier to predict differences in the values of a property
than it is to predict the value itself
* Chemoinformatics in drug discovery, Wiley, 271–285.
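As a minimal illustration of why differences are the quantity of interest, the sketch below computes the property change for a matched pair (two compounds sharing a scaffold and differing at one R group) and averages it over scaffolds. The `(scaffold, R group) -> value` layout and the function names are assumptions made for this example, not part of the original work.

```python
from typing import Dict, Tuple

# Hypothetical layout: (scaffold, R group) -> measured property value
Data = Dict[Tuple[str, str], float]


def pair_delta(data: Data, scaffold: str, r_from: str, r_to: str) -> float:
    """Property change for the transformation r_from -> r_to on one scaffold."""
    return data[(scaffold, r_to)] - data[(scaffold, r_from)]


def mean_delta(data: Data, r_from: str, r_to: str) -> float:
    """Average change for r_from -> r_to over every scaffold carrying both R groups."""
    scaffolds = sorted({s for (s, _) in data})
    deltas = [
        pair_delta(data, s, r_from, r_to)
        for s in scaffolds
        if (s, r_from) in data and (s, r_to) in data
    ]
    if not deltas:
        raise ValueError(f"no matched pairs for {r_from} -> {r_to}")
    return sum(deltas) / len(deltas)
```

For example, with Cl -> F pairs on two hypothetical scaffolds whose changes are -1.9 and -1.0, `mean_delta` returns their average, about -1.45.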
Matched Pair usage
• Successfully used for:
– Rationalising and predicting physicochemical
property changes
– Finding bioisosteres
• Not very successful in improving activity
– Activity changes dependent on binding environment
• Need to look beyond matched pairs
Matched Series of length 2
= Matched Pair
[Cl, F]
“Matching molecular series” introduced by Wawer and
Bajorath, J. Med. Chem. 2011, 54, 2944
Matched Series of length 3
[Cl, F, NH2]
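A matched series of length n can be enumerated by grouping compounds on their shared scaffold and taking every n-membered subset of the R groups attached to it; setting n = 2 recovers matched pairs. The sketch below assumes compounds have already been fragmented into `(scaffold, R group)` records (that step needs a cheminformatics toolkit and is not shown), and the function name is hypothetical. Enumerating all subsets grows quickly for well-populated scaffolds, so this is for illustration only.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def matched_series(records: Iterable[Tuple[str, str]], length: int) -> Dict[FrozenSet[str], List[str]]:
    """Map each set of `length` R groups to the scaffolds that carry all of them."""
    by_scaffold: Dict[str, Set[str]] = defaultdict(set)
    for scaffold, rgroup in records:
        by_scaffold[scaffold].add(rgroup)
    series: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for scaffold in sorted(by_scaffold):
        # every `length`-membered subset of this scaffold's R groups is one matched series
        for combo in combinations(sorted(by_scaffold[scaffold]), length):
            series[frozenset(combo)].append(scaffold)
    return dict(series)
```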
Ordered Matched Series of length 3
pIC50: Cl 3.5, F 2.1, NH2 1.6
[Cl > F > NH2]
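Turning measured activities into an ordered matched series is a sort by activity, most active first. A minimal sketch, assuming one activity value per R group on a single scaffold (the function name is illustrative):

```python
from typing import Dict


def ordered_series(activities: Dict[str, float]) -> str:
    """Return the R groups from most to least active, joined as 'A > B > C'.

    Python's sort is stable, so tied R groups keep their input order; a real tool
    would need an explicit policy for ties.
    """
    ranked = sorted(activities, key=lambda r: activities[r], reverse=True)
    return " > ".join(ranked)
```

For example, `ordered_series({"Cl": 3.5, "F": 2.1, "NH2": 1.6})` gives `"Cl > F > NH2"`.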
The Matched Pair mentality
• There can only be two
– Like inhabitants of Flatland ignorant of a third
dimension
• What is the equivalent of “pair” for three?
– Triad, trio, triple?
• A matched pair represents a transformation
from A->B
– How would that work if there were three?
Matsy:
Prediction using
Matched Series
Matched Series have preferred orders
Series            Enrichment  Observations
Br > Cl > F > H   5.36*       256
Cl > Br > F > H   3.14*       150
H > F > Cl > Br   1.53*       73
Br > Cl > H > F   1.40        67
F > Cl > Br > H   1.36        65
Cl > F > Br > H   0.96        46
…
H > F > Br > Cl   0.77        37
…
H > Br > F > Cl   0.48*       23
Cl > H > F > Br   0.48*       23
Cl > F > H > Br   0.48*       23
H > Cl > F > Br   0.42*       20
Br > F > H > Cl   0.40*       19
F > H > Br > Cl   0.40*       19
H > Cl > Br > F   0.38*       18
F > Br > H > Cl   0.36*       17
Br > H > F > Cl   0.17*       8
The fact that certain orders are preferred may be used as the basis of a predictive method.
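The enrichment values in the table appear consistent with the observed count divided by the count expected if all 4! = 24 orders of four R groups were equally likely: for example, 256 / 5.36 and 8 / 0.17 both imply an expected count of roughly 47 to 48 per order. That reading is inferred from the numbers rather than stated on the slide; the sketch below computes enrichment under that assumption (function and type names are illustrative).

```python
from collections import Counter
from math import factorial
from typing import Dict, Iterable, Tuple

Order = Tuple[str, ...]  # R groups listed from most to least active


def enrichment(observed: Iterable[Order]) -> Dict[Order, float]:
    """Observed count of each order divided by the count expected under a uniform model.

    Assumption: every series contains the same n distinct R groups, so there are n!
    possible orders and each is expected total / n! times if no order were preferred.
    """
    counts = Counter(observed)
    if not counts:
        return {}
    groups = set(next(iter(counts)))
    if any(set(order) != groups or len(order) != len(groups) for order in counts):
        raise ValueError("all series must contain the same distinct R groups")
    expected = sum(counts.values()) / factorial(len(groups))
    return {order: count / expected for order, count in counts.items()}
```

Orders that never occur are simply absent from the result; their enrichment would be 0.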
Find R Groups that increase activity
Query matched series: A > B
Matched series containing A > B:
A > B > C
C > A > B
D > A > B > C
D > A > C > B
E > D > A > B
…
R Group  Observations  Obs that increase activity  % that increase activity
D        3             3                           100
E        1             1                           100
C        4             1                           25
…
In-house
O’Boyle, Boström, Sayle, Gill. J. Med. Chem. 2014, 57, 2704.
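The counting on the slide above can be reproduced in a few lines: for each matched series that contains the query R groups in the query order, every other R group is one observation, and it counts as increasing activity when it ranks above all the query groups. This reading is inferred from the worked example (it reproduces D 3/3, E 1/1 and C 4/1) and is not necessarily the exact Matsy implementation described in the paper cited above; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def consistent(series: Sequence[str], query: Sequence[str]) -> bool:
    """True if every query R group occurs in `series` in the same relative order."""
    if not all(q in series for q in query):
        return False
    positions = [series.index(q) for q in query]
    return positions == sorted(positions)


def rgroup_stats(query: Sequence[str],
                 database: List[Sequence[str]]) -> Dict[str, Tuple[int, int, float]]:
    """For each non-query R group: (observations, obs that increase activity, % increase).

    Series are written most active first, so an R group 'increases activity'
    when it appears before (above) every query R group.
    """
    obs: Dict[str, int] = defaultdict(int)
    up: Dict[str, int] = defaultdict(int)
    for series in database:
        if not consistent(series, query):
            continue  # series contradicts (or lacks part of) the query order
        top = min(series.index(q) for q in query)
        for pos, rgroup in enumerate(series):
            if rgroup in query:
                continue
            obs[rgroup] += 1
            if pos < top:
                up[rgroup] += 1
    return {r: (obs[r], up[r], 100.0 * up[r] / obs[r]) for r in obs}
```

Ranking the returned R groups by percentage (and then by number of observations) gives a list of suggestions for what to make next.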
The Dataset-Centric approach
• “Here is my dataset of molecules with
activities – now tell me what to make next”
• Pro:
– Easy for users to get up-and-running
– Fits with their existing way of thinking
• Don’t need to think too much about matched series
• Con:
– User is one step removed from the matched series
data on which the predictions are actually based
– Dataset is fixed: cannot play around with the
prediction input
Goals for the interface
• Visual interface based around R Groups as first-class
objects arranged in ordered series
– Promote new paradigm
– Make it clear that the scaffold is not involved
• Should help break the “matched pair” mentality
– Just a particular case of matched series
• Should be easy to play with
– Easy to manipulate and quick to respond
• Drag-and-drop R Groups into slots to represent
observed activity order
– The query matched series
CINF 29: Visualization and manipulation of Matched Molecular Series for decision support
Pros
• Easy to play around with
– Swap around order of R groups
– See what happens if you follow the predictions
• May suggest hypotheses
• Useful for searching (not just for predictions)
• Tablet-friendly
Cons
• The user needs to be able to provide an
ordered matched series as a query
– You can’t just provide a dataset of molecules
No Chemistry required
• Predictions are solely based on the order of R
groups in a matched series
– Not using any calculated properties
• Images of all R groups in ChEMBL can be
generated in advance (~65K)
• ⇒ A cheminformatics toolkit is not required for
the interface or even for making predictions
• In practice, we do use a toolkit to allow the user
to enter R groups as SMILES
Use Case #1
Are matched series
predictions symmetric?
Are matched series predictions
symmetric?
• If A>B>C>D is a highly preferred order
– Then D>C>B>A also tends to be preferred
• Hypothesis:
– If A reduces the activity given D>C>B
⇒ it will also improve the activity given B>C>D
• If true, then we have twice as much data to
use for predictions
– Let’s find out…
Query: ? > Cl > F > H    Reversed query: H > F > Cl > ?
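One way to test the symmetry hypothesis is to pool evidence from a query and its reverse: an R group that ranks below all groups of the reversed query (below Cl in H > F > Cl > ?) is counted as evidence that it would rank above them in the original query (? > Cl > F > H). The pooling rule and the function names below are illustrative assumptions; whether the hypothesis actually holds is what the use case sets out to check.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def _in_order(series: Sequence[str], query: Sequence[str]) -> bool:
    """True if every query R group occurs in `series` in the same relative order."""
    if not all(q in series for q in query):
        return False
    pos = [series.index(q) for q in query]
    return pos == sorted(pos)


def pooled_counts(query: Sequence[str],
                  database: List[Sequence[str]]) -> Dict[str, Tuple[int, int]]:
    """(observations, obs that increase activity) per R group, pooling the reversed query.

    Forward: in series consistent with `query`, an R group above every query group
    counts as an increase. Reverse (the hypothesis under test): in series consistent
    with the reversed query, an R group below every query group is counted as an
    increase for the forward query.
    """
    if len(set(query)) < 2:
        raise ValueError("need at least two distinct query R groups")
    obs: Dict[str, int] = defaultdict(int)
    up: Dict[str, int] = defaultdict(int)
    reverse = list(reversed(query))
    for series in database:
        for q, want_above in ((list(query), True), (reverse, False)):
            if not _in_order(series, q):
                continue
            idx = [series.index(g) for g in q]
            lo, hi = min(idx), max(idx)
            for pos, r in enumerate(series):
                if r in q:
                    continue
                obs[r] += 1
                if (want_above and pos < lo) or (not want_above and pos > hi):
                    up[r] += 1
    return {r: (obs[r], up[r]) for r in obs}
```

With two or more distinct query groups a series cannot match both the query and its reverse, so no observation is counted twice.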
Use case #2
Topliss Decision Tree
Topliss Decision Tree
Topliss, J. G. Utilization of Operational Schemes for Analog Synthesis in
Drug Design. J. Med. Chem. 1972, 15, 1006–1011.
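The Topliss scheme maps naturally onto a tree whose branches are labelled by the outcome of the last compound made (more, equally or less active than its predecessor). The sketch encodes only the first two levels of the aromatic-substitution tree: starting from the unsubstituted (H) parent, make 4-Cl, then 3,4-Cl2, 4-CH3 or 4-OCH3 depending on the outcome. Deeper levels are left out rather than guessed, and the `Node`/`walk` layout is an illustrative choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A compound to make next; children are keyed by how it compared with its predecessor."""
    substituent: str
    children: Dict[str, "Node"] = field(default_factory=dict)  # keys: "M", "E", "L"


# First two levels of the aromatic Topliss tree, starting from the unsubstituted (H)
# parent: make 4-Cl, then choose by outcome (M = more, E = equal, L = less active).
# Deeper levels are deliberately left out of this sketch.
TOPLISS_ROOT = Node("4-Cl", {
    "M": Node("3,4-Cl2"),
    "E": Node("4-CH3"),
    "L": Node("4-OCH3"),
})


def walk(root: Node, outcomes: List[str]) -> List[str]:
    """Follow a sequence of observed outcomes; return the substituents suggested on the way."""
    node = root
    path = [node.substituent]
    for outcome in outcomes:
        nxt = node.children.get(outcome)
        if nxt is None:  # outcome not covered by the levels encoded here
            break
        node = nxt
        path.append(node.substituent)
    return path
```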
(17th)
ChEMBL-based Decision Tree (One of Many)
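A data-driven tree in the same spirit could be grown from matched series: at each node, suggest the R group most often observed above everything made so far, then branch on whether it turned out more or less active. The sketch below is a hypothetical illustration with a simple scoring rule (fraction of observations above, then count, then name); it is not the procedure used to build the tree on the slide. It also simplifies outcomes to two branches, placing the new R group either above all previous groups ("M") or at the bottom ("L").

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def in_order(series: Sequence[str], query: Sequence[str]) -> bool:
    """True if every query R group occurs in `series` in the same relative order."""
    if not all(q in series for q in query):
        return False
    pos = [series.index(q) for q in query]
    return pos == sorted(pos)


def suggest(query: Sequence[str], db: List[Sequence[str]], min_obs: int = 1) -> Optional[str]:
    """R group most often observed above every (non-empty) query group, or None."""
    obs: Dict[str, int] = defaultdict(int)
    up: Dict[str, int] = defaultdict(int)
    for series in db:
        if not in_order(series, query):
            continue
        top = series.index(query[0])  # first query group is the highest-ranked one
        for pos, r in enumerate(series):
            if r in query:
                continue
            obs[r] += 1
            if pos < top:
                up[r] += 1
    candidates = [r for r in obs if obs[r] >= min_obs and up[r] > 0]
    if not candidates:
        return None
    # highest fraction above, then most observations above, then alphabetical
    return min(candidates, key=lambda r: (-up[r] / obs[r], -up[r], r))


def build_tree(query: List[str], db: List[Sequence[str]], depth: int,
               min_obs: int = 1) -> Optional[dict]:
    """Nested dict {'make': R, 'M': subtree, 'L': subtree}, or None when nothing to suggest."""
    if depth == 0:
        return None
    r = suggest(query, db, min_obs)
    if r is None:
        return None
    return {
        "make": r,
        "M": build_tree([r] + query, db, depth - 1, min_obs),  # r beat everything so far
        "L": build_tree(query + [r], db, depth - 1, min_obs),  # simplification: r at the bottom
    }
```

Changing the scoring rule or `min_obs` yields a different tree from the same data, which is one reason such a tree is only one of many.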
http://nextmovesoftware.com
noel@nextmovesoftware.com
@nmsoftware