Data analysis in QSAR

Noel O’Boyle
Dave Palmer, John Mitchell
Quantitative Structure-Activity Relationship (QSAR)
Also QSPR (Property)

[Diagram: Structure linked to Physical property by arrows labelled Perceive, Predict and Propose]

Descriptors
• Molecular weight
• No. of H-bonding groups
• Surface area

MOE: 180 descriptors
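
As an illustration of what a descriptor is, the sketch below computes the simplest one on the slide, molecular weight, from a molecular formula. The function name, the small atomic-mass table and the aspirin example are assumptions added here; descriptors such as surface area need the full structure and would normally come from a cheminformatics package rather than from code like this.

```python
import re

# Standard atomic masses (g/mol) for a few common elements only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def molecular_weight(formula: str) -> float:
    """Sum atomic masses for a simple formula such as 'C9H8O4' (no brackets)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total

# Aspirin, used purely as an example molecule.
aspirin_mw = molecular_weight("C9H8O4")
print(round(aspirin_mw, 2))
```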
Quantitative Structure-Activity Relationship (QSAR)

• Optimise model on training set (2/3)
    – Need to use an estimate of the error of prediction (CV or bootstrap, but not resubstitution)
• Test model on test set (1/3)

or

• Build model on training set (2/3 of 2/3)
• Optimise on test set (1/3 of 2/3)
    – Using the error of prediction
• Test model on validation set (1/3)
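
The two partitioning schemes can be sketched as index splits. This is a minimal illustration only: the helper `split_indices`, the data set size of 900 and the fixed seed are assumptions, and the variable names follow the slide's terminology (in the second scheme the "test set" is used for optimisation and the "validation set" for the final test).

```python
import random

def split_indices(n, fractions, seed=0):
    """Shuffle range(n) and cut it into consecutive parts with the given fractions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    parts, start = [], 0
    for frac in fractions[:-1]:
        stop = start + round(frac * n)
        parts.append(idx[start:stop])
        start = stop
    parts.append(idx[start:])  # the remainder goes to the last part
    return parts

n_molecules = 900  # hypothetical data set size

# Scheme 1: optimise on 2/3 (using CV or bootstrap), test on the remaining 1/3.
train, test = split_indices(n_molecules, [2 / 3, 1 / 3])

# Scheme 2: build on 2/3 of 2/3, optimise on 1/3 of 2/3, test on the remaining 1/3.
build, optimise, validation = split_indices(n_molecules, [4 / 9, 2 / 9, 1 / 3])

print(len(train), len(test))
print(len(build), len(optimise), len(validation))
```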

Testing must be carried out on a hold-out set drawn from the same distribution
as the set of molecules used for optimisation
=> don't optimise the model on a diverse set

• Assessing Model Fit by Cross-Validation, JCICS, 2003, 43, 579.
• Beware of q², J. Mol. Graph. Model., 2002, 20, 269.
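
Why resubstitution is not an acceptable estimate of the error of prediction can be seen with a deliberately extreme model. The sketch below is an assumption-laden toy: it uses a 1-nearest-neighbour predictor and synthetic data (not solubility data), chosen because such a model reproduces its own training set perfectly. The resubstitution error is therefore zero, while 10-fold cross-validation gives a realistic, non-zero estimate.

```python
import math
import random

def predict_1nn(train, x):
    """Predict y for x as the y of the nearest training point (1-nearest neighbour)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def rmse(model_data, eval_data):
    """RMSE of a 1-NN model built on model_data and evaluated on eval_data."""
    errors = [(predict_1nn(model_data, x) - y) ** 2 for x, y in eval_data]
    return math.sqrt(sum(errors) / len(errors))

def cv_rmse(data, k=10):
    """k-fold cross-validation: each point is predicted by a model that never saw it."""
    folds = [data[i::k] for i in range(k)]
    sq_err, n = 0.0, 0
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        sq_err += sum((predict_1nn(train, x) - y) ** 2 for x, y in held_out)
        n += len(held_out)
    return math.sqrt(sq_err / n)

rng = random.Random(0)
# Synthetic data: y = 2x plus Gaussian noise (purely illustrative).
xs = [rng.uniform(0, 10) for _ in range(100)]
data = [(x, 2 * x + rng.gauss(0, 1)) for x in xs]

resub = rmse(data, data)   # model evaluated on its own training data
cv = cv_rmse(data, k=10)   # estimate of the error of prediction
print(f"resubstitution RMSE = {resub:.3f}, 10-fold CV RMSE = {cv:.3f}")
```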
Feature selection

[Figure: plot with "Number of descriptors" on the x-axis]

• Features = descriptors
• n descriptors give 2^n different subsets
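
The subset count follows from each descriptor being either in or out of the model. A short sketch, with three hypothetical descriptor names, enumerates every subset for a small case and then shows how quickly the count grows, which is why exhaustive search is impractical for realistic descriptor sets.

```python
from itertools import combinations

def all_subsets(descriptors):
    """Yield every subset of the descriptors, from the empty set to the full set."""
    for size in range(len(descriptors) + 1):
        yield from combinations(descriptors, size)

# Hypothetical descriptor names, for illustration only.
descriptors = ["mol_weight", "n_hbond_groups", "surface_area"]
subsets = list(all_subsets(descriptors))
print(len(subsets))  # 2**3 = 8

# The number of subsets doubles with every added descriptor.
for n in (10, 50, 100):
    print(n, 2 ** n)
```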
Parameter optimisation

– Models have parameters which should be tuned
   • Support vector machines
        – gamma, cost, epsilon
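
For orientation, the sketch below shows what the three parameters usually control in support vector regression, assuming an RBF kernel (the slides do not state the kernel). The candidate values in `grid` are hypothetical, and the functions are textbook definitions rather than calls into any SVM library.

```python
import math
from itertools import product

def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2); larger gamma means a narrower kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger ones are penalised linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Hypothetical candidate values. Cost weights the loss term against model flatness.
grid = {
    "gamma": [0.001, 0.01, 0.1, 1.0],
    "cost": [1, 10, 100, 1000],
    "epsilon": [0.01, 0.1, 0.5],
}
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(candidates))  # 4 * 4 * 3 = 48 parameter combinations
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.1))
print(epsilon_insensitive_loss(2.0, 2.05, epsilon=0.1))
```

Each entry of `candidates` is one parameter setting to be scored with an estimate of the error of prediction.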
QSAR: Feature selection and parameter optimisation
• Aim: To find a robust method of simultaneously
  optimising the parameters and performing feature
  selection
• Prediction of solubility for drug-like molecules
   – David Palmer
   – Support vector machines
       • gamma, cost, epsilon
   – 127 descriptors
• Large search space
   – stochastic algorithm required
Ant Colony Optimisation (ACO)
•   Inspired by the behaviour of ants foraging for food
•   Ants lay down pheromones, which influence the path taken by other
    ants. Meanwhile pheromones are evaporating.
•   Ants’ trails converge to shortest path between nest and food




•   Ant Colony Optimisation (ACO) – Marco Dorigo, PhD Thesis, 1992.
•   ACO for feature selection – Shen et al, JCIM, 2005, 45, 1024.
•   We have extended it to perform simultaneous parameter optimisation
Ant Colony Optimisation (ACO)

• population of ants (typically 50 to 100)
    – each ant represents a model – i.e. a subset of descriptors and
      values for the parameters

    – each ant has a fitness score, e.g. 10-fold cross-validation RMSE

• it is more likely that an ant will choose a particular
  descriptor/parameter value in the next iteration if
    – many ants have chosen it in this iteration (local search), or

    – many ants have chosen it in their best models to date (global
      search)

• for descriptors/parameter values that are not chosen in the
  current or best models, the probability that they will be chosen
  decreases (evaporation)
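
The update rule described above can be sketched as follows. This is not the authors' implementation: the toy `fitness` function stands in for a 10-fold cross-validation RMSE, and the evaporation rate `rho`, the pheromone bounds, the parameter grid and the use of the better half of the ants for the local deposit are all assumptions made so that the example is small and self-contained.

```python
import random

N_DESC = 12
PARAM_VALUES = {"gamma": [0.001, 0.01, 0.1, 1.0], "cost": [1, 10, 100]}  # hypothetical grid
USEFUL = {0, 3, 7}  # toy ground truth, unknown to the ants

def fitness(mask, params):
    """Toy stand-in for a 10-fold cross-validation RMSE (lower is better)."""
    missed = sum(1 for d in USEFUL if not mask[d])
    extra = sum(1 for d, on in enumerate(mask) if on and d not in USEFUL)
    penalty = (params["gamma"] != 0.01) + (params["cost"] != 10)
    return missed + 0.1 * extra + 0.5 * penalty

def run_aco(n_ants=50, n_iter=30, rho=0.2, seed=0):
    rng = random.Random(seed)
    desc_tau = [0.5] * N_DESC  # probability of selecting each descriptor
    param_tau = {p: [1.0] * len(v) for p, v in PARAM_VALUES.items()}
    personal_best = [None] * n_ants  # (score, mask, parameter indices) per ant
    history = []
    for _ in range(n_iter):
        ants = []
        for a in range(n_ants):
            # Each ant is one model: a descriptor subset plus parameter values.
            mask = [rng.random() < t for t in desc_tau]
            idx = {p: rng.choices(range(len(w)), weights=w)[0] for p, w in param_tau.items()}
            params = {p: PARAM_VALUES[p][i] for p, i in idx.items()}
            ant = (fitness(mask, params), mask, idx)
            ants.append(ant)
            if personal_best[a] is None or ant[0] < personal_best[a][0]:
                personal_best[a] = ant
        # Deposits come from the better half of this iteration (local search)
        # and from every ant's best model to date (global search).
        elite = sorted(ants, key=lambda t: t[0])[: n_ants // 2]
        for d in range(N_DESC):
            local = sum(m[d] for _, m, _ in elite) / len(elite)
            glob = sum(m[d] for _, m, _ in personal_best) / n_ants
            # Evaporation shrinks tau; deposits pull it towards popular choices.
            desc_tau[d] = (1 - rho) * desc_tau[d] + rho * 0.5 * (local + glob)
            desc_tau[d] = min(0.95, max(0.05, desc_tau[d]))  # keep some exploration
        for p, weights in param_tau.items():
            for i in range(len(weights)):
                local = sum(ix[p] == i for _, _, ix in elite) / len(elite)
                glob = sum(ix[p] == i for _, _, ix in personal_best) / n_ants
                weights[i] = max(0.05, (1 - rho) * weights[i] + rho * 0.5 * (local + glob))
        history.append(min(pb[0] for pb in personal_best))
    best = min(personal_best, key=lambda t: t[0])
    return best, history

best, history = run_aco()
print("best score:", best[0])
print("descriptors:", [d for d, on in enumerate(best[1]) if on])
```

In a real application `fitness` would train and cross-validate a model for each ant, which is where almost all of the run time would go.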
Does it work?
