Mohamed A. Khamis, Walid Gomaa, Walaa F. Ahmed,
Machine learning in computational docking, Artificial Intelligence In Medicine (2015),
http://dx.doi.org/10.1016/j.artmed.2015.02.002
Objective
• The objective of this paper is to highlight state-of-the-art
machine learning (ML) techniques in computational docking.
• The use of smart computational methods in the drug design
life cycle is a relatively recent development that has
gained much popularity and interest over the last few years.
• Computational docking is the process of predicting the best
pose (orientation + conformation) of a small molecule (drug
candidate) when bound to a larger target receptor molecule
(protein) to form a stable complex molecule.
Background
• Background for protein-ligand interactions:
  - Physical, chemical, and biological
• Molecular data formats:
  - e.g., .mol, .pdb, .sdf, etc.
• Docking software programs:
  - e.g., AutoDock, eHiTS, iDock, etc.
• Molecular databases:
  - Containing data on proteins with their possible ligands
  - e.g., PDB, PDBbind, BindingDB, DUD, etc.
Drug Design: Docking of Ligand with Target Protein
[Figure: "Fitting Puzzle Pieces". A ligand (small drug molecule) fits into the binding site of a large protein molecule to form a stable complex molecule.]
Protein: HIV-1 protease (hsg1.pdb)
Ligand (drug): Indinavir (ind.pdb)
Formula: C36H47N5O4
Indinavir (IDV; trade name Crixivan, manufactured by Merck) is a protease
inhibitor used to treat HIV infection and AIDS.
Complex molecule: Indinavir fitted into the binding pocket of the
receptor protein HIV-1 protease
Traditional Drug Design Methods
• Traditional drug design techniques, such as random screening and
chance discovery, are essentially trial-and-error methods.
• As a result, they are very time consuming (10-15 years) and very
expensive ($300M), with extremely low yield.
• For instance, over the last 50 years, 500,000 compounds have been
tested for anti-cancer activity;
  - Only 25 are in wide use today [1].
• On the other hand, computer-aided drug design (CADD) is target specific,
structure-based, automatic, fast, and very low cost, with a high success rate.
1. Denny, William A., New Zealand Institute of Chemistry, The Design and
development of anti-cancer drugs. Available at
http://nzic.org.nz/ChemProcesses/biotech/12J.pdf.
Scoring Function
• A scoring function is a mathematical predictive model that produces a
score representing the binding free energy, and
hence the stability, of the resulting complex molecule.
• Generally, such a function should produce a set of credible
ligands ranked according to their binding stability,
along with their binding poses.
X-Score: Wang R, Lai L, Wang S. Further development and validation of
empirical scoring functions for structure-based binding affinity
prediction. J Computer-Aided Molecular Design 2002;16:11–26.
Powers of Scoring Functions
• Scoring power: score a protein-ligand complex; measured by the
correlation coefficient between predicted and
experimentally determined binding affinities.
• Ranking power: rank different ligands bound to the
same target protein; measured by the percentage of successful rankings.
• Docking power: identify the native binding pose among
computer-generated decoys.
• Screening power: classification; distinguish true binders from
negative binders (random molecules).
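The first two powers can be sketched in code. This is a minimal illustration in Python, assuming that scoring power is measured with the Pearson correlation coefficient and that a ranking counts as successful only when the predicted ordering of a target's ligands matches the experimental ordering exactly. The function names and the toy affinity values are hypothetical and are not taken from any benchmark.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranking_success_rate(targets):
    """Fraction of targets whose ligands are ordered identically by
    predicted and by experimental affinity.

    `targets` maps a target name to a list of (predicted, experimental) pairs.
    """
    hits = 0
    for pairs in targets.values():
        order_by_pred = sorted(range(len(pairs)), key=lambda i: pairs[i][0])
        order_by_exp = sorted(range(len(pairs)), key=lambda i: pairs[i][1])
        if order_by_pred == order_by_exp:
            hits += 1
    return hits / len(targets)


# Made-up (predicted, experimental) affinities, for illustration only.
toy_targets = {
    "target_A": [(5.1, 4.9), (6.3, 6.8), (7.9, 8.2)],  # ordering preserved
    "target_B": [(6.0, 7.5), (5.5, 5.0), (6.2, 6.1)],  # ordering not preserved
}

predicted = [p for pairs in toy_targets.values() for p, _ in pairs]
experimental = [e for pairs in toy_targets.values() for _, e in pairs]

print("scoring power (Pearson r):", pearson(predicted, experimental))
print("ranking power:", ranking_success_rate(toy_targets))  # 0.5 on this toy data
```

Scoring power pools all complexes into one correlation, whereas ranking power is evaluated per target, so a function can do well on one and poorly on the other.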
Classical Scoring Functions
• Classical scoring functions (e.g., X-Score) rely only on a fixed set of
molecular features (e.g., energy terms).
• These terms are summed in a linear weighted manner, which fails to model non-linear
relationships among the individual energy terms.
• In addition, the weights of those individual energy terms are calibrated
on a specific protein family (using linear regression).
• Hence, classical scoring functions are more prone to over-fitting.
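The linear weighted sum described above can be illustrated with a short sketch. The term names, weights, and intercept below are hypothetical placeholders, not the calibrated parameters of X-Score or of any other published scoring function. The point is only that, in a linear model, changing one term shifts the score by the same amount regardless of the values of the other terms.

```python
# Hypothetical weights; a classical scoring function would calibrate
# these by linear regression on a set of training complexes.
WEIGHTS = {"vdw": -0.004, "hbond": -0.05, "hydrophobic": -0.01, "rotor": 0.06}
INTERCEPT = 5.0


def linear_score(terms):
    """Weighted linear sum of energy terms plus a constant."""
    return INTERCEPT + sum(WEIGHTS[name] * value for name, value in terms.items())


def hbond_effect(terms):
    """Change in score caused by adding one hydrogen bond to a complex."""
    with_extra_bond = dict(terms, hbond=terms["hbond"] + 1)
    return linear_score(with_extra_bond) - linear_score(terms)


# Two made-up complexes with very different environments.
complex_a = {"vdw": 300.0, "hbond": 2, "hydrophobic": 40.0, "rotor": 3}
complex_b = {"vdw": 900.0, "hbond": 7, "hydrophobic": 5.0, "rotor": 10}

# Both effects equal WEIGHTS["hbond"] up to floating-point rounding:
# the model cannot express a hydrogen bond that matters more in one context.
print(hbond_effect(complex_a), hbond_effect(complex_b))
```

A non-linear model, by contrast, can let the contribution of one feature depend on the others.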
Machine Learning-based Scoring Functions
Ballester PJ. Machine learning approaches to predicting protein-ligand
binding. Presentation; Cambridge Computational Biology Institute - European
Molecular Biology Laboratory EMBL-EBI; Cambridge, United Kingdom; 2013.
[Figure annotation: value to be predicted using regression techniques]
Training & Testing Sets of PDBbind v2007
Ballester PJ. Machine learning approaches to predicting protein-ligand
binding. Presentation; Cambridge Computational Biology Institute - European
Molecular Biology Laboratory EMBL-EBI; Cambridge, United Kingdom; 2013.
Results
• We survey this paradigm shift, elaborating on the main building
components of ML approaches used in molecular docking.
• For instance, the best random forest (RF)-based scoring function
(Li, 2014) on PDBbind v2007 achieves a Pearson correlation
coefficient of 0.803 between the predicted and experimentally
determined binding affinities, while the best classical
scoring function achieves 0.644 (Cheng, 2009).
• In terms of ranking power, the best RF-based scoring function (Ashtawy, 2012) ranks the
ligands correctly based on their experimentally determined
binding affinities with an accuracy of 62.5% and identifies the top
binding ligand with an accuracy of 78.1%.
Conclusion
• Machine learning techniques make it possible to utilize as
many relevant molecular features (e.g., geometric
features, pharmacophore features, etc.) as possible.
• In particular, ensemble-based machine learning
approaches (e.g., random forest, boosted
regression trees, etc.) are resilient to over-fitting.
• They yield good results not only on training complexes but on
testing complexes as well.
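The ensemble idea can be sketched with plain bagging. The example below is a simplified stand-in, not a full random forest: each base model is a one-split regression tree (a "stump") on a single feature, trained on a bootstrap sample, and the ensemble averages their predictions. All names and the toy data are hypothetical, and no feature subsampling is performed.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature by
    choosing the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    if best is None:  # all x values identical: predict the mean
        mean_y = sum(ys) / len(ys)
        return lambda x: mean_y
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x < t else right_mean


def fit_bagged(xs, ys, n_models=25, seed=0):
    """Train stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


# Made-up one-feature data with a step between x = 3 and x = 4.
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

model = fit_bagged(toy_x, toy_y)
print([round(model(x), 2) for x in toy_x])
```

Averaging over many bootstrap-trained models smooths out the quirks of any single model, which is the intuition behind the resilience to over-fitting noted above.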
Acknowledgement
• This work is supported:
  - Mainly by the Information Technology Industry
Development Agency (ITIDA) under ITAC Program
grant number CFP#58
  - In part by an E-JUST Research Fellowship
Publications
• Mohamed A. Khamis, Walid Gomaa, 2015,
Comparative Assessment of Scoring and Ranking
Powers of Machine-Learning-Based Scoring
Functions on an Updated Benchmark PDBbind
2013, Engineering Applications of Artificial Intelligence,
Elsevier. (submitted)
• Mohamed A. Khamis, Walid Gomaa, Basem Galal, 2015,
Deep Learning Competes Random Forest in
Computational Docking, IEEE/ACM Transactions on
Computational Biology and Bioinformatics. (submitted)
Questions
• E-mail:
  - mohamed.khamis@ejust.edu.eg
  - mohamed.abdelaziz.khamis@gmail.com
