Machine learning in computational docking

Mohamed A. Khamis, Walid Gomaa, Walaa F. Ahmed,
Machine learning in computational docking, Artificial Intelligence In Medicine (2015),
http://dx.doi.org/10.1016/j.artmed.2015.02.002

Objective
 The objective of this paper is to highlight the state-of-the-art
machine learning (ML) techniques in computational docking.
 The use of smart computational methods in the life cycle of
drug design is a relatively recent development that has
gained much popularity and interest over the last few years.
 Computational docking is the process of predicting the best
pose (orientation + conformation) of a small molecule (drug
candidate) when bound to a larger target receptor molecule
(protein) so as to form a stable complex molecule.

Background
• Background for protein-ligand interactions:
 Physical, chemical, and biological
• Molecular data formats:
 e.g., .mol, .pdb, .sdf, etc.
• Docking software programs:
 e.g., AutoDock, eHiTS, iDock, etc.
• Molecular databases:
 Containing data of proteins with their possible ligands
 e.g., PDB, PDBbind, Binding DB, DUD etc.
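
As an illustration of the molecular data formats listed above, the following is a minimal sketch of reading atom coordinates from PDB-format text. It assumes the standard fixed-column layout of ATOM/HETATM records; the names Atom and parse_pdb_atoms are hypothetical and are not taken from any docking program.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    record: str    # "ATOM" (polymer atom) or "HETATM" (e.g. ligand atom)
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "PRO"
    x: float       # orthogonal coordinates in Angstroms
    y: float
    z: float


def parse_pdb_atoms(text: str) -> List[Atom]:
    """Extract ATOM/HETATM records from PDB-format text.

    Fields are read by fixed column position, not by splitting on
    whitespace, because adjacent PDB fields may touch each other.
    Malformed or truncated records are not handled in this sketch.
    """
    atoms = []
    for line in text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END, ...
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms
```

The other formats (.mol, .sdf) use different layouts and would need their own readers.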

ligand (small drug molecule) + large protein molecule → stable complex molecule
Fitting Puzzle Pieces
Drug Design: Docking of Ligand with Target Protein
Binding Site

Protein
HIV-1 protease (hsg1.pdb)
Ligand (Drug)
Indinavir (ind.pdb)
Formula: C36H47N5O4
Indinavir (IDV; trade name Crixivan,
manufactured by Merck) is a protease inhibitor used
to treat HIV infection and AIDS.

Complex molecule: Indinavir fitted into the binding pocket of
the receptor protein HIV-1 protease

Traditional Drug Design Methods
• Traditional drug design techniques, such as random screening and
chance discovery, are essentially trial-and-error methods.
• They are therefore very time-consuming (10-15 years) and very
expensive ($300M), with extremely low yield.
• For instance, over the last 50 years, 500,000 compounds have been
tested for anti-cancer activity;
 Only 25 are in wide use today [1].
• On the other hand, computer-aided drug design (CADD) is target-specific,
structure-based, automatic, fast, and very low-cost, with a high success rate.
1. Denny, William A., New Zealand Institute of Chemistry, The Design and
development of anti-cancer drugs. Available at
http://nzic.org.nz/ChemProcesses/biotech/12J.pdf.

Scoring Function
 A scoring function is a mathematical predictive model that
produces a score representing the binding free energy, and
hence the stability, of the resulting complex molecule.
 Generally, such a function should produce a set of credible
ligands ranked according to their binding stability,
along with their binding poses.
X-Score: Wang R, Lai L, Wang S. Further development and validation of
empirical scoring functions for structure-based binding affinity
prediction. J Computer-Aided Molecular Design 2002;16:11–26.
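
The ranked output described above can be sketched as follows. This is a minimal illustration, not the procedure of X-Score or any other program: it assumes every candidate pose already has a predicted score and that a higher score means a more stable complex (as with a predicted binding affinity). The name rank_ligands and the tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

# (ligand id, pose id, predicted score); higher score = more stable.
# This sign convention is an assumption of the sketch.
ScoredPose = Tuple[str, str, float]


def rank_ligands(scored_poses: List[ScoredPose]) -> List[ScoredPose]:
    """Keep the best-scoring pose of each ligand, then rank the ligands."""
    best: Dict[str, ScoredPose] = {}
    for entry in scored_poses:
        ligand_id, _pose_id, score = entry
        if ligand_id not in best or score > best[ligand_id][2]:
            best[ligand_id] = entry
    # Most stable predicted complex first.
    return sorted(best.values(), key=lambda e: e[2], reverse=True)
```

If the score were a binding free energy instead, lower (more negative) values would indicate greater stability and both comparisons would be reversed.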

Powers of Scoring Functions
 Scoring power: ability to score a protein-ligand complex, measured
by the correlation coefficient between predicted and
experimentally determined binding affinities.
 Ranking power: ability to rank different ligands bound to the
same target protein, measured by the percentage of successful rankings.
 Docking power: ability to identify the native binding pose among
computer-generated decoys.
 Screening power: ability to classify true binders versus
negative binders (random molecules).
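
Scoring power, as defined above, reduces to a correlation coefficient. The following minimal sketch computes the Pearson correlation between predicted and measured affinities using only the standard library; the function name pearson_r is hypothetical.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured binding affinities."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)
```

A value near 1 means the predicted scores track the experimental affinities closely; a value near 0 means they carry little linear information about them.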

Classical Scoring functions
 Classical scoring functions, e.g., X-Score, rely only on a fixed set of
molecular features (e.g., energy terms).
 These are summed in a linear weighted manner, which fails to model
non-linear relationships among the individual energy terms.
 In addition, the weights of those individual energy terms are calibrated
on a specific protein family (using linear regression).
 Hence, classical scoring functions are more prone to overfitting.
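
The linear weighted form described above can be sketched as follows. The energy terms and weights here are illustrative placeholders, not the calibrated values of X-Score or any other function; linear_score is a hypothetical name.

```python
from typing import Sequence


def linear_score(terms: Sequence[float], weights: Sequence[float],
                 intercept: float = 0.0) -> float:
    """Classical form: score = intercept + sum_i weights[i] * terms[i]."""
    if len(terms) != len(weights):
        raise ValueError("terms and weights must have the same length")
    return intercept + sum(w * t for w, t in zip(weights, terms))


# Placeholder weights for two made-up energy terms (assumption).
weights = [0.5, -1.5]

# Raising the first term by 1.0 changes the score by exactly weights[0],
# whatever the value of the second term: the model has no way to express
# an interaction between terms.
delta_low = linear_score([2.0, 1.0], weights) - linear_score([1.0, 1.0], weights)
delta_high = linear_score([2.0, 5.0], weights) - linear_score([1.0, 5.0], weights)
```

Both differences equal weights[0], which is the limitation named above: the contribution of each energy term is independent of all the others.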

Machine Learning-based Scoring functions
Ballester PJ. Machine learning approaches to predicting protein-ligand
binding. Presentation; Cambridge Computational Biology Institute - European
Molecular Biology Laboratory EMBL-EBI; Cambridge, United Kingdom; 2013.
[Figure annotation: value to be predicted using regression techniques]

Training & Testing sets of PDBbind v. 2007
Ballester PJ. Machine learning approaches to predicting protein-ligand
binding. Presentation; Cambridge Computational Biology Institute - European
Molecular Biology Laboratory EMBL-EBI; Cambridge, United Kingdom; 2013.

Results
 We survey this paradigm shift elaborating on the main building
components of ML approaches used in molecular docking.
 For instance, the best random forest (RF)-based scoring function
(Li, 2014) on PDBbind v2007 achieves a Pearson correlation
coefficient between the predicted and experimentally
determined binding affinities of 0.803 while the best classical
scoring function achieves 0.644 (Cheng, 2009).
 The best RF-based scoring function in terms of ranking power
(Ashtawy, 2012) ranks the ligands correctly with respect to their
experimentally determined binding affinities with an accuracy of
62.5% and identifies the top-binding ligand with an accuracy of 78.1%.
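
The two ranking figures above correspond to two success rates, which can be sketched as follows. This is a minimal illustration under stated assumptions: each target comes with a list of (predicted, measured) affinity pairs, higher values mean stronger binding, and a target counts as a success only when the whole predicted order matches the measured order. The benchmark's exact protocol may differ, and ranking_power is a hypothetical name.

```python
from typing import Dict, List, Tuple


def ranking_power(
    targets: Dict[str, List[Tuple[float, float]]]
) -> Tuple[float, float]:
    """Return (full-ranking success rate, top-ligand success rate).

    `targets` maps a target id to (predicted, measured) affinity pairs,
    one pair per ligand bound to that target.
    """
    if not targets:
        raise ValueError("no targets given")
    full_hits = 0
    top_hits = 0
    for pairs in targets.values():
        if not pairs:
            raise ValueError("every target needs at least one ligand")
        indices = range(len(pairs))
        # Ligand indices ordered from strongest to weakest binder.
        by_predicted = sorted(indices, key=lambda i: pairs[i][0], reverse=True)
        by_measured = sorted(indices, key=lambda i: pairs[i][1], reverse=True)
        if by_predicted == by_measured:
            full_hits += 1
        if by_predicted[0] == by_measured[0]:
            top_hits += 1
    n = len(targets)
    return full_hits / n, top_hits / n
```

The second rate is never lower than the first, since a fully correct ranking also identifies the top binder.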

Conclusion
 Machine learning techniques make it possible to use as
many relevant molecular features (e.g., geometric
features, pharmacophore features, etc.) as possible.
 In particular, ensemble-based machine learning
approaches (e.g., random forest, boosted
regression trees, etc.) are resilient to overfitting.
 They yield good results not only on training complexes but on
testing complexes as well.
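
One source of that resilience is averaging many models trained on resampled data. The following is a minimal sketch of bagging with one-split regression trees (stumps), written with the standard library only. It shows just the bootstrap-and-average ingredient of a random forest: there is no per-split feature subsampling and the trees are not grown deep, so it is not a full random forest and not the method of any cited paper. All names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# (feature index, threshold, mean of left side, mean of right side)
Stump = Tuple[int, float, float, float]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[float]) -> Stump:
    """Fit a one-split regression tree that minimizes squared error."""
    mean_y = sum(y) / len(y)
    # Fallback when no split helps: always predict the mean.
    best: Stump = (0, float("inf"), mean_y, mean_y)
    best_err = sum((v - mean_y) ** 2 for v in y)
    for j in range(len(X[0])):
        # Dropping the largest value keeps both sides of a split non-empty.
        for threshold in sorted(set(row[j] for row in X))[:-1]:
            left = [y[i] for i, row in enumerate(X) if row[j] <= threshold]
            right = [y[i] for i, row in enumerate(X) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            err = (sum((v - left_mean) ** 2 for v in left)
                   + sum((v - right_mean) ** 2 for v in right))
            if err < best_err:
                best, best_err = (j, threshold, left_mean, right_mean), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> float:
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean


def fit_bagged(X: Sequence[Sequence[float]], y: Sequence[float],
               n_models: int = 25, seed: int = 0) -> List[Stump]:
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagged(models: Sequence[Stump], row: Sequence[float]) -> float:
    """Average the individual predictions; this reduces variance."""
    return sum(predict_stump(m, row) for m in models) / len(models)
```

Each stump sees a slightly different sample, so errors specific to one sample tend to cancel when the predictions are averaged.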

Acknowledgement
 This work is supported:
 Mainly by Information Technology Industry
Development Agency (ITIDA) under ITAC Program
grant number CFP#58
 In part by E-JUST Research Fellowship

Publications
 Mohamed A. Khamis, Walid Gomaa, 2015,
Comparative Assessment of Scoring and Ranking
Powers of Machine-Learning-Based Scoring
Functions on an Updated Benchmark PDBbind
2013, Engineering Applications of Artificial Intelligence,
Elsevier. (submitted)
 Mohamed A. Khamis, Walid Gomaa, Basem Galal, 2015,
Deep Learning Competes Random Forest in
Computational Docking, IEEE/ACM Transactions on
Computational Biology and Bioinformatics. (submitted)

Questions
 E-mail:
 mohamed.khamis@ejust.edu.eg
 mohamed.abdelaziz.khamis@gmail.com
