Improved Predictions in Structure-Based Drug Design Using CART and Bayesian Models

Donovan N. Chin & R. Aldrin Denny

 Traditional Drug Discovery (insert graph)
 In Silico Prediction of ADME (insert graph)
◦ Potency
◦ Absorption
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution

 Target IVY(Brute force virtual screening of
very large compound libraries) Lead
Discovery IVY(Utilize predictive models
from Biogen data for more efficient virtual
screening) Lead Optimization candidate

 (insert graph)
◦ Potency
◦ Lead
◦ Drug
◦ Toxicity
◦ Excretion
◦ Metabolism
◦ Distribution
◦ Absorption

 Goal: Identify the crystallographic binding mode and
rank-order ligands with respect to binding to the protein
 (insert graph)
 Receptor Docking
 Ligand Shape
 Generate plausible trial binding modes with a
docking function, then re-rank the modes with a
scoring function

 (insert graph)
 341 Active
 47 Non-Active

 (insert graph)
 After filtering by Pharmacophore Feature

 (insert functions for)
◦ F_Score*
◦ D_Score
◦ G_Score
◦ PMF_Score
◦ Chem_Score
◦ ICM_Score*

 Cell Adhesion Assay (50% Serum)
◦ (insert graph)
 Biochemical Adhesion Assay
◦ (insert graph)
 Scoring Functions Are Poor More Often Than
Not

 Receptor Site View
 Library Design
 FlexX Score
 Consensus Score >= 3
◦ e.g. Contact Map, CLogP, MW, HBOND, Rotatable bonds
 Consensus = 5?
◦ If yes, substructure exists?
◦ If yes, Pharmacophore < 4.2 Å?
◦ If yes, Publish Hit Report
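
The yes/no checks in this workflow can be read as a cascade of filters. The sketch below is one possible reading, not the authors' implementation: the `Candidate` record, its field names, and the interpretation of the pharmacophore criterion as a distance in Å are all assumptions made for illustration. The slide does not say what happens to compounds with a consensus score of 3 or 4, so the function only reports the first check a compound fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical record for one docked compound (field names are assumptions)."""
    name: str
    consensus_score: int           # how many scoring functions agree (assumed 0-5)
    has_substructure: bool         # required substructure present?
    pharmacophore_distance: float  # assumed to be a distance in angstroms


def first_failed_check(c: Candidate) -> Optional[str]:
    """Return the first check the candidate fails, or None if it passes all."""
    if c.consensus_score < 3:
        return "consensus score below 3"
    if c.consensus_score != 5:
        return "consensus score not 5"
    if not c.has_substructure:
        return "substructure missing"
    if not c.pharmacophore_distance < 4.2:
        return "pharmacophore not under 4.2 A"
    return None  # passed every check: goes into the hit report


def hit_report(candidates: list[Candidate]) -> list[str]:
    """Names of the candidates that pass the whole cascade."""
    return [c.name for c in candidates if first_failed_check(c) is None]
```

Returning the failed check rather than a bare boolean makes it easy to see where a library loses most of its compounds.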

 Goal: Predict hit/miss class based on presence of features
(fingerprints)
 Method
◦ Given a set of N samples
◦ Given that some subset A of them are good (‘active’)
 Then we estimate for a new compound: P(good) ~ A/N
◦ Given a set of binary features F
 For a given feature F:
 It appears in N_F samples
 It appears in A_F good samples
 Can we estimate: P(good | F) ~ A_F/N_F?
 (Problem: Error gets worse as N_F → small)
◦ P’(good | F) = (A_F + P(good)·K)/(N_F + K)
 P’(good | F) → P(good) as N_F → 0
 P’(good | F) → A_F/N_F as N_F → large
◦ (If K = 1/P(good) this is the Laplacian correction)
 Descriptors (insert)
 Advantages
◦ Can describe a huge number of features (up to 4 billion; MDL 1024;
Leadscope 27,000)
◦ Contains tertiary and stereochemistry information
◦ Fast
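
The corrected estimate from the Method bullets can be checked numerically. The following is a minimal sketch, assuming K = 1/P(good) as stated on the slide; the function name and the example counts are hypothetical, and `a_f` and `n_f` stand for the number of good samples and of all samples that contain feature F.

```python
def laplacian_corrected(a_f: int, n_f: int, p_good: float) -> float:
    """Corrected estimate P'(good | F) = (a_f + p_good * K) / (n_f + K).

    a_f    -- good ('active') samples that contain feature F
    n_f    -- all samples that contain feature F
    p_good -- baseline P(good), i.e. actives / samples over the whole set
    K is set to 1 / p_good, the choice the slide calls the Laplacian correction.
    """
    if not 0.0 < p_good <= 1.0:
        raise ValueError("p_good must be in (0, 1]")
    if not 0 <= a_f <= n_f:
        raise ValueError("need 0 <= a_f <= n_f")
    k = 1.0 / p_good
    return (a_f + p_good * k) / (n_f + k)


if __name__ == "__main__":
    p_good = 0.1  # illustrative baseline: 10% of all samples are active
    for a_f, n_f in [(0, 0), (2, 2), (100, 200)]:
        raw = a_f / n_f if n_f else float("nan")
        print(a_f, n_f, raw, laplacian_corrected(a_f, n_f, p_good))
```

With these illustrative numbers, a feature never seen falls back to the baseline P(good), a feature seen in only two samples (both active) is pulled well below its raw estimate of 1.0, and a frequently seen feature stays close to its raw ratio.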

 Classification Analysis
◦ Developing Non-Linear Scoring Functions to classify
actives and non-actives
◦ (insert graphs)
◦ Cost Function to Minimize: Gini Impurity i(N) = 1 - Σ_j P²(ω_j),
where P(ω_j) is the fraction of samples at node N in class ω_j
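
As a small illustration of the cost function, the sketch below computes the Gini impurity of a node from its class labels, together with the size-weighted impurity of a two-way split, which is the usual CART criterion for comparing candidate splits. The function names and the 'active'/'inactive' labels are illustrative choices, not taken from the slides.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum over classes of (class fraction) squared."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Impurity of a two-way split: child impurities weighted by child size."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


if __name__ == "__main__":
    node = ["active"] * 4 + ["inactive"] * 4
    print(gini_impurity(node))                     # evenly mixed two-class node
    print(split_impurity(node[:4], node[4:]))      # split that separates the classes
```

A pure node has impurity 0 and an evenly mixed two-class node has impurity 0.5, so a split that lowers the weighted value separates actives from non-actives better.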

 Training Set Prediction Success
 (insert table)
 10-fold cross-validation
 Randomly split training and test sets
 Significant Improvement in Separating Actives
from Non-Actives
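
For readers unfamiliar with the validation scheme, the following is a minimal sketch of a random 10-fold split using only the Python standard library. The helper name and the seed are assumptions; the sample count of 388 simply reuses the 341 + 47 compounds mentioned on an earlier slide as a convenient size, and the slides do not state that the split was run on exactly that set.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for a random k-fold split.

    Every sample lands in exactly one test fold; the other k-1 folds form
    the training set for that round.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for round_no, (train, test) in enumerate(k_fold_indices(388, k=10), start=1):
        print(round_no, len(train), len(test))
```

In each round a model would be fitted on `train` and scored on `test`, and the ten scores averaged.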

 (insert graph)
 Significant Improvement in Finding Hits Using
the New Scoring Function (SF)

 Optimal tree identified (insert graph)
 No random effects (insert graph)

 (insert cluster)
 Able to identify different molecular property
criteria that lead to hits

 (insert graph)
 Size= magnitude of OBA
 OBA values cover range of descriptor space

 (insert graph)
 Choose 1D and 2D descriptors for ease of
interpretation and lower “noise”

 Build Model (insert graphs) Apply Model

 Features found in high OBA
 Features found in low OBA
 Would be nice if CART offered a similar view

 Improved scoring functions for separating
hits from non-hits in structure-based drug
design developed with CART and Bayesian
models
 Identified key differences in molecular
physical properties that led to hits
 Built a reasonably predictive OBA model
(given the complexity of OBA, however, the method
cannot be expected to extend to other systems)

 Biogen IDEC
 Modeling
◦ Rajiah Denny
◦ Claudio Chuaqui
◦ Juswinder Singh
◦ Herman van Vlijmen
◦ Norman Wang
◦ Anuj Patel
◦ Zhan Deng
 Chemistry
◦ Kevin Guckian
◦ Dan Scott
◦ Thomas Durand-Reville
◦ Pat Conlon
◦ Charlie Hammond
◦ Chuck Jewell
 Pharmacology
◦ Tonika Bonhert
