Abhik Seal
PhD Student (Chemical Informatics)
Indiana University Bloomington
http://chemin-abs.blogspot.com/
mypage.iu.edu/~abseal/
10/16/2012   abseal@indiana.edu
What is PknB?
• Ser/Thr protein kinase (STPK), highly
  conserved in Gram-positive bacteria and
  apparently essential for mycobacterial
  viability.
• Essential for cell division and metabolism;
  expressed during exponential growth, and its
  overexpression causes defects in cell wall
  synthesis and cell division.
PknB ATP-binding pocket
[Figure: PknB ATP-binding pocket with the gatekeeper residue labeled. Wehenkel, FEBS Letters 580 (2006) 3018–3022]
Kinase inhibitor and pharmacophores
[Figures from: Targeting cancer with small molecule kinase inhibitors, Nature Reviews Cancer 2009; Through the “Gatekeeper Door”: Exploiting the Active Kinase Conformation, J. Med. Chem. 2010, 53, 2681–2694]
Properties of Kinase Inhibitors
[Figure from: Through the “Gatekeeper Door”: Exploiting the Active Kinase Conformation, J. Med. Chem. 2010, 53, 2681–2694]
Some PknB inhibitors




• A data fusion algorithm accepts two or more ranked lists
  and merges them into a single ranked list, with the
  aim of being more effective than any of the individual
  systems used for the fusion (Croft, 2000, Chapter 1; Meng et al.,
  2002).
• Another aim of data fusion is to group existing search
  services under one umbrella as the number of such
  services increases (Selberg & Etzioni, 1996).
• Fusion in automatic ranking of IR systems:
     Automatic ranking of information retrieval systems using data
     fusion, Nuray & Can, 2006
• Merging the retrieval results of multiple systems.
  See more on Wikipedia (http://en.wikipedia.org/wiki/Data_fusion)
Used By
    Meta search engines, for example:
    (http://en.wikipedia.org/wiki/List_of_search_engines#Metasearch_engines)

     e.g. www.dogpile.com, www.copernic.com, www.hotbot.com

[Diagram: a meta search layer on top of Engine 1, Engine 2 and Engine 3, which draw on D1, D2 and D3 in the information resource]
Workflow of meta-search

• Execute a database search for some particular target
  structure using different similarity measures
• Note the rank position, R(i), of each database
  structure in the ranking for the i-th similarity
  measure using similarity coefficients
• Combine the various positions using a fusion rule to
  give a new rank position for each database structure
• Use these fused positions to generate the final
  output ranking for the search.
             http://www.his.se/PageFiles/6884/Peter%20Willet%20presentation.pdf
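The four steps above can be sketched in a few lines of Python. This is only an illustration: the similarity scores, the structure names (`mol_a` and so on) and the choice of summing rank positions as the fusion rule are assumptions, not values or choices taken from the slides.

```python
def rank_positions(scores):
    """Map each structure id to its 1-based rank; the highest similarity gets rank 1."""
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return {k: i + 1 for i, k in enumerate(ordered)}


def fuse(rank_lists, rule):
    """Combine per-measure rank positions with a fusion rule, then re-rank.

    A lower fused value is treated as better; ties are broken by id.
    """
    ids = rank_lists[0].keys()
    fused = {k: rule([ranks[k] for ranks in rank_lists]) for k in ids}
    return sorted(fused, key=lambda k: (fused[k], k))


# Hypothetical similarity scores of three database structures to one target,
# computed with two different similarity measures.
measure_1 = {"mol_a": 0.91, "mol_b": 0.40, "mol_c": 0.75}
measure_2 = {"mol_a": 0.60, "mol_b": 0.70, "mol_c": 0.80}

ranks = [rank_positions(measure_1), rank_positions(measure_2)]
final_ranking = fuse(ranks, rule=sum)  # other rules, e.g. min or max, plug in the same way
print(final_ranking)
```

Passing the rule as a function keeps the ranking step separate from the fusion step, so different fusion rules can be compared on the same rank lists.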
Types of fusion for 2D similarity search
a) Similarity fusion (SF):
 SF involves searching a single reference structure against a database using
multiple different similarity measures, and the output is obtained by
combining the rankings resulting from these different measures.


b) Group fusion (GF):
GF involves searching multiple reference structures against a database using a
single similarity measure, and the output is obtained by combining the
rankings resulting from these different reference structures.



Holliday et al.: Multiple search methods for similarity-based virtual screening: analysis of search overlap and
precision. Journal of Cheminformatics 2011, 3:29
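The difference between the two fusion types can be made concrete with a small sketch. It assumes fingerprints are represented as Python sets of on-bit indices, uses Tanimoto and Dice as the similarity measures, sums rank positions for SF and keeps the maximum score for GF. These representation and fusion-rule choices are illustrative assumptions, not the exact setup of Holliday et al.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two non-empty sets of on-bits."""
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice coefficient of two non-empty sets of on-bits."""
    return 2 * len(a & b) / (len(a) + len(b))


def ranking(scores):
    """Structure names sorted by decreasing score (ties broken by name)."""
    return sorted(scores, key=lambda k: (-scores[k], k))


def similarity_fusion(reference, database, measures):
    """SF: one reference, several measures; sum the rank positions per structure."""
    total = {name: 0 for name in database}
    for measure in measures:
        order = ranking({n: measure(reference, fp) for n, fp in database.items()})
        for pos, name in enumerate(order, start=1):
            total[name] += pos
    return sorted(total, key=lambda k: (total[k], k))


def group_fusion(references, database, measure):
    """GF: several references, one measure; keep each structure's best score."""
    best = {n: max(measure(ref, fp) for ref in references)
            for n, fp in database.items()}
    return ranking(best)


# Hypothetical toy fingerprints.
database = {"m1": {1, 2, 3, 4}, "m2": {1, 2, 9}, "m3": {7, 8, 9}}

sf_order = similarity_fusion({1, 2, 3}, database, [tanimoto, dice])
gf_order = group_fusion([{1, 2, 3}, {7, 8, 9}], database, tanimoto)
print(sf_order, gf_order)
```

In the toy data the second reference is identical to `m3`, so group fusion moves `m3` to the top, while similarity fusion with the single reference leaves it last.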


Similarity fusion (SF)
[Figures: (a) WOMBAT top-1% searches; (b) WOMBAT top-5% searches. (a) MDDR top-1% searches; (b) MDDR top-5% searches.]
Holliday et al.: Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision.
Journal of Cheminformatics 2011, 3:29
Group fusion (GF)
[Figures: (a) WOMBAT top-1% searches; (b) WOMBAT top-5% searches. (a) MDDR top-1% searches; (b) MDDR top-5% searches.]
Reciprocal Rank method

• Merge compounds using only rank positions

• Rank score of compound i (j: system index):

  $$ r(d_i) = \frac{1}{\sum_j \frac{1}{\mathrm{pos}(d_{ij})}} $$

  where pos(d_ij) is the position of compound i in the ranking of system j.
Reciprocal rank example
• 4 systems: A, B, C, D
  documents: a, b, c, d, e, f, g
• Query results:
  A={a,b,c,d}, B={a,d,b,e},
  C={c,a,f,e}, D={b,g,e,f}
• r(a) = 1/(1 + 1 + 1/2) = 0.4
  r(b) = 1/(1/2 + 1/3 + 1) ≈ 0.55
• Final ranking of compounds (a lower score ranks higher):
  (most relevant) a > b > c > e > d > f > g (least relevant)

    Nuray, R.; Can, F. Automatic ranking of information retrieval systems using data
    fusion. Information Processing and Management 42 (2006) 595–614
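The example can be checked with a short Python sketch of the reciprocal rank score. It assumes that a document missing from a system's list simply contributes nothing to the sum, and that documents are sorted by ascending score; the function and variable names are illustrative.

```python
def reciprocal_rank_scores(result_lists):
    """r(d) = 1 / sum_j (1 / pos_j(d)), summed over the lists that contain d."""
    inverse_sum = {}
    for results in result_lists:
        for pos, doc in enumerate(results, start=1):
            inverse_sum[doc] = inverse_sum.get(doc, 0.0) + 1.0 / pos
    return {doc: 1.0 / s for doc, s in inverse_sum.items()}


# The four systems and their query results from the example.
systems = {
    "A": ["a", "b", "c", "d"],
    "B": ["a", "d", "b", "e"],
    "C": ["c", "a", "f", "e"],
    "D": ["b", "g", "e", "f"],
}

scores = reciprocal_rank_scores(systems.values())
fused = sorted(scores, key=lambda d: (scores[d], d))  # lower score ranks first
for doc in fused:
    print(doc, round(scores[doc], 3))
```

With these lists, the scores run from 0.4 for a up to 2.0 for g. Document e, found by three systems, scores 1.2, and document d, found by two, scores about 1.33.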

Sum score

The normalized scores of each ranking are
summed to get the fused score of a compound

               Ranking 1   Ranking 2   Ranking 3   Sum score   Rank

  Compound 1          1          0.9         0.7         2.6     1

  Compound 2         0.8         0.5          1          2.3     2

  Compound 3         0.7          1          0.5         2.2     3

  Compound 4         0.2          0          0.1         0.3     4

  Compound 5          0          0.3          0          0.3     5
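A minimal sketch of sum score fusion, reproducing the table above. The slide does not say how the scores were normalized, so min-max scaling to [0, 1] is an assumption here; the table values already lie in that range, so the scaling leaves them unchanged.

```python
def min_max(scores):
    """Scale one ranking's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}


def sum_score(rankings):
    """Sum the normalized scores of every compound across all rankings."""
    fused = {}
    for scores in rankings:
        for name, value in min_max(scores).items():
            fused[name] = fused.get(name, 0.0) + value
    return fused


names = ["Compound 1", "Compound 2", "Compound 3", "Compound 4", "Compound 5"]
ranking_1 = dict(zip(names, [1.0, 0.8, 0.7, 0.2, 0.0]))
ranking_2 = dict(zip(names, [0.9, 0.5, 1.0, 0.0, 0.3]))
ranking_3 = dict(zip(names, [0.7, 1.0, 0.5, 0.1, 0.0]))

fused = sum_score([ranking_1, ranking_2, ranking_3])
# Highest fused score first; rounding avoids float noise deciding ties.
order = sorted(fused, key=lambda k: (-round(fused[k], 6), k))
print([(name, round(fused[name], 2)) for name in order])
```

Compounds 4 and 5 both sum to 0.3, so their relative order depends on the tie-breaking rule; the sketch breaks ties by name.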
Sum rank

• In sum rank, each ranking first converts scores to ranks,
  with the maximum score receiving the minimum rank.
  The ranks are then summed and the compounds are re-ranked
  by this sum (the smallest sum ranks first).

                Ranking 1    Ranking 2   Ranking 3   Sum rank   Rank

   Compound 1          1           10           4         15      5

   Compound 2          2            5           6         13      3

   Compound 3          7            4           3         14      4

   Compound 4          2            3           3          8      2

   Compound 5          3            2           1          6      1
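The sum rank table can be reproduced with the sketch below. It takes the rank columns as given on the slide and assumes that ties in the summed rank, if any, are broken by compound name.

```python
def sum_rank(rank_columns):
    """Sum each compound's ranks, then re-rank (smallest sum gets rank 1)."""
    totals = {}
    for ranks in rank_columns:
        for name, rank in ranks.items():
            totals[name] = totals.get(name, 0) + rank
    order = sorted(totals, key=lambda k: (totals[k], k))
    final_rank = {name: i + 1 for i, name in enumerate(order)}
    return totals, final_rank


names = ["Compound 1", "Compound 2", "Compound 3", "Compound 4", "Compound 5"]
ranking_1 = dict(zip(names, [1, 2, 7, 2, 3]))
ranking_2 = dict(zip(names, [10, 5, 4, 3, 2]))
ranking_3 = dict(zip(names, [4, 6, 3, 3, 1]))

totals, final_rank = sum_rank([ranking_1, ranking_2, ranking_3])
for name in names:
    print(name, totals[name], final_rank[name])
```

The summed ranks are 15, 13, 14, 8 and 6, so Compound 5 ranks first and Compound 1 last.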
3 d virtual screening of pknb inhibitors using data
Pharmacophore design


To generate the pharmacophoric features, we used the energetic
pharmacophore (e-pharmacophore) approach developed by Salam et al., with
exclusion spheres included.

Pharmacophoric sites were generated automatically with Phase using the
default set of six chemical features: hydrogen bond acceptor (A), hydrogen
bond donor (D), hydrophobic (H), negative ionizable (N), positive ionizable
(P), and aromatic ring (R).
E-Pharmacophores
[Figure: E-pharmacophore I, E-pharmacophore II, E-pharmacophore III]
Validation of Pharmacophores
• To judge the quality of a hit list for a query
  compound or a pharmacophore, we considered the yield of active
  compounds, the enrichment factor, the percentage of actives and
  the Goodness of Hit list (GH) score.

• We also measured how well a pharmacophore or any other screening
  method ranks active compounds “early” in a virtual
  screening run, using the Boltzmann-enhanced
  discrimination of receiver operating characteristic
  (BEDROC, Truchon et al.) and the RIE metric (Sheridan et al.).
• Validation set: 35 active compounds randomly sampled from 62 actives,
  together with 1000 decoys
     (www.schrodinger.com/glide_decoy_set).
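As an illustration of the enrichment factor on a validation set of this size, the sketch below uses the common definition: the hit rate in the top fraction of the ranked list divided by the hit rate in the whole set. Flooring the cutoff to a whole number of compounds is an assumption, and the "ideal" ranking is synthetic.

```python
def enrichment_factor(is_active_ranked, fraction):
    """EF at a given fraction of a ranked list of booleans (best-ranked first)."""
    n_total = len(is_active_ranked)
    n_top = max(1, int(n_total * fraction))  # assumed: cutoff rounded down
    hits_top = sum(is_active_ranked[:n_top])
    hits_total = sum(is_active_ranked)
    return (hits_top / n_top) / (hits_total / n_total)


# Synthetic best case: all 35 actives ranked ahead of the 1000 decoys.
ideal = [True] * 35 + [False] * 1000
ef_1pct = enrichment_factor(ideal, 0.01)
print(round(ef_1pct, 2))
```

Under these assumptions the ideal ranking gives an EF(1%) of 1035/35, about 29.57, which is the upper bound for a set of 35 actives and 1000 decoys.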

Some formulas
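For reference, the metrics named on the validation slide are commonly written as below. These are the standard published forms (enrichment factor, Güner-Henry GH score, RIE after Sheridan et al., BEDROC after Truchon and Bayly); that the slide showed exactly these expressions is an assumption. Here N is the number of compounds in the data set, n the number of actives, r_i the rank of the i-th active, R_a = n/N, H_t the size of the hit list, H_a the number of actives in it, A the number of actives in the database and D the database size.

```latex
\mathrm{EF}_{x\%} = \frac{n_{x\%} / N_{x\%}}{n / N}

\mathrm{GH} = \left( \frac{H_a \,(3A + H_t)}{4\, H_t\, A} \right)
              \left( 1 - \frac{H_t - H_a}{D - A} \right)

\mathrm{RIE} = \frac{\frac{1}{n} \sum_{i=1}^{n} e^{-\alpha r_i / N}}
                    {\frac{1}{N} \cdot \frac{1 - e^{-\alpha}}{e^{\alpha / N} - 1}}

\mathrm{BEDROC} = \mathrm{RIE} \cdot
    \frac{R_a \sinh(\alpha / 2)}{\cosh(\alpha / 2) - \cosh(\alpha / 2 - \alpha R_a)}
    + \frac{1}{1 - e^{\alpha (1 - R_a)}}
```

In the enrichment factor, n_{x%} is the number of actives found in the top x% of the ranked list and N_{x%} the number of compounds in that top x%.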
Why BEDROC?
• Despite its sensitivity to early recognition, the enrichment factor has
  the drawback of being insensitive to the relative ranking of the
  compounds within the top X% and of ignoring the ranking of the
  remaining data set.
• The ROC measure does not distinguish compounds ranked early in a
  virtual screening run from those ranked late.

• The BEDROC metric uses an exponential decay function to reduce
  the influence of lower-ranked compounds on the final score. Its
  parameter α lets the user adjust the definition
  of the early recognition problem.

• BEDROC values for the three VS methods are reported at α=20, which means that
  80% of the final BEDROC score is based on the first 8% of the
  ranked data set.
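A minimal Python sketch of BEDROC, following the form published by Truchon and Bayly (RIE rescaled so that the best possible ranking scores 1 and the worst scores 0). The function name, the argument layout and the synthetic rankings are assumptions made for illustration; this is not the implementation used to produce the results in this deck.

```python
import math


def bedroc(active_ranks, n_total, alpha=20.0):
    """BEDROC from the 1-based ranks of the actives in a list of n_total compounds."""
    n = len(active_ranks)
    ra = n / n_total  # ratio of actives
    # RIE: observed exponential-weighted sum over its expectation for random ranks.
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / n
    random_mean = (1.0 / n_total) * (1.0 - math.exp(-alpha)) / (
        math.exp(alpha / n_total) - 1.0
    )
    rie = observed / random_mean
    # Rescale RIE to the [0, 1] interval.
    scale = ra * math.sinh(alpha / 2) / (
        math.cosh(alpha / 2) - math.cosh(alpha / 2 - alpha * ra)
    )
    offset = 1.0 / (1.0 - math.exp(alpha * (1.0 - ra)))
    return rie * scale + offset


# Synthetic extremes for 35 actives among 1035 compounds.
best = bedroc(list(range(1, 36)), 1035)        # all actives ranked first
worst = bedroc(list(range(1001, 1036)), 1035)  # all actives ranked last
print(round(best, 6), round(worst, 6))
```

A larger α concentrates the weight on a smaller top fraction of the list, which is how the parameter adjusts what counts as "early".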

Validation of virtual screening

a) E-pharmacophore
E-pharmacophore III was selected based on the performance measures, the
number of retrieved compounds with a fitness above 2, and its high
Goodness of Hit score, yield of actives and specificity.

b) ROCS
All compounds were scored and ranked by the TanimotoCombo
score; parameters were selected as described by Bostrom et al.

c) Glide XP
All compounds were scored with the Glide XP docking score and
ranked in descending order of score.
[Figure: E-pharmacophore I, E-pharmacophore II and E-pharmacophore III, with sites D8 and R13 labeled]

Which pharmacophore is good?

Are sites D8 and R13 important?
Results
Performance measures




Method                 EF(1%)   EF(2%)   EF(5%)   EF(10%)   BEDROC (α=20)   RIE
E-pharmacophore I      11.71    11       10.51    6.8       0.538           7.81
E-pharmacophore II     29.57    27.51    12.14    6.9       0.716           10.40
E-pharmacophore III    29.57    27.14    13.71    7.42      0.744           10.81
vROCS                  29.57    26.71    13.14    7.42      0.749           10.89
GlideXP                26.71    21       11.42    6.28      0.629           9.14
Sum score              29.57    28.57    14.85    7.42      0.785           11.42
Sum rank               29.57    24.28    12       7.42      0.703           10.21
Reciprocal rank        29.57    29.57    17.14    8.85      0.875           12.73
AUC ROC results




Methods                   AUC(1%)   AUC(2%)   AUC(5%)   AUC(100%)

E-pharmacophore III       0.56      0.602     0.649     0.832

vROCS                     0.58      0.62      0.62      0.89

GlideXP                   0.39      0.44      0.51      0.84

Sum score                 0.64      0.6780    0.717     0.90

Sum rank                  0.47      0.49      0.565     0.91

Reciprocal rank           0.72      0.75      0.81      0.96
Architecture
[Diagram components: System 1, System 2, System 3, System 4; Data Preprocessing; Rescoring and Ranking; Fusion Algorithms; Validation; Decision]
Virtual Screening of Asinex 400K compounds
Workflow
1. Chemical structure collection
   • 400K compounds from Asinex, optimized using LigPrep
2. 3D virtual screening
   • Phase E-pharmacophore screen; select the top 5000 compounds for VS in vROCS and Glide SP
   • Conformer generation and ROCS screening
   • Glide SP docking
3. Post-processing and ranking
   • Data fusion using the Reciprocal Rank algorithm
4. Compound selection
   • Top 10% of the database selected for Glide XP docking
   • 45 compounds selected after visual inspection and pharmacophore mapping
Machine learning models in progress

• Tools used:
 a) PowerMV descriptors: 2D pharmacophore fingerprints,
    Weighted Burden Number and 8 properties
 b) MACCS (166 keys)
 c) rcdk extended (graph-based) fingerprints
 d) jCompoundMapper library: PHAP2PT3D, PHAP3PT3D,
    CATS3D, CATS2D

So far, none of the descriptors reproduces the 3D
screening results well.
The ML models are still promising, because an SVM with a polynomial
kernel separates actives from decoys well.
PCA analysis of predicted compounds

•   12 physicochemical properties were calculated using CDK (http://rguha.net/code/java/cdkdesc.html):
    molecular refractivity, atom polarizabilities, bond polarizabilities, hydrogen bond donors
    and acceptors, Petitjean number, topological polar surface area, number of rotatable bonds, lipophilicity
    (XLogP), molecular weight, topological shape and geometrical shape.
Hits retrieved after visual inspection and pharmacophore mapping
Docking of predicted compounds
Tools Used

• Docking and pharmacophore modeling:
  Schrödinger's Glide and Phase
• Shape-based screening: vROCS
• Performance calculation and visualization: R
  statistics, ggplot2, enrichVS package.
More work

• Designing PknG inhibitors
• Enhanced ranking systems for better
  prediction
• An automated protocol for enhanced
  virtual screening using open source tools
Acknowledgements

• Indo-US Science and Technology Forum
• Prof. P. Yogeshwari and Prof. D. Sriram (BITS
  Hyderabad)
• Computer Aided Drug Design Lab, BITS Pilani
  Hyderabad
• Prof. David J. Wild
• OSDD Team
