Chemical Database Preparation
for Compound Acquisition or
Virtual Screening
Lalit Samant
Research Officer
B J WADIA HOSPITAL FOR CHILDREN
Virtual Screening
• AIM:-
1. HTS
2. Biologically active
3. Rapid
4. Effective
Cont.
• The progression HTS hits => HTS actives => lead
series => drug candidate => launched drug has
shifted the focus from good-quality candidate
drugs to good-quality leads (10).
• A set of simple property filters known as the “rule
of five” (Ro5) (11) is implemented in the
pharmaceutical industry to restrict small-molecule
synthesis to the property space defined by ClogP
(octanol/water partition coefficient), molecular
weight, etc.
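As a rough illustration, an Ro5-style property filter can be sketched in Python. The cutoffs used here (molecular weight ≤ 500, ClogP ≤ 5, hydrogen-bond donors ≤ 5, hydrogen-bond acceptors ≤ 10) are the commonly quoted Ro5 values. The `Properties` record, the `max_violations` parameter, and the idea that the values were computed beforehand by some property calculator are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Properties:
    """Precomputed properties for one compound (hypothetical record type)."""
    mol_weight: float  # molecular weight in Da
    clogp: float       # calculated octanol/water partition coefficient
    hdo: int           # number of hydrogen-bond donors
    hac: int           # number of hydrogen-bond acceptors


def ro5_violations(p: Properties) -> int:
    """Count how many of the four Ro5 criteria the compound violates."""
    checks = [
        p.mol_weight > 500,
        p.clogp > 5,
        p.hdo > 5,
        p.hac > 10,
    ]
    return sum(checks)


def is_ro5_compliant(p: Properties, max_violations: int = 0) -> bool:
    """True if the compound has no more than max_violations Ro5 violations.

    max_violations is an assumption of this sketch; some workflows
    tolerate one violation, others none.
    """
    return ro5_violations(p) <= max_violations
```

A compound record with, say, `mol_weight=350.0, clogp=2.5, hdo=2, hac=5` passes all four checks, whereas one that exceeds every cutoff is reported with four violations.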
Conditions to Consider for Library Design
• Many library design programs based on
combinatorial chemistry or compound
acquisition are now Ro5 compliant.
• Smaller compounds are easier to optimize
toward the drug candidate status, and
leadlikeness has become an established
concept in drug discovery.
Materials
1. Software to convert chemical structures based on standard file
formats (e.g., SDF, mol2) into canonical isomeric SMILES (15,16), or
equivalent representations of chemical structures.
2. Software to handle canonical isomeric SMILES (or equivalent)
and provide chemical fingerprints, e.g., Daylight (19), Unity (20), Mesa
Analytics and Computing (21), Barnard Chemical Information (22).
3. Software to compute chemical properties from structures, e.g., to
calculate the octanol/water partition coefficient (LogP) with CLogP,
KowWIN, or ALogPS.
4. Software to cluster chemical structures from fingerprints or from
computed properties.
Cont.
5. Software to convert SMILES (or equivalent)
into appropriate three-dimensional (3D)
coordinates, e.g., CONCORD.
6. Software to appropriately handle D-optimal
design based on multidimensional spaces.
Methods
1. Assembling the Collection(s)
Large pharmaceutical companies have acquired
compound collections, the Reals, that contain a
significant number of molecules, including
marketed drugs and other high-activity
compounds. The Reals are a valuable resource that is
routinely screened against novel targets.
Cont. Assembling
• Such collections of structures must include existing sets
of commercially available chemicals, or Tangibles,
termed this way because one can conceivably acquire
them or synthesize them in-house using tractable
chemistry.
• Thus, any collection prepared for virtual screening or HTS
would sample both the in-house and the “external” chemical
spaces. In addition to the Reals and the Tangibles, one
can also define the Virtuals: an extremely large set of
molecules (10^60 to 10^200) that cannot all be made, at
least with current chemistry, but that can essentially be
used as a “resource” for virtual screening.
Methods
2. Cleaning up the collection
There is no “perfect” chemical database, unless
it contains rather simple molecules (e.g., NaCl, H2O)
or a rather small number of molecules. The user
needs to spend significant effort in cleaning up
the collection, whether it includes Virtuals,
Reals, or Tangibles.
Cleaning up Cont.
2.1 Removing Garbage From the Collection
2.2 Verifying Integrity of Molecular Structure
2.3. Generation of Unique, Normalized SMILES
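Step 2.3 amounts to keeping one record per unique structure key. The following is a minimal sketch, assuming every record already carries a canonical isomeric SMILES string produced by one consistent toolkit. The function name, the tuple layout, and the compound IDs are made up for illustration.

```python
def deduplicate(records):
    """Keep the first record seen for each canonical SMILES.

    `records` is an iterable of (compound_id, canonical_smiles) tuples.
    Returns (unique_records, duplicates), where `duplicates` maps each
    dropped compound_id to the id of the record that was kept.
    """
    seen = {}        # canonical SMILES -> id of the kept record
    unique = []
    duplicates = {}
    for compound_id, smiles in records:
        key = smiles.strip()
        if not key:
            continue  # no structure at all: treated as "garbage"
        if key in seen:
            duplicates[compound_id] = seen[key]
        else:
            seen[key] = compound_id
            unique.append((compound_id, key))
    return unique, duplicates


records = [
    ("vendorA-001", "CCO"),
    ("vendorB-417", "CCO"),       # same structure from a second supplier
    ("vendorA-002", "c1ccccc1"),
    ("vendorC-009", ""),          # empty structure field
]
unique, duplicates = deduplicate(records)
```

String comparison only detects duplicates if all SMILES were canonicalized the same way, which is why normalization comes before this step.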
3. Filtering for Lead-Likeness
• After cleanup, the collection can be processed
to remove compounds that do not have
leadlike properties.
• It is advisable to cluster the remaining
“nonleadlike” set and to include a
representative set of these compounds (up to
30%), because they are likely to capture
additional chemotypes.
Suggestions for exclusions according to
leadlikeness are as follows:
1. More than four rings.
2. More than three fused aromatic rings (avoid polyaromatic rings, because they
are likely to be processed by cytochrome P450 enzymes and yield epoxides and
other carcinogens).
3. HDO (number of hydrogen-bond donors) more than 4; HDO ≤ 5 is one of the Ro5
criteria, but 80% of drugs have HDO less than 3.
4. More than four halogens, except fluorine (avoid “pesticides”). A notable
exception is the crop-protectant business; in such situations, the collection must
be processed with entirely different criteria.
5. More than two CF3 groups (avoid highly halogenated molecules).
6. Fragments responsible for cytotoxicity (remove compounds that contain
them).
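The exclusion criteria above can be expressed as a simple rule check. This is a sketch under the assumption that the structural counts have already been computed by a cheminformatics toolkit. The `StructureCounts` record and its field names are hypothetical, and the cytotoxic-fragment flag stands in for a substructure search that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureCounts:
    """Precomputed counts for one compound (hypothetical record type)."""
    rings: int
    fused_aromatic_rings: int
    hdo: int                      # hydrogen-bond donors
    halogens_non_fluorine: int    # Cl, Br, I
    cf3_groups: int
    has_cytotoxic_fragment: bool  # result of a separate substructure search


def exclusion_reasons(c: StructureCounts) -> list:
    """Return the list of leadlikeness criteria that the compound fails."""
    reasons = []
    if c.rings > 4:
        reasons.append("more than four rings")
    if c.fused_aromatic_rings > 3:
        reasons.append("more than three fused aromatic rings")
    if c.hdo > 4:
        reasons.append("more than four hydrogen-bond donors")
    if c.halogens_non_fluorine > 4:
        reasons.append("more than four non-fluorine halogens")
    if c.cf3_groups > 2:
        reasons.append("more than two CF3 groups")
    if c.has_cytotoxic_fragment:
        reasons.append("contains a cytotoxic fragment")
    return reasons


def is_leadlike(c: StructureCounts) -> bool:
    """A compound is kept only if no exclusion criterion applies."""
    return not exclusion_reasons(c)
```

Returning the reasons rather than a bare yes/no makes it easier to set the “nonleadlike” compounds aside for later clustering instead of discarding them.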
Important Note:
• A collection may require different processing
criteria for different targets and discovery
goals.
• E.g., targets located in the lung require a
different pharmacokinetic profile (e.g., for
inhalation therapy) compared with
targets located in the urinary tract, which may
require good aqueous solubility at pH = 5.0.
Methods cont.
3.4. Searching for Similarity If Known Active
Molecules are Available
3.5. Exploring Alternative Structures
The user should seek alternative structures by
modifying the canonical isomeric SMILES, because
these may occur in solution or at the
ligand-receptor interface:
a. Tautomerism
b. Acid/base equilibria
c. Chiral centers
Exploring alternative structures is advisable prior to
processing any collection with computational
means, such as for diversity analysis.
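To give a feel for how quickly alternative structures multiply, the sketch below enumerates every R/S assignment for a set of unspecified chiral centers. It is bookkeeping only and does not generate structures. The function name and the atom indices are hypothetical, and the count it produces (2^n for n centers) is an upper bound, since symmetric molecules have fewer distinct stereoisomers.

```python
from itertools import product


def enumerate_center_assignments(unspecified_centers):
    """Yield one dict per combination of R/S labels.

    `unspecified_centers` is a list of atom indices whose configuration
    is not defined in the input structure.
    """
    for labels in product("RS", repeat=len(unspecified_centers)):
        yield dict(zip(unspecified_centers, labels))


# Three undefined centers give 2**3 = 8 candidate assignments.
assignments = list(enumerate_center_assignments([3, 7, 12]))
print(len(assignments))  # 8
```

Tautomers and protonation states multiply the count further, so it is worth deciding up front how many variants per compound the downstream steps can afford.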
3.6 Generating 3D Structures
• Exploring one or more conformers per
molecule is essential.
3.7. Selecting Chemical Structure Representatives
Screening compounds that are similar to known actives
increases the likelihood of finding new active compounds, but
it may not lead to different chemotypes, which are highly
desirable in the industrial context. The problem is more
severe if the original actives are covered by
third-party patents or if the lead chemotype is toxic.
Clustering methods aim at grouping molecules into “families”
(clusters) of related structures that are perceived, at a given
resolution, to be different from other chemical families.
With clustering, the end user has the ability to select one or
more representatives from each family. Statistical molecular
design (SMD) methods aim at sampling various areas of chemical
space and selecting representatives from each area.
3.7.1 Chemical descriptors
• Chemical descriptors are used to encode
chemical structures and properties of
compounds: 2D/3D binary fingerprints or counts
of different substructural features, or
perhaps (computed) physicochemical properties
(e.g., molecular weight, CLogP, HDO, and HAC, the
number of hydrogen-bond acceptors), as
well as other types of steric, electronic,
electrostatic, topological, or hydrogen-bonding
descriptors.
3.7.2. Similarity (Dissimilarity) Measure
• Chemical similarity is used to quantify the “distance”
between a pair of compounds (dissimilarity, or 1 −
similarity), or how related the two compounds are
(similarity).
• The basic tenet of chemical similarity is that molecules
exhibiting similar features are expected to have similar
biological activity (46).
• Similarity is, by definition, related to a particular
framework: that of a descriptor system (a metric by
which to judge similarity), as well as that of an object,
or class of objects, since a reference point with which
objects can be compared is needed (47).
• Similarity depends on the choice of molecular
descriptors (48), the choice of the weighting scheme(s), and
the similarity coefficient.
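As one concrete instance, the sketch below computes the Tanimoto coefficient on binary fingerprints and the corresponding dissimilarity (1 − similarity). The slides do not name a particular coefficient, so Tanimoto is an assumed choice here. Representing a fingerprint as a set of “on” bit positions and returning 0.0 for two empty fingerprints are also conventions of this sketch.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints.

    Each fingerprint is a collection of the bit positions that are set.
    The result is |A & B| / |A | B|, between 0.0 and 1.0.
    """
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def dissimilarity(fp_a, fp_b):
    """Distance between two compounds, defined as 1 - similarity."""
    return 1.0 - tanimoto(fp_a, fp_b)


# Two of six distinct bits are shared, so similarity is 2/6.
print(tanimoto({1, 2, 3, 4}, {3, 4, 5, 6}))
```

Changing the descriptor (a different fingerprint) or the coefficient changes the ranking, which is the dependence the last bullet describes.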
3.7.3. Clustering Algorithms
• Clustering algorithms can be classified using many criteria
and also implemented in different ways (29–32).
Hierarchical clustering methods have been traditionally
used to a greater extent, in part owing to computational
simplicity. More recently, nonhierarchical methods are being
examined for chemical structure classification. In practice, the
individual choice of different factors (descriptors,
similarity measure, clustering algorithm) depends also on
the hardware and software resources available, the size
and diversity of the collection that must be clustered, and,
not least, on the user's experience in producing a
useful classification that has the ability to predict property
values.
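A small nonhierarchical example is the leader algorithm: each compound joins the first cluster whose leader it resembles closely enough, otherwise it starts a new cluster. This is one simple choice picked for illustration, not the method prescribed by the slides. The Tanimoto coefficient, the threshold value, and the toy fingerprints are all assumptions.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def leader_cluster(fingerprints, threshold=0.5):
    """Single-pass leader clustering.

    `fingerprints` maps compound name -> set of bit positions.
    Returns a list of clusters; each cluster is a list of names whose
    first element is the leader (the cluster representative).
    """
    clusters = []
    for name, fp in fingerprints.items():
        for cluster in clusters:
            leader_fp = fingerprints[cluster[0]]
            if tanimoto(fp, leader_fp) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no leader was close enough
    return clusters


fps = {
    "m1": {1, 2, 3, 4},
    "m2": {1, 2, 3, 5},        # shares 3 of 5 bits with m1
    "m3": {10, 11, 12},
    "m4": {10, 11, 12, 13},    # shares 3 of 4 bits with m3
}
clusters = leader_cluster(fps, threshold=0.5)
representatives = [cluster[0] for cluster in clusters]
```

The result depends on input order and on the threshold, which plays the role of the “resolution” at which families are perceived. Hierarchical methods avoid the order dependence at a higher computational cost.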
3.7.4. Statistical Molecular Design
• SMD can be applied to rationally select
collection representatives, as illustrated for
building block selection in combinatorial
synthesis planning (55).
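D-optimal design itself needs dedicated software (see Materials). As a much simpler stand-in that shares the goal of spreading the selection across descriptor space, the sketch below uses greedy MaxMin picking. It is explicitly not D-optimal design. The function name, the starting index, and the toy two-dimensional descriptor values are assumptions, and real descriptors would normally be scaled first.

```python
import math


def maxmin_select(points, k, start=0):
    """Greedy MaxMin selection of k points.

    Repeatedly picks the point whose distance to its nearest
    already-chosen point is largest. Returns indices into `points`.
    """
    if k <= 0 or not points:
        return []
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        remaining = [i for i in range(len(points)) if i not in chosen]
        nxt = max(remaining, key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


# Toy descriptor space: two tight pairs and one outlier.
descriptors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
picked = maxmin_select(descriptors, k=3)
```

With these toy values the three picks land in three different regions rather than on near-duplicates, which is the behavior wanted from a representative selection.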
3.8. Assembling List of Compounds for
Acquisition or Virtual Screening
• Once provided with an output from one or
several methods for compound selection, the
now-selected collection representatives are
almost ready to be submitted for acquisition
or for virtual screening. The end user is
encouraged to allow nonleadlike molecules to
be reentered into the candidate pool.
• An additional random, perhaps nonleadlike
selection (up to 30%) can, and should, be
entered in the final list of compounds.
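The final assembly step can be sketched as follows. The slide does not say what the 30% is measured against, so this sketch interprets it as a percentage of the number of selected representatives. The function name, the fixed seed, and the compound IDs are hypothetical.

```python
import random


def assemble_final_list(selected, nonleadlike_pool, extra_percent=30, seed=0):
    """Append a random sample of nonleadlike compounds to the selection.

    The number of extras is at most `extra_percent` percent of
    len(selected) and never more than the pool can supply.
    A fixed seed keeps the list reproducible.
    """
    selected = list(selected)
    already = set(selected)
    candidates = [c for c in nonleadlike_pool if c not in already]
    n_extra = min(len(candidates), len(selected) * extra_percent // 100)
    rng = random.Random(seed)
    extras = rng.sample(candidates, n_extra)
    return selected + extras


selected = [f"rep{i}" for i in range(10)]
pool = [f"nl{i}" for i in range(50)] + ["rep0"]  # rep0 is already selected
final_list = assemble_final_list(selected, pool)
```

Drawing the extras from a clustered nonleadlike set instead of the raw pool would follow the earlier advice on capturing additional chemotypes.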
Summary
1. Assemble the collection starting from in-house and on-line databases.
2. Clean up the collection by removing “garbage,” verifying structural
integrity, and making sure that only unique structures are screened.
3. Perform property filtering to remove unwanted structures based on
substructures, property profiling, or various scoring schemes; the
collection can become the virtual screening set at this stage, or it can be
further subdivided in a target- and project-dependent manner.
4. Use similarity to given actives to seek compounds with related
properties.
5. Explore the possible stereoisomers, tautomers, and protonation states.
6. Generate the 3D structures in preparation for virtual screening, or for
computation of 3D descriptors.
7. Use clustering or SMD to select compound representatives for
acquisition.
8. Add a random subset to the final list of compounds. The final list can
now be submitted for compound acquisition or virtual screening.
THANK YOU !!!
