In Silico Methods in Drug Discovery and Development
Stephane Acoca
Department of Biochemistry
McGill University
Montreal, Quebec, Canada
Submitted August 2011
A thesis submitted to McGill University in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
© Stephane Acoca, 2011

Abstract
Computational drug design methods have become invaluable in the drug discovery and development
process. This thesis describes the development and application of methods used at every stage
of the drug discovery and development pipeline. Chapter 2 examines the use of computational
methods to understand two novel Bcl-2 inhibitors, Obatoclax and ABT-737, which are being
developed for the treatment of cancer. The study proposes mechanisms through which ABT-737
displays selectivity towards certain targets within the Bcl-2 family. Additionally, we propose
a binding mode for Obatoclax that is in accordance with experimental data. Chapter 3 addresses
the use of virtual screening for the identification of novel lead compounds. Trypanosoma brucei
RNA Editing Ligase 1 was chosen as the target for the development of treatments against
Trypanosoma infections, and C35, a potent novel inhibitor of the enzyme, was identified.
Furthermore, our research shows that the action of C35 extends to the inhibition of several
critical enzyme activities required for the RNA editing process and compromises the integrity
of the multiprotein complex that carries it out. Chapter 4 examines the use of mass
spectrometry data to expedite the discovery of bioactive compounds in natural products. We
developed an algorithm that analyses MS/MS data in order to derive the molecular formula of a
compound. The algorithm obtained a 95% success rate on a test set of 91 compounds. Chapter 5
explores the use of molecular dynamics to generate conformational ensembles of targets for
virtual screening. Conformational ensembles were generated for a target test set taken from the
Directory of Useful Decoys. The results showed that molecular dynamics-based conformational
ensembles provided remarkable improvements on 2 of the targets tested, owing to an enhanced
capacity to properly dock compounds in otherwise restricted structures. The last Chapter is a
general discussion of the work of the thesis and a proposal on how its components can be
integrated within the drug discovery and development pipeline.

Résumé
Modeling methods have become an invaluable tool in the process of discovering and developing
new drugs. This thesis describes the development and application of methods used at every stage
of the discovery and development of pharmaceutical products. Chapter 2 gives an overview of the
use of computational methods in the development of two new inhibitors of the Bcl-2 proteins,
Obatoclax and ABT-737, which are in development for the treatment of cancer. The study proposes
certain mechanisms that explain the selectivity of ABT-737 towards members of the Bcl-2 family.
In addition, we propose a binding mechanism for Obatoclax that conforms to the experimental
data. The following Chapter addresses the use of virtual screening for the identification of new
lead molecules. The RNA Editing Ligase of Trypanosoma brucei was chosen as a target for the
development of treatments against trypanosome infections, and C35 was identified as a new
inhibitor of the enzyme. Furthermore, our research demonstrates that the action of C35 extends
to the inhibition of several enzymes required for the RNA editing mechanism, in addition to
compromising the integrity of the multiprotein complex that carries it out. The following
Chapter looks at the use of mass spectrometry data to accelerate the discovery of bioactive
molecules from natural sources. We developed an algorithm that analyses MS/MS data in order to
derive the molecular formula of the compound. The new algorithm achieved a success rate of 95%
on a test set of 91 molecules. The following Chapter explores the use of molecular dynamics
simulations to generate a conformational ensemble of target proteins for use in virtual
screening. The conformational ensembles were generated for a test series obtained from the
‘Directory of Useful Decoys’. The results demonstrate that conformational ensembles derived from
molecular dynamics brought remarkable improvements on two of the targets tested, owing to an
increased capacity to place molecules appropriately in a site that is otherwise very
restricted. The last Chapter of this thesis is a general discussion of the work accomplished and
a proposal on how all the elements can be integrated into a protocol for the discovery and
development of pharmaceutical products.

Acknowledgements
I would like to thank first Dr Enrico Purisima and Prof. Gordon Shore for their mentorship
and patience throughout the doctoral work leading to this thesis. I am very thankful for the
experiences I had during my tenure in Dr Purisima’s laboratory. I would like to extend my
special thanks to former members of the laboratory Dr Sathesh Bhat, Dr Marwen Naim, Herve
Hogues, and Dr Qizhi Cui whose guidance, friendship, inspiration, assistance and support have
been invaluable during my tenure. My long conversations on modeling with Dr Bhat and Mr
Hogues have been of special value in my learning of computational modeling. I would also like
to extend thanks to current and past members of the laboratory which include Dr Shafinaz
Chowdhury, Dr Christophe Deprez, Dr Edwin Wang and Dr Sheldon Dennis for creating a
positive work environment. I would like to give special thanks to the members of my Research
Advisory Committee (RAC), Prof. John Silvius and Prof. Albert Berghuis, who have been of
great assistance in guiding me through the completion of the doctoral work. I’d also like to add
special recognition to Prof. Imed Gallouzi for his help. I’d also like to thank the Chemical
Biology program at McGill, which has partially funded my work. Lastly, I’d like to thank my
family for their continued encouragement and support.

Table of Contents
Abstract
Résumé
Acknowledgements
Table of Contents
List of Figures
List of Tables
Abbreviations
Contribution of Authors
Chapter 1. General Introduction
1.1 Drug Discovery and Development
1.1.1 Overview & Challenges
1.1.2 Thesis Outline
1.2 Molecular Modeling
1.2.1 Molecular Mechanics
1.3 Predicting Binding Free Energies – Scoring
1.3.1 Effect of water: Continuum (Implicit) Solvation energy
1.3.1.1 Finite difference
1.3.1.2 Boundary Element Method
1.3.1.3 Desolvation Cost
1.3.2 Scoring Functions
1.3.2.1 Physical-Chemical
1.3.2.2 Empirical function
1.3.2.3 Knowledge-based
1.3.2.4 Problems
1.4 Predicting Binding Modes – Docking
1.4.1 Docking Algorithms
1.4.1.1 Fast Shape Matching
1.4.1.2 Incremental Construction
1.4.1.3 Monte Carlo Simulations
1.4.1.4 Evolutionary Programming
1.5 Molecular Dynamics
1.5.1 Newton’s Laws
1.5.2 Ensembles
1.5.3 Verlet Algorithm
1.5.4 Considerations
1.5.5 Boundary Conditions
1.5.6 Long Range Electrostatic Calculations: The Ewald Summation Method
1.6 Virtual Screening
1.6.1 Virtual Screening Pipeline
1.6.2 The Target
1.6.3 The Compound Database
1.6.4 The Docking Protocol
1.6.5 MD Simulations
1.6.6 Conformational Ensembles
1.7 Successes of CADD
Chapter 2. Molecular Dynamics Study of Small Molecule Inhibitors of the Bcl-2 Family
Preface
2.1 Rationale
2.2 Abstract
2.3 Introduction
2.4 Methods
2.4.1 Structure Preparation
2.4.2 Force Field Parameters
2.4.3 Docking
2.4.4 Molecular Dynamics Simulations
2.4.5 Binding free energy estimate
2.5 Results and Discussion
2.5.1 Molecular Modeling of ABT-737 complexes
2.5.2 Binding groove structure
2.5.3 Chlorobiphenyl group
2.5.4 Phenylpiperazine linker
2.5.5 Nitrophenylsulfonamide group
2.5.6 S-phenyl group
2.5.7 Dimethyl group
2.5.8 SIE Analysis and Virtual Alanine Mutations
2.5.9 Protein structure and dynamics
2.5.10 Mcl-1 and obatoclax
2.6 Conclusion
Chapter 3. Naphthalene-based RNA editing inhibitor blocks RNA editing activities and editosome assembly in Trypanosoma brucei
Preface
3.1 Rationale
3.2 Abstract
3.3 Introduction
3.4 Experimental Procedures
3.4.2 Virtual Screening
3.4.3 Solvated Interaction Energy
3.4.4 Preparation of mitochondrial extract and tandem affinity purification of ligase complex
3.4.5 Preparation of RNAs
3.4.6 Adenylylation and deadenylylation assays
3.4.7 In vitro RNA editing assays
3.4.8 Gel shift assay
3.4.9 Guanylyltransferase labeling
3.5 Results
3.5.1 Virtual Screening
3.5.2 Inhibition of RNA editing by selected compounds
3.5.3 Inhibition of ligase adenylylation at low protein concentrations by C35 and S5
3.5.4 Inhibition of deadenylylation by C35 and S5
3.5.5 Inhibition of different steps of RNA editing by C35 and S5
3.5.6 Inhibitory compounds affect the editosome RNA-binding activity
3.5.7 20S editosome complex integrity is affected by C35 treatment
3.6 Discussion
3.7 Acknowledgments
Chapter 4. Automated Molecular Formula Analysis determination by Tandem Mass Spectrometry (MS/MS)
Preface
4.1 Rationale
4.2 Abstract
4.3 Introduction
4.4 Experimental
4.4.1 Materials
4.4.2 Instrumentation
4.4.3 MS/MS experiments
4.4.4 The algorithm of molecular formula analysis
4.4.5 Nitrogen-enriched or oxygen-enriched compounds
4.5.1 Risk of assigning incorrect molecular formula
4.5.2 Mass accuracy
4.5.3 Fragmentation pathways of brefeldin A
4.5.4 Molecules with single structural domain
4.5.5 Molecules with multiple core structures
4.5.6 Analysis of structurally-related compounds
4.5.7 Cyclazocine and N-allylnormetazocine
4.5.8 Peptides
4.5.9 Chloro- or bromo-containing compounds
4.6 Conclusion
4.7 Acknowledgements
Chapter 5. Molecular Dynamics ensemble in Virtual Screening
Preface
5.1 Rationale
5.2 Abstract
5.3 Introduction
5.4 Methods
5.4.2 Ligand Preparation and Docking
5.4.3 Molecular dynamics simulations
5.4.4 Force Field Parameters
5.4.5 Clustering
5.4.6 Test Data Sets
5.5.1 Overview of Results
5.5.2 Obstructive changes during apo simulations
5.5.3 Performance of holo ensemble
5.5.4 Structural change in holo ensemble
5.5.5 Effect on score distribution
5.5.6 Comparison with RCS
5.5.7 Use of DUD training set
5.6 Conclusion
Chapter 6. General Discussion
6.1 Molecular Dynamics Study of Bcl-2 Inhibitors
6.2 Discovery of TbRel1 Inhibitors
6.3 Automated Molecular Formula determination by Tandem Mass Spectrometry
6.4 Ensemble-based Virtual Screening
Appendices
Appendix A
Appendix B
Appendix C
Appendix D
References
Original Contributions to Knowledge

List of Figures
Chapter 1
Figure 1.01 The pharmaceutical drug discovery and development pipeline
Figure 1.02 Increasing costs in pharmaceutical R&D
Figure 1.03 Pre-approval costs for new drugs
Figure 1.04 The contributions of bonded terms to the potential energy function
Figure 1.05 Cubic grid scheme for the Finite Difference Method
Figure 1.06 Representation of desolvation effects during ligand-protein complex formation
Figure 1.07 Periodic boundary conditions in molecular dynamics simulations
Figure 1.08 The virtual screening pipeline
Chapter 2
Figure 2.01 ABT-737 chemical structure
Figure 2.02 Obatoclax chemical structure
Figure 2.03 Multiple sequence alignment of representative BH3 domains from BH3-Only
proteins.
Figure 2.04 Superposition of ABT-737 and Bim BH3 peptide bound to Bcl-xL.
Figure 2.05 Calculated binding mode of ABT-737 in Bcl-xL, Bcl-2 and Mcl-1.
Figure 2.06 Distance of the ABT-737 biphenyl ring centroids from their initial positions after
superposition of the protein C-alpha atoms to those in the first snapshots.
Figure 2.07 Distance of the ABT-737 linker ring centroids from their initial positions after
superposition of the protein C-alpha atoms to those in the first snapshots.
Figure 2.08 Distance of the ABT-737 nitrophenyl and S-phenyl ring centroids from their
initial positions after superposition of the protein C-alpha atoms to those in the
first snapshot.
Figure 2.09 Calculated binding mode of obatoclax in Mcl-1.
Chapter 3
Figure 3.01 Predicted binding modes of TbREL1
Figure 3.02 Effect of selected compounds that inhibit editosome activity
Figure 3.03 Effect of inhibitory compounds on adenylylation and deadenylylation steps of
RNA editing ligases
Figure 3.04 Effect of inhibitory compounds on different steps of RNA editing
Figure 3.05 Effect of inhibitory compounds on RNA-binding activity of editosome complex
Figure 3.06 Analysis of sedimentation profile and activity of ligase-associated complexes in
the presence of C35
Figure 3.07 Alternative models for the mechanism of action of C35 and S5.
Chapter 4
Figure 4.01 The MS/MS spectrum of brefeldin A
Figure 4.02 Fragmentation pathways of brefeldin A
Figure 4.03 The MS/MS spectrum of prazosin
Figure 4.04 Fragmentation pathways of prazosin
Figure 4.05 The MS/MS spectrum of dihydroergotamine and dihydroergocristine
Figure 4.06 Fragmentation pathways of dihydroergotamine
Figure 4.07 Structures of dihydroergotamine and dihydroergocristine
Figure 4.08 The MS/MS spectrum of cyclazocine and N-allylnormetazocine
Figure 4.09 Fragmentation pathways of cyclazocine
Figure 4.10 The MS/MS spectrum of 5-leucine enkephalin
Figure 4.11 Stepwise analysis of 5-leucine enkephalin sequences
Figure 4.12 Overall detail analysis of 5-leucine enkephalin
Figure 4.13 The MS/MS spectrum of quinacrine
Figure 4.14 Plausible fragmentation pathways of quinacrine
Chapter 5
Figure 5.01 Changes in binding site observed in the apo ensemble in a) COX2, b) AR,
c) GART and d) PARP.
Figure 5.02 Changes in binding site observed in the holo ensemble for COX2.
Figure 5.03 Changes in binding site observed in the holo ensemble for AR.
Figure 5.04 Changes in binding site observed in the holo ensemble for ER.
Figure 5.05 Score distribution of true binders across the crystal structure and selected holo ensemble
structure for a) ER, b) AR, c) EGFR, and d) COX2.
Appendix
A
Figure A.01 Helices surrounding the binding grooves of Bcl-xL, Bcl-2 and Mcl-1.
Figure A.02 Distance between ABT-737 sulfonamide HN and backbone carbonyl O of Bcl-xL
Asn136, Bcl-2 Asn140 and Mcl-1 Asn260.
Figure A.03 Hydrogen bond pair distances between ABT-737 sulfonyl O and side chains
in Bcl-xL, Bcl-2 and Mcl-1
Figure A.04 Hydrogen bond pair distances between ABT-737 dimethylamino HN and side
chain carboxylate O in Bcl-xL and Bcl-2.
Figure A.05 Distance of ABT-737 ring centroids from their initial positions after superposition
of the protein C-alpha atoms to those in the first snapshot.
B
Figure B.01 Inhibitors identified from first round of virtual screening.
Figure B.02 Previously identified inhibitors not retrieved in virtual screening.
D
Figure D.01 Overview of VS results for the crystal structure and apo/holo ensembles
Figure D.02 Ensemble-based VS results for structures generated from apo MDs

List of Tables
Chapter 2
Table 2.1 Solvated interaction energies (SIE) in kcal/mol
Table 2.2 Virtual alanine mutations
Chapter 3
Table 3.1 Virtual hits selected for experimental validation
Chapter 4
Table 4.1 Potential neutral losses in the MS/MS experiment in forward MFA
Table 4.2 Reverse MFA of brefeldin A with correct formula of precursor ion
Table 4.3 Reverse MFA of brefeldin A with incorrect formula of precursor ion
Table 4.4 Molecular formula analysis of prazosin
Table 4.5 Molecular formula analysis of dihydroergotamine
Table 4.6 Molecular formula analysis of dihydroergocristine
Table 4.7 Molecular formula analysis of cyclazocine
Table 4.8 Molecular formula analysis of N-allylnormetazocine
Table 4.9 Molecular formula analysis of quinacrine
Chapter 5
Table 5.1 Targets of the DUD set selected and properties of each set
Appendix
A
Table A.01 Fourier coefficients for ca-s6-n-ca
B
Table B.01 Ranking of selected hits from virtual screen
C
Table C.01 Molecular formula analysis of 5-leucine enkephalin

Abbreviations
ADA Adenosine Deaminase
AR Androgen Receptor
BCL-2 B-Cell Lymphoma 2
BEM Boundary Element Method
CADD Computer-Aided Drug Design
CML Chronic Myelogenous Leukemia
COX2 Cyclooxygenase 2
CRK Cdc2-Related Kinase
DNDi Drugs for Neglected Diseases initiative
DUD Directory of Useful Decoys
EGFR Epidermal Growth Factor Receptor
EP Evolutionary Programming
ER Estrogen Receptor
FDM Finite Difference Method
FXa Factor Xa
GA Genetic Algorithm
GART Glycinamide Ribonucleotide Transformylase
gRNA guide RNA
GSK Glycogen Synthase Kinase
HSP90 Heat Shock Protein 90
IC Incremental Construction
KB Knowledge-based
KBP Knowledge-based potentials
LGA Lamarckian Genetic Algorithm
MAPK Mitogen-Activated Protein Kinase
MC Monte-Carlo
MD Molecular Dynamics
MF Molecular Formula
MFA Molecular Formula Analysis
MM Molecular Mechanics
MW Molecular Weight
NCE New Chemical Entity
NS Nanoseconds
NTD Neglected Tropical Diseases
PARP Poly ADP-Ribose Polymerase
PDB Protein Data Bank
PBSA Poisson-Boltzmann Surface Area
PS Picoseconds
RCS Relaxed Complex Scheme
RMS Root Mean Square
SA Surface Area
SBDD Structure-Based Drug Design
SIE Solvated Interaction Energy
SM Shape Matching
SRC SRC Tyrosine Kinase
TbRel1 Trypanosoma brucei RNA-editing Ligase 1
VDS Virtual Decoy Set
VdW Van der Waals
VS Virtual Screening

Contributions of Authors
This thesis includes the text and figures from three published articles. I am the first author
of one of the manuscripts (Chapter 2) and second author of the remaining two (Chapters 3 and 4).
Additionally, the thesis includes the text and figures from work to be completed towards the
publication of a manuscript (Chapter 5). This thesis has been written in manuscript-based format,
and the references of all chapters have been combined into one reference section at the end of the
dissertation. The contributions of the authors for each of the manuscripts are as follows:
Chapter 2:
Acoca S., Cui Q., Shore G.C., Purisima E.O. 2011. Molecular Dynamics Study of Small
Molecule Inhibitors of the Bcl-2 Family. Proteins. 79(9):2624-36.
I performed all original work and completed the first draft of the manuscript. Prior to
submission, Dr Cui reran a number of the simulations and Dr Purisima reworked the manuscript.
Chapter 3:
Moshiri H., Acoca S., Kala S., Najafabadi H.S., Hogues H., Purisima E.O., Salavati R. 2011.
RNA Editing Ligase 1 Inhibitors Blocks RNA Editing Activities and Editosome Assembly in
Trypanosoma Brucei. J Biol Chemistry. 286(16):14178-89.
My contributions to the manuscript involved the virtual screening segment of the work,
specifically a) the Virtual Screening section, b) Figure 1, c) Table 1 and d) all relevant sections
of the Experimental Procedures (Structure Preparation, Virtual Screening and Solvated
Interaction Energy). Prof. Salavati’s group carried out all experimental testing of the
compounds and their inhibitory properties with regard to the 20S editosome activities.
Chapter 4:
Jarussophon S, Acoca S, Gao J.M., Deprez C., Kiyota T., Draghici C., Purisima E., Konishi Y.
2009. Automated Molecular Formula Determination by Tandem Mass Spectrometry (MS/MS).
Analyst 134(4):690-700.

I wrote the code for the software that ran the analysis and collaborated with Dr Konishi in its
development. The algorithm implemented in the software was originally developed by Dr
Konishi and his group. Dr Deprez is responsible for the continued maintenance of the software.
Chapter 5:
Acoca S., Hogues H, Purisima EO. 2010. Molecular dynamics ensembles for virtual screening.
(Manuscript in preparation).
The entirety of the work for this manuscript was carried out by me. The docking scripts for the
tailoring of the pipeline to ensemble virtual screening were written by Mr Hogues.

Chapter 1
General Introduction

1.1 Drug Discovery and Development
1.1.1 Overview & Challenges
Though the use of foreign substances for the treatment of illnesses has been practiced
since the beginning of time, the use of an isolated, well-defined chemical entity is only
over a century old. Since then, medicinal substances have been natural products isolated
from plants and microbial sources, or products of pure chemical synthesis. Whichever the
case, the modern pipeline for pharmaceutical drug discovery and development is a long,
multi-step process involving the collaborative effort of a multitude of specialties (Figure
1.01).

Figure 1.01 The Pharmaceutical Drug Discovery and Development Pipeline. Pre-clinical
studies: Target Identification, Identification of Lead Compounds, Lead Optimization, In Vitro
Testing, Animal Testing. Clinical studies: Phase I, Phase II, Phase III, Phase IV.

However, no venture in pharmaceutical research is without risk, and a positive
outcome is anything but guaranteed. The difficulties inherent in discovery and
development, along with the stringent requirements placed on pharmaceutical drugs, have
created an economic problem in the profitability of such endeavors. Despite some spectacular
successes, more is spent on drug discovery and development every year and less is
delivered in terms of innovation (DiMasi et al., 2003). Figure 1.02 shows the reported
aggregate annual domestic prescription drug R&D expenditures for all members of the
U.S. pharmaceutical industry since 1963, alongside the number of new US drug
approvals by year (DiMasi et al., 2003). The rate of growth of R&D expenditures clearly
outpaces that of new approvals by a large margin. These rising costs have led to a serious
economic problem for R&D within the pharmaceutical industry. In 2003, a study of 68 new
medications placed a timeline of 10-12 years and cumulative costs averaging US$897 million
on the development and marketing of a new medication (Ezzell, 2003). The pre-approval R&D
costs themselves rose from US$138 million in 1979 to US$318 million in 1991 and to US$802
million in 2000 (Figure 1.03). One result of these rising R&D costs is a trend towards
mergers and industry consolidation. Additionally, higher costs translate into a lower
tolerance for risk: reorganization of R&D sectors in the pharmaceutical industry aims to
optimize the return on investment by carefully selecting the most profitable research
sectors. The sum of these effects is an increased need for efficient, low-cost technologies
that bridge the gap between R&D and the economic challenges facing the pharmaceutical
industry.

Figure 1.02 Increasing costs in pharmaceutical R&D. Inflation-adjusted industry R&D
expenditures (2000 dollars) and US new chemical entity (NCE) approvals from 1963 to 2000.
Source of data: Pharmaceutical Research and Manufacturers of America (PhRMA) and Tufts CSDD
Approved NCE database. (Taken from DiMasi et al., 2003)

Figure 1.03 Pre-approval costs for new drugs. Each column indicates the capitalized
pre-clinical, clinical and total costs per approved new drug in 2000 US dollars. (Taken from
DiMasi et al., 2003)

Computer-Aided Drug Design (CADD) approaches have been widely used in the
pharmaceutical industry. By allowing scientists to focus their attention on the most
promising candidate compounds, thereby narrowing the synthetic and biological
testing efforts, CADD approaches play an important role in accelerating pharmaceutical
research. The recent successes of CADD in assisting rational drug design approaches
have proven it to be an essential tool in drug design and development (Kapetanovic,
2008; Mandal et al., 2009; Song et al., 2009).
1.1.2 Thesis Outline
As part of this thesis, several elements of CADD have been incorporated into
research targeted at every step of the pharmaceutical drug discovery pipeline. The
following is a description of the contributions of each chapter to the individual segments
of the pharmaceutical drug design and discovery pipeline.
Chapter 4 explores the lead identification stage of the pipeline and provides an
alternative means of expediting research when identification of an active compound from
a natural products sample is required. Natural products (and their semi-synthetic
derivatives) have been major sources of marketed medications. However, lead isolation
and identification from natural product extracts faces the problem of replication, i.e. the
re-discovery of known natural products. Chapter 4 presents the development of a novel
algorithm which uses MS/MS data to determine the correct molecular formula of a
compound, resulting in a rapid identification of the probable nature of the isolated
compound.
Chapters 3 and 5 look at lead identification from the alternative source: compound
databases. Chapter 3 applies our virtual screening (VS) pipeline to the Trypanosoma
brucei RNA-editing Ligase 1 (TbRel1). The screen identified an inhibitor that allowed a
better understanding of the effects of inhibition. Chapter 5 seeks to further enhance the
current VS pipeline by using MD-generated conformational ensembles. Here the use of
conformational ensembles (see Section 1.6.6) attempts to provide a better, more complete
representation of the target’s conformational dynamics as part of the VS process.
Lastly, Chapter 2 is a representative pre-clinical molecular dynamics (MD) study
of a lead compound in complex with its target. The in silico MD study gives researchers
the opportunity to obtain information on the mechanism of action of the compound that
would be unavailable through the usual experimental means. Our study aimed at identifying
the structural factors underlying the specificity of two compounds that target the Bcl-2
family of proteins, which have recently become key targets for cancer therapeutics.

1.2 Molecular Modeling
The field of Computational Drug Design relies on our understanding of the
underlying mechanisms involved in the interactions of a drug and its target. The
development of Molecular Mechanics (MM) and Quantum Mechanics (QM) has made it
possible to study drug-target recognition events at the atomic and electronic level. The
increasing accuracy of these models, along with the growth of the computational resources
required to compute them, has prompted the development of computational tools that
evaluate drug-target interactions with increasing accuracy.
1.2.1 Molecular Mechanics
First applied by Westheimer and Mayer in 1946, MM encompasses the
computational techniques that allow the calculation of molecular properties through the
use of classical mechanics and electrostatics (Westheimer and Mayer, 1946). MM
provides a practical means to describe molecular structures and properties
computationally. As opposed to QM, where the primary goal is the accuracy of the
calculations, MM packages aim to describe molecular structures and properties
accurately, robustly, and within reasonable time frames (Boyd and Lipkowitz, 1982).
To do so, MM methods (also referred to as force fields) describe molecules as a collection
of atoms held together by elastic or harmonic forces. These forces essentially represent
the structural features of a molecule such as bond lengths, bond angles, dihedral angles,
etc. Functions are used to describe the behavior of these forces, resulting in a calculated
potential energy for each. The total potential energy of a molecule is then calculated
as the sum of all energy contributions (Eq. 1.01):
$$E_{total} = E_{bond} + E_{angle} + E_{torsion} + E_{vdw} + E_{elec} \qquad (1.01)$$
Functional form of the potential energy of a molecule
where Ebond, Eangle, Etorsion, Evdw and Eelec describe the bond length, bond angle, torsion
angle, Van der Waals and electrostatic contributions respectively. Energy contributions
are calculated to describe the deviation of structural features from their ideal values,
which are determined empirically or from high-level QM calculations. While the exact
mathematical functions used to describe these contributions may differ between MM
packages, the functions are chosen to accurately replicate the behavior of each energy
contribution within expected ranges while minimizing the number of calculations, and
therefore the computational time, required. From here on, all discussions of the potential
energy function will refer to that implemented by the AMBER force field (Cornell et al., 1995).
The potential energy function is described as follows:
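$$E = \sum_{bonds} K_r (r - r_{eq})^2 + \sum_{angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{dihedrals} \frac{V_n}{2}\left[1 + \cos(n\Phi - \gamma)\right] + \sum_{i<j}\left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{\varepsilon r_{ij}}\right] \qquad (1.02)$$
The potential energy function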

where Kr, Kθ and Vn are force constants, θeq and req are the equilibrium bond angle and
length respectively, and γ is the phase for the torsional angle. Aij and Bij are the non-
bonded parameters used to compute the van der Waals energies. qi and qj denote the
atom-centered partial charges on atoms i and j respectively, ε is the dielectric constant
and rij is the distance between atoms i and j. Each term of the potential energy function
will now be discussed briefly.

Figure 1.04 The contributions of bonded terms to the potential energy function.
The torsion angle, bond length/angle, and VdW contact are represented by a Fourier
series, harmonic potential, and Lennard-Jones potential respectively. (Taken from Boas and
Harbury, 2007).

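$$E_{bond} = \sum_{bonds} K_r (r - r_{eq})^2 \qquad (1.03a)$$
$$E_{angle} = \sum_{angles} K_\theta (\theta - \theta_{eq})^2 \qquad (1.03b)$$
The bond length and bond angle contributions to the potential energy function.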
The typical length of an alkane carbon-carbon bond is 1.53 Å. Similarly, a
typical C-C-C bond angle lies between 109° and 114°. Deviations from these
equilibrium values result in an increase in the energy of the system. Thinking of a
molecule as an assembly of point masses held together by springs (the bond lengths and
angles) is therefore a reasonable approximation of its experimental behavior for small
deviations, and the Ebond and Eangle terms are modeled as harmonic potentials centered
on an equilibrium value (Eq. 1.03a,b).
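
As a rough illustration of the harmonic bond and angle terms, the sketch below evaluates the penalty K(x - x_eq)^2 in Python. The force constants are placeholder values chosen for illustration, not parameters taken from this thesis or from a specific AMBER parameter set; only the 1.53 Å equilibrium length comes from the text. Angles are converted to radians on the assumption that the angle force constant is expressed per square radian.

```python
import math

def harmonic_energy(force_constant, value, equilibrium):
    """Harmonic penalty K * (x - x_eq)**2 used for bond and angle terms."""
    return force_constant * (value - equilibrium) ** 2

# Placeholder parameters (assumptions for illustration only).
K_R = 310.0                     # kcal/(mol*A^2), assumed bond force constant
R_EQ = 1.53                     # A, alkane C-C length quoted in the text
K_THETA = 40.0                  # kcal/(mol*rad^2), assumed angle force constant
THETA_EQ = math.radians(109.5)  # assumed equilibrium C-C-C angle

def bond_energy(r):
    """Bond stretching energy for a bond of length r (in Angstroms)."""
    return harmonic_energy(K_R, r, R_EQ)

def angle_energy(theta_degrees):
    """Angle bending energy for an angle given in degrees."""
    return harmonic_energy(K_THETA, math.radians(theta_degrees), THETA_EQ)

if __name__ == "__main__":
    for r in (1.43, 1.53, 1.63):
        print(f"r = {r:.2f} A  E_bond = {bond_energy(r):.3f} kcal/mol")
    for theta in (104.5, 109.5, 114.5):
        print(f"theta = {theta:.1f} deg  E_angle = {angle_energy(theta):.3f} kcal/mol")
```

The energy is zero at the equilibrium value and rises symmetrically on either side, which is the spring-like behavior described above.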
$$E_{torsion} = \sum_{dihedrals} \frac{V_n}{2}\left[1 + \cos(n\Phi - \gamma)\right] \qquad (1.03c)$$
The dihedral angle contribution to the potential energy function.
The torsion angle describes rotation about a bond. For any set of four
covalently bonded atoms ABCD, the torsion angle is the angle measured
about the BC axis from the ABC plane to the BCD plane. The periodic nature of the
torsion angle, and of the torsional potential energy, lends itself to description by
periodic functions such as a Fourier series, with the series typically truncated at the third
term (Eq. 1.03c).
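
The geometric definition above can be sketched in Python as follows. The function names, the sign convention of the angle, and the threefold barrier height are assumptions made for illustration; here n denotes the periodicity of each Fourier term, and the energy function sums terms of the form V_n/2 [1 + cos(nΦ − γ)].

```python
import math

def _sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

def _dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dihedral_angle(a, b, c, d):
    """Torsion angle (radians) about the B-C axis for four points A, B, C, D."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1 = _cross(b1, b2)  # normal to the ABC plane
    n2 = _cross(b2, b3)  # normal to the BCD plane
    y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.atan2(y, x)

def torsion_energy(phi, terms):
    """Sum of V_n/2 * (1 + cos(n*phi - gamma)) over (V_n, n, gamma) tuples."""
    return sum(0.5 * v_n * (1.0 + math.cos(n * phi - gamma))
               for v_n, n, gamma in terms)

if __name__ == "__main__":
    # Planar trans arrangement of four atoms (hypothetical coordinates).
    phi = dihedral_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
    # Single threefold term with an assumed barrier height of 0.3 kcal/mol.
    terms = [(0.3, 3, 0.0)]
    print(f"phi = {math.degrees(phi):.1f} deg, "
          f"E_torsion = {torsion_energy(phi, terms):.3f} kcal/mol")
```

With a single threefold term and zero phase, the energy is lowest at the staggered angles (60° and 180°) and highest at the eclipsed angle (0°).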

$$E_{vdw} = \sum_{i<j}\left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right] \qquad (1.03d)$$
The VdW contribution to the potential energy function.
The Van der Waals (VdW) energy relates to non-bonded interactions of atoms as
a function of the distance between the nuclei. As two atoms approach one another,
London dispersion forces predominate, creating a net attractive force between them.
When the two nuclei get too close, VdW repulsion comes into play. The
attractive and repulsive parts of the potential energy are described by the Lennard-Jones
(6-12) potential, though the more computationally demanding Buckingham potential can
also be used (Eq. 1.03d). Parameters for the VdW energy term are obtained by measuring
non-bonded contact distances in crystals as well as VdW contact data for rare gas atoms,
though other non-experimental sources (simulations) can also be used (Boyd and
Lipkowitz, 1982; Cornell et al., 1995).
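A minimal sketch of the Lennard-Jones (6-12) potential follows. It is written in the well-depth form ε[(r_min/r)^12 − 2(r_min/r)^6], which is equivalent to A/r^12 − B/r^6 with A = ε·r_min^12 and B = 2ε·r_min^6. The parameter values are hypothetical and serve only to show the repulsive wall, the minimum and the attractive tail.

```python
def lennard_jones(r, epsilon, r_min):
    """Lennard-Jones 6-12 energy.

    epsilon : well depth (energy units)
    r_min   : interatomic distance at which the energy is lowest
    """
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


# Hypothetical parameters for illustration only.
EPSILON = 0.1   # kcal/mol
R_MIN = 3.8     # Å

for r in (3.0, 3.8, 6.0):
    print(f"r = {r:.1f} Å  E = {lennard_jones(r, EPSILON, R_MIN):+.4f} kcal/mol")
```

Short distances give a steep positive (repulsive) energy from the r^-12 term, while the r^-6 dispersion term dominates at longer range.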
$$E_{elec} = \sum_{i<j} \frac{q_i q_j}{\varepsilon r_{ij}} \qquad (1.03e)$$
The electrostatics contribution to the potential energy function.
The last term in the potential energy function calculates the electrostatic energy
associated with the interaction of two point charges, as described by Coulomb’s law.
The electrostatic energy of interaction between two point charges is directly
proportional to the product of the charges and inversely proportional to the distance
between them, while the corresponding force is inversely proportional to the square of
that distance (Eq. 1.03e). Applications of MM force fields include, but are not limited
to, energy minimization, scoring, docking, molecular dynamics and Monte Carlo methods.
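The Coulomb term can be illustrated with a short Python sketch. The conversion factor of roughly 332 kcal·Å/(mol·e²) is the value commonly used when charges are in units of the elementary charge and distances in Å; the function names are hypothetical.

```python
COULOMB_CONSTANT = 332.06  # kcal*Å/(mol*e^2), approximate conversion factor


def coulomb_energy(q_i, q_j, r_ij, dielectric=1.0):
    """Pairwise Coulomb energy q_i*q_j / (eps * r_ij) in kcal/mol."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r_ij)


def coulomb_force_magnitude(q_i, q_j, r_ij, dielectric=1.0):
    """Magnitude of the Coulomb force, which falls off as 1 / r_ij**2."""
    return COULOMB_CONSTANT * abs(q_i * q_j) / (dielectric * r_ij ** 2)


# Opposite unit charges at 3 Å, in vacuum and in a water-like dielectric.
e_vacuum = coulomb_energy(1.0, -1.0, 3.0)
e_water = coulomb_energy(1.0, -1.0, 3.0, dielectric=78.5)
print(e_vacuum, e_water)
```

Doubling the distance halves the energy but quarters the force, and a high dielectric constant scales both down by the same factor.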
Over the past decade, the development of techniques such as high-throughput X-
ray crystallography has expedited the rate of macromolecular structure determination
resulting in a current total of ~70000 crystallographic or solution structures of proteins
deposited in the Protein Data Bank (Berman et al., 2000). The availability of this wealth
of structural information, along with well-documented successes, has generated
considerable interest in the advancement of structure-based drug design (SBDD) techniques
(Marrone et al., 1997). A number of structure-based screening methods have been
developed to expedite pharmaceutical research. These methods have been used in lead
discovery to identify novel chemical entities showing strong inhibitory activity towards a
target and in lead optimization where the careful selection of an optimized lead within a
set of chemically similar compounds is required. The following sections will include an
overall review of the methods which have been most crucial to the development of
molecular modeling in drug design, namely predicting binding free energy and binding
modes, and will be followed by a review of Molecular Dynamics & Virtual Screening
methods which have together significantly contributed to the advancement of CADD.
1.3 Predicting Binding Free Energies - Scoring
Calculations of binding free energies play an important role in the accuracy of
SBDD techniques (Raha and Merz, 2005). The major function of such techniques is in
providing estimates of binding free energies at a faster rate and lower cost than is
possible by experimental means. As such, a good correlation between experimental and
computationally derived binding free energies to a target is a prerequisite to their success
in drug design.
The selective binding of a small molecule to a target protein is the result of
complementary structural and energetic features. The binding reaction is governed by the
standard Gibbs free energy of binding $\Delta G^{\circ}_{bind}$ under standard state conditions
(concentrations of 1 M, a temperature of 298 K and a pressure of 1 atm). The experimentally
determined association, dissociation and inhibitory constants (KA, KD and Ki respectively)
relate to the standard Gibbs free energy as follows:
$$K_A = \frac{1}{K_D} = \frac{1}{K_i} = \frac{[PL]}{[P][L]} \qquad (1.04)$$
$$\Delta G^{\circ}_{bind} = -RT \ln K_A \qquad (1.05)$$
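As a worked example of the relation between the binding constants and the standard free energy, the sketch below converts a dissociation constant into ΔG° using ΔG° = −RT ln K_A = RT ln K_D, with K_D expressed in mol/L relative to the 1 M standard state. The function name and the example K_D values are illustrative assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy_from_kd(kd_molar, temperature=298.0):
    """Standard binding free energy (kcal/mol) from K_D in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


for kd in (1e-6, 1e-9):  # micromolar and nanomolar binders
    print(f"K_D = {kd:.0e} M  ->  dG = {binding_free_energy_from_kd(kd):.2f} kcal/mol")
```

At 298 K each tenfold improvement in K_D corresponds to roughly 1.4 kcal/mol of additional binding free energy, which gives a sense of the accuracy a scoring method needs in order to rank compounds reliably.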
As the binding free energy of a system is a state function, theoretical calculations can
approximate it in a direct fashion, by calculating the free energies of the protein and
ligand individually and then of their complex:
$$\Delta G_{bind} = G_{complex} - (G_{protein} + G_{ligand}) \qquad (1.06)$$
where $\Delta G_{bind}$ is the free energy of binding, $G_{complex}$ is the free energy of the complex,
and $G_{protein}$ and $G_{ligand}$ are the free energies of the protein and ligand respectively. Another
form of expression of the binding free energy is its decomposition into different
additive free energy components integrated into a single equation:
$$\Delta G_{bind} = \Delta G_{int} + \Delta G_{solv} + \Delta G_{motion} + \Delta G_{conf} \qquad (1.07)$$
In Eq. 1.07, $\Delta G_{int}$ is the interaction free energy owing mostly to electrostatic and steric
enthalpic contributions from complex formation, $\Delta G_{solv}$ is the free energy of solvation
which accounts for solvent effects in binding, $\Delta G_{motion}$ is the free energy change
associated with changes in the motion of the components of the system, and $\Delta G_{conf}$
accounts for the free energy due to conformational changes upon complexation. Scoring
functions address these components of the binding free energy differently. Section 1.3.2
is a review of the different methods employed to evaluate them. However, before
addressing the differences between scoring methods, an extensive review of solvation
effects on binding is appropriate, as the development of implicit solvation models has had
a tremendous impact on the calculation of solvation free energies and hence on our ability
to estimate binding free energies (Tomasi and Persico, 1994; Orozco and Luque, 2000).

1.3.1 Effect of water: Continuum (Implicit) Solvation energy
Protein-ligand binding is a process that normally occurs within an aqueous
environment. Interactions with the solvent play a significant role in binding energetics
and are thus taken into account when making binding free energy predictions. The
dielectric constant of water at 25°C is 78.5 while that of vacuum is 1. The solvation
energy of a charged or polar atom is the result of a favorable interaction between the
atomic charge and this high-dielectric environment. As a result of this favorable
interaction, there is an energy penalty when polar parts of the ligand are removed from
their contact with water and exposed instead to the binding site.
Additionally, the presence of water results in the effective screening of charge-charge
interactions, as indicated by the dielectric constant in the Coulomb equation (Eq. 1.03e).
However, the interface of a protein-ligand complex usually excludes water molecules.
In order to account for the distance dependence of the effect of water
on charge-charge screening, a crude screening model that contains a distance-dependent
dielectric constant was introduced. In this model, for all atoms i and j in Eq. 1.03e the
effective dielectric constant is ε = C·rij, where C is a constant and rij is the
interatomic distance. While this model allows for the rapid calculation of one of the
major effects of water, it does not account for the one-body solvation energy of each
atom. Additionally, the electrostatic interaction between two atoms is affected by the
positions of all other protein and ligand atoms, which should also be taken into
account.
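The distance-dependent dielectric model can be sketched in a few lines. The value C = 4 used as a default below is an assumption made for illustration (it is a commonly encountered choice), as is the conversion factor of about 332 kcal·Å/(mol·e²).

```python
def screened_coulomb_energy(q_i, q_j, r_ij, c=4.0, coulomb_constant=332.06):
    """Coulomb energy with a distance-dependent dielectric eps = c * r_ij.

    Substituting eps = c * r_ij into q_i*q_j / (eps * r_ij) makes the
    energy decay as 1 / r_ij**2 instead of 1 / r_ij.
    """
    eps = c * r_ij
    return coulomb_constant * q_i * q_j / (eps * r_ij)


for r in (2.0, 4.0, 8.0):
    print(f"r = {r:.0f} Å  E = {screened_coulomb_energy(1.0, -1.0, r):+.3f} kcal/mol")
```

The faster decay mimics the screening of long-range interactions by solvent, but, as noted above, the model assigns no solvation energy to an isolated charge.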

Continuum (implicit) solvation models can account for the additional
complexities of electrostatic interactions. The continuum solvation models essentially
treat the solvent as a bulk dielectric medium with a dielectric constant of ~80 (Dout=78.5
for water at 25°C) and the protein/ligand as low-dielectric regions with enclosed atomic
charges. Numerical solutions to the Poisson Boltzmann (PB) equation provide an
efficient means of calculating the electrostatic potential produced by a system. The PB
equation relates the electrostatic potential Φ(r) to the charge density ρ(r) as:
$$\nabla \cdot \left( \varepsilon(\mathbf{r}) \nabla \Phi(\mathbf{r}) \right) = -4\pi \rho(\mathbf{r}) \qquad (1.08)$$
where ε(r) is the dielectric constant. The total free energy of solvation is calculated as
follows:
$$\Delta G_{solv} = \Delta G_{elec} + \Delta G_{np} \qquad (1.09)$$
The potential Φ(r), obtained from solving the PB equation (Eq. 1.08), allows for the
computation of the total electrostatic energy component of the solvation free energy:
$$G_{elec} = \frac{1}{2} \sum_i q_i \Phi(\mathbf{r}_i) = \frac{1}{2} \sum_i q_i \left( \phi^{C}(\mathbf{r}_i) + \phi^{R}(\mathbf{r}_i) \right) \qquad (1.10)$$
where $\phi^{C}$ and $\phi^{R}$ are respectively the Coulomb and reaction field potentials. The
Coulombic component is calculated as a Coulomb summation over all charges other than
qi:
$$\phi^{C}(\mathbf{r}_i) = \frac{1}{D_{in}} \sum_{j \neq i} \frac{q_j}{r_{ij}} \qquad (1.11)$$
where $D_{in}$ is the solute dielectric constant. The reaction field component $\phi^{R}$ of the
electrostatic potential is derived from numerical solutions to the PB equation, using
either a finite difference method (FDM) or a boundary element method (BEM) (Gilson
et al., 1988; Honig and Nicholls, 1995; Purisima and Nilar, 1995). Therefore, once a
solution to the PB equation is calculated, the electrostatic component of the solvation
free energy is obtained.
The non-polar component is derived from surface area terms. It contains
contributions from cavity formation and solvent-solute dispersion-repulsion interactions.
These terms are often considered to be proportional to the molecular surface area (Floris
and Tomasi, 1989; Still et al., 1990; Gogonea and Merz, 1999). The general formula for
this term is therefore:
$$\Delta G_{np} = \sum_i \tau_i A_i \qquad (1.12)$$
where Ai is the surface area of one solute atom and τi is a surface tension parameter
specific for that atom. Typically, the molecular surface will be defined as the solvent-
excluded surface area or the solvent-accessible surface area. The solvent-excluded
surface may however perform better than other surface models (Pitarch et al., 1996).

The following sections will take a closer look at the two most commonly employed
methods for solving the PB equation, namely the Finite Difference (FDM) and Boundary
Element (BEM) methods.
1.3.1.1 Finite Difference Method
While the FDM was first introduced by Warwicker and Watson, the work of
Honig’s group in the development of the Delphi program has widely popularized its use
(Warwicker and Watson, 1982; Gilson et al., 1988). In the FDM, a cubic lattice is first
superimposed onto the solute and surrounding solvent where values of the electrostatic
potential, charge density, dielectric constant and ionic strength are assigned to grid points
within the lattice (Fig. 1.05). As the positions of the atoms of the solute usually fall
between grid points, the allocated charge at each of the eight neighboring grid points is
assigned proportionally to the distance of the grid point to the charge. A finite difference
formula then calculates the derivatives of the PB equation.
Figure 1.05 Cubic grid scheme for the Finite Difference Method.
(Taken from Folgaro et al., 2002)
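The finite difference idea can be sketched in one dimension. The example below is a deliberately simplified illustration and not the scheme implemented in Delphi: it assumes a uniform dielectric, omits the ionic strength term, fixes the potential to zero at both ends of the grid, and relaxes the grid values with plain Gauss-Seidel sweeps. The second derivative is replaced by the central difference (φ[i−1] − 2φ[i] + φ[i+1]) / h².

```python
import math


def solve_poisson_1d(rho, h, eps=1.0, n_iter=2000):
    """Gauss-Seidel solution of d2(phi)/dx2 = -4*pi*rho/eps on a uniform grid.

    rho : list of charge densities at the grid points
    h   : grid spacing
    The potential is held at zero on the first and last grid points.
    """
    n = len(rho)
    phi = [0.0] * n
    for _ in range(n_iter):
        for i in range(1, n - 1):
            source = 4.0 * math.pi * rho[i] / eps
            # Central-difference equation solved for the grid point i.
            phi[i] = 0.5 * (phi[i - 1] + phi[i + 1] + h * h * source)
    return phi


# Uniform charge density on a grid of 21 points spanning a unit length.
n_points, length = 21, 1.0
h = length / (n_points - 1)
phi = solve_poisson_1d([1.0] * n_points, h)
print(f"potential at the grid midpoint: {phi[n_points // 2]:.6f}")
```

For this test case the analytic solution is φ(x) = 2πx(L − x), so the midpoint value should approach π/2. Production codes work on three-dimensional grids with a position-dependent dielectric and use faster iterative solvers.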
1.3.1.2 Boundary Element Method
The BEM is an alternative approach to the FDM for solving the PB equation. In the
BEM, the potential is represented through a charge density spread over the molecular
surface (Zauhar and Varnek, 1996). Instead of directly solving the PB equation, the BEM
considers the induced surface charge to develop an integral formulation of the problem.
This is expressed as:
$$\phi^{R}(\mathbf{r}) = \int_S \frac{\sigma(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, d^2\mathbf{r}' \qquad (1.13)$$
where $\phi^{R}$ is the electrostatic potential due to the surface charge distribution, σ(r') is the
surface charge density and the integral is taken over the entire molecular surface S
(Zauhar and Morgan, 1985; Purisima and Nilar, 1995). The SIE and SIETRAJ scoring
functions used throughout this thesis for computation of binding free energies compute
the reaction field energy using the BRI-BEM program, which utilizes the BEM (Purisima
and Nilar, 1995; Purisima EO, 1998; Naïm et al, 2007; Cui et al., 2008).
1.3.1.3 Desolvation cost
Continuum solvation studies on the energetics involved in ligand binding have
been conclusive in noting the large, unfavorable effects of solvent screening on the
overall electrostatic change in free energy (Kuhn and Kollman, 2000; Wang et al., 2001;
Hou et al., 2002). Complex formation between a protein and ligand involves the breakage
and formation of several hydrogen bonds, which includes the reorganization of water
molecules around the ligand and target active site (Fig. 1.06). While the gas-phase
interaction between the ligand and protein is favorable, the desolvation of the binding
pocket involved in ligand binding results in an overall large energetic penalty. Hence,
ligand binding is suggested to be primarily driven by short-range (vdW) and long-range
hydrophobic forces (Hünenberger et al., 1999; Kuhn and Kollman, 2000; Wang et al.,
2001; Hou et al., 2002). This phenomenon can be better described by looking at the
electrostatic component of the binding free energy. The electrostatic change in binding
free energy is expressed in Eq. 1.14 as the sum of the change in reaction field and
Coulomb binding free energies:
$$\Delta G^{elec}_{bind} = \Delta G^{R}_{bind} + E^{C}_{inter} \qquad (1.14)$$
where $\Delta G^{R}_{bind}$ is the change in reaction field energy and $E^{C}_{inter}$ is the change in
intermolecular Coulomb energy. Computational studies have noted that while the
intermolecular Coulomb energy favors binding, the desolvation effects are incompletely
compensated by ligand-target interaction in the bound state, resulting in an unfavorable
effect on the binding free energy (Hendsch and Tidor, 1999; Miyashita et al., 2003; Sims
et al., 2005).

Figure 1.06 Representation of desolvation effects during ligand-protein complex
formation. (Taken from Cozzini et al., 2004)
1.3.2 Scoring Functions
In their most general use, scoring functions are designed to predict binding modes
of inhibitors and provide an accurate discrimination between true and false binding
modes. Virtual screening pipelines have used scoring functions to improve enrichment
rates and improve lead compound identification (Grüneberg et al., 2002; Seihert MH,
2009). The master equation (Eq. 1.07) described is used as an overall guide for the
development of scoring functions. The following is an overview of scoring functions with
respect to their ability to predict binding affinity and/or affinity ranking, though their
ability to predict binding modes is also of interest (Halperin et al., 2002). Scoring
functions may use all or some of the different terms expressed in Eq. 1.07. Due to the
computational demands in assessing a more rigorous representation of the master
equation, most scoring functions employ a more empirical expression, taking only the
most dominant contributions into account. This provides a fast and accurate means of
predicting binding modes and ranking potential lead compounds in a virtual screening
setting. The three categories of scoring functions that will be reviewed include physical-
chemical, knowledge-based, and empirical functions.
1.3.2.1 Physical-Chemical
The most prominent physical-chemical scoring function is the Molecular
Mechanics/Poisson-Boltzmann Surface Area (MM-PBSA) function (Kollman et al.,
2000). The overall format of the function can be summarized as follows:
$$\Delta G_{bind} = \Delta E_{MM} + \Delta G^{complex}_{solv} - \Delta G^{protein}_{solv} - \Delta G^{ligand}_{solv} - T\Delta S \qquad (1.15)$$
where $\Delta E_{MM}$ is the sum of the Coulomb electrostatics and vdW interaction energies
calculated using MM force field packages such as AMBER and CHARMM (Case et al.,
2005; Brooks et al., 2009). The $T\Delta S$ term is usually evaluated using normal mode
analysis of an MD trajectory. All calculations are ensemble averages over snapshots
taken from an MD trajectory. Therefore, the MM-PBSA energy is calculated from averages
of a finite number of snapshots from the ensemble and, as such, the quality of the results
is sensitive to the details of the MD simulation.
The Solvated-Interaction Energy (SIE) scoring function is another example of a
physics-based scoring function. It makes use of force-field parameters and equations to
make estimates of the binding affinity of molecules to a target. The format of the SIE
equation is as follows:
$$\Delta G_{bind} = E^{C}_{inter} + \Delta G^{R}_{bind} + E^{vdW}_{inter} + \Delta G^{np}_{bind} \qquad (1.16)$$
For the electrostatic component of the free energy ($E^{C}_{inter} + \Delta G^{R}_{bind}$), the electrostatic
intermolecular interaction energy $E^{C}_{inter}$ is estimated using Coulomb’s law and $\Delta G^{R}_{bind}$ is
the change in reaction field solvation energy calculated using the BRI-BEM boundary
element method (Purisima and Nilar, 1995; Purisima, 1998). The change in solvation
energy is calculated as the difference between the bound and free states. For the nonpolar
components of the free energy of binding, $\Delta G^{np}_{bind}$ is the change in the non-electrostatic
solvation energy and $E^{vdW}_{inter}$ is the vdW interaction energy calculated using the LJ 6-12
potential between the ligand and protein atoms. $\Delta G^{np}_{bind}$ is calculated as the difference
between the bound and free states of the solute-water VdW energy and cavitation cost in
water.
As described in Section 1.3.1, the cavitation cost (nonpolar component of the
solvation energy) is proportional to the surface area (Eq. 1.12). The definition of the
surface area can differ from function to function. In the case of SIE, the molecular
surface area is calculated as the solvent-excluded surface area (Naim et al., 2007).
$$\Delta G^{cav}_{bind} = \gamma \cdot \Delta MSA \qquad (1.17)$$
where ΔMSA is the change in molecular surface area upon binding. However, to the
cavitation cost is also added the loss of intermolecular VdW interaction
between solute and solvent. This is accomplished by linearly scaling the solute-solute
intermolecular VdW energy by a factor β, thereby accounting for the loss of solute-solvent
VdW interactions upon complex formation:
$$\Delta G^{np}_{bind} = (\beta - 1) E^{vdW}_{inter} + \gamma \cdot \Delta MSA \qquad (1.18)$$
The complete parameterization of the SIE scoring function is dependent on a number of
variables that include the solute dielectric constant (Din), solute atomic radii {ri}, SA
scaling coefficient (γ), vdW interaction energy scaling coefficient (β) and fitting constant
(C) (Naim et al., 2007):
$$\Delta G_{bind}(\{r_i\}, D_{in}, \beta, \gamma, C) = E^{C}_{inter}(D_{in}) + \Delta G^{R}_{bind}(\{r_i\}, D_{in}) + \beta \cdot E^{vdW}_{inter} + \gamma \cdot \Delta MSA(\{r_i\}) + C \qquad (1.19)$$
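The way the SIE terms combine can be sketched as follows, as read from the description above: the intermolecular vdW energy is scaled by β, the surface area change by γ, and a constant C is added. The function names and every numerical value below are hypothetical placeholders; they are not the published SIE parameters, and in practice the individual energy terms come from force-field and BRI-BEM calculations.

```python
def sie_nonpolar_term(e_vdw, delta_msa, beta, gamma):
    """Non-polar contribution (beta - 1) * E_vdW + gamma * dMSA."""
    return (beta - 1.0) * e_vdw + gamma * delta_msa


def sie_binding_free_energy(e_coul, dg_reaction_field, e_vdw, delta_msa,
                            beta, gamma, c):
    """Sum of the SIE terms: E_C + dG_R + beta * E_vdW + gamma * dMSA + C."""
    return e_coul + dg_reaction_field + beta * e_vdw + gamma * delta_msa + c


# Hypothetical component energies (kcal/mol) and buried surface area (Å^2).
terms = dict(e_coul=-20.0, dg_reaction_field=15.0, e_vdw=-40.0, delta_msa=-500.0)
# Hypothetical coefficients, for illustration only.
params = dict(beta=0.1, gamma=0.01, c=-3.0)

dg = sie_binding_free_energy(**terms, **params)
print(f"estimated dG_bind = {dg:.2f} kcal/mol")
```

Adding the unscaled vdW energy to the non-polar term gives the same β-scaled contribution, which is why the two ways of writing the function are equivalent.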
One general issue with empirical scoring functions is tied to their training set which can
lead to an overall bias towards targets that have been explored as part of it (i.e. the
training set on which they are parameterized may represent a bias towards the
composition and diversity contained in it) (Gohlke and Klebe, 2002; Ferrera et al., 2004).

1.3.2.2 Empirical (Regression) Scoring Functions
Empirical scoring functions weight contributions from different energetic
terms in order to make a binding affinity prediction. These terms may include hydrogen-
bonding terms based on geometric measures as well as FF-based physical potentials.
However, the linear weighting of the terms is derived from regression methods that fit the
computed terms to experimental affinities using experimental binding data and structural
information. The regression analysis optimizes the weights to provide a maximal
correlation between computed and experimental binding affinities in the training set
(Bohm et al., 1994; Verkhivker et al., 1995; Head et al., 1996; Naim et al., 2007).
1.3.2.3 Knowledge-based Scoring Functions
Knowledge-based (KB) scoring functions use statistical potentials that are derived
from databases of protein-ligand complexes such as the PDB (Koppensteiner and Sippl,
1998; Muegge et al., 2000; Gohlke and Klebe, 2001). The use of KB potentials for the
scoring of protein-ligand complexes was inspired by the success of such potentials in
predicting protein folding and structure (Sippl, 1990; Sippl, 1993; Sippl et al., 1996). In
KB functions, occurrences of interacting pairs of atoms in a training set of complexes are
used to derive statistical potentials that resemble, but are not, potentials of mean force
(Ben-Naim, 1997). In doing so, certain assumptions are made. The first is that the
protein-ligand complex structures are in a state of thermodynamic
equilibrium, while the second is that the distributions of atoms in the complexes obey
Boltzmann’s law (Sippl et al., 1993; Mullinax and Noid, 2010).
KB potentials are built by first calculating a distance-dependent probability
distribution of atom pairs. The Helmholtz free energy is then calculated per atom pair in
the protein-ligand complex:
$$A_{ij}(r) = -k_B T \ln \frac{\rho_{ij}(r)}{\rho^{bulk}_{ij}} \qquad (1.20)$$
where ρij(r) is the pair correlation function for an atom pair of type ij at distance r while
$\rho^{bulk}_{ij}$ is a normalization factor representing the bulk density for the atom pair when they
are not interacting. A few notable examples of KB scoring functions
include the piecewise linear potential (PLP), PMFScore and DrugScore (Verkhivker et
al., 1995; Muegge and Martin, 1999; Gohlke et al., 2000).
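The inverse-Boltzmann step behind a knowledge-based potential can be sketched as below. The binned pair densities and the bulk density are made-up numbers for a single hypothetical atom-pair type; real potentials are derived from thousands of complexes and include corrections that are not shown here.

```python
import math

K_B = 1.987204e-3  # Boltzmann constant per mole, kcal/(mol*K)


def kb_potential(pair_density, bulk_density, temperature=298.0):
    """Per-bin energy -k_B*T*ln(rho_ij(r) / rho_bulk).

    pair_density : observed densities of the atom pair in each distance bin
    bulk_density : reference density of the non-interacting pair
    Bins with no observations are returned as None (potential undefined).
    """
    kt = K_B * temperature
    return [-kt * math.log(rho / bulk_density) if rho > 0.0 else None
            for rho in pair_density]


# Hypothetical densities for distance bins from short to long range.
densities = [0.0, 0.5, 2.0, 1.0]
potential = kb_potential(densities, bulk_density=1.0)
print(potential)
```

Distances observed more often than expected from the bulk density receive a favorable (negative) energy, while under-represented distances are penalized.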
1.3.2.4 Problems
The major shortcoming of most scoring techniques like SIE is that they only
consider a single receptor-compound complex in estimating binding free energies for
what is, in nature, a dynamic process resulting from an ensemble of such complexes. The
use of ensembles as part of a Virtual Screening pipeline is explored in Chapter 5 of this
thesis. Despite phenomenal advances in computational power and
technologies, accurate estimation of binding free energies remains challenging.
1.4 Predicting Binding Modes – Docking
Predicting binding modes of ligands to a target protein structure, also known as
docking, has been a key component of in silico techniques used in structure-based,
rational drug design (Kuntz I, 1992; Cavasotto and Orry, 2007). Docking schemes
attempt to find the optimal matching between a ligand and a targeted protein. In essence,
the problem can be reduced to the following: given the atomic coordinates of these two
molecules, predict the proper conformation of the complex. One assumption that is
usually made in the docking problem is prior knowledge of the binding site targeted by
the ligand.
Docking schemes are typically validated by their ability to reproduce
experimental data through docking studies where protein-ligand complex conformations
are obtained in silico and compared to structures obtained by experimental means (i.e. X-
ray crystallography or nuclear magnetic resonance). Since predicting the correct bound
conformation of both the protein and ligand is a challenging and computationally
expensive task, the problem is usually reduced to the following: given the proper “bound”
conformation of the protein, predict the proper bound conformation of the ligand and
complex. This problem is the focus of the large majority of docking algorithms though a
few incorporate a sampling of receptor conformation as well to optimize the predicted
complex coordinates.
The main purposes of docking algorithms can be divided into two groups though
the function of one is not mutually exclusive of the other. The first emphasizes speed and
accuracy, where the main goal is the rapid screening of millions of potential candidate
molecules for the discovery of a few active compounds in virtual screening (see Section
1.6). The second emphasizes accuracy of the complex structure, attempting to bridge the
gap closer and closer between the predicted complex and the experimental structure.
Docking programs search through a large selection of possible fits between a ligand and
the targeted binding pocket and assess the best fit between them by taking into account
several parameters. These parameters are akin to those used in scoring functions, which is
in essence what they are. In this case however, the scoring scheme is optimized to
retrieve the binding mode that is closest to the experimental structure as measured by
RMSD.
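The RMSD used to compare a docked pose with the experimental binding mode can be computed as in the sketch below. It assumes that both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is performed; the coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical crystal and docked ligand coordinates (Å).
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]
print(f"RMSD = {rmsd(crystal, docked):.2f} Å")
```

A cutoff of about 2 Å on heavy-atom RMSD is a commonly used convention for calling a predicted pose successful, although the appropriate threshold depends on ligand size and symmetry.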
1.4.1 Docking Algorithms
1.4.1.1 Fast Shape Matching
Shape Matching (SM) algorithms primarily take into account the overall geometrical
overlap between the protein and ligand molecules. Shape matching methods employ a
variety of algorithms in order to assess proper conformations of the ligand and binding
site to be matched.

Rigid-body docking applications are mainly SM-based. Examples include
ZDOCK as well as our own internally developed docking program (Chen et al., 2003;
unpublished). Flexible docking algorithms can also use SM methods as part of their
strategy. For instance, DOCK combines incremental construction and a sphere matching
algorithm in order to identify an optimal geometrical alignment (Kuntz et al., 1982).
The development of methods that analytically calculate the solvent-accessible molecular
surface was a key contributor that allowed the development of these SM applications
(Connolly et al., 1983a; Connolly et al., 1983b).
1.4.1.2 Incremental construction
Incremental construction (IC) methods divide the ligand into fragments which are
separately docked onto the surface of the binding site. The fragments identified as rigid
“anchor” regions of the ligand are typically docked first and the fragments identified as
flexible regions are added sequentially with a systematic scanning of the torsion angles
around the anchors. Following the docking, rigid fragments are then fused together for an
optimal orientation of the molecule to be obtained. This fragmentation of the molecule is
a means of incorporating ligand flexibility into docking.
The first IC algorithm was part of the DOCK program (Desjarlais et al., 1986). In
DOCK, the rigid fragments were first docked independently and each combination of the
rigid fragments was combined as in the original compound if the atoms were within
certain distances of each other. Other methods utilizing IC include FlexX, FLOG,
Hammerhead and Surflex (Miller et al., 1994; Rarey et al., 1996; Welch et al., 1996;
Kramer et al., 1999; Ewing et al., 2001; Jain AN, 2007).
1.4.1.3 Monte Carlo Simulations
The Monte Carlo (MC) method was developed in its present form by Metropolis,
Ulam and von Neumann during their work on the Manhattan Project (Metropolis and Ulam,
1949; Metropolis et al., 1953). Historically, it was used to perform the first computer
simulation of a molecular system. MC simulations were later integrated as a means of
adding flexibility to docking algorithms (Liu and Wang, 1999). With respect to docking
algorithms, MC simulations attempt to position the ligand within the binding site through
a number of random translational and rotational changes. The advantage of the added
randomness to the sampling is a decreased likelihood of being trapped in local minima.
The standard (Metropolis) MC methods generate configurations of a system
through random Cartesian changes. Each change to the system is evaluated and then
rejected or accepted based on a Boltzmann probability. One example of a MC-based
docking application is the Internal Coordinates Mechanics (ICM) program (Abagyan et
al., 1994). ICM initially makes a random move of one of three types: a rigid body ligand
move, a torsion move of the ligand or a torsion move of the receptor side chain (Abagyan
and Totrov, 1994; Abagyan et al., 1994). The side chain movement samples the
conformational space defined a priori through a side-chain rotamer library (Ponder and
Richards, 1987). The side chain sampling allows the algorithm to explore with larger
probability the conformational space which is known to be highly populated. Following
each sampling step, a modified ECEPP/3 scoring function is used to perform a conjugate
gradient local minimization and test whether the conformation is accepted or rejected
using the Boltzmann criterion.
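The Boltzmann (Metropolis) acceptance test mentioned above can be sketched as follows. This is a generic illustration of the criterion and not the ICM implementation; the function name and the `rng` argument are hypothetical conveniences that allow a seeded generator to be passed in.

```python
import math
import random


def metropolis_accept(delta_e, kt, rng=random):
    """Decide whether a trial move with energy change delta_e is accepted.

    Downhill moves are always accepted; uphill moves are accepted with
    probability exp(-delta_e / kt).
    """
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


# Fraction of uphill moves accepted when the energy increase equals kT.
rng = random.Random(42)
n_trials = 20000
accepted = sum(metropolis_accept(0.6, 0.6, rng) for _ in range(n_trials))
print(f"accepted fraction: {accepted / n_trials:.3f}")
```

Because uphill moves are occasionally accepted, the search can climb out of a local minimum, which is the advantage of the added randomness noted earlier.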
1.4.1.4 Evolutionary Programming
Evolutionary programming (EP) algorithms are computational models that take
their name and concept from biological processes. The EP algorithms generally start with
a population of structures characterized by a given set of genes. Parent structures are then
allowed to produce children structures containing a mixture of the structural characteristics
of the parents (as defined by the parents’ genes), throughout which mutations are allowed to
occur. The individuals of the population displaying the most favorable features are kept
while others are discarded, as per Darwin’s principle of natural selection.
Genetic algorithms (GA) are one example of EP algorithms. In GA, a population
of chromosomes (parents) is used to create new chromosomes (offspring). Crossovers
are used to generate the new chromosomes and a complex set of scoring functions is
then used to select members within each round of selection. DOCK and GOLD are two of
the most notable docking programs utilizing variations on the GAs (Ewing et al., 2001;
Verdonk et al., 2003). While EP algorithms can find one of the best solutions to the
docking problem, they, like all heuristic algorithms, can also be trapped in local minima.

1.5 Molecular Dynamics
Molecular recognition between a protein and its ligand is a dynamic and complex
process. An accurate computational representation of this interaction is a problem of
considerable complexity and interest in CADD. Few techniques address this process and
account for the conformational flexibility of both the ligand and receptor. Even fewer do
so in an accurate and efficient manner. Protein flexibility is a multi-factorial, complex
problem owing to the inter- and intramolecular interactions involved in the
conformational dynamics. Of all methods commonly used, Molecular Dynamics (MD)
simulations provide the most complete computational representation of the dynamics
involved in this process.
1.5.1 Newton’s Laws
Molecular dynamics methods solve Newton’s equations of motion for atoms on
an energy surface. Newton’s laws of motion provide the means of generating successive
conformations of the system. The result of these successive conformations is a trajectory
that indicates how the positions and velocities of particles within the system vary with
time. Newton’s laws of motion can be summarized as follows:

First law: Every body remains in a state of constant velocity unless acted upon by an
external unbalanced force. Hence, if the resultant force is zero, then the velocity of the
object is constant (Eq. 1.21):
$$\text{I.} \quad \sum \mathbf{F} = 0 \implies \frac{d\mathbf{v}}{dt} = 0 \qquad (1.21)$$
Second law: A body of mass m subject to a net force F undergoes an acceleration a that
has the same direction as the force and a magnitude that is directly proportional to the
force and inversely proportional to the mass:
$$\text{II.} \quad \mathbf{F} = m \frac{d\mathbf{v}}{dt} = m\mathbf{a} \qquad (1.22)$$
Third law: The mutual forces of action and reaction between two bodies are equal,
opposite and collinear, i.e. whenever a first body exerts a force F on a second body, the
second body exerts a force –F on the first:
$$\text{III.} \quad \mathbf{F}_{1,2} = -\mathbf{F}_{2,1} \qquad (1.23)$$
In order to obtain an accurate trajectory, the differential equation embodied by Newton’s
second law of motion is solved:
$$\frac{d^2 x_i}{dt^2} = \frac{F_{x_i}}{m_i} \qquad (1.24)$$
which describes the motion of a particle of mass mi along one coordinate xi with a net
force $F_{x_i}$ along that direction.
1.5.2 Ensembles
MD simulations are characterized with regards to the macroscopic conditions that
are held constant. Statistical mechanics require that certain macroscopic conditions must
be held constant in order to study the collection of all microstates of a system, its
ensemble. Therefore, ensembles can be characterized by different quantities that include:
volume (V), pressure (P), total energy (E), temperature (T) and number of particles (N).
Ensembles are accordingly named and labeled with respect to the fixed quantities: NVT
(canonical), NVE (micro-canonical) and NPT (isothermal-isobaric).
1.5.3 Verlet Algorithm
As discussed above (see Section 1.2.1), nuclei behave to a good approximation as
classical particles. The dynamics of motion can therefore be extrapolated by solving
Newton’s second equation:
= − = (1.25)

35
Here, V is the potential energy at position x and the vector x is a vector of length 3N
containing the Cartesian coordinates for all particles. With an initial set of particles at
position xi, the positions at a small time-step later can be calculated using a Taylor
expansion (Eq. 1.27, 1.28):
= +
∂
∂t
(Δ ) +
1
2
∂
∂t
(Δ ) +
1
6
∂
∂t
(Δ ) + ⋯ (1.26)
= + (Δ ) +
1
2
(Δ ) +
1
6
(Δ ) + ⋯ (1.27)
where the velocities vi, the acceleration ai and the hyper-acceleration bi are the first,
second and third derivatives of the positions with respect to time. Substituting Δt with -Δt
we obtain the positions at ri-1:
= − (Δ ) +
1
2
(Δ ) −
1
6
(Δ ) + ⋯ (1.28)
By adding equations for ri+1 and ri-1 we are able to calculate the position at Δt later from
the current acceleration, and the previous and current positions.
= (2 − ) + (Δ ) (1.29)

36
where the current acceleration can be obtained from the force or the derivative of the
potential:
=
F
= −
1
(1.30)
As the acceleration is re-evaluated at each time-step from the forces, the positions are
updated at each time-step, which creates the resulting trajectory. This, in essence, is
the Verlet algorithm (Verlet, 1967). Certain disadvantages of the Verlet algorithm have
given rise to the use of alternative algorithms for MD simulations. The first disadvantage
is the tendency towards truncation errors under finite numerical precision. This is a
consequence of adding ai(Δt)² (a small number) to 2ri – ri-1 (a difference of large
numbers) in the calculation of the new positions. The
second is that velocities are not an explicit part of the Verlet algorithm, which creates a
problem in generating constant temperature ensembles (Cuendet and van Gunsteren,
2007). The velocity Verlet algorithm is a variation that addresses these problems (Martys
and Mountain, 1999).
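The update rule of Eq. 1.29 and its velocity Verlet variant can be illustrated with a minimal sketch. The one-dimensional harmonic oscillator, the unit mass and force constant, and all function names below are assumptions chosen purely for illustration; they are not part of any MD package.

```python
def harmonic_accel(x, k=1.0, m=1.0):
    """Acceleration a = F/m = -(1/m) dV/dx for V = k x^2 / 2 (illustrative test system)."""
    return -k * x / m


def verlet(x0, v0, accel, dt, n_steps):
    """Position Verlet: r[i+1] = 2 r[i] - r[i-1] + a[i] dt^2 (Eq. 1.29)."""
    # The previous position is bootstrapped from the backward Taylor
    # expansion (Eq. 1.28), truncated after the dt^2 term.
    x_prev = x0 - v0 * dt + 0.5 * accel(x0) * dt ** 2
    x = x0
    xs = [x0]
    for _ in range(n_steps):
        x_next = 2.0 * x - x_prev + accel(x) * dt ** 2
        x_prev, x = x, x_next
        xs.append(x)
    return xs


def velocity_verlet(x0, v0, accel, dt, n_steps):
    """Velocity Verlet: positions and velocities are both propagated explicitly."""
    x, v, a = x0, v0, accel(x0)
    xs, vs = [x], [v]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt ** 2
        a_new = accel(x)              # forces at the new positions
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        xs.append(x)
        vs.append(v)
    return xs, vs
```

Because velocity Verlet carries the velocities explicitly, the kinetic energy (and hence the instantaneous temperature) is available at every step, which is what thermostat schemes require.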
1.5.4 Considerations
MD simulations require small time-steps because fast motions such as bond stretching and
angle bending must be resolved, which makes them computationally intensive. The size
of the chosen time-step is a critical element affecting the accuracy of the trajectory, with
smaller time-steps providing a better approximation of the expected dynamics of the
system. This, however, also increases the computational cost, as more steps are required
for propagating the system for a given total time. Generally, the longest time-step that can
be taken is limited by the rate of the fastest process being sampled in the system.
Typically, the time-step must be one order of magnitude smaller than the period of the
fastest process. In MD simulations, molecular rotations and vibrations occur with
frequencies in the 10¹¹–10¹⁴ s⁻¹ range. Therefore, time-steps on the order of 10⁻¹⁵ s or less are
required for sampling of these molecular motions. A consequence of this limitation is that
an MD simulation of 1 nanosecond (ns) requires ~10⁶ time-steps to complete. Since
simulations are typically in the nanosecond range, the ~10⁶ or more force evaluations present a
significant computational demand. Additionally, biological phenomena such as protein
folding typically occur on even longer timescales of microseconds or more.
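The arithmetic behind this cost can be made explicit with a short sketch. The function name and the example durations are illustrative assumptions.

```python
def steps_required(total_time_s, time_step_s):
    """Number of integration steps needed to cover total_time_s with a fixed time-step."""
    return round(total_time_s / time_step_s)


one_ns_at_1fs = steps_required(1e-9, 1e-15)   # 1 ns with a 1 fs time-step
one_us_at_1fs = steps_required(1e-6, 1e-15)   # 1 microsecond with a 1 fs time-step
one_ns_at_2fs = steps_required(1e-9, 2e-15)   # doubling the time-step halves the cost
```

Each step requires a full evaluation of the forces, so the step count is a direct measure of the computational effort.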
One solution to this problem involves the freezing of the fastest molecular
motions. This allows for significantly longer time-steps to be used while affecting the
overall accuracy minimally. This is made possible because the fastest processes, the
stretching vibrations, have a minimal impact on the properties of the trajectory. This is
especially true for bonds involving hydrogen atoms. Therefore, freezing of bond lengths
involving hydrogen atoms results in longer simulation times for a given number of
calculated time-steps. The SHAKE and RATTLE algorithms provide the constraints
necessary to maintain bonds involving hydrogen atoms fixed during the simulation and
typically allow time-steps to be increased two to three fold (Ryckaert et al., 1977;
Andersen, 1983).
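The idea behind a bond-length constraint can be sketched for a single bond. The following is a simplified, SHAKE-style iteration for one constraint only, written for illustration; the function name, tolerance and iteration limit are assumptions, and production codes handle many coupled constraints simultaneously.

```python
import numpy as np


def shake_bond(r1_old, r2_old, r1_new, r2_new, m1, m2, bond_length,
               tol=1e-10, max_iter=50):
    """Correct unconstrained positions so that |r1 - r2| equals bond_length.

    Corrections are applied along the bond vector of the previous
    (constrained) step and weighted by inverse mass.
    """
    r1 = np.array(r1_new, dtype=float)
    r2 = np.array(r2_new, dtype=float)
    ref = np.asarray(r1_old, dtype=float) - np.asarray(r2_old, dtype=float)
    inv_mass_sum = 1.0 / m1 + 1.0 / m2
    for _ in range(max_iter):
        s = r1 - r2
        diff = np.dot(s, s) - bond_length ** 2   # constraint violation
        if abs(diff) < tol:
            return r1, r2
        g = diff / (2.0 * inv_mass_sum * np.dot(s, ref))
        r1 -= (g / m1) * ref                     # heavy atom moves little
        r2 += (g / m2) * ref                     # light atom moves more
    raise RuntimeError("SHAKE iteration did not converge")
```

Since the two corrections are equal and opposite when weighted by mass, the centre of mass of the pair is left unchanged by the constraint step.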
Lastly, overcoming energy barriers can be a challenging task given that any
motion of a conformational ensemble outside of its minimum in the potential energy
surface will generate a force pulling the system back towards its minimum. A number of
novel algorithms such as Replica-Exchange and MetaDynamics attempt to overcome this
limitation using different means (Sugita and Okamoto, 1999; Laio and Parrinello, 2002).
1.5.5 Boundary Conditions
MD simulations of a solvated system usually involve several hundred or thousand
molecules of solvent. However, in order for macroscopic properties to be realistically
calculated from a limited number of solvent molecules, boundary effects require special
considerations. When considering that a water-filled 1 L cube contains 3.3 × 10²⁵
molecules of water at room temperature, 2 × 10¹⁹ of which will be interacting with the
cube’s boundary, it is easy to see why using a computationally tractable number of
molecules will be insufficient for deriving bulk properties. In a system containing a few
thousand water molecules, most would be under the influence of interactions with the
boundary.
Periodic boundary conditions basically replicate the bulk properties of a fluid
given a limited number of solvent molecules. The system is usually prepared within the
confines of a box having a cubic or other polyhedral geometry (Bekker, 1997). The box is
then replicated in all directions (Fig. 1.07). If a solvent molecule leaves the box during
the simulation it is replaced by an image particle entering the box from the opposite side
(Fig. 1.07). A constant number of solvent molecules within the box is therefore
maintained. This configuration allows for particles within the system to experience forces
as if they were within bulk fluid.
Figure 1.07 Periodic boundary conditions in molecular dynamic simulations. (box
reproduced from www.phy.cmich.edu/people/petkov/isaacs/phys/pbc.html)
1.5.6 Long-Range Electrostatic Calculations: The Ewald Summation Method
The interaction energy between two point charges decays at a rate proportional to
r⁻¹. This creates a computational problem in considering the long-range electrostatic
contributions from atoms located outside of the central box. One solution is using
spherical cutoffs, which essentially eliminates electrostatic contributions beyond the
cutoff. However, cutoffs have been documented to result in severe artifacts in MD
simulations of peptides and nucleic acids (Schreiber and Steinhauser, 1992; York et al.,
1995).
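The periodic wrapping of Section 1.5.5 and the spherical cutoff can be sketched for a cubic box as follows. The function names and the restriction to a cubic box are illustrative assumptions; the minimum-image rule is only valid when the cutoff does not exceed half the box length.

```python
import numpy as np


def wrap_into_box(positions, box_length):
    """Map coordinates back into [0, box_length): a particle leaving one
    face re-enters through the opposite face."""
    return np.mod(positions, box_length)


def minimum_image_vector(r_i, r_j, box_length):
    """Separation between particle i and the nearest periodic image of particle j."""
    delta = np.asarray(r_i, dtype=float) - np.asarray(r_j, dtype=float)
    return delta - box_length * np.round(delta / box_length)


def within_cutoff(r_i, r_j, box_length, cutoff):
    """Spherical cutoff test applied to the minimum-image distance."""
    distance = float(np.linalg.norm(minimum_image_vector(r_i, r_j, box_length)))
    return distance <= cutoff
```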
Ewald summation methods allow the potential due to the partial charges of a
system and all of their periodic images to be considered. In Ewald summation, the
position of each image box is related to the central box through a vector. Each vector is
therefore an integral multiple of the length of the box. Generally, the contribution of
charge-charge interactions within the central box to the potential energy can be written
as:
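$$V = \frac{1}{2}\sum_{i=1}^{N}\sum_{\substack{j=1 \\ j \neq i}}^{N}\frac{q_{i} q_{j}}{4\pi\varepsilon_{0} r_{ij}} \tag{1.31}$$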
where rij is the distance between charge i and j.
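A direct evaluation of Eq. 1.31 for the central box is sketched below. This is only the plain pairwise sum, not the Ewald method itself, which additionally accounts for all periodic images. The function name and the unit convention (charges in units of e, distances in Å, energies in kcal/mol, with 1/(4πε0) taken as approximately 332.06 in these units) are assumptions made for illustration.

```python
import numpy as np


def coulomb_energy(charges, positions, coulomb_constant=332.06):
    """Pairwise charge-charge energy of Eq. 1.31 within a single box.

    coulomb_constant approximates 1/(4*pi*eps0) in kcal mol^-1 A e^-2
    (assumed unit convention).
    """
    positions = np.asarray(positions, dtype=float)
    n = len(charges)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                      # no self-interaction
            r_ij = np.linalg.norm(positions[i] - positions[j])
            total += charges[i] * charges[j] / r_ij
    return 0.5 * coulomb_constant * total     # factor 1/2 removes double counting
```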
1.6 Virtual Screening
As economic pressures increase to deliver target-optimized drugs
at an accelerated pace and minimal cost, computational methods have become an
increasingly important tool in drug discovery efforts. While numerous challenges
persist in the accurate in silico prediction of ligand-target interactions,
computational methods have already proved themselves in the successful development of
numerous pharmaceutical medications (see Section 1.7). Of note is the role of virtual
screening (VS) in lead discovery efforts. VS provides the ability to analyze large
compound databases and to predict which compounds are most likely to interact
with the desired target and become promising lead candidates. These candidates can then
be tested, and successful molecules can go through rounds of optimization. VS
therefore circumvents the expense incurred through large-scale screening efforts and
narrows the search to a few, high-potential candidates (Oprea and Matter, 2004).
1.6.1 Virtual Screening Pipeline
A VS pipeline is designed to optimize the use of computational resources: efficiency
and speed are favoured at the initial stages and accuracy at the later stages, which gives
the best overall performance of the pipeline. Earlier stages of the filtering process
minimize the use of computational resources, thereby optimizing speed, by using soft
scoring functions. More
extensive calculations and sampling methods are reserved for the later stages of the
pipeline where careful selection of the candidates with the highest potential is required
(Fig. 1.08).

Figure 1.08 The Virtual Screening Pipeline.
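The funnel logic of such a pipeline, with cheap scoring applied to many compounds and expensive scoring reserved for the few survivors, can be sketched as follows. The function name, the dictionary layout and the scores are hypothetical; lower scores are assumed to indicate better predicted binding.

```python
def run_funnel(compounds, stages):
    """Apply successive filtering stages to a compound list.

    stages is a list of (score_function, number_to_keep) pairs, ordered
    from the fastest, least accurate stage to the slowest, most accurate one.
    """
    survivors = list(compounds)
    for score_function, number_to_keep in stages:
        survivors = sorted(survivors, key=score_function)[:number_to_keep]
    return survivors


# Hypothetical scores: "fast" mimics a soft docking score,
# "slow" a more extensive rescoring step.
compounds = [
    {"name": "c1", "fast": -6.0, "slow": -7.5},
    {"name": "c2", "fast": -9.0, "slow": -6.0},
    {"name": "c3", "fast": -3.0, "slow": -9.9},
    {"name": "c4", "fast": -8.0, "slow": -8.5},
]
stages = [(lambda c: c["fast"], 3), (lambda c: c["slow"], 1)]
best = run_funnel(compounds, stages)
```

In this toy data set the compound with the best expensive score (c3) is discarded at the first stage, which illustrates why the early filter must be robust against false negatives.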
1.6.2 The Target
Target selection is the first step in any structure-based drug discovery project.
Several requirements must be met. The first involves the target’s druggability (Hajduk et
al., 2005). The second involves the availability and choice of the 3D structure used for
the screening. X-ray crystallography or NMR structures are the preferred choices though
VS projects have been successfully run on homology models as well (Evers and
Klabunde, 2005). Since the majority of VS software has limited considerations with
regards to target flexibility, the choice of structure should be aimed towards one where
the conformation of the binding site is akin to that expected when bound to a small
molecule (Sousa et al., 2006).
Following the careful selection of target and structure, preparation of the target
structure is another important task in the VS preparation steps. The primary consideration
is in the assignment of proper protonation states to active-site residues. Difficulties arise
due to the effects of local electrostatic conditions on the pKa values of side-chain
functional groups. With respect to the success of VS, proper assignment of side-chain
protonation states is crucial in providing an accurate representation of the binding-site
characteristics. A few alternatives exist which integrate the electrostatic effects in
assessing the protonation states of side-chain functional groups. One example is the H++
server which predicts the protonation states of amino-acid side chain functional groups
within the continuum electrostatic framework (Gordon et al., 2005).

1.6.3 The Compound Database
A database should first provide optimal structural diversity so as to maximize
chances of finding numerous scaffolds displaying activity against the target. Generally,
compounds should also adhere to Lipinski’s rule of five (Lipinski et al., 2001).
Several small-molecule databases exist which are routinely used for VS. These include
the ZINC library, the National Cancer Institute compound database, and the Accelrys
Available Chemical Directory and MDDR libraries (Milne et al., 1994; Irwin and
Shoichet, 2005). Most major pharmaceutical companies also have in-house corporate
libraries.
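A rule-of-five filter can be sketched as below, using the commonly stated thresholds (molecular weight of at most 500 Da, logP of at most 5, no more than 5 hydrogen-bond donors and no more than 10 hydrogen-bond acceptors). The class and function names, the precomputed descriptors and the tolerated number of violations are assumptions for illustration; in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    molecular_weight: float   # Da
    log_p: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d):
    """Count how many of the four rule-of-five criteria are violated."""
    checks = [
        d.molecular_weight > 500.0,
        d.log_p > 5.0,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d, max_violations=0):
    """max_violations is a user choice (assumption); some workflows tolerate one."""
    return rule_of_five_violations(d) <= max_violations


# Hypothetical descriptor values, for illustration only.
small_compound = Descriptors(180.2, 1.2, 1, 4)
large_compound = Descriptors(813.4, 6.1, 2, 9)
```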
1.6.4 The Docking Protocol
The docking protocol is at the core of every VS pipeline. Docking algorithms
attempt to predict the structure of the protein-ligand complex as a first, preliminary filter.
The docking must therefore be fast, as an extremely large number of compounds must be
evaluated. While the docking pipeline may not provide absolute accuracy with regards to
selecting all true-positive compounds, it must be robust enough not to discard moderate
to strong binders as false-negatives across a variety of targets. This preliminary docking
step is typically composed of a docking algorithm (see Section 1.4.1) and a scoring
function. The scoring function used at this step is usually optimized for speed rather than
accuracy; more extensive and accurate functions are usually used at later stages
of the pipeline where a more discriminating assessment of the binding potential of a
smaller number of compounds is required.
1.6.5 MD Simulations
MD simulations are used as a final refinement of the most promising candidate
molecules before the final selection is made. As such, MD simulations on the order of a few
hundred picoseconds to a few nanoseconds are run on the predicted ligand-protein
complex. The goal of MD simulations in this setting is to establish the dynamic
stability of the complex. This is achieved by careful observation of the interactions
between the ligand and target, supported by analysis of the stability of the protein
structure and binding mode. Scoring functions such as SIETRAJ and MM-PBSA can also
be used on the MD simulations to obtain a better assessment as to the potential binding
affinity of the compounds (Kollman et al., 2000; Cui et al., 2008).
1.6.6 Conformational Ensembles
The VS pipeline described typically considers the target as a rigid entity during
the docking process. Since the conformational flexibility of the target is seldom fully
considered during such a process, methods that integrate target flexibility through the use
of conformational ensembles have proved successful (Bursavich et al., 2002; Osterberg et
al., 2002; Barril and Morley, 2005; Amaro et al., 2008). Theoretically, conformational
ensembles allow a fully dynamic representation of the target to be presented to the ligand
for fit. This is akin to what is thought to occur in solution where a ligand binds to a pre-
existing receptor population. The ligand is then exposed to the conformational ensemble
of the receptor and may preferentially bind to conformations that occur infrequently in
the receptor’s dynamics (Ma et al., 2002; Wong and McCammon, 2003). The result is a
shift in the equilibrium population towards that of the preferentially bound conformation
(Ma et al., 2002). The “lock and key” model of ligand binding is therefore thought to
represent one of the rare conformations within this ensemble, and conformational
selection is thought to be a driving force in ligand recognition.
One of the most prominent examples of using conformational ensembles
generated from MD simulations for VS has been implemented as part of the Relaxed
Complex Scheme (RCS) (Lin et al., 2002; Lin et al., 2003; Amaro et al., 2008). The RCS
combines the advantages of docking with the dynamic conformational sampling that is
provided by MD simulations. Through this use of MD simulations, the RCS integrates
extensive conformational sampling of the target structure into the VS pipeline. At the
core of the RCS is an all-atom MD simulation of the target where the simulation time
varies from a few ns to tens of ns (Schames et al., 2004; Amaro et al., 2008; Cheng et al.,
2008). With few exceptions, the AutoDock docking program is typically used to carry out
docking and scoring functions (Morris et al., 2009). Since significant conformational
changes to the active site are induced by ligand binding, a ligand-bound structure is
usually preferred. The resulting trajectory is then reduced to a computationally tractable
ensemble. A number of strategies exist to select a representative subset from the full set
of resulting structures where much of the dynamic information of the trajectory remains.

RMSD-based clustering is an obvious choice for selection of the most dominant
configurations within the trajectory. In their study of avian influenza neuraminidase using
the RCS, Cheng et al. applied RMSD clustering on snapshots extracted every 10ps from
40ns trajectories (Cheng et al., 2008). An alternate but equally effective method is that of
QR-factorization (O’Donoghue and Luthey-Schulten, 2005). QR-factorization was
originally designed for the removal of redundant information from structural databases by
identifying a set of structures which represent the evolutionary conformational space of a
protein. In their study of the Trypanosoma brucei RNA-editing Ligase 1 (TbRel1),
Amaro et al. integrated the use of QR factorization into the RCS in order to extract a
representative set of structures from a 20ns trajectory of the target in complex with ATP,
its native substrate (Amaro et al., 2007; Amaro et al., 2008). For the QR factorization,
snapshots were extracted every 50ps resulting in a set of 400 structures which was
reduced to a total of 33. In both cases the RCS proved extremely successful in identifying
true binders from the original database. For Cheng et al., the weighted average score
from docking into the full representative ensemble of the holo trajectory resulted in the
selection of 25 compounds, 10 of which displayed a Ki under 500µM (Cheng et al.,
2008). For Amaro et al., ranking of the mean score from docking into the QR
representative set resulted in the selection of 10 compounds, 5 of which displayed
inhibition at 10µM or better (Amaro et al., 2008).
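The bookkeeping behind ensemble docking can be sketched as follows: the number of snapshots obtained from a trajectory at a given extraction interval, and the ranking of compounds by their mean (optionally weighted) docking score across the representative structures. The function names and the score matrix layout are illustrative assumptions, not the implementation used in the cited studies.

```python
import numpy as np


def snapshot_count(trajectory_ps, stride_ps):
    """Number of snapshots extracted at a fixed interval from a trajectory."""
    return int(trajectory_ps // stride_ps)


def rank_by_mean_score(score_matrix, names, weights=None):
    """Rank compounds by their average docking score over an ensemble.

    score_matrix[c][s] is the score of compound c docked into structure s;
    lower (more negative) scores are assumed to be better. weights may hold
    one weight per structure, for example cluster populations.
    """
    scores = np.asarray(score_matrix, dtype=float)
    mean_scores = np.average(scores, axis=1, weights=weights)
    order = np.argsort(mean_scores)
    return [(names[i], float(mean_scores[i])) for i in order]
```

For example, `snapshot_count(20000, 50)` corresponds to the 20 ns trajectory sampled every 50 ps described above, and `snapshot_count(40000, 10)` to a 40 ns trajectory sampled every 10 ps.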

1.7 Successes of CADD
Computer-aided drug design techniques have now become a core component of
modern drug discovery and development pipelines (Jorgensen, 2004). One of the most
prominent successes of rational, structure-based drug design (SBDD) is that of imatinib
(Gleevec®), a tyrosine kinase inhibitor for the treatment of Chronic Myelogenous
Leukemia (CML) (Capdeville et al., 2002). Early drug discovery programs for the
treatment of cancer largely focused on inhibition of DNA synthesis and cell division
through the use of antimetabolites (nucleoside analogs and antifolates), alkylating agents
(classical and newer platinum-based therapeutics), microtubule destabilizers (vinca
alkaloids) and microtubule stabilizing agents (taxanes) (Scott, 1970; Scagliotti and
Selvaggi, 2006; Zhou and Giannakakou, 2005). The uncovering of the bcr-abl reciprocal
translocation as the pathogenic event in CML established it as an attractive drug target
(Kelliher et al., 1990). Docking studies and X-ray crystallography established that
Gleevec binds with high affinity to the inactive form of the ATP-binding pocket (Schindler
et al., 2000; Zimmerman et al., 2001). Additionally, SBDD allowed for the analysis of
mutations in the enzyme which give rise to imatinib resistance. This provided an
opportunity for the design of novel pharmaceuticals that are effective in overcoming
imatinib resistance (Weisberg et al., 2007).
The first marketed drug whose development was assisted by SBDD was captopril
(Capoten®), an angiotensin-converting enzyme (ACE) inhibitor used for the treatment of
hypertension (Cushman et al., 1977). Early in development, a peptidic lead
compound had been identified from a snake venom. However, structural information as to
the binding site of ACE was lacking. This led scientists at Squibb to use the structure of
another zinc protease, the recently crystallized carboxypeptidase A, to model the binding
site of ACE. The modeling led to the development of captopril, the first successful design
based on a molecular model. The structure of ACE was eventually determined in 2002,
when it was found that the binding site of ACE differs significantly from that of
carboxypeptidase A, leading to the development of newer, more targeted ACE inhibitors
(Natesh et al., 2003).
The success of CADD in properly assessing the potential binding of a compound
to a target is directly related to our ability to correctly predict the binding affinity of a small
molecule. Many of the limitations of current in silico pipelines stem from the difficulties
in properly and reliably predicting the binding of small molecules to a target (Michel and
Essex, 2010).

Chapter 2
Molecular Dynamics Study of Small Molecule
Inhibitors of the Bcl-2 Family

Preface
The contents presented in the following chapter have been published as presented:
Acoca S, Cui Q, Shore GC, Purisima EO. 2011. Molecular dynamics study of small
molecule inhibitors of the Bcl-2 family. Proteins. 79(9):2624-36.

2.1 Rationale
Molecular modeling techniques have taken an important role in drug
development. This is especially true of molecular simulations and scoring functions
which provide useful insights for the optimization of lead compounds. Obatoclax and
ABT-737 are two novel Bcl-2 inhibitors which have different selectivity profiles for
antiapoptotic Bcl-2 members. While numerous studies have examined the selectivity of
BH3 domains for Bcl-2 members, few have provided conclusive evidence as to the
selectivity of ABT-737. With regard to Obatoclax, the lack of structural data on its binding
mode has also left many questions unanswered as to how it mediates its inhibition of Bcl-
2 members. This study therefore aimed to provide the grounds on which the selectivity of
both ABT-737 and Obatoclax could be understood while identifying the most probable
binding mode of Obatoclax.
2.2 Abstract
We carried out docking and molecular dynamics simulations on ABT-737 and
Obatoclax, which are inhibitors of the Bcl-2 family of proteins. We modeled the binding
mode of ABT-737 with Bcl-XL, Bcl-2, and Mcl-1 and examined their dynamical
behavior. We found that the binding of the chlorobiphenyl end of ABT-737 was quite
stable across all three proteins. However, the phenylpiperazine linker group was
dramatically more mobile in Mcl-1 compared to either Bcl-XL or Bcl-2. The S-phenyl
group at the p4 binding site was well-anchored in Bcl-XL and Bcl-2 but was somewhat
more mobile in Mcl-1 although the phenyl ring itself on average stayed close to the p4
binding site in Mcl-1. This greater mobility is likely due to the greater openness of the p3
and p4 binding sites on Mcl-1. The calculated binding free energies were consistent with
the much weaker binding affinity of ABT-737 for Mcl-1. Obatoclax was predicted to
bind at the p1 and p2 binding sites of Mcl-1 and the binding mode was quite stable during
the molecular dynamics simulation with Mcl-1 wrapping around the molecule. The
modeled binding mode suggests that Obatoclax is able to inhibit all three proteins
because it makes use of the p1 and p2 binding sites alone, which is a fairly narrow groove
in all three proteins unlike the p4 binding site, which is much broader in Mcl-1.
2.3 Introduction
Cancer is fundamentally a disease of dynamic changes in the genome. It has been
described as a multistep process culminating in the acquisition of six essential
alterations in cellular physiology (Hanahan and Weinberg, 2000). Dysregulation of the
apoptotic process has been recognized as one of these critical alterations required for
progression to the disease phenotype (Hanahan and Weinberg, 2000). As such, research
into the regulation of apoptosis has bloomed in the past decade, directed towards a better
understanding of the extensive network of protein interactions that regulate it and the
potential targets that can be used to activate it.

At its core, apoptosis is the mechanism responsible for the careful synchrony of
cellular death observed throughout development, the maintenance of homeostasis and
proper immune function (Krammer et al., 1994; Meier et al., 2000; Elmore, 2007).
There are two pathways (intrinsic and extrinsic) which converge towards activation of the
apoptotic machinery. The extrinsic pathway is characterized by activation of members of
the death receptor family (Ashkenazi and Dixit, 1998). Death receptors, which belong to
the tumor necrosis factor (TNF) receptor superfamily, are surface transmembrane
receptors engaged by binding of extracellular “death ligands” such as FasL and TNF
(Ashkenazi and Dixit, 1998). Activation of these receptors leads to the formation of the
death-inducing signaling complex (DISC), which mediates the activation of initiator
caspases thereby committing the cell to apoptotic death (Bao and Shi, 2007). On the other
hand, the intrinsic (mitochondrial) apoptotic pathway is triggered by mainly non-receptor
stimuli. It is unique in its ability to initiate apoptosis in response to DNA damage,
cytotoxic stress and cytokine deprivation though it can be engaged by the extrinsic
pathway as well (Brenner and Mak, 2009). In response to apoptotic stimuli, the intrinsic
pathway triggers the permeabilization of the outer mitochondrial membrane (OMM).
This permeabilization releases Cytochrome C and other molecules residing within the
mitochondrial intermembrane space (IMS) into the cytosol, resulting in the formation of
the apoptosome (a complex of Cytochrome C, APAF-1 and pro-caspase 9) and activation
of the caspase cascade through caspase 9 (Ow et al., 2008).
At the heart of the intrinsic pathway lies the Bcl-2 family of apoptotic proteins.
Known as the “Gatekeepers of Mitochondrial Apoptosis”, the Bcl-2 family of proteins
are unique in their role of regulating mitochondrial outer membrane integrity in response
to death stimuli (Adams and Cory, 2007). Through heterodimerization, anti-apoptotic
members can neutralize the effects of pro-apoptotic members, the relative balance of
which acts as a regulating switch for initiating mitochondrial apoptosis (Oltersdorf et al.,
2005). The Bcl-2 family is composed of three groups of proteins distinguished through
functional and structural features. The antiapoptotic members (consisting of Bcl-2, Bcl-
XL, Bcl-B, Bcl-W, Mcl-1 and A1) share three to four α-helical regions of high sequence
similarity known as the Bcl-2 Homology (BH) domains (Petros et al., 2004; Adams and
Cory, 2007). Bcl-2 pro-survival proteins inhibit the pro-apoptotic members in part by
sequestering the amphiphilic BH3 helix of the pro-apoptotic members within a long
surface exposed groove. Because the Bcl-2 survival members promote cell survival in
cancer cell lines, they are recognized as a highly relevant target for the treatment of
cancer. They are also implicated in general resistance to chemotherapeutic agents along
with a more aggressive malignant phenotype (Minn et al., 1995; Simonian et al., 1997;
Amundson et al, 2000). Bcl-2 inhibitors show promise as cancer therapeutics, especially
when used in combination therapy (Oltersdorf et al., 2005; Nguyen et al., 2007; Lessene
et al., 2008; Tse et al, 2008; Ackler et al., 2010).
One promising agent is the orally bioavailable compound ABT-263 (navitoclax);
ABT-737 (Figure 2.01) is an analog that is widely used in preclinical studies as a tool
compound (Oltersdorf et al., 2005; Tse et al., 2008). These compounds were developed
using the SAR by NMR methodology and employing stable protein fragments for optimal
NMR study. They display subnanomolar affinity for such recombinant fragments of Bcl-
2, Bcl-XL, and Bcl-W but > 1µM for Mcl-1 (Shuker et al., 1996; Oltersdorf et al., 2005;
Tse et al., 2008). As predicted by its affinity profile, ABT-737 as well as navitoclax
exhibits limited efficacy in cells where Mcl-1 is expressed (Konopleva et al., 2006; van
Delft et al., 2006; Chen et al., 2007; Tse et al. 2008). Consequently, this selectivity is one
of the key aspects of navitoclax that may limit its chemotherapeutic utility. Several
studies have addressed the selectivity of BH3 peptides for members of the Bcl-2 family;
however, extrapolation to explaining and modifying the selectivity of ABT-
737/navitoclax has not been straightforward (Lee et al., 2008; Lee et al., 2009; Fire et al.,
2010). Furthermore, the Bcl-2 pro-survival proteins are anchored in the mitochondrial
outer membrane where they in fact undergo conformational changes and greater
penetration into the lipid bilayer in response to stress stimuli (Shore et al., 2008). Thus
despite the high affinity binding of these compounds to soluble recombinant protein
fragments in aqueous buffers in vitro, it is not clear how this translates to the efficacy of
binding in intact cells.
Figure 2.01 ABT-737 chemical structure.

A second Bcl-2 inhibitor currently in Phase I & II trials is obatoclax (GX15-070),
a hydrophobic cycloprodigiosin derivative developed by Gemin X Pharmaceuticals
(Nguyen et al., 2007). Obatoclax (Figure 2.02) was found to inhibit the binding of BH3
peptides to recombinant fragments of all pro-survival members of the Bcl-2 family with
low micromolar affinity employing fluorescence polarization assays but its key property
lies in its ability to potently overcome Mcl-1 mediated resistance to chemotherapeutic
agents (Zhai et al., 2006; Nguyen et al., 2007; Perez-Galan et al., 2007). Indeed, in
assays employing native Mcl-1 in intact mitochondrial outer membrane, 10 nM obatoclax
reverses the constitutive interaction between Mcl-1 and pro-apoptotic Bak. Hence, an
understanding of its binding mode to Mcl-1 is of particular interest.
Figure 2.02 Obatoclax chemical structure.
In this chapter we present an extensive analysis of molecular dynamics
simulations performed for obatoclax/Mcl-1 and ABT-737 complexes. The aim of the
current study is to rationalize the binding specificity of ABT-737 and to predict the
binding mode of obatoclax to Mcl-1, for which an experimentally determined three-
dimensional structure of the complex has proven to be elusive.

2.4 Methods
2.4.1 Structure Preparation
The starting structures for the docking and molecular dynamics simulation
experiments of Bcl-2, Bcl-XL and Mcl-1 complexes were taken from the Protein Data
Bank (Codes 1YSW, 2YXJ, and 2PQK respectively). All bound ligands (small molecules
and BH3 peptides), waters and ions and other molecules were removed from the
complexes, except for Bcl-XL for which we kept the ABT-737 ligand. Missing side
chains, terminal residues and hydrogen atoms were added using Sybyl 8.0 (Tripos Inc.,
St. Louis, MO) and XLeap in AMBER (Case et al., 2005). Protonation states were
assigned using the H++ server (Gordon et al., 2005). Visual inspection of all assigned
protonation states was done in Sybyl 8.0 and adjusted as needed.
2.4.2 Force field parameters
The FF99SB force field in the AMBER suite of programs was used for the protein
atoms. The antechamber module of Amber Tools was used to assign GAFF parameters
for obatoclax and ABT-737 (Wang et al., 2004; Case et al., 2005; Hornak et al., 2006). In
the case of ABT-737, we applied the biphenyl parameters of Athri and Wilson (Athri
and Wilson, 2009). Partial charges for the inhibitors were obtained using RESP with 6-
31G* electrostatic potentials calculated using GAMESS (Bayly et al., 1993; Schmidt et
al., 1993).

The sulfonamide group in ABT-737 has an imide-like bond (see Figure 2.01) that
is not well-represented by the default GAFF parameters. Hence, we derived force field
torsional parameters for the S–N bond using a model compound with a phenyl ring on
either side of the SO2NHCO group. The covalent geometry was taken from the Cambridge
Structural Database (CSD) entry CEKHIJ (Allen, 2002). A torsional energy profile
around the S–N bond was generated using GAMESS at an MP2/6-31G* level of theory.
A truncated Fourier series was fitted to the residual torsional energy profile after
subtracting out the calculated AMBER energy. The resulting coefficients are listed in
Table S1 (Supplementary Materials).
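The fitting step can be illustrated with a linear least-squares sketch. The cosine/sine basis, the number of terms and the function names are assumptions made for illustration and do not reproduce the actual fitting procedure or coefficients of Table S1; the residual energies would be the quantum mechanical profile minus the force field energy at each dihedral angle.

```python
import numpy as np


def fit_torsion_fourier(phi_deg, residual_energy, n_terms=3):
    """Fit c0 + sum_n [a_n cos(n phi) + b_n sin(n phi)] to a residual torsional profile.

    Returns the coefficients ordered as [c0, a_1, b_1, a_2, b_2, ...].
    """
    phi = np.radians(np.asarray(phi_deg, dtype=float))
    columns = [np.ones_like(phi)]
    for n in range(1, n_terms + 1):
        columns.append(np.cos(n * phi))
        columns.append(np.sin(n * phi))
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(residual_energy, dtype=float), rcond=None)
    return coeffs


def amplitude_and_phase(a_n, b_n):
    """Convert a_n cos(n phi) + b_n sin(n phi) into A cos(n phi - gamma).

    Returns (A, gamma in degrees), the amplitude/phase form used by
    AMBER-style torsion terms.
    """
    return float(np.hypot(a_n, b_n)), float(np.degrees(np.arctan2(b_n, a_n)))
```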
2.4.3 Docking
For Bcl-XL the deposited crystal structure of the complex (PDB 2YXJ)
was used directly as the starting point for our calculations. For Bcl-2, the initial docked
pose of ABT-737 was obtained by superposing the Bcl-2 structure with Bcl-XL and
extracting and merging the inhibitor coordinates in the Bcl-XL structure into the Bcl-2
structure. The same procedure was carried out for docking ABT-737 into Mcl-1. For Bcl-
2 and Mcl-1, direct merging of the inhibitor into the binding site resulted in some side
chains being in awkward positions relative to ABT-737. These were initially relieved
using the Sculpt module of PyMOL (Schrödinger, New York) followed by ligand-
restrained energy minimization.

Docking of obatoclax into Mcl-1 was carried out using an in-house docking
program (manuscript in preparation) that does an exhaustive rigid body docking
(translation and rotation) of the ligand on a grid. A rectangular box enclosing the entire
binding groove defined the search region. We used a grid spacing of 0.5 Å and rigid body
rotational angular increments corresponding to atomic displacements of 0.5 Å. Poses
were scored using a weighted combination of van der Waals, coulomb, surface area,
shape complementarity and hydrogen bonding terms. The weights were previously
calibrated to reproduce binding poses of a training set of protein-ligand complexes.
OMEGA (OpenEye Scientific Software, New Mexico) was used to generate conformers
for the ligand used in the rigid docking. The protein was kept fixed during the docking.
The top-scoring pose was used for the MD simulation.
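The sampling parameters described above can be illustrated with a short sketch. Since the docking program itself is not described in detail here, everything below is an assumption: the rotation is taken about the geometric centre of the ligand, the angular increment is chosen so that the outermost atom moves by the stated displacement (arc-length approximation), and the score is a plain weighted sum with hypothetical term names.

```python
import numpy as np


def angular_increment(coords, max_displacement=0.5):
    """Rotation step (radians) for which the atom farthest from the ligand
    centre moves by roughly max_displacement (in Angstrom)."""
    coords = np.asarray(coords, dtype=float)
    centre = coords.mean(axis=0)
    r_max = np.linalg.norm(coords - centre, axis=1).max()
    return max_displacement / r_max


def grid_translations(box_min, box_max, spacing=0.5):
    """All translation points of a rectangular search box on a regular grid."""
    axes = [np.arange(lo, hi + 1e-9, spacing) for lo, hi in zip(box_min, box_max)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack(mesh, axis=-1).reshape(-1, 3)


def weighted_score(terms, weights):
    """Weighted combination of energy-like terms (hypothetical term names)."""
    return sum(weights[name] * terms[name] for name in weights)
```

The exhaustive search then scores each conformer at every combination of grid translation and rotation, so the total number of poses scales with the product of the grid points, the rotations and the conformers.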
2.4.4 Molecular Dynamics Simulations
Each system was immersed in a truncated octahedral TIP3P water box (Jorgensen
et al., 1983). The distance between the wall of the box and the closest atom of the solute
was 12Å. Sodium or chloride counterions were added as required to maintain
electroneutrality of the system. Molecular dynamics (MD) simulations were carried out
using the AMBER program. A 2 fs time step and a 9 Å non-bonded cutoff were used.
SHAKE was employed to constrain the lengths of bonds to hydrogen atoms and the
Particle Mesh Ewald algorithm was used to treat long-range electrostatics (Ryckaert et
al., 1977; Cheatham et al., 1995).

1000 steps of energy minimization were carried out with harmonic restraints with
a force constant of 20 kcal/mol/Å² applied to the solute atoms. The restraints were kept as
the system was subsequently heated from 100 to 300 K over 25 ps in the NVT canonical
ensemble. The system was then equilibrated to adjust the solvent density under 1 atm of
pressure in the NPT isothermal-isobaric ensemble simulation over 25 ps. The harmonic
restraints were gradually reduced to zero during an additional four rounds of 25ps NPT
simulations. Production runs of 20 ns were then run for each complex. Snapshots were
collected at 1-ps intervals.
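As a bookkeeping aid, the protocol above can be summarized in a few lines of Python. The sketch only restates the durations and intervals quoted in the text and derives the resulting step and snapshot counts. The 10-ps analysis interval is the one used later for the free energy averages. The stage labels and variable names are ours, and the restraint force constants used in the four release rounds are not specified in the text and are therefore not included.

```python
# Values quoted in the text.
TIME_STEP_FS = 2.0
PRODUCTION_NS = 20.0
SAVE_INTERVAL_PS = 1.0
ANALYSIS_INTERVAL_PS = 10.0

# Equilibration stages and their durations in ps (labels are descriptive only).
EQUILIBRATION_STAGES = [
    ("NVT heating, 100 to 300 K, restrained", 25.0),
    ("NPT density equilibration, restrained", 25.0),
] + [("NPT restraint release, round %d" % i, 25.0) for i in range(1, 5)]


def count(duration_ps, interval_ps):
    """Number of intervals of length interval_ps that fit in duration_ps."""
    return round(duration_ps / interval_ps)


production_ps = PRODUCTION_NS * 1000.0
production_steps = count(production_ps, TIME_STEP_FS / 1000.0)
saved_snapshots = count(production_ps, SAVE_INTERVAL_PS)
analysis_snapshots = count(production_ps, ANALYSIS_INTERVAL_PS)
analysis_stride = count(ANALYSIS_INTERVAL_PS, SAVE_INTERVAL_PS)
equilibration_ps = sum(duration for _, duration in EQUILIBRATION_STAGES)

if __name__ == "__main__":
    print("equilibration time (ps):", equilibration_ps)
    print("production MD steps:", production_steps)
    print("snapshots saved:", saved_snapshots)
    print("snapshots analysed (every %d):" % analysis_stride, analysis_snapshots)
```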
2.4.5 Binding free energy estimate
Binding free energies were estimated using the Solvated Interaction Energy (SIE)
method as implemented in the sietraj program, which calculates averages from the MD
trajectory. The SIE is the AMBER interaction energy augmented with the desolvation
cost of binding consisting of a reaction field energy and cavity cost (Naïm et al., 2007;
Cui et al., 2008). The reaction field energy was obtained by solving the Poisson equation
using the BRI BEM program and using a variable-probe molecular surface and a
marching-tetrahedra tessellation algorithm to define the dielectric boundary (Purisima
and Nilar, 1995; Chan and Purisima, 1998; Purisima, 1998; Bhat and Purisima,
2006). SIE default values of 2.25 and 78.5 for the dielectric constants were used for the
solute and solvent regions, respectively (Cui et al., 2008).
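For reference, the SIE scoring function described above can be written compactly as follows. This is our transcription of the functional form given by Naïm et al. (2007), with the numerical coefficients taken from the footnote of Table 2.1. The symbol names are a notational choice made here for illustration.

```latex
\Delta G_{\mathrm{bind}} \approx \alpha \left[ E_{\mathrm{vdW}} + E_{\mathrm{Coul}}(D_{\mathrm{in}})
    + \Delta G_{\mathrm{RF}}(D_{\mathrm{in}}) + \gamma \, \Delta \mathrm{MSA} \right] + C
```

Here E_vdW and E_Coul are the intermolecular van der Waals and Coulomb interaction energies, ΔG_RF is the change in reaction field energy upon binding, ΔMSA is the change in molecular surface area upon binding, D_in = 2.25 is the solute dielectric constant, α = 0.104758 is a global scaling factor, γ = 0.012894 kcal/(mol Å²) is the surface coefficient of the cavity term and C = –2.89 kcal/mol is a constant.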

2.5 Results and Discussion
2.5.1 Molecular Modeling of ABT-737 complexes
There is a crystal structure of the complex of ABT-737 with Bcl-xL (PDB 2YXJ)
but there are no crystal structures of complexes of ABT-737 with either Bcl-2 or Mcl-1.
However, there is an NMR structure of Bcl-2 with an analog of ABT-737 (PDB 1YSW)
and a crystal structure of Mcl-1 with the Bim BH3 peptide (PDB 2PQK). The 1YSW and
2PQK structures were used to model the complexes of ABT-737 with Bcl-2 and Mcl-1,
respectively, using the 2YXJ structure as a guide (see Methods section). With Mcl-1
there was also the possibility of docking ABT-737 into the apoprotein (PDB 1WSX).
However, examination of the apo structure showed that the binding groove around part of
the binding site had closed up somewhat and was too narrow to accommodate ABT-737.
Hence we opted to use the protein structure obtained from the complex with the Bim
BH3 peptide instead. Note that all references to residue numbers in the discussion that
follows are based on the sequence numbers in the original PDB files.
Binding of natural ligands, the BH3 peptides, is achieved through interaction of
the hydrophobic face of the amphipathic helix of the peptide with the hydrophobic cleft
of the Bcl-2 family. BH3 peptides have four hydrophobic amino acids (referred to as h1–
h4; see Figure 2.03) that bind to complementary binding sites in the Bcl-2 family:

Figure 2.03 Multiple sequence alignment of representative BH3 domains
from BH3-Only proteins. The numbers at the top indicate the position of the
residues with respect to the full-length protein, Bim. The position of the four
conserved hydrophobic residues are labeled h1 to h4, boxed and highlighted.
Residues in bold are conserved across all proapoptotic BH3 domains.

These four residues also afford some degree of selectivity of BH3 domains for their
respective binding partners. Figure 2.04 shows an overlay of ABT-737 and the Bim BH3
peptide. Nonpolar groups in ABT-737 mimic the BH3 h2 and h4 hydrophobic side chains
(Oltersdorf et al., 2005). Lee et al. rationalized the difference in affinity of ABT-737 for
Mcl-1 versus Bcl-2 and Bcl-xL based on variations in the binding pockets of these three
related proteins (Lee et al., 2007). We explored the interactions of ABT-737 in greater
detail by explicitly docking it into Bcl-2, Bcl-xL and Mcl-1 and running 20-ns molecular
dynamics simulations on all three complexes to study their dynamical behavior and to
make specific inferences with regard to ABT-737’s specificity. We will organize the rest
of the Results and Discussion as follows. First, we will give an overview of the binding
groove. Then we will look at the MD results according to the different components of
ABT-737: the chlorobiphenyl group, the phenylpiperazine linker, the
nitrophenylsulfonamide moiety and the terminal S-phenyl and dimethylamino groups.
Last, we will present a proposed binding mode for obatoclax in Mcl-1.

Figure 2.04 Superposition of ABT-737 and Bim BH3 peptide bound to Bcl-xL.
BH3 is represented as a semi-transparent helical ribbon with selected side
chains corresponding to h1–h4 shown. Also shown is the side chain of Tyr73,
which superposes with the dimethylamino group of ABT-737.

2.5.2 Binding groove structure
The structures of Bcl-xL, Bcl-2 and Mcl-1 consist primarily of two central
hydrophobic helices surrounded by amphipathic helices (Petros et al., 2004; Day et al.,
2005). The binding groove is formed mainly by the α2, α3, α4 and α5 helices (Figure
A.01, Appendix A). Figure 2.05 shows snapshots from the three complexes
representative of the binding modes towards the end of the 20-ns simulations. The
binding grooves of Bcl-xL, Bcl-2 and Mcl-1 are shown in similar orientations with the
four hydrophobic pockets labeled p1 to p4. At one end of the binding groove is the p1
pocket, which is not utilized by ABT-737. The p2 pocket is the deepest of the four and
accommodates the chlorobiphenyl group. p3 lies in a channel flanked by the α2 and α3
helices. Towards the end of the binding groove is the p4 pocket. A major difference
between the Mcl-1 binding groove and that of Bcl-xL and Bcl-2 is that p4 is fairly open
and not a well-defined pocket in Mcl-1 (Czabotar et al., 2007).
2.5.3 Chlorobiphenyl group
The chlorobiphenyl group was initially docked in the p2 pocket in each of the
proteins and it stays stably bound to that pocket throughout the simulation. This is the
hydrophobic pocket occupied by the conserved Leu in BH3 peptides. In order to assess
the dynamics of the binding mode of ABT-737 in the various complexes we monitored
fluctuation of the position of the centroids of the various rings present in the inhibitor
with respect to the docked pose at the start of the simulation. Figure 2.06 shows the
fluctuations of the centroids of the two rings in the chlorobiphenyl group for the three
complexes. We see that the chlorobiphenyl group is well-anchored in the p2 pocket in
each of the complexes. The second ring in the biphenyl is also quite stable and its
centroid remains fairly localized. Interestingly, the chlorobiphenyl group exhibits larger
fluctuations in Bcl-xL than in Bcl-2 or Mcl-1 but its equilibrium position is closer to the
initial coordinates (within 1 Å) than in Bcl-2 or Mcl-1. The lower drift in equilibrium
structure away from the initial structure may be due to the fact that the starting structure
for the Bcl-xL complex is an actual crystal structure while the other two complexes are
modeled. In Mcl-1, the chlorobiphenyl group enters the p2 pocket at a less steep angle
compared to that in Bcl-xL and Bcl-2 (Figure 2.05c). This has the effect of pulling the
piperazine ring of the linker towards the p2 pocket.
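The ring-centroid monitoring used in this and the following sections can be sketched as below. The sketch assumes that every snapshot has already been superposed onto the first snapshot using the protein C-alpha atoms, so that only the ligand ring coordinates need to be supplied. The function names, the choice of averaging over the final quarter of the trajectory and the toy coordinates are ours, for illustration only.

```python
import math
from statistics import mean, pstdev


def centroid(coords):
    """Geometric center of a list of (x, y, z) atom coordinates."""
    n = len(coords)
    return tuple(sum(atom[k] for atom in coords) / n for k in range(3))


def centroid_displacements(reference_ring, frames):
    """Distance (Å) of the ring centroid in each frame from its position in the
    reference (initial) structure. Frames are assumed to be pre-superposed."""
    ref = centroid(reference_ring)
    return [math.dist(ref, centroid(ring)) for ring in frames]


def summarize_tail(displacements, tail_fraction=0.25):
    """Mean and standard deviation of the displacement over the last part of
    the trajectory, as a rough measure of equilibrium drift and fluctuation."""
    start = int(len(displacements) * (1.0 - tail_fraction))
    tail = displacements[start:]
    return mean(tail), pstdev(tail)


if __name__ == "__main__":
    # Toy four-atom "ring" and two frames: identical, then shifted by (3, 4, 0).
    ring0 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
    shifted = [(x + 3.0, y + 4.0, z) for x, y, z in ring0]
    print(centroid_displacements(ring0, [ring0, shifted]))
```

The mean of the tail corresponds to the drift of the equilibrium position from the starting pose, and the standard deviation to the size of the fluctuations around it, which are the two quantities compared between complexes in the text.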

Figure 2.05 Calculated binding mode of ABT-737 in (a) Bcl-xL, (b) Bcl-2 and (c)
Mcl-1. The proteins are represented as molecular surfaces with oxygen, nitrogen and
sulfur atoms colored red, blue and yellow, respectively. All other atoms were colored
white. ABT-737 is represented as a stick model with non-polar hydrogen atoms not
shown.

Figure 2.06 Distance of the ABT-737 biphenyl ring centroids from their
initial positions after superposition of the protein C-alpha atoms to those in
the first snapshot. Data points are at 10-ps intervals. Rows one to three
correspond to Bcl-xL, Bcl-2 and Mcl-1, respectively. Columns 1 and 2 correspond
to the chlorophenyl and phenyl rings in the biphenyl group, respectively.

2.5.4 Phenylpiperazine linker
The phenylpiperazine linker sits in the region that would be occupied by the
helical backbone of the BH3 peptide connecting the hydrophobic residues that bind to the
p2 and p4 pockets. Figure 2.07 shows the fluctuations of the centroids of the two rings in
the phenylpiperazine group for the three complexes. As with the chlorobiphenyl group,
the smallest drift of the phenylpiperazine linker from the initial structure is seen in the
Bcl-xL complex. The equilibrium location of the centroid of the piperazine ring is about
1.3 Å away from the initial structure and that of the phenyl ring is about 0.7 Å away. In
Bcl-2, the linker rings settle about 2 Å away from the starting location and remain
fairly stable at that position. In contrast, the linker rings in Mcl-1
drift significantly away from their starting positions. By the last 5 ns of the
simulation, the centroids of the piperazine and phenyl rings in the linker have moved
about 4 and 5.5 Å away from their initial positions, respectively. They have essentially
shifted from one side of the binding groove to the other. They also experience larger
fluctuations than in Bcl-xL or Bcl-2. The much larger width of the binding groove in Mcl-
1 (Figure 2.05c) allows this movement. In particular, the linker is not as well anchored in
the p4 binding site of Mcl-1 (see below).

Figure 2.07 Distance of the ABT-737 linker ring centroids from their initial
positions after superposition of the protein C-alpha atoms to those in the first
snapshot. Data points are at 10-ps intervals. Rows one to three correspond to Bcl-xL,
Bcl-2 and Mcl-1, respectively. Columns 1 and 2 correspond to the piperazine
and phenyl rings, respectively.

2.5.5 Nitrophenylsulfonamide group
We monitored hydrogen bonding interactions of the sulfonamide group of ABT-
737 (Appendix A, Figure A.02). In Bcl-xL, the sulfonamide HN forms a fairly stable
hydrogen bond with a side chain carbonyl oxygen of Asn136. The HN to O distance
generally fluctuates around 2.3 Å. One of the sulfonyl oxygens also forms a hydrogen
bond with either the backbone amide hydrogen of Gly138 or with the side chain amide
HN of Asn136 for most of the simulation (Appendix A, Figure A.03). In Bcl-2, the
sulfonamide HN also forms a stable hydrogen bond with the side chain carbonyl of the
homologous Asn140. One of the sulfonyl oxygen atoms forms a hydrogen bond with
either the Gly142 backbone amide hydrogen or the side chain amide of Asn140. In Mcl-
1, no such hydrogen bonds are formed with the equivalent Asn side chain or Gly
backbone. We see in Figure A.02 (Appendix A) that the sulfonamide group is highly
mobile and moves far away from the homologous Asn260 and Gly262 residues in Mcl-1.
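The persistence of a hydrogen bond such as the sulfonamide HN to Asn carbonyl O contact can be quantified as the fraction of snapshots in which the H to O distance falls below a cutoff. The sketch below illustrates this. The 2.5 Å cutoff and the function name are assumptions made here for illustration. The text reports only the typical distance (about 2.3 Å in Bcl-xL), not the criterion used.

```python
import math


def hbond_occupancy(hydrogen_coords, acceptor_coords, cutoff=2.5):
    """Fraction of frames in which the hydrogen-acceptor distance (Å) is within
    `cutoff`. Both arguments are per-frame lists of (x, y, z) tuples."""
    if len(hydrogen_coords) != len(acceptor_coords):
        raise ValueError("both trajectories must have the same number of frames")
    if not hydrogen_coords:
        raise ValueError("at least one frame is required")
    formed = sum(
        1
        for h, o in zip(hydrogen_coords, acceptor_coords)
        if math.dist(h, o) <= cutoff
    )
    return formed / len(hydrogen_coords)


if __name__ == "__main__":
    # Toy data: the acceptor is fixed at the origin and the hydrogen sits at
    # 2.0, 2.3 and 4.0 Å in three successive frames.
    hydrogens = [(2.0, 0.0, 0.0), (0.0, 2.3, 0.0), (0.0, 0.0, 4.0)]
    acceptors = [(0.0, 0.0, 0.0)] * 3
    print(hbond_occupancy(hydrogens, acceptors))
```

A stricter definition would also test the donor-hydrogen-acceptor angle. The distance-only criterion is used here to keep the example short.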
The position of the ring centroid of the nitrophenyl group is very stable
throughout the simulation of Bcl-xL (Figure 2.08). It drifts to about 0.8 Å away from its
starting position. In Bcl-2, there is a larger shift of 2.5 Å away from the starting pose and
slightly larger fluctuations compared to Bcl-xL. In Mcl-1, the displacement from the
initial position is 2.1 Å with noticeably larger fluctuations.

Figure 2.08 Distance of the ABT-737 nitrophenyl and S-phenyl ring
centroids from their initial positions after superposition of the protein C-
alpha atoms to those in the first snapshot. Data points are at 10-ps intervals.
Rows one to three correspond to Bcl-xL, Bcl-2 and Mcl-1, respectively. Columns
1 and 2 correspond to the nitrophenyl and S-phenyl rings, respectively.

2.5.6 S-phenyl group
ABT-737 and related compounds have a characteristic S-phenyl group that folds back
and stacks against the nitrophenyl ring. The S-phenyl group binds in the h4 hydrophobic
pocket of the Bcl-2 family. As with all the other rings, the smallest drift from the starting
conformation is seen in the Bcl-xL simulation (Figure 2.08). The phenyl ring centroid
settles in at about 1.1 Å from its initial position. In Bcl-2 and Mcl-1, the ring centroids are
about 2.2 Å and 1.7 Å, respectively, away from their starting positions. The ring is
markedly more mobile in Mcl-1 than in either Bcl-2 or Bcl-xL but remains in the vicinity
of the h4 pocket.
2.5.7 Dimethylamino group
Pointing away from the nitrophenyl and S-phenyl rings in ABT-737 is an N,N-
dimethylethylamino moiety that was modeled as a protonated species. The protonated
state is consistent with the expected pKa of this tertiary alkyl amine and is compatible
with the hydrogen bond seen between the nitrogen and the side chain carboxylate oxygen
of Glu96 in one of the asymmetric units of the crystal structure (PDB 2YXJ) of Bcl-xL
with ABT-737. The MD simulations with Bcl-xL show that the dimethylamino group
spends a significant amount of time hydrogen-bonded to the Glu96 side chain carboxylate
(Appendix A, Figure A.04). In Bcl-2 the corresponding residue is an Asp, which has a shorter side
chain. The ABT-737 dimethylamino N is occasionally hydrogen bonded to Asp100 but
much less frequently than in the Bcl-xL case. In Mcl-1 the corresponding amino acid is
Gly219 with no possibility for hydrogen bonding.
2.5.8 SIE analysis and virtual alanine mutations
In order to quantify the energetic consequences of the variations in the binding
modes of ABT-737 we carried out binding free energy calculations using the Solvated
Interaction Energy (SIE) method (Naïm et al., 2007; Cui et al., 2008). The sietraj
program was used to estimate binding affinities from the trajectories (Cui et al., 2008).
2000 snapshots taken at 10-ps intervals were used to obtain the averages. Table 2.1
summarizes the calculated binding affinities and their components. The calculated
binding affinities for Bcl-2 and Bcl-xL are similar at –11.7 kcal/mol and –11.6 kcal/mol,
respectively. The affinity for Mcl-1 is significantly lower at –9.1 kcal/mol. This is
consistent with the experimental observation that ABT-737 is at least 200-fold less potent
an inhibitor of Mcl-1 compared to Bcl-2 (Zhai et al., 2006). This is probably due to the
poorer complementarity of ABT-737 to the more open p3 and p4 pockets of Mcl-1.
Table 2.1 Solvated interaction energies (SIE) in kcal/mol.
              Bcl-xL             Bcl-2              Mcl-1
           Ave     Stdev      Ave     Stdev      Ave     Stdev
EvdW     -74.76    4.97     -78.80    4.10     -55.81    4.75
ECoul    -91.85   13.65     -66.78    5.34      21.87    4.72
ERF       95.82   11.91      74.49    5.71     -15.26    4.15
ECav     -12.28    0.62     -12.52    0.41      -9.79    0.72
ΔG       -11.59    0.73     -11.65    0.54      -9.07    0.59
Averages were taken over 2000 snapshots at 10-ps intervals.
ΔG = 0.104758 * (EvdW + ECoul + ERF + ECav) – 2.89, where ECav = 0.012894 * ΔMSA
and ΔMSA is the change in molecular surface area (in Å²) upon binding.
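As a consistency check, the ΔG row of Table 2.1 can be recomputed from the averaged components with a few lines of Python. The sketch relies on two assumptions. First, because the expression is linear, applying it to the component averages gives the same result as averaging the per-snapshot values. Second, the tabulated ECav is taken to be the cavity term already multiplied by the 0.012894 surface coefficient, so it is summed directly. This reading is inferred from the fact that it reproduces the tabulated ΔG values to within rounding. The variable and function names are ours.

```python
ALPHA = 0.104758   # global scaling factor
CONSTANT = -2.89   # additive constant, kcal/mol

# Component averages from Table 2.1, in kcal/mol.
TABLE_2_1 = {
    "Bcl-xL": {"EvdW": -74.76, "ECoul": -91.85, "ERF": 95.82, "ECav": -12.28},
    "Bcl-2": {"EvdW": -78.80, "ECoul": -66.78, "ERF": 74.49, "ECav": -12.52},
    "Mcl-1": {"EvdW": -55.81, "ECoul": 21.87, "ERF": -15.26, "ECav": -9.79},
}


def sie(components):
    """Solvated interaction energy from its four components (kcal/mol).
    ECav is assumed to be already scaled by the surface coefficient."""
    total = (
        components["EvdW"]
        + components["ECoul"]
        + components["ERF"]
        + components["ECav"]
    )
    return ALPHA * total + CONSTANT


if __name__ == "__main__":
    for protein, components in TABLE_2_1.items():
        print("%-7s %.2f kcal/mol" % (protein, sie(components)))
```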
To drill down further on the energetics of ABT-737 binding, we carried out a virtual
alanine scan of selected residues that line the binding groove of Bcl-xL. The coordinates
in the snapshots of the MD trajectory with the wild-type protein were modified to mutate
each residue in the list one by one to alanine and the binding free energy for each
mutation was recalculated over the same 2000 snapshots. The binding free energies
relative to the wild-type protein are summarized in Table 2.2. All values are positive or
zero, meaning that all the alanine mutations examined are either detrimental or have no effect.
However, no single mutation is predicted to have an overwhelming effect. The most
important side chain for binding ABT-737 appears to be that of Y195, followed by Y101,
F97 and E96. The Y195 aromatic ring stacks on top of the nitrophenyl ring of ABT-737.
The piperazine ring of ABT-737 packs against the side chains of Y101 and F97. The side
chain of F97 is also in contact with the S-phenyl end of the ligand. The E96 side chain
carboxylate forms ionic interactions with the protonated dimethylamino group of ABT-
737. F105 and E98 are part of the binding groove but their side chains are not in direct
contact with ABT-737. They serve as negative controls, and the virtual alanine mutations
predict no contribution to binding from either residue. Moroy et al. have
reported a theoretical analysis of the binding interactions of Bcl-xL with a series of BH3
peptides (Moroy et al., 2009). They used molecular dynamics and an MM/PBSA analysis
to dissect out the contributions of amino acids around the binding groove. They identified
several hotspots (Figure 3 in their paper) for interaction with BH3 peptides: F97, Y101,
L112, V126, L130, R139, F146 and Y195 (Moroy et al., 2009). This more or less
parallels the results in Table 2.2. This is consistent with ABT-737 being a mimic of BH3
peptides. However, the magnitudes of the contributions are smaller in our case compared
to those reported by Moroy et al. Aside from the magnitudes, the main differences in the
hotspots are L112 and R139. L112 lines the p1 pocket, which is not used by ABT-737
and hence has no effect in our alanine scan. In their analysis, Moroy et al. found
significant contributions from R139 arising from a salt bridge with an aspartic acid from
the BH3 peptide (Moroy et al., 2009). In our case, R139 is in partial contact with ABT-
737 but does not form a salt bridge and, correspondingly, we see no effect of an R139A
mutation. On the other hand, our MD simulations identify a salt bridge between the ABT-
737 dimethylamino group and E96 as a stabilizing interaction.
Table 2.2 Virtual alanine mutations.
Residue ΔΔG (kcal/mol)
Y195A 0.7
F146A 0.1
R139A 0.0
L130A 0.2
E129A 0.1
V126A 0.1
L112A 0.0
L108A 0.2
F105A 0.0
Y101A 0.5
R100A 0.2
E98A 0.0
F97A 0.5
E96A 0.4
ΔΔG values are relative to the complex of
ABT-737 with wild-type Bcl-xL.
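The ranking discussed in the text follows directly from Table 2.2. The short sketch below defines ΔΔG as the mutant minus wild-type binding free energy and lists the mutations at or above a threshold. The 0.4 kcal/mol threshold and the function names are arbitrary choices made here for illustration. They are not a criterion used in the analysis.

```python
# Relative binding free energies from Table 2.2, in kcal/mol.
DDG = {
    "Y195A": 0.7, "F146A": 0.1, "R139A": 0.0, "L130A": 0.2, "E129A": 0.1,
    "V126A": 0.1, "L112A": 0.0, "L108A": 0.2, "F105A": 0.0, "Y101A": 0.5,
    "R100A": 0.2, "E98A": 0.0, "F97A": 0.5, "E96A": 0.4,
}


def relative_binding_energy(dg_mutant, dg_wild_type):
    """ΔΔG = ΔG(mutant) - ΔG(wild type). Positive values mean weaker binding."""
    return dg_mutant - dg_wild_type


def rank_mutations(ddg, threshold=0.4):
    """Mutations with ΔΔG at or above `threshold`, largest effect first.
    Ties keep the order in which they appear in the table."""
    ranked = sorted(ddg.items(), key=lambda item: item[1], reverse=True)
    return [(mutation, value) for mutation, value in ranked if value >= threshold]


if __name__ == "__main__":
    for mutation, value in rank_mutations(DDG):
        print("%-6s %.1f" % (mutation, value))
```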
2.5.9 Protein structure and dynamics
Previous molecular dynamics studies of the Bcl-2 family have focused on
complexes with various BH3 peptides (Pinto et al., 2004; Lama et al., 2008; Moroy et al.,
2009). They explored differences in binding modes and structural changes in the binding
groove in response to the BH3 peptides. Lama and Sankararamakrishnan examined the
dynamics and stability of various helices in complexes of Bcl-xL and Bad, Bak and Bim
peptides. They noted that the α2 helix was the most flexible, consistent with the partial
unfolding of this helix that they and Eyrisch and Helms observed for the apoprotein
(Eyrisch et al., 2007; Lama et al., 2008). Lee et al. have also pointed out the plasticity of
the binding groove of Bcl-xL including the junction of the α2 and α3 helices (Sattler et
al., 1997; Lee et al., 2009). Although these and other studies have brought some
understanding of the nature of the interactions of BH3 peptides with Bcl-2 proteins, the
application of that knowledge to understand the specificity of small molecules such as
ABT-737 for Bcl-xL or Bcl-2 over Mcl-1 has not been straightforward (Lee et al., 2008;
Fire et al., 2010). Lee et al. attempted to use structural information from complexes of
Bcl-xL with Bim and a BimL12Y mutant to redesign ABT-737 to bind to Mcl-1 (Lee et
al., 2009). They focused on the chlorobiphenyl group and modified the angle of
approach to the p2 hydrophobic pocket of a phenyl ring in a derivative of ABT-737.
Unfortunately, the modification did not confer activity towards Mcl-1, although it
enhanced selectivity towards Bcl-xL. The focus on the p2 pocket for enhancing Mcl-1
activity is understandable because mutation at the h4 residue of Bim (e.g., F69A) reduces
activity towards Bcl-xL but not towards Mcl-1, indicating the lack of importance of this
site for Mcl-1 (Lee et al., 2008). We believe that this is precisely what explains the lack
of activity of ABT-737 against Mcl-1. Our MD simulations indicate that the
chlorobiphenyl group of ABT-737 is able to form good interactions with the Mcl-1 p2
pocket. On the other hand, stable interactions were not observed at p3 and p4 (Figures
2.07 and 2.08). This suggests that the interaction energies between Mcl-1 and ABT-737
are not sufficient to overcome the entropic cost of binding such a large flexible molecule.
A strategy for designing small molecule inhibitors of Mcl-1 is to either use larger groups
to fill up the p3 and p4 sites better or to avoid using these sites altogether and concentrate
on enhancing interactions at just the p1 and p2 sites, which are much better defined in
Mcl-1. The latter, we believe, is precisely what obatoclax does (see next section).
Concerns can be raised on the dependence of our results on our initial structures.
We used the crystal structure of ABT-737 with Bcl-xL as the starting point for our MD
simulations. For Bcl-2 we docked ABT-737 onto a structure of Bcl-2 derived from an
NMR structure of Bcl-2 with an analog of ABT-737. Thus, we have some level of
confidence in the starting structures for the complexes with Bcl-xL and Bcl-2. However,
for Mcl-1 we used a crystal structure of Mcl-1 with the Bim BH3 peptide since no crystal
structures of Mcl-1 with small molecules were available. Binding of BH3 domains, which
are bulkier than ABT-737, to the hydrophobic cleft can cause an expansion of the groove.
In particular, it can induce a shift in α3 and part of α2 (Liu et al., 2003; Day et al., 2005;
Czabotar et al., 2007). The question then arises as to whether the less well defined
binding of ABT-737 to Mcl-1 that we observed in our MD simulations was an artifact of
our choice of starting structure. To answer this we altered our Bcl-xL structure by
modifying the coordinates of the α2 and α3 helices and the loop joining them using the
crystal structure of a complex of Bcl-xL and a Bim peptide (PDB 3IO8) as a template
(Lee et al., 2009). We carried out a 20-ns MD simulation on this complex and observed
similar dynamical behaviour and stability of ABT-737 in the binding pocket (Appendix
A, Figure A.05) compared to our original MD simulation with the unaltered crystal structure
conformation. It should be noted though that at the end of the 20-ns simulation, helix α3
and the loop linking it to α2 have not yet converged back to the equilibrium positions
seen in the unaltered crystal structure. However, this does not appear to have significantly
affected the calculated binding affinity using this trajectory, which at -11.38 kcal/mol is
quite comparable to the -11.59 kcal/mol obtained using the original crystal structure. This
suggests that the choice of starting structure for Mcl-1 taken from a complex with a BH3
peptide did not bias the results significantly.
2.5.10 Mcl-1 and obatoclax
Virtual docking of obatoclax in Mcl-1 positioned it in the p1 and p2 pockets with
the methoxy group buried in the p2 pocket. Figure 2.09 shows the modeled binding mode
of obatoclax in Mcl-1. The three rings of obatoclax lie in a plane with the nitrogens
grouped together facing away from the protein core. During the course of the MD
simulation, the His252 side chain from Mcl-1 moves towards obatoclax and forms
hydrogen bonding interactions with the indole NH and pyrrole NH of obatoclax. This is a
fairly stable interaction and seems to help lock the molecule in place. Also the protein
residues near the p1 and p2 binding sites close up around the obatoclax molecule and
partially bury it. The methoxy group sits in the p2 site. Unlike ABT-737, obatoclax does
not make use of the p4 binding site. We saw in the ABT-737 MD simulations that the p4
binding site in Mcl-1 is fairly open as previously noted and does not provide an ideal
binding site for a small molecule (Czabotar et al., 2007). Larger molecules such as the
BH3 peptides can fill up that space more easily and provide more complementary
packing. Obatoclax, on the other hand, has a very snug fit in the p1 and p2 binding sites.
Figure 2.09c shows a superposition of obatoclax with the Bim BH3 peptide bound to
Mcl-1. We see that the methoxypyrrole moiety overlaps with the h2 leucine side chain of
BH3 and the indole ring is in the vicinity of the h1 isoleucine side chain.

Figure 2.09 Calculated binding mode of obatoclax in Mcl-1. (a) Mcl-
1 is represented as a molecular surface with oxygen, nitrogen and sulfur
atoms colored red, blue and yellow, respectively. All other atoms were
colored white. Obatoclax is represented as a stick model with non-polar
hydrogen atoms not shown. (b) Mcl-1 is represented as a ribbon with the
His252 side chain explicitly shown. (c) Superposition of obatoclax and
Bim BH3 peptide bound to Mcl-1.

Obatoclax belongs to the prodigiosin family of molecules, which have long been
known for their anti-cancer activity (Pérez-Tomás et al., 2010). Boger and Patel
compared the in vitro cytotoxicity of prodigiosin and two derivatives lacking the methoxy
group in the central ring (Boger and Patel, 1988). They found the methoxy group to be
essential for cytotoxic activity. Furthermore, theoretical calculations indicated that the
role of the methoxy substituent did not appear to be related to conformational or
electronic effects on the molecule’s structure or reactivity (Boger and Patel, 1988).
D’Alessio et al. in a study of a larger set of prodigiosin derivatives also found that
removal of the methoxy group resulted in a drastic reduction in cytotoxicity (D’Alessio et
al., 2000). They also found that substitution of methoxy for larger alkoxy groups resulted
in decreased cytotoxicity. Similarly, for obatoclax replacing the methoxy with hydroxyl
or attaching large bulky groups to the methoxy inhibited the activity of obatoclax in in
vitro binding assays (unpublished results). This is consistent with the predicted role of the
methoxy group in binding to the p2 pocket of Mcl-1. By analogy with our predicted pose,
we can speculate that the observed importance of the methoxy group in previous studies
of the cytotoxicity of prodigiosins is possibly due to a similar interaction of the group
with a nonpolar pocket in their molecular targets.
2.6 Conclusions
We modeled the binding mode of ABT-737 with Bcl-xL, Bcl-2 and Mcl-1 and
examined the dynamical behaviour of the bound conformations using molecular
dynamics simulations. We found that the binding of the chlorobiphenyl group at the p2
binding site was quite stable across all three proteins. However, the phenylpiperazine
linker group was dramatically more mobile in Mcl-1 compared to either Bcl-xL or Bcl-2.
The S-phenyl group at the p4 binding site was well-anchored in Bcl-xL and Bcl-2 but was
somewhat more mobile in Mcl-1 although the phenyl ring itself on average stayed close
to the p4 binding site in Mcl-1. This greater mobility is likely due to the greater openness
of the p3 and p4 binding sites on Mcl-1. The predicted binding free energy for each
complex was calculated from the molecular dynamics trajectories and was consistent
with the much weaker binding of ABT-737 to Mcl-1. Obatoclax was docked onto Mcl-1
and the structure was subjected to molecular dynamics refinement. The binding mode at
the p1 and p2 binding sites was quite stable with Mcl-1 wrapping around the molecule.
Interestingly, a histidine side chain formed polar/hydrogen bonding interactions with the
nitrogen atoms of obatoclax that persisted throughout most of the simulation. The
modeled binding mode suggests that obatoclax is able to inhibit all three proteins because
it makes use of only the p1 and p2 binding sites, which form a fairly narrow groove in all
three proteins, unlike the p4 binding site, which is much broader in Mcl-1.

Chapter 3
Naphthalene-based RNA editing inhibitor blocks RNA
editing activities and editosome assembly in
Trypanosoma brucei

Preface
Moshiri H, Acoca S, Kala S, Najafabadi HS, Hogues H, Purisima E, Salavati R. 2011.
Naphthalene-based RNA editing inhibitor blocks RNA editing activities and editosome
assembly in Trypanosoma brucei. J Biol Chem. 286(16):14178-89.

3.1 Rationale
Chapter 2 described post-discovery molecular dynamics studies of the binding
modes and the mechanism underlying the selectivity of Bcl-2 inhibitors currently in clinical
trials. The docking and simulations were used to define certain aspects of the biology of
the compounds. In pharmaceutical research, however, lead identification poses the
earlier problem of finding molecules with the potential to be developed into
drugs. The primary concern then becomes the choice of the most
appropriate target for lead discovery efforts. In this chapter, we address the
growing problem of neglected tropical diseases by targeting RNA Editing Ligase 1,
an enzyme of Trypanosoma parasites that holds promise as a selective
drug target. We then undertake a virtual screening effort
to identify lead compounds and further explore the effect of the newly identified inhibitor
on the Trypanosoma brucei RNA editing machinery.
3.2 Abstract
RNA editing, catalyzed by the multiprotein editosome complex, is an essential
step for the expression of most mitochondrial genes in trypanosomatid pathogens. It has
been shown previously that Trypanosoma brucei RNA editing ligase 1 (TbREL1), a core
catalytic component of the editosome, is essential in the mammalian life stage of these
parasitic pathogens. Because of the availability of its crystal structure and its absence from
humans, the adenylylation domain of TbREL1 has recently become the focus of several
studies for designing inhibitors that target its adenylylation pocket. Here, we have studied
new and existing inhibitors of TbREL1 to better understand their mechanism of action.
We found that these compounds are moderate to weak inhibitors of adenylylation of
TbREL1 and in fact enhance adenylylation at higher concentrations of protein.
Nevertheless, they can efficiently block deadenylylation of TbREL1 in the editosome and,
consequently, result in inhibition of the ligation step of RNA editing. Further experiments
directly showed that the studied compounds inhibit the interaction of the editosome with
substrate RNA. This was supported by the observation that not only the ligation activity
of TbREL1 but also the activities of other editosome proteins such as endoribonuclease,
terminal RNA uridylyltransferase, and uridylate-specific exoribonuclease, all of which
require the interaction of the editosome with the substrate RNA, are efficiently inhibited
by these compounds. In addition, we found that these compounds can interfere with the
integrity and/or assembly of the editosome complex, opening the exciting possibility of
using them to study the mechanism of assembly of the editosome components.
3.3 Introduction
Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major are three major
trypanosomatid pathogens that cause hundreds of thousands of deaths and infect millions
of people in tropical and sub-tropical areas of the world (Stuart et al., 2008). Current
trypanocidal drugs have a number of limitations such as high rate of toxicity, low rate of
efficacy, and drug resistance (Denise and Barrett, 2001; Delespaux and de Koning, 2007).
Therefore, it is important to look for a drug that is effective and does not produce harmful
side effects. RNA editing is a unique post-transcriptional modification of mitochondrial
mRNAs that is shared by all trypanosomatid pathogens (Simpson et al., 2003; Stuart et
al., 2005). Modification of specific editing sites, dictated by complementary guide RNAs
(gRNAs), is an essential step in ensuring the production of translatable mRNAs
which encode essential components of the mitochondrial respiratory system. While gRNAs
specify the number of uridylates (Us) to be added or deleted by base pairing at each
editing block, a 1.6 MDa multi-protein complex, the editosome, is responsible for
catalysis of different steps of RNA editing (Seiwert et al., 1996; Kable et al., 1997).
While the complete composition of the editosome is still being elucidated, most purified
functional editosomes contain over 20 proteins (Panigrahi et al., 2001; Panigrahi et al.,
2003). The editosomes differ in their compositions, having at least three different
complexes which sediment at ~20S on glycerol gradients and have the essential RNA
editing activities (Carnes et al., 2008). These editosomes, however, contain a similar
essential catalytic core whose proteins are functionally characterized, including two
endoribonucleases (KREN1 and KREN2) (Carnes et al., 2005; Trotter et al., 2005), two
3’-terminal uridylyl transferases (TUTases) (KRET1 and KRET2) (Aphasizhev et al.,
2002; Ernst et al., 2003), two 3’-exoribonucleases (exoUases) that are called KREX1 and
KREX2 (Schnaufer et al., 2003; Kang et al., 2005), two RNA ligases (KREL1 and
KREL2) (Schnaufer et al., 2001; Cruz-Reyes et al., 2002), and six proteins with predicted
oligonucleotide binding (OB) folds (KREPA1-6) (Panigrahi et al., 2001; Worthey et al.,
2003; Salavati et al., 2006; Law et al., 2008; Neimann et al., 2008; Tarun et al., 2008;
Guo et al., 2010; Kala and Salavati, 2010). The letter K in the beginning of each protein
name refers to Kinetoplastid. To specify species, we have used species-specific
abbreviations (e.g. TbREL1 for REL1 in T. brucei).
While other ligases such as T7 DNA ligase have a catalytic domain and an OB
fold domain, KREL1 and KREL2 contain only the catalytic domain (McManus et al.,
2001; Worthey et al., 2003). The OB fold domain that is essential for interaction with the
substrate RNA is provided in trans by KREPA2 and KREPA1, which interact via their
zinc fingers with KREL1 and KREL2, respectively, and bring the substrate RNA and the
catalytic domain into proximity (Schnaufer et al., 2003; Swift et al., 2009; Gao et al.,
2010). Although RNA editing ligases differ in their structure compared to DNA ligases
(Schnaufer et al., 2003), their overall mechanism is similar. Ligation requires three
distinct and reversible steps: (i) the ligase adenylylation step in which the conserved
catalytic lysine of the ligase attacks the α-phosphate of ATP and displaces pyrophosphate,
forming an enzyme-AMP intermediate through a phosphoamide linkage, (ii) the ligase
deadenylylation step in which the guide RNA/nicked mRNA duplex binds to the protein
and AMP is transferred from the adenylylated ligase to the 5’-phosphate of the RNA
molecule, forming an adenylylated RNA with a 5’,5’-phosphoanhydride bond, and (iii)
the ligation step in which the free 3’-hydroxyl at the nick site attacks the
phosphoanhydride bond of the adenylylated RNA fragment, forming a phosphodiester
bond, ligating the double stranded RNA and releasing AMP.
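The three steps described above can be summarized as a reaction scheme. This is only a shorthand sketch: the labels E (ligase), pRNA (the RNA fragment carrying the 5’-phosphate), AppRNA (the adenylylated RNA) and RNA-3’OH (the fragment carrying the free 3’-hydroxyl) are introduced here for illustration and are not notation used elsewhere in this chapter.

```latex
\begin{align*}
\text{E} + \text{ATP} &\rightleftharpoons \text{E-AMP} + \text{PP}_i
  && \text{(i) ligase adenylylation} \\
\text{E-AMP} + \text{pRNA} &\rightleftharpoons \text{E} + \text{AppRNA}
  && \text{(ii) ligase deadenylylation} \\
\text{RNA-3'OH} + \text{AppRNA} &\rightleftharpoons \text{RNA-p-RNA} + \text{AMP}
  && \text{(iii) ligation}
\end{align*}
```

Written this way, it is apparent that step (i) requires only ATP, whereas steps (ii) and (iii) require the enzyme to engage the RNA duplex.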
TbREL1 is an essential enzyme for the editing process and parasite viability
(Schnaufer et al., 2001). The crystal structure of the TbREL1 N-terminal catalytic domain
has been determined, which has made virtual screening of chemical compounds against
TbREL1 possible (Amaro et al., 2008; Durrant et al., 2010). The unique characteristics of
the TbREL1 ATP-binding pocket and the absence of any close homolog in the human
genome make TbREL1 an ideal target for design of selective inhibitors that block the
essential RNA ligase function.
A recent study has found several inhibitors against TbREL1 using a combination
of in silico analysis and in vitro adenylylation assays (Amaro et al., 2008), some of
which have been validated as inhibitors of full-round deletion RNA editing in the
presence of purified editosome (Moshiri and Salavati, 2010). While we were preparing
this manuscript, a second study (Durrant et al., 2010) was published that reported a
number of naphthalene-based inhibitors of TbREL1. In the present study, we have studied
the effect on the editosome of one of the naphthalene-based inhibitors provided by the
National Cancer Institute (NCI), chemical no. 162535 (previously referred to as V2
(Durrant et al., 2010) and designated in this study as C35), and we report that upon
addition of this new inhibitor, all different essential catalytic steps of RNA editing are
inhibited, most likely as a result of losing the interaction of the core editosome with substrate
RNA.
3.4 Experimental Procedures

The starting structure for the virtual screen was taken from the Protein Data Bank
[Accession no. 1XDN]. The bound ATP, waters and ions were removed from the
complex. Missing terminal residues and hydrogen atoms were added. Protonation states
were assigned using the H++ server (Gordon et al., 2005). Visual inspection of all
assigned protonation states was done in Sybyl 8.0 (Tripos Inc., St. Louis, MO) and
adjustments were made as needed.
3.4.2 Virtual Screening
Virtual screening was performed using our in-house docking program
(manuscript in preparation). The program uses an empirical scoring function trained to
reproduce the binding modes of known protein-ligand complexes. The program takes
conformers generated by Omega (OpenEye Scientific Software, New Mexico) and carries
out an exhaustive rigid docking of the ligand on a grid around the binding site with a grid
spacing of 0.6 Å. The window and rms settings of Omega were set to 20 kcal/mol and
0.4 Å, respectively. We screened a 77,000-molecule drug-like subset of NCI compounds
contained in the ZINC database (Irwin and Shoichet, 2005). The 2000 top-scoring
compounds were then clustered based on structural similarity using Sybyl 8.0 (Tripos
Inc., St. Louis, MO). These compounds were also rescored using the Solvated Interaction
Energy (SIE) binding free energy function (Naim et al., 2007; Cui et al., 2008).
Representatives from each cluster having a good docking score and SIE score were
ordered for testing.
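The selection logic described above (keep the top-scoring docking hits, group them by structural cluster, then pick well-scoring representatives) can be sketched as follows. This is a minimal illustration, not the in-house program: the `Hit` record, the function name and the `sie_cutoff` threshold are hypothetical, and the actual selection of compounds in this work also involved judgment rather than a fixed cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    compound_id: str   # hypothetical identifier, e.g. an NCI number
    cluster: int       # label from structural-similarity clustering
    dock_score: float  # empirical docking score (more negative is better)
    sie: float         # SIE estimate in kcal/mol (more negative is better)


def pick_representatives(hits, top_n=2000, sie_cutoff=-8.0):
    """Return one representative per cluster among the top_n docking hits.

    sie_cutoff is an assumed, illustrative threshold; hits with a less
    favorable (higher) SIE value are skipped.
    """
    # Keep only the top_n compounds by docking score.
    top = sorted(hits, key=lambda h: h.dock_score)[:top_n]

    best_per_cluster = {}
    for hit in top:
        if hit.sie > sie_cutoff:
            continue  # SIE rescoring does not support this pose
        current = best_per_cluster.get(hit.cluster)
        if current is None or hit.dock_score < current.dock_score:
            best_per_cluster[hit.cluster] = hit

    # Report representatives from best to worst docking score.
    return sorted(best_per_cluster.values(), key=lambda h: h.dock_score)
```

Choosing one compound per cluster keeps the purchased set chemically diverse, so that a single scaffold favored by the scoring function does not dominate the experimental test set.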

3.4.3 Solvated Interaction Energy
Binding free energies were estimated using the Solvated Interaction Energy (SIE)
method (Naim et al., 2007; Cui et al., 2008). The SIE is the AMBER interaction energy
augmented with the desolvation cost of binding consisting of a reaction field energy and
cavity cost. The reaction field energy was obtained by solving the Poisson equation using
the BRI BEM program (Purisima and Nilar, 1995; Purisima, 1998) and using a variable-
probe molecular surface (Bhat and Purisima, 2006) to define the dielectric boundary.
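For orientation, the general form of the SIE function reported in the cited papers can be written as below. The symbols are shorthand introduced here: the Coulomb and van der Waals terms together make up the AMBER interaction energy, the reaction field term and the surface-area term make up the desolvation cost, and the scaling factor, surface coefficient and constant are empirically fitted parameters whose values should be taken from Naim et al. (2007).

```latex
\[
\Delta G_{\mathrm{bind}} \approx
\alpha \left[ E_{\mathrm{Coul}} + E_{\mathrm{vdW}}
            + \Delta G^{R} + \gamma \, \Delta \mathrm{MSA} \right] + C
\]
```

Here $\Delta G^{R}$ is the change in reaction field energy upon binding and $\gamma \, \Delta \mathrm{MSA}$ is the cavity cost, taken as proportional to the change in molecular surface area.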
3.4.4 Preparation of mitochondrial extract and tandem affinity purification of ligase
complex
The wild-type T. brucei cell line 1.7A was used to extract the mitochondrial
contents (39). The mitochondrial contents extracted from 11 × 10^9 cells were centrifuged
on a linear 10%-30% (vol/vol) glycerol gradient and fractionated into 21 fractions of
500 µl each, as described before (Schneider et al., 2007). Tagged TbREL1 complexes were
purified from 4 liters of T. brucei cells as described before (Panigrahi et al., 2003), with
the following modifications. After TEV protease cleavage, the TEV eluates were
obtained and loaded onto 10%-30% (vol/vol) glycerol gradients and fractionated into 500
µl fractions as above. The western blot analysis was performed using four monoclonal
antibodies against KREPA1, KREPA2, KREL1, and KREPA3, as described previously
(Moshiri and Salavati, 2010).

3.4.5 Preparation of RNAs
All RNAs used in full-round deletion RNA editing were prepared as described
before (Moshiri and Salavati, 2010). The fluorescently labeled 16-mer reporter substrate,
5’-FAM (6-carboxyfluorescein)–GAUCUAUUGUCUCACA-TAMRA (6-
carboxytetramethylrhodamine)-3’, was synthesized and HPLC purified by Eurogentec.
The Cyb pre-edited RNA and its guide (gCyb) were synthesized as described before
(Salavati et al., 2002). RNAs for ligase and insertion assays (5’CL18, 3’CL13pp, gA6PC-
0A, and gA6PC-1A) were synthesized as described previously (Igo et al., 2000). The
substrates for pre-cleaved deletion assay (U4-5’CL, U4-3’CL, and gA6[14]PC-del) were
prepared as described before (Igo et al., 2002). Radiolabeling of RNAs at the 3’-terminus
was performed by 5’[32P]pCp ligation (Amaro et al., 2008). For 5’-terminus labeled
RNAs, [γ-32P] was used. All RNAs were purified by gel electrophoresis on 9 or 15%
denaturing polyacrylamide gel containing 7 M urea.
3.4.6 Adenylylation and deadenylylation assays
Compounds were dissolved in DMSO and reactions with equivalent concentration
of DMSO served as controls. Adenylylation assays were performed with 0.5 µl or 5 µl of
glycerol gradient fraction 11 which was incubated for 10 min on ice in the presence or
absence of each compound, followed by incubation at 28°C in the reaction buffer
containing 12.5 mM HEPES (pH 7.9), 25 mM KCl, 5 mM Mg acetate, 0.25 mM DTT, 40
nM [α-32P]ATP, and 0.1% Triton X-100. Adenylylation of tag-purified editosome (TEV
eluate) was done for 10 min using the same protocol. The proteins were resolved on 10%
SDS-PAGE and radiolabeled proteins were detected by phosphorimaging.
Deadenylylation assays were performed using 5 µl of fraction 11 from glycerol gradient,
which was pre-incubated with compounds as mentioned above followed by incubation
with 40 nM [α-32P]ATP at 28°C for 15, 90, and 150 min in reaction buffer containing 1X
HHE (25 mM HEPES pH 7.9, 10 mM Mg(OAc)2, 50 mM KCl, and 1 mM EDTA), 5
mM CaCl2, 0.1% Triton X-100, 83 ng/ml yeast Torula RNA, and ligatable RNA
fragments (used in pre-cleaved ligase assay). Deadenylylation of TEV eluate was carried
out for 15 min using the same protocol. The reaction was stopped by SDS dye and
samples were resolved by electrophoresis on 10% SDS-PAGE gels and visualized by
phosphorimaging.
3.4.7 In vitro RNA editing assays
All in vitro RNA editing assays were performed using 5 µl of glycerol gradient
fraction 11 which was pre-incubated for 10 min on ice in the presence or absence of each
compound. 0.1% Triton X-100 was included in each reaction. Equivalent concentrations
of DMSO and Triton X were included in the controls that did not contain any compound.
Full-round hammerhead ribozyme based assay was performed as described previously
(Moshiri and Salavati, 2010). Real-time measurement of the ribozyme activity was
recorded at intervals of 1 min for a period of 2 h using a Rotor-Gene 3000. The emission
wavelengths of FAM and TAMRA were 535 nm and 582 nm, respectively, and the excitation
wavelength was 470 nm for FAM. The rate of increase of the signal from FAM was used
to measure the RNA editing activity.
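One simple way to turn a real-time FAM trace into a rate is a least-squares slope over the time points of interest, which can then be expressed relative to the no-compound control. The sketch below assumes this approach; the function names are hypothetical and the exact fitting window and software used for the published analysis are not specified here.

```python
def initial_rate(times_min, fam_signal):
    """Least-squares slope of FAM fluorescence versus time (signal units per min)."""
    n = len(times_min)
    if n != len(fam_signal) or n < 2:
        raise ValueError("need at least two paired time/signal points")

    mean_t = sum(times_min) / n
    mean_f = sum(fam_signal) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (f - mean_f)
              for t, f in zip(times_min, fam_signal))
    return sxy / sxx


def relative_activity(rate_with_compound, rate_control):
    """Editing activity with compound as a percentage of the control rate."""
    if rate_control == 0:
        raise ValueError("control rate must be non-zero")
    return 100.0 * rate_with_compound / rate_control
```

For example, passing the readings taken at 1 min intervals for a compound-treated reaction and for the DMSO control to `initial_rate`, and then the two slopes to `relative_activity`, gives the percentage of editing activity retained in the presence of the compound.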
Endonuclease assay was performed as described previously (Salavati et al., 2002)
using 3’ end-labeled [32P]pCp pre-edited Cyb and Cyb guide RNA. Pre-cleaved
insertion, deletion and ligation assays were performed using 5’CL18, 3’CL13pp, and
gA6PC RNAs as described previously (Igo et al., 2000; Igo et al., 2002). An equal volume
of 10 M urea dye was added to all samples, which were run on 9% or 15%
polyacrylamide gels according to their sizes and visualized by phosphorimager.
3.4.8 Gel shift assay
The gA6[14] guide RNA and A6 pre-edited mRNA used in this assay were
prepared by T7 polymerase (Promega) transcription of PCR-generated templates as
previously described (Seiwert et al., 1996). Gel shift assays were performed as previously
described (Salavati et al., 2006) with the following exception. Purified KREPA4 at 400
nM concentration or 5 µl of editosome (glycerol gradient fraction 11) was incubated in
the presence or absence of 20 µM of each compound on ice for 10 min. This was
followed by incubation with 9 nM of 3’ end labeled gA6[14] gRNA in the presence or
absence of 4.5 nM of A6 pre-mRNA. Alternatively, we incubated the editosome with 9
nM of labeled gA6[14] for 30 min, and then added 20 µM of drug and incubated for
another 10 min. For Shift-Western blotting, the gel shift assay was duplicated; one for
autoradiography and one for western blotting using a 4% acrylamide gel to better resolve
heavy protein complexes. We used 15 µl of glycerol gradient fraction 11, and four
monoclonal antibodies against KREPA1, KREPA2, KREL1, and KREPA3 for blotting.
For gel shifts with TEV eluate, C35-treated or untreated TEV eluate was run on a 10-30%
glycerol gradient and 10 µl of protein from every odd-number fraction was incubated
with 9 nM of 3’ end-labeled gA6[14] gRNA in the reaction buffer.
3.4.9 Guanylyltransferase labeling
To visualize endogenous RNA associated with editosome, purified editosome
from tandem TEV-glycerol gradient (see above) was used, and fractions 1-6 (a), 7-12 (b)
and 13-17 (c) were pooled together. RNA was extracted from each pooled fraction using
phenol/chloroform. The RNA obtained from each pooled fraction was treated with
guanylyltransferase (Epicentre Biotechnologies) in the presence of 50 µCi of [α-32P]GTP
in a 20 µl reaction according to the manufacturer’s instructions. An equal volume of 10 M urea
dye was added to all samples, and samples were run on a 15% polyacrylamide gel and
visualized by phosphorimager.
3.5 Results
3.5.1 Virtual screening
We conducted a virtual screening of 77,000 compounds from the NCI library for
potential inhibitors of TbREL1 adenylylation. After clustering of the virtual screening
hits, 12 top-ranking representatives from various clusters were selected for experimental
testing. Table 3.1 lists the chosen compounds along with their score and rank in the
library. Compounds that are referenced explicitly in the text are also identified with a
shorter ID label. The top two ranking compounds in the editosome assay were C35 and
C10, in that order (see below). The compound that we call C35 was also found by a
different method in a recent virtual screening that was published as we were preparing
this manuscript (referred to as V2 in ref. (Durrant et al., 2010)), indicating the robustness
of this compound against the methodology used for virtual screening.
NCI no.(a)   ID(b)   VS Score(c)   SIE(d)   Rank
614641               -35.1          -9.6      12
45210        C11     -34.7          -8.7      18
344553               -34.6          -8.7      20
162535       C35     -33.9          -8.3      38
79710        C10     -33.9          -8.8      41
641601               -33.4         -10.1      58
89166        C66     -33.0          -7.7      79
641753               -32.8          -8.2      92
7809                 -32.8          -8.1      96
37204        C04     -32.7          -8.0     102
674000               -31.0          -8.8     257
623766               -29.4          -8.1     562
Table 3.1 Virtual hits selected for experimental validation.
(a) National Cancer Institute.
(b) Short identification label.
(c) Values are in arbitrary units. These are empirical scores from the docking step of the
high throughput virtual screening. More negative numbers suggest better affinity.
(d) Values are in kcal/mol. The final step in the virtual screening involves rescoring of the
docked poses using a higher quality binding free energy calculation, in this case with the SIE
function (Naim et al., 2007; Cui et al., 2008). The predicted binding affinities are generally in the
micromolar range.
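The statement that these SIE values correspond to micromolar affinities follows from the standard relation between binding free energy and dissociation constant, ΔG = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The short sketch below applies that relation; the temperature of 298.15 K and the function name are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal_per_mol, temp_k=298.15):
    """Dissociation constant (in M) implied by a binding free energy.

    Uses dG = RT ln(Kd) with a 1 M standard state; temp_k defaults to an
    assumed 298.15 K.
    """
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


if __name__ == "__main__":
    # SIE values taken from Table 3.1 (kcal/mol).
    for label, sie in [("C35", -8.3), ("C10", -8.8), ("C66", -7.7)]:
        print(f"{label}: Kd ~ {kd_from_dg(sie) * 1e6:.2f} uM")
```

With these assumptions, an SIE of -8.3 kcal/mol (C35) corresponds to a predicted Kd on the order of 1 µM.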
Figure 3.01 shows the predicted binding modes of C35 and C10 in TbREL1. The
interactions of C35 with the protein mimic certain aspects of ATP binding. Similar to the
adenine moiety of ATP, one of the naphthalene rings forms π-π stacking interactions
with Phe209. Also, similar to the polyphosphate tail of ATP, a sulfonate group forms an
ion pair with the guanidine group of Arg111. In addition, the aminonaphthyl group forms
a water-mediated interaction with Arg288 as is seen with the adenine N1 atom in ATP.
However, our predicted binding mode is rotated almost 180 degrees compared to that
described by Durrant et al., in which the aminonaphthyl group is exposed rather than
being buried. Durrant et al. docked their compounds on an ensemble of about 30 protein
conformations obtained from molecular dynamics simulations (Durrant et al., 2010). The
favored binding mode that they found for C35 required a protein conformation in which
the E60 side chain is somewhat separated from R111, with the aminonaphthyl packing
between E60 and R111. That pose is incompatible with the crystal structure of the protein
in which there is little room between E60 and R111. In our case, all our docking
calculations were performed on a single protein conformation based on the crystal
structure. Even for this single protein conformation, our exhaustive docker was able to
find a favorable pose with good interactions with the protein. Hence, we present a
binding mode that does not require the extra step of invoking a conformational change in
the protein. However, it is not possible without further experimental work to decide
which is the correct binding mode for C35.

Figure 3.01 Predicted binding modes of TbREL1 inhibitors.
The inhibitors (A) C35 and (B) C10 in TbREL1 are represented in stick form with
hydrogen atoms not shown for clarity. The protein is represented in ribbon form with key
interacting residues represented as sticks. Red dotted lines represent hydrogen bonding
interactions.
3.5.2 Inhibition of RNA editing by selected compounds
Adenylylation is the first essential step of RNA editing ligation. Therefore,
inhibition of this step results in inhibition of ligase activity and consequently inhibition of
full-round RNA editing. Using our recently developed fluorescent-based hammerhead
ribozyme assay, we tested our initial 12 compounds for their ability to inhibit full-round
RNA editing along with S5, a previously reported inhibitor of TbREL1 adenylylation
and full-round RNA editing, as the positive control (Amaro et al., 2008; Moshiri and
Salavati, 2010). In the reaction mixture, 20 µM of each compound was incubated with 5
µl of the editosome purified from whole cell mitochondrial extract by glycerol gradient
fractionation. The complete RNA editing reaction mix, which was used for all the
experiments throughout this paper, contained pre-edited mRNA, gRNA and fraction 11
which is the most active fraction of glycerol gradient. We found that three compounds,
C10, C35, and C04, led to an approximately sevenfold decrease in RNA editing activity (Fig.
3.02A). In order to test whether the inhibition observed was due to promiscuous
aggregation, we repeated the assay by including Triton X-100 (0.1% w/vol) for the
compounds which had resulted in 50% or more reduction in full-round RNA editing
activity. Triton X-100 prevents compounds from aggregation or non-specific inhibition,
as previously described (Amaro et al., 2008). In the presence of this detergent, five out of
the six examined compounds did not retain their inhibitory effect, and therefore were
considered as promiscuous aggregators (Fig. 3.02B). We however found that C35 has
an even more pronounced inhibitory effect on full-round RNA editing compared to the
previously reported S5. To exclude the possibility of inhibitory effect of C35 on the
function of the reporter ribozyme system, the interaction of this compound with the active
ribozyme in the presence of gRNA and fluorescent substrate was examined (Fig. 3.02C).
The ribozyme activity was not significantly altered by C35 at a concentration of 20 µM.
Therefore, the inhibition observed in full-round RNA editing assay was not due to non-
specific binding of C35 to the reporter ribozyme in the assay. Our results prompted us to
test whether this compound can be used to selectively block the ligase activity in vitro.

Figure 3.02 Effect of selected compounds that inhibit editosome activity.
(A) Results of the first round of screening using FRET-based RNA editing assay are shown here.
The reactions were performed in the absence (Ø) or presence of various compounds in a 30µl
reaction volume under multiple turn-over conditions. Compounds resulting in more than 50%
inhibition are indicated by dark gray bars. (B) Effect of Triton X on the compounds prone to
aggregation is shown. Triton X-100 (0.1% wt/vol) was used to monitor any non-specific
inhibition due to aggregation. The compounds C35 and S5 which still showed more than 50%
inhibition are indicated by dark gray bars. The error bars represent the experimental variation
(standard deviation) from three independent repetitions. (C) The effect of the best inhibitor
compound, C35, on ribozyme activity was evaluated. Cleavage activity of active ribozyme was
measured in the absence of the compound (-C35) and presence of the compound (+C35). In all
graphs, y-axis represents relative percentages of the cleavage activity for edited ribozyme in each
experiment.

3.5.3 Inhibition of ligase adenylylation at low protein concentrations by C35 and S5
To determine the specificity of inhibition, we analyzed adenylylation activities of
editosome ligases in the absence and presence of 20 μM of C35 and S5, as well as the
less-inhibitory C10 as a negative control. Adenylylation of ligase is an RNA-independent
activity in which the enzyme reacts with ATP and becomes covalently linked to adenylyl
moiety via a lysine and releases a pyrophosphate. Upon incubation of the editosome in
the presence of Mg2+ and [α-32P]-ATP, the RNA editing ligases, TbREL1 and TbREL2,
are adenylylated. Unexpectedly, however, when we incubated 5 µl of the editosome with
[α-32P]-ATP in the presence of 20 μM of C35, S5 or C10, we did not observe inhibition
of adenylylation of either TbREL1 or TbREL2 (Figure 3.03A, upper panel). It should be
noted that the amount of editosome and the concentration of the compounds in this
experiment were the same as the full-round RNA editing assay, in which we saw the
inhibition of editing by C35 and S5 (see the previous section). Next, we tested various
concentrations of the compounds to determine if adenylylation inhibition could occur at
higher compound concentrations. Surprisingly, not only did we not observe inhibition of
adenylylation in the presence of C35 and S5, but we also saw an increase in the
adenylylation levels of both TbREL1 and TbREL2 with increasing concentrations of
these compounds (Fig. 3.03A, middle and lower panels, 5 µl of editosome). This is not in
agreement with the previous reports in which both S5 and C35 were suggested to
compete with ATP for binding to the recombinant TbREL1 adenylylation pocket (Amaro
et al., 2008; Durrant et al., 2010). We noticed that the estimated concentration of
TbREL1 in our reaction mix (with 5 µl editosome) was about 10 times higher than the
concentration of recombinant TbREL1 used in the previous studies (Amaro et al., 2008;
Durrant et al., 2010). To test whether the difference in concentration of TbREL1 could
explain our observations, editosome was diluted 10 times and subjected to various
compound concentrations (Fig. 3.03A, middle and lower panels, 0.5 µl of editosome).
Inhibition in adenylylation was observed by both C35 and S5 as their concentrations were
increased, suggesting that the inhibitory compounds can block adenylylation activity, but
only at lower concentrations of the editosome. It has been reported that the adenylylation
activities of TbREL1 and TbREL2 are significantly increased in the presence of their
interacting partners, KREPA2 and KREPA1, respectively (Schnaufer et al., 2003; Gao et
al., 2010). Tenfold dilution of editosome might result in depletion and dissociation of
interacting partners and consequently lowering the adenylylation efficiency, thus leading
to effective inhibition of the basal levels of adenylylation activities by the compounds.
The effect of these compounds on TbREL2 is not surprising as TbREL1 and TbREL2 are
highly similar in sequence and structure and it has been reported that S5 can efficiently
inhibit the function of more distantly related T4 RNA ligase (Worthey et al., 2003;
Amaro et al., 2008; Shaneh and Salavati, 2009). Nevertheless, the inhibition observed in
full-round RNA editing cannot be explained by inhibition of adenylylation activity alone,
since in contrast to adenylylation activity, full-round RNA editing can be inhibited at
high concentrations of editosome in the presence of C35 and S5.

Figure 3.03 Effect of inhibitory compounds on adenylylation and deadenylylation steps
of RNA editing ligases. (A) Adenylylation and the effect of inhibitory compounds are shown.
The upper panel is an autoradiograph of 5 µl of fraction 11 from glycerol gradient labeled with
[α-32P] ATP in the absence (Ø) and presence of 20 µM of C10, C35, and S5. In both middle and
lower panels, either 5 µl editosome protein (above the dashed line) or 0.5 µl editosome protein
(below the dashed line) was subjected to various concentrations of C35 and S5 as indicated. The
concentrations of compounds are indicated above each gel (0 to 100 µM). (B) Time course
experiments to monitor changes in state of adenylylation are shown here. The upper panel shows
the adenylylation of 5 µl editosome in the absence of any compounds at different time points, 10,
60, 150, 240 min. The middle gel indicates that the ligases maintained their adenylylation activity
during the 3 h time course, as they were able to bind the freshly added radiolabeled ATP (fresh
ATP*). The lower panel shows the effect of C10, C35, and S5 on spontaneous deadenylylation of
ligases as indicated. (C) Time course experiments to monitor changes in state of deadenylylation
at different time points are shown. 5 µl of fraction 11 was labeled with [α-32P] ATP for 10 min.
Then, ligatable RNA substrate, 5’CL18, 3’CL13pp, and gA6PC-0A were added along with 20 µM
of C10, C35, or S5 and incubated for 0, 15, 90, or 150 min. A control lacking any chemicals was
also included. The results indicate that deadenylylation is blocked as soon as 15 min after
addition of C35 or S5 in the presence of ligatable substrate. The adenylylated TbREL1 and
TbREL2 bands are indicated by arrows.
3.5.4 Inhibition of deadenylylation by C35 and S5
To better understand the mechanism responsible for the selective action of C35
and S5 on the editosome function, we monitored editosome adenylylation for the same
duration of time required for the full-round RNA editing reaction (three hours) in the
absence or presence of 20 μM of compounds. Interestingly, the results showed that in the
absence of the inhibitory C35 and S5 compounds, adenylylated TbREL1 and TbREL2
gradually become deadenylylated during the course of three hours (Fig. 3.03B, upper
panel). We hypothesized that the reduction in adenylylated enzymes might have occurred
as a result of deadenylylation of the enzymes by endogenous RNA substrates that are
present in the editosome fraction, and/or as a result of instability and inactivation of the
ligases after the three-hour incubation at 28°C. To test these two possibilities, we
replenished the reactions with additional radio-labeled ATP at different time points and
observed that the adenylylation of the ligases was restored by binding to the freshly
added ATP during and at the end of the three-hour incubation period (Fig. 3.03B, upper
panel, +fresh ATP). These data suggest that the loss of adenylylation signal during the
incubation is most likely due to deadenylylation of ligases by the endogenous RNAs,
highlighting a dynamic interaction of the editosomes with the endogenous RNA
substrates. Interestingly, in the presence of C35 and S5, the level of deadenylylation was
lower, suggesting that the compounds directly or indirectly inhibit ligase deadenylylation
(Fig. 3.03B, lower panel).
In order to directly test for deadenylylation inhibition by C35 and S5, we
monitored the deadenylylation activity in the presence of exogenously added ligatable
dsRNAs. In the presence of ligatable dsRNAs, the AMP is transferred from the
adenylylated enzyme (E-AMP) to the 5’ phosphate group of the 3’ RNA fragment,
thereby adenylylating the RNA substrate and deadenylylating the enzyme. In the
presence of ligatable RNA substrates and absence of inhibitors, the deadenylylation
occurred quickly in the first 15 min, as is evident from the reduction in the adenylylated
enzymes. In the presence of S5 and, particularly, C35, the deadenylylation rate was
significantly reduced (Fig. 3.03C). Inhibition of the ligase deadenylylation step supports
the observation that adenylylated RNA editing ligases are accumulated in the presence of
C35 and S5 compounds (see the previous section).
3.5.5 Inhibition of different steps of RNA editing by C35 and S5
We further characterized the effect of the compounds on different steps of RNA
editing by carrying out in vitro pre-cleaved editing assays, as previously described, in the
absence and presence of the compounds (Sabatini and Hajduk, 1995; Igo et al., 2000; Igo
et al., 2002). First, we studied pre-cleaved ligation by performing the editing reaction in
the absence or presence of inhibitory compounds. As expected, we detected a significant
inhibition of the ligation when the reaction contained C35 (five times inhibition) and to a
lesser extent S5 (two times inhibition), compared with the reaction containing C10 or
controls (Fig. 3.04A). The difference in the efficiency of inhibition by C35 and S5 is
consistent with the observation that C35 is also a more potent inhibitor of the
deadenylylation step (Fig. 3.03C).
To examine the effect of compounds on U-removal as well as RNA ligation steps
of deletion editing, we used a pre-cleaved in vitro deletion assay that directs the deletion
of three Us from the substrate RNA (Fig. 3.04B). Incubation of the substrate RNAs,
gRNA, and the editosome in the absence of ATP resulted in removal of all three Us as
expected. Addition of ATP resulted in ligation of the 3’ fragment with the 5’ fragment
from which all three U's had been removed. In the presence of C35, a significant
inhibition of both U-removal (exoUase activity) and subsequent ligation activity was
observed (Fig. 3.04B). Addition of compound S5 to the editing reaction also affected
exoUase and ligase activities, but to a lesser extent, while the no-compound control and
the C10 control showed no inhibition on U-removal or RNA ligation.
We also used pre-cleaved RNA editing insertion assay to test the effect of C35
and S5 on the TUTase activity of the editosome. Incubation of the 5′ and 3′ fragments
with gRNA (designed to direct insertion of one U into the substrate RNA) in the presence
of UTP, ATP and editosome resulted in an RNA product with a single added U and also the
ligated edited RNA (Fig. 3.04C). As expected, in the absence of UTP or the editosome,
neither the U addition product nor the edited RNA was detected, and in the absence of
UTP only the 5’ and 3’ input RNA fragments were ligated. Also, omission of ATP
resulted in a major product with a single U added, which is an expected intermediate
before the ATP-requiring ligation step. Interestingly, again both C35 and, to a lesser
extent, S5 were able to inhibit TUTase activity (Fig. 3.04C).
We next tested the effect of compounds on endonuclease activity of the editosome
which is the required initial step for cleaving pre-edited mRNA, directed by
complementary gRNA at the editing site. We used the cytochrome b (Cyb) pre-mRNA
and its gRNA to monitor endonuclease activity. As expected, in the presence of the
editosome we detected both the gRNA-independent and gRNA-dependent cleavage
activities (Fig. 3.04D). In the presence of C35 and, to a lesser extent, S5, the
endonuclease activity of the editosome was completely lost (Fig. 3.04D).

Figure 3.04 Effect of inhibitory compounds on different steps of RNA editing.
(A) Pre-cleaved ligation reactions were performed using 5’ labeled 5’CL18, 3’CL13pp, and
gA6PC-0A gRNA in the absence of any compounds (lane 3) or presence of 20 µM of C10, C35,
or S5 (lanes 4-6, respectively). Lane 1 does not contain editosome and lane 2 does not have ATP
as indicated above the gel. The ligated product is indicated by an arrow. (B) Pre-cleaved deletion
editing reactions were performed with 5’ labeled U4-5’CL, U4-3’CL, and gA6[14]PC-del which
specifies the deletion of three Us in the presence of editosome (lane 3). Lane 1 does not contain
editosome, and lane 2 does not contain ATP. Lanes 4-6 represent the complete reaction in the
presence of 20 µM of C10, C35, or S5, respectively. Input RNA and added Us are represented
schematically. The edited product is also shown. The top band represents unedited ligated product
which represents the ligation of the 5’ and 3’ fragments without deletion of Us. (C) Pre-cleaved
insertion editing reactions were performed using 5’ labeled 5’CL18, 3’CL13pp, and gA6PC-1A
gRNA that specifies the insertion of one U. Lanes 1-3 represent the reactions with no editosome,
no UTP, and no ATP, respectively. Lane 4 contains all the components for insertion RNA editing
(complete reaction), and lanes 5-7 contain 20 µM of C10, C35, or S5. The edited product is
indicated by an arrow. The pre-edited 5’ fragment and U-inserted 5’ fragment are represented
schematically. (D) Endonuclease assay was performed by gRNA-dependent cleavage of 3’
labeled Cyb mRNA at editing site 1 in the absence (lane 4) and presence of 20 µM of C10, C35,
and S5 (lanes 5-7). Lane 1 contains only the input RNA, lane 2 contains no editosome, and lane 3
contains no gRNA. Lane 8 represents the T1 digestion of input RNA. Numbers below the panels
indicate percent of edited/ligated product in the presence of compounds normalized to the
edited/ligated product in the absence of any compound (complete reaction).
3.5.6 Inhibitory compounds affect the editosome RNA-binding activity
The RNA editing model implies a requirement for substrate RNA and gRNA
binding by various components of the editosome for positioning of the catalytic core
(Read et al., 1994; Aphasizhev et al., 2003; Salavati et al., 2006; Kala and Salavati,
2010). We reasoned that one of the simplest explanations for the inhibitory effect of C35
and S5 on all editing-related activities is that the editosome cannot interact with its
substrate RNA in the presence of these compounds. We used a native gel mobility shift
assay, which allows for detection of ribonucleoprotein complexes, to test this
hypothesis (Kala and Salavati, 2010). The 32P-labeled gA6[14] gRNA, which specifies
the first editing site of the ATPase subunit 6 (A6) pre-mRNA, was incubated with the
editosome or recombinant KREPA4 in the absence or presence of pre-mRNA, either with
or without the inhibitory compounds. KREPA4 is a known gRNA-binding protein and a
core component of the editosome. The assembled complexes were analyzed by native gel
electrophoresis. As expected, the untreated controls showed the characteristic pattern of
RNA-protein interaction for both recombinant KREPA4 and editosome (Fig. 3.05A). We
saw that while KREPA4-RNA interaction was not affected by C35 and S5, these
compounds completely inhibited the formation of a major gRNA-protein complex that
was seen in the editosome preparation (Fig. 3.05A). Also, a significant reduction in the
amount of a heavier gRNA-protein complex was evident. Further separation on a 4% gel
showed four distinct RNA-protein complexes, as also reported before, of which the
lightest complex, previously designated as G1, showed the most dramatic inhibition by
C35 (Goringer et al., 1994; Read et al., 1994). The other three (G2-G4) also showed
significant inhibitions (Fig. 3.05B). In order to further investigate the composition of
each of these gRNA-containing ribonucleoprotein (RNP) complexes, we used
Shift-Western blotting with antibodies against KREPA1, KREPA2, KREPA3 and KREL1.
Interestingly, we observed that only the heaviest complex (G4), which was least affected
by C35, contained these four proteins. These results suggest that RNA-protein interaction
is one of the earliest steps that are affected by C35; yet, the major target of C35 is none of
the four proteins that we analyzed by Shift-Western.
In order to understand the dynamics of inhibition of RNA-protein interaction by
C35, we used alternative orders of addition of RNA and C35 to editosome. We observed
that C35 can inhibit RNA-protein interaction even after the RNA-protein complex is
formed, suggesting that it can disengage the RNA that is already bound to the protein
(Fig. 3.05C).
It has been shown previously that treatment of editosome complex with nucleases
results in disassembly of complex, suggesting a role for RNA to maintain the 20S
editosome complex integrity (Salavati et al., 2002). Furthermore, our data on
deadenylylation of TbREL1 and TbREL2 in the absence of exogenous substrate RNA
indicate that the purified editosome complex already contains non-negligible amounts of
endogenous RNA substrate (Fig. 3.03B, upper panel), which may be responsible for the
integrity of the purified editosome. Effective inhibition of editosome-substrate interaction
by C35 prompted us to examine whether this drug can promote dissociation of editosome
components, as discussed in the next section.
Figure 3.05 Effect of inhibitory compounds on RNA-binding activity of editosome
complex. (A) 32P-labeled gA6[14] gRNA alone (left panel) or 32P-labeled gA6[14] gRNA in the
presence of A6 pre-mRNA (right panel) was incubated with recombinant KREPA4 protein or
editosome (5 µl of glycerol gradient fraction 11) in the absence (Ø) or presence of C10, C35, and
S5, as indicated. The protein-bound RNA and unbound input RNAs are indicated. The reactions
were resolved on 10% denaturing polyacrylamide gels. Positions of free RNA and protein-bound
RNA are indicated. (B) Treated and untreated editosome proteins (similar to panel A, left) were
resolved on 4% polyacrylamide gel for better separation of high molecular weight RNP
complexes. Complexes G1-G4 are shown by the arrowheads (left). Also, presence of KREPA1,
KREPA2, KREPA3 and KREL1 in the untreated sample was investigated by Shift-Western
(right). It can be seen that these proteins primarily exist in the G4 complex. (C) Editosome
preparation was first treated with C35/C10 and then RNA was added ((E+C)+RNA) or the RNA
was first added and then the RNA-protein complex was subjected to C35/C10 treatment
((E+RNA)+C). It can be seen that in both cases C35 inhibited RNA-protein interaction, while
C10 had no effect.
3.5.7 20S editosome complex integrity is affected by C35 treatment
To examine the editosome complex integrity in the presence and absence of the
more potent inhibitory compound C35, we prepared editosome by affinity purification of
tagged TbREL1, and after elution of purified editosome complex using TEV protease, we
incubated different amounts of the TEV eluate with 20 µM of C35. Consistent with our
previous experiments on glycerol gradient-purified editosome, adenylylation of TbREL1
and TbREL2 could be inhibited by C35 only when low amounts of affinity-purified
editosome were present; at higher editosome concentrations, while the adenylylation was
not inhibited, the drug could effectively reduce the rate of deadenylylation at 15 min
(Figure 3.06A). Furthermore, we again saw a very significant inhibition of gRNA-protein
interaction in the presence of C35 (Figure 3.06B). These data prompted us to test the
effect of C35 treatment on editosome complex integrity. We treated the TEV eluate with
C35, and then we fractionated the TEV eluate on 10-30% glycerol gradient. As shown in
Fig. 3.06C, a nearly complete loss of 20S editosome complex can be seen upon treatment
with C35, and the profiles of proteins are shifted towards low-density fractions.
Adenylylation assay on glycerol gradient fractions of C35-treated sample confirmed that
TbREL1 and TbREL2 are mostly present in complexes that are smaller than 20S,
indicating disintegration of the 20S editosome after treatment (Fig. 3.06D). In order to
investigate the effect of C35 on endogenous RNA that accompanies editosome, we
extracted RNA from tandem TEV/glycerol gradient-purified editosome that was or was
not treated with C35. As shown in Fig. 3.06E, while an abundance of gRNA molecules is
present in complex with the editosome, especially in fractions 7-12, treatment with C35
almost completely abolishes these RNA molecules. These data suggest that upon C35
treatment, the endogenous protein-bound RNA disengages from the protein complex and
thus becomes susceptible to degradation.
Figure 3.06 Analysis of sedimentation profile and activity of ligase-associated complexes
in the presence of C35. (A) Adenylylation and deadenylylation of REL1-TAP TEV eluate were
studied in the absence and presence of C35. The upper gel represents adenylylation of various
amounts of REL1-TAP TEV eluate (1, 5, and 10 µl) with [α-32P] ATP in the absence or presence
of 20 µM of C35. The lower gel represents the deadenylylation of the same volumes of
REL1-TAP TEV eluate as indicated above, with [α-32P] ATP and ligatable RNA substrate, 5’CL18,
3’CL13pp, and gA6PC-0A in the absence or presence of 20 µM of C35. (B) Gel shift assay was
performed with 32P-labeled gA6[14] gRNA using different amounts of REL1-TAP TEV eluate (1,
5, and 10 µl) in the absence or presence of 20 µM of C35. The rightmost lane represents unbound
gA6[14] in the absence of any protein. The protein-bound RNA and unbound input RNAs are
indicated by arrows. (C) REL1-TAP TEV eluate in absence of C35 (upper gel) or presence of 20
µM of C35 (lower gel) were fractionated on 10%-30% glycerol gradients. Fractions were
collected from top of the gradients and run on SDS-PAGE, blotted and probed with mAbs against
MP81, MP63, REL1, and MP42. (D) 10 µl of odd-numbered glycerol gradient fractions from
both untreated (upper gel) and treated (lower gel) REL1-TAP TEV eluate were adenylylated with
[α-32P] ATP and run on SDS gel. (E) Guanylyltransferase assay showed that C35 treatment
effectively abolishes most RNA species in the tandem TEV/glycerol gradient-purified editosome
fractions. Letters a, b and c correspond to pooled fractions 1-6, 7-12 and 13-17, respectively. (-)
and (+) signs indicate non-treated control and C35-treated sample, respectively. The rightmost
lane shows in vitro-transcribed [32P]guanylyl-labeled gA6[14] which corresponds to 70 nt
(indicated by the asterisk). The ladder-like pattern below the major gRNA
band in the non-treated samples corresponds to gRNA molecules with different poly(U)-tails.
3.6 Discussion
In the course of this work, we have examined the mechanism of action of two
naphthalene-based inhibitors of RNA editing process, C35 and S5, and have shown that
these compounds affect virtually all editosome activities that require editosome-RNA
interaction, suggesting that they block the interaction of editosome with its substrate
RNA. This hypothesis is further supported by direct experiments showing that the
editosome cannot interact with substrate RNA in the presence of C35 and S5. Therefore,
although the exact molecular mechanism of action of naphthalene-based compounds
remains to be determined, our data present C35 and S5 as the first known drug-like
compounds that can interfere with editosome-RNA interaction. In addition, we examined
the effect of the more potent compound C35 on complex integrity, and showed that this
compound could interfere with integrity and/or assembly of editosome complex, shifting
the 20S editosome sedimentation to the lower 5-10S region. This resembles the results
obtained from treatment of core editosome proteins by RNases, suggesting that editosome
requires its substrate RNA for initiating assembly and/or maintaining its integrity
(Aphasizhev et al., 2003).
It has been reported that C35 and S5 inhibit the adenylylation step of ligation,
with the inhibitory compounds competing with ATP for binding to the adenylylation
pocket (Amaro et al., 2008; Durrant et al., 2010). However, in our partially purified
editosome fraction, as estimated, the minimum concentration of TbREL1 which is able to
perform full-round editing is about 50 nM, about ten times higher than the concentration
of recombinant TbREL1 used in previous reports. With this amount of editosome, we
could not observe any inhibition at the adenylylation step. However, we were able to see
inhibition of adenylylation after we diluted our editosome preparation 10 times,
indicating that adenylylation inhibition only occurs at very low protein concentrations.
On the other hand, we observed efficient inhibition of editing using the non-diluted
amount of editosome that is required for successful completion of full-round in vitro
RNA editing, suggesting that adenylylation inhibition is most likely not the mechanism
through which C35 and S5 inhibit the editing process at higher protein concentrations.
Our results suggest that the main mechanism of action of C35 is through
inhibition of a protein-RNA interaction that is integral to the process of editing (Fig.
3.07A). Alternative molecular mechanisms could lead to such an observation. For example,
freezing the OB fold-containing interacting partner of ligases in a closed conformational
state can lead to their inability to interact with the RNA substrate. During the
deadenylylation step in capping enzymes and DNA ligases, it has been shown that the
OB fold rotates and positions itself at a distance from the AMP-binding pocket that
allows the ligase to open its conformation and bind to the nicked double-stranded DNA
(Doherty and Suh, 2000). Thus, one possible model is that, in the presence of the compound,
when the OB fold provides ATP to the adenylylation pocket, the OB fold becomes locked and
induces the closed conformation of the ligase-interacting partner, which does not allow the
interaction of protein with RNA (Fig. 3.07B, top panel). In other words, due to the
compound-induced locked position of OB fold on adenylylated catalytic pocket, the OB
fold cannot rotate around and is not capable of positioning its nucleic acid binding
surface towards the active site awaiting nick binding. Although this model is consistent with
in silico modeling studies that suggest the interaction of C35 and S5 with TbREL1 (this
work; Amaro et al., 2008; Durrant et al., 2010), it is not fully supported
by our experimental data, especially in that the main target of C35 does not seem to be in
association with TbREL1 (Fig. 3.05B). Alternatively, C35 may inhibit one or more as yet
unidentified proteins responsible for bringing the substrate RNA into the editosome
complex (Fig. 3.07B, bottom panel). This may result from similarity of the physicochemical
properties of RNA-binding regions of proteins to those of nucleoside triphosphate-binding clefts;
C35 was designed to be positioned in the adenylylation pocket of RNA editing ligases,
which can be structurally similar to other nucleotide and nucleic acid-binding
pockets. Furthermore, we have shown that in the presence of C35 there is a shift in 20S
editosome toward 5-10S. This result is very similar to the reports showing that upon
treatment of the editosome complex with nucleases, the editosome is disintegrated,
suggesting that the loss of integrity after treatment of editosome with C35 is because
editosome loses its RNA-interacting capacity (Aphasizhev et al., 2003). Treatment with
the compounds does not result in loss of RNA-binding activity of KREPA4, suggesting
that these compounds do not directly affect KREPA4, and that they do not block
protein-RNA interactions in a non-specific manner, but rather target a particular protein.
Figure 3.07 Alternative models for the mechanism of action of C35 and S5.
Although C35 and S5 are designed to inhibit adenylylation, our data suggest that they are only
weakly able to block this step of editing (panel (A), dashed line). Instead, C35 and S5 act
primarily by blocking the interaction of RNA with editosome, therefore inhibiting all subsequent
steps. The exact molecular mechanism for this protein-RNA interaction inhibition is yet to be
determined. Two likely possibilities are depicted in panel (B). Top: C35 and S5 bind to the
ligases of editosome and inhibit the induced conformational change of their interacting partners.
Since this conformational change is necessary for RNA interaction, binding of these drugs blocks
the interaction of editosome with RNA. This model is supported by presence of a possible
drug-binding cleft in a conformationally variable region of TbREL1 (E60-R111) (30) which may be
responsible for inducing the conformational change of the interacting partner. However, it is not
supported by our experimental data, and thus is not highlighted in this figure. Bottom:
Alternatively, C35 and S5 may bind directly to an RNA-binding protein that, independent of
TbREL1 and TbREL2, is responsible for recruiting the substrate RNA into protein. In this figure,
for simplicity, we have only shown TbREL1 and not TbREL2 which might be targeted by the
compounds as well.
The RNA-protein complex that we found to be most affected by C35 has been
previously designated as G1 (Read et al., 1994). The most abundant gRNA-binding
protein of G1 has been shown to be a 25 kDa protein. G1 has been proposed to be an
intermediate during the formation of complete editosome (Goringer et al., 1994). This
supports our observation that C35 inhibits the formation of high molecular weight
RNA-containing editosome complexes. Another candidate target of C35 could be RBP38, a
previously identified essential protein that has been shown to be able to bind to both
single-stranded and double-stranded RNA and is important for stability of mitochondrial
RNA (Sbicego et al., 2003). The properties of this protein match the observation that C35
abolishes the interaction of its target with both single-stranded gRNA and double-
stranded gRNA/mRNA complex (Fig. 3.05A), which leads to instability of RNA after
drug treatment as shown by guanylyltransferase assay (Fig. 3.06E).
While our data support a role for RNA to maintain the editosome complex
integrity, there are contradicting reports showing that mutants that lack mitochondrial
DNA and, hence, lack mitochondrial mRNA and gRNA contain catalytically active
editosomes (Domingo et al., 2003). Resolving the exact mechanism of inhibitory
compounds such as C35 would provide a more detailed understanding for assembly of
functional editosomes.

3.7 Acknowledgments
We would like to thank Dr. Kenneth Stuart (Seattle Biomedical Research
Institute) for kindly providing monoclonal antibodies against KREPA1, A2, A3, and
KREL1.

Chapter 4
Automated Molecular Formula Analysis Determination
by Tandem Mass Spectrometry (MS/MS)

Preface
Jarussophon S, Acoca S, Gao JM, Deprez C, Kiyota T, Draghici C, Purisima E, Konishi
Y. 2009. Automated molecular formula determination by tandem mass spectrometry
(MS/MS). Analyst. 134(4):690-700.

4.1 Rationale
In Chapter 3 we successfully identified a novel TbREL1 inhibitor through our
virtual screening pipeline. Compounds identified in this type of setting are drawn
from proprietary or public compound libraries. However, the most important source of lead
compounds leading to marketed drugs has been natural sources. One of the most difficult
problems in identifying active lead compounds from natural sources is
the rediscovery of known natural products. In this Chapter we describe a novel
algorithm that uses high-resolution MS/MS data to determine the molecular formula of
isolated compounds, thereby facilitating the identification of compounds from natural
sources.
4.2 Abstract
Automated software was developed to analyze the molecular formula of organic
molecules and peptides based on high-resolution MS/MS spectroscopic data. The
software was validated with 96 compounds including a few small peptides in the mass
range of 138–1569 Da containing the elements carbon, hydrogen, nitrogen and oxygen. A
Micromass Waters Q-TOF Ultima Global mass spectrometer was used to measure the
molecular masses of precursor and fragment ions. Our software assigned correct
molecular formulas for 91 compounds, incorrect molecular formulas for 3 compounds,
and no molecular formula for 2 compounds. The obtained 95% success rate indicates
high reliability of the software. The mass accuracy of the precursor ion and the fragment
ions, which is critical for the success of the analysis, was high, i.e. the accuracy and the
precision of 850 data points were 0.0012 Da and 0.0016 Da, respectively. For the precursor and
fragment ions below 500 Da, 60% and 90% of the data showed accuracy within ≤0.001
Da and ≤0.002 Da, respectively. The precursor and fragment ions above 500 Da showed
slightly lower accuracy, i.e. 40% and 70% of them showed accuracy within ≤0.001 Da
and ≤0.002 Da, respectively. The molecular formulas of the precursor and the fragments
were further used to analyze possible mass spectrometric fragmentation pathways, which
would be a powerful tool in structural analysis and identification of small molecules. The
method is valuable in the rapid screening and identification of small molecules such as
the dereplication of natural products, characterization of drug metabolites, and
identification of small peptide fragments in proteomics. The analysis was also extended
to compounds that contain a chlorine or bromine atom.
4.3 Introduction
Over the past decades, the major sources of marketed therapeutic agents
have been natural products or their semi-synthetic derivatives. Natural
products provide larger structural diversity than combinatorial chemistry products and
offer significant opportunities for finding novel lead compounds (Shu YZ, 1998;
Newman et al., 2003; Lee KH, 2004; Koehn and Carter, 2005; Sarker et al., 2005).
Each year, a large number of new natural products are discovered and fully
characterized; however, during the courses of isolation, characterization and
identification, natural products chemists have faced increasing problems of
replication, i.e., re-discovery of known natural products (Eldridge et al., 2002;
Hostettmann et al., 2005).
A method that identifies and eliminates known compounds at early
stages of natural product discovery processes, generally known as dereplication, plays
a key role in phytochemistry and in an effective natural product discovery program
(Cordell et al., 1997; Hook et al., 1997; vanMiddlesworth and Cannell, 1998; Cordell
and Shin, 1999; Dinan L, 2005; Böröczky et al., 2006). Dereplication typically uses a
combination of analytical techniques and database searching to identify active
compounds (Corley and Durley, 1994; Bindseil et al., 2001; Bradshaw et al., 2001).
The databases that are extensively used are Chemical Abstracts (CA), Beilstein,
Bioactive Natural Product Database, Chapman & Hall’s Dictionary of Natural
Products and Natural Products Alert (NAPRALERT). There have been considerable
developments of analytical separation techniques such as GC, LC and CE, and of
spectroscopic characterization techniques such as PDA, IR, NMR, and MS.
Hyphenated techniques couple the separation techniques with online spectroscopic
characterization, e.g., LC-MS, GC-MS, CE-MS, and LC-NMR. They are expected to
resolve the complexity of the natural product extracts (Wilson and Brinkman, 2003;
Sarker and Nahar, 2005). Recent advances of hyphenated techniques and their
applications have been reported by several research laboratories (Constant and
Beecher, 1995; Bobzin et al., 2000; Wolfender et al., 2000; Hansen et al., 2005;
Jaroszewski JW, 2005). In addition, multiple combinations of the characterization
techniques have been developed, e.g., LC-PDA-MS, LC-NMR-MS, LC-SPE-NMR,
LC-PDA-NMR-MS, 2D-LC(IEC-RP)-MS (He XG, 2000; Sandvoss et al., 2000; Louden
et al., 2001; Wolfender et al., 2003; Clarkson et al., 2005; Jaroszewski JW, 2005;
Lambert et al., 2005; Pepaj et al., 2006). These improve sample separation, structural
characterization and elucidation, and also detection of valuable minor components in
natural sources.
By virtue of its high productivity and sensitivity, mass spectrometry has
become a powerful and irreplaceable technique that provides critical information
in many phases of drug discovery and development. Examples are structural
characterization and identification, high-throughput screening, quantitation,
proteomics, metabonomics, metabolomics, and dereplication of natural products. In
the area of dereplication, MS delivers the molecular mass information that can be
used as search query in almost all databases of small molecules (Shin and
vanBreemen, 2001; Glish and Vachet, 2003; Korfmacher WA, 2005). Accurate mass
obtained from high resolution MS is commonly used to search for candidates in
literature databases (Potterat et al., 2000). Unfortunately, several molecular formulas
often fit the observed molecular mass closely enough, resulting in a large set of
candidate compounds, many of which are false positives. MS/MS measurements
often provide information on some sub-structures, or fingerprints of sub-structures
(Fredenhagen et al., 2005; Petucci and Mallis, 2005; Plumb et al., 2006). Few
databases contain MS/MS data; one example is the NIST/EPA/NIH Mass Spectral
Library, which contains 14,802 MS/MS spectra. Such databases may become practically
useful when more MS/MS data are accumulated and systematically analyzed.
The dereplication process needs information that is available in most
natural product databases. Molecular formula is one of the most valuable
indexes for dereplication as it is available in any natural product database and is
independent of the source, of the sample preparation, and of the conditions of
measurements. Conventional elemental composition analysis is typically achieved by
high-resolution MS such as magnetic sector MS, time-of-flight MS (TOF-MS) and
Fourier Transform MS (FTMS) and provides a set of molecular formulas, which is
further narrowed down in combination with NMR data (Grange and Sovocool, 1999;
Chernushevich et al., 2001; Zhang et al., 2005; Bristow AWT, 2006). However, NMR
is much less sensitive than MS and works well only on purified material, requiring a large
quantity of purified sample. Thus, the combination of MS and NMR contributes
little to dereplication in practice. It should be mentioned that the number of possible
molecular formulas increases exponentially with the size of the molecules, whereas
there has been no drastic improvement in the accuracy of mass determination (Bristow
and Webb, 2003; Kujawinski and Behn, 2006).
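The growth in the number of candidate formulas with molecular size can be illustrated with a brute-force enumeration. The sketch below is not the software described in this Chapter; it is a minimal illustration that assumes neutral molecules containing only C, H, N and O, a fixed mass window in Da, and a simple upper bound on the hydrogen count (H ≤ 2C + N + 2). The function names `candidate_formulas` and `formula_mass` are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotopes of C, H, N and O.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def formula_mass(c, h, n, o):
    """Monoisotopic mass of the neutral formula C_c H_h N_n O_o."""
    return c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]


def candidate_formulas(target, tol=0.002):
    """List (C, H, N, O) counts whose mass lies within tol (Da) of target."""
    hits = []
    for c in range(int((target + tol) // MASS["C"]) + 1):
        left_c = target - c * MASS["C"]
        for n in range(int((left_c + tol) // MASS["N"]) + 1):
            left_n = left_c - n * MASS["N"]
            for o in range(int((left_n + tol) // MASS["O"]) + 1):
                left_o = left_n - o * MASS["O"]
                # Only the nearest hydrogen count can match, since tol << m(H).
                h = round(left_o / MASS["H"])
                # Illustrative filter (an assumption): H cannot exceed 2C + N + 2.
                if h < 0 or h > 2 * c + n + 2:
                    continue
                if abs(left_o - h * MASS["H"]) <= tol:
                    hits.append((c, h, n, o))
    return hits


# Example: candidates for the mass of caffeine, C8H10N4O2.
caffeine = formula_mass(8, 10, 4, 2)
print(len(candidate_formulas(caffeine)), "candidate formulas within 0.002 Da")
```

Running the same function on progressively heavier targets, or with a wider window, gives a feel for how quickly the candidate list grows when accurate mass is the only constraint.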
In our previous paper, we analyzed molecular formula and fragmentation
pathways of small molecules based on their accurate MS, MS/MS, and MS/MS/MS
data (Konishi et al., 2007). The method provided quite detailed information on the
structure and sub-structures and their fragmentation pathways. The method is useful
when a specific molecule is targeted for detailed structural analysis. However, the
measurements and the analysis are not automated. Thus, we developed simplified,
automated, and productive software, which determines the molecular formula of small
molecules and their fragments based on the accurate MS and MS/MS data.
4.4 Experimental
4.4.1 Materials
[Glu1]-fibrinopeptide B was purchased from Sigma (Oakville, ON, Canada)
and used as a reference compound for the calibration of the mass spectrometer. All
reagents were used without further purification. Water and acetonitrile were HPLC
grade and were purchased from Anachemia (Lachine, QC, Canada) and J. T. Baker
(Phillipsburg, NJ), respectively. Formic acid was purchased from Fluka (Oakville,
ON, Canada) and used to aid the positive ion electrospray ionization process. All
solvents were degassed for at least 30 minutes before use.
4.4.2 Instrumentation
All MS and MS/MS measurements were performed in positive ion
electrospray mode (+ESI) on a Micromass Waters Q-TOF Ultima Global mass
spectrometer equipped with a Z-spray ion source and NanoLockSpray (Waters,
Mississauga, ON, Canada) source. Spectra were acquired over the m/z range
50 − 990 for small organic molecules and 100 − 1990 for organic
molecules with > 900 Da molecular mass and small peptides. The acquisition time per
spectrum was set to 1 s, the inter-scan delay to 0.1 s, and the lock spray frequency
to 4 s (Eckers et al., 2000; Wolff et al., 2001; Wolff et al., 2003). The mass
spectrometer was set up in V mode with instrument resolution between 9,000 and
10,000 based on FWHM. The source and desolvation temperature were set to 80 and
150°C, respectively. The TOF was operated at an acceleration voltage of 9.1 kV, a
cone voltage of 100 V, RF lens of 45 V, and a capillary voltage of 3.8 kV. Operating
parameters of the ESI interface were optimized by infusing a standard solution of
[Glu1]-fibrinopeptide B, 100 nM in water:acetonitrile 50:50 (v/v) with
0.1 % formic acid, at a flow rate of 1.0 µL/min. The instrument was carefully
calibrated so that the mass error of the MS/MS fragments of [Glu1]-fibrinopeptide B
was less than 4 ppm. All measurements were performed at room temperature. The MassLynx
4.0 chromatographic software was used for instrument control and data analysis.
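The calibration criterion above is expressed in ppm, whereas the accuracy figures elsewhere in this Chapter are given in Da. The following short sketch shows the conversion between the two; the function names `ppm_error` and `da_window` are hypothetical and the m/z values are illustrative only, not measured data.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def da_window(mz, ppm):
    """Absolute mass window (Da) that corresponds to a ppm tolerance at a given m/z."""
    return mz * ppm / 1e6


# Illustrative values: an error of 0.002 Da at m/z 500 corresponds to 4 ppm.
print(round(ppm_error(500.002, 500.0), 3))
print(da_window(500.0, 4.0))
```

Because a fixed ppm tolerance corresponds to a wider absolute window at higher m/z, a 4 ppm criterion is more permissive in Da for heavier ions than for lighter ones.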
4.4.3 MS/MS experiments
A precursor ion of interest was selected at the first quadrupole (Q1) and
subjected to collision-induced fragmentation in the second quadrupole (Q2) with
argon collision gas at appropriate collision energy to produce abundant product ions.
Fragment ions were measured to obtain a number of spectral peaks each comprising
an m/z and a peak area value. The collision energy was adjusted for each compound,
typically from 5 eV to 40 eV, in order to maintain the precursor peak in the range of
35 – 100 counts/scan while maximizing the peak areas of the product ions.
Occasionally, a high collision energy was used to enhance the peak areas of the
fragment ions at low m/z. Fragment ions of 20 – 400 counts/scan were used for
most of the analyses. The acquired mass spectra were accumulated for at least 2 min.
The mass measurements are most accurate when the analyte/lock mass intensity ratio is
between 0.5 and 2.0 (Bloom KF, 2001; Colombo et al., 2004). In a few cases, fragment
ions lower than 20 counts/scan were used for the analysis after accumulating many
scans. Typically, analytes were dissolved in water:acetonitrile 50:50 (v/v) with 0.1%
formic acid and directly infused into the mass spectrometer using a Harvard syringe
pump or autosampler direct injection. For the reference channel, freshly prepared
[Glu1]-fibrinopeptide B (~1 µM) in water:acetonitrile 50:50 (v/v) with 0.1% formic
acid was continuously infused to maintain a constant concentration of reference
solution. Both analyte and reference channels were controlled by NanoLockSpray.
The TOF mass correction (accurate mass measurement) parameters were as follows:
no background subtraction; smooth type = Savitzky-Golay; smooth window 3
channels; number of smooths 1; minimum peak width at half-height 4 channels;
centroid top 60%; the dead-time correction was turned on (Wu and McAllister, 2003;
Vivó-Truyols G and Schoenmakers PJ, 2006). A spectral intensity cut-off threshold
setting of 0.1-1.0% was used to reduce the number of peaks analyzed by excluding
those without sufficient intensity for good mass accuracy (Tyler et al., 1996; Clauwert
et al., 2003; Sleno et al., 2005). The TOF transform was used to exclude all isotope
peaks.
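The effect of the intensity cut-off can be sketched as a simple filter on a peak list. This is only an illustration of the idea and not the instrument software's procedure: it assumes that the threshold is expressed as a percentage of the largest peak area in the spectrum, and the function name `apply_cutoff` and the peak values are hypothetical.

```python
def apply_cutoff(peaks, percent=1.0):
    """Keep (m/z, area) peaks whose area is at least `percent` % of the largest area.

    Assumption: the cut-off is taken relative to the most intense peak.
    """
    if not peaks:
        return []
    floor = max(area for _, area in peaks) * percent / 100.0
    return [(mz, area) for mz, area in peaks if area >= floor]


# Illustrative peak list: the weak peak at m/z 150 falls below a 1% cut-off.
spectrum = [(100.0, 1000.0), (150.0, 5.0), (200.0, 20.0)]
print(apply_cutoff(spectrum, percent=1.0))
```

A lower setting such as 0.1% retains more of the weak fragment ions at the cost of including peaks whose masses are less accurately determined.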
4.4.4 The algorithm of molecular formula analysis
The forward and reverse molecular formula analysis algorithms are described
in the previous literature (Konishi et al., 2007). Briefly, an MS/MS experiment on a
precursor ion A generates several fragment ions, A1, A2, A3, A4, A5, A6, A7, A8,
and A9 (Table 4.1). All possible neutral fragments, Ni (i = 1-9) and Nij (i = 1-8; j =
2-9; j > i), are listed in Table 4.1, i.e., Ni (i = 1-9) are generated from A and Nij (i =
1-8; j = 2-9; j > i) are possibly generated from Ai (i = 1-8). H+ (= 1.0078 Da) is
added artificially as the smallest product ion to obtain the corresponding neutral
products N, N1H, . . ., N9H in Table 4.1.
Table 4.1 Potential neutral losses in the MS/MS experiment in forward MFA.
Precursor  Product ions and neutral products
A          A1    A2    A3    A4    A5    A6    A7    A8    A9    H+
A          N1    N2    N3    N4    N5    N6    N7    N8    N9    N
A1               N12   N13   N14   N15   N16   N17   N18   N19   N1H
A2                     N23   N24   N25   N26   N27   N28   N29   N2H
A3                           N34   N35   N36   N37   N38   N39   N3H
A4                                 N45   N46   N47   N48   N49   N4H
A5                                       N56   N57   N58   N59   N5H
A6                                             N67   N68   N69   N6H
A7                                                   N78   N79   N7H
A8                                                         N89   N8H
A9                                                               N9H
1) The first step of the analysis is designated as “Forward Analysis”, where,
briefly, accurate mass measurement uniquely determines molecular formula of some
small fragments, and the molecular formulas of these small fragments are added up
sequentially to determine the molecular formula of the precursor ion, A. More
specifically, the molecular formula analysis was carried out for the neutral losses
NiH (i = 1-9), N, and Ni9 (i = 1-8). Two restrictions were applied in the search. One
is the molecular size of the neutral loss, which is typically limited to 200 – 400 Da,
preferably 200 Da, and the other is the error cut off, which is 0.002 – 0.003 Da,
preferably 0.002 Da. These restrictions save CPU time and enhance the
identification of a unique molecular formula, respectively. The neutral molecules
identified uniquely are highlighted in grey background (N9H, N8H, N89, N6H, …,
etc). It should be emphasized that all Ni9 (i = 1-8) are simply listed to fill Table 4.1 and
some of them may not exist mathematically or physically. The molecular formula of
fragment ion A9 is assigned from the added molecular formulas (N9H + H+). The
observed m/z value of A9 is then replaced with the one calculated from the assigned
molecular formula. The molecular masses of Ni9 (i = 1-8) are also replaced
accordingly. Similarly, the molecular formula of fragment ion A8 is assigned from
(N8H + H+) and (N89 + A9). The process continues to the assignment of the
molecular formula of A from (N1 + A1), (N2 + A2), (N3 + A3), (N4 + A4), and (N5
+ A5). In most of the analysis, the molecular formula of A is uniquely assigned;
however, sometimes, two or more molecular formulas are assigned for A. All of the
molecular formulas assigned are further examined in the next step, which is
designated as “Reverse Analysis”, where each of the molecular formulas of the
fragment ions and neutral losses assigned in “Forward Analysis” is re-examined.
2) The second step is designated as “Reverse Analysis”. First, the observed m/z
value of the precursor ion is replaced with the m/z value calculated from the assigned
molecular formula. The molecular formulas of Ni (i = 1-9), in which the count of each
element is restricted not to exceed that in A, are analyzed with an error cut off of
0.002 – 0.003 Da, but with no limitation on the molecular size. The molecular formula
of A1 is determined as the difference of those of A and N1. The observed m/z value of
A1 is replaced with the m/z value calculated from the assigned molecular formula. The
molecular masses of N1i (i = 2-9) are also replaced accordingly. Similarly, the
molecular formula of fragment ion A2 is assigned from (A – N2) and (A1 – N12).
The process continues until the molecular formula of A9 is assigned from (A – N9),
(A1 – N19), (A2 – N29), (A3 – N39), (A4 – N49), (A5 – N59), (A6 – N69), (A7 –
N79), and (A8 – N89). If they are not consistent, the molecular formula of Ai is
assigned by taking the one assigned most frequently, and this formula is used in the
following steps. Table 4.2 shows the outcome of the reverse analysis of brefeldin A.
3) The third step is designated as “Least-square Index”, where statistical
verification is introduced on the outcome of “Reverse Analysis”. This step is designed
to select the correct molecular formula when “Forward Analysis” occasionally results
in multiple molecular formulas. Note that “Reverse Analysis” is performed for all of
the molecular formula(s) derived from “Forward Analysis”. The molecular formula
analysis of brefeldin A is an example in which “Forward Analysis” resulted in two
molecular formulas, C16H24O4 and C17H20N4. Their “Reverse Analysis” gave quite
different molecular formulas of the fragment ions and neutral losses. Tables 4.2 and
4.3 show the reverse analysis of the correct and incorrect ones, respectively (the
formats of precursor ion, fragment ions and neutral losses are the same as those in
Table 4.1). The difference is clear: the correct molecular formula of the precursor ion
led to assigned molecular formulas for all fragment ions and for most of the potential
neutral losses, whereas the incorrect molecular formula of the precursor ion failed to
assign the molecular formulas of four fragment ions and of several neutral losses. A
least-square index was introduced in the automated evaluation software to evaluate
the difference numerically. The index is based on the fact shown in Table 4.2, i.e.,
the number of possible neutral losses (NLcalc in Table 4.2) associated with the
fragment ions increases linearly as the size of the fragment ions decreases. The number
of neutral losses whose molecular formulas are assigned (NLobs in Table 4.2) more or
less correlates with NLcalc, and the deviation from the linear relationship is estimated
by the R2 of a linear least-squares fit. The R2 values of Tables 4.2 and 4.3 are 0.9749
and 0.9292, respectively. The molecular formula analysis with the higher R2 value is
taken as the correct molecular formula.
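The majority vote used in “Reverse Analysis” can be sketched as follows. This is
only an illustration under stated assumptions, not the thesis software: the helper
names (parse, to_string, subtract, assign_by_vote) are hypothetical, the first three
parent/loss pairs are read from the 216 Da column of Table 4.2, and the fourth pair
is an invented mis-assignment added solely to show an inconsistent candidate being
outvoted.

```python
import re
from collections import Counter


def parse(formula):
    """Turn a formula string such as 'C16H24O4' into element counts."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def to_string(counts):
    """Write element counts back as a formula, carbon and hydrogen first."""
    order = [e for e in ("C", "H") if counts[e]]
    order += sorted(e for e in counts if e not in ("C", "H") and counts[e])
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def subtract(parent, loss):
    """Return parent minus loss, or None if any element would go negative."""
    result = parse(parent)
    result.subtract(parse(loss))
    if any(n < 0 for n in result.values()):
        return None
    return to_string(result)


def assign_by_vote(parent_loss_pairs):
    """Derive one candidate per (parent, neutral loss) pair and keep the most frequent."""
    votes = Counter()
    for parent, loss in parent_loss_pairs:
        candidate = subtract(parent, loss)
        if candidate is not None:
            votes[candidate] += 1
    winner, _ = votes.most_common(1)[0]
    return winner, votes


pairs = [
    ("C16H24O4", "CH4O3"),  # from Table 4.2
    ("C16H22O3", "CH2O2"),  # from Table 4.2
    ("C16H20O2", "CO"),     # from Table 4.2
    ("C16H18O", "CH2"),     # hypothetical wrong loss, for illustration only
]
winner, votes = assign_by_vote(pairs)
print(winner, dict(votes))
```

Three of the four candidates agree on C15H20O, so the dissenting candidate is
discarded and C15H20O is carried into the following steps.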
Table 4.2 Reverse MFA of brefeldin A with correct formula of precursor ion
MMobs.    280.1673  262.1569  244.1458  226.1363  216.1521  198.1413  184.1261  162.1419  158.1113  130.0791
          C16H24O4  C16H22O3  C16H20O2  C16H18O   C15H20O   C15H18    C14H16    C12H18    C12H14    C10H10
C16H24O4  -         H2O       H4O2      H6O3      CH4O3     CH6O4     C2H8O4    C4H6O4    C4H10O4   C6H14O4
C16H22O3            -         H2O       H4O2      CH2O2     CH4O3     C2H6O3    C4H4O3    C4H8O3    C6H12O3
C16H20O2                      -         H2O       CO        CH2O2     C2H4O2    C4H2O2    C4H6O2    C6H10O2
C16H18O                                 -                   CO        C2H2O     C4O       C4H4O     C6H8O
C15H20O                                           -         H2O       CH4O      C3H2O     C3H6O     C5H10O
C15H18                                                      -                   C3        C3H4      C5H8
C14H16                                                                -                   C2H2      C4H6
C12H18                                                                          -         H4        C2H8
C12H14                                                                                    -         C2H4
C10H10                                                                                              -
NLcalc.a            1         2         3         4         5         6         7         8         9
NLobs.b             1         2         3         3         5         5         6         8         9
a The number of possible neutral losses associated with each fragment ion
b The number of neutral losses, of which molecular formulas are assigned, associated with each fragment ion

Table 4.3 Reverse MFA of brefeldin A with incorrect formula of precursor ion
MMobs.    280.1673  262.1569  244.1458  226.1363  216.1521  198.1413  184.1261  162.1419  158.1113  130.0791
          C17H20N4                                          C15H18    C14H16    C12H18    C12H14    C10H10
C17H20N4  -                                                 C2H2N4    C3H4N4    C5H2N4    C5H6N4    C7H10N4
                    -
                              -
                                        -
                                                  -
C15H18                                                      -                   C3        C3H4      C5H8
C14H16                                                                -                   C2H2      C4H6
C12H18                                                                          -         H4        C2H8
C12H14                                                                                    -         C2H4
C10H10                                                                                              -
NLcalc.a                                                    1         2         3         4         5
NLobs.b                                                     1         1         2         4         5
a The number of possible neutral losses associated with each fragment ion (fragment ions, of which molecular formulas are not assigned, are
not counted.)
b The number of neutral losses, of which molecular formulas are assigned, associated with each fragment ion (fragment ions,
of which molecular formulas are not assigned, are not counted.)
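The R2 values quoted for Tables 4.2 and 4.3 can be reproduced with an ordinary
least-squares fit of NLobs against NLcalc. The sketch below is an illustration rather
than the evaluation software itself; the function names are hypothetical, and it
assumes that a (0, 0) point for the precursor column is included in the fit, since
that assumption is what recovers the reported values from the tabulated counts.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a straight-line fit, R2 equals the squared correlation coefficient.
    return sxy * sxy / (sxx * syy)


def least_square_index(nl_calc, nl_obs):
    """R2 of NLobs versus NLcalc.

    Assumption: a (0, 0) point for the precursor ion is prepended to both series.
    """
    return r_squared([0] + list(nl_calc), [0] + list(nl_obs))


# NLcalc / NLobs rows of Table 4.2 (C16H24O4) and Table 4.3 (C17H20N4).
correct = least_square_index(range(1, 10), [1, 2, 3, 3, 5, 5, 6, 8, 9])
incorrect = least_square_index(range(1, 6), [1, 1, 2, 4, 5])
print(f"{correct:.4f} {incorrect:.4f}")  # prints "0.9749 0.9292"
```

The candidate with the higher index, here C16H24O4, is the one retained.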
4.4.5 Nitrogen-enriched or oxygen-enriched compounds
Bases such as adenine and cytosine are typical nitrogen-enriched
groups, and saccharides are typical oxygen-enriched groups. As the molecular masses
of CN4 (68.0122 Da) and H4O4 (68.0109 Da) are very close, the molecular formula
analysis of nitrogen-enriched and oxygen-enriched fragments tends to end up with
two molecular formulas, one nitrogen-enriched and one oxygen-enriched. For example,
the analysis of a glucose neutral loss (C6H12O6, 180.0633 Da) also assigns a nitrogen-
enriched neutral loss (C7H8N4O2, 180.0647 Da), such as p-xanthine, theophylline, and
NSC265259, with a molecular mass difference of only 0.0013 Da. They may be
distinguished by another dehydrated neutral loss (C6H10O5, 162.0528 Da), i.e.,
dehydration should be observed for glucose, but not for the base. In order to minimize
the failure of the molecular formula analysis of oxygen- or nitrogen-enriched
compounds, a few commonly observed saccharides and bases are pre-assigned in the
software and the presence/absence of the dehydrated fragment is manually confirmed
later. The molecular formula analysis of streptomycin failed, exceptionally, as it
contains both a nitrogen-enriched substructure (C8H18N6O4) and oxygen-enriched
substructures (C7H13NO4 and C6H8O4). Since none of them is commonly present in
small molecules, they are not pre-assigned, resulting in no unique molecular formula
of the precursor molecule being assigned. Nucleoside analogs containing bases and
ribose analogs may generally have the same problem.
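The glucose ambiguity described above can be checked with a brute-force search over
C, H, N and O compositions. The following is a minimal sketch, not the search used in
the software: the function names are hypothetical, the monoisotopic masses are
standard values, and no chemical plausibility filter is applied, so the candidate list
may contain compositions a chemist would reject.

```python
# Monoisotopic masses in Da.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def mass(c, h, n, o):
    return c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]


def to_string(c, h, n, o):
    return "".join(
        sym + (str(k) if k > 1 else "") for sym, k in zip("CHNO", (c, h, n, o)) if k
    )


def candidates(target, tol=0.002):
    """List (formula, error) for every CHNO composition within tol of target."""
    hits = []
    limit = target + tol
    for c in range(int(limit // MONO["C"]) + 1):
        after_c = limit - c * MONO["C"]
        for n in range(int(after_c // MONO["N"]) + 1):
            after_n = after_c - n * MONO["N"]
            for o in range(int(after_n // MONO["O"]) + 1):
                # Hydrogen is the lightest element, so only the nearest
                # integer count can fall inside a 0.002 Da window.
                rest = target - mass(c, 0, n, o)
                h = round(rest / MONO["H"])
                if h < 0:
                    continue
                error = mass(c, h, n, o) - target
                if abs(error) <= tol:
                    hits.append((to_string(c, h, n, o), error))
    return hits


found = dict(candidates(180.0634, tol=0.002))
print("C6H12O6" in found, "C7H8N4O2" in found)
```

With the 0.002 Da cut off both the glucose composition and the nitrogen-enriched
composition are returned, which is why pre-assignment and the dehydration check are
used to separate them.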
4.5 Results and Discussion
4.5.1 Risk of assigning incorrect molecular formula
The first “Forward Analysis” step is typically performed with the restrictions
of a 200 Da molecular mass cutoff and 0.002 Da mass accuracy. The outcomes of
“Forward Analysis” applied to 96 small molecules (138 – 1569 Da) were:
1) “Forward Analysis” of 86 of the 96 compounds (90%) resulted in a
unique molecular formula for each precursor molecule. Among them, 83
compounds got the correct molecular formula and 3 compounds got an incorrect
molecular formula (97% success rate).
2) No molecular formula was assigned for two compounds (troleandomycin,
813.4511 Da; streptomycin, 581.2657 Da). In the case of troleandomycin, only a few
fragment ions were detected. The MS/MS analysis does not work when there are
not enough detected peaks of fragment ions. In the case of streptomycin, a sufficient
number of fragment peaks was observed; however, our analysis seems to be
rather weak in analyzing sugar-containing compounds, requiring further
improvement of the analysis.
3) Two molecular formulas were assigned for each of seven compounds. “Least-
square Index” on the outcome of “Reverse Analysis” assigned the correct
molecular formula in all seven cases.
4) Three molecular formulas were assigned for one compound, protoveratrine A
(793.4249 Da). “Least-square Index” on the outcome of “Reverse Analysis”
selected the correct molecular formula.
Thus, the integrated approach minimized the risk of assigning an incorrect
molecular formula to 3% (3 out of 96 compounds).
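The tallies above can be cross-checked with a few lines of arithmetic. The variable
names below are illustrative only; the counts are those stated in the four outcomes.

```python
total = 96
unique_formula = 86      # outcome 1: a single formula from Forward Analysis
unique_correct = 83
unique_incorrect = 3
none_assigned = 2        # outcome 2: troleandomycin and streptomycin
two_candidates = 7       # outcome 3: resolved by the least-square index
three_candidates = 1     # outcome 4: protoveratrine A, also resolved

# The four outcomes account for every compound in the test set.
assert unique_correct + unique_incorrect == unique_formula
assert unique_formula + none_assigned + two_candidates + three_candidates == total

finally_correct = unique_correct + two_candidates + three_candidates


def percent(part, whole):
    return 100.0 * part / whole


print(f"unique formula:        {percent(unique_formula, total):.1f}%")
print(f"correct among unique:  {percent(unique_correct, unique_formula):.1f}%")
print(f"incorrect overall:     {percent(unique_incorrect, total):.1f}%")
print(f"correct overall:       {percent(finally_correct, total):.1f}%")
```

The last figure, 91 of 96 compounds, is derived here from the stated counts and is
not quoted in this section.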
4.5.2 Mass accuracy
The accuracy of the data is crucial for the success of the molecular formula
analysis. The mass instrument is calibrated and tuned for peak shape (symmetry and
tailing), resolving power (9,000 - 10,000), and ion abundance in the mass ranges of
interest. Peaks whose ion abundance exceeded the peak saturation (400
counts/scan) or was less than 20 counts/scan tended to have large errors even after
accumulation of several scans and were not used in the analysis. Alternatively,
an isotope peak could be used if the monoisotopic peak exceeded the peak saturation.
The molecular formula analysis used a total of 850 peaks of the 96 compounds. After
assigning the molecular formula to all of the 850 peaks, the mass accuracy of these
peaks was 0.0012 Da with 0.0016 Da precision. The mass accuracy of 734 peaks
(86% of 850 peaks) was ≤0.002 Da, while the mass accuracies of 55, 36, 18 and 7
peaks were 0.002 - 0.003 Da, 0.003 – 0.005 Da, 0.005 – 0.01 Da, and 0.01 – 0.014
Da, respectively. The peaks at high m/z tend to be less accurate: 60% and
90% of the precursor and fragment ions below 500 Da showed an accuracy of ≤
0.001 Da and ≤ 0.002 Da, respectively, whereas the precursor and fragment ions
above 500 Da showed slightly lower accuracy, i.e., the accuracies of 40% and 70% of
them were ≤ 0.001 Da and ≤ 0.002 Da. For the molecular formula analysis, there is no
need to use many peaks. Instead, it is more important that the m/z difference between
neighboring fragment ions preferably does not exceed 200 Da.
4.5.3 Fragmentation pathways of brefeldin A
The MS/MS spectrum of brefeldin A is shown in Figure 4.01. Figure
4.02 shows plausible fragmentation pathways analyzed based on the fragment ions
used in the molecular formula analysis (Table 4.2). Further MSn (n ≥ 3) studies are
required to validate them. Also the fragment ions that are not used in the molecular
formula analysis would provide more detailed fragmentation pathways.
Figure 4.01 The MS/MS spectrum of brefeldin A
Nevertheless, the plausible fragmentation pathways provide useful information
on the structure and substructure of the target molecules. McLafferty rearrangement is
introduced to explain three consecutive water losses from [C16H25O4]+ (m/z 281) to
[C16H23O3]+ (m/z 263), [C16H21O2]+ (m/z 245), and [C16H19O]+ (m/z 227) and the
loss of CO from [C16H21O2]+ (m/z 245) to [C15H21O]+ (m/z 217) and from
[C16H19O]+ (m/z 227) to [C15H19]+ (m/z 199). Further fragmentations are described in
two pathways from [C16H19O]+ (m/z 227) and from [C15H21O]+ (m/z 217) in Figure
4.02. Meijuan et al. reported other fragmentation pathways of brefeldin A (Meijuan
et al., 2006). The molecular formulas of fragment ions at m/z 227, 217, 199, and 185
are different from ours. If we use their molecular formula analysis, the fragment
peaks at 227, 217, 0.0373 Da, respectively, which are highly unlikely based on our
mass measurement accuracy.
Figure 4.02 Fragmentation pathways of brefeldin A
[Scheme: McLafferty rearrangement followed by successive neutral losses (−H2O, −CO,
−H2C=C=O, −H2C=C=C=O, −2xH2, −C2H4) connecting the ions C16H25O4+ (281 Da),
C16H23O3+ (263 Da), C16H21O2+ (245 Da), C16H19O+ (227 Da), C15H21O+ (217 Da),
C15H19+ (199 Da), C14H17+ (185 Da), C12H19+ (163 Da), C12H15+ (159 Da), and
C10H11+ (131 Da); structure drawings not reproduced.]
4.5.4 Molecules with single structural domain
The majority of small molecules consist of a single core structure. An
example is prazosin (C19H21N5O4, 383.1594 Da) for which the MS/MS spectrum is
shown in Figure 4.03 and the result of the MFA is shown in Table 4.4. The molecular
formula of prazosin is correctly assigned to C19H21N5O4. Some of the neutral losses,
which are underlined, are easily assigned to the losses of methane (CH4), water
(H2O), and furan (C4H4O), suggesting substructures of CH3-(O or N)- and furan.
Molecular formula of the peak at m/z 232 was incorrectly assigned to C13H14NO3+
and was manually corrected to a free radical of C11H11N4O2• based on the structure of
prazosin (bold in Table 4.4). Since the fragmentation producing free radicals hardly
occurs in ESI Q-TOF instrument, the software does not accept the free radical
fragmentations, which thus have to be corrected manually as in Table 4.4. Among the
fragmentations of 96 compounds, this was the only case to produce free radical. It
should be emphasized that the software works even if the molecular formulas of one
or a few fragment ions are incorrectly assigned. As the observed m/z values of one or
a few peaks may have errors larger than the error cut off, the software was designed
to tolerate such errors. Those one or few incorrectly assigned molecular formulas are
easily identified and corrected manually, once the molecular formulas of other
fragment and precursor ions are assigned.
Figure 4.03 The MS/MS spectrum of prazosin.
Prazosin is predominantly split into two fragment ions of a core of C12H15N4O2+ and a
side chain of C7H8NO2+ through the cleavages of two C-N bonds (Figure 4.04). The core
C12H15N4O2+ loses a methyl free radical to form C11H11N4O2•. Another type of two C-N
bond cleavages of prazosin produces C9H10NO2+ and C10H12N4O2; however, it is not
clear why the charge is localized to C9H10NO2+ rather than C10H12N4O2 as the charge is
easily localized on the similar fragment C12H15N4O2+ (m/zobs. = 247.1214). Other
plausible fragmentation pathways of the fragment ions used in the molecular formula
analysis are shown in Figure 4.04. Erve et al. reported the fragmentation pathway of
prazosin in agreement with ours (Erve et al., 2008). The major structural difference is the
fragment peak at 231 m/z. Their fragmentation includes a cleavage of a relatively stable
C-C bond, whereas our analysis includes the cleavages of less stable heteroatom C-O
bond. It should be mentioned that the fragmentation pathways are analyzed by using the
fragment ions used in the molecular formula analysis. Other peaks could be incorporated
to get more detailed analysis of the fragmentation pathways; however, they are all
speculative and are not worthwhile unless required.
Figure 4.04 Fragmentation pathways of prazosin.
[Scheme: the major pathway and side pathways connecting the ions C19H22N5O4+ (384 Da),
C18H18N5O4+ (368 Da), C19H20N5O3+ (366 Da), C18H16N5O3+ (350 Da), C12H15N4O2+
(247 Da), C11H12N4O2 (232 Da), C11H11N4O2+ (231 Da), C9H10NO2+ (164 Da), and
C7H8NO2+ (138 Da) through losses of CH4, H2O, and CH3•; structure drawings not
reproduced.]
Table 4.4 Molecular Formula Analysis of prazosin
MMobs.        383.1581      367.1291      365.1492      349.1183      246.1096      231.0888      230.0805      163.0638      137.0474
-             C19H21N5O4    C18H17N5O4    C19H19N5O3    C18H15N5O3    C12H14N4O2    C11H11N4O2·a  C11H10N4O2    C9H9NO2       C7H7NO2
C19H21N5O4    -             CH4           H2O           CH6O          C7H7NO2       C8H10NO2·     C8H11NO2      C10H12N4O2    C12H14N4O2
C18H17N5O4                  -                           H2O           C6H4NO2       C7H6NO2·      C7H7NO2       C9H8N4O2      C11H10N4O2
C19H19N5O3                                -             CH4           C7H5NO        C8H8NO·       C8H9NO        C10H10N4O     C12H12N4O
C18H15N5O3                                              -             C6HNO         C7H4NO·       C7H5NO        C9H6N4O       C11H8N4O
C12H14N4O2                                                            -             CH3·          CH4           C3H5N3        C5H5N3
C11H11N4O2·a                                                                        -                           C2H2O·        C4H4O·
C11H10N4O2                                                                                        -             C2HN3         C4H3N3
C9H9NO2                                                                                                         -             C2H2
C7H7NO2                                                                                                                       -
a initial incorrect assignment to C13H13NO3 required manual correction to C11H11N4O2·
4.5.5 Molecules with multiple core structures.
Some molecules consist of two or more core structures. An example was
shown with Taxol in our previous paper (Konishi et al., 2007). Taxol split into two
core structures in the MS/MS fragmentation. Another example is the ergot alkaloid
dihydroergotamine (C33H37N5O5, 583.2795 Da), of which the MS/MS spectrum is shown
in Figure 4.05A. Table 4.5 is the result of the molecular formula analysis. The
molecular formula of dihydroergotamine is correctly assigned to C33H37N5O5.
However, the molecular formula of a peak at m/z 270 was incorrectly assigned to
C11H19N5O3, as the dehydrated dihydroergotamine split into a fragment ion
C17H17N2O3+ (m/z 297) and a neutral loss C16H19N3O (269 Da). The peak at m/z 270
must be the protonated form of C16H19N3O. Thus the molecular formula of the peak
at m/z 270 was corrected to C16H20N3O+ (bold in Table 4.5). Some small neutral
losses, which are underlined in Table 4.5, are easily assigned to the losses of water
(H2O), carbon monoxide (CO), carbon dioxide (CO2), and ammonia (NH3).
Figure 4.05A The MS/MS spectrum of dihydroergotamine
Figure 4.05B The MS/MS spectrum of dihydroergocristine
Table 4.5 Molecular Formula Analysis of dihydroergotamine
MMobs.      583.2810    565.2702    537.2753    321.1477    296.1163    269.1504    252.1260
-           C33H37N5O5  C33H35N5O4  C32H35N5O3  C19H19N3O2  C17H16N2O3  C16H19N3Oa  C16H16N2O
C33H37N5O5  -           H2O         CH2O2       C14H18N2O3  C16H21N3O2  C17H18N2O4  C17H21N3O4
C33H35N5O4              -           CO          C14H16N2O2  C16H19N3O   C17H16N2O3  C17H19N3O3
C32H35N5O3                          -           C13H16N2O   C15H19N3    C16H16N2O2  C16H19N3O2
C19H19N3O2                                      -                       C3O         C3H3NO
C17H16N2O3                                                  -                       CO2
C16H19N3Oa                                                              -           NH3
C16H16N2O                                                                           -
a initial incorrect assignment to C11H19N5O3 required manual correction to C16H19N3O
The plausible fragmentation pathway of dihydroergotamine is shown in Figure 4.06.
The precursor molecule is in ketal-keto equilibrium. A cleavage of the ketal and keto
forms at an amide bond in the linker splits the precursor into a fragment ion
C16H17N2O+ (m/z 253) and a neutral molecule C17H21N3O4. A cleavage of the keto
form at a C-N bond of the linker splits the precursor ion into a fragment ion
C16H20N3O+ (m/z 270) and a neutral molecule C17H18N2O4. The ketal form loses
formic acid to form C32H36N5O3+ (m/z 538), of which the linker is cleaved at a C-N
bond. The keto form similarly loses water to form C33H36N5O4+ (m/z 566). The
linkers of these fragment ions C32H36N5O3+ (m/z 538) and C33H36N5O4+ (m/z 566) are
cleaved at each of two amide bonds and a C-N bond, generating fragment ions
C19H20N3O2+ (m/z 322), C16H17N2O+ (m/z 253), and C16H20N3O+ (m/z 270),
respectively. The cleavage of the amide bond of the dehydrated fragment ion
C33H36N5O4+ (m/z 566) also generates a fragment ion C17H17N2O3+.
4.5.6 Analysis of structurally-related compounds
Drugs are metabolized in vivo. Some metabolites have biological activities
and/or toxicity that may cause adverse effects. Thus, metabolomics and
metabonomics play key roles in drug development. Similarly, series of chemical
modifications of natural products in vivo, such as microbial transformation,
biotransformation and plant cell culture, produce their derivatives. Since metabolites
are structurally related with minor chemical modifications, molecular formula
analysis of metabolites and their fragment ions would provide some structural
information to identify/characterize them. Dihydroergotamine and dihydroergocristine
are structurally close homologues and thus, we used them as a model, although
dihydroergotamine is not a metabolite of dihydroergocristine and vice versa. The
MS/MS spectrum of dihydroergocristine is shown in Figure 4.05B and Table 4.6 lists
the precursor and MS/MS fragment ions of dihydroergocristine, which is homologous
to dihydroergotamine in Table 4.5. The molecular formula of a fragment ion at m/z
594 was not assigned in Reverse Analysis; however, the neutral loss of 18.0096 Da
was manually assigned to H2O (MMcalc = 18.0106 Da), allowing the assignment of the
molecular formula of the fragment ion to C35H39N5O4 (bold in Table 4.6).

[Scheme: ketal and keto forms of the precursor ion C33H38N5O5+ (584 Da) and the
fragment ions C33H36N5O4+ (566 Da, −H2O), C32H36N5O3+ (538 Da, −HCOOH),
C19H20N3O2+ (322 Da), C17H17N2O3+ (297 Da), C16H20N3O+ (270 Da), and C16H17N2O+
(253 Da); structure drawings not reproduced.]
Figure 4.06 Fragmentation pathways of dihydroergotamine.
The fragment ions of 253 and 270 Da at the center of the third row are also observed in the
MS/MS spectrum of dihydroergocristine. The two fragments of the first row, the 2nd fragment
of the second row and the first and last fragment of the third row are the precursor ion and the
fragment ions that are different from those in the MS/MS spectrum of dihydroergocristine.
The first fragment ion of the second row was not observed in the MS/MS spectrum of
dihydroergocristine.
The structure of dihydroergocristine is analyzed based on the structure
of dihydroergotamine and the assumed comparative fragmentation pathways of
dihydroergotamine and dihydroergocristine. The MS/MS spectra of
dihydroergotamine and dihydroergocristine are similar, with common fragment
ions at m/z 270 and 253. Thus, as the first step of constructing the structure of
dihydroergocristine, it is assumed to contain the same fragment ions at m/z 270 and
253, as shown in red in Figure 4.06. The precursor and fragment ions
of dihydroergocristine at m/z 612, 594, 350, 325 have an extra C2H4 relative to the
precursor and fragment ions of dihydroergotamine at m/z 584, 566, 322, 297,
respectively (those fragment ions are in blue color). Figure 4.07 shows the
uncommon portion of these ions circled in red. C2H4 can be added only to the
methyl group, converting it to an n-propyl group or isopropyl group. Indeed,
dihydroergocristine has an isopropyl group.
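The constant C2H4 offset between the two homologues can be verified by subtracting
the neutral formulas listed in Tables 4.5 and 4.6 pairwise. The sketch below is an
illustration only, and the helper names are hypothetical.

```python
import re
from collections import Counter


def parse(formula):
    """Turn a formula string such as 'C35H41N5O5' into element counts."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def difference(larger, smaller):
    """Element-wise difference between two formulas, dropping zero entries."""
    diff = parse(larger)
    diff.subtract(parse(smaller))
    return {element: n for element, n in diff.items() if n != 0}


# (dihydroergocristine, dihydroergotamine) pairs for the ions at
# m/z 612/584, 594/566, 350/322 and 325/297.
pairs = [
    ("C35H41N5O5", "C33H37N5O5"),
    ("C35H39N5O4", "C33H35N5O4"),
    ("C21H23N3O2", "C19H19N3O2"),
    ("C19H20N2O3", "C17H16N2O3"),
]
offsets = [difference(cristine, tamine) for cristine, tamine in pairs]
print(offsets)
```

Every pair differs by two carbons and four hydrogens, consistent with the methyl
to isopropyl substitution.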
Table 4.6 Molecular Formula Analysis of dihydroergocristine
MMobs.      611.3123    593.3027    349.1784    324.1481    269.1506    252.1265
-           C35H41N5O5  C35H39N5O4  C21H23N3O2  C19H20N2O3  C16H19N3Oa  C16H16N2O
C35H41N5O5  -           H2O         C14H18N2O3  C16H21N3O2  C19H22N2O4  C19H25N3O4
C35H39N5O4              -           C14H16N2O2  C16H19N3O   C19H20N2O3  C19H23N3O3
C21H23N3O2                          -                                   C5H7NO
C19H20N2O3                                      -                       C3H4O2
C16H19N3Oa                                                  -
C16H16N2O                                                               -
a initial incorrect assignment to C11H19N5O3 required manual correction to C16H19N3O
[Structure drawings of dihydroergocristine and dihydroergotamine; not reproduced.]
Figure 4.07 Structures of dihydroergotamine and dihydroergocristine, in which
the location of structural difference are circled.

4.5.7 Cyclazocine and N-allylnormetazocine
Another example of metabolites is cyclazocine and N-
allylnormetazocine, where we take cyclazocine as known compound and N-
allylnormetazocine as its analog. Figures 4.08A and 4.08B show the MS/MS spectra
of cyclazocine and N-allylnormetazocine, respectively. Tables 4.7 and 4.8 list the
precursor and fragment ions of benzomorphan opioids cyclazocine and N-
allylnormetazocine, respectively. The plausible fragmentation pathway of cyclazocine
is shown in Figure 4.09. The fragment ions of cyclazocine at m/z 175, 173 and 159
are the same as those of N-allylnormetazocine. Another fragment ion at m/z 216,
which was not used in the molecular formula analysis, is also common and is added in
the fragmentation pathway (Figure 4.09). The molecular formula difference between
cyclazocine and N-allylnormetazocine is CH2. Since the peak at m/z 216 is the largest
fragment ion of N-allylnormetazocine, it is very likely that the nitrogen is alkylated in
similar way as cyclopropylmethyl group of cyclazocine with cyclopropyl or allyl
group. Indeed, N-allylnormetazocine has N-allyl group.
Figure 4.08A The MS/MS spectrum of cyclazocine

Figure 4.08B The MS/MS spectrum of N-allylnormetazocine
Table 4.7 Molecular Formula Analysis of cyclazocine
MMobs.    271.1928  229.1472  217.1466  174.1043  172.0883  158.0729
-         C18H25NO  C15H19NO  C14H19NO  C12H14O   C12H12O   C11H10O
C18H25NO  -         C3H6      C4H6      C6H11N    C6H13N    C7H15N
C15H19NO            -                   C3H5N     C3H7N     C4H9N
C14H19NO                      -         C2H5N     C2H7N     C3H9N
C12H14O                                 -         H2        CH4
C12H12O                                           -         CH2
C11H10O                                                     -
Table 4.8 Molecular Formula Analysis of N-allylnormetazocine
MMobs.    257.1773  174.1041  172.0876  158.0725
-         C17H23NO  C12H14O   C12H12O   C11H10O
C17H23NO  -         C5H9N     C5H11N    C6H13N
C12H14O             -                   CH4
C12H12O                       -         CH2
C11H10O                                 -

[Scheme: the ions C18H26NO+ (272 Da), C15H20NO+ (230 Da), C14H20NO+ (218 Da),
C14H18NO+ (216 Da), C12H15O+ (175 Da), C12H13O+ (173 Da), and C11H11O+ (159 Da);
structure drawings not reproduced.]
Figure 4.09 Fragmentation pathways of cyclazocine.
The common fragment ions with those of N-allylnormetazocine are the second and third
of the second row and those of the third row. The uncommon molecular ion with that of
N-allylnormetazocine is the first of the second row. N-Allylnormetazocine did not show
corresponding fragment ions of the first row in the MS/MS spectrum
4.5.8 Peptides
Molecular formula analysis of peptides is useful for proteomics and
identification of bioactive peptide metabolites. 5-Leucine enkephalin was analyzed as
an example (Figure 4.10 and Appendix C, Table C.1). Fourteen fragment ions were
used for the analysis, more than those used for small organic molecules as the
analysis was extended from the molecular formula analysis to peptide sequencing.
The molecular formulas of the fragment ions at m/z 538 and 510 were not assigned by
the software due to the larger errors of the observed m/z values than the error cutoff
0.002 Da, and were assigned manually as shown in bold. The molecular formulas of
the neutral losses that are assigned manually are also shown in bold.

Figure 4.10 The MS/MS spectrum of 5-leucine enkephalin
The use of molecular formula analysis for peptides is not a goal. Proteomics
requires further sequence analysis, which was carried out with the following stepwise
analysis of the MS/MS data. The first step is the analysis of amino acid composition
of the peptide. Some neutral losses are assigned to amino acid residues (free amino
acids – H2O) and di-peptide residues as shown in parenthesis below the molecular
formula with grey background, resulting Gly, Leu, Phe and Tyr, where Leu could be
Ile as they are not distinguished by our MS/MS fragmentation energy. Other amino
acids are not found in the neutral losses. Adding the molecular masses of the four
amino acid residues and H2O for C-terminal residue resulted in the molecular mass of
(2Gly + Leu + Phe + Tyr + H2O) matching to the precursor molecule. Thus, the
amino acid composition of the peptide is 2Gly, Leu (or Ile), Phe, and Tyr with C-
terminal free carboxyl group. The molecular formulas of the fragment ions are
assigned to the amino acid residues, H2O, CO, NH3, and HCO2H reflecting different
amide bond cleavage sites and N- or C-terminal residues. The molecular formula of a
fragment C11H12N2O2 (204 Da) is, for example, assigned to (Gly + Phe) and other
assignments are listed in the second row from the bottom (see Appendix C, Table
C.1). The bottom row of Table C.1 lists the amino acid residues, H2O, CO, NH3,
HCO2H, lost from the precursor molecule to form the corresponding fragment ions.
Alternatively, the amino acid composition can be obtained by tracing
fragmentation pathways, which release amino acid residues (or di-peptides). In the
case of 5-leucine enkephalin, one such fragmentation pathway is shown in Figure
4.11A, resulting in the composition 2Gly, Leu, Phe, Tyr and H2O. The second step
is the identification of the N- or C-terminal residues (Figure 4.11B). Since Leu and
Tyr are the first amino acid residues that can be fragmented off from the precursor
ion, they are the N- or C-terminal residues. The third step is the identification of the
C-terminal residue. As peptides often have a free carboxyl group or are blocked with
an amide, fragment ions that contain Leu-OH, Leu-NH2, Tyr-OH, or Tyr-NH2 were
searched, resulting in a fragment ion of (Phe, Leu)-OH. Therefore, Leu was assigned
to the C-terminal residue, and, automatically, Tyr was assigned to the N-terminal
residue. The fourth step is the sequencing of internal amino acid residues. Starting
from the C-terminal Phe-Leu-OH fragment ion, amino acid residues (or di-peptides if
necessary) are sequentially added along the fragmentation pathway as in Figure 4.11C.
Similarly, starting from the N-terminal Tyr-Gly fragment ion, amino acid residues (or
di-peptides if necessary) are sequentially added as shown in Figure 4.11D. The final
step applies the sequence Tyr-Gly-Gly-Phe-Leu-OH to all fragment ions in order to
confirm the sequence. Each line below represents the sequence region of the fragment
of which the molecular formula is shown in the left column. As all of the fragments
used in the molecular formula analysis fit the sequence, the Tyr-Gly-Gly-Phe-Leu-OH
sequence is confirmed (Figure 4.12).
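The composition and sequence bookkeeping can be checked by summing residue
formulas. This is a minimal sketch with hypothetical names; the residue formulas are
the standard free amino acid formulas minus H2O, and only the four residues found in
5-leucine enkephalin are tabulated.

```python
from collections import Counter

# Amino acid residues (free amino acid minus H2O).
RESIDUES = {
    "Gly": Counter(C=2, H=3, N=1, O=1),
    "Leu": Counter(C=6, H=11, N=1, O=1),
    "Phe": Counter(C=9, H=9, N=1, O=1),
    "Tyr": Counter(C=9, H=9, N=1, O=2),
}
WATER = Counter(H=2, O=1)


def formula_of(residues, free_acid=False):
    """Sum residue formulas; add H2O when the C-terminal free acid is included."""
    total = Counter()
    for name in residues:
        total += RESIDUES[name]
    if free_acid:
        total += WATER
    return "".join(el + (str(total[el]) if total[el] > 1 else "") for el in "CHNO")


SEQUENCE = ["Tyr", "Gly", "Gly", "Phe", "Leu"]

precursor = formula_of(SEQUENCE, free_acid=True)       # Tyr-Gly-Gly-Phe-Leu-OH
dehydrated = formula_of(SEQUENCE)                      # Tyr-Gly-Gly-Phe-Leu
c_terminal = formula_of(SEQUENCE[3:], free_acid=True)  # Phe-Leu-OH
n_terminal = formula_of(SEQUENCE[:2])                  # Tyr-Gly
gly_phe = formula_of(["Gly", "Phe"])
print(precursor, dehydrated, c_terminal, n_terminal, gly_phe)
```

Each computed formula matches one of the fragment formulas used in the analysis,
which is the consistency check that the final step performs for all fragments.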
A) C28H37N5O7 (555 Da) −H2O → C28H35N5O6 (537 Da) −Leu → C22H24N4O5 (424 Da)
   −Phe → C13H15N3O4 (277 Da) −Gly → C11H12N2O3 (220 Da) (Gly, Tyr)
B) C28H37N5O7 (555 Da) −H2O → C28H35N5O6 (537 Da) −Tyr → C19H26N4O4 (374 Da)
   −Leu → C13H15N3O3 (261 Da) −Gly → C11H12N2O2 (204 Da) (Gly, Phe)
C) C15H22N2O3 (278 Da) (Phe-Leu-OH) +Gly → C17H25N3O4 (335 Da) (Gly-Phe-Leu-OH)
   +Gly → C19H28N4O5 (392 Da) (Gly-Gly-Phe-Leu-OH)
   +Tyr → C28H37N5O7 (555 Da) (Tyr-Gly-Gly-Phe-Leu-OH)
D) C11H12N2O3 (220 Da) (Tyr-Gly) +Gly → C13H15N3O4 (277 Da) (Tyr-Gly-Gly)
   +Phe → C22H24N4O5 (424 Da) (Tyr-Gly-Gly-Phe)
   +Leu → C28H35N5O6 (537 Da) (Tyr-Gly-Gly-Phe-Leu)
   +H2O → C28H37N5O7 (555 Da) (Tyr-Gly-Gly-Phe-Leu-OH)
Figure 4.11 Stepwise analysis of 5-leucine enkephalin sequences

152
Figure 4.12 Overall detailed analysis of 5-leucine enkephalin. Fragment molecular
formulas mapped onto the Tyr-Gly-Gly-Phe-Leu-OH structure: C28H37N5O7,
C28H35N5O6, C27H35N5O5, C27H32N4O5, C22H24N4O5, C21H24N4O4, C19H28N4O5,
C21H21N3O4, C19H26N4O4, C17H25N3O4, C15H22N2O3, C13H15N3O4, C13H15N3O3,
C11H12N2O3 and C11H12N2O2.
4.5.9 Chloro- or bromo-containing compounds
The molecular formula analysis was extended to organic compounds
that contain chlorine and/or bromine atoms in addition to C, H, N and O. Including
chlorine and bromine atoms directly in the automated molecular formula analysis is not
possible with a precision of 0.002 Da. However, as chlorine and bromine have
characteristic isotopes, 37Cl (24.47% natural abundance) and 81Br (49.48% natural
abundance), the chlorine and/or bromine atoms in the precursor ion and fragment ions
are estimated from their isotope peaks and replaced with hydrogen atom(s). The
molecular masses of these ions are modified accordingly and are submitted to the
molecular formula analysis. The chlorine and/or bromine atoms are then restored to the
molecular formulas manually. The method was applied to five compounds,
beclometasone dipropionate (C28H37ClO7, 520.2228 Da), benzamil (C13H14ClN7O,
319.0948 Da), brimonidine (C11H10BrN5, 291.0120 Da), phenoxybenzamine
(C18H22ClNO, 303.1390 Da), and quinacrine (C23H30ClN3O, 399.2077 Da), resulting in
correct molecular formulas for all of them. Figure 4.13 shows the MS/MS spectrum of
quinacrine as an example. Table 4.9 shows the molecular formula analysis after
restoring a chlorine atom, and Figure 4.14 shows the chemical fragmentation pathway of
quinacrine based on the analysis in Table 4.9. Since 5 compounds containing Cl or Br
may not be enough to generalize the analysis, these compounds are not included in the
96 compounds analyzed in this article.
Figure 4.13 The MS/MS spectrum of quinacrine
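The halogen-to-hydrogen substitution described above amounts to simple bookkeeping on the formula and its mass. The following is a minimal sketch of that step, assuming standard monoisotopic masses (not quoted in the thesis) and hypothetical helper names; the detection of Cl or Br from isotope peaks is not modeled.

```python
# Monoisotopic masses in Da. Standard reference values supplied as an
# assumption; they are not quoted in the thesis.
MONO = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146,
    "Cl": 34.96885268, "Br": 78.9183371,
}


def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())


def halogens_to_hydrogen(formula):
    """Replace every Cl and Br by H so a CHNO-only analysis can be run."""
    out = dict(formula)
    n_halogen = out.pop("Cl", 0) + out.pop("Br", 0)
    out["H"] = out.get("H", 0) + n_halogen
    return out


def restore_halogens(formula, n_cl=0, n_br=0):
    """Put the halogens back in place of the surrogate hydrogens."""
    out = dict(formula)
    out["H"] -= n_cl + n_br
    if n_cl:
        out["Cl"] = n_cl
    if n_br:
        out["Br"] = n_br
    return out


quinacrine = {"C": 23, "H": 30, "Cl": 1, "N": 3, "O": 1}
surrogate = halogens_to_hydrogen(quinacrine)
print(round(mono_mass(quinacrine), 4), round(mono_mass(surrogate), 4))
```

The surrogate mass is the value that would be submitted to the CHNO analysis; `restore_halogens(surrogate, n_cl=1)` returns the original quinacrine formula.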
Table 4.9 Molecular Formula Analysis of quinacrine
MMobs.        399.2080      326.1167      258.0573      243.0446     141.1510
-             C23H30ClN3O   C19H19ClN2O   C14H11ClN2O   C14H10ClNO   C9H19N
C23H30ClN3O   -             C4H11N        C9H19N        C9H20N2      C14H11ClN2O
C19H19ClN2O                 -             C5H8          C5H9N        C10ClNO
C14H11ClN2O                               -
C14H10ClNO                                              -
C9H19N                                                               -
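The off-diagonal entries of Table 4.9 are neutral losses, i.e. the element-wise difference between a parent formula and a fragment formula. A minimal sketch of that subtraction is given below; the function names are hypothetical and the parsing assumes simple formulas without brackets or charges.

```python
import re


def parse_formula(text):
    """Parse a simple formula such as 'C23H30ClN3O' into {element: count}."""
    return {
        el: int(n) if n else 1
        for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", text)
    }


def neutral_loss(parent, fragment):
    """Return parent minus fragment as a formula string, or None if impossible."""
    p, f = parse_formula(parent), parse_formula(fragment)
    diff = {el: p.get(el, 0) - f.get(el, 0) for el in set(p) | set(f)}
    if any(n < 0 for n in diff.values()):
        return None  # the fragment cannot be derived from this parent
    order = ["C", "H"] + sorted(el for el in diff if el not in ("C", "H"))
    return "".join(
        el + (str(diff[el]) if diff[el] > 1 else "")
        for el in order
        if diff.get(el, 0) > 0
    )


fragments = ["C19H19ClN2O", "C14H11ClN2O", "C14H10ClNO", "C9H19N"]
for frag in fragments:
    print(frag, neutral_loss("C23H30ClN3O", frag))
```

Run on the precursor row, this reproduces the losses C4H11N, C9H19N, C9H20N2 and C14H11ClN2O listed in the table.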
Figure 4.14 Plausible fragmentation pathways of quinacrine, where the cleavages
occurred at C-N bonds. Ions shown: C23H31ClN3O+ (400 Da), C19H20ClN2O+ (327 Da),
C14H12ClN2O+ (259 Da), C14H11ClNO+ (244 Da) and C9H20N+ (142 Da).
4.6 Conclusions
The automated software developed to determine the molecular formula (C, H, N,
and O) of precursor molecules demonstrated a high success rate of 95%. The
software also determined the molecular formulas of fragment ions and neutral losses.
Although a few molecular formulas of the fragment ions were incorrectly assigned
due to large errors in the observed m/z values, the software still obtained the correct
molecular formula of the precursor ion. The incorrectly assigned molecular formulas
of fragment ions and neutral losses were then easily corrected manually. Using the
molecular formulas of the precursor ion, fragment ions and neutral losses, plausible
fragmentation pathways were estimated, which were used for the structural
characterization of homologous compounds such as metabolites of drugs and
secondary metabolites of natural products. Such structural characterization of
homologous metabolites may be used in the dereplication of natural products and the
identification of drug metabolites. The analysis was further successfully extended to
compounds containing Cl and Br.
4.7 Acknowledgements
We acknowledge Beata Usakiewicz for her technical assistance. This work
is supported by a McGill Chemical Biology Scholarship.

Chapter 5
Molecular Dynamics Ensemble in Virtual
Screening

Preface
The contents presented in the following chapter are from the following work:
Acoca S, Hogues H, Purisima E. 2010. Molecular Dynamics Ensembles in Virtual
Screening. Manuscript in preparation.

5.1 Rationale
Chapters 2 and 3 looked at the application of current modeling technologies to
address certain problems in the development of pharmaceuticals. Chapter 4 presented
method development for the identification of novel compounds from natural sources.
Chapter 5 continues with methods development by exploring the enhancements that
can be made to the virtual screening pipeline through the use of conformational
ensembles. Since high-resolution experimental ensembles are not always available for
targets, molecular dynamics ensembles have been proposed as a promising alternative.
Our work in Chapter 5 validated the use of MD-based ensembles using a test set of 9
targets chosen from the Directory of Useful Decoys. The enrichment results are
compared for ensembles generated from an apo MD, ensembles generated from a holo MD,
and the crystal structure itself.
5.2 Abstract
In order to assess the performance of molecular dynamics-based ensembles in
virtual screening, a set of 9 targets was selected from the Directory of Useful Decoys.
A 10 ns molecular dynamics simulation was obtained for each target structure with (holo)
and without (apo) the bound native ligand. Each trajectory was then clustered using
RMSD, with the criteria limiting the ensemble size to 14. Virtual screening enrichments
were then compared between the holo and apo ensembles and with that of the crystal
structure using the specialized decoy sets for each target. The results indicated significant
improvements in enrichments for targets displaying difficulties in docking compounds
using the crystal structure. In targets displaying significant enrichments on the crystal
structure alone, the benefits of conformational ensembles were minimal. Future studies
should address the specific nature of the MD which presents the most benefits to virtual
screening enrichments with regard to duration and ensemble selection. Other work
regarding the optimization of scoring methods for the use of ensembles in VS should also
be considered.
5.3 Introduction
One of the key objectives in computational drug design is the prediction of
accurate protein-ligand complexes. More often than not, this is achieved using docking
algorithms on a high-resolution crystal structure, though NMR-derived structures and
homology models have been used successfully (Cavasotto, 2011). However, the
previously accepted “lock and key” model of the interaction between a ligand and its
protein receptor is being replaced by a more complex and dynamic one whereby both the
protein and ligand adjust to accommodate each other in what is referred to as an induced
fit (Najmanovich et al., 2000; Fradera et al., 2002). In this model, molecules can modify
their shape and complementarity to maximize the total binding free energy (Verkhivker
et al., 2002). This is especially relevant in the active site region, where catalytic residues
are usually structurally stable while mobile loops that contain the bound ligand usually
display significant flexibility. An alternative model dominates the biochemical literature,
in which a protein is thought to exist in a number of energetically equivalent conformations.
In this model, ligand binding induces a change in the conformational ensemble of the target
by binding selectively to one of these conformers and thereby promoting an increase of
that conformation within the population (Ma et al., 2002; Amaro and Li, 2010). Ligands
therefore tend to shift the equilibrium to the conformation they preferentially bind to.
Thermodynamically, both models are equivalent. It is therefore clear that molecular
recognition techniques must aim to account for the conformational flexibility of both the
ligand and the target in an accurate and comprehensive manner.
Conformational sampling of small molecules represents the easier of the two to
predict. The results, however, do not necessarily guarantee the lowest energy conformation
of the bound ligand. Conformational sampling of the protein represents an even bigger
problem. The importance of target conformations on docking results has been highlighted
by the work of several groups (Murray et al., 1999; Teague, 2003; Ehrlich et al.,
2005). On one hand, apo conformations of a target have been found to be inadequate in
accommodating ligands because of wrongly positioned residues or loops that block
access to the binding site (Seidler et al., 2003). On the other hand, holo conformations
can display a significant bias for molecules that are structurally similar to the ligand
present in the original structure and miss a large proportion of other molecules that
display a different binding mode (Murray et al., 1999). One of the simplest means of
improving the performance of docking algorithms on holo structures is to reduce the van
der Waals radii of the protein, thereby potentially eliminating close contacts. This
approach likely yields improved performance in predicting the correct ligand binding
mode for a larger group of molecules. It does not, however, provide improved insight
into the specific interactions of the target with the ligand, since the positioning of the
protein atoms remains unchanged and may be inconsistent with that of the correct
ligand-bound receptor conformation. It also brings an increased propensity for false
positives in virtual screening.
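The effect of reducing van der Waals radii can be illustrated with a toy calculation. The sketch below assumes a 12-6 Lennard-Jones form and arbitrary example values for the contact distance, the optimal distance and the scaling factor; none of these come from the works cited above.

```python
def lennard_jones(r, r_min, epsilon=1.0, radius_scale=1.0):
    """12-6 energy with its minimum of -epsilon at r = radius_scale * r_min."""
    ratio = (radius_scale * r_min) / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


# Hypothetical close contact: atoms 3.0 A apart with an optimal distance of 3.5 A.
hard = lennard_jones(3.0, 3.5)                    # unscaled radii: repulsive
soft = lennard_jones(3.0, 3.5, radius_scale=0.8)  # radii reduced by 20%: attractive
print(f"unscaled: {hard:+.2f}  scaled: {soft:+.2f}")
```

With the unscaled radii the contact is penalized, while the scaled radii tolerate it. The same tolerance applies to decoys, which is consistent with the increased propensity for false positives noted above.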
Methods have been developed with the aim of incorporating the receptor’s
conformational flexibility into the screening process. These include docking to MD
structures (Wong et al., 2005; Frembgen-Kesner and Elcock, 2006), docking to normal
mode structures, induced-fit docking (Sherman et al., 2006), the dynamic pharmacophore
model (Carlson et al., 2000; Meagher and Carlson, 2004), and the relaxed complex
scheme (Lin et al., 2002; Lin et al., 2003). Additionally, receptor ensembles have also
been assembled from a collection of independent x-ray or NMR structures (Barril and
Morley, 2005; Huang and Zou, 2007; Damm and Carlson, 2007). The relaxed complex
scheme (RCS) is the best studied example to date of these methods (Lin et al., 2002; Lin
et al., 2003; Amaro et al., 2008a). The RCS was developed in an effort to overcome the
limitations imposed by docking to an experimental structure by incorporating the large
variety of conformational changes that characterize the binding process through the use
of ensembles in a virtual screening setting. The advantages of such an approach are
thought to be twofold. First, proper docking of a ligand to the receptor may only be
possible in conformations that occur infrequently in the dynamics of the receptor
(Teague, 2003). Improved sampling therefore provides a better opportunity for proper
docking of these molecules. Second, strong binding of molecules is sometimes a
reflection of their multivalent binding to the receptor. Hence, high-affinity compounds
may be preferentially selected in an ensemble-based screening.
A method combining thorough sampling of both the ligand and receptor is therefore more
likely to provide certain advantages in virtual screening. At the core of the RCS is the
structural information obtained from an all-atom MD simulation (typically on the order of
tens of ns) of the target crystal structure with a bound ligand or substrate. The resulting
set of structures is then clustered using either RMSD or QR-factorization methods to
obtain a representative ensemble that is used in docking experiments. By combining the
use of MD ensembles with the AutoDock docking software, the RCS was successfully
applied to a number of systems (Li et al., 2002; Schames et al., 2004; Amaro et al.,
2008b; Durrant et al., 2010b).
While the RCS has addressed the advantages of MD ensembles in a virtual
screening pipeline, little insight exists as to the optimal parameters through which the
ensemble set is chosen. In this work we analyze the performance of MD ensembles
generated from apo and holo starting structures to assess the variations obtained in the
conformation of the binding site and their effects on virtual screening, using the DUD as a
test dataset.
5.4 Methods
The starting structures for the docking and molecular dynamics simulation
experiments were taken from the Protein Data Bank. All bound ligands, waters, ions
and other molecules were removed from the complexes for apo MDs, while only ligands
were kept for holo MDs, with the exception of Adenosine Deaminase where an active site
Zn ion was maintained. Missing side chains, terminal residues and hydrogen atoms were
added. Protonation states were assigned using the H++ server (Gordon et al., 2005). All
assigned protonation states were visually inspected in Sybyl 8.0 (Tripos Inc., St. Louis,
MO) and adjusted as needed.
Harmonic restraints with a force constant of 10 kcal/mol/Å² were applied to the
solute atoms as the system was heated from 100 to 300 K over 50 ps in the NVT
canonical ensemble. The system was then equilibrated to adjust the solvent density under
1 atm of pressure in an NPT isothermal-isobaric ensemble simulation over 50 ps. The
harmonic restraints were gradually reduced to zero during an additional four rounds of
50 ps NPT simulations, followed by a 50 ps equilibration period. Production runs of 10 ns
were then carried out for each complex. Snapshots were collected at 1-ps intervals.
5.4.2 Ligand Preparation and docking
Ligand partial charges were calculated with Molcharge (OpenEye, Inc., Santa Fe,
NM) based on the AM1-BCC method (Jakalian et al., 2000). Primary ligand
conformations were generated by Omega using a window of 20, an rms of 0.4Å and
“maxconf” of 100 (OpenEye, Inc., Santa Fe, NM). A secondary set of ligand
conformations was generated using an rms of 0.2Å and “maxconfs” of 5000; these are
mapped to the primary set of conformations and docked only to enrich the top docking
conformations of the first set.
Docking was performed using our in-house exhaustive, rigid-body (translation
and rotation) docking software (manuscript in preparation). A rectangular box enclosing
the entire binding groove defined the search region. We used a grid spacing of 0.6 Å and
rigid-body rotational angular increments corresponding to atomic displacements of 0.6 Å.
Poses were scored using a weighted combination of van der Waals, Coulomb, surface
area, shape complementarity and hydrogen bonding terms. The weights were previously
calibrated to reproduce binding poses of a training set of protein-ligand complexes. Each
ligand in each DUD set was docked first into the crystal structure of the target
(listed in Table 5.1) and then into each structure of the MD-generated ensemble. The
top-scoring binding mode against each structure is the one considered for the final results.
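To give a feel for the size of such an exhaustive search, the sketch below counts translational grid points in a box and converts the 0.6 Å displacement criterion into a rotation angle. The box dimensions, the ligand radius and the chord-based conversion are illustrative assumptions; the in-house program's actual procedure is not described in this chapter.

```python
import math


def grid_points(box_dims, spacing=0.6):
    """Number of translational grid points in a rectangular box (faces included)."""
    return math.prod(math.floor(d / spacing + 1e-9) + 1 for d in box_dims)


def angular_step_deg(radius, displacement=0.6):
    """Rotation angle that moves an atom at `radius` from the rotation center
    by a chord of length `displacement`."""
    return math.degrees(2.0 * math.asin(displacement / (2.0 * radius)))


# Hypothetical 12 x 6 x 18 A search box and a ligand whose outermost atom
# sits 5 A from the rotation center.
n_translations = grid_points((12.0, 6.0, 18.0))
step = angular_step_deg(5.0)
print(n_translations, round(step, 2))
```

Larger ligands need finer angular steps for the same atomic displacement, which is why the increment is tied to a displacement rather than fixed in degrees.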
5.4.3 Molecular Dynamics Simulations
Each system was immersed in a 12Å truncated octahedral TIP3P water box
(Jorgensen et al., 1983). Sodium or chloride counterions were added as required to
maintain electroneutrality of the system. 10 ns molecular dynamics (MD) simulations were
carried out using the AMBER program. A 2 fs time step and a 12Å non-bonded cutoff were
used. SHAKE was employed to constrain the lengths of bonds to hydrogen atoms and
the Particle Mesh Ewald algorithm was used to treat long-range electrostatics (Ryckaert
et al., 1977; Cheatham et al., 1995).
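The simulation settings given in the Methods translate into the following bookkeeping, shown here as a small worked calculation. The function name is hypothetical; the numbers are the ones stated in the text (2 fs time step, 10 ns production, 1-ps snapshots, 50 ps equilibration stages).

```python
def md_bookkeeping(production_ns=10, time_step_fs=2, snapshot_ps=1):
    """Return (integration steps, steps per snapshot, snapshots) for one run."""
    steps = production_ns * 1_000_000 // time_step_fs      # 1 ns = 1e6 fs
    steps_per_snapshot = snapshot_ps * 1_000 // time_step_fs
    snapshots = production_ns * 1_000 // snapshot_ps
    return steps, steps_per_snapshot, snapshots


# 50 ps heating + 50 ps density equilibration + four 50 ps rounds of
# restraint release + 50 ps unrestrained equilibration
equilibration_ps = 50 + 50 + 4 * 50 + 50

steps, steps_per_snapshot, snapshots = md_bookkeeping()
print(steps, steps_per_snapshot, snapshots, equilibration_ps)
```

Each 10 ns trajectory therefore supplies 10,000 snapshots to the clustering step.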

5.4.4 Force field parameters
The FF99SB force field in the AMBER suite of programs was used for the protein
atoms (Hornak et al., 2006; Case et al., 2005). The antechamber module of AmberTools
was used to assign GAFF parameters for the inhibitors (Wang et al., 2004).
5.4.5 Clustering
Clustering was performed using the cluster command of PTRAJ in AMBER 10
(Walker et al., 2008). The RMSD clustering option was used and limited to residues
within an 8Å radius of the bound ligand in the crystal structure complex. The RMSD
cutoff was set to a value that allowed 8-14 clusters to be generated for each molecular
dynamics simulation.
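Choosing a cutoff that yields 8-14 clusters can be automated with a simple search. The sketch below uses a basic leader clustering and a bisection on the cutoff as stand-ins; it is not the PTRAJ algorithm, and the distance function, the search bounds and the toy one-dimensional data are all assumptions made for illustration.

```python
def leader_cluster(dist, n, cutoff):
    """Assign each frame to the first representative within `cutoff`,
    otherwise start a new cluster with that frame as representative."""
    reps, labels = [], []
    for i in range(n):
        for c, r in enumerate(reps):
            if dist(i, r) <= cutoff:
                labels.append(c)
                break
        else:
            reps.append(i)
            labels.append(len(reps) - 1)
    return labels, reps


def tune_cutoff(dist, n, lo=0.1, hi=5.0, min_k=8, max_k=14, iters=40):
    """Bisect the cutoff until the cluster count falls within [min_k, max_k].
    Treats the count as decreasing with the cutoff, which is a heuristic."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        labels, reps = leader_cluster(dist, n, mid)
        if len(reps) > max_k:
            lo = mid   # too many clusters: loosen the cutoff
        elif len(reps) < min_k:
            hi = mid   # too few clusters: tighten the cutoff
        else:
            return mid, labels, reps
    return None


# Toy data: 100 frames drifting along one coordinate, standing in for
# pairwise binding-site RMSD values between snapshots.
coords = [i / 10 for i in range(100)]
result = tune_cutoff(lambda i, j: abs(coords[i] - coords[j]), len(coords))
cutoff, labels, reps = result
print(round(cutoff, 3), len(reps))
```

The representatives returned in `reps` would play the role of the ensemble structures used for docking.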
5.4.6 Test Data Sets
Nine targets were manually chosen from the Directory of Useful Decoys (DUD) for
the purposes of our study (Table 5.1) (Huang et al., 2006). The DUD assembles a set
of true binders for each target, and 36 compounds are selected as decoys for each true
binder. Each set of 36 decoys is selected from the ZINC database to match features of a
true binder of the set (Irwin and Shoichet, 2005). The decoys are matched to a ligand on
the basis of molecular weight, number of rotatable bonds, number of hydrogen bond
acceptors, number of hydrogen bond donors and log P. All target structures were
prepared as described above.

5.5 Results
5.5.1 Overview of Results
The first analysis in our ensemble-based virtual screening tests compared
ensembles generated from MD simulations starting with an empty target structure (i.e.
apo, containing no bound ligand) to those generated with the native ligand bound
(i.e. the holo crystal structure). Figure D.01 (Appendix D) illustrates the overall
performance of each VS method, comparing the results from the standard VS on the
crystal structure to the best score across the apo and holo ensembles.
Table 5.1 Targets of the DUD set selected and properties of each set
Target                                                  PDB    Ligands   Decoys
Factor Xa (FXa)                                         1f0r   142       5102
Androgen Receptor (AR)                                  1xq2   74        2630
Estrogen Receptor (ER)                                  3ert   39        1399
Epidermal Growth Factor Receptor (EGFR)                 1m17   416       14914
Adenosine Deaminase (ADA)                               1stw   23        822
Cyclooxygenase-2 (COX2)                                 1cx2   349       12491
Poly ADP-Ribose Polymerase (PARP)                       1efy   33        1178
Glycinamide Ribonucleotide Transformylase (GART)        1c2t   21        753
Heat Shock Protein 90 (HSP90)                           1uy6   24        861

The performance can be measured by looking at the increase of true binders that are
retrieved across the ranked set of compounds for each set. For instance, looking at Figure
5.2a, at 10% of the ranked set, ranking across the crystal structure, the apo ensemble and
the holo ensemble retrieves 40%, 30% and 42.5% of the true binders respectively. Overall,
the results indicate a clear disadvantage in using the apo ensembles for VS enrichments.
This is clearly the case for AR and COX2 (Appendix D; Figure D.01c and D.01d), where a
change in the binding pocket generated during the apo MD prevents proper docking
of the compounds.
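The retrieval measure used here can be written down in a few lines. The sketch below assumes that lower (more negative) scores rank better and uses an invented toy ranking; the function names are hypothetical and this is not the analysis code used for the thesis.

```python
import math


def fraction_retrieved(scores, is_active, top_fraction):
    """Fraction of all true binders found in the top `top_fraction` of the
    list ranked by score (lower score = better rank, an assumption)."""
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    n_top = max(1, math.ceil(top_fraction * len(ranked)))
    found = sum(1 for _, active in ranked[:n_top] if active)
    return found / sum(1 for active in is_active if active)


def enrichment_factor(scores, is_active, top_fraction):
    """Retrieval relative to what a random ranking would give."""
    return fraction_retrieved(scores, is_active, top_fraction) / top_fraction


# Toy ranking: 20 compounds, 4 true binders at ranks 1, 2, 5 and 15.
scores = list(range(-20, 0))
is_active = [i in (0, 1, 4, 14) for i in range(20)]
top10 = fraction_retrieved(scores, is_active, 0.10)
top25 = fraction_retrieved(scores, is_active, 0.25)
print(top10, top25, enrichment_factor(scores, is_active, 0.10))
```

In this toy case half of the true binders are found in the top 10% of the list, five times the fraction expected from a random ranking.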
5.5.2 Obstructive changes during apo simulations
Figure 5.01 shows the changes occurring in the binding site during the apo
MD simulation and clearly demonstrates the problems involved. Figure 5.01a is a close
look at the binding site of COX2 for structure 9 of the apo ensemble, for which no
enrichment results could be obtained. This is usually the result of strong vdW clashes
between the boxed binding pocket and the docked compounds. Looking at
the changes that occur, the alpha helix at the top-right of the bound ligand shows a
displacement of 1.5Å towards the binding pocket, essentially closing up the active site to
binding of the compounds (Fig. 5.01a). Furthermore, the absence of bound ligands during
the simulation promotes the displacement of amino acid side chains towards the active
site, which completely prevents proper docking of compounds. This is also observed in
the ER (Figure 5.01b) where, even though helical movements do not close up the binding
site, repositioning of the side chains frequently interferes with the docking process. What
is most striking is how subtle side chain changes of around 1Å in deviation from the
crystal structure can have dramatic effects on the ability to properly dock the true binders
and have detrimental effects on the overall enrichment results. Looking at the changes in
GART, a 1.7Å movement of the isoleucine side chain fills up the space otherwise
occupied by the ligand (Fig. 5.01c). Similar observations can be made in PARP (Fig.
5.01d). Additionally, while the results for the apo ensemble structures can compare to
those of the holo in certain systems, they are never superior to the holo or crystal when
looking at the top 1% of the results, which are the most relevant to VS. In contrast, for
certain targets like AR and COX2, structures from the holo ensemble do show advantages
over the crystal structure.
Figure 5.01 Changes in binding site observed in the apo ensemble in a) COX2,
b) AR, c) GART and d) PARP. The crystal structure and the apo
ensemble structure are colored dark and light grey respectively with the
native bound ligand.
5.5.3 Performance of holo ensemble
The results for our preliminary tests using holo ensembles are illustrated in Figure
D.02, which shows the enrichments achieved in the top 5%, 2% and 1% for each set. The
1%, 2% and 5% columns are colored blue, red and green respectively and denote the
percentage of true binders that are retrieved cumulatively at each level. The performance
of VS at these levels of enrichment is especially important since the number of molecules
of interest is limited to the extent to which they can be purchased and tested. In VS, this
means that compounds ranked within the top 1% are of the highest interest, and they
will be looked at carefully throughout this analysis. Figure D.02 (Appendix D) illustrates
the results for a) ER, b) HSP90, c) EGFR, d) COX2, e) PARP, f) AR, g) GART, and h)
FXa. The column labeled “Crystal” denotes the results for the virtual screening based
on the crystal structure itself, while the numbered columns denote the VS results for each
structure within the ensemble. Additionally, the columns labeled “Min”, “Median” and
“Avg” are the VS results when ranking of the ensemble set is based on the minimum,
median and average value across the set of structures respectively.
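The “Min”, “Median” and “Avg” rankings correspond to three ways of collapsing a compound's per-structure scores into a single value. A minimal sketch is given below, assuming every compound has a score in every ensemble structure and using invented scores and compound names.

```python
from statistics import mean, median


def consensus_scores(per_structure_scores):
    """Collapse {compound: [score per ensemble structure]} into the three
    consensus values used for ranking across the ensemble."""
    return {
        compound: {"Min": min(s), "Median": median(s), "Avg": mean(s)}
        for compound, s in per_structure_scores.items()
    }


# Hypothetical docking scores of two compounds against three structures.
toy = {"cpd_A": [-30.0, -24.0, -27.0], "cpd_B": [-26.0, -25.0, -21.0]}
summary = consensus_scores(toy)
ranked_by_min = sorted(summary, key=lambda c: summary[c]["Min"])
print(summary["cpd_A"], ranked_by_min)
```

Ranking on the minimum rewards a compound that fits any single structure well, whereas the median and average favor compounds that score consistently across the ensemble.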
The results show that the benefits to VS enrichments seen with holo ensembles
are target-dependent. While certain targets like ER, EGFR, COX2 and AR benefited
from the holo ensemble, the performance of the crystal structure was more robust across
all targets.
5.5.4 Structural changes in holo ensemble
One of the targets that benefited the most from docking into the holo ensemble is
COX2, where over 90% of compounds simply went undocked for both the apo ensemble
and the crystal structure. Figure 5.02 compares the effects of docking into the crystal
structure and into an ensemble structure from the holo MD for COX2. The result of the MD
is an overall opening up of the pocket, especially with regard to helical movement right
above the position of the ligand. The proper docking of the compound is shown in
Figure 5.02b. Notice the close proximity of the sulfonamide group to the valine side
chain and the overall closeness of the helix to the dimethylaniline segment of the
compound. Docking into the crystal structure therefore brings about an alternative binding
mode whereby another aromatic segment of the compound is positioned at a 90° angle to
that of the original dimethylaniline and where the sulfonamide group is now sequestered
in a neighboring region of the pocket with less interference from side chains. A closer
look at AR, another target displaying significant benefits from the use of the holo ensemble,
illustrates how minute changes in a very enclosed binding pocket can have drastic effects
on enrichments (Fig. 5.03 & 5.05b). Looking at ensemble structures 1, 2, 3 and 4, which
showed the largest benefits with regard to enrichments, similar features are prominent,
most notably a ~1Å movement of the lower left and lower right helices, which provides a
more favorable binding site for docking compounds with larger substituents than those
of the bound ligand in the crystal structure (Appendix D, Fig. D.02f; Fig. 5.03; PDB
1XQ2). A similar effect is seen in the holo ensemble for the ER, where the largest
enrichment benefits, seen in structure 5 of the ensemble, correlate with a 2Å displacement
of the uppermost helix (Appendix D, Fig. D.02a; Fig. 5.04).
Figure 5.02 Changes in binding site observed in the holo ensemble for COX2.
The crystal structure (dark grey) and holo ensemble structure 5 (light grey) are shown.
a) Docking of compound into crystal structure results in improper positioning of compound.
b) Docking in holo ensemble structure results in proper positioning of the compound.
Figure 5.03 Changes in binding site observed in the holo ensemble for AR.
The crystal structure (dark grey) and holo ensemble structure 4 (light
grey) are shown.
Figure 5.04 Changes in binding site observed in the holo ensemble for ER. The
crystal structure (dark grey) and holo ensemble structure 5 (light grey)
are shown.
5.5.5 Effect on score distribution
The potential enrichment benefits of the holo ensemble are more clearly seen by
looking at its effects on the score distribution of true binders (Fig.
5.05). The total number of docked compounds for both COX2 and AR is larger in the
ensemble since true binders could not be docked in the crystal structure. However,
benefits for the ensemble structures are also evident in the score distributions seen for ER
and EGFR, where no increase in the total number of ligands docked was seen. Hence, the
results from both Fig. D.02 (Appendix D) and Fig. 5.05 indicate that
structures within the holo ensemble have an enrichment potential larger than the crystal
structure itself. The problem therefore lies in identifying the targets for
which the use of the holo ensemble may be beneficial and the method that best extracts
the enrichment potential from the set of structures obtained. Our work makes it evident
that compounds with structural features different from those of the native ligand can be
problematic in a VS setting where the binding pocket of the target may be too constricted
for the proper docking of such compounds. Nevertheless, our results still indicate that a
clear benefit in a VS pipeline requires a means of properly identifying the structure within
the ensemble and/or scoring across the ensemble. So far, our analysis has only been able to
observe that such structures exist within the ensemble for certain targets. It is interesting
to note that targets which performed well on the standard protocol seemed to perform
similarly on the holo ensemble (e.g. FXa, EGFR and ER). FXa and ER in particular were
targets for which improvements were not expected given their previously documented
favorable performance on our pipeline (unpublished results). The most interesting aspect
of this study is that targets that did not perform well could find drastic improvements
through the use of a holo ensemble, most notably COX2 and AR. Hence, when a documented
test set of true binders exists for a target, poor performance on a VS enrichment test
combined with significant constraints within the binding pocket may be an indication of a
target that would be especially sensitive to the use of this method.
Figure 5.05 Score distribution of true binders across the crystal structure and
selected holo ensemble structure for a) ER (Crystal vs. holo #5), b) AR
(Crystal vs. holo #3), c) EGFR (Crystal vs. holo #3), and d) COX2
(Crystal vs. holo #5).
5.5.6 Comparison with RCS
Unquestionably, the most avid proponents of MD ensembles in virtual screening
have been the McCammon group, with their documented success with TbRel1 and HIV
integrase (Schames et al., 2004; Amaro et al., 2008b). However, while their discovery of
a new binding trench in the active site of HIV integrase could only have been made
through MD observations, our own VS results for TbRel1 indicated that, with few
exceptions, most of the true binders they uncovered would have been uncovered in a
simple screen on the crystal structure (see Chapter 3). This follows closely with the
observations made here for the above-mentioned targets, where binding sites that are
narrow and constricted display benefits from the MD. In addition, the TbRel1 VS results
also indicated that compounds slightly larger than the native ligand also benefit from a
slight relaxation of the binding pocket. That was the case for compounds V3 and V4 of
Amaro et al., which, owing to their larger ring structure, could not be docked properly into
the crystal binding site (unpublished results; Amaro et al., 2008b). It should also be
pointed out that our results rely on 10 ns MD simulations while the RCS
binding predictions mentioned here were made on 20 ns MD simulations (Amaro et al.,
2007; Amaro et al., 2008b; Perryman et al., 2010). Another key difference is the nature
of our docking programs. While the in-house docking is near exhaustive in its exploration
of the binding pocket, the RCS uses AutoDock, which employs a heuristic algorithm
(LGA) that may or may not arrive at the optimal solution (Morris et al., 1998; Morris
et al., 2009). It is clear that holo ensembles provide an overall advantage to certain targets
while minimally affecting other VS results. To the best of our knowledge, this is
the first undertaking to systematically compare the effects of MD ensembles to a rigid
structure on VS enrichment results. While our test was structured to assess the
performance across a number of targets, a number of limitations with regard to the test
set need to be addressed.
5.5.7 Use of DUD training set
The DUD set has been well established as the standard for VS enrichment tests
(Huang et al., 2006; Irwin, 2008). Nevertheless, there are few documented shortcomings
of the set (Irwin, 2008). One of the criticisms of the DUD dataset is that of its incomplete

179
sampling of the chemical space since the pool from which its compound were drawn, the
ZINC database, is itself incomplete in its representation of the chemical space. For the
purposes of our endeavors, this does not present a significant detriment to the real-world
significance of our results as the majority of our virtual screening activities are performed
using the ZINC database. The more important concern for the purposes of our group is
that the selected set for each target represents only a biased fraction of the chemical space
represented in the ZINC database which, although a challenging one by design, leaves a
number of unanswered questions as to the performance of the pipeline on its entirety.
Secondly, the ratio of true binders to decoys being 1:8 seems insufficient to properly
identify the sensitivity of the pipeline for identifying true binders as real world VS
requires only a ~0.1% criteria for selection. A larger and more diverse training set with
regards to the decoys selected would seem to be of use. One group has brought forth the
use of virtual decoys to expand and/or provide an alternative to the DUD decoy set and
incorporates some of the ideas mentioned here (Wallach and Lilien, 2011). In some ways,
the in silico design of virtual decoys provided a more challenging test set than the DUD
itself. The main advantage of the virtual decoy set (VDS) is that its decoys can be
generated with physical properties that match those of the ligands more closely than
those of the DUD. This is a simple consequence of the limited chemical space from which
the DUD draws, whereas that of the VDS is effectively unlimited. More importantly, the
VDS provides the means of obtaining a training set with larger diversity and of tailoring
the true binder to decoy ratio to our needs. Additionally, the
risk of overfitting can be overcome by simply producing alternate training sets with
similar characteristics. Hence, it would seem like a reasonable approach for further
investigations. An alternative and/or complementary approach that has already been used
would simply involve incorporating a randomized and diversified fraction of the ZINC
database to each data set.
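The concern about the true binder to decoy ratio can be made concrete with a small calculation. The sketch below is illustrative only and is not the pipeline used in this work: the function names and the toy scores are assumptions, and lower scores are taken to rank better. It computes the fraction of true binders recovered in the top fraction of a ranked set and the corresponding enrichment factor. With one true binder for every eight decoys, true binders make up 1/9 of the set, so the enrichment factor can never exceed 9, whatever the quality of the scoring.

```python
def _hits_in_top(scores, is_binder, top_fraction):
    """Return (true binders found, size of top slice) for a ranked set."""
    # Lower score = better rank (assumption made for this sketch).
    ranked = sorted(zip(scores, is_binder), key=lambda pair: pair[0])
    n_top = max(1, round(top_fraction * len(ranked)))
    found = sum(1 for _, binder in ranked[:n_top] if binder)
    return found, n_top


def recovered_fraction(scores, is_binder, top_fraction):
    """Fraction of all true binders found in the top slice of the ranking."""
    found, _ = _hits_in_top(scores, is_binder, top_fraction)
    return found / sum(is_binder)


def enrichment_factor(scores, is_binder, top_fraction):
    """Hit rate in the top slice divided by the hit rate of the whole set."""
    found, n_top = _hits_in_top(scores, is_binder, top_fraction)
    hit_rate_top = found / n_top
    hit_rate_all = sum(is_binder) / len(is_binder)
    return hit_rate_top / hit_rate_all


# Toy set with a 1:8 ratio: one true binder (ranked first) and eight decoys.
toy_scores = [-35.0, -30.0, -29.5, -29.0, -28.0, -27.5, -27.0, -26.0, -25.0]
toy_labels = [True] + [False] * 8
best_case = enrichment_factor(toy_scores, toy_labels, 1 / 9)
```

Here `best_case` is the ceiling for this ratio, which is why a set with a much larger proportion of decoys is needed to probe selection criteria near 0.1%.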
5.6 Conclusion
This study intended to take a first look at the advantages of MD-ensembles to VS
pipelines. During the course of our analysis, certain observations could be made that
clearly define their utility. First, the use of holo ensembles was clearly beneficial to
targets such as COX2 and AR while the use of ensembles from apo simulations was not.
The lack of a bound ligand in the apo simulations promoted the movement of side chains
and secondary structures towards the binding site and thereby interfered with proper
docking of the compounds across a number of targets; apo ensembles can therefore not be
recommended. The results obtained for the holo ensembles were on the other hand
extremely promising and warrant further research. This is especially the case for
constricted binding pockets such as COX2 and AR, where a dramatic increase in docking
ability and enrichments was seen.
The main criticism and limitation of the significance of our results concerns the
choice of data set. While the DUD has proven itself to be a challenging test set and good
starting point for benchmarking VS pipelines, further expansion of the set to provide a
larger sampling of the chemical space, a closer matching to known ligands and a higher
ratio of decoys to true binders would provide a more discriminating and revealing test set.
If an objective account of the true significance of the holo ensembles to VS is to be
obtained, then similar studies must prove the advantages of the method across all targets.
Chapter 6
General Discussion
The results and methodologies presented in this thesis represent novel
contributions to knowledge and method development. The contributions in Chapters 2
and 3 consisted of the identification of a novel inhibitor for the treatment of disease and a
deeper understanding of the mechanism of action of a few novel pharmaceuticals. The
following Chapters 4 and 5 targeted method development in the area of lead
identification. In this chapter we will discuss the overall significance of the contributions
of the work and future directions.
6.1 The Molecular Dynamics Study of Bcl-2 Inhibitors
Chapter 2 was a unique study that addressed some of the literature’s questioning
regarding the nature of the selectivity of two Bcl-2 inhibitors: Obatoclax and ABT-
737/ABT-263 (Oltersdorf et al., 2005; Nguyen et al., 2007; Park et al., 2008). Previous
studies have provided different hypotheses as to the binding differences which would
promote the selectivity of ABT-737 for Bcl-2/Bcl-XL/Bcl-W and not Mcl-1 (Lee et al.,
2009). More importantly, a definite, cohesive display of the mechanism underlying this
selectivity had yet to be proposed. The first part of the study therefore focused on a
thorough investigation of the binding of ABT-737 to Bcl-2, Bcl-XL and Mcl-1. The
second segment of this study looked at the pan-Bcl-2 inhibitor Obatoclax and its probable
binding mode to Mcl-1, its most relevant target (Nguyen et al., 2007; Trudel et al., 2007;
Konopleva et al., 2008). The identified binding mode for Obatoclax suggests that it
utilizes features of the p2 pocket, and to a lesser extent p1, which are very well defined in all
Bcl-2 family members.
The study of ABT-737’s selectivity has large implications for the SB
development of Bcl-2 inhibitors. While ABT-737 provides nanomolar inhibition of Bcl-
2/Bcl-XL, a number of cases have shown that resistance to ABT-737 can develop as a
result of Mcl-1 upregulation (Konopleva et al., 2006; van Delft et al., 2006). An
understanding of the binding differences which provide a biological pathway to
resistance can therefore provide the means to overcome it through the development of
Mcl-1 specific inhibitors. Specifically, our MD simulation study showed that while the
binding to p2 and p4 is sufficient for proper inhibition of Bcl-2 and Bcl-XL, the
differences in the binding pocket at p4 and in the α2/α3 region result in an unstable complex
with Mcl-1. This finding differed notably from the work done by Lee et al. which
suggested that binding angle at p2 was the cause (Lee et al., 2009). Their observations
stemmed from binding studies of a Bim mutant whereby Leu12, which penetrates the p2
pocket of the binding groove and is critical for binding of BH3 domains, was substituted
to Tyr (Lee et al., 2007). The crystal structure for the L12Y Bim mutant displayed an
angle of penetration in the p2 pocket different than that observed for Bcl-XL and
therefore suggested that correction of this difference could restore binding affinity to
Mcl-1. Manual docking of ABT-737 into the Mcl-1 pocket did suggest that the angle of
penetration was not as steep as in Bcl-XL. However, the simulations showed that while
the chlorobiphenyl ring penetration angle at p2 may not be optimal, its binding was
surprisingly stable and most likely not the primary concern in the absence of binding of
ABT-737 to Mcl-1. This result of our simulation is supported by experiments with the
ABT-737 derivative W1191542 that was synthesized and tested by Lee et al. which,
despite being modified for proper geometry at p2, also showed no inhibition of Mcl-1
(Lee et al., 2009). Other quinazoline derivatives of ABT-737 also show similar binding
patterns (Sleebs et al., 2011).
Another point of contention is the increased mobility of the S-phenyl ring binding
at p4. Aside from differences in the opening of the pocket at α2/α3, several residues in
Mcl-1 are notably different than in either Bcl-2 or Bcl-XL. In an effort to identify residues
critical for the high affinity of ABT-737 to Bcl-XL, we ran a virtual alanine scan. The
results pointed to a contributing factor for the increased fluctuations of the S-phenyl
ring at p4. The scan identified Tyr195, Leu108, Tyr101, Arg100, Phe97 and
Glu96 as the largest contributors to the binding free energy (Table 2.2). While Tyr195 is
substituted by Phe318 in Mcl-1, which retains the aromatic character, the other
positions are not conserved: Leu108, Tyr101, Arg100, Phe97 and Glu96 become Met231,
His224, Asn223, Val220 and Gly219 respectively. These results were also found to be in
general agreement with data from Moroy et al., whereby molecular dynamics simulations
were carried out for complexes of BH3 peptides with Bcl-XL to dissect out the energetic
contributions of amino acids around the binding groove using a MM/PBSA analysis
(Moroy et al., 2009). As described in Chapter 2, the main differences found between our
results and theirs were in the magnitude of the contributions and in residues implicated in
binding BH3 domains which are not engaged similarly with ABT-737. Through the
results of our SIE scoring function, we were also able to calculate an approximate
binding free energy of ABT-737 for Bcl-XL, Bcl-2 and Mcl-1. As expected both Bcl-2
and Bcl-XL showed similar binding affinities to ABT-737 (-11.65 and -11.59 kcal/mol
respectively) whereas Mcl-1 showed a notably lower binding affinity (-9.07 kcal/mol)
which is consistent with experimental observation. While some of this calculated
difference can be attributed to vdW differences, an even larger difference exists between
the coulombic and reaction field terms indicating a clear tendency towards dissociation of
the complex. Some of this can be attributed to the absence in Mcl-1 of a salt bridge which
exists in Bcl-xL and Bcl-2 between Glu96 and the dimethylamino group; in Mcl-1 the
corresponding glycine does not allow it. The tendency of the ABT-737/Mcl-1 complex towards
dissociation is also reflected by the dynamics of the phenylpiperazine linker. During the
course of our simulation, no other region of the ABT-737 showed as clear an indication
of an unstable complex as the linker.
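To give a sense of scale for the calculated values, the difference between two binding free energies can be converted into a ratio of dissociation constants through exp(ΔΔG/RT). The sketch below is a back-of-the-envelope illustration under stated assumptions: the SIE estimates quoted above are treated as binding free energies, the temperature of 298.15 K is assumed, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def affinity_ratio(dg_weaker, dg_stronger, temperature=298.15):
    """Ratio Kd(weaker)/Kd(stronger) implied by two binding free energies."""
    # ddG > 0 means the first complex is less stable than the second.
    ddg = dg_weaker - dg_stronger
    return math.exp(ddg / (R_KCAL * temperature))


# SIE estimates quoted in the text (kcal/mol): Mcl-1 versus Bcl-XL.
mcl1_vs_bclxl = affinity_ratio(-9.07, -11.59)
```

Under these assumptions the 2.52 kcal/mol gap corresponds to a difference in affinity of roughly 70-fold. Because the SIE values are approximate, this figure indicates only the direction and rough size of the effect.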
The following section of our study revolved around Obatoclax, a novel pan-Bcl-2
inhibitor whose primary target therapeutically is Mcl-1. While much research establishes
Obatoclax as a potent inhibitor of Mcl-1 interaction with the proapoptotic members Bim
and Bak, little is known with regards to its binding mechanism. Our initial proposed
binding mode extended to a simple rigid docking of Obatoclax to Bcl-2. The result was
the placement of the indole portion of Obatoclax within the p2 pocket with the rest of the
molecule wrapping around α4 and making contact with p1 (Nguyen et al., 2007). The
improvements of our pipeline for docking studies and binding mode prediction led us to
use our in-house docking program and molecular dynamics simulations to provide a more
thorough study of the binding mode of Obatoclax. However, instead of relying on the
structure of Bcl-2 bound to ABT-737 (PDB 1YSW) as used in the previous effort, our
docking study relied on that of Mcl-1 bound to the Bim BH3 peptide (2PQK) since the
apo structure is unsuitable for this purpose. The choice of Mcl-1 was made in an effort to
study the target most relevant therapeutically (Albertshardt et al., 2011).
Initially, our investigation consisted of MD simulations of the top 3 binding
modes retrieved by docking. The results were conclusive in that the top binding mode
showed the least fluctuations overall (unpublished). Additionally, the top ranking binding
mode differed in numerous aspects from that proposed previously. The preferred binding
mode positioned the methoxy segment of Obatoclax at p2 while the indole segment lined
the binding pocket neighboring p1 (see Chapter 2, Fig. 2.09). From a BH3-mimetic
standpoint, Obatoclax can be said to mimic the h2 Leu residue conserved throughout the
BH3 domains by binding p2 with the methoxy group. This is even more evident from the
near perfect alignment of the methoxy group with the Bim Leu152 as illustrated in Fig
2.09c.
A review of the literature revealed interesting properties of bacterial prodigiosins,
the family of small molecules from which Obatoclax is derived (Pérez-Tomás et al.,
2010). While prodigiosins have long been known for their anti-cancer activity and
numerous mechanisms of action have been proposed, conclusive evidence linking the
methoxy substituent to cytotoxicity of these molecules has been found (Pérez-Tomás et
al., 2010). Similar observations were also found for Obatoclax where substitution for
large substituents or a hydroxyl group abrogated in vitro activity (unpublished
results). This study therefore provided the first rationale for the effect of the methoxy
group on cytotoxicity of prodigiosins and the further development of Bcl-2 inhibitors
based on the prodigiosin scaffold.
Previously, the bulk of existing research consisted of determining factors involved
in binding specificity of BH3 peptides towards the anti-apoptotic proteins (Chen et al.,
2005; Lee et al., 2007; Boersma et al., 2008; Lee et al., 2008; Fire et al., 2010).
Extending that research to small molecules has been problematic, with inconsistent
outcomes (Lee et al., 2009). As a whole, the work described in Chapter 2 is the first
cohesive study to specifically look at the function and selectivity of two novel Bcl-2
inhibitors. We found that while ABT-737 requires strong binding at p4 to offset the
entropic costs of forming a complex with such a large molecule, Obatoclax is proposed to
only require strong binding at p2, a feature of all BH3 domains. Obatoclax was also
found to engage p1 to a lesser extent and form strong vdW interactions with α4 of the
Bcl-2 binding pockets. Unlike ABT-737, it does not utilize features of p4 and/or require a
special arrangement of α2/α3 for stable complex formation with Mcl-1. Aside from
Obatoclax and ABT-737, several other BH3 mimetics are currently in the developmental
stages (Wang et al., 2000; Tzung et al., 2001; Chan et al., 2003; Kitada et al., 2008).
Gossypol and apogossypol, like Obatoclax, are thought to make use primarily of the p1
and p2 pockets for binding to Bcl-2 members (Wei et al., 2009a; Wei et al., 2009b).
Unsurprisingly, gossypol-derived compounds also display pan-Bcl-2 inhibition. In
addition, the novel semisynthetic derivatives make use of the p3 pocket, and not the p4,
to increase the binding affinity to Bcl-2 members and retain pan-inhibition properties
(Wei et al., 2009a; Wei et al., 2009b).
Our work highlighted the features that provide the grounds for a better
understanding of the mechanism behind the selectivity of BH3-mimetics. Our research
concluded that p1 and p2 may be better suited for compounds intended to bind the
Bcl-2/Bcl-XL/Mcl-1 pockets. Mcl-1 specific inhibition
may be obtained by utilizing features of p3 that are unique to it while p4 provides
increased inhibitory properties to Bcl-2/Bcl-XL. This contribution to knowledge should
prove valuable for the further development of even more potent pan-Bcl-2 BH3
mimetics and/or for enhancing their selectivity towards certain targets.
6.2 Discovery of TbRel1 Inhibitors
Identification of novel lead molecules for drug development is the first step
towards pharmaceutical drug development. However, the neglected tropical diseases
(NTD), along with other orphan diseases, are usually neglected by the pharmaceutical
industry owing to the poor financial prospects of such undertakings. Therefore, various
public-private partnerships have emerged to bring forward drug development for NTDs
(Nwaka and Ridley, 2003; Croft, 2005; Oduor et al., 2011). Academic centers have been
active participants in drug discovery efforts for the treatment of Trypanosomatid
pathogens which are responsible for a number of NTDs including leishmaniasis, African
trypanosomiasis and Chagas disease (Amaro et al., 2008). In such efforts, molecular
modeling provides an ideal platform for the discovery and development of drug
candidates while minimizing overall costs (Amaro et al., 2008; Chatelain and Ioset,
2011). Chapter 3 is a prime example of the success of virtual screening efforts in the
discovery of promising lead compounds for the treatment of NTDs.
The virtual screen targeted RNA editing, a unique mechanism employed by
Trypanosomatid pathogens for the processing of mitochondrial mRNA (Stuart et al.,
2005; Amaro et al., 2008). As mentioned in Chapter 3, the RNA editing ligase 1 enzyme
(REL1) component of the editosome was chosen as the ideal target due to its being
essential to the viability of the trypanosomatid pathogens and to the lack of known human
homologs (Rusche et al., 2001; Schnaufer et al., 2001). Furthermore, a high-resolution
structure of this key editosome enzyme had been previously characterized (Deng et al.,
2004). Our VS work in Chapter 3 and ensemble-based VS work of Chapter 5 were heavily
inspired by that of Amaro et al., who had previously identified the first series of TbRel1
inhibitors (Amaro et al., 2008). Furthermore, their work was an incentive for the
exploration of the use of MD-based ensembles in our VS pipeline. However, the results
obtained for the VS carried out in Chapter 3 suggested that ensemble-based screening was
not the primary factor in their success (unpublished). Somewhat surprisingly, VS on the
rigid, ATP-bound TbRel1 structure provided us with the necessary enrichments to
recover the majority of their hits within the top 500 compounds (Amaro et al., 2008;
Appendix B, Table B.01). Additionally, from the top 12 selected compounds, an
additional inhibitor was discovered with potent editosome inhibitory properties (see
Chapter 3, Fig. 3.02-3.06; Appendix B, Fig. B.01). In fact, only two compounds from
Amaro et al. would not have been identified during our screen on the rigid structure
(Appendix B, Table B.01 & Fig. B.03). Our analysis revealed that V3 and V4 of the
Amaro et al. compounds could not be properly docked onto the ATP-site of the crystal
structure owing to their larger ring structure. This outcome could be related to the overall
performance of our in-house docking program versus that of AutoDock 4.0 which was
used by the McCammon group (Amaro et al., 2008). It is this type of questioning that
begs to be answered by comparative studies on the effects of virtual screening on an MD-
based ensemble versus that on a rigid structure. In effect, this led to the work described in
Chapter 5 where the effects of MD ensembles were evaluated on our virtual screening
pipeline.
The key aspect of our study was the identification and elucidation of the effects of
C35 on editosome integrity. In fact, our study was the first to document that C35, along
with potentially other inhibitors, is able to not only inhibit the function of TbRel1 but also
compromise editosome integrity altogether. This is in addition to its effects on RNA
binding to the editosome complex and inhibition of other editosome activities, including
the endoribonuclease, the uridylyltransferase and the exoribonuclease activities. These
effects of C35 had never been documented for any inhibitor and further strengthened the
premise that TbRel1 inhibition could be an effective means of selectively targeting the
Trypanosomatid parasites.
Other enzymes are currently prospected as targets for the treatment of
Trypanosoma brucei infections. Pfizer is actively researching and developing Glycogen
Synthase Kinase (GSK) 3 inhibitors (Oduor et al., 2011). Other targets include tRNA
Synthetases, mitogen-activated protein kinases (MAPK) and cdc2-related kinases (CRK)
(Mercer et al., 2011; Shibata et al., 2011). SCYNEXIS, in collaboration with Anacor
Pharmaceuticals and sponsored by the not-for-profit Drugs for Neglected Disease
initiative (DNDi), is also currently developing SCYX-7158 which is expected to enter
phase 1 clinical trials shortly (Brun et al., 2011; Jacobs et al., 2011). However, for all of
these targets selectivity issues will arise as human homologs are known (with the
exception of SCYX-7158 for which no specific target has been described). The research
on TbRel1 conducted in Chapter 3 provided conclusive evidence on the efficacy of these
compounds in disrupting essential processes in the survival of the Trypanosoma brucei
pathogen. These results essentially strengthened the position of TbRel1 as an important
target in drug development efforts for combating trypanosomatid-related diseases. While
the binding assays conclusively show their inhibitory properties, future studies must
establish the trypanocidal activity of these compounds and their clinical efficacy. Our
research provided the incentive on which further development of these compounds
should be undertaken. With partnerships in the private sector, preclinical developmental
studies will allow more extensive optimization and characterization of the potential use of
these compounds in the treatment of Trypanosoma infections.
6.3 Automated Molecular Formula determination by Tandem Mass Spectrometry
Natural sources contain millions of natural products and provide a larger
structural diversity than compound libraries. Natural products often have a specific
function with many of them having biological activity which may be of use in drug
development (Koehn and Carter, 2005). In fact, 61% of all New Chemical Entities
introduced between 1981 and 2002 are derived or inspired from natural products
(Newman and Cragg, 2007). Natural products are especially useful in the area of
anti-infectives, where they compose 69.8% of all NCEs (Newman and Cragg, 2007). Several
new natural products are also being investigated for the treatment of neglected tropical
diseases (Ioset, 2008; Queiroz et al., 2009). It is therefore valuable to
drug discovery efforts to shorten the time to identification of bioactive compounds within
natural product sources.
However, identification of novel bioactive compounds which could progress to
new lead compounds and eventually drugs requires extensive chemical screening. The
aim of chemical screening is to distinguish within extracts which compounds are already
known from those that have yet to be characterized. Additionally, metabolic profiling is
used to identify different plant metabolites (Hostettmann et al., 2001). Together,
chemical screening and metabolic profiling represent dereplication, i.e. distinguishing
previously identified compounds from novel ones. Dereplication is typically done by
means of mass spectral library search. However, standard procedures are estimated to
cost over $50,000 and 3 months of work to isolate and characterize an active compound
from a natural source (Corley and Durley, 1994). The re-identification of a previously
known compound or one of its metabolites results in a “wasted” effort (Cordell, 1995).
These high-costs and time-consuming efforts have favored a de-emphasis on natural drug
discovery by pharmaceutical companies in favor of high-throughput screening and
biologics design (Mishra et al., 2008; Appendino et al., 2010). As the past decade has
brought about extraordinary advances in spectrometry and liquid chromatography
techniques, algorithms such as the one described in Chapter 4 utilize the sensitivity of
these techniques to expedite the dereplication process. This, along with the development
of natural product databases, allows a quicker
identification of novel compounds (López-Pérez et al., 2007).
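The basic idea of deriving a molecular formula from an accurate mass can be sketched as a search over elemental compositions within a mass tolerance. The example below is a deliberately naive illustration and is not the algorithm of Chapter 4: it considers only C, H, N and O, uses the neutral monoisotopic mass, applies no valence or isotope-pattern filters, and makes no use of MS/MS fragment data. The function name and the 5 ppm default are assumptions.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def candidate_formulas(neutral_mass, ppm=5.0):
    """List (C, H, N, O) counts whose mass lies within `ppm` of the target."""
    tol = neutral_mass * ppm * 1e-6
    hits = []
    for c in range(int(neutral_mass // MASS["C"]) + 1):
        for n in range(int(neutral_mass // MASS["N"]) + 1):
            for o in range(int(neutral_mass // MASS["O"]) + 1):
                heavy = c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
                rest = neutral_mass - heavy
                if rest < -tol:
                    break  # adding more oxygen only overshoots further
                # Fill the remaining mass with the nearest number of hydrogens.
                h = round(rest / MASS["H"])
                if h < 0:
                    continue
                error = heavy + h * MASS["H"] - neutral_mass
                if abs(error) <= tol:
                    hits.append((c, h, n, o))
    return hits


# Example: neutral monoisotopic mass of glucose, C6H12O6.
glucose_candidates = candidate_formulas(180.06339)
```

Even at a few ppm the list usually contains more than one composition, which is why additional information such as fragment masses is needed to narrow the candidates down.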
6.4 Ensemble-based Virtual Screening
In Chapter 3 we undertook a VS project using our in-house docking program and
the crystal structure of the target. However, information regarding the target’s flexibility
provides numerous advantages in identifying inhibitors that may bind optimally to
structural features not present in a given structure. Several different means of
incorporating that information have been tried. Some docking programs do this by
building an average grid from several experimental structures while others simply
incorporate features from the different structures into the docking process (Knegtel et al.,
1997; Claussen et al., 2001). Others have explored hybrid conformations by using
experimental structures to represent the range of possible conformations and allowing
regions of the target that are flexible to move independently of each other and recombine
(Claussen et al., 2001; Wei et al., 2004). The use of experimental conformational
ensembles has garnered attention in virtual screening where several studies have
demonstrated their ability to provide superior binding mode predictions and, in certain
cases, enrichments (Barril and Morley, 2005; Huang and Zou, 2007; Craig et al., 2010).
The study by Barril and Morley used a large set of crystal structures in evaluating the
advantages of the ensemble virtual screening (Barril and Morley, 2005). Their results
suggested that while improved affinity predictions for the true binders were obtained, an
increase in false positives was also observed thereby minimizing the benefits in
enrichments (Barril and Morley, 2005). Their enrichment results strongly benefited from
addition of the internal energy term for each receptor structure used. Their results for
enrichments mirrored the observations of the Shoichet group, who
recommended the consideration of receptor conformational energy when evaluating
ensembles in docking studies (Wei et al., 2004). The other study, by Huang and Zou,
addressed the computational expense of docking each ligand into an experimental
ensemble by devising an optimization algorithm which selects a structure for each ligand
that is predicted to be optimal (Huang and Zou, 2007). As with Barril and Morley,
ensemble docking provided large improvements in the prediction of binding modes. The
incorporation of internal energy of receptor conformation and of an optimization scheme
for structure selection are two viable improvements that can be made to the current
scheme and should further be evaluated.
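The way per-structure docking scores might be combined into a single ensemble score, with an optional receptor conformational energy term, can be sketched as follows. This is a minimal illustration under assumptions and not the scheme implemented in this work: the function name, the dictionary layout and the idea of adding the receptor term directly to the docking score (same units, no scaling) are all hypothetical, and more negative scores are taken to be better.

```python
from statistics import mean, median


def consolidate(scores, method="min", receptor_penalty=None):
    """Combine per-structure docking scores into one score per ligand.

    scores: dict mapping ligand ID to a list of scores, one per ensemble member.
    method: "min" (best score), "median" or "avg".
    receptor_penalty: optional list with one energy offset per ensemble member,
        added to every score obtained on that member before combining.
    """
    reducers = {"min": min, "median": median, "avg": mean}
    reduce_scores = reducers[method]
    combined = {}
    for ligand, per_structure in scores.items():
        if receptor_penalty is not None:
            per_structure = [s + p for s, p in zip(per_structure, receptor_penalty)]
        combined[ligand] = reduce_scores(per_structure)
    return combined


# Toy example: one ligand docked into a three-member ensemble.
toy = {"ligand_1": [-30.0, -35.0, -28.0]}
best = consolidate(toy, "min")
# Penalizing the second (hypothetically strained) conformation changes the pick.
penalized = consolidate(toy, "min", receptor_penalty=[0.0, 6.0, 0.0])
```

Taking the minimum rewards any single favourable structure, which may also admit more false positives; a receptor term of this kind is one way such cases could be down-weighted.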
While experimental structures have shown benefits in prediction of binding mode
and possibly enrichments when used as an ensemble for virtual screening, less
research compares the relative effect of MD-based ensembles in VS. Previous studies
have shown that MD simulations on the whole could capture the protein structural
dynamic properties that are inferred experimentally (Philipopoulos and Lim, 1999).
Therefore, MD-based ensembles should provide sufficient accuracy and structural
diversity to be of benefit to VS pipelines. However, properly assessing the variables
involved in such simulations and their effect on VS enrichments had yet to be studied
extensively. The basis of this study was therefore to address such uncertainties and
provide a preliminary report on the efficacy of the use of MD-ensembles on enrichments
across various targets.
Several problems had been highlighted in previous studies using experimental
ensembles on VS that are in accordance with the results seen in this work. Firstly, our
work suggests that certain structures within the identified ensemble have an increased
propensity to discriminate between true binders and decoys. Such was the case for
structure 5 of the ER holo ensemble (Fig. D.02a), structure 2 of the EGFR holo ensemble
(Fig. D.02c), structure 5 of the COX2 holo ensemble (Fig. D.02d), etc. However,
identification of such structures prior to a virtual screen is problematic unless a test set
exists to evaluate the propensity of each structure for enrichments. Additionally, the work
done by Huang and Zou suggests that a scoring method can be used to determine the
optimal structure for binding of true binders if a number of true binders are known
beforehand (Huang and Zou, 2007). However, such benefits are only seen for a number
of targets, while others such as FxA (Fig. D.02h), GART (Fig. D.02g) and PARP (Fig.
D.02e) show little overall benefit from the use of MD-based ensembles. Hence,
identification of the most appropriate structure becomes even more important in the
limited cases where MD-ensembles generate worse results.
Other groups have used MD-ensembles to increase sampling of protein dynamics
with encouraging results (Amaro et al., 2008). More encouraging is the discovery of
novel features within the binding site that were not described by previously published
experimental structures (Schames et al., 2004; Durrant et al., 2010). However, a detailed
evaluation on the objective performance of such ensembles on virtual screening results
had yet to be performed. The work done in Chapter 5 was motivated by such
questions. To date, the McCammon group has used 10, 20 or 40 ns simulations for the
RCS (Amaro et al., 2007; Baron and McCammon, 2007; Cheng et al., 2008). An
appropriate evaluation of the effect of different simulation lengths on virtual screening
enrichments should be part of future work. Additionally, the ensemble size selected
within this study was limited to what was computationally tractable, at most 12
structures and 8 on average. While the results show a clear propensity of
certain members of the ensemble to generate improved enrichments, the effect
of ensemble size on VS enrichments should also be studied at length.
The work in Chapter 5 provided the first objective view of the use of MD-ensembles
in virtual screening by comparing their effects across various targets. The results
show dramatic improvements for targets which benefited from the relaxation of the
binding pocket for proper docking of compounds. Further work should include scoring
optimization strategies including the addition of internal conformation energy and
selection optimization methods, optimization of time length and of ensemble size. As
discussed in Chapter 5, a larger training set would also provide a more relevant
assessment of the performance of the virtual screening pipeline relative to expected
real-world results.

Table A.01 Fourier coefficients for ca-s6-n-ca, (Vn/2)(1 + cos nθ)
n    Vn (kcal/mol)
1    -2.902
2    -1.809
3     1.192
4     0.048
5     0.257
6    -0.317
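As an illustration of how the coefficients in Table A.01 enter the torsional energy, the sketch below sums the six terms of the form (Vn/2)(1 + cos nθ) given in the table heading. It is a minimal example under assumptions: zero phase for every term is taken from the functional form as written, and the function name and the use of degrees for θ are choices made here.

```python
import math

# Fourier coefficients Vn (kcal/mol) from Table A.01, indexed by periodicity n.
V_N = {1: -2.902, 2: -1.809, 3: 1.192, 4: 0.048, 5: 0.257, 6: -0.317}


def torsion_energy(theta_deg, coefficients=V_N):
    """Torsional energy (kcal/mol) as the sum of (Vn/2)(1 + cos(n*theta))."""
    theta = math.radians(theta_deg)
    return sum(
        0.5 * v_n * (1.0 + math.cos(n * theta))
        for n, v_n in coefficients.items()
    )


# Energy profile sampled every 30 degrees over a full rotation.
profile = [(angle, torsion_energy(angle)) for angle in range(0, 360, 30)]
```

At θ = 0 every cosine equals 1, so the energy reduces to the sum of the coefficients; at θ = 180° only the even-n terms contribute.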

Figure A.01 Helices surrounding the binding grooves of Bcl-xL, Bcl-2 and Mcl-1.
Labeled are helices α2, α3, α4, α5 and α8. The ABT-737 inhibitor taken from snapshots
from the molecular dynamics simulations is also shown. The orientations of the
complexes are similar to those in Figure 2.05 of the main text.
Panels: Bcl-xL, Mcl-1, Bcl-2.

Figure A.02 Distance between ABT-737 sulfonamide HN and backbone
carbonyl O of (a) Bcl-xL Asn136, (b) Bcl-2 Asn140 and (c) Mcl-1 Asn260.
Axes: distance (Å) versus time (ns), 0 to 20 ns.

Figure A.03 Hydrogen bond pair distances. Column 1: Distance between
ABT-737 sulfonyl O and the side chain amide HN of (a) Bcl-xL
Asn136, (b) Bcl-2 Asn140 and (c) Mcl-1 Asn260. Column 2:
Distance between ABT-737 sulfonyl O and the backbone HN
of (d) Bcl-xL Gly138, (e) Bcl-2 Gly142 and (f) Mcl-1 Gly262.
Axes: distance (Å) versus time (ns), 0 to 20 ns.

Figure A.04 Hydrogen bond pair distances. Distance between the
ABT-737 dimethylamino HN and (a) Bcl-xL Glu96 side
chain carboxylate O, (b) Bcl-2 Asp100 side chain
carboxylate O.
Axes: distance (Å) versus time (ns), 0 to 20 ns.

Figure A.05 Distance of ABT-737 ring centroids from their initial positions after
superposition of the protein C-alpha atoms to those in the first
snapshot. Data points are at 10-ps intervals. The MD simulation was
started from a structure of Bcl-xL in which the coordinates of the α2 and
α3 helices have been modified to correspond to that observed when a Bim
peptide binds. The ABT-737 binding mode is just as stable in this
simulation as it is with the unaltered protein structure. The labels, r1 to r6,
refer to the ring moieties in ABT-737, starting from the chlorobiphenyl to
the S-phenyl. (See Figure 2.01 of the main text.)
Panels: (r1) to (r6). Axes: distance (Å) versus time (ns), 0 to 20 ns.

Table B.01 Ranking of selected hits from virtual screen.
(a) National Cancer Institute. (b) Short Identification Label. IDs V1, V3, V4,
S1, S5, S6 refer to the active compounds identified by the McCammon group
(Amaro et al., 2008). (c) Values are in arbitrary units. These are empirical
scores from the docking step of the high throughput virtual screening.
More negative numbers suggest better affinity.
NCI no.(a)   ID(b)   VS Score(c)   Rank
614641               -35.105       12
45210        C11     -34.724       18
344553               -34.647       20
162535       C35     -33.947       38
79710        C10     -33.899       41
641601               -33.446       58
89166        C66     -33.029       79
16209        S5      -33.004       82
641753               -32.797       92
7809                 -32.773       96
37204        C04     -32.716       102
45201        S6      -32.405       130
100234       S1      -31.338       220
674000               -31.012       257
45208        V1      -30.272       379
623766               -29.372       562
125908       V3      -25.212       6749
117079       V4      -24.226       11874

Figure B.01 Inhibitors identified from first round of virtual screening.
Panels: C11, C66, C04, C35, C10.
Figure B.02 Previously identified inhibitors not retrieved in virtual screen.
Panels: V4, V3.
Table C.01 Molecular Formula Analysis of 5-leucine enkephalin

Figure D.01 Overview of VS results for the crystal structure and apo/holo
ensembles. The black, dark grey and light grey lines represent the enrichments obtained
at the respective percentage of the ranked database for the apo, crystal and holo virtual
screening routines, respectively. The Y-axis represents the percentage of true binders
within the database that were retrieved.
[Figure D.01 plots: panels a) ER, b) EGFR, c) AR, d) COX2, e) FxA, f) GART, g) Hsp90, h) PARP, i) ADA. Each panel plots % of known ligands found against % of ranked database for the Apo, Crystal and Holo screens.]
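
The enrichment curves of Figure D.01 follow from a simple calculation on a ranked score list. The sketch below is illustrative only and is not the thesis code: it assumes one docking score per compound, a known active/decoy label for each, and that more negative scores rank first. The function name `enrichment_curve` is hypothetical.

```python
import numpy as np

def enrichment_curve(scores, is_active, fractions):
    """Percentage of known ligands found in the top fraction of the ranking.

    scores:    one docking score per compound (more negative = better; assumed)
    is_active: True for known ligands, False for decoys
    fractions: fractions of the ranked database to evaluate, e.g. 0.01 for 1%
    """
    scores = np.asarray(scores, dtype=float)
    is_active = np.asarray(is_active, dtype=bool)
    n_active = int(is_active.sum())
    if n_active == 0:
        raise ValueError("at least one known ligand is required")
    # Rank the database from best (most negative) to worst score.
    order = np.argsort(scores, kind="stable")
    found_so_far = np.cumsum(is_active[order])
    percentages = []
    for fraction in fractions:
        # Small tolerance guards against floating-point overshoot in ceil.
        n_top = int(np.ceil(fraction * len(scores) - 1e-9))
        found = found_so_far[n_top - 1] if n_top > 0 else 0
        percentages.append(100.0 * found / n_active)
    return percentages
```

Evaluating the function over a grid of fractions from 0 to 1 gives one curve; repeating it for the apo, crystal and holo score lists of a target would give the three lines of a panel.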

Figure D.02 Ensemble-based VS results for structures generated from apo MDs.
The stacked columns represent the enrichment percentages obtained. The crystal column
indicates the results obtained for virtual screening on the crystal structure. The numbered
columns indicate the virtual screening enrichment obtained for each of the structures in
the respective ensemble. The min, median and avg columns represent the enrichments
obtained for scoring over the structures contained in the ensemble using the minimum,
median and average score respectively. The black, dark grey and light grey segments of
the column illustrate the percentage of true binders retrieved at 5%, 2% and 1% of the
total ranked database respectively.
[Figure D.02 plots: panels a) ER Holo, b) Hsp90 Holo, c) EGFR Holo, d) Cox2 Holo, e) PARP Holo, f) Ar Holo, g) GART Holo, h) FxA Holo. Each panel shows stacked columns (Crystal, structures 1 to 8, Min, and so on) with segments for 5%, 2% and 1% of the ranked database.]
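
The min, median and avg columns of Figure D.02 combine the scores a compound receives across the ensemble before ranking. A minimal sketch of that consensus step is given below. It is an illustration under stated assumptions, not the thesis implementation: scores are held in a compounds-by-structures array, lower scores are better, and the function names `consensus_scores` and `percent_actives_retrieved` are hypothetical.

```python
import numpy as np

def consensus_scores(score_matrix):
    """Combine per-structure docking scores into one score per compound.

    score_matrix: (n_compounds, n_structures) array (assumed layout).
    Returns the minimum, median and average score over the ensemble.
    """
    m = np.asarray(score_matrix, dtype=float)
    return {
        "min": m.min(axis=1),
        "median": np.median(m, axis=1),
        "avg": m.mean(axis=1),
    }

def percent_actives_retrieved(scores, is_active, fraction):
    """Percentage of true binders found in the top `fraction` of the ranking."""
    scores = np.asarray(scores, dtype=float)
    is_active = np.asarray(is_active, dtype=bool)
    # Keep at least one compound; tolerance guards against ceil overshoot.
    n_top = max(1, int(np.ceil(fraction * len(scores) - 1e-9)))
    # Lower (more negative) scores are assumed to rank first.
    top = np.argsort(scores, kind="stable")[:n_top]
    return 100.0 * is_active[top].sum() / is_active.sum()
```

Applying `percent_actives_retrieved` at fractions 0.05, 0.02 and 0.01 to each consensus score, and to each single-structure column of the matrix, would give the segment heights for one panel.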

References
Abagyan R and Totrov M. 1994a. Biased probability Monte Carlo conformational
searches and electrostatic calculations for peptides and proteins. J Mol Biol.
235(3):983-1002.
Abagyan R, Totrov M and Kuznetsov D. 1994b. ICM – A New Method for Protein
Modeling and Design: Applications to Docking and Structure Prediction from the
Distorted Native Conformation. J Comput Chem. 15(5):488-506.
Ackler S, Mitten M, Foster K, Oleksijew A, Refici M, Tahir S, Xiao Y, Tse C, Frost D,
Fesik S, Rosenberg S, Elmore S, Shoemaker A. 2010. The Bcl-2 inhibitor ABT-
263 enhances the response of multiple chemotherapeutic regimens in hematologic
tumors in vivo. Cancer Chemother Pharmacol. 66(5):869-880.
Adams JM, Cory S. 2007. The Bcl-2 apoptotic switch in cancer development and
therapy. Oncogene. 26(9):1324-1337.
Albershardt TC, Salerni BL, Soderquist RS, Bates DJ, Pletnev AA, Kisselev AF,
Eastman A. 2011. Multiple BH3 Mimetics Antagonize Antiapoptotic MCL1
Protein by Inducing the Endoplasmic Reticulum Stress Response and Up-
regulating BH3-only Protein NOXA. J Biol Chem. 286(28):24882-95.
Allen FH. 2002. The Cambridge Structural Database: a quarter of a million crystal
structures and rising. Acta Crystallographica. B58:380-388.
Allison J, Rothwell V, Newport G, Agabian N, and Stuart K. 1984. The IsTat 1.3 VSG
multigene family in Trypanosoma brucei: retention of the expression linked copy
through multiple antigenic switches. Nucleic Acids Res. 12(23):9051-66.
Amaro RE, Swift RV, McCammon JA. 2007. Functional and structural insights revealed
by molecular dynamics simulations of an essential RNA editing ligase in
Trypanosoma brucei. PLoS Negl Trop Dis. 1(2):e68.
Amaro RE, Baron R, and McCammon JA. 2008a. An improved relaxed complex scheme
for receptor flexibility in computer-aided drug design. J Comput Aided Mol Des.
22(9):693-705.
Amaro RE, Schnaufer A, Interthal H, Hol W, Stuart KD, McCammon JA. 2008b.
Discovery of drug-like inhibitors of an essential RNA-editing ligase in
Trypanosoma brucei. Proc Natl Acad Sci U S A. 105(45):17278-83.
Amaro RE and Li WW. 2010. Emerging methods for ensemble-based virtual screening.
Curr Top Med Chem. 10(1):3-13.

Amundson SA, Myers TG, Scudiero D, Kitada S, Reed JC, Fornace AJ. 2000. An
Informatics Approach Identifying Markers of Chemosensitivity in Human Cancer
Cell Lines. Cancer Research. 60(21):6101-6110.
Anderson HC. 1983. Rattle: A “Velocity” Version of the Shake Algorithm for Molecular
Dynamics Calculations. J Comp Phys. 52:24-34.
Aphasizhev R, Sbicego S, Peris M, Jang SH, Aphasizheva I, Simpson AM, Rivlin A,
Simpson L. 2002. Trypanosome mitochondrial 3' terminal uridylyl transferase
(TUTase): the key enzyme in U-insertion/deletion RNA editing.
Cell. 108(5):637-48.
Aphasizhev R, Aphasizheva I, Nelson RE, Gao G, Simpson AM, Kang X, Falick AM,
Sbicego S, and Simpson L. 2003. Isolation of a U-insertion/deletion editing
complex from Leishmania tarentolae mitochondria. EMBO J. 22(4):913-24.
Appendino G, Fontana G, Pollastro F. 2010. Natural Products Drug Discovery.
Comprehensive Natural Product II: Chem & Biol. 3:205-236.
Ashkenazi A, Dixit VM. 1998. Death Receptors: Signaling and Modulation. Science.
281(5381):1305-1308.
Athri P, Wilson WD. 2009. Molecular Dynamics of Water-Mediated Interactions of a
Linear Benzimidazole-Biphenyl Diamidine with the DNA Minor Groove. J Amer
Chem Soc. 131(22):7618-7625.
Bao Q, Shi Y. 2007. Apoptosome: a platform for the activation of initiator caspases. Cell
Death Differ. 14(1):56-65.
Baron R, McCammon JA. 2007. Dynamics, hydration, and motional averaging of a loop-
gated artificial protein cavity: the W191G mutant of cytochrome c peroxidase in
water as revealed by molecular dynamics simulations. Biochemistry.
46(37):10629-42.
Barril X and Morley SD. 2005. Unveiling the full potential of flexible receptor docking
using multiple crystallographic structures. J Med Chem. 48(13):4432-43.
Bayly CI, Cieplak P, Cornell WD, Kollman PA. 1993. A Well-Behaved Electrostatic
Potential Based Method Using Charge Restraints for Deriving Atomic Charges:
The RESP Model. J Phys Chem. 97:10269-10280.
Bekker H. 1997. Unification of box shapes in molecular simulations. J Comp Chem.
18(15):1930-1942.
Ben-Naim A. 1997. Statistical Potentials extracted from protein structures: Are these
Meaningful Potentials. J Chem Phys. 107:3698-3707.

Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN,
Bourne PE. 2000. The Protein Data Bank. Nucleic Acids Research. 28:235-242.
Bhat S, Purisima EO. 2006. Molecular surface generation using a variable-radius solvent
probe. Proteins Struct Funct Bioinf. 62(1):244-261.
Bindseil KU, Jakupovic J, Wolf D, Lavayre J, Leboul J, and van der Pyl D. 2001. Pure
compound libraries; a new perspective for natural product based drug discovery.
Drug Discov Today. 6(16):840-847.
Blom KF. 2001. Estimating the precision of exact mass measurements on an orthogonal
time-of-flight mass spectrometer. Anal Chem. 73(3):715-9.
Boas FE and Harbury PB. 2007. Potential energy functions for protein design. Curr Opin
Struct Biol. 17:199-204.
Bobzin SC, Yang S, and Kasten TP. 2000. LC-NMR: a new tool to expedite the
dereplication and identification of natural products. J Ind Microbiol Biotechnol.
25(6):342-345.
Böcker S and Rasche F. 2008. Towards de novo identification of metabolites by
analyzing tandem mass spectra. Bioinformatics. 24(16):i49-i55.
Boersma MD, Sadowsky JD, Tomita YA, Gellman SH. 2008. Hydrophile scanning as a
complement to alanine scanning for exploring and manipulating protein-protein
recognition: application to the Bim BH3 domain. Protein Sci. 17(7):1232-40.
Boger DL, Patel M. 1988. Total synthesis of prodigiosin, prodigiosene, and
desmethoxyprodigiosin: Diels-Alder reactions of heterocyclic azadienes and
development of an effective palladium(II)-promoted 2,2'-bipyrrole coupling
procedure. J Org Chem. 53(7):1405-1415.
Böhm HJ. 1994. The development of a simple empirical scoring function to estimate the
binding constant for a protein-ligand complex of known three-dimensional
structure. J Comput Aided Mol Des. 8(3):243-56.
Böröczky K, Laatsch H, Wagner-Döbler I, Stritzke K, and Schulz S. 2006. Cluster
analysis as selection and dereplication tool for the identification of new natural
compounds from large sample sets. Chem Biodivers. 3(6):622-34.
Boyd DB and Lipkowitz KB. 1982. Molecular Mechanics. J Chem Ed. 59(4):269-274.
Bradshaw J, Butina D, Dunn AJ, Green RH, Hajek M, Jones MM, Lindon JC, and
Sidebottom PJ. 2001. A rapid and facile method for the dereplication of purified
natural products. J Nat Prod. 64(12):1541-4.

Brenner D, Mak TW. 2009. Mitochondrial cell death effectors. Curr Opin Cell Biol.
21(6):871-877.
Bristow AW and Webb KS. 2003. Intercomparison study on accurate mass measurement
of small molecules in mass spectrometry. J Am Soc Mass Spectrom. 14(10):1086-
98.
Bristow AW. 2006. Accurate mass measurement for the determination of elemental
formula--a tutorial. Mass Spectrom Rev. 25(1):99-111.
Brodniewicz T, Grynkiewicz G. 2010. Preclinical drug development. Acta Pol Pharm.
67(6):578-85.
Brooks BR, Brooks CL 3rd, Mackerell AD Jr, Nilsson L, Petrella RJ, Roux B, Won Y,
Archontis G, Bartels C, Boresch S, Caflisch A, Caves L, Cui Q, Dinner AR, Feig
M, Fischer S, Gao J, Hodoscek M, Im W, Kuczera K, Lazaridis T, Ma J,
Ovchinnikov V, Paci E, Pastor RW, Post CB, Pu JZ, Schaefer M, Tidor B,
Venable RM, Woodcock HL, Wu X, Yang W, York DM, Karplus M. 2009.
CHARMM: the biomolecular simulation program. J Comput Chem. 30(10):1545-
614.
Brun R, Don R, Jacobs RT, Wang MZ, Barrett MP. 2011. Development of novel drugs
for human African trypanosomiasis. Future Microbiol. 6:677-91.
Bursavich MG and Rich DH. 2002. Designing non-peptide peptidomimetics in the 21st
century: inhibitors targeting conformational ensembles. J Med Chem. 45(3):541-
58.
Capdeville R, Buchdunger E, Zimmermann J, Matter A. 2002. Glivec (STI571, imatinib),
a rationally developed, targeted anticancer drug. Nat Rev Drug Discov. 1(7):493-
502.
Carlson HA, Masukawa KM, Rubins K, Bushman FD, Jorgensen WL, Lins RD, Briggs
JM, McCammon JA. 2000. Developing a dynamic pharmacophore model for
HIV-1 integrase. J Med Chem. 43(11):2100-14.
Carnes J, Trotter JR, Ernst NL, Steinberg A, Stuart K. 2005. An essential RNase III
insertion editing endonuclease in Trypanosoma brucei. Proc Natl Acad Sci U S A.
102(46):16614-9.
Carnes J, Trotter JR, Peltan A, Fleck M, Stuart K. 2008. RNA editing in Trypanosoma
brucei requires three different editosomes. Mol Cell Biol. 28(1):122-30.
Case DA, Cheatham TE 3rd, Darden T, Gohlke H, Luo R, Merz KM Jr, Onufriev A,
Simmerling C, Wang B, Woods RJ. 2005. The Amber biomolecular simulation
programs. J Comput Chem. 26(16):1668-88.

Cavasotto CN, Orry AJ. 2007. Ligand docking and structure-based virtual screening in
drug discovery. Curr Top Med Chem. 7(10):1006-14.
Cavasotto CN. 2011. Homology models in docking and high-throughput docking. Curr
Top Med Chem. 11(12):1528-34.
Chan SL, Purisima EO. 1998. Molecular Surface Generation Using Marching Tetrahedra.
J Comput Chem. 19(11):1268-1277.
Chan SL, Lee MC, Tan KO, Yang LK, Lee AS, Flotow H, Fu NY, Butler MS, Soejarto
DD, Buss AD, Yu VC. 2003. Identification of chelerythrine as an inhibitor of
Bcl-XL function. J Biol Chem. 278(23):20453-6.
Chatelain E and Ioset JR. 2011. Drug discovery and development for neglected diseases:
the DNDi model. Drug Des Devel Ther. 5:175-81.
Cheatham TE 3rd, Miller JL, Fox T, Darden TA, Kollman PA. 1995. Molecular
Dynamics Simulations on Solvated Biomolecular Systems: The Particle Mesh
Ewald Method Leads to Stable Trajectories of DNA, RNA, and Proteins. J Amer
Chem Soc. 117:4193-4194.
Chen R, Li L, and Weng Z. 2003. ZDOCK: an initial-stage protein-docking algorithm.
Proteins. 52(1):80-7.
Chen L, Willis SN, Wei A, Smith BJ, Fletcher JI, Hinds MG, Colman PM, Day CL,
Adams JM, Huang DC. 2005. Differential targeting of prosurvival Bcl-2 proteins
by their BH3-only ligands allows complementary apoptotic function. Mol Cell.
17(3):393-403.
Chen S, Dai Y, Harada H, Dent P, Grant S. 2007. Mcl-1 Down-regulation Potentiates
ABT-737 Lethality by Cooperatively Inducing Bak Activation and Bax
Translocation. Cancer Research. 67(2):782-791.
Cheng LS, Amaro RE, Xu D, Li WW, Arzberger PW, and McCammon JA. 2008.
Ensemble-based virtual screening reveals potential novel antiviral compounds for
avian influenza neuraminidase. J Med Chem. 51(13):3878-94.
Chernushevich IV, Loboda AV, and Thomson BA. 2001. An introduction to quadrupole-
time-of-flight mass spectrometry. J Mass Spectrom. 36(8):849-65.
Christen M, Hünenberger PH, Bakowies D, Baron R, Bürgi R, Geerke DP, Heinz TN,
Kastenholz MA, Kräutler V, Oostenbrink C, Peter C, Trzesniak D, van Gunsteren
WF. 2005. The GROMOS software for biomolecular simulation: GROMOS05. J
Comput Chem. 26(16):1719-51.

Clarkson C, Staerk D, Hansen SH, and Jaroszewski JW. 2005. Hyphenation of solid-
phase extraction with liquid chromatography and nuclear magnetic resonance:
application of HPLC-DAD-SPE-NMR to identification of constituents of Kanahia
laniflora. Anal Chem. 77(11):3547-53.
Claussen H, Buning C, Rarey M, Lengauer T. 2001. FlexE: efficient molecular docking
considering protein structure variations. J Mol Biol. 308(2):377-95.
Clauwaert K, Vande Casteele S, Sinnaeve B, Deforce D, Lambert W, Van Peteghem C,
and Van Bocxlaer J. 2003. Exact mass measurement of product ions for the
structural confirmation and identification of unknown compounds using a
quadrupole time-of-flight spectrometer: a simplified approach using combined
tandem mass spectrometric functions. Rapid Commun Mass Spectrom.
17(13):1443-8.
Colombo M, Sirtori FR, and Rizzo V. 2004. A fully automated method for accurate mass
determination using high-performance liquid chromatography with a
quadrupole/orthogonal acceleration time-of-flight mass spectrometer.
Rapid Commun Mass Spectrom. 18(4):511-7.
Connolly ML. 1983a. Analytical molecular surface calculation. J Appl Cryst. 16:548-558.
Connolly ML. 1983b. Solvent-accessible surfaces of proteins and nucleic acids. Science.
221(4612):709-13.
Constant HL and Beecher CWW. 1995. A method for dereplication of natural product
extracts using electrospray HPLC/MS. Nat Prod Lett. 6(3):193-196.
Cordell GA, Beecher CWW, Kinghorn AD, Pezzuto JM, Constant HL, Chai HB, Fang L,
Seo EK, Long L, Cui B, and Slowing-Barillas K. 1996. The dereplication of plant-
derived natural products. Studies in Natural Products Chemistry, Structure and
Chemistry (Part E). 19:749-791.
Cordell GA and Shin YG. 1999. Finding the needle in the haystack. The dereplication of
natural product extracts. Pure Appl Chem. 71(6):1089-1094.
Corley DG and Durley RC. 1994. Strategies for Database Dereplication of Natural
Products. J Nat Prod. 57(11):1484-1490.
Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM Jr, Ferguson DM, Spellmeyer
DC, Fox T, Caldwell JW, Kollman PA. 1995. A Second Generation Force Field
for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. J Am
Chem Soc. 117:5179-5197.

Cozzini P, Fornabaio M, Marabotti A, Abraham DJ, Kellogg GE and Mozzarelli A. 2004.
Free Energy of Ligand Binding to Protein: Evaluation of the Contribution of
Water Molecules by Computational Methods. Curr Med Chem. 11:3093-3118.
Craig IR, Essex JW, Spiegel K. 2010. Ensemble docking into multiple
crystallographically derived protein structures: an evaluation based on the
statistical analysis of enrichments. J Chem Inf Model. 50(4):511-24.
Croft SL. 2005. Public-private partnership: from there to here. Trans R Soc Trop Med
Hyg. 99(Suppl 1):S9-S14.
Cruz-Reyes J, Zhelonkina AG, Huang CE, Sollner-Webb B. 2002. Distinct functions of
two RNA ligases in active Trypanosoma brucei RNA editing complexes.
Mol Cell Biol. 22(13):4652-60.
Cuendet MA and van Gunsteren WF. 2007. On the calculation of velocity-dependent
properties in molecular dynamics simulations using the leapfrog integration
algorithm. J Chem Phys. 127(18):184102.
Cui Q, Sulea T, Schrag JD, Munger C, Hung MN, Naïm M, Cygler M, Purisima EO.
2008. Molecular dynamics-solvated interaction energy studies of protein-protein
interactions: the MP1-p14 scaffolding complex. J Mol Biol. 379(4):787-802.
Cushman DW, Cheung HS, Sabo EF, Ondetti MA. 1977. Design of potent competitive
inhibitors of angiotensin-converting enzyme. Carboxyalkanoyl and
mercaptoalkanoyl amino acids. Biochemistry. 16(25):5484-91.
Czabotar PE, Lee EF, van Delft MF, Day CL, Smith BJ, Huang DCS, Fairlie WD, Hinds
MG, Colman PM. 2007. Structural insights into the degradation of Mcl-1 induced
by BH3 domains. Proc Nat Acad Sci USA. 104(15):6217-6222.
D'Alessio R, Bargiotti A, Carlini O, Colotta F, Ferrari M, Gnocchi P, Isetta A, Mongelli
N, Motta P, Rossi A, Rossi M, Tibolla M, Vanotti E. 2000. Synthesis and
Immunosuppressive Activity of Novel Prodigiosin Derivatives. J Med Chem.
43(13):2557-2565.
Damm KL and Carlson HA. 2007. Exploring experimental sources of multiple protein
conformations in structure-based drug design. J Am Chem Soc. 129(26):8225-35.
Day CL, Chen L, Richardson SJ, Harrison PJ, Huang DCS, Hinds MG. 2005. Solution
Structure of Prosurvival Mcl-1 and Characterization of Its Binding by
Proapoptotic BH3-only Ligands. J Biol Chem. 280(6):4738-4744.
Desjarlais RL, Sheridan RP, Dixon JS, Kuntz ID, Venkataraghavan R. 1986. Docking
flexible ligands to macromolecular receptors by molecular shape. J Med Chem.
29:2149-53.

Delespaux V and de Koning HP. 2007. Drugs and drug resistance in African
trypanosomiasis. Drug Resist Updat. 10(1-2):30-50.
Deng J, Schnaufer A, Salavati R, Stuart KD, Hol WG. 2004. High resolution crystal
structure of a key editosome enzyme from Trypanosoma brucei: RNA editing
ligase 1. J Mol Biol. 343(3):601-13.
Denise H and Barrett MP. 2001. Uptake and mode of action of drugs used against
sleeping sickness. Biochem Pharmacol. 61(1):1-5.
Dias R, de Azevedo WF Jr. 2008. Molecular docking algorithms. Curr Drug Targets.
9(12):1040-7.
DiMasi JA, Hansen RW, Grabowski HG. 2003. The price of innovation: new estimates of
drug development costs. J Health Econ. 22(2):151-85.
Dinan L. 2005. Methods in Biotechnology, natural products isolation (2nd edition).
20:297-321.
Doherty AJ and Suh SW. 2000. Structural and mechanistic conservation in DNA ligases.
Nucleic Acids Res. 28(21):4051-8.
Domingo GJ, Palazzo SS, Wang B, Pannicucci B, Salavati R, Stuart KD. 2003.
Dyskinetoplastic Trypanosoma brucei contains functional editing complexes.
Eukaryot Cell. 2(3):569-77.
Dror O, Shulman-Peleg A, Nussinov R, Wolfson HJ. 2004. Predicting molecular
interactions in silico: I. A guide to pharmacophore identification and its
applications to drug design. Curr Med Chem. 11(1):71-90.
Durrant JD, Hall L, Swift RV, Landon M, Schnaufer A, and Amaro RE. 2010a. Novel
naphthalene-based inhibitors of Trypanosoma brucei RNA editing ligase 1. PLoS
Negl Trop Dis. 4(8):e803.
Durrant JD, Urbaniak MD, Ferguson MA, and McCammon JA. 2010b. Computer-aided
identification of Trypanosoma brucei uridine diphosphate galactose 4'-epimerase
inhibitors: toward the development of novel therapies for African sleeping
sickness. J Med Chem. 53(13):5025-32.
Eckers C, Wolff JC, Haskins NJ, Sage AB, Giles K, and Bateman R. 2000.
Accurate mass liquid chromatography/mass spectrometry on orthogonal
acceleration time-of-flight mass analyzers using switching between separate
sample and reference sprays. 1. Proof of concept. Anal Chem. 72(16):3683-8.
Ehrlich P. 1909. Über den jetzigen Stand der Chemotherapie. Berichte der Deutschen
Chemischen Gesellschaft. 42:17-47. In German.

Ehrlich LP, Nilges M, Wade RC. 2005. The impact of protein flexibility on protein-
protein docking. Proteins. 58(1):126-33.
Eldridge GR, Vervoort HC, Lee CM, Cremin PA, Williams CT, Hart SM, Goering MG,
O'Neil-Johnson M, and Zeng L. 2002. High-throughput method for the production
and analysis of large natural product libraries for drug discovery. Anal Chem.
74(16):3963-71.
Elmore S. 2007. Apoptosis: A Review of Programmed Cell Death. Toxicol Pathol.
35(4):495-516.
Ernst NL, Panicucci B, Igo RP Jr, Panigrahi AK, Salavati R, Stuart K. 2003. TbMP57 is a
3' terminal uridylyl transferase (TUTase) of the Trypanosoma brucei editosome.
Mol Cell. 11(6):1525-36.
Erve JC, Vashishtha SC, Ojewoye O, Adedoyin A, Espina R, Demaio W, and Talaat RE.
2008. Metabolism of prazosin in rat and characterization of metabolites in plasma,
urine, faeces, brain and bile using liquid chromatography/mass spectrometry
(LC/MS). Xenobiotica. 38(5):540-58.
Evers A and Klabunde T. 2005. Structure-based drug discovery using GPCR homology
modeling: successful virtual screening for antagonists of the alpha1A adrenergic
receptor. J Med Chem. 48(4):1088-97.
Ewing TJ, Makino S, Skillman AG, Kuntz ID. 2001. DOCK 4.0: search strategies for
automated molecular docking of flexible molecule databases. J Comput Aided
Mol Des. 15(5):411-28.
Eyrisch S, Helms V. 2007. Transient Pockets on Protein Surfaces Involved in Protein-
Protein Interaction. J Med Chem. 50(15):3457-3464.
Ezzell C. 2003. The price of pills. Does it really take $897 million for a new therapy?
Sci Am. 289(1):25.
Ferrara P, Gohlke H, Price DJ, Klebe G, Brooks CL 3rd. 2004. Assessing scoring
functions for protein-ligand interactions. J Med Chem. 47(12):3032-47.
Fire E, Gullá SV, Grant RA, Keating AE. 2010. Mcl-1-Bim complexes accommodate
surprising point mutations via minor structural changes. Prot Sci. 19(3):507-519.
Floris F and Tomasi J. 1989. Evaluation of the dispersion contribution to the solvation
energy. A simple computational model in the continuum approximation. J
Comput Chem. 10(5):616–627.
Fogolari F, Brigo A, Molinari H. 2002. The Poisson-Boltzmann equation for
biomolecular electrostatics: a tool for structural biology. J Mol Recognit.
15(6):377-92.
Fradera X, De La Cruz X, Silva CH, Gelpí JL, Luque FJ, Orozco M. 2002. Ligand-
induced changes in the binding sites of proteins. Bioinformatics. 18(7):939-48.
Fredenhagen A, Derrien C, and Gassmann E. 2005. An MS/MS library on an ion-trap
instrument for efficient dereplication of natural products. Different fragmentation
patterns for [M + H]+ and [M + Na]+ ions. J Nat Prod. 68(3):385-91.
Frembgen-Kesner T and Elcock AH. 2006. Computational sampling of a cryptic drug
binding site in a protein receptor: explicit solvent molecular dynamics and
inhibitor docking to p38 MAP kinase. J Mol Biol. 359(1):202-14.
Gao G, Rogers K, Li F, Guo Q, Osato D, Zhou SX, Falick AM, and Simpson L. 2010.
Uridine insertion/deletion RNA editing in Trypanosomatids: specific stimulation
in vitro of Leishmania tarentolae REL1 RNA ligase activity by the MP63 zinc
finger protein. Protist. 161(3):489-96.
Gilson M, Sharp K, and Honig B. 1988. Calculating the electrostatic potential of
molecules in solution: Method and error assessment. J Comput Chem. 9:327-335.
Glish GL and Vachet RW. 2003. The basics of mass spectrometry in the twenty-first
century. Nat Rev Drug Discov. 2(2):140-50.
Gohlke H, Hendlich M, Klebe G. 2000. Predicting binding modes, binding affinities
and "hot spots" for protein-ligand complexes using a knowledge-based scoring
function. Persp Drug Design Discov. 20:115-144.
Gohlke H and Klebe G. 2002. Approaches to the description and prediction of the
binding affinity of small-molecule ligands to macromolecular receptors.
Angew Chem Int Ed Engl. 41(15):2644-76.
Gordon JC, Myers JB, Folta T, Shoja V, Heath LS and Onufriev A. 2005. H++: a server
for estimating pKas and adding missing hydrogens to macromolecules. Nucleic
Acids Res. 33(Web Server issue):W368-71.
Göringer HU, Koslowsky DJ, Morales TH, Stuart K. 1994. The formation of
mitochondrial ribonucleoprotein complexes involving guide RNA molecules in
Trypanosoma brucei. Proc Natl Acad Sci U S A. 91(5):1776-80.
Grange AH and Sovocool GW. 1999. Determination of elemental composition by high
resolution mass spectrometry without mass calibrants. Rapid Commun Mass
Spectrom. 13(8):673-686.
Grüneberg S, Stubbs MT, Klebe G. 2002. Successful virtual screening for novel
inhibitors of human carbonic anhydrase: strategy and experimental confirmation.
J Med Chem. 45(17):3588-602.
Guo X, Ernst NL, Carnes J, Stuart KD. 2010. The zinc-fingers of KREPA3 are essential
for the complete editing of mitochondrial mRNAs in Trypanosoma brucei. PLoS
One. 5(1):e8913.
Hajduk PJ, Huth JR, Tse C. 2005. Predicting protein druggability. Drug Discov Today.
10(23-24):1675-82.
Halperin I, Ma B, Wolfson H, Nussinov R. 2002. Principles of docking: An overview of
search algorithms and a guide to scoring functions. Proteins. 47(4):409-43.
Hanahan D, Weinberg RA. 2000. The hallmarks of cancer. Cell. 100:57-70.
Hansen ME, Smedsgaard J, and Larsen TO. 2005. X-Hitting: an algorithm for novelty
detection and dereplication by UV spectra of complex mixtures of natural
products. Anal Chem. 77(21):6805-17.
He XG. 2000. On-line identification of phytochemical constituents in botanical extracts
by combined high-performance liquid chromatographic-diode array detection-
mass spectrometric techniques. J Chromatogr A. 880(1-2):203-32.
Head RD, Smythe ML, Oprea TI, Waller CL, Green SM and Marshall GR. 1996.
VALIDATE: A New Method for the Receptor-Based Prediction of Binding
Affinities of Novel Ligands. J Am Chem Soc. 118(16):3959-3969.
Hendsch ZS and Tidor B. 1999. Electrostatic interactions in the GCN4 leucine zipper:
substantial contributions arise from intramolecular interactions enhanced on
binding. Protein Sci. 8(7):1381-1392.
Honig B and Nicholls A. 1995. Classical electrostatics in biology and chemistry.
Science. 268:1144-1149.
Hook DJ, Pack EJ, Yacobucci JJ and Guss J. 1997. Approaches to Automating the
Dereplication of Bioactive Natural Products—The Key Step in High Throughput
Screening of Bioactive Materials From Natural Sources. J Biomol Screening.
2(3):145-152.
Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, and Simmerling C. 2006.
Comparison of multiple Amber force fields and development of improved protein
backbone parameters. Proteins. 65(3):712-25.
Hostettmann K, Wolfender JL, Terreaux C. 2001. Modern screening techniques for plant
extracts. Pharm Biol. 39 Suppl 1:18-32.

Hostettmann K, Marston A, and Wolfender JL. 2005. Strategy in the Search for New
lead Compounds and Drugs from Plants. Chimia. 59(6):291-294.
Hou T, Guo S and Xu X. 2002. Predictions of binding of a diverse set of ligands to
gelatinase-A by a combination of molecular dynamics and continuum solvation
models. J Phys Chem. 106:5527-5535.
Huang N, Shoichet BK, Irwin JJ. 2006. Benchmarking sets for molecular docking. J Med
Chem. 49(23):6789-801.
Huang SY and Zou X. 2007. Ensemble docking of multiple protein structures:
considering protein structural variations in molecular docking. Proteins.
66(2):399-421.
Hünenberger PH, Helms V, Narayana N, Taylor SS, McCammon JA. 1999. Determinants
of ligand binding to cAMP-dependent protein kinase. Biochemistry. 38(8):2358-
66.
Igo RP Jr, Palazzo SS, Burgess ML, Panigrahi AK, and Stuart K. 2000. Uridylate
addition and RNA ligation contribute to the specificity of kinetoplastid insertion
RNA editing. Mol Cell Biol. 20(22):8447-57.
Igo RP Jr, Weston DS, Ernst NL, Panigrahi AK, Salavati R, Stuart K. 2002. Role of
uridylate-specific exoribonuclease activity in Trypanosoma brucei RNA editing.
Eukaryot Cell. 1(1):112-8.
Ioset JR. 2008. Natural products for neglected diseases: a review. Curr Org Chem.
12(8):643-667.
Irwin JJ and Shoichet BK. 2005. ZINC--a free database of commercially available
compounds for virtual screening. J Chem Inf Model. 45(1):177-82.
Irwin JJ. 2008. Community benchmarks for virtual screening. J Comput Aided Mol Des.
22(3-4):193-9.
Jain AN. 2007. Surflex-Dock 2.1: robust performance from ligand energetic modeling,
ring flexibility, and knowledge-based search. J Comput Aided Mol Des.
21(5):281-306.
Jaroszewski JW. 2005a. Hyphenated NMR methods in natural products research, part 1:
direct hyphenation. Planta Med. 71(8):691-700.
Jaroszewski JW. 2005b. Hyphenated NMR methods in natural products research, Part 2:
HPLC-SPE-NMR and other new trends in NMR hyphenation. Planta Med.
71(9):795-802.

Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. 1983. Comparison
of simple potential functions for simulating liquid water. J Chem Phys. 79:926-
935.
Jorgensen WL. 2004. The many roles of computation in drug discovery. Science.
303(5665):1813-8.
Kable ML, Heidmann S, and Stuart KD. 1997. RNA editing: getting U into RNA.
Trends Biochem Sci. 22(5):162-6.
Kala S and Salavati R. 2010. OB-fold domain of KREPA4 mediates high-affinity
interaction with guide RNA and possesses annealing activity. RNA. 16(10):1951-
67.
Källblad P, Todorov NP, Willems HM, Alberts IL. 2004. Receptor flexibility in the in
silico screening of reagents in the S1' pocket of human collagenase. J Med Chem.
47(11):2761-7.
Kang X, Rogers K, Gao G, Falick AM, Zhou S, Simpson L. 2005. Reconstitution of
uridine-deletion precleaved RNA editing with two recombinant enzymes.
Proc Natl Acad Sci U S A. 102(4):1017-22.
Kapetanovic IM. 2008. Computer-aided drug discovery and development (CADDD): in
silico-chemico-biological approach. Chem Biol Interact. 171(2):165-76.
Kelliher MA, McLaughlin J, Witte ON, Rosenberg N. 1990. Induction of a chronic
myelogenous leukemia-like syndrome in mice with v-abl and BCR/ABL. Proc
Natl Acad Sci U S A. 87(17):6649-53.
Kitada S, Kress CL, Krajewska M, Jia L, Pellecchia M, Reed JC. 2008. Bcl-2 antagonist
apogossypol (NSC736630) displays single-agent activity in Bcl-2-transgenic mice
and has superior efficacy with less toxicity compared with gossypol (NSC19048).
Blood. 111(6):3211-9.
Knegtel RM, Kuntz ID, Oshiro CM. 1997. Molecular docking to ensembles of protein
structures. J Mol Biol. 266(2):424-40.
Koehn FE and Carter GT. 2005. The evolving role of natural products in drug discovery.
Nat Rev Drug Discov. 4(3):206-20.
Kollman PA, Massova I, Reyes C, Kuhn B, Huo S, Chong L, Lee M, Lee T, Duan Y,
Wang W, Donini O, Cieplak P, Srinivasan J, Case DA and Cheatham TE 3rd.
2000. Calculating structures and free energies of complex molecules: combining
molecular mechanics and continuum models. Acc Chem Res. 33(12):889-97.
Konishi Y, Kiyota T, Draghici C, Gao JM, Yeboah F, Acoca S, Jarussophon S, and
Purisima E. 2007. Molecular formula analysis by an MS/MS/MS technique to
expedite dereplication of natural products. Anal Chem. 79(3):1187-97.
Konopleva M, Contractor R, Tsao T, Samudio I, Ruvolo PP, Kitada S, Deng X, Zhai D,
Shi YX, Sneed T, Verhaegen M, Soengas M, Ruvolo VR, McQueen T, Schober
WD, Watt JC, Jiffar T, Ling X, Marini FC, Harris D, Dietrich M, Estrov Z,
McCubrey J, May WS, Reed JC, Andreeff M. 2006. Mechanisms of apoptosis
sensitivity and resistance to the BH3 mimetic ABT-737 in acute myeloid
leukemia. Cancer Cell. 10(5):375-388.
Konopleva M, Watt J, Contractor R, Tsao T, Harris D, Estrov Z, Bornmann W,
Kantarjian H, Viallet J, Samudio I, Andreeff M. 2008. Mechanisms of
antileukemic activity of the novel Bcl-2 homology domain-3 mimetic GX15-070
(obatoclax). Cancer Res. 68(9):3413-20.
Koppensteiner WA and Sippl MJ. 1998. Knowledge-based potentials--back to the roots.
Biochemistry. 63(3):247-52.
Korfmacher WA. 2005. Principles and applications of LC-MS in new drug discovery.
Drug Discov Today. 10(20):1357-67.
Koslowsky DJ, Göringer HU, Morales TH, and Stuart K. 1992. In vitro guide
RNA/mRNA chimaera formation in Trypanosoma brucei RNA editing. Nature.
356(6372):807-9.
Krammer PH, Behrmann I, Daniel P, Dhein J, Debatin KM. 1994. Regulation of
apoptosis in the immune system. Current Opinion in Immunology. 6(2):279-
289.
Kramer B, Rarey M, Lengauer T. 1999. Evaluation of the FLEXX incremental
construction algorithm for protein-ligand docking. Proteins. 37(2):228-41.
Kuhn B, Kollman PA. 2000. Binding of a diverse set of ligands to avidin and
streptavidin: an accurate quantitative prediction of their relative affinities by a
combination of molecular mechanics and continuum solvent models. J Med
Chem. 43(20):3786-91.
Kujawinski EB and Behn MD. 2006. Automated analysis of electrospray ionization
fourier transform ion cyclotron resonance mass spectra of natural organic matter.
Anal Chem. 78(13):4363-73.
Kuntz I, Blaney J, Oatley S, Langridge R, Ferrin T. 1982. A geometric approach to
macromolecule-ligand interactions. J Mol Biol. 161(2):269-88.
Kuntz I. 1992. Structure-Based Strategies for Drug Design and Discovery. Science.
257(5073):1078.

Laio A and Parrinello M. 2002. Escaping free-energy minima. Proc Natl Acad Sci U S A.
99(20):12562-6.
Lama D, Sankararamakrishnan R. 2008. Anti-apoptotic Bcl-XL protein in complex with
BH3 peptides of pro-apoptotic Bak, Bad, and Bim proteins: Comparative
molecular dynamics simulations. Proteins Struct Funct Bioinf. 73(2):492-514.
Lambert M, Stærk D, Hansen SH, Sairafianpour M, and Jaroszewski JW. 2005. Rapid
extract dereplication using HPLC-SPE-NMR: analysis of isoflavonoids from
Smirnowia iranica. J Nat Prod. 68:1500-1509.
Law JA, O'Hearn SF, Sollner-Webb B. 2008. Trypanosoma brucei RNA editing protein
TbMP42 (band VI) is crucial for the endonucleolytic cleavages but not the
subsequent steps of U-deletion and U-insertion. RNA. 14(6):1187-200.
Lee KH. 2004. Current developments in the discovery and design of new drug candidates
from plant natural product leads. J Nat Prod. 67(2):273-83.
Lee EF, Czabotar PE, Smith BJ, Deshayes K, Zobel K, Colman PM, Fairlie WD. 2007.
Crystal structure of ABT-737 complexed with Bcl-xL: implications for
selectivity of antagonists of the Bcl-2 family. Cell Death Differ. 14(9):1711-1713.
Lee EF, Czabotar PE, van Delft MF, Michalak EM, Boyle MJ, Willis SN, Puthalakath H,
Bouillet P, Colman PM, Huang DCS, Fairlie WD. 2008. A novel BH3 ligand that
selectively targets Mcl-1 reveals that apoptosis can proceed without Mcl-1
degradation. J Cell Biol. 180(2):341-355.
Lee EF, Czabotar PE, Yang H, Sleebs BE, Lessene G, Colman PM, Smith BJ, Fairlie
WD. 2009. Conformational Changes in Bcl-2 Pro-survival Proteins Determine
Their Capacity to Bind Ligands. J Biol Chem. 284(44):30508-30517.
Lessene G, Czabotar PE, Colman PM. 2008. BCL-2 family antagonists for cancer
therapy. Nat Rev Drug Discovery. 7(12):989-1000.
Lin JH, Perryman AL, Schames JR, and McCammon JA. 2002. Computational drug
design accommodating receptor flexibility: the relaxed complex scheme. J Am
Chem Soc. 124(20):5632-3.
Lin JH, Perryman AL, Schames JR, and McCammon JA. 2003. The relaxed complex
method: Accommodating receptor flexibility for drug design with an improved
scoring scheme. Biopolymers. 68(1):47-62.
Lipinski CA, Lombardo F, Dominy BW and Feeney PJ. 2001. Experimental and
computational approaches to estimate solubility and permeability in drug
discovery and development settings. Adv Drug Deliv Rev. 46(1-3):3-26.

Liu M and Wang S. 1999. MCDOCK: A Monte Carlo simulation approach to the
molecular docking problem. J Comput Aided Mol Des. 13(5):435-451.
Liu X, Dai S, Zhu Y, Marrack P, Kappler JW. 2003. The Structure of a Bcl-xL/Bim
Fragment Complex: Implications for Bim Function. Immunity. 19(3):341-352.
López-Pérez JL, Therón R, del Olmo E, Díaz D. 2007. NAPROC-13: a database for the
dereplication of natural product mixtures in bioassay-guided protocols.
Bioinformatics. 23(23):3256-7.
Lorber DM, Shoichet BK. 2005. Hierarchical docking of databases of multiple ligand
conformations. Curr Top Med Chem. 5(8):739-49.
Louden D, Handley A, Taylor S, Lenz E, Miller S, Wilson ID, Sage A, Lafont R. 2001.
Spectroscopic characterisation and identification of ecdysteroids using high-
performance liquid chromatography combined with on-line UV--diode array, FT-
infrared and 1H-nuclear magnetic resonance spectroscopy and time of flight mass
spectrometry. J Chromatogr A. 910(2):237-46.
Ma B, Shatsky M, Wolfson HJ, Nussinov R. 2002. Multiple diverse ligands binding at a
single protein site: a matter of pre-existing populations. Protein Sci. 11(2):184-97.
Madison-Antenucci S, Grams J, and Hajduk SL. 2002. Editing machines: the
complexities of trypanosome RNA editing. Cell. 108(4):435-8.
Mandal S, Moudgil M, Mandal SK. 2009. Rational drug design. Eur J Pharmacol. 625(1-
3):90-100.
Marrone TJ, Briggs JM, McCammon JA. 1997. Structure-based drug design:
computational advances. Annu Rev Pharmacol Toxicol. 37:71-90.
Martys NS and Mountain RD. 1999. Velocity Verlet algorithm for dissipative-particle-
dynamics-based models of suspensions. Phys. Rev. E 59:3733–3736.
McManus MT, Shimamura M, Grams J, and Hajduk SL. 2001. Identification of candidate
mitochondrial RNA editing ligases from Trypanosoma brucei. RNA. 7(2):167-75.
Meagher KL and Carlson HA. 2004. Incorporating protein flexibility in structure-based
drug discovery: using HIV-1 protease as a test case. J Am Chem Soc.
126(41):13276-81.
Meier P, Finch A, Evan G. 2000. Apoptosis in development. Nature. 407(6805):796-801.
Meijuan F, Jianfeng W, Yaojian H, and Yufen Z. 2006. Rapid screening and
identification of Brefeldin A in endophytic fungi using HPLC-MS/MS. Front
Chem China. 1(1):15-19.
Mercer L, Bowling T, Perales J, Freeman J, Nguyen T, Bacchi C, Yarlett N, Don R,
Jacobs R, Nare B. 2011. 2,4-Diaminopyrimidines as potent inhibitors of
Trypanosoma brucei and identification of molecular targets by a chemical
proteomics approach. PLoS Negl Trop Dis. 5(2):e956.
Metropolis N and Ulam S. 1949. The Monte Carlo Method. J Am Stat Assoc.
44(247):335-341.
Metropolis N, Rosenbluth A, Rosenbluth M, Teller A and Teller E. 1953. Equations of
State Calculations by Fast Computing Machines. J Chem Phys. 21(6):1087-1092.
Michel J and Essex JW. 2010. Prediction of protein-ligand binding affinity by free energy
simulations: assumptions, pitfalls and expectations. J Comput Aided Mol Des.
24:639-658.
Miller MD, Kearsley SK, Underwood DJ, Sheridan RP. 1994. FLOG: a system to select
'quasi-flexible' ligands complementary to a receptor of known three-dimensional
structure. J Comput Aided Mol Des. 8(2):153-74.
Milne GWA. 2002. Pharmacophore and Drug Discovery. Encyclopedia of
Computational Chemistry.
Milne GWA, Nicklaus MC, Driscoll JS, Wang S and Zaharevitz D. 1994. The NCI Drug
Information System 3D Database. J Chem Inf Comput Sci. 34:1219-1224.
Minn AJ, Rudin CM, Boise LH, Thompson CB. 1995. Expression of bcl-xL can confer a
multidrug resistance phenotype. Blood. 86(5):1903-1910.
Mishra KP, Ganju L, Sairam M, Banerjee PK, Sawhney RC. 2008. A review of high
throughput technology for the screening of natural products. Biomed
Pharmacother. 62(2):94-8.
Miyashita O, Onuchic JN, Okamura MY. 2003. Continuum electrostatic model for the
binding of cytochrome c2 to the photosynthetic reaction center from Rhodobacter
sphaeroides. Biochemistry. 42(40):11651-60.
Moroy G, Martin E, Dejaegere A, Stote RH. 2009. Molecular Basis for Bcl-2 Homology
3 Domain Recognition in the Bcl-2 Protein Family. J Biol Chem. 284(26):17499-
17511.
Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, Belew RK and
Olson AJ. 1998. Automated Docking Using a Lamarckian Genetic Algorithm
and an Empirical Binding Free Energy Function. J Comput Chem. 19:1639-1662.

Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS and Olson AJ.
2009. AutoDock4 and AutoDockTools4: Automated docking with selective
receptor flexibility. J Comput Chem. 30(16):2785-91.
Moshiri H and Salavati R. 2010. A fluorescence-based reporter substrate for monitoring
RNA editing in trypanosomatid pathogens. Nucleic Acids Res. 38(13):e138.
Muegge I and Martin YC. 1999. A general and fast scoring function for protein-ligand
interactions: a simplified potential approach. J Med Chem. 42(5):791-804.
Muegge I. 2000. A knowledge-based scoring function for protein-ligand interactions:
Probing the reference state. Perspect Drug Discov. 20(1):99-114.
Mullinax JW and Noid WG. 2010. Recovering physical potentials from a model protein
databank. PNAS. 107(46):19867-19872.
Murray CW, Baxter CA, Frenkel AD. 1999. The sensitivity of the results of molecular
docking to induced fit effects: application to thrombin, thermolysin and
neuraminidase. J Comput Aided Mol Des. 13(6):547-62.
Naïm M, Bhat S, Rankin KN, Dennis S, Chowdhury SF, Siddiqi I, Drabik P, Sulea T,
Bayly CI, Jakalian A, Purisima EO. 2007. Solvated interaction energy (SIE) for
scoring protein-ligand binding affinities. 1. Exploring the parameter space.
J Chem Inf Model. 47(1):122-33.
Najmanovich R, Kuttner J, Sobolev V, Edelman M. 2000. Side-chain flexibility in
proteins upon ligand binding. Proteins. 39(3):261-8.
Natesh R, Schwager SL, Sturrock ED, and Acharya KR. 2003. Crystal structure of the
human angiotensin-converting enzyme-lisinopril complex. Nature.
421(6922):551-4.
Niemann M, Brecht M, Schluter E, Weitzel K, Zacharias M, and Göringer HU.
2008. TbMP42 is a structure-sensitive ribonuclease that likely follows a metal ion
catalysis mechanism. Nucleic Acids Res. 36:4465-4473.
Newman DJ, Cragg GM, and Snader KM. 2003. Natural products as sources of new
drugs over the period 1981-2002. J Nat Prod. 66(7):1022-37.
Newman DJ and Cragg GM. 2007. Natural products as sources of new drugs over the last
25 years. J Nat Prod. 70(3):461-77.
Ng R. 2004. Drugs: From Discovery to Approval. John Wiley & Sons, Hoboken, NJ.
Nguyen M, Marcellus RC, Roulston A, Watson M, Serfass L, Murthy Madiraju SR,
Goulet D, Viallet J, Belec L, Billot X, Acoca S, Purisima E, Wiegmans A, Cluse
L, Johnstone RW, Beauparlant P, Shore GC. 2007. Small molecule obatoclax
(GX15-070) antagonizes MCL-1 and overcomes MCL-1-mediated resistance to
apoptosis. Proc Nat Acad Sci USA. 104(49):19512-19517.
Nwaka S, Ridley RG. 2003. Virtual drug discovery and development for neglected
diseases through public-private partnerships. Nat Rev Drug Disc. 2:919–928.
O'Donoghue P, Luthey-Schulten Z. 2005. Evolutionary profiles derived from the QR
factorization of multiple structural alignments gives an economy of information.
J Mol Biol. 346(3):875-94.
Oduor RO, Ojo KK, Williams GP, Bertelli F, Mills J, Maes L, Pryde DC, Parkinson T,
Van Voorhis WC, Holler TP. 2011. Trypanosoma brucei glycogen synthase
kinase-3, a target for anti-trypanosomal drug development: a public-private
partnership to identify novel leads. PLoS Negl Trop Dis. 5(4):e1017.
Oltersdorf T, Elmore SW, Shoemaker AR, Armstrong RC, Augeri DJ, Belli BA, Bruncko
M, Deckwerth TL, Dinges J, Hajduk PJ, Joseph MK, Kitada S, Korsmeyer SJ,
Kunzer AR, Letai A, Li C, Mitten MJ, Nettesheim DG, Ng S, Nimmer PM,
O'Connor JM, Oleksijew A, Petros AM, Reed JC, Shen W, Tahir SK, Thompson
CB, Tomaselli KJ, Wang B, Wendt MD, Zhang H, Fesik SW, Rosenberg SH.
2005. An inhibitor of Bcl-2 family proteins induces regression of solid tumours.
Nature. 435(7042):677-681.
Oprea TI and Matter H. 2004. Integrating virtual screening in lead discovery. Curr Opin
Chem Biol. 8(4):349-58.
Osterberg F, Morris GM, Sanner MF, Olson AJ, Goodsell DS. 2002. Automated docking
to multiple target structures: incorporation of protein mobility and structural water
heterogeneity in AutoDock. Proteins. 46(1):34-40.
Orozco M and Luque FJ. 2000. Theoretical Methods for the Description of the Solvent
Effect in Biomolecular Systems. Chem Rev. 100(11):4187-4226.
Ow YL, Green DR, Hao Z, Mak TW. 2008. Cytochrome c: functions beyond respiration.
Nature Reviews Molecular Cell Biology. 9(7):532-542.
Panigrahi AK, Gygi SP, Ernst NL, Igo RP Jr, Palazzo SS, Schnaufer A, Weston DS,
Carmean N, Salavati R, Aebersold R, and Stuart KD. 2001. Association of two
novel proteins, TbMP52 and TbMP48, with the Trypanosoma brucei RNA editing
complex. Mol Cell Biol. 21(2):380-9.
Panigrahi AK, Schnaufer A, Ernst NL, Wang B, Carmean N, Salavati R, Stuart K. 2003.
Identification of novel components of Trypanosoma brucei editosomes. RNA.
9(4):484-92.

Park CM, Bruncko M, Adickes J, Bauch J, Ding H, Kunzer A, Marsh KC, Nimmer P,
Shoemaker AR, Song X, Tahir SK, Tse C, Wang X, Wendt MD, Yang X, Zhang
H, Fesik SW, Rosenberg SH, Elmore SW. 2008. Discovery of an orally
bioavailable small molecule inhibitor of prosurvival B-cell lymphoma 2 proteins.
J Med Chem. 51(21):6902-15.
Pepaj M, Wilson SR, Novotna K, Lundanes E, and Greibrokk T. 2006. Two-dimensional
capillary liquid chromatography: pH gradient ion exchange and reversed phase
chromatography for rapid separation of proteins. J Chromatogr A. 1120(1-2):132-
41.
Perez-Galan P, Roue G, Villamor N, Campo E, Colomer D. 2007. The BH3-mimetic
GX15-070 synergizes with bortezomib in mantle cell lymphoma by enhancing
Noxa-mediated activation of Bak. Blood. 109(10):4441-4449.
Pérez-Tomás R, Viñas M. 2010. New Insights on the Antitumoral Properties of
Prodiginines. Curr Med Chem. 17:2222-2231.
Perryman AL, Forli S, Morris GM, Burt C, Cheng Y, Palmer MJ, Whitby K, McCammon
JA, Phillips C, Olson AJ. 2010. A dynamic model of HIV integrase inhibition and
drug resistance. J Mol Biol. 397(2):600-15.
Petros AM, Olejniczak ET, Fesik SW. 2004. Structural biology of the Bcl-2 family of
proteins. Biochim Biophys Acta. 1644(2-3):83-94.
Petucci C and Mallis L. 2005. Automated accurate mass data processing using a gas
chromatograph/time-of-flight mass spectrometer in drug discovery.
Rapid Commun Mass Spectrom. 19(11):1492-8.
Philippopoulos M, Lim C. 1999. Exploring the dynamic information content of a protein
NMR structure: comparison of a molecular dynamics simulation with the NMR
and X-ray structures of Escherichia coli ribonuclease HI. Proteins. 36(1):87-110.
Pinto M, Perez J, Rubio-Martinez J. 2004. Molecular dynamics study of peptide segments
of the BH3 domain of the proapoptotic proteins Bak, Bax, Bid and Hrk bound to
the Bcl-xL and Bcl-2 proteins. J Comput-Aided Mol Des. 18(1):13-22.
Pitarch J, Moliner V, Pascual-Ahuir JL, Silla E and Tuñón I. 1996. Can
Hydrophobic Interactions Be Correctly Reproduced by the Continuum
Models? J Phys Chem. 100:9955-9959.
Plumb RS, Johnson KA, Rainville P, Smith BW, Wilson ID, Castro-Perez JM, and
Nicholson JK. 2006. UPLC/MS(E); a new approach for generating molecular
fragment information for biomarker structure elucidation. Rapid Commun Mass
Spectrom. 20(13):1989-94.

Ponder JW and Richards FM. 1987. Tertiary templates for proteins. Use of packing
criteria in the enumeration of allowed sequences for different structural classes.
J Mol Biol. 193(4):775-91.
Potterat O, Wagner K, and Haag H. 2000. Liquid chromatography-electrospray time-of-
flight mass spectrometry for on-line accurate mass determination and
identification of cyclodepsipeptides in a crude extract of the fungus Metarrhizium
anisopliae. J Chromatogr A. 872(1-2):85-90.
Purisima EO and Nilar SH. 1995. A simple yet accurate boundary element method for
continuum dielectric calculations. J Comput Chem. 16(6):681-689.
Purisima EO. 1998. Fast Summation Boundary Element Method for Calculating
Solvation Free Energies of Macromolecules. J Comput Chem. 19:1494-1504.
Queiroz EF, Wolfender JL, Hostettmann K. 2009. Modern approaches in the search for
new lead antiparasitic compounds from higher plants. Curr Drug Targets.
10(3):202-11.
Raha K and Merz KM. 2005. Calculating Binding Free Energy in Protein-Ligand
Interaction. Ann Rep Comp Chem. 1:113-130.
Rarey M, Kramer B, Lengauer T, Klebe G. 1996. A fast flexible docking method using
an incremental construction algorithm. J Mol Biol. 261(3):470-89.
Read LK, Göringer HU, and Stuart K. 1994. Assembly of mitochondrial
ribonucleoprotein complexes involves specific guide RNA (gRNA)-binding
proteins and gRNA domains but does not require preedited mRNA. Mol Cell Biol.
14(4):2629-39.
Ryckaert JP, Ciccotti G and Berendsen HJC. 1977. Numerical Integration of the
Cartesian Equations of Motion of a System with Constraints: Molecular
Dynamics of n-Alkanes. J Comp Phys. 23(3):327–341.
Sabatini R and Hajduk SL. 1995. RNA ligase and its involvement in guide RNA/mRNA
chimera formation. Evidence for a cleavage-ligation mechanism of Trypanosoma
brucei mRNA editing. J Biol Chem. 270(13):7233-40.
Sabatini RS, Adler BK, Madison-Antenucci S, McManus MT, and Hajduk SL. 1998.
Biochemical methods for analysis of kinetoplastid RNA editing. Methods.
15(1):15-26.
Salavati R, Panigrahi AK, Morach BA, Palazzo SS, Igo RP, and Stuart K. 2002.
Endoribonuclease activities of Trypanosoma brucei mitochondria. Mol Biochem
Parasitol. 120(1):23-31.

Salavati R, Ernst NL, O'Rear J, Gilliam T, Tarun S Jr, and Stuart K. 2006. KREPA4, an
RNA binding protein essential for editosome integrity and survival of
Trypanosoma brucei. RNA. 12(5):819-31.
Sandvoss M, Pham LH, Levsen K, Preiss A, Mügge C, and Wünsch G. 2000.
Isolation and structural elucidation of steroid oligoglycosides from the starfish
Asterias rubens by means of direct online LC-NMR-MS hyphenation and one and
two dimensional NMR investigations. Eur J Org Chem. 7:1253-1262.
Sarker SD, Latif Z, and Gray AI. 2005. Methods in Biotechnology, natural products
isolation (2nd edition). 20:1-25.
Sarker SD and Nahar L. 2005. Methods in Biotechnology, natural products isolation (2nd
edition). 20:233-267.
Sattler M, Liang H, Nettesheim D, Meadows RP, Harlan JE, Eberstadt M, Yoon HS,
Shuker SB, Chang BS, Minn AJ, Thompson CB, Fesik SW. 1997. Structure of
Bcl-xL-Bak Peptide Complex: Recognition Between Regulators of Apoptosis.
Science. 275(5302):983-986.
Sbicego S, Alfonzo JD, Estévez AM, Rubio MA, Kang X, Turck CW, Peris M, Simpson
L. 2003. RBP38, a novel RNA-binding protein from trypanosomatid
mitochondria, modulates RNA stability. Eukaryot Cell. 2(3):560-8.
Scagliotti GV and Selvaggi G. 2006. Antimetabolites and cancer: emerging data with a
focus on antifolates. Expert Opin Ther Pat. 16(2):189-200.
Schames JR, Henchman RH, Siegel JS, Sotriffer CA, Ni H, McCammon JA. 2004.
Discovery of a novel binding trench in HIV integrase. J Med Chem. 47(8):1879-
81.
Schindler T, Bornmann W, Pellicena P, Miller WT, Clarkson B, Kuriyan J. 2000.
Structural mechanism for STI-571 inhibition of abelson tyrosine kinase. Science.
289(5486):1938-42.
Schmidt MW, Baldridge KK, Boatz JA, Elbert ST, Gordon MS, Jensen JH, Koseki S,
Matsunaga N, Nguyen KA, Su S, Windus TL, Dupuis M, Montgomery Jr JA.
1993. General Atomic and Molecular Electronic Structure System. J Comput
Chem. 14(11):1347-1353.
Schnaufer A, Panigrahi AK, Panicucci B, Igo RP Jr, Wirtz E, Salavati R, and Stuart K.
2001. An RNA ligase essential for RNA editing and survival of the bloodstream
form of Trypanosoma brucei. Science. 291(5511):2159-62.
Schnaufer A, Ernst NL, Palazzo SS, O'Rear J, Salavati R, and Stuart K. 2003.
Separate insertion and deletion subcomplexes of the Trypanosoma brucei RNA
editing complex. Mol Cell. 12(2):307-19.
Schneider A, Charrière F, Pusnik M, and Horn EK. 2007. Isolation of mitochondria from
procyclic Trypanosoma brucei. Methods Mol Biol. 372:67-80.
Schreiber H, Steinhauser O. 1992. Cutoff size does strongly influence molecular
dynamics results on solvated polypeptides. Biochemistry. 31(25):5856-60.
Scott RB. 1970. Cancer chemotherapy--the first twenty-five years. Br Med J.
4(5730):259-265.
Seidler J, McGovern SL, Doman TN, Shoichet BK. 2003. Identification and prediction of
promiscuous aggregating inhibitors among known drugs. J Med Chem.
46(21):4477-86.
Seifert MH. 2009. Targeted scoring functions for virtual screening. Drug Discov Today.
14(11-12):562-9.
Seiwert SD, Heidmann S, and Stuart K. 1996. Direct visualization of uridylate deletion in
vitro suggests a mechanism for kinetoplastid RNA editing. Cell. 84(6):831-41.
Shaneh A and Salavati R. 2009. Kinetoplastid RNA editing ligases 1 and 2 exhibit
different electrostatic properties. J Mol Model. 16(1):61-76.
Sherman W, Day T, Jacobson MP, Friesner RA, Farid R. 2006. Novel procedure for
modeling ligand/receptor induced fit effects. J Med Chem. 49(2):534-53.
Shibata S, Gillespie JR, Kelley AM, Napuli AJ, Zhang Z, Kovzun KV, Pefley RM, Lam
J, Zucker FH, Van Voorhis WC, Merritt EA, Hol WG, Verlinde CL, Fan E,
Buckner FS. 2011. Selective inhibitors of methionyl-tRNA synthetase have potent
activity against Trypanosoma brucei Infection in Mice. Antimicrob Agents
Chemother. 55(5):1982-9.
Shin YG and van Breemen RB. 2001. Analysis and screening of combinatorial libraries
using mass spectrometry. Biopharm Drug Dispos. 22(7-8):353-72.
Shore GC and Nguyen M. 2008. Bcl-2 Proteins and Apoptosis: Choose Your Partner.
Cell 135(6):1004-1006.
Shu YZ. 1998. Recent natural products based drug development: a pharmaceutical
industry perspective. J Nat Prod. 61(8):1053-71.
Shuker SB, Hajduk PJ, Meadows RP, Fesik SW. 1996. Discovering High-Affinity
Ligands for Proteins: SAR by NMR. Science. 274(5292):1531-1534.

Simonian PL, Grillot DAM, Nunez G. 1997. Bcl-2 and Bcl-XL Can Differentially Block
Chemotherapy-Induced Cell Death. Blood. 90(3):1208-1216.
Simpson L, Sbicego S, and Aphasizhev R. 2003. Uridine insertion/deletion RNA editing
in trypanosome mitochondria: a complex business. RNA. 9(3):265-76.
Sims PA, Wong CF, Vuga D, McCammon JA, Sefton BM. 2005. Relative contributions
of desolvation, inter- and intramolecular interactions to binding affinity in protein
kinase systems. J Comput Chem. 26(7):668-81.
Sippl MJ. 1990. Calculation of conformational ensembles from potentials of mean force.
An approach to the knowledge-based prediction of local structures in globular
proteins. J Mol Biol. 213(4):859-83.
Sippl MJ. 1993. Boltzmann's principle, knowledge-based mean fields and protein folding.
An approach to the computational determination of protein structures. J Comput
Aided Mol Des. 7(4):473-501.
Sippl MJ, Ortner M, Jaritz M, Lackner P, Flöckner H. 1996. Helmholtz free energies of
atom pair interactions in proteins. Fold Des. 1(4):289-98.
Sleebs BE, Czabotar PE, Fairbrother WJ, Fairlie WD, Flygare JA, Huang DC, Kersten
WJ, Koehler MF, Lessene G, Lowes K, Parisot JP, Smith BJ, Smith ML, Souers
AJ, Street IP, Yang H, Baell JB. 2011. Quinazoline sulfonamides as dual binders
of the proteins B-cell lymphoma 2 and B-cell lymphoma extra long with potent
proapoptotic cell-based activity. J Med Chem. 54(6):1914-26.
Sleno L, Volmer DA, and Marshall AG. 2005. Assigning product ions from complex
MS/MS spectra: the importance of mass uncertainty and resolving power. J Am
Soc Mass Spectrom. 16(2):183-98.
Song CM, Lim SJ, Tong JC. 2009. Recent advances in computer-aided drug design. Brief
Bioinform. 10(5):579-91.
Sousa SF, Fernandes PA and Ramos MJ. 2006. Protein-ligand docking: Current status
and future challenges. Proteins. 65:15-26.
Still WC, Tempczyk A, Hawley RC, Hendrickson T. 1990. Semianalytical treatment of
solvation for molecular mechanics and dynamics. J Am Chem Soc.
112(16):6127-6129.
Stuart KD, Schnaufer A, Ernst NL, Panigrahi AK. 2005. Complex management: RNA
editing in trypanosomes. Trends Biochem Sci. 30(2):97-105.
Stuart K, Brun R, Croft S, Fairlamb A, Gürtler RE, McKerrow J, Reed S, Tarleton R.
2008. Kinetoplastids: related protozoan pathogens, different diseases. J Clin
Invest. 118(4):1301-10.
Sugita Y and Okamoto Y. 1999. Replica-exchange molecular dynamics method for
protein folding. Chem Phys Letters. 314:141-151.
Swift RV, Durrant J, Amaro RE, McCammon JA. 2009. Toward understanding the
conformational dynamics of RNA ligation. Biochemistry. 48(4):709-19.
Tarun SZ Jr, Schnaufer A, Ernst NL, Proff R, Deng J, Hol W, Stuart K. 2008. KREPA6
is an RNA-binding protein essential for editosome integrity and survival of
Trypanosoma brucei. RNA. 14(2):347-58.
Teague SJ. 2003. Implications of protein flexibility for drug discovery. Nat Rev Drug
Discov. 2(7):527-41.
Tomasi J and Persico M. 1994. Molecular Interactions in Solution: An Overview of
Methods Based on Continuous Distributions of the Solvent. Chem Rev.
94(7):2027-2094.
Trotter JR, Ernst NL, Carnes J, Panicucci B, Stuart K. 2005. A deletion site editing
endonuclease in Trypanosoma brucei. Mol Cell. 20(3):403-12.
Trudel S, Li ZH, Rauw J, Tiedemann RE, Wen XY, Stewart AK. 2007. Preclinical
studies of the pan-Bcl inhibitor obatoclax (GX015-070) in multiple myeloma.
Blood. 109(12):5430-8.
Tse C, Shoemaker AR, Adickes J, Anderson MG, Chen J, Jin S, Johnson EF, Marsh KC,
Mitten MJ, Nimmer P, Roberts L, Tahir SK, Xiao Y, Yang X, Zhang H, Fesik S,
Rosenberg SH, Elmore SW. 2008. ABT-263: A Potent and Orally Bioavailable
Bcl-2 Family Inhibitor. Cancer Research. 68(9):3421-3428.
Tyler AN, Clayton E, and Green BN. 1996. Exact mass measurement of polar organic
molecules at low resolution using electrospray ionization and a quadrupole mass
spectrometer. Anal Chem. 68(20):3561-9.
Tzung SP, Kim KM, Basañez G, Giedt CD, Simon J, Zimmerberg J, Zhang KY,
Hockenbery DM. 2001. Antimycin A mimics a cell-death-inducing Bcl-2
homology domain 3. Nat Cell Biol. 3(2):183-91.
van Delft MF, Wei AH, Mason KD, Vandenberg CJ, Chen L, Czabotar PE, Willis SN,
Scott CL, Day CL, Cory S, Adams JM, Roberts AW, Huang DCS. 2006. The
BH3 mimetic ABT-737 targets selective Bcl-2 proteins and efficiently induces
apoptosis via Bak/Bax if Mcl-1 is neutralized. Cancer Cell. 10(5):389-399.
Van Der Spoel D, Lindahl E, Hess B, Groenhof G, Mark AE, Berendsen HJ. 2005.
GROMACS: fast, flexible, and free. J Comput Chem. 26(16):1701-18.

VanMiddlesworth F and Cannell RJP. 1998. Dereplication and partial identification of
natural products. Methods in Biotechnology, natural products isolation. 4:279-
327.
Verdonk ML, Cole JC, Hartshorn MJ, Murray CW and Taylor RD. 2003. Improved
protein-ligand docking using GOLD. Proteins. 52(4):609-23.
Verkhivker G, Appelt K, Freer ST, Villafranca JE. 1995. Empirical free energy
calculations of ligand-protein crystallographic complexes. I. Knowledge-based
ligand-protein interaction potentials applied to the prediction of human
immunodeficiency virus 1 protease binding affinity. Protein Eng. 8(7):677-91.
Verkhivker GM, Bouzida D, Gehlhaar DK, Rejto PA, Freer ST, Rose PW. 2002.
Complexity and simplicity of ligand-macromolecule interactions: the energy
landscape perspective. Curr Opin Struct Biol. 12(2):197-203.
Verlet L. 1967. Computer “Experiments” on Classical Fluids. 1. Thermodynamical
Properties of Lennard-Jones Molecules. Phys. Rev. 159:98-103.
Vivó-Truyols G and Schoenmakers PJ. 2006. Automatic selection of optimal Savitzky-
Golay smoothing. Anal Chem. 78(13):4598-608.
Wallach I and Lilien R. 2011. Virtual decoy sets for molecular docking benchmarks.
J Chem Inf Model. 51(2):196-202.
Walker RC, Crowley MF, Case DA. 2008. The implementation of a fast and accurate
QM/MM potential method in Amber. J Comput Chem. 29(7):1019-31.
Wang JL, Liu D, Zhang ZJ, Shan S, Han X, Srinivasula SM, Croce CM, Alnemri ES,
Huang Z. 2000. Structure-based discovery of an organic compound that binds
Bcl-2 protein and induces apoptosis of tumor cells. Proc Natl Acad Sci U S A.
97(13):7124-9.
Wang W, Lim WA, Jakalian A, Wang J, Wang J, Luo R, Bayly CI, Kollman PA. 2001.
An analysis of the interactions between the Sem-5 SH3 domain and its ligands
using molecular dynamics, free energy calculations, and sequence analysis.
J Am Chem Soc. 123(17):3986-94.
Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA. 2004. Development and testing
of a general amber force field. J Comput Chem. 25:1157-1174.
Warwicker J and Watson HC. 1982. Calculation of the electric potential in the active site
cleft due to α-helix dipoles. J Mol Biol. 157:671-679.

Wei BQ, Weaver LH, Ferrari AM, Matthews BW, Shoichet BK. 2004. Testing a flexible-
receptor docking algorithm in a model binding site. J Mol Biol. 337(5):1161-82.
Wei J, Kitada S, Rega MF, Stebbins JL, Zhai D, Cellitti J, Yuan H, Emdadi A, Dahl R,
Zhang Z, Yang L, Reed JC, Pellecchia M. 2009a. Apogossypol derivatives as
pan-active inhibitors of antiapoptotic B-cell lymphoma/leukemia-2 (Bcl-2) family
proteins. J Med Chem. 52(14):4511-23.
Wei J, Kitada S, Rega MF, Emdadi A, Yuan H, Cellitti J, Stebbins JL, Zhai D, Sun J,
Yang L, Dahl R, Zhang Z, Wu B, Wang S, Reed TA, Wang HG, Lawrence N,
Sebti S, Reed JC, Pellecchia M. 2009b. Apogossypol derivatives as antagonists of
antiapoptotic Bcl-2 family proteins. Mol Cancer Ther. 8(4):904-13.
Weisberg E, Manley PW, Cowan-Jacob SW, Hochhaus A and Griffin JD. 2007. Second
generation inhibitors of BCR-ABL for the treatment of imatinib-resistant chronic
myeloid leukaemia. Nat Rev Cancer. 7(5):345-5
Welch W, Ruppert J, Jain AN. 1996. Hammerhead: fast, fully automated docking of
flexible ligands to protein binding sites. Chem Biol. 3:449-62.
Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM Jr, Ferguson DM,
Spellmeyer DC, Fox T, Caldwell JW, and Kollman PA. 1995. A Second
Generation Force Field for the Simulation of Proteins, Nucleic Acids, and
Organic Molecules. J Am Chem Soc. 117:5179-5197.
Westheimer FH and Mayer JE. 1946. The Theory of the Racemization of Optically
Active Derivatives of Diphenyl. J Chem Phys. 14:733.
Wilson ID and Brinkman UA. 2003. Hyphenation and hypernation: the practice and
prospects of multiple hyphenation. J Chromatogr A. 1000(1-2):325-56.
Wolfender JL, Terreaux C, and Hostettmann K. 2000. The importance of LC-MS and
LC-NMR in the discovery of new lead compounds from plants. Pharm. Biol.
38:41-54.
Wolfender JL, Ndjoko K, Hostettmann K. 2003. Liquid chromatography with ultraviolet
absorbance-mass spectrometric detection and with nuclear magnetic resonance
spectroscopy: a powerful combination for the on-line structural investigation of
plant metabolites. J Chromatogr A. 1000(1-2):437-55.
Wolff JC, Eckers C, Sage AB, Giles K, and Bateman R. 2001. Accurate mass liquid
chromatography/mass spectrometry on quadrupole orthogonal acceleration time-
of-flight mass analyzers using switching between separate sample and reference
sprays. 2. Applications using the dual-electrospray ion source. Anal Chem.
73(11):2605-12.

Wolff JC, Fuentes TR, and Taylor J. 2003. Investigations into the accuracy and precision
obtainable on accurate mass measurements on a quadrupole orthogonal
acceleration time-of-flight mass spectrometer using liquid chromatography as
sample introduction. Rapid Commun Mass Spectrom. 17(11):1216-9.
Wong CF and McCammon JA. 2003. Protein simulation and drug design. Adv Protein
Chem. 66:87-121.
Wong CF, Kua J, Zhang Y, Straatsma TP, McCammon JA. 2005. Molecular docking of
balanol to dynamics snapshots of protein kinase A. Proteins. 61(4):850-8.
Worthey EA, Schnaufer A, Mian IS, Stuart K, and Salavati R. 2003. Comparative
analysis of editosome proteins in trypanosomatids. Nucleic Acids Res.
31(22):6392-408.
Wu J and McAllister H. 2003. Exact mass measurement on an electrospray ionization
time-of-flight mass spectrometer: error distribution and selective averaging. J
Mass Spectrom. 38(10):1043-53.
York DM, Yang W, Lee H, Darden T and Pedersen LG. 1995. Toward the Accurate
Modeling of DNA: The Importance of Long-Range Electrostatics. J Am Chem
Soc. 117:5001–5002.
Zauhar RJ and Morgan RS. 1985. A new method for computing the macromolecular
electric potential. J Mol Biol. 186(4):815-820.
Zauhar RJ and Varnek A. 1996. A fast and space-efficient boundary element method for
computing electrostatic and hydration effects in large molecules. J Comp Chem.
17(7):864-877.
Zhai D, Jin C, Satterthwait AC, Reed JC. 2006. Comparison of chemical inhibitors of
antiapoptotic Bcl-2-family proteins. Cell Death Differ. 13(8):1419-1421.
Zhang LK, Rempel D, Pramanik BN and Gross ML. 2005. Accurate mass measurements
by Fourier transform mass spectrometry. Mass Spectrom Rev. 24(2):286-309.
Zhang S. 2011. Computer-Aided Drug Discovery and Development. In:
Satyanarayanajois, SD. Drug Design and Discovery: Methods and Protocols.
Springer Protocols. p24.
Zhou J and Giannakakou P. 2005. Targeting microtubules for cancer chemotherapy. Curr
Med Chem Anticancer Agents. 5(1):65-71.
Zimmermann J, Furet P and Buchdunger E. 2001. STI571: A New Treatment Modality
for CML? ACS Symp Series. 796:245-259.

Original Contributions to Knowledge
1. Provided a new understanding of the mechanism of selectivity of novel Bcl-2
inhibitors
a. Demonstrated that the selectivity of ABT-737 is not due to the
penetration angle of the chlorobiphenyl ring, as was previously described
b. Demonstrated that variations at p4 and at the α2/α2 are crucially
responsible for the selectivity of ABT-737 to Bcl-2/Bcl-xL/Bcl-W
c. Provided evidence that Obatoclax binds in a BH3-mimetic fashion,
with the methoxy group mimicking the BH3 leucine residue that
binds at p2
d. Provided evidence that pan-Bcl-2 inhibitors will bind preferentially at
p1 and p2, while Bcl-2/Bcl-xL selective inhibitors will utilize p4
2. Identified a novel TbREL1 inhibitor and provided a better understanding of
its effect on RNA editing functionality
a. Identified C35 as a potent, novel TbREL1 inhibitor
b. Provided evidence for the effect of C35 on inhibition of deadenylation
of TbREL1
c. Provided evidence for the effect of C35 on editosome activities
d. Provided evidence for the effect of C35 on editosome integrity

3. Developed a novel algorithm for the determination of the molecular formula
from MS/MS data
a. Provided detailed evidence for the reliability of the algorithm on a
96-sample test set, with a 95% success rate
b. Provided the means to rapidly identify the molecular formula of
organic compounds and small peptides
c. Demonstrated effectiveness across organic compounds containing
Carbon, Hydrogen, Nitrogen and Oxygen and extended it to
compounds with a Bromine or Chlorine atom
4. Incorporated the use of conformational ensembles generated by Molecular
Dynamics for virtual screening
a. Carried out the first virtual screening study to examine the effects of
molecular dynamics ensembles on virtual screening enrichments
across numerous targets
b. Demonstrated impressive enrichment values for COX2 and AR
compared to previously documented enrichments obtained for our
pipeline
c. Provided evidence that structures within MD-generated
conformational ensembles can generate enrichment values beyond that
of the crystal structure or of the ensemble as a whole across the
majority of targets
