Prediction of disorder in protein structure (amit singh)

PREDICTION OF DISORDER IN
PROTEIN STRUCTURE
Amit Singh
Bioinformatician
Central University of Punjab
http://www.ceitec.eu/ceitec-mu/protein-structure-and-dynamics/rg1

Contents
 What are intrinsically disordered proteins?
 Why predict disorder?
 Bioinformatics approaches to disorder prediction
 Screenshots of prediction software

Intrinsically Disordered/Unstructured Proteins (IDP/IUP)
 They are characterized by the lack of stable secondary and tertiary structure under physiological conditions and in the absence of a binding partner.
 They are either completely disordered or contain large disordered regions in their native state.
 An IUP uses 50% of its total surface for interaction with a partner, compared with only 5-10% for most ordered proteins.
 In 70% of cases an IUP binds through a single continuous sequence segment, whereas ordered proteins (IOP) bind through a number of separate fragments.
Intrinsically unstructured proteins and their functions
H. Jane Dyson & Peter E. Wright
Nature Reviews Molecular Cell Biology 6, 197-208 (March 2005)
doi:10.1038/nrm1589

Why predict disorder?
 These proteins are difficult to study experimentally because they lack a unique structure in the isolated form.
 In X-ray crystallography, crystal packing may force certain disordered regions to become ordered, and disordered binding segments are often crystallized in complex with their partner, so they are classified as ordered despite their lack of structure in isolation.
 With NMR, disorder is often inferred from poor signal dispersion, which does not distinguish random coils from molten globules that have a high potential to fold in the presence of a partner.

Bioinformatics approach in prediction of disorder.
Pairwise energy content of aa residues.
Frequencies of aa residues and hydrophobic cluster.
Mean packing densities of aa residues.

Pairwise energy content of aa residues.
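 The pairwise energy of a protein is a function of its amino acid sequence:
$$E = \sum_{i=1}^{20} \sum_{j=1}^{20} M_{ij} C_{ij}$$
where M_ij is the interaction energy between amino acid types i and j, and C_ij is the number of interactions between residues of types i and j.
 The energy per amino acid is approximated by:
$$\frac{E}{L} \approx \frac{1}{L^2} \sum_{i=1}^{20} \sum_{j=1}^{20} N_i P_{ij} N_j$$
or, equivalently,
$$\frac{E}{L} \approx \sum_{i=1}^{20} \sum_{j=1}^{20} n_i P_{ij} n_j$$
where N_i denotes the number of amino acid residues of type i in the sequence, L is the sequence length, n_i = N_i/L is the frequency of type i, and P is the energy predictor matrix for types i and j.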
Dosztanyi Z, Csizmok V, Tompa P, Simon I. IUPred: Web server for the prediction of intrinsically
unstructured regions of proteins based on estimated energy content. Bioinformatics 2005;21:3433–3434.
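
The composition-based estimate above can be sketched in a few lines of Python. This is only an illustration: the names `composition` and `energy_per_residue` are hypothetical, and `toy_P` is a placeholder matrix invented for the example, not the published IUPred energy predictor matrix.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the frequency n_i = N_i / L of each of the 20 amino acid types."""
    counts = Counter(seq)
    length = len(seq)
    return {aa: counts.get(aa, 0) / length for aa in AMINO_ACIDS}

def energy_per_residue(seq, P):
    """Estimate the energy per residue as sum over i, j of n_i * P[i][j] * n_j."""
    n = composition(seq)
    return sum(n[a] * P[a][b] * n[b] for a in AMINO_ACIDS for b in AMINO_ACIDS)

# Placeholder matrix (assumption): -1 for identical types, 0 otherwise.
# The real predictor matrix P is fitted to known structures.
toy_P = {a: {b: (-1.0 if a == b else 0.0) for b in AMINO_ACIDS}
         for a in AMINO_ACIDS}

print(energy_per_residue("AACC", toy_P))
```

With the real matrix, a less favourable (higher) estimated energy would point toward disorder; the toy matrix only demonstrates the arithmetic.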

 The total energy of the kth protein is decomposed into amino-acid-specific contributions:
$$E^k = \sum_{i=1}^{20} E_i^k$$
where E_i^k is the energy of all amino acid residues of type i.
 This contribution depends on the number of contacts that residues of type i make with the other amino acid residues of type j in the sequence.
 Setting ∂Z/∂P_ij = 0 for all P_ij leads to a system of linear equations, which is solved for each amino acid type with the GNU Scientific Library (GSL).
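
A minimal sketch of how such a fit could work is shown here. The slides do not define Z, so this example assumes a least-squares objective for one amino acid type i, Z_i = Σ_k (e_i^k − Σ_j P_ij n_j^k)², where e_i^k is the known per-residue energy in protein k and n_j^k the frequency of type j. Setting the derivatives to zero then gives the normal equations. The original work used GSL; the hand-written Gaussian elimination and the names `solve_linear` and `fit_row` are stand-ins for illustration only.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is square and non-singular (no check is made here).
    """
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_row(compositions, energies):
    """Fit one row P_i of the predictor matrix by least squares.

    compositions: one frequency vector n^k per protein k.
    energies: the known energy e_i^k for type i in each protein k.
    """
    m = len(compositions[0])
    # Normal equations: sum_k n_a^k n_b^k P_ib = sum_k e_i^k n_a^k
    A = [[sum(n[a] * n[b] for n in compositions) for b in range(m)]
         for a in range(m)]
    rhs = [sum(e * n[a] for n, e in zip(compositions, energies))
           for a in range(m)]
    return solve_linear(A, rhs)

# Toy data over a two-letter alphabet, generated from P_i = [-2, 1].
row = fit_row([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [-2.0, 1.0, -0.5])
print(row)
```

The toy data are noise-free, so the fit recovers the row that generated them; real training data would give only an approximate fit.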

Dosztanyi Z, Csizmok V, Tompa P, Simon I. IUPred: Web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 2005;21:3433–3434.

Software based on pairwise energy estimation: IUPred
 Predicts regions that lack a well-defined 3D structure under native conditions.
 The energy and amino acid composition at each position are calculated by considering only interaction partners 2 to 100 residues apart.
 This range is intended to cover most structured domains while separating distinct domains in multi-domain proteins.
 This procedure yields an estimated energy at position p of type i:
 where Pp is the position specific energy predictor matrix.
 The software is written in C; the web interface is written in PHP.
 Available at http://iupred.enzim.hu/.
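
The 2-to-100 residue neighbourhood can be illustrated with a short sketch. It assumes, by analogy with the composition-based estimate, that the position energy has the form Σ_j P[i][j] · n_j^p, where i is the residue type at position p and n_j^p is the frequency of type j in the neighbourhood. The function names and the tiny matrix `toy_P` are hypothetical; this is not the IUPred source code.

```python
from collections import Counter

def neighbourhood_composition(seq, p, min_sep=2, max_sep=100):
    """Frequencies of residue types lying min_sep..max_sep positions from index p.

    p is a 0-based index; residues on both sides of p are counted.
    """
    neighbours = [seq[q] for q in range(len(seq))
                  if min_sep <= abs(q - p) <= max_sep]
    if not neighbours:
        return {}
    counts = Counter(neighbours)
    return {aa: c / len(neighbours) for aa, c in counts.items()}

def position_energy(seq, p, P):
    """Assumed form: e_p = sum_j P[type at p][j] * n_j^p."""
    comp = neighbourhood_composition(seq, p)
    row = P.get(seq[p], {})
    return sum(row.get(aa, 0.0) * freq for aa, freq in comp.items())

# Hypothetical one-entry matrix, used only to show the arithmetic.
toy_P = {"A": {"D": -1.0}}
print(neighbourhood_composition("ACDEFG", 0))
print(position_energy("ACDEFG", 0, toy_P))
```

Excluding the immediate neighbours (separation below 2) keeps trivially adjacent residues out of the estimate, while the upper limit of 100 keeps distant domains from influencing each other.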

Frequencies of aa residues and hydrophobic cluster.
Based on two properties:
• Disordered regions have a biased composition.
• They usually contain either small or no hydrophobic clusters.
System and Methods:
• Constitution of reference set
• Ratio and probabilities of aa occurrence
• Cluster distance
Constitution of reference set
 A subset named U10 is extracted from the set L, containing the last ten residues of N-terminal fragments and the first ten residues of C-terminal fragments.
 Amino acid frequencies in structured and linker regions were computed using the two sets S and U10.

Ratio and probabilities of AA occurrence
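 The probabilities of occurrence P_L and P_S of a given sequence in linker and structured regions, respectively, are calculated using a multinomial law:
$$P_L(N_V = n_V, N_I = n_I, \ldots, N_G = n_G) = \frac{n!}{n_V! \, n_I! \cdots n_G!} \, (p_V^L)^{n_V} (p_I^L)^{n_I} \cdots (p_G^L)^{n_G}$$
and likewise for P_S with the structured-region frequencies p^S. Here n = n_V + n_I + … + n_G is the sequence length, (N_V, …, N_G) are the variables taking as values the numbers (n_V, n_I, …, n_G) of valines, isoleucines, …, glycines in the sequence, and (p_V^L)^{n_V} and (p_V^S)^{n_V} are the probabilities of occurrence of n_V valines in a linker sequence and in a structured sequence, respectively.
 To decide whether a sequence is more likely to be structured or unstructured, the ratio of these two probabilities is taken: R = P_L/P_S.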
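
The ratio can be computed without evaluating either probability in full, because the multinomial coefficient is the same in P_L and P_S and cancels. The sketch below works with log R for numerical stability. The function name `log_ratio` and the two-letter frequency tables are invented for the example; real tables would come from the reference sets S and U10.

```python
import math
from collections import Counter

def log_ratio(seq, freq_linker, freq_structured):
    """Return log R = log(P_L / P_S) for a sequence.

    The multinomial coefficient cancels, leaving
    sum over types a of n_a * (log p_a^L - log p_a^S).
    """
    counts = Counter(seq)
    return sum(n * (math.log(freq_linker[aa]) - math.log(freq_structured[aa]))
               for aa, n in counts.items())

# Toy frequencies over a two-letter alphabet (assumption, not measured data).
toy_linker = {"S": 0.8, "V": 0.2}
toy_structured = {"S": 0.5, "V": 0.5}

print(log_ratio("SSSV", toy_linker, toy_structured) > 0)  # favours linker
print(log_ratio("VVVS", toy_linker, toy_structured) < 0)  # favours structured
```

A positive log R (R > 1) means the composition is more typical of linker regions than of structured ones.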

Amino acid frequencies in the PDB and in the hydrophilic set U10.
Karen Coeytaux and Anne Poupon, Bioinformatics 2005;21:1891-1900

Cluster Distance
 Sequences were coded into a ternary code: 1 for hydrophobic residues (VILFMYW), 2 for proline and 0 for all other amino acids.
 For the amino acid in position i, the cluster distance is defined as the distance to the closest cluster; it is set to 0.5 when i is inside a cluster.
 For example, the sequence AGEKTISVVLQLEKEEQ corresponds to the code 00000101110100000.
 The run 1011101 is identified as a hydrophobic cluster, corresponding to the sequence ISVVLQL.
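
The coding step and the cluster distance can be sketched as follows. The encoding follows the rule stated above. The rules for detecting clusters are not given in these slides, so the cluster spans are passed in by the caller, and measuring the distance to the nearest cluster edge is an assumption of this example. The names `encode` and `cluster_distance` are hypothetical.

```python
HYDROPHOBIC = set("VILFMYW")

def encode(seq):
    """Ternary code: 1 for hydrophobic residues, 2 for proline, 0 otherwise."""
    return "".join("1" if aa in HYDROPHOBIC else "2" if aa == "P" else "0"
                   for aa in seq)

def cluster_distance(i, clusters):
    """Distance from position i to the closest cluster.

    clusters: list of (start, end) spans, 0-based and inclusive.
    Returns 0.5 inside a cluster, otherwise the distance to the nearest
    cluster edge (assumption), or None if there are no clusters.
    """
    best = None
    for start, end in clusters:
        if start <= i <= end:
            return 0.5
        d = start - i if i < start else i - end
        best = d if best is None else min(best, d)
    return best

seq = "AGEKTISVVLQLEKEEQ"
print(encode(seq))                      # 00000101110100000
print(cluster_distance(2, [(5, 11)]))   # ISVVLQL spans positions 5..11
```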

Rules for prediction of unfolded regions based on the probabilities ratio and the cluster
distance.
Karen Coeytaux and Anne Poupon, Bioinformatics 2005;21:1891-1900

Software based on frequencies of aa residues and hydrophobic clusters: PreLink
 Available at http://genomics.eu.org.

Mean packing densities of aa residues
Based on two properties:
• Low overall hydrophobicity and a large net charge are structural features of unfolded proteins.
• The expected average number of contacts per residue distinguishes folded from unfolded proteins.

System and Methods
 Construction of a protein database.
A database of 90 natively unfolded proteins (http://phys.protres.ru/resources/unfolded_90.html) was based on a published list of proteins. A database of 559 globular proteins (http://phys.protres.ru/resources/folded_559.html) was constructed using the PDB codes.
 Average number of close residues (heavy atoms less than 8.0 Å apart) in the globular state.
The expected average number of close residues was obtained as the total expected number of
close residues (according to Table 1) divided by the total number of amino acid residues in the protein.

 Hydrophobicity.
We used a published hydrophobicity scale. Average hydrophobicity was computed as the total hydrophobicity of all
amino acid residues divided by the total number of residues in the protein.
 Charge.
To compute the net charge of a protein, a charge of +1 was assumed for Lys and Arg, -1 for Glu and Asp, and 0 for the other residues. The average charge per residue was obtained as the net charge divided by the total number of amino acid residues in the protein.
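
The three sequence-derived averages can be sketched with two small functions. The charge rule is the one stated above. The slides do not name the hydrophobicity scale, so the Kyte-Doolittle scale is used here purely as a stand-in, and the function names are hypothetical. The same averaging function would serve for the expected number of close residues if the per-residue values of Table 1 (not reproduced here) were supplied as the scale.

```python
# Kyte-Doolittle hydropathy values, used as a stand-in scale (assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def average_charge(seq):
    """Net charge per residue: +1 for Lys/Arg, -1 for Glu/Asp, 0 otherwise."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return (positive - negative) / len(seq)

def average_over_scale(seq, scale):
    """Sum of per-residue scale values divided by the number of residues."""
    return sum(scale[aa] for aa in seq) / len(seq)

seq = "MKRDEAIL"
print(average_charge(seq))
print(average_over_scale(seq, KYTE_DOOLITTLE))
```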

To be folded or to be unfolded?
Protein Science
Volume 13, Issue 11, pages 2871-2877, 29 DEC 2008 DOI: 10.1110/ps.04881304
http://onlinelibrary.wiley.com/doi/10.1110/ps.04881304/full#fig1
Figure 2.1 Comparison of the mean values of different parameters computed from sequence alone
for the set of 90 “natively unfolded” proteins (black circles) and for the set of 80 “ideally” folded
proteins (gray circles).

Software based on mean packing densities of aa residues: FoldUnfold
 Available at http://skuld.protres.ru/~mlobanov/ogu/ogu.cgi.

Critical Assessment of Techniques for Protein Structure Prediction (CASP)
 The performance of various disorder prediction methods was critically assessed in the CASP experiments.
 Evaluation of the performance of the various predictors first started during CASP5, on December 1-5, 2002.
 The broad goals of the CASP5 experiment were to address the following questions about the current state of the art in protein structure prediction:
 Are the models produced similar to the corresponding experimental structure?
 Is the mapping of the target sequence onto the proposed structure (i.e. the alignment)
correct?
 Have similar structures that a model can be based on been identified?
 Are the details of the models correct?
 Has there been progress from the earlier CASPs?
 What methods are most effective?
 Where can future effort be most productively focused?

References
Coeytaux K., Poupon A. Prediction of unfolded segments in a protein sequence based on amino acid composition.
Bioinformatics 2005;21:1891-1900.
Dosztanyi Z., et al. The pairwise energy content estimated from amino acid composition discriminates between folded and
intrinsically unstructured proteins. J. Mol. Biol. 2005;347:827-839.
Galzitskaya O.V., et al. Optimal region of average side-chain entropy for fast protein folding. Protein Sci. 2000;9:580-586.
Galzitskaya O.V., et al. Prediction of natively unfolded regions in protein chains. Mol. Biol. (Moscow) 2006;40:341-348.
Garbuzynskiy S.O., et al. To be folded or to be unfolded? Protein Sci. 2004;13:2871-2877.
Linding R., et al. Protein disorder prediction: implications for structural proteomics. Structure 2003;11:1453-1459.
Obradovic Z., et al. Predicting intrinsic disorder from amino acid sequence. Proteins 2003;53:566-572.
Obradovic Z., et al. Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins
2005;61:176-182.
Radivojac P., et al. Protein flexibility and intrinsic disorder. Protein Sci. 2004;13:71-80.
Romero P., et al. Thousands of proteins likely to have long disordered regions. Pac. Symp. Biocomput. 1998:437-448.
