FBW 1-12-2011 Wim Van Criekinge
Course contents: Bioinformatics
• Thu 29-09-2011: 1* Bioinformatics (practical 8.30-11.00)
• Thu 06-10-2011: 2* Biological Databases (practical 9.00-11.30)
• Thu 20-10-2011: 3 Sequence Similarity (Scoring Matrices)
• Thu 27-10-2011: 4 Sequence Alignments
• Thu 10-11-2011: 5 Database Searching Fasta/Blast
• Thu 17-11-2011: cancelled
• Thu 24-11-2011: 6 Phylogenetics
• Thu 01-12-2011: 7 Protein Structure
• Thu 08-12-2011: 8 Gene Prediction, Gene Ontologies & HMM
• Thu 15-12-2011: 9-10 Bio- & Cheminformatics in Drug Discovery (catch-up week)
Note: no class on Thu 13-10-2011 and Thu 03-11-2011
Biobix: Applied Bioinformatics Research
Thesis topics
Ongoing research
• Biomarker prediction / Methylation
• Metabonomics
• Peptidomics
• Translational biotechnology (text mining)
• Structural Genomics
• miRNA prediction / Target Prediction
• Exploring genomic dark matter (junk mining)
Collaboration with various institutes
Ambition to publish in peer-reviewed journals
The reason for “bioinformatics” to exist?
Empirical finding: if two biological sequences are sufficiently similar, they almost invariably have similar biological functions and are descended from a common ancestor. This implies that:
(i) function is encoded into sequence: the sequence provides the syntax; and
(ii) there is redundancy in the encoding: many positions in the sequence may be changed without perceptible changes in function, so the semantics of the encoding is robust.
Protein Structure
• Introduction
• Why?
• How do proteins fold?
• Levels of protein structure: 0, 1, 2, 3, 4
• X-ray / NMR
• The Protein Data Bank (PDB)
• Protein Modeling
• Bioinformatics & Proteomics
• Weblems
Why protein structure?
• Proteins perform a variety of cellular tasks in living cells
• Each protein adopts a particular fold that determines its function
• The 3D structure of a protein can bring into close proximity residues that are far apart in the amino acid sequence
• Catalytic site: the business end of the molecule
Rationale for understanding protein structure and function
Protein sequence
• large numbers of sequences, including whole genomes
Protein structure
• three dimensional
• complicated
• mediates function
Protein function
• rational drug design and treatment of disease
• protein and genetic engineering
• build networks to model cellular pathways
• study organismal function and evolution
Diagram labels linking sequence, structure and function: structure determination, structure prediction, homology, rational mutagenesis, biochemical analysis, model studies
About the use of protein models (Peitsch)
• Structure is preserved under evolution when sequence is not
• Interpreting the impact of mutations/SNPs and conserved residues on protein function; potential link to disease
• Function?
  • Biochemical: the chemical interactions occurring in a protein
  • Biological: role within the cell
  • Phenotypic: the role in the organism
  • Gene Ontology functional classification!
• Prioritisation of residues to mutate to determine protein function
• Providing hints for protein function: catalytic mechanisms of enzymes often require key residues to be close together in 3D space (protein-ligand complexes, rational drug design, putative interaction interfaces)
MISSENSE MUTATION
e.g. Sickle Cell Anaemia
Cause: defective haemoglobin due to a mutation in the β-globin gene
Symptoms: severe anaemia and death in homozygotes
Normal β-globin (146 amino acids)
Position   1     2     3     4     5     6     7
           val - his - leu - thr - pro - glu - glu - ...

           Normal gene (aa 6)   Mutant gene
DNA        CTC                  CAC
mRNA       GAG                  GUG
Product    Glu                  Val

Mutant β-globin
           val - his - leu - thr - pro - val - glu - ...
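The single-base change on this slide can be traced in a few lines. The sketch below is only an illustration: it pairs each DNA template base with its mRNA complement position by position, exactly as the slide lays the triplets out (strand orientation is ignored), and its codon table contains only the two codons needed here. The names `transcribe_template` and `CODON_TO_AA` are made up for this example.

```python
# Partial codon table: only the two codons used on the slide (not the full code).
CODON_TO_AA = {"GAG": "Glu", "GUG": "Val"}

# DNA template base -> complementary mRNA base.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def transcribe_template(dna_triplet: str) -> str:
    """Pair each template base with its mRNA complement, position by position."""
    return "".join(COMPLEMENT[base] for base in dna_triplet.upper())


for label, triplet in (("normal", "CTC"), ("mutant", "CAC")):
    codon = transcribe_template(triplet)
    print(label, triplet, "->", codon, "->", CODON_TO_AA[codon])
```

The normal triplet yields the Glu codon and the mutant triplet the Val codon, matching the position-6 substitution shown in the two β-globin sequences.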
Protein Conformation
• Christian Anfinsen: studies on reversible denaturation, “Sequence specifies conformation”
• Chaperones and disulfide interchange enzymes: involved but not controlling the final state; they provide an environment to refold if misfolded
• Structure implies function: the amino acid sequence encodes the protein’s structural information
How does a protein fold? By itself:
Anfinsen developed what he called his "thermodynamic hypothesis" of protein folding to explain the native conformation of polypeptide chains. He theorized that the native conformation occurs because this particular shape is thermodynamically the most stable in the intracellular environment. That is, the chain takes this shape as a result of the constraints of the peptide bonds, as modified by the other chemical and physical properties of the amino acids.
To test this hypothesis, Anfinsen unfolded the RNase enzyme under extreme chemical conditions and observed that the enzyme's polypeptide chain refolded spontaneously into its original form when he returned the chemical environment to natural cellular conditions.
"The native conformation is determined by the totality of interatomic interactions and hence by the amino acid sequence, in a given environment."
The Basics
• Proteins are linear heteropolymers: one or more polypeptide chains
• Below about 40 residues the term peptide is frequently used
• A certain number of residues is necessary to perform a particular biochemical function, and around 40-50 residues appears to be the lower limit for a functional domain size
• Protein sizes range from this lower limit to several hundred residues in multi-functional proteins
• Three-dimensional shapes (folds) adopted vary enormously
• Experimental methods: X-ray crystallography, NMR (nuclear magnetic resonance), electron microscopy
• Ab initio calculations …
Levels of protein structure
Zeroth: amino acid composition (proteomics, %cysteine, %glycine)
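As a small illustration of this zeroth level, composition can be computed directly from a one-letter sequence. This is a minimal sketch; the function name `composition_percent` and the example sequence are invented for the illustration and are not taken from the lecture.

```python
from collections import Counter


def composition_percent(sequence: str) -> dict[str, float]:
    """Return the percentage of each residue type in a one-letter sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Hypothetical toy peptide, used only to show the call.
comp = composition_percent("GCGCAAAA")
print("%Cys:", comp.get("C", 0.0), "%Gly:", comp.get("G", 0.0))
```

Using `dict.get` with a default of 0.0 avoids a `KeyError` for residue types that do not occur in the sequence.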
Amino Acid Residues
The basic structure of an α-amino acid is quite simple. R denotes any one of the 20 possible side chains (see table below). The Cα atom has four different substituents (the H is omitted in the drawing) and is thus chiral. An easy trick to remember the correct L-form is the CORN rule: when the Cα atom is viewed with the H in front, the substituents read "CO-R-N" in a clockwise direction.
 
Levels of protein structure
Primary: the order of covalent linkages along the polypeptide chain, i.e. the sequence itself
Backbone Torsion Angles
Levels of protein structure
Secondary: local organization of the protein backbone: alpha-helix, beta-strand (strands assemble into beta-sheets), turn and interconnecting loop
Ramachandran / Phi-Psi Plot
The alpha-helix
A Practical Approach: Interpretation
• Residues with hydrophobic properties conserved at i, i+2, i+4, separated by unconserved or hydrophilic residues, suggest a surface beta-strand.
• A short run of hydrophobic amino acids (4 residues) suggests a buried beta-strand.
• Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggest an alpha-helix with one face packing in the protein core. The same holds for an i, i+3, i+4, i+7 pattern of conserved hydrophobic residues.
Beta-sheets
Topologies of Beta-sheets
Secondary structure prediction ?
Secondary structure prediction: CHOU-FASMAN
• Chou, P.Y. and Fasman, G.D. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 13, 211-221.
• Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13, 222-245.
Secondary structure prediction: CHOU-FASMAN
Method
• Assign a set of prediction values to each residue, based on statistical analysis of 15 proteins
• Apply a simple algorithm to those numbers
Secondary structure prediction: CHOU-FASMAN
Calculation of preference parameters
For each of the 20 residues and each secondary structure (α-helix, β-sheet and β-turn):

$$P = \log\left(\frac{\text{observed counts}}{\text{expected counts}}\right) + 1.0$$

• Preference parameter > 1.0 → the residue has a preference for that secondary structure.
• Preference parameter = 1.0 → the residue neither prefers nor disfavours that secondary structure.
• Preference parameter < 1.0 → the residue disfavours that secondary structure.
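The formula on this slide can be evaluated as follows. This is a sketch under stated assumptions: the slide does not give the base of the logarithm, so base 10 is used here as an adjustable default, and the way the expected count is derived (number of residues of that type multiplied by the overall fraction of residues in that structure) is an assumption made for the illustration. The function names and the counts in the example are hypothetical.

```python
import math


def expected_count(residue_total: int, structure_fraction: float) -> float:
    """Assumed null model: the residue is spread over structures like all residues."""
    return residue_total * structure_fraction


def preference_parameter(observed: float, expected: float, base: float = 10.0) -> float:
    """P = log(observed / expected) + 1.0, as written on the slide (base assumed)."""
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return math.log(observed / expected, base) + 1.0


# Hypothetical counts: 200 residues of one type, 25% of all residues are helical.
expected = expected_count(200, 0.25)
for observed in (20, 50, 80):
    print(observed, round(preference_parameter(observed, expected), 3))
```

With this form an observed count equal to the expected count gives exactly 1.0, which matches the interpretation of the three cases listed on the slide.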
Secondary structure prediction: CHOU-FASMAN
Preference parameters

Residue  P(a)   P(b)   P(t)   f(i)    f(i+1)  f(i+2)  f(i+3)
Ala      1.45   0.97   0.57   0.049   0.049   0.034   0.029
Arg      0.79   0.90   1.00   0.051   0.127   0.025   0.101
Asn      0.73   0.65   1.68   0.101   0.086   0.216   0.065
Asp      0.98   0.80   1.26   0.137   0.088   0.069   0.059
Cys      0.77   1.30   1.17   0.089   0.022   0.111   0.089
Gln      1.17   1.23   0.56   0.050   0.089   0.030   0.089
Glu      1.53   0.26   0.44   0.011   0.032   0.053   0.021
Gly      0.53   0.81   1.68   0.104   0.090   0.158   0.113
His      1.24   0.71   0.69   0.083   0.050   0.033   0.033
Ile      1.00   1.60   0.58   0.068   0.034   0.017   0.051
Leu      1.34   1.22   0.53   0.038   0.019   0.032   0.051
Lys      1.07   0.74   1.01   0.060   0.080   0.067   0.073
Met      1.20   1.67   0.67   0.070   0.070   0.036   0.070
Phe      1.12   1.28   0.71   0.031   0.047   0.063   0.063
Pro      0.59   0.62   1.54   0.074   0.272   0.012   0.062
Ser      0.79   0.72   1.56   0.100   0.095   0.095   0.104
Thr      0.82   1.20   1.00   0.062   0.093   0.056   0.068
Trp      1.14   1.19   1.11   0.045   0.000   0.045   0.205
Tyr      0.61   1.29   1.25   0.136   0.025   0.110   0.102
Val      1.14   1.65   0.30   0.023   0.029   0.011   0.029
Secondary structure prediction: CHOU-FASMAN
Applying the algorithm (thresholds of 100 and 105 are on a ×100 scale, i.e. 100 corresponds to a preference parameter of 1.00 in the table)
1. Assign parameters to each residue.
2. Identify regions where 4 out of 6 residues have P(a) > 100: α-helix.
3. Extend the helix in both directions until four contiguous residues have an average P(a) < 100: end of α-helix.
4. If the segment is longer than 5 residues and P(a) > P(b): α-helix.
5. Repeat this procedure to locate all of the helical regions.
6. Identify regions where 3 out of 5 residues have P(b) > 100: β-sheet.
7. Extend the sheet in both directions until four contiguous residues have an average P(b) < 100: end of β-sheet.
8. If P(b) > 105 and P(b) > P(a): β-sheet.
9. Rest: P(a) > P(b) → α-helix; P(b) > P(a) → β-sheet.
10. To identify a bend at residue number i, calculate p(t) = f(i) × f(i+1) × f(i+2) × f(i+3), taking each factor from the matching column of the table for the residue at that position. If (1) p(t) > 0.000075; (2) the average P(t) > 1.00 in the tetrapeptide; and (3) the averages for the tetrapeptide obey P(a) < P(t) > P(b): β-turn.
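The helix part of the procedure can be sketched in code. This is one possible reading of the rules, not a reference implementation: the parameters are copied from the table in this deck, the slide's threshold of 100 is read as 1.00 on the table's scale, and the extension step is interpreted as "grow by one residue while the tetrapeptide at that end still averages at least the threshold". The function names are hypothetical, and sheets and turns are left out to keep the example short.

```python
# Preference parameters copied from the table in this deck (1.00 = no preference).
P_ALPHA = {
    "A": 1.45, "R": 0.79, "N": 0.73, "D": 0.98, "C": 0.77, "Q": 1.17, "E": 1.53,
    "G": 0.53, "H": 1.24, "I": 1.00, "L": 1.34, "K": 1.07, "M": 1.20, "F": 1.12,
    "P": 0.59, "S": 0.79, "T": 0.82, "W": 1.14, "Y": 0.61, "V": 1.14,
}
P_BETA = {
    "A": 0.97, "R": 0.90, "N": 0.65, "D": 0.80, "C": 1.30, "Q": 1.23, "E": 0.26,
    "G": 0.81, "H": 0.71, "I": 1.60, "L": 1.22, "K": 0.74, "M": 1.67, "F": 1.28,
    "P": 0.62, "S": 0.72, "T": 1.20, "W": 1.19, "Y": 1.29, "V": 1.65,
}


def mean_score(segment: str, table: dict[str, float]) -> float:
    """Average preference parameter over a segment."""
    return sum(table[aa] for aa in segment) / len(segment)


def find_helices(sequence: str, threshold: float = 1.00) -> list[tuple[int, int]]:
    """Return (start, end) helix segments, 0-based with end exclusive."""
    seq = sequence.upper()
    helices = []
    floor = 0  # left extension never crosses a region that was already scanned
    i = 0
    while i + 6 <= len(seq):
        # Nucleation: at least 4 of 6 residues with P(a) above the threshold.
        if sum(P_ALPHA[aa] > threshold for aa in seq[i:i + 6]) < 4:
            i += 1
            continue
        start, end = i, i + 6
        # Extend right, then left, while the terminal tetrapeptide average holds.
        while end < len(seq) and mean_score(seq[end - 3:end + 1], P_ALPHA) >= threshold:
            end += 1
        while start > floor and mean_score(seq[start - 1:start + 3], P_ALPHA) >= threshold:
            start -= 1
        segment = seq[start:end]
        # Accept only segments longer than 5 residues with average P(a) > P(b).
        if len(segment) > 5 and mean_score(segment, P_ALPHA) > mean_score(segment, P_BETA):
            helices.append((start, end))
        floor = i = end
    return helices


print(find_helices("EEEEEEEE"))  # poly-Glu toy input -> [(0, 8)]
```

Because extension works on tetrapeptide averages, a strong helix former can pull one or two weak neighbours into the segment; the original papers apply additional boundary rules that this sketch does not attempt to reproduce.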
Secondary structure prediction: CHOU-FASMAN
Successful method?
• 19 proteins evaluated:
  • Successful in locating 88% of helical and 95% of β regions
  • Correctly predicting 80% of helical and 86% of β-sheet residues
  • Accuracy of predicting the three conformational states (helix, β and coil) for all residues is 77%
• Chou & Fasman: successful method
• After 1974: improvement of preference parameters
 
Sander-Schneider:  Evolution of overall structure  Naturally occurring sequences with more than 20% sequence identity over 80 or more residues always adopt the same basic structure (Sander and Schneider 1991)
Sander-Schneider HSSP: Homology-derived Secondary Structure of Proteins
Structural Family Databases
• SCOP: Structural Classification of Proteins
• FSSP: Family of Structurally Similar Proteins
• CATH: Class, Architecture, Topology, Homology
Levels of protein structure
Tertiary: packing of secondary structure elements into a compact spatial unit
Fold or domain: this is the level to which structure prediction is currently possible
Domains
Protein Architecture
Domains
Protein dissection into domains
• Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence (automatic when blasting)
Domains
• From the analysis of alignments of protein families
• Conserved sequence features, usually associated with a specific function
• PROSITE: database of protein “signatures” (large number of false positives and false negatives)
• From alignment of homologous sequences (PRINTS/PRODOM)
• From Hidden Markov Models (PFAM)
• Meta approach: INTERPRO
Protein Architecture
Levels of protein structure: Topology
Hydrophobicity Plot
P53_HUMAN (P04637), human cellular tumor antigen p53
Kyte-Doolittle hydropathy, window=19
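A plot of this kind is a sliding-window average over a per-residue scale. The sketch below uses the published Kyte-Doolittle hydropathy values (positive means hydrophobic) and the window of 19 named on the slide; the function name `hydropathy_profile` and the toy input are made up for the illustration, and no plotting is attempted.

```python
# Kyte-Doolittle hydropathy scale (positive values are hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; element k covers residues k..k+window-1."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(scores):
        return []
    return [
        sum(scores[k:k + window]) / window
        for k in range(len(scores) - window + 1)
    ]


# Toy input: a hydrophobic stretch flanked by charged residues (not a real protein).
profile = hydropathy_profile("KKKKK" + "L" * 19 + "DDDDD")
print(max(profile), len(profile))
```

A window of about 19 residues is roughly the length needed to span a lipid bilayer as a helix, which is why peaks in such a profile are commonly read as candidate transmembrane segments.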
 
The ‘positive inside’ rule
(EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41)

Membrane             In (% K+R)   Out (% K+R)
Bacterial IM         16           4
Eukaryotic PM        17           7
Thylakoid membrane   13           5
Mitochondrial IM     10           3
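The rule can be turned into a simple orientation guess: compute the Lys+Arg percentage of the loops on each side of the membrane and call the richer side "inside". The sketch below assumes the loops have already been assigned to two sides; the function names and the loop sequences are hypothetical and serve only to show the calculation.

```python
def kr_percent(segment: str) -> float:
    """Percentage of Lys (K) and Arg (R) residues in a segment."""
    seg = segment.upper()
    if not seg:
        raise ValueError("segment must not be empty")
    return 100.0 * sum(aa in "KR" for aa in seg) / len(seg)


def predict_inside(loops_side_a: list[str], loops_side_b: list[str]) -> str:
    """Return 'A' or 'B' for the side richer in K+R, or 'undetermined' on a tie."""
    a = kr_percent("".join(loops_side_a))
    b = kr_percent("".join(loops_side_b))
    if a == b:
        return "undetermined"
    return "A" if a > b else "B"


# Hypothetical loops on the two sides of a membrane protein.
side_a = ["MKRLSK", "GRKAT"]
side_b = ["SDGTNE", "PLSAQ"]
print(kr_percent("".join(side_a)), kr_percent("".join(side_b)))
print("predicted inside:", predict_inside(side_a, side_b))
```

Pooling all loops of one side before taking the percentage weights long loops more heavily than short ones; averaging per loop would be an equally defensible choice.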
 
GPCR Topology
• Membrane-bound receptors
• A very large number of different domains, both to bind their ligand and to activate G proteins
• 6 different families
• Transducing messages such as photons, organic odorants, nucleotides, nucleosides, peptides, lipids and proteins
• Pharmaceutically the most important class
• Challenge: methods to find novel GPCRs in the human genome …
GPCR Topology
GPCR Topology
GPCR Structure
• Seven transmembrane regions
• Conserved residues and motifs (e.g. NPXXY)
• Hydrophobic/hydrophilic domains
GPCR Topology
E.g. plot conserved residues (or multiple alignment: MSA to SSA)
Levels of protein structure
Quaternary: difficult to predict
Functional units: apoptosome, proteasome
What is X-ray Crystallography?
X-ray crystallography is an experimental technique that exploits the fact that X-rays are diffracted by crystals. X-rays have the proper wavelength (in the Ångström range, ~10⁻⁸ cm) to be scattered by the electron cloud of an atom of comparable size. Based on the diffraction pattern obtained from X-ray scattering off the periodic assembly of molecules or atoms in the crystal, the electron density can be reconstructed. A model is then progressively built into the experimental electron density and refined against the data; the result is a quite accurate molecular structure.
NMR or Crystallography?
NMR uses protein in solution
• Can look at the dynamic properties of the protein structure
• Can look at the interactions between the protein and ligands, substrates or other proteins
• Can look at protein folding
• Sample is not damaged in any way
• The maximum size of a protein for NMR structure determination is ~30 kDa. This eliminates ~50% of all proteins
• High solubility is a requirement
X-ray crystallography uses protein crystals
• No size limit, as long as you can crystallise it
• Solubility requirement is less stringent
• Simple definition of resolution
• Direct calculation from data to electron density and back again
• Crystallisation is the process bottleneck: binary (all or nothing)
• Phase problem: relies on heavy atom soaks or SeMet incorporation
Both techniques require large amounts of pure protein and expensive equipment!
PDB
Visualizing Structures
Cn3D version 4.0 (NCBI)
Visualizing Structures
• Ball: Van der Waals radius
• Stick: length joins centers
• Colours: N, blue / O, red / S, yellow / C, gray (green)
Visualizing Structures
From N to C
Visualizing Structures
Demonstration of Protein Explorer
• PDB, install Chime
• Search helicase (select a structure where DNA is present)
• Stop spinning, hide water molecules
• Show basic residues, which interact with the negatively charged backbone
• RASMOL / Cn3D
Modeling
Protein Structure
Molecular modeling: building a 3D protein structure from its sequence
Modeling
Finding a structural homologue
• Blast versus the PDB database, or PSI-blast (E<0.005)
• Domain coverage at least 60%
• Avoid gaps: prefer few gaps and reasonable similarity scores over lots of gaps and high similarity scores
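These selection criteria can be applied to a list of search hits as a filter followed by a ranking. The sketch below assumes the hits have already been parsed into simple records; the `Hit` fields, the function name `select_templates` and the example hits are all hypothetical, and ranking by fewest gaps first and score second is just one way to express "few gaps over high scores".

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str   # hypothetical identifier of the template structure
    evalue: float      # E-value reported by the search
    coverage: float    # fraction of the query domain covered, 0..1
    gaps: int          # number of gaps in the alignment
    score: float       # similarity score


def select_templates(hits: list[Hit], max_evalue: float = 0.005,
                     min_coverage: float = 0.60) -> list[Hit]:
    """Keep hits passing the E-value and coverage cut-offs; fewest gaps first."""
    kept = [h for h in hits if h.evalue < max_evalue and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.gaps, -h.score))


# Made-up hits, only to show the behaviour of the filter.
hits = [
    Hit("tmplA", 1e-30, 0.95, 12, 410.0),
    Hit("tmplB", 1e-20, 0.80, 2, 300.0),
    Hit("tmplC", 0.5, 0.90, 0, 45.0),     # fails the E-value cut-off
    Hit("tmplD", 1e-25, 0.40, 1, 350.0),  # fails the coverage cut-off
]
print([h.template_id for h in select_templates(hits)])  # ['tmplB', 'tmplA']
```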
Modeling
• Extract “template” sequences and align them with the query
• Watch out for missing data (PDB file) and complement with additional templates
• Try to get as much information as possible, X-ray/NMR
• The sequence alignment from structure comparison of templates (SSA) can be different from a simple sequence alignment
• >40% identity: any alignment method is OK
• <40% identity: checks are essential
  • Residue conservation checks in functional regions (patterns/motifs)
  • Indels: combine gaps separated by few residues
  • Manual editing: move gaps from secondary structure elements to loops; within loops, move gaps to loop ends, i.e. the turnaround point of the backbone
• Align templates structurally, extract the corresponding SSA or QTA (query/template alignment)
Modeling
Input for model building
• Query sequence (the one you want the 3D model for)
• Template sequences and structures
• Query/template(s) (structure) sequence alignment
Modeling
Methods (for details see paper): WHATIF, SWISS-MODEL, MODELLER, ICM, 3D-JIGSAW, CPH-models, SDC1
Modeling
Model evaluation (how good is the prediction, and how much can the algorithm rely on and extract from the provided templates?)
• PROCHECK
• WHATIF
• ERRAT
• CASP (Critical Assessment of Structure Prediction)
Best method is manual alignment editing!
Comparative modelling at CASP
CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity

Target      Accuracy   Residues   Identity
T112/dhso   4.9 Å      348        24%
T92/yeco    5.6 Å      104        12%
T128/sodm   1.0 Å      198        50%
T125/sp18   4.4 Å      137        24%
T111/eno    1.7 Å      430        51%
T122/trpa   2.9 Å      241        33%

         alignment   side chain   short loops   longer loops
BC       excellent   ~80%         1.0 Å         2.0 Å
CASP1    poor        ~50%         ~3.0 Å        >5.0 Å
CASP2    fair        ~75%         ~1.0 Å        ~3.0 Å
CASP3    fair        ~75%         ~1.0 Å        ~2.5 Å
CASP4    fair        ~75%         ~1.0 Å        ~2.0 Å
 
Protein Engineering / Protein Design

Bioinformatica 01-12-2011-t7-protein

  • 1.  
  • 2. FBW 1-12-2011 Wim Van Criekinge
  • 3. Inhoud Lessen: Bioinformatica don 29-09-2011: 1* Bioinformatics (practicum 8.30-11.00) don 06-10-2011: 2* Biological Databases (practicum 9.00-11.30) don 20-10-2011: 3 Sequence Similarity (Scoring Matrices) don 27-10-2011: 4 Sequence Alignments don 10-11-2011: 5 Database Searching Fasta/Blast don 17-11-2011: afgelast don 24-11-2011: 6 Phylogenetics don 01-12-2011: 7 Protein Structure don 08-12-2011: 8 Gene Prediction, Gene Ontologies & HMM don 15-12-2011: 9-10 Bio- & Cheminformatics in Drug Discovery (inhaalweek) Opgelet: Geen les op don 13-10-2010 en don 3-11-2010
  • 4. Biobix: Applied Bioinformatics Research Thesisonderwerpen Lopend onderzoek Biomerker predictie / Methylatie Metabonomics Peptidomics Translational biotechnology (text mining) Structural Genomics miRNA prediction / Target Prediction Exploring genomic dark matter ( junk mining ) Samenwerking met diverse instituten Ambities om te peer-reviewed te publiceren
  • 5. empirical finding: if two biological sequences are sufficiently similar, almost invariably they have similar biological functions and will be descended from a common ancestor. (i) function is encoded into sequence , this means: the sequence provides the syntax and (ii) there is a redundancy in the encoding , many positions in the sequence may be changed without perceptible changes in the function, thus the semantics of the encoding is robust. The reason for “bioinformatics” to exist ?
  • 6. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 7. Proteins perform a variety of cellular tasks in the living cells Each protein adopts a particular folding that determines its function The 3D structure of a protein can bring into close proximity residues that are far apart in the amino acid sequence Catalytic site: Business End of the molecule Why protein structure ?
  • 8. Rationale for understanding protein structure and function Protein sequence -large numbers of sequences, including whole genomes Protein function - rational drug design and treatment of disease - protein and genetic engineering - build networks to model cellular pathways - study organismal function and evolution ? structure determination structure prediction homology rational mutagenesis biochemical analysis model studies Protein structure - three dimensional - complicated - mediates function
  • 9. About the use of protein models (Peitch) Structure is preserved under evolution when sequence is not Interpreting the impact of mutations/SNPs and conserved residues on protein function. Potential link to disease Function ? Biochemical: the chemical interactions occerring in a protein Biological: role within the cell Phenotypic: the role in the organism Gene Ontology functional classification ! Priorisation of residues to mutate to determine protein function Providing hints for protein function: Catalytic mechanisms of enzymes often require key residues to be close together in 3D space (protein-ligand complexes, rational drug design, putative interaction interfaces)
  • 10. MIS-SENSE MUTATION e.g. Sickle Cell Anaemia Cause : defective haemoglobin due to mutation in β-globin gene Symptoms : severe anaemia and death in homozygote
  • 11. Normal β-globin - 146 amino acids val - his - leu - thr - pro - glu - glu - --------- 1 2 3 4 5 6 7 Normal gene (aa 6) Mutant gene DNA CTC C A C mRNA GAG GUG Product Glu Valine Mutant β-globin val - his - leu - thr - pro - val - glu - ---------
  • 12. Protein Conformation Christian Anfinsen Studies on reversible denaturation “Sequence specifies conformation” Chaperones and disulfide interchange enzymes: involved but not controlling final state, they provide environment to refold if misfolded Structure implies function: The amino acid sequence encodes the protein’s structural information
  • 13. by itself: Anfinsen had developed what he called his &quot;thermodynamic hypothesis&quot; of protein folding to explain the native conformation of amino acid structures. He theorized that the native or natural conformation occurs because this particular shape is thermodynamically the most stable in the intracellular environment. That is, it takes this shape as a result of the constraints of the peptide bonds as modified by the other chemical and physical properties of the amino acids. To test this hypothesis, Anfinsen unfolded the RNase enzyme under extreme chemical conditions and observed that the enzyme's amino acid structure refolded spontaneously back into its original form when he returned the chemical environment to natural cellular conditions. &quot;The native conformation is determined by the totality of interatomic interactions and hence by the amino acid sequence, in a given environment.&quot; How does a protein fold ?
  • 14. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 15. Proteins are linear heteropolymers: one or more polypeptide chains Below about 40 residues the term peptide is frequently used. A certain number of residues is necessary to perform a particular biochemical function, and around 40-50 residues appears to be the lower limit for a functional domain size. Protein sizes range from this lower limit to several hundred residues in multi-functional proteins. Three-dimentional shapes (folds) adopted vary enormously Experimental methods: X-ray crystallography NMR (nuclear magnetic resonance) Electron microscopy Ab initio calculations … The Basics
  • 16. Zeroth: amino acid composition (proteomics, %cysteine, %glycine) Levels of protein structure
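The zeroth level (amino acid composition) can be illustrated with a short Python sketch. The function name `composition` and the example sequence are made up for illustration; they do not come from the slides.

```python
from collections import Counter


def composition(sequence):
    """Return the percentage of each residue in a one-letter protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


# Hypothetical 10-residue peptide, used only to show the call.
comp = composition("MCGGACKLGC")
print("%cysteine:", comp["C"])
print("%glycine:", comp["G"])
```

Residues that do not occur in the sequence are simply absent from the returned dictionary, so `comp.get("W", 0.0)` is the safer lookup for rare residues.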
  • 17. The basic structure of an α-amino acid is quite simple. R denotes any one of the 20 possible side chains (see table below). We notice that the Cα-atom has 4 different ligands (the H is omitted in the drawing) and is thus chiral. An easy trick to remember the correct L-form is the CORN-rule: when the Cα-atom is viewed with the H in front, the residues read "CO-R-N" in a clockwise direction. Amino Acid Residues
  • 23. Primary: This is simply the order of covalent linkages along the polypeptide chain, i.e. the sequence itself Levels of protein structure
  • 26. Secondary Local organization of the protein backbone: alpha-helix, Beta-strand (which assemble into Beta-sheets) turn and interconnecting loop. Levels of protein structure
  • 29. Residues with hydrophobic properties conserved at i, i+2, i+4, separated by unconserved or hydrophilic residues, suggest surface beta-strands. A short run of hydrophobic amino acids (4 residues) suggests a buried beta-strand. Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggest an alpha-helix with one face packing in the protein core. Likewise, an i, i+3, i+4, i+7 pattern of conserved hydrophobic residues suggests such a helix. A Practical Approach: Interpretation
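The periodicity patterns on this slide can be sketched in Python as a simple position scan. This is a simplification: the slide speaks of residues conserved across an alignment, whereas the sketch looks at a single sequence, and the set `HYDROPHOBIC` is an assumed definition of "hydrophobic", not one given in the slides. All names are hypothetical.

```python
# Assumption: a simple hydrophobic alphabet; other definitions are equally defensible.
HYDROPHOBIC = set("AVILMFWC")

BETA_SURFACE = (0, 2, 4)    # i, i+2, i+4
HELIX_FACE = (0, 3, 4, 7)   # i, i+3, i+4, i+7


def matches_pattern(sequence, start, offsets):
    """True if every position start+offset exists and holds a hydrophobic residue."""
    positions = [start + o for o in offsets]
    if max(positions) >= len(sequence):
        return False
    return all(sequence[p] in HYDROPHOBIC for p in positions)


def scan(sequence, offsets):
    """Return all 0-based start positions where the offset pattern is hydrophobic."""
    return [i for i in range(len(sequence)) if matches_pattern(sequence, i, offsets)]


print(scan("LKDLLKKL", HELIX_FACE))   # candidate helix face
print(scan("VKVEV", BETA_SURFACE))    # candidate surface strand
```

A hit only marks a candidate; the slide's criterion additionally requires the intervening positions to be unconserved or hydrophilic, which this sketch does not check.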
  • 33. Chou, P.Y. and Fasman, G.D. (1974). Conformational parameters for amino acids in helical, β-sheet, and random coil regions calculated from proteins. Biochemistry 13, 211-221. Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13, 222-245. Secondary structure prediction:CHOU-FASMAN
  • 34. Method: Assigning a set of prediction values to a residue, based on statistical analysis of 15 proteins. Applying a simple algorithm to those numbers. Secondary structure prediction:CHOU-FASMAN
  • 35. Calculation of preference parameters. For each of the 20 residues and each secondary structure (α-helix, β-sheet and β-turn): $P = \log\left(\frac{\text{observed counts}}{\text{expected counts}}\right) + 1.0$. Preference parameter > 1.0: the residue has a preference for the specific secondary structure. Preference parameter = 1.0: the residue neither prefers nor dislikes the specific secondary structure. Preference parameter < 1.0: the residue dislikes the specific secondary structure. Secondary structure prediction:CHOU-FASMAN
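As a small sketch of the formula as written on this slide, the function below computes a preference parameter from observed and expected counts. The slide does not state the base of the logarithm, so base 10 is an assumption here, and the function name `preference` is hypothetical.

```python
import math


def preference(observed, expected):
    """Preference parameter as given on the slide: log(observed/expected) + 1.0.

    Assumption: base-10 logarithm (the slide does not specify the base).
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return math.log10(observed / expected) + 1.0


# Made-up counts: a residue seen in helices as often as, more often than,
# and less often than expected by chance.
for obs, exp in [(10, 10), (20, 10), (5, 10)]:
    print(obs, exp, round(preference(obs, exp), 3))
```

Whatever the base, the three cases on the slide are preserved: equal counts give exactly 1.0, over-representation gives a value above 1.0, and under-representation a value below 1.0.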
  • 36. Preference parameters Secondary structure prediction:CHOU-FASMAN
    Residue  P(a)  P(b)  P(t)  f(i)   f(i+1)  f(i+2)  f(i+3)
    Ala      1.45  0.97  0.57  0.049  0.049   0.034   0.029
    Arg      0.79  0.90  1.00  0.051  0.127   0.025   0.101
    Asn      0.73  0.65  1.68  0.101  0.086   0.216   0.065
    Asp      0.98  0.80  1.26  0.137  0.088   0.069   0.059
    Cys      0.77  1.30  1.17  0.089  0.022   0.111   0.089
    Gln      1.17  1.23  0.56  0.050  0.089   0.030   0.089
    Glu      1.53  0.26  0.44  0.011  0.032   0.053   0.021
    Gly      0.53  0.81  1.68  0.104  0.090   0.158   0.113
    His      1.24  0.71  0.69  0.083  0.050   0.033   0.033
    Ile      1.00  1.60  0.58  0.068  0.034   0.017   0.051
    Leu      1.34  1.22  0.53  0.038  0.019   0.032   0.051
    Lys      1.07  0.74  1.01  0.060  0.080   0.067   0.073
    Met      1.20  1.67  0.67  0.070  0.070   0.036   0.070
    Phe      1.12  1.28  0.71  0.031  0.047   0.063   0.063
    Pro      0.59  0.62  1.54  0.074  0.272   0.012   0.062
    Ser      0.79  0.72  1.56  0.100  0.095   0.095   0.104
    Thr      0.82  1.20  1.00  0.062  0.093   0.056   0.068
    Trp      1.14  1.19  1.11  0.045  0.000   0.045   0.205
    Tyr      0.61  1.29  1.25  0.136  0.025   0.110   0.102
    Val      1.14  1.65  0.30  0.023  0.029   0.011   0.029
  • 37. Applying algorithm Assign parameters to each residue. Identify regions where 4 out of 6 residues have P(a)>100: α-helix. Extend the helix in both directions until four contiguous residues have an average P(a)<100: end of α-helix. If the segment is longer than 5 residues and P(a)>P(b): α-helix. Repeat this procedure to locate all of the helical regions. Identify regions where 3 out of 5 residues have P(b)>100: β-sheet. Extend the sheet in both directions until four contiguous residues have an average P(b)<100: end of β-sheet. If P(b)>105 and P(b)>P(a): β-sheet. Rest: P(a)>P(b) → α-helix; P(b)>P(a) → β-sheet. The thresholds 100 and 105 are on a ×100 scale, i.e. 1.00 and 1.05 in the units of the preference parameter table. To identify a bend at residue number i, calculate the following value: p(t) = f(i)f(i+1)f(i+2)f(i+3). If (1) p(t) > 0.000075; (2) average P(t)>1.00 in the tetrapeptide; and (3) averages for the tetrapeptide obey P(a)<P(t)>P(b): β-turn. Secondary structure prediction:CHOU-FASMAN
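The first step of the procedure (finding helix nucleation sites) can be sketched in Python as follows. The P(a) values are copied from the preference parameter table on slide 36; the threshold of 1.00 is assumed to correspond to the value 100 used on this slide. Only nucleation is shown: extension, the β-sheet pass and the turn test are left out, and the function name `helix_nuclei` is hypothetical.

```python
# P(a) values from the preference parameter table (slide 36), one-letter codes.
P_ALPHA = {
    "A": 1.45, "R": 0.79, "N": 0.73, "D": 0.98, "C": 0.77,
    "Q": 1.17, "E": 1.53, "G": 0.53, "H": 1.24, "I": 1.00,
    "L": 1.34, "K": 1.07, "M": 1.20, "F": 1.12, "P": 0.59,
    "S": 0.79, "T": 0.82, "W": 1.14, "Y": 0.61, "V": 1.14,
}


def helix_nuclei(sequence, window=6, needed=4, threshold=1.00):
    """Return 0-based start positions of windows in which at least `needed`
    of `window` residues have P(a) above `threshold`."""
    nuclei = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        formers = sum(1 for aa in segment if P_ALPHA[aa] > threshold)
        if formers >= needed:
            nuclei.append(start)
    return nuclei


# Made-up test peptides: one rich in helix formers, one rich in Gly/Pro/Ser.
print(helix_nuclei("AELLKA"))
print(helix_nuclei("GPGSGP"))
```

Overlapping hits would be merged and extended in a fuller implementation; here each qualifying window is reported separately to keep the nucleation rule visible.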
  • 38. Successful method? 19 proteins evaluated: Successful in locating 88% of helical and 95% of β regions. Correctly predicting 80% of helical and 86% of β-sheet residues. Accuracy of predicting the three conformational states for all residues (helix, β, and coil) is 77%. Chou & Fasman: successful method. After 1974: improvement of preference parameters. Secondary structure prediction:CHOU-FASMAN
  • 40. Sander-Schneider: Evolution of overall structure Naturally occurring sequences with more than 20% sequence identity over 80 or more residues always adopt the same basic structure (Sander and Schneider 1991)
  • 41. Sander-Schneider HSSP: Homology-derived Secondary Structure of Proteins
  • 42. SCOP: Structural Classification of Proteins FSSP: Family of Structurally Similar Proteins CATH: Class, Architecture, Topology, Homology Structural Family Databases
  • 43. Levels of protein structure Tertiary: Packing of secondary structure elements into a compact spatial unit. Fold or domain: this is the level to which structure prediction is currently possible
  • 46. Protein Dissection into domain Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence (automatic when blasting) Domains
  • 47. From the analysis of alignments of protein families: conserved sequence features, usually associated with a specific function. PROSITE database of protein “signatures” (large number of false positives and false negatives). From alignment of homologous sequences (PRINTS/PRODOM). From Hidden Markov Models (PFAM). Meta approach: INTERPRO Domains
  • 49. Levels of protein structure: Topology
  • 50. Hydrophobicity Plot P53_HUMAN (P04637) human cellular tumor antigen p53 Kyte-Doolittle hydropathy, window=19
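A hydrophobicity plot of this kind is a sliding-window average over a per-residue scale. The sketch below uses the Kyte-Doolittle hydropathy values and the window of 19 named on the slide; the function name `hydropathy_profile` and the test peptides are hypothetical, and the p53 sequence itself is not reproduced here.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    Element k of the result covers residues k .. k+window-1 (0-based).
    """
    if window < 1 or window > len(sequence):
        return []
    scores = [KD[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Made-up peptide: a hydrophobic stretch flanked by charged residues.
profile = hydropathy_profile("KKDE" + "LIVLLAVIGLLVFAILVAL" + "RRKE", window=19)
print(max(profile))
```

Plotting `profile` against the window start (or window centre) gives the curve shown on the slide; the choice of x-axis convention is left to the plotting code.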
  • 52. The ‘positive inside’ rule (EMBO J. 5:3021; EJB 174:671,205:1207; FEBS lett. 282:41) Bacterial IM In: 16% KR out: 4% KR Eukaryotic PM In: 17% KR out: 7% KR Thylakoid membrane In: 13% KR out: 5% KR Mitochondrial IM In: 10% KR out: 3% KR
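The 'positive inside' rule compares the Lys+Arg content of the loops on either side of the membrane. A minimal Python sketch, assuming the loop sequences on each side are already known (for example from a hydropathy profile); the function names and example loops are hypothetical.

```python
def kr_percent(segments):
    """Percentage of Lys (K) and Arg (R) over all residues in the given loop segments."""
    residues = "".join(segments).upper()
    if not residues:
        raise ValueError("no residues given")
    return 100.0 * sum(residues.count(aa) for aa in "KR") / len(residues)


def likely_inside(loops_side_a, loops_side_b):
    """Return 'a' or 'b' for the side whose loops are richer in K+R
    (the side the rule suggests is 'in')."""
    return "a" if kr_percent(loops_side_a) >= kr_percent(loops_side_b) else "b"


# Made-up loops for the two sides of a membrane protein.
side_a = ["KKRA", "RKGS"]
side_b = ["DEGS", "ATNQ"]
print(kr_percent(side_a), kr_percent(side_b), likely_inside(side_a, side_b))
```

The differences quoted on the slide are averages over many proteins, so for a single protein this comparison is a hint about orientation rather than a firm assignment.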
  • 54. Membrane-bound receptors. A very large number of different domains, both to bind their ligand and to activate G proteins. 6 different families. Transducing messages such as photons, organic odorants, nucleotides, nucleosides, peptides, lipids and proteins. GPCR Topology. Pharmaceutically the most important class. Challenge: methods to find novel GPCRs in the human genome …
  • 56. Seven transmembrane regions GPCR Structure Conserved residues and motifs (e.g. NPXXY) Hydrophobic/hydrophilic domains GPCR Topology
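Searching a sequence for a conserved motif such as NPXXY (X = any residue) can be sketched with a regular expression. The function name `find_npxxy` and the test sequence are hypothetical; a lookahead is used so that overlapping occurrences are not missed.

```python
import re

# N, P, any two residues, Y; wrapped in a lookahead to allow overlapping hits.
NPXXY = re.compile(r"(?=NP..Y)")


def find_npxxy(sequence):
    """Return the 0-based start positions of every NPxxY occurrence."""
    return [m.start() for m in NPXXY.finditer(sequence.upper())]


# Made-up fragment containing one NPxxY motif.
print(find_npxxy("AAANPLVYAA"))
```

A motif hit on its own is weak evidence; on the slide it is one feature alongside the seven transmembrane regions and the hydrophobic/hydrophilic pattern.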
  • 57. GPCR Topology E.g. plot conserved residues (or multiple alignment: MSA to SSA)
  • 58. Levels of protein structure Quaternary: difficult to predict. Functional units: apoptosome, proteasome
  • 59. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 60. X-ray crystallography is an experimental technique that exploits the fact that X-rays are diffracted by crystals. X-rays have the proper wavelength (in the Ångström range, ~10⁻⁸ cm) to be scattered by the electron cloud of an atom of comparable size. Based on the diffraction pattern obtained from X-ray scattering off the periodic assembly of molecules or atoms in the crystal, the electron density can be reconstructed. A model is then progressively built into the experimental electron density and refined against the data; the result is a quite accurate molecular structure. What is X-ray Crystallography
  • 61. NMR uses protein in solution. Can look at the dynamic properties of the protein structure. Can look at the interactions between the protein and ligands, substrates or other proteins. Can look at protein folding. Sample is not damaged in any way. The maximum size of a protein for NMR structure determination is ~30 kDa. This eliminates ~50% of all proteins. High solubility is a requirement. X-ray crystallography uses protein crystals. No size limit: as long as you can crystallise it. Solubility requirement is less stringent. Simple definition of resolution. Direct calculation from data to electron density and back again. Crystallisation is the process bottleneck: binary (all or nothing). Phase problem: relies on heavy atom soaks or SeMet incorporation. Both techniques require large amounts of pure protein and require expensive equipment! NMR or Crystallography ?
  • 62. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 63. PDB
  • 67. Visualizing Structures Cn3D version 4.0 (NCBI)
  • 68. Ball: Van der Waals radius. Stick: length joins centers. Colors: N, blue / O, red / S, yellow / C, gray (green). Visualizing Structures
  • 69. From N to C Visualizing Structures
  • 70. Demonstration of Protein explorer PDB, install Chime Search helicase (select structure where DNA is present) Stop spinning, hide water molecules Show basic residues, interact with negatively charged backbone RASMOL / Cn3D Visualizing Structures
  • 71. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 73. Protein Structure Molecular Modeling: building a 3D protein structure from its sequence
  • 74. Finding a structural homologue: BLAST versus the PDB database or PSI-BLAST (E<0.005). Domain coverage at least 60%. Avoid gaps: prefer few gaps and reasonable similarity scores over lots of gaps and high similarity scores. Modeling
  • 75. Extract “template” sequences and align with query. Watch out for missing data (PDB file) and complement with additional templates. Try to get as much information as possible, X-ray/NMR. Sequence alignment from structure comparison of templates (SSA) can be different from a simple sequence alignment. >40% identity: any alignment method is OK. <40%: checks are essential. Residue conservation checks in functional regions (patterns/motifs). Indels: combine gaps separated by few residues. Manual editing: move gaps from secondary structure elements to loops. Within loops, move gaps to loop ends, i.e. the turnaround point of the backbone. Align templates structurally, extract the corresponding SSA or QTA (query/template alignment). Modeling
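The 40% identity rule of thumb on this slide can be applied to a query/template alignment with a few lines of Python. The sketch assumes identity is counted over columns where neither sequence has a gap; other conventions (for example dividing by the shorter sequence length) exist and give different numbers. The function name and the example alignment are hypothetical.

```python
def percent_identity(aligned_query, aligned_template, gap="-"):
    """Percent identical residues over the columns where both sequences are ungapped."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (q, t)
        for q, t in zip(aligned_query.upper(), aligned_template.upper())
        if q != gap and t != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for q, t in pairs if q == t)
    return 100.0 * matches / len(pairs)


# Made-up query/template alignment (QTA) with one gap in the query.
identity = percent_identity("ACD-EF", "ACDGQF")
print(identity, "extra checks essential" if identity < 40 else "any alignment method is OK")
```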
  • 76. Input for model building: Query sequence (the one you want the 3D model for). Template sequences and structures. Query/template(s) (structure) sequence alignment. Modeling
  • 77. Methods (details on these see paper): WHATIF, SWISS-MODEL, MODELLER, ICM, 3D-JIGSAW, CPH-models, SDC1 Modeling
  • 78. Model evaluation (how good is the prediction, how much can the algorithm rely on and extract from the provided templates): PROCHECK, WHATIF, ERRAT. CASP (Critical Assessment of Structure Prediction). Best method is manual alignment editing! Modeling
  • 79. Comparative modelling at CASP. CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity
    T112/dhso – 4.9 Å (348 residues; 24%)
    T92/yeco – 5.6 Å (104 residues; 12%)
    T128/sodm – 1.0 Å (198 residues; 50%)
    T125/sp18 – 4.4 Å (137 residues; 24%)
    T111/eno – 1.7 Å (430 residues; 51%)
    T122/trpa – 2.9 Å (241 residues; 33%)

           alignment  side chain  short loops  longer loops
    BC     excellent  ~80%        1.0 Å        2.0 Å
    CASP1  poor       ~50%        ~3.0 Å       >5.0 Å
    CASP2  fair       ~75%        ~1.0 Å       ~3.0 Å
    CASP3  fair       ~75%        ~1.0 Å       ~2.5 Å
    CASP4  fair       ~75%        ~1.0 Å       ~2.0 Å
  • 81. Protein Engineering / Protein Design

Editor's Notes

  • #42: The new curve saturated around 20% for alignments over more than 250 residues, and for alignments shorter than 11 residues the new equation yielded values above 100%. However, this was acceptable as 100% identity for fragments of 10-11 residues does not imply structural similarity.