Bioinformatics – A Brief overview
What is bioinformatics?
  • Application of information technology to the storage, management and analysis of biological information
  • Facilitated by the use of computers
Publicly available genomes (April 1998)
  • Complete/public: Aquifex aeolicus, Pyrococcus horikoshii, Bacillus subtilis, Treponema pallidum, Borrelia burgdorferi, Helicobacter pylori, Escherichia coli, Mycoplasma pneumoniae, Saccharomyces cerevisiae, Mycoplasma genitalium, Haemophilus influenzae
  • Complete/pending publication: Rickettsia prowazekii, Pseudomonas aeruginosa, Pyrococcus abyssi, Bacillus sp. C-125, Ureaplasma urealyticum, Pyrobaculum aerophilum
  • Almost/public: Pyrococcus furiosus, Mycobacterium tuberculosis H37Rv, Mycobacterium tuberculosis CSU93, Neisseria gonorrhoeae, Neisseria meningitidis, Streptococcus pyogenes
Promises of genomics and bioinformatics
Medicine
  • Knowledge of protein structure facilitates drug design
  • Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up
  • Genome analysis allows the targeting of genetic diseases
  • The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
The same techniques can be applied to biotechnology, crop and livestock improvement, etc.
The need for bioinformaticians
  • The number of entries in databases of gene sequences is increasing exponentially.
  • Bioinformaticians are needed to understand and use this information.
[Figure: GenBank growth, 1982 to 1999]
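The exponential growth on this slide can be made concrete with a fixed doubling time. The sketch below is illustrative only: the function name `growth_factor` is hypothetical, and the 14-month default is the approximate GenBank doubling time quoted in the notes for this deck, not a value fitted to the plotted data.

```python
def growth_factor(months, doubling_months=14.0):
    """Factor by which a database grows over `months`,
    assuming it doubles every `doubling_months` (an assumed constant rate)."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    for months in (12, 14, 28, 60):
        print(f"after {months:2d} months: x{growth_factor(months):.2f}")
```

With a 14-month doubling time the database grows by a factor of about 1.8 per year, which is why a collection that looks manageable one year can outgrow manual curation a few years later.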
What can be done using bioinformatics?
  • Sequence analysis: geneticists and molecular biologists analyse genome sequence information to understand disease processes
  • Molecular modeling: crystallographers and biochemists design drugs using computer-aided tools
  • Phylogeny/evolution: geneticists obtain information about the evolution of organisms by looking for similarities in gene sequences
  • Ecology and population studies: bioinformatics is used to handle large amounts of data obtained in population studies
  • Medical informatics: personalised medicine
NCBI (National Center for Biotechnology Information), www.ncbi.nlm.nih.gov
  • Resources shown: Entrez, Protein, DNA, EMBL, DDBJ, GenBank, SRS, Genome, PubMed, Annotation, Medline, PIR, Swiss-Prot, PDB
What can be discovered about a gene by a database search? A little or a lot, depending on the gene.
  • Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.
  • Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.
  • Structural information: associated protein structures, fold types, structural domains
  • Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.
  • Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases
Databases
Three types of databases
  • Primary: sequence database
  • Secondary: annotation
  • Tertiary: structure database
Two other types
  • DNA databases: GenBank, DDBJ, EMBL
  • Protein databases: PIR, Swiss-Prot, MIPS
Biological databanks and databases
  • Very fast growth of biological data
  • Diversity of biological data: primary sequences, 3D structures, functional data
  • Database entry usually required for publication: sequences, structures
  • Database entry may replace primary publication: genomic approaches
  • Bioinformatics
PubMed
 
Sequence analysis: overview
  • Sequence entry: manual sequence entry, sequence database browsing, sequencing project management
  • Nucleotide sequence analysis (from a nucleotide sequence file): search databases for similar sequences; sequence comparison; search for protein coding regions (coding or non-coding); translate into protein; search for known motifs; RNA structure prediction; design further experiments (restriction mapping, PCR planning)
  • Protein sequence analysis (from a protein sequence file): search databases for similar sequences; sequence comparison; search for known motifs; predict secondary structure; predict tertiary structure
  • Multiple sequence analysis: create a multiple sequence alignment; edit the alignment; format the alignment for publication; molecular phylogeny; protein family analysis
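Two of the steps in this overview, sequence entry and "Translate into protein", are easy to illustrate because DNA and protein sequences are plain strings of letters. The sketch below is a minimal example, not a replacement for an established toolkit: all names are hypothetical, it assumes the standard genetic code, reads only the forward strand from the first base, and stops at the first stop codon.

```python
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # 20 one-letter amino acid codes

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_dna(seq):
    """True if seq is non-empty and uses only the letters A, C, G and T."""
    return bool(seq) and set(seq.upper()) <= DNA_ALPHABET


def translate(dna):
    """Translate a DNA string codon by codon until a stop codon or the end."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(is_dna("ATGGCCTAA"), translate("ATGGCCTAA"))
```

A real pipeline would also handle ambiguity codes, the reverse strand and alternative genetic codes, which this sketch deliberately leaves out.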
Sequence comparison
  • Pairwise sequence alignment: Blast (BlastP, BlastN, nBlastP)
  • Multiple sequence alignment: ClustalW, ClustalX
  • User interface: BioEdit, Biology Workbench, CLC Workbench
Click on:
Database Search
 
Multiple Sequence Alignment: Approaches
  • Optimal global alignments (dynamic programming): generalization of Needleman-Wunsch; find the alignment that maximizes a score function; computationally expensive, since time grows as the product of the sequence lengths
  • Global progressive alignments: match closely related sequences first, using a guide tree
  • Global iterative alignments: multiple re-building attempts to find the best alignment
  • Local alignments: profiles, blocks, patterns
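The dynamic programming idea behind the optimal approach is easiest to see in the pairwise case. The following is a minimal sketch of Needleman-Wunsch global alignment for two sequences. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for illustration. The nested loops fill a table with one cell per pair of positions, which is where the "product of the sequence lengths" cost comes from.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # aligning a prefix of a against nothing
        score[i][0] = i * gap
    for j in range(1, cols):          # aligning a prefix of b against nothing
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Extending the same table to N sequences multiplies the cost by another sequence length per added sequence, which is why the progressive and iterative heuristics listed above are used in practice.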
CLUSTALW MSA
Phylogeny inference: Analysis of sequences allows evolutionary relationships to be determined.
[Figure: phylogenetic tree of E. coli, C. botulinum, C. cadavers, C. butyricum, B. subtilis and B. cereus, constructed using the Phylip package]
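Tree-building methods that work from distances start by measuring how different each pair of aligned sequences is. The sketch below computes the simplest such measure, the proportion of differing sites (p-distance), and reports the closest pair. It is not what the Phylip package does internally; the taxon names and sequences are made-up toy data, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in compared) / len(compared)


def closest_pair(alignment):
    """Return the pair of names with the smallest p-distance."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda names: p_distance(alignment[names[0]], alignment[names[1]]),
    )


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy = {
        "taxon_A": "ACGTACGTAC",
        "taxon_B": "ACGTACGTAA",
        "taxon_C": "TCGAACTTAA",
    }
    print(closest_pair(toy))
```

Real analyses usually correct raw distances for multiple substitutions at the same site before building a tree; that step is omitted here.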
Gene prediction software
  • Similarity-based or comparative: BLAST, SGP2 (extension of GeneID)
  • Ab initio ("from the beginning"): GeneID, GENSCAN, GeneMark
  • Combined "evidence-based": GeneSeqer (Brendel et al., ISU)
  • Best: GENSCAN, GeneMark.hmm, GeneSeqer, but this depends on the organism and the specific task
PCR Primer Design: Oligonucleotides for use in the polymerase chain reaction can be designed using computer-based programs.
  • Optimal primer length: 20
  • Minimum primer length: 18
  • Maximum primer length: 22
  • Optimal primer melting temperature: 60.000
  • Minimum acceptable melting temperature: 57.000
  • Maximum acceptable melting temperature: 63.000
  • Minimum acceptable primer GC%: 20.000
  • Maximum acceptable primer GC%: 80.000
  • Salt concentration (mM): 50.000
  • DNA concentration (nM): 50.000
  • Maximum number of unknown bases (Ns) allowed: 0
  • Maximum acceptable self-complementarity: 12
  • Maximum 3' end self-complementarity: 8
  • GC clamp, number of 3' bases: 0
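A few of the criteria above can be checked with very little code. The sketch below screens a candidate primer against the length, GC%, unknown-base and melting-temperature limits from the slide. Two things are assumptions: the function names are hypothetical, and the melting temperature uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), a rough rule of thumb that ignores the salt and DNA concentrations a design program would normally take into account. Self-complementarity is not checked.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in deg C by the Wallace rule (an approximation)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def primer_problems(primer, min_len=18, max_len=22, min_tm=57.0, max_tm=63.0,
                    min_gc=20.0, max_gc=80.0, max_n=0):
    """Return a list of failed criteria; an empty list means the primer passes."""
    if not primer:
        return ["length"]
    problems = []
    if not min_len <= len(primer) <= max_len:
        problems.append("length")
    if primer.upper().count("N") > max_n:
        problems.append("unknown bases")
    if not min_gc <= gc_percent(primer) <= max_gc:
        problems.append("GC%")
    if not min_tm <= wallace_tm(primer) <= max_tm:
        problems.append("melting temperature")
    return problems


if __name__ == "__main__":
    for candidate in ("ATGCATGCATGCATGCATGC", "ATATATATATATATATATAT"):
        print(candidate, primer_problems(candidate) or "passes")
```

Returning the list of failed criteria, rather than a single yes/no, makes it easier to see which limit to relax when no candidate passes.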
Restriction mapping: Genes can be analysed to detect gene sequences that can be cleaved with restriction enzymes.
  Enzyme     Sites  Recognition sequence
  AceIII     1      CAGCTCnnnnnnn’nnn...
  AluI       2      AG’CT
  AlwI       1      GGATCnnnn’n_
  ApoI       2      r’AATT_y
  BanII      1      G_rGCy’C
  BfaI       2      C’TA_G
  BfiI       1      ACTGGG
  BsaXI      1      ACnnnnnCTCC
  BsgI       1      GTGCAGnnnnnnnnnnn...
  BsiHKAI    1      G_wGCw’C
  Bsp1286I   1      G_dGCh’C
  BsrI       2      ACTG_Gn’
  BsrFI      1      r’CCGG_y
  CjeI       2      CCAnnnnnnGTnnnnnn...
  CviJI      4      rG’Cy
  CviRI      1      TG’CA
  DdeI       2      C’TnA_G
  DpnI       2      GA’TC
  EcoRI      1      G’AATT_C
  HinfI      2      G’AnT_C
  MaeIII     1      ’GTnAC_
  MnlI       1      CCTCnnnnnn_n’
  MseI       2      T’TA_A
  MspI       1      C’CG_G
  NdeI       1      CA’TA_TG
  Sau3AI     2      ’GATC_
  SstI       1      G_AGCT’C
  TfiI       2      G’AwT_C
  Tsp45I     1      ’GTsAC_
  Tsp509I    3      ’AATT_
  TspRI      1      CAGTGnn’
[Figure: map of the cutting sites along the sequence, axis from 50 to 250]
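Finding where an enzyme cuts is a string search for its recognition sequence. The sketch below handles four enzymes from the list whose recognition sequences contain no ambiguity codes. It assumes that the ’ mark in the listing shows the cut position on the top strand, which is how the offsets were chosen; the test sequence is made up, the function names are hypothetical, and only a linear sequence read 5' to 3' is considered.

```python
# Recognition sequence and top-strand cut offset (assumed from the ’ marks).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G’AATT_C
    "AluI": ("AGCT", 2),     # AG’CT
    "MspI": ("CCGG", 1),     # C’CG_G
    "NdeI": ("CATATG", 2),   # CA’TA_TG
}


def cut_positions(seq, site, offset):
    """Indices in seq where the top strand is cut, allowing overlapping sites."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, cuts):
    """Split a linear sequence into fragments at the given cut positions."""
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAGAATTCTTAGCTGG"  # invented sequence for illustration
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        found = cut_positions(toy, site, offset)
        print(name, found)
        cuts.extend(found)
    print(digest(toy, cuts))
```

These four sites are palindromic, so searching one strand is enough; enzymes with ambiguous or asymmetric sites would need pattern matching and a reverse-complement search.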
RNA structure prediction: Structural features of RNA can be predicted.
[Figure: predicted secondary structure of an RNA sequence, drawn base by base]
Protein structure: the 3-D structure of proteins is used to understand protein function and design new drugs
Gene sequencing: Automated chemical sequencing methods allow rapid generation of large data banks of gene sequences
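Because each sequencing read is short, long sequences are rebuilt by finding regions that neighbouring reads have in common. The sketch below shows only the core of that idea: the longest exact overlap between the end of one read and the start of the next, and the merge it implies. The function names, the minimum overlap of 3 and the reads are all hypothetical; assembly software must also cope with sequencing errors, reverse-complement reads and repeats.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of left that equals a prefix of right.

    Returns 0 if no overlap of at least min_overlap exists (min_overlap >= 1).
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


if __name__ == "__main__":
    # Invented reads for illustration.
    print(merge_reads("ACGTTGCA", "TGCAGGTC"))
```

Requiring a minimum overlap guards against joining reads that share only a base or two by chance.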
Structural Bioinformatics
Structural Bioinformatics
  • Prediction of structure from sequence: secondary structure; homology modelling, threading; ab initio 3D prediction
  • Analysis of 3D structure: structure comparison/alignment; prediction of function from structure; molecular mechanics/molecular dynamics; prediction of molecular interactions, docking
  • Structure databases (RCSB)
 
Bioinformatics key areas
  • Organisation of knowledge (sequences, structures, functional data), e.g. homology searches
Molecular modeling
  • Homology model / comparative modeling: Modeller, Swiss-PdbViewer, GenTHREADER, MOLMOD
Molecular visualization: RasMol, Cn3D, Jmol, PyMOL
Secondary structure prediction: Jpred, GOR, SOPMA
Tertiary structure prediction: CPHmodels
Active Site Prediction

Project report-on-bio-informatics


Editor's Notes

  • #3: As a result, the last few years have seen an explosion in the field of bioinformatics, a new field of study which combines methods from computer science and information technology to analyze biological information. In its purest definition, bioinformatics is the application of information technology to biology.
  • #4: Rather than sequencing isolated genes, more and more research groups and companies are now focussing on sequencing whole genomes from organisms of medical, commercial or scientific importance. The first bacterium to be completely sequenced was Haemophilus influenzae, in 1995. In 1996, the first complete eukaryotic genome, that of baker’s yeast (Saccharomyces cerevisiae), was published. New complete genomes are now being published every month, and human genome projects, both publicly and privately funded, are well on the way to completion.
  • #5: The new genome technologies coupled with bioinformatics promise a revolution in almost all fields of life sciences and in society. For example, just in the medical sciences: in the pharmaceutical industry, these methods have been embraced as a shortcut to the discovery of better drugs. Knowledge of a protein’s structure can considerably shorten the time taken to develop specific inhibitors of this protein for therapeutic use. The study of how genome variation affects drug effectiveness (pharmacogenomics) is still in its infancy, but promises to deliver more effective and specific therapeutic drugs which are tailored to the individual’s genetic make-up. A knowledge of the genome also facilitates the targeting of genetic diseases by drug or gene therapy. Genome analysis also provides the framework for the study of gene and protein expression using DNA microarray technology or 2-dimensional gel electrophoresis, with broad-ranging applications. These techniques can be applied not only in the medical sciences, but also in agriculture, biotechnology, etc.
  • #6: The last 10 years have seen recombinant DNA techniques pervade the whole of biology and biology-related fields. The use of plasmids, restriction enzymes, DNA sequencing methods and, more recently, PCR has allowed the cloning and characterization of many genes and of their protein products. The growth in DNA sequence data available to researchers is phenomenal. For example, GenBank, a major database where molecular biologists store the DNA sequences they obtain and make them available, doubles in size approximately every 14 months. At the beginning of 1999, GenBank contained over 3 million sequence records, and grew at a rate in excess of a million nucleotides deposited per day. GenBank is shown here as an example, but other sequence databases would grow at similar rates. Source: GenBank release notes, National Center for Biotechnology Information (http://ncbi.nlm.nih.gov/)
  • #7: As the application of information technology to biology, bioinformatics pervades the whole of biology, including genetics, biochemistry, ecology and medicine. However, much of the publicity and emphasis which bioinformatics has received in the last few years has been on DNA and protein sequence analysis. Given the large amount of sequence data available and the rate at which it is growing, this is where the need for computer analysis has been felt the most. DNA and protein sequences are particularly amenable to computer analysis, since they can be represented by strings of letters, which computers handle very well. A DNA sequence is a string over an alphabet of 4 letters (A, C, G and T), and a protein sequence is a string over an alphabet of 20 letters, each of which represents an amino acid.
  • #14: The next part of the lecture uses flowcharts to outline a range of procedures commonly used in computer-assisted biomolecular sequence analysis. This rather complicated flowchart summarizes this whole section of the lecture. The flowchart will be divided into four sections: sequence entry (getting the sequence into the computer), nucleotide sequence analysis, protein sequence analysis, and multiple sequence analysis (working with multiple sequence alignments). Each step of the flowchart will be examined in turn.
  • #18: 1 caagtcttct ttctccaagg aggatatgaa gcgttttcgg cttcctgccc tgagctgtgc 61 agcaaacagt ccacccccat ggggctcagc ctcccgctga gtactagtgt gcctgacagt 121 gcagaatccg gatgcagctc ctgtagcacc cctctctacg accagggggg cccagtggag 181 atcctgtcct tcctgtacct gggcagtgct taccatgctt cccggaaaga tatgctcgac 241 gccttgggta tcactgcttt gatcaacgtc tcggccaatt gtcctaacaa ctttgagggt 301 cactaccagt acaagagcat ccctgtggag gacaaccaca aggcagacat cagctcctgg 361 ttcaacgagg cgattgactt tatagactcc atcaaggatg ctggaggaag ggtgtttgtg 421 cactgccagg ccggcatctc caggtcagcc accatctgcc ttgcttacct catgaggact 481 aaccgagtga agctggacga ggcctttgag tttgtgaagc a
  • #21: Multiple sequence alignments can therefore be used as input to create phylogenetic trees representing possible evolutionary relationships. The principle is that the more closely related two species are, the more similar their homologous sequences will be (in general; there are many exceptions). For example, according to the above tree, B. subtilis and B. cereus are more closely related to each other than to C. botulinum, C. cadavers, C. butyricum or E. coli. This tree was created from an alignment of the 16S ribosomal RNA sequences from the various bacteria. Further reading: molecular phylogeny is a very large field in itself, with a lot of associated literature. A good introduction to the field can be found in: Swofford, Olsen, Waddell and Hillis (1996) “Phylogenetic inference” in Molecular Systematics (2nd ed.), D.M. Hillis, C. Moritz and B.K. Mable, eds., Sinauer Associates, Inc., Sunderland MA, USA
  • #23: PCR planning programs let the user specify criteria such as primer length, melting temperature, GC content, etc.
  • #24: This type of display is produced by the program mapplot, part of the GCG package. It lists the restriction enzymes which cut a particular sequence (together with their recognition sequence) and creates a graphical representation of the sequence with the cutting sites marked along a line representing the sequence. This type of image is useful for finding suitable restriction enzymes for subcloning a particular sequence fragment, or for producing a distinctive restriction pattern for in vitro diagnostic procedures. Labels on the display: enzyme name, recognition sequence, cutting sites.
  • #25: This transfer RNA cloverleaf structure was predicted for a tRNA sequence using Michael Zuker’s program mfold, which has been incorporated in the GCG package. Further reading: M. Zuker, D.H. Mathews & D.H. Turner (1999) “Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide” in RNA Biochemistry and Biotechnology, J. Barciszewski & B.F.C. Clark, eds., NATO ASI Series, Kluwer Academic Publishers. Also available online at http://www.ibc.wustl.edu/~zuker/seqanal/
  • #26: There are several approaches to building a 3-dimensional model for a protein. Homology modeling uses sequence similarity to map a sequence onto the known structure of a similar sequence (for example, using BLAST to search the PDB database). Profiling involves converting known structures into 3D profiles where the residue preference for each position is classified according to secondary structure (helix, strand, coil) and hydrophobicity/accessibility (exposed, partially exposed, buried). The query sequence can then be mapped onto a library of 3D profiles and the best matching profiles are selected. Threading also involves mapping a sequence onto a library of structures, but only structural information is used: instead of sequence similarity, pseudo-potential energy functions are used to evaluate residue-residue interactions. The query sequence is “threaded” through the various potential structures in the library, and the folds yielding the lowest interaction energy when the sequence is mapped onto them are selected. For example, a fold which brings two residues of opposite charge close together will be considered a better fit than a fold which brings together two residues of the same charge or two large residues which would cause a steric clash. (Slide and notes courtesy of Dr Shoba Ranganathan, Australian Genomic Information Centre)
  • #27: Large-scale sequencing projects make use of automated sequencing machines connected to a computer. Because the sequencing machines are typically limited to 300-600 nucleotides per reading, it is often necessary to break down large sequences into fragments, sequence these fragments, then reconstruct the original complete sequence by searching for regions in common between the gel readings, using specialized software. This picture shows some windows from gap4, a sequencing project management program which is part of the Staden package. This type of software helps in the management of sequencing projects not only by assembling gel readings but also by searching for and removing vector sequences, repeat sequences and poor-quality sequence regions, which can cause problems when assembling the fragments. Further reading: Staden, R., Beal, K.F. and Bonfield, J.K. (1998) The Staden Package, in Computer Methods in Molecular Biology, Stephen Misener and Steve Krawetz, eds., The Humana Press Inc., Totowa, NJ 07512. Also available at: http://www.mrc-lmb.cam.ac.uk/pubseq/methods_in_mol_biol/index.html