Project Report on Bioinformatics

Bioinformatics – A Brief overview

What is bioinformatics? The application of information technology to the storage, management and analysis of biological information, facilitated by the use of computers.

Publicly available genomes (April 1998)
Complete/public: Aquifex aeolicus, Pyrococcus horikoshii, Bacillus subtilis, Treponema pallidum, Borrelia burgdorferi, Helicobacter pylori, Escherichia coli, Mycoplasma pneumoniae, Saccharomyces cerevisiae, Mycoplasma genitalium, Haemophilus influenzae
Complete/pending publication: Rickettsia prowazekii, Pseudomonas aeruginosa, Pyrococcus abyssi, Bacillus sp. C-125, Ureaplasma urealyticum, Pyrobaculum aerophilum
Almost public: Pyrococcus furiosus, Mycobacterium tuberculosis H37Rv, Mycobacterium tuberculosis CSU93, Neisseria gonorrhoeae, Neisseria meningitidis, Streptococcus pyogenes

Promises of genomics and bioinformatics
In medicine:
- Knowledge of protein structure facilitates drug design.
- Understanding genomic variation allows medical treatment to be tailored to the individual’s genetic make-up.
- Genome analysis allows genetic diseases to be targeted.
- The effects of a disease or of a therapeutic on RNA and protein levels can be elucidated.
The same techniques can be applied to biotechnology, crop and livestock improvement, and other fields.

The need for bioinformaticians
The number of entries in gene sequence databases is increasing exponentially, and bioinformaticians are needed to understand and use this information.
[Figure: GenBank growth, 1982 to 1999]
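Exponential growth has a useful consequence: a constant growth rate implies a constant doubling time. The relation below is general (N_0 is the number of entries at t = 0 and k an assumed constant growth rate); it is not fitted to actual GenBank figures.

```latex
N(t) = N_0\, e^{k t}, \qquad
N(t + T_d) = 2\,N(t) \;\Longrightarrow\; e^{k T_d} = 2 \;\Longrightarrow\; T_d = \frac{\ln 2}{k}
```

For example, a database growing at a hypothetical rate of k = 0.5 per year would double roughly every ln 2 / 0.5 ≈ 1.39 years.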

What can be done using bioinformatics?
- Sequence analysis: geneticists and molecular biologists analyse genome sequence information to understand disease processes.
- Molecular modeling: crystallographers and biochemists design drugs using computer-aided tools.
- Phylogeny/evolution: geneticists obtain information about the evolution of organisms by looking for similarities in gene sequences.
- Ecology and population studies: bioinformatics is used to handle the large amounts of data obtained in population studies.
- Medical informatics: personalised medicine.

NCBI (National Center for Biotechnology Information): www.ncbi.nlm.nih.gov
Related resources: Entrez (protein, DNA), EMBL, DDBJ, GenBank, SRS, Genome, PubMed, annotation, MEDLINE, PIR, Swiss-Prot, PDB
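Entrez databases can also be queried programmatically through NCBI's E-utilities. A minimal sketch, assuming the ESearch endpoint and its `db`, `term` and `retmax` parameters; the function name `esearch_url` is hypothetical, and the code only builds the query URL without contacting the server.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (returns matching record IDs for a query).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for an Entrez database such as 'pubmed' or 'nucleotide'."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces and special characters in the search term.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Building the URL needs no network access; fetching it (for example with
    # urllib.request.urlopen) does.
    print(esearch_url("pubmed", "Helicobacter pylori genome", retmax=5))
```

Keeping URL construction separate from fetching makes the query easy to inspect and test offline.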

What can be discovered about a gene by a database search?
A little or a lot, depending on the gene:
- Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.
- Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.
- Structural information: associated protein structures, fold types, structural domains.
- Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.
- Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases.

Databases
Three types of databases:
- Primary: sequence databases
- Secondary: annotation databases
- Tertiary: structure databases
Databases can also be grouped by the type of molecule:
- DNA databases: GenBank, DDBJ, EMBL
- Protein databases: PIR, Swiss-Prot, MIPS
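Sequences from DNA and protein databases are commonly downloaded in FASTA format: a `>` header line followed by one or more sequence lines. A minimal parser sketch; `parse_fasta` is a hypothetical helper, and it assumes the record identifier is the first word of the header.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    Assumes the identifier is the first whitespace-delimited word after '>'.
    A repeated identifier replaces the earlier record.
    """
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current].append(line.upper())
    # Sequence lines are usually wrapped, so join them per record.
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 example nucleotide record
ATGGCC
attgta
>seq2
MKV
"""
print(parse_fasta(example.splitlines()))  # {'seq1': 'ATGGCCATTGTA', 'seq2': 'MKV'}
```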

Biological databanks and databases
Biological data are growing very fast and are diverse: primary sequences, 3D structures and functional data. Database entry of sequences and structures is usually required for publication, and in genomic approaches a database entry may replace primary publication.

Sequence analysis: overview
Nucleotide sequence analysis:
- Sequence entry: manual sequence entry, sequence database browsing, sequencing project management, giving a nucleotide sequence file.
- Search databases for similar sequences; sequence comparison; multiple sequence analysis.
- Design further experiments: restriction mapping, PCR planning.
- Search for known motifs.
- Coding sequences: search for protein coding regions and translate into protein.
- Non-coding sequences: RNA structure prediction.
Protein sequence analysis (starting from a protein sequence file):
- Search databases for similar sequences; sequence comparison; search for known motifs.
- Predict secondary structure; predict tertiary structure.
- Create a multiple sequence alignment, edit the alignment and format it for publication.
- Molecular phylogeny; protein family analysis.
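The "translate into protein" step uses the genetic code. A minimal sketch, assuming the standard code (NCBI translation table 1) and a single forward reading frame; `translate` is a hypothetical helper, and codons containing ambiguous bases become `X`.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0, to_stop: bool = False) -> str:
    """Translate DNA or RNA in one forward frame (0, 1 or 2); '*' marks stop codons."""
    dna = seq.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        if to_stop and aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR*KGAR*
```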

Sequence comparison
- Pairwise sequence alignment: BLAST (BLASTP, BLASTN and related programs)
- Multiple sequence alignment: ClustalW, ClustalX
- User interfaces: BioEdit, Biology Workbench, CLC Workbench
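Pairwise global alignment by dynamic programming is the core that exact multiple-alignment methods generalize. The sketch below implements Needleman-Wunsch with a linear gap penalty; the scores (match +1, mismatch -1, gap -1) are illustrative assumptions. BLAST and ClustalW use substitution matrices such as BLOSUM, and BLAST is a heuristic local rather than global alignment method.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


s, x, y = needleman_wunsch("GATTACA", "GCATGCU")
print(s)
print(x)
print(y)
```

With these scores, GATTACA against GCATGCU has an optimal score of 0; several alignments share that score, and the traceback returns one of them.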

Multiple sequence alignment: approaches
- Optimal global alignments: dynamic programming, a generalization of Needleman-Wunsch that finds the alignment maximizing a score function. This is computationally expensive: time grows as the product of the sequence lengths.
- Global progressive alignments: closely related sequences are matched first, following a guide tree.
- Global iterative alignments: multiple re-building attempts to find the best alignment.
- Local alignments: profiles, blocks, patterns.
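The cost of the exact dynamic-programming approach follows from the size of its table. For N sequences of lengths L_1, ..., L_N the table has one cell per combination of prefix lengths, and each cell considers every non-empty subset of sequences advancing by one residue (2^N - 1 predecessor moves):

```latex
\text{cells} = \prod_{i=1}^{N} (L_i + 1), \qquad
\text{time} = O\!\left(2^{N} \prod_{i=1}^{N} L_i\right) \;\xrightarrow{\;L_i = L\;}\; O\!\left(2^{N} L^{N}\right)
```

Two sequences of length 100 need about 10^4 cells; ten such sequences need about 100^{10} = 10^{20}, which is why progressive and iterative heuristics are used in practice.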

Phylogeny inference: analysis of sequences allows evolutionary relationships to be determined.
[Figure: phylogenetic tree of E. coli, C. botulinum, C. cadaveris, C. butyricum, B. subtilis and B. cereus, constructed using the PHYLIP package]
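Distance-based tree-building methods, such as neighbor-joining and UPGMA (both available in PHYLIP), start from a matrix of pairwise evolutionary distances. A minimal sketch of that first step, assuming already aligned DNA sequences and the Jukes-Cantor substitution model, d = -3/4 ln(1 - 4p/3); the function names and toy sequences are hypothetical, and this is not necessarily the method used for the tree described above.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def p_distance(s1: str, s2: str) -> float:
    """Proportion of differing sites between two aligned sequences (gapped sites skipped)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor(s1: str, s2: str) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4/3 * p)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        return math.inf  # too divergent: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Jukes-Cantor distances for every pair of named, aligned sequences."""
    return {(x, y): jukes_cantor(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}


# Hypothetical toy alignment, not real data for the species in the tree.
toy = {"taxonA": "ACGTACGTAC", "taxonB": "ACGTACGTTC", "taxonC": "ACCTACGATC"}
for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 4))
```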

Gene prediction software
- Similarity-based or comparative: BLAST, SGP2 (an extension of GeneID)
- Ab initio ("from the beginning"): GeneID, GENSCAN, GeneMark
- Combined, "evidence-based": GeneSeqer (Brendel et al., ISU)
Among the best-performing tools are GENSCAN, GeneMark.hmm and GeneSeqer, but the best choice depends on the organism and the specific task.
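The simplest form of ab initio gene finding is scanning for open reading frames (ORFs). Tools such as GENSCAN and GeneMark rely on statistical models such as hidden Markov models instead; this sketch only reports ATG-to-stop stretches on the forward strand (the reverse complement would need a second pass), and `find_orfs` and its `min_codons` cutoff are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Find ATG...stop open reading frames on the forward strand.

    Returns (start, end, frame) with 0-based, end-exclusive coordinates.
    min_codons counts codons including the stop codon; ORFs without a stop are ignored.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG after this stop
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 14, 2)]
```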

PCR primer design: oligonucleotides for use in the polymerase chain reaction can be designed using computer-based programs. Example parameter settings:
Optimal primer length: 20 bases
Minimum primer length: 18 bases
Maximum primer length: 22 bases
Optimal primer melting temperature: 60.0 °C
Minimum acceptable melting temperature: 57.0 °C
Maximum acceptable melting temperature: 63.0 °C
Minimum acceptable primer GC%: 20.0
Maximum acceptable primer GC%: 80.0
Salt concentration: 50.0 mM
DNA concentration: 50.0 nM
Maximum number of unknown bases (Ns) allowed: 0
Maximum acceptable self-complementarity: 12
Maximum 3' end self-complementarity: 8
GC clamp (number of 3' bases): 0
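The parameter settings can be turned into a simple primer filter. A minimal sketch, assuming the listed limits as defaults and the basic melting-temperature approximation Tm = 64.9 + 41 × (nGC - 16.4) / N. This formula ignores the salt and DNA concentrations that design programs use in nearest-neighbor calculations, and self-complementarity checks are omitted; `PrimerLimits` and `check_primer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PrimerLimits:
    """Defaults from the parameter settings (lengths in bases, Tm in °C)."""
    min_len: int = 18
    max_len: int = 22
    min_tm: float = 57.0
    max_tm: float = 63.0
    min_gc: float = 20.0
    max_gc: float = 80.0
    max_ns: int = 0


def gc_percent(primer: str) -> float:
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def basic_tm(primer: str) -> float:
    """Rough Tm estimate: 64.9 + 41 * (nGC - 16.4) / N (no salt or concentration terms)."""
    p = primer.upper()
    n_gc = p.count("G") + p.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(p)


def check_primer(primer: str, limits: PrimerLimits = PrimerLimits()) -> List[str]:
    """Return the names of the limits the primer violates (empty list = acceptable)."""
    p = primer.upper()
    problems = []
    if not limits.min_len <= len(p) <= limits.max_len:
        problems.append("length")
    if p.count("N") > limits.max_ns:
        problems.append("unknown bases")
    if not limits.min_gc <= gc_percent(p) <= limits.max_gc:
        problems.append("GC%")
    if not limits.min_tm <= basic_tm(p) <= limits.max_tm:
        problems.append("Tm")
    return problems


print(check_primer("GGCCGGCCGGCCGAAAAAAA"))  # []
print(check_primer("ATGCATGCATGCATGCATGC"))  # ['Tm']
```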

Restriction mapping: DNA sequences can be analysed to detect sites that can be cleaved by restriction enzymes. Recognition sequences use ’ for the cut on the top strand and _ for the cut on the bottom strand; n = any base, r = A/G, y = C/T, w = A/T, s = C/G, d = A/G/T, h = A/C/T.
Enzyme     Sites  Recognition sequence
AceIII     1      CAGCTCnnnnnnn’nnn...
AluI       2      AG’CT
AlwI       1      GGATCnnnn’n_
ApoI       2      r’AATT_y
BanII      1      G_rGCy’C
BfaI       2      C’TA_G
BfiI       1      ACTGGG
BsaXI      1      ACnnnnnCTCC
BsgI       1      GTGCAGnnnnnnnnnnn...
BsiHKAI    1      G_wGCw’C
Bsp1286I   1      G_dGCh’C
BsrI       2      ACTG_Gn’
BsrFI      1      r’CCGG_y
CjeI       2      CCAnnnnnnGTnnnnnn...
CviJI      4      rG’Cy
CviRI      1      TG’CA
DdeI       2      C’TnA_G
DpnI       2      GA’TC
EcoRI      1      G’AATT_C
HinfI      2      G’AnT_C
MaeIII     1      ’GTnAC_
MnlI       1      CCTCnnnnnn_n’
MseI       2      T’TA_A
MspI       1      C’CG_G
NdeI       1      CA’TA_TG
Sau3AI     2      ’GATC_
SstI       1      G_AGCT’C
TfiI       2      G’AwT_C
Tsp45I     1      ’GTsAC_
Tsp509I    3      ’AATT_
TspRI      1      CAGTGnn’
[Figure: restriction map of the sequence, position scale 50 to 250]
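Scanning a sequence for recognition sites can be done by converting each site, written with IUPAC ambiguity codes (n, r, y, w, ...), into a regular expression. A minimal sketch, assuming fully specified sites (truncated entries ending in `...` are not supported) and searching only the given strand, which suffices for palindromic sites such as EcoRI but not for asymmetric ones such as BfiI; the function names are hypothetical.

```python
import re
from typing import List

# IUPAC nucleotide codes as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_to_regex(site: str) -> "re.Pattern[str]":
    """Convert a site such as "G’AATT_C" into a regex, dropping the cut markers."""
    core = site.upper().replace("’", "").replace("'", "").replace("_", "")
    body = "".join(IUPAC[base] for base in core)  # KeyError for unsupported symbols
    # A lookahead lets overlapping occurrences all be reported.
    return re.compile(f"(?=({body}))")


def find_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of every match of the site on the given strand."""
    return [m.start() for m in site_to_regex(site).finditer(seq.upper())]


print(find_sites("GGAATTCCGAATTC", "G’AATT_C"))  # EcoRI: [1, 8]
```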

RNA structure prediction: structural features of RNA can be predicted.
[Figure: predicted RNA secondary structure]
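One classic way to predict RNA secondary structure is the Nussinov algorithm, which maximizes the number of nested base pairs by dynamic programming. Practical tools minimize free energy with thermodynamic parameters instead, so treat this as a teaching sketch; the minimum hairpin loop of 3 nucleotides and the inclusion of G-U wobble pairs are assumptions.

```python
from typing import List, Tuple

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(rna: str, min_loop: int = 3) -> Tuple[int, str]:
    """Maximize nested base pairs; return (pair count, dot-bracket structure)."""
    seq = rna.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of pairs in seq[i..j]; spans too short to pair stay 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    pairs: List[Tuple[int, int]] = []

    def trace(i: int, j: int) -> None:
        """Follow the choices that produced dp[i][j] and record the pairs."""
        if j - i <= min_loop:
            return
        if dp[i][j] == dp[i + 1][j]:
            trace(i + 1, j)
        elif dp[i][j] == dp[i][j - 1]:
            trace(i, j - 1)
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            pairs.append((i, j))
            trace(i + 1, j - 1)
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    trace(i, k)
                    trace(k + 1, j)
                    return

    trace(0, n - 1)
    structure = ["."] * n
    for i, j in pairs:
        structure[i], structure[j] = "(", ")"
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```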

Protein structure: the 3-D structure of proteins is used to understand protein function and to design new drugs.

Gene sequencing: automated chemical sequencing methods allow rapid generation of large data banks of gene sequences.

Structural bioinformatics
- Prediction of structure from sequence: secondary structure; homology modelling, threading; ab initio 3D prediction.
- Analysis of 3D structure: structure comparison/alignment; prediction of function from structure; molecular mechanics/molecular dynamics; prediction of molecular interactions, docking.
- Structure databases (RCSB).
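Structure comparison is commonly summarized by the root-mean-square deviation (RMSD) between matched atoms. A minimal sketch, assuming the two coordinate lists are already superposed and matched atom-for-atom; a real comparison would first find the optimal rotation, for example with the Kabsch algorithm. `rmsd` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between matched atoms; assumes the structures are already superposed."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


print(rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))  # 1.0
```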

Key areas of bioinformatics: organisation of knowledge (sequences, structures, functional data), e.g. homology searches.

Molecular modeling: homology modeling (comparative modeling). Tools: MODELLER, Swiss-PdbViewer, Genetraeder, MOLMOD.

Molecular visualization: RasMol, Cn3D, Jmol, PyMOL

Secondary structure prediction: JPred, GOR, SOPMA

Tertiary structure prediction: CPHmodels
