Introduction to Bioinformatics. Shivani Chandra, The Birla Institute of Scientific Research
What is Bioinformatics? Bioinformatics is the development and use of computer applications for the analysis, interpretation, simulation, and prediction of biological systems and the corresponding experimental methods in the natural sciences.
What is bioinformatics? The interface of biology and computers. The analysis of proteins, genes, and genomes using computer algorithms and computer databases. Genomics is the analysis of genomes. The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects.
History of Bioinformatics: Biologists were searching for algorithms to analyze and interpret their huge amounts of empirical biological data. Computer-aided modeling and simulation. International molecular biology databases arose to make data internationally accessible and comparable.
History of Bioinformatics: Algorithms for gene and protein prediction were developed. These efforts led to the development of artificial neural networks, genetic algorithms, and evolution strategies.
 
Bioinformatics offers an ever more essential input to: molecular biology; pharmacology (drug design); agriculture; biotechnology; clinical medicine; forensic science; chemical industries (detergent industries, etc.).
The Central Dogma
Central Dogma: DNA → (transcription) → RNA → (translation) → protein. DNA: ATG CTA CTT CAC TGA. RNA: AUG CUA CUU CAC UGA. Protein: M L L H.
Anatomy of a Gene: promoter, introns, exons.
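The transcription and translation steps on this slide can be traced with a small sketch. It assumes the slide's DNA is the coding strand (so transcription only swaps T for U), and the codon table below is deliberately limited to the five codons shown on the slide; the function names are illustrative, not from any library.

```python
# Partial codon table: only the codons that appear on the slide.
# (Assumption for brevity; the standard genetic code has 64 codons.)
CODON_TABLE = {"AUG": "M", "CUA": "L", "CUU": "L", "CAC": "H", "UGA": "*"}


def transcribe(dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Read the mRNA codon by codon and stop at the first stop codon ('*')."""
    protein = []
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGCTACTTCACTGA"
rna = transcribe(dna)
print(rna)             # AUGCUACUUCACUGA
print(translate(rna))  # MLLH
```

The final codon, UGA, is a stop codon, which is why the slide lists four amino acids for five codons.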
DNA to RNA to Protein
Molecular Sequences: There are two primary types: DNA (4 nucleotides: A, C, G, T) and amino acid (20 residues). Strings of nucleotides can form genes, most of which code for the production of chains of amino acids called proteins.
Proteins: Proteins have a variety of roles that they must fulfill. They are the enzymes that rearrange chemical bonds. They carry signals to and from the outside of the cell, and within the cell. They transport small molecules. They form many of the cellular structures. They regulate cell processes, turning them on and off and controlling their rates.
Proteins: Amino Acids. There are 20 different types of amino acids. Different sequences of amino acids fold into different 3-D shapes. Proteins can range from fewer than 20 to more than 5000 amino acids in length. Each protein that an organism can produce is encoded in a piece of the DNA called a “gene”.
Proteins: Amino Acids. The single-celled bacterium E. coli has about 4300 different genes. The properties of amino acids play a role in the construction of 3-D structures in proteins.
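As a rough illustration of the two alphabets, the sketch below guesses whether a string is a DNA or a protein sequence. The function name and the "check DNA first" rule are assumptions made for this example: a string such as ACGT is also a valid run of amino acids (Ala, Cys, Gly, Thr), so the guess is only a heuristic.

```python
DNA_ALPHABET = set("ACGT")                      # 4 nucleotides
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard residues (one-letter codes)


def guess_sequence_type(seq: str) -> str:
    """Return 'DNA', 'protein', or 'unknown' based on the letters used."""
    letters = set(seq.upper())
    if not letters:
        return "unknown"
    if letters <= DNA_ALPHABET:       # every letter is A, C, G, or T
        return "DNA"
    if letters <= PROTEIN_ALPHABET:   # every letter is a standard residue
        return "protein"
    return "unknown"


print(guess_sequence_type("ATGCTACTTCACTGA"))
print(guess_sequence_type("MLLH"))
```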
 
 
In Summary: DNA sequence determines protein sequence. Protein sequence determines protein structure. Protein structure determines protein function.
Databases in Bioinformatics: There are three major public DNA databases: GenBank, EMBL, and DDBJ. The underlying raw DNA sequences are identical.
There are three major public DNA databases: GenBank, housed at NCBI (National Center for Biotechnology Information); EMBL, housed at EBI (European Bioinformatics Institute); and DDBJ, housed in Japan.
More than 100,000 species are represented in GenBank: all species, 128,941; viruses, 6,137; bacteria, 31,262; archaea, 2,100; eukaryota, 87,147.
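A quick arithmetic check of these counts, as a sketch: the four named groups do not quite add up to the "all species" total. The slide does not say what the remainder is, so the label "other" below is only an assumption.

```python
ALL_SPECIES = 128_941
groups = {"viruses": 6_137, "bacteria": 31_262, "archaea": 2_100, "eukaryota": 87_147}

named_total = sum(groups.values())
other = ALL_SPECIES - named_total  # not broken down on the slide

for name, count in groups.items():
    print(f"{name:10s} {count:7,d}  {count / ALL_SPECIES:6.1%}")
print(f"{'other':10s} {other:7,d}  {other / ALL_SPECIES:6.1%}")
```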
 
 
The most sequenced organisms in GenBank: Homo sapiens (6.9 million entries); Mus musculus (5.0 million); Zea mays (896,000); Rattus norvegicus (819,000); Gallus gallus (567,000); Arabidopsis thaliana (519,000); Danio rerio (492,000); Drosophila melanogaster (350,000); Oryza sativa (221,000).
National Center for Biotechnology Information (NCBI) www.ncbi.nlm.nih.gov
 
PubMed is… the National Library of Medicine's search service; 11 million citations in MEDLINE; links to participating online journals; a PubMed tutorial (via “Education” on the sidebar).
Entrez  integrates… the scientific literature;  DNA and protein sequence databases;  3D protein structure data;  population study data sets;  assemblies of complete genomes
Entrez is a search and retrieval system  that integrates NCBI databases
BLAST is… the Basic Local Alignment Search Tool; NCBI's sequence similarity search tool; supports analysis of DNA and protein databases; 80,000 searches per day.
OMIM is… Online Mendelian Inheritance in Man; a catalog of human genes and genetic disorders; edited by Dr. Victor McKusick and others at JHU.
Books is… a searchable resource of on-line books.
TaxBrowser is… a browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses); taxonomy information such as genetic codes; molecular data on extinct organisms.
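To give a feel for what a sequence similarity search does, here is a heavily simplified sketch of the "seed and extend" idea that BLAST is built around: find short exact word matches between a query and a subject, then grow each match outward. This is not BLAST's implementation; real BLAST scores extensions with substitution matrices and a drop-off threshold, whereas this toy version extends only while characters match exactly. All function names are hypothetical.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, k: int = 4):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    seeds = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query: str, subject: str, i: int, j: int, k: int) -> str:
    """Extend a seed left and right without gaps while the letters stay identical."""
    start_q, start_s = i, j
    while start_q > 0 and start_s > 0 and query[start_q - 1] == subject[start_s - 1]:
        start_q -= 1
        start_s -= 1
    end_q, end_s = i + k, j + k
    while end_q < len(query) and end_s < len(subject) and query[end_q] == subject[end_s]:
        end_q += 1
        end_s += 1
    return query[start_q:end_q]


query, subject = "GGATGCTACC", "TTATGCTATT"
seeds = find_seeds(query, subject, k=4)
print(seeds)
print(extend_seed(query, subject, *seeds[0], 4))
```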
Question #1: How can I use  PubMed at NCBI to find literature information?
PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations  and author abstracts from over 4,000 journals  published in the United States and in 70 foreign  countries.  It has 12 million records dating back to 1966.
MeSH is the acronym for "Medical Subject Headings." MeSH is the list of vocabulary terms used for subject analysis of biomedical literature at NLM. MeSH vocabulary is used for indexing journal articles for MEDLINE. The MeSH controlled vocabulary imposes uniformity and consistency on the indexing of biomedical literature.
 
 
PubMed search strategies: Try the tutorial (“Education” on the left sidebar). Use Boolean queries, e.g. lipocalin AND disease. Try using “limits”. Try “LinkOut” to find external resources. Obtain articles on-line via Welch Medical Library (and download pdf files): http://www.welch.jhu.edu/
Sequence Databases: GenBank -- DNA sequences and derived protein sequences; EMBL -- DNA sequences and derived protein sequences; DDBJ -- DNA sequences and derived protein sequences; SWISS-PROT -- protein sequences; PDB -- three-dimensional structures of proteins.
GenBank, EMBL & DDBJ: GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.
GenBank, EMBL & DDBJ: GenBank Release 122.0 (Feb. 15, 2001): 10,897,000 sequence records; 11,720,000,000 bases. EMBL Release 66 (Mar. 2, 2000): 11,169,673 sequence records; 11,916,112,872 bases. DDBJ: the center operating DDBJ at the National Institute of Genetics (NIG), Japan, was established in April 1995.
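The Boolean query on this slide can also be sent to PubMed programmatically. The sketch below only builds the request URL for NCBI's E-utilities ESearch service, which the slides themselves do not mention; the helper name and the default of 20 results are assumptions for illustration, and no network request is made.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (not part of the original slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(terms, operator="AND", retmax=20):
    """Join search terms with a Boolean operator and build an ESearch URL."""
    query = f" {operator} ".join(terms)
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


print(pubmed_search_url(["lipocalin", "disease"]))
```

Opening the printed URL in a browser, or fetching it with an HTTP client, should return the matching PubMed identifiers as XML.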
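The release figures give a rough sense of how long a typical database entry was at the time. This sketch divides bases by records; it assumes the two unlabeled EMBL numbers are records and bases, in the same order as the GenBank figures.

```python
# (sequence records, bases) per release, as given on the slide.
releases = {
    "GenBank 122.0": (10_897_000, 11_720_000_000),
    "EMBL 66": (11_169_673, 11_916_112_872),  # assumed to be records, then bases
}

average_length = {
    name: bases / records for name, (records, bases) in releases.items()
}

for name, length in average_length.items():
    print(f"{name}: about {length:.0f} bases per record")
```

Both averages come out a little above 1,000 bases per record.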
Next Topic : Protein Databases



Editor's Notes

  • #10 (Central Dogma): DNA sequences of genes are rarely of any functional value alone. It is the proteins that they encode that are important to the organism. The process of reading the code in DNA and converting that code into a functional protein is highly conserved across almost all branches of life. An RNA-based copy of a gene’s DNA sequence on a chromosome is constructed by an enzyme called RNA polymerase through a process called transcription. This RNA molecule is then read by ribosomes, which assemble amino acids into polypeptide chains. This latter process is known as translation. To summarize: DNA sequences are transcribed into RNA sequences, which are then translated into proteins.
  • #11 (Anatomy of a Gene): A gene sequence is not simply a series of codons. Instead, there are several key components. Promoter sequences assist the RNA polymerase in attaching itself to the DNA sequence template. Once the DNA sequence is transcribed, processing still remains. One of the most unexpected findings in the history of molecular genetics was the discovery that genes are split into pieces. Exons composed of codons are often interrupted by intron sequences that do not encode amino acids. Before translation can occur, the intron sequences must be spliced out of the RNA. The exons are then joined together for translation into proteins.
  • #12 (DNA to RNA to Protein): Here we see a representation of the steps involved in creating a protein from a DNA sequence.
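The splicing step described in this note can be sketched as string slicing. The pre-mRNA below is a made-up toy sequence (its exons reuse the codons from the Central Dogma slide, and the intron is invented so that it begins with GU and ends with AG, as most introns do); the exon coordinates and the function name are likewise assumptions for illustration.

```python
def splice(pre_mrna: str, exons) -> str:
    """Join exon segments, given as half-open (start, end) intervals, in order."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


exon1, intron, exon2 = "AUGCUA", "GUAAGUUUCAG", "CUUCACUGA"
pre_mrna = exon1 + intron + exon2

# Exon coordinates within the toy pre-mRNA; the intron occupies positions 6..16.
exons = [(0, 6), (17, 26)]

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)
```

Once the intron is removed, the mature mRNA reads as an uninterrupted series of codons ready for translation.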