Introduction to Databases
INTRODUCTION
DATA
Data is raw, unorganized facts that need to be processed.
Example:- Each student's test score is one piece of data.
INFORMATION
When data is processed, organized, structured or presented in a given context
so as to make it useful, it is called information.
Example:- The average score of a class, or of the entire school, is information that
can be derived from the given data.
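The step from data to information can be sketched in a few lines of Python. The class names and scores below are made up for illustration; only the idea (raw scores in, averages out) comes from the example above.

```python
from statistics import mean

# Hypothetical raw data: one test score per student, grouped by class.
scores_by_class = {
    "Class A": [72, 85, 90],
    "Class B": [64, 78, 80, 94],
}

# Processing the data yields information: an average per class...
class_averages = {name: mean(scores) for name, scores in scores_by_class.items()}

# ...and an average over the entire school.
all_scores = [s for scores in scores_by_class.values() for s in scores]
school_average = mean(all_scores)

print(class_averages)
print(school_average)
```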
Database
• A database is a collection of data organized so that it can be
accessed in various ways.
• Biological databases serve a critical purpose in the
collection and organization of data related to biological
systems.
• They provide computational support and a user-friendly
interface that let a researcher analyse biological data
meaningfully.
• A database is a computerized archive used to store and
organize data in such a way that information can be
retrieved easily via a variety of search criteria.
• Databases are composed of computer hardware and software
for data management.
• The chief objective of developing a database is to
organize data in a set of structured records to enable easy
retrieval of information.
• Each record, also called an entry, should contain a number
of fields that hold the actual data items, for example, fields
for names, phone numbers, addresses, dates.
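A minimal sketch of records, fields and retrieval by search criteria, in Python. The Record class, the search helper and all sample values are hypothetical and chosen only to mirror the fields named above; a real database system would add storage and indexing.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One database entry; each attribute is a field."""
    name: str
    phone: str
    address: str
    date: str  # ISO date string, e.g. "2024-01-31"


# Made-up entries for illustration.
records = [
    Record("Ada Example", "555-0100", "1 Gene Street", "2024-01-31"),
    Record("Ben Sample", "555-0101", "2 Protein Road", "2024-02-15"),
    Record("Cy Placeholder", "555-0102", "1 Gene Street", "2024-02-15"),
]


def search(entries, **criteria):
    """Return the entries whose fields equal every given criterion."""
    return [
        entry for entry in entries
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]


same_address = search(records, address="1 Gene Street")
both = search(records, address="1 Gene Street", date="2024-02-15")
print([r.name for r in same_address])
print([r.name for r in both])
```

Any combination of fields can serve as the search criteria, which is the sense in which structured records make retrieval easy.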
WHAT ARE BIOLOGICAL
DATABASES?
Different classifications of
databases
• Type of data
  • nucleotide sequences
  • protein sequences
  • protein sequence patterns or motifs
  • macromolecular 3D structures
  • gene expression data
  • metabolic pathways
Different classifications of
databases (continued)
• Primary or derived databases
  • Primary databases: experimental results submitted directly
    into the database
  • Secondary databases: results of analysis of
    primary databases
  • Aggregate of many databases
    • Links to other data items
    • Combination of data
    • Consolidation of data
Different classifications of
databases (continued)
• Availability
  • Publicly available, no restrictions
  • Available, but with copyright
  • Accessible, but not downloadable
  • Academic, but not freely available
  • Proprietary, commercial; possibly free for
    academics
TYPES OF DATABASES
• Primary Databases
• Secondary Databases
PRIMARY DATABASES
Contain biomolecular data in its original form.
Experimental results are submitted directly into the database by
researchers, and the data are essentially archival in nature.
Once given a database accession number, the data in primary
databases are never changed.
Examples:- GenBank, EMBL and DDBJ for DNA/RNA sequences,
SWISS-PROT and PIR for protein sequences, and PDB for molecular
structures.
GenBank
• Database from NCBI; includes sequences from
publicly available resources.
http://www.ncbi.nlm.nih.gov/genbank/
NCBI and Entrez
• NCBI hosts one of the largest and most comprehensive
collections of databases; it belongs to the NIH, the National
Institutes of Health (USA)
• Entrez is the search engine of NCBI
• Search for:
genes, proteins, genomes, structures, diseases,
publications and more.
• http://www.ncbi.nlm.nih.gov/
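Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below only builds a search URL and sends no request. The endpoint address and the parameter names (db, term, retmax, retmode) are not taken from these slides and should be checked against NCBI's current E-utilities documentation; the search term is an arbitrary example.

```python
from urllib.parse import urlencode

# Assumed address of the E-utilities "esearch" endpoint (verify with NCBI docs).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_results=5):
    """Build (but do not send) an Entrez search request URL."""
    params = {
        "db": database,         # which Entrez database to search
        "term": term,           # the query, in Entrez search syntax
        "retmax": max_results,  # maximum number of identifiers to return
        "retmode": "json",      # requested response format
    }
    return ESEARCH_URL + "?" + urlencode(params)


url = build_esearch_url("nucleotide", "insulin AND human[Organism]")
print(url)
```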
GenBank
• An annotated collection of all publicly
available nucleotide sequences and their protein translations
• Set up in 1979 at LANL (Los Alamos National Laboratory).
• Maintained since 1992 by NCBI (Bethesda).
GenBank file format
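A minimal sketch of reading the GenBank flat-file layout in Python. It assumes only the broad conventions of the format: keywords such as LOCUS, DEFINITION and ACCESSION start in the first column, the sequence follows the ORIGIN line, and "//" ends the entry. The sample record is invented for illustration and is not a real GenBank entry; indented continuation lines and the FEATURES table are ignored here.

```python
# Invented, heavily abbreviated record (not a real GenBank entry).
SAMPLE = """\
LOCUS       EXAMPLE01                 24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical example sequence for illustration.
ACCESSION   EXAMPLE01
ORIGIN
        1 atgaccatga ttacgccaag cttg
//
"""


def parse_genbank(text):
    """Collect top-level keyword lines and the sequence of one entry."""
    fields = {}
    sequence_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-entry marker
            break
        if in_sequence:
            # Sequence lines: a position number, then blocks of bases.
            sequence_parts.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):  # the sequence starts on the next line
            in_sequence = True
        elif line[:1].strip():           # a keyword starting in the first column
            keyword, _, value = line.partition(" ")
            fields[keyword] = value.strip()
    fields["sequence"] = "".join(sequence_parts)
    return fields


entry = parse_genbank(SAMPLE)
print(entry["ACCESSION"], len(entry["sequence"]))
```

For real work, an established parser (for example the one in Biopython) is a safer choice than a hand-written one.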
EMBL
European Molecular Biology Laboratory
Nucleic acid database from EBI
(European Bioinformatics Institute)
Produced in collaboration with DDBJ and GenBank
Search engine: SRS (Sequence Retrieval System)
http://www.ebi.ac.uk/
DDBJ
DNA Data Bank of Japan
Started in 1986 in collaboration with GenBank
Produced and maintained at NIG
(National Institute of Genetics)
http://www.ddbj.nig.ac.jp/
SWISS-PROT http://www.ebi.ac.uk/uniprot/
• Annotated protein sequence database established
in 1986
• Consists of sequence entries made up of different
line formats
• Similar format to EMBL
• http://us.expasy.org/sprot/sprot-top.html
PIR
• Protein Information Resource
• A division of the National Biomedical Research
Foundation (NBRF) in the U.S.
• One can search for entries or run a sequence
similarity search at the PIR site.
http://pir.georgetown.edu/
TrEMBL
Translated EMBL (European Molecular Biology Laboratory)
Computer-annotated supplement of SWISS-PROT.
Contains all the translations of EMBL nucleotide
sequence entries not yet integrated in SWISS-PROT.
http://www.ebi.ac.uk/trembl/
Protein Data Bank (PDB)
• Important in solving real problems in molecular
biology
• Established in 1971 at Brookhaven National
Laboratory (BNL)
• Sole international repository of macromolecular
structure data
• Later moved to the Research Collaboratory
for Structural Bioinformatics (RCSB)
http://www.rcsb.org/
PDB: example
HEADER LYASE(OXO-ACID) 01-OCT-91 12CA 12CA 2
COMPND CARBONIC ANHYDRASE /II (CARBONATE DEHYDRATASE) (/HCA II) 12CA 3
SOURCE HUMAN (HOMO SAPIENS) RECOMBINANT PROTEIN 12CA 5
AUTHOR S.K.NAIR,D.W.CHRISTIANSON 12CA 6
REVDAT 1 15-OCT-92 12CA 0 12CA 7
JRNL AUTH S.K.NAIR,T.L.CALDERONE,D.W.CHRISTIANSON,C.A.FIERKE 12CA 8
JRNL TITL ALTERING THE MOUTH OF A HYDROPHOBIC POCKET. 12CA 9
JRNL TITL 2 STRUCTURE AND KINETICS OF HUMAN CARBONIC ANHYDRASE 12CA 10
JRNL TITL 3 /II$ MUTANTS AT RESIDUE VAL-121 12CA 11
JRNL REF J.BIOL.CHEM. V. 266 17320 1991 12CA 12
JRNL REFN ASTM JBCHA3 US ISSN 0021-9258 071 12CA 13
REMARK 1 12CA 14
………
REMARK 3 AUTHORS HENDRICKSON,KONNERT 12CA 20
REMARK 3 R VALUE 0.170 12CA 21
REMARK 3 RMSD BOND DISTANCES 0.011 ANGSTROMS 12CA 22
REMARK 3 RMSD BOND ANGLES 1.3 DEGREES 12CA 23
REMARK 4 12CA 24
REMARK 4 N-TERMINAL RESIDUES SER 2, HIS 3, HIS 4 AND C-TERMINAL 12CA 25
REMARK 4 RESIDUE LYS 260 WERE NOT LOCATED IN THE DENSITY MAPS AND, 12CA 26
REMARK 4 THEREFORE, NO COORDINATES ARE INCLUDED FOR THESE RESIDUES. 12CA 27
………
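Each line of the entry above begins with a record type (HEADER, AUTHOR, REMARK and so on), which makes the header easy to pick apart. The Python sketch below groups a few of the lines shown above by record type. It splits on whitespace because the text on this slide has lost its column alignment, and the trailing "12CA n" serial columns are left out; real PDB files use fixed column positions, so this is an illustration and not a general parser.

```python
from collections import defaultdict

# A few lines copied from the example above, without the trailing serial columns.
pdb_lines = [
    "HEADER LYASE(OXO-ACID) 01-OCT-91 12CA",
    "AUTHOR S.K.NAIR,D.W.CHRISTIANSON",
    "REMARK 3 R VALUE 0.170",
    "REMARK 3 RMSD BOND DISTANCES 0.011 ANGSTROMS",
    "REMARK 3 RMSD BOND ANGLES 1.3 DEGREES",
]

# Group the text of each line under its record type (the first word).
records = defaultdict(list)
for line in pdb_lines:
    record_type, _, rest = line.partition(" ")
    records[record_type].append(rest.strip())

# HEADER carries the classification, the deposition date and the entry ID.
classification, deposited, pdb_id = records["HEADER"][0].rsplit(" ", 2)

# The refinement R value is reported in one of the REMARK 3 lines.
r_value = next(
    float(text.split()[-1])
    for text in records["REMARK"]
    if text.startswith("3 R VALUE")
)

print(pdb_id, classification, deposited, r_value)
```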
COMPOSITE DATABASES
Collections of sequences drawn from several primary databases
Make sequence searching more efficient, because one query covers
multiple resources
Examples:- NRDB (Non-Redundant Database), OWL,
MIPSX, SWISS-PROT + TrEMBL
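One way to picture a composite, non-redundant collection is to merge entries from several sources and keep each distinct sequence once. The Python sketch below does this with exact, case-insensitive sequence matching, which is an assumption made for illustration; each real composite database applies its own redundancy criteria. The source names, accessions and sequences are invented.

```python
def merge_non_redundant(*sources):
    """Map each distinct sequence to the source entries that contain it."""
    merged = {}
    for source_name, entries in sources:
        for accession, sequence in entries.items():
            key = sequence.upper()  # treat upper and lower case as identical
            merged.setdefault(key, []).append(f"{source_name}:{accession}")
    return merged


# Invented example sources: (name, {accession: sequence}).
source_a = ("SourceA", {"A001": "MKTAYIAK", "A002": "MSLLTEVE"})
source_b = ("SourceB", {"B104": "mktayiak", "B207": "MADEEKLP"})

composite = merge_non_redundant(source_a, source_b)
for sequence, origins in composite.items():
    print(sequence, origins)
```

A single search over the merged collection then covers both sources without comparing the shared sequence twice.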
SECONDARY DATABASES
Contain data derived from the results of analysing
primary data
Manually created or automatically generated
Contain more relevant and useful information,
structured to specific requirements
Examples:- PROSITE, PRINTS, BLOCKS, Pfam
PROSITE
Database of protein families
Members of a family share characteristic sequence patterns
Patterns are written in a notation similar to the regular
expressions used by Unix commands
So a pattern search can find members across a family
http://ca.expasy.org/prosite/
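Because PROSITE patterns resemble regular expressions, they can be translated into one. The Python sketch below covers only the core of the pattern syntax: "-" separates positions, "x" matches any residue, [..] lists allowed residues, {..} lists forbidden residues, (n) or (n,m) gives a repeat count, and "<" and ">" anchor the ends of the sequence. The function name and the test sequence are made up, and rarer constructs are not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # forbidden residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)


# N-glycosylation site pattern: N, then not P, then S or T, then not P.
regex = prosite_to_regex("N-{P}-[ST]-{P}")
hits = re.findall(regex, "MKNGTAANPSQ")  # invented test sequence
print(regex, hits)
```

The second asparagine in the test sequence is followed by a proline, so only the first site matches.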
BLOCKS
• Motifs/blocks are created by automatically detecting the
most conserved regions of each protein family.
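The idea of detecting conserved regions can be illustrated with a toy Python example that scans an alignment for ungapped runs of columns in which one residue dominates. This is not the procedure BLOCKS itself uses; the function names, the threshold and the three aligned sequences are all assumptions chosen for illustration.

```python
from collections import Counter


def column_is_conserved(column, threshold):
    """A column counts as conserved if it has no gap and one residue dominates."""
    if "-" in column:
        return False
    most_common_count = Counter(column).most_common(1)[0][1]
    return most_common_count / len(column) >= threshold


def conserved_blocks(alignment, min_length=3, threshold=1.0):
    """Return (start, end) column ranges of conserved runs, end exclusive."""
    blocks, start = [], None
    length = len(alignment[0])
    for col in range(length + 1):  # one extra step closes a trailing run
        ok = col < length and column_is_conserved(
            [seq[col] for seq in alignment], threshold
        )
        if ok and start is None:
            start = col
        elif not ok and start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    return blocks


# Invented alignment of three sequences ("-" marks a gap).
alignment = [
    "MKV-LHGDEAC",
    "MRVALHGDQAC",
    "MKI-LHGDEAC",
]

blocks = conserved_blocks(alignment)
print(blocks, [alignment[0][s:e] for s, e in blocks])
```

Lowering the threshold (for example to 0.6) accepts columns where most, but not all, sequences agree, which lengthens the reported runs.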
PRIMARY VS SECONDARY DATABASES
