Biological Databases
Pharmamatrix Workshop 2010
- Philip Winter
- Ishwar V. Hosamani
Some databases in the field of molecular biology…
AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb,
ARR, AsDb, BBDB, BCGD, Beanref, BioImage,
BioMagResBank, BIOMDB, BLOCKS, BovGBASE,
BOVMAP, BSORF, BTKbase, CANSITE, CarbBank,
CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP,
ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG,
CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb,
Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC,
ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db,
ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView,
GCRDB, GDB, GENATLAS, GenBank, GeneCards,
Genline, GenLink, GENOTK, GenProtEC, GIFTS,
GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB,
HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD,
HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB,
HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat,
KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB,
Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP,
Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us,
MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase,
OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB,
PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD,
PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE,
PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE,
SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase,
SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D,
SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE,
SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB,
TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE,
VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD,
YPM, etc.
What we expect from a database
• Sequence, functional and structural information,
with related bibliography
• Well structured and indexed
• Well cross-referenced (with other databases)
• Periodically updated
• Tools for analysis and visualization
Biological Databases
• Sequence databases
• Structure databases
Sequence databases
• Nucleotide databases
• Protein databases
Sequence databases
Nucleotide databases
• International Nucleotide Sequence
Database Collaboration (INSDC)
– NCBI
– EMBL
– DDBJ
Standard contents of a sequence database
• Sequences
• Accession number
• References
• Taxonomic data
• Annotation/curation
• Keywords
• Cross-references
• Documentation
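As a rough illustration of how the fields listed above fit together, here is a minimal sketch of a record type. The class name `SequenceRecord`, its field names and the sample values are all hypothetical; they are not taken from GenBank, EMBL or DDBJ, whose flat-file formats are considerably richer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SequenceRecord:
    """Hypothetical container mirroring the standard contents of an entry."""

    accession: str                                   # accession number
    sequence: str                                    # the sequence itself
    taxonomy: List[str] = field(default_factory=list)         # taxonomic lineage
    references: List[str] = field(default_factory=list)       # literature references
    keywords: List[str] = field(default_factory=list)
    annotations: Dict[str, str] = field(default_factory=dict)       # curated notes
    cross_references: Dict[str, str] = field(default_factory=dict)  # other databases

    def length(self) -> int:
        """Return the number of residues in the sequence."""
        return len(self.sequence)


# Invented example values, for illustration only (not a real database entry).
record = SequenceRecord(
    accession="EXAMPLE01",
    sequence="ATGGCC",
    taxonomy=["Eukaryota", "Metazoa"],
    keywords=["example"],
    cross_references={"OtherDB": "ID123"},
)
print(record.accession, record.length())
```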
NCBI
• Very comprehensive collection of biological databases
• GenBank: the nucleotide sequence database
• Provides 42 different resources
• Provides a simple, easy-to-use web
interface
http://www.ncbi.nlm.nih.gov/
• Sequence submission: done using BankIt or
Sequin
• Search engine for data retrieval: Entrez
• Entrez retrieves information across all the resources
under NCBI, e.g. PubMed, Taxonomy, SNP, PubChem
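Entrez can also be queried programmatically through NCBI's E-utilities web service. The sketch below only builds an ESearch request URL and does not contact the server. The helper name `build_esearch_url` and the example query are assumptions made for illustration; the base URL and the `db`, `term`, `retmax` and `retmode` parameters are those of the public E-utilities interface.

```python
from urllib.parse import urlencode

# Public base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for one Entrez database (hypothetical helper)."""
    params = {
        "db": db,           # Entrez database to search, e.g. "nucleotide" or "pubmed"
        "term": term,       # query string, using Entrez field tags
        "retmax": retmax,   # maximum number of IDs to return
        "retmode": "json",  # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query (an assumption, chosen to match the PAX6 example used for UniProt).
url = build_esearch_url("nucleotide", "PAX6[Gene Name] AND Homo sapiens[Organism]")
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; changing `db` switches between Entrez resources without changing the rest of the request.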
Tools for analysis
• BLAST
• Primer-BLAST
• B-Link
• ORF Finder
• Genome Workbench
Protein Sequence databases
• UniProt
• Pfam
• Gene Index Project
UniProt
• Universal Protein Resource
• Formed by merging the protein databases of
three groups (SIB, EBI and PIR):
– Swiss-Prot
– TrEMBL
– PIR-PSD
• Entry names are often the gene name
followed by the species, e.g. PAX6_HUMAN
• Accession numbers are short alphanumeric
identifiers, e.g. P26367 (entry PAX6_HUMAN)
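The accession and entry-name conventions can be checked in code. This is a minimal sketch: the regular expression is the accession pattern given in UniProt's help pages, while the function names `is_uniprot_accession` and `split_entry_name` are hypothetical helpers introduced here.

```python
import re
from typing import Tuple

# Accession pattern as documented by UniProt (6- and 10-character forms).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(text: str) -> bool:
    """Return True if the whole string looks like a UniProt accession."""
    return ACCESSION_RE.fullmatch(text) is not None


def split_entry_name(entry_name: str) -> Tuple[str, str]:
    """Split an entry name such as PAX6_HUMAN into (mnemonic, species code)."""
    mnemonic, species = entry_name.split("_", 1)
    return mnemonic, species


print(is_uniprot_accession("P26367"))      # the accession from the slide
print(is_uniprot_accession("PAX6_HUMAN"))  # an entry name, not an accession
print(split_entry_name("PAX6_HUMAN"))
```

A pattern match only says the string is well formed; it does not confirm that the entry exists in the database.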
UniProt features
• BLAST
• Align
• Retrieve
• ID mapping
Pfam
• Proteins contain conserved regions
• Based on the conserved regions, proteins are
classified into families
• Provides links to external databases like PDB,
SCOP, CATH etc.
Pfam: Features
• Sequence search
• View Pfam family
• View a clan
• View a sequence
• View a structure
• Keyword search
Gene Indices
• Project aimed at indexing genes and their
variants across the various genome sequences
• Creates a catalogue of genes for a wide range
of organisms
• Reduces redundancy
Gene Indices Software Tools
• TGI Clustering tools
• Clview
• SeqClean
• Cdbfasta/cdbyank
Structural databases
• PDB – Protein Data Bank
• CATH
• SCOP – Structural Classification of Proteins
wwPDB
• Contains information about experimentally
determined structures of proteins, nucleic
acids, and complex assemblies
• RCSB-PDB, PDBe, PDBj, BMRB – repositories of
protein structure data
• Files in PDB, mmCIF, PDBML/XML formats
• Advanced search provides comprehensive
information about a protein
• Covers sequence, domains, sequence
similarity and literature, in addition to the
details of the structure
• Cross-referenced to SCOP and CATH
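To make the PDB file format mentioned above more concrete, the sketch below reads the fixed-width columns of a single ATOM record. The function name `parse_atom_line` is hypothetical, and the sample line carries invented coordinates rather than data from a real entry; the column positions follow the published PDB format guide.

```python
def parse_atom_line(line: str) -> dict:
    """Extract a few fields from one fixed-width PDB ATOM record."""
    return {
        "serial": int(line[6:11]),         # columns 7-11: atom serial number
        "name": line[12:16].strip(),       # columns 13-16: atom name
        "res_name": line[17:20].strip(),   # columns 18-20: residue name
        "chain_id": line[21],              # column 22: chain identifier
        "res_seq": int(line[22:26]),       # columns 23-26: residue sequence number
        "x": float(line[30:38]),           # columns 31-38: x coordinate (angstroms)
        "y": float(line[38:46]),           # columns 39-46: y coordinate
        "z": float(line[46:54]),           # columns 47-54: z coordinate
        "element": line[76:78].strip(),    # columns 77-78: element symbol
    }


# Invented sample record, laid out in the standard ATOM column positions.
SAMPLE_ATOM_LINE = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"
)

atom = parse_atom_line(SAMPLE_ATOM_LINE)
print(atom["res_name"], atom["chain_id"], atom["x"], atom["y"], atom["z"])
```

For real work an established parser is the safer choice, and mmCIF is keyword-based rather than column-based, so this slicing approach applies only to the legacy PDB format.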
CATH
• Classification of proteins based on domain
structures
• Each protein is chopped into individual domains,
which are assigned to homologous superfamilies
• Hierarchical domain classification of PDB
entries
CATH hierarchy
• Class – derived from secondary structure content;
assigned automatically
• Architecture – describes the gross orientation of secondary
structures, independent of connectivity
• Topology – clusters structures according to their
topological connections and numbers of secondary
structures
• Homologous superfamily – groups together protein
domains that are thought to share a common
ancestor and can therefore be described as
homologous
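CATH writes these four levels as a dotted numeric code, one number per level, for example 3.40.50.300. The code format is CATH's own convention and is not stated on the slide. The sketch below splits such a code into its levels; the function name `parse_cath_code` is hypothetical, and the class names cover only the four main classes.

```python
# Names of the four main CATH classes, keyed by the first number of the code.
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}

LEVELS = ["class", "architecture", "topology", "homologous_superfamily"]


def parse_cath_code(code: str) -> dict:
    """Split a C.A.T.H code into the identifier of each hierarchy level."""
    parts = code.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    # Each level is identified by the code prefix up to and including that level.
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


levels = parse_cath_code("3.40.50.300")
class_name = CATH_CLASSES.get(int(levels["class"]), "unknown")
print(levels, class_name)
```

Because each level is a prefix of the next, two domains share an architecture exactly when their codes agree on the first two numbers.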
SCOP
• Description of structural and evolutionary
relationships between all the proteins with
known structures
• Uses the PDB entries
• Search using keywords or PDB identifiers
Hierarchy in SCOP
• Class
• Fold
• Superfamily
• Family
• Species
Thank you

Editor's Notes

  • #8: The databases exchange data every day. Each database has its own sequence submission and retrieval tools. They follow a standardized annotation scheme. The Collaboration created a Feature Table Definition that outlines legal features and syntax.
  • #10: Currently, NCBI receives and processes about 20,000 direct submission sequences per month, in addition to the approximately 200,000 bulk submissions that are processed automatically. It collaborates with EMBL and DDBJ.
  • #11: The database continues to grow at an exponential rate, doubling in size every 10 months. It holds sequences from 250,000 distinct organisms.
  • #12: All tools can be downloaded and run as standalone programs on a local workstation.
  • #19: The goal of this project is ultimately to represent a non-redundant view of all human genes and data on their expression patterns, cellular roles, functions, and evolutionary relationships. The database will also include links to genomic sequences, mapping data, 3D structures, and literature references.