BIOINFORMATICS AND DATABASES IN BIOINFORMATICS.pdf

Dr. Harisingh Gour Viswavidyalaya
A Central University
DEPARTMENT OF ZOOLOGY
TOPIC – DATABASES IN BIOINFORMATICS
MID II ASSIGNMENT
ZOO – SEC – 128
SUBMITTED TO – MR. ANUPAM KUMAR
SUBMITTED BY –
PRAVANJAN DASH
ROLL NO. – Y23265020, MSc 1st YEAR, 1st SEMESTER

INTRODUCTION TO DATABASES
BIOLOGICAL DATABASES are
 Collections of files containing records of biological data in
machine-readable form that can be accessed, added to, retrieved,
manipulated and modified.
 Used to store, manage, connect and distribute data.
 Arranged by sets of rules programmed into the software that
manages the data, called a Database Management System
(DBMS).
 A biological database is a collection of data that is
structured, searchable, updated periodically and
cross-referenced.
 The data are stored, maintained, annotated and curated for
public/research use.
 The data are collected and organized in a specific, useful way.
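The operations listed above (add, retrieve, modify) can be sketched with a small relational database. This is only an illustration using Python's built-in sqlite3 module as the DBMS; the table name, column names, accessions and sequences are all made up for the example and do not come from any real biological database.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
)

# Add: insert two made-up records.
rows = [
    ("DEMO0001", "Escherichia coli", "ATGGCTAGCTAA"),
    ("DEMO0002", "Homo sapiens", "ATGCCCGGGTGA"),
]
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

# Retrieve: search the structured data by organism.
found = conn.execute(
    "SELECT accession FROM records WHERE organism = ?", ("Homo sapiens",)
).fetchall()

# Modify: update the sequence of an existing record.
conn.execute(
    "UPDATE records SET sequence = ? WHERE accession = ?",
    ("ATGCCCGGGTAA", "DEMO0002"),
)
updated = conn.execute(
    "SELECT sequence FROM records WHERE accession = ?", ("DEMO0002",)
).fetchone()[0]

print(found, updated)
```

The DBMS, not the user, enforces the rules here: the PRIMARY KEY constraint, for instance, rejects a second record with the same accession.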

Classification based on type of data stored
 Primary Databases: Contain original data in the form of
primary sequence data or structural data as submitted by the
scientific community.
 Secondary Databases: Contain information that has been
processed and derived from the raw data available in primary
databases, e.g. PROSITE, PRINTS, BLOCKS.
 Composite Databases: Collect data from different primary
databases, compare and filter them, and present only the
non-redundant sequences.
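The idea behind a composite database can be sketched as a merge that drops repeated sequences. This is a simplified illustration, assuming that "redundant" means an identical sequence ignoring letter case; real composite databases may apply other criteria. The function name `build_composite`, the identifiers and the sequences are hypothetical.

```python
def build_composite(*sources):
    """Merge {identifier: sequence} dicts, keeping one entry per distinct sequence."""
    composite = {}
    seen = set()
    for source in sources:
        for identifier, sequence in source.items():
            key = sequence.upper()  # compare sequences case-insensitively
            if key not in seen:
                seen.add(key)
                composite[identifier] = sequence
    return composite


# Two made-up "primary" sources; B1 repeats the sequence stored under A1.
db_a = {"A1": "MKTAYIAK", "A2": "MVLSPADK"}
db_b = {"B1": "mktayiak", "B2": "MGSSHHHH"}

merged = build_composite(db_a, db_b)
print(merged)
```

The first source to contribute a sequence keeps its identifier, so the order in which sources are passed decides which duplicate survives.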

PRIMARY DATA VERSUS SECONDARY DATA
PRIMARY DATA
• Primary data is a type of data researchers
directly collect from main sources.
• Includes real-time data.
• Collected to address a current research
problem.
• Accessing primary data involves a relatively
long process.
• Data collection tools include observations,
surveys, questionnaires, physical testing,
online questionnaires, personal or telephone
interviews, case studies, and focus group
discussions.
SECONDARY DATA
• Secondary data refers to already existing data
produced by previous researchers.
• Related to the past.
• Primarily collected to address earlier
research problems, but can be used
to address the current research problem as
well.
• Referring to secondary data is quick and easy.
• Data collection tools include journal articles,
websites, books, government publications,
records, etc.

PRIMARY DATABASES
 Primary databases contain original biological data. They are
archives of raw sequence or structural data submitted by the scientific
community.
 Once an entry is assigned an accession number, that identifier
stays fixed; the archived data in a primary database are revised
only by the original submitters.
 There are three major public sequence databases (GenBank, EMBL,
DDBJ) that store raw nucleic acid sequence data produced and
submitted by researchers worldwide.
 SOME PRIMARY DATABASES
Nucleic acid databases: GenBank, EMBL, DDBJ
Protein sequence databases: PIR, Swiss-Prot, UniProt
Protein structure database: PDB
Metabolic databases: KEGG

SECONDARY DATABASE
• Secondary databases contain additional information
derived from the analysis of data available in primary
sources. The primary data are analysed in a variety
of ways, so secondary databases contain different
information in different formats.
• SOME SECONDARY DATABASES ARE
 TrEMBL
 Pfam
 PROSITE
 Profiles
 SCOP
 CATH

NUCLEOTIDE SEQUENCE DATABASES
• Composed of a group of nucleotide sequence entries.
• Data repositories that accept nucleic acid sequence data
and make it freely available to the public.
• GenBank, EMBL and DDBJ are the principal nucleotide
databases.
• All three are members of the International Nucleotide
Sequence Database Collaboration (INSDC) and exchange
data with one another.
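Sequence entries from such repositories are commonly exchanged as FASTA text, where a header line starting with ">" is followed by the sequence lines. The sketch below reads that layout into a dictionary. It is a minimal illustration, not a full parser: the function name `parse_fasta` is hypothetical, and the identifiers and sequences in the example are invented rather than taken from GenBank, EMBL or DDBJ.

```python
def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The identifier is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = ""
        elif current is not None:
            # Sequence lines are concatenated under the most recent header.
            records[current] += line
    return records


# Made-up entries for illustration only.
example = """>DEMO0001 hypothetical example entry
ATGGCTAGC
TAA
>DEMO0002 another made-up entry
ATGCCC
"""

seqs = parse_fasta(example)
print(seqs)
```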

PROTEIN SEQUENCE DATABASES
 An array of amino acid sequence entries arranged
according to the identification number.
 Well-known protein sequence databases available
on the web are
 Swiss-Prot
 PIR
 UniProt
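Arranging entries by identification number makes lookup fast, because a sorted list can be searched by repeated halving. The sketch below shows this with Python's standard bisect module. The identifiers, sequences and the `lookup` helper are hypothetical and do not reflect how any of the databases above are actually implemented.

```python
from bisect import bisect_left

# Made-up (identifier, amino acid sequence) entries, initially unordered.
entries = [("ID0003", "MGSSHHHH"), ("ID0001", "MKTAYIAK"), ("ID0002", "MVLSPADK")]

# Arrange the entries according to their identification number.
entries.sort(key=lambda entry: entry[0])
ids = [entry[0] for entry in entries]


def lookup(identifier):
    """Binary-search the sorted identifiers; return the sequence or None."""
    i = bisect_left(ids, identifier)
    if i < len(ids) and ids[i] == identifier:
        return entries[i][1]
    return None


print(lookup("ID0002"))
print(lookup("ID9999"))
```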

PROTEIN STRUCTURE DATABASE
 Many proteins that share a common evolutionary
origin show structural similarities.
 Dissimilar proteins exhibit changes in primary, secondary,
tertiary and quaternary structures.
 Structural similarity or dissimilarity between proteins can be
assessed using a structure database.
 These databases store a collection of three-dimensional
structures of proteins.
 An example is the Protein Data Bank (PDB).
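In the legacy PDB text format, each atom of a three-dimensional structure is stored as a fixed-width ATOM line, with the coordinates at set column positions. The sketch below reads one such line. The column ranges follow the published ATOM record layout, but the sample line, its coordinates and the helper name `parse_atom_line` are invented for illustration and do not come from a real PDB entry.

```python
# A made-up ATOM line, assembled field by field so the columns are explicit.
sample_line = (
    "ATOM  "    # columns 1-6: record name
    "    1"     # columns 7-11: atom serial number
    " "         # column 12: blank
    " N  "      # columns 13-16: atom name
    " "         # column 17: alternate location indicator
    "ALA"       # columns 18-20: residue name
    " "         # column 21: blank
    "A"         # column 22: chain identifier
    "   1"      # columns 23-26: residue sequence number
    "    "      # columns 27-30: insertion code and blanks
    "  11.104"  # columns 31-38: x coordinate
    "   6.134"  # columns 39-46: y coordinate
    "  -6.504"  # columns 47-54: z coordinate
)


def parse_atom_line(line):
    """Extract a few fields from a fixed-width ATOM line; None for other lines."""
    if not line.startswith("ATOM"):
        return None
    return {
        "serial": int(line[6:11]),
        "atom": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


atom = parse_atom_line(sample_line)
print(atom)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in this format are allowed to touch with no space between them.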
