DATA RETRIEVAL SYSTEM
Text-based Database Searching
Submitted By:
Dr. Shikha Thakur
Assistant Professor (Guest Faculty)
TCSC
Mumbai
Maharashtra
• The amount of biologically relevant data accessible via the WWW is
increasing at a very rapid rate.
• It is important for scientists to have easy and efficient ways of wading
through the data and finding what is important for their research.
• Knowing how to access and search for information in the database is
essential.
Depending on the type of data at hand, there are
two basic ways of searching:
• Using descriptive words to search text databases.
• Using a nucleotide or protein sequence to search sequence
databases.
Text-based Database Searching
• There are three important data retrieval systems of particular
relevance to molecular biologists:
• Entrez (at NCBI); entries are identified by GI (GenInfo Identifier) or accession number
• Sequence Retrieval System, SRS (at EBI)
• DBGET/LinkDB (at GenomeNet, Japan)
• The advantage of these retrieval systems is that they not only return
matches to a query, but also provide handy pointers to additional
important information in related databases.
Text-based Database Searching
• The three systems differ in the databases they search and the links
they provide to other information.
• In using any of these systems, queries can be as simple as entering
the accession number of a newly published sequence or as complex
as searching multiple database fields for specific terms.
Text-Based Database Searching
• Basic Search Concepts
• Boolean Search – An advanced query combining two or more terms with the
Boolean operators AND, OR, and NOT; the default operator is AND.
• Broadening the Search – If a search returns no useful entries, change or
remove terms.
• Narrowing the Search – If a search returns too many entries, add terms or
make the existing terms more specific.
• Proximity Searching – To search for a multiword term or phrase, place
quotes around it.
• Wild Card – A wildcard character prepended or appended to a search term
makes the search less specific, e.g., to find all authors whose last name
begins with Zav, search using Zav*.
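The search concepts above can be illustrated with a small in-memory sketch. It is not the query engine of Entrez, SRS, or DBGET: the records, field names, and helper functions are hypothetical, and each system has its own query syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical records; a real database entry has many more fields.
RECORDS = [
    {"id": 1, "author": "Zavala",
     "title": "ornithine carbamoyltransferase in Escherichia coli"},
    {"id": 2, "author": "Zavitz", "title": "heat shock protein structure"},
    {"id": 3, "author": "Smith", "title": "carbamoyltransferase structure"},
]


def has_term(record, term):
    """True if the term occurs in the title; a quoted term is matched as a phrase."""
    title = record["title"].lower()
    term = term.lower()
    if term.startswith('"') and term.endswith('"'):
        return term.strip('"') in title      # phrase search
    return term in title.split()             # single-word search


def author_matches(record, pattern):
    """Wildcard match on the author field, e.g. 'Zav*'."""
    return fnmatchcase(record["author"].lower(), pattern.lower())


def ids(records):
    """Return just the record ids, to keep the results easy to read."""
    return [r["id"] for r in records]


# AND narrows the result set, OR broadens it, NOT excludes entries.
and_hits = [r for r in RECORDS
            if has_term(r, "carbamoyltransferase") and has_term(r, "structure")]
or_hits = [r for r in RECORDS
           if has_term(r, "carbamoyltransferase") or has_term(r, "structure")]
not_hits = [r for r in RECORDS
            if has_term(r, "structure") and not has_term(r, "carbamoyltransferase")]
phrase_hits = [r for r in RECORDS if has_term(r, '"escherichia coli"')]
wildcard_hits = [r for r in RECORDS if author_matches(r, "Zav*")]
```

In this toy data set, AND returns one record, OR returns all three, and the wildcard Zav* returns both authors whose names begin with Zav.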
Entrez
• Entrez – is a molecular biology database and retrieval system
developed by the National Center for Biotechnology Information
(NCBI).
• It is an entry point for exploring distinct but integrated databases.
• (http://www.ncbi.nlm.nih.gov/Entrez/)
Entrez
• The Entrez system provides access to:
• Nucleotide sequence databases – GenBank/DDBJ/EMBL
• Protein sequence databases – Swiss-Prot, PIR, PRF, PDB, and translated
protein sequences from DNA sequence databases.
• Genome and chromosome mapping data
• Molecular Modeling 3-D structures Databases.
• Literature database, PubMed – Provides excellent and easy access to
MEDLINE and pre-MEDLINE articles.
• Taxonomy database – Allows retrieval of DNA and protein sequences for
any taxonomic group.
• Specialized Databases – OMIM, dbSNP, UniSTS, etc.
Entrez
• The most valuable feature of Entrez is its exploitation of the concept
of ‘neighbouring’, which allows related articles in different databases
to be linked to each other, whether or not they are cross-referenced
directly.
• Neighbours and links are listed in the order of similarity to the query.
• The similarity is based on pre-computed analysis of sequences,
structures and the literature.
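A minimal sketch of the neighbouring idea: similarity scores are computed ahead of time, and a lookup returns a query's neighbours in decreasing order of similarity. The entry names, scores, and the `neighbours` function are hypothetical; the actual Entrez similarity measures are not described here.

```python
# Hypothetical pre-computed similarity scores (higher means more similar).
PRECOMPUTED = {
    ("entryA", "entryB"): 0.92,
    ("entryA", "entryC"): 0.40,
    ("entryB", "entryC"): 0.75,
    ("entryA", "entryD"): 0.66,
}


def neighbours(query, table, min_score=0.0):
    """Return (neighbour, score) pairs for the query, most similar first."""
    hits = []
    for (a, b), score in table.items():
        if score < min_score:
            continue
        # Similarity is symmetric, so the query may be on either side.
        if a == query:
            hits.append((b, score))
        elif b == query:
            hits.append((a, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


ranked = neighbours("entryA", PRECOMPUTED)
strong_only = neighbours("entryC", PRECOMPUTED, min_score=0.5)
```

Because the scores are pre-computed, answering a query is only a lookup and a sort, which is why neighbours can be shown immediately even when no direct cross-reference exists.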
Entrez
• One particularly useful feature of Entrez is Batch Entrez: the ability
to retrieve large sets of data based on some criterion and to download
them to a local computer.
• This allows the sequences to be worked on using analytical tools
available on the local computer.
Entrez Features
1. Entrez Global Query – Search a subset of Entrez databases.
2. Batch Entrez – Upload a file of GI or accession numbers to retrieve
sequences.
3. Making Links to Entrez – Linking to PubMed and GenBank.
4. E-Utilities – Entrez programming utilities.
5. LinkOut – External links to related resources.
6. Cubby – Provides a stored-search feature to save and update searches, and
lets you customize your LinkOut display.
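As a hedged sketch of how batch retrieval might be scripted through the E-Utilities, the code below reads a list of accession numbers and builds an EFetch request URL. The helper names and the file layout (one accession per line, `#` for comments) are assumptions for illustration. No network request is made.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-Utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def read_accessions(text):
    """Parse one accession per line, skipping blank lines and '#' comments."""
    accessions = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            accessions.append(line)
    return accessions


def efetch_url(db, accessions, rettype="fasta", retmode="text"):
    """Build an EFetch URL requesting all accessions from one database."""
    query = urlencode({
        "db": db,
        "id": ",".join(accessions),
        "rettype": rettype,
        "retmode": retmode,
    })
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


# Hypothetical batch file; further accessions from the same database
# would go on additional lines.
batch_file = """# protein accessions
P04391
"""

accessions = read_accessions(batch_file)
url = efetch_url("protein", accessions)
```

The resulting URL could be opened with `urllib.request.urlopen` to download the sequences for local analysis, subject to NCBI's usage guidelines.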
SRS
• The Sequence Retrieval System (SRS) – A network browser for
databases in molecular biology.
• It is a powerful sequence information indexing, search and retrieval
system (http://srs.ebi.ac.uk/).
SRS
• SRS is a homogeneous interface to over 80 biological databases,
developed at the European Bioinformatics Institute (EBI) at Hinxton,
UK.
• The types of databases included are sequence and sequence-related,
metabolic pathways, transcription factors, application results (e.g.,
BLAST), protein 3D structure, genome, mapping, mutations, and
locus-specific mutations.
• One can access and query their contents and navigate among them.
SRS
• The Web page listing all the databases links each one to a description
page and includes the date of its last update.
• One can select one or more databases to search before entering the
query.
• Over 30 versions of SRS are currently running on the WWW. Each
includes a different subset of databases and associated analytical
tools.
SRS
• SRS Features:
• SRS databases are well indexed, thus reducing the search time for the
large number of potential databases.
• SRS allows any flat-file database to be indexed to any other. The
advantage is that the derived indices can be searched rapidly, allowing
users to retrieve, link, and access entries from all the interconnected
resources.
• The system has the particular strength that it can be readily
customized to use any defined set of databanks.
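A minimal sketch of flat-file indexing, assuming an EMBL/Swiss-Prot-style layout with two-letter line codes (ID, AC, DR, and // as the entry terminator). The entry names, accessions, and the `build_indices` function are hypothetical; this is not how SRS itself is implemented.

```python
# Hypothetical flat file: two entries, each with cross-references (DR lines).
FLAT_FILE = """\
ID   ENTRY1
AC   X00001;
DR   OTHERDB; Y00001;
//
ID   ENTRY2
AC   X00002;
DR   OTHERDB; Y00002;
DR   OTHERDB; Y00003;
//
"""


def build_indices(text):
    """Return (accession -> entry name, accession -> cross-reference list)."""
    by_accession = {}
    links = {}
    entry_id = None
    accession = None
    for line in text.splitlines():
        code, data = line[:2], line[2:].strip()
        if code == "ID":
            entry_id = data
        elif code == "AC":
            accession = data.rstrip(";")
            by_accession[accession] = entry_id
            links[accession] = []
        elif code == "DR":
            # "OTHERDB; Y00001;" becomes the identifier "otherdb:Y00001".
            db, ident = [part.strip() for part in data.rstrip(";").split(";")]
            links[accession].append(f"{db.lower()}:{ident}")
        elif code == "//":
            entry_id = None
            accession = None
    return by_accession, links


by_accession, links = build_indices(FLAT_FILE)
```

Once the indices exist, finding an entry or following its cross-references is a dictionary lookup rather than a scan of the whole file.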
SRS
• Simple SRS queries
• By accession number
• Query on accession number: J00231
• By a simple author or organism: Ausubel and Rhizobium
• Boolean relations between keywords: and, or, but not
SRS
• Contd…
• Searching by dates: 01-Jan-1995:31-Dec-1995.
• Searching by size: 400:600
• Using hypertext links in an entry: Medline, Swiss-Prot and PDB
entries can be linked from within the EMBL database.
• Display of molecules via the RasMol plug-in.
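The date and size ranges above use a low:high form. The sketch below shows how such ranges might be parsed and applied to a list of entries. The entries and function names are hypothetical, and the code does not reproduce the actual SRS query language.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_date(text):
    """Convert a date such as '01-Jan-1995' into a date object."""
    day, month, year = text.split("-")
    return date(int(year), MONTHS[month], int(day))


def parse_range(text, convert):
    """Split 'low:high' and convert both ends with the given function."""
    low, high = text.split(":")
    return convert(low), convert(high)


# Hypothetical entries with a creation date and a sequence length.
ENTRIES = [
    {"acc": "T0001", "date": "15-Mar-1995", "length": 450},
    {"acc": "T0002", "date": "02-Feb-1996", "length": 500},
    {"acc": "T0003", "date": "30-Nov-1995", "length": 900},
]


def search(entries, dates=None, size=None):
    """Return accessions of entries inside the (inclusive) ranges given."""
    hits = []
    for entry in entries:
        if dates is not None:
            low, high = parse_range(dates, parse_date)
            if not low <= parse_date(entry["date"]) <= high:
                continue
        if size is not None:
            low, high = parse_range(size, int)
            if not low <= entry["length"] <= high:
                continue
        hits.append(entry["acc"])
    return hits


in_1995 = search(ENTRIES, dates="01-Jan-1995:31-Dec-1995")
mid_sized = search(ENTRIES, size="400:600")
both = search(ENTRIES, dates="01-Jan-1995:31-Dec-1995", size="400:600")
```

Combining both ranges behaves like a Boolean AND: only entries that satisfy every condition are returned.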
DBGET
• DBGET/LinkDB – An integrated bioinformatics database retrieval
system at GenomeNet, developed by the Institute for Chemical
Research, Kyoto University, and the Human Genome Center of the
University of Tokyo.
DBGET
• DBGET – Is used to search and extract entries from a wide range of
molecular biology databases.
• LinkDB – Is used to compute links between entries in different
databases.
• It is designed to be a network distributed database system with an
open architecture, which is suitable for incorporating local databases
or establishing a server environment.
• http://www.genome.ad.jp/dbget/
DBGET
• DBGET/LinkDB is integrated with other search tools, such as BLAST,
FASTA and MOTIF, to conduct further retrievals instantly.
• DBGET provides access to about 20 databases, which are queried one
at a time. After querying one of these databases, DBGET presents
links to associated information in addition to the list of results.
• A unique feature of DBGET is its connection with the Kyoto
Encyclopedia of Genes and Genomes (KEGG) database – a database of
metabolic and regulatory pathways.
DBGET
• DBGET has three basic commands (or three basic modes in the Web
version), bfind, bget, and blink, to search and extract database
entries.
• bfind – Searches for entries by keyword.
• bget – Retrieves the database entries specified by the
combination dbname:identifier.
• blink – Performs the LinkDB search for entries related to a given entry.
• A notable feature of DBGET, unlike other text search systems, is
that no keyword indexing is performed when a database is installed or
updated.
DBGET
• Instead, selected fields are extracted and stored in separate files for
bfind searches.
• This is an advantage for rapid database updates, but sometimes a
disadvantage for elaborate searching.
• To supplement bfind, the full text search STAG is provided.
• blink – The LinkDB search. Once entries of interest are found, it can
be used to retrieve related entries in a given database or all databases
in GenomeNet.
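The division of labour between the three modes can be sketched with toy data. The Python functions below only borrow the names bget, bfind, and blink to mirror the description above. They are not the DBGET command-line tools, and the databases, identifiers, and links are hypothetical.

```python
# Hypothetical databases: dbname -> identifier -> entry text.
TOY_DB = {
    "toydb": {
        "E1": "ornithine carbamoyltransferase",
        "E2": "heat shock protein",
    },
    "otherdb": {
        "P1": "arginine biosynthesis pathway",
    },
}

# Hypothetical pre-computed links between entries.
TOY_LINKS = {"toydb:E1": ["otherdb:P1"]}


def bget(spec):
    """Retrieve the entry named by 'dbname:identifier'."""
    dbname, identifier = spec.split(":", 1)
    return TOY_DB[dbname][identifier]


def bfind(dbname, keyword):
    """Search one database for entries whose text contains the keyword."""
    keyword = keyword.lower()
    return [f"{dbname}:{identifier}"
            for identifier, text in TOY_DB[dbname].items()
            if keyword in text.lower()]


def blink(spec, target_db=None):
    """List entries linked to spec, optionally restricted to one database."""
    linked = TOY_LINKS.get(spec, [])
    if target_db is None:
        return list(linked)
    return [link for link in linked if link.startswith(target_db + ":")]


entry = bget("toydb:E1")
found = bfind("toydb", "protein")
related = blink("toydb:E1")
```

A typical session would chain the three: find candidate entries by keyword, fetch one by its dbname:identifier, then follow its links into other databases.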
Example
• Let’s consider an example to show how each system can be used to
retrieve the SwissProt entry P04391, an ornithine
carbamoyltransferase protein in Escherichia coli.
• In Entrez, enter the accession number P04391 in the protein database
query form and view the entry and its associated links and neighbours.
Example - SRS
• In SRS, first select the SwissProt database, then enter P04391 in the
query form and, once the entry is displayed, search for links to other
related databases.
Example – LinkDB
• However, the fastest way of gathering the related information for this
entry is to search LinkDB.
• By simply entering swissprot:P04391, a list of all links to all the
related databases is displayed.
Thank You
