NCBI Boot Camp
NCBI: “...advances science and health by providing access to biomedical and genomic information”
NCBI: Sequences, Expression, Genome maps, Structures, Protein Domains, Homology (gene, protein, structure), Pathways, Genetic Variation
NCBI: tools, databases
databases* (*a brief survey of selected dbs)
1. literature
PubMed, Bookshelf, OMIM
PubMed: 20,672,941 citations; 2,157,529 in PubMed Central; 5,519 indexed journals
Bookshelf: 767
OMIM: Dr. McKusick
OMIM: If you query for Lesch-Nyhan, you get a very long OMIM record.
OMIM record sections: Clinical Features, Biochemical Features, Inheritance, Pathogenesis, Diagnosis, History, Description, Cloning, Gene Structure, Mapping, Molecular Genetics, Pathogenesis, Evolution, Animal Model, Allelic Variants, See Also, References, Contributors, Creation Date, Edit History. Note: there are separate entries for Lesch-Nyhan syndrome and the protein that causes the defect.
OMIM: Every OMIM record has an extensive list of internal and external links.
2. sequences
Nucleotide, GenBank, RefSeq
DNA, RNA, Protein. EST: expressed sequence tag. SNP: single nucleotide polymorphism. WGS: whole genome shotgun. CDS: coding sequence. STS: sequence tagged site.
NCBI Primary Databases: SNP, GEO, GenBank, Protein
GenBank: GenBank Format
    LOCUS: Locus name, size, type, division, modification date. Search tips: Locus names can change! Division names are historical, not taxonomical!
    DEFINITION: As the author sees fit… Search tip: No controlled vocabulary in definitions!
    ACCESSION/VERSION: Accession numbers do not change, even if information in the record is changed at the author's request. Version and GI numbers change.
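The accession/version split can be illustrated with a small parser. This is a minimal sketch, assuming identifiers of the form `ACCESSION.VERSION` like those shown elsewhere in this deck (e.g. `NM_000690.2`). The names `SeqId`, `parse_seq_id` and `same_record`, and the regular expression itself, are illustrative assumptions and not part of any NCBI API.

```python
import re
from typing import NamedTuple, Optional


class SeqId(NamedTuple):
    accession: str           # stable part, e.g. "NM_000690"
    version: Optional[int]   # changes when the sequence is updated; None if absent


# Assumed shape: letters/underscore, then digits, then an optional ".version".
_SEQ_ID = re.compile(r"^([A-Za-z_]+\d+)(?:\.(\d+))?$")


def parse_seq_id(text: str) -> SeqId:
    """Split 'ACCESSION.VERSION' into its stable and changing parts."""
    match = _SEQ_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not an accession-like identifier: {text!r}")
    accession, version = match.groups()
    return SeqId(accession, int(version) if version is not None else None)


def same_record(a: str, b: str) -> bool:
    """True when two identifiers share an accession, whatever their versions."""
    return parse_seq_id(a).accession == parse_seq_id(b).accession
```

Comparing on the accession alone is what lets a saved identifier keep pointing at the same record after the sequence is revised and the version number moves on.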
    KEYWORDS, SOURCE, ORGANISM: Organism is tied into the Taxonomy Browser. Search tip: Keywords are often blank. When performing a “keyword” style search, use [all], [word] or [title].
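The field tags in the search tip can also be assembled programmatically. This is a hedged sketch that only builds a query string and an E-utilities ESearch URL and sends nothing over the network. The helper names `tagged` and `esearch_url` are made up for illustration, and the `[organism]` tag and the ALDH2 example query are assumptions not taken from the slide.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint; parameters used here: db, term, retmax.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term: str, field: str) -> str:
    """Attach an Entrez field tag, e.g. tagged('ALDH2', 'title') gives 'ALDH2[title]'."""
    return f"{term}[{field}]"


def esearch_url(db: str, *clauses: str, retmax: int = 20) -> str:
    """Join clauses with AND and build an ESearch URL (no request is sent)."""
    query = " AND ".join(clauses)
    return f"{ESEARCH}?{urlencode({'db': db, 'term': query, 'retmax': retmax})}"


# Example: a title-restricted search limited to one organism.
url = esearch_url(
    "nucleotide",
    tagged("ALDH2", "title"),
    tagged("Homo sapiens", "organism"),
)
print(url)
```

Swapping `"title"` for `"all"` or `"word"` widens the search in the way the tip describes, which matters because the KEYWORDS field is often empty.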
    REFERENCE: Selected references, newest first. The last “reference” covers submission information.
    Features I: Source, gene, misc features
    Features II: CDS: links, translation
    Sequence
GenBank: GenBank Format (also for protein)
132,015,054 sequences in GenBank (3/20/11) + HARD WORK - redundancy = RefSeq
RefSeq: provides a single record for each natural biological molecule for major organisms ranging from viruses to bacteria to eukaryotes
RefSeq biological molecules: DNA, RNA, Protein
RefSeq name tag: “HELLO my name is XX_123456”
RefSeq biological molecules: Genomic DNA (NC), Incomplete (NG), mRNA (NM), Model mRNA (XM), Curated Protein (NP), Model protein (XP)
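The two-letter prefixes make it easy to pick RefSeq records out of a mixed result list. This is a minimal sketch: the table below copies only the six prefixes named on the slide (with "Incomplete" spelled out as an incomplete genomic region), and the function name `refseq_type` is an illustrative assumption. Other RefSeq prefixes exist and are deliberately left out.

```python
from typing import Optional

# Only the prefixes listed on the slide; not the complete RefSeq prefix table.
REFSEQ_PREFIXES = {
    "NC": "genomic DNA",
    "NG": "incomplete genomic region",
    "NM": "mRNA",
    "XM": "model mRNA",
    "NP": "curated protein",
    "XP": "model protein",
}


def refseq_type(accession: str) -> Optional[str]:
    """Return the molecule type for a RefSeq accession, or None otherwise."""
    prefix, underscore, _rest = accession.partition("_")
    if not underscore:
        return None  # no underscore: an ordinary GenBank accession
    return REFSEQ_PREFIXES.get(prefix.upper())


# A few of the ALDH2 identifiers shown in this deck.
hits = ["NG_012250.1", "NM_000690.2", "AY621070.1", "M20456.1", "NP_000681.2"]
refseq_only = {hit: refseq_type(hit) for hit in hits if refseq_type(hit)}
print(refseq_only)
```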
Nucleotide search results: NG_012250.1 NM_000690.2 AY621070.1 EU414258.1 EU414257.1 EU414256.1 EU414255.1 EU414254.1 EU414253.1 EU414252.1 EU414251.1 EU414250.1 EU414249.1 AF164120.1 EU373813.1 EU373812.1 EU373811.1 EU373810.1 EU373809.1 EU373808.1 EU373807.1 EU373806.1 EU373805.1 EU373804.1 AH002599.1 M20456.1 M20455.1 M20454.1 M20453.1 M20452.1 M20451.1 M20450.1 M20449.1 M20448.1 M20447.1 M20446.1 M20445.1 M20444.1 CR456991.1 AB385105.1 CU678321.1 CU678320.1 AF073514.1 AF073513.1 AF073512.1 AF073511.1. RefSeq among them: NG_012250.1, NM_000690.2
RefSeq for ALDH2: NG_012250.1, NM_000690.2, NP_000681.2. Note: the NP sequence would not normally be found using a nucleotide search; I have included it only to show the complete suite of RefSeq for ALDH2.
3. genes/genome
Genome, Gene, HomoloGene
Genome: 1090 eukaryota, 1483 prokaryota, 2507 viruses
Note: genome records are either mitochondrial or chromosomal. Note: no common names are listed as genome query results.
The genome record shows a variety of stats for different databases, as well as a scrollable map of the genome.
Searching in BioProject yields common names
BioProject results contain background information
Instead of searching Genome, you can also browse via the Genome Resource Guide
Genome Resources (G), Genome BLAST (B), Map Viewer (M), Genome Project (BioProject) (P)
G: Genes and Human Health, Epigenomics, The Genomic Sequence, Maps and Markers, Transcribed Sequences, Cytogenetics, Comparative Genomics. A standard record in Genome Resources contains many links out along with brief database summaries.
M: Map Viewer starts by letting you select a chromosome (or section of a circular genome).
M
M: To the left of each gene, there are a variety of links out. Note: these change based on the level of information known about a given gene. Links: HUGO Gene Nomenclature, Sequence Viewer, Protein, Download, Evidence Viewer, Molecular Model, STS, OMIM, CCDS, SNP.
Gene structure: Regulatory, Intron, Exon, Intron
NCBI Boot Camp for Beginners Slides
Each gene record provides extensive details.  We will go through an example Gene record in the following slides.
Genomic Info: Sequence Viewer and MapViewer
Bibliography: PubMed, GeneRIF. NOTE: Gene Reference into Function (GeneRIF) is an excellent resource for literature related to function. These articles have been submitted for inclusion into GeneRIF and are not the product of an automated text search.
There’s Even More! Interactions, Gene Ontology, Genotypes, Homologues, Protein Information. Interactions will list all known interacting molecules, providing links to
RefSeq: These reference sequences are stable and are independent of genome builds.
These reference sequences refer to specific builds. The NCBI Assembly: ~100 individuals. The Celera assembly: ~5 individuals. HuRef: just Craig Venter.
LINKS: internal, external and commercial
HomoloGene
Homologs: paralogs and orthologs. Figure labels: Early Gene of Interest, GENE DUPLICATION, α-chain gene, β-chain gene; frog α, chick α, mouse α; mouse β, chick β, frog β (orthologs within a chain, paralogs between chains).
P3H1
Protein of Interest (P3H1): Cross-species identity is automatically calculated. Automatic sequence alignments are easily accessible.
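To make "cross-species identity" concrete, here is one common way to compute percent identity from two already-aligned sequences. This is a sketch under stated assumptions: the slides do not say how HomoloGene computes its figure, so the definition used here (identical residues divided by aligned columns, counting gap columns as mismatches) is only one of several conventions. The function name and the toy sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns where both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps written as `gap`).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Toy aligned fragments (not real P3H1 sequence).
identity = percent_identity("MKT-LV", "MKTALI")
print(f"{identity:.1f}% identical")
```

Because the denominator convention changes the number, identity values from different tools are only comparable when they use the same definition.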
Protein of Interest (P3H1). Note: UniGene may come up with different results, since it is based on EST clusters and not protein sequence.
4. expression & structure
UniGene: EST, GEO. Structures: CDD, MMDB, PubChem…
UniGene: …an organized view of the transcriptome
UniGene record sections: SELECTED PROTEIN SIMILARITIES; GENE EXPRESSION (EST, GEO); MAPPING POSITION; SEQUENCES (mRNA, EST)
GENE EXPRESSION: EST. This is a “virtual northern” where ESTs are counted to get a rough sense of overall expression levels.
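The counting behind a virtual northern can be sketched in a few lines: divide the ESTs assigned to a gene by all ESTs in the same pool, then scale to transcripts per million so that pools of different sizes can be compared. The counts and tissue names below are hypothetical, and `transcripts_per_million` is an illustrative helper rather than anything provided by UniGene.

```python
def transcripts_per_million(gene_ests: int, total_ests: int) -> float:
    """Scale a raw EST count to transcripts per million for one EST pool."""
    if total_ests <= 0:
        raise ValueError("total_ests must be positive")
    if not 0 <= gene_ests <= total_ests:
        raise ValueError("gene_ests must be between 0 and total_ests")
    return 1_000_000 * gene_ests / total_ests


# Hypothetical pools: (ESTs for the gene, all ESTs in the pool).
pools = {"tissue_a": (120, 200_000), "tissue_b": (45, 900_000)}
profile = {
    tissue: transcripts_per_million(gene, total)
    for tissue, (gene, total) in pools.items()
}
for tissue, tpm in profile.items():
    print(f"{tissue}: {tpm:.0f} TPM")
```

Small pools make the estimate noisy, which is why the slide calls this a rough sense of expression.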
GENE EXPRESSION: GEO. Note: the GEO results contain all arrays that assay for this gene; most of these results are for specific disease or altered states and do not necessarily reflect wild-type, normal levels of expression.
Structures: CDD, MMDB, PubChem…
Cn3D colored by secondary structure. Note: Cn3D has aligned the individual chains for you.
Cn3D colored by chain (there are 7)
“structure function”: the hemolysin protein bores a hole into red blood cells and sucks their insides out.  The structure kind of looks like a hollow tack.
Note: the structure listing shows each individual chain (along with 3D domains and superfamilies) AND the chemical that was found in the crystal structure (see arrow)
Another example… this time a single chain with distinct domains
Now we are coloring by domain.  Also note the funky space-filling model.  It makes proteins look fat.
Note that Super Families are defined: clicking on them will take you to the conserved domain database
The Conserved Domains Database provides alignments across species of conserved domains, along with a general description of the domain
3D domains are color coded.  Note: 3D domains do not always correlate to Super Families! Clicking on the 3D domain will take you to related structures
You can select structures and then view the 3D alignment in Cn3D
Voilà!  Structural alignment.  Note: the sequences are aligned in the Sequence View box.
PubChem has three primary areas: BioAssay (registry of assays that can be searched by small molecule); Substance (a redundant registry of compounds); Compound (a non-redundant, curated chemical database).
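The redundant versus non-redundant distinction can be pictured as a grouping step: many deposited Substance records collapse into one Compound entry when they describe the same structure. The sketch below uses made-up records and assumes each already carries a standardized structure key (SMILES strings here). PubChem's real standardization pipeline is far more involved and is not reproduced.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Made-up depositor records: (substance id, depositor, structure key).
substances = [
    ("S1", "vendor_a", "CCO"),       # ethanol from one depositor
    ("S2", "vendor_b", "CCO"),       # the same structure from another
    ("S3", "vendor_a", "CC(=O)O"),   # acetic acid
]


def group_by_structure(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, List[str]]:
    """Map each unique structure key to the substance ids that share it."""
    compounds: Dict[str, List[str]] = defaultdict(list)
    for substance_id, _depositor, structure_key in records:
        compounds[structure_key].append(substance_id)
    return dict(compounds)


compounds = group_by_structure(substances)
print(f"{len(substances)} substances -> {len(compounds)} compounds")
```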
You can search PubChem by chemical name, CAS number, or even by similar structures. Records contain lots of additional information. Highlights: synonyms (which can be quite extensive in chemical nomenclature). Of particular note: if the compound shows up in Structure, you can link to a view in Cn3D that shows it complexed with protein/DNA/RNA!
BioSystems will display a short verbal description, a schematic of the system in question and a link to all of the genes, proteins, small molecules found in the system along with links to related systems.
NCBI discovery initiative
NCBI: high quality DB, discovery tools
high quality DB: RefSeq, GenBank. discovery tools: Database Ads (check out these resources!), Sensors (are you looking for…), Analysis tools (pre-computed & on the fly)
where do I start?
anywhere* (*but Gene acts as a good hub)
Apolipoprotein E (APOE): Cys130Arg
We Can Do It! Gene and RefSeq, Genome Maps, Allelic Var/Disease, Expression, Homologous G/P, Structure
Search for APOE in Entrez. Note that there are many different records in several different databases that have hits for APOE. Select PubMed.
Select APOE in Homo sapiens. We have used PubMed for its gene sensor, which is fantastically useful. However, you can also search directly in the Gene database.
LOTS of information in this report, including links IN the report, links to other NCBI databases and links to outside resources.
Let’s check out the reference sequences….
Note the genomic, mRNA and protein RefSeq that are independently maintained.
Separate records for ref sequences associated with specific genomic builds…
(many databases here) Let’s check out the SNP: Variation Viewer
Note, the Cys130Arg variant has been frequently observed and well documented
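The notation Cys130Arg reads as "the cysteine at position 130 is replaced by arginine". A small sketch of parsing that notation and applying it to a sequence follows. The function names are illustrative, and the toy protein is deliberately not the APOE sequence: it is a string of alanines with a cysteine planted at position 130.

```python
import re
from typing import Tuple

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Accepts "Cys130Arg" and the prefixed form "p.Cys130Arg".
_SUBSTITUTION = re.compile(r"^(?:p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_substitution(text: str) -> Tuple[str, int, str]:
    """Return (reference residue, 1-based position, replacement residue)."""
    match = _SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    ref, position, alt = match.groups()
    try:
        return THREE_TO_ONE[ref], int(position), THREE_TO_ONE[alt]
    except KeyError as exc:
        raise ValueError(f"unknown residue code in {text!r}") from exc


def apply_substitution(protein: str, text: str) -> str:
    """Apply the substitution after checking the reference residue matches."""
    ref, position, alt = parse_substitution(text)
    if not 1 <= position <= len(protein):
        raise ValueError("position lies outside the sequence")
    if protein[position - 1] != ref:
        raise ValueError(f"expected {ref} at {position}, found {protein[position - 1]}")
    return protein[: position - 1] + alt + protein[position:]


# Toy sequence (NOT APOE): alanines with a cysteine at position 130.
toy = "A" * 129 + "C" + "A" * 10
mutant = apply_substitution(toy, "Cys130Arg")
print(toy[129], "->", mutant[129])
```

Checking the reference residue before substituting catches off-by-one numbering problems, which are easy to hit when different records number the same protein differently.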
Let’s observe the Sequence Viewer and MapViewer for this gene.
Note, you can change which sequence you want to observe (stable reference, reference, Celera and HuRef)
The full view shows genes in the area, along with info on SNPs and other variation classifications.
Full screen of MapViewer
These are default maps: ab initio modeling, Ensembl Genes, UniGene, RefSeq
You can change the maps you wish to view, both in terms of how you are annotating the genome and in which organisms.
Here we are looking at Chimp, Mouse and Human Gene maps. You can zoom out to get a larger picture of the area.
APOE is part of the APO gene cluster.  Note: lines between maps are mapped homologues.
Each gene has a series of links following its name.  We’ll jump to the APOE OMIM record.
Another extensive record!  Let’s jump to allelic variants.
Let’s go to the SNP record. Cys130Arg is .0016. Extensive documentation…
OMIM, Prot 3D, SeqView, GeneView, MapView, VarView, PubMed
NOTE: the default setting doesn’t show much, because it doesn’t include clinically associated variants – click this box and refresh.
This is the one we want!  Let’s jump to this reference SNP
Hummmm… the two reference assemblies have a wild type allele, whereas Celera and HuRef carry the mutant allele. Let’s check out this area in HuRef using the sequence viewer: click on the chromosome position link.
Clicking on sequence will bring up the sequence and the CDS.  You will note that HuRef (which means Craig Venter) carries the mutant allele.
(many databases here) Let’s check out expression in UniGene
Click on EST profile to go to the virtual northern.
EST profile breakdowns: Disease State, Body Sites, Development
Click on GEO Profiles to see actual gene expression array data.
Note, there are thousands of hits, meaning many gene arrays have assayed for this gene.  However, most of these are in reference to a disease or altered state.
Use GDS596: it holds the results for “normal” gene expression. Click on a chart to see detailed results.
Highest expression in the liver, with lower levels throughout the brain. (Chart labels: liver, brain)
(many databases here) Let’s check out homologs
You can show a pairwise alignment using BLAST…
E value: note the very low E value, 1e-158
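For a rough feel of why an E value can get that small: the E value is the number of alignments scoring at least this well that you would expect by chance, and in terms of the bit score S' it follows E = m · n · 2^(−S'), where m and n are the query and database lengths. The sketch below implements just that relation. It assumes raw lengths, whereas BLAST applies effective-length corrections, so real reports will differ. The function names and example lengths are hypothetical.

```python
import math


def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """E = m * n * 2**(-S'), using raw (uncorrected) lengths."""
    return query_length * database_length * 2.0 ** (-bit_score)


def bit_score_for(e_value: float, query_length: int, database_length: int) -> float:
    """Invert the relation: the bit score needed to reach a given E value."""
    return math.log2(query_length * database_length / e_value)


# Hypothetical search space: a 300-residue query against 1e9 database residues.
m, n = 300, 1_000_000_000
for score in (50, 100, 500):
    print(f"bit score {score}: E ~ {expect_value(score, m, n):.3g}")
```

Each extra bit of score halves the E value, so strong hits between close homologs quickly reach values that are effectively zero.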
The alignment shows that the Chimp genome carries an R at the allele in question!
You can also check out homologs found in UniGene: a different way to search.
Bunnies show up using the UniGene homolog search, but not the HomoloGene search.
Let’s go check out the protein record…
Click here to link to the RefSeq protein record.
Let’s run a BLAST to see if we can identify the giant panda homolog.
I’ve changed the search to focus on the RefSeq protein database and limit it to the giant panda.
Note: BLAST automatically detects domains. The highest hit is a hypothetical protein.  Let’s take a look at the alignment.
Note, the panda has the mutant arginine… does this mean pandas and chimps both have early onset Alzheimer's disease?  Nobody knows!
Let’s check out some related structures.
This is the default setting.  Change to all similar MMDB.
Click here to go to an alignment between your query and the structure’s sequence.
Click here to view in Cn3D. Note: the structure sequence contains the mutant arginine.
Showing side chains, colored by hydrophobicity. The arginine is shown in yellow. Click here to go to the structure summary for 1B68.
Click here to find similar 3D domains
Select another structure and then view 3D alignment.
Overall alignment, showing side chains colored by hydrophobicity. Note, the Cys vs. Arg doesn’t make a huge change structurally.
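Even when the backbone barely moves, the side-chain chemistry changes a lot, and a hydropathy scale makes that visible. The sketch below uses the Kyte-Doolittle hydropathy index as a stand-in. The slides do not say which scale Cn3D uses for its hydrophobicity coloring, so treat the choice of scale and the helper name `hydropathy_shift` as assumptions.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_shift(ref: str, alt: str) -> float:
    """Change in hydropathy when residue `ref` is replaced by `alt`."""
    return KYTE_DOOLITTLE[alt.upper()] - KYTE_DOOLITTLE[ref.upper()]


# Cysteine to arginine, the substitution followed in this walkthrough.
shift = hydropathy_shift("C", "R")
print(f"Cys -> Arg changes hydropathy by {shift:+.1f}")
```

On this scale arginine is the most hydrophilic residue, so the swap moves the position from the hydrophobic side to the far hydrophilic end even though the overall fold looks similar.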
science can be complex...
…we can help you with that.
thank you
Jackie Wirz, PhD, wirzj@ohsu.edu


Editor's Notes

  • #15: DNA fingerprint of M. tuberculosis
  • #17: Nathan Sawaya, LEGO Artist
  • #30: About 4000 major organisms vs. the 250,000 that are present in all of GenBank
  • #36: Figure ©1979 by T. C. Hsu; all text material ©2007 by Steven M. Carr
  • #44: http://survivingtheworkday.com
  • #45: http://survivingtheworkday.com
  • #46: http://survivingtheworkday.com
  • #47: http://survivingtheworkday.com
  • #48: http://survivingtheworkday.com
  • #60: www.biojobblog.com
  • #61: www.biojobblog.com
  • #62: www.biojobblog.com
  • #63: Crystal structure of putative aminotransferase (YP_614685.1) from SILICIBACTER SP. TM1040 at 1.80 A resolution. To be published
  • #90: cba-ramblings.blogspot.com
  • #91: cba-ramblings.blogspot.com
  • #92: http://www.alz.org/alzheimers_disease_4719.asp
  • #93: Reference Sequence; How people access; Expression; Genomic assemblies: maps region in Map Viewer, look at gene cluster on chr19, compare across two other genomes; Polymorphisms; Genotypes: reference, HuRef; Homologous; BLAST: panda; Genome Reference Consortium human
  • #110: OMIM: OMIM link; HGNC: HGNC listing; sv: Sequence View; pr: Proteins; dl: Download sequence region (corresponding contig region); ev: Evidence Viewer; mm: Model Maker; hm: HomoloGene; STS: UniSTS; SNP: SNPs linked to gene
  • #120: Virtual northern blot
  • #128: Human on top: APOE3, C. Chimp has the “risk” allele, has R… interesting
  • #129: Panda has an R. Restricted to completely sequenced eukaryotic genomes. Translating BLAST sequences against expressed sequences?
  • #130: http://www.petwebsite.com/rabbits/rabbit_care.htm
  • #141: Changes how the protein is processed, not so much structure. Color by hydrophobicity!! When interacting with lipid, the interior partially unfolds to interact with the lipid.