UCSC genome browsing

Paco Hulpiau

http://www.bits.vib.be
TABLE BROWSER / GET DNA / CURRENT BROWSER GRAPHIC IN PDF / CLICK LINE  (TO GET OTHER DATA)
2  CLICK LINE  (TO GET OTHER DATA)
Databases & accession numbers

§   GenBank exchanges data daily with its two partners in the
    International Nucleotide Sequence Database Collaboration (INSDC):
        European Bioinformatics Institute (EBI, part of EMBL)
        DNA Data Bank of Japan (DDBJ)

§   Characteristics of GenBank and RefSeq @ NCBI:

    GenBank                                RefSeq
    Not curated, author submits            Curated, NCBI creates from existing data
    Multiple records for the same locus    Single record for each molecule
    No limit to species included           Limited to model organisms
Databases & accession numbers

§   The Ensembl automatic gene annotation system (Curwen et al., 2004):
    The gene-building system enables fast automated annotation of
    eukaryotic genomes. It annotates genes based on evidence derived
    from known protein, cDNA and EST sequences, including GenBank
    sequences shared by the INSDC, UniProtKB and NCBI RefSeq.
Databases & accession numbers

§   Database                         Typical accession numbers

    GenBank                          AAA37420
    RefSeq                           NM_123456 = mRNA
                                     NP_123456 = proteins
                                     XM_123456 = predicted mRNA
                                     XP_123456 = predicted proteins
    UniProtKB (Swiss-Prot/TrEMBL)    P12345, Q1AAA9
    Ensembl                          ENSMUSG00000123456 for genes
                                     ENSMUST00000123456 for transcripts
                                     ENSMUSP00000123456 for proteins
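The accession formats in this table are regular enough to recognize automatically. A minimal sketch, assuming simplified regular expressions modelled on the examples above plus the published UniProtKB accession pattern; real identifiers have more variants (version suffixes such as `.1`, other RefSeq prefixes, other GenBank formats), and `classify_accession` is a hypothetical helper, not part of any NCBI, UniProt or Ensembl API:

```python
import re

# Simplified patterns modelled on the example accessions in the table.
# Assumption: an optional ".N" version suffix is allowed; other real-world
# variants (e.g. NR_/XR_ RefSeq prefixes, newer GenBank formats) are ignored.
PATTERNS = [
    (re.compile(r"NM_\d+(?:\.\d+)?"), "RefSeq mRNA"),
    (re.compile(r"NP_\d+(?:\.\d+)?"), "RefSeq protein"),
    (re.compile(r"XM_\d+(?:\.\d+)?"), "RefSeq predicted mRNA"),
    (re.compile(r"XP_\d+(?:\.\d+)?"), "RefSeq predicted protein"),
    # Ensembl: ENS + optional species code (e.g. MUS) + type letter + 11 digits
    (re.compile(r"ENS[A-Z]*G\d{11}(?:\.\d+)?"), "Ensembl gene"),
    (re.compile(r"ENS[A-Z]*T\d{11}(?:\.\d+)?"), "Ensembl transcript"),
    (re.compile(r"ENS[A-Z]*P\d{11}(?:\.\d+)?"), "Ensembl protein"),
    # GenBank protein-style accession: 3 letters + 5 digits (e.g. AAA37420)
    (re.compile(r"[A-Z]{3}\d{5}(?:\.\d+)?"), "GenBank protein"),
    # UniProtKB accession pattern (6 or 10 characters)
    (re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"),
     "UniProtKB"),
]


def classify_accession(acc: str) -> str:
    """Return a database/molecule label for an accession, or 'unknown'."""
    acc = acc.strip()
    for pattern, label in PATTERNS:
        if pattern.fullmatch(acc):  # the whole string must match
            return label
    return "unknown"


for example in ["AAA37420", "NM_123456", "XP_123456", "Q1AAA9", "ENSMUSP00000123456"]:
    print(example, "->", classify_accession(example))
```

The order of the patterns matters little here because each one is matched against the whole string; a prefix-based lookup would be faster but would miss the UniProtKB and GenBank formats, which have no fixed prefix.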
2  CLICK LINE  (TO GET OTHER DATA)
BITS training - UCSC Genome Browser - Part 2
zoom in on exon 1 + upstream
Exercises (II)


1)   Are there any diseases related to your gene of interest?      (OMIM)
         Which interaction partners are known?                     (Entrez Gene)
         Are there any important SNPs that change the amino acid sequence?

         Get the multiple sequence alignment (MSA, multiz46way)
         showing the nucleotide sequences of the human, mouse, chicken,
         Xenopus and zebrafish genes (CDS FASTA alignment, exons not
         separate).

         Save your results (e.g. exercises2_1.doc).
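A quick way to compare the species in the saved CDS FASTA alignment is pairwise percent identity over the aligned columns. A minimal sketch, assuming the export is plain multi-FASTA with equal-length aligned sequences and `-` for gaps; the function names and the two-sequence toy alignment are hypothetical, not real UCSC output:

```python
from itertools import combinations


def parse_fasta(text: str) -> dict[str, str]:
    """Parse multi-FASTA text into {first word of header: sequence}."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def percent_identity(a: str, b: str) -> float:
    """Identity over the columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Made-up toy alignment (not real data).
toy_alignment = """>human
ATGGCC-AAG
>mouse
ATGGCTTAAG
"""
aln = parse_fasta(toy_alignment)
for (n1, s1), (n2, s2) in combinations(aln.items(), 2):
    print(f"{n1} vs {n2}: {percent_identity(s1, s2):.1f}%")  # human vs mouse: 88.9%
```

Ignoring gapped columns is one common convention; counting gaps as mismatches would give a lower value, so state which one you use when comparing species.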
3  GET DNA  (TO GET OTHER DATA)
http://www.visibone.com/colorlab/
Exercises (II)


2)   Get the DNA sequence for your gene of interest,
         including 2000 base pairs upstream, and
         use the following extended case/color options:
         » RefSeq and Ensembl genes in bold
         » SNPs (132), i.e. dbSNP build 132, underlined
         » Regulatory information (e.g. from ORegAnno) and miRNA sites
           in different colors

         » Save your results (e.g. exercises2_2a.doc).
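Which side counts as upstream depends on the gene's strand: for a + strand gene the extra 2000 bp lie before txStart (lower coordinates), for a - strand gene they lie after txEnd (higher coordinates). A minimal sketch of that arithmetic, assuming UCSC table coordinates (0-based start, end-exclusive) and made-up gene coordinates; `upstream_region` and `to_position` are hypothetical helpers:

```python
def upstream_region(chrom: str, tx_start: int, tx_end: int, strand: str,
                    upstream: int = 2000) -> tuple[str, int, int]:
    """Gene span plus `upstream` bp on its 5' side, in UCSC table
    coordinates (0-based start, end-exclusive)."""
    if strand == "+":
        # 5' end is txStart: extend to lower coordinates, clamped at 0
        return chrom, max(0, tx_start - upstream), tx_end
    if strand == "-":
        # 5' end is txEnd: extend to higher coordinates
        # (not clamped to the chromosome length in this sketch)
        return chrom, tx_start, tx_end + upstream
    raise ValueError("strand must be '+' or '-'")


def to_position(chrom: str, start: int, end: int) -> str:
    """Format as a 1-based, fully closed browser position string."""
    return f"{chrom}:{start + 1}-{end}"


# Made-up gene coordinates, one per strand.
print(to_position(*upstream_region("chr11", 10000, 20000, "+")))  # chr11:8001-20000
print(to_position(*upstream_region("chr11", 10000, 20000, "-")))  # chr11:10001-22000
```

The same gene span yields two different regions depending on the strand, which is why the strand column should always be checked before extending a region by hand.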
Exercises (II)


2)   Try to get the DNA sequence for your gene of interest
         in chicken or zebrafish and
         use the following extended case/color options:
         » UCSC, RefSeq and Ensembl genes in bold
         » Other RefSeq genes underlined
         » Human proteins in a specific color


         » Save your results (e.g. exercises2_2b.doc).
4  TABLE BROWSER  (TO GET OTHER DATA)
COPY (Ctrl+C)
name  = Accession Number (RefSeq), e.g. NM_001229
name2 = Gene Name (Entrez), e.g. CASP1
Exercises (II)


3)   Get a list of the RefSeq and Ensembl transcripts using the table
         browser with the following selected fields:
         » name, chromosome, exon count, name2
         » Save the results (exercises2_3a.xls)
         Also get the sequences and save as genename_transcripts.fasta


         Search the mouse genome using the filter in the table browser
         to get all members of a protein family (of your research interest)
         and save the results in a list (exercises2_3b.xls) containing name,
         chromosome, CDS start and end, exon count and name2.
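The table browser returns selected fields as tab-separated text whose first line is a `#`-prefixed header. A minimal sketch that groups exported transcripts by gene, assuming that layout and the refGene/ensGene column names `name`, `chrom`, `exonCount` and `name2`; because column names may also appear prefixed with database and table (e.g. `hg19.refGene.name`), the sketch keeps only the part after the last dot. The helper names and toy rows are made up:

```python
from collections import defaultdict


def read_table_browser(text: str) -> list[dict[str, str]]:
    """Parse '#'-prefixed, tab-separated table browser output into row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Keep only the field name, e.g. 'hg19.refGene.name' -> 'name'
    header = [field.rsplit(".", 1)[-1] for field in lines[0].lstrip("#").split("\t")]
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def transcripts_per_gene(rows: list[dict[str, str]]) -> dict[str, list[tuple[str, str, int]]]:
    """Group (name, chrom, exonCount) tuples by the gene field name2."""
    genes: dict[str, list[tuple[str, str, int]]] = defaultdict(list)
    for row in rows:
        genes[row["name2"]].append((row["name"], row["chrom"], int(row["exonCount"])))
    return dict(genes)


# Made-up example rows in the assumed export layout.
example = (
    "#name\tchrom\texonCount\tname2\n"
    "tx1\tchr11\t10\tGENE_A\n"
    "tx2\tchr11\t9\tGENE_A\n"
    "tx3\tchr5\t4\tGENE_B\n"
)
for gene, transcripts in transcripts_per_gene(read_table_browser(example)).items():
    print(gene, len(transcripts), "transcript(s):", transcripts)
```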
TO GET OTHER DATA
BLAT = BLAST-Like Alignment Tool
»  searches for high-similarity matches using an index of the entire genome
»  DNA limit: 25,000 bases per sequence, 50,000 bases in total for multiple sequences
»  protein limit: 10,000 aa per sequence, 25,000 aa in total for multiple sequences
»  max. 25 sequences per query
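The idea of indexing the entire genome can be illustrated with a toy k-mer index: the genome is cut into non-overlapping k-mers once, and every overlapping k-mer of the query is looked up, so any query at least 2k-1 bases long contains one k-mer aligned with a genome tile. Hits are counted per diagonal (genome position minus query position). This is a simplified sketch in the spirit of BLAT, not its implementation (BLAT uses larger k, 11 by default for DNA, and then extends hits into full alignments); k=4 and the sequences are made up:

```python
from collections import Counter, defaultdict


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Index the genome once: each non-overlapping k-mer -> its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def candidate_hits(query: str, index: dict[str, list[int]], k: int) -> Counter:
    """Look up every overlapping k-mer of the query and count hits per
    diagonal (genome position minus query position)."""
    votes: Counter = Counter()
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            votes[gpos - qpos] += 1
    return votes


genome = "AAAACCCCGGGGTTTT"
idx = build_index(genome, k=4)
# The query is genome[2:14]; both hits vote for diagonal 2 (query starts at position 2).
print(candidate_hits("AACCCCGGGGTT", idx, k=4))  # Counter({2: 2})
```

Indexing only non-overlapping tiles keeps the genome index k times smaller, at the cost of needing queries long enough to cover a full tile.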
PASTE (Ctrl+V)
TTTAGCCAACGAACAGTCGCT   TTCTCTTTGCATCTGTCCCAG
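For primers such as the two shown above, GC content and a rough melting temperature can be computed by hand. This sketch uses the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) °C, a rule of thumb for short oligos; it is an assumption here and not necessarily the formula UCSC In-Silico PCR uses, so the Tm reported by the tool may differ:

```python
def _clean(primer: str) -> str:
    """Uppercase and drop spaces (primers are often written in triplets)."""
    seq = primer.upper().replace(" ", "")
    if not seq:
        raise ValueError("empty primer")
    return seq


def gc_content(primer: str) -> float:
    """Percentage of G and C bases."""
    seq = _clean(primer)
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough Tm in degrees C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    seq = _clean(primer)
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


for p in ("TTTAGCCAACGAACAGTCGCT", "TTCTCTTTGCATCTGTCCCAG"):
    # both primers: GC=47.6%, Tm~62 C
    print(f"{p}: GC={gc_content(p):.1f}%, Tm~{wallace_tm(p)} C")
```

Matching Tm values within a few degrees is the usual goal for a primer pair; both primers here come out identical under this rule.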
§   The Utilities page contains links to some tools
    created by the UCSC Genome Bioinformatics Group.

§   DNA Duster & Protein Duster remove non-sequence
    characters from an input sequence.
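The kind of clean-up DNA Duster performs can be sketched in a few lines: keep the sequence letters and drop numbers, spaces and line breaks, e.g. from a numbered, space-separated sequence listing. This is a hypothetical re-implementation for illustration, not the UCSC tool's code; the set of characters kept (A, C, G, T, U, N) is an assumption:

```python
import re

# Assumption: keep only A, C, G, T, U and N (either case); everything else
# (digits, spaces, line breaks, punctuation) is treated as formatting noise.
NON_SEQUENCE = re.compile(r"[^ACGTUN]", re.IGNORECASE)


def dust_dna(text: str) -> str:
    """Strip non-sequence characters, keeping the original letter case."""
    return NON_SEQUENCE.sub("", text)


# Made-up numbered sequence listing.
raw = """
        1 atgccgtaac ggttac
       17 TTAG
"""
print(dust_dna(raw))  # atgccgtaacggttacTTAG
```

Letters in any leftover header text (e.g. the G and N of `ORIGIN`) would survive this filter, so headers should be removed first.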
Exercises (II)


4)   Use BLAT to find orthologs of your gene in chicken, zebrafish
         and fruit fly. What is the genomic location?
         Are the flanking genes the same?


         Perform an in silico PCR to see what happens when more than one
         PCR product may arise, and determine the product size and Tm:
         species: human
         forward primer: TTC AAG GAG GCC TTC TCC CT
         reverse primer: CTG GGG GAG AAG CTG A (+ click flip reverse)
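What happens when more than one product can arise is easiest to see on a toy template: find every forward-primer site on the plus strand, every site matching the reverse complement of the reverse primer downstream of it, and report each pair as a product with its size. This sketch assumes exact matches on the plus strand only (the real tool also tolerates some mismatches), assumes that "flip reverse" corresponds to reverse-complementing the reverse primer, and uses an assumed maximum product size of 4000 bp; the template and primers are made up:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """Start positions of every exact (possibly overlapping) match."""
    hits: list[int] = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def in_silico_pcr(template: str, forward: str, reverse: str,
                  max_size: int = 4000) -> list[tuple[int, int, int]]:
    """Return (start, end, size) for every exact-match product on the plus strand.
    `reverse` is written 5'->3' as ordered, so its binding site on the plus
    strand is its reverse complement."""
    template = template.upper()
    fwd = forward.replace(" ", "").upper()
    rev_site = reverse_complement(reverse.replace(" ", "").upper())
    products = []
    for f in find_all(template, fwd):
        for r in find_all(template, rev_site):
            end = r + len(rev_site)
            if r >= f and end - f <= max_size:
                products.append((f, end, end - f))
    return products


# Made-up template in which both primer sites occur twice.
template = "ACGTACTTTTCCTTAGGGACGTACAACCTTAG"
for start, end, size in in_silico_pcr(template, "ACG TAC", "CTAAGG"):
    print(f"product {start}-{end}: {size} bp")
```

Two forward sites and two reverse sites give three products here (16, 32 and 14 bp), because every forward site pairs with every downstream reverse site within the size limit.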
