Ann E. Loraine and Gregg A. Helt
Affymetrix, Inc. 6550 Vallejo Street, Emeryville, CA 94608 USA
Introduction
In order to take full advantage of the public human genome data and associated annotations, biologists require
visualization tools ("genome browsers") that can accommodate the high frequency of alternative splicing in human
genes and other complexities.
Techniques for presenting human genome data and sequence annotations in an interactive, graphical format are
illustrated using examples from two genome browser applications: The Neomorphic GeneViewer, developed on
contract for TIGR, and ProtAnnot, a prototype protein domain visualization tool designed to reveal the impact of
alternative splicing on conserved domains within protein isoforms encoded at the same locus.
Our aim is not to showcase these two applications but to provide interested software developers with a guide to
the features most likely to meet the needs of biologists.
Two views of the SNURF locus, which encodes an unusual bicistronic transcript. (a) Annotation types are sorted
into labeled tiers. (b) Several tiers shown in (a) have been hidden, collapsed, or moved to new positions. The horizontal
slider has been used to expand the display in the vertical direction.
Visualizing the genome: Techniques for
presenting genome data and annotations
1. Semantic Zooming
Biologists need the ability to inspect sequence data
alongside larger structures such as introns and exons.
Semantic zooming, in which objects change their
representation according to the scale of the view, is one
way this feature could be implemented.
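As a rough illustration of the idea, the sketch below picks a drawing style from the current scale of the view. It is not taken from either application: the function name, the representation names, and the thresholds are all hypothetical.

```python
def choose_representation(bases_per_pixel: float) -> str:
    """Pick how to draw an annotation at the current zoom level.

    The thresholds below are illustrative assumptions; a real browser
    would tune them to its fonts and glyph sizes.
    """
    if bases_per_pixel <= 0:
        raise ValueError("bases_per_pixel must be positive")
    if bases_per_pixel <= 0.1:
        # Each base spans roughly ten pixels or more: room for letters.
        return "residue_letters"
    if bases_per_pixel <= 50:
        # Individual bases are no longer legible; draw exon and intron glyphs.
        return "exon_intron_glyphs"
    # Zoomed far out: a whole gene collapses to a single outline.
    return "gene_outline"


if __name__ == "__main__":
    for scale in (0.05, 10, 1000):
        print(scale, choose_representation(scale))
```

The same object is thus drawn differently as the user zooms, rather than simply being magnified.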
(a) Low Zoom. A gene structure inferred from a
cDNA-to-genomic sequence alignment is shown.
(b) High Zoom. Close-up view of an unusually
small 3' intron from (a). Semantic zooming reveals that
this intron departs from the expected "GT-intron-AG"
consensus sequence for intron boundaries.
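The boundary check that a viewer makes by eye at high zoom can also be sketched in code. The example below is a minimal illustration, assuming forward-strand sequence and 0-based, half-open coordinates; the function name and the toy sequences are hypothetical.

```python
def has_consensus_boundaries(genomic: str, intron_start: int, intron_end: int) -> bool:
    """Return True if the intron begins with GT and ends with AG.

    Coordinates are 0-based and half-open on the forward strand
    (an assumption made for this sketch).
    """
    intron = genomic[intron_start:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


if __name__ == "__main__":
    # Toy sequences: exon + intron + exon.
    canonical = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
    unusual = "ATGGCC" + "GCAAGTTTTCAG" + "GGATAA"
    print(has_consensus_boundaries(canonical, 6, 18))  # True
    print(has_consensus_boundaries(unusual, 6, 18))    # False
```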
Semantic zooming can also be used to show more
detailed information about cDNA-to-genomic sequence
alignments used to infer gene structures. An example of
this is shown below.
(a) Complex View (b) Simplified, Adjusted View
CD79b Locus. One variant lacks a conserved domain that
is present in the others, a result of alternative splicing. This
difference is made obvious by showing "hits" against these
motifs alongside gene structures inferred from cDNA-to-genomic
alignments. Amino acid motifs are shown below
each transcript as green rectangles.
2. Adjustable, moveable tiers
Annotation density can vary enormously from region to region. Sorting items in rows or columns perpendicular to the
sequence axis can help organize a scene and make it easier to spot biologically important patterns. In this example,
annotations based on cDNA-to-genome alignments are sorted into horizontal tiers based on the type and quality of the
aligned cDNAs.
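One possible data model for such tiers is sketched below. It is an assumption for illustration only, not the design of either application: the `Tier` class, the helper functions, and the annotation type labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """A labeled row of annotations that can be hidden or collapsed."""
    label: str
    annotations: list = field(default_factory=list)
    hidden: bool = False
    collapsed: bool = False


def sort_into_tiers(annotations, key):
    """Group annotations into tiers by key, keeping first-seen tier order."""
    tiers = {}
    for ann in annotations:
        label = key(ann)
        tiers.setdefault(label, Tier(label)).annotations.append(ann)
    return list(tiers.values())


def move_tier(tiers, label, new_index):
    """Return a new tier order with the named tier placed at new_index."""
    tier = next(t for t in tiers if t.label == label)
    rest = [t for t in tiers if t.label != label]
    rest.insert(new_index, tier)
    return rest


def visible_labels(tiers):
    """Labels of the tiers that would actually be drawn."""
    return [t.label for t in tiers if not t.hidden]


if __name__ == "__main__":
    # Hypothetical annotations; the type labels are made up for the example.
    anns = [
        {"id": "cdna1", "type": "full-length cDNA"},
        {"id": "est1", "type": "EST"},
        {"id": "cdna2", "type": "full-length cDNA"},
    ]
    tiers = sort_into_tiers(anns, key=lambda a: a["type"])
    tiers = move_tier(tiers, "EST", 0)
    tiers[1].hidden = True
    print(visible_labels(tiers))
```

Keeping the hidden and collapsed state on the tier, rather than on each annotation, lets the user rearrange the display without touching the underlying data.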
3. Protein in the context of genomic
sequence
Shading coding region exons according to frame, and
displaying protein motifs alongside gene structures, allows
biologists to assess at a glance how alternative splicing
impacts protein function.
Arg1 Locus. Two alternative transcripts encoding
divergent proteins are shown. Alternative splicing causes
the final exon to be translated in different frames in the
two different variants.
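The frame used for shading can be derived from the coding exon coordinates alone. The sketch below is a minimal illustration, assuming forward-strand, 0-based, half-open coordinates that cover only the coding portion of each exon; the function name and the coordinates are hypothetical and do not describe the Arg1 locus.

```python
def exon_frames(cds_exons):
    """Return the frame (0, 1, or 2) of each coding exon.

    The frame is measured against genomic coordinates, so two transcripts
    give the same value for a shared exon only if they read its bases in
    the same codon positions.
    """
    frames = []
    coding_bases_so_far = 0
    for start, end in cds_exons:
        # Shift the exon start back by the coding bases already consumed
        # to find where a codon boundary falls on the genomic axis.
        frames.append((start - coding_bases_so_far) % 3)
        coding_bases_so_far += end - start
    return frames


if __name__ == "__main__":
    # Two made-up variants sharing the first and last exons.
    variant_a = [(100, 190), (300, 350), (500, 620)]
    variant_b = [(100, 190), (500, 620)]  # skips the 50-base middle exon
    print(exon_frames(variant_a))  # [1, 0, 0]
    print(exon_frames(variant_b))  # [1, 2]
```

Because the skipped exon is not a multiple of three bases long, the shared final exon falls in a different frame in the two variants and would be shaded differently.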
At low zoom, "V" characters indicate "cDNA inserts,"
regions of the cDNA sequence that failed to align to
the genomic sequence. These can appear when the genomic sequence
contains gaps or runs of "N's" representing ambiguous
sequence. Inverted "V" characters at low zoom indicate
gaps in the cDNA part of the alignment, while
mismatches are shown as black rectangles. At high
zoom, the sequence for the cDNA partner in the
alignment appears.
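The glyph categories described above can be derived from a gapped pairwise alignment. The sketch below assumes the alignment is available as two equal-length strings with "-" marking gaps; that representation, the function name, and the label names are assumptions made for illustration.

```python
def classify_columns(cdna_aln: str, genomic_aln: str):
    """Label each alignment column as match, mismatch, cdna_insert, or cdna_gap."""
    if len(cdna_aln) != len(genomic_aln):
        raise ValueError("aligned strings must have equal length")
    labels = []
    for c, g in zip(cdna_aln.upper(), genomic_aln.upper()):
        if g == "-":
            # cDNA bases with no genomic partner: drawn as "V" at low zoom.
            labels.append("cdna_insert")
        elif c == "-":
            # Genomic bases with no cDNA partner: drawn as an inverted "V".
            labels.append("cdna_gap")
        elif c == g:
            labels.append("match")
        else:
            # Drawn as a black rectangle.
            labels.append("mismatch")
    return labels


if __name__ == "__main__":
    print(classify_columns("ACGTTA-G", "AC--TACG"))
```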
