FIND MEANING IN COMPLEXITY
© Copyright 2014 by Pacific Biosciences of California, Inc. All rights reserved. For Research Use Only. Not for use in diagnostic procedures.
Jason Chin (@infoecho) / Sept. 20, 2014, GRC Workshop, Cambridge, UK
Learning Genomic Structures From De Novo Assembly and Long-read Mapping
Cost per Genome Dilemma
Sequencing cost is down for sure, but getting a de novo human genome that meets the same scientific standard as the initial work does NOT follow Moore’s law.
Figure labels:
PacBio® CHM1: 4378 kb, from just a single random-fragment library
HGP, N50 ~100 kb
NCBI-34: contig N50 29 Mb
HuRef: 107 kb
BGI YH: 7.4 kb
KB1: 5.5 kb
NA12878: 24 kb
CHM1: 144 kb
RP11: 127 kb
According to the NHGRI website, the definition of “sequencing a genome” changed in 2008. The 1000 Genomes Project started in 2008, too.
Questions Asked!
•  Since the 1000 Genomes Project, we have learned a lot about point mutations. Can we go beyond that?
•  What if we had 50, 100, or more human assemblies, so that we could address as many genetic variations as possible?
•  Will all human genome sequencing one day be done in a de novo fashion?
–  If so, how can we get ready for that as bioinformaticians?
Evan Eichler, in Future Opportunities for Genome Sequencing and Beyond, July 28-29, 2014
Where We Are Now
•  One PacBio® human data set is publicly available; more are likely to come.
•  Multiple groups have independently assembled the public CHM1 data set from raw data with new algorithms.
•  With the new alignment/assembly tools from Gene Myers, one can assemble a genome in ~20,000 CPU-hours (20X faster than the 400,000+ CPU-hours of the previous effort).
New assembly statistics with Daligner (lengths in bp):
#Seqs    5,058
Mean     562,695
Max      27,292,514
N50      5,265,098
Total    2,846,115,586
http://dazzlerblog.wordpress.com
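The summary statistics in the table above (#Seqs, Mean, Max, N50, Total) can all be derived from a list of contig lengths. The following is a minimal Python sketch, assuming the lengths have already been parsed from the assembly FASTA; the function name `assembly_stats` is hypothetical, and this is not the script that produced the table.

```python
def assembly_stats(lengths):
    """Summarize contig lengths (bp): count, mean, max, N50, and total."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    n50 = 0
    running = 0
    # N50: walk contigs from longest to shortest; the contig that brings the
    # running sum to at least half of the total assembly size sets the N50.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "seqs": len(lengths),
        "mean": total / len(lengths),
        "max": max(lengths),
        "n50": n50,
        "total": total,
    }


stats = assembly_stats([100, 200, 300, 400])
print(stats)  # {'seqs': 4, 'mean': 250.0, 'max': 400, 'n50': 300, 'total': 1000}
```

As a sanity check on the table, 5,058 contigs × a 562,695 bp mean is approximately 2.846 Gbp, which matches the reported total.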
What Can We Learn from High-contiguity De Novo Human Assemblies?
What Can We Learn from High-contiguity Human Assemblies?
•  Low-hanging fruit
–  Calling SNPs (assembly not needed, but it helps)
–  Calling structural variants with whole-genome alignment approaches
–  Inferring repeats by coverage analysis
•  The assembly graph can provide information for understanding more complicated polymorphisms
Call SNPs / Example: HLA-B
Call Structural Variation by Whole-genome Alignment
•  Whole-genome alignments (~1 hr on a 32-core machine)
–  With multi-threaded MUMmer
–  Cluster the hits with mgaps, identify “gaps” in the alignments, and convert them to BED format for visualization
Structural Variants Called in Chromosome 1
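The gap-calling step can be illustrated with a small Python sketch. It assumes the MUMmer hits have already been chained into co-linear clusters (the role mgaps plays in the pipeline) and parsed into simple records; `Hit`, `gaps_to_bed`, `to_bed_lines`, and the forward-strand-only simplification are hypothetical and do not reflect MUMmer's actual output format or the author's code. The 100 bp default matches the size cutoff used for the SV counts.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment block between the reference and an assembled contig.

    Coordinates are 0-based and half-open; the hit is assumed to be on the
    forward strand of both sequences (hypothetical record, not MUMmer's format).
    """
    chrom: str
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int


def gaps_to_bed(hits, min_size=100):
    """Call insertions/deletions from gaps between consecutive co-linear hits.

    `hits` must belong to one chained cluster (same chromosome and strand).
    Returns BED-like tuples (chrom, start, end, name).
    """
    calls = []
    ordered = sorted(hits, key=lambda h: h.ref_start)
    for prev, cur in zip(ordered, ordered[1:]):
        ref_gap = cur.ref_start - prev.ref_end  # reference bases skipped
        qry_gap = cur.qry_start - prev.qry_end  # contig bases skipped
        size = qry_gap - ref_gap  # > 0: extra contig sequence, < 0: missing sequence
        if abs(size) <= min_size:
            continue  # small differences are left to base-level variant calling
        kind = "INS" if size > 0 else "DEL"
        start = prev.ref_end
        end = max(cur.ref_start, start)  # an insertion may span zero reference bases
        calls.append((prev.chrom, start, end, f"{kind}_{abs(size)}"))
    return calls


def to_bed_lines(calls):
    """Format calls as tab-separated BED lines for a genome browser."""
    return ["\t".join(map(str, call)) for call in calls]


hits = [
    Hit("chr1", 0, 1000, 0, 1000),
    Hit("chr1", 1000, 2000, 1318, 2318),  # 318 bp of extra contig sequence before this hit
    Hit("chr1", 2500, 3000, 2318, 2818),  # 500 reference bases with no contig counterpart
]
for line in to_bed_lines(gaps_to_bed(hits)):
    print(line)
```

Insertions come out as zero-length intervals (start == end). Some tools expect start < end, so padding such calls by one base may be necessary before loading them.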
Distribution of the Structural Variant Sizes
•  Number of insertions/deletions: 13,796 SV calls (insertions or deletions > 100 bp against hg19)
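SV size distributions like the one on this slide are usually shown on a log scale. The following is a minimal Python sketch of decade binning, assuming sizes are positive integers in bp (deletion sizes taken as absolute values). The function name `size_histogram` is hypothetical, and the actual binning used for the figure is not stated on the slide.

```python
from collections import Counter


def size_histogram(sizes, min_size=100):
    """Count SV sizes (bp) in decade bins: bin k holds sizes in [10**k, 10**(k+1)).

    Sizes at or below `min_size` are dropped, mirroring the > 100 bp cutoff.
    Counting integer digits avoids floating-point edge cases of log10.
    """
    counts = Counter(len(str(size)) - 1 for size in sizes if size > min_size)
    return dict(sorted(counts.items()))


for decade, n in size_histogram([150, 318, 999, 1000, 25000, 50]).items():
    print(f"{10**decade}-{10**(decade + 1) - 1} bp: {n}")
```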
PacBio® vs. Short-read Alignment View for an SV in the MHC Region
Figure label: 318 bp insertion
Assembly Graph
Each edge is associated with a sequence. Every path is a candidate model of part of the genome.
From Gene Myers’ ISMB 2014 keynote talk
Dissect a Contig from a String Graph
The anatomy of a contig from a string graph layout
A contig: a linear, non-branching path
Each node: the begin (5’) or end (3’) of a read
Each edge: a continuous subsequence from one read
E_k = (V1, V2, Read, Range) = (00099576_1:B, 00101043_0:B, 00101043_0, 1991-0)
Read 1: 00099576_1; Read 2: 00101043_0
In practice, we might encode only the paths in a contig rather than every single edge:
C = (E_k, E_k+1, E_k+2, E_k+3) = (P_j, P_j+1)
Figure: a contig V1 → V2 → V3 → V4 → V5 with edges E_k, E_k+1, E_k+2, E_k+3, encoded as V1 → V3 → V5 with paths P_j and P_j+1
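The edge encoding and the "contig = linear non-branching path" definition can be sketched in Python. The `Edge` fields mirror E_k = (V1, V2, Read, Range). The names `Edge`, `linear_paths`, and `compress`, the `(start, end)` range tuple, and the reading of a reversed range such as (1991, 0) as reverse orientation are all assumptions for illustration. The sketch also treats the graph as plain directed, whereas a real string graph carries both strands of every read, and isolated cycles made only of pass-through nodes are not reported.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """E_k = (V1, V2, Read, Range): an edge spells a sub-range of one read."""
    v1: str            # e.g. "00099576_1:B", the begin (B) or end of a read (assumed)
    v2: str            # e.g. "00101043_0:B"
    read: str          # read that supplies the edge sequence, e.g. "00101043_0"
    read_range: tuple  # (start, end) on that read; (1991, 0) taken as reverse (assumed)


def linear_paths(edges):
    """Split a directed string graph into maximal non-branching paths (contigs)."""
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    for e in edges:
        out_edges[e.v1].append(e)
        in_degree[e.v2] += 1
    nodes = set(out_edges) | set(in_degree)

    def interior(v):
        # One edge in and one edge out: the path continues straight through v.
        return in_degree[v] == 1 and len(out_edges[v]) == 1

    paths = []
    for v in nodes:
        if interior(v):
            continue  # reached from the path's start instead
        for e in out_edges[v]:
            path = [e]
            while interior(path[-1].v2):
                path.append(out_edges[path[-1].v2][0])
            paths.append(path)
    return paths


def compress(path):
    """Encode a path by its end nodes and the reads it uses, like P_j."""
    return (path[0].v1, path[-1].v2, tuple(e.read for e in path))


chain = [Edge(f"V{i}", f"V{i + 1}", f"r{i}", (0, 10)) for i in range(1, 5)]
print([compress(p) for p in linear_paths(chain)])
```

Storing compressed paths rather than every edge keeps contig records small while still letting one recover the underlying edges from the graph.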
Assembly String Graph of CHM1 Genome
•  Largest connected component: 31,998 nodes and 39,399 edges, ~36.5% (~1 Gbp) of the human genome (whole graph: 87,572 nodes, 94,530 edges)
Figure label: Centromere?
Casey Bergman: “it almost looks like an electron micrograph of the nucleus” #convergence
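Node and edge counts for the largest connected component can be obtained with a breadth-first search that ignores edge direction (weakly connected components). The following is a minimal Python sketch, assuming edges are given as `(u, v)` node pairs. The slide does not say how the components were computed, and `largest_component` is a hypothetical name.

```python
from collections import Counter, defaultdict, deque


def largest_component(edges):
    """Return (nodes, edges) of the largest weakly connected component.

    `edges` is a non-empty list of (u, v) pairs; direction is ignored.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    component_of = {}
    for label, start in enumerate(adjacency):
        if start in component_of:
            continue
        component_of[start] = label
        queue = deque([start])
        while queue:  # breadth-first search labels every node reachable from start
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component_of:
                    component_of[neighbor] = label
                    queue.append(neighbor)

    node_counts = Counter(component_of.values())
    edge_counts = Counter(component_of[u] for u, _ in edges)
    biggest = max(node_counts, key=node_counts.get)
    return node_counts[biggest], edge_counts[biggest]


print(largest_component([("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]))  # (3, 3)
```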
Polymorphism Structure vs. Local Assembly Graph Structure
Figure: a diploid genome and a segmental duplication (SNPs and SVs marked on each) give a similar string graph.
Identify Contigs: A New Proposal
Figure: one primary contig plus associated contig 1 and associated contig 2, covering the alternative alleles (SNPs, SVs)
1 full-length contig + 2 associated contigs
Keep the long-range information while maintaining the relations of the alternative alleles.
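The "1 full-length contig + 2 associated contigs" proposal can be sketched as a walk through a chain of bubbles in a compressed graph. At each node, one branch extends the primary contig and the remaining branches become associated contigs. This is an illustration under loud assumptions: every alternative branch is taken to reconverge at the same node as the primary branch, "longest branch wins" is a hypothetical tie-breaking rule, and `primary_and_associated` is not the assembler's actual method.

```python
from collections import defaultdict


def primary_and_associated(paths, start):
    """Walk a chain of bubbles, keeping one primary path and the alternatives.

    `paths` holds (source, sink, name, length) tuples from a compressed graph.
    At each node the longest outgoing path joins the primary contig (a
    hypothetical rule); the others become associated contigs.
    """
    out = defaultdict(list)
    for source, sink, name, length in paths:
        out[source].append((length, name, sink))

    primary, associated = [], []
    node, visited = start, set()
    while out[node] and node not in visited:  # stop at a dead end or a cycle
        visited.add(node)
        branches = sorted(out[node], reverse=True)  # longest branch first
        _, name, sink = branches[0]
        primary.append(name)
        associated.extend(alt_name for _, alt_name, _ in branches[1:])
        node = sink
    return primary, associated


graph = [
    ("s", "a", "P1", 1000),
    ("a", "b", "A1", 500), ("a", "b", "A2", 480),  # bubble 1 (e.g. an SV)
    ("b", "c", "P2", 2000),
    ("c", "d", "B1", 300), ("c", "d", "B2", 290),  # bubble 2 (e.g. a SNP cluster)
    ("d", "e", "P3", 800),
]
print(primary_and_associated(graph, "s"))
```

The primary list spans the whole region, which preserves long-range contiguity. Each associated path can still be placed through its shared source and sink nodes.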
Contig 4076 Alignment Around DPY19L2 Locus
Same contig
Contig Graph and Segmental Duplication
Contig 4076: one primary contig and 3 associated contigs, aligned to chr7 and chr12
Contig 4076 Alignment to Chr7
Figure labels: Same contig; SV calls from CHM1 assembly; SV calls from GRCh38
Local Neighborhood Subgraph of Contig 4076
Examining an Assembly Graph at Contig Level Around 1q21
•  Contig graph, 1q21, contig 4108: another potential segmental duplication?
Another Intriguing Case
•  Contig 4006 mapped to chr9
The aligned region changed substantially in GRCh38.
Contig Coverage Analysis
Figure labels: 18.5X, 2 × 18.5X, 3 × 18.5X
High-coverage long contigs: 40 contigs > 100 kbp with coverage > 2.5 × 18.5X
Poor assemblies, alignment artifacts, or sequence errors?
High repeat elements
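The high-coverage filter on this slide can be expressed in a few lines. The 18.5X base coverage and the 2.5× / 100 kbp cutoffs come from the slide. The function `flag_high_coverage` and its input format are hypothetical: it takes per-contig mean coverage computed beforehand, for example by mapping reads back to the contigs.

```python
BASE_COVERAGE = 18.5  # average read coverage from the slide, in X


def flag_high_coverage(contigs, min_length=100_000, factor=2.5, base=BASE_COVERAGE):
    """Return (name, coverage / base) for long contigs with unusually high coverage.

    `contigs` holds (name, length_bp, mean_coverage) tuples. Contigs longer than
    `min_length` with coverage above `factor * base` are flagged as candidates for
    collapsed repeats, misassemblies, or alignment artifacts.
    """
    threshold = factor * base
    return [
        (name, round(coverage / base, 2))
        for name, length, coverage in contigs
        if length > min_length and coverage > threshold
    ]


contigs = [
    ("4006", 687_000, 53.0),
    ("4235", 453_000, 59.0),
    ("short", 50_000, 80.0),    # too short to be flagged
    ("normal", 300_000, 19.0),  # close to the genome-wide average
]
print(flag_high_coverage(contigs))
```

Reporting coverage as a multiple of the base coverage makes the 2×/3× peaks in the figure directly comparable across data sets with different sequencing depth.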
Checking the Complexity of the High-coverage Contigs
Contig 4006, 687 kb, 53X coverage
Contig 4235, 453 kb, 59X coverage
Contig 3842, 235 kb, 54X coverage
Warning: These contigs may not be 100% correctly assembled due to some nasty repeats. However, the local graphs give hints about the true genome structures.
How Does the High-coverage Contig Look?
>2000X in this region
How Does the High-coverage Contig Look?
Figure labels: High-coverage region; Alpha satellites?
Extreme Repeats
Identify Centromere Alpha-satellite Structure
•  Most of the nasty contig graphs are around the centromeres. Currently, it remains hard to get long contigs through those very long tandem repeats.
•  However, we can still learn many useful things from long-read data.
•  Tool in development: α-Centauri, for identifying different higher-order repeat structures (https://github.com/volkansevim/alpha-CENTAURI; Volkan Sevim, Ali Bashir & Karen Miga)
Centromere Alpha Satellites Have Non-trivial Higher-order Repeat Structure
(Karen Miga)
Example: A Read Reconstructs a 24-mer HOR
•  Align monomers to each other to identify near-identical monomers
•  Identify the HOR from the monomer IDs and positions
Figure: monomers numbered 1 to 24 along the read
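After near-identical monomers share an ID, the HOR length can be read off as the period of the ID sequence. The following is a minimal Python sketch of that second step, assuming the monomer clustering (the all-vs-all monomer alignment) has already been done. `hor_period`, the 90% agreement threshold, and the 50-monomer search limit are hypothetical choices, not α-Centauri's implementation.

```python
def hor_period(monomer_ids, max_period=50, min_identity=0.9):
    """Smallest period p with monomer_ids[i] == monomer_ids[i + p] for at least
    `min_identity` of the comparable positions, or None if no period qualifies.

    `monomer_ids` are cluster labels assigned after aligning monomers to each other
    (near-identical monomers share a label); clustering itself is not shown.
    """
    n = len(monomer_ids)
    for p in range(1, min(max_period, n // 2) + 1):  # require at least two copies
        pairs = n - p
        matches = sum(monomer_ids[i] == monomer_ids[i + p] for i in range(pairs))
        if matches / pairs >= min_identity:
            return p
    return None


ids = list(range(1, 25)) * 3  # a read covering three copies of a 24-monomer HOR
print(hor_period(ids))  # 24
```

The identity threshold lets a few divergent or mislabeled monomers pass without hiding the period.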
Many Other Open Topics
•  Low-coverage assembly: cost vs. quality analysis
•  Phasing for haplotypes
•  Crowd-sourcing infrastructure for examining / annotating / correcting genome assemblies
•  Evaluation of SNP calling with short reads against a better assembly
•  Large-scale comparative genomics with de novo assemblies
•  Assembly-graph data format
•  Visualization techniques
•  Combining other data types, e.g. optical mapping
It is a very exciting time. We still need more tools to harvest information and generate new knowledge.
Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences in the United States and/or other countries. All other trademarks are the sole property of their respective owners.

