Validating and improving the
D. melanogaster reference genome sequence
using PacBio de novo assemblies
Casey M. Bergman
@bergmanlab
@caseybergman
University of Liverpool
Centre for Genomic Research
PacBio Symposium
4 April 2014
Credits
• Danny Miller (Stowers Institute)
• Jane Landolin, Kristi Kim, Jason Chin & Edwin Hauw
(Pacific Biosciences)
• Sue Celniker & Roger Hoskins (Berkeley Drosophila
Genome Project)
• Sergey Koren & Adam Phillippy (National Biodefense
Analysis and Countermeasures Center)
• Raquel Linheiro (University of Manchester)
Bridges (1916) PMID: 17245850
“The” Drosophila genome circa 1910
“The” Drosophila genome circa 1925
Morgan et al. (1925) The Genetics of Drosophila
Painter (1933) PMID: 17801695
“The” Drosophila genome circa 1940
The strategy we have used is called chromosomal walking and jumping; it is
shown diagrammatically in Figure 1. The chromosomal origin of any non-repeated
segment of D. melanogaster DNA (Dm segment) can be determined by in situ
hybridization of that DNA to polytene chromosomes. When the sites of
hybridization are visualized by tritium autoradiography, the position is usually
confined to one or a few bands, which is similar to the precision of the cytological
localizations of rearrangement breakpoints or the localizations of well-mapped
genes. If a DNA sequence is found within a few bands of a gene of interest, that
sequence can be used as the starting point for a chromosomal walk to the gene. A
"step" in the walking procedure involves screening a recombinant DNA library of
random large Dm segments to collect those that overlap the starting point. The
[Figure 1 diagram not recoverable from extraction. Surviving labels: START HERE, LEFT FUSION FRAGMENT, RIGHT FUSION FRAGMENT, INVERSION BREAK (twice).]
Fig. 1. The strategy for walking and jumping. The upper chromosome represents a portion of the
right arm of the third chromosome with normal cytology (drawn from the map of Bridges, 1941), and
the lower chromosome has an inversion of the region from 87E to 89E. A few steps of a chromosomal
walk are shown diagrammatically below the 87E region (not to scale with the chromosome). When the
walk reached the site of the inversion breakpoint, the DNA from that position could be used to
identify the two fusion fragments isolated from the inversion chromosome. The foreign DNA in the
fusion fragments (tandem circles) was homologous to normal chromosomal DNA at the right or distal
inversion breakpoint, and thus it served as the origin of a chromosomal walk in 89E.
e.g. Bender et al. (1983) PMID: 6410077
“The” Drosophila genome circa 1990
“The” Drosophila genome circa 2000
Adams et al. (2000) PMID: 10731132
Accuracy of whole genome shotgun (WGS)
assembly vs. BAC-based physical map
Myers et al. (2000) PMID: 10731133
peaks - discrepancies
green - gaps
purple - TEs
Myers et al. (2000) PMID: 10731133
Accuracy of WGS vs. BAC-based sequencing
“The” Drosophila genome since 2000
~ 120 Mb of euchromatin
~ 60-100 Mb heterochromatin
Release  Date      Total size of scaffolds (nt)  Total size of contigs (nt)  Contigs  Contig N50 (nt)
1        Mar 2000  116,117,226                   114,201,085                 1427     220,490
2        Oct 2000  116,109,070                   114,448,849                 1103     318,193
3        Dec 2002  116,781,562                   116,739,493                 50       14,289,516
4        Apr 2004  118,357,599                   118,348,386                 28       18,203,742
5        Mar 2006  120,381,546                   120,290,946                 14       21,485,538
Euchromatic genome assemblies
Several gaps persist in euchromatic arms
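The Contig N50 column above is the standard contiguity statistic: the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch of the calculation follows; the contig lengths are made-up toy values, not data from any release.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
print("toy N50:", toy_n50)
```

Because N50 depends only on the sorted length distribution, merging a few large contigs (as between Release 2 and Release 3) can raise it sharply even when the total assembled size barely changes.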
~ 120 Mb of euchromatin
~ 60-100 Mb heterochromatin
“The” Drosophila genome since 2000
Hoskins et al. (2007) PMID: 17569867
Heterochromatic genome assemblies
~350 Kb in Rel5
Release  Scaffolds            Total size of scaffolds (nt)  Contigs  Total size of contigs (nt)
1        0                    0                             0        0
2        1 (U)                7,513,406                     1000     5,530,718
3        2604                 20,941,991                    3810     17,150,417
4        0                    0                             0        0
5        8 (U + armHet + mt)  19,350,335                    3044     16,535,110
Majority of heterochromatin unassembled
Heterochromatic genome assemblies
Low coverage pilot experiment with Hawley Lab
http://bergmanlab.smith.man.ac.uk/?p=1971
High coverage experiment with PacBio & BDGP
http://blog.pacificbiosciences.com/2014/01/data-release-preliminary-de-novo.html
http://www.ncbi.nlm.nih.gov/sra/?term=SRP040522
Metric             Value
Library Size (Kb)  15
Chemistry          P5-C3
# SMRT cells       42
Run time (days)    6
# bases (nt)       15,208,567,933
# reads            1,514,730
avg length (nt)    10,040
N50 (nt)           14,214
Max (nt)           44,766
High coverage PacBio dataset for
D. melanogaster BDGP reference strain
http://blog.pacificbiosciences.com/2014/01/data-release-preliminary-de-novo.html
http://www.ncbi.nlm.nih.gov/sra/?term=SRP040522
Reference-based long-read mapping with BLASR
http://bergmanlab.smith.man.ac.uk/?p=2176
>90x coverage based on reference mapping
http://bergmanlab.smith.man.ac.uk/?p=2176
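The dataset statistics and the >90x figure can be cross-checked with simple arithmetic. The sketch below recomputes the mean read length from the reported totals and divides the raw yield by the Release 5 scaffold totals quoted earlier in the deck. Two assumptions are made here that the slides do not state: that the mapping reference is the sum of the euchromatic and heterochromatic Release 5 scaffolds, and that every base maps. The result is therefore an upper bound on mapped depth, not the author's calculation.

```python
# Values reported on the dataset slide.
total_bases = 15_208_567_933
num_reads = 1_514_730

# Release 5 scaffold totals from the assembly tables earlier in the deck.
rel5_euchromatic_bp = 120_381_546
rel5_heterochromatic_bp = 19_350_335

# Assumption: the mapping reference is the sum of the two totals above.
reference_bp = rel5_euchromatic_bp + rel5_heterochromatic_bp

mean_read_length = total_bases / num_reads
# Raw yield over reference size: an upper bound on mapped depth,
# reached only if every sequenced base aligned.
max_depth = total_bases / reference_bp

print(f"mean read length: {mean_read_length:.0f} nt")
print(f"upper bound on depth: {max_depth:.1f}x")
```

Under these assumptions the recomputed mean agrees with the reported 10,040 nt, and the upper bound sits above 90x, so the mapped coverage figure is plausible if most bases align.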
PacBio-only assemblies of the
D. melanogaster genome
Assembly     Read Set     Pre-assembly  Assembler  Quivered
CA25x        25x longest  PBcR          CA 8.1     n
CA25x-Q      25x longest  PBcR          CA 8.1     y
CA50x        50x longest  PBcR          CA 8.1     n
FALCON-Q     25x longest  FALCON        FALCON     y
FALCON-PBcR  70x          PBcR          FALCON     n
FALCON-AWS   all          FALCON        FALCON     n
Koren & Phillippy (unpublished)
Chin & Bergman (unpublished)
Assembly     Contigs  Contig N50 (nt)  Max Contig (nt)
CA25x        128      15,297,019       24,622,056
CA25x-Q      128      15,305,620       24,648,237
CA50x        131      4,105,199        24,577,947
FALCON-Q     434      5,001,041        21,336,512
FALCON-PBcR  1774     7,499,810        25,727,813
FALCON-AWS   955      7,882,002        21,631,108
PacBio-only assemblies of the
D. melanogaster genome
Long-range contiguity of CA25x assembly
Koren & Phillippy (unpublished)
http://cbcb.umd.edu/software/PBcR/dmel.html
[Figure labels: chromosome arms X, 3R, 3L, 2L, 4, 2R]
Chin (unpublished)
https://github.com/PacificBiosciences/falcon
Long-range contiguity of FALCON-Q assembly
[Figure labels: chromosome arms X, 3R, 2R, 3L, 2L]
Base level accuracy of PacBio
D. melanogaster assemblies vs Release 5
[Figure: bar charts of mismatches/100kb and indels/100kb for CA25x, CA25x-Q, CA50x, FALCON-Q, FALCON-PBcR and FALCON-AWS, with Rel1 and Rel3 marked for comparison; axis ticks run 0-30 and 0-300.]
Towards a $1000 genome assembly
using FALCON, StarCluster & AWS
Assembly    Pre-assembly (CPU hours)  Assembly (CPU hours)
CA25x       621,000                   8,000
FALCON-AWS  1,500                     48
Expert Novice
https://github.com/PacificBiosciences/FALCON/blob/v0.1.1/examples/Dmel_asm.md
https://github.com/cbergman/FALCON/blob/v0.1.1/examples/Dmel_asm.md
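The CPU-hour table can be turned into a rough budget check. The sketch below totals the two stages for each assembly and asks what price per CPU hour would keep the FALCON-AWS run under $1000. Treating the $1000 as a compute-only budget is an assumption made here for illustration; the slide does not say what the figure covers, and no actual cloud pricing is implied.

```python
# CPU hours from the table above.
CPU_HOURS = {
    "CA25x": {"pre_assembly": 621_000, "assembly": 8_000},
    "FALCON-AWS": {"pre_assembly": 1_500, "assembly": 48},
}

# Assumption: the $1000 target is spent on compute alone.
BUDGET_USD = 1000


def total_cpu_hours(name):
    """Sum pre-assembly and assembly CPU hours for one assembly."""
    return sum(CPU_HOURS[name].values())


ca_total = total_cpu_hours("CA25x")
falcon_total = total_cpu_hours("FALCON-AWS")

# How many times less compute the FALCON-AWS run used.
fold_reduction = ca_total / falcon_total
# Highest hourly price that still fits the assumed budget.
max_price_per_cpu_hour = BUDGET_USD / falcon_total

print(f"CA25x: {ca_total} CPU hours; FALCON-AWS: {falcon_total} CPU hours")
print(f"fold reduction: {fold_reduction:.0f}x")
print(f"break-even price: ${max_price_per_cpu_hour:.2f} per CPU hour")
```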
Euchromatic gap closure with PacBio contigs
Celniker (unpublished)
Gap at 64C
Gap at 57B
Identification of Y-chromosome contigs in
PacBio assemblies by female/male depth ratio
[Figure: histograms of female/male depth ratio (x-axis, 0-3) against log10 frequency (y-axis) for chr2L, chr2R, chr3L, chr3R, chr4, chrX and chrYHet, from bwa mapping of short-read DNA-seq.]
Linheiro & Bergman (unpublished)
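The idea behind the female/male depth ratio can be sketched in a few lines. With normalized read depth, sequence present in both sexes at equal copy number gives a ratio near 1, X-linked sequence gives a ratio near 2 (two copies in females, one in males), and Y-linked sequence gives a ratio near 0. The thresholds, function names and depth values below are hypothetical choices for illustration, not the cutoffs used by Linheiro & Bergman.

```python
def depth_ratio(female_depth, male_depth, female_norm=1.0, male_norm=1.0):
    """Female/male depth ratio after per-library normalization.

    female_norm and male_norm are scaling factors (for example the
    median autosomal depth of each library). Returns None when the
    male depth is zero, since the ratio is then undefined.
    """
    if male_depth <= 0:
        return None
    return (female_depth / female_norm) / (male_depth / male_norm)


def classify(ratio, y_max=0.5, x_min=1.5):
    """Assign a contig to a class using hypothetical ratio thresholds."""
    if ratio is None:
        return "unassigned"
    if ratio < y_max:
        return "Y-like"
    if ratio >= x_min:
        return "X-like"
    return "autosome-like"


# Made-up (female depth, male depth) pairs for three toy contigs.
toy_contigs = {
    "contig_A": (40.0, 41.0),
    "contig_X": (40.0, 20.5),
    "contig_Y": (0.8, 19.0),
}
calls = {
    name: classify(depth_ratio(female, male))
    for name, (female, male) in toy_contigs.items()
}
print(calls)
```

In practice the ratio is noisy in repeat-rich sequence, which is one reason to inspect it in windows along each contig rather than relying on a single per-contig value.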
[Figure: female/male depth ratio (y-axis, 0-4) in windows with a 10 Kb step along contig 0052_00|quiver|quiver (x-axis: location in chr, x10000, 0-60). Legend: A ratio, X ratio, Y ratio, X log 100 count, Y log 100 count.]
Identification of Y-chromosome contigs in
PacBio Assemblies by female/male depth ratio
Linheiro & Bergman (unpublished)
Improvement of the Y-chromosome
assembly & gene models
Celniker (unpublished)
Take Home (I)
• View of D. melanogaster genome has been changing
for >100 years & is still not complete
• Frontier of D. melanogaster genome assembly is in
heterochromatic regions (model for repeat-rich plant
genomes)
• PacBio long reads can be used to generate long-range
de novo assemblies that can close euchromatic gaps &
generate large heterochromatic contigs
• Bioinformatic challenges: better pre-assembly
algorithms, better polishing algorithms, *.h5 data
archiving
• Early, open release of genomic data by small labs
can stimulate big returns & new collaborations
• PacBio has right corporate philosophy of engaging/
collaborating with the genomics community (open
data, open source)
Take Home (II)
