Solanum lycopersicum Heinz 1706
genome assembly and annotation
SL4.0 and ITAG4.0
Sol Genomics Network
https://solgenomics.net/
SL4.0 assembly
● 80x PacBio coverage with RSII and Sequel (13 kb read N50)
● Canu assembly (N50 5.5 Mb)
● Hi-C scaffolding (12 chromosomes and unplaced contigs)
● Corrected with Illumina DNA-seq (60x coverage)
● Filtered for mitochondrial and chloroplast contigs
● Validated with Bionano optical maps and 10X linked reads
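The N50 values quoted above (13 kb for reads, 5.5 Mb for the Canu contigs) follow the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total bases. A minimal sketch of that calculation, using hypothetical contig lengths rather than the actual SL4.0 contigs:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to at least
        # half of all bases defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths for illustration only (not SL4.0 data).
example_lengths = [8, 6, 5, 3, 2, 1]
print(n50(example_lengths))
```

In practice the lengths would be read from the assembly FASTA index; that step is left out here to keep the sketch self-contained.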
Comparison with the previous assemblies
Genome assembly version                               SL4.0        SL3.0        SL2.5
Assembly size (bp)                              782,520,133  828,076,956  823,944,041
Non-N bases                                     782,475,302  746,357,470  737,636,348
N's (bp)                                             44,831   81,719,486   86,307,693
Chr 00 / unplaced contig size (bp)                9,643,350   20,852,292   21,805,821
Number of Chr 00 contigs                                152        3,141        4,410
Repeat content (RepeatModeler/RepeatMasker)          64.19%       56.39%       56.34%
Repeat content (REPET)                               71.77%       61.55%       60.94%
Assembly completeness estimate (k-mer based)         99.24%       98.96%       98.83%
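The size rows in the comparison are internally consistent: for each version, assembly size minus non-N bases equals the reported N count. A short sketch that checks this and derives the gap fraction per assembly; the dictionary layout and the `gap_fraction` helper are illustrative names, while the figures are copied from the comparison above:

```python
# Figures copied from the assembly comparison; key names are illustrative.
assemblies = {
    "SL4.0": {"size": 782_520_133, "non_n": 782_475_302, "n": 44_831},
    "SL3.0": {"size": 828_076_956, "non_n": 746_357_470, "n": 81_719_486},
    "SL2.5": {"size": 823_944_041, "non_n": 737_636_348, "n": 86_307_693},
}


def gap_fraction(size, non_n):
    """Fraction of the assembly made up of N (gap) bases."""
    return (size - non_n) / size


for name, stats in assemblies.items():
    # Consistency check: size - non-N bases should equal the N count.
    assert stats["size"] - stats["non_n"] == stats["n"]
    pct = 100 * gap_fraction(stats["size"], stats["non_n"])
    print(f"{name}: {pct:.4f}% of bases are N")
```

The gap fraction drops from roughly a tenth of the assembly in SL3.0 and SL2.5 to well under a hundredth of a percent in SL4.0.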
SL3.0 vs SL4.0
Genome assembly co-linearity
Input data for genome annotation
- Full-length cDNA sequenced using PacBio IsoSeq (breaker and mature green fruit stages)
- Illumina RNA-seq data from >1,300 libraries with >14 billion reads
- Disease resistance data (Martin and Jones labs)
- 3' and 5' UTR-enriched data (Giovannoni, Aharoni and Sinha labs)
- Public data from NCBI SRA
- NCBI EST sequences (~300 K)
- Full-length cDNA sequences (~13 K) from Micro-Tom (Aoki et al., 2010)
Annotation of protein-coding gene models
Annotation version                     ITAG4.0   ITAG2.4
Number of protein-coding genes          34,075    34,725
Average transcript length                1,303     1,209
Average number of exons per gene          4.74      4.61
Fraction of genes with 5' UTR             0.49      0.34
Fraction of genes with 3' UTR             0.58      0.41

Long non-coding RNAs in ITAG4.0: 5,874, with 6,694 alternatively spliced isoforms
Annotation Edit Distance (AED)
AED provides a means to evaluate the quality of annotations given the evidence set.
The cumulative AED plot shows improvement in ITAG4.0 compared to ITAG2.4.
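For readers unfamiliar with the metric, AED is commonly defined as 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotation bases supported by evidence; 0 means perfect agreement and 1 means no support. The sketch below assumes that definition at the nucleotide level. The interval representation and the function names are illustrative, and the slides do not state which tool produced the ITAG4.0 scores.

```python
def covered_positions(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def aed(annotation, evidence):
    """Nucleotide-level AED between annotation exons and aligned evidence."""
    ann = covered_positions(annotation)
    ev = covered_positions(evidence)
    if not ann or not ev:
        # Assumption: a missing annotation or missing evidence scores as
        # completely unsupported.
        return 1.0
    overlap = len(ann & ev)
    sensitivity = overlap / len(ev)   # evidence bases recovered by the model
    specificity = overlap / len(ann)  # model bases backed by evidence
    return 1 - (sensitivity + specificity) / 2


# Hypothetical gene model and evidence alignment (not ITAG data).
print(aed([(1, 100)], [(51, 150)]))
```

A cumulative plot like the one on this slide is then the fraction of gene models with AED at or below each threshold.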
Novel protein coding genes in ITAG4.0
Novel genes in ITAG4.0 are enriched in stress response genes.
GO terms enriched among the novel genes are shown with their fold enrichment and the minus log10 of their corresponding P-values.
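The two quantities shown for each GO term can be derived as follows: fold enrichment is the term's frequency among the novel genes divided by its frequency in the background set, and the significance axis is the negative base-10 logarithm of the P-value. A minimal sketch with hypothetical counts and a hypothetical P-value; the slides do not say which enrichment test or background set was used, so none is assumed here:

```python
import math


def fold_enrichment(hits, sample_size, background_hits, background_size):
    """Term frequency in the sample divided by its frequency in the background."""
    return (hits / sample_size) / (background_hits / background_size)


def minus_log10(p_value):
    """Transform a P-value so that smaller P-values give larger scores."""
    return -math.log10(p_value)


# Hypothetical counts for one GO term (not taken from ITAG4.0).
print(fold_enrichment(hits=10, sample_size=100,
                      background_hits=50, background_size=1000))
print(minus_log10(0.001))
```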
Thank you!
Submit your annotation corrections using the Tomato Apollo annotation editor (contact SGN for an account):
https://solgenomics.net/contact/form
