iMate Protocol (version 1.2) by GRAS – May 18, 2015
NGS and Phyloinfo in Kobe
http://www.clst.riken.jp/phylo/
iMate Protocol: Improved and Inexpensive Nextera™
Mate Pair Library Preparation
Authored by Kaori Tatsumi, Osamu Nishimura, Kazu Itomi, Chiharu Tanegashima & Shigehiro Kuraku
Genome Resource & Analysis Station (GRAS)
operated by Phyloinformatics Unit
RIKEN Center for Life Science Technologies (CLST)
Notice: A benchmark paper introducing this protocol has been accepted for publication in the journal
Biotechniques. When you present or publish data based on the technical guidance in this protocol, please
consider citing both this protocol at our web site and the benchmark paper (Tatsumi et al., 2015) published
in Biotechniques.
This protocol outlines our modifications to the ‘Gel-plus’ version of the standard protocol for
Nextera Mate Pair Library Preparation and the rationale behind them. Aiming for optimal
scaffolding performance, we have optimized the protocol under the possibly conservative policy
that only read pairs with junction adaptors (bona fide ‘mates’) should be passed on to scaffolding.
The keys to this protocol are optimizing 1) the tagmentation condition, 2) the Covaris shearing
condition, and 3) the sequence read length, in order to increase the yield of libraries and the
chance of detecting the junction adaptor in reads.
In our understanding, 4 μg of starting genomic DNA, as specified in the standard
protocol, is enough for preparing mate-pair libraries with a mate distance of >10 kb. Ideally, the
tagmentation condition should be optimized so that as much DNA as possible falls into the targeted
size range. For this purpose, perform the tagment reaction under multiple conditions, for example, in
three tubes with 4, 8 and 12 μl of the tagment enzyme supplied in the kit. The tagment buffer can be
self-made [1], which saves cost, provided that the other limiting reagents are also saved.
The size distribution of tagmented DNA molecules should be analyzed with a reliable method,
such as pulsed-field gel electrophoresis (e.g., Pippin Pulse) or the Agilent TapeStation; the Agilent
Bioanalyzer does not perform well for this purpose. By comparing the results from multiple
tagment reactions, you can identify which tagment condition allows you to retrieve the largest
amount of DNA in the targeted size range.
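The comparison of tagment conditions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original protocol: it assumes that the size distribution of each reaction has been exported as (fragment size in bp, relative DNA mass) pairs, that all reactions started from the same amount of DNA, and that the target range is 6-10 kb. All numbers are made-up placeholders.

```python
def fraction_in_range(profile, lower_bp, upper_bp):
    """Return the fraction of total DNA mass whose size lies in [lower_bp, upper_bp]."""
    total = sum(mass for _, mass in profile)
    if total <= 0:
        raise ValueError("profile contains no DNA signal")
    in_range = sum(mass for size, mass in profile if lower_bp <= size <= upper_bp)
    return in_range / total


def best_condition(profiles, lower_bp, upper_bp):
    """Return the name of the condition with the largest in-range fraction."""
    return max(
        profiles,
        key=lambda name: fraction_in_range(profiles[name], lower_bp, upper_bp),
    )


# Hypothetical size profiles: (fragment size in bp, relative mass) per condition.
profiles = {
    "4 ul enzyme": [(3000, 5), (8000, 20), (15000, 45), (30000, 30)],
    "8 ul enzyme": [(3000, 15), (8000, 45), (15000, 30), (30000, 10)],
    "12 ul enzyme": [(3000, 45), (8000, 35), (15000, 15), (30000, 5)],
}

for name, profile in profiles.items():
    print(name, round(fraction_in_range(profile, 6000, 10000), 2))
print("best:", best_condition(profiles, 6000, 10000))
```

With real data, the bins would come from the instrument's exported trace, and the choice should still be checked by eye against the electropherogram.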
As in the preceding tagmentation step, the amounts of the kit-supplied reagents used in the strand
displacement step limit how many libraries can be prepared with one purchased kit. It is therefore
preferable to reduce the amount of kit-supplied reagents required for
this step. Our solution is to perform strand displacement after size selection with the
BluePippin. Using size-selected DNA as the substrate, we usually perform strand displacement with
1/4 volume of all reaction components.
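The 1/4-volume scaling can be written out as simple arithmetic. The sketch below is only an illustration: the component names and full-scale volumes are placeholders invented for this example, not values from the standard protocol, so substitute the volumes given there.

```python
def scale_reaction(volumes_ul, factor):
    """Scale every component volume (in microliters) by the same factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return {name: volume * factor for name, volume in volumes_ul.items()}


# Placeholder full-scale volumes in microliters (hypothetical values).
full_reaction = {
    "size-selected DNA": 20.0,
    "buffer": 4.0,
    "dNTPs": 2.0,
    "polymerase": 2.0,
    "water": 12.0,
}

quarter_reaction = scale_reaction(full_reaction, 0.25)
for name, volume in quarter_reaction.items():
    print(f"{name}: {volume} ul")
print("total:", sum(quarter_reaction.values()), "ul")
```

At 1/4 scale, the kit-supplied enzyme and buffer for this step stretch to roughly four times as many reactions, as long as no other reagent runs out first.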
Do as instructed in the standard protocol.
We use a BluePippin in this step and usually set a size range 4 kb in width (for example, from 6
kb to 10 kb), although this choice deserves further consideration. As mentioned above for strand
displacement, performing size selection before strand displacement reduces the
amount of DNA to be processed, which saves the enzyme and buffer
for strand displacement. We recommend quantifying the DNA after size selection.
After strand displacement and size selection (whether you perform these steps in this order or the
other way round), it is ideal to retain at least 100 ng of DNA. Although the standard protocol
mentions ‘150-400 ng’ (on page 27), 100-200 ng is realistic and still workable in our experience.
Do as instructed in the standard protocol.
Shearing determines the length of library inserts, which should be matched to the read length used in
sequencing. The shearing condition we propose is one that ultimately results in a library
size distribution of 300-700 bp with a peak at 450-500 bp, as measured at the final library QC far
below. Note that this is markedly different from the size distribution illustrated in the standard
protocol (300-1200 bp; on page 49). To achieve the size distribution proposed above, we
recommend successive shearing, that is, repeated runs of the Covaris condition
given in the standard protocol. In our experience, shearing the genomes of different species
under the same condition can result in markedly different fragment size distributions. Thus, you
need to optimize the condition specifically for your species of interest. For one of the species we
worked on, we performed as many as 7 runs of Covaris shearing under the condition given in
the standard protocol.
You may feel an urge to perform QC with a Bioanalyzer immediately after Covaris shearing, but
it will not give you a fair assessment of the shearing results, because you cannot afford to use a large
amount of sheared DNA for QC. We therefore recommend saving as much DNA as possible at this
stage and measuring the size distribution later, at the final library QC before sequencing.
Do as instructed in the standard protocol.
To obtain as many unique mate-pair reads as possible, we strongly recommend reducing the number of PCR
cycles to avoid excessive amplification. We suggest performing no more than 10 cycles of PCR.
This advice is supported by our experience of obtaining sufficient product with 10 PCR
cycles, even for samples that are supposed to require 15 cycles according to the standard
protocol (for example, 100 ng for libraries with a mate distance range of 6-10 kb). In fact, we normally
perform 8 PCR cycles, and only when we find the yield too low after AMPure clean-up do we
perform additional PCR cycles (still, no more than 10 cycles in total). If you do not obtain enough
product within 10 cycles, first optimize the tagment condition to increase the yield
for the targeted size range.
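The cycle-number reasoning above can be illustrated with a short calculation. The sketch assumes idealized PCR in which the product doubles every cycle (real efficiency is lower), and the yield and target amounts are hypothetical numbers, not thresholds from this protocol. Under that assumption, going from 15 to 10 cycles lowers the theoretical amplification 32-fold, and the function reports how many extra cycles after the usual 8 would be needed without exceeding 10 in total.

```python
MAX_TOTAL_CYCLES = 10  # upper limit suggested in this protocol


def fold_amplification(cycles, efficiency=1.0):
    """Theoretical fold amplification; efficiency=1.0 means perfect doubling."""
    return (1.0 + efficiency) ** cycles


def extra_cycles_needed(yield_ng, target_ng, cycles_done=8, efficiency=1.0):
    """Return the extra cycles needed to reach target_ng, or None if the cap is hit first."""
    if yield_ng <= 0:
        raise ValueError("yield must be positive")
    extra = 0
    projected = yield_ng
    while projected < target_ng and cycles_done + extra < MAX_TOTAL_CYCLES:
        projected *= 1.0 + efficiency
        extra += 1
    return extra if projected >= target_ng else None


print(fold_amplification(15) / fold_amplification(10))  # 32-fold difference
print(extra_cycles_needed(30.0, 100.0))  # hypothetical: 30 ng obtained, 100 ng wanted
print(extra_cycles_needed(10.0, 100.0))  # None: go back and optimize tagmentation
```

A return value of None corresponds to the case in the text where adding cycles is not the answer and the tagment condition should be revisited instead.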
To get a clearer idea beforehand of how many PCR cycles are needed, you might want to use a
kit such as the KAPA Real-Time Library Amplification Kit (KK2701). However, one crucial concern is
that beads attached to the library DNA can interfere with the detection of SYBR Green in qPCR.
With the Illumina system, it seems that the inserts of many of the reads actually sequenced are
shorter than the most frequent insert length of the library. Thus, be sure to perform stringent size
selection with AMPure to remove molecules with short inserts, as instructed in the standard
protocol (0.67x AMPure to remove <300 bp molecules), whatever the size distribution of
library inserts is. Lenient size selection can result in a high proportion of read pairs that are too
short, and these may not suffice for effective scaffolding.
Use a Bioanalyzer or equivalent for this final QC before sequencing. Keep in mind that the size
distribution is determined mostly by the shearing condition and AMPure clean-up, rather than by the
chosen size range of mate distance.
We use the KAPA Library Quantification Kit (KK4835) in this step. Quantification should be straightforward
if the library has an ordinary unimodal size distribution. The standard protocol says that you need
1.5-20 nM of the synthesized library, but we think that 2 nM is enough unless the sequencing
facility you are working with requests much more than is required for an actual sequencing run.
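When a mass concentration is also available (for example, from a fluorometric assay), it can be converted to molarity as a cross-check against the 2 nM guideline. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the example concentrations and the 500 bp mean length are hypothetical. It is not a substitute for the qPCR-based quantification used in this step.

```python
BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one base pair of dsDNA


def library_molarity_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM)."""
    if mean_length_bp <= 0:
        raise ValueError("mean length must be positive")
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * mean_length_bp)


MIN_NM = 2.0  # concentration considered sufficient in this protocol

for conc in (0.5, 1.0):  # hypothetical measurements in ng/ul
    molarity = library_molarity_nM(conc, 500)
    print(f"{conc} ng/ul at 500 bp = {molarity:.2f} nM, enough: {molarity >= MIN_NM}")
```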
For your first trial, we advise running a MiSeq for small-scale pilot sequencing to obtain 300 bp
paired-end reads from the prepared libraries; sequencing as many as 10 libraries per MiSeq run
should still allow fair validation of the libraries. The 300 bp paired-end reads obtained can also be
used to simulate which read length yields the highest proportion of reads with the junction adaptor,
by truncating them at 100 bp, 127 bp and 171 bp, for example (if sequencing with HiSeq is planned
next).
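The read-length simulation described above can be sketched as follows. This is a simplified illustration rather than the procedure used by the authors or by NextClip: it truncates each read and looks for an exact, complete match to one half of the junction adaptor, whereas real tools tolerate mismatches and partial matches. The adaptor sequence is the commonly cited Nextera mate-pair junction sequence and should be verified against the kit documentation; the reads are synthetic.

```python
# Assumed junction adaptor half; verify against the kit documentation.
JUNCTION = "CTGTCTCTTATACACATCT"
JUNCTION_RC = JUNCTION[::-1].translate(str.maketrans("ACGT", "TGCA"))


def fraction_with_junction(reads, read_length):
    """Fraction of reads that contain a full junction half within the first read_length bases."""
    if not reads:
        raise ValueError("no reads given")
    hits = 0
    for read in reads:
        trimmed = read[:read_length]
        if JUNCTION in trimmed or JUNCTION_RC in trimmed:
            hits += 1
    return hits / len(reads)


def make_read(junction_offset, length=300):
    """Build a synthetic read with the junction at the given offset (None = no junction)."""
    filler = "ACGT" * (length // 4 + 1)
    if junction_offset is None:
        return filler[:length]
    return (filler[:junction_offset] + JUNCTION + JUNCTION_RC + filler)[:length]


reads = [make_read(offset) for offset in (50, 100, 150, 250, None)]
for length in (100, 127, 171, 300):
    print(length, fraction_with_junction(reads, length))
```

On real data, the same truncation would be applied to the fastq records from the MiSeq pilot run before passing them to the adaptor-detection step.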
The lengths of 127 bp and 171 bp may sound unusual, but with Rapid Run on HiSeq, one can
obtain reads of these lengths by making the best use of the extra cycles inherently allotted to
Nextera dual indexing, which is not needed in mate-pair sequencing. This trick allows you to obtain
127 bp and 171 bp reads using three and four TruSeq Rapid SBS Kits for 50 cycles, respectively
(see page 6 of the official manual for the TruSeq Rapid SBS Kit). Please consult the sequencing
facility that you plan to work with about the possibility of this extra-cycle sequencing.
In our experience, Rapid Run mode with v1 chemistry on older HCS (HiSeq Control Software)
seems to be vulnerable to suboptimal library pooling, such as the ‘low plex pooling’ issue (see
this document by Illumina). In the course of your mate-pair sequencing, you may encounter a
situation in which you have only 4 or fewer libraries to sequence in a Rapid Run. In this case
there is a high chance that the base composition in index reads will be too homogeneous, and you
will get lower QV in index reads, resulting in a larger proportion of reads that fail
demultiplexing. To reduce this unfavorable effect, you could introduce multiple indices per library
during library preparation above. As long as demultiplexing between libraries works
without any overlap of indices, this strategy should produce as many valid reads as
possible, at the sole cost of handling more data files in post-sequencing informatics steps. The
latest versions of HCS (version 2.2.38 or higher) seem to be robust against low-diversity
samples, so we suggest contacting the sequencing facility you are working with in advance
to find out whether you need to be concerned about the low plex pooling issue.
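Whether a planned pool of indices is too homogeneous can be checked before sequencing. The sketch below assumes the usual rule of thumb for four-channel Illumina instruments, namely that every index cycle should contain at least one base read in the red channel (A or C) and one in the green channel (G or T); this rule is an assumption here and should be confirmed against Illumina's pooling guidelines. The index sequences are arbitrary examples, not a recommended set.

```python
RED_BASES = set("AC")    # assumed to be read in the red channel
GREEN_BASES = set("GT")  # assumed to be read in the green channel


def low_diversity_cycles(indices):
    """Return 1-based index cycles at which the pool lacks a red or a green base."""
    if not indices:
        raise ValueError("no indices given")
    length = len(indices[0])
    if any(len(index) != length for index in indices):
        raise ValueError("all indices must have the same length")
    problem_cycles = []
    for position in range(length):
        bases = {index[position] for index in indices}
        if not (bases & RED_BASES) or not (bases & GREEN_BASES):
            problem_cycles.append(position + 1)
    return problem_cycles


two_plex = ["ATCACG", "ACAGTG"]                       # example indices
four_plex = ["ATCACG", "ACAGTG", "CGATGT", "TTAGGC"]  # example indices

print(low_diversity_cycles(two_plex))
print(low_diversity_cycles(four_plex))
```

Assigning several indices to each library, as suggested above, enlarges the pool that such a check is run on and so makes it easier to cover both channels at every cycle.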
We recommend first running a recent version of FastQC (v0.11 or higher) on the raw fastq files to
monitor standard metrics, including the frequency of junction adaptor occurrence along
base positions (in the ‘Adapter Content’ view newly added in v0.11).
After this primary QC, run the program NextClip [2] to assess the PCR duplicate rate and the
proportion of reads that have the junction adaptor. After the NextClip run, be sure to rerun FastQC on
the processed fastq files of Categories A, B and C separately, in order to confirm that junction/external
adaptors and low-quality bases were properly trimmed.
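To make the notion of PCR duplicate rate concrete, the sketch below counts read pairs whose two sequences are identical to a pair seen earlier. This is a deliberately naive illustration and not NextClip's own method, which should be used for the actual assessment; the read pairs are synthetic.

```python
def duplicate_rate(read_pairs):
    """Fraction of read pairs that exactly repeat an earlier (read 1, read 2) pair."""
    if not read_pairs:
        raise ValueError("no read pairs given")
    seen = set()
    duplicates = 0
    for read1, read2 in read_pairs:
        key = (read1, read2)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(read_pairs)


# Synthetic example: the second pair repeats the first one.
pairs = [
    ("ACGTACGT", "TTGGCCAA"),
    ("ACGTACGT", "TTGGCCAA"),
    ("ACGTACGT", "GGGGCCCC"),
    ("TTTTAAAA", "TTGGCCAA"),
]
print(duplicate_rate(pairs))
```

A high value from such a count, like a high duplicate rate reported by NextClip, points back to the PCR cycle number and the amount of input DNA as the first things to revisit.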
1. Wang Q, Gu L, Adey A, Radlwimmer B, Wang W, Hovestadt V, Bahr M, Wolf S, Shendure J, Eils R et al:
Tagmentation-based whole-genome bisulfite sequencing. Nature Protocols 2013, 8(10):2022-2032.
2. Leggett RM, Clavijo BJ, Clissold L, Clark MD, Caccamo M: NextClip: an analysis and read preparation tool
for Nextera Long Mate Pair libraries. Bioinformatics 2014, 30(4):566-568.