Genotyping by Sequencing

 Methods for high-throughput marker discovery.
 Genotyping by sequencing strategy.
 Bioinformatics Pipeline.
 Need for modifying Genotyping-by-Sequencing.
 Applications of Genotyping-by-Sequencing.
Brief Outline

http://www.maizegenetics.net/

Traditional Marker Discovery
 Costly and cannot be parallelized
 Time-consuming cloning and primer design
 Scoring is expensive and laborious

NGS-Based Marker Discovery
 Discovery, sequencing, and genotyping of a large number of markers
 Fast
 Parallelized library preparation

Sequencing is rapidly becoming so inexpensive that it will soon be reasonable to use it for
every genetic study
(Poland et al.,2012)

Enrichment strategies
• Long-range PCR amplification
• Molecular inversion probes
• DNA hybridization/sequence capture methods
These methods are time-consuming, technologically challenging, and can be cost-prohibitive for assaying
large numbers of samples.
Complexity Reduction Using Restriction Enzymes
 Easy
 Quick
 Extremely specific
 Highly reproducible
 Can reach important regions of the genome that are inaccessible to sequence
capture approaches
 Repetitive regions of genomes can be avoided and lower copy regions can be targeted
with two to three fold higher efficiency
 Simplifies computationally challenging alignment problems in species with high levels
of genetic diversity.

Reduced-representation sequencing
 Reduced-representation libraries (RRLs)/CRoPS.
 Restriction-site-associated DNA sequencing (RAD-seq).
 Multiplexed Shotgun Sequencing.
 Genotyping by sequencing (GBS).
 Model organisms with high-quality reference genome sequences
 Non-Model species with no existing genomic data
Methods for high-throughput marker
discovery using NGS
(Davey et al.,2011)

Comparison of current genotyping methods using
next-generation sequencing

(Davey et al.,2011)
GBS provides many advantages. It
offers a much simplified library
preparation procedure that can
be performed with small amounts
of starting DNA (100–200 ng) and
is amenable to a high level of
multiplexing.

CLASSICAL APPROACH: marker discovery, then assay design, then genotyping
GENOTYPING BY SEQUENCING: marker discovery and genotyping in a single step

Marker discovery and genotyping are completed at the same time.
 Facilitates exploration of new germplasm sets.
 Raw data is dynamic.
 The raw sequences obtained from GBS can be re-analyzed.
 Reduced sample handling
 Few PCR & purification steps
 No DNA size fractionation
 Efficient barcoding system
Features of Genotyping By Sequencing

Computational Biology Service Unit, Cornell University

GBS BARCODES
 Barcode sets are enzyme specific
 Must not recreate the enzyme recognition site
 Must be different enough from each other
 At least 3 bp differences among barcodes.
 No mononucleotide runs of 3 or more bases
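
The barcode rules above can be checked mechanically. The following is a minimal sketch, not the pipeline used by any particular GBS facility. It assumes an ApeKI-style recognition site (GCWGC, where W is A or T), so that the genomic side of the ligation junction begins with CAGC or CTGC. The distance measure for barcodes of unequal length is a simplification chosen for illustration, and all function names are hypothetical.

```python
import re
from itertools import combinations

# Assumption: ApeKI-style site GCWGC (W = A or T). After digestion the
# genomic side of the junction starts with CAGC or CTGC.
SITE_REGEX = r"GC[AT]GC"
CUT_REMNANTS = ("CAGC", "CTGC")


def barcode_distance(a, b):
    """Mismatches over the shared length plus the length difference.

    This is a simplification for barcodes of unequal length.
    """
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))


def has_mononucleotide_run(barcode, run_length=3):
    """True if any base occurs run_length or more times in a row."""
    return re.search(r"(.)\1{%d,}" % (run_length - 1), barcode) is not None


def recreates_site(barcode):
    """True if barcode + cut-site remnant contains the recognition site."""
    return any(re.search(SITE_REGEX, barcode + r) for r in CUT_REMNANTS)


def validate_barcodes(barcodes, min_distance=3):
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    for bc in barcodes:
        if has_mononucleotide_run(bc):
            problems.append(f"{bc}: mononucleotide run")
        if recreates_site(bc):
            problems.append(f"{bc}: recreates enzyme site")
    for a, b in combinations(barcodes, 2):
        if barcode_distance(a, b) < min_distance:
            problems.append(f"{a}/{b}: fewer than {min_distance} differences")
    return problems
```

For example, `validate_barcodes(["CTCC", "TGCA", "AAAG"])` flags only `AAAG`, once for its run of three A's and once because a trailing G followed by CAGC rebuilds GCAGC.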

Steps involved in Genotyping by Sequencing
Library Construction for Next Gen Sequencing
(Elshire et al.,2011)

Filtering and Selection of Reads
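
One possible shape of such a read filter is sketched below. The rules are assumptions made for illustration rather than the slide's actual pipeline: a read is kept only if it starts with a known barcode followed immediately by the cut-site remnant (ApeKI-style CAGC or CTGC here), the insert is trimmed to a fixed length (64 bases is an assumed default), and reads with an N in the trimmed insert are rejected. The barcode-to-sample map and the function name are hypothetical.

```python
# Assumption: ApeKI-style cut-site remnants on the genomic side of the read.
CUT_REMNANTS = ("CAGC", "CTGC")


def assign_read(read, barcode_to_sample, trim_length=64):
    """Return (sample, trimmed_insert), or None if the read is rejected."""
    # Try longer barcodes first so a short barcode that is a prefix of a
    # longer one does not claim the read.
    for barcode in sorted(barcode_to_sample, key=len, reverse=True):
        if not read.startswith(barcode):
            continue
        insert = read[len(barcode):]
        # The barcode must be followed directly by the cut-site remnant.
        if not insert.startswith(CUT_REMNANTS):
            continue
        insert = insert[:trim_length]
        # Reject reads with uncalled bases in the retained region.
        if "N" in insert:
            return None
        return barcode_to_sample[barcode], insert
    return None
```

A read that matches no barcode, lacks the remnant after the barcode, or contains an N in the retained region yields `None` and would simply be dropped from further processing.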

Sequence Processing

Bioinformatics Challenges
 Massive amounts of data
 Complex genomes
 Missing data
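
Missing data is usually handled first by filtering markers on call rate and then, where needed, by imputation. The sketch below is a deliberately simple illustration under stated assumptions: genotypes are coded as allele counts (0, 1, 2) with `None` for a missing call, the 0.8 call-rate threshold is hypothetical, and mean imputation stands in for whatever imputation method a real pipeline would use.

```python
def call_rate(calls):
    """Fraction of samples with a non-missing genotype call (None = missing)."""
    return sum(c is not None for c in calls) / len(calls)


def filter_markers(genotypes, min_call_rate=0.8):
    """Keep markers whose call rate meets the threshold.

    genotypes maps marker name -> list of calls, one per sample.
    """
    return {
        marker: calls
        for marker, calls in genotypes.items()
        if call_rate(calls) >= min_call_rate
    }


def impute_mean(calls):
    """Replace missing calls with the mean of the observed calls."""
    observed = [c for c in calls if c is not None]
    mean = sum(observed) / len(observed)
    return [mean if c is None else c for c in calls]
```

Raising `min_call_rate` trades marker number for completeness, which is the practical tension behind the "missing data" challenge listed above.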

Microarray
• Arrays designed from one set of populations might not
represent the SNPs in a new germplasm set. Costs rise with scale,
and SNP array development is very time-consuming and costly.
Genotyping By Sequencing
• Free of this ascertainment bias; GBS costs less and requires no upfront
assay development.
SNP
DISCOVERY

POTENTIAL APPLICATIONS OF GBS DATA
 Marker discovery
 Bulk segregant analysis
 Fine mapping of QTLs
 Genomic selection
 GWAS

Two Different Approaches of Genotyping By Sequencing
 Reduced representation: wheat, barley, ...
 Whole-genome resequencing: done in rice and Arabidopsis

MARKER DISCOVERY
Cornell CBSU Workshop

(Huang et al.,2009)
A genetic map for 150 rice recombinant inbred lines was constructed using the
Illumina Genome Analyzer, resulting in the discovery of 1,226,791 SNPs.

The population was developed from a cross between two rice
cultivars with genome sequences, Oryza sativa ssp. japonica cv.
Nipponbare and Oryza sativa ssp. indica cv. 93-11. With a relatively
high mapping resolution, candidate genes for some QTL of large or
moderate effect were identified.
(Wang et al.,2011)

 Cloning QTL is technically challenging. It requires the development of near-
isogenic lines (NILs) through repeatedly backcrossing with one of the mapping
parents or additional samples of natural variants for association of phenotype
and candidate genes. Positional cloning using NILs is time-consuming and
labor-intensive because it takes a few generations of backcrossing to make
NILs and thousands of recombinants to fine map the candidate genes.
 A genotyping-by-sequencing approach can substantially reduce the amount of
time and effort required for QTL mapping.

49 QTL within relatively small genomic regions for 14 agronomic traits were identified
(Wang et al.,2011)

Traits measured directly in the field include heading date, culm diameter, plant height,
flag leaf length and flag leaf width, tiller angle, tiller number, panicle length, and awn
length. Traits measured in the laboratory following harvest include grain length, grain
width, grain thickness, grain weight, and spikelet number per panicle
(Wang et al.,2011)

Five QTL of relatively large effect (14.6–46.0%) were located in small genomic regions,
where strong candidate genes were found.
 Sequencing-based genotyping thus offers a powerful way to
map QTL with high resolution.
(Wang et al.,2011)
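
To make the idea of a QTL "effect" concrete, the sketch below computes the proportion of phenotypic variance explained by a single marker and ranks markers by it. This is a simplified single-marker scan written for illustration, not the mapping method of Wang et al. It assumes the reported percentages are phenotypic variance explained, that recombinant inbred lines are coded 0 or 1 by parental allele, and that all names and data are hypothetical.

```python
def variance_explained(genotypes, phenotypes):
    """Proportion of phenotypic variance explained by one marker (R^2)."""
    grand_mean = sum(phenotypes) / len(phenotypes)
    total_ss = sum((p - grand_mean) ** 2 for p in phenotypes)
    # Group phenotypes by genotype class (e.g. 0 = parent A, 1 = parent B).
    groups = {}
    for g, p in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(p)
    between_ss = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return between_ss / total_ss


def scan_markers(marker_genotypes, phenotypes):
    """Return (marker, R^2) pairs sorted from strongest to weakest."""
    scores = {
        marker: variance_explained(calls, phenotypes)
        for marker, calls in marker_genotypes.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With dense sequencing-based markers, the top-ranked marker sits close to the causal variant, which is why high marker density translates into high mapping resolution.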

 RAD was initially proposed by Miller et al. (2007) and adapted to incorporate
barcoding for multiplexing with Illumina sequencing technology by Baird et al.
(2008). The RAD procedure has been used successfully to identify SNPs in a
number of plant species including eggplant, barley, and globe artichoke.
 Subsequently, Elshire et al. (2011) proposed a method for constructing
highly multiplexed, reduced-complexity genotyping-by-sequencing (GBS) libraries.
The procedure is based on a restriction digestion technique similar to RAD but
is substantially less complicated, saving time and cost in library
preparation. However, the resulting data contain a larger number of missing genotype
calls.
Developments in Genotyping by Sequencing

(Sonah et al.,2013)
An Improved Genotyping by Sequencing (GBS) Approach Offering Increased
Versatility and Efficiency of SNP Discovery and Genotyping
A uniform distribution of the ApeKI restriction sites was observed following in silico digestion
of the soybean genome, and a good proportion of the resultant fragments were short enough
for effective amplification and sequencing on the Illumina platform.
Selection of an Appropriate Enzyme for GBS in Soybean
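
An in silico digestion of this kind can be sketched in a few lines. The example below is illustrative only: it assumes the ApeKI recognition site is GCWGC (W = A or T) cut after the first base, treats the input as a single linear sequence, and uses a hypothetical 100 to 400 bp window for "short enough" fragments. Because GCWGC is its own reverse complement, scanning one strand is sufficient.

```python
import re


def in_silico_digest(sequence, site_regex=r"GC[AT]GC", cut_offset=1):
    """Return fragment lengths after cutting at every recognition site."""
    # A lookahead pattern finds overlapping sites as well.
    cut_positions = [
        m.start() + cut_offset
        for m in re.finditer(f"(?={site_regex})", sequence)
    ]
    bounds = [0] + cut_positions + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def fraction_in_range(lengths, min_len=100, max_len=400):
    """Fraction of fragments whose length falls in [min_len, max_len]."""
    if not lengths:
        return 0.0
    return sum(min_len <= n <= max_len for n in lengths) / len(lengths)
```

Running the same functions with the site patterns of several candidate enzymes on each chromosome gives a quick comparison of fragment-size distributions before committing to a library protocol.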

Summary of sequenced raw and processed reads in eight soybean
genotypes obtained on an Illumina Genome Analyzer II.
The number of sorted raw sequence reads ranged from 0.44 million reads (TGx1989-53F)
up to 1.00 million reads (Ocepara-4). A total of 5.50 million processed quality reads (98.76%
of all reads) were retained. Processed reads of the individual genotypes were mapped onto
the reference genome and only reads mapping to a unique location in the genome were
retained. Such uniquely mapped reads represented 85% of the total and were well
distributed across the chromosomes
(Sonah et al.,2013)

(a) Distribution of mapped sequence reads and SNPs identified using a GBS approach. The frequency of SNPs on the
twenty soybean chromosomes averaged 10 SNPs/Mb.
(b) Frequency of genes and transposons identified in the same bins on a soybean chromosome. The distribution of
SNPs closely mirrors the distribution of genic sequences: it was highest in gene-rich terminal regions and
lowest in highly repetitive centromeric and pericentromeric regions of chromosomes.
Sequence coverage and SNP distribution
(Sonah et al.,2013)
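
A per-bin SNP frequency like the one plotted here can be obtained by counting SNP positions in fixed windows along each chromosome. The sketch below assumes 1 Mb bins and 0-based positions. The function name and the example coordinates are hypothetical, and the last bin may cover less than a full megabase.

```python
from collections import Counter


def snp_counts_per_bin(positions, chrom_length, bin_size=1_000_000):
    """Count SNPs in consecutive bins along one chromosome."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_bins)]


# Hypothetical SNP positions on a 3 Mb chromosome.
counts = snp_counts_per_bin([100, 500_000, 1_200_000, 2_999_999], 3_000_000)
mean_per_mb = sum(counts) / len(counts)
```

Comparing these counts bin by bin with gene counts from an annotation file is one way to check whether SNP density tracks gene-rich regions as described above.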

Optimizing the Number and Coverage of SNPs by the Use of Selective Primers
Library construction with a common primer having 1 (A or C) or 2 (AA, AC or CC) selective
bases at the 3′ end gave a significant improvement in both the number and the depth of
coverage of called SNPs. Most libraries prepared using selective amplification resulted in a
greater number of SNP calls with an improved depth of coverage.
(Sonah et al.,2013)

 A set of eight diverse soybean genotypes was used. Using ApeKI for GBS
library preparation and sequencing on an Illumina GAIIx machine, 5.5 M
reads were obtained and processed.
 A total of 10,120 high-quality SNPs were obtained, and the distribution of
these SNPs closely mirrored the distribution of gene-rich regions in the
soybean genome. A total of 39.5% of the SNPs were present in genic regions
and 52.5% of these were located in the coding sequence.
 The use of selective primers to achieve greater complexity reduction during
GBS library preparation was demonstrated. The number of SNP calls could be
increased by almost 40% and their depth of coverage more than
doubled.
SUMMARY

 Predicts desirable phenotypes by calculating breeding values based on
genotype.
 Statistical power depends on using large numbers of genetic markers,
and is therefore limited by the cost and availability of dense genome-wide marker data.
 GBS can be used to generate markers to characterize breeding lines and
develop accurate GS models.
 Even modest gains from genomic selection could save years of in-field
evaluation.
Genomic Selection
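
The core calculation behind genomic selection can be sketched as follows. A genomic-estimated breeding value (GEBV) is taken here as the sum of marker effects weighted by genotype, and prediction accuracy as the Pearson correlation between GEBVs and observed phenotypes in a validation set. The marker effects are assumed to have been estimated already on a training population by some model not shown; all names and numbers are hypothetical.

```python
from math import sqrt


def gebv(genotype, marker_effects):
    """Breeding value as the effect-weighted sum of allele counts."""
    return sum(g * e for g, e in zip(genotype, marker_effects))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def prediction_accuracy(genotypes, marker_effects, observed):
    """Correlation between predicted GEBVs and observed phenotypes."""
    predicted = [gebv(g, marker_effects) for g in genotypes]
    return pearson(predicted, observed)
```

Selection candidates can then be ranked by GEBV from genotype alone, which is where the saving in field evaluation comes from.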

Genotyping-by-sequencing (GBS) can be used for de novo genotyping of breeding
panels and to develop accurate GS models, even for the large, complex, and
polyploid wheat (Triticum aestivum L.) genome. Researchers applied GBS to a set
of 254 elite breeding lines from CIMMYT and developed GS models for yield,
days to heading (DTH), and thousand-kernel weight (TKW).

GBS markers led to higher genomic prediction accuracies. For both yield traits and
heading date, the accuracy gain was in the range of 0.13 to 0.24. For TKW the increase
was smaller (0.05) and not significant (p-value > 0.05). With a comparable number of
markers, the GBS platform led to significantly higher accuracy (gains of approximately
0.15) for drought yield and heading date when compared to the DArT markers.

Researchers identified 41,371 single nucleotide polymorphisms
(SNPs). Genomic-estimated breeding value prediction accuracies with
GBS were 0.28 to 0.45 for grain yield, an improvement of 0.1 to 0.2
over an established marker platform for wheat.
(Poland et al.,2012)
SUMMARY

For barley, the original GBS protocol has been extended to a
two-restriction-enzyme system.
[1] Elshire et al. (2011) A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species. PLoS ONE 6(5): e19379. doi:10.1371/journal.pone.0019379
[2] Poland et al. (2012) Development of High-Density Genetic Maps for Barley and Wheat Using a Novel Two-Enzyme Genotyping-by-Sequencing Approach. PLoS ONE 7(2): e32253. doi:10.1371/journal.pone.0032253

(Donato et al.,2013)
GBS was applied to 47 animals representing 7 taurine and indicine breeds of cattle from the US and Africa. A total of 51,414
SNPs were detected across all autosomes at an average spacing of 48.1 kb, and 1,143
SNPs on the X chromosome at an average spacing of 130.3 kb, as well as 191 on unmapped
contigs.

Integration of genotyping-by-sequencing (GBS) in the context of
plant breeding and genomics
(Poland and Rife,2012)

Conclusion
Genotyping-by-sequencing (GBS) is a rapid and robust approach
for reduced-representation sequencing that combines genome-wide
molecular marker discovery and genotyping.
The flexibility and low cost of GBS make it an excellent tool
for many applications and research questions in plant genetics and
breeding.
GBS will become more powerful with the continued increase of
sequencing output, development of reference genomes, and
improvement of bioinformatics.
