Sequencing a genome
Definition 
• Determining the identity and order of 
nucleotides in the genetic material (usually 
DNA, sometimes RNA) of an organism
Basic problem 
• Genomes are large (typically millions or 
billions of base pairs) 
• Current technology can only reliably ‘read’ 
a short stretch – typically hundreds of base 
pairs
Elements of a solution 
• Automation – over the past decade, the 
amount of hand-labor in the ‘reads’ has 
been steadily and dramatically reduced 
• Assembly of the reads into sequences is an 
algorithmic and computational problem
A human drama 
• There are competing methods of assembly 
• The competing – public and private – 
sequencing teams used competing assembly 
methods
Assembly: 
• Putting sequenced fragments of DNA 
into their correct chromosomal 
positions
BAC 
• Bacterial artificial chromosome: 
bacterial DNA spliced with a medium-sized 
fragment of a genome (100 to 
300 kb) to be amplified in bacteria and 
sequenced.
Contig 
• Contiguous sequence of DNA created 
by assembling overlapping sequenced 
fragments of a chromosome (whether 
natural or artificial, as in BACs)
Cosmid 
• DNA from a bacterial virus spliced 
with a small fragment of a genome (45 
kb or less) to be amplified and 
sequenced
Directed sequencing 
• Successively sequencing DNA from 
adjacent stretches of chromosome
Draft sequence 
• Sequence with lower accuracy than a 
finished sequence; some segments are 
missing or in the wrong order or 
orientation
EST 
• Expressed sequence tag: a unique 
stretch of DNA within a coding region 
of a gene; useful for identifying full-length 
genes and as a landmark for 
mapping
Exon 
• Region of a gene’s DNA that encodes 
a portion of its protein; exons are 
interspersed with noncoding introns
Genome 
• The entire chromosomal genetic 
material of an organism
Intron 
• Region of a gene’s DNA that is not 
translated into a protein
Kilobase (kb) 
• Unit of DNA equal to 1000 bases
Locus 
• Chromosomal location of a gene or 
other piece of DNA
Megabase (mb) 
• Unit of DNA equal to 1 million bases
PCR 
• Polymerase chain reaction: a technique 
for amplifying a piece of DNA quickly 
and cheaply
Physical map 
• A map of the locations of identifiable 
markers spaced along the 
chromosomes; a physical map may 
also be a set of overlapping clones
Plasmid 
• Loop of bacterial DNA that replicates 
independently of the chromosomes; 
artificial plasmids can be inserted into 
bacteria to amplify DNA for 
sequencing
Regulatory region 
• A segment of DNA that controls 
whether a gene will be expressed and 
to what degree
Repetitive DNA 
• Sequences of varying lengths that occur 
in multiple copies in the genome; they 
make up much of the genome
Restriction enzyme 
• An enzyme that cuts DNA at specific 
sequences of base pairs
RFLP 
• Restriction fragment length 
polymorphism: genetic variation in the 
length of DNA fragments produced by 
restriction enzymes; useful as markers 
on maps
Scaffold 
• A series of contigs that are in the right 
order but are not necessarily connected 
in one continuous stretch of sequence
Shotgun sequencing 
• Breaking DNA into many small pieces, 
sequencing the pieces, and assembling 
the fragments
STS 
• Sequence tagged site: a unique stretch 
of DNA whose location is known; 
serves as a landmark for mapping and 
assembly
YAC 
• Yeast artificial chromosome: yeast 
DNA spliced with a large fragment of 
a genome (up to 1 mb) to be amplified 
in yeast cells and sequenced
Readings 
• Myers, “Whole Genome DNA Sequencing,” 
http://www.cs.arizona.edu/people/gene/PAPERS/whole.IEEE.pdf 
• Venter et al., “The Sequence of the Human Genome,” 
Science, 16 Feb 2001, Vol. 291, No. 5507, 1304 (parts 1 & 2) 
• Waterston, Lander, Sulston, “On the sequencing of the 
human genome,” PNAS, 19 March 2002, Vol. 99, No. 6, 
3712-3716 
• Myers et al., “On the sequencing and assembly of the 
human genome,” 
www.pnas.org/cgi/doi/10.1073/pnas.092136699
Hierarchical sequencing 
• Create a high-level physical map, using 
ESTs and STSs 
• Shred genome into overlapping clones 
• Multiply clones in BACs 
• ‘shotgun’ each clone 
• Read each ‘shotgunned’ fragment 
• Assemble the fragments
Physical map
Whole genome sequencing (WGS) 
• Make multiple copies of the target 
• Randomly ‘shotgun’ each target, discarding 
very big and very small pieces 
• Read each fragment 
• Reassemble the ‘reads’
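The shotgun-and-reassemble steps can be sketched in a few lines of Python. This is a toy illustration only: it assumes error-free reads that all come from the same strand, and it uses a simple greedy merge, which is not the assembler used by either sequencing team. The names `shotgun`, `overlap`, and `greedy_assemble` are hypothetical helpers introduced for this example.

```python
import random

def shotgun(target, read_len, n_reads, rng):
    """Sample reads of a fixed length from random start positions."""
    last_start = len(target) - read_len
    return [target[s:s + read_len]
            for s in (rng.randint(0, last_start) for _ in range(n_reads))]

def overlap(a, b, min_len):
    """Longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two contigs with the largest suffix/prefix overlap."""
    contigs = list(reads)
    while True:
        # Drop exact duplicates and contigs fully contained in another contig.
        contigs = list(dict.fromkeys(contigs))
        contigs = [c for c in contigs
                   if not any(c != d and c in d for d in contigs)]
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge; gaps separate the contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)

# Three hand-picked overlapping reads from a 15-base target.
target = "ATGGCGTACGTTAGC"
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))
```

With these three reads the greedy merge rebuilds the 15-base target as a single contig. With randomly sampled reads (for example from `shotgun`), low coverage typically leaves several contigs instead.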
Hierarchical v. whole-genome
The fragment assembly problem 
• Aim: infer the target from the reads 
• Difficulties – 
– Incomplete coverage. Leaves contigs separated 
by gaps of unknown size. 
– Sequencing errors. The error rate increases with 
read length; assume it stays below some bound ε. 
– Unknown orientation. A read may come from either 
strand, so we don’t know whether to use the read or 
its reverse (Watson-Crick) complement.
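The orientation difficulty can be made concrete with a short Python sketch. It assumes error-free reads over the alphabet A, C, G, T, and it simply tries a read in both orientations against the end of an existing contig. The helper names (`reverse_complement`, `suffix_prefix_overlap`, `best_orientation`) are hypothetical and chosen for this example.

```python
# Translation table for Watson-Crick base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base, then reverse, giving the opposite strand 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def suffix_prefix_overlap(a, b, min_len=3):
    """Longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def best_orientation(contig, read, min_len=3):
    """Return (overlap_length, oriented_read) for whichever strand of the
    read overlaps the end of the contig better."""
    flipped = reverse_complement(read)
    forward = suffix_prefix_overlap(contig, read, min_len)
    reverse = suffix_prefix_overlap(contig, flipped, min_len)
    return (forward, read) if forward >= reverse else (reverse, flipped)

# This read was sequenced from the opposite strand, so only its
# reverse complement overlaps the contig.
print(best_orientation("ATGGCGTA", "ACGTACGC"))
```

An assembler has to make this choice for every read, which roughly doubles the number of candidate overlaps to consider.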
Scaling and computational complexity 
• Increasing size of target G. 
– 1990 – 40kb (one cosmid) 
– 1995 – 1.8 mb (H. influenzae) 
– 2001 – 3,200 mb (H. sapiens)
The repeat problem 
• Repeats 
– Bigger G means more repeats 
– Complex organisms have more repetitive 
elements 
– Small repeats may appear multiple times in a 
read 
– Long repeats may be bigger than reads (no 
unique region)
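Why long repeats are so damaging can be shown with a small Python experiment. The two toy genomes below are invented for illustration: they contain three copies of a 7-base repeat and differ only in the order of the two segments between the copies. Assuming idealized, error-free reads taken at every position, reads of length 8 (too short to cross a whole repeat with flanking bases on both sides) give exactly the same set of reads for both genomes, so no assembler could tell them apart. The function `read_spectrum` is a hypothetical helper.

```python
from collections import Counter

def read_spectrum(genome, read_len):
    """Multiset of all error-free reads of length read_len, one per start position."""
    return Counter(genome[i:i + read_len]
                   for i in range(len(genome) - read_len + 1))

REPEAT = "GATTACA"  # 7 bases, present three times in each genome

# Same pieces, but the two unique segments between the repeats are swapped.
g1 = "CC" + REPEAT + "TCT" + REPEAT + "AGA" + REPEAT + "GG"
g2 = "CC" + REPEAT + "AGA" + REPEAT + "TCT" + REPEAT + "GG"

for read_len in (8, 9):
    same = read_spectrum(g1, read_len) == read_spectrum(g2, read_len)
    print(read_len, "reads identical:", same)
```

Once the read length reaches 9, a read can span an entire repeat copy plus one unique base on each side, and the two read sets differ. This is the sense in which a repeat longer than the reads leaves "no unique region" to anchor the assembly.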
Gaps 
• Read length L_R hasn’t changed much 
• w = L_R / G gets steadily smaller 
• Expected gaps ≈ R · e^(-wR), where R is the number of reads 
(Lander & Waterman)
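The gap estimate is easy to evaluate numerically. The sketch below assumes the simplest form of the model, in which R is the number of reads, w = L_R / G, and the product wR is the coverage (the average number of reads over each base); it ignores the minimum overlap needed to detect that two reads join. The read length of 500 bases is an assumed value in the "hundreds of base pairs" range, and the function names are hypothetical.

```python
import math

def coverage(n_reads, read_len, genome_len):
    """Average number of reads covering each base: wR = R * L_R / G."""
    return n_reads * read_len / genome_len

def expected_gaps(n_reads, read_len, genome_len):
    """Expected number of gaps, R * exp(-wR), in the simplified model."""
    w = read_len / genome_len
    return n_reads * math.exp(-w * n_reads)

G = 3_200_000_000   # H. sapiens, 3,200 mb
L_R = 500           # assumed read length

for c in (1, 5, 10):
    R = c * G // L_R  # number of reads needed for c-fold coverage
    print(f"{c:>2}x coverage: {R:,} reads, "
          f"about {expected_gaps(R, L_R, G):,.0f} gaps")
```

Under these assumptions, 10-fold coverage of the human genome needs 64 million reads and still leaves on the order of 2,900 gaps, which is why extra linking information is needed to order the contigs.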
How deep must coverage be?
Double-barreled shotgun sequencing 
• Choose longer fragments (say, 2 × L_R) 
• Read both ends 
• Such fragments probably span gaps 
• This gives an approximate size of the gap 
• This links contigs into scaffolds
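The gap-size estimate from paired end reads is simple arithmetic, sketched here in Python. It assumes the fragment length is known approximately, that one end read falls in contig A and its mate in the following contig B, and that coordinates are 0-based positions within each contig. All names and numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean

def gap_from_mate_pair(fragment_len, contig_a_len, read_start_in_a, mate_end_in_b):
    """Estimate the gap between contig A and contig B from one fragment.

    The fragment starts at read_start_in_a inside A, runs off the end of A,
    crosses the gap, and ends at mate_end_in_b inside B.
    """
    bases_in_a = contig_a_len - read_start_in_a  # part of the fragment inside A
    bases_in_b = mate_end_in_b                   # part of the fragment inside B
    return fragment_len - bases_in_a - bases_in_b

def estimate_gap(fragment_len, contig_a_len, links):
    """Average the estimates from several fragments that span the same gap."""
    return mean(gap_from_mate_pair(fragment_len, contig_a_len, start, end)
                for start, end in links)

# Three 2,000-base fragments linking a 5,000-base contig A to contig B.
links = [(4400, 900), (4700, 1180), (4100, 620)]
print(estimate_gap(2000, 5000, links))
```

Because real fragment lengths vary around their nominal size, each pair gives only a rough estimate; averaging several spanning pairs tightens it, and the existence of the links is what orders and orients contigs within a scaffold.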
Genomic results
HGSC v Celera results
To do or not to do? 
• “The idea is gathering momentum. I shiver 
at the thought.” – David Baltimore, 1986 
• “If there is anything worth doing twice, it’s 
the human genome.” – David Haussler, 
2000
Public or private? 
• “This information is so important that it 
cannot be proprietary.” – C Thomas 
Caskey, 1987 
• “If a company behaves in what scientists 
believe is a socially responsible manner, 
they can’t make a profit.” – Robert Cook-Deegan, 
1987
HW for Feb 17 
• Comment on these assertions (500-1000 
words): 
– WLS – “Our analysis indicates that the Celera 
paper provides neither a meaningful test of the 
WGS approach nor an independent sequence of 
the human genome.” 
– Venter – “This conclusion is based on incorrect 
assumptions and flawed reasoning.”
