C. elegans Genetics

 C. elegans has 2 sexes: self-fertilizing hermaphrodites and males.

 Sex is determined chromosomally: XX hermaphrodite, XO male (a single X).

 Diploid for 5 autosomes.

 Standard classical genetic techniques can be applied.

 Life cycle – Zygote to adult ~3 days.

 Grow on petri dish – they eat bacteria.

 Can be stored frozen in liquid nitrogen indefinitely.




   Why might the hermaphrodite sex be useful for genetics?
Chromosome I: genetic mapping.

m.u. = map unit.
Genetic mapping – recombination.
1 m.u. is 1% recombination per meiosis.

[Figure: genetic map of chromosome I, scale -15 to 25 m.u., running from the left arm through the central cluster (around 0 m.u.) to the right arm. Genes in map order: bli-3, egl-30, mab-20, fog-1, unc-73, unc-57, dpy-5, dpy-14, fer-1, lin-11, unc-29, unc-75, unc-101, glp-4, unc-54.]

Parent chromosomes:        fog-1 glp-4    and    + +
Recombinant chromosomes:   fog-1 +        and    + glp-4
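The definition of the map unit can be turned into a small calculation. The sketch below is illustrative only: the function name and the progeny counts are hypothetical, and the simple percentage is a reasonable estimate only for short intervals, because double crossovers go undetected over longer ones.

```python
def map_distance(parental: int, recombinant: int) -> float:
    """Return the genetic distance in map units (m.u.) between two markers.

    1 m.u. = 1% recombinant chromosomes among all chromosomes scored.
    """
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny counted")
    return 100.0 * recombinant / total


# Hypothetical cross: 900 parental-type and 100 recombinant-type chromosomes.
distance = map_distance(parental=900, recombinant=100)
print(f"{distance:.1f} m.u.")
```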
We want to understand how life works – at the molecular level.

We had mutant genes with informative phenotypes.

The mutated genes were mapped onto linkage groups – chromosomes.

What kinds of proteins do these genes encode and how do these proteins
function?



   In 1983, identifying the molecular sequence of a gene
   defined by mutation was a complicated and time-consuming
   business, even in the worm.

   If only we knew the sequence of the genome!
As the term applies to recombinant DNA, what is a clone?




Starting with DNA extracted from any organism, how can you take that
and get one single fragment into a vector and grow billions of copies
of that single “cloned” molecule?

[Figure: a vector carrying a cloned DNA insert.]
C. elegans Genome Project




[Figure: the levels linked by the project. Mutants (function); genetic map (bli-3, egl-30, mab-20, fog-1, unc-73, dpy-5, fer-1, lin-11, unc-75, unc-101, glp-4, unc-54; scale -15 to 25 m.u.); chromosomes; cloned DNA fragments; DNA sequence (AACGTTCCACG.......), genes and proteins.]




 Identify DNA sequences corresponding to genes defined by mutation.
If you wanted to clone sections of chromosomes for
     sequencing, how many copies of each chromosome
     would you start with?




  DNA




Of the order of millions – millions of copies of each chromosome
Purified genomic DNA




Fragment the chromosomal DNA, either by restriction
enzyme digestion or by mechanical shearing.
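Fragmentation by a restriction enzyme can be imitated in silico. This is a minimal sketch, assuming a linear molecule and the EcoRI site GAATTC (cut after the first G); the function name and the toy sequence are made up for illustration and are not taken from the project.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the cut position inside the recognition site
    (EcoRI cuts G^AATTC, so the offset is 1).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


# Toy 20 bp molecule with two EcoRI sites (hypothetical sequence).
toy_dna = "AAGAATTCGGGGGAATTCTT"
fragments = digest(toy_dna)
print(fragments)
```

Mechanical shearing has no sequence preference, so it would be modelled by random break points rather than by site matching.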
Cloning methods used by the C. elegans genome project
Cosmid clones – ~40 Kb insert size – Genomic Library.

Cosmid cloning vector:
   Drug resistance marker
   E. coli origin of replication
   cos site
   Useful restriction sites

Linearised cosmid vector + random fragments of genomic DNA (millions of them) + DNA ligase.

Long concatemers of cosmid vectors interspersed with random fragments of
genomic DNA.
Mixed population of “inserts”, with COS sites in the cosmid vector.

In vitro lambda packaging extracts:
   Lambda terminase
   Other phage proteins

Critical step:
Phage “transfects” a single cosmid into an E. coli cell.
CLONING

Cells are plated onto medium with antibiotic selection.
Cells are grown up to form bacterial colonies.
Each colony is derived from a single transfected cell.
Each colony is a clonal population.

This is a clone: an E. coli clonal population with a single cosmid clone,
carrying a single genomic DNA fragment (insert X).

Billions of copies of one cloned insert, on solid medium on plates or in liquid culture.
Freeze it for storage.
Purify cosmid DNA.
Sequence the insert.
Sub-clone fragments etc.
Started with many millions of different fragments of chromosomal DNA in
one tube.

End up with potentially millions of CLONED fragments, each in a different
E. coli colony – or culture.
We have got as far as random cloned fragments of genomic DNA.

What next?



Average cosmid insert size – 40 Kb

C. elegans genome ~100.3 Mb = 100,300 Kb

100,300/40 = 2,507.5

i.e. ~2,500 cosmid clones could contain the entire C. elegans
genome – but WOULD they?
In principle, 2500 cosmid clones could contain all the DNA of the C. elegans
    genome.

    Why not just start sequencing ~2500 clones picked at random?


  Imagine this:

  I give you a large and awkwardly shaped dice with 2500 faces, with a single
  number on each face, the numbers 1-2500.

  Roll the dice and write down the number on top.
  Repeat this – again and again and…….

  How many times would you have to roll the dice so that every face of the dice
  would have been on top at least once?


~4 x 2500 rolls give a ~95% probability that any one side, or DNA fragment, has appeared.
~10 x 2500 rolls raise that probability to ~99%.
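The dice question can be explored numerically. The sketch below assumes an idealised fair die, that is, every clone equally likely to be picked, which real libraries are not. Under that assumption the per-face probabilities come out somewhat higher than the rounded figures quoted above. The function names are illustrative.

```python
import math


def prob_seen(n_faces: int, n_rolls: int) -> float:
    """Probability that one particular face has appeared at least once."""
    return 1.0 - (1.0 - 1.0 / n_faces) ** n_rolls


def rolls_needed(n_faces: int, probability: float) -> int:
    """Smallest number of rolls giving at least `probability` for one given face."""
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - 1.0 / n_faces))


def expected_rolls_for_all(n_faces: int) -> float:
    """Expected number of rolls until every face has appeared (coupon collector)."""
    return n_faces * sum(1.0 / k for k in range(1, n_faces + 1))


N = 2500  # number of faces, i.e. cosmid-sized pieces of the genome
for fold in (1, 4, 10):
    print(f"{fold}x: P(one given face seen) = {prob_seen(N, fold * N):.4f}")
print("rolls for 95% on one face:", rolls_needed(N, 0.95))
print("expected rolls to see all faces:", round(expected_rolls_for_all(N)))
```

Seeing every face at least once takes many more rolls than seeing any one given face with high probability, which is the practical argument against sequencing clones picked purely at random.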
The Golden Path
What if you could identify clones that overlapped slightly with one another?



                                                How can we get these clones?


    Cloned DNA fragments – moderate overlaps.



With this approach you could sequence the entire genome by
sequencing fewer than 5000 cosmid clones (2 x 2500).
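Once clone positions are known, picking a small set of overlapping clones is an interval-covering problem. The following is a sketch under strong assumptions: clone coordinates are already known (in practice they come from the physical map), and the clone names and positions are invented. It uses the standard greedy interval-cover strategy, not a method described in the slides.

```python
def minimal_tiling_path(clones, genome_length):
    """Greedily choose overlapping clones that cover [0, genome_length).

    clones: list of (name, start, end) tuples, end exclusive.
    Returns the chosen clone names; raises ValueError if a gap cannot be covered.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path, covered_to, i = [], 0, 0
    while covered_to < genome_length:
        best = None
        # Among clones starting at or before the covered edge,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical 100 kb region covered by six 40 kb clones (coordinates in kb).
toy_clones = [("A", 0, 40), ("B", 10, 50), ("C", 35, 75),
              ("D", 45, 85), ("E", 70, 100), ("F", 20, 60)]
print(minimal_tiling_path(toy_clones, 100))
```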
Cosmid fingerprinting

          1. Restriction digest of cosmid DNA.
          2. Separate fragments according to size by gel electrophoresis.
          3. Digitise the ladder of different sized DNA fragments obtained.

        Multiple common fragments – clones probably overlap.

        C. elegans genome project, ~17,000 cosmid clones fingerprinted.
        Assembled into “contigs” – overlapping clones.

        [Figure: gel lanes A, B, C; a “contig” drawn as overlapping clones A, B, C, D.]

        ~17,000 random cosmid clones
        Fingerprinting: ~700 contigs
        C. elegans genome 100 Mb: ~2,500 cosmid clones
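The comparison of digitised fingerprints can be sketched as counting bands of similar size shared by two clones. The tolerance, the threshold of three shared bands, and the band sizes below are all illustrative assumptions, not the criteria used by the genome project.

```python
def shared_bands(fp_a, fp_b, tolerance=0.02):
    """Count bands in fp_a that match a band in fp_b within a relative tolerance.

    Each band in fp_b is matched at most once.
    """
    unused = sorted(fp_b)
    matches = 0
    for size in sorted(fp_a):
        for j, other in enumerate(unused):
            if abs(size - other) <= tolerance * max(size, other):
                matches += 1
                del unused[j]
                break
    return matches


def probably_overlap(fp_a, fp_b, min_shared=3, tolerance=0.02):
    """Call two clones overlapping if they share at least min_shared bands."""
    return shared_bands(fp_a, fp_b, tolerance) >= min_shared


# Hypothetical fingerprints: restriction fragment sizes in bp.
clone_a = [500, 1200, 2300, 4100, 7800]
clone_b = [1210, 2290, 4100, 5600, 9000]
clone_c = [650, 3300, 5610, 9050, 12000]
print(shared_bands(clone_a, clone_b), probably_overlap(clone_a, clone_b))
print(shared_bands(clone_a, clone_c), probably_overlap(clone_a, clone_c))
```

A fixed band count is the crudest possible rule; a few bands can match by chance, which is why the slide says the clones "probably" overlap.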
700 contigs.

What is the minimum number of contigs the C. elegans genome could be
contained in?

Or – how would we know when we had succeeded in joining all the contigs?




 A method of filling the gaps – joining the contigs – was needed.
YACs – Yeast Artificial Chromosomes

DNA inserts of ~100 kb – 2 Mb.

Grown in yeast.

Clonal growth of yeast colonies, much like cosmids in E. coli.

YAC DNA separated by pulsed-field gel electrophoresis.




        C. elegans genome is ~100 Mb.

        Cosmid clones – approximately 40 kb inserts.
        YAC clones – select average 500 kb inserts.

        ~2500 cosmid clones would permit 1x coverage of the genome.
        ~200 YAC clones would permit 1x coverage of the genome.
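The 1x coverage figures follow from dividing genome size by insert size. A minimal sketch of that arithmetic, with a hypothetical helper name and an optional fold-coverage parameter that is not in the slides:

```python
import math


def clones_for_coverage(genome_kb: float, insert_kb: float, fold: float = 1.0) -> int:
    """Number of clones whose inserts add up to `fold` times the genome length."""
    return math.ceil(fold * genome_kb / insert_kb)


GENOME_KB = 100_000  # ~100 Mb, as on the slide
print("cosmids (40 kb inserts):", clones_for_coverage(GENOME_KB, 40))
print("YACs (500 kb inserts):", clones_for_coverage(GENOME_KB, 500))
```

These counts assume perfectly tiled, non-overlapping inserts, so they are lower bounds rather than realistic library sizes.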
~17,000 fingerprinted cosmid clones – ~700 unlinked contigs.


[Figure: cosmid clone contigs separated by gaps (?), shown with the 6 chromosomes (AACGTTCCACG.......) and the genetic map (bli-3 to unc-54, scale -15 to 25 m.u.).]
Joining up the contigs

~700 contigs – grids of representative cosmid clones.

   • Large YAC clones (> 1 Mb).
   • Purify YAC DNA (PFGE).
   • Radio-label YAC DNA.
   • Hybridise to cosmid grid.
   • Expose to X-ray film.

[Figure: a YAC clone spanning the gap between contig X and contig Y; the cosmid clones it hybridises to are linked.]
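The bookkeeping behind the hybridisation steps above can be sketched as merging groups: any YAC that lights up cosmids from two contigs links those contigs. The example uses a small union-find structure; the contig names, YAC names and hit data are invented for illustration.

```python
def join_contigs(contigs, yac_hits):
    """Merge contigs that are linked by a shared YAC.

    contigs: iterable of contig names.
    yac_hits: dict mapping a YAC name to the set of contigs whose cosmids it hybridised to.
    Returns a list of sets, one per merged group of contigs.
    """
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps lookups short
            c = parent[c]
        return c

    for hit_contigs in yac_hits.values():
        hits = sorted(hit_contigs)
        for other in hits[1:]:
            parent[find(other)] = find(hits[0])

    groups = {}
    for c in parent:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())


# Hypothetical data: five contigs, three YAC probes.
toy_contigs = ["ctg1", "ctg2", "ctg3", "ctg4", "ctg5"]
toy_hits = {"Y1": {"ctg1", "ctg2"}, "Y2": {"ctg2", "ctg3"}, "Y3": {"ctg5"}}
merged = join_contigs(toy_contigs, toy_hits)
print(len(merged), "groups remain")
```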
[Figure: genetic map (bli-3 to unc-54, scale -15 to 25 m.u.) shown with layers of ordered YACs and cosmids and, below them, the sequence of the genome.]

A physical map of the genome, the “Golden Path”: chromosomes represented in ordered
overlapping clones or “clone contigs”.
Sequencing the C. elegans Genome




                              Individual cosmid clone.
                                  Randomly fragmented and shotgun cloned into
                                  sequencing vectors.
                                  Generally smaller insert size is best for primary
                                  sequence determination – 2-10 Kb.



Sequence of cosmid or YAC etc, determined and compiled in silico.
Finishing – directed cloning to fill in any gaps.

Check for overlap of sequence with overlapping cosmids.
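Compiling shotgun reads in silico amounts to finding overlaps between reads and merging them. The toy below uses greedy longest-overlap merging on error-free, same-strand reads. It is a teaching sketch, not the assembly software used by the project, and real assemblers must also handle sequencing errors, both strands and repeats. The reads are cut from the first 26 bases of the K06A5 sequence shown in this lecture.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining pieces are separated by gaps
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


target = "ACAAGAGAGGGCGCCTCGGCCGTATG"
toy_reads = [target[14:26], target[0:12], target[7:19]]  # three overlapping 12-mers
assembled = greedy_assemble(toy_reads)
print(assembled)
```

If the loop stops with more than one sequence left, the pieces correspond to the gaps that the finishing step has to close by directed cloning.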
Gaps between cosmid contigs ~20% of genome.

Most of these gaps were not random. They contained regions that could not be
cloned in cosmids.




YAC clones covering most of the gaps.

YAC DNA shotgun cloned into M13 or plasmid vectors.

Most of the DNA contained in these awkward regions was successfully sub-cloned
into small insert size vectors, and sequenced.

The sequence as published in December 1998 was generated from:
2527 cosmids, 257 YACs, 113 fosmids, 44 PCR products.
C. elegans cosmid K06A5, 24323 bp.
Flat sequence file – 3955 bp shown.

>CEK06A5
acaagagagggcgcctcggccgtatgttgaatgggagatcgatggaaccgagacaacgagaaaaggaatagagacggagaaagagagagagagcgcgcgttgttggaaggatg
aaaaagaaaaaagacatgagctgcttcacaagagcttggcgaaagcaaagggcaaagtgttgacagcttagtggtggtagttggatcttctctcctcgttctctgctcacaac
tcgtctatcactcatatcacatttatttcccaatatcattttaacaacatcttccgatgcatgttcgtcaatattgcgcaaccactttgcaatattgtcaaaacttttcgcat
ttgtgatatcgtaaaccagcataattcccattgctccgcggtaatatgatgttgtgattgtgtggaatcgttcttgtccagctgtgtcccagatttgtaatttaatctttttt
ccttttaattcgatagttttaattttgaagtcgattcctgaatgaaaaaagaaaattattttgaaatcactagattctgaataaaaactaaccaatagttgagatgaatgtgg
tgttaaaggcatcatccgaaaatctgtacagaatgcaagtttttccaactcctgagtcgcctattagcagcaatttgaagagcatgtcatacggtcggcgagccatttttctt
ctgaaatgagaaaaagttgagaactaaagttgcacaaaagtaagagaaaagcacttgagtcatggcaaatagaacgaacactttgagatttcgaagaagttatcaagagttga
caattggaagatatttggaagaactttctaatttttttctagttttccaaaattaggtttttgtcataaaatgttgtcaaagaaaaaacaggacaaaatagttaattgttgtt
tccattataacaaaaaaaaatttgaacggagctattaacgcgtgcatgcgcaaatcacatcgattagctgtttctgggaaattctcgggaaaaggtgaacagcagctgctggc
ttcctctgcgggtcacgaaaacacaaagagatcattataattgttatttggaaaggaagcgaatctaaaacgggtacaggtggacgtttattgatcgaaagtgctttttattt
gaaattgaatggtgaactttgcaattttgtaatgcaaagtacgttatcagatggcatgagatgtgtgaagtgataaggaataaaatgtgaacgacatgttcaagaaactgtga
tttttcaataatttgtgatgaaatattttaggaacagaaatgaacatattaattgatataaaaacaataggaacactaactcataattatgataggtgaatatcaaaatgtgc
tagattttttgaagttaaaaaatacatttctaatattttttcaaataataagtttcagctgaaatttcagggtgatttcagaaagctatgttttgataaattgttttgaaaat
taaaagaagctacagcaaaaaaaaattaaagagaacatcgctccctcgtagtgtataatttttgattatcgaaaaaaatgagtcaatgatgaaaaggaagtcgcaatctcaaa
acttcaaaaatcaaaagaagccgttgcctctgtcatcaaaaattcagaagacaaggttgttgacaagggtcaattctcagtggtggagggcattgggcgtggtgaaatttttg
aaggctagtgtggttggacctctactagatagacaaaacccccgaaatagacgtttaatttgatgagatggtggagaaagaaaaggactcattctctagatgatagagagacc
agagatacagacaagagagggcgcctcggccgtatgttgaatgggagatcgatggaaccgagacaacgagaaaaggaatagagacggagaaagagagagagagcgcgcgttgt
tggaaggatgaaaaagaaaaaagacatgagctgcttcacaagagcttggcgaaagcaaagggcaaagtgttgacagcttagtggtggtagttggatcatgtgtttttatgttt
ccggtgggagaaggttcaacaaaaaatgaaaagaaaaagttcaagcggcatgaatcattctgagtttaaaacaaaattattgcgaaaattaatattaaaaccttttcacaaaa
cttcaagctaatctgttcatgaaaatttgaataatagttttttcccacctatttagaattaacttcatattaacgaaattaattaacgaatcgaaaattatgacttttcagaa
tcatctgaagttttttcacattccatgctgcatggaataatttgatcctggaatcgatatgtttttatggtatactttttaaccttcaatttagctggaaaagtatggaataa
ataattcccgaagctatgtacatatatgtagaattattgaatgattgtgagaacaacttgactttagcttgagtaggaatcggaatggctatcgaccgatcaacacttaggat
tgtaagaatggcagtaagaatatattgaagaaagaatgtttgttcataggaagagaaagagtattgcgaaatcatcatcgcccactttagaatggacgggcggtgagcggaca
tagagaattgtgaatgactaatgcttttgcagaatctagggcaaaatcgtaggaacaaacaattgtaatacggagaaaacaatcatatcgatcgatgatcatggagaaaaatg
tgatttaagtgagtagacttggaaaaattaataaaagcatgaattgtcgatatttttcatttattttcattataaagctctttaaaaacaaattaaatattgagaatggcttc
gaagaatattgtttcaaatatgttcaatggtgacaccttgcggataaaattaatgtaaaaatcatggaacacagattcactgatatctcattatctcaagcagtgtaattaga
gattttttggaacaattattttataaaactataaataaaccgtttatactactcaaagccaaatattcaagctattaccattttttttctaactaattcttgagcaattaaag
tattccccagtttttattttgcaacgactccaggcaaacacgctccgttgcacttgccgccaaggcgttgcattcaaatcagagagacatctcattccgatttctgtttttct
tccaataaacggtattttatgcctaatgggtgatacggaaattgttcctcttcgagtacaaaatgtacttgatagcgaaatcattcgtctcaacttgtggtccatgaaggtaa
ctgtctagtttttttaagttttcatgatttcaatatttttacagtttaacgcgaccagtttcaaactcgaaggttttgtgagaaatgaagaaggcactatgatgcagaaagtt
tgttccgaatttatttgtgtaagtcgagaaacatattcgtcaacaattttcattaaatattcagagacgcttcacttctacgttgcttttcgatgtttccggacgtttcttcg
acttggtcggacagattgatcgggaatatcaacaaaaaatgggaatgcctagtagaattattgatgaattttcaaatggaattcctgaaaattgggccgaccttatctattcc
tgcatgtcagccaaccaaagaagcgcacttcgccctatccaacaggctccaaaagaaccaattagaactagaacagaaccaattgttacgttggcagatgaaaccgagctaac
tggaggatgccagaaaaattccgaaaacgagaaagaaaggaacagacgtgagcgtgaagaacagcaaacaaaggaacgtgagagaagattagaagaagaaaaacaacgacgag
atgctgaagctgaggctgaaagaaggcgaaaagaagaggaagagctggaagaagctaattacacccttcgtgctccgaaatctcagaacggcgagccaatcactccgataaga
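A flat file like the one above, with a ">" header line followed by sequence lines, can be read with a few lines of code. This is a minimal sketch with a hypothetical function name; the embedded example contains only the first 60 bases of the K06A5 record shown above.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-style text into a {record name: sequence} dictionary."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = """>CEK06A5
acaagagagggcgcctcggccgtatgttgaatgggagatc
gatggaaccgagacaacgag
"""
records = parse_fasta(example)
for record_name, sequence in records.items():
    print(record_name, len(sequence), "bp")
```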
Genome sequence of C. elegans.

                                Sequence of entire genome.

                                Sequence of cDNA clones.

                                Approximately 19,500 predicted protein coding
                                gene sequences.

                                Large number of various kinds of functional
                                RNAs – not discussed further.

                                For this lecture – focus on predicted proteins.




                                     Gene prediction? How?

Science, December 1998.
Computer based predictions

GENEFINDER

Biases in coding sequence - in C. elegans non-coding is AT rich.
Splice site signals, initiator methionines, termination codons.

Likely exons and probable/possible splice patterns.




       • Evidence that a prediction is correct?
       • Homology with genes in other organisms – homologues.
       • Known protein families.
       • Experimental evidence.
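Two of the signals listed for computer-based prediction, base composition and start/stop codons, can be illustrated with a toy scan. This is not how GENEFINDER works: it is a naive sketch that looks only at the forward strand and ignores introns and splice sites, so it would miss most real C. elegans genes. All names and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def at_fraction(seq: str) -> float:
    """Fraction of A and T bases in a sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def find_orfs(seq: str, min_codons: int = 3):
    """Return (start, end) of forward-strand open reading frames.

    An ORF runs from an ATG to the first in-frame stop codon; end is
    exclusive and includes the stop. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


toy_seq = "CCATGAAACCCGGGTAACC"  # hypothetical sequence with one short ORF
for orf_start, orf_end in find_orfs(toy_seq):
    print(orf_start, orf_end, toy_seq[orf_start:orf_end])
print("AT fraction:", round(at_fraction(toy_seq), 2))
```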

Genomics 2011 lecture 2

  • 1. C. elegans Genetics  C.elegans has 2 sexes, self fertilizing hermaphrodites and males.  Sex determined chromosomally - XX-hermaphrodite, X-male.  Diploid for 5 autosomes.  Standard classical genetic techniques can be applied.  Life cycle – Zygote to adult ~3 days.  Grow on petri dish – they eat bacteria.  Can store them frozen in liquid nitrogen indefinately. Why might the hermaphrodite sex be useful for genetics?
  • 2. Chromosome I Genetic mapping. Left arm m.u. bli-3 m.u. = map unit. -15 egl-30 Genetic mapping – recombination. mab-20 -10 1 m.u. is 1% recombination per meiosis. -5 fog-1 unc-73 unc-57 Central 0 dpy-5 dpy-14 cluster fer-1 5 lin-11 unc-29 unc-75 Parent Recombinant 10 unc-101 15 20 glp-4 fog-1 + fog-1 + 25 unc-54 glp-4 + + glp-4 Right arm
  • 3. We want to understand how life works – at the molecular level. We had mutant genes with informative phenotypes. The mutated genes were mapped onto linkage groups – chromosomes. What kinds of proteins do these genes encode and how do these proteins function? In 1983, identifying the molecular sequence of a gene defined by mutation was a complicated and time consuming business, even in the worm. If we only new the sequence of the genome!
  • 4. As the term applies to recombinant DNA, what is a clone? Starting with DNA extracted from any organism, Vector How can you take that and get one single fragment into a vector and grow billions of copies of that single “cloned” molecule? Cloned DNA insert
  • 5. C. elegans Genome Project unc-101 unc-75 unc-54 unc-73 mab-20 lin-11 dpy-5 glp-4 fog-1 egl-30 fer-1 Mutants - function bli-3 Genetic map 25 10 15 20 0 5 -15 -10 -5 Chromosomes AACGTTCCACG....... DNA sequence – genes Cloned DNA and proteins fragments Identify DNA sequences corresponding to genes defined by mutation.
  • 6. If you wanted to clone sections of chromosomes for sequencing, how many copies of each chromosome would you start with? DNA Of the order of millions – millions of copies of each chromosome
  • 7. Purified genomic DNA Fragment the chromosomal DNA – either restriction enzyme or mechanical shear.
  • 8. Cloning methods used by the C. elegans genome project Cosmid clones – ~ 40 Kb insert size – Genomic Library. Cosmid cloning vector Linearised cosmid vector Random fragments of genomic DNA – Drug resistance marker E. coli origin of replication millions of them. cos site Useful restriction sites DNA Ligase Long concatenates of cosmid vectors interspaced with random fragments of genomic DNA.
  • 9. Mixed population “inserts” In vitro lambda packaging extracts Lambda Terminase Other phage proteins COS sites in cosmid vector E. coli Critical step Phage “transfects” single cosmid into an E. coli cell.
  • 10. CLONING This is a clone Cells are plated onto medium with antibiotic selection. Cells grown up to form bacterial colonies. Insert X Each colony is derived from a single transfected cell. Each colony is a clonal population. E. coli - clonal population with a single cosmid clone – single genomic DNA fragment. Billions of copies of one cloned insert. Freeze it for storage. Purify cosmid DNA. Sequence the insert. Solid medium on plates Liquid culture Sub-clone fragments etc.
  • 11. Started with many millions of different fragments of chromosomal DNA in one tube. End up with potentially millions of CLONED fragments, each in a different E.coli colony – or culture.
  • 12. We have got as far as random cloned fragments of genomic DNA. What next? Average cosmid insert size – 40 Kb C.elegans genome ~100.3 Mb = 100,300 Kb 100,300/40 = 2,507.5 i.e. ~2,500 cosmid clones could contain the entire C. elegans genome – but WOULD they?
  • 13. In principle, 2500 cosmid clones could contain all the DNA of the C. elegans genome. Why not just start sequencing ~2500 clones picked at random? Imagine this: I give you a large and awkwardly shaped dice with 2500 faces, with a single number on each face, the numbers 1-2500. Roll the dice and write down the number on top. Repeat this – again and again and……. How many times would you have to roll the dice so that every face of the dice would have been on top at least once? ~ 4x2500 will give ~95% probability of any one side or DNA fragment, appearing. ~10x2500 raises probability to ~99%
  • 14. The Golden Path What if you could identify clones that overlapped slightly with ones another? How can we get these clones? Cloned DNA fragments – moderate overlaps. With this approach you could sequence the entire genome by sequencing less than 5000 cosmid clones (2x2500)
  • 15. Cosmid fingerprinting 1. Restriction digest of cosmid DNA. 2. Separate fragments according to size by gel electrophoresis. 3. Digitise the ladder of different sized DNA fragments obtained. Multiple common fragments – clones probably overlap. C. elegans genome project, ~17,000 cosmid clones fingerprinted. A B C Assembled into “contigs” – overlapping clones. “Contig” ~17,000 random cosmid clones A Fingerprinting ~700 contigs B C D C.elegans genome 100 Mb ~2,500 cosmid clones
  • 16. 700 contigs. What is the minimum number of contigs the C. elegans genome could be contained in? Or – how would we know when we had succeeded in joining all the contigs? A method of filling the gaps – joining the contigs – was needed.
  • 17. YACs – Yeast Artificial Chromosomes DNA inserts of ~100 kb – 2 Mb. Grown in yeast. Clonal growth of yeast colonies, much like cosmids in E. coli. YAC DNA separated by pulsed-field gel electrophoresis. C. elegans genome is ~100 Mb. Cosmid clones – approximately 40 kb inserts. YAC clones – select average 500 kb inserts. ~2500 cosmid clones would permit 1x coverage of the genome. ~200 YAC clones would permit 1x coverage of the genome.
  • 18. ~17,000 fingerprinted cosmid clones – ~700 unlinked contigs. Cosmid clone contigs ? ? 6 Chromosomes AACGTTCCACG....... unc-101 unc-75 unc-54 unc-73 mab-20 lin-11 dpy-5 glp-4 fog-1 egl-30 fer-1 bli-3 Genetic map 15 10 20 25 0 5 -15 -10 -5
  • 19. Joining up the contigs Contig X Contig Y YAC clone ~700 contigs – grids of representative cosmid clones. • Large YAC clones (> 1Mb). • Purify YAC DNA – (PFGE). • Radio-label YAC DNA. • Hybridise to cosmid grid. • Expose to X-ray film. Linked cosmid clones
  • 20. unc-101 unc-75 unc-54 unc-73 mab-20 lin-11 dpy-5 glp-4 fog-1 egl-30 fer-1 bli-3 Genetic map 10 15 20 25 0 5 -15 -10 -5 A physical map of the genome - the “Golden Path” – chromosomes represented in ordered overlapping clones or “clone contigs”. YACs Cosmids The Sequence of The Genome
  • 21. Sequencing the C. elegans Genome Individual cosmid clone. Randomly fragmented and shotgun cloned into sequencing vectors. Generally a smaller insert size is best for primary sequence determination – 2–10 kb. Sequence of cosmid, YAC, etc. determined and compiled in silico. Finishing – directed cloning to fill in any gaps. Check for overlap of sequence with overlapping cosmids.
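Compiling shotgun reads "in silico" means merging reads that share sequence at their ends. The toy sketch below uses a greedy longest-overlap merge on error-free reads from one strand. It is an assumption-laden illustration, not the assembler used for the worm genome: real software must cope with sequencing errors, both strands and repeats.

```python
def overlap_len(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap_len(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining pieces are separated by gaps
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


# Hypothetical reads drawn from a made-up 14 bp sequence.
reads = ["ATGGCGTA", "CGTACCTT", "CCTTGA"]
print(greedy_assemble(reads))
```

When no overlap remains, the function returns more than one piece, which corresponds to the gaps that the finishing step fills by directed cloning.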
  • 22. Gaps between cosmid contigs ~20% of genome. Most of these gaps were not random. They contained regions that could not be cloned in cosmids. YAC clones covering most of the gaps. YAC DNA shotgun cloned into M13 or plasmid vectors. Most of the DNA contained in these awkward regions was successfully sub-cloned into small insert size vectors, and sequenced. The sequence as published in December 1998 was generated from: 2527 cosmids, 257 YACs, 113 fosmids, 44 PCR products.
  • 23. C. elegans cosmid K06A5, 24323 bp. Flat sequence file –3955 bp shown. >CEK06A5 acaagagagggcgcctcggccgtatgttgaatgggagatcgatggaaccgagacaacgagaaaaggaatagagacggagaaagagagagagagcgcgcgttgttggaaggatg aaaaagaaaaaagacatgagctgcttcacaagagcttggcgaaagcaaagggcaaagtgttgacagcttagtggtggtagttggatcttctctcctcgttctctgctcacaac tcgtctatcactcatatcacatttatttcccaatatcattttaacaacatcttccgatgcatgttcgtcaatattgcgcaaccactttgcaatattgtcaaaacttttcgcat ttgtgatatcgtaaaccagcataattcccattgctccgcggtaatatgatgttgtgattgtgtggaatcgttcttgtccagctgtgtcccagatttgtaatttaatctttttt ccttttaattcgatagttttaattttgaagtcgattcctgaatgaaaaaagaaaattattttgaaatcactagattctgaataaaaactaaccaatagttgagatgaatgtgg tgttaaaggcatcatccgaaaatctgtacagaatgcaagtttttccaactcctgagtcgcctattagcagcaatttgaagagcatgtcatacggtcggcgagccatttttctt ctgaaatgagaaaaagttgagaactaaagttgcacaaaagtaagagaaaagcacttgagtcatggcaaatagaacgaacactttgagatttcgaagaagttatcaagagttga caattggaagatatttggaagaactttctaatttttttctagttttccaaaattaggtttttgtcataaaatgttgtcaaagaaaaaacaggacaaaatagttaattgttgtt tccattataacaaaaaaaaatttgaacggagctattaacgcgtgcatgcgcaaatcacatcgattagctgtttctgggaaattctcgggaaaaggtgaacagcagctgctggc ttcctctgcgggtcacgaaaacacaaagagatcattataattgttatttggaaaggaagcgaatctaaaacgggtacaggtggacgtttattgatcgaaagtgctttttattt gaaattgaatggtgaactttgcaattttgtaatgcaaagtacgttatcagatggcatgagatgtgtgaagtgataaggaataaaatgtgaacgacatgttcaagaaactgtga tttttcaataatttgtgatgaaatattttaggaacagaaatgaacatattaattgatataaaaacaataggaacactaactcataattatgataggtgaatatcaaaatgtgc tagattttttgaagttaaaaaatacatttctaatattttttcaaataataagtttcagctgaaatttcagggtgatttcagaaagctatgttttgataaattgttttgaaaat taaaagaagctacagcaaaaaaaaattaaagagaacatcgctccctcgtagtgtataatttttgattatcgaaaaaaatgagtcaatgatgaaaaggaagtcgcaatctcaaa acttcaaaaatcaaaagaagccgttgcctctgtcatcaaaaattcagaagacaaggttgttgacaagggtcaattctcagtggtggagggcattgggcgtggtgaaatttttg aaggctagtgtggttggacctctactagatagacaaaacccccgaaatagacgtttaatttgatgagatggtggagaaagaaaaggactcattctctagatgatagagagacc 
agagatacagacaagagagggcgcctcggccgtatgttgaatgggagatcgatggaaccgagacaacgagaaaaggaatagagacggagaaagagagagagagcgcgcgttgt tggaaggatgaaaaagaaaaaagacatgagctgcttcacaagagcttggcgaaagcaaagggcaaagtgttgacagcttagtggtggtagttggatcatgtgtttttatgttt ccggtgggagaaggttcaacaaaaaatgaaaagaaaaagttcaagcggcatgaatcattctgagtttaaaacaaaattattgcgaaaattaatattaaaaccttttcacaaaa cttcaagctaatctgttcatgaaaatttgaataatagttttttcccacctatttagaattaacttcatattaacgaaattaattaacgaatcgaaaattatgacttttcagaa tcatctgaagttttttcacattccatgctgcatggaataatttgatcctggaatcgatatgtttttatggtatactttttaaccttcaatttagctggaaaagtatggaataa ataattcccgaagctatgtacatatatgtagaattattgaatgattgtgagaacaacttgactttagcttgagtaggaatcggaatggctatcgaccgatcaacacttaggat tgtaagaatggcagtaagaatatattgaagaaagaatgtttgttcataggaagagaaagagtattgcgaaatcatcatcgcccactttagaatggacgggcggtgagcggaca tagagaattgtgaatgactaatgcttttgcagaatctagggcaaaatcgtaggaacaaacaattgtaatacggagaaaacaatcatatcgatcgatgatcatggagaaaaatg tgatttaagtgagtagacttggaaaaattaataaaagcatgaattgtcgatatttttcatttattttcattataaagctctttaaaaacaaattaaatattgagaatggcttc gaagaatattgtttcaaatatgttcaatggtgacaccttgcggataaaattaatgtaaaaatcatggaacacagattcactgatatctcattatctcaagcagtgtaattaga gattttttggaacaattattttataaaactataaataaaccgtttatactactcaaagccaaatattcaagctattaccattttttttctaactaattcttgagcaattaaag tattccccagtttttattttgcaacgactccaggcaaacacgctccgttgcacttgccgccaaggcgttgcattcaaatcagagagacatctcattccgatttctgtttttct tccaataaacggtattttatgcctaatgggtgatacggaaattgttcctcttcgagtacaaaatgtacttgatagcgaaatcattcgtctcaacttgtggtccatgaaggtaa ctgtctagtttttttaagttttcatgatttcaatatttttacagtttaacgcgaccagtttcaaactcgaaggttttgtgagaaatgaagaaggcactatgatgcagaaagtt tgttccgaatttatttgtgtaagtcgagaaacatattcgtcaacaattttcattaaatattcagagacgcttcacttctacgttgcttttcgatgtttccggacgtttcttcg acttggtcggacagattgatcgggaatatcaacaaaaaatgggaatgcctagtagaattattgatgaattttcaaatggaattcctgaaaattgggccgaccttatctattcc tgcatgtcagccaaccaaagaagcgcacttcgccctatccaacaggctccaaaagaaccaattagaactagaacagaaccaattgttacgttggcagatgaaaccgagctaac 
tggaggatgccagaaaaattccgaaaacgagaaagaaaggaacagacgtgagcgtgaagaacagcaaacaaaggaacgtgagagaagattagaagaagaaaaacaacgacgag atgctgaagctgaggctgaaagaaggcgaaaagaagaggaagagctggaagaagctaattacacccttcgtgctccgaaatctcagaacggcgagccaatcactccgataaga
  • 24. Genome sequence of C. elegans. Sequence of entire genome. Sequence of cDNA clones. Approximately 19,500 predicted protein-coding gene sequences. Large number of various kinds of functional RNAs – not discussed further. For this lecture – focus on predicted proteins. Gene prediction? How? Science, December 1998.
  • 25. Computer-based predictions GENEFINDER Biases in coding sequence – in C. elegans non-coding sequence is AT-rich. Splice site signals, initiator methionines, termination codons. Likely exons and probable/possible splice patterns. • Evidence that a prediction is correct? • Homology with genes in other organisms – homologues. • Known protein families. • Experimental evidence.
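Two of the signals named on this slide, base composition and start/stop codons, are easy to compute. The sketch below is not GENEFINDER and does not model splicing. It only measures the AT fraction of a sequence and scans the forward strand for open reading frames that run from an initiator atg to an in-frame termination codon. Function names, the minimum length and the test sequence are assumptions for illustration.

```python
STOP_CODONS = {"taa", "tag", "tga"}


def at_fraction(seq):
    """Fraction of bases that are A or T."""
    seq = seq.lower()
    return sum(base in "at" for base in seq) / len(seq)


def find_orfs(seq, min_codons=3):
    """Return (start, end) positions of forward-strand ORFs.

    An ORF here is the first atg in a frame through the next in-frame
    stop codon (stop included), with at least `min_codons` codons
    before the stop. Positions are 0-based, end is exclusive.
    """
    seq = seq.lower()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "atg":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical 22 bp sequence containing one short ORF.
example = "ccatggctgaaccgtaaaattt"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
print(at_fraction(example))
```

A real gene finder combines many such signals statistically and must handle introns and both strands, which is why the slide asks for independent evidence such as homology or experimental data.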