A C G T
Comparative Analysis of Human
Chromosome 22q11.1-q12.3 with
Syntenic Regions in the Chimpanzee,
Baboon, Bovine, Mouse, Pufferfish and
Zebrafish Genomes
Dr. Bruce A. Roe
George Lynn Cross Research Professor
Advanced Center for Genome Technology
Department of Chemistry and Biochemistry
University of Oklahoma
broe@ou.edu www.genome.ou.edu
LXVIII CSHL Symposium
“The Genome of Homo Sapiens”
May 28 - June 3, 2003
“The joy of science is the people
you meet along the way and how
they influence your life”
Jochanan Stenesh and Lilian Myers at Western Michigan University
and Bernie Dudock at SUNY Stony Brook
Bart Barrell and Alan Coulson
originally at the MRC-Hills
Road Cambridge and Ian
Dunham both now at the
Sanger Institute
Watson and Crick
Fred Sanger
Bev Emanuel at Children’s
Hospital of Philadelphia
Sanger, Keio, Wash U, OU
Human Chromosome 22
Sequence Features
• 39 % of the sequence is occupied by genes including
their introns, 5’ and 3’ non-translated regions.
• 3 % of the complete sequence encodes the protein
products of these genes.
• 42 % of the sequence is composed of repetitive
sequences, compared to 46 % for the entire genome.
• Only slightly over half of the genes predicted for
human chromosome 22 can be experimentally
validated.*
* Shoemaker DD., et al. Experimental annotation of the human
genome using microarray technology. Nature. 409, 922-7 (2001).
An Individual’s Genome
Differs from the DNA of:
• Siblings by 1 to 2 million bases, ~99.98% identical, with
coding regions 99.99999% identical
• Unrelated humans by 6 million bases, ~99.8% identical
overall, with coding regions 99.9999% identical
• Chimpanzees by about 100 million base pairs ~98%
identical
• Baboons by about 300 million base pairs ~92% identical
• Mice by about 2.8 billion bases, but coding regions are
~90% identical
• Leaf spinach by about 2.9 billion bases, but coding
regions are ~40% identical
AGCCACACAGTGTCCACCGGATGGTTGATTTTGAAGCAGAGT
TAGCTTGTCACCTGCCTCCCTTTCCCGGGACAACAGAAGCTGA
CCTCTTTGNNTCTCTTGCGCAGATGATGAGTCTCCGGGGCTCTA
TGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAG
CAGAGTTCAAGTAAGTACTGGTTTGGGGAGNNAGGGTTGCAGCG
GCNNGAGCCAGGGTCTCCACCCAGGAAGGACTNNATCGGGCAGGG
TGTGGGGAAACAGGGAGGTTGTTCAGATGACCACGGGACACCT
TTGACCCTGGCCGCTGTGGAGTGTTTGTGCTGGTTGATGCCTT
CTGGGTGTGGAATTGTTTTTCCCGGAGTGGCCTCTGCCCTCTC
CCCTAGCCTGTCTCAGATCCTGGGAGCTGGTGAGCTGCCCCCT
GCAGGTGGATCGAGTAATTGCAGGGGTTTGGCAAGGACTTTGA
CAGACATCCCCAGGGGTGCCCGGGAGTGTGGGGTCCNNAGCCAG
Differences between individuals
The yellow underlined sequence is the first exon of
the BCR gene involved in leukemia. Only 5 bases
(NN) differ in non-gene regions.
Human Chromosome 22
Single Nucleotide Polymorphisms*
Number of overlaps: 335
Size of overlaps: 13,203,147 bp
Number of SNPs: 11,116 (~1/1000 bp)
Number of substitutions: 9,123 (82%)
Number of ins/del: 1,193 (18%)
Only 48 of the 11,116 SNPs were in coding
regions, ~10-fold lower than in non-coding regions
* E. Dawson, et al. A SNP Resource For Human Chromosome 22: Extracting Dense
Clusters of SNPs from the Genomic Sequence. Genome Research, 11, 170-178 (2001).
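The "~1/1000 bp" figure on this slide can be checked with a few lines of arithmetic. The sketch below uses only the overlap size and SNP counts quoted on the slide; the variable names are illustrative and not taken from the cited paper.

```python
# Values as quoted on the slide (Dawson et al. 2001 overlap data).
overlap_bp = 13_203_147   # total size of the clone overlaps, in bp
snps = 11_116             # SNPs found in those overlaps
coding_snps = 48          # SNPs that fall in coding regions

# Average spacing between SNPs across the overlaps.
bp_per_snp = overlap_bp / snps

# Share of all SNPs that lie in coding sequence.
coding_share = coding_snps / snps

print(f"about one SNP every {bp_per_snp:.0f} bp")
print(f"{100 * coding_share:.2f}% of SNPs are in coding regions")
```

The spacing comes out a little above 1,000 bp, consistent with the rounded "~1/1000 bp" on the slide.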
“We each are like a different symphony orchestra”
“All playing the same instruments slightly differently”
Good news and Bad news
• Good news: <40,000 genes (counting dark space?)
• Bad news:
• 2-4 times as many proteins as other
species due to extensive alternative
splicing in humans.
• We only know the function of about
half the predicted genes.
• Likely > 1 million different gene
products based on alternative splicing
and post-translational modifications.
Where we stand now
• We essentially have the ‘dictionary’ with all
the words (genes) spelled correctly, but only
slightly more than half of the words (genes)
have definitions.
• Through comparative genomic sequencing
we can annotate the human genome based
on evolutionarily conserved gene sequences
and use model systems to study gene
expression.
• Slightly over half of the genes predicted for
human chromosome 22 have been
experimentally validated.
Chimpanzee and Baboon
Genomic Sequencing
• Medically important model eukaryotic organisms
• The chimpanzee is our nearest evolutionary
relative with a genome that has ~98 %
sequence identity with the human genome
• The baboon genome has ~92 % sequence
identity with the human genome
PIP (percent identity plot) of a region of human chr22 compared to syntenic regions of baboon and mouse
Plot annotations: human-specific repeat regions; questionable gene present in primates but not in rodents
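A percent identity plot draws each gap-free aligned segment as a horizontal line placed at its position in the reference sequence and at a height equal to its percent identity. The sketch below is a simplified illustration of that idea and is not the tool used to make the slide; the function name and the toy alignment are hypothetical.

```python
def gap_free_segments(human: str, other: str):
    """Yield (start, end, percent_identity) for each gap-free run.

    `human` and `other` are the two rows of a pairwise alignment, equal in
    length, with '-' marking gaps. Coordinates are 0-based, end-exclusive,
    and counted along the ungapped human sequence.
    """
    if len(human) != len(other):
        raise ValueError("aligned rows must have the same length")
    pos = 0                      # current ungapped human coordinate
    start = None                 # human coordinate where the run began
    matches = length = 0
    for h, o in zip(human.upper(), other.upper()):
        if h == "-" or o == "-":
            # A gap in either row closes the current run, if any.
            if length:
                yield start, pos, 100.0 * matches / length
            start, matches, length = None, 0, 0
        else:
            if start is None:
                start = pos
            matches += h == o
            length += 1
        if h != "-":
            pos += 1
    if length:
        yield start, pos, 100.0 * matches / length


# Toy alignment, invented for illustration only.
segs = list(gap_free_segments("ACGTAC-GTT", "ACGAACTG-T"))
for start, end, pid in segs:
    print(f"human {start}-{end}: {pid:.1f}% identity")
```

Plotting each tuple as a line from `start` to `end` at height `pid` gives the basic layout of such a plot; repeats and exons would be drawn as separate annotation tracks.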
Variations in the regions syntenic to the
human chr 22 immunoglobulin light chain
region from chimp, baboon, rat and mouse
34 Kbp deletion in baboon
Exons in one copy of a duplicated zebrafish gene show 75% homology to human but are greatly diverged (<50% homology) in the other copy
Instance of a rare Alu deletion in chimp and a gene having very low homology in fish
Conclusions from the analysis of
vertebrate genomic sequences
• Approximately 40% of the genome is expressed into
hnRNA which is processed to 10-fold smaller mature
mRNA with extensive alternative splicing (1 gene -->
multiple proteins).
• Approximately 40% repeat sequence density.
• Conserved coding sequences, promoters and enhancers
and exon spacing approximately proportional to
evolutionary distance from a common ancestor.
• Additional endogenous retroviral and Alu sequences in the
human genome and some regions not present in different
vertebrates.
• Sequence drift in duplicated gene families.
• About half of the predicted genes have yet to be assigned
any known function.
“Zebrafish are small people that swim
in the water and breathe through gills”
Han Wang, Dept. Zoology and Director of the
University of Oklahoma Zebrafish Facility
How much of the ~1.7 Gbp genome has been sequenced so far?
The whole genome shotgun project comprises roughly 11.6 million traces by
now. With an average quality clipped trace length of 517 bp this adds to 6 Gb in
total, so the genome is covered 3.5 times.
The new assembly Zv2 is built on 11.7 million traces with an average trace
length of 651 bp, adding up to 7.64 Gbp (4.5x coverage).
The current Sanger Institute in-house statistics for the clone sequencing are:
* 322,712,747 bp unfinished
* 112,494,895 bp finished
* 435,207,642 bp total
Zebrafish developmental stages (HPF*) and descriptions
Hatching Period (48-72 h): Individuals within a single developing clutch hatch sporadically during the whole period.
Pharyngula Period (24-48 h): Embryos develop to the phylotypic stage, when they possess the classic vertebrate bauplan. Migration of the posterior lateral line primordium. Rapid organogenesis continues.
Segmentation Period (10 1/3 - 24 h): Somites develop, the rudiments of the primary organs become visible, the tail bud becomes more prominent and the embryo elongates. The first cells differentiate morphologically, and the first body movements appear.
Gastrula Period (5 1/4 - 10 1/3 h): Morphogenetic cell movements of involution, convergence, and extension occur, producing the primary germ layers and the embryonic axis.
Blastula Period (2 1/4 - 5 1/4 h): Begins at 128-cell stage or 8th zygotic cell cycle. Embryo enters midblastula transition (MBT), the onset of zygotic transcription. Period ends at the onset of gastrulation.
Cleavage Period (0.7 - 2.2 h): After the first cleavage, blastomeres divide at approximately 15-minute intervals.
Zygote Period (0 - 3/4 h): The newly fertilized egg is in the zygote period until the first cleavage occurs.
* HPF: hours post-fertilization.
Kimmel CB, et al. Stages of embryonic development of the zebrafish. Dev Dyn 203, 253-310 (1995).
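The staging table can be held as a small lookup structure, which is handy when labelling embryos by hours post-fertilization. This is a sketch only: the boundaries are the ones listed on the slide, written as fractions (3/4 h and 2 1/4 h where the slide rounds to 0.7 and 2.2), and the name `stage_at` is hypothetical.

```python
# (stage name, start hpf, end hpf), in chronological order.
STAGES = [
    ("Zygote", 0.0, 0.75),
    ("Cleavage", 0.75, 2.25),
    ("Blastula", 2.25, 5.25),
    ("Gastrula", 5.25, 10 + 1 / 3),
    ("Segmentation", 10 + 1 / 3, 24.0),
    ("Pharyngula", 24.0, 48.0),
    ("Hatching", 48.0, 72.0),
]


def stage_at(hpf: float):
    """Return the stage name for a time in hours post-fertilization.

    Each period is treated as half-open [start, end), except that the final
    boundary (72 h) is counted as part of the last period. Times outside the
    table return None.
    """
    for name, start, end in STAGES:
        if start <= hpf < end:
            return name
    if hpf == STAGES[-1][2]:
        return STAGES[-1][0]
    return None


for t in (24, 48, 72):
    print(t, "hpf ->", stage_at(t))
```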
Gene Expression in Zebrafish
• Created and sequenced 10,000 clones from a zebrafish brain
and eye cDNA library.
• After a BLAST search against human chromosome 22, obtained the set of
zebrafish cDNA clones corresponding to several predicted
human chromosome 22 genes.
• Picked an EST whose expression profile matched a hypothetical
protein with an EST from a human fetal brain library.
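The step of matching zebrafish clones to predicted chromosome 22 genes amounts to keeping, for each EST, its best significant BLAST hit. The sketch below assumes the hits are available in the standard 12-column tabular BLAST format (query, subject, % identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); the slides do not say which BLAST program or output format was used, and the identifiers, scores and e-value cutoff here are invented for illustration.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_hits(lines: Iterable[str]):
    """Parse 12-column tab-separated BLAST hits, skipping blanks and comments."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10]), float(f[11]))


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-10):
    """Keep the highest-scoring hit per query among hits passing the cutoff."""
    best = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return best


# Hypothetical records: two ESTs searched against predicted chr22 genes.
lines = [
    "est_001\tchr22_gene_A\t78.5\t240\t50\t2\t1\t240\t1001\t1240\t1e-40\t160.0",
    "est_001\tchr22_gene_B\t70.0\t120\t36\t0\t5\t124\t501\t620\t1e-12\t70.5",
    "est_002\tchr22_gene_C\t65.0\t60\t21\t0\t1\t60\t10\t69\t0.5\t25.0",
]
best = best_hits(parse_hits(lines))
for est, hit in best.items():
    print(est, "->", hit.subject, hit.evalue)
```

Here `est_002` is dropped because its only hit is not significant under the chosen cutoff.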
Gene Expression in Zebrafish (cont)
• An antisense RNA hybridization probe was generated by in vitro
transcription in the presence of dig-UTP after cloning into an
expression vector.
• Whole mount in situ hybridization was performed on 24, 48, and 72 hours
post-fertilization zebrafish embryos.
• Hybridization was detected by anti-dig antibody.
1b6: AP000557.1.mRNA chr22 position:18495442-18504448 KIAA1020 hypothetical protein matches EST b6n20zf
24hpf 48hpf 72hpf
Probe1 b6
Probe1 b6 shows hybridization in the brain from 24 hours onward and in the eye
from 48 hours onward.
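An antisense probe is the reverse complement of the target mRNA, which is why it can base-pair with the transcript, while a sense probe has the same sequence as the mRNA and normally serves as a negative control. A minimal sketch of that relationship (the function name and the short example sequence are made up):

```python
# Complement table for RNA bases, upper and lower case.
_COMPLEMENT = str.maketrans("ACGUacgu", "UGCAugca")


def antisense_rna(mrna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return mrna.translate(_COMPLEMENT)[::-1]


mrna = "AUGGCCUAA"            # toy transcript, 5' to 3'
probe = antisense_rna(mrna)   # sequence of an antisense probe against it
print(probe)
```

Applying the function twice returns the original sequence, which is the sense-probe sequence.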
Exon-specific gene expression in zebrafish
embryos during development that is
amenable to automation
Adapted mouse in situ methods for zebrafish that:
• shortened the probes from 1000 bp to 100 bp, making them
exon-specific,
• ran hybridizations in a 96-well multiplex microtiter plate format,
• used digoxigenin-labeled ssDNA probes generated by
asymmetric, single-primer amplification of PCR products (eliminating
sub-cloning of each PCR product into T3/T7 expression
vectors), and
• eliminated the spurious labeling of the eye by introducing
glycine as the reagent of choice to rapidly inhibit the
proteinase K used to increase permeability of the embryos.
Whole mount in situ hybridization with
ssDNA-digoxigenin labeled probe
made from a PCR product. Brain-
specific expression of this mRNA
during embryonic development
Anti-sense probe Sense probe No probe
Typically only see anti-sense probe hybridizing,
and therefore stained by anti-dig antibody with
some probe-independent staining in the eye.
The importance of a “no probe” antibody staining
control to determine if any probe-independent
antibody staining occurs in the lens
72 hour post fertilization embryo
A probe to the unique 3’ UTR if
there are multiple paralogs
One last experiment with a surprise ending
Hybridization probe a8h24 unique to 3’ UTR of zebrafish gene 2 based on our zebrafish EST sequence
Anti-sense probe Sense probe No probe
Both the anti-sense and sense probes hybridized
to 72 hour post fertilization embryonic brain.
Indicating RNA transcribed from
the opposite, non-coding strand?
One too many controls sometimes
results in a surprise observation
What’s next for our Genome Center?
• Participate in sequencing the mouse, chimp, baboon,
lemur, bovine, dog, cat, chicken and zebrafish
genomes concentrating on:
• Regions of high biological interest and
• Regions orthologous to human chromosome 22
• Sequence the Medicago truncatula (alfalfa) genome
using a mapped BAC-based approach concentrating
on coding regions
• Continued sequencing of selected pathogenic bacteria
• Investigate the function of the predicted genes with
unknown function in the zebrafish system first by
whole mount in situ and then expression knock down
experiments with morpholino oligos.
Laboratory Organization
Bruce Roe, PI
Informatics
Support Teams
Production
Administration
Jim White
Steve Kenton
Hongshing Lai
Sean Qian***
Rose Morales-Diaz*
Mounir Elharam*
Steve Shaull**
Doug White
Work-study Undergraduate students**
KayLynn Hale
Dixie Wishnuck
Tami Womack
Mary Catherine Williams
DNA Synthesis
Phoebe Loh*
Sulan Qi
Bart Ford*
Reagents & Equip. Maint.
Mounir Elharam*
Doug White
Clayton Powell**
Axin Hua***
Weihong Xu****
Yanhong Li
Jami Milam****
Sara Downard**
Ging Sobhraksha**
Limei Yang
Angie Prescott*
Audra Wendt**
Mandi Aycock**
Ziyun Yao***
Steve Shaull*
Youngju Yoon****
Trang Do
Anh Do
Lily Fu
Yang Ye**
Tessa Manning**
Fu Ying
Liping Zhou
Ruihua Shi****
Junjie Wu****
Stephan Deschamps***
Shelly Oommen****
Christopher Lau****
Research Teams
Doris Kupfer
Julia Kim*
Sun So
Graham Wiley**
Lin Song****
Ying Ni
Huarong Jiang
ShaoPing Lin***
Honggui Jia
Hongming Wu
Baifang Qin
Peng Zhang
Shuling Li
Fares Najar***
Chunmei Qu
Keqin Wang
Funding from the NHGRI, Noble Foundation, DOE, NSF (pending)
- Collaborators at Sanger, CWRU, CHOP, Keio, UIUC and Riken
Phoebe Loh**
Sulan Qi
Bart Ford*
* Previous undergraduate res. student
** Present undergraduate res. student
*** Previous graduate student
**** Present graduate student
The AACCGGTT Team
Peggy and Charles Stephenson Center
