FST & Some Selection Index
Evolution, Population Genetics, and Health 2014
김진섭
GSPH, SNU
October 29, 2014
김진섭 (GSPH, SNU) FST & Some Selection Index October 29, 2014 1 / 65
Fst
Contents
1 Fst
Wright’s F-statistics
Cockerham’s θ-statistics
2 Selection Index
EHH
iHS
xp-EHH
3 Practice
Fst Wright’s F-statistics
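3 types of Heterozygosity [4]
Individual, Subpopulation, Total Population
1 $H_I = \frac{1}{n}\sum_{i=1}^{n} \hat{H}_i$
2 $H_S = \frac{1}{n}\sum_{i=1}^{n} 2 p_i q_i$
3 $H_T = 2\bar{p}\bar{q}$
($\hat{H}_i$: observed heterozygosity in the $i$th subpopulation; $2 p_i q_i$: expected heterozygosity in the $i$th subpopulation; $2\bar{p}\bar{q}$: expected heterozygosity of the total population, from the mean allele frequencies $\bar{p}$ and $\bar{q}$; $n$: number of subpopulations)
Each value is computed per locus.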
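Wright's F-statistics [4]
1 $F_{IS} = \frac{H_S - H_I}{H_S}$
2 $F_{ST} = \frac{H_T - H_S}{H_T}$
3 $F_{IT} = \frac{H_T - H_I}{H_T}$
Example
$F_{ST} = 0$ → no subpopulation effect: the subpopulations do not differ in allele frequency.
$F_{ST} = 1$ → the subpopulations differ strongly (each is fixed for a different allele).
Simple relation
$1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$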
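The three heterozygosities and the F-statistics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: one biallelic locus, genotype counts given as (AA, Aa, aa) per subpopulation, and the unweighted averages used in the definitions. The function name `f_statistics` and the example counts are hypothetical.

```python
from typing import List, Tuple

# Hypothetical input: genotype counts (AA, Aa, aa) at one locus per subpopulation.
Genotypes = Tuple[int, int, int]


def f_statistics(subpops: List[Genotypes]) -> Tuple[float, float, float]:
    """Return (F_IS, F_ST, F_IT) for one biallelic locus."""
    n = len(subpops)
    h_obs = []  # observed heterozygosity in each subpopulation
    h_exp = []  # expected heterozygosity 2 * p_i * q_i in each subpopulation
    p_all = []  # allele frequency p_i in each subpopulation
    for hom_a, het, hom_b in subpops:
        total = hom_a + het + hom_b
        p = (2 * hom_a + het) / (2 * total)
        h_obs.append(het / total)
        h_exp.append(2 * p * (1 - p))
        p_all.append(p)
    h_i = sum(h_obs) / n
    h_s = sum(h_exp) / n
    p_bar = sum(p_all) / n  # unweighted mean allele frequency
    h_t = 2 * p_bar * (1 - p_bar)
    f_is = (h_s - h_i) / h_s
    f_st = (h_t - h_s) / h_t
    f_it = (h_t - h_i) / h_t
    return f_is, f_st, f_it


# Two made-up subpopulations of 100 individuals each.
f_is, f_st, f_it = f_statistics([(40, 20, 40), (10, 20, 70)])
print(f_is, f_st, f_it)
```

The relation $1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$ holds for any input because $(H_I/H_S)(H_S/H_T) = H_I/H_T$. The sketch does not guard against a monomorphic locus, where $H_S$ or $H_T$ is zero.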
http://academic.reed.edu/biology/professors/srenn/pages/
research/2011_students/sean/SM_thesis.html
http://www.johnderbyshire.com/Miscellaneous/Other/Fst.jpg
FST inference[5]
A convenient measure of genetic differentiation.
Among the most widely used descriptive statistics in population and evolutionary genetics.
Can point to natural selection acting in a particular subpopulation.
Problem in estimation
$H_T = 2\bar{p}\bar{q}$
1 What if the sample size differs between subpopulations?
2 Example: 1,000 samples from SASIA but only 100 from Oceania.
3 Then $\bar{p}$ is not estimated properly.
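A small numeric sketch of the problem, assuming made-up allele frequencies for the two groups and the sample sizes from the example. It only shows how far the pooled mean and the unweighted mean of $\bar{p}$ can drift apart; it does not say which one is appropriate. The function name `mean_allele_frequency` is hypothetical.

```python
def mean_allele_frequency(freqs, sizes, weighted=True):
    """Mean allele frequency across subpopulations.

    freqs: allele frequency p_i in each subpopulation
    sizes: number of sampled individuals in each subpopulation
    weighted=True pools all samples; weighted=False averages the p_i directly.
    """
    if weighted:
        return sum(p * n for p, n in zip(freqs, sizes)) / sum(sizes)
    return sum(freqs) / len(freqs)


# Hypothetical frequencies; sample sizes follow the 1,000 vs 100 example.
freqs = [0.10, 0.60]
sizes = [1000, 100]
p_pooled = mean_allele_frequency(freqs, sizes, weighted=True)
p_simple = mean_allele_frequency(freqs, sizes, weighted=False)
for label, p in (("pooled", p_pooled), ("unweighted", p_simple)):
    print(label, round(p, 4), "H_T =", round(2 * p * (1 - p), 4))
```

With pooling, the large sample dominates $\bar{p}$ and therefore $H_T$, which is one reason a variance-component estimator is preferred in practice.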
Fst Cockerham’s θ-statistics
ANOVA approach [1, 5]
$\theta = \frac{\sigma_P^2}{\sigma_T^2}$
($\sigma_P^2$: variance due to population, $\sigma_T^2$: total variance)
Cockerham's θ measures the same quantity as Wright's $F_{ST}$.
In practice, most calculations use θ.
θ inference
More than 2 populations
At least one population differs from the majority.
It does not tell you which population that is.
Pairwise $F_{ST}$
Computed from only 2 populations.
Gives a relative comparison.
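Pairwise comparison can be sketched as follows. This is a simplified illustration, not the θ estimator: it applies the heterozygosity-based $F_{ST} = (H_T - H_S)/H_T$ to each pair of populations at one biallelic SNP. The population names and allele frequencies are hypothetical.

```python
from itertools import combinations


def pairwise_fst(p1: float, p2: float) -> float:
    """F_ST = (H_T - H_S) / H_T for two subpopulations at one biallelic locus."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    # If the locus is monomorphic in both populations there is no differentiation.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical allele frequencies at one SNP.
freqs = {"PopA": 0.10, "PopB": 0.15, "PopC": 0.80}
for a, b in combinations(freqs, 2):
    print(a, b, round(pairwise_fst(freqs[a], freqs[b]), 3))
```

In this made-up example the pairs involving PopC give the largest values, which identifies the population that a single global statistic would only hint at.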
Figure: FST calculated for each SNP between Tibetan and Han populations[6]
Figure: Inter-population pairwise comparisons of FST statistics
http://academic.reed.edu/biology/professors/srenn/pages/
research/2011_students/sean/SM_thesis.html
Selection Index
Is a particular haplotype especially common in a particular population?
Example: Erik Corona's slides (the following slides)
Population Genetics
Figure labels: Lactase + H2O, Glucose
HAPLOTYPES (frequency across successive slides; change from the starting frequency in parentheses)
Haplotype        Slide 17-18  Slide 19-20  Slide 21   Slide 22   Slide 23
GATTACAGATTACA   22%          22%          21% (-1)   21% (-1)   20% (-2)
AATTACAGATTAAA   3%           3%           3%         3%         3%
GACTACAGATTACC   19%          19%          19%        19%        19%
GATTACCTATTAAC   24%          24%          24%        23% (-1)   23% (-1)
AACTACAGATTACC   16%          16%          16%        15% (-1)   15% (-1)
GATTACAGACTACA   7%           7%           7%         7%         6% (-1)
AATTACAGATTACA   9%           9%           8% (-1)    7% (-2)    5% (-4)
AATTGCAGATTACA   (absent)     <1%          2% (+2)    5% (+5)    9% (+9)
Selection Index EHH
EHH: Sabeti, Reich et al. (2002) [7]
Extended Haplotype Homozygosity
If 2 haplotypes are drawn at random, what is the probability that they are identical?
0 → all haplotypes are different.
1 → all haplotypes are identical.
The haplotype of interest is called the core.
$EHH_t = \frac{\sum_{i=1}^{s} \binom{e_{ti}}{2}}{\binom{c_t}{2}}$
($t$: core haplotype; $c_t$: the number of samples of a particular core haplotype; $e_{ti}$: the number of samples of a particular extended haplotype; $s$: the number of unique extended haplotypes)
How can we detect positive selection?
AATTACAGATTACA 50 people have this
GATTACAGATTACA 50 people have this
---- 50 KB ----
50 KB + 20 KB = 70 KB
Core haplotype   Extension   Count
AATTACAGATTACA   AACACGC     10
AATTACAGATTACA   ATGATAG     8
AATTACAGATTACA   AACCCAG     7
AATTACAGATTACA   CTGACAG     5
AATTACAGATTACA   CAGACAG     3
AATTACAGATTACA   AACACAG     6
AATTACAGATTACA   CACACAG     4
AATTACAGATTACA   CACCCAG     7
GATTACAGATTACA   CACATAG     24
GATTACAGATTACA   CACACAG     26
How can we detect positive selection?
Extended Haplotype Homozygosity (EHH)
For the core AATTACAGATTACA (50 samples; extended haplotype counts 10, 8, 7, 5, 3, 6, 4, 7):
$EHH = \frac{\binom{10}{2} + \binom{8}{2} + \binom{7}{2} + \binom{5}{2} + \binom{3}{2} + \binom{6}{2} + \binom{4}{2} + \binom{7}{2}}{\binom{50}{2}} = \frac{149}{1225} \approx 0.121$
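The EHH arithmetic for the two cores in the example table can be checked with a short sketch. It assumes only the counts shown there; the function name `ehh` is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def ehh(extended_counts):
    """EHH for one core haplotype, given the count of each distinct extended haplotype."""
    c = sum(extended_counts)  # samples carrying this core
    if c < 2:
        raise ValueError("need at least two samples carrying the core")
    return sum(comb(e, 2) for e in extended_counts) / comb(c, 2)


core_a = [10, 8, 7, 5, 3, 6, 4, 7]  # AATTACAGATTACA
core_g = [24, 26]                   # GATTACAGATTACA
print(ehh(core_a))  # 149 / 1225
print(ehh(core_g))  # 601 / 1225
```

The two boundary cases match the interpretation of the statistic: a single shared extended haplotype gives 1, and all-unique extended haplotypes give 0.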
EHH Drops Over Genetic Distance
EHH drops off quickly over genetic distance.
Starts at 1.
Ends at 0.
Every haplotype block will eventually be unique.
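The decay can be illustrated by recomputing EHH each time the haplotypes are extended by one more site. This is a minimal sketch under simplifying assumptions: haplotypes are plain strings that share the same core prefix, extension goes to the right only, and distance is counted in sites rather than genetic distance. The function name and the four toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh_by_distance(haplotypes, core_len):
    """EHH after extending the core by 0, 1, 2, ... sites to the right.

    All haplotypes are assumed to carry the same core (first core_len characters).
    """
    n = len(haplotypes)
    max_ext = min(len(h) for h in haplotypes) - core_len
    curve = []
    for k in range(max_ext + 1):
        # Group chromosomes that are still identical over core + k sites.
        groups = Counter(h[: core_len + k] for h in haplotypes)
        curve.append(sum(comb(c, 2) for c in groups.values()) / comb(n, 2))
    return curve


# Toy sample: four chromosomes with the core "GAT".
haps = ["GATAC", "GATAG", "GATTC", "GATTG"]
print(ehh_by_distance(haps, core_len=3))
```

In this toy sample the curve starts at 1 at the core and reaches 0 once every extended haplotype is unique.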
EHH: What It Is & What It Isn't
Detects over-representation of a haplotype.
This raises P(two randomly drawn haplotypes are identical).
Does NOT detect whether a haplotype spread quickly.
Low recombination != spread quickly
AATTACAGATTACA AACACGC 22
AATTACAGATTACA ATGATAG 28
GATTACAGATTACA CACATAG 24
GATTACAGATTACA CACACAG 26
Compare EHH Scores
AATTACAGATTACA (extended haplotype counts 10, 8, 7, 5, 3, 6, 4, 7): EHH = 0.121
GATTACAGATTACA (extended haplotype counts 24, 26): $EHH = \frac{\binom{24}{2} + \binom{26}{2}}{\binom{50}{2}} = \frac{601}{1225} \approx 0.490$
Figure labels for the high score: Low Recombination, Over Represented
Can EHH Detect Positive Selection?
Relative EHH (REHH)
Detects over-representation of a haplotype.
Over-representation and low recombination both raise P(two randomly drawn haplotypes are identical).
Does detect whether a haplotype spread quickly.
Other haplotype blocks are controls!
Agnostic to recombination cold spots and hot spots.
Low score if both alleles are associated with high recombination, or both with low recombination:
AATTACAGATTACA AACACGC 22
AATTACAGATTACA ATGATAG 28
GATTACAGATTACA CACATAG 24
GATTACAGATTACA CACACAG 26
Extended Haplotype Homozygosity (EHH)
AATTACAGATTACA (extended haplotype counts 10, 8, 7, 5, 3, 6, 4, 7): EHH = 0.121
GATTACAGATTACA (extended haplotype counts 24, 26): EHH = 0.490
$REHH = \frac{0.490}{0.121} = 4.05$
REHH: Problem #1
We get a different REHH value at different genetic distance cutoffs.
Cutoff at 50 KB:
AATTACAGATTACA 50
GATTACAGATTACA 50
REHH = 1.0
Cutoff at 70 KB:
AATTACAGATTACA AACACGC 10
AATTACAGATTACA ATGATAG 8
AATTACAGATTACA AACCCAG 7
AATTACAGATTACA CTGACAG 5
AATTACAGATTACA CAGACAG 3
AATTACAGATTACA AACACAG 6
AATTACAGATTACA CACACAG 4
AATTACAGATTACA CACCCAG 7
GATTACAGATTACA CACATAG 24
GATTACAGATTACA CACACAG 26
REHH = 4.05
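The dependence on the cutoff can be reproduced with a short sketch. It assumes the two-core case shown here, where REHH reduces to the ratio of the two EHH values (with more cores, the denominator would pool the other cores). The function names are hypothetical.

```python
from math import comb


def ehh(counts):
    """EHH from the counts of each distinct extended haplotype of one core."""
    return sum(comb(e, 2) for e in counts) / comb(sum(counts), 2)


def rehh(core_counts, other_counts):
    """EHH of one core relative to the EHH of the alternative core (two-core case)."""
    return ehh(core_counts) / ehh(other_counts)


# 50 KB cutoff: each core is still a single haplotype carried by 50 samples.
print(rehh([50], [50]))
# 70 KB cutoff: GATTACAGATTACA relative to AATTACAGATTACA.
print(rehh([24, 26], [10, 8, 7, 5, 3, 6, 4, 7]))
```

With unrounded EHH values the 70 KB ratio is 601/149 ≈ 4.03; the value 4.05 results from dividing the rounded values 0.490 and 0.121.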
Which REHH value to use?
Extend to the right
AGTTACAGATTACAAACACGC
AAATACAGATTACAATGATAG
AATTACAGATTACAAACCCAG
AATTTCAGATTACACTGACAG
AATTAAAGATTACACAGACAG
AATTACCGATTACAAACACAG
AATTACAAATTACACACACAG
AATTACAGGTTACACACCCAG
GATTACAGATTACACACATAG
GATTACAGATTACACACACAG
---------- 70 KB ---------
REHH = 4.05
Which REHH value to use?
Extend to the right, then extend to the left
…ACAGATTACAGTTACAGATTACAAACACGC…
…ACAGATTACAAATACAGATTACAATGATAG…
…ACAGATTACAATTACAGATTACAAACCCAG…
…ACAGATTACAATTTCAGATTACACTGACAG…
…ACAGATTACAATTAAAGATTACACAGACAG…
…ACAGATTACAATTACCGATTACAAACACAG…
…ACAGATTACAATTACAAATTACACACACAG…
…ACAGATTACAATTACAGTTACACACCCAG…
…TACAGATTAGATTACAGATTACACACATAG
…TACAGATTAGATTACAGATTACACACACAG
---------- 70 KB ---------
REHH = 4.05
REHH: Problem #2
The REHH score is heavily biased by allele frequencies.
Must normalize: P(REHH | Allele Freq.)
REHH: Problem #3
Not possible to detect selection in high-frequency alleles.
The solution requires a cross-population approach (discussed later).
REHH Overview
Leaves a lot to be desired
Picking the maximum is arbitrary
Why not the mean REHH score?
Biased by allele frequency
ln(REHH | allele freq) ~ normal distribution
Still widely used and published with
Site-specific EHH [9]
Roughly the average of the two alleles' EHH values (weights: squared allele frequencies).
Gives the approximate size of EHH at the focal SNP.
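The verbal description above can be written as a weighted average. This sketch follows only that description (weights equal to squared allele frequencies); the exact estimator in [9] may differ in detail, and the function name and input values are illustrative.

```python
def site_specific_ehh(ehh_by_allele, allele_freqs):
    """Average of per-allele EHH values, weighted by squared allele frequency."""
    weights = [f * f for f in allele_freqs]
    return sum(w * e for w, e in zip(weights, ehh_by_allele)) / sum(weights)


# Two alleles at equal frequency, with the EHH values 0.121 and 0.490 used earlier.
print(site_specific_ehh([0.121, 0.490], [0.5, 0.5]))
```

With equal allele frequencies the result is the plain mean of the two EHH values; as one allele becomes more common, its EHH dominates the site-level value.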
Selection Index iHS
iHS: Sabeti et al. (2007) [8]
Integrate EHH over all positions, then compare.
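Integrated Haplotype Score (iHS)
$\text{Unstandardized iHS} = \ln\left(\frac{\int_{y}^{x} EHH_A}{\int_{y}^{x} EHH_D}\right)$
$y$ = backward distance
$x$ = forward distance
$EHH_D$ = EHH of the derived allele
$EHH_A$ = EHH of the ancestral allele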
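Integrated Haplotype Score (iHS)
…ACAGATTACAGTTACAGATTACAAACACGC…
…ACAGATTACAAATACAGATTACAATGATAG…
…ACAGATTACAATTACAGATTACAAACCCAG…
…ACAGATTACAATTTCAGATTACACTGACAG…
…ACAGATTACAATTAAAGATTACACAGACAG…
…ACAGATTACAATTACCGATTACAAACACAG…
…ACAGATTACAATTACAAATTACACACACAG…
…ACAGATTACAATTACAGTTACAACACCCAG…
…TACAGATTAGATTACAGATTACACACATAG
…TACAGATTAGATTACAGATTACACACACAG
Integrated EHH, one allele (backward + forward): 0.7 + 0.5 = 1.2
Integrated EHH, other allele (backward + forward): 4.0 + 4.4 = 8.4
Unstandardized iHS
ln(8.4/3.2) = 0.965 (the slide prints 0.419, which is the base-10 logarithm of 8.4/3.2; the denominator 3.2 also differs from the sum 1.2 shown above)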
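Integration and the log ratio can be sketched as follows. The EHH curves are invented values sampled at a few distances on one side of the focal SNP, the trapezoidal rule is one common way to approximate the area, and the function names are hypothetical. Which allele goes in the numerator is left to the caller.

```python
import math


def integrate_ehh(positions, ehh_values):
    """Area under an EHH curve, approximated with the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        area += width * (ehh_values[i] + ehh_values[i - 1]) / 2
    return area


def unstandardized_ihs(ihh_numerator, ihh_denominator):
    """Natural log of the ratio of two integrated EHH values."""
    return math.log(ihh_numerator / ihh_denominator)


# Hypothetical EHH curves at distances 0, 10, 20, 30 KB from the focal SNP.
positions = [0, 10, 20, 30]
ehh_slow_decay = [1.0, 0.8, 0.5, 0.1]   # allele on a long shared haplotype
ehh_fast_decay = [1.0, 0.3, 0.05, 0.0]  # allele whose haplotypes diverge quickly
ihh_slow = integrate_ehh(positions, ehh_slow_decay)
ihh_fast = integrate_ehh(positions, ehh_fast_decay)
print(ihh_slow, ihh_fast, unstandardized_ihs(ihh_slow, ihh_fast))
```

In a full calculation the backward and forward areas are added for each allele before the ratio is taken, as in the sums above.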
iHS Characteristics
When both alleles have the same area under the EHH curve (AUC), iHS is zero.
Large negative values indicate selection of the allele in the denominator.
Large positive values indicate selection of the allele in the numerator.
Still heavily biased by allele frequency!
Z-score normalization
Integrated Haplotype Score (iHS)
$iHS = \frac{\text{Unstandardized iHS} - E(\text{iHS} \mid \text{Allele Frequency})}{SD(\text{iHS} \mid \text{Allele Frequency})}$
E(iHS | Allele Freq.): estimated from the empirical distribution
SD(iHS | Allele Freq.): estimated from the empirical distribution
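One way to estimate the conditional mean and SD empirically is to group SNPs into allele-frequency bins and z-score within each bin. The sketch below assumes that approach; the bin width of 0.05, the use of the population SD, and all input values are illustrative choices, not taken from the slides.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_frequency(scores, freqs, bin_width=0.05):
    """Z-score each unstandardized score within its allele-frequency bin."""
    bins = defaultdict(list)
    for s, f in zip(scores, freqs):
        bins[int(f / bin_width)].append(s)
    standardized = []
    for s, f in zip(scores, freqs):
        group = bins[int(f / bin_width)]
        sd = pstdev(group)
        # A bin with no spread (or a single SNP) cannot be standardized.
        standardized.append((s - mean(group)) / sd if sd > 0 else float("nan"))
    return standardized


# Hypothetical unstandardized scores and allele frequencies for six SNPs.
scores = [0.2, 0.4, 0.6, 1.0, 1.5, 2.0]
freqs = [0.12, 0.13, 0.14, 0.61, 0.62, 0.63]
z = standardize_by_frequency(scores, freqs)
print([round(v, 3) for v in z])
```

Real scans use thousands of SNPs per bin, so the empirical mean and SD are far more stable than in this six-SNP toy example.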
iHS Overview
iHS and REHH are EHH-based methods to detect positive selection.
iHS outperforms REHH at specific allele frequencies.
Neither method outperforms the other across all frequencies.
iHS: Problem #1
Still can't detect selection in high-frequency (old) alleles.
Relatively high EHH values are not present in high-frequency (old) alleles!
Use a reference population.
If positive selection didn't take place in the reference population, EHH is high.
Selection Index xp-EHH
xp-EHH: Sabeti et al. (2007) [8]
Compare the integrated EHH of the same allele between populations.
Cross Population EHH (XP-EHH)
AATTACAGATTACA AACACGC 10
AATTACAGATTACA ATGATAG 8
AATTACAGATTACA AACCCAG 7
AATTACAGATTACA CTGACAG 5
AATTACAGATTACA CAGACAG 3
AATTACAGATTACA AACACAG 6
AATTACAGATTACA CACACAG 4
AATTACAGATTACA CACCCAG 7
Same allele but different population
AATTACAGATTACA CACATAG 20
AATTACAGATTACA CACACAG 30
Integrate EHH over distance from the allele.
Calculated for the forward and reverse sides independently.
Integrate until EHH = 0.04 in each population.
Integrated EHH values: 3.3 and 0.5
XP-EHH = ln(3.3/0.5) = 1.89, followed by Z-score normalization
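The log ratio and the subsequent z-score step can be sketched as below. The first pair of integrated EHH values is the one from the example; the remaining pairs are invented stand-ins for other loci, since normalization needs a genome-wide distribution. Function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def unstandardized_xpehh(ihh_pop1, ihh_pop2):
    """Natural log of the ratio of integrated EHH in two populations."""
    return math.log(ihh_pop1 / ihh_pop2)


def zscores(values):
    """Standardize values to mean 0 and SD 1 (population SD)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


print(round(unstandardized_xpehh(3.3, 0.5), 2))  # 1.89

# Hypothetical integrated EHH pairs for four loci; the first is the example above.
loci = [(3.3, 0.5), (1.0, 1.1), (0.9, 1.0), (1.2, 1.0)]
raw = [unstandardized_xpehh(a, b) for a, b in loci]
print([round(v, 2) for v in zscores(raw)])
```

Swapping the two populations only flips the sign of the score, so the direction of the comparison should be stated alongside any reported value.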
Overview
REHH and iHS are more or less complementary.
Each is better at detecting positive selection at different frequencies.
XP-EHH
Can detect positive selection in high-frequency alleles.
Susceptible to between-population variation in recombination rate.
Final Verdict: REHH vs iHS vs XP-EHH
[Figure comparing REHH, the iHS test, and XP-EHH]
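Rsb [9]
Another index for comparing populations.
The comparison is made between populations only.
iES: for each locus, the average of the integrated EHH of the two alleles.
Compares the approximate degree of selection at a locus between populations.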
Practice
FST
hierfstat [3]
PER3 gene in HGDP (Human Genome Diversity Panel): 289 SNPs & 7 populations
EHH, iHS
rehh [2]
Examples bundled with the package
References
[1] Cockerham, C. C. (1969). Variance of gene frequencies. Evolution, pages 72–84.
[2] Gautier, M. and Vitalis, R. (2012). rehh: an R package to detect footprints of selection in genome-wide SNP data from haplotype structure. Bioinformatics, 28(8):1176–1177.
[3] Goudet, J. (2005). hierfstat, a package for R to compute and test hierarchical F-statistics. Molecular Ecology Notes, 5(1):184–186.
[4] Hamilton, M. (2011). Population Genetics. John Wiley & Sons.
[5] Holsinger, K. E. and Weir, B. S. (2009). Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics, 10(9):639–650.
[6] Huerta-Sánchez, E., Jin, X., Bianba, Z., Peter, B. M., Vinckenbosch, N., Liang, Y., Yi, X., He, M., Somel, M., Ni, P., et al. (2014). Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature, 512(7513):194–197.
[7] Sabeti, P. C., Reich, D. E., Higgins, J. M., Levine, H. Z., Richter, D. J., Schaffner, S. F., Gabriel, S. B., Platko, J. V., Patterson, N. J., McDonald, G. J., et al. (2002). Detecting recent positive selection in the human genome from haplotype structure. Nature, 419(6909):832–837.
[8] Sabeti, P. C., Varilly, P., Fry, B., Lohmueller, J., Hostetter, E., Cotsapas, C., Xie, X., Byrne, E. H., McCarroll, S. A., Gaudet, R., et al. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449(7164):913–918.
[9] Tang, K., Thornton, K. R., and Stoneking, M. (2007). A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biology, 5(7):e171.
END
Email : secondmath85@gmail.com
Office: (02)880-2743
H.P: 010-9192-5385