Bayesian selection of best subsets in high-dimensional
regression
Shiqiang Jin
Department of Statistics
Kansas State University, Manhattan, KS
Joint work with
Gyuhyeong Goh
Kansas State University, Manhattan, KS
July 31, 2019
Bayesian linear regression model in high-dimensional data
Consider a linear regression model
y = Xβ + ε, (1)
where y = (y1, . . . , yn)^T is a response vector, X = (x1, . . . , xp) ∈ R^{n×p} is a
model matrix, β = (β1, . . . , βp)^T is a coefficient vector, and ε ∼ N(0, σ^2 I_n).
We assume p ≫ n, i.e. high-dimensional data.
We assume only a few predictors are associated with the response, i.e. β is sparse.
Bayesian linear regression model in high-dimensional data
To better explain the sparsity of β, we introduce a latent index set
γ ⊂ {1, . . . , p} so that Xγ represents a sub-matrix of X containing xj , j ∈ γ.
e.g. γ = {1, 3, 4} ⇒ Xγ = (x1, x3, x4).
The full model in (1) can be reduced to
y = Xγβγ + . (2)
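For concreteness, the γ-indexing is just column slicing; a minimal NumPy sketch (illustrative code of ours, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))      # full model matrix

gamma = [0, 2, 3]                    # 0-based indices for gamma = {1, 3, 4}
X_gamma = X[:, gamma]                # sub-matrix (x1, x3, x4)
print(X_gamma.shape)                 # (100, 3)
```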
Priors and marginal posterior distribution
Our research goals are to obtain:
(i) the k most important predictors, out of C(p, k) candidate models, where
C(p, k) denotes the binomial coefficient;
(ii) a single best model from among the 2^p candidate models.
We consider
β_γ | σ^2, γ ∼ Normal(0, τσ^2 I_{|γ|}),
σ^2 ∼ Inverse-Gamma(a_σ/2, b_σ/2),
π(γ) ∝ I(|γ| = k),
where |γ| is the number of elements in γ.
Priors and marginal posterior distribution
Given k, it can be shown that
π(γ|y) ∝ (τ^{-1})^{|γ|/2} / [ |X_γ^T X_γ + τ^{-1} I_{|γ|}|^{1/2} (y^T H_γ y + b_σ)^{(a_σ+n)/2} ] · I(|γ| = k)
        ≡ g(γ) I(|γ| = k),
where H_γ = I_n − X_γ(X_γ^T X_γ + τ^{-1} I_{|γ|})^{-1} X_γ^T.
Hence, g(γ) is our model selection criterion.
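For illustration, g(γ) is best evaluated on the log scale. A minimal sketch of the criterion (our own code; the hyperparameter defaults τ = 10, a_σ = b_σ = 1 are assumptions, not values from the talk):

```python
import numpy as np

def log_g(X, y, gamma, tau=10.0, a_sigma=1.0, b_sigma=1.0):
    """log g(gamma): the marginal posterior criterion on the log scale."""
    n = len(y)
    Xg = X[:, gamma]
    k = Xg.shape[1]
    A = Xg.T @ Xg + np.eye(k) / tau                     # X_g^T X_g + tau^{-1} I
    # y^T H_g y = y^T y - y^T X_g A^{-1} X_g^T y, without forming H_g explicitly
    yHy = y @ y - y @ Xg @ np.linalg.solve(A, Xg.T @ y)
    _, logdet = np.linalg.slogdet(A)
    return (-0.5 * k * np.log(tau)                      # (tau^{-1})^{|gamma|/2}
            - 0.5 * logdet                              # determinant term
            - 0.5 * (a_sigma + n) * np.log(yHy + b_sigma))
```

Because every k-dependent term is retained, log g values are comparable across subsets of different sizes, which the varying-k step later relies on.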
Best subset selection algorithm
Note that our goals are to obtain (i) the k most important predictors out of C(p, k)
candidate models; (ii) a single best model from among the 2^p candidate models.
Hence, this falls naturally into the framework of best subset selection:
(i) Fixed size: for k = 1, 2, . . . , p, select the best subset model
Mk = argmax_{|γ|=k} g(γ)
from the C(p, k) candidate models.
(ii) Varying size: pick the single best model from M1, . . . , Mp.
Challenge of best subset selection: the model space explodes combinatorially.
e.g. (i) p = 1000, k = 5: C(1000, 5) ≈ 8 × 10^12; (ii) p = 40: 2^40 ≈ 10^12.
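The counts in the example are easy to verify:

```python
from math import comb

print(comb(1000, 5))   # 8250291250200  ~ 8 x 10^12 models for k = 5
print(2 ** 40)         # 1099511627776  ~ 10^12 models for p = 40
```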
Neighborhood Search
To avoid the exhaustive computation, we resort to the idea of Neighborhood
Search proposed by Madigan et al. (1995) and Hans et al. (2007).
Let γ^(t) be the current state of γ, with model size |γ^(t)| = k.
Addition neighbor: N+(γ^(t)) = {γ^(t) ∪ {j} : j ∉ γ^(t)}; model size k + 1.
Deletion neighbor: N−(γ^(t)) = {γ^(t) \ {j′} : j′ ∈ γ^(t)}; model size k − 1.
e.g. Suppose p = 4, k = 2. Let γ^(t) = {1, 2}; then
N+(γ^(t)) = {{1, 2, 3}, {1, 2, 4}},
N−(γ^(t)) = {{1}, {2}}.
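A sketch of the two neighborhoods with 0-based indices (helper names are ours):

```python
def add_neighbors(gamma, p):
    """N+: every superset of gamma with one extra index (size k + 1)."""
    return [sorted(gamma + [j]) for j in range(p) if j not in gamma]

def del_neighbors(gamma):
    """N-: every subset of gamma with one index removed (size k - 1)."""
    return [[i for i in gamma if i != j] for j in gamma]

print(add_neighbors([0, 1], p=4))  # [[0, 1, 2], [0, 1, 3]]
print(del_neighbors([0, 1]))       # [[1], [0]]
```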
Hybrid best subset search with a fixed k
Note our Goal (i) is to find γ̂ = argmax_{|γ|=k} g(γ).
1. Initialize γ̂ s.t. |γ̂| = k.
2. Repeat # deterministic search: local optimum
   Update γ̃ ← argmax_{γ∈N+(γ̂)} g(γ); # N+(γ̂) = {γ̂ ∪ {j} : j ∉ γ̂}
   Update γ̂ ← argmax_{γ∈N−(γ̃)} g(γ); # N−(γ̃) = {γ̃ \ {j} : j ∈ γ̃}
   until convergence.
In Step 2, there are p + 1 candidate models across γ ∈ N+(γ̂) and γ ∈ N−(γ̃), so
g(γ) is computed p + 1 times in each iteration.
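Step 2 translates into a short greedy loop; a sketch reusing log_g and the neighborhood helpers above. This naive version scores each neighbor separately, which is exactly the p + 1 evaluations per iteration noted on the slide:

```python
def deterministic_search(X, y, gamma):
    """Greedy add-one/delete-one sweeps until the subset stops changing."""
    p = X.shape[1]
    while True:
        gamma_tilde = max(add_neighbors(gamma, p), key=lambda g: log_g(X, y, g))
        gamma_new = max(del_neighbors(gamma_tilde), key=lambda g: log_g(X, y, g))
        if gamma_new == gamma:          # converged: a local optimum of g
            return gamma
        gamma = gamma_new
```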
1st key feature of the proposed algorithm
g(γ) = (τ^{-1})^{|γ|/2} / [ |X_γ^T X_γ + τ^{-1} I_{|γ|}|^{1/2} (y^T H_γ y + b_σ)^{(a_σ+n)/2} ],
where H_γ = I_n − X_γ(X_γ^T X_γ + τ^{-1} I_{|γ|})^{-1} X_γ^T.
A naive neighborhood sweep computes this inverse matrix and determinant p + 1 times.
We propose the following formula and can show that evaluating all candidate models in
the addition neighborhood can be done simultaneously in a single computation:
g(N+(γ̂)) = { (y^T H_γ̂ y + b_σ) 1_p − (X^T H_γ̂ y)^2 / (τ^{-1} 1_p + diag(X^T H_γ̂ X)) }^{−(a_σ+n)/2}
          × { τ^{-1} 1_p + diag(X^T H_γ̂ X) }^{−1/2}, (3)
where all operations are taken elementwise. Similarly for g(N−(γ̂)).
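A vectorized sketch of equation (3) on the log scale (our own code under the same assumed hyperparameters; entries for j already in γ̂ are not meaningful and should be masked by the caller):

```python
import numpy as np

def log_g_add_all(X, y, gamma, tau=10.0, a_sigma=1.0, b_sigma=1.0):
    """log g(gamma ∪ {j}) for all j at once, up to a constant shared across j.

    Builds H_gamma once, then evaluates every addition neighbor with
    elementwise vector arithmetic instead of p separate inversions.
    """
    n, p = X.shape
    Xg = X[:, gamma]
    A = Xg.T @ Xg + np.eye(len(gamma)) / tau
    Hy = y - Xg @ np.linalg.solve(A, Xg.T @ y)       # H_gamma y
    HX = X - Xg @ np.linalg.solve(A, Xg.T @ X)       # H_gamma X
    d = 1.0 / tau + np.einsum('ij,ij->j', X, HX)     # tau^{-1} 1_p + diag(X^T H X)
    resid = (y @ Hy + b_sigma) - (X.T @ Hy) ** 2 / d
    return -0.5 * (a_sigma + n) * np.log(resid) - 0.5 * np.log(d)
```

Swapping this into deterministic_search in place of the per-neighbor log_g calls is the point of the slide: a single H_γ̂ computation per move instead of p + 1 separate inversions.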
Hybrid best subset search with a fixed k
Note our Goal (i) is to find γ̂ = argmax_{|γ|=k} g(γ).
1. Initialize γ̂ s.t. |γ̂| = k.
2. Repeat # deterministic search: local optimum
   Update γ̃ ← argmax g(N+(γ̂)); # N+(γ̂) = {γ̂ ∪ {j} : j ∉ γ̂}
   Update γ̂ ← argmax g(N−(γ̃)); # N−(γ̃) = {γ̃ \ {j} : j ∈ γ̃}
   until convergence.
3. Set γ^(0) = γ̂.
4. Repeat for t = 1, . . . , T: # stochastic search: global optimum
   Sample γ* ∼ π(γ|y) = g(γ) / Σ_γ g(γ) · I{γ ∈ N+(γ^(t−1))};
   Sample γ^(t) ∼ π(γ|y) = g(γ) / Σ_γ g(γ) · I{γ ∈ N−(γ*)};
   If π(γ̂|y) < π(γ^(t)|y), then update γ̂ = γ^(t), break the loop, and go to 2.
5. Return γ̂.
Note that all g(γ) are computed simultaneously over each neighborhood.
2nd key feature of the proposed algorithm
To avoid staying in a local optimum for a long time in Step 4, we use the
escort distribution.
The idea of the escort distribution (used in statistical physics and
thermodynamics) is introduced to stimulate the movement of the Markov chain.
An escort distribution of g(γ) is given by
{g(γ)}^α / Σ_γ {g(γ)}^α, α ∈ [0, 1].
Escort distributions
e.g. Consider 3 candidate models with posterior probabilities, so that for α = 1,
{g(γ)}^α / Σ_γ {g(γ)}^α = 0.02 (model 1), 0.90 (model 2), 0.08 (model 3).
Figure: Escort distributions π_α(γ|y). Lowering α flattens the probabilities:
α = 1: (0.02, 0.90, 0.08); α = 0.5: (0.10, 0.69, 0.21); α = 0.05: (0.30, 0.37, 0.33).
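The flattening effect is easy to reproduce; a short sketch that recovers the three panels above:

```python
import numpy as np

def escort(w, alpha):
    """Escort distribution: w_i^alpha / sum_j w_j^alpha."""
    w = np.asarray(w, dtype=float) ** alpha
    return w / w.sum()

probs = [0.02, 0.90, 0.08]
for alpha in (1.0, 0.5, 0.05):
    print(alpha, np.round(escort(probs, alpha), 2))
# 1.0  [0.02 0.9  0.08]
# 0.5  [0.1  0.69 0.21]
# 0.05 [0.3  0.37 0.33]
```

In practice one works with log g and exponentiates α(log g − max log g) to avoid overflow, which is how the sampler sketched after the next slide is written.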
Hybrid best subset search with a fixed k
Note our Goal (i) is to find γ̂ = argmax_{|γ|=k} g(γ).
1. Initialize γ̂ s.t. |γ̂| = k.
2. Repeat # deterministic search: local optimum
   Update γ̃ ← argmax g(N+(γ̂)); # N+(γ̂) = {γ̂ ∪ {j} : j ∉ γ̂}
   Update γ̂ ← argmax g(N−(γ̃)); # N−(γ̃) = {γ̃ \ {j} : j ∈ γ̃}
   until convergence.
3. Set γ^(0) = γ̂.
4. Repeat for t = 1, . . . , T: # stochastic search: global optimum
   Sample γ* ∼ π_α(γ|y) = {g(γ)}^α / Σ_γ {g(γ)}^α · I{γ ∈ N+(γ^(t−1))}; # α ∈ [0, 1]
   Sample γ^(t) ∼ π_α(γ|y) = {g(γ)}^α / Σ_γ {g(γ)}^α · I{γ ∈ N−(γ*)};
   If π(γ̂|y) < π(γ^(t)|y), then update γ̂ = γ^(t), break the loop, and go to 2.
5. Return γ̂.
Note that all g(γ) are computed simultaneously over each neighborhood.
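Putting the pieces together, a sketch of the full fixed-k routine, reusing log_g, the neighborhood helpers, and deterministic_search from above (our own illustrative code; the defaults α = 0.1, T = 50, and the random initialization are assumptions):

```python
import numpy as np

def hybrid_search(X, y, k, alpha=0.1, T=50, seed=0):
    """Hybrid best subset search for a fixed k: a deterministic sweep to a
    local optimum, then escort-tempered stochastic moves; restart the sweep
    whenever the chain finds a better subset."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma_hat = sorted(rng.choice(p, size=k, replace=False).tolist())

    def sample(neighborhood):
        scores = np.array([log_g(X, y, g) for g in neighborhood])
        w = np.exp(alpha * (scores - scores.max()))   # escort weights {g}^alpha
        return neighborhood[rng.choice(len(neighborhood), p=w / w.sum())]

    while True:
        gamma_hat = deterministic_search(X, y, gamma_hat)       # step 2
        best, improved, gamma_t = log_g(X, y, gamma_hat), False, gamma_hat
        for _ in range(T):                                      # step 4
            gamma_star = sample(add_neighbors(gamma_t, p))      # N+ move
            gamma_t = sample(del_neighbors(gamma_star))         # N- move
            if log_g(X, y, gamma_t) > best:                     # better model found
                gamma_hat, improved = gamma_t, True
                break
        if not improved:
            return gamma_hat
```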
Best subset selection with varying k
Note that Goal (ii) is to find a single best model from among the 2^p candidate models.
We extend the "fixed" k to a varying k by assigning a prior on k.
Note that the uniform prior, k ∼ Uniform{1, . . . , K}, tends to assign larger
probability to larger subsets (see Chen and Chen (2008)).
We define
π(k) ∝ 1/C(p, k) · I(k ≤ K).
Hybrid best subset search with varying k
Bayesian best subset selection can be done by maximizing
π(γ, k|y) ∝ g(γ)/C(p, k) (4)
over (γ, k).
Our algorithm proceeds as follows:
1. Repeat for k = 1, . . . , K:
   Given k, implement the hybrid search algorithm to obtain the best subset model γ̂_k.
2. Find the best model γ̂* by
γ̂* = argmax_{k∈{1,...,K}} g(γ̂_k)/C(p, k). (5)
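The varying-k wrapper is then a short loop over k; a sketch reusing hybrid_search and log_g (K = 10 is an assumed default):

```python
from math import comb, log

def best_subset(X, y, K=10):
    """Run the fixed-k hybrid search for k = 1, ..., K and pick the model
    maximizing log g(gamma_k) - log C(p, k), i.e. the posterior in (4)."""
    p = X.shape[1]
    fits = [hybrid_search(X, y, k) for k in range(1, K + 1)]
    return max(fits, key=lambda g: log_g(X, y, g) - log(comb(p, len(g))))
```

Since the K runs are independent, this loop is also the natural place for the parallel computing mentioned in the concluding remarks.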
Consistency of model selection criterion
Theorem
Let γ* denote the true model. Define Γ = {γ : |γ| ≤ K, γ ≠ γ*}. Assume that
p = O(n^ξ) for ξ ≥ 1. Under the asymptotic identifiability condition of Chen and
Chen (2008), if τ → ∞ as n → ∞ but τ = o(n), then the proposed Bayesian
subset selection possesses Bayesian model selection consistency, that is,
π(γ*|y) > max_{γ∈Γ} π(γ|y) (6)
in probability as n → ∞.
That is, as n → ∞, the maximizer of π(γ|y) under our model selection criterion
is the true model.
Simulation study
Setup
For given n = 100, we generate the data from
y_i ∼ Normal( Σ_{j=1}^p β_j x_ij , 1 ) independently,
where
- (x_i1, . . . , x_ip)^T are iid Normal(0_p, Σ) with Σ = (Σ_ij)_{p×p} and Σ_ij = ρ^{|i−j|};
- β_j are iid Uniform{−1, −2, 1, 2} if j ∈ γ, and β_j = 0 if j ∉ γ;
- γ is an index set of size 4 randomly selected from {1, 2, . . . , p};
- we consider four scenarios for p and ρ:
  (i) p = 200, ρ = 0.1; (ii) p = 200, ρ = 0.9;
  (iii) p = 1000, ρ = 0.1; (iv) p = 1000, ρ = 0.9.
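A sketch of one simulated replicate under this design (function name and seed are ours):

```python
import numpy as np

def simulate(n=100, p=200, rho=0.1, seed=0):
    """One replicate: AR(1)-correlated predictors, 4 nonzero coefficients."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = rho ** np.abs(np.subtract.outer(idx, idx))   # Sigma_ij = rho^|i-j|
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    gamma = np.sort(rng.choice(p, size=4, replace=False))
    beta = np.zeros(p)
    beta[gamma] = rng.choice([-2.0, -1.0, 1.0, 2.0], size=4)
    y = X @ beta + rng.standard_normal(n)
    return X, y, gamma
```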
Simulation study
Results (high-dimensional scenarios)
Table: 2,000 replications; FDR (false discovery rate), TRUE% (percentage of the true
model detected), SIZE (selected model size), HAM (Hamming distance).
Scenario Method FDR (s.e.) TRUE% (s.e.) SIZE (s.e.) HAM (s.e.)
p = 200 Proposed 0.006 (0.001) 96.900 (0.388) 4.032 (0.004) 0.032 (0.004)
ρ = 0.1 SCAD 0.034 (0.002) 85.200 (0.794) 4.188 (0.011) 0.188 (0.011)
MCP 0.035 (0.002) 84.750 (0.804) 4.191 (0.011) 0.191 (0.011)
ENET 0.016 (0.001) 92.700 (0.582) 4.087 (0.007) 0.087 (0.007)
LASSO 0.020 (0.002) 91.350 (0.629) 4.109 (0.009) 0.109 (0.009)
p = 200 Proposed 0.023 (0.002) 88.750 (0.707) 3.985 (0.006) 0.203 (0.014)
ρ = 0.9 SCAD 0.059 (0.003) 74.150 (0.979) 4.107 (0.015) 0.480 (0.022)
MCP 0.137 (0.004) 55.400 (1.112) 4.264 (0.020) 1.098 (0.034)
ENET 0.501 (0.004) 0.300 (0.122) 7.716 (0.072) 5.018 (0.052)
LASSO 0.276 (0.004) 15.550 (0.811) 5.308 (0.033) 2.038 (0.034)
Simulation study
Results (ultra high-dimensional scenarios)
Table: 2,000 replications; FDR (false discovery rate), TRUE% (percentage of the true
model detected), SIZE (selected model size), HAM (Hamming distance).
Scenario Method FDR (s.e.) TRUE% (s.e.) SIZE (s.e.) HAM (s.e.)
p = 1000 Proposed 0.004 (0.001) 98.100 (0.305) 4.020 (0.003) 0.020 (0.003)
ρ = 0.1 SCAD 0.027 (0.002) 87.900 (0.729) 4.145 (0.010) 0.145 (0.010)
MCP 0.031 (0.002) 86.550 (0.763) 4.172 (0.013) 0.172 (0.013)
ENET 0.035 (0.002) 84.850 (0.802) 4.181 (0.013) 0.206 (0.012)
LASSO 0.014 (0.001) 93.850 (0.537) 4.073 (0.007) 0.073 (0.007)
p = 1000 Proposed 0.023(0.002) 89.850 (0.675) 4.005 (0.005) 0.190 (0.013)
ρ = 0.9 SCAD 0.068 (0.003) 74.250 (0.978) 4.196 (0.014) 0.493 (0.023)
MCP 0.152 (0.004) 53.750 (1.115) 4.226 (0.017) 1.202 (0.035)
ENET 0.417 (0.005) 0.150 (0.087) 6.228 (0.068) 4.089 (0.043)
LASSO 0.265 (0.004) 19.500 (0.886) 5.139 (0.029) 1.909 (0.035)
Real data application
Data description
We apply the proposed method to Breast Invasive Carcinoma (BRCA) data
generated by The Cancer Genome Atlas (TCGA) Research Network
http://cancergenome.nih.gov.
The data set contains 17,814 gene expression measurements (recorded on
the log scale) of 526 patients with primary solid tumor.
BRCA1 is a tumor suppressor gene and its mutations predispose women to
breast cancer (Findlay et al., 2018).
Real data application
Results (based on 4,000 genes)
Our goal here is to identify the best fitting model for estimating an
association between BRCA1 (response variable) and the other genes
(independent variables).
BRCA1 = β1 · NBR2 + β2 · DTL + . . . + βp · VPS25 + ε.
Results:
Table: Model comparison
# of selected PMSE BIC EBIC
Proposed 8 0.60 984.45 1099.50
SCAD 4 0.68 1104.69 1166.47
MCP 4 0.68 1104.69 1166.47
ENET 5 0.68 1110.65 1186.25
LASSO 4 0.68 1104.69 1166.47
Real data application
Results (cont.)
Figure: Heat map of estimated coefficients (scale −0.1 to 0.3) for the genes selected
by each method (LASSO, ENET, MCP, SCAD, Proposed). Genes: RPL27, C10orf76, DTL,
NBR2, C17orf53, TUBG1, CRBN, YTHDC2, VPS25, CMTM5, TUBA1B. Except C10orf76, 7 genes
are documented as disease-related genes.
Concluding remarks
Parallel computing is applicable to our algorithm with varying k.
The proposed method can be extended to multivariate linear regression
models, binary regression models, and multivariate mixed-response models
(work in progress).
REFERENCES
Chen, J. and Z. Chen (2008). Extended Bayesian information criteria for model selection with
large model spaces. Biometrika 95(3), 759–771.
Findlay, G. M., R. M. Daza, B. Martin, M. D. Zhang, A. P. Leith, M. Gasperini, J. D. Janizek,
X. Huang, L. M. Starita, and J. Shendure (2018). Accurate classification of BRCA1 variants
with saturation genome editing. Nature 562(7726), 217.
Hans, C., A. Dobra, and M. West (2007). Shotgun stochastic search for "large p" regression.
Journal of the American Statistical Association 102(478), 507–516.
Madigan, D., J. York, and D. Allard (1995). Bayesian graphical models for discrete data.
International Statistical Review 63(2), 215–232.
Contact: jinsq@ksu.edu
THANK YOU