Discussion of “Representative points for Small and Big Data Problems”
(with thanks to Won Chang, Ben Lee, and Jaewoo Park)
Quasi Monte Carlo Transition Workshop
SAMSI, May 2018
Murali Haran
Department of Statistics, Penn State University
A few computational challenges
Maximize (minimize) expensive or intractable likelihood
(objective) function for data X and parameter θ,
θ̂MLE = arg max_θ L(θ; X), or β̂ = arg min_β f(β; X)
Bayesian inference, with prior p(θ) on θ:
π(θ|X) ∝ L(θ; X)p(θ).
Approximating normalizing constants
Notation: number of data points is n (as X = (X1, . . . , Xn)),
dimension of θ is d, dimension of each X is p
Big data and small data problems
These challenges (previous slide) can arise in different settings
Big data setting: n is large, making L(θ; X) expensive to
evaluate due to matrix computations.
High-dimensional regression (e.g. song release prediction,
Mak and Joseph, 2017)
Models for high-dimensional spatial data
High-dimensional output of a computer model
Small data setting: each "data point" is expensive to obtain
Statistical model = deterministic model + error model
deterministic model = climate model, engineering model
Very slow to run at each input (θ)
Studying the deterministic model as we vary its input is akin to
working with an expensive likelihood or objective function
A general strategy
Work with a surrogate: replace L(θ; X) with an approximation.
Evaluate L(θ; X) on a relatively small set of θ values. Fit a
Gaussian process (GP) approximation to these samples to
obtain LGP(θ; X), treated as a surrogate.
Literature starting with Sacks et al. (1989) and GP-based
emulation-calibration (Kennedy and O’Hagan, 2001)
Can do
optimization with LGP(θ; X)
Bayesian inference based on π(θ|X) ∝ LGP(θ; X)p(θ)
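As a concrete illustration of this strategy, here is a minimal Python sketch (my own toy example, not code from the talk): loglik is a hypothetical stand-in for an expensive log-likelihood, and the design size and kernel are arbitrary illustrative choices.

```python
# Minimal sketch of the surrogate strategy, assuming a toy stand-in
# log-likelihood; design size and kernel are illustrative choices.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def loglik(theta):
    """Stand-in for an expensive log-likelihood evaluation at theta."""
    return -0.5 * np.sum((theta - 1.0) ** 2)

# 1. Evaluate the expensive function on a small design (random here; a
#    space-filling or quasi-Monte Carlo design would be better in practice).
rng = np.random.default_rng(0)
design = rng.uniform(-3, 3, size=(30, 2))             # 30 design points, d = 2
values = np.array([loglik(t) for t in design])

# 2. Fit a GP to the (design, value) pairs to get the surrogate L_GP.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(),
                              normalize_y=True)
gp.fit(design, values)

# 3. Use the cheap surrogate in place of L, e.g. maximize it over a grid
#    to get an approximate MLE, or plug it into an MCMC acceptance ratio.
grid = rng.uniform(-3, 3, size=(10_000, 2))
theta_hat = grid[np.argmax(gp.predict(grid))]
print("approximate MLE:", theta_hat)   # close to the true maximizer (1, 1)
```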
Challenges posed by GP approximations
Gaussian processes use dependence to pick up non-linear
relationships between input and output: remarkably flexible
“non-parametric” framework and hence very widely
applicable
(1) However, if the input dimension (dimension of θ) is large:
expensive or impossible to fill the input space with a slow
model, resulting in poor prediction
(2) If possible to obtain lots of runs (model is not too
expensive), can fill up space but...
Expensive to fit a GP to a large number n of data points
(model runs): order n³ cost of evaluating L(θ; X) for each θ
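To make the n³ concrete: a single evaluation of a GP (Gaussian) log-likelihood requires factorizing the n × n covariance matrix. A minimal sketch, with a toy exponential covariance (all choices here are illustrative):

```python
# One GP log-likelihood evaluation; the Cholesky factorization of the
# n x n covariance matrix dominates at O(n^3) flops per evaluation.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gp_loglik(y, dists, range_param, var):
    """Zero-mean Gaussian log-likelihood with exponential covariance."""
    n = len(y)
    K = var * np.exp(-dists / range_param)    # n x n covariance matrix
    chol, low = cho_factor(K, lower=True)     # O(n^3): the bottleneck
    alpha = cho_solve((chol, low), y)         # O(n^2) given the factor
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (y @ alpha + logdet + n * np.log(2 * np.pi))

# Toy data: n locations on a line; doubling n multiplies the cost by ~8.
n = 500
locs = np.linspace(0, 1, n)
dists = np.abs(locs[:, None] - locs[None, :])
y = np.random.default_rng(1).standard_normal(n)
print(gp_loglik(y, dists, range_param=0.2, var=1.0))
```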
Working Group IV’s solutions
Solutions:
(1) Kang and Huang (2018): Reduce dimension of input (θ) to
θ∗ using convex combination of kernels, LGP(θ∗; X)
(2) Mak and Joseph (2018): Reduce the number of data
points in L(θ; X) via clever design of “support points”.
Reduction from X to X∗ to obtain surrogate LGP(θ; X∗),
which is easier to evaluate
Active data reduction (Mak and Joseph, 2018): reduce the
number of data points from n to a much smaller number n′ ≪ n
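As I understand it, support points minimize an energy-distance criterion between the reduced set and the full data, optimizing over continuous point locations. The sketch below is only a naive greedy subset selection under the same criterion, to convey the idea; it is not the Mak and Joseph algorithm and scales far worse.

```python
# Naive greedy data reduction by the energy-distance criterion: repeatedly
# add the data point that makes the chosen subset's empirical distribution
# closest (in energy distance, up to a constant) to that of the full data.
import numpy as np
from scipy.spatial.distance import cdist

def greedy_energy_subset(X, m):
    n = len(X)
    DX = cdist(X, X)                 # all pairwise distances, n x n
    a = DX.mean(axis=1)              # a[i] = mean distance from X[i] to data
    chosen = []
    for _ in range(m):
        best, best_val = None, np.inf
        for i in range(n):
            if i in chosen:
                continue
            cand = chosen + [i]
            k = len(cand)
            cross = 2.0 * a[cand].sum() / k               # data-to-subset term
            within = DX[np.ix_(cand, cand)].sum() / k**2  # within-subset term
            if cross - within < best_val:                 # minimize criterion
                best, best_val = i, cross - within
        chosen.append(best)
    return X[chosen]

rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 2))        # "big" data
X_star = greedy_energy_subset(X, m=25)    # 25 representative points
print(X_star.mean(axis=0))                # roughly matches the data mean
```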
Statistics literature on these problems
There is a (large) body of work on dimension reduction
Input space: Dimension reduction in regression by D.R.
Cook, Bing Li, F. Chiaromonte, others
Finding central mean subspace (Cook and Li, 2002, 2004)
Lots of theoretical work, and lots of applications
Also, literature separation between environmental/spatial
and engineering folks?
composite likelihood (Vecchia, 1988) (no reduction)
reduced-rank approaches...
Open questions - I
Reduced-rank approaches (active area in statistics):
kernel convolutions (Higdon, 1998)
predictive process (Banerjee et al., 2008)
random projections (Banerjee et al, 2012; Guan, Haran,
2018)
multi-resolution approaches (Katzfuss, 2017)
Data compression literature?
How do the existing approaches compare to the proposed
approaches from this group?
Useful thought experiment, even without simulation study
computational costs? detailed complexity calculations?
approximation error?
ease of implementation? (should not be underestimated!)
theoretical guarantees?
A different kind of dimension-reduction problem
(Aside)
In many problems the output of the model is very
high-dimensional: X is p-dimensional in L(θ; X), with p large
Example: climate model output (SAMSI transition
workshop next week)
An approach: Principal components for fast Gaussian
process emulation-calibration (e.g. Chang, Haran et al.,
2014, 2016a, b; Higdon et al., 2008):
Treat multiple model runs as replicates and find principal
components to obtain low-dimensional representation
Use GP to emulate just the principal components
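A minimal sketch of this PCA-plus-GP emulation recipe; the toy model runs, dimensions, and default kernels below are illustrative stand-ins, not details from the cited papers.

```python
# Emulate high-dimensional model output via principal components:
# reduce p = 500 outputs to a few PC scores, fit one GP per score.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(3)
thetas = rng.uniform(0, 1, size=(40, 3))              # 40 runs, d = 3 inputs
Y = np.sin(thetas @ rng.standard_normal((3, 500)))    # toy output, p = 500

# 1. Treat the runs as replicates; keep a few principal components.
pca = PCA(n_components=5)
scores = pca.fit_transform(Y)          # 40 x 5 in place of 40 x 500

# 2. Emulate each low-dimensional PC score with its own GP over theta.
gps = [GaussianProcessRegressor(normalize_y=True).fit(thetas, scores[:, j])
       for j in range(5)]

# 3. Predict the full p-dimensional output at a new input by mapping the
#    emulated scores back through the PCA basis.
theta_new = np.array([[0.2, 0.5, 0.8]])
score_pred = np.column_stack([gp.predict(theta_new) for gp in gps])
y_pred = pca.inverse_transform(score_pred)            # shape (1, 500)
print(y_pred.shape)
```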
Open questions - II
Is it possible to handle higher dimensions than the
examples shown in Kang and Huang? E.g., in climate
science we are interested in θ of dimension 10-20 or even larger
Are there connections between the dual optimization
approach (Lulu Kang’s talk) and other surrogate methods?
Does active data reduction preserve dependence structure
and other complexities in the data?
E.g. consider data compression work by Guinness and
Hammerling (2018), specifically targeted at spatial data
Active data reduction: How is the GP fit quickly with new
samples at each iteration? (important!)
Any way to batch this instead of 1 point at a time?
Adaptive estimation of normalizing constants
Idea: fit linear combination of normal basis functions using
MCMC samples + unnormalized posterior evaluations
Closed-form normalizing constant from approximation
How does methodology work if (i) unnormalized posterior
is expensive, (ii) sampling is expensive?
Approximating covariance Σ: Fast? What is being
assumed about Σ? Need some restrictions, but cannot be
restrictive or it will not work well for complicated
dependence in posterior
Why refer to “rejected” samples from MCMC separately?
Treat as Monte Carlo procedure regardless of whether
MCMC was used (all “accepted”!)
Work would benefit from challenging Bayes example!
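To make these questions concrete, here is my rough reading of the idea as a sketch: normal bumps centered at a subset of the MCMC samples with a common covariance Σ, weights fit by least squares to the unnormalized posterior evaluations; since each bump integrates to one, the fitted weights sum to an estimate of the normalizing constant. The choice of centers, Σ, and fitting details below are my assumptions, not necessarily the speaker's method.

```python
# Sketch: approximate an unnormalized posterior by a linear combination of
# normal densities, then read off the normalizing constant in closed form.
import numpy as np
from scipy.stats import multivariate_normal

def log_unnorm_post(theta):
    """Toy unnormalized log posterior: 2 * N(0, I_2), so the constant is 2."""
    return np.log(2.0) - 0.5 * theta @ theta - np.log(2 * np.pi)

rng = np.random.default_rng(4)
samples = rng.standard_normal((200, 2))     # stand-in for MCMC samples
g = np.exp([log_unnorm_post(t) for t in samples])

# Basis: normal densities at a subset of samples, common covariance Sigma
# (both are assumptions made for this sketch).
centers = samples[::10]                     # 20 basis functions
Sigma = np.cov(samples.T)                   # crude common covariance
Phi = np.column_stack([multivariate_normal(c, Sigma).pdf(samples)
                       for c in centers])

# Least-squares fit of the linear combination to the evaluations of g.
w, *_ = np.linalg.lstsq(Phi, g, rcond=None)

# Each basis density integrates to 1, so the constant estimate is w.sum().
print("estimated normalizing constant:", w.sum())   # rough; truth is 2
```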
A sense of scale (what is “big”?)
Different ice sheet simulation models I work with
< 1 to 20 seconds per run (“run” = one input (θ))
2 to 10 minutes per run
48 hours per run
# evaluations (n) possible: hundreds to millions
# of parameters (d) of interest varies between 4 and 16
# dimensions of output (p) varies from 4 to ≈ 100,000
Different computational methods for different settings
MCMC algorithms (fast model, many parameters)
Gaussian process emulation (slow model, few parameters)
Reduced-dimensional GP (slow model, few parameters,
high-dimensional output), e.g. Chang, Haran et al. (2014)
Particle-based methods (moderately fast, many
parameters): ongoing work with Ben Lee et al. (talk at
SAMSI Transition Workshop next week)
Another problem that pushes the envelope
Consider a problem where evaluating L(θ; X) is expensive
and θ is not low-dimensional
Question: How well would the working group’s methods
adapt to this scenario?
Example: Bayesian inference for doubly intractable
distributions
Models with intractable normalizing functions
Data: x ∈ χ, parameter: θ ∈ Θ
Probability model: h(x|θ)/Z(θ)
where Z(θ) = ∫χ h(x|θ) dx is intractable
Popular examples
Social network models: exponential random graph models
(Robins et al., 2002; Hunter et al., 2008)
Models for lattice data (Besag, 1972, 1974)
Spatial point process models: interaction models
Strauss (1975), Geyer (1999), Geyer and Møller (1994),
Goldstein, Haran, Chiaromonte et al. (2015)
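A toy illustration (mine, not from the talk) of why Z(θ) is intractable in this model class: for an Ising model, h(x|θ) is trivial to evaluate, but Z(θ) sums h over every configuration, and brute force is hopeless beyond tiny lattices.

```python
# Ising model on an m x m lattice: h is cheap, Z(theta) needs 2^(m*m) terms.
import numpy as np
from itertools import product

def h(x, theta):
    """Unnormalized probability: exp(theta * sum of neighbor agreements)."""
    s = np.sum(x[:-1, :] * x[1:, :]) + np.sum(x[:, :-1] * x[:, 1:])
    return np.exp(theta * s)

def Z_brute_force(theta, m):
    """Exact Z(theta) by enumerating all 2^(m*m) +/-1 configurations."""
    return sum(h(np.array(bits).reshape(m, m), theta)
               for bits in product([-1, 1], repeat=m * m))

print(Z_brute_force(0.3, m=3))   # 2^9 = 512 terms: fine
# Even m = 20 would need 2^400 terms, hence the methods discussed here.
```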
Bayesian inference
Prior: p(θ)
Posterior: π(θ|x) ∝ p(θ)h(x|θ)/Z(θ)
Acceptance ratio for Metropolis-Hastings algorithm
π(θ′|x) q(θn|θ′) / [π(θn|x) q(θ′|θn)]
= [p(θ′) Z(θn) h(x|θ′) q(θn|θ′)] / [p(θn) Z(θ′) h(x|θn) q(θ′|θn)]
Cannot evaluate because of Z(·)
A function emulation approach
Existing algorithms are all computationally very expensive
(Park and Haran, 2018a)
Each iteration of the algorithm involves an “inner sampler”, a
sampling algorithm for a high-dimensional auxiliary variable.
Inner sampler is expensive (again, expensive L(θ; X))
Our function emulation approach (Park and Haran, 2018b)
1. Approximate Z(θ) using importance sampling at k
design points, ZIMP(θ1), . . . , ZIMP(θk)
2. Use Gaussian process emulation approach on k points to
interpolate this function at other values of θ, ZGP(θ)
3. Run MCMC algorithm using ZGP(θ) at each iteration
We have theoretical justification as # design points (k) and
# importance sampling draws increase
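A minimal end-to-end sketch of the three steps on a toy model where the truth is known, h(x|θ) = exp(−θx²) with Z(θ) = √(π/θ): the design, importance proposal, kernel, and prior below are illustrative choices, a sketch of the recipe rather than the implementation in Park and Haran (2018b).

```python
# Function emulation for an intractable Z(theta), in three steps.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(5)

# Step 1: importance-sampling estimates Z_IMP at k design points.
k, M = 15, 5000
design = np.linspace(0.2, 3.0, k)            # design points for theta
x_is = 2.0 * rng.standard_normal(M)          # proposal q = N(0, 4)
log_q = norm.logpdf(x_is, scale=2.0)
Z_imp = np.array([np.mean(np.exp(-t * x_is**2 - log_q)) for t in design])

# Step 2: GP interpolation of log Z(theta) from the k estimates.
gp = GaussianProcessRegressor(normalize_y=True)
gp.fit(design.reshape(-1, 1), np.log(Z_imp))
log_Z = lambda t: gp.predict(np.array([[t]]))[0]

# Step 3: Metropolis-Hastings for theta, using Z_GP in the acceptance ratio.
data = rng.normal(0, np.sqrt(1 / (2 * 1.5)), size=100)  # truth: theta = 1.5
sx2 = np.sum(data**2)

def log_post(t):   # flat prior on (0.2, 3.0)
    return -np.inf if not 0.2 < t < 3.0 else -t * sx2 - len(data) * log_Z(t)

theta, chain = 1.0, []
for _ in range(2000):
    prop = theta + 0.2 * rng.standard_normal()
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    chain.append(theta)
print("posterior mean of theta:", np.mean(chain[500:]))  # near 1.5
```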
Results for an example
Emul1, Emul10 are two versions of our algorithm
Double M-H is the fastest of the existing algorithms
Simulated social network (ERGM): 1400 nodes
Method        θ2 Mean   95% HPD        Time (hours)
Double M-H    1.77      (1.44, 2.12)   23.83
Emul1         1.79      (1.45, 2.13)    0.45
Emul10        1.96      (1.87, 2.05)    1.39
True θ2 = 2: Emul10 is accurate, others are not
Computational efficiency allows us to use a longer chain
(Emul10); the corresponding DMH run would take ≈ 10 days
Positives and limitations
Our approach can provide accurate approximations for
problems for which other methods are infeasible
Works well only for θ of dimension under 5. This still covers
a huge number of interesting problems, but it would be nice
to go beyond
In higher dimensions we are unable to fill the space well
enough to approximate the normalizing function well
We require a good set of design points at the beginning.
Hence, have to run another (expensive) algorithm before
running this one. This is a major bottleneck
Interesting opportunities for (i) input-space dimension
reduction, (ii) clever design strategies
Discussion (of discussion)
Congratulations to the speakers: they are tackling
numerous very interesting and useful problems, broadly
related to handling expensive likelihood/objective functions
They offer creative solutions to challenging problems:
Clever design (support points)
New methods for dimension reduction of data
Lots of existing work in dimension reduction and in the
Gaussian process emulation-calibration literature
might be worth investigating
Open problem when parameters are not low-dimensional
and the objective function is expensive to evaluate
Selected references
Higdon (1998) A process-convolution approach to modelling
temperatures, Env Ecol Stats
Park and Haran (2018a) “Bayesian Inference in the Presence of
Intractable Normalizing Functions” (on arXiv), to appear in J of
American Stat Assoc
Guan, Y. and Haran, M. (2018) “A Computationally Efficient
Projection-Based Approach for Spatial Generalized Linear Mixed
Models,” to appear in J of Comp and Graph Stats
Chang, W., Haran, M., Applegate, P., and Pollard, D. (2016)
“Calibrating an ice sheet model using high-dimensional binary
spatial data,” J of American Stat Assoc, 111 (513), 57-72.
Cook, R.D. and Li, B. (2002) “Dimension reduction for
conditional mean in regression,” Annals of Stats