Program on Mathematical and Statistical Methods for Climate and the Earth System Opening Workshop, Developing Stochastic Parameterizations of Subgrid Variability of Clouds and Turbulence using High-Resolution Simulations - Chris Bretherton, Aug 24, 2017

A vision for applying machine learning to
parameterization of unresolved subgrid variability in
climate models using higher-resolution simulations
Christopher S. Bretherton
Departments of Atmospheric Science and Applied Mathematics,
University of Washington

What your atmospheric scientists were doing Monday

Cloud processes are multiscale

A single grid column of a global climate model (100 x 100 km)

Representing such unresolved variability is challenging
Park et al. 2014; Park and Bretherton 2009
…so ‘parameterizations’ of moist physics are difficult to
develop, subjective, and imperfect.

Feeds into weather/climate model biases & uncertainties
CMIP5 (models) vs. GPCP (observations)
Example: Double-ITCZ rainfall bias in climate models
Subgrid parameterization improvements have come more
through ‘targeted tweaking’ than from fundamental advances.

Cloud processes are better simulated on fine grids
Giga-LES (Khairoutdinov et al. 2009)
100x100 km with 100 m grid

1 km global simulations are (briefly) possible
NICAM
3.5-14 km (months to years): Tomita et al. 2005
0.9 km (for 24 hr): Miyamoto et al. 2013

Multiyear simulations with ‘ultraparameterization’
Ultrafine-grid 2D cloud-resolving model in each GCM
grid column (Δx = 250 m, Δz = 20 m for z=0.5-2 km)
Parishani et al. 2017

Can hi-res simulations help subgrid parameterizations?
• High-resolution models have progressed faster than moist
physics parameterizations in GCMs
• They are conceptually simpler than GCMs, because cloud
properties and air velocity don’t vary much within a grid cell, so
a complicated model for their subgrid covariability is not
needed.
• They must still parameterize smaller-scale processes (e.g.
cloud droplets a few microns wide, complex ice crystals,
turbulence, aerosols). Like other models, they are works in
progress needing constant testing vs. observations.
• Still, high-resolution simulations provide realistic reference
datasets for parameterizing subgrid cloud process variability

The human bottleneck
• Since 1992, the GCSS program has used this approach
• Improved parameterizations of cumulus convection,
turbulence, and cloud microphysics have been
implemented in leading weather and climate models
• But progress has been slow, and the uptake of new
insights from hi-res modelling and new observations is
difficult because humans concoct parameterizations.

So, how about machine learning (ML)?
• Increasingly large and comprehensive training datasets
from high-resolution simulations, if we trust them.
• A coarse-graining problem: given variables computed by
the coarse-grid model (e.g. temperature, moisture and
wind profiles), use the fine-grid model to return the
quantities the coarse-grid model needs (e.g. rainfall, vertical
profiles of fractional cloud cover, turbulence, atmospheric
heating and drying).
• Ideally, the needed quantities should be stochastic -
sampled from the hi-res derived pdf of internal fine-grid
variability consistent with the coarse-grid variables.
• Could machine learning techniques help?
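The coarse-graining step can be sketched with NumPy. This is a minimal illustration on a synthetic field, not the workflow of any particular model: `coarse_grain` block-averages a fine-grid field onto the coarse grid, and `sample_subgrid` draws one fine-grid value per coarse cell as a crude stand-in for sampling from the subgrid pdf. Both function names, the gamma-distributed test field, and the factor of 4 are assumptions made for the example.

```python
import numpy as np

def coarse_grain(fine, factor):
    """Block-average a 2D fine-grid field onto a grid `factor` times coarser."""
    ny, nx = fine.shape
    if ny % factor or nx % factor:
        raise ValueError("fine-grid dimensions must be multiples of factor")
    blocks = fine.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))

def sample_subgrid(fine, factor, rng):
    """Draw one fine-grid value at random from inside each coarse cell."""
    ny, nx = fine.shape
    blocks = fine.reshape(ny // factor, factor, nx // factor, factor)
    # Put the two within-cell axes last and flatten them into one axis
    flat = blocks.transpose(0, 2, 1, 3).reshape(ny // factor, nx // factor, -1)
    idx = rng.integers(0, factor * factor, size=flat.shape[:2])
    return np.take_along_axis(flat, idx[..., None], axis=2)[..., 0]

rng = np.random.default_rng(0)
fine = rng.gamma(shape=0.5, scale=2.0, size=(8, 8))  # synthetic, rain-like field
coarse = coarse_grain(fine, 4)        # deterministic coarse-cell means
draw = sample_subgrid(fine, 4, rng)   # one stochastic realization per coarse cell
```

The deterministic mean and the random draw differ in general, which is the distinction between a deterministic and a stochastic return value for the coarse-grid model.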

I wish I knew…
Some machine learning challenges
• Supervised (structured) or unsupervised learning?
• How to use training data to make a scheme stochastic?
• Inputs may be highly multidimensional (e.g. whole
multilevel profiles of temperature and humidity)
• The training dataset only covers real atmospheric states,
but a GCM will stray outside such states.
• What to do in the likely event that a ‘black box’ ML-based
scheme is numerically unstable in a GCM?
• Can we make the scheme ‘modular’ so as to separate
subgrid variability issues from changes in microphysics?
• How can we tune a ML-based scheme to minimize
errors of the GCM vs. observations?
• What if we change the GCM grid resolution?

What we are not trying to do with machine learning
• Pure dimension reduction, i.e. using ML to find a
computationally simpler approximation to a deterministic
but compute-intensive parameterization, e.g.
atmospheric radiative transfer.
• Learn governing equations. Our high resolution model
uses equations for cloud microphysics and fluid motions
that we assume are correct for this purpose. Ideally, we
want to learn the subgrid variability needed to apply
these equations on the coarse grid scale by integrating
over distributions (e.g. joint pdfs of vertical air velocity,
cloud water and rain water) that might not depend on the
microphysics parameterizations.
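Why the subgrid distribution matters can be seen in a small numerical sketch. When a process rate is a nonlinear function of a fine-grid variable, its grid-cell mean is the integral of that function over the subgrid pdf, which generally differs from the function evaluated at the grid-cell mean. The power-law `process_rate` and the lognormal cloud-water samples below are hypothetical choices for illustration only; they are not taken from any specific microphysics scheme.

```python
import numpy as np

def process_rate(qc, exponent=2.0):
    """Hypothetical nonlinear process rate (power law, illustration only)."""
    return qc ** exponent

rng = np.random.default_rng(1)
# Synthetic subgrid cloud-water samples inside one coarse grid cell
qc = rng.lognormal(mean=-1.0, sigma=1.0, size=100_000)

rate_of_mean = process_rate(qc.mean())   # ignores subgrid variability
mean_of_rate = process_rate(qc).mean()   # Monte Carlo integral over the subgrid pdf
enhancement = mean_of_rate / rate_of_mean
```

For a convex rate such as this one, `enhancement` exceeds 1, so a coarse model that uses only the grid-cell mean underestimates the rate.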

So this is where you help me
• We have just started a small (3-person) group in UW
Atm Sci, with connections to our eScience Institute
• I will describe selected past work and our baby steps.
• I would appreciate the feedback of those of you who
have more thoughts about how to do this.

Past work I: Subramanian and Palmer 2017 JAMES
• ‘Superparameterized’ ECMWF IFS (a small 2D cloud-resolving
model represents moist physics in each grid
column of the global model)
• Ensemble of SP-IFS forecasts made from particular days,
differing only in random initial temperature noise used to
kick off small-scale motions in each CRM.
• Is the spread of the ensemble forecasts a realistic guide to the
overall forecast uncertainty? That is, is this a useful
strategy for stochastic parameterization of moist processes?

Ensemble of 10 day SP-IFS rainfall forecasts initialized 21 Oct. 2011
Subramanian and Palmer 2017 JAMES

Error and spread of rainfall forecasts vs. satellite obs
Error is within the forecast spread, as desired for stochastic parameterization
…a credible stochastic parameterization strategy based on hi-res CRMs
Subramanian and Palmer 2017 JAMES
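A generic spread-versus-error diagnostic of this kind can be sketched as follows. This is not the calculation used by Subramanian and Palmer; it is a common textbook version, run on synthetic data in which the observations and the ensemble members are drawn from the same distribution. The function name `spread_and_error` and all data below are assumptions for illustration.

```python
import numpy as np

def spread_and_error(forecasts, obs):
    """forecasts: (n_members, n_points); obs: (n_points,)."""
    ens_mean = forecasts.mean(axis=0)
    # Spread: root-mean-square over points of the member-to-member variance
    spread = np.sqrt(forecasts.var(axis=0, ddof=1).mean())
    # Error: RMSE of the ensemble mean against the observations
    rmse = np.sqrt(((ens_mean - obs) ** 2).mean())
    return spread, rmse

rng = np.random.default_rng(5)
n_members, n_points = 10, 5000
signal = rng.normal(size=n_points)                        # predictable part
forecasts = signal + rng.normal(size=(n_members, n_points))
obs = signal + rng.normal(size=n_points)                  # same noise level as members
spread, rmse = spread_and_error(forecasts, obs)
ratio = spread / rmse
```

In this idealized setup the ratio comes out close to 1. A ratio well below 1 would indicate an overconfident (under-dispersive) ensemble.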

Past work II: Krasnopolsky et al. 2013 Adv. Artif. Neur. Syst.
• Training/testing dataset: 120-day CRM simulating a 256x256
km region of the W Pacific Ocean driven by observed
boundary and surface conditions for Nov 1992-Feb 1993.
Lots of deep cumulonimbus cloud systems and rainfall.
First 80% for training, last 20% for testing.
• Neural net (NN) used to learn how atmospheric heating &
moistening profiles due to all CRM-simulated cloud, radiation
and turbulent processes depend upon the time-varying
atmospheric temperature & moisture profiles (unsupervised
learning of parameterization of all moist physics).
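The 80/20 split described above is chronological rather than shuffled, so the test period comes entirely after the training period. A minimal sketch, assuming hourly samples (the sampling interval is not stated here, so `n_samples` is a hypothetical value and `chronological_split` is an illustrative helper):

```python
import numpy as np

def chronological_split(n_samples, train_frac=0.8):
    """Return index arrays for the first `train_frac` of samples and the rest."""
    n_train = int(round(n_samples * train_frac))
    return np.arange(n_train), np.arange(n_train, n_samples)

n_samples = 120 * 24  # hypothetical: 120 days of hourly CRM output
train_idx, test_idx = chronological_split(n_samples)
```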

NN training
r = 256
In the training data, the underlying relation {T(z),q(z)} → {Q1(z),Q2(z),cloud} is
not deterministic; it also depends upon the detailed time evolution of the
convection in the CRM domain. For each use, a NN should be
stochastically drawn from an ensemble of NNs that fit the training data
‘well enough’ to be plausible. They use a 10-member ensemble.
5 layers, 594 parameters
Krasnopolsky et al. 2013
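The idea of drawing one model at random from an ensemble of plausible fits can be sketched without a neural-net library. The example below swaps in bootstrap-resampled linear least-squares fits for the NNs, so it illustrates the sampling strategy only, not the Krasnopolsky et al. architecture. The synthetic data, `fit_member`, and `stochastic_predict` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic training data: 3 inputs -> 2 outputs; the noise stands in for
# variability that the inputs do not determine
X = rng.normal(size=(500, 3))
true_W = np.array([[1.0, -0.5], [0.3, 0.8], [-0.7, 0.2]])
Y = X @ true_W + 0.5 * rng.normal(size=(500, 2))

def fit_member(X, Y, rng):
    """Fit one ensemble member by least squares on a bootstrap resample."""
    idx = rng.integers(0, len(X), size=len(X))
    W, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
    return W

ensemble = [fit_member(X, Y, rng) for _ in range(10)]  # 10 members, as on the slide

def stochastic_predict(x, ensemble, rng):
    """Pick one ensemble member at random for this call and apply it."""
    W = ensemble[rng.integers(len(ensemble))]
    return x @ W

x_new = np.array([0.2, -1.0, 0.5])
draws = np.array([stochastic_predict(x_new, ensemble, rng) for _ in range(20)])
```

Repeated calls with the same input return slightly different outputs, which is what makes the scheme stochastic. Note that the scatter among members reflects uncertainty in the fit, which is not necessarily the same as the physical variability in the training data.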

Validation using CRM test data set
NN ensemble predicts the CRM heating and cloud profiles encouragingly well

Diagnostic test of NN param over tropical Pacific
NN moist physics parameterization tested diagnostically using inputs from CAM climate model over a large region of the tropical Pacific.
NN maintains a reasonable cloud cover over the region, but other aspects of the simulation were not comprehensively validated.
Diagnostic test doesn’t demonstrate NN would perform well if used to ‘prognostically’ drive CAM.
Cloud cover

Improving on this approach
• Can a version of this approach work prognostically?
• Best to train with more comprehensive datasets, e.g.
super/ultraparameterization or global cloud resolving model.
Narenpitak et al. 2017 JAMES

CHOMP: Advancing CLUBB using machine learning
A supervised learning strategy for improving the CLUBB moist turbulence
parameterization used in CAM6 (with Jeremy McGibbon)
CLUBB = Cloud Layers Unified By Binormals (Golaz et al. 2002)
CHOMP = Closing Higher Orders with Machine-learning Parameterization
Based on large-eddy simulations of cloudy atmospheric boundary layers
sampled on 12 cruises of a container ship from LA to Hawaii during DOE’s
MAGIC campaign in Oct. 2011-Sept. 2012 (McGibbon and Bretherton 2017).

The LES dataset
• Hourly 3D samples from 12 4-day cruises, 400+ vertical levels
• 11 cruises used for training, 1 for testing

CLUBB and CHOMP
• Higher-order turbulence closure
(HOC): Gridcell-mean tendencies
depend on second-order moments
• 2nd-order depends on 3rd-order, etc.
• CLUBB: 11 moments of u, v, w, qt and
θl are prognosed; others are
diagnosed (‘closed’) in terms of these
assuming joint double-Gaussian
PDFs → errors!
• CHOMP: Use random forest
regression trained on LES output for
cloud-topped boundary layer cases to
relate unknown moments (blue) to
prognostic variables.
• Each of the 400+ model levels at
each LES output time gives a sample.
Prognostic variables
One typical CLUBB moment equation
(closure terms revised in CHOMP)
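How one training sample per model level might be assembled from LES output can be sketched as below. Horizontal averages over each level give the mean state, and perturbations from those averages give the higher-order moments. The function name `level_moments`, the particular moments returned, and the random fields are illustrative assumptions; this is not the CHOMP code.

```python
import numpy as np

def level_moments(w, thl):
    """Per-level moments from 3D fields of shape (nz, ny, nx)."""
    # Perturbations from the horizontal mean at each level
    wp = w - w.mean(axis=(1, 2), keepdims=True)
    thlp = thl - thl.mean(axis=(1, 2), keepdims=True)
    return {
        "w2": (wp ** 2).mean(axis=(1, 2)),            # vertical velocity variance
        "w3": (wp ** 3).mean(axis=(1, 2)),            # third moment of w
        "wthl": (wp * thlp).mean(axis=(1, 2)),        # flux of theta_l
        "w2thl": (wp ** 2 * thlp).mean(axis=(1, 2)),  # a third-order mixed moment
    }

rng = np.random.default_rng(4)
w = rng.normal(size=(4, 16, 16))            # synthetic vertical velocity
thl = 290.0 + rng.normal(size=(4, 16, 16))  # synthetic liquid-water potential temperature
moments = level_moments(w, thl)
```

Each entry is a profile with one value per level, so stacking the profiles across output times yields a table of samples that a regression method could be trained on.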

Profile of input moments at one time in test data

Profiles of desired output moments from same time
• Random forest (CHOMP) matches LES better than CLUBB.

Challenges for CHOMP
• Random forest outputs are piecewise constant functions
of inputs, but we need to take their vertical derivatives.
Even after smoothing this leads to numerical instability in
the prognostic variables.
• A more efficient neural net is harder to train and less
robust on this data set.
• What to do when inputs go well outside the range of the
training data?
• The LES outputs are partly stochastic functions of their
inputs. This would be nice to preserve in CHOMP, but
the random forest regression mostly averages this out.
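The derivative problem in the first bullet can be reproduced in a toy setting. The sketch below uses a binned-mean regressor as a stand-in for a tree ensemble (both produce piecewise-constant outputs) and differentiates its prediction with finite differences. The sine target, the 10 bins, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x_train = rng.uniform(0.0, 1.0, size=2000)
y_train = np.sin(2 * np.pi * x_train)

# Piecewise-constant fit: mean of y within each of 10 equal-width bins
edges = np.linspace(0.0, 1.0, 11)
bin_of = np.clip(np.digitize(x_train, edges) - 1, 0, 9)
bin_means = np.array([y_train[bin_of == k].mean() for k in range(10)])

def predict(x):
    """Return the bin mean for each input value."""
    k = np.clip(np.digitize(x, edges) - 1, 0, 9)
    return bin_means[k]

# Finite-difference derivative on a fine grid, as a model would take in z
z = np.linspace(0.0, 1.0, 401)
dz = z[1] - z[0]
dfdz_fit = np.diff(predict(z)) / dz
dfdz_true = 2 * np.pi * np.cos(2 * np.pi * (z[:-1] + 0.5 * dz))

frac_zero = np.mean(dfdz_fit == 0.0)  # fraction of intervals with zero slope
max_ratio = np.abs(dfdz_fit).max() / np.abs(dfdz_true).max()
```

The fitted derivative is exactly zero on most intervals and spikes at the bin edges to values far larger than the true derivative, which is the kind of behavior that can destabilize a prognostic equation.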

Outlook
• Machine learning using comprehensive training datasets from
realistic high-resolution models of clouds, storms, turbulence
→ could break the human parameterization bottleneck?
• But has not yet been successfully used to develop a moist
physics parameterization implemented in a global model.
• Best approach is unclear, including how to implement
stochasticity and how to tune using observational data.
• Lots of potential and technical challenges - your help needed!
