Jet Energy Corrections with
DNN Regression
Daniel Holmberg
CMS ML Forum 08.09.2021
Introduction
Dataset
ML models
Training
Results
Summary
Extra material
CMS ML Forum 08.09.2021 Jet Energy Corrections with DNN Regression 2
Introduction
• The physical detector causes the jet transverse momentum pT to differ from that of the true particle-level jet
• It is corrected such that it agrees on average with the pT of the particle-level jet
• The corrections are determined using basic kinematic quantities of the jet
• It is possible to include more information and obtain better corrections using machine learning
• This has been done successfully for b jets using a deep feed-forward neural network
• This study, however, is about generically applicable DNN-based corrections
Dataset
• QCD HT-binned samples, 2016 configuration
• /QCD_HT*_TuneCUETP8M1_13TeV-madgraphMLM-pythia8/RunIISummer16MiniAODv3*/MINIAODSIM
• Custom ML JEC dataset by A. Popov (ULB)
• Forked and added SV angles for initial coordinates in
ParticleNet
• Use 10M jets for the training set, 2M jets for the validation set, and 2M jets for the test set
Data distribution
• Same shape for all jet flavours
• Flat in (pT , η) at low pT
• Steeply falling in pT at high pT
• Proportions of b, c, uds, and g jets fixed as 1 : 1 : 2 : 2
Figure: Number of jets as a function of (pT^gen, |η^gen|), and number of jets per flavour (u, d, s, c, b, g, unknown).
Training features
• Event level
• pT , log pT , η, φ, ρ, mass, area
• multiplicity, pTD, σ², num pv
• Charged PF candidates
• pT , η, φ, ∆pT , ∆η, ∆φ
• dxy, dz, dxy significance, normalized χ²
• num hits, num pixel hits, lost hits
• particle id, pv association quality
• Neutral PF candidates
• pT , η, φ, ∆pT , ∆η, ∆φ
• particle id, hcal energy fraction
• Secondary vertices
• pT , η, φ, ∆pT , ∆η, ∆φ, mass
• flight distance, significance, num tracks
Feature engineering
• Create event-level features (multiplicity, pTD, σ²) that help with quark/gluon discrimination
Figure: Distributions of multiplicity, pTD, and σ² (fraction of jets per bin) for quark and gluon jets with 80 < pT^gen < 100 GeV and 0 < |η^gen| < 1.3.
Feature engineering
• Relative features for all constituents
• ∆pT,i = pT,i^pf / pT^jet
• ∆ηi = sgn(η^jet)(ηi^pf − η^jet)
• ∆φi = ((φi^pf − φ^jet + π) mod 2π) − π
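As a NumPy sketch (the helper name is ours; the slides do not show the dataset-production code), the three relative features above can be computed as:

```python
import numpy as np

def relative_features(pt_pf, eta_pf, phi_pf, pt_jet, eta_jet, phi_jet):
    """Per-constituent features relative to the jet axis, per the formulas above."""
    dpt = pt_pf / pt_jet                                      # ∆pT: momentum fraction
    deta = np.sign(eta_jet) * (eta_pf - eta_jet)              # ∆η, sign-folded
    dphi = (phi_pf - phi_jet + np.pi) % (2 * np.pi) - np.pi   # ∆φ wrapped to (−π, π]
    return dpt, deta, dphi
```

The modulo form of ∆φ guarantees the azimuthal difference stays in (−π, π] even when the jet and constituent sit on opposite sides of the ±π boundary.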
• One-hot encode categorical features
• particle id and primary vertex association quality
• e.g. neutral pid:
[1, 2, 22, 130] -> [
[1, 0, 0, 0], [0, 1, 0, 0],
[0, 0, 1, 0], [0, 0, 0, 1]
]
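The mapping above can be sketched as follows (the helper name is ours; the id list [1, 2, 22, 130] is the one shown on the slide):

```python
import numpy as np

NEUTRAL_PIDS = [1, 2, 22, 130]  # neutral particle ids from the slide

def one_hot_pid(pids, categories=NEUTRAL_PIDS):
    """Map each particle id to a one-hot row vector over the given categories."""
    index = {pid: i for i, pid in enumerate(categories)}
    out = np.zeros((len(pids), len(categories)))
    for row, pid in enumerate(pids):
        out[row, index[pid]] = 1.0
    return out
```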
Target and loss
• Regression target ŷ = log(pT^gen / pT)
• The correction factor is thus e^y, where y is the NN output
• MAE loss function L = (1/N) Σ_{i=1}^{N} |yi − ŷi| · I(|ŷi| < 1)
• The indicator factor rejects the 0.8% of jets where the target is far off
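The masked MAE above can be sketched as follows (NumPy for illustration; in the TensorFlow training code the equivalent tf ops would be used, and the function name is ours):

```python
import numpy as np

def masked_mae(y_true, y_pred):
    """Mean |y − ŷ| over all N jets, zeroing jets whose |target| >= 1."""
    mask = (np.abs(y_true) < 1.0).astype(float)  # indicator I(|ŷ| < 1)
    return np.mean(np.abs(y_pred - y_true) * mask)
```

Note that the mean runs over all N jets, so masked jets contribute zero rather than being dropped from the normalization.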
Figure: Distribution of the regression target log(pT^gen / pT) (fraction of jets per bin).
Choice of ML models
• For every jet there are global features as well as
constituents
• Jet constituents form a permutation-invariant set
• Number of constituents varies from jet to jet
• Order doesn’t matter
• ⇒ Special treatment is required to use them as ML inputs
• Deep Sets and Dynamic Graph CNN are examples of NN
architectures allowing for unordered sets to be consumed
• They have been used for jet tagging in Energy Flow
Networks and ParticleNet respectively
• Modified versions of Deep Sets and ParticleNet are used
to include all available information, both global features
and constituents!
Deep Sets
• A 2020 JEC study with Deep Sets was used as the baseline
• Procedure
• An MLP F : xi → yi is applied to every constituent xi
• Weights of the MLP are shared among all constituents
• The learned outputs are aggregated using a permutation-invariant operation
• Here the sum over all constituents, Σi yi, is chosen
• This is based on the theorem that any function G({xi}) invariant under permutations of its inputs can be represented in the form Σi F(xi)
• Concatenate with the global features and feed into an MLP
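The procedure above can be reduced to a toy sketch (one shared ReLU layer instead of the three Dense + BatchNorm layers of the real blocks; all names are ours) that makes the permutation invariance of sum pooling explicit:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_per_particle(x, w, b):
    """Shared per-particle map F (a single ReLU layer for illustration)."""
    return np.maximum(x @ w + b, 0.0)

def deep_sets_embed(constituents, w, b):
    """Apply F to every constituent, then sum-pool: Σi F(xi)."""
    return mlp_per_particle(constituents, w, b).sum(axis=0)
```

Because the sum runs over constituents, reordering the input rows leaves the embedding unchanged, which is exactly the invariance the theorem requires.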
Deep Sets architecture
Figure: (a) Complete network: Deep Sets blocks for the charged constituents (n = (64, 128, 256)), neutral constituents (n = (64, 128, 256)), and secondary vertices (n = (32, 64, 128)); their outputs are concatenated with the global features and passed through fully connected layers of 1024, 512, 256, 128, and 64 units (ReLU) to a single output unit. (b) Deep Sets block: three Dense + BatchNorm + ReLU layers applied elementwise to the constituents, followed by the aggregation.
ParticleNet
• Started from H. Qu’s Keras version of ParticleNet
• Edge convolution
• Begin with coordinates in pseudorapidity-azimuth space
• Calculate k-nearest neighboring particles for each particle
using the coordinates
• “Edge features” are constructed from the constituent
features using the indices of k-nearest neighboring
particles
• Feed into shared MLP to update each particle in the
graph (in practice using convolution layers)
• Perform a permutation-invariant aggregation; the mean is selected, as used in the ParticleNet paper
• Subsequent EdgeConv blocks use the learned feature
vectors as coordinates (hence dynamic)
• Concatenate with global features and feed into MLP
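The first two edge-convolution steps above, the k-NN search and the construction of edge features, can be sketched in NumPy (function names are ours; ParticleNet builds edges as the pair (xi, xj − xi), which is the convention used here):

```python
import numpy as np

def knn_indices(coords, k):
    """Indices of the k nearest neighbours (excluding self) for each particle."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                                   # a particle is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def edge_features(features, idx):
    """Edge features per neighbour: concatenate x_i with (x_j − x_i)."""
    x_i = np.repeat(features[:, None, :], idx.shape[1], axis=1)    # (N, k, F)
    x_j = features[idx]                                            # neighbour features
    return np.concatenate([x_i, x_j - x_i], axis=-1)               # (N, k, 2F)
```

The shared MLP then acts on the last axis, and the per-particle aggregation over the k neighbours follows.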
ParticleNet architecture
Figure: (a) Complete network: EdgeConv branches for the charged constituents (k = 16, c = (64, 64, 64), (128, 128, 128), (256, 256, 256)), neutral constituents (k = 16, same channels), and secondary vertices (k = 8, c = (32, 32, 32), (64, 64, 64), (128, 128, 128)), each ending in global average pooling; the pooled outputs are concatenated with the global features and passed through fully connected layers of 512, 256, 128, and 64 units (ReLU) to a single output unit. (b) EdgeConv block: a k-NN search on the coordinates provides neighbour indices, edge features are built from the constituents, and three Dense + BatchNorm + ReLU layers are followed by a ReLU-activated aggregation.
Training
• Two models are trained
• Deep Sets with 1.47M parameters
• ParticleNet with 1.20M parameters
• Using TensorFlow 2.4.1
• MirroredStrategy on two Nvidia GeForce RTX 3090 cards
• Adam optimizer
• Batch size 1024
• Learning rate 2 × 10⁻³, reduced by a factor of 5 when the validation loss plateaus
• Regularization through early stopping callback
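A sketch of this setup in the TensorFlow 2.4 Keras API (build_model(), train_ds, and val_ds are hypothetical placeholders, and the patience values and epoch count are assumptions not given on the slide):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # mirrors across both RTX 3090 cards
with strategy.scope():
    model = build_model()  # hypothetical: Deep Sets or ParticleNet
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-3),
                  loss="mae")  # in practice, the masked MAE from the loss slide

callbacks = [
    # reduce the learning rate by a factor of 5 when validation loss plateaus
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=3),
    # regularize through early stopping
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
]
# train_ds / val_ds are assumed to be already batched with batch size 1024
model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)
```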
Effective data pipeline
Figure: Naive and parallel data handling in TensorFlow.
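The parallel pipeline the figure contrasts with naive sequential loading could look roughly like this in tf.data (the file pattern and parse_fn are hypothetical placeholders, not the study's actual input code):

```python
import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE  # TF 2.4 spelling

dataset = (
    tf.data.Dataset.list_files("shards/*.tfrecord")
    .interleave(tf.data.TFRecordDataset, num_parallel_calls=AUTOTUNE)  # parallel reads
    .map(parse_fn, num_parallel_calls=AUTOTUNE)  # decode jets in parallel
    .shuffle(10_000)
    .batch(1024)           # batch size from the training slide
    .prefetch(AUTOTUNE)    # overlap input preparation with GPU compute
)
```

The prefetch step is what removes the idle gaps of the naive pipeline: the CPU prepares the next batch while the GPUs train on the current one.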
Loss
• Deep Sets
• min training loss 0.0784
• min validation loss 0.0792
• ParticleNet
• min training loss 0.0776
• min validation loss 0.0785
Figure: (a) Deep Sets loss, (b) ParticleNet loss.
Results
All jet response
Figure: Median response and IQR/median of the response vs pT^gen, in 0 < |η^gen| < 2.5 and 2.5 < |η^gen| < 5, comparing Standard, Deep Sets, and ParticleNet corrections (with ratio panels).
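The two quantities plotted on these response slides can be computed with a small helper (the function name is ours; response here means corrected pT over pT^gen):

```python
import numpy as np

def response_summary(response):
    """Median response and the IQR/median resolution proxy used in the plots."""
    q25, med, q75 = np.percentile(response, [25, 50, 75])
    return med, (q75 - q25) / med
```

A median of 1 means the correction is unbiased on average, while a smaller IQR/median means better resolution, robust to the non-Gaussian tails of the response distribution.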
uds jet response
Figure: Median response and IQR/median of the response vs pT^gen for uds jets, in 0 < |η^gen| < 2.5 and 2.5 < |η^gen| < 5, comparing Standard, Deep Sets, and ParticleNet (with ratio panels).
Gluon jet response
Figure: Median response and IQR/median of the response vs pT^gen for gluon jets, in 0 < |η^gen| < 2.5 and 2.5 < |η^gen| < 5, comparing Standard, Deep Sets, and ParticleNet (with ratio panels).
b jet response
Figure: Median response and IQR/median of the response vs pT^gen for b jets, in 0 < |η^gen| < 2.5 and 2.5 < |η^gen| < 5, comparing Standard, Deep Sets, and ParticleNet (with ratio panels).
c jet response
Figure: Median response and IQR/median of the response vs pT^gen for c jets, in 0 < |η^gen| < 2.5 and 2.5 < |η^gen| < 5, comparing Standard, Deep Sets, and ParticleNet (with ratio panels).
Flavour difference
Figure: Median response per jet flavour (g, u, d, s, c, b) for pT^gen > 30 GeV, in 0 < |η^gen| < 2.5 and 2.5 < |η^gen| < 5, comparing Standard, Deep Sets, and ParticleNet.
Summary
• Improved pT resolution w.r.t. standard corrections
• 10–15% for uds jets, 10% for b and c jets, and around 8% for g jets in the central region
• 10–20% for uds jets and 5–20% for the remaining jet flavours in the forward region
• Reduced flavour differences
• Factor of 3 improvement in the central region and 30% in the forward region
• ParticleNet vs Deep Sets
• 270k fewer parameters in the ParticleNet model
• Despite this, ParticleNet achieves slightly better resolution, especially for jets with higher pT
• ParticleNet also shows slightly smaller flavour differences in the response
• However, Deep Sets has fewer GPU-intensive operations and is faster to train
Extra material
Residual response
Figure: Response differences R_uds − R_b and R_uds − R_c vs pT^gen, in 0 < |η^gen| < 2.5 and 2.5 < |η^gen| < 5, comparing Standard, Deep Sets, and ParticleNet.
Residual response
Figure: Response differences R_uds − R_g and R_b − R_g vs pT^gen, in 0 < |η^gen| < 2.5 and 2.5 < |η^gen| < 5, comparing Standard, Deep Sets, and ParticleNet.
Residual response
Figure: Response differences R_c − R_g and R_c − R_b vs pT^gen, in 0 < |η^gen| < 2.5 and 2.5 < |η^gen| < 5, comparing Standard, Deep Sets, and ParticleNet.
home.cern
