2019 CtD: A hybrid deep learning approach to vertexing

A hybrid deep learning approach to vertexing
Rui Fang (1)
Henry Schreiner (1, 2)
Mike Sokoloff (1)
Constantin Weisser (3)
Mike Williams (3)
April 3, 2019
(1) The University of Cincinnati
(2) Princeton University
(3) Massachusetts Institute of Technology
CtD/WIT 2019

[Figure: PV-finding efficiency vs. number of LHCb long tracks (0 to 60), overlaid on the number of PVs per bin (log scale, 10^2 to 10^4). Events in sample = 20K; training sample = 240K.]
Asymmetric cost function: found 103002 of 109733 (eff 93.87%), false positive rate = 0.251 per event
Symmetric cost function: found 96616 of 109733 (eff 88.05%), false positive rate = 0.0485 per event

Tracking in the LHCb upgrade Introduction
The changes
• 30 MHz software trigger
• 7.6 PVs per event (Poisson distribution)
• Roughly 5.5 visible PVs per event
The problem
• Much higher pileup
• Very little time to do the tracking
• Current algorithms too slow
We need to rethink our algorithms from the ground up...

Vertices and tracks Introduction
Vertices
• Events contain ≈ 7 Primary Vertices (≈ 5 visible PVs)
  A PV should contain 5+ long tracks
• Multiple Secondary Vertices (SVs) per event as well
  An SV should contain 2+ tracks
[Diagram: beams, PV, track, SV]
Adapt to machine learning?
• Sparse 3D data (41M pixels) → rich 1D data
• 1D convolutional neural nets
• Highly parallelizable, GPU friendly
• Opportunities to visualize learning process

A hybrid ML approach Introduction
[Diagram: pipeline stages: Tracking, Kernel generation, CNNs, Make predictions, Interpret results; Truth information is used for Training and Validation]
Machine learning features (so far)
• Prototracking converts sparse 3D dataset to feature-rich 1D dataset
• Easy and effective visualization due to 1D nature
• Even simple networks can provide interesting results
What follows is a proof of principle implementation for finding PVs.

Kernel generation Design
Tracking procedure
• Hits lie on the 26 planes
• For simplicity, only 3 tracks shown
• Make a 3D grid of voxels (2D shown)
• Note: only z will be fully calculated and stored
• Tracking (full or partial)
• Fill in each voxel center with Gaussian PDF
• PDF for each (proto)track is combined
• Fill z “histogram” with maximum KDE value in xy
[Diagram, built up step by step: tracks and the PV in the x-z plane, z axis along the beam, with the resulting kernel drawn along z]
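The kernel-generation steps can be sketched in a few lines of plain Python. This is only a toy illustration, not the code used for the talk: the straight-line track parameterization `(x0, y0, tx, ty)`, the unnormalized Gaussian, the grid spacing, and the value of `sigma` are all assumptions chosen for readability.

```python
import math

def track_xy(track, z):
    """Position of a straight-line track (x0, y0, tx, ty) at a given z (assumed model)."""
    x0, y0, tx, ty = track
    return x0 + tx * z, y0 + ty * z

def z_kernel(tracks, z_centers, xy_centers, sigma):
    """For each z bin, sum one Gaussian per track on an xy grid and keep the maximum."""
    kernel = []
    for z in z_centers:
        positions = [track_xy(t, z) for t in tracks]
        best = 0.0
        for gx in xy_centers:
            for gy in xy_centers:
                # Combined (unnormalized) Gaussian contribution of all tracks
                density = sum(
                    math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2 * sigma ** 2))
                    for px, py in positions
                )
                best = max(best, density)
        kernel.append(best)  # only the maximum over xy is stored for this z bin
    return kernel

# Three toy tracks that all pass through (x, y) = (0, 0) at z = 10 mm
pv_z = 10.0
slopes = [(0.05, 0.00), (-0.03, 0.04), (0.01, -0.06)]
tracks = [(-tx * pv_z, -ty * pv_z, tx, ty) for tx, ty in slopes]

z_centers = [0.5 * i for i in range(41)]        # 0 to 20 mm in 0.5 mm steps
xy_centers = [0.1 * i for i in range(-10, 11)]  # -1 to 1 mm in 0.1 mm steps
kernel = z_kernel(tracks, z_centers, xy_centers, sigma=0.05)
peak_z = z_centers[kernel.index(max(kernel))]
print(peak_z)
```

In this toy setup the kernel peaks in the z bin where the three tracks cross, which is the behavior the 1D histogram is meant to expose.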

Example of z KDE histogram Design
[Figure: z KDE histogram. x axis: z values [mm], from -100 to 300; y axis: density of kernel, from 0 to 2000. Legend: Kernel, LHCb PVs, Other PVs, LHCb SVs, Other SVs.]
Note: All events from toy detector simulation
Human learning
• Peaks generally correspond to PVs and SVs
Challenges
• Vertex may be offset from peak
• Vertices interact

Target distribution Design
Build target distribution
• True PV position as the mean of Gaussian
• σ (standard deviation) is 100 µm (simplification)
• Fill bins with integrated PDF within ±3 bins (±300 µm)
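A minimal sketch of the target construction, assuming 4000 bins of 100 µm covering z from -100 mm to 300 mm (the bin count and range are read off the network input size and the histogram axis, not stated explicitly). The function names and the choice of `math.erf` for the integral are illustrative.

```python
import math

N_BINS = 4000
Z_MIN_MM = -100.0    # assumed lower edge of the z range
BIN_WIDTH_MM = 0.1   # assumed: 400 mm / 4000 bins = 100 µm per bin
SIGMA_MM = 0.1       # 100 µm, as stated on the slide

def gaussian_cdf(x, mean, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def build_target(pv_z_positions):
    """Fill the bin containing each true PV and its ±3 neighbours with the integrated PDF."""
    target = [0.0] * N_BINS
    for pv_z in pv_z_positions:
        center_bin = int(math.floor((pv_z - Z_MIN_MM) / BIN_WIDTH_MM))
        for b in range(center_bin - 3, center_bin + 4):
            if 0 <= b < N_BINS:
                low = Z_MIN_MM + b * BIN_WIDTH_MM
                high = low + BIN_WIDTH_MM
                target[b] += gaussian_cdf(high, pv_z, SIGMA_MM) - gaussian_cdf(low, pv_z, SIGMA_MM)
    return target

target = build_target([48.904])  # one true PV at z = 48.904 mm
print(sum(target))               # close to 1: nearly all of the Gaussian falls in the 7 bins
```

Because the window is ±3σ wide, truncating at ±3 bins loses only a small fraction of the probability.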

Neural network architecture Design
[Figure: network diagram; layers listed in order, activations as drawn beneath the diagram]
• Inputs: 4000 bins, 1 channel
• Convolution 1: width 25, channels 1 → 25, Leaky ReLU
• Convolution 2: width 15, channels 25 → 25, Leaky ReLU
• Convolution 3: width 15, channels 25 → 25, Leaky ReLU
• Convolution 4: width 5, channels 25 → 1, Leaky ReLU
• Convolution 5: width 91, channels 1 → 1, Softplus
• Output: 4000 bins
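As a rough check of the architecture in the diagram, the sketch below works out the output length, receptive field, and parameter count of the five convolutions. It assumes stride 1, "same" zero padding of (width - 1) / 2 on each side, and one bias per output channel; none of these are stated on the slide.

```python
# (kernel width, input channels, output channels, activation) as read off the diagram
LAYERS = [
    (25, 1, 25, "leaky_relu"),
    (15, 25, 25, "leaky_relu"),
    (15, 25, 25, "leaky_relu"),
    (5, 25, 1, "leaky_relu"),
    (91, 1, 1, "softplus"),
]
N_BINS = 4000

def summarize(layers, n_bins):
    """Return (output length, receptive field in bins, number of trainable parameters)."""
    length = n_bins
    receptive_field = 1
    total_params = 0
    for width, c_in, c_out, _activation in layers:
        padding = (width - 1) // 2                      # assumed "same" padding, stride 1
        length = length + 2 * padding - width + 1       # standard 1D convolution length
        receptive_field += width - 1                    # each layer widens the view
        total_params += c_in * c_out * width + c_out    # weights plus assumed biases
    return length, receptive_field, total_params

length, receptive_field, total_params = summarize(LAYERS, N_BINS)
print(length, receptive_field, total_params)  # 4000 147 19668
```

Under these assumptions each output bin sees 147 input bins, and the whole network has fewer than 20,000 parameters.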

Cost function Design
[Figure: three panels of cost vs. yhat, one with a logarithmic yhat axis from 10^-6 to 10^0, comparing the asymmetric and symmetric cost for y = 0.10, y = 0.30, and y = 1e-5.]
Approach
• Symmetric cost function: low FP but low efficiency
• Adding asymmetry term controls trade-off for FP vs. efficiency
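The slides do not give the functional form of either cost function, so the sketch below is not the authors' cost. It uses binary cross-entropy as a stand-in baseline and a hypothetical `asymmetry` weight on the signal term, only to illustrate how an asymmetry term can push predictions upward and so trade false positives for efficiency.

```python
import math

def baseline_cost(y, yhat):
    """Binary cross-entropy for one bin (stand-in baseline, not the cost used in the talk)."""
    return -(y * math.log(yhat) + (1.0 - y) * math.log(1.0 - yhat))

def weighted_cost(y, yhat, asymmetry):
    """Hypothetical asymmetric variant: underestimating a true peak costs more."""
    return -((1.0 + asymmetry) * y * math.log(yhat) + (1.0 - y) * math.log(1.0 - yhat))

def best_prediction(cost, y):
    """Grid search for the yhat in (0, 1) that minimizes the cost for a given target y."""
    grid = [i / 1000 for i in range(1, 1000)]
    return min(grid, key=lambda yhat: cost(y, yhat))

y = 0.30
unbiased = best_prediction(baseline_cost, y)
pushed_up = best_prediction(lambda t, p: weighted_cost(t, p, asymmetry=1.0), y)
print(unbiased, pushed_up)
```

With the weight switched on, the preferred prediction for the same target moves above the target value, which is the qualitative effect behind a higher efficiency and a higher false positive rate.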

False Positive and efficiency rates Results
[Figure: false positives per event vs. efficiency [%] (88 to 94), shown with linear and logarithmic FP axes; the points run from the symmetric cost to the most asymmetric cost.]
Search for PVs (handwritten, maybe not optimal)
• Search ±5 bins (±500µm) around a true PV
• At least 3 bins with predicted probability > 1% and
integrated probability > 20%.
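One possible reading of the matching criteria above, as a small function. The interpretation that both the bin count and the integrated probability are evaluated inside the ±5-bin window is an assumption, and the function and parameter names are illustrative.

```python
def pv_found(predicted, true_bin, half_window=5, min_bins=3,
             bin_threshold=0.01, integral_threshold=0.20):
    """Check whether the prediction near a true PV bin passes the search criteria."""
    lo = max(0, true_bin - half_window)
    hi = min(len(predicted), true_bin + half_window + 1)
    window = predicted[lo:hi]
    n_above = sum(1 for p in window if p > bin_threshold)
    return n_above >= min_bins and sum(window) > integral_threshold

# A clear peak around bin 1489: three bins above 1%, integral 0.75
clear = [0.0] * 4000
clear[1488:1491] = [0.2, 0.35, 0.2]

# A weak bump: only one bin above 1%, integral 0.06
weak = [0.0] * 4000
weak[1488:1491] = [0.005, 0.05, 0.005]

print(pv_found(clear, 1489), pv_found(weak, 1489))  # True False
```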
Tunable efficiency vs. FP
• The asymmetry parameter
controls FP vs. efficiency
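The two end points of this trade-off are quoted on the title slide, and the arithmetic behind them is easy to check. The sketch assumes that "20K" means exactly 20,000 events; the function names are illustrative.

```python
def efficiency_percent(found, total):
    """Fraction of true PVs that were found, in percent."""
    return 100.0 * found / total

def false_positives(rate_per_event, n_events):
    """Total number of false positives implied by a per-event rate."""
    return rate_per_event * n_events

N_EVENTS = 20_000  # assumed exact value of "20K"
results = {
    "asymmetric": (103002, 109733, 0.251),
    "symmetric": (96616, 109733, 0.0485),
}
for name, (found, total, fp_rate) in results.items():
    print(name,
          round(efficiency_percent(found, total), 2),   # 93.87 and 88.05
          round(false_positives(fp_rate, N_EVENTS)))    # 5020 and 970

print(round(109733 / N_EVENTS, 2))  # 5.49 true PVs per event in the sample
```

Moving from the symmetric to the asymmetric cost recovers about 6,400 additional PVs at the price of roughly 4,000 additional false positives in the same sample, and the 5.49 PVs per event matches the roughly 5.5 visible PVs per event quoted in the introduction.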

Compare predictions with targets: Examples Results
[Figure: PV found example. Event 5 @ 197.4 mm: PV found. True: 197.461 mm, Pred: 197.396 mm, Δ: -65 µm. Panels vs. z values [mm] (195 to 199): kernel density; xy maximum [µm] (x, y); probability (target, predicted, masked).]
[Figure: PV found example. True: 36.068 mm, Pred: 36.400 mm, Δ: 332 µm. Panels vs. z values [mm] (34 to 38): kernel density; xy maximum [µm] (x, y); probability (target, predicted, masked).]

Compare predictions with targets: When it works Results
[Figure: PV found example. True: 48.904 mm, Pred: 48.954 mm, Δ: 50 µm. Panels vs. z values [mm] (47 to 51): kernel density; xy maximum [µm] (x, y); probability (target, predicted, masked).]
[Figure: Masked (<5 tracks) example. Event 0 @ 1.0 mm: Masked. Pred: 0.976 mm. Panels vs. z values [mm] (-1 to 3): kernel density; xy maximum [µm] (x, y); probability (target, predicted, masked).]

Compare predictions with targets: When it fails Results
[Figure: False Positive example. Event 2 @ 65.7 mm: False positive. Pred: 65.696 mm. Panels vs. z values [mm] (64 to 68): kernel density; xy maximum [µm] (x, y); probability (target, predicted).]
[Figure: PV not found example. Event 3 @ 51.9 mm: PV not found. True: 51.898 mm. Panels vs. z values [mm] (50 to 54): kernel density; xy maximum [µm] (x, y); probability (target, predicted, masked).]

Future addition: xy information Future plans
Adding xy information
• Point of maximum z in xy available
• Extra information: sharp discontinuities
between PVs
• Need iterative approach or “reduced
importance”
What about a full 2D kernel?
• Not needed for LHCb currently (large xy,
“low” z overlap)
• Might be useful for other detectors!
[Figure: example with xy information. True: 114.622 mm, Pred: 114.597 mm, Δ: -26 µm. Panels vs. z values [mm] (113 to 117): kernel density; xy maximum [µm] (x, y); probability (target, predicted).]

Conclusions and plans Future plans
[Figure: efficiency (0.0 to 1.0) vs. number of LHCb long tracks (0 to 60).]
• Proof-of-Principle established: a hybrid ML algorithm
using a 1-dimensional KDE processed by a 5-layer CNN
finds primary vertices with efficiencies and false positive
rates similar to traditional algorithms.
• Efficiency is tunable; increasing the efficiency also
increases the false positive rate.
• Adding information should improve performance.
• can add KDE (x,y) information to algorithm
• can associate tracks to PV candidates, then iterate.
• Next steps: train with full LHCb MC and deploy
inference engine in LHCb Hlt1 framework.
• Beyond LHCb
• approach might work for ATLAS and CMS (in 2D?);
• algorithm is an interesting ML laboratory.

Final words Future plans
Source code:
• https://gitlab.cern.ch/LHCb-Reco-Dev/pv-finder
• Runnable with Conda on macOS and Linux
Run: conda env create -f environment-gpu.yml
Python 3.6+ and PyTorch used for machine learning code
Generation now available too using the new Conda-Forge
ROOT and Pythia8 packages
Supported by:
• NSF OAC-1836650: IRIS-HEP
• NSF OAC-1740102: SI2:SSE
• NSF OAC-1739772: SI2:SSE


More predictions with targets (1) Backup
[Figure: True: 221.595 mm, Pred: 221.546 mm, Δ: -49 µm. Panels vs. z values [mm] (219 to 224): kernel density; xy maximum [µm] (x, y); probability (target, predicted, masked).]
[Figure: True: 114.622 mm, Pred: 114.597 mm, Δ: -26 µm. Panels vs. z values [mm] (113 to 117): kernel density; xy maximum [µm] (x, y); probability (target, predicted).]

[Figure: True: 129.336 mm, Pred: 129.337 mm, Δ: 1 µm. Panels vs. z values [mm] (127 to 131): kernel density; xy maximum [µm] (x, y); probability (target, predicted, masked).]
[Figure: True: 143.224 mm, Pred: 143.199 mm, Δ: -25 µm. Panels vs. z values [mm] (141 to 145): kernel density; xy maximum [µm] (x, y); probability (target, predicted, masked).]

[Figure: True: 150.650 mm, Pred: 150.416 mm, Δ: -234 µm. Panels vs. z values [mm] (148 to 152): kernel density; xy maximum [µm] (x, y); probability (target, predicted, masked).]
[Figure: True: 179.560 mm, Pred: 179.591 mm, Δ: 31 µm. Panels vs. z values [mm] (178 to 182): kernel density; xy maximum [µm] (x, y); probability (target, predicted, masked).]

The VELO Backup
Tracks
• Originate from vertices (not shown)
• Hits originate from tracks
• We only know the true track in simulation
• Nearly straight, but tracks may scatter in material
The VELO
• A set of 26 planes that detect tracks
• Tracks should hit one or more pixels per plane
• Sparse 3D dataset (41M pixels)

Questions for other experiments Backup
• Beam width (x, y): 40 µm for LHCb, what is yours?
• Transverse resolution: 5–15 µm for LHCb depending on number of tracks, what is yours?
• Longitudinal resolution: 40–100 µm for LHCb depending on number of tracks, what is yours?
• Cleaning up prototracks based on IP could simplify kernel
• Can prototracking be done in the triggers?
