B. Kégl / AppStat@LAL Learning to discover
LEARNING TO DISCOVER:
MACHINE LEARNING IN HIGH-ENERGY PHYSICS
Linear Accelerator Laboratory and Computer Science Laboratory	

CNRS/IN2P3 & University Paris-S{ud,aclay}
BALÁZS KÉGL
CERN, May 13, 2014
OUTLINE
• What is machine learning?	

• Three projects to illustrate data science in HEP	

• budgeted learning for triggers (LHCb)	

• classification for discovery and the HiggsML challenge (ATLAS)	

• deep learning for imaging calorimeters (ILC)	

• Concluding remarks	

• interdisciplinarity: HEP, ML, data science
WHAT IS MACHINE LEARNING?
• “The science of getting computers to act without being
explicitly programmed” - Andrew Ng (Stanford/Coursera)	

• part of standard computer science curriculum since the 90s 	

• inferring knowledge from data	

• generalizing to unseen data	

• usually no parametric model assumptions	

• emphasizing the computational challenges
[Venn diagram: machine learning overlapping statistics, optimization, artificial intelligence, neuroscience, cognitive science, signal processing, information theory, and statistical physics]
MACHINE LEARNING TAXONOMY
• Supervised learning: non-parametric (model-free) input-output functions	

• classification (Trees, BDT, SVM, NN) - what you call MVA	

• regression (Trees, NN, Gaussian Processes)	

• Unsupervised learning: non-parametric data representation	

• clustering (k-means, spectral clustering, Dirichlet processes)	

• dimensionality reduction (PCA, ISOMAP, LLE, auto-associative NN)	

• density estimation (kernel density, Gaussian mixtures, the Boltzmann machine)	

• Reinforcement learning:	

• learning + dynamic control: learn to behave in an environment to maximize cumulative
reward
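The unsupervised branch of the taxonomy above can be illustrated in a few lines of plain Python. The following k-means sketch (the toy blobs and all parameter choices are illustrative assumptions, not from the talk) runs Lloyd's algorithm with a deterministic farthest-point initialization:

```python
import random

def dist2(p, q):
    # squared Euclidean distance between two points
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # next centroid: the point farthest from all current centroids
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: dist2(p, centroids[j]))].append(p)
        # update step: move each centroid to the mean of its cluster
        for j, cl in enumerate(clusters):
            if cl:
                centroids[j] = tuple(sum(coord) / len(cl) for coord in zip(*cl))
    return centroids, clusters

# two well-separated synthetic 2-d blobs
rng = random.Random(1)
blob_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(50)]
blob_b = [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(50)]
centroids, clusters = kmeans(blob_a + blob_b, k=2)
```

The same skeleton (alternate assignment and update until convergence) underlies the fancier clustering methods listed above.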
CLASSIFICATION
Character recognition
CLASSIFICATION
Emotion recognition
CLASSIFICATION
Speech recognition
CLASSIFICATION
• Input: a usually high-dimensional vector x	

• Output: a category (label, class) y	

• Usually no parametric model	

• the classification function y = g(x) is learned using a training set D = {(x₁, y₁), . . . , (xₙ, yₙ)}	

• Well-tested algorithms:	

• neural networks, support vector machines, boosting
CLASSIFICATION
The only goal is a low probability of error	

P(g(x) ≠ y)	

on previously unseen examples (x, y)
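A minimal illustration of this setup, with a decision stump standing in for the well-tested algorithms of the previous slide (the toy data and the stump learner are illustrative assumptions, not part of the talk): g is fit on a training set D, and its error probability is estimated on previously unseen examples:

```python
import random

def train_stump(data):
    """Fit a decision stump g(x) = sign(s * (x[j] - t)) by exhaustive search."""
    best = None
    d = len(data[0][0])
    for j in range(d):                       # candidate feature
        for x, _ in data:                    # candidate threshold t = x[j]
            t = x[j]
            for s in (1, -1):                # candidate orientation
                err = sum(1 for xi, yi in data
                          if (1 if s * (xi[j] - t) > 0 else -1) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda x: 1 if s * (x[j] - t) > 0 else -1

# hypothetical 2-d toy data: classes +1/-1 separated along the first feature
rng = random.Random(0)
def make(mu, label, n):
    return [((rng.gauss(mu, 1.0), rng.gauss(0.0, 1.0)), label) for _ in range(n)]

train_set = make(2.0, 1, 100) + make(-2.0, -1, 100)
test_set = make(2.0, 1, 100) + make(-2.0, -1, 100)

g = train_stump(train_set)
# estimate P(g(x) != y) on examples never seen during training
test_error = sum(1 for x, y in test_set if g(x) != y) / len(test_set)
```

The held-out set plays the role of the "previously unseen examples" above: the training error alone would be an optimistic estimate.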
BUDGETED CLASSIFICATION
Real time face detection
BUDGETED CLASSIFICATION
Real time web page ranking
BUDGETED CLASSIFICATION
Real time ad placement
BUDGETED CLASSIFICATION
Real time signal/background separation
BUDGETED CLASSIFICATION
The second goal is the fast execution of g(x)
BUDGETED CLASSIFICATION
Trade-off between quality and speed
BUDGETED CLASSIFICATION
• Time constraints	

• Memory constraints	

• Consumption constraints	

• Communication constraints
BUDGETED CLASSIFICATION
The common design: 	

cascade classification = trigger with levels
• Stage 1 rejects the easy background	
• Stage 2 the medium background	
• Stage 3 the hard background	
• Stage 4 keeps the signal / very hard background
(Viola-Jones, CVPR 2001)
THE LHCB TRIGGER
• Collaboration with	

• Vava Gligorov (CERN)	

• Mike Williams (MIT)	

• Djalel Benbouzid (LAL)
THE LHCB TRIGGER
• A beautifully complex problem	

• varying feature costs	

• cost may depend on the value	

• events are bags of overlapping candidates
THE LHCB TRIGGER
• Immediate cost	
• Bag-dependent cost	
• Value-dependent cost
[feature-cost diagram over the candidate variables: D0_VTX_FD, PiS_IP, D0C_1_IP, D0C_2_IP, D0C_1_PT, D0C_2_PT, PiS_PT, D0C_1_IPC, D0C_2_IPC, PiS_IPC, D0C_1_TFC, D0C_2_TFC, PiS_TFC, DstM, D0M, D0Tau]
MDDAG: A SIGNAL/BCKG DECISION GRAPH
Benbouzid et al., ICML 2012
[animation over a learned decision DAG: features ordered by evaluation cost, from 0 ms (D0C_2_IP, D0_VTX_FD, PiS_IP, D0C_1_IP) through 1.5 ms (PiS_PT, D0C_1_PT, D0C_2_PT) to 4 ms and beyond (D0Tau, D0M, DstM, D0C_1_TFC, PiS_TFC2, D0C_1_IPC, PiS_IPC2, D0C_2_TFC, D0C_2_IPC); at each node the policy can EVALUATE the next feature, SKIP it, or QUIT with the current background-like/signal-like decision]
• Easy background: quits after a single cheap evaluation	
• Hard background: evaluates and skips many features, deep into the DAG, before quitting	
• Easy signal: quits signal-like after one evaluation	
• Hard signal: evaluates or skips nearly the whole feature sequence
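The early-quit idea behind such budgeted classifiers can be caricatured in a few lines. This is not the MDDAG algorithm itself (which learns the evaluate/skip/quit policy); it is a hand-written cascade with hypothetical stage costs and cuts:

```python
def cascade_classify(x, stages):
    """
    Evaluate scorers in increasing cost order, rejecting as soon as the
    running score drops to zero or below; survivors are labeled signal.
    stages: list of (cost, scorer) pairs; each scorer maps x to +1 or -1.
    Returns (label, cost_spent).
    """
    score, spent = 0.0, 0.0
    for cost, scorer in sorted(stages, key=lambda s: s[0]):
        score += scorer(x)
        spent += cost
        if score <= 0.0:          # background-like: quit, stop paying for features
            return -1, spent
    return 1, spent

# hypothetical stages: a cheap kinematic cut first, an expensive vertex fit last
stages = [
    (0.1, lambda x: 1.0 if x["pt"] > 2.0 else -1.0),
    (1.0, lambda x: 1.0 if x["ip"] > 0.5 else -1.0),
    (4.0, lambda x: 1.0 if x["vtx_fd"] > 1.0 else -1.0),
]

easy_bkg = {"pt": 0.3, "ip": 0.1, "vtx_fd": 0.2}   # rejected after the cheapest stage
med_bkg  = {"pt": 2.5, "ip": 0.1, "vtx_fd": 0.2}   # rejected after two stages
hard_sig = {"pt": 2.5, "ip": 0.9, "vtx_fd": 2.5}   # pays for all three stages
```

Easy background is dispatched for 0.1 ms, while only signal-like candidates pay the full 5.1 ms: exactly the quality/speed trade-off of the previous slides.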
BUDGETED CLASSIFICATION
• Classification with test-time constraints	

• An active research area due to IT applications	

• To be exploited for trigger design
CLASSIFICATION FOR DISCOVERY
Challenge 1: the most precise possible estimation, in minimal CPU time, of the mass of the Higgs boson candidate as a function of the event observables, despite the unmeasured particles. Current precision (Markov-chain integration in dimension 5): ~20% in 0.1 s per event
The HiggsML challenge
CLASSIFICATION FOR DISCOVERY
• In a nutshell	

• A vector x of variables is extracted from each event	

• A classifier g(x) is trained to separate signal from background	

• The background b is estimated in the selection region G = {x : g(x) = s}	

• Discovery is made when the number of real events n is significantly higher than b
CLASSIFICATION FOR DISCOVERY
• Exciting physics	

• The Higgs to tau-tau excess is not yet at five sigma (Tech. Rep. ATLAS-CONF-2013-108)	

• Exciting data science (statistics and machine learning)	

• What is the theoretical relationship between classification and test sensitivity? 	

• What is the quantitative criterion to optimize? 	

• How to formally include systematic uncertainties?	

• Can we redesign classical algorithms (boosting, SVM, neural nets) to optimize this criterion?
CLASSIFICATION FOR DISCOVERY
We are organizing a data challenge to answer some of these questions
Center for Data Science
Paris-Saclay
the HiggsML challenge
May to September 2014
When High Energy Physics meets Machine Learning
Joerg Stelzer - Atlas-CERN
Marc Schoenauer - INRIA
Balázs Kégl - Appstat-LAL
Cécile Germain - TAO-LRI
David Rousseau - Atlas-LAL
Glen Cowan - Atlas-RHUL
Isabelle Guyon - Chalearn
Claire Adam-Bourdarios - Atlas-LAL
Thorsten Wengler - Atlas-CERN
Andreas Hoecker - Atlas-CERN
Organization committee Advisory committee
info to participate and compete : https://www.kaggle.com/c/higgs-boson
CLASSIFICATION FOR DISCOVERY
The formal setup
• We simulate data: D = {(x₁, y₁, w₁), . . . , (xₙ, yₙ, wₙ)}
• xᵢ ∈ ℝᵈ is the feature vector
• yᵢ ∈ {background, signal} is the label
• wᵢ ∈ ℝ⁺ is a non-negative weight (importance sampling)
• let S = {i : yᵢ = s} and B = {i : yᵢ = b} be the index sets of signal and background events, respectively
• Maximize the Approximate Median Significance (G. Cowan, K. Cranmer, E. Gross, and O. Vitells, EPJ C 71:1554, 2011)

AMS = \sqrt{2\left((s + b)\ln\left(1 + \frac{s}{b}\right) - s\right)} \approx \frac{s}{\sqrt{b}}

• Ĝ = {i : g(xᵢ) = s} is the selected index set
• s = Σ_{i ∈ S ∩ Ĝ} wᵢ
• b = Σ_{i ∈ B ∩ Ĝ} wᵢ
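The AMS above transcribes directly into code; the weighted toy events below are hypothetical, chosen only to show how s and b are weighted sums over the selection:

```python
import math

def ams(s, b):
    """Approximate Median Significance (Cowan et al., EPJ C 71:1554, 2011)."""
    if s <= 0 or b <= 0:
        raise ValueError("s and b must be positive")
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

# s and b are weighted sums over the selected events, as on the slide
selected = [("s", 0.3), ("s", 0.5), ("b", 2.0)]   # (label, weight), hypothetical
s = sum(w for y, w in selected if y == "s")
b = sum(w for y, w in selected if y == "b")
```

For s much smaller than b the expression reduces to the familiar s/√b significance, which is a convenient sanity check on the implementation.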
CLASSIFICATION FOR DISCOVERY
How to design g to maximize the sensitivity?
• A two-stage approach
1. optimize a discriminant (score) function f : ℝᵈ → ℝ using a classical learning algorithm (BDT, NN)
[score histograms: signal vs background]
CLASSIFICATION FOR DISCOVERY
How to design g to maximize the sensitivity?
• A two-stage approach
1. optimize a discriminant (score) function f : ℝᵈ → ℝ using a classical learning algorithm (BDT, NN)
2. define g(x) = sign(f(x) − θ) and optimize θ to maximize the AMS
[AMS vs the threshold θ; maximum around 3.5σ]
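The second stage can be sketched as a scan over candidate thresholds θ, keeping the one with the highest AMS of the selection. The scores, labels, and weights below are made-up toy values, not challenge data:

```python
import math

def ams(s, b):
    # counting-only Approximate Median Significance
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

def best_threshold(scores, labels, weights):
    """Scan theta over the observed scores, select f(x) > theta, maximize AMS."""
    best_ams, best_theta = -1.0, None
    for theta in sorted(set(scores)):
        s = sum(w for f, y, w in zip(scores, labels, weights) if f > theta and y == "s")
        b = sum(w for f, y, w in zip(scores, labels, weights) if f > theta and y == "b")
        if s > 0.0 and b > 0.0 and ams(s, b) > best_ams:
            best_ams, best_theta = ams(s, b), theta
    return best_ams, best_theta

# hypothetical scores f(x): signal clustered high, background low
scores  = [0.9, 0.8, 0.7, 0.75, 0.6, 0.3, 0.2, 0.1]
labels  = ["s", "s", "s", "b",  "b", "b", "b", "b"]
weights = [1.0, 1.0, 1.0, 5.0, 10.0, 10.0, 10.0, 10.0]
a, theta = best_threshold(scores, labels, weights)
```

Cutting harder trades background for signal, so the AMS typically rises to a maximum and then falls as the selection starves of signal: the scan simply picks that maximum.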
CLASSIFICATION FOR DISCOVERY
Comparing with the Atlas analysis
• Atlas does a manual pre-selection (category); the first maximum of the AMS is completely eliminated. Why?
• s = 250, b = 5000 ± 500: systematics!
CLASSIFICATION FOR DISCOVERY
How to handle systematic (model) uncertainties?
• OK, so let’s design an objective function that can take background systematics into consideration
• Likelihood with unknown background b ∼ N(µ_b, σ_b):

L(\mu_s, \mu_b) = P(n, b \mid \mu_s, \mu_b, \sigma_b) = \frac{(\mu_s + \mu_b)^n}{n!}\, e^{-(\mu_s + \mu_b)}\; \frac{1}{\sqrt{2\pi}\,\sigma_b}\, e^{-(b - \mu_b)^2 / (2\sigma_b^2)}

• Profile likelihood ratio: \lambda(0) = L(0, \hat{\hat{\mu}}_b)\,/\,L(\hat{\mu}_s, \hat{\mu}_b)
• The new Approximate Median Significance (by Glen Cowan):

AMS = \sqrt{2\left((s + b)\ln\frac{s + b}{b_0} - s - b + b_0\right) + \frac{(b - b_0)^2}{\sigma_b^2}}
\quad\text{where}\quad
b_0 = \frac{1}{2}\left(b - \sigma_b^2 + \sqrt{(b - \sigma_b^2)^2 + 4(s + b)\,\sigma_b^2}\right)
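The new AMS also transcribes directly. A useful sanity check is that it approaches the old (no-systematics) AMS as σ_b → 0 and collapses when σ_b is large, as in the s = 250, b = 5000 ± 500 example of the previous slide:

```python
import math

def ams_old(s, b):
    # counting-only Approximate Median Significance
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

def ams_syst(s, b, sigma_b):
    """AMS with a Gaussian systematic uncertainty sigma_b on the background."""
    if sigma_b == 0.0:
        return ams_old(s, b)
    # b0: the profiled background level
    b0 = 0.5 * (b - sigma_b ** 2
                + math.sqrt((b - sigma_b ** 2) ** 2 + 4.0 * (s + b) * sigma_b ** 2))
    return math.sqrt(2.0 * ((s + b) * math.log((s + b) / b0) - s - b + b0)
                     + (b - b0) ** 2 / sigma_b ** 2)
```

With σ_b = 500 the significance of the s = 250, b = 5000 selection drops from about 3.5σ to well under 1σ, which is exactly the story the "Systematics!" slide tells.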
CLASSIFICATION FOR DISCOVERY
How to handle systematic (model) uncertainties?
• The new Approximate Median Significance:

AMS = \sqrt{2\left((s + b)\ln\frac{s + b}{b_0} - s - b + b_0\right) + \frac{(b - b_0)^2}{\sigma_b^2}}
\quad\text{where}\quad
b_0 = \frac{1}{2}\left(b - \sigma_b^2 + \sqrt{(b - \sigma_b^2)^2 + 4(s + b)\,\sigma_b^2}\right)

[AMS vs selection threshold: new AMS vs old AMS, with the ATLAS working point]
CLASSIFICATION FOR DISCOVERY
A tool for getting the ML community excited about your problem
OPEN since yesterday
CLASSIFICATION FOR DISCOVERY
• Organizing committee	

• David Rousseau (ATLAS / LAL)	

• Balázs Kégl (AppStat / LAL)	

• Cécile Germain (LRI / UPSud)	

• Glen Cowan (ATLAS / Royal Holloway)	

• Claire Adam Bourdarios (ATLAS / LAL)	

• Isabelle Guyon (ChaLearn)
CLASSIFICATION FOR DISCOVERY
• Official ATLAS GEANT4 simulations	

• 30 features (variables)	

• 250K training: input, label, weight	

• 100K public test (AMS displayed in real time), only input	

• 450K private test (to determine the winner after the closing of the challenge), only input	

• public and private test sets are shuffled; participants submit a vector of 550K labels	

• Using the “old” AMS	

• cannot compare participants if the metric varies a lot
CLASSIFICATION FOR DISCOVERY
• 16K$ prize pool	

• 7-4-2K$ for the three top
participants	

• HEP meets ML award for the most
useful model, decided by the ATLAS
members of the organizing
committee
LEADERBOARD AS OF THIS MORNING
LEARNING FROM SCRATCH:
THE DEEP LEARNING REVOLUTION
Why don’t we train a neural network on the 	

raw ~10⁵-10⁸-dimensional signal of ATLAS?
LEARNING FROM SCRATCH:
THE DEEP LEARNING REVOLUTION
• Because it is notoriously difficult to automatically build a model = learn particle physics just by looking at the event browser	

• Again, you are not alone: it is also notoriously difficult to automatically build a model of natural scenes = learn a model of our surroundings just by looking at images	

• In the last 5-10 years, we have been getting close	

• It may be interesting if you do not know what you are looking for 
WHAT IS THIS?
[detector signal trace: photoelectrons (PE) vs time t [ns], 0-800 ns]
WHAT IS THIS?
Granularity and hadronic cascades
(Start off: hadronic showers in the SiW Ecal)
• Complex and impressive: inelastic reaction in the SiW Ecal (interaction, initial pion, outgoing fragments, scattered pion, ejected nucleon)
• Simple but nice: short truncated showers
WHAT IS THIS?
WHAT IS THIS?
[detector signal trace: photoelectrons (PE) vs time t [ns], 0-800 ns]
WHAT IS THIS?
WHAT IS THIS?
Granularity and hadronic cascades
(Start off: hadronic showers in the SiW Ecal)
• Complex and impressive: inelastic reaction in the SiW Ecal (interaction, initial pion, outgoing fragments, scattered pion, ejected nucleon)
• Simple but nice: short truncated showers
MODELS
• Inference	

• if you want to be able to answer questions about observed
phenomena, you need a model	

• if you want quantitative answers, you need a formal model	

• Formal setup	

• x: observation vector, Θ: parameter vector to infer	

• likelihood: p(x | Θ)	

• simulator: given Θ, generate a random x
A FORMAL MODEL
[a formal model of a cosmic-ray observatory, from B. Kégl / LAL slides]
• shower parameters: energy, direction, mass
• intermediate variables: X0, XMax, NMuMax, NMuTotal; LDF, asymmetry, S1000, S38, NMu1000; S, t0, risetime, jumps
• Cherenkov light in the detector
• ideal muon response: signal decay time τ [ns], signal risetime t_d [ns], muon arrival time t_µ [ns], muon tracklength L_µ [m], muon energy factor ε_µ [unitless], average number of PEs per 1 m of tracklength [m⁻¹]
• PEs given ideal response: expected number of PEs in bin i (n̄ᵢ) and observed number of PEs in bin i (nᵢ)
[single-muon template p(x | t), full trace p(x | t₁, . . . , t₄), and parameter model p(t | Θ)]
INFERENCE BY SAMPLING
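Sampling-based inference in the sense of this slide can be sketched with a random-walk Metropolis sampler. The one-parameter Gaussian toy posterior below is an illustrative assumption, not the observatory model of the previous slide:

```python
import math
import random

def metropolis(log_post, theta0, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis: sample from a density known up to a constant."""
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    samples = []
    for _ in range(n_samples):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        # accept with probability min(1, posterior ratio); tiny offset guards log(0)
        if math.log(rng.random() + 1e-300) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples

# hypothetical toy model: Gaussian likelihood around one observed value, flat prior
x_obs, sigma = 1.7, 0.4
log_post = lambda t: -0.5 * ((x_obs - t) / sigma) ** 2
samples = metropolis(log_post, theta0=0.0)
burned = samples[1000:]                       # discard the burn-in
posterior_mean = sum(burned) / len(burned)
```

The same scheme works whenever the likelihood p(x | Θ) can be evaluated, even if it is only available through a simulator-based approximation.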
HOW TO BUILD MODELS FOR THESE?
[detector signal trace: photoelectrons (PE) vs time t [ns]]
[hadronic shower event display: interaction, initial pion, outgoing fragments, scattered pion, ejected nucleon; short truncated showers]
High granularity permits a detailed view into the hadronic shower
LEARNING FROM SCRATCH:
THE DEEP LEARNING REVOLUTION
• Training multi-layer neural networks	

• biological inspiration: we know the brain is multi-layer	

• appealing from a modeling point of view: abstraction increases with
depth	

• notoriously difficult to train until Hinton et al. (stacked RBMs) and
Bengio et al. (stacked autoencoders), around 2006	

• the key principle is (was?) unsupervised pre-training	

• they remain computationally very expensive, but they learn high-level (abstract) features and they scale: with more data they learn more
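A scaled-down flavor of the autoencoder idea: a single linear hidden layer trained by stochastic gradient descent on hypothetical rank-one toy data (far simpler than the stacked, non-linear networks discussed here, but the encode/decode/reconstruction-loss structure is the same):

```python
import random

def train_linear_autoencoder(data, d, h=1, lr=0.005, epochs=300, seed=0):
    """
    Minimal linear autoencoder: encode z = W_in x, decode x_hat = W_out z,
    minimize |x_hat - x|^2 by SGD. Returns (loss_before, loss_after).
    """
    rng = random.Random(seed)
    w_in = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(h)]
    w_out = [[rng.gauss(0.0, 0.1) for _ in range(h)] for _ in range(d)]

    def reconstruct(x):
        z = [sum(w_in[k][j] * x[j] for j in range(d)) for k in range(h)]
        return z, [sum(w_out[i][k] * z[k] for k in range(h)) for i in range(d)]

    def mean_loss():
        return sum(sum((xh - xi) ** 2 for xh, xi in zip(reconstruct(x)[1], x))
                   for x in data) / len(data)

    loss_before = mean_loss()
    for _ in range(epochs):
        for x in data:
            z, x_hat = reconstruct(x)
            err = [x_hat[i] - x[i] for i in range(d)]
            for k in range(h):               # gradient w.r.t. the encoder weights
                g = sum(2.0 * err[i] * w_out[i][k] for i in range(d))
                for j in range(d):
                    w_in[k][j] -= lr * g * x[j]
            for i in range(d):               # gradient w.r.t. the decoder weights
                for k in range(h):
                    w_out[i][k] -= lr * 2.0 * err[i] * z[k]
    return loss_before, mean_loss()

# hypothetical toy data: 3-d points near a 1-d subspace, so one hidden unit suffices
rng = random.Random(1)
data = [tuple(t * v + rng.gauss(0.0, 0.05) for v in (1.0, 2.0, -1.0))
        for t in (rng.gauss(0.0, 0.7) for _ in range(30))]
loss_before, loss_after = train_linear_autoencoder(data, d=3)
```

Because the data is essentially one-dimensional, the single hidden unit learns (up to scale) its principal direction, and the reconstruction error drops to the noise floor: the unsupervised network has discovered the structure of the data without labels.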
LEARNING FROM SCRATCH:
THE DEEP LEARNING REVOLUTION
• Google passes the “purring test” (ICML’12)	

• 16K cores watching 10M YouTube stills for 3 days	

• completely unsupervised: the cat has simply emerged as a useful concept to represent
LEARNING FROM SCRATCH:
THE DEEP LEARNING REVOLUTION
• Can we also learn physics by observing natural phenomena?
[highly granular hadronic-shower event display: interaction, initial pion, outgoing fragments, scattered pion, ejected nucleon]
DEEP LEARNING FOR IMAGING
CALORIMETERS
• Collaboration with	

• Roman Poeschl (ILC / LAL)	

• Naomi van der Kolk (ILC / LAL)	

• Sviatoslav Bilokin (ILC / LAL)	

• Mehdi Cherti (AppStat / LAL)	

• Trong Hieu Tran (ILC / LLR)	

• Vincent Boudry (ILC / LLR)
FOOD FOR THOUGHT 1:
CERN ACCELERATING SCIENCE
Data Science: the design of automated methods to analyze massive and complex data in order to extract useful information from them
FOOD FOR THOUGHT 1:
CERN ACCELERATING SCIENCE
• Data science: statistics, machine learning, signal processing, data visualization, databases
• Tool building: software engineering, clouds/grids, high-performance computing, optimization
• Domain science: life, brain, earth, universe, human society
FOOD FOR THOUGHT 1:
CERN ACCELERATING SCIENCE
You have been doing data science for a long time. Consider transferring knowledge to other sciences on how to organize large scientific projects around data.
FOOD FOR THOUGHT II:
CERN ACCELERATING SCIENCE
• What is a data challenge:	

• outreach to the public?	

• peer-to-peer communication?
FOOD FOR THOUGHT II:
CERN ACCELERATING SCIENCE
• What is a data challenge:	

• between the two: communicating (translating!) your technical problems to another scientific community which might have solutions (and the know-how to design solutions) for you	

• think about building formal channels (as you did for outreach and “classical” publications)
THANK YOU!
BACKUP
CLASSIFICATION FOR DISCOVERY
How to design g to maximize the sensitivity?
• A two-stage approach
1. optimize a discriminant function f : ℝᵈ → ℝ for balanced AUC or balanced classification error: N′_s = N′_b = 0.5
[AdaBoost learning curves: balanced error vs the number of boosting iterations T, up to T = 100000]
Comparing with the official Atlas analysis
• Atlas does a manual pre-selection, the first maximum of the AMS is completely eliminated. Why? Have we found something, or do they have an implicit reason?
• No we haven’t, yes they have
• µ_b has a ~10% relative systematic uncertainty: the ~300 signal events are completely submerged by the σ_b ≈ 600 background systematics
[unnormalized ROC curve: selected signal s (TP) vs selected background b (FP), with AMS = 1σ, . . . , 5σ contours]
The Higgs boson ML challenge
• Dilemma: the physically relevant AMS is optimized in a tiny region
• the AMS has a high variance: a bad measure to compare the participants
• in the Atlas analysis, there was a 1σ difference between the expected and the measured significances
The Higgs boson ML challenge
• We are even nervous about the original AMS, so we regularize it:

AMS = \sqrt{2\left((s + b + b_{reg})\ln\left(1 + \frac{s}{b + b_{reg}}\right) - s\right)} \quad \text{with } b_{reg} = 10

More Related Content

PDF
Role of Machine Learning in High Energy physics research at LHC
PDF
用 Python 玩 LHC 公開數據
PPTX
Multi Object Tracking | Presentation 1 | ID 103001
PPTX
Multi Object Tracking | Final Defense | ID 103001
PPTX
Multi Object Tracking | Presentation 2 | ID 103001
PDF
Signal Discrimination in Cells Through A Negative Feedback
PDF
Autoencoding RNN for inference on unevenly sampled time-series data
PDF
How might machine learning help advance solar PV research?
Role of Machine Learning in High Energy physics research at LHC
用 Python 玩 LHC 公開數據
Multi Object Tracking | Presentation 1 | ID 103001
Multi Object Tracking | Final Defense | ID 103001
Multi Object Tracking | Presentation 2 | ID 103001
Signal Discrimination in Cells Through A Negative Feedback
Autoencoding RNN for inference on unevenly sampled time-series data
How might machine learning help advance solar PV research?

What's hot (11)

PDF
RAMP Data Challenge
PDF
Eyeriss Introduction
PDF
AIC x PyLadies TW Python Data Vis - 2: Plot packages
PPTX
Parallel Algorithm for Natural Neighbour Interpolation
PPTX
Solar System Processing with LSST: A Status Update
PDF
Dmitry Larko, H2O.ai - Kaggle Airbus Ship Detection Challenge - H2O World San...
PDF
eLabBench
PDF
DuraMat Data Analytics
PPTX
Stack Data structure
PDF
Materials Project computation and database infrastructure
PDF
University of Applied Science Esslingen @ Scilab Conference 2018
RAMP Data Challenge
Eyeriss Introduction
AIC x PyLadies TW Python Data Vis - 2: Plot packages
Parallel Algorithm for Natural Neighbour Interpolation
Solar System Processing with LSST: A Status Update
Dmitry Larko, H2O.ai - Kaggle Airbus Ship Detection Challenge - H2O World San...
eLabBench
DuraMat Data Analytics
Stack Data structure
Materials Project computation and database infrastructure
University of Applied Science Esslingen @ Scilab Conference 2018
Ad

Similar to Learning do discover: machine learning in high-energy physics (20)

PDF
Machine_Learning_Co__
PPTX
Learning
PPTX
ppt on introduction to Machine learning tools
PPTX
Machine learning ppt.
PPTX
Chapter 6 - Learning data and analytics course
PDF
01_introduction to machine learning algorithms and basics .pdf
PPT
c23_ml1.ppt
PDF
Machine learning and its parameter is discussed here
PPTX
L 8 introduction to machine learning final kirti.pptx
PPT
Machine learning and deep learning algorithms
PDF
01_introduction_ML.pdf
PPT
ML_Overview.ppt
PPTX
ML_Overview.pptx
PPT
ML_Overview.ppt
PPT
ML overview
PPTX
ML slide share.pptx
PPTX
Launching into machine learning
PPT
Different learning Techniques in Artificial Intelligence
PPT
ML-DecisionTrees.ppt
Machine_Learning_Co__
Learning
ppt on introduction to Machine learning tools
Machine learning ppt.
Chapter 6 - Learning data and analytics course
01_introduction to machine learning algorithms and basics .pdf
c23_ml1.ppt
Machine learning and its parameter is discussed here
L 8 introduction to machine learning final kirti.pptx
Machine learning and deep learning algorithms
01_introduction_ML.pdf
ML_Overview.ppt
ML_Overview.pptx
ML_Overview.ppt
ML overview
ML slide share.pptx
Launching into machine learning
Different learning Techniques in Artificial Intelligence
ML-DecisionTrees.ppt
Ad

More from Balázs Kégl (12)

PDF
Data-driven hypothesis generation using deep neural nets
PDF
Model-based reinforcement learning and self-driving engineering systems
PDF
Managing the AI process: putting humans (back) in the loop
PPTX
DARMDN: Deep autoregressive mixture density nets for dynamical system mode...
PDF
Machine learning in scientific workflows
PDF
A historical introduction to deep learning: hardware, data, and tricks
PDF
Build your own data challenge, or just organize team work
PDF
RAMP: Collaborative challenge with code submission
PDF
Learning to discover: machine learning in high-energy physics

  • 1. B. Kégl / AppStat@LAL Learning to discover LEARNING TO DISCOVER: MACHINE LEARNING IN HIGH-ENERGY PHYSICS Linear Accelerator Laboratory and Computer Science Laboratory CNRS/IN2P3 & University Paris-S{ud,aclay} BALÁZS KÉGL CERN, May 13, 2014 1
  • 2. B. Kégl / AppStat@LAL Learning to discover OUTLINE • What is machine learning? • Three projects to illustrate data science in HEP • budgeted learning for triggers (LHCb) • classification for discovery and the HiggsML challenge (ATLAS) • deep learning for imaging calorimeters (ILC) • Concluding remarks • interdisciplinarity: HEP, ML, data science 2
  • 3. B. Kégl / AppStat@LAL Learning to discover WHAT IS MACHINE LEARNING? • “The science of getting computers to act without being explicitly programmed” - Andrew Ng (Stanford/Coursera) • part of standard computer science curriculum since the 90s • inferring knowledge from data • generalizing to unseen data • usually no parametric model assumptions • emphasizing the computational challenges [Venn diagram: Machine Learning at the intersection of Statistics, Optimization, Artificial intelligence, Neuroscience, Cognitive science, Signal processing, Information theory, Statistical physics] 3
  • 4. B. Kégl / AppStat@LAL Learning to discover MACHINE LEARNING TAXONOMY • Supervised learning: non-parametric (model-free) input - output functions • classification (Trees, BDT, SVM, NN) - what you call MVA • regression (Trees, NN, Gaussian Processes) • Unsupervised learning: non-parametric data representation • clustering (k-means, spectral clustering, Dirichlet processes) • dimensionality reduction (PCA, ISOMAP, LLE, auto-associative NN) • density estimation (kernel density, Gaussian mixtures, the Boltzmann machine) • Reinforcement learning: • learning + dynamic control: learn to behave in an environment to maximize cumulative reward 4
  • 5. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION Character recognition 5
  • 6. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION Emotion recognition 6
  • 7. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION Speech recognition 7
  • 8. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION • Input: a usually high-dimensional vector x • Output: a category (label, class) y • Usually no parametric model • the classification function y = g(x) is learned using a training set D = {(x1, y1), . . . , (xn, yn)} • Well-tested algorithms: • neural networks, support vector machines, boosting 8
  • 9. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION The only goal is a low probability of error P(g(x) ≠ y) on previously unseen examples (x, y) 9
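The setup on slides 8-9 (learn g from a labeled training set D, then judge it only by its error on unseen examples) can be sketched in a few lines of Python. The Gaussian toy data and the nearest-class-mean rule below are illustrative stand-ins of my own, not HEP events or the BDT/NN algorithms named on the slides:

```python
import numpy as np

# Toy sketch of supervised classification: learn g from a labeled training set
# D = {(x_1, y_1), ..., (x_n, y_n)} and estimate the misclassification
# probability P(g(x) != y) on held-out examples. The data are synthetic
# 5-dimensional Gaussians; g is a simple nearest-class-mean rule.
rng = np.random.default_rng(0)
n = 2000
X = np.vstack([rng.normal(1.0, 1.0, (n, 5)),     # "signal" events
               rng.normal(-1.0, 1.0, (n, 5))])   # "background" events
y = np.array([1] * n + [0] * n)

idx = rng.permutation(2 * n)                     # shuffle, then split in half
train, test = idx[:n], idx[n:]

mu_s = X[train][y[train] == 1].mean(axis=0)      # class means learned on D
mu_b = X[train][y[train] == 0].mean(axis=0)

def g(x):
    # classify each row by the nearer class mean
    return (np.linalg.norm(x - mu_s, axis=1)
            < np.linalg.norm(x - mu_b, axis=1)).astype(int)

test_error = (g(X[test]) != y[test]).mean()      # estimate of P(g(x) != y)
```

The held-out error, not the training error, is the quantity the slide calls the only goal.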
  • 10. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Real time face detection 10
  • 11. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Real time web page ranking 11
  • 12. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Real time ad placement 12
  • 13. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Real time signal/background separation 13
  • 14. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION The second goal is the fast execution of g(x) 14
  • 15. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Trade-off between quality and speed 15
  • 16. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION • Time constraints • Memory constraints • Consumption constraints • Communication constraints 16
  • 17. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION Learning deep decision DAGs {djalel.benbouzid,busarobi,balazs.kegl}@ Original motivation: the common design, cascade classification (Stage 1 → Stage 2 → Stage 3 → Stage 4) = trigger with levels: easy background, medium background, hard background, signal/very hard background (Viola-Jones CVPR 2001) 17
  • 18. B. Kégl / AppStat@LAL Learning to discover THE LHCB TRIGGER • Collaboration with • Vava Gligorov (CERN) • Mike Williams (MIT) • Djalel Benbouzid (LAL) 18
  • 19. B. Kégl / AppStat@LAL Learning to discover THE LHCB TRIGGER • A beautifully complex problem • varying feature costs • cost may depend on the value • events are bags of overlapping candidates 19
  • 20. B. Kégl / AppStat@LAL Learning to discover THE LHCB TRIGGER [Diagram: features labeled with immediate, bag-dependent, and value-dependent costs: D0_VTX_FD, PiS_IP, D0C_1_IP, D0C_2_IP, D0C_1_PT, D0C_2_PT, PiS_PT, D0C_2_IPC, D0C_2_TFC, D0C_1_IPC, D0C_1_TFC, PiS_IPC, PiS_TFC, DstM, D0M, D0Tau] 20
  • 21. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [Diagram: features ordered by evaluation cost, from 0ms to 4ms and beyond: D0_VTX_FD, PiS_IP, D0C_1_IP, D0C_2_IP, PiS_PT, D0C_1_PT, D0C_2_PT, D0Tau, D0M, DstM, D0C_1_TFC, PiS_TFC, D0C_1_IPC, PiS_IPC, D0C_2_TFC, D0C_2_IPC; Background-like and Signal-like exit states] Easy background: EVALUATE 21
  • 22. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Easy background: QUIT 22
  • 23. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Easy background 23
  • 24. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard background: EVALUATE 24
  • 25. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard background: SKIP 25
  • 26. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard background: EVALUATE 26
  • 27. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard background: EVALUATE 27
  • 28. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard background: EVALUATE 28
  • 29. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard background: EVALUATE 29
  • 30. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard background: SKIP 30
  • 31. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard background: EVALUATE 31
  • 32. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard background: SKIP 32
  • 33. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard background: SKIP, SKIP, SKIP, SKIP 33
  • 34. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Easy signal: EVALUATE 34
  • 35. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Easy signal: QUIT 35
  • 36. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Easy signal 36
  • 37. B. Kégl / AppStat@LAL Learning to discover MDDAG: A SIGNAL/BCKG DECISION GRAPH (Benbouzid et al. ICML 2012) [same diagram] Hard signal: EVALUATE ×8, SKIP ×5 37
  • 38. B. Kégl / AppStat@LAL Learning to discover BUDGETED CLASSIFICATION • Classification with test-time constraints • An active research area due to IT applications • To be exploited for trigger design 38
  • 39. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY Challenge 1: the most precise estimation, in minimal CPU time, of the Higgs boson candidate mass as a function of the event observables, despite the unmeasured particles. Current precision (5-dimensional Markov-chain integration): ~20% in 0.1 s per event. The HiggsML challenge 39
  • 40. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY • In a nutshell • A vector x of variables is extracted from each event • A classifier g(x) is trained to separate signal from background • The background b is estimated in the selection region G = {x : g(x) = s} • Discovery is made when the number of real events n is significantly higher than b 40
  • 41. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY • Exciting physics • The Higgs to tau-tau excess is not yet at five sigma (Tech. Rep. ATLAS-CONF-2013-108) • Exciting data science (statistics and machine learning) • What is the theoretical relationship between classification and test sensitivity? • What is the quantitative criterion to optimize? • How to formally include systematic uncertainties? • Can we redesign classical algorithms (boosting, SVM, neural nets) for optimizing this criterion? 41
  • 42. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY We are organizing a data challenge to answer some of these questions Center for Data Science Paris-Saclay the HiggsML challenge May to September 2014 When High Energy Physics meets Machine Learning Joerg Stelzer - Atlas-CERN Marc Schoenauer - INRIA Balázs Kégl - Appstat-LAL Cécile Germain - TAO-LRI David Rousseau - Atlas-LAL Glen Cowan - Atlas-RHUL Isabelle Guyon - Chalearn Claire Adam-Bourdarios - Atlas-LAL Thorsten Wengler - Atlas-CERN Andreas Hoecker - Atlas-CERN Organization committee Advisory committee info to participate and compete: https://www.kaggle.com/c/higgs-boson 42
  • 43. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY The formal setup • We simulate data: D = {(x1, y1, w1), . . . , (xn, yn, wn)} • xi ∈ R^d is the feature vector • yi ∈ {background, signal} is the label • wi ∈ R+ is a non-negative weight (importance sampling) • let S = {i : yi = s} and B = {i : yi = b} be the index sets of signal and background events, respectively • Maximize the Approximate Median Significance (G. Cowan, K. Cranmer, E. Gross, and O. Vitells. EPJ C, 71:1554, 2011): AMS = √(2((s + b) ln(1 + s/b) − s)) ≈ s/√b, where Ĝ = {i : g(xi) = s}, s = Σ_{i ∈ S ∩ Ĝ} wi, b = Σ_{i ∈ B ∩ Ĝ} wi 43
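The AMS defined on this slide translates into a one-line Python helper (the function name is mine; s and b are the weighted signal and background counts in the selection region):

```python
import math

# The Approximate Median Significance of the slide above
# (Cowan, Cranmer, Gross, Vitells, EPJ C 71:1554, 2011).
def ams(s, b):
    """AMS = sqrt(2((s + b) ln(1 + s/b) - s)); approx s / sqrt(b) for s << b."""
    return math.sqrt(2 * ((s + b) * math.log(1 + s / b) - s))
```

For s much smaller than b the exact expression is very close to the familiar s/√b approximation.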
  • 44. B. Kégl / AppStat@LAL Learning to discover How to design g to maximize the sensitivity? • A two-stage approach 1. optimize a discriminant (score) function f : R^d → R using a classical learning algorithm (BDT, NN) CLASSIFICATION FOR DISCOVERY [Plot: score distributions of Signal and Background] 44
  • 45. B. Kégl / AppStat@LAL Learning to discover How to design g to maximize the sensitivity? • A two-stage approach 1. optimize a discriminant (score) function f : R^d → R using a classical learning algorithm (BDT, NN) CLASSIFICATION FOR DISCOVERY [Plot: score distributions of Signal and Background] 45
  • 46. B. Kégl / AppStat@LAL Learning to discover How to design g to maximize the sensitivity? • A two-stage approach 1. optimize a discriminant (score) function f : R^d → R using a classical learning algorithm (BDT, NN) CLASSIFICATION FOR DISCOVERY [Plot: score distributions of Signal and Background] 46
  • 47. B. Kégl / AppStat@LAL Learning to discover How to design g to maximize the sensitivity? • A two-stage approach (make figure with score) 1. optimize a discriminant (score) function f : R^d → R using a classical learning algorithm (BDT, NN) 2. define g(x) = sign(f(x) − θ) and optimize θ for maximizing the AMS CLASSIFICATION FOR DISCOVERY [Plot: AMS vs. threshold θ, maximum around 3.5σ] 47
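Stage 2 on the slide above, choosing the cut θ on the score f(x) that maximizes the AMS, can be sketched as a grid scan over candidate thresholds. All names, the toy score distributions, and the uniform weights below are my own illustration, not the challenge software:

```python
import numpy as np

# Scan a grid of thresholds theta on the score f(x); select events with
# score >= theta; keep the theta maximizing the AMS.
def ams(s, b):
    return np.sqrt(2 * ((s + b) * np.log(1 + s / b) - s))

def best_threshold(scores, labels, weights, n_grid=100):
    """Return (theta, AMS) maximizing the AMS over quantile-spaced thresholds."""
    thetas = np.quantile(scores, np.linspace(0.05, 0.99, n_grid))
    best_theta, best_ams = thetas[0], -np.inf
    for theta in thetas:
        sel = scores >= theta                     # selection region g(x) = s
        s = weights[sel & (labels == 1)].sum()    # weighted selected signal
        b = weights[sel & (labels == 0)].sum()    # weighted selected background
        if b > 0 and ams(s, b) > best_ams:
            best_theta, best_ams = theta, ams(s, b)
    return best_theta, best_ams

# toy demo: signal scores centered at +1, background at -1
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1, 1, 1000), rng.normal(-1, 1, 10000)])
labels = np.concatenate([np.ones(1000, dtype=int), np.zeros(10000, dtype=int)])
weights = np.full(11000, 0.1)                     # uniform importance weights
theta, best = best_threshold(scores, labels, weights)
```

The optimum sits far in the signal-like tail, which is exactly why the next slides worry about the variance of the AMS in that tiny region.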
  • 48. B. Kégl / AppStat@LAL Learning to discover Comparing with Atlas analysis • Atlas does a manual pre-selection (category), the first maximum of the AMS is completely eliminated. Why? CLASSIFICATION FOR DISCOVERY s = 250 b = 5000 ± 500! Systematics! 48
  • 49. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY How to handle systematic (model) uncertainties? • OK, so let's design an objective function that can take background systematics into consideration • Likelihood with unknown background b ~ N(µb, σb): L(µs, µb) = P(n, b | µs, µb, σb) = ((µs + µb)^n / n!) · e^(−(µs + µb)) · (1 / (√(2π) σb)) · e^(−(b − µb)² / (2σb²)) • Profile likelihood ratio: λ(0) = L(0, µ̂̂b) / L(µ̂s, µ̂b) • The new Approximate Median Significance (by Glen Cowan): AMS = √(2((s + b) ln((s + b) / b0) − s − b + b0) + (b − b0)² / σb²), where b0 = (1/2)(b − σb² + √((b − σb²)² + 4(s + b)σb²)) 49
  • 50. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY How to handle systematic (model) uncertainties? • The new Approximate Median Significance: AMS = √(2((s + b) ln((s + b) / b0) − s − b + b0) + (b − b0)² / σb²), where b0 = (1/2)(b − σb² + √((b − σb²)² + 4(s + b)σb²)) [Plot: New AMS vs. Old AMS, with the ATLAS working point] 50
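The systematics-aware AMS on these slides, with its auxiliary b0 term, translates directly into Python (a sketch; the function name is mine). A useful sanity check: as σb → 0, b0 → b and the expression reduces to the original AMS.

```python
import math

# Sketch of the AMS with a background systematic uncertainty sigma_b,
# following the formula on the slides above.
def ams_syst(s, b, sigma_b):
    var = sigma_b ** 2
    # b0 solves the profile-likelihood condition for the nuisance background
    b0 = 0.5 * (b - var + math.sqrt((b - var) ** 2 + 4 * (s + b) * var))
    return math.sqrt(2 * ((s + b) * math.log((s + b) / b0) - s - b + b0)
                     + (b - b0) ** 2 / var)
```

With a large σb the significance collapses, which is the quantitative version of "the 300 signals are submerged by the background systematics" on the backup slides.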
  • 51. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY [HiggsML challenge poster, as on slide 42] A tool for getting the ML community excited about your problem. OPEN since yesterday 51
  • 52. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY [HiggsML challenge poster] • Organizing committee • David Rousseau (ATLAS / LAL) • Balázs Kégl (AppStat / LAL) • Cécile Germain (LRI / UPSud) • Glen Cowan (ATLAS / Royal Holloway) • Claire Adam Bourdarios (ATLAS / LAL) • Isabelle Guyon (ChaLearn) 52
  • 53. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY [HiggsML challenge poster] • Official ATLAS GEANT4 simulations • 30 features (variables) • 250K training: input, label, weight • 100K public test (AMS displayed real-time), only input • 450K private test (to determine the winner after the closing of the challenge), only input • public and private test sets are shuffled, participants submit a vector of 550K labels • Using the “old” AMS • cannot compare participants if the metric varies a lot 53
  • 54. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY [HiggsML challenge poster] • 16K$ prize pool • 7-4-2K$ for the three top participants • HEP meets ML award for the most useful model, decided by the ATLAS members of the organizing committee 54
  • 55. B. Kégl / AppStat@LAL Learning to discover LEADERBOARD AS OF THIS MORNING 55
  • 56. B. Kégl / AppStat@LAL Learning to discover LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION Why don't we train a neural network on the raw ~10^5-10^8 dimensional signal of ATLAS? 56
  • 57. B. Kégl / AppStat@LAL Learning to discover LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION • Because it is notoriously difficult to automatically build a model = learn particle physics just by looking at the event browser • Again, you are not alone: it is also notoriously difficult to automatically build the model of natural scenes = learn a model of our surroundings by just looking at images • In the last 5-10 years we have been getting close • May be interesting if you do not know what you are looking for 57
  • 58. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? [Plot: photoelectron count PE vs. t [ns]] 58
  • 59. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? [Figure: granularity and hadronic cascades; start of hadronic showers in the SiW ECAL; complex and impressive: inelastic reaction in the SiW ECAL (interaction, initial pion, outgoing fragments, scattered pion, ejected nucleon); simple but nice: short truncated showers] 59
  • 60. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? 60
  • 61. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? 61
  • 62. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? [Plot: photoelectron count PE vs. t [ns]] 62
  • 63. B. Kégl / AppStat@LAL Learning to discover WHAT IS THIS? [Figure: granularity and hadronic cascades; start of hadronic showers in the SiW ECAL; complex and impressive: inelastic reaction in the SiW ECAL (interaction, initial pion, outgoing fragments, scattered pion, ejected nucleon); simple but nice: short truncated showers] 63
  • 64. B. Kégl / AppStat@LAL Learning to discover MODELS • Inference • if you want to be able to answer questions about observed phenomena, you need a model • if you want quantitative answers, you need a formal model • Formal setup • x: observation vector, Θ: parameter vector to infer • likelihood: p(x | Θ) • simulator: given Θ, generate a random x 64
  • 65. B. Kégl / AppStat@LAL Learning to discover A FORMAL MODEL [Figure: hierarchical model of the observatory; shower parameters (energy, direction, mass, X0, XMax, NMuMax, NMuTotal, LDF, asymmetry, S1000, S38, NMu1000, S, t0, risetime, jumps); Cherenkov light; ideal muon response (signal decay time τ, signal risetime td, muon arrival time tµ, muon tracklength Lµ, muon energy factor, average number of PEs per 1 m of tracklength); PE counts per bin (expected n̄i, observed ni); likelihood levels p(x | t), p(x | t1,...,t4), p(t | Θ)] 65
  • 66. B. Kégl / AppStat@LAL Learning to discover INFERENCE BY SAMPLING 66
  • 67. B. Kégl / AppStat@LAL Learning to discover HOW TO BUILD MODELS FOR THESE? [Figures: photoelectron trace (PE vs. t [ns]); granularity and hadronic cascades in the SiW ECAL; high granularity permits a detailed view into the hadronic shower] π 67
  • 68. B. Kégl / AppStat@LAL Learning to discover LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION • Training multi-layer neural networks • biological inspiration: we know the brain is multi-layer • appealing from a modeling point of view: abstraction increases with depth • notoriously difficult to train until Hinton et al. (stacked RBMs) and Bengio et al. (stacked autoencoders), around 2006 • the key principle is (was?) unsupervised pre-training • they remain computationally very expensive, but they learn high-level (abstract) features and they scale: with more data they learn more 68
  • 69. B. Kégl / AppStat@LAL Learning to discover • Google passes the “purring test” (ICML’12) • 16K cores watching 10M youtube stills for 3 days • completely unsupervised: the cat has just appeared as a useful concept to represent LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION 69
  • 70. B. Kégl / AppStat@LAL Learning to discover • Can we also learn physics by observing natural phenomena? LEARNING FROM SCRATCH: THE DEEP LEARNING REVOLUTION [Figure: hadronic cascades in the SiW ECAL, as on slide 59] 70
  • 71. B. Kégl / AppStat@LAL Learning to discover DEEP LEARNING FOR IMAGING CALORIMETERS • Collaboration with • Roman Poeschl (ILC / LAL) • Naomi van der Kolk (ILC / LAL) • Sviatoslav Bilokin (ILC / LAL) • Mehdi Cherti (AppStat / LAL) • Trong Hieu Tran (ILC / LLR) • Vincent Boudry (ILC / LLR) 71
  • 72. B. Kégl / AppStat@LAL Learning to discover FOOD FOR THOUGHT 1: CERN ACCELERATING SCIENCE Data Science: design of automated methods to analyze massive and complex data in order to extract useful information from them 72
  • 73. B. Kégl / AppStat@LAL Learning to discover FOOD FOR THOUGHT 1: CERN ACCELERATING SCIENCE Data science (statistics, machine learning, signal processing, data visualization, databases); Tool building (software engineering, clouds/grids, high-performance computing, optimization); Domain science (life, brain, earth, universe, human society) 73
  • 74. B. Kégl / AppStat@LAL Learning to discover You have been doing data science for a long time. Consider transferring knowledge to other sciences on how to organize large scientific projects around data FOOD FOR THOUGHT 1: CERN ACCELERATING SCIENCE 74
  • 75. B. Kégl / AppStat@LAL Learning to discover • What is a data challenge: • outreach to public? • peer-to-peer communication? FOOD FOR THOUGHT II: CERN ACCELERATING SCIENCE 75
  • 76. B. Kégl / AppStat@LAL Learning to discover • What is a data challenge: • between the two: communicating (translating!) your technical problems to another scientific community which might have solutions (and know-how to design solutions) for you • think about building formal channels (as you did for outreach and “classical” publications) FOOD FOR THOUGHT II: CERN ACCELERATING SCIENCE 76
  • 77. B. Kégl / AppStat@LAL Learning to discover THANK YOU! 77
  • 78. B. Kégl / AppStat@LAL Learning to discover BACKUP 78
  • 79. B. Kégl / AppStat@LAL Learning to discover CLASSIFICATION FOR DISCOVERY How to design g to maximize the sensitivity? • A two-stage approach 1. optimize a discriminant function f : R^d → R for balanced AUC or balanced classification error: N's = N'b = 0.5 [Plot: AdaBoost learning curves (Tree, N = 2, T up to 100000): balanced error vs. number of boosting iterations T] 79
  • 80. B. Kégl / AppStat@LAL Learning to discover Comparing with the official Atlas analysis • Atlas does a manual pre-selection, the first maximum of the AMS is completely eliminated. Why? Have we found something, or do they have an implicit reason? • No we haven't, yes they have • µb has a ~10% relative systematic uncertainty: the 300 signals are completely submerged by the σ = 600 background systematics [Plot: unnormalized ROC, s (TP) vs. b (FP), with AMS = 1σ, 2σ, 3σ, 4σ, 5σ contours] 80
  • 81. B. Kégl / AppStat@LAL Learning to discover The Higgs boson ML challenge • Dilemma: the physically relevant AMS is optimized in a tiny region • the AMS has a high variance: a bad measure to compare the participants • in the Atlas analysis, there was 1σ between the expected and measured significances 81
  • 82. B. Kégl / AppStat@LAL Learning to discover The Higgs boson ML challenge • We are even nervous about the original AMS (red), so we regularize it: AMS = √(2((s + b + breg) ln(1 + s / (b + breg)) − s)) with breg = 10 82
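The regularized AMS of this last slide simply shifts the background by breg inside the original formula, which tames the variance when the selected background is small. A sketch (function name is mine):

```python
import math

# Regularized AMS: the original AMS with b replaced by b + b_reg
# (the HiggsML challenge uses b_reg = 10).
def ams_reg(s, b, b_reg=10.0):
    return math.sqrt(2 * ((s + b + b_reg) * math.log(1 + s / (b + b_reg)) - s))
```

Unlike the unregularized version, this stays finite and well-behaved even when the selection region contains no background at all.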