Statistics and Fitting: overview and thoughts

Fitting and Statistics:
overview and thoughts
M. Marino
Group Meeting
22 Jan 2015
1

Black boxes are bad. As physicists, our goal is to
understand everything about the experiment as much as
possible. That includes (especially!) the analysis.
2

Goals for the next slides
• Key words - some you’ve heard before, some
perhaps not
• Explanation
• Practical examples
WARNING: may be
a bit pedantic
3

What I’m not going to talk
about
• The very basics… Gaussian, Poisson, Binomial are
all probably concepts you understand at least at
some level.
• Any sort of derivations
• A deep discussion of probability (e.g. Bayesian vs.
Frequentist)
4

Bread and butter
• Minimization:
• Maximum Likelihood
• χ² fits
• (Markov Chain Monte Carlo) - fitting models with
Bayesian statistics and MC methods
5

Maximum Likelihood
• ‘Maximize your likelihood function’
• (Actually, you typically minimize the -log L)
• You may choose a binning, but you don’t have to (so-called ‘unbinned’ ML fits, which are
important when e.g. a binning choice may bias your fit).
• The probability function f depends on the underlying statistics of your measurement; for
counting data it is usually Poisson. Note: you can of course multiply the likelihood function by
external information… this is how you build in external constraints.

$$L = \prod_{n=1}^{N} f(\theta, x_n)$$

N: number of points (bins)
f: probability for seeing x_n given θ
x_n: nth data point
θ: set of parameters
6
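
A minimal sketch of the likelihood product above, written as a sum of logs for an unbinned Gaussian sample. The toy sample (5000 points, mean 50, width 10), the seed and the function names are assumptions made for illustration. The closed-form Gaussian estimates stand in for a numerical minimizer here; in a real fit one would hand `nll` to something like MIGRAD.

```python
import math
import random


def log_gauss(x, mean, sigma):
    """log f(theta, x) for a Gaussian pdf with theta = (mean, sigma)."""
    z = (x - mean) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def nll(mean, sigma, data):
    """-log L = -sum_n log f(theta, x_n); no binning is involved."""
    return -sum(log_gauss(x, mean, sigma) for x in data)


# Toy data set (assumed values, not the data used in the talk).
random.seed(1)
data = [random.gauss(50.0, 10.0) for _ in range(5000)]

# For a Gaussian the maximum-likelihood estimates are known in closed form.
mean_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in data) / len(data))
nll_min = nll(mean_hat, sigma_hat, data)

print(f"mean = {mean_hat:.3f}, sigma = {sigma_hat:.3f}, -log L = {nll_min:.1f}")
```

Moving either parameter away from its estimate increases `nll`, which is all a minimizer relies on.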

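χ² fits
Start from the likelihood

$$L = \prod_{n=1}^{N} f(\theta, x_n)$$

assume f is Gaussian,

$$L = \prod_{n=1}^{N} C\, e^{-[x_n - y(\theta, x_n)]^2 / 2\sigma_n^2}$$

and take the -log L:

$$-\log L = \sum_{n=1}^{N} \frac{[x_n - y(\theta, x_n)]^2}{2\sigma_n^2} + \text{const}$$

This is χ² modulo a factor 2.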
7

χ² fits

$$\chi^2 = \sum_{n=1}^{N} \frac{[x_n - y(\theta, x_n)]^2}{\sigma_n^2}$$

• This is a χ² function. It assumes that your underlying statistics are
Gaussian. If this is not true (e.g. your statistics are low), your
results may not be correct.
• You must choose a binning, which can open you up to binning biases.
• This has some neat features, including a built-in “goodness-of-fit”
(more on this later).
8
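
As a rough illustration of the χ² sum, the sketch below bins a toy Gaussian sample and compares the bin contents with the model prediction. The sample, the binning (20 bins over ±2σ) and the choice σ_n² = observed count are all assumptions made for this example, not choices taken from the talk.

```python
import math
import random

# Toy data set (assumed values): 5000 Gaussian points, mean 50, width 10.
random.seed(2)
data = [random.gauss(50.0, 10.0) for _ in range(5000)]

# Binning choice: 20 bins over +-2 sigma, so every bin is well populated.
LOW, HIGH, NBINS = 30.0, 70.0, 20
WIDTH = (HIGH - LOW) / NBINS
counts = [0] * NBINS
for x in data:
    if LOW <= x < HIGH:
        counts[int((x - LOW) / WIDTH)] += 1


def expected(center, mean, sigma, num):
    """Model prediction: expected count in the bin centred at `center`."""
    pdf = math.exp(-0.5 * ((center - mean) / sigma) ** 2) / (
        sigma * math.sqrt(2.0 * math.pi)
    )
    return num * WIDTH * pdf


def chi2(mean, sigma, num):
    """Sum over bins of (observed - expected)^2 / sigma_n^2, sigma_n^2 = observed."""
    total = 0.0
    for i, n_obs in enumerate(counts):
        if n_obs == 0:
            continue  # sigma_n would be zero; such bins are skipped here
        center = LOW + (i + 0.5) * WIDTH
        total += (n_obs - expected(center, mean, sigma, num)) ** 2 / n_obs
    return total


chi2_true = chi2(50.0, 10.0, 5000)
print(f"chi2 at the true parameters = {chi2_true:.1f} for {NBINS} bins")
```

The bin range is deliberately narrow: with sparsely populated tail bins the Gaussian assumption behind σ_n would no longer hold.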

Ok, so how do I use this?
• Typically, you don’t have to build your own χ² or -log
L function; e.g. ROOT and RooFit will do this for you.
But sometimes you have to build your own.
• You minimize this function using a general-purpose
minimizer. ROOT uses MIGRAD, which is
part of the MINUIT2 suite. It is generally very robust.
• (The following uses the output from ROOT, but
generally other minimizers should give you similar
output.)
9

Examples: ML
Note: the fit was unbinned, but obviously the plotting needs a choice of bins.
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-18938.8 FROM HESSE  STATUS=OK  16 CALLS  65 TOTAL
EDM=5.46776e-08  STRATEGY= 1  ERROR MATRIX ACCURATE
EXT PARAMETER                               INTERNAL     INTERNAL
NO.  NAME    VALUE        ERROR        STEP SIZE    VALUE
 1   mean    4.99743e+01  1.42546e-01  2.70887e-03  4.99743e+01
 2   num     5.00000e+03  7.07084e+01  5.37579e-05  6.02076e-07
 3   sigma   1.00794e+01  1.00805e-01  3.83045e-04  1.00794e+01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX.  NDIM= 25  NPAR= 3  ERR DEF=0.5
 2.032e-02   5.071e-07   2.719e-06
 5.071e-07   5.000e+03  -1.795e-06
 2.719e-06  -1.795e-06   1.016e-02
PARAMETER CORRELATION COEFFICIENTS
NO.  GLOBAL    1       2       3
 1   0.00019   1.000   0.000   0.000
 2   0.00000   0.000   1.000  -0.000
 3   0.00019   0.000  -0.000   1.000
10

Examples: χ²
Fit has 100 bins
FCN=67.5572 FROM MIGRAD  STATUS=CONVERGED  43 CALLS  44 TOTAL
EDM=5.17307e-07  STRATEGY= 1  ERROR MATRIX ACCURATE
EXT PARAMETER                               STEP         FIRST
NO.  NAME    VALUE        ERROR        SIZE         DERIVATIVE
 1   mean    5.00336e+01  1.43872e-01  5.80851e-04   4.58652e-03
 2   num     5.03375e+03  7.09462e+01  5.73799e-05  -5.16551e-02
 3   sigma   1.02075e+01  9.98379e-02  4.02277e-04  -2.48082e-03
EXTERNAL ERROR MATRIX.  NDIM= 25  NPAR= 3  ERR DEF=1
 2.070e-02   3.374e-04  -2.595e-05
 3.374e-04   5.034e+03   1.873e-04
-2.595e-05   1.873e-04   9.968e-03
NO.  GLOBAL    1       2       3
 1   0.00181   1.000   0.000  -0.002
 2   0.00004   0.000   1.000   0.000
 3   0.00181  -0.002   0.000   1.000
11

Some comments
• The value of the minimized -log L is arbitrary: it depends on
constant terms and on the units of x, so only differences in -log L
are meaningful. In contrast, the minimized χ² gives you information
(goodness-of-fit, more later).
http://seal.web.cern.ch/seal/documents/minuit/mnerror.pdf
χ² fit: FCN=67.5572 FROM MIGRAD STATUS=CONVERGED 43 CALLS 44 TOTAL
ML fit: FCN=-18938.8 FROM HESSE STATUS=OK 16 CALLS 65 TOTAL
12

Some comments
• The correlation matrix (which is just a normalized
covariance matrix) tells you how strongly your parameters
are coupled to each other. Coefficients with a magnitude close to 1
(e.g. 0.95 and above) can be a cause for concern!
NO. GLOBAL 1 2 3
1 0.00019 1.000 0.000 0.000
2 0.00000 0.000 1.000 -0.000
3 0.00019 0.000 -0.000 1.000
NO. GLOBAL 1 2 3
1 0.00181 1.000 0.000 -0.002
2 0.00004 0.000 1.000 0.000
3 0.00181 -0.002 0.000 1.000
13
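
To make "normalized covariance matrix" concrete, this short sketch takes the external error matrix printed for the ML fit and divides each element by the product of the two parameter errors. The variable names are illustrative; the numbers are copied from the output above and carry only four significant digits.

```python
import math

# External error (covariance) matrix copied from the ML fit output;
# parameter order: mean, num, sigma.
cov = [
    [2.032e-02, 5.071e-07, 2.719e-06],
    [5.071e-07, 5.000e+03, -1.795e-06],
    [2.719e-06, -1.795e-06, 1.016e-02],
]
names = ["mean", "num", "sigma"]

# Parabolic errors are the square roots of the diagonal elements.
errors = [math.sqrt(cov[i][i]) for i in range(3)]

# Correlation matrix: rho_ij = V_ij / (sigma_i * sigma_j).
corr = [[cov[i][j] / (errors[i] * errors[j]) for j in range(3)] for i in range(3)]

for name, err, row in zip(names, errors, corr):
    print(f"{name:6s} error = {err:.5e}  " + "  ".join(f"{r:+.3f}" for r in row))
```

The square roots of the diagonal reproduce the ERROR column of the fit output to within rounding, and the off-diagonal coefficients come out essentially zero, as in the printed correlation table.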

Some comments
• What about the errors on the parameters and all
that information?
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 mean 4.99743e+01 1.42546e-01 2.70887e-03 4.99743e+01
2 num 5.00000e+03 7.07084e+01 5.37579e-05 6.02076e-07
3 sigma 1.00794e+01 1.00805e-01 3.83045e-04 1.00794e+01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=0.5
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 mean 5.00336e+01 1.43872e-01 5.80851e-04 4.58652e-03
2 num 5.03375e+03 7.09462e+01 5.73799e-05 -5.16551e-02
3 sigma 1.02075e+01 9.98379e-02 4.02277e-04 -2.48082e-03
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=1
14

Errors on parameters
• They are defined by the parameter values at which the χ² (-log
L) function has increased from its minimum by 1 (0.5). (This was the
“ERR DEF” seen on the previous slide.)
• If your measurement error bars are wrong, the errors
on your parameters will be wrong, too!
• Most of this is derived from the curvature of the -log L, or
χ², function: the second derivative matrix is calculated
at the minimum and inverted… (this always assumes the
shape at the minimum is parabolic). These errors are
not always appropriate, or accurate.
15
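
The "increase by 0.5" definition can be tried out directly. The sketch below is a simplified one-parameter case: a Gaussian sample whose width is assumed known and held fixed, so only the mean is fitted. It bisects for the shift in the mean at which -log L has risen by 0.5. The toy data, the fixed width and the helper `crossing` are assumptions for illustration.

```python
import math
import random

SIGMA = 10.0  # resolution, assumed known and held fixed in this sketch
random.seed(3)
data = [random.gauss(50.0, SIGMA) for _ in range(5000)]


def nll(mean):
    """-log L for a Gaussian with fixed width (constant terms dropped)."""
    return sum(0.5 * ((x - mean) / SIGMA) ** 2 for x in data)


mean_hat = sum(data) / len(data)  # minimum of nll for fixed SIGMA
nll_min = nll(mean_hat)


def crossing(err_def, step_max=5.0, iterations=60):
    """Bisect for the shift d > 0 with nll(mean_hat + d) - nll_min = err_def."""
    lo, hi = 0.0, step_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if nll(mean_hat + mid) - nll_min < err_def:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


error_up = crossing(0.5)  # ERR DEF = 0.5 for a -log L function
print(f"mean = {mean_hat:.3f} +- {error_up:.4f}")
print(f"expected sigma/sqrt(N) = {SIGMA / math.sqrt(len(data)):.4f}")
```

In this purely parabolic case the crossing agrees with the familiar σ/√N; for non-parabolic functions the upward and downward crossings can differ.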

Covariance Matrix (or Error Matrix)
16
$$V_{xy} = E[(x - \mu_x)(y - \mu_y)]$$
(This is shown later as Σ.)
Note: this is typically estimated during the
minimization process and is related to the curvature
of the -log L/χ² function with respect to the
parameters.

Covariance Matrix (or Error Matrix)
17
It is the inverse of the second derivative matrix of -log L:

$$(V^{-1})_{xy} = -\left.\frac{\partial^2 \log L}{\partial x\, \partial y}\right|_{x=\hat{x},\, y=\hat{y}}$$

evaluated at the minimum. (To convince yourself of this,
consider that L should have the form of a multivariate
Gaussian at/near the minimum, so that -log L is parabolic there.)
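
A small numerical check of this statement, under stated assumptions: a toy Gaussian sample, the closed-form location of the minimum, and a hand-written central finite difference in place of what HESSE does internally. The second-derivative matrix of -log L is built for (mean, sigma) and its 2x2 inverse is taken as the covariance matrix.

```python
import math
import random

random.seed(4)
data = [random.gauss(50.0, 10.0) for _ in range(5000)]
n = len(data)


def nll(mean, sigma):
    """-log L for a Gaussian sample (constant terms dropped)."""
    return n * math.log(sigma) + sum(0.5 * ((x - mean) / sigma) ** 2 for x in data)


# Location of the minimum (closed form for a Gaussian).
mean_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in data) / n)
best = [mean_hat, sigma_hat]


def second_derivative(i, j, h=1e-2):
    """Central finite-difference estimate of d2(nll)/dp_i dp_j at the minimum."""
    def shifted(di, dj):
        p = list(best)
        p[i] += di
        p[j] += dj
        return nll(*p)
    return (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)


# Second-derivative matrix and its inverse, the covariance matrix.
hess = [[second_derivative(i, j) for j in range(2)] for i in range(2)]
det = hess[0][0] * hess[1][1] - hess[0][1] * hess[1][0]
cov = [
    [hess[1][1] / det, -hess[0][1] / det],
    [-hess[1][0] / det, hess[0][0] / det],
]

print(f"error on mean  = {math.sqrt(cov[0][0]):.4f}")
print(f"error on sigma = {math.sqrt(cov[1][1]):.4f}")
```

For this model the results should match the analytic values σ/√N for the mean and σ/√(2N) for the width, which is also the size of the errors in the fit output shown earlier.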

Profiling
• Procedure: fix the parameter(s) of interest at a value and minimize the
-log L (or χ²) with respect to all other parameters. Step the parameter(s) of interest to the next value and repeat.
• This traces out the minimum of the function along the parameter(s) of
interest.
• See also: likelihood ratio, profile likelihood scan, chi-square profile.
(Plot annotation: this is a -log L function, so the 1σ error is at 0.5.)
18
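
The procedure can be sketched in a few lines. Here the mean of a toy Gaussian sample is the parameter of interest and the width is the only other parameter; for each fixed mean, -log L is minimized over the width with a simple golden-section search. The search routine, the scan grid and the toy data are assumptions chosen to keep the example self-contained; a real analysis would let MINUIT or RooFit do the inner minimization.

```python
import math
import random

random.seed(5)
data = [random.gauss(50.0, 10.0) for _ in range(2000)]
n = len(data)


def nll(mean, sigma):
    """-log L for a Gaussian sample (constant terms dropped)."""
    return n * math.log(sigma) + sum(0.5 * ((x - mean) / sigma) ** 2 for x in data)


def minimize_1d(func, a, b, tol=1e-5):
    """Golden-section search; returns the minimum value of a unimodal func on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return func(0.5 * (a + b))


def profile_nll(mean):
    """Fix the parameter of interest (mean), minimize over the other one (sigma)."""
    return minimize_1d(lambda sigma: nll(mean, sigma), 1.0, 30.0)


mean_hat = sum(data) / n
nll_min = profile_nll(mean_hat)

# Scan the parameter of interest and record the rise above the minimum.
scan = [mean_hat + 0.05 * k for k in range(-10, 11)]
delta = [profile_nll(m) - nll_min for m in scan]

# Points with delta <= 0.5 lie inside the 1 sigma interval of a -log L scan.
inside = [m for m, dn in zip(scan, delta) if dn <= 0.5]
print(f"mean = {mean_hat:.3f}, 1 sigma interval ~ [{inside[0]:.2f}, {inside[-1]:.2f}]")
```

Plotting `delta` against `scan` gives the usual profile curve; the interval edges are only as fine as the scan step.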

A 2-D example of a profile scan
[Figure: 2D profile likelihood surface. x-axis: Rn in air gap (500 to 5000); y-axis: U in HFE (500 to 5000); color scale: ProfileNLL (0 to 50).]
FIG. 29. 2D profile likelihood surface as a function of different contribution of U-like components
Shows correlation of two parameters, as well as the
combined error.
19

How do I tell if the fit is ok?
• Goodness of fit!
• χ²: this is built in. (Note: you should always quote both the χ²
and the number of degrees of freedom (NDF), never only
the reduced chi-square.) Also, as a shortcut, we usually
think χ²/NDF ~ 1 is “good”, but remember that larger
deviations from 1 are expected for small NDF, and only
small deviations are tolerated for large NDF.
• You should use a chi-squared distribution lookup to get
the appropriate probability for a given (χ², NDF) pair,
e.g. in ROOT: TMath::Prob(chi2, ndf)
20
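
For readers without ROOT at hand, the same lookup can be sketched in plain Python. The function below evaluates the upper-tail χ² probability for integer NDF using the standard closed-form finite sums; it is an illustrative stand-in, not ROOT's implementation, and `scipy.stats.chi2.sf(chi2, ndf)` computes the same quantity. The value NDF = 97 for the χ² fit shown earlier is an assumption (100 bins minus 3 parameters, which requires that every bin entered the fit).

```python
import math


def chi2_prob(chi2, ndf):
    """Upper-tail probability P(X >= chi2) for integer ndf >= 1."""
    if chi2 <= 0.0:
        return 1.0
    chi = math.sqrt(chi2)
    density = math.exp(-0.5 * chi2) / math.sqrt(2.0 * math.pi)  # unit Gaussian at chi
    if ndf % 2 == 0:
        # Even ndf: exp(-chi2/2) * sum_{k < ndf/2} (chi2/2)^k / k!
        term, total = 1.0, 1.0
        for r in range(1, ndf // 2):
            term *= chi2 / (2.0 * r)
            total += term
        return math.sqrt(2.0 * math.pi) * density * total
    # Odd ndf: two-sided Gaussian tail plus a finite sum over odd powers of chi.
    term, total = chi, 0.0
    for r in range(1, (ndf + 1) // 2):
        total += term
        term *= chi2 / (2.0 * r + 1.0)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total


# chi2 fit from the slides: FCN = 67.5572 with 100 bins and 3 free parameters.
# NDF = 100 - 3 = 97 is an assumption (it requires that every bin was used).
chi2_min, ndf = 67.5572, 97
print(f"chi2/NDF = {chi2_min:.1f}/{ndf}, probability = {chi2_prob(chi2_min, ndf):.3f}")
```

The same χ²/NDF ratio gives very different probabilities at different NDF, which is why both numbers should be quoted.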

Chi-Sq PDF with different NDFs:
(The minimum of a χ² function is so distributed.)
https://en.wikipedia.org/wiki/Chi-squared_distribution
21

Chi-Sq CDF with different NDFs:
(The minimum of a χ² function is so distributed.)
https://en.wikipedia.org/wiki/Chi-squared_distribution
22

• For ML fits, you can also calculate a chi-square
(but you must first choose a binning).
• Other “goodness of fit” tests, or rather “tests to see
whether the data are consistent with being drawn from your fit model”:
e.g. the Kolmogorov-Smirnov test.
• https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
23
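
A hedged sketch of a Kolmogorov-Smirnov test for an unbinned sample: it computes the largest distance between the empirical CDF and the model CDF, then converts it into a probability with the asymptotic Kolmogorov series. The toy data, the Gaussian model and the function names are assumptions for illustration, and the large-sample formula is used without small-sample corrections.

```python
import math
import random


def gauss_cdf(x, mean, sigma):
    """Cumulative distribution function of the fit model (a Gaussian here)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF of the data and the model CDF."""
    xs = sorted(data)
    n = len(xs)
    dist = 0.0
    for i, x in enumerate(xs):
        model = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        dist = max(dist, (i + 1) / n - model, model - i / n)
    return dist


def ks_pvalue(dist, n):
    """Asymptotic (large n) probability of a distance at least this large."""
    lam = math.sqrt(n) * dist
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the value is ~1
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam) for j in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


random.seed(6)
data = [random.gauss(50.0, 10.0) for _ in range(1000)]

d_good = ks_statistic(data, lambda x: gauss_cdf(x, 50.0, 10.0))
d_bad = ks_statistic(data, lambda x: gauss_cdf(x, 55.0, 10.0))
print(f"correct model: D = {d_good:.3f}, p = {ks_pvalue(d_good, len(data)):.3f}")
print(f"shifted model: D = {d_bad:.3f}, p = {ks_pvalue(d_bad, len(data)):.2e}")
```

One caveat: this probability is only strictly valid when the model parameters were not tuned on the same data; after a fit it tends to look better than it should.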

Adding external information
How do I incorporate other measurements, systematic errors?

$$L = \prod_{n=1}^{N} f(\theta, x_n)$$

• You’ll hear talk of adding “penalty functions” or “weighting functions” to your
-log L or χ² function.
• Really, all that’s being done is a multiplication of the likelihood by an
additional probability.
• The exact form used depends on your problem; a Gaussian is a typical
choice.
24

$$L = \prod_{n=1}^{N} f(\theta, x_n)$$

Multivariate Gaussian. Here, Σ is a positive definite
(symmetric) covariance matrix.
25
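
For reference, a likelihood multiplied by a multivariate Gaussian constraint can be written as below. This is the textbook form of the density rather than a formula recovered from the slides. The dimension k, the vector **x** (the constrained parameters, not the data points x_n) and the vector **µ** (their expected values) are notation assumed here.

$$L = \left[\prod_{n=1}^{N} f(\theta, x_n)\right] \times \frac{1}{\sqrt{(2\pi)^k\,|\Sigma|}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right]$$

Taking -log L, the constraint contributes the term $\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})$ plus a constant that does not affect the minimization.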

For two variables, it looks like
26

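which means your -log L function gets an additional
term added (ρ is the correlation between the parameters):

$$\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho\,(x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y}\right]$$

(for χ², this term multiplied by 2 is added)
27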

For just a single parameter (or a set of uncorrelated
parameters, with one such term per parameter), one just adds

$$\frac{1}{2}\,\frac{(x-\mu_x)^2}{\sigma_x^2}$$

x: parameter of interest
µ_x: expected (measured?) value
σ_x: (measured) error on this value
28
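
A minimal sketch of such a single-parameter constraint: a small toy data set measures a mean, an external measurement (µ_x ± σ_x) is added as a penalty term to -log L, and the sum is minimized. All numbers, the fixed resolution and the golden-section helper are assumptions for illustration. For this Gaussian case the result can be checked against the inverse-variance weighted average.

```python
import math
import random

SIGMA = 10.0               # known resolution of each data point (assumed)
MU_X, SIGMA_X = 52.0, 0.5  # external measurement of the parameter (assumed)

random.seed(7)
data = [random.gauss(50.0, SIGMA) for _ in range(20)]
n = len(data)


def nll_data(theta):
    """-log L of the data alone (Gaussian, constant terms dropped)."""
    return sum(0.5 * ((x - theta) / SIGMA) ** 2 for x in data)


def nll_total(theta):
    """Data term plus the Gaussian constraint (theta - mu_x)^2 / (2 sigma_x^2)."""
    return nll_data(theta) + 0.5 * ((theta - MU_X) / SIGMA_X) ** 2


def minimize_1d(func, a, b, tol=1e-6):
    """Golden-section search; returns the position of the minimum on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)


theta_fit = minimize_1d(nll_total, 0.0, 100.0)

# Cross-check: inverse-variance weighted average of data mean and constraint.
x_bar = sum(data) / n
w_data, w_ext = n / SIGMA ** 2, 1.0 / SIGMA_X ** 2
expected = (w_data * x_bar + w_ext * MU_X) / (w_data + w_ext)

print(f"data alone: {x_bar:.3f}, constrained fit: {theta_fit:.3f}, expected: {expected:.3f}")
```

With a tight external error the fit is pulled almost entirely to µ_x; loosening σ_x hands control back to the data.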

That’s it, for now…
There’s a lot more to learn. If your program/fit gives you output
you don’t understand, try to understand it!
29

Some statistics books, papers, etc.
G. Cowan, Statistical Data Analysis, Clarendon, Oxford, 1998
R.J. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, Wiley, 1989
Ilya Narsky and Frank C. Porter, Statistical Analysis Techniques in Particle Physics, Wiley, 2014
L. Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986
F. James, Statistical and Computational Methods in Experimental Physics, 2nd ed., World Scientific, 2006
S. Brandt, Statistical and Computational Methods in Data Analysis, Springer, New York, 1998 (with program library on CD)
J. Beringer et al. (Particle Data Group), Review of Particle Physics, Phys. Rev. D86, 010001 (2012); see also pdg.lbl.gov sections on probability, statistics, Monte Carlo
From: http://www.pp.rhul.ac.uk/~cowan/stat/stat_1.pdf
Have a look at this course series!
A python script with some of the ROOT/RooFit info:
https://gist.github.com/mgmarino/9c030c67072e4295d6ec
30
