Fitting and Statistics:
overview and thoughts
M. Marino
Group Meeting
22 Jan 2015
1
Black boxes are bad. As physicists, our goal is to
understand everything about the experiment as much as
possible. That includes (especially!) the analysis.
2
Goals for the next slides
• Key words - some you’ve heard before, some
perhaps not
• Explanation
• Practical examples
WARNING: may be
a bit pedantic
3
What I’m not going to talk
about
• The very basics… Gaussian, Poisson, Binomial are
all probably concepts you understand at least at
some level.
• Any sort of derivations
• A deep discussion of probability (e.g. Bayesian vs.
Frequentist)
4
Bread and butter
• Minimization:
• Maximum Likelihood
• χ2 - fits
• (Markov Chain Monte Carlo) - fitting models with
Bayesian statistics and MC methods
5
Maximum Likelihood
• ‘Maximize your likelihood function’
• (Actually, you typically minimize the -log L)
• You may choose a binning, but you don’t have to. (so-called ‘unbinned’ ML fits…
important when e.g. a binning choice may bias your fits.)
• The probability functions depend upon the underlying statistics; for counting data they are
typically Poisson. Note: you can of course ‘multiply’ the likelihood function with
external information… this is how you build in external constraints.
$$L = \prod_{n=1}^{N} f(\theta, x_n)$$
N: number of points (bins)
f: probability for seeing x_n given θ
x_n: nth data point
θ: set of parameters
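As a minimal sketch of the product above, the following builds an unbinned -log L for a Gaussian f. The sample, the seed, and the function name `gaussian_nll` are illustrative assumptions and are not taken from the slides; unlike the ROOT example shown later, there is no `num` (normalization) parameter here.

```python
import math
import random

def gaussian_nll(data, mean, sigma):
    """-log L for independent Gaussian points: L = prod_n f(theta, x_n)."""
    norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(0.5 * ((x - mean) / sigma) ** 2 + norm for x in data)

# Toy data set (assumed for illustration): 5000 points, mean 50, sigma 10.
random.seed(1)
data = [random.gauss(50.0, 10.0) for _ in range(5000)]

# For a Gaussian the minimum of -log L is known in closed form.
mean_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in data) / len(data))
nll_min = gaussian_nll(data, mean_hat, sigma_hat)

print(f"mean = {mean_hat:.3f}, sigma = {sigma_hat:.3f}, -log L = {nll_min:.1f}")
```

No binning enters anywhere: every data point contributes its own factor to the likelihood.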
6
χ2 fits
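$$L = \prod_{n=1}^{N} f(\theta, x_n)$$
Assume f is Gaussian and take the -log L:
$$L = \prod_{n=1}^{N} C\, e^{-[x_n - y(\theta, x_n)]^2 / 2\sigma_n^2}$$
$$-\log L = \sum_{n=1}^{N} \frac{(x_n - y(\theta, x_n))^2}{2\sigma_n^2} + \text{const}$$
This is χ2 modulo a factor 2 (and an additive constant that does not
depend on θ).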
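The relation between the Gaussian -log L and χ2 can be checked numerically. This is a sketch with made-up measurements and a hypothetical straight-line model y = slope · t; only the relation itself comes from the slide.

```python
import math

def chi2(xs, ys, sigmas):
    return sum(((x - y) / s) ** 2 for x, y, s in zip(xs, ys, sigmas))

def gaussian_nll(xs, ys, sigmas):
    return sum(0.5 * ((x - y) / s) ** 2 + math.log(s * math.sqrt(2.0 * math.pi))
               for x, y, s in zip(xs, ys, sigmas))

measured = [1.2, 1.9, 3.4, 3.8]      # made-up data points x_n
sigmas = [0.2, 0.2, 0.3, 0.3]        # made-up errors sigma_n
const = sum(math.log(s * math.sqrt(2.0 * math.pi)) for s in sigmas)

offsets = []
for slope in (0.8, 1.0, 1.2):
    model = [slope * t for t in (1, 2, 3, 4)]
    diff = gaussian_nll(measured, model, sigmas) - 0.5 * chi2(measured, model, sigmas)
    offsets.append(diff - const)     # should vanish for every parameter value

print(offsets)
```

Because the difference is the same constant for every slope, both functions have their minimum at the same parameter values.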
7
χ2 fits
$$\chi^2 = \sum_{n=1}^{N} \frac{(x_n - y(\theta, x_n))^2}{\sigma_n^2}$$
• This is a χ2 function. It assumes that your underlying statistics are
Gaussian. If this is not true (e.g. your statistics are low) your
results may not be correct.
• You must choose a binning, which can open you up to binning biases
• This has some neat features, including a built-in “Goodness-of-fit”
(more on this later)
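A small sketch of how low statistics can bias a χ2 fit. It assumes a constant-rate model and the common choice σ_n² = n_n (observed bin content, sometimes called Neyman's χ2); the slides do not specify how the bin errors are chosen, and the bin contents below are made up.

```python
counts = [3, 5, 2, 7, 4, 6, 1, 4]  # made-up low-statistics bin contents

# Poisson maximum likelihood estimate of a constant rate: the arithmetic mean.
mu_ml = sum(counts) / len(counts)

# Chi-square with sigma_n^2 = n_n:
# d/dmu sum_n (n_n - mu)^2 / n_n = 0  ->  mu = N / sum_n (1 / n_n)
mu_chi2 = len(counts) / sum(1.0 / n for n in counts)

print(f"ML estimate:   {mu_ml:.3f}")
print(f"chi2 estimate: {mu_chi2:.3f}")
```

With these errors the χ2 estimate is a harmonic mean, which is pulled towards the bins that fluctuated low; the effect fades as the bin contents grow.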
8
Ok, so how do I use this?
• Typically, you don’t have to build your own χ2 or -log
L function: tools such as ROOT and RooFit will do this for
you. But sometimes you have to build your own.
• You minimize this function using a numerical
minimizer. ROOT uses MIGRAD, which is
part of the MINUIT2 suite. It is generally very robust.
• (The following uses the output from ROOT, but
generally other minimizers should give you similar
output.)
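For the case where you do build your own function, here is a minimal sketch: a hand-written -log L for exponentially distributed decay times, minimized with a simple golden-section search. The search is a stand-in chosen to keep the example self-contained; it is not MIGRAD, and the data and names are assumptions for illustration.

```python
import math

def golden_section_minimize(func, lo, hi, tol=1e-8):
    """Minimize a unimodal 1-D function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if func(c) < func(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)

decay_times = [0.3, 1.7, 0.9, 2.4, 0.6, 1.1, 3.2, 0.2, 1.5, 0.8]  # made up

def nll(tau):
    # Exponential pdf f(t; tau) = exp(-t / tau) / tau
    return sum(math.log(tau) + t / tau for t in decay_times)

tau_hat = golden_section_minimize(nll, 0.1, 10.0)
print(f"fitted tau = {tau_hat:.4f}")
```

For this model the answer is known analytically (the mean of the decay times), which makes it a convenient check before trusting a minimizer on a harder problem.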
9
Examples: ML
Note: the fit was unbinned, but the plot necessarily needs a
choice of bins.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=-18938.8 FROM HESSE     STATUS=OK     16 CALLS     65 TOTAL
 EDM=5.46776e-08    STRATEGY= 1    ERROR MATRIX ACCURATE
  EXT PARAMETER                              INTERNAL      INTERNAL
  NO.   NAME     VALUE         ERROR         STEP SIZE     VALUE
   1    mean     4.99743e+01   1.42546e-01   2.70887e-03   4.99743e+01
   2    num      5.00000e+03   7.07084e+01   5.37579e-05   6.02076e-07
   3    sigma    1.00794e+01   1.00805e-01   3.83045e-04   1.00794e+01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM= 25    NPAR= 3    ERR DEF=0.5
  2.032e-02   5.071e-07   2.719e-06
  5.071e-07   5.000e+03  -1.795e-06
  2.719e-06  -1.795e-06   1.016e-02
 PARAMETER CORRELATION COEFFICIENTS
  NO.   GLOBAL     1       2       3
   1    0.00019    1.000   0.000   0.000
   2    0.00000    0.000   1.000  -0.000
   3    0.00019    0.000  -0.000   1.000
10
Examples: χ2
Fit has 100 bins
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=67.5572 FROM MIGRAD     STATUS=CONVERGED     43 CALLS     44 TOTAL
 EDM=5.17307e-07    STRATEGY= 1    ERROR MATRIX ACCURATE
  EXT PARAMETER                              STEP          FIRST
  NO.   NAME     VALUE         ERROR         SIZE          DERIVATIVE
   1    mean     5.00336e+01   1.43872e-01   5.80851e-04   4.58652e-03
   2    num      5.03375e+03   7.09462e+01   5.73799e-05  -5.16551e-02
   3    sigma    1.02075e+01   9.98379e-02   4.02277e-04  -2.48082e-03
 EXTERNAL ERROR MATRIX.    NDIM= 25    NPAR= 3    ERR DEF=1
  2.070e-02   3.374e-04  -2.595e-05
  3.374e-04   5.034e+03   1.873e-04
 -2.595e-05   1.873e-04   9.968e-03
 PARAMETER CORRELATION COEFFICIENTS
  NO.   GLOBAL     1       2       3
   1    0.00181    1.000   0.000  -0.002
   2    0.00004    0.000   1.000   0.000
   3    0.00181   -0.002   0.000   1.000
11
Some comments
• The value of the minimized -log L is arbitrary. In
contrast, the minimized χ2 gives you information
(goodness-of-fit, more later)
http://seal.web.cern.ch/seal/documents/minuit/mnerror.pdf
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=67.5572 FROM MIGRAD STATUS=CONVERGED 43 CALLS 44 TOTAL
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=-18938.8 FROM HESSE STATUS=OK 16 CALLS 65 TOTAL
12
Some comments
• The correlation matrix (which is just a normalized
covariance matrix) tells you how strongly your
parameters are correlated. Values with magnitude close to 1
(e.g. 0.95 and above) can be a cause for concern!
http://seal.web.cern.ch/seal/documents/minuit/mnerror.pdf
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3
1 0.00019 1.000 0.000 0.000
2 0.00000 0.000 1.000 -0.000
3 0.00019 0.000 -0.000 1.000
PARAMETER CORRELATION COEFFICIENTS
NO. GLOBAL 1 2 3
1 0.00181 1.000 0.000 -0.002
2 0.00004 0.000 1.000 0.000
3 0.00181 -0.002 0.000 1.000
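The normalization can be checked directly. This sketch takes the EXTERNAL ERROR MATRIX printed for the χ2 fit, reads the parameter errors off the diagonal, and divides each element by the product of the two errors; the variable names are illustrative.

```python
import math

# External error matrix from the chi2 fit output (order: mean, num, sigma).
cov = [
    [2.070e-02, 3.374e-04, -2.595e-05],
    [3.374e-04, 5.034e+03, 1.873e-04],
    [-2.595e-05, 1.873e-04, 9.968e-03],
]

# Parameter errors are the square roots of the diagonal elements.
errors = [math.sqrt(cov[i][i]) for i in range(3)]

# Correlation coefficient: rho_ij = V_ij / (sigma_i * sigma_j)
corr = [[cov[i][j] / (errors[i] * errors[j]) for j in range(3)] for i in range(3)]

for name, err in zip(("mean", "num", "sigma"), errors):
    print(f"{name:5s} error = {err:.4g}")
for row in corr:
    print(" ".join(f"{value:6.3f}" for value in row))
```

The errors agree with the ERROR column of the fit output to the precision of the printed matrix, and the mean-sigma element rounds to the -0.002 shown in the correlation table.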
13
Some comments
• What about the errors on the parameters and all
that information?
http://seal.web.cern.ch/seal/documents/minuit/mnerror.pdf
EXT PARAMETER INTERNAL INTERNAL
NO. NAME VALUE ERROR STEP SIZE VALUE
1 mean 4.99743e+01 1.42546e-01 2.70887e-03 4.99743e+01
2 num 5.00000e+03 7.07084e+01 5.37579e-05 6.02076e-07
3 sigma 1.00794e+01 1.00805e-01 3.83045e-04 1.00794e+01
ERR DEF= 0.5
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=0.5
EXT PARAMETER STEP FIRST
NO. NAME VALUE ERROR SIZE DERIVATIVE
1 mean 5.00336e+01 1.43872e-01 5.80851e-04 4.58652e-03
2 num 5.03375e+03 7.09462e+01 5.73799e-05 -5.16551e-02
3 sigma 1.02075e+01 9.98379e-02 4.02277e-04 -2.48082e-03
EXTERNAL ERROR MATRIX. NDIM= 25 NPAR= 3 ERR DEF=1
14
Errors on parameters
• Parameter errors are defined by those parameter values which
increase the χ2 (-log L) function from its minimum by 1 (0.5).
(This was the “ERR DEF” seen on the previous slide.)
• If your measurement error bars are wrong, the errors
on your parameters will be wrong, too!
• Most of this is derived from the curvature of the -log L, or
χ2, function: the second derivative matrix is calculated
at the minimum and inverted (this always assumes the
shape at the minimum is parabolic). These errors are
not always appropriate, or accurate.
http://seal.web.cern.ch/seal/documents/minuit/mnerror.pdf
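To illustrate the difference between the "rise by 0.5" definition and the parabolic (curvature) error, here is a sketch for a single Poisson count. The observed count and the bisection helper `crossing` are assumptions made for the example.

```python
import math

n_obs = 5  # hypothetical observed count

def nll(mu):
    # Poisson -log L, dropping the mu-independent log(n!) term
    return mu - n_obs * math.log(mu)

def crossing(target, lo, hi, tol=1e-10):
    """Bisect for nll(mu) == target, assuming exactly one crossing in [lo, hi]."""
    f_lo = nll(lo) - target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (nll(mid) - target) * f_lo > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu_hat = float(n_obs)
target = nll(mu_hat) + 0.5          # ERR DEF = 0.5 for a -log L
mu_low = crossing(target, 0.01, mu_hat)
mu_high = crossing(target, mu_hat, 50.0)

parabolic = math.sqrt(n_obs)        # from the second derivative n / mu^2 at mu_hat
print(f"parabolic: +/- {parabolic:.3f}")
print(f"scan:      -{mu_hat - mu_low:.3f} / +{mu_high - mu_hat:.3f}")
```

The scanned interval is asymmetric, while the parabolic error is symmetric by construction; the two agree only when the function really is a parabola around its minimum.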
15
Covariance Matrix (or Error Matrix)
$$V_{xy} = E[(x - \mu_x)(y - \mu_y)]$$
(This is shown later as Σ.)
Note: this is typically estimated during the
minimization process and is related to the curvature
of the -log L/χ2 function with respect to the
parameters
16
Covariance Matrix (or Error Matrix)
It is the inverse of the second derivative matrix:
$$(V^{-1})_{xy} = -\left.\frac{\partial^2 \log L}{\partial x\,\partial y}\right|_{x=\hat{x},\,y=\hat{y}}$$
evaluated at the minimum of -log L. (To convince yourself of this,
consider that L should have the form of a multivariate
Gaussian near that point, so that -log L is a paraboloid there.)
17
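A sketch of this recipe on a toy problem: compute the second derivative matrix of -log L numerically at the minimum and invert it. The Gaussian sample, the finite-difference step `h`, and the helper `hessian` are assumptions for illustration; MINUIT's HESSE does this far more carefully.

```python
import math
import random

random.seed(2)
data = [random.gauss(50.0, 10.0) for _ in range(2000)]  # toy sample
n = len(data)

def nll(mean, sigma):
    # Gaussian -log L without the parameter-independent constant
    return sum(0.5 * ((x - mean) / sigma) ** 2 for x in data) + n * math.log(sigma)

mean_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mean_hat) ** 2 for x in data) / n)

def hessian(f, x, y, h=1e-3):
    """Central finite-difference second derivatives of f at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

fxx, fxy, fyy = hessian(nll, mean_hat, sigma_hat)
det = fxx * fyy - fxy**2
cov = [[fyy / det, -fxy / det], [-fxy / det, fxx / det]]  # inverse of the 2x2 matrix

err_mean = math.sqrt(cov[0][0])
err_sigma = math.sqrt(cov[1][1])
print(f"error on mean : {err_mean:.4f}  (analytic {sigma_hat / math.sqrt(n):.4f})")
print(f"error on sigma: {err_sigma:.4f}  (analytic {sigma_hat / math.sqrt(2 * n):.4f})")
```

The same analytic expressions, σ/√N and σ/√(2N), reproduce the errors on `mean` and `sigma` in the ML fit output shown earlier (N = 5000, σ ≈ 10.08).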
Profiling
• Procedure: fix the parameter(s) of interest, and minimize the
-log L (or χ2) with respect to all other parameters. Step the
parameter(s) of interest to the next scan value and repeat.
• This finds the minimum contour as a function of the parameter(s) of
interest.
• See also: likelihood ratio, profile likelihood scan, chi-square profile.
(Plot annotation: this is a -log L function, so the 1σ error is where
it has risen by 0.5 above its minimum.)
18
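The procedure can be sketched as follows for a Gaussian sample, with the mean as the parameter of interest and sigma as the parameter that is re-minimized at every scan point. As a simplification assumed here, that inner minimization is done analytically rather than with a numerical minimizer; the data and scan grid are made up.

```python
import math
import random

random.seed(3)
data = [random.gauss(50.0, 10.0) for _ in range(1000)]  # toy sample
n = len(data)

def profiled_nll(mean):
    """-log L at fixed mean, minimized over sigma (done analytically here)."""
    sigma2 = sum((x - mean) ** 2 for x in data) / n   # best sigma^2 for this mean
    return 0.5 * n * math.log(sigma2) + 0.5 * n

mean_hat = sum(data) / n
nll_min = profiled_nll(mean_hat)

# Scan the parameter of interest and record the rise above the minimum.
scan = []
for i in range(-40, 41):
    mean = mean_hat + 0.025 * i
    scan.append((mean, profiled_nll(mean) - nll_min))

# 1 sigma interval: scan points where -log L has risen by at most 0.5.
inside = [mean for mean, delta in scan if delta <= 0.5]
print(f"mean = {mean_hat:.3f}, 1 sigma interval ~ [{min(inside):.3f}, {max(inside):.3f}]")
```

The interval is only as fine as the scan grid; in practice one interpolates or solves for the crossing points.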
A 2-D example of a profile scan
[Figure: 2D profile likelihood surface. Axes: "Rn in air gap" and "U in HFE" (both 500 to 5000); color scale: Profile NLL (0 to 50).]
FIG. 29. 2D profile likelihood surface as a function of different contribution of U-like components
Shows correlation of two parameters, as well as the
combined error.
19
How do I tell if the fit is ok?
• Goodness of fit!
• χ2, this is built in. (Note: you should always quote both the χ2
and the number of degrees of freedom, never only
the reduced chi-sq.) Also, as a shortcut, we usually
think χ2/NDF ~ 1 is “good”, but remember that larger
deviations from 1 are expected for small NDF, and only
small deviations are tolerated for large NDF.
• You should use a chi-squared distribution lookup to get
the appropriate probability for a given χ2, NDF value pair.
e.g. ROOT: TMath::Prob(chi2, ndf)
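For readers without ROOT at hand, a rough stand-in for such a lookup can be written with the standard library alone. This sketch integrates the tail of the chi-squared density with Simpson's rule; the function names, step count, and integration cutoff are assumptions, and it is not how `TMath::Prob` is implemented. The example call uses the FCN value of the χ2 fit above with NDF = 100 bins - 3 parameters = 97, which assumes every bin entered the fit.

```python
import math

def chi2_pdf(x, ndf):
    half = 0.5 * ndf
    return math.exp((half - 1.0) * math.log(x) - 0.5 * x
                    - half * math.log(2.0) - math.lgamma(half))

def chi2_prob(chi2, ndf, steps=20000):
    """P(chi-square >= chi2) for ndf degrees of freedom (Simpson rule on the tail)."""
    if chi2 <= 0.0:
        return 1.0
    # Cut the integral off far out in the tail, where the density is negligible.
    upper = max(chi2, ndf) + 40.0 * math.sqrt(2.0 * ndf) + 40.0
    h = (upper - chi2) / steps
    total = chi2_pdf(chi2, ndf) + chi2_pdf(upper, ndf)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(chi2 + i * h, ndf)
    return min(1.0, total * h / 3.0)

print(chi2_prob(67.5572, 97))
```

A probability very close to 0 signals a poor fit; one very close to 1 often hints at overestimated errors.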
20
How do I tell if the fit is ok?
Chi-Sq PDF with different NDFs:
(The minimum of a χ2 function is so distributed.)
https://en.wikipedia.org/wiki/Chi-squared_distribution
21
How do I tell if the fit is ok?
Chi-Sq CDF with different NDFs:
(The minimum of a χ2 function is so distributed.)
https://en.wikipedia.org/wiki/Chi-squared_distribution
22
How do I tell if the fit is ok?
• For ML fits, you can also calculate a chi-square
(but you must first choose a binning).
• Other “goodness of fit” tests, or rather “tests to see
if the data are derived from your fit model”:
Kolmogorov-Smirnov test.
• https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
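A minimal sketch of a one-sample Kolmogorov-Smirnov test against a fixed Gaussian model, using only the standard library. The sample values, the model parameters, and the function names are made up; the p-value uses the asymptotic Kolmogorov series and is not valid as-is when the model parameters were fitted to the same data.

```python
import math
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF of sample and the model cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, model - i / n, (i + 1) / n - model)
    return d

def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p-value; only a rough guide for small n."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the alternating series converges poorly here; p is ~1
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))

model = NormalDist(mu=50.0, sigma=10.0)
sample = [41.2, 55.9, 47.3, 62.8, 50.4, 38.6, 58.1, 52.7, 44.9, 49.5]
d = ks_statistic(sample, model.cdf)
print(f"D = {d:.3f}, p ~ {ks_pvalue(d, len(sample)):.3f}")
```

Unlike a chi-square, this test needs no binning, which makes it a natural companion to unbinned ML fits.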
23
Adding external information
How do I incorporate other measurements,
systematic errors?
$$L = \prod_{n=1}^{N} f(\theta, x_n)$$
• You’ll hear talk of adding “penalty functions” or “weighting functions” to your
-log L or χ2 function.
• Really, all that’s being done is a multiplication of the likelihood with an
additional probability.
• The exact form used depends on your problem; a Gaussian is a typical
choice.
24
Adding external information
$$L = \prod_{n=1}^{N} f(\theta, x_n)$$
Multivariate gaussian. Here, Σ is a positive definite
(symmetric) covariance matrix
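For reference, the standard multivariate Gaussian density that such a constraint multiplies into the likelihood can be written as below. The symbols k (number of constrained parameters), the vector of constrained parameters x, and the vector of their expected values μ are notation assumed here.

```latex
g(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^{k}\,|\Sigma|}}
  \exp\!\left[-\frac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{T}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right],
\qquad
L \;\to\; L \cdot g(\mathbf{x})
```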
25
Adding external information
For two variables, it looks like
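As a sketch, the standard bivariate Gaussian density, written with the parameters x and y, their expected values μ_x and μ_y, their errors σ_x and σ_y, and their correlation ρ, reads:

```latex
g(x, y) = \frac{1}{2\pi\,\sigma_x \sigma_y \sqrt{1-\rho^2}}
  \exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[
    \frac{(x-\mu_x)^2}{\sigma_x^2}
    + \frac{(y-\mu_y)^2}{\sigma_y^2}
    - \frac{2\rho\,(x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y}
  \right]\right\}
```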
26
Adding external information
which means your -log L function gets an additional
term added (ρ is the correlation between parameters):
$$\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2} - \frac{2\rho\,(x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y}\right]$$
(for χ2, this term multiplied by 2 is added)
27
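A sketch of the two-parameter constraint term as code, cross-checked against the matrix form ½ dᵀ Σ⁻¹ d with an explicit 2x2 inverse. The cross term is taken as 2ρ(x-μ_x)(y-μ_y)/(σ_x σ_y), the standard bivariate Gaussian form; the function names and the numbers are illustrative assumptions.

```python
def penalty_1d(x, mu, sigma):
    return 0.5 * ((x - mu) / sigma) ** 2

def penalty_2d(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Term added to -log L for two parameters with correlation rho."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    return (zx**2 + zy**2 - 2.0 * rho * zx * zy) / (2.0 * (1.0 - rho**2))

def penalty_matrix(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Same term written as 0.5 * d^T Sigma^-1 d with an explicit 2x2 inverse."""
    vxx, vyy, vxy = sigma_x**2, sigma_y**2, rho * sigma_x * sigma_y
    det = vxx * vyy - vxy**2
    dx, dy = x - mu_x, y - mu_y
    return 0.5 * (vyy * dx * dx - 2.0 * vxy * dx * dy + vxx * dy * dy) / det

args = (1.3, 4.1, 1.0, 5.0, 0.2, 0.5)   # x, y, mu_x, mu_y, sigma_x, sigma_y (made up)
print(penalty_2d(*args, rho=0.6), penalty_matrix(*args, rho=0.6))
```

Setting `rho=0` reduces the term to the sum of two independent single-parameter penalties.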
Adding external information
For just a single parameter (or a set of uncorrelated
parameters), one just adds
$$\frac{1}{2}\,\frac{(x-\mu_x)^2}{\sigma_x^2}$$
x: parameter of interest
μ_x: expected (measured?) value
σ_x: (measured) error on this value
28
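A minimal sketch of such a constraint in use: a Poisson counting experiment with a signal and a background, where the background is known from an external measurement. All numbers and names are made up for illustration.

```python
import math

n_obs = 25                  # events counted in the signal region (made up)
b_meas, b_err = 10.0, 2.0   # external background measurement (made up)

def nll(signal, background):
    mu = signal + background
    poisson_term = mu - n_obs * math.log(mu)                   # data likelihood
    constraint = 0.5 * ((background - b_meas) / b_err) ** 2    # external Gaussian
    return poisson_term + constraint

best = nll(15.0, 10.0)      # both terms are at their individual minima here
shifted = nll(13.0, 12.0)   # same total rate, background moved by one sigma
print(f"-log L rises by {shifted - best:.3f}")
```

Without the constraint term, signal and background would be completely degenerate, since only their sum enters the Poisson term; the external measurement is what breaks that degeneracy.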
That’s it, for now…
There’s a lot more to learn. If your program/fit gives you output
you don’t understand, try to understand it!
29
Some statistics books, papers, etc.
G. Cowan, Statistical Data Analysis, Clarendon, Oxford, 1998
R.J. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, Wiley, 1989
Ilya Narsky and Frank C. Porter, Statistical Analysis Techniques in Particle Physics, Wiley, 2014
L. Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986
F. James, Statistical and Computational Methods in Experimental Physics, 2nd ed., World Scientific, 2006
S. Brandt, Statistical and Computational Methods in Data Analysis, Springer, New York, 1998 (with program library on CD)
J. Beringer et al. (Particle Data Group), Review of Particle Physics, Phys. Rev. D86, 010001 (2012); see also pdg.lbl.gov sections on probability, statistics, Monte Carlo
From: http://www.pp.rhul.ac.uk/~cowan/stat/stat_1.pdf
Have a look at this course series!
A python script with some of the ROOT/RooFit info:
https://gist.github.com/mgmarino/9c030c67072e4295d6ec
30
