Sophia G. Schwalbe
Michele Zanolin, Marek Szczepanczyk
Undergraduate Research Symposium – Embry-Riddle Aeronautical University, Prescott
30 October 2015
Distributional Tests for LIGO Detections
Overview
• Purpose
• Definition of Distributional Tests
• Examples of Distributional Tests
• Example Populations for LIGO
Purpose
• Two types of triggers
• Foreground: obtained from the detector data
• Background: obtained from the detectors after time shifts are applied
• Triggers are ranked with respect to quantities related to energy (e.g., Signal-to-Noise Ratio)
• Current detection relies only on the loudest event in the foreground (by calculating its False Alarm Probability using the background triggers)
• We want to include possible weaker, i.e. low-energy, events
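The slide does not give the formula for the False Alarm Probability. As a minimal sketch, assuming it can be approximated by the fraction of background triggers at least as loud as the loudest foreground event, the idea looks like this in Python. The function name and the SNR values are hypothetical, and the livetime and number of time shifts are not modeled.

```python
def false_alarm_probability(loudest_snr, background_snrs):
    """Fraction of background triggers at least as loud as the given event.

    Assumption: a plain empirical fraction, not the collaboration's procedure.
    """
    if not background_snrs:
        raise ValueError("need at least one background trigger")
    louder = sum(1 for snr in background_snrs if snr >= loudest_snr)
    return louder / len(background_snrs)


# Hypothetical trigger lists, ranked by signal-to-noise ratio.
foreground = [5.1, 6.3, 7.9, 5.6]
background = [5.0, 5.2, 5.4, 5.9, 6.1, 6.8, 7.2, 8.4, 5.3, 5.7]

loudest = max(foreground)
fap = false_alarm_probability(loudest, background)
print(f"loudest SNR = {loudest}, estimated FAP = {fap}")
```

Only the single loudest event enters this estimate, which is the limitation the distributional tests are meant to address.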
Distributional Tests
• Distributional tests compare a distribution of events with respect to a reference distribution and produce a probability that the distribution of events is compatible with the reference
• There are two kinds of distributional tests:
• Parametric: the reference distribution has a known analytical shape
• Non-parametric: the reference distribution is determined with background runs
• In LIGO, we would like to use non-parametric tests
Non-Parametric Distributions
• Non-parametric tests do not assume a shape or parameters for the distribution.
• Parametric tests assume a shape (e.g., normal) or parameters.
• Examples of non-parametric tests:
• Kolmogorov-Smirnov
• Mann-Whitney
• Chi-squared
• Binomial
Kolmogorov-Smirnov
• The maximum distance D between the background and foreground distributions is found
• If D is small, the two are likely to be the same distribution
• If D is large, they are likely to be two different distributions
• The probability distribution of the possible values of D is used to find the probability of obtaining a value larger than the one observed
• If this probability is not small, the distributions are consistent with being the same (foreground is background, no gravitational waves)
• If the probability of a value of D larger than the observed one is small, the two are not from the same distribution
Kolmogorov-Smirnov
• Consider two unknown distributions $S_{N_1}(x)$ and $S_{N_2}(x)$, the fractions of the original distributions equal to or below a given value of x
• The sizes of the distributions are $N_1$ and $N_2$, respectively
• x: measure of the amplitude of the event
• Null hypothesis $H_0$: $S_{N_1}(x) = S_{N_2}(x)$ for $-\infty < x < \infty$
• The test computes the maximum distance over x between the distributions, $D = \max_x |S_{N_1}(x) - S_{N_2}(x)|$
• The probability of D being greater than the observed value is given as P
• $Q_{KS}$: complement of the cumulative distribution function (cdf) for D
• $N_e$: average size of the distributions
• P: derived from an approximation to the upper tail of the cdf of D, which replaces the tables of confidence levels and constants
• λ: constant chosen from the confidence level
• If P is less than a given percentage, the null hypothesis is rejected
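The formulas on this slide did not survive extraction. The sketch below assumes the widely used textbook approximation for the two-sample test: $Q_{KS}(\lambda) = 2\sum_{j\ge 1}(-1)^{j-1}e^{-2j^2\lambda^2}$, $N_e = N_1 N_2/(N_1+N_2)$, and $P = Q_{KS}\big((\sqrt{N_e} + 0.12 + 0.11/\sqrt{N_e})\,D\big)$. Here λ is simply the argument of $Q_{KS}$, which may differ from how the slide uses the symbol; the function names are hypothetical.

```python
import math


def ks_statistic(sample1, sample2):
    """Maximum distance D between the two empirical cumulative distributions."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        x = min(s1[i], s2[j])
        # Step past every event equal to x in both samples (handles ties).
        while i < n1 and s1[i] == x:
            i += 1
        while j < n2 and s2[j] == x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d


def q_ks(lam, terms=100):
    """Assumed series for the complement of the cdf of D."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here and the value is ~1
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def ks_probability(d, n1, n2):
    """Approximate probability of a distance larger than d under H0."""
    n_e = n1 * n2 / (n1 + n2)  # assumption: effective sample size
    root = math.sqrt(n_e)
    return q_ks((root + 0.12 + 0.11 / root) * d)


# Hypothetical amplitude samples.
foreground = [1, 2, 3, 4]
background = [3, 4, 5, 6]
d_obs = ks_statistic(foreground, background)
p_obs = ks_probability(d_obs, len(foreground), len(background))
print(f"D = {d_obs}, P = {p_obs:.3f}")
```

The null hypothesis would be rejected when `p_obs` falls below the chosen percentage.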
Kolmogorov-Smirnov
• To choose the cdf of D for two unknown distributions, consider rank
• Combining the distributions and ordering the events by a given parameter “ranks” the events from 1 to $N_1 + N_2$
• The rank can be normalized by dividing each individual rank by $N_1 + N_2$, giving a value from 0 to 1
• Assuming the distributions are equivalent, the probability distribution of the normalized rank will be level (uniform)
• Level probability distributions are known
• Since the area under the level probability distribution is the same as the area under the cdf of D, the cdf of D can be written in terms of the probability distribution of the normalized rank
• $dy/dx$ is the Jacobian of y with respect to x
• $p_y(y)$: probability of D
• $p_x(x)$: probability of rank
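The equation that these symbols belong to was not preserved. Assuming it is the standard change-of-variables rule for probability densities, with x the normalized rank and y the quantity derived from it, it would read:

```latex
\[
  p_y(y)\,|dy| = p_x(x)\,|dx|
  \quad\Longrightarrow\quad
  p_y(y) = p_x(x)\left|\frac{dy}{dx}\right|^{-1},
  \qquad p_x(x) = 1 \ \text{for } 0 \le x \le 1 .
\]
```

The last condition expresses the level (uniform) distribution of the normalized rank under the null hypothesis.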
Mann-Whitney
• First combines the two distributions (foreground and background) and ranks the events from lowest to highest with respect to a chosen parameter
• Assumes that, if both come from the same distribution, the events of each will be dispersed uniformly through the ranking instead of concentrated at smaller or larger values
• Then computes the sum of ranks for each distribution
• If a sum deviates too much from the mean value expected for the ranks, the distributions are different
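The rank-sum procedure above can be sketched as follows. This is a minimal illustration, assuming the usual large-sample normal approximation for the rank sum (expected value $N_1(N_1+N_2+1)/2$, variance $N_1 N_2 (N_1+N_2+1)/12$) with no correction for ties; the function names and sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..N for the combined sample; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def mann_whitney(foreground, background):
    """Return (rank sum of foreground, expected sum, z score, two-sided p)."""
    n1, n2 = len(foreground), len(background)
    ranks = average_ranks(list(foreground) + list(background))
    rank_sum = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum - expected) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation
    return rank_sum, expected, z, p


# Foreground concentrated at small values: the rank sum falls below its mean.
rank_sum, expected, z, p = mann_whitney([1, 2, 3], [4, 5, 6])
print(rank_sum, expected, round(z, 3), round(p, 4))
```

For samples this small the normal approximation is rough, so the printed probability is only indicative.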
Symmetric Chi-Squared
• Unlike Kolmogorov-Smirnov and Mann-Whitney, sorts the data into bins
• Compares two data populations
• Computes the χ² statistic from the number of events in each bin
• The probability of having a χ² statistic greater than the observed value is found
• If it is less than a given percentage, the populations are not from the same distribution
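The slide does not show the statistic itself. A minimal sketch, assuming the common two-sample form $\chi^2 = \sum_i (\sqrt{S/R}\,R_i - \sqrt{R/S}\,S_i)^2 / (R_i + S_i)$ with $R = \sum_i R_i$ and $S = \sum_i S_i$, is given below. Taking the number of non-empty bins as the degrees of freedom is also an assumption, and the tail probability uses a plain series for the incomplete gamma function that is only adequate for moderate values.

```python
import math


def symmetric_chi2(counts_a, counts_b):
    """Two-sample chi-squared statistic for binned counts; returns (chi2, dof)."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    scale_a = math.sqrt(total_b / total_a)
    scale_b = math.sqrt(total_a / total_b)
    chi2, dof = 0.0, 0
    for a, b in zip(counts_a, counts_b):
        if a == 0 and b == 0:
            continue  # a bin empty in both samples carries no information
        chi2 += (scale_a * a - scale_b * b) ** 2 / (a + b)
        dof += 1
    return chi2, dof


def chi2_tail_probability(chi2, dof):
    """Probability of a chi-squared value larger than chi2 (series expansion)."""
    a, x = dof / 2.0, chi2 / 2.0
    if x <= 0.0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Hypothetical binned trigger counts for two populations.
counts_a = [10, 20, 30]
counts_b = [30, 20, 10]
chi2, dof = symmetric_chi2(counts_a, counts_b)
p = chi2_tail_probability(chi2, dof)
print(f"chi2 = {chi2}, dof = {dof}, P = {p:.2e}")
```

If the two totals are constrained to be equal by construction, the degrees of freedom would normally be reduced by one.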
Asymmetric Chi-Squared
• Compares a data set (foreground) with a known distribution (background)
• Computes the χ² statistic from the number of events in the foreground and the expected number of events in the background
• Then computes the probability of having a χ² statistic greater than the observed value
• Gives a higher χ² statistic than the symmetric version, and so a higher chance of distinguishing the distributions
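As a sketch of why the asymmetric statistic comes out higher, assume the standard form $\chi^2 = \sum_i (N_i - n_i)^2 / n_i$, where $N_i$ is the observed foreground count and $n_i$ the expected background count in bin i. For equal totals the symmetric form divides each term by $N_i + n_i$ instead, which is never smaller than $n_i$. The function names and counts below are hypothetical.

```python
def asymmetric_chi2(observed, expected):
    """Sum over bins of (observed - expected)^2 / expected."""
    chi2 = 0.0
    for n_obs, n_exp in zip(observed, expected):
        if n_exp <= 0:
            raise ValueError("expected counts must be positive in every bin")
        chi2 += (n_obs - n_exp) ** 2 / n_exp
    return chi2


def symmetric_chi2_equal_totals(counts_a, counts_b):
    """Symmetric statistic, valid as written only when both totals are equal."""
    return sum(
        (a - b) ** 2 / (a + b) for a, b in zip(counts_a, counts_b) if a + b > 0
    )


# Hypothetical counts with equal totals (30 events each).
foreground = [12, 9, 6, 3]
background_expected = [10.0, 10.0, 5.0, 5.0]

asym = asymmetric_chi2(foreground, background_expected)
sym = symmetric_chi2_equal_totals(foreground, background_expected)
print(f"asymmetric = {asym:.3f}, symmetric = {sym:.3f}")
```

The two statistics follow different reference distributions, so a larger value does not by itself settle which test is more sensitive.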
Populations
• Same amplitude repeaters
• Assumes all the triggers have the same value of the parameter
• Background-like
• Distribution is exponential on a logarithmic scale, with an exponential constant correlated to that of the noise (background) triggers
• Galactic center
• Distribution mimics the one that identical transients with random polarizations, located in the galactic center, would produce
• Power law decay
• Assumes the distribution of the parameter is proportional to a negative integer power of the parameter
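Two of these populations are simple enough to simulate directly. The sketch below draws a power-law population by inverse-transform sampling and a same-amplitude population as a constant list. The lower cutoff `x_min`, the exponent, and the amplitude are hypothetical choices, not values from the talk.

```python
import random


def sample_power_law(n, exponent, x_min, rng):
    """Draw n values with density proportional to x**(-exponent) for x >= x_min.

    Inverts the cdf F(x) = 1 - (x / x_min)**(1 - exponent).
    """
    if exponent <= 1:
        raise ValueError("exponent must exceed 1 for a normalizable density")
    power = -1.0 / (exponent - 1.0)
    # rng.random() lies in [0, 1), so 1 - u is strictly positive.
    return [x_min * (1.0 - rng.random()) ** power for _ in range(n)]


def sample_same_amplitude(n, amplitude):
    """Every trigger carries the same value of the ranking parameter."""
    return [amplitude] * n


rng = random.Random(0)  # fixed seed so the sketch is repeatable
power_law_events = sample_power_law(20000, exponent=3, x_min=5.0, rng=rng)
repeater_events = sample_same_amplitude(10, amplitude=6.5)

median = sorted(power_law_events)[len(power_law_events) // 2]
print(f"power-law median = {median:.2f}")
```

Samples like these can be mixed with background triggers to check how each distributional test responds to a given population.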
Summary
• Using different distributional tests, we can find which test is most sensitive to each population of signals
• We want to use non-parametric tests instead of parametric ones in order to reduce assumptions on the distributions
• Some possible non-parametric tests include Kolmogorov-Smirnov, Mann-Whitney, and chi-squared
• Next, we need to analyze the difference in efficiency between parametric and non-parametric tests for LIGO