Distributional Tests for LIGO Detections

Sophia G. Schwalbe
Michele Zanolin, Marek Szczepanczyk
Undergraduate Research Symposium – Embry-Riddle Aeronautical University, Prescott
30 October 2015

Overview
• Purpose
• Definition of Distributional Tests
• Examples of Distributional Tests
• Example Populations for LIGO

Purpose
• Two types of triggers
• Foreground: obtained from the
detector data
• Background: obtained from
detectors after time shifts are
applied
• Ranked with respect to quantities
related to energy (Signal-to-Noise
Ratio)
• Current detection relies on only the
loudest event in the foreground (by
calculating its False Alarm Probability
using the background triggers)
• Want to include possible weaker
events.
• i.e. low energy events
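
The loudest-event idea above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the False Alarm Probability is estimated as the fraction of background triggers at least as loud as the loudest foreground trigger, the function name and SNR values are hypothetical, and the calculation actually used in LIGO analyses is not described in these slides and may differ.

```python
def false_alarm_probability(loudest_snr, background_snrs):
    """Fraction of background triggers at least as loud as the given trigger.

    Simplified empirical estimate (an assumption, not the slides' formula).
    """
    if not background_snrs:
        raise ValueError("need at least one background trigger")
    n_louder = sum(1 for snr in background_snrs if snr >= loudest_snr)
    return n_louder / len(background_snrs)


# Hypothetical SNR values, for illustration only
foreground = [5.1, 6.3, 9.8]
background = [4.9, 5.5, 6.0, 7.2, 10.4, 5.8, 6.6, 8.1]

fap = false_alarm_probability(max(foreground), background)
print(fap)
```

Only the loudest foreground trigger enters this estimate, which is why the weaker events carry no weight in it.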

Distributional Tests
• Distributional tests compare a distribution of events with a
reference distribution and produce the probability that the
distribution of events is compatible with the reference
• There are two kinds of distributional tests:
• Parametric: the reference distribution has a known
analytical shape
• Non-parametric: the reference distribution is determined
with background runs
• In LIGO, we would like to use non-parametric tests

Non-Parametric Tests
• Non-parametric tests do not assume a shape or parameters
for the distribution.
• Parametric tests assume a shape (e.g. normal) or parameters.
• Examples:
• Kolmogorov-Smirnov
• Mann-Whitney
• Chi-squared
• Binomial

Kolmogorov-Smirnov
• The maximum distance D between the background and
foreground cumulative distributions is found
• If D is small, they are likely to be the same distribution
• If D is large, they are likely to be two different
distributions
• The probability distribution of the possible values of D is
used to find the probability of having a value larger
than the observed one
• If that probability is not small, the
distributions are consistent (foreground is
background, no gravitational waves)
• If the probability of a value of D larger than the
observed one is small, they are not from the same distribution
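
A minimal sketch of the distance D described above, assuming each sample is a plain list of event amplitudes. The function name and the sample values are hypothetical. The two empirical cumulative fractions only change at data points, so it is enough to evaluate the difference there.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Maximum distance between the two empirical cumulative fractions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in a + b:
        # bisect_right counts the events with value <= x
        frac_a = bisect.bisect_right(a, x) / len(a)
        frac_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(frac_a - frac_b))
    return d


# Hypothetical amplitudes, for illustration only
d_overlap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
d_disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
print(d_overlap, d_disjoint)
```

Fully separated samples give the largest possible value, D = 1, and identical samples give D = 0.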

Kolmogorov-Smirnov
• Consider two unknown distributions S_N1(x) and S_N2(x), which give the fraction of events in each
sample equal to or below a given value of x
• Sizes of the distributions are N1 and N2, respectively
• x: measure of the amplitude of the event
• Null Hypothesis H0: S_N1(x) = S_N2(x) for -∞ < x < ∞
• Computes the maximum distance over x between the distributions: D = max over x of |S_N1(x) − S_N2(x)|
• The probability of D being greater than the observed value is given as P
• Q_KS: complement of the cumulative distribution function (cdf) for D
• Ne: effective size of the distributions, Ne = N1 N2 / (N1 + N2)
• P: derived from an approximation of the probability of D for the upper tail of the cdf, which replaces the
tables of confidence levels and constants
• λ: constant chosen from the confidence level
• If P is less than a chosen significance level, the null hypothesis is rejected
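
The probability P can be sketched as follows. The slide's own equation did not survive extraction, so this block assumes the standard textbook approximation: Q_KS(λ) = 2 Σ (−1)^(j−1) exp(−2 j² λ²) evaluated at λ = (√Ne + 0.12 + 0.11/√Ne) D, with Ne = N1 N2 / (N1 + N2). The function names are hypothetical.

```python
import math


def q_ks(lam):
    """Complement of the cdf of D, via the alternating series (assumed form)."""
    if lam < 0.2:
        # The series converges slowly here, and the value is 1 to within ~1e-12
        return 1.0
    total = 0.0
    for j in range(1, 101):
        term = 2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    # Clamp to [0, 1] to absorb rounding in the truncated series
    return min(max(total, 0.0), 1.0)


def ks_p_value(d, n1, n2):
    """Approximate probability of a distance larger than d under H0."""
    n_e = n1 * n2 / (n1 + n2)  # effective sample size
    root = math.sqrt(n_e)
    lam = (root + 0.12 + 0.11 / root) * d
    return q_ks(lam)


print(ks_p_value(0.5, 4, 4))
```

A small return value corresponds to rejecting the null hypothesis at the chosen level.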

Kolmogorov-Smirnov
• To choose the cdf of D for two unknown distributions, consider rank
• Combining the distributions and ordering the events by a given parameter “ranks” the
events from 1 to N1+N2
• The rank can be normalized by dividing each individual rank by N1+N2 to give a value from 0 to 1
• Assuming the distributions are equivalent, the probability distribution of the normalized
rank will be level (uniform)
• Level probability distributions are known
• Since the area under the level probability distribution is the same as the area under the cdf of D, the cdf of
D can be written in terms of the probability distribution of the normalized rank
• dy/dx is the Jacobian of y with respect to x
• py(y) = probability of D
• px(x) = probability of rank
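
The ranking step can be illustrated with a short sketch. It assumes no tied values, and the function name and inputs are hypothetical. Each event in the combined, ordered list gets a rank from 1 to N1+N2, which is then divided by N1+N2.

```python
def normalized_ranks(sample_a, sample_b):
    """Return the normalized ranks of each sample within the combined ordering."""
    combined = [(x, "a") for x in sample_a] + [(x, "b") for x in sample_b]
    combined.sort(key=lambda pair: pair[0])
    total = len(combined)
    ranks = {"a": [], "b": []}
    for rank, (_, label) in enumerate(combined, start=1):
        ranks[label].append(rank / total)
    return ranks["a"], ranks["b"]


# Hypothetical amplitudes, for illustration only
ranks_a, ranks_b = normalized_ranks([1.0, 3.0], [2.0, 4.0])
print(ranks_a, ranks_b)
```

If the two samples come from the same distribution, the normalized ranks of either sample should be spread evenly over the interval rather than clustered at one end.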

Mann-Whitney
• First combines the two distributions (foreground and background)
and ranks the events from lowest to highest with respect to a
parameter
• Assumes that if they are the same distribution, the ranks of each
sample will be dispersed uniformly instead of concentrated at
smaller/larger values
• Then computes the sum of ranks for each distribution
• If a sum deviates too much from its expected value,
then the distributions are different
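
A minimal sketch of the rank-sum comparison, assuming no tied values and samples large enough for a normal approximation. Under the null hypothesis the rank sum of a sample of size n1 has mean n1(n1+n2+1)/2 and variance n1 n2 (n1+n2+1)/12. The function name and inputs are hypothetical.

```python
import math


def rank_sum_test(sample_a, sample_b):
    """Rank sum of sample_a, its z-score, and a two-sided p-value."""
    n1, n2 = len(sample_a), len(sample_b)
    combined = sorted([(x, 0) for x in sample_a] + [(x, 1) for x in sample_b])
    rank_sum_a = sum(
        rank for rank, (_, label) in enumerate(combined, start=1) if label == 0
    )
    mean = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_a - mean) / sigma
    # Two-sided tail probability of a standard normal variable
    p = math.erfc(abs(z) / math.sqrt(2))
    return rank_sum_a, z, p


# Hypothetical amplitudes, for illustration only
separated = rank_sum_test([1, 2, 3], [4, 5, 6])
interleaved = rank_sum_test([1, 4, 5, 8], [2, 3, 6, 7])
print(separated, interleaved)
```

For samples this small the normal approximation is rough; exact tables or a permutation calculation would be more appropriate.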

Symmetric Chi-Squared
• Unlike Kolmogorov-Smirnov and
Mann-Whitney, sorts data into bins
• Compares two data populations
• Computes the χ² statistic based on
the number of events in each bin
• The probability of having a χ²
statistic greater than the observed
value is found
• If it is less than a chosen significance level,
the populations are not from the same distribution
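
A sketch of the two-sample binned statistic, assuming the common form χ² = Σ (√(S/R) R_i − √(R/S) S_i)² / (R_i + S_i), where R_i and S_i are the counts in bin i and R, S are the totals. This form is an assumption, since the slide's equation is not available. The function name and counts are hypothetical.

```python
import math


def chi2_two_samples(counts_r, counts_s):
    """Symmetric chi-squared statistic for two binned data sets."""
    total_r, total_s = sum(counts_r), sum(counts_s)
    scale_r = math.sqrt(total_s / total_r)
    scale_s = math.sqrt(total_r / total_s)
    chi2 = 0.0
    used_bins = 0
    for r, s in zip(counts_r, counts_s):
        if r == 0 and s == 0:
            continue  # empty bins carry no information
        chi2 += (scale_r * r - scale_s * s) ** 2 / (r + s)
        used_bins += 1
    return chi2, used_bins


# Hypothetical bin counts, for illustration only
chi2_same, bins_same = chi2_two_samples([10, 20, 30], [10, 20, 30])
chi2_diff, bins_diff = chi2_two_samples([10, 0], [0, 10])
print(chi2_same, chi2_diff)
```

The number of degrees of freedom for the tail probability depends on the number of bins used and on any constraints imposed on the totals.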

Asymmetric Chi-Squared
• Compares a data set (foreground) with a known distribution
(background)
• Computes the χ² statistic based on the number of events in
the foreground and expected number of events in the
background
• Then computes probability of having a χ² statistic greater
than the observed
• Gives a higher χ² statistic than the symmetric, so gives a
higher chance of distinguishing distributions
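
A sketch of the one-sided comparison, assuming the usual form χ² = Σ (N_i − n_i)² / n_i with N_i the observed foreground count and n_i the expected background count in bin i. The function name and counts are hypothetical.

```python
def chi2_against_expected(observed, expected):
    """Chi-squared statistic of observed counts against expected counts."""
    chi2 = 0.0
    for n_obs, n_exp in zip(observed, expected):
        if n_exp <= 0:
            raise ValueError("expected counts must be positive in every bin")
        chi2 += (n_obs - n_exp) ** 2 / n_exp
    return chi2


# Hypothetical bin counts, for illustration only
chi2_value = chi2_against_expected([12, 18], [10, 20])
print(chi2_value)
```

Here only the expected count appears in the denominator. For the same pair of bins with equal totals, the symmetric form divides by the sum of both counts, which is why it yields a smaller value.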

Populations
• Same amplitude repeaters
• Assumes all the triggers have the same value of the parameter
• Background-like
• Distribution is exponential under logarithmic scale, where the exponential
constant is correlated to that of the noise (background) triggers
• Galactic center
• Distribution mimics what identical transients with random
polarizations, located in the galactic center, would produce
• Power law decay
• Assumes the distribution of the parameter is proportional to a negative integer power of the parameter

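
As an illustration of the power law decay population, the sketch below draws amplitudes with density proportional to x^(−k) above a minimum value, using inverse-transform sampling. The function name, the lower cutoff x_min, and the exponent value are assumptions for the example. The slides do not specify how the test populations were generated.

```python
import random


def sample_power_law(n, exponent, x_min, seed=0):
    """Draw n values with density proportional to x**(-exponent) for x >= x_min."""
    if exponent <= 1:
        raise ValueError("exponent must be greater than 1 to be normalizable")
    rng = random.Random(seed)
    # Inverse of the cdf F(x) = 1 - (x / x_min)**(1 - exponent)
    return [
        x_min * (1.0 - rng.random()) ** (-1.0 / (exponent - 1.0))
        for _ in range(n)
    ]


# Hypothetical population: exponent 3, amplitudes above 1
amplitudes = sample_power_law(10000, exponent=3, x_min=1.0)
print(min(amplitudes), max(amplitudes))
```

A sample like this could be passed to any of the tests above as a synthetic foreground to see how readily each test separates it from a background sample.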
Summary
• Using different distributional tests, we can find which test is most
sensitive to different populations of signals
• Want to use non-parametric tests instead of parametric in order
to reduce assumptions on the distributions
• Some possible non-parametric tests include Kolmogorov-
Smirnov, Mann-Whitney, and chi-squared
• Next, need to analyze the difference in efficiency between
parametric and non-parametric tests for LIGO
