Lecture 5: Interval Estimation

Machine Learning for Language Technology 2015

http://stp.lingfil.uu.se/~santinim/ml/2015/ml4lt_2015.htm

Statistical Inference (2)
Interval Estimation

Marina Santini
santinim@stp.lingfil.uu.se

Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2015

Acknowledgements

•  The web, statistical websites, online calculators

Lecture 5: Statistical Inference 2:
Interval Estimation

Outline

•  Confidence intervals
   –  On proportions
   –  On means
•  Standard error


Statistical Inference: Interval Estimation

•  Suppose we measure the error of a classifier on a test set and obtain a certain numerical error rate, e.g. 25%.
•  This corresponds to a success rate of 75%.
•  This is an estimate on a sample (our dataset).
•  What can we say about the "true" success rate on the target population?
•  Remember: We have observed the proportion of correct classifications on a sample, while the population is unknown to us.


Our practical question is…

•  When the estimated success rate is 75%, how close is this value to the true success rate, i.e. the success rate on the population?
   –  It depends on the sample size

What is a confidence interval?

•  In statistical inference, one wishes to estimate population parameters using observed sample data.
•  Confidence intervals provide an essential understanding of how much faith we can have in our sample estimates.
•  A confidence interval is a range computed using sample statistics to estimate an unknown population parameter with a given level of confidence.
   –  For example, we want to say: "we are 80% certain that the true population proportion falls within the range of 73.25% and 76.75%".
   –  We usually write the confidence interval in this way: [0.732,0.767]


Generally speaking...

•  A confidence interval is constructed by taking the point estimate (p̂) plus and minus the margin of error.
•  The margin of error is computed by multiplying a z multiplier by the standard error, SE(p̂).


Definition: Standard Error

•  Standard error is a statistical term that measures the accuracy with which a sample represents a population.
•  In statistics, a sample mean or a sample proportion deviates from the actual mean or proportion of a population; the standard error measures the typical size of this deviation.

The smaller the standard error, the more representative the sample will be of the overall population. The standard error also shrinks as the sample size grows (in proportion to 1/√n): the larger the sample size, the smaller the standard error, because the statistic will approach the actual value.


The Multiplier

The multiplier is a constant that indicates how many standard errors the interval extends on either side of the point estimate, read off the normal curve. The larger the multiplier, the higher the confidence level, but also the wider the confidence interval: more confidence that the interval contains the true performance comes at the cost of a less precise estimate. The constant for 80% confidence intervals is 1.28 (see table or use a calculator: http://www.gngroup.com/stat.html )
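
As an alternative to a table or an online calculator, the multiplier can be derived from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name z_multiplier is hypothetical and not part of the lecture material, and a two-sided interval is assumed.

```python
from statistics import NormalDist

def z_multiplier(confidence: float) -> float:
    """Two-sided z multiplier for a confidence level given as a fraction (e.g. 0.80)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # A two-sided interval leaves (1 - confidence) / 2 probability in each tail.
    upper_tail = 1.0 - (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(upper_tail)

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_multiplier(level):.2f}")
```

Rounded to two decimals, the 80% level gives the 1.28 quoted on the slide; higher confidence levels give larger multipliers.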

Confidence intervals

•  Confidence intervals of a proportion
•  Confidence intervals of the mean


Confidence interval for proportion

•  A confidence interval for a proportion is constructed by taking the point estimate (p̂) plus and minus the margin of error. The margin of error is computed by multiplying a multiplier by the standard error, SE(p̂).


The standard error of proportion: p̂ (p-hat)

•  The standard error is an estimate of the standard deviation of a statistic.
•  This is the formula of the standard error of an estimated proportion (the hat always represents an estimate):

   SE(p̂) = √( p̂ (1 − p̂) / n )

•  p̂ = estimated proportion
•  n = sample size (number of observations)
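
To make the roles of p̂ and n concrete, here is a small sketch that assumes the usual formula SE(p̂) = sqrt(p̂ (1 − p̂) / n). The function name se_proportion is hypothetical; the value p̂ = 0.75 is the success rate used throughout the lecture.

```python
import math

def se_proportion(p_hat: float, n: int) -> float:
    """Standard error of an estimated proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")
    return math.sqrt(p_hat * (1.0 - p_hat) / n)

# Same estimated success rate, increasing sample sizes.
for n in (10, 100, 1000):
    print(f"n = {n:4d}: SE = {se_proportion(0.75, n):.4f}")
```

Because n sits under the square root, a hundred times more observations are needed to make the standard error ten times smaller.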


Our practical question is…

•  When the estimated success rate is 75%, how close is this value to the true success rate, i.e. the success rate on the population?
   –  It depends on the sample size

Confidence intervals on our proportion

•  We can say that the true success rate p lies within a certain specified interval around our point estimate (75%) with a certain specified confidence (say 80%):
•  Example: S=750 successes in N=1000 trials
   –  Estimated success rate: 75%
   –  How close is this to the true success rate p?
   –  Answer: with 80% confidence p in [73.2%, 76.7%]
•  Another example: S=75 and N=100
   –  Estimated success rate: 75%
   –  Answer: with 80% confidence p in [69.1%, 80.1%]

l  p̂ = 75%, n = 1000, confidence = 80% (so that z =
1.28):
p∈[0.732,0.767]
l  p̂ = 75%, n = 100, confidence = 80% (so that z = 1.28):
p∈[0.691,0.801]
l  Usually the normal distribution assumption is only valid
for large n (i.e. n > 100)
l  In a case like this: p̂ = 75%, n = 10, confidence = 80%
(so that z = 1.28): p∈[0.549,0.881]
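
The three intervals above can be recomputed with a short sketch. Two variants are shown: the simple interval p̂ ± z·SE(p̂) described earlier in the lecture, and the Wilson score interval. Recomputing the numbers suggests that the bounds quoted above agree to three decimals with the Wilson variant; this is an observation from the arithmetic, not something the slides state. The function names are hypothetical.

```python
import math

def wald_interval(p_hat, n, z):
    """Simple interval: p_hat plus and minus z times the standard error."""
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def wilson_interval(p_hat, n, z):
    """Wilson score interval for a proportion."""
    z2 = z * z
    denom = 1.0 + z2 / n
    centre = (p_hat + z2 / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom
    return centre - half, centre + half

Z_80 = 1.28  # multiplier for 80% confidence, as given in the lecture
for n in (1000, 100, 10):
    lo_w, hi_w = wald_interval(0.75, n, Z_80)
    lo_s, hi_s = wilson_interval(0.75, n, Z_80)
    print(f"n = {n:4d}  simple [{lo_w:.3f}, {hi_w:.3f}]  Wilson [{lo_s:.3f}, {hi_s:.3f}]")
```

For n = 1000 the two variants nearly coincide; for n = 10 they differ visibly, which illustrates why the normal approximation is only trusted for large samples.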

Confidence Interval Calculator for Proportions

https://www.mccallum-layton.co.uk/tools/statistic-calculators/confidence-interval-for-proportions-calculator/


Confidence intervals around the mean

Confidence intervals are calculated based on the standard error of the mean (SEM):

   SEM = s / √n

s = sample standard deviation (see formula below)
n = sample size (number of observations)

The following is the sample standard deviation formula (see also lecture 2):

   s = √( Σ (xᵢ − x̄)² / (n − 1) )

where xᵢ are the observations and x̄ is the sample mean.
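
As a rough illustration of how s and the SEM are obtained from raw data, the sketch below uses Python's standard library. The ratings are invented for the example and do not come from the lecture; the SEM is assumed to be the sample standard deviation divided by the square root of n.

```python
import math
import statistics

# Hypothetical ratings on a five-point scale, made up for illustration only.
ratings = [4, 5, 3, 4, 5, 4, 2, 5, 4, 4]

n = len(ratings)
mean = statistics.fmean(ratings)
s = statistics.stdev(ratings)   # sample standard deviation (divides by n - 1)
sem = s / math.sqrt(n)          # standard error of the mean

print(f"n = {n}, mean = {mean:.2f}, s = {s:.3f}, SEM = {sem:.3f}")
```

Note that statistics.stdev is the sample version of the standard deviation; statistics.pstdev would divide by n instead and is meant for a complete population.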


Example: How to compute the confidence interval of the mean

A brand rating on a five-point scale from 62 participants was 4.32 with a standard deviation of .845. What is the 95% confidence interval?

1) Find the mean: 4.32
2) Compute the standard deviation: .845
3) Compute the standard error by dividing the standard deviation by the square root of the sample size: .845 / √(62) = .11
4) Compute the margin of error by multiplying the standard error by 2 (it is common to round up 1.96 to 2): .11 x 2 = .22
5) Compute the confidence interval by adding the margin of error to the mean from Step 1 and then subtracting the margin of error from the mean:
   Lower limit: 4.32 - .22 = 4.10
   Upper limit: 4.32 + .22 = 4.54

The 95% confidence interval is 4.10 to 4.54. We don't have any historical data using this 5-point branding scale; however, historically, scores above 80% of the maximum value (4 out of 5 on a 5-point scale) tend to be above average. Therefore we can be fairly confident that the brand is at least above the average threshold of 4, because the lower end of the confidence interval exceeds 4.

Source: http://www.measuringu.com/blog/ci-five-steps.php
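
The five steps above can be sketched in a few lines of Python. The function name mean_ci is hypothetical; the inputs are the figures from the example. The sketch keeps full precision instead of rounding the standard error to .11, so the limits come out slightly different in the second decimal, while the conclusion (lower limit above 4) is unchanged.

```python
import math

def mean_ci(mean, sd, n, z):
    """Return (standard error, lower limit, upper limit) for mean +/- z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    margin = z * se
    return se, mean - margin, mean + margin

# Figures from the brand-rating example: mean 4.32, sd .845, 62 participants.
# z = 2.0 is the rounded multiplier used in step 4; z = 1.96 is the unrounded one.
for z in (2.0, 1.96):
    se, lower, upper = mean_ci(4.32, 0.845, 62, z)
    print(f"z = {z}: SE = {se:.4f}, CI = [{lower:.2f}, {upper:.2f}]")
```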


Confidence Interval Calculator for Means

https://www.mccallum-layton.co.uk/tools/statistic-calculators/confidence-interval-for-mean-calculator/


Quiz 1: Confidence Interval (Mean)

You take a sample of 25 test scores from a population. The sample mean is 38 and the population standard deviation is 6.5. What is the 95% confidence interval of the mean?

1.  [37.49,38.51]
2.  [36.49,39.51]
3.  [35.45,40.55]


Calculator

https://www.mccallum-layton.co.uk/tools/statistic-calculators/confidence-interval-for-mean-calculator


Quiz 2: Confidence Interval (Proportion)

747 out of 1168 female students said they always use a seatbelt when driving. What is the 99% confidence interval for the proportion of female students in the population who always use a seatbelt when driving?

1.  [.612,.668]
2.  [.604,.676]
3.  None of the above


Calculator

https://www.mccallum-layton.co.uk/tools/statistic-calculators/confidence-interval-for-proportions-calculator/


Conclusions

•  A confidence interval is a range of values that is likely to contain an unknown population parameter.
•  Confidence intervals serve as good estimates of the population parameter because the procedure tends to produce intervals that contain the parameter.
•  Confidence intervals consist of the point estimate (the most likely value) and a margin of error around that point estimate. The margin of error indicates the amount of uncertainty that surrounds the sample estimate of the population parameter.

We will resume this topic in Lecture 8.


The end
