Outline
                  Why Statistics?
 Populations, Samples, and Census
         Some Sampling Concepts




             Lecture 1
Chapter 1: Basic Statistical Concepts


                      M. George Akritas




                M. George Akritas    Lecture 1 Chapter 1: Basic Statistical Concepts




Why Statistics?


Populations, Samples, and Census


Some Sampling Concepts
   Representative Samples
   Simple Random and Stratified Sampling
   Sampling With and Without Replacement
   Non-representative Sampling







Example (Examples of Engineering/Scientific Studies)
    Comparing the compressive strength of two or more cement
    mixtures.
    Comparing the effectiveness of three cleaning products in
    removing four different types of stains.
    Predicting failure time on the basis of stress applied.
    Assessing the effectiveness of a new traffic regulatory measure
    in reducing the weekly rate of accidents.
    Testing a manufacturer’s claim regarding a product’s quality.
    Studying the relation between salary increases and employee
    productivity in a large corporation.

What makes these studies challenging (and thus requires Statistics)
is the inherent, or intrinsic, variability:





     The compressive strength of different preparations of the same
     cement mixture will differ. The figure in
     http://sites.stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf
     shows 32 compressive strength measurements, in MPa
     (megapascals), of test cylinders 6 in. in diameter by 12
     in. high, using a water/cement ratio of 0.4, measured on the
     28th day after they are made.
     Under the same stress, two beams will fail at different times.
     The proportion of defective items of a certain product will
     differ from batch to batch.

Intrinsic variability renders the objectives of the case studies, as
stated, ambiguous.






The objectives of the case studies can be made precise if stated in
terms of averages or means.

    Comparing the average hardness of two different cement
    mixtures.
    Predicting the average failure time on the basis of stress
    applied.
    Estimation of the average coefficient of thermal expansion.
    Estimation of the average proportion of defective items.

Moreover, because of variability, the words "average" and "mean"
have a technical meaning which can be made clear through the
concepts of population and sample.





Definition
A population is a well-defined collection of objects or subjects, of
relevance to a particular study, which are exposed to the same
treatment or method. Population members are called units.

Example (Examples of populations:)

    All water samples that can be taken from a lake.
    All items of a certain manufactured product.
    All students enrolled in Big Ten universities during the
    2007-08 academic year.
    Two types of cleaning products. (Each type corresponds to a
    population.)





The objective of a study is to investigate certain characteristic(s)
of the units of the population(s) of interest.

Example (Examples of characteristics:)

    All water samples taken from a lake. Characteristics: Mercury
    concentration; Concentration of other pollutants.
    All items of a certain manufactured product (that have been, or
    will be, produced). Characteristic: Proportion of defective items.
    All students enrolled in Big Ten universities during the
    2007-08 academic year. Characteristics: Favorite type of
    music; Political affiliation.
    Two types of cleaning products. Characteristic: cleaning
    effectiveness.






In the example where different beams (of the same type)
are exposed to different stress levels:
    the characteristic of interest is time to failure of a beam under
    each stress level, and
    each stress level used in the study corresponds to a separate
    population which consists of all beams that will be exposed to
    that stress level.
This emphasizes that populations are defined not only by the
units they consist of, but also by the method or treatment
applied to these units.








Full (i.e. population-level) understanding of a characteristic
requires the examination of all population units, i.e. a census.

    For example, full understanding of the relation between salary
    and productivity of a corporation’s employees requires
    obtaining these two characteristics from all employees.
However,
    taking a census can be time-consuming and expensive: the
    2000 U.S. Census cost $6.5 billion, while the 2010 Census
    cost $13 billion.
    Moreover, a census is not feasible if the population is
    hypothetical or conceptual, i.e. not all members are
    available for examination.
Because of the above, we typically settle for examining all
units in a sample, which is a subset of the population.





Due to the intrinsic variability, the sample properties/attributes of
the characteristic of interest will differ from those of the
population. For example:

     The average mercury concentration in 25 water samples will
     differ from the overall mercury concentration in the lake.
     The proportion in a sample of 100 PSU students who favor
     the use of solar energy will differ from the corresponding
     proportion of all PSU students.
     The relation between a bear's chest girth and weight in a
     sample of 10 bears will differ from the corresponding relation
     in the entire population of 50 bears in a forested region.




The GOOD NEWS is that, if the sample is suitably drawn, then
sample properties approximate the population properties.


Figure: Population and sample relationships between chest girth and weight (x-axis: Chest Girth; y-axis: Weight).


Sampling Variability


       Sample properties of the characteristic of interest also differ
       from sample to sample. For example:
        1. The number of US citizens, in a sample of size 20, who favor
           expanding solar energy, will (most likely) be different from the
           corresponding number in a different sample of 20 US citizens.
        2. The average mercury concentration in two sets of 25 water
           samples drawn from a lake will differ.
       The term sampling variability is used to describe such
       differences in the characteristic of interest from sample to
       sample.
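The idea of sampling variability can be sketched with a short simulation. The block below is a minimal illustration in Python, not part of the lecture: the "population" of mercury readings is simulated (the mean of 1.0 and spread of 0.2 are made-up values), and the names `population`, `sample_a` and `sample_b` are hypothetical.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated mercury readings (arbitrary units).
# These values are generated for illustration; they are not real lake data.
rng = random.Random(401)
population = [rng.gauss(1.0, 0.2) for _ in range(10_000)]

# Population-level average: computed from every unit, as a census would.
population_mean = statistics.mean(population)

# Two samples of 25 units each, drawn at random without replacement.
sample_a = rng.sample(population, 25)
sample_b = rng.sample(population, 25)

mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

print(f"population mean: {population_mean:.3f}")
print(f"sample A mean:   {mean_a:.3f}")
print(f"sample B mean:   {mean_b:.3f}")
```

The two sample averages should differ from each other and from the population average, while staying reasonably close to it.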







Figure: Illustration of Sampling Variability (x-axis: Chest Girth; y-axis: Weight).






Population level properties/attributes of characteristic(s) of
interest are called (population) parameters.
     Examples of parameters include averages, proportions,
     percentiles, and correlation coefficients.
The corresponding sample properties/attributes of
characteristics are called statistics. The term sports statistics
comes from this terminology.
Sample statistics approximate the corresponding population
parameters but are not equal to them.
Statistical inference deals with the uncertainty issues which
arise in approximating parameters by statistics.
The tools of statistical inference include point and interval
estimation, hypothesis testing and prediction.






Example (Examples of Estimation, Hypothesis Testing and
Prediction)

    Estimation (point and interval) would be used in the task of
    estimating the coefficient of thermal expansion of a metal, or
    the air pollution level.
    Hypothesis testing would be used for deciding whether to take
    corrective action to bring the air pollution level down, or
    whether a manufacturer’s claim regarding the quality of a
    product is false.
    Prediction arises in cases where we would like to predict the
    failure time on the basis of the stress applied, or the age of a
    tree on the basis of its trunk diameter.






For valid statistical inference the sample must be
representative of the population. For example, a sample of
PSU basketball players is not representative of PSU students,
if the characteristic of interest is height.
Typically it is hard to tell whether a sample is representative
of the population. So, we define a sample to be representative
if . . . (circular definition!!)

           it allows for valid statistical inference.

The only guarantee for that comes from the method used to
select the sample (sampling method).
The good news is that there are several sampling methods that
guarantee representativeness.




Definition
A sample of size n is a simple random sample if the selection
process ensures that every sample of size n has an equal chance of
being selected.
    To select a s.r.s. of size 10 from a population of 100 units, all
    of the 100!/(10!90!) samples of size 10 must be equally likely.
    In simple random sampling every member of the population
    has the same chance of being included in the sample. The
    converse, however, is not true.

Example
To select a sample of 2 students from a population of 20 male and
20 female students, one selects at random one male and one
female student. Is this a s.r.s.? (Does every student have the
same chance of being included in the sample?)
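One way to explore the question is to enumerate the possible samples. The following Python sketch (the student labels such as `M0` and `F0` are hypothetical) lists every sample the one-male-one-female scheme can produce and every sample of size 2 that a s.r.s. could produce, then compares the chance that one particular student is included. It also evaluates the count 100!/(10!90!) mentioned in the definition.

```python
from itertools import combinations
from math import comb

# Label the 40 students: M0..M19 are male, F0..F19 are female.
males = [f"M{i}" for i in range(20)]
females = [f"F{i}" for i in range(20)]
students = males + females

# Every sample the one-male-one-female scheme can produce (all equally likely).
scheme_samples = [frozenset((m, f)) for m in males for f in females]

# Every sample of size 2 that a simple random sample could produce.
all_pairs = [frozenset(p) for p in combinations(students, 2)]

# Chance that student M0 is included, under each selection method.
p_scheme = sum("M0" in s for s in scheme_samples) / len(scheme_samples)
p_srs = sum("M0" in s for s in all_pairs) / len(all_pairs)

# Can the scheme ever select two male students?
same_sex_possible = frozenset(("M0", "M1")) in set(scheme_samples)

print(len(scheme_samples), len(all_pairs))
print(p_scheme, p_srs)
print(same_sex_possible)

# Number of distinct samples of size 10 from 100 units: 100!/(10!90!).
print(comb(100, 10))
```

The enumeration shows that every student has the same chance of inclusion under both methods, yet the scheme can never select two students of the same sex. Not all samples of size 2 are equally likely, so the scheme is not a s.r.s.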


Another sampling method for obtaining a representative sample is
called stratified sampling.

Definition
A stratified sample consists of simple random samples from each
of a number of groups (which are non-overlapping and make up
the entire population) called strata.

    Examples of strata include: ethnic groups, age groups, and
    production facilities.
    If the units in the different strata differ in terms of the
    characteristic under study, stratified sampling is preferable to
    s.r.s. For example, if different production facilities differ in
    terms of the proportion of defective products, a stratified
    sample is preferable.
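A stratified sample can be sketched as one simple random sample per stratum. In the Python illustration below, the facility names, item IDs and the helper `stratified_sample` are hypothetical. Taking 10% of each stratum is an assumption made for the example, since the lecture does not say how to split the sample across strata.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample (without replacement) from each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical production facilities and item IDs (illustrative only).
strata = {
    "facility_A": [f"A{i}" for i in range(500)],
    "facility_B": [f"B{i}" for i in range(300)],
    "facility_C": [f"C{i}" for i in range(200)],
}

# Assumed allocation: sample 10% of the units in each stratum.
sizes = {name: len(units) // 10 for name, units in strata.items()}

rng = random.Random(0)
sample = stratified_sample(strata, sizes, rng)
for name, units in sample.items():
    print(name, len(units), units[:3])
```

Every facility is represented in the sample by construction, which a single s.r.s. from the pooled items would not guarantee.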





How do we select a s.r.s. of size n from a population of N units?
    STEP 1: Assign to each unit a number from 1 to N.
    STEP 2: Write each number on a slip of paper, place the N
    slips of paper in an urn, and shuffle them.
    STEP 3: Select n slips of paper at random, one at a time.
Alternatively, the entire process can be performed in software like
R. We will see this in the next lab session.
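The three steps above can be sketched in a few lines of code. The lecture uses R for this in the lab. The block below is a Python illustration instead, and the helper name `simple_random_sample` is hypothetical. It relies on `random.sample`, which draws without replacement.

```python
import random

def simple_random_sample(N, n, rng=None):
    """Select n distinct unit numbers from 1..N, every subset equally likely."""
    rng = rng or random.Random()
    # STEP 1: units are numbered 1..N.
    # STEPS 2-3: random.sample plays the role of the shuffled urn and the draws.
    return rng.sample(range(1, N + 1), n)

# A fixed seed is used here only so the example is reproducible.
sample = simple_random_sample(100, 10, random.Random(1))
print(sample)
```

Asking for more units than the population contains (n > N) raises a `ValueError`, as one would expect when drawing without replacement.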







Sampling without replacement simply means that a
population unit can be included in a sample at most once. For
example, a simple random sample is obtained by sampling
without replacement: Once a unit’s slip of paper is drawn, it
is not placed back into the urn.
Sampling with replacement means that after a unit’s slip of
paper is chosen, it is put back in the urn. Thus a population
unit could be included in the sample anywhere between 0 and
n times. Rolling a die can be thought of as sampling with
replacement from the numbers 1, 2, . . . , 6.
Though conceptually undesirable, sampling with replacement
is easier to work with from a mathematical point of view.
When a population is very large, sampling with and without
replacement are practically equivalent.
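The difference between the two schemes, and the reason it fades for large populations, can be illustrated as follows. This is a Python sketch: the helper `prob_all_distinct` is hypothetical, and the population sizes are chosen only for illustration. It computes the probability that n draws with replacement from N units contain no repeated unit, i.e. that the result could equally have come from sampling without replacement.

```python
import random
from math import prod

def prob_all_distinct(N, n):
    """Probability that n draws with replacement from N units contain no repeats."""
    return prod((N - i) / N for i in range(n))

rng = random.Random(2)
units = list(range(1, 7))             # the faces of a die

without = rng.sample(units, 4)        # without replacement: no unit repeats
with_repl = rng.choices(units, k=4)   # with replacement: repeats are allowed

print(without, with_repl)

# As N grows, a sample of size 10 drawn with replacement rarely repeats a unit.
for N in (100, 10_000, 1_000_000):
    print(N, round(prob_all_distinct(N, 10), 4))
```

As N grows with n fixed, the probability of a repeat shrinks toward zero, which is the sense in which the two schemes become practically equivalent.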





Non-representative samples arise whenever the sampling plan
is such that a part, or parts, of the population of interest are
either excluded from, or systematically under-represented in,
the sample. This is called selection bias.
Two examples of non-representative samples are self-selected
and convenience samples.
A self-selected sample often occurs when people are asked to
send in their opinions in surveys or questionnaires. For
example, in a political survey, often those who feel that things
are running smoothly or who support an incumbent will
(apathetically) not respond, whereas those activists who
strongly desire change will voice their opinions.





    A convenience sample is a sample made up of units that
    are most easily reached. For example, randomly selecting
    students from your classes will not result in a sample that is
    representative of all PSU students because your classes are
    composed mostly of students with the same major as you.
    A famous example of selection bias is the following.

Example (The Literary Digest poll of 1936)
The magazine had been extremely successful in predicting the
results in US presidential elections, but in 1936 it predicted a
3-to-2 victory for Republican Alf Landon over the Democratic
incumbent Franklin Delano Roosevelt. Worth noting is that this
prediction was based on 2.3 million responses (out of 10 million
questionnaires sent). On the other hand, Gallup correctly predicted
the outcome of that election by surveying only 50,000 people.
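Why a very large self-selected sample can mislead while a small random sample does well can be sketched with a simulation. Everything in the Python block below is hypothetical: the electorate is simulated, the 60% support level and the response rates are made-up numbers, and the assumption that opponents of the incumbent are three times as likely to respond is only one possible mechanism of selection bias, not a reconstruction of the 1936 poll.

```python
import random

rng = random.Random(1936)

# Hypothetical electorate: True means the voter supports the incumbent.
# The 60% support level is a made-up value for illustration.
population = [rng.random() < 0.60 for _ in range(200_000)]
true_share = sum(population) / len(population)

# Self-selected sample: opponents of the incumbent are ASSUMED to be three
# times as likely to mail back a questionnaire as supporters.
respond_prob = {True: 0.10, False: 0.30}
self_selected = [v for v in population if rng.random() < respond_prob[v]]
biased_share = sum(self_selected) / len(self_selected)

# A much smaller simple random sample from the same electorate.
srs = rng.sample(population, 1_000)
srs_share = sum(srs) / len(srs)

print(f"true share of incumbent supporters: {true_share:.3f}")
print(f"self-selected sample (n={len(self_selected)}): {biased_share:.3f}")
print(f"simple random sample (n={len(srs)}): {srs_share:.3f}")
```

Under these assumptions the self-selected sample is many times larger than the random one, yet its estimate lands far from the true share, while the small random sample stays close to it. Sample size does not compensate for selection bias.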









More Related Content

PPT
Basic Statistical Concepts and Methods
PDF
StatisticsA Review of Basic Statistical Concepts
PDF
Rammeverk: Developing basic competences in Statistics Denmark
PDF
Lesson2
PPT
Introduction to statistics 2013
PPT
Statistics lesson 1
PPTX
Statistical Process Control
PDF
STATISTICS AND PROBABILITY (TEACHING GUIDE)
Basic Statistical Concepts and Methods
StatisticsA Review of Basic Statistical Concepts
Rammeverk: Developing basic competences in Statistics Denmark
Lesson2
Introduction to statistics 2013
Statistics lesson 1
Statistical Process Control
STATISTICS AND PROBABILITY (TEACHING GUIDE)

Similar to B.lect1 (20)

PPT
PPT
grade7statistics-150427083137-conversion-gate01.ppt
PDF
statistics.pdf
DOC
Statistics Assignments 090427
PPT
chap1.ppt
PDF
Statistics of engineer’s with basic concepts in statistics
PDF
StatIstics module 1
PDF
Basic stat
PPT
A basic Introduction To Statistics with examples
PPT
Statistics Vocabulary Chapter 1
PPT
PPT
Grade 7 Statistics
PPT
Sect 1.1 1.4
DOCX
Hcai 5220 lecture notes on campus sessions fall 11(2)
PPTX
Presentation1
PPT
Bioststistic mbbs-1 f30may
DOCX
Chapter 1
PDF
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
PDF
statistics - Populations and Samples.pdf
PPT
Chapter 3
grade7statistics-150427083137-conversion-gate01.ppt
statistics.pdf
Statistics Assignments 090427
chap1.ppt
Statistics of engineer’s with basic concepts in statistics
StatIstics module 1
Basic stat
A basic Introduction To Statistics with examples
Statistics Vocabulary Chapter 1
Grade 7 Statistics
Sect 1.1 1.4
Hcai 5220 lecture notes on campus sessions fall 11(2)
Presentation1
Bioststistic mbbs-1 f30may
Chapter 1
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
statistics - Populations and Samples.pdf
Chapter 3
Ad

More from Ankit Katiyar (20)

DOC
Transportation and assignment_problem
DOC
Time and space complexity
PDF
The oc curve_of_attribute_acceptance_plans
PDF
Stat methchapter
PDF
Simple queuingmodelspdf
PDF
Scatter diagrams and correlation and simple linear regresssion
PDF
Queueing 3
PDF
Queueing 2
PDF
Queueing
PDF
Probability mass functions and probability density functions
PDF
Lecture18
PDF
PDF
Lect 02
PDF
PDF
Introduction to basic statistics
PDF
Conceptual foundations statistics and probability
PDF
PDF
Applied statistics and probability for engineers solution montgomery && runger
PDF
A hand kano-model-boston_upa_may-12-2004
PDF
08.slauson.dissertation
Transportation and assignment_problem
Time and space complexity
The oc curve_of_attribute_acceptance_plans
Stat methchapter
Simple queuingmodelspdf
Scatter diagrams and correlation and simple linear regresssion
Queueing 3
Queueing 2
Queueing
Probability mass functions and probability density functions
Lecture18
Lect 02
Introduction to basic statistics
Conceptual foundations statistics and probability
Applied statistics and probability for engineers solution montgomery && runger
A hand kano-model-boston_upa_may-12-2004
08.slauson.dissertation
Ad

B.lect1

  • 1. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Lecture 1 Chapter 1: Basic Statistical Concepts M. George Akritas M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 2. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Why Statistics? Populations, Samples, and Census Some Sampling Concepts Representative Samples Simple Random and Stratified Sampling Sampling With and Without Replacement Non-representative Sampling M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 3. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Example (Examples of Engineering/Scientific Studies) Comparing the compressive strength of two or more cement mixtures. Comparing the effectiveness of three cleaning products in removing four different types of stains. Predicting failure time on the basis of stress applied. Assessing the effectiveness of a new traffic regulatory measure in reducing the weekly rate of accidents. Testing a manufacturer’s claim regarding a product’s quality. Studying the relation between salary increases and employee productivity in a large corporation. What makes these studies challenging (and thus to require Statistics) is the inherent or intrinsic variability: M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 4. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts The compressive strength of different preparations of the same cement mixture will differ. The figure in http://sites. stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf shows 32 compressive strength measurements, in MPa (MegaPascal units), of test cylinders 6 in. in diameter by 12 in. high, using water/cement ratio of 0.4, measured on the 28th day after they are made. Under the same stress, two beams will fail at different times. The proportion of defective items of a certain product will differ from batch to batch. Intrinsic variability renders the objectives of the case studies, as stated, ambiguous. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 5. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts The objectives of the case studies can be made precise if stated in terms of averages or means. Comparing the average hardness of two different cement mixtures. Predicting the average failure time on the basis of stress applied. Estimation of the average coefficient of thermal expansion. Estimation of the average proportion of defective items. Moreover, because of variability, the words ”average” and ”mean” have a technical meaning which can be made clear through the concepts of population and sample. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 6. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Definition Population is a well-defined collection of objects or subjects, of relevance to a particular study, which are exposed to the same treatment or method. Population members are called units. Example (Examples of populations:) All water samples that can be taken from a lake. All items of a certain manufactured product. All students enrolled in Big Ten universities during the 2007-08 academic year. Two types of cleaning products. (Each type corresponds to a population.) M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 7. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts The objective of a study is to investigate certain characteristic(s) of the units of the population(s) of interest. Example (Examples of characteristics:) All water samples taken from a lake. Characteristics: Mercury concentration; Concentration of other pollutants. All items of a certain manufactured product (that have, or will be produced). Characteristic: Proportion of defective items. All students enrolled in Big Ten universities during the 2007-08 academic year. Characteristics: Favorite type of music; Political affiliation. Two types of cleaning products. Characteristic: cleaning effectiveness. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 8. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts In the example where different (but of the same type) beams are exposed to different stress levels: the characteristic of interest is time to failure of a beam under each stress level, and each stress level used in the study corresponds to a separate population which consists of all beams that will be exposed to that stress level. This emphasizes that populations are defined not only by the units they consist of, but also by the method or treatment applied to these units. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 9. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Full (i.e. population-level) understanding of a characteristic requires the examination of all population units, i.e. a census. For example, full understanding of the relation between salary and productivity of a corporation’s employees requires obtaining these two characteristics from all employees. However, taking a census can be time consuming and expensive: The 2000 U.S. Census costed $6.5 billion, while the 2010 Census costed $13 billion. Moreover, census is not feasible if the population is hypothetical or conceptual, i.e. not all members are available for examination. Because of the above, we typically settle for examining all units in a sample, which is a subset of the population. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 10. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Due to the intrinsic variability, the sample properties/attributes of the characteristic of interest will differ from those of the population. For example The average mercury concentration in 25 water samples will differ from the overall mercury concentration in the lake. The proportion in a sample of 100 PSU students who favor the use of solar energy will differ from the corresponding proportion of all PSU students. The relation between bear’s chest girth and weight in a sample of 10 bears, will differ from the corresponding relation in the entire population of 50 bears in a forested region. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 11. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts The GOOD NEWS is that, if the sample is suitably drawn, then sample properties approximate the population properties. 400 300 Weight 200 100 20 25 30 35 40 45 50 55 Chest Girth Figure: Population and sample relationships 1between Basic Statistical Concepts M. George Akritas Lecture Chapter 1: chest girth and
  • 12. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Sampling Variability Samples properties of the characteristic of interest also differ from sample to sample. For example: 1. The number of US citizens, in a sample of size 20, who favor expanding solar energy, will (most likely) be different from the corresponding number in a different sample of 20 US citizens. 2. The average mercury concentration in two sets of 25 water samples drawn from a lake will differ. The term sampling variability is used to describe such differences in the characteristic of interest from sample to sample. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 13. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts 400 300 Weight 200 100 20 25 30 35 40 45 50 55 Chest Girth Figure: Illustration of Sampling Variability. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 14. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Population level properties/attributes of characteristic(s) of interest are called (population) parameters. Examples of parameters include averages, proportions, percentiles, and correlation coefficient. The corresponding sample properties/attributes of characteristics are called statistics. The term sports statistics comes from this terminology. Sample statistics approximate the corresponding population parameters but are not equal to them. Statistical inference deals with the uncertainty issues which arise in approximating parameters by statistics. The tools of statistical inference include point and interval estimation, hypothesis testing and prediction. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 15. Outline Why Statistics? Populations, Samples, and Census Some Sampling Concepts Example (Examples of Estimation, Hypothesis Testing and Prediction) Estimation (point and interval) would be used in the task of estimating the coefficient of thermal expansion of a metal, or the air pollution level. Hypothesis testing would be used for deciding whether to take corrective action to bring the air pollution level down, or whether a manufacturer’s claim regarding the quality of a product is false. Prediction arises in cases where we would like to predict the failure time on the basis of the stress applied, or the age of a tree on the basis of its trunk diameter. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 16. Outline Representative Samples Why Statistics? Simple Random and Stratified Sampling Populations, Samples, and Census Sampling With and Without Replacement Some Sampling Concepts Non-representative Sampling For valid statistical inference the sample must be representative of the population. For example, a sample of PSU basketball players is not representative of PSU students, if the characteristic of interest is height. Typically it is hard to tell whether a sample is representative of the population. So, we define a sample to be representative if . . . (cyclical definition!!) it allows for valid statistical inference. The only guarantee for that comes from the method used to select the sample (sampling method). The good news is that there are several sampling methods guarantee representativeness. M. George Akritas Lecture 1 Chapter 1: Basic Statistical Concepts
  • 17. Definition A sample of size n is a simple random sample (s.r.s.) if the selection process ensures that every sample of size n has an equal chance of being selected. To select an s.r.s. of size 10 from a population of 100 units, each of the 100!/(10!90!) possible samples of size 10 must be equally likely. In simple random sampling every member of the population has the same chance of being included in the sample. The converse, however, is not true. Example To select a sample of 2 students from a population of 20 male and 20 female students, one selects at random one male and one female student. Is this an s.r.s.? (Does every student have the same chance of being included in the sample?)
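The counting behind this slide can be checked directly. The sketch below computes 100!/(10!90!) and then enumerates the one-male-one-female scheme from the example; the student labels (`M1`, `F1`, ...) are hypothetical placeholders.

```python
from itertools import combinations
from math import comb

# Number of distinct samples of size 10 from 100 units: 100!/(10! 90!)
n_samples = comb(100, 10)

# Population from the example: 20 male and 20 female students (labels are hypothetical).
males = [f"M{i}" for i in range(1, 21)]
females = [f"F{i}" for i in range(1, 21)]
students = males + females

# Scheme from the example: one male and one female, each chosen at random.
possible_pairs = [(m, f) for m in males for f in females]  # equally likely outcomes
# Every pair of students that an s.r.s. of size 2 could produce.
all_pairs = list(combinations(students, 2))

# Chance that one particular student (here M1) is included under the scheme.
p_include = sum(1 for pair in possible_pairs if "M1" in pair) / len(possible_pairs)

# Pairs the scheme can never produce (two males or two females).
impossible = len(all_pairs) - len(possible_pairs)
print(n_samples, p_include, len(all_pairs), impossible)
```

Under these assumptions every student is included with probability 1/20, the same as under an s.r.s. of size 2 from 40 students, yet 380 of the 780 possible pairs can never be selected. So the scheme is not an s.r.s., which illustrates why the converse fails.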
  • 18. Another sampling method for obtaining a representative sample is called stratified sampling. Definition A stratified sample consists of simple random samples from each of a number of groups, called strata, which are non-overlapping and together make up the entire population. Examples of strata include ethnic groups, age groups, and production facilities. If the units in the different strata differ in terms of the characteristic under study, stratified sampling is preferable to s.r.s. For example, if different production facilities differ in terms of the proportion of defective products, a stratified sample is preferable.
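A stratified sample can be sketched as one simple random sample per stratum. The facilities, unit labels, and sample sizes below are hypothetical, and the proportional allocation of the total sample size across strata is an assumption made here; the slides do not say how the stratum sample sizes are chosen.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample (without replacement) from each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

rng = random.Random(2)  # fixed seed so the sketch is reproducible

# Hypothetical strata: items produced at three facilities.
strata = {
    "facility_A": [f"A{i}" for i in range(500)],
    "facility_B": [f"B{i}" for i in range(300)],
    "facility_C": [f"C{i}" for i in range(200)],
}

# Assumed proportional allocation of a total sample of 50 units.
sizes = {"facility_A": 25, "facility_B": 15, "facility_C": 10}

sample = stratified_sample(strata, sizes, rng)
for name, units in sample.items():
    print(name, len(units))
```

Each facility is guaranteed to appear in the sample, which a single s.r.s. of 50 units from the pooled population would not ensure.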
  • 19. How do we select an s.r.s. of size n from a population of N units? STEP 1: Assign to each unit a number from 1 to N. STEP 2: Write each number on a slip of paper, place the N slips of paper in an urn, and shuffle them. STEP 3: Select n slips of paper at random, one at a time. Alternatively, the entire process can be performed in software like R. We will see this in the next lab session.
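The three steps above can be mimicked in a few lines of code. The lecture uses R for this; the sketch below uses Python's standard library as a stand-in, and the function name `simple_random_sample` is introduced here for illustration only.

```python
import random

def simple_random_sample(N, n, rng):
    """Select a simple random sample of n unit numbers from 1..N, urn style."""
    slips = list(range(1, N + 1))            # STEP 1: number the units 1 to N
    rng.shuffle(slips)                       # STEP 2: shuffle the slips in the urn
    return [slips.pop() for _ in range(n)]   # STEP 3: draw n slips, one at a time

rng = random.Random(0)  # fixed seed so the sketch is reproducible
chosen = simple_random_sample(N=100, n=10, rng=rng)
print(sorted(chosen))
```

Because a drawn slip is removed from the list, no unit can be chosen twice. The standard-library call `random.sample(range(1, N + 1), n)` performs the same selection in one step.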
  • 20. Sampling without replacement simply means that a population unit can be included in a sample at most once. For example, a simple random sample is obtained by sampling without replacement: once a unit’s slip of paper is drawn, it is not placed back into the urn. Sampling with replacement means that after a unit’s slip of paper is chosen, it is put back in the urn. Thus a population unit could be included in the sample anywhere between 0 and n times. Rolling a die can be thought of as sampling with replacement from the numbers 1, 2, . . . , 6. Though conceptually undesirable, sampling with replacement is easier to work with from a mathematical point of view. When a population is very large relative to the sample, sampling with and without replacement are practically equivalent.
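Both kinds of sampling are available in Python's standard library, and the claim about large populations can be checked with a short calculation. This is a sketch: the helper `prob_all_distinct` and the population sizes are chosen here for illustration.

```python
import random
from math import prod

rng = random.Random(3)  # fixed seed so the sketch is reproducible
population = list(range(1, 7))

# Without replacement: each unit appears at most once.
without_replacement = rng.sample(population, k=4)

# With replacement: like rolling a die 10 times, so repeats are possible.
with_replacement = rng.choices(population, k=10)

def prob_all_distinct(N, n):
    """Probability that n draws with replacement from N units contain no repeated unit."""
    return prod((N - i) / N for i in range(n))

small = prob_all_distinct(100, 10)         # about 0.63: repeats are common
large = prob_all_distinct(1_000_000, 10)   # very close to 1: repeats are rare
print(without_replacement, with_replacement, small, large)
```

When the probability of drawing no repeated unit is close to 1, a sample drawn with replacement almost always looks like one drawn without replacement, which is the sense in which the two methods are practically equivalent for very large populations.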
  • 21. Non-representative samples arise whenever the sampling plan is such that a part, or parts, of the population of interest are either excluded from, or systematically under-represented in, the sample. This is called selection bias. Two examples of non-representative samples are self-selected and convenience samples. A self-selected sample often occurs when people are asked to send in their opinions in surveys or questionnaires. For example, in a political survey, often those who feel that things are running smoothly or who support an incumbent will (apathetically) not respond, whereas those activists who strongly desire change will voice their opinions.
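The effect of self-selection can be seen in a small simulation. Everything numeric here is an assumption made for illustration: a population in which 60% support the incumbent, with supporters responding to the survey 10% of the time and opponents 40% of the time.

```python
import random

rng = random.Random(4)  # fixed seed so the sketch is reproducible

# Hypothetical population: True means the person supports the incumbent (assumed 60%).
population = [rng.random() < 0.60 for _ in range(100_000)]
true_support = sum(population) / len(population)

def responds(supports):
    """Assumed response rates: supporters 10%, opponents 40%."""
    return rng.random() < (0.10 if supports else 0.40)

# Self-selected sample: only those who choose to respond are counted.
self_selected = [person for person in population if responds(person)]
self_selected_support = sum(self_selected) / len(self_selected)

# Simple random sample of 1,000 people for comparison.
srs = rng.sample(population, 1000)
srs_support = sum(srs) / len(srs)

print(true_support, self_selected_support, srs_support, len(self_selected))
```

Under these assumptions the self-selected sample contains tens of thousands of responses yet badly underestimates support for the incumbent, while the much smaller simple random sample lands close to the true proportion.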
  • 22. A convenience sample is a sample made up of the units that are most easily reached. For example, randomly selecting students from your classes will not result in a sample that is representative of all PSU students, because your classes are mostly made up of students with the same major as you. A famous example of selection bias is the following. Example (The Literary Digest poll of 1936) The magazine had been extremely successful in predicting the results of US presidential elections, but in 1936 it predicted a 3-to-2 victory for Republican Alf Landon over the Democratic incumbent Franklin Delano Roosevelt. Worth noting is that this prediction was based on 2.3 million responses (out of 10 million questionnaires sent). On the other hand, Gallup correctly predicted the outcome of that election by surveying only 50,000 people.