Sampling Andrew Martin PS 372 University of Kentucky
What do polls tell us?
Population If we want to assess American support for competing presidential candidates, we must clarify what we mean by Americans. In other words, we need to specify a population.  A  population  is the complete set of relevant units of analysis. For the purpose of studying elections, the population is generally defined as the U.S. voting-age population (residents 18 and older).
Sample Interviewing every voting-age American would be impossible; the time and money constraints are too great. However, it is possible (and more practical) to select a sample from the population for investigation. A sample is any subset of units collected in some manner from a population. The sample size and sampling method ultimately determine the quality of inferences that can be made about the population.
Population vs. Sample
Study | Population | Sample
U.S. Voting | Voting-Age Pop. | Gallup Poll Respondents
Military Coups | All coups | Coups in Latin America in 1980s
Supreme Court Decision-making | All decisions on the merits | Merits decisions on Burger Court
Democratization | All democratizing countries | Democratizing post-Cold War
Populations and Samples Ideally, political scientists would like to measure and gather information about the population. Examples: Averages, differences between two groups, relationships among variables. If this information can be found and quantified for the entire population, the number is known as a population parameter. In practice, sample statistics allow political scientists to approximate the corresponding population values, or parameters.
Population Parameters Population parameters are typically denoted by lowercase English or Greek letters, usually the Greek letter theta (θ). A proportion, such as the proportion of Americans who support the war in Iraq at a particular time, is typically designated as P or π.
Sample Statistics Sample statistics are frequently expressed with a hat (^) over a character to denote that the quantity is not a parameter but a statistic. Sometimes lowercase p is used for a sample proportion.
Statistics vs. Parameters For a population mean, the lowercase Greek letter mu (μ) is used. For the corresponding sample statistic, μ hat (μ̂) or Y bar (Ȳ) is used.
Population vs. Samples An element is a single occurrence, realization or instance of the objects or entities being studied. A population can be subdivided into smaller groups known as strata. Each stratum shares one or more characteristics.
Sampling The particular population from which a sample is actually drawn is called a sampling frame. Sampling frames are supposed to contain all elements that are part of the population of interest, but in practice are often incomplete. Example: Polling University of Kentucky (UK) students using the annual student phone directory.
1936 Presidential Election The Literary Digest predicted that Republican Alf Landon would defeat Democrat Franklin D. Roosevelt. The population: U.S. voters. The sampling frame: Telephone directories and automobile registration lists. Telephone and car ownership were not common then. The sample was not representative of the actual population because it overrepresented wealthy voters.
Sampling Now, virtually everyone has a telephone. But some people have unlisted numbers. Researchers have developed  random digit dialing  to deal with this problem. A computer randomly selects telephone numbers, which is how people with unlisted numbers are contacted. However, not everyone owns a telephone. Millions of people are switching to cell phones, which will eventually cause pollsters to change their methodology.
Sampling Political science researchers like to use information collected in the sample to make inferences about the whole population.  If the sampling frame is incomplete or inappropriate,  sample bias  will occur. This causes the sample to be unrepresentative of the population and can lead scholars to draw incorrect conclusions.
Sampling A probability sample is simply a sample for which each element in the total population has a known probability of being sampled. A nonprobability sample is one in which each element in the population has an unknown probability of being selected.
Simple Random Sample In a  simple random sample  each element and combination of elements has an equal chance of being selected. However, this is often difficult to do in practice.
Simple Random Sample During the Vietnam War, the Selective Service decided it would randomly draft soldiers by sampling days of the year. A drum contained 366 capsules with days of the year. Capsules were drawn, and men born on the day written on the capsule would be called to serve in the military unless exempted. However, the capsules must not have been properly mixed, because the Selective Service tended to oversample days during the last six months of the year.
Simple Random Sample Method 1 Example: We have 1,507 elements in the population and wish to draw a sample of 150. Every element is numbered, starting at 1 and ending at 1,507. Using a random number table, an element is selected each time its corresponding number appears. Any system of combining the numbers is acceptable as long as the numbers are random.
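The numbered-list procedure above can also be sketched in software, with a pseudorandom generator standing in for the random number table. This is only an illustration under stated assumptions: the function name `simple_random_sample` and the seed value are made up for the example and do not come from the lecture.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct element numbers from 1..population_size."""
    rng = random.Random(seed)  # seed is an assumption, used only for repeatability
    # random.sample draws without replacement, so every subset of the
    # requested size has the same chance of being chosen.
    return rng.sample(range(1, population_size + 1), sample_size)

# The example from the slide: 1,507 numbered elements, sample of 150.
sample = simple_random_sample(1507, 150, seed=372)
print(sorted(sample)[:10])
```

Because the draw is made without replacement, no element can be selected twice, which matches the idea of picking each numbered element at most once from the list.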
Simple Random Sample Method 2 Example: We have 1,507 elements in the population and wish to draw a sample of 150. All elements are represented on corresponding marbles and put in a hat, which is continuously and thoroughly mixed. Each element has an equal chance of being selected.
Systematic Sample Elements are selected from a list at predetermined intervals. In other words, they are chosen systematically rather than randomly. Every j th element on a list is selected. The number j is known as the sampling interval. If you have a population of 2,000 on a list and you want a sample of 200, you can select every 10th element on the list for the sample. Usually the starting number is randomly selected. This is known as a random start.
Systematic Sample Systematic samples could be biased if: 1. The elements on the list have been ranked according to a characteristic. 2. The list contains a pattern that corresponds to the sampling interval.
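A systematic sample with a random start can be sketched as follows. The function name `systematic_sample` is hypothetical, and the sketch assumes the population size is an exact multiple of the desired sample size, as in the 2,000 and 200 example.

```python
import random

def systematic_sample(elements, sample_size, seed=None):
    """Select every j-th element from a list, beginning at a random start."""
    interval = len(elements) // sample_size  # sampling interval j
    rng = random.Random(seed)                # seed is an assumption for repeatability
    start = rng.randrange(interval)          # random start within the first interval
    # Step through the list at the sampling interval, then trim to size.
    return elements[start::interval][:sample_size]

population = list(range(1, 2001))  # 2,000 numbered elements
sample = systematic_sample(population, 200, seed=1)
print(sample[:5])
```

If the list is sorted by some characteristic, or repeats a pattern every 10 entries, each possible start yields a different kind of sample, which is the source of the two biases listed above.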
Stratified Sample A  stratified sample  is a probability sample in which elements sharing one or more characteristics are grouped, and elements are selected from each group in proportion to the group’s representation in the total population. It is less difficult to draw a sample from a homogeneous population than a heterogeneous population.
Stratified Sample Can be proportionate or disproportionate.  In a proportionate sample, each stratum is represented in proportion to its size in the population. To determine the number of elements to sample in each stratum, a sampling fraction must be calculated.
Proportionate Stratified Sample Ex: We have 8,000 freshmen, 5,000 sophomores, 4,000 juniors and 3,000 seniors we wish to poll about the presidential election. We want to sample 2,000 students. The sampling fraction would be the desired sample size divided by the population, so 2,000/20,000. Therefore, we would sample 10 percent of each stratum, which in this case is defined by school year.
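The allocation arithmetic in this example can be checked with a few lines of code. The dictionary layout and variable names are assumptions made for illustration; the counts come from the slide.

```python
# Stratum sizes from the example.
strata = {"freshmen": 8000, "sophomores": 5000, "juniors": 4000, "seniors": 3000}
desired_sample_size = 2000

population_size = sum(strata.values())                     # 20,000 students
sampling_fraction = desired_sample_size / population_size  # 2,000 / 20,000

# Apply the same fraction to every stratum (integer arithmetic avoids rounding).
allocation = {
    name: size * desired_sample_size // population_size
    for name, size in strata.items()
}

print(sampling_fraction)
print(allocation)
```

Each stratum contributes the same 10 percent share, so the per-stratum counts add back up to the desired total of 2,000.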
Proportionate Stratified Sample When selecting characteristics on which to stratify a list, you should choose those expected to have a relationship with the dependent variable. Ex: Household income samples should stratify by education, sex and race. Ex: Members of Congress (MCs) stratified by party and experience. Ex: News stories by network.
Disproportionate Stratified Sample In a disproportionate sample, some strata are underrepresented and others are overrepresented. Usually, strata making up a smaller percentage of the population get oversampled so we can make useful inferences about that group independent of the other strata. To prevent having a biased sample, each stratum is weighted by its proportion of the population.
Disproportionate  Stratified Sample JRM 223 .625(Liberal Arts) + .125(Engineering) + .25(Business) = Mean Student Body GPA .625(2.5) + .125(3.3) + .25(2.7) = 2.65
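The weighted mean on this slide can be reproduced directly. The sketch assumes that the decimals .625, .125 and .25 are each college's share of the student body and that the values in parentheses are the mean GPAs observed within each stratum; the variable names are illustrative.

```python
# Population share of each stratum (the weights), as given on the slide.
weights = {"Liberal Arts": 0.625, "Engineering": 0.125, "Business": 0.25}
# Mean GPA within each stratum (assumed to be the per-stratum sample means).
stratum_gpa = {"Liberal Arts": 2.5, "Engineering": 3.3, "Business": 2.7}

# Weight each stratum mean by its share of the population, then add them up.
mean_gpa = sum(weights[name] * stratum_gpa[name] for name in weights)

print(round(mean_gpa, 2))
```

The weights sum to 1, so an oversampled stratum such as Engineering still counts for only its actual share of the student body.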
Cluster Samples A cluster sample is a probability sample in which the sampling frame initially consists of clusters of elements.
NN 174
Cluster Samples Suppose there are 500 blocks, and from these blocks 25 are chosen at random.  On these 25 blocks, there are 4,000 dwelling units or households. One quarter of these households will be contacted because we desire a sample of 1,000 individuals.
Cluster Samples Each household's probability of being selected is the same. How do we know this? 25/500 (probability that the household's block will be chosen) × 1000/4000 (probability of being surveyed if block is chosen) = 1/80
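The two-stage probability can be verified with exact fractions. The variable names below are assumptions made for the example.

```python
from fractions import Fraction

# Stage 1: 25 of the 500 blocks are chosen at random.
p_block = Fraction(25, 500)
# Stage 2: 1,000 of the 4,000 households on the chosen blocks are contacted.
p_household_given_block = Fraction(1000, 4000)

# A household is selected only if both stages select it.
p_household = p_block * p_household_given_block

print(p_household)
```

Using `Fraction` keeps the result as an exact ratio instead of a rounded decimal.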
Nonprobability Samples Sometimes an element's chance of being selected is unknown, and nonprobability samples have to be collected. Although probability samples are preferable, sometimes they are not feasible. Researchers may be able to learn more by studying carefully selected, even unusual cases.
Nonprobability Samples Purposive samples allow the researcher to have discretion in selecting elements for observation. (Ex: Fenno's Home Style) A quota sample is a sample in which elements are sampled in proportion to their representation in the population. It is similar to a stratified sample, but elements are not chosen probabilistically. In a snowball sample, respondents are asked to identify other persons who might qualify for inclusion in the sample.
Statistical Inference Statistical inference  involves the mathematical theory and techniques for making conjectures about the unknown characteristics (parameters) of populations based on samples.
Statistical Inference Sample statistics provide us with estimates or approximations of population parameters. These estimates may differ from the “true” value of the population parameter, but if the sample is collected correctly and is large enough, the estimates are unlikely to be far from the truth.
Statistical Inference We will focus on three concepts: expected values, standard errors, and sampling distributions.
Expected Value Expected value  is the mean or average value of a sample statistic based on repeated samples from a population.
Expected Value Suppose a candidate for state senate wants to know how many independent voters live in a district, which has grown rapidly during the last 10 years. Therefore, there are no reliable Census data available. Why might a state senator care about the number of independents in his/her district?
Expected Value Suppose the true level of registered Independents is 25 percent, or .25.  In formal terms,  P  = .25, where  P  = population parameter. You take the first sample. Two of 10 respondents say they are independents. Your first estimate, or sample statistic, has some  sampling error . Specifically, the sampling error is the discrepancy (.05) between the population parameter (.25) and the sample statistic (.20).
Expected Value What about four samples? Let's assume you conduct 4 samples of 10 and the proportions of independents are .20, .30, .40 and .20: (.20 + .30 + .40 + .20)/4 = .275 (observed value or mean not far from true value of .25) Four samples of 10 bring us closer to the truth than one sample. What about 1,000 samples of 10? What about 1,000 samples of 50?
Expected Value If statistics (or observed values) are calculated for each of many, many independently and randomly chosen samples, their average or mean will equal the corresponding population parameter (or true value).  Statisticians refer to this mean as the expected value ( E ) of the estimator.
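The idea of repeated sampling can be explored with a small simulation. This is a sketch under stated assumptions: respondents are simulated as independent draws with a true proportion of .25, and the function names and seed are invented for the example.

```python
import random

def sample_proportion(true_p, n, rng):
    """Proportion of independents in one simulated sample of n respondents."""
    return sum(rng.random() < true_p for _ in range(n)) / n

def mean_of_sample_proportions(true_p, n, num_samples, seed=0):
    """Average the sample proportion over many independent samples."""
    rng = random.Random(seed)  # seed is an assumption for repeatability
    proportions = [sample_proportion(true_p, n, rng) for _ in range(num_samples)]
    return sum(proportions) / num_samples

# 4 samples of 10, 1,000 samples of 10, and 1,000 samples of 50.
for num_samples, n in [(4, 10), (1000, 10), (1000, 50)]:
    print(num_samples, n, mean_of_sample_proportions(0.25, n, num_samples))
```

With only a handful of samples the average can still sit noticeably away from .25; as the number of samples grows, the average should settle close to the true value.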
Expected Value If θ represents the population parameter, then θ̂ (theta hat) represents a sample estimator of that characteristic. We can then write: E(θ̂) = θ
Expected Value In the case of a sample proportion based on a simple random sample, we have: E(p) = P, where p is the sample proportion and P is the population proportion. In the long run, the average of the sample statistics would theoretically equal the true value, the population proportion.
Sampling Distribution A sampling distribution of a sample statistic is a theoretical expression that describes the mean, variation, and shape of the distribution in an infinite number of occurrences of the statistic when calculated on samples of size N drawn independently and randomly from a population.
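A sampling distribution cannot be observed directly, but a large number of simulated samples gives a rough picture of its center and spread. The sketch below assumes a true proportion of .25 and samples of size 50; the function name is hypothetical, and the comparison value uses the textbook result that the standard deviation of a sample proportion is the square root of P(1 − P)/N.

```python
import random
import statistics

def simulate_sampling_distribution(true_p, n, num_samples, seed=0):
    """Return the sample proportions from many independent samples of size n."""
    rng = random.Random(seed)  # seed is an assumption for repeatability
    return [
        sum(rng.random() < true_p for _ in range(n)) / n
        for _ in range(num_samples)
    ]

proportions = simulate_sampling_distribution(0.25, 50, 5000)

center = statistics.mean(proportions)    # should be near the true value .25
spread = statistics.pstdev(proportions)  # variability across samples
theoretical_spread = (0.25 * 0.75 / 50) ** 0.5

print(center, spread, theoretical_spread)
```

A finite simulation only approximates the theoretical distribution, so the simulated spread should be close to, not identical to, the theoretical value.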
JRM 230
JRM 233
JRM 234
JRM 235
Confidence and Error Confidence is the degree of belief that an estimated range of values, bounded by a low and a high value, includes or covers the population parameter. In political science this concept is normally described by a confidence interval. Standard error is the standard deviation, or measure of variability, of a sampling distribution. In other words, it tells us how much a sample statistic varies from one sample to the next.
Standard Error The standard error of a reported proportion or percentage  p  measures its accuracy, and is the estimated standard deviation of that percentage. It can be estimated from just  p  and the sample size,  n , if  n  is small relative to the population size, using the following formula.
Standard Error SE(p) = √( p(1 − p) / n )
Confidence Intervals Get a poll, talk about it.
 
Gallup Poll Standard Error If we want the margin of error for the poll, which is half the width of a 95 percent confidence interval, we multiply this number (the standard error, .01) by 1.96. .01 * 1.96 = .0196 ≈ .02, or about 2 percentage points.
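The step from standard error to margin of error can be sketched in code. The poll figures here are hypothetical, chosen only so that the standard error comes out to .01 as on the slide: a reported proportion of .50 and 2,500 respondents. The function names are likewise assumptions.

```python
import math

def standard_error(p, n):
    """Estimated standard error of a sample proportion p from n respondents."""
    return math.sqrt(p * (1 - p) / n)

def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval; z = 1.96 gives 95 percent."""
    return z * standard_error(p, n)

p, n = 0.50, 2500                 # hypothetical poll result and sample size
se = standard_error(p, n)
moe = margin_of_error(p, n)
low, high = p - moe, p + moe      # 95 percent confidence interval

print(se, moe, (low, high))
```

Under these assumed figures the margin of error is about two percentage points, so the interval runs from roughly 48 to 52 percent.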
Margin of Error Graphic
Confidence Intervals Standard distribution photo

Week 7 - sampling
