Statistics: First Steps Andrew Martin PS 372 University of Kentucky
Variance Variance is a measure of dispersion of data points about the mean for interval- and ratio-level data. Explaining variance in the dependent variable is a fundamental goal for social scientists.
 
Standard Deviation Standard deviation  is a measure of dispersion of data points about the mean for interval- and ratio-level data.  Like the mean, standard deviation is sensitive to extreme values.  Standard deviation is calculated as the square root of the variance.
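As a minimal sketch of the two definitions above: the slides' own formulas were images and are not in the transcript, so the usual textbook formulas are assumed here, with a `sample` flag (a hypothetical parameter name) to switch between dividing by n - 1 and by n. The `scores` list is made-up illustration data.

```python
import math

def variance(values, sample=True):
    """Average squared deviation from the mean.

    Divides by n - 1 for a sample (assumed default) or by n for a population.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1 if sample else n)

def standard_deviation(values, sample=True):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))

# Hypothetical data, chosen only so the arithmetic is easy to follow.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(scores, sample=False))            # 4.0
print(standard_deviation(scores, sample=False))  # 2.0

# One extreme value inflates the standard deviation noticeably.
print(standard_deviation(scores + [40], sample=False))
```

The last line illustrates the point that standard deviation, like the mean, is sensitive to extreme values.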
 
 
Normal Distribution The bulk of observations lie in the center, where there is a single peak. In a normal distribution half (50 percent) of the observations lie above the mean and half lie below it. The mean, median and mode have the same value. Fewer and fewer observations fall in the tails. The distribution is symmetric about the mean.
Normal Distribution Mathematical theory allows us to know what percentage of observations lie within one (about 68%), two (about 95%) or three (about 99.7%) standard deviations of the mean. If data are not perfectly normally distributed, the percentages will only be approximations. Many naturally occurring variables do have nearly normal distributions. Some that do not can be made closer to normal by a logarithmic transformation.
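The coverage figures for a normal distribution can be checked with the error function from Python's standard library. This is only a sketch; `share_within` is a hypothetical helper name, and it relies on the identity that the share of a normal distribution within k standard deviations of the mean is erf(k / sqrt(2)).

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The three values come out near 68 percent, 95 percent and 99.7 percent.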
Frequency Distribution
What about categorical variables?
 
Example Calculate the ID and IQV for this PS 372 class's grades using the following frequencies and proportions:
Grade  Freq.  Prop.
A      4      .12
B      7      .21
C      4      .12
D      7      .21
E      12     .34
Index of Diversity
$ID = 1 - (p_a^2 + p_b^2 + p_c^2 + p_d^2 + p_e^2)$
$ID = 1 - (.12^2 + .21^2 + .12^2 + .21^2 + .34^2)$
$ID = 1 - (.0144 + .0441 + .0144 + .0441 + .1156)$
$ID = 1 - .2326$
$ID = .7674$
Index of Qualitative Variation
$IQV = \frac{1 - (p_a^2 + p_b^2 + p_c^2 + p_d^2 + p_e^2)}{1 - (1/K)}$
where K is the number of categories.
Index of Qualitative Variation
$IQV = \frac{.7674}{1 - 1/5} = \frac{.7674}{.8} \approx .9592$
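The ID and IQV arithmetic for the grade example can be reproduced with a short sketch. The function names are hypothetical; the proportions are the ones given on the slide, and K is taken to be the number of categories.

```python
def index_of_diversity(proportions):
    """ID: one minus the sum of squared category proportions."""
    return 1 - sum(p ** 2 for p in proportions)

def index_of_qualitative_variation(proportions):
    """IQV: ID divided by its maximum possible value, 1 - 1/K."""
    k = len(proportions)  # K = number of categories
    return index_of_diversity(proportions) / (1 - 1 / k)

# Grade proportions from the PS 372 example.
grade_proportions = {"A": 0.12, "B": 0.21, "C": 0.12, "D": 0.21, "E": 0.34}
props = list(grade_proportions.values())

id_value = index_of_diversity(props)
iqv_value = index_of_qualitative_variation(props)
print(f"ID  = {id_value:.4f}")
print(f"IQV = {iqv_value:.5f}")
```

Dividing by 1 - 1/K rescales the index so that a perfectly even spread across the K categories gives an IQV of 1.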
 
Data Matrix A  data matrix  is an array of rows and columns that stores the values of a set of variables for all the cases in a data set. This is frequently referred to as a dataset.
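A data matrix can be sketched in plain Python as a list of rows, one per case, with one key per variable. Everything below is invented for illustration: the case labels, variable names and values are hypothetical and are not the JRM example.

```python
# Each row is one case; each key is one variable. All values are hypothetical.
data_matrix = [
    {"case": "State A", "turnout_pct": 61.2, "region": "South"},
    {"case": "State B", "turnout_pct": 54.8, "region": "West"},
    {"case": "State C", "turnout_pct": 67.5, "region": "Midwest"},
]

# The columns of the matrix are the variables.
variables = list(data_matrix[0].keys())
print(variables)

# Pulling out one column gives every case's value on a single variable.
turnout_column = [row["turnout_pct"] for row in data_matrix]
print(turnout_column)
```

Reading across a row describes one case; reading down a column describes one variable.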
 
 
Data Matrix from JRM
Properties of Good Graphs Should answer several of the following questions: (JRM 384) 1. Where does the center of the distribution lie? 2. How spread out or bunched up are the observations? 3. Does it have a single peak or more than one? 4. Approximately what proportion of observations lie in the ends of the distribution?
Properties of Good Graphs 5. Do observations tend to pile up at one end of the measurement scale, with relatively few observations at the other end? 6. Are there values that, compared with most, seem very large or very small? 7. How does one distribution compare to another in terms of shape, spread, and central tendency? 8. Do values of one variable seem related to another variable?
 
 
 
 
 
Statistical Concepts Let's quickly review some concepts.
Population A  population  refers to any well-defined set of objects such as people, countries, states, organizations, and so on. The term doesn't simply mean the population of the United States or some other geographical area.
Population A sample is a subset of the population. Samples are drawn in some known manner and each case is chosen independently of the others. From here on out, when the book uses the term sample, random sample or simple random sample, it refers to the same concept: a sample chosen at random.
Populations Parameters are numerical features of a population. A sample statistic is an estimator that corresponds to a population parameter of interest and is used to estimate the population value. Ȳ (Y-bar) is the sample mean; μ is the population mean. A “hat” (^, also called a caret or circumflex) over a symbol marks an estimate of the corresponding parameter.
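The relationship between a parameter and a sample statistic can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is artificial (10,000 draws from a normal distribution with mean 50 and standard deviation 10), and the seed and sample size are arbitrary choices.

```python
import random
import statistics

random.seed(372)  # arbitrary seed so the run is repeatable

# An artificial population; in real research the population is rarely observed in full.
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)       # population parameter

# A simple random sample drawn without replacement.
sample = random.sample(population, 500)
y_bar = statistics.mean(sample)        # sample statistic used to estimate mu

print(f"population mean (mu):  {mu:.2f}")
print(f"sample mean (Y-bar):   {y_bar:.2f}")
```

The sample mean will usually land close to, but not exactly on, the population mean; that gap is what inference has to account for.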
Two Kinds of Inference Hypothesis Testing Point and interval estimation
Hypothesis Testing Many claims can be translated into specific statements about a population that can be confirmed or disconfirmed with the aid of probability theory. Ex: There is no ideological difference evident in the voting patterns of Republican and Democrat justices on the U.S. Supreme Court.
Point and Interval Estimation The goal here is to estimate unknown population parameters from samples and to surround those estimates with confidence intervals. Confidence intervals suggest the estimate's reliability or precision.
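One common way to put an interval around an estimated proportion is the normal-approximation formula, sketched below. The slide does not give a formula, so this choice is an assumption, as are the function name, the 95 percent multiplier z = 1.96, and the example of 60 heads in 100 flips.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion (z = 1.96 gives about 95%)."""
    p_hat = successes / n                               # point estimate
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Hypothetical data: 60 heads in 100 flips.
lower, upper = proportion_confidence_interval(60, 100)
print(f"point estimate: {60 / 100:.2f}")
print(f"95% interval:   {lower:.3f} to {upper:.3f}")
```

A narrower interval signals a more precise estimate; a larger sample shrinks the standard error and therefore the interval.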
Hypothesis Testing Start with a specific verbal claim or proposition. Ex: The chances of getting heads or tails when flipping a coin are roughly the same. Ex: The chances of the United States electing a Republican or Democrat president are roughly the same.
Hypothesis Testing Next, the researcher constructs a null hypothesis. A  null hypothesis  is a statement that a population parameter equals a specific value.
Hypothesis Testing Following up on the coin example, the null hypothesis would be that the probability of heads equals .5. Stated more formally: $H_0: P = .5$, where P stands for the probability that the coin will come up heads when tossed. $H_0$ is typically used to denote a null hypothesis.
Hypothesis Testing Next, specify an alternative hypothesis. An alternative hypothesis is a statement about the value or values of a population parameter. It is proposed as an alternative to the null hypothesis. An alternative hypothesis can merely state that the population parameter does not equal the value specified in the null hypothesis, or that it is greater than or less than that value.
Hypothesis Testing Suppose you believe the coin is unfair, but have no intuition about whether it is too prone to come up heads or tails. Stated formally, the alternative hypothesis is: $H_A: P \neq .5$
Hypothesis Testing Perhaps you believe the coin is more likely to come up heads than tails. You would formulate the following alternative hypothesis: $H_A: P > .5$ Conversely, if you believe the coin is less likely to come up heads than tails, you would formulate the alternative hypothesis in the opposite direction: $H_A: P < .5$
Hypothesis Testing After specifying the null and alternative hypotheses, identify the sample estimator that corresponds to the parameter in question. The estimate must come from the data, which in this case are generated by flipping a coin; here the estimator is the sample proportion of heads.
Hypothesis Testing Next, determine how the sample statistic is distributed in repeated random samples. That is, specify the sampling distribution of the estimator. For example, what are the chances of getting 10 heads in 10 flips (p = 1.0)? What about 9 heads in 10 flips (p = .9)? 8 heads in 10 flips (p = .8)?
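For the coin example, the sampling distribution under the null hypothesis is binomial, so the chance of each possible number of heads in 10 flips can be computed directly. A minimal sketch, with `binomial_probability` as a hypothetical helper name:

```python
import math

def binomial_probability(heads, flips, p=0.5):
    """Probability of exactly `heads` heads in `flips` independent tosses."""
    return math.comb(flips, heads) * p ** heads * (1 - p) ** (flips - heads)

# Sampling distribution of the number of heads in 10 flips of a fair coin.
flips = 10
for heads in range(flips, -1, -1):
    sample_proportion = heads / flips
    probability = binomial_probability(heads, flips)
    print(f"{heads:2d} heads (p = {sample_proportion:.1f}): {probability:.4f}")
```

With a fair coin, 10 heads in 10 flips has probability 1/1024 (roughly .001), 9 heads has 10/1024 (roughly .010), and 8 heads has 45/1024 (roughly .044).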
 
Hypothesis Testing Make a decision rule based on some criterion of probability or likelihood. In the social sciences, a result that occurs with a probability of .05 or less (that is, 1 chance in 20) is considered unusual and consequently is grounds for rejecting a null hypothesis. Stricter thresholds (.01, .001) are also common. Make the decision rule before collecting data.
Hypothesis Testing In light of the decision rule, define a critical region. The critical region consists of those outcomes so unlikely to occur that one has cause to reject the null hypothesis should they occur. So there are areas of “rejection” (critical areas) and non-rejection.
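For a two-sided test of a fair coin, one way to build the critical region is to add the most extreme outcomes from both tails, in symmetric pairs, for as long as their combined probability under the null hypothesis stays at or below the chosen threshold. The sketch below assumes a null value of .5 (so the two tails are symmetric) and uses hypothetical function names.

```python
import math

def binomial_probability(heads, flips, p=0.5):
    """Probability of exactly `heads` heads in `flips` independent tosses."""
    return math.comb(flips, heads) * p ** heads * (1 - p) ** (flips - heads)

def two_sided_critical_region(flips, alpha=0.05):
    """Head counts that lead to rejecting H0: P = .5 at level alpha.

    Works inward from both tails; valid only for a null value of .5,
    where the two tails are mirror images of each other.
    """
    region = []
    total = 0.0
    for low in range(flips // 2):
        high = flips - low
        pair_probability = (binomial_probability(low, flips)
                            + binomial_probability(high, flips))
        if total + pair_probability > alpha:
            break  # adding this pair would exceed the threshold
        total += pair_probability
        region.extend([low, high])
    return sorted(region)

print(two_sided_critical_region(10))  # [0, 1, 9, 10]
```

With 10 flips and a .05 threshold, only 0, 1, 9 or 10 heads fall in the rejection area; every other count falls in the area of non-rejection.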
 
Hypothesis Testing Collect a random sample and calculate the sample estimator. Calculate the observed test statistic. A test statistic converts the sample result into a number that can be compared with the critical values specified by your decision rule. Examine the observed test statistic to see if it falls in the critical region. Make a practical or theoretical interpretation of the findings.
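The final steps can be sketched for the coin example using an exact binomial calculation. This is one possible approach, not necessarily the one used in the course text: the number of heads itself serves as the test statistic, the null value is assumed to be .5, and the function names and the observed counts are hypothetical.

```python
import math

def binomial_probability(heads, flips, p=0.5):
    """Probability of exactly `heads` heads in `flips` independent tosses."""
    return math.comb(flips, heads) * p ** heads * (1 - p) ** (flips - heads)

def two_sided_p_value(observed_heads, flips):
    """Chance, under H0: P = .5, of a result at least as far from
    the expected count as the one observed (in either direction)."""
    expected = flips * 0.5
    observed_distance = abs(observed_heads - expected)
    return sum(
        binomial_probability(k, flips)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )

def decide(observed_heads, flips, alpha=0.05):
    """Apply the decision rule fixed before the data were collected."""
    p_value = two_sided_p_value(observed_heads, flips)
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# Hypothetical samples of 10 flips each.
for heads in (9, 7):
    print(heads, "heads:", round(two_sided_p_value(heads, 10), 4), decide(heads, 10))
```

Nine heads in ten flips is unusual enough under the null hypothesis to reject it at the .05 level, while seven heads is not; the substantive interpretation of either result is still up to the researcher.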
 
