Quantitative Data Analysis:  Statistics – Part 2
Overview Part 1: Picturing the Data; Pitfalls of Surveys; Averages; Variance and Standard Deviation. Part 2: The Normal Distribution; Z-Tests; Confidence Intervals; T-Tests.
The Normal Distribution
The Normal Distribution Abraham de Moivre, the 18th-century statistician and consultant to gamblers, was often called upon to make lengthy computations about coin flips. De Moivre noted that as the number of events (coin flips) increased, the shape of the binomial distribution approached a very smooth curve. In 1809 Carl Gauss developed the formula for the normal distribution and showed that many natural phenomena are at least approximately normally distributed.
Abraham de Moivre Born 26 May 1667, died 27 November 1754. Born in Champagne, France. Wrote a textbook on probability theory, "The Doctrine of Chances: a method of calculating the probabilities of events in play". This book came out in four editions: 1711 in Latin, and 1718, 1738 and 1756 in English. In the later editions of his book, de Moivre gives the first statement of the formula for the normal distribution curve.
Carl Friedrich Gauss Born 30 April 1777, died 23 February 1855. Born in Lower Saxony, Germany. In 1809 Gauss published the monograph "Theoria motus corporum coelestium in sectionibus conicis solem ambientium", where among other things he introduces and describes several important statistical concepts, such as the method of least squares, the method of maximum likelihood, and the normal distribution.
The Normal Distribution Age of students in a class; body temperature; pulse rate; shoe size; IQ score; diameter of trees; height?
Density Curves:  Properties
The Normal Distribution The graph has a single peak at the center, which occurs at the mean. The graph is symmetrical about the mean. The graph never touches the horizontal axis. The area under the graph is equal to 1.
Characterization A normal distribution is bell-shaped and symmetric. The distribution is determined by the mean mu (μ) and the standard deviation sigma (σ). The mean mu controls the center and sigma controls the spread.
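The role of mu and sigma can be made concrete with a short sketch. It assumes the standard textbook density formula for the normal distribution; the function name `normal_pdf` and the example values (mean 10, sigma 1 and 2) are chosen here for illustration only.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Same mean, different spread: a larger sigma gives a lower, wider curve.
peak_narrow = normal_pdf(10.0, mu=10.0, sigma=1.0)
peak_wide = normal_pdf(10.0, mu=10.0, sigma=2.0)

# Symmetry about the mean: equal distances either side give equal densities.
left = normal_pdf(8.5, mu=10.0, sigma=1.0)
right = normal_pdf(11.5, mu=10.0, sigma=1.0)

print(f"peak with sigma=1: {peak_narrow:.4f}, peak with sigma=2: {peak_wide:.4f}")
print(f"density at 8.5: {left:.4f}, density at 11.5: {right:.4f}")
```

Changing mu slides the whole curve left or right without changing its shape; changing sigma stretches or squeezes it around the same center.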
Figure: Same Mean, Different Standard Deviation
Figure: Different Mean, Different Standard Deviation
Figure: Different Mean, Same Standard Deviation
The Normal Distribution If a variable is normally distributed, then: within one standard deviation of the mean there will be approximately 68% of the data; within two standard deviations of the mean there will be approximately 95% of the data; within three standard deviations of the mean there will be approximately 99.7% of the data.
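These three percentages can be checked numerically. The sketch below uses the error function from Python's standard `math` module; the helper name `prob_within` is an illustrative choice, not something from the slides.

```python
import math


def prob_within(k: float) -> float:
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The results come out at roughly 0.683, 0.954 and 0.997, which the slide rounds to 68%, 95% and 99.7%.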
Why? One reason the normal distribution is important is that many psychological and organisational variables are distributed approximately normally. Measures of reading ability, introversion, job satisfaction, and memory are among the many psychological variables approximately normally distributed. Although the distributions are only approximately normal, they are usually quite close.
Why? A second reason the normal distribution is so important is that it is easy for mathematical statisticians to work with. This means that many kinds of statistical tests can be derived for normal distributions. Almost all statistical tests discussed in this text assume normal distributions. Fortunately, these tests work very well even if the distribution is only approximately normally distributed. Some tests work well even with very wide deviations from normality.
So what? Imagine we undertook an experiment where we measured staff productivity before and after we introduced a computer system to help record solutions to common issues of work. Average productivity before = 6.4. Average productivity after = 9.2.
So what? Before = 6.4, After = 9.2 (on a scale from 0 to 10). Is this a significant difference, or is it more likely a sampling variation? How many standard deviations (σ) from the mean is this, and is it statistically significant?
One Tail / Two Tail One-Tailed: H0: m1 >= m2, HA: m1 < m2. Two-Tailed: H0: m1 = m2, HA: m1 ≠ m2.
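The practical difference between the two forms shows up in the p-value. The following is a minimal sketch, assuming a z statistic has already been computed and that a large positive value supports the one-tailed alternative; the value 1.8 and the function name `p_values` are hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def p_values(z: float) -> Tuple[float, float]:
    """Return (one-tailed, two-tailed) p-values for a z statistic.

    The one-tailed value is for an alternative that predicts a large positive z.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    upper_tail = 1.0 - std_normal.cdf(z)
    both_tails = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return upper_tail, both_tails


one_tailed, two_tailed = p_values(1.8)  # 1.8 is a hypothetical test statistic
print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

With this hypothetical statistic the one-tailed p-value falls below 0.05 while the two-tailed one does not, which is why the choice of tails should be made before looking at the data.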
STANDARD NORMAL DISTRIBUTION The Normal Distribution is defined as N(mean, (Std dev)^2). The Standard Normal Distribution is defined as N(0, (1)^2).
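STANDARD NORMAL DISTRIBUTION Using the formula Z = (X - mean) / (Std dev) converts a value X from any normal distribution into a value Z on the standard normal distribution, so a single standard normal table can be used.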
Exercise If the average IQ in a given population is 100, and the standard deviation is 15, what percentage of the population has an IQ of 145 or higher?
Answer P(X >= 145) = P(Z >= ((145 - 100)/15)) = P(Z >= 3). From tables: 99.87% are less than 3 => 0.13% of population.
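The table lookup in the answer can be reproduced in code. This sketch assumes IQ is exactly normally distributed and uses the complementary error function from the standard library in place of a printed table; `upper_tail` is an illustrative helper name.

```python
import math


def upper_tail(x: float, mean: float, std_dev: float) -> float:
    """P(X >= x) for X ~ N(mean, std_dev^2), via the z-score."""
    z = (x - mean) / std_dev
    return 0.5 * math.erfc(z / math.sqrt(2.0))


share = upper_tail(145, mean=100, std_dev=15)
print(f"{share:.2%} of the population")
```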
Trends in Statistical Tests used in Research Papers Historically: Testing. Currently: Estimation. Hypothesis Tests (results in: Accept/Reject); Quoting P-Values (results in: p-Value); Confidence Intervals (results in: Approx. Mean).
Confidence Intervals   A confidence interval is used to express the uncertainty in a quantity being estimated. There is uncertainty because inferences are based on a random sample of finite size from a population or process of interest. To judge the statistical procedure we can ask what would happen if we were to repeat the same study, over and over, getting different data (and thus different confidence intervals) each time.
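The idea of repeating the same study over and over can be simulated. The sketch below is an illustration under stated assumptions: a hypothetical normal population with mean 100 and standard deviation 15, samples of 50, and 2000 repeated studies, each producing a 95% interval of the form mean +/- 1.96 * SE.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_MEAN, TRUE_SD = 100.0, 15.0  # hypothetical population values
N, STUDIES = 50, 2000             # sample size per study, number of repeated studies

covered = 0
for _ in range(STUDIES):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    low, high = mean - 1.96 * se, mean + 1.96 * se
    if low <= TRUE_MEAN <= high:
        covered += 1

coverage = covered / STUDIES
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

Each study yields a different interval, and close to 95% of them should contain the true mean. That long-run share is what the confidence level describes.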
Jerzy Neyman Born April 16, 1894, died August 5, 1981. Born in Bessarabia, Imperial Russia. Statistician who spent most of his professional career at the University of California, Berkeley. Developed modern scientific sampling (random samples) in 1934, the Neyman-Pearson lemma in 1933 and the confidence interval in 1937.
Egon Pearson Born 11 August 1895, died 12 June 1980. Born in Hampstead, London. Son of Karl Pearson. Leading British statistician. Developed the Neyman-Pearson lemma in 1933.
Neyman and Pearson's joint work formally started in the spring of 1927. From 1928 to 1934, they published several important papers on the theory of testing statistical hypotheses. In developing their theory, Neyman and Pearson recognized the need to include alternative hypotheses, and they perceived the errors in testing hypotheses concerning unknown population values based on sample observations that are subject to variation. They called the error of rejecting a true hypothesis the first kind of error and the error of accepting a false hypothesis the second kind of error. They called a hypothesis that completely specifies a probability distribution a simple hypothesis. A hypothesis that is not a simple hypothesis is a composite hypothesis. Their joint work led to Neyman developing the idea of confidence interval estimation, published in 1937.
Confidence Intervals Neyman, J. (1937) "Outline of a theory of statistical estimation based on the classical theory of probability", Philos. Trans. Roy. Soc. London, Ser. A, Vol. 236, pp. 333–380.
Confidence Intervals If we know the true population mean and sample n individuals, then, provided the data are normally distributed, the mean of these n observations has a 95% chance of falling into the interval: population mean +/- 1.96 * SE
Confidence Intervals where SE is the standard error, which for a 95% CI may be calculated as follows:
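For reference, the usual textbook forms are sketched below. They are an assumption rather than a transcription of the slide: the symbols x-bar (sample mean), s (sample standard deviation), p (sample proportion) and n (sample size) are introduced here, and the proportion form is the one the worked examples in this deck use.

```latex
\text{95\% CI} = \text{estimate} \pm 1.96 \cdot SE,
\qquad
SE_{\bar{x}} = \frac{s}{\sqrt{n}},
\qquad
SE_{p} = \sqrt{\frac{p\,(1 - p)}{n}}
```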
Example 1
Example 1 Did FF have more of the popular vote than FG-L? In a random sample of 721 respondents: 382 FF, 339 FG-L. Can we conclude that FF had more than 50% of the popular vote?
Example 1 - Solution Sample proportion = p = 382/721 = 0.53. Sample size = n = 721. Standard Error = SqRt(p(1-p)/n) = 0.02. 95% Confidence Interval: 0.53 +/- 1.96 (0.02) = 0.53 +/- 0.04 = [0.49, 0.57]. Thus, we cannot conclude that FF had more of the popular vote, since this interval spans 50%. So, we say: "the data are consistent with the hypothesis that there is no difference".
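The same calculation can be sketched in code without the intermediate rounding. The function name `proportion_ci` is illustrative; the figures 382 and 721 and the multiplier 1.96 are taken from the example.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = successes / n
    se = math.sqrt(p * (1.0 - p) / n)
    return p - z * se, p + z * se


low, high = proportion_ci(382, 721)
spans_half = low <= 0.5 <= high
print(f"95% CI: [{low:.3f}, {high:.3f}], spans 50%: {spans_half}")
```

Carrying full precision gives a slightly narrower interval than the hand calculation, which rounds the standard error up to 0.02, but the interval still spans 50% and the conclusion is unchanged.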
Example 2
Example 2 Did Obama have more of the popular vote than McCain? In a random sample of 1000 respondents: 532 Obama, 468 McCain. Can we conclude that Obama had more than 50% of the popular vote?
Example 2 – 95% CI Sample proportion = p = 532/1000 = 0.532. Sample size = n = 1000. Standard Error = SqRt(p(1-p)/n) = 0.016. 95% Confidence Interval: 0.532 +/- 1.96 (0.016) = 0.532 +/- 0.03136 = [0.5006, 0.56336]. Thus, we can conclude that Obama had more of the popular vote, since this interval does not span 50%. So, we say: "the data are consistent with the hypothesis that there is a difference at the 95% confidence level".
Example 2 – 99% CI Sample proportion = p = 532/1000 = 0.532. Sample size = n = 1000. Standard Error = SqRt(p(1-p)/n) = 0.016. 99% Confidence Interval: 0.532 +/- 2.58 (0.016) = 0.532 +/- 0.041 = [0.491, 0.573]. Thus, we cannot conclude that Obama had more of the popular vote, since this interval does span 50%. So, we say: "the data are consistent with the hypothesis that there is no difference at the 99% confidence level".
Example 2 – 99.99% CI Sample proportion = p = 532/1000 = 0.532. Sample size = n = 1000. Standard Error = SqRt(p(1-p)/n) = 0.016. 99.99% Confidence Interval: 0.532 +/- 3.89 (0.016) = 0.532 +/- 0.06 = [0.472, 0.592]. Thus, we cannot conclude that Obama had more of the popular vote, since this interval does span 50%. So, we say: "the data are consistent with the hypothesis that there is no difference at the 99.99% confidence level".
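The effect of the confidence level on the same poll can be sketched as follows. Instead of reading multipliers from a table, this version derives them from the standard normal distribution with `statistics.NormalDist` from Python's standard library; the function name `proportion_ci` is illustrative.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float):
    """Normal-approximation interval for a proportion at a given confidence level."""
    p = successes / n
    se = math.sqrt(p * (1.0 - p) / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value
    return p - z * se, p + z * se


for level in (0.95, 0.99, 0.9999):
    low, high = proportion_ci(532, 1000, level)
    verdict = "excludes" if low > 0.5 else "spans"
    print(f"{level:.2%} CI: [{low:.3f}, {high:.3f}] {verdict} 50%")
```

Asking for more confidence widens the interval, so the same data that exclude 50% at the 95% level no longer do so at the 99% level.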
T-Tests
William Sealy Gosset Born June 13, 1876, died October 16, 1937. Born in Canterbury, England. On graduating from Oxford in 1899, he joined the Dublin brewery of Arthur Guinness & Son. Published a significant paper in 1908 concerning the t-distribution.
Gosset acquired his statistical knowledge by study, and he also spent two terms in 1906–1907 in the biometric laboratory of Karl Pearson. Gosset applied his knowledge for Guinness both in the brewery and on the farm: to the selection of the best-yielding varieties of barley, and to the comparison of different brewing processes for changing raw materials into beer. Gosset and Pearson had a good relationship and Pearson helped Gosset with the mathematics of his papers. Pearson helped with the 1908 paper but he had little appreciation of the papers' importance. The papers addressed the brewer's concern with small samples, while the biometrician typically had hundreds of observations and saw no urgency in developing small-sample methods.
T-Tests Student (1908), "The Probable Error of a Mean", Biometrika, Vol. 6, No. 1, pp. 1-25.
T-Tests Guinness did not allow its employees to publish results, but the management decided to allow Gosset to publish under a pseudonym: Student. Hence we have the Student's t-test.
T-Tests A powerful parametric test for calculating the significance of a small sample mean. Necessary for small samples because their distributions are not normal. One first has to calculate the "degrees of freedom".
~ THE GOLDEN RULE ~ Use the t-Test when your sample size is less than 30.
T-Tests can be used: if the underlying population is normal; if the underlying population is not skewed and reasonably close to normal (n < 15); if the underlying population is skewed and there are no major outliers (n > 15); if the underlying population is skewed with some outliers (n > 24).
T-Tests Form of Confidence Interval with t-Value: Mean +/- tValue * SE, where Mean and SE are calculated as before.
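A small-sample interval of this form can be sketched as below. The ten measurements are hypothetical, and the t-value 2.262 is the standard table entry for 95% confidence with n - 1 = 9 degrees of freedom; it is hard-coded because Python's standard library has no t-distribution quantile function.

```python
import math
import statistics

# Hypothetical sample of n = 10 measurements (not from the slides).
sample = [6.1, 5.8, 7.2, 6.4, 6.9, 5.5, 6.7, 7.0, 6.2, 6.6]
n = len(sample)

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

T_CRIT = 2.262  # t table value: 95% confidence, 9 degrees of freedom
Z_CRIT = 1.96   # normal value used earlier, shown for comparison

t_interval = (mean - T_CRIT * se, mean + T_CRIT * se)
z_interval = (mean - Z_CRIT * se, mean + Z_CRIT * se)

print(f"t interval: [{t_interval[0]:.3f}, {t_interval[1]:.3f}]")
print(f"z interval: [{z_interval[0]:.3f}, {z_interval[1]:.3f}]")
```

The t interval is the wider of the two, which reflects the extra uncertainty of estimating the standard deviation from only ten observations.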
Two Sample T-Test: Unpaired Sample Consider a questionnaire on computer use given to final year undergraduates in 2007 and the same questionnaire given to undergraduates in 2008. As there is no direct one-to-one correspondence between individual students (in fact, there may be a different number of students in each class), you have to sum up all the responses of a given year, obtain an average from that, do the same for the following year, and compare the averages.
Two Sample T-Test: Paired Sample If you are doing a questionnaire that is testing the BEFORE/AFTER effect of a parameter on the same population, then we can calculate the difference for each individual and then average the differences. The paired test is a much stronger (more powerful) statistical test.
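The gain in power from pairing can be illustrated with a short sketch. The before/after scores are hypothetical, and the unpaired statistic uses the Welch form (separate variances for each group), which is one common choice and not necessarily the one intended in the slides.

```python
import math
import statistics

# Hypothetical before/after scores for the same eight people (not from the slides).
before = [4, 6, 8, 5, 9, 7, 3, 10]
after = [5, 8, 9, 7, 10, 9, 4, 12]
n = len(before)

# Paired: work with each person's own difference.
diffs = [a - b for a, b in zip(after, before)]
paired_t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Unpaired (Welch form): compare the two group averages, ignoring the pairing.
se_unpaired = math.sqrt(statistics.variance(before) / n + statistics.variance(after) / n)
unpaired_t = (statistics.mean(after) - statistics.mean(before)) / se_unpaired

print(f"paired t = {paired_t:.3f}, unpaired t = {unpaired_t:.3f}")
```

Both statistics share the same numerator, an average improvement of 1.5. The paired version removes the large person-to-person variation from the denominator, so its t statistic is several times larger.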
Choosing the right test
Choosing a statistical test http://www.graphpad.com/www/Book/Choose.htm

More from Damian T. Gordon (20)

Introduction to Statistics - Part 2
