Quantitative Data Analysis:  Statistics – Part 1
" ... while man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician. "
Overview Part 1: Picturing the Data; Pitfalls of Surveys; Averages; Variance and Standard Deviation. Part 2: The Normal Distribution; Z-Tests; Confidence Intervals; T-Tests.
  ~ THE GOLDEN RULE ~ Statistics NEVER replace  the judgment of the expert.
Approach to Statistical Research (1) Formulate a hypothesis. (2) State the predictions of the hypothesis. (3) Perform experiments or observations. (4) Interpret the experiments or observations. (5) Evaluate the results with respect to the hypothesis. (6) Refine the hypothesis and start again. (Basically the same as all other research.)
Hypothesis Testing H0: Null Hypothesis, the status quo. HA: Alternative Hypothesis, the research question. So the conclusion is either "The data do not support H0" (we reject H0 in favour of HA) or "We fail to reject H0".
Types of Data Continuous: height, age, time. Discrete: number of days worked this week, number of leaves on a tree. Ordinal: {Good, O.K., Bad}. Nominal: {Yes/No}, {Teacher/Chemist/Haberdasher}.
Picturing The Data
 
Time-Series Plots Time related Data  e.g. Stock Prices
 
Pie Charts Nominal/Ordinal. Only suitable for data that are parts of a whole (proportions that add up to 1, i.e. 100%). Hard to compare values in the chart.
 
Bar Charts Nominal/Ordinal Easier to compare values than pie chart Suitable for a wider range of data
 
Histograms Continuous Data  Divide Data into ranges
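To make "divide data into ranges" concrete, here is a minimal sketch of how continuous values might be binned before drawing a histogram. The ages, the bin width of 10, and the helper name `histogram_counts` are all assumptions made for illustration; they do not come from the slides.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values per half-open range [start, start + bin_width)."""
    # Integer-divide each value to find the lower edge of its bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical ages, invented for this example.
ages = [23, 27, 31, 35, 38, 41, 42, 47, 53, 58]
bins = histogram_counts(ages, 10)

# Print a crude text histogram, one row per range.
for start, count in bins.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Changing the bin width changes the shape of the picture, so it is worth trying a few widths before settling on one.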
 
Dot Plots Nominal/Ordinal Represents all the data  Difficult to read
 
Scatter Plots Excellent for examining association between two variables
 
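Box Plots Numerical data, often compared across nominal/ordinal groups. Q1, Q3: first quartile (Q1) and third quartile (Q3); the interquartile range is IQR = Q3 - Q1. The box spans Q1 to Q3, with a line at the median. Outliers are plotted as individual points.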
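The quantities a box plot displays can be sketched in a few lines. This example assumes the common 1.5 × IQR rule for flagging outliers (often called Tukey's fences), which the slide does not state explicitly, and it uses an invented sample. Quartile conventions differ between tools; here `statistics.quantiles` is called with `method="inclusive"`.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def fence_outliers(values, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample, invented for this example.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(sample)
outliers = fence_outliers(sample)
print(summary, outliers)
```

For this sample the quartiles are 4, 6 and 8, so the IQR is 4 and only the value 30 falls outside the fences.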
 
 
 
John Tukey Born June 16, 1915 Died July 26, 2000 Born in New Bedford, Massachusetts He introduced the box plot in his 1977 book," Exploratory Data Analysis " Also the Cooley–Tukey FFT algorithm and jackknife estimation
While working with John von Neumann on early computer designs, Tukey introduced the word "bit" as a contraction of "binary digit". The term "bit" was first used in an article by Claude Shannon in 1948. The term "software", which Paul Niquette claims he coined in 1953, was first used in print by Tukey in a 1958 article in American Mathematical Monthly, and thus some people attribute the term to him. (Pictured: John Tukey, Paul Niquette, Claude Shannon, John von Neumann.)
Question 1 In a telephone survey of 68 households, each was asked whether it has pets. The responses were: 16: No Pets; 28: Dogs; 32: Cats. Draw the appropriate graphic to illustrate the results!
Question 1 - Solution Total number surveyed = 68. Number with no pets = 16 => Total with pets = (68 - 16) = 52. But 28 dogs + 32 cats = 60 => So some households have both cats and dogs.
 
Question 1 - Solution How many? It must be (60 - 52) = 8 households. No pets = 16; Dogs only = 20; Cats only = 24; Both = 8; Total = 68.
Question 1 - Solution Graphic: Pie Chart or Bar Chart
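The arithmetic in the Question 1 solution can be checked with a short sketch. The function name `split_pet_owners` and the category labels are assumptions for illustration; the numbers are the ones given in the question.

```python
def split_pet_owners(total, no_pets, dogs, cats):
    """Split survey counts into non-overlapping groups."""
    with_pets = total - no_pets
    # Households counted under both "dogs" and "cats" were counted twice.
    both = dogs + cats - with_pets
    return {
        "No pets": no_pets,
        "Dogs only": dogs - both,
        "Cats only": cats - both,
        "Both": both,
    }

groups = split_pet_owners(total=68, no_pets=16, dogs=28, cats=32)

# A crude text bar chart of the four groups.
for label, count in groups.items():
    print(f"{label:<10} {'#' * count} {count}")
```

Only once the groups no longer overlap do they add up to the 68 households, which is what a pie chart requires; a bar chart would also work on the raw responses.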
Pitfalls of Surveys
The Literary Digest  Poll 1936 US Presidential Election Alf Landon (R) vs. Franklin D. Roosevelt (D)
The Literary Digest Poll Literary Digest had been conducting successful presidential election polls since 1916. They had correctly predicted the outcomes of the 1916, 1920, 1924, 1928, and 1932 elections. These polls were a lucrative venture for the magazine: readers liked them; newspapers played them up; and each “ballot” included a subscription blank.
The Literary Digest Poll In 1936 they sent out 10 million ballots to two groups of people: (1) prospective subscribers, “who were chiefly upper- and middle-income people”, and (2) a list designed to "correct for bias" from the first list, consisting of names selected from telephone books and motor vehicle registries.
 
The Literary Digest  Poll Response rate:  approximately 25%, or 2,376,523 responses Result:  Landon in a landslide (predicted 57% of the vote, Roosevelt predicted 40%) Election result:  Roosevelt received approximately 60% of the vote
The Literary Digest Poll POSSIBLE CAUSES OF ERROR Selection Bias: By taking names and addresses from telephone directories and motor vehicle registries, the survey systematically excluded poorer voters. Republicans were markedly overrepresented: in 1936, Democrats were less likely to have phones, less likely to drive cars, and less likely to read the Literary Digest. The “Sampling Frame” is the actual population of individuals from which a sample is drawn. Selection bias results when the sampling frame is not representative of the population of interest.
The Literary Digest Poll POSSIBLE CAUSES OF ERROR Non-response Bias: Because only about 24% of the 10 million people returned surveys, non-respondents may have had different preferences from respondents. Indeed, respondents favored Landon. Greater response rates reduce the odds of biased samples.
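A small calculation can show how an unrepresentative sampling frame shifts an estimate. The group shares and support levels below are invented purely for illustration and are not historical 1936 figures; only the response count and the number of ballots come from the slides. The helper `weighted_support` is likewise a hypothetical name.

```python
def weighted_support(groups):
    """Overall support given (share_of_group, support_within_group) pairs."""
    total_share = sum(share for share, _ in groups)
    return sum(share * support for share, support in groups) / total_share

# Hypothetical electorate: 35% wealthier voters (60% back candidate A),
# 65% poorer voters (30% back candidate A).
population = [(0.35, 0.60), (0.65, 0.30)]

# Hypothetical sampling frame that over-represents wealthier voters.
frame = [(0.80, 0.60), (0.20, 0.30)]

true_support = weighted_support(population)
frame_support = weighted_support(frame)

# Response rate from the figures on the slides.
response_rate = 2_376_523 / 10_000_000

print(f"true: {true_support:.1%}, frame: {frame_support:.1%}, "
      f"response rate: {response_rate:.1%}")
```

In this made-up case, candidate A appears to win comfortably in the frame while losing in the full population. A large sample does not help, because the error comes from who is in the frame, not from how many are asked.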
Definitions and Formula
Terminology Population: the set of entities about which statistical inferences are to be drawn. Sample: a number of independent observations drawn from the same probability distribution. Parameter: a numerical value that characterizes a distribution; the distribution of a random variable is treated as belonging to a family of probability distributions, whose members are distinguished from each other by the values of a finite number of parameters. Bias: a factor that causes some members of the population to be less represented in a statistical sample than others.
Outliers (and their treatment)
Outliers (and their treatment) An "outlier" is an observation that does not fit the pattern in the rest of the data. Check the data. Check with the measurer. If there is reason to believe it is NOT real, correct it if possible; otherwise leave it out (but note it). If there is reason to believe it is real, leave it in and note it.
The Mean (Arithmetic) The mean is defined as the sum of all the elements, divided by the number of elements. The statistical mean of a set of observations is the average of the measurements in that set of data.
The Mode The mode is defined as the most frequently occurring element in a set of elements. For example [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] has a mode of 6. Given the list of data [1, 1, 2, 4, 4] the mode is not unique: the dataset may be said to be bimodal, while a set with more than two modes may be described as multimodal.
The Median The median is defined as the middle element, or the  value separating the higher half of a sample from the lower half.  If there is an even number of elements, it is half the sum of the middle two elements. Given the list of data [1, 1, 2, 4, 4] the median is 2.
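The three averages above can be reproduced with Python's standard `statistics` module, using the same example lists as the slides. This is only a sketch; the variable names are chosen here for illustration.

```python
import statistics

unimodal = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]
bimodal = [1, 1, 2, 4, 4]

# Mode: the most frequently occurring element.
mode_value = statistics.mode(unimodal)

# multimode returns every value tied for most frequent.
modes = statistics.multimode(bimodal)

# Median: the middle element of the sorted data.
median_value = statistics.median(bimodal)

# With an even number of elements, it is half the sum of the middle two.
even_median = statistics.median([1, 2, 3, 4])

# Mean: the sum of the elements divided by their number.
mean_value = statistics.mean(bimodal)

print(mode_value, modes, median_value, even_median, mean_value)
```

Note that for [1, 1, 2, 4, 4] the mean (2.4) and the median (2) differ, and neither matches either of the two modes.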
The Variance But individual elements can vary a lot around the average, e.g. teacher salaries: Average = €22,000; Lowest = €12,000; Difference = 12,000 - 22,000 = -10,000
The Variance The sum of (Sample - Average) over all samples is always 0, because the positive and negative differences cancel, so it cannot measure spread; thus we need to define the variance. The variance of a set of data is the sum of the squares of the differences of all the data values from the mean, divided by the sample size minus one.
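The two statements on this slide can be written out as a short derivation. The notation is an assumption introduced here, not taken from the slides: x_i for the data values, n for the sample size, x-bar for the mean, and s^2 for the sample variance.

```latex
% Mean of the sample
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

% The plain differences always cancel out
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0

% Squaring removes the signs, giving the sample variance
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
```

Because each squared difference is non-negative, the variance is zero only when every value equals the mean.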
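Standard Deviation The standard deviation of a set of data is the positive square root of the variance: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $n$ is the sample size and $\bar{x}$ is the mean.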
Karl Pearson Born 27 March 1857 in Islington, London, England. Died 27 April 1936. Father of Mathematical Statistics; protégé of Francis Galton. Inventor of the P-value, the Pearson correlation coefficient, Chi distance, the Method of moments, and Principal Component Analysis.
Karl Pearson introduced the term "standard deviation" in 1893, "although the idea was by then nearly a century old" (Abbott; Stigler, page 328). The term "standard deviation" was introduced in a lecture of 31 January 1893, as a convenient substitute for the cumbersome "root mean square error" and the older expressions "error of mean square" and "mean error." The term was first used in a publication in 1894 by Pearson in "Contributions to the Mathematical Theory of Evolution" (Philosophical Transactions of the Royal Society A, 185 (1894), 71-110). http://jeff560.tripod.com/s.html
Question 2 Find the mean and variance of the following sample values : 36, 41, 43, 44, 46
Question 2 Mean = (36 + 41 + 43 + 44 + 46) / 5 = 210 / 5 = 42
Question 2 Variance
Value – Mean = Difference | Square
36 – 42 = -6 | 36
41 – 42 = -1 | 1
43 – 42 = 1 | 1
44 – 42 = 2 | 4
46 – 42 = 4 | 16
Sum of squares = 58
Variance = 58 / (5 - 1) = 58 / 4 = 14.5
Standard Deviation = SquareRoot(14.5) ≈ 3.8
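The worked answer to Question 2 can be verified step by step with a short sketch. The variable names are chosen here for illustration; the cross-check uses `statistics.variance`, which also divides by the sample size minus one.

```python
import math
import statistics

values = [36, 41, 43, 44, 46]

# Mean: sum of the values divided by how many there are.
mean = sum(values) / len(values)

# Squared difference of each value from the mean.
squares = [(v - mean) ** 2 for v in values]

# Sample variance: sum of squares divided by (n - 1).
variance = sum(squares) / (len(values) - 1)

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

# Cross-check against the standard library's sample variance.
library_variance = statistics.variance(values)

print(mean, squares, variance, round(std_dev, 1), library_variance)
```

The intermediate list of squares matches the table in the solution (36, 1, 1, 4, 16), which makes it easy to spot where a hand calculation went wrong.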
http://www.oerrecommender.org/visits/94142
 
http://mathforum.org/library/drmath/view/65410.html
 

Introduction to Statistics - Part 1 (from Damian T. Gordon)
