Lecture Slides  Elementary Statistics   Eleventh Edition  and the Triola Statistics Series  by Mario F. Triola
Chapter 1 Introduction to Statistics 1-1 Review and Preview 1-2 Statistical Thinking 1-3 Types of Data 1-4 Critical Thinking 1-5 Collecting Sample Data
Section 1-1 Review and Preview
Preview Polls, studies, surveys and other data collecting tools collect data from a small part of a larger group so that we can learn something about the larger group. This is a common and important goal of statistics: Learn about a large group by examining data from some of its members.
Preview In this context, the terms sample and population have special meaning. Formal definitions for these and other basic terms will be given here. In this section we will look at some of the ways to describe data.
Data: collections of observations (such as measurements, genders, survey responses).
Statistics: the science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data.
Population: the complete collection of all individuals (scores, people, measurements, and so on) to be studied; the collection is complete in the sense that it includes all of the individuals to be studied.
Census versus Sample. Census: the collection of data from every member of a population. Sample: a subcollection of members selected from a population.
Chapter Key Concepts Sample data must be collected in an appropriate way, such as through a process of  random  selection. If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.
Section 1-2  Statistical Thinking
Key Concept This section introduces basic principles of statistical thinking used throughout this book. Whether conducting statistical analysis of data that we have collected, or analyzing a statistical analysis done by someone else, we should not rely on blind acceptance of mathematical calculation. We should consider these factors:
Key Concept (continued): context of the data; source of the data; sampling method; conclusions; practical implications.
Context What do the values represent? Where did the data come from? Why were they collected? An understanding of the context will directly affect the statistical procedure used.
Source of data Is the source objective? Is the source biased? Is there some incentive to distort or spin results to support some self-serving position? Is there something to gain or lose by distorting results? Be vigilant and skeptical of studies from sources that may be biased.
Sampling Method Does the method chosen greatly influence the validity of the conclusion? Voluntary response (or self-selected) samples often have bias (those with special interest are more likely to participate). These samples’ results are not necessarily valid. Other methods are more likely to produce good results.
Conclusions Make statements that are clear to those without an understanding of statistics and its terminology. Avoid making statements not justified by the statistical analysis.
Practical Implications State practical implications of the results. There may exist some statistical significance, yet there may be NO practical significance. Common sense might suggest that the finding does not make enough of a difference to justify its use or to be practical.
Statistical Significance Consider the likelihood of getting the results by chance. If results could easily occur by chance, then they are not statistically significant. If the likelihood of getting the results by chance is very small, then the results are statistically significant.
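To make "likelihood of getting the results by chance" concrete, here is a minimal sketch that assumes a hypothetical experiment not taken from the slides: 100 tosses of a fair coin. The function name `prob_at_least` and the counts 52 and 90 are illustrative assumptions.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Chance of k or more successes in n independent trials, each with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical results: number of heads seen in 100 tosses of a fair coin.
easily_by_chance = prob_at_least(52, 100)  # could easily occur by chance
rarely_by_chance = prob_at_least(90, 100)  # very unlikely to occur by chance

print(f"52 or more heads: {easily_by_chance:.3f}")
print(f"90 or more heads: {rarely_by_chance:.3e}")
```

In the sense used above, 52 heads would not be statistically significant, while 90 heads would be.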
Section 1-3  Types of Data
Key Concept The subject of statistics is largely about using sample data to make inferences (or generalizations) about an entire population.  It is essential to know and understand the definitions that follow.
Parameter: a numerical measurement describing some characteristic of a population (population, parameter).
Statistic: a numerical measurement describing some characteristic of a sample (sample, statistic).
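The distinction can be sketched with a small made-up data set. The club survey, the chosen positions, and the helper `proportion_yes` are all hypothetical and serve only to show the same measurement computed once on a population and once on a sample.

```python
# Hypothetical population: survey answers from all 10 members of a small club.
population = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "yes", "no"]

# A sample: a subcollection of 4 members (positions picked only for illustration).
sample = [population[i] for i in (0, 3, 6, 9)]

def proportion_yes(responses):
    """Fraction of responses equal to 'yes'."""
    return responses.count("yes") / len(responses)

parameter = proportion_yes(population)  # describes the whole population
statistic = proportion_yes(sample)      # describes only the sample

print(parameter, statistic)  # prints 0.6 0.5
```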
Quantitative Data: Quantitative (or numerical) data consist of numbers representing counts or measurements. Example: The weights of supermodels. Example: The ages of respondents.
Categorical Data: Categorical (or qualitative or attribute) data consist of names or labels (representing categories). Example: The genders (male/female) of professional athletes. Example: Shirt numbers on professional athletes' uniforms, which are substitutes for names.
Working with Quantitative Data Quantitative data can further be described by distinguishing between  discrete  and  continuous  types.
Discrete Data: Discrete data result when the number of possible values is either a finite number or a 'countable' number (i.e., the number of possible values is 0, 1, 2, 3, . . .). Example: The number of eggs that a hen lays.
Continuous Data: Continuous (numerical) data result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps. Example: The amount of milk that a cow produces, e.g., 2.343115 gallons per day.
Levels of Measurement Another way to classify data is to use levels of measurement.  Four of these levels are discussed in the following slides.
Nominal Level: The nominal level of measurement is characterized by data that consist of names, labels, or categories only; the data cannot be arranged in an ordering scheme (such as low to high). Example: Survey responses yes, no, undecided.
Ordinal Level: The ordinal level of measurement involves data that can be arranged in some order, but differences between data values either cannot be determined or are meaningless. Example: Course grades A, B, C, D, or F.
Interval Level: The interval level of measurement is like the ordinal level, with the additional property that the difference between any two data values is meaningful; however, there is no natural zero starting point (where none of the quantity is present). Example: Years 1000, 2000, 1776, and 1492.
Ratio Level: The ratio level of measurement is the interval level with the additional property that there is also a natural zero starting point (where zero indicates that none of the quantity is present); for values at this level, differences and ratios are meaningful. Example: Prices of college textbooks ($0 represents no cost, a $100 book costs twice as much as a $50 book).
Summary - Levels of Measurement. Nominal - categories only. Ordinal - categories with some order. Interval - differences but no natural starting point. Ratio - differences and a natural starting point.
Recap In this section we have looked at: basic definitions and terms describing data; parameters versus statistics; types of data (quantitative and qualitative); levels of measurement.
Section 1-4  Critical Thinking
Key Concepts Success in the introductory statistics course typically requires more common sense than mathematical expertise. Improve skills in interpreting information based on data. This section is designed to illustrate how common sense is used when we think critically about data and statistics. Think carefully about the context, source, method, conclusions and practical implications.
Misuses of Statistics 1. Evil intent on the part of dishonest people. 2. Unintentional errors on the part of people who don’t know any better. We should learn to distinguish between statistical conclusions that are likely to be valid and those that are seriously flawed.
Graphs To correctly interpret a graph, you must analyze the  numerical  information given in the graph, so as not to be misled by the graph’s shape. READ labels and units on the axes!
Pictographs Part (b) is designed to exaggerate the difference by increasing each dimension in proportion to the actual amounts of oil consumption.
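The exaggeration can be checked with simple arithmetic: when every dimension of a picture is scaled by the same ratio, its area grows by the square of that ratio and the volume of a drawn object by its cube. The sketch below uses a hypothetical ratio of 2, since the slide's figure and its actual oil consumption amounts are not reproduced here; `pictograph_scaling` is an illustrative name.

```python
def pictograph_scaling(ratio: float) -> dict:
    """Apparent growth when every dimension of a picture is scaled by `ratio`."""
    return {
        "height": ratio,       # what the data actually say
        "area": ratio ** 2,    # what a two-dimensional picture shows
        "volume": ratio ** 3,  # what a drawn three-dimensional object suggests
    }

# Hypothetical case: one quantity is twice the other.
scaling = pictograph_scaling(2.0)
print(scaling)  # prints {'height': 2.0, 'area': 4.0, 'volume': 8.0}
```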
Bad Samples Voluntary response sample (or self-selected sample): one in which the respondents themselves decide whether to be included. In this case, valid conclusions can be made only about the specific group of people who agree to participate and not about the population.
Correlation and Causality A common error is concluding that one variable causes the other when in fact the variables are only linked. Two variables, such as smoking and pulse rate, may seem linked; this relationship is called correlation. We cannot conclude that one causes the other. Correlation does not imply causality.
Small Samples Conclusions should not be based on samples that are far too small.  Example:  Basing a school suspension rate on a sample of only  three  students
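One way to see the problem with a sample of three: the sample rate can only take four values, so it cannot come close to most true rates. This is a minimal sketch; `possible_rates` is an illustrative helper, not something from the slides.

```python
from fractions import Fraction

def possible_rates(n: int) -> list:
    """Every rate that a sample of n yes/no observations can produce."""
    return [Fraction(k, n) for k in range(n + 1)]

# With three students, the sample suspension rate must be 0, 1/3, 2/3, or 1.
rates = possible_rates(3)
print([str(r) for r in rates])  # prints ['0', '1/3', '2/3', '1']
```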
Percentages Misleading or unclear percentages are sometimes used.  For example, if you take 100% of a quantity,  you take it all . If you have improved 100%, then are you perfect?!  110% of an effort does not make sense.
Loaded Questions If survey questions are not worded carefully, the results of a study can be misleading. Survey questions can be “loaded” or intentionally worded to elicit a desired response. Too little money is being spent on “welfare” versus too little money is being spent on “assistance to the poor.” Results: 19% versus 63%
Order of Questions Questions are unintentionally loaded by such factors as the order of the items being considered. Would you say traffic contributes more or less to air pollution than industry? Results: traffic - 45%; industry - 27%. When the order is reversed, results: industry - 57%; traffic - 24%.
Nonresponse Occurs when someone either refuses to respond to a survey question or is unavailable. People who refuse to talk to pollsters have a view of the world around them that is markedly different than those who will let poll-takers into their homes.
Missing Data Can dramatically affect results. Subjects may drop out for reasons unrelated to the study. People with low incomes are less likely to report their incomes. US Census suffers from missing people (tend to be homeless or low income).
Self-Interest Study Some parties with interests to promote will sponsor studies. Be wary of a survey in which the sponsor can enjoy monetary gain from the results. When assessing the validity of a study, always consider whether the sponsor might influence the results.
Precise Numbers Because a figure is precise, many people incorrectly assume that it is also accurate. A precise number can be an estimate, and it should be referred to that way.
Deliberate Distortion Some studies or surveys are distorted on purpose.  The distortion can occur within the context of the data, the source of the data, the sampling method, or the conclusions.
Recap In this section we have: reviewed misuses of statistics; illustrated how common sense can play a big role in interpreting data and statistics.
Section 1-5  Collecting Sample Data
Key Concept If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them. The method used to collect sample data influences the quality of the statistical analysis. Of particular importance is the simple random sample.
Basics of Collecting Data: Statistical methods are driven by the data that we collect. We typically obtain data from two distinct sources: observational studies and experiments.
Observational Study: observing and measuring specific characteristics without attempting to modify the subjects being studied.
Experiment: applying some treatment and then observing its effects on the subjects (subjects in experiments are called experimental units).
Simple Random Sample: a sample of n subjects selected in such a way that every possible sample of the same size n has the same chance of being chosen.
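A minimal sketch of a simple random sample, assuming a hypothetical population of five labeled subjects and n = 2. Python's `random.sample` draws without replacement so that every subset of the requested size is equally likely; the list `all_samples` simply enumerates those candidate samples.

```python
import random
from itertools import combinations

population = ["A", "B", "C", "D", "E"]  # hypothetical subjects
n = 2

# Every possible sample of size n; a simple random sample gives each the same chance.
all_samples = list(combinations(population, n))

rng = random.Random(42)           # fixed seed so the run is repeatable
srs = rng.sample(population, n)   # one simple random sample of size n

print(len(all_samples), "possible samples; drew", srs)
```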
Random & Probability Samples. Random Sample: members from the population are selected in such a way that each individual member in the population has an equal chance of being selected. Probability Sample: members are selected from a population in such a way that each member of the population has a known (but not necessarily the same) chance of being selected.
Random Sampling: selection so that each individual member has an equal chance of being selected.
Systematic Sampling: select some starting point and then select every kth element in the population.
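Systematic sampling can be sketched with list slicing. The slide only says to select "some starting point"; choosing that point at random among the first k elements is an assumption made here, and the population of ID numbers is hypothetical.

```python
import random

def systematic_sample(population, k, rng):
    """Pick a starting point among the first k elements, then take every k-th element."""
    start = rng.randrange(k)  # assumption: the starting point is chosen at random
    return population[start::k]

rng = random.Random(0)                # fixed seed so the run is repeatable
population = list(range(1, 101))      # hypothetical population: ID numbers 1..100
sample = systematic_sample(population, 10, rng)

print(sample)
```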
Convenience Sampling: use results that are easy to get.
Stratified Sampling: subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup (or stratum).
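A minimal sketch of stratified sampling, assuming a hypothetical population of 40 students split into two strata by class year. Drawing the same number from each stratum with a simple random draw is one possible choice, not something the slide prescribes; `stratified_sample` and its parameters are illustrative names.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, rng):
    """Group units into strata, then draw `per_stratum` units at random from each."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population: (id, class year) pairs; class year defines the stratum.
students = [(i, "first-year" if i % 2 == 0 else "second-year") for i in range(40)]
sample = stratified_sample(students, lambda s: s[1], 5, rng)

for name, members in sample.items():
    print(name, members)
```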
Cluster Sampling: divide the population area into sections (or clusters); randomly select some of those clusters; choose all members from the selected clusters.
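Cluster sampling differs from stratified sampling in that whole clusters are selected and every member of a selected cluster is used. The sketch below assumes a hypothetical population of four city blocks with three households each; `cluster_sample` is an illustrative name.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters, then keep all members of the selected clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members

rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population: four city blocks (clusters), each with three households.
clusters = {f"block-{b}": [f"block-{b}/house-{h}" for h in range(3)] for b in range(4)}
chosen, sample = cluster_sample(clusters, 2, rng)

print(chosen)
print(sample)
```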
Multistage Sampling: collect data by using some combination of the basic sampling methods. In a multistage sample design, pollsters select a sample in different stages, and each stage might use different methods of sampling.
Methods of Sampling - Summary: random; systematic; convenience; stratified; cluster; multistage.
Beyond the Basics of Collecting Data: different types of observational studies and experiment design.
Types of Studies. Cross-sectional study: data are observed, measured, and collected at one point in time. Retrospective (or case-control) study: data are collected from the past by going back in time (examine records, interviews, …). Prospective (or longitudinal or cohort) study: data are collected in the future from groups sharing common factors (called cohorts).
Randomization: Randomization is used when subjects are assigned to different groups through a process of random selection. The logic is to use chance as a way to create two groups that are similar.
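Random assignment to two groups can be sketched by shuffling the subjects and splitting the list. The subject labels, the group names, and the equal split are assumptions for illustration.

```python
import random

def randomize(subjects, rng):
    """Shuffle the subjects and split them into two groups by chance alone."""
    shuffled = list(subjects)  # copy so the original order is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(0)  # fixed seed so the run is repeatable
subjects = [f"subject-{i}" for i in range(20)]  # hypothetical subjects
treatment, control = randomize(subjects, rng)

print("treatment:", treatment)
print("control:  ", control)
```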
Replication: Replication is the repetition of an experiment on more than one subject. Samples should be large enough so that the erratic behavior that is characteristic of very small samples will not disguise the true effects of different treatments. Replication is used effectively when there are enough subjects to recognize the differences from different treatments. Use a sample size that is large enough to let us see the true nature of any effects, and obtain the sample using an appropriate method, such as one based on randomness.
Blinding: Blinding is a technique in which the subject doesn't know whether he or she is receiving a treatment or a placebo. Blinding allows us to determine whether the treatment effect is significantly different from a placebo effect, which occurs when an untreated subject reports improvement in symptoms.
Double-Blind: Blinding occurs at two levels: (1) the subject doesn't know whether he or she is receiving the treatment or a placebo; (2) the experimenter does not know whether he or she is administering the treatment or placebo.
Confounding: Confounding occurs in an experiment when the experimenter is not able to distinguish between the effects of different factors. Try to plan the experiment so that confounding does not occur.
Controlling Effects of Variables
Completely Randomized Experimental Design: assign subjects to different treatment groups through a process of random selection.
Randomized Block Design: a block is a group of subjects that are similar, but blocks differ in ways that might affect the outcome of the experiment.
Rigorously Controlled Design: carefully assign subjects to different treatment groups, so that those given each treatment are similar in ways that are important to the experiment.
Matched Pairs Design: compare exactly two treatment groups using subjects matched in pairs that are somehow related or have similar characteristics.
Summary: Three very important considerations in the design of experiments are the following: 1. Use randomization to assign subjects to different groups. 2. Use replication by repeating the experiment on enough subjects so that effects of treatment or other factors can be clearly seen. 3. Control the effects of variables by using such techniques as blinding and a completely randomized experimental design.
Errors: No matter how well you plan and execute the sample collection process, there is likely to be some error in the results. Sampling error: the difference between a sample result and the true population result; such an error results from chance sample fluctuations. Nonsampling error: sample data incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective instrument, or copying the data incorrectly).
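Sampling error can be illustrated by drawing several random samples from a known population and comparing each sample mean with the population mean. The population below is simulated (hypothetical scores), and the sample size of 25 and the five repetitions are arbitrary choices.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores, simulated for illustration only.
population = [rng.gauss(100, 15) for _ in range(10_000)]
true_mean = mean(population)  # the true population result

# Sampling error: sample result minus true population result, for five random samples.
errors = [mean(rng.sample(population, 25)) - true_mean for _ in range(5)]

for e in errors:
    print(f"sampling error: {e:+.2f}")
```

Each sample is collected correctly, yet the errors differ from sample to sample; that variation comes from chance alone, which is what separates sampling error from nonsampling error.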
Recap In this section we have looked at: types of studies and experiments; controlling the effects of variables; randomization; types of sampling; sampling errors.

Stat11t Chapter1

  • 1. Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by Mario F. Triola
  • 2. Chapter 1 Introduction to Statistics 1-1 Review and Preview 1-2 Statistical Thinking 1-3 Types of Data 1-4 Critical Thinking 1-5 Collecting Sample Data
  • 3. Section 1-1 Review and Preview
  • 4. Preview Polls, studies, surveys and other data collecting tools collect data from a small part of a larger group so that we can learn something about the larger group. This is a common and important goal of statistics: Learn about a large group by examining data from some of its members.
  • 5. Preview In this context, the terms sample and population have special meaning. Formal definitions for these and other basic terms will be given here. In this section we will look at some of the ways to describe data.
  • 6. Data collections of observations (such as measurements, genders, survey responses) Data
  • 7. Statistics is the science of planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on the data Statistics
  • 8. Population Population the complete collection of all individuals (scores, people, measurements, and so on) to be studied; the collection is complete in the sense that it includes all of the individuals to be studied
  • 9. Census versus Sample Census Collection of data from every member of a population Sample Subcollection of members selected from a population
  • 10. Chapter Key Concepts Sample data must be collected in an appropriate way, such as through a process of random selection. If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.
  • 11. Section 1-2 Statistical Thinking
  • 12. Key Concept This section introduces basic principles of statistical thinking used throughout this book. Whether conducting statistical analysis of data that we have collected, or analyzing a statistical analysis done by someone else, we should not rely on blind acceptance of mathematical calculation. We should consider these factors:
  • 13. Context of the data Source of the data Sampling method Conclusions Practical implications Key Concept (continued)
  • 14. Context What do the values represent? Where did the data come from? Why were they collected? An understanding of the context will directly affect the statistical procedure used.
  • 15. Source of data Is the source objective? Is the source biased? Is there some incentive to distort or spin results to support some self-serving position? Is there something to gain or lose by distorting results? Be vigilant and skeptical of studies from sources that may be biased.
  • 16. Sampling Method Does the method chosen greatly influence the validity of the conclusion? Voluntary response (or self-selected) samples often have bias (those with special interest are more likely to participate). These samples’ results are not necessarily valid. Other methods are more likely to produce good results.
  • 17. Conclusions Make statements that are clear to those without an understanding of statistics and its terminology. Avoid making statements not justified by the statistical analysis.
  • 18. Practical Implications State practical implications of the results. There may exist some statistical significance yet there may be NO practical significance . Common sense might suggest that the finding does not make enough of a difference to justify its use or to be practical.
  • 19. Statistical Significance Consider the likelihood of getting the results by chance. If results could easily occur by chance, then they are not statistically significant . If the likelihood of getting the results is so small, then the results are statistically significant .
  • 20. Section 1-3 Types of Data
  • 21. Key Concept The subject of statistics is largely about using sample data to make inferences (or generalizations) about an entire population. It is essential to know and understand the definitions that follow.
  • 22. Parameter a numerical measurement describing some characteristic of a population . Parameter population parameter
  • 23. Statistic Statistic a numerical measurement describing some characteristic of a sample . sample statistic
  • 24. Quantitative Data Quantitative (or numerical) data consists of numbers representing counts or measurements. Example: The weights of supermodels Example: The ages of respondents
  • 25. Categorical Data Categorical (or qualitative or attribute) data consists of names or labels (representing categories) Example: The genders (male/female) of professional athletes Example: Shirt numbers on professional athletes uniforms - substitutes for names.
  • 26. Working with Quantitative Data Quantitative data can further be described by distinguishing between discrete and continuous types.
  • 27. Discrete data result when the number of possible values is either a finite number or a ‘countable’ number (i.e. the number of possible values is 0, 1, 2, 3, . . . ) Example: The number of eggs that a hen lays Discrete Data
  • 28. Continuous (numerical) data result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps Continuous Data Example: The amount of milk that a cow produces; e.g. 2.343115 gallons per day
  • 29. Levels of Measurement Another way to classify data is to use levels of measurement. Four of these levels are discussed in the following slides.
  • 30. Nominal level of measurement characterized by data that consist of names, labels, or categories only, and the data cannot be arranged in an ordering scheme (such as low to high) Example: Survey responses yes , no , undecided Nominal Level
  • 31. Ordinal level of measurement involves data that can be arranged in some order, but differences between data values either cannot be determined or are meaningless Example: Course grades A, B, C, D, or F Ordinal Level
  • 32. Interval level of measurement like the ordinal level, with the additional property that the difference between any two data values is meaningful, however, there is no natural zero starting point (where none of the quantity is present) Example: Years 1000, 2000, 1776, and 1492 Interval Level
  • 33. Ratio level of measurement the interval level with the additional property that there is also a natural zero starting point (where zero indicates that none of the quantity is present); for values at this level, differences and ratios are meaningful Example: Prices of college textbooks ($0 represents no cost, a $100 book costs twice as much as a $50 book) Ratio Level
  • 34. Nominal - categories only Ordinal - categories with some order Interval - differences but no natural starting point Ratio - differences and a natural starting point Summary - Levels of Measurement
  • 35. Recap Basic definitions and terms describing data Parameters versus statistics Types of data (quantitative and qualitative) Levels of measurement In this section we have looked at:
  • 36. Section 1-4 Critical Thinking
  • 37. Key Concepts Success in the introductory statistics course typically requires more common sense than mathematical expertise . Improve skills in interpreting information based on data. This section is designed to illustrate how common sense is used when we think critically about data and statistics . Think carefully about the context, source, method, conclusions and practical implications.
  • 38. Misuses of Statistics 1. Evil intent on the part of dishonest people. 2. Unintentional errors on the part of people who don’t know any better. We should learn to distinguish between statistical conclusions that are likely to be valid and those that are seriously flawed.
  • 39. Graphs To correctly interpret a graph, you must analyze the numerical information given in the graph, so as not to be misled by the graph’s shape. READ labels and units on the axes!
  • 40. Pictographs Part (b) is designed to exaggerate the difference by increasing each dimension in proportion to the actual amounts of oil consumption.
  • 41. Bad Samples Voluntary response sample (or self-selected sample) one in which the respondents themselves decide whether to be included In this case, valid conclusions can be made only about the specific group of people who agree to participate and not about the population.
  • 42. Correlation and Causality Concluding that one variable causes the other variable when in fact the variables are linked Two variables may seemed linked, smoking and pulse rate, this relationship is called correlation. Cannot conclude the one causes the other. Correlation does not imply causality .
  • 43. Small Samples Conclusions should not be based on samples that are far too small. Example: Basing a school suspension rate on a sample of only three students
  • 44. Percentages Misleading or unclear percentages are sometimes used. For example, if you take 100% of a quantity, you take it all . If you have improved 100%, then are you perfect?! 110% of an effort does not make sense.
  • 45. Loaded Questions If survey questions are not worded carefully, the results of a study can be misleading. Survey questions can be “loaded” or intentionally worded to elicit a desired response. Too little money is being spent on “welfare” versus too little money is being spent on “assistance to the poor.” Results: 19% versus 63%
  • 46. Order of Questions Questions are unintentionally loaded by such factors as the order of the items being considered. Would you say traffic contributes more or less to air pollution than industry? Results: traffic - 45%; industry - 27% When order reversed. Results: industry - 57%; traffic - 24%
  • 47. Nonresponse Occurs when someone either refuses to respond to a survey question or is unavailable. People who refuse to talk to pollsters have a view of the world around them that is markedly different than those who will let poll-takers into their homes.
  • 48. Missing Data Can dramatically affect results. Subjects may drop out for reasons unrelated to the study. People with low incomes are less likely to report their incomes. US Census suffers from missing people (tend to be homeless or low income).
  • 49. Self-Interest Study Some parties with interests to promote will sponsor studies. Be wary of a survey in which the sponsor can enjoy monetary gain from the results. When assessing validity of a study, always consider whether the sponsor might influence the results.
  • 50. Precise Numbers Because a figure is precise, many people incorrectly assume that it is also accurate. A precise number can be an estimate, and it should be referred to that way.
  • 51. Deliberate Distortion Some studies or surveys are distorted on purpose. The distortion can occur within the context of the data, the source of the data, the sampling method, or the conclusions.
  • 52. Recap In this section we have: reviewed misuses of statistics; illustrated how common sense can play a big role in interpreting data and statistics.
  • 53. Section 1-5 Collecting Sample Data
  • 54. Key Concept If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them. The method used to collect sample data influences the quality of the statistical analysis. Of particular importance is the simple random sample.
  • 55. Basics of Collecting Data Statistical methods are driven by the data that we collect. We typically obtain data from two distinct sources: observational studies and experiments.
  • 56. Observational Study Observing and measuring specific characteristics without attempting to modify the subjects being studied
  • 57. Experiment Apply some treatment and then observe its effects on the subjects (subjects in experiments are called experimental units)
  • 58. Simple Random Sample A sample of n subjects selected in such a way that every possible sample of the same size n has the same chance of being chosen
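A minimal sketch of drawing a simple random sample with Python's standard library. `random.sample` selects without replacement so that every subset of size n is equally likely; the population of ID numbers 1 to 100 and the helper name are assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct members; every possible subset of size n has the
    same chance of being chosen (hypothetical helper)."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return rng.sample(list(population), n)

# Assumed population: subjects identified by the numbers 1..100.
population = list(range(1, 101))
srs = simple_random_sample(population, 10, seed=1)
print(srs)
```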
  • 59. Random & Probability Samples Random sample: members from the population are selected in such a way that each individual member in the population has an equal chance of being selected. Probability sample: selecting members from a population in such a way that each member of the population has a known (but not necessarily the same) chance of being selected.
  • 60. Random Sampling selection so that each individual member has an equal chance of being selected
  • 61. Systematic Sampling Select some starting point and then select every kth element in the population
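Systematic sampling can be sketched with list slicing. Choosing the starting point at random among the first k elements is an assumption of this sketch (the slide only says "some starting point"), and the function name and population are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a starting index in the first k positions, then take every
    k-th element after it (hypothetical helper)."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # assumed: random start in 0..k-1
    return population[start::k]   # every k-th element from the start

# Assumed population of 100 ordered units; select every 10th.
units = list(range(100))
every_tenth = systematic_sample(units, k=10, seed=5)
print(every_tenth)
```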
  • 62. Convenience Sampling use results that are easy to get
  • 63. Stratified Sampling subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup (or stratum)
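A sketch of stratified sampling: group the units by a shared characteristic, then draw a simple random sample inside each group. The student data, the fixed sample size per stratum, and the helper name are assumptions for illustration only.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Split units into strata with stratum_of(), then sample per_stratum
    units at random from each stratum (hypothetical helper)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Every stratum contributes to the final sample.
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Assumed data: (class year, student id) pairs.
students = [("freshman", i) for i in range(40)] + [("senior", i) for i in range(20)]
picked = stratified_sample(students, stratum_of=lambda s: s[0], per_stratum=5, seed=3)
print(picked)
```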
  • 64. Cluster Sampling divide the population area into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters
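Cluster sampling differs from stratified sampling in that only some groups are selected, and then every member of a selected group is used. A minimal sketch, with hypothetical precinct names and members:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters cluster names, then keep all members of
    the chosen clusters (hypothetical helper)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Assumed data: four precincts and their residents.
precincts = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1"],
}
chosen, members = cluster_sample(precincts, 2, seed=7)
print(chosen, members)
```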
  • 65. Multistage Sampling Collect data by using some combination of the basic sampling methods. In a multistage sample design, pollsters select a sample in different stages, and each stage might use different methods of sampling.
  • 66. Methods of Sampling - Summary Random, Systematic, Convenience, Stratified, Cluster, Multistage
  • 67. Beyond the Basics of Collecting Data Different types of observational studies and experiment design
  • 68. Types of Studies Cross-sectional study: data are observed, measured, and collected at one point in time. Retrospective (or case-control) study: data are collected from the past by going back in time (examine records, interviews, …). Prospective (or longitudinal or cohort) study: data are collected in the future from groups sharing common factors (called cohorts).
  • 69. Randomization Randomization is used when subjects are assigned to different groups through a process of random selection. The logic is to use chance as a way to create two groups that are similar.
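Random assignment to two groups can be sketched by shuffling the subjects and splitting the shuffled list in half. The subject labels and the function name are hypothetical.

```python
import random

def randomize_two_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly)
    equal size (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Assumed subjects: 20 labeled participants.
subjects = [f"subject_{i}" for i in range(20)]
treatment_group, placebo_group = randomize_two_groups(subjects, seed=11)
print(treatment_group)
print(placebo_group)
```

Because chance alone decides who lands in each group, no characteristic of the subjects can systematically favor one group.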
  • 70. Replication Replication is the repetition of an experiment on more than one subject. Samples should be large enough so that the erratic behavior that is characteristic of very small samples will not disguise the true effects of different treatments. It is used effectively when there are enough subjects to recognize the differences from different treatments. Use a sample size that is large enough to let us see the true nature of any effects, and obtain the sample using an appropriate method, such as one based on randomness.
  • 71. Blinding Blinding is a technique in which the subject doesn’t know whether he or she is receiving a treatment or a placebo. Blinding allows us to determine whether the treatment effect is significantly different from a placebo effect, which occurs when an untreated subject reports improvement in symptoms.
  • 72. Double-Blind Blinding occurs at two levels: (1) The subject doesn’t know whether he or she is receiving the treatment or a placebo. (2) The experimenter does not know whether he or she is administering the treatment or placebo.
  • 73. Confounding Confounding occurs in an experiment when the experimenter is not able to distinguish between the effects of different factors. Try to plan the experiment so that confounding does not occur.
  • 74. Controlling Effects of Variables Completely Randomized Experimental Design: assign subjects to different treatment groups through a process of random selection. Randomized Block Design: a block is a group of subjects that are similar, but blocks differ in ways that might affect the outcome of the experiment. Rigorously Controlled Design: carefully assign subjects to different treatment groups, so that those given each treatment are similar in ways that are important to the experiment. Matched Pairs Design: compare exactly two treatment groups using subjects matched in pairs that are somehow related or have similar characteristics.
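As a rough sketch of a randomized block design, the code below assigns treatments at random within each block, so that each block receives every treatment. The within-block random assignment is the usual way such a design is carried out rather than something spelled out on the slide, and the age blocks, subject labels, and function name are all hypothetical.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Within each block, shuffle the subjects and deal the treatments out
    in turn so each block gets a balanced mix (hypothetical helper)."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, block_subjects in blocks.items():
        shuffled = list(block_subjects)
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Assumed blocks: subjects grouped by age, which might affect the outcome.
age_blocks = {
    "under_40": ["u1", "u2", "u3", "u4"],
    "over_40": ["o1", "o2", "o3", "o4"],
}
assignment = randomized_block_assignment(age_blocks, ["treatment", "placebo"], seed=2)
print(assignment)
```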
  • 75. Summary Three very important considerations in the design of experiments are the following: 1. Use randomization to assign subjects to different groups. 2. Use replication by repeating the experiment on enough subjects so that effects of treatment or other factors can be clearly seen. 3. Control the effects of variables by using such techniques as blinding and a completely randomized experimental design.
  • 76. Errors No matter how well you plan and execute the sample collection process, there is likely to be some error in the results. Sampling error: the difference between a sample result and the true population result; such an error results from chance sample fluctuations. Nonsampling error: sample data incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective instrument, or copying the data incorrectly).
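Sampling error can be computed directly when the whole population is known, which is rarely the case in practice. The sketch below uses a simulated population; its mean of 100, standard deviation of 15, and the sample size of 50 are assumptions chosen only for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)

# Assumed population: 10,000 simulated scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# A simple random sample of 50 scores from that population.
sample = rng.sample(population, 50)

# Sampling error: sample result minus the true population result.
sampling_error = mean(sample) - mean(population)
print(round(sampling_error, 2))
```

Drawing a different random sample would give a different sampling error, since the error comes from chance sample fluctuations and not from any mistake in collecting or recording the data.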
  • 77. Recap In this section we have looked at: types of studies and experiments; controlling the effects of variables; randomization; types of sampling; sampling errors.