Advanced Statistics for Librarians: How to use and evaluate statistical information in library research. Jason Price, Science & Electronic Resources Librarian, Claremont Colleges; John McDonald, Acquisitions Librarian, Caltech.
Advanced Statistics. Part I: Research Design. Part II: Statistical Concepts. Part III: Evaluating Library Statistics.
Research Design. Validity: how accurately an indicator measures the concept being studied. Is the technique appropriate to measure the concept being studied? Reliability: how consistent the measurement is. Does it yield the same results over repeated attempts and by different researchers? How certain are the results? Generalizability: how well (or how likely) the findings can be applied to other situations.
Research Design Steps: Research question; Hypotheses; Data definitions; Data collection; Data analysis; Conclusions.
Research Question: What is the study designed to answer? Why is the study important? The more specific, the better! Example: Should the library increase hours during finals week?
Hypothesis: A statement about the expected results; what you will test after collecting data. Null hypothesis (H0): there is no difference between Group 1 and Group 2, or between before and after. Notated H0: μ1 = μ2. Alternate hypothesis (Ha): there is a difference, and what that difference will be. Notated Ha: μ1 ≠ μ2. Can also be directional if theory or prior research indicates: Ha: μ1 > μ2.
Data Collection: Observation; Interviews; Focus groups; Surveys; Transaction logs; Others?
Data Collection: Sampling. Necessary when it is impossible to study an entire population because of logistical, geographical, monetary, or time constraints. A sample must be a good representation of the rest of the population. The larger your sample, the more sure you can be that its answers truly reflect the population. Accuracy increases when more respondents pick one choice over another, e.g. more accuracy when 99% choose one presidential candidate. The larger your population, the larger your sample needs to be, except when the population is very large (e.g. the U.S.) or very small (e.g. your household).
Sampling Designs: Simple random sample (assumes homogeneity); Stratified random sample (assumes heterogeneity).
Calculating Sample Sizes
1) When you have a very large population size: $SS = \dfrac{Z^2 \, p \, (1 - p)}{c^2}$
2) When you have a finite population size: $ss = \dfrac{SS}{1 + \dfrac{SS - 1}{\text{pop}}}$
Z = Z value (e.g. 1.96 for 95% confidence level)
p = percentage picking a choice, expressed as a decimal (e.g. .5 for 50%)
c = confidence interval, expressed as a decimal (e.g. .04 = ±4%)
Sample size spreadsheet
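The two sample-size formulas can be tried out with a few lines of Python. This is a minimal sketch, not the presenters' spreadsheet: the function names and the campus population of 2,000 are assumptions chosen for illustration, while the values of Z, p, and c are the examples given on the slide.

```python
import math


def sample_size_large_pop(z: float, p: float, c: float) -> float:
    """Formula 1: SS = Z^2 * p * (1 - p) / c^2, for a very large population."""
    return (z ** 2) * p * (1 - p) / (c ** 2)


def sample_size_finite_pop(ss_large: float, pop: int) -> float:
    """Formula 2: ss = SS / (1 + (SS - 1) / pop), for a finite population."""
    return ss_large / (1 + (ss_large - 1) / pop)


# Slide examples: 95% confidence (Z = 1.96), p = .5, c = .04 (plus or minus 4%)
ss = sample_size_large_pop(1.96, 0.5, 0.04)

# Hypothetical campus of 2,000 people (an assumption, not from the slides)
ss_campus = sample_size_finite_pop(ss, 2000)

# Round up, since you cannot survey a fraction of a person
print(math.ceil(ss), math.ceil(ss_campus))
```

With these inputs the large-population formula asks for roughly 600 respondents, and the finite-population correction lowers that to roughly 460 for the assumed campus of 2,000.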
M&M Sampling. Research question: What is the color distribution of M&Ms? Sample: What is the color distribution of a simple random sample of M&Ms? Test: Does my sample yield different results than what is reported by the company? Method: Packages of M&Ms are distributed to each participant. Each package is a random sample from the company.
M&M Sampling: M&M Data Collection & Testing. Let’s look at the colors in individual samples of M&Ms.
Data Definitions. Data scales: nominal, ordinal, interval, ratio. Frequency distributions: flat, normal, skewed. Variable types: dependent, independent, extraneous.
Data Scales. Nominal: scaled without order, indicating only that classifications are different. Example: public & private institutions. Ordinal: scaled with order, but without equal distance between values. Example: Carnegie classifications. Interval: scaled with order and numerically equal distances on the scale, but no true zero. Example: temperature in degrees Fahrenheit. Ratio: scaled with equal intervals and a true zero starting point. Example: full-text downloads. Nominal and ordinal variables are categorical (discrete), while interval and ratio variables are numeric and are often treated as continuous.
Name that data type! Salary; Author of a book; Hours spent in the library; Patron status; Publication year of a journal; Ranked journal lists; Test results on instruction classes; Number of articles read; FTE.
Data Distributions. Described by their skew (asymmetry) and kurtosis (how peaked the curve is and how heavy its tails are). Normal: symmetric, bell-shaped curve with gradual slopes. Non-normal (skewed): extreme values on one side, with steep slopes.
Fulltime Students at ARL Schools N=114 Mean = 22K SD = 10K
Total Salaries & Wages at ARL Libraries N=114 Mean = 10M SD = 6.5M
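Skew and kurtosis, the two shape measures named on the Data Distributions slide, can be computed from standardized scores. The sketch below uses the simple population (moment) definitions; the two small datasets are invented for illustration and are not the ARL figures.

```python
import statistics


def skewness(values):
    """Average cubed z-score: 0 for symmetric data, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum(((x - mean) / sd) ** 3 for x in values) / len(values)


def excess_kurtosis(values):
    """Average fourth-power z-score minus 3, so a normal curve scores about 0."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - mean) / sd) ** 4 for x in values) / len(values) - 3


# Hypothetical data (assumptions for illustration only)
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 20]  # one extreme value pulls the tail to the right

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric), excess_kurtosis(right_skewed))
```

A skewness far from zero is one quick sign that a variable may not meet the normal-distribution requirement listed for several of the tests that follow.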
Variables Dependent:  the variable being measured, studied, and predicted. Independent :  variables that can be manipulated or are predictors of the dependent variable. Extraneous : variables other than the independent variables that can influence the dependent variable.
Data Analysis. Descriptive statistics: mean, median, mode, standard deviation. Correlational statistics: correlation. Inferential statistics: t-test, regression, chi-square, ANOVA.
Correlational Statistics. Correlation establishes that two measures have a relationship. It indicates direction & strength, but not causation! It allows the researcher to consider other statistical tests with confidence. Requirements: random sample; interval or ratio data; normal distribution; linear relationship.
Correlational Statistics. Direction. Positive: as one value increases, the other does as well. Example: age and height. Library: enrollment & materials budget. Negative: as one value increases, the other decreases. Example: car speed & time to destination. Library: items purchased & remaining shelf space. Strength: a value between 1 (positive) and -1 (negative). The closer to those values, the stronger the relationship; a value near 0 indicates little or no linear relationship.
Correlation
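The direction and strength of a correlation can be illustrated with a hand-written Pearson coefficient. This is a sketch under assumptions: the function name is ours, and the speed and travel-time numbers are made up (a fixed 120-mile trip) to mirror the slide's negative example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: car speed (mph) and hours needed for a 120-mile trip
speeds = [30, 40, 50, 60]
hours = [120 / s for s in speeds]

r = pearson_r(speeds, hours)
print(round(r, 3))  # negative: as speed increases, time to destination decreases
```

The result is strongly negative but not exactly -1, because travel time falls along a curve rather than a straight line; this is one reason a linear relationship is listed among the requirements.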
Inferential Statistics Parametric :  assume that the dependent variable has a known underlying mathematical distribution (normal, binomial, Poisson, etc.) which serves as the basis for sample-to-population estimates. Parametric tests are robust and have great power efficiency.  Non-parametric : do not assume a normal distribution ( distribution free ) & require that the data meet fewer assumptions. Allow for the analysis of a mixture of data types.
T-Test. Determines whether there is a difference (in a characteristic) between two populations, based on data from samples of those populations. Requirements: random sample; interval or ratio data; normal distribution; equal standard deviations.
T-Test
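As a rough illustration of the t-test, the sketch below computes the pooled-variance (equal standard deviations) t statistic for two independent samples. The function name and the two lists of test scores are assumptions for illustration; the resulting statistic would still need to be compared against a t table for the given degrees of freedom.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic assuming equal variances; returns (t, degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / std_err
    return t, df


# Hypothetical quiz scores for two instruction groups (not data from any study)
online = [72, 75, 78, 80, 85]
classroom = [70, 74, 77, 79, 90]

t, df = pooled_t_statistic(online, classroom)
print(round(t, 3), df)
```

A t statistic close to zero, as here, suggests the two group means are not distinguishable given the spread within each group.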
Regression. Predicts values of a dependent variable based on values of independent (predictor) variables. Requirements: interval or ratio data; normal distribution; correlated variables; linear relationship.
Regression
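A one-predictor regression can be fitted by ordinary least squares in a few lines. This is a sketch with invented numbers: the enrollment and budget figures are hypothetical, and the function names are ours.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted value of the dependent variable for a given predictor value."""
    return intercept + slope * x


# Hypothetical data: enrollment (thousands) vs. materials budget ($ millions)
enrollment = [5, 10, 15, 20, 25]
budget = [1.1, 2.0, 2.8, 4.1, 5.0]

slope, intercept = fit_line(enrollment, budget)
print(round(predict(30, slope, intercept), 2))  # predicted budget at 30K students
```

Predicting at 30K students extrapolates beyond the observed range, so the figure should be read with more caution than a prediction inside the range.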
ANOVA. Determines whether there are differences between three or more sample means. Tests the significance and direction of the difference. Requirements: normal distribution (in each cell); interval or ratio data; homogeneity of variance.
ANOVA
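The core of a one-way ANOVA is the F ratio: variation between group means divided by variation within groups. The sketch below computes it by hand; the function name and the three groups of weekly library hours are assumptions for illustration, and the F value would still be checked against an F table.

```python
import statistics


def one_way_anova_f(groups):
    """Returns (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical weekly hours in the library for three user groups
undergrads = [4, 6, 5, 7]
grads = [8, 9, 7, 10]
faculty = [2, 3, 4, 3]

f_ratio, df_b, df_w = one_way_anova_f([undergrads, grads, faculty])
print(round(f_ratio, 2), df_b, df_w)
```

A large F indicates that the group means differ by more than the within-group scatter would explain; it does not by itself say which groups differ.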
Chi Square Test. Tests the difference between expected and observed frequencies for nominal or ordinal data. Requirements: any type of data; large sample size (>50); similar distributions.
Chi Square Test: Pepsi Challenge
Observed: Pepsi 85, Coke 57, RC 78. Expected (equal) = 73.33.

          O      E        O-E      (O-E)^2   (O-E)^2/E
Pepsi     85     73.33    11.67    136.19    1.86
Coke      57     73.33   -16.33    266.67    3.64
RC        78     73.33     4.67     21.81    0.3
Totals    220    219.99                      χ² = 5.8

Degrees of freedom = rows - 1 = 3 - 1 = 2
Critical value of χ² = 5.99 at alpha = 0.05
Observed value of χ² = 5.8
Decision: Fail to reject H0
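The Pepsi Challenge arithmetic can be reproduced in Python. The observed counts and the 5.99 critical value come from the slide; the function name is ours, and the p-value shortcut in the last step is a property of the chi-square distribution that holds only for 2 degrees of freedom.

```python
import math


def chi_square_equal_expected(counts):
    """Goodness-of-fit statistic when every category is expected to be equally popular."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


observed = {"Pepsi": 85, "Coke": 57, "RC": 78}  # counts from the slide

chi2 = chi_square_equal_expected(list(observed.values()))
df = len(observed) - 1          # 3 categories - 1 = 2
critical = 5.99                 # from the slide: alpha = 0.05, df = 2

# For df = 2 only, the upper-tail probability has the closed form exp(-chi2 / 2)
p_value = math.exp(-chi2 / 2)

reject_h0 = chi2 > critical
print(round(chi2, 2), df, round(p_value, 3), reject_h0)
```

The statistic lands just under the critical value, so the decision matches the slide: fail to reject the hypothesis that the three colas are equally preferred.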
Inferential Statistics

Goal                                      Parametric                   Non-parametric
Quantify association between variables    Pearson correlation          Kendall's tau; Spearman correlation
Compare 2 unpaired groups                 Unpaired t-test              Fisher's test; Mann-Whitney test
Compare 2 paired groups                   Standard two-group t-test    Kolmogorov-Smirnov; Mann-Whitney
Compare 3+ unmatched groups               ANOVA                        Chi-square test; Kruskal-Wallis test
Compare sample to a hypothetical value    T-test                       Chi-square; Wilcoxon test
Predict value from measured variables     OLS regression               Negative binomial regression; Poisson regression
Review: Research Design. Research question: What will the study answer? Hypotheses: What do you think the results will be? Data definitions: What scales are the variables, what is the distribution, and what are the dependent, independent & extraneous variables? Data collection: What is the best method for collecting the variables of interest? Data analysis: What are the proper statistical tests to use on the data? Conclusions: What do the data show us or indicate?
Case Studies. Citation analysis: Antelman, K. (2004). “Do Open-Access Articles Have a Greater Research Impact?” College & Research Libraries 65(5): 372-382. Usage analysis: Blecic, D.D. (1999). “Measurements of journal use: an analysis of the correlations between three methods.” Bull Med Libr Assoc 87(1): 20-25. Service analysis: Nichols, J.; Shaffer, B.; Shockey, K. (2003). “Changing the Face of Instruction: Is Online or In-class More Effective?” College & Research Libraries 64(5): 378-389.
“Changing the Face of Instruction…” Research question: Is an online tutorial as effective in teaching library instruction as a classroom setting? Hypotheses: H1. Students will have higher scores in information literacy tests after library instruction. H2. Students will have the same or higher scores in info-lit tests after taking online tutorials as students taking traditional instruction. H3. Students will report as much or more satisfaction with online instruction as students taking traditional instruction.
“Changing the Face of Instruction…” Variables & data collection: Variables: test scores & survey results. Data collection: pretest/posttest & survey. Statistical tests: descriptive statistics incl. mean, standard deviation, standard error; t-tests (1 & 2 tailed). Conclusions: Accept H1: instruction improves literacy. Accept H2 alternative hypothesis: online has no significant difference from traditional. Accept H3 alternative hypothesis: student satisfaction is equal with both methods.
“Do Open-Access Articles…” Research question; Hypothesis; Variables and data collection; Statistical tests; Conclusions; Critical questions.
“Do Open-Access Articles…” Research question: Do freely available articles have a greater research impact? Measures: Research impact: citation rates. Open access: freely available. Hypotheses: H1. Scholarly articles have a greater research impact if the articles are freely available online than if they are not. H0 (null hypothesis): There is no difference between the mean citation rates: H0: d1 = d0.
“Do Open-Access Articles…” Variables & data collection: Variables: mean citation rates. Data collection: at least 50 articles from 10 leading journals in 4 disciplines. Statistical tests: descriptive statistics incl. mean, standard deviation, standard error; Wilcoxon signed-rank. Conclusions: Reject H0: open-access articles are cited more than those that are not OA. Discussion: Validity? Reliability of measures? Generalizability? Alternate hypotheses?
My favorite statistic… Baseball is 90% mental –  the other half is physical.



Editor's Notes

  • #2: Science & Electronic Resources Librarian Libraries of the Claremont Colleges
  • #3: Part I will be an overview of developing a research project with the aim of using statistics as a methodology in the analysis. Part II will be an overview of statistical concepts and language. Part III will be an exercise in evaluating library statistics.
  • #4: Three key concepts to remember when designing a research project of any kind, but especially a statistical project, are validity, reliability, and generalizability. Validity is how well a variable measures a particular concept. For example, if we are measuring use, is it valid to count reshelving figures as use? Full-text downloads? Reliability is the consistency of the variable, measurement, or test. One basic of scientific and statistical analysis is that the results can be confirmed by others repeating the experiment or data analysis. Without reliability, we would have … Generalizability means: can the results be applied to other situations? For example, if you observed students using group study rooms in this library, could you generalize that use across all hours, days, buildings, user groups, or other institutions? If you can, your research can become a general model, a universal law, or immutable truth. But if you can’t, that just means that the results are applicable only in similar situations.
  • #5: Designing research includes formulating initial hypotheses, or statements about what the researcher thinks the data will show, data collection (and manipulation) through a variety of techniques, and statistical analysis that is suited to the hypotheses and data. Here are the key stages for doing research. Statistics are only a tool to help us understand the outcome of the research. Much research can be done not employing statistical techniques – most ethnographic research relies on direct observation and not on analysis of statistics. Take medicine for example: drug works, drug is safe, prescribe drug. Observational data or microscopic data may suffice. But most research relies on statistical analysis of research data, no matter how it’s collected.
  • #10: There are two basic sampling designs: a simple random sample and a stratified random sample, and they are pretty similar. Draw a single circle and a circle composed of other circles inside to provide a visual aid. A simple random sample is what it sounds like. A group of subjects is chosen from the whole population and each subject has an equal chance of being sampled. If you took the campus directory and randomly selected 100 names, that would be a simple random sample. Draw examples comparing simple and stratified. A stratified random sample is a bit more complex. It assumes that your population is composed of different types of individuals and that you want some knowledge about each group. For example, libraries often want to know how well they serve their communities and want to know something about students, faculty, and staff. Are they meeting each group’s needs? The solution to this problem is to divide the population into its groups and then randomly sample each group. The sample drawn from each group is generally proportional to that group’s share of the population.
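The two designs described in the note for slide 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the directory below (800 students, 150 faculty, 50 staff) is made up, and the function names `simple_random_sample` and `stratified_random_sample` are hypothetical helpers, not part of the workshop materials.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Draw n members; every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_random_sample(population, group_of, n, seed=0):
    """Split the population into groups, then sample each group
    in proportion to its share of the whole population."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(group_of(member), []).append(member)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding may make the total differ
        # slightly from n for less convenient group sizes.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical campus directory (assumed numbers, for illustration only).
directory = (
    [("student", i) for i in range(800)]
    + [("faculty", i) for i in range(150)]
    + [("staff", i) for i in range(50)]
)

simple = simple_random_sample(directory, 100)
stratified = stratified_random_sample(directory, lambda m: m[0], 100)
```

With these assumed group sizes, the stratified sample always contains 80 students, 15 faculty, and 5 staff, while the simple random sample only approximates those shares and could, by chance, include very few staff.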
  • #12: Now comes a really fun and interactive part of the workshop. In this study, we are going to sample M&Ms and try to figure out the frequency of colors. Not only that, but we’re going to test our results against what the Mars Candy Company says the frequency should be. Let’s think about our M&M packs. At the plant, the company loads millions of these little candies into a big hopper and tries to mix them so that they are randomly distributed. When they get packaged, the company wants you to get some of each color, but does not regulate the number of each color going into your package – some may have more blues, some may have more yellows. You can consider each of these packages a random sample of the large hopper or bin of M&M candies. And if we sample enough of these packages, we should start getting close to the distribution of colors at the company. Remember, we are sampling because we don’t have enough money to count all M&Ms sold in every store.
  • #13: How is accuracy affected by size of sample? What would explain a difference between our observed results and M&M’s reported figures? Was our sample a good representation of the population? Is our methodology valid? Are our results generalizable?
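One way to put a number on the first question in the note for slide 13 is the usual normal-approximation margin of error for a proportion. The sketch below is an assumption-laden illustration, not the workshop's own method: it uses a 95% confidence level (z = 1.96), ignores any finite-population correction, and the proportions passed in are hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion p
    in a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


# How the margin shrinks as the sample grows (p = 0.5 is the worst case).
for n in (25, 100, 400, 1600):
    print(n, margin_of_error(0.5, n))

# A lopsided split (e.g. 99% choosing one option) gives a tighter margin
# than an even split at the same sample size.
print(margin_of_error(0.99, 100), margin_of_error(0.5, 100))
```

Two things stand out under these assumptions: quadrupling the sample size only halves the margin of error, and a very one-sided result is estimated more precisely than an even split.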
  • #14: A review of the Basic Statistics for Librarians workshop. The five components were statistical concepts, evaluation of literature, sampling, an introduction to usage statistics, and designing a research study. Concepts included frequency distributions, including flat (no change), normal (a bell curve shape), and skewed (very many sloping to very few or vice versa). The mean is the average of a group and the median is the middle value of a set of ordered values. A standard deviation is a measure of the dispersion or variation in a sample. For a normal distribution, 68% of the data is found within +/- 1 SD, 95% within +/- 2, and over 99% within +/- 3. Three key concepts to remember when evaluating literature are Validity, how well a variable measures the concept being studied; Reliability, the consistency of the variable, measurement, or test; and Generalizability, whether the results can be applied to other situations. Sampling is the act of drawing a portion of subjects to measure from a larger population. A random sample is the strongest type of sample since it is assumed to be a fair representation of the population. Other sampling designs include stratified random sampling, where a portion is taken from each group in the population, and convenience sampling, where the sample is not random. Sample size is important and for small populations, more subjects are needed. Usage statistics are very important in applied librarianship today, but researchers need to remember to ask questions such as what is being measured, who did what is measured, why they did it, and how many of them did it. Most datasets will include outliers and missing data that can affect the statistical tests, but there are many techniques for dealing with these problem data.
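As a quick refresher on the mean, median, and standard deviation reviewed in the note for slide 14, here is a small sketch using Python's standard `statistics` module. The checkout counts are invented for illustration; the single large value stands in for the kind of outlier the note mentions.

```python
import statistics

# Hypothetical checkouts per patron; the 40 is a deliberate outlier.
checkouts = [1, 2, 2, 3, 3, 4, 5, 40]

mean = statistics.mean(checkouts)      # 60 / 8 = 7.5
median = statistics.median(checkouts)  # middle of the ordered values: 3.0
sd = statistics.stdev(checkouts)       # sample standard deviation

print(mean, median, sd)
```

In this made-up skewed sample the mean is pulled well above the median by one heavy user, which is one reason to report both.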
  • #16: Quiz – scale these types of data? Take a few minutes to write down what type of data these are, then we’ll go over them. Salary: ratio; Author: nominal; Hours: ratio; Patron: nominal; Publication: interval; Ranked: ordinal; Tests: interval; Articles: interval; FTE: ratio.
  • #18: This is a histogram of full-time enrollments at ARL schools. Full-time students average about 22 thousand. The standard deviation is about 10 thousand. Quiz: What share of schools fall between 12 and 32 thousand students? Answer: about 68%.
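The quiz answer for slide 18 follows from the 68% rule: 12 and 32 thousand are one standard deviation either side of the mean. A minimal check, assuming enrollments are roughly normally distributed (the actual histogram may well be skewed, in which case 68% is only approximate):

```python
from statistics import NormalDist

# Figures from the slide note: mean about 22 thousand, SD about 10 thousand.
enrollment = NormalDist(mu=22_000, sigma=10_000)

# Share of a normal distribution between mean - 1 SD and mean + 1 SD.
share_1sd = enrollment.cdf(32_000) - enrollment.cdf(12_000)

# The same calculation for +/- 2 SD and +/- 3 SD.
share_2sd = enrollment.cdf(42_000) - enrollment.cdf(2_000)
share_3sd = enrollment.cdf(52_000) - enrollment.cdf(-8_000)

print(share_1sd, share_2sd, share_3sd)
```

Note that the lower bound for 3 SD is negative, which is impossible for enrollments; that is a reminder that the normal curve is only an approximation here.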
  • #19: Let’s now look at some real data from libraries to apply the concepts of mean and standard deviation. This is a histogram I generated from data I collected from the Association of Research Libraries (ARL) on total salaries and wages. There are 114 libraries included in this histogram. Mean salaries and wages at ARL libraries is about 10 million. The SD is about 6 and a half million.
  • #21: How do you get a Law named after you? These are the key stages of statistical research. For collection and analysis, I’ve just listed a few examples and will briefly go over them.
  • #32: I’ll explain this in depth – how to get DF, how to do Chi-Square, etc.
  • #33: I’ll explain this in depth – how to get DF, how to do Chi-Square, etc.
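For readers without the live explanation, the degrees of freedom and Chi-Square statistic mentioned in the notes for slides 32 and 33 can be sketched as a goodness-of-fit calculation. Everything specific here is an assumption for illustration: the color counts are invented, the expected proportions are set to equal sixths as a placeholder (they are not the company's published figures), and `chi_square_gof` is a hypothetical helper name.

```python
def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, degrees of freedom) for a
    goodness-of-fit test of observed counts against expected proportions."""
    total = sum(observed.values())
    stat = 0.0
    for category, obs in observed.items():
        exp = total * expected_props[category]  # expected count
        stat += (obs - exp) ** 2 / exp
    df = len(observed) - 1  # number of categories minus one
    return stat, df


# Hypothetical counts from a few packs (60 candies in total).
observed = {"brown": 12, "yellow": 8, "red": 10,
            "blue": 14, "orange": 9, "green": 7}

# Placeholder null hypothesis: all six colors equally likely.
expected_props = {color: 1 / 6 for color in observed}

stat, df = chi_square_gof(observed, expected_props)

# Critical value for df = 5 at the 0.05 level, from a standard table.
CRITICAL_05_DF5 = 11.07
reject_null = stat > CRITICAL_05_DF5

print(stat, df, reject_null)
```

With these made-up counts the statistic is about 3.4 on 5 degrees of freedom, well below the critical value, so the null hypothesis of equal color frequencies would not be rejected.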
  • #35: Designing research includes formulating initial hypotheses (statements about what the researcher thinks the data will show), collecting (and manipulating) data through a variety of techniques, and choosing a statistical analysis that is suited to the hypotheses and data. Here are the key stages for doing research. Statistics are only a tool to help us understand the outcome of the research. Much research can be done without employing statistical techniques – most ethnographic research relies on direct observation and not on analysis of statistics. Take medicine for example: drug works, drug is safe, prescribe drug. Observational data or microscopic data may suffice. But most research relies on statistical analysis of research data, no matter how it’s collected.
  • #36: I’ll give a general introduction to each type of analysis, and then introduce the study by Nichols et al. as the first example.
  • #37: Everyone will read the article and then we’ll go through these together, with each item coming out after someone states it.
  • #38: After going through this, we’ll discuss what the study did right (pretest, posttest, survey) and what it did wrong, including assumptions (not stating the null hypotheses, accepting the alternate hypothesis when it should have been rejected).
  • #39: As a group, the participants will read through this study and come up with the answers to the 5 questions, with discussion centering on reliability, validity, and generalizability, and a focus on finding out whether the methods, variables, and tests fit the question.
  • #40: Everyone will read the article and then we’ll go through these together, with each item coming out after someone states it.
  • #41: After going through this, we’ll discuss what the study did right (pretest, posttest, survey) and what it did wrong, including assumptions (not stating the null hypotheses, accepting the alternate hypothesis when it should have been rejected).
  • #42: Just for fun….