Analyzing Responses to Likert Items
An Exploration of Data from a Credibility Study Involving WikiDashboard (http://wikidashboard.parc.com)
by Sanjay Kairam
WikiDashboard Study
- The System
- The Study
- The Data
WikiDashboard
“Social Dynamic Analysis Tool” for Wikipedia
Michael Scott (The Office): “Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you are getting the best possible information.”
What happens when we see who is doing the editing?
WikiDashboard (Close-Up)
WikiDashboard Study
Study conducted on Amazon Mechanical Turk
N = 288 subjects
Subjects paid $0.08 per HIT
“Please read and evaluate this Wikipedia Article.”
Experiment Conditions
Participants were each placed in 1 of 3 conditions (N = 96 each):
- Wiki Only (WO)
- Wiki + History (WH)
- WikiDashboard (WD)
Articles Used
Each subject read 1 of 8 possible Wikipedia articles.
Article “Quality”:
- “Low-Quality” articles were those flagged as “B-Class” or “C-Class” by the Wikipedia community.
- “High-Quality” articles were those that had at one time been “Featured Articles”.
Article “Controversiality”:
- “Controversial” articles were those on Wikipedia's extensive “List of Controversial Articles”.
Survey
Self-Reported Expertise
- “How familiar are you with the topic discussed on this Wikipedia page?”
Manipulation/Quality Checks
- “In 5-20 words, please describe what this Wikipedia page is about.”
- “Please describe one fact from the article that you found interesting.” (WO)
- “Please name at least one user (by username or IP address) who has made multiple edits to this page.” (WH, WD)
Credibility Assessment
Participants rated their agreement with these statements:
- “I believe that the information on this page is accurate.” (Accuracy)
- “I believe that the information on this page is objective.” (Objectivity)
- “I believe that the information on this page is current and up-to-date.” (Currency)
- “I believe that this page fully covers the relevant information on the topic.” (Coverage)
- “I trust the information on this page.” (Trust)
Likert Item Responses
Participants answered using a 5-point scale:
-2: “Strongly Disagree”
-1: “Somewhat Disagree”
 0: “Neither Agree nor Disagree”
+1: “Somewhat Agree”
+2: “Strongly Agree”
Now, what do we do with this data?
Analyzing Likert Item Responses
Very often, we see papers reporting Likert responses using means:
What is the average of one “Somewhat Agree” and three “Somewhat Disagree”s?
Hint: It’s not “Somewhat Disagree and a Half”.
In this case, what does a “mean” mean?
For the same reason, a standard ANOVA, which compares group means and assumes roughly normal, interval-scaled data, is usually not appropriate either, though people still try!
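The arithmetic behind the question above can be checked directly. This small sketch uses the -2 to +2 coding from the "Likert Item Responses" slide; the median is shown only as the usual ordinal-friendly alternative, and `median_low` is used so the result is always an observed category rather than an average of two middle values.

```python
from statistics import mean, median_low

# Coding from the "Likert Item Responses" slide.
LABELS = {
    -2: "Strongly Disagree",
    -1: "Somewhat Disagree",
    0: "Neither Agree nor Disagree",
    1: "Somewhat Agree",
    2: "Strongly Agree",
}

# One "Somewhat Agree" (+1) and three "Somewhat Disagree"s (-1).
responses = [1, -1, -1, -1]

avg = mean(responses)        # arithmetic mean of the numeric codes
mid = median_low(responses)  # a median that is always an observed category

print(f"mean = {avg}: {LABELS.get(avg, 'no matching category')}")
print(f"median = {mid}: {LABELS[mid]}")
```

The mean, -0.5, falls halfway between "Somewhat Disagree" and "Neither Agree nor Disagree", a response no participant gave, while the median stays on the scale.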
Options for Analysis
- Non-Parametric Tests for Ordinal Data
- Conversion to an Interval Scale
- Aggregating Items
Mann-Whitney U Test
Also called the “Mann-Whitney-Wilcoxon”, “Wilcoxon Rank-Sum”, or “Wilcoxon-Mann-Whitney” test.
A non-parametric test of whether two independent samples of observations have equally large values, i.e., whether observations from one population tend to be larger than those from the other.
http://en.wikipedia.org/wiki/Mann-Whitney_U
Mann-Whitney U Test
Assumptions:
- All observations from both groups are independent of each other.
- The responses are ordinal or continuous measurements.
- Under the null hypothesis, the two populations have the same distribution, so an observation from either population is equally likely to exceed one from the other.
- Under the alternative hypothesis, the probability that an observation from population X exceeds an observation from population Y is not equal to 0.5.
http://en.wikipedia.org/wiki/Mann-Whitney_U
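To make the statistic concrete, here is a minimal from-scratch sketch (not the study's analysis code) computing U in two equivalent ways: by counting pairs, with ties counted as one half, and from the rank sum using mid-ranks. Dividing U by n_x·n_y estimates P(X > Y) + 0.5·P(X = Y), the quantity the alternative hypothesis above says differs from 0.5. The ratings are hypothetical. In practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` also returns a p-value.

```python
from itertools import product


def mann_whitney_u(x, y):
    """U for sample x: number of pairs (xi, yj) with xi > yj, ties counted as 1/2."""
    u = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            u += 1.0
        elif xi == yj:
            u += 0.5
    return u


def u_from_rank_sum(x, y):
    """Same statistic from the rank-sum form U = R_x - n_x(n_x + 1)/2 (mid-ranks for ties)."""
    pooled = sorted(x + y)
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # average of 1-based positions i+1..j
        i = j
    r_x = sum(midrank[v] for v in x)
    return r_x - len(x) * (len(x) + 1) / 2


# Hypothetical Accuracy ratings (-2..+2), NOT data from the study.
high = [1, 2, 1, 0, 2, 1]
low = [0, -1, 1, 0, -2, 1]

u = mann_whitney_u(high, low)
print(u, u_from_rank_sum(high, low))  # the two forms agree
print(u / (len(high) * len(low)))     # estimate of P(X > Y) + 0.5 * P(X = Y)
```

Because Likert items have only five values, ties are everywhere; both forms above treat them consistently, which is why they agree.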
Kruskal-Wallis ANOVA
What if we want to test more than 2 groups (as we do, given our 3 experimental conditions)?
The Kruskal-Wallis one-way ANOVA on ranks extends the Mann-Whitney U test to 3 or more groups.
It is also non-parametric, though it does assume that all group distributions have a similar underlying shape.
http://en.wikipedia.org/wiki/Kruskal-Wallis_one-way_analysis_of_variance
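As a rough illustration, the sketch below computes the Kruskal-Wallis H statistic with mid-ranks and the standard tie correction. For three groups the usual chi-square approximation has 2 degrees of freedom, and the chi-square survival function with 2 df is exactly exp(-H/2), so no statistics library is needed. The three tiny groups are hypothetical and chosen so the arithmetic can be checked by hand (H = 5); with samples this small the chi-square approximation itself is not reliable.

```python
import math
from collections import Counter


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with mid-ranks and the standard tie correction.

    Undefined (division by zero) if every response is identical.
    """
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Mid-rank of each distinct value: average of the 1-based positions it occupies.
    midrank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(midrank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    # Likert data are full of ties, so the correction matters here.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def p_value_three_groups(h):
    """Chi-square approximation with k - 1 = 2 df, whose survival function is exp(-h / 2)."""
    return math.exp(-h / 2)


# Hypothetical ratings for the three conditions, NOT data from the study.
wo = [-1, -1]
wh = [0, 0]
wd = [1, 1]
h = kruskal_wallis_h(wo, wh, wd)
print(h, p_value_three_groups(h))
```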
Analysis Using Non-Parametric Tests
Do participants actually notice differences in article quality?
Mann-Whitney: Significant effects of article quality on ratings of Accuracy (p < 0.001), Coverage (p < 0.01), Currency (p < 0.001), and Trust (p < 0.001), with a marginally significant effect on Objectivity (p < 0.096).
Kruskal-Wallis: Significant effects on ratings of Accuracy (p < 0.001), Coverage (p < 0.012), Currency (p < 0.001), and Trust (p < 0.001), with no significant effect on Objectivity.
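With only two groups, as in the High- vs Low-Quality split, the two tests address the same question: without a continuity correction, the Kruskal-Wallis H equals the square of the Mann-Whitney normal-approximation z, so their asymptotic p-values coincide. The sketch below checks this on hypothetical ratings. The slides don't say which software or options produced the two sets of p-values, so small differences such as p < 0.01 vs p < 0.012 for Coverage might come from, for example, a continuity correction or exact p-values in one of the tests. That is an assumption, not something the slides confirm.

```python
import math
from collections import Counter


def midranks(values):
    """Average 1-based rank of each distinct value in the pooled data."""
    pooled = sorted(values)
    ranks, i = {}, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def tie_term(values):
    """Sum of t^3 - t over groups of tied values."""
    return sum(t ** 3 - t for t in Counter(values).values())


def mann_whitney_z(x, y):
    """Normal approximation with tie-corrected variance and NO continuity correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(x + y)
    u = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term(x + y) / (n * (n - 1)))
    return (u - n1 * n2 / 2) / math.sqrt(var)


def kruskal_wallis_h(x, y):
    """Kruskal-Wallis H for exactly two groups, with tie correction."""
    n = len(x) + len(y)
    ranks = midranks(x + y)
    h = 12 / (n * (n + 1)) * sum(
        sum(ranks[v] for v in g) ** 2 / len(g) for g in (x, y)
    ) - 3 * (n + 1)
    return h / (1 - tie_term(x + y) / (n ** 3 - n))


# Hypothetical ratings by article quality, NOT data from the study.
high = [1, 2, 1, 0, 2, 1]
low = [0, -1, 1, 0, -2, 1]

z = mann_whitney_z(high, low)
h = kruskal_wallis_h(high, low)
p_mw = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
p_kw = math.erfc(math.sqrt(h / 2))       # chi-square (1 df) p-value
print(z * z, h, p_mw, p_kw)
```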
Sample Boxplots: Ratings by Article Quality
[Figure: boxplots of Accuracy and Coverage ratings by article quality]
Analysis Using Non-Parametric Tests
Do participants notice differences in how “controversial” an article is?
Mann-Whitney: Significant effects on ratings of Coverage (p < 0.039), Currency (p < 0.039), Objectivity (p < 0.021), and Trust (p < 0.021), with no effect on ratings of Accuracy.
Kruskal-Wallis: Significant effect on ratings of Objectivity (p < 0.042) and marginally significant effects for Coverage (p < 0.077) and Currency (p < 0.083), but no significant effect on Accuracy or Trust.
Analysis Using Non-Parametric Tests
What we really want to know, however, is whether using WikiDashboard or Wiki + History makes participants more sensitive to article quality or controversiality than using Wikipedia on its own.
Both tests only compare populations split on a single variable, so they can’t test these interaction effects (e.g., condition × article quality).
Conversion to Interval Scale
If there were a way to map our Likert item responses onto an interval scale, we could use more familiar and more powerful statistical tests.
If the mapped data turned out to be normally distributed, for instance, we could use our usual parametric tests, such as MANOVA, which would let us test for these interaction effects.
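To make "interaction effect" concrete: MANOVA would handle all five credibility items jointly, but the simpler sketch below is a univariate two-way ANOVA on one item at a time, computing only the F statistic for the condition × quality interaction. Getting a p-value would need the F distribution (for example `scipy.stats.f.sf`), which is not in Python's standard library. The sketch assumes a balanced design with every condition/quality combination present, and the scores are hypothetical; as the slide notes, the result is only meaningful if the converted scores are roughly normal.

```python
from statistics import mean


def interaction_f(cells):
    """F statistic for the A x B interaction in a balanced two-way ANOVA.

    cells maps (a_level, b_level) -> list of scores; every combination of
    levels must be present and every cell must have the same number of scores.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced")
    cell = {k: mean(v) for k, v in cells.items()}
    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(cell[a, b] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell[a, b] for a in a_levels) for b in b_levels}
    # Interaction: how far each cell mean is from what the two main effects predict.
    ss_ab = n * sum(
        (cell[a, b] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_within = sum((x - cell[k]) ** 2 for k, v in cells.items() for x in v)
    df_ab = (len(a_levels) - 1) * (len(b_levels) - 1)
    df_within = len(a_levels) * len(b_levels) * (n - 1)
    return (ss_ab / df_ab) / (ss_within / df_within)


# Hypothetical converted scores, NOT data from the study: the quality effect
# appears only in the WikiDashboard condition.
cells = {
    ("WO", "Low"): [0.0, 1.0], ("WO", "High"): [0.0, 1.0],
    ("WD", "Low"): [0.0, 1.0], ("WD", "High"): [2.0, 3.0],
}
print(interaction_f(cells))
```

If the quality effect were the same size in every condition, the interaction sum of squares, and hence F, would be zero.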
Conversion to Interval Scale
E.J. Snell (1964) describes a procedure for mapping ordered data, such as Likert responses, onto an assumed underlying continuous scale of measurement.
At the end, he emphasizes that “the usefulness of the proposed method depends upon the assumption that the underlying scale of measurement can be transformed to produce a normal distribution.”
Snell, E.J. A Scaling Procedure for Ordered Categorical Data. Biometrics 20(3), pp. 592-607 (1964).
http://www.jstor.org/stable/2528498
Utilizing the Snell Conversion
We used the conversion procedure to transform the data, mapping each response (originally -2 to +2) to a new value ranging from roughly -1.00 to +4.05.
It looks as if only the distances between the values have changed.
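Snell's own estimation procedure is more involved and is not reproduced here. As a simpler stand-in that follows the same idea, the sketch below uses a normal-scores approach: it assumes a single standard-normal latent variable, cuts it at the observed cumulative proportions of each category, and scores each category by the mean of the normal within its slice. The counts are hypothetical, and the scores are in standard-deviation units, so they will not reproduce the -1.00 to +4.05 range reported above.

```python
from statistics import NormalDist


def normal_category_scores(counts):
    """Score ordered categories by E[Z | category] for a standard normal Z.

    counts[i] is the number of responses in category i (lowest first).
    Each category gets the slice of the latent normal scale that matches
    its share of the cumulative probability. Empty categories get None.
    """
    z = NormalDist()  # standard normal: mu=0, sigma=1
    total = sum(counts)
    scores, cum = [], 0
    for c in counts:
        if c == 0:
            scores.append(None)
            continue
        lo_p, hi_p = cum / total, (cum + c) / total
        cum += c
        # Density at the lower/upper cut-points; zero at -inf/+inf.
        pdf_lo = z.pdf(z.inv_cdf(lo_p)) if lo_p > 0 else 0.0
        pdf_hi = z.pdf(z.inv_cdf(hi_p)) if hi_p < 1 else 0.0
        # Mean of a standard normal truncated to (lo, hi).
        scores.append((pdf_lo - pdf_hi) / (hi_p - lo_p))
    return scores


# Hypothetical response counts for -2, -1, 0, +1, +2, NOT data from the study.
counts = [5, 20, 30, 60, 15]
print([round(s, 3) for s in normal_category_scores(counts)])
```

As with the conversion described above, the order of the categories is preserved and only the spacing between them changes.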
Histogram: Original Data
Histogram: Snell-Converted Data
Aggregating Likert Items
If we consider the various Likert items to be different measurements of a single underlying trait (credibility), can we sum them and run parametric statistical tests?
We haven’t tried this yet. Is this a valid approach?
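A minimal sketch of the aggregation idea, using the five items from the "Credibility Assessment" slide: sum them into a score from -10 to +10, and compute Cronbach's alpha, one common check (not mentioned in the slides) of whether items move together closely enough to be treated as one scale. A high alpha does not by itself prove that the items measure a single trait or that the summed score is normally distributed. The responses are hypothetical.

```python
from statistics import variance

# The five credibility items from the "Credibility Assessment" slide.
ITEMS = ["Accuracy", "Objectivity", "Currency", "Coverage", "Trust"]


def credibility_sum(row):
    """Summed credibility score for one participant (range -10..+10)."""
    return sum(row[item] for item in ITEMS)


def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of summed score)."""
    k = len(ITEMS)
    item_vars = [variance([r[item] for r in rows]) for item in ITEMS]
    total_var = variance([credibility_sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses (-2..+2), NOT data from the study.
rows = [
    {"Accuracy": 1, "Objectivity": 0, "Currency": 1, "Coverage": 0, "Trust": 1},
    {"Accuracy": 2, "Objectivity": 1, "Currency": 1, "Coverage": 1, "Trust": 2},
    {"Accuracy": -1, "Objectivity": -1, "Currency": 0, "Coverage": -2, "Trust": -1},
    {"Accuracy": 0, "Objectivity": 1, "Currency": 0, "Coverage": 0, "Trust": 0},
]
print([credibility_sum(r) for r in rows], round(cronbach_alpha(rows), 3))
```

The summed score has 21 possible values instead of 5, which is one reason summed scales are more often treated as approximately interval than single items are.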
Analyzing Responses to Likert Items
by Sanjay Kairam
Email: sanjay.kairam@gmail.com
Twitter: @skairam
Editor's Notes

  • Note on the boxplots of ratings by article quality: Incidentally, the box plots show that not all of the effects actually go in the expected direction. For instance, Accuracy and Trust are rated higher for “Low-Quality” articles, while Coverage, Currency, and Objectivity are rated higher for “High-Quality” articles. Some thoughts: While subjects rated the B/C-Class articles as less complete and less current than the Featured Articles, they rated these “lower-quality” articles as more accurate and more trustworthy overall. This makes more sense when you consider why these articles were flagged in the first place. The article for “hip hop music” was once a Featured Article but is currently flagged for reasons relating to the scope of hip hop (“No references to any of the classic hip hop stars”, “Hip-hop and Rap are entirely different things”, etc.). So perhaps, while the existing material in the article was seen as accurate, the article as a whole was judged unsatisfactory for its lack of coverage and currency. For “hypnagogia”, one of the discussion points flagged on the main article page itself is that the article should be merged with another article, “Threshold Consciousness”, suggesting that the article may not cover the topic fully.
  • Note on the controversiality results: Again, we see an interesting pattern in which participants rated the “controversial” articles as more credible on some measures and less credible on others. They rated the “controversial” articles higher on Coverage, Currency, and Objectivity, perhaps because these articles receive a high number of edits overall and are therefore frequently updated by many different editors. It’s interesting that, in spite of seeing the “controversial” articles as more complete and objective, participants still trusted the non-controversial articles more.