Qual, Mixed, Machine
and Everything in Between
Dr. Stuart Shulman
Founder & CEO
Texifter
Prepared for the
3rd World Conference
on Qualitative Research
Lisbon, Portugal
October 17-19, 2018
BACKGROUND: MY ROOTS
Part One
Early 1990s: My Sustainable Agriculture Roots
Emergent properties found in very well read texts,
such as the character type “extremist agent of the law”
Agenda-setting in the press
Relations between Classes
Rates and Terms for Credit
Farm Profitability
Cost of Living
Soil Fertility
Education
Exploration
Speculation
Coding
Validation
Circa 1999
May 2001
Council for Excellence in Government
June 2002
National Defense University
QUALITATIVE METHODS
Part Two
Qualitative Methods: Genes, Taste, or Tactic?
• Qualitative by birth or choice?
– Some look to words as an alternative to number crunching
– Others rooted in rich and meaningful interpretive traditions
• Another group is fluent in both qual & quant
– Mixed methods open up rather than limit fields of knowledge
• One central goal is valid inferences about phenomena
– Replicable and transparent methods
– Attention to error and corrective measures
– Internal and external validation of results
• Using computers for qualitative data analysis helps, but…
– Rigor still originates with the research design, not the technology
– Software makes better organization and efficiency possible
– Coders enable the researcher to step back while scaling up
A Spectrum of Methods Approaches
Purist
deep immersion
closeness to data
antipathy to numbers
credible interpretation
in-depth analysis
contextual
subjective
Pluralist
experimental
mixed method
adaptive hybrid
flexible approach
interdisciplinary
open minded
Positivist
quantitative
focus on error
measurement critical
validity and reliability
replication & objectivity
generalization
hypotheses
COMPUTER ASSISTED QUALITATIVE
DATA ANALYSIS SOFTWARE
Part Three
An Incredibly Important Book for Me
Other Very Important Books
Traditional Off-the-Shelf CAQDAS
NVivo
ATLAS.ti
MAXQDA
Text Analytics Packages
RapidMiner
Attensity
Published in 2008
Chi-Jung Lu and Stuart W. Shulman, “Rigor and Flexibility in Computer-based
Qualitative Research: Introducing the Coding Analysis Toolkit,” International
Journal of Multiple Research Approaches Vol. 2, No. 1 (2008), 105-117.
Positive Claims for Using Software
• Convenience: Data is accessible & reducible
• Efficiency: Computer-assisted tasks like search
• Organization: Codes, memos, and teams
• Patterns: Co-occurrences, frequencies, etc.
• Outliers: Significant and otherwise
• Scale: Testing of observable implications
• Iteration: A continuous and evolving process
• Transparency: Clarify methods & confirmability
• Legitimacy: Accuracy, validity, & credibility
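To make the "Patterns" claim concrete, here is a minimal sketch of counting code frequencies and co-occurrences across coded documents. The function name, the data layout (one list of codes per document), and the example codes are assumptions for illustration; this is not how any particular CAQDAS package stores or reports codes.

```python
from collections import Counter
from itertools import combinations


def code_patterns(coded_docs):
    """Count how often each code appears and how often pairs of codes co-occur.

    coded_docs: list of documents, each a list of code names (hypothetical layout).
    A code is counted at most once per document.
    """
    frequencies = Counter()
    co_occurrences = Counter()
    for codes in coded_docs:
        unique_codes = sorted(set(codes))
        frequencies.update(unique_codes)
        # Each unordered pair of codes in the same document is one co-occurrence.
        co_occurrences.update(combinations(unique_codes, 2))
    return frequencies, co_occurrences


# Illustrative data only: three documents coded with made-up codes.
example_docs = [
    ["credit", "profitability"],
    ["credit", "soil fertility"],
    ["credit", "profitability", "soil fertility"],
]
example_freq, example_pairs = code_patterns(example_docs)
print(example_freq.most_common())
print(example_pairs.most_common())
```

Counts like these are a starting point for interpretation, not a finding in themselves, which is the point of the concerns listed on the next slide.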
Concerns about Using Software
• Convenience: Too many tempting shortcuts
• Efficiency: May undermine meaning-making
• Organization: Can become an end in itself
• Patterns: Can be misleading
• Outliers: May be undervalued as noise
• Scale: Big data is not better data
• Iteration: Bias may be inscribed in features
• Transparency: Features that are a black box
• Legitimacy: Research design is the actual key
COLLABORATION & MEASUREMENT
Part Four
Text Classification
A 2,500-year-old problem
Plato argued it would be frustrating; it still is
Software cannot remove the problem
It can expose it more quickly
Grimmer & Stewart “Text as Data”
Political Analysis (2013)
Volume is a problem for scholars
Coders are expensive
Groups struggle to accurately label text at scale
Validation of both humans and machines is “essential”
Some models are easier to validate than others
All models are wrong
Automated models enhance/amplify, but don’t replace humans
There is no one right way to do this
“Validate, validate, validate”
“What should be avoided then, is the blind use of
any method without a validation step.”
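One common form a validation step can take is comparing machine labels against a human-coded gold standard. The sketch below computes precision, recall, and F1 for a single category. The function name and the labels are assumptions for illustration; Grimmer and Stewart discuss validation far more broadly than this one check.

```python
def precision_recall_f1(gold, predicted, positive):
    """Compare machine labels with human gold labels for one category.

    gold, predicted: equal-length lists of labels.
    positive: the label treated as the category of interest.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if p == positive and g == positive)
    false_pos = sum(1 for g, p in pairs if p == positive and g != positive)
    false_neg = sum(1 for g, p in pairs if p != positive and g == positive)
    # Guard against division by zero when a category is never predicted or never present.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative labels only.
human_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]
machine_labels = ["relevant", "irrelevant", "relevant", "irrelevant"]
print(precision_recall_f1(human_labels, machine_labels, positive="relevant"))
```

The same comparison can be run on a held-out sample that neither the model nor the coders saw during training, which is closer to the kind of check the quotation calls for.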
Computer Science & NSF Influence:
Measure Everything!
How fast?
How reliable?
How accurate?
Valid?
Inter-Rater Reliability is One Key Factor
Understanding the landscape of human interpretation better
prepares us to face the challenge of machine classification.
Fleiss’ Kappa: The Level of
Agreement Beyond Chance
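Fleiss' kappa is defined as (P_bar - P_e) / (1 - P_e), where P_bar is the mean observed agreement per item and P_e is the agreement expected by chance from the overall category proportions. The sketch below is a plain-Python version, assuming every item is coded by the same number of coders; the function name and input layout are illustrative choices, not taken from CAT or DiscoverText.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items each coded by the same number of coders.

    ratings: list of items; each item is a list of category labels, one per coder.
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("at least one item is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of coders (at least 2)")

    counts = [Counter(item) for item in ratings]
    categories = {label for item in ratings for label in item}

    # Observed agreement for each item, then averaged over items (P_bar).
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(per_item) / n_items

    # Chance agreement (P_e) from the overall share of each category.
    shares = [sum(cnt[cat] for cnt in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(share * share for share in shares)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1 - p_e)


# Illustrative data only: four items, three coders each.
example_ratings = [
    ["relevant", "relevant", "relevant"],
    ["relevant", "relevant", "irrelevant"],
    ["irrelevant", "irrelevant", "irrelevant"],
    ["relevant", "irrelevant", "irrelevant"],
]
print(round(fleiss_kappa(example_ratings), 3))
```

A value of 1 means complete agreement, 0 means agreement no better than chance, and negative values mean agreement worse than chance.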
Adjudicate Coder Disagreement
“CoderRank for Enhanced Machine Learning”
“CoderRank is to text analytics what
PageRank was to search. Just as Google
said not all web pages are created equal,
Texifter argues that not all humans are
created equal. When training machines, it
is best to rely most on the humans most
likely to create a valid observation. We
proposed a unique way to rank humans
on trust and knowledge vectors.”
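The slides do not describe how CoderRank is computed, so the following is only a hypothetical illustration of the general idea of weighting coders by how trustworthy their past coding proved to be. Here trust is assumed to be the share of a coder's labels upheld in adjudication, and labels are chosen by trust-weighted vote; every name and the scoring rule are assumptions, not Texifter's method.

```python
from collections import defaultdict


def coder_trust(adjudicated):
    """Share of each coder's labels that matched the adjudicated label.

    adjudicated: list of (coder, coder_label, adjudicated_label) tuples (hypothetical layout).
    """
    upheld = defaultdict(int)
    total = defaultdict(int)
    for coder, label, final_label in adjudicated:
        total[coder] += 1
        upheld[coder] += int(label == final_label)
    return {coder: upheld[coder] / total[coder] for coder in total}


def weighted_label(votes, trust, default_trust=0.5):
    """Pick the label with the highest summed trust.

    votes: dict mapping coder -> label for one item.
    Coders with no adjudication history get default_trust (an assumed value).
    Ties go to the alphabetically first label.
    """
    scores = defaultdict(float)
    for coder, label in votes.items():
        scores[label] += trust.get(coder, default_trust)
    return max(sorted(scores), key=scores.get)


# Illustrative history only: made-up coders and labels.
history = [
    ("ann", "pos", "pos"), ("ann", "neg", "neg"),
    ("bob", "pos", "neg"), ("bob", "neg", "neg"),
    ("cat", "pos", "neg"), ("cat", "pos", "neg"),
]
trust_scores = coder_trust(history)
print(trust_scores)
print(weighted_label({"ann": "pos", "bob": "neg", "cat": "neg"}, trust_scores))
```

In this toy scheme a single highly trusted coder can outvote two less reliable ones, which is the intuition behind relying most on the humans most likely to create a valid observation.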
HUMAN & MACHINE LEARNING
Part Five
Labeling, Tagging, or Annotation
Improves Machine Learning Over Time
Iterate Human Coding & Machine Learning
Word Sense Disambiguation (Relevance)
“Patriots” Football Versus Politics
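As a rough illustration of how human-coded examples can teach a machine to separate the two senses of "Patriots", here is a small naive Bayes classifier with add-one smoothing written in plain Python. The training sentences, label names, and class design are invented for this sketch; the slides do not say which model DiscoverText uses.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        # Tokens never seen in training carry no evidence, so they are skipped.
        tokens = [tok for tok in tokenize(text) if tok in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples standing in for human-coded items.
model = NaiveBayes()
model.train("Patriots win the game with a late touchdown", "football")
model.train("The Patriots quarterback threw two passes in the playoffs", "football")
model.train("Patriots rally at the capitol before the vote", "politics")
model.train("Local patriots group endorses a candidate for congress", "politics")

print(model.classify("Patriots score a touchdown in the game"))
print(model.classify("Patriots rally before the vote for congress"))
```

With only four training texts this is a toy; in practice each new round of human coding adds labeled examples, and the classifier is retrained and revalidated.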
Naturally Occurring Clusters of Free Text
Can Be Discovered Automatically
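One simple way clusters of near-identical free text can surface automatically is to compare overlapping word sequences (shingles) with Jaccard similarity and group texts that exceed a threshold. The sketch below uses a greedy single pass; the shingle size, the 0.5 threshold, and all function names are assumptions, and this is not a description of DiscoverText's own deduplication or clustering algorithms.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in the text (lowercased, punctuation dropped)."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_texts(texts, threshold=0.5):
    """Greedy clustering: each text joins the first cluster whose seed text is similar enough.

    Returns a list of clusters, each a list of indices into texts.
    """
    clusters = []  # list of (seed shingle set, member indices)
    for index, text in enumerate(texts):
        text_shingles = shingles(text)
        for seed_shingles, members in clusters:
            if jaccard(text_shingles, seed_shingles) >= threshold:
                members.append(index)
                break
        else:
            # No existing cluster was close enough, so this text seeds a new one.
            clusters.append((text_shingles, [index]))
    return [members for _, members in clusters]


# Invented comments standing in for a collection of free-text items.
comments = [
    "Please protect the wetlands rule now",
    "please protect the wetlands rule now!",
    "I oppose this regulation entirely",
]
print(cluster_texts(comments))
```

Grouping near-duplicates first means a human coder can label one representative item per cluster instead of reading the same text many times.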
• A free and open-source software option
• Web-based crowdsourced collaborative tools
• Measurement innovation
• Free real-time Twitter data collection
• Random sampling and keystroke coding
• Advanced search and filtering
• Deduplication and clustering algorithms
• Custom machine-learning classifiers
• Word sense disambiguation
• CoderRank for enhanced machine learning
What have CAT & DiscoverText contributed
to the field of qualitative methodology?
Dr. Stuart W. Shulman
Founder & CEO, Texifter, LLC
Editor Emeritus, Journal of Information Technology & Politics
Contact Information
Email: stu@texifter.com
Twitter: @stuartwshulman
Thanks for Listening!
