Automated Content Analysis of Discussion Transcripts
Vitomir Kovanović  Dragan Gašević
v.kovanovic@ed.ac.uk  dgasevic@acm.org
School of Informatics,
University of Edinburgh
Edinburgh, United Kingdom
31 Aug 2015,
University of Edinburgh,
United Kingdom
Asynchronous online discussions:
a “gold mine of information” (Henri, 1992)
• They are frequently used in all types of
education delivery,
• Their use produces large amounts of data
about learning processes,
• Their use is well supported by
social-constructivist pedagogies.
V. Kovanović et al. (EDI) Automated Analysis of Discussion Transcripts 31 Aug 2015 1 / 18
Asynchronous online discussions: issues and challenges
• The data they produce are used mainly for research after the courses are over,
• Content analysis techniques are complex and time-consuming,
• Content analysis has had almost no impact on educational practice (Donnelly and
Gardner, 2011),
• There is a need for more proactive use of the data through automation:
  • There have been few attempts at automated content analysis,
  • They focus mostly on surface-level characteristics, and
  • They are not based on well-established theories of education.
Overall idea
To examine how text mining can be used to automate the content
analysis of discussion transcripts.
More specifically,
we looked at automating the content analysis of cognitive
presence, one of the three main components of the Community of
Inquiry framework.
Community of Inquiry (CoI) model
Community of Inquiry model (Garrison, Anderson, and Archer, 1999)
Conceptual framework outlining the important constructs that define a worthwhile
educational experience in distance education settings.
Three presences:
• Social presence: relationships and social
climate in a community.
• Cognitive presence: phases of cognitive
engagement and knowledge construction.
• Teaching presence: instructional role
during social learning.
CoI model is:
• Extensively researched and validated.
• Uses content analysis to assess the
presences.
Cognitive presence
“the extent to which the participants in any particular configuration of a
community of inquiry are able to construct meaning through sustained
communication.” (Garrison, Anderson, and Archer, 1999, p. 89)
Four phases of cognitive presence:
1. Triggering event: an issue, dilemma, or problem is identified.
2. Exploration: students move between the private world of reflection and the shared
world of social knowledge construction.
3. Integration: students filter out irrelevant information and synthesize new
knowledge.
4. Resolution: students analyze practical applicability, test different
hypotheses, and start a new learning cycle.
Cognitive presence coding scheme
• The whole message is used as the unit of analysis,
• Coders look for specific indicators of the different socio-cognitive processes,
• Coding requires expertise with the coding instrument and domain knowledge.
Community of Inquiry (CoI) model
Issues and challenges:
• Very labor-intensive,
• Crude coding scheme,
• Requires experienced coders,
• Cannot be used for real-time monitoring,
• Does not explain the reasons behind the observed levels of presences, and
• Does not provide suggestions or guidelines to inform instructors'
pedagogical decisions.
Data set
• Six offerings of a graduate-level course in software engineering,
• A total of 1747 messages and 81 students,
• Manually coded by two coders (agreement = 98.1%, Cohen’s κ = 0.974).
ID  Phase             Messages      (%)
0   Other                  140    8.01%
1   Triggering Event       308   17.63%
2   Exploration            684   39.15%
3   Integration            508   29.08%
4   Resolution             107    6.12%
    All phases            1747     100%
Number of messages in different phases of cognitive presence
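As a quick arithmetic check, the percentages can be recomputed from the counts. The short Python sketch below is not part of the original study; the dictionary simply copies the table. Dividing each count by the total of 1747 gives 39.15% for Exploration. The last lines also illustrate why raw accuracy is a weak measure with this class balance.

```python
# Message counts per phase of cognitive presence, copied from the table.
PHASE_COUNTS = {
    "Other": 140,
    "Triggering Event": 308,
    "Exploration": 684,
    "Integration": 508,
    "Resolution": 107,
}


def phase_percentages(counts):
    """Share of all messages in each phase, in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    return {phase: round(100 * n / total, 2) for phase, n in counts.items()}


if __name__ == "__main__":
    print("Total messages:", sum(PHASE_COUNTS.values()))
    for phase, pct in phase_percentages(PHASE_COUNTS).items():
        print(f"{phase:<17} {PHASE_COUNTS[phase]:>5} {pct:6.2f}%")
    # A classifier that always predicts the majority class would be right
    # about 39% of the time, yet its Cohen's kappa would be exactly 0.
    print("Majority class:", max(PHASE_COUNTS, key=PHASE_COUNTS.get))
```

Kappa is 0 for such a constant predictor because its observed agreement equals the agreement expected by chance.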
Feature extraction
• Unigrams, bigrams, and trigrams,
• Part-of-speech bigrams and trigrams,
• Backoff bigrams and trigrams, in which words are replaced by their part-of-speech tags:
Example: “John is working.”
Bigrams:
• john is,
• is working.
Backoff bigrams:
• john verb,
• noun is,
• is verb,
• verb working.
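A minimal Python sketch of these n-gram feature types follows. It assumes the message is already tokenized, lowercased, and part-of-speech tagged (the original work used Stanford CoreNLP for this; here the tags are supplied by hand, using the coarse `noun`/`verb` labels of the example). Backing off exactly one word per n-gram reproduces the bigram example; whether the original trigram features also back off two words at once is not stated, so that detail is an assumption. The hypothetical `frequent_features` helper applies the "support of 10 or more" filter from the Classification slide, with support assumed to mean the number of messages containing a feature.

```python
from collections import Counter
from typing import Iterable, List, Set


def ngrams(items: List[str], n: int) -> List[str]:
    """All contiguous n-grams of a sequence, joined with spaces."""
    return [" ".join(items[i:i + n]) for i in range(len(items) - n + 1)]


def backoff_ngrams(tokens: List[str], tags: List[str], n: int) -> List[str]:
    """n-grams in which exactly one word is replaced by its POS tag.

    Backing off one position at a time mirrors the bigram example;
    treat it as an assumption for trigrams.
    """
    features = []
    for start in range(len(tokens) - n + 1):
        words, pos = tokens[start:start + n], tags[start:start + n]
        for i in reversed(range(n)):    # last position first, as in the example
            features.append(" ".join(words[:i] + [pos[i]] + words[i + 1:]))
    return features


def message_features(tokens: List[str], tags: List[str]) -> List[str]:
    """Word n-grams (1-3), POS n-grams (2-3) and backoff n-grams (2-3)."""
    feats = []
    for n in (1, 2, 3):
        feats += ngrams(tokens, n)
    for n in (2, 3):
        feats += ngrams(tags, n)
        feats += backoff_ngrams(tokens, tags, n)
    return feats


def frequent_features(messages: Iterable[List[str]], min_support: int = 10) -> Set[str]:
    """Features present in at least `min_support` messages."""
    support = Counter()
    for feats in messages:
        support.update(set(feats))      # count each feature once per message
    return {f for f, c in support.items() if c >= min_support}


if __name__ == "__main__":
    tokens = ["john", "is", "working"]
    tags = ["noun", "verb", "verb"]     # hand-tagged for this example
    print(ngrams(tokens, 2))            # ['john is', 'is working']
    print(backoff_ngrams(tokens, tags, 2))
    # ['john verb', 'noun is', 'is verb', 'verb working']
```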
Feature extraction
• Dependency triplets: (relation, head, modifier)
Example: “Bills on ports and immigration were submitted by Senator
Brownback, Republican of Kansas.”
nsubjpass, submitted, Bills
auxpass, submitted, were
agent, submitted, Brownback
nn, Brownback, Senator
appos, Brownback, Republican
prep_of, Republican, Kansas
prep_on, Bills, ports
conj_and, ports, immigration
prep_on, Bills, immigration
Feature extraction
• Backoff dependency triplets (head, modifier, or both replaced by their part-of-speech tags):
Example: “Bills on ports and immigration were submitted by Senator
Brownback, Republican of Kansas.”
Dependency triplet:
• conj_and, ports, immigration
Backoff dependency triplets:
• conj_and, noun, immigration
• conj_and, ports, noun
• conj_and, noun, noun
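The dependency-based features can be sketched the same way. The Python below is illustrative only: it takes the triplets as given (as a dependency parser such as Stanford CoreNLP would produce them, with relation names written in the collapsed style, e.g. `conj_and`) plus a hypothetical word-to-tag map, and emits each triplet together with its three backoff variants. Keying tags by word string is a simplification; a real pipeline would key them by token position.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (relation, head, modifier)


def backoff_triplets(triplet: Triplet, tags: Dict[str, str]) -> List[Triplet]:
    """Replace the head, the modifier, or both with their part-of-speech tags."""
    rel, head, mod = triplet
    head_tag = tags.get(head, head)  # fall back to the word if no tag is known
    mod_tag = tags.get(mod, mod)
    return [
        (rel, head_tag, mod),        # head backed off
        (rel, head, mod_tag),        # modifier backed off
        (rel, head_tag, mod_tag),    # both backed off
    ]


def triplet_features(triplets: List[Triplet], tags: Dict[str, str]) -> List[str]:
    """Each plain triplet followed by its backoff variants, as feature strings."""
    features = []
    for t in triplets:
        for variant in [t] + backoff_triplets(t, tags):
            features.append(", ".join(variant))
    return features


if __name__ == "__main__":
    # Hand-supplied parser output and tags for part of the example sentence.
    parsed = [("conj_and", "ports", "immigration"), ("prep_on", "Bills", "ports")]
    tags = {"ports": "noun", "immigration": "noun", "Bills": "noun"}
    for feature in triplet_features(parsed, tags):
        print(feature)
```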
Additional features
• Number of named entities in the message:
brainstorming should involve more concepts than posing a question,
• Is the message the first in the discussion?
Posing questions is more likely to initiate discussions,
• Is the message a reply to the first message in the discussion?
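The thread-structure features only need the reply structure of a discussion. The sketch below assumes a hypothetical `Message` record with a parent pointer and a list of named entities already found by some NER step (not shown); the field names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    """Hypothetical message record; field names are illustrative."""
    msg_id: int
    parent_id: Optional[int]  # None for the message that opens the discussion
    entities: List[str] = field(default_factory=list)  # output of an assumed NER step


def structural_features(thread: List[Message]) -> Dict[int, Dict[str, int]]:
    """Entity count, first-message flag and reply-to-first flag for each message."""
    first_ids = {m.msg_id for m in thread if m.parent_id is None}
    return {
        m.msg_id: {
            "entity_count": len(m.entities),
            "is_first": int(m.msg_id in first_ids),
            "replies_to_first": int(m.parent_id in first_ids),
        }
        for m in thread
    }


if __name__ == "__main__":
    thread = [
        Message(1, None, ["UML"]),
        Message(2, 1, ["UML", "SysML", "MDA"]),
        Message(3, 2),
    ]
    for msg_id, feats in structural_features(thread).items():
        print(msg_id, feats)
```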
Classification
Classifier:
• SVM classifier with an RBF kernel,
• Accuracy and kernel parameter tuning evaluated using nested 5-fold
cross-validation,
• Only features with a support of 10 or more were used,
• Accuracy evaluated using 10-fold cross-validation,
• Models compared using McNemar’s test.
Implementation:
• Implemented in Java,
• Feature extraction using the Stanford CoreNLP toolkit [1]:
  • Tokenization, part-of-speech tagging, and dependency parsing modules,
• Classification using Weka (Witten, Frank, and Hall, 2011) and
LibSVM (Chang and Lin, 2011), and
• Statistical comparison using Java Statistical Classes (JSC) [2].
[1] http://nlp.stanford.edu/software/corenlp.shtml
[2] http://www.jsc.nildram.co.uk/index.htm
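McNemar's test compares two classifiers on the same messages by looking only at the messages on which exactly one of them is correct. The original comparison used JSC in Java; the Python sketch below is an independent illustration using the common continuity-corrected statistic, not a reproduction of that library, and the predictions in the usage lines are invented. The p-value relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal, so its upper tail equals `erfc(sqrt(x / 2))`.

```python
import math
from typing import Sequence, Tuple


def mcnemar(y_true: Sequence, pred_a: Sequence,
            pred_b: Sequence) -> Tuple[int, int, float, float]:
    """McNemar's test with continuity correction for two classifiers.

    Returns (only_a, only_b, chi2, p): the number of items only classifier A
    got right, the number only B got right, the statistic and its p-value.
    """
    only_a = only_b = 0
    for truth, a, b in zip(y_true, pred_a, pred_b):
        if a == truth and b != truth:
            only_a += 1
        elif b == truth and a != truth:
            only_b += 1
    if only_a + only_b == 0:
        return only_a, only_b, 0.0, 1.0  # never disagree on correctness
    chi2 = max(abs(only_a - only_b) - 1, 0) ** 2 / (only_a + only_b)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return only_a, only_b, chi2, p


if __name__ == "__main__":
    gold = ["E"] * 30
    svm = ["E"] * 10 + ["I"] * 20        # hypothetical predictions
    baseline = ["I"] * 10 + ["E"] * 20
    print(mcnemar(gold, svm, baseline))
```

Messages that both models get right, or both get wrong, carry no information about which model is better, which is why only the two disagreement counts enter the statistic.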
Results
• We achieved a Cohen’s κ of 0.42 for our classification problem,
• Better than the existing neural network system (Cohen’s κ = 0.31),
• The unigram baseline model achieved a Cohen’s κ of 0.33.
Error analysis:
Actual \ Predicted     Other  Trigg.   Expl.  Integ.  Resol.
Other                     17       4       5       2       0
Triggering                 1      42      14       3       1
Exploration                2       9      98      24       4
Integration                1       3      38      56       4
Resolution                 0       0       3      15       3
Confusion matrix
Highlighted errors: Triggering predicted as Exploration (14; challenge 1), Integration predicted as
Exploration (38; challenges 1 and 2), and Resolution predicted as Integration (15; challenge 2).
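Cohen's κ corrects raw agreement for agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the share of messages on the diagonal and p_e sums the products of matching row and column proportions. The Python sketch below computes it from the confusion matrix above (values copied by hand). Applied to that matrix it gives κ ≈ 0.46 rather than 0.42; the matrix covers only 349 of the 1747 messages, so it likely summarizes a single evaluation split rather than the full cross-validation behind the reported figure.

```python
from typing import List

# Rows = actual class, columns = predicted class, in the order
# Other, Triggering, Exploration, Integration, Resolution.
CONFUSION = [
    [17, 4, 5, 2, 0],
    [1, 42, 14, 3, 1],
    [2, 9, 98, 24, 4],
    [1, 3, 38, 56, 4],
    [0, 0, 3, 15, 3],
]


def cohens_kappa(matrix: List[List[int]]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for a square confusion matrix."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(k)) for c in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(f"kappa = {cohens_kappa(CONFUSION):.3f}")
```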
Challenges
1. Effect of the large relative size of the
exploration class,
2. Effect of the code-up rule for coding,
3. No relative importance of features, and
4. Context is not taken into account.
[Figure: Code-up rule for coding]
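Under the code-up rule, a message that shows indicators of more than one phase is coded as the higher (later) phase. A minimal Python sketch, assuming a hypothetical upstream step has already mapped a message to the set of phases whose indicators it contains:

```python
from enum import IntEnum
from typing import Iterable


class Phase(IntEnum):
    """Phase IDs as numbered in the data set table (0 = Other ... 4 = Resolution)."""
    OTHER = 0
    TRIGGERING = 1
    EXPLORATION = 2
    INTEGRATION = 3
    RESOLUTION = 4


def code_up(indicated_phases: Iterable[Phase]) -> Phase:
    """Apply the code-up rule: pick the highest phase whose indicators appear.

    A message with no cognitive presence indicators falls back to OTHER
    (an assumption for this sketch).
    """
    return max(indicated_phases, default=Phase.OTHER)


if __name__ == "__main__":
    # Hypothetical message showing both exploration and integration indicators.
    print(code_up({Phase.EXPLORATION, Phase.INTEGRATION}).name)  # INTEGRATION
```

One possible consequence for classification: a message labeled Integration or Resolution may still contain mostly lower-phase wording, so a classifier relying on surface features may predict the lower phase.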
In progress: making use of thread context
• Discussions (and students’ learning) progress from triggering events to resolution.
• The content of a message depends on the content of the previous messages.
• The content of a message depends on the learning progress of the given student.
[Figure: Model for message classification]
Approach: Hidden Markov models (HMMs) & Conditional
random fields (CRFs)
• Hidden Markov models:
  • HMMs are used to model system states and their transitions in a variety of
  contexts.
  • Widely used; e.g., Bayesian Knowledge Tracing models are based on HMMs.
  • Challenges with HMMs:
    • Can this be modeled as an HMM (second-order HMMs?)
    • Dependency only on a single previous state,
    • One manifest (observed) variable for each state.
• Conditional random fields:
  • Used for structured prediction (e.g., speech recognition),
  • For speech recognition, they take into account the classes of all letters in a word,
  • Widely used in natural language processing,
  • More flexible than HMMs,
  • Challenges with CRFs:
    • Too many parameters to estimate with little data.
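To make the HMM option concrete, the Python sketch below runs Viterbi decoding over one thread, flattened into posting order (an assumption, since real threads are trees). Hidden states are the phases and each message contributes a single observed symbol, here the label predicted by a message-level classifier. All probabilities in the hypothetical `toy_model` are invented values that encode "stay in a phase or move one phase forward"; they are not estimates from the data set. The first-order limitation listed above is visible directly: each step looks only at the previous state.

```python
import math
from typing import Dict, List, Sequence

PHASES = ["Triggering", "Exploration", "Integration", "Resolution"]


def toy_model():
    """Invented start, transition and emission probabilities (illustration only)."""
    start = {"Triggering": 0.7, "Exploration": 0.1, "Integration": 0.1, "Resolution": 0.1}
    trans = {}
    for i, prev in enumerate(PHASES):
        row = {s: 0.1 for s in PHASES}      # small chance of any jump
        row[prev] = 0.5                     # stay in the same phase
        if i + 1 < len(PHASES):
            row[PHASES[i + 1]] = 0.3        # move one phase forward
        total = sum(row.values())
        trans[prev] = {s: p / total for s, p in row.items()}
    # The message-level classifier is right 70% of the time, wrong uniformly otherwise.
    emit = {s: {o: (0.7 if o == s else 0.1) for o in PHASES} for s in PHASES}
    return start, trans, emit


def viterbi(obs: Sequence[str], states: Sequence[str], start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> List[str]:
    """Most likely hidden phase sequence for a first-order HMM (log space)."""
    if not obs:
        return []

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log(start[s]) + log(emit[s][obs[0]]) for s in states}]
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        prev_scores = best[-1]
        scores, pointers = {}, {}
        for s in states:
            # First-order assumption: only the previous state matters.
            arg = max(states, key=lambda r: prev_scores[r] + log(trans[r][s]))
            scores[s] = prev_scores[arg] + log(trans[arg][s]) + log(emit[s][o])
            pointers[s] = arg
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    start, trans, emit = toy_model()
    predicted = ["Triggering", "Exploration", "Exploration", "Integration"]
    print(viterbi(predicted, PHASES, start, trans, emit))
```

With these toy numbers, a sequence that already follows the expected progression is returned unchanged; the transition matrix only matters when the per-message labels are ambiguous or out of order.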
Conclusions and future work
Summary:
• A promising path to explore,
• Backoff trigrams, plain and backoff dependency triplets, the entity count,
and the first-message indicator seem useful.
Future work:
• Additional types of features that look at the context of previous messages
(e.g., convergence vs. divergence),
• Moving away from SVMs towards classification methods that are
better at explanation, which would:
  • Give associated probabilities for each classification,
  • Give the relative importance of different features.
Challenges:
• Challenges with the message as the unit of analysis and with surface-level features,
• Low frequency of resolution messages.
Thank you
Vitomir Kovanovic
v.kovanovic@ed.ac.uk
References
Chang, Chih-Chung and Chih-Jen Lin (2011). “LIBSVM: A library for support vector machines”. In:
ACM Transactions on Intelligent Systems and Technology 2.3, 27:1–27:27.
Donnelly, Roisin and John Gardner (2011). “Content analysis of computer conferencing transcripts”.
In: Interactive Learning Environments 19.4, pp. 303–315.
Garrison, D. Randy, Terry Anderson, and Walter Archer (1999). “Critical Inquiry in a Text-Based
Environment: Computer Conferencing in Higher Education”. In: The Internet and Higher Education
2.2–3, pp. 87–105.
Henri, France (1992). “Computer Conferencing and Content Analysis”. In: Collaborative Learning
Through Computer Conferencing, pp. 117–136.
Witten, Ian H., Eibe Frank, and Mark A. Hall (2011). Data Mining: Practical Machine Learning Tools
and Techniques. 3rd ed. Morgan Kaufmann.
