21st International Conference on Mining Software Repositories
Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads
Ramtin Ehsani, Mia Mohammad Imran, Robert Zita, Kostadin Damevski, Preetha Chatterjee
Drexel University, Virginia Commonwealth University, Elmhurst University
Preprint: https://arxiv.org/abs/2402.04183
imranm3@vcu.edu
Motivation and Research Objective
• Fostering healthy collaboration is one of the main challenges in OSS
• Understanding and addressing incivility in OSS discussions is crucial
• Comprehensive approaches to addressing uncivil interactions are lacking
• Large annotated software engineering (SE) datasets are scarce
Research Objective: Curating an annotated dataset of locked GitHub issue threads containing heated discussions
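GitHub records why an issue was locked, so candidate "too heated" threads can be pre-filtered mechanically. The sketch below is a minimal illustration, not the authors' collection pipeline: it assumes issue objects already fetched from the GitHub REST API, relying on that API's `locked` and `active_lock_reason` fields. The function name and the sample records are hypothetical. Threads that showed heated characteristics without this lock reason would still need human judgment.

```python
from typing import Any


def select_too_heated(issues: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep issues that are locked with the lock reason "too heated".

    Each dict is assumed to mirror a GitHub REST API issue object, which
    carries a boolean `locked` field and an `active_lock_reason` field
    (e.g. "off-topic", "too heated", "resolved", "spam", or null).
    """
    return [
        issue
        for issue in issues
        if issue.get("locked") and issue.get("active_lock_reason") == "too heated"
    ]


# Three made-up issue records, for illustration only.
issues = [
    {"number": 101, "locked": True, "active_lock_reason": "too heated"},
    {"number": 102, "locked": True, "active_lock_reason": "resolved"},
    {"number": 103, "locked": False, "active_lock_reason": None},
]
print([issue["number"] for issue in select_too_heated(issues)])  # [101]
```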
Dataset Annotation
• 404 locked issue threads and 5,961 individual comments
• Threads were either locked as "too heated" or showed clear characteristics of heated discussions
• 19 annotators in total
• GPT-4 was used to further improve annotation quality
• Instances of disagreement between GPT-4 and the annotators were checked manually
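The GPT-4 cross-check amounts to comparing two label sources and routing mismatches to a human. The slide does not say how disagreements were detected, so this is a hedged sketch under simplifying assumptions: each comment carries a single label, only comments labeled by both sources are compared, and the function name and data are hypothetical.

```python
def flag_disagreements(human: dict[str, str], gpt4: dict[str, str]) -> list[str]:
    """Return IDs of comments whose human label differs from the GPT-4 label.

    Only comments labeled by both sources are compared; each comment is
    assumed to carry a single label (a simplification).
    """
    shared = human.keys() & gpt4.keys()  # dict key views support set operations
    return sorted(cid for cid in shared if human[cid] != gpt4[cid])


# Made-up labels for three comments.
human_labels = {"c1": "Mocking", "c2": "Impatience", "c3": "civil"}
gpt4_labels = {"c1": "Mocking", "c2": "Bitter frustration", "c3": "civil"}
print(flag_disagreements(human_labels, gpt4_labels))  # ['c2']
```

The returned IDs form the queue for manual review; comments where both sources agree are accepted as-is.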
Annotated Features
• Tone-Bearing Discussion Features (TBDFs), i.e., uncivil features*
⚬ Bitter frustration, Impatience, Mocking, Irony, Vulgarity, etc.
• Triggers*
⚬ Failed use of code, Technical disagreements, Communication breakdown, etc.
• Targets*
⚬ People, Code/Tool, Company/organization, Undirected
• Consequences*
⚬ Discontinued further discussion, Escalating further, etc.
* C. Miller, S. Cohen, D. Klug, B. Vasilescu, and C. Kästner, "'Did You Miss My Comment or What?' Understanding Toxicity in Open Source Discussions," 2022
* I. Ferreira, J. Cheng, and B. Adams, "The 'Shut the f**k up' Phenomenon: Characterizing Incivility in Open Source Code Review Discussions," 2021
* J. Sarker, A. K. Turzo, M. Dong, and A. Bosu, "Automated Identification of Toxic Code Reviews Using ToxiCR," 2023
* Our open coding process
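The four annotated dimensions can be pictured as one record per annotated unit. This is a hypothetical schema, not the released dataset's format: field names are illustrative, and the slide does not say whether each dimension is annotated per comment or per thread, so the record uses a neutral `unit_id`. Only Targets are listed exhaustively on the slide, so only they become an enum; TBDFs, triggers, and consequences end in "etc." and stay as plain strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Target(Enum):
    """The four target categories listed on the slide."""
    PEOPLE = "People"
    CODE_TOOL = "Code/Tool"
    COMPANY_ORGANIZATION = "Company/organization"
    UNDIRECTED = "Undirected"


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one annotated comment or thread."""
    unit_id: str
    tbdf: Optional[str]         # e.g. "Bitter frustration"; None if no uncivil feature
    trigger: Optional[str]      # e.g. "Technical disagreements"
    target: Optional[Target]
    consequence: Optional[str]  # e.g. "Escalating further"

    @property
    def is_uncivil(self) -> bool:
        # A unit counts as uncivil here if it carries any TBDF.
        return self.tbdf is not None


a = Annotation("c1", "Mocking", "Technical disagreements", Target.PEOPLE, "Escalating further")
print(a.is_uncivil)                          # True
print(Target("Code/Tool") is Target.CODE_TOOL)  # True
```

Keeping the open-ended dimensions as strings avoids pretending the category lists are closed when the slide indicates they are not.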
Dataset Description
• 1,365 comments annotated with an uncivil feature
⚬ Bitter frustration, Impatience, and Mocking are the most frequent (~68%)
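Both figures on this slide are proportions. Assuming the 1,365 uncivil comments are a subset of the 5,961 annotated comments, about 22.9% of comments carry an uncivil feature. The ~68% figure reads as the combined share of the three most frequent TBDFs; the sketch below shows how such a share could be computed from per-comment labels. The per-feature counts are not on the slide, so the example uses made-up labels and assumes one TBDF per uncivil comment, which the slide does not state. Function names are hypothetical.

```python
from collections import Counter


def fraction(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


def top_k_share(labels: list[str], k: int) -> tuple[list[str], float]:
    """Return the k most frequent labels and their combined share of all labels."""
    if not labels:
        return [], 0.0
    top = Counter(labels).most_common(k)
    return [label for label, _ in top], sum(count for _, count in top) / len(labels)


# Figures from the slides: 1,365 uncivil comments out of 5,961 comments.
print(round(fraction(1365, 5961), 3))  # 0.229

# Made-up TBDF labels, one per uncivil comment.
toy = ["Mocking", "Impatience", "Mocking", "Irony",
       "Bitter frustration", "Bitter frustration", "Mocking", "Vulgarity"]
print(top_k_share(toy, 2))  # (['Mocking', 'Bitter frustration'], 0.625)
```

With the real per-comment TBDF labels, `top_k_share(labels, 3)` would yield the combined share of the three most frequent features under the one-label assumption.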
Summary
● A curated dataset of 404 locked GitHub issue threads from 213 OSS projects
● Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs
● Failed use of tool/code or error messages is the most common trigger
● People are the most common target of incivility
● Discontinued further discussion is the most common consequence
Research Opportunities
● A resource for conducting comprehensive analyses of incivility
● Automatic incivility detection tools
○ In particular, an SE-specific incivility detection tool
● Tools that go beyond flagging incivility
○ Offer insights into the specific types of incivility
● Exploring how incivility might affect project health
● Investigating how incivility affects targets from underrepresented communities
Preprint: https://arxiv.org/abs/2307.15631
ramtin.ehsani@drexel.edu
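Findings such as "the most common trigger" or "the most common target" are modal values over the annotated labels. A minimal sketch, assuming hypothetical records stored as dicts with `trigger`, `target`, and `consequence` keys (missing values as `None`); the three records are made up and do not come from the dataset.

```python
from collections import Counter
from typing import Optional


def most_common_by_dimension(
    records: list[dict[str, Optional[str]]],
    dimensions: tuple[str, ...] = ("trigger", "target", "consequence"),
) -> dict[str, Optional[str]]:
    """Return the most frequent non-missing value for each dimension."""
    result: dict[str, Optional[str]] = {}
    for dim in dimensions:
        counts = Counter(r[dim] for r in records if r.get(dim) is not None)
        # most_common(1) yields [(value, count)]; None if the dimension is empty.
        result[dim] = counts.most_common(1)[0][0] if counts else None
    return result


# Three made-up records, for illustration only.
records = [
    {"trigger": "Failed use of tool/code", "target": "People",
     "consequence": "Discontinued further discussion"},
    {"trigger": "Technical disagreements", "target": "People",
     "consequence": "Escalating further"},
    {"trigger": "Failed use of tool/code", "target": "Code/Tool",
     "consequence": "Discontinued further discussion"},
]
print(most_common_by_dimension(records))
```

When two values tie, `Counter.most_common` keeps the one encountered first, so a real analysis would want to report counts alongside the mode.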
