How Do I Refactor This?
An Empirical Study on
Refactoring Trends and
Topics in Stack Overflow
Anthony Peruma · Steven Simmons ·
Eman AlOmar · Christian Newman ·
Mohamed Mkaouer · Ali Ouni
ICSE Journal-First Paper
Software Refactoring
An essential part of software maintenance and evolution
Improves the internal quality of the system and reduces its
technical debt
Research in refactoring is well-established
➢ Detection of refactoring opportunities & code recommendations
Refactoring research is
continually evolving
Are developers applying refactorings in the
same environments, on problems with the
same characteristics and context, as
researchers assume?
• Refactoring is no longer about correcting
code smells
• Industry projects are complex and require
more complicated solutions
• Prior studies interviewed developers
GOAL
Understand the trends and challenges
around developer discussions on software
refactoring concepts and activities
Stack Overflow
The most popular programming-specific question and answer forum
Over 19 million questions and one million users
Research Questions
RQ1: How have refactoring discussions on Stack Overflow
grown over the years?
RQ2: What do developers discuss in refactoring-based
Stack Overflow posts?
RQ3: Which topics are the most popular and difficult
among refactoring-related questions?
Study Methodology
Experiment design
Posts – Questions, Answers &
Accepted Answers
Tags – Associated with a question
Score – The higher the score, the better
View Count – Number of times the post was viewed
Posts with the refactor tag
Posts having ‘refactor’ in the
title
Quantitative – database
queries and custom code
Qualitative – manually
analyzing a statistically
significant sample
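The size of a "statistically significant sample" can be estimated with Cochran's formula plus a finite population correction. The sketch below is only an illustration: the slides do not state the confidence level or margin of error used, so the 95% level (z = 1.96), the 5% margin, and the function name `sample_size` are assumptions. The population of 9,489 is the number of refactoring questions reported in the RQ 1.1 findings.

```python
import math


def sample_size(population: int, z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's sample size with finite population correction.

    z, margin, and p are assumed defaults (95% confidence, 5% margin,
    maximum variability); the slides do not report the values used.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)        # correct for the finite population
    return math.ceil(n)


print(sample_size(9489))  # questions to review manually under these assumptions
```

A tighter margin or a higher confidence level increases the required sample, so the result should be read as an order-of-magnitude guide rather than the study's actual sample size.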
Anatomy of a post
tags
score
title
body
views
QUESTION
ANSWER
score
accepted answer
A mixed-methods approach
Summary of collected data
Empirical Results
How have refactoring discussions on
Stack Overflow grown over the years?
1. How have refactoring posts grown throughout
the years?
2. What is the distribution of questions and answers
among developers?
3. What are the tags that are associated with
refactoring questions?
RQ 1
RQ 1.1: How have refactoring posts grown throughout the
years?
Approach:
• Extract all questions that had the term ‘refactor’ in either the title or tag
• Extract all answers (i.e., accepted and non-accepted) associated with the
questions
Findings:
• 9,489 questions, of which 828 did not have an associated answer
• Median time between a question and its first answer is 0.27 hours
• While the number of questions and accepted answers has increased yearly,
the rate of increase has been falling
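The extraction and timing steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Post` record and its field names are hypothetical stand-ins loosely modeled on the Stack Overflow data dump, and the sample posts are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Post:
    # Hypothetical record; field names are assumptions, not the study's schema.
    post_id: int
    created: datetime
    title: str = ""
    tags: tuple = ()
    parent_id: Optional[int] = None  # set for answers, None for questions


def is_refactoring_question(post: Post) -> bool:
    """True when the post is a question with 'refactor' in its title or a tag."""
    if post.parent_id is not None:
        return False
    in_title = "refactor" in post.title.lower()
    in_tags = any("refactor" in tag.lower() for tag in post.tags)
    return in_title or in_tags


def hours_to_first_answer(questions, answers):
    """Hours from each question to its earliest answer (answered questions only)."""
    first = {}
    for a in answers:
        if a.parent_id not in first or a.created < first[a.parent_id]:
            first[a.parent_id] = a.created
    return [
        (first[q.post_id] - q.created).total_seconds() / 3600
        for q in questions
        if q.post_id in first
    ]


# Invented sample data for illustration only.
posts = [
    Post(1, datetime(2020, 1, 1, 9, 0), "How do I refactor this loop?", ("java",)),
    Post(2, datetime(2020, 1, 2, 9, 0), "Rename across files", ("refactoring", "c#")),
    Post(3, datetime(2020, 1, 3, 9, 0), "Null pointer in parser", ("java",)),
    Post(4, datetime(2020, 1, 1, 9, 15), parent_id=1),
    Post(5, datetime(2020, 1, 1, 10, 0), parent_id=1),
    Post(6, datetime(2020, 1, 2, 9, 45), parent_id=2),
]
questions = [p for p in posts if is_refactoring_question(p)]
answers = [p for p in posts if p.parent_id is not None]
delays = hours_to_first_answer(questions, answers)
print(len(questions), median(delays))
```

Matching on the substring 'refactor' also catches 'refactoring' and 'refactored', which fits the approach described above.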
RQ 1.2: What is the distribution of questions and answers
among developers?
Approach:
• Utilize the OwnerUserId field to identify the creator of a post
Findings:
• 7,795 distinct users are responsible for creating all refactoring questions
• Most developers who ask refactoring questions only ask questions and do not
answer them
• Most developers would ask only one refactoring question
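A rough sketch of how the per-user distribution could be derived from the OwnerUserId field is shown below. The `(post_type, owner_user_id)` pairs are invented, and the variable names are illustrative assumptions rather than the study's code.

```python
from collections import Counter

# Hypothetical (post_type, owner_user_id) pairs; invented data.
posts = [
    ("question", 10),
    ("question", 11),
    ("question", 10),
    ("answer", 12),
    ("answer", 11),
    ("question", 13),
]

# Number of refactoring questions created by each user.
questions_per_user = Counter(uid for kind, uid in posts if kind == "question")
# Users who wrote at least one answer.
answerers = {uid for kind, uid in posts if kind == "answer"}

distinct_askers = len(questions_per_user)
ask_only = [uid for uid in questions_per_user if uid not in answerers]
single_question_askers = [uid for uid, n in questions_per_user.items() if n == 1]

print(distinct_askers, ask_only, single_question_askers)
```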
RQ 1.3: What are the tags that are associated with
refactoring questions?
Approach:
• Extract all distinct tags from all refactoring posts
• Manual review of the tags
Findings:
• 3,053 distinct tags
• Top five tags are related to programming
languages (or web frameworks) – Java, C#,
JavaScript, Ruby on Rails, and Ruby
• Constant rise in JavaScript questions
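The tag extraction step can be illustrated with a short sketch. It assumes tags are stored as a single string of the form `<java><refactoring>`, as in some versions of the Stack Overflow data dump; that storage format, the helper name `parse_tags`, and the sample strings are assumptions.

```python
import re
from collections import Counter


def parse_tags(raw: str) -> list:
    """Split an assumed '<tag1><tag2>' string into individual tag names."""
    return re.findall(r"<([^<>]+)>", raw)


# Invented tag strings for illustration only.
raw_tags = [
    "<java><refactoring>",
    "<c#><refactoring><visual-studio>",
    "<javascript><refactoring>",
    "<java><design-patterns>",
]

tag_counts = Counter(tag for raw in raw_tags for tag in parse_tags(raw))
print(len(tag_counts))            # number of distinct tags
print(tag_counts.most_common(2))  # most frequent tags
```

The frequency table only narrows the list; grouping tags into categories such as languages or frameworks still requires the manual review mentioned above.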
RQ 1 Summary
How have refactoring discussions on Stack Overflow
grown over the years?
• Stack Overflow is a popular venue for refactoring discussions between developers
• Refactoring questions usually receive a response in a short period of time
• There is a rise in questions around dynamically typed languages such as JavaScript
• Most tags concern algorithms and programming concepts, followed by frameworks
What do developers discuss in
refactoring-based Stack Overflow posts?
1. What are the frequent terms utilized by developers
in refactoring discussions?
2. To what extent do traditional refactoring
opportunities, known in existing literature, match
with the challenges faced by developers in Stack
Overflow posts?
3. What are the topics around software refactoring
that are being asked by developers?
RQ 2
RQ 2.1: What are the frequent terms utilized by developers
in refactoring discussions?
Approach:
• Extract the top keywords as bigrams from question posts
• Check for terms that correspond to refactoring operations
Findings:
• IDE ‘visual studio’ plays an important part in refactoring
discussions – the IDE supports multiple languages
• ‘refactoring tool’ shows the importance of, and reliance on, tools
and IDEs in refactoring activities
• ‘legacy code’ highlights a common reason why developers
request support with refactoring
• Code extraction and moving are frequently discussed
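Bigram extraction of the kind described above can be sketched in a few lines. The stop-word list, the tokenizer, and the sample titles are illustrative assumptions; the slides do not describe the preprocessing actually used.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the study's list).
STOPWORDS = {"a", "the", "to", "in", "my", "is", "how", "do", "i", "this", "of", "for", "with", "can"}


def bigrams(text: str) -> list:
    """Lowercase, keep alphabetic tokens, drop stop words, pair adjacent tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return list(zip(tokens, tokens[1:]))


# Invented question titles for illustration only.
titles = [
    "How do I refactor legacy code in Visual Studio?",
    "Refactoring tool for legacy code",
    "Extract method in Visual Studio",
]

counts = Counter(bg for title in titles for bg in bigrams(title))
print(counts.most_common(3))
```

Because stop words are removed before pairing, words that were separated by a stop word become adjacent; pairing before filtering is an equally reasonable design choice.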
RQ 2.2: To what extent do traditional refactoring opportunities, known in
existing literature, match with the challenges faced by developers in Stack
Overflow posts?
Approach:
• Occurrence of Self-Affirmed Refactoring terms in questions
Findings:
• Frequent mention of key internal quality attributes -- dependency, inheritance
• Use of terms such as ‘clean up’ or ‘redesign’ to discuss refactorings
• Non-functional attribute discussion around ‘readability’, ‘efficiency’, and
‘performance’
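Counting how many questions mention each term could look like the sketch below. The `TERMS` list contains only the examples named in the findings above, not the full Self-Affirmed Refactoring vocabulary, and the sample questions are invented.

```python
import re
from collections import Counter

# Illustrative subset of terms taken from the findings above (not the full list).
TERMS = ["clean up", "redesign", "readability", "performance", "dependency", "inheritance"]


def term_occurrences(questions, terms):
    """Count the questions (not total mentions) that contain each term."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    counts = Counter()
    for text in questions:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term] += 1
    return counts


# Invented question texts for illustration only.
questions = [
    "How can I clean up this class? Readability is poor.",
    "Redesign to reduce dependency between modules",
    "Improve performance and readability of nested loops",
]

occurrences = term_occurrences(questions, TERMS)
print(occurrences.most_common())
```

Whole-word matching means 'dependency' does not match 'dependencies'; stemming the text first would be one way to cover such variants.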
RQ 2.3: What are the topics around software refactoring that
are being asked by developers?
Approach:
• Topic modeling analysis using
latent Dirichlet allocation
• Includes text-preprocessing
• Use of topic coherence, perplexity
and visualization to determine the
optimum number of topics
• Manual analysis of a statistically
significant sample of questions
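To make the topic modeling step more concrete, here is a small, self-contained sketch of latent Dirichlet allocation fitted with collapsed Gibbs sampling. It is a teaching example only: the slides do not say which LDA implementation the study used, and the function names, hyperparameters (`alpha`, `beta`, `iterations`), and toy documents are all assumptions. Topic coherence and perplexity, which the approach uses to choose the number of topics, are not computed here.

```python
import random


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over already-tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Most frequent words assigned to each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: counts[i], reverse=True)
        result.append([vocab[i] for i in ranked[:n]])
    return result


# Invented, already-preprocessed toy documents.
docs = [
    ["rename", "variable", "ide", "tool"],
    ["ide", "tool", "rename", "class"],
    ["sql", "query", "table", "index"],
    ["table", "sql", "query", "join"],
]
vocab, doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(k, words)
```

In practice the model would be refit for several candidate topic counts and the candidates compared, as the approach above describes; an established library is a better choice than this sketch for a corpus of thousands of questions.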
Findings:
Code Optimization
• Simplifying code structures
• Improving readability and reusability
• Reducing lengthy switch-case statements, loops, and duplicate code
Tools and IDEs
• Performing complex refactorings
• Renaming software artifacts
Architecture and Design Patterns
• Accumulation of code updates violates design principles
• Applying SOLID, DRY, SRP, and KISS principles
Unit Testing
• Challenges with evolving the test suite alongside the source code
Database
• Business logic within SQL scripts grows in length and complexity
• Challenges with readability, design principles, and system performance
RQ 2 Summary
What do developers discuss in refactoring-based
Stack Overflow posts?
• Refactoring discussions revolve around five topics – Code Optimization, Tools and
IDEs, Architecture and Design Patterns, Unit Testing, and Database
• Maintainability is a key concern
• Improving readability and reusability is of utmost concern
• Challenges in synchronizing refactoring changes across software engineering artifacts
Which topics are the most popular
and difficult among refactoring-
related questions?
RQ 3
Which topics are the most popular and difficult among
refactoring-related questions?
Approach:
• Measure popularity using a question's view count, favorite count, and score
• Measure difficulty: questions without answers, without accepted answers and
median time for an accepted answer
Findings:
• Questions on Tools/IDEs are the most popular; Database questions are the least popular
• Tools/IDE questions get more views than Code Optimization questions
• Tools/IDE questions go unanswered more often than questions on other topics
• Code Optimization questions are less challenging to answer
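The popularity and difficulty measures listed in the approach can be computed per topic roughly as follows. The `Question` record, its field names, the choice of the mean for aggregating popularity, and the sample values are all assumptions made for illustration; the slides do not specify how the metrics were aggregated.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional


@dataclass
class Question:
    # Hypothetical record; field names are assumptions.
    topic: str
    views: int
    favorites: int
    score: int
    answer_count: int
    hours_to_accepted: Optional[float]  # None when there is no accepted answer


def summarize(questions):
    """Per-topic popularity (views, favorites, score) and difficulty measures."""
    by_topic = {}
    for q in questions:
        by_topic.setdefault(q.topic, []).append(q)

    summary = {}
    for topic, qs in by_topic.items():
        accepted = [q.hours_to_accepted for q in qs if q.hours_to_accepted is not None]
        summary[topic] = {
            "mean_views": mean(q.views for q in qs),
            "mean_favorites": mean(q.favorites for q in qs),
            "mean_score": mean(q.score for q in qs),
            "pct_unanswered": 100 * sum(q.answer_count == 0 for q in qs) / len(qs),
            "pct_no_accepted": 100 * (len(qs) - len(accepted)) / len(qs),
            "median_hours_to_accepted": median(accepted) if accepted else None,
        }
    return summary


# Invented sample data for illustration only.
sample = [
    Question("Tools and IDEs", 1200, 4, 6, 0, None),
    Question("Tools and IDEs", 800, 2, 3, 2, 5.0),
    Question("Code Optimization", 300, 1, 2, 3, 0.5),
    Question("Code Optimization", 100, 0, 1, 1, 1.5),
]
summary = summarize(sample)
for topic, metrics in summary.items():
    print(topic, metrics)
```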
Discussion & Takeaways
Supporting the community
Research/Academic
Community
Developer
Community
Tool/IDE Vendor
Community
Research/Academic community
• Course curriculum to reflect real-world settings
• Adaptation of refactoring operations for multiple
programming languages and artifact types
• Improve and extend the applicability of
readability quality metrics
• Expand the study and applicability of reusability
beyond source code
Tool/IDE vendor community
• Automatic synchronization between project
artifacts
• Enhanced rename refactoring functionality
• Enhance the user experience
Developer community
• Extend coding standards utilized in projects to
support naming standards for all project artifacts
• Integrate code quality tools into the build
process for the early detection of poor coding
practices
• Perform frequent and early peer-reviews on all
project artifacts
Conclusion
A quantitative and qualitative analysis of refactoring questions asked by
developers on Stack Overflow
Findings:
• Stack Overflow is a popular venue for developers to seek assistance with refactoring
• Growth in refactoring code written in dynamically typed languages such as Python and JavaScript
• Most questions are around optimizing source code to improve readability and reusability
• Refactoring is not limited to source code – database and unit testing artifact refactoring is common
• Tools are also a popular discussion topic among developers
Preprint: https://arxiv.org/abs/2110.12229
Thank You!
Anthony Peruma
http://peruma.me
http://scanl.org
