Then, Now, and Next
Constants in Changing MSR Research Landscape
Ayushi Rastogi
Assistant Professor, University of Groningen
Photo by Chris Lawton on Unsplash
MSR’24 Vision and Reflection
Create data for problems of today and future.
Same purpose through different lens.
Road ahead is about balance.
What shaped my experiences?
Asst. Prof., U Groningen, Netherlands
Postdoc, UC Irvine, USA
Postdoc, TU Delft, Netherlands
MSR Redmond intern, USA
Ph.D., IIIT-Delhi, India
Timeline: 2011, 2014, 2017, 2018, 2021, 2024
Empirical studies
Improving developer productivity
Software industry
Human and social factors
Societal relevance
Fairness problems at work
Software-defined
SIG, ING
Meta, Google, Snap, Amazon
SMEs
Belsimpel, AFAS
Community + Collaborators + Mentors
Mining + Software Repository
Data, Purpose, Approach
Data / Software Repositories
Data as signals and proxies for solving industry and societal problems.
measure developer productivity, code review speed, personality, release cycle, explainability, disparity in code evaluation, competing projects
Data continues to grow in size and forms
pre-GitHub era
GitHub/GHTorrent era
Geographical location and code review decision
An unprecedented opportunity
Rastogi et al. Relationship between geographical location and evaluation of developer contribution. ESEM’18
Looked at existing data
Looked for data in uncharted spaces
Looked into problems for which data exists
MSR Most Influential Paper awards: 2021, 2022
MSR’22 brainstorming
Why can't we study some problems?
Interesting data closed in industry vaults
MSR Redmond
Software-defined
SIG, ING
Meta, Google, Snap, Amazon
SMEs
Belsimpel, AFAS
Access to data determines what opportunity is available and to whom
No data for some problems
17 April, SEIP
18 April, SEIP
Niche problems get limited traction
What does the data enable?
Data: see the problem, propose and test solutions
code review speed
Inclusion on gender, geography
Where do we apply the solution?
Kudrjavets et al. Mining code review data to understand waiting times between acceptance and merging: An empirical analysis. MSR’22.
Prana et al. Including everyone, everywhere: Understanding opportunities and challenges of geographic gender-inclusion in OSS. TSE’21.
Create data for problems of today and future.
What is our next GHTorrent? Know its limits.
How to assess solutions that are difficult to apply in practice?
Mining / Purpose and Approach
Technology/Context
Changing software development
Agile, Continuous Integration, Co-Pilot
New perspectives for known purposes
Changing analysis
Logistic Regression, Machine Learning, Deep Learning, Large Language Models
Repeating ourselves in different/better ways
Same purpose through different lens.
Change is an opportunity to gauge its impact.
for software, industry, society, environment
Quality, Productivity, Fairness, Energy
Move to intersectionality and balance
Road ahead is about balance.
Chen et al. Leveraging test plan quality to improve code review efficacy. FSE’22
19 April, JF
Create data for problems of today and future.
Same purpose through different lens.
Road ahead is about balance.
No dull moment for SE research amid technology shift in development and analysis.
E: a.rastogi@rug.nl X: @Ayushi_Rastogi

