Community-Assisted Software Engineering Decision Making
Gregory Gay and Mats Heimdahl
University of Minnesota
AI in SE: A Success Story
Large, active field, with:
● Growing research community
● Numerous conferences and workshops,
such as MSR, PROMISE, RAISE
● Large data repositories
● History of collaboration between industry
and academia
We're already good at drawing useful
conclusions. We expect further algorithmic
improvements.
But...
We need to improve our data!
Problem 1:
We don't know what data we need.
We are trying to solve complex problems, so we
guess at what matters, then collect data.
This results in missing attributes and added noise.
Problem 2:
The data we have is often weak.
Solution quality depends on data quality.
Some commonly used data sets are infamous for
missing values, unhelpful attributes, and poor
recording standards.
We should improve data standards, but...
We need to use the data we have.
Synergy of human feedback and AI to turn
static data models into dynamic models.
Bring a Wikipedia model to data sets.
Inspiration: Recommender Systems
Enhanced Feedback Loop
Recommendation: MC/DC
Helpful? Yes
New Values for Existing Attributes:
● Num. Boolean Expressions: 219
● Num. Numeric Calculations: 73
New Attributes to Collect (and Values):
● Ratio of Boolean to Numeric Calculations: 3:1
Data to Delete:
● Projects 1, 3, 7
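One way to picture the feedback form above is as a small record that gets folded back into the data set. The sketch below is only an illustration under stated assumptions: the `Feedback` class, its field names, the `apply_feedback` function, the attribute keys, and project ID 12 are all hypothetical and do not come from the slides. Only the values (MC/DC, 219, 73, 3:1, projects 1, 3, 7) are taken from the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Dataset = Dict[int, Dict[str, Any]]  # project ID -> attribute values (assumed layout)


@dataclass
class Feedback:
    """One user's response to a recommendation (hypothetical structure)."""
    project_id: int
    recommendation: str
    helpful: bool
    updated_values: Dict[str, Any] = field(default_factory=dict)
    new_attributes: Dict[str, Any] = field(default_factory=dict)
    projects_to_delete: List[int] = field(default_factory=list)


def apply_feedback(dataset: Dataset, fb: Feedback) -> Dataset:
    """Return a copy of the data set with the feedback applied."""
    # Drop the projects the user flagged for deletion.
    updated = {
        pid: dict(row)
        for pid, row in dataset.items()
        if pid not in fb.projects_to_delete
    }
    # Update existing attributes and add newly suggested ones.
    row = updated.setdefault(fb.project_id, {})
    row.update(fb.updated_values)
    row.update(fb.new_attributes)
    # Record whether the recommendation was judged helpful.
    row["feedback"] = {"recommendation": fb.recommendation, "helpful": fb.helpful}
    return updated


# Values from the slide; project 12 and the attribute keys are made up.
data = {
    1: {"num_boolean_expressions": 5},
    3: {"num_boolean_expressions": 8},
    7: {"num_boolean_expressions": 2},
    12: {"num_boolean_expressions": 200, "num_numeric_calculations": 70},
}
fb = Feedback(
    project_id=12,
    recommendation="MC/DC",
    helpful=True,
    updated_values={"num_boolean_expressions": 219, "num_numeric_calculations": 73},
    new_attributes={"boolean_to_numeric_ratio": "3:1"},
    projects_to_delete=[1, 3, 7],
)
result = apply_feedback(data, fb)
```

Returning a copy rather than mutating the input keeps the original data available, which may matter if feedback later needs to be reviewed or rolled back.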
Why should we enhance our data?
These dynamic data models allow:
● Low start-up costs.
● A body of evidence built over time.
● A way to address data quality issues.
● Human-in-the-loop feedback.
Challenge 1:
How do we collect feedback?
Challenge 2:
How do we use feedback?
Fundamental trade-off between human curation
and automated AI learning.
When should attributes be filtered? When should
stale data be phased out? When should new data be added?
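As one possible, purely illustrative answer to the phase-out question, an automated rule could drop records that nobody has updated within some window. The sketch below assumes each record carries a `last_updated` date and uses a one-year default threshold. The function name, that field, and the threshold are all assumptions made for illustration, not something the slides propose.

```python
from datetime import date, timedelta
from typing import Any, Dict


def phase_out_stale(
    dataset: Dict[int, Dict[str, Any]],
    today: date,
    max_age_days: int = 365,  # assumed threshold, not from the slides
) -> Dict[int, Dict[str, Any]]:
    """Keep only records whose 'last_updated' date is within max_age_days of today."""
    cutoff = today - timedelta(days=max_age_days)
    return {
        pid: row
        for pid, row in dataset.items()
        if row["last_updated"] >= cutoff
    }


# Hypothetical records: project 1 is older than the cutoff, project 2 is recent.
records = {
    1: {"last_updated": date(2012, 1, 1)},
    2: {"last_updated": date(2013, 5, 1)},
}
kept = phase_out_stale(records, today=date(2013, 6, 1))
```

A fixed age cutoff sits at the fully automated end of the trade-off. A curated variant might instead flag stale records for human review before removing them.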
Challenge 3:
Motivating Users
How do we motivate users to:
● Provide feedback.
● Add new data.
● Update old data.
Motivation requires:
1. Incentive.
2. Ease of use/contribution.
3. Utility from and trust in the model.
We propose feedback-driven dynamic
data models maintained by a synergy of
user feedback and automated AI techniques.
We propose that dynamic data will allow for
low start-up costs, a stronger body of
evidence over time, and adaptations to
changing industrial conditions.
For discussion...
1. Is this even a good idea?
2. What can we do to solve data quality
issues? (other than just the idea suggested
here)
3. What kind of data would benefit from
dynamic adaptation?
4. How do we motivate users to provide
feedback, add new data, and update old data?
