Five Days of Empirical Software Engineering: The PASED Experience
Massimiliano Di Penta
Giuliano Antoniol
Daniel M. Germán
Yann-Gaël Guéhéneuc
Bram Adams
Motivation
Empirical background important for graduate students
Courses on statistics insufficient to provide such a background
(Most) university curricula cannot afford to have specific
courses
Some exceptions (there are others for sure!):
   •   Easterbrook’s CSC2130: Empirical Research Methods in
       Software Engineering at the University of Toronto (2009)
   •   Herbsleb’s 08-803: Empirical Methods for Socio-Technical
       Research at CMU (2010)
   •   Dewayne E. Perry’s course, Univ. of Texas at Austin
So students need that!
General Info
About the School
• École Polytechnique de Montréal, June 2011
• Funded by MITACS
 • Low fee for students: $250, all included
• 44 participants from 9 countries and 25 different
  institutions
• More at http://pased.soccerlab.polymtl.ca
Learning Objectives
1. plan and conduct software engineering
   experiments with human subjects and collect
   related data
2. plan and conduct software engineering studies
   involving the mining of data from
   (un)structured software repositories
3. build prediction and classification models from
   the collected data, and use these models
Challenges
• Choosing topics
• Combining theory and practice
• Dealing with heterogeneous participants
What Topics?
• Planning the study
• Getting the data
• Analyzing results
Only 5 days...
...that’s too much!
The Approach
• Learn-by-example and learn-by-doing format
 • Experiment design principles and
    statistics introduced by presenting cases
    from studies in literature
• Practical application of theoretical concepts
  during labs
• Course material and laboratory packages
  available online, including course videos
School Content

      Mining Software   Exp.           Text           Statistical    Predictor
AM    Archives          Design         mining         Analysis       Models

      Keynote           Keynote        Keynote        Keynote        Keynote

PM    Hands-on lab      Hands-on lab   Hands-on lab   Hands-on lab   Hands-on lab
Learning by doing...
Running example I (from the school)
• Use of UML Stereotypes in comprehension and
 maintenance tasks
  • Filippo Ricca, Massimiliano Di Penta, Marco Torchiano, Paolo Tonella, Mariano
    Ceccato: How Developers' Experience and Ability Influence Web Application
    Comprehension Tasks Supported by UML Stereotypes: A Series of Four
    Experiments. IEEE Trans. Software Eng. 36(1): 96-118 (2010)

  • Filippo Ricca, Massimiliano Di Penta, Marco Torchiano, Paolo Tonella, Mariano
    Ceccato: The Role of Experience and Ability in Comprehension Tasks Supported by
    UML Stereotypes. ICSE 2007: 375-384

• In the following, briefly referred to as “Conallen”
Experiment Design

         Group 1   Group 2   Group 3   Group 4
Lab 1    Sys A     Sys A     Sys B     Sys B
Lab 2    Sys B     Sys B     Sys A     Sys A
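
A minimal R sketch of how participants could be dealt into the four groups of the design above. The participant IDs, the seed and the balanced random assignment are assumptions made for illustration; the slide itself only shows which system each group works on in each lab.

```r
# Hypothetical participant IDs; the real study used its own identifiers.
participants <- sprintf("P%02d", 1:20)

# System used by each group in each lab, as in the design table.
design <- data.frame(
  Group = paste("Group", 1:4),
  Lab1  = c("Sys A", "Sys A", "Sys B", "Sys B"),
  Lab2  = c("Sys B", "Sys B", "Sys A", "Sys A")
)

set.seed(2011)  # arbitrary seed, only to make the shuffle reproducible

# Shuffle the participants, then deal them into the four groups in turn
# so that group sizes differ by at most one.
assignment <- data.frame(
  ID    = sample(participants),
  Group = rep(design$Group, length.out = length(participants))
)

# Attach the per-lab system to every participant.
schedule <- merge(assignment, design, by = "Group")
print(table(schedule$Group))
```

Every participant works on both systems, one per lab, so the system and the lab order are counterbalanced across groups.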
Data format: example
Boxplots: Conallen
Paired analysis: example

   ID      F.Conallen    F.UML
   T20     0.74          0.74
   T21     0.74          0.51
   T22     0.70          0.29
   T24     0.88          0.62
   T25     0.75          0.80
   T26     0.66          0.39
   T27     0.35          0.51
   T28     0.62          0.59
   T29     0.57          0.68
   T30     0.73          0.43
   T32     0.74          0.56

> wilcox.test(F.Conallen, F.UML, paired=TRUE, alternative="greater")

        Wilcoxon signed rank test with continuity correction

data:  F.Conallen and F.UML
V = 138, p-value = 0.04354
alternative hypothesis: true location shift is greater than 0

• Must have data in a paired format
   • or use an R script to convert it
• Need to remove subjects that took part in only one lab
• For parametric statistics just replace wilcox.test with t.test
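
The conversion to paired format and the removal of incomplete subjects could look like the sketch below. The long-format layout, the column names `Treatment` and `F`, and the subject `T99` (who attended only one lab) are assumptions made for illustration. The scores are just the rows shown in the table, so the resulting statistic is not expected to match the V and p-value on the slide.

```r
# Scores copied from the rows shown in the table.
ids      <- c("T20", "T21", "T22", "T24", "T25", "T26",
              "T27", "T28", "T29", "T30", "T32")
conallen <- c(0.74, 0.74, 0.70, 0.88, 0.75, 0.66, 0.35, 0.62, 0.57, 0.73, 0.74)
uml      <- c(0.74, 0.51, 0.29, 0.62, 0.80, 0.39, 0.51, 0.59, 0.68, 0.43, 0.56)

# Hypothetical long format: one row per subject and treatment.
# T99 is an invented subject who took part in one lab only.
long <- rbind(
  data.frame(ID = ids,   Treatment = "Conallen", F = conallen),
  data.frame(ID = ids,   Treatment = "UML",      F = uml),
  data.frame(ID = "T99", Treatment = "Conallen", F = 0.50)
)

# Long -> wide: one row per subject, columns F.Conallen and F.UML.
wide <- reshape(long, idvar = "ID", timevar = "Treatment", direction = "wide")

# Keep only subjects with a score under both treatments.
paired <- wide[complete.cases(wide), ]

# exact = FALSE requests the normal approximation, the variant reported
# as "with continuity correction" in R's output.
res <- wilcox.test(paired$F.Conallen, paired$F.UML,
                   paired = TRUE, alternative = "greater", exact = FALSE)
print(res)
```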
Hands on Labs
•   Mining software repository challenge: extract
    interesting facts from git
•   Experiment design: groups working together on
    designing a study
•   Data analysis: text mining, analyze working data sets of
    previous experiments and build bug predictors
    •   Working data sets from previous experiments,
        PROMISE data sets
    •   Tools: R and Weka
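
As an illustration of the kind of fact that can be extracted from git, the sketch below counts commits per author in R. The log lines are invented; they are assumed to be in the form produced by `git log --pretty=format:"%an|%ad" --date=short`, and the choice of "commits per author" as the fact of interest is also an assumption, not the task set in the lab.

```r
# Invented sample of output from:
#   git log --pretty=format:"%an|%ad" --date=short
log_lines <- c(
  "alice|2011-06-01",
  "bob|2011-06-01",
  "alice|2011-06-02",
  "carol|2011-06-03",
  "alice|2011-06-03"
)

# Split each line on the literal "|" into author and date fields.
fields  <- do.call(rbind, strsplit(log_lines, "|", fixed = TRUE))
commits <- data.frame(author = fields[, 1], date = as.Date(fields[, 2]))

# Number of commits per author, most active first.
commits_per_author <- sort(table(commits$author), decreasing = TRUE)
print(commits_per_author)
```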
Lab Script Example
# UNPAIRED ANALYSIS

# Analysis of each single experiment (Mann-Whitney test):
# is Correct greater for Fit == "yes" than for Fit == "no"?
# with() is used instead of attach() so that the columns of one
# data set never mask those of another.
with(tbn, wilcox.test(Correct[Fit == "yes"], Correct[Fit == "no"], paired = FALSE, alternative = "greater"))

with(ttrento, wilcox.test(Correct[Fit == "yes"], Correct[Fit == "no"], paired = FALSE, alternative = "greater"))

with(tphd, wilcox.test(Correct[Fit == "yes"], Correct[Fit == "no"], paired = FALSE, alternative = "greater"))

# All data
with(t, wilcox.test(Correct[Fit == "yes"], Correct[Fit == "no"], paired = FALSE, alternative = "greater"))


# Exercises:
# 1) Perform a two-tailed test
# 2) Can the t-test be applied instead of the Wilcoxon test? Test for data normality using the Shapiro-Wilk test
# 3) Repeat the analysis using the t-test
# 4) Repeat the analysis for the time dependent variable
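
One possible way to approach exercises 1 to 3 is sketched below. The data frame `demo` is simulated and only mimics the `Correct` and `Fit` columns of the lab data sets; its values, the group sizes and the seed are assumptions and do not come from the experiments.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Simulated stand-in for one lab data set (values are NOT real results).
demo <- data.frame(
  Fit     = rep(c("yes", "no"), each = 15),
  Correct = c(rnorm(15, mean = 0.7, sd = 0.1),
              rnorm(15, mean = 0.6, sd = 0.1))
)
yes <- demo$Correct[demo$Fit == "yes"]
no  <- demo$Correct[demo$Fit == "no"]

# Exercise 1: two-tailed Mann-Whitney test.
two_sided <- wilcox.test(yes, no, paired = FALSE, alternative = "two.sided")

# Exercise 2: Shapiro-Wilk normality check on each group; a small
# p-value suggests a departure from normality.
normality <- c(yes = shapiro.test(yes)$p.value,
               no  = shapiro.test(no)$p.value)

# Exercise 3: parametric counterpart (Welch's t-test is R's default).
parametric <- t.test(yes, no, paired = FALSE, alternative = "greater")

print(two_sided$p.value)
print(normality)
print(parametric$p.value)
```

Exercise 4 follows the same pattern with the time column in place of `Correct`.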
Calibrating Courses to Participants’ Profiles

Number of participants per level:

             Statistical   Empirical Sw   Mining Sw      Machine
             Analyses      Engineering    repositories   learning
Excellent     1             4              4              1
Good         21            20             16             12
Basic        23            18             24             25
None          1             2              1              7
Feedback
• Guidelines on what not to do
• Longer labs
• Tutorials on tools
• How to write empirical papers
Acknowledgments
•   Lecturers (other than the paper authors):
    •   Ahmed E. Hassan, Queen’s University, Canada
    •   Andrian Marcus, Wayne State University, USA
•   Keynote Speakers
    •   Gail Murphy, University of British Columbia, Canada
    •   Prem Devanbu, UC Davis, USA
    •   Alain Picard, Benchmark Consulting Services Inc., Canada
    •   Maria Codipiero, Peter Colligan, Kal Murtaia, SAP, Canada
    •   Marc-André Decoste, Google Montréal, Canada
•   Student volunteers
•   The attendees!
•   MITACS (http://www.mitacs.ca/)
Conclusions