The Green Lab - [03 A] Experiment planning
It begins with an idea
Experiment planning
Ivano Malavolta
Vrije Universiteit Amsterdam
Ivano Malavolta / S2 group / Empirical software engineering

Roadmap
Experiment planning
● Context selection
● Research questions and hypotheses formulation
● Variables selection
● Subjects selection

Recall

Scoping vs. planning
● Experiment scoping describes WHY we run an experiment
● Experiment planning determines HOW the experiment will be executed
○ Be careful here → the results of the experiment can be disturbed (or even destroyed) if the experiment is not planned properly

Planning phases
[Figure: the phases of experiment planning, with a label marking the scope of this lecture]

Context selection

We already heard about context...

Context selection
CONTEXT: the environment in which the experiment is performed
Our goal here is to achieve the most general results
→ optimum: large, real software projects with practitioners
However, many risks are involved...

Context selection dimensions
Each dimension ranges from the lab (first item) to reality (second item):
● Off-line vs. On-line
● Students vs. Professionals
● Toy problem vs. Real problem
● Specific vs. General

Quick exercise
Think about the progressive web app (PWA) case study and formulate a potential context for an experiment

Research questions and hypotheses formulation

Research questions formulation
Research questions detail the specific objectives of the empirical study
They are derived from the study definition
They are the starting point for identifying the variables of interest of your study

Suggestions
Research questions should be as clear as possible
→ they will guide the whole experiment
Avoid questions you cannot answer, such as:
○ What is the best JavaScript framework in terms of performance?
○ What is the most productive programming language?
○ ...
Golden words: to what extent..., what is the impact of X on Y, what are the traits of Xs, what are the characteristics of T...
Remember that in your report you will come back to your research questions and explicitly answer each of them in detail

Example: CHABADA

From research questions to hypotheses
● When a research question is going to be addressed by applying a statistical test, it is helpful to formulate a hypothesis
● Hypotheses are very useful for selecting which statistical procedure you need to use
Hypotheses are not needed in all cases

Hypotheses formulation
● Conjecture (P)
○ Administering the treatment influences some feature
● Consequence (Q)
○ We observe a difference in terms of that feature
P → Q

Hypotheses formulation
Hypothesis: a formal statement about a phenomenon
● Null hypothesis H0: there are no real trends or patterns in the experiment setting (aka ¬Q)
● Alternative hypothesis Ha: there are real trends or patterns in the experiment setting (aka Q)

Falsification (modus tollens)
● We aim at testing the null hypothesis (¬Q)
○ i.e., we test H0 and try to reject it
If we can reject the null hypothesis, we can draw conclusions
This comes from Popper (1959): a scientific statement holds only until someone manages to contradict it; it can never be conclusively proven
● Aiming at verifying Q directly is WRONG
○ Observing Q does not prove P (that would be affirming the consequent), so it provides no insight on the conjecture
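The two inference patterns above can be checked mechanically. The following minimal Python sketch (an illustration added here, not part of the original lecture) enumerates every truth assignment of P and Q and reports whether an argument form is valid, i.e., whether its conclusion is true in every row where all its premises are true.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b


def is_valid(premises, conclusion) -> bool:
    """An argument form is valid if the conclusion holds in every row
    of the truth table where all premises hold."""
    for p, q in product([True, False], repeat=2):
        if all(premise(p, q) for premise in premises) and not conclusion(p, q):
            return False  # counterexample row found
    return True


# Modus tollens: from (P -> Q) and not Q, conclude not P.
modus_tollens = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: not q],
    conclusion=lambda p, q: not p,
)

# Affirming the consequent: from (P -> Q) and Q, conclude P.
affirming_consequent = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: q],
    conclusion=lambda p, q: p,
)

print("modus tollens valid:", modus_tollens)
print("affirming the consequent valid:", affirming_consequent)
```

The second form fails on the row P = false, Q = true: the premises hold but the conclusion does not, which is why observing Q alone says nothing conclusive about P.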

Example
● Question:
○ Do green algorithms improve software energy efficiency?
● Consequence (Q):
○ (when applying green algorithms) we observe a reduction of energy consumption
● Conjecture (P):
○ applying green algorithms reduces energy consumption

Example
● Null hypothesis (¬Q): there is no reduction in terms of energy consumption
$H_0: \mathrm{mean}(E_{ga}) \geq \mathrm{mean}(E_{normal})$
● Alternative hypothesis (Q): the energy consumption of an application that uses green algorithms is lower
$H_a: \mathrm{mean}(E_{ga}) < \mathrm{mean}(E_{normal})$
where $E_{ga}$ is the energy consumption measured with green algorithms and $E_{normal}$ the energy consumption measured without them
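To make these one-sided hypotheses concrete, here is a minimal sketch of a one-sided permutation test using only the Python standard library. The energy values are made-up placeholders, not measurements from the lecture, and `e_ga` / `e_normal` are hypothetical names for the samples of E_ga and E_normal. A permutation test is just one possible procedure; the appropriate test depends on your design and data.

```python
import random
from statistics import mean


def one_sided_permutation_test(e_ga, e_normal, n_permutations=10_000, seed=42):
    """Estimate the p-value for H0: mean(E_ga) >= mean(E_normal)
    against Ha: mean(E_ga) < mean(E_normal).

    The statistic is mean(e_normal) - mean(e_ga); large positive values
    support Ha. At the boundary of H0 (equal means) the group labels are
    exchangeable, so we shuffle the labels and count how often a statistic
    at least as large as the observed one arises by chance.
    """
    rng = random.Random(seed)
    observed = mean(e_normal) - mean(e_ga)
    pooled = list(e_ga) + list(e_normal)
    n_ga = len(e_ga)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = mean(pooled[n_ga:]) - mean(pooled[:n_ga])
        if stat >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_permutations + 1)


# Hypothetical energy measurements in joules (illustrative only).
e_ga = [41.2, 39.8, 40.5, 38.9, 40.1, 39.5]
e_normal = [44.0, 43.1, 45.2, 42.8, 44.6, 43.9]

p_value = one_sided_permutation_test(e_ga, e_normal)
alpha = 0.05
print(f"p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "cannot reject H0")
```

The significance level `alpha = 0.05` is a conventional choice assumed for the example, not a value prescribed by the lecture.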

What can happen now?
● We do not reject the null hypothesis (¬Q)
○ ¬Q → ¬P
○ our conjecture P has been falsified → within this experiment, green algorithms do not reduce energy consumption (¬P)
● We reject the null hypothesis (¬Q)
○ Q holds
○ our conjecture P has been corroborated → we are more confident that green algorithms reduce energy consumption (P)

Example 2

Other examples of hypotheses

Variables selection

Recap

Variables selection
● The choice of independent and dependent variables is usually made in parallel
● Some variables cannot be measured directly (e.g., productivity, code quality, effort...)
○ We use proxies to estimate them
■ proxies introduce a construct validity threat: is what we are measuring a good representation of our variable?

Variables selection
● Independent variables should have some effect on the dependent ones
→ do not choose variables randomly; think about your RQs
● After choosing the variables, you have to define their types, scales, and ranges → this is part of measurement theory

Hypotheses formulation
● Often there is only one dependent variable
○ e.g., power consumption
● The main factor is the independent variable whose effect we study
○ often one level for the control group (e.g., use of an old/traditional technique or tool)
○ one or more levels for the experimental groups (e.g., use of the new technique(s) or tool(s))
Other independent variables are the co-factors

Co-factors
Our main factor is not the only variable influencing the dependent variable(s)
○ e.g., network instability of your experimental environment, usage patterns of the analyzed website, skills of subjects, analyzed system, experience of developers, ...
We will never account for all possible co-factors
Your best friend here is randomization
In a good experiment, we:
● limit their effect through a good experimental design
● are able to separate their effect from that of the main factors
● analyze their interaction with the main factor
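As a minimal sketch of randomization, the Python snippet below builds a run table that crosses each level of the main factor with each value of one co-factor, repeats every combination, and shuffles the execution order, so that time-dependent disturbances (e.g., network instability) are spread across treatments instead of being confounded with one of them. The factor names, levels, subjects, and number of repetitions are hypothetical placeholders.

```python
import itertools
import random


def build_run_table(main_factor_levels, co_factor_values, repetitions, seed=None):
    """Return a randomized list of runs.

    Every (treatment, co-factor value) combination appears `repetitions`
    times, so the co-factor's effect can later be separated from the main
    factor; the order is shuffled to randomize uncontrolled influences.
    """
    runs = [
        {"treatment": level, "subject": value, "repetition": r}
        for level, value in itertools.product(main_factor_levels, co_factor_values)
        for r in range(1, repetitions + 1)
    ]
    random.Random(seed).shuffle(runs)  # randomized execution order
    return runs


# Hypothetical setup: control vs. experimental treatment on three web apps.
table = build_run_table(
    main_factor_levels=["control", "green_algorithm"],
    co_factor_values=["app_a", "app_b", "app_c"],
    repetitions=5,
    seed=1,
)
for run in table[:4]:
    print(run)
print(len(table), "runs in total")
```

Fixing `seed` makes the shuffled order reproducible, which helps when the run table must be documented in the experiment report.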

Example

Subjects selection

Subjects selection
● Population: the complete set of items of interest for our experiment
○ e.g., open-source software applications
○ e.g., all existing progressive web apps
● Sample: a representative selection of individuals from that population
○ e.g., Apache, MySQL
○ e.g., progressive web apps mined from Alexa’s list

Sampling techniques
● Probability sampling: the probability of selecting each subject in the population is known
○ Simple random sampling: subjects are selected at random from the population, so every subject has the same probability of being picked (1/N per draw, where N is the population size)
○ Stratified random sampling: the population is divided into groups (strata) with a known distribution between the groups. Random sampling is then applied within each group
● Non-probability sampling: the probability of selecting each subject from the population is unknown
○ Convenience sampling: the most convenient subjects (in terms of cost, distance, complexity) are selected [usually it is the only way to go]
○ Quota sampling: you select a predefined number of subjects from each group, without random selection within the groups (e.g., male vs. female, open-source vs. closed-source)
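The difference between simple random and stratified random sampling can be sketched in a few lines of Python. The population of apps and its split into strata below are hypothetical; in practice, the strata and their proportions come from what you know about your population. The sketch uses proportional allocation, which is one common choice among several.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct subjects; every subject is equally likely to be chosen."""
    return random.Random(seed).sample(population, n)


def stratified_random_sample(strata, n, seed=None):
    """Draw about n subjects (n must not exceed the population size),
    allocating them to strata in proportion to each stratum's size and
    then sampling randomly within each stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)  # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 80 open-source and 20 closed-source apps.
strata = {
    "open_source": [f"oss_{i}" for i in range(80)],
    "closed_source": [f"cs_{i}" for i in range(20)],
}
population = strata["open_source"] + strata["closed_source"]

print(simple_random_sample(population, 10, seed=0))
print(stratified_random_sample(strata, 10, seed=0))
```

With simple random sampling, the number of closed-source apps in the sample varies from draw to draw; stratified sampling guarantees each stratum is represented in proportion to its size. Because of rounding, the stratified sample size can differ slightly from n when proportions do not divide evenly.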

How big should a sample be?
● Sample size: the larger, the better (more general results)
● If the population has high variability, a larger sample is needed
● Data analysis may influence sample size
○ some statistical tests are meaningful only on large samples
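One common rule of thumb (an assumption added here, not taken from the lecture) makes the link between variability and sample size explicit: to estimate a population mean within a margin of error E at confidence level 1 − α, assuming roughly normal data with a known standard deviation σ, you need about n = (z · σ / E)² subjects, where z is the two-sided critical value of the standard normal distribution. The sketch below computes this with Python's `statistics.NormalDist`; the σ and E values are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin_of_error, confidence=0.95):
    """Minimum n so that a confidence interval for the mean has a
    half-width <= margin_of_error, assuming normal data with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Hypothetical numbers: energy readings with a standard deviation of 2 J
# vs. 4 J, and a desired margin of error of 1 J.
print(required_sample_size(sigma=2.0, margin_of_error=1.0))  # lower variability
print(required_sample_size(sigma=4.0, margin_of_error=1.0))  # higher variability
```

Doubling σ roughly quadruples the required n, which is the quantitative version of "higher variation needs a larger sample". Hypothesis tests have their own power-based sample size formulas; this sketch only covers estimating a mean.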

What does this lecture mean to you?
You know how to:
● define the context of your experiment
● define research questions and hypotheses
● define independent and dependent variables
● apply strategies for selecting subjects
Next step
Measurement theory → how to define the “type” of variables

Readings
Chapter 8

Acknowledgements
Some contents of this part of the lecture are taken from:
● Giuseppe Procaccianti’s lectures at VU
● Massimiliano Di Penta’s lectures at GSSI (Italy)