Experiment validity
Ivano Malavolta
Vrije Universiteit Amsterdam
Ivano Malavolta / S2 group / Empirical software engineering
Planning phases
[Figure: the planning phases of an experiment, marking the scope of this lecture]
Ivano Malavolta / S2 group / Green Lab
Experiment validity
Validity is the extent to which our results are sound and applicable to the real world
● We aim for adequate validity, not universal validity
○ What matters is our population of interest
● Validity is in trade-off with experiment scope
Threats identification
● Identifying threats helps to plan for adequate validity
● Each threat needs appropriate mitigation
● Several classifications of validity threats:
○ Campbell and Stanley [1]
○ Cook and Campbell [2]
Types of threat to validity
[Figure: at the theory level, a cause (e.g. encoding algorithms) leads to an effect (e.g. energy efficiency) through causation; at the observation level, a treatment (e.g. JPEG) leads to an outcome (e.g. energy per image) through the experiment]
[Figure: the cause/effect and treatment/outcome diagram annotated with the four types of validity threat: construct (once for the cause/treatment pair and once for the effect/outcome pair), internal, conclusion, and external]
Internal validity
Internal Validity: causality between treatment and outcome
● Strongly related to the experiment design and operation
○ Are my results caused by the treatment?
○ Have I considered all possible factors?
Internal validity: types of threat
● History
○ Different trials of the experiment are performed in different time frames (e.g., after holidays vs. normal days)
● Maturation
○ Subjects may react differently over time (e.g., learning effect, tiredness, boredom)
● Selection
○ Some subjects may abandon the experiment
○ Even worse, one specific type of subject may leave it
● Reliability of measures
○ If you repeat the measurement, you should get similar results → same conclusions
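The reliability threat can be checked numerically before drawing conclusions. A minimal sketch, assuming repeated energy measurements of the same trial are available as a list of joules; the function name `coefficient_of_variation`, the readings, and the 5% threshold are illustrative assumptions, not part of the lecture:

```python
import statistics

def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean is zero, coefficient of variation is undefined")
    return statistics.stdev(samples) / mean

# Hypothetical energy readings (joules) from repeating the same trial.
repeated_runs = [10.2, 9.8, 10.1, 10.0, 9.9]
cv = coefficient_of_variation(repeated_runs)

# The 5% threshold is an arbitrary illustration; pick one that fits your setup.
is_reliable = cv < 0.05
print(f"CV = {cv:.3f}, reliable = {is_reliable}")
```

A low coefficient of variation only says that repeated measurements agree with each other; it says nothing about whether the instrument is accurate.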
Internal validity: mitigation
Analyze and identify confounding factors/noise
Choose appropriate experiment design
Keep environment under control
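One common design choice that keeps history and maturation effects from lining up with a single treatment is to randomize the order of the runs. A minimal sketch, assuming two treatments (`JPEG` from the earlier example and a hypothetical `PNG`, which the lecture does not mention) and a fixed seed so that the schedule can be reproduced:

```python
import random

def randomized_run_order(treatments, repetitions, seed):
    """Return every treatment repeated `repetitions` times, in shuffled order."""
    runs = [t for t in treatments for _ in range(repetitions)]
    rng = random.Random(seed)  # local generator, the global state is untouched
    rng.shuffle(runs)
    return runs

# "PNG" is a hypothetical second treatment used only for illustration.
order = randomized_run_order(["JPEG", "PNG"], repetitions=5, seed=42)
print(order)
```

Recording the seed alongside the results lets someone else replay exactly the same schedule.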
Conclusion validity
Conclusion Validity: statistical correctness and significance
● Are my conclusions correct?
● Are my results significant enough?
Conclusion validity: types of threat
● Low statistical power
○ Results are not statistically significant
○ There is a real difference, but the statistical test does not reveal it because of the low number of data points
● Violated assumptions of statistical tests
○ e.g., many tests assume normally distributed samples
● Fishing and error rate
○ If you combine multiple statistical tests, their significance level should be adjusted accordingly
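The fishing threat can be made concrete with a little arithmetic. A sketch, assuming the tests are independent and using the Bonferroni correction as one possible adjustment (the lecture does not name a specific method); the function names are illustrative:

```python
def family_wise_error_rate(alpha, num_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_alpha(alpha, num_tests):
    """Per-test significance level that keeps the family-wise rate <= alpha."""
    return alpha / num_tests

alpha, num_tests = 0.05, 10
uncorrected = family_wise_error_rate(alpha, num_tests)
per_test = bonferroni_alpha(alpha, num_tests)
corrected = family_wise_error_rate(per_test, num_tests)

print(f"uncorrected family-wise error rate: {uncorrected:.3f}")
print(f"Bonferroni per-test alpha:          {per_test:.4f}")
print(f"corrected family-wise error rate:   {corrected:.3f}")
```

With ten tests at a per-test level of 0.05, the chance of at least one false positive is roughly 40%, which is why the per-test level has to shrink as tests are added.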
Conclusion validity: mitigation
Select appropriate tests
Use only as much significance as needed
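When the normality assumption is doubtful, a distribution-free test is one option. A sketch of a two-sided permutation test on the difference of means, using only the standard library; the data, the function names, and the number of permutations are illustrative assumptions rather than a recommendation from the lecture:

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(a, b, num_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # random relabelling of the observations
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (num_permutations + 1)

# Hypothetical energy-per-image samples (joules) for two treatments.
treatment_a = [1.9, 2.1, 2.0, 2.2, 1.8, 2.0]
treatment_b = [2.6, 2.8, 2.7, 2.9, 2.5, 2.7]
p_value = permutation_test(treatment_a, treatment_b)
print(f"p = {p_value:.4f}")
```

The test asks how often a random relabelling of the pooled observations produces a difference at least as large as the observed one, so it needs no assumption about the shape of the distribution.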
Construct validity
Construct Validity: relation between theory and observation
● Have I defined my constructs properly?
● Am I analyzing the correct variables for the effects?
Construct validity: types of threat
● Inadequate preoperational explication of constructs
○ The construct is not well defined before being translated into measures
○ The theory is unclear
○ e.g., comparing two methods without making clear what it means for one method to be better than the other
● Mono-operation bias
○ A single independent variable, object, or treatment is used → the experiment may not represent the theory
○ e.g., an inspection conducted on a single document that is not representative of the set of documents on which the technique is often applied
● Mono-method bias
○ A single type of measure or observation is used
○ The experimenter may bias the measures
Construct validity: mitigation
Early definition of constructs (GQM: Goal-Question-Metric)
Use appropriate experiment design
Introduce redundancy for cross-checks
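Redundancy for cross-checks can be as simple as measuring the same construct with two instruments and checking that they agree. A sketch, assuming hypothetical readings from a hardware power meter and a software profiler (neither is named in the lecture); the Pearson correlation is written out by hand to stay within the standard library:

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical energy readings (joules) for the same five runs.
hardware_meter = [10.1, 12.3, 9.8, 15.2, 11.0]
software_profiler = [9.7, 12.0, 9.5, 14.6, 10.8]
r = pearson_correlation(hardware_meter, software_profiler)
print(f"r = {r:.3f}")
```

A high correlation shows only that the two instruments move together, not that they report the same absolute values, so a systematic offset would still need a separate check.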
External validity
External Validity: generalizability of the results
● Are my results valid for the whole target population?
● Have I selected a representative sample?
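One way to work towards a representative sample is stratified random sampling, where every subgroup of the population contributes the same fraction of subjects. A minimal sketch, assuming a hypothetical population of student and industry developers; the function name, the strata, and the 10% fraction are illustrative:

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction of members from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 80 student and 20 industry developers.
population = [("student", i) for i in range(80)] + \
             [("industry", i) for i in range(20)]
sample = stratified_sample(population, stratum_of=lambda p: p[0], fraction=0.1)
print(sample)
```

The sample keeps the 80/20 proportion of the population, which simple random sampling only achieves on average.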
External validity: types of threat
● Interaction of selection and treatment
○ The population of subjects is not representative of the one to which I would like to generalize my results
○ e.g., performing experiments with students and applying the results to industry
● Interaction of setting and treatment
○ The experimental setting or material is not representative
○ e.g., subjects use tools that they do not use in practice
○ e.g., Web development using plain text editors
○ Use of toy objects
● Interaction of history and treatment
○ The experiment is conducted at a special time or on a special day, which affects the results
○ e.g., our experiment on green software is performed after a big conference that some subjects attended
External validity: mitigation
Use an environment as realistic as possible
Explicitly define and model your context
What does this lecture mean to you?
● You now know that you have to explicitly take into account the threats to validity of your experiment
● Discussing threats actually makes your experiment stronger
▪ You are not exposing your weaknesses; you are aiming for replicability
● You will make trade-offs between threats to validity in your experiment
● Consider threats to validity as early as possible
▪ Reasoning about them will make you more confident about the scope and design of your experiment
Readings
Chapter 8
[1] Campbell and Stanley, Experimental and Quasi-Experimental Designs for Research (1963). (Blackboard)
[2] Cook and Campbell, Quasi-Experimentation: Design and Analysis Issues for Field Settings (1979). Available at the VU library.
Acknowledgements
Some contents of this lecture were extracted from:
● Giuseppe Procaccianti’s lectures at VU

[13 - A] Experiment validity
