[13 - A] Experiment validity

Experiment validity
Ivano Malavolta

Vrije Universiteit Amsterdam
Ivano Malavolta / S2 group / Empirical software engineering
Planning phases
[Figure: experiment planning phases, highlighting the scope of this lecture]

Ivano Malavolta / S2 group / Green Lab
Experiment validity
Validity is the extent to which our results are sound and applicable to the real world
● We aim for adequate validity, not universal validity
○ What matters is our population of interest
● Validity trades off against experiment scope

Threats Identification
● Identifying threats helps to plan for adequate validity
● Each threat needs appropriate mitigation
● Several classifications of validity threats:
○ Campbell and Stanley [1]
○ Cook and Campbell [2]

Types of threat to validity
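[Figure: cause-effect diagram relating theory and observation]
● Theory: a cause (e.g. encoding algorithms) is linked to an effect (e.g. energy efficiency) by causation
● Observation: a treatment (e.g. JPEG) is linked to an outcome (e.g. energy per image) by the experiment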
Types of threat to validity
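[Figure: the theory and observation diagram, annotated with the four types of validity]
● Construct: links theory to observation (cause to treatment, effect to outcome)
● Internal and Conclusion: concern the relation between treatment and outcome
● External: concerns generalizing the results beyond the experiment
● Running example: cause = encoding algorithms, effect = energy efficiency, treatment = JPEG, outcome = energy per image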

Internal validity
Internal Validity: causality between treatment and outcome
● Strongly related to the experiment design and operation
○ Are my results caused by the treatment?
○ Have I considered all possible factors?

Internal validity: types of threat
● History
○ Different trials of the experiment are performed in different time frames (e.g., after holidays vs. normal days)
● Maturation
○ Subjects may react differently over time (e.g., learning effects, tiredness, boredom)
● Selection
○ Some subjects may abandon the experiment
○ Even worse, some specific type of subject may leave it
● Reliability of measures
○ If you repeat the measurement you should get similar results → same
conclusions
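
The reliability check above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the energy readings are hypothetical, and the 5% threshold on the coefficient of variation is an assumption chosen only for the example.

```python
import statistics


def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean is zero; the coefficient of variation is undefined")
    return statistics.stdev(samples) / mean


# Hypothetical energy readings (joules) for the same trial repeated five times.
repeated_runs = [10.2, 10.4, 10.1, 10.3, 10.2]
cv = coefficient_of_variation(repeated_runs)

# The 5% threshold is an illustrative assumption, not a standard.
is_reliable = cv < 0.05
print(f"CV = {cv:.3f}, reliable = {is_reliable}")
```

A low coefficient of variation only says that repeated measurements agree with each other; it does not say that they measure the right thing.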

Internal validity: mitigation
Analyze and identify confounding factors/noise
Choose appropriate experiment design
Keep environment under control
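
One common design choice that helps against history and maturation effects is to randomize the order of the runs, so that time-related factors do not line up with a single treatment. The sketch below assumes a simple setup with a few treatments and a fixed number of repetitions; the function name and the "PNG" treatment are hypothetical.

```python
import random


def randomized_run_order(treatments, repetitions, seed):
    """Return every (treatment, repetition) pair in a shuffled order."""
    runs = [(t, r) for t in treatments for r in range(repetitions)]
    rng = random.Random(seed)  # fixed seed so the schedule can be reproduced
    rng.shuffle(runs)
    return runs


# "JPEG" is the lecture's example treatment; "PNG" is a hypothetical second one.
schedule = randomized_run_order(["JPEG", "PNG"], repetitions=3, seed=42)
for treatment, repetition in schedule:
    print(treatment, repetition)
```

Fixing the seed keeps the schedule reproducible, which also helps when reporting the experiment.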

Conclusion validity
Conclusion Validity: statistical correctness and significance
● Are my conclusions correct?
● Are my results significant enough?

Conclusion validity: types of threat
● Low statistical power
○ Results are not statistically significant
○ A real difference exists, but the statistical test fails to reveal it because there are too few data points
● Violated assumptions of statistical tests
○ e.g., many tests assume normally distributed samples
● Fishing and error rate
○ If you run multiple statistical tests, their significance levels must be adjusted accordingly
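
The error-rate threat can be made concrete with a small sketch. It assumes independent tests when computing the family-wise error rate, and it shows two well-known adjustments (Bonferroni and Holm) as examples; the p-values are hypothetical and the function names are chosen for illustration.

```python
def family_wise_error_rate(alpha, num_tests):
    """Probability of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_alpha(alpha, num_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


def holm_reject(p_values, alpha):
    """Holm step-down procedure; returns one boolean per p-value, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one hypothesis is retained, all larger p-values are too
    return reject


# Hypothetical p-values from four separate tests.
p_values = [0.01, 0.04, 0.03, 0.005]
fwer = family_wise_error_rate(0.05, len(p_values))
per_test_alpha = bonferroni_alpha(0.05, len(p_values))
decisions = holm_reject(p_values, 0.05)
print(f"FWER = {fwer:.3f}, Bonferroni alpha = {per_test_alpha}, Holm = {decisions}")
```

With four tests at a significance level of 0.05 each, the chance of at least one false positive is already about 19%, which is why the unadjusted level is too lenient.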

Conclusion validity: mitigation
Select appropriate tests
Use only as much significance as needed
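
As one illustration of selecting a test that matches the data, the sketch below implements an exact permutation test on the difference in means, which does not assume normally distributed samples. It is only practical for small samples, since it enumerates every possible split; the measurements are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if difference >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical energy per image (joules) for two treatments.
group_a = [5.1, 5.3, 5.2, 5.4]
group_b = [6.0, 6.2, 6.1, 6.3]
p_value = exact_permutation_test(group_a, group_b)
print(f"p = {p_value:.4f}")
```

With only four data points per group, the smallest p-value this test can produce is 2/70, which also illustrates how a low number of data points limits statistical power.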

Construct validity
Construct Validity: relation between theory and observation
● Have I defined my constructs properly?
● Am I analyzing the correct variables for the effects?

Construct validity: types of threat
● Inadequate preoperational explication of constructs
○ The construct is not well defined before being translated into measures
○ The theory is unclear
○ e.g., comparing two methods without making clear what it means for one method to be better than the other
● Mono-operation bias
○ I have only one independent variable, one single object, or one treatment → the experiment may not fully represent the theory
○ e.g., an inspection conducted on a single document that is not representative of the set of documents on which the technique is usually applied
● Mono-method bias
○ Using a single type of measure or observation
○ The experimenter may bias the measures

Construct validity: mitigation
Define constructs early (e.g., with GQM, Goal-Question-Metric)
Use appropriate experiment design
Introduce redundancy for cross-checks
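
Redundancy for cross-checks can be as simple as measuring the same runs with two independent methods and checking that they agree. The sketch below assumes two hypothetical sources of energy data, a software-based estimate and a hardware power meter, and compares them with a Pearson correlation; the values are made up for illustration.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Hypothetical energy readings (joules) for the same five runs,
# obtained with two different measurement methods.
software_estimate = [10.1, 12.0, 9.8, 11.5, 10.7]
hardware_meter = [10.4, 12.3, 10.0, 11.9, 11.0]

r = pearson_correlation(software_estimate, hardware_meter)
print(f"correlation between methods: r = {r:.3f}")
```

A high correlation suggests the two methods capture the same construct; a low one is a signal to revisit how the construct was translated into measures.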

External validity
External Validity: generalizability of the results
● Are my results valid for the whole target population?
● Have I selected a representative sample?

External validity: types of threat
● Interaction of selection and treatment
○ The population of subjects is not representative of the one to which I would like to generalize my results
○ e.g., performing experiments with students and applying the results to industry
● Interaction of setting and treatment
○ The experimental setting or the material is not representative
○ e.g., subjects use tools that they do not use in practice
○ e.g., Web development using plain text editors
○ e.g., use of toy objects
● Interaction of history and treatment
○ The experiment is conducted at a special time or on a special day, which affects the results
○ e.g., our experiment on green software is performed after a big congress in which some subjects participated

External validity: mitigation
Use an environment as realistic as possible
Explicitly define and model your context

What this lecture means to you
● You know that you have to explicitly take into account the threats to validity of your experiment
● Discussing threats actually makes your experiment stronger
○ You are not exposing your weaknesses; you are supporting replicability
● You will make trade-offs between threats to validity in your experiment
● Consider threats to validity as early as possible
○ Reasoning about them will make you feel more confident about the scope and design of your experiment

Readings
Chapter 8
[1] Campbell and Stanley, Experimental and Quasi-Experimental Designs for Research (1963). (Blackboard)
[2] Cook and Campbell, Quasi-Experimentation: Design and Analysis Issues for Field Settings (1979). Available at the VU library.

Acknowledgements
Some contents of this lecture were extracted from:
● Giuseppe Procaccianti’s lectures at VU
