LECTURE 5:
USER EXPERIMENTS IN HCI
COMP 4026 – Advanced HCI
Semester 5 - 2017
Arindam Dey
University of South Australia
OVERVIEW
•  Why do we need user experiments?
•  How to design a user experiment?
•  Activity
•  How to run a user experiment?
•  Ethical considerations
Testing your idea/design/prototype with real users of
the application
You (designer / developer) ≠ User
Because you
•  know your system well
•  have special skills
•  know what you are measuring
Who should your users be in the study?
Sample must be a true representation of the population
Everyone who may
use your product
Participants in
your study
What do users do and say?
To what extent do they do it?
Why do they do it, and how do we fix it?
courtesy: uxdesign.cc
Categories of usability tests based on goals
•  Formative
- Beginning of and during the product
development phase
- Usability problems and fixes
•  Summative
- Towards the end of the development phase
- Statistically measured usability
Categories of usability tests based on data collected
•  Qualitative
- Descriptions (verbally or behaviorally)
- Directly measured
- Takes more effort to analyze
- Mostly earlier in the design phase
•  Quantitative
- Measurements (numbers)
- Indirectly measured
- Later in the design phase
User Experiments in Human-Computer Interaction
User Experiments
•  A method of academic research in HCI
- To discover/test/prove new knowledge
•  Hypothesis driven
- Compares multiple conditions to discover causal
relationships
•  Replicable (generalizable)
- Strives to remove bias and error (random assignment)
•  Draw conclusions with statistical tests of the
hypothesis
Usability Testing vs. User Experiments
•  The methods can be the same
•  The goals are often different
•  Usability testing goals
- Identify usability problems & issues of a product
•  User experiment goals
- Answer research questions, discovering new
knowledge (generalizable results)
Usability Testing vs. User Experiments
Usability Testing | User Experiment
Improve products | Discover knowledge
Few participants | Many participants
Results inform design | Results validated statistically
Usually not completely replicable (case-specific results) | Must be replicable (generalizable results)
Condition(s) controlled as much as possible | Strongly controlled conditions
Procedure planned | Experimental design
Results reported to product designer / developer | Scientific report to the scientific community
Designing User Experiments
•  Hypothesis (research question)
•  Experimental task
•  Independent variables (IV)
•  Dependent variables (DV)
•  Subjective
•  Objective
•  Other variables
•  Random, controlled, confounding
•  Experimental designs
•  Within-subjects, between-subjects, mixed-factorial
Hypothesis
•  A prediction of the outcome
- Based on research question but narrower
- A research question can be tested in multiple
hypotheses
- Causal relationship between IV and DV
- A precise statement that can be directly tested
through an experiment
e.g. Condition A will be faster than Condition B
Hypothesis
•  Null hypothesis (H0)
- Predicts there is no effect of IV on DV
- Statistical tests reject or fail to reject the null hypothesis
•  Alternative hypothesis (HA)
- Predicts there is an effect of IV on DV
•  H0 and HA are mutually exclusive
Hypothesis Testing
Statistical tests (next lecture) are subject to Type I and
Type II errors
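The trade-off can be seen in a small simulation (a sketch, not part of the lecture material): when H0 is actually true, a two-sided test at α = 0.05 should wrongly reject H0 (a Type I error) in roughly 5% of experiments. Sample size, test, and seed below are all made up for illustration.

```python
import random
from statistics import NormalDist, mean

random.seed(42)  # fixed seed only so the sketch is reproducible

def one_null_experiment(n=30, alpha=0.05):
    """Simulate one experiment where H0 is true (both conditions draw
    from the same distribution). Returns True if a two-sided z-test
    (known sd = 1) wrongly rejects H0 — a Type I error."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    z = (mean(a) - mean(b)) / (2 / n) ** 0.5   # standard error of the difference
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    return abs(z) > crit

# Over many null experiments the false-rejection rate hovers around alpha.
rate = sum(one_null_experiment() for _ in range(2000)) / 2000
print(round(rate, 3))  # close to 0.05
```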
Experimental Task
•  A task that participants will do in a study under
different conditions
e.g. in Fitts' Law studies participants click on buttons
using different input devices
•  Must be suitable to the application
- depends on the research question
•  Ideally risk-free
Independent Variables (IV)
•  Variables that are independent of participant's
behaviour
•  Systematically manipulated by the experimenter
•  Variables that the experimenter is interested in
•  There can be one or more IVs in an experiment
Typical Independent Variables
•  Technology (controlled)
- Types of technology, device, interface, design
•  User
- Physical/mental/social status
- age, gender, computer experience,
professional domain, education, culture,
motivation, mood, and disabilities
•  Context of use
- Environmental status (physical/social)
- Lighting, noise, indoor/outdoor, public/
private
Independent Variables vs. Experimental Conditions
Conditions = the factorial combinations of the levels of all IVs
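As a sketch of how conditions arise from IV levels (the two IVs and their level names here are hypothetical, chosen only for illustration):

```python
from itertools import product

# Hypothetical IVs: input device (2 levels) and environment (2 levels)
input_device = ["mouse", "touchscreen"]
environment = ["indoor", "outdoor"]

# The experimental conditions are the factorial combinations of all
# IV levels: 2 levels x 2 levels = 4 conditions.
conditions = list(product(input_device, environment))
for c in conditions:
    print(c)
```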
Dependent Variables (DV)
•  The outcome or effect that the researchers are
interested in
•  Dependent on participants’ behavior or the changes
in the IVs
•  Usually the outcomes that the researchers need to
measure
- measurements or observations
Dependent Variables (DV)
•  Subjective
- Based on users’ opinions, interpretations, points
of view, emotions and judgment
- More vulnerable to context and users’ status
- e.g. questionnaires, NASA TLX
•  Objective
- Not influenced by personal feeling/opinion
- Based on observation, compared against
standardized scale
- More consistent
- e.g. time, error
Typical Dependent Variables
•  Efficiency
- e.g. task completion time, speed
•  Accuracy
- e.g. error, success rate
•  Subjective satisfaction
- e.g. Likert scale ratings
•  Ease of learning
- e.g. test score, learning curve, retention rate
•  Physical or cognitive demand
- e.g. NASA task load index (TLX)
Other Variables
•  Controlled Variables
- Set to not change during an experiment
- The more controlled
- the more internal validity, but less
generalizable
•  Random Variables
- The more influence of random variable
- the less internal validity
•  Confounding Variables
- Variables that researchers failed to control
- damages internal validity
Validity of User Experiments
•  Internal Validity
- approximate truth about inferences regarding
cause-effect or causal relationships
- not relevant for observational studies
- higher under strict controlled lab conditions
•  External Validity
- the extent to which the conclusions of the
experiment are generalizable
- three types: population, environmental, and
temporal
Experimental Designs
•  Within-subjects
- Each subject performs under all the different conditions
- Repeated-measure
•  Between-subjects
- Each subject is assigned to one experimental
condition
- Independent samples
- Matched groups
•  Mixed-factorial
- Combination of the two
- More than one IV needed
Experimental Designs
Within-subjects:
Condition A: Participants 1 … 10
Condition B: Participants 1 … 10 (the same participants)
Between-subjects:
Condition A: Participants 1 … 10
Condition B: Participants 11 … 20
Within-Subjects vs. Between-Subjects
Within-subjects | Between-subjects
Subject to interference effects (e.g. practice / learning effect) | Avoids interference effects
Longer time for each participant (larger impact of fatigue and frustration) | Shorter time for each participant (less fatigue and frustration)
Individual differences can be isolated | Impacted by individual differences
Easier to detect differences between conditions | Harder to detect differences between conditions
Requires a smaller sample size | Requires a larger sample size
Counterbalance / randomize the order of presenting conditions | Randomized assignment to conditions, or matched groups
Randomization
•  Critical condition of a true experiment
•  The random assignment of treatments to the
experimental units or participants
•  No one, including the experimenters, can control the
assignments
•  Main way to minimize the effects of random
variables
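A minimal sketch of random assignment (the participant IDs, group sizes, and fixed seed are made up; the seed is there only so the sketch is reproducible):

```python
import random

random.seed(7)  # fixed seed only for reproducibility of the sketch

participants = [f"P{i}" for i in range(1, 21)]  # hypothetical pool of 20
conditions = ["A", "B"]

# Shuffle first, then deal participants out to conditions round-robin,
# so neither the experimenter nor the participant controls the allocation.
random.shuffle(participants)
groups = {c: participants[i::len(conditions)] for i, c in enumerate(conditions)}
for c in conditions:
    print(c, sorted(groups[c]))
```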
Counterbalancing
•  All possible permutations
- 3 conditions => 3P3 = 6 permutations
- (1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2),
(3,2,1)
- 4 conditions => 4P4 = 24 permutations
- (1,2,3,4), (1,2,4,3), (1,3,2,4), (1,3,4,2), …
•  Number of participants must be multiple of number
of permutations
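The permutation counts above can be checked directly with Python's standard library:

```python
from itertools import permutations

conditions = [1, 2, 3]
orders = list(permutations(conditions))  # 3P3 = 3! = 6 orderings
print(len(orders))
for o in orders:
    print(o)

# With 4 conditions the count jumps to 4! = 24, which is why full
# counterbalancing quickly becomes impractical as conditions are added.
print(len(list(permutations(range(4)))))
```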
Balanced Latin Square
•  Latin Square
- Each item occurs once in each row and column
•  Balanced Latin Square
- Each item both precedes and follows each
other item an equal number of times
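One common construction for this (a Williams design) can be sketched as follows. This version assumes an even number of conditions, numbered 0..n-1; for an odd n you would additionally run each row in reverse order.

```python
def balanced_latin_square(n):
    """Williams design for an even number n of conditions (0..n-1).
    Row i gives the presentation order for participant i (mod n)."""
    # First row interleaves from both ends: 0, 1, n-1, 2, n-2, ...
    first = [0]
    left, right = 1, n - 1
    while len(first) < n:
        first.append(left)
        left += 1
        if len(first) < n:
            first.append(right)
            right -= 1
    # Each later row shifts every condition by 1 (mod n).
    return [[(c + i) % n for c in first] for i in range(n)]

square = balanced_latin_square(4)
for row in square:
    print(row)
# Each condition appears once per row and once per column, and each
# condition immediately precedes every other condition exactly once.
```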
Participants / Subjects
•  The sample in your experiment
•  Number of participants
- Between-subjects design: 15~20 per condition
- Within-subjects design: 15~20 in total
- The smaller the effect size, the more
participants needed
- The more variance between users, the more
participants needed
- The more conditions in the experiment, the
more participants needed
Power Analysis
•  You can calculate the ideal number of participants
you have to test
•  Parameters needed:
- α: the probability of rejecting the H0 given
that the H0 is true (usually set to 0.05)
- 1-β (power): the probability of observing a
difference when it really exists (usually set to 0.8)
- Effect size: difference of mean divided by std.
dev.
•  Free program for power analysis: G*Power
http://www.gpower.hhu.de/en.html
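As a rough sketch of what such a tool computes, the normal-approximation formula n = 2((z_{1-α/2} + z_{1-β}) / d)² per group can be coded directly. This is only an approximation of a two-sided, two-sample comparison of means; the exact t-based answer from G*Power comes out slightly larger.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided, two-sample comparison
    of means, using the normal approximation (slight underestimate
    relative to the exact t-based calculation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # ~0.84 for power = 0.8
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Smaller effect sizes need many more participants:
for d in (0.8, 0.5, 0.2):  # Cohen's conventional large / medium / small
    print(d, sample_size_per_group(d))
```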
Errors
•  Random Errors
- Also called ‘chance errors’ or ‘noises’
- Cause variations in both directions
- Can be controlled by a large sample size +
randomization
•  Systematic Errors
- Also called ‘biases’
- Always push the measured value in the same direction
- No matter how large the sample is, they cannot be
offset unless the source of error is controlled
Errors
•  Five major sources
- measurement instruments
- experimental procedures
- sampling participants
- experimenter behavior
- experimental environment
After Designing the Study
•  Write down the design
- hypotheses
- task
- IVs and DVs
- design of the experiment
- participants
- randomization / counterbalancing
- data collection
•  Critically review your own design
•  Ask others to review your design
Activity
Fill out the template with your study design
You have designed a new application to resize
photos on mobile phones (Condition A) quickly.
There are several alternative solutions available in
the market; pick any one of them (Condition B).
Design a user experiment to compare Condition A
and Condition B.
Running User Experiments
• Experimental Procedure
• Pilot Study
• Main Study
Typical Experimental Session (1/2)
•  Ensure the apparatus are ready
- Both the system under test and measurement
devices
- Test-run
- Make sure forms, questionnaires etc. are printed,…
•  Greet the participants
•  Introduce the purpose of the study and the procedures
(experimenter script)
•  Get the consent of the participants
•  Assign the participants to a specific experiment
condition according to the pre-defined randomization
method
Typical Experimental Session (2/2)
•  Participants complete training task
•  Participants complete experimental tasks
•  Participants answer questionnaires (if any)
•  If within-subject design
- change conditions and repeat above
•  Debriefing session
- Collect details through interview
•  Compensation (always give some gift)
Pilot Study
•  A small trial run of the main study
- Can identify majority of issues with both the
prototype and the experimental design
•  Pilot testing checks:
- that the experimental plan is viable
- you can conduct the procedure
- your prototype and instruments for
measurement work appropriately
- the experimental task and environment
•  Iron out problems before doing the main experiment
•  This is not optional
As an Experimenter
•  Offload your Brain!
- Write down instructions and important
information
- Prepare checklists
- Print questionnaires and documents in advance
•  Take notes, document oddities
- Create templates
•  Rehearse procedures
- Do you need assistants?
•  Nothing is as bad as lost data - AVOID!!!
- Collect ASAP, Backup
Research Ethics
• Consent
• Respect
• Privacy
Consent
•  Participant has the right to know
- The experimental procedure
- What kind of data is collected
- Risks involved
- How the data will be stored and presented
•  Experimenter must
- Explain the experiment in detail
- Ask participant to sign a consent form
Respect Participants
•  They are volunteers and should be allowed to
- Take a rest (between conditions)
- Leave the experiment at any time without giving a reason
- Be given a token of appreciation (gift, money, etc.)
- Take time to organize (but don’t waste their time)
“ Do unto others as you would have them do unto
you.”
- MATTHEW 7:12
Privacy
•  Never disclose their identifiable data to anyone
without written consent
•  Data must be stored in secure locations
- Digitally and physically
•  Don’t use identifiable data, images, or videos in
reports or publications
Limitations
• No data collection method will be perfect
- control problems
- available technical equipment
• Differences
- Multiple researchers
- Multiple methods
- Multiple measures
- Objective vs. Subjective
- Qualitative vs. Quantitative
Limitations
• A single study cannot tell us everything
- Important to make sure it’s replicable
• One paper ≠ scientific truth
- Different researchers, different methods, all
coming to the same conclusion, that’s when you
find consensus
• Science is not static
- Theories evolve and change over time