4. WHY EVALUATE WITH “USABILITY EVALUATION”?
Following guidelines is never sufficient for good user interfaces.
We need both good design and user studies.
Similar to working with users in Contextual Inquiry.
Note on terminology: “users” and “subjects” are now usually called “participants.”
5. THE “DON’TS” IN USABILITY EVALUATIONS (1)
Don’t evaluate whether it works (that is quality assurance).
Don’t have the experimenters (you) evaluate it; recruit participants.
Don’t (just) ask participants questions. This is NOT an “opinion survey.” Instead, watch their behavior.
Don’t evaluate with groups: see how well the product works for each person individually (this is not a “focus group”).
6. “DON’TS” OF USABILITY EVALUATIONS (2)
Don’t train participants:
We need to see if they can figure it out themselves.
Don’t test the participant; evaluate the product.
It is NOT a “user test”; it is called a usability evaluation instead.
Don’t put your ego as a designer on the line.
7. ISSUE: RELIABILITY
Do the results generalize to other people?
There may be individual differences among participants.
If comparing two products, use statistics: confidence intervals, p < .01 (a small sketch follows below).
A small number of participants cannot evaluate the entire website or app; they are just a sample.
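A minimal sketch (not from the slides) of that statistics step, using made-up task-completion times and SciPy’s two-sample t-test:

    # Hypothetical example: comparing task-completion times (seconds)
    # for two products. The numbers below are made up for illustration.
    from scipy import stats

    product_a = [42.1, 38.5, 51.0, 45.2, 39.8, 47.3, 44.0, 41.6]
    product_b = [55.4, 49.9, 60.2, 52.7, 58.1, 50.5, 56.8, 53.3]

    # Two-sample t-test: is the difference in mean times reliable?
    t_stat, p_value = stats.ttest_ind(product_a, product_b)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

    # A p-value below the chosen threshold (the slide suggests p < .01)
    # supports the claim that the products really differ for the
    # population, not just for this sample of participants.
    if p_value < 0.01:
        print("Difference unlikely to be individual variation alone.")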
8. ISSUE: VALIDITY
Did the evaluation measure what we want?
Wrong participants.
“Confounding” factors: issues that were not controlled but may be relevant to the evaluation, such as other usability problems, settings, etc.
Ordering effects.
Learning effects.
Too much help given to some participants.
9. PLAN OUR EVALUATION
Goals:
Formative – helps decide features and design; Contextual Inquiry (back then) was formative.
Summative – evaluates the product; Usability Evaluation (now) is summative.
Pilot evaluations:
Preliminary evaluations to check materials, look for bugs, etc.
Evaluate the instructions, timing, etc.
Participants do not have to be representative.
10. EVALUATION DESIGN
Within Subjects
Each participant does all conditions.
Removes individual differences.
Adds ordering effects; to counter them, just randomize the order (see the sketch after this list)!
Between Subjects
Each participant does one condition.
Quicker for each participant.
But needs more participants, due to the huge variation among people.
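A minimal sketch, with hypothetical interface names, of how condition orders can be counterbalanced or simply randomized in a within-subjects design:

    # Hypothetical sketch: assigning condition orders so that ordering
    # effects average out across participants.
    import itertools
    import random

    conditions = ["Interface A", "Interface B", "Interface C"]

    # Full counterbalancing: cycle through every permutation.
    orders = list(itertools.permutations(conditions))
    for participant_id in range(12):
        order = orders[participant_id % len(orders)]
        print(f"P{participant_id + 1}: {' -> '.join(order)}")

    # With many conditions, full counterbalancing needs too many
    # participants; the slide's simpler alternative is to randomize
    # the order independently for each participant:
    random_order = random.sample(conditions, k=len(conditions))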
11. SOME MEASUREMENTS
Learnability
Efficiency
Errors
Web Analytics
Questionnaire
12. ANALYZING THE MEASUREMENT DATA
Numeric Data
Example: times, number of errors, etc.
Tables and plots using a spreadsheet.
Look for trends and outliers (a small sketch follows below).
Organize Problems by:
Scope: How widespread is the problem?
Severity: How critical is the problem?
Report template: http://www.cs.cmu.edu/~bam/uicourse/UsabilityEvalReport_template2016.docx
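A minimal sketch with made-up task times; the 1.5 x IQR outlier rule is an assumption, since the slide only says to look for trends and outliers:

    # Hypothetical sketch: summarizing task times and flagging outliers
    # with the common 1.5 * IQR rule.
    import statistics

    task_times = [34.2, 29.8, 41.0, 36.5, 120.4, 33.1, 38.9, 30.7]  # seconds

    times = sorted(task_times)
    q1, _, q3 = statistics.quantiles(times, n=4)  # quartiles
    iqr = q3 - q1

    outliers = [t for t in times
                if t < q1 - 1.5 * iqr or t > q3 + 1.5 * iqr]
    print(f"median = {statistics.median(times):.1f}s, outliers = {outliers}")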
13. GOAL LEVELS
Pick Levels for product:
• Theoretical best level
• Desired (planned) level
• Minimum acceptable level
• Current level or competitor's level
Example (errors per task): Best = 0, Desired = 1, Minimum acceptable = 2, Current = 5.
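A small hypothetical sketch of checking a measurement against these goal levels; the function name and thresholds are illustrative only:

    # Hypothetical sketch: classify a measured error count against the
    # slide's example goal levels (best 0, desired 1, minimum
    # acceptable 2, current 5 errors per task).
    GOAL_LEVELS = {"best": 0, "desired": 1,
                   "minimum acceptable": 2, "current": 5}

    def rate_errors(errors_per_task: float) -> str:
        """Return the best goal level that the measurement meets."""
        for label, threshold in GOAL_LEVELS.items():
            if errors_per_task <= threshold:
                return label
        return "worse than current"

    print(rate_errors(1.4))  # -> "minimum acceptable"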
14. QUESTIONNAIRE DESIGN (1)
Collect general demographic information that may be relevant
Evaluate feelings towards your product and other products
Important to design questionnaire carefully, otherwise:
Participants may find questions confusing
May not measure what you are interested in
15. QUESTIONNAIRE DESIGN (2)
“Likert scale”
Propose a statement and let people agree or disagree:
The product was easy to use:  agree 1 .. 2 .. 3 .. 4 .. 5 disagree
“Semantic differential scale”
Two opposite feelings:
Finding the right information was:  difficult -2 .. -1 .. 0 .. 1 .. 2 easy
If there are multiple choices, have participants rank order them:
Rank the choices in order of preference (with 1 being most preferred and 4 being least):
Interface #1  Interface #2  Interface #3  Interface #4
(in a real survey, describe the interfaces)
16. QUESTION DESIGN (STRATEGY)
Apply clear writing. Use simple sentences.
If participants make mistakes, the questionnaire is invalid.
Put all positive answers in the same column. Do not alternate positive and negative wording!
This website was easy to use.
It was difficult to find what I needed on this website.
(The second item is negatively worded; a scoring sketch follows below.)
Use ranges in the answer options:
Up to 1,000
1,000 – 10,000
Bigger than 10,000
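A small hypothetical sketch of scoring such a questionnaire; reverse-coding negatively worded items is standard survey practice, but the exact scheme here is an assumption:

    # Hypothetical sketch: scoring a 5-point Likert questionnaire.
    # If a negatively worded item does slip in, its score must be
    # reverse-coded before averaging.
    SCALE_MAX = 5

    responses = {
        "This website was easy to use.": 4,
        "It was difficult to find what I needed on this website.": 2,
    }
    negative_items = {
        "It was difficult to find what I needed on this website.",
    }

    scores = []
    for item, answer in responses.items():
        if item in negative_items:
            answer = SCALE_MAX + 1 - answer  # reverse-code: 2 becomes 4
        scores.append(answer)

    print(f"mean satisfaction = {sum(scores) / len(scores):.2f} / {SCALE_MAX}")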
17. STANDARD (VALIDATED) QUESTIONNAIRES
“Questionnaire for User Interface Satisfaction” (QUIS)
Chin, J.P., Diehl, V.A., Norman, K.L. (1988). Development of an Instrument Measuring User Satisfaction of the Human-Computer Interface. ACM CHI'88 Proceedings, 213-218.
http://hcibib.org/perlman/question.cgi?form=QUIS
18. OTHER QUESTIONNAIRE EXAMPLE
See the UX Book, page 446.
19. VIDEOTAPING
Useful, but very slow to analyze.
Good for demonstrating problems to developers or management.
Facilitates impact analysis.
20. “THINK ALOUD” PROTOCOLS
Get the participant to continuously verbalize their thoughts.
Encourage participants to expand on whatever seems interesting.
May need to “coach” the participant to keep talking.
Ask general questions:
“What did you expect?”
“What are you thinking now?”
Not:
“What do you think that button is for?”
“Why didn’t you click here?”
21. NUMBER OF PARTICIPANTS
About 30 for statistical studies.
As few as 5 for a usability evaluation.
Reference: https://www.nngroup.com/articles/how-many-test-users/
Testing more participants didn't result in appreciably more insights (see the sketch below).
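A sketch of the diminishing-returns model from Nielsen and Landauer that underlies the cited article; L = 0.31 is their reported average:

    # The share of usability problems found by n participants is
    # roughly 1 - (1 - L)^n, with L ~ 0.31 problems revealed per
    # participant on average (Nielsen & Landauer).
    L = 0.31

    for n in range(1, 11):
        found = 1 - (1 - L) ** n
        print(f"{n:2d} participants: ~{found:.0%} of problems found")
    # Around n = 5 the curve is already near 85%, which is why about
    # five participants per round is the usual recommendation.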
22. ETHICAL CONSIDERATIONS
No harm to the participants.
Emphasize that the product is being evaluated, not the participants.
Results of the evaluation and participants’ identities are kept confidential.
Stop the evaluation if the participant is too upset.
At the end, ask for comments and thank the participants.
23. HAWTHORNE EFFECT (1)
Definition: When people are aware that they are being observed, they change their normal behavior unintentionally.
Example: You are observing how a participant interacts with an app. The participant is informed that their actions on the app will be recorded. As a result, the participant may be extra careful not to make mistakes on the app to avoid embarrassment.
24. HAWTHORNE EFFECT (2)
Solution: Inform the participant that there’s no right or wrong way of completing their tasks during the research or experiment. Provide smaller warm-up tasks at the beginning of the session so that the participant can become comfortable with the environment.
#11:Learnability: Time to learn how to do specific tasks (at a specific proficiency).
Efficiency: (Expert) Time to execute benchmark (typical) tasks. Throughput.
Errors: Error rate per task. Time spent on errors. Error severity.
Lots of measures from web analytics: abandonment rates, completion rates, clickthroughs, % completions, etc.
Subjective satisfaction: Questionnaire.
Performance measurements: time, number of tasks completed, number of errors, severity of errors, number of times help was needed, quality of results, emotions, etc.
Decide in advance what is relevant.
Can get quantifiable, objective numbers (“Usability Engineering”).
Can instrument the software to take measurements (a small sketch follows below), or try to log results “live” or from videotape.
Some measures are available from web analytics.
Emotions and preferences come from questionnaires and from apparent frustration or happiness with the product.
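A minimal hypothetical sketch of such instrumentation; run_task and the logged fields are illustrative, not from the notes:

    # Hypothetical sketch: a timer that logs how long each benchmark
    # task takes and how many errors occurred.
    import json
    import time

    log = []

    def run_task(task_name, task_fn):
        """Run one evaluation task and record timing and error count."""
        start = time.monotonic()
        errors = task_fn()  # assumed to return the number of errors observed
        log.append({
            "task": task_name,
            "seconds": round(time.monotonic() - start, 2),
            "errors": errors,
        })

    # Example usage with a stand-in task:
    run_task("find product page", lambda: 1)
    print(json.dumps(log, indent=2))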
#14:Collect general demographic information that may be relevant
Age, sex, computer experience, etc.
Evaluate feelings towards your product and other products
Important to design questionnaire carefully
Participants may find questions confusing
May not answer the question you think you are asking
May not measure what you are interested in
#16:It is very hard to design questions that cannot be misunderstood or misread.
Use clear writing and simple sentences.
If participants make mistakes, the questionnaire is invalid.
For example, keep all positive answers in one column; do not alternate (ref: http://www.measuringu.com/positive-negative.php).
This website was easy to use.
It was difficult to find what I needed on this website.
Alternating wording to force people to pay attention doesn’t work; participant confusion outweighs any benefit.
Examples of problems:
“How big is the codebase for this project?”
300
50k SLOC
342658
2000 files
Big
Revised: have ranges instead of a textbox:
Up to 1000 LOC
1000 – 10,000
10,000 – 100,000
100,000 – 1,000,000
Bigger than 1,000,000
#19:Often useful for measuring after the evaluation.
But very slow to analyze and transcribe.
Useful for demonstrating problems to developers and management; it is compelling to see someone struggling.
Facilitates impact analysis (a small sketch follows below):
Which problems will be most important to fix?
How many participants were affected, and how much time was wasted on each problem?
But careful notetaking will often suffice when usability problems are noticed.
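One possible way to compute an impact score; the frequency-times-time formula and problem names are assumptions for illustration, not from the notes:

    # Hypothetical sketch of a simple impact analysis: weight each
    # problem by how many participants hit it and how much time it
    # cost them, then sort by that score.
    problems = [
        {"name": "confusing checkout button",
         "participants_hit": 5, "minutes_lost": 12.0},
        {"name": "hidden search field",
         "participants_hit": 2, "minutes_lost": 3.5},
    ]

    def impact(p):
        return p["participants_hit"] * p["minutes_lost"]

    for p in sorted(problems, key=impact, reverse=True):
        print(f"{p['name']}: impact score {impact(p):.1f}")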
#20:“Single most valuable usability engineering method” – Nielsen.
Get the participant to continuously verbalize their thoughts.
Find out why the participant does things: what they thought would happen, why they are stuck, frustrated, etc.
Encourage participants to expand on whatever seems interesting.
But it interferes with timings.
May need to “coach” the participant to keep talking; it is unnatural to describe what you are thinking.
Ask general questions: “What did you expect?”, “What are you thinking now?”
Not: “What do you think that button is for?”, “Why didn’t you click here?” These will “give away” the answer or bias the participant.
Alternative: have two participants and encourage discussion.
#21:About 30 for statistical studies.
As few as 5 for usability evaluation.
Can update the product after each participant to correct problems.
But can be misled by “spurious behavior” of a single person: accidents, or someone who is just not representative.
cite: https://www.nngroup.com/articles/how-many-test-users/
Five participants cannot evaluate all of the product; use different tests for different parts.
Can’t just do longer tests.
#22:No harm to the participants.
Emotional distress: highly trained people are especially concerned about looking foolish.
Emphasize that the product is being evaluated, not the participant.
Results of the evaluation and participants’ identities are kept confidential.
Stop the evaluation if the participant is too upset.
At the end, ask for comments, explain any deceptions, and thank the participants.
#25:Who runs the experiment?
Trained usability engineers know how to run a valid usability evaluation; they are called “facilitators.”
Good methodology is important: 2-3 vs. 5-6 of 8 usability problems found.
But it is useful for developers & designers to watch, and to be available if the product (i.e., the system) crashes or the participant gets completely stuck.
But you have to keep them from interfering (Randy Pausch’s strategy).
Having at least one observer (notetaker) is useful.
A common error is helping too early; don’t!
Where to evaluate?
Usability labs: cameras, two-way mirrors, specialists, and a separate observation and control room.
Should disclose who is watching.
Having a lab may increase the number of usability evaluations in an organization.
But you can usually perform an evaluation anywhere, using a portable video recorder, screen recorder, etc.
Stages of an Evaluation
Preparation
Introduction
Running the evaluation
Cleanup after the evaluation
Preparation and Introduction
Make sure the evaluation is ready to go before the participant arrives.
Introduce the observation phase; say the purpose is to evaluate the software.
Consent form.
Pre-test questionnaire.
Give instructions, including how to do a think-aloud.
Write down a script to make sure it is consistent for all participants.
Final instructions (“rules”):
Say that you won’t be able to answer questions during the session, but if questions cross their mind, they should say them aloud.
“If you forget to think aloud, I’ll say ‘Please keep talking.’”