ICS3211 - Intelligent Interfaces II
Combining design with technology for effective human-computer interaction
Week 9
Department of AI,
University of Malta,
2022
1
Testing & Evaluation
Week 9 overview:
• The What, Why and When of Evaluation & Testing
• Testing: expert review and lab testing
• Evaluation: formative/summative
• Evaluation: heuristic, cognitive walkthrough, usability testing
• Case study: evaluating different interfaces
2
Learning Outcomes
At the end of this session you should be able to:
• describe different forms of evaluation for different interfaces;
• compare and contrast the different evaluation methods across different contexts and identify the best one to use;
• list the various rules used in heuristic evaluation (Shneiderman & Nielsen);
• list the various types of usability testing involved in evaluation;
• combine the various evaluation methods to come up with a
method that is most suitable to the project chosen.
3
Introduction
• Why evaluate?
• Designers become too entranced
• What I like
• Sunk cost fallacy
• Experienced designers know extensive testing is required
• How do you test?
• A web site?
• An air traffic control system?
• When do you test?
4
What to Evaluate?
• What to evaluate may range from screen functions and aesthetic designs to workflows;
• Users of an ambient display may want to know if it
changes people’s behaviour;
• Class Activity: What aspects would you want to evaluate in a VR system designed to change users’ behaviour (you can choose which behaviour you would want to see modified)? Log in to Moodle VLE.
5
Ways of Categorising Evaluation
• Evaluation stages depend on the product being designed;
• Formative evaluations - done during development to check that a product continues to meet users’ needs
• Summative evaluations - done at the end to assess the success of the finished product
6
Evaluation Categories
• Cognitive Psychological Approaches
• Social Psychology Methods - Interviews and
Questionnaires
• Social Science Methods
• Engineering Approaches
7
Expert Review
• Colleagues or Customers
• Ask for opinions
• Considerations:
• What is an expert? User or designer?
• Half day to week
8
Formal Usability Inspection
• Experts hold courtroom-style meeting
• Each side gives arguments (in an adversarial
format)
• There is a judge or moderator
• Extensive and expensive
• Good for novice designers and managers
9
Expert Reviews
• Can be conducted at any time in the design process
• Focus on being comprehensive rather than being specific on improvements
• Example review recommendations
• Change the login procedure (from 3 to 5 minutes, because users were busy)
• Reordering sequence of displays, removing nonessential
actions, providing feedback.
• Also come up with features for future releases
10
Expert Review
• Placed in situation similar to user
• Take training courses
• Read documentation
• Take tutorials
• Try the interface in a realistic work environment (complete with noise and
distractions)
• Bird’s eye view
• Studying a full set of printed screens laid on the floor or pinned to the walls
• Helps reveal issues such as consistency
11
Heuristic Evaluation
• Give experts the heuristics and ask them to evaluate
• Shneiderman's "Eight Golden Rules of Interface
Design"
• Nielsen’s Heuristics
12
Shneiderman's "Eight Golden
Rules of Interface Design
• Strive for consistency
• Enable frequent users to use shortcuts
• Offer informative feedback
• Design dialog to yield closure
• Offer simple error handling
• Permit easy reversal of actions
• Support internal locus of control
• Reduce short-term memory load
13
Nielsen’s Heuristics
• Visibility of system status
• Match between system and the real world
• User control and freedom
• Consistency and standards
• Error prevention
• Recognition rather than recall
• Flexibility and efficiency of use
• Aesthetic and minimalist design
• Help users recognize, diagnose, and recover from errors
• Help and documentation
14
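The output of a heuristic evaluation is a list of findings, each tied to the heuristic it violates and given a severity rating (Nielsen's 0-4 severity scale is the usual choice). Below is a minimal sketch of how such findings might be recorded and tallied; the `Finding` structure and the example problems are illustrative assumptions, not part of the lecture material.

```python
from collections import Counter
from dataclasses import dataclass

NIELSEN = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]

@dataclass
class Finding:
    heuristic: str   # which of Nielsen's heuristics is violated
    location: str    # screen or widget where the problem occurs
    severity: int    # 0 = not a problem ... 4 = usability catastrophe

def summarise(findings: list[Finding]) -> Counter:
    """Count real problems (severity > 0) per violated heuristic."""
    return Counter(f.heuristic for f in findings if f.severity > 0)

findings = [
    Finding(NIELSEN[0], "checkout page", 3),  # no progress indicator
    Finding(NIELSEN[5], "search form", 2),    # codes must be memorised
]
print(summarise(findings))
```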
Consistency Inspection
• Verify consistency across family of interfaces
• Check terminology, fonts, color, layout, i/o formats
• Look at documentation and online help
• Also can be used in conjunction with software tools
15
Cognitive Walkthrough
• Experts “simulate” being users going through the interface
• Tasks are ordered by frequency
• Good for interfaces that can be learned by “exploratory
browsing”
• Experts usually walk through the tasks by themselves, then report their experiences (written, video) to a designers’ meeting
• Useful if the application is geared towards a group the designers might not be familiar with:
• Military, Assistive Technologies
16
Metaphors of Human Thinking (MOT)
• Experts consider metaphors for five aspects of human thinking
• Habit
• Stream of thought
• Awareness and Associations
• Relation between utterances and thought
• Knowing
• Appears better than cognitive walkthrough and
heuristic evaluation
17
Types of Evaluation
• Controlled settings involving users
• usability testing
• living labs
• Natural settings involving users
• field studies
• Any settings not involving users
18
Usability Testing and Labs
• In the 1980s, testing was a luxury (but deadlines crept up)
• Usability testing provided an incentive for meeting deadlines
• Fewer project overruns
• Sped up projects
• Cost savings
• Labs are different from academia
• Less general theory
• More practical studies
19
Staff
• Expertise in testing (psych, hci, comp sci)
• 10 to 15 projects per year
• Meet with UI architect to plan testing (Figure 4.2)
• Participate in early task analysis and design reviews
• T minus 2-6 weeks: create the study design and test plan
• E.g. Who are the participants? Beta testers, current customers, in-company staff, advertising
• T minus 1 week: pilot test (1-3 participants)
20
Participants
• Labs categorize users based on:
• Computing background
• Experience with task
• Motivation
• Education
• Ability with the language used in the interface
• Controls for
• Physical concerns (e.g. eyesight, handedness, age)
• Experimental conditions (e.g. time of day, physical surroundings, noise,
temperature, distractions)
21
Recording Participants
• Logging is important, yet tedious
• Software to help
• Powerful to see people use your interface
• New approaches: eye tracking
• IRB items
• Focus users on interface
• Tell them the task, duration
22
Thinking Aloud
• Concurrent think aloud
• Invite users to think aloud
• Nothing they say is wrong
• Don’t interrupt, let the user talk
• Spontaneous, encourages positive suggestions
• Can be done in teams of participants
• Retrospective think aloud
• Asks people afterwards what they were thinking
• Issues with accuracy
• Does not interrupt users (timings are more accurate)
23
Types of Usability Testing
• Paper mockups and prototyping
• Inexpensive, rapid, very productive
• Low fidelity is sometimes better
24
http://expressionflow.com/wp-content/uploads/2007/05/paper-mock-up.png
http://user.meduni-graz.at/andreas.holzinger/holzinger/papers%20en/
Types of Usability Testing
• Discount usability testing
• Test early and often (with 3 to 6 testers)
• Pros: Most serious problems can be found with 6 testers. Good for
formative evaluation (early)
• Cons: Complex systems can’t be tested this way. Not good for summative
evaluation (late)
• Competitive usability testing
• Compare against prior or competitor’s versions
• Experimenter bias, be careful to not “prime the user”
• Within-subjects is preferred
25
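The claim that 3 to 6 testers surface most serious problems comes from Nielsen and Landauer's problem-discovery model, in which the expected fraction of problems found by n testers is 1 - (1 - λ)^n, where λ is the probability that a single tester hits a given problem (about 0.31 in their published data). A short sketch, with λ treated as an assumption you would re-estimate for your own system:

```python
def proportion_found(n: int, lam: float = 0.31) -> float:
    """Expected fraction of usability problems found by n testers
    under the Nielsen-Landauer model: 1 - (1 - lam)**n."""
    return 1 - (1 - lam) ** n

for n in (1, 3, 6, 15):
    print(f"{n:2d} testers -> {proportion_found(n):.0%} of problems")
# With lam = 0.31, six testers already uncover roughly 89% of problems,
# which is why discount testing works for formative (early) evaluation.
```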
Types of Usability Testing
• Universal usability testing
• Test with highly diverse
• Users (experience levels, ability, etc.)
• Platforms (Mac, PC, Linux)
• Hardware (old (how old is old?) -> latest)
• Networks (dial-up -> broadband)
• Field tests and portable labs
• Tests UI in realistic environments
• Beta tests
26
Types of Usability Testing
• Remote usability testing (via web)
• Recruited via online communities, email
• Large n
• Difficulty in logging and validating data
• Software can help
• "Can you break this" tests
• Challenge testers to break a system
• Games, security, public displays
27
Limitations
• Focuses on first-time users
• Limited coverage of interface features
• Emergency (military, medical, mission-critical)
• Rarely used features
• Difficult to simulate realistic conditions
• Testing mobile devices
• Signal strength
• Batteries
• User focus
• Yet formal studies of usability testing have identified
• Cost savings
• Return on investment (Sherman 2006, Bias and Mayhew 2005)
28
Survey Instruments
• Questionnaires
• Paper or online (e.g. surveymonkey.com)
• Easy to grasp for many people
• The power of many can be shown
• 80% of the 500 users who tried the system liked Option A
• 3 out of the 4 experts liked Option B
• Success depends on
• Clear goals in advance
• Focused items
29
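A headline figure such as "80% of the 500 users liked Option A" is more persuasive with an error bar attached. Below is a sketch computing a 95% Wilson score interval for that proportion; the numbers mirror the slide's example, and z = 1.96 is the usual 95% value.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

lo, hi = wilson_interval(400, 500)  # 80% of 500 users liked Option A
print(f"80% approval, 95% CI: [{lo:.1%}, {hi:.1%}]")
# roughly [76.3%, 83.3%]: the "power of many", now with error bars
```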
Designing survey questions
• Ideally
• Based on existing questions
• Reviewed by colleagues
• Pilot tested
• Direct activities are better than gathering statistics
• Fosters unexpected discoveries
• Important to pre-test questions
• Understandability
• Bias
30
Likert Scales
• Most common methodology
• Strongly Agree, Agree, Neutral, Disagree, Strongly
Disagree
• 5, 7, 9-point scales
• Examples
• Improves my performance in book searching and
buying
• Enables me to search and buy books faster
• Makes it easier to search for and purchase books
31
Most-Used Likert Scales
• Questionnaire for User
Interaction Satisfaction
• E.g. questions
• How long have you
worked on this system?
• System Usability Scale
(SUS) – Brooke 1996
• Post-Study System Usability Questionnaire
• Computer System Usability Questionnaire
• Software Usability Measurement Inventory
• Website Analysis and MeasureMent Inventory
• Mobile Phone Usability Questionnaire
• Validity, Reliability
32
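Of the instruments listed above, the System Usability Scale (Brooke 1996) has a fixed published scoring rule: ten items answered on a 1-5 scale, odd-numbered items contribute (score - 1), even-numbered items contribute (5 - score), and the total is multiplied by 2.5 to give a 0-100 score. A sketch (the sample responses are invented):

```python
def sus_score(responses: list[int]) -> float:
    """Score one completed SUS questionnaire (Brooke 1996).

    responses: ten answers on a 1-5 scale, in question order.
    Odd-numbered items are positively worded (score - 1);
    even-numbered items are negatively worded (5 - score).
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses in the range 1-5")
    total = sum(r - 1 if i % 2 == 0 else 5 - r      # i is 0-based,
                for i, r in enumerate(responses))   # so even i = odd item
    return total * 2.5                              # rescale to 0-100

print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```

Scores above roughly 68 are conventionally read as above-average usability.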
Bipolar Semantically Anchored Items
• Coleman and Williges (1985)
• Pleasant versus Irritating
• Hostile 1 2 3 4 5 6 7 Friendly
• If needed, take existing questionnaires and alter
them slightly for your application
33
Acceptance Tests
• Set goals for performance
• Objective
• Measurable
• Examples
• Mean time between failures (e.g. MOSI)
• Test cases
• Response time requirements
• Readability (including documentation and help)
• Satisfaction
• Comprehensibility
34
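Acceptance criteria only work when stated measurably, e.g. "95% of responses complete within 2 seconds" rather than "the system should be fast". Below is a sketch of such a check over logged response times; the threshold, quantile, and timing data are illustrative assumptions.

```python
def meets_response_goal(timings_s: list[float],
                        limit_s: float = 2.0,
                        quantile: float = 0.95) -> bool:
    """Objective, measurable acceptance test: at least `quantile`
    of observed response times must fall within `limit_s` seconds."""
    within = sum(t <= limit_s for t in timings_s)
    return within / len(timings_s) >= quantile

timings = [0.8, 1.1, 1.4, 1.9, 0.7, 2.4, 1.2, 1.0, 1.6, 1.3]
print(meets_response_goal(timings))  # False: only 9/10 = 90% within 2 s
```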
Let’s discuss
You want your project to be user friendly.
• Choose Shneiderman’s or Nielsen’s heuristics to provide an evaluation methodology:
• What kind of setting would you use?
• How much control would you want to exert?
• Which methods are recorded and when will they
be recorded?
35
Acceptance Tests
• By completing the acceptance tests
• Can be part of contractual fulfillment
• Demonstrate objectivity
• Different from usability tests
• More adversarial
• A neutral party should conduct them
• Ex. Video game and smartphone companies
• App Store, Microsoft, Nintendo, Sony
36
Evaluation during use
• Evaluation methods after a product has been released
• Interviews with individual users
• Get very detailed on specific concerns
• Costly and time-consuming
• Focus group discussions
• Patterns of usage
• Certain people can dominate or sway opinion
• Targeted focus groups
37
Continuous Logging
• The system itself logs usage data
• Video game example
• Other examples
• Track frequency of errors (gives an ordered list of what to address via tutorials,
training, text changes, etc.)
• Speed of performance
• Track which features are used and which are not
• Web Analytics
• Privacy? What gets logged? Opt-in/out?
• What about companies?
38
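Continuous logging amounts to instrumenting the running system to count events, so that error frequencies can be ranked to prioritise tutorials, training, and text changes, as the slide suggests. A minimal in-memory sketch; the event names are hypothetical:

```python
import time
from collections import Counter

class UsageLog:
    """In-memory usage logger: timestamps every event and keeps
    separate counts of feature use and errors."""
    def __init__(self):
        self.events: list[tuple[float, str, str]] = []
        self.errors = Counter()
        self.features = Counter()

    def log(self, kind: str, name: str) -> None:
        self.events.append((time.time(), kind, name))
        (self.errors if kind == "error" else self.features)[name] += 1

    def top_errors(self, k: int = 3):
        """Ordered list of what to address first."""
        return self.errors.most_common(k)

log = UsageLog()
log.log("feature", "search")
log.log("error", "invalid_date_format")
log.log("error", "invalid_date_format")
print(log.top_errors())  # [('invalid_date_format', 2)]
```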
Online and Telephone Help
• Users enjoy having people ready to help (real-time
chat online or via telephone)
• E.g. Netflix has 8.4 million customers; how many
telephone customer service reps?
• 375
• Expensive, but higher customer satisfaction
• Cheaper versions use Bug Report systems
39
Automated Evaluation
• Software for evaluation
• Low level: Spelling, term concordance
• Metrics: number of displays, tabs, widgets, links
• World Wide Web Consortium Markup Validation
• US NIST Web Metrics Testbed
• New research areas: Evaluation of mobile platforms
40
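Low-level automated metrics such as counts of links and widgets can be computed directly from the markup. A sketch using Python's standard html.parser module; which tags count as "widgets" is an assumption of this example, not a standard:

```python
from collections import Counter
from html.parser import HTMLParser

class MetricCounter(HTMLParser):
    """Counts links and form widgets in an HTML page - the kind of
    low-level metric an automated evaluation tool reports."""
    WIDGETS = {"input", "button", "select", "textarea"}  # assumed widget set

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.counts["links"] += 1
        elif tag in self.WIDGETS:
            self.counts["widgets"] += 1

page = '<a href="/">home</a><input type="text"><button>Go</button>'
mc = MetricCounter()
mc.feed(page)
print(mc.counts)  # Counter({'widgets': 2, 'links': 1})
```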
Case Study
• Computer Game:
• Physiological responses used to evaluate users’
experiences;
• Video of participants playing - observation;
• User satisfaction questionnaire;
• Possibilities of applying crowdsourcing for online
performance evaluations
41