Measuring the Quality
of Online Service
Jin Young Kim
Senior Applied Scientist
Microsoft Web Search and AI
About Jin Young Kim
• Data Scientist at Microsoft
• Quantified Self Enthusiast
(10 years of happiness tracking)
• Author of ‘Hello, Data Science’
(#1 Bestseller in Korea)
Data is the ingredient for addressing all of these issues
• Data for training and evaluating ML models
• Data for discovering defects and issues
• Data for monitoring the health of an existing service
• Data for measuring the value of a new service
Issues in Online Service Development
• Planning
• How to set business objectives and plans?
• Implementation
• How to train and improve ML models?
• Evaluation
• How satisfied are users with the service?
Plan → Execute → Evaluate
How can we collect data for these purposes?
Case Study: Data Collection for Restaurants
• Customer Behavior
• Facial expression
• Quantity of leftovers
• Pace of dining
Only limited types of data are available, possibly with a lot of noise
Case Study: Data Collection for Restaurants
• Panel Survey
• Satisfaction with the food
• Satisfaction with the service
• Satisfaction with the environment
Surveys can provide insights into customer satisfaction, but with some caveats
Data Collection for Online Service
• User Behavior
• Various ‘signals’ from behavioral data
• Limited types of data are available, with a lot of noise
• Requires a substantial user base
• Panel Survey
• Hire a panel of judges, or use crowdsourcing
• Collect feedback on all aspects of service quality
• Cost of hiring and maintaining a panel
Data Collection for Online Service (2)
• Direct User Feedback
• Request real-time feedback from customers
• Typically yields a low response rate and risks annoying users
• Widely used for personalized services (e.g., recommendations)
Panel Survey / User Behavior / User Feedback
How do major online service companies
collect data for measurement?
Search Engine: Google / Bing
• Early stage: panel-based surveys
• Later stage: experiments based on user behavior
• Source: Google
How to evaluate the quality of this SERP?
Social Network: Facebook
• Before: used only user behavior
• Nowadays: user behavior + panel survey + user feedback
• Source: Slate / Quora
By using panel surveys and user feedback in addition to
signals from user behavior, we could surface content that
users are actually satisfied with instead of click-bait.
- Julie Zhuo, VP of Product Design at Facebook
User feedback for Facebook News Feed
Recommendation System: Netflix
• Combine user feedback and behavior for measurement
• Source: Netflix
Movie Recommendations from Netflix
Algorithm A / Algorithm B
Can you tell whether algorithm A or B is better?
Even the users themselves can't!
Movie Recommendations from Netflix (2)
The results below are more relevant, but users engage more with the ones above.
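To make the engagement comparison concrete, here is a minimal sketch of how one might test whether two recommendation algorithms differ in engagement during an A/B test, assuming a simple binary engagement signal (e.g., the user played at least one recommended title). The counts, the function name, and the choice of a two-proportion z-test are illustrative assumptions, not Netflix's actual methodology.

```python
# Minimal sketch (not Netflix's actual methodology): compare a binary
# engagement rate between two recommendation algorithms in an A/B test
# using a two-proportion z-test. All counts below are made up.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: engagement rates are equal."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: users who played at least one recommended title.
z, p = two_proportion_z_test(success_a=5_300, n_a=50_000,
                             success_b=5_650, n_b=50_000)
print(f"z = {z:.2f}, p = {p:.4f}")
# A significant engagement difference still says nothing about relevance
# or long-term satisfaction, which is why other signals are needed.
```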
So, how should I collect data for my service?
Service Characteristics
• What signals can we extract from user behavior?
• Are there incentives for users to provide feedback?
Feasibility of Collection
• Do you already have a substantial volume of active users?
• Can a panel evaluate the user experience as a substitute?
Cost of Collection
• Do you have a marketing budget for building a user base, or for a panel survey?
How to evaluate the quality of this SERP?
Evaluation based on
user behavior
• Which result did users click?
• Is a click the only measure of satisfaction?
• How long did a user stay on a result?
• Is a longer dwell time always better?
• Do users come back and search repeatedly?
• Does loyalty mean satisfaction?
User behavior is an important clue, but a noisy one.
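As a rough illustration of how these behavioral questions turn into metrics, here is a minimal sketch over a hypothetical interaction log; the column names, the toy data, and the 30-second "satisfied click" threshold are assumptions for illustration, not a standard.

```python
# Minimal sketch over a hypothetical interaction log. Column names, data,
# and the 30-second "satisfied click" threshold are illustrative assumptions.
import pandas as pd

log = pd.DataFrame({
    "query":         ["crowdsourcing"] * 4,
    "result_id":     ["r1", "r2", "r1", "r3"],
    "clicked":       [1, 0, 1, 1],
    "dwell_seconds": [45, 0, 210, 8],
})

# Click-through rate and average dwell time per result.
behavior = log.groupby("result_id").agg(
    ctr=("clicked", "mean"),
    avg_dwell=("dwell_seconds", "mean"),
)
print(behavior)

# A common but noisy satisfaction proxy: a click followed by >= 30s of dwell.
log["satisfied_click"] = (log["clicked"] == 1) & (log["dwell_seconds"] >= 30)
print(log.groupby("result_id")["satisfied_click"].mean())
```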
How can you design a panel survey for SERP
evaluation?
How would you evaluate
the search results for
the query ‘crowdsourcing’?
Bad
Good
Excellent
Perfect
Q: Why do you think so?
Alternative: Evaluating a Webpage
How would you evaluate
the search results for
the query ‘crowdsourcing’?
Bad
Good
Excellent
Perfect
Q: Why do you think so?
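One common way to aggregate such graded judgments, whether for a whole SERP or for individual webpages, is a position-discounted metric like NDCG. The sketch below assumes an illustrative gain mapping of Bad/Good/Excellent/Perfect to 0-3 and made-up judgments.

```python
# Minimal sketch: turn graded panel judgments into NDCG. The Bad/Good/
# Excellent/Perfect -> 0-3 gain mapping and the judgments are illustrative.
from math import log2

GAIN = {"Bad": 0, "Good": 1, "Excellent": 2, "Perfect": 3}

def dcg(labels):
    # Position-discounted sum of gains: gain / log2(position + 1).
    return sum(GAIN[label] / log2(rank + 2) for rank, label in enumerate(labels))

def ndcg(labels):
    ideal = dcg(sorted(labels, key=GAIN.get, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for the top 5 results of the query 'crowdsourcing'.
judged_serp = ["Excellent", "Good", "Perfect", "Bad", "Good"]
print(f"NDCG@5 = {ndcg(judged_serp):.3f}")
```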
Alternative: Side-by-Side SERP Evaluation
Q: How would you
compare two results?
Left much better
Left slightly better
About the same
Right slightly better
Right much better
Q: Why do you think so?
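To show how side-by-side judgments might be rolled up, here is a minimal sketch that tallies the five preference options into left/right win counts and applies a simple sign test; the vote counts and the convention of dropping "about the same" judgments are assumptions for illustration.

```python
# Minimal sketch: aggregate side-by-side preference judgments and run a
# two-sided sign (binomial) test. Vote counts are hypothetical, and
# "about the same" judgments are dropped, which is one convention of several.
from math import comb

def sign_test_p(wins, losses):
    """Two-sided p-value for H0: left and right are equally likely to win."""
    n, k = wins + losses, max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

votes = {"left_much": 18, "left_slight": 22, "about_same": 30,
         "right_slight": 17, "right_much": 13}
left = votes["left_much"] + votes["left_slight"]
right = votes["right_much"] + votes["right_slight"]

print(f"left preferred: {left}, right preferred: {right}, ties: {votes['about_same']}")
print(f"sign-test p = {sign_test_p(left, right):.3f}")
```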
Conclusions
Summary…
• As a first step in data science, plan on collecting high-quality data
• Combine various data collection methods depending on the
characteristics and lifecycle of your service
• It takes careful consideration to get a panel survey right
For more information…
• What you need to know about data even if you’re not a Data Scientist
• SIGIR’2015 Tutorial on Offline Search Evaluation
• Offline Evaluation for Information Retrieval
Foundations and Trends in Information Retrieval (to appear)

Editor's Notes

  • #2: The title of this event is ‘How We Use Data’, but today I want to focus on data collection. Anyone who has worked with data will agree: once you have proper data, processing and using it is relatively easy.
  • #4: Most of these are data problems. There are many types of data involved, but the core task is measuring how customers respond to the service.
  • #5: The development process of an online service can be broadly divided into ~. Each stage comes with its own issues.
  • #7: To make this concrete, let's take a restaurant as an example. What data can we get from customer behavior?
  • #8: The missing data can be obtained through a panel survey, which hires a panel representing customer opinion and listens to their feedback.
  • #9: These data collection methods apply just as well to improving online services.
  • #10: We have looked at two methods so far; what if we combine them? That is, collect real-time feedback from users. If not done well, however, this yields a low response rate and may even annoy users.
  • #12: Now let's look at how major online service companies use these data collection methods.
  • #13: First, consider search, my own area of work. A variety of experimental techniques are used to improve search services ~ (covered in detail later).
  • #15: Facebook reportedly used only user logs in the early days of the service. More recently ~ by adding panel surveys and user feedback to user logs in News Feed ranking, they were able to surface more content that users are satisfied with instead of content that merely induces clicks.
  • #17: Netflix reportedly uses different data for evaluating its search service and its recommendation service.
  • #18: One reason is that it is hard to evaluate the output of a personalized recommendation service with a survey. For example, compare the results from two recommendation algorithms: even the users themselves find it hard to tell which is better!
  • #23: No grounds for comparison / What if the judge doesn’t understand the intent?
  • #24: No grounds for comparison / What if the judge doesn’t understand the intent?
  • #25: Should we use ‘about the same’ vs. ‘the same’?