1
Practical lessons in mining and evaluating information systems
Hung-Hsuan Chen, National Central University
Data Analytics Research Team (DART)
• Discover the problems or needs (need)
• Have the programming and math skills, and the domain knowledge, to solve the problem (skill)
• Have the passion to realize the plan (passion)
2
https://ncu-dart.github.io/
My background
• An engineer wearing a scientist’s hat?
• Deep learning and ensemble learning on recommender systems (2014 – 2018)
• Academic search engine CiteSeerX (2008 – 2013)
§ 4M+ documents
§ 87M+ citations
§ 2M – 4M hits per day
§ 300K+ monthly downloads
§ 100K documents added monthly
3
Outline
• I will present 4 common pitfalls in training and evaluating recommender systems
• These pitfalls have appeared in many previous studies on recommender systems and information systems
• Details are in the following paper:
§ Chen, H.-H., Chung, C.-A., Huang, H.-C., & Tsui, W. (2017). Common pitfalls in training and evaluating recommender systems. ACM SIGKDD Explorations Newsletter, 19(1), 37–45.
4
A typical flow to build a recommender system
5
[Timeline: t0 → ts → t1 → t2]
• [t0, ts): no-recommendation period. The logs (e.g., clickstream) of this period are used to train the initial recommendation algorithm Rorig.
• [ts, t2): the initial recommendation algorithm Rorig is applied online. The logs of this period are used to train and compare the initial algorithm Rorig and the new algorithm Rnew:
§ [ts, t1): data used to train the new recommendation algorithm Rnew and to re-train the original algorithm Rorig.
§ [t1, t2): test data to compare Rorig and Rnew.
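To make this timeline concrete, below is a minimal Python sketch of the time-based split; the Event schema, field names, and timestamps are illustrative assumptions, not the original log format.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical log schema; real clickstream records carry more fields.
Event = namedtuple("Event", ["timestamp", "user_id", "item_id"])

logs = [
    Event(datetime(2015, 1, 15), "u1", "p42"),
    Event(datetime(2015, 2, 10), "u2", "p7"),
    Event(datetime(2015, 3, 20), "u1", "p13"),
]

t0 = datetime(2015, 1, 1)  # logging starts; no recommendation shown yet
ts = datetime(2015, 2, 1)  # R_orig goes online
t1 = datetime(2015, 3, 1)  # training window for R_new ends
t2 = datetime(2015, 4, 1)  # test window ends

# [t0, ts): logs of the no-recommendation period train R_orig.
bootstrap = [e for e in logs if t0 <= e.timestamp < ts]
# [ts, t1): logs train R_new and re-train R_orig.
train = [e for e in logs if ts <= e.timestamp < t1]
# [t1, t2): held-out logs compare R_orig against R_new.
test = [e for e in logs if t1 <= e.timestamp < t2]
```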
Issue 1: the trained model may be biased toward highly reachable products
7
Clicks resulting from in-page direct links

Day          Day 1      Day 2
Percentage   19.3150%   21.2812%
8
• If we use the clickstreams to generate the positive samples, then rearranging the layout of the pages or the link targets in the pages would change approximately 1/5 of the positive training instances.
Percentage of promoted products in the recommendation list

Method      MC      CategoryTP   TotalTP   ICF-U2I   ICF-I2I   NMF-U2I   NMF-I2I
train-all   100%    1.48%        1.84%     93.22%    1.40%     1.48%     1.34%
train-sel   1.08%   0.86%        0.98%     14.46%    1.28%     1.32%     1.24%
9
• When using train-all as the training data, several algorithms recommend many of the “promoted products”
§ We seem to learn the “layout” of the product page (i.e., the direct links from one product page to another) instead of the intrinsic relatedness between products; a sketch of the train-sel idea follows
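As a hedged sketch of how train-sel could be built (the field names, especially via_direct_link, are hypothetical): drop the clicks that arrived through an in-page direct link, so the positive pairs reflect user intent rather than page layout.

```python
# Hypothetical click records: each click stores the source page, the
# destination page, and whether it followed an in-page direct link
# (e.g., a promoted-product slot on the source page).
clicks = [
    {"src": "p1", "dst": "p2", "via_direct_link": True},
    {"src": "p1", "dst": "p9", "via_direct_link": False},  # e.g., via search
    {"src": "p3", "dst": "p2", "via_direct_link": False},
]

# train-all: every observed transition becomes a positive pair.
train_all = [(c["src"], c["dst"]) for c in clicks]

# train-sel: keep only transitions not explained by the page layout itself,
# i.e., clicks the user made through search, category pages, and so on.
train_sel = [(c["src"], c["dst"]) for c in clicks if not c["via_direct_link"]]
```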
Lessons learned
• The common wisdom that the clickstream represents a user’s interest/habit could be problematic
§ Clickstreams are highly influenced by the reachability of the products and the layouts of the product pages
• Training a recommender system on clickstreams is likely to learn
§ The “layout” of the pages
§ The recommendation rules of the online recommender system
• We need to select the training data more carefully
10
Issue 2: the online recommendation algorithm affects the distribution of the test data
11
CTRs when using different online recommendation algorithms
12
Lessons learned
• Previous studies sometimes use all the available test data as the ground truth for evaluation
• Unfortunately, such an evaluation process inevitably favors the algorithms that suggest products similar to those of the online recommendation algorithm
• We should select the test dataset carefully to perform a fairer evaluation; a minimal sketch follows
13
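A minimal sketch of one such selection, assuming a hypothetical source field on each logged click: exclude test clicks that originated from the online recommendation panel, so the ground truth is not pre-filtered by Rorig.

```python
# Hypothetical test events; "source" marks where each click originated.
test_events = [
    {"user": "u1", "item": "p3", "source": "recommendation_panel"},
    {"user": "u1", "item": "p8", "source": "search"},
    {"user": "u2", "item": "p5", "source": "category_page"},
]

# Clicks mediated by the online recommender R_orig would make the
# evaluation favor algorithms that mimic R_orig, so we drop them.
fair_test = [e for e in test_events if e["source"] != "recommendation_panel"]
```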
Issue 3: click-through rates are a mediocre proxy for recommendation revenue
14
CTR vs recommendation revenue
15
[Scatter plot: CTR (x-axis) vs. recommendation revenue (y-axis)]
• Based on ~1 year of logs
• The coefficient of determination (R²) is only 0.089
§ A weak positive relationship
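As a sketch of the metric itself, the coefficient of determination can be computed from daily (CTR, revenue) pairs; the numbers below are illustrative, not the original data.

```python
# Illustrative daily (CTR, revenue) pairs; not the original log data.
ctr = [0.012, 0.015, 0.011, 0.018, 0.014]
revenue = [1020.0, 980.0, 1110.0, 1250.0, 990.0]

n = len(ctr)
mean_x = sum(ctr) / n
mean_y = sum(revenue) / n

# Pearson correlation r, then R^2 = r ** 2.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(ctr, revenue))
var_x = sum((x - mean_x) ** 2 for x in ctr)
var_y = sum((y - mean_y) ** 2 for y in revenue)
r = cov / (var_x * var_y) ** 0.5
print(f"R^2 = {r ** 2:.3f}")  # the talk reports R^2 = 0.089 on ~1 year of logs
```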
Lessons learned
• Comparing recommendation algorithms by user-centric metrics (e.g., CTR) may fail to capture the business owner’s satisfaction (e.g., revenue)
• Unfortunately, studies on recommender systems mostly perform comparisons based on user-centric metrics
• Even if a recommendation algorithm attracts many clicks, we cannot be sure that it will bring a large amount of revenue to the website
16
Issue 4: evaluating recommendation revenue is not straightforward
17
Comparing the number of purchases
[Two time-series panels, 1/25 – 2/18: total orders (top) and orders via recommendation (bottom)]
18
Green line: the channel with a recommendation panel
Blue line: the channel without a recommendation panel
Lessons learned
• Although a recommendation module may help users discover their needs, these users, even without the recommendations, may still be able to locate the desired products through other processes
• It is not clear whether a recommendation module brings extra purchases or simply redirects users from other purchasing processes to the recommendations
• A/B testing might be necessary; a minimal sketch follows
19
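A minimal A/B-testing sketch with hypothetical counts: compare the purchase rates of a channel with the recommendation panel against a channel without it, using a two-proportion z-test.

```python
from math import erf, sqrt

# Hypothetical counts: (purchases, visits) per channel.
purchases_a, visits_a = 530, 20000  # channel with the recommendation panel
purchases_b, visits_b = 490, 20000  # channel without the panel

p_a = purchases_a / visits_a
p_b = purchases_b / visits_b
p_pool = (purchases_a + purchases_b) / (visits_a + visits_b)

# Two-proportion z-test: does the panel bring *extra* purchases,
# or does it merely redirect purchases from other processes?
se = sqrt(p_pool * (1 - p_pool) * (1 / visits_a + 1 / visits_b))
z = (p_a - p_b) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
print(f"z = {z:.2f}, p = {p_value:.3f}")
```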
Discussion
• We discussed 4 pitfalls in training and evaluating recommender systems
• The first two issues arise from biased data collection for the training and test datasets
• The third issue concerns the proper selection of evaluation metrics
• The fourth issue distinguishes the extra purchases from the redirected purchases of recommender systems
20
21
• Hung-Hsuan Chen
• https://www.ncu.edu.tw/~hhchen/
Questions?