A New Similarity Measurement based on Hellinger Distance
For Collaborating Filtering in Sparse Data Set
Submitted in Fulfillment of Requirements for the
Degree of
MASTER OF TECHNOLOGY IN
COMPUTER SCIENCE AND ENGINEERING
specialization in
Information Security
by
Prabhu Kumar (15MT000624)
Under the guidance of
Dr. Rajendra Pamula
(Assistant Professor)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY (INDIAN SCHOOL OF MINES), DHANBAD
INDIA
MAY 2017
Outline
• Introduction of recommender system
• Source of information
• Types of recommendation system
• Architecture
• Similarity measurements
• Proposed method
• Result
• References
Introduction
What is a Recommender System?
• It is a general machine learning technique, or information filtering system, that predicts
a user's preferences.
Example of Recommender System
• Recommender systems are widely used for movie, news, and music recommendation, among others.
Source of Information
• The data collected for recommendation comes from content, demographic, and
social media information.
Source of information (Continued..)
Types of Recommendation
1. Collaborative filtering recommendation system: It is based on the way humans
have made decisions throughout history, and it relies on the ratings that users have
previously given to specific items. The algorithm analyzes these ratings and
predicts items for recommendation.
2. Content-based recommendation system: It is based on the choices the user made
in the past, in the form of the content the user liked most.
3. Hybrid recommendation system: A combination of both.
If techniques A and B are used together for recommendation, A's advantages compensate for
B's disadvantages and B's advantages compensate for A's.
Collaborative filtering based recommender system
Content based recommender system
Architecture of recommender system
• For the matching process in a recommender system:
“The KNN algorithm is one of the most useful algorithms for recommending
items to users.”
KNN algorithm (user-oriented)
Continued…
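The user-oriented KNN matching step can be sketched roughly as follows. This is a minimal illustration, not the implementation used in this work: the function names `cosine` and `knn_predict`, the choice of k = 2, and the similarity-weighted average used for prediction are all assumptions. The ratings are taken from the toy example table in the Result section.

```python
import math

# Toy ratings from the example table in the Result section
# ("-" entries are simply absent from the dictionaries).
RATINGS = {
    "User1": {"Item1": 4, "Item2": 3, "Item3": 5, "Item4": 4},
    "User2": {"Item1": 5, "Item2": 3},
    "User3": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 4},
    "User4": {"Item1": 2, "Item2": 1},
    "User5": {"Item1": 4, "Item2": 2},
}

def cosine(a, b):
    """Cosine similarity over the items rated by both users (0.0 if none)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def knn_predict(ratings, active, item, k=2, sim=cosine):
    """Predict the active user's rating for `item` from the k most
    similar users who have rated it (assumed weighting scheme)."""
    candidates = [
        (sim(ratings[active], r), r[item])
        for user, r in ratings.items()
        if user != active and item in r
    ]
    neighbours = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    total = sum(s for s, _ in neighbours)
    if total <= 0:
        return None  # no usable neighbours
    return sum(s * r for s, r in neighbours) / total

# User2 has not rated Item3; User1 and User3 are the only candidates.
print(knn_predict(RATINGS, "User2", "Item3"))  # approximately 4.5
```

Any of the similarity measures discussed next could be passed in place of `cosine` through the `sim` parameter.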
Similarity Measurements
• Cosine Similarity:
“It measures the angle between two vectors of ratings: the smaller the angle, the higher the similarity.”
\[ \operatorname{sim}(u, v)^{cos} = \frac{\vec{r}_u \cdot \vec{r}_v}{\lVert \vec{r}_u \rVert \, \lVert \vec{r}_v \rVert} \]
“A vector has both magnitude and direction.”
Drawbacks:
• If the two vectors lie on the same line, for example a=(2,2,2,2) and b=(3,3,3,3), the angle between them is 0 and
the cosine value is 1, even though the two users gave different ratings.
• It suffers when there are few co-rated items.
• A similarity measure is the technique that finds the nearest neighbours of a specific active user for
further processing of the recommendation.
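The same-line drawback of cosine similarity can be checked with a few lines of Python. This is only a sketch; the function name `cosine_similarity` is an assumption, and both vectors are assumed to cover the same co-rated items in the same order.

```python
import math

def cosine_similarity(r_u, r_v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(r_u, r_v))
    norm_u = math.sqrt(sum(x * x for x in r_u))
    norm_v = math.sqrt(sum(y * y for y in r_v))
    return dot / (norm_u * norm_v)

a = (2, 2, 2, 2)
b = (3, 3, 3, 3)
# The vectors point in the same direction, so the measure reports
# maximum similarity although every rating differs.
print(cosine_similarity(a, b))  # 1.0
```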
• ACOS (Adjusted Cosine Similarity): “Some people tend to rate high even when they do not like an item very
much, while others tend to rate low even when they like an item a lot. ACOS was introduced to account for this.”
\[ \operatorname{sim}(u, v)^{ACOS} = \frac{\sum_{j=1}^{n} (r_{u_j} - \bar{r}_{u_j})(r_{v_j} - \bar{r}_{v_j})}{\sqrt{\sum_{j=1}^{n} (r_{u_j} - \bar{r}_{u_j})^2} \, \sqrt{\sum_{j=1}^{n} (r_{v_j} - \bar{r}_{v_j})^2}} \]
where n is the total number of co-rated items and a bar denotes the corresponding mean rating.
Drawbacks:
• Similar rating problems
• Few co-rated item problems
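• Pearson's correlation coefficient (PCC): “It finds the linear correlation between two vectors of ratings.”
\[ \operatorname{sim}(u, v)^{PCC} = \frac{\sum_{p \in J} (r_{u,p} - \bar{r}_u)(r_{v,p} - \bar{r}_v)}{\sqrt{\sum_{p \in J} (r_{u,p} - \bar{r}_u)^2} \cdot \sqrt{\sum_{p \in J} (r_{v,p} - \bar{r}_v)^2}} \]
where J is the set of co-rated items.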
Drawbacks:
• If one rating vector is flat, for example a=(2,2,2,2) against b=(1,2,3,4), the PCC cannot be calculated because the denominator is zero.
• If there is only one co-rated item, the PCC will be “0”, so it suffers when there are few co-rated items.
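The flat-vector drawback of the PCC can be reproduced with a short sketch. The function name `pearson` is an assumption, the means are assumed to be taken over the co-rated items only, and returning `None` whenever the denominator is zero is a convention chosen here rather than part of the definition.

```python
import math

def pearson(r_u, r_v):
    """PCC of two equal-length vectors of co-rated ratings.
    Returns None when the denominator is zero (assumed convention)."""
    n = len(r_u)
    mean_u = sum(r_u) / n
    mean_v = sum(r_v) / n
    num = sum((x - mean_u) * (y - mean_v) for x, y in zip(r_u, r_v))
    den = (math.sqrt(sum((x - mean_u) ** 2 for x in r_u))
           * math.sqrt(sum((y - mean_v) ** 2 for y in r_v)))
    if den == 0:
        return None
    return num / den

# A flat vector has zero variance, so the coefficient is undefined.
print(pearson((2, 2, 2, 2), (1, 2, 3, 4)))  # None
# Two perfectly linearly related vectors give a value close to 1.
print(pearson((1, 2, 3, 4), (2, 4, 6, 8)))
```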
PIP (Proximity-Impact-Popularity):
\[ \operatorname{sim}(u, v)^{PIP} = \sum_{j \in \text{co-rated items}} PIP(r_{u_j}, r_{v_j}) \]
where
\[ PIP(r_1, r_2) = Proximity(r_1, r_2) \ast Impact(r_1, r_2) \ast Popularity(r_1, r_2) \]
If r_1 > r_{med} and r_2 > r_{med}:
\[ Proximity(r_1, r_2) = |r_1 - r_2| \]
\[ Impact(r_1, r_2) = (|r_1 - r_{med}| + 1)(|r_2 - r_{med}| + 1) \]
\[ Popularity(r_1, r_2) = 1 + \left( \frac{r_1 + r_2}{2} - \mu_k \right)^2 \]
else:
\[ Proximity(r_1, r_2) = 2 \ast |r_1 - r_2| \]
\[ Impact(r_1, r_2) = \frac{1}{(|r_1 - r_{med}| + 1)(|r_2 - r_{med}| + 1)} \]
\[ Popularity(r_1, r_2) = 1 \]
and \mu_k is the average rating of that particular item over all users.
Drawbacks:
• It doesn’t consider the proportion of common ratings made by users
• Jaccard similarity measurement:
“It only considers the number of common ratings between two users.”
\[ \operatorname{sim}(u, v)^{Jaccard} = \frac{|I_u \cap I_v|}{|I_u \cup I_v|} \]
Drawbacks:
• It doesn't consider the absolute rating values.
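A small sketch of the Jaccard measure makes the drawback concrete. The function name `jaccard` is an assumption; the item sets are those of User1, User4, and User5 in the toy table of the Result section.

```python
def jaccard(items_u, items_v):
    """Size of the intersection over size of the union of two item sets."""
    items_u, items_v = set(items_u), set(items_v)
    union = items_u | items_v
    if not union:
        return 0.0  # assumed convention for two users with no ratings
    return len(items_u & items_v) / len(union)

user1 = {"Item1", "Item2", "Item3", "Item4"}
user4 = {"Item1", "Item2"}   # rated 2 and 1
user5 = {"Item1", "Item2"}   # rated 4 and 2

print(jaccard(user1, user4))  # 0.5
# User4 and User5 rated the same items, so the measure is 1.0
# even though their rating values differ.
print(jaccard(user4, user5))  # 1.0
```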
• Mean squared difference (MSD):
“It only considers the absolute rating values.”
\[ \operatorname{sim}(u, v)^{MSD} = 1 - \frac{\sum_{p \in I} (r_{u,p} - r_{v,p})^2}{|I|} \]
Drawbacks:
• It doesn't consider the number of common ratings between two users, so it ignores the credibility of the similarity
measurement.
• It ignores the proportion of common ratings between two users.
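The MSD measure can be sketched as below. The function name `msd_similarity` and the `r_min`/`r_max` parameters are assumptions: ratings are rescaled by the rating range (assumed to be 1 to 5) so that the result stays between 0 and 1, a step the formula above does not spell out. Passing `r_min=0, r_max=1` applies the formula to the raw values.

```python
def msd_similarity(u, v, r_min=1, r_max=5):
    """1 minus the mean squared difference over co-rated items.
    `u` and `v` map item -> rating; differences are divided by the
    rating range (an assumed normalisation)."""
    common = set(u) & set(v)
    if not common:
        return None  # undefined without co-rated items
    span = r_max - r_min
    msd = sum(((u[p] - v[p]) / span) ** 2 for p in common) / len(common)
    return 1 - msd

user4 = {"Item1": 2, "Item2": 1}
user5 = {"Item1": 4, "Item2": 2}
print(msd_similarity(user4, user5))  # 0.84375
```

Note that the result is the same whether the two users share 2 items or 200, which is the credibility issue listed in the drawbacks.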
Proposed method
Hellinger Distance:
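• It is used to quantify the similarity between two vectors.
• The minimum Hellinger distance is zero; it occurs when no item is rated by both users or when both users give
exactly the same ratings to all items.
• Without normalization, the distance ranges from 0 to \sqrt{2} (for probability distributions).
• The factor 1/\sqrt{2} is chosen so that H(P,Q) ≤ 1 for the distance between any two users.
\[ H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2} \]
Let P = {2, 3, 1} and Q = {3, 2, 3}.
So, the Hellinger distance is
\[ H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{(\sqrt{2} - \sqrt{3})^2 + (\sqrt{3} - \sqrt{2})^2 + (\sqrt{1} - \sqrt{3})^2} \]
\[ = \frac{1}{\sqrt{2}} \sqrt{0.101021 + 0.101021 + 0.53589838} = \frac{1}{\sqrt{2}} \times 0.85903 = 0.60743 \]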
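The worked example above can be reproduced with a short sketch. The function name `hellinger` is an assumption, and the inputs are taken to be equal-length vectors of non-negative ratings, exactly as in the example.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two equal-length, non-negative vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)

P = [2, 3, 1]
Q = [3, 2, 3]
print(round(hellinger(P, Q), 5))  # approximately 0.60743, as in the example
print(hellinger(P, P))            # 0.0 for identical ratings
```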
Local references:
• It plays an important role in capturing local information about the users' ratings.
• It must provide positive as well as negative correlation between two users.
• It is used to find the actual relation between two users according to their ratings.
\[ loc_{med}(r_{ui}, r_{vi}) = \frac{(r_{ui} - r_{med})(r_{vi} - r_{med})}{\sqrt{\sum_{k \in I_u} (r_{uk} - r_{med})^2} \, \sqrt{\sum_{k \in I_v} (r_{vk} - r_{med})^2}} \]
where k ranges over all items rated by the user (I_u for user u, I_v for user v),
r_{ui} is the rating given by user u to the i-th item,
r_{vi} is the rating given by user v to the i-th item, and
r_{med} is the average of the ratings given by users.
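The local similarity term can be sketched as follows. The function name `loc_med` is an assumption, and `r_med` is passed in explicitly because its exact computation is not fixed here; the value 3 used below is only an illustrative choice. The ratings come from the toy table in the Result section.

```python
import math

def loc_med(u, v, item, r_med):
    """Local similarity of two users on one item, relative to r_med.
    `u` and `v` map item -> rating. Returns None if either user's
    ratings all equal r_med (zero denominator, assumed convention)."""
    num = (u[item] - r_med) * (v[item] - r_med)
    den_u = math.sqrt(sum((r - r_med) ** 2 for r in u.values()))
    den_v = math.sqrt(sum((r - r_med) ** 2 for r in v.values()))
    if den_u == 0 or den_v == 0:
        return None
    return num / (den_u * den_v)

user1 = {"Item1": 4, "Item2": 3, "Item3": 5, "Item4": 4}
user2 = {"Item1": 5, "Item2": 3}
user4 = {"Item1": 2, "Item2": 1}

# Both users rate Item1 above r_med: positive local similarity.
print(loc_med(user1, user2, "Item1", r_med=3))
# User1 rates Item1 above r_med, User4 below it: negative value.
print(loc_med(user1, user4, "Item1", r_med=3))
```

The sign of the result is what lets the measure express both positive and negative correlation between two users.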
Proposed method equation:
\[ S(u, v) = H(u, v) \ast \sum_{i \in u} \sum_{j \in v} loc(r_{ui}, r_{vj}) + Jaccard(u, v) \]
where
H(u, v) is the Hellinger distance,
loc(r_{ui}, r_{vj}) is the local similarity between the two users' ratings of the items, and
Jaccard(u, v) measures the proportion of common ratings between the two users.
Result:
• In this graph, the flat-rating and few-common-ratings problems are solved by the proposed
method.
• U1-U3 and U2-U4 show the flat-rating case; U4-U5 shows the improvement in the proportion of common ratings.
• U3-U5 has the few co-rated items problem.
         Item1  Item2  Item3  Item4
User1      4      3      5      4
User2      5      3      -      -
User3      4      3      4      4
User4      2      1      -      -
User5      4      2      -      -
• The problems of identical co-rated vectors and few co-rated items are improved by the proposed method, and
the simultaneous rating difference problem is also solved.
• U1 and U3 have the same co-rated vector; this case improves with the proposed method.
• U1 and U5 suffer from few co-rated items.
• U4 and U5 have the simultaneous difference problem.
• The problems of local similarity and proportion of ratings are improved by the proposed
method.
• U4 and U5 have the proportion-of-ratings problem in PIP, which is improved by the proposed method.
• U1 and U4 have the few co-rated items problem.
• U2 and U4 show the local similarity improvement.
Evaluation of Proposed method in large dataset
• The method is evaluated on two large MovieLens datasets. ML-100K contains 100,000 ratings
from 943 users on 1,682 movies. ML-1M contains 1,000,209 ratings from 6,040 users on 3,952
movies. Each user has rated at least 20 movies.
• Movie recommendation using cosine similarity and the proposed method.
• Movie recommendation using PIP (proximity-impact-popularity) and the
proposed method.
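To give a sense of how sparse the two MovieLens datasets described above are, the sketch below computes the fraction of empty cells in the user-item matrix from the quoted sizes. It also shows one way to read rating rows into a user -> item -> rating dictionary. The helper names `sparsity` and `load_ratings` are assumptions, the rows in `sample` are made up, and the tab-separated `user, item, rating, timestamp` layout is assumed from the ML-100K `u.data` file and should be checked against the dataset's README.

```python
from collections import defaultdict

def sparsity(n_ratings, n_users, n_items):
    """Fraction of the user-item matrix that has no rating."""
    return 1 - n_ratings / (n_users * n_items)

def load_ratings(lines, sep="\t"):
    """Parse 'user<sep>item<sep>rating<sep>timestamp' rows (assumed layout)."""
    ratings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, item, rating, _timestamp = line.split(sep)
        ratings[int(user)][int(item)] = int(rating)
    return dict(ratings)

print(sparsity(100_000, 943, 1682))     # about 0.937 for ML-100K
print(sparsity(1_000_209, 6040, 3952))  # about 0.958 for ML-1M

# Made-up sample rows, only to show the parsing step.
sample = ["1\t10\t4\t0", "1\t20\t3\t0", "2\t10\t5\t0"]
print(load_ratings(sample))
```

With more than 93% of the cells empty in both datasets, most user pairs share few co-rated items, which is the setting the proposed measure targets.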
Thank You !
A special thanks to my project guide Dr. Rajendra Pamula sir for
guiding, motivating and providing me with fruitful information throughout
the development process of this project work
My sincere gratitude to the panel of teachers present for giving their
precious time for listening and evaluating my project presentation
