MINING COMPETITORS FROM LARGE
UNSTRUCTURED DATA
7/24/2020 MIT WPU, Department of Computer Science
and Engineering, Pune
Presented by : Tejas Salunkhe
Guided by: Mrs Sushila Aghav
Contents
• Introduction
• Motivation
• Objectives
• Literature review
• Comparative study
• Research gap
• Problem Statement
• Data-sets
• System Architecture
• Algorithms
• Implementation Results
• Conclusion and Future scope
• References
• Publications
Introduction
• Many large competitors compete for market share
• Users are often unsure what to buy, where to buy it, and which service to choose
• Mining competitors gives users an immediate answer about which service suits them
• It links “what the user wants” with “what the company offers”
Motivation
• Users often struggle to choose a product from a large group of products
• Businesses strive to deliver the right product to the right set of customers
• The main motivation is to find the right balance between the user and the
business, so that users get the right services or products and businesses get
the right customers
Literature Review
1. Title: Trust-aware recommendation in social networks
   Authors: Shuiguang Deng, Longtao Huang, Guandong Xu, Xindong Wu
   Description: The researchers implement a trust-aware recommendation approach called TRA.
   Advantage: TRA can identify trusted reviews, so it is useful for finding the right reviews from verified sources.
   Limitation: Obtaining the data is cumbersome and requires numerous computations, and the approach fails to provide solid solutions on large datasets.
2. Title: Mining Competitors
   Authors: George Valkanas, Theodoros Lappas, Dimitrios Gunopulos
   Description: The researchers describe various ways of finding the top competitors.
   Advantage: Describes various ways of mining competitors across various datasets.
   Limitation: They naively compute the competitiveness of every single item in the corpus with respect to the target item.
Literature Review
3. Title: User-service rating prediction by exploring social users' rating behaviour
   Authors: Guoshuai Zhao, Xueming Qian, Xing Xie
   Description: Proposes a user-service rating prediction approach that explores users' rating behaviors through four social network factors: user personal interest (related to the user and the item's topics), interpersonal interest similarity (related to user interest), interpersonal rating behavior similarity (related to users' rating habits), and interpersonal rating behavior diffusion (related to users' behavior diffusions).
   Advantage: Designs an approach that automatically predicts user-service ratings.
   Limitation: More factors could be considered in the analysis.
4. Title: Understanding short texts by harvesting and analyzing semantic knowledge
   Authors: Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, Xiaofang Zhou
   Description: Proposes a generalized framework to understand short texts effectively and efficiently. The task of short text understanding is divided into three subtasks: text segmentation, type detection, and concept labeling.
   Advantage: An algorithm that automatically learns the best way to understand short texts.
   Limitation: Fails to analyze text that does not fall within the data dictionary.
Research Gap
• A formal definition of competitiveness between two items had not previously
been addressed and validated both quantitatively and qualitatively
• The results obtained via data mining are very hard to interpret
• Existing formalizations are not applicable across different domains
Problem Statement
• In any business, success depends on the ability to make an item more appealing
to customers than the competitors' items. Customers also struggle to find the right
products for their requirements, and often end up buying a product that is not
required or does not fulfil their needs. Data mining techniques can be used to
improve the user experience and can also be beneficial from the business point
of view.
Data Set
1. We use a hotel dataset with about 13 parameters, including:
• Name, Address, Latitude, Longitude, Facility, Hotel Stars, Reviews, User Id,
User Email, User Rating, Vendor Id, Vendor Email
2. How did we extract this dataset?
• The dataset was scraped from websites such as TripAdvisor, MakeMyTrip and
Trivago
Data Set
• Tools such as Octoparse and Data Miner were used for data scraping
• We scraped 1200 hotel records containing the fields listed above (except Vendor Id,
Vendor Email, User Id, User Email and User Rating)
• The captured latitude and longitude were used to show the location of each hotel
relative to the user's current location
System Architecture
System Architecture for Mining Competitors
Algorithm
Input: I = user requirements
Output: P = top K competitors for the given user requirements
Begin
1. Get the requirements from the user, along with the keyset of requirements
2. Get the value of K from the user, i.e. how many recommendations the user
needs for the given data
3. Ask on what basis the user wants the recommendations: ratings, reviews or
location
4. Apply the CMiner++ algorithm to the database using these requirements
5. Map the user's requirements to the records in the database using CMiner++
6. Return the list of top K competitors
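The flow of these steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: a plain filter-and-rank stands in for CMiner++, whose internals the slides do not give, and the function name `top_k_matches`, the record layout and the three sample hotels are all hypothetical.

```python
import heapq


def top_k_matches(requirements, hotels, k, sort_key="rating"):
    """Return the k hotels that offer every required facility, best first.

    A simple filter-and-rank used as a stand-in for the matching step;
    it is NOT the CMiner++ algorithm.
    """
    # Keep only hotels whose facilities include all requirements (steps 1 and 5).
    matches = [h for h in hotels if requirements <= h["facilities"]]
    # Rank by the criterion chosen by the user and keep the top k (steps 2, 3, 6).
    return heapq.nlargest(k, matches, key=lambda h: h[sort_key])


# Hypothetical records, for illustration only.
hotels = [
    {"name": "A", "facilities": {"wifi", "parking"}, "rating": 4.1},
    {"name": "B", "facilities": {"wifi"}, "rating": 4.6},
    {"name": "C", "facilities": {"wifi", "parking", "gym"}, "rating": 3.9},
]

result = top_k_matches({"wifi", "parking"}, hotels, k=2)
names = [h["name"] for h in result]
print(names)
```

Hotel B has the highest rating but is filtered out because it lacks parking, which shows why matching on requirements comes before ranking.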
Hardware and Software Requirements
1. Software Requirements:
   1. Technology: Java
   2. Tools: Eclipse Luna, Octoparse, Data Miner
   3. Operating System: Windows 10
   4. Server: Apache Tomcat 8.0
   5. Database: MySQL 5.0
2. Hardware Requirements:
   Hard disk: 1 TB
   RAM: 8 GB
   Processor: Intel Core i5 or above
Defining Competitiveness
Hotels and the features they offer:
Name      Bar  Breakfast  Gym  Parking  Pool  Wifi
Hilton    Yes  No         Yes  Yes      Yes   Yes
Marriott  Yes  Yes        No   Yes      Yes   Yes
Westin    No   Yes        Yes  Yes      No    Yes

Query segments:
ID  Size  Features
q1  100   (parking, wifi)
q2  50    (parking)
q3  60    (wifi)
q4  120   (gym, wifi)
q5  250   (breakfast, parking)
q6  80    (gym, bar, breakfast)
Defining Competitiveness
Hotel Pairs        Common Segments    Common %
Hilton, Marriott   (q1, q2, q3)       32%
Hilton, Westin     (q1, q2, q3, q4)   50%
Marriott, Westin   (q1, q2, q3, q5)   70%
Observations:
• Hilton and Marriott show the lowest competitiveness even though their feature sets are
quite similar
• This shows that similarity is not a good proxy for competitiveness
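The percentages in this table can be reproduced from the feature table and the segment sizes. The sketch below assumes that a hotel covers a segment when it offers every feature the segment asks for, and that "Common %" is the total size of the shared segments divided by the total size of all segments (660). The function name `shared_segments` is illustrative.

```python
from itertools import combinations

# Features offered by each hotel (the "Yes" cells of the feature table).
hotels = {
    "Hilton": {"bar", "gym", "parking", "pool", "wifi"},
    "Marriott": {"bar", "breakfast", "parking", "pool", "wifi"},
    "Westin": {"breakfast", "gym", "parking", "wifi"},
}

# Segment id -> (size, required features).
segments = {
    "q1": (100, {"parking", "wifi"}),
    "q2": (50, {"parking"}),
    "q3": (60, {"wifi"}),
    "q4": (120, {"gym", "wifi"}),
    "q5": (250, {"breakfast", "parking"}),
    "q6": (80, {"gym", "bar", "breakfast"}),
}


def shared_segments(a, b):
    """Return the segments covered by both hotels and their share of all users."""
    total = sum(size for size, _ in segments.values())
    common = [
        q for q, (_, feats) in segments.items()
        if feats <= hotels[a] and feats <= hotels[b]
    ]
    share = sum(segments[q][0] for q in common) / total
    return common, share


for a, b in combinations(hotels, 2):
    common, share = shared_segments(a, b)
    print(f"{a}, {b}: {common} {share:.0%}")
```

Hilton and Marriott differ only in breakfast and gym, yet they share the fewest users, because the large segment q5 needs breakfast and q4 needs a gym.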
Defining Competitiveness
We define the competitiveness between items i and j in a market with feature set F as follows:

$$C_F(i,j) = \sum_{q \in 2^{F}} p(q) \cdot V^{q}_{i,j}$$

• $C_F(i,j)$: the probability that both items are included in the consideration set of a random user
• $p(q)$: the percentage of users represented by query q
• $V^{q}_{i,j}$: the pairwise coverage of query q by items i and j
• If a random user U shows interest in item i, then that user is most likely to also be interested in the items j
with the highest $C_F(i,j)$ values
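As a rough illustration of the definition above, the sketch below weights each query's pairwise coverage by the share of users the query represents and adds the results. The function name `competitiveness` and the three queries with their shares and coverage values are made up for the example; they are not taken from the hotel dataset.

```python
def competitiveness(query_share, coverage):
    """Sum p(q) * V(q) over all queries q.

    query_share: dict mapping query id -> p(q), the share of users issuing q
    coverage:    dict mapping query id -> pairwise coverage of q by items i, j
    """
    # A query with no coverage entry contributes nothing to the sum.
    return sum(p * coverage.get(q, 0.0) for q, p in query_share.items())


# Hypothetical numbers, for illustration only.
query_share = {"qa": 0.5, "qb": 0.3, "qc": 0.2}
coverage = {"qa": 1.0, "qb": 0.5, "qc": 0.0}

score = competitiveness(query_share, coverage)
print(round(score, 2))
```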
Pairwise Coverage
• Pairwise coverage $V^{f}_{i,j}$ of a feature f is the percentage of all possible values of
f that can be covered by both i and j
• The query-level coverage $V^{q}_{i,j}$ is computed from these per-feature values
• Let us consider the pairwise coverage for:
1. Binary and Categorical Features
2. Numeric Features
3. Ordinal Features
Binary and Categorical Features
• A categorical feature takes one or more values from a finite space
• Single-value features include, for example, the brand of a camera or the location of a
hotel
• Multi-value features include, for example, the amenities offered by a hotel
• Any categorical feature can be encoded as a set of binary features, with each
binary feature indicating the coverage (or lack of coverage) of one of the original feature's values
• A binary feature is either fully covered by both items, when f[i] = f[j] = 1 or equivalently
f[i] * f[j] = 1, or not covered at all
Binary and Categorical Features
• Binary features equation:

$$V^{f}_{i,j} = f[i] \times f[j]$$
Numeric Features
• Take values from a predefined range
• Here numeric features take values in the range [0, 1], with higher values being
preferred
• Example: consider two hotels i and j with values 0.5 and 0.8 for the feature food
quality. Their pairwise coverage is 0.5. Conceptually, they compete for all the
customers who accept a food quality of at most 0.5, while customers with a
higher food-quality requirement would not consider i as an option.
Numeric Features
• Numeric features equation:

$$V^{f}_{i,j} = \min(f[i], f[j])$$
Ordinal Features
• Take values from a finite ordered list
• A characteristic example is the popular 5-star scale used to rate the quality of
a service or product
• Consider two hotels rated 2 and 3 stars. A customer who demands at least 4
stars would consider neither hotel, while a customer who requires at least
3 stars would consider only the second hotel
• As with numeric features, the pairwise coverage is based on the lower of the
two hotels' values
Ordinal Features
• If two items compete for 2 out of the 5 levels of the ordinal scale, their competitiveness
is proportional to 2/5 = 0.4
• Pairwise coverage for ordinal features is given by:

$$V^{f}_{i,j} = \frac{\min(f[i], f[j])}{|V^{f}|}$$

where $|V^{f}|$ is the number of levels of the ordinal scale.
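The three coverage rules for binary, numeric and ordinal features can be sketched as small functions. The function names are illustrative; the slides give only the formulas. Combining per-feature coverages into a query-level coverage by multiplication is an assumption made here, since the slides do not state how the per-feature values are combined.

```python
from math import prod


def binary_coverage(fi, fj):
    """Binary feature: covered (1) only when both items have the feature."""
    return fi * fj


def numeric_coverage(fi, fj):
    """Numeric feature in [0, 1]: the lower of the two values."""
    return min(fi, fj)


def ordinal_coverage(fi, fj, levels):
    """Ordinal feature: the lower level divided by the number of levels."""
    return min(fi, fj) / levels


def query_coverage(per_feature):
    """ASSUMPTION: query-level coverage as the product of per-feature coverages."""
    return prod(per_feature)


# Examples taken from the slides.
print(binary_coverage(1, 0))       # one hotel lacks the feature
print(numeric_coverage(0.5, 0.8))  # food quality example
print(ordinal_coverage(2, 3, 5))   # 2-star and 3-star hotels on a 5-star scale
```

Under the product assumption, a single uncovered binary feature drives the coverage of the whole query to zero, which matches the hotel example where a missing amenity removes a hotel from a segment.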
Extending Competitive Definition
• Feature Uniformity:
1. The number of users demanding a quality value in [0, 0.1] may differ from the
number demanding a value in [0.4, 0.5], so demand need not be uniform over a feature's range
• Feature Importance:
1. A common assumption in our research is that all the features in a query have
equal importance unless stated otherwise
2. However, a user who submits the query q = (f1, f2) may care more about f1 than f2
Computational Analysis
1. Naive
   Advantage: Provides a consistent computational time regardless of k
   Disadvantage: Naively computes the competitiveness of every single item in the corpus with respect to the target item
2. GMiner
   Advantage: Performs well for datasets with distinct queries
   Disadvantage: Time-consuming, but performs better than Naive
3. CMiner
   Advantage: Performs well for datasets with popular queries
   Disadvantage: Requires a large number of computations for larger values of k
Computational Analysis
4. CMiner++
   Advantage: Gives improved results as k increases, owing to the pruning in CMiner++
   Disadvantage: Results could improve further if the discarding or evaluation of candidates were improved
Summary of Comparative Study
• Naive cannot match the computational performance of GMiner, CMiner and
CMiner++
• Naive < GMiner < CMiner < CMiner++
• CMiner++ provides improved pruning, which yields better results as the value
of k increases
Computations
Algorithm   Execution Time   Number of Competitors
CMiner      0.004            3
            0.04             10
            0.6              150
            1.2              300
            Subtotal         463
CMiner++    0.003            3
            0.03             10
            0.45             150
            0.9              300
            Subtotal         463
GMiner      0.005            3
            0.05             10
            0.75             150
            1.5              300
            Subtotal         463
Naive       0.25             463
            Subtotal         463
Grand Total                  1852
Accuracy
Algorithm   Accuracy
Naive       60%
GMiner      72%
CMiner      83%
CMiner++    87%
Accuracy
Figure: bar chart of total accuracy (axis from 0 to 1) for CMiner, CMiner++, GMiner and Naive
Contributions
• Improved pruning efficiency
• Reduced the number of queries considered
• Used tools such as Octoparse and Data Miner for data scraping
• Enhanced the CMiner algorithm by adding query ordering and the
UPDATETOPK() and GETSLAVES() routines
Conclusion and Future Scope
• A formal definition of competitiveness between two items is presented and validated
• The formalization is applicable across a large number of domains
• We addressed the computationally challenging problem of finding the top K competitors
of a given item
• The evaluation revealed that even a small number of reviews is sufficient to estimate
the given market and surface interesting observations
Conclusion and Future Scope
• The evaluation can be extended to various other domains, which is left as
future scope
References
• Valkanas, George, Theodoros Lappas and Dimitrios Gunopulos. “Mining Competitors from Large
Unstructured Datasets.” IEEE Transactions on Knowledge and Data Engineering (2017).
• Deng, Shuiguang, Longtao Huang, Guandong Xu, Xindong Wu and Zhaohui Wu. “On Deep Learning
for Trust-Aware Recommendations in Social Networks.” IEEE Transactions on Neural Networks and
Learning Systems 28 (2017): 1164-1177.
• Kong, Qingchao, Wenji Mao, Guandan Chen and Daniel Zeng. “Exploring Trends and Patterns of
Popularity Stage Evolution in Social Media.” IEEE Transactions on Knowledge and Data Engineering
(2018).
• Hua, Wen, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. “Understand Short Texts
by Harvesting and Analyzing Semantic Knowledge.” IEEE Transactions on Knowledge and Data
Engineering 29 (2017): 499-512.
• Zhao, Guoshuai, Xueming Qian and Xing Xie. “User-Service Rating Prediction by Exploring Social
Users' Rating Behaviors.” IEEE Transactions on Multimedia 18 (2016): 496-506.
Publications
• Survey paper:
  Paper Title: MINING SOCIAL NETWORKS FOR BUSINESS COMPETITION ANALYSIS
  Journal: Asian Journal For Convergence In Technology (AJCT), 2019
• Implementation paper:
  Paper Title: MINING COMPETITORS FROM STRUCTURED/UNSTRUCTURED DATA
  Journal: International Journal of Scientific and Engineering Research (IJSER) (ISSN
  2229-5518)
THANK YOU

More Related Content

PDF
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
PDF
Multidirectional Product Support System for Decision Making In Textile Indust...
PDF
IRJET-Survey on Identification of Top-K Competitors using Data Mining
PDF
Projection pursuit Random Forest using discriminant feature analysis model fo...
PDF
Tutorial on Advances in Bias-aware Recommendation on the Web @ WSDM 2021
PDF
IRJET- Searching an Optimal Algorithm for Movie Recommendation System
PPTX
Predicting Current User Intent with Contextual Markov Models
PDF
Tutorial on Operationalizing Treatments against Bias: Challenges and Solution...
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
Multidirectional Product Support System for Decision Making In Textile Indust...
IRJET-Survey on Identification of Top-K Competitors using Data Mining
Projection pursuit Random Forest using discriminant feature analysis model fo...
Tutorial on Advances in Bias-aware Recommendation on the Web @ WSDM 2021
IRJET- Searching an Optimal Algorithm for Movie Recommendation System
Predicting Current User Intent with Contextual Markov Models
Tutorial on Operationalizing Treatments against Bias: Challenges and Solution...

What's hot (12)

PDF
Nitesh synposis
PDF
IRJET- E-Commerce Recommendation based on Users Rating Data
PDF
Prediction of Default Customer in Banking Sector using Artificial Neural Network
PDF
Java datamining ieee Projects 2012 @ Seabirds ( Chennai, Mumbai, Pune, Nagpur...
PDF
IRJET- Analysis of Rating Difference and User Interest
PDF
IEEE Projects 2012 - 2013
PDF
MTVRep: A movie and TV show reputation system based on fine-grained sentiment ...
PDF
Tutorial on Countering Bias in Personalized Rankings: From Data Engineering t...
PDF
Tutorial on Bias in Personalized Rankings: Concepts to Code @ ICDM 2020
PDF
Tutorial on Bias in Rec Sys @ UMAP2020
PDF
IRJET- Physical Design of Approximate Multiplier for Area and Power Efficiency
PDF
Bulk ieee projects 2012 2013
Nitesh synposis
IRJET- E-Commerce Recommendation based on Users Rating Data
Prediction of Default Customer in Banking Sector using Artificial Neural Network
Java datamining ieee Projects 2012 @ Seabirds ( Chennai, Mumbai, Pune, Nagpur...
IRJET- Analysis of Rating Difference and User Interest
IEEE Projects 2012 - 2013
MTVRep: A movie and TV show reputation system based on fine-grained sentiment ...
Tutorial on Countering Bias in Personalized Rankings: From Data Engineering t...
Tutorial on Bias in Personalized Rankings: Concepts to Code @ ICDM 2020
Tutorial on Bias in Rec Sys @ UMAP2020
IRJET- Physical Design of Approximate Multiplier for Area and Power Efficiency
Bulk ieee projects 2012 2013
Ad

Similar to Mining competitors from large unstructured data (20)

PDF
IRJET- Android Application for Service by using Bidding and Ratings in nearby...
PDF
Automated Feature Selection and Churn Prediction using Deep Learning Models
PDF
IRJET-Smart Tourism Recommender System
PDF
IRJET- Career Counselling Chatbot
PDF
Multi attribute decision making for mobile phone selection
PDF
Providing Highly Accurate Service Recommendation over Big Data using Adaptive...
PDF
IRJET- Research On Android Application for service By Using Bidding and R...
PDF
IRJET - Interaction based Expert System
PDF
IRJET - Recommendations Engine with Multi-Objective Contextual Bandits (U...
PDF
IRJET - Customer Churn Analysis in Telecom Industry
PDF
University Recommendation Support System using ML Algorithms
PDF
SURVEY ON SENTIMENT ANALYSIS
PDF
IRJET- Placement Portal and Prediction System
PDF
CFA-NY Workshop - Final slides
PDF
IRJET- Shopping Mall Experience using Beacon Technology
PDF
Online Service Rating Prediction by Removing Paid Users and Jaccard Coefficient
PDF
Tourist Destination Recommendation System using Cosine Similarity
PDF
IRJET - An Intelligent Recommendation for Social Contextual Image using H...
PDF
Service Rating Prediction by check-in and check-out behavior of user and POI
PDF
COLLEGE ONLINE ELECTION SYSTEM
IRJET- Android Application for Service by using Bidding and Ratings in nearby...
Automated Feature Selection and Churn Prediction using Deep Learning Models
IRJET-Smart Tourism Recommender System
IRJET- Career Counselling Chatbot
Multi attribute decision making for mobile phone selection
Providing Highly Accurate Service Recommendation over Big Data using Adaptive...
IRJET- Research On Android Application for service By Using Bidding and R...
IRJET - Interaction based Expert System
IRJET - Recommendations Engine with Multi-Objective Contextual Bandits (U...
IRJET - Customer Churn Analysis in Telecom Industry
University Recommendation Support System using ML Algorithms
SURVEY ON SENTIMENT ANALYSIS
IRJET- Placement Portal and Prediction System
CFA-NY Workshop - Final slides
IRJET- Shopping Mall Experience using Beacon Technology
Online Service Rating Prediction by Removing Paid Users and Jaccard Coefficient
Tourist Destination Recommendation System using Cosine Similarity
IRJET - An Intelligent Recommendation for Social Contextual Image using H...
Service Rating Prediction by check-in and check-out behavior of user and POI
COLLEGE ONLINE ELECTION SYSTEM
Ad

Recently uploaded (20)

PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPTX
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
PDF
VCE English Exam - Section C Student Revision Booklet
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
TR - Agricultural Crops Production NC III.pdf
PDF
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
PDF
Classroom Observation Tools for Teachers
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PPTX
Cell Types and Its function , kingdom of life
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
Week 4 Term 3 Study Techniques revisited.pptx
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
Microbial disease of the cardiovascular and lymphatic systems
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
BOWEL ELIMINATION FACTORS AFFECTING AND TYPES
VCE English Exam - Section C Student Revision Booklet
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
TR - Agricultural Crops Production NC III.pdf
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
Classroom Observation Tools for Teachers
Pharmacology of Heart Failure /Pharmacotherapy of CHF
Cell Types and Its function , kingdom of life
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PPH.pptx obstetrics and gynecology in nursing
Week 4 Term 3 Study Techniques revisited.pptx
Module 4: Burden of Disease Tutorial Slides S2 2025
Abdominal Access Techniques with Prof. Dr. R K Mishra
STATICS OF THE RIGID BODIES Hibbelers.pdf
Anesthesia in Laparoscopic Surgery in India
school management -TNTEU- B.Ed., Semester II Unit 1.pptx

Mining competitors from large unstructured data

  • 1. MINING COMPETITORS FROM LARGE UNSTRUCTURED DATA 7/24/2020 1MIT WPU, Department of Computer Science and Engineering, Pune Presented by : Tejas Salunkhe Guided by: Mrs Sushila Aghav
  • 2. Contents • Introduction • Motivation • Objectives • Literature review • Comparative study • Research gap • Problem Statement • Data-sets • System Architecture • Algorithms • Implementation Results • Conclusion and Future scope • References • Publications 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 2
  • 3. Introduction • Large competitors competing for market share • Users often get confused what to buy? Where to buy? Which service to avail? • Mining competitors gives users immediate result on which service they can avail • It creates a link between “What user wants” vs “What company offers” 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 3
  • 4. Motivation • Users usually gets confused which product to be used from large group of products • Businesses strive to deliver right product to right set of customers • The main motivation here is identifying a right balance between the User and the Business so that the user gets right set of services or product and Business get right set of customers 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 4
  • 5. Literature Review 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 5 Sr. No. Title of Paper Author Conference/ Journals Description Advantages / Limitation 1. Trust aware recommendation in Social Networks Shuihuang Deng , Longtao Huang , Guandog Xu, Xindong Wu In this paper the researchers implemented a trust aware recommendation approach called TRA. *. TRA can be used to find out the trusted reviews, hence can be termed useful in finding out right reviews from verified sources #. Process of obtaining data is cubersome and refer numerous computation which fail to provide solid solutions on large datasets 2. Mining Competitors George Valkanas, Theodoros Lappas, and Dimitrios Gunopulos In this paper the researchers desciribed various ways of finding Top Competitors *. Describe various ways of mining competitors across various datasets #. They naively compute the competitiveness of every single item in the corpus with respect to the target item.
  • 6. Literature Review 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 6 Sr. No. Title of Paper Author Conference/ Journals Description Advantages / Limitation 3. User-service rating prediction by exploring social user rating behaviour Guoshuai , Xueming , Xing Xie This paper proposes a user-service rating prediction approach by exploring users’ rating behaviors with considering four social network factors: user personal interest (related to user and the item’s topics), interpersonal interest similarity (related to user interest), interpersonal rating behavior similarity (related to users’ rating habits), and interpersonal rating behavior diffusion (related to users’ behavior diffusions). *. Designed a approach that could automatically provide User service rating prediction #. More factors could be considered for the purpose of analysis 4. Understanding short texts by harvesting and analyzing semantic knowledge Wen Hua , Zhongyuan,Haixu n,Xiaofang Zhou In this work, they propose a generalized framework to understand short texts effectively and efficiently. More specifically, they divide the task of short text understanding into three subtasks: text segmentation, type detection, and concept labeling. *. Algorithm that automatically learns the best way to understand the short texts #. Fails to analyze the text which dont fall in the data dictionary
  • 7. Research Gap • Formal defination of Competitiveness between two items was not previously addressed and validated both quantitatively and qualitatively • The results and outcomes obtained via Data Mining are very hard to understand • The formalizations currently present cannot be applicable across various domains 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 7
  • 8. Problem Statement • In any Business success is based on ability to make item more appealing to customer than the competitor. Even the customer struggle to find a right set of product as per their requirements, many times they end up buying the product which is not required or doesn't fullfill their needs. Various data mining techniques can be used to improve the user experience and also may turn out beneficial from Business point of view. 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 8
  • 9. Data Set 1. We use Hotel dataset which has about 13 different parameters like : • Name, Address, Latitude, Longitude, Facility, Hotel Star's,Vendor Id, Reviews, User Id, User Email, User Rating, Vendor Id, Vendor Email 2. How did we extract this Dataset? • This DataSet was scraped from websites like TripAdvisor, MakemyTrip, Trivago 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 9
  • 10. Data Set • Tools like Octoparse and DataMiner were used for Data Scraping • We Scraped 1200 hotel records which have the above given fields(except Vendor Id, Vendor Email, User Id, User Email, User Rating) • The Latitude and Longitude captured was used to showcase the location of the Hotels with respect to the users current location... 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 10
  • 11. System Architecture 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 11 System Architecture for Mining Competitors
  • 12. Algorithm Input: I = User requirements Output: P = Top K competitors for the given user requirement Begin 1. Get the user requirements from the User along with the keyset of requirements 2. Get K value from the user so as to know exactly how many recommendations does the user need for given set of data 3. After receiving the value of K also need how exactly the user requires the recommendation based on Ratings, reviews or location 4. Get the requirements and apply Cminer ++ algorithm on the Database 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 12
  • 13. Algorithm 5. Map the requirement given by the user with that from Database using Cminer++ 6. Provide a list of Top K competitors 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 13
  • 14. Hardware and Software Requirements  1. Software Requirement: 1. Technology: Java 2. Tools: Eclipse Luna, Octoparse, Data Miner 3. Operating System: Windows 10 4. Server - Apache Tomcat 8.0 5. Database - MySQL 5.0  2. Hardware Requirement: Hard disk : 1TB RAM : 8GB Processor : Intel Core i5 or above 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 14
  • 15. Defining Competitivess 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 15 Name Bar Breakfast Gym Parking Pool Wifi Hilton Yes No Yes Yes Yes Yes Marriot Yes Yes No Yes Yes Yes Westin No Yes Yes Yes No Yes ID Size Features q1 100 (parking, wifi) q2 50 (parking) q3 60 (wifi) q4 120 (gym, wifi) q5 250 (breakfast, parking) q6 80 (gym, bar, breakfast)
  • 16. Defining Competitiveness 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 16 Restaurant Pairs Common Segments Common % Hilton, Marriot (q1, q2, q3) 32% Hilton, Westin (q1, q2, q3, q4) 50% Marriot, Westin (q1, q2, q3, q5) 70% Observations: •Lowest competitiveness is observed by Hilton, Marriot even though these hotels are quite similar by the feature •This shows similarity is not a good proxy for competitveness
  • 17. Defining Competitiveness 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 17 VC q jifqf qpji ,^2 *)(),(    We define the Competitiveness between i and j in market with a feature subset f as follows: Cf(i,j) : represents probability that two items are included in the consideration set of random users p(q) : percentage of users represented by query q V : Pairwise coverage •If a random user U shows interest in item i, then he is also most likely to be interested in items with highest Cf(i,j) values,
  • 18. Pairwise Coverage • Pairwise coverage of a feature f is the percentage of all possible values of f that can be covered by both i,j • Lets consider the Pairwise coverage for : 1. Binary and Categorical Features 2. Numeric Features 3. Ordinal Features 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 18 V q ji , V q ji ,
  • 19. Binary and Categorical Features • Categorical feature takes one or more values from finite space • Single value features include feature like eg: Brand of Camera, Location of Hotel etc • Multi-value features include amenities offered by Hotel etc • Any categorical feataure can be encoded via set of Binary features with each binary feature lacking coverage of original one • A feature can be fully covered if f[i] = f[j] = 1 or equivalently f[i] * f[j] = 1 or not covered at all 7/24/2020 MIT WPU, Department of Computer Science and Engineering, Pune 19
  • 20. Binary and Categorical Features
    • Pairwise coverage for binary features:
      $V^f_{i,j} = f[i] \times f[j]$
  • 21. Numeric Features
    • A numeric feature takes values from a predefined range
    • Numeric features take values in the range [0,1], with higher values being preferred
    • Example: consider two hotels i and j with values 0.5 and 0.8 for the feature food quality. Their pairwise coverage is 0.5. Conceptually, they compete for all customers who accept a food quality <= 0.5. Customers who require a food quality above 0.5 would not consider i as an option.
  • 22. Numeric Features
    • Pairwise coverage for numeric features:
      $V^f_{i,j} = \min(f[i], f[j])$
  • 23. Ordinal Features
    • An ordinal feature takes values from a finite ordered list
    • A characteristic example is the popular 5-star scale used to rate the quality of a service or product
    • Consider two hotels rated 2 and 3 stars. A customer who demands at least a 4-star rating would consider neither hotel, while a customer who requires at least a 3-star rating would consider only the second hotel
    • As with numeric features, the pairwise coverage is based on the lower of the two competing hotels' values
  • 24. Ordinal Features
    • If two items compete for 2 out of the 5 levels of the ordinal scale, their competitiveness is proportional to 2/5 = 0.4
    • Pairwise coverage for ordinal features, where $|V^f|$ is the number of levels on the scale of f:
      $V^f_{i,j} = \frac{\min(f[i], f[j])}{|V^f|}$
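The three pairwise coverage formulas from slides 20, 22, and 24 can be sketched as small functions. This is an illustrative sketch only; the function names and the `num_levels` parameter (standing in for $|V^f|$) are hypothetical, and the inputs are the worked values from the slides.

```python
def binary_coverage(f_i: int, f_j: int) -> int:
    """Binary feature: covered (1) only if both items have it, else 0."""
    return f_i * f_j


def numeric_coverage(f_i: float, f_j: float) -> float:
    """Numeric feature in [0, 1]: the pair covers up to the lower value."""
    return min(f_i, f_j)


def ordinal_coverage(f_i: int, f_j: int, num_levels: int) -> float:
    """Ordinal feature: lower level divided by the number of levels."""
    return min(f_i, f_j) / num_levels


print(binary_coverage(1, 1))       # both hotels offer the feature -> 1
print(binary_coverage(1, 0))       # only one hotel offers it -> 0
print(numeric_coverage(0.5, 0.8))  # food quality example -> 0.5
print(ordinal_coverage(2, 3, 5))   # 2-star and 3-star hotels on a 5-star scale -> 0.4
```

The numeric and ordinal cases share the same idea: the weaker of the two items bounds the set of customers for whom both items remain viable.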
  • 25. Extending the Competitiveness Definition
    • Feature uniformity:
      1. The share of users demanding a quality value in [0, 0.1] may differ from the share demanding a value in [0.4, 0.5]
    • Feature importance:
      1. A common assumption in our research is that all features in a query have equal importance unless stated otherwise
      2. However, a user who submits the query q = (f1, f2) may care more about f1 than about f2
  • 26. Computational Analysis
    | Sr. No. | Algorithm | Advantages | Disadvantages |
    |---|---|---|---|
    | 1 | Naive | Provides a consistent computation time regardless of k | Naively computes the competitiveness of every single item in the corpus with respect to the target item |
    | 2 | G Miner | Performs well for datasets with distinct queries | Time-consuming, but performs better than Naive |
    | 3 | C Miner | Performs well for datasets with popular queries | Requires a large number of computations for larger values of k |
  • 27. Computational Analysis
    | Sr. No. | Algorithm | Advantages | Disadvantages |
    |---|---|---|---|
    | 4 | C Miner++ | Provides improved results as k increases, owing to the pruning performed by C Miner++ | Results could improve further if the discarding or evaluation of candidates were improved |
  • 28. Summary of Comparative Study
    • Naive does not match the computational performance of G Miner, C Miner, and C Miner++
    • Naive < G Miner < C Miner < C Miner++
    • C Miner++ provides improved pruning, which yields better-quality results as k increases
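The Naive baseline described above (score every item in the corpus against the target, then keep the k best) can be sketched as follows. This is a hedged illustration, not the evaluated implementation: `naive_top_k`, `toy_score`, and the `ITEMS` data are hypothetical, and `toy_score` is only a stand-in for the competitiveness function (the mean of the per-feature minimum over shared numeric features).

```python
import heapq

# Hypothetical items with numeric features in [0, 1].
ITEMS = {
    "A": {"food": 0.5, "service": 0.9},
    "B": {"food": 0.8, "service": 0.4},
    "C": {"food": 0.3, "service": 0.2},
    "D": {"food": 0.6, "service": 0.9},
}


def toy_score(a, b):
    """Stand-in competitiveness: mean of min(a[f], b[f]) over shared features."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(min(a[f], b[f]) for f in shared) / len(shared)


def naive_top_k(target, items, k, score):
    """Score every other item against the target and return the k best."""
    scored = [
        (score(items[target], features), name)
        for name, features in items.items()
        if name != target
    ]
    # nlargest returns the k highest (score, name) pairs in descending order.
    return [(name, s) for s, name in heapq.nlargest(k, scored)]


top = naive_top_k("A", ITEMS, 2, toy_score)
print([name for name, _ in top])  # ['D', 'B']
```

Because every candidate is scored no matter how small k is, the cost of this baseline is essentially independent of k, which matches the consistent computation time noted in the comparison table. The pruning-based algorithms aim to avoid scoring most candidates.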
  • 29. Computations
    | Algorithm | Execution time | Sum of number of competitors |
    |---|---|---|
    | C Miner | 0.004 | 3 |
    | | 0.04 | 10 |
    | | 0.6 | 150 |
    | | 1.2 | 300 |
    | C Miner total | | 463 |
    | C Miner++ | 0.003 | 3 |
    | | 0.03 | 10 |
    | | 0.45 | 150 |
    | | 0.9 | 300 |
    | C Miner++ total | | 463 |
    | G Miner | 0.005 | 3 |
    | | 0.05 | 10 |
    | | 0.75 | 150 |
    | | 1.5 | 300 |
    | G Miner total | | 463 |
    | Naive | 0.25 | 463 |
    | Naive total | | 463 |
    | Grand total | | 1852 |
  • 30. Accuracy
    | Algorithm | Accuracy |
    |---|---|
    | Naive | 60% |
    | G Miner | 72% |
    | C Miner | 83% |
    | C Miner++ | 87% |
  • 31. Accuracy
    [Chart: accuracy by algorithm (C Miner, C Miner++, G Miner, Naive, Total); value axis from 0 to 1]
  • 32. Contributions
    • Improved pruning efficiency
    • Reduced the number of queries considered
    • Used tools such as Octoparse and Data Miner for data scraping
    • Boosted the C Miner algorithm by adding query ordering and by adding UPDATETOPK() and GETSLAVES()
  • 33. Conclusion and Future Scope
    • A formal definition of competitiveness between two items was validated
    • The formalization is applicable across a large number of domains
    • We addressed the computationally challenging problem of finding the top-k competitors of a given item
    • The evaluations revealed that even a small number of reviews is sufficient to estimate the given market and to find interesting observations
  • 34. Conclusion and Future Scope
    • The evaluation approach can be applied across various domains; doing so is considered future scope
  • 35. References
    • Valkanas, George, Theodoros Lappas and Dimitrios Gunopulos. “Mining Competitors from Large Unstructured Datasets.” IEEE Transactions on Knowledge and Data Engineering (2017).
    • Deng, Shuiguang, Longtao Huang, Guandong Xu, Xindong Wu and Zhaohui Wu. “On Deep Learning for Trust-Aware Recommendations in Social Networks.” IEEE Transactions on Neural Networks and Learning Systems 28 (2017): 1164-1177.
    • Kong, Qingchao, Wenji Mao, Guandan Chen and Daniel Zeng. “Exploring Trends and Patterns of Popularity Stage Evolution in Social Media.” IEEE Transactions on Knowledge and Data Engineering (2018).
  • 36. References
    • Hua, Wen, Zhongyuan Wang, Haixun Wang, Kai Zheng and Xiaofang Zhou. “Understand Short Texts by Harvesting and Analyzing Semantic Knowledge.” IEEE Transactions on Knowledge and Data Engineering 29 (2017): 499-512.
    • Zhao, Guoshuai, Xueming Qian and Xing Xie. “User-Service Rating Prediction by Exploring Social Users' Rating Behaviors.” IEEE Transactions on Multimedia 18 (2016): 496-506.
  • 37. Publications
    • Survey paper:
      Paper title: Mining Social Networks for Business Competition Analysis
      Journal: Asian Journal For Convergence In Technology (AJCT), 2019
    • Implementation paper:
      Paper title: Mining Competitors from Structured/Unstructured Data
      Journal: International Journal of Scientific and Engineering Research (IJSER) (ISSN 2229-5518)
  • 38. THANK YOU