ScienceDirect
Available online at www.sciencedirect.com
Procedia Computer Science 148 (2019) 80–86
1877-0509 © 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the Second International Conference on Intelligent Computing in
Data Sciences (ICDS 2018).
10.1016/j.procs.2019.01.011
Second International Conference on Intelligent Computing in Data Sciences (ICDS 2018)
An Unsupervised Approach for Reputation Generation
Abdessamad Benlahbib^{a,∗}, El Habib Nfaoui^{a}
^{a} LIIAN Laboratory, Faculty of Sciences Dhar EL Mehraz, Sidi Mohammed Ben Abdellah University, Fez, Morocco
∗ Corresponding author. Tel.: +212632561278.
E-mail address: abdessamad.benlahbib@usmba.ac.ma
Abstract
Nowadays, watching a movie, buying a product, making hotel reservations and other e-commerce transactions are tied to
consulting other people's reviews and recommendations on the target entity. Indeed, Amazon, IMDB (Internet Movie Database)
and several other websites provide a convenient platform where users freely share their opinions and their subjective
attitudes towards the target entity with no restrictions. However, those opinions are too numerous to be examined one by one,
which is why a general reputation value makes the task of choosing the right product much easier. In this paper, we propose a
reputation generation approach based on opinion clustering and semantic analysis. In our approach, opinions are grouped into a
number of clusters that contain opinions with the same attitude or preference. By aggregating the ratings attached to the
clusters, we generate the reputation of an entity. Experimental results demonstrate the effectiveness of the proposed approach
in generating a reputation value.
Keywords: Semantic analysis; Opinion mining; Reputation generation; Machine learning.
1. Introduction
Over the past few years, the web has grown at an incredible rate. Nowadays, people can buy products, watch movies and make
reservations via the Internet. Before making a decision, most people like to seek other people's opinions on a target entity
in order to judge its performance. To do so, they can scan both entity descriptions and user comments. Hence, a certain online
custom has become established: first look at other users' comments, then make a decision about the target entity. Online
sellers, for their part, collect highly favorable user comments and put them in their product descriptions in order to attract
more purchases. According to recent statistics, the number of users of some famous online shopping centers, e.g., Taobao,
Jingdong and Amazon, has exceeded one billion. Each of the above commercial websites contains a huge number of product comments.
These comments contain user opinions on the products. As the opinions show the subjective attitudes, evaluations, and
speculations of users expressed in natural language, this kind of content contributed by Internet users has been well
recognized as valuable information. It can be exploited to analyze public opinion on a specific product in order to figure out
user likes or dislikes [5]. In this paper, we propose to apply the LSA (Latent Semantic Analysis) model and then the K-means
algorithm to cluster opinions based on their semantic relations; by aggregating the ratings attached to the clustered opinions,
we generate the reputation of an entity. The paper is organized as follows. Section 2 gives a brief review of related work. In
Section 3, we present the details of our approach. We show experimental results followed by additional analysis and discussion
in Section 4. Finally, conclusions are presented in Section 5.
2. Related work
Reputation is a measure that is derived from direct or indirect knowledge on earlier interactions of entities and is
used to assess the level of trust an entity puts into another entity [1].
Reputation systems are typically based on public information in order to reflect the community's opinion in general
[2]. The simplest way to compute a reputation score is to sum the number of positive ratings and the number of negative
ratings separately, and to keep a total score equal to the positive score minus the negative score. This is the principle
used in eBay's reputation forum, which is described in [3]. In [4], a more advanced scheme is proposed that computes the
reputation score as the average of all ratings; this principle is used in the reputation systems of numerous commercial web
sites, such as Epinions and Amazon. Advanced models in this category compute a weighted average of all the ratings,
where the rating weight can be determined by factors such as rater trustworthiness/reputation, age of the rating,
distance between the rating and the current score, etc.
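The three aggregation schemes described above can be sketched as follows. This is only a minimal illustration, not code from [2], [3] or [4]; the function names and the encoding of feedback as +1 or -1 are assumptions made for the example.

```python
def net_score(feedback):
    """Positive-minus-negative score; feedback items are assumed to be +1 or -1."""
    positives = sum(1 for f in feedback if f > 0)
    negatives = sum(1 for f in feedback if f < 0)
    return positives - negatives


def average_score(ratings):
    """Plain average of all ratings."""
    return sum(ratings) / len(ratings)


def weighted_average_score(ratings, weights):
    """Weighted average; a weight could encode rater trust or the age of a rating."""
    return sum(r * w for r, w in zip(ratings, weights)) / sum(weights)


print(net_score([1, 1, -1, 1]))
print(average_score([8, 9, 7]))
print(weighted_average_score([10, 5], [3, 1]))
```

In the weighted variant, the first rating counts three times as much as the second, which pulls the score towards it.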
Recently, Yan et al. [5] proposed a novel reputation generation approach based on opinion fusion and mining. In their
approach, opinions are filtered to eliminate unrelated ones, and then grouped into a number of fused principal opinion
sets that contain opinions with a similar or the same attitude or preference. By aggregating the ratings attached to the
fused opinions, they normalize the reputation of an entity. They claimed that: “No work has explored the opinions
expressed in natural languages, opinion voting, opinion citation and user feedback ratings in a comprehensive way
for reputation generation” [5].
3. Proposed method
In this section, we recall the LSA technique and the K-means algorithm, then we describe our proposed method for
reputation generation in depth.
3.1. Latent Semantic Analysis
Latent Semantic Analysis (LSA) is a natural language processing technique for analyzing relationships between a
set of documents and the terms they contain by producing a set of concepts related to the documents and terms. In [6],
T.K. Landauer, P.W. Foltz and D. Laham describe LSA as follows: “Latent Semantic Analysis (LSA) is a theory and
method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a
large corpus of text (Landauer and Dumais, 1997). The underlying idea is that the aggregate of all the word contexts
in which a given word does and does not appear provides a set of mutual constraints that largely determines the
similarity of meaning of words and sets of words to each other. The adequacy of LSA's reflection of human knowledge
has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and
subject matter tests; it mimics human word sorting and category judgments; it simulates word–word and passage–word
lexical priming data, and, it accurately estimates passage coherence, learnability of passages by individual students,
and the quality of knowledge contained in an essay”.
3.2. K-means Algorithm
K-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster
analysis in data mining. The K-means algorithm aims to divide M points in N dimensions into K clusters so that the
within-cluster sum of squares is minimized [7][8].
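To make the objective concrete, the sketch below implements the common Lloyd-style iteration (alternating assignment and centroid update) in plain Python. It is an illustrative assumption, not the Hartigan-Wong procedure of [7] and not the implementation used in our experiments; the helper names, the random initialization and the toy points are all hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """The within-cluster sum of squares that K-means tries to minimize."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
wcss = within_cluster_ss(points, labels, centroids)
print(labels, wcss)
```

On these two well-separated groups the iteration recovers the obvious split; on real data the result depends on the initialization, which is why library implementations usually restart several times.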
3.3. System overview
We propose the following procedure to cluster and mine opinions for reputation generation.
1. Opinion data collection and preprocessing. During this step, we collect the opinion data about an entity (product,
movie, etc.) from websites. Because raw opinion data come in many forms and contain many words and symbols, the
collected data must be preprocessed, for example by word segmentation, stop-word filtering, and the removal of
useless expressions and pictures.
2. Opinion clustering. After applying the LSA model, we group opinions into clusters by using the K-means
algorithm. In this step, some statistics needed for reputation generation are gathered, such as the number of opinions,
the sum of similarity and the sum of ratings in each cluster.
3. Reputation generation. This step further aggregates the clustered opinions to generate a reputation value by
considering the popularity and other statistics of the principal opinions.
3.4. Opinion clustering
The opinion clustering algorithm is shown below (Algorithm 1).
Algorithm 1 Opinion clustering
Begin
Step 1: Apply the LSA model (we used TruncatedSVD from the scikit-learn library in Python).
Step 2: Set the number of clusters and apply the K-means clustering algorithm.
Step 3: Acquire the statistics of each cluster: the sum of the similarity in the cluster (using the cosine similarity
metric), the sum of ratings in the cluster and the number of similar reviews in the cluster.
End
By applying Algorithm 1, we group the LSA-transformed opinions into several principal opinion sets (clusters). The
opinions in each cluster hold a similar or the same perspective. Algorithm 1 also yields the statistics of the clusters,
i.e., the number of similar opinions in each cluster, the sum of their ratings and the sum of their similarity computed
with the cosine similarity metric.
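The three steps of Algorithm 1 could be sketched with scikit-learn as shown below. This is a hedged illustration rather than the code used in our experiments: the TF-IDF weighting before TruncatedSVD, the choice of the cluster centroid as the reference for cosine similarity, the function name `cluster_opinions`, the parameter defaults and the toy reviews are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def cluster_opinions(reviews, ratings, n_clusters, n_components=2, seed=0):
    """Return one dict of statistics (SimSum, RatSum, Num) per non-empty cluster."""
    ratings = np.asarray(ratings, dtype=float)
    # Step 1: LSA, here TF-IDF term weighting followed by a truncated SVD.
    tfidf = TfidfVectorizer().fit_transform(reviews)
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    lsa = svd.fit_transform(tfidf)
    # Step 2: K-means in the reduced semantic space.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(lsa)
    # Step 3: per-cluster statistics.
    stats = []
    for k in range(n_clusters):
        mask = labels == k
        if not mask.any():
            continue
        members = lsa[mask]
        # Assumption: similarity is measured against the cluster centroid.
        centroid = members.mean(axis=0, keepdims=True)
        stats.append({
            "sim_sum": float(cosine_similarity(members, centroid).sum()),
            "rat_sum": float(ratings[mask].sum()),
            "num": int(mask.sum()),
        })
    return stats


# Hypothetical toy reviews and ratings, not taken from the dataset.
reviews = [
    "a moving story with superb acting",
    "superb acting and a moving ending",
    "great story and great acting",
    "boring plot and weak dialogue",
    "weak plot, boring and far too long",
    "too long and the dialogue is boring",
]
ratings = [9, 10, 9, 3, 2, 4]
stats = cluster_opinions(reviews, ratings, n_clusters=2)
print(stats)
```

Each returned dictionary corresponds to one row of cluster statistics: the similarity sum, the rating sum and the number of reviews.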
3.5. Reputation generation
Based on the result of opinion clustering, we propose a method for generating a single reputation value for an entity.
Overall, it is important to show users a concrete reputation scale expressed by a single value. This presentation of
reputation provides a good user experience, especially for mobile Internet users whose devices have small screens.
We propose formula (1) to generate the reputation of an entity “A” based on the clustering of the opinions on “A”.
$$
\mathrm{Rep}(A) = \frac{1}{\text{n\_clusters}} \sum_{k=1}^{\text{n\_clusters}} \frac{V_k \cdot S_k}{N_k \cdot N_k} \qquad (1)
$$
We denote:
n_clusters : The number of clusters.
N_k : The number of opinions in cluster k.
S_k : The sum of the similarity in cluster k.
V_k : The sum of ratings in cluster k.
In (1), we assume that each opinion has a rating on the entity attached to it. In our case, the rating is a number
ranging from 1 to 10 to represent a level of satisfaction.
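As a worked illustration of formula (1), the sketch below evaluates it on the example cluster statistics reported in Table 2 of Section 4. The function name `reputation` and the tuple layout are assumptions made for the example.

```python
def reputation(clusters):
    """Formula (1): mean over clusters of (V_k * S_k) / (N_k * N_k).

    Each cluster is a tuple (S_k, V_k, N_k): similarity sum, rating sum, size.
    """
    total = sum(v * s / (n * n) for s, v, n in clusters)
    return total / len(clusters)


# (SimSum, RatSum, Num) rows copied from Table 2.
table2 = [
    (12.99, 94, 13),
    (14.97, 120, 15),
    (14.94, 114, 15),
    (9.98, 61, 10),
    (10.95, 91, 11),
    (14.96, 128, 15),
    (9.98, 82, 10),
    (10.99, 90, 11),
]
rep = reputation(table2)
print(round(rep, 2))
```

Each term is the product of the average rating V_k / N_k and the average similarity S_k / N_k of a cluster, so the result stays on the 1 to 10 rating scale; for these eight clusters it comes out at roughly 7.75.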
4. Results and discussion
4.1. Dataset
We manually created a dataset containing 600 reviews of six different movies, collected from the IMDB website, which
hosts user reviews and ratings of movies.
The statistical information of the dataset is shown in Table 1.
Table 1. Statistical information of Datasets
The total number of reviews and ratings 600
The number of reviews per movie 100
4.2. Preprocessing reviews
After collecting all the reviews, we applied tokenization, stemming and stop-word removal to them so that they could be
used for opinion clustering.
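The preprocessing chain can be pictured with the simplified sketch below. The tiny stop-word list and the crude suffix-stripping rule are stand-ins chosen for illustration only; they are assumptions and not the tokenizer, stemmer or stop-word list used in our experiments.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "it", "this"}


def crude_stem(token):
    """Strip one common suffix; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review):
    """Tokenize on letters, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The acting was amazing and the ending surprised me.")
print(tokens)
```

A production pipeline would normally replace `crude_stem` with an established stemmer and use a complete stop-word list.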
4.3. Evaluation measures
To measure the effectiveness of our system, we use the AE (Absolute Error) and the MAE (Mean Absolute Error), which
are defined as follows:
Absolute Error: The absolute difference between the measured or inferred value of a quantity and its actual value.
Mean Absolute Error: The average of the absolute differences between predictions and actual observations.
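Both measures are straightforward to compute, as the short sketch below shows. The function names and the sample values are hypothetical and are not taken from our experiments.

```python
def absolute_error(predicted, actual):
    """AE: absolute difference between a computed value and the reference value."""
    return abs(predicted - actual)


def mean_absolute_error(predicted, actual):
    """MAE: average of the absolute errors over all entities."""
    errors = [absolute_error(p, a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# Hypothetical values for three movies.
reputation_values = [9.1, 8.0, 7.4]
reference_votes = [9.3, 8.1, 7.4]
mae = mean_absolute_error(reputation_values, reference_votes)
print(mae)
```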
4.4. Opinion clustering
The reviews can be grouped into a number of clusters. We can also acquire their statistics during clustering, such
as the number of similar opinions, the sum of similarity, and the sum of ratings in each cluster. To illustrate this
process, Table 2 gives example results of opinion clustering based on the 100 reviews of one movie in the dataset.
We provide a Python implementation of the clustering step on GitHub (see footnote 1).
To determine the best value of n_clusters, we ran the method many times with different numbers of clusters.
4.5. Reputation generation
In order to evaluate our approach, we compared the final reputation computed by formula (1) with the users' weighted
average vote computed by the IMDB website (IMDBWAV) to rate a target movie, which is a number
ranging from 1 to 10, as shown in Fig. 1. We varied the number of clusters from 2 to 19. Figs. 2 and 3 show the Absolute
Error between IMDBWAV (IMDB users' weighted average vote) and the reputation value computed by our approach for
all movies.
1 https://github.com/abdessamadbenlahbib/Reputation-generation-K-mean/blob/master/Python_Code.txt
Table 2. Example results of opinion clustering (Algorithm 1).
Cluster SimSum RatSum Num
C1 12.99 94 13
C2 14.97 120 15
C3 14.94 114 15
C4 9.98 61 10
C5 10.95 91 11
C6 14.96 128 15
C7 9.98 82 10
C8 10.99 90 11
Legend: SimSum: the sum of the similarity in a cluster.
RatSum: the sum of ratings in a cluster.
Num: the number of similar reviews in a cluster.
Fig. 1. IMDBWAV (IMDB users weighted average vote) for The Shawshank Redemption movie
Fig. 2. Absolute Error between IMDBWAV and the reputation value computed by our approach for movies 1, 2 and 3
Fig. 3. Absolute Error between IMDBWAV and the reputation value computed by our approach for movies 4, 5 and 6
As we can see in Figs. 2 and 3, the Absolute Error between IMDBWAV and the reputation value computed by our
approach is high when n_clusters = 2 and n_clusters = 3, then it begins to decrease. As described in
Algorithm 1, different values of n_clusters can lead to different clusterings of the reviews, and therefore to
different final reputation values. Choosing a suitable value of n_clusters is thus particularly important. We
conducted experiments to study the influence of the number of clusters n_clusters on reputation generation. We
varied the number of clusters from 2 to 19 and computed the MAE (Mean Absolute Error) between IMDBWAV
and the reputation values computed by our approach over all the reviews of the dataset, for
n_clusters = 2, 3, ..., 19.
Fig. 4 shows the result of our experiments.
Fig. 4. MAE for different n_clusters values
Both Fig. 4 and Table 3 show that our approach performs best when n_clusters = 9, since the MAE between
IMDBWAV and the values computed by (1) using all the reviews of the dataset reaches its minimum there.
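Selecting n_clusters amounts to taking the value with the smallest MAE. The sketch below does this on the values reported in Table 3; the variable names are assumptions made for the example.

```python
# MAE per number of clusters, copied from Table 3.
mae_by_n_clusters = {
    2: 0.43478668, 3: 0.28583178, 4: 0.18399365, 5: 0.1158738,
    6: 0.08081721, 7: 0.06048751, 8: 0.04663486, 9: 0.0419782,
    10: 0.0535229, 11: 0.05158553, 12: 0.05617364, 13: 0.07550483,
    14: 0.05620427, 15: 0.05560771, 16: 0.05956126, 17: 0.08015718,
    18: 0.07338574, 19: 0.04463707,
}

# Pick the key whose MAE is smallest.
best_n_clusters = min(mae_by_n_clusters, key=mae_by_n_clusters.get)
print(best_n_clusters, mae_by_n_clusters[best_n_clusters])
```

The MAE does not decrease monotonically beyond the minimum (for example, it rises at 13 and 17 and drops again at 19), so scanning the whole range is safer than stopping at the first increase.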
5. Conclusions
In this paper, we have proposed an approach to generate reputation based on opinion clustering. By performing
opinion clustering, we group various opinions into a number of clusters and obtain their popularity, average similarity
and ratings. Thus, it becomes easy to aggregate all clusters to generate a single reputation value.
The experimental results show that, with a suitable number of clusters, our approach produces reputation values close to
the IMDB weighted average vote for the target movies (an MAE of about 0.042 at n_clusters = 9).
Table 3. MAE between IMDBWAV and the reputation values computed by our method using all the reviews of the 6 movies
n_clusters    Mean Absolute Error
2 0.43478668
3 0.28583178
4 0.18399365
5 0.1158738
6 0.08081721
7 0.06048751
8 0.04663486
9 0.0419782
10 0.0535229
11 0.05158553
12 0.05617364
13 0.07550483
14 0.05620427
15 0.05560771
16 0.05956126
17 0.08015718
18 0.07338574
19 0.04463707
6. References
[1] Z. Yan. Trust Management in Mobile Environments - Usable and Autonomic Models. IGI Global, Hershey,
Pennsylvania, USA, 2013.
[2] A. Jøsang, R. Ismail, C. Boyd. A survey of trust and reputation systems for online service provision.
Decision Support Systems, Volume 43, Issue 2, March 2007, pp. 618-644. DOI: 10.1016/j.dss.2005.05.019.
[3] P. Resnick and R. Zeckhauser. Trust Among Strangers in Internet Transactions: Empirical Analysis of
eBay's Reputation System. In M.R. Baye, editor, The Economics of the Internet and E-Commerce, volume 11 of
Advances in Applied Microeconomics. Elsevier Science, 2002.
[4] J. Schneider et al. Disseminating Trust Information in Wearable Communities. In Proceedings of the 2nd
International Symposium on Handheld and Ubiquitous Computing (HUC2K), September 2000.
[5] Z. Yan, X. Jing, W. Pedrycz. Fusing and Mining Opinions for Reputation Generation. Information Fusion (2016).
DOI: 10.1016/j.inffus.2016.11.011.
[6] T. K. Landauer, P. W. Foltz, D. Laham. Introduction to Latent Semantic Analysis. Discourse Processes, 25 (1998),
pp. 259-284.
[7] J. A. Hartigan and M. A. Wong. Algorithm AS 136: A K-Means Clustering Algorithm. Journal of
the Royal Statistical Society. Series C (Applied Statistics), Vol. 28, No. 1 (1979), pp. 100-108.
[8] J. A. Hartigan. Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA, 1975.
ISBN: 047135645X.

More Related Content

PDF
Using NLP Approach for Analyzing Customer Reviews
PDF
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
PDF
International Journal of Engineering Research and Development (IJERD)
PDF
International Journal of Engineering Research and Development (IJERD)
PDF
Book recommendation system using opinion mining technique
PDF
Computing Ratings and Rankings by Mining Feedback Comments
PDF
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
PDF
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
Using NLP Approach for Analyzing Customer Reviews
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
Book recommendation system using opinion mining technique
Computing Ratings and Rankings by Mining Feedback Comments
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION

Similar to An Unsupervised Approach For Reputation Generation (20)

PDF
Empirical Model of Supervised Learning Approach for Opinion Mining
PDF
Ijetcas14 580
PDF
Framework for opinion as a service on review data of customer using semantics...
PDF
A Survey on Opinion Mining and its Challenges
PDF
OPINION MINING AND ANALYSIS: A SURVEY
PDF
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
PDF
A Novel Jewellery Recommendation System using Machine Learning and Natural La...
PPTX
Business Analytics Final Capstone Project Presenation PPT.pptx
PDF
Ijebea14 271
PDF
IRJET- Analyzing Sentiments in One Go
PDF
OPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNET
PDF
B021202011015
PDF
IRJET- Opinion Mining and Sentiment Analysis for Online Review
PDF
Anu paper(IJARCCE)
PDF
MTVRep: A movie and TV show reputation system based on fine-grained sentiment ...
PPT
Mining Product Reputations On the Web
PDF
An Opinion Mining and Sentiment Analysis Techniques: A Survey
PDF
IRJET- Implementation of Review Selection using Deep Learning
PDF
EXTRACTING BUSINESS INTELLIGENCE FROM ONLINE PRODUCT REVIEWS
Empirical Model of Supervised Learning Approach for Opinion Mining
Ijetcas14 580
Framework for opinion as a service on review data of customer using semantics...
A Survey on Opinion Mining and its Challenges
OPINION MINING AND ANALYSIS: A SURVEY
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
A Novel Jewellery Recommendation System using Machine Learning and Natural La...
Business Analytics Final Capstone Project Presenation PPT.pptx
Ijebea14 271
IRJET- Analyzing Sentiments in One Go
OPTIMIZATION OF CROSS DOMAIN SENTIMENT ANALYSIS USING SENTIWORDNET
B021202011015
IRJET- Opinion Mining and Sentiment Analysis for Online Review
Anu paper(IJARCCE)
MTVRep: A movie and TV show reputation system based on fine-grained sentiment ...
Mining Product Reputations On the Web
An Opinion Mining and Sentiment Analysis Techniques: A Survey
IRJET- Implementation of Review Selection using Deep Learning
EXTRACTING BUSINESS INTELLIGENCE FROM ONLINE PRODUCT REVIEWS
Ad

More from Kayla Jones (20)

PDF
Free Printable Stationery - Letter Size. Online assignment writing service.
PDF
Critique Response Sample Summary Response Essa
PDF
Definisi Dan Contoh Paragraph Cause And Effe
PDF
Analysis on A s Laundry Shop A Profit Maximization Approach.pdf
PDF
A Close and Distant Reading of Shakespearean Intertextuality.pdf
PDF
A CRITICAL ANALYSIS OF THE STATUS AND APPLICATION OF THE RESPONSIBILITY TO PR...
PDF
A Developmental Evolutionary Framework for Psychology.pdf
PDF
A Pedagogical Model for Improving Thinking About Learning.pdf
PDF
5 The epidemiology of obesity.pdf
PDF
Agri-tourism handbook.pdf
PDF
1001 Solved Engineering Fundamentals Problems 3rd Ed..pdf.pdf
PDF
A THEORETICAL FRAMEWORK OF STRESS MANAGEMENT- CONTEMPORARY APPROACHES, MODELS...
PDF
Assessing Testing Practices with Reference to Communicative Competence in Ess...
PDF
abstract on climate change.pdf
PDF
A research strategy for text desigbers The role of headings.pdf
PDF
An Analysis Of Consumers Perception Towards Rebranding A Study Of Hero Moto...
PDF
A Psicologia Da Crian A Jean Piaget
PDF
An Exposition Of The Nature Of Volunteered Geographical Information And Its S...
PDF
Addressing Homelessness In Public Parks
PDF
A Critical Analysis Of The Academic Papers Written By Experienced Associate A...
Free Printable Stationery - Letter Size. Online assignment writing service.
Critique Response Sample Summary Response Essa
Definisi Dan Contoh Paragraph Cause And Effe
Analysis on A s Laundry Shop A Profit Maximization Approach.pdf
A Close and Distant Reading of Shakespearean Intertextuality.pdf
A CRITICAL ANALYSIS OF THE STATUS AND APPLICATION OF THE RESPONSIBILITY TO PR...
A Developmental Evolutionary Framework for Psychology.pdf
A Pedagogical Model for Improving Thinking About Learning.pdf
5 The epidemiology of obesity.pdf
Agri-tourism handbook.pdf
1001 Solved Engineering Fundamentals Problems 3rd Ed..pdf.pdf
A THEORETICAL FRAMEWORK OF STRESS MANAGEMENT- CONTEMPORARY APPROACHES, MODELS...
Assessing Testing Practices with Reference to Communicative Competence in Ess...
abstract on climate change.pdf
A research strategy for text desigbers The role of headings.pdf
An Analysis Of Consumers Perception Towards Rebranding A Study Of Hero Moto...
A Psicologia Da Crian A Jean Piaget
An Exposition Of The Nature Of Volunteered Geographical Information And Its S...
Addressing Homelessness In Public Parks
A Critical Analysis Of The Academic Papers Written By Experienced Associate A...
Ad

Recently uploaded (20)

PDF
Computing-Curriculum for Schools in Ghana
PDF
Sports Quiz easy sports quiz sports quiz
PDF
Classroom Observation Tools for Teachers
PDF
Insiders guide to clinical Medicine.pdf
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPTX
Lesson notes of climatology university.
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
Pre independence Education in Inndia.pdf
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
Institutional Correction lecture only . . .
PPTX
Cell Types and Its function , kingdom of life
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Complications of Minimal Access Surgery at WLH
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
Computing-Curriculum for Schools in Ghana
Sports Quiz easy sports quiz sports quiz
Classroom Observation Tools for Teachers
Insiders guide to clinical Medicine.pdf
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Lesson notes of climatology university.
Abdominal Access Techniques with Prof. Dr. R K Mishra
Pre independence Education in Inndia.pdf
TR - Agricultural Crops Production NC III.pdf
Institutional Correction lecture only . . .
Cell Types and Its function , kingdom of life
STATICS OF THE RIGID BODIES Hibbelers.pdf
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
Microbial diseases, their pathogenesis and prophylaxis
2.FourierTransform-ShortQuestionswithAnswers.pdf
Pharmacology of Heart Failure /Pharmacotherapy of CHF
human mycosis Human fungal infections are called human mycosis..pptx
Complications of Minimal Access Surgery at WLH
FourierSeries-QuestionsWithAnswers(Part-A).pdf
Renaissance Architecture: A Journey from Faith to Humanism

An Unsupervised Approach For Reputation Generation

  • 1. ScienceDirect Available online at www.sciencedirect.com Procedia Computer Science 148 (2019) 80–86 1877-0509 © 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/) Peer-review under responsibility of the scientific committee of the Second International Conference on Intelligent Computing in Data Sciences (ICDS 2018). 10.1016/j.procs.2019.01.011 © 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/) Peer-review under responsibility of the scientific committee of the Second International Conference on Intelligent Computing in Data Sciences (ICDS 2018). Keywords: Semantic analysis; Opinion mining; Reputation generation; Machine learning. 1. Introduction Over the past few years, the web has been growing at an incredible rate. Nowadays, people can buy products, watch movies and make reservations via the Internet. Before making a decision, most people would like to seek other people’s opinions on a target entity in order to judge its performances. In this case, one could scan both entity descriptions and user comments. Hence, people have generally established a certain online custom: first look at other users’ comments, then make a decision towards the target entity. On the other hand, online sellers would like to collect the user comments with high praise and put them in the description of their products in order to attract more purchases. According to recent statistics, the number of users of some famous online shopping centers, e.g., Taobao, Jingdong and Amazon has exceeded one billion. Each of above commercial websites contains a huge number of product comments. These comments contain user opinions on the products. 
As the opinions show the subjective attitudes, evaluations, and speculations of users expressed in natural languages, this kind of contents contributed by the Internet users has been well recognized as valuable information. It can be exploited to analyze public opinions on a specific product in ∗ Corresponding author. Tel.: +212632561278. E-mail address: abdessamad.benlahbib@usmba.ac.ma Second International Conference on Intelligent Computing in Data Sciences (ICDS 2018) An Unsupervised Approach for Reputation Generation Abdessamad Benlahbiba,∗, El Habib Nfaouia aLIIAN Laboratory, Faculty of Sciences Dhar EL Mehraz, Sidi Mohammed Ben Abdellah University, Fez, Morocco Abstract Nowadays, watching a movie, buying a product, making hotel reservations and other e-commerce trades are strung to consulting other peoples reviews and recommendations on the target entity. Indeed, Amazon, IMDB (Internet Movie Database) as well as several websites provide a convenient platform where users share freely their opinions and their subjective attitudes towards the target entity with no restrictions. However, those opinions are too much to be examined one by one, this is why a general reputation value makes the task of choosing the right product much easier. In this paper, we propose a reputation generation approach based on opinion clustering and semantic analysis. In our approach, opinions are grouped into a number of clusters that contain opinions with the same attitude or preference. By aggregating the ratings attached to the clusters, we generate the reputation of an entity. Experimental results demonstrate the effectiveness of the proposed approach in generating reputation value.
  • 2. Abdessamad Benlahbib et al. / Procedia Computer Science 148 (2019) 80–86 81
order to figure out user likes or dislikes [5].
In this paper, we propose to apply the LSA (Latent Semantic Analysis) model and then the K-means algorithm to cluster opinions based on their semantic relations. By aggregating the ratings attached to the fused opinions, we then normalize the reputation of an entity.
The paper is organized as follows. Section 2 gives a brief review of related work. Section 3 presents the details of our approach. Section 4 reports experimental results, followed by additional analysis and discussion. Finally, Section 5 presents our conclusions.
2. Related work
Reputation is a measure that is derived from direct or indirect knowledge of earlier interactions between entities and is used to assess the level of trust one entity puts into another [1]. Reputation systems are typically based on public information in order to reflect the community's opinion in general [2].
The simplest way to compute a reputation score is to sum the positive ratings and the negative ratings separately, and to keep a total score equal to the positive score minus the negative score. This is the principle used in eBay's reputation forum, which is described in [3]. In [4], a more advanced scheme is proposed that computes the reputation score as the average of all ratings; this principle is used in the reputation systems of numerous commercial web sites, such as Epinions and Amazon. Advanced models in this category compute a weighted average of all the ratings, where the weight of a rating can be determined by factors such as rater trustworthiness/reputation, age of the rating, distance between the rating and the current score, etc.
Recently, Yan et al. [5] proposed a novel reputation generation approach based on opinion fusion and mining.
In their approach, opinions are filtered to eliminate unrelated ones, and then grouped into a number of fused principal opinion sets that contain opinions with a similar or the same attitude or preference. By aggregating the ratings attached to the fused opinions, they normalize the reputation of an entity. They claimed that: “No work has explored the opinions expressed in natural languages, opinion voting, opinion citation and user feedback ratings in a comprehensive way for reputation generation” [5].
3. Proposed method
In this section, we recall the LSA technique and the K-means algorithm, then we describe in depth our proposed method for reputation generation.
3.1. Latent Semantic Analysis
Latent Semantic Analysis (LSA) is a natural language processing technique for analyzing the relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.
In [6], T.K. Landauer, P.W. Foltz and D. Laham describe LSA as follows: “Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text (Landauer and Dumais, 1997). The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. The adequacy of LSA's reflection of human knowledge has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word-word and passage-word lexical priming data, and, it accurately estimates passage coherence, learnability of passages by individual students, and the quality of knowledge contained in an essay”.
3.2. K-means Algorithm
K-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. The K-means algorithm aims to divide M points in N dimensions into K clusters so that the within-cluster sum of squares is minimized [7][8].
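The quantity that K-means minimizes can be made concrete with a small sketch. The function below is illustrative only (its name and the toy points are assumptions, not taken from the paper): it computes the within-cluster sum of squares for a given assignment of points to clusters, so that two candidate assignments can be compared.

```python
from collections import defaultdict


def within_cluster_sum_of_squares(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


# Toy data (hypothetical): two tight groups, far apart along the first axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = within_cluster_sum_of_squares(points, [0, 0, 1, 1])  # groups kept together
bad = within_cluster_sum_of_squares(points, [0, 1, 0, 1])   # groups split apart
print(good, bad)
```

The assignment that keeps nearby points together gives the smaller value, which is the assignment K-means would prefer for K = 2.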
  • 3.
3.3. System overview
We propose the following procedure to cluster and mine opinions for reputation generation.
1. Opinion data collection and preprocessing. During this step, we collect from websites the opinion data about an entity (product, movie, etc.). Because raw opinion data come in many forms and contain many irrelevant words and symbols, the collected data must be preprocessed, for example by word segmentation, stop-word filtering, and removal of useless expressions and pictures.
2. Opinion clustering. After applying the LSA model, we group opinions into different clusters by using the K-means algorithm. In this step, some statistics needed for reputation generation are obtained, such as the number of opinions, the sum of similarity and the sum of ratings in each cluster.
3. Reputation generation. This step further aggregates the clustered opinions to generate a reputation value by considering the popularity and other statistics of the principal opinions.
3.4. Opinion clustering
The opinion clustering algorithm is shown below (Algorithm 1).
Algorithm 1 Opinion clustering
Begin
Step 1: Apply the LSA model (we used TruncatedSVD from the Sklearn library in Python).
Step 2: Set a number of clusters and apply the K-means clustering algorithm.
Step 3: Acquire the statistics of each cluster: the sum of the similarity in the cluster (using the cosine similarity metric), the sum of ratings in the cluster, and the number of similar reviews in the cluster.
End
By applying Algorithm 1, we group the LSA-transformed opinions into several principal opinion sets (clusters). The opinions in each cluster hold a similar or identical perspective.
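The three steps of Algorithm 1 can be sketched as follows. This is a minimal illustration and not the authors' code: the TF-IDF weighting before TruncatedSVD, the choice of measuring cosine similarity against the cluster centroid, the function name `cluster_opinions`, and the toy reviews are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def cluster_opinions(reviews, ratings, n_clusters, n_components=2, seed=0):
    """Return per-cluster statistics (Num, SimSum, RatSum) for a list of reviews."""
    # Step 1: LSA, i.e. a term-document matrix reduced by truncated SVD.
    # Assumption: TF-IDF weighting; the paper only names TruncatedSVD.
    tfidf = TfidfVectorizer().fit_transform(reviews)
    lsa = TruncatedSVD(n_components=n_components, random_state=seed)
    vectors = lsa.fit_transform(tfidf)

    # Step 2: K-means on the LSA vectors.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(vectors)

    # Step 3: statistics of each cluster.
    ratings = np.asarray(ratings, dtype=float)
    stats = []
    for k in range(n_clusters):
        mask = labels == k
        centroid = kmeans.cluster_centers_[k].reshape(1, -1)
        # Assumption: each review is compared with its cluster centroid.
        sim_sum = float(cosine_similarity(vectors[mask], centroid).sum())
        stats.append({
            "num": int(mask.sum()),
            "sim_sum": sim_sum,
            "rat_sum": float(ratings[mask].sum()),
        })
    return stats


# Hypothetical toy reviews with ratings on the 1 to 10 scale.
reviews = [
    "great acting and a moving story",
    "moving story with great acting",
    "great story, great acting",
    "boring plot and weak dialogue",
    "weak dialogue, boring plot",
    "boring and weak plot",
]
ratings = [9, 10, 8, 3, 2, 4]
stats = cluster_opinions(reviews, ratings, n_clusters=2)
for k, s in enumerate(stats, start=1):
    print(f"C{k}", s)
```

Fixing `random_state` keeps the sketch reproducible; on real review data the number of SVD components would normally be much larger than 2.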
Along with the clusters, Algorithm 1 yields their statistics: the number of similar opinions in each cluster, the sum of their ratings, and the sum of their similarities computed with the cosine similarity metric.
3.5. Reputation generation
Based on the result of opinion clustering, we propose a method for generating a single reputation value for an entity. Overall, it is important to show users a concrete scale of reputation expressed by a single value. This presentation of reputation can provide a good user experience, especially for mobile Internet users whose devices have small screens. We propose formula (1) to generate the reputation of entity “A” based on the clustering of the opinions on “A”.
$$\mathrm{Rep}(A) = \frac{1}{n_{\text{clusters}}} \sum_{k=1}^{n_{\text{clusters}}} \frac{V_k \cdot S_k}{N_k \cdot N_k} \tag{1}$$
We denote:
n_clusters: the number of clusters.
N_k: the number of opinions in cluster k.
S_k: the sum of the similarity in cluster k.
V_k: the sum of ratings in cluster k.
In (1), we assume that each opinion has a rating on the entity attached to it. In our case, the rating is a number ranging from 1 to 10 that represents a level of satisfaction.
4. Results and discussion
4.1. Dataset
We manually built a dataset of 600 reviews for six different movies from the IMDB website, which hosts user reviews and ratings of movies. The statistical information of the dataset is shown in Table 1.
Table 1. Statistical information of the dataset
The total number of reviews and ratings    600
The number of reviews per movie            100
4.2. Preprocessing reviews
After collecting all reviews, we applied tokenization, stemming and stop-word removal to them so that they could be used for opinion clustering.
4.3. Evaluation measures
To measure the effectiveness of our system, we use AE (Absolute Error) and MAE (Mean Absolute Error), which are defined as follows:
Absolute Error: the absolute difference between the measured or inferred value of a quantity and its actual value.
Mean Absolute Error: the average of the absolute differences between predictions and actual observations.
4.4. Opinion clustering
The reviews can be grouped into a number of clusters. We can also acquire their statistics during clustering, such as the number of similar opinions, the sum of similarity, and the sum of ratings in each cluster. To illustrate this process, Table 2 gives example results of opinion clustering on the 100 reviews of one movie in the dataset. We provide a Python implementation of the clustering step on GitHub (see footnote 1). To determine the best value of n_clusters, we ran the clustering many times with different numbers of clusters.
4.5. Reputation generation
In order to evaluate our approach, we compared the final reputation computed by formula (1) with the IMDB users' weighted average vote (IMDBWAV), a number ranging from 1 to 10 that IMDB computes to represent the rating of a target movie, as shown in Fig. 1. We varied the number of clusters from 2 to 19. Figs. 2 and 3 show the Absolute Error between IMDBWAV and the reputation value computed by our approach for all movies.
1 https://github.com/abdessamadbenlahbib/Reputation-generation-K-mean/blob/master/Python_Code.txt
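The evaluation described above can be sketched in a few lines. The helper names and all numbers below are hypothetical and chosen only to illustrate the computation: AE is taken per movie, MAE averages the AE values over movies, and the candidate number of clusters with the smallest MAE is retained.

```python
def absolute_error(predicted, actual):
    """Absolute difference between a computed reputation and the reference value."""
    return abs(predicted - actual)


def mean_absolute_error(predicted_values, actual_values):
    """Average of the absolute errors over all movies."""
    if len(predicted_values) != len(actual_values):
        raise ValueError("both sequences must have the same length")
    errors = [absolute_error(p, a) for p, a in zip(predicted_values, actual_values)]
    return sum(errors) / len(errors)


def best_n_clusters(reputation_by_n, actual_values):
    """Return the n_clusters value with the lowest MAE, plus all MAE values."""
    mae_by_n = {
        n: mean_absolute_error(reps, actual_values)
        for n, reps in reputation_by_n.items()
    }
    return min(mae_by_n, key=mae_by_n.get), mae_by_n


# Hypothetical reference votes for three movies (not values from the paper).
imdbwav = [9.0, 8.0, 7.0]
# Hypothetical reputation values obtained with 2, 3 and 4 clusters.
reputation_by_n = {
    2: [8.0, 7.5, 7.5],
    3: [8.5, 8.0, 7.5],
    4: [9.0, 8.5, 7.0],
}
best_n, mae_by_n = best_n_clusters(reputation_by_n, imdbwav)
print(best_n, mae_by_n)
```

In the paper's setting, the dictionary would hold one list of six reputation values for each number of clusters from 2 to 19.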
  • 5.
Table 2. Example results of opinion clustering (Algorithm 1).
Cluster  SimSum  RatSum  Num
C1       12.99   94      13
C2       14.97   120     15
C3       14.94   114     15
C4       9.98    61      10
C5       10.95   91      11
C6       14.96   128     15
C7       9.98    82      10
C8       10.99   90      11
Legend: SimSum: the sum of the similarity in a cluster. RatSum: the sum of ratings in a cluster. Num: the number of similar reviews in a cluster.
Fig. 1. IMDBWAV (IMDB users weighted average vote) for The Shawshank Redemption movie
Fig. 2. Absolute Error between IMDBWAV and reputation value computed by our approach for movie 1, 2 and 3
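As a worked example, formula (1) can be applied directly to the cluster statistics reported in Table 2. The sketch below is only an illustration of the arithmetic; the function name `reputation` and the tuple layout are assumptions.

```python
def reputation(cluster_stats):
    """Formula (1): mean over clusters of (V_k * S_k) / (N_k * N_k).

    Each entry is (S_k, V_k, N_k): similarity sum, rating sum, opinion count.
    """
    terms = [(v * s) / (n * n) for s, v, n in cluster_stats]
    return sum(terms) / len(terms)


# (SimSum, RatSum, Num) rows copied from Table 2.
table_2 = [
    (12.99, 94, 13),
    (14.97, 120, 15),
    (14.94, 114, 15),
    (9.98, 61, 10),
    (10.95, 91, 11),
    (14.96, 128, 15),
    (9.98, 82, 10),
    (10.99, 90, 11),
]
rep = reputation(table_2)
print(round(rep, 2))  # about 7.75
```

Each term is the product of the average rating V_k / N_k and the average similarity S_k / N_k of a cluster, so the result stays on the same 1 to 10 scale as the ratings.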
  • 6.
Fig. 3. Absolute Error between IMDBWAV and reputation value computed by our approach for movie 4, 5 and 6
As we can see in Figs. 2 and 3, the Absolute Error between IMDBWAV and the reputation value computed by our approach is high when n_clusters = 2 and n_clusters = 3, then it begins to decrease. As described in Algorithm 1, different values of n_clusters can lead to different clusterings of the reviews, which in turn yield different final reputation values. Therefore, choosing a suitable value of n_clusters is particularly important. We conducted experiments to study the influence of the number of clusters n_clusters on reputation generation. We varied the number of clusters from 2 to 19 and computed the MAE (Mean Absolute Error) between IMDBWAV and the reputation values computed by our approach for all the reviews of the dataset, for n_clusters = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}. Fig. 4 shows the result of our experiments.
Fig. 4. MAE for different n_clusters values
Both Fig. 4 and Table 3 show that our approach performs best when n_clusters = 9, since the MAE between IMDBWAV and the values computed by (1) using all the reviews of the dataset reaches its minimum there.
5. Conclusions
In this paper, we have proposed an approach to generate reputation based on opinion clustering. By performing opinion clustering, we group various opinions into a number of clusters and obtain their popularity, average similarity and ratings. Thus, it becomes easy to aggregate all clusters to generate a single reputation value. The experimental results have shown that, with a suitable number of clusters, our approach yields a reputation value close to the IMDB weighted average vote for the target movies.
  • 7.
Table 3. MAE between IMDBWAV and the reputation values computed by our method using all the reviews of the 6 movies
n_clusters  Mean Absolute Error
2           0.43478668
3           0.28583178
4           0.18399365
5           0.1158738
6           0.08081721
7           0.06048751
8           0.04663486
9           0.0419782
10          0.0535229
11          0.05158553
12          0.05617364
13          0.07550483
14          0.05620427
15          0.05560771
16          0.05956126
17          0.08015718
18          0.07338574
19          0.04463707
6. References
[1] Z. Yan, Trust Management in Mobile Environments - Usable and Autonomic Models, IGI Global, Hershey, Pennsylvania, USA, 2013.
[2] Audun Josang, Roslan Ismail, Colin Boyd. A survey of trust and reputation systems for online service provision. In: Decision Support Systems, Volume 43, Issue 2, March 2007, Pages 618-644. DOI: 10.1016/j.dss.2005.05.019.
[3] P. Resnick and R. Zeckhauser. Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay's Reputation System. In M.R. Baye, editor, The Economics of the Internet and E-Commerce, volume 11 of Advances in Applied Microeconomics. Elsevier Science, 2002.
[4] J. Schneider et al. Disseminating Trust Information in Wearable Communities. In Proceedings of the 2nd International Symposium on Handheld and Ubiquitous Computing (HUC2K), September 2000.
[5] Zheng Yan, Xu-yang Jing, Witold Pedrycz, Fusing and Mining Opinions for Reputation Generation, Information Fusion (2016), doi: 10.1016/j.inffus.2016.11.011.
[6] Landauer, T. K., Foltz, P. W., Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259-284.
[7] J. A. Hartigan and M. A. Wong. Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 28, No. 1 (1979), pp. 100-108.
[8] John A. Hartigan. Clustering Algorithms. 99th John Wiley & Sons, Inc. New York, NY, USA 1975. ISBN: 047135645X