Bulletin of Electrical Engineering and Informatics
Vol. 9, No. 6, December 2020, pp. 2492~2498
ISSN: 2302-9285, DOI: 10.11591/eei.v9i6.2018
Journal homepage: http://beei.org
Marketplace affiliates potential analysis using cosine similarity
and vision-based page segmentation
Wildan Budiawan Zulfikar¹, Mohamad Irfan², Muhammad Ghufron³, Jumadi⁴, Esa Firmansyah⁵
¹,³ Department of Informatics, UIN Sunan Gunung Djati, Indonesia
² Department of ICT, Asia E University, Malaysia
⁴ School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia
⁵ Department of Informatics, STMIK Sumedang, Indonesia
Article Info
Article history:
Received Aug 15, 2019
Revised Jan 28, 2020
Accepted Mar 1, 2020
ABSTRACT
The success of an online affiliate depends in part on the quality of its content source.
Affiliate marketplaces therefore need an objective assessment when retrieving the content data
used to choose the right product under the appropriate product filter. In practice, the selection
is usually not made with a sound, measurable system, so product content is chosen subjectively
rather than according to what is actually observed. A sound, measurable system instead yields
objectively selected product content and can benefit users, because the selection is based on
factual data. The purpose of this research is to analyze the potential of affiliate marketplaces
by combining cosine similarity with vision-based page segmentation. This combination is proposed
as a new approach to optimizing the retrieval of the content that best matches the required
criteria. This work produces a set of product recommendations that are suitable for publication
and can then be compared against the required criteria. In a limited evaluation, the proposed
model performed satisfactorily: all 5 queries tested returned the expected results.
Keywords:
Cosine similarity
Marketplace affiliates
Page segmentation
Vision
Web scraping
This is an open access article under the CC BY-SA license.
Corresponding Author:
Wildan Budiawan Zulfikar,
Department of Informatics,
UIN Sunan Gunung Djati,
105th A.H. Nasution Street, Bandung, 40614, Indonesia.
Email: wildan.b@uinsgd.ac.id
1. INTRODUCTION
Information technology has created new kinds of business and new business opportunities, and more
and more transactions are now made online. As a result, anyone can easily buy and sell [1-3].
Many companies offer a variety of products through this medium [4, 5]. One benefit of the internet
is its use as a medium for promoting products. A product that is offered online can bring large
benefits to entrepreneurs because it becomes visible throughout the world [4, 6].
Web scraping is the process of extracting information from a website. It is chosen as an
alternative when the required data is not fully available through an API or another source such as
a shared database or data warehouse, or when the site provides no API at all [7-12]. This research
uses product attribute data obtained from several marketplace affiliates through web scraping.
Of the available web scraping methods, it uses vision-based page segmentation, an algorithm that
works on web page metadata. Based on previous research, this method of extracting tag-tree data
can detect content structures quickly [13, 14]. It transforms the deep web into a visual tree
[13, 15]. The result is divided into several segments, which can be processed with a DOM parser
before they are finally processed and modeled [13].
In addition, the proposed model applies cosine similarity and TF-IDF. Cosine similarity is an
algorithm for comparing the similarity between documents. In this case, a query is compared with
a training document [16-18]. To calculate cosine similarity, first take the scalar (dot) product
of the query and document vectors, that is, multiply corresponding elements and sum the products.
Next, compute the length of each vector as the square root of the sum of its squared elements,
and multiply the two lengths [16, 19-23]. Finally, divide the scalar product by the product of
the document length and the query length.
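The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `cosine_similarity`, the zero-vector convention, and the sample vectors are assumptions introduced here.

```python
import math


def cosine_similarity(query_vec, doc_vec):
    """Cosine of the angle between two equal-length term-weight vectors."""
    if len(query_vec) != len(doc_vec):
        raise ValueError("vectors must have the same length")
    # Scalar (dot) product: multiply corresponding elements and sum.
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    # Vector lengths: square root of the sum of squared elements.
    norm_q = math.sqrt(sum(q * q for q in query_vec))
    norm_d = math.sqrt(sum(d * d for d in doc_vec))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # assumed convention: an all-zero vector matches nothing
    return dot / (norm_q * norm_d)


# Hypothetical term weights over a shared three-term vocabulary.
query = [1.0, 1.0, 0.0]
document = [1.0, 0.0, 0.0]
print(round(cosine_similarity(query, document), 4))
```

A value of 1 means the two vectors point in the same direction, and 0 means they share no weighted terms.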
2. RESEARCH METHOD
The attribute data used in this work is sourced from several marketplaces. The data is taken
directly from the original websites and consists of product data that is still active in each
marketplace's product category list. The marketplace affiliates used in this work are listed in
Table 1.
Table 1. List of marketplace affiliates
Marketplace URL Role
Tokopedia https://www.tokopedia.com Main marketplace
Bukalapak https://www.bukalapak.com 2nd marketplace
Blanja https://www.blanja.com 3rd marketplace
Lazada https://www.lazada.co.id 4th marketplace
In this work, the main marketplace is Tokopedia, and each of its products is compared with
products from the other marketplaces. The method is divided into two processes. The first scrapes
product data from all selected marketplace websites, using the category ID and category name
attributes of each product. During web scraping, the product data is divided into several
categories, which are then processed with the cosine similarity and vision-based page segmentation
methods. Once the data has been formed into an HTML DOM, the system selects one attribute as the
basis for displaying the data. The category serves as the reference for this filtering process
because it indicates the level of each product, which makes it suitable for retrieving accurate
data and for filtering in price and rating order. Product availability is the second factor
because it supports product competency.
2.1. Cosine similarity
The following is an example of the data used in the process. The category data is shown in
Table 2; in this work it is limited to 6 categories. The product attributes analyzed in this work
are listed in Table 3.
Table 2. Product categories
Categories Code
Fashion Cat_001
Health Cat_002
Beauty Cat_003
Smartphone and tablet Cat_004
Laptop Cat_005
Computer Cat_006
Table 3. Product attributes
Attributes Code
Name prod_name
ID prod_id
SKU prod_sku
Link prod_link
Figure prod_fig
Price prod_price
Category code prod_cat_id
Category description prod_cat_name
Advertiser prod_ads
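As an illustration of how the attributes in Table 3 could be held in memory, the sketch below defines a record type whose field names are the attribute codes from the table. The field types, the helper `filter_by_category`, and the sample records are hypothetical. The sketch sorts by price only, because Table 3 lists no rating attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    # Field names follow the attribute codes in Table 3; the types are assumed.
    prod_name: str
    prod_id: str
    prod_sku: str
    prod_link: str
    prod_fig: str
    prod_price: float
    prod_cat_id: str
    prod_cat_name: str
    prod_ads: str


def filter_by_category(products, cat_id):
    """Keep the products of one category, cheapest first."""
    matches = [p for p in products if p.prod_cat_id == cat_id]
    return sorted(matches, key=lambda p: p.prod_price)


# Hypothetical records; names, links, and prices are placeholders.
catalog = [
    Product("Phone A", "p1", "sku-1", "link-1", "fig-1", 500.0, "Cat_004", "Smartphone and tablet", "ads-1"),
    Product("Laptop B", "p2", "sku-2", "link-2", "fig-2", 900.0, "Cat_005", "Laptop", "ads-2"),
    Product("Phone C", "p3", "sku-3", "link-3", "fig-3", 350.0, "Cat_004", "Smartphone and tablet", "ads-3"),
]
for product in filter_by_category(catalog, "Cat_004"):
    print(product.prod_name, product.prod_price)
```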
Table 4 shows the query data whose TF-IDF values are calculated for each specific query. This work
involves six queries and four affiliate marketplaces. Figure 1 visualizes Table 4 as a graph to
make it easier to see which queries have identical or nearly identical matches across the
marketplaces.
Table 4. TF-IDF
Query Marketplace 1 (D1) Marketplace 2 (D2) Marketplace 3 (D3) Marketplace 4 (D4)
Galaxy s7 1 1 0 0
Samsung 1 1 1 1
iPhone X 128Gb Black 1 0 0 0
Galaxy+S7 1 1 1 1
MacBook Air 1 0 1 1
Keyboard Razer 1 0 1 0
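To show how an inverse document frequency could be derived from Table 4, the sketch below treats each 0/1 entry as an indicator of whether a query matches a marketplace. That reading, the unsmoothed base-10 logarithm, and the function name `inverse_document_frequency` are assumptions; the paper does not state which TF-IDF variant it uses.

```python
import math

# Rows of Table 4: one 0/1 entry per marketplace (D1..D4).
occurrence = {
    "Galaxy s7": [1, 1, 0, 0],
    "Samsung": [1, 1, 1, 1],
    "iPhone X 128Gb Black": [1, 0, 0, 0],
    "Galaxy+S7": [1, 1, 1, 1],
    "MacBook Air": [1, 0, 1, 1],
    "Keyboard Razer": [1, 0, 1, 0],
}


def inverse_document_frequency(row):
    """log10(N / df), where df counts the marketplaces that match the query."""
    n_docs = len(row)
    doc_freq = sum(1 for value in row if value > 0)
    return math.log10(n_docs / doc_freq) if doc_freq else 0.0


idf = {query: inverse_document_frequency(row) for query, row in occurrence.items()}
for query, value in idf.items():
    print(f"{query}: {value:.4f}")
```

Under this variant a query found in every marketplace gets an IDF of 0, so rarer queries carry more weight.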
Figure 1. TF-IDF data on graph
The first stage of vision-based page segmentation is to set the initial weight of each query
manually. For example, the first weight is assigned to the query Samsung and the second to the
query Galaxy. This gives Centroid 1 = 0.3 and Centroid 2 = 0.3, as shown in Table 5 and
visualized in Figure 2.
Table 5. TF-IDF data and weighting
Code D1 D2 Description
Oppo 0 0
Lenovo 0 0
Samsung 0.3 0 Weight 1
Asus 0 0
Galaxy 0 0.3 Weight 2
Iphone 0 0
Figure 2. TF-IDF data in a graph diagram with weighted queries
Next, calculate the distance between each data point and each weight using (1) [24, 25]:
(1)
The next step is to compute the weights by comparing query data 1 with each weighted query.
The query data weighting can be seen in Table 6. Then, take the average of each weight value
to obtain the new query weights:
W1 New: (0.3, 0.3)
W2 New: (0.0, 0.0)
This step is repeated until the stopping condition is met, namely that the weights no longer
change, which means the query assignments are the same as in the previous iteration. The second
iteration is then performed using the new weights.
[Figure 1 plot: the queries iphone x 128 gb black, macbook air, galaxy s7, keyboard razer, and samsung plotted as points; axes labeled TF and IDF.]
[Figure 2 plot: the terms oppo, lenovo, samsung, asus, galaxy, and iphone plotted as points with TF-IDF weighting for documents 1 and 2.]
In this experiment, the algorithm finishes in the third iteration. The final results are presented
as a Cartesian diagram to make the closeness between each weight and each data point easier to
see. Table 6 lists the queries assigned to each weight: Q3, Q5, and Q6 fall under the first
weight, and Q1, Q2, and Q4 under the second.
Table 6. Clustering results
W1 W2
Q3 Q1
Q5 Q2
Q6 Q4
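The iterative weighting that leads to Table 6 (assign each data point to its nearest weight, average each group, repeat until nothing changes) can be sketched as follows. Several things here are assumptions: Euclidean distance stands in for (1), whose body is not reproduced above, and the function names and sample points are hypothetical and do not reproduce the paper's data.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for every point (Euclidean distance assumed)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Average the points of each group; an empty group keeps its old centroid."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def cluster(points, centroids, max_iter=100):
    """Repeat assignment and averaging until the centroids stop changing."""
    for iteration in range(1, max_iter + 1):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            return labels, centroids, iteration
        centroids = new_centroids
    return labels, centroids, max_iter


# Hypothetical two-dimensional data and starting weights.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids, iterations = cluster(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids, iterations)
```

The loop stops at the first iteration in which the averaged weights equal those of the previous iteration, which matches the stopping condition stated in the text.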
2.2. Vision-based page segmentation
The queries retrieved are adjusted to the query that has been selected. The higher the weight
of the selected query, the more that query is used; conversely, the lower the weight, the less it
is used. Next, normalize the data according to the vision-based page segmentation formula, then
multiply the result by the weights determined at the initialization stage, using (2) for profit
(benefit) attributes and (3) for cost attributes:
{ } (2)
{ } (3)
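One common way to write a profit/cost normalization of this kind is the form used in simple additive weighting, shown below as an illustration only. It is an assumption that (2) and (3) take this form. The symbols x_{ij} (raw value of query i on criterion j) and r_{ij} (its normalized value) are introduced here and do not come from the paper.

```latex
r_{ij} = \frac{x_{ij}}{\max_{i} x_{ij}} \quad \text{(profit criterion)},
\qquad
r_{ij} = \frac{\min_{i} x_{ij}}{x_{ij}} \quad \text{(cost criterion)}
```

Under this form the best value on each criterion normalizes to 1, which is consistent with every column of Table 7 containing at least one entry equal to 1.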
3. RESULTS AND DISCUSSION
After the weighting phase, the product data search is performed with the vision-based page
segmentation algorithm. From the data that has been classified, only the product data that
matches the preceding TF-IDF process is taken and compared using this algorithm. The query group
is selected according to the query being searched. If the query matches, the corresponding group
is taken; if it does not, the resulting 404 page or product page is not selected. In this case,
the first query taken is Samsung Galaxy. The vision-based page segmentation algorithm described
in the previous section is applied as follows:
a. The first step is to determine the new product data. Figure 3 describes the default position
of the product details, including the name, dimensions, price, and other related product data.
Figure 3. Design scheme of vision-based page segmentation for product data scraping
b. Then, each product attribute is parsed by its category and subcategory, as described in Figure 4.
Figure 4. Web page processing for product data scraping
c. Extract the data to be searched using (4).
(4)
where w(1) is the input query data, with the total weight searched from the weight value of R,
and r is the number of segments related to the query that already have a value.
d. Validate the processed data and normalize it according to (5).
{ } (5)
The results of data normalization are shown in Table 7.
Table 7. The result of data normalization
No. Code Div.Id Div.Attr Div.Class Cache Alpha result
1 Q1 0.875 1 0.8 1 0.5
2 Q2 1 1 0.8 0.954545455 1
3 Q4 0.9375 0.875 1 0.909090909 1
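e. Multiply the normalization results by the weights and sum them to obtain the final preference
value with (6). The final preference results are shown in Table 8.
$V_i = \sum_{j=1}^{n} w_j \, r_{ij}$ (6)
where V_i is the preference value of query i, w_j is the weight of criterion j, and r_{ij} is the
normalized value of query i on criterion j.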
Table 8. Final preference results
No. Code Div.Id Div.Attr Div.Class Cache Alpha result Result
W 0.3 0.2 0.2 0.15 0.15
1 Q1 0.875 1 0.8 1 0.5 0.8475
2 Q2 1 1 0.8 0.954545455 1 0.953181818
3 Q4 0.9375 0.875 1 0.909090909 1 0.942613636
Based on Table 8, the most recommended query data is Q2, which scores 0.95, only 0.01 points
above Q4.
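The weighted sum in step e can be checked against Table 8 with a short script. The weights and normalized values below are copied from Tables 7 and 8. The function name `preference` and the dictionary layout are assumptions made for this sketch.

```python
# Criterion weights from the W row of Table 8:
# Div.Id, Div.Attr, Div.Class, Cache, Alpha result.
weights = [0.3, 0.2, 0.2, 0.15, 0.15]

# Normalized values from Table 7, in the same criterion order.
normalized = {
    "Q1": [0.875, 1.0, 0.8, 1.0, 0.5],
    "Q2": [1.0, 1.0, 0.8, 0.954545455, 1.0],
    "Q4": [0.9375, 0.875, 1.0, 0.909090909, 1.0],
}


def preference(scores, criterion_weights):
    """Weighted sum of one query's normalized scores."""
    return sum(w * r for w, r in zip(criterion_weights, scores))


results = {code: preference(scores, weights) for code, scores in normalized.items()}
ranking = sorted(results, key=results.get, reverse=True)
for code in ranking:
    print(code, round(results[code], 6))
```

The computed values agree with the Result column of Table 8 to the precision shown there and place Q2 first, followed by Q4 and Q1.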
4. CONCLUSION
In terms of performance, the proposed model produced the expected results: all 5 queries tested
behaved as expected. The cosine similarity algorithm successfully improved the vision-based page
segmentation algorithm and was able to match product 1 with other eligible products by searching
the product data processed with TF-IDF. For further work, we suggest comparing this model with
different methods.
REFERENCES
[1] Y. U. Chandra, S. Karya, and M. Hendrawaty, “Decision support systems for customer to buy products with
an integration of reviews and comments from marketplace e-commerce sites in Indonesia: A proposed model,”
International Journal on Advanced Science, Engineering and Information Technology, vol. 9, no. 4, pp. 1171-1176, 2019.
[2] I. O. Sfenrianto, A. Christiano, and M. P. Mulani, “Impact of e-service on customer loyalty in marketplace in
Indonesia,” Journal of Theoretical and Applied Information Technology, vol. 96, no. 20, pp. 6795-6805, 2018.
[3] G. J. A. Santoso and T. A. Napitupulu, “Factors affecting seller loyalty in business e-marketplace: A case of
Indonesia,” Journal of Theoretical and Applied Information Technology, vol. 96, no. 1, pp. 162-171, Jan. 2018.
[4] M. A. Fauzan, A. S. Nisafani, and A. Wibisono, “Seller reputation impact on sales performance in public
e-marketplace Bukalapak,” TELKOMNIKA Telecommunication Computing Electronic and Control, vol. 17, no. 4,
pp. 1810-1817, Aug. 2019.
[5] H. Yoganarasimhan, “The value of reputation in an online freelance marketplace,” Marketing Science, vol. 32,
no. 6, pp. 860-891, Nov. 2013.
[6] M. N. Alraja and M. A. Said Kashoob, “Transformation to electronic purchasing: an empirical investigation,”
TELKOMNIKA Telecommunication Computing Electronic and Control, vol. 17, no. 3, pp. 1209-1219, June 2019.
[7] S. K. Malik and S. Rizvi, “Information extraction using web usage mining, web scrapping and semantic
annotation,” 2011 International Conference on Computational Intelligence and Communication Networks,
Gwalior, pp. 465-469, 2011.
[8] K. Sriraghav, S. Jayanthi, N. Vidya, and V. S. Felix Enigo, “ScrAnViz-A tool to scrap, analyze and visualize
unstructured-data using attribute-based opinion mining algorithm,” 2017 Innovations in Power and Advanced
Computing Technologies (i-PACT), Vellore, pp. 1-5, 2017.
[9] R. Murali, “An intelligent web spider for online e-commerce data extraction,” 2018 Second International
Conference on Green Computing and Internet of Things (ICGCIoT), Bangalore, India, pp. 332-339, 2018.
[10] S. Mehak, R. Zafar, S. Aslam, and S. M. Bhatti, “Exploiting filtering approach with web scrapping for smart online
shopping: Penny wise: A wise tool for online shopping,” 2019 2nd International Conference on Computing,
Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, pp. 1-5, 2019.
[11] Đ. Petrović and I. Stanišević, “Web scrapping and storing data in a database, a case study of the used cars
market,” 2017 25th Telecommunication Forum (TELFOR), Belgrade, pp. 1-4, 2017.
[12] J. G. Thomsen, E. Ernst, C. Brabrand, and M. Schwartzbach, “WebSelF: A web scraping framework,”
International Conference on Web Engineering 2012, Lecture Notes in Computer Science, Springer, Berlin,
Heidelberg, vol. 7387, pp. 347-361, 2012.
[13] M. Cormer, R. Mann, K. Moffatt, and R. Cohen, “Towards an improved vision-based web page segmentation
algorithm,” 2017 14th Conference on Computer and Robot Vision (CRV), Edmonton, AB, pp. 345-352, 2017.
[14] A. Bhardwaj and V. Mangat, “A novel approach for content extraction from web pages,” 2014 Recent Advances in
Engineering and Computational Sciences (RAECS), Chandigarh, pp. 1-4, 2014.
[15] P. Ko, S. Kang, and H. Kumar, “Web page dependent vision based segementation for web sites,”
Seventh IEEE/ACIS International Conference on Computer and Information Science (ICIS 2008), Portland, OR,
pp. 690-694, 2008.
[16] D. Xue and Y. Wang, “Applying cosine similarity to discount evidence,” 2017 10th International Symposium on
Computational Intelligence and Design (ISCID), Hangzhou, pp. 516-519, 2017.
[17] X. Wang, Z. Xu, X. Xia, and C. Mao, “Computing user similarity by combining SimRank++ and cosine similarities
to improve collaborative filtering,” 2017 14th Web Information Systems and Applications Conference (WISA),
Liuzhou, pp. 205-210, 2017.
[18] P. P. Gokul, B. K. Akhil, and K. K. M. Shiva, “Sentence similarity detection in Malayalam language using cosine
similarity,” 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information &
Communication Technology (RTEICT), Bangalore, pp. 221-225, 2017.
[19] M. Alodadi and V. P. Janeja, “Similarity in patient support forums using TF-IDF and cosine similarity metrics,”
2015 International Conference on Healthcare Informatics, Dallas, TX, pp. 521-522, 2015.
[20] H. I. Okagbue, S. A. Bishop, P. E. Oguntunde, P. I. Adamu, A. A. Opanuga, and E. M. Akhmetshin,
“Modified CiteScore metric for reducing the effect of self-citations,” TELKOMNIKA
Telecommunication Computing Electronic and Control, vol. 17, no. 6, pp. 3044-3049, Dec. 2019.
[21] A. Hamdy and M. Elsayed, “Towards more accurate automatic recommendation of software design patterns,”
Journal of Theoretical & Applied Information Technology, vol. 96, no. 15, pp. 5069-5079, 2018.
[22] D. Jayasri and D. D. Manimegalai, “An efficient cross ontology-based similarity measure for bio-document
retrieval system,” Journal of Theoretical & Applied Information Technology, vol. 54, no. 2, pp. 245-258, 2013.
[23] Y. Kawada, “Cosine similarity and the Borda rule,” Social Choice and Welfare, vol. 51, no. 1, pp. 1-11, Jun. 2018.
[24] W. Uther, “TF–IDF,” in C. Sammut and G. I. Webb (Ed.), Encyclopedia of Machine Learning, Boston, MA:
Springer US, pp. 986-987, 2011.
[25] L.-P. Jing, H.-K. Huang, and H.-B. Shi, “Improved feature selection approach TFIDF in text mining,”
Proceedings. International Conference on Machine Learning and Cybernetics, Beijing, vol. 2, pp. 944-946, 2002.
BIOGRAPHIES OF AUTHORS
Wildan Budiawan Zulfikar received the B.Eng degree from UIN Sunan Gunung Djati,
Indonesia, and the M.Cs from STMIK LIKMI, Indonesia. He is currently a lecturer at UIN Sunan
Gunung Djati, Indonesia. His research areas are information systems and data mining.
Mohamad Irfan received the B.Eng degree from UIN Sunan Gunung Djati, Indonesia, and
the M.Cs from STMIK LIKMI, Indonesia. He is currently a Ph.D student at Asia E University,
Malaysia. His research focuses on information systems.
Muhammad Ghufron received the B.Eng degree from UIN Sunan Gunung Djati, Indonesia. He
is currently a research assistant in the Informatics Department. His research interest is business
information systems.
Jumadi received the B.Eng degree from Dharma Negara Business and Informatics School,
Indonesia, and the M.Cs from Universitas Gadjah Mada, Indonesia. He is currently a Ph.D student at
the School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia. His
research focuses on semantic information retrieval.
Esa Firmansyah received the B.Eng degree from STMIK PMBI Bandung, and the M.Kom from
STTIBI Jakarta, Indonesia. He is currently a Ph.D student at Asia E University, Malaysia. His
research focuses on information technology and information systems.

More Related Content

PDF
E-commerce online review for detecting influencing factors users perception
PDF
Data Exchange Design with SDMX Format for Interoperability Statistical Data
PDF
Bulk ieee projects 2012 2013
PDF
Consumption capability analysis for Micro-blog users based on data mining
PDF
Enhanced Privacy Preserving Access Control in Incremental Data using microagg...
PDF
Multidirectional Product Support System for Decision Making In Textile Indust...
PDF
A Generic Model for Student Data Analytic Web Service (SDAWS)
PDF
Unstructured multidimensional array multimedia retrival model based xml database
E-commerce online review for detecting influencing factors users perception
Data Exchange Design with SDMX Format for Interoperability Statistical Data
Bulk ieee projects 2012 2013
Consumption capability analysis for Micro-blog users based on data mining
Enhanced Privacy Preserving Access Control in Incremental Data using microagg...
Multidirectional Product Support System for Decision Making In Textile Indust...
A Generic Model for Student Data Analytic Web Service (SDAWS)
Unstructured multidimensional array multimedia retrival model based xml database

What's hot (20)

PDF
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
DOCX
Enterprise Software Architecture Project
PDF
IRJET - Encoded Polymorphic Aspect of Clustering
PDF
direct marketing in banking using data mining
PDF
Feature Based Semantic Polarity Analysis Through Ontology
PDF
IRJET-Survey on Identification of Top-K Competitors using Data Mining
PDF
Corporate Policy Governance in Secure MD5 Data Changes and Multi Hand Adminis...
PDF
A tutorial on secure outsourcing of large scalecomputation for big data
PDF
Optimized Feature Extraction and Actionable Knowledge Discovery for Customer ...
PDF
A Framework for Visualizing Association Mining Results
PDF
Linking Behavioral Patterns to Personal Attributes through Data Re-Mining
PDF
Latent semantic analysis and cosine similarity for hadith search engine
PDF
IRJET- E-commerce Recommendation System
PDF
An Efficient Framework for Predicting and Recommending M-Commerce Patterns Ba...
PDF
An Improved Support Vector Machine Classifier Using AdaBoost and Genetic Algo...
PDF
Paper Explained: Deep learning framework for measuring the digital strategy o...
PDF
A LINK-BASED APPROACH TO ENTITY RESOLUTION IN SOCIAL NETWORKS
PDF
PERFORMANCE ANALYSIS OFMODIFIED ALGORITHM FOR FINDINGMULTILEVEL ASSOCIATION R...
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
Enterprise Software Architecture Project
IRJET - Encoded Polymorphic Aspect of Clustering
direct marketing in banking using data mining
Feature Based Semantic Polarity Analysis Through Ontology
IRJET-Survey on Identification of Top-K Competitors using Data Mining
Corporate Policy Governance in Secure MD5 Data Changes and Multi Hand Adminis...
A tutorial on secure outsourcing of large scalecomputation for big data
Optimized Feature Extraction and Actionable Knowledge Discovery for Customer ...
A Framework for Visualizing Association Mining Results
Linking Behavioral Patterns to Personal Attributes through Data Re-Mining
Latent semantic analysis and cosine similarity for hadith search engine
IRJET- E-commerce Recommendation System
An Efficient Framework for Predicting and Recommending M-Commerce Patterns Ba...
An Improved Support Vector Machine Classifier Using AdaBoost and Genetic Algo...
Paper Explained: Deep learning framework for measuring the digital strategy o...
A LINK-BASED APPROACH TO ENTITY RESOLUTION IN SOCIAL NETWORKS
PERFORMANCE ANALYSIS OFMODIFIED ALGORITHM FOR FINDINGMULTILEVEL ASSOCIATION R...
Ad

Similar to Marketplace affiliates potential analysis using cosine similarity and vision-based page segmentation (20)

PDF
Product Comparison Website using Web scraping and Machine learning.
PDF
Image based Search Engine for Online Shopping
PDF
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...
PDF
Image Based Information Retrieval Using Deep Learning and Clustering Techniques
PDF
Image Based Information Retrieval Using Deep Learning and Clustering Techniques
PDF
Virtual Shopping Using Image Processing AND Augmented Reality
PDF
IRJET - Visual E-Commerce Application using Deep Learning
PDF
Engineering challenges in vertical search engines
PDF
Search relevancy
PDF
One Stop Recommendation
PDF
Content Based Image Retrieval
PPT
Bridging the Semantic Gap in Vertical Image Search by Combining Text and Visu...
PDF
George Moiseev - Classification of E-commerce Websites by Product Categories
PDF
Web Pages Visual Similarity - Search Central Live Zurich 2024
PDF
Identifying Auxiliary Web Images Using Combinations of Analyses
PDF
btpreport
PDF
IRJET - Visual Enhancement of E-Commerce Products
PDF
Cd24534538
PDF
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...
Product Comparison Website using Web scraping and Machine learning.
Image based Search Engine for Online Shopping
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...
Image Based Information Retrieval Using Deep Learning and Clustering Techniques
Image Based Information Retrieval Using Deep Learning and Clustering Techniques
Virtual Shopping Using Image Processing AND Augmented Reality
IRJET - Visual E-Commerce Application using Deep Learning
Engineering challenges in vertical search engines
Search relevancy
One Stop Recommendation
Content Based Image Retrieval
Bridging the Semantic Gap in Vertical Image Search by Combining Text and Visu...
George Moiseev - Classification of E-commerce Websites by Product Categories
Web Pages Visual Similarity - Search Central Live Zurich 2024
Identifying Auxiliary Web Images Using Combinations of Analyses
btpreport
IRJET - Visual Enhancement of E-Commerce Products
Cd24534538
IRJET- Text-based Domain and Image Categorization of Google Search Engine usi...
Ad

More from journalBEEI (20)

PDF
Square transposition: an approach to the transposition process in block cipher
PDF
Hyper-parameter optimization of convolutional neural network based on particl...
PDF
Supervised machine learning based liver disease prediction approach with LASS...
PDF
A secure and energy saving protocol for wireless sensor networks
PDF
Plant leaf identification system using convolutional neural network
PDF
Customized moodle-based learning management system for socially disadvantaged...
PDF
Understanding the role of individual learner in adaptive and personalized e-l...
PDF
Prototype mobile contactless transaction system in traditional markets to sup...
PDF
Wireless HART stack using multiprocessor technique with laxity algorithm
PDF
Implementation of double-layer loaded on octagon microstrip yagi antenna
PDF
The calculation of the field of an antenna located near the human head
PDF
Exact secure outage probability performance of uplinkdownlink multiple access...
PDF
Design of a dual-band antenna for energy harvesting application
PDF
Transforming data-centric eXtensible markup language into relational database...
PDF
Key performance requirement of future next wireless networks (6G)
PDF
Noise resistance territorial intensity-based optical flow using inverse confi...
PDF
Modeling climate phenomenon with software grids analysis and display system i...
PDF
An approach of re-organizing input dataset to enhance the quality of emotion ...
PDF
Parking detection system using background subtraction and HSV color segmentation
PDF
Quality of service performances of video and voice transmission in universal ...
Square transposition: an approach to the transposition process in block cipher
Hyper-parameter optimization of convolutional neural network based on particl...
Supervised machine learning based liver disease prediction approach with LASS...
A secure and energy saving protocol for wireless sensor networks
Plant leaf identification system using convolutional neural network
Customized moodle-based learning management system for socially disadvantaged...
Understanding the role of individual learner in adaptive and personalized e-l...
Prototype mobile contactless transaction system in traditional markets to sup...
Wireless HART stack using multiprocessor technique with laxity algorithm
Implementation of double-layer loaded on octagon microstrip yagi antenna
The calculation of the field of an antenna located near the human head
Exact secure outage probability performance of uplinkdownlink multiple access...
Design of a dual-band antenna for energy harvesting application
Transforming data-centric eXtensible markup language into relational database...
Key performance requirement of future next wireless networks (6G)
Noise resistance territorial intensity-based optical flow using inverse confi...
Modeling climate phenomenon with software grids analysis and display system i...
An approach of re-organizing input dataset to enhance the quality of emotion ...
Parking detection system using background subtraction and HSV color segmentation
Quality of service performances of video and voice transmission in universal ...

Recently uploaded (20)

PDF
Analyzing Impact of Pakistan Economic Corridor on Import and Export in Pakist...
PDF
Soil Improvement Techniques Note - Rabbi
PPTX
CURRICULAM DESIGN engineering FOR CSE 2025.pptx
PPTX
Artificial Intelligence
PDF
Unit I ESSENTIAL OF DIGITAL MARKETING.pdf
PPTX
communication and presentation skills 01
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PDF
BIO-INSPIRED ARCHITECTURE FOR PARSIMONIOUS CONVERSATIONAL INTELLIGENCE : THE ...
PPTX
6ME3A-Unit-II-Sensors and Actuators_Handouts.pptx
PDF
Level 2 – IBM Data and AI Fundamentals (1)_v1.1.PDF
PDF
EXPLORING LEARNING ENGAGEMENT FACTORS INFLUENCING BEHAVIORAL, COGNITIVE, AND ...
PPTX
Information Storage and Retrieval Techniques Unit III
PPT
Occupational Health and Safety Management System
PPTX
UNIT - 3 Total quality Management .pptx
PDF
Visual Aids for Exploratory Data Analysis.pdf
PPTX
Safety Seminar civil to be ensured for safe working.
PDF
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
PPT
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
PPT
A5_DistSysCh1.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
PPTX
Fundamentals of Mechanical Engineering.pptx
Analyzing Impact of Pakistan Economic Corridor on Import and Export in Pakist...
Soil Improvement Techniques Note - Rabbi
CURRICULAM DESIGN engineering FOR CSE 2025.pptx
Artificial Intelligence
Unit I ESSENTIAL OF DIGITAL MARKETING.pdf
communication and presentation skills 01
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
BIO-INSPIRED ARCHITECTURE FOR PARSIMONIOUS CONVERSATIONAL INTELLIGENCE : THE ...
6ME3A-Unit-II-Sensors and Actuators_Handouts.pptx
Level 2 – IBM Data and AI Fundamentals (1)_v1.1.PDF
EXPLORING LEARNING ENGAGEMENT FACTORS INFLUENCING BEHAVIORAL, COGNITIVE, AND ...
Information Storage and Retrieval Techniques Unit III
Occupational Health and Safety Management System
UNIT - 3 Total quality Management .pptx
Visual Aids for Exploratory Data Analysis.pdf
Safety Seminar civil to be ensured for safe working.
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
INTRODUCTION -Data Warehousing and Mining-M.Tech- VTU.ppt
A5_DistSysCh1.ppt_INTRODUCTION TO DISTRIBUTED SYSTEMS
Fundamentals of Mechanical Engineering.pptx

Marketplace affiliates potential analysis using cosine similarity and vision-based page segmentation

  • 1. Bulletin of Electrical Engineering and Informatics Vol. 9, No. 6, December 2020, pp. 2492~2498 ISSN: 2302-9285, DOI: 10.11591/eei.v9i6.2018  2492 Journal homepage: http://beei.org Marketplace affiliates potential analysis using cosine similarity and vision-based page segmentation Wildan Budiawan Zulfikar1 , Mohamad Irfan2 , Muhammad Ghufron3 , Jumadi4 , Esa Firmansyah5 1,3 Department of Informatics, UIN Sunan Gunung Djati, Indonesia 2 Department of ICT, Asia E University, Malaysia 4 School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia 5 Department of Informatics, STMIK Sumedang, Indonesia Article Info ABSTRACT Article history: Received Aug 15, 2019 Revised Jan 28, 2020 Accepted Mar 1, 2020 One success factor of an online affiliate is determined by the quality of the content source. Therefore, affiliate marketplaces need to do an objective assessment to retrieve content data that will be used to choose the right product in the appropriate product filter. Usually, the selection is not made using a good and measured system so that the selection of product content is only based on parts that are not in accordance with what is seen or subjective. However, if analyzed using a good and measurable system will produce an objective product content and can have a positive impact on users because the selection is based on factual data. The purpose of this research is to analyze the potential of the affiliate marketplace by combining cosine similarity with vision-based page segmentation. This is a new breakthrough made for optimization to get the best content in accordance with the required criteria. This work will produce a number of product recommendations that are appropriate for publication and then made use of for comparison that matches the required criteria. At the limited evaluation stage, the performance of the proposed model obtained satisfactory results, in which 5 queries tested were all as expected. 
    Keywords: Cosine similarity, Marketplace affiliates, Page segmentation, Vision, Web scraping
    This is an open access article under the CC BY-SA license.
    Corresponding Author: Wildan Budiawan Zulfikar, Department of Informatics, UIN Sunan Gunung Djati, 105th A.H. Nasution Street, Bandung, 40614, Indonesia. Email: wildan.b@uinsgd.ac.id
    1. INTRODUCTION
    Information technology has created new kinds of business opportunities, and more and more business transactions are made online. As a result, anyone can easily buy and sell [1-3]. Many companies offer a variety of products through this medium [4, 5]. One benefit of the internet is its use as a promotional medium: a product that is online can bring large benefits to entrepreneurs because it becomes known throughout the world [4, 6].
    Web scraping is the process of extracting information from a website. It is chosen as an alternative when the required data is not available through an API or another source such as a shared database or data warehouse, or when the site provides no API at all [7-12]. This research uses product attribute data obtained from several marketplace affiliates through web scraping, specifically with vision-based page segmentation. Vision-based page segmentation is an algorithm that works on website page metadata. Based on previous research, this method of extracting tag tree data can detect content
  • 2. structures quickly [13, 14]. It transforms the deep web into a visual tree [13, 15]. The result is divided into several segments that can be processed with a DOM parser before finally being modeled [13].
    In addition, the proposed model applies cosine similarity and TF-IDF. Cosine similarity is an algorithm for comparing the similarity between documents; in this case, a query is compared with a training document [16-18]. To calculate it, first take the scalar product of the query and document vectors, that is, multiply their corresponding components and add up the results. Then compute the length of each vector as the square root of the sum of its squared components, and multiply the two lengths together [16, 19-23]. Finally, divide the scalar product by the product of the two lengths.
    2. RESEARCH METHOD
    The attribute data used in this article is sourced from several marketplaces. The data is taken directly from the original websites and consists of product data that is still active in the product category list. The marketplace affiliates used in this work are listed in Table 1.
    Table 1. List of marketplace affiliates
      Marketplace   URL                                  Role
      Tokopedia     https://www.tokopedia.com    Main marketplace
      Bukalapak     https://www.bukalapak.com    2nd marketplace
      Blanja        https://www.blanja.com       3rd marketplace
      Lazada        https://www.lazada.co.id     4th marketplace
    In this work, the main marketplace is Tokopedia, and each of its products is compared with products on the other marketplaces. The method is divided into two processes. The first process scrapes product data from all of the selected marketplace websites.
    This method uses the category ID and category name attributes of each product. During web scraping, product data is divided into several categories, which are then processed with the cosine similarity and vision-based page segmentation methods. Once the data is available as an HTML DOM, the system selects the data to be used for display. The category serves as a reference for this filtering process because it indicates the level of each product, which helps in retrieving accurate data and filtering it by price and rating order. Product availability is the second factor because it supports product competency.
    2.1. Cosine similarity
    The following is an example of the data used in the process. Category data is shown in Table 2; in this case it is limited to 6 categories. The product attributes analyzed in this work are listed in Table 3.
    Table 2. Product categories
      Categories              Code
      Fashion                 Cat_001
      Health                  Cat_002
      Beauty                  Cat_003
      Smartphone and tablet   Cat_004
      Laptop                  Cat_005
      Computer                Cat_006
    Table 3. Product attributes
      Attributes             Code
      Name                   prod_name
      ID                     prod_id
      SKU                    prod_sku
      Link                   prod_link
      Figure                 prod_fig
      Price                  prod_price
      Category code          prod_cat_id
      Category description   prod_cat_name
      Advertiser             prod_ads
    Table 4 contains the query data that is calculated using TF-IDF for specific queries. This work involves six queries and four affiliate marketplaces, as listed in Table 4. Figure 1 visualizes Table 4 as a graph, which makes it easier to see which queries have the same or almost the same matches across the marketplaces.
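    As a hedged illustration of the cosine similarity calculation outlined in the introduction, the Python sketch below compares binary vectors. The vectors reuse the 0/1 values listed for Marketplace 1 to Marketplace 4 in Table 4, read column by column. Treating each column as a vector over the six queries is an assumption made for this example, and the names `cosine_similarity`, `marketplaces`, and `similarity_to_main` are illustrative rather than taken from the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))      # scalar product
    norm_a = math.sqrt(sum(x * x for x in a))   # length of a
    norm_b = math.sqrt(sum(y * y for y in b))   # length of b
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Columns of Table 4 (assumed encoding: one entry per query, in table order).
marketplaces = {
    "D1": [1, 1, 1, 1, 1, 1],
    "D2": [1, 1, 0, 1, 0, 0],
    "D3": [0, 1, 0, 1, 1, 1],
    "D4": [0, 1, 0, 1, 1, 0],
}

# Similarity of every other marketplace to the main marketplace (D1).
similarity_to_main = {
    name: cosine_similarity(marketplaces["D1"], vec)
    for name, vec in marketplaces.items()
    if name != "D1"
}

for name, score in similarity_to_main.items():
    print(f"D1 vs {name}: {score:.4f}")
```

    Under this encoding, D3 scores highest against D1 (about 0.82), while D2 and D4 tie at about 0.71. A score of 1 would mean the two columns point in exactly the same direction.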
  • 3. Table 4. TF-IDF
      Query                  Marketplace 1 (D1)   Marketplace 2 (D2)   Marketplace 3 (D3)   Marketplace 4 (D4)
      Galaxy s7              1                    1                    0                    0
      Samsung                1                    1                    1                    1
      iPhone X 128Gb Black   1                    0                    0                    0
      Galaxy+S7              1                    1                    1                    1
      MacBook Air            1                    0                    1                    1
      Keyboard Razer         1                    0                    1                    0
    Figure 1. TF-IDF data on graph (plot title: TF IDF Graph Tools Query; series: iphone x 128 gb black, macbook air, galaxy s7, keyboard razer, samsung)
    The first stage of vision-based page segmentation is to determine the initial weight of each query manually. For example, the first weight is assigned to the Samsung query and the second to the Galaxy query. This gives Centroid 1 = 0.3 and Centroid 2 = 0.3, as shown in Table 5 and visualized in Figure 2.
    Table 5. TF-IDF data and weighting
      Code      D1    D2    Description
      Oppo      0     0
      Lenovo    0     0
      Samsung   0.3   0     Weight 1
      Asus      0     0
      Galaxy    0     0.3   Weight 2
      Iphone    0     0
    Figure 2. TF-IDF data in a graph diagram with weighted queries (plot title: TF-IDF Weighting Query, Documents 1 and 2; series: oppo, lenovo, samsung, asus, galaxy, iphone)
    Next, calculate the distance between each data point and each weight using (1) [24, 25]:
      (1)
    The next step is to calculate the weight by comparing query data 1 with each weighted query. The query data weighting can be seen in Table 6. Then, the average of each weight value is taken as the new query weight, namely:
      W1 New: (0.3, 0.3)
      W2 New: (0.0, 0.0)
    This step is repeated until the stopping condition is met, namely that the weighting of the data source no longer changes, which means there is no difference between the queries in the current iteration and those in the previous iteration. The second iteration is then performed using the new weighting.
  • 4. In this experiment, the algorithm is completed in the third iteration. The final results are presented as a Cartesian diagram to make it easier to see how close each data point is to each weighting. Table 6 lists the queries included in each category. The queries in the first weighting are Q3, Q5, and Q6, and those in the second weighting are Q1, Q2, and Q4.
    Table 6. Clustering results
      W1   W2
      Q3   Q1
      Q5   Q2
      Q6   Q4
    2.2. Vision-based page segmentation
    The query retrieved is adjusted to the query that has been selected. The higher the weighting value of a selected query, the more that query is used; conversely, the lower its weighting, the less it is used. Next, the data is normalized according to the vision-based page segmentation formula and then multiplied by the weights determined at the initialization stage, using (2) for profit and (3) for costs:
      { } (2)
      { } (3)
    3. RESULTS AND DISCUSSION
    After the weighting phase, the product data search is performed with the vision-based page segmentation algorithm. From the data that has been classified, only the product data that matches the previous TF-IDF process is taken and compared using this algorithm. The query groups are selected according to the query being searched. If the query matches, the corresponding group is taken; otherwise, the page (a 404 page or a product page) is not selected. In this case, the first query taken is Samsung Galaxy. The vision-based page segmentation algorithm described in the previous chapter is applied as follows:
    a. The first step is determining new product data.
    Figure 3 describes the default position of the product details, including the name, dimensions, price, and other related product data.
    Figure 3. Scheme design of vision-based page segmentation for product data scraping
  • 5. b. Then, each product attribute is parsed by its category and subcategory, as described in Figure 4.
    Figure 4. Web process data page product data scraping
    c. Extract the data to be searched using (4).
      (4)
    where w(1) is the input query data with the total weight to be searched from the weight value of R, and r is the number of segments related to the query that already have a value.
    d. Validate the processed data and normalize it according to (5).
      { } (5)
    The results of data normalization using (5) are shown in Table 7.
    Table 7. The result of data normalization
      No.   Code   Div.Id   Div.Attr   Div.Class   Cache         Alpha result
      1     Q1     0.875    1          0.8         1             0.5
      2     Q2     1        1          0.8         0.954545455   1
      3     Q4     0.9375   0.875      1           0.909090909   1
    e. The normalization results are multiplied by the weights and summed to obtain the final preference value with (6); the final preference results are shown in Table 8.
      $V_i = \sum_{j} w_j \, r_{ij}$ (6)
    where $w_j$ is the weight of criterion $j$ (row W of Table 8) and $r_{ij}$ is the normalized value of query $i$ on criterion $j$.
    Table 8. Final preference results
      No.   Code   Div.Id   Div.Attr   Div.Class   Cache         Alpha result   Result
            W      0.3      0.2        0.2         0.15          0.15
      1     Q1     0.875    1          0.8         1             0.5            0.8475
      2     Q2     1        1          0.8         0.954545455   1              0.953181818
      3     Q4     0.9375   0.875      1           0.909090909   1              0.942613636
    Based on Table 8, the most recommended query data is the one with code Q2. Q2 scores 0.95 and differs from Q4 by only 0.01 points.
    4. CONCLUSION
    In terms of performance, the proposed model gives appropriate results: all 5 tested queries behaved as expected. The cosine similarity algorithm successfully complemented the vision-based page segmentation algorithm and was able to match product 1 to other eligible products by searching the product data processed with TF-IDF. As further work, we suggest comparing this model with different methods.
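    The final preference values in Table 8 can be reproduced with a short Python sketch, offered as a check on the arithmetic rather than as the authors' code. It assumes the result is the weighted sum of the normalized values in Table 7, which matches the description in step e; the names `WEIGHTS`, `NORMALIZED`, and `preference` are hypothetical.

```python
# Criterion weights, copied from the W row of Table 8.
WEIGHTS = {
    "Div.Id": 0.3,
    "Div.Attr": 0.2,
    "Div.Class": 0.2,
    "Cache": 0.15,
    "Alpha result": 0.15,
}

# Normalized values per query, copied from Table 7.
NORMALIZED = {
    "Q1": {"Div.Id": 0.875, "Div.Attr": 1, "Div.Class": 0.8,
           "Cache": 1, "Alpha result": 0.5},
    "Q2": {"Div.Id": 1, "Div.Attr": 1, "Div.Class": 0.8,
           "Cache": 0.954545455, "Alpha result": 1},
    "Q4": {"Div.Id": 0.9375, "Div.Attr": 0.875, "Div.Class": 1,
           "Cache": 0.909090909, "Alpha result": 1},
}


def preference(row, weights):
    """Weighted sum of one query's normalized criterion values."""
    return sum(weights[name] * row[name] for name in weights)


scores = {query: preference(row, WEIGHTS) for query, row in NORMALIZED.items()}
best = max(scores, key=scores.get)  # query with the highest preference value

for query, score in scores.items():
    print(f"{query}: {score:.9f}")
print("recommended:", best)
```

    Run on the Table 7 values, this ranks Q2 first, followed by Q4 and Q1, in agreement with the Result column of Table 8.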
  • 6. REFERENCES
    [1] Y. U. Chandra, S. Karya, and M. Hendrawaty, “Decision support systems for customer to buy products with an integration of reviews and comments from marketplace e-commerce sites in Indonesia: A proposed model,” International Journal Advanced Science Engineering Information Technology, vol. 9, no. 4, pp. 1171-1176, 2019.
    [2] I. O. Sfenrianto, A. Christiano, and M. P. Mulani, “Impact of e-service on customer loyalty in marketplace in Indonesia,” Journal of Theoretical and Applied Information Technology, vol. 96, no. 20, pp. 6795-6805, 2018.
    [3] G. J. A. Santoso and T. A. Napitupulu, “Factors affecting seller loyalty in business e-marketplace: A case of Indonesia,” Journal of Theoretical and Applied Information Technology, vol. 96, no. 1, pp. 162-171, Jan. 2018.
    [4] M. A. Fauzan, A. S. Nisafani, and A. Wibisono, “Seller reputation impact on sales performance in public e-marketplace Bukalapak,” TELKOMNIKA Telecommunication Computing Electronic and Control, vol. 17, no. 4, pp. 1810-1817, Aug. 2019.
    [5] H. Yoganarasimhan, “The value of reputation in an online freelance marketplace,” Marketing Science, vol. 32, no. 6, pp. 860-891, Nov. 2013.
    [6] M. N. Alraja and M. A. Said Kashoob, “Transformation to electronic purchasing: an empirical investigation,” TELKOMNIKA Telecommunication Computing Electronic and Control, vol. 17, no. 3, pp. 1209-1219, June 2019.
    [7] S. K. Malik and S. Rizvi, “Information extraction using web usage mining, web scrapping and semantic annotation,” 2011 International Conference on Computational Intelligence and Communication Networks, Gwalior, pp. 465-469, 2011.
    [8] K. Sriraghav, S. Jayanthi, N. Vidya, and V. S. Felix Enigo, “ScrAnViz-A tool to scrap, analyze and visualize unstructured-data using attribute-based opinion mining algorithm,” 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), Vellore, pp. 1-5, 2017.
    [9] R. Murali, “An intelligent web spider for online e-commerce data extraction,” 2018 Second International Conference on Green Computing and Internet of Things (ICGCIoT), Bangalore, India, pp. 332-339, 2018.
    [10] S. Mehak, R. Zafar, S. Aslam, and S. M. Bhatti, “Exploiting filtering approach with web scrapping for smart online shopping: Penny wise: A wise tool for online shopping,” 2019 2nd International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), Sukkur, Pakistan, pp. 1-5, 2019.
    [11] Đ. Petrović and I. Stanišević, “Web scrapping and storing data in a database, a case study of the used cars market,” 2017 25th Telecommunication Forum (TELFOR), Belgrade, pp. 1-4, 2017.
    [12] J. G. Thomsen, E. Ernst, C. Brabrand, and M. Schwartzbach, “WebSelF: A web scraping framework,” International Conference on Web Engineering 2012, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, vol. 7387, pp. 347-361, 2012.
    [13] M. Cormer, R. Mann, K. Moffatt, and R. Cohen, “Towards an improved vision-based web page segmentation algorithm,” 2017 14th Conference on Computer and Robot Vision (CRV), Edmonton, AB, pp. 345-352, 2017.
    [14] A. Bhardwaj and V. Mangat, “A novel approach for content extraction from web pages,” 2014 Recent Advances in Engineering and Computational Sciences (RAECS), Chandigarh, pp. 1-4, 2014.
    [15] P. Ko, S. Kang, and H. Kumar, “Web page dependent vision based segementation for web sites,” Seventh IEEE/ACIS International Conference on Computer and Information Science (ICIS 2008), Portland, OR, pp. 690-694, 2008.
    [16] D. Xue and Y. Wang, “Applying cosine similarity to discount evidence,” 2017 10th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, pp. 516-519, 2017.
    [17] X. Wang, Z. Xu, X. Xia, and C. Mao, “Computing user similarity by combining SimRank++ and cosine similarities to improve collaborative filtering,” 2017 14th Web Information Systems and Applications Conference (WISA), Liuzhou, pp. 205-210, 2017.
    [18] P. P. Gokul, B. K. Akhil, and K. K. M. Shiva, “Sentence similarity detection in Malayalam language using cosine similarity,” 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, pp. 221-225, 2017.
    [19] M. Alodadi and V. P. Janeja, “Similarity in patient support forums using TF-IDF and cosine similarity metrics,” 2015 International Conference on Healthcare Informatics, Dallas, TX, pp. 521-522, 2015.
    [20] Hilary I. Okagbue, Sheila A. Bishop, Pelumi E. Oguntunde, Patience I. Adamu, Abiodun A. Opanuga, and Elvir M. Akhmetshin, “Modified CiteScore metric for reducing the effect of self-citations,” TELKOMNIKA Telecommunication Computing Electronic and Control, vol. 17, no. 6, pp. 3044-3049, Dec. 2019.
    [21] A. Hamdy and M. Elsayed, “Towards more accurate automatic recommendation of software design patterns,” Journal of Theoretical & Applied Information Technology, vol. 96, no. 15, pp. 5069-5079, 2018.
    [22] D. Jayasri and D. D. Manimegalai, “An efficient cross ontology-based similarity measure for bio-document retrieval system,” Journal of Theoretical & Applied Information Technology, vol. 54, no. 2, pp. 245-258, 2013.
    [23] Y. Kawada, “Cosine similarity and the Borda rule,” Social Choice and Welfare, vol. 51, no. 1, pp. 1-11, Jun. 2018.
    [24] W. Uther, “TF–IDF,” in C. Sammut and G. I. Webb (Eds.), Encyclopedia of Machine Learning, Boston, MA: Springer US, pp. 986-987, 2011.
    [25] Li-Ping Jing, Hou-Kuan Huang, and Hong-Bo Shi, “Improved feature selection approach TFIDF in text mining,” Proceedings. International Conference on Machine Learning and Cybernetics, Beijing, vol. 2, pp. 944-946, 2002.
  • 7. BIOGRAPHIES OF AUTHORS
    Wildan Budiawan Zulfikar received the B.Eng degree from UIN Sunan Gunung Djati, Indonesia, and the M.Cs from STMIK LIKMI, Indonesia. He is currently a lecturer at UIN Sunan Gunung Djati, Indonesia. His research area is Information Systems and Data Mining.
    Mohamad Irfan received the B.Eng degree from UIN Sunan Gunung Djati, Indonesia, and the M.Cs from STMIK LIKMI, Indonesia. He is currently a Ph.D student at Asia E University, Malaysia. His research focuses on Information Systems.
    Muhammad Ghufron received the B.Eng degree from UIN Sunan Gunung Djati, Indonesia. He is currently a research assistant in the Informatics Department. His research interest is Business Information Systems.
    Jumadi received the B.Eng degree from Dharma Negara Business and Informatics School, Indonesia, and the M.Cs from Universitas Gadjah Mada, Indonesia. He is currently a Ph.D student at the School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia. His research focuses on Semantic Information Retrieval.
    Esa Firmansyah received the B.Eng degree from STMIK PMBI Bandung and the M.Kom from STTIBI Jakarta, Indonesia. He is currently a Ph.D student at Asia E University, Malaysia. His research focuses on Information Technology and Information Systems.