IRJET- A Comprehensive way of finding Top-K Competitors using C-Miner Algorithm

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 03 | Mar-2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 3021
A Comprehensive way of finding Top-K Competitors using
C-Miner Algorithm
Sk. Wasim Akram1, G. Manoj Babu2, D. Pratap Roy3, G. Lakshmi Narayana Reddy4
1Asst. professor, Dept. of Computer Science and Engineering, VVIT, AP, India
2,3,4 Student , Dept. of Computer Science and Engineering, VVIT, AP, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – In order to get success in any business
environment it is important to attract the customers than
the competitor. A number of difficulties arise in the
perspective of this task that is to find a method to formalize
and compute the competitiveness relationship between two
items and to find the true competitors of a given item also to
know the features of an item that most affects its
competitiveness. Despite the impact and relevance of this
problem to many domains, only a limited amount of work
has been devoted toward an efficient solution. In this paper,
we present a formal definition of the competitiveness
between two items. An efficient method is presented for
evaluating competitiveness between items in large datasets
and address the natural problem of displaying the top-k
competitors of a given item. Our approach is evaluated
against strong baselines via a user study and experiments
on multiple datasets from various domains.
Key Words: Contenders, Pyramid Finder, C-Miner,
SkyLine, Cluster.
1. INTRODUCTION
Data mining is the process of sorting through large data
sets to discover patterns and establish relationships to
solve problems through data analytics [1]. Data mining
tools allow enterprises to predict future trends. A Long
line of research has exhibited the vital significance of
recognizing and observing firms rivals. Propelled by this
issue, the promoting and administration group have
concentrated on exact strategies for competitors
generation and in addition to techniques for breaking
down known contenders. Surviving exploration of the
previous has concentrated on mining near articulations
(e.g. "Thing A is superior to Item B") from the Web or
other literary sources. Despite the fact that such
articulations can without a doubt be pointers of
competitiveness, they are truant in numerous spaces. For
example, think about the area of get-away bundles. For
this situation, things have no doled out the name by which
they can be questioned or contrasted and each other.
Further, the recurrence of printed relative proof can differ
enormously crosswise over areas. For instance, when
looking at mark names at the firm level (e.g." Google
versus Yahoo" or "Sony versus Panasonic"), it is to be sure
likely that relative examples can be found by basically
questioning the web. Notwithstanding, it is anything but
difficult to recognize standard spaces where such proof is
to a great degree rare, for example, shoes, gems, inns,
eateries, and furniture. Roused by these weaknesses, we
propose another formalization of the competitiveness
between two things, in the market sections that they can
both cover.
2. EXISTING SYSTEM
The administration writing is rich with works that
emphasis on how directors can physically recognize
competitors. Some of these works demonstrate contender
recognizable proof as a psychological classification
process in which administrators create mental portrayals
of contenders and utilize them to order competitor firms.
Other manual classification techniques depend on market-
and asset-based similarities between a firm and applicant
contenders.
2.1 DISADVANTAGES OF EXISTING SYSTEM
The existing methodology isn't proper for assessing the
intensity of any two things or firms in a given market.
Rather, the creators accept that the arrangement of
contenders is given and, in this way, they will likely
register the estimation of the picked measures for every
contender. What's more, the reliance on value-based
information is a constraint we don't have.
3. PROPOSED SYSTEM
We propose another formalization of the intensity
between two things, in view of the market portions that
they can both cover. We depict a strategy for processing
every one of the sections in a given market in light of
mining vast survey datasets. This strategy enables us to
functionalize our meaning of competitiveness and address
the issue of finding the best k contenders of a thing in any
given market [2].
3.1 COMPETITIVENESS
Give U a chance to be the number of inhabitants in every
single conceivable client in a given market. We look at that
as a thing I covers a client u ∈ U in the event that it can
cover the greater part of the client's necessities. At that
point, the competitiveness between two things I, j is
relative to the quantity of clients that they can both cover
[2].

Fig - 1: Example of competitiveness on items
The above figure shows the competitiveness between 3
things i, j and k. Every item is mapped to the set of features
that it can offer to a customer. We Consider 3 features in
this example they are X, Y, Z. The left side of the Fig.1
shows three groups of customers named as group1,
group2 and group3. Every group represents a different
market segment. Customers are grouped based on their
preferences with respect to the features. Let us consider,
the customers in group2 are only interested in features X
and Y. We observe that items j and k are not competitive,
since they are not appeal to the same groups of customers
[2] [3].
This case represents the perfect situation, in which we
approach the total arrangement of clients in a given
market, and in addition to particular market sections and
their necessities. Practically speaking, be that as it may,
such data isn't accessible. With a specific end goal to beat
this, we portray a strategy for registering every one of the
fragments in a given market in light of mining extensive
audit datasets. This strategy enables us to functionalize
our meaning of competitiveness and address the issue of
finding the best k contenders of a thing in any given
market. As we appear in our work, this issue presents
critical computational difficulties, particularly within the
sight of huge datasets with hundreds or thousands of
things, for example, those that are regularly found in
standard spaces. We address these difficulties through an
exceptionally adaptable structure for top-k calculation,
including a productive assessment calculation and a fitting
record.
The common client session on a survey stage, for example,
Yelp, Amazon or Trip Advisor, comprises of the
accompanying advances
1) Specify every single required element in an inquiry.
2) Submit the inquiry to the site's internet searcher and
recover the coordinating things.
3) Process the surveys of the returned things and settle on
a buy choice.
In this setting, things that cover the client's prerequisites
will be incorporated into the web crawler's reaction and
will go after her consideration. Then again, non-covering
things won't be considered by the client and, along these
lines, won't have an opportunity to contend. Next, we
show a case that stretches out this basic leadership
procedure to a multi-client setting [4].
3.2 PYRAMID FINDER
Given the horizon Sky(I) of an arrangement of things I and
a thing i ∈ I, let Y contain the k things from Sky(I) that are
most aggressive with I. At that point, a thing j ∈ I must be
in the best k contenders of i, if j ∈ Y or if j is overwhelmed
by one of the things in Y. we don't have to consider the
whole arrangement of applicants with a specific end goal
to generate the top-k contenders. This propels us to build
the horizon pyramid. A structure that unimaginably
diminishes the amount of things that ought to be
considered. We allude to the calculation used to build the
horizon pyramid as Pyramid Finder [5][6].
Pseudo code:
Input: Set of items I
Output: Dominance Pyramid DI
1: DI[0] ←Sky(I)
2: Z ← I Skyline(I)
3: Level ←1.
4: while Z is not empty do
5: DI[level] ←Sky(Z)
6: for every item j ∈ DI[level] do
7: for every item I ∈ DI[level1] do
8: if i dominates j then
9: Add a link i → j
10: break
11: end if
12: end for
13: end for
14: Z ← Z skyline(Z)
15: level ←level + 1
16: end while
3.3 THE C-MINER ALGORITHM
Next, we exhibit C Miner, a correct calculation for finding
the best k contenders of a given thing. Our calculation
influences utilization of the horizon to pyramid keeping in
mind the end goal to diminish the quantity of things that
should be considered. Given that we just think about the

best k contenders, we can incrementally process the score
of every applicant and stop when it is ensured that the
best k has developed [7].
The info incorporates the arrangement of things I, the
arrangement of highlights F, the thing of intrigue I, the
number k of best contenders to recover, the set Q of
inquiries and their probabilities, and the horizon pyramid
DI . The calculation initially recovers the things that
overwhelm I. These things have the greatest conceivable
intensity with I. On the off chance that at any rate k such
things exist, we report those and close. Else, we add them
to Top-k and decrement our financial plan of k
appropriately. Consider LB keeps up the most reduced
lower bound from the present best k set and is utilized to
prune competitors. We instate the arrangement of
applicants X as the association of things in the main layer
of the pyramid and the arrangement of things commanded
by those as of now in the Top-k [8] .
This is accomplished by means of calling GETSLAVES
routine by passing parameters (Top-k, DI). In each cycle,
C-Miner nourishes the arrangement of hopefuls X to the
UPDATETOPK() schedule, which prunes things in view of
the LB edge. It at that point refreshes the Top-k set
through the MERGE() routine work, which recognizes the
things with the most astounding intensity from Top-k .
This can be accomplished in direct time, since both X and
TOP-K are arranged. The pruning edge LB is set to the
most noticeably bad (least) score among the new Top-k. At
long last, GETSLAVES() routine is utilized to extend the
arrangement of hopefuls by including things that are
overwhelmed by those in X [9].
The UPDATETOPK() procedures hopefuls in X and finds at
most k applicants with the most elevated intensity with i.
The routine uses an information structure nearby Top-k,
executed as a cooperative cluster: the score of every
applicant fills in as the key, while its id fills in as the
esteem. The cluster is key-arranged, to encourage the
calculation of the k best things. The structure is
consequently truncated with the goal that it generally
contains at most k things. We instate the lower and upper
limits. For each thing j ∈ X, low(j) keeps up the current
intensity score of j as new inquiries are considered and
fills in as a lower bound to the applicant's real score. Each
lower bound low(j) begins from 0, and after the fulfillment
of UPDATETOPK(), it incorporates the genuine intensity
score CF (i, j) of competitor j with the central thing i. Then
again, up(j) is an idealistic upper bound on j's
competitiveness score. At first, up(j) is set to the most
extreme conceivable score. For each question q ∈ Q, maxV
holds the most extreme conceivable competitiveness
between thing i and some other thing for that inquiry,
which is in reality the scope of i as for q. At that point, for
every competitor j ∈ X , we subtract maxV from up(j) and
afterward add to it the genuine intensity amongst i and j
for inquiry q. In the event that the upper bound up(j) of a
competitor j progresses toward becoming lower than the
pruning limit LB, at that point j can be securely precluded.
Something else, low(j) is refreshed and j stays in thought.
After each refresh, the estimation of LB is set to the most
exceedingly terrible score in nearby TOP-K, to utilize
stricter pruning in future cycles.
In the event that the quantity of applicant’s |X| turns out to
be less or equivalent to k, the loop over the questions
stops. This is an early-ceasing model: since we will
probably recover the best k hopefuls in X, having |X| <= k
implies that every outstanding competitor ought to be
returned. We finish the intensity calculation of the rest of
the hopefuls and refresh nearby Top-k as needs be. This
happens after the consummation of the principal circle,
with a specific end goal to stay away from pointless
bound-checking and enhance execution [10].
3.4 ADVANTAGES OF PROPOSED SYSTEM
1. A formal meaning of the competitiveness between two
things, in view of their interest to the different client
portions in their market. Our approach beats the
dependence of past work on rare near proof mined from
the content.
2. A formal technique for the distinguishing proof of the
diverse kinds of clients in a given market, and also for the
estimation of the level of clients that have a place with
each sort.
3. A profoundly versatile structure for finding the best k
contenders of a given thing in expansive datasets.
4. CONCLUSION
In this work, we are providing a traditional definition
among various competitors based on features specified by
users on different items. C-miner Algorithm combined
with pyramid finder provides an efficient way of
identifying Top-k Contenders by considering various
factors like preferences and opinions of the users, and
finally it generates clusters by validating the score of
various participants to find K best things. This proposed
framework is designed to handle large data sets that are
chosen from various domains which are consider as key
data sets that are helpful in finding Top-k competitors. The
basic idea of our methodology is efficient and adaptable to
enhance and evaluate real datasets from a variety of
domains.
REFERENCES
[1]http://searchsqlserver.techtarget.com/definition/data-
mining.
[2]George Valkanas, Theodoros Lappas, and Dimitrios
Gunopulos, ”Mining Competitors from Large Unstructured
Datasets”,DOI10.1109/TKDE.2017.2705101, IEEE
Transactions on Knowledge and Data Engineering.

[3]Theodoros Lappas, George Valkanas, Dimitrios
Gunopulos, ”Efﬁcient and Domain-Invariant Competitor
Mining”,2012.
[4] Mark Bergena, y and Margaret A. Peteraf b,
”Competitor Identification and Competitor Analysis: A
Broad-Based Managerial Approach”, 2002.
[5] Sanket Shah, Amit Thakkar, Sonal Rami, ”A Novel
Approach for Making Recommendation using Skyline
Query based on user Location and Preference”, Indian
Journal of Science and Technology, Vol 9(30), DOI:
10.17485/ijst/2016/v9i30/99075,August 2016.
[6] kian-lee tian, pin-kwang Eng, Beng chin Ooi, “Efficient
Progressive Skyline Computation”,2001.
[7] Rui Li shenghua Bao, Jin Wang, Yong Yu, ”Cominer: An
Effective Algorithm for Mining Competitors from the web”
,2006.
[8] Qian Wan, Raymond Chi-Wing Wong, Yu Peng,
”Finding Top-k Profitable products”, 2012.
[9] Maksim Lapin, Matthias Hein, and Bernt Schiele,
”Analysis and Optimization of Loss Functions for
Multiclass, Top-k, and Multi label Classiﬁcation”, 12 Dec
2016.
[10] Vincent S., T seng, “Efficient algorithms for mining
Top-k high utility ItemSets”, ieee vol.28 January 2016.

IRJET- A Comprehensive way of finding Top-K Competitors using C-Miner Algorithm

More Related Content

What's hot (14)

Similar to IRJET- A Comprehensive way of finding Top-K Competitors using C-Miner Algorithm (20)

More from IRJET Journal (20)

Recently uploaded (20)

IRJET- A Comprehensive way of finding Top-K Competitors using C-Miner Algorithm