IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 03 Issue: 01 | Jan-2014, Available @ http://www.ijret.org 347
VARIANCE ROVER SYSTEM: WEB ANALYTICS TOOL USING DATA
MINING
G. S. Kalekar1, A. P. Mulmule2, A. A. Pujari3, A. A. Ugaonkar4
1, 2, 3, 4 Student, Computer Engineering Department, GES's R.H.S.COEMSR, Nashik, Maharashtra, India
Abstract
Learning analytics relies on computational information processing to extract from raw data the aspects that give insight into the
behavior of learners, the design of learning experiences, and so on. A large variety of computational techniques can be employed,
each with its own properties, but it is the interpretation of their results that forms the core of the analytics process. As a rising
subject, data mining and business intelligence play an increasingly important role in decision support in every walk of life. The
Variance Rover System (VRS) focuses on the large data sets obtained from website visits: it groups this data into clusters according
to similarity and supports predicting customer behavior and selecting actions that influence that behavior to benefit the company,
so that optimized and beneficial decisions about business expansion can be taken.
Keywords: Analytics, Business intelligence, Clustering, Data Mining, Standard K-means, Optimized K-means
----------------------------------------------------------------------***------------------------------------------------------------------------
1. INTRODUCTION
With the tremendous competition in domestic and
international business, data analytics has become one of the
matters of concern to the enterprise. This important concept
has been given a new lease of life by the growth of the
Internet and e-business. Data analysis puts data at the center.
It gives new life to the enterprise organization system and
optimizes the business process.
Data mining can be used in various business applications for
different purposes, such as decision support systems, customer
retention strategies, selective marketing, business management,
and user profile analysis, to name a few. Data mining is the
process of discovering knowledge. In today’s electronic
information era it has become highly challenging for digital
firms to manage customer data and retrieve useful information
from it as required, so market segmentation can be used.
Market segmentation also covers customer retention strategies,
allocation of resources for advertising, and checking profit
margins, so the outcome of segmentation plays a big role in
deciding the price of products, attracting new customers, and
identifying potential customers. Clustering analysis is able to
find the distribution of the data and the interrelationships
between data items. Clustering is defined as “grouping of
similar data”. It splits the records in a database, or the data
objects in a dataset, into a series of meaningful subclasses or
groups. Data mining is essentially the process of extracting
useful information from the incomplete and random data
generated by business tasks such as production, marketing,
and customer services of the enterprise.
1.1 Necessity
The focus area of this system is market research and analysis.
It is a web-based application that aims at determining target
markets and consumer density and at identifying potential
customers. We have used cluster analysis for this purpose.
The application will help to determine users’ browsing details
and to monitor the customer population. Web user analysis is
a simple template that provides a graphical, time-phased
overview of the process in terms of conceptual design,
mission, analysis, and definition phases.
Fig -1: Web Analytics Template diagram
To trace the future market in a particular place, we need to
find out which locations in the world the visitors of the
website come from. We also need the services most used by
these visitors and the referrers of the website, with overall
control of the system resting with the administrator. A similar
facility is available in Google Analytics, but a separate fee
has to be paid for every domain, and the whole database needs
to be shared with Google. Report formation, which is a major
task in every system, does not match our requirements.
Moreover, the ownership of the generated reports lies with
Google Analytics.
1.2 Objectives
The main objective of this paper is market research from a
mass of real-time data that is faster, better, and more robust.
Along with market research, the project will cover the
following aspects:
i. Market research
ii. Analyzing visitor traffic country-wise and
product-wise
iii. Customer tracking
iv. Referrer research
v. Product positioning
vi. Business expansion
vii. Predicting future markets
viii. Retrieving users’ browsing details
ix. Reporting
2. LITERATURE SURVEY
Data mining is a powerful new technology with great ability to
help organizations focus on their most important information
assets, such as the data they have collected about the behavior
of their customers and potential customers. It explores
information within the data that queries and reports cannot
effectively reveal. Most of the analytical techniques used in
data mining are well-known mathematical algorithms and
techniques. Data mining techniques such as decision trees,
genetic algorithms, neural networks, and many more help to
analyze data efficiently. What is new is the application of
those techniques to general business problems, made possible
by the increased availability of data and by inexpensive
storage and processing power.
Clustering is “grouping similar kinds of data together”. By
using clustering together with an analytical tool, the
interrelationships between data points can be found. Data
mining thus makes it possible to analyze the data needed to
develop a good analytical tool. Such a tool should be able to
compare the characteristics or attributes of different groups
and identify the important characteristics of each segment in
order to decide on business strategies. Clustering analysis can
find the distribution of the data entities as well as the
interrelationships between data objects, so that it can divide
the data set into a series of meaningful subclasses. One
common method used in clustering analysis is the K-means
clustering algorithm, which is a fast method for partitioning
data into the required number of clusters. Optimizing this
algorithm leads to faster segmentation and more efficient
results.
3. RELATED ALGORITHMS
3.1 Standard K-Means
The K-means algorithm groups objects, based on their
attributes or features, into K groups, where K is a positive
integer. The grouping is done by minimizing the sum of
squared distances between the data points and the
corresponding cluster centroids. Thus, the purpose of K-means
clustering is to group the data [5].
Standard K-means algorithm:
1. Choose the number of clusters K (a positive integer)
and select K initial points. These points represent
the initial group centroids and are often chosen so
that they are mutually “farthest apart” in some way.
2. Assign each object to the group that has the closest
centroid.
3. When all objects have been assigned, recalculate the
positions of the K centroids.
4. Repeat steps 2 and 3 until the centroids no longer
move. This produces a separation of the objects into
groups from which the metric to be minimized can be
calculated.
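The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in VRS: the function names, the random choice of initial centroids from the data points, and the `seed` and `max_iter` parameters are assumptions made for the example.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Component-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    """Standard K-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick K distinct data points as the initial centroids
    # (an assumption; the paper only says they are chosen "in some way").
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every point to the group with the closest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[j])
        # Step 4: stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def sum_of_squares(points, centroids, labels):
    """Metric being minimized: total squared distance to the assigned centroid."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


if __name__ == "__main__":
    # Hypothetical two-dimensional data with two visible groups.
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    centroids, labels = k_means(data, k=2)
    print(centroids, labels, sum_of_squares(data, centroids, labels))
```

Changing `seed` changes the initial centroids, which is a simple way to observe how the outcome can depend on initialization.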
Its advantage is that it is simple and has high processing
speed when applied to large amounts of data. K-means
calculates the centroid of each cluster by taking the average
of the data points in that cluster. Its disadvantages are that it
does not yield the same result with each execution, as the
resulting clusters depend on the initial random assignments,
as discussed in [1][4], and that most distance calculations in
standard K-means are redundant.
3.2 Optimized K-Means
We use the triangle inequality to reduce these redundant
calculations. In this way we improved the efficiency of the
algorithm to a large extent. It is generally known that in a
triangle the sum of two sides is greater than the third side.
Euclidean distance satisfies the triangle inequality, which can
be extended to multi-dimensional Euclidean space [1]. For any
three vectors x, a, b in Euclidean space:
d(x,a) + d(a,b) ≥ d(x,b)
d(a,b) − d(x,a) ≤ d(x,b)
Here d(Ci,Cj) denotes the distance between two cluster
centers Ci and Cj.
For a data point x and two cluster centers Cj and Ck, if
2d(x,Cj) ≤ d(Cj,Ck) then:
2d(x,Cj) − d(x,Cj) ≤ d(Cj,Ck) − d(x,Cj) (1)
The left side of equation (1) equals d(x,Cj), and by the
triangle inequality the right side is at most d(x,Ck), so:
d(x,Cj) ≤ d(x,Ck)
In this case Ck cannot be closer to x than Cj, and d(x,Ck)
need not be computed.
First, select the initial cluster centers, and set the lower bound
y(x,f)=0 for each data point x and cluster center f. Second,
assign each data point to its nearest initial cluster, using the
results obtained previously to avoid unnecessary distance
calculations in this process. Each time d(x,f) is computed, set
y(x,f)=d(x,f).
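The pruning test can be illustrated with a simplified assignment pass in Python. This sketch applies only the center-to-center test 2d(x,Cj) ≤ d(Cj,Ck) and does not maintain the lower bounds y(x,f) across iterations, so it is not the full optimized algorithm; the function names and the skipped-distance counter are assumptions made for the example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping centers ruled out
    by the triangle inequality. Returns (labels, skipped_distance_count)."""
    k = len(centers)
    # Distances between every pair of centers, computed once per pass.
    center_dist = [[euclidean(ci, cj) for cj in centers] for ci in centers]
    labels, skipped = [], 0
    for x in points:
        best, best_d = 0, euclidean(x, centers[0])
        for j in range(1, k):
            # If 2 d(x, C_best) <= d(C_best, C_j), then d(x, C_best) <= d(x, C_j),
            # so C_j cannot be strictly closer and d(x, C_j) is not computed.
            if 2 * best_d <= center_dist[best][j]:
                skipped += 1
                continue
            d = euclidean(x, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


def assign_brute_force(points, centers):
    """Reference assignment that computes every point-to-center distance."""
    return [
        min(range(len(centers)), key=lambda j: euclidean(x, centers[j]))
        for x in points
    ]


if __name__ == "__main__":
    # Hypothetical points and centers.
    data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
    centers = [(1.0, 1.5), (8.5, 8.5), (20.0, 20.0)]
    labels, skipped = assign_with_pruning(data, centers)
    print(labels, skipped, labels == assign_brute_force(data, centers))
```

The pruned pass returns the same assignment as the brute-force pass while computing fewer point-to-center distances whenever the centers are well separated relative to the distance from a point to its current best center.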
4. APPLICATION
If the system is built to trace which services, products, or
applications of the website are requested by visitors, it
enables business intelligence. We can provide reports as
desired, which will help the organization decide whether to
expand, shrink, or retain its services, products, or
applications.
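As a rough illustration of the kind of report described here, the following Python sketch counts visits by country and by requested product. The record layout and the sample values are hypothetical and are not taken from VRS.

```python
from collections import Counter


def traffic_report(visits):
    """Count visits per country and per requested product or service.

    `visits` is a list of (country, product) pairs; this layout is an
    assumption made for the example.
    """
    by_country = Counter(country for country, _ in visits)
    by_product = Counter(product for _, product in visits)
    return by_country, by_product


if __name__ == "__main__":
    # Hypothetical visit records.
    visits = [
        ("India", "hosting"),
        ("India", "analytics"),
        ("USA", "hosting"),
        ("India", "hosting"),
        ("USA", "analytics"),
        ("Germany", "support"),
    ]
    by_country, by_product = traffic_report(visits)
    for country, count in by_country.most_common():
        print(country, count)
    for product, count in by_product.most_common():
        print(product, count)
```

Counts of this kind could feed a decision on whether a service is worth expanding in a given market.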
CONCLUSIONS
The customer is an important asset of an enterprise.
Considering this, the proposed system performs market
research and analysis, which helps to determine target
markets and consumer density. Data mining provides the
technology to analyze large volumes of data and to detect
hidden patterns in it, converting raw data into valuable
information.
ACKNOWLEDGMENTS
Thanks to Assistant Prof. N. V. Alone for his valuable,
intellectual, and encouraging guidance. He is the Head of the
Computer Department at GES R.H.S COE, Nashik. He holds a
Master's degree in Computer Engineering and has more than
twelve years of teaching experience.
REFERENCES
[1]. Gao Hua, “Customer Relationship Management Based on
Data Mining Technique”, 978-1-4244-8694-6/11/$26.00
©2011 IEEE
[2]. Xiaoping Qin, Shijue Zheng, Tingting He, Ming Zou, Ying
Huang, “Optimizated K-means algorithm and application in
CRM system”, 2010 International Symposium on Computer,
Communication, Control and Automation
[3]. Shu-Hsien Liao, Pei-Hui Chu, Pei-Yuan Hsiao, “Data
mining techniques and its applications – A Decade review
from 2000 to 2011”, Expert Systems with Applications 39
(2012) 11303–11311, journal homepage:
www.elsevier.com/locate/eswa
[4]. Mrs. G. P. Dharne, Mrs. S. A. Kinariwala, Mrs. A. S. Vaidya,
Ms. P. V. Pandit, “A web user analyser by hierarchical and
optimized K-means algorithm”, vol. 1, issue 7, Dec. 2011
[5]. J. Han and M. Kamber, Data Mining: Concepts and
Techniques, 2006
BIOGRAPHIES
Gauri Sanjay Kalekar is pursuing her B.E. course in
computer from University of Pune.
Aditi Prakash Mulmule is pursuing her B.E. course in
computer from University of Pune.
Akshay Arvind Pujari is pursuing his B.E. course in
computer from University of Pune.
Abhilash Ajay Ugaonkar is pursuing his B.E. course in
computer from University of Pune.

More Related Content

PDF
Recommendation system using bloom filter in mapreduce
PDF
A statistical data fusion technique in virtual data integration environment
PDF
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
PDF
Effective data mining for proper
PDF
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
DOCX
PDF
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
PDF
Enhancement techniques for data warehouse staging area
Recommendation system using bloom filter in mapreduce
A statistical data fusion technique in virtual data integration environment
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
Effective data mining for proper
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
Enhancement techniques for data warehouse staging area

What's hot (18)

PDF
IRJET- Missing Data Imputation by Evidence Chain
PDF
Web Based Fuzzy Clustering Analysis
PDF
V2 i9 ijertv2is90699-1
PDF
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
PDF
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
PDF
An Robust Outsourcing of Multi Party Dataset by Utilizing Super-Modularity an...
PDF
Survey paper on Big Data Imputation and Privacy Algorithms
PPT
Data Mining In Market Research
PDF
Analysis on Data Mining Techniques for Heart Disease Dataset
PDF
Clustering
PDF
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
PDF
IRJET - An Overview of Machine Learning Algorithms for Data Science
PDF
A study on rough set theory based
PDF
A frame work for clustering time evolving data
PPTX
data generalization and summarization
PDF
TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SET
PDF
F04463437
IRJET- Missing Data Imputation by Evidence Chain
Web Based Fuzzy Clustering Analysis
V2 i9 ijertv2is90699-1
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
An Robust Outsourcing of Multi Party Dataset by Utilizing Super-Modularity an...
Survey paper on Big Data Imputation and Privacy Algorithms
Data Mining In Market Research
Analysis on Data Mining Techniques for Heart Disease Dataset
Clustering
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION
IRJET - An Overview of Machine Learning Algorithms for Data Science
A study on rough set theory based
A frame work for clustering time evolving data
data generalization and summarization
TWO PARTY HIERARICHAL CLUSTERING OVER HORIZONTALLY PARTITIONED DATA SET
F04463437
Ad

Viewers also liked (20)

PDF
Experimental evaluation of performance of electrical
PDF
Load balancing in public cloud combining the concepts of data mining and netw...
PDF
Improved method for pattern discovery in text mining
PDF
Influence of feeding system in injection moulding for
PDF
Laboratory studies of dense bituminous mixes ii with
PDF
Classification accuracy of sar images for various land
PDF
Optimization of energy in public buildings
PDF
Dehulling characteristics of oat (ol 9 variety) as affected by grain moisture...
PDF
Language identification using g lda
PDF
A novel p q control algorithm for combined active
PDF
Matlab based comparative studies on selected mppt
PDF
Assessment of indoor air quality in an automobile industry
PDF
Secret keys and the packets transportation for privacy data forwarding method...
PDF
Geospatial information system for tourism management in aurangabad city a re...
PDF
Offline signature identification using high intensity variations and cross ov...
PDF
Partial encryption of compresed video
PDF
A case study on energy savings in air conditioning system by heat recovery us...
PDF
Moderate quality of voice transmission using 8 bit micro-controller through z...
PDF
Video copy detection using segmentation method and
PDF
An enhanced approach for securing mobile agents from
Experimental evaluation of performance of electrical
Load balancing in public cloud combining the concepts of data mining and netw...
Improved method for pattern discovery in text mining
Influence of feeding system in injection moulding for
Laboratory studies of dense bituminous mixes ii with
Classification accuracy of sar images for various land
Optimization of energy in public buildings
Dehulling characteristics of oat (ol 9 variety) as affected by grain moisture...
Language identification using g lda
A novel p q control algorithm for combined active
Matlab based comparative studies on selected mppt
Assessment of indoor air quality in an automobile industry
Secret keys and the packets transportation for privacy data forwarding method...
Geospatial information system for tourism management in aurangabad city a re...
Offline signature identification using high intensity variations and cross ov...
Partial encryption of compresed video
A case study on energy savings in air conditioning system by heat recovery us...
Moderate quality of voice transmission using 8 bit micro-controller through z...
Video copy detection using segmentation method and
An enhanced approach for securing mobile agents from
Ad

Similar to Variance rover system web analytics tool using data (20)

PDF
Data mining techniques
PDF
Data mining techniques a survey paper
PDF
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
PDF
A study and survey on various progressive duplicate detection mechanisms
PDF
Feature Subset Selection for High Dimensional Data using Clustering Techniques
PDF
Different Classification Technique for Data mining in Insurance Industry usin...
PDF
PDF
How Partitioning Clustering Technique For Implementing...
PDF
Correlation of artificial neural network classification and nfrs attribute fi...
PDF
Evaluating the efficiency of rule techniques for file classification
PDF
Applications Of Clustering Techniques In Data Mining A Comparative Study
PDF
IRJET- A Detailed Study on Classification Techniques for Data Mining
PDF
Evaluating the efficiency of rule techniques for file
PDF
Introduction to feature subset selection method
PDF
A Survey on Machine Learning Algorithms
PDF
Review of Existing Methods in K-means Clustering Algorithm
PDF
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
PDF
Prediction of Default Customer in Banking Sector using Artificial Neural Network
PDF
Effectual citizen relationship management with data mining techniques
PDF
Effectual citizen relationship management with data mining techniques
Data mining techniques
Data mining techniques a survey paper
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
A study and survey on various progressive duplicate detection mechanisms
Feature Subset Selection for High Dimensional Data using Clustering Techniques
Different Classification Technique for Data mining in Insurance Industry usin...
How Partitioning Clustering Technique For Implementing...
Correlation of artificial neural network classification and nfrs attribute fi...
Evaluating the efficiency of rule techniques for file classification
Applications Of Clustering Techniques In Data Mining A Comparative Study
IRJET- A Detailed Study on Classification Techniques for Data Mining
Evaluating the efficiency of rule techniques for file
Introduction to feature subset selection method
A Survey on Machine Learning Algorithms
Review of Existing Methods in K-means Clustering Algorithm
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
Prediction of Default Customer in Banking Sector using Artificial Neural Network
Effectual citizen relationship management with data mining techniques
Effectual citizen relationship management with data mining techniques

More from eSAT Publishing House (20)

PDF
Likely impacts of hudhud on the environment of visakhapatnam
PDF
Impact of flood disaster in a drought prone area – case study of alampur vill...
PDF
Hudhud cyclone – a severe disaster in visakhapatnam
PDF
Groundwater investigation using geophysical methods a case study of pydibhim...
PDF
Flood related disasters concerned to urban flooding in bangalore, india
PDF
Enhancing post disaster recovery by optimal infrastructure capacity building
PDF
Effect of lintel and lintel band on the global performance of reinforced conc...
PDF
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
PDF
Wind damage to buildings, infrastrucuture and landscape elements along the be...
PDF
Shear strength of rc deep beam panels – a review
PDF
Role of voluntary teams of professional engineers in dissater management – ex...
PDF
Risk analysis and environmental hazard management
PDF
Review study on performance of seismically tested repaired shear walls
PDF
Monitoring and assessment of air quality with reference to dust particles (pm...
PDF
Low cost wireless sensor networks and smartphone applications for disaster ma...
PDF
Coastal zones – seismic vulnerability an analysis from east coast of india
PDF
Can fracture mechanics predict damage due disaster of structures
PDF
Assessment of seismic susceptibility of rc buildings
PDF
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
PDF
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Likely impacts of hudhud on the environment of visakhapatnam
Impact of flood disaster in a drought prone area – case study of alampur vill...
Hudhud cyclone – a severe disaster in visakhapatnam
Groundwater investigation using geophysical methods a case study of pydibhim...
Flood related disasters concerned to urban flooding in bangalore, india
Enhancing post disaster recovery by optimal infrastructure capacity building
Effect of lintel and lintel band on the global performance of reinforced conc...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Shear strength of rc deep beam panels – a review
Role of voluntary teams of professional engineers in dissater management – ex...
Risk analysis and environmental hazard management
Review study on performance of seismically tested repaired shear walls
Monitoring and assessment of air quality with reference to dust particles (pm...
Low cost wireless sensor networks and smartphone applications for disaster ma...
Coastal zones – seismic vulnerability an analysis from east coast of india
Can fracture mechanics predict damage due disaster of structures
Assessment of seismic susceptibility of rc buildings
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...

Recently uploaded (20)

PDF
Arduino robotics embedded978-1-4302-3184-4.pdf
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
Strings in CPP - Strings in C++ are sequences of characters used to store and...
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPT
Project quality management in manufacturing
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PPTX
Construction Project Organization Group 2.pptx
PDF
composite construction of structures.pdf
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
PPTX
Internet of Things (IOT) - A guide to understanding
PPTX
Welding lecture in detail for understanding
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
Arduino robotics embedded978-1-4302-3184-4.pdf
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Foundation to blockchain - A guide to Blockchain Tech
Strings in CPP - Strings in C++ are sequences of characters used to store and...
CH1 Production IntroductoryConcepts.pptx
Operating System & Kernel Study Guide-1 - converted.pdf
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
UNIT-1 - COAL BASED THERMAL POWER PLANTS
Project quality management in manufacturing
CYBER-CRIMES AND SECURITY A guide to understanding
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Construction Project Organization Group 2.pptx
composite construction of structures.pdf
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
Internet of Things (IOT) - A guide to understanding
Welding lecture in detail for understanding
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Embodied AI: Ushering in the Next Era of Intelligent Systems

Variance rover system web analytics tool using data

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Issue: 01 | Jan-2014, Available @ http://www.ijret.org 347 VARIANCE ROVER SYSTEM: WEB ANALYTICS TOOL USING DATA MINING G. S. Kalekar1 , A.P.Mulmule2 , A. A. Pujari3 , A. A. Ugaonkar4 1, 2, 3, 4 Student, Computer Engineering Department, GES's R.H.S.COEMSR, Nashik, Maharashtra, India Abstract Learning Analytics by nature relies on computational information processing activities intended to extract from raw data some interesting aspects that can be used to obtain insights into the behaviors of learners, the design of learning experiences, etc. There is a large variety of computational techniques that can be employed, all with interesting properties, but it is the interpretation of their results that really forms the core of the analytics process. As a rising subject, data mining and business intelligence are playing an increasingly important role in the decision support activity of every walk of life. The Variance Rover System (VRS) mainly focused on the large data sets obtained from online web visiting and categorizing this into clusters according some similarity and the process of predicting customer behavior and selecting actions to influence that behavior to benefit the company, so as to take optimized and beneficial decisions of business expansion. Keywords: Analytics, Business intelligence, Clustering, Data Mining, Standard K-means, Optimized K-means ----------------------------------------------------------------------***------------------------------------------------------------------------ 1. INTRODUCTION With the tremendous competition in the domestic and international business, Data Analytics has become one of matters of concern to the enterprise. 
This important concept has been given a new lease of life because of the growth of the Internet and E-business. Data analysis takes Data at the center. It gives a new life to the enterprise organization system and optimizes the business process. Data Mining Can be Used in Various business application for different purposes such as decision support system, customer retention strategies ,selective marketing, business management user profile analysis to name a few. Data mining is the process of discovering the knowledge. In today’s electronic information era it becomes highly challenged to digital firms to manage customer data to retrieve useful information as per their requirement from that data, so market segmentation can be used. Market segmentation also include customer retention strategies, allocation of resources for advertising, to check profit margins so outcome of segmentation plays big role in deciding price of the products, attracting new customers and identifying potential customers. Clustering analysis is able to find out data distribution and proper inter relationship between data items clustering is defined as “grouping of similar data” .Clustering splits records in the database or data objects in the dataset into series of meaningful subclasses or group. Data mining is basically a useful process in which formation which is incomplete and random that has been generated from various business tasks such as production, marketing, customer services of the enterprise. 1.1 Necessity The focus area of this system is market research and analysis. It is a web-based application and aims at determining target markets and consumer density and identifying potential customers. We have used the concept of Cluster analysis for the same. This application will help determine the user’s browsing details and monitor customer population. 
Web User analysis is a simple template that provides a graphical, time- phased overview of process in terms of conceptual design, mission, analysis, and definition phases. Fig -1: Web Analytics Template diagram
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Issue: 01 | Jan-2014, Available @ http://www.ijret.org 348 To trace the future market at particular place we need to find that from which location of the world the visitors of the website belong. Maximum used services by this visitors also referral of the website and whole control of the system at administrator. The similar kind of facility is available with Google Analytic Tools but for every domain we have to pay the different fees, also the whole database needs to be shared with Google. Report formation is not as per our requirement which is the major task in every system. But, unfortunately the ownership of the generated reports lies with Google Analytics. 1.2 Objectives The main objective of this paper is market research from mass of real time data which work faster, better and robust. Along with market research the project will cover the following aspects: i. Market research. ii. Analyzing visitor traffic country wise and product wise. iii. Customer tracking iv. Referrer researches v. Product Positioning vi. Business expansion vii. Predicting future markets viii. Retrieve user’s browsing details ix. Reporting 2. LITERATURE SURVEY Data mining is a powerful new technology with great ability to help organizations to focus on the most important information asset such as data that they have collected about the behavior of their customers and potential customers. Most of the analytical techniques used in data mining are often well- known mathematical algorithms and techniques. It explores information within the data that queries and reports can't effectively reveal. Data mining techniques such as Decision trees, Genetic algorithms, Neural networks and many more help to analyze data in efficient manner. 
What is new is the application of those techniques to general business problems made possible by the increased availability of data and inexpensive storage and processing power. Clustering is nothing but “grouping similar kind of data together”. By making use of clustering and the analytical tool proper interrelationships between data points can be found out. Thus, Data Mining will surely allow to analyze the data for developing a good analytical tool. Thus such a tool should be able to compare between different characteristics or attributes of different groups and indentify different important characteristics of each segment to decide different business strategies. Hence, Clustering analysis can find out the distribution of different data entities as well can find out proper inter relationships between the data objects so that it can divide the data set into series of meaningful subclasses. One such common yet exclusive method used in clustering analysis is K-means Clustering algorithm, which is a fast method for classification of data into required clusters. Also, Optimization of this algorithm will lead to faster segmentation and will surely lead to more efficient results. 3. RELATED ALGORITHMS 3.1 Standard K-Means K-means algorithm is an algorithm used to classify or to group the objects based on attributes features into K number of group. K is positive integer number. The grouping is done by minimizing the sum of squares of distances between data and the corresponding cluster centroid. Thus, the purpose of K- mean clustering is to classify the data [5]. Standard k-means algorithm: 1. Initially, the number of clusters must be known, or chosen, to be K say. K is positive integer number .These points represent initial group centroids.Often chosen such that the points are mutually “farthest apart”, in some way 2. Assign each object to the group that has the closest centroid. 3. When all objects have been assigned, recalculate the positions of the K centroids. 4. 
This process is iterated until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.

The advantage of K-means is that it is simple and fast when applied to large amounts of data. It calculates the centroid of each cluster by taking the average of the data points in that cluster. Its disadvantage is that it does not yield the same result on every execution, because the resulting clusters depend on the initial random assignments, as discussed in [1][4]. In addition, most distance calculations in standard K-means are redundant.

3.2 Optimized K-Means
We use the triangle inequality to reduce these redundant calculations, which improves the efficiency of the algorithm to a large extent. It is well known that, in a triangle, the sum of the lengths of two sides is no less than the length of the third side. Euclidean distance satisfies the triangle inequality, which extends to multi-dimensional Euclidean space [1]. For any three vectors x, a, b in Euclidean space:

d(x,a) + d(a,b) ≥ d(x,b)
d(a,b) − d(x,a) ≤ d(x,b)

Here d(Ci,Cj) denotes the distance between two cluster centers Ci and Cj.
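The second inequality gives a lower bound on d(x,b) that needs only d(x,a) and the center-to-center distance d(a,b). The following Python fragment is a minimal sketch of how such a bound could be used in the assignment step. It is not the authors' implementation; the function names dist, assign_naive and assign_pruned and the synthetic data are assumptions made for illustration only.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def assign_naive(points, centers):
    """Assign each point to its nearest center, computing every distance."""
    labels, n_dist = [], 0
    for x in points:
        best, best_d = 0, dist(x, centers[0])
        n_dist += 1
        for j in range(1, len(centers)):
            d = dist(x, centers[j])
            n_dist += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, n_dist


def assign_pruned(points, centers):
    """Same assignment, skipping d(x, b) when d(a, b) - d(x, a) >= d(x, a)."""
    k = len(centers)
    # Center-to-center distances are computed once and reused for all points
    # (these k*k computations are not included in n_dist).
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels, n_dist = [], 0
    for x in points:
        best, best_d = 0, dist(x, centers[0])
        n_dist += 1
        for j in range(1, k):
            # Triangle-inequality lower bound on d(x, centers[j]).
            if cc[best][j] - best_d >= best_d:
                continue  # centers[j] cannot be closer than the current best
            d = dist(x, centers[j])
            n_dist += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, n_dist


# Hypothetical test data: 100 points scattered around each of three centers.
random.seed(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + random.gauss(0, 1), cy + random.gauss(0, 1))
          for cx, cy in centers for _ in range(100)]

labels_naive, n_naive = assign_naive(points, centers)
labels_pruned, n_pruned = assign_pruned(points, centers)
```

Both functions should produce the same labels, while n_pruned counts fewer point-to-center distance computations than n_naive. The saving grows when the cluster centers are far apart relative to the spread of the points around them.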
If 2d(x,Cj) ≤ d(Cj,Ck), then:

2d(x,Cj) − d(x,Cj) ≤ d(Cj,Ck) − d(x,Cj)    (1)

The left-hand side of (1) equals d(x,Cj), and by the triangle inequality the right-hand side is at most d(x,Ck). According to equation (1), then:

d(x,Cj) ≤ d(x,Ck)

so x cannot be closer to Ck than to Cj, and d(x,Ck) does not need to be computed.

First, select the initial cluster centers and set the lower bound y(x,f) = 0 for each data point x and cluster center f. Second, assign each data point to its nearest initial cluster center, using the results obtained above to avoid unnecessary distance calculations in this process. Each time d(x,f) is computed, set y(x,f) = d(x,f).

4. APPLICATION
If the system is built to trace which services, products or applications of the website are requested by visitors, it enables business intelligence. We can provide reports as desired, which will help the organization decide whether to expand, shrink or retain its services, products or applications.

CONCLUSIONS
The customer is an important asset of an enterprise. With this in mind, the proposed system performs market research and analysis that help to determine target markets and consumer density. Data mining provides the technology to analyze large volumes of data and detect hidden patterns in it, converting raw data into valuable information.

ACKNOWLEDGMENTS
Thanks to Assistant Prof. N. V. Alone for his valuable, intellectual and encouraging guidance. He is the Head of the Computer Department at GES R.H.S COE, Nashik. He holds a Master's degree in Computer Engineering and has more than twelve years of teaching experience.

REFERENCES
[1]. Gao Hua, "Customer Relationship Management Based on Data Mining Technique", 978-1-4244-8694-6/11/$26.00 ©2011 IEEE
[2].
Xiaoping Qin, Shijue Zheng, Tingting He, Ming Zou, Ying Huang, "Optimizated K-means algorithm and application in CRM system", 2010 International Symposium on Computer, Communication, Control and Automation
[3]. Shu-Hsien Liao, Pei-Hui Chu, Pei-Yuan Hsiao, "Data mining techniques and its applications – A decade review from 2000 to 2011", Expert Systems with Applications 39 (2012) 11303–11311
[4]. Mrs. G. P. Dharne, Mrs. S. A. Kinariwala, Mrs. A. S. Vaidya, Ms. P. V. Pandit, "A web user analyser by hierarchical and optimized K-means algorithm", vol. 1, issue 7, Dec. 2011
[5]. J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2006

BIOGRAPHIES
Gauri Sanjay Kalekar is pursuing her B.E. in Computer Engineering at the University of Pune.
Aditi Prakash Mulmule is pursuing her B.E. in Computer Engineering at the University of Pune.
Akshay Arvind Pujari is pursuing his B.E. in Computer Engineering at the University of Pune.
Abhilash Ajay Ugaonkar is pursuing his B.E. in Computer Engineering at the University of Pune.