International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), pp. 449-454, © IAEME: www.iaeme.com/ijcet.asp
PRIVACY PRESERVING CLUSTERING ON CENTRALIZED DATA
THROUGH SCALING TRANSFORMATION
Khatri Nishant P.
M. Tech. (CSE), Amity School of Engg. & Tech., Amity University Rajasthan, Jaipur, India
Ms. Preeti Gupta
CSE Dept., Amity School of Engg. & Tech., Amity University Rajasthan, Jaipur, India
Tusal Patel
M. Tech. (CSE), Amity School of Engg. & Tech., Amity University Rajasthan, Jaipur, India
ABSTRACT
Data sharing among organizations is useful because it offers mutual
benefits for effective decision making and business growth. Data mining techniques can be
applied to this shared data to extract meaningful, useful, previously
unknown, and ultimately comprehensible information from large databases. This
leads to knowledge discovery, and the mined knowledge can be used to the clear benefit of
both parties. However, information is an important asset to business organizations,
and sharing it raises the issue of privacy breach. This paper introduces privacy preserving
clustering for centralized data through a scaling based transformation.
Keywords: Data mining, Clustering, Privacy Preservation, Scaling
I. INTRODUCTION
The information age has enabled many organizations to gather large volumes of data.
However, this data is of little use if "meaningful information" or
"knowledge" cannot be extracted from it and put to use to increase
effectiveness. Data mining, otherwise known as knowledge discovery, is the technique used by
analysts to find hidden and unknown patterns in a collection of data, which can then be
put to use in identifying promising opportunities. In contrast to standard statistical
methods, data mining techniques search for interesting information. Many techniques, such as
classification, clustering, and association rule mining, can be applied to mine knowledge
from large databases.
Confidentiality Issues in Data Mining: There are situations where
sharing data among organizations leads to mutual gain. A key issue that arises in
any kind of data sharing, however, is confidentiality. The need for privacy is sometimes imposed by
law (e.g., for medical databases) and is sometimes motivated by business interests. This
poses a challenge for researchers: finding techniques that preserve the privacy of data
shared among the communicating parties.
Most privacy preserving data mining methods apply some form of transformation to the data to
achieve privacy preservation. Typically, such methods reduce the granularity of representation
to preserve privacy.
This paper presents a technique for privacy preserving clustering in which an irreversible scaling
transformation, applied to centralized data stored in a data matrix, preserves
confidentiality without changing the nature of the data or the relationships between
the data objects.
II. RELATED WORK
The work in [1] proposes privacy preserving computation of cluster means using
two protocols, one based on oblivious polynomial evaluation and the other on
homomorphic encryption. In [2], the k-means technique is used to preserve the privacy of
vertically partitioned data. In vertically partitioned data, the complete attribute set of a
database is divided into two or more sets, and each set serves as an individual database. [3]
proposes a decision tree technique for privacy preservation over vertically partitioned data.
[4] proposes privacy preserving clustering by the Rotation Based Technique (RBT),
an effective method built mainly on isometric transformation. [5] presents
an algorithm for privacy preserving Support Vector Machine (SVM) based classification
using local and global models. The local models are private to each party and are not disclosed
while the global model is generated jointly. The global model is the same for every party
and is then used to classify new data objects. [6] presents a modified k-means
algorithm for privacy preservation, in which a privacy preserving protocol for k-clustering is used on
horizontally partitioned databases. Further privacy preservation techniques are
presented in [6] for Naive Bayes and decision tree classification. [7] presents various
techniques for privacy preservation in different data mining procedures. It
proposes an algorithm for preserving privacy in association rule mining, along with a
subroutine for securely finding the closest cluster in k-means clustering.
[8] presents various cryptographic techniques for privacy preservation. [9]
presents theoretical and experimental results demonstrating that random data
distortion in most cases preserves little data privacy.
III. PRIVACY PRESERVING CLUSTERING BY DATA MATRIX
TRANSFORMATION
A. Terms Used
a. Data Matrix
Objects (e.g. individuals, patterns, events) are usually represented as points (vectors) in a
multidimensional space. Each dimension represents a distinct attribute describing the object.
Thus, a set of m objects is represented as an m x n matrix D, with m rows, one for each
object, and n columns, one for each attribute. This matrix is referred to as a data matrix,
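represented as follows:
\[
D =
\begin{bmatrix}
a_{11} & \cdots & a_{1k} & \cdots & a_{1n} \\
a_{21} & \cdots & a_{2k} & \cdots & a_{2n} \\
\vdots &        & \vdots &        & \vdots \\
a_{m1} & \cdots & a_{mk} & \cdots & a_{mn}
\end{bmatrix}
\]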
B. Assumption
1) This paper aims to secure attributes with numeric values, under the
assumption that numeric data (e.g. salary, age, phone number, etc.) is the most
sensitive data that needs to be secured.
C. General Approach
Scaling Based Transformation (SBT) method:
Let Dmxn be a data matrix, where each row represents an object and each object contains
values for each of n numerical attributes. The SBT method of dimension n is an ordered pair,
defined as SBT = (D, fs), where:
1. D ∈ R^{m×n} is a normalized data matrix of objects to be clustered
2. fs is the scaling based transformation function
The procedure applies a scaling operation to the data matrix, treated as a 2D
transformation on a pair of attributes at a time. A scaling factor must therefore be chosen, and it is kept
the same in both the x and y directions. Doing so shifts every point onto a larger scale while
keeping the relative distances between points unchanged. This
is the key to maintaining the cluster distribution before and after the transformation.
Even though the points are distorted compared with the original data points, the cluster distribution
remains the same. Thus the procedure preserves privacy without distorting the data
mining results obtained before the transformation.
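The cluster-preservation argument above can be made concrete with a short derivation. This is a sketch under stated assumptions: the scaling matrix is uniform, S = sI with s > 0, distances are Euclidean, and every attribute that enters the distance is scaled by the same s. The symbols p, q, c_1 and c_2 (two points and two centroids) are illustrative names, not notation from the paper.

```latex
S = \begin{bmatrix} s & 0 \\ 0 & s \end{bmatrix}, \qquad
p' = S p = s\,p, \qquad q' = S q = s\,q

\lVert p' - q' \rVert = \lVert s\,(p - q) \rVert = s\,\lVert p - q \rVert

\lVert p - c_1 \rVert < \lVert p - c_2 \rVert
\iff
\lVert p' - c_1' \rVert < \lVert p' - c_2' \rVert,
\qquad c_i' = s\,c_i
```

Because every pairwise distance is multiplied by the same positive constant, the ordering of distances is unchanged, so each point keeps the same nearest centroid. The mean of a set of scaled points is also the scaled mean, which is why centroid-based methods such as k-means would be expected to produce the same partition.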
D. Proposed Algorithm
SBT_Algorithm
Input: Dmxn // Dmxn is the normalized data matrix
Output: D'mxn // D'mxn is the transformed data matrix
1. k ← n/2
2. Pk ← k pairs (Ai, Aj) in D such that 1 ≤ i, j ≤ n and i ≠ j
3. Decide the scaling factor s.
4. For each pair (Ai, Aj) in Pk do
   a. V(A'i, A'j) ← S × V(Ai, Aj) // S is the 2 × 2 scaling matrix with s as the scaling factor
   End for
End
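The steps of SBT_Algorithm can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The paper does not say how the k attribute pairs are chosen, so the hypothetical helper `select_pairs` simply pairs adjacent columns. The function names `select_pairs` and `sbt` and the requirement that s be positive are likewise assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def select_pairs(n: int) -> List[Tuple[int, int]]:
    """Return n // 2 disjoint column pairs: (0, 1), (2, 3), ...

    Assumption: adjacent pairing; the paper leaves pair selection open.
    """
    return [(i, i + 1) for i in range(0, n - 1, 2)]


def sbt(data: Sequence[Sequence[float]], s: float) -> Matrix:
    """Scale each selected attribute pair of a normalized data matrix by s."""
    if s <= 0:
        raise ValueError("scaling factor s must be positive")
    if not data:
        return []
    n = len(data[0])
    transformed = [list(row) for row in data]  # leave the input untouched
    for i, j in select_pairs(n):
        for row in transformed:
            # Multiplying the vector (row[i], row[j]) by S = [[s, 0], [0, s]]
            # reduces to scaling both components by s.
            row[i], row[j] = s * row[i], s * row[j]
    return transformed
```

With an odd number of attributes, this pairing leaves the last column unscaled, which follows from k = n/2 in step 1 of the algorithm as stated.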
E. Results
The proposed procedure was evaluated on the iris2D dataset, which contains 150
records. Clustering was performed in Weka 3.6 using the simple
k-Means clustering algorithm.
1) Cluster distribution before transformation.
Figure 1- Cluster Distribution before transformation
This output shows that 100 records belong to the first cluster (cluster 0) and the remaining 50 records
belong to the second cluster (cluster 1).
The transformed dataset was then supplied to Weka for k-Means clustering, and the
visualized output is shown below.
2) Cluster distribution after transformation.
Figure 2- Cluster Distribution after transformation
Comparing Figure 1 and Figure 2 shows that the cluster distribution is the same before and
after the transformation. Hence the procedure maintains
privacy for the confidential numeric data without altering the clustering result.
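The before/after comparison can be reproduced in miniature with a small script. This is a hedged sketch and not the Weka experiment: it uses a hand-written Lloyd-style k-means with a deterministic start (the first k points as initial centroids), six made-up 2D points rather than the iris2D data, and an arbitrary scaling factor s = 3.0. All names in it are illustrative.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd iterations; initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (made up for illustration): two well-separated groups.
original = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.15, 0.25), (0.85, 0.75)]
s = 3.0  # arbitrary scaling factor
scaled = [(s * x, s * y) for x, y in original]

labels_before = kmeans_labels(original, k=2)
labels_after = kmeans_labels(scaled, k=2)
print(labels_before == labels_after)
```

On this toy input both runs assign the points to the same two groups, which is the behaviour the figures report for the real dataset.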
F. Security
The procedure described above provides security for the numeric data: even if the
mean and standard deviation of the numeric dataset are published, the original
numeric data cannot be correctly reconstructed from the transformed dataset. This is
accomplished mainly in two steps:
1) Data Camouflage: First, the raw data is concealed by normalization. Normalization alone is not
secure, but it is beneficial in two ways: a) it gives equal weight to all attributes, and b) it
makes re-identification of objects against other datasets difficult.
2) Attribute Distortion: Attribute distortion is achieved by scaling two attribute values
at a time.
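The Data Camouflage step can be illustrated with a short normalization routine. The paper does not state which normalization is used. The sketch below assumes z-score normalization, chosen only because the section refers to the mean and standard deviation; the function name `zscore_normalize` is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_normalize(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Column-wise z-score: subtract the column mean, divide by the column
    population standard deviation. Constant columns map to 0.0.

    Assumption: z-score is one plausible reading of "normalization" here.
    """
    columns = list(zip(*data))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        [(v - mu) / sd if sd > 0 else 0.0 for v, mu, sd in zip(row, mus, sigmas)]
        for row in data
    ]
```

After this step every non-constant attribute has mean 0 and standard deviation 1, which is what gives all attributes equal weight in a distance-based clustering.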
IV. CONCLUSION
In this paper, a scaling based transformation method has been introduced for Privacy
Preserving Clustering on Centralized Data. The proposed method is designed to preserve
privacy only for confidential numeric data. The procedure yields the same cluster
distribution before and after the transformation, and it is independent of the clustering
algorithm. Moreover, an attempt to recover the original data from the
normalized data was unsuccessful, which supports the security of the data after transformation without changes in
cluster distribution.
Nowadays, each site stores locally only the data it requires,
so the complete dataset is stored in a distributed manner. This maintains the availability of
data and reduces the load on the data server.
Hence, as future work, the procedure can be adapted to distributed data
with some changes for preserving privacy. This would lead to a better method for
maintaining the confidentiality of distributed (horizontally or vertically partitioned) data.
V. REFERENCES
[1] S. Jha, L. Kruger, P. McDaniel, "Privacy Preserving Clustering".
[2] Jaideep Vaidya, Chris Clifton, "Privacy Preserving KMeans Clustering over Vertically Partitioned Data", SIGKDD 2003.
[3] Jaideep Vaidya, Chris Clifton, Murat Kantarcioglu, A. Scott Patterson, "Privacy Preserving Decision Trees over Vertically Partitioned Data", ACM Transactions on Knowledge Discovery from Data, Vol. 2, No. 3, Article 14, October 2008.
[4] Ali Inan, Yucel Saygin, "Privacy Preserving Spatio-Temporal Clustering on Horizontally Partitioned Data".
[5] Hwanjo Yu, Jaideep Vaidya, Xiaoqian Jiang, "Privacy Preserving SVM Classification on Vertically Partitioned Data".
[6] Geetha Jagannathan, Krishnan Pillaipakkamnatt, Rebecca N. Wright, Daryl Umano, "Communication Efficient Privacy-Preserving Clustering".
[7] Jaideep Shrikant Vaidya, "Privacy Preserving Data Mining Over Vertically Partitioned Data", thesis.
[8] Benny Pinkas, "Cryptographic techniques for privacy-preserving data mining".
[9] Hillol Kargupta, Souptik Dutta, Qi Wang, Krishnamoorthy Sivakumar, "Random Data Perturbation Techniques and Privacy Preserving Data Mining".
[10] Deepika Khurana and Dr. M.P.S Bhatia, "Dynamic Approach To K-Means Clustering Algorithm", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013, pp. 204-219, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
