A Scalable Two-Phase Top-Down Specialization Approach
for Data Anonymization Using MapReduce on Cloud
ABSTRACT: 
A large number of cloud services require users to share private data, such as
electronic health records, for data analysis or mining, which raises privacy concerns.
Anonymizing data sets via generalization to satisfy certain privacy requirements
such as k-anonymity is a widely used category of privacy-preserving techniques.
At present, the scale of data in many cloud applications is increasing tremendously
in accordance with the Big Data trend, making it a challenge for commonly
used software tools to capture, manage, and process such large-scale data within a
tolerable elapsed time. As a result, existing anonymization approaches struggle to
achieve privacy preservation on privacy-sensitive large-scale data sets because of
their insufficient scalability. In this paper, we propose a scalable
two-phase top-down specialization (TDS) approach to anonymize large-scale data
sets using the MapReduce framework on cloud. In both phases of our approach, we
deliberately design a group of innovative MapReduce jobs to concretely
accomplish the specialization computation in a highly scalable way. Experimental
evaluation results demonstrate that with our approach, the scalability and
efficiency of TDS can be significantly improved over existing approaches.
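To make the k-anonymity requirement mentioned in the abstract concrete, the following is a minimal sketch in Python. It is not taken from the paper: the record layout, the helper names `is_k_anonymous` and `generalize_age`, and the 10-year age ranges are all assumptions chosen for illustration. A data set is treated as k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39' (hypothetical rule)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Toy records; "age" and "zip" play the role of quasi-identifiers.
records = [
    {"age": 34, "zip": "53711", "diagnosis": "flu"},
    {"age": 36, "zip": "53711", "diagnosis": "asthma"},
    {"age": 41, "zip": "53712", "diagnosis": "flu"},
    {"age": 47, "zip": "53712", "diagnosis": "diabetes"},
]

# Generalizing the age column merges records into larger groups.
generalized = [dict(r, age=generalize_age(r["age"])) for r in records]

raw_ok = is_k_anonymous(records, ["age", "zip"], k=2)      # exact ages are unique
gen_ok = is_k_anonymous(generalized, ["age", "zip"], k=2)  # ranges form groups of 2
```

In this toy data set the raw records fail the check for k = 2, while the generalized records pass it, which is the trade of precision for privacy that generalization makes.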
EXISTING SYSTEM: 
• MapReduce, a widely adopted parallel data processing framework, can be used to
address the scalability problem of the top-down specialization (TDS) approach for
large-scale data anonymization. The TDS approach, offering a good tradeoff
between data utility and data consistency, is widely applied for data
anonymization. Most TDS algorithms are centralized, which makes them
inadequate for handling large-scale data sets. Although some distributed
algorithms have been proposed, they mainly focus on secure anonymization
of data sets from multiple parties, rather than on scalability.
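The idea behind top-down specialization can be sketched as follows. This is a simplified, centralized illustration and not the authors' algorithm: the one-attribute taxonomy, the function names, and the greedy rule of accepting the first specialization that keeps k-anonymity are all assumptions (TDS algorithms typically rank candidate specializations by a score instead). The sketch starts from the most general value and repeatedly replaces a value with its children in the taxonomy while every resulting group still holds at least k records.

```python
from collections import Counter

# Hypothetical taxonomy tree for a single "job" attribute.
TAXONOMY = {
    "Any": ["Professional", "Artist"],
    "Professional": ["Engineer", "Lawyer"],
    "Artist": ["Dancer", "Writer"],
}
PARENT = {child: parent for parent, children in TAXONOMY.items() for child in children}


def generalize(value, cut):
    """Walk up the taxonomy until reaching a value in the current cut."""
    while value not in cut:
        value = PARENT[value]
    return value


def satisfies_k(values, cut, k):
    """Check that every generalized group has at least k members."""
    counts = Counter(generalize(v, cut) for v in values)
    return all(c >= k for c in counts.values())


def top_down_specialize(values, k):
    """Greedily specialize from the root while k-anonymity still holds."""
    cut = {"Any"}
    changed = True
    while changed:
        changed = False
        for node in sorted(cut):
            children = TAXONOMY.get(node)
            if not children:
                continue  # leaf value, cannot be specialized further
            candidate = (cut - {node}) | set(children)
            if satisfies_k(values, candidate, k):
                cut = candidate
                changed = True
                break
    return cut


jobs = ["Engineer", "Engineer", "Lawyer", "Lawyer", "Dancer", "Writer"]
cut = top_down_specialize(jobs, k=2)
```

With k = 2 the sketch specializes "Professional" into "Engineer" and "Lawyer" but keeps "Artist" general, because splitting it would leave "Dancer" and "Writer" in groups of one. Every check here scans all values on one machine, which hints at why centralized TDS struggles with large data sets.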
DISADVANTAGES OF EXISTING SYSTEM: 
• Under the MapReduce computation paradigm, it is still a challenge to design
proper MapReduce jobs for TDS.
PROPOSED SYSTEM: 
• In this paper, we propose a scalable two-phase top-down specialization
(TDS) approach to anonymize large-scale data sets using the MapReduce
framework on cloud.
• In both phases of our approach, we deliberately design a group of
innovative MapReduce jobs to concretely accomplish the specialization
computation in a highly scalable way.
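This summary does not describe the individual MapReduce jobs, so the following is only a hedged illustration of the kind of computation that fits the paradigm, not the authors' job design. It simulates map, shuffle, and reduce steps in plain Python (no Hadoop API is used) to count how many records fall into each quasi-identifier group across several data partitions. The function names and the sample partitions are assumptions.

```python
from collections import defaultdict
from itertools import chain


def map_record(record, quasi_identifiers):
    """Map step: emit (quasi-identifier group, 1) for one record."""
    key = tuple(record[q] for q in quasi_identifiers)
    yield key, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_group(key, values):
    """Reduce step: sum the counts for one group."""
    return key, sum(values)


def group_sizes(partitions, quasi_identifiers):
    """Run the simulated job over all partitions and return group sizes."""
    emitted = chain.from_iterable(
        map_record(r, quasi_identifiers) for part in partitions for r in part
    )
    return dict(reduce_group(k, v) for k, v in shuffle(emitted).items())


# Two hypothetical partitions of an already generalized data set.
partitions = [
    [{"age": "30-39", "zip": "53711"}, {"age": "40-49", "zip": "53712"}],
    [{"age": "30-39", "zip": "53711"}, {"age": "40-49", "zip": "53712"},
     {"age": "30-39", "zip": "53711"}],
]
sizes = group_sizes(partitions, ["age", "zip"])
smallest = min(sizes.values())  # the data set is k-anonymous for any k <= smallest
```

Because each record is mapped independently and each group is reduced independently, both steps can run in parallel over partitions, which is the property a scalable TDS implementation would rely on.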
ADVANTAGES OF PROPOSED SYSTEM: 
• Accomplish the specializations in a highly scalable fashion.
• Gain high scalability.
• Significantly improve the scalability and efficiency of TDS for data
anonymization over existing approaches.
SYSTEM ARCHITECTURE: 
SYSTEM REQUIREMENTS: 
HARDWARE REQUIREMENTS: 
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 15-inch VGA Colour.
• Mouse : Logitech.
• RAM : 512 MB.
SOFTWARE REQUIREMENTS: 
• Operating system : Windows XP/7.
• Coding Language : Java/J2EE
• IDE : NetBeans 7.4
• Database : MySQL
REFERENCE: 
Xuyun Zhang, Laurence T. Yang, Chang Liu, and Jinjun Chen, “A Scalable
Two-Phase Top-Down Specialization Approach for Data Anonymization Using
MapReduce on Cloud”, IEEE Transactions on Parallel and Distributed Systems,
Vol. 25, No. 2, February 2014.
