SecureMR - Practical Hadoop Security. Triangle Hadoop Users Group, September 14th, 2010
SecureMR - Overview. Long-term goal: deploy MapReduce over open systems with a security guarantee. Motivation: industry (Google, Yahoo!, Facebook); academia (machine learning, data-intensive computation, image processing). Our focus: provide integrity assurance for MapReduce in open systems. Basic idea: adopt a replication-based scheme; decentralize integrity verification.
Outline: Introduction, System Model, System Design, Analysis and Evaluation, Related Work, Conclusion
MapReduce Overview [Diagram: the master assigns map tasks and reduce tasks. In the map phase, mappers M1..Mn read input blocks B1..Bn from the DFS and write intermediate results locally as partitions P1..Pr. In the reduce phase, reducers R1..Rr remotely read their partitions and write Output 1..Output r to the DFS.]
MapReduce – WordCount Application. Input in the DFS, one split per mapper (M1, M2, M3): "Hello World, Bye World!"; "Hello MapReduce, Goodbye to MapReduce."; "Welcome to ACSAC, Goodbye to ACSAC." Map phase, intermediate results: (Hello, 1) (Bye, 1) (World, 1) (World, 1); (Welcome, 1) (to, 1) (to, 1) (ACSAC, 1) (Goodbye, 1) (ACSAC, 1); (Hello, 1) (to, 1) (MapReduce, 1) (Goodbye, 1) (MapReduce, 1). Reduce phase, output of reducers R1 and R2 written to the DFS: (Hello, 2) (Bye, 1) (Welcome, 1) (to, 3) (World, 2) (ACSAC, 2) (Goodbye, 2) (MapReduce, 2).
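The WordCount flow on this slide can be sketched in a few lines of Python. This is a single-process illustration only, not Hadoop code: the function names `map_phase` and `reduce_phase` are made up for the example, and the word-splitting rule (runs of letters) is an assumption.

```python
import re
from collections import defaultdict

def map_phase(split):
    """Emit one (word, 1) pair per word in an input split."""
    return [(word, 1) for word in re.findall(r"[A-Za-z]+", split)]

def reduce_phase(pairs):
    """Sum the counts of all pairs that share the same word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# The three input splits from the slide, one per mapper.
splits = [
    "Hello World, Bye World!",
    "Hello MapReduce, Goodbye to MapReduce.",
    "Welcome to ACSAC, Goodbye to ACSAC.",
]

# Intermediate result: the concatenated output of all mappers.
intermediate = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(intermediate)
```

In a real deployment the intermediate pairs would be partitioned by key across the reducers; here a single call to `reduce_phase` stands in for R1 and R2 together.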
System Model. Goal: deploy MapReduce over open systems with integrity assurance; an open system differs from a closed system. Attacks against MapReduce in open systems: communication attacks (eavesdropping, DoS and replay attacks); data processing service integrity attacks (inserting fake data, tampering with data and dropping data), which are our focus.
System Model – Integrity Attacks [Diagram: the MapReduce data flow (input blocks B1..Bn in the DFS, mappers M1..Mn, intermediate partitions P1..Pr, reducers R1..Rr, Output 1..Output r, and the master), used to show where integrity attacks can occur.]
System Model. Assumptions: PKI is deployed in advance; the master is trusted; the DFS provides data integrity protection [Atallah, et al., ICDE’08]. Attack models: non-collusive malicious behavior; collusive malicious behavior.
SecureMR Basic Idea: adopt a replication-based scheme (integrity).
A Naive Approach [Diagram: two mappers, Ma and Mb, read the same input blocks (B1..Bn), process them into partitions P1..Pr, and each send their results to the master, which compares them (H P1 ... == ???). The intermediate result is then sent to the reducers Ra and Rb for processing. Open questions: scalability? integrity?]
A Naive Approach [Diagram: Ma and Mb read the input blocks and send their results to the master as hashes of the partitions (H P1, H P2, ...); the master checks that the two sets of hashes are equal.]
A Naive Approach [Diagram: Ma and Mb send matching hashes (H P1, H P2, ...) to the master, so the comparison at the master succeeds, but a mapper then sends a tampered result to the reducer. The outputs (Output 1) of reducers Ra and Rb are compared.]
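The weakness shown on the naive-approach slides can be illustrated with a small Python sketch. It is a toy model under stated assumptions: SHA-256 stands in for whatever hash the system uses, and the byte strings and variable names are invented for the example.

```python
import hashlib

def digest(data):
    """Hash a result so that only the hash needs to reach the master."""
    return hashlib.sha256(data).hexdigest()

honest_result = b"(Hello, 1) (World, 1)"
tampered_result = b"(Hello, 9) (World, 1)"

# Both mappers report the hash of the honest result to the master,
# so the master's equality check passes.
hash_from_ma = digest(honest_result)
hash_from_mb = digest(honest_result)
master_accepts = hash_from_ma == hash_from_mb

# A malicious Mb then ships different data to the reducer. Nothing in the
# naive scheme ties the data sent to the reducer to the hash sent to the master.
data_sent_to_reducer = tampered_result
reducer_input_is_correct = digest(data_sent_to_reducer) == hash_from_mb
```

Here `master_accepts` is true while `reducer_input_is_correct` is false: the check at the master alone does not protect the data the reducer actually consumes.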
SecureMR Basic Idea: adopt a replication-based scheme (integrity); decentralize integrity verification (scalability and integrity). Design goals: security (non-repudiation, resilience to DoS and replay attacks); performance (minimize computation cost and network communication); applicability (preserve the existing protocol as much as possible).
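SecureMR – Architecture Design [Diagram: layered architecture of plain MapReduce. From bottom to top: network infrastructure; open systems (grid computing, volunteer computing and P2P computing); MapReduce, with a Task Executor on the mapper, a Scheduler on the master, and a Task Executor on the reducer; user applications.]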
SecureMR – Architecture Design [Diagram: layered architecture of SecureMR. From bottom to top: network infrastructure; open systems (grid computing, volunteer computing and P2P computing); SecureMR, with a Secure Task Executor and Secure Committer on the mapper, a Secure Manager and Secure Scheduler on the master, and a Secure Task Executor and Secure Verifier on the reducer; user applications.]
SecureMR – Communication Design [Diagram: numbered message flow among the master, mappers M1..Mn, reducers R1..Rr and the DFS holding input blocks B1..Bn: 1.1 Assign, 1.2 Assign, 2. Read, 3. Process, 4. Commit, 5. Compare, 6. Assign, 7. Notify, 8. Request, 9. Response, 10. Verify. The steps are grouped into a commitment part and a verification part.]
SecureMR – Commitment Protocol [Diagram: mappers Ma and Mb read the same input blocks (B1..Bn) and each produce partitions P1..Pr. Each mapper computes the hashes H P1, H P2, ..., H Pr, signs them, and sends the signed hashes to the master, which compares the two commitments for equality.]
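A minimal sketch of the commitment step, assuming SHA-256 for the partition hashes and using an HMAC as a stand-in for the mappers' digital signatures. The HMAC is an assumption made to keep the example within the standard library; unlike a real signature under the PKI the slides assume, it does not provide non-repudiation. All function names and keys are hypothetical.

```python
import hashlib
import hmac

def partition_hashes(partitions):
    """Hash each intermediate partition P1..Pr."""
    return [hashlib.sha256(p).hexdigest() for p in partitions]

def sign(key, hashes):
    """Stand-in for a digital signature over the list of hashes."""
    message = "|".join(hashes).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def make_commitment(mapper_id, key, partitions):
    """What a mapper sends to the master: hashes plus a signature, not the data."""
    hashes = partition_hashes(partitions)
    return {"mapper": mapper_id, "hashes": hashes, "sig": sign(key, hashes)}

def master_check(commit_a, commit_b, keys):
    """Check both signatures, then compare the committed hashes."""
    for commit in (commit_a, commit_b):
        expected = sign(keys[commit["mapper"]], commit["hashes"])
        if not hmac.compare_digest(expected, commit["sig"]):
            return False
    return commit_a["hashes"] == commit_b["hashes"]

# Hypothetical keys and partitions for two mappers running the same task.
keys = {"Ma": b"key-a", "Mb": b"key-b"}
partitions = [b"(Hello, 1) (Bye, 1)", b"(World, 1) (World, 1)"]
tampered = [b"(Hello, 1) (Bye, 1)", b"(World, 5)"]

consistent = master_check(
    make_commitment("Ma", keys["Ma"], partitions),
    make_commitment("Mb", keys["Mb"], partitions),
    keys,
)
inconsistent = master_check(
    make_commitment("Ma", keys["Ma"], partitions),
    make_commitment("Mb", keys["Mb"], tampered),
    keys,
)
```

Because only hashes travel to the master, its comparison cost does not grow with the size of the intermediate data, which is the scalability point the slides raise against the naive approach.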
SecureMR – Verification Protocol [Diagram: after mappers Ma and Mb have sent their signed hashes (H P1, ..., H Pr) to the master, the master notifies each reducer and passes along the signed hash of its partition: {H P1}sig to R1, ..., {H Pr}sig to Rr. Each reducer reads its partition from the mapper, calculates its own hash H' over the data it received, and checks H P1 == H' P1, ..., H Pr == H' Pr.]
SecureMR – Verification Protocol [Diagram: the same protocol shown for a single reducer. R1 is notified with {H P1}sig, reads partition P1, calculates H' P1, and checks H P1 == H' P1.]
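The reducer-side check can be sketched as follows. This assumes SHA-256 as the hash and leaves out signature verification on the committed hash for brevity; `verify_partition` and the sample data are hypothetical names chosen for the illustration.

```python
import hashlib

def verify_partition(committed_hash, partition_bytes):
    """Recompute H' over the data actually received and compare it with
    the hash H that the mapper committed to the master."""
    recomputed = hashlib.sha256(partition_bytes).hexdigest()
    return recomputed == committed_hash

# The mapper committed to this partition during the commitment step.
partition_p1 = b"(Hello, 1) (Bye, 1)"
committed_h_p1 = hashlib.sha256(partition_p1).hexdigest()

# Case 1: the reducer receives exactly what was committed.
honest_ok = verify_partition(committed_h_p1, partition_p1)

# Case 2: the mapper sends tampered data after committing.
tampered_ok = verify_partition(committed_h_p1, b"(Hello, 7) (Bye, 1)")
```

Because every reducer checks only its own partition, the verification work is spread across the reducers rather than concentrated at the master.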
MapReduce in Open Systems – Integrity [Diagram: the MapReduce data flow revisited. The master assigns map tasks and reduce tasks; mappers M1..Mn read input blocks B1..Bn from the DFS and write partitions P1..Pr locally; reducers R1..Rr remotely read the intermediate results and write Output 1..Output r to the DFS.]
SecureMR – Analysis. Security analysis: no false alarms; non-repudiation. Attacker behavior analysis: periodical attackers without collusion (detection rate); periodical attackers with collusion (detection rate); strategic attackers (misbehaving probability). Detection rate: we define the detection rate, denoted D_rate, as the probability that the inconsistency between results caused by the misbehavior is detected during l jobs.
SecureMR – Analysis [Figure: detection rate for a collusive periodical attacker. Parameters: number of workers n = 50, misbehaving probability p_m = 0.5, number of blocks b = 20, number of jobs l = 15. Legend: p_b is the duplication rate; m is the number of malicious workers.]
SecureMR – Evaluation. System implementation: implementation based on Hadoop; two scheduling algorithms for comparison (naive task scheduling algorithm, commitment-based task scheduling algorithm); non-blocking consistency verification. Experiment setup: 14 hosts in the Virtual Computing Lab (VCL); 2.66GHz Intel(R) Core(TM) 2 Duo; Ubuntu Linux 8.04, Sun JDK 6 and Hadoop 0.19; Hadoop WordCount application.
SecureMR – Evaluation [Figure: response time vs. duplication rate. Parameters: number of map tasks = 60, number of reduce tasks = 25, size of input data = 1GB.] Response time: we define the response time as the time to finish the map and reduce tasks in a job.
Related Work. Research related to MapReduce: machine learning [Cheng, et al., NIPS 2006]; data-intensive computing [Ekanayake, et al., eScience 2008]; semantic annotation [Laclavík, et al., ICCS 2008]. Little attention has been paid to integrity protection in MapReduce. Related techniques: sampling for uncheatable grid computing [Du, et al., ICDCS 2004]; quizzes for result verification [Zhao, et al., P2P 2005]; majority voting and spot-checking [Sarmenta, et al., FGCS 2002]. None of them addresses unique challenges such as massive data processing and multi-party distributed computation. Research on system security: securing publish-subscribe services [Srivatsa, et al., CCS 2005]; PeerReview in distributed systems [Haeberlen, et al., SOSP 2007]. SecureMR focuses on a different domain.
Conclusion. To the best of our knowledge, our work makes the first attempt to address this problem. Contributions: a decentralized replication-based integrity verification scheme; a prototype of SecureMR; an analytical study and experimental evaluation of the performance overhead. Future work: explore other techniques to address collusion attacks; provide data quality assurance for the final result.
Thank you. Questions?

Tri hug 2010 wei

Editor's Notes

  • #6: Use Hello as an example.