Hadoop:
  Code Injection
  Distributed Fault Injection

    Konstantin Boudnik
       Hadoop committer, Pig contributor
       cos@apache.org
A few assumptions

   The following work was done using an AOP injection technology called
    AspectJ
   Similar results could be achieved with
       direct code implementation
       MOP (monkey-patching)
       direct byte-code manipulation
   The approaches offered aren't limited to the Hadoop platform ;)

   AspectJ and AOP/MOP technology themselves are outside the scope of this talk




Code Injection




What for?

    Some APIs are as dangerous as they are useful if made public
       stop/blacklist a node or daemon
    
        change a node configuration

    certain functionality is experimental and need not be in production


    a component's source code is unavailable


    a build's re-spin isn't practical


    many changes of the same nature need to be applied


    your application doesn't have enough bugs yet


Use cases
   producing a build for developers' testing

   simulating faults and testing error recovery before deployment

   sneaking something into production that your boss doesn't
    need to know about




Injecting away
// This pointcut selects the execution of FSDataset.getBlockFile(), excluding
// code inside the aspect itself, so faults are injected inside the method
// in question.
pointcut execGetBlockFile() :
  execution (* FSDataset.getBlockFile(..)) && !within(FSDatasetAspects +);

// This advice specifies the logic of our fault point.
// In this case it simply throws DiskErrorException before the body of
// the method selected by the execGetBlockFile() pointcut runs.
before() throws DiskErrorException : execGetBlockFile() {
  if (ProbabilityModel.injectCriteria(FSDataset.class.getSimpleName())) {
    LOG.info("Before the injection point");
    Thread.dumpStack();
    throw new DiskErrorException("FI: injected fault point at "
        + thisJoinPoint.getStaticPart().getSourceLocation());
  }
}
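
The advice above fires only when ProbabilityModel.injectCriteria() returns true. The slide does not show how that decision is made, so the following is only a minimal sketch of one way such a gate could work, assuming a per-fault-point probability between 0.0 and 1.0. The class name FaultProbabilitySketch and its setProbability method are hypothetical and are not part of Hadoop.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a probability gate; not Hadoop's ProbabilityModel. */
final class FaultProbabilitySketch {
    // Probability of injecting a fault, keyed by fault point name.
    private final Map<String, Double> levels = new ConcurrentHashMap<>();
    private final Random random;

    /** A fixed seed makes the sequence of decisions repeatable between runs. */
    FaultProbabilitySketch(long seed) {
        this.random = new Random(seed);
    }

    /** Assumed configuration hook: 0.0 means never inject, 1.0 means always. */
    void setProbability(String faultPoint, double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        levels.put(faultPoint, probability);
    }

    /** Returns true when a fault should be injected at the named fault point. */
    boolean injectCriteria(String faultPoint) {
        // Unconfigured fault points default to 0.0, so they never fire.
        double probability = levels.getOrDefault(faultPoint, 0.0);
        // nextDouble() is uniform in [0.0, 1.0).
        return random.nextDouble() < probability;
    }

    public static void main(String[] args) {
        FaultProbabilitySketch model = new FaultProbabilitySketch(1L);
        model.setProbability("FSDataset", 0.25);
        int injected = 0;
        for (int i = 0; i < 1000; i++) {
            if (model.injectCriteria("FSDataset")) {
                injected++;
            }
        }
        System.out.println("faults injected in 1000 calls: " + injected);
    }
}
```

Defaulting unknown fault points to zero keeps an instrumented build quiet until a fault is explicitly dialed in.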




Injecting away (intercept & mock)
 pointcut callCreateUri() : call (URI FileDataServlet.createUri(
     String, HdfsFileStatus, UserGroupInformation, ClientProtocol,
     HttpServletRequest, String));

 /** Replace host name with "localhost" for unit test environment. */
 URI around () throws URISyntaxException : callCreateUri() {
   final URI original = proceed();
   LOG.info("FI: original uri = " + original);
   final URI replaced = new URI(original.getScheme(),
       original.getUserInfo(),
       "localhost", original.getPort(), original.getPath(),
       original.getQuery(),
       original.getFragment()) ;
   LOG.info("FI: replaced uri = " + replaced);
   return replaced;
 }
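
The transformation performed by the around advice can be tried outside AspectJ. The sketch below repeats only the URI rewrite in plain Java; the class name LocalhostUriSketch and the sample address are made up for illustration, and the checked URISyntaxException is wrapped so the helper is easy to call from a test.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative helper: the same host swap as the advice, without AspectJ. */
final class LocalhostUriSketch {

    /** Returns a copy of the given URI with only its host replaced by "localhost". */
    static URI withLocalhost(URI original) {
        try {
            return new URI(original.getScheme(),
                original.getUserInfo(),
                "localhost", original.getPort(), original.getPath(),
                original.getQuery(),
                original.getFragment());
        } catch (URISyntaxException e) {
            // Wrapped so callers do not have to declare a checked exception.
            throw new IllegalArgumentException("cannot rewrite " + original, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical datanode address, used only as sample input.
        URI original = URI.create("http://dn1.example.com:50075/data/file?ugi=alice");
        System.out.println(withLocalhost(original));
        // prints "http://localhost:50075/data/file?ugi=alice"
    }
}
```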




Distributed Fault Injection




Why Fault Injection

   Hadoop deals with many kinds of faults
           Block corruption

           Failures of disk, Datanode, Namenode, Clients, Jobtracker, Tasktrackers and Tasks

           Varying rates of bandwidth and latency

   These are hard to test
           Unit tests mostly deal with specific single faults or patterns

           Faults do not occur frequently and are hard to reproduce

   Need to inject faults into the real system (as opposed to a simulated system)
   More info
       http://wiki.apache.org/hadoop/HowToUseInjectionFramework




Usage models


    An actor configures a Hadoop cluster, “dials in” the desired faults, then runs a
    set of applications on the cluster.
    
        Test the behavior of a particular feature under faults
    
        Test time and consistency of recovery at a high rate of faults
    
        Observe loss of data under a certain pattern and frequency of faults
    
        Observe performance/utilization
        
            Note: can inject faults into jobs running on the real system (as opposed to a
            simulated system)

    An actor writes/reuses a unit/functional test using the fault injection framework to
    introduce faults during the test

    Recovery procedures testing (!)


Fault examples (HDFS)

   Link/communication failure and communication corruption
       Namenode to Datanode communication
       Client to Datanode communications
       Client to Namenode communications
   Namenode related failures
       General slowdowns
       Edit log slowdowns
       NFS-mounted volume is slow or not responding
   Datanode related failures
       Hardware corruption and data failures
   Storage latencies and bandwidth anomalies


Fault examples (MapReduce)


    Task tracker
    
        Lost task trackers

    Tasks
    
        Timeouts
    
        Slowdowns
    
        Shuffle failures
    
        Sort/merge failures

    Local storage issues

    JobTracker failures

    Link communication failures and corruptions

Scale


    Multi-hundred-node clusters

    Heterogeneous environment
    
        OS, switches, secure/non-secure configurations


    Multi-node fault scenarios (e.g. pipeline recovery)


    Requires fault manager/dispensary

    
        Support for multi-node, multi-condition faults

    
        Fault identification, reproducibility, repeatability

    
        Infrastructure auto-discovery to avoid configuration complexities


Coming soon...




Client side
pointcut execGetBlockFile() :
// the following will inject faults inside of the method in question
  execution (* FSDataset.getBlockFile(..)) && !within(FSDatasetAspects +);

before() throws DiskErrorException : execGetBlockFile() {
  // Ask the dispenser for the faults assigned to this fault point,
  // then execute each of them in order.
  ArrayList<GenericFault> pipelineFault =
      FiDispenser.getFaultsFor(FSDataset.class,
          FaultID.PipelineRecovery(),
          RANDOM);

  for (GenericFault fault : pipelineFault) {
    fault.execute();
  }
}

Fault dispenser
MachineGroup Rack1DataNodes = new MachineGroup(rack1, TYPE.DN)

Rack1DataNodes.each {
  if (it.type == RANDOM) {
    it.setTimeout(random.nextInt(2000))
    it.setType(DiskErrorException.class)
    it.setReport('logcollector.domain.com', SYSLOG)
  }
}
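
The Scale slide asks the fault dispenser for reproducibility and repeatability, while the snippet above draws timeouts at random. One way to reconcile the two is to derive the whole fault plan from a single seed that can be logged and replayed. The sketch below illustrates that idea only: FaultDispenserSketch, PlannedFault, and the host names are hypothetical and are not the FiDispenser or MachineGroup APIs from the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a seeded fault plan; not the FiDispenser API. */
final class FaultDispenserSketch {

    /** One planned fault: the target host and the delay before the fault fires. */
    static final class PlannedFault {
        final String host;
        final int timeoutMillis;

        PlannedFault(String host, int timeoutMillis) {
            this.host = host;
            this.timeoutMillis = timeoutMillis;
        }

        @Override
        public String toString() {
            return host + " after " + timeoutMillis + " ms";
        }
    }

    /**
     * Builds one fault per host. The same seed always yields the same plan,
     * so a failing run can be replayed by reusing the logged seed.
     */
    static List<PlannedFault> plan(List<String> hosts, long seed, int maxTimeoutMillis) {
        Random random = new Random(seed);
        List<PlannedFault> plan = new ArrayList<>();
        for (String host : hosts) {
            // nextInt(bound) is uniform in [0, bound).
            plan.add(new PlannedFault(host, random.nextInt(maxTimeoutMillis)));
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical datanode names standing in for a rack.
        List<String> rack1 = Arrays.asList("dn1", "dn2", "dn3");
        System.out.println(plan(rack1, 42L, 2000));
    }
}
```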
Q&A




Attic slides




White-box system testing: Herriot




Goals


    Write cluster-based tests using a Java object model


    Automate many types of tests on real clusters:

    
        Functional

    
        System

    
        Load

    
        Recovery


    More information

    
        http://wiki.apache.org/hadoop/HowToUseSystemTestFramework



Main Features

 
     Remote daemon Observability and Controllability APIs

 
     Enables large cluster-based tests written in Java using the JUnit
     (TestNG) framework

 
     Herriot comprises a library of utility APIs and code
     injections into Hadoop binaries

 
     Assumes a deployed and instrumented cluster

 
     Production build contains NO Herriot instrumentation

 
     Supports fault injection



Major design considerations


 
     Common

 
     RPC-based utilities to control remote daemons

 
     Daemons belong to different roles

 
     Remote process management from Java: start/stop, change/push
     configuration, etc.

 
     HDFS and MR specific APIs on top of Common




Common Features

 
     Get the current configuration of a daemon (a remote Hadoop process)

 
     Get a daemon's process info: thread count, heap, environment…

 
     Ping a daemon (make sure it’s up and ready)

 
     Get/list FileStatus of a path from a remote daemon

 
     Daemon log inspection: grep remote logs, count exceptions…

 
     Cluster setup/tear down; restart

 
     Change the configuration of one or more daemons, push new configs…
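
To make the log inspection feature concrete, here is a minimal sketch of counting exceptions in log lines that have already been fetched. It runs locally on a list of strings and does not show how Herriot retrieves remote logs. The class name LogInspectionSketch, the matching rule (any capitalized identifier ending in "Exception"), and the sample log lines are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative log check; not the Herriot log inspection API. */
final class LogInspectionSketch {
    // Assumed rule: a capitalized identifier ending in "Exception".
    private static final Pattern EXCEPTION_NAME =
        Pattern.compile("\\b[A-Z]\\w*Exception\\b");

    /** Counts the lines that mention at least one exception class name. */
    static int countExceptionLines(List<String> logLines) {
        int count = 0;
        for (String line : logLines) {
            if (EXCEPTION_NAME.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up log lines, used only as sample input.
        List<String> log = Arrays.asList(
            "INFO Block received",
            "WARN DiskErrorException: FI: injected fault point",
            "ERROR java.io.IOException: broken pipe");
        System.out.println(countExceptionLines(log)); // prints 2
    }
}
```

A test that injects faults could compare this count before and after the run to confirm that the expected exceptions, and only those, reached the daemon logs.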



Deployment Diagram



 [Diagram: a Test host VM holds the test and the Herriot library; the NN host VM,
 JT host VM, and DN/TT host VM each hold injected Herriot code.]




