
How Does MapReduce Complete a Task?

Last Updated: 11 Aug, 2025

Running a MapReduce job isn't just about splitting data and computing results; it also involves monitoring progress, handling failures and finally committing the output. Let's break down what happens when a job completes successfully (and what Hadoop does when things go wrong).

Final Stage: Marking Job as Successful

Once all the map and reduce tasks are done, the ApplicationMaster updates the job status to SUCCESSFUL.

1. Status Update

  • The ApplicationMaster marks the job as successful after the last task finishes.
  • The client (your program) gets this update through the waitForCompletion(true) method.

waitForCompletion(true) is a blocking call: it waits until the entire job finishes and then returns the job status to the client.
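As a minimal sketch of the client side, here is a typical driver that submits a job and blocks on waitForCompletion(true). The class name WordCountDriver and the paths are placeholders, and the mapper/reducer setters are omitted for brevity:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCountDriver.class);
            // job.setMapperClass(...), job.setReducerClass(...) and the output
            // key/value classes would be set here.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            // Blocks until the ApplicationMaster reports a terminal state;
            // 'true' also prints progress and counters to the console.
            boolean success = job.waitForCompletion(true);
            System.exit(success ? 0 : 1);
        }
    }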

2. Console Output

Once the job completes successfully:

  • A success message is printed on the console confirming the job's completion.
  • Job counters and statistics are displayed, such as the number of input/output records processed, bytes read/written and the time taken by each task phase.
  • This helps users understand job performance and verify that the expected data processing occurred.
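The same counters that appear on the console can also be read programmatically once waitForCompletion(true) has returned. A small sketch, assuming job is the Job instance from the driver above (the helper name printRecordCounters is just for illustration):

    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.TaskCounter;

    // Prints two of the built-in task counters after the job has finished.
    static void printRecordCounters(Job job) throws Exception {
        Counters counters = job.getCounters();
        System.out.println("Map input records: "
                + counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue());
        System.out.println("Reduce output records: "
                + counters.findCounter(TaskCounter.REDUCE_OUTPUT_RECORDS).getValue());
    }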

3. Optional HTTP Notification

You can enable automatic job completion notifications by configuring:

mapreduce.job.end-notification.url

This URL gets a callback when your job ends.
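For example, the property can be set on the job's Configuration in the driver before submission. The endpoint below is hypothetical; Hadoop substitutes the $jobId and $jobStatus placeholders when it makes the callback:

    // Hypothetical endpoint that receives an HTTP callback when the job ends.
    conf.set("mapreduce.job.end-notification.url",
            "http://example.com/notify?jobId=$jobId&status=$jobStatus");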

4. Cleanup Phase

After the job is marked complete, all containers (map, reduce and ApplicationMaster) clean up their temporary files and resources. The OutputCommitter runs commitJob():

  • Moves temporary task outputs into the final output directory.
  • The final output is saved to the directory specified in your code using FileOutputFormat.setOutputPath().
  • Deletes intermediate or scratch data used during execution.
  • Ensures the output is consistent and ready for downstream use.

Note: Hadoop uses FileOutputCommitter by default, which ensures output is only published if all tasks succeed, preventing partial writes.
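As a quick way to inspect the committed result, the sketch below (the class name ListJobOutput and the path argument are illustrative) lists the final output directory after a successful run. By this point FileOutputCommitter has moved the per-task files (part-r-00000, part-r-00001, ...) out of its temporary working area and, by default, written a _SUCCESS marker file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListJobOutput {
        public static void main(String[] args) throws Exception {
            // args[0] is the job's output directory, i.e. the path passed to
            // FileOutputFormat.setOutputPath() in the driver.
            FileSystem fs = FileSystem.get(new Configuration());
            for (FileStatus status : fs.listStatus(new Path(args[0]))) {
                System.out.println(status.getPath() + "  (" + status.getLen() + " bytes)");
            }
        }
    }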

5. Archival

After successful completion, the Job History Server stores metadata about the completed job. This includes logs, counters and configuration, which are useful for:

  • Debugging failed tasks
  • Auditing past job runs
  • Performance analysis and tuning

Handling Failures in MapReduce

Hadoop’s strength lies in fault tolerance. Jobs can still succeed even if some components fail. Let’s explore how Hadoop handles failures.

Components That Might Fail

  1. ApplicationMaster
  2. NodeManager
  3. ResourceManager
  4. Map or Reduce Tasks


Let’s explore how Hadoop detects, manages and recovers from task or node failures to ensure job completion.

Task Failure

If a Mapper or Reducer crashes due to user-code errors (e.g., bugs, bad input), the JVM running that task exits.

  • The ApplicationMaster marks the attempt as failed and releases the container.
  • Logs are generated automatically to help with debugging.

In Hadoop Streaming, any non-zero exit code is treated as a failure. This behavior is controlled by:

stream.non.zero.exit.is.failure   (default: true)

JVM Crashes or Sudden Exit

If the JVM running a task crashes or exits unexpectedly:

  • The NodeManager detects the abnormal termination.
  • It reports the failure to the ApplicationMaster, which then marks the task attempt as failed.

This helps Hadoop recover quickly by retrying the task on another node.

Hanging Tasks

If a task appears to hang (i.e., stops reporting progress):

  • After a default timeout of 10 minutes, Hadoop assumes it is stuck and kills the JVM running it.
  • This prevents jobs from stalling indefinitely.

Configurable via:

mapreduce.task.timeout

Set it to 0 to disable the timeout (not recommended).
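Note that mapreduce.task.timeout is specified in milliseconds. If a task legitimately needs long stretches of computation between records, it is usually better to report liveness from the task than to raise or disable the timeout. A sketch of that pattern is below; SlowWorkMapper and doExpensiveStep are hypothetical names:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // A mapper that does long per-record work. Calling context.progress() (or
    // writing output, or incrementing a counter) tells the framework the task
    // is alive, so it is not killed as a hanging task.
    public class SlowWorkMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (int step = 0; step < 1000; step++) {
                doExpensiveStep(value);   // placeholder for slow computation
                context.progress();       // reset the liveness timeout
            }
            context.write(value, new LongWritable(1L));
        }

        private void doExpensiveStep(Text value) {
            // ... long-running work on the record would go here ...
        }
    }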

Retrying Failed Tasks

When a Map or Reduce task fails due to temporary issues (like a bad node or a network glitch), the ApplicationMaster reschedules the task, typically on a different node, to avoid repeating the problem.

By default, each task is allowed up to 4 attempts. These limits can be configured using:

mapreduce.map.maxattempts
mapreduce.reduce.maxattempts

If all attempts fail, the task is marked as permanently failed and the entire job may be aborted.
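For example, assuming conf is the Configuration used to build the Job in the driver, the attempt limits could be raised before submission (the value 6 is purely illustrative):

    // Allow up to 6 attempts per map task and per reduce task (the default is 4).
    conf.setInt("mapreduce.map.maxattempts", 6);
    conf.setInt("mapreduce.reduce.maxattempts", 6);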

A task attempt that does not complete normally ends in one of two states:

  • Failed: the attempt hit an error during execution.
  • Killed: the attempt was intentionally stopped (e.g., a timeout or a duplicate speculative attempt).

Job Failure

A MapReduce job may fail if:

  • A task fails all retry attempts despite rescheduling by the ApplicationMaster.
  • Core components like ApplicationMaster, NodeManager or ResourceManager crash or become unresponsive.

You can allow some task failures without failing the entire job by configuring:

mapreduce.map.failures.maxpercent
mapreduce.reduce.failures.maxpercent

But if the failure thresholds are exceeded or system components remain down, the job is marked as FAILED and will not complete.
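As a sketch, again assuming conf is the driver's Configuration and that losing a small fraction of tasks is acceptable for the job at hand, the thresholds could be set like this (5% is an arbitrary example):

    // Tolerate up to 5% of map tasks and 5% of reduce tasks failing permanently
    // without failing the whole job.
    conf.setInt("mapreduce.map.failures.maxpercent", 5);
    conf.setInt("mapreduce.reduce.failures.maxpercent", 5);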

MapReduce is built to handle real-world issues such as code bugs, machine crashes and JVM errors, and still complete the job if possible. Always monitor logs and counters after job completion; they reveal critical information about task health and performance.

