Hadoop Common / HADOOP-5285

JobTracker hangs for long periods of time


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.19.1
    • Fix Version/s: 0.19.2, 0.20.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      On one of the larger clusters (2000 nodes), the JobTracker (JT) hung frequently, sometimes for 10-15 minutes at a time and once for an hour and a half. The stack trace shows that JobInProgress.obtainTaskCleanupTask() is waiting for the lock on a JobInProgress object, which JobInProgress.initTasks() holds for a long time while it waits for DFS operations.
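
      The contention described above can be illustrated with a minimal Java sketch. This is not the code from the attached patches, and it does not use any Hadoop classes: JobSketch, readSplitsFromDfs, and the sleep-based delay are hypothetical stand-ins for JobInProgress, the DFS read, and its latency. The sketch only contrasts holding an object lock across slow I/O with doing the I/O first and taking the lock just long enough to publish the result, which is one common way to relieve this kind of stall.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for JobInProgress; not the real Hadoop class. */
class JobSketch {
    private final Object lock = new Object();
    private List<String> splits = null;              // guarded by lock
    final CountDownLatch started = new CountDownLatch(1);

    /** Assumption: a slow DFS read is simulated with a sleep. */
    private static List<String> readSplitsFromDfs(long delayMillis) throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(delayMillis);
        List<String> result = new ArrayList<>();
        result.add("split-0");
        result.add("split-1");
        return result;
    }

    /** Pattern from the report: the lock is held for the whole slow read. */
    void initTasksHoldingLock(long delayMillis) throws InterruptedException {
        synchronized (lock) {
            started.countDown();                     // signal: lock acquired, read begins
            splits = readSplitsFromDfs(delayMillis);
        }
    }

    /** Alternative: read without the lock, then publish under a short lock. */
    void initTasksNarrowLock(long delayMillis) throws InterruptedException {
        started.countDown();                         // signal: read begins, lock not held
        List<String> loaded = readSplitsFromDfs(delayMillis);
        synchronized (lock) {
            splits = loaded;
        }
    }

    /** Stand-in for obtainTaskCleanupTask(): needs the lock only briefly. */
    int obtainCleanupTask() {
        synchronized (lock) {
            return splits == null ? 0 : splits.size();
        }
    }

    /** Returns how long obtainCleanupTask() waited while initialization ran. */
    static long measureWaitMillis(boolean narrowLock, long delayMillis) {
        JobSketch job = new JobSketch();
        Thread init = new Thread(() -> {
            try {
                if (narrowLock) {
                    job.initTasksNarrowLock(delayMillis);
                } else {
                    job.initTasksHoldingLock(delayMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        init.start();
        try {
            job.started.await();                     // wait until the slow read is under way
            long begin = System.nanoTime();
            job.obtainCleanupTask();
            long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
            init.join();
            return waited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lock held during read, waited ms: " + measureWaitMillis(false, 300));
        System.out.println("narrow lock, waited ms: " + measureWaitMillis(true, 300));
    }
}
```

      With the lock held across the simulated read, the caller of obtainCleanupTask() is blocked for roughly the full read delay; with the narrow lock it returns almost immediately. On a real JobTracker the blocked caller would be the heartbeat-handling path, so the delay would scale with DFS latency rather than with a fixed sleep.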

      Attachments

        1. 5285.1.patch
          13 kB
          Vinod Kumar Vavilapalli
        2. 5285.patch
          13 kB
          Devaraj Das
        3. trace.txt
          255 kB
          Vinod Kumar Vavilapalli


          People

            Assignee: Devaraj Das (ddas)
            Reporter: Vinod Kumar Vavilapalli (vinodkv)
            Votes: 0
            Watchers: 1
