Hadoop Inside


         TC Data Platform Division, GFIS Team
                    Eun-Jo Lee
What is Hadoop
 Hadoop is a Framework & System for
    parallel processing of
    large amounts of data in
    a distributed computing environment
           http://searchbusinessintelligence.techtarget.in/tutorial/Apache-Hadoop-FAQ-for-BI-professionals




 Apache project
    open source
    Java based
    Google system clone
        GFS -> HDFS
        MapReduce -> MapReduce
Distributed Processing System
 How to process data in distributed environment
    how to read/write data
    how to control nodes
    load balancing
 Monitoring
    node status
    task status
 Fault tolerance
    error detection
        process error, network error, hardware error, …
    error handling
        temporary error: retry -> duplication, data corruption, …
        permanent error: fail over (which one?)
        process hang: timeout & retry
            • too long -> long response time
            • too short -> infinite loop
Hadoop System Architecture

HDFS + MapReduce

[Diagram: the master side runs the JobTracker and the NameNode (with a Secondary NameNode alongside); each worker node runs a TaskTracker and a DataNode; TaskTrackers and DataNodes report to the JobTracker and NameNode via heartbeats, and data reads/writes go through the DataNodes.
Legend: node, process, heart beat, data read/write.]
HDFS
 vs. Filesystem
    inode – namespace
    cylinder / track – data node
    blocks (bytes) – blocks (megabytes)
 Features
    very large files
    write once, read many times
    support for usual file system operations
         ls, cp, mv, rm, chmod, chown, put, cat, …
    no support for multiple writers or arbitrary modifications
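
The file-system operations listed above map directly onto the Java org.apache.hadoop.fs.FileSystem API. A minimal sketch, with purely illustrative paths (not from the slides):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class HdfsOps {
    public static void main(String[] args) throws Exception {
        // picks up core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(new Configuration());

        // ls
        for (FileStatus s : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(s.getPath() + "\t" + s.getLen());
        }
        // put: copy a local file into HDFS
        fs.copyFromLocalFile(new Path("/tmp/local.txt"), new Path("/user/demo/local.txt"));
        // mv
        fs.rename(new Path("/user/demo/local.txt"), new Path("/user/demo/renamed.txt"));
        // chmod 644
        fs.setPermission(new Path("/user/demo/renamed.txt"), new FsPermission((short) 0644));
        // rm (non-recursive)
        fs.delete(new Path("/user/demo/old.txt"), false);

        fs.close();
    }
}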
Block Replication & Rack Awareness


[Diagram: a file is split into blocks 1–4; each block is replicated across DataNodes, and the replicas of a block are spread over servers in more than one rack, so the loss of a single server or rack does not lose the block.
Legend: file, server, block, rack. A small sketch of inspecting block locations and changing a file's replication factor follows.]
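
As a small client-side illustration of replication and rack awareness, the sketch below (with a hypothetical file path) lists where each block's replicas live and raises the replication factor of one file; the actual replica placement remains the NameNode's job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationInfo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/user/demo/big.log");   // hypothetical file

        // where each block's replicas live (hosts and rack-aware topology paths)
        FileStatus st = fs.getFileStatus(p);
        for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.println("offset=" + b.getOffset()
                    + " hosts=" + String.join(",", b.getHosts())
                    + " topology=" + String.join(",", b.getTopologyPaths()));
        }

        // ask the NameNode to keep 5 replicas of this file's blocks
        fs.setReplication(p, (short) 5);
    }
}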
HDFS - Read

Data Read

[Diagram: 1. the client sends a read request to the NameNode; 2. the NameNode responds with the block locations; 3. the client requests the data directly from the DataNodes holding the blocks; 4. the DataNodes return the data.
Legend: node, data block, data I/O, operation message. A minimal client-side read sketch follows.]
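
A minimal sketch of the read path above: FileSystem.open() obtains the block locations from the NameNode, and reading the returned stream pulls the data from the DataNodes (the path is illustrative).

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // open() asks the NameNode for block locations; the stream reads from DataNodes
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/user/demo/input.txt"))))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // data arrives block by block from the DataNodes
            }
        }
    }
}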
HDFS - Write

Data Write

[Diagram: 1. the client sends a write request to the NameNode; 2. the NameNode responds with the target DataNodes; 3. the client writes the data to the first DataNode; 4. each DataNode writes the replica on to the next DataNode in the pipeline; 5. the client receives the "write done" acknowledgement.
Legend: node, data block, data I/O, operation message. A minimal client-side write sketch follows.]
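
A minimal sketch of the write path above: create() contacts the NameNode, and the returned stream pushes the data into the DataNode replication pipeline (the path is illustrative).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWrite {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // create() contacts the NameNode; writes flow into the DataNode pipeline
        try (FSDataOutputStream out = fs.create(new Path("/user/demo/output.txt"))) {
            out.writeBytes("written once, replicated by the DataNode pipeline\n");
        }   // close() returns after the pipeline acknowledges the write
    }
}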
HDFS – Write (Failure)

Data Write

[Diagram: the same write flow, except that one DataNode in the replication pipeline fails; the replica write (step 4) skips the failed node and goes to the remaining healthy DataNode, and the client still receives the "write done" acknowledgement (step 5).
Legend: node, data block, data I/O, operation message.]
HDFS – Write (Failure)

Data Write

[Diagram: after the failure, the NameNode re-arranges the replicas: the partial block left on the failed DataNode is deleted, and a new replica is written to another DataNode so that the replication factor is restored.
Legend: node, data block, data I/O, operation message.]
MapReduce
 Definition
    map: (+1) [ 1, 2, 3, 4, …, 10 ] -> [ 2, 3, 4, 5, …, 11 ]
    reduce: (+) [ 2, 3, 4, 5, …, 11 ] -> 65
 Programming Model for processing data sets in Hadoop
    projection, filter -> map task
    aggregation, join -> reduce task
    sort -> partitioning
 Job Tracker & Task Trackers
    master / slave
    job = many tasks
        # of map tasks = # of file splits (default: # of blocks)
        # of reduce tasks = user configuration
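
A minimal sketch of the (+1) / (+) example above written as Hadoop map and reduce classes; the input format (one number per line) and the single grouping key are illustrative assumptions, not from the slides.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class PlusOneSum {
    // map: (+1) -- each input line is assumed to hold one number
    public static class PlusOneMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final Text ALL = new Text("all");   // single key -> one reduce group
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            long n = Long.parseLong(line.toString().trim());
            ctx.write(ALL, new LongWritable(n + 1));
        }
    }

    // reduce: (+) -- sums all values of a key, e.g. [2..11] -> 65
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) sum += v.get();
            ctx.write(key, new LongWritable(sum));
        }
    }
}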
MapReduce
Map / Reduce Task

[Animated diagram, built up over several slides: input data stored in the distributed file system is divided into splits; each split feeds one map task, which turns input data records into map output records (key/value pairs) grouped into partitions; the partitions are shuffled and sorted; each reduce task consumes one partition and writes its reduce output records (key/value pairs) back to the distributed file system.
Legend: distributed file system, split, input data record, map task, reduce task, shuffling & sorting, map output record (key/value pair), reduce output record (key/value pair), partition.]
Mapper - partitioning
 double-indexed output buffer (default: 100 MB)
    the buffer stores the raw key/value records of the map output
    1st index: one entry per record, holding its partition and the key/value offsets
    2nd index: key offsets, used for sorting
 Spill Thread
    data sorting: 2nd index (quick sort)
    spill file generation
        spill data file & index file
    flush
        merge sort (by key) per partition
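
How each map output record gets the partition number recorded in the 1st index: the sketch below mirrors Hadoop's default HashPartitioner, where every partition later becomes the input of exactly one reduce task.

import org.apache.hadoop.mapreduce.Partitioner;

// mirrors the default HashPartitioner: the partition recorded in the 1st index
// decides which reduce task will receive the record
public class ModPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // mask the sign bit so the result is a valid index in [0, numReduceTasks)
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}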
Reducer – fetching
 GetMapEventsThread
    listens for map-completion events
 MapOutputCopier
    fetches data from completed mappers over HTTP
    several copier threads run concurrently
 Merger
    key sorting (heap sort)

[Diagram: the JobTracker forwards map-completion events to the TaskTracker running the reduce task; its copier threads fetch the finished map outputs from the map-side TaskTrackers over HTTP GET and pass them to the Reducer. A conceptual fetch-and-merge sketch follows.]
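
The sketch below is a conceptual illustration only, not Hadoop's real MapOutputCopier or Merger classes: a small thread pool fetches already-sorted map outputs over HTTP in parallel and a heap merges them by key, which is the shape of the reduce-side fetch-and-merge described above (the URLs and ports are hypothetical).

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FetchAndMerge {
    public static void main(String[] args) throws Exception {
        // hypothetical map-output locations served by the map-side TaskTrackers
        List<String> mapOutputUrls = List.of(
                "http://tt1:50060/mapOutput?map=0",
                "http://tt2:50060/mapOutput?map=1");

        // "MapOutputCopier": a few threads fetch completed map outputs in parallel
        ExecutorService copiers = Executors.newFixedThreadPool(2);
        List<Future<List<String>>> fetched = new ArrayList<>();
        for (String u : mapOutputUrls) {
            fetched.add(copiers.submit(() -> {
                List<String> lines = new ArrayList<>();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(new URL(u).openStream()))) {
                    for (String l; (l = in.readLine()) != null; ) lines.add(l);  // "key\tvalue" lines
                }
                return lines;
            }));
        }

        // "Merger": a heap merges the fetched segments into one key-sorted stream
        PriorityQueue<String> heap = new PriorityQueue<>();
        for (Future<List<String>> f : fetched) heap.addAll(f.get());
        while (!heap.isEmpty()) System.out.println(heap.poll());
        copiers.shutdown();
    }
}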
Job Flow
[Diagram: 1. the MapReduce program calls runJob() on the JobClient (client node); 2. the JobClient copies the job resources to the shared file system; 3. it submits the job to the JobTracker; 4. the JobTracker retrieves the input splits; 5. the job is added to the job queue; 6. TaskTrackers report in via heartbeat; 7. the JobTracker assigns a task; 8. the TaskTracker retrieves the job resources; 9. it launches a child JVM; 10. the child runs the map or reduce task; 11. the task reads its input data and writes its result.
Legend: node, JVM, class, job queue, method call, I/O, job, task. A minimal driver sketch follows.]
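
From the user's side, step 1 of the flow is just a driver program. A minimal sketch (class and path names are illustrative; it reuses the PlusOneSum mapper/reducer sketched earlier): waitForCompletion() handles copying the job resources, submitting the job, and polling its progress.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PlusOneSumDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "plus-one-sum");
        job.setJarByClass(PlusOneSumDriver.class);
        job.setMapperClass(PlusOneSum.PlusOneMapper.class);   // from the earlier sketch
        job.setReducerClass(PlusOneSum.SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/demo/numbers"));
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/sum-out"));
        // copies job resources, submits the job, then polls status until it finishes
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}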
Monitoring
 Heart beat
    task tracker status checking
    task request / alignment
    other commands (restart, shutdown, kill task, …)
 Cluster Status
 Job / Task Status
    JobInProgress
    TaskInProgress
 Reporter & Metrics
 Black list
Monitoring (Summary)
 Heart beat
    task tracker status checking
    task request / alignment
    other commands (restart, shutdown, kill task, …)
 Cluster Status
 Job / Task Status
    JobInProgress
    TaskInProgress
 Reporter & Metrics
 Black list
Monitoring (Cluster Info)
Monitoring (Job Info)
Monitoring (Task Info)
Task Scheduler
 job queue
    red-black tree (java.util.TreeMap)
    sorted by priority & job id (request time); see the sketch below
 load factor
    remaining tasks / capacity
 task alignment
    high priority first
    new task > speculative execution task > dummy splits task
    map task (local) > map task (non-local) > reduce task
 padding
    padding = MIN(total tasks * pad fraction, task capacity)
    for speculative execution
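
A minimal sketch of such a job queue using java.util.TreeMap (a red-black tree) with a comparator ordered by priority and then job id; JobKey and the priority levels are illustrative stand-ins, not Hadoop's internal classes.

import java.util.Comparator;
import java.util.TreeMap;

public class JobQueueSketch {
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }   // higher priorities sort first

    record JobKey(Priority priority, long jobId) {}

    public static void main(String[] args) {
        Comparator<JobKey> order = Comparator
                .comparing(JobKey::priority)         // by priority (enum order)
                .thenComparingLong(JobKey::jobId);   // then by job id, i.e. submission order

        TreeMap<JobKey, String> queue = new TreeMap<>(order);   // red-black tree
        queue.put(new JobKey(Priority.NORMAL, 42), "nightly ETL");
        queue.put(new JobKey(Priority.HIGH, 43), "ad-hoc report");
        queue.put(new JobKey(Priority.NORMAL, 41), "log aggregation");

        // iteration order: HIGH #43, NORMAL #41, NORMAL #42
        queue.forEach((k, name) ->
                System.out.println(k.priority() + " #" + k.jobId() + " " + name));
    }
}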
Error Handling
 Retry
    configurable (default 4 times)
 Timeout
    configurable
 Speculative Execution (both conditions must hold; sketched below)
    current time – start time >= 1 minute
    average progress – task progress > 20%
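
A minimal sketch of the speculation test above (field and method names are illustrative, not Hadoop's internals):

public class SpeculationCheck {
    static final long MIN_RUNTIME_MS = 60_000L;   // "current - start >= 1 minute"
    static final double PROGRESS_GAP = 0.20;      // "average progress - progress > 20%"

    // a running attempt is a candidate for a speculative duplicate
    // only if both conditions hold
    static boolean shouldSpeculate(long startTimeMs, long nowMs,
                                   double taskProgress, double averageProgress) {
        return (nowMs - startTimeMs) >= MIN_RUNTIME_MS
                && (averageProgress - taskProgress) > PROGRESS_GAP;
    }

    public static void main(String[] args) {
        // a task 3 minutes in, at 30% progress while its siblings average 60% -> true
        System.out.println(shouldSpeculate(0L, 180_000L, 0.30, 0.60));
    }
}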
Distributed Processing System
 How to process data in distributed environment
    how to read/write data
    how to control nodes
    load balancing
 Monitoring
    node status
    task status
 Fault tolerance
    error detection
        process error, network error, hardware error, …
    error handling
        temporary error: retry -> duplication, data corruption, …
        permanent error: fail over (which one?)
        process hang: timeout & retry
            • too long -> long response time
            • too short -> infinite loop
Distributed Processing System
 How to process data in distributed environment
    how to read/write data
    how to control nodes
    load balancing
    [Hadoop: HDFS Client, master / slave, replication / rack awareness, job scheduler]
 Monitoring
    node status
    task status
 Fault tolerance
    error detection
        process error, network error, hardware error, …
    error handling
        temporary error: retry -> duplication, data corruption, …
        permanent error: fail over (which one?)
        process hang: timeout & retry
            • too long -> long response time
            • too short -> infinite loop
Distributed Processing System
 How to process data in distributed environment
    how to read/write data
    how to control nodes
    load balancing
 Monitoring
    node status
    task status
    [Hadoop: heart beat, job/task status, reporter / metrics]
 Fault tolerance
    error detection
        process error, network error, hardware error, …
    error handling
        temporary error: retry -> duplication, data corruption, …
        permanent error: fail over (which one?)
        process hang: timeout & retry
            • too long -> long response time
            • too short -> infinite loop
Distributed Processing System
 How to process data in distributed environment
    how to read/write data
    how to control nodes
    load balancing
 Monitoring
    node status
    task status
 Fault tolerance
    [Hadoop: black list, time out & retry, speculative execution]
    error detection
        process error, network error, hardware error, …
    error handling
        temporary error: retry -> duplication, data corruption, …
        permanent error: fail over (which one?)
        process hang: timeout & retry
            • too long -> long response time
            • too short -> infinite loop
Limitations
 map -> reduce network overhead
    iterative processing
    full (or theta) join
 small files with many splits
 Low latency (hard to achieve)
    polling & pulling
    job initialization
    optimized for throughput
        job scheduling
        data access
Q&A
