hadoop_module6

Bridging Technology Gap

Netxillon Technologies

Hadoop Administration

By Gurmukh Singh

Module 6: High Availability

Key Points from Module 5:

Hadoop 2.0 and YARN advantages

Hadoop 2.0 directory structure and changes

The YARN workflow

Agenda:

• Hadoop 2.0 and YARN
• YARN flow
• Set up HA using shared storage and ZooKeeper

Hadoop 2.0 Setup Differences

- The configuration files have moved to "$HADOOP_HOME/etc/hadoop".
- The example jars are now located at "$HADOOP_HOME/share/hadoop/mapreduce/*examples*.jar".
- The admin binaries are now located at "$HADOOP_HOME/sbin".
- The JobTracker and TaskTracker have been replaced by the ResourceManager and NodeManager.
- There is no "hadoop-daemon.sh start resourcemanager" command; the YARN daemons are started through the yarn command-line scripts instead.
- Job execution is handled by YARN.
Hadoop 2.0 Cluster Setup

hdfs-site.xml

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/data/namenode</value>
</property>

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ha-nn1.hacluster1.com:9000</value>
  </property>
</configuration>

Hadoop 2.0 Distributed Setup

yarn-site.xml

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

Hadoop 2.0 Distributed Setup

DEMO

JobTracker Disadvantages:

• It is a single point of failure.
• The JobTracker is heavily loaded, because it:
    • does resource management
    • does job scheduling
    • takes care of job failures and re-creation of failed tasks

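YARN – Yet Another Resource Negotiator

First, YARN and MRv2 are not the same thing: YARN manages cluster resources, while MRv2 is MapReduce running as an application on top of YARN.

Each job controls its own destiny through its own ApplicationMaster.

YARN is responsible for cluster resource management.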

YARN Components

YARN Flow

• The client submits a job and, with the help of the ResourceManager, gets an application ID.
• The RM chooses a NodeManager with available resources and asks it to launch the MR ApplicationMaster.
• The NodeManager allocates a container for the ApplicationMaster and then assigns the MR job to it.
• The MR ApplicationMaster reads the input splits from HDFS.
• The MR ApplicationMaster again negotiates with the ResourceManager to find the node with the most available resources.
• The MR ApplicationMaster assigns the map/reduce tasks to that particular NodeManager.
• The NodeManager creates a YarnChild to execute the tasks.
• The YarnChild executes the map or reduce task after fetching the resources it needs from HDFS.
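The flow above can be followed from the command line while a job runs. The sketch below is one hedged way to do that: it only prints the `yarn` commands unless `DRY_RUN=0` is set, and the `run` and `trace_yarn_app` helpers and the application ID are hypothetical placeholders rather than anything from the slides.

```shell
#!/bin/sh
# Sketch: print (or, with DRY_RUN=0, execute) commands that expose each
# stage of the YARN flow. Dry run is the default so this is safe anywhere.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# $1 is an application ID as returned to the client at submission time.
trace_yarn_app() {
  run yarn application -list -appStates RUNNING  # applications the RM knows about
  run yarn application -status "$1"              # state, AM host, tracking URL
  run yarn node -list                            # NodeManagers and their container counts
}

# Placeholder ID; a real one comes from the job client output.
trace_yarn_app application_1400000000000_0001
```

Watching the container count in `yarn node -list` change while the job runs is a simple way to see the ApplicationMaster's container requests being granted.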

Hadoop HA - HDFS

The NameNode is a single point of failure. What if it fails?

We will have an outage, and sometimes data loss due to corruption.

How quickly can we switch over if needed?

Is the switch a manual failover or an automatic failover?

Let's look at all of the above questions.

Hadoop HA – Using Shared NFS

Hadoop HA – Using Shared NFS

DEMO

Hadoop HA - HDFS Using ZooKeeper

docs.hortonworks.com
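For orientation, an HA pair with automatic failover through ZooKeeper is usually described in hdfs-site.xml along the lines sketched below. This is an illustration under stated assumptions, not the configuration used in the course: the nameservice name `hacluster1`, the NameNode IDs `nn1`/`nn2`, the host `ha-nn2`, the JournalNode hosts `jn1` to `jn3` and the `write_ha_hdfs_site` helper are all hypothetical. The property names are the standard HDFS HA ones.

```shell
#!/bin/sh
# Sketch: write a sample HA fragment for hdfs-site.xml to the path in $1.
# Host names, nameservice and NameNode IDs below are assumed placeholders.
write_ha_hdfs_site() {
  cat > "$1" <<'EOF'
<property>
  <name>dfs.nameservices</name>
  <value>hacluster1</value>
</property>
<property>
  <name>dfs.ha.namenodes.hacluster1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hacluster1.nn1</name>
  <value>ha-nn1.hacluster1.com:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.hacluster1.nn2</name>
  <value>ha-nn2.hacluster1.com:9000</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.hacluster1.com:8485;jn2.hacluster1.com:8485;jn3.hacluster1.com:8485/hacluster1</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.hacluster1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
EOF
}

# Example: write_ha_hdfs_site /tmp/hdfs-site-ha-fragment.xml
```

With a nameservice in place, `fs.defaultFS` in core-site.xml would point at `hdfs://hacluster1` rather than a single host, and `ha.zookeeper.quorum` would list the ZooKeeper servers with their client port. The `sshfence` method additionally needs `dfs.ha.fencing.ssh.private-key-files` to be set.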

ZooKeeper

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.

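ZooKeeper Configuration

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored
# (/tmp is only suitable for a demo setup)
dataDir=/tmp/zookeeper
# The port at which the clients will connect
clientPort=2181
# Ensemble members: server.<id>=<host>:<peer port>:<leader election port>
server.1=192.168.1.70:2888:3888
server.2=192.168.1.71:2888:3888
server.3=192.168.1.69:2888:3888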
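Each server listed in a `server.N` line also needs a `myid` file in its `dataDir` containing its own `N`, and the ensemble stays available only while a majority of servers is up. The sketch below shows one way to derive both from a zoo.cfg like the one above; the three helper function names are hypothetical.

```shell
#!/bin/sh
# Print the id N from the "server.N=host:port:port" line in zoo.cfg ($1)
# whose host part equals $2. Exits non-zero if the host is not listed.
zk_id_for_host() {
  awk -F= -v host="$2" '
    /^server\.[0-9]+=/ {
      split($2, addr, ":")              # addr[1] is the host part
      if (addr[1] == host) {
        sub(/^server\./, "", $1)        # keep only the numeric id
        print $1
        found = 1
      }
    }
    END { exit !found }
  ' "$1"
}

# Number of servers that must be up for an ensemble of $1 members (majority).
zk_quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Write the myid file for host $2 into data directory $3, using zoo.cfg $1.
zk_write_myid() {
  id="$(zk_id_for_host "$1" "$2")" || return 1
  mkdir -p "$3" && echo "$id" > "$3/myid"
}
```

For the three-server ensemble above, `zk_quorum_size 3` gives 2, so the ensemble tolerates the loss of one server. On 192.168.1.71, `zk_write_myid zoo.cfg 192.168.1.71 /tmp/zookeeper` would write `2` into `/tmp/zookeeper/myid`.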

Hadoop 2.0 HA Setup using QJM

• Make sure ZooKeeper is up and coordinating.
• Start the JournalNodes.
• Format the NameNode.
• Format the ZKFC state in ZooKeeper (hdfs zkfc -formatZK).
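The setup steps can be written out as a command sequence. The sketch below prints the commands rather than running them (set `DRY_RUN=0` to execute). The first four steps mirror the list above; the remaining ones are an assumption based on the usual QJM procedure for a fresh cluster. The `run` and `ha_bootstrap_plan` helpers and the NameNode ID `nn1` are hypothetical.

```shell
#!/bin/sh
# Sketch: ordered bring-up of a fresh QJM-based HA cluster.
# WARNING: "namenode -format" erases existing metadata; fresh clusters only.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

ha_bootstrap_plan() {
  run zkServer.sh status                  # on each ZooKeeper node: expect leader/follower
  run hadoop-daemon.sh start journalnode  # on each JournalNode host
  run hdfs namenode -format               # on the first NameNode only
  run hdfs zkfc -formatZK                 # once: creates the HA znode in ZooKeeper
  run hadoop-daemon.sh start namenode     # first NameNode
  run hdfs namenode -bootstrapStandby     # on the second NameNode: copy the metadata
  run hadoop-daemon.sh start namenode     # second NameNode
  run hadoop-daemon.sh start zkfc         # on both NameNodes
  run hdfs haadmin -getServiceState nn1   # nn1 is a placeholder NameNode ID
}

ha_bootstrap_plan
```

The order matters: the JournalNodes must be running before the NameNode is formatted so that the shared edits directory can be initialized, and the ZKFC daemons are started last so that the election happens between NameNodes that are already up.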

Hadoop 2.0 HA Setup using QJM

DEMO

Hadoop 2.0 Setup

DEMO

Hadoop Upgrade

1. Confirm that no previous upgrade is still pending:
   hadoop dfsadmin -upgradeProgress status
2. Stop all client applications running on the MapReduce cluster.
3. Perform a filesystem check:
   hadoop fsck / -files -blocks -locations > dfs-v-old-fsck-1.log
4. Save a complete listing of the HDFS namespace to a local file:
   hadoop dfs -lsr / > dfs-v-old-lsr-1.log
5. Create a list of DataNodes participating in the cluster:
   hadoop dfsadmin -report > dfs-v-old-report-1.log

Hadoop Upgrade (continued)

6. Optionally back up the HDFS data.
7. Upgrade process: point to the new directory and update the environment variables.
8. hadoop-daemon.sh start namenode -upgrade
9. hadoop dfsadmin -upgradeProgress status
10. Start the DataNodes after pointing them to the new Hadoop directory.
11. hadoop dfsadmin -safemode get
12. hadoop dfsadmin -finalizeUpgrade
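The reports saved in steps 3 to 5 are most useful when the same reports are captured again after the upgrade and compared before step 12, because finalizing removes the option to roll back. The sketch below prints the capture commands for both runs. The `capture` and `hdfs_snapshot` helpers and the `dfs-v-new-*` file names are assumptions that follow the naming used in the steps above.

```shell
#!/bin/sh
# Sketch: capture fsck, namespace listing and DataNode report, tagged
# "old" (before the upgrade) or "new" (after it, before finalizing).
# Dry run by default; set DRY_RUN=0 on a cluster node to really capture.
DRY_RUN="${DRY_RUN:-1}"

# capture <output file> <command...>
capture() {
  target="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $* > $target"
  else
    "$@" > "$target"
  fi
}

hdfs_snapshot() {
  capture "dfs-v-$1-fsck-1.log"   hadoop fsck / -files -blocks -locations
  capture "dfs-v-$1-lsr-1.log"    hadoop dfs -lsr /
  capture "dfs-v-$1-report-1.log" hadoop dfsadmin -report
}

hdfs_snapshot old   # steps 3 to 5, on the old version
hdfs_snapshot new   # after step 11 shows safe mode is off, before step 12
```

Comparing the two namespace listings (for example with `diff dfs-v-old-lsr-1.log dfs-v-new-lsr-1.log`) gives a quick check that no files went missing during the upgrade.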

Further Readings:

- http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
- https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html

GitHub: https://github.com/netxillon/hadoop/tree/master/HA_QJM

Further Reading:

• https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
• http://www.aosabook.org/en/hdfs.html

Topics for Next Class:

• Hive, HBase, Pig
• Sqoop, Flume

Pre-Readings before the next class:

• https://hbase.apache.org/
• http://hortonworks.com/hadoop/hive/
• https://hive.apache.org/
• https://pig.apache.org/

Any Questions?

GitHub: https://github.com/netxillon/hadoop

Thanks!

trainings@netxillon.com
