2014 feb 24_big_datacongress_hadoopsession1_hadoop101

Adam Muise, Solution Architect, Hortonworks

HADOOP 101: AN INTRODUCTION TO HADOOP WITH THE HORTONWORKS SANDBOX

100% Open Source: Democratized Access to Data

The leaders of Hadoop’s development

We do Hadoop

Drive Innovation in the platform: we lead the roadmap

Community Driven, Enterprise Focused

We do Hadoop successfully.

Support

Training

Professional Services

Enter the Hadoop…

http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/

Hadoop was created because traditional technologies never cut it for Internet properties like Google, Yahoo, Facebook, Twitter, and LinkedIn.

Traditional architecture didn’t scale enough…

[Diagram: many App servers in front of several separate DB clusters, each on its own SAN]

Databases can become bloated and useless

Traditional architectures cost too much at that volume…

[Diagram: $upercomputing, $pecial Hardware, rising $/TB]

So what is the answer?

If you could design a system that would handle this, what would it look like?

It would probably need a highly resilient, self-healing, cost-efficient, distributed file system…

[Diagram: a grid of Storage nodes]
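To make "resilient, self-healing" concrete, here is a minimal in-memory sketch. It assumes HDFS-style behavior. Each file is split into blocks, and each block is copied to three different nodes (3 mirrors the HDFS default replication factor). When a node dies, the surviving copies are re-replicated onto other live nodes. The class, method, node, and file names are hypothetical. This is a toy model, not the HDFS API.

```python
import random
from collections import defaultdict

REPLICATION = 3  # assumption: mirrors the HDFS default replication factor


class ToyCluster:
    """Hypothetical in-memory model of block placement (not the HDFS API)."""

    def __init__(self, nodes, seed=0):
        self.nodes = set(nodes)
        self.blocks = defaultdict(set)  # block id -> nodes holding a replica
        self.rng = random.Random(seed)

    def put_file(self, name, num_blocks):
        # Split the file into blocks; place each block's replicas on distinct nodes.
        for i in range(num_blocks):
            block = f"{name}#blk{i}"
            targets = self.rng.sample(sorted(self.nodes), REPLICATION)
            self.blocks[block].update(targets)

    def fail_node(self, node):
        # A dead node takes every replica it held with it.
        self.nodes.discard(node)
        for holders in self.blocks.values():
            holders.discard(node)

    def heal(self):
        # Self-healing: copy under-replicated blocks onto other live nodes,
        # as long as at least one surviving replica exists to copy from.
        for holders in self.blocks.values():
            needed = REPLICATION - len(holders)
            candidates = sorted(self.nodes - holders)
            if needed > 0 and holders:
                holders.update(self.rng.sample(candidates, min(needed, len(candidates))))

    def under_replicated(self):
        return [b for b, h in self.blocks.items() if len(h) < REPLICATION]


cluster = ToyCluster([f"node{i}" for i in range(1, 7)])
cluster.put_file("weblogs.txt", num_blocks=4)
cluster.fail_node("node2")
print("after failure:", len(cluster.under_replicated()), "under-replicated blocks")
cluster.heal()
print("after healing:", len(cluster.under_replicated()), "under-replicated blocks")
```

The sketch replicates whole blocks on cheap nodes instead of relying on special hardware. Losing one machine therefore costs no data, and the system repairs itself without an operator.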

It would probably need a completely parallel processing framework that took tasks to the data…

[Diagram: a grid of nodes, each pairing Processing with Storage]
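"Taking tasks to the data" can be sketched as a word count in the MapReduce style. Each map task runs on the node that already stores its block and emits (word, 1) pairs there. Only those small intermediate pairs are grouped and handed to the reduce step. This is a single-process simulation under stated assumptions. The node names, block contents, and function names are hypothetical, and it does not use the Hadoop MapReduce API.

```python
from collections import defaultdict

# Hypothetical layout: which blocks of the input file each node stores.
blocks_on_node = {
    "node1": ["the quick brown fox"],
    "node2": ["the lazy dog", "the end"],
    "node3": ["quick quick fox"],
}


def map_task(text):
    # Emit (word, 1) for every word in one block.
    return [(word, 1) for word in text.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    # Combine all counts for one word.
    return key, sum(values)


# Schedule each map task on the node that already holds its block (data locality),
# so only the small (word, 1) pairs, not the raw blocks, move between nodes.
intermediate = []
task_placement = []
for node, node_blocks in blocks_on_node.items():
    for text in node_blocks:
        task_placement.append(node)
        intermediate.extend(map_task(text))

counts = dict(reduce_task(k, v) for k, v in shuffle(intermediate).items())
print(counts)
```

The key design point is the scheduling loop. Computation follows the blocks, so adding nodes adds both storage and processing capacity.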

It would probably run on commodity hardware, virtualized machines, and common OS platforms


It would probably be open source so innovation could happen as quickly as possible

It would need a critical mass of users

[Diagram: the Apache Hadoop projects (HDFS, YARN, MapReduce, Tez, Pig, Hive, HCatalog, HBase, Storm, Ambari, Knox, Sqoop, Falcon, Flume), then the same projects packaged as the Hortonworks Data Platform]

We are going to learn how to work with Hadoop in less than an hour.

To do this, we need to install Hadoop, right?

The Sandbox is ‘Hadoop in a Can’.

It contains one copy of each of the master and worker node processes used in a cluster, only in a single virtual node.

[Diagram: the Processing and Storage of a multi-node cluster collapsed into a single Linux VM]

Getting started with the Sandbox VM:

- Pick your flavor of VM at http://www.hortonworks.com/sandbox
- Start the Sandbox VM
- Find the IP address it displays
- Go to that address in a browser, e.g. http://172.16.130.131
- Register
- Click on ‘Start Tutorials’
- In the left-hand nav, click on ‘HCatalog, Basic Pig & Hive Commands’

In this tutorial we will:

- Land files in HDFS
- Assign metadata with HCatalog
- Use SQL with Hive
- Learn to process data with Pig
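The load, schema, and query pattern in these steps has a local analogy in Python's built-in `csv` and `sqlite3` modules. This is an analogy only, not Hive, HCatalog, or Pig. Here the `CREATE TABLE` plays the role HCatalog plays, putting a schema over raw delimited data. The `SELECT ... GROUP BY` is the kind of query Hive would run as a distributed job over files in HDFS. The file contents, table name, and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited file, standing in for a file landed in HDFS.
landed_file = io.StringIO(
    "alice\tlogin\t3\n"
    "bob\tlogin\t1\n"
    "alice\tsearch\t5\n"
    "carol\tlogin\t2\n"
)

conn = sqlite3.connect(":memory:")
# Schema over the raw columns: analogous to the metadata HCatalog provides.
conn.execute("CREATE TABLE events (username TEXT, action TEXT, n INTEGER)")
rows = csv.reader(landed_file, delimiter="\t")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# A SQL aggregation of the kind Hive compiles into a distributed job.
query = "SELECT username, SUM(n) FROM events GROUP BY username ORDER BY username"
result = conn.execute(query).fetchall()
print(result)
```

On the Sandbox, the same idea applies at cluster scale. The data stays in files, the schema is metadata layered on top, and Hive or Pig scripts process it in parallel.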

Try the other tutorials.

Hadoop is the new Modern Data Architecture for the Enterprise

There is NO second place

Hortonworks

…the Bull Elephant of Hadoop Innovation

© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION

