Big data and Hadoop

Handling of Big Data using the Hadoop Framework
Rahul Mahawar
1142030 (IT-2)

Content
o What is big data?
o How do we generate it?
o The problem with big data
o How to handle it
o What is Hadoop?
o Components
o Terminology behind it
o Why choose Hadoop?

Big Data?
o What is big data?
o How big is this data?
o How do we generate it?
o How should it be handled?
o How to process it?
o Where do we store it?
o How is big data a problem?

What is Big Data?
For a system, big data is data that is beyond its capacity:
o in terms of storage
o in terms of processing power

In other words..
"Small data is when is fit in the RAM.
Big data is when is crash because is not fit in the RAM."
- DevOps Borat

Byte : 1 grain of rice
Kilobyte : 1 cup of rice
Megabyte : 8 bags of rice
Gigabyte : 3 semi trucks
Terabyte : 2 container ships
Petabyte : a small island
Exabyte : almost a state
Zettabyte : fills the Pacific Ocean
Yottabyte : an Earth-size rice ball

Apple 1st Gen, Desktop, Internet, Big Data, The Future?
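
The rice pictures hide the actual factor between one unit and the next. A minimal sketch of that arithmetic, assuming decimal (SI) units where each step up is a factor of 1000 (binary units would use 1024 instead). The names `UNITS`, `bytes_in` and `human_readable` are illustrative only.

```python
# Decimal (SI) storage units: each step up is a factor of 1000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit):
    """Return how many bytes one `unit` holds, using powers of 1000."""
    return 1000 ** UNITS.index(unit)


def human_readable(n_bytes):
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}s"
        value /= 1000


if __name__ == "__main__":
    for unit in UNITS:
        print(f"1 {unit:<9} = 10^{3 * UNITS.index(unit)} bytes")
```

For example, `human_readable(200_000_000)` gives the size of a 200 MB file in the same vocabulary as the list above.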

How do we generate big data?
In every 60 seconds:
o 3.3 million posts
o 342,000 tweets
o 41,000 photo uploads
o 4 million searches
o 50 billion messages
o 120 hours of video uploaded

Problem with big data!
Storage? or Processing?

Is it Storage?
Google Servers

What Is Hadoop?
Hadoop is an open source software framework for
distributed storage and distributed processing
of very large data sets on computer clusters built
from commodity hardware.

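Why Distribution?
One person signing files: 50 signatures per hour, 100 files arriving every hour.
1st hour: 100 files, 50 signed, 50 pending
2nd hour: 50 + 100 = 150
3rd hour: 100 + 100 = 200
4th hour: 150 + 100 = 250
Two people signing: 50 + 50 signatures per hour.
2nd hour: no pending files
3rd hour: no pending files
4th hour: no pending files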
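
The signature example can be replayed as a small simulation. This is a sketch under the reading that 100 files arrive every hour and each person signs 50 per hour; the function name `files_on_desk` and its parameters are made up for illustration.

```python
from typing import List


def files_on_desk(hours: int, arrivals_per_hour: int, workers: int,
                  rate_per_worker: int = 50) -> List[int]:
    """Return, for each hour, the files waiting at the start of that hour:
    pending files carried over from earlier hours plus the new arrivals."""
    pending = 0
    history = []
    for _ in range(hours):
        on_desk = pending + arrivals_per_hour
        history.append(on_desk)
        # Whatever the workers cannot sign this hour is carried over.
        pending = max(0, on_desk - workers * rate_per_worker)
    return history


if __name__ == "__main__":
    print("1 person :", files_on_desk(4, 100, workers=1))
    print("2 people :", files_on_desk(4, 100, workers=2))
```

With one person the pile grows every hour; with two people the work is distributed and nothing is carried over.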

Cluster of Computers
Commodity hardware: hardware that is cheap (affordable in price) and easy to obtain.

Components of Hadoop?
2 components:
o HDFS: for storing
o MapReduce: for processing

HDFS
[ Hadoop Distributed File System ]
A specially designed file system for Hadoop, to store huge amounts of data using commodity hardware.
Why special?
o In our system: block size = 4 KB
o When we install Hadoop: block size = 64 MB (the default in Hadoop 1.x; 128 MB from Hadoop 2.x)

File to store: File.txt, 200 MB
[Diagram: Client; Server with a Name Node and Data Nodes 1 to 8]
File.txt is split into 4 blocks:
a.txt + b.txt + c.txt + d.txt
64 MB + 64 MB + 64 MB + 8 MB = 200 MB
Meta data on the Name Node (block - data nodes holding a copy):
File.txt
a.txt - 1,2,3
b.txt - 3,4,5
c.txt - 5,6,7
d.txt - 6,7,9
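
The split of the 200 MB file can be sketched in a few lines. This assumes the 64 MB block size quoted above and three copies per block, as the metadata lists three data nodes for each block. The round-robin placement is a toy stand-in: real HDFS placement is rack-aware, so the node lists produced here will not match the ones in the metadata. All function names are hypothetical.

```python
import itertools

BLOCK_SIZE_MB = 64   # block size quoted on the slide
REPLICATION = 3      # the metadata lists 3 data nodes per block


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the block sizes for a file: full blocks plus one final partial block."""
    full, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks, data_nodes, replication=REPLICATION):
    """Toy round-robin placement (NOT the real HDFS policy, which is rack-aware)."""
    ring = itertools.cycle(data_nodes)
    return [[next(ring) for _ in range(replication)] for _ in range(num_blocks)]


blocks = split_into_blocks(200)
placement = place_replicas(len(blocks), data_nodes=list(range(1, 9)))
for name, size, nodes in zip("abcd", blocks, placement):
    print(f"{name}.txt  {size:>2} MB  -> data nodes {nodes}")
```

Keeping several copies of each block on different nodes is what lets the cluster lose a cheap machine without losing the file.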

Map-Reduce
[ Technique to process the data ]
o Map: the technique to map the work to the desired location (the nodes that hold the data)
o Reduce: the technique to get the final output
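
The two phases can be illustrated with the classic word count, written here in plain Python rather than against the Hadoop API. The block contents and the function names `map_phase`, `shuffle` and `reduce_phase` are made up for this sketch.

```python
from collections import defaultdict


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Group the emitted values by key, the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into the final output."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical contents of two blocks of one file.
blocks = ["big data is big", "hadoop handles big data"]
pairs = [pair for block in blocks for pair in map_phase(block)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

Each call to `map_phase` only needs its own block, which is why the map work can run on separate nodes at the same time.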

Client: Program in, Output.txt out
Server: Name Node (Job Tracker) and Data Nodes 1 to 8 (Task Tracker)
Meta data on the Name Node:
File.txt
a.txt - 1,3,4
b.txt - 3,5,7
c.txt - 4,7,8
d.txt - 6,7,8
Map: the Program runs on the data nodes (Task Trackers)
Reduce: the results are combined into Output.txt
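
One way to picture the Job Tracker's use of the metadata is a toy scheduler that sends each block's task to a node that already stores that block. This is only a sketch of the data-locality idea, not how Hadoop actually schedules. It uses the block locations from the metadata, with the b.txt entry taken as 3, 5, 7. `assign_map_tasks` is a hypothetical name.

```python
# Block locations as listed in the Name Node metadata.
METADATA = {
    "a.txt": [1, 3, 4],
    "b.txt": [3, 5, 7],
    "c.txt": [4, 7, 8],
    "d.txt": [6, 7, 8],
}


def assign_map_tasks(metadata):
    """Give each block's map task to the least-loaded node that already stores it."""
    load = {}          # data node -> number of tasks assigned so far
    assignment = {}    # block name -> chosen data node
    for block, nodes in metadata.items():
        # Prefer the node with the fewest tasks; break ties by node number.
        chosen = min(nodes, key=lambda node: (load.get(node, 0), node))
        assignment[block] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return assignment


assignment = assign_map_tasks(METADATA)
print(assignment)
```

The program travels to the data instead of the data travelling to the program, which matters when the blocks are large.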

200 MB = 4 blocks on 4 different nodes
To process 200 MB on one node = t sec
To process one 64 MB block = roughly t/4 sec
All 4 blocks are processed in parallel: up to 4 times faster
Think of a size of 1000 MB.. 1000 GB..?
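
The "4 times" figure is a round number. A small sketch of the underlying arithmetic, assuming processing time is proportional to block size and every block runs at the same time on its own node (scheduling, network and reduce costs are ignored, so this is an upper bound). `parallel_speedup` is an illustrative name.

```python
def parallel_speedup(block_sizes_mb):
    """Ideal speedup when every block is processed at once on its own node.

    Assumes time is proportional to size, so the job finishes when the
    largest block finishes.
    """
    total = sum(block_sizes_mb)
    slowest = max(block_sizes_mb)
    return total / slowest


print(parallel_speedup([64, 64, 64, 8]))    # the 200 MB file split into 64 MB blocks
print(parallel_speedup([50, 50, 50, 50]))   # four equal blocks
```

With uneven blocks the largest one sets the pace, so the gain for the 200 MB file is a little over 3 times; four equal blocks would give the full 4 times.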

Hadoop: strengths
o Scalable
o Fast
o Cost effective
o Handles failure efficiently
o Uses a simple programming model

Hadoop: limitations
o Not fit for small data
o Replication
o Potential stability issues
o Too hard to maintain a cluster
o Very complex algorithms
