Handling of Big Data Using Hadoop Framework
Rahul Mahawar
1142030 (IT-2)
Content
o What is big data?
o How do we generate it?
o The problem with big data
o How to handle it
o What is Hadoop?
o Components
o Terminology behind it
o Why choose Hadoop?
BIG DATA HADOOP
Big Data?
What is big data?
How big is this data?
How do we generate it?
How should it be handled?
How do we process it?
Where do we store it?
How is big data a problem?
What is big data?
For a given system, data that is beyond its capacity,
in terms of storage or in terms of processing power.
In other words..
"Small data is when is fit in the RAM. Big data is when is crash because is not fit in the RAM."
- DevOps Borat
Byte : 1 Grain of Rice
Kilobyte : 1 Cup of Rice
Megabyte : 8 Bags of Rice
Gigabyte : 3 Semi Trucks
Terabyte : 2 Container Ships
Petabyte : A Small Island
Exabyte : Almost A State
Zettabyte : Fills the Pacific Ocean
Yottabyte : An Earth-Size Rice Ball
Apple 1st Gen / Desktop / Internet / Big Data / The Future?
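The rice scale climbs by a factor of 1,000 at each step. A minimal Python sketch of that ladder, assuming decimal (SI) prefixes rather than the 1,024-based binary ones; the names `unit_size` and `humanize_bytes` are made up for illustration and do not come from the slides:

```python
# Decimal (SI) byte prefixes: each unit is 1,000 times the previous one.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]

def unit_size(unit):
    """Number of bytes in one `unit`, assuming powers of 1,000."""
    return 1000 ** UNITS.index(unit)

def humanize_bytes(n):
    """Express n bytes in the largest unit that keeps the value >= 1."""
    index = 0
    value = float(n)
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {UNITS[index]}"

if __name__ == "__main__":
    for unit in UNITS:
        print(f"1 {unit} = {unit_size(unit):,} bytes")
```

For example, `humanize_bytes(2_500_000)` gives "2.5 megabyte".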
How do we generate big data?
In every 60 seconds:
3.3 million posts
342,000 tweets
41,000 photo uploads
4 million searches
50 billion messages
120 hours of video uploaded
Problem with big data!
Storage? or Processing?
Is it storage?
Google servers
Processing big data
What is Hadoop?
Hadoop is an open-source software framework for
distributed storage and distributed processing
of very large data sets on computer clusters built
from commodity hardware.
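Why distribution?
One worker: 50 signatures per hour, 100 files arriving every hour.
2nd hour: 50 pending + 100 new = 150
3rd hour: 100 pending + 100 new = 200
4th hour: 150 pending + 100 new = 250
Two workers: 50 + 50 signatures per hour.
2nd hour, 3rd hour, 4th hour: no pending files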
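The signature example can be replayed as a small simulation. This is a sketch under the slide's apparent assumptions (100 files arrive every hour, each worker signs 50 per hour); the function name `simulate_desk` is hypothetical:

```python
def simulate_desk(arrivals_per_hour, workers, rate_per_worker, hours):
    """Return (files_on_desk, files_left_over) for each hour.

    Assumes a fixed number of files arrives at the start of every hour
    and each worker signs at a constant rate (illustrative numbers only).
    """
    capacity = workers * rate_per_worker
    pending = 0
    history = []
    for _ in range(hours):
        on_desk = pending + arrivals_per_hour   # leftovers plus new files
        pending = max(0, on_desk - capacity)    # what could not be signed
        history.append((on_desk, pending))
    return history

if __name__ == "__main__":
    for workers in (1, 2):
        print(f"{workers} worker(s):")
        hours = simulate_desk(100, workers, 50, 4)
        for hour, (on_desk, left) in enumerate(hours, start=1):
            print(f"  hour {hour}: {on_desk} on desk, {left} pending")
```

With one worker the desk holds 150, 200, then 250 files in hours 2 to 4; with two workers nothing is left pending at the end of any hour.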
Cluster of computers
Commodity hardware: hardware that is cheap, affordable in price, and easy to obtain
Components of Hadoop
2 components:
HDFS: for storing
MapReduce: for processing
HDFS [Hadoop Distributed File System]
A specially designed file system for Hadoop,
to store huge amounts of data using commodity hardware.
Why special?
In our system: block size = 4 KB
When we install Hadoop: block size = 64 MB (the Hadoop 1.x default; later versions default to 128 MB)
File to store: File.txt, 200 MB
Split into 4 blocks:
a.txt + b.txt + c.txt + d.txt
64 MB + 64 MB + 64 MB + 8 MB = 200 MB
Client
Server: Name Node (holds the metadata)
Data Nodes: 1, 2, 3, 4, 5, 6, 7, 8
Metadata for File.txt (block - data nodes holding a copy):
a.txt - 1,2,3
b.txt - 3,4,5
c.txt - 5,6,7
d.txt - 6,7,9
a.txt - 2,3,4
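The 200 MB example can be sketched in a few lines of Python. This is an illustration only: it assumes 64 MB blocks and three copies per block, and it places copies round-robin, which is a simplification (real HDFS placement also considers racks and free space). The function names are hypothetical and are not part of any Hadoop API.

```python
from itertools import cycle

def split_into_blocks(file_size_mb, block_size_mb=64):
    """Sizes of the blocks a file is cut into; the last one may be smaller."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])

def place_replicas(num_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Round-robin is an assumption made for this sketch, not the HDFS policy.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication")
    nodes = cycle(data_nodes)
    return [[next(nodes) for _ in range(replication)] for _ in range(num_blocks)]

if __name__ == "__main__":
    blocks = split_into_blocks(200)                      # File.txt, 200 MB
    placement = place_replicas(len(blocks), list(range(1, 9)))
    names = ["a.txt", "b.txt", "c.txt", "d.txt"]         # block names from the slide
    for name, size, nodes in zip(names, blocks, placement):
        print(f"{name}: {size} MB on data nodes {nodes}")
```

The resulting table of block name to node list plays the role of the metadata kept by the Name Node.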
MapReduce [technique to process the data]
Map: technique to map the way to the desired location (the program goes to the nodes that hold the data)
Reduce: technique to get the final output (the partial results are combined)
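The two phases can be shown with word counting, the conventional introductory MapReduce example (it is not taken from the slides). The sketch below is plain Python that imitates the map, shuffle, and reduce steps in one process; it does not use the Hadoop API, and all function names are illustrative.

```python
from collections import defaultdict

def map_phase(block_text):
    """Map: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for word in block_text.split()]

def shuffle(mapped_pairs):
    """Group the emitted values by key, as happens between the two phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(blocks):
    mapped = []
    for block in blocks:   # on a cluster, each block would be mapped on its own node
        mapped.extend(map_phase(block))
    return reduce_phase(shuffle(mapped))

if __name__ == "__main__":
    blocks = ["big data", "Big cluster", "data data"]
    print(word_count(blocks))
```

Each string in `blocks` stands in for one file block; only the small (word, count) pairs need to travel between the phases, not the blocks themselves.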
Client: sends Program, gets Output.txt
Server
Name Node: metadata for File.txt
a.txt - 1,3,4
b.txt - 3,5,7
c.txt - 4,7,8
d.txt - 6,7,8
Data Nodes: 1, 2, 3, 4, 5, 6, 7, 8
Job Tracker: assigns the tasks to nodes that hold the blocks
Task Tracker: runs the assigned tasks on its data node
Reduce: combines the task results into Output.txt
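200 MB = 4 blocks on 4 different nodes
To process 200 MB on one node = t sec
To process the 4 blocks in parallel = about t/4 sec if the work splits evenly
(here the largest block is 64 MB, so closer to t/3 sec)
Think of a size of 1000 MB.. 1000 GB..?
Up to 4 times faster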
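The speedup estimate can be checked with a short calculation. This sketch assumes processing time is proportional to block size, one block per node, and no scheduling or network overhead; `parallel_speedup` is a hypothetical helper, not a Hadoop function.

```python
def parallel_speedup(block_sizes_mb):
    """Speedup from processing blocks in parallel, one block per node.

    Assumes time is proportional to size and ignores all overhead.
    """
    sequential = sum(block_sizes_mb)   # one node works through every block
    parallel = max(block_sizes_mb)     # all nodes wait for the largest block
    return sequential / parallel

if __name__ == "__main__":
    print(parallel_speedup([64, 64, 64, 8]))   # the 200 MB file from the slides
    print(parallel_speedup([50, 50, 50, 50]))  # an even four-way split
```

Under these assumptions the 64 + 64 + 64 + 8 MB split gives a speedup of 3.125, while an even four-way split gives the full factor of 4.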
Hadoop: strengths
Scalable
Fast
Cost-effective
Handles failure efficiently
Uses a simple programming model
Hadoop: limitations
Not a fit for small data
Replication (extra storage for the copies)
Potential stability issues
Clusters are hard to maintain
Very complex algorithms
Thank You
