BY – SHUBHAM PARMAR
What is Hadoop?
• The Apache Hadoop software library is a
framework that allows for the distributed
processing of large data sets across clusters
of computers using simple programming
models.
• It is developed by the Apache Software Foundation. It was first released in
2006, and version 1.0 followed in 2011.
• It is written in Java.
Hadoop is open source software that provides:
• A framework
• Massive storage
• Processing power
Big Data
• Big data is a term for the very large volumes of unstructured and
semi-structured data that a company creates.
• The term is typically used for data measured in petabytes and exabytes.
• Loading that much data into a relational database for analysis would take
too much time and money.
• Facebook, for example, has almost 10 billion photos, taking up 1 petabyte of storage.
So what is the problem?
1. Processing that much data in a relational database is very difficult.
2. It would take too much time and money to process the data.
We can solve this problem with distributed computing.
But distributed computing has problems of its own:
Hardware failure
With many machines, there is always a chance that some hardware will fail.
Combining the data after analysis
The partial results from all the disks have to be combined, which is messy.
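The "combine the data after analysis" problem can be illustrated with a small sketch. It is not Hadoop code: the names `split`, `analyze`, and `combine` are hypothetical, the readings are made-up sample values, and threads stand in for separate machines. Each worker computes a partial result on its own chunk, and the partial results are then merged into one answer.

```python
from concurrent.futures import ThreadPoolExecutor

def split(records, n_parts):
    """Deal the records out into n_parts chunks, one per (simulated) machine."""
    return [records[i::n_parts] for i in range(n_parts)]

def analyze(chunk):
    """Partial result computed independently on one machine's chunk."""
    return {"count": len(chunk), "total": sum(chunk)}

def combine(partials):
    """Merge the partial results into a single overall mean."""
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    return total / count

readings = [4, 8, 15, 16, 23, 42]  # illustrative sample data

# Threads stand in for cluster nodes in this sketch.
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(analyze, split(readings, 3)))

mean = combine(partials)
print(mean)
```

Each worker returns a count and a total rather than its own mean, because averaging per-chunk means gives the wrong answer when chunks differ in size. Deciding what each partial result must carry is part of what makes the combining step awkward to do by hand.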
Hadoop was created to solve these problems.
It has two main parts:
1. Hadoop Distributed File System (HDFS)
2. Data processing framework (MapReduce)
1. Hadoop Distributed File System
It ties many small, reasonably priced machines together into a single cost-effective computer
cluster.
Data and application processing are protected against hardware failure.
If a node goes down, jobs are automatically redirected to other nodes so that the distributed
computation does not fail.
It automatically stores multiple copies of all data.
It provides a simplified programming model that allows users to quickly read from and write to the
distributed system.
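How keeping multiple copies protects against a failed node can be sketched with a toy simulation. This is not the HDFS API: `place_blocks` and `read_block` are hypothetical helpers, the node and block names are invented, and the round-robin placement is a simplifying assumption, not HDFS's actual placement policy. The replication factor of three matches the HDFS default.

```python
REPLICATION = 3  # copies per block; three is the HDFS default

def place_blocks(blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin (assumed policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

def read_block(block, placement, live_nodes):
    """Return the first live node holding a copy, or None if every copy is lost."""
    for node in placement[block]:
        if node in live_nodes:
            return node
    return None

nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_blocks(["blk-0", "blk-1", "blk-2"], nodes)

# Simulate a hardware failure on one machine.
live = set(nodes) - {"node-a"}
sources = {block: read_block(block, placement, live) for block in placement}
print(sources)
```

With `node-a` down, every block can still be read from another node that holds a copy, which is the behaviour the slide describes.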
2. MapReduce
MapReduce is a programming model for processing and generating large data sets with a
parallel, distributed algorithm on a cluster.
The name also refers to the associated implementation of that model.
MAP: a function that processes a key/value pair to generate a set of intermediate key/value pairs.
REDUCE: a function that merges all intermediate values associated with the same intermediate
key.
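The MAP and REDUCE roles can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not the Hadoop MapReduce API: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the two input documents are made up.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    """MAP: take one input key/value pair and emit intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """REDUCE: merge all intermediate values that share the same key."""
    return word, sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run map, group intermediate values by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))

documents = [
    ("doc1", "big data needs big clusters"),
    ("doc2", "hadoop stores big data"),
]
word_counts = run_mapreduce(documents, map_fn, reduce_fn)
print(word_counts)
```

On a real cluster the map calls would run in parallel on different nodes, and the grouping step would move intermediate pairs across the network so that each reducer sees every value for its keys.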
Pros of Hadoop
1. Computing power
2. Flexibility
3. Fault Tolerance
4. Low Cost
5. Scalability
Cons of Hadoop
1. Integration with existing systems
Hadoop is not optimised for ease of use. Installing it and integrating it with existing
databases might prove difficult, especially since no software support is
provided.
2. Administration and ease of use
Hadoop requires knowledge of MapReduce, while most data practitioners use SQL. This
means significant training may be required to administer Hadoop clusters.
3. Security
Hadoop lacks the level of security functionality needed for safe enterprise deployment,
especially if it concerns sensitive data.