Big Data Hadoop
Presented by: Rahul Sharma
B.Tech (Cloud Technology), 2nd year
Poornima University (I.Nurture)
What is Hadoop?
• Hadoop is an open-source software framework for
storing data and running applications on clusters of
commodity hardware. It provides massive storage for any
kind of data, enormous processing power and the ability to
handle virtually limitless tasks or jobs.
What is the use of Hadoop technology?
• Hadoop is an open-source, Java-based programming
framework that supports the processing and storage of
extremely large data sets in a distributed computing
environment. It is an Apache project sponsored by the
Apache Software Foundation.
Why is Hadoop important?
• Ability to store and process huge amounts of any kind of data
quickly: With data volumes and varieties constantly increasing,
especially from social media and the Internet of Things (IoT), this
is a key consideration.
• Computing power: Hadoop's distributed computing model
processes big data fast. The more computing nodes you use, the
more processing power you have.
• Flexibility: Unlike traditional relational databases, you don't have
to preprocess data before storing it. You can store as much data
as you want and decide how to use it later. That includes
unstructured data like text, images and videos.
• Low cost: The open-source framework is free and uses commodity
hardware to store large quantities of data.
• Scalability: You can easily grow your system to handle more data
simply by adding nodes. Little administration is required.
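
The flexibility point above (store raw data first, decide how to use it later) is often called schema-on-read. The following is a minimal sketch of that idea in plain Python, not Hadoop code: the sample records, the field names "user" and "clicks", and the function read_clicks are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical raw records, stored as-is with no schema enforced at write time.
RAW_RECORDS = [
    '{"user": "a1", "clicks": 3}',
    '{"user": "b2", "clicks": 5, "device": "mobile"}',
    'free-form log line that is not JSON',
]


def read_clicks(raw_records):
    """Apply a schema at read time: keep records that parse and carry the needed fields."""
    rows = []
    for record in raw_records:
        try:
            parsed = json.loads(record)
        except json.JSONDecodeError:
            # Unstructured record: this reader skips it, but it stays in storage
            # and a different reader could still use it later.
            continue
        if isinstance(parsed, dict) and "user" in parsed and "clicks" in parsed:
            rows.append((parsed["user"], int(parsed["clicks"])))
    return rows


if __name__ == "__main__":
    print(read_clicks(RAW_RECORDS))
```

Nothing is rejected when the data is written; each reader decides which fields it needs, so a later analysis could use the "device" field or the free-form lines without reloading the data.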
What are the challenges of using Hadoop?
• Full-fledged data management and governance: Hadoop does
not have easy-to-use, full-featured tools for data management,
data cleansing, governance and metadata. Tools for data
quality and standardization are especially lacking.
• Data security: Security tooling for Hadoop is fragmented,
though new tools and technologies are surfacing. The Kerberos
authentication protocol is a significant step toward making
Hadoop environments secure.
How Is Hadoop Being Used?
• Low-cost storage and data archive: The modest cost of commodity
hardware makes Hadoop useful for storing and combining data
such as transactional, social media, sensor, machine, scientific
and clickstream data. The low-cost storage lets you keep
information that is not currently deemed critical but that you
might want to analyze later.
• IoT and Hadoop: Things in the IoT need to know what to
communicate and when to act. At the core of the IoT is a
streaming, always-on torrent of data. Hadoop is often used as the
data store for millions or billions of transactions. You can then
continuously improve the instructions sent to devices, because
Hadoop is constantly being updated with new data that doesn't
match previously defined patterns.
How Is Hadoop Being Used?
• Complement your data warehouse: Hadoop is increasingly deployed
beside data warehouse environments. Certain data sets are
offloaded from the data warehouse into Hadoop, and new types of
data go directly to Hadoop. The end goal for every organization
is to have the right platform for storing and processing data of
different schemas and formats, supporting different use cases
that can be integrated at different levels.
How It Works and a Hadoop Glossary
"Currently, four core modules are included in the basic framework
from the Apache Foundation:"
• Hadoop Common: the libraries and utilities used by other Hadoop
modules.
• Hadoop Distributed File System (HDFS): the Java-based scalable
system that stores data across multiple machines without prior
organization.
• YARN (Yet Another Resource Negotiator): provides resource
management for the processes running on Hadoop.
• MapReduce: a parallel processing software framework. It
consists of two steps. In the map step, the input is split into
smaller subproblems that are distributed to worker nodes and
processed in parallel. In the reduce step, the intermediate
results of all the subproblems are grouped by key and combined
to produce the output.
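
The map and reduce steps described above can be illustrated with the classic word-count example. This is a single-process sketch in plain Python under stated assumptions, not the Hadoop MapReduce API: the function names map_phase, shuffle, reduce_phase and word_count are hypothetical, and a real cluster would run the map and reduce calls on many worker nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key between the two steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is an independent subproblem, so the map calls could run in parallel.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

Because every line is mapped independently and every key is reduced independently, the work can be spread across as many nodes as are available, which is the point made under "Computing power".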
Any queries?
Thank you!




