Hadoop
Presented by : Nabin Nayak
Enrollment No : 01512002017
Contents
• What is Hadoop Technology?
• Developer of Hadoop
• Features of Hadoop
• Two main features of Hadoop
• Goals / Requirements
• Hadoop Framework and Tools
• Pros of Hadoop
• Cons of Hadoop
What is Hadoop Technology?
• Hadoop is the best-known technology used for Big Data.
• It is an open-source software framework designed for the storage and processing of large-scale data on clusters of commodity hardware.
• The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
• It is a flexible and highly available architecture for large-scale computation and data processing on a network of commodity hardware.
• It is developed by the Apache Software Foundation. The first release appeared in 2006, and version 1.0 followed in December 2011.
• It is written in Java.
Developer of Hadoop
• Doug Cutting and Michael J. Cafarella developed Hadoop to support distribution for the Nutch search engine project.
• The project was funded by Yahoo.
Features of Hadoop
• Hadoop provides access to the file systems it supports.
• The Hadoop Common package contains the JAR files and scripts needed to start Hadoop.
• The package also provides source code, documentation, and a contribution section that includes projects from the Hadoop community.
Problems Before Hadoop
1. Processing data at that scale is very difficult in a relational database.
2. Processing it would take too much time and cost too much.
We can solve this problem with distributed computing.
But distributed computing brings problems of its own:
• Hardware failure: the chance of a hardware failure is always present.
• Combining the data after analysis: results from all the disks have to be merged, which is messy.
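To see why hardware failure matters more as a cluster grows, here is a small sketch. It assumes that machines fail independently and that every machine has the same failure probability over some period; both are simplifying assumptions, and the numbers used are illustrative, not measurements.

```python
def prob_any_failure(num_machines: int, p_single: float) -> float:
    """Probability that at least one machine fails.

    Assumes each machine fails independently with probability p_single
    (a simplification chosen for this sketch).
    """
    if num_machines < 0:
        raise ValueError("num_machines must be non-negative")
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    # P(at least one fails) = 1 - P(none fail)
    return 1.0 - (1.0 - p_single) ** num_machines


if __name__ == "__main__":
    # Illustrative per-machine failure probability; not a real statistic.
    for n in (1, 10, 100, 1000):
        print(n, round(prob_any_failure(n, 0.001), 4))
```

Even with a small per-machine probability, the chance that some machine in a large cluster fails approaches certainty, which is why fault tolerance has to be built into the framework.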
Hadoop was created to solve these problems.
It has two main parts:
• Hadoop Distributed File System (HDFS)
• MapReduce
Two main features of Hadoop
1. Hadoop Distributed File System (HDFS)
• It ties many small, reasonably priced machines together into a single cost-effective computer cluster.
• Data and application processing are protected against hardware failure.
• If a node goes down, jobs are automatically redirected to other nodes so that the distributed computation does not fail.
• It automatically stores multiple copies of all data.
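The idea of keeping multiple copies can be sketched in a few lines. This is a toy model, not HDFS code: the function names are hypothetical, and the round-robin placement is an assumption made for brevity (real HDFS placement also considers racks). The replication factor of 3 matches the HDFS default.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical helper for illustration; not the HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        # Consecutive node indices (mod cluster size) are always distinct
        # because replication <= len(nodes).
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have a replica on a live node."""
    failed = set(failed_nodes)
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    blocks = ["b0", "b1", "b2"]
    placement = place_replicas(blocks, nodes)
    # With three replicas per block, any two node failures leave all data readable.
    print(sorted(readable_blocks(placement, ["n1", "n2"])))
```

A block becomes unreadable only when every node holding one of its copies is down at the same time.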
2. MapReduce
• MapReduce is a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
• The name also covers the associated implementation that runs such programs.
• The MAP function processes a key/value pair to generate a set of intermediate key/value pairs.
• The REDUCE function merges all intermediate values associated with the same intermediate key.
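The classic word-count example shows the model in miniature. The sketch below runs in a single Python process and does not use the Hadoop API; `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names chosen for illustration. On a real cluster the grouping step (the shuffle) happens across machines.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """MAP: take one (key, value) pair and emit intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """REDUCE: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(inputs, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the "shuffle" step
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    docs = [("d1", "big data big cluster"), ("d2", "data cluster cluster")]
    print(run_mapreduce(docs, map_fn, reduce_fn))
```

Because each call to the map function sees only one input pair, and each call to the reduce function sees only one key, both phases can be spread across many machines without changing the program.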
Goals / Requirements
• Abstract and facilitate the storage and processing of large and/or rapidly growing data sets
  o Structured and unstructured data
  o Simple programming models
• High scalability and availability
• Use commodity (cheap!) hardware with little redundancy
• Fault tolerance
• Move computation rather than data
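"Move computation rather than data" means running a task on a node that already stores the data it needs. A minimal greedy sketch of that idea follows. It is not Hadoop's actual scheduler; `assign_tasks` is a hypothetical name, and the rule of one task per free node is an assumption made to keep the example short.

```python
def assign_tasks(block_locations, free_nodes):
    """Give each block's task to a free node, preferring one that stores the block.

    block_locations maps a block id to the nodes holding a copy of it.
    Returns {block: (node, is_local)}. Hypothetical helper for illustration.
    """
    assignments = {}
    available = list(free_nodes)
    for block, holders in block_locations.items():
        local = [node for node in available if node in holders]
        if local:
            node, is_local = local[0], True  # data is already on this node
        elif available:
            node, is_local = available[0], False  # data must travel over the network
        else:
            raise RuntimeError("no free node left")
        available.remove(node)
        assignments[block] = (node, is_local)
    return assignments


if __name__ == "__main__":
    locations = {"b0": ["n1", "n2"], "b1": ["n2", "n3"], "b2": ["n1"]}
    for block, (node, is_local) in assign_tasks(locations, ["n1", "n2", "n3"]).items():
        print(block, node, "local" if is_local else "remote")
```

Only the tasks marked remote need to copy their input across the network, so the more tasks land on local nodes, the less data has to move.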
Hadoop Framework and Tools
Pros of Hadoop
1. Computing power
2. Flexibility
3. Fault Tolerance
4. Low Cost
5. Scalability
Cons of Hadoop
• Integration with existing systems
Hadoop is not optimized for ease of use. Installing it and integrating it with existing databases can prove difficult, especially since no software support is provided.
• Administration and ease of use
Hadoop requires knowledge of MapReduce, while most data practitioners use SQL. This means significant training may be required to administer Hadoop clusters.
• Security
Hadoop lacks the level of security functionality needed for safe enterprise deployment, especially where sensitive data is concerned.
Benefits of Hadoop
• Cost savings and efficient, reliable data processing
• An economically scalable solution
• Storage and processing of large amounts of data
• Acts as a data grid operating system
• Deployed on industry-standard servers rather than expensive specialized data storage systems
Famous users of Hadoop