Big Data (HADOOP AND MAPREDUCE?)
What is Hadoop? The simple answer: Hadoop lets you store files bigger than what
can be stored on any single node or server. You can store very large files, and
many of them, across multiple servers in a distributed fashion.
Advantages of Hadoop include affordability (it runs on industry-standard hardware) and
agility (store any data, run any analysis).
Hadoop is an Apache open source project that provides a parallel storage and
processing framework. Its primary purpose is to run MapReduce batch programs in
parallel on tens to thousands of server nodes.
Hadoop scales out to large clusters of servers and storage using the Hadoop Distributed
File System (HDFS) to manage huge data sets and spread them across the servers.
Hadoop Common contains the libraries and utilities needed by other Hadoop modules.
Hadoop consists of the Hadoop Common package, which provides filesystem and OS-level
abstractions, a MapReduce engine, and the Hadoop Distributed File System (HDFS).
The Hadoop Common package contains the necessary Java archive (JAR) files and scripts
needed to start Hadoop. The package also provides source code, documentation, and a
contribution section that includes projects from the Hadoop community.
The Hadoop Distributed File System (HDFS) stores data on commodity machines, providing
very high aggregate bandwidth across the cluster.
HDFS was designed to be a scalable, fault-tolerant, distributed storage system that
works closely with MapReduce. HDFS will “just work” under a variety of physical and
systemic circumstances. By distributing storage and computation across many servers,
the combined storage resource can grow with demand while remaining economical at
every size.
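To make the idea of spreading a large file across servers concrete, here is a minimal sketch in Python. It is not HDFS code: the function names, the node names, and the round-robin placement are assumptions made for illustration (real HDFS placement also takes racks into account). The 128 MB block size and replication factor of 3 are used here as typical settings, not values taken from the slides.

```python
from typing import Dict, List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the size in bytes of each block for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_blocks(num_blocks: int, servers: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct servers (simplified round-robin)."""
    if replication > len(servers):
        raise ValueError("replication cannot exceed the number of servers")
    return {
        block_id: [servers[(block_id + r) % len(servers)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# Hypothetical cluster: a 300 MB file on four nodes.
blocks = split_into_blocks(300 * MB, 128 * MB)
placement = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
for block_id, nodes in placement.items():
    print(block_id, blocks[block_id] // MB, "MB ->", nodes)
```

Because every block lives on several servers in this sketch, losing one server does not lose any block, which is the intuition behind the fault tolerance described above.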
What is MapReduce?
MapReduce is a framework for processing data. The data is not moved across the
network in the conventional fashion, because doing so is slow for huge amounts of
data and media. MapReduce uses an approach better suited to big data sets: rather
than moving the data to the software, MapReduce moves the processing software to the
data.
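The idea of moving the processing to the data can be sketched as a small scheduling function. This is an illustration only, not Hadoop's scheduler: `schedule_tasks`, the block names, and the node names are all hypothetical. Each task is sent to a node that already stores its block whenever such a node has a free slot, and falls back to any free node otherwise.

```python
from typing import Dict, List


def schedule_tasks(block_locations: Dict[str, List[str]],
                   free_slots: Dict[str, int]) -> Dict[str, str]:
    """Assign one task per block, preferring a node that already holds the block."""
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment: Dict[str, str] = {}
    for block, holders in block_locations.items():
        # Nodes that store this block and still have capacity (data-local choice).
        local = [node for node in holders if slots.get(node, 0) > 0]
        # Fall back to any node with capacity; the block must then cross the network.
        candidates = local or [node for node, free in slots.items() if free > 0]
        if not candidates:
            raise RuntimeError("no free task slots left")
        node = candidates[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment


# Hypothetical cluster state: which nodes hold which block, one free slot per node.
block_locations = {
    "block-0": ["node1", "node2"],
    "block-1": ["node2", "node3"],
    "block-2": ["node1", "node3"],
}
assignment = schedule_tasks(block_locations, {"node1": 1, "node2": 1, "node3": 1})
print(assignment)
```

In this example every task ends up on a node that already stores its block, so only the (small) program travels over the network.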
MAP -> REDUCE
Reduce output:
KEY     TO   BE   OR   NOT
VALUE   2    2    1    1
MapReduce is a programming model for large-scale data processing. The term
refers to the application modules written by a programmer that run in two phases: first
mapping the data (extract), then reducing it (transform).
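The two phases can be sketched with the "TO BE OR NOT TO BE" word count used in the slides. This is a single-process illustration, not Hadoop code: the function names are hypothetical, and the `shuffle` step stands in for the grouping by key that the framework performs between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(text: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in the input."""
    return [(word, 1) for word in text.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key (done by the framework between map and reduce)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = map_phase("TO BE OR NOT TO BE")
counts = reduce_phase(shuffle(mapped))
print(mapped)  # six (word, 1) pairs, one per input word
print(counts)  # one total per distinct word
```

On a real cluster many map tasks run in parallel on different blocks of the input, and the reduce tasks each receive the values for a subset of the keys.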
One of Hadoop’s greatest benefits is the ability of programmers to write application
modules in almost any language and run them in parallel on the same cluster that stores
the data. With Hadoop, any programmer can harness the power and capacity of thousands
of CPUs and hard drives simultaneously.
Map output:
KEY     TO   BE   OR   NOT   TO   BE
VALUE   1    1    1    1     1    1
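Writing modules in languages other than Java is commonly done through Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below follows that convention in Python but works on in-memory lists so it can be run without a cluster. The function names are hypothetical, and the call to `sorted` stands in for the sort-by-key that the framework performs between the two programs.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive lines sharing a key; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


mapped = list(mapper(["TO BE OR NOT TO BE"]))
reduced = list(reducer(sorted(mapped)))  # sorted() plays the role of the framework's sort
for line in reduced:
    print(line)
```

In a real streaming job each function would be its own script looping over `sys.stdin`, and the same contract could be implemented in any language that can read and write text lines.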
