Hadoop map reduce

Big Data (HADOOP AND MAPREDUCE?)
What is Hadoop? Simple answer, Hadoop lets you store files bigger than what
can be stored on one particular node or server. So you can store very, very
large files and many files on multiple servers/computers in a distributed fashion.
Advantages of Hadoop include affordability (it runs on industry standard hardware and
agility (store any data, run any analysis).
Hadoop is an Apache open source project that providesa parallel storage and
processing framework. Itsprimary purpose is to run MapReduce batch programs in
parallel on tens to thousands of server nodes.
Hadoop scales out to large clusters of serversand storage using the Hadoop Distributed
File System (HDFS) to manage huge data sets and spread them across the servers.
Hadoop comes with libraries and utilities needed by other Hadoop modules. Hadoop
consists of the Hadoop Common package, which providesfilesystemand OS level
abstractions, a MapReduce engine. The Hadoop Common package contains the
necessary JAVA files and scripts needed to start Hadoop. The package also provides
source code, documentation, and a contribution section that includes projects from
the Hadoop Community
Hadoop Distributed file-systemthat stores data on commodity machines, providing very
high aggregate bandwidth across the cluster. Hadoop scales out to large clustersof
serversand storage using the Hadoop Distributed File System (HDFS) to manage huge
data sets and spread them across the servers.
HDFS was designed to be a scalable, fault-tolerant, distributed storage systemthat
workscloselywith MapReduce. HDFS will “just work” under a variety of physical and
systemic circumstances. By distributing storage and computation across many servers,

the combined storage resource can grow with demand while remaining economical at
every size.
What is Map Reduce?
Map reduce is a framework for processing the data. The data is not moved in a
conventional fashion using the network becauseit is slow for huge amount of data and
media. MapReduce uses a better approach to fit well with big data sets. So rather than
move the data to the software, MapReducemoves the processing software to the
data.
MAP
REDUCE
KEY TO BE OR NOT
VALUE 2 2 1 1
Map Reduce – a programming model for large scale data processing. MapReduce
refers to the application modules written by a programmer that run in two phases: first
mapping the data (extract) then reducing it (transform).
Hadoop’s greatest benefits is the ability of programmers to write application modules in
almost any language and run them in parallel on the same cluster that stores the data.
With Hadoop, any programmer can harness the power and capacity of thousands of
CPUs and hard drivessimultaneously.
KEY TO BE OR NOT TO BE
VALUE 1 1 1 1 1 1

Hadoop map reduce

More Related Content

What's hot (20)

Similar to Hadoop map reduce (20)

More from VijayMohan Vasu (17)

Recently uploaded (20)

Hadoop map reduce