big data and hadoop

NATCOM-2017
Paper Presentation on:
BIG DATA AND ITS
ASSOCIATION WITH HADOOP
PRESENTED BY:
SHAMAMA KAMAL

INTRODUCTION
 WHAT IS BIG DATA?
 Data whose volume is too large to be stored,
processed or retrieved with conventional
tools.
 Traditional database management
techniques cannot be used because the
available data is too large, unstructured and
rapidly changing.
 Many online firms face the problem of BIG
DATA.

3 V’s Of Big Data
 Volume: Data is present in very large
quantities.
 Variety: Data is present in more than one
form.
 Velocity: Data changes and grows very
quickly.

ARRIVAL OF HADOOP
 A tool for Big Data Analytics.
 The idea behind Hadoop came from Google,
which outlined its approach to handling
enormous amounts of data in one of its
research papers.
 Created by Doug Cutting and Mike Cafarella
in 2005.

WHAT IS HADOOP?
 A software framework that makes it easy to process vast
amounts of data and to manage that data by reducing it.
 FEATURES
 Scalable
 Reliable
 Economical
 Efficient
 COMPONENTS
 Hadoop common/core
 HDFS
 MapReduce
 Hadoop Cluster

HADOOP AS AN OSS
 Written in Java, with some code in C and
command-line utilities.
 Open-source software (OSS), so it is freely
available for modification.
 Pig and Hive are built on top of the
MapReduce function.
 Any programming language can be used with
Hadoop Streaming to implement the map() and
reduce() parts of the user’s program.
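Because Hadoop Streaming passes records through standard input and output as lines of text, the map() and reduce() parts can be sketched in Python as below. This is only an illustrative word count, not code from the presentation: the function names `map_lines` and `reduce_lines` and the "map"/"reduce" command-line argument are assumptions made for this example.

```python
#!/usr/bin/env python3
"""Word count in the style of a Hadoop Streaming job (illustrative sketch)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Map step: emit one 'word<TAB>1' record for every word read."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reduce step: add up the counts of each word.

    Mapper output reaches the reducer sorted by key, so records for
    the same word arrive on consecutive lines and can be grouped.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical convention for this sketch: the first argument
    # selects the step, "map" or anything else for reduce.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

A script like this would typically be handed to the streaming job through its -mapper and -reducer options, along with -input and -output paths. It can also be tried locally by piping a text file through the map step, a sort, and the reduce step.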

MapReduce Technique
 A two-step process carried out by the map()
and reduce() functions respectively.
 The map function selects the data that the
programmer wants to retrieve, and the
reduce function then takes that data and
combines it.
 Works on data stored in the cluster.
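The two steps can be imitated on a single machine to see how they fit together. The sketch below is not Hadoop code: `map_reduce`, `year_temp_mapper` and `max_reducer` are hypothetical names, the readings are made-up sample data, and the grouping that Hadoop performs across the cluster is replaced here by an in-memory dictionary.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run the map step, group values by key, then run the reduce step."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # stands in for the shuffle between steps
    return {key: reducer(key, values) for key, values in groups.items()}


def year_temp_mapper(line):
    """Map: turn a 'year,temperature' line into a (year, temperature) pair."""
    year, temp = line.split(",")
    yield year, int(temp)


def max_reducer(key, values):
    """Reduce: keep only the highest temperature seen for the year."""
    return max(values)


# Made-up sample data, used only to exercise the sketch.
readings = ["2015,31", "2016,35", "2015,38", "2016,29"]
result = map_reduce(readings, year_temp_mapper, max_reducer)
print(result)
```

The mapper decides which parts of each record matter, and the reducer integrates everything that shares a key into a single value.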

JobTracker and TaskTracker
 Clients submit MapReduce jobs to the
JobTracker.
 The JobTracker pushes work out to the
available TaskTracker nodes in the cluster,
keeping the work as close to the data as possible.
 The JobTracker and TaskTrackers communicate regularly
to check that no node has failed.
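The idea of keeping work close to the data can be illustrated with a toy assignment routine. This is a simplified sketch under stated assumptions, not the JobTracker's actual scheduling logic: `assign_tasks`, the task names and the node names are all hypothetical, and each node is assumed to have a single free slot.

```python
def assign_tasks(tasks, free_trackers):
    """Assign each task to a free tracker, preferring one that holds its data.

    tasks: dict mapping a task name to the set of nodes storing its input.
    free_trackers: list of node names that currently have a free slot.
    Returns a dict mapping each assigned task name to the chosen node.
    """
    free = list(free_trackers)
    assignment = {}
    for task, data_nodes in tasks.items():
        if not free:
            break  # no free slots left; the remaining tasks have to wait
        local = [node for node in free if node in data_nodes]
        chosen = local[0] if local else free[0]  # fall back to any free node
        assignment[task] = chosen
        free.remove(chosen)
    return assignment


# Hypothetical cluster state, used only to exercise the sketch.
tasks = {"map-0": {"node-a", "node-b"}, "map-1": {"node-c"}, "map-2": {"node-a"}}
plan = assign_tasks(tasks, ["node-c", "node-a", "node-d"])
print(plan)
```

Here map-0 and map-1 run on nodes that already store their input, while map-2 falls back to a remote node because its only data-local node is busy.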

CONCLUSION
 Hadoop enables distributed parallel processing of
large amounts of data.
 Processing is done across industry-standard,
inexpensive servers that both store and process
data and are highly scalable.
 The efficiency of data reduction in Hadoop
depends on its MapReduce function.
 It is 100% open source and pioneered a new way
to store and process data.

