NATCOM-2017
Paper Presentation on:
BIG DATA AND ITS
ASSOCIATION WITH HADOOP
PRESENTED BY:
SHAMAMA KAMAL
INTRODUCTION
• WHAT IS BIG DATA?
• Data sets that are too large to be stored, processed or retrieved with conventional tools.
• Traditional database management techniques cannot be used because the available data is too large, unstructured and rapidly changing.
• Many online firms face the problem of Big Data.
3 V’s Of Big Data
• Volume: Data is present in very large quantities.
• Variety: Data is present in more than one form.
• Velocity: Data changes and grows very quickly.
ARRIVAL OF HADOOP
• A tool for Big Data analytics.
• The idea behind Hadoop came from a Google research paper outlining its approach to handling enormous amounts of data.
• Created by Doug Cutting and Mike Cafarella in 2005.
WHAT IS HADOOP?
• A software framework that allows easy processing of vast amounts of data and manages it by reducing the data.
• FEATURES
  • Scalable
  • Reliable
  • Economical
  • Efficient
• COMPONENTS
  • Hadoop Common/Core
  • HDFS
  • MapReduce
  • Hadoop Cluster
HADOOP AS AN OSS
• Written mainly in Java, with some code in C and command-line utilities.
• Open-source software (OSS), so it is freely available for modification.
• Pig and Hive run on top of MapReduce.
• With Hadoop Streaming, any programming language can be used to implement the map() and reduce() parts of the user's program.
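As a rough illustration of the Hadoop Streaming point above: Streaming passes records to the user's programs as lines on standard input and expects tab-separated key/value lines on standard output. The sketch below is a minimal word count written that way. The function names `mapper` and `reducer` are illustrative assumptions, not part of Hadoop, and the `sorted()` call only imitates the sort that Hadoop performs between the two phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a Streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local stand-in for a Streaming job: map, sort by key, then reduce.
sample = ["big data", "Big Hadoop"]
word_counts = list(reducer(sorted(mapper(sample))))
print(word_counts)  # ['big\t2', 'data\t1', 'hadoop\t1']
```

In a real job the two functions would live in separate scripts that read `sys.stdin` and print to standard output, and Hadoop would handle the sorting and distribution.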
MapReduce Technique
• A two-step process carried out by the map() and reduce() functions respectively.
• The Mapper function extracts the data the programmer wants to retrieve; the Reducer function then takes that data and integrates it.
• Works on data in the cluster.
MapReduce Technique
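The two steps can be sketched in a single process, assuming a small in-memory list stands in for data spread across the cluster. The helper `map_reduce` and the temperature records below are hypothetical, chosen only to show how map output is grouped by key before reduce runs. Hadoop itself distributes these phases over many nodes.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map, shuffle (group values by key), then reduce, in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


def year_and_temp(record):
    """Map step: pull the (year, temperature) pair out of one raw record."""
    year, temp = record.split(",")
    yield year, int(temp)


def hottest(year, temps):
    """Reduce step: integrate all readings for a year into a single value."""
    return max(temps)


readings = ["2015,31", "2016,35", "2015,38", "2016,29"]
result = map_reduce(readings, year_and_temp, hottest)
print(result)  # {'2015': 38, '2016': 35}
```

The mapper only decides what to extract and the reducer only decides how to combine, which is why the same skeleton fits counting, summing or taking a maximum.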
JobTracker and TaskTracker
• Clients submit MapReduce jobs to the JobTracker.
• The JobTracker pushes work out to the available TaskTracker nodes in the cluster, keeping the work as close to the data as possible.
• The JobTracker and TaskTrackers communicate regularly to check that no node has failed.
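A toy model of the behaviour described above, under stated assumptions: the function names, the slot counts and the timeout value are all hypothetical, and the real JobTracker scheduler is considerably more involved. The sketch shows only the two ideas from the slide, which are to prefer a node that already holds the data and to treat a node as failed when it stops reporting in.

```python
def assign_tasks(block_locations, free_slots):
    """Give each block's task to a node, preferring nodes that hold the block."""
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block, holders in block_locations.items():
        local = [n for n in holders if slots.get(n, 0) > 0]
        remote = [n for n, free in slots.items() if free > 0]
        candidates = local or remote  # fall back to any free node
        if not candidates:
            break  # no free slots left; remaining tasks would have to wait
        node = candidates[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment


def failed_trackers(last_heartbeat, now, timeout):
    """Return nodes whose last check-in is older than the timeout."""
    return sorted(n for n, seen in last_heartbeat.items() if now - seen > timeout)


blocks = {"b1": ["node1", "node2"], "b2": ["node2"], "b3": ["node1"]}
plan = assign_tasks(blocks, {"node1": 1, "node2": 1, "node3": 1})
print(plan)  # {'b1': 'node1', 'b2': 'node2', 'b3': 'node3'}

late = failed_trackers({"node1": 100, "node2": 70, "node3": 95}, now=105, timeout=30)
print(late)  # ['node2']
```

Here `b3` runs on `node3` even though its data is on `node1`, because `node1` has no free slot left. This is the fallback case where the work cannot stay next to the data.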
CONCLUSION
• Hadoop enables distributed parallel processing of large amounts of data.
• Processing is done across industry-standard, inexpensive servers that both store and process data and are highly scalable.
• The efficiency of Hadoop's data reduction depends on its MapReduce function.
• It is 100% open source and pioneered a new way to store and process data.
THANK YOU!!
ANY QUERIES?
