A Study of Hadoop in Map-Reduce
Poumita Das
Shubharthi Dasgupta
Priyanka Das
What is Big Data?
Big data is an evolving term that describes any voluminous amount of
structured, semi-structured and unstructured data that has the
potential to be mined for information.
The 3 V’s
Why DFS
An introduction to Map-Reduce
Map-Reduce programs are designed to process large volumes of data in
parallel. A job runs in three steps:
• Map
• Shuffle
• Reduce
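The three steps above can be sketched as a small in-memory simulation. This is only an illustration of the data flow, not Hadoop's API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) and the word-count example are assumptions introduced here, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Map: apply the mapper to each record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Reduce: collapse each key's list of values into one result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Chain the three steps in order (hypothetical helper)."""
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word in the line.
    return [(word.lower(), 1) for word in line.split()]


def word_count_reducer(word, counts):
    # Sum the 1s emitted for this word.
    return sum(counts)


if __name__ == "__main__":
    print(run_job(["big data big", "Data"], word_count_mapper, word_count_reducer))
```

In a real cluster the map and reduce calls would run on different machines, and the shuffle would move data over the network between them.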
Map-Reduce continued
Map Shuffle Reduce
What is Hadoop?
Apache Hadoop is a framework that
allows for the distributed processing
of large data sets across clusters of
commodity computers using a
simple programming model.
Hadoop core components
• Namenode
• Datanode
• Client
• User
• Job tracker
• Task tracker
Namenode
The NameNode maintains the namespace tree and the mapping of
blocks to DataNodes. A cluster may contain hundreds or even
thousands of DataNodes.
The Secondary NameNode periodically merges the NameNode's edit log into
the filesystem image and saves the resulting checkpoint to disk. However,
it is NOT a substitute for the NameNode.
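As a rough sketch of the two pieces of metadata described above, the toy class below keeps a file-to-blocks map and a block-to-DataNodes map. The class name `ToyNameNode`, its methods, and the flat dictionary standing in for the namespace tree are all simplifying assumptions made for illustration; they are not Hadoop classes.

```python
class ToyNameNode:
    """Hypothetical in-memory model of NameNode metadata (not Hadoop code)."""

    def __init__(self):
        # Namespace: file path -> ordered list of block IDs.
        # A flat dict is used here instead of a real directory tree.
        self.namespace = {}
        # Block map: block ID -> set of DataNode names holding a replica.
        self.block_locations = {}

    def add_file(self, path, block_ids):
        """Record which blocks make up a file, in order."""
        self.namespace[path] = list(block_ids)

    def report_block(self, block_id, datanode):
        """Record that a DataNode holds a replica of a block."""
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, path):
        """Return [(block ID, sorted DataNode names)] for a file."""
        return [
            (block_id, sorted(self.block_locations.get(block_id, ())))
            for block_id in self.namespace[path]
        ]


if __name__ == "__main__":
    nn = ToyNameNode()
    nn.add_file("/logs/a.txt", ["blk_1", "blk_2"])
    nn.report_block("blk_1", "dn1")
    nn.report_block("blk_1", "dn2")
    nn.report_block("blk_2", "dn3")
    print(nn.locate("/logs/a.txt"))
```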
Datanode
On startup, a DataNode connects to the NameNode, retrying until that
service comes up. It then responds to requests from the NameNode for
filesystem operations.
Client applications can talk directly to a DataNode, once the NameNode has
provided the location of the data.
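The interaction described above, where the NameNode supplies only locations and the DataNodes serve the bytes, might look roughly like the sketch below. `ToyDataNode`, `read_file`, and the shape of `namenode_index` are hypothetical names chosen for this example and do not correspond to the real HDFS client API.

```python
class ToyDataNode:
    """Hypothetical DataNode that stores block contents in memory."""

    def __init__(self):
        self.blocks = {}  # block ID -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(path, namenode_index, datanodes):
    """Read a file by contacting DataNodes directly.

    namenode_index: path -> list of (block ID, [DataNode names]); this
        stands in for the location answer given by the NameNode.
    datanodes: DataNode name -> ToyDataNode for nodes that are reachable.
    """
    data = b""
    for block_id, hosts in namenode_index[path]:
        # Try each listed replica in turn until one serves the block.
        for host in hosts:
            node = datanodes.get(host)
            if node is not None and block_id in node.blocks:
                data += node.read_block(block_id)
                break
        else:
            raise IOError(f"no reachable replica for {block_id}")
    return data


if __name__ == "__main__":
    dn1, dn2 = ToyDataNode(), ToyDataNode()
    dn1.blocks["blk_1"] = b"hello "
    dn2.blocks["blk_2"] = b"world"
    index = {"/f": [("blk_1", ["dn0", "dn1"]), ("blk_2", ["dn2"])]}
    print(read_file("/f", index, {"dn1": dn1, "dn2": dn2}))
```

Note that the file contents never pass through the NameNode in this sketch; it is consulted only for the block locations.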
HDFS client
User applications access the filesystem using the HDFS client. A client
performs three main operations:
• Creating a new file
• File read
• File write
Creating a new file
File read
HDFS implements a single-writer,
multiple-reader model. That is,
only one client can write to a file
at a time, but many clients can
read it in parallel.
File write
An HDFS file consists of blocks.
When there is a need for a new
block, the NameNode allocates
a block with a unique block ID
and determines a list of
DataNodes to host replicas of
the block.
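A minimal sketch of this allocation step follows. The class `ToyBlockAllocator` is hypothetical, the default of three replicas is an assumption not stated in the slides, and the round-robin choice of DataNodes is a placeholder for illustration only; it is not presented as the placement policy HDFS actually uses.

```python
import itertools


class ToyBlockAllocator:
    """Hypothetical allocator: unique block IDs plus a list of replica hosts."""

    def __init__(self, datanodes, replication=3):
        if not datanodes:
            raise ValueError("at least one DataNode is required")
        self.datanodes = list(datanodes)
        self.replication = replication  # assumed replica count
        self._next_id = itertools.count(1)
        self._cursor = 0  # where the next round-robin pick starts

    def allocate_block(self):
        """Return (unique block ID, DataNodes chosen to host replicas)."""
        block_id = f"blk_{next(self._next_id)}"
        count = min(self.replication, len(self.datanodes))
        targets = [
            self.datanodes[(self._cursor + i) % len(self.datanodes)]
            for i in range(count)
        ]
        # Advance the start position so load spreads across the nodes.
        self._cursor = (self._cursor + 1) % len(self.datanodes)
        return block_id, targets


if __name__ == "__main__":
    allocator = ToyBlockAllocator(["dn1", "dn2", "dn3", "dn4"])
    print(allocator.allocate_block())
    print(allocator.allocate_block())
```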
Job tracker and task tracker
Hadoop ecosystem
• PIG
• HIVE
• MAHOUT
A Sample Program
The Output
Why Anagrams?
• Started out as a simple relaxation game, finding anagrams in
sentences
• Games and Puzzles like Scrabble
• Ciphers, such as permutation and transposition ciphers
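The deck's sample program is not reproduced in this text, so the following is only a guess at how anagram finding could be phrased as a Map-Reduce job. It assumes the common approach of using a word's sorted letters as the key; the function names are hypothetical and the job runs in memory rather than on Hadoop.

```python
from collections import defaultdict


def anagram_mapper(word):
    """Map: key each word by its letters in sorted order."""
    key = "".join(sorted(word.lower()))
    return [(key, word)]


def anagram_reducer(key, words):
    """Reduce: keep a group only if it holds at least two distinct words."""
    unique = sorted(set(words))
    return unique if len(unique) > 1 else None


def find_anagrams(words):
    """Run map, shuffle, and reduce over a list of words in memory."""
    groups = defaultdict(list)
    for word in words:
        for key, value in anagram_mapper(word):
            groups[key].append(value)  # shuffle: group values by key
    results = {}
    for key, values in groups.items():
        reduced = anagram_reducer(key, values)
        if reduced is not None:
            results[key] = reduced
    return results


if __name__ == "__main__":
    print(find_anagrams(["listen", "silent", "enlist", "google", "tinsel"]))
```

Words that are anagrams of each other share the same sorted-letter key, so the shuffle step brings them together without any pairwise comparison.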
Future scope
Given Hadoop's wide range of applications, we plan to explore certain
graph-searching techniques that would be much easier to implement
with the help of the Map-Reduce engine.
References
• Introduction to Hadoop: Welcome to Apache
https://hadoop.apache.org/
• Cloudera Documentation: Usage
http://www.cloudera.com/content/cloudera/en/documentation/hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_usage.html
• Edureka: Anatomy of a Map-Reduce Job
http://www.edureka.co/blog/anatomy-of-a-mapreduce-job-in-apache-hadoop/
• Stack Overflow: Explain Map-Reduce Simply
http://stackoverflow.com/questions/28982/please-explain-mapreduce-simply
Thank you
