Hadoop & HDFS
version 1.0

File & Content Solutions
What is Hadoop?

•  Built and distributed as a project of the Apache Software
    Foundation.

•  Hadoop ecosystem:
   •  Common: a set of components and interfaces for distributed
       file systems and general I/O.
   •  Avro: a serialization system for efficient, cross-language RPC
       and persistent data storage.
   •  MapReduce: a distributed data processing model and execution
       environment that runs on large clusters of commodity machines.
   •  HDFS: a distributed file system that runs on large clusters of
       commodity hardware.


Common Terms in Hadoop HDFS

•  Name node: manages the file system namespace. It maintains the
    file system tree and the metadata for all the files and
    directories in the tree. This information is stored persistently
    on the local disk in the form of two files: the namespace image
    and the edit log.

•  Data nodes: the workhorses of the file system. They store and
    retrieve blocks when they are told to (by clients or the name
    node), and they report back to the name node periodically with
    lists of the blocks that they are storing.
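
As a rough illustration of the two roles described above, the following Python sketch models a name node that keeps only metadata and data nodes that hold blocks and report them. The class and method names (ToyNameNode, ToyDataNode, block_report and so on) are hypothetical and heavily simplified; this is not the HDFS API, and networking, heartbeats and replication are left out.

```python
# Toy model of the name node / data node split. All names are hypothetical;
# real HDFS uses RPC, heartbeats and replication, which are omitted here.

class ToyDataNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}              # block id -> bytes stored on this node

    def store(self, block_id, data):
        self.blocks[block_id] = data

    def retrieve(self, block_id):
        return self.blocks[block_id]

    def block_report(self):
        # Periodic report: the list of block ids this node currently holds.
        return sorted(self.blocks)


class ToyNameNode:
    def __init__(self):
        self.namespace = {}           # file path -> ordered list of block ids
        self.locations = {}           # block id -> set of data node ids

    def add_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def receive_report(self, node_id, block_ids):
        # In this sketch, block locations are learned only from reports.
        for block_id in block_ids:
            self.locations.setdefault(block_id, set()).add(node_id)

    def locate(self, path):
        # For each block of the file, list the nodes known to hold it.
        return [(b, sorted(self.locations.get(b, ()))) for b in self.namespace[path]]


name_node = ToyNameNode()
dn1, dn2 = ToyDataNode("dn1"), ToyDataNode("dn2")
dn1.store("blk_1", b"first half")
dn2.store("blk_2", b"second half")
name_node.add_file("/logs/app.log", ["blk_1", "blk_2"])
for dn in (dn1, dn2):
    name_node.receive_report(dn.node_id, dn.block_report())

print(name_node.locate("/logs/app.log"))
```

The point of the sketch is the division of labour: the name node answers "which blocks, on which nodes", while the bytes themselves stay on the data nodes.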


Common Terms in Hadoop HDFS

•  Secondary name node: its main role is to periodically merge the
    namespace image with the edit log to prevent the edit log from
    becoming too large. The secondary name node usually runs on a
    separate physical machine.

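
The periodic merge of the namespace image with the edit log can be pictured as replaying logged operations onto a snapshot and then starting a fresh log. The sketch below is a minimal illustration under that assumption; the operation names ("mkdir", "create", "delete") and the set-based image are hypothetical and do not reflect the real HDFS file formats.

```python
# Toy checkpoint: replay an edit log onto a namespace image, then start a
# fresh log. Operation names and data structures are hypothetical.

def apply_edit(image, edit):
    op, path = edit
    if op in ("mkdir", "create"):
        image.add(path)
    elif op == "delete":
        image.discard(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(image, edit_log):
    """Return a new image with every logged edit applied, plus an empty log."""
    merged = set(image)
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


image = {"/", "/data"}
edit_log = [("create", "/data/a.txt"), ("mkdir", "/tmp"), ("delete", "/data/a.txt")]

image, edit_log = checkpoint(image, edit_log)
print(sorted(image))
print(edit_log)
```

After the checkpoint the image already contains every change, so the log that must be replayed at the next start-up is short again.
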
Hadoop Distributed File System (HDFS)

•  HDFS is a file system designed for storing very large files with
    streaming data access patterns, running on clusters of commodity
    hardware.

•  HDFS has a permissions model for files and directories that is
    much like POSIX (Portable Operating System Interface).
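
To make the POSIX-like permissions model concrete, here is a small Python sketch of an owner/group/other check against read, write and execute bits. The function name may_access and the user and group names are hypothetical; the sketch ignores superusers and ACLs and is not how HDFS itself is implemented.

```python
import stat

def may_access(mode, owner, group, user, user_groups, want):
    """Check one of 'r', 'w', 'x' against POSIX-style permission bits."""
    owner_bit, group_bit, other_bit = {
        "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
        "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
        "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
    }[want]
    if user == owner:                 # the owner class is checked first
        return bool(mode & owner_bit)
    if group in user_groups:          # then the group class
        return bool(mode & group_bit)
    return bool(mode & other_bit)     # everyone else


mode = 0o640                          # owner: read/write, group: read, others: none
print(stat.filemode(stat.S_IFREG | mode))
print(may_access(mode, "alice", "analysts", "alice", {"analysts"}, "w"))
print(may_access(mode, "alice", "analysts", "bob", {"analysts"}, "w"))
print(may_access(mode, "alice", "analysts", "carol", {"guests"}, "r"))
```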


Writing data into Hadoop

Reading data from HDFS




MapReduce

•  "Map" step: the master node takes the input, divides it into
    smaller sub-problems, and distributes them to worker nodes. A
    worker node may do this again in turn, leading to a multi-level
    tree structure. Each worker node processes its smaller problem
    and passes the answer back to its master node.

•  "Reduce" step: the master node then collects the answers to all
    the sub-problems and combines them to form the output, which is
    the answer to the problem it was originally trying to solve.
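
The map and reduce steps described above can be illustrated with a single-process word count. This is only a sketch of the programming model: it does not use Hadoop, it runs on one machine, and the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical.

```python
from collections import defaultdict

def map_words(line):
    # "Map": turn one input record into (key, value) pairs.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as a framework would do between the two steps.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(key, values):
    # "Reduce": combine all values for one key into a single answer.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())

print(word_count(["Hadoop stores data", "Hadoop processes data"]))
# {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real cluster the map calls would run in parallel on different nodes, and the grouping step would move data between them; the per-record and per-key functions keep the same shape.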




HDFS Storage Solution

•  The DataLogix Hadoop Storage Solution provides:
   •  An enterprise scale-out storage solution for Hadoop workflows.
   •  Native connectivity for Hadoop and ecosystem components:
      •  Hive
      •  HBase
      •  Pig
      •  Mahout
   •  A name node with no single point of failure.
   •  Native N+M protection instead of 3x mirroring.
   •  Support for snapshots, sync and NDMP backup.
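
The difference between 3x mirroring and N+M protection is easiest to see as raw capacity needed per unit of usable data. The sketch below is plain arithmetic; the values N = 8 and M = 2 are assumptions chosen for illustration, since the slides do not state which N+M layout the solution uses, and the function names are hypothetical.

```python
def raw_capacity_replication(usable_tb, copies=3):
    # Every byte is stored `copies` times.
    return usable_tb * copies

def raw_capacity_n_plus_m(usable_tb, n, m):
    # Each stripe holds n data units plus m protection units.
    return usable_tb * (n + m) / n

usable = 100  # TB of user data (illustrative figure)
print(raw_capacity_replication(usable))      # 300
print(raw_capacity_n_plus_m(usable, 8, 2))   # 125.0
```

Under these assumed values, 100 TB of data needs 300 TB of raw disk with three copies and 125 TB with an 8+2 layout, while the 8+2 layout still tolerates the loss of two units per stripe.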

Writing into Hadoop with the DataLogix solution

•  The storage system becomes the name node as well as the data
    node.

•  It provides scalability and protection of the data.

•  The Hadoop cluster no longer has a single point of failure and no
    longer writes multiple 64 MB to 128 MB chunks of data to data
    nodes.
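
For a sense of scale, the following sketch counts how many blocks a file is split into at the two chunk sizes mentioned above, and how many block writes result if each block is stored three times, as with the 3x mirroring referred to earlier in the deck. The 1 GB file size and the function names are assumptions made for illustration.

```python
MB = 1024 * 1024

def block_count(file_size, block_size):
    # Ceiling division: a final partial block still counts as one block.
    return -(-file_size // block_size)

def block_writes(file_size, block_size, replication=3):
    # Each block is written once per replica.
    return block_count(file_size, block_size) * replication

one_gb = 1024 * MB
for block_size in (64 * MB, 128 * MB):
    print(block_size // MB, block_count(one_gb, block_size),
          block_writes(one_gb, block_size))
# 64 16 48
# 128 8 24
```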

Reading Hadoop Data

•  Data is read off the cluster back to the compute nodes.

•  The data nodes are now compute nodes and are independent of the
    data in the Hadoop cluster:
   •  The benefit is that Hadoop hardware can be upgraded without
       the need to migrate data.


More information?

•  More information about the Hadoop storage solutions?
    Please contact us:

      DataLogix
      Phone: +31(0)30-7440710
      E-mail: info@datalogix.nl
      www.datalogix.nl





DataLogix Hadoop Solution
