HDFS Federation


Suresh Srinivas
@suresh_m_s




Agenda

• HDFS Background

• Current Limitations

• Federation Architecture

• Federation Details

• Next Steps
HDFS Architecture

Two main layers
•   Namespace
     ›   Consists of dirs, files and blocks
     ›   Supports create, delete, modify and list operations on files or dirs
•   Block Storage
     ›   Block Management
         • Datanode cluster membership
         • Supports create/delete/modify/get block location operations
         • Manages replication and replica placement
     ›   Storage - provides read and write access to blocks

[Diagram: Namenode (NS, Block Management) above Datanode … Datanode (Storage); side labels: Namespace, Block Storage]

HDFS Architecture…

Implemented as
• Single Namespace Volume
   › Namespace Volume = Namespace + Blocks
• Single namenode with a namespace
   › Entire namespace is in memory
   › Provides Block Management
• Datanodes store block replicas
   › Block files stored on local file system

[Diagram: Namenode (NS, Block Management) above Datanode … Datanode (Storage); side labels: Namespace, Block Storage]

Limitation - Scalability

 Scalability
 •   Storage scales horizontally - namespace doesn’t
 •   Limited number of files, dirs and blocks
     ›   250 million files and blocks at 64GB Namenode heap size
           •   Still a very large cluster
           •   Facebook clusters are sized at ~70 PB storage



 Performance
 •   File system operations throughput limited by a single node
     ›   120K read ops/sec and 6000 write ops/sec
           •   Easily scalable to 20K write ops/sec by code improvements
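
The heap figure above can be turned into a rough per-object cost. The sketch below is only back-of-the-envelope arithmetic: it assumes "250 million files and blocks" means 250 million objects in total, treats 64GB as 64 GiB, and assumes heap use grows linearly with object count. The 1-billion-object extrapolation is a hypothetical input, not a figure from the slide.

```python
# Back-of-the-envelope arithmetic on the slide's Namenode heap figures.
GIB = 1024 ** 3

heap_bytes = 64 * GIB      # 64GB Namenode heap (from the slide, read as GiB)
objects = 250_000_000      # files and blocks held at that heap size (from the slide)

# Average heap cost per namespace object (file or block).
bytes_per_object = heap_bytes / objects


def heap_needed_gib(num_objects, per_object=bytes_per_object):
    """Estimated heap in GiB, assuming memory grows linearly with object count."""
    return num_objects * per_object / GIB


print(f"{bytes_per_object:.0f} bytes of heap per file or block")        # 275
print(f"{heap_needed_gib(1_000_000_000):.0f} GiB for 1 billion objects")  # 256
```

Under these assumptions a single namenode would need roughly four times the heap to hold four times the objects, which is the vertical-scaling limit the slide points at.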



Limitation - Isolation

 Poor Isolation
 •   All the tenants share a single namespace
     ›   Separate volume for tenants is desirable

 •   Lacks separate namespace for different application
     categories or application requirements
     ›   Experimental apps can affect production apps

     ›   Example - HBase could use its own namespace




Limitation – Tight Coupling

Namespace and Block Management are distinct services
• Tightly coupled due to co-location
• Scaling block management independently of the namespace is simpler
• Separating the two also simplifies the namespace and makes it easier to scale


Block Storage could be a generic service
• Namespace is one of the applications to use the service
• Other services can be built directly on Block Storage
   › HBase
   ›   Foreign namespaces



Isolation is a problem for even small clusters

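HDFS Federation

[Diagram: Namespace layer with namenodes NN-1 … NN-k … NN-n serving NS1 … NS k … Foreign NS n; Block Storage layer with one block pool per namespace (Pool 1 … Pool k … Pool n); the block pools are spread over Datanode 1, Datanode 2 … Datanode m as Common Storage]
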
•   Multiple independent Namenodes and Namespace Volumes in a cluster
     ›   Namespace Volume = Namespace + Block Pool
•   Block Storage as generic storage service
     ›   Set of blocks for a Namespace Volume is called a Block Pool
     ›   DNs store blocks for all the Namespace Volumes – no partitioning
Key Ideas & Benefits

[Diagram: HDFS Namespace, Alternate NN Implementation, HBase and MR tmp sitting on top of a shared Storage Service]

•   Distributed Namespace: Partitioned across namenodes
     ›     Simple and Robust due to independent masters
         • Each master serves a namespace volume
         • Preserves namenode stability – little namenode code change
     ›     Scalability – 6K nodes, 100K tasks, 200PB and 1 billion files
•   Block Pools enable generic storage service
     ›     Enables Namespace Volumes to be independent of each other
     ›     Fuels innovation and Rapid development
            • New implementations of file systems and Applications on top of block storage possible
            • New block pool categories – tmp storage, distributed cache, small object storage
•   In future, move Block Management out of namenode to separate set of nodes
     ›     Simplifies namespace/application implementation
            • Distributed namenode becomes significantly simpler
HDFS Federation Details
• Simple design
   › Little change to the Namenode, most changes in Datanode, Config and Tools
   › Core development in 4 months
   › Namespace and Block Management remain in Namenode
      • Block Management could be moved out of namenode in the future

• Little impact on existing deployments
   › Single namenode configuration runs as is

• Datanodes provide storage services for all the namenodes
   › Register with all the namenodes
   › Send periodic heartbeats and block reports to all the namenodes
   › Send block received/deleted for a block pool to corresponding namenode
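
The datanode behaviour listed above can be illustrated with a toy model. This is a minimal in-memory sketch, not HDFS code: the `Namenode` and `Datanode` classes, their method names, and the pool and block identifiers are all hypothetical. It only shows the routing rule, in which registration and heartbeats go to every namenode while a block-received notice goes only to the namenode that owns the block pool.

```python
from collections import defaultdict


class Namenode:
    """Toy namenode owning one namespace volume and its block pool (hypothetical)."""

    def __init__(self, name, pool_id):
        self.name = name
        self.pool_id = pool_id
        self.registered = set()                  # datanodes that registered
        self.heartbeats = defaultdict(int)       # datanode -> heartbeat count
        self.block_locations = defaultdict(set)  # block id -> datanode names

    def register(self, dn_name):
        self.registered.add(dn_name)

    def heartbeat(self, dn_name):
        self.heartbeats[dn_name] += 1

    def block_received(self, dn_name, block_id):
        self.block_locations[block_id].add(dn_name)


class Datanode:
    """Toy datanode storing blocks for every block pool in the cluster."""

    def __init__(self, name, namenodes):
        self.name = name
        self.namenodes = {nn.pool_id: nn for nn in namenodes}
        self.blocks = defaultdict(set)           # pool id -> block ids
        for nn in namenodes:                     # register with all namenodes
            nn.register(name)

    def send_heartbeats(self):
        for nn in self.namenodes.values():       # heartbeat to all namenodes
            nn.heartbeat(self.name)

    def store_block(self, pool_id, block_id):
        self.blocks[pool_id].add(block_id)
        # Only the namenode that owns this block pool is notified.
        self.namenodes[pool_id].block_received(self.name, block_id)


nn1 = Namenode("NN-1", "pool-1")
nn2 = Namenode("NN-2", "pool-2")
dn = Datanode("DN-1", [nn1, nn2])
dn.send_heartbeats()
dn.store_block("pool-1", "blk_1")

print(sorted(nn1.block_locations))  # ['blk_1']
print(sorted(nn2.block_locations))  # []
```

Because each namenode sees only its own pool, the namenodes never need to coordinate with one another, which matches the "independent masters" idea in the deck.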



HDFS Federation Details…
• Cluster Web UI for better manageability
    › Provides cluster summary
    › Namenode list and summary of namenode status
    › Decommissioning status
• Tools
    › Decommissioning works with multiple namespaces
    › Balancer works with multiple namespaces
       • Either Datanode storage or Block Pool storage can be balanced

• Namenode can be added/deleted in Federated cluster
    › No need to restart the cluster
• Single configuration for all the nodes in the cluster
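
To make "single configuration for all the nodes" concrete, the sketch below renders one shared configuration fragment that lists every namenode. The slides do not name any properties. `dfs.nameservices` and `dfs.namenode.rpc-address.<id>` are the names used in the Hadoop 2.x federation documentation and are assumed here; the release discussed in the talk may use different names. The nameservice IDs and host names are hypothetical.

```python
import xml.etree.ElementTree as ET


def federation_config(namenodes):
    """Build a <configuration> element listing every nameservice and its RPC address.

    `namenodes` maps a nameservice ID to a host:port string (both hypothetical here).
    """
    conf = ET.Element("configuration")

    def add(name, value):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    # One property lists all nameservices; one more per namenode gives its address.
    add("dfs.nameservices", ",".join(namenodes))
    for ns_id, address in namenodes.items():
        add(f"dfs.namenode.rpc-address.{ns_id}", address)
    return conf


conf = federation_config({
    "ns1": "nn-host1.example.com:8020",
    "ns2": "nn-host2.example.com:8020",
})
print(ET.tostring(conf, encoding="unicode"))
```

The same file can then be distributed to every namenode and datanode, so a datanode learns the full namenode list from one place.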
Managing Namespaces

[Diagram: client-side mount-table rooted at / with entries data, project, home and tmp, mapped onto namespaces NS1, NS2, NS3 and NS4]

•   Federation has multiple namespaces – Don’t you need a single global namespace?
     ›   Key is to share the data and the names used to access the data
•   A global namespace is one way to do that
•   Client-side mount table is another way to share.
     ›   Shared mount-table => “global” shared view
     ›   Personalized mount-table => per-application view
         • Share the data that matter by mounting it
•   Client-side implementation of mount tables
     ›   No single point of failure
     ›   No hotspot for root and top level directories
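
A client-side mount table can be sketched as a longest-prefix lookup performed entirely in the client. This is an illustrative model, not the HDFS implementation: the `resolve` function is hypothetical, and the pairing of mount points with namespaces (for example `/home` with NS3) is an assumption, since the slide's diagram only shows the two sets of labels.

```python
def resolve(mount_table, path):
    """Return (namespace, path inside that namespace) using longest-prefix match."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so /data does not capture /database.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    remainder = path[len(best):] or "/"
    return mount_table[best], remainder


# Shared mount table: every client using it sees the same "global" view.
shared = {"/data": "NS1", "/project": "NS2", "/home": "NS3", "/tmp": "NS4"}

# Personalized mount table: an application mounts only the data it needs.
personal = {"/data": "NS1", "/tmp": "NS4"}

print(resolve(shared, "/home/alice/report.txt"))  # ('NS3', '/alice/report.txt')
print(resolve(personal, "/data"))                 # ('NS1', '/')
```

Since the lookup runs in each client, no server sits on the path for root or top-level directories, which is why the slide lists no single point of failure and no hotspot.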
Next Steps

• Complete separation of namespace and block
  management layers
  › Block storage as generic service

• Partial namespace in memory for further scalability
• Move partial namespace from one namenode to
  another
  › Namespace operation - no data copy
Next Steps…
• Namenode as a container for namespaces
   › Lots of small namespaces
     •   Chosen per user/tenant/data feed
     •   Mount tables for unified namespace
         •    Can be managed by a central volume server
   › Move namespace from one container to another for balancing
• Combined with partial namespace
   › Choose number of namenodes to match
     •   Sum of (Namespace working set)
     •   Sum of (Namespace throughput)

[Diagram: a set of Namenodes, each a container for several namespaces, above Datanode … Datanode]
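
The sizing rule above can be written as a small calculation. This is a sketch under stated assumptions: the function name, the per-namespace numbers and the per-namenode capacities are hypothetical. The 64 and 120,000 defaults borrow the single-namenode heap size and read throughput quoted earlier in the deck and are used purely as example capacities.

```python
import math


def namenodes_needed(namespaces, heap_per_namenode_gb=64, ops_per_namenode=120_000):
    """Number of namenodes needed to cover both total working set and total throughput.

    `namespaces` is a list of (working_set_gb, ops_per_sec) pairs, one per namespace.
    """
    total_working_set = sum(ws for ws, _ in namespaces)
    total_ops = sum(ops for _, ops in namespaces)
    by_memory = math.ceil(total_working_set / heap_per_namenode_gb)
    by_throughput = math.ceil(total_ops / ops_per_namenode)
    # Whichever resource is scarcer decides the count; always keep at least one.
    return max(by_memory, by_throughput, 1)


# Hypothetical namespaces: (working set in GB, operations per second).
namespaces = [(40, 100_000), (50, 150_000), (30, 80_000)]
print(namenodes_needed(namespaces))  # 3
```

In this example memory alone would call for two namenodes (120 GB over 64 GB each), but throughput calls for three (330,000 ops/sec over 120,000 each), so throughput decides.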




Thank You

More information
1. HDFS-1052 – HDFS Scalability with multiple namenodes
2. An Introduction to HDFS Federation –
  https://hortonworks.com/an-introduction-to-hdfs-federation/

Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
