Yuval Carmel
Tel-Aviv University
"Advanced Topics in Storage Systems" - Spring 2013
 About & Keywords
 Motivation & Purpose
 Assumptions
 Architecture overview & Comparison
 Measurements
 How does it fit in?
 The Future
HDFS Vs. GFS, "Advanced Topics in
Storage Systems" - Spring 2013
 The Google File System - Sanjay
Ghemawat, Howard Gobioff, and Shun-Tak
Leung, {authors}@Google.com, SOSP’03
 The Hadoop Distributed File System -
Konstantin Shvachko, Hairong Kuang, Sanjay
Radia, Robert Chansler, Yahoo!, Sunnyvale, California,
USA, {authors}@Yahoo-Inc.com, IEEE MSST 2010
 GFS
 HDFS
 Apache Hadoop – A framework for running
applications on large clusters of commodity
hardware. It implements the MapReduce
computational paradigm and uses HDFS to
store data on its compute nodes.
 MapReduce – A programming model for
processing large data sets with a parallel,
distributed algorithm.
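As a rough illustration of the MapReduce model, the single-process Python sketch below runs the three phases (map, shuffle, reduce) on a word count. The function names are illustrative only and are not Hadoop APIs; a real job would spread the phases across the cluster's nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["GFS stores chunks", "HDFS stores blocks"])
```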
Early days (at Stanford)
~1998
 Today…
 GFS – Implemented specifically to meet the
rapidly growing demands of Google’s data
processing needs.
 HDFS – Implemented to run Hadoop’s MapReduce
applications. Created as an open-source
framework for use by different clients with
different needs.
 Built from many inexpensive commodity components
that often fail.
 Millions of files, multi-GB files are common
 Two types of reads
◦ Large streaming reads
◦ Small random reads (usually batched together)
 Once written, files are seldom modified
◦ Random writes are supported but do not have to be
efficient.
 Concurrent writes (many clients appending to the same file)
 High sustained bandwidth is more important
than low latency
 File Structure - GFS
◦ Divided into 64 MB chunks
◦ Chunk identified by 64-bit handle
◦ Chunks replicated (default 3 replicas)
◦ Chunks divided into 64KB blocks
◦ Each block has a 32-bit checksum
 File Structure – HDFS
◦ Divided into blocks (128 MB is typical; the size is configurable per file)
◦ A DataNode stores each block replica as 2 files
 One for the data
 One for the checksums & generation stamp.
[Figure: a file is divided into chunks, and each chunk into blocks]
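The GFS sizes above imply some simple arithmetic, sketched here in Python. The helper names (`locate`, `chunks_needed`) are made up for this illustration and do not come from GFS.

```python
CHUNK_SIZE = 64 * 1024 * 1024      # 64 MB GFS chunk
CHECKSUM_BLOCK = 64 * 1024         # 64 KB checksummed block
CHECKSUM_BYTES = 4                 # 32-bit checksum per block

def locate(offset):
    """Translate a file byte offset into (chunk index, block index within chunk)."""
    chunk_index, offset_in_chunk = divmod(offset, CHUNK_SIZE)
    return chunk_index, offset_in_chunk // CHECKSUM_BLOCK

def chunks_needed(file_size):
    """Number of chunks a file of file_size bytes occupies (ceiling division)."""
    return -(-file_size // CHUNK_SIZE)

# Each chunk holds 1024 checksummed blocks, so its checksums take 4 KB.
blocks_per_chunk = CHUNK_SIZE // CHECKSUM_BLOCK
checksum_bytes_per_chunk = blocks_per_chunk * CHECKSUM_BYTES
```

For example, a 1 GB file occupies 16 chunks, and the checksum data for a whole chunk is small enough to keep in memory.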
 Data Flow (I/O operations) – GFS
◦ The master grants a chunk lease to one replica, the primary (60 sec. default)
◦ Client read –
 Sends request to master
 Caches the list of replica locations for a limited time.
 Reads the data directly from one of the chunkservers.
◦ Client write –
 1-2: client obtains replica locations and the identity of the primary replica from the master
 3: client pushes data to all replicas
(stored in an LRU buffer by the chunkservers holding replicas)
 4: client issues the write request to the primary
 5: primary assigns a serial order, applies the write, and forwards the request to the secondary replicas
 6: primary receives replies from the secondary replicas
 7: primary replies to the client
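The write steps above can be sketched as a small in-memory simulation. This is a toy model under stated assumptions: the class names, the 4-slot buffer, and the absence of failures and lease handling are all simplifications made for illustration, not GFS code.

```python
from collections import OrderedDict

class ChunkServer:
    """Holds pushed data in a small LRU buffer until a write request applies it."""
    def __init__(self, name, buffer_slots=4):
        self.name = name
        self.buffer = OrderedDict()   # data_id -> bytes, in LRU order
        self.buffer_slots = buffer_slots
        self.applied = []             # (serial, data) in the order applied

    def push(self, data_id, data):                # step 3
        self.buffer[data_id] = data
        self.buffer.move_to_end(data_id)
        while len(self.buffer) > self.buffer_slots:
            self.buffer.popitem(last=False)       # evict least recently used

    def apply(self, serial, data_id):
        self.applied.append((serial, self.buffer.pop(data_id)))
        return True

class Primary(ChunkServer):
    """The replica holding the lease; it alone picks the mutation order."""
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):                     # steps 4-7
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)                                  # step 5
        acks = [s.apply(serial, data_id) for s in self.secondaries]  # steps 5-6
        return all(acks)                                             # step 7

def client_write(primary, data_id, data):
    for server in [primary, *primary.secondaries]:   # step 3: push data first
        server.push(data_id, data)
    return primary.write(data_id)                    # step 4: then request the write

secondaries = [ChunkServer("cs2"), ChunkServer("cs3")]
primary = Primary("cs1", secondaries)
ok_a = client_write(primary, "a", b"first")
ok_b = client_write(primary, "b", b"second")
```

Because only the primary hands out serial numbers, every replica applies the mutations in the same order.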
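 Data Flow (I/O operations) – HDFS
◦ No primary-replica lease as in GFS: the NameNode picks the DataNodes for each block and the client organizes them into a pipeline. The writing client does hold a lease on the file (single writer).
◦ Exposes the locations of a file’s blocks (enabling
applications like MapReduce to schedule tasks close to the data).
◦ Client read & write –
 Similar to GFS.
 Mutation order is handled
with a client-constructed
pipeline of DataNodes.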
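A minimal sketch of the client-constructed pipeline follows, assuming the list of DataNodes has already been chosen for the block. The class, the 4-byte packet size, and the ack format are invented for illustration; real HDFS packets are far larger and acks travel asynchronously.

```python
class DataNode:
    def __init__(self, name):
        self.name = name
        self.packets = []       # packets stored for the block, in arrival order
        self.downstream = None  # next node in the pipeline, if any

    def receive(self, seqno, payload):
        """Store a packet, forward it downstream, and return the acks collected."""
        self.packets.append((seqno, payload))
        acks = self.downstream.receive(seqno, payload) if self.downstream else []
        return [self.name] + acks

def build_pipeline(nodes):
    """Chain the DataNodes chosen for a block; the client talks only to the first."""
    for upstream, downstream in zip(nodes, nodes[1:]):
        upstream.downstream = downstream
    return nodes[0]

def write_block(nodes, data, packet_size=4):
    head = build_pipeline(nodes)
    for seqno, start in enumerate(range(0, len(data), packet_size)):
        acks = head.receive(seqno, data[start:start + packet_size])
        if len(acks) != len(nodes):
            raise IOError("pipeline ack missing")
    return [b"".join(payload for _, payload in n.packets) for n in nodes]

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
copies = write_block(nodes, b"hello pipeline")
```

Every packet passes through the nodes in the same order, so all replicas see identical data without a primary replica deciding the order.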
 Replica management – GFS & HDFS
◦ Placement policy
 Minimizing write cost.
 Reliability & availability – replicas are spread across different racks.
 No more than one replica on one node, and no more
than two replicas in the same rack (HDFS).
 Network bandwidth utilization – the first replica is placed on
the writer’s node.
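The HDFS placement constraints can be checked with a few lines of Python. `placement_ok` encodes the two rules listed above; `default_placement` additionally assumes the layout described in the HDFS paper (second and third replicas on two nodes of one other rack). The rack and node names are hypothetical.

```python
from collections import Counter

def placement_ok(replicas):
    """replicas: list of (rack, node) pairs for one block."""
    per_node = Counter(replicas)
    per_rack = Counter(rack for rack, _ in replicas)
    one_per_node = all(count == 1 for count in per_node.values())
    two_per_rack = all(count <= 2 for count in per_rack.values())
    return one_per_node and two_per_rack

def default_placement(writer, topology):
    """First replica on the writer's node; two more on distinct nodes of another rack."""
    writer_rack, _ = writer
    for rack, rack_nodes in topology.items():
        if rack != writer_rack and len(rack_nodes) >= 2:
            return [writer, (rack, rack_nodes[0]), (rack, rack_nodes[1])]
    raise ValueError("no remote rack with two nodes")

topology = {"r1": ["n1", "n2"], "r2": ["n3", "n4"]}
placement = default_placement(("r1", "n1"), topology)
```

This layout costs one inter-rack transfer per block while still surviving the loss of a whole rack.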
 Data balancing – GFS
◦ New replicas are placed on chunkservers with below-average
disk space utilization
◦ Master rebalances replicas periodically
 Data balancing (The Balancer) – HDFS
◦ Block placement on write ignores disk space utilization (this avoids a
bottleneck where new data piles up on a small subset of DataNodes).
◦ The Balancer runs as an application in the cluster (started by the cluster admin).
◦ Minimizes inter-rack data copying while balancing.
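As a hedged sketch of what "balanced" means for the HDFS Balancer: the HDFS paper defines a cluster as balanced when every DataNode's utilization is within a threshold of the cluster-wide utilization. The function below applies that definition; the 10% threshold and the node figures are assumed values for illustration.

```python
def unbalanced_nodes(nodes, threshold=0.10):
    """nodes: {name: (used_bytes, capacity_bytes)}.

    Returns (over, under): nodes whose utilization differs from the
    cluster-wide utilization by more than the threshold.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster = total_used / total_capacity
    over = [n for n, (u, c) in nodes.items() if u / c - cluster > threshold]
    under = [n for n, (u, c) in nodes.items() if cluster - u / c > threshold]
    return over, under

over, under = unbalanced_nodes({"dn1": (90, 100), "dn2": (50, 100), "dn3": (10, 100)})
```

A balancer would then move block replicas from the nodes in `over` to the nodes in `under`, preferring moves that stay within a rack.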
 GFS’s consistency model
◦ Write
 Large or cross-chunk writes are divided by the client into individual writes.
◦ Record Append
 GFS’s recommendation (preferred over write).
 Client specifies only the data (no offset).
 GFS chooses the offset and returns it to the client.
 No locks or client synchronization are needed.
 Atomic, with at-least-once semantics.
 Client retries failed operations.
 Regions of successful appends are defined, but intervening regions may be undefined.
◦ Application Safeguard
 Insert checksums in record
headers to detect fragments.
 Insert sequence numbers to
detect duplicates.
[Figure: primary and replica contents for consistent, defined, and inconsistent regions]
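The application safeguards can be sketched as a reader that validates each record. The record layout used here (length, CRC32, sequence number, payload) is a hypothetical format chosen for the example; GFS leaves the format to the application.

```python
import struct
import zlib

HEADER = struct.Struct(">IIQ")   # payload length, CRC32 of payload, sequence number

def encode(seqno, payload):
    return HEADER.pack(len(payload), zlib.crc32(payload), seqno) + payload

def read_valid(region):
    """Yield unique, intact payloads; skip padding, fragments, and duplicates."""
    seen, pos = set(), 0
    while pos + HEADER.size <= len(region):
        length, crc, seqno = HEADER.unpack_from(region, pos)
        start = pos + HEADER.size
        payload = region[start:start + length]
        if length > 0 and len(payload) == length and zlib.crc32(payload) == crc:
            if seqno not in seen:        # sequence number filters retried appends
                seen.add(seqno)
                yield payload
            pos = start + length
        else:
            pos += 1                     # resynchronize past padding or a fragment

region = (
    encode(1, b"alpha")
    + b"\x00" * 7            # padding
    + encode(1, b"alpha")    # duplicate left by a client retry
    + b"\xff" * 10           # garbage fragment
    + encode(3, b"gamma")
)
records = list(read_valid(region))
```

The checksum rejects padding and fragments, and the sequence number turns at-least-once appends into exactly-once reads for the application.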
 GFS micro benchmark
◦ Configuration
 one master, two master replicas, 16 chunkservers, and 16 clients. All
the machines are configured with dual 1.4 GHz PIII processors, 2 GB of
memory, two 80 GB 5400 rpm disks, and a 100 Mbps full-duplex
Ethernet connection to an HP 2524 switch. All 19 GFS server machines
are connected to one switch, and all 16 client machines to the other.
The two switches are connected with a 1 Gbps link.
◦ Reads
 N clients read simultaneously from the file system. Each
client reads a randomly selected 4 MB region from a 320 GB
file set. This is repeated 256 times so that each client ends
up reading 1 GB of data.
◦ Writes
 N clients write simultaneously to N distinct files
◦ Record append
 N clients append simultaneously to a single file
Total network limit (Read) = 125 MB/s (the 1 Gbps link between the two switches)
Network limit per client (Read) = 12.5 MB/s (each client’s 100 Mbps link)
Total network limit (Write) = 67 MB/s (each byte is written to three of the
16 chunkservers, each with a 12.5 MB/s input link)
Record append limit = 12.5 MB/s (all clients append to the same chunk, so the chunkservers holding it bound the rate)
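These limits follow from the benchmark configuration by simple arithmetic, reproduced here in Python. The variable names are illustrative.

```python
def mbps_to_MBps(mbps):
    """Convert a link speed in megabits/s to megabytes/s."""
    return mbps / 8

inter_switch_limit = mbps_to_MBps(1000)   # 1 Gbps link between the switches
per_machine_limit = mbps_to_MBps(100)     # 100 Mbps link per machine

chunkservers, replicas = 16, 3
# Every client byte lands on 3 chunkservers, so the aggregate chunkserver
# input bandwidth is divided by the replication factor.
write_limit = chunkservers * per_machine_limit / replicas

# All appends to one file go to the chunkservers holding its last chunk,
# so a single machine's link bounds the rate however many clients append.
append_limit = per_machine_limit
```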
 Real world clusters (at Google)
*Does not show chunk fetch latency in master (30 to 60 sec)
 HDFS DFSIO benchmark
◦ 3500 Nodes.
◦ Uses the MapReduce framework.
◦ Read & Write rates
 DFSIO Read: 66 MB/s per node.
 DFSIO Write: 40 MB/s per node.
 Busy cluster read: 1.02 MB/s per node.
 Busy cluster write: 1.09 MB/s per node.
GFS / HDFS
MapReduce / Hadoop
BigTable / HBase
Sawzall / Pig / Hive
 Built for “real-time”, low-latency operations instead of big batch operations.
 Smaller chunks (1 MB)
 Constant update
 Eliminates the “single point of failure” in GFS (the master)
[Figure labels: Colossus, Caffeine, BigTable]
 Real secondary (“hot” backup) NameNode –
Facebook’s AvatarNode
(Already in production).
 Low-latency MapReduce.
 Inter-cluster cooperation.