GOOGLE FILE SYSTEM 
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 
Presented By – Ankit Thiranh
OVERVIEW 
• Introduction 
• Architecture 
• Characteristics 
• System Interaction 
• Master Operation, Fault Tolerance and Diagnosis 
• Measurements 
• Some real-world clusters and their performance
INTRODUCTION 
• Google – large amount of data 
• Needs a good distributed file system to store and process this data 
• Solution: Google File System 
• GFS is : 
• Large 
• Distributed 
• Highly fault tolerant system
ASSUMPTIONS 
• The system is built from many inexpensive commodity components that often fail. 
• The system stores a modest number of large files. 
• Primarily two kinds of reads: large streaming reads and small random reads. 
• Many large sequential writes append data to files. 
• The system must efficiently implement well-defined semantics for multiple clients that 
concurrently append to the same file. 
• High sustained bandwidth is more important than low latency.
ARCHITECTURE
CHARACTERISTICS 
• Single master 
• Chunk size 
• Metadata 
• In-Memory Data structures 
• Chunk Locations 
• Operation Log 
• Consistency Model (figure) 
• Guarantees by GFS 
• Implications for Applications 
                      Write                      Record Append 
Serial success        defined                    defined interspersed with inconsistent 
Concurrent successes  consistent but undefined   defined interspersed with inconsistent 
Failure               inconsistent               inconsistent 
File Region State After Mutation
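To make the "Metadata" and "In-Memory Data Structures" bullets above concrete, here is a minimal Python sketch of the master's state, assuming the structures described on this slide (namespace, file-to-chunk mapping, chunk locations) plus the fixed 64 MB chunk size mentioned in the speaker notes. The names and types are illustrative, not the actual GFS implementation.

# Hypothetical sketch of the master's in-memory metadata; illustrative only.
from dataclasses import dataclass, field
from typing import Dict, List

CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size

@dataclass
class ChunkInfo:
    handle: int                # immutable, globally unique 64-bit chunk handle
    version: int               # chunk version number (used to detect stale replicas)
    locations: List[str] = field(default_factory=list)  # chunkservers holding a replica;
                               # not persisted: polled at startup, refreshed via heartbeats

@dataclass
class FileInfo:
    chunk_handles: List[int] = field(default_factory=list)  # file -> ordered chunk handles

class MasterMetadata:
    def __init__(self):
        self.namespace: Dict[str, FileInfo] = {}   # full pathname -> file metadata
        self.chunks: Dict[int, ChunkInfo] = {}     # chunk handle -> chunk info

    def chunk_index(self, offset: int) -> int:
        """Clients translate a byte offset into a chunk index within the file."""
        return offset // CHUNK_SIZE

    def lookup(self, path: str, offset: int) -> ChunkInfo:
        """Resolve (path, offset) to a chunk handle and its replica locations, as in a read."""
        file_info = self.namespace[path]
        handle = file_info.chunk_handles[self.chunk_index(offset)]
        return self.chunks[handle]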
SYSTEM INTERACTION 
• Leases and Mutation Order 
• Data flow 
• Atomic Record appends 
• Snapshot 
Figure 2: Write Control and Data Flow
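A minimal sketch of the write path in Figure 2, assuming the lease and data-flow behavior listed above: the client asks the master for the lease holder, pushes data along the replica chain, then asks the primary to apply the mutation in a serial order. All class and method names here are hypothetical placeholders, not a real API.

# Hypothetical client-side write path; master and chunkserver stubs are assumed.
class WriteClient:
    def __init__(self, master, chunkservers):
        self.master = master               # assumed to expose find_lease_holder(handle)
        self.chunkservers = chunkservers   # dict: chunkserver name -> RPC stub

    def write(self, chunk_handle, offset, data):
        # 1. Control flow: ask the master which replica holds the lease (the primary)
        #    and where the secondary replicas are.
        primary, secondaries = self.master.find_lease_holder(chunk_handle)

        # 2. Data flow: push the data linearly along a chain of chunkservers; each one
        #    forwards to the next (in GFS the chain is ordered by network distance and
        #    pipelined over TCP; the order is simplified here).
        chain = [primary] + list(secondaries)
        self.chunkservers[chain[0]].push_data(chunk_handle, data, forward_to=chain[1:])

        # 3. Control flow: once all replicas hold the data, ask the primary to apply the
        #    write; the primary assigns a serial number and forwards the request to the
        #    secondaries so every replica applies mutations in the same order.
        return self.chunkservers[primary].apply_write(chunk_handle, offset, secondaries)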
MASTER OPERATION 
• Namespace Management and Locking 
• Replica Placement 
• Creation, Re-replication, Rebalancing 
• Garbage Collection 
• Mechanism 
• Discussion 
• Stale Replica Detection
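The "Stale Replica Detection" bullet can be illustrated with a small sketch, assuming the version-number scheme described in the speaker notes: the master bumps a chunk's version whenever it grants a new lease, and any replica reporting an older version is treated as stale. This is an illustrative sketch, not the actual implementation.

# Hypothetical stale-replica detection via chunk version numbers; illustrative only.
class StaleReplicaDetector:
    def __init__(self):
        self.current_version = {}   # chunk handle -> latest version known to the master
        self.stale_replicas = []    # (chunkserver, handle) pairs to reclaim later

    def grant_lease(self, handle):
        # The master increments the chunk version number before granting a new lease.
        self.current_version[handle] = self.current_version.get(handle, 0) + 1
        return self.current_version[handle]

    def on_heartbeat(self, chunkserver, reported):
        # `reported` maps chunk handle -> version stored on that chunkserver.
        for handle, version in reported.items():
            if version < self.current_version.get(handle, version):
                # The replica missed mutations while its server was down: mark it stale.
                self.stale_replicas.append((chunkserver, handle))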
FAULT TOLERANCE AND DIAGNOSIS 
• High Availability 
• Fast Recovery 
• Chunk Replication 
• Master Replication 
• Data Integrity 
• Diagnostic Tools
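A sketch of the "Data Integrity" point: checksums are kept per 64 KB block of a chunk and verified before data is returned to the requester. crc32 is used here only for illustration, since the slide does not name the checksum function; corruption triggers a read from another replica.

# Hypothetical per-block checksumming for chunk data; crc32 is an assumption.
import zlib

BLOCK_SIZE = 64 * 1024  # each chunk is divided into 64 KB checksum blocks

def compute_checksums(chunk_bytes: bytes) -> list[int]:
    return [zlib.crc32(chunk_bytes[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_bytes), BLOCK_SIZE)]

def verified_read(chunk_bytes: bytes, checksums: list[int], block_index: int) -> bytes:
    block = chunk_bytes[block_index * BLOCK_SIZE:(block_index + 1) * BLOCK_SIZE]
    if zlib.crc32(block) != checksums[block_index]:
        # Corruption detected: report it and fall back to another replica.
        raise IOError(f"checksum mismatch in block {block_index}")
    return block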
MEASUREMENTS 
Aggregate Throughputs. Top curves show theoretical limits imposed by the network topology. Bottom curves 
show measured throughputs. They have error bars that show 95% confidence intervals, which are illegible in 
some cases because of low variance in measurements.
REAL WORLD CLUSTERS 
• Two clusters were examined: 
• Cluster A is used for research and development by over a hundred users. 
• Cluster B is used for production data processing, with occasional human intervention. 
• Storage 
• Metadata 
Cluster                    A       B 
Chunkservers               342     227 
Available Disk Space       72 TB   180 TB 
Used Disk Space            55 TB   155 TB 
Number of Files            735 k   737 k 
Number of Dead Files       22 k    232 k 
Number of Chunks           992 k   1550 k 
Metadata at Chunkservers   13 GB   21 GB 
Metadata at Master         48 MB   60 MB 
Characteristics of Two GFS Clusters
PERFORMANCE EVALUATION OF TWO CLUSTERS 
• Read and write rates and Master load 
Cluster A B 
Read Rate (last minute) 583 MB/s 380 MB/s 
Read Rate (last hour) 562 MB/s 384 MB/s 
Read Rate (since start) 589 MB/s 49 MB/s 
Write Rate (last minute) 1 MB/s 101 MB/s 
Write Rate (last hour) 2 MB/s 117 MB/s 
Write Rate (since start) 25 MB/s 13 MB/s 
Master ops (last minute) 325 Ops/s 533 Ops/s 
Master ops (last hour) 381 Ops/s 518 Ops/s 
Master ops (since start) 202 Ops/s 347 Ops/s 
Performance Metrics for Two GFS Clusters
WORKLOAD BREAKDOWN 
• Chunkserver Workload 
Operation     Read          Write         Record Append 
Cluster       X      Y      X      Y      X      Y 
0K            0.4    2.6    0      0      0      0 
1B..1K        0.1    4.1    6.6    4.9    0.2    9.2 
1K..8K        65.2   38.5   0.4    1.0    18.9   15.2 
8K..64K       29.9   45.1   17.8   43.0   78.0   2.8 
64K..128K     0.1    0.7    2.3    1.9    < 0.1  4.3 
128K..256K    0.2    0.3    31.6   0.4    < 0.1  10.6 
256K..512K    0.1    0.1    4.2    7.7    < 0.1  31.2 
512K..1M      3.9    6.9    35.5   28.7   2.2    25.5 
1M..inf       0.1    1.8    1.5    12.3   0.7    2.2 
Operations Breakdown by Size (%) 

Operation     Read          Write         Record Append 
Cluster       X      Y      X      Y      X      Y 
1B..1K        < 0.1  < 0.1  < 0.1  < 0.1  < 0.1  < 0.1 
1K..8K        13.8   3.9    < 0.1  < 0.1  < 0.1  0.1 
8K..64K       11.4   9.3    2.4    5.9    78.0   0.3 
64K..128K     0.3    0.7    0.3    0.3    < 0.1  1.2 
128K..256K    0.8    0.6    16.5   0.2    < 0.1  5.8 
256K..512K    1.4    0.3    3.4    7.7    < 0.1  38.4 
512K..1M      65.9   55.1   74.1   58.0   0.1    46.8 
1M..inf       6.4    28.0   3.3    28.0   53.9   7.4 
Bytes Transferred Breakdown by Operation Size (%)
WORKLOAD BREAKDOWN 
• Master Workload 
Cluster X Y 
Open 26.1 16.3 
Delete 0.7 1.5 
FindLocation 64.3 65.8 
FindLeaseHolder 7.8 13.4 
FindMatchingFiles 0.6 2.2 
All other combined 0.5 0.8 
Master Requests Breakdown by Type (%)
Editor's Notes

  • #6: GFS has a single master, multiple chunkservers, and multiple clients. Files are divided into chunks, each identified by an immutable and globally unique 64-bit chunk handle and stored on multiple chunkservers. The master holds the metadata: the namespace, access control information, the mapping from files to chunks, and the current locations of chunks.
  • #7: Single master: can make sophisticated chunk placement and replication decisions using global knowledge (see the read example). Chunk size: 64 MB; this reduces client-master interaction, makes a client more likely to perform many operations on a given chunk, and reduces metadata size. Metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas, all kept in memory for fast operations. Chunk locations: the master does not keep a persistent record; it polls chunkservers at startup and monitors them with heartbeat messages. Operation log: a history of critical metadata changes. Guarantees: mutations are applied to a chunk in the same order on all replicas, and chunk version numbers detect stale replicas. Consistent: all clients see the same data regardless of which replica they read. Defined: consistent, and clients see what the mutation wrote in its entirety.
  • #8: Mutation: an operation that changes the contents or metadata of a chunk. Data flow: to use bandwidth fully, data is pushed linearly along a chain of chunkservers; to avoid bottlenecks and high-latency links, each machine forwards the data to the closest machine that has not yet received it; latency is minimized by pipelining the data transfer over TCP connections. Record append: the client specifies only the data and GFS appends it atomically, following the same control flow as a regular write. Snapshot: makes a copy of a file or directory tree while minimizing interruption to ongoing mutations.
  • #9: The master executes all namespace operations and manages chunk replicas. Namespace: GFS logically represents its namespace as a lookup table mapping full pathnames to metadata (see the locking sketch after these notes). Replica placement aims to 1) maximize data reliability and availability and 2) maximize network bandwidth utilization. Creation, re-replication, rebalancing: place new replicas on servers with below-average disk utilization, limit the number of recent creations on each chunkserver, and spread replicas of a chunk across racks. Garbage collection: after deletion the file is renamed to a hidden file and reclaimed after three days, along with orphaned chunks. Stale replica detection: a replica becomes stale if its chunkserver misses mutations while it is down; the master assigns chunk version numbers to distinguish up-to-date replicas from stale ones.
  • #10: Fast recovery: the master and chunkservers are designed to restore their state and start in a few seconds. Chunk replication: discussed earlier. Master replication: the operation log and checkpoints are replicated on multiple machines; shadow masters provide read-only access. Data integrity: each chunkserver uses checksumming to detect corruption of stored data; corruption can be recovered from other replicas, but detecting it by comparing replicas across chunkservers would be impractical. Diagnostic tools: generate diagnostic logs that record many significant events; the RPC logs include the exact requests and responses sent on the wire, except for the file data being read or written.
  • #12: The two clusters have similar numbers of files, though B has a larger proportion of dead files, namely files which were deleted or replaced by a new version but whose storage has not yet been reclaimed. It also has more chunks because its files tend to be larger.
  • #14: Reads return no data in Y because applications in the production system use files as producer-consumer queues. Cluster Y sees a much higher percentage of large record appends than cluster X does because the production systems, which use cluster Y, are more aggressively tuned for GFS.
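Note #9 above mentions that GFS represents its namespace as a lookup table of full pathnames and locks paths during namespace operations. Below is a hypothetical sketch of that idea, simplified to a single lock type per path (the real scheme distinguishes read and write locks and acquires them in a consistent total order); the names are illustrative.

# Hypothetical namespace locking over full pathnames; simplified, illustrative only.
from collections import defaultdict
from contextlib import contextmanager, ExitStack
from threading import RLock

class NamespaceLocks:
    def __init__(self):
        self.locks = defaultdict(RLock)   # full pathname -> lock

    @contextmanager
    def acquire(self, path: str):
        # Lock every ancestor directory path, then the full target path itself,
        # so operations on different files in the same directory can proceed
        # concurrently while conflicting operations on the same path serialize.
        parts = path.strip("/").split("/")
        ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
        with ExitStack() as stack:
            for p in ancestors:
                stack.enter_context(self.locks[p])     # ancestor locks (one type here)
            stack.enter_context(self.locks[path])      # lock on the target pathname
            yield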