
Google File System Insights

This is a 5-page summary of the paper "The Google File System" by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.

Uploaded by Marco


The Google File System

Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung

The authors of this paper designed and implemented the Google File System (GFS), a scalable distributed file system for large distributed data-intensive applications. GFS provides fault tolerance while running on inexpensive commodity hardware and delivers high aggregate performance to a large number of clients.
The design of GFS has been driven by key observations of Google's application workloads and technological environment, such as:

- Constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system.
- I/O operation and block sizes have to be revisited.
- Appending becomes the focus of performance optimization and atomicity guarantees.
- Co-designing the applications and the file system API benefits the overall system by increasing flexibility.

Assumptions

- The system is built from many inexpensive commodity components that often fail.
- The system stores a modest number of large files.
- The workloads primarily consist of two kinds of reads: large streaming reads and small random reads.
- The workloads also have many large, sequential writes that append data to files.
- The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file.
- High sustained bandwidth is more important than low latency.
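The assumption about many clients appending to the same file is what makes record append central to GFS. In the paper's record append, the client supplies only the data and GFS chooses the offset. The sketch below models that idea under stated assumptions: a hypothetical `AppendOnlyChunk` class holds one chunk in memory, and a `threading.Lock` stands in for the primary replica that serializes appends. It ignores replication and failures; it only shows why a single ordering point gives each concurrent writer a disjoint, contiguous byte range.

```python
import threading
from typing import Optional


class AppendOnlyChunk:
    """Hypothetical in-memory chunk whose primary serializes record appends."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = bytearray()
        self._lock = threading.Lock()  # stands in for the primary's single ordering point

    def record_append(self, record: bytes) -> Optional[int]:
        """Append `record` atomically; return its offset, or None if it does not fit."""
        with self._lock:
            if len(self.data) + len(record) > self.capacity:
                return None  # the caller would move on to a new chunk
            offset = len(self.data)  # the offset is chosen here, not by the client
            self.data.extend(record)
            return offset


chunk = AppendOnlyChunk(capacity=1024)
results = []  # (offset, record) pairs; list.append is thread-safe in CPython


def producer(tag: bytes) -> None:
    for _ in range(10):
        record = tag * 8
        results.append((chunk.record_append(record), record))


threads = [threading.Thread(target=producer, args=(bytes([65 + i]),)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every record landed whole at its own offset, even with four concurrent writers.
print(all(chunk.data[off:off + len(rec)] == rec for off, rec in results))  # True
```

When a record does not fit, this sketch just returns `None`; the paper's design instead pads the rest of the chunk and has the client retry on the next chunk, which is left to the caller here.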

Architecture
As shown in the picture, a GFS cluster consists of a single master and multiple chunk servers and is accessed by multiple clients.
Files are divided into fixed-size chunks. Chunk servers store chunks on local disks as Linux files and read or write chunk data specified by a chunk handle and byte range. For reliability, each chunk is replicated on multiple chunk servers.
Some of the tasks performed by the GFS master are:

- Maintaining all file system metadata.
- Making replication decisions using global knowledge.
- Managing chunk leases.
- Garbage-collecting orphaned chunks.
- Migrating chunks between chunk servers.
- Communicating with each chunk server through heartbeat messages to give it instructions and collect its state.
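To make the master's bookkeeping concrete, here is a minimal in-memory sketch. The class name `Master`, its methods, the file path, and the server names are hypothetical, and the replication target of 3 is an assumption (three replicas is the paper's default, treated here as a parameter). Replica locations are rebuilt from heartbeat reports rather than stored permanently, which follows the paper's approach of learning chunk locations from the chunk servers; the operation log, leases, and RPC layer of real GFS are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical, in-memory model of the GFS master's metadata."""

    # File namespace: file name -> ordered chunk handles (chunk 0, chunk 1, ...).
    files: dict[str, list[int]] = field(default_factory=dict)
    # Chunk handle -> chunk servers currently reporting a replica of it.
    locations: dict[int, set[str]] = field(default_factory=dict)

    def handle_heartbeat(self, server: str, held_chunks: list[int]) -> None:
        """Replace what we know about `server` with the chunks it just reported."""
        for servers in self.locations.values():
            servers.discard(server)
        for handle in held_chunks:
            self.locations.setdefault(handle, set()).add(server)

    def lookup(self, filename: str, chunk_index: int) -> tuple[int, set[str]]:
        """Return (chunk handle, replica locations) for one chunk of a file."""
        handle = self.files[filename][chunk_index]
        return handle, set(self.locations.get(handle, set()))

    def under_replicated(self, target: int = 3) -> list[int]:
        """Chunk handles with fewer known replicas than `target` (uses the global view)."""
        all_handles = {h for handles in self.files.values() for h in handles}
        return sorted(h for h in all_handles if len(self.locations.get(h, set())) < target)


master = Master(files={"/logs/web.0": [101, 102]})
master.handle_heartbeat("cs1:7000", [101, 102])
master.handle_heartbeat("cs2:7000", [101])
master.handle_heartbeat("cs3:7000", [101])
print(master.lookup("/logs/web.0", 1))  # (102, {'cs1:7000'})
print(master.under_replicated())        # [102]
```

Because the master sees every chunk's replica count, a check like `under_replicated` is the kind of decision that needs "global knowledge"; a single chunk server could not make it alone.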

Each chunk is 64 MB, much larger than typical file system block sizes. This large chunk size offers several advantages:

- It reduces clients' need to interact with the master, because reads and writes on the same chunk require only one initial request to the master for chunk location information.
- A client is more likely to perform many operations on a given chunk, which reduces network overhead by keeping a persistent TCP connection to the chunk server over an extended period.
- It reduces the size of the metadata stored on the master, which allows the master to keep the metadata in memory.
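Because chunks have a fixed size, a client can turn a file byte offset into a chunk index with plain arithmetic before asking the master for that chunk's location, and the large chunk size keeps the number of chunks, and so the master's metadata, small. The sketch below shows both calculations. The function names are hypothetical, and the figure of about 64 bytes of master metadata per chunk comes from the paper and is used here only as a rough estimate.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as described above


def chunk_index(offset: int) -> int:
    """Index of the chunk containing byte `offset` of a file."""
    return offset // CHUNK_SIZE


def split_by_chunk(offset: int, length: int) -> list[tuple[int, int, int]]:
    """Split a byte range into (chunk index, offset within chunk, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        within = offset % CHUNK_SIZE
        n = min(CHUNK_SIZE - within, end - offset)  # stop at the chunk boundary
        pieces.append((chunk_index(offset), within, n))
        offset += n
    return pieces


def metadata_bytes(total_bytes: int, unit: int = CHUNK_SIZE, per_unit: int = 64) -> int:
    """Rough master metadata size if each `unit` of data costs `per_unit` bytes."""
    units = -(-total_bytes // unit)  # ceiling division
    return units * per_unit


# A 1 MB read starting at 63.5 MB straddles chunks 0 and 1.
print(split_by_chunk(127 * 512 * 1024, 1024 * 1024))  # [(0, 66584576, 524288), (1, 0, 524288)]

# 1 PiB of data: 64 MB chunks versus a hypothetical 4 KB unit.
print(metadata_bytes(2**50) // 2**30, "GiB")              # 1 GiB
print(metadata_bytes(2**50, unit=4096) // 2**40, "TiB")   # 16 TiB
```

The 4 KB comparison is hypothetical; it only illustrates why a larger chunk size shrinks the metadata the master has to hold.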

Write Control and Data Flow

1. The client asks the master which chunk server holds the current lease for the chunk and for the locations of the other replicas.
2. The master replies with the identity of the primary and the locations of the other (secondary) replicas.
3. The client pushes the data to all the replicas.
4. Once all the replicas have acknowledged receiving the data, the client sends a write request to the primary.
5. The primary assigns the write a serial number, applies it to its own replica, and forwards the write request to all secondary replicas, which apply it in the same serial-number order.
6. After the secondaries report completion, the primary replies to the client.
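The six steps can be sketched as an in-process simulation. All class and function names are hypothetical; method calls stand in for RPCs, and leases never expire. The sketch also simplifies data flow: the paper pipelines data along a chain of chunk servers, while this version copies it to each replica directly. What it does preserve is the control flow: data is buffered on every replica first, and only the primary decides the order in which buffered writes are applied, so all replicas apply them in the same order.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ChunkServer:
    """Hypothetical replica of one chunk."""
    name: str
    buffer: dict[str, bytes] = field(default_factory=dict)          # pushed, not yet applied
    applied: list[tuple[int, bytes]] = field(default_factory=list)  # (serial, data) in order

    def push(self, data_id: str, data: bytes) -> bool:
        """Step 3: buffer the data and acknowledge receipt."""
        self.buffer[data_id] = data
        return True

    def apply(self, serial: int, data_id: str) -> bool:
        """Apply a buffered write under the serial number chosen by the primary."""
        self.applied.append((serial, self.buffer.pop(data_id)))
        return True


@dataclass
class Primary(ChunkServer):
    """The replica that currently holds the lease for the chunk."""
    secondaries: list[ChunkServer] = field(default_factory=list)
    serials: itertools.count = field(default_factory=itertools.count)

    def write(self, data_id: str) -> bool:
        serial = next(self.serials)                                  # step 5: pick the order,
        self.apply(serial, data_id)                                  # apply locally,
        acks = [s.apply(serial, data_id) for s in self.secondaries]  # and forward
        return all(acks)                                             # step 6: reply to client


@dataclass
class Master:
    leases: dict[int, Primary]  # chunk handle -> lease holder

    def find_replicas(self, handle: int) -> tuple[Primary, list[ChunkServer]]:
        """Steps 1-2: identify the primary and the secondary replicas."""
        primary = self.leases[handle]
        return primary, primary.secondaries


def client_write(master: Master, handle: int, data_id: str, data: bytes) -> bool:
    primary, secondaries = master.find_replicas(handle)                   # steps 1-2
    if not all([r.push(data_id, data) for r in [primary, *secondaries]]):  # step 3
        return False
    return primary.write(data_id)                                         # steps 4-6


s2, s3 = ChunkServer("cs2"), ChunkServer("cs3")
p = Primary("cs1", secondaries=[s2, s3])
master = Master(leases={7: p})
client_write(master, 7, "w1", b"hello")
client_write(master, 7, "w2", b"world")
print(s3.applied)  # [(0, b'hello'), (1, b'world')]
```

Separating the data push (step 3) from the write request (step 4) is what lets the bulky data transfer happen independently of the ordering decision, which only the primary makes.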

Conclusions

- GFS demonstrates the qualities essential for supporting large-scale data processing workloads on commodity hardware.
- Observations of Google's workloads and technological environment led the authors to radically different points in the design space.
- The authors treat component failures as the norm rather than the exception.
- GFS provides fault tolerance through constant monitoring, replication of crucial data, and fast, automatic recovery.
- GFS uses checksumming to detect data corruption at the disk or IDE subsystem level.
- The design delivers high aggregate throughput to many concurrent readers and writers performing a variety of tasks.
- GFS has successfully met Google's storage needs and is widely used within the company as the storage platform for research and development as well as production data processing.
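The checksumming point can be illustrated with a small sketch. Following the paper, it splits a chunk into 64 KB blocks with one 32-bit checksum each and verifies every block that overlaps a read before returning data. Using `zlib.crc32` as the checksum function is an assumption of this sketch, and the function names are hypothetical; a real chunk server would also report the mismatch to the master so the chunk can be re-replicated from a good copy.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks within a chunk


def block_checksums(chunk: bytes) -> list[int]:
    """One 32-bit checksum (CRC-32 here, an assumption) per 64 KB block of the chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE]) for i in range(0, len(chunk), BLOCK_SIZE)]


def verified_read(chunk: bytes, checksums: list[int], offset: int, length: int) -> bytes:
    """Check every block overlapping [offset, offset + length) before returning data."""
    if length <= 0:
        return b""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = chunk[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            # A real chunk server would return an error and report this to the master.
            raise OSError(f"checksum mismatch in block {b}")
    return chunk[offset:offset + length]


data = bytes(range(256)) * 1024  # 256 KB: four 64 KB blocks
sums = block_checksums(data)
damaged = bytearray(data)
damaged[200_000] ^= 0xFF         # flip bits in block 3
print(verified_read(bytes(damaged), sums, 70_000, 100) == data[70_000:70_100])  # True
try:
    verified_read(bytes(damaged), sums, 199_990, 20)
except OSError as err:
    print(err)                   # checksum mismatch in block 3
```

Checking only the blocks a read touches keeps verification cheap: corruption in block 3 does not stop a read that falls entirely within block 1.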

Questions
The following questions should be answered after analyzing any scientific paper.
1. What problem does the paper address?

The authors work at one of the biggest technology companies in the world, and they observed a rapidly growing demand for data processing at Google. This motivated them to examine the problem and analyze existing distributed file systems.
2. Why is this an interesting or important problem?
Because the amount of data is growing exponentially, the demand for data processing is growing rapidly. This is a worldwide problem, not only a problem at Google, so any solution to it is likely to help many companies. This is why the proposed solution is interesting and relevant.
3. What are other solutions that have been proposed to solve this problem?
Solutions proposed to solve this problem include previous distributed
file systems such as AFS, xFS, Frangipani and Intermezzo.
4. What is the solution the authors propose?
The authors designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.
5. How successful is this solution?
The fact that this system is used in production at one of the world's biggest technology companies indicates that the authors' solution is very successful and suggests that it could be adopted by other companies around the world. Many papers propose solutions to different kinds of problems and present experiments and results showing how those solutions address the problem, but not all of them show the solution deployed in the real world. In this case, the file system proposed by the authors is widely used in real-world scenarios, which makes both the solution and the paper more interesting and relevant.

