Google File System 1

Google File System (GFS) is a scalable distributed file system designed for large data-intensive applications, focusing on performance, reliability, scalability, and availability. It features a familiar file system interface, supports atomic append operations, and allows concurrent writes while ensuring atomicity. GFS architecture consists of a master and multiple chunk-servers, where files are divided into fixed-size chunks that are replicated for fault tolerance.

Uploaded by

rajopradhan77

Google File System

Google File System (GFS)
➢ The Google File System (GFS) is a scalable distributed file system for
large, distributed, data-intensive applications.
➢ Google designed GFS to meet the rapidly growing demands of its
data processing needs.
➢ GFS shares many of the same goals as other distributed file systems,
such as performance, scalability, reliability, and availability.
➢ GFS provides a familiar file system interface.
➢ Files are organized hierarchically in directories and identified by
pathnames.
➢ GFS supports the usual operations to create, delete, open, close, read,
and write files.
GFS
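➢ Multi-GB files are common; small files are supported, but GFS is not
optimized for them.
➢ Each file typically contains many application objects such as web
documents.
➢ GFS provides an atomic append operation called record append.
➢ In a traditional write, the client specifies the offset at which data is to
be written; with record append, the client supplies only the data and GFS
chooses the offset.
➢ Concurrent writes to the same region are not serializable: the region
may end up containing data fragments from multiple clients.
➢ In addition to the usual operations, GFS has snapshot and record
append operations.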
GFS (snapshot and record append)
➢ The snapshot operation makes a copy of a file or a directory tree at
low cost.
➢ Record append allows multiple clients to append data to the
same file concurrently while guaranteeing the atomicity of each
individual client's append.
➢ Record append is useful for implementing multi-way merge results.
➢ GFS workloads consist mainly of two kinds of reads: large streaming
reads and small random reads.
➢ In large streaming reads, individual operations typically read
hundreds of KBs, more commonly 1 MB or more.
➢ A small random read typically reads a few KBs at some
arbitrary offset.
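
The difference between a traditional write and record append can be sketched in a few lines. This is a minimal in-memory illustration, not the GFS protocol: the class `ToyAppendFile` and the helper `append_from_many_clients` are hypothetical names, a single lock stands in for whatever ordering the real system imposes, and replication and failure handling are left out entirely.

```python
import threading


class ToyAppendFile:
    """In-memory stand-in for a single file (hypothetical, not the GFS API)."""

    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()

    def write(self, offset, payload):
        """Traditional write: the caller chooses the offset."""
        with self._lock:
            end = offset + len(payload)
            if end > len(self._data):
                # Grow the file with zero bytes so the write fits.
                self._data.extend(b"\x00" * (end - len(self._data)))
            self._data[offset:end] = payload

    def record_append(self, payload):
        """Record append: the file chooses the offset and returns it."""
        with self._lock:
            offset = len(self._data)
            self._data.extend(payload)
            return offset

    def read(self, offset, length):
        with self._lock:
            return bytes(self._data[offset:offset + length])


def append_from_many_clients(num_clients=8, prefix=b"record-"):
    """Run several 'clients' (threads) that append to one file concurrently."""
    f = ToyAppendFile()
    results = {}  # client id -> (offset chosen by the file, payload)

    def client(i):
        payload = prefix + str(i).encode()
        results[i] = (f.record_append(payload), payload)

    threads = [threading.Thread(target=client, args=(i,))
               for i in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return f, results
```

Each client gets back the offset at which its record landed, and every record is readable intact at that offset, which is the property that makes the operation convenient for many producers writing to one file. With `write`, two clients that pick overlapping offsets simply overwrite each other.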
Common Goals of GFS
and most Distributed File Systems
➢ Performance
➢ Reliability
➢ Scalability
➢ Availability
Other GFS Concepts
➢ Component failures are the norm rather
than the exception.
➢ The file system consists of hundreds or even thousands
of storage machines built from inexpensive
commodity parts.
➢ Files are huge. Multi-GB files are common.
➢ Each file typically contains many application
objects such as web documents.
➢ Append, append, append.
➢ Most files are mutated by appending new data
rather than overwriting existing data.
Other GFS Concepts
➢ Why assume hardware failure is the norm?
➢ It is cheaper to assume frequent failure on inexpensive hardware and
account for it than to invest in expensive hardware and still
experience occasional failure.
➢ A distributed system has many layers (network, disk,
memory, physical connections, power, OS, application), and a
failure in any of them can contribute to data corruption.
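
A rough calculation shows why failure becomes routine at this scale. The sketch below assumes machines fail independently and uses a made-up per-machine failure probability (0.001 per day); both are illustrative assumptions, not figures from this deck.

```python
def prob_any_failure(num_machines, p_single):
    """Probability that at least one machine fails, assuming independence.

    p_single is a hypothetical per-machine failure probability over some
    fixed period (for example, one day).
    """
    return 1.0 - (1.0 - p_single) ** num_machines


if __name__ == "__main__":
    # Illustrative numbers only: 0.1% chance per machine per day.
    for n in (1, 100, 1000):
        print(n, round(prob_any_failure(n, 0.001), 3))
```

Under these assumptions, a failure that is rare on any single machine becomes more likely than not somewhere in a cluster of a thousand machines, so the design has to treat it as a normal event.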
GFS Interface
➢ GFS provides a familiar file system interface.
➢ Files are organized hierarchically in directories
and identified by path names.
➢ Create, delete, open, close, read, and write
operations are supported.
➢ Snapshot and record append are added (record append
lets multiple clients append simultaneously, with each
append atomic).
GFS Architecture
[Figure: GFS architecture diagram; label: Chunk]
➢ A GFS cluster consists of a single master and multiple chunk-
servers and is accessed by multiple clients. Each of these is typically
a commodity Linux machine.
➢ It is easy to run both a chunk-server and a client on the same
machine, as long as machine resources permit and the lower
reliability caused by running possibly flaky application code is
acceptable.
GFS Architecture
➢ Files are divided into fixed-size chunks.
➢ Each chunk is identified by an immutable and
globally unique 64-bit chunk handle assigned by the
master at the time of chunk creation.
➢ Chunk-servers store chunks on local disks as
Linux files. For reliability, each chunk is replicated
on multiple chunk-servers.
➢ The master maintains all file system metadata.
This includes the namespace, access control
information, the mapping from files to chunks, and
the current locations of chunks.
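
The chunk and metadata ideas above can be sketched as a toy lookup. Everything here is an illustrative assumption rather than the actual GFS implementation: `ToyMaster`, `create_chunk`, and `lookup` are hypothetical names, handles come from a simple counter, and replicas are placed round-robin. The 64 MB chunk size and the default of three replicas are the values given in the published GFS paper, not in this deck.

```python
import itertools

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, per the GFS paper (assumption here)


def chunk_index(byte_offset, chunk_size=CHUNK_SIZE):
    """Index of the fixed-size chunk that contains byte_offset."""
    return byte_offset // chunk_size


def chunk_range(offset, length, chunk_size=CHUNK_SIZE):
    """Chunk indices touched by reading `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return list(range(first, last + 1))


class ToyMaster:
    """Hypothetical in-memory model of the master's chunk metadata."""

    def __init__(self, chunkservers, replicas=3):
        self._servers = list(chunkservers)
        if replicas > len(self._servers):
            raise ValueError("need at least as many chunk-servers as replicas")
        self._replicas = replicas
        self._handles = itertools.count(1)  # stand-in for 64-bit handles
        self._file_chunks = {}              # path -> [handle, handle, ...]
        self._locations = {}                # handle -> [chunk-server, ...]

    def create_chunk(self, path):
        """Assign a new handle and pick replica locations (round-robin)."""
        handle = next(self._handles) & 0xFFFFFFFFFFFFFFFF
        start = len(self._locations) % len(self._servers)
        picked = [self._servers[(start + i) % len(self._servers)]
                  for i in range(self._replicas)]
        self._file_chunks.setdefault(path, []).append(handle)
        self._locations[handle] = picked
        return handle

    def lookup(self, path, byte_offset):
        """Return (chunk handle, replica locations) for a byte offset."""
        handle = self._file_chunks[path][chunk_index(byte_offset)]
        return handle, list(self._locations[handle])
```

For example, with four chunk-servers and two chunks created for `/logs/web`, `lookup("/logs/web", CHUNK_SIZE + 5)` returns the second chunk's handle together with the three chunk-servers holding its replicas.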
Is GFS fault tolerant?
Consistency
Write control and data flow
Replica placement
Garbage Collection
