Google Cloud's Colossus Unveiled

The document provides an overview of Colossus, Google's scalable storage system that underpins both Google Cloud services and popular products like YouTube and Gmail. It explains the foundational components of Colossus, including its cluster-level file system, distributed metadata model, and how it efficiently manages data storage and access. Colossus is designed to handle massive scalability and durability, ensuring that applications built on Google Cloud can leverage the same robust infrastructure that supports Google's own services.

Uploaded by

bcs2022044

Colossus under the hood: a peek into Google’s scalable storage system
April 20, 2021

Dean Hildebrand, Technical Director, Office of the CTO, Google Cloud
Denis Serenyi, Tech Lead, Google Cloud Storage

You trust Google Cloud with your critical data, but did you
know that Google also relies on the same underlying
storage infrastructure for its other businesses as well?
That’s right, the same storage system that powers Google
Cloud also underpins Google’s most popular products,
supporting globally available services like YouTube, Drive,
and Gmail.

That foundational storage system is Colossus, which
backs Google’s extensive ecosystem of storage services,
such as Cloud Storage and Firestore, supporting a diverse
range of workloads, including transaction processing,
data serving, analytics and archiving, boot disks, and
home directories.

In this post, we take a deeper look at the storage
infrastructure behind your VMs, specifically the Colossus
file system, and how it helps enable massive scalability
and data durability for Google services as well as your
applications.

Google Cloud scales because Google scales
Before we dive into how storage services operate, it’s
important to understand the single infrastructure that
supports both Cloud and Google products. Like any well-
designed software system, all of Google is layered with a
common set of scalable services. There are three main
building blocks used by each of our storage services:

Colossus is our cluster-level file system, successor to
the Google File System (GFS).

Spanner is our globally-consistent, scalable relational
database.

Borg is a scalable job scheduler that launches
everything from compute to storage services. It was
and continues to be a big influence on the design and
development of Kubernetes.

These three core building blocks are used to provide the
underlying infrastructure for all Google Cloud storage
services, from Firestore to Cloud SQL to Filestore, and
Cloud Storage. Whenever you access your favorite
storage services, the same three building blocks are
working together to provide everything you need. Borg
provisions the needed resources, Spanner stores all the
metadata about access permissions and data location,
and then Colossus manages, stores, and provides access
to all your data.

Google Cloud takes these same building blocks and then
layers everything needed to provide the level of
availability, performance, and durability you need from
your storage services. In other words, your own
applications will scale the same as Google products
because they rely on the same core infrastructure based
on these three services scaling to meet your needs.

Colossus in a nutshell
Now, let’s take a closer look at how Colossus works.

But first, a little background on Colossus:

It’s the next generation of GFS.

Its design enhances storage scalability and improves
availability to handle the massive growth in data
needs of an ever-growing number of applications.

Colossus introduced a distributed metadata model
that delivered a more scalable and highly available
metadata subsystem.

But how does it all work? And how can one file system
underpin such a wide range of workloads? Below is a
diagram of the key components of the Colossus control
plane:

Client library
The client library is how an application or service interacts
with Colossus. The client is probably the most complex
part of the entire file system. There’s a lot of functionality,
such as software RAID, that goes into the client based on
an application’s requirements. Applications built on top of
Colossus use a variety of encodings to fine-tune
performance and cost trade-offs for different workloads.
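
As a rough illustration of what "software RAID in the client" can mean, the sketch below stripes a buffer across several data chunks plus one XOR parity chunk, then rebuilds a lost chunk from the survivors. This is generic single-parity encoding, not the encoding Colossus actually uses (the post does not describe it); the function names `encode_stripe` and `reconstruct` and the chunk count are assumptions made for illustration.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_stripe(data: bytes, data_chunks: int) -> list[bytes]:
    """Split data into `data_chunks` equal chunks and append one parity chunk."""
    chunk_len = -(-len(data) // data_chunks)  # ceiling division
    padded = data.ljust(chunk_len * data_chunks, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(data_chunks)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def reconstruct(stripe: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one missing chunk (marked None) by XOR-ing the others."""
    missing = [i for i, c in enumerate(stripe) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost chunk")
    if missing:
        survivors = [c for c in stripe if c is not None]
        stripe = list(stripe)
        stripe[missing[0]] = reduce(xor_bytes, survivors)
    return stripe


stripe = encode_stripe(b"colossus stores data", data_chunks=4)
damaged = stripe.copy()
damaged[2] = None  # simulate losing one disk
repaired = reconstruct(damaged)
recovered = b"".join(repaired[:4]).rstrip(b"\0")
```

Single parity costs one extra chunk per stripe and survives one loss; full replication or stronger erasure codes trade more space for more tolerance, which is the kind of performance and cost trade-off the paragraph above refers to.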

Colossus control plane

The foundation of Colossus is its scalable metadata
service, which consists of many Curators. Clients talk
directly to Curators for control operations, such as file
creation, and the Curators can scale horizontally.

Metadata database
Curators store file system metadata in Google’s
high-performance NoSQL database, Bigtable. The original
motivation for building Colossus was to solve scaling limits
we experienced with Google File System (GFS) when
trying to accommodate metadata related to Search.
Storing file metadata in Bigtable allowed Colossus to
scale up by over 100x compared with the largest GFS clusters.

D file servers
Colossus also minimizes the number of hops for data on
the network. Data flows directly between clients and “D”
file servers (our network attached disks).

Custodians
Colossus also includes background storage managers
called Custodians. They play a key role in maintaining the
durability and availability of data as well as overall
efficiency, handling tasks like disk space balancing and
RAID reconstruction.
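
The split between control operations and data flow described in the sections above can be sketched with a toy model. Everything here is hypothetical: the `Curator`, `DServer`, and `Client` classes, the round-robin placement, and the tiny chunk size are stand-ins chosen for illustration, not Colossus interfaces. The point is only that metadata lookups go to the Curator while file bytes move directly between the client and the D servers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DServer:
    """Hypothetical stand-in for a "D" file server: stores chunk bytes by ID."""
    chunks: dict[str, bytes] = field(default_factory=dict)


@dataclass
class Curator:
    """Hypothetical stand-in for a Curator: holds metadata only, never file data."""
    # file name -> ordered list of (server name, chunk ID)
    locations: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def create(self, name: str, placement: list[tuple[str, str]]) -> None:
        self.locations[name] = placement

    def lookup(self, name: str) -> list[tuple[str, str]]:
        return self.locations[name]


class Client:
    """Control operations go to the Curator; bytes move directly to and from D servers."""

    def __init__(self, curator: Curator, servers: dict[str, DServer], chunk_size: int = 4):
        self.curator, self.servers, self.chunk_size = curator, servers, chunk_size

    def write(self, name: str, data: bytes) -> None:
        names = sorted(self.servers)
        placement = []
        for i in range(0, len(data), self.chunk_size):
            index = i // self.chunk_size
            server = names[index % len(names)]  # round-robin placement (assumption)
            chunk_id = f"{name}#{index}"
            self.servers[server].chunks[chunk_id] = data[i:i + self.chunk_size]  # data path
            placement.append((server, chunk_id))
        self.curator.create(name, placement)  # control path

    def read(self, name: str) -> bytes:
        placement = self.curator.lookup(name)  # control path
        return b"".join(self.servers[s].chunks[c] for s, c in placement)  # data path


servers = {"d1": DServer(), "d2": DServer(), "d3": DServer()}
client = Client(Curator(), servers)
client.write("/logs/a", b"hello colossus")
result = client.read("/logs/a")
```

Because the Curator in this model never touches file contents, adding more metadata servers or more D servers are independent scaling decisions.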

How Colossus provides rock-solid, scalable storage
To see how this all works in action, let’s consider how
Cloud Storage uses Colossus. You’ve probably heard us
talk a lot about how Cloud Storage can support a wide
range of use cases, from archival storage to high
throughput analytics, but we don’t often talk about the
system that lies beneath.

With Colossus, a single cluster is scalable to exabytes of
storage and tens of thousands of machines. In the
example above, we have instances accessing
Cloud Storage from Compute Engine VMs, YouTube
serving nodes, and Ads MapReduce nodes, all of which
are able to share the same underlying file system to
complete requests. The key ingredient is having a shared
storage pool that is managed by the Colossus control
plane, providing the illusion that each has its own isolated
file system.

Disaggregation of resources drives more efficient use of
valuable resources and lowers costs across all workloads.
For instance, it’s possible to provision for the peak
demand of low latency workloads, like a YouTube video,
and then run batch analytic workloads more cheaply by
having them fill in the gaps of otherwise idle time.
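
A small worked example makes the "fill in the gaps" argument concrete. The hourly demand figures below are invented, and the calculation is idealized: it assumes batch work can consume every unit of idle capacity, which a real scheduler would only approximate.

```python
def batch_headroom(provisioned: float, latency_demand: list[float]) -> list[float]:
    """Capacity left for batch work in each interval after serving latency-sensitive load."""
    return [max(provisioned - d, 0.0) for d in latency_demand]


# Hypothetical hourly demand from a latency-sensitive workload, in arbitrary I/O units.
demand = [20.0, 35.0, 80.0, 100.0, 60.0, 25.0]
provisioned = max(demand)  # provision for the peak

headroom = batch_headroom(provisioned, demand)
total_capacity = provisioned * len(demand)
utilization_without_batch = sum(demand) / total_capacity
utilization_with_batch = (sum(demand) + sum(headroom)) / total_capacity
```

With these numbers the latency-sensitive workload alone uses a little over half of the provisioned capacity; the remainder is what batch analytics can absorb at low marginal cost.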

Let’s take a look at a few other benefits Colossus brings
to the table.

Simplify hardware complexity

As you might imagine, any file system supporting Google
services has fairly daunting throughput and scaling
requirements that must handle multi-TB files and massive
datasets. Colossus abstracts away a lot of physical
hardware complexity that would otherwise plague
storage-intensive applications.

Google data centers have a tremendous variety of
underlying storage hardware, offering a mix of spinning
disk and flash storage in many sizes and types. On top of
this, applications have extremely diverse requirements
around durability, availability, and latency. To ensure each
application has the storage it requires, Colossus provides
a range of service tiers. Applications use these different
tiers by specifying I/O, availability, and durability
requirements, and then provisioning resources (bytes and
I/O) as abstract, undifferentiated units.

In addition, at Google scale, hardware is failing virtually all
the time—not because it’s unreliable, but because there’s
a lot of it. Failures are a natural part of operating at such
an enormous scale, and it’s imperative that its file system
provide fault tolerance and transparent recovery.
Colossus steers IO around these failures and does fast
background recovery to provide highly durable and
available storage.
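
One simple way to picture steering I/O around failures is a read that falls back to another copy when a replica is unavailable. The sketch below is an assumption-laden toy: the `ReplicaUnavailable` exception, the callables standing in for replicas, and the try-in-order policy are all hypothetical, and the post does not say how Colossus actually selects replicas.

```python
class ReplicaUnavailable(Exception):
    """Raised by a (hypothetical) replica whose disk or machine has failed."""


def read_with_failover(replicas, chunk_id):
    """Try each replica in turn and return the first successful read."""
    errors = []
    for replica in replicas:
        try:
            return replica(chunk_id)
        except ReplicaUnavailable as exc:
            errors.append(exc)  # steer around the failure and try the next copy
    raise ReplicaUnavailable(f"all {len(errors)} replicas failed for {chunk_id}")


def healthy(chunk_id):
    """Stand-in for a working replica."""
    return f"data for {chunk_id}".encode()


def failed(chunk_id):
    """Stand-in for a replica on failed hardware."""
    raise ReplicaUnavailable("disk offline")


data = read_with_failover([failed, failed, healthy], "chunk-7")
```

The caller sees a successful read even though two of three copies were down; restoring the lost copies would be separate background work.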

The end result is that the associated complexity
headaches of dealing with hardware resources are
significantly reduced, making it easy for any application to
get and use the storage it requires.

Maximize storage efficiency

Now, as you might imagine it takes some management
magic to ensure that storage resources are available
when applications need them without overprovisioning.
Colossus takes advantage of the fact that data has a wide
variety of access patterns and frequencies (e.g., hot data
that is accessed frequently) and uses a mix of flash and
disk storage to meet any need.

The hottest data is put on flash for more efficient serving
and lower latency. We buy just enough flash to push the
I/O density per gigabyte into what disks can typically
provide and buy just enough disks to ensure we have
enough capacity. With the right mix, we can maximize
storage efficiency and avoid wasteful overprovisioning.
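
The "just enough flash" reasoning can be sketched as a greedy split: move the datasets with the highest I/O per gigabyte to flash until the I/O density of what remains is something disks can serve. The dataset names, sizes, IOPS figures, and the `disk_iops_per_gb` threshold below are all invented for illustration, and the greedy policy is an assumption rather than a description of how Colossus places data.

```python
from __future__ import annotations


def split_hot_cold(datasets: list[tuple[str, float, float]], disk_iops_per_gb: float):
    """Move the densest datasets to flash until disks can serve the rest.

    Each dataset is (name, size_gb, iops). Returns (flash_names, disk_names).
    """
    # Hottest first: highest I/O per gigabyte.
    ordered = sorted(datasets, key=lambda d: d[2] / d[1], reverse=True)
    disk_gb = sum(size for _, size, _ in ordered)
    disk_iops = sum(iops for _, _, iops in ordered)
    flash = []
    for name, size, iops in ordered:
        if disk_gb == 0 or disk_iops / disk_gb <= disk_iops_per_gb:
            break  # remaining data is cool enough for disks
        flash.append(name)
        disk_gb -= size
        disk_iops -= iops
    disk = [name for name, _, _ in ordered if name not in flash]
    return flash, disk


# Hypothetical datasets: (name, size in GB, required IOPS).
datasets = [
    ("video-index", 100.0, 5000.0),   # 50 IOPS/GB
    ("user-photos", 1000.0, 2000.0),  # 2 IOPS/GB
    ("cold-logs", 10000.0, 1000.0),   # 0.1 IOPS/GB
]
flash, disk = split_hot_cold(datasets, disk_iops_per_gb=0.5)
```

In this example only the small, very hot dataset lands on flash; once it is removed, the average density of the remaining data falls below the assumed disk threshold, so the rest stays on disk.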

For disk-based storage, we want to keep disks full and
busy to avoid excess inventory and wasted disk IOPS. To
do this, Colossus uses intelligent disk management to get
as much value as possible from available disk IOPS. Newly
written data (i.e., hotter data) is evenly distributed across
all the drives in a cluster. Data is then rebalanced and
moved to larger capacity drives as it ages and becomes
colder. This works great for analytics workloads, for
example, where data typically cools off as it ages.

Battle-tested to deliver massive scale
So, there you have it: Colossus is the secret scaling
superpower behind Google’s storage infrastructure.
Colossus not only handles the storage needs of Google
Cloud services, but also meets
Google’s internal storage needs, helping to deliver
content to the billions of people using Search, Maps,
YouTube, and more every single day. When you build your
business on Google Cloud you get access to the same
super-charged infrastructure that keeps Google running.
We’ll keep making our infrastructure better, so you don’t
have to.

To learn more about Google Cloud’s storage architecture,
check out the Next ’20 session from which this post was
developed, “A peek at the Google Storage infrastructure
behind the VM.” And check out the Cloud Storage website
to learn more about all our storage offerings.
