
Google Bigtable

1. Describe the data model of Bigtable.


A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed
by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of
bytes.

(row:string, column:string, time:int64) -> string
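As a rough illustration only (this is not Bigtable code), the logical model can be pictured as a single sorted, sparse map in C++ with a composite (row, column, timestamp) key; the row and column names below are borrowed from the paper's webtable example.

// Illustrative sketch of the logical data model as one sorted, sparse map.
// In the real system timestamps sort newest-first; std::map's default
// ascending order is kept here for brevity.
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

using CellKey = std::tuple<std::string, std::string, int64_t>;  // (row, column, time)
using Cells   = std::map<CellKey, std::string>;                 // sorted by row, then column, then time

int main() {
  Cells webtable;
  webtable[{"com.cnn.www", "anchor:cnnsi.com", 9}] = "CNN";        // illustrative cell
  webtable[{"com.cnn.www", "contents:", 6}]        = "<html>...";  // illustrative cell
  return 0;
}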

2. Describe the design of the Bigtable API.


The Bigtable API provides functions for creating and deleting tables and column families. It also
provides functions for changing cluster, table, and column family metadata, such as access control
rights. Client applications can write or delete values in Bigtable, look up values from individual
rows, or iterate over a subset of the data in a table.
// Open the table
Table *T = OpenOrDie("/bigtable/web/webtable");
// Write a new anchor and delete an old anchor
RowMutation r1(T, "com.cnn.www");
r1.Set("anchor:www.c-span.org", "CNN");
r1.Delete("anchor:www.abc.com");
Operation op;
Apply(&op, &r1);
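For reads, the paper pairs this with a Scanner example (reproduced here as it appears in the Bigtable paper, so the helper names come from that example rather than from this summary), which iterates over all versions of the anchor column family for a single row:

// Iterate over all anchors of row "com.cnn.www", returning all versions
Scanner scanner(T);
ScanStream *stream;
stream = scanner.FetchColumnFamily("anchor");
stream->SetReturnAllVersions();
scanner.Lookup("com.cnn.www");
for (; !stream->Done(); stream->Next())
    printf("%s %s %lld %s\n",
           scanner.RowName(),
           stream->ColumnName(),
           stream->MicroTimestamp(),
           stream->Value());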

Bigtable supports several other features that allow the user to manipulate data in more
complex ways:
1. Bigtable supports single-row transactions, which can be used to perform atomic read-
modify-write sequences on data stored under a single row key.
2. Bigtable allows cells to be used as integer counters.
3. And finally, Bigtable supports the execution of client-supplied scripts in the address
spaces of the servers.
3. Present the Google SSTable file format.

The Google SSTable file format is used internally to store Bigtable data. An SSTable provides a
persistent, ordered immutable map from keys to values, where both keys and values are arbitrary
byte strings. Operations are provided to look up the value associated with a specified key, and to
iterate over all key/value pairs in a specified key range. Internally, each SSTable contains a
sequence of blocks (typically each block is 64KB in size, but this is configurable).
A block index (stored at the end of the SSTable) is used to locate blocks; the index is
loaded into memory when the SSTable is opened. A lookup can be performed with a single disk
seek: we first find the appropriate block by performing a binary search in the in-memory index,
and then reading the appropriate block from disk. Optionally, an SSTable can be completely
mapped into memory, which allows us to perform lookups and scans without touching disk.
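A rough sketch of that lookup path follows (illustrative C++ only; the index layout and the block-reading helper are assumptions of this sketch, not the real SSTable format):

// Sketch: point lookup using the in-memory block index plus one block read.
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct IndexEntry {
  std::string last_key;  // largest key contained in the block
  uint64_t offset;       // block position within the SSTable file
  uint64_t size;         // block length (typically ~64KB)
};

// Placeholder standing in for the single disk read plus an in-block scan for `key`.
std::string ReadBlockAndFind(uint64_t offset, uint64_t size, const std::string& key) {
  (void)offset; (void)size; (void)key;
  return "";  // stubbed out in this sketch
}

// `index` is sorted by last_key and was loaded when the SSTable was opened.
std::string Lookup(const std::vector<IndexEntry>& index, const std::string& key) {
  // Binary search the in-memory index for the first block that could hold the key...
  auto it = std::lower_bound(index.begin(), index.end(), key,
      [](const IndexEntry& e, const std::string& k) { return e.last_key < k; });
  if (it == index.end()) return "";  // key is beyond the last block
  // ...then a single disk seek/read fetches the candidate block.
  return ReadBlockAndFind(it->offset, it->size, key);
}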

4. Describe the role of the Google Chubby service.

Bigtable relies on a highly-available and persistent distributed lock service called Chubby.
A Chubby service consists of five active replicas, one of which is elected to be the master and
actively serve requests. The service is live when a majority of the replicas are running and can
communicate with each other. Chubby uses the Paxos algorithm to keep its replicas consistent in
the face of failure. Chubby provides a namespace that consists of directories and small files. Each
directory or file can be used as a lock, and reads and writes to a file are atomic. The Chubby client
library provides consistent caching of Chubby files. Each Chubby client maintains a session with
a Chubby service. A client’s session expires if it is unable to renew its session lease within the
lease expiration time. When a client’s session expires, it loses any locks and open handles. Chubby
clients can also register callbacks on Chubby files and directories for notification of changes or
session expiration.

Bigtable uses Chubby for a variety of tasks:


- to ensure that there is at most one active master at any time;
- to store the bootstrap location of Bigtable data;
- to discover tablet servers and finalize tablet server deaths;
- to store Bigtable schema information (the column family information for each table);
- and to store access control lists.

If Chubby becomes unavailable for an extended period of time, Bigtable becomes unavailable.
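The interface below is purely hypothetical (it is not Chubby's actual API), but it sketches how an exclusive lock on a well-known file gives the "at most one active master" guarantee:

// Hypothetical lock-service interface; the path and method names are invented
// for illustration only.
#include <functional>
#include <string>

class LockService {
 public:
  virtual ~LockService() = default;
  // Try to take an exclusive lock on the named file; false if already held.
  virtual bool TryAcquireExclusive(const std::string& path) = 0;
  // Register a callback fired when the session (and thus the lock) expires.
  virtual void OnSessionExpired(std::function<void()> callback) = 0;
};

bool BecomeMaster(LockService& lock_service) {
  if (!lock_service.TryAcquireExclusive("/bigtable/cluster/master-lock"))
    return false;  // some other master instance already holds the lock
  // If the session is ever lost, this master must stop acting as master.
  lock_service.OnSessionExpired([] { /* step down, stop serving */ });
  return true;
}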

5. Explain the architecture of the Google Bigtable server.

The Bigtable implementation has three major components:


1. a library that is linked into every client,
2. one master server, and
3. many tablet servers.
6. Describe the role of the Master server.

The master is responsible for assigning tablets to tablet servers, detecting the addition and
expiration of tablet servers, balancing tablet-server load, and garbage collection of files in GFS. In
addition, it handles schema changes such as table and column family creations. Client data does
not move through the master: clients communicate directly with tablet servers for reads and writes.
Because Bigtable clients do not rely on the master for tablet location information, most clients
never communicate with the master. As a result, the master is lightly loaded in practice.

7. Describe the role of the Tablet server in Bigtable.

Each tablet server manages a set of tablets (typically we have somewhere between ten and a
thousand tablets per tablet server). The tablet server handles read and write requests to the tablets
that it has loaded, and also splits tablets that have grown too large.
Tablet servers can be dynamically added to (or removed from) a cluster to accommodate
changes in workloads. A Bigtable cluster stores a number of tables. Each table consists of a set of
tablets, and each tablet contains all data associated with a row range. Initially, each table consists
of just one tablet. As a table grows, it is automatically split into multiple tablets, each
approximately 100-200 MB in size by default.

8. Present the protocol for locating a tablet.

We use a three-level hierarchy analogous to that of a B+-tree to store tablet location information.

The first level is a file stored in Chubby that contains the location of the root tablet. The root tablet
contains the location of all tablets in a special METADATA table. Each METADATA tablet
contains the location of a set of user tablets.
The root tablet is just the first tablet in the METADATA table, but is treated specially - it is never
split - to ensure that the tablet location hierarchy has no more than three levels. The METADATA
table stores the location of a tablet under a row key that is an encoding of the tablet’s table identifier
and its end row.
The client library caches tablet locations. If the client does not know the location of a tablet, or if
it discovers that cached location information is incorrect, then it recursively moves up the tablet
location hierarchy. If the client’s cache is empty, the location algorithm requires three network
round-trips, including one read from Chubby. If the client’s cache is stale, the location algorithm
could take up to six round-trips, because stale cache entries are only discovered upon misses
(assuming that METADATA tablets do not move very frequently). Although tablet locations are
stored in memory, so no GFS accesses are required, we further reduce this cost in the common
case by having the client library prefetch tablet locations: it reads the metadata for more than one
tablet whenever it reads the METADATA table. We also store secondary information in the
METADATA table, including a log of all events pertaining to each tablet (such as when a server
begins serving it). This information is helpful for debugging and performance analysis.
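A compile-only sketch of the client-side lookup described above is given below; every helper, the METADATA key encoding, and the treatment of the row as the tablet's end row are simplifying assumptions of this sketch:

// Sketch: three-level tablet location lookup with a client-side cache.
#include <map>
#include <string>

struct TabletLocation { std::string tablet_server; };

// Placeholders for the three levels of the hierarchy (Chubby file, root
// tablet, METADATA tablet); a real client would issue RPCs here.
TabletLocation ReadRootLocationFromChubby() { return {"root-location"}; }
TabletLocation ReadMetadataTabletLocation(const TabletLocation&, const std::string&) { return {"metadata-server"}; }
TabletLocation ReadUserTabletLocation(const TabletLocation&, const std::string&) { return {"tablet-server"}; }

// METADATA row key: an encoding of the tablet's table identifier and end row.
std::string MetadataKey(const std::string& table_id, const std::string& end_row) {
  return table_id + ";" + end_row;  // ';' separator is an assumption of this sketch
}

TabletLocation Locate(std::map<std::string, TabletLocation>& cache,
                      const std::string& table_id, const std::string& row) {
  const std::string key = MetadataKey(table_id, row);  // simplification: row used as end row
  auto hit = cache.find(key);
  if (hit != cache.end()) return hit->second;  // common case: answered from the cache
  // Cache miss (or a stale entry was just invalidated): walk the hierarchy.
  TabletLocation root = ReadRootLocationFromChubby();           // level 1: Chubby file
  TabletLocation meta = ReadMetadataTabletLocation(root, key);  // level 2: root tablet
  TabletLocation user = ReadUserTabletLocation(meta, key);      // level 3: METADATA tablet
  cache[key] = user;
  return user;
}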

9. Describe how and when tablets are discovered, assigned, and unassigned.
Each tablet is assigned to one tablet server at a time. The master keeps track of the set of live tablet
servers, and the current assignment of tablets to tablet servers, including which tablets are
unassigned. When a tablet is unassigned, and a tablet server with sufficient room for the tablet is
available, the master assigns the tablet by sending a tablet load request to the tablet server. Bigtable
uses Chubby to keep track of tablet servers. When a tablet server starts, it creates, and acquires an
exclusive lock on, a uniquely-named file in a specific Chubby directory. The master monitors this
directory (the servers directory) to discover tablet servers. A tablet server stops serving its tablets
if it loses its exclusive lock. A tablet server will attempt to reacquire an exclusive lock on its file
as long as the file still exists. If the file no longer exists, then the tablet server will never be able to
serve again, so it kills itself.
The master is responsible for detecting when a tablet server is no longer serving its tablets, and for
reassigning those tablets as soon as possible. To detect when a tablet server is no longer serving
its tablets, the master periodically asks each tablet server for the status of its lock. If a tablet server
reports that it has lost its lock, or if the master was unable to reach a server during its last several
attempts, the master attempts to acquire an exclusive lock on the server’s file. If the master is able
to acquire the lock, then Chubby is live and the tablet server is either dead or having trouble
reaching Chubby, so the master ensures that the tablet server can never serve again by deleting its
server file. Once a server’s file has been deleted, the master can move all the tablets that were
previously assigned to that server into the set of unassigned tablets.
When a master is started by the cluster management system, it needs to discover the current tablet
assignments before it can change them. The master executes the following steps at startup.
1. The master grabs a unique master lock in Chubby, which prevents concurrent master instantiations.
2. The master scans the servers directory in Chubby to find the live servers.
3. The master communicates with every live tablet server to discover what tablets are already assigned to each server.
4. The master scans the METADATA table to learn the set of tablets. Whenever this scan encounters a tablet that is not already assigned, the master adds the tablet to the set of unassigned tablets, which makes the tablet eligible for tablet assignment.
One complication is that the scan of the METADATA table cannot happen until the METADATA
tablets have been assigned. Therefore, before starting this scan (step 4), the master adds the root
tablet to the set of unassigned tablets if an assignment for the root tablet was not discovered during
step 3. This addition ensures that the root tablet will be assigned. Because the root tablet contains
the names of all METADATA tablets, the master knows about all of them after it has scanned the
root tablet.
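Those startup steps might be sketched as follows; every helper is a placeholder standing in for the master's real Chubby, RPC, and METADATA operations, not the actual implementation:

// Compile-only sketch of the master startup sequence described above.
#include <set>
#include <string>
#include <vector>

static const std::string kRootTablet = "ROOT";

bool AcquireMasterLock() { return true; }                        // step 1 placeholder
std::vector<std::string> ScanServersDirectory() { return {}; }   // step 2 placeholder
std::set<std::string> QueryAssignedTablets(const std::vector<std::string>&) { return {}; }  // step 3 placeholder
std::vector<std::string> ScanMetadataTable() { return {kRootTablet}; }  // step 4 placeholder
void MarkUnassigned(const std::string&) {}

void MasterStartup() {
  if (!AcquireMasterLock()) return;               // prevents concurrent masters
  auto servers  = ScanServersDirectory();         // live tablet servers
  auto assigned = QueryAssignedTablets(servers);  // tablets already being served
  if (!assigned.count(kRootTablet))
    MarkUnassigned(kRootTablet);                  // root must be assigned before step 4
  for (const auto& tablet : ScanMetadataTable())  // full set of tablets
    if (!assigned.count(tablet)) MarkUnassigned(tablet);
}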

10. Explain how and when tablets are merged and split.
The set of existing tablets only changes when a table is created or deleted, two existing tablets are
merged to form one larger tablet, or an existing tablet is split into two smaller tablets. The master
is able to keep track of these changes because it initiates all but the last. Tablet splits are treated
specially since they are initiated by a tablet server. The tablet server commits the split by
recording information for the new tablet in the METADATA table. When the split has committed,
it notifies the master. In case the split notification is lost (either because the tablet server or the
master died), the master detects the new tablet when it asks a tablet server to load the tablet that
has now split. The tablet server will notify the master of the split, because the tablet entry it finds
in the METADATA table will specify only a portion of the tablet that the master asked it to load.

11. Describe tablet read and write operations.


When a write operation arrives at a tablet server, the server checks that it is well-formed, and that
the sender is authorized to perform the mutation. Authorization is performed by reading the list of
permitted writers from a Chubby file (which is almost always a hit in the Chubby client cache). A
valid mutation is written to the commit log. Group commit is used to improve the throughput of
lots of small mutations. After the write has been committed, its contents are inserted into the
memtable.
When a read operation arrives at a tablet server, it is similarly checked for well-formedness and
proper authorization. A valid read operation is executed on a merged view of the sequence of
SSTables and the memtable. Since the SSTables and the memtable are lexicographically sorted
data structures, the merged view can be formed efficiently.
Incoming read and write operations can continue while tablets are split and merged.
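As a simplified sketch of the read path (assuming, for illustration, that the memtable and each SSTable behave like sorted maps from key to value), a point read consults the memtable first and then the SSTables from newest to oldest:

// Sketch: a point read over the memtable plus a stack of SSTables, newest first.
// Deletion entries (see question 12) are ignored here for brevity.
#include <map>
#include <optional>
#include <string>
#include <vector>

using SortedRun = std::map<std::string, std::string>;  // stands in for memtable or an SSTable

std::optional<std::string> Read(const SortedRun& memtable,
                                const std::vector<SortedRun>& sstables,  // oldest first, newest last
                                const std::string& key) {
  // The memtable holds the most recent mutations, so it is consulted first.
  if (auto it = memtable.find(key); it != memtable.end()) return it->second;
  // Then the SSTables, from newest to oldest; the first hit is the current value.
  for (auto run = sstables.rbegin(); run != sstables.rend(); ++run) {
    if (auto it = run->find(key); it != run->end()) return it->second;
  }
  return std::nullopt;  // key not present in any run
}

For range scans, all of the runs would instead be merged with a k-way merge, which is efficient precisely because each run is already sorted.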

12. Describe when and how tablets are compacted.


As write operations execute, the size of the memtable increases. When the memtable size reaches
a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is
converted to an SSTable and written to GFS.
This minor compaction process has two goals:
- it shrinks the memory usage of the tablet server, and
- it reduces the amount of data that has to be read from the commit log during recovery if this server dies.
Incoming read and write operations can continue while compactions occur.
Every minor compaction creates a new SSTable. If this behavior continued unchecked, read
operations might need to merge updates from an arbitrary number of SSTables. Instead, we bound
the number of such files by periodically executing a merging compaction in the background. A
merging compaction reads the contents of a few SSTables and the memtable, and writes out a new
SSTable. The input SSTables and memtable can be discarded as soon as the compaction has
finished.
A merging compaction that rewrites all SSTables into exactly one SSTable is called a major
compaction. SSTables produced by non-major compactions can contain special deletion entries
that suppress deleted data in older SSTables that are still live. A major compaction, on the other
hand, produces an SSTable that contains no deletion information or deleted data. Bigtable cycles
through all of its tablets and regularly applies major compactions to them. These major
compactions allow Bigtable to reclaim resources used by deleted data, and also allow it to ensure
that deleted data disappears from the system in a timely fashion, which is important for services
that store sensitive data.
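The memtable threshold check and the bound on the number of SSTables could be sketched as below; the threshold values, the representation of an SSTable as an in-memory sorted map, and the merge policy are all assumptions of this sketch:

// Sketch: minor compaction when the memtable grows too large, plus a merging
// compaction that bounds how many SSTables a read has to consult.
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using SortedRun = std::map<std::string, std::string>;  // stands in for memtable or an SSTable

struct Tablet {
  SortedRun memtable;
  std::vector<SortedRun> sstables;  // oldest first, newest last
};

const std::size_t kMemtableThreshold = 4096;  // illustrative; measured in entries, not bytes
const std::size_t kMaxSSTables = 8;           // illustrative bound on SSTable count

void MaybeCompact(Tablet& t) {
  // Minor compaction: freeze the memtable, write it out as a new SSTable,
  // and start a fresh memtable (the GFS write is elided here).
  if (t.memtable.size() >= kMemtableThreshold) {
    t.sstables.push_back(std::move(t.memtable));
    t.memtable.clear();
  }
  // Merging compaction: fold the two oldest SSTables into one so that reads
  // never have to merge an unbounded number of files.
  while (t.sstables.size() > kMaxSSTables) {
    SortedRun merged = std::move(t.sstables[0]);
    for (auto& [key, value] : t.sstables[1])
      merged.insert_or_assign(key, value);  // the newer run wins on duplicate keys
    t.sstables.erase(t.sstables.begin(), t.sstables.begin() + 2);
    t.sstables.insert(t.sstables.begin(), std::move(merged));
  }
}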
