1. Explain the basics of the Google App Engine (GAE) infrastructure programming model.
Introduction:
Google App Engine (GAE) is a Platform as a Service (PaaS) provided by Google that
allows developers to build, deploy, and run web applications on Google’s infrastructure
without worrying about managing servers or hardware.
GAE offers a complete platform including computing power, data storage, security, and load
balancing.
Key Features of GAE:
1. Supports Programming Languages:
o Java and Python are mainly supported.
o Developers can use web frameworks like Django (Python) and Google Web
Toolkit (Java).
2. Automatic Scaling:
o GAE automatically adds or removes application instances (and with them CPU and
memory) as traffic changes.
o No need for manual scaling or managing servers.
3. Load Balancing:
o Distributes incoming traffic efficiently across multiple servers for high
performance.
4. Sandboxed Environment:
o Each app runs in a secure, isolated environment which increases security and
stability.
5. Persistent Data Storage:
o The GAE Datastore, built on BigTable (a NoSQL database), stores structured data.
o Blobstore is available for large file storage (up to 2 GB).
6. APIs and Services:
o Provides built-in APIs for:
▪ Sending emails
▪ Authenticating users via Google accounts
▪ Accessing images, URLs, etc.
7. Free and Pay-as-you-go Model:
o Free usage up to a quota.
o Charges apply only when you exceed the quota.
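Features 2 and 3 above (automatic scaling and load balancing) can be pictured with a small sketch. This is not GAE's actual scheduler. The function names, the per-instance capacity figure, and the round-robin policy are all assumptions chosen for illustration.

```python
import itertools
import math


def instances_needed(requests_per_second, capacity_per_instance, min_instances=1):
    """Return how many instances are needed to serve the given load.

    capacity_per_instance is a hypothetical figure: the number of
    requests per second that one instance can handle.
    """
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, needed)


def round_robin(instances, request_ids):
    """Assign each request to an instance in rotating order."""
    rotation = itertools.cycle(instances)
    return {request_id: next(rotation) for request_id in request_ids}


# Traffic rises from 80 to 250 requests/s; each instance handles 100.
low = instances_needed(80, 100)    # one instance is enough
high = instances_needed(250, 100)  # scaled out to three instances
assignment = round_robin(["inst-a", "inst-b"], [1, 2, 3])
```

Round-robin is only the simplest possible balancing policy. A real platform would also weigh instance health and current load.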
GAE Architecture:
Component | Function
DataStore | Stores data using BigTable with support for transactions.
Application Runtime | Provides an environment to run Java/Python apps securely.
Admin Console | Used to deploy, monitor, and manage applications easily.
Google Secure Data Connector (SDC) | Provides secure access to private data from the cloud.
Local SDK | Allows developers to test apps locally before deploying to the cloud.
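As a rough illustration of what the Application Runtime hosts, the sketch below is a minimal Python web application written against the standard WSGI interface, using only the standard library. It assumes a Python runtime that serves a WSGI callable. The names `app` and `serve_locally`, the port, and the response text are illustrative choices, not taken from the notes above.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A minimal WSGI application: answer every request with plain text."""
    body = b"Hello from a sandboxed app\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve_locally(port=8080):
    """Serve the app on localhost, as a stand-in for testing before deployment."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling `serve_locally()` and opening http://localhost:8080 plays the role the Local SDK plays in the table: the same application code is exercised on the developer's machine before it is deployed.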
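Google Applications on the Same Infrastructure:
• Gmail
• Google Docs
• Google Maps
• Google Earth
These apps are commonly cited alongside GAE because they run on the same Google
infrastructure that GAE opens to outside developers. They are scalable and support
millions of users globally.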
Summary:
Google App Engine allows developers to focus on writing application logic while Google
handles everything else like infrastructure, scaling, and performance. It’s a powerful tool for
building reliable and scalable web applications easily.
2. Outline the architecture of Google File System (GFS).
Introduction:
Google File System (GFS) is a distributed file system created by Google to store and
manage huge amounts of data across many servers. It is mainly used for internal Google
applications like search indexing, Gmail, etc.
Key Design Goals of GFS:
• Handle very large files (hundreds of MB or GB).
• Be fault-tolerant (hardware failures are common).
• Support high throughput rather than low latency.
• Optimize for write-once, read-many usage: files are mostly appended to and then read
sequentially.
GFS Architecture:
GFS uses a Master–Chunk Server model:
Component | Description
Master Server | Controls the file system. Maintains metadata such as file names, chunk locations, and namespace.
Chunk Servers | Store actual file data in chunks (default size: 64 MB). Each chunk is replicated on multiple servers (usually 3).
Clients | Request chunk locations (metadata) from the master, then communicate directly with chunk servers to read/write chunks.
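The division of labour in the table can be sketched as a toy data structure: the master holds only metadata, and a client turns a byte offset into a chunk lookup. The class name `Master`, its methods, the file name, and the server names are hypothetical. Only the 64 MB chunk size and the three replicas come from the notes.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB default chunk size


class Master:
    """Toy model of the metadata a GFS-style master keeps (no file data)."""

    def __init__(self):
        self.file_chunks = {}      # file name -> ordered list of chunk handles
        self.chunk_locations = {}  # chunk handle -> list of chunk server names

    def add_chunk(self, file_name, handle, servers):
        """Record a new chunk of a file and the servers holding its replicas."""
        self.file_chunks.setdefault(file_name, []).append(handle)
        self.chunk_locations[handle] = list(servers)

    def locate(self, file_name, byte_offset):
        """Return (chunk handle, replica servers) for a byte offset in a file."""
        index = byte_offset // CHUNK_SIZE
        handle = self.file_chunks[file_name][index]
        return handle, self.chunk_locations[handle]


master = Master()
master.add_chunk("/logs/crawl", "chunk-0", ["cs1", "cs2", "cs3"])
master.add_chunk("/logs/crawl", "chunk-1", ["cs2", "cs3", "cs4"])

# Byte 70 MB falls in the second chunk, so the client is sent to its replicas.
handle, servers = master.locate("/logs/crawl", 70 * 1024 * 1024)
```

After this lookup the client contacts one of the returned chunk servers directly, which keeps the master out of the data path.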
Data Flow in GFS (Write Operation):
1. Client → Master: Client asks the master which chunk server is the primary for the
chunk and where the other replicas are.
2. Master Response: Master tells the client which server is the primary and the list of
secondaries.
3. Client → Replicas: Client sends the data to all replicas (primary + secondaries).
4. Client → Primary: Once all servers receive the data, the client sends a write
command to the primary server.
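5. Primary → Secondaries: Primary assigns a serial number to the write, applies it
locally, and forwards the command so that every replica applies writes in the same order.
6. Secondaries → Primary: Once all secondaries finish writing, they confirm back to the
primary.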
7. Primary → Client: Finally, the primary server informs the client that the write was
successful.
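Steps 3 to 7 of the write flow can be simulated in a few lines. This is a simplified in-memory sketch, not GFS code. The master lookup of steps 1 and 2 is assumed to have already happened, and the class and function names (`ChunkServer`, `Primary`, `client_write`) are invented for illustration.

```python
class ChunkServer:
    """A replica that buffers pushed data, then applies it on command."""

    def __init__(self, name):
        self.name = name
        self.buffer = {}  # data id -> bytes pushed by the client, not yet written
        self.chunk = []   # (serial number, bytes) in the order writes were applied

    def push_data(self, data_id, data):
        """Step 3: receive the data and hold it until a write command arrives."""
        self.buffer[data_id] = data

    def apply(self, serial, data_id):
        """Write the buffered data into the chunk and acknowledge."""
        self.chunk.append((serial, self.buffer.pop(data_id)))
        return True


class Primary(ChunkServer):
    """The replica that decides the order of writes for a chunk."""

    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        """Steps 5 to 7: number the write, apply it, forward it, collect acks."""
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        acks = [s.apply(serial, data_id) for s in self.secondaries]
        return all(acks)


def client_write(primary, data_id, data):
    """Steps 3 and 4: push data to every replica, then ask the primary to write."""
    for replica in [primary] + primary.secondaries:
        replica.push_data(data_id, data)
    return primary.write(data_id)
```

Because only the primary hands out serial numbers, every replica ends up with the same sequence of writes even when several clients write concurrently.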
Key Features:
• Fault Tolerance:
o Every chunk is replicated (usually 3 times) across different servers/racks.
o Ensures data availability even if some servers fail.
• Efficient Data Management:
o Large chunk size (64 MB) reduces the amount of metadata the master must keep and
speeds up sequential data access.
• Master Server Role:
o Handles metadata and gives instructions.
o Doesn’t participate in actual data transfer, improving performance.
• Shadow Master:
o A replica of the master that keeps the file system readable (read-only) if the
primary master fails.
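A quick calculation shows why the large chunk size keeps the master's metadata small. The 64 MB chunk size and three replicas come from the notes. The figure of 64 bytes of metadata per chunk is an assumption used only to make the arithmetic concrete, and `storage_plan` is a hypothetical helper.

```python
CHUNK_SIZE = 64 * 1024**2  # 64 MB chunk size, from the notes
REPLICAS = 3               # usual replication factor, from the notes
METADATA_PER_CHUNK = 64    # bytes of master metadata per chunk (assumed figure)


def storage_plan(file_size_bytes):
    """Estimate chunk count, replica count, and master metadata for one file."""
    chunks = (file_size_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE  # round up
    return {
        "chunks": chunks,
        "chunk_replicas": chunks * REPLICAS,
        "master_metadata_bytes": chunks * METADATA_PER_CHUNK,
    }


plan = storage_plan(1024**4)  # a 1 TiB file
```

Under these assumptions a 1 TiB file needs 16,384 chunks (49,152 replicas across the cluster) but only about 1 MiB of metadata on the master.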
Real-World Example:
Let’s say Google Search needs to index web pages:
• The data is stored in GFS as large files.
• GFS breaks them into chunks and stores the chunks across different servers.
• If one server fails, GFS can still fetch the data from the chunk’s replicas on other servers.
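The failover in the last bullet can be sketched as a read that tries each replica in turn. This is an illustrative model only: `ServerDown`, `read_chunk`, the `fetch` callback, and the server and chunk names are all hypothetical.

```python
class ServerDown(Exception):
    """Raised when a chunk server cannot be reached."""


def read_chunk(handle, replicas, fetch):
    """Try each replica in turn and return data from the first that answers."""
    for server in replicas:
        try:
            return fetch(server, handle)
        except ServerDown:
            continue  # this replica is unreachable; try the next one
    raise ServerDown(f"no live replica for {handle}")


# Three replicas were assigned, but cs1 has failed and holds nothing reachable.
stored = {
    "cs2": {"chunk-7": b"page data"},
    "cs3": {"chunk-7": b"page data"},
}


def fetch(server, handle):
    """Stand-in for a network read from one chunk server."""
    if server not in stored:
        raise ServerDown(server)
    return stored[server][handle]


data = read_chunk("chunk-7", ["cs1", "cs2", "cs3"], fetch)
```

The read succeeds from cs2 even though cs1 is down, which is the availability guarantee that replication provides.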
Summary:
GFS provides a scalable, fault-tolerant, and high-performance storage system to support
Google’s massive data needs. Its architecture is simple but powerful: a central master
for metadata, chunk servers for data, and clients that talk directly to the chunk servers.