The document discusses various aspects of cloud computing, focusing on data storage and management, including relational databases, cloud file systems, and NoSQL databases. It highlights key technologies like Google File System, Hadoop Distributed File System, Google Bigtable, and Amazon Dynamo, along with their architectures and features. Additionally, it covers the MapReduce programming model for processing large datasets, including its phases and real-world applications.


Cloud Computing – Unit 3

1. Data in the Cloud


Cloud platforms provide storage and management of data without the need for on-premises
infrastructure. The data can be structured, semi-structured, or unstructured.

1.1 Relational Databases in the Cloud


Relational databases store data in tables (rows and columns), with relationships defined
between tables. SQL is used to query relational databases.

Cloud Examples:

- Amazon RDS

- Google Cloud SQL

- Microsoft Azure SQL Database

Features:

- Automatic backups

- Multi-zone replication

- Automatic failover

- Horizontal and vertical scaling
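To make the relational model concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a managed cloud database such as Amazon RDS; the table names and data are invented for illustration, but the SQL itself would look essentially the same against a cloud service.

```python
import sqlite3

# In-memory SQLite stands in for a managed cloud relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),  -- relationship between tables
    total REAL)""")
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0)")

# SQL query that follows the relationship between the two tables.
row = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name""").fetchone()
print(row)  # ('Alice', 65.0)
```

The rows-and-columns layout, the declared relationship, and the SQL join are exactly what a cloud RDBMS offering manages for you, with backups, replication, and failover handled by the provider.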

2. Cloud File Systems


Traditional file systems can't scale to handle petabytes of data efficiently. Cloud computing uses
distributed file systems to manage large data sets across clusters.

2.1 Google File System (GFS)


A proprietary distributed file system developed by Google, designed to store very large files across clusters of commodity hardware.
Architecture:

- Master Node: metadata and chunk locations

- Chunk Servers: store data blocks (chunks)

- Default chunk size: 64 MB, replicated 3 times

Advantages:

- Fault-tolerant

- Scalable

- Optimized for large, sequential files

2.2 Hadoop Distributed File System (HDFS)


An open-source distributed file system inspired by GFS; it is the storage layer of Apache Hadoop and is widely used in big data processing.

Architecture:

- NameNode (Master): manages metadata

- DataNodes: store actual blocks

Characteristics:

- Block size: 128 MB (default)

- Fault-tolerance via replication (3 copies)

- Write-once, read-many optimized
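The block-and-replica scheme used by both GFS and HDFS can be sketched as follows. This is a toy model under simplifying assumptions (round-robin placement, no rack awareness, no node-load balancing); the function name and node labels are hypothetical.

```python
import math

def place_blocks(file_size, datanodes, block_size=128 * 2**20, replicas=3):
    """Toy model of HDFS-style placement: split a file into fixed-size
    blocks (128 MB default; GFS used 64 MB chunks) and assign each block
    to `replicas` distinct DataNodes round-robin."""
    n_blocks = math.ceil(file_size / block_size)
    plan = {}
    for b in range(n_blocks):
        plan[b] = [datanodes[(b + r) % len(datanodes)] for r in range(replicas)]
    return plan

# A 300 MB file needs 3 blocks of up to 128 MB (the last one partially filled).
plan = place_blocks(300 * 2**20, ["dn1", "dn2", "dn3", "dn4"])
print(len(plan))  # 3 blocks
print(plan[0])    # ['dn1', 'dn2', 'dn3']
```

With 3 copies of every block on distinct nodes, the loss of any single DataNode leaves at least two replicas available, which is the basis of the fault tolerance described above.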

2.3 Comparison: GFS vs HDFS


| Feature     | GFS      | HDFS     |
|-------------|----------|----------|
| Origin      | Google   | Apache   |
| Open Source | No       | Yes      |
| Block Size  | 64 MB    | 128 MB   |
| Language    | C++      | Java     |
| Replication | 3 copies | 3 copies |

3. NoSQL Databases in the Cloud


NoSQL databases were created to address the scalability limitations of RDBMSs: they relax the rigid relational schema (and often strict consistency) in exchange for horizontal scaling and high availability.

3.1 Google Bigtable


A distributed storage system for structured data, built on top of GFS and designed to scale across thousands of machines.

Structure:

- Each cell is addressed by a (row key, column key, timestamp) triple

- Supports versioning

Used in:

- Google Search Index

- Google Earth and Maps
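The (row, column, timestamp) data model with versioning can be illustrated with a small in-memory sketch; the class name and example row/column keys are invented for illustration and this omits everything about distribution and persistence.

```python
class TinyBigtable:
    """Minimal in-memory sketch of Bigtable's data model: each cell is
    addressed by (row key, column key) and keeps multiple timestamped
    versions, newest first."""
    def __init__(self):
        self.cells = {}  # (row, column) -> list of (timestamp, value)

    def put(self, row, column, timestamp, value):
        versions = self.cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(reverse=True)  # keep newest version first

    def get(self, row, column, timestamp=None):
        # Return the latest version, or the latest at/before `timestamp`.
        for ts, value in self.cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

t = TinyBigtable()
t.put("com.example/index", "contents:html", 1, "<html>v1</html>")
t.put("com.example/index", "contents:html", 2, "<html>v2</html>")
print(t.get("com.example/index", "contents:html"))     # latest: v2
print(t.get("com.example/index", "contents:html", 1))  # as of timestamp 1: v1
```

The timestamp dimension is what lets applications such as a web crawl index keep several versions of the same cell and read the state as of a chosen time.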

3.2 Apache HBase


An open-source implementation of the Bigtable design that runs on top of HDFS.

Features:

- Column-family oriented

- Real-time read/write

- Schema-less

3.3 Amazon Dynamo


A highly available key-value store developed at Amazon.

Design:

- Eventual Consistency

- Decentralized P2P architecture

- Consistent Hashing, Vector Clocks

Use Cases:

- Shopping carts, Session storage
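Consistent hashing, one of the design techniques listed above, can be sketched as follows: nodes and keys are hashed onto a ring, and a key is stored on the first node clockwise from its hash plus the next distinct nodes as replicas. The class and node names are hypothetical, and real Dynamo adds virtual nodes and rack awareness.

```python
import bisect
import hashlib

def ring_hash(key):
    # Any stable hash works for the sketch; MD5 gives a wide ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Sketch of a Dynamo-style consistent-hashing ring."""
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def preference_list(self, key):
        # First node clockwise from the key's hash, then the next
        # distinct nodes, form the replica set for the key.
        idx = bisect.bisect(self.ring, (ring_hash(key),)) % len(self.ring)
        nodes = []
        while len(nodes) < self.replicas:
            node = self.ring[idx % len(self.ring)][1]
            if node not in nodes:
                nodes.append(node)
            idx += 1
        return nodes

ring = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("cart:user42"))  # 3 distinct replica nodes
```

Because only neighboring keys move when a node joins or leaves, the ring lets the decentralized P2P membership change without rehashing the whole keyspace.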

4. MapReduce and Extensions


MapReduce is a programming model for processing large datasets in parallel across a cluster of machines.

4.1 Concept of MapReduce


Two phases:

1. Map: Emit key-value pairs

2. Reduce: Merge values by key

Example: Word Count

Map: <word, 1>

Reduce: Sum counts per word
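The word-count example above can be sketched end to end in plain Python; this is a single-process illustration of the model (including the shuffle step between the two phases), not a distributed implementation.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map: emit a <word, 1> pair for every word in the document.
    return [(word, 1) for word in document.lower().split()]

def shuffle(pairs):
    # Shuffle/sort: group all values emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cloud stores data", "the cloud scales"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'the': 2, 'cloud': 2, 'stores': 1, 'data': 1, 'scales': 1}
```

In a real cluster the map calls run on many nodes at once and the shuffle moves each key's pairs to the node running its reducer; the logic per phase is exactly this simple.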

4.2 Parallel Computing in MapReduce


- Data and Task parallelism

- Intermediate results shuffled and sorted

MapReduce is efficient when:

- Computation runs locally on the nodes that store the data
- Data movement across the network is minimized
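Data parallelism, the first point above, simply means each worker applies the same map function to its own partition of the input. A minimal sketch, with a thread pool standing in for cluster nodes (function and variable names are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(partition):
    # Each worker maps over its own partition, independently of the others.
    return sum(len(line.split()) for line in partition)

lines = ["a b c", "d e", "f", "g h i j"]
partitions = [lines[0:2], lines[2:4]]  # split the input across 2 workers

with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, partitions))

print(partial_counts, sum(partial_counts))  # [5, 5] 10
```

The partial results per partition are what the shuffle-and-sort step would then merge before the reduce phase.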

4.3 Relational Operations Using MapReduce


- Selection: Filter in Map

- Projection: Select columns in Map

- Group By: Group in Reduce

- Join: Tagged Map, matched Reduce
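The join case ("tagged Map, matched Reduce") is the least obvious of these, so here is a reduce-side join sketch: the map phase tags each tuple with its source table, the shuffle groups tuples by the join key, and the reduce phase pairs tuples from the two sides. The tables and values are invented sample data.

```python
from collections import defaultdict

customers = [(1, "Alice"), (2, "Bob")]               # (customer_id, name)
orders = [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 5.0)]  # (order_id, customer_id, total)

def map_join():
    # Map: emit <join key, tagged tuple>, tagging each tuple's source table.
    for cid, name in customers:
        yield cid, ("C", name)
    for oid, cid, total in orders:
        yield cid, ("O", (oid, total))

groups = defaultdict(list)
for key, tagged in map_join():   # shuffle: group tuples by the join key
    groups[key].append(tagged)

joined = []
for cid, tuples in groups.items():  # reduce: match the two sides per key
    names = [v for tag, v in tuples if tag == "C"]
    for tag, v in tuples:
        if tag == "O":
            for name in names:
                joined.append((name, *v))

print(sorted(joined))
# [('Alice', 10, 25.0), ('Alice', 11, 40.0), ('Bob', 12, 5.0)]
```

Selection and projection, by contrast, need only the map phase (filter or drop fields per tuple), and group-by falls straight out of the shuffle's grouping by key.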

4.4 Enterprise Batch Processing


Used for large data transformation jobs.

Tools: Hadoop, Spark

Use Cases:

- Monthly reports

- ETL pipelines

4.5 Real-World Applications of MapReduce


- Log analysis

- Social graphs

- Recommendation engines

- Bioinformatics

- Fraud detection

Summary Table
| Concept   | Type     | Use Case                | Key Feature                         |
|-----------|----------|-------------------------|-------------------------------------|
| RDBMS     | SQL DB   | Structured data         | ACID transactions                   |
| GFS/HDFS  | File Sys | Big data storage        | Distributed, fault-tolerant         |
| Bigtable  | NoSQL    | Google-scale storage    | Column family, timestamps           |
| HBase     | NoSQL    | Real-time Hadoop access | Built on HDFS                       |
| Dynamo    | NoSQL    | Key-value store         | Decentralized, eventual consistency |
| MapReduce | Process  | Batch processing        | Map & Reduce model                  |
