Cloud Computing – Unit 3
1. Data in the Cloud
Cloud platforms provide storage and management of data without the need for on-premises
infrastructure. Data can be structured, semi-structured, or unstructured.
1.1 Relational Databases in the Cloud
Relational databases store data in tables (rows and columns), with relationships defined
between tables. SQL is used to query relational databases.
Cloud Examples:
- Amazon RDS
- Google Cloud SQL
- Microsoft Azure SQL Database
Features:
- Automatic backups
- Multi-zone replication
- Automatic failover
- Horizontal and vertical scaling
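In application code, a cloud-hosted relational database behaves like any other SQL endpoint. A minimal JDBC sketch, assuming a MySQL-compatible Amazon RDS instance; the hostname, database name, and credentials are hypothetical, and the MySQL JDBC driver is assumed to be on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RdsQueryExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint for a MySQL-compatible Amazon RDS instance.
        String url = "jdbc:mysql://mydb.example.us-east-1.rds.amazonaws.com:3306/shop";
        try (Connection conn = DriverManager.getConnection(url, "admin", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM customers")) {
            while (rs.next()) {
                System.out.println(rs.getInt("id") + " " + rs.getString("name"));
            }
        }
    }
}
```

The point is that backups, replication, and failover from the feature list above happen behind this endpoint; the application code does not change.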
2. Cloud File Systems
Traditional file systems cannot efficiently scale to petabytes of data. Cloud platforms
therefore use distributed file systems to manage large data sets across clusters of machines.
2.1 Google File System (GFS)
A proprietary distributed file system developed by Google, designed to store huge files reliably across clusters of commodity hardware.
Architecture:
- Master Node: metadata and chunk locations
- Chunk Servers: store data blocks (chunks)
- Default chunk size: 64 MB, replicated 3 times
Advantages:
- Fault-tolerant
- Scalable
- Optimized for large, sequential files
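As a rough illustration of this architecture, the sketch below shows the master's two metadata tables and how they resolve a client read. All names are made up, since GFS is proprietary and has no public API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of the GFS master's metadata tables (illustrative names only).
public class GfsMasterSketch {
    static final long CHUNK_SIZE = 64L * 1024 * 1024; // default 64 MB chunks

    // file path -> ordered list of chunk handles
    private final Map<String, List<Long>> fileToChunks = new HashMap<>();
    // chunk handle -> chunk servers holding a replica (3 by default)
    private final Map<Long, List<String>> chunkReplicas = new HashMap<>();

    // A client asks the master only for chunk locations (assumes the file
    // exists), then streams the actual bytes directly from a chunk server,
    // keeping the master off the data path.
    public List<String> replicasFor(String path, long byteOffset) {
        long chunkIndex = byteOffset / CHUNK_SIZE;
        long handle = fileToChunks.get(path).get((int) chunkIndex);
        return chunkReplicas.get(handle);
    }
}
```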
2.2 Hadoop Distributed File System (HDFS)
An open-source distributed file system inspired by GFS; it is the storage layer of Apache Hadoop and is widely used for big data workloads.
Architecture:
- NameNode (Master): manages metadata
- DataNodes: store actual blocks
Characteristics:
- Block size: 128 MB (default)
- Fault-tolerance via replication (3 copies)
- Write-once, read-many optimized
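Applications reach HDFS through Hadoop's FileSystem API: the NameNode resolves paths to blocks, and DataNodes store and replicate the bytes. A minimal write sketch, assuming a NameNode at the hypothetical address hdfs://namenode:9000:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; in practice this comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/data/events.log"))) {
            // The client writes a stream; HDFS splits it into blocks and the
            // DataNodes replicate each block (3 copies by default).
            out.writeUTF("hello hdfs");
        }
    }
}
```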
2.3 Comparison: GFS vs HDFS
| Feature | GFS | HDFS |
|------------------|-----------|-----------|
| Origin | Google | Apache |
| Open Source | No | Yes |
| Block Size | 64 MB | 128 MB |
| Language | C++ | Java |
| Replication | 3 copies | 3 copies |
3. NoSQL Databases in the Cloud
NoSQL databases were created to address the scalability limitations of relational databases, trading strict relational guarantees for horizontal scale and availability.
3.1 Google Bigtable
A distributed storage system for structured data, built on top of GFS.
Structure:
- Each cell value is addressed by a (row key, column key, timestamp) triple
- Multiple timestamped versions of a cell are retained (versioning)
Used in:
- Google Search Index
- Google Earth and Maps
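Conceptually, Bigtable is a sparse, sorted, multi-dimensional map from (row key, column key, timestamp) to a value. A toy in-memory sketch of that data model, not the real Bigtable API:

```java
import java.util.Comparator;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy sketch of Bigtable's data model: row -> column -> timestamp -> value.
public class BigtableModelSketch {
    // Timestamps are kept newest-first so the latest version is cheap to find.
    private final SortedMap<String, SortedMap<String, SortedMap<Long, byte[]>>> cells =
            new TreeMap<>();

    public void put(String row, String column, long timestamp, byte[] value) {
        cells.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<>(Comparator.reverseOrder()))
             .put(timestamp, value);
    }

    // Versioning: older values stay addressable by timestamp; this returns
    // the newest one (assumes the cell exists).
    public byte[] getLatest(String row, String column) {
        SortedMap<Long, byte[]> versions = cells.get(row).get(column);
        return versions.get(versions.firstKey()); // firstKey() = newest timestamp
    }
}
```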
3.2 Apache HBase
An open-source implementation of the Bigtable model that runs on top of HDFS.
Features:
- Column-family oriented
- Real-time read/write
- Schema-less (only column families are defined up front)
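A minimal sketch using the standard HBase Java client; it assumes a reachable cluster and an existing table users with a column family info, both hypothetical:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutGetExample {
    public static void main(String[] args) throws Exception {
        // Assumes a reachable HBase cluster and a pre-created table 'users'
        // with column family 'info' (both hypothetical).
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            Put put = new Put(Bytes.toBytes("user42"));          // row key
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                          Bytes.toBytes("Alice"));
            table.put(put);                                      // real-time write

            Get get = new Get(Bytes.toBytes("user42"));
            Result result = table.get(get);                      // real-time read
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}
```

Note that columns inside the info family are created on the fly, which is what "schema-less" means here.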
3.3 Amazon Dynamo
A highly available key-value store developed by Amazon.
Design:
- Eventual Consistency
- Decentralized P2P architecture
- Consistent Hashing, Vector Clocks
Use Cases:
- Shopping carts, Session storage
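Consistent hashing is what lets Dynamo add or remove nodes while remapping only neighboring keys. A minimal sketch of the ring; real Dynamo additionally uses virtual nodes, replication preference lists, and vector clocks:

```java
import java.util.TreeMap;
import java.util.Map;

// Minimal consistent-hashing ring in the spirit of Dynamo (illustrative only).
public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        ring.put(node.hashCode(), node);
    }

    // A key belongs to the first node clockwise from its hash position, so
    // adding or removing a node only remaps keys in the adjacent arc.
    public String nodeFor(String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(key.hashCode());
        return (e != null) ? e.getValue() : ring.firstEntry().getValue();
    }

    public static void main(String[] args) {
        ConsistentHashRing r = new ConsistentHashRing();
        r.addNode("node-a");
        r.addNode("node-b");
        r.addNode("node-c");
        System.out.println(r.nodeFor("cart:user42")); // owner of this key
    }
}
```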
4. MapReduce and Extensions
A programming model for processing large datasets in parallel across clusters of machines.
4.1 Concept of MapReduce
Two phases:
1. Map: process input records and emit intermediate key-value pairs
2. Reduce: merge all values that share the same key
Example: Word Count
Map: emit <word, 1> for every word in the input
Reduce: sum the counts per word to get its total frequency
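The word count above translates directly into Hadoop's Java MapReduce API. A standard sketch: input and output paths come from the command line, and the reducer doubles as a combiner so counts are pre-aggregated locally before the shuffle:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE); // emit <word, 1>
                }
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum)); // total count per word
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // local pre-aggregation cuts shuffle traffic
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```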
4.2 Parallel Computing in MapReduce
- Combines data parallelism (input splits processed by many mappers) and task parallelism
- Intermediate results are shuffled and sorted by key between the two phases
Efficient if:
- Processing runs where the data is stored (data locality)
- Data movement across the network is minimized
4.3 Relational Operations Using MapReduce
- Selection: filter rows in the Map phase (see the sketch after this list)
- Projection: emit only the required columns in the Map phase
- Group By: rows with the same key meet at one reducer, which aggregates them
- Join: Map tags each record with its source table; Reduce matches records that share the join key
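Selection and projection need only a Map phase, as the sketch below shows; the CSV input format, column positions, and filter value are all hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Selection + projection in one Mapper over CSV rows: keep rows whose third
// column (hypothetically "country") equals "IN", and emit only the first two
// columns (hypothetically "id" and "name"). No Reduce phase is needed.
public class SelectProjectMapper extends Mapper<Object, Text, Text, NullWritable> {
    @Override
    protected void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String[] cols = value.toString().split(",");
        if (cols.length > 2 && "IN".equals(cols[2])) {        // selection (WHERE)
            ctx.write(new Text(cols[0] + "," + cols[1]),       // projection (SELECT)
                      NullWritable.get());
        }
    }
}
```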
4.4 Enterprise Batch Processing
Used for large data transformation jobs.
Tools: Hadoop, Spark
Use Cases:
- Monthly reports
- ETL pipelines
4.5 Real-World Applications of MapReduce
- Log analysis
- Social graphs
- Recommendation engines
- Bioinformatics
- Fraud detection
Summary Table
| Concept | Type | Use Case | Key Feature |
|------------|----------|--------------------------|---------------------------|
| RDBMS | SQL DB | Structured data | ACID transactions |
| GFS/HDFS | File Sys | Big data storage | Distributed, fault-tolerant |
| Bigtable | NoSQL | Google-scale storage | Column family, timestamps |
| HBase | NoSQL | Real-time Hadoop access | Built on HDFS |
| Dynamo | NoSQL | Key-value store | Decentralized, eventual consistency |
| MapReduce | Process | Batch processing | Map & Reduce model |