
Lecture 5 Distributed Storage Systems

The document discusses distributed storage systems in cloud computing, highlighting their importance for reliable, scalable, and high-performance data storage. It covers various cloud storage services, distributed file systems, NoSQL databases, data consistency models, cloud-based data warehousing, and real-time processing solutions. Additionally, it addresses backup, disaster recovery, and security measures essential for maintaining data integrity and availability.

Uploaded by

5699silver


Distributed Storage Systems

Cloud Computing
Spring 2025
Introduction
• In cloud computing, storage is not confined to a single server or
location.
• Distributed storage systems enable reliable, scalable, and high-
performance data storage across a network of machines.
• These systems underpin many cloud services and are fundamental to
supporting modern applications that require access to large-scale,
highly available data.
• This chapter explores the various facets of distributed storage in the
cloud, from fundamental storage services to advanced architectures
for real-time processing and disaster recovery.
Cloud Storage Services
Cloud providers offer highly scalable and durable storage solutions for
unstructured data. Key services include:

• Amazon S3

• Google Cloud Storage

• Azure Blob Storage

Amazon S3
• Amazon Simple Storage Service (S3) is an object storage service that
offers high scalability, availability, and security, and is designed for
99.999999999% (11 nines) durability.
• S3 organizes data into buckets and allows users to store and retrieve
any amount of data at any time.
• Key features include versioning, lifecycle policies, cross-region
replication, encryption, and fine-grained access control.
• Integrates with AWS analytics and compute services.
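The 11-nines durability figure is easier to interpret as an expected loss rate. The following is a minimal sketch of that arithmetic, assuming the figure is read as the per-object probability of surviving one year and that object losses are independent; the function names are illustrative only.

```python
from decimal import Decimal

# Assumption: durability is the per-object probability of surviving one
# year, and losses of different objects are independent events.
ANNUAL_DURABILITY = Decimal("0.99999999999")  # eleven nines


def expected_annual_losses(object_count: int) -> Decimal:
    """Expected number of objects lost per year for a given object count."""
    return Decimal(object_count) * (Decimal(1) - ANNUAL_DURABILITY)


def years_per_single_loss(object_count: int) -> Decimal:
    """Average number of years between losses of a single object."""
    return Decimal(1) / expected_annual_losses(object_count)


if __name__ == "__main__":
    print(expected_annual_losses(10_000_000))
    print(years_per_single_loss(10_000_000))
```

Under these assumptions, storing ten million objects corresponds to an expected loss of one object every 10,000 years on average.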
Google Cloud Storage
• Google Cloud Storage offers unified object storage for developers and
enterprises.
• It provides multiple storage classes (Standard, Nearline, Coldline,
Archive) designed for different access frequencies.
• Features include strong consistency, automatic redundancy across
regions (for dual-region and multi-region buckets), and integration with
other Google Cloud services such as BigQuery and AI/ML tools.
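One way to read the storage classes above is as a mapping from expected access frequency to class. The following is a minimal sketch, assuming thresholds of 30, 90, and 365 days (these mirror the minimum storage durations of Nearline, Coldline, and Archive); the function name is hypothetical and is not part of any Google Cloud API.

```python
def suggest_storage_class(days_between_accesses: int) -> str:
    """Suggest a storage class from the expected gap between accesses.

    The thresholds are assumptions for illustration; real decisions
    also weigh retrieval fees and minimum storage durations.
    """
    if days_between_accesses < 0:
        raise ValueError("days_between_accesses must be non-negative")
    if days_between_accesses >= 365:
        return "Archive"    # accessed less than about once a year
    if days_between_accesses >= 90:
        return "Coldline"   # accessed roughly once a quarter or less
    if days_between_accesses >= 30:
        return "Nearline"   # accessed roughly once a month or less
    return "Standard"       # frequently accessed ("hot") data


if __name__ == "__main__":
    for days in (1, 45, 120, 400):
        print(days, suggest_storage_class(days))
```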
Azure Blob Storage
• Azure Blob Storage is Microsoft’s object storage solution for the
cloud.
• It is optimized for storing massive amounts of unstructured data such
as text and binary data.
• Blob Storage supports hot, cool, and archive access tiers, enabling
cost-effective storage based on usage patterns.
• Supports block blobs, append blobs, and page blobs.
• Integrates with Azure Data Lake Storage for analytics.
Distributed File Systems
Distributed file systems enable large-scale data storage across clusters.
Key systems include:

• Hadoop Distributed File System (HDFS)

• Ceph

• Lustre
Hadoop Distributed File System (HDFS)
• HDFS is a scalable, fault-tolerant distributed file system designed to
run on commodity hardware.
• It divides large files into blocks (128 MB by default) and distributes
them across nodes in a cluster.
• Each block is replicated (three copies by default) to ensure data
durability and availability.
• Designed for batch processing with frameworks such as MapReduce.
• Optimized for large, sequential reads rather than low-latency random
access.
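The block splitting and replication described above can be sketched as follows. This is a simplified illustration, not HDFS code: it assumes the default 128 MiB block size and a replication factor of 3, and it places replicas round-robin, whereas real HDFS uses rack-aware placement. The names `Block` and `split_into_blocks` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size (128 MiB)
REPLICATION = 3                 # default HDFS replication factor


@dataclass(frozen=True)
class Block:
    index: int
    length: int
    replicas: Tuple[str, ...]   # names of the nodes holding a copy


def split_into_blocks(
    file_size: int,
    nodes: Sequence[str],
    block_size: int = BLOCK_SIZE,
    replication: int = REPLICATION,
) -> List[Block]:
    """Split a file of file_size bytes into blocks and assign replicas."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    blocks: List[Block] = []
    offset = 0
    while offset < file_size:
        index = len(blocks)
        length = min(block_size, file_size - offset)
        # Assumption: simple round-robin placement on distinct nodes.
        replicas = tuple(
            nodes[(index + r) % len(nodes)] for r in range(replication)
        )
        blocks.append(Block(index, length, replicas))
        offset += length
    return blocks


if __name__ == "__main__":
    for block in split_into_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]):
        print(block)
```

Because every block lives on several distinct nodes, losing a single node leaves at least two copies of each block available for re-replication.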
Ceph
• Ceph is a unified, distributed storage system designed for excellent
performance, reliability, and scalability.
• It provides object, block, and file system storage in a single platform.
• Ceph uses the CRUSH (Controlled Replication Under Scalable Hashing)
algorithm to compute data placement, eliminating the need for a central
lookup table for object locations. (CephFS still uses metadata servers
for file system metadata.)
• Highly scalable, with self-healing capabilities.
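The key idea behind computed placement is that any client can derive an object's location from the object name and the list of storage nodes, without asking a central server. The sketch below illustrates that idea with rendezvous (highest-random-weight) hashing. It is not the CRUSH algorithm itself, which also accounts for device weights and failure domains; the function names and the `osd.N` node names are assumptions for illustration.

```python
import hashlib
from typing import List, Sequence


def _score(object_name: str, osd: str) -> int:
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osds: Sequence[str], replicas: int = 3) -> List[str]:
    """Pick the nodes that should store an object, highest score first."""
    if replicas > len(osds):
        raise ValueError("not enough storage nodes for the requested replicas")
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    cluster = [f"osd.{i}" for i in range(6)]
    print(place("photo-123.jpg", cluster))
```

Every client that knows the same node list computes the same answer, and removing a node that does not hold a given object leaves that object's placement unchanged.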
Lustre
• Lustre is a high-performance distributed file system commonly used
in large-scale cluster computing.
• Provides a POSIX-compliant (Portable Operating System Interface) file
system interface for compatibility with existing applications.
• It is widely deployed in supercomputing environments where
performance and throughput are critical.
• Used in scientific computing and financial modeling.
NoSQL Databases in the Cloud
NoSQL databases provide flexible schemas and horizontal scalability for
cloud applications.

• Amazon DynamoDB

• Apache Cassandra

• MongoDB Atlas
Amazon DynamoDB
• DynamoDB is a fully managed NoSQL database service that supports
key-value and document data models.
• It is designed for low-latency, high-throughput applications, with
single-digit millisecond latency and automatic scaling.
• Supports ACID (atomicity, consistency, isolation, and durability)
transactions.
• Other features include on-demand capacity, DAX (DynamoDB
Accelerator) for in-memory caching, and global tables for multi-region
replication.
Apache Cassandra
• Cassandra is a highly scalable NoSQL database designed for handling
large amounts of data across multiple commodity servers with no
single point of failure.
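• It uses a peer-to-peer architecture in which every node can serve
reads and writes.
• Decentralized, wide-column store with tunable consistency, ranging
from eventual consistency to quorum or all-replica reads and writes.
• Scales close to linearly as nodes are added, including across multiple
data centers.
• Used by Netflix, Apple, and other operators of large-scale applications.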
MongoDB Atlas
• MongoDB Atlas is the fully managed cloud service for MongoDB, a
document-oriented NoSQL database.
• Documents are JSON-like (stored as BSON) and have a flexible schema.
• Supports sharding for horizontal scaling.
• Atlas supports multi-region deployments, automated backups, and
integrated monitoring tools.
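Sharding routes each document to one shard based on its shard key. The sketch below illustrates range-based routing, where each shard owns a contiguous key range with an inclusive lower bound and an exclusive upper bound. It is an illustration of the idea only, not MongoDB's implementation; the class name, split points, and shard names are hypothetical.

```python
import bisect
from typing import List


class RangeShardRouter:
    """Route keys to shards that each own a contiguous key range."""

    def __init__(self, split_points: List[str], shards: List[str]) -> None:
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        self._split_points = list(split_points)
        self._shards = list(shards)

    def shard_for(self, key: str) -> str:
        # bisect_right makes each range inclusive of its lower bound:
        # a key equal to a split point belongs to the shard above it.
        return self._shards[bisect.bisect_right(self._split_points, key)]


if __name__ == "__main__":
    router = RangeShardRouter(["h", "p"], ["shard-a", "shard-b", "shard-c"])
    for name in ("alice", "mallory", "zoe"):
        print(name, router.shard_for(name))
```

Range-based routing keeps neighbouring keys together, which helps range queries; hashing the key first would instead spread writes more evenly at the cost of scattering ranges.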
Data Consistency Models and Replication Strategies
Distributed storage systems often face trade-offs between consistency,
availability, and partition tolerance (CAP theorem). Various consistency
models are used to balance these trade-offs:
• Strong Consistency: Guarantees that every read reflects the most
recent completed write, so all users see the same data at the same time.
• Eventual Consistency: Updates eventually propagate to all replicas,
but reads may return stale data in the meantime.
• Causal Consistency: Ensures that causally related updates are seen by
all nodes in the same order; concurrent updates may be seen in different
orders.
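Causal consistency requires a way to tell whether one update causally precedes another or whether the two are concurrent. Vector clocks are one common mechanism for this. The following is a minimal sketch, assuming each node is identified by a string and a clock is a plain dictionary of counters; the function names are illustrative.

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with the node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def _less_or_equal(a: VectorClock, b: VectorClock) -> bool:
    return all(a.get(n, 0) <= b.get(n, 0) for n in set(a) | set(b))


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if the event stamped a causally precedes the event stamped b."""
    return _less_or_equal(a, b) and not _less_or_equal(b, a)


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if neither event causally precedes the other."""
    return not _less_or_equal(a, b) and not _less_or_equal(b, a)


if __name__ == "__main__":
    w1 = increment({}, "A")             # write on node A
    w2 = increment(merge({}, w1), "B")  # node B writes after seeing w1
    w3 = increment({}, "C")             # independent write on node C
    print(happened_before(w1, w2), concurrent(w1, w3))
```

In this example the second write must be applied after the first on every node, while the third write may be ordered freely relative to the other two.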
Replication strategies include:
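• Master-slave (primary-replica) replication: One node handles writes;
the others replicate its data and can serve reads.
• Multi-master replication: Multiple nodes can handle writes, requiring
conflict resolution.
• Quorum-based replication: Read and write operations must be
acknowledged by a quorum of nodes. With N replicas, choosing read and
write quorum sizes R and W such that R + W > N makes every read overlap
the latest write. Balances consistency and availability (e.g.,
Dynamo-style systems).
• Synchronous replication: A write is acknowledged only after replicas
confirm it, which ensures consistency but increases latency.
• Asynchronous replication: A write is acknowledged before replicas
confirm it, which lowers latency but risks data loss on failure.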
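Quorum-based replication rests on a simple overlap rule: with N replicas, a read quorum of R nodes and a write quorum of W nodes always share at least one node when R + W > N. The sketch below simulates this in memory. It is a simplified illustration that assumes a single global version counter, whereas real Dynamo-style systems use timestamps or vector clocks; the class and function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum intersects every write quorum."""
    return r + w > n


class QuorumRegister:
    """A single replicated value with configurable read/write quorums."""

    def __init__(self, n: int, r: int, w: int) -> None:
        self.n, self.r, self.w = n, r, w
        # Each replica stores a (version, value) pair.
        self._replicas: List[Tuple[int, Optional[str]]] = [(0, None)] * n
        self._next_version = 1  # assumption: one global version counter

    def write(self, value: str, reachable: Sequence[int]) -> bool:
        """Write to the reachable replicas; fail without a write quorum."""
        if len(reachable) < self.w:
            return False
        version = self._next_version
        self._next_version += 1
        for index in reachable:
            self._replicas[index] = (version, value)
        return True

    def read(self, reachable: Sequence[int]) -> Optional[str]:
        """Return the newest value seen among the reachable replicas."""
        if len(reachable) < self.r:
            raise RuntimeError("read quorum not reached")
        _, value = max(
            (self._replicas[i] for i in reachable), key=lambda entry: entry[0]
        )
        return value


if __name__ == "__main__":
    register = QuorumRegister(n=3, r=2, w=2)
    register.write("v1", reachable=[0, 1])
    print(register.read(reachable=[1, 2]))
```

With N = 3, R = 2, and W = 2, any two replicas contacted by a read include at least one that received the latest write. Lowering R to 1 makes reads faster but allows a read to land on the one replica that missed the write.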
Cloud-Based Data Warehousing
Modern cloud data warehouses enable large-scale analytics with serverless
or elastically scaled architectures.

• Google BigQuery

• Snowflake

• Amazon Redshift
Google BigQuery
• BigQuery is a serverless, highly scalable data warehouse that allows
users to run SQL queries on large datasets.
• It supports real-time analytics and integrates with various data
ingestion tools.
• Integrates with ML models (for example, through BigQuery ML).

Snowflake
• Snowflake offers a cloud-native data warehouse with separate
compute and storage, enabling elastic scalability and concurrent
workloads.

• Its architecture supports structured and semi-structured data.

Amazon Redshift
• Redshift is a fully managed data warehouse that uses columnar
storage and parallel processing to deliver high performance for
analytical queries.

• It integrates with S3 for data lakes and supports Redshift Spectrum for
querying data directly from S3.
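Columnar storage helps analytical queries because an aggregate over one column reads only that column, and columns of repeated values compress well. The sketch below illustrates both effects in plain Python. It is not Redshift code; the function names and the sample order data are assumptions for illustration.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert row-oriented records into one list per column.

    Assumes every row has the same set of keys.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Compress consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "region": "EU", "amount": 120.0},
        {"order_id": 2, "region": "EU", "amount": 80.0},
        {"order_id": 3, "region": "US", "amount": 50.0},
    ]
    columns = to_columns(orders)
    # The aggregate touches only the "amount" column.
    print(sum(columns["amount"]))
    print(run_length_encode(columns["region"]))
```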
Data Streaming and Real-Time Processing
Real-time data processing is crucial for applications such as fraud
detection, log analysis, and recommendation systems.
Streaming platforms and cloud services include:
• Apache Kafka: A distributed event streaming platform that enables
real-time data feeds.
• Amazon Kinesis / Azure Event Hubs: Managed streaming services for
real-time data ingestion, processing, and analytics.
- Support ingestion from IoT devices, logs, and transactions
• Google Cloud Dataflow: A serverless data processing service for
stream and batch data, built on the Apache Beam SDK.
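A common building block in stream processing is the fixed (tumbling) window, which groups events into consecutive, non-overlapping time intervals. The sketch below counts events per key per window, as a fraud-detection job might count transactions per card. It is plain Python rather than the Kafka, Kinesis, or Apache Beam API, and the function name and sample events are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key).

    Each event is a (timestamp in seconds, key) pair. Windows are
    aligned to multiples of window_seconds and do not overlap.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    transactions = [
        (0.5, "card-1"),
        (12.0, "card-1"),
        (59.9, "card-2"),
        (60.0, "card-1"),
        (61.0, "card-1"),
    ]
    print(tumbling_window_counts(transactions, window_seconds=60))
```

A real streaming engine additionally has to handle late and out-of-order events, which this sketch ignores by assuming each event carries a trustworthy timestamp.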
Backup, Disaster Recovery, and Storage Security

• Backup and Disaster Recovery

• Storage Security
Backup and Disaster Recovery
Cloud providers offer automated backup services with options for
versioning and point-in-time recovery.
Disaster recovery strategies include:
• Cold standby: Delayed recovery using periodically updated backups.
• Warm standby: Partially active infrastructure that can be quickly
scaled.
• Hot standby: Fully active and redundant systems across regions.
Storage Security
Security in cloud storage involves:

• Encryption: Both in transit (TLS) and at rest (AES-256).

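• Access Control: Fine-grained IAM policies, access control lists (ACLs), and access logs.

• Immutable storage: Write-once, read-many (WORM) retention that prevents data from being altered or deleted, which protects against ransomware attacks.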

• Compliance: Adherence to standards like GDPR, HIPAA, and SOC 2.
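Immutable storage can be understood as a store that refuses overwrites and refuses deletes until a retention period has passed. The following in-memory sketch illustrates that behaviour. It is not any provider's API; the class names are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Tuple


class RetentionError(Exception):
    """Raised when an operation would violate the retention policy."""


class WormStore:
    """Write-once, read-many store with a fixed retention period."""

    def __init__(self, retention_seconds: float) -> None:
        self._retention_seconds = retention_seconds
        # key -> (data, time until which the object is locked)
        self._objects: Dict[str, Tuple[bytes, float]] = {}

    def put(self, key: str, data: bytes, now: float) -> None:
        if key in self._objects:
            raise RetentionError(f"{key!r} already exists and cannot be overwritten")
        self._objects[key] = (data, now + self._retention_seconds)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def delete(self, key: str, now: float) -> None:
        _, retain_until = self._objects[key]
        if now < retain_until:
            raise RetentionError(f"{key!r} is locked until t={retain_until}")
        del self._objects[key]


if __name__ == "__main__":
    store = WormStore(retention_seconds=100)
    store.put("backup-2025-01-01", b"snapshot", now=0)
    try:
        store.delete("backup-2025-01-01", now=50)
    except RetentionError as error:
        print("rejected:", error)
```

Because neither an attacker nor a compromised account can modify or remove a locked object, a clean copy remains available for recovery until the retention period ends.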

Conclusion
• Distributed storage systems are foundational to the reliability,
performance, and scalability of cloud-based solutions.
• From object storage services and distributed file systems to NoSQL
databases and real-time processing platforms, understanding these
systems is essential for architects and developers building cloud-
native applications.
• Moreover, robust replication strategies, consistency models, and
security mechanisms ensure the integrity and availability of data in a
distributed environment.
