
Amazon S3 - Comprehensive Notes

What is Amazon S3?


Amazon Simple Storage Service (S3) is an object storage service that offers industry-leading
scalability, data availability, security, and performance. S3 is designed to store and retrieve any amount
of data from anywhere on the web.
Key Characteristics
Object Storage: Files are stored as objects in buckets, not in a file-system hierarchy
Virtually Unlimited: Can store unlimited amounts of data
Globally Accessible: Available from anywhere on the internet (with proper permissions)
Highly Durable: 99.999999999% (11 9's) durability
Highly Available: 99.99% availability SLA for most storage classes
RESTful API: Accessible via HTTP/HTTPS REST API calls
Core Concepts
Objects
What: Individual files stored in S3
Components:
Key: Unique identifier (filename) within a bucket
Value: The actual data content
Metadata: Additional information about the object
Version ID: Unique identifier for object versions
Access Control Information: Permissions for the object
Object Key Naming:
Up to 1,024 characters
Case sensitive
Can include UTF-8 characters
Best practice: Use forward slashes (/) to create logical hierarchy
Buckets
What: Containers for objects in S3
Characteristics:
Globally unique names across all AWS accounts
Regional resources (created in specific regions)
Flat namespace (no nesting of buckets)
Can contain unlimited number of objects
Bucket Naming Rules:
3-63 characters long
Lowercase letters, numbers, and hyphens only
Must start and end with letter or number
Cannot be formatted as IP addresses
Must be globally unique
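As a minimal illustration of buckets and object keys, the sketch below creates a bucket and uploads an object whose key uses forward slashes as a logical hierarchy. The bucket name, region, and key are placeholder values; boto3 and valid AWS credentials are assumed.

import boto3

# Placeholder values; bucket names must be globally unique.
bucket = "example-org-app-logs-dev"
region = "us-west-2"

s3 = boto3.client("s3", region_name=region)

# Buckets outside us-east-1 need an explicit LocationConstraint.
s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": region},
)

# The key uses forward slashes to create a logical hierarchy.
s3.put_object(
    Bucket=bucket,
    Key="logs/2023/01/01/app.log",
    Body=b"example log line\n",
)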
Regions
S3 buckets are created in specific AWS regions
Objects stored in a region remain there unless explicitly moved
Choose regions based on:
Latency: Closer to users for better performance
Compliance: Data residency requirements
Costs: Storage and transfer costs vary by region
Storage Classes
S3 offers multiple storage classes optimized for different use cases and access patterns:
1. S3 Standard
Use Case: Frequently accessed data
Durability: 99.999999999% (11 9's)
Availability: 99.99%
Minimum Storage Duration: None
Retrieval Fee: None
Best For: Active websites, content distribution, mobile applications
2. S3 Intelligent-Tiering
Use Case: Data with unknown or changing access patterns
How it Works: Automatically moves objects between frequent and infrequent access tiers
Monitoring Fee: Small monthly fee per object
Best For: Data with unpredictable access patterns
3. S3 Standard-IA (Infrequent Access)
Use Case: Data accessed less frequently but needs rapid access
Cost: Lower storage cost than Standard, but retrieval fees apply
Minimum Storage Duration: 30 days
Best For: Backups, disaster recovery, long-term storage
4. S3 One Zone-IA
Use Case: Infrequently accessed data that doesn't require multiple AZ resilience
Storage: Single Availability Zone
Cost: 20% less than Standard-IA
Best For: Secondary backup copies, recreatable data
5. S3 Glacier Instant Retrieval
Use Case: Archive data that needs millisecond retrieval
Minimum Storage Duration: 90 days
Retrieval: Instant (milliseconds)
Best For: Medical images, news media assets
6. S3 Glacier Flexible Retrieval
Use Case: Archive data with flexible retrieval times
Retrieval Options:
Expedited: 1-5 minutes
Standard: 3-5 hours
Bulk: 5-12 hours
Minimum Storage Duration: 90 days
Best For: Backup, archive, compliance data
7. S3 Glacier Deep Archive
Use Case: Long-term archive and digital preservation
Lowest Cost: Cheapest storage class
Retrieval Time: 12-48 hours
Minimum Storage Duration: 180 days
Best For: Compliance archives, digital preservation
8. S3 Outposts
Use Case: On-premises S3 storage
Location: AWS Outposts rack
Best For: Data residency requirements, local processing
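The storage class is chosen per object at write time (and can be changed later by lifecycle rules or copies). A minimal sketch, assuming boto3 and a placeholder bucket name, that writes an object directly to Standard-IA:

import boto3

s3 = boto3.client("s3")

# Write the object directly into S3 Standard-IA instead of Standard.
with open("db-dump.sql.gz", "rb") as data:
    s3.put_object(
        Bucket="example-backup-bucket",    # placeholder name
        Key="backups/2023/db-dump.sql.gz",
        Body=data,
        StorageClass="STANDARD_IA",
    )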
Versioning
What is Versioning?
Versioning allows you to keep multiple versions of an object in the same bucket, providing protection
against accidental deletion or modification.
How it Works
When enabled, S3 creates a unique version ID for each object
New uploads create new versions rather than overwriting
Delete operations don't permanently delete; they add a "delete marker"
Previous versions remain accessible until explicitly deleted
States
1. Unversioned (default): Only current version exists
2. Versioning-enabled: Multiple versions can exist
3. Versioning-suspended: New versions not created, existing versions retained
Best Practices
Enable versioning for important data
Use lifecycle policies to manage old versions
Consider MFA Delete for additional protection
Monitor storage costs as versions accumulate
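A minimal boto3 sketch for turning versioning on (the bucket name is a placeholder). MFA Delete additionally requires the bucket owner's root credentials and an MFA device, so it is omitted here.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="example-important-data",  # placeholder name
    VersioningConfiguration={"Status": "Enabled"},
)

# Suspending later keeps existing versions but stops creating new ones:
# VersioningConfiguration={"Status": "Suspended"}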
Lifecycle Management
Purpose
Automatically transition objects between storage classes or delete them based on predefined rules,
optimizing costs and management overhead.
Lifecycle Rule Components
1. Rule Name: Descriptive identifier
2. Scope: Which objects the rule applies to (prefix, tags, or all objects)
3. Actions: What to do (transition or delete)
4. Timeline: When to perform actions
Transition Actions
Standard → Standard-IA: Minimum 30 days in Standard
Standard-IA → Glacier: Minimum 30 days in Standard-IA
Glacier → Deep Archive: Minimum 90 days in Glacier
Expiration Actions
Delete current versions after specified time
Delete incomplete multipart uploads
Delete previous versions (with versioning enabled)
Example Lifecycle Rules
Rule 1: Log Files
- Move to Standard-IA after 30 days
- Move to Glacier after 90 days
- Delete after 7 years
Rule 2: Backup Data
- Move to Glacier Flexible Retrieval after 1 day
- Move to Glacier Deep Archive after 180 days
- Never delete (compliance requirement)
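The first rule above could be expressed with boto3 roughly as follows. The bucket name and prefix are placeholders, and 7 years is approximated as 2,555 days.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-log-bucket",  # placeholder name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "log-files",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 2555},  # roughly 7 years
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)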

Security and Access Control


Access Control Methods
1. Bucket Policies
JSON-based: Define permissions using JSON syntax
Resource-based: Attached to buckets
Cross-account access: Can grant access to other AWS accounts
Public access: Can make buckets publicly accessible
Example Bucket Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}

2. IAM Policies
User/Role-based: Attached to IAM users, groups, or roles
Identity-based: Control what identities can do
Fine-grained: Specific actions on specific resources
3. Access Control Lists (ACLs)
Legacy method: Predates IAM and bucket policies
Limited granularity: Basic read/write permissions
Use cases: Simple scenarios, cross-account access
4. Pre-signed URLs
Temporary access: Time-limited URLs for specific operations
Secure sharing: Share private objects without exposing credentials
Programmatic generation: Created using AWS SDKs
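A minimal sketch of generating a time-limited download URL with boto3 (bucket and key are placeholders):

import boto3

s3 = boto3.client("s3")

# The URL grants GET access to one private object for one hour.
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "example-private-bucket", "Key": "reports/q1.pdf"},
    ExpiresIn=3600,  # seconds
)
print(url)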
Block Public Access Settings
Four settings to prevent accidental public access:
1. BlockPublicAcls: Block new public ACLs
2. IgnorePublicAcls: Ignore existing public ACLs
3. BlockPublicPolicy: Block new public bucket policies
4. RestrictPublicBuckets: Restrict public bucket policies to authorized principals
Best Practice: Enable all four settings by default; disable them only when public access is explicitly required.
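Applying all four settings at the bucket level with boto3 might look like the sketch below (the bucket name is a placeholder; the same settings can also be enforced account-wide).

import boto3

s3 = boto3.client("s3")

s3.put_public_access_block(
    Bucket="example-private-bucket",  # placeholder name
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)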
Encryption
Encryption at Rest
Server-Side Encryption (SSE):
1. SSE-S3: S3 manages encryption keys
AES-256 encryption
AWS manages all aspects of encryption
Default for new buckets
2. SSE-KMS: AWS Key Management Service
Additional key management features
Audit trail for key usage
Fine-grained access control
3. SSE-C: Customer-provided keys
Customer manages encryption keys
S3 performs encryption/decryption
Keys not stored by AWS
4. CSE: Client-Side Encryption
Data encrypted before upload
Customer manages entire encryption process
AWS never sees unencrypted data
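As an illustration, an upload requesting SSE-KMS could look like the sketch below. The bucket name and KMS key alias are placeholders; omitting ServerSideEncryption falls back to the bucket default, typically SSE-S3.

import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="example-sensitive-data",        # placeholder name
    Key="customers/2023/export.csv",
    Body=b"...",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-data-key",   # placeholder KMS key alias
)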
Encryption in Transit
HTTPS/TLS: All data transfer encrypted
SSL endpoints: Available in all regions
Certificate validation: Ensures connection security
Access Logging
Server Access Logs: Detailed records of bucket requests
CloudTrail Integration: API-level logging for compliance
VPC Flow Logs: Network-level visibility
Performance Optimization
Request Patterns
Hot Spotting
Problem: High request rates to objects with similar key prefixes
Solution: Distribute requests using random prefixes or hex characters
Example: Instead of logs/2023/01/01/..., use a1b2c3-logs/2023/01/01/...
Request Rate Performance
GET/HEAD: 5,500 requests per second per prefix
PUT/COPY/POST/DELETE: 3,500 requests per second per prefix
Limits apply per prefix, so spreading keys across more prefixes raises the aggregate request rate
Transfer Acceleration
Purpose: Speed up uploads and downloads using CloudFront edge locations
How it Works: Route traffic through AWS edge locations
Cost: Additional per-GB transfer charge
Use Cases: Global user base, large files, long distances from AWS regions
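Transfer Acceleration is enabled per bucket and then requested through the accelerate endpoint on the client side. A sketch with boto3, using a placeholder bucket name and file:

import boto3
from botocore.config import Config

bucket = "example-global-uploads"  # placeholder name

# Enable acceleration on the bucket (one-time configuration).
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket=bucket,
    AccelerateConfiguration={"Status": "Enabled"},
)

# Route subsequent requests through the accelerate endpoint.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("large-video.mp4", bucket, "uploads/large-video.mp4")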
Multipart Upload
Purpose: Improve upload performance and reliability for large objects
Recommended for: Objects larger than 100 MB
Required for: Objects larger than 5 GB
Benefits:
Parallel uploads of parts
Resume failed uploads
Begin uploading before the final object size is known
Best Practices:
Use 10-100 MB part sizes for optimal performance
Upload parts in parallel when possible
Complete or abort multipart uploads to avoid charges
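The high-level transfer manager in boto3 handles multipart uploads automatically; a sketch with a placeholder bucket name and tuning values chosen only for illustration:

import boto3
from boto3.s3.transfer import TransferConfig

# Multipart kicks in above the threshold; parts are uploaded in parallel.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # 100 MB
    multipart_chunksize=64 * 1024 * 1024,   # 64 MB parts
    max_concurrency=8,
)

s3 = boto3.client("s3")
s3.upload_file(
    "backup.tar.gz",
    "example-backup-bucket",   # placeholder name
    "backups/2023/backup.tar.gz",
    Config=config,
)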
CloudFront Integration
Content Distribution: Cache frequently accessed content globally
Origin Access Control (OAC): Secure access to S3 origins
Performance: Reduce latency for global users
Cost Optimization: Reduce data transfer costs
Monitoring and Analytics
CloudWatch Metrics
Storage Metrics (reported daily):
NumberOfObjects
BucketSizeBytes
Request Metrics (opt-in, one-minute granularity):
AllRequests
GetRequests
PutRequests
DeleteRequests
HeadRequests
PostRequests
ListRequests
Error Metrics:
4xxErrors
5xxErrors
Performance Metrics:
FirstByteLatency
TotalRequestLatency
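Storage metrics such as BucketSizeBytes are published to CloudWatch once per day. A sketch for reading them with boto3 (the bucket name is a placeholder):

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# BucketSizeBytes is reported daily per bucket and storage type.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "example-log-bucket"},  # placeholder
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=2),
    EndTime=datetime.utcnow(),
    Period=86400,
    Statistics=["Average"],
)
for point in resp["Datapoints"]:
    print(point["Timestamp"], point["Average"])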
S3 Storage Lens
Purpose: Organization-wide visibility into storage usage and activity
Features:
Cost optimization insights
Data protection best practices
Performance optimization recommendations
Scope: Account, organization, or custom configurations
S3 Inventory
Purpose: Scheduled reports of objects and metadata
Formats: CSV, ORC, or Parquet
Use Cases: Compliance, lifecycle management, analytics
S3 Analytics
Storage Class Analysis: Recommendations for lifecycle policies
Data Access Patterns: Understand how data is accessed
Cost Optimization: Identify opportunities to reduce costs
Data Management Features
Cross-Region Replication (CRR)
Purpose: Automatically replicate objects across different AWS regions
Requirements: Source and destination in different regions, versioning enabled
Use Cases: Compliance, disaster recovery, latency reduction
Same-Region Replication (SRR)
Purpose: Replicate objects within the same region to different buckets
Use Cases: Aggregate logs, live replication between accounts
Replication Configuration
What can be replicated:
All objects or subset based on prefixes/tags
Storage class of replicated objects
Ownership changes
Metadata and ACLs
Replication Time Control (RTC):
15-minute replication SLA
CloudWatch metrics for monitoring
Additional cost for guaranteed timing
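A replication configuration is attached to the source bucket and references an IAM role that S3 assumes when writing to the destination. A rough sketch, with placeholder bucket names and role ARN, assuming versioning is already enabled on both buckets:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="example-source-bucket",  # placeholder name
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/example-replication-role",  # placeholder
        "Rules": [
            {
                "ID": "replicate-all",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-destination-bucket",  # placeholder
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    },
)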
S3 Batch Operations
Purpose: Perform large-scale batch operations on S3 objects
Operations:
Copy objects
Set object tags or metadata
Set ACLs
Initiate object restores from Glacier
Invoke Lambda functions
Process:
1. Create job with list of objects and operation
2. S3 processes objects in batches
3. Receive completion report with results
Object Lock
Purpose: Write-once-read-many (WORM) model for regulatory compliance
Modes:
Governance: Users with special permissions can modify
Compliance: No one can modify, including root account
Legal Hold: Indefinite retention until explicitly removed
Requirements:
Versioning must be enabled
Cannot be disabled once configured
Applies to individual object versions
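A sketch of setting a default governance-mode retention period on a bucket that was created with Object Lock enabled (bucket name and retention period are placeholders):

import boto3

s3 = boto3.client("s3")

# Requires a bucket created with Object Lock (and therefore versioning) enabled.
s3.put_object_lock_configuration(
    Bucket="example-compliance-bucket",  # placeholder name
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {"Mode": "GOVERNANCE", "Days": 30}
        },
    },
)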
Event Notifications
Event Types
Object Created: PUT, POST, COPY, CompleteMultipartUpload
Object Deleted: Delete, DeleteMarkerCreated
Object Restore: Restore initiated (Post), restore completed
Reduced Redundancy Storage (RRS): Object lost events
Notification Destinations
1. Amazon SQS: Queue messages for processing
2. Amazon SNS: Publish notifications to topics
3. AWS Lambda: Trigger serverless functions
4. Amazon EventBridge: Route events to multiple targets
Configuration
Suffix/Prefix filtering: Only notify for specific object names
Event filtering: Only notify for specific event types
Multiple destinations: Send same event to multiple targets
Use Cases
Image processing: Trigger Lambda when image uploaded
Data pipeline: Start ETL process when data files arrive
Backup verification: Confirm successful backup completion
Compliance logging: Log all access and modifications
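As an example of the image-processing use case, the sketch below subscribes a Lambda function to object-created events for .jpg keys. The bucket name and function ARN are placeholders, and the function must already grant s3.amazonaws.com permission to invoke it.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-image-uploads",  # placeholder name
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-west-2:123456789012:function:example-thumbnailer",  # placeholder
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": ".jpg"}]
                    }
                },
            }
        ]
    },
)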
Cost Optimization
Storage Class Selection
Decision Factors:
Access frequency
Retrieval time requirements
Minimum storage duration
Compliance requirements
Cost Comparison (relative to Standard):
Standard: 100% (baseline)
Intelligent-Tiering: ~95% + monitoring fee
Standard-IA: ~50% + retrieval fees
One Zone-IA: ~40% + retrieval fees
Glacier Instant Retrieval: ~30%
Glacier Flexible Retrieval: ~20%
Glacier Deep Archive: ~10%
Lifecycle Policies
Cost Optimization Strategies:
Move infrequently accessed data to cheaper storage classes
Delete objects after required retention period
Remove incomplete multipart uploads
Delete old versions in versioned buckets
Request Optimization
Reduce Request Costs:
Batch operations instead of individual API calls
Use S3 Inventory instead of LIST operations for large buckets
Implement exponential backoff for retries
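The AWS SDKs implement retries with exponential backoff for you; in boto3 the retry behavior is tuned through the client config rather than hand-written loops, for example:

import boto3
from botocore.config import Config

# "adaptive" mode adds client-side rate limiting on top of exponential backoff.
s3 = boto3.client(
    "s3",
    config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
)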
Data Transfer Optimization
Reduce Transfer Costs:
Use CloudFront for frequently accessed content
Keep data in same region as compute resources
Use VPC endpoints for internal AWS traffic
Consider S3 Transfer Acceleration for global users
Monitoring Tools
Cost Explorer: Analyze S3 spending patterns
S3 Storage Lens: Organization-wide cost insights
Billing Alerts: Set up notifications for unexpected costs
Best Practices
Naming Conventions
Buckets:
Use descriptive, meaningful names
Include organization/project identifier
Follow consistent naming pattern
Consider environment indicators (dev, test, prod)
Objects:
Use logical hierarchy with forward slashes
Include timestamp or version in key name
Avoid sequential prefixes for high-request-rate scenarios
Use consistent naming patterns within buckets
Security Best Practices
1. Enable Block Public Access settings by default
2. Use IAM policies instead of ACLs when possible
3. Enable versioning for important data
4. Configure lifecycle policies to manage versions
5. Enable CloudTrail for API-level logging
6. Use encryption for sensitive data
7. Implement least privilege access principles
8. Regular access reviews and cleanup
Performance Best Practices
1. Use appropriate storage class for access patterns
2. Implement multipart upload for large objects
3. Optimize request patterns to avoid hot spotting
4. Use CloudFront for global content distribution
5. Monitor performance metrics with CloudWatch
6. Consider Transfer Acceleration for global users
Operational Best Practices
1. Enable versioning for data protection
2. Configure lifecycle policies for cost optimization
3. Set up monitoring and alerting for important metrics
4. Use S3 Inventory for large-scale object management
5. Implement backup and disaster recovery strategies
6. Regular cost review and optimization
7. Document bucket purposes and access patterns
Compliance Best Practices
1. Enable Object Lock for regulatory requirements
2. Configure appropriate retention policies
3. Enable comprehensive logging (CloudTrail, Access Logs)
4. Implement data classification and handling procedures
5. Regular compliance audits and reviews
6. Maintain data lineage documentation
Integration with Other AWS Services
Compute Services
EC2: Direct access via AWS CLI, SDKs
Lambda: Event-driven processing, serverless workflows
ECS/EKS: Container-based applications, shared storage
EMR: Big data processing, data lake analytics
Database Services
RDS: Backup storage, data export/import
DynamoDB: Backup storage, data archival
Redshift: Data warehouse source, backup storage
Analytics Services
Athena: Query S3 data using SQL
QuickSight: Business intelligence and visualization
Glue: ETL processing, data catalog
EMR: Big data processing and analytics
AI/ML Services
SageMaker: Model artifacts, training data storage
Rekognition: Image and video analysis
Comprehend: Natural language processing
Textract: Document analysis and extraction
Content Delivery
CloudFront: Global content distribution
API Gateway: REST API integration
Route 53: DNS-based routing
Troubleshooting Common Issues
Access Denied Errors
Potential Causes:
Insufficient IAM permissions
Bucket policy restrictions
Block Public Access settings
Object-level ACL restrictions
Troubleshooting Steps:
1. Check IAM policy permissions
2. Review bucket policy statements
3. Verify Block Public Access settings
4. Examine object ACLs
5. Confirm correct region and bucket name
Performance Issues
Symptoms:
Slow upload/download speeds
High latency
Request timeouts
Solutions:
Use Transfer Acceleration
Implement multipart upload
Optimize request patterns
Use CloudFront for distribution
Check network connectivity
Cost Overruns
Common Causes:
Incorrect storage class selection
Excessive data transfer charges
High request rates
Incomplete multipart uploads
Solutions:
Review storage class usage
Implement lifecycle policies
Monitor request patterns
Set up billing alerts
Use S3 Storage Lens for insights
Data Consistency Issues
S3 Consistency Model:
Strong consistency: All reads (GET, HEAD, LIST) reflect the latest successful write (since December 2020)
Eventually consistent: The earlier eventual-consistency model for overwrite PUTs and DELETEs no longer applies
Best Practices:
Design applications to handle any edge-case inconsistencies
Use versioning for critical data
Implement proper error handling
Limits and Quotas
Bucket Limits
100 buckets per account (soft limit, can be increased)
Unlimited objects per bucket
5 TB maximum object size
5 GB maximum single PUT operation
Request Limits
3,500 PUT/COPY/POST/DELETE requests per second per prefix
5,500 GET/HEAD requests per second per prefix
No limit on total requests per bucket
Other Limits
1,024 characters maximum object key length
2 KB maximum metadata size per object
1,000 lifecycle rules per bucket
1,000 access policy statements per bucket
Conclusion
Amazon S3 is a foundational AWS service that provides:
Core Value:
Virtually unlimited, durable object storage
Multiple storage classes for cost optimization
Rich feature set for data management and security
Seamless integration with other AWS services
Key Success Factors:
Understand access patterns to choose appropriate storage classes
Implement proper security controls and monitoring
Use lifecycle policies for cost optimization
Design for performance from the beginning
Follow AWS best practices for security and compliance
Common Use Cases:
Static website hosting
Data backup and archival
Content distribution
Data lakes and analytics
Application data storage
Disaster recovery
S3's flexibility and feature richness make it suitable for virtually any data storage scenario, from simple
file storage to complex data lake architectures. The key to success is understanding your specific
requirements and configuring S3 accordingly.
