Data Engineering with AWS Cookbook, published by Packt.
The chapter-by-chapter list of recipes follows, with links to the companion files where needed.
- Controlling access to S3 buckets
- Storage types in S3 for optimized storage costs
- Enforcing encryption of S3 buckets (see the example after this list)
- Setting up retention policies for your objects
- Versioning your data
- Replicating your data
- Monitoring your S3 buckets
- Creating read-only replicas for RDS
- Sharing live Redshift data among your clusters
- Synchronizing Glue Data Catalog to a different account
- Enforcing fine-grained permissions on S3 data sharing using Lake Formation
- Sharing your S3 data temporarily using a presigned URL (see the example after this list)
- Real-time sharing of S3 data
- Sharing read-only access to your CloudWatch data with another AWS account
- Creating ETL jobs visually using AWS Glue Studio
- Parameterizing jobs to make them more flexible and reusable (see the example after this list)
- Handling job failures and reruns for partial results
- Processing data incrementally using bookmarks and bounded execution (see the example after this list)
- Handling a high quantity of small files in your job
- Reusing libraries in your Glue job
- Using data lake formats to store your data
- Optimizing your catalog data retrieval using pushdown filters and indexes
- Running pandas code using AWS Glue for Ray
- Defining a simple workflow using AWS Glue Workflows
- Setting up event-driven orchestration with Amazon EventBridge
- Creating a data workflow using AWS Step Functions
- Managing Data Pipelines with MWAA
- Monitoring your pipeline health
- Setting up a pipeline using AWS Glue to ingest data from a JDBC database into a catalog table
- Running jobs using Amazon EMR Serverless
- Running your Amazon EMR cluster on EKS
- Using the AWS Glue catalog from another account
- Making your cluster highly available
- Scaling your cluster based on workload
- Customizing the cluster nodes easily using bootstrap actions
- Tuning Apache Spark resource usage
- Code development on EMR using Workspaces
- Monitoring your cluster
- Protecting your cluster from security vulnerabilities
- Applying data quality checks on Glue tables
- Automating the discovery and reporting of sensitive data on your S3 buckets
- Establishing a tagging strategy for AWS resources
- Building your distributed data community with AWS DataZone following data mesh principles
- Handling security-sensitive data (PII and PHI)
- Ensuring S3 compliance with AWS Config
- Creating data quality checks for ETL jobs in AWS Glue Studio notebooks
- Unit testing your data quality using Deequ
- Schema management for ETL pipelines
- Building unit test functions for ETL pipelines
- Building data cleaning and profiling jobs with DataBrew
- Setting up a code deployment pipeline using CDK and AWS CodePipeline
- Setting up a CDK pipeline to deploy on multiple accounts and regions
- Running code in a CloudFormation deployment
- Protecting resources from accidental deletion
- Deploying a data pipeline using Terraform
- Reverse-engineering IaC
- Integrating AWS Glue and Git version control
- Automatically setting CloudWatch log group retention to reduce cost
- Creating custom dashboards to monitor Data Lake services
- Setting up Systems Manager to remediate non-compliance with AWS Config rules
- Using AWS Config to automate remediation of a non-compliant S3 server access logging policy
- Tracking AWS Data Lake cost per analytics workload
- Accessing the Redshift cluster using JDBC to query data
- Creating a VPC endpoint to establish private connectivity between a private Redshift cluster and client applications
- Querying large historical data with Redshift Spectrum
- Using Redshift workload management to manage workload priority
- Using AWS SDK for pandas, the Redshift Data API, and Lambda to execute SQL statements (see the example after this list)
- Using AWS SDK for Python to manage Amazon QuickSight
- Reviewing the steps and processes for migrating an on-premises platform to AWS
- Choosing your AWS analytics stack – the re-platforming approach
- Picking the correct migration approach for your workload
- Planning for prototyping and testing
- Converting ETL processes with big data frameworks
- Defining and executing your migration process with Hadoop
- Migrating the existing Hadoop security authentication and authorization processes
- Creating a migration assessment report with AWS SCT
- Extracting Data with AWS DMS
- Live example – migrating an Oracle database from a local laptop to AWS RDS using AWS SCT
- Leveraging AWS Snow Family for large-scale data migration
- Calculating total cost of ownership (TCO) using AWS TCO calculators
- Conducting a Hadoop migration assessment using the TCO simulator
- Selecting how to store your data
- Migrating on-premises HDFS data using AWS DataSync
- Migrating the Hive Metastore to AWS
- Migrating and running Apache Oozie on Amazon EMR
- Migrating an Oozie database to Amazon RDS for MySQL
- Setting up networking – establishing a secure connection to your EMR cluster
- Performing a seamless HBase migration to AWS
- Migrating HBase to DynamoDB on AWS
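A few of the recipes above are illustrated with short sketches below. These are minimal examples written for this README, not the book's companion code; bucket names, table names, and other identifiers are placeholders.

Sharing your S3 data temporarily using a presigned URL: a minimal boto3 sketch that generates a time-limited download URL for a single object (bucket and key names are illustrative only).

```python
import boto3

s3 = boto3.client("s3")

# Generate a URL that grants temporary read access to one object.
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "my-example-bucket", "Key": "exports/report.csv"},
    ExpiresIn=3600,  # the URL stops working after one hour
)
print(url)
```

Anyone holding the URL can download the object until it expires, so keep the expiry as short as the use case allows.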
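Enforcing encryption of S3 buckets: a minimal boto3 sketch that turns on default SSE-KMS encryption for a bucket; the bucket name and KMS key alias are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Apply default encryption so new objects are encrypted with the given KMS key.
s3.put_bucket_encryption(
    Bucket="my-example-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/my-example-key",  # placeholder key alias
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)
```

To enforce encryption end to end, you might pair this with a bucket policy that denies uploads that do not request the expected encryption.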
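Parameterizing jobs to make them more flexible and reusable: a minimal AWS Glue ETL sketch that reads its source and target paths from job arguments; the parameter names are illustrative and would be passed as --source_path / --target_path job arguments.

```python
import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Resolve job arguments instead of hard-coding paths in the script.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

# Read from the parameterized source and write to the parameterized target.
df = spark.read.parquet(args["source_path"])
df.write.mode("overwrite").parquet(args["target_path"])
```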
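Processing data incrementally using bookmarks and bounded execution: a minimal Glue sketch showing the pieces a job bookmark needs inside the script (transformation_ctx on the source and sink, plus job.init and job.commit); bookmarks must also be enabled on the job itself, and the catalog database and table names are placeholders.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# transformation_ctx is the handle the bookmark uses to remember what was already read.
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db",          # placeholder catalog database
    table_name="raw_orders",      # placeholder catalog table
    transformation_ctx="source",
)

glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-example-bucket/curated/orders/"},
    format="parquet",
    transformation_ctx="sink",
)

job.commit()  # persists the bookmark state for the next run
```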
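Using AWS SDK for pandas, the Redshift Data API, and Lambda to execute SQL statements: a minimal Lambda handler sketch built on awswrangler's Data API helpers; the cluster identifier, database, and user are placeholders, and the function's role needs Redshift Data API permissions.

```python
import awswrangler as wr


def lambda_handler(event, context):
    # Connect through the Redshift Data API, so no JDBC driver is needed in the function.
    con = wr.data_api.redshift.connect(
        cluster_id="my-example-cluster",  # placeholder cluster identifier
        database="dev",
        db_user="awsuser",
    )
    df = wr.data_api.redshift.read_sql_query(
        "SELECT COUNT(*) AS order_count FROM public.orders",
        con=con,
    )
    return {"order_count": int(df["order_count"][0])}
```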