AWS Data Infrastructure Guide

The document outlines the process of building and managing data infrastructure using AWS services, including setting up data lakes, ingesting data, and preparing it for analytics. It emphasizes the importance of data cataloging, security, governance, and automation in data workflows. Additionally, it discusses orchestration and automation tools like AWS Step Functions and AWS Lambda to streamline data processing and analytics tasks.


Build and manage data infrastructure and platforms

This includes setting up databases, data lakes, and data warehouses on AWS
services like Amazon Simple Storage Service (Amazon S3), AWS Glue, Amazon
Redshift, among others.

Ingest data from various sources


You can use tools like AWS Glue jobs or AWS Lambda functions to ingest data from
databases, applications, files, and streaming devices into the centralized data
platforms.
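As a rough sketch of event-driven ingestion, the following hypothetical Lambda handler reacts to S3 object-created event notifications and collects the bucket and key of each new file; the actual copy into the data platform (for example, a boto3 call) would go where the comment indicates. The function and field names follow the standard S3 event shape, but the bucket and key values are illustrative.

```python
# Sketch of a Lambda ingestion handler, assuming the standard
# S3 event notification shape; names here are illustrative.
def lambda_handler(event, context):
    ingested = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Here you would read the object (e.g. with boto3) and
        # write it into the centralized data platform.
        ingested.append(f"s3://{bucket}/{key}")
    return {"ingested": ingested}
```

Invoked with a sample event containing one new object, the handler returns the list of ingested object URIs.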
Prepare the ingested data for analytics
Use technologies like AWS Glue, Apache Spark, or Amazon EMR to prepare data by
cleaning, transforming, and enriching it.

Catalog and document curated datasets


Use AWS Glue crawlers to determine the format and schema, group data into tables,
and write metadata to the AWS Glue Data Catalog. Use metadata tagging in Data
Catalog for data governance and discoverability.
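To make the crawler step concrete, here is a sketch of the configuration you might pass to the AWS Glue create_crawler API (via boto3, for example). The role ARN, database name, S3 path, and tags are placeholders, not real resources.

```python
# Illustrative AWS Glue crawler configuration; the role ARN,
# database, S3 path, and tag values are placeholders.
crawler_config = {
    "Name": "curated-datasets-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "DatabaseName": "curated_db",
    "Targets": {"S3Targets": [{"Path": "s3://example-data-lake/curated/"}]},
    # Tags can support governance and discoverability in the catalog.
    "Tags": {"owner": "data-platform", "classification": "internal"},
}

# With boto3 this would be passed as keyword arguments:
# boto3.client("glue").create_crawler(**crawler_config)
```

The crawler then infers the format and schema of the objects under the target path and writes the resulting table definitions to the named Data Catalog database.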

Automate regular data workflows and pipelines


Simplify and accelerate data processing using services like AWS Glue workflows,
AWS Lambda, or AWS Step Functions.

Ensure data quality, security, and compliance


Create access controls, establish authorization policies, and build monitoring
processes. Use Amazon DataZone or AWS Lake Formation to manage and govern
access to data using fine-grained controls. These controls help ensure access with
the right level of privileges and context.
Stage 2: Store

The first step is deciding where to store the data. Before you can ingest data, you need a place to put it, so a modern data architecture starts with the data lake. A data lake is a centralized repository that you can use to store structured, semi-structured, and unstructured data at scale. Organizations can use it to ingest, store, and analyze diverse datasets without the need for extensive preprocessing.

Amazon S3 provides an optimal foundation for a data lake because of its virtually
unlimited scalability and high durability. You can seamlessly and non-disruptively
increase storage from gigabytes to petabytes of content and only pay for what you
use.
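One common (though by no means mandatory) convention for organizing a data lake on Amazon S3 is hive-style partitioned key prefixes, which many query engines can prune on. The helper below is a hypothetical sketch of that idea; the zone and dataset names are illustrative, not AWS requirements.

```python
from datetime import date

# Sketch of hive-style partitioned S3 key naming for a data lake.
# Zone and dataset names are illustrative conventions only.
def build_object_key(zone: str, dataset: str, day: date, filename: str) -> str:
    return (
        f"{zone}/{dataset}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )
```

For example, build_object_key("raw", "sales", date(2024, 5, 7), "orders.parquet") yields "raw/sales/year=2024/month=05/day=07/orders.parquet".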

Stage 1: Ingest

After you have established the data lake, you can use specialized AWS services to ingest different types of data into it.
Stage 3: Catalog
An essential component of a data lake built on Amazon S3 is the data catalog.
Organizations can use cataloging to keep track of data assets and understand what
data exists, where it is located, its quality, and how it is used. A data catalog is
designed to provide a single source of truth about the contents of the data lake.
AWS Glue Data Catalog creates a catalog of metadata about your stored assets.
Use this catalog to help search and find relevant data sources based on various
attributes like name, owner, business terms, and others.

Stage 4: Process
After the data is cataloged, it can now be processed or transformed into formats
that are more useful for analysis and insights. Transformation can include data type
conversion, filtering, aggregation, standardization, and normalization.
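As a toy illustration of these transformations (type conversion, filtering, and standardization) on ingested records, independent of any particular engine such as Spark or AWS Glue, the field names below are invented for the example:

```python
# Toy transformation step: convert types, drop invalid rows,
# and standardize field values. Field names are illustrative.
def transform(records):
    cleaned = []
    for rec in records:
        try:
            amount = float(rec["amount"])  # type conversion
        except (KeyError, ValueError):
            continue                       # filter out bad rows
        cleaned.append({
            "customer_id": str(rec.get("cust", "")).strip().lower(),  # standardize
            "amount": amount,
        })
    return cleaned
```

Given a row with amount "10.5" and one with a non-numeric amount, only the valid row survives, with its customer identifier trimmed and lowercased.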

Stage 5: Deliver (Analytics Services)


Transformed data is delivered to consumers and stakeholders, such as data
scientists, data analysts, and business analysts. The primary purpose of data
analytics is to extract insights from data that can lead to good business or
organizational outcomes. Many AWS services can be used at this stage.
Stage 6: Security and Governance
Security in data analytics systems refers to measures taken to protect data from
unauthorized access, breaches, or attacks. It involves safeguarding data
confidentiality, integrity, and availability. The entire data analytics system depends
on data being secured and accessible only by authorized users.
Governance encompasses the policies, procedures, and processes that ensure the
proper management, quality, and use of data. It involves defining roles,
responsibilities, and decision-making processes related to data.
Following are some of the AWS services used for security and governance. These
are covered in more detail in this course in the Security and Monitoring in Data
Analytics Systems lesson.

With Lake Formation, you can centrally manage and scale fine-grained data
access permissions and share data with confidence within and outside your
organization.

IAM manages fine-grained access and permissions for human users, software users,
other services, and microservices.
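For illustration, an identity-based IAM policy granting read-only access to a data lake bucket might look like the following. The structure follows the standard IAM policy grammar, but the bucket name is a placeholder.

```python
# Illustrative IAM policy document granting read-only access to
# a hypothetical data lake bucket; the bucket name is a placeholder.
readonly_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-data-lake",
                "arn:aws:s3:::example-data-lake/*",
            ],
        }
    ],
}
```

Note that ListBucket applies to the bucket ARN itself while GetObject applies to the objects under it, which is why both resource forms appear.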
https://aws.amazon.com/big-data/datalakes-and-analytics/
https://docs.aws.amazon.com/wellarchitected/latest/analytics-lens/scenarios.html

Orchestration and Automation Options


As businesses increasingly rely on data analytics to make informed decisions,
managing the complexity of data workflows, processing, and analysis becomes a
significant challenge. Without efficient coordination and automation, data pipelines
can become fragmented, error-prone, and time-consuming to manage. Additionally,
scaling these processes to handle growing volumes of data can be daunting.
Orchestration and automation can help solve these problems.
Orchestration is the process of coordinating multiple services to define and
manage the flow of data through a series of steps. It involves defining workflows
and dependencies between steps.

Automation refers to using tools and services to automate repetitive tasks related
to data ingestion, processing, and analytics.

Automation is suitable for simple repetitive tasks. Orchestration is needed for complex workflows involving the coordination of multiple services, teams, and dependencies across stages.
Typically, they are used together in analytics workflows. For example, orchestration
could involve coordinating multiple automated tasks in a defined sequence.
Together, orchestration and automation can streamline operations, improve
reliability, and empower non-programmers to manage complex workflows.

Many AWS services can be used to orchestrate pipelines and workflows. They can
be combined in nearly unlimited ways to meet very demanding requirements.
The following is a partial list of AWS services that can be used in data analytics
systems for orchestration and automation.
AWS Step Functions
Step Functions is a visual workflow service to orchestrate and automate workflows,
pipelines, and processes. Step Functions ensures tasks run in the correct order. It
does the following:
 Orchestrates ETL workflows by connecting Lambda functions that extract the
data from sources, transform it, and load it into databases and data lakes.
 Runs batch jobs on data in AWS Glue, Amazon EMR, or other services.
 Processes streaming data by connecting Lambda functions that process data from Kinesis Data Streams or Amazon Data Firehose for real-time analytics.
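To make the ETL orchestration concrete, a minimal workflow can be expressed in Amazon States Language, which Step Functions executes. The sketch below builds such a definition as a Python dict; the Lambda function ARNs are placeholders for illustration.

```python
import json

# Minimal Amazon States Language definition for an ETL workflow.
# The Lambda function ARNs are placeholders, not real functions.
etl_definition = {
    "Comment": "Extract, transform, and load in sequence",
    "StartAt": "Extract",
    "States": {
        "Extract": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:extract",
            "Next": "Transform",
        },
        "Transform": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
            "Next": "Load",
        },
        "Load": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:load",
            "End": True,
        },
    },
}

# The JSON string is what you would pass to Step Functions, e.g.:
# boto3.client("stepfunctions").create_state_machine(
#     name=..., roleArn=..., definition=json.dumps(etl_definition))
definition_json = json.dumps(etl_definition)
```

The Retry block on the Transform state illustrates how Step Functions handles transient failures declaratively, so the Lambda code itself stays free of retry logic.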
AWS Lambda
Lambda runs code (called Lambda functions) without provisioning or managing
servers. Combined with Step Functions, Lambda functions can invoke AWS services
and microservices and perform tasks that are part of orchestrated workflows.
 Lambda functions can be invoked by events from data sources like Amazon
S3, DynamoDB, or Kinesis Data Streams to process incoming data in real
time.
 Step Functions can be used to orchestrate multiple Lambda functions with error handling, retries, and workflow visualization.
 Lambda functions can be used in event-driven architectures with services like
Amazon SNS and Amazon SQS to decouple and coordinate different analytics
tasks.
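As an illustration of the first point, Lambda delivers Kinesis Data Streams records base64-encoded in the invocation event. The minimal handler sketch below decodes each payload; the processing step itself is left as a comment, and the payload format (JSON) is an assumption for the example.

```python
import base64
import json

# Sketch of a Lambda handler for Kinesis Data Streams events.
# Kinesis delivers record payloads base64-encoded under
# Records[*].kinesis.data in the event.
def lambda_handler(event, context):
    payloads = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
        # Real-time processing (aggregation, enrichment,
        # forwarding) of each payload would happen here.
    return {"processed": len(payloads)}
```

Invoked with a batch of one encoded JSON record, the handler decodes it and reports one processed payload.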
