Introduction to Analytics on AWS
Lesly Reyes
Telco Solutions Architect
© 2022, Amazon Web Services, Inc. or its affiliates.
Customers want more value from their data
Growing exponentially
From new sources
Increasingly diverse
Used by many people
Analyzed by many applications

Modern data strategy in action
Diagram components: Data Sources; Data Lakes; Databases; Analytics; Machine Learning; Catalog; Governance; People, Apps, and Devices

AWS Analytics Pillars
Scalable data lakes
Purpose-built for performance and cost
Serverless and easy to use
Unified data access, security, and governance
Built-in machine learning

Amazon S3
Broadest portfolio of analytics tools
Unmatched durability, availability, and scalability
Built to store and retrieve any amount of data

The benefits of data lakes
Store all your data in open formats
Cost-effectively scale storage to exabytes
Decouple storage from compute
Choice of analytical and ML engines
Process data in place
Diagram labels: Data lake, Catalog

Amazon Redshift: Data warehousing
Amazon Athena: Query all your data using SQL or Python
Amazon EMR: Big data processing
Amazon OpenSearch Service: Log and search analytics
Amazon Kinesis & MSK: Real-time analytics

AWS has the most serverless options for data analytics in the cloud
AWS Glue: Data integration, ETL, and Catalog
Amazon Redshift: Data warehousing
Amazon EMR: Big data processing
Amazon Athena: Interactive analytics
Amazon Kinesis: Real-time analytics
Amazon MSK: Real-time analytics
Amazon QuickSight: Visualization
AWS Lake Formation: Data lake setup, management, and governance

Challenges of building and securing modern data lakes
Support updates and deletes
Row-level, fine-grained secure sharing
Automatic storage optimization

Break down data silos
Extract, transform, load
Visual data preparation
Data replication
Data warehouse to/from data lake
Federated query

aws.amazon.com/analytics
Amazon Redshift
THE BEST PRICE-PERFORMANCE FOR CLOUD DATA WAREHOUSING
Analyze all your data
Price-performance at any scale
Easy, secure, and reliable

Amazon EMR
RUN BIG DATA APPLICATIONS IN THE CLOUD
Fully managed and customizable
Latest open-source releases
Automatically scale up and down
Best price-performance

Amazon EMR
RUN BIG DATA APPLICATIONS IN THE CLOUD
Amazon EMR Studio for interactive data analytics
Multiple deployment models
Amazon S3 data lake integration

Amazon OpenSearch Service
SUCCESSOR TO AMAZON ELASTICSEARCH SERVICE
Fully managed
Log and search analytics
Cost effective

Amazon Kinesis
COLLECT, PROCESS, AND ANALYZE VIDEO AND DATA STREAMS IN REAL TIME
Kinesis Data Streams
Kinesis Data Analytics
Kinesis Video Streams
Kinesis Data Firehose

Amazon MSK
FULLY MANAGED, HIGHLY AVAILABLE, AND SECURE
Compatible
Fully managed
Highly available
Secure

AWS Lake Formation
BUILD SECURE DATA LAKES
Amazon S3: Cost-effective, durable data lake storage with global replication capabilities
Portfolio of integrated analytics tools: Amazon Athena, Amazon QuickSight, Amazon Redshift, AWS Glue, Amazon SageMaker, Amazon EMR
Lake Formation
Simplified ingest and cleaning: AWS Glue, Blueprints, ML Transform
Reliable and optimized data lakes: ACID Transactions, Storage Optimization
Catalog, Permissions

Simplify security management with AWS Lake Formation
Diagram components: Amazon Redshift, Amazon EMR, Amazon Athena; Data Lake Admin; Lake Formation (Access Control, Data Catalog); Amazon S3 data lake storage

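As a rough illustration of centralized access control, the sketch below builds the arguments for a table-level grant in Lake Formation. It only constructs a dictionary and does not call AWS. The helper name `build_grant_request`, the role ARN, and the database and table names are hypothetical and not taken from the slides. The dictionary keys follow the shape accepted by the boto3 `grant_permissions` call on the `lakeformation` client.

```python
def build_grant_request(principal_arn, database, table, permissions=("SELECT",)):
    """Build keyword arguments for a Lake Formation table grant (illustrative helper)."""
    if not permissions:
        raise ValueError("at least one permission is required")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


# Hypothetical principal, database, and table names, for illustration only.
request = build_grant_request(
    "arn:aws:iam::111122223333:role/analyst",
    "sales_db",
    "orders",
)

# With boto3 installed and credentials configured, the request could be sent as:
#   boto3.client("lakeformation").grant_permissions(**request)
print(request)
```

The idea is that a data lake admin defines a permission once, and query engines integrated with Lake Formation enforce it when they read the table.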
Amazon QuickSight
CLOUD-NATIVE BI SOLUTION FOR ILLUMINATING ORGANIZATIONAL INSIGHTS
Auto-scaling and serverless
Internal and/or external users
Deeply integrated with AWS services
Augmented insights on-demand

AWS Glue
SIMPLE, SCALABLE, AND SERVERLESS
Integrate data faster
Automate at scale
No servers to manage

AWS Glue: Key Capabilities
SERVERLESS DATA INTEGRATION SERVICE
Scalable Data Integration Engine: Built-in data transforms; Execution engine; Monitor
Centralized and Unified Data Governance: Glue Data Catalog; Glue crawlers; Lake Formation
Connect and Ingest Data: Glue connectors; Glue connector marketplace; Variety of interfaces
User Productivity and Data Ops: Persona-specific tools; Productivity tools; Data ops tools

Amazon Athena
QUERY ALL YOUR DATA USING SQL OR PYTHON
Simple, instant start
Interactive, advanced analytics
Open and flexible
Cost-effective

SIMPLE, INSTANT START
Serverless, no setup
Instant start, optimized runtimes for fast results
Point to S3 and start querying
INTERACTIVE, ADVANCED ANALYTICS
Federated queries across 35+ data stores
Use PySpark ecosystem
Simplified notebooks on console for PySpark
OPEN AND FLEXIBLE
ANSI SQL, Apache Spark
Multiple formats, compression types, and complex joins and data types
COST EFFECTIVE
Pay only for what you use
SQL: Save on per-query costs through compression
Python: minimize idle compute charges

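The point about saving on per-query costs through compression can be made concrete with a little arithmetic. The sketch below assumes a price of 5.00 USD per terabyte scanned, a decimal terabyte, and a hypothetical 4:1 compression ratio. None of these figures come from the slides, and real pricing has further rules (such as per-query minimums) that are ignored here.

```python
TB = 10 ** 12  # assumption: one terabyte counted as 10^12 bytes


def scan_cost(bytes_scanned, price_per_tb=5.0):
    """Estimated cost of a query that scans the given number of bytes."""
    return bytes_scanned / TB * price_per_tb


def compressed_scan_cost(raw_bytes, compression_ratio, price_per_tb=5.0):
    """Estimated cost when the same data is stored compressed.

    compression_ratio is raw size divided by compressed size (hypothetical value).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return scan_cost(raw_bytes / compression_ratio, price_per_tb)


raw = 2 * TB  # a query that would scan 2 TB of uncompressed data
uncompressed = scan_cost(raw)
compressed = compressed_scan_cost(raw, 4.0)
print(f"uncompressed: ${uncompressed:.2f}, compressed 4:1: ${compressed:.2f}")
```

Under these assumptions the estimated cost falls in proportion to the compression ratio, because the query is billed on bytes scanned rather than on rows returned.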
Amazon DynamoDB
Amazon EMR
Amazon OpenSearch Service
Amazon Aurora
Amazon S3
Amazon Redshift
Amazon SageMaker

AWS Glue Studio
ETL developer
Rich visual interface
250+ built-in transformations
Profile data to understand data patterns and anomalies
Work on large datasets at scale

AWS Glue DataBrew
Business analyst, data scientist
Run ETL jobs without writing code
Monitor thousands of jobs through a single pane of glass
Distributed processes
Advanced transforms through code snippets

AWS Glue Notebooks
Data engineer
Clean and normalize data with a rich visual interface
Choose from 250+ built-in transformations to automate tasks
Profile data to understand data patterns and anomalies

Where to start
Thank you!
Lesly Reyes
reylesl@amazon.com