
SNOWPRO® ADVANCED: DATA ENGINEER (DEA-C02)
EXAM STUDY GUIDE
Last Updated: June 9, 2025
TABLE OF CONTENTS

SNOWPRO ADVANCED: DATA ENGINEER STUDY GUIDE OVERVIEW
RECOMMENDATIONS FOR USING THE GUIDE
SNOWPRO ADVANCED: DATA ENGINEER CERTIFICATION OVERVIEW
SNOWPRO ADVANCED: DATA ENGINEER PREREQUISITE
STEPS TO SUCCESS
SNOWPRO ADVANCED: DATA ENGINEER SUBJECT AREA BREAKDOWN
    Domain 1.0: Data Movement
    Domain 1.0 Study Resources
    Domain 2.0: Performance Optimization
    Domain 2.0 Study Resources
    Domain 3.0: Storage & Data Protection
    Domain 3.0 Study Resources
    Domain 4.0: Data Governance
    Domain 4.0 Study Resources
    Domain 5.0: Data Transformation
    Domain 5.0 Study Resources
SNOWPRO ADVANCED: DATA ENGINEER SAMPLE QUESTIONS
NEXT STEPS
REGISTERING FOR YOUR EXAM
MAINTAINING YOUR CERTIFICATION

SNOWPRO ADVANCED: DATA ENGINEER STUDY GUIDE OVERVIEW

This study guide highlights concepts that may be covered on Snowflake’s SnowPro Advanced:
Data Engineer Certification exam.

Use of this study guide does not guarantee certification success.

Holding the SnowPro Core certification in good standing is a prerequisite for taking the
SnowPro Advanced: Data Engineer certification exam.

For an overview and more information on the SnowPro Core Certification exam or SnowPro
Advanced Certification series, please navigate here.

RECOMMENDATIONS FOR USING THE GUIDE

This guide will show the Snowflake topics and subtopics covered on the exam. Following the
topics will be additional resources consisting of videos, documentation, blogs, and/or exercises
to help you understand data engineering with Snowflake.

Estimated length of time required to complete the study guide: 10 – 13 hours.

Some links may be more valuable than others depending on your experience, so you do not
need to spend the same amount of time on each one. Some links may appear in more than one
content domain.

SNOWPRO ADVANCED: DATA ENGINEER CERTIFICATION
OVERVIEW

The SnowPro Advanced: Data Engineer certification exam tests advanced knowledge and skills
used to apply comprehensive data engineering principles using Snowflake. The exam assesses
these skills through scenario-based questions and real-world examples.

This certification will test the ability to:

● Source data from Data Lakes, APIs, and on-premises systems
● Transform, replicate, and share data across cloud platforms
● Design end-to-end near real-time streams
● Design scalable compute solutions for data engineering workloads
● Evaluate performance metrics

Target Audience:

2+ years of data engineering experience, including practical experience using Snowflake for
data engineering tasks. In addition, successful candidates may have:

● A working knowledge of RESTful APIs, SQL, semi-structured datasets, and cloud-native concepts
● Programming experience (a plus)

This exam is designed for:

●​ Data Engineers
● Software Engineers

SNOWPRO ADVANCED: DATA ENGINEER PREREQUISITE

Eligible individuals must hold an active SnowPro Core Certified credential. If you feel you need
more guidance on the fundamentals, please see the SnowPro Core Exam Study Guide.

STEPS TO SUCCESS
1.​ Review the Data Engineer Exam Guide
2.​ Attend Snowflake’s Instructor-Led Data Engineering I Training
3. Attend Snowflake’s Instructor-Led Data Engineering II Training
4.​ Review and study applicable white papers and documentation
5.​ Get hands-on practical experience with relevant business requirements using Snowflake
6.​ Attend Snowflake Webinars
7.​ Attend Snowflake Virtual Hands-on Labs for practical experience
8.​ Schedule your exam
9.​ Take your exam!

Additional Snowflake Asset to check out for Data Engineering:

Cloud Data Engineering for Dummies

SNOWPRO ADVANCED: DATA ENGINEER SUBJECT AREA BREAKDOWN

The following table contains the domains and weightings covered on the exam. It is not a
comprehensive listing of all the content that will be presented on the exam.

Domain                              Domain Weighting
1.0 Data Movement                   26%
2.0 Performance Optimization        21%
3.0 Storage and Data Protection     14%
4.0 Data Governance                 14%
5.0 Data Transformation             25%

Domain 1.0: Data Movement

1.1 Given a data set, load data into Snowflake.
● Outline considerations for data loading
● Define data loading features and potential impact

1.2 Ingest data of various formats through the mechanics of Snowflake.
● Required file formats
● Ingestion of structured, semi-structured, and unstructured data
● Implementation of stages and file formats

1.3 Troubleshoot data ingestion.
● Identify causes of ingestion errors
● Determine resolutions for ingestion errors

1.4 Design, build, and troubleshoot continuous data pipelines (see the sketch after this list).
● Stages
● Tasks
● Streams
● Snowpipe (for example, Auto ingest as compared to Rest API)
● Snowpipe Streaming

1.5 Analyze and differentiate types of data pipelines.
● Create User-Defined Functions (UDFs)
● Design and use the Snowflake SQL API
● Create data pipelines in Snowpark

1.6 Install, configure, and use connectors to connect to Snowflake.
● Kafka connectors
● Spark connectors
● Python connectors

1.7 Design and build data sharing solutions.
● Implement a data share
● Create and manage views
● Implement row-level filtering
● Share data using the Snowflake Marketplace
● Share data using a listing

1.8 Outline when to use external tables and define how they work.
● Manage external tables
● Manage Iceberg tables
● Perform general table management
● Manage schema evolution
● Unload data
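To make subdomain 1.4 concrete, here is a minimal sketch of a continuous loading pipeline using Snowpipe auto-ingest. The stage, pipe, table, and integration names and the bucket URL are hypothetical placeholders; the exact setup depends on your cloud provider and event notification configuration.

-- External stage over a cloud storage location (bucket URL and integration are placeholders)
CREATE OR REPLACE STAGE orders_stage
  URL = 's3://example-bucket/orders/'
  STORAGE_INTEGRATION = my_s3_integration
  FILE_FORMAT = (TYPE = JSON);

-- Landing table for the raw semi-structured payload
CREATE OR REPLACE TABLE raw_orders (payload VARIANT);

-- Snowpipe with auto-ingest: cloud event notifications trigger the COPY automatically
CREATE OR REPLACE PIPE orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw_orders
  FROM @orders_stage;

-- Troubleshooting ingestion: check the pipe status and recent load history
SELECT SYSTEM$PIPE_STATUS('orders_pipe');
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'RAW_ORDERS',
  START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())));
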
Domain 1.0 Study Resources

Lab Guides
Accelerating Data Engineering with Snowflake & dbt
Auto-Ingest Twitter Data into Snowflake
Automating Data Pipelines to Drive

Additional Assets
Moving from On-Premises ETL to Cloud-Driven ELT (white paper)
Intro to Data Engineering with Snowpark for Python (virtual hands-on lab)
Building Continuous Data Pipelines with Snowflake (virtual hands-on lab)

Snowflake Documentation Links
Bulk loading - Local File Systems
COPY_HISTORY
COPY INTO
CREATE EXTERNAL TABLE
CREATE FILE FORMAT
CREATE STREAM
CREATE TASK
Data Loading Tutorials
Data Sharing
Databases, Tables & Views
DESCRIBE STAGE
Dynamic Tables
External Tables
File Functions
HASH
Internal stage parameters
Kafka and Spark Connectors
Loading Data into Snowflake
Python Connector
SYSTEM$PIPE_STATUS
Snowflake SQL Data Types
Snowpipe
Snowpipe REST API
Snowpipe Streaming
Streams and Tasks
VALIDATE - Data loading
VALIDATE_PIPE_LOAD

Domain 2.0: Performance Optimization

2.1 Troubleshoot underperforming queries (see the sketch after this list).
● Identify underperforming queries
● Outline telemetry around the operation
● Increase efficiency
● Identify the root cause

2.2 Given a scenario, configure a solution for the best performance.
● Scale out compared to scale up
● Virtual warehouse properties (for example, size, multi-cluster)
● Query complexity
● Micro-partitions and the impact of clustering
● Materialized views
● Search optimization service
● Query acceleration service
● Snowpark-optimized warehouses
● Caching features

2.3 Monitor continuous data pipelines.
● Snowflake objects
  ○ Tasks
  ○ Streams
  ○ Snowpipe Streaming
  ○ Alerts
● Notifications
● Data Quality and data metric function monitoring
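A minimal sketch tying several of these topics together: finding a slow query, checking clustering on a large table, and adjusting a warehouse. The table, column, and warehouse names are hypothetical, and multi-cluster warehouses and query acceleration assume an edition that supports them.

-- Find recent long-running queries (Account Usage views can lag by up to 45 minutes)
SELECT query_id, query_text, warehouse_name, total_elapsed_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -1, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;

-- Inspect how well a large table is clustered on a candidate key
SELECT SYSTEM$CLUSTERING_INFORMATION('sales_facts', '(sale_date)');

-- Scale up / out and enable the query acceleration service on a warehouse
ALTER WAREHOUSE reporting_wh SET
  WAREHOUSE_SIZE = 'LARGE',
  MAX_CLUSTER_COUNT = 4,
  ENABLE_QUERY_ACCELERATION = TRUE,
  QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;
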

Domain 2.0 Study Resources

Lab Guides
Resource Optimization: Performance
Resource Optimization: Usage Monitoring

Additional Assets
Performance Impact from Local and Remote Disk Spilling (blog)
Snowflake: Visualizing Warehouse Performance (blog)
Virtual Warehouse Best Practices (blog)

Snowflake Documentation Links
Account Usage
Configuring tasks to send notifications
COPY_HISTORY
Data quality and data metric functions
Databases, Tables & Views
LOAD_HISTORY View
Monitor query activity with QUERY HISTORY
Monitoring data loading
PIPE_USAGE_HISTORY view
Queries
QUERY_HISTORY, QUERY_HISTORY_BY_*
SHOW STREAMS
System clustering information
System functions
SYSTEM$CLUSTERING_INFORMATION
TASK_HISTORY
Using persisted query results
Using the query acceleration service
Virtual Warehouses

Domain 3.0: Storage & Data Protection

3.1 Implement and manage data recovery features in Snowflake.
● Time Travel
  ○ Impact of streams
● Fail-safe
● Cross-region and cross-cloud replication

3.2 Use system functions to analyze micro-partitions.
● Clustering depth
● Cluster keys

3.3 Use Time Travel and cloning to create new development environments (see the sketch after this list).
● Clone objects
● Validate changes before promoting
● Rollback changes
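To illustrate subdomain 3.3, here is a minimal sketch of using Time Travel and zero-copy cloning to build a development environment. The database, schema, and table names, the retention value, and the query ID placeholder are all hypothetical, and retention beyond one day assumes an edition that allows it.

-- Extend Time Travel retention on a production database
ALTER DATABASE prod_db SET DATA_RETENTION_TIME_IN_DAYS = 10;

-- Create a zero-copy development clone as of one hour ago
CREATE OR REPLACE DATABASE dev_db
  CLONE prod_db AT (OFFSET => -60*60);

-- Roll a table back to its state before a specific (bad) statement
CREATE OR REPLACE TABLE prod_db.sales.orders_restored
  CLONE prod_db.sales.orders BEFORE (STATEMENT => '<query_id>');
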

Domain 3.0 Study Resources

Lab Guides
Getting Started with Time Travel

Snowflake Documentation Links


Cloning considerations
Clustering keys and clustered tables
CREATE STREAM
Database replication and failover/failback
Account replication
Micro-partitions and data clustering
Parameters
Snowflake Time Travel & Fail-safe
Stage, pipe, and load history replication
SYSTEM$CLUSTERING_DEPTH
SYSTEM$CLUSTERING_INFORMATION
SYSTEM$EXPLAIN_PLAN_JSON
TABLE_STORAGE_METRICS view
Domain 4.0: Data Governance

4.1 Monitor data.
● Apply object tagging and classifications
● Use data classification to monitor data
● Manage data lineage and object dependencies
● Monitor data quality

4.2 Establish and maintain data protection (see the sketch after this list).
● Implement column-level security
  ○ Use in conjunction with Dynamic Data Masking
  ○ Use in conjunction with external tokenization
  ○ Use projection policies
● Use data masking with Role-Based Access Control (RBAC) to secure sensitive data
● Explain the options available to support row-level security using Snowflake row access policies
  ○ Use aggregation policies
● Use DDL to manage Dynamic Data Masking and row access policies
● Use best practices to create and apply data masking policies
● Use Snowflake Data Clean Rooms to share data
  ○ Use the web-app
● Use the Snowflake developer API
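A minimal sketch of column-level and row-level protection for subdomain 4.2; the policy, role, table, and column names are hypothetical.

-- Dynamic Data Masking: mask email addresses for everyone except a privileged role
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '*** MASKED ***' END;

ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;

-- Row access policy: restrict which rows are visible based on the caller's role
CREATE OR REPLACE ROW ACCESS POLICY region_filter AS (region STRING) RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'GLOBAL_ANALYST' OR region = 'EMEA';

ALTER TABLE customers ADD ROW ACCESS POLICY region_filter ON (region);
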

Domain 4.0 Study Resources

Additional Assets
Snowflake RBAC Security Prefers Role Inheritance to Role Composition (blog)

Snowflake Documentation Links
ALTER DYNAMIC TABLE
Continuous Data Protection
CREATE MATERIALIZED VIEW
Data access
Managing Governance in Snowflake
Managing Security in Snowflake
Object tagging
Projection policies

Domain 5.0: Data Transformation

5.1 Define User-Defined Functions (UDFs) and outline how to use them (see the sketch after this list).
● Snowpark UDFs (for example, Java, Python, Scala)
● Secure UDFs
● SQL UDFs
● JavaScript UDFs
● User-Defined Table Functions (UDTFs)
● User-Defined Aggregate Functions (UDAFs)

5.2 Define and create external functions.
● Secure external functions
● Work with external functions

5.3 Design, build, and leverage stored procedures.
● Snowpark stored procedures (for example, Java, Python, Scala)
● SQL Scripting stored procedures
● JavaScript stored procedures
● Transaction management

5.4 Handle and transform semi-structured data.
● Traverse and transform semi-structured data to structured data
● Transform structured data to semi-structured data

5.5 Handle and process unstructured data.
● Use unstructured data
  ○ URL types
● Use directory tables
● Use the Rest API

5.6 Use Snowpark for data transformation.
● Understand Snowpark architecture
● Query and filter data using the Snowpark library
● Perform data transformations using Snowpark (for example, aggregations)
● Manipulate Snowpark DataFrames
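A minimal sketch covering a SQL UDF (subdomain 5.1) and semi-structured transformation (subdomain 5.4). The table, column, and function names are hypothetical.

-- Simple secure SQL UDF
CREATE OR REPLACE SECURE FUNCTION price_with_tax(price FLOAT, tax_rate FLOAT)
  RETURNS FLOAT
AS
$$
  price * (1 + tax_rate)
$$;

-- Flatten a VARIANT column of order line items into structured rows
SELECT
  o.payload:order_id::STRING AS order_id,
  li.value:sku::STRING AS sku,
  price_with_tax(li.value:price::FLOAT, 0.2) AS gross_price
FROM raw_orders o,
     LATERAL FLATTEN(input => o.payload:line_items) li;

-- Go the other way: build semi-structured output from structured columns
SELECT OBJECT_CONSTRUCT('sku', sku, 'gross_price', gross_price) AS line_item
FROM order_lines;
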

Domain 5.0 Study Resources

Additional Assets
Best Practices for Managing Unstructured Data (white paper)

Snowflake Documentation Links
ARRAY_AGG
CREATE API INTEGRATION
CREATE EXTERNAL FUNCTION
CREATE PROCEDURE
Databases, Tables & Views
External Functions
Handler writing in Snowpark
OBJECT_CONSTRUCT
Queries
Semi-Structured Data
Snowflake Scripting
Snowpark API
Stored Procedures
Transactions
TRY_PARSE_JSON
UDFs (User-Defined Functions)
Unstructured data
Working with DataFrames

SNOWPRO ADVANCED: DATA ENGINEER SAMPLE QUESTIONS

1.​ A Data Engineer needs to reload data into a table from files that have been in cloud
storage; some files have been stored more than 64 days.

How should the data be loaded to avoid duplicating data that is already stored in the
table?

A. Execute a COPY INTO <table> command with the FORCE option set to TRUE.
B. Delete all rows from the table and then execute a COPY INTO <table> command.
C. Execute a COPY INTO <table> command with the LOAD_UNCERTAIN_FILES option set to TRUE.
D. Truncate the table and then execute a COPY INTO <table> statement.

2.​ A Data Engineer discovered that 9 days ago an outage caused a failure of a data
pipeline run that was supposed to deliver data summarizing daily sales. The data did not
update correctly and data is missing. The company is using a Snowflake Enterprise
edition, with all data protection settings at the default values.

The Engineer needs to verify that the pipeline architecture is compliant with these data
protection rules for data in the FENIX_DWH database:

1. 10 days of data history must be accessible in the SALES schema
2. 1 day of data history must be accessible in all other schemas

​ How can these requirements be met MOST cost effectively?

A. Rely on the default Snowflake settings for data protection.
B. Use the ALTER ACCOUNT SET DATA_RETENTION_TIME_IN_DAYS = 10; parameter.
C. Use the ALTER SCHEMA SALES SET DATA_RETENTION_TIME_IN_DAYS = 10; parameter.
D. Use the ALTER DATABASE FENIX_DWH SET DATA_RETENTION_TIME_IN_DAYS = 10; parameter.

3.​ A Data Engineer wants to parse JSON in Snowflake and run the following query:

create or replace temporary table vartab (id integer, v varchar);

insert into vartab (id, v) values
  (1, '[-1, 12, 289, 2188, false,]'),
  (2, '{ "x" : "abc", "y" : false, "z": 10} '),
  (3, '[ "x" : "def", "y" : true, "z": 11 '),
  (4, '[-1, 12, 289, 2188], NULL');

select id, try_parse_json(v)
from vartab
order by id;

What will be the result of the query?

A.​ The following error message will be returned: Error parsing JSON:
incomplete object value.
B.
   1   [ -1, 12, 289, 2188, false, undefined ]
   2   { "x": "abc", "y": false, "z": 10 }
   3   NULL
   4   NULL

C.
   1   [ -1, 12, 289, 2188, false, undefined ]
   2   { "x": "abc", "y": false, "z": 10 }
   3   [ "x" : "def", "y" : true, "z": 11 ]
   4   NULL

D.
   1   [ -1, 12, 289, 2188, false, undefined ]
   2   { "x": "abc", "y": false, "z": 10 }
   3   NULL
   4   [-1, 12, 289, 2188], NULL
4.​ A task was created to refresh data that is used to generate reports about data in a table.
The task was running well until a large amount of historical data was added to the table.

Recently, queries to generate reports are failing. The ETL team checked the task history
and reports:

1. The task has been executed successfully.
2. Once initiated, the task resumes correctly.
3. The task has all required grants.
4. This is a stand-alone task with no dependencies.
5. The task is not using a stream.

Why is this task failing and what can be done to address this problem?

A.​ The task is limited to a default duration of 60 minutes. The task needs to be
recreated with the value USER_TASK_TIMEOUT_MS set to a higher value.
B. The task is limited to a default duration of 120 minutes. The task needs to be
recreated with the value
USER_TASK_MINIMUM_TRIGGER_INTERVAL_IN_SECONDS set to a higher
value.
C.​ The task is limited to a default duration of 60 minutes. The task needs to be
recreated with the value
USER_TASK_MINIMUM_TRIGGER_INTERVAL_IN_SECONDS set to a higher
value.
D. The task is limited to a default duration of 120 minutes. The task needs to be
recreated with the value of USER_TASK_TIMEOUT_MS set to a higher value.

5. A Data Engineer recreated a table, SALES, using the CREATE [ OR REPLACE ] TABLE
command without backing up the existing table. The Engineer found this information in the
QUERY_HISTORY view in the INFORMATION_SCHEMA:

Statement identifier for the "create or replace table" command: 1e5d0ca9-003e-22e6-b858-a7f5b37c5729
Time of execution: Wed, 26 Jun 2024 09:20:00 -0700

How can the original table data be retrieved?

A. DELETE FROM SALES;
   INSERT INTO SALES SELECT * FROM SALES AT (TIMESTAMP => 'Wed, 26 Jun 2024 09:20:00 -0700'::timestamp_tz);
B. UNDROP TABLE SALES;
   DELETE FROM SALES;
   INSERT INTO SALES SELECT * FROM SALES AT (OFFSET => -60*5);
C. ALTER TABLE SALES RENAME TO SALES_new;
   UNDROP TABLE SALES;
D. ALTER TABLE SALES RENAME TO SALES_new;
   UNDROP TABLE SALES;
   INSERT INTO SALES SELECT * FROM SALES BEFORE(STATEMENT => '1e5d0ca9-003e-22e6-b858-a7f5b37c5729');
Correct responses for sample questions:
1: d, 2: c, 3: b, 4: a, 5: c
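For question 4, a minimal sketch of recreating the failing task with a longer timeout. The task name, warehouse, schedule, called procedure, and the two-hour value are hypothetical examples; only the USER_TASK_TIMEOUT_MS parameter itself comes from the question.

CREATE OR REPLACE TASK refresh_sales_reports
  WAREHOUSE = reporting_wh
  SCHEDULE = '60 MINUTE'
  USER_TASK_TIMEOUT_MS = 7200000   -- two hours, instead of the 60-minute default
AS
  CALL refresh_report_data();

ALTER TASK refresh_sales_reports RESUME;
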

NEXT STEPS

REGISTERING FOR YOUR EXAM


When you are ready to register for the exam, navigate here to get started. Select the exam you
want to take and click “Register Now”. This will take you to our Certification Management
system, where you will register to take the exam.

MAINTAINING YOUR CERTIFICATION


All Snowflake Certifications expire two (2) years after your certification issue date.
SnowPro Certifications can now be recertified through the Snowflake Continuing Education
(CE) program, which includes these options:
● Completion of eligible Snowflake Instructor-Led Training (ILT) courses
● Earning an equivalent or higher-level SnowPro Certification

Note: You must have a valid Certification to participate in the Continuing Education (CE)
program.

The information provided in this study guide is provided for your purposes only and may not be provided
to third parties.

IN ADDITION, THIS STUDY GUIDE IS PROVIDED “AS IS”. NEITHER SNOWFLAKE NOR ITS
SUPPLIERS MAKES ANY OTHER WARRANTIES, EXPRESS OR IMPLIED, STATUTORY OR
OTHERWISE, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY,
TITLE, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT.