Snowflake - End To End Learning
SNOWFLAKE
ALL CONCEPTS TO GET STARTED
Data Engineering 101- Snowflake

Cloud Data Warehouse
A cloud-based platform for storing and analyzing data, which offers
scalability, flexibility, and cost-efficiency compared to traditional
on-premises data warehouses.
Snowflake provides a fully managed service with separate compute, storage,
and cloud services layers, making it easier to scale and manage data
operations.

Snowflake Architecture
Snowflake's architecture separates storage and compute, allowing for
independent scaling and efficient data management. This design eliminates
many limitations of traditional data warehouses.
Snowflake uses a multi-cluster shared data architecture, where storage is
centralized, and compute resources can be scaled up or down independently
based on workload.

Virtual Warehouse
A virtual warehouse is a cluster of compute resources in Snowflake. Each
virtual warehouse can be scaled independently to match the workload,
providing the necessary compute power for query execution without affecting
other warehouses.
If a company needs to run a heavy analytical query during peak business
hours, they can scale up the virtual warehouse to a larger size, ensuring
faster query performance. After the peak hours, the warehouse can be scaled
down to save costs.
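
The scale-up and scale-down cycle described above can be sketched in SQL. This is a minimal example, assuming a hypothetical warehouse named reporting_wh; the sizes and the 60-second suspend timeout are illustrative choices, not recommendations.

```sql
-- Create a small warehouse that starts suspended and pauses itself when idle.
-- reporting_wh is a hypothetical name used only for this sketch.
CREATE WAREHOUSE IF NOT EXISTS reporting_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60          -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

USE WAREHOUSE reporting_wh;

-- Before the heavy analytical query: scale up.
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'LARGE';

-- ... run the heavy query here ...

-- After peak hours: scale back down to save credits.
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'XSMALL';
```

Resizing affects queries that start after the change; statements already running finish on the resources they started with.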

Database
A logical grouping of schemas, tables, and other database objects. It
provides a namespace for organizing and managing data.
Creating a new database in Snowflake:
    CREATE DATABASE sales_data;
This command sets up a new database where all sales-related schemas and
tables can be organized.

Schema
A logical grouping of database objects such as tables, views, and stored
procedures. Schemas help organize objects within a database.
Creating a new schema in a database:
    CREATE SCHEMA sales_data.january;
This schema can contain all tables related to January's sales data.

Table
A structured set of data elements (values) organized in rows and columns.
Tables are fundamental storage objects in a database.
Creating a new table:
    CREATE TABLE customers
      (id INT, name STRING, email STRING);
This table stores customer information.

View
A virtual table based on the result set of a SQL query. Views do not store
data themselves but provide a way to represent data stored in tables.
Creating a new view:
    CREATE VIEW vip_customers
    AS SELECT * FROM customers
    WHERE status = 'VIP';
This view shows only VIP customers, assuming the customers table has a
status column.

Stage
A location where data files are stored temporarily before being loaded into
Snowflake tables. Stages can be internal or external (e.g., S3, Azure Blob
Storage).

File Format
Defines the format of data files to be loaded into Snowflake (e.g., CSV,
JSON, Avro). File formats specify how Snowflake should interpret the
contents of the files.
Creating a file format for CSV files:
    CREATE FILE FORMAT my_csv_format
      TYPE = 'CSV'
      FIELD_OPTIONALLY_ENCLOSED_BY = '"';

Warehouse Size
Snowflake offers different sizes for virtual warehouses (e.g., X-Small,
Small, Medium, Large) to accommodate various workloads. Larger sizes provide
more compute resources; each step up roughly doubles both the compute
available and the credits consumed per hour.
A Small warehouse might be sufficient for routine queries, while a Large
warehouse can handle complex analytical queries. Adjust the size based on
workload demands.

Scaling Up
Increasing the size of a virtual warehouse to provide more compute resources
for a specific workload.
Scaling up a virtual warehouse:
    ALTER WAREHOUSE my_warehouse
      SET WAREHOUSE_SIZE = 'LARGE';
This increases the compute power available for queries.

Scaling Out
Adding more compute clusters to a virtual warehouse to handle increased
concurrency and workload demands.
Enabling auto-scaling for a warehouse:
    ALTER WAREHOUSE my_warehouse
      SET MAX_CLUSTER_COUNT = 5;
Snowflake will add clusters as needed to handle concurrent queries.

Auto-Suspend
Automatically suspends a virtual warehouse when it is idle for a specified
period, saving costs.
Setting auto-suspend for a warehouse:
    ALTER WAREHOUSE my_warehouse
      SET AUTO_SUSPEND = 300;
AUTO_SUSPEND is given in seconds, so the warehouse will suspend after
5 minutes of inactivity.

Auto-Resume
Automatically resumes a suspended virtual warehouse when a query is
submitted, ensuring availability without manual intervention.
Enabling auto-resume for a warehouse:
    ALTER WAREHOUSE my_warehouse
      SET AUTO_RESUME = TRUE;
The warehouse will resume automatically when a query is submitted.

Query Caching
Snowflake caches the results of queries to speed up repeated query
executions, reducing the need for re-computation and saving compute
resources.
Running the same query twice will utilize the cached result if the
underlying data has not changed, improving performance and efficiency.

Result Cache
Stores the results of queries executed within the past 24 hours. The cache
is shared across the account, so another user can reuse a result when their
role has the required privileges on the underlying tables, reducing compute
costs and speeding up query performance.
If a query is run and then re-run within 24 hours without changes to the
underlying data, the result is fetched from the result cache, saving compute
resources.
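
One way to observe the result cache is to switch it off for a session and compare run times. The sketch below assumes the customers table from the earlier Table example; USE_CACHED_RESULT is a Snowflake session parameter.

```sql
-- First run computes the result; an identical second run can be served
-- from the result cache if the underlying data has not changed.
SELECT COUNT(*) FROM customers;
SELECT COUNT(*) FROM customers;

-- Disable result reuse for this session, for example when benchmarking
-- different warehouse sizes.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
SELECT COUNT(*) FROM customers;

-- Restore the default behavior.
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```

Whether a given run was served from the cache can be checked in the query's profile in the web interface.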

Metadata Cache
Stores metadata about database objects to speed up query parsing and
planning. This cache helps optimize query execution by reducing the time
needed to access metadata.
Metadata about tables, columns, and statistics is cached, allowing faster
query planning and execution. This helps Snowflake optimize performance for
complex queries.

Data Caching
Snowflake caches data in the local storage of virtual warehouses to improve
query performance. This cache is independent for each virtual warehouse.
Frequently accessed data is stored in the local disk cache of a virtual
warehouse, reducing the need to fetch data from remote storage repeatedly,
thus improving performance.

Stages
Locations in Snowflake where data files can be stored before being loaded
into tables. Stages can be internal (within Snowflake) or external (e.g.,
AWS S3).
An internal stage can be created using
    CREATE STAGE my_stage;
Data files can be uploaded to this stage before being loaded into a table.

COPY INTO Command
Used to load data from a stage into a Snowflake table. The command specifies
the target table and the source file(s) along with optional transformations.
Loading data from a stage into a table:
    COPY INTO my_table
    FROM @my_stage/file.csv
    FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
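
Before loading many files it can help to validate them first. The sketch below reuses my_table, my_stage, and my_csv_format from the examples above; the file-name pattern is an illustrative assumption.

```sql
-- Dry run: report parsing errors without loading any rows.
COPY INTO my_table
FROM @my_stage
PATTERN = '.*[.]csv'
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
VALIDATION_MODE = 'RETURN_ERRORS';

-- Actual load: skip bad rows instead of aborting the whole statement.
COPY INTO my_table
FROM @my_stage
PATTERN = '.*[.]csv'
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
ON_ERROR = 'CONTINUE';
```

ON_ERROR = 'CONTINUE' trades strictness for throughput; the default for bulk loads is to abort the statement on the first error.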
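
Time Travel
Allows users to query, clone, or restore data to a previous state within a
defined retention period. This feature aids in data recovery and auditing.
Querying a table as it was at a specific point in time:
    SELECT * FROM my_table
    AT (TIMESTAMP => '2022-06-01 00:00:00'::TIMESTAMP_LTZ);
The value must be cast to a timestamp type and must fall within the table's
retention period.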
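
Time Travel also accepts a relative offset or a statement ID, and it underlies cloning from a past state and restoring dropped objects. The sketch below assumes my_table exists with a non-zero retention period; my_table_restored is a hypothetical name and '<query_id>' is a placeholder to be replaced with a real query ID.

```sql
-- Table contents as of five minutes ago (offset is in seconds).
SELECT * FROM my_table AT (OFFSET => -60 * 5);

-- Table contents just before a specific statement ran.
SELECT * FROM my_table BEFORE (STATEMENT => '<query_id>');

-- Materialize a past state as a new table (zero-copy clone).
CREATE TABLE my_table_restored CLONE my_table AT (OFFSET => -3600);

-- Recover a table that was dropped within the retention period.
DROP TABLE my_table;
UNDROP TABLE my_table;
```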

Zero-Copy Cloning
Enables creating a clone of a database, schema, or table without copying the
data. Changes to the clone do not affect the original, and vice versa.
Creating a clone of a table:
    CREATE TABLE my_table_clone CLONE my_table;
The clone initially shares the original's storage, so it adds storage cost
only for data that later changes in either table.
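
Secure Data Sharing
Allows sharing of data across different Snowflake accounts without moving or
copying the data. Consumers can query shared data in real time.
Sharing data with another Snowflake account:
    CREATE SHARE my_share;
    GRANT SELECT ON TABLE my_table TO SHARE my_share;
The share also needs USAGE on the table's database and schema, and consumer
accounts are added with ALTER SHARE ... ADD ACCOUNTS. The recipient can then
access the shared data directly.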

Snowsight
Snowflake's web user interface, which enhances the user experience with
features like integrated dashboards, interactive visualizations, and an
improved SQL editor.
Users can create and manage interactive dashboards within Snowsight,
allowing them to visualize data trends and share insights with their team.
For example, a sales team can use Snowsight to build a dashboard that tracks
monthly sales performance across different regions.

Snowflake Community
A vibrant network of users, experts, and partners who share knowledge, best
practices, and support each other in using Snowflake. It includes user
groups, forums, and special interest groups.

Data Marketplace
The Snowflake Data Marketplace is a platform where users can discover,
access, and share live datasets from various providers. It facilitates data
collaboration and allows users to enrich their own data with external data
sources.
A marketing team can access demographic data from a third-party provider
through the Data Marketplace to enhance their customer analysis. They can
integrate this data with their internal sales data to gain deeper insights
into customer behavior and preferences.

Multi-Cluster Warehouses
Multi-cluster warehouses allow Snowflake to automatically manage the number
of compute clusters needed to handle varying workloads. This ensures optimal
performance and resource utilization without manual intervention.
A retail company can set up a multi-cluster warehouse to handle the high
concurrency of queries during Black Friday sales. Snowflake automatically
adds clusters to manage the increased load and removes them when the load
decreases, ensuring efficient use of resources and cost management.
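
The Black Friday scenario above could be configured roughly as follows. This is a sketch that assumes the my_warehouse warehouse from the earlier examples and an account edition that supports multi-cluster warehouses (Enterprise Edition or higher); the cluster counts are illustrative.

```sql
-- Let Snowflake run between 1 and 5 clusters depending on concurrency.
ALTER WAREHOUSE my_warehouse SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 5
  SCALING_POLICY = 'STANDARD';  -- favors starting clusters over queuing queries

-- Inspect the current settings and state.
SHOW WAREHOUSES LIKE 'my_warehouse';
```

With MIN_CLUSTER_COUNT below MAX_CLUSTER_COUNT the warehouse runs in auto-scale mode; the ECONOMY policy is the alternative when saving credits matters more than avoiding short queues.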

Materialized Views
Materialized views store the result set of a query physically and
automatically update when the underlying data changes. They improve query
performance by providing pre-computed results.
Creating a materialized view:
    CREATE MATERIALIZED VIEW mv_sales
    AS SELECT * FROM sales
    WHERE year = 2022;
Queries on this view are faster since the results are pre-computed.
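
Task
Tasks are used to automate the execution of SQL statements, including
procedural logic, at specified intervals or upon completion of other tasks.
Creating a task to run a query every hour:
    CREATE TASK hourly_task
      WAREHOUSE = my_warehouse
      SCHEDULE = '60 MINUTE'
    AS
      INSERT INTO daily_sales
      SELECT * FROM sales
      WHERE sales_date = CURRENT_DATE;
    ALTER TASK hourly_task RESUME;
A new task is created in a suspended state, so it has to be resumed before
it starts running.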

Stream
Streams track changes to a table (inserts, updates, deletes) and provide a
change data capture (CDC) mechanism for efficient data processing.
Creating a stream:
    CREATE STREAM sales_stream ON TABLE sales;
The stream captures changes to the sales table, which can be processed
later.
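
To see what a stream has captured, it can be queried like a table. The sketch below assumes the sales_stream stream from the example above; the METADATA$ columns are added by Snowflake to every stream.

```sql
-- Is there anything to process?
SELECT SYSTEM$STREAM_HAS_DATA('sales_stream');

-- Summarize pending changes by type. An update appears as a DELETE row
-- plus an INSERT row, both with METADATA$ISUPDATE = TRUE.
SELECT METADATA$ACTION,
       METADATA$ISUPDATE,
       COUNT(*) AS row_count
FROM sales_stream
GROUP BY METADATA$ACTION, METADATA$ISUPDATE;
```

A plain SELECT does not consume the stream; its offset advances only when the stream is read by a DML statement (for example INSERT ... SELECT) in a transaction that commits.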
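
Pipe
Pipes automate data loading by continuously ingesting data from stages
(e.g., AWS S3, Azure Blob Storage) into Snowflake tables.
Creating a pipe to load data from an S3 bucket:
    CREATE PIPE my_pipe
      AUTO_INGEST = TRUE
    AS
      COPY INTO my_table
      FROM @my_stage
      FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
Pointing the pipe at the stage rather than at a single file lets it pick up
each new file. AUTO_INGEST = TRUE assumes my_stage is an external stage with
cloud event notifications configured.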
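
Warehouse Monitoring
Snowflake provides tools to monitor the performance and usage of virtual
warehouses, helping users optimize resource allocation and manage costs.
Using the WAREHOUSE_METERING_HISTORY view to monitor warehouse usage and
costs:
    SELECT * FROM
      SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
    WHERE WAREHOUSE_NAME = 'MY_WAREHOUSE';
The view lives in the shared SNOWFLAKE database, and names created without
quotes are stored in upper case.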
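
Role-Based Access Control (RBAC)
A security model that restricts access to data and resources based on the
roles assigned to users. Snowflake allows fine-grained control over access
permissions.
Creating a role and granting privileges:
    CREATE ROLE analyst_role;
    GRANT USAGE ON DATABASE sales_data TO ROLE analyst_role;
    GRANT USAGE ON ALL SCHEMAS IN DATABASE sales_data TO ROLE analyst_role;
    GRANT SELECT ON ALL TABLES IN DATABASE sales_data TO ROLE analyst_role;
SELECT is a table-level privilege, so the role needs USAGE on the database
and its schemas plus SELECT on the tables.
Assigning the role to a user:
    GRANT ROLE analyst_role TO USER john_doe;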

Dynamic Data Masking
Dynamic Data Masking allows Snowflake to hide sensitive data in query
results based on the role of the user accessing the data. This enhances data
security and privacy.
Masking sensitive data:
    CREATE MASKING POLICY ssn_mask AS (val STRING)
      RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('ANALYST_ROLE') THEN 'XXX-XX-XXXX'
        ELSE val
      END;
CURRENT_ROLE() returns the role name as stored, which is upper case for
roles created without quotes.
Applying the policy:
    ALTER TABLE customers
      MODIFY COLUMN ssn SET MASKING POLICY ssn_mask;

External Tables
External tables allow Snowflake to query data stored in external locations
(e.g., AWS S3, Azure Blob Storage) without loading it into Snowflake.
Creating an external table:
    CREATE EXTERNAL TABLE my_ext_table
      WITH LOCATION = @my_external_stage
      FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
The stage reference is written without quotes. This table allows querying
data directly from the external stage.
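
Data Replication
Snowflake's data replication feature allows for the replication of databases
across different regions and cloud providers to enhance data availability
and disaster recovery.
Setting up data replication in the source account:
    CREATE REPLICATION GROUP my_replication
      OBJECT_TYPES = DATABASES
      ALLOWED_DATABASES = my_database
      ALLOWED_ACCOUNTS = my_org.my_target_account;
Replication targets another account in the same organization, which can be
in a different region such as AWS US West; my_database, my_org, and
my_target_account are placeholder names. The target account then creates a
secondary group with CREATE REPLICATION GROUP ... AS REPLICA OF and
refreshes it to receive the data.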
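
Failover and Failback
Snowflake provides failover and failback capabilities to ensure high
availability and disaster recovery. Failover allows switching to a replica
in case of a failure, and failback switches back once the original is
restored.
Configuring failover for a database in the source account:
    CREATE FAILOVER GROUP my_failover_group
      OBJECT_TYPES = DATABASES
      ALLOWED_DATABASES = my_database
      ALLOWED_ACCOUNTS = my_org.my_target_account;
During an outage, running ALTER FAILOVER GROUP my_failover_group PRIMARY; in
the target account promotes its replica to primary. The organization and
account names are placeholders, and failover groups require Business
Critical Edition or higher.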

Search Optimization Service
A Snowflake feature that improves the performance of searches on large
tables by creating and maintaining search optimization structures.
Enabling search optimization for a table:
    ALTER TABLE my_table ADD SEARCH OPTIMIZATION;
This improves the performance of selective lookup queries on the table.
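
Snowflake Data Exchange
A platform that allows Snowflake users to share and access live data
securely. It facilitates data collaboration and monetization by providing a
marketplace for data providers and consumers.
Publishing data to the Data Exchange starts from a share:
    CREATE SHARE my_share;
    GRANT SELECT ON TABLE my_table TO SHARE my_share;
The share is then published as a listing through the web interface; the
exchange itself is provisioned by Snowflake rather than created with SQL.
Other users can subscribe to and query the shared data.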

Data Masking
Data masking provides a way to protect sensitive data by masking it in query
results, based on user roles. This ensures that sensitive information is not
exposed to unauthorized users.
Creating a data masking policy:
    CREATE MASKING POLICY email_mask
      AS (val STRING)
      RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('ANALYST_ROLE') THEN '********@domain.com'
        ELSE val
      END;
Applying the policy to a column:
    ALTER TABLE users
      MODIFY COLUMN email SET MASKING POLICY email_mask;

Snowpipe
Snowpipe is Snowflake's continuous data ingestion service, which allows for
the automated loading of data from stages into Snowflake tables.
Creating a Snowpipe to load data:
    CREATE PIPE my_pipe
      AUTO_INGEST = TRUE
    AS
      COPY INTO my_table
      FROM @my_stage
      FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
With AUTO_INGEST enabled on an external stage that sends event
notifications, Snowpipe will automatically load new data files as they
arrive in the stage.

External Functions
External functions allow Snowflake to call external services and integrate
with external systems directly from SQL queries. This enables advanced data
processing and integration capabilities.
Creating an external function:
    CREATE EXTERNAL FUNCTION my_ext_function()
      RETURNS STRING
      API_INTEGRATION = my_api_integration
      AS '<https_endpoint_url>';
The AS clause names the HTTPS endpoint behind the API integration; the value
shown is a placeholder. This function can call an external API and return
the result to Snowflake.
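
Streams and Tasks
Streams track changes to tables, and tasks automate the execution of SQL
based on schedules or events. Together, they enable efficient change data
capture and automation.
Creating a stream and task:
    CREATE STREAM my_stream
      ON TABLE my_table;
    CREATE TASK my_task
      WAREHOUSE = my_warehouse
      SCHEDULE = '60 MINUTE'
    AS
      INSERT INTO my_target_table
      SELECT * FROM my_stream;
SELECT * on a stream also returns its METADATA$ columns, so the target table
needs matching columns or the SELECT should list the wanted columns
explicitly. The task must be resumed with ALTER TASK my_task RESUME; before
it runs.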
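
Snowflake Organizations
Snowflake Organizations provide a way to manage multiple Snowflake accounts
within an organization. This enables better resource allocation, cost
management, and governance.
Listing the accounts in an organization (requires the ORGADMIN role):
    USE ROLE ORGADMIN;
    SHOW ORGANIZATION ACCOUNTS;
The organization itself is provisioned by Snowflake; an organization
administrator can then add accounts to it with CREATE ACCOUNT. This allows
central management of multiple Snowflake accounts.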

Data Governance
Snowflake offers features for data governance, including access controls,
data masking, and audit logging, to ensure data security, privacy, and
compliance.
Implementing data governance:
    CREATE ROW ACCESS POLICY my_policy
      AS (val STRING)
      RETURNS BOOLEAN -> CURRENT_ROLE() IN ('DATA_GOVERNANCE_ROLE');
and applying it to a table with ALTER TABLE ... ADD ROW ACCESS POLICY.

Account Usage
Snowflake provides account usage views to track and analyze resource usage,
query performance, and cost management. These views help in monitoring and
optimizing Snowflake usage.
Querying account usage:
    SELECT * FROM
      SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE QUERY_TEXT ILIKE '%SELECT%';
This retrieves the history of queries in the account whose text contains
SELECT.

Resource Monitors
Resource monitors allow administrators to manage and control compute
resource usage by setting thresholds and triggering actions when limits are
reached.
Creating a resource monitor and assigning it to a warehouse:
    CREATE RESOURCE MONITOR my_monitor
      WITH CREDIT_QUOTA = 1000;
    ALTER WAREHOUSE my_warehouse SET RESOURCE_MONITOR = my_monitor;
This monitor tracks the compute credits used by the warehouse; the action
taken when the quota is reached is defined with the TRIGGERS clause.
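
A monitor only acts when it has triggers. The sketch below extends the my_monitor example with notification and suspend thresholds; the percentages and the monthly reset are illustrative assumptions.

```sql
-- Recreate the monitor with explicit thresholds.
CREATE OR REPLACE RESOURCE MONITOR my_monitor
  WITH CREDIT_QUOTA = 1000
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
       TRIGGERS ON 75 PERCENT DO NOTIFY             -- warn administrators
                ON 100 PERCENT DO SUSPEND           -- let running queries finish
                ON 110 PERCENT DO SUSPEND_IMMEDIATE; -- cancel running queries

-- Attach the monitor to a warehouse so its credits count against the quota.
ALTER WAREHOUSE my_warehouse SET RESOURCE_MONITOR = my_monitor;
```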

Query Optimization
Snowflake provides various tools and techniques to optimize query
performance, including using the Query Profile, optimizing table structures,
and leveraging caching.
Finding the slowest queries of the current session:
    SELECT * FROM
      TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION())
    ORDER BY TOTAL_ELAPSED_TIME DESC;
This helps identify slow-running queries, whose execution plans can then be
inspected in the Query Profile in the web interface.

Data Sharing
Snowflake allows secure sharing of data between different accounts without
data movement. Shared data can be accessed in real time, ensuring
consistency and reducing latency.
Creating a share:
    CREATE SHARE my_share;
and adding tables to it. Other Snowflake accounts can access the shared data
directly.

Cloning
Cloning in Snowflake creates a copy of a database, schema, or table without
duplicating the data. This is useful for creating test environments and for
backup purposes.
Cloning a table:
    CREATE TABLE my_table_clone CLONE my_table;
This allows working with a snapshot of the data; extra storage is used only
for data that changes after the clone is created.

Data Load and Unload
Snowflake provides various methods for loading and unloading data, including
bulk loading with the COPY INTO command, using Snowpipe for continuous
loading, and unloading data to external stages.
Loading data:
    COPY INTO my_table
    FROM @my_stage
    FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
and unloading data:
    COPY INTO @my_stage FROM my_table;
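
Data Encryption
Snowflake encrypts data at rest and in transit to ensure data security.
Encryption keys are managed automatically, and users can also provide their
own keys for additional security.
Encryption is always on and needs no SQL to enable. A related but separate
setting is the Time Travel retention period of a table:
    ALTER TABLE my_table SET
      DATA_RETENTION_TIME_IN_DAYS = 90;
This keeps historical data for the specified period; it does not change how
the data is encrypted.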

Data Retention
Snowflake provides data retention policies to manage how long data is kept
in the system. This includes Time Travel and Fail-safe periods for data
recovery.

Fail-Safe
Fail-safe is a Snowflake feature that provides an additional 7-day period
for recovering data after the Time Travel retention period has expired. Data
in Fail-safe can be recovered only by Snowflake, not by users directly, so
it serves as a last resort in case of failures.
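
Because Time Travel and Fail-safe both consume storage, it can be useful to opt out for scratch data and to check how much storage each one uses. The sketch below assumes a hypothetical staging_events table and the sales_data database from the earlier example.

```sql
-- Transient tables have no Fail-safe period; with retention set to 0
-- they keep no Time Travel history either.
CREATE TRANSIENT TABLE staging_events (id INT, payload VARIANT)
  DATA_RETENTION_TIME_IN_DAYS = 0;

-- Storage held for active data, Time Travel, and Fail-safe, per table.
SELECT table_name,
       active_bytes,
       time_travel_bytes,
       failsafe_bytes
FROM SNOWFLAKE.ACCOUNT_USAGE.TABLE_STORAGE_METRICS
WHERE table_catalog = 'SALES_DATA'
ORDER BY failsafe_bytes DESC;
```

The trade-off is recoverability: data in a transient table with zero retention cannot be restored once it is changed or dropped.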

User-Defined Functions (UDFs)
UDFs allow users to define their own functions in SQL or JavaScript (as well
as Python, Java, and Scala), extending Snowflake's built-in functionality
with custom logic.
Creating a SQL UDF:
    CREATE FUNCTION my_udf(x INT)
      RETURNS INT
      LANGUAGE SQL
    AS
      'x * 2';
The body of a SQL UDF is an expression, not a RETURN statement. This
function multiplies the input by 2.

Stored Procedures
Stored procedures in Snowflake allow for procedural logic and complex
operations to be encapsulated in SQL or JavaScript, enabling automation and
reusable code.
Creating a stored procedure:
    CREATE PROCEDURE my_proc()
      RETURNS STRING
      LANGUAGE JAVASCRIPT
    AS $$
      return 'Hello, World!';
    $$;
and calling it:
    CALL my_proc();

Privileges and Grants
Snowflake's security model uses privileges and grants to control access to
database objects. Roles are assigned privileges, and users are assigned
roles.
Granting privileges:
    GRANT SELECT ON TABLE my_table
      TO ROLE analyst_role;
This allows users with the analyst_role to query the table.

Roles and Role Hierarchies
Roles in Snowflake define a set of privileges and can be assigned to users.
Role hierarchies allow roles to inherit privileges from other roles,
simplifying access management.
Creating a role hierarchy:
    CREATE ROLE senior_analyst;
    GRANT ROLE analyst_role TO ROLE senior_analyst;
Users with the senior_analyst role inherit privileges from the analyst_role.

Session Variables
Session variables in Snowflake store values that can be used within a
session. They allow for dynamic SQL and reusable code.
Setting and using a session variable:
    SET my_var = 'Hello, World!';
and
    SELECT $my_var;
This returns the value of the variable.

Parameter Management
Snowflake allows configuration of various parameters at the account,
session, and object levels to customize behavior and optimize performance.
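
The three levels can be illustrated with one parameter that is settable at several of them. The sketch below uses STATEMENT_TIMEOUT_IN_SECONDS and TIMEZONE as examples and assumes the my_warehouse warehouse from earlier; the timeout values are arbitrary.

```sql
-- Inspect the current value and the level it was set at.
SHOW PARAMETERS LIKE 'STATEMENT_TIMEOUT_IN_SECONDS' IN ACCOUNT;

-- Account level: default for everything in the account.
ALTER ACCOUNT SET STATEMENT_TIMEOUT_IN_SECONDS = 7200;

-- Object level: a stricter limit for one warehouse.
ALTER WAREHOUSE my_warehouse SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;

-- Session level: applies only to the current session.
ALTER SESSION SET TIMEZONE = 'UTC';

-- Remove the session override and fall back to the inherited value.
ALTER SESSION UNSET TIMEZONE;
```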

Semi-Structured Data
Snowflake supports semi-structured data formats such as JSON, Avro, Parquet,
and XML. This allows for flexible data modeling and integration with modern
data sources.
Querying JSON data:
    SELECT json_data:id FROM my_table;
This retrieves the "id" field from JSON data stored in a column.
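
Nested fields and arrays can be reached with path notation, casts, and FLATTEN. The sketch below is self-contained and uses a hypothetical events table with a single VARIANT column; the sample document is made up for illustration.

```sql
-- A table with one semi-structured column.
CREATE OR REPLACE TABLE events (json_data VARIANT);

-- PARSE_JSON cannot be used in a VALUES clause, so insert via SELECT.
INSERT INTO events
  SELECT PARSE_JSON('{"id": 1, "customer": {"name": "Ada"}, "tags": ["a", "b"]}');

-- Path access with casts, plus one output row per element of the tags array.
SELECT json_data:id::INT               AS id,
       json_data:customer.name::STRING AS customer_name,
       t.value::STRING                 AS tag
FROM events,
     LATERAL FLATTEN(input => json_data:tags) t;
```

Without the casts the selected values stay VARIANT, which is why string fields would otherwise appear wrapped in double quotes.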

Data Compression
Snowflake automatically compresses data to reduce storage costs and improve
query performance. Different compression algorithms are used based on the
data type.
Snowflake's automatic compression means users don't need to manually
configure compression settings, as the platform optimizes storage
efficiency.

Cost Management
Snowflake provides tools and practices to manage and optimize costs,
including resource monitors, usage views, and best practices for query
optimization.
Using resource monitors to control costs:
    CREATE RESOURCE MONITOR my_monitor
      WITH CREDIT_QUOTA = 1000;
Notification and suspend thresholds for the budget are then added with the
monitor's TRIGGERS clause.
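
Query History
Snowflake tracks query history, allowing users to review and analyze past
queries for performance optimization and troubleshooting.
Accessing query history:
    SELECT * FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
    WHERE QUERY_TEXT ILIKE '%SELECT%';
This retrieves recent queries whose text contains SELECT. For a longer
history across the whole account, query
SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY instead.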

Metadata Management
Snowflake manages metadata for all database objects, providing detailed
information about tables, columns, and other objects. This metadata is used
for query optimization and data governance.
Querying metadata:
    SELECT * FROM
      INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = 'PUBLIC';
This retrieves information about all tables in the PUBLIC schema.

Data Import/Export
Snowflake supports various methods for importing and exporting data,
including bulk loading with the COPY INTO command and unloading to external
stages.
Importing data:
    COPY INTO my_table
    FROM @my_stage
    FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
and exporting data:
    COPY INTO @my_stage FROM my_table;
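
Data Quality
Snowflake provides features to support data quality, including constraints,
data validation, and profiling.
Declaring constraints on a table:
    CREATE TABLE my_table (
      id INT PRIMARY KEY,
      name STRING NOT NULL);
On standard tables Snowflake enforces only NOT NULL. The "name" column
therefore cannot be null, while PRIMARY KEY on "id" is recorded as metadata
but not enforced, so uniqueness has to be checked separately.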
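
Since uniqueness is not enforced on standard tables, a periodic check is a common workaround. A minimal sketch against the my_table example above:

```sql
-- List id values that occur more than once.
SELECT id,
       COUNT(*) AS occurrences
FROM my_table
GROUP BY id
HAVING COUNT(*) > 1;
```

An empty result means the declared primary key currently holds; a scheduled task could run the same query and raise an alert when it returns rows.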
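
Data Lineage
Data lineage tracks the flow of data through Snowflake, from ingestion to
transformation to analysis, providing visibility into data dependencies and
transformations.
Views and tasks create the dependencies that lineage describes:
    CREATE VIEW my_view
    AS SELECT * FROM my_table;
and
    CREATE TASK my_task
      WAREHOUSE = my_warehouse
      SCHEDULE = '60 MINUTE'
    AS
      INSERT INTO my_target_table
      SELECT * FROM my_view;
The task writes to a separate table (my_target_table, a placeholder name) so
that it does not re-insert rows into the table the view reads from. Object
dependencies such as the one between my_view and my_table can be looked up
in SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES.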

Business Continuity
Snowflake's features for business continuity include data replication,
failover, and Fail-safe, so that data remains available and recoverable in
case of disasters.

Governance and Compliance
Snowflake provides tools for data governance and compliance, including
access controls, data masking, and audit logging, to ensure data security
and regulatory compliance.
Implementing compliance policies:
    CREATE ROW ACCESS POLICY compliance_policy
      AS (val STRING)
      RETURNS BOOLEAN -> CURRENT_ROLE() IN ('COMPLIANCE_ROLE');
and applying it to a table with ALTER TABLE ... ADD ROW ACCESS POLICY.
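
Advanced Analytics
Snowflake supports advanced analytics capabilities, including machine
learning integration, geospatial data processing, and complex data
transformations.
Integrating with machine learning models:
    CREATE FUNCTION predict_sales(x FLOAT)
      RETURNS FLOAT
      LANGUAGE PYTHON
      RUNTIME_VERSION = '3.10'
      IMPORTS = ('@my_stage/my_model.py')
      HANDLER = 'my_model.predict';
This function calls a Python model for sales prediction. my_model.py is a
placeholder for a module uploaded to a stage that defines predict, and the
runtime version must be one that Snowflake currently supports.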

Data Monetization
Snowflake's data marketplace and secure data sharing enable organizations to
monetize their data assets by sharing or selling data to other Snowflake
users.
Publishing data for monetization starts from a share:
    CREATE SHARE my_share;
Tables are granted to the share, which is then published as a listing for
other users to access and purchase.

Geospatial Data
Snowflake supports geospatial data types and functions, allowing users to
store, query, and analyze spatial data such as points, polygons, and
geometries.
Querying geospatial data:
    SELECT ST_DISTANCE(point1, point2)
    FROM my_table;
This calculates the distance between two points stored in a table.

IoT Data Processing
Snowflake's scalable architecture and support for semi-structured data make
it well-suited for processing and analyzing IoT (Internet of Things) data.
Loading IoT data:
    COPY INTO my_table
    FROM @iot_stage
    FILE_FORMAT = (FORMAT_NAME = 'json_format');
This ingests JSON data from IoT devices.

Real-Time Analytics
Snowflake supports near-real-time analytics by allowing continuous data
ingestion and immediate querying of fresh data.
Using Snowpipe for continuous data ingestion:
    CREATE PIPE my_pipe
      AUTO_INGEST = TRUE
    AS
      COPY INTO my_table
      FROM @my_stage
      FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

Data Federation
Snowflake's external tables and data sharing features enable data
federation, allowing users to query and combine data from multiple sources
without moving the data.
Creating an external table to federate data:
    CREATE EXTERNAL TABLE my_ext_table
      WITH LOCATION = @my_external_stage
      FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');
This table allows querying data directly from the external stage.
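
Security Integrations
Snowflake integrates with security tools and frameworks, including single
sign-on (SSO), multi-factor authentication (MFA), and encryption key
management, to enhance data security.
Configuring SSO with a SAML2 security integration:
    CREATE SECURITY INTEGRATION my_idp
      TYPE = SAML2 ENABLED = TRUE
      SAML2_ISSUER = '<idp_issuer>' SAML2_PROVIDER = 'CUSTOM'
      SAML2_SSO_URL = 'https://mycompany.com/sso'
      SAML2_X509_CERT = '<base64_certificate>';
This enables single sign-on for Snowflake users; my_idp and the values in
angle brackets are placeholders for details supplied by the identity
provider.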
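
Continuous Data Protection
Snowflake's continuous data protection features include Time Travel,
Fail-safe, and data replication, which together protect data integrity and
availability.
Setting up data replication in the source account:
    CREATE REPLICATION GROUP my_replication
      OBJECT_TYPES = DATABASES
      ALLOWED_DATABASES = my_database
      ALLOWED_ACCOUNTS = my_org.my_target_account;
This allows the database to be replicated to an account in a different AWS
region; the organization and account names are placeholders.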
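
Custom Data Types and Validation
Snowflake does not support user-defined data types (CREATE DOMAIN) or CHECK
constraints, so format rules are enforced through validation queries or
checks in the loading pipeline rather than in the column definition.
Finding rows that violate an email format rule:
    SELECT * FROM users
    WHERE email NOT LIKE '%@%.%';
This returns the rows whose email does not match the expected pattern.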

Hybrid Tables
Hybrid tables in Snowflake combine the benefits of transactional and
analytical processing, allowing for efficient real-time data analysis.
Creating a hybrid table:
    CREATE HYBRID TABLE my_table
      (id INT PRIMARY KEY, data STRING);
A hybrid table requires a primary key, which Snowflake enforces. This table
supports both transactional and analytical workloads.

Data Archiving
Snowflake's data retention and archiving features help manage long-term
storage of historical data, ensuring that it is available for compliance and
analysis.

Data Classification
Data classification in Snowflake helps categorize data based on sensitivity
and importance, enabling better data governance and security.
Classifying data:
    CREATE TAG classification;
    ALTER TABLE my_table SET TAG
      classification = 'sensitive';
The tag has to exist before it can be set. This tags the table as containing
sensitive data.

Data Masking Policies
Data masking policies in Snowflake provide dynamic masking of sensitive data
based on user roles, ensuring that only authorized users can see the actual
data.
Creating a data masking policy:
    CREATE MASKING POLICY ssn_mask
      AS (val STRING)
      RETURNS STRING ->
      CASE
        WHEN CURRENT_ROLE() IN ('ANALYST_ROLE') THEN 'XXX-XX-XXXX'
        ELSE val
      END;
Applying the policy:
    ALTER TABLE customers MODIFY COLUMN ssn
      SET MASKING POLICY ssn_mask;

Row Access Policies
Row access policies allow Snowflake to restrict access to specific rows in a
table based on user roles and other criteria, enhancing data security and
compliance.
Creating a row access policy:
    CREATE ROW ACCESS POLICY row_policy
      AS (val STRING)
      RETURNS BOOLEAN -> CURRENT_ROLE() IN ('ANALYST_ROLE');
Applying the policy (my_column is a placeholder for the STRING column that
is passed to the policy as val):
    ALTER TABLE my_table
      ADD ROW ACCESS POLICY row_policy ON (my_column);
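
Cross-Cloud Replication
Snowflake supports cross-cloud replication, allowing data to be replicated
across different cloud providers (e.g., AWS, Azure, Google Cloud) for high
availability and disaster recovery.
Setting up cross-cloud replication in the source account:
    CREATE REPLICATION GROUP my_replication
      OBJECT_TYPES = DATABASES
      ALLOWED_DATABASES = my_database
      ALLOWED_ACCOUNTS = my_org.my_azure_account;
This allows the database to be replicated to an account hosted in an Azure
region; my_org and my_azure_account are placeholder names.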
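
Event-Driven Data Processing
Snowflake's tasks and streams enable event-driven data processing, allowing
actions to be triggered based on changes in data or scheduled intervals.
Creating a task that runs only when a stream has captured changes:
    CREATE TASK my_task
      WAREHOUSE = my_warehouse
      SCHEDULE = '1 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('my_stream')
    AS
      INSERT INTO audit_table
      SELECT * FROM my_stream;
my_stream is assumed to be a stream on my_table. The task checks the stream
every minute and skips the run when there are no new changes; audit_table
needs columns for the stream's METADATA$ columns, or the SELECT should list
columns explicitly.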
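
Data Encryption Key Management
Snowflake allows users to manage their own encryption keys for added
security, providing control over data encryption and compliance with
regulatory requirements.
Customer-managed keys are provided through Tri-Secret Secure, which combines
a key held in the cloud provider's key management service with a
Snowflake-managed key. It is configured at the account level rather than
with a per-database SQL setting, and requires Business Critical Edition or
higher.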

Geospatial Functions
Snowflake provides geospatial functions to perform spatial analysis and
operations on geographic data, such as distance calculations and spatial
joins.
Using a geospatial function:
    SELECT ST_DISTANCE(point1, point2)
    FROM my_table;
This calculates the distance between two geographic points stored in a
table.

Graph Analytics
Snowflake can support graph-style analytics by modeling relationships
between data points as node and edge tables and traversing them with SQL,
for example with recursive queries.
Performing graph analytics:
    CREATE TABLE graph_edges
      (src INT, dst INT);
and running graph queries to analyze relationships.
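
One way to run such a graph query is a recursive common table expression. The sketch below walks the graph_edges table from the example above, starting at a hypothetical node 1 and stopping after five hops; both values are assumptions chosen for illustration.

```sql
-- Find every node reachable from node 1 and the fewest hops needed.
WITH RECURSIVE reachable (node, depth) AS (
    -- Anchor: direct neighbors of the start node.
    SELECT dst, 1
    FROM graph_edges
    WHERE src = 1
    UNION ALL
    -- Recursive step: follow one more edge from each node found so far.
    SELECT e.dst, r.depth + 1
    FROM reachable r
    JOIN graph_edges e ON e.src = r.node
    WHERE r.depth < 5          -- depth cap guards against cycles
)
SELECT node,
       MIN(depth) AS hops
FROM reachable
GROUP BY node
ORDER BY hops;
```

The depth cap keeps the recursion finite when the graph contains cycles; without it a cyclic graph would recurse until Snowflake's iteration limit stops the query.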

Data Versioning
Snowflake's Time Travel and Zero-Copy Cloning features enable data
versioning, allowing users to create, manage, and query different versions
of data for analysis and auditing.
Creating a version of a table:
    CREATE TABLE my_table_clone CLONE my_table;
This clone represents a version of the original table that can be queried
and analyzed separately.

API Integration
Snowflake supports integration with external APIs, allowing users to call
external services and incorporate real-time data into Snowflake queries and
workflows.
Creating an external function to call an API:
    CREATE EXTERNAL FUNCTION my_ext_function()
      RETURNS STRING
      API_INTEGRATION = my_api_integration
      AS '<https_endpoint_url>';
The AS clause names the HTTPS endpoint to call; the value shown is a
placeholder. This function can call an external API and return the result to
Snowflake.
THANK YOU