Snowflake - End To End Learning

Uploaded by

souravgdsc7
Data Engineering

SNOWFLAKE
ALL CONCEPTS TO GET STARTED
Data Engineering 101- Snowflake

Cloud Data Warehouse
A cloud-based platform for storing and
analyzing data, which offers scalability,
flexibility, and cost-efficiency compared to
traditional on-premises data warehouses.

Snowflake provides a fully managed service
with separate compute, storage, and cloud
services layers, making it easier to scale and
manage data operations.


Snowflake
Architecture
Snowflake's architecture separates storage
and compute, allowing for independent
scaling and efficient data management.
This design eliminates many limitations of
traditional data warehouses.

Snowflake uses a multi-cluster shared data
architecture, where storage is centralized,
and compute resources can be scaled up or
down independently based on workload.


Virtual Warehouse
A virtual warehouse is a cluster of compute
resources in Snowflake. Each virtual
warehouse can be scaled independently to
match the workload, providing the
necessary compute power for query
execution without affecting other
warehouses.

If a company needs to run a heavy
analytical query during peak business
hours, it can scale up the virtual
warehouse to a larger size, ensuring faster
query performance. After the peak hours,
the warehouse can be scaled down to save
costs.


Database

A logical grouping of schemas, tables, and
other database objects. It provides a
namespace for organizing and managing
data.

Creating a new database in Snowflake:
CREATE DATABASE sales_data;
This command sets up a new database
where all sales-related schemas and tables
can be organized.


Schema

A logical grouping of database objects such
as tables, views, and stored procedures.
Schemas help organize objects within a
database.

Creating a new schema in a database:
CREATE SCHEMA sales_data.january;
This schema can contain all tables related to
January's sales data.


Table

A structured set of data elements (values)
organized in rows and columns. Tables are
fundamental storage objects in a database.

Creating a new table:
CREATE TABLE customers
(id INT, name STRING, email STRING);

This table stores customer information.


View

A virtual table based on the result set of a
SQL query. Views do not store data
themselves but provide a way to represent
data stored in tables.

Creating a new view (assuming the customers
table also has a status column):
CREATE VIEW vip_customers
AS SELECT * FROM customers
WHERE status = 'VIP';
This view shows only VIP customers.


Stage

A location where data files are stored
temporarily before being loaded into
Snowflake tables. Stages can be internal or
external (e.g., S3, Azure Blob Storage).

Creating an internal stage:
CREATE STAGE my_stage;
This stage can be used to store data files
before loading them into tables.


File Format

Defines the format of data files to be loaded
into Snowflake (e.g., CSV, JSON, Avro). File
formats specify how Snowflake should
interpret the contents of the files.

Creating a file format for CSV files:
CREATE FILE FORMAT my_csv_format
TYPE = 'CSV'
FIELD_OPTIONALLY_ENCLOSED_BY = '"';


Warehouse Size

Snowflake offers different sizes for virtual
warehouses (e.g., X-Small, Small, Medium,
Large) to accommodate various workloads.
Larger sizes provide more compute
resources.

A Small warehouse might be sufficient for
routine queries, while a Large warehouse
can handle complex analytical queries.
Adjust the size based on workload
demands.


Scaling Up

Increasing the size of a virtual warehouse to
provide more compute resources for a
specific workload.

Scaling up a virtual warehouse:
ALTER WAREHOUSE my_warehouse
SET WAREHOUSE_SIZE = 'LARGE';
This increases the compute power available
for queries.


Scaling Out

Adding more compute clusters to a virtual
warehouse to handle increased
concurrency and workload demands.

Enabling auto-scaling for a warehouse
(multi-cluster warehouses require Enterprise
Edition or higher):
ALTER WAREHOUSE my_warehouse
SET MAX_CLUSTER_COUNT = 5;
Snowflake will add clusters as needed, up to
five, to handle concurrent queries.


Auto-Suspend

Automatically suspends a virtual warehouse
when it is idle for a specified period, saving
costs.

Setting auto-suspend for a warehouse:
ALTER WAREHOUSE my_warehouse
SET AUTO_SUSPEND = 300;
The warehouse will suspend after 300 seconds
(5 minutes) of inactivity.


Auto-Resume

Automatically resumes a suspended virtual
warehouse when a query is submitted,
ensuring availability without manual
intervention.

Enabling auto-resume for a warehouse:
ALTER WAREHOUSE my_warehouse
SET AUTO_RESUME = TRUE;
The warehouse will resume automatically
when a query is submitted.
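
The size, scale-out, auto-suspend, and auto-resume settings above can also be set together when a warehouse is created. The following is a minimal sketch, not a recommended configuration: the warehouse name reporting_wh and all values are assumptions chosen for illustration, and the cluster-count settings assume Enterprise Edition or higher.

```sql
-- Hypothetical warehouse combining size, scale-out, and suspend/resume settings.
CREATE WAREHOUSE IF NOT EXISTS reporting_wh
  WAREHOUSE_SIZE = 'SMALL'
  MIN_CLUSTER_COUNT = 1         -- start with a single cluster
  MAX_CLUSTER_COUNT = 3         -- scale out only under concurrency pressure
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 300            -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE            -- wake up when a query arrives
  INITIALLY_SUSPENDED = TRUE;   -- no credits consumed until first use

-- Scale up for a heavy job, then back down afterwards.
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'LARGE';
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'SMALL';
```

Scaling up changes the size of each cluster, while the cluster counts govern scaling out, so the two can be tuned independently.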


Query Caching

Snowflake caches the results of queries to
speed up repeated query executions,
reducing the need for re-computation and
saving compute resources.

Running the same query twice will utilize
the cached result if the underlying data has
not changed, improving performance and
efficiency.


Result Cache

Stores the results of queries executed
within the past 24 hours. The cache is
accessible to all users within the account
whose role has the required privileges on the
underlying objects, reducing compute costs
and speeding up query performance.

If a query is run and then re-run within 24
hours without changes to the underlying
data, the result is fetched from the result
cache, saving compute resources.


Metadata Cache

Stores metadata about database objects to
speed up query parsing and planning. This
cache helps optimize query execution by
reducing the time needed to access
metadata.

Metadata about tables, columns, and
statistics is cached, allowing faster query
planning and execution. This helps
Snowflake optimize performance for
complex queries.


Data Caching

Snowflake caches data in the local storage
of virtual warehouses to improve query
performance. This cache is independent for
each virtual warehouse.

Frequently accessed data is stored in the
local disk cache of a virtual warehouse,
reducing the need to fetch data from
remote storage repeatedly, thus improving
performance.


Stages

Locations in Snowflake where data files can
be stored before being loaded into tables.
Stages can be internal (within Snowflake) or
external (e.g., AWS S3).

An internal stage can be created using
CREATE STAGE my_stage;
Data files can be uploaded to this stage
before being loaded into a table.


COPY INTO
Command
Used to load data from a stage into a
Snowflake table. The command specifies
the target table and the source file(s) along
with optional transformations.

Loading data from a stage into a table:
COPY INTO my_table
FROM @my_stage/file.csv
FILE_FORMAT = (FORMAT_NAME =
'my_csv_format');
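
The file format, stage, and COPY INTO pieces fit together as a single load path. The sketch below assumes the customers table from the Table slide and a hypothetical local file /tmp/customers.csv with a header row; SKIP_HEADER and ON_ERROR are illustrative choices rather than settings taken from the slides.

```sql
-- 1. Describe how the CSV files should be parsed.
CREATE FILE FORMAT IF NOT EXISTS my_csv_format
  TYPE = 'CSV'
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';

-- 2. Create an internal stage to hold the files.
CREATE STAGE IF NOT EXISTS my_stage;

-- 3. Upload the file. PUT runs from a client such as SnowSQL,
--    not from a web worksheet; the path is a placeholder.
PUT file:///tmp/customers.csv @my_stage;

-- 4. Load every staged file into the target table.
COPY INTO customers
  FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
  ON_ERROR = 'ABORT_STATEMENT';  -- stop the load on the first bad row
```

COPY INTO keeps load metadata for files it has already loaded, so re-running step 4 does not reload unchanged files.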


Time Travel

Allows users to query, clone, or restore data
to a previous state within a defined
retention period. This feature aids in data
recovery and auditing.

Querying a table as it was at a specific point
in time (the string is cast to a timestamp):
SELECT * FROM my_table
AT (TIMESTAMP => '2022-06-01T00:00:00'::TIMESTAMP_LTZ);
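
Besides an absolute timestamp, Time Travel also accepts a relative offset or a statement ID, and it works with cloning and UNDROP. A short sketch of those variants, assuming my_table is still within its retention period; my_table_restored is a hypothetical name and '<query_id>' is a placeholder to be filled in.

```sql
-- Table as it was five minutes ago (the offset is in seconds).
SELECT * FROM my_table AT (OFFSET => -60 * 5);

-- Table as it was just before a given statement ran.
SELECT * FROM my_table BEFORE (STATEMENT => '<query_id>');

-- Materialize a past state as a new table via a zero-copy clone.
CREATE TABLE my_table_restored CLONE my_table AT (OFFSET => -3600);

-- Bring back a dropped table while it is still within retention.
UNDROP TABLE my_table;
```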


Zero-Copy Cloning

Enables creating a clone of a database,
schema, or table without copying the data.
Changes to the clone do not affect the
original, and vice versa.

Creating a clone of a table:
CREATE TABLE my_table_clone
CLONE my_table;
This allows working with a snapshot of the
data. The clone uses no additional storage
until it or the original is modified.


Secure Data Sharing

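Allows sharing of data across different
Snowflake accounts without moving or
copying the data. Consumers can query
shared data in real time.

Sharing data with another Snowflake
account:
CREATE SHARE my_share;
GRANT SELECT ON TABLE my_table TO
SHARE my_share;
The share also needs USAGE on the table's
database and schema, and the consumer
account must be added to the share. The
recipient can then access the shared data
directly.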


Snowsight
Snowflake's new web user interface that enhances the user experience
with features like integrated dashboards, interactive visualizations, and
an improved SQL editor.

Users can create and manage interactive dashboards within
Snowsight, allowing them to visualize data trends and share insights
with their team. For example, a sales team can use Snowsight to
build a dashboard that tracks monthly sales performance across
different regions.


Snowflake
Community
A vibrant network of users, experts, and
partners who share knowledge, best
practices, and support each other in using
Snowflake. It includes user groups, forums,
and special interest groups.

Joining the Snowflake Community allows
users to participate in discussions, attend
meetups, and access valuable resources.
For instance, a data analyst can join a virtual
special interest group focused on data
warehousing to learn from others'
experiences and share their own insights.


Data Marketplace
The Snowflake Data Marketplace is a
platform where users can discover, access,
and share live data sets from various
providers. It facilitates data collaboration
and allows users to enrich their own data
with external data sources.

A marketing team can access demographic
data from a third-party provider through
the Data Marketplace to enhance their
customer analysis. They can integrate this
data with their internal sales data to gain
deeper insights into customer behavior and
preferences.


Multi-Cluster Warehouses
Multi-cluster warehouses allow Snowflake
to automatically manage the number of
compute clusters needed to handle varying
workloads. This ensures optimal
performance and resource utilization
without manual intervention.

A retail company can set up a multi-cluster
warehouse to handle the high concurrency
of queries during Black Friday sales.
Snowflake automatically adds clusters to
manage the increased load and removes
them when the load decreases, ensuring
efficient use of resources and cost
management.


Materialized Views

Materialized views store the result set of a
query physically and automatically update
when the underlying data changes. They
improve query performance by providing
pre-computed results.

Creating a materialized view:
CREATE MATERIALIZED VIEW mv_sales
AS SELECT * FROM sales
WHERE year = 2022;
Queries on this view are faster since the
results are pre-computed.


Task

Tasks are used to automate the execution
of SQL statements, including procedural
logic, at specified intervals or upon
completion of other tasks.

Creating a task to run a query every hour:
CREATE TASK hourly_task
WAREHOUSE = my_warehouse
SCHEDULE = '60 MINUTE'
AS
INSERT INTO daily_sales
SELECT * FROM sales
WHERE sales_date = CURRENT_DATE;
A new task is created in a suspended state;
start it with ALTER TASK hourly_task RESUME;


Stream

Streams track changes to a table (inserts,
updates, deletes) and provide a change
data capture (CDC) mechanism for efficient
data processing.

Creating a stream:
CREATE STREAM sales_stream ON TABLE
sales;
The stream captures changes to the sales
table, which can be processed later.


Pipe

Pipes automate data loading by
continuously ingesting data from external
stages (e.g., AWS S3, Azure Blob Storage)
into Snowflake tables.

Creating a pipe to load data from an S3
bucket:
CREATE PIPE my_pipe
AS COPY INTO my_table
FROM @my_stage/file.csv
FILE_FORMAT = (FORMAT_NAME =
'my_csv_format');


Warehouse
Monitoring
Snowflake provides tools to monitor the
performance and usage of virtual
warehouses, helping users optimize
resource allocation and manage costs.

Using the WAREHOUSE_METERING_HISTORY
view to monitor warehouse usage and costs:
SELECT * FROM
SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE WAREHOUSE_NAME =
'MY_WAREHOUSE';
The name is uppercase because unquoted
identifiers are stored in uppercase.


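Role-Based Access Control
(RBAC)
A security model that restricts access to
data and resources based on the roles
assigned to users. Snowflake allows fine-
grained control over access permissions.

Creating a role and granting privileges
(SELECT is granted on tables, with USAGE on
the containing database and schemas):
CREATE ROLE analyst_role;
GRANT USAGE ON DATABASE sales_data TO
ROLE analyst_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE
sales_data TO ROLE analyst_role;
GRANT SELECT ON ALL TABLES IN DATABASE
sales_data TO ROLE analyst_role;

Assigning the role to a user:
GRANT ROLE analyst_role TO USER john_doe;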
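
A read-only role usually needs more than a single grant before its users can run a query: a warehouse to compute on, plus USAGE on each container above the tables. The sketch below is one possible setup under that assumption, reusing sales_data, sales_data.january, my_warehouse, and john_doe from earlier slides.

```sql
CREATE ROLE IF NOT EXISTS analyst_role;

-- Compute: the role needs a warehouse to run queries on.
GRANT USAGE ON WAREHOUSE my_warehouse TO ROLE analyst_role;

-- Containers: USAGE on the database and the schema.
GRANT USAGE ON DATABASE sales_data TO ROLE analyst_role;
GRANT USAGE ON SCHEMA sales_data.january TO ROLE analyst_role;

-- Data: read access to existing tables and to tables created later.
GRANT SELECT ON ALL TABLES IN SCHEMA sales_data.january TO ROLE analyst_role;
GRANT SELECT ON FUTURE TABLES IN SCHEMA sales_data.january TO ROLE analyst_role;

-- Assignment and verification.
GRANT ROLE analyst_role TO USER john_doe;
SHOW GRANTS TO ROLE analyst_role;
```

The FUTURE grant avoids having to re-grant SELECT each time a new table is added to the schema.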


Dynamic Data Masking
Dynamic Data Masking allows Snowflake to
hide sensitive data in query results based
on the role of the user accessing the data.
This enhances data security and privacy.
Masking sensitive data:
CREATE MASKING POLICY ssn_mask AS (val
STRING) RETURNS STRING -> CASE
WHEN CURRENT_ROLE() IN ('ANALYST_ROLE')
THEN 'XXX-XX-XXXX'
ELSE val END;
CURRENT_ROLE() returns the stored role name,
which is uppercase for roles created without
quotes, hence 'ANALYST_ROLE'.
Applying the policy:
ALTER TABLE customers
MODIFY COLUMN ssn SET MASKING POLICY
ssn_mask;


External Tables

External tables allow Snowflake to query
data stored in external locations (e.g., AWS
S3, Azure Blob Storage) without loading it
into Snowflake.

Creating an external table (the stage
reference is not quoted):
CREATE EXTERNAL TABLE my_ext_table WITH
LOCATION = @my_external_stage
FILE_FORMAT = (FORMAT_NAME =
'my_csv_format');
This table allows querying data directly
from the external stage.


Data Replication

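Snowflake's data replication feature allows
for the replication of databases across
different regions and cloud providers to
enhance data availability and disaster
recovery.

Setting up data replication:
CREATE REPLICATION GROUP my_replication
OBJECT_TYPES = DATABASES
ALLOWED_DATABASES = my_database
ALLOWED_ACCOUNTS = my_org.my_target_account;
Replication targets another account in the
same organization rather than a region name.
Here my_database, my_org, and
my_target_account are placeholders, with the
target account located in a different AWS
region (for example, US West).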


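Failover and
Failback
Snowflake provides failover and failback
capabilities to ensure high availability and
disaster recovery. Failover allows switching
to a replica in case of a failure, and failback
switches back once the original is restored.

Configuring failover for a database:
CREATE FAILOVER GROUP my_failover_group
OBJECT_TYPES = DATABASES
ALLOWED_DATABASES = my_database
ALLOWED_ACCOUNTS = my_org.my_target_account;
This ensures that the database can switch
to a replica in case of a failure. To fail over,
run ALTER FAILOVER GROUP my_failover_group
PRIMARY; in the target account (my_org and
my_target_account are placeholders).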


Search Optimization
Service
A Snowflake feature that improves the
performance of searches on large tables by
creating and maintaining search
optimization structures.

Enabling search optimization for a table:
ALTER TABLE my_table ADD SEARCH
OPTIMIZATION;
This improves the performance of selective
lookup queries on the table.


Snowflake Data
Exchange
A platform that allows Snowflake users to
share and access live data securely. It
facilitates data collaboration and
monetization by providing a marketplace
for data providers and consumers.
Publishing data to the Data Exchange starts
from a share (Snowflake has no CREATE
EXCHANGE statement):
CREATE SHARE my_share;
GRANT SELECT ON TABLE my_table TO
SHARE my_share;
The share is then published as a listing in the
exchange, and other users can subscribe to
and query the shared data.


Data Masking
Data masking provides a way to protect
sensitive data by masking it in query results,
based on user roles. This ensures that
sensitive information is not exposed to
unauthorized users.
Creating a data masking policy (the role name
is uppercase, as stored by Snowflake):
CREATE MASKING POLICY email_mask
AS (val STRING)
RETURNS STRING -> CASE
WHEN CURRENT_ROLE() IN ('ANALYST_ROLE')
THEN '********@domain.com' ELSE val END;
Applying the policy to a column:
ALTER TABLE users
MODIFY COLUMN email SET MASKING POLICY
email_mask;


Snowpipe

Snowpipe is Snowflake's continuous data
ingestion service, which allows for the
automated loading of data from external
stages into Snowflake tables.
Creating a Snowpipe to load data:
CREATE PIPE my_pipe
AUTO_INGEST = TRUE
AS COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (FORMAT_NAME =
'my_csv_format');
With AUTO_INGEST enabled and cloud event
notifications configured for the external
stage, Snowpipe will automatically load new
data files as they arrive in the stage.


External Functions

External functions allow Snowflake to call
external services and integrate with
external systems directly from SQL queries.
This enables advanced data processing and
integration capabilities.
Creating an external function:
CREATE EXTERNAL FUNCTION
my_ext_function()
RETURNS STRING API_INTEGRATION =
my_api_integration
AS '<proxy_service_url>';
The AS clause takes the URL of the proxy
service endpoint, shown here as a placeholder.
This function can call an external API and
return the result to Snowflake.


Streams and Tasks

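Streams track changes to tables, and tasks
automate the execution of SQL based on
schedules or events. Together, they enable
efficient change data capture and automation.
Creating a stream and task:
CREATE STREAM my_stream
ON TABLE my_table;
CREATE TASK my_task
WAREHOUSE = my_warehouse
SCHEDULE = '60 MINUTE'
AS
INSERT INTO my_target_table
SELECT * FROM my_stream;
SELECT * on a stream also returns its
METADATA$ columns, so my_target_table
must include them or the column list must be
written out explicitly.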
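
A common refinement is to run the task only when the stream actually holds changes and to select explicit columns rather than everything the stream returns. The sketch below assumes, purely for illustration, that my_table and my_target_table both have columns id and amount.

```sql
CREATE STREAM IF NOT EXISTS my_stream ON TABLE my_table;

CREATE TASK IF NOT EXISTS my_task
  WAREHOUSE = my_warehouse
  SCHEDULE = '60 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('my_stream')  -- skip runs when nothing changed
AS
  INSERT INTO my_target_table (id, amount)
  SELECT id, amount
  FROM my_stream
  -- 'INSERT' rows are new rows plus the new image of updated rows.
  WHERE METADATA$ACTION = 'INSERT';

-- Tasks are created suspended and must be resumed to start running.
ALTER TASK my_task RESUME;
```

Reading the stream inside a DML statement advances its offset, so each change is picked up by only one task run.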


Snowflake
Organizations
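Snowflake Organizations provide a way to
manage multiple Snowflake accounts
within an organization. This enables better
resource allocation, cost management, and
governance.
Working with an organization:
USE ROLE ORGADMIN;
SHOW ACCOUNTS;
An organization is provisioned by Snowflake
rather than with a CREATE ORGANIZATION
statement. A user with the ORGADMIN role can
then list and add accounts, which allows
central management of multiple Snowflake
accounts.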


Data Governance

Snowflake offers features for data
governance, including access controls, data
masking, and audit logging, to ensure data
security, privacy, and compliance.

Implementing data governance:
CREATE ROW ACCESS POLICY my_policy AS
(val STRING)
RETURNS BOOLEAN -> CURRENT_ROLE()
IN ('DATA_GOVERNANCE_ROLE');
and applying it to a table. The role name is
uppercase because that is how Snowflake
stores unquoted role names.


Account Usage

Snowflake provides account usage views to
track and analyze resource usage, query
performance, and cost management. These
views help in monitoring and optimizing
Snowflake usage.

Querying account usage:
SELECT * FROM
SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY WHERE
QUERY_TEXT ILIKE '%SELECT%';
This retrieves the history of queries in the
account whose text contains SELECT.


Resource Monitors

Resource monitors allow administrators to
manage and control compute resource
usage by setting thresholds and triggering
actions when limits are reached.

Creating a resource monitor:
CREATE RESOURCE MONITOR my_monitor
WITH CREDIT_QUOTA = 1000;
and assigning it to a warehouse. This
monitor will track the compute credits used
by the warehouse. It takes action when the
quota is reached only if triggers are defined
on the monitor.
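
The quota alone only measures usage; the actions come from triggers, and the monitor has to be attached to a warehouse. A minimal sketch of both steps, where the monthly frequency and the 80/100/110 percent thresholds are illustrative assumptions.

```sql
CREATE OR REPLACE RESOURCE MONITOR my_monitor
  WITH CREDIT_QUOTA = 1000
  FREQUENCY = MONTHLY              -- quota resets every month
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY              -- warn administrators
    ON 100 PERCENT DO SUSPEND            -- let running queries finish
    ON 110 PERCENT DO SUSPEND_IMMEDIATE; -- cancel running queries

-- Attach the monitor to the warehouse it should control.
ALTER WAREHOUSE my_warehouse SET RESOURCE_MONITOR = my_monitor;
```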


Query Optimization
Snowflake provides various tools and techniques to optimize query
performance, including using the Query Profile, optimizing table
structures, and leveraging caching.

Finding queries to inspect in the Query Profile:
SELECT * FROM
TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION())
ORDER BY TOTAL_ELAPSED_TIME DESC;

This lists the current session's queries, slowest first, which helps identify and optimize slow-running queries.


Data Sharing

Snowflake allows secure sharing of data
between different accounts without data
movement. Shared data can be accessed in
real time, ensuring consistency and
reducing latency.

Creating a share: CREATE SHARE my_share;
and adding tables to it. Other Snowflake
accounts can access the shared data
directly.


Cloning

Cloning in Snowflake creates a copy of a
database, schema, or table without
duplicating the data. This is useful for
creating test environments and for backup
purposes.

Cloning a table:
CREATE TABLE my_table_clone CLONE
my_table;
This allows working with a snapshot of the
data. Additional storage is used only for
changes made after the clone is created.


Data Load and
Unload
Snowflake provides various methods for
loading and unloading data, including bulk
loading with the COPY INTO command, using
Snowpipe for continuous loading, and
unloading data to external stages.
Loading data:
COPY INTO my_table
FROM @my_stage FILE_FORMAT =
(FORMAT_NAME = 'my_csv_format');
and unloading data:
COPY INTO @my_stage FROM my_table;


Data Encryption

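Snowflake encrypts data at rest and in
transit to ensure data security. Encryption
keys are managed automatically, and users
can also provide their own keys for
additional security.

Encryption is always on, so no statement is
needed to enable it for a table. A separate,
unrelated setting is the Time Travel
retention period:
ALTER TABLE my_table SET
DATA_RETENTION_TIME_IN_DAYS = 90;
This controls how long historical data is
retained; it does not change how the data
is encrypted.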


Data Retention

Snowflake provides data retention policies
to manage how long data is kept in the
system. This includes Time Travel and Fail-
safe periods for data recovery.

Setting data retention:
ALTER TABLE my_table SET
DATA_RETENTION_TIME_IN_DAYS = 7;
This configures the table to retain historical
data for 7 days.


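Fail-Safe

Fail-safe is a Snowflake feature that
provides an additional 7-day period for
recovering data after the Time Travel
retention period has expired. This ensures
data recovery in case of failures.

Accessing Fail-safe data:
Fail-safe data cannot be queried with SQL;
recovery is carried out by Snowflake Support.
While data is still within its Time Travel
retention period, it can be queried directly:
SELECT * FROM my_table AT
(TIMESTAMP => '2022-06-01T00:00:00'::TIMESTAMP_LTZ);
This retrieves the table as of that time, which
only works before the data moves into the
Fail-safe period.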


User-Defined
Functions (UDFs)
UDFs allow users to define their own
functions in SQL or JavaScript (among other
languages), extending Snowflake's built-in
functionality with custom logic.
Creating a SQL UDF (the body is a SQL
expression, without a RETURN keyword):
CREATE FUNCTION my_udf(x INT)
RETURNS INT
LANGUAGE SQL
AS
'x * 2';
This function multiplies the input by 2.


Stored Procedures
Stored procedures in Snowflake allow for procedural logic and complex
operations to be encapsulated in SQL or JavaScript, enabling
automation and reusable code.

Creating a stored procedure:
CREATE PROCEDURE my_proc()
RETURNS STRING LANGUAGE JAVASCRIPT
AS $$ return 'Hello, World!';
$$;
and calling it:
CALL my_proc();


Privileges and
Grants
Snowflake's security model uses privileges
and grants to control access to database
objects. Roles are assigned privileges, and
users are assigned roles.
Granting privileges:
GRANT SELECT ON TABLE my_table
TO ROLE analyst_role;
This allows users with the analyst_role to
query the table.


Roles and Role
Hierarchies
Roles in Snowflake define a set of privileges
and can be assigned to users. Role
hierarchies allow roles to inherit privileges
from other roles, simplifying access
management.
Creating a role hierarchy:
CREATE ROLE senior_analyst;
GRANT ROLE analyst_role TO ROLE
senior_analyst;
Users with the senior_analyst role inherit
privileges from the analyst_role.


Session Variables

Session variables in Snowflake store values
that can be used within a session. They
allow for dynamic SQL and reusable code.

Setting and using a session variable:
SET my_var = 'Hello, World!';
and SELECT $my_var;
This returns the value of the variable.


Parameter
Management
Snowflake allows configuration of various
parameters at the account, session, and
object levels to customize behavior and
optimize performance.

Setting a session parameter:
ALTER SESSION SET QUERY_TAG =
'MyQuery';
This tags queries within the session for
easier tracking.


Semi-Structured Data
Snowflake supports semi-structured data
formats such as JSON, Avro, Parquet, and
XML. This allows for flexible data modeling
and integration with modern data sources.

Querying JSON data: SELECT json_data:id
FROM my_table; This retrieves the "id" field
from JSON data stored in a column.
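
Path access is usually combined with a cast, and nested arrays are expanded with FLATTEN. A small self-contained sketch, where the table events and the sample document are assumptions made up for illustration.

```sql
-- A VARIANT column holds semi-structured values such as JSON.
CREATE OR REPLACE TABLE events (json_data VARIANT);

INSERT INTO events
  SELECT PARSE_JSON('{"id": 1, "device": {"type": "sensor"}, "tags": ["a", "b"]}');

SELECT
  json_data:id::INT             AS id,          -- top-level field, cast to INT
  json_data:device.type::STRING AS device_type, -- nested field via dot path
  t.value::STRING               AS tag          -- one output row per array element
FROM events,
     LATERAL FLATTEN(input => json_data:tags) t;
```

Without the casts the selected values stay VARIANT, and string values are displayed with surrounding double quotes.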


Data Compression

Snowflake automatically compresses data
to reduce storage costs and improve query
performance. Different compression
algorithms are used based on the data type.

Snowflake's automatic compression means
users don't need to manually configure
compression settings, as the platform
optimizes storage efficiency.


Cost Management

Snowflake provides tools and practices to
manage and optimize costs, including
resource monitors, usage views, and best
practices for query optimization.

Using resource monitors to control costs:
CREATE RESOURCE MONITOR my_monitor
WITH CREDIT_QUOTA = 1000;
and setting up alerts for budget thresholds.


Query History

Snowflake tracks query history, allowing
users to review and analyze past queries for
performance optimization and
troubleshooting.

Accessing query history:
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE QUERY_TEXT ILIKE '%SELECT%';
This retrieves a history of queries executed
in the account whose text contains SELECT.


Metadata
Management
Snowflake manages metadata for all
database objects, providing detailed
information about tables, columns, and
other objects. This metadata is used for
query optimization and data governance.

Querying metadata:
SELECT * FROM
INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'PUBLIC';
This retrieves information about all tables in
the PUBLIC schema of the current database.


Data Import/Export

Snowflake supports various methods for
importing and exporting data, including
bulk loading with the COPY INTO command and
unloading to external stages.

Importing data:
COPY INTO my_table
FROM @my_stage FILE_FORMAT =
(FORMAT_NAME = 'my_csv_format');
and exporting data:
COPY INTO @my_stage FROM my_table;


Data Quality

Snowflake provides features to ensure data
quality, including constraints, data
validation, and profiling.

Implementing data quality checks:
CREATE TABLE my_table (
id INT PRIMARY KEY,
name STRING NOT NULL);
This ensures that the "name" column is not
null. On standard tables the PRIMARY KEY is
recorded as metadata but not enforced, so
uniqueness of "id" has to be checked
separately.
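
Because the primary key on a standard table is informational, a uniqueness check is typically run as a query after loading. A minimal sketch against the my_table definition above; the alias names are arbitrary.

```sql
-- List ids that appear more than once (violations of the declared key).
SELECT id, COUNT(*) AS copies
FROM my_table
GROUP BY id
HAVING COUNT(*) > 1;

-- Quick summary: the two counts match when id is unique and never null.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT id) AS distinct_ids
FROM my_table;
```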


Data Lineage
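Data lineage tracks the flow of data
through Snowflake, from ingestion to
transformation to analysis, providing
visibility into data dependencies and
transformations.
Using views and tasks to track data lineage:
CREATE VIEW my_view
AS SELECT * FROM my_table;
and
CREATE TASK my_task
WAREHOUSE = my_warehouse
SCHEDULE = '60 MINUTE'
AS
INSERT INTO my_target_table
SELECT * FROM my_view;
Here my_target_table depends on my_view,
which in turn depends on my_table.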


Business Continuity

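Snowflake's features for business continuity
include data replication, failover, and Fail-
safe, ensuring that data is always available
and recoverable in case of disasters.

Setting up a failover group:
CREATE FAILOVER GROUP my_group
OBJECT_TYPES = DATABASES
ALLOWED_DATABASES = my_database
ALLOWED_ACCOUNTS = my_org.my_target_account;
This ensures that the database can switch
to a replica in case of a failure. The target is
an account in another region, such as AWS
US West; my_org and my_target_account are
placeholders.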


Governance and
Compliance
Snowflake provides tools for data
governance and compliance, including
access controls, data masking, and audit
logging, to ensure data security and
regulatory compliance.

Implementing compliance policies:
CREATE ROW ACCESS POLICY
compliance_policy AS (val STRING)
RETURNS BOOLEAN -> CURRENT_ROLE()
IN ('COMPLIANCE_ROLE');
and applying it to a table. The role name is
uppercase, as stored for unquoted role names.


Advanced Analytics

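Snowflake supports advanced analytics
capabilities, including machine learning
integration, geospatial data processing, and
complex data transformations.

Integrating with machine learning models:
CREATE FUNCTION predict_sales(x FLOAT)
RETURNS FLOAT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
IMPORTS = ('@my_stage/my_model.py')
HANDLER = 'my_model.predict';
This function calls the predict function of a
Python module for sales prediction. The
staged file path is a placeholder, and the
runtime version must be one that Snowflake
currently supports.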


Data Monetization

Snowflake's data marketplace and secure
data sharing enable organizations to
monetize their data assets by sharing or
selling data to other Snowflake users.

Publishing data for monetization:
CREATE SHARE my_share;
and adding data to it, then publishing the
share as a listing for other users to
access and purchase.


Geospatial Data

Snowflake supports geospatial data types
and functions, allowing users to store,
query, and analyze spatial data such as
points, polygons, and geometries.

Querying geospatial data:
SELECT ST_DISTANCE(point1, point2)
FROM my_table;
This calculates the distance between two
points stored in a table.
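
For GEOGRAPHY values, ST_DISTANCE returns meters, and points are built from longitude and latitude in that order. A small sketch under those rules; the stores table and the coordinates are made up for illustration.

```sql
CREATE OR REPLACE TABLE stores (name STRING, location GEOGRAPHY);

-- ST_MAKEPOINT takes (longitude, latitude) and returns a GEOGRAPHY point.
INSERT INTO stores
  SELECT 'Downtown', ST_MAKEPOINT(-122.4194, 37.7749);

-- Stores within 50 km of a reference point, with the distance in kilometers.
SELECT name,
       ST_DISTANCE(location, ST_MAKEPOINT(-122.2712, 37.8044)) / 1000 AS distance_km
FROM stores
WHERE ST_DWITHIN(location, ST_MAKEPOINT(-122.2712, 37.8044), 50000);
```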


IoT Data Processing

Snowflake's scalable architecture and
support for semi-structured data make it
well-suited for processing and analyzing IoT
(Internet of Things) data.

Loading IoT data:
COPY INTO my_table
FROM @iot_stage
FILE_FORMAT = (FORMAT_NAME =
'json_format');
This ingests JSON data from IoT devices.


Real-Time Analytics

Snowflake supports real-time analytics by
allowing continuous data ingestion and
immediate querying of fresh data.

Using Snowpipe for real-time data
ingestion:
CREATE PIPE my_pipe
AS COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (FORMAT_NAME =
'my_csv_format');


Data Federation
Snowflake's external tables and data sharing features enable data
federation, allowing users to query and combine data from multiple
sources without moving the data.

Creating an external table to federate data:
CREATE EXTERNAL TABLE my_ext_table WITH
LOCATION = @my_external_stage
FILE_FORMAT = (FORMAT_NAME =
'my_csv_format');
This table allows querying data directly from the external stage.


Security
Integrations
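Snowflake integrates with security tools
and frameworks, including single sign-on
(SSO), multi-factor authentication (MFA),
and encryption key management, to
enhance data security.

Configuring SSO:
SSO is set up through a SAML2 security
integration (CREATE SECURITY INTEGRATION
with TYPE = SAML2), which is given the
identity provider's issuer, certificate, and
SSO URL, for example
'https://mycompany.com/sso'.
This enables single sign-on for Snowflake
users.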


Continuous Data
Protection
Snowflake's continuous data protection
features include Time Travel, Fail-safe, and
data replication, ensuring data integrity and
availability at all times.
Setting up data replication:
CREATE REPLICATION GROUP
my_replication
OBJECT_TYPES = DATABASES
ALLOWED_DATABASES = my_database
ALLOWED_ACCOUNTS = my_org.my_target_account;
This replicates the database to an account in
a different AWS region (my_org and
my_target_account are placeholders).


Custom Data Types

Snowflake does not provide user-defined
data types (there is no CREATE DOMAIN
statement) or CHECK constraints, so format
rules are validated with queries at load or
query time.

Validating an email format:
SELECT * FROM users
WHERE email NOT LIKE '%@%.%';
This lists rows whose email fails the format
check.


Hybrid Tables

Hybrid tables in Snowflake combine the
benefits of transactional and analytical
processing, allowing for efficient real-time
data analysis.

Creating a hybrid table (a primary key is
required):
CREATE HYBRID TABLE my_table
(id INT PRIMARY KEY, data STRING);
This table supports both transactional and
analytical workloads.


Data Archiving

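Snowflake's data retention and archiving
features help manage long-term storage of
historical data, ensuring that it is available
for compliance and analysis.

Setting data retention:
ALTER TABLE my_table
SET DATA_RETENTION_TIME_IN_DAYS = 90;
This configures the table to retain historical
data for 90 days, the maximum Time Travel
retention (Enterprise Edition or higher).
Keeping data for a year or longer requires a
separate copy, such as an unload to
external storage.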


Data Classification

Data classification in Snowflake helps
categorize data based on sensitivity and
importance, enabling better data
governance and security.

Classifying data:
ALTER TABLE my_table SET TAG
classification = 'sensitive';
This tags the table as containing sensitive
data. The tag must already exist (created
with CREATE TAG classification;).


Data Masking
Policies
Data masking policies in Snowflake provide
dynamic masking of sensitive data based
on user roles, ensuring that only authorized
users can see the actual data.
Creating a data masking policy (the role name
is uppercase, as stored by Snowflake):
CREATE MASKING POLICY ssn_mask
AS (val STRING)
RETURNS STRING -> CASE
WHEN CURRENT_ROLE() IN ('ANALYST_ROLE')
THEN 'XXX-XX-XXXX' ELSE val END;
Applying the policy:
ALTER TABLE customers MODIFY COLUMN ssn
SET MASKING POLICY ssn_mask;


Row Access Policies

Row access policies allow Snowflake to
restrict access to specific rows in a table
based on user roles and other criteria,
enhancing data security and compliance.

Creating a row access policy:
CREATE ROW ACCESS POLICY row_policy
AS (val STRING)
RETURNS BOOLEAN -> CURRENT_ROLE() IN
('ANALYST_ROLE');
Applying the policy (my_column is a
placeholder for the STRING column passed to
the policy):
ALTER TABLE my_table ADD ROW
ACCESS POLICY row_policy ON (my_column);
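
A row access policy becomes more useful when its result depends on the column value as well as the role. The sketch below assumes a hypothetical sales table with a region column and hypothetical roles SALES_ADMIN and ANALYST_ROLE; role names are compared in uppercase because that is how unquoted role names are stored.

```sql
CREATE OR REPLACE ROW ACCESS POLICY region_policy
  AS (region STRING) RETURNS BOOLEAN ->
    CURRENT_ROLE() = 'SALES_ADMIN'                            -- sees every row
    OR (CURRENT_ROLE() = 'ANALYST_ROLE' AND region = 'EMEA'); -- sees EMEA rows only

-- Bind the policy to the column whose value is passed in as "region".
ALTER TABLE sales ADD ROW ACCESS POLICY region_policy ON (region);

-- Detach it again when it is no longer needed.
ALTER TABLE sales DROP ROW ACCESS POLICY region_policy;
```

Any role not covered by the expression gets FALSE for every row and therefore sees an empty table.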


Cross-Cloud
Replication
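Snowflake supports cross-cloud replication,
allowing data to be replicated across
different cloud providers (e.g., AWS, Azure,
Google Cloud) for high availability and
disaster recovery.
Setting up cross-cloud replication:
CREATE REPLICATION GROUP
my_replication
OBJECT_TYPES = DATABASES
ALLOWED_DATABASES = my_database
ALLOWED_ACCOUNTS = my_org.my_azure_account;
This replicates the database to an account
hosted in an Azure region (my_org and
my_azure_account are placeholders).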


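Event-Driven Data Processing
Snowflake's tasks and streams enable event-
driven data processing, allowing
actions to be triggered based on changes in
data or scheduled intervals.

Creating an event-driven task (tasks have no
AFTER INSERT trigger, so a stream on
my_table signals new changes):
CREATE TASK my_task
WAREHOUSE = my_warehouse
SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('my_stream')
AS
INSERT INTO audit_table
SELECT * FROM my_stream;
This assumes my_stream was created on
my_table and that audit_table includes the
stream's METADATA$ columns.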


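Data Encryption Key
Management
Snowflake allows users to manage their
own encryption keys for added security,
providing control over data encryption and
compliance with regulatory requirements.

Setting a customer-managed key:
Customer-managed keys are configured for
the whole account through Tri-Secret Secure
rather than with an ALTER DATABASE
statement. The key is held in the customer's
cloud key management service and is combined
with a Snowflake-managed key for data
encryption.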


Geospatial
Functions
Snowflake provides geospatial functions to
perform spatial analysis and operations on
geographic data, such as distance
calculations and spatial joins.
Using a geospatial function:
SELECT ST_DISTANCE(point1, point2)
FROM my_table;
This calculates the distance between two
geographic points stored in a table.


Graph Analytics

Snowflake supports graph analytics,
enabling users to model and analyze
relationships between data points using
graph structures and algorithms.

Performing graph analytics:
CREATE TABLE graph_edges
(src INT, dst INT);
and running graph queries to analyze
relationships.
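
One way to run a graph query over an edge table is a recursive common table expression. The sketch below finds every node reachable from node 1 within five hops of the graph_edges table; the starting node and the depth limit are arbitrary assumptions, and the limit also keeps the recursion finite if the graph contains cycles.

```sql
WITH RECURSIVE reachable (node, depth) AS (
  -- Anchor: direct neighbours of the starting node.
  SELECT dst, 1
  FROM graph_edges
  WHERE src = 1

  UNION ALL

  -- Recursive step: follow one more edge from each node found so far.
  SELECT e.dst, r.depth + 1
  FROM reachable r
  JOIN graph_edges e ON e.src = r.node
  WHERE r.depth < 5
)
SELECT node, MIN(depth) AS hops   -- shortest hop count per reachable node
FROM reachable
GROUP BY node
ORDER BY hops, node;
```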


Data Versioning

Snowflake's Time Travel and Zero-Copy
Cloning features enable data versioning,
allowing users to create, manage, and
query different versions of data for analysis
and auditing.

Creating a version of a table:
CREATE TABLE my_table_clone
CLONE my_table;
This clone represents a version of the
original table that can be queried and
analyzed separately.


API Integration

Snowflake supports integration with
external APIs, allowing users to call external
services and incorporate real-time data into
Snowflake queries and workflows.

Creating an external function to call an API:
CREATE EXTERNAL FUNCTION
my_ext_function() RETURNS STRING
API_INTEGRATION = my_api_integration
AS '<proxy_service_url>';
The AS clause takes the URL of the proxy
service endpoint, shown here as a placeholder.
This function can call an external API and
return the result to Snowflake.

THANK YOU
