Best Practices for Using
Tableau with Snowflake
BY ALAN ELDRIDGE, ET AL.
What's inside:

* Introduction
* What is Tableau?
* What is Snowflake?
* What you don't have to worry about with Snowflake
* Creating efficient Tableau workbooks
* Connecting to Snowflake
* Working with semi-structured data
* Working with Snowflake Time Travel
* Working with Snowflake Data Sharing
* Implementing role-based security
* Using custom aggregations
* Scaling Snowflake warehouses
* Caching
Introduction
It is a widely recognized truth today that data is an increasingly valuable asset for organizations. Greater and greater volumes of data are being generated and captured across systems at increasing levels of granularity. At the same time, more users are demanding access to this data to answer questions and gain business insight.

Snowflake and Tableau are leading technologies addressing these two challenges: Snowflake provides a near-limitless platform for cloud-built data warehousing, and Tableau provides a highly intuitive, self-service data analytics platform.

The objective of this whitepaper is to help you get the most from both of these highly complementary products. It is intended for Tableau users who are new to Snowflake, Snowflake users who are new to Tableau, and of course any users who are new to both.
What is Tableau?
Tableau Software is a business intelligence solution that integrates data analysis and reporting into a continuous visual analysis process, one that lets everyday business users quickly explore data and shift views on the fly to follow their train of thought. Tableau combines data exploration, visualization, reporting and dashboarding into an application that is easy to learn and use.

Tableau's solution set consists of three main products:
* Tableau Desktop: the end-user tool for data analysis, visualization and dashboard creation. It can be used on its own or in conjunction with Tableau Server/Tableau Online.
* Tableau Server: the server platform that provides services for collaboration and content sharing. This can be deployed on-premises or in the cloud (on Amazon's AWS, Microsoft's Azure or Google's GCP).
* Tableau Online: a software-as-a-service version of Tableau Server.

Either working standalone with Tableau Desktop, or by publishing content to Tableau Server or Tableau Online, users can directly work with data stored in Snowflake data warehouses.
[Figure: Tableau platform overview (Interact, Security & Compliance, Deployment). Deploy where and how you want: Tableau can integrate into your existing data infrastructure, whether on-premises or in the cloud.]
What is Snowflake?
Snowflake is an analytic data warehouse provided as software-as-a-service (SaaS). Snowflake provides a data warehouse that is faster, easier to use and far more flexible than traditional data warehouse offerings.

Snowflake's data warehouse is not built on an existing database or big-data software platform such as Hadoop. The Snowflake data warehouse uses a new SQL database engine with a unique architecture designed for the cloud. To the user, Snowflake has many similarities to other enterprise data warehouses, but it also has additional functionality and unique capabilities.
DATA WAREHOUSE AS A CLOUD SERVICE
Snowflake's data warehouse is a true SaaS offering. More specifically:

* There is no hardware (virtual or physical) for you to select, install, configure or manage.
* There is no software for you to install, configure or manage.
* Ongoing maintenance, management and tuning are handled by Snowflake.

Snowflake runs completely on cloud infrastructure. All components of Snowflake's service (other than an optional command line client) run in a secure public cloud infrastructure.

Snowflake was originally built on the Amazon Web Services (AWS) cloud infrastructure. Snowflake uses virtual compute instances provided by AWS EC2 (Elastic Compute Cloud) for its compute needs and AWS S3 (Simple Storage Service) for persistent storage of data. In addition, as Snowflake continues to serve enterprises of all sizes across all industries, more and more customers consider AWS and other cloud infrastructure providers when moving to the cloud. Snowflake has responded and is now also available on Microsoft Azure.

Snowflake is not a packaged software offering that can be installed by a user. Snowflake manages all aspects of software installation and updates. Snowflake cannot be run on private cloud infrastructures (on-premises or hosted).
SNOWFLAKE ARCHITECTURE
Snowflake's architecture is a hybrid of traditional shared-disk database architectures and shared-nothing database architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.
Snowflake's unique architecture consists of three key layers:

* Cloud Services
* Query Processing
* Database Storage

[Figure: Snowflake's three-layer architecture, with Cloud Services above Query Processing above Database Storage.]
DATABASE STORAGE
When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data using Amazon Web Services' S3 (Simple Storage Service) cloud storage or Azure Blob Storage.

Snowflake manages all aspects of how this data is stored in AWS or Azure: the organization, file size, structure, compression, metadata, statistics and other aspects of data storage. The data objects stored by Snowflake are not directly visible or accessible by customers; they are accessible only through SQL query operations run using Snowflake.
QUERY PROCESSING
Query execution is performed in the processing layer. Snowflake processes queries using "virtual warehouses." Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from Amazon EC2 or Azure Virtual Machines.

Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.

For more information, see Virtual Warehouses in the Snowflake online documentation.
CLOUD SERVICES
The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake.

Among the services in this layer:

* Authentication
* Infrastructure management
* Metadata management
* Query parsing and optimization
* Access control
What you don’t have to worry
about with Snowflake
Because Snowflake is a cloud-built data warehouse as a service, there are lots of things you don't need to worry about compared to a traditional on-premises solution:

* Installing, provisioning and maintaining hardware and software
~ Snowflake is a cloud-built data warehouse as a service. All you need to do is create an account and load some data. You can then just connect from Tableau and start querying. We even provide free usage to help you get started if you are new to Snowflake!

* Working out the capacity of your data warehouse
~ Snowflake is a fully elastic platform, so it can scale to handle all of your data and all of your users. You can adjust the size of your warehouses (the layer that does the query execution) up and down and on the fly to handle peaks and lulls in your data usage. You can even turn your warehouses completely off to save money when you are not using them.

* Learning new tools and a new query language
~ Snowflake is a fully SQL-compliant data warehouse, so all the skills and tools you already have (such as Tableau) will easily connect. We provide connectors for ODBC, JDBC, Python, Spark and Node.js, as well as web-based and command-line interfaces.
~ Business users increasingly need to work with both traditionally structured data (e.g., data in VARCHAR, INT and DATE columns in tables) as well as semi-structured data in formats such as XML, JSON and Parquet. Snowflake provides a special data type called a VARIANT that allows you to simply load your semi-structured data and then query it the same way you would query traditionally structured data: via SQL.

* Optimizing and maintaining your data
~ Snowflake is a highly scalable, columnar data platform that allows users to run analytic queries quickly and easily. It does not require you to manage how your data is indexed or distributed across partitions. In Snowflake, this is all transparently managed by the platform.
~ Snowflake also provides inherent data protection capabilities, so you don't need to worry about snapshots, backups or other administrative tasks such as running VACUUM jobs.
* Securing your data
~ Security is critical for all cloud-based systems, and Snowflake has you covered on this front. All data is transparently encrypted when it is loaded into Snowflake, and it is kept encrypted at all times, at rest and in transit.
~ If your business requirements include working with data that requires HIPAA, PII or PCI compliance, Snowflake can also support these validations with the Enterprise for Sensitive Data edition.

* Sharing data
~ Snowflake has revolutionized how organizations distribute and consume shared data. Snowflake's unique architecture enables live data sharing between Snowflake accounts without copying and moving data sets. Data providers enable secure data shares to their data consumers, who can view the data and seamlessly combine it with their own data sources. When a data provider adds to or updates data, consumers always see the most current version.

All of these features free up your time and allow you to focus your attention on using Tableau to view and understand your data.
Creating efficient
Tableau workbooks
Not every recommendation in this whitepaper is unique to using Tableau with Snowflake. There are many best practice approaches for creating efficient workbooks that are common across all data sources, and a great resource for this information is the "Best Practices for Designing Efficient Tableau Workbooks" whitepaper.

That whitepaper contains too much to repeat here, but the key points are:

* Keep it simple: The majority of performance problems are caused by inefficient workbook design (e.g., too many charts on one dashboard or trying to show too much data at once). Allow your users to incrementally drill down to details, rather than trying to show everything and then filter.

* Less is more: Work with only the data you need. The fewer rows and columns you work with, the faster your queries will execute. Also, display only the data you need. The fewer marks you draw, the faster your workbooks will render.

* Don't try to do it all yourself: The query generator in Tableau is one of the most efficient on the market, so trust it to create the queries for you. Certainly, you should understand how things in a Tableau worksheet impact the queries generated, but the less you fight against the tool (e.g., through custom SQL), the better your experience will be.
Connecting to Snowflake
USE THE RIGHT CONNECTOR
To connect to Snowflake, Tableau uses the Snowflake-provided ODBC driver, so you should download and install the latest version of the driver from the Snowflake website. Then, in Tableau, choose the native Snowflake connector rather than a generic ODBC connection:

[Screenshot: Tableau's connection pane showing the Snowflake connector.]
LIVE VS. EXTRACTED CONNECTIONS
Because Snowflake is a high-performance, analytic data warehouse, it allows users to run live connections over very large data volumes and still maintain acceptable query response times by selecting an appropriately sized warehouse. With Snowflake, there is less need to use Tableau extracts to improve query performance than on other, more traditional data platforms.

Tableau extracts can be complementary to Snowflake when:

* Users require an offline data cache that can be used without a connection to Snowflake; or
* Users create aggregated extracts to act as summarized data caches. This was initially proposed by Jeff Feng in a TC15 presentation (Cold, Warm, & Hot Data: Applying a Three-Tiered Hadoop Data Strategy) as an approach to working with large, slow data lakes. However, because Snowflake can handle large volumes of structured and semi-structured data while still providing analytic query performance, you could use it to provide both the cold and warm layers.
CUSTOM SQL

Sometimes users new to Tableau try to bring old-world techniques to their workbooks, such as creating data sources using hand-written SQL statements. In many cases this is counterproductive, as Tableau can generate much more efficient queries when we simply define the join relationships between the tables and let the query engine write SQL specific to the view being created. However, there are times when specifying joins in the data connection window does not offer all the flexibility you need to define the relationships in your data.

Creating a data connection using a hand-written SQL statement can be very powerful, but in some cases custom SQL can actually reduce performance. This is because, in contrast to defining joins, custom SQL is never deconstructed and is always executed atomically. This means no join culling occurs, and we can end up with situations where the database is asked to process the whole query, possibly multiple times.
Example

Consider the following, very simple Tableau worksheet. It shows the number of records in the TPCH SF1 sample schema for each month, broken down by market segment:

[Screenshot: Tableau worksheet showing record counts by month and market segment.]
If our underlying data model uses the recommended approach of joining tables:

[Screenshot: Tableau data connection window with LINEITEM, ORDERS and CUSTOMER joined.]

The Tableau query engine produces the following optimal query that joins only the tables needed and returns just the columns we are displaying:

SELECT "Customers"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "Orders"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "Orders"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM "TPCH1"."LINEITEM" "Lineitem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
GROUP BY 1,
  3
This results in the following optimal query plan:

[Screenshot: Snowflake query profile showing the optimal plan.]
What happens, though, if we simulate an example of embedding business logic in the data model? Using custom SQL, there are two approaches we can take. The preferred approach is to isolate the custom SQL to just the part of the model that is affected and keep the rest of the schema as join definitions. This is what the following example does, simply by replacing the LINEITEM table with a custom SQL statement:

[Screenshot: Tableau data connection window with the LINEITEM table replaced by a custom SQL object.]

In this case, the following query is generated by Tableau (the custom SQL is highlighted for easy identification):

SELECT "Customers"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "Orders"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "Orders"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM (
  SELECT *
  FROM TPCH1.LINEITEM
) "Lineitem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
GROUP BY 1,
  3

(Note: it is important that you don't include a terminating semicolon in your custom SQL, as this will cause an error.)
As you can see, the custom SQL is not decomposed, but because its scope is just the LINEITEM table, Tableau can cull (eliminate) the joins to the unneeded tables. The Snowflake optimizer then manages to parse this into an optimal query plan, identical to the initial example.

The worst-case result happens when we encapsulate the entire data model (with all table joins) into a single, monolithic custom SQL statement. This is often what users who are new to Tableau will do, especially if they have an existing query statement they have used in another tool:

[Screenshot: Tableau data connection window with the entire model defined as a single "Bad Custom SQL" object.]
Again, the custom SQL is not decomposed, so all the Tableau query engine does is wrap the custom SQL in a surrounding SELECT statement. This means there is no join culling, and Snowflake is required to join all the tables together before the required subset of data is selected (again, the custom SQL is highlighted for readability):

SELECT "Bad Custom SQL"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "Bad Custom SQL"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "Bad Custom SQL"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM (
  SELECT *
  FROM "TPCH1"."LINEITEM" "Lineitem"
    INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
    INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
    INNER JOIN "TPCH1"."PART" "Parts" ON ("Lineitem"."L_PARTKEY" = "Parts"."P_PARTKEY")
    INNER JOIN "TPCH1"."PARTSUPP" "PARTSUPP" ON ("Parts"."P_PARTKEY" = "PARTSUPP"."PS_PARTKEY")
    INNER JOIN "TPCH1"."SUPPLIER" "Suppliers" ON ("PARTSUPP"."PS_SUPPKEY" = "Suppliers"."S_SUPPKEY")
    INNER JOIN "TPCH1"."NATION" "CustNation" ON ("Customers"."C_NATIONKEY" = "CustNation"."N_NATIONKEY")
    INNER JOIN "TPCH1"."REGION" "CustRegion" ON ("CustNation"."N_REGIONKEY" = "CustRegion"."R_REGIONKEY")
) "Bad Custom SQL"
GROUP BY 1,
  3
As the following query plan shows, the Snowflake optimizer can parse this only so far, and a less efficient query is performed:

[Screenshot: Snowflake query profile showing the less efficient plan.]
This last approach is even more inefficient, as this entire query plan needs to be run for potentially every query in the dashboard. In this example, there is only one. But if there were multiple data-driven zones (e.g., multiple charts, filters, legends, etc.), then each could potentially run this entire query wrapped in its own SELECT clause. If this were the case, it would be preferable to follow the Tableau recommended practices and create the data connection as a Tableau extract (data volumes permitting). This would require the complex query to run only once, and then all subsequent queries would run against the extract.
INITIAL SQL
As just discussed, one of the limitations of custom SQL is that it can be run multiple times for a single dashboard. One way to avoid this is to use initial SQL to create a temporary table which will then be the selected table in your query. Because initial SQL is executed only once when the workbook is opened (as opposed to every time the visualization is changed, as for custom SQL), this could significantly improve performance in some cases. However, the flip side is that the data populated into the temp table will be static for the duration of the session, even if the data in the underlying tables changes.
Example

Using the example above, instead of placing the entire query in a custom SQL statement, we could use it in an initial SQL block and instantiate a temporary table:

CREATE OR REPLACE TEMP TABLE TPCH1.FOO AS
SELECT *
FROM "TPCH1"."LINEITEM" "Lineitem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
  INNER JOIN "TPCH1"."PART" "Parts" ON ("Lineitem"."L_PARTKEY" = "Parts"."P_PARTKEY")
  INNER JOIN "TPCH1"."PARTSUPP" "PARTSUPP" ON ("Parts"."P_PARTKEY" = "PARTSUPP"."PS_PARTKEY")
  INNER JOIN "TPCH1"."SUPPLIER" "Suppliers" ON ("PARTSUPP"."PS_SUPPKEY" = "Suppliers"."S_SUPPKEY")
  INNER JOIN "TPCH1"."NATION" "CustNation" ON ("Customers"."C_NATIONKEY" = "CustNation"."N_NATIONKEY")
  INNER JOIN "TPCH1"."REGION" "CustRegion" ON ("CustNation"."N_REGIONKEY" = "CustRegion"."R_REGIONKEY")
We then simply select the FOO table as our data source, and Tableau generates the following query:

SELECT "FOO"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "FOO"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "FOO"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM "TPCH1"."FOO" "FOO"
GROUP BY 1,
  3
This has a very simple query plan and a fast execution time, as the hard work has been done in the creation of the temp table. However, the data returned will not reflect changes to the underlying fact tables until we start a new session and the temp table is recreated.

Note that on Tableau Server an administrator can set a restriction on initial SQL so that it will not run. You might need to check that this is okay in your environment if you are planning to publish your workbook to share with others. Also, note that temp tables take up additional space in Snowflake that will contribute to the account storage charges, but they are ephemeral, so this is generally not significant.
VIEWS
Views are another alternative to using custom SQL. Instead of materializing the result into a temp table, which is a potentially time-consuming action to perform when the workbook is initially opened, you could define a view:

CREATE OR REPLACE VIEW FOO_VW AS
SELECT *
FROM "TPCH1"."LINEITEM" "Lineitem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
  INNER JOIN "TPCH1"."PART" "Parts" ON ("Lineitem"."L_PARTKEY" = "Parts"."P_PARTKEY")
  INNER JOIN "TPCH1"."PARTSUPP" "PARTSUPP" ON ("Parts"."P_PARTKEY" = "PARTSUPP"."PS_PARTKEY")
  INNER JOIN "TPCH1"."SUPPLIER" "Suppliers" ON ("PARTSUPP"."PS_SUPPKEY" = "Suppliers"."S_SUPPKEY")
  INNER JOIN "TPCH1"."NATION" "CustNation" ON ("Customers"."C_NATIONKEY" = "CustNation"."N_NATIONKEY")
  INNER JOIN "TPCH1"."REGION" "CustRegion" ON ("CustNation"."N_REGIONKEY" = "CustRegion"."R_REGIONKEY")
In Tableau, we then use the view as our data source:
The query generated by Tableau is simple, as it is with the temp table approach:

SELECT "FOO_VW"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "FOO_VW"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "FOO_VW"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM "TPCH1"."FOO_VW" "FOO_VW"
GROUP BY 1,
  3

However, this takes longer to run and has a less efficient query plan, as the view needs to be evaluated at query time. The benefit is that you will always see up-to-date data in your results.

All of the above approaches are valid, but you should select the most appropriate solution based on your particular needs for performance, data freshness, maintainability and reusability.
Working with
semi-structured data
Today, business users find themselves working with data in multiple forms from numerous sources. This includes an ever-expanding amount of machine-generated data from applications, sensors, mobile devices, etc. Increasingly, these data are provided in semi-structured data formats such as JSON, Avro, ORC, Parquet and XML that have flexible schemas. These semi-structured data formats do not conform to the standards of traditionally structured data, but instead contain tags or other types of mark-up that identify individual, distinct elements within the data:

{
  "city": {
    "coord": { "lat": -37.813999, "lon": 144.963318 },
    "country": "AUS",
    "findname": "MELBOURNE",
    "id": 2158177,
    "name": "Melbourne",
    "zoom": 5
  },
  "clouds": { "all": 88 },
  "main": {
    "humidity": 31, "pressure": 1010, "temp": 303.4,
    "temp_max": 305.15, "temp_min": 301.25
  },
  "rain": { "3h": 0.57 },
  "time": 1514464940,
  "weather": [
    { "description": "light rain", "icon": "10d", "id": 500, "main": "Rain" }
  ],
  "wind": { "deg": 350, "speed": 4.1 }
}
Two of the key attributes that distinguish semi-structured data from structured data are nested data structures and the lack of a fixed schema:

* Unlike structured data, which represents data as a flat table, semi-structured data can contain n-level hierarchies of nested information.
* Structured data requires a fixed schema that is defined before the data can be loaded and queried in a relational database system. Semi-structured data does not require a prior definition of a schema and can constantly evolve; i.e., new attributes can be added at any time.

Tableau has introduced support for directly reading JSON data files. However, broader support for semi-structured data can be achieved by first loading the data into Snowflake. Snowflake provides native support for semi-structured data, including:

* Flexible-schema data types for loading semi-structured data without transformation
* Direct ingestion of JSON, Avro, ORC, Parquet and XML file formats
* Automatic conversion of data to Snowflake's optimized internal storage format
* Database optimization for fast and efficient querying of semi-structured data
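As a concrete illustration of the loading step, here is a minimal sketch (the table name WEATHER_RAW and the stage @MY_STAGE are hypothetical, not from the original paper) of landing JSON files into a single VARIANT column:

-- Hypothetical example: load staged JSON files into one VARIANT column
CREATE OR REPLACE TABLE WEATHER_RAW (V VARIANT);

COPY INTO WEATHER_RAW
  FROM @MY_STAGE/weather/              -- assumes the files are already staged here
  FILE_FORMAT = (TYPE = 'JSON');

Once loaded, each row holds one JSON document that can be queried with the VARIANT syntax described in the next section.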
‘THE VARIANT DATA TYPE
Rather than requiring semi-structured data to be parsed and transformed into a traditional schema of single-value columns, Snowflake stores semi-structured data in a single column of a special type: VARIANT. Each VARIANT column can contain an entire semi-structured object consisting of multiple key-value pairs:

SELECT * FROM SNOWFLAKE_SAMPLE_DATA.WEATHER.WEATHER_14_TOTAL;

[Result: a single VARIANT column V, with each row containing a complete JSON weather document.]

The VARIANT type does not simply store the semi-structured object as a text string; rather, the individual keys and their values are stored in a columnarized format, just like normal columns in a relational table. This means that storage and query performance for operations on data in a VARIANT column are very similar to storage and query performance for data in a normal relational column.

Note that the maximum number of key-value pairs that will be columnarized for a single VARIANT column is 1,000. If your semi-structured data has more than 1,000 key-value pairs, you may benefit from spreading the data across multiple VARIANT columns. Additionally, each VARIANT entry is limited to a maximum size of 16MB of compressed data.

The individual key-value pairs can be queried directly from VARIANT columns with a minor extension to traditional SQL syntax:

SELECT V:time::TIMESTAMP AS TIME,
  V:city:name::VARCHAR AS CITY,
  V:city:country::VARCHAR AS COUNTRY,
  (V:main:temp_max - 273.15)::FLOAT AS TEMP_MAX,
  (V:main:temp_min - 273.15)::FLOAT AS TEMP_MIN,
  V:weather[0]:main::VARCHAR AS WEATHER_MAIN
FROM SNOWFLAKE_SAMPLE_DATA.WEATHER.WEATHER_14_TOTAL;

[Result: rows of TIME, CITY (e.g., Melbourne), COUNTRY (e.g., AU), TEMP_MAX, TEMP_MIN and WEATHER_MAIN (e.g., Rain) values.]
ACCESSING SEMI-STRUCTURED DATA FROM TABLEAU
Unfortunately, Tableau does not automatically recognize the VARIANT data type, so it doesn't automatically create queries with the SQL extensions outlined above. This means we need to manually create the SQL necessary for accessing the data in the VARIANT column.

As outlined earlier in this paper, one way to access semi-structured data is to use custom SQL. The best practices identified there should be followed in this case: specifically, don't use a monolithic statement that joins across multiple tables. Instead, use a discrete statement to reference the key-value pairs from the VARIANT column, and then join that custom SQL "table" with the other regular tables. Also remember to set "assume referential integrity" so the Tableau query generator can cull tables when they are not required.

Example

To use the WEATHER_14_TOTAL table in the sample WEATHER schema, we can create a custom SQL statement:
This data source can then be used to create vizzes and dashboards as if the user were connected to a traditional structured schema, with similar performance:

[Screenshot: Tableau dashboard built on the JSON-backed data source.]

In this example, the WEATHER_14_TOTAL table has roughly 66M records, and response times in Tableau using an X-Small warehouse were quite acceptable.

Of course, as outlined earlier, this SQL could also be used in a DB view, or in an initial SQL statement to create a temp table; the specific approach you use should be dictated by your needs. Alternatively, as a way to ensure governance and a consistent view of the JSON data, you can always publish the data source to Tableau Server (or Tableau Online) for reuse across multiple users and workbooks.
PASS-THROUGH SQL
Tableau provides a number of functions that allow us to pass raw SQL fragments through to the underlying data source for processing. These are often referred to as RAWSQL functions: http://onlinehelp.tableau.com/current/pro/desktop/en-us/functions_functions_passthrough.html

We can use these to dynamically extract values from a VARIANT field. For example, the following calculated field passes a VARIANT path expression through to Snowflake:

RAWSQL_STR("%1:city:country::string", [V])

This SQL fragment is passed through to Snowflake without alteration. Here is a query from a calculation based on RAWSQL statements (the fragments are highlighted):

SELECT (V:city:country::string) AS "ID",
  AVG((V:main:temp_max - 273.15)::float) AS "avg:temp_max:ok"
FROM "WEATHER"."WEATHER_14_TOTAL" "WEATHER_14_TOTAL"
GROUP BY 1

The advantage of this approach is that the Tableau author can extract specific VARIANT values without needing to create an entire custom SQL table statement.
ELT

While there are benefits to keeping your semi-structured data in VARIANT data types, there are also limitations, as outlined above. If your data schema is well defined and static, you may benefit from performing some ELT (extract-load-transform) transformations on your data to convert it into a traditional data schema.

There are multiple ways to do this. One is to use SQL statements directly in Snowflake (for more detail on the supported semi-structured data functions, see https://docs.snowflake.net/manuals/sql-reference/functions-semistructured.html). If you need to conditionally separate the data into multiple tables, you can use Snowflake's multi-table INSERT statement to improve performance: https://docs.snowflake.net/manuals/sql-reference/sql/insert-multi-table.html
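As a minimal sketch of the in-Snowflake approach (the target table name WEATHER_FLAT is hypothetical), a CTAS statement can flatten the VARIANT keys into conventional columns once, rather than at every query:

-- Hypothetical example: materialize VARIANT keys into a traditional table
CREATE OR REPLACE TABLE WEATHER_FLAT AS
SELECT V:time::TIMESTAMP AS OBS_TIME,
  V:city:name::VARCHAR AS CITY,
  V:city:country::VARCHAR AS COUNTRY,
  (V:main:temp_max - 273.15)::FLOAT AS TEMP_MAX,
  (V:main:temp_min - 273.15)::FLOAT AS TEMP_MIN
FROM SNOWFLAKE_SAMPLE_DATA.WEATHER.WEATHER_14_TOTAL;

Tableau can then connect to WEATHER_FLAT like any other relational table, with no VARIANT-specific SQL required.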
Another approach is to use third-party ETL/ELT tools from Snowflake partners such as Informatica, Matillion, Alooma and Alteryx (to name a few). Note that if you are dealing with large volumes of data, you will want to use tools that can take advantage of in-DB processing. This means that the data processing will be pushed down into Snowflake, which greatly increases workflow performance by eliminating the need to transfer massive amounts of data out of Snowflake to the ETL tool, manipulate it locally and then push it back into Snowflake.
Working with
Snowflake Time Travel
Snowflake Time Travel is a powerful feature that lets users access historical data. This enables many powerful capabilities, but in the case of Tableau it allows users to query data in the past that has since been updated or deleted.

When any data manipulation operations are performed on a table (e.g., INSERT, UPDATE, DELETE, etc.), Snowflake retains previous versions of the table data for a defined period of time. This enables querying earlier versions of the data using an AT or BEFORE clause.

Example

The following query selects historical data from a table as of the date and time represented by the specified timestamp:

SELECT *
FROM my_table AT(timestamp => 'Mon, 01 May 2015 16:20:00 -0700'::timestamp);

The following query selects historical data from a table as of five minutes ago:

SELECT *
FROM my_table AT(offset => -60*5);

The following query selects historical data from a table up to, but not including, any changes made by the specified statement:

SELECT *
FROM my_table BEFORE(statement => '8e5d0ca9-005e-44e6-b858-a8f5b37c5726');

These features are included standard for all accounts (i.e., no additional licensing is required). However, standard Time Travel is one day. Extended Time Travel (up to 90 days) requires Snowflake Enterprise Edition or higher. In addition, Time Travel requires additional data storage, which has associated fees.
ACCESSING TIME TRAVEL DATA FROM TABLEAU
As with semi-structured data, Tableau does not automatically understand how to create Time Travel queries in Snowflake, so we must use custom SQL. Again, we recommend you follow best practices by limiting the scope of your custom SQL to the table(s) affected.

Example

The following Tableau data source shows a copy of the TPCH_SF1 schema, with Time Travel enabled. We want to be able to query the Lineitem table and use the data it contained at an arbitrary point in the past. To do this, we use a custom SQL statement for just the Lineitem table. In this case, we set an AT clause for the Lineitem table and use a parameter to pass in the Time Travel time value:
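The custom SQL is shown only as a screenshot in the original; a minimal sketch, using Tableau's <Parameters.X> substitution syntax for custom SQL (the parameter name "As-at Time" is hypothetical), might be:

SELECT *
FROM TPCH1.LINEITEM AT(timestamp => <Parameters.As-at Time>::TIMESTAMP_NTZ)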
Using this query in the following worksheet:

[Screenshot: worksheet using the Time Travel data source with an as-at time parameter.]

Results in the following query running in Snowflake (the custom SQL is highlighted):

SELECT "Customers"."C_MKTSEGMENT" AS "C_MKTSEGMENT",
  COUNT(DISTINCT "Orders"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  DATE_TRUNC('MONTH', "Orders"."O_ORDERDATE") AS "tmn:O_ORDERDATE:ok"
FROM (
  SELECT *
  FROM TPCH1.LINEITEM AT(timestamp => '2018-01-01 12:00:00.967'::TIMESTAMP_NTZ)
) "Lineitem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
GROUP BY 1,
  3
Thanks to the parameter, it is easy for the user to select different as-at times. Again, this SQL could also be used in a DB view or in an initial SQL statement to create a temp table. The specific approach you use should be dictated by your needs.
Working with
Snowflake Data Sharing
Snowflake Data Sharing makes it possible to directly share data in real time, and in a secure, governed and scalable way, from the Snowflake cloud data warehouse. Organizations can use it to share data internally across lines of business, or even externally with customers and partners, with almost no friction or effort. This is an entirely new way of connecting users to data that doesn't require transmission, and it significantly reduces the traditional pain points of storage duplication and latency. Instead of transmitting data, Snowflake Data Sharing enables "data consumers" to directly access read-only copies of live data that is in a "data provider's" account.

The obvious advantage of this for Tableau users is the ability to query data shared by a provider and to know it is always up-to-date. No ongoing data administration is required by the consumer.

Example

To access shared data, you first view the available inbound shares:

SHOW SHARES;

You can then create a database from the inbound share and apply appropriate privileges to the necessary roles:

// Create a database from the share
CREATE OR REPLACE DATABASE SNOWFLAKE_SAMPLE_DATA
  FROM SHARE SNOWFLAKE.SHARED_SAMPLES;

// Grant permissions to others
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE_SAMPLE_DATA TO ROLE PUBLIC;

Users can then access the database objects as if they were local. The tables and views appear in Tableau the same as any other:

[Screenshot: the shared database's tables and views listed in Tableau's data pane.]
Implementing
role-based security
A common requirement when accessing data is to implement role-based security, where the data returned in a Tableau viz is restricted either by row, by column or by both. There are multiple ways to achieve this with Tableau and Snowflake, and a key determinant of the right solution is how reusable you need it to be. The following sections explain how to implement both Tableau-only and generic solutions.

SETTING UP THE DATA ACCESS RULES

To restrict the data available to a user, we need a table to define the relationship between the user context and the data elements we wish to allow access to.
Example

Using the TPCH_SF1 schema in the sample data, we create a simple REGION_SECURITY table, mapping user names to the region keys they are permitted to see.
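The table definition appears only as a screenshot in the original; a minimal sketch (the column types and sample values are illustrative, although the RS_USER and RS_REGIONKEY column names and the user ALAN do appear in the queries later in this section) is:

-- Sketch: a simple access-rule table
CREATE OR REPLACE TABLE TPCH1.REGION_SECURITY (
  RS_USER VARCHAR,       -- the viewing user's login
  RS_REGIONKEY NUMBER    -- a region key this user may see
);

INSERT INTO TPCH1.REGION_SECURITY VALUES
  ('ALAN', 0),
  ('ALAN', 1),
  ('KELLY', 2);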
We link this table via the REGION and NATION tables to the CUSTOMER table:

[Screenshot: Tableau data model joining REGION_SECURITY to REGION, NATION and CUSTOMER.]
In Tableau, we now see different result sets returned for each of the three RS_USER values:

[Screenshot: three result sets, one per RS_USER value.]

If we needed to create access rules across multiple dimensions of our data (e.g., in addition to region, we also want to restrict access by product brand), then we would create multiple access rule tables and connect them appropriately into our data schema.
PASSING IN THE USER CONTEXT
To make the above model secure, we need to enforce the restriction so that a viewing user can only see the result set permitted for their user context, i.e., where the RS_USER field is equal to the viewing user's username. We need to pass this username from Tableau into Snowflake, and the way we do this depends on whether we are creating a Tableau-only solution or something that is generic and will work for any data tool.
TABLEAU-ONLY SOLUTION
If we are building a Tableau-only solution, we have the ability to define some of our security logic at the Tableau layer. This can be an advantage in some scenarios, particularly when the viz author does not have permission to modify the database schema or add new objects (e.g., views), as would be the case if you were consuming a data share from another account.

Additionally, if the interactor users in Tableau (i.e., users accessing the viz via Tableau Server, as opposed to the author creating it in Tableau Desktop) do not use individual logins for Snowflake, then you must create the solution using this approach, because the Snowflake variable CURRENT_USER will be the same for all user sessions.

Example

To enforce the restriction, we create a data source filter in Tableau that restricts the result set to where the RS_USER field matches the Tableau function USERNAME(), which returns the login of the current user:
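The filter itself is shown only as a screenshot in the original; a minimal sketch of the calculated field used as the data source filter (the field name [Rs User] is assumed, following Tableau's default field naming), kept to TRUE, is:

// Tableau calculated field used as a data source filter
UPPER([Rs User]) = UPPER(USERNAME())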
The result is only the data permitted for the viewing user. Note that in the following screenshot there are no worksheet-level filters, as the filter is enforced on the data source:

[Screenshot: worksheet showing only the permitted regions, with no worksheet-level filters.]

The USERNAME() function is evaluated in Tableau and then pushed through to Snowflake as a literal, as you can see in the resulting query:

SELECT 'ALAN' AS "Calculation_3881217742409589110",
  "CustNation"."N_NAME" AS "N_NAME"
FROM "TPCH1"."NATION" "CustNation"
  INNER JOIN "TPCH1"."REGION" "CustRegion" ON ("CustNation"."N_REGIONKEY" = "CustRegion"."R_REGIONKEY")
  INNER JOIN "TPCH1"."REGION_SECURITY" "REGION_SECURITY" ON ("CustRegion"."R_REGIONKEY" = "REGION_SECURITY"."RS_REGIONKEY")
WHERE (UPPER("REGION_SECURITY"."RS_USER") = 'ALAN')
GROUP BY 2
To enforce this filter and prevent a workbook author from editing or removing it, you should publish the data source to Tableau Server and create the workbook using the published data source.

SOLUTION FOR ANY DATA TOOL

While the above solution is simple to implement, it only works for environments where all access is via Tableau, as the security logic is implemented in the Tableau layer, not the database. To make the solution truly robust and applicable for users connecting via any tool, we need to push the logic down into Snowflake. How we do this depends on whether each user has their own Snowflake login or whether all queries are sent to Snowflake via a common login.
Example

Continuing the example above, rather than using a data source filter in Tableau, we will create a view in Snowflake:

CREATE OR REPLACE SECURE VIEW "TPCH1"."SECURE_REGION_VW" AS
SELECT R_REGIONKEY, R_NAME, R_COMMENT, RS_USER
FROM "TPCH1"."REGION" "CustRegion"
  INNER JOIN "TPCH1"."REGION_SECURITY" "REGION_SECURITY"
    ON ("CustRegion"."R_REGIONKEY" = "REGION_SECURITY"."RS_REGIONKEY")
WHERE (UPPER("REGION_SECURITY"."RS_USER") = UPPER(CURRENT_USER));

In this example we are using the Snowflake variable CURRENT_USER, but we could equally use CURRENT_ROLE if your solution needed to be more scalable. Clearly, this requires each viewing user to be logging in to Snowflake with their own credentials. You can enforce this when publishing the workbook to Tableau Server by setting the authentication type to "Prompt user":
[Screenshot: Tableau Server "Manage Data Sources" dialog with authentication set to "Prompt user" (the other options are "Embedded password" and "Embed All Passwords").]
If you are using "embedded password," then CURRENT_USER and CURRENT_ROLE will be the same for all user sessions. In this case, we need to pass in the viewing user name via an initial SQL block:
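The initial SQL block appears only as a screenshot in the original; a minimal sketch, using Tableau's [TableauServerUser] initial SQL parameter to populate the VIEWING_USER session variable that the view below references, is:

-- Initial SQL: capture the Tableau user in a Snowflake session variable
SET VIEWING_USER = [TableauServerUser];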
The view would then reference this variable:

CREATE OR REPLACE SECURE VIEW "TPCH1"."SECURE_REGION_VW" AS
SELECT R_REGIONKEY, R_NAME, R_COMMENT, RS_USER
FROM "TPCH1"."REGION" "CustRegion"
  INNER JOIN "TPCH1"."REGION_SECURITY" "REGION_SECURITY"
    ON ("CustRegion"."R_REGIONKEY" = "REGION_SECURITY"."RS_REGIONKEY")
WHERE (UPPER("REGION_SECURITY"."RS_USER") = UPPER($VIEWING_USER));

This then enforces the filter on the result set returned to Tableau based on the Tableau user name.

There is one final step required to enforce the security rules specified in the SECURE_REGION_VW view. We need to enforce referential integrity in the schema. Without this, if you don't use SECURE_REGION_VW in your query, then join culling could drop this table from the query and security would be bypassed.
If your data permits, you can create constraints between the tables in Snowflake (see https://docs.snowflake.net/manuals/sql-reference/sql/create-table-constraint.html for details), or you can simply uncheck "assume referential integrity" in Tableau:

[Screenshot: Tableau's Data menu with "Assume Referential Integrity" unchecked.]
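For the constraint option, a minimal sketch (the constraint name is illustrative; note that Snowflake records such constraints as metadata and does not enforce them) is:

-- Sketch: declare the relationship between the security table and REGION
ALTER TABLE TPCH1.REGION_SECURITY
  ADD CONSTRAINT FK_RS_REGION FOREIGN KEY (RS_REGIONKEY)
  REFERENCES TPCH1.REGION (R_REGIONKEY);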
Using custom aggregations
Snowflake provides a number of custom aggregation functions outside the ANSI SQL specification. Some of these are specific to working with semi-structured data, while others are useful for approximating results (e.g., cardinality, similarity, frequency, etc.) when you are working over very large data volumes. A full list of the available aggregation functions can be found in the Snowflake online documentation: https://docs.snowflake.net/manuals/sql-reference/functions-aggregation.html

To use these functions in Tableau, you can leverage the pass-through functions in Tableau: http://onlinehelp.tableau.com/current/pro/desktop/en-us/functions_functions_passthrough.html

Example

Snowflake provides a custom aggregation function, APPROX_COUNT_DISTINCT, which uses HyperLogLog to return an approximation of the distinct cardinality of a field. To use this function, we can create a calculated field that leverages the appropriate RAWSQL function. When we use this calculation in a Tableau viz:
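The calculated field (named ApproxCountD in the original screenshot) is not legible in the source; a minimal sketch of such a pass-through aggregate (the field name [O Orderkey] is assumed) is:

// Tableau calculated field: approximate distinct count pushed down to Snowflake
RAWSQLAGG_INT("APPROX_COUNT_DISTINCT(%1)", [O Orderkey])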
[Screenshot: viz comparing COUNTD and the ApproxCountD calculated field by year.]
Tableau produces the following query in Snowflake:

SELECT COUNT(DISTINCT "Orders"."O_ORDERKEY") AS "ctd:O_ORDERKEY:ok",
  APPROX_COUNT_DISTINCT("Orders"."O_ORDERKEY") AS "usr:Calculation_194421805618073190:ok",
  DATE_PART('YEAR', "Orders"."O_ORDERDATE") AS "yr:O_ORDERDATE:ok"
FROM "TPCH_SF1"."LINEITEM" "Lineitem"
  INNER JOIN "TPCH_SF1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
GROUP BY 3
Scaling Snowflake warehouses
Snowflake supports two ways to scale warehouses:

* Scale up by resizing a warehouse.
* Scale out by adding clusters to a warehouse (requires Snowflake Enterprise Edition or higher).

WAREHOUSE RESIZING IMPROVES PERFORMANCE

Resizing a warehouse generally improves query performance, particularly for larger, more complex queries. It can also help reduce the queuing that occurs if a warehouse does not have enough servers to process all the queries that are submitted concurrently. Note that warehouse resizing is not intended for handling concurrency issues. Instead, use additional warehouses to handle the workload, or use a multi-cluster warehouse if this feature is available for your account.

The number of servers required to process a query depends on the size and complexity of the query. For the most part, queries scale linearly with warehouse size, particularly for larger, more complex queries:

* When considering factors that impact query processing, the overall data size of the tables being queried has more impact than the number of rows.
* Filtering in a query using predicates also has an impact on processing, as does the number of joins/tables in the query.
Snowflake supports resizing a warehouse at any time, even while running. If a query is running slowly and you have additional queries of similar size and complexity that you want to run on the same warehouse, you might choose to resize the warehouse while it is running. However, note the following:

* As stated earlier about warehouse size, larger is not necessarily faster; for smaller, basic queries that are already executing quickly, you may not see any significant improvement after resizing.
* Resizing a running warehouse does not impact queries that are already being processed by the warehouse; the additional servers are only used for queued and new queries.

Decreasing the size of a running warehouse removes servers from the warehouse. When the servers are removed, the cache associated with the servers is dropped, which can impact performance in the same way that suspending the warehouse can impact performance after it is resumed. Keep this in mind when choosing whether to decrease the size of a running warehouse or keep it at the current size. In other words, there is a trade-off between saving credits and maintaining the server cache.
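Resizing is a single SQL statement, which makes it easy to script around peak periods. A minimal sketch (the warehouse name TABLEAU_WH is hypothetical) is:

-- Scale up before a heavy reporting window, then drop back down afterwards
ALTER WAREHOUSE TABLEAU_WH SET WAREHOUSE_SIZE = 'LARGE';
-- ...later, when the peak has passed...
ALTER WAREHOUSE TABLEAU_WH SET WAREHOUSE_SIZE = 'XSMALL';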
Example

To demonstrate the scale-up capability of Snowflake, the following dashboard was created against the TPCH-SF10 sample data schema and published to Tableau Server. Note that this dashboard was intentionally designed to ensure multiple queries would be initiated in the underlying DBMS:
[Screenshot: the test dashboard.]

The data source for the workbook was also configured with an initial SQL statement to disable query result caching within Snowflake via the USE_CACHED_RESULT session parameter. Result caching (which we cover in more detail later) would make every query after the first one extremely fast, because the query results do not need to be recomputed and are simply read from cache in under a second.
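The initial SQL statement in question is a one-liner:

-- Initial SQL: disable Snowflake's result cache so each test query does real work
ALTER SESSION SET USE_CACHED_RESULT = FALSE;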
TabJolt was then installed on a Windows PC to act as a load generator, invoking the dashboard with a :refresh=y parameter to prevent caching within Tableau Server. A series of test runs was then performed, running the above dashboard against different Snowflake warehouse sizes from XS to XL. All tests were run for five minutes with a single simulated user using the InteractVizLoadTest script, and the total number of completed tests and average test response times were recorded. The results are as follows:

[Chart: completed test count and average test response time for each warehouse size, XS through XL.]
As you can see, as we increase the size of the warehouse (from XS up to XL), the average query time decreases because there are more compute resources available. The improvement tails off from L to XL as we reach the point where compute resources are no longer a bottleneck.

You can also see that on the first run of the query at each warehouse size (the orange dots in the chart below), data needs to be read from the underlying table storage into the warehouse cache. Subsequent runs for each warehouse size can then read data entirely from the warehouse cache, resulting in improved performance:

[Chart: first-request response times vs. subsequent-request response times for each warehouse size.]
To achieve the best results, try to execute relatively homogeneous queries (size, complexity, data sets, etc.) on the same warehouse; executing queries of widely varying size and/or complexity on the same warehouse makes it more difficult to analyze warehouse load, which can make it more difficult to select the best size to match the size, composition and number of queries in your workload.
MULTI-CLUSTER WAREHOUSES IMPROVE CONCURRENCY
Multi-cluster warehouses enable you to scale warehouse resources to manage your user and query concurrency needs as they change, such as during peak and off hours.

By default, a warehouse consists of a single cluster of servers that determines the total resources available to the warehouse for executing queries. As queries are submitted to a warehouse, the warehouse allocates server resources to each query and begins executing the queries. If resources are insufficient to execute all the queries submitted to the warehouse, Snowflake queues the additional queries until the necessary resources become available.

With multi-cluster warehouses, Snowflake supports allocating, either statically or dynamically, a larger pool of resources to each warehouse.
When deciding whether to use multi-cluster warehouses, and when choosing the number of clusters per warehouse, consider the following:

* Unless you have a specific requirement for running in maximized mode, configure multi-cluster warehouses for auto-scaling. This enables Snowflake to automatically start and stop clusters as needed: Snowflake starts new clusters when it detects queries are queuing, and it stops clusters when it determines that the remaining clusters are sufficient to run the current query load.
* There are two scaling policies available:
~ Standard scaling minimizes queuing by starting new clusters as soon as queries start to queue. Clusters shut down after two to three consecutive successful load redistribution checks, which are performed at one-minute intervals.
~ Economy scaling conserves credits by keeping running clusters fully loaded rather than starting new clusters immediately.
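Creating such a warehouse is straightforward. A minimal sketch (the warehouse name and the cluster limits are illustrative) is:

-- Sketch: an auto-scaling multi-cluster warehouse
CREATE OR REPLACE WAREHOUSE TABLEAU_MCW
  WAREHOUSE_SIZE = 'XSMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;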
As we scale up the warehouse under this test (XS > S > M), the total number of tests completed increases by 81% and the average test response time decreases by 44%. This is unsurprising, as we have increased the number of servers running queries from one to two to four. Check out the following chart on the left:

[Charts: completed tests and average response times, comparing scaled-up single-cluster warehouses (left) with multi-cluster configurations of equivalent total servers (right).]
However, when we keep the number of servers constant by running four XS clusters, two S clusters or a single M cluster (each of which totals four servers), we see that the multi-cluster warehouse approach has significantly better scalability characteristics: 55% more completed tests with a lower average test response time. Check out the above chart on the right.
Caching
The combination of Tableau and Snowflake provides caching at multiple levels. The more effective we can make our caching, the more efficient our environment will be: queries and workbooks will return results faster, and less load will be applied to the server layers, leading to greater scalability.
TABLEAU CACHING
Tableau has caching in both Tableau Desktop and Tableau Server, which can significantly reduce the amount of rendering, calculation and querying needed to display views to users. For Tableau Server in particular, caching is critical for achieving concurrent user scalability.
PRESENTATION LAYER
Caching at the presentation layer is relevant for Tableau Server only. In Tableau Server, multiple end users view and interact with views via a browser or mobile device. Depending on the capability of the browser and the complexity of the view, rendering will be done either on the server or on the client.
BROWSER
In the case of client-side rendering, the client browser downloads a JavaScript "viz client" that can render a Tableau viz. It then requests the initial data package for the view from the server, called the bootstrap response. This package includes the layout of the view and the data for the marks to be displayed. Together, these are called the view model.

With this data, the viz client can then draw the view locally in the client's browser.

As the user explores the view, simple interactions (e.g., tooltips, highlighting and selecting marks) can be handled locally by the viz client using the local data cache. This means no communication with the server is required, and the result is a fast screen update. This also reduces the compute load on the server, which helps with Tableau Server scalability. More complex interactions (e.g., changing a parameter or filter) cannot be handled locally, so the update is sent to the server and a new view model is sent back to update the local data cache.

[Diagram: client-side rendering flow between the browser's viz client and Tableau Server.]
Not all vizzes can be rendered client-side. There are some complexities that will force Tableau to use server-side rendering:

* If the user's browser does not support the HTML5 <canvas> element;
* If a view uses the polygon mark type or the page history feature; or
* If a view is determined to be overly complex. The complexity of the view is determined by the number of marks, rows, columns and more.
TILES

In the case of server-side rendering, the view is rendered on the server as a set of static image "tiles." These tiles are sent to the browser, and a much simpler viz client assembles them into the final view. The viz client monitors the session for user interaction (e.g., hovering for a tooltip, selecting marks, highlighting, interacting with filters, etc.), and if anything needs to be redrawn, it sends a request back to the server for new tiles. No data is stored on the client, so all interactions require a server round trip.

To help with performance, the tile images are persisted on Tableau Server in the tile cache. In the event the same tile is requested, it can be served from the cache instead of being re-rendered. Note that the tile cache can be used only if the subsequent request is for the exact same view. This means requests for the same view from different-sized browser windows can result in different tiles being rendered.

One of the very easy performance optimizations you can make is to set your dashboards to fixed size instead of automatic. This means the view requests will always be the same size, irrespective of the browser window dimensions, allowing a better hit rate for the tile cache.
SESSION AND BOOTSTRAP RESPONSE
When rendering a viz, we need the view layout instructions and mark data (referred to as the view model) before we can do anything. The VizQL Server generates the model based on several factors, including the requesting user's credentials and the size of the viewing window. It computes the view of the viz (i.e., relevant header and axis layouts, label positions, legends, filters, marks and such) and then sends it to the renderer (either the browser for client-side rendering, or the VizQL Server for server-side rendering).

The VizQL Server on Tableau Server maintains a copy of the view models for each user session. This is initially the default view of the viz, but as users interact with the view in their browser, it is updated to reflect the view state: highlights, filters, selections, etc. A dashboard can reference multiple view models, one for each worksheet used. The model includes the results from local calculations (e.g., table calcs, reference lines, forecasts, clustering, etc.) and the visual layout (how many rows/columns to display for small multiples and crosstabs, the number and interval of axis ticks/grid lines to draw, the number and location of mark labels to be shown, etc.).

Generating view models can be computationally intensive, so we want to avoid this whenever possible. The initial models for views, called bootstrap responses, are persisted on Tableau Server by the Cache Server. When a request comes in for a view, we check the cache to see if the bootstrap response already exists, and if it does, we serve it up without having to recompute and reserialize it.

Not all views can use the bootstrap response cache, though. Using the following capabilities will prevent the bootstrap response from being cached:
+ Ifa view uses relative date fiters:
+ fa view uses user-specific fitters: or
+ Ifa view uses published Data Server connections.
Like the tile cache, the bootstrap response is specific to the view size, so requests for the same view from browser windows with different sizes will produce different bootstrap responses. Again, one of the easy performance optimizations you can make is to set your dashboards to be fixed size instead of automatic. This means the view requests will always be the same size, irrespective of the browser window dimensions, allowing a better hit rate for the bootstrap response cache.
Maintaining the view model is important as it means the VizQL Server doesn't need to recompute the view state every time it redraws. The models are created and cached by the VizQL Server, and it does its best to share results across user sessions where possible. The key factors in whether a visual model can be shared are:

+ The size of the viz display area: Models can be shared only across sessions where the size of the viz is the same. As already stated above, setting your dashboards to have a fixed size will benefit the model cache in much the same way as it benefits the bootstrap response and tile caches, allowing greater reuse and lowering the workload on the server.

+ Whether the selections/filters match: If the user is changing filters, parameters, doing drill-down/up, etc., the model is shared only with sessions that have a matching view state. Try not to publish workbooks with "show selections" checked, as this can reduce the likelihood that different sessions will match.

+ The credentials used to connect to the data source: If users are prompted for credentials to connect to the data source, then the model can be shared only across user sessions with the same credentials. Use this capability with caution, as it can reduce the effectiveness of the model cache.

+ Whether user filtering is used: If the workbook contains user filters or has calculations containing functions such as USERNAME() or ISMEMBEROF(), then the model is not shared with any other user sessions. Use these functions with caution, as they can significantly reduce the effectiveness of the model cache.
The session models are not persistent and expire after 30 minutes of inactivity (a default setting that can be changed) in order to recycle memory on the server. If you find that your sessions are expiring before you are truly finished with a view, consider increasing the VizQL session timeout setting.
ANALYTICS LAYER
The analytics layer includes the data manipulations and calculations performed on the underlying data. The objective of caching at this level is to avoid unnecessary computations and to reduce the number of queries sent to the underlying data sources.

The caching in this layer applies to both Tableau Desktop and Tableau Server; however, there are differences in the caching mechanisms between the two tools. These differences are in line with the different demands of a single-user (Desktop) and multi-user (Server) environment.
ABSTRACT QUERY
When a user interacts with a view, many physical queries may be generated. Rather than blindly executing them all, Tableau groups them into a query batch and decompiles them to see if there are any optimizations that can be achieved. This could involve removing duplicate queries, combining multiple similar queries into a single statement or even eliminating queries where the results of one can be derived from the results of another. Tableau maintains a cache of query results indexed by the logical structure of the query, and this is checked to see if it is necessary to proceed further to the native query.
Tableau incorporates logic to not cache the results of:

+ queries that return large result sets (too big for the cache);
+ queries that execute quickly (faster to run the query than check the cache);
+ queries that use user filters; or
+ queries that use relative date filters.
DATA LAYER
The data layer addresses the native connection between Tableau and the data sources. Caching at this level is about persisting the results of queries for reuse in the future. It also determines the nature of the connection to the data source: whether we are using live connections or the Tableau data engine (replaced with Hyper in Tableau 10.5 and later).
NATIVE QUERY
The native query cache is similar to the abstract query cache, but instead of being indexed by the logical query structure, it is keyed by the actual query statement. Multiple abstract queries can resolve to the same native query.

Like the abstract query cache, Tableau incorporates logic to not cache the results of:

+ queries that return large result sets (too big for the cache);
+ queries that execute quickly (faster to run the query than check the cache);
+ queries that have user filters; or
+ queries that use relative date filters.
SNOWFLAKE CACHING
RESULT CACHING
When a query is executed, the results are persisted in Snowflake for a period of time (currently 24 hours), at the end of which the results are purged from the system.

A persisted result is available for reuse by another query, as long as the user executing the query has the necessary access privileges for the table(s) used in the query and all of the following conditions have been met:
+ The new query syntactically matches the previously-executed query.
+ The table data contributing to the query result has not changed.
+ The persisted result for the previous query is still available.
+ Any configuration options that affect how the result was produced have not changed.
+ The query does not include functions that must be evaluated at execution time (e.g., CURRENT_TIMESTAMP).
Result reuse can substantially reduce query time because Snowflake bypasses query execution and, instead, retrieves the result directly from the cache. Each time the persisted result for a query is reused, Snowflake resets the 24-hour retention period for the result, up to a maximum of 31 days from the date and time that the query was first executed. After 31 days, the result is purged, and the next time the query is submitted, a new result is generated and persisted.

Result reuse is controlled by the session parameter USE_CACHED_RESULT. By default, the parameter is enabled, but it can be overridden at the account, user and session level if desired.
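As a minimal sketch, when benchmarking Tableau workloads against Snowflake you may want to take the result cache out of the equation so you are measuring true execution time rather than cache retrieval:

-- Disable result reuse for the current session while benchmarking.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Re-enable result reuse when benchmarking is complete.
ALTER SESSION SET USE_CACHED_RESULT = TRUE;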
Note that the result cache in Snowflake will contain very similar data to the native query cache in Tableau; however, they operate independently.
WAREHOUSE CACHING
Each warehouse, when running, maintains a cache of table data accessed as queries are processed by the warehouse. This enables improved performance for subsequent queries if they are able to read from the cache instead of from the table(s) in the query. The size of the cache is determined by the number of servers in the warehouse; i.e., the larger the warehouse (and, therefore, the number of servers in the warehouse), the larger the cache.

This cache is dropped when the warehouse is suspended, which may result in slower initial performance for some queries after the warehouse is resumed. As the resumed warehouse runs and processes more queries, the cache is rebuilt, and queries that are able to take advantage of the cache will experience improved performance.

Keep this in mind when deciding whether to suspend a warehouse or leave it running. In other words, consider the trade-off between saving credits by suspending a warehouse versus maintaining the cache of data from previous queries to help with performance.
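One practical way to act on this trade-off is through the warehouse auto-suspend setting. As an illustrative sketch (the warehouse name tableau_wh is hypothetical), a longer AUTO_SUSPEND value keeps the warehouse cache warm between bursts of Tableau activity, at the cost of additional credits:

-- Keep the warehouse (and its cache) alive for 10 minutes of idle time
-- before suspending; resume automatically when Tableau submits a query.
ALTER WAREHOUSE tableau_wh SET
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE;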
Other performance considerations
CONSTRAINTS
Constraints define integrity and consistency rules for data stored in tables. Snowflake provides support for constraints as defined in the ANSI SQL standard, as well as some extensions for compatibility with other databases, such as Oracle. Constraints are provided primarily for data modeling purposes and compatibility with other databases, as well as to support client tools that utilize constraints.

It is recommended that you define constraints between tables you intend to use in Tableau. Tableau uses constraints to perform join culling (join elimination), which can improve the performance of generated queries. If you cannot define constraints, be sure to set "Assume Referential Integrity" in the Tableau data source to allow the query generator to cull unneeded joins.
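As a minimal sketch, using the TPC-H style CUSTOMER and ORDERS tables referenced elsewhere in this paper, constraints can be declared after the fact. Note that Snowflake records primary and foreign keys as metadata for tools like Tableau but does not enforce them:

-- Declare the relationship so Tableau's query generator can cull joins.
-- Snowflake does not enforce these keys; they are metadata only.
ALTER TABLE customer ADD CONSTRAINT pk_customer PRIMARY KEY (c_custkey);
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (o_custkey) REFERENCES customer (c_custkey);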
EFFECT OF READ-ONLY PERMISSIONS
Having the ability to create temp tables is important for Tableau users; when available, they are used in multiple situations (e.g., complex filters, actions, sets, etc.). When they are unavailable (e.g., if you are working with a shared database via a read-only account), Tableau will try to use alternate query structures, but these can be less efficient and in extreme cases can cause errors.

Example

Consider a visualization such as a scatter plot showing Order Total Price vs. Supply Cost for each Customer Name in the TPCH-SF1 sample data set, where we have lassoed a number of the points and created a set which is then used on the color shelf to show IN/OUT.

When we create the set, Tableau creates a temp table (in the background) to hold the list of dimensions specified in the set (in this case, the customer name):
CREATE LOCAL TEMPORARY TABLE "#Tableau_37_2_Filter" (
  "X_C_NAME" VARCHAR NOT NULL,
  "X__Tableau_join_flag" BIGINT NOT NULL
) ON COMMIT PRESERVE ROWS
We then populate this temp table with the selected values:

INSERT INTO "#Tableau_37_2_Filter" ("X_C_NAME", "X__Tableau_join_flag")
VALUES (?, ?)
Finally, the following query is executed to generate the required result set:

SELECT "Customers"."C_NAME" AS "C_NAME",
  (CASE WHEN (CASE WHEN (NOT ("Filter_1"."X__Tableau_join_flag" IS NULL)) THEN 1 ELSE 0 END) = 1 THEN 1 ELSE 0 END) AS "in:Set 1:nk",
  SUM("Orders"."O_TOTALPRICE") AS "sum:O_TOTALPRICE:ok",
  SUM("PARTSUPP"."PS_SUPPLYCOST") AS "sum:PS_SUPPLYCOST:ok"
FROM "TPCH1"."LINEITEM" "Lineitem"
  INNER JOIN "TPCH1"."ORDERS" "Orders" ON ("Lineitem"."L_ORDERKEY" = "Orders"."O_ORDERKEY")
  INNER JOIN "TPCH1"."CUSTOMER" "Customers" ON ("Orders"."O_CUSTKEY" = "Customers"."C_CUSTKEY")
  INNER JOIN "TPCH1"."PART" "Parts" ON ("Lineitem"."L_PARTKEY" = "Parts"."P_PARTKEY")
  INNER JOIN "TPCH1"."PARTSUPP" "PARTSUPP" ON ("Parts"."P_PARTKEY" = "PARTSUPP"."PS_PARTKEY")
  LEFT JOIN "#Tableau_37_2_Filter" "Filter_1" ON ("Customers"."C_NAME" = "Filter_1"."X_C_NAME")
GROUP BY 1,
  2
If a user does not have the right to create temporary tables in the target database or schema, Tableau will instead create WHERE ... IN clauses in the query to specify the set members:

SELECT "Customers"."C_NAME" AS "C_NAME"
FROM "TPCH_SF1"."CUSTOMER" "Customers"
WHERE ("Customers"."C_NAME" IN ('Customer#000000388', 'Customer#000000412', 'Customer#000000679',
  'Customer#000001522', 'Customer#000001948', 'Customer#000001995', 'Customer#000002266',
  'Customer#000002548', 'Customer#000003019', 'Customer#000003433', 'Customer#000003451',
  'Customer#000147362', 'Customer#000249548'))
GROUP BY 1
ORDER BY 1 ASC
However, this approach is subject to the limitations of the query parser, and you will encounter errors if you try to create a set with more than 16,384 marks:

SQL compilation error: error line 3 at position 31 maximum number of expressions in a list exceeded, expected at most 16,384, got 40,483

Unable to properly calculate the domain for the field 'Set 1'. Displayed data may be incorrect.

This error could be overcome by converting the data connection from a live connection to an extract; however, this is a viable approach only if the data volumes are not massive.
Measuring performance
IN TABLEAU
PERFORMANCE RECORDER
The first place you should look for performance information is the Performance Recorder feature of Tableau Desktop and Tableau Server. You enable this feature under the Help menu:

[Screenshot: Tableau Desktop Help menu, showing Settings and Performance > Start Performance Recording]
Start performance recording, then open your workbook. Interact with it as if you were an end user, and when you feel you have gathered enough data, go back into the Help menu and stop recording. Another Tableau Desktop window will open at this point with the data captured:
[Screenshot: the generated performance recording workbook, showing events sorted by time]
You can now identify the actions in the workbook that take the most time; for example, in the above image, the selected query takes 0.8 seconds to complete. Clicking on the bar shows the text of the query being executed. As the output of the performance recorder is a Tableau workbook, you can create additional views to explore this information in other ways.

You should use this information to identify those sections of a workbook that are the best candidates for review, i.e., where you get the best improvement for the time you spend. More information on interpreting these recordings can be found at the following link:

https://tabsoft.co/1RdF420
LOGS
DESKTOP LOGS
In Tableau, you can find detailed information on what Tableau is doing in the log files. The default location for Desktop is C:\Users\<username>\Documents\My Tableau Repository\Logs\log.txt.

This file is quite verbose and is written as JSON-encoded text, but if you search for "begin-query" or "end-query" you can find the query string being passed to the data source. Looking at the "end-query" log record will also show you the time the query took to run and the number of records that were returned to Tableau:
{"ts":"2015-05-24T12:25:41.226","pid":6460,"tid":"1674","sev":"info","req":"-","sess":"-","site":"-","user":"-","k":"end-query","v":{"protocol":"4308fb0","query":"SELECT [DimProductCategory].[ProductCategoryName] AS [none:ProductCategoryName:nk],\n  [DimProductSubcategory].[ProductSubcategoryName] AS [none:ProductSubcategoryName:nk],\n  SUM(CAST(([FactSales].[ReturnQuantity]) as BIGINT)) AS [sum:ReturnQuantity:ok],\n  SUM([FactSales].[SalesAmount]) AS [sum:SalesAmount:ok]\nFROM [dbo].[FactSales] [FactSales]\n  INNER JOIN [dbo].[DimProduct] [DimProduct] ON ([FactSales].[ProductKey] = [DimProduct].[ProductKey])\n  INNER JOIN [dbo].[DimProductSubcategory] [DimProductSubcategory] ON ([DimProduct].[ProductSubcategoryKey] = [DimProductSubcategory].[ProductSubcategoryKey])\n  INNER JOIN [dbo].[DimProductCategory] [DimProductCategory] ON ([DimProductSubcategory].[ProductCategoryKey] = [DimProductCategory].[ProductCategoryKey])\nGROUP BY [DimProductCategory].[ProductCategoryName],\n  [DimProductSubcategory].[ProductSubcategoryName]","rows":32,"elapsed":0.951}}
Since Tableau 10.1, JSON files are a supported data source, so you can also open the log files with Tableau for easier analysis. You can analyze the time it takes for each event, what queries are being run and how much data they are returning:
Another useful tool for analyzing Tableau Desktop performance is the Tableau Log Viewer. This cross-platform tool allows you to easily view, filter and search the records from Tableau's log files. A powerful feature of TLV is that it supports live monitoring of Tableau Desktop, so you can see in real time the log entries created by your actions in Tableau.
This tool is made available as-is by Tableau and can be downloaded from GitHub:

http://bit.ly/2rQJecu

Additionally, you could load the JSON logs into Snowflake and take advantage of the VARIANT data type to help with your analysis.
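As a minimal sketch of that idea (the stage name and table name here are hypothetical), you could load each JSON log record into a VARIANT column and then pull out the end-query events:

-- One VARIANT column holds each JSON log record as-is.
CREATE TABLE tableau_logs (record VARIANT);

-- Load the log file from a named stage, one JSON document per line.
COPY INTO tableau_logs
  FROM @log_stage/log.txt
  FILE_FORMAT = (TYPE = 'JSON');

-- Extract query text and elapsed time for the end-query events.
SELECT record:ts::timestamp     AS event_time,
       record:v.query::string   AS query_text,
       record:v.elapsed::float  AS elapsed_seconds,
       record:v."rows"::number  AS row_count
FROM tableau_logs
WHERE record:k::string = 'end-query'
ORDER BY elapsed_seconds DESC;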
SERVER LOGS
If you are looking at Tableau Server, the logs are in C:\ProgramData\Tableau\Tableau Server\data\tabsvc\<service>\Logs. In most cases, you will be interested in the VizQL Server log files. If you do not have console access to the Tableau Server machine, you can download a ZIP archive of the log files from the Tableau Server status page.
Just like the Tableau Desktop log files, these are JSON-formatted text files, so you can open them in Tableau Desktop or read them in their raw form. However, because the information about user sessions and actions is spread across multiple files (corresponding to the various services of Tableau Server), you may prefer to use a powerful tool called Logshark to analyze Server logs. Logshark is a command-line utility that you can run against Tableau logs to generate a set of workbooks that provide insights into system performance, content usage and error investigation. You can use Logshark to visualize, investigate and solve issues with Tableau at your convenience.
You can find out more about Logshark at the following link:

http://tabsoft.co/2rQKBoS

And you can download it from GitHub:

http://bit.ly/2rQS0qn

As with the Desktop logs, you could load these JSON logs into Snowflake and take advantage of the VARIANT data type to help with your analysis.
SERVER PERFORMANCE VIEWS
Tableau Server comes with several views to help administrators monitor activity on Tableau Server. The views are located in the Analysis table on the server's Maintenance page. They can provide valuable information on the performance of individual workbooks, helping you focus your attention on those that are performing poorly. More information on these views can be found at the following link:

http://tabsoft.co/1RjCCL2
Additionally, custom administrative views can be created by connecting to the PostgreSQL database that makes up part of the Tableau repository. Instructions can be found here:

http://tabsoft.co/1RjCACR
TABMON
TabMon is an open-source cluster monitor for Tableau Server that allows you to collect performance statistics over time. TabMon is community-supported, and the full source code is released under the MIT open source license.

TabMon records system health and application metrics out of the box. It collects built-in metrics like Windows Perfmon, Java Health and Java MBean (JMX) counters on Tableau Server machines across a network. You can use TabMon to monitor physical (CPU, RAM), network and hard-disk usage. You can track cache-hit ratio, request latency, active sessions and much more. It displays the data in a clean, unified structure, making it easy to visualize the data in Tableau Desktop.
TabMon gives you full control over which metrics to collect and which machines to monitor, with no scripting or coding required. All you need to know is the machine and the metric name. TabMon can run both remotely and independently of your cluster. You can monitor, aggregate and analyze the health of your cluster(s) from any computer on your network with almost no added load to your production machines.

You can find more information on TabMon here:

http://bit.ly/1ULFe
TABJOLT
While not exactly a performance monitoring tool, TabJolt is very useful to help you determine if your problems are related to platform capacity issues. TabJolt is particularly useful for testing Snowflake multi-cluster warehouse scenarios when you want to simulate a load generated by multiple users.

TabJolt is a "point-and-run" load and performance testing tool specifically designed to work easily with Tableau Server. Unlike traditional load-testing tools, TabJolt can automatically drive load against your Tableau Server without script development or maintenance. Because TabJolt is aware of Tableau's presentation model, it can automatically load visualizations and interpret possible interactions during execution.

This allows you to just point TabJolt to one or more workbooks on your server and automatically load and execute interactions on the Tableau views. TabJolt also collects key metrics including average response time, throughput and 95th percentile response time, and captures Windows performance metrics for correlation.

Of course, even with TabJolt, users should have sufficient knowledge of Tableau Server's architecture. Treating Tableau Server as a black box for load testing is not recommended and will likely yield results that aren't in line with your expectations.

You can find more information on TabJolt here:

http://bit.ly/1ULEgi
IN SNOWFLAKE
Snowflake has several built-in features that allow you to monitor the performance of the queries generated by Tableau and to link them back to specific workbooks, data connections and users.
SNOWFLAKE INFORMATION SCHEMA
The Snowflake Information Schema (aka "Data Dictionary") is a set of system-defined views and functions that provide extensive information about the objects created in your account. The Snowflake Information Schema is based on the SQL-92 ANSI Standard Information Schema, but also includes views and functions that are specific to Snowflake.
SNOWFLAKE QUERY HISTORY
Within the Information Schema is a set of tables and table functions that can be used to retrieve information on queries that have been run during the past 7 days:

https://docs.snowflake.net/manuals/sql-reference/functions/query_history.html

These queries can return detailed information on the execution timing and profile of queries, which can then be used to determine if your warehouse sizes are appropriate. You can see information on the start time, end time, query duration, number of rows returned and the data volume scanned. You can also see the amount of time queries spend queued vs. executing, which is useful information when considering adding multi-cluster scaling to your warehouses.

A useful way to access this information is through the QUERY_HISTORY table functions, via Tableau or directly in SQL:
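As a minimal sketch of the kind of query involved (the one-day window and column selection are illustrative), the following returns the slowest recent statements:

-- Table functions live in the INFORMATION_SCHEMA of the current database.
SELECT query_id,
       query_text,
       warehouse_name,
       start_time,
       total_elapsed_time   -- milliseconds, including queueing time
FROM TABLE(information_schema.query_history(
       end_time_range_start => DATEADD('day', -1, CURRENT_TIMESTAMP()),
       result_limit         => 1000))
ORDER BY total_elapsed_time DESC;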
The query history for an account can also be accessed from the Snowflake web interface. This page displays all queries executed in the last 14 days, including queries executed by you in previous sessions and queries executed by other users. This view shows query duration broken down by queueing, compilation and execution, and is also useful for identifying queries that are resolved through the result cache.
By clicking on the hyperlink in the Query ID column, you can drill through to more detailed statistics for the selected query. This is also where you have access to the query profile.
SNOWFLAKE QUERY PROFILE
Query Profile is a powerful tool for understanding the mechanics of queries. Implemented as a page in the Snowflake web interface, it provides a graphical tree view, as well as detailed execution statistics, for the major components of the processing plan for a query. It is designed to help troubleshoot issues in SQL query expressions that commonly cause performance bottlenecks.

To access Query Profile:

+ Go to the Worksheet or History page in the web interface.
+ Click on the Query ID for a completed query.
+ In the query details, click the Profile tab.
To help you analyze query performance, the detail panel provides two classes of profiling information:

EXECUTION TIME

Execution time provides information about "where the time was spent" during the processing of a query. Time spent can be broken down into the following categories, displayed in the following order:

+ Processing: time spent on data processing by the CPU
+ Local Disk IO: time when the processing was blocked by local disk access
+ Remote Disk IO: time when the processing was blocked by remote disk access
+ Network Communication: time when the processing was waiting for network data transfer
+ Synchronization: various synchronization activities between participating processes
+ Initialization: time spent setting up the query processing
STATISTICS
A major source of information provided in the detail panel consists of various statistics, grouped in the following sections:

+ IO: information about the input-output operations performed during the query:
  - SCAN PROGRESS: the percentage of data scanned for a given table so far
  - BYTES SCANNED: the number of bytes scanned so far
  - PERCENTAGE SCANNED FROM CACHE: the percentage of data scanned from the local disk cache
  - BYTES WRITTEN: bytes written (e.g., when loading into a table)
  - BYTES WRITTEN TO RESULT: bytes written to a result object
  - BYTES READ FROM RESULT: bytes read from a result object
  - EXTERNAL BYTES SCANNED: bytes read from an external object (e.g., a stage)
+ DML: statistics for Data Manipulation Language (DML) queries:
  - NUMBER OF ROWS INSERTED: number of rows inserted into a table (or tables)
  - NUMBER OF ROWS UPDATED: number of rows updated in a table
  - NUMBER OF ROWS DELETED: number of rows deleted from a table
  - NUMBER OF ROWS UNLOADED: number of rows unloaded during data export
  - NUMBER OF BYTES DELETED: number of bytes deleted from a table

+ Pruning: information on the effects of table pruning:
  - PARTITIONS SCANNED: number of partitions scanned so far
  - PARTITIONS TOTAL: total number of partitions in a given table

+ Spilling: information about disk usage for operations where intermediate results do not fit in memory:
  - BYTES SPILLED TO LOCAL STORAGE: volume of data spilled to local disk
  - BYTES SPILLED TO REMOTE STORAGE: volume of data spilled to remote disk

+ Network: network communication:
  - BYTES SENT OVER THE NETWORK: amount of data sent over the network

This information can help you identify common problems that occur with SQL queries.
“EXPLODING” JOINS
Two common mistakes users make are joining tables without providing a join condition (resulting in a "Cartesian product") and getting the join condition wrong (resulting in records from one table matching multiple records from another table). For such queries, the Join operator produces significantly (often by orders of magnitude) more tuples than it consumes.

This can be observed by looking at the number of records produced by a Join operator, and it is typically also reflected in the Join operator consuming a lot of time.

The following example shows input in the hundreds of records but output in the hundreds of thousands:

select tt1.c1, tt1.c2
from tt1
join tt2 on tt1.c1 = tt2.c1
  and tt1.c2 = tt2.c2;
The recommended action for this problem is to review the join conditions defined in the Tableau data connection screen and correct any omissions.
QUERIES TOO LARGE TO FIT IN MEMORY
For some operations (e.g., duplicate elimination for a huge data set), the amount of memory available for the servers used to execute the operation might not be sufficient to hold intermediate results. As a result, the query processing engine will start spilling the data to a local disk. If the local disk space is insufficient, the spilled data is then saved to remote disks.

This spilling can have a profound effect on query performance (especially if data is spilled to a remote disk). To alleviate this, we recommend:

+ Using a larger warehouse (effectively increasing the available memory/local disk space for the operation), as in the sketch below, and/or
+ Processing data in smaller batches.
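As a minimal sketch (the warehouse name tableau_wh is hypothetical), resizing a warehouse is a single statement and takes effect for subsequent queries:

-- Move up one size (e.g., MEDIUM to LARGE) to give spilling queries
-- more memory and local disk space to work with.
ALTER WAREHOUSE tableau_wh SET WAREHOUSE_SIZE = 'LARGE';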
INEFFICIENT PRUNING
Snowflake collects rich statistics on data, allowing it to avoid reading unneeded parts of a table based on the query filters. However, for this to have an effect, the data storage order needs to be correlated with the query filter attributes.

The efficiency of pruning can be observed by comparing the Partitions scanned and Partitions total statistics in the TableScan operators. If the former is a small fraction of the latter, pruning is efficient; if not, the pruning did not have an effect.

Of course, pruning can help only for queries that actually filter out a significant amount of data. If the pruning statistics do not show data reduction, but there is a Filter operator above TableScan that filters out a number of records, this might signal that a different data organization might be beneficial for this query.
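One standard Snowflake option for changing the data organization (not prescribed by this paper, but worth knowing) is to define a clustering key on the columns your Tableau views filter on most often, so the storage order correlates with those filters. A minimal sketch, with hypothetical table and column names:

-- Co-locate rows by the date column most dashboards filter on,
-- improving the Partitions scanned / Partitions total ratio over time.
ALTER TABLE sales CLUSTER BY (order_date);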
LINKING PERFORMANCE DATA BETWEEN TABLEAU AND SNOWFLAKE
One of the challenges users will face when examining performance is connecting queries that run in Snowflake with the workbooks and user sessions in Tableau that generated them. One useful way to do this is to set the QUERY_TAG session variable using an initial SQL statement.
Tableau provides parameters that can be inserted into initial SQL statements that will pass through attributes from the Tableau environment into Snowflake:

TableauServerUser: The user name of the current server user. Use when setting up impersonation on the server. Returns an empty string if the user is not signed in to Tableau Server.

TableauServerUserFull: The user name and domain of the current server user. Use when setting up impersonation on the server. Returns an empty string if the user is not signed in to Tableau Server.

TableauApp: The name of the Tableau application (e.g., Tableau Desktop Professional, Tableau Server).

TableauVersion: The version of the Tableau application.

WorkbookName: The name of the Tableau workbook (e.g., Financial Analysis). Use only in workbooks with an embedded data source.
If you use the TableauServerUser or TableauServerUserFull parameter in an initial SQL statement, you will create a dedicated connection in Tableau that can't be shared with other users. This will also restrict cache sharing, which can enhance security but may slow performance.

For more detailed information, see the Tableau documentation here:

http://onlinehelp.tableau.com/current/pro/desktop/en-us/connect_basic_initialsql.htm
Example

Using an initial SQL block, the following statement is used to set the QUERY_TAG session variable:

ALTER SESSION SET QUERY_TAG = '[WorkbookName] [TableauApp] [TableauServerUser]';
The resulting value is then available from the QUERY_HISTORY table function and can be used to attribute queries to specific workbooks and users.
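As a minimal sketch of that attribution query (the result window and grouping are illustrative), you can summarize Snowflake query time by the Tableau-supplied tag:

-- Summarize Snowflake query time by the tag set in Tableau's initial SQL.
SELECT query_tag,
       COUNT(*)                AS query_count,
       AVG(total_elapsed_time) AS avg_elapsed_ms
FROM TABLE(information_schema.query_history(result_limit => 10000))
WHERE query_tag <> ''
GROUP BY query_tag
ORDER BY avg_elapsed_ms DESC;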
Find out more
Snowflake is the only data warehouse built for the cloud, enabling the data-driven enterprise with instant elasticity, secure data sharing and per-second pricing. Snowflake combines the power of data warehousing, the flexibility of big data platforms and the elasticity of the cloud at a fraction of the cost of traditional solutions. Snowflake: Your data, no limits. Find out more at snowflake.net.