HOW TO KNIT YOUR DATA MESH ON SNOWFLAKE

WHITE PAPER
Data mesh[1] has become an increasingly popular approach to data management in recent years. Companies
in all industries are choosing data mesh for decentralized data management to improve data agility and avoid
the organizational bottlenecks often connected with centralized and monolithic approaches.
This paper discusses the Snowflake approach to data mesh. It describes some of the most critical Snowflake
capabilities for a data mesh and presents typical architecture options that our clients have chosen in order to
implement a self-service data platform that supports distributed domains.
Data mesh is primarily an organizational approach that defines responsibilities and coordination across
separate domain teams and their data products. However, the right technology is needed to enable the
domains to follow the data mesh concept in a feasible way.

“Data mesh is not about technology [...] but you need the right technology to enable the
data product teams with a variety of capabilities. Every domain and data product team
doesn’t have to reinvent the wheel and start from scratch and build their data & analytics
platform. Similarly, we need to make it simple for the data product teams. Without this,
there is no empowerment and no decentralization.”
—OMAR KHAWAJA, Global Head BI, Roche (2022)

Snowflake is being used successfully as a data platform by many companies that follow a data mesh approach.
There is no single technology platform that provides a complete end-to-end solution to support the data
mesh concept. However, Snowflake provides many of the capabilities needed for a self-service data platform,
enabling a distributed, domain-driven architecture and offering capabilities to help implement data as a
product and federated computational governance.

THE SNOWFLAKE APPROACH TO DATA MESH


After working with numerous clients on their data mesh initiatives, Snowflake has embraced the
following approach:

• We recognize that data mesh is first and foremost an organizational transformation. This
transformation has many non-technical implications but often also requires changes at the IT
architecture and technology level.

• Be pragmatic. We advise our clients not to aim at implementing the “perfect” data mesh but to be
guided by addressing their specific pain points and objectives. For example, polyglot storage and
multi-modal access are useful concepts, but companies should focus on their actual requirements to
maximize impact.

• Start small, expand incrementally, and work your way up along the data mesh maturity curve over
time. For example, start with one or two domains and data products to satisfy an immediate business
need, and then exploit the early success to expand the mesh.

• Be mindful of cost and complexity. For example, it has proven beneficial to keep the set of tools
in the self-service data platform as small and as consistent as possible across all domains while
satisfying all critical domain requirements.

• Define incentives and success criteria early on, including measurable KPIs for domains, data
products, the self-service data platform, and governance controls.

• There is no out-of-the-box data mesh solution. We embrace our extensive partner network to build
joint solutions that meet client requirements. For example, tools for data governance, automation,
DevOps, and other areas are often part of a data mesh solution, even if not discussed in detail in
this article.

[1] www.thoughtworks.com/en-us/what-we-do/data-and-ai/data-mesh

RELEVANT SNOWFLAKE CAPABILITIES
Snowflake offers a number of key capabilities that our clients have found helpful in building the
self-service data platform for a data mesh.

Snowflake is much more than a cloud data warehouse
Snowflake is an integrated cloud service provider that offers a broad range of capabilities for data
engineering, data lakes, data warehousing, data sharing, and significant portions of a typical machine
learning lifecycle.

In particular, users can build and automate data transformation pipelines to turn diverse input data
into governed data products. Snowflake can operate on common file formats in your cloud storage buckets
as easily as on input streams (e.g., from Kafka) or its relational tables. Supported file formats
include JSON, XML, Parquet, AVRO, Delta Lake[2], Apache Iceberg[3], and others. Additionally, Snowflake
has support for unstructured data, such as images, video, or other binary formats. Data can be
manipulated in the Snowflake platform using SQL, Python[4], Scala, Java, and Javascript, or by invoking
external functions on the broader cloud platform.

Snowflake may or may not provide all functional capabilities that your domain teams require, but it
offers a significant range of capabilities in a single service that would otherwise require a
collection of cloud services to be integrated. Such integration can be complex and time consuming and
requires highly skilled individuals.

FIGURE 1: SNOWFLAKE AS A SINGLE PLATFORM FOR DIFFERENT TYPES OF DATA AND WORKLOADS

[2] In public preview at the time of publication, August 2022.
[3] In private preview at the time of publication, August 2022.
[4] In public preview at the time of publication, August 2022.

Snowflake is a distributed platform, not a monolith
Snowflake is a distributed but interconnected platform that avoids silos and enables distributed teams
to exchange data in a governed and secure manner. How does that work? A company can create one or
multiple Snowflake accounts that can reside in the same or in different cloud regions and platforms
(Figure 2). Each account can be home to multiple separate databases for which compute and storage
resources can be deployed and scaled independently, in a distributed way.

Different domain teams can work autonomously using independent compute power in separate databases or
even in separate accounts while still using the underlying Snowflake platform to share data assets
with each other. Note that the Snowflake concept of a “database” is not merely a traditional
relational database but also includes all other functional capabilities in Snowflake such as data
engineering, data lake, data warehousing, data sharing, and data science. Using compute clusters to
combine and process data from multiple databases or accounts is a core capability of the Snowflake
platform.

Snowflake has built-in data sharing and marketplace capabilities
Data producers in Snowflake can share data, data services or applications with other accounts by
publishing metadata (“listings”). Using listing discovery controls, producers can share privately with
other accounts or groups of accounts, or share publicly via the Snowflake Marketplace. Data producers
can specify SLAs or SLOs for the data that they are sharing, such as the update frequency, the amount
of history, the temporal granularity of the data, and other properties that help describe the data as
a product.

Other teams can search to discover relevant data assets available to them and obtain or request
access. Such data consumers gain live access to the shared data, which remains under the control of
the producer who can customize access policies or revoke access at any time. The access to shared data
does not require an ETL or data movement process to be implemented by the producer or consumer.
Producers can also publish and share external tables, which are “views” over files stored outside of
Snowflake, and which can optionally include Delta Lake and Iceberg formats.

Producers can even share data with third parties outside of the company even if those parties are not
active Snowflake clients. For example, a data producer can share data externally via a so-called
Snowflake reader account and the full breadth of supported APIs. Or, they can periodically export
(partitioned) data to a cloud storage bucket using any of today’s popular file formats.
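
To make the producer side of data sharing more concrete, here is a minimal Python sketch that assembles the Snowflake SQL statements a producer might run to expose one table through a direct share. The function name, the object names, and the consumer account identifiers are hypothetical placeholders that do not come from this paper. The statements are meant to follow Snowflake's documented CREATE SHARE, GRANT ... TO SHARE, and ALTER SHARE syntax. Running them is left to whichever SQL client the domain team uses.

```python
def share_statements(share, database, schema, table, consumer_accounts):
    """Return the SQL a producer could run to share one table with other accounts.

    All identifiers passed in are illustrative placeholders.
    """
    fq_schema = f"{database}.{schema}"
    return [
        f"CREATE SHARE {share}",
        # A share needs usage on the containing database and schema
        # before any object inside them can be granted to it.
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO SHARE {share}",
        f"GRANT SELECT ON TABLE {fq_schema}.{table} TO SHARE {share}",
        # Consumers query the producer's data in place; nothing is copied.
        f"ALTER SHARE {share} ADD ACCOUNTS = {', '.join(consumer_accounts)}",
    ]


if __name__ == "__main__":
    for stmt in share_statements(
        "SALES_ORDERS_SHARE", "SALES", "ORDERS_PRODUCT", "ORDERS",
        ["MYORG.FINANCE", "MYORG.MARKETING"],
    ):
        print(stmt + ";")
```

Revoking access is the mirror image: a REVOKE ... FROM SHARE statement or ALTER SHARE ... REMOVE ACCOUNTS, which is how the producer keeps control of the shared data.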

[Figure: a Snowflake organization (top control across all of your accounts) containing Snowflake
accounts 1 to N in regions such as Frankfurt, Zurich, and London, connected by Snowflake Data Sharing.
There can be multiple databases within a Snowflake account.]

FIGURE 2: SNOWFLAKE ORGANIZATION, ACCOUNTS, AND DATABASES SUPPORT A DISTRIBUTED ARCHITECTURE

Snowflake offers a broad range of security and governance features
Federated governance is arguably one of the most challenging parts of a data mesh journey and often
requires one or more tools in combination to satisfy all requirements. At the platform level,
Snowflake supports role-based access control, row-level access policies, column-level data masking,
external tokenization, as well as data lineage, audit capabilities, and more. Users can also assign
one or more metadata tags (key-value pairs) to almost any kind of object in Snowflake, such as
accounts, databases, schemas, tables, columns, compute clusters, users, roles, tasks, shares, and
other objects. Tags are inherited through the object hierarchy and can be exploited to discover,
track, restrict, monitor, and audit objects based on user-defined semantics. Additionally, tag-based
access policies[5] enable users to associate an access restriction with a tag such that the access
policy is automatically applied to any matching data object that carries the relevant tag.

In Snowflake, most governance controls such as tags, access policies, or masking rules can be defined
separately from applying these controls to data objects. This enables domain owners to agree on common
tags or policies across domains while leaving their enforcement or extension to each domain
individually. Additionally, secure views and data clean room functionality can be used for analytics
over sensitive data that could not be shared otherwise.

From a product usage perspective, metrics such as telemetry and consumption data are collected, which
can be used for impact analysis. This enables domain teams to track how and how often their data
products are being used by different consumers.

Snowflake provides a near self-service experience
Some of the most common reasons for our clients to choose Snowflake include ease of use and near-zero
maintenance. These are critical properties for a self-service platform. For example, users can easily
instantiate and scale their own compute clusters without support from an IT infrastructure team.
Cloning dev and test environments is equally straightforward. A change data capture mechanism can be
set up with a 1-line SQL DDL statement. This focus on ease of use has been a guiding principle for all
features and functions on the Snowflake platform.

DATA PRODUCTS IN SNOWFLAKE
In a data mesh, each domain creates, maintains, and owns one or more data products that are shared
with other domains and data consumers. Treating data as a product requires most of all a
product-oriented mindset that must become an organizational habit. Additionally, domains need suitable
self-service tools that support the creation and management of data products. Let’s outline how
Snowflake can help you implement the concept of data as a product.

A data product is defined as the combination of data plus metadata, code, and infrastructure
dependencies.

• Data: In Snowflake the data of a data product can have various forms, such as tables, views, files
(JSON, XML, Parquet, Avro, CSV, etc.), or external tables that act as views over files outside of
Snowflake. A single data product can consist of multiple such objects. A typical practice for domains
is to use one schema per data product to group the data objects and optionally also the code for each
data product. Data producers can model the data in whichever way is best suited to satisfy the needs
of the data consumers.

• Metadata: The metadata of a data product includes the technical metadata of its data objects, such
as table names, column names, data types, or file format definitions. The metadata also includes
object dependencies, data lineage, and access history. Each object can also be annotated with tags
which are key-value pairs to express arbitrary metadata such as data origin, domain, sensitivity,
business terms, taxonomy, cost center, or other user-defined attributes.

When a data product is published in the Snowflake Marketplace, the data producer is prompted to
provide documentation such as a product description, the business need, examples, terms of service,
and a link to data product support. The producer is also prompted to specify data product SLOs such as
the update frequency, the amount of history, the temporal granularity of the data, and other
properties (see Figure 3).

• Code: The code of a data product includes the pipelines and transformations that create and refresh
a data product. In Snowflake, such code can include Snowflake Tasks, pipes, Streams, Stored
Procedures[6], user-defined functions, etc., all of which are Snowflake objects that can be grouped in
a schema per data product. The code in these objects can be SQL, Java, Javascript, Scala, or Python
and runs natively in the Snowflake platform.

The code can also include policies. In Snowflake this can be code for role-based access control,
dynamic data masking policies, row-level access control policies, secure views, object tagging, or
code to classify or anonymize/tokenize the data.

• Infrastructure dependencies: For example, a Snowflake task that schedules and orchestrates the
pipeline to refresh a data product can specify a certain compute cluster for the job. This can be a
dedicated compute resource for just one data product or shared among multiple data products. Either
way, the cluster can be suspended and resumed automatically as needed to incur cost only when it
performs work. Also, clusters can be scaled up and down in a self-service manner. Tasks, pipes, and
other operations can also be serverless to reduce or remove the need for explicit infrastructure
dependencies.

[5] In public preview at the time of publication, August 2022.

FIGURE 3: SPECIFYING DATA PRODUCT SLOS FOR A DATA PRODUCT LISTING
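
The governance section above notes that controls can be defined once and applied separately by each domain. The following Python sketch illustrates that split by generating Snowflake SQL for a shared tag, a masking policy attached to that tag, and a per-domain statement that tags a column. The function names, tag and policy names, role names, and the mask literal are assumptions made for illustration. The statements are intended to follow Snowflake's documented CREATE TAG, CREATE MASKING POLICY, ALTER TAG ... SET MASKING POLICY, and ALTER TABLE ... SET TAG syntax, which may differ between releases.

```python
def define_controls(tag, policy, unmasked_roles):
    """SQL a central governance team could run once (hypothetical names).

    Attaching the policy to the tag means every STRING column that later
    carries the tag is masked without a per-column policy assignment.
    """
    roles = ", ".join(f"'{role}'" for role in unmasked_roles)
    return [
        f"CREATE TAG IF NOT EXISTS {tag}",
        f"CREATE MASKING POLICY IF NOT EXISTS {policy} "
        "AS (val STRING) RETURNS STRING -> "
        f"CASE WHEN CURRENT_ROLE() IN ({roles}) THEN val "
        "ELSE '***MASKED***' END",
        f"ALTER TAG {tag} SET MASKING POLICY {policy}",
    ]


def apply_tag(table, column, tag, value):
    """SQL a domain team could run to opt one of its columns into the control."""
    return f"ALTER TABLE {table} MODIFY COLUMN {column} SET TAG {tag} = '{value}'"


if __name__ == "__main__":
    for stmt in define_controls("GOV.TAGS.PII", "GOV.POLICIES.MASK_STRING", ["PII_READER"]):
        print(stmt + ";")
    print(apply_tag("SALES.ORDERS_PRODUCT.CUSTOMERS", "EMAIL", "GOV.TAGS.PII", "email") + ";")
```

Keeping the two functions separate mirrors the division of labor described in the text: domains agree on the tag and policy centrally, and each domain decides which of its own columns carry the tag.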

[6] In public preview at the time of publication, August 2022.

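
The "code" and "infrastructure dependencies" parts of a data product can be sketched in the same way. The Python function below generates the SQL for one schema per data product, a stream on a source table (the one-line change data capture statement mentioned earlier), a task that refreshes the product on a schedule, and the compute cluster (warehouse) that the task depends on. The naming scheme (`<PRODUCT>_CHANGES`, `<PRODUCT>_REFRESH`), the warehouse size, and the suspend timeout are assumptions chosen for illustration, not recommendations from this paper.

```python
def data_product_pipeline(database, product, source_table, refresh_sql,
                          warehouse, schedule_minutes=60):
    """Return SQL that sets up a stream-and-task refresh for one data product.

    refresh_sql is supplied by the caller and is expected to read from the
    stream named <database>.<product>.<product>_CHANGES (a naming assumption).
    """
    schema = f"{database}.{product}"
    stream = f"{schema}.{product}_CHANGES"
    task = f"{schema}.{product}_REFRESH"
    return [
        # One schema groups the data objects and the code of the product.
        f"CREATE SCHEMA IF NOT EXISTS {schema}",
        # Infrastructure dependency: a warehouse that suspends when idle
        # and resumes automatically when the task needs it.
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse} "
        "WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE",
        # Change data capture on the source table.
        f"CREATE STREAM IF NOT EXISTS {stream} ON TABLE {source_table}",
        # The task only runs its body when the stream has new rows.
        f"CREATE TASK IF NOT EXISTS {task} WAREHOUSE = {warehouse} "
        f"SCHEDULE = '{schedule_minutes} MINUTE' "
        f"WHEN SYSTEM$STREAM_HAS_DATA('{stream}') AS {refresh_sql}",
        # Tasks are created in a suspended state and must be resumed.
        f"ALTER TASK {task} RESUME",
    ]


if __name__ == "__main__":
    refresh = (
        "INSERT INTO SALES.ORDERS.ORDERS_CLEAN "
        "SELECT ORDER_ID, AMOUNT FROM SALES.ORDERS.ORDERS_CHANGES "
        "WHERE METADATA$ACTION = 'INSERT'"
    )
    for stmt in data_product_pipeline("SALES", "ORDERS", "SALES.RAW.ORDERS",
                                      refresh, "SALES_WH", 30):
        print(stmt + ";")
```

A serverless task would drop the WAREHOUSE parameter and with it the explicit infrastructure dependency. That variant is not shown here.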
Snowflake supports a variety of input and output ports for data products, including streaming data
ingestion, a Kafka connector, a Spark connector, a Dataframe API, automatic data ingestion from cloud
storage buckets, a REST API, file formats, and of course SQL APIs such as JDBC, ODBC, .NET, and APIs
for many popular programming languages. Snowflake’s collaboration capabilities can also be used to
securely access and distribute data, services and applications seamlessly across clouds without
requiring additional ETL pipelines or integrations.

Data products should also exhibit a number of important properties. Table 1 lists some examples of
Snowflake capabilities that can help you achieve these characteristics.

DATA PRODUCT CHARACTERISTICS | EXAMPLES OF SNOWFLAKE CAPABILITIES (NOT EXHAUSTIVE)
Secure | Role-Based Access Control, Row-Level Access Policies, Dynamic Data Masking, Encryption, Tokenization
Discoverable | Targeted Discovery/Snowflake Marketplace, Optional integration with 3rd-party catalog
Addressable | Snowflake data shares, standardized access cross-cloud and cross-region
Understandable | Custom metadata tags, data listings with documentation, statistical shape of the data in Snowsight
Trustworthy | SLOs/SLAs such as update frequency or granularity, data lineage, object dependencies, access history
Natively accessible | SQL, Python, Java, Scala, SQL APIs, REST API, Dataframes, etc., to access multi-model data (structured, semi-structured, unstructured, various file types, etc.)
Interoperable | ANSI SQL data types, unified metadata and common APIs across domains, Snowflake collaboration, data sharing, marketplace, data exchange
Valuable on its own | Composite data products that consist of multiple objects, Data products can consist of data objects plus functions that can be shared with data product consumers

TABLE 1: DATA PRODUCT CHARACTERISTICS SUPPORTED IN SNOWFLAKE

ARCHITECTURE OPTIONS FOR DISTRIBUTED DOMAINS
Now let’s look at different Snowflake topologies that companies have chosen as a platform to support
distributed domains. These topologies are general patterns and an actual implementation can vary
based on specific requirements and preferences.

• “Account per domain”: Each domain uses a separate Snowflake account
    • Maximum isolation between domains.
    • Different domains can operate in different cloud regions and cloud platforms.
    • Enables a multi-region and multi-cloud data mesh with a consistent Snowflake experience and
      built-in data sharing capabilities between domains, based on a central data exchange of metadata
      where all domains can publish and obtain access to data products.

• “Database per domain”: Each domain uses one or more separate Snowflake databases
    • All of these databases are managed in a single Snowflake account.
    • Simplified management of users, security, and governance across domains.
    • Access to data products can easily be provided by setting object level permissions across
      databases.
    • Each domain team can still spin up and scale their own compute clusters independent from other
      domains.

• “Schema per domain”: Each domain uses separate schemas in a single database
    • Lowest degree of isolation between domain environments.
    • Each domain team can still spin up and scale their own compute clusters isolated from other
      domains.
    • Potentially higher effort in naming conventions to distinguish between objects from different
      domains.
    • Can be useful for sub-domains in a domain/subdomain scenario.
    • In the remainder of this article we will not discuss the “Schema per domain” option in detail
      but it has many similarities to “Database per domain.”

• “Heterogeneous domains”: Domains can use different IT stacks
    • Some domains use Snowflake and some use other systems.
    • Some domains are in the cloud and some can potentially be on-premises.
    • Usually incurs a higher degree of complexity in order to accommodate heterogeneous domain
      environments.
    • Requires special consideration since it can be contrary to the data mesh goal of using a common
      domain-agnostic self-serve platform across all domains.

Further architecture variations or hybrid approaches derived from these base types are possible and
plausible. For example, a company might choose “database per domain” and have these databases in
several accounts instead of a single account. Also, some domains might be using separate databases
while others use an entire Snowflake account. Also, the environment that a domain team uses often
consists of Snowflake plus additional tools based on their requirements and skills.

The key point is that Snowflake supports multiple architecture choices that enable different
trade-offs between domain autonomy and decentralization on the one hand versus different degrees of
complexity and operational management on the other hand.

Each company needs to find the right balance between centralization and decentralization that works
best for their size, legacy, and organizational culture. The same applies to federated governance
where companies need to choose the right balance between centralized control and local domain
autonomy that works best for them.

The following sections discuss these architecture options in more detail. The focus of that
discussion lies mainly on the Snowflake topologies rather than the integration with third-party tools
that our clients often use together with Snowflake for their data mesh initiatives.
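
One way to see how the isolation boundary moves between the three Snowflake-only topologies is to map a domain and one of its data products to an account, database, and schema under each option. The Python sketch below does that. The topology keys, the `ProductLocation` type, and the default names (`COMPANY`, `MESH`, `PRODUCTS`) are all hypothetical conventions invented for this illustration. They are not Snowflake features or recommendations from this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductLocation:
    account: str
    database: str
    schema: str


def locate_product(topology, domain, product,
                   shared_account="COMPANY", shared_database="MESH"):
    """Map a (domain, data product) pair to account/database/schema names.

    The conventions are illustrative only; they show where the domain
    boundary sits in each topology.
    """
    if topology == "account_per_domain":
        # The account is the domain boundary; one schema per data product.
        return ProductLocation(domain, "PRODUCTS", product)
    if topology == "database_per_domain":
        # One shared account; the database carries the domain name.
        return ProductLocation(shared_account, domain, product)
    if topology == "schema_per_domain":
        # One shared database; the schema is the domain, so objects of
        # different products need a naming prefix inside that schema.
        return ProductLocation(shared_account, shared_database, domain)
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for topology in ("account_per_domain", "database_per_domain", "schema_per_domain"):
        print(topology, locate_product(topology, "SALES", "ORDERS"))
```

In the schema-per-domain case the product name no longer appears in the location at all, which is one way to read the remark above about higher effort in naming conventions.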

Single Account: Database Per Domain
A popular topology that many of our data mesh clients are adopting uses a single Snowflake account in
which domains use separate databases and separate compute clusters as their autonomous environments.
Each domain can be assigned one or more databases and clusters for their development, test, and
production needs. The self-service nature of the platform enables domains to use Snowflake’s Zero-Copy
Cloning capabilities to (re)create dev and test environments instantly and frequently. Additionally,
different users within a domain can be allowed to spin up and scale their own compute clusters for
their respective needs in a self-service manner. Still, cost and consumption monitors as well as
quotas can be configured for domains or other levels of granularity in the user and resource
hierarchy.

Having all domains use the Snowflake Data Cloud enables them to have their separate environments and
compute resources without being physical silos that would make data product access challenging.

Some governance is decided centrally and is applied to all databases with a DevOps process. This can
be facilitated by features such as object tags to keep an easy overview of the different objects
owned by the domains. Within the domains governance is controlled by the domain teams that apply
role-based access control as well as row- and column-level access policies to secure data and exclude
users and domains from getting unwanted access to certain data.
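
The cloning and quota mechanisms mentioned in this section can be illustrated with a short Python sketch that generates the corresponding SQL for one domain. The database naming scheme (`<DOMAIN>_PROD`, `<DOMAIN>_DEV`, `<DOMAIN>_TEST`), the monitor name, and the 90/100 percent thresholds are assumptions made for the example. The statements are intended to follow Snowflake's documented CREATE DATABASE ... CLONE and CREATE RESOURCE MONITOR syntax. Creating resource monitors typically requires an account administrator role, so in practice this part would likely sit with the platform team rather than the domain.

```python
def domain_environment_statements(domain, warehouse, monthly_credit_quota,
                                  environments=("DEV", "TEST")):
    """Return SQL that (re)creates cloned environments and a credit quota.

    Assumes a production database named <domain>_PROD already exists and
    that the given warehouse belongs to the domain (both hypothetical).
    """
    prod_db = f"{domain}_PROD"
    monitor = f"{domain}_MONITOR"
    # Zero-copy clones: each clone shares storage with production until
    # data on either side changes.
    stmts = [
        f"CREATE OR REPLACE DATABASE {domain}_{env} CLONE {prod_db}"
        for env in environments
    ]
    stmts += [
        # Notify at 90% of the monthly quota, suspend the warehouse at 100%.
        f"CREATE OR REPLACE RESOURCE MONITOR {monitor} WITH "
        f"CREDIT_QUOTA = {monthly_credit_quota} FREQUENCY = MONTHLY "
        "START_TIMESTAMP = IMMEDIATELY "
        "TRIGGERS ON 90 PERCENT DO NOTIFY ON 100 PERCENT DO SUSPEND",
        f"ALTER WAREHOUSE {warehouse} SET RESOURCE_MONITOR = {monitor}",
    ]
    return stmts


if __name__ == "__main__":
    for stmt in domain_environment_statements("SALES", "SALES_WH", 100):
        print(stmt + ";")
```

Because CREATE OR REPLACE drops and recreates the clone, rerunning the first statements is a simple way to refresh dev and test from production.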

[Figure: data sources, a single Snowflake account containing one Snowflake database per domain
environment, and internal and external data consumers. A third-party catalog / Snowflake Marketplace
holds the inventory of shared data products (metadata search, access request workflow).]

FIGURE 4: SINGLE SNOWFLAKE ACCOUNT - DATABASE PER DOMAIN

Every domain can have multiple schemas, with one serving as a layer to make products available to
other domains. Another approach would be using a common share database where every domain will have a
schema to publish its data products as views (no copies). These products can be structured,
semi-structured, or unstructured data, depending on what the product comprises. The products are then
listed in a third-party data catalog to be discoverable. For requesting access to a product, we have
seen multiple ways such as a manual approach, where the requestor needs to open a ticket, which then
will be processed by the domain team and access allowed or denied by granting an access role to the
requestor. Some catalogs provide a more automatic flow.

The advantages of having all the domains in one Snowflake account are:

• Access to data products can easily be done by setting intra-database permissions.
• Centralized network, security and governance policy administration simplifies the overall
management.
• Disaster recovery is simpler as it only requires one other account in another region or cloud to
support.

Naming conventions should be planned carefully, as there can be a lot of objects, considering that
every domain might require DT(A)P (Development, Test, Acceptance, Production) environments, which they
can easily create with Zero-Copy Cloning.

Similar to this approach is using a schema per domain. The implications of this approach are similar
to the database per domain approach, as everything is organized logically within one Snowflake
account. It should be noted, however, that having a database per domain is easier regarding sharing
data with external consumers publicly via Snowflake Marketplace or privately using listing discovery
controls.

Multiple Accounts: Account Per Domain
Another possible topology allows each domain to operate in a separate Snowflake account. These
accounts can be in the same or in different cloud regions and cloud platforms. The global Snowflake
Data Cloud enables companies and domains to share data across accounts, regions, and cloud platforms
and easily obtain standardized access to each other’s data products in a secure and governed fashion.
Some of our clients are using this capability to support a multi-region, multi-cloud data mesh.

FIGURE 5: THE SNOWGRID AS A GLOBAL DATA CLOUD PLATFORM

There are various reasons why companies are choosing this topology. For example, a company may be
operating in a globally distributed manner where different domains might naturally align with
different locations and regions in the world. Some companies may be operating globally and need to
observe data locality requirements (for example, an international company where some of the data is
not allowed to leave Europe without anonymization, masking, or other measures to ensure compliance
with data privacy regulations).

Another common reason is mergers and acquisitions that may force a company to exchange data across
regions or cloud platforms. Some companies intentionally pursue a multi-cloud strategy for
diversification or to accommodate preferences and existing investments that different business units
have already made. Using separate accounts also achieves greater autonomy of the domains (for example,
if there is a requirement to have separate user management and security management for each domain).

The resulting topology (Figure 6) is logically very similar to using a separate database per domain,
except that each domain now “owns” a separate Snowflake account and uses the Snowflake data sharing
and Marketplace capabilities to make data products accessible to others.

The advantages compared to the database per domain approach can be the following:

• Data sharing and collaboration capabilities can be used across domains.
• Global naming standards can be applied more easily as each account is an independent namespace.
• Cloud platform and regional preferences can be supported.
• There is separate security and user management per account.

[Figure: data sources, one Snowflake account per domain environment (Frankfurt, London, Zurich,
US West, Singapore, Australia), and internal and external data consumers. A third-party catalog /
Snowflake Marketplace holds the inventory of shared data products (metadata search, access request
workflow).]

FIGURE 6: MULTIPLE ACCOUNTS - ONE ACCOUNT PER DOMAIN
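
In the account-per-domain topology, a consuming domain gets at another domain's data product by mounting the producer's share as a local, read-only database. The Python sketch below generates the consumer-side SQL. The function name, account identifier, share, database, and role names are hypothetical. The sketch assumes the share has already been made available to the consumer account (for a direct share, that usually means both accounts are in the same region; cross-region access goes through listings or replication, which is not shown).

```python
def mount_share_statements(provider_account, share, local_database, reader_roles):
    """Return SQL a consuming domain could run to use a shared data product.

    provider_account, share, and the role names are illustrative placeholders.
    """
    stmts = [
        # The mounted database is a live, read-only view of the producer's
        # objects; no data is copied into the consumer account.
        f"CREATE DATABASE {local_database} FROM SHARE {provider_account}.{share}",
    ]
    stmts += [
        # Imported privileges let local roles query everything in the share.
        f"GRANT IMPORTED PRIVILEGES ON DATABASE {local_database} TO ROLE {role}"
        for role in reader_roles
    ]
    return stmts


if __name__ == "__main__":
    for stmt in mount_share_statements(
        "MYORG.SALES", "SALES_ORDERS_SHARE", "SALES_ORDERS", ["FINANCE_ANALYST"],
    ):
        print(stmt + ";")
```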

Heterogeneous Architecture
Some clients have asked us how to integrate other non-Snowflake domain environments into the
topologies discussed above. Such integrations result in a heterogeneous architecture where not all of
the domains are using the same domain-agnostic data platform to implement their pipelines and data
products. Often this is motivated by the desire to reuse multiple different repositories or technology
stacks that already exist in different parts of the organization.

We found that such a heterogeneous architecture usually drives up the cost and complexity of a data
mesh journey. The reason is that greater heterogeneity of the participating systems makes it harder to
ensure consistency in governance, security, metadata, interoperability standards, performance,
required skills, IT support, and other critical areas. Therefore we encourage clients to carefully
consider the role of the diverse systems and repositories that they seek to integrate into a data
mesh. Are those systems truly used as domain environments that build and serve data products? Or
should they rather be viewed as data sources that domains take as input? The latter case often enables
clients to fall back to the topologies discussed above.

One approach to integrating non-Snowflake domain implementations is for them to push data assets or
“near-ready data products” to an intermediate layer for which Snowflake can act as a “proxy” that
serves the data products with consistent governance, security, interoperability, etc., to the rest of
the data mesh.

This intermediate layer could be, for example, Kafka topics that feed into Snowflake via continuous
ingestion followed by automatic updates of data products in Snowflake. The intermediate layer could
also be one or more cloud storage buckets in Amazon S3, Azure Blob Storage, Azure Data Lake Storage,
or Google Cloud Storage. Data formats can include JSON, XML, Parquet, AVRO, Apache Iceberg, Delta
Lake, and others. Snowflake can either auto-ingest new files from storage buckets continuously for
best performance, security, and automatic management, or expose read access to such files as External
Tables to the rest of the data mesh. Snowflake’s External Tables are essentially views over data that
resides in files outside of Snowflake. Still, External Tables are first-class data objects in
Snowflake that can be secured and governed, joined, and even shared via Snowflake collaboration much
like other data objects in Snowflake. This enables Snowflake to act as an integration layer that can
expose outside data in a consistent and governed manner without necessarily ingesting and duplicating
the data.

[Figure: non-Snowflake domains, internal and external data consumers, and a third-party catalog /
Snowflake Marketplace holding the inventory of shared data products (metadata search, access request
workflow).]

FIGURE 7: HETEROGENEOUS ARCHITECTURE

Various other options for integrating non-Snowflake environments into the data platform exist but are
beyond the scope of this document.

Some companies look at data virtualization as a potential solution to integrating a diverse set of
domain environments. While there are certainly valid use cases for data virtualization, we found that
it also creates a number of challenges. One challenge is performance when data from multiple different
repositories needs to be joined. This typically requires data movement to bring data into one common
place to compute the join, even if other predicates can be pushed down to the data sources.

This can prohibit virtualization for performance-sensitive use cases. With Snowflake, a join across
multiple data objects in a single database has approximately the same performance characteristics as
when these data objects are in separate databases or even in separate Snowflake accounts, which is a
non-trivial property of the Snowflake platform architecture. Another challenge that we have seen with
virtualization in some companies is that it can encourage teams to continue to work in their
respective technology islands, which are often very domain-specific, instead of striving towards a
common and domain-agnostic, self-service platform.
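
The two bucket-based integration options described earlier in this section (continuous auto-ingest versus External Tables) can be contrasted with a small Python sketch that generates the SQL for each. The function name, the `mode` values, the `_PIPE` suffix, and all object names are hypothetical. The sketch assumes a storage integration already exists. In ingest mode it also assumes the target table already exists with a single VARIANT column so that a plain COPY INTO works. The statements are intended to follow Snowflake's documented CREATE STAGE, CREATE PIPE, and CREATE EXTERNAL TABLE syntax.

```python
def integration_statements(mode, stage, bucket_url, storage_integration,
                           target, file_type="PARQUET"):
    """Return SQL for one of two ways to serve files from a storage bucket.

    mode="ingest":   a pipe copies new files into the target table.
    mode="external": an external table reads the files where they are.
    """
    stage_stmt = (
        f"CREATE STAGE IF NOT EXISTS {stage} URL = '{bucket_url}' "
        f"STORAGE_INTEGRATION = {storage_integration}"
    )
    file_format = f"FILE_FORMAT = (TYPE = '{file_type}')"
    if mode == "ingest":
        return [
            stage_stmt,
            # Event notifications from the bucket trigger the load.
            f"CREATE PIPE IF NOT EXISTS {target}_PIPE AUTO_INGEST = TRUE AS "
            f"COPY INTO {target} FROM @{stage} {file_format}",
        ]
    if mode == "external":
        return [
            stage_stmt,
            # Data stays in the bucket; only file metadata is refreshed.
            f"CREATE EXTERNAL TABLE IF NOT EXISTS {target} "
            f"WITH LOCATION = @{stage} {file_format} AUTO_REFRESH = TRUE",
        ]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("ingest", "external"):
        for stmt in integration_statements(
            mode, "MESH.LANDING.ORDERS_STAGE", "s3://example-bucket/orders/",
            "MESH_S3_INTEGRATION", "MESH.LANDING.ORDERS_RAW",
        ):
            print(stmt + ";")
```

Either way, the resulting table can then be tagged, secured, and shared like any other object, which is what lets Snowflake act as the "proxy" for a non-Snowflake domain.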

SUMMARY
Data mesh is not a silver bullet for all data management and data integration challenges. But if you determine that
data mesh is the right approach for your enterprise, make sure that you focus on the organizational and non-technical
questions that are mandatory for success. Some examples include organizational changes, roles and responsibilities,
staffing, incentives and accountability, buy-in from key stakeholders, or shifting the mindset to product thinking.

Eventually, you will need to design a self-service IT architecture that can support distributed domains and data products
with federated governance. Snowflake can play that key role as an easy-to-use self-service platform for domain teams.
Snowflake supports different topologies that allow companies to choose the desired degree of decentralization and
domain autonomy while ensuring that domains remain interconnected and interoperable. Snowflake enables single-
account topologies as well as multi-region, multi-cloud architectures and supports the integration of external domains
or multi-company collaboration setups. The underlying Snowflake platform with the global Snowgrid and the Snowflake
Marketplace act as the connecting tissue that helps organizations avoid the risk of creating data silos.

Additionally, Snowflake provides a broad range of features that help companies implement the concepts of data as a
product and federated governance. Snowflake also integrates easily with a broad range of third-party tools that can
provide additional platform capabilities. The Snowflake Data Cloud is an excellent technology choice to complement
the organizational changes and processes that are required for a successful data mesh transformation. To learn more
about how Snowflake can help, visit snowflake.com/data-mesh

ABOUT SNOWFLAKE
Snowflake delivers the Data Cloud—a global network where thousands of organizations mobilize
data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations
unite their siloed data, easily discover and securely share governed data, and execute diverse analytic
workloads. Wherever data or users live, Snowflake delivers a single and seamless experience across
multiple public clouds. Snowflake’s platform is the engine that powers and provides access to the
Data Cloud, creating a solution for data warehousing, data lakes, data engineering, data science, data
application development, and data sharing. Join Snowflake customers, partners, and data providers
already taking their businesses to new frontiers in the Data Cloud. snowflake.com

© 2022 Snowflake Inc. All rights reserved. Snowflake, the Snowflake logo, and all other Snowflake product, feature and service
names mentioned herein are registered trademarks or trademarks of Snowflake Inc. in the United States and other countries.
All other brand names or logos mentioned or used herein are for identification purposes only and may be the trademarks of their
respective holder(s). Snowflake may not be associated with, or be sponsored or endorsed by, any such holder(s).
