Commonly Asked Snowflake
7. Is there a cost associated with Time Travel in Snowflake?
Yes. Maintaining historical data for Time Travel consumes additional storage, which Snowflake bills at standard storage rates. Time Travel is the feature provided by Snowflake to retrieve data that has been changed or removed from Snowflake databases.
Using Time Travel you can:
1. Query data in the past that has since been updated or deleted.
2. Create clones of entire tables, schemas, and databases at or before specific
points in the past.
3. Restore tables, schemas, and databases that have been dropped.
Once the Time Travel period is over, data is moved to the Fail-safe zone.
For the Snowflake Standard edition, the default Time Travel period is 1 day.
For the Snowflake Enterprise edition:
for transient and temporary databases, schemas, and tables, the Time Travel period is at most 1 day;
for permanent databases, schemas, and tables, the Time Travel period can range from 1 to 90 days.
8. What is Fail-safe in Snowflake?
When the Time Travel period elapses, removed data moves into the Fail-safe zone, which retains it for 7 days for permanent tables. Once data is in Fail-safe, you cannot query it yourself; you must contact Snowflake Support to have it restored, which may take anywhere from 24 hours to several days. Fail-safe storage is billed on a 24-hour basis for the period the data is retained.
9. What is the difference between Time Travel and Fail-safe in
Snowflake?
Time Travel has a retention period ranging from 0 to 90 days for permanent databases,
schemas, and tables, whereas Fail-safe is fixed at 7 days.
Once a table or schema is dropped from the Snowflake account, it remains recoverable
through Time Travel for the retention period configured on that object (0 to 90 days).
Once the Time Travel period elapses, the object moves into the Fail-safe zone.
Snowflake provides three ways to specify a point in time for Time Travel:
a. Using a timestamp: you can query or clone data as it existed at, or immediately
before, the specified timestamp.
b. Using an offset: you can go back a given number of seconds from the current time.
c. Using a statement (query ID): you can query data as it existed immediately before
the specified statement ran.
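The three methods above can be sketched in SQL; the table name, timestamp, and query ID here are hypothetical:

```sql
-- 1. Using a timestamp: the table as it existed at that moment
SELECT * FROM orders AT (TIMESTAMP => '2024-01-15 08:00:00'::timestamp_tz);

-- 2. Using an offset: go back 3600 seconds (1 hour) from now
SELECT * FROM orders AT (OFFSET => -3600);

-- 3. Using a statement: the state just before a given query ran
SELECT * FROM orders BEFORE (STATEMENT => '01a2b3c4-0000-1111-2222-333344445555');
```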
For example, you can drop a table and then restore it, and read from it again, with UNDROP, as long as you are still within the Time Travel retention period.
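A minimal sketch of that drop-and-restore cycle, using a hypothetical orders table:

```sql
DROP TABLE orders;

-- Within the Time Travel retention period, the table can be brought back:
UNDROP TABLE orders;

-- The data is readable again:
SELECT COUNT(*) FROM orders;
```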
10. How does zero-copy cloning work and what are its advantages?
Zero-copy cloning creates a copy of a Snowflake object without physically copying its data.
You can clone objects such as databases, schemas, tables, streams, stages,
file formats, sequences, and tasks.
When you create a clone, Snowflake simply points the clone's metadata at the source
object's existing micro-partitions; new storage is used only when you modify the
cloned object (or the source) after cloning.
1. The main advantage is that a copy of the object is created almost instantly.
2. It consumes no extra storage as long as no updates happen on the cloned
object.
3. It is a fast way to take a backup of any object.
Syntax:
create table orders_clone clone orders;
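Cloning also combines with Time Travel, so you can clone an object as it existed at a past point in time. A sketch with hypothetical names:

```sql
-- Clone an entire schema
CREATE SCHEMA sales_clone CLONE sales;

-- Clone a table as it existed one hour ago (Time Travel + cloning)
CREATE TABLE orders_restore CLONE orders AT (OFFSET => -3600);
```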
11. What are Data Shares in Snowflake?
Data sharing is a Snowflake feature for sharing data across Snowflake accounts, and
even with consumers who do not have a Snowflake account. You control exactly which
objects, and therefore which datasets, are shared. For consumers outside Snowflake,
you create a reader account with read-only access to the shared data.
Below are the objects that can be shared:
Tables
External tables
Secure views
Secure materialized views
Secure UDFs
There are two types of users:
1. Data provider: The provider creates a share of a database in their account and
grants access to specific objects in the database. The provider can also share
data from multiple databases, as long as these databases belong to the same
account.
2. Data consumer: On the consumer side, a read-only database is created
from the share. Access to this database is configurable using the same, standard
role-based access control that Snowflake provides for all objects in the system.
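The provider and consumer sides above can be sketched in SQL; the share, database, and account names here are hypothetical:

```sql
-- Provider: create a share and expose one database, schema, and table
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- Provider: add the consumer account to the share
ALTER SHARE sales_share ADD ACCOUNTS = consumer_account;

-- Consumer: create a read-only database from the share
CREATE DATABASE sales_shared FROM SHARE provider_account.sales_share;
```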
12. What is Horizontal scaling vs Vertical scaling in Snowflake?
Snowflake Enterprise edition and above support multi-cluster warehouses, which let
you add clusters to handle concurrency. A warehouse can be scaled horizontally
or vertically.
A multi-cluster warehouse can be configured in two ways:
1. Maximized mode: the minimum and maximum number of clusters are set to the same
value (greater than 1), so all clusters run whenever the warehouse runs.
2. Auto-scale mode: the minimum and maximum number of clusters differ (for example,
min = 2 and max = 10), and Snowflake starts and stops clusters as the query load changes.
You can also resize a warehouse manually to match your query complexity. The scaling
methods available in Snowflake are:
Vertical scaling:
Scaling up: increasing the size of the warehouse (for example, Small to Medium).
Scaling down: decreasing the size of the warehouse (Medium to Small).
Horizontal scaling:
Scaling in: removing clusters from the multi-cluster warehouse (4 → 2).
Scaling out: adding clusters to the multi-cluster warehouse (2 → 4).
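Both directions can be expressed with ALTER WAREHOUSE; the warehouse name is hypothetical:

```sql
-- Vertical scaling: change the warehouse size
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'MEDIUM';

-- Horizontal scaling: adjust the cluster range of a multi-cluster warehouse
ALTER WAREHOUSE my_wh SET MIN_CLUSTER_COUNT = 2 MAX_CLUSTER_COUNT = 4;
```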
12. Where is metadata stored in Snowflake?
When a table is created in Snowflake, metadata is generated about the table,
including the row count, the creation timestamp, and column-level statistics such
as the minimum and maximum values of numerical columns.
This metadata is maintained by the cloud services layer, in storage managed entirely
by Snowflake. That is why queries that can be answered from metadata alone do not
require a running warehouse.
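For example, an unfiltered count can typically be served from metadata without consuming warehouse compute; the table name is hypothetical:

```sql
-- Answered from table metadata rather than by scanning micro-partitions
SELECT COUNT(*) FROM orders;
```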
13. Briefly explain the different data security features that are
available in Snowflake
Multiple data security options are available in Snowflake, such as:
1. Secure view
2. Reader account
3. Shared data
4. RBAC
14. What are the responsibilities of a storage layer in Snowflake?
The storage layer is the cloud storage service where the data resides.
Its responsibilities include:
1. Data protection
2. Data durability
3. Data Encryption
4. Archival of Data
15. Is Snowflake an MPP database?
Yes. MPP stands for Massively Parallel Processing. Snowflake is built on the cloud,
so it inherits cloud characteristics such as elasticity and scalability, and it
handles queries running in parallel by adding compute resources as needed.
Snowflake's compute layer follows a shared-nothing design: each virtual warehouse is
an independent MPP cluster whose nodes process a portion of the data in parallel,
while all warehouses read from the same central storage. When the query load
increases, a multi-cluster warehouse can automatically start additional clusters to
handle it.
16. Explain the different table Types available in Snowflake:
It supports three types of tables :
1. Permanent:
Permanent tables are the default table type in Snowflake. They occupy cloud storage,
and their data is divided into micro-partitions for faster retrieval. Permanent
tables get the full data protection features, such as Time Travel (up to 90 days on
Enterprise edition; the default is 1 day) and Fail-safe.
2. Temporary: Temporary tables exist only for the duration of the session that
created them; they are dropped automatically when the session ends and are not
visible to other sessions.
3. Transient: Transient tables are similar to temporary tables with respect to the
Time Travel period (at most 1 day) and have no Fail-safe, but unlike temporary
tables they persist across sessions and will not get dropped until explicitly
dropped.
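The table type is chosen at creation time; the names and columns here are hypothetical:

```sql
CREATE TABLE orders (id INT);             -- permanent (default)
CREATE TEMPORARY TABLE staging (id INT);  -- dropped when the session ends
CREATE TRANSIENT TABLE scratch (id INT);  -- persists, but no Fail-safe
```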
17. Explain the differences and similarities between Transient and
Temporary tables
Similarities: both have a Time Travel period of at most 1 day and no Fail-safe
period, which keeps their storage cost low. Differences: a temporary table exists
only within the session that created it and disappears when that session ends,
while a transient table is visible across sessions and persists until it is
explicitly dropped.
18. Which Snowflake edition should you use if you want to enable
time travel for up to 90 days :
The Standard edition supports a Time Travel period of up to 1 day. For Time Travel
of more than 1 day on permanent tables, you need a Snowflake edition higher than
Standard (Enterprise or above). Note that all Snowflake editions default to one day
of Time Travel; longer retention must be configured explicitly.
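On Enterprise edition or above, the retention period is set per object; the table name is hypothetical:

```sql
-- Extend Time Travel retention for a permanent table to 90 days
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 90;
```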
19. What are Micro-partitions?
Snowflake has its own way of storing data in cloud storage. Snowflake is a columnar
data warehouse: instead of storing data row by row, it splits a table into columnar
chunks called micro-partitions. They are "micro" because each partition is limited
to roughly 50 to 500 MB of uncompressed data.
Snowflake does not support indexes; instead it maintains metadata about each
micro-partition to retrieve data faster. A relational database answers a query by
using indexes, or by scanning rows, to find the requested data, and the overhead of
reading unused data makes retrieval time consuming and compute-heavy. Snowflake, by
contrast, consults the micro-partition metadata to determine which partitions can
contain the rows the user requested, and skips the rest. The metadata includes, for
each micro-partition, the range of values in each column and the number of rows it
holds. Using this metadata, Snowflake manages all micro-partitions for data storage
and retrieval.
Check the snowflake doc on micro-partitions.
20. By default, clustering keys are created for every table, how can
you disable this option
As new data continuously arrives and is loaded into micro-partitions, some columns
(for example, event_date) have near-constant values within each partition (the data
is naturally clustered by them), while other columns (for example, city) may have
the same values scattered across all partitions.
Snowflake lets you define clustering keys: one or more columns used to co-locate
related data in the same micro-partitions. (Note that clustering keys are not
created automatically; Automatic Clustering only runs on tables where you have
defined one.)
To suspend Automatic Clustering for a table, use the ALTER TABLE command
with a SUSPEND RECLUSTER clause. For example:
alter table t1 suspend recluster;
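For completeness, defining a clustering key and resuming reclustering look like this (table and column names are hypothetical):

```sql
-- Define a clustering key on the table
ALTER TABLE t1 CLUSTER BY (event_date);

-- Resume Automatic Clustering after a suspension
ALTER TABLE t1 RESUME RECLUSTER;
```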
1. What is Snowflake?
Snowflake is a cloud-based data warehousing service employing a unique architecture
that enhances data handling and analysis. It stands out from traditional data warehouse
solutions with features like simplicity, superior performance, high concurrency, and cost-
effectiveness. Snowflake's architecture, separating computation and storage,
revolutionizes data warehousing by consolidating all data into a single system,
facilitating storage and analysis with cloud tools.
2. What is special about snowflake architecture?
Snowflake's architecture is a hybrid of shared-nothing and shared-disk designs, combining a
central data repository for consistent data storage accessible from all computing nodes and
massively parallel processing for executing queries. Its three-layer structure includes Query
Processing, Data Storage, and Cloud Services, each playing a crucial role in managing and
optimizing data handling, from query execution to metadata management.
3. How to access Snowflake's data warehouse?
Snowflake's data warehouse can be accessed through various methods, including
ODBC and JDBC drivers, Python libraries, a web user interface, and the SnowSQL
command-line client, catering to diverse user preferences and technical requirements.
4. What benefits does a Snowflake database offer?
Snowflake's cloud-native design addresses challenges unresolved by traditional
warehouses. Key benefits are as follows:
Robust security.
Consistent high availability.
Seamless data sharing.
Impressive speed and performance.
The capability to handle both structured and unstructured data efficiently.
Snowflake's virtual warehouse is a set of clusters facilitating data operations like loading
and queries, providing essential resources such as CPU and temporary storage for
efficient data processing.
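Creating and sizing a virtual warehouse is a single statement; the name and settings below are hypothetical:

```sql
CREATE WAREHOUSE analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 300        -- suspend after 5 minutes of inactivity
  AUTO_RESUME = TRUE;
```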
13. Why do we use Fail-safe?
Fail-safe in Snowflake addresses the drawbacks of traditional data backup methods,
offering a more efficient, cost-effective, and time-saving solution for data recovery and
management, minimizing company downtime.
14. What are the data shares in Snowflake?
Snowflake enables safe data sharing across accounts, allowing users to share readable
but non-modifiable database objects. This feature facilitates data collaboration and
access across different management and geographical units.
15. What is Zero-copy cloning?
Zero-copy cloning in Snowflake allows the creation of data object copies without actual
data duplication, enabling real-time data access and manipulation with minimal resource
usage.
16. What is Snowpipe?
Snowpipe is an efficient, user-friendly service for continuous data loading into
Snowflake, optimizing the data loading process through micro-batching and immediate
data availability for analysis.
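A pipe wraps a COPY statement so that files arriving in a stage are loaded automatically; the stage, table, and file format here are hypothetical:

```sql
CREATE PIPE orders_pipe
  AUTO_INGEST = TRUE
  AS
  COPY INTO raw.orders
  FROM @raw.orders_stage
  FILE_FORMAT = (TYPE = 'CSV');
```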
17. What is Horizontal scaling and Vertical Scaling?
Horizontal scaling in Snowflake enhances concurrency through additional virtual
warehouses, while vertical scaling optimizes workload processing by choosing larger
warehouse sizes, demonstrating its scalability and performance adaptability.
18. Explain about the database storage layer.
Snowflake's database storage layer organizes data in an optimized, columnar format,
ensuring efficient storage and management of data with advanced organization,
compression, and access methodologies.
19. What is the Query processing layer?
Snowflake's query processing layer uses virtual warehouses, each a separate cluster,
ensuring isolated and efficient query execution enhancing performance and reliability.
20. What is the Cloud Service layer?
The Cloud Services layer in Snowflake manages and coordinates various tasks,
including user requests, authentication, and infrastructure management, demonstrating a
seamless integration of services for optimal performance.
A columnar database, like Snowflake, stores data in columns rather than rows,
enhancing analytical query processing and overall database performance, marking a
significant advancement in BI analysis.