• LOGIN
  • No products in the cart.

Snowflake Tutorial Overview Updated- 2020

Snowflake has a unique architecture, built for the cloud. one of the most advantages is the true separation of storage and compute. The storage layer keeps the info on immutable cloud storage, like Amazon S3 or Azure Blob Storage. The compute layer consists of warehouses. you’ll check out a warehouse as a virtual machine which does the computations on the data.

Architecture

The architecture are often summarized as follows:

Database Storage

When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage.

Snowflake manages all aspects of how this data is stored — the organization, file size, structure, compression, metadata, statistics, and other aspects of knowledge storage are handled by Snowflake. the info objects stored by Snowflake aren’t directly visible nor accessible by customers; they’re only accessible through SQL query operations run using Snowflake.

Query Processing

Query execution is performed within the processing layer. Snowflake processes queries using “virtual warehouses”. Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider.

Each virtual warehouse is an independent compute cluster that doesn’t share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.

Cloud Services

The cloud services layer may be a collection of services that coordinate activities across Snowflake. These services tie together all of the various components of Snowflake so as to process user requests, from login to question dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider.

Among the services in this layer:

  • Authentication

  • Infrastructure management

  • Metadata management

  • Query parsing and optimization

  • Access control

Features Of Snowflake

Data Loading in Snowflake

Though snowflake provides Bulk and Continuous loading for ingesting data, the simplest solution depends on the quantity and frequency of knowledge load. Data from flat files are often loaded using Web UI and bulk data are often loaded by various methods. the majority data loading are often automated in snowflake using snow pipe. the info from the source is loaded into stage files which are then loaded into tables.

Bulk Loading:

Enables loading batches of knowledge from any external cloud or the local system to an indoor cloud location before the info is loaded into the table using the COPY command. Few transformations supported by Snowflake while loading data are column reordering, column omission, casts, truncating text strings.

Continuous loading:

Small volumes of knowledge are often loaded employing a method of continuous loading and makes them availab le incrementally for analysis. it’s done using snow pipes.

The data are often loaded from the local system, cloud services like AWS S3, Google cloud storage, Microsoft Azure. The file formats that are supported are delimited files (CSV, TSV etc), JSON, Avro, ORC, Parquet, XML. Loading of structured and unstructured data are often done on an equivalent table.

Data Storage in Snowflake

The compressed and optimised data is stored during a columnar format by Snowflake in cloud storage. Continuous Data Protection (CDP) which incorporates time travel and fail-safe may be a feature available to all or any accounts at no additional cost. it’s designed to supply future protection for your data. Unless mentioned at the time of creation, the tables in snowflake are created as permanent data. Snowflake stores these tables by dividing the rows into multiple micro partitions. The rows within the table are grouped and mapped into separate micro-partitions and are stored in columnar fashion. Any changes or updates within the data will affect the micro partition to reflect the new data.

  • Temporary tables: Temporary files exist only within the user session and specified duration of session. they need no fail-safe option and has time travel retention period of 0-1 day.
  • Transient tables: These tables possess the characteristics of both permanent and temporary tables. They possess no Fail-safe and have a retention period of 0-1 day just like the temporary tables but aren’t related to any sessions, unlike temporary tables.
  • Staged File Storage : The files containing the info to be loaded are stored piecemeal , both internal and external stages are supported by Snowflake. It offers cloning feature which will be useful for creating instant backups that don’t incur any additional costs.

Data Modelling

Snowflake supports creating views, procedures and functions using SQL codes from the tables stored.

  • Views: Different views are often created supported the need which will grant specific access to the users using CREATE VIEW statement. There are two sorts of view – on materialized view which are usually mentioned as views and materialized views. The results of the non-materialized views aren’t stored whereas the results of the materialized views are stored more sort of a table. The secure view allows the info owner to grant access to other Snowflake users for a selected subset of knowledge which provides cell-level security in multi-tenant situations.
  • Stored Procedures:  Stored procedures allow us to make complex business logics by combining the SQL statements with procedural logic. These are written in Snowflake using JavaScript code which may execute the SQL statements by calling a JavaScript API which provides error handling and procedural logic.
  • Functions (User Defined Functions): The user-defined functions in Snowflake are often written using SQL statements or JavaScript code which returns scalar or tabular results. Secure UDFs allows to link, join and analyse data with data from other users while preventing the access of data by them which may be shared.

Authorization and Authentication

Since Snowflake may be a data warehouse for cloud, it uses internet protocols for network communication and security. Snowflake authorization includes OAuth, Federated Authentication, Multi-factor Authentication.

OAuth

For validation and authorization, Snowflake provisions OAuth which is an Open standard Protocol that permits clients authorized access to Snowflake without storing or sharing user login credentials which are referred to as delegated authorization. Snowflake aids OAuth to clients using integrations, a snowflake object that gives an interface between Snowflake and third-party services.

Federated Authorization

Users can hook up with Snowflake using Single-sign-on which is enabled by Federated Authentication. during a federated environment, authentication and access are separated using external entities which give independent authentication.

Multi-factor Authentication

Multi factored authentication can provide increased login security for users connecting to Snowflake which is provided as an integrated feature powered by Duo Security Service which is managed by Snowflake which needs installation of Duo Mobile Application.

Reporting

Reporting in snowflake is primarily to look at the results for a question that has been executed. In snowflake the worksheet offers the user an interface for creating SQL Queries and DML, DDL operations to look at the results only in tabular format, no other visualizations are available. Snowflake allows to make multiple worksheets with its own separate session and reserve it during a library of saved worksheets for later use. Saved worksheets aren’t accessible outside the snowflake web interface and can’t be shared with other users. Your current interface role determines the default role for worksheets that you simply open, but the worksheets aren’t tied to the interface role. Each worksheet has its own role which will be set independently.

Snowflake with its exceptional architecture along side its concurrency and ease are often used for Data warehousing. Snowflake has all the DW features under one roof and with its exceptional cost, we will leverage it for our purposes.

June 6, 2020
GoLogica Technologies Private Limited. All rights reserved 2024.