Loading Data in Snowflake
In the previous blog we learned about Snowflake architecture and how to create a database and a table.
Since the table was empty, there are multiple ways to insert data into it.
But if we have to load data that already exists in a different source, in some file format, or on the same server, Snowflake has dedicated features for these scenarios that are quite helpful. Let's look into this.
Snowflake provides two methods to load data into tables -
Bulk Loading and Continuous Loading
Bulk Load — Loading the whole data set at one time
- Most frequently used load method
- Uses a warehouse
- Loads from stages
- Uses the COPY command
- Transformations are possible
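For example, a basic transformation can be applied directly inside the COPY command by selecting from the stage. A minimal sketch, assuming a stage and target table similar to the ones created later in this blog (the table name and column positions are illustrative assumptions):

```sql
-- Hypothetical example: pick columns and derive a value while copying.
-- $1, $2, ... refer to the columns of the staged CSV file.
COPY INTO OUR_FIRST_DB.PUBLIC.ORDERS_SUMMARY      -- assumed target table
FROM (
    SELECT
        $1,                      -- ORDER_ID
        CAST($2 AS INT) * 2      -- example transformation on AMOUNT
    FROM @MANAGE_DB.EXTERNAL_STAGE.AWS_STAGE
)
file_format = (type = csv field_delimiter=',' skip_header=1);
```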
Continuous Load -
- Designed to load small volumes of data
- Files are loaded automatically once they are added to a stage
- Latest data is always available for analysis
- Uses Snowpipe (a serverless feature)
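As a preview, a Snowpipe is defined by wrapping a COPY command; a minimal sketch, assuming the stage and table created later in this blog (the pipe name and schema are illustrative, and the cloud-side auto-ingest notification setup is omitted):

```sql
-- Hypothetical Snowpipe definition: new files landing in the stage
-- are loaded automatically, with no warehouse to manage.
CREATE OR REPLACE PIPE MANAGE_DB.PIPES.ORDERS_PIPE
AUTO_INGEST = TRUE
AS
COPY INTO OUR_FIRST_DB.PUBLIC.ORDERS
FROM @MANAGE_DB.EXTERNAL_STAGE.AWS_STAGE
file_format = (type = csv field_delimiter=',' skip_header=1);
```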
We need stages to perform loading in Snowflake -
Stages are database objects that store the location of the data files (and the credentials to access them) from which data can be loaded
Types of Stages -
External Stages
- Use an external cloud provider's storage service
- S3 from AWS
- Blob Storage from Azure
- Cloud Storage from GCP
- Database object created in a schema
- Created with the CREATE STAGE command
Note — Additional costs may apply if the region/platform differs
Internal Stages
- Local storage maintained by Snowflake
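For completeness, a named internal stage needs no URL or credentials, and local files are uploaded to it with the PUT command from SnowSQL. A minimal sketch (the stage name and file path below are illustrative assumptions):

```sql
-- Hypothetical internal stage: storage is managed by Snowflake itself
CREATE OR REPLACE STAGE MANAGE_DB.EXTERNAL_STAGE.MY_INTERNAL_STAGE;

-- Upload a local file into the stage (run from SnowSQL, not the web UI)
PUT file:///tmp/OrderDetails.csv @MANAGE_DB.EXTERNAL_STAGE.MY_INTERNAL_STAGE;
```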
Steps to create stages in Snowflake
Follow the command below -
// CREATE A NEW DATABASE
create or replace database MANAGE_DB;
// CREATE SCHEMA FOR EXTERNAL STAGES
CREATE OR REPLACE SCHEMA EXTERNAL_STAGE;
// CREATING EXTERNAL STAGES QUERY
CREATE OR REPLACE STAGE MANAGE_DB.EXTERNAL_STAGE.AWS_STAGE -- FULLY QUALIFIED STAGE NAME
url='s3://bucketsnowflakes3'
credentials=(aws_key_id='ABCD_DUMMY_ID' aws_secret_key='1234abcd_key');
// DESCRIPTION OF STAGE
DESC STAGE MANAGE_DB.EXTERNAL_STAGE.AWS_STAGE;
//ALTER STAGE
ALTER STAGE AWS_STAGE
SET credentials=(aws_key_id='ABCD_DUMMY_ID' aws_secret_key='your_key');
// PUBLICLY ACCESSIBLE STAGING AREA
CREATE OR REPLACE STAGE MANAGE_DB.EXTERNAL_STAGE.AWS_STAGE
url='s3://bucketsnowflakes3';
// LIST STAGE
LIST @AWS_STAGE; -- @ IS USED AS A PREFIX FOR STAGE OBJECTS
Note — Before loading data into any target, we first have to create a table in it. This is a very important step when loading data between the same or different services.
// Creating ORDERS table
CREATE OR REPLACE TABLE OUR_FIRST_DB.PUBLIC.ORDERS (
ORDER_ID VARCHAR(30),
AMOUNT INT,
PROFIT INT,
QUANTITY INT,
CATEGORY VARCHAR(30),
SUBCATEGORY VARCHAR(30));
SELECT * FROM OUR_FIRST_DB.PUBLIC.ORDERS; -- EMPTY TABLE
Performing a bulk load of data from the stage into a Snowflake table
// THE COPY COMMAND NEEDS A FULLY QUALIFIED NAME
// First copy command
COPY INTO OUR_FIRST_DB.PUBLIC.ORDERS -- TARGET
FROM @MANAGE_DB.EXTERNAL_STAGE.AWS_STAGE -- SOURCE
file_format = (type = csv field_delimiter=',' skip_header=1);
-- This will throw an error because multiple files are present in the AWS stage and some of them cannot be mapped to the ORDERS table
LIST @MANAGE_DB.EXTERNAL_STAGE.AWS_STAGE;
// COPY COMMAND WITH SELECTIVE FILES
COPY INTO OUR_FIRST_DB.PUBLIC.ORDERS -- TARGET
FROM @MANAGE_DB.EXTERNAL_STAGE.AWS_STAGE -- SOURCE
file_format = (type = csv field_delimiter=',' skip_header=1)
files = ('OrderDetails.csv');
SELECT * FROM OUR_FIRST_DB.PUBLIC.ORDERS;
-- What if we load the data again? Try it yourself and let me know in the comments. It will be great practice for you!
-- truncate table OUR_FIRST_DB.PUBLIC.ORDERS
// COPY COMMAND WITH PATTERN
COPY INTO OUR_FIRST_DB.PUBLIC.ORDERS
FROM @MANAGE_DB.external_stage.aws_stage
file_format= (type = csv field_delimiter=',' skip_header=1)
pattern='.*Order.*';
In the next blog we will see how to handle errors in bulk load.
Some of my other blogs -
- Database in Snowflake | by Sarvesh Pandey | Jun, 2022 | Medium
- Snowflake Architecture and Warehouse | by Sarvesh Pandey | Medium
- Parameter Passing — Move multiple Files from Blog Storage to Azure SQL DB | by Sarvesh Pandey | Medium
- Database Backup Location is AZURE! | by Sarvesh Pandey | Medium
- Starting Machine Learning with Python! | by Sarvesh Pandey | May, 2022 | Medium