Copy option in Snowflake DW

Sarvesh Pandey

5 min readJul 4, 2022

This article will help you to understand different feature/options available with Copy data command in Snowflake.

Understanding Copy Options -

Validation

Size Limit

Return Failed only

TruncateColumn

Force

Default Copy Syntax

Syntax-

Copy into <table_name>

From ExternalStage

Files = (‘File_name_1’,’File_name_2')

FILE FORMAT = <file format name>

Validation Mode -

Validate data files instead of loading them

To check wheater if any error or data is corrupted or not

Syntax -

Copy into <table_name>

From ExternalStage

Files = (‘File_name_1’,’File_name_2')

FILE FORMAT = <file format name>

VALIDATION_MODE = RETURN_n_ROWS | RETURN_ERROR

Lets create a Database and Table name in Snowflake for Data Validation

CREATE OR REPLACE DATABASE COPY_DB;

CREATE OR REPLACE TABLE COPY_DB.PUBLIC.ORDERS (

ORDER_ID VARCHAR(30),

AMOUNT VARCHAR(30),

PROFIT INT,

QUANTITY INT,

CATEGORY VARCHAR(30),

SUBCATEGORY VARCHAR(30));

// Prepare stage object

CREATE OR REPLACE STAGE COPY_DB.PUBLIC.aws_stage_copy

url=’s3://snowflakebucket-copyoption/size/’;

LIST @COPY_DB.PUBLIC.aws_stage_copy;

Before Loading, check the data by running the validation script

COPY INTO COPY_DB.PUBLIC.ORDERS

FROM @aws_stage_copy

file_format= (type = csv field_delimiter=’,’ skip_header=1)

pattern=’.*Order.*’

VALIDATION_MODE = RETURN_ERRORS

Output -

Another which will return ’N’ rows when there are no error in data validation. N = 5

COPY INTO COPY_DB.PUBLIC.ORDERS

FROM @aws_stage_copy

file_format= (type = csv field_delimiter=’,’ skip_header=1)

pattern=’.*Order.*’

VALIDATION_MODE = RETURN_5_ROWS

Output -

What happens when there are bad records??

Let’s create another stage object

// Prepare stage object

CREATE OR REPLACE STAGE COPY_DB.PUBLIC.aws_stage_copy

url=’s3://snowflakebucket-copyoption/returnfailed/’;

Run the validation script -

COPY INTO COPY_DB.PUBLIC.ORDERS

FROM @aws_stage_copy

file_format= (type = csv field_delimiter=’,’ skip_header=1)

pattern=’.*Order.*’

VALIDATION_MODE = RETURN_ERRORS

Output -

Working with Error -

Now we can see our validation get failed. What will be the next step? How we should remove those error so our data would be clean?

There will be lots of question arises after getting an Error. In Snowflake, approach is little bit different. Lets find out -

Method 1

First we will store the Error buy running few scripts -

// Storing rejected /failed results in a table

CREATE OR REPLACE TABLE rejected AS

select rejected_record from table(result_scan(last_query_id())); — Run this scripts it will give you error records from last run query

INSERT INTO rejected

select rejected_record from table(result_scan(last_query_id()));

SELECT * FROM rejected;

Method 2

Suppose we have to load data into target table and also check/save error records. We can achive this by below comand

COPY INTO COPY_DB.PUBLIC.ORDERS

FROM @aws_stage_copy

file_format= (type = csv field_delimiter=’,’ skip_header=1)

pattern=’.*Order.*’

ON_ERROR=CONTINUE

This will load data, only those records which don’t have any error.

Check the Status column. It clearly says there are some bad records in the file due to which its loaded only Partial Data.

select * from table(validate(orders, job_id => ‘_last’));

Error Details and File name can be found in using the select statement.

Key points to be noted -

select * from table(validate(table_name, job_id => Query_id));

If we place ‘_last’ this will give error records from last executed query. If you want for some other query, you will need Query ID of that execution command.

Then create a table and insert records just like Method 1

How to understand what error message is saying -

Run the select command over the table to check what kind of data is loaded and do the sum with rejected table records.

If you notice the error message is nothing but the records which are populated in single column.

Now, we will split the records into multiple and replace those with new records(modified records).

CREATE OR REPLACE TABLE rejected_values as

SELECT

SPLIT_PART(rejected_record,’,’,1) as ORDER_ID,

SPLIT_PART(rejected_record,’,’,2) as AMOUNT,

SPLIT_PART(rejected_record,’,’,3) as PROFIT,

SPLIT_PART(rejected_record,’,’,4) as QUATNTITY,

SPLIT_PART(rejected_record,’,’,5) as CATEGORY,

SPLIT_PART(rejected_record,’,’,6) as SUBCATEGORY

FROM rejected;

Size Limit -

This will help to load data into target table by specifying threshold file size. After the threshold reached data load will get stop. Size in bytes.

Note — The file size limit of for all file which will be loaded.

Syntax -

Copy into <table_name>

From ExternalStage

Files = (‘File_name_1’,’File_name_2')

FILE FORMAT = <file format name>

Size_limit = num;

Return Failed Only -

It is used when you want to specify those files which are failed to return while loading the data.

Syntax -

Copy into <table_name>

From ExternalStage

Files = (‘File_name_1’,’File_name_2')

FILE FORMAT = <file format name>

ON_Error = continue

RETURN_FAILED_ONLY = TRUE; //Default is FALSE

TRUNCATECOLUMS -

Suppose you created a table with column of 10 character length. But while loading the data there is one record which have char length greater then 10 i.e 12. Using this option those extra char will be truncated to avoid error while copying data into the Target table.

In simple terms, extra char will be truncated.

Syntax -

Copy into <table_name>

From ExternalStage

Files = (‘File_name_1’,’File_name_2')

FILE FORMAT = <file format name>

TRUNCATECOLUMNS = True;

//False will produce error while loading data into Target.

Force -

Specifies to load files weather they are already loaded or not changed since they have loaded.

Leads to duplication of data

Syntax -

Copy into <table_name>

From ExternalStage

Files = (‘File_name_1’,’File_name_2')

FILE FORMAT = <file format name>

FORCE = TRUE; //Default is FALSE

Some of my other blog -

Loading Data in Snowflake. So, on the previous blog we have… | by Sarvesh Pandey | Jun, 2022 | Medium

Database in Snowflake. If I ask this question… | by Sarvesh Pandey | Jun, 2022 | Medium

Snowflake Architecture and Warehouse | by Sarvesh Pandey | Medium | Medium

Copy option in Snowflake DW

Written by Sarvesh Pandey