COMPARE WHETHER TWO TABLES CONTAIN THE SAME DATA OR NOT USING HASH_AGG IN SNOWFLAKE.

Snowflake has one cool function called HASH_AGG which returns a 64 bit signed hash value over the set of inputs column. As the snowflake document itself mentioned, “HASH_AGG is not a cryptographic hash function and should not be used as such”. We can use it to compare whether two columns or sets of columns value are identical or not. 

I cloned the product table and created the product_clone table. They contain exactly the same data. Now, If we run the below queries, They will return the same hash value:

SELECT HASH_AGG(*) FROM  product;
SELECT HASH_AGG(*) FROM product_clone;

Since two queries return the same hash value; we can say that product and product_clone have the same data and are identical to each other. However, The hash number means nothing for us we just want to confirm whether the two tables are the same or not. So alternatively we can compare as below as well:

SELECT HASH_AGG(*)=(SELECT HASH_AGG(*) FROM PRODUCT_CLONE) AS IS_SAME
FROM  product;

Instead of  *, we can check HASH_AGG on column level as well.

SELECT HASH_AGG(ITM_KEY)=(SELECT HASH_AGG(ITM_KEY) FROM PRODUCT_CLONE) AS IS_SAME
FROM  product;

Since we are comparing two table on the basis of hash value instead of comparing column wise data It is much fastest than comparing using joins. But it helps us to only check whether two tables data are same or not. If there is difference in data and we need to identify the degree of difference we need to use other options like Join, set functions etc.

Leave a Reply