Snowflake Cost Optimization Best Practices

On-premises systems have many advantages, but they also bring a huge share of challenges, for example system maintenance, licensing, and software upgrades. Most businesses are moving to cloud-based platforms to eliminate these on-premises challenges, although moving to the cloud introduces challenges of its own. Snowflake is one of the most widely preferred cloud data warehouses, and cost (or pricing) is a crucial parameter to understand before implementing a solution on it. Before managing your costs, you must understand the unique Snowflake architecture and the underlying concepts behind Snowflake cost optimization. The diagram below illustrates the Snowflake 3-Tier Architecture, which consists of three largely independent layers.

The Snowflake Architecture is made up of the following components:

Cloud Services:  Accepts SQL statements and manages the overall system.  Cloud services compute is billed per second, but usage up to 10% of the daily virtual warehouse compute consumption is free.

Virtual Warehouses:  Where most SQL queries are executed; these account for about 90% of all costs.  The warehouse cost model is explained below.

Cloud Storage:  A fixed cost per terabyte per month, and not usually a major concern.

Of the three areas, the virtual warehouse cost is the most significant, although there are additional potential costs from:

Serverless Features:  For example, Snowpipe, Automatic Clustering, serverless Tasks, and Database Replication.  These tend to be billed by the second.

Cross-Region Transfer Costs:  When data is transferred between Snowflake accounts in different regions, the cloud provider may charge (per gigabyte) for data transferred across a region or cloud boundary.
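To see how these cost components break down in your own account, the SNOWFLAKE.ACCOUNT_USAGE share can be queried directly. The sketch below summarizes the last 30 days of credit consumption by service type (warehouse compute, cloud services, Snowpipe, automatic clustering, and so on); the 30-day window is an illustrative choice, not a requirement.

```sql
-- Credit consumption by service type over the last 30 days
-- (requires access to the SNOWFLAKE.ACCOUNT_USAGE share).
SELECT
    service_type,                              -- e.g. WAREHOUSE_METERING, PIPE, AUTO_CLUSTERING
    SUM(credits_used_compute)        AS compute_credits,
    SUM(credits_used_cloud_services) AS cloud_services_credits,
    SUM(credits_used)                AS total_credits
FROM snowflake.account_usage.metering_daily_history
WHERE usage_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY service_type
ORDER BY total_credits DESC;
```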

Snowflake Cost Optimization

TIPS FOR SNOWFLAKE COST OPTIMIZATION

Given the insights above, the top tips for Snowflake cost optimization include:

  1. Warehouse Sizing: Size warehouses according to the workload. Queries should run roughly twice as fast on each larger warehouse size, so stop increasing the warehouse size when the elapsed-time improvement drops below 50% (see the configuration sketch after this list).
  2. Submit multiple SQL jobs in parallel, each on a separate connection, against a shared batch transformation warehouse to maximize throughput.
  3. Usage of Scaling Policy: For warehouses designed to run lower-priority batch jobs, set MAX_CLUSTER_COUNT and SCALING_POLICY = 'ECONOMY' to balance the need to maximize throughput against compute cost.
  4. Use Warehouse Configuration Properly: For end-user warehouses where performance is a priority, set MAX_CLUSTER_COUNT = 5 and SCALING_POLICY = 'STANDARD'. This automatically allocates additional clusters as the concurrent workload increases. However, keep MAX_CLUSTER_COUNT as small as possible while controlling the time spent queuing. With a SCALING_POLICY of STANDARD, avoid setting MAX_CLUSTER_COUNT to 10 or higher unless maximizing performance is a much higher priority than controlling cost.
  5. Using AUTO_SUSPEND: Only set AUTO_SUSPEND to zero if you understand the risk of continuous spending and take steps to suspend warehouses when they are not needed.
  6. Update the Query Timeout Default Value: Snowflake's default query timeout is 48 hours (172,800 seconds), meaning that queries running longer than 48 hours are automatically terminated. Lowering this value can help reduce Snowflake costs by preventing long-running queries from consuming excessive compute resources.

Set a STATEMENT_TIMEOUT_IN_SECONDS on every warehouse, based on the warehouse size, to limit the risk of overspending on an individual query.
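Putting tips 1 to 6 together, here is a minimal configuration sketch. The warehouse names (batch_transform_wh, reporting_wh), the MEDIUM size, the cluster counts, and the one-hour timeout are illustrative assumptions rather than recommendations for every workload.

```sql
-- Illustrative warehouse configuration (names and values are assumptions).
CREATE WAREHOUSE IF NOT EXISTS batch_transform_wh
  WAREHOUSE_SIZE      = 'MEDIUM'     -- size to the workload; resize only while queries still speed up
  MIN_CLUSTER_COUNT   = 1
  MAX_CLUSTER_COUNT   = 4            -- keep as small as queuing time allows
  SCALING_POLICY      = 'ECONOMY'    -- favour throughput per credit for batch work
  AUTO_SUSPEND        = 60           -- suspend after 60 seconds of inactivity
  AUTO_RESUME         = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Cap how long any single query may run on this warehouse (1 hour here).
ALTER WAREHOUSE batch_transform_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;

-- For an end-user warehouse where performance matters more, a STANDARD
-- scaling policy with a modest cluster cap is the usual trade-off.
ALTER WAREHOUSE IF EXISTS reporting_wh
  SET MAX_CLUSTER_COUNT = 5 SCALING_POLICY = 'STANDARD';
```

The short AUTO_SUSPEND keeps idle time cheap, while AUTO_RESUME ensures queries still start without manual intervention.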

  7. Using the Resource Monitor Tool: It is important to set up resource monitors to keep Snowflake usage within budget. Monitors can be configured to send alerts and/or suspend warehouses automatically when consumption reaches predefined thresholds.

Setting the notification thresholds below your hard limit is recommended to avoid running out of capacity. These monitors can be assigned at the account level (covering all warehouses) or to individual warehouses, as sketched below.
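Here is a minimal resource monitor sketch, assuming a 100-credit monthly quota and a monitor named monthly_budget_rm; adjust the quota and thresholds to your own budget.

```sql
-- Illustrative resource monitor: notify at 75% and 90%, suspend at 100%
-- of a 100-credit monthly quota (all values are assumptions).
CREATE OR REPLACE RESOURCE MONITOR monthly_budget_rm
  WITH CREDIT_QUOTA = 100
       FREQUENCY = MONTHLY
       START_TIMESTAMP = IMMEDIATELY
       TRIGGERS ON 75  PERCENT DO NOTIFY
                ON 90  PERCENT DO NOTIFY
                ON 100 PERCENT DO SUSPEND;   -- suspend once running queries finish

-- Attach it to the whole account ...
ALTER ACCOUNT SET RESOURCE_MONITOR = monthly_budget_rm;

-- ... or to a single warehouse.
ALTER WAREHOUSE reporting_wh SET RESOURCE_MONITOR = monthly_budget_rm;
```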

  8. Use Zero-Copy Cloning Effectively: One of the best features of Snowflake is cloning. Data warehousing projects typically maintain several environments, such as DEV, UAT, and PROD. When testing in UAT requires the latest data, the full table, schema, or database can be cloned from PROD to UAT.

     

    With this feature, only metadata is created for the cloned object; it points to the same underlying storage, so no additional storage cost is incurred until the clone is modified. A minimal example follows.
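The sketch below clones at the database, schema, and table level; the prod_db and uat_db object names are placeholders, not names from the original article.

```sql
-- Zero-copy clones only create metadata; storage is shared with the source
-- until the clone is modified (object names below are assumptions).
CREATE OR REPLACE DATABASE uat_db CLONE prod_db;

-- Cloning also works at the schema or table level.
CREATE OR REPLACE SCHEMA uat_db.sales CLONE prod_db.sales;
CREATE OR REPLACE TABLE  uat_db.sales.orders CLONE prod_db.sales.orders;
```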

  9. Analyze Account Usage: Monitoring account usage, including query history and warehouse metering history, is essential for tracking credit consumption, identifying the longest-running workloads to optimize, and eliminating wasteful usage patterns (a query sketch follows the example below).

    Consider a dedicated X-Small warehouse where a query runs 20 times a day with a minimal workload. Each run takes 10 seconds, resulting in a total of about 200 seconds of actual daily compute time.

    Snowflake bills a minimum of 1 minute every time the warehouse is resumed. This means the user pays for 1,200 seconds of compute, six times the actual consumption. To avoid such patterns, take Snowflake cost optimization actions such as changing the query schedule or running the query on a shared virtual warehouse.
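To find warehouses with this kind of pattern, the ACCOUNT_USAGE views can be compared directly. The sketch below contrasts billed warehouse credits with actual query execution time over the last 30 days; the views under SNOWFLAKE.ACCOUNT_USAGE are standard, but treat the exact comparison and the 30-day window as an illustrative starting point.

```sql
-- Billed credits per warehouse over the last 30 days.
SELECT warehouse_name,
       SUM(credits_used) AS billed_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY billed_credits DESC;

-- Actual execution time per warehouse over the same window; a warehouse that
-- bills many credits but executes very little work is a candidate for
-- consolidation onto a shared warehouse or a changed schedule.
SELECT warehouse_name,
       COUNT(*)                          AS query_count,
       SUM(execution_time) / 1000 / 3600 AS execution_hours   -- EXECUTION_TIME is in milliseconds
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
  AND warehouse_name IS NOT NULL
GROUP BY warehouse_name
ORDER BY execution_hours ASC;
```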

Author

  • Vikrant Chavan

    Vikrant Chavan is a marketing expert at 64 Squares LLC with a command of 360-degree digital marketing channels. He has 8+ years of experience in digital marketing.
