[2025] Understanding Snowflake Query Engine Internals
You may have heard that Snowflake's query engine is cloud-native, built from scratch rather than on top of an existing engine, and delivers strong performance, making Snowflake a strong choice as an AI data cloud platform. But have you ever wondered how Snowflake's query engine actually works? If so, this article is just for you.
Starting with the basics….
Life of a Query in Snowflake
The life of a query in Snowflake starts with the client sending a query to the Snowflake engine. The first thing the query engine does is a result cache lookup: if a cached result is found, it returns that output for the query directly.
If the result set is not available in the cache, the query planner and optimizer process the query. They use metadata to prune and filter the data, identifying exactly which records need to be processed.
After that, the virtual warehouse scans only the needed data from local SSD cache or blob storage, processes it, and returns it to the cloud services layer.
Finally, the result is stored in the result cache for future use and returned to the client.
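To make this concrete, here's a minimal sketch (using a hypothetical ORDERS table) of how the result cache comes into play: if the identical query text is re-run while the underlying data is unchanged, Snowflake can serve the answer from the result cache in the cloud services layer without touching the warehouse at all.
-- First run: compiled, executed on the warehouse, and the result is cached
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region;
-- Identical re-run (same text, unchanged data): typically answered from the
-- result cache, so no warehouse compute is needed
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region;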
How Does Query Compilation Work in Snowflake?
The query compiler is the upper layer of the query engine. The query text is given as input to the parser, which checks for syntax errors and produces output in a format called the Query Block Internal Representation.
Think of it like: Grammar check and sentence breakdown
The parser's output then goes through semantic analysis, which performs type checking and name resolution. This stage also applies row access policies and other security policies.
Think of it like: Making sure your sentence makes logical sense.
Next comes the logical rewrites stage. It optimizes the query by rewriting it into a logically equivalent but more efficient version, for instance by removing redundant expressions or flattening nested subqueries.
Think of it like: Saying the same thing more smartly.
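As a rough illustration (the exact rewrites Snowflake applies are internal; the CUSTOMERS and ORDERS tables here are hypothetical, and CUSTOMER_ID is assumed to be unique in CUSTOMERS), a nested subquery can be rewritten into a logically equivalent join-style form:
-- Original form with a nested IN subquery
SELECT c.customer_id, c.name
FROM customers c
WHERE c.customer_id IN (SELECT o.customer_id FROM orders o WHERE o.amount > 100);
-- Logically equivalent flattened form the optimizer could work with instead
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o
  ON o.customer_id = c.customer_id
WHERE o.amount > 100;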
The next stage is micro-partition pruning, handled by an internal component called the Pruner, which is invoked multiple times during compilation.
It determines which micro-partitions are relevant to your query, based on metadata like min/max values.
Think of it like: A massive performance booster that avoids unnecessary data scans.
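For example (hypothetical SALES table), a simple range filter gives the Pruner something to compare against each micro-partition's min/max metadata, so partitions whose SALE_DATE range falls entirely outside the filter are skipped without ever being scanned:
-- Only micro-partitions whose SALE_DATE min/max overlaps January 2025
-- need to be read; the rest are pruned using metadata alone
SELECT SUM(amount)
FROM sales
WHERE sale_date BETWEEN '2025-01-01' AND '2025-01-31';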
Initial plan generation builds the first logical plan using relational algebra. It's not optimized yet, just a functional breakdown of what needs to happen.
Think of it like: A raw to-do list for your query.
Plan rewrites apply internal optimizations to the initial logical plan, such as predicate pushdown, join simplification, and aggregation pushdown. These reduce data movement and improve execution performance.
Think of it like: Tidying and organizing your to-do list for max efficiency.
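Here's a conceptual sketch of predicate pushdown (hypothetical CUSTOMERS and ORDERS tables; the engine performs this rewrite internally, you don't write it yourself):
-- As written, the filter sits "above" the join
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_date >= '2025-01-01';
-- After pushdown, ORDERS is filtered before it reaches the join,
-- which is roughly what this explicit form expresses
SELECT c.name, o.amount
FROM customers c
JOIN (
  SELECT customer_id, amount
  FROM orders
  WHERE order_date >= '2025-01-01'
) o ON o.customer_id = c.customer_id;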
Cost-Based Join Ordering evaluates multiple join orders and calculates a cost estimate. It then chooses the join order with the lowest estimated cost.
Think of it like: Choosing the fastest route based on traffic conditions.
Finally, the last stage is physical plan generation. It converts the optimized logical plan into a physical execution plan that includes the actual operators, the distribution strategy, and resource assignments.
Think of it like: Turning your travel plan into GPS directions with real roads and turns.
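You can peek at the outcome of these compilation stages yourself with the EXPLAIN command, which prints the execution plan (operators, join order, pruning estimates) without running the query. The tables below are hypothetical:
EXPLAIN
SELECT c.name, SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.name;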
Understanding the Query Execution Layer of the Query Engine
The query execution layer is the place where the actual work happens based on the compiled and optimized query plan.
It has 3 major components:
Columnar execution:
Snowflake stores and processes data in columnar format (not row-based).
This provides faster reads when you are only querying specific columns. It also provides better compression and efficient scanning since values in a column tend to be similar.
Think of it like: Reading only the chapter you need from a book instead of the whole book.
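A quick illustration (hypothetical ORDERS table): because data is stored column by column, a query that names only the columns it needs reads far less data than a SELECT *:
-- Reads only the ORDER_ID and AMOUNT column data from each micro-partition
SELECT order_id, amount
FROM orders;
-- Has to read every column, so it scans much more data
SELECT *
FROM orders;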
Vectorized execution:
It processes data in batches instead of row by row. Each vector is a batch of thousands of values from a column. This makes execution CPU cache-friendly and enables SIMD instructions.
For instance, instead of:
amount[0] > 100
amount[1] > 100...
It does:
amount[0:1023] > 100 in one go
Think of it like: Running a washing machine full of clothes (vector) instead of washing one sock at a time (row-based)
Push-Based Flow Control:
This means data producers (like scan operators) push data downstream to consumers (like joins or aggregations) as fast as the consumers can handle it.
This is different from pull-based flow control, where consumers request data from producers. Push-based flow enables pipeline parallelism and allows backpressure handling.
Think of it like: A conveyor belt system in a factory, where items move continuously through each processing stage.
Together, columnar storage, vectorized execution, and push-based flow control make Snowflake's execution engine massively parallel, low-latency, and resource-efficient.
Query Statistics Collection in Snowflake
Snowflake automatically collects statistics (“metadata”) at multiple levels:
Table and micro-partition level
Column Level
Sub-column Level
Snowflake's statistics are always up to date and accurate. They are used as input to the optimizer's cost model and for certain optimizations like pruning and constant folding.
Column and table metadata are cached in the cloud services layer, so users do not have to manage statistics manually.
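One visible effect of this always-available metadata: simple aggregates over a table can often be answered from the statistics alone, without scanning any table data (hypothetical ORDERS table; whether a given query qualifies depends on the columns' data types and the absence of complex predicates):
-- Row counts and min/max values are tracked per micro-partition,
-- so a query like this can typically be served from metadata
SELECT COUNT(*), MIN(order_date), MAX(order_date)
FROM orders;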
Lastly, Understanding Query Plan Terminology
The query plan is a DAG (directed acyclic graph) that consists of operators connected by links.
Operators:
These are the nodes in the graph: common operators such as joins, filters, and aggregations.
Links:
The links are the edges between operators; they are responsible for data exchange and handle the parallel distribution of data.
How to get Query Information?
There are two approaches: you can get query plan information through the Snowsight UI or by using the GET_QUERY_OPERATOR_STATS() table function.
As a best practice, the larger tables should be on the right side of the DAG, meaning the plan should slope toward the right.
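As a small sketch of the second approach (the query and its tables are hypothetical; GET_QUERY_OPERATOR_STATS and LAST_QUERY_ID are standard Snowflake functions):
-- Run the query you want to inspect
SELECT c.name, SUM(o.amount) AS total_amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.name;
-- Then fetch per-operator statistics (rows produced, time spent, pruning, etc.)
SELECT *
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()));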
If you are interested in reading about Snowflake's architecture, here's a quick blog for you:
If you are a data architect looking to implement a data lakehouse using Apache Iceberg and Snowflake, here's a quick link for you:
About Me:
Hi there! I am Divyansh Saxena
I am a Snowflake Advanced Certified Data Architect with a proven track record of success with the Snowflake AI Data Cloud. I am highly skilled in designing, implementing, and maintaining data pipelines, ETL workflows, and data warehousing solutions. With advanced knowledge of Snowflake's features and functionality, I have been a Snowflake Data Superhero since 2023. Having spent most of my career on the Snowflake Data Cloud, I have a deep understanding of cloud-native data architecture and can leverage it to deliver high-performing, scalable, and secure data solutions.
Follow me on Medium for regular updates on Snowflake Best Practices and other trending topics:
Also, I am open to connecting all data enthusiasts across the globe on LinkedIn: