[2025] Improve Your Pandas Workloads Using Snowflake Snowpark Pandas API
If you are familiar with pandas DataFrames, you are probably aware that pandas is not optimized for handling very large datasets. It operates in memory, which can lead to excessive RAM usage and slow performance.
Pandas also does not natively support distributed processing across multiple machines or clusters; you need to switch to Dask, Modin, or Vaex to handle large-scale data more efficiently. If your data lives in Snowflake, there's an amazing option for you: the Snowpark Pandas API, which lets you run your pandas workloads efficiently without compromising on data security. Let's talk in detail about the Snowflake Snowpark Pandas API.
Starting with the basics…
Why should you try the Snowpark Pandas API? [Asked in interviews too]
Pandas on Snowflake lets you run your pandas code directly on your data in Snowflake. The experience is the familiar pandas experience you know and love, with additional benefits.
Firstly, you can run code in a distributed manner, allowing you to work on much larger datasets. Secondly, it runs workloads natively in Snowflake through transpilation to SQL, enabling it to take advantage of parallelization as well as Snowflake's security and governance benefits.
The Snowflake Snowpark Pandas API is part of the Snowpark Python library, which enables scalable data processing of Python code within the Snowflake platform.
So if you have existing code written in pandas, or you and your team are familiar with pandas and want to collaborate on the same code base, then the Snowpark Pandas API is made for you.
The Snowpark Pandas API is part of the Snowflake ecosystem and provides users with multiple benefits, including distributed execution on much larger datasets, pushdown of pandas operations to Snowflake's engine, and Snowflake's security and governance controls.
Getting started with Snowpark Pandas API
Even if you have been using Snowpark for a long time, you need to make sure that you have the pandas on Snowflake package installed.
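At the time of writing, pandas on Snowflake ships as the modin extra of the Snowpark Python package; check the Snowflake documentation for the currently supported modin and pandas versions:
pip install "snowflake-snowpark-python[modin]"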
Once you have it installed, you can import modin and get started with the Snowpark Pandas API.
import modin.pandas as pd
# Import the Snowpark pandas plugin for modin
import snowflake.snowpark.modin.plugin
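Snowpark pandas pushes its work down to your Snowflake account, so an active Snowpark session is required. A minimal sketch, assuming your connection parameters are already configured (for example in a connections.toml file):
from snowflake.snowpark.session import Session

# Create a Snowpark session; Snowpark pandas uses the active session
# to execute its queries inside Snowflake.
session = Session.builder.create()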
Now you get a similar experience to native pandas (but with the features and support of Snowflake). For instance, reading the data:
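A minimal sketch, assuming a hypothetical table named SALES in your current database and schema:
# Read a Snowflake table straight into a Snowpark pandas DataFrame
df = pd.read_snowflake("SALES")
print(df.head())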
You can also read data from various file formats, like Excel, Parquet, JSON, and more.
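For example, Snowpark pandas can read files that you have uploaded to a Snowflake stage; a sketch assuming a hypothetical stage @my_stage containing a CSV file:
# Read a CSV file from a Snowflake stage (stage and file names are hypothetical)
df_csv = pd.read_csv("@my_stage/data.csv")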
Blend your local data with Snowflake Tables using Snowpark Pandas API
You can use Snowpark Pandas API DataFrames to read data from Snowflake tables, and even write results back into Snowflake or convert them into Snowpark DataFrames.
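Here is a sketch of that round trip, blending a small local DataFrame with a Snowflake table; the table and column names (CUSTOMERS, CUSTOMER_ID, SEGMENT) are hypothetical:
# A small local lookup DataFrame
local_df = pd.DataFrame({
    "CUSTOMER_ID": [1, 2, 3],
    "SEGMENT": ["gold", "silver", "bronze"],
})

# Read the Snowflake table and join it with the local data
customers = pd.read_snowflake("CUSTOMERS")
enriched = customers.merge(local_df, on="CUSTOMER_ID", how="left")

# Write the result back to a Snowflake table
enriched.to_snowflake("CUSTOMERS_ENRICHED", if_exists="replace", index=False)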
Interoperability Between Snowpark Dataframes and Pandas API
Both are highly interoperable, so you can leverage both to build your data pipelines.
You can convert a Snowpark pandas DataFrame into a Snowpark DataFrame using the to_snowpark operation. The resulting Snowpark DataFrame operates on a snapshot of the source Snowpark pandas DataFrame's data, which means that changes to the underlying table will not be reflected during the evaluation of the Snowpark operations.
Similarly, you can convert a Snowpark DataFrame into a Snowpark pandas DataFrame using the to_snowpark_pandas operation. Snowpark pandas assigns an implicit order to each row and maintains this row order for the lifetime of the DataFrame, so this conversion materializes the data and incurs an I/O cost.
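A minimal sketch of both conversions, reusing the hypothetical CUSTOMERS table and the session created earlier:
# Snowpark pandas DataFrame -> Snowpark DataFrame (operates on a snapshot)
pandas_df = pd.read_snowflake("CUSTOMERS")
snowpark_df = pandas_df.to_snowpark()

# Snowpark DataFrame -> Snowpark pandas DataFrame (materializes the data,
# incurring I/O cost to maintain the implicit row order)
pandas_df2 = session.table("CUSTOMERS").to_snowpark_pandas()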
So, for table data, it's highly recommended to use read_snowflake directly instead of creating a Snowpark pandas DataFrame from a Snowpark DataFrame, to avoid unnecessary conversions.
Understanding Evaluation Approaches: Snowpark Pandas API vs. Native Pandas
The Snowpark Pandas API integrates with Snowflake, allowing us to handle much larger datasets that exceed the memory capacity of a single machine. So yes, you require a Snowflake connection in order to use the Snowflake Snowpark Pandas API.
Native pandas, on the other hand, operates on a single machine and processes the data in memory.
In terms of evaluation, pandas executes operations immediately and fully materializes results in memory after each operation. This is a pain point: eager evaluation leads to memory pressure, as data needs to be moved extensively within the machine.
On the other hand, the Snowpark Pandas API mimics the eager evaluation model but internally builds a lazily evaluated query graph, enabling optimization across operations.
*Fusing is an optimization technique where multiple operations are combined into a single operation to improve performance. *Transpiling refers to converting code from one language to another (here, pandas operations into SQL).
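To make the lazy model concrete, here is a small sketch; the SALES table and the AMOUNT and REGION columns are hypothetical. The chained operations only build a query graph, and nothing runs until the result is actually needed:
df = pd.read_snowflake("SALES")

# These steps only extend the lazy query graph; no SQL has executed yet
filtered = df[df["AMOUNT"] > 100]
totals = filtered.groupby("REGION")["AMOUNT"].sum()

# Materializing the result (printing it, or converting to native pandas)
# triggers execution: the fused pipeline is transpiled to SQL and run
# inside Snowflake.
print(totals)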
Pandas on Snowflake is a huge topic that can't be covered in a single article; however, this should have given you a glimpse of the benefits it offers.
If this article resonated with you and you are interested in exploring more, make sure to subscribe to my Medium channel and stay tuned for upcoming blogs on the Snowpark Pandas API.
Quick Recaps
If you are looking for more details on setting up Snowpark locally as a test environment, here's a quick guide for you:
And if you are looking for a cheat sheet for Snowpark DataFrames, here's an article I published in the past:
About Me:
Hi there! I am Divyansh Saxena
I am a Snowflake Advanced Certified Data Architect with a proven track record of success in the Snowflake AI Data Cloud. I am highly skilled in designing, implementing, and maintaining data pipelines, ETL workflows, and data warehousing solutions. With advanced knowledge of Snowflake's features and functionality, I have been a Snowflake Data Superhero since 2023. Having spent the major part of my career in the Snowflake Data Cloud, I have a deep understanding of cloud-native data architecture and can leverage it to deliver high-performing, scalable, and secure data solutions.
Follow me on Medium for regular updates on Snowflake Best Practices and other trending topics:
Also, I am open to connecting all data enthusiasts across the globe on LinkedIn: