Introducing LocalStack for Snowflake: The new emulator to build & test data pipelines locally
We are excited to announce LocalStack for Snowflake (preview), which enables a high-fidelity, fully local Snowflake experience to develop & test your data pipelines.
Introduction
We are excited to announce the public preview of LocalStack for Snowflake ❄️. This new emulator enables you to develop and test your Snowflake data applications locally and in CI pipelines, using a Docker container that can be easily plugged into your Snowflake integrations, such as Snowpark, different client libraries, or Streamlit applications, among others. This preview version of the Snowflake emulator supports various key features, such as:
- DDL/DML/DQL operations on warehouses, databases, schemas, and tables;
- Storing files in Snowflake stages;
- Tasks for scheduled execution;
- Snowpipe streaming with Kafka connector;
- User-defined functions (UDFs) in JavaScript and Python;
- Table streams for change data capture (CDC) and audit logs;
- Cross-database resource sharing;
- Basic support for running Streamlit applications locally;
- Infrastructure-as-code with Terraform & Pulumi;
- Testing data applications in Continuous Integration (CI) pipelines with GitHub Actions, CircleCI & GitLab CI (among others);
- Enterprise-tested features like Cloud Pods and Persistence for state snapshots.
The Snowflake emulator removes the need to rely on a live Snowflake account for local development and testing, enabling fast, test-driven development of your data applications. With this release, we are demonstrating our commitment to go multi-cloud and build a complete suite of developer tools that will allow you to achieve efficiency and cost savings by bringing development and testing closer together.
The Snowflake emulator is currently in public preview, and you can reach out to us to get access! This blog explores how we’ve reached this important milestone, outlines what it means for our users, and provides a quick introduction to help you get started.
For more detailed guidance, please navigate to our LocalStack for Snowflake documentation.
How did we get here?
Software development organizations need a fast build lifecycle, quick continuous integration workflows, and a smoother developer experience, all while optimizing cost, putting security at the forefront, and testing all application logic. However, adopting modern cloud-based solutions has been challenging due to a slow develop-and-test loop that undercuts the promise made by proprietary providers. The centralized, remote execution model of cloud providers is a limitation and a liability, as developers continue to juggle live development environments, long build times, inefficient CI pipelines, and a cumbersome developer experience.
LocalStack has responded to these challenges by providing a robust cloud emulator that supports over 100 AWS services, allowing for integration tests to be run locally and in CI environments. This capability has positioned LocalStack as a critical tool for developers seeking to improve efficiency and reduce dependencies on remote cloud environments.
As LocalStack’s user base expanded, so did the demand for similar capabilities with Snowflake. Although a local testing framework from Snowflake is available, it only provides mock support for running integration tests, which falls short of more complex use cases. Leveraging the existing toolset in the LocalStack core cloud emulator — including our RDS Postgres utilities, snapshot testing library, analytics service client, and more — allowed us to build an initial experimental preview. This new extension was announced on our Discuss forum, where it quickly gained significant traction within the community.
How did we build this?
At its core, we utilize PostgreSQL as the database engine to store the user data and execute queries. The SQL syntax of Snowflake queries is overall fairly similar to PostgreSQL, but there are several more or less subtle differences. The figure below outlines some of the main components used in our implementation:
Query Processors are the main building blocks that collectively comprise the core engine that processes incoming user queries. We distinguish 3 main types of query processors:
- Database initializers are pieces of logic that are applied only once upon creation of a database (e.g., creating custom SQL functions).
- Query pre-processors operate on the abstract syntax tree (AST) of SQL queries and transform incoming queries in Snowflake format to target queries that can be executed in the DB engine (Postgres).
- Result post-processors take care of applying additional custom logic and converting the DB engine query results to Snowflake API-compatible result sets — either as JSON blobs or in Apache Arrow table format.
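The three processor types above can be illustrated with a toy pipeline. This is only a sketch and not the emulator's actual code: it uses Python's built-in sqlite3 as a stand-in for the Postgres engine, a regular expression in place of a real AST transformation, and a made-up JSON result shape. All function names (init_database, rewrite_nvl, to_result_set, run_query) are hypothetical.

```python
import json
import re
import sqlite3


def init_database(conn):
    # Database initializer: applied once per database, here registering a
    # custom SQL function (SQUARE) that the stand-in engine lacks.
    conn.create_function("SQUARE", 1, lambda x: x * x)


def rewrite_nvl(sql):
    # Query pre-processor: rewrite Snowflake's NVL(...) to COALESCE(...).
    # A regex is used for brevity; an AST rewrite would be more robust.
    return re.sub(r"\bNVL\s*\(", "COALESCE(", sql, flags=re.IGNORECASE)


def to_result_set(cursor):
    # Result post-processor: convert engine rows into a JSON blob with
    # upper-cased column names (a hypothetical result shape).
    columns = [desc[0].upper() for desc in cursor.description]
    rows = [list(row) for row in cursor.fetchall()]
    return json.dumps({"columns": columns, "rows": rows})


PRE_PROCESSORS = [rewrite_nvl]


def run_query(conn, sql):
    # Apply every pre-processor, execute on the engine, post-process the result.
    for pre_process in PRE_PROCESSORS:
        sql = pre_process(sql)
    return to_result_set(conn.execute(sql))


conn = sqlite3.connect(":memory:")
init_database(conn)
result = run_query(conn, "SELECT NVL(NULL, 3) AS val, SQUARE(4) AS sq")
print(result)  # {"columns": ["VAL", "SQ"], "rows": [[3, 16]]}
```

Keeping each stage as a small, independent function means new Snowflake-specific rewrites can be added to the list without touching the execution path.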
Auxiliary Services encompass additional pieces of logic to handle file stages, session states, table streams, tasks, as well as other integrations and functions.
How do I start?
To get started with the Snowflake emulator, pull our Docker image from DockerHub:
docker pull localstack/snowflake:latest
You can start the emulator using the localstack CLI after exporting your LocalStack Auth Token (LOCALSTACK_AUTH_TOKEN) in your terminal session:
export LOCALSTACK_AUTH_TOKEN=<your-auth-token>
IMAGE_NAME=localstack/snowflake localstack start
This command starts the emulator on snowflake.localhost.localstack.cloud, a DNS name that resolves to the local IP address 127.0.0.1. This setup ensures that the connector interacts seamlessly with the local APIs.
If you’re using Snowflake Drivers, such as the Snowflake Connector for Python, you can use the following code to connect to the local Snowflake instance:
import snowflake.connector as sf
# Mock credentials, pointed at the local emulator host
conn = sf.connect(
    user="test",
    password="test",
    account="test",
    database="test",
    host="snowflake.localhost.localstack.cloud",
)
Once connected, you can set up your development environment by executing commands to establish core components. Create a warehouse named test_warehouse, a database named testdb, and a schema named testschema with the following commands:
conn.cursor().execute("CREATE WAREHOUSE IF NOT EXISTS test_warehouse")
conn.cursor().execute("CREATE DATABASE IF NOT EXISTS testdb")
conn.cursor().execute("USE DATABASE testdb")
conn.cursor().execute("CREATE SCHEMA IF NOT EXISTS testschema")
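With the warehouse, database, and schema in place, a natural next step is to create a table and query it. The helper below is a minimal sketch that assumes the conn object from the connection example above; the trips table and its columns are made up for illustration, and the function name seed_and_query is hypothetical. It relies only on the standard cursor methods execute, fetchall, and close.

```python
def seed_and_query(conn):
    """Create a small table, insert two rows, and read them back."""
    cur = conn.cursor()
    try:
        # Fully qualified name, so the call does not depend on the current schema.
        cur.execute(
            "CREATE TABLE IF NOT EXISTS testdb.testschema.trips "
            "(id INT, city VARCHAR)"
        )
        cur.execute(
            "INSERT INTO testdb.testschema.trips (id, city) "
            "VALUES (1, 'New York'), (2, 'Zurich')"
        )
        cur.execute("SELECT id, city FROM testdb.testschema.trips ORDER BY id")
        return cur.fetchall()
    finally:
        # Always release the cursor, even if a statement fails.
        cur.close()
```

Calling seed_and_query(conn) against a running emulator should return the two inserted rows. Note that running it twice inserts the rows again, since only the table creation is guarded by IF NOT EXISTS.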
Similarly, you can utilize the JDBC driver to connect to the Snowflake emulator from your preferred database visualization tool. Here is an example of running Snowflake SQL queries on DBeaver connected to the Snowflake emulator:
You can view the LocalStack logs via localstack logs to see the Snowflake emulator in action. To connect your existing Snowflake app to the emulator, all you need to do is set the Snowflake host name to snowflake.localhost.localstack.cloud while specifying mock credentials for your Snowflake user, password, and account.
Note that LocalStack at no point talks to a real Snowflake instance. Everything runs locally, giving you the full power and flexibility to develop and test your data applications without relying on real cloud resources.
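Because only the host name and credentials differ, an existing application can switch between the emulator and a real account with a small configuration helper. The sketch below is one possible approach, not an official pattern: the environment variable names (USE_LOCAL_SNOWFLAKE, SNOWFLAKE_USER, SNOWFLAKE_PASSWORD, SNOWFLAKE_ACCOUNT) and the function names are assumptions chosen for illustration.

```python
import os

# Host name of the local emulator, as described above.
LOCAL_HOST = "snowflake.localhost.localstack.cloud"


def connection_params(use_local=None):
    """Return connect() keyword arguments for the emulator or a real account."""
    if use_local is None:
        # Hypothetical toggle; defaults to the local emulator.
        use_local = os.environ.get("USE_LOCAL_SNOWFLAKE", "1") == "1"
    if use_local:
        # Mock credentials plus the emulator host name.
        return {
            "user": "test",
            "password": "test",
            "account": "test",
            "host": LOCAL_HOST,
        }
    # Real credentials come from the environment (variable names are assumptions).
    return {
        "user": os.environ["SNOWFLAKE_USER"],
        "password": os.environ["SNOWFLAKE_PASSWORD"],
        "account": os.environ["SNOWFLAKE_ACCOUNT"],
    }


def connect(use_local=None):
    # Imported lazily so the helper above can be used without the driver installed.
    import snowflake.connector

    return snowflake.connector.connect(**connection_params(use_local))
```

Application code then calls connect() everywhere, and tests or CI jobs keep the default so that queries stay on the local emulator.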
Running a Snowflake Data Application locally
To demonstrate a complete scenario, we used the Building a Data Application Snowflake quickstart app and deployed it against the local emulator.
The code for this application is available on GitHub. After following the installation instructions (make install) and seeding the data into local Snowflake (make seed) using Snow CLI, you can start the app locally and interact with the local tables via the web user interface (make start-web).
The screenshot below shows how the Web app queries NYC Citibike trips data and displays the distribution of trips by month and weekday.
Next steps
The foundation for our next steps in the development of the Snowflake emulator lies in parity, performance, and developer experience, and our promise to provide the best tooling to empower data engineers across the entire software development lifecycle (SDLC).
Join our preview program today! We’re working closely with our community of early adopters to understand your use cases, so that we can prioritize feature development at this stage and ensure our implementation supports your requirements. Here are some features you can expect in the upcoming months:
- Full emulation of database roles, role-based access control, and row-level security policies.
- Enhanced support for table streams and CDC use cases.
- Advanced integration with other storage/streaming cloud services in LocalStack (AWS Glue, Kinesis Firehose, S3, AppFlow, etc).
- Tooling for test data management and preseeding the emulator with data from a real Snowflake instance.
- A Web user experience to inspect the state of your local Snowflake resources and help with common day-to-day tasks.
- A connection proxy that allows mirroring data from real Snowflake cloud into the local emulator, to easily flip the switch between local and remote query execution.
We are excited to have the privilege of working with our community to accelerate cloud and data development processes. LocalStack is poised to change the cloud development landscape, and we are looking forward to your continued support and feedback!
Learn more
- Watch the webinar recording on testing data pipelines with the Snowflake emulator.
- Navigate to our documentation & tutorials to try out the various features.
- Check out our Snowflake samples to explore a variety of use cases that you can run locally.
- Have questions? Join the LocalStack Slack Community to get help.
New to LocalStack? Create a free account today and get started!