
Integrating Polaris with LocalStack for Snowflake and Trino

Learn how to build a fully local data lakehouse by integrating the Snowflake emulator with Apache Polaris and Trino. This tutorial walks through creating shared Iceberg tables backed by LocalStack S3, enabling seamless interoperability across query engines without touching the cloud.


Introduction

Managing data across query engines like Snowflake and Trino is complex and often results in data silos, making it hard to maintain a single source of truth. Apache Iceberg, an open table format, combined with a central catalog service like Apache Polaris, facilitates access to the same data by multiple engines, enabling a unified architecture.

However, building and testing pipelines across engines remains slow and expensive due to long feedback loops and high cloud costs, which delay development and hinder early issue detection. The LocalStack for Snowflake emulator addresses these problems by letting you run a local Snowflake environment in Docker with a “shift-left” approach to data engineering.

This tutorial shows how to set up a local data lakehouse using the Snowflake emulator with Apache Polaris and Trino. You will use both engines to query the same shared Apache Iceberg tables stored in LocalStack S3 for interoperable local development.

How LocalStack works with Polaris

To support local development, LocalStack provides a localstack/polaris Docker image that runs Apache Polaris as a containerized service, enabling easy setup of a REST-based Iceberg catalog locally.
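In this tutorial the Polaris container is started through Docker Compose (shown later), but if you just want to try the image on its own, a minimal standalone run looks roughly like the sketch below. The environment variables mirror the Compose file used later in this tutorial; host.docker.internal is an assumption for Docker Desktop networking, so adjust the LocalStack endpoint for your setup.

Terminal window
# Standalone sketch only; the tutorial itself uses Docker Compose.
docker run --rm -p 8181:8181 -p 8182:8182 \
  -e AWS_REGION=us-east-1 \
  -e AWS_ACCESS_KEY_ID=test \
  -e AWS_SECRET_ACCESS_KEY=test \
  -e AWS_ENDPOINT_URL=http://host.docker.internal:4566 \
  -e POLARIS_BOOTSTRAP_CREDENTIALS=default-realm,root,s3cr3t \
  localstack/polaris:latest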

The integration between the Snowflake emulator and the Polaris container is achieved through a series of SQL commands that enable Snowflake to utilize Polaris for metadata and LocalStack S3 for data storage.

Here is how the connection is made:

  • A CREATE EXTERNAL VOLUME statement is used to define a pointer to a storage location in LocalStack S3. This tells the Snowflake emulator where to store Parquet data files and Iceberg metadata files. It includes:
    • S3 bucket URI in LocalStack
    • Access key and secret key for authentication
  • A CREATE CATALOG INTEGRATION statement sets up the connection to Polaris as an external Iceberg REST catalog. Key parameters include:
    • CATALOG_SOURCE=ICEBERG_REST: Indicates the use of the Iceberg REST API, which Polaris implements
    • CATALOG_URI: Set to the network address of the Polaris container (e.g., http://polaris:8181)
    • REST_AUTHENTICATION: Includes OAuth credentials required for Polaris API access

Once the integration is configured, you can create Iceberg tables using CREATE ICEBERG TABLE with the CATALOG='iceberg_catalog' parameter.

Snowflake sends metadata operations to Polaris, which handles schema and partition management, while actual data is written to the S3 location specified by the external volume.

Architecture Diagram

This separation of metadata and storage enables other engines like Trino to access and query the same Iceberg tables.

Prerequisites

Before you start, make sure you have the following:

  • Docker and Docker Compose
  • A LocalStack Auth Token with access to the LocalStack for Snowflake emulator
  • The awslocal CLI
  • A SQL client that can connect to the Snowflake emulator

Step 1: Set up the project

First, clone the sample application repository from GitHub and navigate into the project directory.

Terminal window
git clone https://github.com/localstack-samples/polaris-demo.git
cd polaris-demo

Step 2: Understand and start the services

The sample application uses Docker Compose to start the services. The interoperability between Snowflake and Trino is set up by pointing their respective configurations at the same Polaris Catalog and S3 backend.

2.1: Understanding the Docker Compose configuration

The docker-compose.yml file, provided in the repository, sets up and configures all required services for the sample app.

docker-compose.yml
services:
  localstack:
    image: localstack/snowflake:latest
    ports:
      - "127.0.0.1:4566:4566"
      - "127.0.0.1:4510-4559:4510-4559"
      - "127.0.0.1:443:443"
    environment:
      - LOCALSTACK_AUTH_TOKEN=${LOCALSTACK_AUTH_TOKEN:?}
      - DEBUG=1
      - DOCKER_FLAGS='-e SF_LOG=trace'
    volumes:
      - "./volume:/var/lib/localstack"
      - "/var/run/docker.sock:/var/run/docker.sock"
  polaris:
    image: localstack/polaris:latest
    ports:
      - "8181:8181"
      - "8182"
    environment:
      AWS_REGION: us-east-1
      AWS_ACCESS_KEY_ID: test
      AWS_SECRET_ACCESS_KEY: test
      AWS_ENDPOINT_URL: http://localstack:4566
      POLARIS_BOOTSTRAP_CREDENTIALS: default-realm,root,s3cr3t
      polaris.realm-context.realms: default-realm
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/healthcheck"]
      interval: 10s
      timeout: 10s
      retries: 5
  create-polaris-catalog:
    image: curlimages/curl
    depends_on:
      polaris:
        condition: service_healthy
    volumes:
      - ./create-polaris-catalog.sh:/create-polaris-catalog.sh
    command: ["/bin/sh", "/create-polaris-catalog.sh"]
  trino:
    image: trinodb/trino:latest
    ports:
      - "8080:8080"
    volumes:
      - ./trino-config/catalog:/etc/trino/catalog
    depends_on:
      polaris:
        condition: service_healthy

Here is a breakdown of the services:

  • localstack: Runs the LocalStack container with the Snowflake emulator enabled, exposing ports for AWS services and the Snowflake API.
  • polaris: Deploys the Apache Polaris Catalog service, configured to use LocalStack as its S3-compatible backend via the AWS_ENDPOINT_URL environment variable.
  • create-polaris-catalog: A temporary service that executes the create-polaris-catalog.sh script (provided in the repository) to set up the Polaris Catalog after the service becomes healthy.
  • trino: Launches the Trino query engine with configuration files mounted from the local trino-config directory (provided in the repository).

2.2: Understanding the Polaris Catalog Setup

The create-polaris-catalog.sh script automates Polaris Catalog setup by performing the following steps:

  • Waits for the Polaris service to report as healthy.
  • Requests an OAuth2 bearer token from Polaris for authentication.
  • Sends a POST request to the Polaris management API to create a catalog named polaris, configured to use s3://test-bucket/ in LocalStack S3 as its default storage location.
  • Adds the TABLE_WRITE_DATA privilege to the catalog_admin role.
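To make these steps concrete, here is a rough curl sketch of the kind of requests the script sends, run from the host against the mapped port 8181 using the bootstrap credentials (root / s3cr3t) from the Docker Compose file. It is an approximation for illustration; the exact endpoints and payloads used by create-polaris-catalog.sh may differ slightly.

Terminal window
# 1. Request an OAuth2 bearer token from Polaris
TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d "grant_type=client_credentials&client_id=root&client_secret=s3cr3t&scope=PRINCIPAL_ROLE:ALL" \
  | sed -E 's/.*"access_token":"([^"]+)".*/\1/')

# 2. Create a catalog named "polaris" with s3://test-bucket/ as its default storage location
curl -s -X POST http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"catalog": {"name": "polaris", "type": "INTERNAL",
        "properties": {"default-base-location": "s3://test-bucket/"},
        "storageConfigInfo": {"storageType": "S3",
          "allowedLocations": ["s3://test-bucket/"],
          "roleArn": "arn:aws:iam::000000000000:root"}}}'

# 3. Grant TABLE_WRITE_DATA to the catalog_admin role of the new catalog
curl -s -X PUT http://localhost:8181/api/management/v1/catalogs/polaris/catalog-roles/catalog_admin/grants \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"grant": {"type": "catalog", "privilege": "TABLE_WRITE_DATA"}}'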

2.3: Understanding the Trino Configuration

The configuration for Trino’s Iceberg connector is in trino-config/catalog/iceberg.properties.

iceberg.properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://polaris:8181/api/catalog
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.credential=root:s3cr3t
iceberg.rest-catalog.oauth2.scope=PRINCIPAL_ROLE:ALL
iceberg.rest-catalog.case-insensitive-name-matching=true
iceberg.rest-catalog.warehouse=polaris
fs.native-s3.enabled=true
s3.endpoint=http://localstack:4566
s3.aws-access-key=test
s3.aws-secret-key=test
s3.region=us-east-1
s3.path-style-access=true

This file defines how Trino connects to Iceberg data.

Key properties include:

  • connector.name=iceberg: Specifies use of the Iceberg connector
  • iceberg.catalog.type=rest: Sets the catalog type to REST
  • iceberg.rest-catalog.uri=http://polaris:8181/api/catalog: Points to Polaris’s API endpoint
  • s3.endpoint=http://localstack:4566: Uses the LocalStack S3 endpoint for data access

2.4: Launch the Docker Compose

Before starting, ensure your LOCALSTACK_AUTH_TOKEN is exported as an environment variable.

Terminal window
export LOCALSTACK_AUTH_TOKEN=<your-localstack-auth-token>

Now, start all the services using Docker Compose:

Terminal window
docker-compose up

It may take a few minutes for all of the services to start. Once they are running, you can check their status using the following command:

Terminal window
docker ps

You should see output similar to the following:

Terminal window
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f82c9dc1defd trinodb/trino:latest "/usr/lib/trino/bin/…" 19 hours ago Up 9 seconds (health: starting) 0.0.0.0:8080->8080/tcp polaris-demo-trino-1
849aab637b09 localstack/polaris:latest "/opt/jboss/containe…" 19 hours ago Up 20 seconds (healthy) 8080/tcp, 8443/tcp, 0.0.0.0:8181->8181/tcp, 0.0.0.0:58598->8182/tcp polaris-demo-polaris-1
2264db66c672 localstack/snowflake:latest "docker-entrypoint.sh" 19 hours ago Up 19 seconds (healthy) 127.0.0.1:443->443/tcp, 127.0.0.1:4510-4559->4510-4559/tcp, 53/tcp, 5678/tcp, 127.0.0.1:4566->4566/tcp polaris-demo-localstack-1
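Before moving on, you can optionally sanity-check the stack. The commands below use the container names from the docker ps output above: the first reuses the same health endpoint as the Compose healthcheck, and the second asks Trino to list its catalogs, where iceberg should appear thanks to the mounted configuration.

Terminal window
# Check Polaris health from inside its container
docker exec polaris-demo-polaris-1 curl -s http://localhost:8182/healthcheck

# List Trino catalogs; "iceberg" should be included
docker exec -it polaris-demo-trino-1 trino --execute "SHOW CATALOGS"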

Step 3: Create AWS and Snowflake Resources

With all services running, you can now create the required LocalStack S3 bucket and then use SQL commands to define the data structures in the Snowflake emulator.

3.1: Create the S3 Bucket

The Polaris Catalog uses a bucket named test-bucket. Create this bucket in LocalStack S3 using the awslocal CLI:

Terminal window
awslocal s3 mb s3://test-bucket
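You can confirm the bucket was created by listing your LocalStack S3 buckets:

Terminal window
awslocal s3 ls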

3.2: Create an External Volume

Next, you need to define a named location in LocalStack S3 that Snowflake can use to read and write data. Run the following SQL statement in your preferred SQL client:

CREATE OR REPLACE EXTERNAL VOLUME iceberg_volume
  STORAGE_LOCATIONS = (
    (
      NAME = 'aws-s3-test'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://test-bucket/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:root'
      ENCRYPTION=(TYPE='AWS_SSE_S3')
    )
  )
  ALLOW_WRITES = TRUE;

3.3: Create a Catalog Integration

Now you need to define a Catalog Integration that connects Snowflake to the external Polaris REST Catalog, enabling it to discover and manage Iceberg tables through Polaris.

Run the following SQL statement in your preferred SQL client:

CREATE CATALOG INTEGRATION iceberg_catalog
  CATALOG_SOURCE=ICEBERG_REST
  TABLE_FORMAT=ICEBERG
  CATALOG_NAMESPACE='test_namespace'
  REST_CONFIG=(
    CATALOG_URI='http://polaris:8181'
    CATALOG_NAME='polaris'
  )
  REST_AUTHENTICATION=(
    TYPE=OAUTH
    OAUTH_CLIENT_ID='root'
    OAUTH_CLIENT_SECRET='s3cr3t'
    OAUTH_ALLOWED_SCOPES=(PRINCIPAL_ROLE:ALL)
  )
  ENABLED=TRUE;

3.4: Create an Iceberg Table

This statement creates a new Iceberg table.

CREATE ICEBERG TABLE iceberg_table (c1 TEXT)
  CATALOG='iceberg_catalog',
  EXTERNAL_VOLUME='iceberg_volume',
  BASE_LOCATION='test/test_namespace';

The CATALOG and EXTERNAL_VOLUME parameters tell Snowflake to use the Polaris catalog integration and the S3-backed external volume you just created.

3.5: Insert and Query Data

Now, insert data into the table and query it back to confirm that it works.

INSERT INTO iceberg_table(c1) VALUES ('test'), ('foobar');
SELECT * FROM iceberg_table;

The output should be:

+--------+
| C1     |
|--------|
| test   |
| foobar |
+--------+
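As an optional check, you can also confirm that the table's metadata really lives in Polaris rather than only inside the Snowflake emulator. The sketch below queries Polaris's Iceberg REST API from the host, reusing the OAuth credentials from the Docker Compose file; the paths follow the Iceberg REST catalog spec, with the catalog name (polaris) acting as the prefix, and may vary slightly across Polaris versions.

Terminal window
# Obtain a bearer token, then list the tables Polaris tracks in test_namespace
TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d "grant_type=client_credentials&client_id=root&client_secret=s3cr3t&scope=PRINCIPAL_ROLE:ALL" \
  | sed -E 's/.*"access_token":"([^"]+)".*/\1/')

curl -s -H "Authorization: Bearer $TOKEN" \
  http://localhost:8181/api/catalog/v1/polaris/namespaces/test_namespace/tables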

Step 4: Querying Data with Trino

The main benefit of this setup is that the table created through the Snowflake emulator can also be queried using Trino. Connect to the Trino container to run queries from its CLI:

Terminal window
docker exec -it polaris-demo-trino-1 trino

At the trino> prompt, query the table using its fully qualified name, iceberg.test_namespace.iceberg_table, which combines the catalog, namespace, and table name.

Terminal window
SELECT * FROM iceberg.test_namespace.iceberg_table;

You should see the same rows, demonstrating successful interoperability:

Terminal window
c1
--------
test
foobar
(2 rows)

Step 5: Inspect the S3 bucket

You can inspect the raw files in LocalStack S3 to see how Apache Iceberg organizes data. Open the S3 Resource Browser in the LocalStack Web Application and navigate to test-bucket.

S3 Resource Browser

Inside the test/test_namespace/ prefix, you’ll find:

  • data/: Contains Parquet files (.parquet), each holding a subset of table rows
  • metadata/: Contains Iceberg metadata, including:
    • Manifest lists and manifest files (.avro)
    • Core table metadata (.json) defining schema, partitions, snapshots, and data-file mapping.

This confirms that both the Snowflake emulator and Trino use the standard Iceberg file layout, relying on Polaris as the metadata catalog for interoperability.
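If you prefer the command line, the same layout is visible with awslocal:

Terminal window
# Parquet files under data/ and Avro/JSON metadata under metadata/
awslocal s3 ls s3://test-bucket/test/test_namespace/ --recursive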

Conclusion

This tutorial showed how to build a local data lakehouse using LocalStack for Snowflake, Apache Polaris, and Trino. By leveraging the Apache Iceberg format and a centralized REST catalog, you can enable seamless interoperability between query engines.

With LocalStack, you gain a fully local, containerized environment that emulates critical cloud resources, including Snowflake and S3, making it ideal for developing and testing complex data architectures. This local-first setup lets you:

  • Maintain a single source of truth accessible by multiple engines
  • Develop and iterate on multi-engine data pipelines locally
  • Accelerate feedback loops by avoiding real cloud dependencies
  • Validate data logic in an isolated, reproducible environment

Harsh Mishra
Engineer at LocalStack
Harsh Mishra is an Engineer at LocalStack and an AWS Community Builder. He has previously worked at HackerRank, Red Hat, and Quansight, and specializes in DevOps, Platform Engineering, and CI/CD pipelines.