
Integrating Polaris with LocalStack for Snowflake and Trino

Learn how to build a fully local data lakehouse by integrating the Snowflake emulator with Apache Polaris and Trino. This tutorial walks through creating shared Iceberg tables backed by LocalStack S3, enabling seamless interoperability across query engines without touching the cloud.


Introduction

Managing data across query engines like Snowflake and Trino is complex and often results in data silos, making it hard to maintain a single source of truth. Apache Iceberg, an open table format, combined with a central catalog service like Apache Polaris, facilitates access to the same data by multiple engines, enabling a unified architecture.

However, building and testing pipelines across engines remains slow and expensive due to long feedback loops and high cloud costs, which delay development and hinder early issue detection. The LocalStack for Snowflake emulator addresses these problems by letting you run a local Snowflake environment in Docker with a “shift-left” approach to data engineering.

This tutorial shows how to set up a local data lakehouse using the Snowflake emulator with Apache Polaris and Trino. You will use both engines to query the same shared Apache Iceberg tables stored in LocalStack S3 for interoperable local development.

How LocalStack works with Polaris

To support local development, LocalStack provides a localstack/polaris Docker image that runs Apache Polaris as a containerized service, enabling easy setup of a REST-based Iceberg catalog locally.
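In this tutorial the Polaris container is started through Docker Compose (shown later), but if you just want to try the image on its own, a minimal standalone run looks roughly like the sketch below. The environment variables mirror the Compose file used later in this tutorial; host.docker.internal is an assumption for Docker Desktop networking, so adjust the LocalStack endpoint for your setup.

Terminal window
# Standalone sketch only; the tutorial itself uses Docker Compose.
docker run --rm -p 8181:8181 -p 8182:8182 \
  -e AWS_REGION=us-east-1 \
  -e AWS_ACCESS_KEY_ID=test \
  -e AWS_SECRET_ACCESS_KEY=test \
  -e AWS_ENDPOINT_URL=http://host.docker.internal:4566 \
  -e POLARIS_BOOTSTRAP_CREDENTIALS=default-realm,root,s3cr3t \
  localstack/polaris:latest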

The integration between the Snowflake emulator and the Polaris container is achieved through a series of SQL commands that enable Snowflake to utilize Polaris for metadata and LocalStack S3 for data storage.

Here is how the connection is made:

  • A CREATE EXTERNAL VOLUME statement is used to define a pointer to a storage location in LocalStack S3. This tells the Snowflake emulator where to store Parquet data files and Iceberg metadata files. It includes:
    • S3 bucket URI in LocalStack
    • Access key and secret key for authentication
  • A CREATE CATALOG INTEGRATION statement sets up the connection to Polaris as an external Iceberg REST catalog. Key parameters include:
    • CATALOG_SOURCE=ICEBERG_REST: Indicates the use of the Iceberg REST API, which Polaris implements
    • CATALOG_URI: Set to the network address of the Polaris container (e.g., http://polaris:8181)
    • REST_AUTHENTICATION: Includes OAuth credentials required for Polaris API access

Once the integration is configured, you can create Iceberg tables using CREATE ICEBERG TABLE with the CATALOG='iceberg_catalog' parameter.

Snowflake sends metadata operations to Polaris, which handles schema and partition management, while actual data is written to the S3 location specified by the external volume.

Architecture Diagram

This separation of metadata and storage enables other engines like Trino to access and query the same Iceberg tables.

Prerequisites

Before you start, make sure you have the following:

  • Docker and Docker Compose
  • A LocalStack Auth Token with access to the LocalStack for Snowflake emulator
  • The awslocal CLI
  • A SQL client that can connect to the Snowflake emulator

Step 1: Set up the project

First, clone the sample application repository from GitHub and navigate into the project directory.

Terminal window
git clone https://github.com/localstack-samples/polaris-demo.git
cd polaris-demo

Step 2: Understand and start the services

The sample application uses Docker Compose to start the services. The interoperability between Snowflake and Trino is set up by pointing their respective configurations at the same Polaris Catalog and S3 backend.

2.1: Understanding the Docker Compose configuration

The docker-compose.yml file, provided in the repository, sets up and configures all required services for the sample app.

docker-compose.yml
services:
  localstack:
    image: localstack/snowflake:latest
    ports:
      - "127.0.0.1:4566:4566"
      - "127.0.0.1:4510-4559:4510-4559"
      - "127.0.0.1:443:443"
    environment:
      - LOCALSTACK_AUTH_TOKEN=${LOCALSTACK_AUTH_TOKEN:?}
      - DEBUG=1
      - DOCKER_FLAGS='-e SF_LOG=trace'
    volumes:
      - "./volume:/var/lib/localstack"
      - "/var/run/docker.sock:/var/run/docker.sock"
  polaris:
    image: localstack/polaris:latest
    ports:
      - "8181:8181"
      - "8182"
    environment:
      AWS_REGION: us-east-1
      AWS_ACCESS_KEY_ID: test
      AWS_SECRET_ACCESS_KEY: test
      AWS_ENDPOINT_URL: http://localstack:4566
      POLARIS_BOOTSTRAP_CREDENTIALS: default-realm,root,s3cr3t
      polaris.realm-context.realms: default-realm
      quarkus.otel.sdk.disabled: "true"
    healthcheck:
      test: ["CMD", "curl", "http://localhost:8182/healthcheck"]
      interval: 10s
      timeout: 10s
      retries: 5
  create-polaris-catalog:
    image: curlimages/curl
    depends_on:
      polaris:
        condition: service_healthy
    volumes:
      - ./create-polaris-catalog.sh:/create-polaris-catalog.sh
    command: ["/bin/sh", "/create-polaris-catalog.sh"]
  trino:
    image: trinodb/trino:latest
    ports:
      - "8080:8080"
    volumes:
      - ./trino-config/catalog:/etc/trino/catalog
    depends_on:
      polaris:
        condition: service_healthy

Here is a breakdown of the services:

  • localstack: Runs the LocalStack container with the Snowflake emulator enabled, exposing ports for AWS services and the Snowflake API.
  • polaris: Deploys the Apache Polaris Catalog service, configured to use LocalStack as its S3-compatible backend via the AWS_ENDPOINT_URL environment variable.
  • create-polaris-catalog: A temporary service that executes the create-polaris-catalog.sh script (provided in the repository) to set up the Polaris Catalog after the service becomes healthy.
  • trino: Launches the Trino query engine with configuration files mounted from the local trino-config directory (provided in the repository).

2.2: Understanding the Polaris Catalog Setup

The create-polaris-catalog.sh script automates Polaris Catalog setup by performing the following steps:

  • Waits for the Polaris service to report as healthy.
  • Requests an OAuth2 bearer token from Polaris for authentication.
  • Sends a POST request to the Polaris management API to create a catalog named polaris, configured to use s3://test-bucket/ in LocalStack S3 as its default storage location.
  • Adds the TABLE_WRITE_DATA privilege to the catalog_admin role.
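To make these steps concrete, here is a rough curl sketch of the kind of requests the script sends, run from the host against the mapped port 8181 using the bootstrap credentials (root / s3cr3t) from the Docker Compose file. It is an approximation for illustration; the exact endpoints and payloads used by create-polaris-catalog.sh may differ slightly.

Terminal window
# 1. Request an OAuth2 bearer token from Polaris
TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d "grant_type=client_credentials&client_id=root&client_secret=s3cr3t&scope=PRINCIPAL_ROLE:ALL" \
  | sed -E 's/.*"access_token":"([^"]+)".*/\1/')

# 2. Create a catalog named "polaris" with s3://test-bucket/ as its default storage location
curl -s -X POST http://localhost:8181/api/management/v1/catalogs \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"catalog": {"name": "polaris", "type": "INTERNAL",
        "properties": {"default-base-location": "s3://test-bucket/"},
        "storageConfigInfo": {"storageType": "S3",
          "allowedLocations": ["s3://test-bucket/"],
          "roleArn": "arn:aws:iam::000000000000:root"}}}'

# 3. Grant TABLE_WRITE_DATA to the catalog_admin role of the new catalog
curl -s -X PUT http://localhost:8181/api/management/v1/catalogs/polaris/catalog-roles/catalog_admin/grants \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"grant": {"type": "catalog", "privilege": "TABLE_WRITE_DATA"}}'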

2.3: Understanding the Trino Configuration

The configuration for Trino’s Iceberg connector is in trino-config/catalog/iceberg.properties.

iceberg.properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://polaris:8181/api/catalog
iceberg.rest-catalog.security=OAUTH2
iceberg.rest-catalog.oauth2.credential=root:s3cr3t
iceberg.rest-catalog.oauth2.scope=PRINCIPAL_ROLE:ALL
iceberg.rest-catalog.case-insensitive-name-matching=true
iceberg.rest-catalog.warehouse=polaris
fs.native-s3.enabled=true
s3.endpoint=http://localstack:4566
s3.aws-access-key=test
s3.aws-secret-key=test
s3.region=us-east-1
s3.path-style-access=true

This file defines how Trino connects to Iceberg data.

Key properties include:

  • connector.name=iceberg: Specifies use of the Iceberg connector
  • iceberg.catalog.type=rest: Sets the catalog type to REST
  • iceberg.rest-catalog.uri=http://polaris:8181/api/catalog: Points to Polaris’s API endpoint
  • s3.endpoint=http://localstack:4566: Uses the LocalStack S3 endpoint for data access

2.4: Launch the Docker Compose

Before starting, ensure your LOCALSTACK_AUTH_TOKEN is exported as an environment variable.

Terminal window
export LOCALSTACK_AUTH_TOKEN=<your-localstack-auth-token>

Now, start all the services using Docker Compose:

Terminal window
docker-compose up

It may take a few minutes for all of the services to start. Once they are running, you can check their status using the following command:

Terminal window
docker ps

You should see output similar to the following:

Terminal window
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f82c9dc1defd trinodb/trino:latest "/usr/lib/trino/bin/…" 19 hours ago Up 9 seconds (health: starting) 0.0.0.0:8080->8080/tcp polaris-demo-trino-1
849aab637b09 localstack/polaris:latest "/opt/jboss/containe…" 19 hours ago Up 20 seconds (healthy) 8080/tcp, 8443/tcp, 0.0.0.0:8181->8181/tcp, 0.0.0.0:58598->8182/tcp polaris-demo-polaris-1
2264db66c672 localstack/snowflake:latest "docker-entrypoint.sh" 19 hours ago Up 19 seconds (healthy) 127.0.0.1:443->443/tcp, 127.0.0.1:4510-4559->4510-4559/tcp, 53/tcp, 5678/tcp, 127.0.0.1:4566->4566/tcp polaris-demo-localstack-1
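Before moving on, you can optionally sanity-check the stack. The commands below use the container names from the docker ps output above: the first reuses the same health endpoint as the Compose healthcheck, and the second asks Trino to list its catalogs, where iceberg should appear thanks to the mounted configuration.

Terminal window
# Check Polaris health from inside its container
docker exec polaris-demo-polaris-1 curl -s http://localhost:8182/healthcheck

# List Trino catalogs; "iceberg" should be included
docker exec -it polaris-demo-trino-1 trino --execute "SHOW CATALOGS"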

Step 3: Create AWS and Snowflake Resources

With all services running, you can now create the required LocalStack S3 bucket and then use SQL commands to define the data structures in the Snowflake emulator.

3.1: Create the S3 Bucket

The Polaris Catalog uses a bucket named test-bucket. Create this bucket in LocalStack S3 using the awslocal CLI:

Terminal window
awslocal s3 mb s3://test-bucket
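You can confirm the bucket was created by listing your LocalStack S3 buckets:

Terminal window
awslocal s3 ls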

3.2: Create an External Volume

Next, you need to define a named location in LocalStack S3 that Snowflake can use to read and write data. Run the following SQL statement in your preferred SQL client:

CREATE OR REPLACE EXTERNAL VOLUME iceberg_volume
  STORAGE_LOCATIONS = (
    (
      NAME = 'aws-s3-test'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://test-bucket/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::000000000000:root'
      ENCRYPTION=(TYPE='AWS_SSE_S3')
    )
  )
  ALLOW_WRITES = TRUE;

3.3: Create a Catalog Integration

Now you need to define a Catalog Integration that connects Snowflake to the external Polaris REST Catalog, enabling it to discover and manage Iceberg tables through Polaris.

Run the following SQL statement in your preferred SQL client:

CREATE CATALOG INTEGRATION iceberg_catalog
  CATALOG_SOURCE=ICEBERG_REST
  TABLE_FORMAT=ICEBERG
  CATALOG_NAMESPACE='test_namespace'
  REST_CONFIG=(
    CATALOG_URI='http://polaris:8181'
    CATALOG_NAME='polaris'
  )
  REST_AUTHENTICATION=(
    TYPE=OAUTH
    OAUTH_CLIENT_ID='root'
    OAUTH_CLIENT_SECRET='s3cr3t'
    OAUTH_ALLOWED_SCOPES=(PRINCIPAL_ROLE:ALL)
  )
  ENABLED=TRUE;

3.4: Create an Iceberg Table

This statement creates a new Iceberg table.

CREATE ICEBERG TABLE iceberg_table (c1 TEXT)
  CATALOG='iceberg_catalog',
  EXTERNAL_VOLUME='iceberg_volume',
  BASE_LOCATION='test/test_namespace';

The CATALOG and EXTERNAL_VOLUME parameters tell Snowflake to use the Polaris catalog integration and the S3-backed external volume you just created.

3.5: Insert and Query Data

Now, insert data into the table and query it back to confirm that it works.

INSERT INTO iceberg_table(c1) VALUES ('test'), ('foobar');
SELECT * FROM iceberg_table;

The output should be:

+--------+
| C1     |
|--------|
| test   |
| foobar |
+--------+
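As an optional check, you can also confirm that the table's metadata really lives in Polaris rather than only inside the Snowflake emulator. The sketch below queries Polaris's Iceberg REST API from the host, reusing the OAuth credentials from the Docker Compose file; the paths follow the Iceberg REST catalog spec, with the catalog name (polaris) acting as the prefix, and may vary slightly across Polaris versions.

Terminal window
# Obtain a bearer token, then list the tables Polaris tracks in test_namespace
TOKEN=$(curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d "grant_type=client_credentials&client_id=root&client_secret=s3cr3t&scope=PRINCIPAL_ROLE:ALL" \
  | sed -E 's/.*"access_token":"([^"]+)".*/\1/')

curl -s -H "Authorization: Bearer $TOKEN" \
  http://localhost:8181/api/catalog/v1/polaris/namespaces/test_namespace/tables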

Step 4: Querying Data with Trino

The main benefit of this setup is that the table created through the Snowflake emulator can also be queried using Trino. Connect to the Trino container to run queries from its CLI:

Terminal window
docker exec -it polaris-demo-trino-1 trino

At the trino> prompt, query the table using its fully qualified name, iceberg.test_namespace.iceberg_table, which combines the catalog, namespace, and table name.

Terminal window
SELECT * FROM iceberg.test_namespace.iceberg_table;

You should see the same rows, demonstrating successful interoperability:

Terminal window
c1
--------
test
foobar
(2 rows)

Step 5: Inspect the S3 bucket

You can inspect the raw files in LocalStack S3 to see how Apache Iceberg organizes data. Open the S3 Resource Browser in the LocalStack Web Application and navigate to test-bucket.

S3 Resource Browser

Inside the test/test_namespace/ prefix, you’ll find:

  • data/: Contains Parquet files (.parquet), each holding a subset of table rows
  • metadata/: Contains Iceberg metadata, including:
    • Manifest lists and manifest files (.avro)
    • Core table metadata (.json) defining schema, partitions, snapshots, and data-file mapping.

This confirms that both the Snowflake emulator and Trino use the standard Iceberg file layout, relying on Polaris as the metadata catalog for interoperability.
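If you prefer the command line, the same layout is visible with awslocal:

Terminal window
# Parquet files under data/ and Avro/JSON metadata under metadata/
awslocal s3 ls s3://test-bucket/test/test_namespace/ --recursive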

Conclusion

This tutorial showed how to build a local data lakehouse using LocalStack for Snowflake, Apache Polaris, and Trino. By leveraging the Apache Iceberg format and a centralized REST catalog, you can enable seamless interoperability between query engines.

With LocalStack, you gain a fully local, containerized environment that emulates critical cloud resources, including Snowflake and S3, making it ideal for developing and testing complex data architectures. This local-first setup lets you:

  • Maintain a single source of truth accessible by multiple engines
  • Develop and iterate on multi-engine data pipelines locally
  • Accelerate feedback loops by avoiding real cloud dependencies
  • Validate data logic in an isolated, reproducible environment

Harsh Mishra
Engineer at LocalStack
Harsh Mishra is an Engineer at LocalStack and an AWS Community Builder. He has previously worked at HackerRank, Red Hat, and Quansight, and specializes in DevOps, Platform Engineering, and CI/CD pipelines.