Launch of Polars Cloud and Distributed Polars

Wed, 3 Sept 2025

After working hard since our Polars Cloud announcement last February, we are pleased to officially launch Polars Cloud. Polars Cloud is now Generally Available on AWS. Beyond that, we also launched our novel Distributed Engine in Open Beta on Polars Cloud.

You can immediately get started at https://cloud.pola.rs/.

Once you have signed up, you can run a remote distributed query:

import polars_cloud as pc
import polars as pl
from datetime import date

pc.authenticate()

with pc.ComputeContext(
    workspace="<my-workspace>",
    cpus=2,
    memory=8,
    cluster_size=8,
) as ctx:
    in_progress = (
        pl.scan_parquet(
            "s3://polars-cloud-samples-us-east-2-prd/pdsh/sf100/lineitem/",
            storage_options={"aws_request_payer": "true"},
        )
        .filter(pl.col("l_shipdate") <= date(1998, 9, 2))
        .group_by("l_returnflag", "l_linestatus")
        .agg(
            count_order=pl.len()
        )
        .remote(ctx)
        .distributed()
        .execute()
    )

    print(in_progress.await_result().head)

Closing the DataFrame scale gap

The General Availability of Polars Cloud on AWS marks a major milestone in closing the DataFrame scale gap—the historic divide between the ease of pandas locally and the scalability of PySpark remotely. By making Polars Cloud broadly accessible, we bring to life our mission of delivering fast, flexible and open-source data tools that run everywhere, giving users a single API that seamlessly scales from a laptop to the cloud.

Equally significant is the Open Beta of our Distributed Engine, which leverages Polars’ novel streaming architecture to offer not just horizontal but also vertical and diagonal scaling strategies. This design directly addresses the cost, complexity and performance tradeoffs users face today, while making high-performance compute broadly accessible. Together, these launches represent a step-change: remote execution that feels native, distribution without friction, and an architecture built to meet the future of large-scale data processing head-on.

1. What is Polars Cloud

Polars Cloud is a managed data platform that lets you run Polars queries remotely in the cloud at scale. We manage the cloud infrastructure and the scaling. Besides remote execution, Polars Cloud offers several scaling strategies, of which distributed execution is the most important. Our distributed engine runs our OSS streaming engine on the workers. This keeps us committed to making OSS Polars better, because we are one of its direct users.

Because of Polars’ strength in vertical compute, the distributed engine offers not only horizontal but also diagonal scaling. In the diagonal setup, a single big worker handles the tasks that are better off on one beefy node and would not benefit from the shuffling overhead. Polars Cloud lets you choose the scaling strategy that best fits your use case through one API for any scale, which reduces cost, time, and complexity.

Learn more about Polars Cloud in our initial announcement post.

2. Polars Distributed Engine in Public Beta

Our distributed engine is available in Public Beta. We are confident that it has reached a state where it is useful and, in some cases, even one of the best options available. There are of course operations we do not yet support in a distributed manner; in those cases we automatically fall back to a single node for that operation. Among many other operations, we can run our PDS-H benchmark fully distributed. If you want to stay up to date on what our distributed engine can do, keep an eye on the tracking issue here.

Where I think our distributed engine shines is in combining partitionable queries with order-dependent data processing, as in the query below.

result = (
    trades.group_by_dynamic(
        "time",
        every="1m",
        group_by="symbol"
    ).agg(
        avg_price=pl.col("price").mean(),
        total_size=pl.col("size").sum(),
    ).rename(
        # After group_by_dynamic, "time" holds the start of each window
        {"time": "interval_start"}
    ).join_asof(
        fairs,
        left_on="interval_start",
        right_on="time",
        by="symbol",
        strategy="backward"
    ).select(
        "symbol",
        "interval_start",
        "avg_price",
        "total_size",
        "fair_value"
    )
)

This query combines the power of Polars’ single-node execution with the scalability of Polars’ distributed engine. It can be partitioned horizontally over symbols, after which Polars’ fast query engine processes the partitions on powerful workers.
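To make the partitioning idea concrete, here is a minimal pure-Python sketch of the same pattern: split the rows by symbol, then do the order-dependent work (one-minute windows and a backward as-of lookup) inside each partition. It is a conceptual illustration only, not how the Polars distributed engine is implemented. The sample data, the tuple layouts, and the helper names `partition_by_symbol` and `process_partition` are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical sample data. Times are seconds since an arbitrary start.
# trades: (symbol, time, price, size); fairs: (symbol, time, fair_value)
trades = [
    ("AAA", 0, 10.0, 100),
    ("AAA", 30, 12.0, 50),
    ("AAA", 70, 11.0, 10),
    ("BBB", 5, 20.0, 5),
    ("BBB", 65, 22.0, 15),
]
fairs = [
    ("AAA", 0, 10.5),
    ("AAA", 60, 11.5),
    ("BBB", 0, 21.0),
]


def partition_by_symbol(rows):
    """Group rows by their first field, so each symbol could go to one worker."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[0]].append(row[1:])
    return parts


def process_partition(symbol, trade_rows, fair_rows, every=60):
    """Order-dependent work that only needs the rows of a single symbol."""
    # Bucket trades into fixed windows keyed by the window start.
    windows = defaultdict(list)
    for time, price, size in sorted(trade_rows):
        windows[time - time % every].append((price, size))

    fair_rows = sorted(fair_rows)
    fair_times = [time for time, _ in fair_rows]

    out = []
    for start in sorted(windows):
        rows = windows[start]
        avg_price = sum(price for price, _ in rows) / len(rows)
        total_size = sum(size for _, size in rows)
        # Backward as-of lookup: latest fair value with time <= window start.
        i = bisect_right(fair_times, start) - 1
        fair_value = fair_rows[i][1] if i >= 0 else None
        out.append((symbol, start, avg_price, total_size, fair_value))
    return out


trade_parts = partition_by_symbol(trades)
fair_parts = partition_by_symbol(fairs)

# Each loop iteration is independent, so it could run on a separate worker.
result = []
for symbol in sorted(trade_parts):
    result.extend(
        process_partition(symbol, trade_parts[symbol], fair_parts.get(symbol, []))
    )

for row in result:
    print(row)
```

No partition needs rows from another symbol, so the only data movement is the initial split by symbol. The windowing and the as-of lookup, which both depend on time order, stay local to one worker.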

3. Near future

Features that will land soon are:

On-premises support

We have begun working on supporting the Polars Cloud distributed architecture on-premises. We expect to onboard the first clients in the coming months. If you are interested in on-premises Polars Cloud, contact us via the form below.

Live cluster dashboard

The current version of Polars Cloud has a dashboard that shows you summaries of your queries, clusters, vCPU, etc. The cluster dashboard we are building will have a direct connection to your cluster, allowing us to show much more information. And because the Polars streaming executor is written from scratch, we can add custom tracing that gives you deep insight into the operations your queries spend time on and how much utilization they have at any point in time. The possibilities here are very exciting to me, as our vertical integration means we have access to all the information in the stack.

Orchestration

As we are building a data platform, a minimal version of task orchestration cannot be left out. We don’t aim to replace tools like Airflow or Prefect, but we do want to offer you the option to schedule your queries with Polars Cloud alone. Note that we believe in strong integration with other tools and have therefore chosen a Polars Cloud client that can be used directly with Polars OSS and popular orchestration tools.

Autoscaling

As we can scale both vertically and horizontally with heterogeneous worker sizes, we have unique scaling opportunities. We plan to land vertical and diagonal (where the big worker scales) autoscaling soon. Later we will expand that to horizontal autoscaling as well.

Catalog support

Our early design partners informed us that most users were using Iceberg to load their data. Since then we’ve made a large effort to make our Iceberg support native and distributed. Besides the Iceberg table format, we will also expose a catalog so that users can organize their datasets more easily.

Multi-region

Initially we launched in the US East region only. This gives us acceptable latencies for the US and western Europe. We are going to launch multi-region as soon as possible so that all regions will experience minimal latencies.

Get started