
Announcing Polars 1.41

By Thijs Nieuwdorp on Tue, 26 May 2026

Polars 1.41 is out. In this post we’ll go through three highlights: Parquet metadata decoding is significantly faster for wide tables, the query optimizer now eliminates redundant subplans at every nesting depth, and LazyFrame.gather is now available.

Faster Parquet Metadata Decoding

PR: #27427

Every scan_parquet call starts by reading and decoding the Parquet metadata stored in the footer. The footer contains column statistics, row group metadata, and type information. For wide tables with many columns or many row groups, decoding this footer was a measurable part of total scan time.
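To make "stored in the footer" concrete, here is a minimal sketch of how a reader can locate the metadata bytes before any Thrift decoding happens. It relies only on the published Parquet file layout (the file ends with a 4-byte little-endian metadata length followed by the magic bytes PAR1) and assumes an unencrypted file. The helper name footer_span and the synthetic file contents are made up for illustration; this is not how Polars implements it.

```python
import struct

MAGIC = b"PAR1"  # magic bytes at the start and end of an unencrypted Parquet file


def footer_span(data: bytes) -> tuple[int, int]:
    """Return (start, end) byte offsets of the Thrift-encoded file metadata."""
    # Smallest possible file: leading magic + 4-byte length + trailing magic.
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not an unencrypted Parquet file")
    # The 4 bytes before the trailing magic hold the metadata length.
    (length,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - length
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return start, end


# Synthetic stand-in for a file: the payloads are placeholders, not valid Parquet data.
fake_metadata = b"\x00" * 10
fake_file = (
    MAGIC
    + b"row-group-bytes"
    + fake_metadata
    + struct.pack("<I", len(fake_metadata))
    + MAGIC
)

start, end = footer_span(fake_file)
print(start, end)
```

The bytes in data[start:end] are what the Thrift decoder has to process, and that span grows with the number of columns and row groups.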

Thrift is the binary serialization format Parquet uses to describe its metadata structures. The previous decoder was auto-generated from the Thrift IDL definitions, which are general-purpose and built to describe arbitrary schemas. This auto-generated code was slow and became a decoding bottleneck for files with many small row groups or wide schemas. Since Parquet’s metadata schema is stable, 1.41 replaces that generic decoder with a hand-written one specialized for it, cutting unnecessary allocations and branches.

The metadata-decoding speedup scales with the number of columns:

Columns   Before     After      Speedup
100       2.12 ms    1.32 ms    1.61×
1000      11.66 ms   4.43 ms    2.63×
10000     117.4 ms   35.74 ms   3.29×

The benchmarks measure footer decoding only (not full scan time) on an Apple M4 Pro, using uncompressed Parquet files with 20 row groups and 1 row per group. Wider tables see larger gains because more column metadata means more Thrift to decode.

You don’t have to change anything in your code to get these benefits. Existing scan_parquet calls get the speedup automatically.

Nested Common Subplan Elimination

PR: #27340

The Polars query optimizer detects when the same subplan appears in multiple branches of a query and replaces repeated evaluations with a single cached result. Before 1.41, this detection only worked at one level of nesting. A subplan shared across two branches would be deduplicated, but if the result was itself shared again higher up in the plan, the work was repeated.

1.41 fixes this by traversing into cache nodes during the elimination pass, so deduplication now applies at every nesting depth.

This example from the PR shows the pattern:

import polars as pl

lf = pl.LazyFrame({"a": [0, 1, 2]})
lf1 = lf.select(pl.col("a") + 1)
lf2 = pl.concat([lf1, lf1])
lf3 = pl.concat([lf2, lf2])

lf3.show_graph()

lf1 appears twice in lf2, and lf2 appears twice in lf3. Before 1.41, the outer concat was cached but the inner lf1 was still evaluated twice. From 1.41 onward, both levels are deduplicated:

Query plan before and after CSE optimization. Before: one CACHE node but two duplicate select nodes for lf1. After: nested CACHE nodes that deduplicate both lf1 and lf2.
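The effect of nesting depth can be illustrated with a toy model in plain Python. This is a rough sketch of the idea, not the Polars optimizer: the Plan class and the helpers repeated_subplans and count_runs are hypothetical names invented for this example. Plans are compared structurally, any subplan that occurs more than once at any depth is treated as cacheable, and the sketch counts how often each node would run.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Toy plan node; frozen so that equal subplans hash and compare equal."""
    name: str
    inputs: tuple["Plan", ...] = ()


def repeated_subplans(root: Plan) -> set[Plan]:
    """Collect every subplan that occurs more than once, at any depth."""
    seen: Counter = Counter()

    def visit(node: Plan) -> None:
        seen[node] += 1
        for child in node.inputs:
            visit(child)

    visit(root)
    return {node for node, count in seen.items() if count > 1}


def count_runs(root: Plan, shared: set[Plan]) -> Counter:
    """Count node executions when every plan in `shared` is computed only once."""
    runs: Counter = Counter()
    cached: set[Plan] = set()

    def run(node: Plan) -> None:
        if node in shared:
            if node in cached:
                return  # cache hit: the whole subtree is skipped
            cached.add(node)
        runs[node.name] += 1
        for child in node.inputs:
            run(child)

    run(root)
    return runs


# Same shape as the example above: lf1 twice in lf2, lf2 twice in lf3.
lf1 = Plan("select")
lf2 = Plan("concat_inner", (lf1, lf1))
lf3 = Plan("concat_outer", (lf2, lf2))

no_cache = count_runs(lf3, set())["select"]
outer_only = count_runs(lf3, {lf2})["select"]
all_depths = count_runs(lf3, repeated_subplans(lf3))["select"]
print(no_cache, outer_only, all_depths)
```

In this model the select runs four times without caching, twice when only the outer duplicate is cached, and once when duplicates are eliminated at every depth.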

LazyFrame.gather

PR: #27501

DataFrame.gather() selects rows by integer index. Until 1.41, this wasn’t available for a LazyFrame, which meant you had to materialize it first:

# Before 1.41
df = lf.collect()
result = df.gather([0, 2, 4])

LazyFrame.gather() is now available, so row selection by index can stay in the lazy API:

import polars as pl

lf = pl.LazyFrame({
    "product": ["A", "B", "C", "D", "E"],
    "revenue": [100, 250, 80, 320, 150],
})

result = lf.gather([1, 3]).collect()

# Output
# shape: (2, 2)
# ┌─────────┬─────────┐
# │ product ┆ revenue │
# │ ---     ┆ ---     │
# │ str     ┆ i64     │
# ╞═════════╪═════════╡
# │ B       ┆ 250     │
# │ D       ┆ 320     │
# └─────────┴─────────┘

And More

The full list of changes is in the 1.41 release notes on GitHub.

