Yesterday, we published a new breaking Polars release to PyPI.
The new version includes quite a few breaking changes.
Most of these changes have been announced through deprecation warnings, but we did include some surprises.
This document should help you navigate the upgrade to Polars 0.19
.
For the full list of changes, please see our changelog.
Table of contents
Breaking changes
While this section does not contain an exhaustive list of all breaking changes, we think these are most likely to impact your code.
Aggregation functions no longer support horizontal computation
This impacts aggregation functions like sum
, min
, and max
.
These functions were overloaded to support both vertical and horizontal computation.
Recently, new dedicated functionality for horizontal computation was released, and horizontal computation was deprecated.
Restore the old behavior by using the horizontal variant, e.g. sum_horizontal
.
Example
Before:
df = pl.DataFrame({'a': [1, 2], 'b': [11, 12]})
df.select(pl.sum('a', 'b')) # horizontal computation
shape: (2, 1)
┌─────┐
│ sum │
│ --- │
│ i64 │
╞═════╡
│ 12 │
│ 14 │
└─────┘
After:
df = pl.DataFrame({'a': [1, 2], 'b': [11, 12]})
df.select(pl.sum('a', 'b')) # vertical computation
shape: (1, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 3 ┆ 23 │
└─────┴─────┘
Update to all
/ any
all
will now ignore null values by default, rather than treat them as False
.
For both any
and all
, the drop_nulls
parameter has been renamed to ignore_nulls
and is now keyword-only.
Also fixed an issue when setting this parameter to False
would erroneously result in None
output in some cases.
To restore the old behavior, set ignore_nulls
to False
and check for None
output.
Example
Before:
pl.Series([True, None]).all()
False
After:
pl.Series([True, None]).all()
True
Improved error types for many methods
Improving our error messages is an ongoing effort.
We did a sweep of our Python code base and made many improvements to error messages and error types.
Most notably, many ValueError
s were changed to TypeError
s.
If your code relies on handling Polars exceptions, you may have to make some adjustments.
Example
Before:
pl.Series(values=15)
ValueError: Series constructor called with unsupported type; got 'int'
After:
pl.Series(values=15)
TypeError: Series constructor called with unsupported type 'int' for the `values` parameter
Updates to expression input parsing
Methods like select
and with_columns
accept one or more expressions.
But they also accept strings, integers, lists, and other inputs that we try to interpret as expressions.
We updated our internal logic to parse inputs more consistently.
Example
Before:
pl.DataFrame({'a': [1, 2]}).with_columns(None)
shape: (2, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
└─────┘
After:
pl.DataFrame({'a': [1, 2]}).with_columns(None)
shape: (2, 2)
┌─────┬─────────┐
│ a ┆ literal │
│ --- ┆ --- │
│ i64 ┆ null │
╞═════╪═════════╡
│ 1 ┆ null │
│ 2 ┆ null │
└─────┴─────────┘
shuffle
/ sample
now use an internal Polars seed
If you used the built-in Python random.seed
function to control the randomness of Polars expressions, this will no longer work.
Instead, use the new set_random_seed
function.
Example
Before:
import random
random.seed(1)
After:
import polars as pl
pl.set_random_seed(1)
Deprecations
Creating a consistent and intuitive API is hard, finding the right name for each function, method, and parameter might be the hardest part.
The new version comes with quite some naming changes, and you will most likely run into deprecation warnings when upgrading to 0.19
.
If you want to upgrade without worrying about deprecation warnings right now, you can add the following snippet to your code:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
groupby
renamed to group_by
This is not a change we make lightly, as it will impact almost all our users. But “group by” are really two different words, and our naming strategy dictates that these should be separated by an underscore.
Most likely, a simple search and replace will be enough to take care of this update:
- Search:
.groupby(
- Replace:
.group_by(
apply
renamed to map_*
apply
is probably the most misused part of our API. Many Polars users come from pandas, where apply
has a completely different meaning.
We now consolidate all our functionality for user-defined functions under the name map
. This results in the following renaming:
Before | After |
---|---|
Series/Expr.apply | map_elements |
Series/Expr.rolling_apply | rolling_map |
DataFrame.apply | map_rows |
GroupBy.apply | map_groups |
pl.apply | map_groups |
map | map_batches |