Let us explore all of the non-trivial data types that Polars supports, so that you understand why Polars supports them and when you should use them. Many of them you already know or understand intuitively. This article will push your intuition further, so you can confidently pick the appropriate data types when you are working with data in Polars.
Not counting variants, there are 18 data types that Polars supports at the time of writing this article (Polars 1.14.0):

- Boolean – Boolean type that is bit-packed efficiently;
- Int8, Int16, Int32, and Int64 – varying-precision integer types;
- UInt8, UInt16, UInt32, and UInt64 – varying-precision unsigned integer types;
- Float32 and Float64 – varying-precision float types;
- Decimal – 128-bit decimal type with optional precision and non-negative scale. This gives you fine-grained control over the precision of your floats;
- String – variable length UTF-8 encoded string data, typically human-readable;
- Binary – variable length, arbitrary raw binary data;
- Date – represents a calendar date;
- Time – represents a time of day;
- Datetime – represents a calendar date and time of day;
- Duration – represents a time duration;
- Array – homogeneous arbitrary dimension array with a fixed shape;
- List – homogeneous 1D container with variable length;
- Categorical – efficient encoding of string data where the categories are inferred at runtime;
- Enum – efficient ordered encoding of a set of predetermined string categories;
- Struct – composite product type that can store multiple fields;
- Object – wraps arbitrary Python objects; and
- Null – represents null values.
The more common types
Booleans
The Boolean type, pl.Boolean, is described as being “bit-packed efficiently”. You only need one bit to represent true (1) or false (0), and a byte has 8 bits, so Polars can use a single byte to represent 8 Boolean values. You can verify this with a simple experiment:
import random
import polars as pl
bools = [random.choice([True, False]) for _ in range(8 * 127)]
s = pl.Series(bools, dtype=pl.Boolean)
print(s.estimated_size()) # 127
This shows that if a series has $n$ Boolean values, Polars uses $n / 8$ bytes (rounded up to a whole number of bytes) for that series.
Missing data
The data type Null is the type for the Polars value null, which is somewhat similar to Python’s None. The value null represents a missing value:
s = pl.Series([1, None, 3, 4, 5, None])
print(s.count()) # 4
print(s.is_null().sum()) # 2
Any Polars series, or any dataframe column, can contain the value null to represent a missing value in that position, and Polars uses null to represent missing data for every data type, even for numerical types.
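As a minimal illustration of this (the column names below are made up for the example), every column holds one null regardless of its data type:

```python
import polars as pl

df = pl.DataFrame({
    "ints": [1, None, 3],          # Int64 column with a missing value.
    "strings": ["a", None, "c"],   # String column with a missing value.
    "bools": [True, None, False],  # Boolean column with a missing value.
})
print(df.null_count())  # Each column reports exactly one null.
```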
As an interesting piece of trivia, using the function is_null to check which values in a column are missing is “free” in Polars. That is because Polars stores a validity mask with Boolean values indicating which items in a series are missing. This validity mask is bit-packed in the same efficient manner as Boolean columns, so the memory overhead is minimal:
s1 = pl.Series([1, 2, 3, 4, 5, 6, 7, 8], dtype=pl.Int64) # 8 bytes x 8 integers = 64 bytes
print(s1.estimated_size()) # 64
# 64 bytes for the 8 integers Int64, plus the validity mask:
# 1 bit x 8 integers = 1 byte; total: 64 + 1 = 65
s2 = pl.Series([1, 2, 3, 4, None, 6, 7, 8], dtype=pl.Int64)
print(s2.estimated_size()) # 65
print(s2.is_null())
64
65
shape: (8,)
Series: '' [bool]
[
false
false
false
false
true
false
false
false
]
Integer types, signed and unsigned
The integer types Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, and UInt64 are very common in several programming languages and in dataframe libraries, but for a Python user the distinctions might come as a surprise because Python can handle arbitrarily large integers with ease:
googol = 10 ** 100
print(googol % 99999999977) # 11526618770
Polars does not handle arbitrarily large integers. Its integer types are Int8 through Int64, and the number after Int tells you how many bits are used to store the integer. Since one bit is needed for the sign of the number, an integer data type of $n$ bits can represent integers from $-2^{n-1}$ to $2^{n-1} - 1$:
| Data type | Lower limit | Upper limit |
|---|---|---|
| Int8 | -128 | 127 |
| Int16 | -32768 | 32767 |
| Int32 | -2147483648 | 2147483647 |
| Int64 | -9223372036854775808 | 9223372036854775807 |
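If you are curious, you can check these limits against the formula above; here is a quick sketch that builds a series holding both endpoints of each signed type:

```python
import polars as pl

# n bits represent the integers from -2**(n - 1) to 2**(n - 1) - 1.
for dtype, n in [(pl.Int8, 8), (pl.Int16, 16), (pl.Int32, 32), (pl.Int64, 64)]:
    lower, upper = -(2 ** (n - 1)), 2 ** (n - 1) - 1
    s = pl.Series([lower, upper], dtype=dtype)  # Both endpoints fit exactly.
    print(dtype, s.to_list())
```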
You should use the lowest precision variant that suits your needs, since lower-precision variants are more memory-efficient. When you know that a certain variable cannot have negative values (for example, the number of children a person has or their age), you can use an unsigned integer. Unsigned integers have the same precisions but they only represent non-negative numbers, so their upper limits are twice¹ as large as those of the respective signed integers:
| Data type | Upper limit |
|---|---|
| UInt8 | 255 |
| UInt16 | 65535 |
| UInt32 | 4294967295 |
| UInt64 | 18446744073709551615 |
When possible, using an unsigned integer over a signed integer adds a layer of protection to your code because your code will raise an error if you try to use a negative integer in a column with unsigned integers. This may help you catch bugs or issues with your data.
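For instance, building a series of an unsigned type from data that contains a negative value fails (the exact exception and message may vary across Polars versions):

```python
import polars as pl

try:
    pl.Series([10, -1], dtype=pl.UInt8)  # -1 cannot be represented as a UInt8.
except Exception as err:
    print(type(err).__name__, err)
```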
Since integer types have lower and upper limits, calculations that produce values that fall outside of the limits make the values wrap around:
s = pl.Series([0, 255], dtype=pl.UInt8) # Integers between 0 and 255.
print(s + 1) # 255 + 1 = 256 wraps around to 0.
print("---")
print(s - 1) # 0 - 1 = -1 wraps around to 255.
shape: (2,)
Series: '' [u8]
[
1
0
]
---
shape: (2,)
Series: '' [u8]
[
255
254
]
This wrapping around is often referred to as overflowing or underflowing, depending on whether the values got too big or too small.
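If your computations risk overflowing, one option is to cast to a wider data type before doing the arithmetic; a minimal sketch:

```python
import polars as pl

s = pl.Series([0, 255], dtype=pl.UInt8)
# Widening to UInt16 first means that 255 + 1 = 256 fits without wrapping.
print(s.cast(pl.UInt16) + 1)  # Values become [1, 256].
```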
Floating point numbers
The caveats associated with working with floating point numbers in Python also apply when working with the data types Float32 and Float64 in Polars, and the Python type float typically matches Polars’ Float64. Python and Polars follow the standard IEEE 754, so you can read that if you are interested in learning about the limitations of floating point number arithmetic in depth.
Columns with the data types Float32 and Float64 also use the value null to represent missing data. The value NaN, which is a special floating point value, is used to represent the result of some operations that are mathematically undetermined. Here are some examples:
inf = float("inf")
print(pl.Series([0]) / 0)
print(pl.Series([inf]) - inf)
print(pl.Series([inf]) / inf)
All three computations produce the same output:
shape: (1,)
Series: '' [f64]
[
NaN
]
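Note that NaN and null are different things: is_null does not flag NaN values, and is_nan does not flag nulls. A small sketch of the distinction:

```python
import polars as pl

s = pl.Series([1.0, float("nan"), None])
print(s.is_null().to_list())  # [False, False, True]
print(s.is_nan().to_list())   # [False, True, None] – the null stays null.
```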
Strings
Working with strings in Polars is efficient because Polars provides many useful string-specific functions under the namespace str:
print(pl.Series(["Hello, world!", "Polars is great"]).str.slice(0, 6))
shape: (2,)
Series: '' [str]
[
"Hello,"
"Polars"
]
Due to the way the Polars data type inference mechanism works, sometimes you will have string columns when another, more specialised data type would be appropriate. This happens when reading temporal data from files, for example, because Polars will not parse data into temporal data types unless you tell it to. A string column also occurs naturally when working with categorical data, in which case a data type for categorical data might be more appropriate.
Temporal data types
The Polars temporal data types are quite intuitive to work with, and they are all very similar to the data types available in the standard module datetime:
| Temporal data type | Similar type in datetime |
|---|---|
| Date | datetime.date |
| Time | datetime.time |
| Datetime | datetime.datetime |
| Duration | datetime.timedelta |
Polars supports dozens of specialised temporal expressions that you can access from the namespace dt.
Dates, times, and dates with times
The data type Date represents a calendar date: a day, a month, and a year. For example, someone’s birthdate would be appropriately represented as a Date. This can be parsed from a string or created directly from a datetime.date object:
from datetime import date
df = pl.DataFrame({
"superhero": ["Superman", "Batman", "Deadpool"],
"first_appearance": [ # Source: respective Wikipedia articles
date(1938, 4, 18),
date(1939, 3, 30),
date(1990, 12, 11),
]
})
print(df)
shape: (3, 2)
┌───────────┬──────────────────┐
│ superhero ┆ first_appearance │
│ --- ┆ --- │
│ str ┆ date │
╞═══════════╪══════════════════╡
│ Superman ┆ 1938-04-18 │
│ Batman ┆ 1939-03-30 │
│ Deadpool ┆ 1990-12-11 │
└───────────┴──────────────────┘
On the other hand, the data type Time represents a time of day: an hour, minutes, seconds, and sometimes even fractions of a second. For example, the time for which your alarm clock is set would be appropriately represented as a Time. Analogously to dates, times can be parsed from strings or created directly from datetime.time objects:
from datetime import time
df = pl.DataFrame({
"superhero": ["Superman", "Batman", "Deadpool"],
"avg_wake_up_time": [ # Source: made up numbers
time(5, 30, 0),
time(13, 0, 0),
time(11, 27, 56),
]
})
print(df)
shape: (3, 2)
┌───────────┬──────────────────┐
│ superhero ┆ avg_wake_up_time │
│ --- ┆ --- │
│ str ┆ time │
╞═══════════╪══════════════════╡
│ Superman ┆ 05:30:00 │
│ Batman ┆ 13:00:00 │
│ Deadpool ┆ 11:27:56 │
└───────────┴──────────────────┘
To recap, the data types Date and Time are orthogonal because they do not share units. When you need both together, for example to represent your next doctor appointment, you use the data type Datetime. A Datetime is a data type that aggregates the units from Date and Time, and it also provides functionality to handle the much dreaded timezones. It goes without saying, but you can create values of this type by parsing strings or by using datetime.datetime objects:
from datetime import datetime, timedelta, timezone
now = datetime.now()
datetimes = [
now,
now.replace(tzinfo=timezone(timedelta(hours=1))),
now.replace(tzinfo=timezone(timedelta(hours=-3))),
]
s = pl.Series(datetimes)
print(s) # All values are converted to UTC.
shape: (3,)
Series: '' [datetime[μs]]
[
2024-11-22 19:14:25.468051
2024-11-22 18:14:25.468051
2024-11-22 22:14:25.468051
]
Polars will convert all times to UTC to homogenise the timezone in a single series/column. If you set the timezone of a series or column, you will see the timezone appear in the data type:
print(s.dt.convert_time_zone("Europe/Amsterdam"))
shape: (3,)
Series: '' [datetime[μs, Europe/Amsterdam]] # <-- new TZ shows in the data type
[
2024-11-25 11:23:55.322912 CET # <-- times are adjusted to new TZ
2024-11-25 10:23:55.322912 CET
2024-11-25 14:23:55.322912 CET
]
The other piece of information shown in the data type, in this case µs, is the time unit in which the datetime is kept. This can be tweaked if you need more or less precision.
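For example, the function dt.cast_time_unit changes the unit of an existing series; here is a small sketch going from the default microseconds to milliseconds:

```python
import polars as pl
from datetime import datetime

s = pl.Series([datetime(2024, 11, 22, 19, 14, 25)])
print(s.dtype)                          # Datetime(time_unit='us', time_zone=None)
print(s.dt.cast_time_unit("ms").dtype)  # Datetime(time_unit='ms', time_zone=None)
```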
Although the namespace dt provides dozens of temporal-specific operations, some of the functions you might end up using very often are the ones from the namespace str that parse string data into the temporal types:
| Expression | Target data type | Docs link |
|---|---|---|
| .str.to_date | Date | 🔗 |
| .str.to_datetime | Datetime | 🔗 |
| .str.to_time | Time | 🔗 |
| .str.strptime | Any of the three | 🔗 |
Be aware that when you are parsing strings into temporal data types, you must use the format specifiers from Rust’s crate chrono, not the ones from the Python library datetime. The more common specifiers are the same, but the specifications do not match entirely. The same caveat applies to formatting temporal data types as strings.
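As a quick illustration, parsing with .str.to_datetime looks like this (for this particular pattern, the chrono specifiers happen to coincide with Python’s):

```python
import polars as pl

s = pl.Series(["2024-11-22 19:14", "2024-12-03 09:00"])
print(s.str.to_datetime("%Y-%m-%d %H:%M"))
```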
Time durations
The data type Duration is the data type that arises naturally when you perform arithmetic with the other temporal data types:
bedtime = pl.Series([
datetime(2024, 11, 22, 23, 56),
datetime(2024, 11, 24, 0, 23),
datetime(2024, 11, 24, 23, 37),
])
wake_up = pl.Series([
datetime(2024, 11, 23, 7, 30),
datetime(2024, 11, 24, 7, 30),
datetime(2024, 11, 25, 8, 0),
])
sleep = wake_up - bedtime
print(sleep)
shape: (3,)
Series: '' [duration[μs]]
[
7h 34m
7h 7m
8h 23m
]
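If you need the durations as plain numbers, the namespace dt can extract totals in a given unit. Continuing the example above:

```python
print(sleep.dt.total_minutes())  # 454, 427, and 503 minutes, respectively.
```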
The data type Binary
The data type Binary is suitable for when you want to represent raw binary data in your series or dataframes. An easy way of creating a series with the data type Binary is by specifying Python bytes objects:
s = pl.Series([b"binary", b"data", b"here"])
print(s.dtype) # Binary
Polars supports a few expressions specialised for the data type Binary that you can find in the namespace bin. For example, the expression .bin.size gives you the size of the values in the Binary column:
print(s.bin.size())
shape: (3,)
Series: '' [u32]
[
6
4
4
]
While there are many data types for which Polars provides many specialised expressions, the data type Binary is mostly provided as a convenient type for you to be able to hold data in your dataframe. Examples of scenarios where you might want to use the data type Binary include images and audio, files in proprietary formats, or serialized data.
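The namespace bin also includes encoding helpers; for instance, you can hex-encode binary values for display or logging:

```python
import polars as pl

s = pl.Series([b"binary", b"data", b"here"])
print(s.bin.encode("hex").to_list())
# ['62696e617279', '64617461', '68657265']
```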
The data type Decimal
You can think of the data type Decimal as a variant of the data types Float32 and Float64, but where you can control the number of decimal places your numbers have. While using the data type Decimal does not prevent all rounding errors, it can help prevent some of them.
For example, the two additions below produce 1.0 as the result, but that’s only because of rounding errors:
tiny = pow(10, -16)
print(f"{tiny + 1 = }")
print("With Float64:")
print(pl.Series([tiny], dtype=pl.Float64) + 1)
tiny + 1 = 1.0
With Float64:
shape: (1,)
Series: '' [f64]
[
1.0
]
By using the data type Decimal, with enough decimal places, we can get an accurate result:
print("With Decimal(None, 24):")
print(pl.Series([tiny], dtype=pl.Decimal(None, 24)) + 1)
With Decimal(None, 24):
shape: (1,)
Series: '' [decimal[*,24]]
[
1.000000000000000100000000
]
The data type Decimal takes as its second argument the number of digits after the decimal point. In the snippet above, we set it to 24. The first argument specifies the maximum total number of digits in each number; setting it to None lets Polars infer the value we need.
The data type Decimal does not have a dedicated namespace with specialised expressions, and at the time of writing it is considered an unstable feature. You should also understand that Decimal is not a silver bullet for all your rounding errors, since Decimal also has limited precision.
Categorical data
A categorical variable is a variable that can only take one of a pre-determined set of values. Personally, I have always thought of categorical variables as the ones that make sense to be presented as a dropdown in a form you are filling in. For example, it would be pretty absurd if you were filling out a form where you had to input your exact salary by selecting a value in a dropdown. But if you were asked your nationality, you’d expect a dropdown list that you could quickly navigate until you found your country.
In Polars, categorical variables are always derived from string data, and Polars provides two similar data types that let you work with categorical data. We present them next.
The data type Enum
The data type Enum is the preferred data type that you should use when dealing with categorical data. To cast a column or a series to the data type Enum, you need a three-step process:
- determine what the valid categories are (this can be done statically, or computed programmatically when feasible);
- create a “variant” of the Enum data type by instantiating it; and
- finally, cast your series or column:
valid_values = ["panda", "polar", "brown"] # 1.
bear_enum = pl.Enum(valid_values) # 2.
s = pl.Series(["panda", "polar", "panda", "brown", "panda"], dtype=bear_enum) # 3.
print(s)
shape: (5,)
Series: '' [enum]
[
"panda"
"polar"
"panda"
"brown"
"panda"
]
You might say that the series printed looks just like a series with the string values, and you are correct. When printed or inspected visually, they look exactly the same.
However, under the hood, Polars is allowed to do operations on series with the data type Enum in a more efficient way because it knows only a fixed set of strings are legal values. This makes it more memory efficient to have a series of the data type Enum, as well as typically faster to operate on those series.
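A rough way to see the memory savings (the exact numbers depend on your data and on the Polars version) is to compare estimated_size for the same values stored as strings and as an Enum:

```python
import polars as pl

values = ["panda", "polar", "brown"] * 10_000
as_strings = pl.Series(values)
as_enum = pl.Series(values, dtype=pl.Enum(["panda", "polar", "brown"]))
print(as_strings.estimated_size())  # The string data is stored per value...
print(as_enum.estimated_size())     # ...while the Enum stores small integer codes.
```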
Polars will also complain if you include a value that does not belong in the enumeration:
s = pl.Series(["pand", "snake"], dtype=bear_enum)
# InvalidOperationError: conversion from `str` to `enum` failed in column '' for 2 out of 2 values: ["pand", "snake"]
This can help catch issues with your data, from typos to values that are flat out wrong.
The data type Enum also has the benefit of allowing you to work with your categories as if they are ordered. In the example of the bears, the ordering does not make sense. If our categorical variable represents the level of formal education pursued, then there is an ordering that makes sense:
education_level = ["High school", "BSc", "MSc", "PhD"]
education_enum = pl.Enum(education_level)
people = pl.DataFrame({
"name": pl.Series(["A", "B", "C", "D"]),
"degree": pl.Series(["High school", "MSc", "PhD", "MSc"], dtype=education_enum),
})
print(people.filter(pl.col("degree") >= "MSc"))
shape: (3, 2)
┌──────┬────────┐
│ name ┆ degree │
│ --- ┆ --- │
│ str ┆ enum │
╞══════╪════════╡
│ B ┆ MSc │
│ C ┆ PhD │
│ D ┆ MSc │
└──────┴────────┘
The data type Categorical
You can also use the data type Categorical when working with categorical data.
You do not need to specify the valid values upfront with the data type Categorical. Instead, Polars will infer these values for you. This can sound strictly better than using the data type Enum, but the inference that Polars makes comes at a cost.
Disadvantages of using Categorical instead of Enum include:

- it is less performant when you operate on two columns that should have the same categories but were created independently (see the sketch below for a common mitigation); and
- it does not catch values that are invalid.
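That first point can be mitigated, assuming you cannot switch to Enum, by creating the columns under Polars’ global string cache so that their physical encodings are compatible; a minimal sketch:

```python
import polars as pl

with pl.StringCache():
    # Both series share one cache, so their category encodings line up.
    s1 = pl.Series(["panda", "brown", "panda"], dtype=pl.Categorical)
    s2 = pl.Series(["brown", "polar"], dtype=pl.Categorical)

# Combining the two no longer requires re-encoding the categories.
print(s1.append(s2))
```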
It is not all bad, and there are cases where the data type Categorical is what you want. As a rule of thumb, only use Categorical when you cannot practically use Enum.
Regardless of whether you use the data type Enum or the data type Categorical, you can always use the namespace cat and its only function get_categories to retrieve the unique values being used as categories:
s = pl.Series(["panda", "polar", "pand", "snake", "panda"], dtype=pl.Categorical)
print(s.cat.get_categories().to_list())
# ['panda', 'polar', 'pand', 'snake']
Nested data types
Nested data types resemble Python containers. These are data types that contain data inside them. Polars supports three nested data types:

- Struct is like a typed dictionary in Python where the keys are fixed strings;
- List is like a Python list, but with the restriction that all elements must be of the same type; and
- Array is like a NumPy array, where the elements all have the same type, but the shape of the array itself is fixed.
The data type Struct
You can look at the data type Struct more or less like a Python dictionary with string keys. If a column has the data type Struct, all the rows have the same string keys, so typing.TypedDict is more analogous to Struct than the built-in dict.
You will understand why a Struct is needed if you see a situation where the data type Struct arises naturally:
df = pl.DataFrame({
"name": ["A", "B", "C", "D"],
"favourite_sport": ["basketball", "baseball", "soccer", "basketball"],
})
print(df.select(pl.col("favourite_sport").value_counts()))
shape: (3, 1)
┌──────────────────┐
│ favourite_sport │
│ --- │
│ struct[2] │
╞══════════════════╡
│ {"baseball",1} │
│ {"soccer",1} │
│ {"basketball",2} │
└──────────────────┘
The value {"baseball",1}
shows that the value "baseball"
appeared only 1
time in the column
"favourite_sport"
. The reason the values and their respective counts are put together is because a
single expression should only output a single column. Since we typed only one expression,
pl.col("favourite_sport").value_counts()
, in the context select
, this should produce as output a
single column.
Since the data type Struct is like a dictionary, we can “key” into the struct to extract different values:
df = pl.DataFrame({
"name": ["A", "B", "C", "D"],
"favourite_sport": ["basketball", "baseball", "soccer", "basketball"],
})
counts = df.select(pl.col("favourite_sport").value_counts())
print(counts.select(
pl.col("favourite_sport").struct.field("favourite_sport"),
pl.col("favourite_sport").struct.field("count"),
))
shape: (3, 2)
┌─────────────────┬───────┐
│ favourite_sport ┆ count │
│ --- ┆ --- │
│ str ┆ u32 │
╞═════════════════╪═══════╡
│ soccer ┆ 1 │
│ basketball ┆ 2 │
│ baseball ┆ 1 │
└─────────────────┴───────┘
Using the namespace struct, which provides functions specialised to work with the data type Struct, we can use field to access one of the values of the struct. If you have a column with the data type Struct and you want to extract all the fields into their respective columns, you can do that with .struct.unnest:
df = pl.DataFrame({
"name": ["A", "B", "C", "D"],
"favourite_sport": ["basketball", "baseball", "soccer", "basketball"],
})
print(
df.select(
pl.col("favourite_sport").value_counts()
.struct.unnest()
)
)
shape: (3, 2)
┌─────────────────┬───────┐
│ favourite_sport ┆ count │
│ --- ┆ --- │
│ str ┆ u32 │
╞═════════════════╪═══════╡
│ baseball ┆ 1 │
│ basketball ┆ 2 │
│ soccer ┆ 1 │
└─────────────────┴───────┘
A single expression can only produce a single column as output. Similarly, in certain scenarios, especially when working with custom functions, an expression will also expect a single expression as its input. In those cases, you may need to pack multiple columns into a column with the data type Struct.
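Packing columns into a single Struct column is done with the function pl.struct; a minimal sketch:

```python
import polars as pl

df = pl.DataFrame({"x": [1, 2], "y": [3, 4]})
# Pack the columns "x" and "y" into a single Struct column named "point".
print(df.select(pl.struct("x", "y").alias("point")))
```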
(You know that a single expression can only produce a single column as output, so how come the single expression that ends with .struct.unnest in the example above produced a dataframe with two columns? You can find the answer in this blog article about expression expansion.)
The data type List
You can understand the data type List if you have ever worked with lists in Python. It is a variable-length, one-dimensional container. The main distinction between Python’s built-in list and the data type List in Polars is that, in Polars, a List must be homogeneous:
list_example = pl.Series([
[1, 2, 3],
[],
[4, 5],
[6],
])
print(list_example)
failed = pl.Series([  # Raises: the inner lists mix integers and strings.
[1, 2, 3],
[],
["four", "five"],
[6],
])
shape: (4,)
Series: '' [list[i64]]
[
[1, 2, 3]
[]
[4, 5]
[6]
]
TypeError: unexpected value while building Series of type List(Int64)
Polars provides a few dozen functions in the namespace list specially to work with columns of the data type List; one of them is shown in the sketch below. The data type List is pretty flexible, but by now you will have noticed that more flexibility tends to imply less performance. The data type Array is less flexible, which in turn makes it more performant. You will learn about it next.
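First, a minimal sketch of one of the list functions (list.sum, which aggregates each list independently):

```python
import polars as pl

nested = pl.Series([[1, 2, 3], [], [4, 5], [6]])
print(nested.list.sum().to_list())  # [6, 0, 9, 6]
```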
The data type Array
The data type Array is akin to a NumPy array in two senses:
- the elements must have the same type; and
- the shape of the array must be fixed.
Possible use cases for the data type Array include collections of images, matrices, and tic-tac-toe boards.

If you want to create a column or a series of the data type Array, all the arrays must have the same shape. This can be a simple one-dimensional list that always has the same length:
print(
pl.Series(
[
["Harry", "Potter"],
["Hermione", "Granger"],
["Ron", "Weasley"],
],
dtype=pl.Array(pl.String, (2,)),
)
)
shape: (3,)
Series: '' [array[str, 2]]
[
["Harry", "Potter"]
["Hermione", "Granger"]
["Ron", "Weasley"]
]
But you could also nest lists within lists, as long as the nesting was the same for all of the lists.
When you want to use the data type Array and your values are in Python lists, even if they all have the same length, you have to specify the type Array directly. The first argument is the data type of the values and the second argument is, generally, a tuple specifying the shape.

Polars will only infer Array as a data type if you pass a single NumPy array for the values. In that case, Polars knows that all the sub-arrays of that array have the same shape, so Polars knows it is safe to infer the data type as Array.
For example, if you wanted a Polars series with the data type Array(pl.Int64, (2, 2, 2)) holding the three 3D arrays
import numpy as np
np.array(range(8)).reshape((2, 2, 2))
np.array(range(8, 16)).reshape((2, 2, 2))
np.array(range(16, 24)).reshape((2, 2, 2))
then you would need to create a 4D array with all those as subarrays along the first dimension:
major = np.array(range(24)).reshape(3, 2, 2, 2)
print(pl.Series(major))
shape: (3,)
Series: '' [array[i64, (2, 2, 2)]]
[
[[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
[[[8, 9], [10, 11]], [[12, 13], [14, 15]]]
[[[16, 17], [18, 19]], [[20, 21], [22, 23]]]
]
The reason Polars is so conservative when inferring the data type Array is that checking that all values have the exact same shape is time-consuming.
Polars implements a number of functions specialised to work with arrays in the namespace arr, and all the functions implemented in the namespace arr also exist in the namespace list (except for the function arr.to_list, which converts a column of the data type Array into a column of the data type List). However, because of the restriction that arrays must have a fixed and constant shape, the functions in the namespace arr are generally more efficient than their list counterparts.
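As a quick example of the namespace arr in action, reusing the two-element name arrays from earlier in this section:

```python
import polars as pl

s = pl.Series(
    [["Harry", "Potter"], ["Hermione", "Granger"]],
    dtype=pl.Array(pl.String, (2,)),
)
print(s.arr.first().to_list())  # ['Harry', 'Hermione']
print(s.arr.to_list().dtype)    # List(String)
```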
The data type Object
If you’ve read this whole article and you still cannot find the data type that is suitable for your use case, it might be that Polars does not support the data type that you need. Even then, Polars allows you to hold arbitrary Python objects in series through the data type Object. This data type is a catch-all type and, because it is so generic, Polars cannot provide any specialised functions to work with arbitrary objects, in much the same way that Python’s type object does not by itself implement any behaviour other than the extremely generic ones that all objects implement.
Here is an example of a Polars series with the data type Object:
functions = pl.Series([enumerate, zip, max, min, all, any, sorted])
print(functions)
shape: (7,)
Series: '' [o][object]
[
<class 'enumerate'>
<class 'zip'>
<built-in function max>
<built-in function min>
<built-in function all>
<built-in function any>
<built-in function sorted>
]
Summary table
The table below presents a summary that establishes a connection between Polars data types and some Python types. This is not to say that the two are always equivalent, but rather that you might use the Python correspondence to help you understand the Polars data type.
| The Polars data type… | is like Python’s… |
|---|---|
| Boolean | bool |
| Int8, Int16, Int32, and Int64 | int with restrictions |
| UInt8, UInt16, UInt32, and UInt64 | int with restrictions |
| Float32 and Float64 | float |
| Decimal | decimal.Decimal |
| String | str |
| Binary | bytes |
| Date | datetime.date |
| Time | datetime.time |
| Datetime | datetime.datetime |
| Duration | datetime.timedelta |
| Array | numpy.array |
| List | list |
| Categorical | enum.StrEnum |
| Enum | enum.StrEnum |
| Struct | typing.TypedDict |
| Object | object |
| Null | None |
Note: ⚠️ This blog article was written with Polars 1.14.0.
PyData Global 2024
On the 3rd of December of 2024, at the virtual conference PyData Global 2024, we will present a talk “Understanding Polars data types”. Check the schedule to see at what time the talk will happen in your timezone, but if you’re attending the conference be sure to attend the talk!
Footnotes

¹ Plus or minus one; I will leave it up to you to figure out which one. ↩