Let us explore all of the non-trivial data types that Polars supports, so that you understand why Polars supports them and when you should use them. Many of them you already know or understand intuitively. This article will push your intuition further, so you can confidently pick the appropriate data types when you are working with data in Polars.
Not counting variants, there are 18 data types that Polars supports at the time of writing this article (Polars 1.14.0):
- Boolean – Boolean type that is bit-packed efficiently;
- Int8, Int16, Int32, and Int64 – varying-precision integer types;
- UInt8, UInt16, UInt32, and UInt64 – varying-precision unsigned integer types;
- Float32 and Float64 – varying-precision float types;
- Decimal – 128-bit decimal type with optional precision and non-negative scale. This gives you fine-grained control over the precision of your floats;
- String – variable length UTF-8 encoded string data, typically human-readable;
- Binary – variable length, arbitrary raw binary data;
- Date – represents a calendar date;
- Time – represents a time of day;
- Datetime – represents a calendar date and time of day;
- Duration – represents a time duration;
- Array – homogeneous arbitrary dimension array with a fixed shape;
- List – homogeneous 1D container with variable length;
- Categorical – efficient encoding of string data where the categories are inferred at runtime;
- Enum – efficient ordered encoding of a set of predetermined string categories;
- Struct – composite product type that can store multiple fields;
- Object – wraps arbitrary Python objects; and
- Null – represents null values.
The more common types
Booleans
The Boolean type, pl.Boolean, is described as being “bit-packed efficiently”. You only need one bit to represent true (1) or false (0) and a byte has 8 bits, so Polars can use a single byte to represent 8 Boolean values. You can verify this with a simple experiment:
import random
import polars as pl
bools = [random.choice([True, False]) for _ in range(8 * 127)]
s = pl.Series(bools, dtype=pl.Boolean)
print(s.estimated_size()) # 127
This shows that if a series has $n$ Boolean values, Polars uses $\lceil n / 8 \rceil$ bytes for that series.
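When $n$ is not a multiple of 8, the count rounds up to a whole byte. A quick check of this (a minimal sketch; the exact value reported by estimated_size may vary slightly across Polars versions):
s = pl.Series([True] * 9, dtype=pl.Boolean)
print(s.estimated_size())  # 2, since 9 bits need ceil(9 / 8) = 2 bytes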
Missing data
The data type Null is the type for the Polars value null, which is somewhat similar to Python’s None. The value null represents a missing value:
s = pl.Series([1, None, 3, 4, 5, None])
print(s.count()) # 4
print(s.is_null().sum()) # 2
Any Polars series, or any dataframe column, can contain the value null
to represent a missing
value in that position, and Polars uses null
to represent missing data for every data type, even
for numerical types.
As an interesting piece of trivia, using the function is_null
to check which values in a column
are missing is “free” in Polars. That is because Polars stores a validity mask with Boolean values
indicating which items in a series are missing. This validity mask is bit-packed in the same
efficient manner as Boolean columns, so the memory overhead is minimal:
s1 = pl.Series([1, 2, 3, 4, 5, 6, 7, 8], dtype=pl.Int64) # 8 bytes x 8 integers = 64 bytes
print(s1.estimated_size()) # 64
# 64 bytes for the 8 integers Int64, plus the validity mask:
# 1 bit x 8 integers = 1 byte; total: 64 + 1 = 65
s2 = pl.Series([1, 2, 3, 4, None, 6, 7, 8], dtype=pl.Int64)
print(s2.estimated_size()) # 65
print(s2.is_null())
64
65
shape: (8,)
Series: '' [bool]
[
false
false
false
false
true
false
false
false
]
Integer types, signed and unsigned
The integer types Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, and UInt64 are very common in several programming languages and in dataframe libraries, but for a Python user the distinctions might come as a surprise because Python can handle arbitrarily large integers with ease:
googol = 10 ** 100
print(googol % 99999999977) # 11526618770
Polars does not handle arbitrarily large integers. Its integer types are Int8 through Int64, and the number after Int tells how many bits are used to store the integer. Since one bit is needed for the sign of the number, an integer data type of $n$ bits can represent integers from $-2^{n-1}$ to $2^{n-1} - 1$:
Data type | Lower limit | Upper limit |
---|---|---|
Int8 | -128 | 127 |
Int16 | -32768 | 32767 |
Int32 | -2147483648 | 2147483647 |
Int64 | -9223372036854775808 | 9223372036854775807 |
You should use the lowest precision variant that suits your needs, since lower-precision variants are more memory-efficient, as the snippet after the next table shows. When you know that a certain variable cannot have negative values (for example, the number of children a person has or their age), you can use an unsigned integer. Unsigned integers have the same precisions but they only represent non-negative numbers, so their upper limits are twice¹ as large as the respective signed integers:
Data type | Upper limit |
---|---|
UInt8 | 255 |
UInt16 | 65535 |
UInt32 | 4294967295 |
UInt64 | 18446744073709551615 |
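Here is a minimal sketch of the memory savings, reusing estimated_size from the Boolean experiment above:
values = list(range(100))
print(pl.Series(values, dtype=pl.Int64).estimated_size())  # 800: 8 bytes x 100 integers
print(pl.Series(values, dtype=pl.Int8).estimated_size())  # 100: 1 byte x 100 integers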
When possible, using an unsigned integer over a signed integer adds a layer of protection to your code because your code will raise an error if you try to use a negative integer in a column with unsigned integers. This may help you catch bugs or issues with your data.
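For instance, building an unsigned column from data that contains a negative value fails (a minimal sketch; the exact exception type and message depend on your Polars version):
try:
    pl.Series([5, -10], dtype=pl.UInt8)
except Exception as e:
    print(e)  # Reports that -10 cannot be converted to u8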
Since integer types have lower and upper limits, calculations that produce values that fall outside of the limits make the values wrap around:
s = pl.Series([0, 255], dtype=pl.UInt8) # Integers between 0 and 255.
print(s + 1) # 255 + 1 = 256 wraps around to 0.
print("---")
print(s - 1) # 0 - 1 = -1 wraps around to 255.
shape: (2,)
Series: '' [u8]
[
1
0
]
---
shape: (2,)
Series: '' [u8]
[
255
254
]
This wrapping around is often referred to as overflowing or underflowing, depending on whether the values got too big or too small.
Floating point numbers
The caveats associated with working with floating point numbers in Python also apply when working with the data types Float32 and Float64 in Polars, and the Python type float typically matches Polars’ Float64. Python and Polars follow the IEEE 754 standard, so you can read that if you are interested in learning about the limitations of floating point arithmetic in depth.
Columns with the data types Float32
and Float64
also use the value null
to represent missing
data. The value NaN
, which is a special floating point value, is used to represent the result of
some operations that are mathematically undetermined. Here are some examples:
print(pl.Series([0]) / 0)
inf = float("inf")
print(pl.Series([inf]) - inf)
print(pl.Series([inf]) / inf)
All three computations produce the same output:
shape: (1,)
Series: '' [f64]
[
NaN
]
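Note that null and NaN are distinct values that Polars treats differently; a minimal sketch of the distinction:
s = pl.Series([1.0, None, float("nan")])
print(s.is_null().to_list())  # [False, True, False]: only null counts as missing
print(s.is_nan().to_list())  # [False, None, True]: NaN is a valid float and is_nan propagates the null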
Strings
Working with strings in Polars is efficient because Polars provides many useful string-specific functions under the namespace str:
print(pl.Series(["Hello, world!", "Polars is great"]).str.slice(0, 6))
shape: (2,)
Series: '' [str]
[
"Hello,"
"Polars"
]
Due to the way the Polars data type inference mechanism works, sometimes you will have string columns when another, more specialised data type would be more appropriate. This happens when reading temporal data from files, for example, because Polars will not parse data into temporal data types unless you tell it to. A string column also occurs naturally when working with categorical data, in which case a data type for categorical data might be more appropriate.
Temporal data types
The Polars temporal data types are quite intuitive to work with, and they are all very similar to the data types available in the standard module datetime:
Temporal data type | Similar type in datetime |
---|---|
Date | datetime.date |
Time | datetime.time |
Datetime | datetime.datetime |
Duration | datetime.timedelta |
Polars supports dozens of specialised temporal expressions that you can access from the namespace dt.
Dates, times, and dates with times
The data type Date represents a calendar date: a day, a month, and a year. For example, someone’s birthdate would be appropriately represented as a Date. This can be parsed from a string or created directly from a datetime.date object:
from datetime import date
df = pl.DataFrame({
"superhero": ["Superman", "Batman", "Deadpool"],
"first_appearance": [ # Source: respective Wikipedia articles
date(1938, 4, 18),
date(1939, 3, 30),
date(1990, 12, 11),
]
})
print(df)
shape: (3, 2)
┌───────────┬──────────────────┐
│ superhero ┆ first_appearance │
│ --- ┆ --- │
│ str ┆ date │
╞═══════════╪══════════════════╡
│ Superman ┆ 1938-04-18 │
│ Batman ┆ 1939-03-30 │
│ Deadpool ┆ 1990-12-11 │
└───────────┴──────────────────┘
On the other hand, the data type Time represents a time of day: an hour, minutes, seconds, and sometimes even fractions of a second. For example, the time for which your alarm clock is set would be appropriately represented as a Time. Analogously to dates, times can be parsed from strings or created directly from datetime.time objects:
from datetime import time
df = pl.DataFrame({
"superhero": ["Superman", "Batman", "Deadpool"],
"avg_wake_up_time": [ # Source: made up numbers
time(5, 30, 0),
time(13, 0, 0),
time(11, 27, 56),
]
})
print(df)
shape: (3, 2)
┌───────────┬──────────────────┐
│ superhero ┆ avg_wake_up_time │
│ --- ┆ --- │
│ str ┆ time │
╞═══════════╪══════════════════╡
│ Superman ┆ 05:30:00 │
│ Batman ┆ 13:00:00 │
│ Deadpool ┆ 11:27:56 │
└───────────┴──────────────────┘
To recap, the data types Date and Time are orthogonal because they do not share units. When you need both together, for example to represent your next doctor’s appointment, you use the data type Datetime. A Datetime aggregates the units from Date and Time, and it also provides functionality to handle the much-dreaded timezones. It goes without saying, but you can create values of this type by parsing strings or by using datetime.datetime objects:
from datetime import datetime, timedelta, timezone
now = datetime.now()
datetimes = [
now,
now.replace(tzinfo=timezone(timedelta(hours=1))),
now.replace(tzinfo=timezone(timedelta(hours=-3))),
]
s = pl.Series(datetimes)
print(s) # All values are converted to UTC.
shape: (3,)
Series: '' [datetime[μs]]
[
2024-11-22 19:14:25.468051
2024-11-22 18:14:25.468051
2024-11-22 22:14:25.468051
]
Polars converts all values to UTC to homogenise the timezone within a single series/column.
If you set the timezone of a series or column, you will see the timezone appear in the data type:
print(s.dt.convert_time_zone("Europe/Amsterdam"))
shape: (3,)
Series: '' [datetime[μs, Europe/Amsterdam]] # <-- new TZ shows in the data type
[
2024-11-25 11:23:55.322912 CET # <-- times are adjusted to new TZ
2024-11-25 10:23:55.322912 CET
2024-11-25 14:23:55.322912 CET
]
The other piece of information shown in the data type, in this case µs, is the time unit in which the datetime is kept. This can be tweaked if you need more or less precision.
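For example, here is a minimal sketch that uses dt.cast_time_unit to trade the default microseconds for milliseconds:
s = pl.Series([datetime(2024, 11, 22, 19, 14, 25)])
print(s.dtype)  # Datetime(time_unit='us', time_zone=None)
print(s.dt.cast_time_unit("ms").dtype)  # Datetime(time_unit='ms', time_zone=None)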
Although the namespace dt
provides dozens of temporal-specific operations, some of the functions
you might end up using very often are the ones from the namespace str
that parse string data into
the temporal types:
Expression | Target data type | Docs link |
---|---|---|
.str.to_date | Date | 🔗 |
.str.to_datetime | Datetime | 🔗 |
.str.to_time | Time | 🔗 |
.str.strptime | Any of the three | 🔗 |
Be aware that when you are parsing strings into temporal data types, you must use the format specifiers from Rust’s crate chrono, not the ones from the Python library datetime. The more common specifiers are the same, but the specifications do not match entirely. The same caveat applies to formatting temporal data types as strings.
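As a minimal sketch of parsing, the specifier %Y-%m-%d below happens to mean the same thing in chrono and in datetime:
s = pl.Series(["1938-04-18", "1939-03-30", "1990-12-11"])
print(s.str.to_date("%Y-%m-%d").dtype)  # Date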
Time durations
The data type Duration arises naturally when you perform arithmetic with the other temporal data types:
bedtime = pl.Series([
datetime(2024, 11, 22, 23, 56),
datetime(2024, 11, 24, 0, 23),
datetime(2024, 11, 24, 23, 37),
])
wake_up = pl.Series([
datetime(2024, 11, 23, 7, 30),
datetime(2024, 11, 24, 7, 30),
datetime(2024, 11, 25, 8, 0),
])
sleep = wake_up - bedtime
print(sleep)
shape: (3,)
Series: '' [duration[μs]]
[
7h 34m
7h 7m
8h 23m
]
The data type Binary
The data type Binary is suitable for when you want to represent raw binary data in your series or dataframes. An easy way of creating a series with the data type Binary is by specifying a Python bytes object:
s = pl.Series([b"binary", b"data", b"here"])
print(s.dtype) # Binary
Polars supports a few expressions specialised for the data type Binary that you can find in the namespace bin. For example, the expression .bin.size gives you the size of the values in the Binary column:
print(s.bin.size())
shape: (3,)
Series: '' [u32]
[
6
4
4
]
While Polars provides many specialised expressions for many of its data types, the data type Binary is mostly provided as a convenient way for you to hold raw data in your dataframe. Examples of scenarios where you might want to use the data type Binary include images and audio, files in proprietary formats, and serialized data.
The data type Decimal
You can think of the data type Decimal as a variant of the data types Float32 and Float64, but where you can control the number of decimal places your numbers have. While using the data type Decimal does not prevent all rounding errors, it can help prevent some of them.
For example, the two additions below produce 1.0
as the result, but that’s only because of
rounding errors:
tiny = pow(10, -16)
print(f"{tiny + 1 = }")
print("With Float64:")
print(pl.Series([tiny], dtype=pl.Float64) + 1)
tiny + 1 = 1.0
With Float64:
shape: (1,)
Series: '' [f64]
[
1.0
]
By using the data type Decimal, with enough decimal places, we can get an accurate result:
print("With Decimal(None, 24):")
print(pl.Series([tiny], dtype=pl.Decimal(None, 24)) + 1)
With Decimal(None, 24):
shape: (1,)
Series: '' [decimal[*,24]]
[
1.000000000000000100000000
]
The data type Decimal takes as its second argument the number of digits after the decimal point. In the snippet above, we set it to 24. The first argument specifies the maximum total number of digits in each number; setting it to None lets Polars infer the value we need.
The data type Decimal does not have a dedicated namespace with specialised expressions, and at the time of writing it is considered an unstable feature. You should also understand that Decimal is not a silver bullet for all your rounding errors, since Decimal also has limited precision.
Categorical data
A categorical variable is a variable that can only take one of a pre-determined set of values. Personally, I have always thought of categorical variables as the ones that make sense to be presented as a dropdown in a form you are filling in. For example, it would be pretty absurd if you were filling out a form where you had to input your exact salary by selecting a value in a dropdown. But if you were asked your nationality, you’d expect a dropdown list that you could quickly navigate until you found your country.
In Polars, categorical variables are always derived from string data and Polars provides two similar data types that let you work with categorical data. We present them next.
The data type Enum
The data type Enum is the preferred data type to use when dealing with categorical data. To cast a column or a series to the data type Enum, you follow a three-step process:
- determine the valid categories (this can be done statically, or computed programmatically when feasible);
- create a “variant” of the Enum data type by instantiating it; and
- finally, cast your series or column:
valid_values = ["panda", "polar", "brown"] # 1.
bear_enum = pl.Enum(valid_values) # 2.
s = pl.Series(["panda", "polar", "panda", "brown", "panda"], dtype=bear_enum) # 3.
print(s)
shape: (5,)
Series: '' [enum]
[
"panda"
"polar"
"panda"
"brown"
"panda"
]
You might say that the series printed looks just like a series with the string values, and you are correct. When printed or inspected visually, they look exactly the same.
However, under the hood, Polars is allowed to operate on series with the data type Enum in a more efficient way because it knows that only a fixed set of strings are legal values. This makes a series of the data type Enum more memory-efficient, as well as typically faster to operate on.
Polars will also complain if you include a value that does not belong in the enumeration:
s = pl.Series(["pand", "snake"], dtype=bear_enum)
# InvalidOperationError: conversion from `str` to `enum` failed in column '' for 2 out of 2 values: ["pand", "snake"]
This can help catch issues with your data, from typos to values that are flat out wrong.
The data type Enum
also has the benefit of allowing you to work with your categories as if they
are ordered. In the example of the bears, the ordering does not make sense. If our categorical
variable represents the level of formal education pursued, then there is an ordering that makes
sense:
education_level = ["High school", "BSc", "MSc", "PhD"]
education_enum = pl.Enum(education_level)
people = pl.DataFrame({
"name": pl.Series(["A", "B", "C", "D"]),
"degree": pl.Series(["High school", "MSc", "PhD", "MSc"], dtype=education_enum),
})
print(people.filter(pl.col("degree") >= "MSc"))
shape: (3, 2)
┌──────┬────────┐
│ name ┆ degree │
│ --- ┆ --- │
│ str ┆ enum │
╞══════╪════════╡
│ B ┆ MSc │
│ C ┆ PhD │
│ D ┆ MSc │
└──────┴────────┘
The data type Categorical
You can also use the data type Categorical when working with categorical data. You do not need to specify the valid values upfront with the data type Categorical; instead, Polars will infer these values for you. This can sound strictly better than using the data type Enum, but the inference that Polars makes comes at a cost.
Disadvantages of using Categorical instead of Enum include:
- it is less performant when you operate on two columns that should have the same categories but were created independently (see the sketch after this list); and
- it does not catch values that are invalid.
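Here is a minimal sketch of the first point. Two Categorical columns created independently get independent physical encodings, so combining them forces Polars to reconcile the encodings; building the columns under a shared pl.StringCache avoids that cost:
with pl.StringCache():  # Both series share one global cache of categories.
    s1 = pl.Series(["panda", "brown"], dtype=pl.Categorical)
    s2 = pl.Series(["brown", "polar"], dtype=pl.Categorical)
print(pl.concat([s1, s2]))  # No expensive re-encoding needed.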
It is not all bad, and there are cases where the data type Categorical is what you want. As a rule of thumb, only use Categorical when you cannot pragmatically use Enum.
Regardless of whether you use the data type Enum or the data type Categorical, you can always use the namespace cat and its only function get_categories to retrieve the unique values being used as categories:
s = pl.Series(["panda", "polar", "pand", "snake", "panda"], dtype=pl.Categorical)
print(s.cat.get_categories().to_list())
# ['panda', 'polar', 'pand', 'snake']
Nested data types
Nested data types resemble Python containers. These are data types that contain data inside them. Polars supports three nested data types:
- Struct is like a typed dictionary in Python where the keys are fixed strings;
- List is like a Python list, but with the restriction that all elements must be of the same type; and
- Array is like a NumPy array, where the elements all have the same type, but the shape of the array itself is fixed.
The data type Struct
You can look at the data type Struct more or less like a Python dictionary with string keys. If a column has the data type Struct, all the rows have the same string keys, so typing.TypedDict is more analogous to Struct than the built-in dict.
You will understand why a Struct
is needed if you see a situation where the data type Struct
arises naturally:
df = pl.DataFrame({
"name": ["A", "B", "C", "D"],
"favourite_sport": ["basketball", "baseball", "soccer", "basketball"],
})
print(df.select(pl.col("favourite_sport").value_counts()))
shape: (3, 1)
┌──────────────────┐
│ favourite_sport │
│ --- │
│ struct[2] │
╞══════════════════╡
│ {"baseball",1} │
│ {"soccer",1} │
│ {"basketball",2} │
└──────────────────┘
The value {"baseball",1}
shows that the value "baseball"
appeared only 1
time in the column
"favourite_sport"
. The reason the values and their respective counts are put together is because a
single expression should only output a single column. Since we typed only one expression,
pl.col("favourite_sport").value_counts()
, in the context select
, this should produce as output a
single column.
Since the data type Struct
is like a dictionary, we can “key” into the struct to extract different
values:
df = pl.DataFrame({
"name": ["A", "B", "C", "D"],
"favourite_sport": ["basketball", "baseball", "soccer", "basketball"],
})
counts = df.select(pl.col("favourite_sport").value_counts())
print(counts.select(
pl.col("favourite_sport").struct.field("favourite_sport"),
pl.col("favourite_sport").struct.field("count"),
))
shape: (3, 2)
┌─────────────────┬───────┐
│ favourite_sport ┆ count │
│ --- ┆ --- │
│ str ┆ u32 │
╞═════════════════╪═══════╡
│ soccer ┆ 1 │
│ basketball ┆ 2 │
│ baseball ┆ 1 │
└─────────────────┴───────┘
Using the namespace struct, which provides functions specialised to work with the data type Struct, we can use field to access one of the values of the struct. If you have a column with the data type Struct and you want to extract all the fields into their respective columns, you can do that with .struct.unnest:
df = pl.DataFrame({
"name": ["A", "B", "C", "D"],
"favourite_sport": ["basketball", "baseball", "soccer", "basketball"],
})
print(
df.select(
pl.col("favourite_sport").value_counts()
.struct.unnest()
)
)
shape: (3, 2)
┌─────────────────┬───────┐
│ favourite_sport ┆ count │
│ --- ┆ --- │
│ str ┆ u32 │
╞═════════════════╪═══════╡
│ baseball ┆ 1 │
│ basketball ┆ 2 │
│ soccer ┆ 1 │
└─────────────────┴───────┘
A single expression can only produce a single column as output. Similarly, in certain scenarios, especially when working with custom functions, an expression will also expect a single column as its input. In those cases, you may need to pack multiple columns into a single column with the data type Struct, as the sketch below shows.
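Here is a minimal sketch of packing columns with pl.struct (the dataframe and column names are illustrative):
df = pl.DataFrame({"name": ["A", "B"], "score": [10, 20]})
print(df.select(pl.struct("name", "score").alias("packed")))  # A single column of type struct[2]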
(You know that a single expression can only produce a single column as output, so how come the
single expression that ends with .struct.unnest
in the example above produced a dataframe with two
columns?)
The data type List
You can understand the data type List if you have ever worked with lists in Python. It is a varying-length, one-dimensional container. The main distinction between Python’s built-in list and Polars’ data type List is that, in Polars, a List must be homogeneous:
list_example = pl.Series([
[1, 2, 3],
[],
[4, 5],
[6],
])
print(list_example)
failed = pl.Series([
[1, 2, 3],
[],
["four", "five"],
[6],
])
shape: (4,)
Series: '' [list[i64]]
[
[1, 2, 3]
[]
[4, 5]
[6]
]
TypeError: unexpected value while building Series of type List(Int64)
Polars provides a few dozen functions in the namespace list specially to work with columns of the data type List; a small example follows below. The data type List is pretty flexible, but you should have understood by now that more flexibility tends to imply less performance. The data type Array is less flexible, which in turn makes it more performant. You will learn about it next.
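Before that, here is a minimal sketch using two of the expressions from the namespace list:
nums = pl.Series([[1, 2, 3], [], [4, 5]])
print(nums.list.len().to_list())  # [3, 0, 2]: per-row lengths
print(nums.list.sum().to_list())  # [6, 0, 9]: per-row sums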
The data type Array
The data type Array
is akin to a NumPy array in two senses:
- the elements must have the same type; and
- the shape of the array must be fixed.
Possible use cases for the data type Array
include collections of images, matrices, and
tic-tac-toe boards.
If you want to create a column or a series of the data type Array
, all the arrays must have the
same shape. This can be a simple one-dimensional list that always has the same length:
print(
pl.Series(
[
["Harry", "Potter"],
["Hermione", "Granger"],
["Ron", "Weasley"],
],
dtype=pl.Array(pl.String, (2,)),
)
)
shape: (3,)
Series: '' [array[str, 2]]
[
["Harry", "Potter"]
["Hermione", "Granger"]
["Ron", "Weasley"]
]
But you can also nest lists within lists, as long as the nesting is the same for all of the lists.
When you want to use the data type Array and your values are in Python lists, even if they all have the same length, you have to specify the type Array directly. The first argument is the data type of the values and the second argument is, generally, a tuple specifying the shape. Polars will only infer Array as a data type if you pass a single NumPy array for the values. In that case, Polars knows that all the sub-arrays of that array have the same shape, so Polars knows it is safe to infer the data type as Array.
For example, if you wanted a Polars series with the data type Array(pl.Int64, (2, 2, 2)) holding the three 3D arrays
import numpy as np
np.array(range(8)).reshape((2, 2, 2))
np.array(range(8, 16)).reshape((2, 2, 2))
np.array(range(16, 24)).reshape((2, 2, 2))
then you would need to create a 4D array with all those as subarrays along the first dimension:
major = np.array(range(24)).reshape(3, 2, 2, 2)
print(pl.Series(major))
shape: (3,)
Series: '' [array[i64, (2, 2, 2)]]
[
[[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
[[[8, 9], [10, 11]], [[12, 13], [14, 15]]]
[[[16, 17], [18, 19]], [[20, 21], [22, 23]]]
]
The reason Polars is so conservative when inferring the data type Array is that checking that all values have the exact same shape is time-consuming.
Polars implements a number of functions specialised to work with arrays in the namespace arr, and all the functions implemented in the namespace arr also exist in the namespace list. (Except for the function arr.to_list, which converts a column of the data type Array into a column of the data type List.) However, because of the restriction that arrays must have a fixed and constant shape, the functions in the namespace arr are generally more efficient than their list counterparts.
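Here is a minimal sketch of the namespace arr, including the conversion to the data type List:
s = pl.Series([[1, 2], [3, 4]], dtype=pl.Array(pl.Int64, (2,)))
print(s.arr.sum().to_list())  # [3, 7]: per-row sums, the counterpart of list.sum
print(s.arr.to_list().dtype)  # List(Int64)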
The data type Object
If you’ve read this whole article and you still cannot find the data type that is suitable for your
use case, it might be that Polars does not support the data type that you need. Even still, Polars
allows you to hold arbitrary Python objects in series through the data type Object
. This data type
is a catch-all type and because it is so generic, Polars cannot provide any specialised functions to
work with arbitrary objects, much the same way how Python provides the type object
and object
by
itself doesn’t implement any behaviour other than the extremely generic ones that all objects
implement.
Here is an example of a Polars series with the data type Object:
functions = pl.Series([enumerate, zip, max, min, all, any, sorted])
print(functions)
shape: (7,)
Series: '' [o][object]
[
<class 'enumerate'>
<class 'zip'>
<built-in function max>
<built-in function min>
<built-in function all>
<built-in function any>
<built-in function sorted>
]
Summary table
The table below presents a summary that establishes a connection between Polars data types and some Python types. This is not to say that the two are always equivalent, but rather that you might use the Python correspondence to help you understand the Polars data type.
The Polars data type … | is like Python’s … |
---|---|
Boolean | bool |
Int8, Int16, Int32, and Int64 | int with restrictions |
UInt8, UInt16, UInt32, and UInt64 | int with restrictions |
Float32 and Float64 | float |
Decimal | decimal.Decimal |
String | str |
Binary | bytes |
Date | datetime.date |
Time | datetime.time |
Datetime | datetime.datetime |
Duration | datetime.timedelta |
Array | numpy.ndarray |
List | list |
Categorical | enum.StrEnum |
Enum | enum.StrEnum |
Struct | typing.TypedDict |
Object | object |
Null | None |
Note: ⚠️ This blog article was written with Polars 1.14.0.
PyData Global 2024
On the 3rd of December of 2024, at the virtual conference PyData Global 2024, we will present a talk “Understanding Polars data types”. Check the schedule to see at what time the talk will happen in your timezone, but if you’re attending the conference be sure to attend the talk!
Footnotes
¹ plus or minus one, I will leave it up to you to figure out which one.