
Accelerating Polars DataFrames

Thu, 4 Apr 2024

Polars on GPU

Polars and NVIDIA engineers are collaborating to bring GPU acceleration to Polars DataFrames in the near future. Through this collaboration, Polars will be able to use both the CPU and the GPU (if available), significantly speeding up certain workloads.

High level design

RAPIDS cuDF is a Python GPU DataFrame library (built on Apache Arrow) for loading, joining, aggregating, filtering, and otherwise manipulating data. With the addition of the GPU engine, users get to decide which engine runs their data workloads. Every engine is supported by the Polars optimizer, which determines dynamically, for each workload, which operations execute on the GPU and which on the CPU. As with the streaming engine, the functionality will be exposed through collect, as .collect(gpu=True).


The capability will be made available as a feature flag so that users with GPUs will be able to install the engine whenever their data workflow needs it. The GPU engine will be maintained primarily by NVIDIA and both teams will collaborate to ensure it works seamlessly for Polars users.

To accommodate the new functionality, the required changes, made by NVIDIA, will be merged into the Polars project in the upcoming period. Once these changes are in, the GPU engine will become generally available to everyone.

CPU and GPU in data workflows

CPUs are well suited for queries requiring high levels of sequential processing and multitasking. GPUs, on the other hand, were originally designed for rendering graphics, and that design makes them very efficient at parallel processing. GPUs contain hundreds to thousands of smaller cores capable of performing simultaneous computations, which is especially beneficial for the matrix and vector operations that are fundamental to deep learning algorithms and large-scale data processing.

cuDF has shown remarkable performance improvements on specific operations and dataset sizes, for example when doing group bys, joins and string operations. Introducing the GPU capability will offer the best of both worlds to Polars users.

More (technical) details will follow soon as the implementation is being finalized.



Polars is a high-performance vectorized query engine for the new era of DataFrames, built from the ground up in Rust, with interfaces to Python, JavaScript and R. We believe that high-performance computing should be easy and accessible for everyone. Collaborating with the RAPIDS team enables more users to benefit from GPU acceleration straight from the familiar Polars syntax.


NVIDIA RAPIDS is a suite of CUDA-X libraries for developers and data scientists to accelerate the most popular open-source data processing and machine learning solutions. Built on CUDA primitives for low-level compute optimization, RAPIDS exposes GPU parallelism and high memory bandwidth to deliver unparalleled speedups in analytics and machine learning tasks.