
Getting DataFrame Size in Polars (Python)

There are many reasons why you might want to find the in-memory size of a Polars DataFrame. The size of your dataframe might determine whether you need a LazyFrame and features such as streaming. Another reason could be to write a dataframe to a number of files with a specific target size. Whatever the reason, Polars provides an easy way to get the approximate size of a dataframe using estimated_size().

Estimated Size Method

By default, estimated_size() returns the size in bytes. It also accepts a unit parameter, which allows the value to be returned in kilobytes, megabytes, gigabytes, or terabytes instead. Acceptable unit parameter values include b, kb, mb, gb, and tb. Each unit is a multiple of 1,024 of the previous one, so 38 bytes is reported as 38 / 1024 = 0.037109375 kilobytes.

The code below calls estimated_size() with each of the unit values.

import polars as pl

df = pl.DataFrame(
    {
        "state": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"]
    }
)

# estimated size in bytes
df.estimated_size()
38

# estimated size in kilobytes
df.estimated_size(unit='kb')
0.037109375

# estimated size in megabytes
df.estimated_size(unit='mb')
3.62396240234375e-05

# estimated size in gigabytes
df.estimated_size(unit='gb')
3.5390257835388184e-08

# estimated size in terabytes
df.estimated_size(unit='tb')
3.456079866737127e-11