There are many reasons why you might want to find the size (in memory) of a Polars DataFrame. The size of your dataframe might determine whether using a LazyFrame and features like streaming is required. Another reason could be to write a dataframe to a number of files with a specific target size. Whatever the reason, Polars provides an easy way to get the approximate size of a dataframe using estimated_size().
estimated_size() returns the size in bytes by default. However, estimated_size() also accepts a unit parameter, which allows the value to be returned in kilobytes, megabytes, gigabytes, or terabytes instead. Acceptable unit parameter values are 'b', 'kb', 'mb', 'gb', and 'tb'.
The code below shows estimated_size() being called with each of the different unit values.
import polars as pl

df = pl.DataFrame(
    {
        "state": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"]
    }
)

# estimated size in bytes
df.estimated_size()
# 38

# estimated size in kilobytes
df.estimated_size(unit='kb')
# 0.037109375

# estimated size in megabytes
df.estimated_size(unit='mb')
# 3.62396240234375e-05

# estimated size in gigabytes
df.estimated_size(unit='gb')
# 3.5390257835388184e-08

# estimated size in terabytes
df.estimated_size(unit='tb')
# 3.456079866737127e-11