
To add some balance:

- I can't rule out that a pandas wizard could have achieved the same speed-up in pandas

- polars code was slightly more verbose. For example, when calculating columns based on other columns in the same chain, in pandas, each new column can be defined as a kwarg in a single call to `assign`, whereas in polars, columns that depend on others must be defined in their own calls to `with_columns`

- handling of categoricals in polars seemed a little underbaked, though my main complaint, that categories cannot be pre-defined, seems to have been recently addressed: https://github.com/pola-rs/polars/issues/10705

- polars is not yet 1.0, so breaking changes will happen



Regarding your second point, you can use the walrus operator to retain the results of a computation within a single `.with_columns()` call. See https://stackoverflow.com/a/77609494

Edited to add: also, if you’re using a lazy dataframe, you can just naively do the same operation twice (once to store it in a named column and once again in the subsequent computation), and Polars will use common subexpression elimination (CSE) to prevent recomputing the result. You can verify this is true using the `.explain()` method of a lazy dataframe operation containing the `.with_columns()` call.


That's awesome, thanks for sharing! Though tbh I'm not likely to use it. It's a bit too magical, though still a delicious hack.


I just edited my comment above to add more info about common subexpression elimination. It’s magic that happens behind your back on lazy dataframes. Polars is great!



