Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It's pretty much just an alternative to SQL that's a lot easier/natural to use for more hardcore data analysis.

You can much more easily compose the operations you want to run.

Just think of it as an API for manipulating tabular data stored somewhere (often parquet files, though they can query many different data sources).



Dataframes and SQL have overlapping functionality, but I wouldn't say that dataframes are an "alternative" to SQL. The tradeoffs are very different. You don't have to worry about minimizing disk reads or think about concurrency issues like transactions or locks, because a dataframe is just an in-memory data structure like a list or a dict, rather than a database. Dataframes also aren't really about relational algebra like SQL is.


Have you tried polars? I agree that if pandas is all you've tried, that it's pretty far from an alternative frontend for query engines, but if you've tried polars it maps pretty cleanly to SQL, can be query optimized to a query plan similarly to SQL, I should've made it clear that I'm talking about an alternative to SQL used in an OLAP context, not for OLTP.


Data tables tend to also be a standard ingestion format for statistical tools in many cases.


> It's pretty much just an alternative to SQL

Polars actually has SQL supported, so you can mix and match: e.g. construct DF from SQL.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: