
Yes, it's an annoying negative feature of many tech products. Of course it's natural to want to speak to your target audience (in this case, data scientists who like Pandas but find it annoyingly slow/inflexible), but it's quite alienating to newbies who might otherwise become your most enthusiastic customers.

I am the target audience for Polars and have been meaning to try it for several months, but I keep procrastinating because I feel residual loyalty to Pandas, since Wes McKinney (its creator) took the time to write a helpful book about the most common analytical tools: https://wesmckinney.com/book/



Ritchie Vink (the creator of Polars) deliberately decided not to write a book so that he (and his team) can focus full time on Polars itself.

Thijs Nieuwdorp and I are currently working on the O'Reilly book "Python Polars: The Definitive Guide" [1]. It'll be a while before it gets released, but the Early Release version on the O'Reilly platform gets updated regularly. We also post draft chapters on the Polars Discord server [2].

The Discord server is also a great place to ask questions and interact with the Polars team.

[1] More information about the book: https://jeroenjanssens.com/pp/

[2] Polars Discord server: https://discord.gg/fngBqDry


Slightly off-topic: it's a tragedy that projects like this use Discord as the primary discussion forum. It's like Slack in that knowledge goes there to die.


Yeah, one thing that helps a bit is that they encourage you to post your questions to Stack Overflow, and they'll answer them there.


Disagree. Yes, forums with nested comment structure and voting are better for preserving and indexing data, but synchronous communication (like Discord) is better for having actual conversations and building community. Back and forth volleys of messages let people completely explore threads of thinking, rather than waiting days (or months) for a single response that misunderstands the question.

There’s a reason people prefer it: it’s better.


> There’s a reason people prefer it: it’s better.

That's a very black-and-white view of any issue.

It could be that it's better for the specific person or people having the current conversation. But what about after that?

It used to be that you could search Google, and if the question had been answered in a forum post, you'd find it. But since a lot of it is now behind locked doors (like Discord), it becomes really hard to find, even with Discord's own search.

Everything being ephemeral could help someone right now, but what about the others in the future, wanting the exact same answer?


Just ask it again so you can burn out and drive away the existing members with repetitive questions!


Okay, I'll elaborate: it's better for community building and answering questions for nascent companies or organizations. A lot of the time there's a big disconnect between a community and how a product is intended to be used.

I built a paper trading app for stocks and options and Discord was the primary place where the users talked. The subreddit was almost completely empty, nobody responded to tweets, the Instagram was thriving but there was no sense of community because you couldn't tell if anyone commenting actually used the product or a meme just showed up in their feed.

Did I have to repeat myself a bunch? Yes, but that's fine, especially because the answer sometimes changed (rapidly), like "No we don't support multiple trading accounts per user" > "Yeah I implemented that last week, it's in the release notes, to add a second account..."

For a mature product that's not changing as much and more or less has all its features built out, it makes sense to branch out into more structured forums that are easily searchable, especially as you progress through different versions and users are looking for answers to past versions.


> I built a paper trading app for stocks and options and Discord was the primary place where the users talked. The subreddit was almost completely empty, nobody responded to tweets, the Instagram was thriving but there was no sense of community because you couldn't tell if anyone commenting actually used the product or a meme just showed up in their feed.

What I'm hearing is that you think Discord is better than a forum, because people talked in Discord but they didn't talk in the forum, a forum which you didn't have?

Do you have comparable experience building a community via a traditional forum vs. doing so with Discord? As far as I can tell, you haven't tried a normal forum, yet you say Discord is better for community building.


Shouldn't the knowledge go into a wiki anyway? Reading old discussions on Reddit can turn up interesting things, but it's time-consuming, and bad information doesn't get fixed later.


Luckily our book will also be available in hard copy so you can digest all that hard-won knowledge in an offline manner :)


I'll wait until ChatGPT can regurgitate it.


I understand the humor in your snarky comment, but do buy a copy if you want to support them; making a book is neither cheap nor easy.


Losing all nuance for the sake of a quicker dopamine hit? Count me in!


I often see this comment, and every time I think: but having people come to the information AND the community is better for the project.


Short term perhaps, but long term having a non-indexed community is inconvenient for newcomers.


Microsoft Copilot can summarize discussions. With some orchestration, it could even extract question-and-answer pairs from past discussions and structure them in a Stack Overflow-like format.

Source: we use this feature in beta, as part of the enterprise Copilot license, to summarize Teams calls. Yes, it listens to us talking and spits out bullet points from our discussions at the end of the call. It's so good it feels like magic sometimes.

Note on Copilot: any capable model could probably do it. I just said Copilot because it does it today.


That's why Ritchie is very active on, and often refers to, Stack Overflow as well! Exactly to document frequent questions, instead of losing them to chat history.


There are projects you can use to index Discord servers; unfortunately, a lot of communities just don't use them.


Why would they? A person who picks Discord has no idea what knowledge discovery is.


Forums work really well for this. I personally avoid using Discord because chatrooms are too much of a time suck. There's far more chaff to sift through and trying to keep up with everything leads to FOMO.


By "community", do you mean all the people who make an account just to ask a question on the project's Discord, only ever open it to check whether someone answered, and then never use Discord again?


There is a free Polars user guide [0] as part of the Polars project. It was known as "polars-book" before it was moved in-tree [1].

[0] https://docs.pola.rs/user-guide/

[1] https://github.com/pola-rs/polars/tree/main/docs/user-guide


I'm not suggesting people need to write books to introduce their projects, but that landing pages should be more accessible to newbies if you want to build a big user base. A lot of projects introduce themselves by ticking off a list of currently-desirable buzzwords ('performant', 'beautiful' etc.) but neglect to articulate clearly what their project is and why someone might want to use it.


Any plans to try to fine-tune an LLM specialised in Polars? That would really be the killer feature for major adoption, IMO.


Is there an even more basic book you'd recommend for more junior people, covering dataframes / storage solutions for ML applications? Thank you.


> it's quite alienating to newbies who might otherwise become your most enthusiastic customers.

Newbies are your best target audience too! They aren't already ingrained in another system; they have to learn some framework, and they're starting with yours. If a newbie can't get through your docs, you need to improve your docs. But it's strange to me how mature Polars is and that the docs are still this bad. It makes it feel like it isn't in active/continued development. Polars is certainly a great piece of software, but that doesn't mean much if you can't get people to use it. And the better your docs, the quicker you turn noobs into wizards. The quicker you do that, the quicker you offload support onto your newfound wizards.


"Newbies" to data science are indeed a good target audience, before they are already attached to pandas. But this doesn't imply they know nothing. It's very unlikely that someone both 1. has a need to do the kind of data analysis that polars is good at, and 2. has never heard of the "data frame" concept.


The docs are okay, but the feature set is lacking compared to pandas, which is understandable since this is at version 0.2. I was exploring whether it's possible to use this, but we need diff aggregation, which it doesn't have, so it's a no-go right now.


Do you mean something like `.agg(pl.col("foo").diff())`?

Or is diff aggregation its own thing? (I tried searching for the term, but didn't find much.)


Never mind, it has it: it's `polars.Series.diff`, listed under Computation, and I was looking under Aggregation. This is great.

For instance, you've got a time series of odometer values and you want the delta from the previous sample to compute the distance of every trip.
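A plain-Python sketch of what the odometer example asks for: each sample's delta from the previous one. It deliberately avoids Polars so it runs with only the standard library; the function name `odometer_deltas` and the sample readings are hypothetical. In Polars the analogous expression would be along the lines of `pl.col("odometer").diff()`, which likewise leaves the first row null.

```python
def odometer_deltas(readings):
    """Return the distance covered since the previous sample.

    The first sample has no predecessor, so its delta is None
    (mirroring the null a dataframe diff produces in the first row).
    """
    if not readings:
        return []
    # Pair each reading with the one before it and subtract.
    deltas = [curr - prev for prev, curr in zip(readings, readings[1:])]
    return [None] + deltas


# Hypothetical odometer samples (km); a trip's distance is the sum of its deltas.
readings = [1000, 1012, 1030, 1030, 1055]
deltas = odometer_deltas(readings)
print(deltas)                                    # [None, 12, 18, 0, 25]
print(sum(d for d in deltas if d is not None))   # 55
```

Splitting the series by a trip identifier first and summing the deltas within each group would give per-trip distances; that grouping step is left out here to keep the sketch minimal.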


> But it's strange to me how mature Polars is and that the docs are still this bad.

Interesting, I've personally found them quite good, and compared to DataFusion or DuckDB they're dramatically better. I agree pandas has better docs, but one of the strengths of Polars is that I often don't need the docs, because they've put lots of careful thought into designing a minimal and elegant API, not to mention that they actually care about subtle quirks like making autocomplete, type hinting, etc. work well.


Sounds like we might be coming from different perspectives. I honestly don't use DF libraries often, and really only Pandas. I used to use pandas a fair amount, but that was years ago, and now I only reach for it a few times a year. So maybe the docs are good for people who already have deeper experience. I think the mere fact that you've used DataFusion and DuckDB shows that you're more skilled in this domain than I am, because I haven't used those, haha.

But I do think making good docs is quite hard. You usually have multiple audiences that you might not even be aware of, which makes keeping an open ear for them one of the most important things to do. It's easy to get trapped thinking you've got your audience when you're actually (unintentionally) closing the door to many more groups. It's also just easy to be focused on the "real" work and not think about docs.


What, specifically, is bad about the docs? This whole thread is people who just looked at the home page, saw that it is "DataFrames", but didn't know what that means and came here to complain. Nobody has said anything about issues with the docs for someone who understands what a data frame is (or spent like two minutes looking that up) but is struggling to figure out how to use this library specifically.


I think your experience is probably making it difficult to understand the noob side of things. Personally, I've struggled with simply slicing up a dataframe. And as I said, these aren't tools I use a lot, so "who understands what a data frame is" probably doesn't apply to me very well, and we certainly don't need the pejorative tone suggesting that it is trivially understood or something I should know through divine intervention. I'm sure it's not difficult, but it can take time for things to click.

Hell, I can do pretty complex integrals and derivatives, and so much of that seems trivial to me now, but I did struggle when learning it. Don't shame people for not already knowing things when they are explicitly trying to learn them. Shame the people who think they know and refuse to learn. There's no reason not to be nice.

Having done a lot of teaching, I have a note: don't expect noobs to be able to articulate their problems well. They're noobs. They have the capacity to complain, but it takes expertise to clarify that complaint and turn it into a critique. I get that this is frustrating, but being nice turns noobs into experts, and often into friends too.


I really think this is a misunderstanding of the purpose of different kinds of documentation. The documentation of a new tool for a mature technique is just not the primary place to focus on writing a beginners' tutorial / course on using that technique. Certainly, "the more the merrier" is a good mantra for documentation, so if they do add such material, all the better. But it is very sensible for it to not be the focus. The focus should be, "how can you use this specific iteration of a tool for this technique to do the things you already know how to do".

Nobody is suggesting that you should be an expert on data frames "through divine intervention". But the place to expect to learn about those things is the many articles, tutorials, courses, and books on the subject, not the website of one specific new tool in the space.

If you're really interested in learning about this, a fairly canonical place to start would be "Python for Data Analysis"[0] by Wes McKinney, the creator of pandas and one of the creators of the arrow in-memory columnar data format that most of these projects build atop now.

This is a (multiple-) book length topic, not a project landing page length topic.

0: https://wesmckinney.com/book/


> But it is very sensible for it to not be the focus.

Sure. I mean, devs can do whatever they want. But the package is a few years old now and they do frequently advertise, so I don't think it's unreasonable to make it more approachable for... you know... noobs.

This is a bit of a difficult conversation, too, because you've moved the goalposts. I've always held the context of noobs, but now you've shifted to just be dismissive of noobs. Totally fine, but different from the last comment.

> But the place to expect to learn about those things is the many articles, tutorials, courses, and books on the subject, not the website of one specific new tool in the space.

I actually disagree. This is the outsourcing I expressed previously, but it's clear from the number of complaints that this is not sufficient for a novice. You do seem passionate about this issue, and so maybe you have the opportunity to fill that gap. But I very much think that official documentation is supposed to be the best place. Frankly because it is written by the people who have a full understanding of the system and how it all integrates together. I'm sure you've run into tons of Medium tutorials that get the job done but are also utter garbage and misinform users. That isn't surprising, since most of them are written by people who are still learning; they're better than nothing, but they are entirely insufficient. The whole point of creating a library is to save people time, and that time includes onboarding. For an example of good docs, I highly recommend the Vim docs. Even man pages are often surprisingly good.


> now you've shifted to just be dismissive of noobs

No, I'm sorry, this is getting ridiculous. I'm not being dismissive of noobs, I'm saying "noobs should seek introductory material when attempting to learn an entirely new subject, like books, courses, or tutorials on the subject matter".

It's just so freaking weird for you to expect every single tool in some space to create that introductory material.

I promise you that the Ruby on Rails website did not assume total ignorance of the term "web application" when I first came across it as a "noob". I was a total noob at Ruby on Rails, but I had to understand why I might be interested in "web applications, but easier".

I could spend all day coming up with examples that are just like this. And this is not some kind of failure of imagination in how to document specific projects, it's just specialization. The website of a new tool for something that has been done a bunch of times over multiple decades is not the right place to put the canonical text on what the thing you're doing is; you put that in a book or in college courses or other kinds of training materials.

Unless what you have made is a brand new entirely unfamiliar thing (which is very rare) with no introductory materials for your brand new novel concept available anywhere, it makes more sense to focus your documentation on "why choose this specific solution over the other ones people are already familiar with" rather than "what even is the thing that we're doing here from first principles". Sure, add some links to the best introductory materials, but don't try to write them yourself, that's crazy!

> I actually disagree. This is the outsourcing I expressed previously, but it's clear from the number of complaints that this is not sufficient for a novice. You do seem passionate about this issue, and so maybe you have the opportunity to fill that gap.

No, I'm not passionate about this issue. I think people who actually want to learn things will continue doing research and reading books and taking classes to learn about new subjects, and that people who just want to complain will continue to do so. There is no "gap" to fill. There are tons of great materials that will describe in great depth what "data frames" are, and how to work with them, for anyone who is even the tiniest bit interested.

> I very much think that official documentation is supposed to be the best place. Frankly because it is written by the people who have a full understanding of the system and how it all integrates together.

I think what you seem to be confused by is the difference between this one library - polars - and an entire large subject - tabular data analysis using data frames. It certainly does make sense for the polars website to document the polars library, which (in my view) it already does. But if you want to learn the subject, you need to do that in the normal way that people have always learned new subjects. I'm sorry, because you seem resistant to this, but again, the way to do that is with books and courses, not by reading the documentation of one tool comprising a tiny sliver of a very large subject.

> I'm sure you've run into tons of Medium tutorials that get the job done but are also utter garbage and misinform users.

No, Medium tutorials should not be your go-to source for learning about a new subject! Your go-to source should be books and courses.

This is why I keep commenting here. I want to get through to you that you seem to be going about the acquisition of knowledge in a very weird and fundamentally misguided way. It just isn't the case that knowledge is mostly found in the documentation of tools! There is way more foundational knowledge to learn than it would ever make sense for every little tool to document themselves.

This is, in a very literal sense, why people write books about things, and why schools exist. We don't teach algebra by linking to the Mathematica documentation.


I can't speak for the Python side of the Polars docs but coming from Python and Pandas to Rust and Polars hasn't always been easy. To be fair, that isn't just about docs but also finding articles or Stack Overflow answers for people doing similar things.


That certainly makes sense!


I'm a dataframes noob. I saw this post and the performance claims attracted me. I went to ChatGPT to understand what dataframes were about. Then on Udemy, I searched for a Polars course. The course had prerequisites: a bit of Jupyter notebooks and pandas. So I went through a few modules of a pandas course. Now, I'm going through a Polars course. Altogether, I spent about 2-3 hours setting up the environment and learning what this is all about.

A little bit of context would have helped attract a lot more noobs.


Your first paragraph makes perfect sense! I was nodding along. But then your concluding sentence was a bit of a record scratch for me. This all worked as intended! You knew what the project was about - "data frames" - and what might make it attractive to you - the performance claims - and then you went and followed exactly the right path to get the context you needed to understand what's going on with it. It's a big topic that you were able to spin up on to a basic level in 2-3 hours, by pulling on strings starting at this landing page. This is a very successful outcome.

I'd also recommend this book: https://wesmckinney.com/book/. It's not about polars, but you'd be able to transfer its ideas to polars easily once you read it.


"How To Be A Pandas Expert"[1] is a good primer on dataframes. There's a certain mental model you need to use dataframes effectively but it's not apparent from reading the official docs. The video makes it explicit: dataframes are about like-indexed one-dimensional data, and every dataframe operation can be understood in terms of what it does to the index.

[1] https://www.youtube.com/watch?v=oazUQPrs8nw
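A toy sketch of the "like-indexed" mental model described above, using plain Python dicts keyed by row label as a stand-in for a pandas index (a simplification for illustration, not how pandas is implemented). Combining two columns first aligns them on their labels, and a label present on only one side gets a missing value; pandas would put NaN where this sketch uses None. The helper name `aligned_add` is hypothetical. Note that Polars itself has no row index, so this particular model describes pandas.

```python
def aligned_add(left, right):
    """Add two label -> value mappings, roughly the way pandas aligns two Series.

    The result covers the sorted union of labels; a label missing on
    either side yields None (pandas would produce NaN there).
    """
    labels = sorted(set(left) | set(right))
    return {
        label: (left[label] + right[label]) if label in left and label in right else None
        for label in labels
    }


sales_q1 = {"apples": 10, "pears": 4}
sales_q2 = {"pears": 6, "plums": 3}
print(aligned_add(sales_q1, sales_q2))
# {'apples': None, 'pears': 10, 'plums': None}
```

The point of the sketch is that the result is decided by the labels, not by row position: "pears" lines up with "pears" even though it sits at a different position in each input.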


The Rust docs are for some reason much worse than the Python docs, or at least that used to be the case.


I'm a data engineering newbie and I found it very clear, and it gave me an enthusiastic feeling (not an "alienating" feeling).

This whole thread just comes across as unmitigated pedantry to me.


Presumably you were introduced to the concept of DataFrames and how they're used through some other source, because the Polars landing page doesn't even bother to mention that it's used for data analysis, and the documentation simply assumes you're already familiar with the core concepts.

Compare that to Pandas which starts with the basics, "pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language." It then leads you to "Getting started" guide which features "Intro to pandas" that explains the core concepts.


Wes has also worked hard to improve a lot of the missteps of pandas, such as through pyarrow, which may prove even more impactful than pandas has been to date.

Polars is also a wonderful project!


Polars is also based on McKinney’s Arrow project.

Polars is a DataFrame interface on top of an OLAP Query Engine implemented in Rust using Apache Arrow Columnar Format as the memory model.

https://github.com/pola-rs/polars/blob/main/README.md


Wes also literally created another Python dataframe project, Ibis, to overcome many of the issues with pandas

https://ibis-project.org

Most data engines and dataframe tools these days use Apache Arrow; it's a bit orthogonal.


It’s only annoying because it’s on Hacker News; what are the odds of landing on it if you don’t know what it is and don’t need it?


It's annoying because a single leading sentence would be enough to explain a product. Some of the words (for example "Data Frame") in that sentence can be links to other pages if that's necessary. It's a small change but it makes a huge difference.


HTML has an element to provide definitions of terms without having to link out, but almost nobody uses it.


I mean, pretty high. What if your boss just tells you to learn polars, and you don’t know why? Saying what something is, is just good communication, and can help clarify for people who are confused.


I guess that in the remote event you're told to learn a new skill you don't know anything about, you go to the pola.rs website, see "DataFrames for the new era", and start reading documentation from there about what a DataFrame is. The website clearly shows what it is; it's your job to understand it. I would argue that if you already knew what DataFrames are, you would be saying "Why is it saying something so basic instead of just showing me the good stuff?"

I, for example, hate websites that try to serve newbies. Newbies have plenty of content available if they're interested; not all of the web needs to serve them.


> What if your boss just tells you to learn polars, and you don’t know why? Saying what something is, is just good communication

Shouldn't the good communication happen when the boss tells you to learn polars? Like, why are you telling me this, boss; what is it that you need done?


These workplaces where bosses tell employees to learn unheard-of tools with zero context sound terrible.


> Yes, it's an annoying negative feature of many tech products.

Sadly it's not only tech products, but also things like security disclosures.

It always follows the same pattern:

    - Spend $X time coding/researching something.
    - Spend $not_enough_time documenting it.
    - Spend $far_too_much_time thinking about / "engaging with the community" in deciding on a cute name, fancy logo and cool looking website.



