I know Hacker News dislikes spreadsheets as opposed to statistical programming languages like R and Python, and there have been many startups trying to disrupt that paradigm, but time and time again they have been the most reliable tools for data analysis and collaborating with nontechnical users.
Even as a data scientist working with other data scientists, my most common deliverable is a Google Sheet.
Note, sapico.me (without www) is a bare Windows Server welcome page. Could redirect it to www so Google search bot isn’t confused about what’s at sapico.me.
I don't dislike people using the statistical tools available to them, but in my own field (social sciences) there's a huge replication crisis going on right now. And a lot of that is due to people who were never good at math taking easy-to-use statistical tools like Excel and SPSS and blindly running stats without programming or math training.
Is it too much to ask that people treat a field with a bit of respect? Like, just because NYT reporters can use some of these "data skills", can they hold off a bit until we figure out if they're even any good at statistical analysis after their crash course? We currently have an entire academic field that has to throw away a lot of their findings because tools like sheets and SPSS gave them false confidence. I don't have any higher hopes for the NYT newsroom.
I think that’s unfairly dismissive to the data scientists, statisticians, analysts and engineers who work at the New York Times and other major publications (as well as smaller, but crucially important places like Pro Publica).
The purpose of this material isn’t to suddenly turn normal reporters into data scientists, it’s to give them a better grasp and understanding how how to evaluate different types of information that become important when reporting.
I don’t know how good or bad this material is — a cursory glance shows that it’s very low-level, the type of stuff I learned in my 100 level accounting and stats classes as an undergrad. But I won’t dismiss this material being made available and potentially augmented for all — tho I wish it was stored in GitHub or GitLab.
If you look through the material, there is nothing that actually says that someone who goes through this training will be a skilled data journalist. But it might just prevent poorly-interpreted articles like this [1] from being written.
And for the record, I’ve worked with data journalists who were more skilled in math and computer science than the engineers I work with at giant tech companies.
I don't think the folks here dislike spreadsheets for working with data. I think the opposition is against spreadsheets being used to make mission critical but very brittle applications.
In my consulting career, I've seen major organizations track data in Excel, updating each month by creating a new set of columns. An absolute nightmare to handle. Of course every month the structure changed slightly, to make it more fun.
I've also seen values stored as an #rgb only - no actual value in a cell.
Nothing wrong with analyzing data in Excel. Nothing wrong with running limited operations in Excel if it is done by the same people and highly standardized.
Tracking major operations in Excel, mostly outside the existing ERP which is happening more often than not in my opinion, is big no-no.
Side note: I know people pasting screenshots from other Excel sheets into Excel sheets....
To add to that, I have seen billions of dollars in EMD traded off an Excel model that took 30 minutes to run (come in to work, hit F9 to recalc / pull in updated market data) go get coffee, come back, send trades to trading desk
SPSS and it’s native format .sav are the industry standard for survey data. If anyone is in that field and hasn’t used Q (q research software) I highly recommend it. I used to do a lot of crosstabs and banner reports in SPSS and wincross and Q saved my probably 15 hours of work in that process per survey.
Excel is an excellent tool, the most democratic data tool out there (and for the foreseeable short term future).
Whether you are a finance person working on balancing the sheets (or whatever else they do, that's beyond my knowledge) or an operation person building complex macros, it is both versatile, easy to use and yet powerful.
The byproducts of this is that it is also subverted by the genius of human ingenuity and you end up with some pretty interesting, awe-inspiring contraptions.
Such as building a calendar system but using arrow graphs to fill it out.
And that's fine, except when you try to scale things up to automate the process and save people money.
Now you have to do some pretty wasteful engineering to accommodate this pesky creativity we have.
That is the really interesting and a testament to being a really good software (in features and reach).
It breaks the boundaries of the limited vocabulary of computers and therefore can only be fully leveraged by humans.
It's both a great reminder that most people in the world are not on the same level as the crowd around here [1] and that as the group who create tools used by such an audience, we have to be mindful of that.
On a positive note, I think there are some interesting work being done to rethink how to approach those tasks, I remember using one product in particular that had the right mix of being visually engaging while enforcing boundaries. But this won't solve the issue of users having to hack their way into getting what they want how they want it.
For the Excel pros here: how do you ever template anything in Excel? Like if you want the same formula for two different tables, do you just copy-paste? How do you keep them in sync? There's just so many things I want to do in Excel and it seems so limited in what it can do that I can't fathom how non-programmers end up using it so successfully.
I've seen locked sheets used as data source for constants and formulas with variables, the cells reference those with lookups and variables.
I've come across a few spreadsheets that made me want to meet and learn something from the authors. Some have been very elaborate and surprisingly resilient.
I have seen formulas entered as plain text in a cell, and then a macro used to update other cells with that formula. You can then change the formula in one place and run the macro to update it everywhere.
The best way to create reusable formulas in your workbooks is to add your table to the data model (which creates an in-memory tabular cube) and write measures using the DAX formula language. This has the added benefit that a single formula can be written to aggregate data at different levels, for example, a sum can be calculated over days, months, or years. This will only allow you to share formulas in a single workbook. The data model in Excel is powerful and under utilized.
Most Excel users would copy paste a formula even _within_ the same table, and don't know that Excel's built-in 'Tables' allow you to automatically keep a formula consistent between rows.
I'm not sure whether there's a good answer to your question
If there is, I'm sure that most 'Excel pros' don't know it.
I've seen folks in excel say "I'm not a programmer, so I did this in excel." and ... I was pretty damn impressed. Some of them had some of the logical thinking basics down to be pretty good at programming, if they gave it a try I suspect some might be damn good.
I’m not really very facile with excel, but one place it’s really handy for me is situations that combine manual data entry with computation.
A super simple example is keeping track of monthly bills when you have flat mates, and need to split them every month. It’s not so critical that I feel the need to version control it etc, but it’s still nice to have a visually inspectable record. Even though I spend most of my days writing code, anything I can dream up that involves python or whatever just seems unnecessarily opaque and baroque. A spreadsheet is ideally suited to the task.
I don't know who dislikes spreadsheets, I am a programmer by trade and I regularly use spreadsheets to accomplish dozens of tasks, from managing household budget to making semi-automated expense reports to keeping scores in local gaming club. Spreadsheets are an excellent tool if used properly, and more you learn about them more uses they reveal.
Until you see motor simulations running in Excel which break in many ways.
If your data can be manually entered it's decent, or if you just need to filter CSV by some columns. But anything above that please choose a proper programming language and data format.
Even as a data scientist working with other data scientists, my most common deliverable is a Google Sheet.