John Ousterhout famously wrote that threads were bad, and many people agreed with him because threads seemed very hard to use. Google software avoided them almost always, pretty much banning them outright, and the engineers doing the banning cited Ousterhout.
Yeah, this is simply not true. It was threads AND async in the C++ world (and C++ was, of course, where most of the cycles went).
The ONLY way to use all your cores is either threads or processes, and Google favored threads over processes (at least fork() based concurrency, which Burrows told people not to use).
For example, I'm 99.99% sure MapReduce workers started a bunch of threads to use the cores within a machine, not a bunch of processes. It's probably in the MapReduce paper.
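To make the point concrete, here's a minimal sketch (my own illustration, not anything from the MapReduce paper) of the worker pattern described above: one process, one thread per core, each thread chewing through its own slice of the input.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical sketch: split `data` into nthreads contiguous chunks and
// sum each chunk on its own thread, the way a MapReduce-style worker
// would use threads (not forked processes) to fill all the cores.
long parallel_sum(const std::vector<long>& data, unsigned nthreads) {
  std::vector<std::thread> workers;
  std::vector<long> partial(nthreads, 0);  // one slot per thread, no locking
  size_t chunk = (data.size() + nthreads - 1) / nthreads;
  for (unsigned t = 0; t < nthreads; ++t) {
    workers.emplace_back([&, t] {
      size_t begin = t * chunk;
      size_t end = std::min(data.size(), begin + chunk);
      for (size_t i = begin; i < end; ++i) partial[t] += data[i];
    });
  }
  for (auto& w : workers) w.join();
  return std::accumulate(partial.begin(), partial.end(), 0L);
}
```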
So it can't be even a little true that threads were "avoided almost always".
---
What I will say is that the pattern of fanning out requests to, say, 50 or 200 servers and joining the results was, in C++, async. It wasn't idiomatic to use threads for that because of the cost, not because threads are "hard to use". (I learned that from hacking on Jeff Dean's tiny low-latency gsearch code in 2006.)
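For anyone who hasn't written this style of code: the fan-out/join pattern means heap-allocating a struct for the in-flight state and running the join logic from whichever completion callback lands last. Here's a toy sketch of that shape -- the names (`FanOutState`, `OnReply`, `FanOut`) are my own illustration, not Google's actual RPC API, and the "RPCs" reply immediately to keep it self-contained.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Manually bundled request state: exactly the bookkeeping the
// thread-per-request model lets you keep in local variables instead.
struct FanOutState {
  int pending;                              // replies still outstanding
  std::vector<std::string> replies;         // one slot per shard
  std::function<void(const std::vector<std::string>&)> done;
};

// Completion callback for one backend; the last reply to land fires `done`
// and frees the state.
void OnReply(FanOutState* s, int shard, const std::string& reply) {
  s->replies[shard] = reply;
  if (--s->pending == 0) {
    s->done(s->replies);
    delete s;
  }
}

void FanOut(int nshards,
            std::function<void(const std::vector<std::string>&)> done) {
  auto* s = new FanOutState{nshards, std::vector<std::string>(nshards), done};
  for (int i = 0; i < nshards; ++i) {
    // A real server would issue a non-blocking RPC here and return;
    // we fake an immediate reply so the sketch runs standalone.
    OnReply(s, i, "shard-" + std::to_string(i));
  }
}
```

The point of the sketch is the cost of the style: every value a request needs later has to be stuffed into `FanOutState` by hand, which is the "manual state management" that makes async hard.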
But even as early as 2009, people pushed back and used shitloads of threads, because ASYNC is hard to use -- it's a lot of manual state management.
e.g. from the Percolator paper, about the incremental indexing system that launched in ~2009:

https://research.google/pubs/large-scale-incremental-process...

https://storage.googleapis.com/gweb-research2023-media/pubto...
Early in the implementation of Percolator, we decided to make all API calls blocking and rely on running THOUSANDS OF THREADS PER MACHINE to provide enough parallelism to maintain good CPU utilization. We chose this thread-per-request model mainly to make application code easier to write, compared to the event-driven model. Forcing users to bundle up their state each of the (many) times they fetched a data item from the table would have made application development much more difficult. Our experience with thread-per-request was, on the whole, positive: application code is simple, we achieve good utilization on many-core machines, and crash debugging is simplified by meaningful and complete stack traces. We encountered fewer race conditions in application code than we feared. The biggest drawbacks of the approach were scalability issues in the Linux kernel and Google infrastructure related to high thread counts. Our in-house kernel development team was able to deploy fixes to address the kernel issues.
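The contrast with the async sketch is what the quote is getting at: with blocking calls and a thread per request, per-request state just lives in local variables and the stack trace tells the whole story. A toy sketch, with `FetchCell` as a hypothetical stand-in for a blocking table read (not Percolator's real API):

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical blocking "table read" -- pretend this parks the thread
// on I/O instead of returning immediately.
std::string FetchCell(const std::string& row) {
  return "value-of-" + row;
}

// Straight-line request handler: fetch, then fetch again using the first
// result. In the event-driven model each step would be a separate callback
// carrying `a` along in a heap-allocated state struct.
std::string HandleRequest(const std::string& row) {
  std::string a = FetchCell(row);
  std::string b = FetchCell(a);
  return b;
}

// Thread-per-request: one thread per incoming request, all blocking.
std::vector<std::string> Serve(const std::vector<std::string>& rows) {
  std::vector<std::string> out(rows.size());
  std::vector<std::thread> threads;
  for (size_t i = 0; i < rows.size(); ++i)
    threads.emplace_back([&, i] { out[i] = HandleRequest(rows[i]); });
  for (auto& t : threads) t.join();
  return out;
}
```

Scale that up to thousands of threads per machine and you get exactly the kernel-scalability pressure the paper describes.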
To say threads were "almost always avoided" is indeed ridiculous -- IIRC this was a few dedicated clusters of >20,000 machines running 2000-5000+ threads each ... (on, I'd guess, ~32 cores at the time).
I remember being in a meeting where the indexing VP mentioned the kernel patches referenced above, which is why I thought of that paper.
Also as you say there were threads all over the place in other areas too, GWS, MapReduce, etc.