You will be even more horrified to learn that installing a project's full dependency list, which takes a few seconds on my home laptop, can take up to 20 minutes at some clients because many filesystem calls do a network round-trip.
We are not talking about exceptions either. This is pretty standard stuff when you work outside of IT-literate companies.
At one client, they provided me with a part-time tester but neglected to give him the permissions to install git. It took three weeks to fix.
The same client makes us develop on Windows machines but deploy to Linux pods. We can't test directly on Linux, nor connect to the pods, only deploy to them. In fact, we don't even have the specs of the pods; I had to create a whole API endpoint in the project just to be able to fetch them.
Other things I got to enjoy:
- a CTO storing the passwords of all the servers in a LibreOffice file
- a lead testing in prod, as root, by copying files over FTP. No version control.
- a sysadmin with an interesting way of managing his servers: he remote-controlled one particular Windows machine via TeamViewer, which was the only machine that could connect to them over SSH.
The list is quite long.
This makes you see the entire world from a whole new perspective.
I always thought that all devs should spend a year doing tech support for a variety of companies so that they get a reality check on what most humans actually have to deal with when working on a computer.
It's also factually incorrect. Pretty much the entire field of mechanistic interpretability would point out that models have an internal definition of what a bug is.
> Thus, we concluded that 1M/1013764 represents a broad variety of errors in code.
(Also the section after "We find three different safety-relevant code features: an unsafe code feature 1M/570621 which activates on security vulnerabilities, a code error feature 1M/1013764 which activates on bugs and exceptions")
This feature fires on actual bugs; it's not just the model pattern-matching on "what a bug hunter might say next".
This is more of an article describing their methodology than a full paper. But yes, there are plenty of peer-reviewed papers on this topic: scaling sparse autoencoders to produce interpretable features for large models.
There are a ton of peer-reviewed papers on SAEs from the past two years; some of them have been presented at conferences.
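For anyone unfamiliar with the technique, here is a minimal sketch of what such a sparse autoencoder looks like. The dimensions, sparsity coefficient, and training step are illustrative assumptions, not values from the Anthropic work or any specific paper.

```python
# Minimal sparse autoencoder (SAE) sketch: learn an overcomplete dictionary of
# features from a model's activations. All sizes/hyperparameters are made up.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # activations -> feature space
        self.decoder = nn.Linear(d_features, d_model)  # features -> reconstruction

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))      # non-negative, hopefully sparse codes
        recon = self.decoder(features)
        return recon, features

# Objective: reconstruct the activations while penalizing feature activity (L1),
# which pushes most features to zero on any given input.
sae = SparseAutoencoder(d_model=512, d_features=8192)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3  # sparsity strength, a hyperparameter

acts = torch.randn(64, 512)  # stand-in for captured LLM activations
opt.zero_grad()
recon, features = sae(acts)
loss = ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
loss.backward()
opt.step()
```

Features learned this way are then inspected by looking at which inputs activate them, which is how a label like "code error feature" ends up being assigned to something like 1M/1013764.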
(Not GP) There was a well-recognized reproducibility problem in the ML field before LLM mania, and that's considering published papers with proper peer review. The current state of affairs is in some ways even less rigorous than that, and then some people in the field feel free to overextend their conclusions into other fields like neuroscience.
We're in the "mad science" regime because the current speed of progress means adding rigor would sacrifice velocity. Preprints are the lifeblood of the field because preprints can be put out there earlier and start contributing earlier.
Anthropic, much as you hate them, has some of the best mechanistic interpretability researchers and AI wranglers across the entire industry. When they find things, they find things. Your "not scientifically rigorous" is just a flimsy excuse to dismiss the findings that make you deeply uncomfortable.
Current LLMs do not think. Just because models anthropomorphize the repetitive actions they loop through does not mean they are truly thinking or reasoning.
On the flip side, the idea that this is true has been a very successful indirect marketing campaign.
My point was not that I’m 100% convinced that LLMs can think or are intelligent.
My point was that we don’t have a great definition for (human) intelligence either. The articles you posted also don’t seem to be too confident in what human intelligence actually entails.
> There is controversy over how to define intelligence. Scholars describe its constituent abilities in various ways, and differ in the degree to which they conceive of intelligence as quantifiable.
Given that an LLM isn’t even human but essentially an alien entity, who can confidently say they are intelligent or not?
I'm very sceptical of those who are very convinced one way or the other.
Are LLMs intelligent in the way that humans are? I’m quite sure they aren’t.
Are LLMs just stochastic parrots? I don’t find that framing convincing anymore either.
Either way, it's not clear; just look at how this topic has been discussed daily in most front-page threads for the last couple of years.
> So if you can sell those MT for $1-5, you're printing money.
The IF is doing a lot of heavy lifting there.
I understood the OP in the context of "human history has not produced sufficiently many tokens to be sent into the machines to make the return on investment possible mathematically".
Maybe the "token production" accelerates, and the need for so much compute realizes, who knows.