It's not true, in general. Systems almost universally have unintended consequences and result in side effects their designers did not foresee.
Designing benchmarks resistant to adversarial attempts to exploit the benchmark software is just something no one was thinking about when they created SWE-bench.
You are misunderstanding the saying. It is entirely about unintended consequences and viewing the system for what it actually does and not any stated intentions of the designers.
1. We must ignore the intentions of the designers (your claim), and instead see what the outcomes are
2. Therefore we should ignore Beer's intentions when designing the phrase POSWID, and instead see how it is used.
3. The overwhelming majority of people using it on the internet (including the GP comment) is to imply that the people perpetuating the system actually desire the outcome.
So the purpose of POSWID is clearly to imply intent.
Whose intent? POSWID Is about structural incentives not personal intent, and these can be, and likely are, an emergent behavior. It’s about reframing away from intents, treating the system as a structure and removing the whole structure for replacement. As opposed to localized reforms which are exposed to the same prior emergent behaviors leading to constant backsliding.
There are plenty of cases where you absolutely can/should discuss outcomes in a way where the intention is not factored in because it can often be straight up irrelevant.
If a gun is developed with the intention of hunting only bears and someone uses it to shoot people, you don’t have to constantly preface things by talking about how it’s supposed to be used only on bears. Sometimes that fact, depending on the context of the conversation, is simply not relevant.
To cover my bases here: yes it often is relevant and maybe even critical info, but it often isn’t either of those things.
It does not ignore the word. It subverts it, and that's the point. It's the system equivalent of "death of the author", which states that omes a work is written, the authors intent loses relevance and the work must be examined on its own. The aurhors opinion or relationship to the work carries no more weight than any other persons.
That's not "true" in any demonstrable sense, but it can be a useful form of analysis. As it is with "purpose of a system"
This is not how people outside of cybernetics use POSWID. From context it does not appear to be how SlinkyOnStairs was using it either.
I think it's also trying to be too cute. The first two definitions of purpose on Wiktionary[A]:
1. The end for which something is done, is made or exists.
2. Function, role.
People (uselessly) talking about the purpose of a system are often referring to #1, while POSWID is using it to mean #2. The real point of POSWID is that only definition #2 matters. POSWID is a terrible phrase not because it is wrong, but because is is an equivocation -- I suspect that Beer intended it as a pun, but the difference between the two is if one gets the joke. POSWID gets used incorrectly because people don't get the joke.
> From context it does not appear to be how SlinkyOnStairs was using it either.
The exact definition of "purpose" doesn't matter much here.
The particular version of the heuristic used here is that the stated purpose and the actual purpose often differ. POSIWID being the observation that the actual purpose is reflected by the outcomes of the system, because if that isn't the case the system gets changed.
Thus, the observation about AI benchmarks. AI companies have had years now to stop using unreliable benchmarks as advertising material. There's been years of piece after piece about the problems with these benchmarks. And yet the AI marketing continues as is.
> POSIWID being the observation that the actual purpose is reflected by the outcomes of the system, because if that isn't the case the system gets changed.
I fundamentally disagree with this, and it seems to differ from how other proponents of POSIWID in this thread view POSIWID.
It also seems trivially false; systems are dynamic what was the purpose of the system just before it was changed because people didn't like the outcomes?
I'd go further and say this is also the cybernetics equivalent of the religious teachings about humans, specifically the whole "judge by one's deeds, not by one's words" thing. So it's not like it's a novel idea.
Also worth remembering that most systems POSIWID is said about, and in fact ~all important systems affecting people, are not designed in the first place. Market forces, social, political, even organizational dynamics, are not designed top-down, they're emergent, and bottom-up wishes and intentions do not necessarily carry over to the system at large.
If you accept what the system actually does now, and decides to live with it as it is, you just deprecated the original "purpose" and made it irrelevant. You embraced "the purpose is what it does" - to you.
I think the point is that if the side effects become known and are accepted, or if they are known and rejected, then indeed the purpose of the system is what it does.
> Designing benchmarks resistant to adversarial attempts to exploit the benchmark software is just something no one was thinking about when they created SWE-bench
That seems like a major oversight. "AI does whatever maximizes reward/minimizes loss, not what you actually want" is one of the biggest challenges in ML in the last two decades (relevant here because researchers selecting architectures and training regimens that maximize public benchmarks are just a bigger training loop with those benchmarks as reward function). And the analogous issue post-training in AGI-like systems is well studied as the alignment problem, the core issue of classical AI safety
If cheating the benchmark is easier than passing it, you expect the cheating strategy to emerge and win. (Just like you would with humans btw)
I think the point of the saying is that as systems tend to expand, sooner or later we become part of them. That means that we can no longer see them from outside, we're now part of the system and our goals and the system's goals will align. Then the purpose of the system can't be anything else than what it does.
Same. Anyone who has designed anything at all in any domain realizes that what your intentions are and what materializes are often not the same. You have practical constraints in the real world. That doesn’t somehow make the constraints the purpose. The saying makes no sense.
In true HN fashion, you’re an engineer that somehow thinks that they should just form opinions through your divine intuition instead of actually reading the source material, which you very clearly haven’t done.
You’d think that for you to become “so sick of” a saying, you might actually at some point read up on what it means.
I am so tired of this saying.
It's not true, in general. Systems almost universally have unintended consequences and result in side effects their designers did not foresee.
Designing benchmarks resistant to adversarial attempts to exploit the benchmark software is just something no one was thinking about when they created SWE-bench.