Hacker News | dinkblam's comments

what is the evidence that being able to play games equates to AGI?

The test doesn't prove you have AGI. It proves you don't have AGI. If your AI can't solve these problems that humans can solve, it can't be AGI.

Once the AIs solve this, there will be another ARC-AGI. And so on until we can't find any more problems that can be solved by humans and not AI. And that's when we'll know we have AGI.


AI X that can solve the tests contrasted with AI Y that cannot, with all else being equal, means X is closer to AGI than Y. There's no meaningful scale implicit to the tests, either.

Kinda crazy that Yudkowsky and all those rationalists and enthusiasts spent over a decade obsessing over this stuff, and we've had almost 80 years of elite academics pondering on it, and none of them could come up with a meaningful, operational theory of intelligence. The best we can do is "closer to AGI" as a measurement, and even then, it's not 100% certain, because a model might have some cheap tricks implicit to the architecture that don't actually map to a meaningful difference in capabilities.

Gotta love the field of AI.


Will there be a point in that series of ARC-AGI tests where AI can design the next test, or is designing the next test always going to be a problem that can be solved by humans and not AI?

I don't see why AI couldn't design tests. But they can only be validated by humans, as they are intended to be possible and ideally easy for humans to solve.

Yes, but I guess you see what I'm getting at. If designing the next ARC-AGI test is impossible for AI without a human in the loop, then AGI becomes unreachable by definition.

>It proves you don't have AGI.

It doesn't prove anything of the sort. ARC-AGI has always been nothing special in that regard but this one really takes the cake. A 'human baseline' that isn't really a baseline and a scoring so convoluted a model could beat every game in reasonable time and still score well below 100. Really what are we doing here ?

That Francois had to do all this nonsense should tell you the state of where we are right now.


None whatsoever.

It's a "let's find a task humans are decent at, but modern AIs are still very bad at" kind of adversarial benchmark.

The exact coverage of this one is: spatial reasoning across multiple turns, agentic explore/exploit with rule inference and preplanning. Directly targeted against the current generation of LLMs.


There isn't a strict definition of AGI, there's no way to find evidence for what equates to it, and besides, things like this are meant only as likely necessary conditions.

Anyway, from the article:

> As long as there is a gap between AI and human learning, we do not have AGI.

This seems like a reasonable requirement. Something I think about a lot with vibe coding is that unlike humans, individual models do not get better within a codebase over time, they get worse.


Is that within a codebase of relatively fixed size, where things get worse as time goes on? Or are you saying that as the codebase grows, the model can no longer hold the entire codebase within its context, so it performs worse than when the codebase was smaller?

I think there's a few factors, codebase size is one, and the tendency for vibe coding to be mostly additive certainly doesn't help with that.

But vibe coding also tends to produce somewhat poor architecture, lots of redundant and intermingled bits that should be refactored. I think the model is worse the worse code it has to work with, which I presume is only in part because it's fundamentally harder to work with bad code, but also in part because its context is filled with bad code.


The evolution of the test has been partly due to the evolution of AI capabilities. To take the most skeptical view, the types of puzzles AI has trouble solving are in the domain of capabilities where AGI might be required in order to solve them.

By updating the tests specifically in areas AI has trouble with, it creates a progressive feedback loop against which AI development can be moved forward. There's no known threshold or well defined capability or particular skill that anyone can point to and say "that! That's AGI!". The best we can do right now is a direction. Solving an ARC-AGI test moves the capabilities of that AI some increment closer to the AGI threshold. There's no good indication as to whether solving a particular test means it's 15% closer to AGI or .000015%.

It's more or less a best effort empiricist approach, since we lack a theory of intelligence that provides useful direction (as opposed to a formalization like AIXI which is way too broad to be useful in the context of developing AGI.)


I think the idea is that if they cannot perform any cognitive task that is trivial for humans then we can state they haven’t reached ‘AGI’.

It used to be easy to build these tests. I suspect it’s getting harder and harder.

But if we run out of ideas for tests that are easy for humans but impossible for models, it doesn’t mean none exist. Perhaps that’s when we turn to models to design candidate tests, and have humans be the subjects to try them out ad nauseam until no more are ever uncovered? That sounds like a lovely future…


The reality is machines can brute force endlessly to an extent humans cannot, and make it seem like they are intelligent.

That's not intelligence though. Even if it may appear to be. Does it matter? That's another question. But it certainly is not a representation of intelligence.


That is not the claim. It is a necessary condition, but not a sufficient one.

The evidence is that humans are able to win these games. AGI is usually defined as the ability to do any intellectual task about as well as a highly competent human could. The point of these ARC benchmarks is to find tasks that humans can do easily and AI cannot, thus driving a new reasoning competency as companies race each other to beat human performance on the benchmark.

> AGI is usually defined as the ability to do any intellectual task about as well as a highly competent human could

I think one major disconnect, is that for most people, AGI is when interacting with an AI is basically in every way like interacting with a human, including in failure modes. And likely, that this human would be the smartest most knowledgeable human you can imagine, like the top expert in all domains, with the utmost charisma and humor, etc.

This is why the "goal post" appears to be always moving, because the non-commoners who are involved with making AGI and whatnot never want to accept that definition, which to be fair seems too subjective, and instead like to approach AGI as something different: it can solve some problems humans can't, when it doesn't fail it behaves like an expert human, etc.

Even if an AI could do any intellectual task about as well as a highly competent human could, I believe most people would not consider it AGI, if it lacks the inherent opinion, personality, character, inquiries, failure patterns, of a human.

And I think that goes so far as, a text only model can never meet this bar. If it cannot react in equal time to subtle facial cues, sounds, if answering you and the flow of conversation is slower than it would be with a human, etc. All these are also required for what I consider the commoner accepting AGI as having been achieved.


By that definition, does a human at the other end of a high-latency video call not have AGI because they can't react any faster than the connection's latency would allow them to have? From your POV what's the difference between that and an AI that's just slow?

> does a human at the other end of a high-latency video call not have AGI because they can't react any faster than the connection's latency would allow them to have

Correct. A person who'd mentally operate that slowly would be considered to have some cognitive disability. For example, would likely not be allowed to drive a car.

You could be fooled into thinking it is a human behind a slow connection, but a layman would not consider it real AGI in my opinion. Since you have to handicap the human, it seems like lowering the bar just to pretend you reached AGI.

You might recognize it's pretty close to AGI, if it has all the other qualities, but it needs to also operate at a similar response time, uptime, and so on.

My point is, everyone that's not trying to build AGI defines it as, same as an idealized smartest human would be in every way. I truly think this is how most people imagine AGI in their head, and until you have that, they'll say it's not AGI, and industry folks will claim the goalpost keeps moving, when in reality they kept setting their own post.


it seems if you want the same on macOS, this is the place to contribute:

https://github.com/Alien4042x/Wine-NTsync-Userspace-macOS-ba...


That's interesting. I thought the point was that it needed to be in-kernel for performance reasons; if it works in userspace why did linux not do that?

Ideally it does need to be in-kernel for performance reasons. But that's not possible on macOS, so it's better to have it in userspace than not at all.

But does anyone care about MacOS? ;)

I mean, I know Mac has had some great games (eg. I spent so much time on school Macs playing that Bolo tank game) ... but they have probably <1% of the number of games Windows has. I'd expect a similar percentage of devs to be interested in Mace (or whatever you call Mac Wine).


Not sure what you mean. The number of Mac games isn't relevant to a subthread about a project to increase performance when running Windows games on Mac.

everything in germany's economy is going downhill, the troubles in their car industry are just one symptom (but not the cause)


Where did you get that opinion? Germany is not doing great but OK in the group of Western countries, and its car industry is both very important and in trouble, so it's not an unreasonable opinion that things would be better without that trouble.


Germany has a great layer of "consultants" that fudge the books and make everything look profitable and rosy. It's the land of "Arbeitsgruppen" and "Berater" - folks that ensure things get buried and forgotten.

But there is no investment in the future, no investment in infrastructure and no investment in anything creative, in fact, that's where cuts are made, in the arts and culture.

Once a society can no longer afford the arts, you know there is something going wrong and Germany is going wrong. Perhaps "klagen auf hohem Niveau" (complaining from up on high) but the higher they are, the further they fall.


It's not. It's more like a cancer patient with an Überweisung for their first cancer screening but dragging their feet to go and do it. They know it is bad and will get worse but they're afraid of facing it.


Imho, the german mentality just doesn't fit today's economy. Too risk averse, too conservative. Creativity is not really embraced.

The state of the german IT sector also shows that.

Most startups have nearly no moat at all and purely live off marketing with some sprinkles of corporate identity.


In Switzerland, we use a lot of German and Switzerland born products, and they mostly suck.


I'm not saying there are no good products. Hetzner Cloud come into my mind for example. It's executed really well.

I'm saying that the number of good software offerings is too low, to have a significant impact on the country's economy.

One of the advantages Germany had though, was a somewhat good and accessible higher education system with regard to computer science.

Now, with software development becoming a commodity, this advantage vanishes.


What is the cause?


imho there are multiple, starting with the pension and healthcare system which are not sustainable with the current demography trend, which pushed them into going all in with immigration, which fractured whatever was left of german identity (which was arguably already wiped out after ww2 and the cold war). Taxes are going up, retirement age is increasing, pensions are decreasing, public services are getting worse year after year, there is nothing young people can focus on, nothing they can expect to have better than their parents or grand parents, most will never own their place. The self sabotage of the energy sector certainly didn't help. No long term vision + no clear way to improvement + no sense of belonging = game over, and this is hitting most of the west at once, it's all about individualism and consumption, you can't build societies on these principles.


You wrote my thoughts. Add one more thing: Germany is a federation with an insanely complex administration, with many different (outdated) education systems and too many public healthcare insurers. There is too much regulation of everything, which reduces real efficiency to zero.

Latest example (I am an electrical engineer AND an electrician): from this year on, my buddy, a heating system specialist, can’t help me install a photovoltaic system on the roof. Last year he was qualified, this year not anymore. He can, however, still install an air conditioning unit on the roof this year. But not the solar panels… Every year some shady lobby group writes some special law crippling the last pieces of a working system.

There should be some deregulation and centralization institution in Germany with a real short-term plan to increase efficiency. Otherwise it will remain a country of Oktoberfest and Cologne Carnival.


> it's all about individualism and consumption, you can't build societies on these principles

Lots of real problems listed, but such a non-sequitur conclusion. US is built on these principles, China seems to be more individualistic and consumerist than Germany too. If anything, a big problem in Germany is low ambition as the societal norm. A bit of consumerism could actually help with that, as to consume you need to earn, and to earn, you need some ambition.


Tax system and IG Metall salary tables will kill ambition very quickly. The highest salary groups do not guarantee a comfy lifestyle in the corresponding areas anymore. Giving away half of your salary as mandatory insurance and paying 19% value added tax on the rest is just insulting. Don’t forget the rents in 2026. They are at a new all-time high again. It does not pay off to work anymore.


Yeah, to me it seems that instead of fighting individualism, Germany needs to make sure that it pays off. Higher taxes for ownership, lower taxes for income from one's work for example.


> US is built on these principles,

And it's a complete clown show rewarding moral bankruptcy that ended up fabricating and promoting uneducated degenerates such as Trump, Hegseth, Miller, &co to the highest positions.

Thanks for making my point really...


These are very different problems from what Germany has though. And it's a recent issue, while individualism is a core tenet of American culture since independence.


If anything, _more_ individualism and personal responsibility would help, not less.


What is the "current demography trend", when did it become current, how does it compare to 35 years ago?


Uh oh


Do not worry, the army is going downhill too


I have fond memories of porting Cube, Sauerbraten and AssaultCube to the Mac back in the day. Given what i've seen from Wouter back in the day i am not surprised he is still on it full steam…


great article but the 44 tonne limit is not "physics", it is regulation. if an electric truck would be allowed to weigh 5 tonnes more all these calculations would be different.


The regulation is at least partially informed by physics though.

Braking distances, road damage (scales with the fourth power of axle weight), bridge limits, etc.

If the limit could safely and appropriately be 49 tons for diesel trucks right now, it probably would be.
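
To put a rough number on the fourth-power rule mentioned above, here is a minimal sketch in Python. It assumes the axle count stays the same and the load is spread evenly, so each axle load scales with gross weight. The function name is made up for illustration, and the rule itself is an empirical approximation, not exact physics.

```python
def relative_road_damage(base_tonnes: float, new_tonnes: float, exponent: float = 4.0) -> float:
    """Road wear of the heavier truck relative to the baseline truck.

    Assumption (not from the thread): both trucks have the same number of
    axles with evenly spread load, so axle load scales with gross weight.
    """
    if base_tonnes <= 0 or new_tonnes <= 0:
        raise ValueError("weights must be positive")
    return (new_tonnes / base_tonnes) ** exponent


# 44 t limit from the article vs. the 49 t figure discussed in this thread
ratio = relative_road_damage(44, 49)
print(f"{ratio:.2f}x the road wear per truck")
```

Under those assumptions, 5 extra tonnes (about 11% more weight) works out to roughly 54% more wear per truck, which is one reason the limit is not purely arbitrary.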


trying to click on the link i got a "security verification" screen.

i aborted after 5 seconds of waiting.


> that Bitcoin is a stable source of value

no sane person could have ever held that notion - there is no underlying value.


The computing cost to mine more bitcoin is hailed as the underlying value by proponents of that notion. It depends on bitcoin holders refusing to sell at a price lower than the cost of mining, which isn't a given. It's also a notion that doesn't account for potential innovations such as quantum computing, which would significantly reduce crypto mining costs.


Hindsight is 20/20. That bitcoin is a store of value has been talked about for a very long time when other blockchains overtook it in terms of functionality. People’s memories are short so I am sure it will be touted as such again in a couple years.


Spain basically does not do the required maintenance:

https://www.reuters.com/world/spains-deadly-rail-accidents-p...


From the linked article:

> [The] stretch of track that was renovated last May and inspected on January 7.

The track had been inspected very recently. Maybe the inspection standards are inadequate?

The linked article also shows figures that are quite meaningless without context.

> [The] vast majority [of Spain's high-speed rail budget] went to new infrastructure with only some 16% earmarked for maintenance, renewal and upgrades. That compares with between 34% to 39% spent by France, Germany and Italy,

They simply can't compare those numbers as-is. Of course Spain will be spending less in maintenance as a percentage of the total budget if it's still mainly building new tracks. It's not a useful figure.


> The track had been inspected very recently. Maybe the inspection standards are inadequate?

Spanish officials are very good at deflecting blame and playing politics. Nobody wants to be held accountable for a catastrophe. Also see the 2024 floods in Valencia; a partially preventable tragedy, followed by a whole lot of mud slinging, but zero accountability.

So while inspection standards might be inadequate, I would take anything a senior official says with a pound of salt.


But he is correct. If you have a large enough budget for new construction it can make any maintenance expenditure look tiny. The right figures to compare are normalized by length and age of track, not percentages of the total budget.


> 2024 floods in Valencia; a partially preventable tragedy, followed by a whole lot of mud slinging

sigh

Of course you're right


Yep, plus their network is pretty new anyways. Which generally needs less maintenance than older infrastructure.


Just because something is new doesn't mean it's free of faults.


Specifically the fractured track was a soldered joint that joined a track from 1989 with a new one from a few weeks ago.


This was a track laid a few weeks ago? I think that's the problem.


Soldered eh? No wonder then that it broke.


English is unusual in that we have both Germanic "weld" and Latinate "solder" and they've acquired different meanings. Spanish (and other Romance languages) use the term "solder" (soldado) for both.


As an aside: Chinese also uses the same term for both (焊接), and the standard English translation is "welding". This can lead to some confusion when Chinese manufacturers start talking about e.g. "surface-mount welding". :)


Heh, that would be a funny misunderstanding to have as well as the opposite, when you get back something soldered when you expected it to be welded.


Interesting. In dutch we use 'solderen' vs 'lassen', in German they use 'schweizen' and 'loten'.

English has a third term like that as well called 'brazing', then there is silver solder (a high temperature version of soldering), in dutch we'd call that 'hardsolderen', whereas what the English call brazing we call oxy-acetyleen lassen (which is more of a process name by virtue of naming the ingredients).

Soldadura autogeno and Soldadura en el arco (sp?) are, I think, the terms used in Spanish to indicate brazing and (arc) welding.


Schweissen und löten. Has nothing to do with Switzerland (Schweiz) ;)


As a matter of fact schweissen is only correct spelling in Switzerland.

In Germany it would be schweißen.


Ah yes, you are right! I was going by ear, rather than by the written version, in fact I can't recall seeing it written. German is a language that I will happily use but don't ask me to write a letter in it, you'll probably need exponential notation to represent the number of errors.


Czech uses "Pájení" (derived from "joining") vs "Svařování" (derived from "boiling").

So, also different with different etymology in a language from a different group (although these things were probably influenced by German)


Yeah - the Czech wording is quite clever:

* the first one makes it clear that something (a different material) is used to join things together

* the second one implies you melt/boil the things to join them together


> Spain spent an average of about 1.5 billion euros ($1.76 billion) a year from 2018 to 2022 on its high-speed network, more than any other country. However, the vast majority went to new infrastructure with only some 16% earmarked for maintenance, renewal and upgrades. That compares with between 34% to 39% spent by France, Germany and Italy, whose networks are far less extensive, according to the Commission data.

Conflating the maintenance budget with the money invested in new infrastructure in this way is not very useful IMHO. How much inspection/maintenance money was spent per km of (high-speed and overall) railway track would be much more informative...
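
A quick sketch of why the percentage alone says little, using the figures quoted above (about 1.5 billion euros a year, about 16% for maintenance). The function names, the smaller new-build budget, and the network length are all hypothetical values chosen for illustration, not sourced data.

```python
def maintenance_share(maintenance_eur: float, new_build_eur: float) -> float:
    """Maintenance as a fraction of the total (maintenance + new build) budget."""
    return maintenance_eur / (maintenance_eur + new_build_eur)


def maintenance_per_km(maintenance_eur: float, network_km: float) -> float:
    """Maintenance spend normalized by network length."""
    if network_km <= 0:
        raise ValueError("network_km must be positive")
    return maintenance_eur / network_km


# From the quoted article: about 1.5 billion EUR per year, about 16% for maintenance.
total_eur = 1.5e9
maintenance_eur = 0.16 * total_eur  # about 240 million EUR
new_build_eur = total_eur - maintenance_eur

# Share with the quoted budget, then with a hypothetical smaller new-build budget.
print(round(maintenance_share(maintenance_eur, new_build_eur), 3))
print(round(maintenance_share(maintenance_eur, 400e6), 3))

# Per-km figure; HYPOTHETICAL_NETWORK_KM is a placeholder, not a sourced length.
HYPOTHETICAL_NETWORK_KM = 4_000
print(round(maintenance_per_km(maintenance_eur, HYPOTHETICAL_NETWORK_KM)))
```

The same absolute maintenance spend moves from 16% to about 37.5% of the budget just by shrinking the new-build line, while the per-km number stays the same. That is why the per-km figure is the more useful one to compare across countries.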


on around 300 days per year i see a "severe weather alarm" on the iPhone Weather app, although nothing at all is happening, completely ridiculous.


We've gone so over the top on weather fearcasting. Just look out the window if you want to know what the weather is. Save the "the world is ending" messages for truly life-threatening, property-damaging weather (and no, temperature alone doesn't qualify---it's easy to know it's cold or hot by just stepping outside).


Timely. I’m about to turn off severe weather alerts from my local city because they insist on spamming - multiple times per day - cold weather alerts.

And they start at pretty ridiculous temperatures in the double digits. The only way those would be dangerous to you is if you were homeless and lacked any form of winter clothing, at which point you either already know or are too far mentally gone for a text alert to help you.


does it include the breaths you take during coding? best if you can specify your lung volume and breaths per minute!


hahaha we could also track if you typed too fast! ... actually, this is an actual idea, if you use AI to generate the code ... hmmm; that would then be a fun project vs a cloud cost saving one

