> But my point is that LLM's essentially arrive at answers by brute force through search.
If "brute force" worked for this, we wouldn't have needed LLMs; a bunch of nested for-loops can brute force anything.
The reason LLMs are clearly "magic" in ways similar to our own intelligence (which we very much don't understand either) is precisely that they can arrive at an answer without brute force, which is computationally prohibitive for most non-trivial problems anyway. Even if the LLM spends several hours spinning in a reasoning loop, those millions of tokens still represent a minuscule part of the total possible solution space.
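A quick back-of-envelope sketch makes the "minuscule part" claim concrete. The vocabulary size, answer length, and decode rate below are illustrative assumptions, not measured values:

```python
import math

# Assumed numbers, for illustration only:
vocab_size = 50_000   # rough order of a modern tokenizer's vocabulary
answer_len = 100      # a short 100-token answer
log10_space = answer_len * math.log10(vocab_size)  # log10 of all possible 100-token sequences

tokens = 5 * 10_000_000  # assume a 5-hour run at a generous 10M tokens/hour
log10_fraction = math.log10(tokens) - log10_space

print(f"possible {answer_len}-token sequences: ~10^{log10_space:.0f}")
print(f"fraction a multi-hour run explores: ~10^{log10_fraction:.0f}")
```

Even with generous assumptions, the run touches something like one part in 10^462 of the space of short answers alone.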
And yes, we're obviously more efficient and smarter. The smarter part should come as no surprise given that our brains have vastly more "parameters". The efficiency is definitely remarkable, but it's completely orthogonal to the question of whether the phenomenon exhibited is fundamentally the same or not.
If you treat the human brain as a model and account for the full complexity of neurons (one neuron != one parameter!), it has several orders of magnitude more parameters than any LLM we've made to date, so the gap shouldn't come as a surprise.
What is surprising is that our brain, as complex as it is, can train so fast on such a meager energy budget.
You are right, but at the same time the human brain does way more stuff (muscle coordination, smell, touch sensing), and all of those take up at least some of that budget.
It's an interesting question, but I'm not convinced it's only a scale issue. Finished models don't really learn the way humans do: we actually change the parameters "at runtime", updating the model itself, so learning isn't limited to the current context.
We were optimized to rapidly adapt to changing environments by solving the problems that arise through tool-making and cooperation in complex multi-stage tasks (like say hunting that mammoth to make clothing out of it). It turns out that the cheapest evolutionary pathway to get there has some interesting emergent phenomena.
Human babies "train" their brain on literally gigabytes of multi-modal data dumped on them through all their sensory organs every second.
In a very real sense, our magic superpower is that we "giga-scale" with such low resource consumption, especially considering how large (in terms of parameters) the brain is compared to even the most advanced models we have running on those thousands of GPUs today. But that's where all those millions of years of evolution pay off. Don't diss the wetware!
WTL was never a "step between MFC and .NET" in any meaningful sense. It was more like a very lightweight subset of MFC+ATL, never officially supported or recommended, just something Microsoft used internally that it decided to publish, and the community then picked it up.
"Eventually" here is something on the order of a few expected lifespans of the universe.
The fact that we're getting meaningful results out of LLMs on a human timescale means that they're doing something very different.
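To put a rough number on that "eventually", here is an illustrative estimate: even exhaustively enumerating a modest 2^128 combinatorial space at an assumed billion candidates per second dwarfs the age of the universe.

```python
import math

# All numbers are illustrative assumptions for a back-of-envelope estimate.
candidates = 2 ** 128        # a "small" combinatorial search space
rate = 1e9                   # assumed: one billion candidates checked per second
seconds = candidates / rate
years = seconds / (3600 * 24 * 365)

universe_age_years = 1.38e10  # approximate age of the universe
lifespans = years / universe_age_years
print(f"brute-force time: ~10^{math.log10(lifespans):.0f} universe lifetimes")
```

With these assumptions it comes out to roughly 10^12 lifetimes of the universe, and real solution spaces are far larger than 2^128.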