This was their play all along with their unethical data collection practices: let others use the APIs to discover the applications, then use the data against them to offer integrated solutions in every vertical of interest. Cursor, once Anthropic’s biggest customer, was one of the first they screwed.
They are also fighting for their lives because these insane valuations simply aren’t justified by being dumb pipes. Fortunately, open weights models are widely available and have crossed a threshold of usefulness that cements their place as good substitutes.
When you read technical papers on various models, you’ll find that they often did most of the pretraining, and even the supervised fine-tuning, on relatively short-context data; then they “extended” the context window by training on a small amount of long-context data. I think this is what is meant by not being trained uniformly.
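One common way labs do this extension is by rescaling the rotary position embeddings (RoPE) so that positions beyond the trained range are interpolated back into it. A minimal sketch of that idea, assuming a hypothetical model trained to 4k context and extended to 16k (the dimensions and scale factor here are illustrative, not from any specific model):

```python
import numpy as np

def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    # Rotation angles for one token position in RoPE.
    # scale > 1 compresses positions back into the trained range
    # (the "position interpolation" trick for context extension).
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return (position / scale) * inv_freq

# Position 16384 with scale 4 produces the same angles the model
# already saw at position 4096 during short-context training.
angles_trained = rope_angles(4096)
angles_extended = rope_angles(16384, scale=4.0)
```

The point is that the extended model never has to represent genuinely novel rotation angles; a little long-context training then teaches it to actually use the stretched window.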
However, now that RL environments and long-horizon agentic performance have taken such a prominent role in model development, I wonder if that practice still holds. I know that the most recent Gemma and Qwen models are incomparably more reliable at long contexts than their predecessors, even though Qwen, for example, already had a 256k context window. It just didn’t work like it does now.
One can’t say that proposition is obvious to the population at large. Else, “we” (as in Earth in 2026) would have very different political dynamics. So maybe Banksy felt inclined to do a public service announcement.
The model outputs a probability distribution for the next token, given the sequence of all previous tokens in the context window. It’s just a list of floats in the same order as the list of tokens that the tokenizer uses.
After that, a piece of software that is NOT the LLM chooses the next token. This is called the sampler. There are different sampling parameters and strategies available, but if you want repeatable* outputs, just take the token with the highest probability (greedy decoding).
* Perfect determinism in this sense is difficult to achieve, because GPU floating-point reductions don’t run in a guaranteed order, so results can vary slightly between runs. But you can get very close.
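The two stages above can be sketched in a few lines. This is a toy example with a hypothetical five-token vocabulary and made-up logits, not any real model’s output:

```python
import numpy as np

def softmax(logits):
    # Convert raw model scores (logits) into a probability distribution.
    # Subtracting the max is a standard numerical-stability trick.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits the model emitted for a 5-token vocabulary;
# index i corresponds to token id i in the tokenizer's vocabulary.
logits = np.array([1.0, 3.5, 0.2, 3.4, -1.0])
probs = softmax(logits)

# Greedy decoding: the sampler simply picks the highest-probability id.
next_token_id = int(np.argmax(probs))
```

Other strategies (temperature, top-k, top-p) reshape or truncate `probs` before drawing a random sample from it, which is where the non-repeatability comes in.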
Believe it or not, in statistics and machine learning the hard-coded parts of a model that impact the results are considered part of the model. But I understand that nowadays we don't care about these things because AI goes brrr.
Hm, I don't think this looks like Anthropic's design style. Anthropic is doing a kind of Chobanicore + Corporate Memphis design system that I personally find a bit creepy. But the website here just feels fresh and pleasant.
Agreed; that's a beautiful site. The main design style apart from minimalism that I notice is glassmorphism. Well, that and a very well chosen Monet to set the tone.
Neither is “more important”; the comparison itself is illogical. I think recent strides in high-performance small LLMs have shown that the tasks LLMs are useful for may not require the level of representational capacity that trillion-parameter models offer.
However: the labs releasing these high-intelligence-density models are getting them by first training much larger models and then distilling down. So the most interesting question to me is: how can we accelerate learning in small networks so we don’t have to train huge teacher networks first?
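For context, the distillation step usually means training the student to match the teacher’s softened output distribution rather than just hard labels. A minimal sketch of that loss (Hinton-style soft targets); the logits and temperature here are made up for illustration:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature T > 1 softens the distribution, exposing the
    # teacher's "dark knowledge" about near-miss classes.
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    # KL(p || q): the per-example distillation loss term,
    # penalizing the student q for diverging from the teacher p.
    return float(np.sum(p * (np.log(p) - np.log(q))))

T = 2.0
teacher_logits = np.array([4.0, 1.0, 0.5])  # hypothetical big model
student_logits = np.array([3.0, 1.5, 0.2])  # hypothetical small model

loss = kl(softmax(teacher_logits, T), softmax(student_logits, T))
```

The open question in the comment above is whether a small network can reach the same distribution directly, without ever materializing the expensive teacher whose logits this loss depends on.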
This is just blind belief. The model discussed in this topic already outperforms “well made” frontier LLMs of 12-18 months ago. If what you wrote is true, that wouldn’t have been possible.
Absolutely. Plus, as these companies grow hungrier for revenue and more desperate to escape the commodity market they're in, they are only going to get more aggressive in their (ab)use of customer data.
I would recommend trying oMLX, which is much more performant and efficient than LM Studio. It has block-level KV context caching that makes long chats and agentic/tool calling scenarios MUCH faster.