The players barely ever change. People don't have problems following sports; you shouldn't struggle so much with this once you accept that the top spot changes.
I didn't express this well, but my interest isn't "who is in the top spot" but rather _why_ and _how_ various labs get the results they do. This is also magnified by the fact that I'm interested not only in hosted providers of inference but in local models as well. What's your take on the best model to run for coding on 24GB of VRAM locally after the last few weeks of releases? Which harness do you prefer? What quants do you think are best? To use your sports metaphor, it's not just following the national leagues but also following college and even high school leagues. And the real interest isn't even who's doing well but WHY, at each level.
It is funny seeing people ping pong between Anthropic and ChatGPT, with similar rhetoric in both directions.
At this point I would just pick the one whose "ethics" and user experience you prefer. The difference in performance between these releases has had no impact on the meaningful work one can do with them, unless perhaps they are on the fringes in some domain.
Personally I am trying out the open models cloud hosted, since I am not interested in being rug pulled by the big two providers. They have come a long way, and for all the work I actually trust to an LLM they seem to be sufficient.
Their financial projections, on which their valuation and investor story are largely built, involve actually making money, and lots of it, at some point. That money has to come from somewhere.
It's more optimal than planned economies until we have AI planned economies with realtime feedback, I guess.
Consumers get cheap goods during oversupply, and the most inefficient companies get eliminated during the bust, while consolidation leads to economies of scale.
Why is the opposite of capitalist markets automatically assumed to be a command economy? Co-op style businesses aren't really capitalist orientated but are also not reliant on government action.
Jules is similar to Twill with the following differences:
- Twill is CLI-agnostic, meaning you can use Claude Code, Codex or Gemini. Jules only works with Gemini.
- We focus on the delegation experience: Twill has native integrations with your typical stack like Slack or Linear. The PRs come back with proofs of work, such as screenshots or videos.
Larger models better understand and reproduce what's in their training set.
For example, I used to get verbatim quotes and answers from copyrighted works when I used GPT-3.5. That's what clued me in to the copyright problem. The smallest models, by contrast, often produced nonsense about the same topics, because small models often produce nonsense.
You might need to do a new test each time to avoid your old ones being scraped into the training sets. Maybe a new one for each model produced after your last one. Totally unrelated to the last one, too.
Because generating a nice-looking SVG requires handling code, shapes, long context, and reasoning, and at 2B you will most likely break the syntax of the file 9 times out of 10 if you train for that. Or you will need to go for simpler pelicans. It might not be worth fine-tuning a 2B model, but on their top-tier open model it is definitely worth it. Even without targeting it directly, just crawling GitHub would make it train on your pelicans.
They are not doing random rotation; simplification here means they are aligning the outliers. If you threw a bunch of shapes on the ground, they are picking up one that rolled away and putting it with the others.
>How can a boolean value preserve all of the relational and positional information between data points?
They aren't reducing the entire vector to a boolean, only each of its dimensions.
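A toy sketch of what per-dimension binarization looks like, in case it helps. This assumes a plain sign threshold (positive becomes 1, everything else 0), which may not be exactly what the article does, and the vectors and helper names are made up for illustration. The point is that an 8-dimensional vector becomes 8 bits, not 1, so similar vectors still end up with similar bit patterns:

```python
def binarize(vec):
    """Keep one bit per dimension: 1 if the value is positive, else 0.
    (A sign threshold is an assumption, not necessarily the article's method.)"""
    return [1 if x > 0 else 0 for x in vec]


def hamming(a, b):
    """Count the dimensions where two bit vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


# Made-up example vectors: `near` points roughly the same way as `query`,
# `far` points roughly the opposite way.
query = [0.9, -0.2, 0.4, -0.7, 0.1, -0.3, 0.8, -0.5]
near = [0.7, -0.1, 0.5, -0.6, -0.2, -0.4, 0.9, -0.3]
far = [-0.8, 0.3, -0.5, 0.6, 0.2, 0.4, -0.7, 0.1]

q_bits = binarize(query)
near_dist = hamming(q_bits, binarize(near))
far_dist = hamming(q_bits, binarize(far))

print(near_dist, far_dist)
```

The magnitudes are thrown away, but the pattern of signs across dimensions still separates the close vector from the distant one.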
I have a junior position open and got 1,300 applicants in 1 week before we took it down. Many of the candidates with strong resumes are just lying and doing so well enough to pass HR screens.
I doubt any sort of AI screen would help though as many of the lying candidates are already using AI assist tools making it just a cat and mouse race...
I don't know a good solution to give everyone a fair chance.
You can't give everyone a fair chance, but at least don't waste their time with a stupid AI interview.
Also, at the end of the day, among your 1,300 applicants maybe you have 200 who are a perfect fit and equally good. But you have just one position. So even with a perfect system that gives you complete information, you'll still have to reject 199 strong candidates.
It's not just for politics but fairness. You can't just up and decide one day to make something illegal that others depend on for their livelihood. It's good enough that it limits growth of the banned thing.
Sure you can. It just takes backbone, which is rarely found in the political class.
If I, as a voter, voted for a politician who promised to ban dumping mercury in the local river, I don't expect them to say "Oh, but any company already dumping mercury in the river can keep doing so, because we don't want to hurt people's livelihood." That's not what I voted for.
Ok, but if you are investing capital in some sort of production line or industrialization you are not going to want to do that in an area where you might just lose your entire investment instantly; instead, you're just going to invest it in Texas or China. Of course with more extreme examples like yours you do have to put some cost on the existing companies to get it fixed, but it would be something with a smaller cost like having to dispose of the mercury properly (whereas in this article's examples they just flat out ban these things, which you can't do to existing factories).
For sure there would be a disincentive to "invest" in the area where you might lose the investment. That would be intentional. As a voter, I specifically don't want companies to be making those kinds of "investments" in my region. Go "invest" your dirty industry in China. If California's reputation for harshly regulating these things prevents these kinds of businesses from opening here in the first place, I consider that Working As Intended. We could make that reputation even stronger by not grandfathering things.