Hacker News | wordpad's comments

Planetary Annihilation did it and wrote and gave talks about it.

The players barely ever change. People don't have problems following sports; you shouldn't struggle so much with this once you accept that the top spot changes.

I didn't express this well, but my interest isn't "who is in the top spot"; it's more the *why* and *how* of various labs getting the results they do. This is also magnified by the fact that I'm not only interested in hosted providers of inference but local models as well. What's your take on the best model to run for coding on 24GB of VRAM locally after the last few weeks of releases? Which harness do you prefer? What quants do you think are best? To use your sports metaphor, it's more than following the national leagues; it's also following college and even high school leagues. And the real interest isn't even who's doing well but WHY, at each level.

The technical report discussing the why and how is here: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

Follow the AI newsletters. They bundle the news along with their Op-Ed and summarize it better.

Tips on what newsletters are worth signing up for?

Can you suggest some good ones?

I really like latent.space and simonwillison.net.

Also (shameless self-promo) I publish a 2x weekly blog just to force myself to keep up: https://aimlbling-about.ninerealmlabs.com/treadmill/



Thanks for this!

Link to direct newsletter subscription: https://importai.substack.com/


It is funny seeing people ping pong between Anthropic and ChatGPT, with similar rhetoric in both directions.

At this point I would just pick the one whose "ethics" and user experience you prefer. The difference in performance between these releases has had no impact on the meaningful work one can do with them, unless perhaps you are on the fringes of some domain.

Personally I am trying out the open models cloud hosted, since I am not interested in being rug pulled by the big two providers. They have come a long way, and for all the work I actually trust to an LLM they seem to be sufficient.


The financial projections that much of their valuation and investor story is built on involve actually making money, and lots of it, at some point. That money has to come from somewhere.

I find ChatGPT annoying mostly

Open settings > personalization. Set it to efficient base style. Turn off enthusiasm and warmth. You’re welcome

Yeah, but even then it's still annoying. It's not about the enthusiasm and warmth but the general tone.

Setting “base style and tone” to “efficient” works fine for me.

That's way more than 10; more like 50.


>Capitalists claim that this is optimal.

It's more optimal than planned economies until we have AI planned economies with realtime feedback, I guess.

Consumers get cheap goods during oversupply, the most inefficient companies get eliminated during the bust, and consolidation leads to economies of scale.


No this is literally a sign of an unstable system with too high of a gain K.

There is an alternative where legislation dampens this behavior but the short term profits will be lower. Hence the hawks don’t like it.
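The gain analogy can be sketched as a discrete proportional feedback loop (a toy illustration of the instability claim, not a model of any real market): each step corrects the current error by a factor K, so the error evolves as (1 - K)^n. For 0 < K < 2 the corrections converge; for K > 2 each over-correction overshoots harder than the last, producing the oscillating boom/bust divergence.

```python
def simulate(K, steps=20, error=1.0):
    """Discrete proportional feedback: each step applies a correction
    of K * error. The error after n steps is (1 - K)**n, so it shrinks
    for 0 < K < 2 and grows without bound (oscillating) for K > 2."""
    history = [error]
    for _ in range(steps):
        error = error - K * error  # correction proportional to the error
        history.append(error)
    return history

stable = simulate(K=0.5)    # error halves every step
unstable = simulate(K=2.5)  # error multiplies by -1.5 every step
```

Legislation acting as a damper corresponds to lowering K: responses get slower and short-term corrections smaller, but the oscillation dies out instead of amplifying.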


>legislation dampens this behavior

Potentially. Even well-meaning and well-thought-out legislation still distorts the markets, possibly making things objectively worse.


This is a wild take.


Sophomoric take more precisely.


Why is the opposite of capitalist markets automatically assumed to be a command economy? Co-op style businesses aren't really capitalist-oriented, but they're also not reliant on government action.


How does this compare to Jules from Google?


Jules is similar to Twill with the following differences:

- Twill is CLI-agnostic, meaning you can use Claude Code, Codex or Gemini. Jules only works with Gemini.

- We focus on the delegation experience: Twill has native integrations with your typical stack, like Slack or Linear. The PRs come back with proofs of work, such as screenshots or videos.


That's very interesting, thank you!


Do you think it's just part of their training set now?


It's time to do "frog on a skateboard" now.



Seems very likely, even if Google has behaved ethically.

Simon and YC/HN have published/boosted these gradual improvements and evaluations for quite some time now.

There is a https://simonwillison.net/robots.txt but it allows pretty much everything, AI-wise.
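For contrast, a robots.txt that did block AI crawlers would look something like the fragment below. This is a hypothetical example, not the contents of Simon's actual file; the user-agent strings are the ones these crawler operators publicly document.

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```

Note that robots.txt is purely advisory: it only keeps out crawlers that choose to honor it.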


If it's part of their training set why do the 2B and 4B models produce such terrible SVGs?


We were promised full SVG zoos, Simon. I want to see SVG pangolins, please.


Larger models better understand and reproduce what's in their training set.

For example, I used to get verbatim quotes and answers from copyrighted works when I used GPT-3.5. That's what clued me in to the copyright problem. The smallest models, meanwhile, often produced nonsense about the same topics, because small models often produce nonsense.

You might need to do a new test each time to avoid your old ones being scraped into the training sets. Maybe a new one for each model produced after your last one. Totally unrelated to the last one, too.


Because it is in their training set, but it's unrealistic to expect a 2B or 4B model to perfectly reproduce everything it's seen before.

The training no doubt contributed to their ability to (very) loosely approximate an SVG of a pelican on a bicycle, though.

Frankly, I'm impressed.


Because generating a nice-looking SVG requires handling code, shapes, long context, and reasoning, and at 2B you will most likely break the syntax of the file 9 times out of 10 even if you train for it, or you'll need to settle for simpler pelicans. It might not be worth fine-tuning a 2B for that, but on their top-tier open model it is definitely worth it. Even if not deliberately, just crawling GitHub would make it train on your pelicans.


They are not doing random rotation; simplification here means they are aligning the outliers. If you threw a bunch of shapes on the ground, they are picking up the one that rolled away and putting it with the others.

>How can a boolean value preserve all of the relational and positional information between data points?

They aren't reducing the entire vector to a boolean, only each of its dimensions.
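A minimal sketch of what that means: keep one sign bit per dimension, so a 1024-dimensional vector becomes 1024 bits, not one boolean. Since every dimension keeps its own sign, nearby vectors still agree on most bits while unrelated ones agree on roughly half. (Toy illustration with random vectors; not anyone's actual quantization code.)

```python
import random

def binarize(vec):
    # One bit per dimension: 1 if the component is positive, else 0.
    # The vector is NOT collapsed to a single boolean.
    return [1 if x > 0 else 0 for x in vec]

def bit_agreement(a, b):
    # Fraction of dimensions whose sign bits match.
    return sum(x == y for x, y in zip(a, b)) / len(a)

random.seed(0)
dim = 1024
v = [random.gauss(0, 1) for _ in range(dim)]
near = [x + random.gauss(0, 0.1) for x in v]   # small perturbation of v
far = [random.gauss(0, 1) for _ in range(dim)]  # unrelated vector

sim_near = bit_agreement(binarize(v), binarize(near))  # close to 1.0
sim_far = bit_agreement(binarize(v), binarize(far))    # close to 0.5
```

So relational information survives in aggregate across dimensions, even though each individual dimension is reduced to a single bit.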


> AI capability problem is mostly solved; the distribution and trust problem isn't.

SaaS opportunity? Maybe, some sort of marketplace of AI-written applications and services with discovery features?


I have a junior position open and got 1,300 applicants in 1 week before we took it down. Many of the candidates with strong resumes are just lying and doing so well enough to pass HR screens.

I doubt any sort of AI screen would help, though, as many of the lying candidates are already using AI assist tools, making it just a cat-and-mouse game...

I don't know a good solution to give everyone a fair chance.


You can't give everyone a fair chance, but at least don't waste their time with a stupid AI interview.

Also, at the end of the day, among your 1,300 applicants maybe you have 200 who are a perfect fit and equally good. But you have just one position. So even with a perfect system that gives you complete information, you'll still have to reject 199 strong candidates.


It's not just about politics but fairness. You can't just up and decide one day to make illegal something that others depend on for their livelihood. It's good enough that it limits the growth of the banned thing.


Sure you can. It just takes backbone, which is rarely found in the political class.

If I, as a voter, voted for a politician who promised to ban dumping mercury in the local river, I don't expect them to say "Oh, but any company already dumping mercury in the river can keep doing so, because we don't want to hurt people's livelihood." That's not what I voted for.


Ok, but if you are investing capital in some sort of production line or industrialization, you are not going to want to do that in an area where you might lose your entire investment instantly; you'll invest it in Texas or China instead. Of course, with more extreme examples like yours, you do have to put some cost on the existing companies to get things fixed, but it would be something with a smaller cost, like having to dispose of the mercury properly (whereas in this article's examples they flat-out ban these things, which you can't do to existing factories).


For sure there would be a disincentive to "invest" in the area where you might lose the investment. That would be intentional. As a voter, I specifically don't want companies to be making those kinds of "investments" in my region. Go "invest" your dirty industry in China. If California's reputation for harshly regulating these things prevents these kinds of businesses from opening here in the first place, I consider that Working As Intended. We could make that reputation even stronger by not grandfathering things.

