More

swalsh · 2026-04-22T22:48:32 1776898112

Try running with Open Code. It works quite well.

docheinestages · 2026-04-23T09:25:44 1776936344

I had an equally painful experience with Open Code. I don't think the harness is the issue. It's the need for a large context window and slow inference.

swalsh · 2026-04-21T22:25:52 1776810352

Been using the model for a few hours now. I'm actually reall impressed with it. This is the first time i've found value in an image model for stuff I actually do. I've been using it to build powerpoint slides, and mockups. It's CRAZY good at that.

johnwheeler · 2026-04-22T01:40:03 1776822003

Yeah, it's funny. I would expect to see more enthusiasm versus just basic run-of-the-mill, "oh, there it is". Leave it to the HN crowd. This is incredible. I don't even like OpenAI.

rkozik1989 · 2026-04-22T13:53:40 1776866020

LLMs make for great day 1 demos, but in a few weeks I promise you many people will be able to tell nearly all of the images generated by this are AI. It just takes time and exposure to figure out the new common flaws.

Frankly, I am not sure if they will ever actually be able to solve this problem or if it'll be a continuous game of whackamole, but regardless there's a large crowd of people out there where if they can tell something is AI generated they will not support the company behind it. Being able to tell anything is AI generate cheapens brands.

johnwheeler · 2026-04-22T16:36:11 1776875771

You're thinking is like everyone else's, and it's backwards. The world will learn to accept it as the standard way of doing things and people will appreciate one generation over another and look at manual image creation as a niche activity like blacksmithing vs assembly-line manufacturing and automation. With the latter, you appreciate the intent and the end result. Same thing here, people are just adjusting to it.

pembrook · 2026-04-22T08:06:58 1776845218

HN is engineer heavy so its a bunch of people who spend their days looking at code. If it's not a coding model they'll likely never use it.

To the average HN'er, images and design are superfluous aesthetic decoration for normies.

And for those on HN who do care about aesthetics, they're using Midjourney, which blows any GPT/Gemini model out of the water when it comes to taste even if it doesn't follow your prompt very well.

The examples given on this landing page are stock image-esque trash outside of the improvements in visual text generation.

swalsh · 2026-04-07T20:58:02 1775595482

My understanding is GPT 6 works via synaptic space reasoning... which I find terrifying. I hope if true, OpenAI does some safety testing on that, beyond what they normally do.

tyre · 2026-04-07T21:31:09 1775597469

From the recent New Yorker piece on Sam:

“My vibes don’t match a lot of the traditional A.I.-safety stuff,” Altman said. He insisted that he continued to prioritize these matters, but when pressed for specifics he was vague: “We still will run safety projects, or at least safety-adjacent projects.” When we asked to interview researchers at the company who were working on existential safety—the kinds of issues that could mean, as Altman once put it, “lights-out for all of us”—an OpenAI representative seemed confused. “What do you mean by ‘existential safety’?” he replied. “That’s not, like, a thing.”

actionfromafar · 2026-04-07T22:21:45 1775600505

Amusing! Even if they believe that, they should know the company communicated the opposite earlier.

HDThoreaun · 2026-04-08T02:38:26 1775615906

No chance an openAI spokesperson doesnt know what existential safety is

Barbing · 2026-04-08T03:57:23 1775620643

I did not read the response as...

>Please provide the definition of Existential Safety.

I read:

>Are you mentally stable? Our product would never hurt humanity--how could any language model?

stratos123 · 2026-04-08T10:22:04 1775643724

The absolute gall of this guy to laugh off a question about x-risks. Meanwhile, also Sam Altman, in 2015: "Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity. There are other threats that I think are more certain to happen (for example, an engineered virus with a long incubation period and a high mortality rate) but are unlikely to destroy every human in the universe in the way that SMI could. Also, most of these other big threats are already widely feared." [1]

[1] https://blog.samaltman.com/machine-intelligence-part-1

t0lo · 2026-04-08T13:14:20 1775654060

Why are these people always like this.

coppsilgold · 2026-04-07T21:56:35 1775598995

Likely an improvement on:

> We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.

<https://arxiv.org/abs/2502.05171>

levocardia · 2026-04-07T21:11:07 1775596267

Oh you mean literally the thing in AI2027 that gets everyone killed? Wonderful.

Turn_Trout · 2026-04-08T01:27:13 1775611633

AI 2027 is not a real thing which happened. At best, it is informed speculation.

mgambati · 2026-04-08T02:21:13 1775614873

Funny if you open their website and go to April 2026 you literally see this: 26b revenue (Anthropic beat 30b) + pro human hacking (mythos?).

I don’t think predictions, but they did a great call until now.

Turn_Trout · 2026-04-08T15:31:52 1775662312

I agree that they called many things remarkably well! That doesn't change the fact that AI 2027 is not a thing which happened, so it isn't valid to point out "this killed us in AI 2027." There are many reasons to want to preserve CoT monitorability. Instead of AI 2027, I'd point to https://arxiv.org/html/2507.11473.

notrealyme123 · 2026-04-07T21:09:11 1775596151

That's sounds really interesting. Do you have some hints where to read more?

arm32 · 2026-04-07T21:12:28 1775596348

Oh, of course they will /s

swalsh · 2026-04-06T20:52:02 1775508722

I had no idea one could buy a Blackhawk for $1.5M

player_piano · 2026-04-06T20:56:47 1775509007

It's the fuel cost that gets you...

bombcar · 2026-04-07T04:03:27 1775534607

Someone's not watching HeavyDSparks ;)

https://www.youtube.com/watch?v=m3P3FWkBFU4

joncrane · 2026-04-06T22:50:13 1775515813

There are Chinooks with no bids as of yet....as well as a Bombardier Challenger

swalsh · 2026-04-02T19:38:58 1775158738

I gave the same prompt (a small rust project that's not easy, but not overly sophisticated) to both Gemma-4 26b and Qwen 3.5 27b via OpenCode. Qwen 3.5 ran for a bit over an hour before I killed it, Gemma 4 ran for about 20 minutes before it gave up. Lots of failed tool calls.

I asked codex to write a summary about both code bases.

"Dev 1" Qwen 3.5

"Dev 2" Gemma 4

Dev 1 is the stronger engineer overall. They showed better architectural judgment, stronger completeness, and better maintainability instincts. The weakness is execution rigor: they built more, but didn’t verify enough, so important parts don’t actually hold up cleanly.

Dev 2 looks more like an early-stage prototyper. The strength is speed to a rough first pass, but the implementation is much less complete, less polished, and less dependable. The main weakness is lack of finish and technical rigor.

If I were choosing between them as developers, I’d take Dev 1 without much hesitation.

Looking at the code myself, i'd agree with codex.

coder543 · 2026-04-02T19:45:39 1775159139

There are issues with the chat template right now[0], so tool calling does not work reliably[1].

Every time people try to rush to judge open models on launch day... it never goes well. There are ~always bugs on launch day.

[0]: https://github.com/ggml-org/llama.cpp/pull/21326

[1]: https://github.com/ggml-org/llama.cpp/issues/21316

stavros · 2026-04-02T23:02:10 1775170930

What causes these? Given how simple the LLM interface is (just completion), why don't teams make a simple, standardized template available with their model release so the inference engine can just read it and work properly? Can someone explain the difficulty with that?

Yukonv · 2026-04-02T23:46:49 1775173609

The model does have the format specified but there is no _one_ standard. For this model it’s defined in the [ tokenizer_config.json [0]. As for llama.cpp they seem to be using a more type safe approach to reading the arguments.

[0] https://huggingface.co/google/gemma-4-31B-it/blob/main/token...

stavros · 2026-04-03T00:21:51 1775175711

Hm, but surely there will be converters for such simple formats? I'm confused as to how there can be calling bugs when the model already includes the template.

emidoots · 2026-04-02T21:32:27 1775165547

was just merged

coder543 · 2026-04-02T21:36:14 1775165774

It was just an example of a bug, not that it was the only bug. I’ve personally reported at least one other for Gemma 4 on llama.cpp already.

In a few days, I imagine that Gemma 4 support should be in better shape.

petu · 2026-04-02T20:59:25 1775163565

Qwen 3.5 27B is dense, so (I think) should be compared to Gemma 4 31B.

Or Gemma-4 26B(-A4B) should be compared to Qwen 3.5 35B(-A3B)

redman25 · 2026-04-02T21:38:53 1775165933

Exactly, compare MoE with MoE and dense with dense otherwise it's apples and oranges.

swalsh · 2026-04-03T00:00:41 1775174441

Its coding to coding. I could care less how the model is architected, i only care how it performs in a real world scenario.

petu · 2026-04-03T07:05:58 1775199958

If you don't care about how it's architectured, why you care about size? Compare it to Q3.5 397B-A17B.

Just like smaller size models are speed / cost optimization, so is MoE.

G4 26B-A4B goes 150 t/s on 4090/5090, 80 t/s on M5 Max. Q3.5 35B-A3B is comparably fast. They are flash-lite/nano class models.

G4 31B despite small increase in total parameter count is over 5 times slower. Q3.5 27B is comparably slow. They are approximating flash/mini class models (I believe sizes of proprietary models in this class are closer to Q3.5 122B-A10B or Llama 4 Scout 109B-A17B).

daemonologist · 2026-04-03T01:43:37 1775180617

The implication is that there is (should be) a major speed difference - naively you'd expect the MoE to be 10x faster and cheaper, which can be pretty relevant on real world tasks.

zozbot234 · 2026-04-02T20:41:58 1775162518

The models are not technically comparable: the Qwen is dense, the Gemma is MoE. The ~33B models are the other way around!

swalsh · 2026-04-02T19:12:05 1775157125

Try using Grok 4.1 reasoning. It's crazy cheap, and really it's not that bad.

sdenton4 · 2026-04-03T01:49:32 1775180972

Sure, it might try to subtly steer you towards fascism, but other than that, it's great.

swalsh · 2026-03-18T17:26:13 1773854773

Neurons that fire together, wire together. Your brain optimizes for your environment over time. As we get older, our brains are running in a more optimized way than when we're younger. That's why older hunters are more effective than younger hunters. They're finely tuned for their environment. It's an evolutionary advantage. But it also means that they're not firing in "novel" ways as much as the "kids". "kids" are more creative I think because their brains are still adopting, exploring novelty, neuron connections aren't as deeply tied together yet.

This is also maybe one of the biggest pitfalls as our society get's "older" with more old people, and less "kids". We need kids to force us to do things differently.

swalsh · 2026-03-18T17:13:43 1773854023

Oh i've been looking for a project for my 11 year old... he's a very project oriented learner, which schools don't seem to do anymore.

hermitcrab · 2026-03-18T19:35:27 1773862527

What country are you in?

swalsh · 2026-03-15T17:17:49 1773595069

Speak for yourself, I have never thrown away code at this rate in my entire career. I couldn't keep up this pace without AI codegen.

sarchertech · 2026-03-15T17:47:11 1773596831

Did you read the article? I don’t think that refutes anything the author said even a little bit.

foolserrandboy · 2026-03-15T17:40:18 1773596418

swalsh · 2026-03-04T09:15:19 1772615719

I bet claude was hyping this guy up as he was building it. "Absolutely, a rust compiler written in PHP is a great idea!"

jlg23 · 2026-03-04T11:05:00 1772622300

Every compiler in any language for any language has at the very least educational value.

On the other hand, demeaning comments without any traces of constructive criticism don't have any value.

embedding-shape · 2026-03-04T10:52:16 1772621536

Does it matter who the sycophant was or just that there was a sycophant?

My partner does that as well as LLMs at this point; "Sure honey, I remember you've talked a lot about Rust and about Clojure in the past, and you seem excited about this Clojure-To-Rust transpiler you're building, it sounds like a great idea!", is that bad too?

nz · 2026-03-04T10:48:50 1772621330

There is no comment on whether LLMs/agents have been used. I feel like projects should explicitly say if they were _or_ were not used. There is no license file, and no copyright header either. This feels like "fauxpen-source": imagine getting LEX+YACC to generate a parser, and presenting the generated C code as "open-source".

This is just another way to throw binaries over the wire, but much worse. This has the _worst_ qualities of the GPL _and_ pseudo-free-software-licenses (i.e. the EULAs used by mongo and others). It has all the deceptive qualities of the latter (e.g. we are open but not really -- similar to Sun Microsystems [love this company btw, in spite of its blunders], trying to convince people that NeWS is "free" but that the cost of media [the CD-ROM] is $900), with the viral qualities of the former (e.g. the fruit of the poison tree problem -- if you use this in your code, then not only can you not copyright the code, but you might actually be liable for infringement of copyright and/or patents).

I would appreciate it if the contributor, mrconter11, would treat HN as an internet space filled with intelligent thinking people, and not a bunch of shallow and mindless rubes. (Please (1) explicitly disclose both the use and absence of use of LLMs -- people are more likely to use your software this way, and preserves the integrity of the open source ecosystem, and (2) share you prompts and session).

So passes the glory of open source.

stephenlind · 2026-03-04T10:56:14 1772621774

According to his Readme he seems to have built a 3D engine completely from scratch 8 years ago without using any library:

https://github.com/mrconter1/IntuitiveEngine

> A simple 3D engine made only with 2D drawLine functions.

nz · 2026-03-04T11:05:52 1772622352

That is (slightly) reassuring (but the rest of his portfolio does not inspire confidence). Nevertheless, we should be required to disclose whether the code has been (legally) tainted or not. This will help people make informed decisions, and will also help people replace the code if legal consequences appear on the horizon, or if they are ready to move from prototype to production.

stephenlind · 2026-03-04T14:22:49 1772634169

Slightly?