Hacker News | xml's comments

I'd like to add a few failure modes:

- LLM removes/disables/weakens tests (disallowing manipulation of tests is not really possible in Python since the language is too dynamic, so the entire execution has to be sandboxed, which makes timing more difficult)

- LLM mutates input, which might throw off some tests (for example, sorting an array where all values have been set to zero is easy), but this can easily be solved by copying the input to somewhere safe or regenerating it from a fixed random seed.

- LLM writes code that only passes the test cases and nothing else, often with a new special case inserted after every failed test. Randomizing everything seems to be a good defense, although it is not always easy to know beforehand what to randomize. Tensor shapes are obvious, but randomizing data distribution to prevent circumvention via precision downgrades is difficult.
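A sketch of what that randomized defense can look like for the sorting example (pure stdlib; the harness and its names are mine, not from any particular benchmark):

```python
import random

def check_sort(sort_fn, trials=100, seed=0):
    # Fixed seed: failures stay reproducible, but the inputs are
    # regenerated each run, so they cannot be hard-coded around.
    rng = random.Random(seed)
    for _ in range(trials):
        arr = [rng.gauss(0.0, 1.0) for _ in range(rng.randint(1, 1000))]
        safe = list(arr)  # safe copy, out of reach of the tested code
        out = sort_fn(arr)
        # Comparing against the untouched copy also catches the
        # "mutate the input so it is trivially sorted" trick.
        assert list(out) == sorted(safe), "sort_fn failed on a random input"
```

`check_sort(sorted)` passes; a cheater that zeroes its input before "sorting" fails, because the reference answer is computed from the safe copy.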

And regarding "10. Baseline Kernel" from the article; I've had LLMs call __import__ or compile and obfuscate the code in order to circumvent tests. The proposed defense of static analysis is not quite sufficient here.

I can relate to all points mentioned in the article. They really do happen in practice, and many are also applicable to test-driven development with LLMs. Is there any benchmark to evaluate whether an agent solves a task "in the spirit of the prompt" instead of simply solving it to pass tests?


Even with inflated RAM prices, you can buy a Strix Halo Mini PC with 128GB unified memory right now for less than 2k. It will run gpt-oss-120b (59 GB) at an acceptable 45+ tokens per second: https://github.com/lhl/strix-halo-testing?tab=readme-ov-file...

I also believe that it should eventually be possible to train a model with somewhat persistent mixture of experts, so you only have to load different experts every few tokens. This will enable streaming experts from NVMe SSDs, so you can run state of the art models at interactive speeds with very little VRAM as long as they fit on your disk.


I agree the parent is a bit too pessimistic, especially because we care about logical skills and context size more than remembering random factoids.

But on a tangent, why do you believe in mixture of experts?

Everything I know about them makes me believe they're a dead end architecturally.


> But on a tangent, why do you believe in mixture of experts?

The fact that all big SoTA models use MoE is certainly a strong reason. They are more difficult to train, but the efficiency gains seem to be worth it.

> Everything I know about them makes me believe they're a dead end architecturally.

Something better will come around eventually, but I do not think that we need much change in architecture to achieve consumer-grade AI. Someone just has to come up with the right loss function for training, then one of the major research labs has to train a large model with it and we are set.

I just checked Google Scholar for a paper with a title like "Temporally Persistent Mixture of Experts" and could not find it yet, but the idea seems straightforward, so it will probably show up soon.


> But on a tangent, why do you believe in mixture of experts

In a hardware inference approach you can do tens of thousands of tokens per second and run your agents in a breadth-first style. It is all very simple conceptually, and not more than a few years away.


Looks like the pop-ups that cover 70% of my netbook's screen are still there: https://i.imgur.com/uPbVW2o.png

I wonder if getting rid of those pop-ups would have slowed Stack Overflow's decline. The number of new posts has dropped by 98% (from 198,540 posts in 2020 down to 3,097 this January: https://data.stackexchange.com/stackoverflow/query/1926661#g...)


If most people are not using a tool properly, it is not their fault; it is the tool's fault.

Git is better than what came before, and it might be the best at what it does, but that does not mean that it is good.

- The interface is unintuitive.

- Jargon is everywhere.

- Feature discoverability is bad.

- Once something goes wrong, it is often difficult to recover: if you're not familiar enough with Git to avoid getting yourself into that situation, then you certainly aren't familiar enough to get yourself out of it.

Many of those issues are due to git being a command line interface, but others (like no general undo and funny names) are simply due to bad design.

I think it is about time that we try again and build a better version control tool, but maybe git is just too entrenched.


> If most people are not using a tool properly, it is not their fault; it is the tool's fault.

I would say that is a reasonable criticism of git ... but I've seen the same thing in svn, perforce, cvs, and rcs. Different variations of the same issue of people not caring about the version history.

Since it has been a problem since the dawn of version control, it is either something inherent to version control tools themselves, a fault carried along since the earliest check-ins, or it is something that people simply don't care about.

I feel this is more akin to a lack of comments in code and poor style choices and blaming the text editor for not making it easier to comment code.


> problem since the dawn of version control ... a tool's fault ... or it is something that people aren't caring about.

At the start of my career I ended up in a UI position. Old school usability on the back side of a 2 way mirror.

The tool has lots of shortcomings: images, documents that aren't text, working with parts of repositories... These aren't issues faced by the kernel (where emailing patches is the order of the day). And these shortcomings have led to other tools emerging and being popular, like Artifactory, journaling file systems, and various DAMs.

Technology on the whole keeps stacking turtles rather than going back to first principles and fixing core issues. Auth (DAP, LDAP, and every modern auth solution). Security (so many layers, tied back to auth). Containers and virtualization (as a means of installing software...). Versioning is just one among this number. We keep stacking turtles in the hope that another layer of abstraction will solve the problem, but we're just hiding it.

One of the few places where we (as an industry) have gone back and "ripped off the bandaid" is Systemd... It's a vast improvement but I would not call it user friendly.

Usability remains a red-headed stepchild; it's the last bastion where "won't fix: works for me" is an acceptable answer.


> If most people are not using a tool properly, it is not their fault; it is the tool's fault.

This is a standard that we don't apply to most other tools outside of IT. I do think git could be more usable, but most powerful tools have sharp edges and require training.

A bandsaw is a fantastic tool, but if you try to use one without reading about it first, you'll end up losing a finger. I'm not sure I'd blame the bandsaw in that instance...


Contemporary bandsaws used by people who take workplace safety seriously have emergency brakes for just that reason (countless trained operators also lost fingers). Improving tools is something we've been doing since our ancestors first held a branch. If we satisfied ourselves with good enough we'd live much different lives.


Then again, the number of shop teachers missing a finger would give anybody pause. Blame is secondary to the fact that you just lost your fucking finger. Thankfully, git's sharp edges won't permanently physically maim you, though committing API keys to GitHub can still hurt you, just in your wallet. At least you still have your finger.


Contrast with:

https://old.reddit.com/r/todayilearned/comments/158lp0m/comm...

>My high school shop teacher, before he let any of us near the machines or power tools, told us horror stories about students who lost fingers and eyes by being careless with them. For the entirety of that semester, nobody got so much as a chipped fingernail.

which is a better match for my experience --- the best advice I ever got was from my high school shop teacher:

>Before turning on the power switch, count to ten under your breath on all your fingers while visualizing all the forces involved and all the ways the operation could go wrong, then remind yourself that you want to be able to repeat that count after turning the power off.

I don't think SawStop would have a business model if all tablesaw injuries were tried by a jury of such shop teachers. (I once heard him scream at a kid who had removed a guard, through hearing protection and from around a corner on the other side of the shop, while I was making a heavy interrupted roughing cut on a lathe with a chisel I really should have paused to sharpen. The student was banned from ever entering the shop again.)


To put it into your metaphor: I am not advocating against the existence of bandsaws. I would just rather have bandsaws that do not cut off your fingers the moment you haven't read a book about them first, do not make it difficult to sew the fingers back on, and do not require arcane incantations to do their work.

There are of course power tools with obnoxious protections that make them difficult to use, but since we are dealing with software here, we are not bound by the laws of physics. I believe that we can create a better tool that is both powerful and easy to use.


We did! Mercurial!


> If most people are not using a tool properly, it is not their fault; it is the tool's fault.

Replace tool with one of piano|guitar|etc and see your logic fall apart. Software tools like any other have a manual and require effort and time to learn.


Modern instruments are actually improved designs of older instruments, which were just that: badly designed and hard to use.


Modern instruments are still difficult to use unless you spend time learning how to. Just like git.


No, they're easy to use; they're hard to master. With Git, it's hard to figure out how to even upload your work.


I see the point and I wouldn't want to belabor the metaphor, but I really feel like guitar is actually extremely difficult to even get started with. Between the awkward stretching of the fingers, how difficult/painful it is to hold down strings hard enough (and close enough to the fret) to get a clean, clear note, and how hard it is to hold those strings down in such a way that your fingers don't brush against other strings, I'd say guitar is crazy hard to start. I'm saying this as someone who has been playing and enjoying guitar for decades. Beginners have a rough time of it for awhile.


Git is much easier to master than the piano. I played piano for years and can only just play two-handed melodies if they aren't well-aligned.

I've read a few blog posts and half a book on git, and I don't remember the last time I had issues with it.

I also don't recall a junior ever having trouble uploading files with git. Unless they're in an interactive rebase, which wouldn't happen your first time trying out git.


There’s inherent beauty in mastering the piano. It’s worth it to spend time practicing.

Git is just a means to an end. Heck, it’s usually a means to a means to an end: it is only a tool for version control of code, and the code itself is just a means to education or running the actual business.


It can be that. In the same vein, playing a guitar can just be the means to an end. Some people play music to get paid. Do you think playing your 60th wedding gig feels beautiful or meaningful?

I personally think git is a marvel of engineering. Hackers are people who are capable of seeing beauty in systems. We're at least nominally on "hacker news", even though a better name might be "VC news".


Yes, but it's not as easy to USE as a piano is! My 4 year old niece can play a piano, but her comprehension of Git is extremely poor.


Depends on the instrument. Anyone without experience or instruction is gonna make a fool of themselves picking up a wind instrument. You need to train the muscles in your face and mouth to form the correct embouchure needed to produce a clear note.


The hardest thing is to get people to think of their codebase as an evolution, as opposed to a single state with a sophisticated backup system. Writing proper commit messages and diffs follows from that. The actual Git commands are then incidental.


Git is better in some ways, but it is insanely complicated. That matters less now with AI tooling, but there was a time when we all had many choices (commercial and open source) of source control tools. My canonical example: git checkout -b <branch_name> to create a new branch, git branch -D <branch_name> to delete the local branch, and git push origin :<branch_name> to delete the remote one.

I know that that is the old syntax but holy hell, that's insane. Why couldn't it always have been git branch --create|delete|delete-remote? It could have but Linus doesn't care about your feelings or small brain. :)
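For what it's worth, the friendlier spellings do exist now; a quick demo in a throwaway repo (branch name made up):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m init

git switch -c my-feature   # modern replacement for: git checkout -b my-feature
git switch -q -            # back to the previous branch
git branch -d my-feature   # delete the local branch (-D to force)
# For the remote side, the cryptic "git push origin :my-feature" is now:
#   git push origin --delete my-feature
```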


Right on. Git is good at what it does, but its CLI is too low-level. It feels more like an assembly language than an end-user language, and a haphazard one at that.

There are wrappers that make it much more approachable. IntelliJ’s Git frontend, for example, is pretty nice.


Git is CLI software. If you find yourself repeating a set of commands, you should abstract them with an alias or a script. Then you will have your own nice interface.


My git experience got a lot better after I built scripts and customized the git config files to fully exploit:

  git-log --graph --reflog

  git-commit --amend

  git-cherry-pick
Also, becoming fluent with creation of and switching between local, short-lived branches.

With the above in order, I found I could subset the git state model:

* temporary branches rather than the "stash"

* commit tentative work to HEAD; amend, discard, or set aside in a temporary branch as later discoveries require

* side-step the index/cache/staging_area for most operations -- transfer directly between work tree and HEAD commit
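The "temporary branch instead of stash" idea in shell form (throwaway repo, names made up):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m init

echo wip > notes.txt && git add notes.txt
git commit -q -m "WIP: set aside"   # commit instead of stashing
git branch wip-notes                # temporary branch keeps the work reachable
git reset -q --hard HEAD~1          # clean slate; resume later via: git switch wip-notes
```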


That's a cop out. Not every CLI sucks so bad it needs a wrapper.


No it's not -- powerful CLIs with lots of features are made to be wrapped


Even as a CLI it’s not great. git checkout is extremely overloaded.


Not really. The basic entity of git is the commit, and git checkout is meant to restore the working tree to the state at which a commit was created. It may act on the whole tree or on a specific part, and if no commit is specified, it uses the index as the source. With branches being just pointers to commits, it's quite easy to see where the range of options comes from.

Git has its model for version control, and it's something that most tutorials don't explain. The CLI gives you maximum control over this model. For daily operations, it's quite easy to wrap it in a much more amenable interface.
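A quick illustration of that range of options in a throwaway repo (branch and file names are made up):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
echo v1 > file.txt && git add file.txt && git commit -q -m "add file"

echo scratch > file.txt
git checkout -- file.txt       # no commit given: the index is the source
git checkout -q -b feature     # branches are pointers, so checkout can create/switch them
echo v2 > file.txt && git commit -q -am "edit"
git checkout -q -              # back to the previous branch
git checkout -q feature -- file.txt   # restore one path from another commit/branch
```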


IntelliJ's Git frontend makes the Git command line look like a relic for people who would rather type "outlook forward email" than use a GUI. I was disappointed they gave up their attempt to make it a standalone app.


Git has poor design because it forces users to learn its model of things rather than meeting users where they are. There’s way too many leaky abstractions that pop out as soon as you stray from the happy path, and you might not even know you’re straying when you do it.

Instead, the complexity of your mental model should scale with the complexity of the thing you’re trying to do. Writing a “hello world” in Java does not require a mental model of all of the powerful things Java can do.

We want CS 101 students to use version control, but for many of them it will also be their first time using a CLI and a programming language, on top of the underlying CS concepts.


> Git has poor design because it forces users to learn its model of things rather than meeting users where they are.

If the users are unwilling to put even a minimum of effort and thought into it, that is pretty much impossible. I see that as the main problem.


For UI/UX arguments on professional grade software, I present the Bloomberg terminal. A fever dream in shades of orange, black and blue.

For professional work, people can and do learn complex interfaces and jargon, if it is advantageous.


Very true, though it has improved a lot over the years. Most people haven't noticed because when git introduced newer, simpler commands it didn't deprecate the old ones. You can now use it like this, but most people don't know it:

  git switch some-branch
  # edit files
  git restore file2    # undo changes to file2
  git stage file1
  git commit

Instead of the old workflow using checkout with a bunch of different flags.

I agree though that git is needlessly obtuse. I advocated for mercurial instead of git for years because mercurial was so much more user friendly, but git won. I hear good things about jj now


I haven’t used jj but isn’t that exactly what it’s meant to be?

I believe git is architecturally sound and well designed, but the command line syntax can be overly confusing and opaque.


Git is badly designed, but your rule is also bad.

If somebody can get a lot done with a tool, then it's a good tool. And a lot of tools can't both enable people to get things done and avoid being misused. They have to pick one.


> If somebody can get a lot done with a tool, then it's a good tool.

Does "getting it done with pliers" make them a good wrench?


If somebody has a technique for driving screws with pliers that is much faster and more reliable than using a wrench, you not being able to replicate it doesn't invalidate that person's usage.

Now, until such a person exists, ridiculous counterexamples are still ridiculous.


    > this code is not copyright protected, therefore you are not allowed to apply a MIT LICENSE to this project.
Why not? You can (and probably should) still disclaim warranty, and whether the code is copyright protected may vary by jurisdiction.

(Not sure if claiming copyright without having it has any legal consequences though.)


    > Specifically, we collected new data created after January 2025, including: [...] new fiction on Archive of Our Own (Various, 2025),
Not sure how to feel about this. From a researcher's point of view, reproducibility is important, but the last time someone publicly collected data from AO3, the community was not very fond of that.

https://huggingface.co/datasets/nyuuzyou/archiveofourown/dis...


Yeah, that HF dataset page is rough. 247+ threads, mostly DMCA reports, archive-locked fics scraped without consent, dataset reuploaded after takedown. The AO3 community had every reason to be furious.

Not RWKV-specific though. Most large corpora have the same sources in them, they just don't list them explicitly. Whether the transparency makes it better or worse is a real question.


You can still be excited! Recently, GLM-OCR was released, which is a relatively small OCR model (2.5 GB unquantized) that can run on CPU with good quality. I've been using it to digitize various hand-written notes and all my shopping receipts this week.

https://github.com/zai-org/GLM-OCR

(Shameless plug: I also maintain a simplified version of GLM-OCR without dependency on the transformers library, which makes it much easier to install: https://github.com/99991/Simple-GLM-OCR/)


Were there any particular challenges when implementing your library? I have implemented my own serialization library [1] (with a focus on not allowing arbitrary code execution), but had skipped dataclasses for now, since they seemed difficult to get right. What was your experience?

[1] https://github.com/99991/safeserialize

Side note: I think that a warning in the README about arbitrary code execution for deserialization of untrusted inputs would be nice.


Good question! Dataclasses were actually pretty easy - Python's introspection tools made them straightforward.

The tricky parts were:

- Type hints: mapping __init__ params to attributes, especially with complex types

- Preserving types: keeping tuples as tuples and sets as sets (not just lists)

- Error messages: tracking paths like obj.address.street through the whole pipeline

I checked out safeserialize, by the way—the focus on preventing arbitrary code execution is a really smart niche.
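For what it's worth, here is roughly the kind of introspection that makes dataclasses easy (a toy sketch of the idea, not the library's actual code):

```python
from dataclasses import dataclass, fields, is_dataclass

def to_dict(obj):
    """Recursively serialize via dataclass field introspection."""
    if is_dataclass(obj):
        return {f.name: to_dict(getattr(obj, f.name)) for f in fields(obj)}
    if isinstance(obj, (list, tuple, set)):
        # Tag the container type so tuples and sets survive the round trip.
        return {"__type__": type(obj).__name__,
                "items": [to_dict(x) for x in obj]}
    return obj

@dataclass
class Address:
    street: str

@dataclass
class Person:
    name: str
    address: Address
    nicknames: tuple

print(to_dict(Person("Ada", Address("Main St"), ("countess",))))
```

The real work, as the parent says, is in the edge cases this sketch ignores: __init__ params that don't map 1:1 to attributes, and error paths.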


A word of caution: there are SVGs that can freeze a page, so make sure that you do not link to any third-party SVGs. This is a known bug, but neither the Google Chrome nor the Mozilla team wants to fix it.

Here is an evil example SVG for demonstration.

DON'T CLICK THIS LINK UNLESS YOU WANT TO RISK CRASHING YOUR BROWSER!

https://asdf10.com/danger.svg


Crashing a single page or even the whole browser isn't really a security problem though. In fact, there are many ways to freeze a tab or even the browser UI with built-in functions if you apply them enough times. (For example, a long chain of blur filters will make the Chrome UI unresponsive because the render time skyrockets.)

Although if the effect does escape the tab, the issue will get higher priority, because that would be annoying to users.


Wait so are recursive XXE attacks like (I'm assuming) this one possible on Github READMEs? Or have they somehow mitigated them?


It's recursive, but not XXE. It is 20 layers of nested SVG groups, where the first group contains 10 blue circles, and every subsequent group contains 10 of the previous group. This would render as around 10^20 blue circles.
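The pattern looks roughly like this (an illustrative sketch with only three levels and two instances per level, not the actual file; the real one nests 20 levels of 10):

```xml
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
  <defs>
    <!-- level 0: the circles themselves -->
    <g id="lvl0"><circle cx="0" cy="0" r="1" fill="blue"/><circle cx="3" cy="0" r="1" fill="blue"/></g>
    <!-- each level instantiates the previous level multiple times -->
    <g id="lvl1"><use xlink:href="#lvl0"/><use xlink:href="#lvl0" x="10"/></g>
    <g id="lvl2"><use xlink:href="#lvl1"/><use xlink:href="#lvl1" x="40"/></g>
  </defs>
  <use xlink:href="#lvl2"/>
</svg>
```

The file size grows linearly with the number of levels, but the number of rendered elements grows exponentially, which is what exhausts the renderer.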


SVG is XML-based, unlike HTML, which historically followed the SGML spec.

From curling the malicious page you can also see:

    <?xml version="1.0" encoding="UTF-8"?>
        <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="1000" height="1000">


Yes, SVGs are XML-based and may be vulnerable to generic XML-based XML external entity (XXE) or exponential entity expansion attacks, but this particular malicious SVG is using SVG-specific features to create the resource exhaustion.


I think external entities can be disabled completely, right? But who knows; it may pay off to check out what GH did here :)


Here is another inference implementation in Python (only dependency is PyTorch).

https://github.com/99991/SimpleTinyLlama

The new checkpoints did not seem much better and they changed the chat format for some reason, so I did not port the new checkpoints yet. Perhaps I'll get to it this weekend.

