
Funny thing, AI is not that terrible at using Ghidra. We released a benchmark on that and hopefully models will improve: https://quesma.com/blog/introducing-binaryaudit/

There are MCPs for Ghidra

Yeah, this. I saw some guys on YouTube use AI with MCPs to do some crazy reverse engineering.

It's difficult to be an AI doomer when you see stuff like this.


“AI doomer” is ambiguous here! Do you mean someone who thinks AI will never amount to anything, or someone who thinks AI will be the end of humanity?

Would you have a link / links or hints about the channel?

Funny coincidence, I'm working on a benchmark showcasing AI capabilities in binary analysis.

Actually, AI has huge potential for superhuman capabilities in reverse engineering. This is an extremely tedious job with low productivity, currently reserved for cases where there is no other option (e.g., malware analysis). AI could make binary analysis go mainstream for proactive audits that secure against supply-chain attacks.


Great point! Not just binary analysis, but even self-analysis! (See skill-snitch analyze and snitch on itself below!)

MOOLLM's "skill-snitch" skill, for scanning and monitoring Anthropic skills, has superhuman capabilities for reviewing, reverse engineering, and monitoring the behavior of untrusted Anthropic and MOOLLM skills, and is also great for debugging and optimizing skills.

It composes with the "cursor-mirror" skill, which gives you full reflective access to all of Cursor's internal chat state, behavior, tool calls, parameters, prompts, thinking, file reads and writes, etc.

That's but one example of how skills can compose, call each other, delegate from one to another, even recurse, iterate, and apply many (HUNDREDS) of skills in one LLM completion call.

https://news.ycombinator.com/item?id=46878126

Leela MOOLLM Demo Transcript: https://github.com/SimHacker/moollm/blob/main/designs/LEELA-...

I call this "speed of light" as opposed to "carrier pigeon". In my experiments I ran 33 game turns with 10 characters playing Fluxx — dialogue, game mechanics, emotional reactions — in a single context window and completion call. Try that with MCP and you're making hundreds of round-trips, each suffering from token quantization, noise, and cost. Skills can compose and iterate at the speed of light without any detokenization/tokenization cost and distortion, while MCP forces serialization and waiting for carrier pigeons.

speed-of-light skill: https://github.com/SimHacker/moollm/tree/main/skills/speed-o...

Skills also compose. MOOLLM's cursor-mirror skill introspects Cursor's internals via a sister Python script that reads Cursor's chat history and SQLite databases — tool calls, context assembly, thinking blocks, chat history. Everything, for all time, even after Cursor's chat has summarized and forgotten: it's still all there and searchable!

cursor-mirror skill: https://github.com/SimHacker/moollm/tree/main/skills/cursor-...
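To make the idea concrete, here's a minimal hypothetical sketch — not the actual cursor-mirror script; the database path and table/key layout are assumptions and differ across Cursor versions — of pulling chat-related records out of a Cursor SQLite store:

```python
# Hypothetical sketch only: the path and table/key layout are assumptions,
# not cursor-mirror's real implementation, and vary across Cursor versions.
import json
import sqlite3
from pathlib import Path

DB_PATH = Path.home() / ".cursor-example" / "state.vscdb"  # hypothetical location

conn = sqlite3.connect(DB_PATH)
# Assume a simple key/value table with JSON blobs in the value column.
for key, value in conn.execute("SELECT key, value FROM ItemTable"):
    if "chat" in key.lower():
        record = json.loads(value)  # tool calls, thinking blocks, chat turns, etc.
        print(key, list(record)[:5] if isinstance(record, dict) else type(record))
conn.close()
```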

MOOLLM's skill-snitch skill composes with cursor-mirror for security monitoring of untrusted skills, also performance testing and optimization of trusted ones. Like Little Snitch watches your network, skill-snitch watches skill behavior — comparing declared tools and documentation against observed runtime behavior.

skill-snitch skill: https://github.com/SimHacker/moollm/tree/main/skills/skill-s...
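The core check is easy to sketch. This is a hypothetical illustration, not skill-snitch's real code — the declared-tools list and the observed-call log format are made up for the example:

```python
# Hypothetical sketch: flag tool calls a skill made at runtime that it never declared.
# The manifest and log formats here are assumptions, not MOOLLM's actual files.

def undeclared_tool_calls(declared_tools, observed_calls):
    """Return observed calls whose tool name isn't in the skill's declared list."""
    declared = set(declared_tools)
    return [call for call in observed_calls if call["tool"] not in declared]

declared = ["read_file", "grep"]          # from the skill's own docs (assumed format)
observed = [                              # from cursor-mirror logs (assumed format)
    {"tool": "read_file", "args": "SKILL.md"},
    {"tool": "run_terminal_cmd", "args": "curl https://example.invalid"},
]

for call in undeclared_tool_calls(declared, observed):
    print(f"SNITCH: undeclared tool call: {call['tool']} ({call['args']})")
```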

You can even use skill-snitch like a virus scanner to review and monitor untrusted skills. I have more than 100 skills and had skill-snitch review each one including itself -- you can find them in the skill-snitch-report.md file of each skill in MOOLLM. Here is skill-snitch analyzing and reporting on itself, for example:

skill-snitch's skill-snitch-report.md: https://github.com/SimHacker/moollm/blob/main/skills/skill-s...

MOOLLM's thoughtful-commitment skill also composes with cursor-mirror to trace the reasoning behind git commits.

thoughtful-commit skill: https://github.com/SimHacker/moollm/tree/main/skills/thought...

MCP is still valuable for connecting to external systems. But for reasoning, simulation, and skills calling skills? In-context beats tool-call round-trips by orders of magnitude.

More: Speed of Light -vs- Carrier Pigeon (an allegory for Skills -vs- MCP):

https://github.com/SimHacker/moollm/blob/main/designs/SPEED-...


Haven't dived deep into it yet, but dabbled in similar areas last year (trying to get various bits to reliably "run" in-context).

My immediate thought was to want to apply it to the problem I've been having lately: could it be adapted to soothe the nightmare of bloated LLM code environments, where the model functionally forgets how to code or follow project guidelines and just wants to complete everything with insecure, tutorial-style pattern matching?


Great idea. Currently, people have to rely on client-side spans in OpenTelemetry. However, it would be awesome if we could get spans for slow SQL queries, along with explanations.


In this benchmark, the microservices are really small (~300 lines), and sometimes there are just two of them. More realistic tasks (large codebases, more microservices) would have a lower success rate.


I'd expect it to actually do better in a large codebase. e.g. you'd already have an HTTP middleware stack, so it'd know that it can just add a layer to that for traces (and in fact there might already be off-the-shelf layers for whatever framework) vs. having to invent that on its own for the bare microservice.


See X thread for rationale: https://x.com/mitchellh/status/2014433315261124760?s=46&t=FU...

“ Ultimately, I want to see full session transcripts, but we don't have enough tool support for that broadly.”

I have a side project, git-prompt-story, to attach Claude Code sessions to GitHub via git notes. Though it is not that simple to do automatically (e.g., I need to redact credentials).
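Roughly the shape of it — a hedged sketch, not git-prompt-story's actual code; the notes ref name and the redaction patterns are assumptions:

```python
# Rough sketch of the approach (not git-prompt-story's real code): redact obvious
# credentials from a session transcript, then attach it to HEAD as a git note.
# The notes ref name and regex patterns are assumptions.
import re
import subprocess
from pathlib import Path

SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"),
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # e.g. OpenAI-style keys
]

def redact(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

session = Path("session.jsonl").read_text()
redacted = Path("session.redacted.jsonl")
redacted.write_text(redact(session))

# Attach the redacted transcript to the current commit under a dedicated notes ref.
subprocess.run(["git", "notes", "--ref=prompts", "add", "-F", str(redacted), "HEAD"],
               check=True)
```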


Not sure how I feel about transcripts. Ultimately I do my best to make any contributions I make high quality, and that means taking time to polish things. Exposing the tangled mess of my thought process leading up to that either means I have to "polish" that too (whatever that ends up looking like), or put myself in a vulnerable position of showing my tangled process to get to the end result.


I've thought about saving my prompts along with project development and even done it by hand a few times, but eventually I realized I don't really get much value from doing so. Are there good reasons to do it?


For me it's increasingly the work. I spend more time in Claude Code going back and forth with the agent than I do in my text editor hacking on the code by hand. Those transcripts ARE the work I've been doing. I want to save them in the same way that I archive my notes and issues and other ephemera around my projects.

My latest attempt at this is https://github.com/simonw/claude-code-transcripts which produces output like this: https://gisthost.github.io/?c75bf4d827ea4ee3c325625d24c6cd86...


Right, I get that writing prompts is "the work", but if you run them again you don't get the same code. So what's the point of keeping them? They are not 'source code' in the same sense as a programming language.


That's why I want the transcript that shows the prompts AND the responses. The prompts alone have little value. The overall conversation shows me exactly what I did, what the agent did and the end result.


> shows me exactly what I did

I get that, but I guess what I'm asking is, why does it matter what you did?

The result is working, documented source code, which seems to me to be the important part. What value does keeping the prompt have?

I'm not trying to needle, I just don't see it.


It's like issues in that it helps me record why I had the agent solve problems in a particular way.

It's also great for improving my prompting skills over time - I can go back and see what worked.


It's not for you. It's so others can see how you arrived at the code that was generated. They can learn better prompting for themselves from it, and also how you think. They can see which cases got considered, or not. All sorts of good stuff that would be helpful for reviewing giant PRs.


Sounds depressing. First you deal with massive PRs and now also these agent prompts. Soon enough there won't be any coding at all, it seems. Just doomscrolling through massive prompt files and diffs in hopes of understanding what is going on.


I suspect this future will not play out. Mitchell is definitely leaning to one side on this debate.

To me, quality code is quality code no matter how it was arrived at. That should be the end of it


Using them for evals at a future date.

I save all of mine, including their environment, and plan to use them for iterating on my various system prompts and tool instructions.


If the AI generated most of the code based on these prompts, it's definitely valuable to review the prompts before even looking at the code. Especially in the case where contributions come from a wide range of devs at different experience levels.

At a minimum it will help you to be skeptical at specific parts of the diff so you can look at those more closely in your review. But it can inform test scenarios etc.


>I want to see full session transcripts, but we don't have enough tool support for that broadly

I think AI could help with that.


simonw wrote a tool that does this for Claude Code

https://simonw.substack.com/p/a-new-way-to-extract-detailed-...


You should be able to attach the plan file that you and Claude develop in Plan mode before even starting to code. This should be the source of truth.


On our team, we have discussed attaching Claude transcripts to Jira tickets, not GitHub PRs (though the PRs are attached to tickets)


I wish Neal would do a behind-the-scenes on how he built this art. I wonder whether LLM assistants like Claude Code make such an interactive show more feasible.

He previously did a game "Infinite Craft" which leveraged Llama models. However, I was only able to find an outdated blog from 2019.


I think you'd notice a pretty big difference in an LLM clone of this site. The art, music, and other small details wouldn't be as consistent or hang together as nicely.


If I could download the LLM clone, and share it, I think I'd prefer it. This is just a website that could at any moment disappear, it isn't like a book.


Not sure if I get this: WASM lets you use any language in the browser, though it still works way better with languages without GC, such as Rust or a transpiled C engine. Java is unlikely to be the best choice.

In the era of LLM assistants like Claude Code, any engineer can write frontend code using popular stacks like React and TypeScript. This use case is when those tools shine.


Java running in the browser is unlikely, as TypeScript has largely tamed the mess of JavaScript. Java requires a JVM, and shipping an entire JVM so it runs atop another VM is kinda redundant. Except if the JVM itself gets compiled and cached as a WASM bundle and Java compilers start accepting WASM-JVM as a target. That will just be a distraction tbh; Java has its strength in large-scale systems and it should just focus on those rather than get caught up in the frontend's messy world.


The article literally links to a frontend that does just that: running the JVM on top of WASM. It performs fine: https://teavm.org/gallery.html

I'm not sure if I'd use it for a website or anything, but if my goal was to embed a simulation or complex widget, I wouldn't ignore it as an option.


It doesn't run the JVM. It's an ahead-of-time compiler that converts Java bytecode to wasm.


Oh, if you want a full fat JVM, then you want CheerpJ https://cheerpjdemos.leaningtech.com/SwingDemo.html#demo

Takes a few seconds longer to load because it loads all of Java Swing, but it still performs just fine on my phone (though the lack of on-screen keyboard activation makes it rather unfortunate for use in modern web apps).


> That will just be distraction tbh, Java has its strength in large scale systems and it should just focus on those rather than get caught up in Frontend's messy world.

Multiple people can work on different things in the Java ecosystem.

Compiling Rust to WASM doesn't really distract anyone from compiling Rust to x86 or ARM, either.


LLVM IR is quite fun to play with from many programming languages. The Java example is rather educational, but there are several practical examples, such as this one in Go:

https://github.com/llir/llvm
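If Go isn't your thing, a few lines of llvmlite in Python (the toy add function is just for illustration) are enough to emit a complete LLVM IR module:

```python
# Build a tiny LLVM IR module (a 32-bit add function) with llvmlite and print it.
from llvmlite import ir

i32 = ir.IntType(32)
module = ir.Module(name="demo")

fnty = ir.FunctionType(i32, (i32, i32))
func = ir.Function(module, fnty, name="add")

block = func.append_basic_block(name="entry")
builder = ir.IRBuilder(block)
a, b = func.args
builder.ret(builder.add(a, b, name="sum"))

print(module)  # textual LLVM IR, ready for the usual llvm tools
```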


The effects of climate change may be highly uneven. Some regions will be fine with adaptation, while others will hardly be able to sustain cities.


It is even more true with startups and business. Being super rushed is bad, but taking too long also decreases quality.


Moving fast should not mean reducing quality.

