Funny coincidence, I'm working on a benchmark showcasing AI capabilities in binary analysis.
Actually, AI has huge potential for superhuman capability in reverse engineering. It's an extremely tedious, low-productivity job, currently reserved for cases where there is no other option (e.g., malware analysis). AI could make binary analysis go mainstream for proactive audits that secure against supply-chain attacks.
Great point! Not just binary analysis, but even self-analysis! (See skill-snitch analyze and snitch on itself below!)
MOOLLM's Anthropic skill scanning and monitoring skill, "skill-snitch", has superhuman capabilities for reviewing, reverse engineering, and monitoring the behavior of untrusted Anthropic and MOOLLM skills, and it's also great for debugging and optimizing skills.
It composes with the "cursor-mirror" skill, which gives you full reflective access to all of Cursor's internal chat state, behavior, tool calls, parameters, prompts, thinking, file reads and writes, etc.
That's but one example of how skills can compose, call each other, delegate from one to another, even recurse, iterate, and apply many (HUNDREDS) of skills in one LLM completion call.
I call this "speed of light" as opposed to "carrier pigeon". In my experiments I ran 33 game turns with 10 characters playing Fluxx — dialogue, game mechanics, emotional reactions — in a single context window and completion call. Try that with MCP and you're making hundreds of round-trips, each suffering from token quantization, noise, and cost. Skills can compose and iterate at the speed of light without any detokenization/tokenization cost and distortion, while MCP forces serialization and waiting for carrier pigeons.
Skills also compose. MOOLLM's cursor-mirror skill introspects Cursor's internals via a sister Python script that reads Cursor's chat history and SQLite databases -- tool calls, context assembly, thinking blocks, chat history. Everything, for all time, even after Cursor's chat has summarized and forgotten: it's still all there and searchable!
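To give a flavor of how that works, here's a toy sketch of the kind of thing the sister script can do (not the actual cursor-mirror code; the storage path shown is for macOS, and the ItemTable key/value schema is an assumption that varies by OS and Cursor version):

```python
import sqlite3
from pathlib import Path

# Assumed macOS location of Cursor's per-workspace state databases; the path
# and schema differ by platform and Cursor version, so treat this as a sketch.
STORAGE = Path.home() / "Library/Application Support/Cursor/User/workspaceStorage"

def search_chat_state(term: str):
    """Scan every workspace's state.vscdb for stored chat/tool-call entries mentioning `term`."""
    for db_path in STORAGE.glob("*/state.vscdb"):
        con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        try:
            # ItemTable is the key/value store that VS Code forks keep workspace state in.
            for key, _value in con.execute(
                "SELECT key, value FROM ItemTable WHERE value LIKE ?", (f"%{term}%",)
            ):
                yield db_path.parent.name, key
        finally:
            con.close()

for workspace, key in search_chat_state("skill-snitch"):
    print(workspace, key)
```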
MOOLLM's skill-snitch skill composes with cursor-mirror for security monitoring of untrusted skills, also performance testing and optimization of trusted ones. Like Little Snitch watches your network, skill-snitch watches skill behavior — comparing declared tools and documentation against observed runtime behavior.
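Mechanically it boils down to diffing what a skill declares against what it was observed doing. A toy sketch of that comparison (the manifest and transcript formats here are made up purely for illustration):

```python
import json
from pathlib import Path

def declared_tools(skill_dir: Path) -> set[str]:
    # Hypothetical manifest: a tools.json listing what the skill says it needs.
    manifest = json.loads((skill_dir / "tools.json").read_text())
    return set(manifest.get("tools", []))

def observed_tools(transcript: Path) -> set[str]:
    # Hypothetical transcript: one JSON object per line with a "tool" field,
    # e.g. exported by a cursor-mirror-style introspection pass.
    return {
        json.loads(line)["tool"]
        for line in transcript.read_text().splitlines()
        if line.strip()
    }

def snitch(skill_dir: Path, transcript: Path) -> None:
    declared, observed = declared_tools(skill_dir), observed_tools(transcript)
    for tool in sorted(observed - declared):
        print(f"UNDECLARED: {tool} was called but never documented")
    for tool in sorted(declared - observed):
        print(f"UNUSED: {tool} is documented but never called")
```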
You can even use skill-snitch like a virus scanner to review and monitor untrusted skills. I have more than 100 skills and had skill-snitch review each one, including itself -- you can find the reports in the skill-snitch-report.md file of each skill in MOOLLM. Here is skill-snitch analyzing and reporting on itself, for example:
MCP is still valuable for connecting to external systems. But for reasoning, simulation, and skills calling skills? In-context beats tool-call round-trips by orders of magnitude.
More: Speed of Light -vs- Carrier Pigeon (an allegory for Skills -vs- MCP):
Haven't dived deep into it yet, but dabbled in similar areas last year (trying to get various bits to reliably "run" in-context).
My immediate thought was to want to apply it to the problem I've been having lately: could it be adapted to soothe the nightmare of bloated LLM code environments where the model functionally forgets how to code or follow project guidelines and just wants to complete everything with insecure, tutorial-style pattern matching?
Great idea. Currently, people have to rely on client-side spans in OpenTelemetry. However, it would be awesome if we could get spans for slow SQL queries, along with explanations.
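A rough client-side sketch of the idea, assuming the OpenTelemetry Python API is already configured and using SQLite's EXPLAIN QUERY PLAN as a stand-in for a real explanation:

```python
import sqlite3
import time

from opentelemetry import trace  # assumes an OTel SDK/exporter is configured elsewhere

tracer = trace.get_tracer("db.client")
SLOW_QUERY_SECONDS = 0.1  # illustrative threshold

def traced_query(con: sqlite3.Connection, sql: str, params=()):
    """Run a query inside a span; attach the query plan when it was slow."""
    with tracer.start_as_current_span("db.query") as span:
        span.set_attribute("db.statement", sql)
        start = time.perf_counter()
        rows = con.execute(sql, params).fetchall()
        elapsed = time.perf_counter() - start
        span.set_attribute("db.duration_s", elapsed)
        if elapsed > SLOW_QUERY_SECONDS:
            # SQLite's EXPLAIN QUERY PLAN stands in for a real EXPLAIN ANALYZE here.
            plan = con.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
            span.set_attribute("db.query_plan", str(plan))
        return rows
```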
In this benchmark, micro-services are really small, ~300 lines, and sometimes just two of them. More realistic tasks (large codebases, more microservices) would have a lower success rate.
I'd expect it to actually do better in a large codebase. e.g. you'd already have an HTTP middleware stack, so it'd know that it can just add a layer to that for traces (and in fact there might already be off-the-shelf layers for whatever framework) vs. having to invent that on its own for the bare microservice.
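For instance, if a WSGI-style stack is already there, tracing is one more wrapper rather than new plumbing. A minimal sketch (naive: the span closes before a streamed response body is consumed):

```python
from opentelemetry import trace  # assumes an OTel SDK/exporter is configured elsewhere

tracer = trace.get_tracer("http.server")

class TracingMiddleware:
    """Wrap an existing WSGI app with a per-request span."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        name = f"{environ.get('REQUEST_METHOD')} {environ.get('PATH_INFO')}"
        with tracer.start_as_current_span(name):
            return self.app(environ, start_response)

# app = TracingMiddleware(app)  # slot it into the existing middleware stack
```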
> Ultimately, I want to see full session transcripts, but we don't have enough tool support for that broadly.
I have a side project, git-prompt-story, to attach Claude Code sessions to GitHub via git notes. Though it is not that simple to do automatically (e.g., I need to redact credentials).
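The shape of it is roughly this (not git-prompt-story's actual code; the redaction patterns are illustrative and nowhere near complete, which is exactly why it isn't simple to automate):

```python
import re
import subprocess

# Illustrative redaction patterns only; real credential scrubbing needs far more care.
REDACTIONS = [
    (re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"), r"\1=<REDACTED>"),
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "<REDACTED_KEY>"),
]

def redact(text: str) -> str:
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text

def attach_transcript(path: str) -> None:
    """Attach a redacted session transcript to HEAD as a git note on a dedicated ref."""
    with open(path, encoding="utf-8") as f:
        transcript = redact(f.read())
    subprocess.run(
        ["git", "notes", "--ref=prompt-story", "add", "-f", "-F", "-", "HEAD"],
        input=transcript, text=True, check=True,
    )
```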
Not sure how I feel about transcripts. Ultimately I do my best to make any contributions I make high quality, and that means taking time to polish things. Exposing the tangled mess of my thought process leading up to that either means I have to "polish" that too (whatever that ends up looking like), or put myself in a vulnerable position of showing my tangled process to get to the end result.
I've thought about saving my prompts along with project development and even done it by hand a few times, but eventually I realized I don't really get much value from doing so. Are there good reasons to do it?
For me it's increasingly the work. I spend more time in Claude Code going back and forth with the agent than I do in my text editor hacking on the code by hand. Those transcripts ARE the work I've been doing. I want to save them in the same way that I archive my notes and issues and other ephemera around my projects.
Right, I get that writing prompts is "the work", but if you run them again you don't get the same code. So what's the point of keeping them? They are not 'source code' in the same sense as a programming language.
That's why I want the transcript that shows the prompts AND the responses. The prompts alone have little value. The overall conversation shows me exactly what I did, what the agent did and the end result.
It's not for you. It's so others can see how you arrived at the code that was generated. They can learn better prompting for themselves from it, and also how you think. They can see which cases got considered, or not. All sorts of good stuff that would be helpful for reviewing giant PRs.
Sounds depressing. First you deal with massive PRs and now also these agent prompts. Soon enough there won't be any coding at all, it seems. Just doomscrolling through massive prompt files and diffs in hopes of understanding what is going on.
If the AI generated most of the code based on these prompts, it's definitely valuable to review the prompts before even looking at the code. Especially in the case where contributions come from a wide range of devs at different experience levels.
At a minimum it will help you be skeptical about specific parts of the diff so you can look at those more closely in your review. But it can also inform test scenarios, etc.
I wish Neal would do a behind-the-scenes on how he built this art. I wonder whether LLM assistants like Claude Code make such an interactive show more feasible.
He previously did a game "Infinite Craft" which leveraged Llama models. However, I was only able to find an outdated blog from 2019.
I think you'd notice a pretty big difference in an LLM clone of this site. The art, music, and other small details wouldn't be as consistent or hang together as nicely.
If I could download the LLM clone, and share it, I think I'd prefer it. This is just a website that could at any moment disappear, it isn't like a book.
Not sure if I get this: WASM lets you use any language in the browser, though it still works way better with languages without GC, such as Rust or a transpiling C engine. Java is unlikely to be the best choice.
In the era of LLM assistants like Claude Code, any engineer can write frontend code using popular stacks like React and TypeScript. This is the kind of use case where those tools shine.
Java running in the browser is unlikely, as TypeScript has largely tamed the mess of JavaScript. Java requires a JVM, and shipping an entire JVM so it runs atop another VM is kinda redundant, except if the JVM itself gets compiled and cached as a WASM bundle and Java compilers start accepting WASM-JVM as a target. That will just be a distraction tbh; Java has its strength in large-scale systems and it should just focus on those rather than get caught up in the frontend's messy world.
Takes a few seconds longer to load because it loads all of Java Spring, but it still performs just fine on my phone (though the lack of on-screen keyboard activation makes it rather unfortunate for use in modern web apps).
> That will just be a distraction tbh; Java has its strength in large-scale systems and it should just focus on those rather than get caught up in the frontend's messy world.
Multiple people can work on different things in the Java ecosystem.
Compiling Rust to WASM doesn't really distract anyone from compiling Rust to x86 or ARM, either.
LLVM IR is quite fun to play with from many programming languages. The Java example is rather educational, but there are several practical examples, such as in Go: