Hacker News | past | comments | ask | show | jobs | submit | LatencyKills's comments

Ex-Apple macOS/Xcode dev here.

I just downloaded your app and ran it through hopper. There is a LOT of embedded Apple Script. I would never run an app like this with SIP disabled or without an active network blocker.

Your app requires direct access to major OS components: code signing, even during alpha should be a requirement.


I use a series of stop hook [0] scripts. For example, I have a script [1] that forces Claude to execute tests whenever code files are changed. The stop hook runs automatically and will force Claude to continue working until the script passes.

I also have a script that forces Claude to generate a summary of work [2] if it hasn't done so on its own.

[0]: https://code.claude.com/docs/en/hooks

[1]: https://gist.github.com/Looking4OffSwitch/c3d5848935fec5ac3b...

[2]: https://gist.github.com/Looking4OffSwitch/3b13b65e40284be899...
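For readers unfamiliar with the mechanism, here is a minimal sketch of what such a stop hook might look like. It is illustrative only, not my actual script: the marker-file paths are invented, and it assumes the documented protocol where the hook reads a JSON payload on stdin and exits with code 2 (reason on stderr) to force Claude to keep working.

```python
#!/usr/bin/env python3
# Minimal stop-hook sketch. The /tmp marker paths are hypothetical names
# (written by earlier hooks in the session), not real defaults.
import json
import os
import sys


def block_reason(code_changed, tests_passed):
    """Return why the session must continue, or None if it may stop."""
    if code_changed and not tests_passed:
        return "Source files changed but tests have not passed; run the test suite."
    return None


def main():
    # Hook payload (session metadata) arrives as JSON on stdin; unused here.
    raw = sys.stdin.read()
    payload = json.loads(raw) if raw.strip() else {}
    reason = block_reason(
        code_changed=os.path.exists("/tmp/claude_code_changed"),
        tests_passed=os.path.exists("/tmp/claude_tests_passed"),
    )
    if reason:
        print(reason, file=sys.stderr)
        return 2  # non-zero exit: Claude must keep working until this passes
    return 0


# In a real hook you would register this script in your hooks config and
# end with: sys.exit(main())
```

The key point is that the success criteria live entirely in your script; Claude never gets a vote.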


Looks like an interesting approach.

So, the tests being written are based on what? On user input, or to test the changes it made according to Claude's own analysis?

If it is the latter, it can be the same problem of forcing its analysis, in this case by validating it.


> So, the tests being written are based on what?

I don't think you understand what I'm saying. If, during a session, Claude makes any change to a source file, the stop hook script FORCES Claude to run the existing tests. There is literally no way Claude can get around running the tests because the prompt will not stop being processed until the stop hook script passes.

There is no contradiction. Stop hooks (as well as all the other hook types) are the only way to force Claude to work deterministically.

Hook scripts can be as simple or as complex as you like: you define the success criteria. For example, if Claude just added a new feature but didn't create a test for it, then a stop hook would prevent Claude from stopping until the test was written.


Mate, I still don't understand that.

The stop hook forces Claude to run the existing tests. Fine. If Claude added a new feature but didn't create a test for it, it will wait for Claude to perform that action. Fine. We can adjust the complexity of the hook scripts. That's OK.

Tell me if I'm wrong, but I understand it more like a compiler: if the syntax is OK, it just passes. Similarly here, if the tests were run, it will look for a marker file in /tmp, find it, and pass.

I still don't understand the tests part. Maybe my question is clearer now.


This is one of the simplest and most fundamental ways to manage Claude’s behavior. I genuinely can’t make it clearer. The issue you’re describing seems to center on situations where Claude becomes “stubborn” and doesn’t follow instructions.

When that happens, the solution is straightforward: create a hook script that explicitly enforces the behavior you want. By doing so, you remove ambiguity and leave Claude no option but to comply.

If you can share a specific case where Claude isn’t following your prompts, I can help you craft an appropriate hook to correct it.


From their agent-rules.md:

> This is not negotiable. This is not optional. You cannot rationalize your way out of this.

Some days I really miss the predictability of a good old if/else block. /s


This is awesome. I was a dev on the C++ team at MS in the 90s and was sure that RTTI was the closest the language would ever get to having a true reflection system.

The ToS clearly states that any losses related to bugs or problems with smart contracts are the responsibility of the user.

So people give you money and they're out of luck if your systems don't work correctly... as in 'trust me bro'?


Pretty standard as far as ToS for these sorts of projects go. Just want users to be cautious before making deposits. Software is internally audited ofc (but it's important to note that anything can happen)

Where are the smart contracts?

Base L2.

The smart contract portion simply holds funds, while the game server manages them.

High level flow:

Deposit: Player calls deposit(serverId) → contract pulls buyInAmount USDC, emits an event. The server indexer picks it up and spawns the player with mass.

Play: All gameplay from this point on is off-chain. No transactions during play. Simply ws communication between authoritative server and clients.

When a player exits: Server signs a payout ticket (contractAddr, chainId, serverId, sessionId, player, amount, deadline). A relayer submits it on-chain — contract verifies the signature, prevents replay via sessionId, takes a fee, sends net USDC to the player.

Trust model: Contract holds USDC, server is trusted for amounts, but the contract enforces no double-claims and no overdraft. ~430 lines of Solidity.
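The two contract-side invariants described above (no double-claims via sessionId, no overdraft) reduce to a small amount of bookkeeping. Here is a sketch in Python purely to illustrate the logic; the real implementation is the ~430 lines of Solidity, and the class name, fee value, and method shapes here are all invented:

```python
# Illustrative Python model of the payout invariants described above.
# Names and the 1% fee are assumptions, not the real contract's values.

class PayoutVault:
    def __init__(self, balance, fee_bps=100):
        self.balance = balance          # USDC (atomic units) held by the contract
        self.fee_bps = fee_bps          # protocol fee in basis points (1% here)
        self.claimed_sessions = set()   # replay prevention via sessionId

    def claim(self, session_id, amount):
        """Redeem a server-signed payout ticket; returns net USDC to the player."""
        if session_id in self.claimed_sessions:
            raise ValueError("replay: session already claimed")
        if amount > self.balance:
            raise ValueError("overdraft: claim exceeds vault balance")
        self.claimed_sessions.add(session_id)
        fee = amount * self.fee_bps // 10_000
        self.balance -= amount
        return amount - fee
```

In the real flow the signature check on the ticket (contractAddr, chainId, serverId, sessionId, player, amount, deadline) would gate entry to `claim`; the sketch only models what happens after the signature verifies.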


Heh... okay. No one in their right mind will touch this.

Less hostility bro. Just a person trying to build something that hopefully finds its market.

Do we not have a multi-billion dollar gambling market within the US alone?

Do you see a problem with the product itself, or with the space?


I'm not your bro. Your target audience is the people who post about “technology” on Reddit, not the Hacker News crowd.

The fact that your ToS has copy and paste disabled, and that you share nothing about the project makes this entire thing a joke. There are literal rug pull projects that share more than you have.

That’s not “hostile”; it’s the truth. I have no problem with any type of crypto implementation when it is done correctly.

(Me: ex-Apple / Microsoft dev who helped build MS’s first btc client api and audits smart contracts)


I got a chuckle the last time I used Claude's /insights command. The number one thing in the report was, "User frequently stops processing to provide corrections." ;-)

Is that the correct link? That's a blog post from 2024 about the git cli.

I have no clue where that link even came from, I haven't read anything about the git cli in ages!

I made a new post with the correct link. Here it is for your reference: https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_...


Wtf. I need to work on my clipboard game

Exactly. I spent 20 years split between MS and Apple. Some of the best people I ever worked with were in QA. One guy in particular was an extremely talented engineer who simply didn't enjoy the canonical "coding" role; what he did enjoy was finding bugs and breaking things. ;-)

Really? The best people I worked with were never QA.

Moreover, the best QAs would almost always try to get out of QA, shifting into a better-respected and better-paid field.

I wish it weren't so (hence my username), but there is a definite class divide between devs and QA, and it shows up not just in the pay packets but also in who gets the boot in down times and who gets listened to. This definitely affects the quality of people.

I think it's overdue an overhaul much like the sysadmin->devops transition.


We have differing experiences, which shouldn't be surprising. My example explicitly referred to someone who was a good engineer who enjoyed the QA role.

This might have been an Apple/MS thing, but we always had very technical QA people on the dev tools team. For example, the QA lead for the C++ compiler had written their own compiler from scratch and was an amazing contributor.


In the Windows team (back before the test org was decimated) I saw the described "class divide". Anybody who was good enough would switch from SDET to SDE [disclaimer: obviously there were some isolated exceptions]. The test team produced reams of crappy test frameworks, each of which seemed like a "proving project" for its creators to show they could be competent SDEs. After the Great Decimation my dev team took ownership of many such frameworks and it was a total boondoggle; we wasted years trying (and mostly failing) to sort through the crappy test code.

This was all unfortunate, and I agree in principle with having a separate test org, but in Windows the culture unfortunately seemed to be built around testers as second-class software developers.


I spent most of my time working on Visual Studio (in the Boston time frame) so we got to interact with pretty much every team. I absolutely hated interacting with the Windows team. Everything was a fight for no reason.

As I said above, everyone has their own experiences but the QA folks I worked with at MS were fantastic.

Not sure if you're aware, but Dave Plummer now has a really good YT channel [0] where he talks about MS back in those days. It's a fun walk down memory lane.

[0]: https://www.youtube.com/@DavesGarage


I mean, the people that come up through QA may be the best, while getting enough time in the company to move to a position that pays.

But yeah, so many companies cheap out on their QA and then wonder why their QA sucks.


> Really? The best people I worked with were never QA.

> Moreover, the best QAs would almost always try to be not QA - to shift into a better respected and better paid field.

That sort of seems circular. If they're not respected or paid well, of course most of the talented people would not want to remain in QA, and eventually you'd just have mediocre QA. That doesn't really give you any insight into whether high quality QA would be useful though.

(edit: I see now that's basically the point you're trying to make, so I guess we're in agreement)


std::future doesn't give you a state machine. You get the building blocks you have to assemble into one manually. Coroutines give you the same building blocks but let the compiler do the assembly, making the suspension points visible in the source while hiding the mechanical boilerplate.

This is why coroutine-based frameworks (e.g., C++20 coroutines with cppcoro) have largely superseded future-chaining for async state machine work — the generated code is often equivalent, but the source code is dramatically cleaner and closer to the synchronous equivalent.

(me: ex-Visual Studio dev who worked extensively on our C++ coroutine implementation)
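To make the "compiler does the assembly" point concrete without a wall of C++, the same structural idea can be sketched in Python's async/await (the mechanics differ from C++20 coroutines, but the readability argument is identical): a multi-step async flow written linearly, where each await marks a suspension point of the generated state machine.

```python
# Sketch in Python for brevity; `fetch` stands in for any awaitable async step.
# With future/callback chaining, this one logical flow would be scattered
# across continuations; as a coroutine it reads like synchronous code.
import asyncio


async def fetch(value):
    await asyncio.sleep(0)  # simulated async I/O
    return value


async def connect_flow():
    # Three-state machine written linearly; each `await` is a suspension
    # point where the generated state machine parks until resumed.
    token = await fetch("token")
    session = await fetch(f"session:{token}")
    return await fetch(f"data:{session}")


result = asyncio.run(connect_flow())
print(result)  # data:session:token
```

Error propagation falls out for free: an exception thrown in any step unwinds through the same linear flow, which is exactly the boilerplate that manual future-chaining forces you to write by hand.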


It doesn't seem like a clear win to me. The only "assembly" required with std::future is creating the associated promise and using it to signal when that async step is done, and the upside is a nice readable linear flow, as well as ease of integration (just create a thread to run the state machine function if you want multiple in parallel).

With the coroutine approach using yield, doesn't that mean the caller needs to decide when to call it again? Whereas with the std::future approach, it's event-driven: the promise is set when that state/step has completed.


You are describing a single async step, not a state machine. "Create a promise, set it when done", that's one state. A real async state machine has N states with transitions, branching, error handling, and cleanup between them.

> "The only 'assembly' required is creating the associated promise"

Again, that is only true for one step. For a state machine with N states you need explicit state enums or a long chain of .then() continuations. You also need to manage the shared state across continuations (normally on the heap), handle manual error propagation across each boundary, and deal with cancellation tokens.

You only get a "nice readable linear flow" using std::future when 1) using a blocking .get() on a thread, or 2) .then() chaining, which isn't "nice" by any means.

Lastly, you seem to be conflating a co_yield (generator, pull-based) with co_await (event-driven, push-based). With co_await, the coroutine is resumed by whoever completes the awaitable.

But what do I know... I only worked on implementing coroutines in cl.exe for 4 years. ;-)


I only mentioned co_yield() since that's what the article was (ab)using, although perhaps justifiably so. It seems the coroutine support was added to C++ in a very flexible way, but so low level as to be daunting/inconvenient to use. It needs to have more high level facilities (like Generators) built on top.

What I was thinking of as a state machine with using std::future was a single function state machine, using switch (state) to the state specific dispatch of asynch ops using std::future, wait for completion then select next state.


> as to be daunting/inconvenient to use

I don't even know how to respond to that. How in the world are you using C++ professionally if you think coroutines are "daunting"? No one uses C++ for its "convenience" factor. We use it for the power and control it affords.

> What I was thinking of as a state machine with using std::future was a single function state machine, using switch (state) to the state specific dispatch of asynch ops using std::future, wait for completion then select next state.

Uh huh. What about error propagation and all the other very real issues I mentioned that you are just ignoring? Why not just let the compiler do all the work the way it was spec'ed and implemented?


So it's meant to be inconvenient, and that's the only right and proper way?!

Sounds more like punishment than software design, but each to their own.


I get what you’re saying, but you kicked off this thread like an expert — even though you knew you were talking to someone who helped build the very thing you’re critiquing.

It’s pretty clear you’ve never built a production-grade async state machine.

C++ is designed to provide the plumbing, not the kitchen sink. It’s a language for building abstractions, not handing them to you — though in practice, there’s a rich ecosystem if you’d rather reuse than reinvent.

That flexibility comes at the cost of convenience, which is why most new engineers don’t start with C++.

What you call “intimidating,” I call powerful. If coroutines throw you off, you’re probably using the wrong language.

Last thought — when you run into someone who’s built the tools you rely on, ask them questions instead of trying to lecture them. I would have been more than happy to work through a pedagogical solution with you.

/ignored


> It’s pretty clear you’ve never built a production-grade async state machine.

Haha .. you have no idea.

FWIW I've built frameworks exactly for that, and it's highly likely that you've unwittingly used one of them.


Uh huh. The person who gets confused by how co_await actually works and thinks that coroutines are "intimidating" wrote frameworks that I would have used to build our C++ compiler. Do you not understand that cl.exe doesn't use external frameworks? lmfao

I said used, as in used in your everyday life, by interacting with computer systems whose backend implementations you are blissfully ignorant of.

But yeah, if you want to win arguments, then arguing against yourself and your own hallucinations is, in your case, the best way to go.


Um... you might want to look at my profile. In addition to working at MS and Apple for two decades (where I touched everything from firmware to ring-0 and ring-3), I was on the team that created SoftICE [0]: the first commercial ring-0 debugger for Windows. I also created the automated deadlock detector for BoundsChecker [1], which requires an in-depth understanding of operating system internals.

> computer systems whose backend implementations you are blissfully ignorant of

I am extremely confident in my "backend" knowledge (of course, an actual systems engineer would never refer to their work as "backend").

You wrote a "C++ framework" that runs in the "backend" of a "computer system"? Do I have that right? Please let me know what it is so that I can decompile it and see how it was implemented!

[0]: https://en.wikipedia.org/wiki/SoftICE

[1]: https://en.wikipedia.org/wiki/BoundsChecker


I feel like that's really overselling coroutines -- there's still a TON of boilerplate

My response specifically addressed the question of why you might choose one option over the other.

Do you believe that std::future is the better option?


I've been working on a utility that lets me "see through" app windows on macOS [1] (I was a dev on Apple's Xcode team and have a strong understanding of how to do this efficiently using private APIs).

I wondered how Claude Code would approach the problem. I fully expected it to do something most human engineers would do: brute-force with ScreenCaptureKit.

It almost instantly figured out that it didn't have to "see through" anything and (correctly) dismissed ScreenCaptureKit due to the performance overhead.

This obviously isn't a "frontier" type problem, but I was impressed that it came up with a novel solution.

[1]: https://imgur.com/a/gWTGGYa


That's actually pretty cool. What made you think of doing this in the first place?

Thanks! I've been doing a lot of work on a laptop screen (I normally work on an ultrawide) and got tired of constantly switching between windows to find the information I need.

I've also added the ability to create a picture-in-picture section of any application window, so you can move a window to the background while still seeing its important content.

I'll probably do a Show HN at some point.


Was it a novel solution for you or for everyone? Because that's a pretty big difference. A lot of stuff that's novel to me would be something someone had been doing for decades somewhere.

Unless you worked on the macOS content server directly you’d have no idea that my solution was even possible.

The fact that Claude skipped over all the obvious solutions is why I used the word novel.


How confident are you that this knowledge was not part of the training data? Were there no Stack Overflow questions/replies with it, no tech forum posts, private knowledge bases, etc.?

Not trying to diminish its results; just that one should always assume LLMs have a rough memory of pretty much the whole of the internet / human knowledge. Google itself was very impressive back then in how it managed to dig out stuff interesting to me (though it's no longer good at finding a single article with almost exact keywords...), and what makes LLMs especially great is that they combine that with some surface-level transformation to make that information fit the current, particular need.


Do you think AlphaGo is regurgitating human gameplay? No, it's not: it's learning an optimal policy based on self-play. That is essentially what you're seeing with agents. People have a very misguided understanding of the training process and the implications of RL in verifiable domains. That's why coding agents will certainly reach superhuman performance. Straw/steel man depending on what you believe: "But they won't be able to understand systems! But a good spec IS programming!" is also a bad take. Agents absolutely can interact with humans, interpret vague desiderata, fill in the gaps, and ask for direction. You are not going to need to write a spec the same way you do today. It will be exactly like interacting with a very good programmer in EVERY sense of the word.

How does AlphaGo come into the picture? It works in a completely different way altogether.

I'm not saying that LLMs can't solve new-ish problems not part of the training data, but they sure as hell didn't get some Apple-specific library call from divine revelation.


AlphaGo comes into the picture to explain that in fact coding agents in verifiable domains are absolutely trained in very similar ways.

It's not magic: they can't access information that's not available, but they are not regurgitating or interpolating training data. That's not what I'm saying. I'm saying there is a misconception, stemming from a limited understanding of how coding agents are trained, that they are somehow limited by what's in the training data or poorly interpolating that space. This may be true for some domains, but not for coding or mathematics. AlphaGo is the right mental model here: RL in verifiable domains means your gradient steps take you in directions that are not limited by the quality or content of the training data, which is used only because starting RL from scratch is very inefficient. Human training data gives the models a more efficient starting point.


Well said.

Why is ScreenCaptureKit a bad choice for performance?

Because you can't control what the content server is doing. SCK doesn't care if you only need a small section of a window: it performs multiple full window memory copies that aren't a problem for normal screen recorders... but for a utility like mine, the user needs to see the updated content in milliseconds.

Also, as I mentioned above, when using SCK, the user cannot minimize or maximize any "watched" window, which is, in most cases, a deal-breaker.

My solution runs at under 2% cpu utilization because I don't have to first receive the full window content. SCK was not designed for this use case at all.


It's been a while since I looked at this but I'm not entirely sure I agree with this. ScreenCaptureKit vends IOSurfaces which don't have copies besides the one that happens to fill the buffer during rendering. I'm not entirely sure what other options you have that are better besides maybe portal views.

> but I'm not entirely sure I agree with this

I worked on the AVC team and built the original SCK Instruments plugin for performance monitoring. I'm assuming you aren't talking about ring-0 (which is where the performance hit occurs). That said, if you want your users to be able to minimize/maximize any "watched" window, ScreenCaptureKit is a non-starter. The OBS team has been asking Apple to remove that restriction for years.

Here's a more real-world scenario [0] where Seymour has to handle more than a single window. I can cycle through the entire z-order at 60 fps while capturing all of the content windows. In fact, Seymour can display the contents of minimized windows, which the content server doesn't even support natively. BTW, this quick demo was done using a debug build. The release build can run at < 4% cpu utilization with a dozen windows active and has full multi-monitor and multi-space support. Also, remember that SCK pays no attention to windows that are hidden; something that Seymour has to do constantly.

Here's something else you can't do with SCK: picture-in-picture windows [1] that can exist even when the source window is hidden. This is super helpful when watching builds or app logs on larger monitors. No more command+tabbing to babysit things.

[0]: https://imgur.com/a/cfgrD0y

[1]: https://imgur.com/a/jiR3GQ0


What was the solution?

Well, I'm not going to share either solution as this is actually a pretty useful utility that I plan on releasing, but the short answer is: 1) don't use ScreenCaptureKit, and 2) take advantage of what CGWindowListCreateImage() offers through the content server. This is a simple IPC mechanism that does not trigger all the SCK limitations (i.e., no multi-space or multi-desktop support). In fact, when using SCK, the user cannot even minimize the "watched" window.

Claude realized those issues right from the start.

One of the trickiest parts is tracking the window content while the window is moving - the content server doesn't, natively, provide that information.


Huh, Claude one-shotted it out of a single message from me. Man, LLMs have gotten good.

No it didn't. Like I said... it may have gotten something that worked but there is no way Claude got it to work while supporting multi-spaces, multi-desktops, and using under 2% cpu utilization. My solution can display app window content even when those windows are minimized, which is not something the content server supports.

My point was that Claude realized all the SCK problems and came up with a solution that 99% of macOS devs wouldn't even know existed.


> it may have gotten something that worked but there is no way Claude got it to work while supporting multi-spaces, multi-desktops, and using under 2% cpu utilization.

Maybe, but that's the magic of LLMs - they can now one-shot or few-shot (N<10) you something good enough for a specific user. Like, not supporting multi-desktops is fine if one doesn't use them (and if that changes, few more prompts about this particular issue - now the user actually knows specifically what they need - should close the gap).


And now it does.

Sorry, "now it does", what?

The things it didn't, which you then helpfully spelled out.

Do you believe my brief overview of the problem will help Claude identify the specific undocumented functions required for my solution? Is that how you think data gets fed back into models during training?

Yes. I don't think you appreciate just how much information your comments provide. You just told us (and Claude) what the interesting problems are, and confirmed both the existence of relevant undocumented functions, and that they are the right solution to those problems. What you didn't flag as interesting, and possible challenges you did not mention (such as these APIs being flaky, or restricted to Apple first-party use, or such) is even more telling.

Most hard problems are hard because of huge uncertainty around what's possible and how to get there. It's true for LLMs as much as it is for humans (and for the same reasons). Here, you gave solid answers to both, all but spelling out the solution.

ETA:

> Is that how you think data gets fed back into models during training?

No, one comment chain on a niche site is not enough.

It is, however, how the data gets fed into prompt, whether by user or autonomously (e.g. RAG).


> Yes. I don't think you appreciate just how much information your comments provide

Lol... no. You don't know how I solved the problem and you just read everything that Claude did.

Absolutely nothing in the key part of my solution uses a single public API (and there are thousands). And you think that Claude can just "figure that out" when my HN comments get fed back in during training?

I sincerely wish we'd see less /r/technology ridiculousness on HN.


I wonder how many 'ideas guys' will now think that with LLMs they can keep their precious to themselves while at the same bragging about them in online fora. Before they needed those pesky programmers negotiating for a slice of the pie, but this time it will be different.

Next up: copyright protection and/or patents on prompts. Mark my words.


I'm pretty sure a large fraction of the vibecoded stuff out there is from the "ideas guys." This time will be different because they'll find out very quickly whether their ideas are worth anything. The term "slop" substantially applies to the ideas themselves.

I don't think there will be copyright or patents on prompts per se, but I do think patents will become a lot more popular. With AI rewriting entire projects and products from scratch, copyright for software is meaningless, so patents are one of the very few moats left. Probably the only moat for the little guys.


It one-shotted what exactly?

Because LatencyKills is clearly describing a broader set of requirements related to their solution.

