Hacker News | gbro3n's comments

I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can. I would love to be able to say I have X technique for compensating for the model shortfall, but my experience so far has been that bigger, later models outperform older, smaller ones. I genuinely hope this changes though. I understand the investment that it has taken to get us to this point, but intelligence doesn't seem like something that should be gated.

Right; but every major generation has had diminishing returns over the last. Two years ago the difference between major releases was HUGE, and now we're discussing Opus 4.6 vs. 4.7 and people can't seem to agree whether it's an improvement or a regression (and even the data in their model card shows regressions).

So my point is: if you have the attitude that unless it's the bleeding edge, it may as well not exist, then local models are never going to be good enough. But the truth is they now well exceed what they need to be to be huge productivity tools, and would have been bleeding edge fairly recently.


I feel like I'm going to have to keep trying the next model, for a few cycles yet. My opinion is that Opus 4.7 is performing worse for my current workflow, but 4.6 was a significant step up, and I'd be getting worse results and shipping slower if I'd stuck with 4.5. The providers are always going to swear that the latest is the greatest. Demis Hassabis recently said in an interview that he thinks the better-funded projects will continue to find significant gains through advanced techniques, but that open source models figure out what was changed after about 6 months or so. We'll see, I guess. Don't get me wrong, I'd love to settle down with one model, and I'd love it to be something I could self-host for free.

> I completely see your point, but when my / developer time is worth what it is compared to the cost of a frontier model subscription, I'm wary of choosing anything but the best model I can.

Don't you understand that by choosing the best model we can, we are, collectively, step by step devaluing what our time is worth? Do you really think we can all keep our fancy paychecks while we keep using AI?


Do you think that if you or I stopped using AI, everyone else would too? We're still what we always were: problem solvers who have gained the ability to learn and understand systems better than the general population, and to communicate clearly (to humans, and now to AIs). Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever. As the amount of software grows, so will the need for people who know how to manage the complexity that comes with it.

> Unfortunately our knowledge of language APIs and syntax has diminished in value, but we have so many more skills that will be just as valuable as ever.

There were always jobs that required those "many more skills" but didn't require any programming skills.

We call those people Business Analysts and you could have been doing it for decades now. You didn't, because those jobs paid half what a decent/average programmer made.

Now you are willingly jumping into that position without realising that the gap between your salary and that role's value (i.e. half your salary, or less) will eventually disappear.


I guess we will need to wait and see if AI can remove ALL of the complexity that requires a software engineer over a business analyst. I can't currently believe that it will. BAs I've worked with vary in technical capability from 'has coded before and understands DB schema basics and network architecture' to 'knows how the business works but nothing about computers'. If we got to the point in the future where every computer system ran on the same frameworks in the same way, and AI understood it perfectly, then maybe. But while AI is a probabilistic technology manipulating deterministic systems, we will always need people who understand what's going on, and whether they write a lot of code or not, they will be engineers, not analysts. Whether it's more or fewer of those people, we will see.

> If we got to the point in the future where every computer system ran on the same frameworks in the same way, and AI understood it perfectly, then maybe.

They don't need to all run on the same frameworks, they just need to run on documented frameworks.

What possible value can you bring to a BA?

The system topology (say, if the backend was microservices vs Lambda vs something-else)? The LLM can explain to the BA what their options are, and the impact of those options.

The framework being used (Vue, or React, or something else)? The AI can directly twiddle that for the BA.

Solving a problem? If the observability is set up, the LLM can pinpoint almost all the problems too, and with a separate UAT or failover-type replica, can repro, edit, build, deploy and test faster than you can.

Like I already said, if[1] you're now able to build or enhance a system without actually needing programming skills, why are you excited about that? You could always do that. It's just that it pays half what programming skills get you.

You (and many others who boast about not writing code since $DATE) appear to be willingly moving to a role that already pays less, and will pay even less once the candidates for that role double (because now all you programmers are shifting towards it).

It's supply and demand, that's all.

--------------

[1] That's a very big "If", I think. However, the programmers who are so glad to not program appear to believe that it's a very small "If", because they're the ones explaining just how far the capabilities have come in just a year, and expect the trend to continue. Of course, if the SOTA models never get better than what we have now, then, sure - your argument holds - you'll still provide value.


I did the same this year. I really liked DigitalOcean though, compared to more complex cloud offerings like AWS; AWS feels like spending more for the same complexity. At least DO feels like it saves time and mental bandwidth. Still, the performance of cloud VPSes is abysmal for the price. I'm now on Hetzner + K3s + Flux CD, with Cloudflare for file storage (R2) and caching. I run Postgres on the same machine with frequent dump backups. If I ever need realtime read replicas, I'll likely just migrate the DB to Neon or something and keep Hetzner with snapshots for running app containers.
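
For anyone curious, "frequent dump backups" on a single box can be as little as a cron entry like this (database name, paths, and retention are illustrative, not my actual setup):

```
# Hourly compressed pg_dump, plus a nightly sweep keeping 7 days of dumps.
# Note the escaped % signs - cron treats a bare % as a newline.
0 * * * *  pg_dump -Fc mydb > /var/backups/pg/mydb-$(date +\%F-\%H).dump
30 3 * * * find /var/backups/pg -name '*.dump' -mtime +7 -delete
```

The `-Fc` custom format compresses and lets you restore selectively with `pg_restore`.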

I have heard it said that tokens will become commodities. I like being able to switch between OpenAI's and Anthropic's models, but I feel I'd manage if one of them disappeared. I'd probably even get by with Gemini. I don't want to lock in to any one provider any more than I want to lock in to my energy provider. I might pay 2x for a better model, but no more, and I can see that not being the case for much longer.

My current take is that AI is helping me experiment much faster. I can get less involved with the parts of an application that matter less and focus more (manually) on the parts that do. I agree with a lot of the sentiment here: even with the best intentions of reviewing every line of AI code, when it works well and I'm working fast on low-stakes functionality, that sometimes doesn't happen. This can be offset, however, by using AI efficiencies to maintain better test coverage than I would by hand (unit and e2e), having documentation updated with assistance, and having diagrams maintained to help me review. There are still some annoyances when the AI struggles with seemingly simple issues, but I think we all have to admit that programming was difficult, and quality issues existed, before AI.

I built static site publishing into AS Notes, to add to the mix (https://www.asnotes.io, an extension for VS Code). It's markdown and wikilink based, and can publish either the whole workspace or one or more specific folders. I've designed it so that I'm not dependent on any platform for my static sites. Publishing is a pro feature, but it's a one-time lifetime licence purchase.

I've used OpenClaw (just for learning; I agree with the author it's not reliable enough to do anything useful) but also have a similar daily summary routine, which is a basic Gemini API call to a personal MCP server that has access to my email, calendar, etc. The latter is so much more reliable. OpenClaw flows sometimes nail it, and then fail miserably the next day. It seems like we need a way to 'bank' the correct behaviours, like 'do it like you did it on Monday'. I feel that for any high-percentage reliability, we will end up moving towards using LLMs as glue, with as much of the actual work as possible handed off to MCP or persisted routine code. The best use case for LLMs currently is writing code, because once it's written, tested and committed, it's useful for the long term. If we had to generate the same code on the fly for every run, there's no way it would ever work reliably. If we extrapolate that idea, I think it helps to see what we can and can't expect from AI.
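
The "LLMs as glue" shape I mean can be sketched like this (routine names and return strings are hypothetical): the model's only job is to pick which banked, deterministic routine to run, so correct behaviour survives from one day to the next.

```python
# Sketch only: the LLM chooses *which* tested routine to run; the routines
# themselves are ordinary deterministic code, so correct behaviour is "banked"
# once, rather than regenerated on every run. Names here are illustrative.

ROUTINES = {
    "daily_summary": lambda: "summary of email + calendar",
    "inbox_triage": lambda: "triaged messages",
}

def run(llm_choice: str) -> str:
    # In a real setup, llm_choice would come from a model call whose output
    # is constrained to ROUTINES' keys (e.g. via structured output).
    routine = ROUTINES.get(llm_choice)
    if routine is None:
        raise ValueError(f"unknown routine: {llm_choice}")
    return routine()

print(run("daily_summary"))  # -> summary of email + calendar
```

The point is that only the routing decision is probabilistic; everything downstream is the same code that worked on Monday.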

This is interesting. I haven't used OpenClaw but I set up my own autonomous agent using Codex + ChatGPT Plus + systemd + normal UNIX email and user account infrastructure. And it's been working great! I'm very happy with it. It's been doing all kinds of tasks for me, effectively as an employee of my company.

I haven't seen any issues with memory so far. Using one long rolling context window, a diary and a markdown wiki folder seems sufficient to have it do stuff well. It's early days still and I might still encounter issues as I demand more, but I might just create a second or third bot and treat them as 'specialists' as I would with employees.
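
For a flavour of how little plumbing this needs: the nightly wakeup part can be a plain user-level systemd timer, something like the below (unit and script names are hypothetical, not my actual files), paired with a matching `agent-wakeup.service` whose `ExecStart` launches the harness.

```
# ~/.config/systemd/user/agent-wakeup.timer (illustrative)
[Unit]
Description=Nightly agent wakeup

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enabled with `systemctl --user enable --now agent-wakeup.timer`; `Persistent=true` makes a missed run fire at next boot.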


I did (using Claude Code) something that sounds very similar to this. It’s a bunch of bootstrapped Unix tools, systemd units, and some markdown files. Two comments:

- I suspect that in this moment, cobbling together your own simple version of a “claw-alike” is far more likely to be productive than a “real” claw. These are still pretty complex systems! And if you don’t have good mental models of what they’re doing under the hood and why, they’re very likely to fail in surprising, infuriating, or downright dangerous ways.

For example, I have implemented my own “sleep” context compaction process and while I’m certain there are objectively better implementations of it than mine… Mine is legible to me, and therefore I can predict with some accuracy how my productivity tamagotchi will behave day-to-day in a way that I could not if I hadn’t been involved in creating it.
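
A minimal sketch of what such a "sleep" compaction pass could look like (assuming a rolling message log; the real process is presumably more involved, and `summarize` stands in for an LLM call):

```python
# Sketch: once the rolling log grows past a budget, fold everything but the
# recent tail into a one-line diary summary. All thresholds are illustrative.

def summarize(messages):
    # Placeholder: a real implementation would ask a model for a prose summary.
    return f"[diary] {len(messages)} older messages condensed"

def compact(log, keep_last=50, budget=200):
    """Return (diary_entries, trimmed_log); no-op while under budget."""
    if len(log) <= budget:
        return [], log
    old, recent = log[:-keep_last], log[-keep_last:]
    return [summarize(old)], recent

log = [f"msg {i}" for i in range(300)]
diary, log = compact(log)
print(len(log), diary[0])  # -> 50 [diary] 250 older messages condensed
```

Even this toy version makes the legibility point: you know exactly which messages survive a night's sleep.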

(NB: I expect this is a temporary state of affairs while the quality gap between homemade and “professional” just isn’t that big.)

- I do use mine as a personal assistant, and I think there is a lot of potential value in this category for people like me with ADD-style brains. For whatever reason, explaining in some detail how a task should be done is often much easier for me than just doing the task (even if, objectively, there’s equal or higher effort required for the former). It therefore doesn’t do anything I _couldn’t_ do myself. But it does do stuff I _wouldn’t_ do on my own.


Right - I think email is a much better UI than Slack or WhatsApp or Discord for that reason. It forces you to write properly and explain what you want, instead of firing off a quick chat. Writing things down helps you think. And because coding harnesses like Codex are very good at interacting with their UNIX environments but are also kinda slow, email's higher latency expectations are a better fit for the underlying technology.

Any chance you might put this on GH? Sounds really interesting.

Maybe but it's so simple I'm not sure it's worth it. You can easily make your own!

What sort of tasks do you have it do for you?

Two categories: actual useful work for the company, and improving the bot's own infrastructure.

Useful work includes: bug triage, matching up external user bug reports on GitHub to the internal YouTrack, fixing easy looking bugs, working on a redesign of the website. I also want to extend it to handling the quarterly accounting, which is already largely automated with AI but I still need to run the scripts myself, preparing answers to support queries, and more work on bug fixing+features. It has access to the bug tracker, internal git and CI system as if it were an employee and uses all of those quite successfully.

Meta-work has so far included: making a console so I can watch what it's doing when it wakes up, regularly organizing its own notes and home directory, improving the wakeup rhythm, and packaging up its infrastructure to a repeatable install script so I can create more of them. I work with a charity in the UK whose owner has expressed interest in an OpenClaw but I warned him off because of all the horror stories. If this experiment continues to work out I might create some more agents for people like him.

I'm not sure it's super useful for individuals. I haven't felt any great need to treat it as a personal assistant yet. ChatGPT web UI works fine for most day to day stuff in my personal life. It's very much acting like an extra employee would at a software company, not a personal secretary or anything like that.

It sounds like our experience differs because you wanted something more controlled with access to your own personal information like email, etc, whereas I gave "Axiom" (it chose its own name) its own accounts and keep it strictly separated from mine. Also, so far I haven't given it many regular repeating tasks beyond a nightly wakeup to maintain its own home directory. I can imagine that for e.g. the accounting work we'd need to do some meta-work first on a calendar integration so it doesn't forget.


I’m doing this exact same thing in my solo SaaS company, except with Cursor’s Cloud Agents. I can kick them off from web, Slack, Linear, or on a scheduled basis, so I’m doing a lot of the same things as you. It’s just prompts on a cron, with access to some tools and skills, but super useful.

That unreliability was why I gave up on OpenClaw. I tried hard to give it very simple tasks, but it had a high failure rate. Heartbeats and RAG are light-years away from where they need to be. I'm not sure whether this can be overcome with an application layer right now, but I trust that many people are trying, and I'm eager to see what emerges in the next year. In the meantime, I know that they're working very hard on continuous learning: real-time updates to weights and parametric knowledge. It could be that in a year or so, we can all have customised models.

That would be great if that comes to fruition. Investing in a model with weights updates would be like investing in employee training, rather than just giving the same unreliable employee more and more specific instructions.

I've had a crack at this problem in Agent Kanban for VS Code (https://github.com/appsoftwareltd/vscode-agent-kanban). The core idea is that you converse with the agent in a markdown task file in a plan, todo, implement flow, and I have found that works really well for long-running complex tasks; I use this tool every day. But after a while, the agent just forgets to converse in the task file. The only way to get it to (mostly) reliably converse in the task file is to reference the task file and instructions in AGENTS.md. There is support for git worktrees and for skipping commits of the agents file, so as not to pollute it with the specific task info. There is also an option for working without worktrees, but in this flow I had to add chat participant "refresh" commands to help the agent keep its instructions fresh in context. It's a problem that I believe will slowly get better as better agents appear and get cheaper to use, because general LLM capability is the key differentiator at the moment.

I built the AS Notes extension for VS Code (https://asnotes.io) partly because I wanted to be able to write my notes with the support of other VS Code extensions, and because of the agent harness options in VS Code (Copilot etc). The key thing for easy Zettelkasten management is really good wikilink support in markdown. AS Notes supports nested wikilinking and automatic index updating on rename, etc.

This looks great. Building right into the editor looks like a solid way to go. I built "Agent Kanban" (an extension) for VS Code to enforce a similar "plan, tasks, implement" flow as you describe. That flow is really powerful for getting solid agentic coding results. My tool went the route of encouraging the model by augmenting AGENTS.md and having the Kanban task file be markdown that the user and agent converse in (with some support for git worktrees, which helps when running multiple sessions in parallel): https://www.appsoftware.com/blog/introducing-vs-code-agent-k...


I just posted something similar but with Obsidian Kanban plugin .md files:

https://news.ycombinator.com/item?id=47659511


It has always surprised me that YouTube, being owned by the world's leading search company, has such awful on-site search. I've always left YouTube and searched for YouTube videos via Google search, which brings up better results!


Hostile engineering


Maybe, I'm not clear what the goal is though.


I guess YouTube doesn't really have any competition, i.e. it's not like you're going to switch to a competitor video platform and search there. Your only option is to watch through multiple other videos before finding the one you want, which is great for them.

