What you've experienced is different from what was originally mentioned though. Even with the best human developers, you can't provide a normal natural language prompt and get back the exact code you would have written, because natural language has ambiguities and the probability that the other person (or LLM) will resolve all of them exactly as you would approaches zero.
Collaborating with someone/something else via natural language in a programming project inherently trades control for productivity (or the promise of it). That tradeoff can be worth it depending on how much productivity you gain and how competent the collaborator is, but it can't be avoided.
Ah, the old "you suck at prompting" angle again, isn't it? If you're going to shill this hard, at least come up with something new and original; this is sounding nothing short of desperate.
Most people suck at playing the piano. Most people suck at prompting coding agents. If you practice either of those things you'll get better at them.
I really don't understand the "stop telling me I'm holding it wrong" argument. You probably are holding it wrong!
Is this born out of some weird belief that "AI" is meant to be science fiction technology that you don't ever need to learn how to use?
That would help explain why conversations like this are full of people who claim to get great results and other people who say every time they've tried it the results have been terrible.
> I really don't understand the "stop telling me I'm holding it wrong" argument. You probably are holding it wrong!
I can't speak for others, but from my end it really seems like there's no actual way to detect whether someone is holding it right or wrong until after the results are known. If someone is enthusiastic about LLMs, we don't see claims that they're holding it wrong. It's only if an LLM project fails, or someone tries them and concludes they don't work as well as proponents say, that the accusations come out, even if the person in question had been using these tools for a long time and had previously been a supporter. This makes it seem like "holding it wrong" is a post hoc justification for ignoring evidence that would tend to contradict the pro-LLM narrative, not a measurable fact about someone's LLM usage.
> Most people suck at playing the piano. Most people suck at prompting coding agents. If you practice either of those things you'll get better at them.
It would be funny, if by now I weren't convinced you are pushing these false analogies on purpose. The key difference between a piano and LLMs being: the piano will produce the same sounds in response to the same sequence of keys. Every single time. A piano is deterministic. The LLMs are not, and you know it, which makes your constant comparison of deterministic with non-deterministic tools sound a bit dishonest. So please stop using these very weak analogies.
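To make the piano/LLM distinction concrete, here is a toy sketch (not a real model — the scores and function names are made up for illustration) of why sampled output varies: LLMs typically pick the next token by sampling from a temperature-scaled distribution, so the same input can yield different outputs, whereas greedy decoding at temperature 0 is as repeatable as the piano key.

```javascript
// Toy illustration: picking an index from a score list with a temperature.
// At temperature 0 the argmax always wins (deterministic, like the piano);
// at higher temperatures the pick varies from run to run.
function sample(scores, temperature, rand = Math.random) {
  if (temperature === 0) {
    // Greedy decoding: always the highest-scoring option.
    return scores.indexOf(Math.max(...scores));
  }
  // Softmax-style weighting, then a weighted random draw.
  const scaled = scores.map((s) => Math.exp(s / temperature));
  const total = scaled.reduce((a, b) => a + b, 0);
  let r = rand() * total;
  for (let i = 0; i < scaled.length; i++) {
    r -= scaled[i];
    if (r <= 0) return i;
  }
  return scaled.length - 1;
}

const scores = [2.0, 1.5, 0.5];
console.log(sample(scores, 0));   // always 0 — greedy is repeatable
console.log(sample(scores, 1.0)); // varies between runs — sampled
```

This is of course a cartoon of decoding, but it is the mechanical reason the "same keys, same sound" property doesn't hold.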
> I really don't understand the "stop telling me I'm holding it wrong" argument. You probably are holding it wrong!
Right, another weak argument. Writing English-language paragraphs is not the science you seem to imply it is. You're not the only person who has been using LLMs intensively for the last few years, and it's not like there is some huge secret to using them - after all, they use natural language as their primary interface. But that's beside the point. We're not discussing whether they are hard or easy to use. We are discussing whether I should replace the magnificent supercomputer already placed in my head by mother nature or God or Aliens or whatever you believe in, with a very shitty, downgraded version 0.0.1 of it sitting in someone's datacenter, all for the sake of sometimes cutting some corners by getting that quick awk/sed oneliner or some boilerplate code. I don't think that's a worthy tradeoff, especially when the relevant reports indicate an objective slowdown, which probably also explains the so-called LLM fatigue.
> Is this born out of some weird belief that "AI" is meant to be science fiction technology that you don't ever need to learn how to use?
No, actually it is born out of the weird belief that your sponsors have been either explicitly or implicitly promoting, now for the 4th year, in various intensities and frequencies: that LLM technology will be equal to a "country of PhDs in a datacenter". All of this based on the super weird transhumanist ideology a lot of the people directly or indirectly sponsoring your writing actively believe in. And whether you like it or not, even if you have never implied the same, you have been a useful helper by providing a more "rational"-sounding voice, commenting on the supposed incremental improvements and progress and what not.
Most people suck at falconry. If you practice at falconry you'll get better at it.
Falcons certainly aren't deterministic.
> it's not like there is some huge secret to using them - after all, they use natural language as their primary interface
That's what makes them hard to use! A programming language has like ~30 keywords and does what you tell it to do. An LLM accepts input in 100+ human languages and, as you've already pointed out many times, responds in non-deterministic ways. That makes figuring out how to use them effectively really difficult.
> We are discussing if I should replace the magnificent supercomputer already placed in my head by mother nature or God or Aliens or whatever you believe in, for a very shitty, downgraded version 0.0.1 of it sitting in someone's datacenter
We really aren't. I consistently argue for LLMs as tools that augment and amplify human expertise, not as tools that replace it.
I never repeat the "country of PhDs" stuff because I think it's over-hyped nonsense. I talk about what LLMs can actually do.
Well, falcons are not deterministic and are trained to do something in the art of falconry, yes. Still, I fail to see the analogy, as the falcon gets trained to execute a few specific tasks triggered by specific commands. Much like a dog. The human more or less needs to remember those few commands. We don't teach dogs and falcons to do everything, do we? Although we do teach specific dogs to do specific tasks in various domains. But no one ever claimed Fido was superintelligent and that we needed to figure him out better.
> That's what makes them hard to use! A programming language has like ~30 keywords and does what you tell it to do. An LLM accepts input in 100+ human languages and, as you've already pointed out many times, responds in non-deterministic ways. That makes figuring out how to use them effectively really difficult.
Well, yes and no. The problem with figuring out how to use them (LLMs) effectively is exactly caused by their inherent unpredictability, which is a feature of their architecture, further exacerbated by whatever datasets they were trained on. And since we have no f*ing clue what the glorified slot machines might pop out next, and it is not even certain, as recently measured, that they make us more productive, the logical question is: why should we, as you propose in your latest blog, bend our minds to try and "figure them out"? If they are unpredictable, that means we effectively do not control them, so what good is our effort in "figuring them out"? How can you figure out a slot machine? And why the hell should we use it for anything other than a shittier replacement for pre-2019 Google? In this state they are neither augmentation nor amplification. They are a drag on productivity and it shows - hint: the AWS December outage. How is that amplifying anything other than toil and work for the humans?
I've found that using LLMs has had a very material effect on my productivity as a software developer. I write about them to help other people understand how I'm getting such great results and that this is a learnable skill that they can pick up.
I know about the METR paper that says people over-estimate the productivity gains. Taking that into account, I am still 100% certain that the productivity gains I'm seeing are real.
The other day I knocked out a custom macOS app for presenting web-pages-as-slides in Swift UI in 40 minutes, complete with a Tailscale-backed remote presenter control interface I could run from my phone. I've never touched Swift before. Nobody on earth will convince me that I could have done that without assistance from an LLM.
(And I'm sure you could say that's a bad example and a toy, but I've got several hundred more like that, many of which are useful, robust software I run in production.)
That's beside my point. You are trading code quality for LoC. You're not onto some big secret here - I've also built complete fullstack web applications with LLMs, complete with ORM data models and payment integrations. The issue being... the LLMs will often produce the laziest code possible, such as putting the Stripe secret directly into the frontend for anyone with two neurons in their brain to see... or mixing up TS and JS code... or suggesting an outdated library version... or, for the thousandth time, ignoring the auth functions we already implemented in the backend and instead adding session authentication all over again in the expressjs handlers... etc. We all know how to "knock out" major applications with them. Again, you are not sitting on a big secret that the rest of us have yet to find out. "Knocking out" an application with an LLM is something most of us have done several times over the last few years, most of them not being toy examples like yours. The issue is the quality of the code and the question of whether the effort we have to put into controlling the slot machine is worth it.
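To be concrete about the Stripe mistake: the guard-rail the LLMs keep skipping is an explicit allow-list between server config and whatever gets shipped to the browser. A minimal sketch (key values, names, and the `buildClientConfig` helper are all hypothetical placeholders, not any real app's code):

```javascript
// Sketch of the guard-rail: only the publishable key ever reaches the
// browser; the secret key stays server-side. Placeholder values only.
const SERVER_ENV = {
  STRIPE_SECRET_KEY: "secret_key_placeholder",     // server-only, never shipped
  STRIPE_PUBLISHABLE_KEY: "pk_test_placeholder",   // safe for the client
};

function buildClientConfig(env) {
  // Explicit allow-list: nothing leaks into the frontend bundle by default.
  // Adding a field here should be a deliberate, reviewed decision.
  return { stripePublishableKey: env.STRIPE_PUBLISHABLE_KEY };
}

const clientConfig = buildClientConfig(SERVER_ENV);
const leaked = JSON.stringify(clientConfig).includes("secret_key");
console.log(leaked); // false — the secret never appears client-side
```

The point isn't that this is hard to write; it's that an allow-list like this is exactly the kind of boring discipline the "knocked out in 40 minutes" workflow tends to drop on the floor.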
Part of the argument I'm developing in my writing here is that LLMs should enable us to write better code, and if that's not happening we need to reevaluate and improve the way we are putting them to use. That chapter is still in my drafts.
> Again you are not sitting on a big secret that the rest of us have yet to find out. "Knocking out" an application with an LLM most of us have done several times over the last few years, most of them not being toy examples like yours.
That's still a very tiny portion of the software developer population. I know that because I talk to people - there is a desperate need for grounded, hype-free guidance to help the rest of our industry navigate this stuff and that's what I intend to provide.
The hardest part is exactly what you're describing here: figuring out how to get great results despite the models often using outdated libraries, writing lazy code, leaking API tokens, messing up details etc.
> Part of the argument I'm developing in my writing here is that LLMs should enable us to write better code, and if that's not happening we need to reevaluate and improve the way we are putting them to use. That chapter is still in my drafts.
So you see, after so much hype and so many hard and soft promotion efforts (I count your writing in the latter category), you'd think it should not be "us" figuring it out - should it not be the people who are shoving this crap down our throats?
> That's still a very tiny portion of the software developer population. I know that because I talk to people - there is a desperate need for grounded, hype-free guidance to help the rest of our industry navigate this stuff and that's what I intend to provide.
That's a very arrogant position to assume. On the one hand, there is no big secret to using these tools, provided you can express yourself at all in written language. On the other, some people - I suspect mostly those who wandered into this profession as "coders" in recent years from other, less well-paid disciplines, lacking a basic understanding of computers and motivated purely extrinsically, by money - may treat these tools as wonder oracles and may be stupid enough to think the problem is their "prompting" and not the inherent unreliability of LLMs. But everyone else, that is, those of us who understand computers at a somewhat deeper level, do not want to fix Sam's and Dario's shit LLMs. These folks promised us no less than superintelligent systems, doing this, doing that, curing cancer, writing all the code in 6 months (or is it now 5 months already), creating a society where "work is optional", etc. So again - where TF is all of this stuff promised by the people sponsoring your soft promotion of LLMs? Why should we develop a dependence on tools built by people who obviously don't know WTF they are talking about and who have been fundamentally wrong on several occasions over the past few years? Whatever you are trying to do, whether you honestly believe in it or not, I am afraid it is a fool's errand at best.
> you'd think it should not be "us" figuring it out - should it not be the people who are shoving this crap down our throats?
If they're "shoving this crap down our throats" why should we expect them to help here?
More to the point: a consistent pattern over the last four years has been that the AI labs don't know what their stuff can do yet. They will openly admit that. They have clearly established that the best way to find out what models can do is to put them out into the world and wait to hear back from their users.
> That's a very arrogant position to assume - on the one hand there is no big secret to using these tools provided you can express yourself at all in written language. However some people for various reasons, I suspect mostly those who wandered into this profession as "coders" in the last years from other, less-paid disciplines, and lacking in basic understanding of computers
I can't take you calling me "arrogant" seriously when in the very next breath you declare coding agents trivial to use and suggest that anyone having trouble with them is a coder and not a proper software engineer!
A hill I will happily die on is that LLM tools, including coding agents, are deceptively difficult to use. If you accepted that was true yourself, maybe you would be able to get better results out of them.
> If they're "shoveling this crap down our throats" why should we expect them to help here?
No no no - they are not supposed "to help". They own this complete timeline of LLMs. Dario Amodei said several times over that the agents will be writing ALL CODE in 6 months. We are now at least one month into his latest instance of this promise. He also babbled a lot about "PhD-level" intelligence, just like the other ghoul at that other company. THEY are the ones who promote the supposed superintelligence creeping up on us closer each day, complete with whatever benchmarks they push out with each new release. But we should cut them some slack, accept that we are stupid for not wanting to burn our brains in multi-hour sessions with LLMs, and just try to figure it out? We should not accept explaining it away as merely some cheap "hype". These people are not some C-list celebrities. They are billionaire CEOs, running companies supposedly worth in the high hundreds of billions of dollars, making huge market-influencing statements. I expect those statements to be true. Because if they are not - and they are smart people and will know if they are pushing out untruths on purpose - well, that's just criminal behaviour. Now tell me more about how "we" should figure it out.
> A hill I will happily die on is that LLM tools, including coding agents, are deceptively difficult to use. If you accepted that was true yourself, maybe you would be able to get better results out of them.
:) No mate, please drop that "getting good results" nonsense. I have been getting good results too if I babysit them, and for the record, I have done a bit more with them than the usual model use cases. The issue for me and a lot of other people is that, with a lot of care and safeguarding and attention, yes, you can even build something to deploy in production - and my team and I have done so - but they are not worth all the babysitting, and especially not the immense mental fatigue that comes from working with them continuously over a longer time span. At the end of the day, for complex projects it's actually faster if I short-circuit my thinking machine to my code-writing executors and skip the natural language bollocks altogether (save for the original spec). Using LLMs is like putting additional friction between my brain and my hands.
The most impressive part is the remote control mechanism from my phone but yeah, it's not meant to be amazing, it's meant to be something useful that I couldn't have built myself (not knowing SwiftUI) and I knocked out in 40 minutes with Claude Code.
YMMV. I've had a lot of practice at prompting.