
What LLM text generation has shown is that you don't actually have to understand English to generate pretty decent English. You just have to have enough examples.

This is where the massive corpus of source code available on the Internet can help train an "LSM" (large software model), if you can expose the tokens as the lexer understands them in the training set.
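As a toy sketch of what "exposing the tokens as the lexer understands them" could look like, Python's stdlib `tokenize` module already produces exactly that view (Python chosen purely for illustration; the sample source line is made up):

```python
import io
import tokenize

source = "total = price * quantity  # compute cost\n"

# Emit (token_type, text) pairs the way the lexer sees them, rather than
# raw characters -- one plausible encoding for an LSM training set.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type not in (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
]

print(tokens)
# [('NAME', 'total'), ('OP', '='), ('NAME', 'price'), ('OP', '*'),
#  ('NAME', 'quantity'), ('COMMENT', '# compute cost')]
```

A model trained on sequences like this never has to rediscover where identifiers begin and end, the way a character-level model would.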

If your LSM sees a trillion examples of correct usage of lifetime and scope and types and so on, then in the same way that an LLM trained on English grammar will emit text with correct grammar as if it understands English, your LSM will generate software with correct syntax as if it understands the software. Whatever the definition of "understands" is in the context of an LLM.



But:

- natural language is flexible; computer languages are much less so.

- "pretty decent English" still includes hallucinations. I've seen companies whose product demo for generating marketing copy just makes up a plausible review. Hallucinating methods, variables, other packages/modules yields broken code.

- the human thought behind natural language is not feasible to directly provide to a model. An IR corresponding to the source of the program is feasible to provide. A trace of the program executing is feasible to provide. Grounding an LLM in the rich exterior world that humans talk about is hard; grounding an LSM in the rich internal representations accessible to an IDE or a debugger is achievable.
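To make the grounding point concrete: the hallucinated-identifier problem from the second bullet is mechanically detectable once you have the program's internal representation. A crude sketch using Python's stdlib `ast` module (the helper name and sample snippet are my own, for illustration):

```python
import ast
import builtins

def undefined_names(source: str) -> set[str]:
    """Crude check: flag names a generated snippet references but never
    defines or imports -- likely hallucinated identifiers."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            defined.add(node.name)
            defined.update(a.arg for a in node.args.args)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return used - defined

snippet = "result = frobnicate(data)"  # neither name exists anywhere
print(sorted(undefined_names(snippet)))  # ['data', 'frobnicate']
```

An IDE's symbol table does this far better, of course; the point is only that this ground truth is cheap to compute, unlike ground truth about the exterior world.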


"pretty decent english" is a pretty fuzzy bar.

Indeed, GPT-4 and Copilot can generate "pretty decent code" that will look fine to the average human coder even when it's incorrect (making up methods, getting params wrong, slightly missing requirements, or similar).

The level of precision required for "pretty decent non-trivial code" is much higher than for prose that merely looks like it was written by an educated human. So I share the idea that augmenting it - even in really stupid ways, like having Copilot ask the IDE whether a suggestion would even compile before showing it to the user - would work much better, at much lower effort, than increasing its implicit understanding by orders of magnitude.
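The "ask if it would even compile" filter is a one-liner in spirit. A minimal sketch for Python suggestions, using the built-in `compile()` as a stand-in for a real IDE/compiler round-trip (the function name and sample suggestions are made up):

```python
def syntactically_valid(candidate: str) -> bool:
    """Cheap pre-filter: reject completions that would not even parse.
    A real integration would ask the IDE or compiler; compile() is the
    stdlib stand-in for Python source."""
    try:
        compile(candidate, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

suggestions = [
    "total = sum(xs)",
    "total = sum(xs",  # unbalanced paren: discard before the user sees it
]
valid = [s for s in suggestions if syntactically_valid(s)]
print(valid)  # ['total = sum(xs)']
```

This catches only syntax, not hallucinated APIs, but it's exactly the kind of cheap external check prose generation has no equivalent of.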


> you don't actually have to understand English to generate pretty decent English. You just have to have enough examples.

I would have thought babies have been showing this beyond a doubt since time immemorial.


No, because we can't look into their skulls to figure out whether they 'understand', whatever that means.


Right. We're already abstracting from English words and characters into tokens; piping code through half a compiler so the LSM is given the AST to train on doesn't seem all that far-fetched.
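For a sense of what "half a compiler" hands you, Python's stdlib `ast` module will parse source and serialize the tree, which could then be linearized into a training sequence (the sample line is made up):

```python
import ast

source = "x = a + b"
# Parse to an AST and serialize it -- a structured alternative to raw
# characters for an LSM training set.
tree = ast.parse(source)
print(ast.dump(tree))
```

The dump names every node (`Assign`, `BinOp`, `Add`, ...) so operator precedence, scoping structure, and statement boundaries come for free instead of having to be learned from whitespace and punctuation.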



