
I'd guess it's a result of punishing repetition at the RLHF stage, to stop the model getting into the loops that Copilot etc. used to fall into so easily.
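(Not from the thread, just an illustrative sketch: one way "punishing repetition" could enter a training signal is as a reward term that docks points for repeated n-grams in a sampled completion. The function name and weighting here are hypothetical, not any lab's actual reward model.)

```python
from collections import Counter

def repetition_penalty_reward(tokens, ngram=3, weight=1.0):
    """Toy reward term that penalizes repeated n-grams in a completion.

    During RLHF, a term like this could be subtracted from the reward
    so the policy learns to avoid degenerate loops. This is a sketch,
    not a real reward-model implementation.
    """
    if len(tokens) < ngram:
        return 0.0
    # Count every n-gram; each occurrence beyond the first is a "repeat".
    counts = Counter(
        tuple(tokens[i:i + ngram]) for i in range(len(tokens) - ngram + 1)
    )
    repeats = sum(c - 1 for c in counts.values())
    return -weight * repeats

# A looping completion scores worse than a varied one:
loop = ["the", "cat", "sat"] * 3
varied = ["the", "cat", "sat", "on", "the", "warm", "mat", "today", "."]
```

Under this toy scoring, `loop` repeats its trigrams and gets a negative reward, while `varied` has all-distinct trigrams and scores 0.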


The point of the ‘temperature’ parameter is to avoid that sort of looping, but successfully training that behaviour out of the model during RLHF (instead of just raising the temperature) would seem to require the model to develop some sense of what repetition is.
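(For readers unfamiliar with the mechanism: temperature just rescales the logits before softmax sampling. A minimal sketch, using only the standard library:)

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample a token index from logits after temperature scaling.

    As temperature approaches 0 this approaches greedy decoding, which
    is where repetitive loops tend to show up; higher temperatures
    flatten the distribution, making exact repetition less likely.
    """
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

Note that this only randomizes each individual token draw; it has no notion of what was sampled earlier, which is why temperature alone can't "know" that the model is repeating itself.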

It’s one thing to be able to mimic human text, but to be able to ‘know’ what it means to repeat in general seems to be a slightly higher level of abstraction than I’d expect would just emerge.

…but maybe LLMs have developed more sophisticated models of language than I think.



