
I'd guess it's a result of punishing repetition at the RLHF stage, to stop the model getting into the loops that Copilot etc. used to fall into so easily.
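(Not from the thread, just an illustrative sketch: one way "punishing repetition" could enter a training signal is as a reward term that docks points for repeated n-grams in a sampled completion. The function name and weighting here are hypothetical, not any lab's actual reward model.)

```python
from collections import Counter

def repetition_penalty_reward(tokens, ngram=3, weight=1.0):
    """Toy reward term that penalizes repeated n-grams in a completion.

    During RLHF, a term like this could be subtracted from the reward
    so the policy learns to avoid degenerate loops. This is a sketch,
    not a real reward-model implementation.
    """
    if len(tokens) < ngram:
        return 0.0
    # Count every n-gram; each occurrence beyond the first is a "repeat".
    counts = Counter(
        tuple(tokens[i:i + ngram]) for i in range(len(tokens) - ngram + 1)
    )
    repeats = sum(c - 1 for c in counts.values())
    return -weight * repeats

# A looping completion scores worse than a varied one:
loop = ["the", "cat", "sat"] * 3
varied = ["the", "cat", "sat", "on", "the", "warm", "mat", "today", "."]
```

Under this toy scoring, `loop` repeats its trigrams and gets a negative reward, while `varied` has all-distinct trigrams and scores 0.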


The point of the ‘temperature’ parameter is to avoid that sort of looping, but successfully training that behaviour out of the model during RLHF (instead of just raising the temperature) would seem to require the model to develop some sense of what repetition is.
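(For readers unfamiliar with the mechanism: temperature just rescales the logits before softmax sampling. A minimal sketch, using only the standard library:)

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample a token index from logits after temperature scaling.

    As temperature approaches 0 this approaches greedy decoding, which
    is where repetitive loops tend to show up; higher temperatures
    flatten the distribution, making exact repetition less likely.
    """
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling.
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

Note that this only randomizes each individual token draw; it has no notion of what was sampled earlier, which is why temperature alone can't "know" that the model is repeating itself.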

It’s one thing to be able to mimic human text, but to be able to ‘know’ what it means to repeat in general seems to be a slightly higher level of abstraction than I’d expect would just emerge.

…but maybe LLMs have developed more sophisticated models of language than I think.



