Hacker News

> political press photographers aren't hired for originality!

Yes, they almost definitely are. What else would they be hired for? You wouldn't watch the same news every day (even if it feels that way).

> the number of training epochs has dropped dramatically

So? Let's imagine a documented neurodivergent person (say, one with a photographic memory) who "snapshots" a famous work and then sits in a room, after five years of training, to produce "some work". Are they free of copyright claims? Have you not seen the neurodivergent artist who draws the entire New York skyline from one helicopter ride?

Your approach seems nonsensical: 1 epoch or 10,000 epochs, it doesn't matter. It's still a copy / derivative work (which may exempt it, fair enough).

Personally I think all current AI work is just copyright infringement on steroids: DALL-E outputs Shutterstock logos given the right input, and GitHub's Copilot is just a lawsuit waiting to happen (assuming anyone has the funds to actually sue Microsoft these days; the T&Cs of GitHub must be incredibly broad).



> Yes they almost definitely are, what else would they be hired for?

Craftsmanship. They have subtle, but probably very good, intuitions about lighting, facial expressions, etc., and also a lot of concrete knowledge about these things. All of which an algorithm can pick up by examining thousands of images once each.

Once or many times matters, because if you only examine each data point once, there can be no overfitting in the traditional sense. And that's what shocks me about the "one epoch is all you need" results.


> Once or many times matters, because if you only examine each data point once, there can be no overfitting in the traditional sense.

What makes you so sure? If my algorithm were literally just storing stuff in a hash table to look up later, you'd get overfitting from a single exposure.
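To make the point concrete, here's a minimal sketch of that degenerate case (the class and names are my own illustration, not anything from the thread): a "model" that memorizes training pairs in a dict achieves perfect recall on its training set after seeing each point exactly once, while generalizing to nothing.

```python
class HashTableModel:
    """A 'learner' that just memorizes its training data."""

    def __init__(self):
        self.memory = {}

    def fit(self, X, y):
        # One pass over the data: each point is seen exactly once.
        for xi, yi in zip(X, y):
            self.memory[xi] = yi

    def predict(self, x, default=None):
        # Perfect recall on training inputs, no generalization otherwise.
        return self.memory.get(x, default)


model = HashTableModel()
model.fit([1, 2, 3], ["a", "b", "c"])
print(model.predict(2))   # memorized exactly, despite a single exposure
print(model.predict(4))   # unseen input: the model knows nothing
```

Zero training error, arbitrary test error, one epoch: "traditional" overfitting without any repeated exposure.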


Well, gradient descent doesn't do that. And the models, while big in terms of parameter count, are not nearly big enough to actually store all the training data.

Think of it in terms of updating beliefs about the target distribution. With backpropagation, you predict based on the input and update your beliefs according to how wrong you were. So in a sense it's unsound to re-use data: your beliefs already incorporate it! And traditional overfitting is exactly that; it's when you use up all the information in your training data. This was many people's objection to neural nets. I thought it was a good objection at the time, and believed the future lay with more "sound" methods, which performed better on most metrics back then, rather than with dodgy biomimicry that wasn't really even similar to biological brains at all.
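The "update your beliefs once per data point" view can be sketched with single-epoch SGD on a toy one-parameter linear model (all numbers and names here are illustrative assumptions, not anything from the thread): each example contributes exactly one gradient step, so every update uses information the current belief has not yet absorbed.

```python
import random

random.seed(0)

# Toy data drawn from y = 2x + noise; the model's "belief" is the weight w.
xs = [random.uniform(-1.0, 1.0) for _ in range(1000)]
data = [(x, 2.0 * x + random.gauss(0.0, 0.1)) for x in xs]

w = 0.0    # initial belief about the slope
lr = 0.1   # learning rate

# One epoch: each point is seen exactly once, then discarded.
for x, y in data:
    error = w * x - y
    w -= lr * error * x   # gradient of squared error with respect to w

# After a single pass, w sits close to the true slope of 2.0.
print(w)
```

A second epoch would revisit points the belief already incorporates, which is where the "unsound to re-use data" intuition, and classical overfitting, comes in.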

But yes, there are other types of overfitting if you want to get philosophical about it. It's just that the one I and everyone else used to worry about, the kind that comes from training too many times on your data, just isn't important anymore. And most of those clever principled and less-principled regularization methods don't matter anymore either!




