Maybe this is what Altman was less than candid about: that the speed-up was bought by throwing RAG into the mix. Finding an answer is easier than generating one from scratch.
I don’t know if this is true. But I haven’t seen an LLM spit out 50-token sequences of training data. By definition (an LLM as a “compressor”), this shouldn’t happen.
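For what it’s worth, “spit out 50-token sequences of training data” is checkable: the usual test is whether a contiguous 50-token window of model output also appears verbatim in a known corpus. A minimal sketch (the corpus and output strings here are toy placeholders, and `k` is shrunk so the match is visible):

```python
def has_verbatim_run(output_tokens, corpus_tokens, k=50):
    """Return True if any k-token window of the output appears
    contiguously in the corpus (i.e. a verbatim memorized run)."""
    corpus_windows = {
        tuple(corpus_tokens[i:i + k])
        for i in range(len(corpus_tokens) - k + 1)
    }
    return any(
        tuple(output_tokens[i:i + k]) in corpus_windows
        for i in range(len(output_tokens) - k + 1)
    )

# Toy demo with k=5 instead of 50 so the idea fits in a few words:
corpus = "the quick brown fox jumps over the lazy dog".split()
output = "he said the quick brown fox jumps and left".split()
print(has_verbatim_run(output, corpus, k=5))  # True: a 5-word run matches
```

Real evaluations tokenize with the model’s tokenizer and index the corpus with something faster than a set of tuples (e.g. a suffix array), but the criterion is the same.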
TBH, I thought this attack was well known. A couple of months ago, someone demonstrated that prompting ChatGPT with very long runs of "a a a a a a" gets it to start spewing raw training data.
Which data you get is fairly random, and it is likely mixing different sources to some degree as well.
Oddly, other online LLMs do not seem to be as easy to fool.
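The prompt side of that attack is nothing more than one token repeated at length; the interesting part is what the model does after enough repetitions. A sketch of building such a prompt (the token choice and repeat count here are arbitrary; reported attacks varied both, and the behavior has since been patched in ChatGPT):

```python
# Build a repeated-single-token prompt of the kind described above.
token = "a"
repeats = 2500  # the published examples used very long repetitions

prompt = " ".join([token] * repeats)

print(len(prompt.split()))  # 2500 copies of the token
```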
> Model capacity. Our findings may also be of independent interest to researchers who otherwise do not find privacy motivating. In order for GPT-Neo 6B to be able to emit nearly a gigabyte of training data, this information must be stored somewhere in the model weights. And because this model can be compressed to just a few GB on disk without loss of utility, this means that approximately 10% of the entire model capacity is “wasted” on verbatim memorized training data. Would models perform better or worse if this data was not memorized?
- They don’t do compression by “definition”. They are designed to predict, and prediction is central to information theory, so the two end up with similar properties.
- Everyone wants their model to generalize rather than copy data, but overfitting happens sometimes, and overfitting can look the same as copying.
> By definition (an LLM as a “compressor”) this shouldn’t happen.
A couple problems with this.
1) That's not the definition of an LLM; it's just a useful way to think about one.
2) That is exactly what I'd expect a compressor to do: reproducing the original is the entire job of lossless compression.
Of course the metaphor is lossy compression, not lossless. But it's not that surprising if lossy compression reproduces some pieces of what it compressed. A JPEG doesn't get every pixel, or every local group of pixels, wrong.
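The lossless half of that distinction is easy to demonstrate with a stock compressor: the round trip must be bit-exact, and repetitive input (much like repeated phrases in a training set) compresses extremely well.

```python
import zlib

# Lossless compression: the decompressed output must equal the input exactly.
data = b"the same sentence repeated over and over " * 100

compressed = zlib.compress(data)
restored = zlib.decompress(compressed)

print(len(data), len(compressed))  # highly repetitive input shrinks a lot
print(restored == data)            # True: exact, verbatim reproduction
```

A lossy codec like JPEG would instead return an approximation, which is the sense in which the metaphor says an LLM “mostly” shouldn’t regurgitate verbatim text.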