Hacker News

Maybe this is what Altman was less than candid about: that the speed-up was bought by throwing RAG into the mix. Finding an answer is easier than generating one from scratch.

I don’t know if this is true. But I haven’t seen an LLM spit out 50-token sequences of training data. By definition (an LLM as a “compressor”) this shouldn’t happen.



TBH, I thought this attack was well known. I think it was a couple of months ago that someone demonstrated using "a a a a a a" in very large sequences to get ChatGPT to start spewing raw training data.

Which set of data you get is fairly random, and it is likely mixing different sets to some degree as well.

Oddly, other online LLMs do not seem to be as easy to fool.


>Model capacity. Our findings may also be of independent interest to researchers who otherwise do not find privacy motivating. In order for GPT-Neo 6B to be able to emit nearly a gigabyte of training data, this information must be stored somewhere in the model weights. And because this model can be compressed to just a few GB on disk without loss of utility, this means that approximately 10% of the entire model capacity is “wasted” on verbatim memorized training data. Would models perform better or worse if this data was not memorized?


No, it can easily happen.

- They don’t do compression by “definition”. They are designed to predict; prediction is key to information theory, so the two just end up with similar qualities.

- Everyone wants their model to learn, not copy data, but overfitting happens sometimes and overfitting can look the same as copying.
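The prediction/compression link can be made concrete: a model that assigns probability p to the next symbol needs about -log2(p) bits to encode it (e.g. via an arithmetic coder), so a better predictor is a better compressor. A toy sketch with a bigram character model (everything here is illustrative, not how any real LLM is coded):

```python
import math
from collections import defaultdict

def train_bigram(text):
    """Count next-character frequencies for each preceding character."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts

def code_length_bits(model, text, vocab):
    """Ideal code length under the model: sum of -log2 p(next | prev).
    Laplace smoothing keeps probabilities nonzero for unseen pairs."""
    total = 0.0
    for a, b in zip(text, text[1:]):
        ctx = model[a]
        p = (ctx[b] + 1) / (sum(ctx.values()) + len(vocab))
        total += -math.log2(p)
    return total

text = "the cat sat on the mat. the cat sat on the mat."
vocab = set(text)
model = train_bigram(text)
bits = code_length_bits(model, text, vocab)
# A model with no predictive power would spend log2(|vocab|) bits per character.
uniform_bits = (len(text) - 1) * math.log2(len(vocab))
print(f"predictive model: {bits:.1f} bits, uniform: {uniform_bits:.1f} bits")
```

The better the prediction, the smaller the gap to the data's entropy, which is exactly the sense in which "LLM as compressor" is a useful mental model rather than a definition.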


> and overfitting can look the same as copying

Is there really any difference?


Copied data vs an overfit model?

A little like random number generation vs data corruption…

Output may look the same, but one is done on purpose and one means your system is going to crap.
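The overlap is easy to demonstrate: an over-capacity model fit on too little data degenerates into a lookup table, and "generation" becomes playback. A toy sketch with a high-order character Markov model (names and corpus are illustrative):

```python
from collections import defaultdict

def fit(text, order):
    """Map each length-`order` context to the characters that follow it."""
    nxt = defaultdict(list)
    for i in range(len(text) - order):
        nxt[text[i:i + order]].append(text[i + order])
    return nxt

def generate(nxt, seed, order, n):
    """Greedy decoding: always emit the most common continuation."""
    out = seed
    for _ in range(n):
        cands = nxt.get(out[-order:])
        if not cands:
            break
        out += max(set(cands), key=cands.count)
    return out

corpus = "the quick brown fox jumps over the lazy dog"
# With contexts long enough that each one occurs only once in the training
# data, the fitted model has memorized the corpus rather than learned from it.
model = fit(corpus, order=8)
reconstructed = generate(model, corpus[:8], order=8, n=len(corpus))
print(reconstructed == corpus)  # → True: training data reproduced verbatim
```

Nothing in the code "copies" on purpose; verbatim output falls out of overfitting, which is the commenter's point that intent, not output, is the difference.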


> By definition (an LLM as a “compressor”) this shouldn’t happen.

A couple problems with this.

1) That's not the definition of an LLM, it's just a useful way to think about it.

2) That is exactly what I'd expect a compressor to do. That's the exact job of lossless compression.

Of course the metaphor is lossy compression, not lossless. But it's not that surprising if lossy compression reproduces some piece of what it compressed. A jpeg doesn't get every pixel or every local group of pixels wrong.
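The same effect shows up in the simplest lossy scheme, uniform quantization: the reconstruction is wrong on average, yet plenty of individual values survive the round trip exactly. A minimal sketch (integer data chosen to avoid float noise):

```python
def quantize(xs, step):
    """Lossy 'compression': snap each value to the nearest multiple of step."""
    return [round(x / step) * step for x in xs]

data = [10, 23, 30, 47, 50, 61, 70]
restored = quantize(data, step=10)
# Values that already sat on the quantization grid come back bit-exact.
survivors = [a for a, b in zip(data, restored) if a == b]
print(survivors)  # → [10, 30, 50, 70]
```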


>By definition (an LLM as a “compressor”) this shouldn’t happen.

It depends on how lossy the compression is?


RAG: retrieval augmented generation
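For anyone unfamiliar, the idea is to retrieve relevant documents at query time and prepend them to the prompt, so the model answers from the retrieved text rather than purely from its weights. A minimal sketch; the keyword retriever and all names are illustrative (real systems use embedding similarity search):

```python
def retrieve(query, docs, k=1):
    """Toy retriever: rank documents by word overlap with the query."""
    q = set(query.lower().replace("?", "").split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, docs):
    """Prepend retrieved context so the model can answer from it."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The sky appears blue because of Rayleigh scattering.",
    "RAG stands for retrieval augmented generation.",
]
prompt = build_prompt("What does RAG stand for?", docs)
print(prompt)
```

The prompt would then be sent to the model as usual; the generation step is unchanged, which is why retrieval can be bolted onto an existing model.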


Uh, he said right at DevDay that Turbo was updated using cached data in some fashion, and that's how they updated the model to 2023 data.


> That the speed up was bought by throwing RAG into the mix.

Sorry, what? TFA does not mention RAG at all. Are you reading your own biases into this, or did I miss something?


At the very least, it demonstrates another difference between Altman's move-fast camp and the move-carefully camp.



