Really? I retain plenty of copyrighted material in my head. What matters is the contexts in which I reproduce it (if any).

A search index might also contain copyrighted material. As long as it's used for search queries as opposed to regurgitation there's no problem. Search indexes and LLMs are both clearly very beneficial tools to have access to.

Reproduce it. Sit in a clean room and write it all out. Then go check your accuracy. I'm curious to see what it is.

What does this (thought) experiment accomplish? That is, what point are you trying to make here?

Since we're talking about an electronic system, the search index example is the more directly relevant one. Anyone who wants to object to LLMs needs to take care to ensure consistency with their views on Google's search index.


I wasn't aware I could read 95% of Harry Potter through constructed queries using Google's search index. Can you demonstrate how I might do this?

Also can you point out how copyright law changes because we're using an "electronic system" as opposed to an "analog system?"


You could do the equivalent if they would let you. They don't. That's the point I was getting at. How the thing is used is what actually matters, not that it has "absorbed" copyrighted material.

I never claimed any change in copyright law. Only that one analogy was more direct than the other for the purpose of the current discussion.

You didn't answer my question. What point were you trying to make with your earlier reply?


Are you a for profit product?

Professional performers could certainly be viewed as such in this analogy. They memorize and then reproduce copyrighted material as a matter of course.

And when they do is when copyright protections might come into play. But not the basic learning of being a human being.

My playing copyrighted music on my synths at home, or singing lyrics along are different than if I am a professional musician benefiting financially from playing someone else's music in public.

Producing a product = market rules apply
Just living as a human = totally different thing


Yes, I agree. That was my entire point when I said: "What matters is the contexts in which I reproduce it (if any)."

The issue is not (or at least should not be) that LLMs are trained on material subject to copyright or can be very intentionally coaxed into regurgitating copyrighted material. The issue should be people building or using systems with the explicit intent of reproducing copyrighted material in an unauthorized manner.


If an LLM is a product, and it contains the work (in this case can spit out Harry Potter) it is derivative. Doesn't matter what it's used for.

> If an LLM is a product, and it contains the work (in this case can spit out Harry Potter) it is derivative. Doesn't matter what it's used for.

That's not the definition of a derivative work in copyright law. Further, even where something legally qualifies as a derivative work, whether it falls within the scope of the copyright holder's exclusive rights is, in the US, subject to the statutory exceptions to those rights, notably the fair use exception, which very much does depend on, among other things, what it is used for.


That's dogma on your part. Rather than practical outcome you're opting for human exceptionalism. I can't accept that.

Merely containing a work doesn't make something derivative. A photograph could inadvertently capture a copyrighted image in the background but so long as it isn't the primary focus I think your line of reasoning there fails.


TIL that the law is dogma.

I'm opting for the law differentiating between a product and a person.

'We trained our model on Harry Potter and somehow Harry Potter got into our model' is a ridiculous defense.


It is your view that's dogmatic. The law in this area has yet to be fully tested in court, let alone any prospective changes that might be made to it in the near future.

Regardless, I thought this was a discussion about what the law ought to say.

The defense is that the model is not designed to output Harry Potter verbatim, and in fact will not unless you jump through lots of hoops. Image generation would probably provide you with a stronger position here since those setups can easily output likenesses without needing to carefully engineer the prompt to cause them to do so. But even then it is clearly not the intention of the people training or deploying them that they be used that way.



