
I'm not a copyright expert, but I would have imagined the answer is relatively straightforward: if an image would be infringing if drawn by a human, it's also infringing if drawn by a model, and similarly for non-infringing. Does the manner in which the image is generated play into whether it infringes a copyright?


> Does the manner in which the image is generated play into whether it infringes a copyright?

Yes, it does. Let's leave AI out for a moment.

If you lock yourself into your room with no Internet access, and you draw (or write etc) something independently that just happens to look exactly like some already existing copyrighted work, you are still not infringing any copyrights.

If you produced the same work by making a copy, you would infringe copyright.

See https://en.wikipedia.org/wiki/Clean_room_design for how that's relevant in real life.

However, you can infringe on patents and trademarks with independent work, i.e. even if you don't just make a copy.


AI isn't locked in a room. No end user can verify the sourced training images, so anything coming close to an existing work should be assumed to have been part of the source material for the derivative.


Oh but you can. I've scanned the LAION training data set (which was used for mini DALLE, Imagen, and Disco Diffusion) and I recognized several images that belong to business partners of mine.

How does an AI know what Obama looks like? It memorized thousands of images of him, most of which were by professional photographers and, hence, copyrighted.


Well, it probably didn't memorize any single one of them. One thing that has changed dramatically from back when I studied AI (a whole 5ish years ago, sigh...), to my shock, is that the number of training epochs has dropped dramatically. Sometimes down to just one. If it has only ever looked at your image and updated its weights once, could that really be enough to copy it? Only if virtually everything in your image is also present in thousands of other images, which would suggest there was very little originality in your photo to plagiarize. (This is certainly the case with Obama's press photographers - political press photographers aren't hired for originality!)


This is a very interesting thought. If the image is very generic, IMO it should not be copyrightable; it should be public domain.


> political press photographers aren't hired for originality!

Yes, they almost definitely are. What else would they be hired for? You wouldn't watch the same news every day (even if it feels that way).

> the number of training epochs has dropped dramatically

So? Let's imagine a documented neurodiverse person (let's say one with a photographic memory) "snapshots" a famous work and then sits in a room, after 5 years of training, to produce "some work". Are they free of copyright claims? Have you not seen the neurodiverse person who draws the entire New York skyline from 1 helicopter ride?

Your approach seems nonsensical. 1 epoch or 10,000 epochs, it doesn't matter - it's still a copy / derivative work (which may exempt it, fair enough).

Personally I think all current AI work is just copyright on steroids. DALL-E outputs Shutterstock logos given the right input, and GitHub's Copilot is just a lawsuit waiting to happen (implying anyone has the funds to actually sue Microsoft these days; the T&Cs of GitHub must be incredibly broad).


> Yes they almost definitely are, what else would they be hired for?

Craftsmanship. They have subtle, but probably very good, intuitions about lighting, facial expressions etc., and also a lot of concrete knowledge about these things. All of which an algorithm can pick up on by examining thousands of images once each.

Once or many matters because if you just examine each data point once, there can be no overfitting in the traditional sense. And that's what shocks me about the "one epoch is all you need" results.


> Once or many matters because if you just examine each data point once, there can be no overfitting in the traditional sense.

What makes you so sure? If my algorithm was literally just storing stuff in a hashtable to look up later, you'd get overfitting from a single exposure.
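To make that point concrete, here is a toy sketch (hypothetical, not any real training setup): a "model" that is literally a hashtable achieves perfect recall after a single exposure to each example, which is maximal overfitting from one pass.

```python
# Hypothetical sketch: a "model" that memorizes via a hashtable
# overfits maximally after seeing each training example exactly once.

class HashtableModel:
    def __init__(self):
        self.memory = {}  # input -> output, stored verbatim

    def train(self, x, y):
        # A single exposure is enough to "memorize" the example.
        self.memory[x] = y

    def predict(self, x):
        # Perfect recall on training data, zero generalization.
        return self.memory.get(x, None)

model = HashtableModel()
model.train("photo_of_obama.jpg", "pixel-perfect copy")
print(model.predict("photo_of_obama.jpg"))  # exact training output
print(model.predict("unseen_photo.jpg"))    # None: nothing learned
```

The hashtable is an extreme case, of course; the question is how close a parameterized model gets to this behavior in practice.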


Well gradient descent doesn't do that. And the models, while big in terms of parameter data, are not nearly big enough to actually store all the training data.

Think of it in terms of updating beliefs about the target distribution. With backpropagation, you predict based on the input, and update your beliefs according to how wrong you were. So in a sense it's unsound to re-use data - your beliefs already incorporate them! And traditional overfitting is all that - it's when you use up all the information in your training data. This was many people's objection to neural nets (and I thought it was a good objection at the time, and thought myself that the future lay with more "sound" methods, which performed better on most metrics anyway at the time, rather than with dodgy biomimicry which wasn't really even similar to biological brains at all).
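The predict-then-update loop described above can be sketched in a few lines (a minimal hypothetical example on a linear model, not any production training code): each example is seen exactly once, and the weights absorb a small correction from it rather than storing it verbatim.

```python
# Hypothetical sketch of "predict, then update by how wrong you were":
# one pass of stochastic gradient descent on a linear model y = w*x + b.
# Every data point is used exactly once (a single epoch).

def sgd_one_epoch(data, lr=0.1):
    w, b = 0.0, 0.0
    for x, y in data:          # single pass over the data
        pred = w * x + b       # predict from current beliefs
        err = pred - y         # how wrong were we?
        w -= lr * err * x      # nudge the weights toward the target
        b -= lr * err
    return w, b

# Fit y = 2x from a handful of examples, each seen once.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0, 5.0]]
w, b = sgd_one_epoch(data)
print(w, b)  # w has moved most of the way toward 2 after one pass
```

Note that after one pass the weight is close to, but not exactly, the true value: the model has extracted some information from each example without reproducing any of them.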

But yes, there are other types of overfitting if you want to get philosophical about it. It's just that the one I and everyone used to worry about, from training too much on your data, just isn't important anymore. And most of those clever principled and less-principled regularization methods just don't matter anymore!


I think the problem is with the definition of copyright. It doesn't apply to the current environment. It could even be that it never made sense in the first place, and we are only now realizing it.

I have never seen Obama in real life. If I can draw him because I know him from copyrighted works, is my drawing copyright infringement?


Are you copying one or more of those works? If so, yes. If you aren't, then no. How do you know? Well as an artist, after thousands of hours of copying, and thousands of hours of drawing from imagination, you know which activity you are doing. As an untrained person, you might not know, but the quality/value is so low no one cares.


But drawing from imagination is just copying from an abstraction existing in your brain. And that abstraction comes from inputs from the outside that you saw in the past, where would it come from if not that?

In the Obama example it comes from copyrighted inputs. Without the photos in the magazines and newspapers I would have no idea what Obama looks like. Any drawing I do is going to be derivative of those copyrighted photos and, maybe, other more generic inputs that I saw in the past (which could be copyrighted too).

How is that different from what that software is doing? It's not; it's just done on an industrial scale.

This is just the same problem an artisan had when factories started to appear.


You are using the words 'input', 'output' and 'abstraction' interchangeably between the computer version and the brain version. We know next to nothing about what those words mean regarding the brain. Declaring an equivalence seems like a substantial overreach. What we do do is use the terms from the latest technology to describe the brain[0]. What's particularly egregious about this iteration is that we are also using human terms to describe the latest technology, and then using those terms 'inspired by', 'creative', 'intelligence', 'learning' as if the algorithms have anything to do with what humans are doing. And then, even more, we go on to use them as a smokescreen for piracy: "the computer 'learned', therefore it's somehow not copying the source material".

Publishers license text to print in books. Music producers license samples to use in songs. Artists license photos and textures to use in illustrations and scenes. _License the source material used in training sets_. _Then_ go out and industrialize art, everyone wins!

[0] https://www.cl.cam.ac.uk/~jgd1000/metaphors.pdf


How does a human know what Obama looks like? They memorized thousands of images[1] of him, most of which were by professional photographers and, hence, copyrighted.

At what point is a neural network sophisticated enough that it can be compared to human learning for copyright purposes?

[1]: For most people at least. Some will have actually seen him in person, and for those this statement would of course not be true.


> AI isn't locked in a room.

Humans aren't in a locked room either. Unless you put them there. Same for AI.

To say the same thing with less snark: in the future there might be a market for AIs trained on 'clean room' data.


Wait, if you draw Pikachu or Homer Simpson from your memory, even without access to internet, you are still infringing the rights of the holders of the characters. How is it different than OpenAI or Midjourney drawing pictures of Homer Simpson from their database/"memory" ?


Drawing Pikachu or Homer Simpson from memory is more likely to violate trademarks than copyrights, I would guess?


Characters are different, as recognisable characters have copyright on their own (not all characters do). But let's say you're drawing a painting from memory, you may or may not be infringing copyright. The question is about substantial copying.


> If you lock yourself into your room with no Internet access, and you draw (or write etc) something independently that just happens to look exactly like some already existing copyright-ed work, you are still not infringing any copyrights.

That's not true.


Please see this as a long-winded and approximate description of clean room development (https://en.wikipedia.org/wiki/Clean_room_design) which is definitely a thing.


The general consensus in the IP space (in the US at least) is to treat AI no differently than any other tool.


> Does the manner in which the image is generated play into whether it infringes a copyright?

Is taking a picture of a painting enough to violate copyright? Is a digital copy of a VHS tape a copyright violation? Is copy/pasting an image a copyright violation?

A computer is not creative because it cannot think. It can only copy what others do; it doesn't understand what it's doing, only how to do something. Equating a human with a computer is unreasonable at the current state of computing and I doubt we'll see a truly conscious AI in the future.

For an automated tool to replicate art into a state such that it would no longer be a mere reproduction of copyrighted material, one would assume that the tool would gain such sophistication that the tool itself should deserve copyright rather than its operators. After all, a commissioner of an art piece may provide the prompt but the art itself is under copyright by the author.

We'll have to see how the courts look at these reconstructions. Personally, I believe tools like DALL-E and Copilot are no more than fancy copy/paste systems and should only be trained on copyright-free materials for their use not to be subject to copyright issues.


Copyright is a privilege granted to man. It would be ridiculous to treat tools as people.


I’m also not a copyright expert, so I wonder about things like if I draw a (crappy) picture of Spider-Man and slap it on a tshirt, can I sell it without violating copyright? I’m sure there’s established law about this kind of thing.


My guess is that your badly drawn Spider-Man T-shirt would violate trademark, but not copyright.


If your shitty Spidey is a drawing you did yourself, you are not violating copyright. You are however blatantly violating Disney’s collection of trademarks on “Spider-Man”, the distinctive appearance of the character, and a host of associated marks.



