‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says

L4sBot@lemmy.world · 1 year ago

‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says

S410@lemmy.ml · 1 year ago

Every work is protected by copyright, unless stated otherwise by the author.
If you want to create a capable system, you want real data and you want a wide range of it, including data that is rarely considered to be a protected work, despite being one.
I can guarantee you that you’re going to have a pretty hard time finding a dataset with diverse data containing things like napkin doodles or bathroom stall writing that’s compiled with permission of every copyright holder involved.

Exatron@lemmy.world · 1 year ago

How hard it is doesn’t matter. If you can’t compensate people for using their work, or exclude work people don’t want used, you just don’t get that data.

There’s plenty of stuff in the public domain.

beckerist@lemmy.world · 1 year ago

So… let the trainers pay for it. I don’t see the issue.

Fisk400@feddit.nu · 1 year ago

Sounds like an OpenAI problem and not an us problem.

HelloThere@sh.itjust.works · edited 1 year ago

I never said it was going to be easy - and clearly that is why OpenAI didn’t bother.

If they want to advocate for changes to copyright law then I’m all ears, but let’s not pretend they actually have any interest in that.

deweydecibel@lemmy.world · 1 year ago

“I can guarantee you that you’re going to have a pretty hard time finding a dataset with diverse data containing things like napkin doodles or bathroom stall writing that’s compiled with permission of every copyright holder involved.”

You make this sound like a bad thing.

BURN@lemmy.world · 1 year ago

And why is that a bad thing?

Why are you entitled to other people’s work, just because “it’s hard to find data”?

S410@lemmy.ml · 1 year ago

“Why are you entitled to other people’s work?”

Do you really think you’ve never consumed data that was not intended for you? Never used copyrighted works or their elements in your own works?

Re-purposing other people’s work is literally what humanity has been doing for far longer than the term “license” existed.

If the original inventor of the fire drill didn’t want others to use it and barred them from creating a fire bow, arguing it’s “plagiarism” and “a tool that’s intended to replace me”, we wouldn’t have a civilization.

If artists could bar other artists from creating music or art based on theirs, we wouldn’t have such a thing as “genres”. There are genres of music that are almost entirely based around sampling, and many, many popular samples were never explicitly allowed or licensed to anyone. Listen to the hundred most popular tracks of the last 50 years, and I guarantee you, a dozen or more would contain the Amen break, for example.

Whatever it is you do with data, whether you consume and use it yourself or train a machine learning model on it, you’re either disregarding a large number of copyright restrictions and using all of it, or existing in an informational vacuum.