GenAI tools ‘could not exist’ if firms are made to pay copyright
I do love how AI has gotten Corporate Giants to start attacking the Copyright System they’ve used to beat down the little man for generations
Maybe because it’s not the same corporations? We might be seeing a giant powershift from IP hoarders to makers.
Makers use the copyright system to their advantage as well though. If I write code and place it on github, the only thing stopping a mega corp stealing it is the copyright I hold.
Abolishing copyright is not a win.
Let’s not kid ourselves that the copyright is stopping mega corporations from stealing your github code.
What’s stopping them from hiring an engineer that basically rewrites your code? No one would ever know.
Copyleft enforcement is laughable at best, and that’s with legitimate non-profits like the FSF working on it, and only when it comes to direct library use without modifications. There’s basically no history of prosecution or penalties for partial code copying (not that there should be, imo), even when 1:1 code has been found!
I feel like copyright has been doing very little in the modern age, and I have yet to see any science that contradicts my opinion here. Most copyright holders (like high 90%) are mega corporations like Getty Images that hardly contribute back to society.
it definitely is, tho.
of course they could steal your code, but you could steal theirs. you could steal their software. you could steal their paywalled articles. you could steal all those things that are affected by artificial scarcity. They have much more to lose if copyright gets abolished.
In an ideal world, I’d make copyright last 5 years for individuals and abolish it for corporations. But the world is far from ideal and individuals have much to win if copyright gets abolished as a whole.
Weakening copyright is a win
So they’re admitting that their entire business model requires them to break the law. Sounds like they shouldn’t exist.
Reproduction of copyrighted material would be breaking the law. Studying it and using it as reference when creating original content is not.
Reproduction of copyrighted material would be breaking the law. Studying it and using it as reference when creating original content is not.
I’m curious why we think otherwise when it is a student obtaining an unauthorized copy of a textbook to study, or researchers getting papers from sci-hub. Probably because it benefits corporations and they say so?
While I would like to be in a world where knowledge is free, this is apples and oranges.
OpenAI can purchase a textbook and read it. If their AI uses the knowledge gained to explain maths to an individual, without reproducing the original material, then there’s no issue.
The difference is the student in your example didn’t buy their textbook. Someone else bought it and reproduced the original for others to study from.
If OpenAI was pirating textbooks, that would be a wholly separate issue.
The fact that the “AI” can spit out whole passages verbatim when given the right prompts suggests that there is a big problem here and they haven’t a clue how to fix it.
It’s not “learning” anything other than the probable order of words.
Humans studying it is fair use.
Copyright can only be granted to works created by a human, but I don’t know of any such restriction for fair use. Care to share a source explaining why you think only humans are able to use fair use as a defense for copyright infringement?
Because a human has to use talent+effort to make something that’s fair use. They adapt a product into something that while similar is noticeably different. AI will
- make things that are not just similar but not noticeably different.
- There’s not an effort in creation. There’s human thought behind a prompt but not on the AI following it.
- If allowed to, AI companies will basically copyright everything…
You are aware of the insane amounts of research, human effort and the type of human talent that is required to make a simple piece of software, let alone a complex artificial neural network model whose function is to try and solve whatever stuff…right?
Your reply has nothing to do with fair use doctrine.
What’s the difference? Humans are just the intent suppliers, the rest of the art is mostly made possible by software, whether photoshop or stable diffusion.
It likely doesn’t break the law. You should check out this article by Kit Walsh, a senior staff attorney at the EFF, and this one by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries.
Headlines like these let people assume that it’s illegal, rather than educate people on their rights.
The Kit Walsh article purposefully handwaves around a couple of issues that could become larger problems as lawsuits in this arena continue.
- He says that due to the size of training data and the model, only a byte of data per image could be stored in any compressed format, but this assumes all training data is treated equally. It’s very possible certain image artifacts are compressed/stored in the weights more than other images.
- These models don’t produce exact copies. Beyond the Getty issue, nytimes recently released an article about a near duplicate - https://www.nytimes.com/interactive/2024/01/25/business/ai-image-generators-openai-microsoft-midjourney-copyright.html.
I think some of the points he makes are valid, but they’re making a lot of assumptions about what is actually going on in these models which we either don’t know for certain or have evidence to the contrary.
I didn’t read Katherine’s article so maybe there is something more there.
She addresses both of those, actually. The Midjourney thing isn’t new; it’s the sign of a poorly trained model.
I’m not sure she does. I just read the article, and it focuses primarily on what models can train on. However, the real meat of the issue with GenAI, at least I think, is what it produces.
For example, if I built a model that just spit out exact frames from “Space Jam”, I don’t think anyone would dispute that that would be a problem. The question is where is the line?
This part does:
It’s not surprising that the complaints don’t include examples of substantially similar images. Research regarding privacy concerns suggests it is unlikely that a diffusion-based model will produce outputs that closely resemble one of the inputs.
According to this research, there is a small chance that a diffusion model will store information that makes it possible to recreate something close to an image in its training data, provided that the image in question is duplicated many times during training. But the chances of an image in the training data set being duplicated in output, even from a prompt specifically designed to do just that, is literally less than one in a million.
The linked paper goes into more detail.
On the note of output, I think you’re responsible for infringing works, whether you used Photoshop, copy & paste, or a generative model. Also, specific instances will need to be evaluated individually, and there might be models that don’t qualify. Midjourney’s new model is so poorly trained that it’s downright easy to get these bad outputs.
This goes back to my previous comment of handwaving away the details. There is a model out there that clearly is reproducing copyrighted materials almost identically (the nytimes article). We also have issues with models spitting out training data: https://www.wired.com/story/chatgpt-poem-forever-security-roundup/. Clearly people studying these models don’t fully know what is actually possible.
Additionally, it only takes one instance to show that these models, in general, can and do have issues with regurgitating copyrighted data. Whether that passes the bar for legal consequences we’ll have to see, but I think it’s dangerous to take a couple of statements made by people who don’t seem to understand the unknowns in this space at face value.
The article dealt with Stable Diffusion, the only open model that allowed people to study it. If there were more problems with Stable Diffusion, we’d’ve heard of them by now. These are the critical solutions Open-source development offers here. By making AI accessible, we maximize public participation and understanding, foster responsible development, as well as prevent harmful control attempts.
As it stands, she was much better informed than you are and is an expert in law to boot. On the other hand, you’re making a sweeping generalization right into an appeal to ignorance. It’s dangerous to assert a proposition just because it has not been proven false.
Not that I am a fan of the current implementation of copyright in the US, but I know if I were planning on building my business around something that couldn’t exist without violating copyright, I would surely have thought of that fairly early on.
The LCA principles also make the careful and critical distinction between input to train an LLM, and output—which could potentially be infringing if it is substantially similar to an original expressive work.
from your second link. I don’t often see this brought up in discussions. The problem of models trained on copyrighted info is definitely different than what you do with that model/output from it. If you’re making money from infringing, the fair use arguments are historically less successful. I have less of an issue with the general training of a model vs. commercial infringing use.
You’re responsible for infringing works, whether you used Photoshop, copy & paste, or a generative model.
I don’t disagree with that statement. I’m having trouble seeing how that fits with what I said, though. Can you elaborate?
It doesn’t really, I was just kind of restating what you quoted. Since no one factor of fair use is more important than the others, and it is possible to have a fair use defense even if you do not meet all the criteria of fair use, do you have data to back up your claims about moneymaking infringement?
I’d be fine with this argument if these generative tools were only being used by non-profits. But they aren’t.
So I think there has to be some compromise here. Some type of licensing fee should be paid by these generative AI tools.
You’re basically arguing for making any free use of them illegal, thereby giving a monopoly to the richest and most powerful capitalists.
Humans won’t be able to compete, and you won’t be able to use the means of generation either.
I’m arguing for free commercial use being illegal, absolutely.
And that fee should scale based on who is using it for commercial purposes. Microsoft and Google should be paying far, far out the ass for their data.
And who does that serve?
Do you mean the whole thing or something specific like free commercial use being illegal?
I think the answer to both is the people who created the art, text, etc. that these generative AI tools are going to make mostly obsolete.
Open source or open use AI will be practically illegal. Research will be practically impossible. It will be exclusively controlled by super rich and powerful corporations.
It won’t benefit the creators. It will only benefit those with the most capital that can buy up the training data needed and then can set the market so they make almost all of the money. For example you’d need to buy all the user content from reddit, facebook and twitter to train an AI. That will cost many millions because it’s a precious commodity (and only they own it). So only a few will control the “means of generation” and they will (have to) use it to make profit for themselves. This will make it practically illegal to make a free or an independent AI because you don’t have access to training data. This sets the rules and will lead to incredibly bad outcomes. For example anti-consumerist thinking or dissent could be suppressed, or other more subtle biases. Anything that reduces profit from advertising or threatens the shareholders. And they can manipulate the training data behind closed doors.
But that is how it’s going to go and it’s going to make the effects of AI generation on our civilization extra bad. We are so fucked :(
So… This may be an unpopular question. Almost every time AI is discussed, a staggering number of posts support very right-wing positions, e.g. on topics like this one: unearned money for capital owners. It’s all Ayn Rand and not Karl Marx. Posters seem to be unaware of that, though.
Is that the “neoliberal Zeitgeist”, or whatever you may call it?
I’m worried about what this may mean for the future.
ETA: 7 downvotes after 1 hour with 0 explanation. About what I expected.
I see way too many people advocating for copyright. I understand in this case it benefits big companies rather than consumers, but if you disagree with copyright, as I do, you should be consistent.
Copyright law should benefit humans, not machines, not corporations. And no, corporations are not people. Anthony Kennedy can get bent.
Abolishing copyright in the way that allows for the existence of Gen AI benefits people far more than it does corpos
I’d say the main reason is companies are profiting off the work of others. It’s not some grand positive motive for society, but taking the work of others, from other companies, sure, but also from small time artists, writers, etc.
Then selling access to the information they took from others.
I wouldn’t call it a right wing position.
Wanting to abolish the IRS is a right-wing policy that will benefit the rich. That doesn’t change when some marketing genius talks about how the IRS takes money from small time artists, writers, etc. Same thing. It’s about substance and not manipulative framing.
That isn’t remotely similar…
The IRS takes a portion of income. This is taking away someone’s income, then charging access to it.
Like it or not, these people need money to survive. Calling it right wing to think these individuals deserve to be paid for someone taking their work, then using it for a product they sell access to, is absolutely insane to me.
I don’t know how this is supposed to make sense.
One is a percentage of income that everyone pays into.
The other is stealing someone’s work then using that person’s work for profit.
Recognizing that this is stealing someone’s work is not a right-wing position.
How is this complicated?
I see. Thanks for explaining.
This view of property rights as absolute is what right-libertarians, anarcho-capitalists, etc… espouse. Usually the cries of “theft” come when it gets to taxes, though. Is it supposed to be not right because it’s about intellectual property?
Property rights are not necessarily right-wing (communism notwithstanding). What is definitely right-wing is (heritable) privilege and that’s implied in these views of property.
ETA: Just to make sure that I really understand what you are saying: When you say “stealing someone’s work” you do mean the unauthorized copying of copyrighted expression, yes? Do you actually understand that copyright is intellectual property and that property is not usually called work? Labor and capital are traditionally considered opposites, of a sort, particularly among the left.
So… You think their art or writing was created by what then? Magic? Do you think no time was expended in the creation of books, research, drawings, painted canvases, etc?
Do you think they should starve because we currently live in a world driven entirely around money?
I don’t get your point even remotely.
I think it’s a conflation of the ideas of what copyright should be and actually is. I don’t tend to see many people who believe copyright should be abolished in its entirety, and if people write a book or a song they should have some kind of control over that work. But there’s a lot of contention over the fact that copyright as it exists now is a bit of a farce, constantly traded and sold and lasting an aeon after the person who created the original work dies.
It seems fairly morally constant to think that something old and part of the zeitgeist should not be under copyright, but that the system needs an overhaul when companies are using your live journal to make a robot call center.
Lemmy seems left-wing on economics in other threads. But on AI, it’s private property all the way, without regard for the consequences on society. The view on intellectual property is that of Ayn Rand. Economically, it does not get further to the right than that.
My interpretation is that people go by gut feeling and never think of the consequences. The question is, why does their gut give them a far-right answer? One answer is that somehow our culture, at present, fosters such reactions; that it is the zeitgeist. If that’s the truth (and this reflects a wider trend) then inequality will continue to increase as a result of voters’ demands.
Yeah I think that this is showing a lot of people only really care about espousing anti-privatization ideas as long as it suits their personal interests and as long as they feel they have more to gain than to lose. People are selfish, and a lot of progressive, or really any kind of passionate rhetoric is often conveniently self-serving and emotionally driven, rather than truly principled.
You’re not wrong, but how many people here are actually pursuing their own personal interest? Most people here are probably wage-earners. Yet so many people support giving more money to property owners without any kind of requirement or incentive for work. Just a rent for property owners. It feels like this should be met with knee-jerk rejection.
I don’t know what you’re on about. The majority of the thread is pro open-source AI and anti-capitalist, which is as left a stance as it gets; it’s not called “copyleft” for no reason. No one here wants to see AI banned and the already insane IP laws expanded to the benefit of the few corpos like the NYT at the expense of broader society.
IDK. I have seen a number of pro-corpo copyleft takes. It’s absolutely crazy to me. The pitch is that expansive copyright makes for expansive copyleft. It seems neo-feudal to me. The lords have their castles but the peasants have their commons.
Fair enough, seems like they’re downvoting us anyway.