• Star@sopuli.xyz (OP)
    1 year ago

    It’s so ridiculous: when corporations steal everyone’s work for their own profit, no one bats an eye, but when a group of individuals does the same to make education and knowledge free for everyone, it’s somehow illegal, unethical, immoral, and whatnot.

    • Grimy@lemmy.world
      1 year ago

      Using publicly available data to train isn’t stealing.

      Daily reminder that the ones pushing this narrative are literally corporations like OpenAI. If you can’t use copyrighted materials freely to train on, it drives up the cost in such a way that only a handful of companies can afford the data.

      They want to kill the open-source scene and are manipulating you to do so. Don’t build their moat for them.

      • givesomefucks@lemmy.world
        1 year ago

        And using publicly available data to train gets you a shitty chatbot…

        Hell, even using copyrighted data to train isn’t that great.

        Like, what do you even think they’re doing here for your conspiracy?

        You think OpenAI is saying they should pay for the data? They’re trying to use it for free.

        Was this a meta joke and you had a chatbot write your comment?

        • tourist@lemmy.world
          1 year ago

          > Was this a meta joke and you had a chatbot write your comment?

          if someone said this to me I’d cry

        • webghost0101@sopuli.xyz
          1 year ago

          The point being made was that publicly available data includes a whole lot of copyrighted data to begin with, and it’s pretty much impossible to filter it out. A prime example: the Eiffel Tower in Paris is not copyright protected, but the lights on it are, so you can only use pictures of the Eiffel Tower taken during the day, and only if the picture itself isn’t copyright protected by the original photographer. Copyright law has all these complex caveats and exceptions that make it impossible to tell at a glance whether or not something is protected.

          This in turn means that if AI cannot legally train on copyrighted materials it finds online without paying huge sums of money, then effectively only mega-corporations that can pay copyright fines as a cost of doing business will be able to afford training decent AI.

          The only other option for producing any AI of this type is a very narrow, curated set of known materials with a public-use license, but that is not going to get you anything competent on its own.

          EDIT: In case it isn’t clear, I am clarifying what I understood from Grimy@lemmy.world’s comment, not adding to it.

          • RainfallSonata@lemmy.world
            1 year ago

            I didn’t want any of this shit. IDGAF if we don’t have AI. I’m still not sure the internet actually improved anything, let alone what the benefits of AI are supposed to be.

      • kibiz0r@lemmy.world
        1 year ago

        We have a mechanism for people to make their work publicly visible while reserving certain rights for themselves.

        Are you saying that creators cannot (or ought not be able to) reserve the right to ML training for themselves? What if they want to selectively permit that right to FOSS or non-profits?

        • Grimy@lemmy.world
          1 year ago

          Essentially yes. There isn’t a happy solution where FOSS gets the best images and remains competitive. The amount of data needed is outside what can be donated. Any open source work will be so low in quality as to be unusable.

          It also won’t be up to them. The platforms where the images are posted will be selling and brokering. No individual is getting a call unless they are a household name.

          None of the artists are getting paid either way so yeah, I’m thinking of society in general first.

      • grue@lemmy.world
        1 year ago

        > They want to kill the open-source scene

        Yeah, by using the argument you just gave as an excuse to “launder” copyleft works in the training data into permissively-licensed output.

        Including even a single copyleft work in the training data ought to force every output of the system to be copyleft. Or if it doesn’t, then the alternative is that the output shouldn’t be legal to use at all.

        • Grimy@lemmy.world
          1 year ago

          100% agree, making all outputs copyleft is a great solution. We get to keep the economic and cultural boom that AI brings while keeping the big companies in check.

  • Aielman15@lemmy.world
    1 year ago

    I pirated 90% of the texts I used to write my thesis at university, because those books would have cost me hundreds of euros that I didn’t have.

    Fuck you, capitalism.

  • UnderpantsWeevil@lemmy.world
    1 year ago

    Consider who sits on OpenAI’s board and owns all their equity.

    SciHub’s big mistake was to fail to get someone like Sundar Pichai or Jamie Iannone with a billion-dollar stake in the company.

    • UnderpantsWeevil@lemmy.world
      1 year ago

      A.I. doesn’t violate copyright laws. It is the data-mining done to train A.I. and the regurgitation of said data in the responses that ultimately violate these laws. A model trained on privately owned, properly licensed, or exclusively public works wouldn’t be a problem.

      Even then, I would argue that lack of attribution is a bigger problem than merely violating copyright. A big part of the LLM mystique is in how it can spit out a few lines of Shakespeare without attribution and convince its users that it’s some kind of master poet.

      Copyright law is stupid and broken. But plagiarism is a problem in its own right, as it seeks to effectively sell people their own creative commons at an absurd markup.

      • trafficnab@lemmy.ca
        1 year ago

        > A model trained on privately owned, properly licensed, or exclusively public works wouldn’t be a problem.

        This is how we end up with only corpo-owned AIs being allowed to exist, imo. Places like stock photo sites are the only ones with repositories of images large enough to train AI that they have all the legal rights to.

        The way I see it, either generative AI is legal, free for everyone to run locally, and the created works are public domain, OR everyone pays $20/mo to massive faceless corpos for the rest of their lives for the privilege of access to it, because they’re the only ones who own (or have enough money to license) all the IP needed to train them.

        • UnderpantsWeevil@lemmy.world
          1 year ago

          > This is how we end up with only corpo owned AIs being allowed to exist imo

          It’s how you end up with sixteen different streaming services that only vend a sliver of the total available content, sure. But the underlying technology of AI grows independently of what it’s trained on.

          > The way I see it, either generative AI is legal, free for everyone to run locally, and the created works are public domain, OR, everyone pays $20/mo to massive faceless corpos for the rest of their lives to have the privilege of access to it

          There are other alternatives. These sites can be restricted to data within the public domain. And we can increase our investment in public media. NYT articles being digested and regurgitated as ChatGPT info-vomit isn’t a problem if the NYT is a publicly owned and operated enterprise. Then it’s not struggling to profit off journalism, but treating this information as a loss-leading public service open to all, with ChatGPT simply operating as a tool to store, process, and present the data.

          Similarly, if you limit generative AI to the old Mickey Mouse and Winnie-the-Pooh films from the 1930s, you leave plenty of room for original artists to create new works without fear that their livelihoods will get chewed up and fed back into the system. If you invest in public art exhibitions, then these artists can get paid to pursue their craft, the art becomes public domain immediately, and digital tools that want to riff on the original are free to do so without undermining the artists themselves.