cross-posted from: https://lemmy.ml/post/5400607

This is a classic case of the tragedy of the commons, where a common resource is harmed by the profit interests of individuals. The traditional example of this is a public field that cattle can graze upon. Without any limits, individual cattle owners have an incentive to overgraze the land, destroying its value to everybody.

We have commons on the internet, too. Despite all of its toxic corners, it is still full of vibrant portions that serve the public good — places like Wikipedia and Reddit forums, where volunteers often share knowledge in good faith and work hard to keep bad actors at bay.

But these commons are now being overgrazed by rapacious tech companies that seek to feed all of the human wisdom, expertise, humor, anecdotes and advice they find in these places into their for-profit A.I. systems.

  • kibiz0r@midwest.social
    9 months ago

    Your justification seems to rest on whether LLM training technically passes the legal standard of violating IP.

    That’s not a super compelling argument to me, because:

    1. Nobody designed current IP law with LLMs in mind
    2. I would wager that a vast majority of creators whose works were consumed by LLMs did not consider whether their license would permit such an act, and thus didn’t meaningfully consent to have their work used this way (whether or not the law would agree)
    3. I would argue that IP law is heavily stacked in favor of platforms (who own IP, but do not create it) and against creators (who create, but do not own IP) and consumers

    I don’t think that there is fundamentally anything wrong with LLMs as a technology. My problem is that the economic incentives are misaligned with long-term stability of the creative pools that fuel these things in the first place.

    • FaceDeer@kbin.social
      9 months ago

      > Your justification seems to rest on whether LLM training technically passes the legal standard of violating IP.

      That’s basically all that I’m talking about here, yeah. I’m saying that the current laws don’t appear to say anything against training AIs on public data. The AI model is not a copy of that data, nor is its output.

      > Nobody designed current IP law with LLMs in mind

      Indeed. Things are not illegal by default; there needs to be a law or some sort of precedent that makes them illegal. In the realm of LLMs that’s very sparse right now, for exactly the reason you say. Nobody anticipated it, so nobody wrote any laws forbidding it.

      > I would wager that a vast majority of creators whose works were consumed by LLMs did not consider whether their license would permit such an act, and thus didn’t meaningfully consent to have their work used this way (whether or not the law would agree)

      There are things that you can use intellectual property for that do not require consent in the first place. Fair use describes various categories of that. If it’s not illegal to use copyrighted material without permission when training AIs, why would it matter whether the license permitted it or the author consented to it?

      > I would argue that IP law is heavily stacked in favor of platforms (who own IP, but do not create it) and against creators (who create, but do not own IP) and consumers

      Wouldn’t requiring licensing of data for the training of LLMs stack things even more in the favour of big IP-owning platforms?

      Again, as I said before, if you think some specific bit of LLM output is violating the copyright of some code you wrote, there are already laws in place specifically covering that situation. You can go to court, show that the two pieces of code are substantially identical, and sue for damages or whatever. The AI model itself is another matter, though, and I doubt any current laws would count it as a “copy” of the data that went into training it.