The desire to profit from AI companies' hunt for training data (over and above the community that created that data) is a big part of what created the context for the recent migration away from Reddit. How will the fediverse approach this problem?

  • key@lemmy.keychat.org
    arrow-up
    14
    ·
    1 year ago

    By not spitting into the wind. It’s infeasible to try to prevent all web scraping from any possible IP, which is what you would need to do. Reddit just took advantage of the media topic as a justification; they’re not doing anything real.

    • sachasage@lemmy.world (OP)
      arrow-up
      2
      ·
      1 year ago

      Fair, but then there’s a line between scraping through ordinary traffic and using API access to gather large data sets.

      • key@lemmy.keychat.org
        arrow-up
        3
        arrow-down
        1
        ·
        1 year ago

        Is there? The effect is the same. Use machine learning to parse HTML generically and throw hardware and a pool of IPs at it. That’s a lot more efficient than coding an API client for every service out there. It’s the same approach search engines use.

        I don’t see anything being done effectively without legal protections.