Sarah Silverman and other authors are suing OpenAI and Meta for copyright infringement, alleging that they're training their LLMs on books via Library Genesis and Z-Library

Arthur Besse@lemmy.ml · edited 2 years ago

Moonrise2473 · 2 years ago

No, there’s a big difference. It just includes scraped data from pirate websites. As in: the page with the description and the “download now” button.

This is because they didn’t do a separate scrape for training; they reused what their service had already collected while crawling for web indexing. Is zlibrary (b-ok in the article) present in web results? Yes. So parts of its pages (an excerpt stolen from Amazon plus a “login to download now” button) are also present in the model.

There’s an ocean of difference between this and assuming that the bot was specifically programmed to log in to a pirate website, pay for VIP access (download 1000 eBooks a day instead of 10), and then parse the content of the ebooks, which aren’t in a consistent format because they’re user-uploaded and so vary constantly.

Sarah Silverman Sues ChatGPT Creator for Copyright Infringement