• Moonrise2473
    1 year ago

    No, there’s a big difference. It just includes scraped data from pirate websites. As in: the page with the description and the “download now” button.

    This is because they didn’t do a separate scrape for training; they reused what their service had already collected while crawling for web indexing. Is zlibrary (b-ok in the article) present in web results? Yes. So parts of its pages (an excerpt stolen from Amazon + a “login to download now” button) are also present in the model.

    There’s an ocean of difference between this and assuming that the bot was specifically programmed to log in to a pirate website, pay for VIP access (download 1000 eBooks a day instead of 10), and then parse the content of the ebooks, which aren’t in a consistent format because they’re user-uploaded, so the format changes constantly.