How would a site make itself accessible to the internet in general while also not allowing itself to be scraped using technology?
robots.txt does rely on being respected, just like no trespassing signs. The lack of enforcement is the problem, and keeping robots.txt as the record of what is and isn't permitted would make it effective again.
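For the "bots that play nice" case, Python's standard library already ships a robots.txt parser, so a well-behaved crawler can check permissions before fetching. A minimal sketch (the rules and paths here are made up for illustration):

```python
from urllib import robotparser

# Parse a hypothetical robots.txt that disallows /private/ for all agents.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /private/
""".splitlines())

# A compliant bot consults can_fetch() before requesting a URL.
print(rp.can_fetch("MyBot", "/private/secret.html"))  # False: disallowed
print(rp.can_fetch("MyBot", "/public/page.html"))     # True: allowed
```

In real use you would call `rp.set_url("https://example.com/robots.txt")` and `rp.read()` to fetch the live file instead of parsing an inline string.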
I am agreeing, just with a slightly different take.
User-agent matching is rather effective: you can serve different responses based on the UA string.
So generally people will use robots.txt to handle the bots that play nice, and then use user-agent checks to manage abusers.
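Serving different responses by user agent can be as simple as a substring check on the `User-Agent` header before deciding what status to return. A minimal sketch, assuming a hypothetical blocklist (real deployments usually do this in the web server or a WAF, and UA strings are trivially spoofable):

```python
# Hypothetical blocklist of UA substrings associated with abusive scrapers.
BLOCKED_UA_SUBSTRINGS = ("BadBot", "Scrapy")

def response_status(user_agent: str) -> int:
    """Return an HTTP status code based on the User-Agent header."""
    ua = user_agent.lower()
    if any(s.lower() in ua for s in BLOCKED_UA_SUBSTRINGS):
        return 403  # refuse known abusers
    return 200      # serve normal content to everyone else

print(response_status("Mozilla/5.0 (X11; Linux x86_64)"))  # 200
print(response_status("Scrapy/2.11 (+https://scrapy.org)"))  # 403
```

Note this only deters lazy scrapers: anything that lies about its UA sails straight through, which is why it is paired with robots.txt rather than relied on alone.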