Some context about this here: https://arstechnica.com/information-technology/2023/08/openai-details-how-to-keep-chatgpt-from-gobbling-up-website-data/
The site's robots.txt would be updated with this entry:
User-agent: GPTBot
Disallow: /
Obviously this is meaningless against non-OpenAI scrapers, or anyone who just doesn't give a shit.
Wouldn't they theoretically be able to set up their own instance, federate with all the larger ones, and scrape the data that way? Not sure that blocking them via robots.txt is much of a barrier if they really want the data.
Robots.txt is more of an honor system. If they respect it, they won't pull that trick.
Robots.txt is just a notice anyway. A scraper can simply ignore it; no workaround needed.
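To illustrate the honor-system point: robots.txt only has any effect if the client voluntarily parses it and checks itself against it. A minimal sketch with Python's standard-library `urllib.robotparser`, using the exact entry from above (the example URL is hypothetical):

```python
from urllib import robotparser

# Parse the robots.txt entry from above locally (no network fetch needed)
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: GPTBot",
    "Disallow: /",
])

# GPTBot is disallowed everywhere on the site...
print(rp.can_fetch("GPTBot", "https://example.com/some/page"))       # False
# ...while every other user agent is unaffected
print(rp.can_fetch("SomeOtherBot", "https://example.com/some/page")) # True
```

Note that nothing forces the scraper to run this check, or to identify itself as GPTBot in the first place; the whole mechanism is opt-in on the client side.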