The Open-Source Software Saving the Internet From AI Bot Scrapers (www.404media.co)
fattyfoods@feddit.nl to Open Source@lemmy.ml · 4 months ago · 24 comments
medem@lemmy.wtf · 4 months ago
<Stupidquestion> What advantage does this software provide over simply banning bots via robots.txt? </Stupidquestion>
PlantPowerPhysicist@discuss.tchncs.de · 4 months ago
The scrapers ignore robots.txt. It doesn’t really ban them; it just asks them not to access things, but they are programmed by assholes.
kcweller@feddit.nl · 4 months ago
robots.txt only works if the client respects the rules, for instance by identifying itself as a scraper. AI scrapers don’t respect this trust, so robots.txt is meaningless against them.
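To make the point in the replies concrete, here is a minimal Python sketch of why robots.txt is only a request. It assumes a made-up robots.txt, a hypothetical bot name (`ExampleBot`) and placeholder `example.org` URLs; none of these come from the article. The only real API used is the standard library's `urllib.robotparser`. The "polite" function is how a well-behaved crawler might check the rules; the "rude" one shows that nothing forces a client to check at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: ask every bot to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """A well-behaved crawler parses robots.txt and obeys the answer."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def rude_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """A scraper that ignores robots.txt: the file is never consulted.

    Nothing on the server side enforces the rules, so this client
    simply decides it may fetch everything.
    """
    return True


if __name__ == "__main__":
    # "ExampleBot" and the example.org URLs are placeholders.
    for url in ("https://example.org/public/page",
                "https://example.org/private/page"):
        print(url,
              "polite:", polite_may_fetch(ROBOTS_TXT, "ExampleBot", url),
              "rude:", rude_may_fetch(ROBOTS_TXT, "ExampleBot", url))
```

The check lives entirely in the client, which is why a server that wants to actually stop scrapers needs something that enforces a cost or a block on its own side instead of relying on the client's good behaviour.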