How to block AI Crawler Bots using robots.txt file (www.cyberciti.biz)
Cynicus Rex@lemmy.ml to Privacy@lemmy.ml · English · 5 months ago · 60 comments
Cynicus Rex@lemmy.ml (OP) · 5 months ago

TL;DR:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Applebot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: Omgili
Disallow: /

User-agent: YouBot
Disallow: /
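
If you want to sanity-check rules like these before deploying them, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `ROBOTS_TXT` string is an abbreviated copy of the list above, and `is_allowed` and the `example.com` URL are hypothetical names chosen for illustration. It only shows how a compliant parser reads the file, not how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the rules above; add the remaining user agents as needed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/some/page") -> bool:
    """Return True if a robots.txt-compliant client with this product token may fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "Bytespider", "Mozilla/5.0"):
        print(agent, "allowed" if is_allowed(agent) else "blocked")
```

Pass the crawler's product token (for example `GPTBot`) rather than a full header value, since this parser only looks at the part of the string before the first `/`.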
mox@lemmy.sdf.org · 5 months ago

Of course, nothing stops a bot from picking a user agent field that exactly matches a web browser.
JackbyDev@programming.dev · 5 months ago

Nothing stops a bot from choosing not to read robots.txt.
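
To make the two replies above concrete, here is a rough sketch of a server-side check that matches the `User-Agent` header against a blocklist. Everything in it is an assumption for illustration: `BLOCKED_TOKENS` is an abbreviated version of the list in the post, `is_blocked_agent` is a hypothetical helper, and both header strings are made-up examples rather than values any vendor is known to send.

```python
# Abbreviated blocklist taken from the robots.txt rules in the post.
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "Bytespider")


def is_blocked_agent(user_agent: str) -> bool:
    """Case-insensitive substring match of the header against the blocklist."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_TOKENS)


# Hypothetical header from a bot that identifies itself honestly.
HONEST = "Mozilla/5.0 (compatible; GPTBot/1.0)"
# Hypothetical header from a bot that copies a browser's string.
SPOOFED = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"

if __name__ == "__main__":
    print("honest bot blocked:", is_blocked_agent(HONEST))
    print("spoofing bot blocked:", is_blocked_agent(SPOOFED))
```

The honest bot is caught and the spoofing one is not, which is the limitation both replies point at: robots.txt and User-Agent matching only work against crawlers that choose to identify themselves.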