The internet is not a lawless place, so think twice before doing anything.
A site usually states which content may be crawled in the robots.txt file in its root directory. Taking BiliBili as an example:
https://www.bilibili.com/robots.txt
User-agent: Yisouspider
Allow: /
User-agent: Applebot
Allow: /
User-agent: bingbot
Allow: /
User-agent: Sogou inst spider
Allow: /
User-agent: Sogou web spider
Allow: /
User-agent: 360Spider
Allow: /
User-agent: Googlebot
Allow: /
User-agent: Baiduspider
Allow: /
User-agent: Bytespider
Allow: /
User-agent: PetalBot
Allow: /
User-agent: *
Disallow: /
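The rules above can also be checked programmatically before crawling. The following is a minimal sketch using Python's standard urllib.robotparser module. It parses an abbreviated inline copy of the listing (only two of the named crawlers are kept, and the live file may differ), and the helper name is_allowed and the user agent "MyCrawler" are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt of the robots.txt listing above (assumption: the
# live file may have changed since this copy was taken).
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.bilibili.com/"
    # A crawler that is explicitly listed is allowed.
    print("Googlebot:", is_allowed("Googlebot", url))
    # "MyCrawler" is a hypothetical agent; it falls under "User-agent: *".
    print("MyCrawler:", is_allowed("MyCrawler", url))
```

To check against the live file instead, RobotFileParser also offers set_url() followed by read(), which downloads the file before can_fetch() is called. In both cases an unlisted crawler falls under the final "User-agent: *" group, which disallows the whole site.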
