>>> import requests
>>> r = requests.get("https://www.amazon.cn/gp/product/B01M8L5Z3Y")
>>> r.status_code
503
>>> r.encoding
'ISO-8859-1'
>>> r.encoding = r.apparent_encoding
>>> r.text
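- The site implicitly accepts requests only from browsers and rejects requests from crawlers, which is why the status code is 503 rather than 200.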
Use r.request.headers to inspect the headers that were actually sent
>>> r.request.headers
{'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
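The 'python-requests/2.22.0' value is the default that the library fills in when no User-Agent is supplied. As a small sketch, the defaults can be inspected without sending any request; the variable names `defaults` and `default_ua` are only illustrative, and the version suffix depends on the installed release.

```python
import requests

# Headers that requests attaches when the caller supplies none.
defaults = requests.utils.default_headers()

# The default User-Agent string, of the form "python-requests/<version>".
default_ua = requests.utils.default_user_agent()

print(default_ua)
print(dict(defaults))
```

Because this string openly identifies the client as a script, a server that filters on User-Agent can reject it easily.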
Modify the request headers
>>> url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
>>> kv = {'user-agent': 'Mozilla/5.0'}
>>> r = requests.get(url, headers=kv)
>>> r.status_code
200
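If several pages are fetched, passing headers=kv on every call gets repetitive. One possible alternative, shown here only as a sketch, is to set the header once on a requests.Session. The example builds the request without sending it, so the header can be checked offline; the names `session`, `prepared`, and `sent_ua` are illustrative.

```python
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"

# A Session keeps its headers for every request made through it.
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})

# Build the request without sending it, merging in the session headers.
prepared = session.prepare_request(requests.Request("GET", url))

# This is the User-Agent the server would receive.
sent_ua = prepared.headers["User-Agent"]
print(sent_ua)
```

Calling session.get(url) afterwards would send the same header, with no need to pass headers=kv each time.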
Full code
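import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    # Present a browser-like User-Agent instead of the default python-requests one
    kv = {'user-agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=kv)
    r.raise_for_status()  # raise an exception for 4xx/5xx responses
    r.encoding = r.apparent_encoding  # use the encoding detected from the content
    print(r.text[:1000])
except requests.RequestException:
    print("ERROR!")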
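The same steps can be wrapped in a reusable function. The following is a sketch under a few assumptions that are not in the original notes: the function name `fetch_html`, the 10-second timeout, and returning None on failure are all illustrative choices.

```python
import requests


def fetch_html(url, user_agent="Mozilla/5.0", timeout=10):
    """Return the decoded page text, or None if the request fails."""
    headers = {"User-Agent": user_agent}
    try:
        # timeout (an assumed value) keeps the call from hanging indefinitely
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()               # turn 4xx/5xx responses into exceptions
        r.encoding = r.apparent_encoding   # use the encoding detected from the content
        return r.text
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs, and HTTP error statuses
        print(f"Request failed: {exc}")
        return None
```

A call such as html = fetch_html("https://www.amazon.cn/gp/product/B01M8L5Z3Y") returns the page text on success; checking the result for None before slicing it (for example html[:1000]) avoids an error when the request is rejected.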
