>>> import requests
>>> r = requests.get("https://www.amazon.cn/gp/product/B01M8L5Z3Y")
>>> r.status_code
503
>>> r.encoding
'ISO-8859-1'
>>> r.encoding = r.apparent_encoding
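>>> r.text
- The site implicitly accepts requests only from browsers, so it rejected the crawler's request
Use r.request.headers to inspect the headers that were actually sent
>>> r.request.headers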
{'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate',
'Accept': '*/*', 'Connection': 'keep-alive'}
Modify the request headers
>>> url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
>>> kv = {'user-agent':'Mozilla/5.0'}
>>> r = requests.get(url, headers=kv)
>>> r.status_code
200
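The header override above can also be checked without sending anything over the network. The following is a minimal sketch, assuming only the public requests calls requests.utils.default_headers() and Session.prepare_request(); the variable names defaults, session, and prepared are illustrative and not part of the original notes.

```python
import requests

# Headers that requests adds by default when none are supplied.
defaults = requests.utils.default_headers()
print(defaults["User-Agent"])  # python-requests/<installed version>

# Prepare (but do not send) a GET request with a browser-like User-Agent.
url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
session = requests.Session()
prepared = session.prepare_request(
    requests.Request("GET", url, headers={"user-agent": "Mozilla/5.0"})
)

# Header names are case-insensitive, so the custom value replaces the default.
print(prepared.headers["User-Agent"])
```

Because the request is only prepared, this shows which headers would be sent; whether the server then answers with 200 still depends on the site.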
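Full code
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
try:
    # Send a browser-like User-Agent instead of the default python-requests one
    kv = {'user-agent': 'Mozilla/5.0'}
    r = requests.get(url, headers=kv)
    r.raise_for_status()  # raise HTTPError for 4xx/5xx responses
    r.encoding = r.apparent_encoding  # use the encoding detected from the body
    print(r.text[:1000])
except requests.RequestException:
    print("ERROR!")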
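The same steps could be wrapped in a reusable function. This is only a sketch under stated assumptions: the name fetch_html, the 10-second timeout, and returning None on failure are hypothetical choices made for illustration, not something the original notes specify.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the decoded page text, or None if the request fails.

    fetch_html and its timeout default are illustrative assumptions.
    """
    headers = {"User-Agent": "Mozilla/5.0"}  # browser-like User-Agent
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()              # 4xx/5xx responses raise HTTPError
        r.encoding = r.apparent_encoding  # encoding detected from the body
        return r.text
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs, and HTTP errors.
        print(f"Request failed: {exc}")
        return None
```

Calling fetch_html("https://www.amazon.cn/gp/product/B01M8L5Z3Y") and slicing the result with [:1000] would mirror the script above. Catching requests.RequestException rather than using a bare except keeps unrelated errors such as KeyboardInterrupt visible.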