    >>> import requests
    >>> r = requests.get("https://www.amazon.cn/gp/product/B01M8L5Z3Y")
    >>> r.status_code
    503
    >>> r.encoding
    'ISO-8859-1'
    >>> r.encoding = r.apparent_encoding
    >>> r.text
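  • The site implicitly accepts requests only from browsers and rejects requests from crawlers, hence the 503 status code
  • Use r.request.headers to inspect the headers that were sent with the request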

    >>> r.request.headers
    {'User-Agent': 'python-requests/2.22.0', 'Accept-Encoding': 'gzip, deflate',
    'Accept': '*/*', 'Connection': 'keep-alive'}
  • Modify the request headers so that the User-Agent looks like a browser

    >>> url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
    >>> kv = {'user-agent': 'Mozilla/5.0'}
    >>> r = requests.get(url, headers=kv)
    >>> r.status_code
    200
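
The effect of the headers argument can also be checked without sending anything over the network. The following is a minimal sketch, assuming the same URL as above; it uses requests.Session.prepare_request to build the request and compares the default User-Agent with the overridden one. The variable names default_req and custom_req are illustrative only.

```python
import requests

url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"

with requests.Session() as session:
    # Default headers: the User-Agent identifies the client as python-requests.
    default_req = session.prepare_request(requests.Request("GET", url))
    # Custom headers: the supplied user-agent replaces the default one
    # (header names are matched case-insensitively).
    custom_req = session.prepare_request(
        requests.Request("GET", url, headers={"user-agent": "Mozilla/5.0"})
    )

# Nothing has been sent yet; these are the headers that *would* be sent.
print(default_req.headers["User-Agent"])
print(custom_req.headers["User-Agent"])
```

The first line printed starts with python-requests/, which is the value the site appears to reject; the second is the browser-like value set in kv.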

    Full code

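    import requests

    url = "https://www.amazon.cn/gp/product/B01M8L5Z3Y"
    try:
        # Send a browser-like User-Agent so the site does not reject the request
        kv = {'user-agent': 'Mozilla/5.0'}
        r = requests.get(url, headers=kv)
        r.raise_for_status()  # raise an exception for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # use the encoding detected from the content
        print(r.text[:1000])
    except requests.RequestException:
        print("ERROR!")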
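
The same pattern can be wrapped in a reusable function. This is only a sketch: the name fetch_text, the default user_agent value, and the 10-second timeout are assumptions made for illustration and do not come from the original notes.

```python
import requests


def fetch_text(url, user_agent="Mozilla/5.0", timeout=10):
    """Return the decoded page text, or None if the request fails."""
    try:
        # Override the default python-requests User-Agent with a browser-like one.
        r = requests.get(url, headers={"user-agent": user_agent}, timeout=timeout)
        r.raise_for_status()  # treat 4xx/5xx responses as failures
        r.encoding = r.apparent_encoding  # decode using the detected encoding
        return r.text
    except requests.RequestException:
        return None


# A malformed URL is rejected before any network traffic, so this prints None.
print(fetch_text("not-a-url"))
```

Returning None instead of printing "ERROR!" lets the caller decide how to handle a failure, for example by retrying or skipping the page.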