2019-08-25-Python 抓取页面的请求重试方法

layout: posttitle: Python 抓取页面的请求重试方法
subtitle: Python 抓取页面的请求重试方法
date: 2019-08-25
author: he xiaodong
header-img: img/default-post-bg.jpg
catalog: true
tags:
- Python
- 抓取页面
- 请求重试
- 捕获报错
- python requests
- requests retry

layout: posttitle: Python 抓取页面的请求重试方法
subtitle: Python 抓取页面的请求重试方法
date: 2019-08-25
author: he xiaodong
header-img: img/default-post-bg.jpg
catalog: true
tags:
- Python
- 抓取页面
- 请求重试
- 捕获报错
- python requests
- requests retry

Python 在请求某个链接的页面时，偶尔会遇到频次限制，或者返回其他乱七八糟的错误，从而被中断了循环抓取，加一个重试机制就好了。

第一种方案：请求上进行 try catch 操作，失败在次请求就行，以下的 get 只是举例

def get(url):
    # import requests
    try:
        return requests.get(url, headers=headers, cookies={'Accept_Cookie_Policy': 'true'}, timeout=10)
    except Exception:
        # sleep for a bit in case that helps
        time.sleep(1)
        # try again
        return get(url)

第二种方法：使用内置的 adapter 设置一下，自动重试就行

from requests.adapters import HTTPAdapter
s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=3))
s.mount('https://', HTTPAdapter(max_retries=3))
s.get('http://example.com', timeout=10)

基础频率的限制，这样的的方案就够了，如果 IP 被对方服务器拉黑，那就不是重试的问题了。

参考链接：Best practice with retries with requests

最后恰饭阿里云全系列产品/短信包特惠购买中小企业上云最佳选择阿里云内部优惠券

layout: posttitle: Python 抓取页面的请求重试方法subtitle: Python 抓取页面的请求重试方法date: 2019-08-25author: he xiaodongheader-img: img/default-post-bg.jpgcatalog: truetags:- Python- 抓取页面- 请求重试- 捕获报错- python requests- requests retry

layout: posttitle: Python 抓取页面的请求重试方法
subtitle: Python 抓取页面的请求重试方法
date: 2019-08-25
author: he xiaodong
header-img: img/default-post-bg.jpg
catalog: true
tags:
- Python
- 抓取页面
- 请求重试
- 捕获报错
- python requests
- requests retry