Missing Content-Length

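Sometimes the response headers contain no Content-Length, and Transfer-Encoding is chunked instead.
By default, requests sends Accept-Encoding: 'gzip, deflate' in the request headers. When the server answers with a gzip-compressed body, it likely compresses on the fly and cannot know the final length in advance, so it falls back to chunked transfer.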

    >>> import requests
    >>> url = 'https://data.broadinstitute.org/gsea-msigdb/msigdb/release/7.2/msigdb_v7.2.xml'
    >>> resp = requests.get(url, stream=True)
    >>> resp.headers
    {'Date': 'Tue, 16 Mar 2021 08:32:06 GMT', 'Server': 'Apache', 'Last-Modified': 'Thu, 22 Oct 2020 05:06:05 GMT', 'ETag': '"ba9113e-5b23b6bced775"', 'Accept-Ranges': 'bytes', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip', 'Access-Control-Allow-Origin': '*', 'Keep-Alive': 'timeout=15, max=100', 'Connection': 'Keep-Alive', 'Transfer-Encoding': 'chunked', 'Content-Type': 'text/xml'}
    >>> resp.request.headers
    {'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

Solution

Remove the default Accept-Encoding from the request headers by passing headers={'Accept-Encoding': None}:

    >>> resp = requests.get(url, stream=True, headers={'Accept-Encoding': None})
    >>> resp.headers
    {'Date': 'Tue, 16 Mar 2021 08:33:13 GMT', 'Server': 'Apache', 'Last-Modified': 'Thu, 22 Oct 2020 05:06:05 GMT', 'ETag': '"ba9113e-5b23b6bced775"', 'Accept-Ranges': 'bytes', 'Content-Length': '195629374', 'Vary': 'Accept-Encoding', 'Access-Control-Allow-Origin': '*', 'Keep-Alive': 'timeout=15, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'text/xml'}
    >>> resp.request.headers
    {'User-Agent': 'python-requests/2.25.1', 'Accept': '*/*', 'Connection': 'keep-alive'}
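
Code that relies on the size (a progress bar, for example) may still want to handle both cases. Below is a minimal sketch of one way to do that. The helper name `expected_size` is hypothetical and not part of requests; it only inspects a header mapping, so it works with `resp.headers` or with a plain dict.

```python
from typing import Mapping, Optional


def expected_size(headers: Mapping[str, str]) -> Optional[int]:
    """Return the declared body size in bytes, or None if the server gave none.

    Hypothetical helper for illustration. Note that resp.headers from requests
    is case-insensitive, while a plain dict needs the exact key spelling.
    """
    # A chunked response carries no usable Content-Length.
    if "chunked" in headers.get("Transfer-Encoding", "").lower():
        return None

    value = headers.get("Content-Length")
    if value is None:
        return None

    value = value.strip()
    # Guard against malformed values instead of raising ValueError.
    return int(value) if value.isdigit() else None


# Header subsets taken from the two responses shown above.
chunked = {"Content-Encoding": "gzip", "Transfer-Encoding": "chunked"}
sized = {"Content-Length": "195629374", "Content-Type": "text/xml"}

print(expected_size(chunked))  # no size declared
print(expected_size(sized))    # size in bytes
```

With a live response this would be called as `expected_size(resp.headers)`. If a server sends both Content-Encoding: gzip and Content-Length, the length refers to the compressed bytes on the wire, not to the decoded data.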