1. Introduction

Simple and elegant, widely downloaded and well liked, with thorough official documentation.
Requests official website

2. Making requests

Building the request and inspecting the response

  import requests

  targetUrl = "http://httpbin.org/get"
  resp = requests.get(targetUrl)
  print(resp.text)

  {
    "args": {},
    "headers": {
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate",
      "Host": "httpbin.org",
      "User-Agent": "python-requests/2.25.1",
      "X-Amzn-Trace-Id": "Root=1-60225ec0-04d817761fdafd9459be4c66"
    },
    "origin": "47.240.65.248",
    "url": "http://httpbin.org/get"
  }

Printing the text directly

  import requests

  test_url = "http://httpbin.org/get"  # test URL
  res = requests.get(test_url)
  print(res.text)

  {
    "args": {},
    "headers": {
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate",
      "Host": "httpbin.org",
      "User-Agent": "python-requests/2.25.1",
      "X-Amzn-Trace-Id": "Root=1-6022666c-492947c37cb3a71443f517bd"
    },
    "origin": "183.223.85.232",
    "url": "http://httpbin.org/get"
  }

Testing other request methods

POST requests

  import requests

  # POST request with form data
  data = {"data1": "Spider", "data2": "测试爬虫"}
  test_url = "http://httpbin.org/post"  # test URL
  res = requests.post(test_url, data=data)
  print(res.text)

  {
    "args": {},
    "data": "",
    "files": {},
    "form": {
      "data1": "Spider",
      "data2": "\u6d4b\u8bd5\u722c\u866b"
    },
    "headers": {
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate",
      "Content-Length": "55",
      "Content-Type": "application/x-www-form-urlencoded",
      "Host": "httpbin.org",
      "User-Agent": "python-requests/2.25.1",
      "X-Amzn-Trace-Id": "Root=1-60227e99-47295bf138329e21631e53d8"
    },
    "json": null,
    "origin": "47.240.65.248",
    "url": "http://httpbin.org/post"
  }

Passing a JSON body
Use either json or data, not both: when data is sent as a form body, the json field echoed back in the response is null (as in the previous example).

  import requests

  # POST request with a JSON body
  data = {"data1": "Spider", "data2": "测试爬虫"}  # defined but not sent here
  json = {"json_style": "json-data"}
  test_url = "http://httpbin.org/post"  # test URL
  res = requests.post(test_url, json=json)
  print(res.text)

  {
    "args": {},
    "data": "{\"json_style\": \"json-data\"}",
    "files": {},
    "form": {},
    "headers": {
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate",
      "Content-Length": "27",
      "Content-Type": "application/json",
      "Host": "httpbin.org",
      "User-Agent": "python-requests/2.25.1",
      "X-Amzn-Trace-Id": "Root=1-60228036-0845b88b3f85bca75085b16c"
    },
    "json": {
      "json_style": "json-data"
    },
    "origin": "47.240.65.248",
    "url": "http://httpbin.org/post"
  }

3. Testing request parameters

[Figure 1: Using the Requests library]

Testing automatic encoding

Make a GET request with the parameters passed as a dict. Note that the dict values are Chinese: one convenience of the requests library is that it automatically URL-encodes non-ASCII parameters.

  import requests

  test_url = "http://httpbin.org/get"  # test URL
  params = {"name1": "网络", "name2": "爬虫"}  # request parameters
  res = requests.get(test_url, params=params)
  print(res.text)

  {
    "args": {
      "name1": "\u7f51\u7edc",
      "name2": "\u722c\u866b"
    },
    "headers": {
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate",
      "Host": "httpbin.org",
      "User-Agent": "python-requests/2.25.1",
      "X-Amzn-Trace-Id": "Root=1-602269a1-670a1cff406e773611659033"
    },
    "origin": "47.240.65.248",
    "url": "http://httpbin.org/get?name1=\u7f51\u7edc&name2=\u722c\u866b"
  }
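
To see the encoding that actually goes over the wire, print res.url: requests percent-encodes the UTF-8 bytes of each parameter (the \uXXXX escapes above are just httpbin's JSON escaping). A minimal sketch:

  import requests

  test_url = "http://httpbin.org/get"
  params = {"name1": "网络", "name2": "爬虫"}
  res = requests.get(test_url, params=params)
  # the query string is sent percent-encoded, e.g.
  # http://httpbin.org/get?name1=%E7%BD%91%E7%BB%9C&name2=%E7%88%AC%E8%99%AB
  print(res.url)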

Testing headers

httpbin echoes the request headers back in the response body, so a custom User-Agent is easy to verify.

  import requests

  test_url = "http://httpbin.org/get"  # test URL
  headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56"}
  params = {"name1": "网络", "name2": "爬虫"}
  res = requests.get(test_url, params=params, headers=headers)
  print(res.text)

  {
    "args": {
      "name1": "\u7f51\u7edc",
      "name2": "\u722c\u866b"
    },
    "headers": {
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate",
      "Host": "httpbin.org",
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56",
      "X-Amzn-Trace-Id": "Root=1-60226c1f-7f595f34039a54485b1ace0d"
    },
    "origin": "47.240.65.248",
    "url": "http://httpbin.org/get?name1=\u7f51\u7edc&name2=\u722c\u866b"
  }

Passing cookies directly via the cookies parameter of the request method

  import requests

  test_url = "http://httpbin.org/get"  # test URL
  headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56"}
  cookies = {"sessionid": "hashcode", "userid": "987654321"}
  params = {"name1": "网络", "name2": "爬虫"}
  res = requests.get(test_url, params=params, headers=headers, cookies=cookies)
  print(res.text)

  {
    "args": {
      "name1": "\u7f51\u7edc",
      "name2": "\u722c\u866b"
    },
    "headers": {
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate",
      "Cookie": "sessionid=hashcode; userid=987654321",
      "Host": "httpbin.org",
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56",
      "X-Amzn-Trace-Id": "Root=1-60226e2a-5a62cdd162da8d122627d70f"
    },
    "origin": "47.240.65.248",
    "url": "http://httpbin.org/get?name1=\u7f51\u7edc&name2=\u722c\u866b"
  }

Setting a timeout

  import requests

  test_url = "http://httpbin.org/get"  # test URL
  headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56"}
  cookies = {"sessionid": "hashcode", "userid": "987654321"}
  params = {"name1": "网络", "name2": "爬虫"}
  res = requests.get(test_url, params=params, headers=headers, cookies=cookies, timeout=1)
  print(res.text)
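
If the server takes longer than the timeout, requests raises requests.exceptions.Timeout instead of returning a response. A minimal sketch, using httpbin's /delay endpoint to force the timeout (timeout can also be a (connect, read) tuple):

  import requests

  try:
      # /delay/3 answers after 3 seconds, so a 1-second timeout fires
      res = requests.get("http://httpbin.org/delay/3", timeout=1)
      print(res.status_code)
  except requests.exceptions.Timeout:
      print("request timed out")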

Disabling redirects

With redirects disabled, a site such as GitHub that relies on redirection can no longer be browsed normally; the redirect response itself comes back, showing status code 302.
302 means a temporary redirect.

  import requests

  # redirects disabled
  url = "http://github.com"
  res_gh = requests.get(url, allow_redirects=False)
  print(res_gh.text)
  print(res_gh.status_code)
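
For comparison, with redirects left enabled (the default) requests follows the chain to the final page and keeps the intermediate 3xx responses in res.history; the redirect target is also visible in the Location header of the unfollowed response. A short sketch:

  import requests

  url = "http://github.com"

  # redirects disabled: the 3xx response itself comes back
  res = requests.get(url, allow_redirects=False)
  print(res.status_code, res.headers.get("Location"))  # e.g. a 3xx code and https://github.com/

  # redirects enabled (default): the final page comes back
  res = requests.get(url)
  print(res.status_code)                        # 200
  print([r.status_code for r in res.history])   # the 3xx hops that were followed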

Using a proxy

For demonstration only.

  import requests

  test_url = "http://httpbin.org/get"  # test URL
  headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56"}
  cookies = {"sessionid": "hashcode", "userid": "987654321"}
  proxies = {"http": "123.123.12.123"}  # placeholder address, not a real proxy
  params = {"name1": "网络", "name2": "爬虫"}
  res = requests.get(test_url, params=params, headers=headers, cookies=cookies, timeout=100, proxies=proxies)
  print(res.text)
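
A note on the proxies dict: each key selects the scheme of the target URL, and each value should itself be a complete proxy URL (scheme, host, port, and credentials if needed). A hedged sketch with made-up addresses:

  import requests

  # placeholder proxy addresses -- substitute a working proxy before running
  proxies = {
      "http": "http://123.123.12.123:8080",    # used for http:// targets
      "https": "http://123.123.12.123:8080",   # used for https:// targets
      # with authentication: "http://user:password@host:port"
  }
  res = requests.get("http://httpbin.org/get", proxies=proxies, timeout=10)
  print(res.json()["origin"])  # should show the proxy's IP rather than your own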

Certificate verification

  import requests

  # certificate verification
  url = "https://inv-veri.chinatax.gov.cn/"
  res_ca = requests.get(url)
  print(res_ca.text)

  During handling of the above exception, another exception occurred:
  Traceback (most recent call last):
    File "C:\Users\41999\Documents\PycharmProjects\Python\TZ—Spyder\第三节Request库的使用\request_demo.py", line 40, in <module>
      res_ca = requests.get(url)
    File "C:\Users\41999\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 76, in get
      return request('get', url, params=params, **kwargs)
    File "C:\Users\41999\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 61, in request
      return session.request(method=method, url=url, **kwargs)
    File "C:\Users\41999\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 542, in request
      resp = self.send(prep, **send_kwargs)
    File "C:\Users\41999\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 655, in send
      r = adapter.send(request, **kwargs)
    File "C:\Users\41999\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 514, in send
      raise SSLError(e, request=request)
  requests.exceptions.SSLError: HTTPSConnectionPool(host='inv-veri.chinatax.gov.cn', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1123)')))

Comparison: after setting verify=False the certificate is no longer verified, but a warning is still printed alongside the response.

  import requests

  # certificate verification disabled
  url = "https://inv-veri.chinatax.gov.cn/"
  res_ca = requests.get(url, verify=False)
  print(res_ca.text)

  C:\Users\41999\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py:1013: InsecureRequestWarning: Unverified HTTPS request is being made to host 'inv-veri.chinatax.gov.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
    warnings.warn(
  <!DOCTYPE html>
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <title>国家税务总局全国增值税发票查验平台</title>
  <meta name="keywords" content="">
  <META HTTP-EQUIV="pragma" CONTENT="no-cache">

Suppressing the warning with urllib3 (urllib3.disable_warnings())

  import requests
  import urllib3

  # certificate verification off, warnings silenced via urllib3
  urllib3.disable_warnings()
  url = "https://inv-veri.chinatax.gov.cn/"
  res_ca = requests.get(url, verify=False)
  print(res_ca.text)

Suppressing the warning through the requests package (no need to import urllib3)

  import requests

  # certificate verification off, warnings silenced via the bundled urllib3
  requests.packages.urllib3.disable_warnings()
  url = "https://inv-veri.chinatax.gov.cn/"
  res_ca = requests.get(url, verify=False)
  print(res_ca.text)

  <!DOCTYPE html>
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <title>国家税务总局全国增值税发票查验平台</title>
  <meta name="keywords" content="">
  <META HTTP-EQUIV="pragma" CONTENT="no-cache">
  <META HTTP-EQUIV="Cache-Control" CONTENT="no-cache, must-revalidate">
  <META HTTP-EQUIV="expires" CONTENT="0">

4. Receiving the response

Displaying Chinese text from the page correctly

res.encoding = "utf-8"  # so that Chinese text in the page displays correctly

Getting the fetched page as a string rather than bytes

print(res.text)  # decoded as a string, not bytes

Getting the fetched data as bytes rather than a string; images and videos need this format

print(res.content)  # raw bytes; use this for images and videos
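
Because res.content is raw bytes, saving an image is just writing those bytes out in binary mode. A minimal sketch using httpbin's sample PNG endpoint:

  import requests

  # httpbin serves a sample image at /image/png
  res = requests.get("http://httpbin.org/image/png")
  with open("sample.png", "wb") as f:
      f.write(res.content)  # bytes, hence the "wb" mode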

Printing the status code

print(res.status_code)  # print the status code
print("---"*20)

Parsing the JSON body and reading the echoed headers data from it

print(res.json()["headers"]["User-Agent"])  # res.json() parses the JSON response body into a dict automatically

Getting the response headers

print(res.headers)
print("---"*20)

Getting the cookies

print(res.cookies)

Getting the URL

print(res.url)
print("---"*20)

This one retrieves the headers of the request that was sent: res.request is the prepared request object, not the requests package.

print(res.request.headers)

  import requests

  # POST request; data and headers are defined but not passed here
  data = {"data1": "Spider", "data2": "测试爬虫"}
  json = {"json_style": "json-data"}
  headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56"}
  test_url = "http://httpbin.org/post"  # test URL
  res = requests.post(test_url, json=json)
  res.encoding = "utf-8"  # so that Chinese text displays correctly
  print(res.text)  # decoded string, not bytes
  print(res.content)  # raw bytes; use this for images and videos
  print(res.status_code)  # status code
  print("---"*20)
  print(res.json()["headers"]["User-Agent"])  # parse the JSON body into a dict and read the echoed User-Agent
  print(res.headers)
  print("---"*20)
  print(res.cookies)
  print(res.url)
  print("---"*20)
  # request headers: res.request is the prepared request object, not the requests package
  print(res.request.headers)
  {
    "args": {},
    "data": "{\"json_style\": \"json-data\"}",
    "files": {},
    "form": {},
    "headers": {
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate",
      "Content-Length": "27",
      "Content-Type": "application/json",
      "Host": "httpbin.org",
      "User-Agent": "python-requests/2.25.1",
      "X-Amzn-Trace-Id": "Root=1-602342b5-6d003f1f348f62150ec0a2f5"
    },
    "json": {
      "json_style": "json-data"
    },
    "origin": "47.240.65.248",
    "url": "http://httpbin.org/post"
  }
  b'{\n "args": {}, \n "data": "{\\"json_style\\": \\"json-data\\"}", \n "files": {}, \n "form": {}, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate", \n "Content-Length": "27", \n "Content-Type": "application/json", \n "Host": "httpbin.org", \n "User-Agent": "python-requests/2.25.1", \n "X-Amzn-Trace-Id": "Root=1-602342b5-6d003f1f348f62150ec0a2f5"\n }, \n "json": {\n "json_style": "json-data"\n }, \n "origin": "47.240.65.248", \n "url": "http://httpbin.org/post"\n}\n'
  200
  ------------------------------------------------------------
  python-requests/2.25.1
  {'Content-Length': '502', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Connection': 'keep-alive', 'Content-Type': 'application/json', 'Date': 'Wed, 10 Feb 2021 02:19:33 GMT', 'Keep-Alive': 'timeout=4', 'Proxy-Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0'}
  ------------------------------------------------------------
  <RequestsCookieJar[]>
  http://httpbin.org/post
  ------------------------------------------------------------
  {'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '27', 'Content-Type': 'application/json'}

5. The Session object

Purpose: a Session automatically keeps the request-header information (cookies included) up to date across requests. It is commonly used for account logins: first visit the login page URL, then the data-submission URL.
12306 is a typical example.
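
What makes a Session useful is that cookies set by one response are sent back automatically on the next request, which is exactly what a login flow needs. A minimal sketch using httpbin's cookie endpoints to show the persistence:

  import requests

  session = requests.session()

  # first request: the server sets a cookie (a stand-in for a login response)
  session.get("http://httpbin.org/cookies/set/sessionid/abc123")

  # second request: the session sends the stored cookie back automatically
  res = session.get("http://httpbin.org/cookies")
  print(res.json())  # {'cookies': {'sessionid': 'abc123'}}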

Getting the request headers, cookies included, via res.request

  import requests

  index_url = "https://www.bing.com"
  headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56"}
  session = requests.session()
  session.headers = headers
  res_ss = session.get(index_url)
  res_ss.encoding = "utf-8"
  # print(res_ss.text)
  '''With a session in use, res_ss.request.headers shows the headers of the sent request, cookies included'''
  print("---"*40)
  print(res_ss.request.headers)
  {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.56',
  'Cookie': 'SNRHOP=TS=637485229167516980&I=1; _EDGE_S=F=1&SID=05EF50E1397D6E18095B5F3B383E6F57; _EDGE_V=1; MUID=056E607057156E9C14FE6FAA56566F14'}

2021-02-10, the day before Lunar New Year's Eve: the cookies in the headers are still an open problem; study the packet-capture tools covered later first.