Common commands
Basic usage
Use response to hold the content returned by a GET request.
import requests  # import the library
response = requests.get("http://httpbin.org/get")
print(response.text)
GET request with parameters (params)
The keyword argument is named params.
import requests
data = {
    'name': 'germey',  # note the comma between entries
    'age': 22
}
response = requests.get("http://httpbin.org/get", params=data)
# Passing the data dict as params is all that is needed.
print(response.text)
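As a rough illustration of what params does, the sketch below builds the same query string with the standard library's urllib.parse.urlencode. It approximates the encoding step and is not the code requests itself runs; build_url is a hypothetical helper introduced only for this example.

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Append params to base as a URL-encoded query string."""
    return base + "?" + urlencode(params)


url = build_url("http://httpbin.org/get", {"name": "germey", "age": 22})
print(url)
```

Values are converted to strings and escaped along the way, which is why passing a dict is usually safer than concatenating the URL by hand.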
Parsing JSON
response.json() is equivalent to json.loads(response.text).
import requests
import json
response = requests.get("http://httpbin.org/get")
print(type(response.text))  # <class 'str'>
print(response.json())
print(json.loads(response.text))  # prints the same dict
print(type(response.json()))  # <class 'dict'>; note the parentheses after response.json
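The string-to-dict step can be tried without a network connection. The sketch below runs json.loads on a hand-written body that is only assumed to resemble an httpbin.org/get reply; the field values are made up.

```python
import json

# Hand-written sample body, shaped like an httpbin.org/get reply (an assumption).
body = '{"args": {"name": "germey", "age": "22"}, "url": "http://httpbin.org/get"}'

parsed = json.loads(body)

print(type(body).__name__)     # str
print(type(parsed).__name__)   # dict
print(parsed["args"]["name"])  # germey
```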
Getting binary data
A common approach when downloading videos, images, and similar files.
import requests
response = requests.get("https://github.com/favicon.ico")
print(type(response.text), type(response.content))  # <class 'str'> <class 'bytes'>
print(response.text)  # the body decoded as a string
print(response.content)  # the raw bytes of the body
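Why binary files should come from content rather than text can be shown with plain bytes. The sketch below uses a made-up six-byte payload and decodes it with errors="replace", which is similar in spirit to how a text view of a binary body is produced; the payload is an assumption, not real icon data.

```python
# Hypothetical payload: the four-byte ICO header followed by two bytes
# that are not valid UTF-8.
raw = bytes([0x00, 0x00, 0x01, 0x00, 0xFF, 0xFE])

as_text = raw.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
round_trip = as_text.encode("utf-8")

print(round_trip == raw)  # False: decoding lost information
```

Once bytes have been replaced during decoding, the original file cannot be recovered from the string.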
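Saving a file locally
On a first run, 'wb' and 'ab' appear to give the same result because the file does not exist yet. They differ once it does: 'wb' truncates the file and writes it afresh, while 'ab' appends to whatever is already there. 'wb' is therefore the right mode for a download. favicon.ico is the name the file is saved under.
import requests
response = requests.get("https://github.com/favicon.ico")
with open('favicon.ico', 'wb') as f:
    f.write(response.content)  # the with block closes the file on exit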
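The difference between the two modes only shows up on a second write. A minimal sketch, using a temporary directory and a four-byte stand-in for response.content (both assumptions made for the example):

```python
import os
import tempfile

payload = b"\x00\x00\x01\x00"  # stand-in for response.content


def size_after_two_writes(mode):
    """Write payload twice with the given mode and return the file size."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "favicon.ico")
        for _ in range(2):
            with open(path, mode) as f:
                f.write(payload)
        return os.path.getsize(path)


print(size_after_two_writes("wb"))  # 4: the second write replaced the first
print(size_after_two_writes("ab"))  # 8: the second write was appended
```

With 'ab', rerunning a download script would leave two copies of the data in one file, which for an image means a corrupt file.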
Adding headers
Without a browser-like User-Agent, some sites treat the request as a crawler and reject it.
import requests
headers = {
'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
}
response = requests.get("https://www.zhihu.com/explore",headers=headers)
print(response.text)
Basic POST request
The keyword argument is named data; pass the form fields as a dict.
import requests
data = {'name':'feilong','age':'22'}
response = requests.post("http://httpbin.org/post",data=data)
print(response.text)
Responses
Response attributes
import requests
response = requests.get('http://www.jianshu.com')
print(type(response.status_code),response.status_code)
print(type(response.headers),response.headers)
print(type(response.cookies),response.cookies)
print(type(response.url),response.url)
print(type(response.history),response.history)
Checking the status code
import requests
response = requests.get('http://www.jianshu.com')
if response.status_code == requests.codes.ok:
    print('Request successful')
else:
    exit()
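requests.codes.ok is the integer 200, so the check above treats other success codes such as 204 as failures. The sketch below shows one possible looser check using only the standard library; describe_status and is_success are hypothetical helpers, not part of requests.

```python
from http import HTTPStatus


def describe_status(code):
    """Return e.g. '404 Not Found' for a numeric status code."""
    return f"{code} {HTTPStatus(code).phrase}"


def is_success(code):
    """True for any 2xx status code."""
    return 200 <= code < 300


print(describe_status(200))  # 200 OK
print(is_success(204))       # True
print(is_success(404))       # False
```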
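Advanced operations
File upload
Files are uploaded with a POST request through the files parameter.
'file_name' is the form field name the file is sent under; it can be anything you choose.
import requests
with open('favicon.ico', 'rb') as f:  # close the file once the upload is done
    files = {'file_name': f}
    response = requests.post("http://httpbin.org/post", files=files)
print(response.text)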
Getting cookies
import requests
response = requests.get("https://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)
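Cookies reach the client in Set-Cookie response headers, and the key=value pairs printed above are what is left after the attributes are stripped. A minimal sketch with the standard library's http.cookies, using a made-up header value (the cookie name and attributes are assumptions):

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value, made up for illustration.
header = "BDORZ=27315; max-age=86400; domain=.baidu.com; path=/"

jar = SimpleCookie()
jar.load(header)

# Keep only name -> value; max-age, domain and path are attributes of the cookie.
pairs = {name: morsel.value for name, morsel in jar.items()}
print(pairs)
```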
Session persistence
A Session() object lets you persist certain parameters across requests. It also keeps cookies across all requests made from the same Session instance.
import requests
s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get("http://httpbin.org/cookies")
print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
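Certificate verification
Before completing a request to an HTTPS site, get checks that the certificate is valid; if it is not, it raises an SSLError and the request is aborted.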
import requests
response = requests.get('https://www.12306.cn',verify=False)
print(response.status_code)
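Passing verify=False, as above, avoids the interruption.
It does, however, produce a warning (roughly: you are not verifying the certificate, and verification is strongly advised). The code below suppresses the warning.
import requests
import urllib3
urllib3.disable_warnings()
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
Now only the status code is printed, 200 if the request succeeds.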
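What switching verification off means at the TLS level can be sketched with the standard library's ssl module. This is an analogy under the assumption that verify=False has a similar effect; it is not the code path requests uses.

```python
import ssl

# Default context: the certificate chain and the hostname are both checked.
strict = ssl.create_default_context()

# Relaxed context: both checks switched off.
relaxed = ssl.create_default_context()
relaxed.check_hostname = False  # must be disabled before setting CERT_NONE
relaxed.verify_mode = ssl.CERT_NONE

print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)  # True True
print(relaxed.verify_mode == ssl.CERT_NONE, relaxed.check_hostname)    # True False
```

With both checks off, the connection is still encrypted but the client no longer knows who it is talking to, which is why the warning exists.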
A local client certificate can also be supplied through cert as a (certificate, key) tuple; both files must exist locally.
import requests
response = requests.get('https://www.12306.cn',cert=('/path/server.cert','/path/key'))
print(response.status_code)
Proxy settings
HTTP/HTTPS proxies
Without a password
import requests
proxies = {
    "http": "http://127.0.0.1:9843",  # proxy address
    "https": "https://127.0.0.1:9023"  # proxy address
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
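With a password
import requests
proxies = {
    "http": "http://user:password@127.0.0.1:9843",  # address with username and password
    "https": "http://user:password@127.0.0.1:9843",  # the entry is chosen by the target URL's scheme
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)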
SOCKS proxies
import requests  # SOCKS support needs the extra: pip install requests[socks]
proxies = {
    'http': 'socks5://127.0.0.1:9742',
    'https': 'socks5://127.0.0.1:9742'
}
response = requests.get("https://www.taobao.com", proxies=proxies)
print(response.status_code)
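A proxy address has to be a full URL of the form scheme://[user:password@]host:port. A small sanity check with urllib.parse.urlsplit can catch a malformed entry before any request is sent; proxy_parts is a hypothetical helper written for this sketch.

```python
from urllib.parse import urlsplit


def proxy_parts(proxy_url):
    """Split a proxy URL into (scheme, username, host, port)."""
    parts = urlsplit(proxy_url)
    return parts.scheme, parts.username, parts.hostname, parts.port


print(proxy_parts("http://user:password@127.0.0.1:9843"))
print(proxy_parts("socks5://127.0.0.1:9742"))
print(proxy_parts("https:127.0.0.1:9023"))  # no "//": host and port are not recognized
```

A missing host in the result is a sign that the "//" after the scheme was left out.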
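Timeout settings
If the server does not respond within the given time, an exception is raised instead of waiting indefinitely.
import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get("http://httpbin.org/get", timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print("timeout")
First, requests is imported, along with the ReadTimeout exception class from requests.exceptions.
Second, a try/except block handles the outcome: on success it prints the status code (200), and on a read timeout it prints timeout.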
Authentication settings
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://zfl420.pythonanywhere.com/admin/blog/post/add/',auth=HTTPBasicAuth('zfl420','nihao52cons'))
print(r.status_code)
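HTTP Basic authentication boils down to one request header. The sketch below builds that header by hand to show roughly what is sent on the wire; basic_auth_header and the user/pass credentials are placeholders for illustration, not the library's own code.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption, so Basic auth should only be used over HTTPS.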
Exception handling
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException
try:
    response = requests.get("http://httpbin.org/get", timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print('timeout')
except ConnectionError:
    print('Connection Error')
except RequestException:
    print('Error')
Different exceptions can be caught separately, ordered from most specific to most general.
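The order of the except clauses matters because RequestException is the base class of the other two. The sketch below uses simplified stand-in classes (not the real requests exceptions) to show what happens when the base class is listed first.

```python
class DemoRequestException(Exception):
    """Stand-in for requests.exceptions.RequestException."""


class DemoConnectionError(DemoRequestException):
    """Stand-in for requests.exceptions.ConnectionError."""


class DemoReadTimeout(DemoRequestException):
    """Stand-in for requests.exceptions.ReadTimeout (hierarchy simplified)."""


def classify(exc):
    """Specific handlers first, base class last."""
    try:
        raise exc
    except DemoReadTimeout:
        return "timeout"
    except DemoConnectionError:
        return "Connection Error"
    except DemoRequestException:
        return "Error"


def classify_wrong_order(exc):
    """Base class first: the specific handler below is never reached."""
    try:
        raise exc
    except DemoRequestException:
        return "Error"
    except DemoReadTimeout:
        return "timeout"


print(classify(DemoReadTimeout()))              # timeout
print(classify_wrong_order(DemoReadTimeout()))  # Error
```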