1. request
2. 发送请求
地址传递值
字典传递
3. 代理
4. 定制请求头 header
添加header信息,这是最基本的反爬的措施
5. Cookie
6. 使用Corntab定时调度爬虫

1. request

request.urlopen(url) 打开一个url获取Response 对象
res.getirl() 获取主机地址
res.getcode() 获取状态码
- 200为成功
- 3xx发生了重定向
- 4xx访问资源有问题
- 5xx内部错误
res.info() 获取响应头
res.read() 获取的是字节形式的内容返回一文本对象
- textl.decode(“utf-8”) 需要指定编码才能正确显示
res.json() 如果返回为json数据直接解码
res.encoding = “utf-8” 将Response 以此编码读取

2. 发送请求

res.get(url) 发送get请求
- ```python
地址传递值
requests.get(http://httpbin.org/get?name=gemey&age=22)

字典传递

data = { ‘name’: ‘tom’, ‘age’: 20 }

response = requests.get(‘http://httpbin.org/get‘, params=data)


- 
res.post(url)   发送post请求
   - 
```python
#post请求通过字典或者 json字符串传递参数
data = {'name':'tom','age':'22'}
response = requests.post('http://httpbin.org/post', data=data)

3. 代理

同添加headers方法，代理参数也要是一个dict 属性名为proxies

proxy = {
    'http': '120.25.253.234:812',
    'https' '163.125.222.244:8123'
}
req = requests.get(url, proxies=proxy)

4. 定制请求头 header

添加header信息,这是最基本的反爬的措施有一些网站拥有反爬技术,我们需要模拟真实浏览器的包头进行访问发生

为请求添加 HTTP 头部，只要简单地传递一个 dict 给 headers 参数就可以了。

request.Request(url,headers=header)
- ```python
添加header信息,这是最基本的反爬的措施
url =”http://www.dianping.com/“ #有一些网站拥有反爬技术,我们需要模拟真实浏览器的包头进行访问发生 header={ “User-Agent”:”Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36” } #需要一个字典存放包头 req=request.Request(url,headers=header) #requests需要一个网站和包头 res=request.urlopen(req)

print(res.geturl()) #获取主机地址 print(res.getcode()) #获取请求状态码 200为成功 3xx发生了重定向 4xx访问资源有问题 5xx内部错误 print(res.info()) #获取响应头


- 
获取响应头
   - 
```python
r.headers['Content-Type']
r.headers.get('content-type')  #根据key获取响应头

5. Cookie

如果某个响应中包含一些 cookie，你可以快速访问它们

res.cookies[“键名”] 获取cookies

url ='http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies['example_cookie_name']  #获取指定key的cookies

发送cookies 通过请求时传递cookies属性

url = 'http://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
r.text

6. 使用Corntab定时调度爬虫

在linux上安装chrome

cd /etc/yum.repos.d/
touch google-chrome.repo

添加chrome源

[google-chrome]
name=google-chrome
baseurl=http://dl.google.com/linux/chrome/rpm/stable/$basearch
enabled=1
gpgcheck=1
gpgkey=https://dl-ssl.google.com/linux/linux_signing_key.pub

yum -y install google-chrome-stable --nogpgcheck

crontab -e 编写python执行

计算机从入门到入土

01. request