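'''
Weather scraping example
Goal: scrape the 7-day forecast (date, weather, temperature, wind force)
url: http://www.weather.com.cn/weather/101250101.shtml (7-day forecast for Changsha, Hunan)
(Check the page source: searching it for 今天 ("today") finds a match, so the data is in the static HTML and the URL above is the one to request.)
Steps:
1) Fetch the full page source
2) Extract the ul block
3) Find the li items inside the ul block
4) Extract the date, weather, temperature, and wind force from each li item
5) Save the rows to a CSV file
'''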
import requests
import re
import csv

url = 'http://www.weather.com.cn/weather/101250101.shtml'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
}
# Send the request and get the response (pass the headers so the user-agent is actually used)
res = requests.get(url, headers=headers)
res.encoding = 'utf-8'
html = res.text
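# Assumed page markup: the forecast is a <ul class="t clearfix"> whose <li> items hold
# the date in <h1>, the weather in <p>, and the temperature and wind force in <i> tags.
# Adjust the tag names if the live page differs.
# Grab the ul block that holds the forecast
result = re.match(r'.*?(<ul class="t clearfix">.*?</ul>)', html, re.S)
ul = result.group(1)
# Split the ul block into individual li items
lis = re.findall(r'<li.*?>.*?</li>', ul, re.S)
# Capture date, weather, temperature, and wind force from one li item
pattern = re.compile(r'<li.*?>.*?<h1>(.*?)</h1>.*?<p.*?>(.*?)</p>.*?<i>(.*?)</i>.*?<i>(.*?)</i>', re.S)
data = []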
for i in lis:
    r = pattern.match(i)
    if r is None:
        # Skip any li item that does not have the expected layout
        continue
    # print(r.group(1), r.group(2), r.group(3), r.group(4))
    data.append((r.group(1), r.group(2), r.group(3), r.group(4)))
with open('weather.csv', 'w', encoding='utf-8', newline='') as f:
    # Create the writer object
    writer = csv.writer(f)
    # Write the header row (it must also be passed as a list)
    writer.writerow(['Date', 'Weather', 'Temperature', 'Wind force'])
    writer.writerows(data)
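The ul, li, and field extraction steps can be tried without any network access. The sketch below runs them against a hand-written HTML fragment. The fragment, the tag and class names in it, and the helper name `parse_forecast` are all assumptions made for illustration; the live page may be laid out differently, so treat this as a way to check the regex logic rather than as a description of the real markup.

```python
import re
from html import unescape

# Hand-written sample, assumed to resemble the forecast list (not copied from the live page).
SAMPLE_HTML = """
<html><body>
<ul class="t clearfix">
<li class="sky skyid lv1">
<h1>Day 1 (today)</h1>
<p title="Cloudy" class="wea">Cloudy</p>
<p class="tem"><span>30</span>/<i>22C</i></p>
<p class="win"><em><span title="South"></span></em><i>&lt;3</i></p>
</li>
<li class="sky skyid lv2">
<h1>Day 2 (tomorrow)</h1>
<p title="Light rain" class="wea">Light rain</p>
<p class="tem"><span>27</span>/<i>21C</i></p>
<p class="win"><em><span title="North"></span></em><i>3-4</i></p>
</li>
</ul>
</body></html>
"""

# re.S lets "." match newlines, so each pattern can span several lines of HTML.
UL_RE = re.compile(r'<ul class="t clearfix">(.*?)</ul>', re.S)
LI_RE = re.compile(r'<li.*?>.*?</li>', re.S)
FIELD_RE = re.compile(
    r'<h1>(.*?)</h1>.*?<p.*?>(.*?)</p>.*?<i>(.*?)</i>.*?<i>(.*?)</i>', re.S
)


def parse_forecast(page):
    """Return a list of (date, weather, temperature, wind) tuples (hypothetical helper)."""
    ul_match = UL_RE.search(page)
    if ul_match is None:
        return []  # the expected ul block is missing
    rows = []
    for li in LI_RE.findall(ul_match.group(1)):
        m = FIELD_RE.search(li)
        if m:
            # Decode HTML entities such as &lt; in the captured text
            rows.append(tuple(unescape(g) for g in m.groups()))
    return rows


rows = parse_forecast(SAMPLE_HTML)
for row in rows:
    print(row)
```

Returning an empty list when the ul block is missing, and skipping li items that do not match, keeps the script from failing with an AttributeError on `None` if the page layout changes.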
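Note: the csv module writes rows in two forms: 1) sequences such as lists or tuples, written with csv.writer; 2) dictionaries, written with csv.DictWriter.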
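As a small illustration of the two row forms, the sketch below writes the same rows once as tuples with `csv.writer` and once as dictionaries with `csv.DictWriter`. The sample rows and the field names are made up for the example, and the output goes to in-memory buffers instead of `weather.csv` so that nothing is written to disk.

```python
import csv
import io

# Made-up sample rows and field names, for illustration only.
fieldnames = ["date", "weather", "temperature", "wind"]
rows_as_tuples = [
    ("Day 1", "Cloudy", "22C", "<3"),
    ("Day 2", "Light rain", "21C", "3-4"),
]

# Form 1: each row is a sequence (list or tuple), written with csv.writer.
buf1 = io.StringIO(newline="")
writer = csv.writer(buf1)
writer.writerow(fieldnames)       # header row, also a sequence
writer.writerows(rows_as_tuples)

# Form 2: each row is a dict keyed by field name, written with csv.DictWriter.
rows_as_dicts = [dict(zip(fieldnames, row)) for row in rows_as_tuples]
buf2 = io.StringIO(newline="")
dict_writer = csv.DictWriter(buf2, fieldnames=fieldnames)
dict_writer.writeheader()         # header comes from fieldnames
dict_writer.writerows(rows_as_dicts)

# Both forms produce the same CSV text.
same_output = buf1.getvalue() == buf2.getvalue()
print(same_output)
print(buf1.getvalue())
```

The dictionary form is mostly useful when the rows already carry field names, since the column order is then taken from `fieldnames` and not from the position of each value.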