Using the Item Pipeline
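1. Analysis: Douyu
2. Requirement: scrape each streamer's cover image and name
URL: https://m.douyu.com/api/room/list?page={}&type=yz
3. Define the target fields: items.py
4. Write the spider: spider.py
5. Store the data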
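The API URL is a template whose `page` query parameter is filled in for each request. The spider below reads `data` → `list` from the JSON body and takes `nickname` and `verticalSrc` from each entry. Here is a minimal stdlib sketch of that parsing step, assuming the response has exactly that shape. The function name `extract_rooms` and the sample payload are hypothetical. The payload is made up only to match the keys the spider uses and is not a real Douyu response.

```python
from __future__ import annotations

import json

# URL template from the requirement above; {} is replaced by the page number.
URL_TEMPLATE = "https://m.douyu.com/api/room/list?page={}&type=yz"


def extract_rooms(body: str) -> list[tuple[str, str]]:
    """Return (nickname, verticalSrc) pairs from one API response body.

    Assumes the shape {"data": {"list": [{"nickname": ..., "verticalSrc": ...}]}},
    which is what the spider code in this tutorial relies on.
    """
    rooms = json.loads(body)["data"]["list"]
    return [(room["nickname"], room["verticalSrc"]) for room in rooms]


# Hypothetical payload for illustration only (not real data).
SAMPLE_BODY = json.dumps({
    "data": {
        "list": [
            {"nickname": "streamer_a", "verticalSrc": "https://example.com/a.jpg"},
            {"nickname": "streamer_b", "verticalSrc": "https://example.com/b.jpg"},
        ]
    }
})

if __name__ == "__main__":
    print(URL_TEMPLATE.format(1))
    for nickname, src in extract_rooms(SAMPLE_BODY):
        print(nickname, src)
```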
1. Create the project
scrapy startproject douyu
2. Generate the spider
cd douyu
scrapy genspider spider www.douyu.com
(spider is the spider's name, not the project name; genspider writes www.douyu.com into allowed_domains)
3. items.py
import scrapy

class DouyuItem(scrapy.Item):
    nickname = scrapy.Field()     # streamer name
    verticalSrc = scrapy.Field()  # cover image URL
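A `scrapy.Item` behaves like a dict, but only keys declared as `scrapy.Field()` can be set. Assigning any other key raises `KeyError`, so a typo such as `item['nickName']` in the spider fails immediately instead of silently adding a new key. The sketch below is a hypothetical stdlib stand-in (`RestrictedItem` and `DouyuItemSketch` are not part of Scrapy) that mimics only that rule. It is meant for experimenting without Scrapy installed.

```python
from collections.abc import MutableMapping


class RestrictedItem(MutableMapping):
    """Hypothetical stand-in for scrapy.Item: a dict limited to declared fields."""

    fields: tuple = ()  # subclasses list their allowed keys here

    def __init__(self, **kwargs):
        self._values = {}
        for key, value in kwargs.items():
            self[key] = value

    def __setitem__(self, key, value):
        if key not in self.fields:
            # scrapy.Item also raises KeyError for undeclared fields
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


class DouyuItemSketch(RestrictedItem):
    fields = ("nickname", "verticalSrc")  # mirrors the two Fields in items.py


if __name__ == "__main__":
    item = DouyuItemSketch(nickname="streamer_a")
    try:
        item["nickName"] = "typo"  # undeclared key
    except KeyError as exc:
        print(exc)
```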
4. spider.py
import json

import scrapy

from douyu.items import DouyuItem


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    # 'www.douyu.com' would filter out requests to the API host m.douyu.com;
    # if you enable this, 'douyu.com' also covers its subdomains.
    # allowed_domains = ['douyu.com']
    url = "https://m.douyu.com/api/room/list?page={}&type=yz"
    offset = 0
    start_urls = [url.format(offset)]

    def parse(self, response):
        # Convert the JSON response body into Python objects
        datas = json.loads(response.text)['data']['list']
        # Stop paginating once a page returns no rooms
        if not datas:
            return
        for data in datas:
            item = DouyuItem()
            item['nickname'] = data['nickname']
            item['verticalSrc'] = data['verticalSrc']
            yield item
        # Move on to the next page
        self.offset += 1
        yield scrapy.Request(self.url.format(self.offset), callback=self.parse)
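The spider requests page `offset`, yields one item per room, then schedules page `offset + 1` with `parse` as the callback again. The crawl should end once a page comes back with an empty list. Without such a check, `parse` keeps scheduling the next page forever. Below is a minimal synchronous sketch of that control flow. `crawl_pages` and `fetch_page` are hypothetical names: `fetch_page` stands in for Scrapy's request/callback cycle, and the page data is made up.

```python
from typing import Callable, Iterator, List


def crawl_pages(fetch_page: Callable[[int], List[dict]], start: int = 0) -> Iterator[dict]:
    """Yield every room from consecutive pages, stopping at the first empty page.

    `fetch_page(n)` is a hypothetical stand-in for requesting page n of the API
    and returning its data.list. In the real spider this step is an asynchronous
    scrapy.Request whose callback is parse().
    """
    offset = start
    while True:
        rooms = fetch_page(offset)
        if not rooms:       # empty page: nothing left to crawl
            return
        yield from rooms    # like `yield item` for each room in parse()
        offset += 1         # like `self.offset += 1` before the next Request


# Made-up pages for illustration: two non-empty pages, then nothing.
FAKE_PAGES = {
    0: [{"nickname": "a"}, {"nickname": "b"}],
    1: [{"nickname": "c"}],
}

if __name__ == "__main__":
    for room in crawl_pages(lambda n: FAKE_PAGES.get(n, [])):
        print(room["nickname"])
```

Scrapy itself does not loop like this. Each `parse` call yields the next Request and returns, and the engine calls `parse` again when that response arrives. The empty-page check plays the same role in both versions.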
5. Save the images to a local folder: pipelines.py
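import os
import re

import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline


class DouyuPipeline(ImagesPipeline):
    # Request the cover image of each item for download
    def get_media_requests(self, item, info):
        yield scrapy.Request(item['verticalSrc'])

    # Rename the downloaded file after the streamer's nickname
    def item_completed(self, results, item, info):
        # Reuse IMAGES_STORE from settings.py instead of hard-coding the path
        path = info.spider.settings.get('IMAGES_STORE')
        # results holds one (success, info) tuple per request yielded above
        success, image_info = results[0]
        if not success:
            raise DropItem(f"Cover download failed: {item['verticalSrc']}")
        # Replace characters that are not allowed in Windows file names
        name = re.sub(r'[\\/:*?"<>|]', '_', item['nickname'])
        # image_info['path'] is relative to IMAGES_STORE, e.g. full/<sha1>.jpg;
        # os.replace also overwrites an existing file with the same name
        os.replace(os.path.join(path, image_info['path']),
                   os.path.join(path, name + '.jpg'))
        return item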
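`item_completed` receives `results`, a list with one `(success, info)` tuple per request yielded by `get_media_requests`. On success, `info` is a dict whose `'path'` is relative to `IMAGES_STORE`. On failure it is a Twisted `Failure` object, so indexing it with `'path'` would raise. The rename should therefore only run for successful entries. Below is a stdlib sketch of the rename step that handles both cases. The helper names `safe_filename` and `rename_download` are hypothetical. Replacing characters that Windows forbids in file names is an added assumption, since nicknames may contain them.

```python
import os
import re
import tempfile

# Characters Windows forbids in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(nickname: str) -> str:
    """Hypothetical helper: make a nickname usable as a file name."""
    cleaned = _INVALID_CHARS.sub("_", nickname).strip().rstrip(".")
    return cleaned or "unnamed"


def rename_download(results, nickname: str, store: str):
    """Hypothetical helper: rename the first successful download to <nickname>.jpg.

    `results` has the same shape as the argument ImagesPipeline passes to
    item_completed: a list of (success, info) tuples, where info["path"] is
    relative to `store`. Returns the new path, or None if every download failed.
    """
    for success, info in results:
        if success:
            src = os.path.join(store, info["path"])
            dst = os.path.join(store, safe_filename(nickname) + ".jpg")
            os.replace(src, dst)  # overwrites an existing file, even on Windows
            return dst
    return None  # failed entries carry a Failure object, not a dict


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store:
        # Simulate a file saved by ImagesPipeline under full/
        os.makedirs(os.path.join(store, "full"))
        open(os.path.join(store, "full", "abc123.jpg"), "wb").close()
        print(rename_download([(True, {"path": "full/abc123.jpg"})], "a/b", store))
```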
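Note:
Enable the ITEM_PIPELINES setting in settings.py and define IMAGES_STORE, the folder where the images are saved. ImagesPipeline also needs the Pillow library to be installed.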
IMAGES_STORE = r'D:\WWW\Python\class\第19节\douyu\image'
ITEM_PIPELINES = {
    'douyu.pipelines.DouyuPipeline': 300,
}
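With the default `file_path`, ImagesPipeline names each file after the SHA-1 hash of the image URL and saves it in a `full/` folder inside `IMAGES_STORE`. That relative path is what the pipeline reports in `results` before any renaming. Below is a stdlib sketch that predicts the location, assuming Scrapy's default naming (`full/<sha1 of URL>.jpg`) and using `PureWindowsPath` to build the Windows path. The function names are hypothetical. In the sketch, `IMAGES_STORE` is a raw string (`r'...'`) so its backslashes stay literal. In a normal string literal, a folder named, for example, `\new` would silently turn into a newline.

```python
import hashlib
from pathlib import PureWindowsPath

# Raw string: backslashes are kept as-is.
IMAGES_STORE = r'D:\WWW\Python\class\第19节\douyu\image'


def default_image_path(url: str) -> str:
    """Relative path ImagesPipeline uses by default: full/<sha1(url)>.jpg."""
    guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{guid}.jpg"


def absolute_image_path(url: str, store: str = IMAGES_STORE) -> PureWindowsPath:
    """Where the downloaded file lands on disk before any renaming."""
    return PureWindowsPath(store) / default_image_path(url)


if __name__ == "__main__":
    print(absolute_image_path("https://example.com/cover.jpg"))
```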