Using Item Pipelines
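1. Target site: Douyu
2. Requirement: scrape each streamer's cover image and nickname
URL: https://m.douyu.com/api/room/list?page={}&type=yz
3. Define the target fields: items.py
4. Write the spider: spider.py
5. Store the data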
1. Create the project
scrapy startproject douyu
2. Create the spider
cd douyu
scrapy genspider spider www.douyu.com
(The first argument, spider, is the spider name; the second is the domain.)
3. items.py
import scrapy


class DouyuItem(scrapy.Item):
    nickname = scrapy.Field()      # streamer nickname
    verticalSrc = scrapy.Field()   # cover image URL
4. spider.py
import json

import scrapy

from douyu.items import DouyuItem


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    # allowed_domains = ['www.douyu.com']  # left disabled: the API is served from m.douyu.com
    url = "https://m.douyu.com/api/room/list?page={}&type=yz"
    offset = 0
    start_urls = [url.format(offset)]

    def parse(self, response):
        # Decode the JSON body and pull out the list of rooms
        datas = json.loads(response.text)['data']['list']
        for data in datas:
            item = DouyuItem()
            item['nickname'] = data['nickname']
            item['verticalSrc'] = data['verticalSrc']
            yield item

        # Go to the next page; stop once a page comes back empty
        # (assumes the API returns an empty list past the last page)
        if datas:
            self.offset += 1
            yield scrapy.Request(self.url.format(self.offset), callback=self.parse)
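The parsing and paging logic in the spider can be tried out without Scrapy or network access. The following is a minimal sketch, assuming the response has the data -> list -> nickname/verticalSrc shape that the spider reads. The sample payload, the example.com URLs, and the helper names extract_rooms and next_page_url are hypothetical and exist only for illustration.

```python
import json

URL_TEMPLATE = "https://m.douyu.com/api/room/list?page={}&type=yz"

# Hypothetical payload: only the shape follows the spider, the values are invented.
SAMPLE = json.dumps({"data": {"list": [
    {"nickname": "streamer_a", "verticalSrc": "https://example.com/a.jpg"},
    {"nickname": "streamer_b", "verticalSrc": "https://example.com/b.jpg"},
]}})


def extract_rooms(text):
    """Return one dict per room, keeping only the two fields the item needs."""
    rooms = json.loads(text)["data"]["list"]
    return [{"nickname": r["nickname"], "verticalSrc": r["verticalSrc"]} for r in rooms]


def next_page_url(page):
    """Build the URL of the page that follows `page`."""
    return URL_TEMPLATE.format(page + 1)


if __name__ == "__main__":
    for room in extract_rooms(SAMPLE):
        print(room["nickname"], room["verticalSrc"])
    print(next_page_url(0))
```

Keeping the extraction in a plain function like this makes it easy to check against a saved response before running the full crawl.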
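5. Save the images to a folder: pipelines.py
import os

import scrapy
from scrapy.exceptions import DropItem
from scrapy.pipelines.images import ImagesPipeline


class DouyuPipeline(ImagesPipeline):
    # Schedule the download of the cover image
    def get_media_requests(self, item, info):
        image_url = item['verticalSrc']
        yield scrapy.Request(image_url)

    # Rename the downloaded file after the streamer's nickname
    def item_completed(self, results, item, info):
        # results is a list of (success, file_info_or_failure) tuples
        if not results or not results[0][0]:
            raise DropItem('image download failed')
        # Must be the same directory as IMAGES_STORE in settings.py
        path = r'D:\WWW\Python\class\第19节\douyu\image'
        image_path = results[0][1]['path']
        # os.replace overwrites an existing target; os.rename raises on Windows if it exists
        os.replace(os.path.join(path, image_path), os.path.join(path, item['nickname'] + '.jpg'))
        return item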
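Because the pipeline uses the nickname directly as a file name, a nickname containing characters such as / or ? would make the rename fail on Windows. A minimal sketch of one way to guard against that, using only the standard library. The helpers safe_filename and renamed_path are hypothetical and not part of Scrapy, and the replacement character and fallback name are arbitrary choices.

```python
import re
from pathlib import PurePosixPath

# Characters that Windows does not allow in file names.
_ILLEGAL = re.compile(r'[\\/:*?"<>|]')


def safe_filename(nickname, default="unnamed"):
    """Replace illegal characters with '_' and trim spaces and dots from both ends."""
    cleaned = _ILLEGAL.sub("_", nickname).strip(" .")
    return cleaned or default


def renamed_path(store, nickname):
    """Target path for an image inside the storage directory `store`."""
    return str(PurePosixPath(store) / (safe_filename(nickname) + ".jpg"))


if __name__ == "__main__":
    print(renamed_path("image", "a/b:c"))
```

In item_completed, safe_filename(item['nickname']) could then take the place of the raw nickname when building the target path. Two streamers whose nicknames clean to the same name would still collide, which this sketch does not handle.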
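Note:
Enable the ITEM_PIPELINES setting in settings.py and define the directory where images are stored. ImagesPipeline also requires the Pillow library to be installed.
IMAGES_STORE = r'D:\WWW\Python\class\第19节\douyu\image'
ITEM_PIPELINES = {
    'douyu.pipelines.DouyuPipeline': 300,
}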
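A Windows path written as an ordinary string literal only survives intact when none of its backslash sequences happen to be a recognized escape such as \t or \n. A minimal sketch of the difference, using a hypothetical path (D:\temp\new) chosen only because it triggers the problem:

```python
from pathlib import PureWindowsPath

# Hypothetical path, used only to show the escape problem.
plain = 'D:\temp\new'    # \t becomes a tab and \n becomes a newline
raw = r'D:\temp\new'     # a raw string keeps every backslash literally

# PureWindowsPath joins segments with backslashes on any operating system.
store = PureWindowsPath(raw) / 'image'

if __name__ == "__main__":
    print(repr(plain))
    print(repr(raw))
    print(store)
```

Writing the path as a raw string, or with forward slashes, avoids having to check each directory name for accidental escapes.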
