Using the Item Pipeline

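1. Analysis: Douyu (the target site)
2. Requirement: scrape each streamer's cover image and nickname
URL: https://m.douyu.com/api/room/list?page={}&type=yz
3. Define the target fields: items.py
4. Write the spider: spider.py
5. Store the data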

1. Create the project

    scrapy startproject douyu

2. Generate the spider

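    cd douyu
    scrapy genspider spider www.douyu.com

The first argument (spider) is the spider's name, not the project name; the second is the domain written into the generated allowed_domains and start_urls.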

3. items.py

    import scrapy


    class DouyuItem(scrapy.Item):
        nickname = scrapy.Field()     # streamer name
        verticalSrc = scrapy.Field()  # cover image URL

4. spider.py

    import json

    import scrapy

    from douyu.items import DouyuItem


    class SpiderSpider(scrapy.Spider):
        name = 'spider'
        # Disabled: 'www.douyu.com' would filter out the requests to m.douyu.com
        # allowed_domains = ['www.douyu.com']
        url = "https://m.douyu.com/api/room/list?page={}&type=yz"
        offset = 0
        start_urls = [url.format(offset)]

        def parse(self, response):
            # Convert the JSON response text into Python objects
            datas = json.loads(response.text)['data']['list']
            for data in datas:
                item = DouyuItem()
                item['nickname'] = data['nickname']
                item['verticalSrc'] = data['verticalSrc']
                yield item
            # Request the next page; stop once a page comes back empty
            if datas:
                self.offset += 1
                yield scrapy.Request(self.url.format(self.offset), callback=self.parse)
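The parsing and paging logic in parse() can be tried without Scrapy or network access. The sketch below assumes the response shape the spider relies on (data, then list, then nickname and verticalSrc per room); the sample values are made up, and treating an empty list as the last page is an assumption about the API, not documented behaviour. parse_rooms and next_page_url are hypothetical helper names.

```python
import json

# URL template copied from the spider
URL_TEMPLATE = "https://m.douyu.com/api/room/list?page={}&type=yz"

# Hypothetical response bodies: only the data -> list -> nickname/verticalSrc
# shape comes from the spider; the values are invented for illustration.
SAMPLE_PAGE = json.dumps({
    "data": {"list": [
        {"nickname": "alice", "verticalSrc": "https://img.example.com/a.jpg"},
        {"nickname": "bob", "verticalSrc": "https://img.example.com/b.jpg"},
    ]}
})
EMPTY_PAGE = json.dumps({"data": {"list": []}})


def parse_rooms(text):
    """Extract the two fields DouyuItem declares from one page of JSON."""
    rooms = json.loads(text)["data"]["list"]
    return [{"nickname": r["nickname"], "verticalSrc": r["verticalSrc"]} for r in rooms]


def next_page_url(offset, rooms):
    """URL for page offset + 1, or None once a page is empty (assumed end signal)."""
    if not rooms:
        return None
    return URL_TEMPLATE.format(offset + 1)


first = parse_rooms(SAMPLE_PAGE)
print(first[0]["nickname"])                       # alice
print(next_page_url(0, first))                    # https://m.douyu.com/api/room/list?page=1&type=yz
print(next_page_url(7, parse_rooms(EMPTY_PAGE)))  # None
```

Without a stop condition like this, the spider keeps requesting higher page numbers indefinitely.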

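5. Save the images to a local folder: pipelines.py

    import os

    import scrapy
    from scrapy.exceptions import DropItem
    from scrapy.pipelines.images import ImagesPipeline


    class DouyuPipeline(ImagesPipeline):
        # Issue one download request per item for its cover image
        def get_media_requests(self, item, info):
            image_url = item['verticalSrc']
            yield scrapy.Request(image_url)

        # Runs after the downloads finish: rename the file after the streamer
        def item_completed(self, results, item, info):
            # results is a list of (success, file_info_or_failure) tuples
            if not results or not results[0][0]:
                raise DropItem(f"Image download failed: {item['verticalSrc']}")
            # Read the base folder from IMAGES_STORE instead of hard-coding it
            path = info.spider.settings.get('IMAGES_STORE')
            image_path = results[0][1]['path']  # relative path, e.g. 'full/<sha1>.jpg'
            os.replace(os.path.join(path, image_path),
                       os.path.join(path, item['nickname'] + '.jpg'))
            return item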
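Nicknames can contain characters that Windows does not allow in file names (such as / or :), which would make the rename fail. A minimal sketch of a safer rename, assuming the default ImagesPipeline layout of full/<hash>.jpg under IMAGES_STORE; safe_filename and rename_image are hypothetical helpers, and the replace-with-underscore rule is an illustrative choice, not something Scrapy does.

```python
import os
import re
import tempfile

# Characters Windows does not allow in file names, plus ASCII control characters
INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(nickname, ext=".jpg"):
    """Replace forbidden characters with '_'; fall back to 'unnamed' if nothing is left."""
    cleaned = INVALID_CHARS.sub("_", nickname).strip().rstrip(".")
    return (cleaned or "unnamed") + ext


def rename_image(store_dir, relative_path, nickname):
    """Move <store_dir>/<relative_path> to <store_dir>/<safe nickname>.jpg."""
    src = os.path.join(store_dir, relative_path)
    dst = os.path.join(store_dir, safe_filename(nickname))
    os.replace(src, dst)  # overwrites an existing file with the same name
    return dst


# Simulate what ImagesPipeline leaves on disk: <IMAGES_STORE>/full/<hash>.jpg
with tempfile.TemporaryDirectory() as store:
    os.makedirs(os.path.join(store, "full"))
    with open(os.path.join(store, "full", "0a1b2c.jpg"), "wb") as f:
        f.write(b"\xff\xd8")  # placeholder bytes, not a real image
    renamed = rename_image(store, "full/0a1b2c.jpg", "a/b:c")
    renamed_name = os.path.basename(renamed)
    renamed_exists = os.path.exists(renamed)

print(renamed_name)  # a_b_c.jpg
```

Inside item_completed, the same call would be rename_image(path, image_path, item['nickname']).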

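Note:
Enable the pipeline through the ITEM_PIPELINES setting in settings.py, and set IMAGES_STORE to the folder where the images are saved. ImagesPipeline also requires the Pillow library to be installed.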

    # Raw string so the backslashes in the Windows path are not read as escapes
    IMAGES_STORE = r'D:\WWW\Python\class\第19节\douyu\image'
    ITEM_PIPELINES = {
        'douyu.pipelines.DouyuPipeline': 300,
    }
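The number assigned to each class in ITEM_PIPELINES sets the order in which items pass through the enabled pipelines: lower values run first, and values are customarily kept in the 0 to 1000 range. The sketch below only reproduces that ordering rule in plain Python; it does not call Scrapy, and CleanNicknamePipeline is a hypothetical second pipeline added for illustration.

```python
# Same shape as the ITEM_PIPELINES setting; the second entry is hypothetical
ITEM_PIPELINES = {
    "douyu.pipelines.DouyuPipeline": 300,
    "douyu.pipelines.CleanNicknamePipeline": 100,
}


def pipeline_order(pipelines):
    """Return pipeline paths in the order items pass through them (lowest value first)."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


print(pipeline_order(ITEM_PIPELINES))
# ['douyu.pipelines.CleanNicknamePipeline', 'douyu.pipelines.DouyuPipeline']
```

With this ordering, a cleaning pipeline at 100 would see each item before DouyuPipeline uses the nickname as a file name.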