1. Extracting data from JSON

json

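  • A data-interchange format: a string that looks like a Python type (list, dict)
  • Import the json module before using it
    • import json
  • Where JSON data tends to show up
    • Switch the browser to the site's mobile version (mobile pages often load their data as JSON)
    • Capture the traffic of the site's app (packet capture)
  • json.loads()
    • Converts a JSON string into a Python object
    • json.loads(json_str)
  • json.dumps()
    • Converts a Python object into a JSON string
    • json.dumps({})
    • json.dumps(ret1, ensure_ascii=False, indent=4)
      • ensure_ascii=False: output Chinese (non-ASCII) characters as-is instead of \uXXXX escapes
      • indent=4: pretty-print, indenting each nesting level 4 spaces further than its parent
  • Case study: a Douban TV scraper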
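
A minimal sketch of json.loads() and json.dumps() as listed above. The JSON string is made-up sample data standing in for a real API response (json_str is an assumed name; ret1 mirrors the name used in the notes):

```python
import json

# Made-up JSON string, standing in for the body of an API response
json_str = '{"title": "电视剧", "rate": 8.5, "tags": ["剧情", "悬疑"]}'

# json.loads: JSON string -> Python object (a dict here)
ret1 = json.loads(json_str)
print(type(ret1))       # <class 'dict'>
print(ret1["tags"][0])  # 剧情

# Default ensure_ascii=True: non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(ret1)
print(escaped)

# ensure_ascii=False keeps the Chinese text readable; indent=4 pretty-prints
pretty = json.dumps(ret1, ensure_ascii=False, indent=4)
print(pretty)
```

Both outputs are valid JSON and load back to the same dict; the options only change how the string looks.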

2. Using the retrying module

Step 1: install the retrying library

  • pip install retrying

Step 2: import retry and apply it as a decorator

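```python
from retrying import retry

# Re-run fun1 whenever it raises, up to 3 attempts in total
@retry(stop_max_attempt_number=3)
def fun1():
    print('this is func1')  # printed once per attempt, so 3 times
    raise ValueError('this is test error')

if __name__ == '__main__':
    # After the 3rd failed attempt, the ValueError is raised to the caller
    fun1()
```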
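
To make the decorator's behavior concrete, here is a hand-written, stdlib-only stand-in. It is not how retrying is implemented, and retry_sketch and flaky_fetch are hypothetical names. Like @retry(stop_max_attempt_number=N) with its defaults, it retries on any exception with no wait in between and re-raises the last exception once N attempts are used up:

```python
import functools

def retry_sketch(stop_max_attempt_number=3):
    """Hypothetical stand-in for retrying's @retry(stop_max_attempt_number=N)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, stop_max_attempt_number + 1):
                try:
                    return func(*args, **kwargs)  # success: stop retrying
                except Exception:
                    if attempt == stop_max_attempt_number:
                        raise  # out of attempts: propagate the last error
        return wrapper
    return decorator

calls = {"n": 0}

@retry_sketch(stop_max_attempt_number=3)
def flaky_fetch():
    # Hypothetical request function: fails twice, then succeeds on the 3rd call
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("temporary failure")
    return "ok"

print(flaky_fetch())  # ok
print(calls["n"])     # 3
```

In a scraper, the decorated function would typically wrap the network request, so a transient timeout does not abort the whole run.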

3. Learning XPath
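
A short sketch of common XPath expressions, evaluated with lxml (the tool used in the next section). The HTML fragment, class names, and URLs are invented for illustration:

```python
from lxml import etree

# Made-up HTML fragment standing in for a downloaded page
text = """
<ul>
  <li class="item"><a href="/tv/1">Show A</a></li>
  <li class="item"><a href="/tv/2">Show B</a></li>
  <li class="ad"><a href="/promo">Ad</a></li>
</ul>
"""

html = etree.HTML(text)  # parse the string into an element tree

# //            : match nodes at any depth
# [@class="x"]  : keep only nodes whose attribute equals a value
# /@href        : select an attribute's value
# /text()       : select a node's text
hrefs = html.xpath('//li[@class="item"]/a/@href')
titles = html.xpath('//li[@class="item"]/a/text()')

# [1] picks the first <li> among its siblings (XPath counts from 1, not 0)
first_title = html.xpath('//li[1]/a/text()')

print(hrefs, titles, first_title)
```

xpath() always returns a list, even for a single match, so take [0] only after checking that the list is not empty.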

4. Learning the lxml module
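
A sketch of two lxml features that scraping code commonly relies on: etree.HTML() parses incomplete HTML and fills in the missing tags, and etree.tostring() serializes the repaired tree (as bytes by default). The HTML string is made up:

```python
from lxml import etree

# Made-up, deliberately incomplete HTML: the second <li> and the <ul> are never closed
broken = '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a>'

# lxml's forgiving HTML parser adds the missing closing tags
# and wraps the fragment in <html><body>
root = etree.HTML(broken)

# etree.tostring returns bytes by default; decode to inspect the repaired markup
fixed = etree.tostring(root).decode()
print(fixed)

# The repaired tree can be queried with XPath like any other
links = root.xpath('//a/@href')
print(links)
```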

5. Requests involving cookies
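
For pages that require login, one common approach is to copy the Cookie header from the browser's developer tools and turn it into a dict. A stdlib-only sketch; the cookie names and values are made up:

```python
# Cookie header copied from the browser's developer tools (hypothetical values)
cookie_str = "sessionid=abc123; csrftoken=xyz789; theme=dark"

# Split into "name=value" pairs, then split each pair on the FIRST "=" only,
# since a cookie value may itself contain "="
cookies = dict(
    pair.strip().split("=", 1)
    for pair in cookie_str.split(";")
    if pair.strip()
)

print(cookies)  # {'sessionid': 'abc123', 'csrftoken': 'xyz789', 'theme': 'dark'}
```

With requests, the dict can be passed as requests.get(url, cookies=cookies), or the raw string can be sent unchanged via headers={"Cookie": cookie_str}.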

6. Sessions

Master Python Web Scraping in 6 Lessons - Figure 24
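
Assuming this section refers to requests.Session (the usual tool in requests for keeping a login alive), here is a sketch of what a session carries between requests. The URL is a placeholder and the cookie value is made up; the request is only prepared, never sent:

```python
import requests

# A Session reuses cookies and default headers across every request it sends,
# so a cookie set by a login response is attached to later requests automatically
session = requests.Session()

# Default header applied to every request made through this session
session.headers.update({"User-Agent": "Mozilla/5.0"})

# Stand-in for a cookie the server would set after logging in (made-up value)
session.cookies.set("sessionid", "abc123")

# Build (but do not send) a request to inspect what the session would attach;
# https://example.com/profile is a placeholder URL
req = requests.Request("GET", "https://example.com/profile")
prepped = session.prepare_request(req)

print(prepped.headers.get("Cookie"))
print(prepped.headers.get("User-Agent"))

# In real use you would send through the session instead, e.g.:
# session.post(login_url, data=form_data)  # login_url, form_data: hypothetical
# session.get(profile_url)                 # reuses the cookies from the login
```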