17-2 Dedao: "Listen to a Book Every Day" (每天听本书) information for a given category - Figure 1

The scraped results include:

  • Book title
  • Book link
  • One-sentence introduction
  • Duration
  • Price
  • Summary

    Sample results:

    17-2 Dedao: "Listen to a Book Every Day" (每天听本书) information for a given category - Figure 2

    Template:

    {"_id":"dedao-tingshu","startUrl":["https://www.igetget.com/list/%E5%BF%83%E7%90%86%E5%AD%A6/0RMtQSwfyUy"],"selectors":[
    {"id":"name","type":"SelectorLink","parentSelectors":["info"],"selector":"a","multiple":true,"delay":0},
    {"id":"intro1","type":"SelectorText","parentSelectors":["name"],"selector":".head-text h3","multiple":false,"regex":"","delay":0},
    {"id":"time","type":"SelectorText","parentSelectors":["name"],"selector":".head-text p","multiple":false,"regex":"","delay":0},
    {"id":"price","type":"SelectorText","parentSelectors":["name"],"selector":"div.book-coin-bt","multiple":false,"regex":"","delay":0},
    {"id":"intro2","type":"SelectorText","parentSelectors":["name"],"selector":".intro-content p","multiple":false,"regex":"","delay":0},
    {"id":"info","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"li.pro-detail","multiple":true,"delay":"2000","clickElementSelector":"div.page-num-div:nth-of-type(n+2)","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"}
    ]}
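To see how the template is organized before importing it, the selectors can be listed with a few lines of Python. This is only a reading aid, not part of the Web Scraper workflow. The helper name `describe` is made up for this sketch, and the mapping of selector ids to result fields (`name` for title and link, `intro1` for the one-sentence introduction, `time` for duration, `price` for price, `intro2` for the summary) is inferred from the ids rather than stated by the source.

```python
import json

# The sitemap template from the text, joined into a single JSON string.
TEMPLATE = r'''{"_id":"dedao-tingshu","startUrl":["https://www.igetget.com/list/%E5%BF%83%E7%90%86%E5%AD%A6/0RMtQSwfyUy"],"selectors":[{"id":"name","type":"SelectorLink","parentSelectors":["info"],"selector":"a","multiple":true,"delay":0},{"id":"intro1","type":"SelectorText","parentSelectors":["name"],"selector":".head-text h3","multiple":false,"regex":"","delay":0},{"id":"time","type":"SelectorText","parentSelectors":["name"],"selector":".head-text p","multiple":false,"regex":"","delay":0},{"id":"price","type":"SelectorText","parentSelectors":["name"],"selector":"div.book-coin-bt","multiple":false,"regex":"","delay":0},{"id":"intro2","type":"SelectorText","parentSelectors":["name"],"selector":".intro-content p","multiple":false,"regex":"","delay":0},{"id":"info","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"li.pro-detail","multiple":true,"delay":"2000","clickElementSelector":"div.page-num-div:nth-of-type(n+2)","clickType":"clickOnce","discardInitialElements":"do-not-discard","clickElementUniquenessType":"uniqueText"}]}'''


def describe(sitemap_json):
    """Hypothetical helper: return (id, type, parents, css) for every selector."""
    sitemap = json.loads(sitemap_json)
    return [
        (s["id"], s["type"], s["parentSelectors"], s["selector"])
        for s in sitemap["selectors"]
    ]


# Print one row per selector so the parent/child chain is visible.
for sel_id, sel_type, parents, css in describe(TEMPLATE):
    print(f"{sel_id:8} {sel_type:22} parent={parents[0]:6} css={css}")
```

Read from the root, the chain is `_root` → `info` (a click selector that pages through `li.pro-detail` items) → `name` (the link to each book's page) → the four text selectors evaluated on that page.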

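    Steps for applying the template:

    (1) Open the "Listen to a Book Every Day" (每天听本书) category page you want to scrape, for example: https://www.igetget.com/list/%E5%BF%83%E7%90%86%E5%AD%A6/0RMtQSwfyUy
    (2) Import the template
    (3) Replace the Start URL with the link of the page you want to scrape
    (4) Start scraping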
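If the same template is reused for several categories, step (3) can also be done on the JSON text before importing it. The sketch below is a minimal illustration under assumptions: `with_start_url` is a hypothetical helper, the template is abbreviated to an empty selector list, and the replacement URL is a placeholder, since only the example category URL is known from the text.

```python
import json
from urllib.parse import unquote


def with_start_url(sitemap_json, new_url):
    """Hypothetical helper: return the sitemap JSON with startUrl replaced."""
    sitemap = json.loads(sitemap_json)
    sitemap["startUrl"] = [new_url]  # Web Scraper stores start URLs as a list
    return json.dumps(sitemap, ensure_ascii=False)


# Abbreviated stand-in for the full template (selectors omitted for brevity).
template = (
    '{"_id":"dedao-tingshu",'
    '"startUrl":["https://www.igetget.com/list/%E5%BF%83%E7%90%86%E5%AD%A6/0RMtQSwfyUy"],'
    '"selectors":[]}'
)

# Placeholder only: copy the real link from the category page you opened in step (1).
new_url = "https://www.igetget.com/list/EXAMPLE-CATEGORY/EXAMPLE-ID"

updated = with_start_url(template, new_url)
print(updated)

# The percent-encoded segment of the example URL decodes to the category name.
print(unquote("%E5%BF%83%E7%90%86%E5%AD%A6"))
```

Decoding the path segment shows that the example URL points at the 心理学 (psychology) category. Editing the Start URL in the extension after import, as the steps describe, gives the same result.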