38-1 Cnblogs (博客园) article search result information - Figure 2

The scraped results include:

  • Title
  • Link
  • Author
  • Author link
  • Publish time
  • Number of recommendations
  • Number of comments
  • Number of reads

Example result:

38-1 Cnblogs (博客园) article search result information - Figure 3

Template:

{"_id":"bokeyuan-search",
 "startUrl":["https://zzk.cnblogs.com/s/blogpost?Keywords=%E4%BA%A7%E5%93%81%E7%BB%8F%E7%90%86&pageindex=[1-5]"],
 "selectors":[
  {"id":"info","type":"SelectorElement","parentSelectors":["_root"],"selector":"div.searchItem","multiple":true,"delay":0},
  {"id":"title","type":"SelectorLink","parentSelectors":["info"],"selector":".searchItemTitle a","multiple":false,"delay":0},
  {"id":"author","type":"SelectorLink","parentSelectors":["info"],"selector":".searchItemInfo-userName a","multiple":false,"delay":0},
  {"id":"time","type":"SelectorText","parentSelectors":["info"],"selector":"span.searchItemInfo-publishDate","multiple":false,"regex":"","delay":0},
  {"id":"recommends","type":"SelectorText","parentSelectors":["info"],"selector":"span.searchItemInfo-good","multiple":false,"regex":"\\d+","delay":0},
  {"id":"comments","type":"SelectorText","parentSelectors":["info"],"selector":"span.searchItemInfo-comments","multiple":false,"regex":"\\d+","delay":0},
  {"id":"reads","type":"SelectorText","parentSelectors":["info"],"selector":"span.searchItemInfo-views","multiple":false,"regex":"\\d+","delay":0}
 ]}
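The `recommends`, `comments`, and `reads` selectors in the template use the regex `\d+` (written `\\d+` inside JSON) so that only the number is kept from each cell. The following Python sketch illustrates the intended effect. The sample cell texts and the helper name `first_number` are assumptions made for illustration; the actual wording on the page may differ, and the sketch assumes that the first match is the one kept.

```python
import re


def first_number(text):
    """Return the first run of digits in text, or "" if there is none."""
    match = re.search(r"\d+", text)
    return match.group(0) if match else ""


# Hypothetical cell texts (assumed, not taken from the live page).
samples = {
    "recommends": "推荐(12)",
    "comments": "评论(3)",
    "reads": "浏览(456)",
}

for field, text in samples.items():
    print(field, first_number(text))
```

The `time` selector leaves `regex` empty, so its text is kept unchanged.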

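Steps for applying the template:

(1) Open the article search results page you want to scrape, for example: https://zzk.cnblogs.com/s/blogpost?Keywords=%E4%BA%A7%E5%93%81%E7%BB%8F%E7%90%86&pageindex=1
(2) Import the template.
(3) Replace the Start URL with the link of the page you want to scrape. To scrape multiple pages, change the page range in the Start URL (the [1-5] in pageindex=[1-5]).
(4) Start scraping.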
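Step (3) can be sketched in Python as follows: the search keyword is percent-encoded and combined with a page range to form a new Start URL. The helper names `build_start_url` and `expand_pages` are hypothetical and not part of Web Scraper. The URL layout is copied from the template above, and the sketch assumes the `[first-last]` range stands for one URL per page number.

```python
from urllib.parse import quote

# Base address taken from the template's startUrl.
BASE = "https://zzk.cnblogs.com/s/blogpost"


def build_start_url(keyword, first_page, last_page):
    """Build a Start URL with a [first-last] page range for the given keyword."""
    if first_page > last_page:
        raise ValueError("first_page must not exceed last_page")
    return f"{BASE}?Keywords={quote(keyword)}&pageindex=[{first_page}-{last_page}]"


def expand_pages(keyword, first_page, last_page):
    """List the individual page URLs that the range is assumed to cover."""
    return [
        f"{BASE}?Keywords={quote(keyword)}&pageindex={page}"
        for page in range(first_page, last_page + 1)
    ]


print(build_start_url("产品经理", 1, 5))
for url in expand_pages("产品经理", 1, 2):
    print(url)
```

Calling `build_start_url("产品经理", 1, 5)` reproduces the Start URL used in the template; changing the keyword or the two page numbers gives the value to paste into the Start URL field.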