The scraped results include:
- Shop name
- Shop rating
- Shop address
- Link to the shop's home page
- Phone number
Sample results (screenshot):
Template:
{"_id":"meituan-meishi","startUrl":["https://bj.meituan.com/meishi/c17/pn[1-5]/"],"selectors":[
{"id":"info","type":"SelectorElement","parentSelectors":["_root"],"selector":"li.clear","multiple":true,"delay":""},
{"id":"name","type":"SelectorText","parentSelectors":["info"],"selector":"h4","multiple":false,"regex":"","delay":0},
{"id":"score","type":"SelectorText","parentSelectors":["info"],"selector":".source p","multiple":false,"regex":"","delay":0},
{"id":"address","type":"SelectorText","parentSelectors":["info"],"selector":"p.desc","multiple":false,"regex":"","delay":0},
{"id":"link","type":"SelectorLink","parentSelectors":["info"],"selector":".info a","multiple":false,"delay":0},
{"id":"phone-number","type":"SelectorText","parentSelectors":["link"],"selector":".address p:nth-of-type(2)","multiple":false,"regex":"","delay":0}
]}
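To make the nesting in the template easier to see, here is a minimal Python sketch that parses the same JSON and prints the selectors as a tree, based on each selector's parentSelectors field. The helper name render and the variable names are made up for this illustration; they are not part of the scraping extension.

```python
import json
from collections import defaultdict

# The template from above, pasted as a string (content unchanged).
SITEMAP_JSON = r'''
{"_id":"meituan-meishi","startUrl":["https://bj.meituan.com/meishi/c17/pn[1-5]/"],"selectors":[
{"id":"info","type":"SelectorElement","parentSelectors":["_root"],"selector":"li.clear","multiple":true,"delay":""},
{"id":"name","type":"SelectorText","parentSelectors":["info"],"selector":"h4","multiple":false,"regex":"","delay":0},
{"id":"score","type":"SelectorText","parentSelectors":["info"],"selector":".source p","multiple":false,"regex":"","delay":0},
{"id":"address","type":"SelectorText","parentSelectors":["info"],"selector":"p.desc","multiple":false,"regex":"","delay":0},
{"id":"link","type":"SelectorLink","parentSelectors":["info"],"selector":".info a","multiple":false,"delay":0},
{"id":"phone-number","type":"SelectorText","parentSelectors":["link"],"selector":".address p:nth-of-type(2)","multiple":false,"regex":"","delay":0}
]}
'''

sitemap = json.loads(SITEMAP_JSON)

# Group selectors under each parent id listed in "parentSelectors".
children = defaultdict(list)
for sel in sitemap["selectors"]:
    for parent in sel["parentSelectors"]:
        children[parent].append(sel)


def render(parent="_root", depth=0):
    """Return indented lines describing every selector below `parent` (hypothetical helper)."""
    lines = []
    for sel in children[parent]:
        lines.append("  " * depth + f'{sel["id"]} ({sel["type"]}): {sel["selector"]}')
        lines.extend(render(sel["id"], depth + 1))
    return lines


tree = render()
print("\n".join(tree))
```

The printed tree shows that phone-number sits under link, which in turn sits under info: the phone number is read from the page that the link selector opens, not from the list page itself.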
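Steps for applying the template:
(1) Open the food category page you want to scrape, for example: https://bj.meituan.com/meishi/c17/pn1/
(2) Import the template.
(3) Replace the Start URL with the link of the page you want to scrape. To scrape multiple pages, change the page-number range in the Start URL (the template uses pn[1-5], i.e. pages 1 to 5).
(4) Start scraping.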
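As an illustration of step (3), the sketch below expands a Start URL containing a page range such as pn[1-5] into the individual page URLs it stands for. It only handles a single plain [first-last] range and is an approximation written for this explanation, not the extension's own code; the function name expand_start_url is hypothetical.

```python
import re


def expand_start_url(url):
    """Expand one numeric range like [1-5] in `url` into a list of concrete URLs.

    Assumption: only a single simple [first-last] range is handled here.
    """
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        # No range in the URL: it already points at a single page.
        return [url]
    first, last = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{page}{suffix}" for page in range(first, last + 1)]


urls = expand_start_url("https://bj.meituan.com/meishi/c17/pn[1-5]/")
for u in urls:
    print(u)
```

Changing the range to, say, pn[1-10] would yield ten URLs instead of five, which is all that "modifying the page number" in the Start URL amounts to.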
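Tips:
- The link column only assists the scrape (it opens each shop's home page, where the phone number is read) and can be deleted in Excel.
- If a CAPTCHA pops up, enter it manually and then start the scrape again.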
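If you would rather not delete the helper column by hand, a small Python sketch like the one below can strip it from the exported CSV. It assumes the export contains columns named link and link-href for the link selector; check the header row of your own export, since the exact column names are an assumption here. The sample row is placeholder data, not a real result.

```python
import csv
import io


def drop_columns(csv_text, unwanted=("link", "link-href")):
    """Return `csv_text` without the columns named in `unwanted` (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in unwanted]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Placeholder export with one made-up row, for demonstration only.
sample = (
    "name,score,address,link,link-href,phone-number\n"
    "Example Shop,4.5,Example Street 1,Example Shop,https://example.com/shop/1,000-0000\n"
)
cleaned = drop_columns(sample)
print(cleaned)
```

For a real export, read the file's text into a string, pass it to drop_columns, and write the result to a new file.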