Pagination:
    https://careers.tencent.com/search.html?index=1
    https://careers.tencent.com/search.html?index=2
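
    The search.html pages only paginate the view in the browser; the job data behind them comes from a JSON API (the Query endpoint), which is why the spider in step 4 requests that API directly instead of the HTML pages. A quick sketch to confirm the response shape before writing the spider, using the requests library (not part of the Scrapy project); the query string here is trimmed to the parameters that matter as an assumption, while the spider keeps the full URL copied from the site:

    import requests

    # Listing API used by the spider in step 4 (pageIndex starts at 1)
    api = ('https://careers.tencent.com/tencentcareer/api/post/Query'
           '?pageIndex={}&pageSize=10&language=zh-cn&area=cn')

    data = requests.get(api.format(1), timeout=10).json()
    for post in data['Data']['Posts']:
        # RecruitPostName and PostId are exactly the fields the spider extracts
        print(post['PostId'], post['RecruitPostName'])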

    1. Create the Scrapy project

    scrapy startproject tencent
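
    This generates the standard Scrapy layout (abbreviated below); the remaining commands are run inside the outer tencent/ directory:

    tencent/
        scrapy.cfg
        tencent/
            items.py        # item definitions (step 5)
            pipelines.py
            settings.py
            spiders/        # spider modules (step 4)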

    2. Create the spider

    scrapy genspider spider tencent.com

    3. Write a run file

    from scrapy import cmdline

    cmdline.execute('scrapy crawl spider'.split())
    # or: cmdline.execute(['scrapy', 'crawl', 'spider'])
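
    The run file simply invokes the scrapy command line from Python, so it needs to live inside the project (next to scrapy.cfg is the usual place) and is equivalent to running scrapy crawl spider in a terminal. An alternative sketch using Scrapy's CrawlerProcess, which runs the crawl in-process with the project settings instead of going through the CLI:

    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # Load settings.py from this project and run the spider named 'spider'
    process = CrawlerProcess(get_project_settings())
    process.crawl('spider')
    process.start()   # blocks until the crawl finishes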

    4. spider.py

    import scrapy
    import json
    from tencent.items import TencentItem


    class SpiderSpider(scrapy.Spider):
        name = 'spider'
        allowed_domains = ['tencent.com']
        # Listing API: returns one page of 10 posts, paged via pageIndex
        one_url = 'https://careers.tencent.com/tencentcareer/api/post/Query?timestamp=1646136476254&countryId=&cityId=&bgIds=&productId=&categoryId=&parentCategoryId=&attrId=&keyword=&pageIndex={}&pageSize=10&language=zh-cn&area=cn'
        # Detail API: looked up by postId
        detail_url = 'https://careers.tencent.com/tencentcareer/api/post/ByPostId?timestamp=1650456224399&postId={}&language=zh-cn'

        # Pagination: request the first 10 pages of the listing API
        def start_requests(self):
            for page in range(1, 11):
                url = self.one_url.format(page)
                yield scrapy.Request(url, self.parse)

        def parse(self, response):
            # The listing API returns JSON, so parse it with json.loads
            # instead of xpath/css selectors
            data = json.loads(response.text)
            for job in data['Data']['Posts']:
                item = TencentItem()
                item['job_name'] = job['RecruitPostName']
                post_id = job['PostId']
                # Build the detail-page URL and pass the half-filled item along
                detail_url = self.detail_url.format(post_id)
                yield scrapy.Request(url=detail_url, callback=self.parse_detail, meta={'item': item})

        def parse_detail(self, response):
            # Parse the detail page (also JSON) and fill in the duties field
            item = response.meta.get('item')
            data = json.loads(response.text)
            item['job_duty'] = data['Data']['Responsibility']
            print(item)
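
    Note that parse_detail only prints the finished item; Scrapy's pipelines and feed exports only see items that are yielded, so replace (or follow) the print with yield item if the results should be persisted. A minimal pipeline sketch that writes each job to a JSON Lines file (the class name and output path are illustrative, and it has to be enabled via ITEM_PIPELINES in settings.py):

    # pipelines.py
    import json

    class TencentPipeline:
        def open_spider(self, spider):
            # Open the output file once when the crawl starts
            self.file = open('tencent_jobs.jl', 'w', encoding='utf-8')

        def process_item(self, item, spider):
            # One JSON object per line; keep Chinese text readable
            self.file.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
            return item

        def close_spider(self, spider):
            self.file.close()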

    5. items.py

    import scrapy


    class TencentItem(scrapy.Item):
        # define the fields for your item here like:
        # name = scrapy.Field()
        # Job title
        job_name = scrapy.Field()
        # Job duties / requirements
        job_duty = scrapy.Field()
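
    Depending on the site's robots.txt and how strict it is about clients, a few settings.py tweaks are often needed for a crawl like this; the values below are an assumption, not something verified against careers.tencent.com:

    # settings.py (illustrative values, not from the original walkthrough)
    ROBOTSTXT_OBEY = False            # the API paths may be disallowed for bots
    DOWNLOAD_DELAY = 1                # be polite: roughly one request per second
    DEFAULT_REQUEST_HEADERS = {
        'User-Agent': 'Mozilla/5.0',  # placeholder browser UA string
    }
    ITEM_PIPELINES = {
        'tencent.pipelines.TencentPipeline': 300,  # enables the pipeline sketch above
    }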