'''
    Site: https://cs.lianjia.com/ershoufang/
    Requirement: scrape the first 5 pages of listings
    (name, location, price, area) and save them to a CSV file.

    Site analysis:
    Second-hand listings:
    Page 1: https://cs.lianjia.com/ershoufang/rs/
            (can also be written as https://cs.lianjia.com/ershoufang/pg1/)
    Page 2: https://cs.lianjia.com/ershoufang/pg2/
    Page 3: https://cs.lianjia.com/ershoufang/pg3/
    Page 4: https://cs.lianjia.com/ershoufang/pg4/
    Page 5: https://cs.lianjia.com/ershoufang/pg5/
    So the URL can be written as url = f'https://cs.lianjia.com/ershoufang/pg{page}/',
    with the variable page looping from 1 to 5.
    '''
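
    Before writing the full scraper, the pagination pattern above can be sanity-checked with a tiny sketch that only builds and prints the five page URLs (nothing is fetched here):

    # Quick check of the pgN pagination pattern described above.
    for page in range(1, 6):
        url = f'https://cs.lianjia.com/ershoufang/pg{page}/'
        print(url)   # e.g. https://cs.lianjia.com/ershoufang/pg1/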

    1. import requests
    2. import csv
    3. from lxml import etree
    4. data_list = []                         # holds one dict per listing
    5. for page in range(1, 6):               # pages 1 to 5
    6.     url = f'https://cs.lianjia.com/ershoufang/pg{page}/'
    7.     req = requests.get(url)
    8.     html = etree.HTML(req.text)
    9.     lis = html.xpath('//ul[@class="sellListContent"]/li')   # one <li> per listing
    10.     for li in lis:
    11.         divs = li.xpath('./div[@class="info clear"]')
    12.         for div in divs:
    13.             data = {}
    14.             title = div.xpath('./div[@class="title"]/a/text()')[0]
    15.             flood = div.xpath('./div[@class="flood"]/div/a/text()')[:2]   # first two location links
    16.             flood1 = '-'.join(flood)
    17.             address = div.xpath('./div[@class="address"]/div/text()')[0]
    18.             address1 = list(address.split("|"))[:2]   # first two "|"-separated fields; this list() call is the built-in that broke when shadowed (see note below)
    19.             address2 = "|".join(address1)
    20.             price1 = div.xpath('./div[@class="priceInfo"]/div/span/text()')[0] + "万"   # total price (万 = 10,000 CNY)
    21.             price2 = div.xpath('./div[@class="priceInfo"]/div/span/text()')[1]          # unit-price text
    22.             price3 = price1 + "|" + price2
    23.             data["title"] = title
    24.             data["flood"] = flood1
    25.             data["address"] = address2
    26.             data["price"] = price3
    27.             data_list.append(data)
    28. with open("长沙二手房房源.csv", "w", encoding="utf-8", newline='') as f:   # write everything once scraping is done
    29.     wt = csv.DictWriter(f, fieldnames=['title', 'flood', 'address', 'price'])
    30.     wt.writeheader()
    31.     wt.writerows(data_list)
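
    To double-check the output, the CSV can be read back with csv.DictReader. This is a minimal sketch that assumes the script above has already produced 长沙二手房房源.csv in the working directory:

    import csv

    with open("长沙二手房房源.csv", encoding="utf-8", newline='') as f:
        rows = list(csv.DictReader(f))

    print(len(rows))      # total number of listings collected
    if rows:
        print(rows[0])    # first record, keys: title, flood, address, price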

    To sum up:
    This assignment isn't really difficult, but it still took me a long time. Why? Because my thinking wasn't clear enough (too little practice). When I first wrote the code, line 4 read list = [], and running it immediately threw: TypeError: 'list' object is not callable. Only after a Baidu search did I realize I had made a very basic mistake: I had used the name of the Python built-in list as a variable, which shadowed the built-in and broke the later list() call.
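
    The pitfall is easy to reproduce in isolation; this small demo has nothing to do with the scraper itself, it just shows the same mistake:

    # Shadowing the built-in name 'list' with a variable...
    list = []
    # ...makes any later call to list() fail, because 'list' is now an empty list object:
    items = list("abc")   # raises TypeError: 'list' object is not callable
    # The fix is simply to pick a different variable name, e.g. data_list = []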
    Attached is the result (长沙二手房房源.csv):
    [screenshot: 20220410.png]