'''
Site: https://cs.lianjia.com/ershoufang/
Requirement: first 5 pages of listings; fields: name, location, price, area; save to CSV
Site analysis:
Second-hand homes (ershoufang):
Page 1: https://cs.lianjia.com/ershoufang/rs/
Page 1 can also be written as: https://cs.lianjia.com/ershoufang/pg1/
Page 2: https://cs.lianjia.com/ershoufang/pg2/
Page 3: https://cs.lianjia.com/ershoufang/pg3/
Page 4: https://cs.lianjia.com/ershoufang/pg4/
Page 5: https://cs.lianjia.com/ershoufang/pg5/
So the URL can be written as url = f'https://cs.lianjia.com/ershoufang/pg{page}/', with the variable page looping from 1 to 5
'''
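import csv

import requests
from lxml import etree

# Assumption (not in the original script): send a browser-like User-Agent,
# because sites often reject the default one that requests sends
HEADERS = {"User-Agent": "Mozilla/5.0"}

data_list = []
for page in range(1, 6):
    url = f'https://cs.lianjia.com/ershoufang/pg{page}/'
    req = requests.get(url, headers=HEADERS, timeout=10)
    html = etree.HTML(req.text)
    # Each listing is an <li> under <ul class="sellListContent">
    lis = html.xpath('//ul[@class="sellListContent"]/li')
    for li in lis:
        divs = li.xpath('./div[@class="info clear"]')
        for div in divs:
            data = {}
            # Name: the listing title
            title = div.xpath('./div[@class="title"]/a/text()')[0]
            # Location: first two links in div.flood (class name as used in the page HTML), joined with "-"
            flood = div.xpath('./div[@class="flood"]/div/a/text()')[:2]
            flood1 = '-'.join(flood)
            # House info "... | ... | ...": keep the first two fields (e.g. layout and area)
            address = div.xpath('./div[@class="address"]/div/text()')[0]
            address1 = list(address.split("|"))[:2]
            address2 = "|".join(address1)
            # Price: total price (in units of 10,000 yuan, "万") plus the unit price
            prices = div.xpath('./div[@class="priceInfo"]/div/span/text()')
            price1 = prices[0] + "万"
            price2 = prices[1]
            price3 = price1 + "|" + price2
            data["title"] = title
            data["flood"] = flood1
            data["address"] = address2
            data["price"] = price3
            data_list.append(data)

# Write all rows once, after every page has been scraped
with open("长沙二手房房源.csv", "w", encoding="utf-8", newline='') as f:
    wt = csv.DictWriter(f, fieldnames=['title', 'flood', 'address', 'price'])
    wt.writeheader()
    wt.writerows(data_list)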
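The page loop and the per-listing cleanup in the script can be exercised without touching the network. Below is a minimal sketch, not the author's code. The helper names `page_urls`, `clean_listing` and `to_csv` and all sample values are hypothetical. The inputs only assume the shape of the XPath results, which are lists of text strings.

```python
import csv
import io

# URL pattern from the site analysis: pg1 ... pgN
BASE_URL = "https://cs.lianjia.com/ershoufang/pg{page}/"
FIELDNAMES = ["title", "flood", "address", "price"]


def page_urls(pages):
    """Return the listing-page URLs for pages 1..pages."""
    return [BASE_URL.format(page=page) for page in range(1, pages + 1)]


def clean_listing(title, flood_links, house_info, price_spans):
    """Turn one listing's raw XPath text results into a CSV row dict.

    flood_links: link texts from div.flood (location)
    house_info: text of div.address/div, fields separated by "|"
    price_spans: [total price, unit price] span texts from div.priceInfo
    """
    location = "-".join(flood_links[:2])
    # Keep the first two "|"-separated fields; .strip() is an extra tweak
    # that removes the spaces around "|" (the original script keeps them)
    info = "|".join(field.strip() for field in house_info.split("|")[:2])
    price = price_spans[0] + "万" + "|" + price_spans[1]
    return {"title": title, "flood": location, "address": info, "price": price}


def to_csv(rows):
    """Write rows with csv.DictWriter into a string instead of a file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample values, shaped like what the XPath queries return
    row = clean_listing(
        "Sample listing",
        ["Sample Community", "Sample District"],
        "3室2厅 | 89.5平米 | 南",
        ["120", "13,408元/平"],
    )
    print(page_urls(5))
    print(to_csv([row]))
```

The sample unit price contains a comma. `csv.DictWriter` quotes that field automatically, so the CSV still parses into four columns.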
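Summary:
This assignment actually isn't very hard, but it still took me a long time. The reason? My approach wasn't clear in my head because I haven't practiced enough. When I first wrote the code, the line that now reads `data_list = []` was `list=[]`, and running it immediately raised `TypeError: 'list' object is not callable`. Only after searching on Baidu did I realize I had made a very basic mistake: I had used `list` as a variable name. Strictly speaking, `list` is not a Python keyword but a built-in type name. Assigning `list = []` rebinds that name to an empty list, so the later call `list(address.split("|"))` tries to call a list object instead of the built-in `list` type.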
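A minimal sketch reproducing the mistake described above. The function names `shadowing_demo` and `fixed_demo` are hypothetical, made up for illustration. The sketch also uses the standard `keyword` and `builtins` modules to confirm that `list` is a built-in name rather than a reserved keyword:

```python
import builtins
import keyword


def shadowing_demo():
    """Reproduce the mistake: rebinding the name `list` hides the built-in."""
    list = []  # local name now refers to an empty list, not the built-in type
    try:
        list("a|b".split("|"))  # calls the empty list object
    except TypeError as exc:
        return str(exc)
    return "no error"


def fixed_demo():
    """The same call with a non-conflicting variable name."""
    data_list = []
    data_list.append(list("a|b".split("|")))
    return data_list


print(shadowing_demo())            # 'list' object is not callable
print(fixed_demo())                # [['a', 'b']]
print(keyword.iskeyword("list"))   # False: not a reserved keyword
print(hasattr(builtins, "list"))   # True: it is a built-in name
```

Because `list` is not reserved, Python accepts the assignment silently. The error only appears later, at the first call.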
The result is attached (长沙二手房房源.csv).
