'''
Website: https://cs.lianjia.com/ershoufang/
Task: scrape the first 5 pages of listings (name, location, price, area) and save them to a CSV file.
Analyzing the site:
Second-hand listings (ershoufang):
Page 1: https://cs.lianjia.com/ershoufang/rs/
        (can also be written as https://cs.lianjia.com/ershoufang/pg1/)
Page 2: https://cs.lianjia.com/ershoufang/pg2/
Page 3: https://cs.lianjia.com/ershoufang/pg3/
Page 4: https://cs.lianjia.com/ershoufang/pg4/
Page 5: https://cs.lianjia.com/ershoufang/pg5/
So the request URL can be written as url = f'https://cs.lianjia.com/ershoufang/pg{page}/' with the variable page looping from 1 to 5.
'''
import requests
import csv
from lxml import etree
data_list = []  # collects one dict per listing
for page in range(1, 6):
    url = f'https://cs.lianjia.com/ershoufang/pg{page}/'
    req = requests.get(url)
    html = etree.HTML(req.text)
    # each listing is an <li> inside <ul class="sellListContent">
    lis = html.xpath('//ul[@class="sellListContent"]/li')
    for li in lis:
        divs = li.xpath('./div[@class="info clear"]')
        for div in divs:
            data = {}
            # listing title
            title = div.xpath('./div[@class="title"]/a/text()')[0]
            # first two location links (community / district), joined with "-"
            flood = div.xpath('./div[@class="flood"]/div/a/text()')[:2]
            flood1 = '-'.join(flood)
            # house description separated by "|"; keep only the first two fields
            address = div.xpath('./div[@class="address"]/div/text()')[0]
            # note: list(...) is redundant here (split already returns a list);
            # this call is also why the earlier list = [] mistake raised TypeError
            address1 = list(address.split("|"))[:2]
            address2 = "|".join(address1)
            # total price (number + 万) and unit price, joined with "|"
            price1 = div.xpath('./div[@class="priceInfo"]/div/span/text()')[0] + "万"
            price2 = div.xpath('./div[@class="priceInfo"]/div/span/text()')[1]
            price3 = price1 + "|" + price2
            data["title"] = title
            data["flood"] = flood1
            data["address"] = address2
            data["price"] = price3
            data_list.append(data)

# write all collected rows to a CSV file
with open("长沙二手房房源.csv", "w", encoding="utf-8", newline='') as f:
    wt = csv.DictWriter(f, fieldnames=['title', 'flood', 'address', 'price'])
    wt.writeheader()
    wt.writerows(data_list)
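One practical caveat about the requests.get call: if the XPath queries come back empty, Lianjia may have returned an anti-crawler verification page instead of the listing page. Adding a browser-like User-Agent header usually helps. A minimal sketch, reusing the same url variable as in the loop above (the User-Agent string below is only an example, not something the site prescribes):

headers = {
    # any common desktop browser User-Agent string will do; this one is only an example
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/120.0 Safari/537.36'
}
req = requests.get(url, headers=headers, timeout=10)
req.raise_for_status()  # stop early if the page could not be fetched

The timeout keeps the script from hanging on a slow response; everything after requests.get stays the same.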
To sum up:
This assignment was not actually hard, but it still took me a long time. Why? My thinking simply was not clear (I have not done enough practice). When I first wrote the code, I declared the result list as list = [], and as soon as I ran the script it crashed with TypeError: 'list' object is not callable. Only after searching online did I realize I had made a very basic mistake: list is the name of a Python built-in type, not something you should reuse as a variable. Assigning to it shadows the built-in, so the later call list(address.split("|")) tries to "call" my empty list and fails. In the final code that line is data_list = [].
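To make the mistake concrete, here is a minimal, self-contained sketch (the string is just dummy data) that reproduces the same error:

list = []                          # shadows the built-in list type
parts = list("a|b|c".split("|"))   # TypeError: 'list' object is not callable

The fix is simply to pick a different name, such as data_list, so the built-in list stays available.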
Attached is the result (长沙二手房房源.csv).