‘’’
网站:https://cs.lianjia.com/ershoufang/
需求:前5页数据 __ 名称 位置 价格 面积 保存到csv
分析网站:
二手房:
第一页:https://cs.lianjia.com/ershoufang/rs/
也可以改成:https://cs.lianjia.com/ershoufang/pg1/
第二页:https://cs.lianjia.com/ershoufang/pg2/
第三页:https://cs.lianjia.com/ershoufang/pg3/
第四页:https://cs.lianjia.com/ershoufang/pg4/
第五页:https://cs.lianjia.com/ershoufang/pg5/
所以可以写成 url = f’https://cs.lianjia.com/ershoufang/pg{page}/‘ 变量page为1到5遍历
‘’’_
import csv
import requests
from bs4 import BeautifulSoup
data_list = []
for page in range(1,6):
url = f'https://cs.lianjia.com/ershoufang/pg{page}'
req = requests.get(url)
soup = BeautifulSoup(req.text, 'lxml')
ul = soup.find(class_="sellListContent")
divs = ul.find_all(class_="info clear")
for div in divs:
list1 = {}
title = div.find('div', class_='title').text.replace("必看好房", "")
flood = div.find('div', class_='flood').text
address = div.find('div', class_='address').text.split('|')[:2]
address1 = "|".join(address)
price1 = div.find('div', class_='totalPrice totalPrice2').text
price2 = div.find('div', class_='unitPrice').text
price = price1+"|"+price2
list1['title'] = title
list1['flood'] = flood
list1['address'] = address1
list1['price'] = price
data_list.append(list1)
with open('bs4_20220410_长沙二手房房源.csv', 'w', encoding='utf-8', newline='') as f:
wt = csv.DictWriter(f,fieldnames=['title', 'flood', 'address', 'price'])
wt.writeheader()
wt.writerows(data_list)
print("文件保存成功,任务完成!")
总结一下:其实用bs4来做真的很爽!看看代码,很清晰的感觉!
附(bs420220410长沙二手房房源.csv)