I wrote a simple crawler that scrapes the SRC (Security Response Center) list from Anquanke (安全客).
Code:
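import re
import requests

url = "https://www.anquanke.com/src"

def src(url):
    # Fetch a page and return its body decoded as UTF-8.
    res = requests.get(url).content.decode('utf-8')
    return res

# Collect the numeric ID of every SRC linked from the list page.
code = re.findall(r'<a target="_blank" rel="noopener noreferrer" href="/src/(\d+)">', src(url))

for codes in code:
    # Fetch the detail page for this SRC.
    res2 = requests.get("https://www.anquanke.com/src/" + codes).content.decode('utf-8')
    # The SRC name is the part of the page title before " - 安全客".
    srcname = re.findall(r'<title>(.*) - 安全客', res2)
    # The link sits between the "网址" (website) heading and the
    # "漏洞提交入口" (vulnerability submission) heading, on separate lines.
    urladdress = re.findall(r'<h2>网址.*href="(.*?)">.*<h2>漏洞提交入口</h2>', res2, flags=re.DOTALL)
    str_srcname = "".join(srcname)
    str_urladdress = "".join(urladdress)
    # Prints "名称" (name) and "地址" (address) for each SRC.
    print(r"名称:{0} , 地址:{1} ".format(str_srcname, str_urladdress))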
Result: the script prints one line per SRC with its name and URL.
What I learned:
In a regex, "." does not match newline characters by default; passing "flags=re.DOTALL" fixes that.
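
A minimal sketch of the difference. The `html` string and `pattern` below are made up for illustration and are not the real Anquanke markup; they only assume that the link sits on a different line from the headings around it.

```python
import re

# Hypothetical fragment: the link is on a different line than the headings.
html = (
    '<h2>Website</h2>\n'
    '<p><a href="https://example.com">example</a></p>\n'
    '<h2>Submit</h2>'
)
pattern = r'<h2>Website.*href="(.*?)">.*<h2>Submit</h2>'

# Without the flag, "." stops at the first newline, so nothing matches.
without_flag = re.findall(pattern, html)

# With re.DOTALL, "." also matches "\n", so the pattern can span lines.
with_flag = re.findall(pattern, html, flags=re.DOTALL)

print(without_flag)  # []
print(with_flag)     # ['https://example.com']
```

The same effect is available inline by starting the pattern with "(?s)", which can be handy when the flags argument is not easy to pass.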
