## Libraries used

  • requests
  • json

## How it works

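  • Import the required libraries
  • Enumerate candidate Zhihu question URLs by incrementing the question ID
  • Request each URL and check whether the status code is 200
  • If it is 200, print the link
  • Append the link to a txt file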

## Full code

```python
import requests
import json


def get_links():
    # Build the full list of candidate URLs up front.
    # The counter is incremented before use, so the first URL is .../19550225.
    links = []
    number = 19550224
    while number < 900000000:
        number = number + 1
        urls = 'https://www.zhihu.com/question/' + str(number)
        links.append(urls)
    return links


def write_to_file(content):
    # Append one JSON-encoded string per line; the with block closes the file.
    with open('19550224.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def main():
    links = get_links()
    for link in links:
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
        response = requests.get(link, headers=headers)
        if response.status_code == 200:
            print(link)
            write_to_file(link)


if __name__ == '__main__':
    main()
```

## Notes

You need to know the ID of the first Zhihu question. You could also find it yourself by iterating upward from 10000000.

[https://www.zhihu.com/question/19550225](https://www.zhihu.com/question/19550225)

This is the first question on Zhihu; its ID is 19550225.

## Optimized version

The status check now happens inside the loop that generates the IDs, so the script no longer has to build the complete list of URLs before it sends the first request. This is much more efficient. The headers are also moved to module level, so they are not rebuilt on every iteration.

```python
import requests
import json

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}


def write_to_file(content):
    # Append one JSON-encoded string per line; the with block closes the file.
    with open('zhihu.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def main():
    number = 61158073
    # The condition is always true, so this runs until interrupted (Ctrl+C).
    while number >= 61158073:
        number = number + 1
        url = 'https://www.zhihu.com/question/' + str(number)
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            print(url)
            write_to_file(url)


if __name__ == '__main__':
    main()
```
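
The optimization above comes down to producing URLs lazily instead of collecting them first. Below is a minimal sketch of that idea using generators. It is an illustration, not the author's code: the names `question_urls`, `live_urls`, and `fake_status` are hypothetical, and `fake_status` is a stand-in that pretends even-numbered IDs exist so the example runs without network access.

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator

BASE_URL = "https://www.zhihu.com/question/"


def question_urls(start: int) -> Iterator[str]:
    """Lazily yield URLs for start, start + 1, ... without building a list."""
    for number in count(start):
        yield BASE_URL + str(number)


def live_urls(urls: Iterable[str], get_status: Callable[[str], int]) -> Iterator[str]:
    """Yield only the URLs for which get_status returns 200."""
    for url in urls:
        if get_status(url) == 200:
            yield url


def fake_status(url: str) -> int:
    """Hypothetical stand-in for an HTTP request: even IDs 'exist'."""
    question_id = int(url.rsplit("/", 1)[1])
    return 200 if question_id % 2 == 0 else 404


# Check only the first five candidates from an arbitrary starting ID.
candidates = islice(question_urls(61158074), 5)
found = list(live_urls(candidates, fake_status))
print(found)
```

For real requests, `get_status` could be something like `lambda url: requests.get(url, headers=headers, timeout=10).status_code`. The `timeout` argument is a suggestion that is not in the original script; without it, a stalled connection can block the loop indefinitely.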