Python Web Crawler - Figure 1
© getcodify.com

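Syntax-rendering issues may affect the reading experience here; please read this post on the blog instead~
GitPage address of this article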

Crawler (web scraper)

1. Quick Start

Scrape the image URLs from the National Geographic website, then download the images.

    from bs4 import BeautifulSoup
    from urllib.request import urlopen
    import re
    import requests  # used in the download step further down

    # Send the request and decode the page
    html = urlopen("http://www.nationalgeographic.com.cn/animals/").read().decode('utf-8')
    soup = BeautifulSoup(html, features='lxml')
    # Keep only <img> tags whose src is a .jpg served from an image.* host
    img_links = soup.find_all("img", {"src": re.compile(r'http://image\..*?\.jpg')})
    for link in img_links:
        print(link['src'])  # image URL

Python Web Crawler - Figure 2
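To see what the regular expression in the Quick Start actually keeps, here is a small offline sketch. When `find_all` is given a compiled pattern, Beautiful Soup tests each attribute value with the pattern's `search()` method, so the same check can be reproduced with `re` alone. The `samples` URLs are made up for illustration and do not come from the National Geographic page.

```python
import re

# Same pattern as in the Quick Start, written as a raw string
pattern = re.compile(r'http://image\..*?\.jpg')

# Hypothetical src values, for illustration only
samples = [
    'http://image.example.com/animals/cat.jpg',   # kept
    'https://image.example.com/animals/cat.jpg',  # dropped: https, not http
    'http://image.example.com/banner.png',        # dropped: not a .jpg
    'http://static.example.com/logo.jpg',         # dropped: host does not start with "image."
    '/img/relative.jpg',                          # dropped: relative path
]

# Beautiful Soup applies a compiled pattern with search(), so do the same here
matched = [src for src in samples if pattern.search(src)]
print(matched)
```

Note that the pattern only accepts plain `http://` URLs; if the site serves its images over `https`, a pattern such as `r'https?://image\..*?\.jpg'` would probably be needed.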

    # Add the following to download each image
    # Create the img folder first: mkdir img
    for link in img_links:
        print(link['src'])
        if link['src'][0:4] == 'http':
            url = link['src']
            # stream=True fetches the body in chunks instead of all at once
            r = requests.get(url, stream=True)
            image_name = url.split('/')[-1]
            with open('./img/%s' % image_name, 'wb') as f:
                for chunk in r.iter_content(chunk_size=128):
                    f.write(chunk)
            print('Saved %s' % image_name)

Running result:

Python Web Crawler - Figure 3
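The download loop can be made a little more robust by pulling two steps into helpers. This is only a sketch, not the original author's code: `filename_from_url` and `save_chunks` are hypothetical names introduced here. The first drops any query string before taking the file name, and the second creates the target folder itself, so the manual `mkdir img` is no longer required.

```python
import os
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    return os.path.basename(urlsplit(url).path)


def save_chunks(chunks, folder, name):
    """Write an iterable of byte chunks to folder/name and return the path."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    path = os.path.join(folder, name)
    with open(path, 'wb') as f:
        for chunk in chunks:
            f.write(chunk)
    return path
```

Inside the loop above, the call would then look something like `save_chunks(r.iter_content(chunk_size=128), 'img', filename_from_url(url))`. Because `save_chunks` accepts any iterable of bytes, it can also be tried out without a network connection.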

Practical example:

Tech News Flash


Enjoy~

This post is updated automatically on GitHub/Yuque by a Python script


GitHub: Karobben
Blog: Karobben
BiliBili: 史上最不正經的生物狗