Python Web Crawler - Python

© getcodify.com

Syntax rendering issues may spoil the reading experience here; please read this post on the blog instead.
GitPage link for this article

Crawler

1. Quick Start

Crawl the image URLs from the National Geographic website and download the images.

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re
import requests

# Send the request and parse the page (the 'lxml' parser requires the lxml package)
html = urlopen("http://www.nationalgeographic.com.cn/animals/").read().decode('utf-8')
soup = BeautifulSoup(html, features='lxml')
# Keep only <img> tags whose src looks like http://image....jpg (raw string avoids escape warnings)
img_links = soup.find_all("img", {"src": re.compile(r'http://image..*?\.jpg')})
for link in img_links:
    print(link['src'])  # image location (URL)

[Figure 2]
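The script above needs network access and BeautifulSoup. To show just the filtering step offline, here is a minimal sketch that swaps in the standard library's html.parser for BeautifulSoup and runs the same regular expression over an inline HTML snippet. The class ImgSrcCollector, the function find_img_srcs, and the sample markup are hypothetical names for illustration. Like find_all with a compiled pattern, it keeps an <img> when re.search finds a match in its src.

```python
import re
from html.parser import HTMLParser

# Same pattern as the BeautifulSoup example above (raw string, so no escape warnings)
IMG_PATTERN = re.compile(r'http://image..*?\.jpg')


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collect the src of every <img> whose src matches a pattern."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = pattern
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        # attrs is a list of (name, value) pairs; value may be None
        src = dict(attrs).get('src')
        # BeautifulSoup filters attribute values with the regex's search(), so mirror that
        if src and self.pattern.search(src):
            self.srcs.append(src)


def find_img_srcs(html, pattern=IMG_PATTERN):
    """Return the matching <img> src values in document order."""
    parser = ImgSrcCollector(pattern)
    parser.feed(html)
    parser.close()
    return parser.srcs


if __name__ == '__main__':
    # Made-up markup standing in for the downloaded page
    sample = '''
    <div>
      <img src="http://image.example.com/a/lion.jpg">
      <img src="/static/logo.png">
      <img src="http://image.example.com/b/tiger.jpg" alt="tiger">
      <img alt="no src">
    </div>
    '''
    for src in find_img_srcs(sample):
        print(src)
```

Relative paths such as /static/logo.png are skipped because the pattern requires an absolute http://image... URL, which is also why the download loop only sees absolute links.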

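import os

# Then add the following to download each image
os.makedirs('./img', exist_ok=True)  # create the img folder if it does not exist yet
for link in img_links:
    print(link['src'])
    if link['src'].startswith('http'):
        url = link['src']
        r = requests.get(url, stream=True)
        r.raise_for_status()  # stop on HTTP errors instead of saving an error page
        image_name = url.split('/')[-1]  # file name = last segment of the URL
        with open('./img/%s' % image_name, 'wb') as f:
            # write the response body in 128-byte chunks
            for chunk in r.iter_content(chunk_size=128):
                f.write(chunk)
        print('Saved %s' % image_name)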

Running result:

[Figure 3]
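The download loop names each file with url.split('/')[-1], which would keep any query string (for example ?w=800) in the file name, and it writes the body chunk by chunk. Below is a minimal sketch of both steps using only the standard library, assuming image URLs may carry a query string. image_name_from_url and save_in_chunks are hypothetical helpers, and an in-memory io.BytesIO stands in for the HTTP response body so the sketch runs without a network.

```python
import io
import os
import tempfile
from urllib.parse import urlparse


def image_name_from_url(url):
    """Hypothetical helper: last path segment of url, ignoring ?query and #fragment."""
    path = urlparse(url).path  # e.g. '/photos/lion.jpg'
    return path.rsplit('/', 1)[-1]


def save_in_chunks(stream, dest_path, chunk_size=128):
    """Hypothetical helper: copy a binary file-like stream to dest_path, chunk_size bytes at a time."""
    written = 0
    with open(dest_path, 'wb') as f:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty bytes means the stream is exhausted
                break
            f.write(chunk)
            written += len(chunk)
    return written


if __name__ == '__main__':
    url = 'http://image.example.com/photos/lion.jpg?w=800'  # made-up URL
    name = image_name_from_url(url)
    fake_body = io.BytesIO(b'\xff\xd8' + b'\x00' * 1000)  # stand-in for a response body
    with tempfile.TemporaryDirectory() as tmp:
        n = save_in_chunks(fake_body, os.path.join(tmp, name))
        print(name, n)
```

With requests and stream=True, r.raw is a file-like object that could be passed as the stream; r.iter_content, as in the original loop, additionally decodes any Content-Encoding, so it remains the simpler choice there.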

Hands-on examples:

Tech News Flash (科技快讯)

Enjoy~

This article is automatically updated from GitHub/Yuque (語雀) by a Python script.

GitHub: Karobben
Blog: Karobben
BiliBili: 史上最不正經的生物狗