colly - README - Golang Learning Library

Features
Example
Installation
Bugs
Other Projects Using Colly

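A fast and elegant scraping framework for Go developers
Colly provides a clean interface for writing any kind of crawler, scraper, or spider.
With Colly you can easily extract structured data from websites, which can be used for a wide range of applications such as data mining, data processing, or archiving.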

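Features

Clean API
Fast (>1k requests/sec on a single core)
Manages request delays and maximum concurrency per domain
Automatic cookie and session handling
Sync/async/parallel scraping
Caching
Automatic encoding of non-Unicode responses
robots.txt support
Distributed scraping
Configuration via environment variables
Extensions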

Example

    package main

    import (
        "fmt"
        "github.com/gocolly/colly/v2"
    )

    func main() {
        c := colly.NewCollector()
        // Find and visit all links
        c.OnHTML("a[href]", func(e *colly.HTMLElement) {
            e.Request.Visit(e.Attr("href"))
        })
        // Log every request before it is sent
        c.OnRequest(func(r *colly.Request) {
            fmt.Println("Visiting", r.URL)
        })
        c.Visit("http://go-colly.org/")
    }

See the examples folder for more detailed examples.

Installation

Add colly to your go.mod file:

    module github.com/x/y
    go 1.14
    require (
        github.com/gocolly/colly/v2 latest
    )

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on freenode.

Other Projects Using Colly

Below is a list of public, open source projects that use Colly:

greenpeace/check-my-pages Scraping script to test the Spanish Greenpeace web archive.
altsab/gowap Wappalyzer implementation in Go.
jesuiscamille/goquotes A quotes scraper, making your day a little better!
jivesearch/jivesearch A search engine that doesn’t track you.
Leagify/colly-draft-prospects A scraper for future NFL Draft prospects.
lucasepe/go-ps4 Search playstation store for your favorite PS4 games using the command line.
yringler/inside-chassidus-scraper Scrapes Rabbi Paltiel’s web site for lesson metadata.
gamedb/gamedb A database of Steam games.
lawzava/scrape CLI for email scraping from any website.
eureka101v/WeiboSpiderGo A Sina Weibo (Chinese Twitter) scraper.
Go-phie/gophie Search, Download and Stream movies from your terminal
imthaghost/goclone Clone websites to your computer within seconds.
superiss/spidy Crawl the web and collect expired domains.
docker-slim/docker-slim Optimize your Docker containers to make them smaller and better.