Purpose: fetch a URL over HTTP, read the response body, and convert it to Markdown output.
Creating the project
# Create the project
cargo new scrape-url
Adding dependencies
In the Cargo.toml file in the project directory, add the following under [dependencies]:
# HTTP client
reqwest = { version = "0.11", features = ["blocking"] }
# HTML to Markdown conversion
html2md = "0.2"
Complete Cargo.toml
[package]
name = "scrape-url"
version = "0.1.0"
authors = ["yangxuan_321 <yangxuan_321@163.com>"]
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
# HTTP client
reqwest = { version = "0.11", features = ["blocking"] }
# HTML to Markdown conversion
html2md = "0.2"
Writing the code
use std::fs;

fn main() {
    let url = "https://jingyan.baidu.com/article/358570f6bf07f08f4624fc3e.html";
    let output = "358570f6bf07f08f4624fc3e.md";

    // Fetch the page with reqwest's blocking client and read the body as text.
    println!("Fetching url: {}", url);
    let body = reqwest::blocking::get(url).unwrap().text().unwrap();

    // Convert the HTML to Markdown.
    println!("Converting html to markdown...");
    let md = html2md::parse_html(&body);

    // Write the Markdown to the output file.
    fs::write(output, md.as_bytes()).unwrap();
    println!("Converted markdown has been saved in {}.", output);
}
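The program above hardcodes the output file name, which happens to match the last path segment of the URL with .html replaced by .md. A minimal sketch of deriving that name automatically could look like the following. It uses only the standard library; the helper name output_name_from_url and the "index" fallback for URLs without a path are assumptions made for illustration, not part of the original program.

```rust
/// Derive a Markdown file name from the last path segment of a URL.
/// Hypothetical helper for illustration; it does simple string handling,
/// not full URL parsing.
fn output_name_from_url(url: &str) -> String {
    // Drop any query string or fragment.
    let no_query = url
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or(url);
    // Remove the scheme ("https://") if there is one.
    let without_scheme = match no_query.find("://") {
        Some(i) => &no_query[i + 3..],
        None => no_query,
    };
    // Everything after the first '/' is the path; a bare host has none.
    let url_path = without_scheme
        .split_once('/')
        .map(|(_host, path)| path)
        .unwrap_or("");
    // Take the last path segment, falling back to "index" (an assumed default).
    let segment = url_path.trim_end_matches('/').rsplit('/').next().unwrap_or("");
    let segment = if segment.is_empty() { "index" } else { segment };
    // Strip a trailing ".html" or ".htm" extension if present.
    let stem = segment
        .strip_suffix(".html")
        .or_else(|| segment.strip_suffix(".htm"))
        .unwrap_or(segment);
    format!("{}.md", stem)
}

fn main() {
    let url = "https://jingyan.baidu.com/article/358570f6bf07f08f4624fc3e.html";
    println!("{}", output_name_from_url(url));
}
```

With a helper like this, the output variable in main could be computed from url instead of being typed by hand, so changing the URL would not require editing two lines.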
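Running
cargo run
If the build fails with a Cargo version error or a failed package download, see the troubleshooting section below. A successful run prints: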
Fetching url: https://jingyan.baidu.com/article/358570f6bf07f08f4624fc3e.html
Converting html to markdown...
Converted markdown has been saved in 358570f6bf07f08f4624fc3e.md.
Troubleshooting
- this version of Cargo is older than the 2021 edition, and only supports 2015 and 2018 editions.
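Update the toolchain to a release that supports the 2021 edition (Rust 1.56 or later). Switching to nightly, as below, works, although updating the stable channel with rustup update stable is normally enough:
rustup default nightly && rustup update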
- error: failed to run custom build command for openssl-sys v0.9.75
On Debian or Ubuntu, install the OpenSSL development headers:
sudo apt install libssl-dev