Purpose: request a URL over HTTP, read the response body, and convert the HTML to Markdown output.

Creating the project

  # Create the project
  cargo new scrape-url

Adding dependencies

In the project directory, add the following to the [dependencies] section of Cargo.toml:

  # HTTP client
  reqwest = { version = "0.11", features = ["blocking"] }
  # HTML to Markdown conversion
  html2md = "0.2"

Complete Cargo.toml

  [package]
  name = "scrape-url"
  version = "0.1.0"
  authors = ["yangxuan_321 <yangxuan_321@163.com>"]
  edition = "2021"

  # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

  [dependencies]
  # HTTP client
  reqwest = { version = "0.11", features = ["blocking"] }
  # HTML to Markdown conversion
  html2md = "0.2"

Writing the code

  use std::fs;

  fn main() {
      let url = "https://jingyan.baidu.com/article/358570f6bf07f08f4624fc3e.html";
      let output = "358570f6bf07f08f4624fc3e.md";

      // Fetch the page with a blocking GET and read the body as text.
      println!("Fetching url: {}", url);
      let body = reqwest::blocking::get(url).unwrap().text().unwrap();

      // Convert the HTML to Markdown and write it to the output file.
      println!("Converting html to markdown...");
      let md = html2md::parse_html(&body);
      fs::write(output, md.as_bytes()).unwrap();
      println!("Converted markdown has been saved in {}.", output);
  }
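
The output file name above is the last path segment of the URL with its extension replaced by .md. A minimal sketch of deriving it automatically with only the standard library follows; the helper name `output_name_from_url` and the fallback name `index.md` are hypothetical choices, not part of the original program.

```rust
use std::path::Path;

/// Derive a Markdown file name from the last path segment of a URL.
/// Hypothetical helper: falls back to "index.md" when no usable segment exists.
fn output_name_from_url(url: &str) -> String {
    // Ignore the query string and fragment, if any.
    let no_query = url
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or(url);
    // Strip the scheme ("https://") so the host is not mistaken for a path segment.
    let after_scheme = match no_query.find("://") {
        Some(i) => &no_query[i + 3..],
        None => no_query,
    };
    // Everything after the first '/' is the path; take its last segment.
    let last_segment = match after_scheme.find('/') {
        Some(i) => after_scheme[i + 1..].rsplit('/').next().unwrap_or(""),
        None => "",
    };
    // Drop the extension (for example ".html") and append ".md".
    let stem = Path::new(last_segment)
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("");
    if stem.is_empty() {
        "index.md".to_string()
    } else {
        format!("{}.md", stem)
    }
}
```

In main, `let output = output_name_from_url(url);` could then replace the hard-coded file name, and `fs::write(&output, ...)` would write to the derived path.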

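Running

  cargo run

If the build reports a Cargo version error or a dependency fails to download or build, see the troubleshooting notes at the end.

On success, the program prints:

  Fetching url: https://jingyan.baidu.com/article/358570f6bf07f08f4624fc3e.html
  Converting html to markdown...
  Converted markdown has been saved in 358570f6bf07f08f4624fc3e.md.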
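
The program hard-codes its URL, so fetching a different page means editing the source. One option is to pass the URL on the command line, for example `cargo run -- <url>`. The sketch below shows the argument handling only; the helper name `url_from_args` and its error messages are hypothetical, and it uses nothing beyond the standard library.

```rust
/// Pick the URL from the program arguments (hypothetical helper).
/// `args` is expected to look like `std::env::args()`: the first item is the program name.
fn url_from_args<I>(args: I) -> Result<String, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    let program = args.next().unwrap_or_else(|| "scrape-url".to_string());
    match args.next() {
        // Accept only http(s) URLs, since the program fetches over HTTP.
        Some(url) if url.starts_with("http://") || url.starts_with("https://") => Ok(url),
        Some(other) => Err(format!("not an http(s) URL: {}", other)),
        None => Err(format!("usage: {} <url>", program)),
    }
}
```

In main, `url_from_args(std::env::args())` would supply the URL, with the `Err` message printed to stderr before exiting.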

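Troubleshooting

  • this version of Cargo is older than the 2021 edition, and only supports 2015 and 2018 editions.
    The 2021 edition requires Rust 1.56 or later, so update the toolchain. The stable channel is sufficient; switching to nightly also works but is not required.
      rustup update stable && rustup default stable
  • error: failed to run custom build command for openssl-sys v0.9.75
    The OpenSSL development files are missing. On Debian or Ubuntu, install them together with pkg-config:
      sudo apt install libssl-dev pkg-config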
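
If installing system OpenSSL packages is not an option, reqwest 0.11 can instead be built against rustls, which avoids the openssl-sys build step altogether. A sketch of the dependency line, assuming the `rustls-tls` feature suits your environment (this is an alternative to the configuration shown earlier, not what the original project uses):

```toml
# Assumption: swap the default native-tls backend for rustls
reqwest = { version = "0.11", default-features = false, features = ["blocking", "rustls-tls"] }
```

Disabling default features turns off the native-tls backend, so the `blocking` feature has to be listed again explicitly.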