Rust by Example - 20. Std misc: threads, processes, files, and more

Threads
- Testcase: map-reduce
Channels
Path
- See also
File I/O
Child processes
- Pipes
- Wait
Filesystem operations
Program arguments

Other types provided by the standard library

Threads
Channels
File I/O

These substantially extend what the primitives (chapter 2) provide.

Threads

Rust provides a mechanism for spawning native OS threads via the spawn function. The argument of this function is a moving closure.

use std::thread;
static NTHREADS: i32 = 10000;
// This is the `main` thread
fn main() {
    // Make a vector to hold the children which are spawned.
    let mut children = vec![];
    for i in 0..NTHREADS {
        // Spin up another thread
        children.push(thread::spawn(move || {
            println!("this is thread number {}", i)
        }));
    }
    for child in children {
        // Wait for the thread to finish. Returns a result.
        let _ = child.join();
    }
}
// These threads will be scheduled by the OS.
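
The example above discards the value returned by join. As a minimal sketch of how a spawned thread can hand a result back through its JoinHandle, the following uses a hypothetical helper, `squares_in_threads`, which is not part of the original text:

```rust
use std::thread;

// Hypothetical helper for illustration: spawn `n` threads, each of which
// returns the square of its index through its `JoinHandle`.
fn squares_in_threads(n: u32) -> Vec<u32> {
    // `collect` is eager, so every thread is spawned before any is joined
    let handles: Vec<thread::JoinHandle<u32>> = (0..n)
        .map(|i| thread::spawn(move || i * i))
        .collect();
    // Joining in spawn order yields the results in index order,
    // regardless of the order in which the threads actually ran
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    println!("{:?}", squares_in_threads(4));
}
```

Because join returns a Result, a panic inside a worker surfaces in the parent as an Err rather than silently disappearing.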

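Testcase: map-reduce

The standard library provides threading primitives out of the box. Combined with Rust's ownership model and aliasing rules, they automatically prevent data races.
When some state is visible to a thread, the aliasing rules (one mutable reference XOR many read-only references; XOR means "exclusive or", that is, exactly one of the two holds) automatically prevent other threads from manipulating it. (Where synchronisation is needed, use synchronisation types such as Mutex or channels.)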

use std::thread;
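// This is the `main` thread
fn main() {
    // This is our data to process.
    // We will calculate the sum of all digits via a threaded map-reduce algorithm.
    // Each whitespace-separated chunk will be handled in a different thread.
    //
    // Try it: insert spaces and see how the output changes!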
    let data = "86967897737416471853297327050364959
11861322575564723963297542624962850
70856234701860851907960690014725639
3839796670710609417278 3238747669219
52380795257888236525459303330302837
58495327135744041048897885734297812
699202164389808735488 08413720956532
16278424637452589860345374828574668";
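    // Make a vector to hold the child threads which we will spawn.
    let mut children = vec![];
    /*************************************************************************
     * "Map" phase
     *
     * Divide our data into segments, and apply initial processing
     ************************************************************************/
    // Split our data into segments for individual calculation.
    // Each chunk will be a reference (&str) into the actual data.
    let chunked_data = data.split_whitespace();
    // Iterate over the data segments.
    // .enumerate() adds the current loop index to whatever is iterated;
    // the resulting tuple "(index, element)" is then immediately
    // "destructured" into two variables, `i` and `data_segment`.
    for (i, data_segment) in chunked_data.enumerate() {
        println!("data segment {} is \"{}\"", i, data_segment);
        // Process each data segment in a separate thread
        //
        // spawn() returns a handle to the new thread, which we MUST keep
        // in order to access the returned value.
        //
        // 'move || -> u32' is syntax for a closure that:
        // * takes no arguments ('||')
        // * takes ownership of its captured variables ('move')
        // * returns an unsigned 32-bit integer ('-> u32')
        //
        // Rust can infer the '-> u32' from the closure body,
        // so we could have left it out.
        //
        // Note: `data_segment` borrows from a string literal, which lives
        // for the whole program, so moving it into the closure is allowed.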
        children.push(thread::spawn(move || -> u32 {
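            // Calculate the intermediate sum of this segment:
            let result = data_segment
                        // iterate over the characters of our segment..
                        .chars()
                        // .. convert text-characters to their number value..
                        .map(|c| c.to_digit(10).expect("should be a digit"))
                        // .. and sum the resulting iterator of numbers
                        .sum();
            // println! locks stdout, so output from different threads is not interleaved
            println!("processed segment {}, result={}", i, result);
            // "return" is not needed, because Rust is an "expression language":
            // the last evaluated expression in each block is automatically its value.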
            result
        }));
    }
    /*************************************************************************
     * "Reduce" phase
     *
     * Collect our intermediate results, and combine them into a final result
     ************************************************************************/
    // Collect each thread's intermediate result into a new vector
    let mut intermediate_sums = vec![];
    for child in children {
        // Collect each child thread's return value
        let intermediate_sum = child.join().unwrap();
        intermediate_sums.push(intermediate_sum);
    }
    // Combine all intermediate sums into a single final sum.
    //
    // We use the "turbofish" ::<> to provide sum() with a type hint.
    //
    // Try it: drop the turbofish and give `final_result` an explicit type instead.
    let final_result = intermediate_sums.iter().sum::<u32>();
    println!("Final sum result: {}", final_result);
}

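Exercise: letting the number of threads depend on the input is unwise, since more whitespace means more threads. Modify the program so that the data is always split into a limited number of chunks, with that number defined by a static constant at the top of the program.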
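
One possible approach to this exercise is sketched below. The constant `MAX_CHUNKS` and the function `digit_sum_chunked` are names invented for this sketch. Unlike the example above, it silently skips non-digit characters instead of panicking on them, and it copies each chunk so that the worker threads own their data.

```rust
use std::thread;

// Assumed upper bound on the number of worker threads (name chosen for this sketch).
const MAX_CHUNKS: usize = 4;

// Sums all decimal digits in `data` using at most `MAX_CHUNKS` threads.
fn digit_sum_chunked(data: &str) -> u32 {
    // Keep only the digits, so whitespace no longer controls the thread count
    let digits: Vec<u32> = data.chars().filter_map(|c| c.to_digit(10)).collect();
    if digits.is_empty() {
        return 0;
    }
    // Ceiling division: this chunk length yields at most MAX_CHUNKS chunks
    let chunk_len = (digits.len() + MAX_CHUNKS - 1) / MAX_CHUNKS;

    // "Map" phase: one thread per chunk, each owning a copy of its chunk
    let handles: Vec<thread::JoinHandle<u32>> = digits
        .chunks(chunk_len)
        .map(|chunk| {
            let owned = chunk.to_vec();
            thread::spawn(move || owned.iter().sum::<u32>())
        })
        .collect();

    // "Reduce" phase: add up the intermediate sums
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("Final sum result: {}", digit_sum_chunked("12 34\n56"));
}
```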

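Channels

Rust provides asynchronous channels for communication between threads. A channel allows a unidirectional flow of information between two endpoints: the Sender and the Receiver.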

use std::sync::mpsc::{Sender, Receiver};
use std::sync::mpsc;
use std::thread;
static NTHREADS: i32 = 3;
fn main() {
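    // Channels have two endpoints: the `Sender<T>` and the `Receiver<T>`,
    // where `T` is the type of the message to be transferred
    // (the type annotation is optional)
    let (tx, rx): (Sender<i32>, Receiver<i32>) = mpsc::channel();
    for id in 0..NTHREADS {
        // The sender endpoint can be cloned
        let thread_tx = tx.clone();
        // Each thread will send its id via the channel
        thread::spawn(move || {
            // The thread takes ownership of `thread_tx`
            // Each thread queues a message in the channel
            thread_tx.send(id).unwrap();
            // Sending is a non-blocking operation; the thread continues
            // immediately after sending its message
            println!("thread {} finished", id);
        });
    }
    // Here, all the messages are collected
    let mut ids = Vec::with_capacity(NTHREADS as usize);
    for _ in 0..NTHREADS {
        // The `recv` method picks a message from the channel
        // `recv` blocks the current thread if no message is available
        ids.push(rx.recv());
    }
    // Show the order in which the messages were sent
    println!("{:?}", ids);
}
// Example output. The order is not deterministic, and because the threads are
// not joined, the main thread here exited before thread 2 printed its message:
// thread 0 finished
// thread 1 finished
// [Ok(0), Ok(1), Ok(2)]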
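
The example above receives exactly NTHREADS messages. An alternative, sketched here with a hypothetical `collect_ids` function, is to drop the original sender and iterate over the receiver, which ends once every sender clone has been dropped. The result is sorted only to make it deterministic for display.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: spawn `nthreads` producers and gather everything they send.
fn collect_ids(nthreads: i32) -> Vec<i32> {
    let (tx, rx) = mpsc::channel();
    for id in 0..nthreads {
        let thread_tx = tx.clone();
        thread::spawn(move || {
            // The receiver is still alive while we iterate below, so this succeeds
            thread_tx.send(id).unwrap();
        });
    }
    // Drop the original sender; otherwise the channel never closes
    // and the iteration below would block forever
    drop(tx);
    // `iter` blocks for each message and stops when all senders are gone
    let mut ids: Vec<i32> = rx.iter().collect();
    // Arrival order depends on scheduling, so sort for a stable result
    ids.sort();
    ids
}

fn main() {
    println!("{:?}", collect_ids(3));
}
```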

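Path

The Path struct represents a file path in the underlying filesystem. Older material describes two flavors, posix::Path for UNIX-like systems and windows::Path for Windows, exported through the prelude. In current Rust there is a single std::path::Path type, imported explicitly, whose parsing rules follow the target platform.
A Path is not represented internally as a UTF-8 string; it is stored as platform-dependent bytes (an OsStr, described in older material as a Vec<u8>). Converting a Path to a &str is therefore not free and may fail, so to_str returns an Option.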

use std::path::Path;
fn main() {
    // Create a `Path` from an `&'static str`
    let path = Path::new(".");
    // The `display` method returns a `Display`able structure
    let _display = path.display();
    // `join` merges a path with a byte container using the OS-specific
    // separator, and returns the new path
    let new_path = path.join("a").join("b");
    // Convert the path into a string slice
    match new_path.to_str() {
        None => panic!("new path is not a valid UTF-8 sequence"),
        Some(s) => println!("new path is {}", s), // ./a/b
    }
}
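
Beyond join and to_str, a Path can be taken apart into its components. The sketch below uses a hypothetical `describe` helper and assumes a UNIX-style path with `/` separators; each conversion to &str goes through an Option for the reason given above.

```rust
use std::path::Path;

// Hypothetical helper: returns (parent, file stem, extension) as string slices.
// Each part is `None` if it is absent or not valid UTF-8.
fn describe(path: &Path) -> (Option<&str>, Option<&str>, Option<&str>) {
    (
        path.parent().and_then(|p| p.to_str()),
        path.file_stem().and_then(|s| s.to_str()),
        path.extension().and_then(|e| e.to_str()),
    )
}

fn main() {
    let path = Path::new("a/b/report.txt");
    println!("{:?}", describe(path));
}
```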

See also

OsStr and Metadata.

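File I/O

The File struct represents a file that has been opened (it wraps a file descriptor) and gives read and/or write access to the underlying file.
Since many things can go wrong during file I/O, all File methods return io::Result<T>, which is an alias for Result<T, io::Error>.
This makes the failure of every I/O operation explicit.

Opening a file

The open static method opens a file in read-only mode.
A File owns a resource, the file descriptor, and closes the file when it is dropped.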

use std::fs::File;
use std::io::prelude::*;
use std::path::Path;
fn main() {
    // Create a path to the desired file
    let path = Path::new("hello.txt");
    let display = path.display();
    // Open the path in read-only mode, returns `io::Result<File>`
    let mut file = match File::open(&path) {
        // `io::Error` implements `Display`, so it can be printed directly
        // (`Error::description` is deprecated).
        Err(why) => panic!("couldn't open {}: {}", display, why),
        Ok(file) => file,
    };
    // Read the file contents into a string, returns `io::Result<usize>`
    let mut s = String::new();
    match file.read_to_string(&mut s) {
        Err(why) => panic!("couldn't read {}: {}", display, why),
        Ok(_) => print!("{} contains:\n{}", display, s),
    }
    // `file` goes out of scope, and the "hello.txt" file gets closed
}
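
Because failures are explicit values, a caller can react to one specific kind of error instead of panicking. As a minimal sketch, the hypothetical helper `read_or_default` below uses std::fs::read_to_string (which opens and reads in one call) and falls back to a default only when the file does not exist; any other error is passed on to the caller.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: read a file, or return `default` if the file is missing.
fn read_or_default(path: &Path, default: &str) -> io::Result<String> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(contents),
        // Only a missing file is treated as recoverable
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(default.to_string()),
        // Permission problems, invalid UTF-8, etc. are still reported
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let text = read_or_default(Path::new("hello.txt"), "(no hello.txt here)\n")?;
    print!("{}", text);
    Ok(())
}
```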

Creating a file

The create static method opens a file in write-only mode. If the file already exists, its old content is destroyed. Otherwise, a new file is created.

static LOREM_IPSUM: &str = "Lorem ipsum dolor sit amet";
use std::io::prelude::*;
use std::fs::File;
use std::path::Path;
fn main() {
    let path = Path::new("out/lorem_ipsum.txt");
    let display = path.display();
    // Open a file in write-only mode, returns `io::Result<File>`
    // (this fails if the `out` directory does not exist)
    let mut file = match File::create(&path) {
        Err(why) => panic!("couldn't create {}: {}", display, why),
        Ok(file) => file,
    };
    // Write the `LOREM_IPSUM` string to `file`, returns `io::Result<()>`
    match file.write_all(LOREM_IPSUM.as_bytes()) {
        Err(why) => panic!("couldn't write to {}: {}", display, why),
        Ok(_) => println!("successfully wrote to {}", display),
    }
}

OpenOptions (which replaced the old open_mode function) is the more general way to open a file in other modes, such as read+write or append.
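
As a minimal sketch of opening a file in a mode other than read-only or write-only, the following appends lines using std::fs::OpenOptions. The helper name `append_line` and the temporary file name are chosen for this illustration only.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

// Hypothetical helper: append `line` plus a newline to the file at `path`,
// creating the file first if it does not exist.
fn append_line(path: &Path, line: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", line)
}

fn main() -> io::Result<()> {
    // Work in the system temp directory so the example cleans up after itself
    let path = std::env::temp_dir().join("rbe_append_demo.txt");
    let _ = fs::remove_file(&path); // ignore "not found"
    append_line(&path, "first")?;
    append_line(&path, "second")?;
    print!("{}", fs::read_to_string(&path)?);
    fs::remove_file(&path)
}
```

Unlike File::create, opening with append(true) keeps the existing content, so the second call adds to the file rather than replacing it.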

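Reading lines

The lines() method returns an iterator over the lines of a file.
File::open expects a generic argument implementing AsRef<Path>, which str does; read_lines() below accepts the same kind of input.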

use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;
fn main() {
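    // The file `hosts` must exist in the current directory
    if let Ok(lines) = read_lines("./hosts") {
        // Consume the iterator; each item is an `io::Result<String>`
        for line in lines {
            if let Ok(ip) = line {
                println!("{}", ip);
            }
        }
    }
}
// The output is wrapped in a Result to allow matching on errors.
// Returns an iterator over the lines of the file, backed by a buffered reader.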
fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where P: AsRef<Path>, {
    let file = File::open(filename)?;
    Ok(io::BufReader::new(file).lines())
}

Result:

$ rustc read_lines.rs && ./read_lines
127.0.0.1

This approach is more efficient than reading the whole file into a String in memory, especially for larger files.

Child processes

The process::Output struct represents the output of a finished child process, and the process::Command struct is a process builder.

An example that runs a command (rustc --version):

use std::process::Command;
fn main() {
    let output = Command::new("rustc")
        .arg("--version")
        .output().unwrap_or_else(|e| {
            panic!("failed to execute process: {}", e)
    });
    if output.status.success() {
        let s = String::from_utf8_lossy(&output.stdout);
        print!("rustc succeeded and stdout was:\n{}", s);
    } else {
        let s = String::from_utf8_lossy(&output.stderr);
        print!("rustc failed and stderr was:\n{}", s);
    }
}

Pipes

The std::process::Child struct represents a running child process. It exposes the stdin, stdout and stderr handles, which allow interaction with the underlying process via pipes.

use std::io::prelude::*;
use std::process::{Command, Stdio};
static PANGRAM: &str =
"the quick brown fox jumped over the lazy dog\n";
fn main() {
    // Spawn the `wc` command
    let process = match Command::new("wc")
                                .stdin(Stdio::piped())
                                .stdout(Stdio::piped())
                                .spawn() {
        Err(why) => panic!("couldn't spawn wc: {}", why),
        Ok(process) => process,
    };
    // Write a string to the `stdin` of `wc`.
    //
    // `stdin` has type `Option<ChildStdin>`, but since we know this instance
    // must have one, we can directly `unwrap` it.
    match process.stdin.unwrap().write_all(PANGRAM.as_bytes()) {
        Err(why) => panic!("couldn't write to wc stdin: {}", why),
        Ok(_) => println!("sent pangram to wc"),
    }
    // Because `stdin` does not live after the above call, it is `drop`ed,
    // and the pipe is closed.
    //
    // This is very important, otherwise `wc` wouldn't start processing the
    // input we just sent.
    // The `stdout` field also has type `Option<ChildStdout>` so must be unwrapped.
    let mut s = String::new();
    match process.stdout.unwrap().read_to_string(&mut s) {
        Err(why) => panic!("couldn't read wc stdout: {}", why),
        Ok(_) => print!("wc responded with:\n{}", s),
    }
}

Result:

sent pangram to wc
wc responded with:
1 9 45
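
The same write-then-read pattern can be wrapped in a function that returns errors instead of panicking. The sketch below is an illustration under assumptions: `pipe_through` is a hypothetical helper, and the usage assumes a UNIX-like system where the `cat` program is available. It relies on Child::wait_with_output to collect stdout and reap the child.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

// Hypothetical helper: feed `input` to `program` on stdin and return its stdout.
// Suitable for small inputs only: a child that writes a lot of output before
// reading all of its input could fill the pipe and stall both sides.
fn pipe_through(program: &str, input: &str) -> io::Result<String> {
    let mut child = Command::new(program)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // `take()` moves the handle out of the `Option`, leaving `child` usable
    let mut stdin = child.stdin.take().expect("stdin was configured as piped");
    stdin.write_all(input.as_bytes())?;
    // Close the pipe explicitly so the child sees end-of-file
    drop(stdin);
    // Waits for the child to exit and collects everything it wrote to stdout
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() -> io::Result<()> {
    // Assumes `cat` exists on this system
    print!("{}", pipe_through("cat", "hello from the parent\n")?);
    Ok(())
}
```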

Wait

If you'd like to wait for a process::Child to finish, you must call Child::wait, which returns a process::ExitStatus.

```rust
use std::process::Command;

fn main() {
    let mut child = Command::new("sleep").arg("5").spawn().unwrap();
    let _result = child.wait().unwrap();

    println!("reached end of main");
}
```

```shell
$ rustc wait.rs && ./wait
# `wait` keeps running for 5 seconds
# `sleep 5` command ends, and then our `wait` program finishes
reached end of main
```

Filesystem operations

The std::fs module contains several functions that deal with the filesystem.

use std::fs::{File, OpenOptions};
use std::io::{self, prelude::*};
use std::path::Path;
// A simple implementation of `% cat path`
fn cat(path: &Path) -> io::Result<String> {
    let mut f = File::open(path)?;
    let mut s = String::new();
    f.read_to_string(&mut s)?;
    Ok(s)
}
// A simple implementation of `% echo s > path`
fn echo(s: &str, path: &Path) -> io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(s.as_bytes())
}
// A simple implementation of `% touch path` (ignores existing files)
fn touch(path: &Path) -> io::Result<()> {
    match OpenOptions::new().create(true).write(true).open(path) {
        Ok(_) => Ok(()),
        Err(e) => Err(e),
    }
}
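
The helpers above only touch individual files. As a self-contained sketch of a few directory-level functions from std::fs, the following creates a directory tree, lists it, and removes it again. The `list_names` helper and the `rbe_fs_demo` directory name are invented for this example.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical helper: names of the entries directly inside `dir`, sorted,
// since `read_dir` makes no promise about ordering.
fn list_names(dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(dir)? {
        // Each entry is itself an `io::Result`
        let entry = entry?;
        names.push(entry.file_name().to_string_lossy().into_owned());
    }
    names.sort();
    Ok(names)
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join("rbe_fs_demo");
    let _ = fs::remove_dir_all(&root); // start clean, ignore "not found"
    // Like `mkdir -p`: creates missing parent directories as well
    fs::create_dir_all(root.join("a").join("b"))?;
    // Create (or overwrite) a file with the given contents in one call
    fs::write(root.join("hello.txt"), "hello\n")?;
    println!("{:?}", list_names(&root)?);
    // Like `rm -r`: removes the directory and everything below it
    fs::remove_dir_all(&root)
}
```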

Program arguments

Standard library

Command line arguments can be accessed using std::env::args, which returns an iterator that yields a String for each argument.

use std::env;
fn main() {
    let args: Vec<String> = env::args().collect();
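    // The first argument is the path that was used to call the program.
    println!("My path is {}.", args[0]);
    // The rest of the arguments are the passed command line parameters.
    // Call the program like this:
    //   $ ./args arg1 arg2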
    println!("I got {:?} arguments: {:?}.", args.len() - 1, &args[1..]);
}

$ ./args 1 2 3
My path is ./args.
I got 3 arguments: ["1", "2", "3"].
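
Since env::args returns an iterator, the arguments can also be processed without first collecting them into a Vec. The sketch below uses a hypothetical `sum_numeric` helper that accepts any iterator of Strings, which also makes it easy to exercise without a real command line.

```rust
use std::env;

// Hypothetical helper: add up every argument that parses as an integer,
// silently skipping the ones that do not.
fn sum_numeric<I: Iterator<Item = String>>(args: I) -> i64 {
    args.filter_map(|a| a.parse::<i64>().ok()).sum()
}

fn main() {
    // `skip(1)` drops the first item, the path used to call the program
    let total = sum_numeric(env::args().skip(1));
    println!("sum of numeric arguments: {}", total);
}
```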

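Crates

Many crates provide extra functionality for writing command-line applications. The Rust Cookbook shows best practices for using one of them, clap.

Argument parsing

Pattern matching can be used to parse simple arguments: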

no arguments are passed;
one argument is passed, which must be a number;
two arguments are passed: the first is a "command", the second a number;
any other case prints the help text.

```rust
use std::env;

fn increase(number: i32) {
    println!("{}", number + 1);
}

fn decrease(number: i32) {
    println!("{}", number - 1);
}

fn help() {
    println!("usage:
match_args <string>
    Check whether given string is the answer.
match_args {{increase|decrease}} <integer>
    Increase or decrease given integer by one.");
}

fn main() {
    let args: Vec<String> = env::args().collect();

    match args.len() {
        // no arguments passed
        1 => {
            println!("My name is 'match_args'. Try passing some arguments!");
        },
        // one argument passed
        2 => {
            match args[1].parse() {
                Ok(42) => println!("This is the answer!"),
                _ => println!("This is not the answer."),
            }
        },
        // one command and one argument passed
        3 => {
            let cmd = &args[1];
            let num = &args[2];
            // parse the number
            let number: i32 = match num.parse() {
                Ok(n) => {
                    n
                },
                Err(_) => {
                    println!("error: second argument not an integer");
                    help();
                    return;
                },
            };
            // parse the command
            match &cmd[..] {
                "increase" => increase(number),
                "decrease" => decrease(number),
                _ => {
                    println!("error: invalid command");
                    help();
                },
            }
        },
        // all the other cases
        _ => {
            // show a help message
            help();
        }
    }
}
```

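## Foreign Function Interface (FFI)
Rust provides a Foreign Function Interface (FFI) to C libraries. Foreign functions must be declared inside an `extern` block annotated with a `#[link]` attribute containing the name of the foreign library.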
```rust
use std::fmt;
// this extern block links to the libm library
#[link(name = "m")]
extern {
    // this is a foreign function
    // that computes the square root of a single precision complex number
    fn csqrtf(z: Complex) -> Complex;
    // this one computes the cosine of a single precision complex number
    fn ccosf(z: Complex) -> Complex;
}
// Since calling foreign functions is considered unsafe,
// it's common to write safe wrappers around them.
fn cos(z: Complex) -> Complex {
    unsafe { ccosf(z) }
}
fn main() {
    // z = -1 + 0i
    let z = Complex { re: -1., im: 0. };
    // calling a foreign function is an unsafe operation
    let z_sqrt = unsafe { csqrtf(z) };
    println!("the square root of {:?} is {:?}", z, z_sqrt);
    // calling safe API wrapped around unsafe operation
    println!("cos({:?}) = {:?}", z, cos(z));
}
// Minimal implementation of single precision complex numbers
#[repr(C)]
#[derive(Clone, Copy)]
struct Complex {
    re: f32,
    im: f32,
}
impl fmt::Debug for Complex {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        if self.im < 0. {
            write!(f, "{}-{}i", self.re, -self.im)
        } else {
            write!(f, "{}+{}i", self.re, self.im)
        }
    }
}
```

C memory comparison (memcmp)

extern "C" {
    pub fn memcmp(s1: *const u8, s2: *const u8, n: usize) -> i32;
}
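
As with the complex-number functions above, a foreign declaration like this is usually hidden behind a safe wrapper. The sketch below is one way to do that; the wrapper name `compare_prefix` is hypothetical, and it assumes the platform's C library (which provides memcmp) is linked by default, as is typical for Rust's standard targets. The declaration follows the form shown above; note that the 2024 edition requires writing `unsafe extern "C"` instead.

```rust
use std::cmp::Ordering;

extern "C" {
    // Same declaration as above (private here): compares the first `n`
    // bytes of two buffers
    fn memcmp(s1: *const u8, s2: *const u8, n: usize) -> i32;
}

// Hypothetical safe wrapper: compares the common-length prefix of two slices.
fn compare_prefix(a: &[u8], b: &[u8]) -> Ordering {
    // Never read past the end of the shorter slice
    let n = a.len().min(b.len());
    // SAFETY: both pointers come from live slices and are valid for `n` bytes
    let r = unsafe { memcmp(a.as_ptr(), b.as_ptr(), n) };
    // memcmp only guarantees the sign of its result, so map it to an `Ordering`
    r.cmp(&0)
}

fn main() {
    println!("{:?}", compare_prefix(b"abc", b"abd"));
}
```

Keeping the length computation inside the wrapper means callers cannot pass an `n` that exceeds either buffer, which is the invariant the unsafe call depends on.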