08. Common Collections - 8.2 String - "Rust"

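What is a string
What do we usually mean by "string"?
Other string types
Creating a new String
Updating a String
Accessing a String by index
Bytes, scalar values, and grapheme clusters
Slicing a String
Ways to iterate over a String
Strings are not simple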

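Why Rust developers often struggle with strings

Rust tends to expose possible errors
Strings are a more complicated data structure than they first appear
UTF-8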

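What is a string
A collection of bytes
Plus some methods
- that can interpret those bytes as text
At the core language level, Rust has only one string type: the string slice str (usually seen in its borrowed form, &str)
String slice: a reference to UTF-8 encoded string data stored somewhere else
- String literals are stored in the program's binary and are string slices too
The String type
- Comes from the standard library, not the core language
- Growable, mutable, and owned
- UTF-8 encoded

What do we usually mean by "string"?
String and &str
- Both are used heavily in the standard library
- Both are UTF-8 encoded
These notes focus mainly on String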
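
To make the difference between the two types concrete, here is a minimal sketch. The function names `literal` and `owned_greeting` are made up for this illustration and do not come from the notes.

```rust
// A string literal lives in the program binary; its type is &'static str.
fn literal() -> &'static str {
    "hello"
}

// A String owns a heap-allocated buffer that can grow.
fn owned_greeting(name: &str) -> String {
    let mut s = String::from("hello, ");
    s.push_str(name); // a String can be extended; a &str cannot
    s
}

fn main() {
    let slice: &str = literal();
    let owned: String = owned_greeting("Rust");
    // A &String coerces to &str, so a borrowed view works for both.
    let view: &str = &owned;
    println!("{} / {}", slice, view);
}
```

Functions that only need to read text usually take `&str`, because that accepts both literals and borrowed `String` values.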

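Other string types
Rust's standard library also includes a number of other string types, such as OsString, OsStr, CString, and CStr
- Names ending in String: usually the owned variant
- Names ending in Str: usually the borrowed variant
- They can store text in different encodings or use a different in-memory representation (layout)
Library crates (third-party libraries) offer even more options for storing string data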
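
As a rough illustration of two of these types, the sketch below uses `std::ffi::CString` and `std::ffi::OsString`. The helper names `c_bytes` and `os_roundtrip` are hypothetical and exist only for this example.

```rust
use std::ffi::{CString, OsString};

// Owned, NUL-terminated C string. CString::new fails if the input
// contains an interior NUL byte.
fn c_bytes(text: &str) -> Vec<u8> {
    let c = CString::new(text).expect("no interior NUL byte");
    c.as_bytes_with_nul().to_vec()
}

// OsString holds platform-native text; converting it back to &str
// can fail, so to_str() returns an Option.
fn os_roundtrip(text: &str) -> Option<String> {
    let os = OsString::from(text);
    os.to_str().map(|s| s.to_string())
}

fn main() {
    println!("{:?}", c_bytes("hi"));
    println!("{:?}", os_roundtrip("notes.txt"));
}
```

Note how the C string carries a trailing 0 byte, which a String never needs: the layout differs even though the text is the same.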

Creating a new String
Many Vec operations are also available on String (because a String is stored as a collection of bytes)

The String::new() function

fn main() {
  let mut s = String::new();
}

Creating a String from an initial value:

The to_string() method is available on any type that implements the Display trait, including string literals

fn main() {
  let data = "initial contents"; // a string literal
  let s = data.to_string(); // to_string converts it into a String named s
  let s1 = "initial contents".to_string(); // to_string called directly on the literal
}
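
Since to_string() comes with the Display trait, it also works on types other than string literals. The sketch below shows this with an integer and with a small `Point` struct, which is a hypothetical type invented for the example.

```rust
use std::fmt;

// A hypothetical type used only for this illustration.
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    // Implementing Display provides to_string() through the
    // standard library's blanket ToString implementation.
    let p = Point { x: 1, y: 2 };
    println!("{}", p.to_string());
    println!("{}", 42.to_string());
}
```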

The String::from() function: creates a String from a literal

fn main() {
let s = String::from("initial contents");
}

Examples of UTF-8 encoded text

fn main() {
  let hello = String::from("السلام عليكم");
  let hello = String::from("Dobrý den");
  let hello = String::from("Hello");
  let hello = String::from("שָׁלוֹם");
  let hello = String::from("नमस्ते");
  let hello = String::from("こんにちは");
  let hello = String::from("안녕하세요");
  let hello = String::from("你好");
  let hello = String::from("Olá");
  let hello = String::from("Здравствуйте");
  let hello = String::from("Hola");
}

Updating a String

The push_str() method: appends a string slice to a String

fn main() {
  let mut s = String::from("foo");
  s.push_str("bar");
  println!("{}", s);
}

The signature of push_str

pub fn push_str(&mut self, string: &str)

&str: the parameter is a borrowed string slice, and a string literal is a string slice.
The method does not take ownership of its argument.

fn main() {
  let mut s = String::from("foo");
  let s1 = String::from("bar");
  s.push_str(&s1);
  println!("{}", s1);
}

s1 can still be used after it has been passed in, and the compiler reports no error.

The push() method: appends a single character to a String

fn main() {
  let mut s = String::from("foo");
  s.push('l');
}

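+: concatenating strings
```
fn main() {
  let s1 = String::from("Hello, ");
  let s2 = String::from("World!");
  let s3 = s1 + &s2; // s1 is moved here and can no longer be used
  println!("{}", s3);
  // println!("{}", s1); // error[E0382]: borrow of moved value: `s1`
  println!("{}", s2);
}
```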
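Note that the operand to the left of + is a String, while the operand to the right is a string slice; in this example it is written as a reference to a String (&s2).
After the concatenation s1 can no longer be used, but s2 can.
- The + operator calls a method whose signature looks roughly like fn add(self, s: &str) -> String {}
  - The actual add in the standard library is defined with generics
  - Only a &str can be added to a String
  - Deref coercion turns the reference to a String (&String) into a string slice (&str)

The first string is added to a reference to the second string. The second parameter of add is a string slice (&str), not a String reference (&String), so the compiler coerces &s2 into a slice.
Because the second parameter is a reference (&), the caller keeps ownership of the second argument.
The first parameter, self, has no &, so add takes ownership of the first argument.
That is why, after the concatenation, ownership of the first string has moved into add and s1 is no longer valid.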
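
If both operands need to stay usable, one option is to hand + a copy of the left side. The sketch below assumes a helper named `join_keeping_both`, which is not part of the notes or the standard library.

```rust
// Concatenates two strings without consuming either of the caller's values.
fn join_keeping_both(s1: &str, s2: &str) -> String {
    // to_owned() creates a fresh String for + to take ownership of,
    // so the caller's data is left untouched.
    s1.to_owned() + s2
}

fn main() {
    let s1 = String::from("Hello, ");
    let s2 = String::from("World!");
    // &s1 and &s2 are &String values; deref coercion turns them into &str.
    let s3 = join_keeping_both(&s1, &s2);
    println!("{}", s3);
    println!("{}", s1); // still valid: only the copy was moved into +
    println!("{}", s2);
}
```

The cost is one extra allocation for the copy, which is usually acceptable when the original string is needed afterwards.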

format!: a more flexible way to concatenate multiple strings

fn main() {
  let s1 = String::from("tic");
  let s2 = String::from("tac");
  let s3 = String::from("toe");
  // let s3 = s1 + "-" + &s2 + "-" + &s3;
  // println!("{}", s3);
  let s = format!("{}-{}-{}", s1, s2, s3);
  println!("{}", s);
}

format! does not take ownership of any of its arguments.
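
The two approaches can be compared side by side. In this sketch the function names `with_plus` and `with_format` are illustrative only; their parameter types show which version has to consume its first argument.

```rust
// The + chain needs an owned String on the far left, which it consumes.
fn with_plus(s1: String, s2: &str, s3: &str) -> String {
    // Each + appends to the String returned by the previous +.
    s1 + "-" + s2 + "-" + s3
}

// format! only borrows its arguments, so &str is enough for all three.
fn with_format(s1: &str, s2: &str, s3: &str) -> String {
    format!("{}-{}-{}", s1, s2, s3)
}

fn main() {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");
    let a = with_format(&s1, &s2, &s3);
    println!("{} (s1 is still usable: {})", a, s1);
    let b = with_plus(s1, &s2, &s3); // s1 is moved here
    println!("{}", b);
}
```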

Accessing a String by index

Using index syntax to access part of a String is a compile error
```
fn main() {
  let s1 = String::from("tic");
  let h = s1[0];
}
```
The compiler reports:
```bash
error[E0277]: the type `String` cannot be indexed by `{integer}`
 --> src/main.rs:3:13
  |
3 |     let h = s1[0];
  |             ^^^^^ `String` cannot be indexed by `{integer}`
  |
  = help: the trait `Index<{integer}>` is not implemented for `String`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0277`.
error: could not compile `string`
```


- Rust strings do not support access by index syntax
### Internal representation
- String is a wrapper around Vec<u8>
   - The len() method returns the length in bytes
```rust
fn main() {
    let len = String::from("Hola").len(); // each letter takes 1 byte: 4 bytes in total
    println!("{}", len);
    let len = String::from("Здравствуйте").len(); // each Unicode scalar value takes 2 bytes: 24 bytes in total
    println!("{}", len);
}
```

A byte index into a String does not always correspond to a valid Unicode scalar value.
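
The mismatch between byte positions and characters can be checked directly. This is a minimal sketch; `byte_and_char_counts` is a helper name invented for the example.

```rust
// Number of bytes vs. number of Unicode scalar values in the same text.
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

fn main() {
    let hello = "Здравствуйте";
    let (bytes, chars) = byte_and_char_counts(hello);
    println!("{} bytes, {} chars", bytes, chars);
    // Byte 0 is only the first half of the two-byte encoding of 'З',
    // so on its own it is not a character.
    println!("first byte: {}", hello.as_bytes()[0]);
}
```

If `hello[0]` were allowed, it would have to return that single byte value rather than the letter a reader expects, which is one reason the syntax is rejected.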

Bytes, scalar values, and grapheme clusters

Rust has three ways of looking at a string

Bytes

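fn main() {
  let w = "नमस्ते"; // the Hindi word "namaste", written in the Devanagari script
  for b in w.bytes() {
    println!("{}", b);
  }
}

Scalar values

fn main() {
  let w = "नमस्ते"; // the Hindi word "namaste", written in the Devanagari script
  for c in w.chars() {
    println!("{}", c);
  }
}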

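Grapheme clusters (the closest thing to what we would call "letters"): extracting grapheme clusters from a string is complex, so the standard library does not provide this functionality. Crates that do are available on crates.io.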
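
The first two views can be related to each other with char_indices(), which pairs every scalar value with the byte offset where it starts. In this sketch the word is spelled with Unicode escapes so that its six scalar values are explicit; `offsets` is a hypothetical helper name.

```rust
// Byte offset at which each Unicode scalar value starts.
fn offsets(text: &str) -> Vec<usize> {
    text.char_indices().map(|(i, _)| i).collect()
}

fn main() {
    // The same Hindi word as above, written as six escaped scalar values.
    let w = "\u{928}\u{92e}\u{938}\u{94d}\u{924}\u{947}";
    println!("{} bytes, {} scalar values", w.len(), w.chars().count());
    println!("{:?}", offsets(w));
}
```

Two of the six scalar values are combining marks that make no sense on their own, which is why a reader sees fewer "letters" than chars() yields. Grouping them into grapheme clusters needs an external crate; unicode-segmentation is one commonly used option.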

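A final reason Rust does not allow indexing into a String
- An indexing operation is expected to take constant time ( O(1) )
- A String cannot guarantee that: Rust would have to walk the contents from the start to work out how many valid characters come before the index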
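
When the n-th character really is needed, it has to be requested explicitly, which makes the linear cost visible in the code. A minimal sketch, with `nth_char` as an illustrative helper name:

```rust
// Returns the n-th Unicode scalar value (0-based) by walking the
// string from the start: O(n), not O(1).
fn nth_char(text: &str, n: usize) -> Option<char> {
    text.chars().nth(n)
}

fn main() {
    println!("{:?}", nth_char("Здравствуйте", 1));
    println!("{:?}", nth_char("Hola", 10)); // past the end
}
```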

Slicing a String

You can use [] with a range to create a string slice

fn main() {
  let hello = "Здравствуйте";
  let s = &hello[0..4];
  println!("{}", s);
}

Those 4 bytes correspond to two letters, so s is Зд. If we use the range [0..3] instead:

fn main() {
  let hello = "Здравствуйте";
  let s = &hello[0..3];
  println!("{}", s);
}

the program panics, because byte 3 is not a char boundary:

thread 'main' panicked at 'byte index 3 is not a char boundary; 
  it is inside 'д' (bytes 2..4) of `Здравствуйте`', src/main.rs:3:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Use range slicing with caution:
if a slice does not start and end on character boundaries, the program panics.
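
When the byte offsets are not known to be safe, the panic can be avoided with str::get, which returns None for an invalid range, or by testing an offset with is_char_boundary first. The sketch below wraps this in a helper called `prefix`, a name chosen only for this example.

```rust
// Like &text[..end], but returns None instead of panicking when `end`
// is out of range or falls inside a multi-byte character.
fn prefix(text: &str, end: usize) -> Option<&str> {
    text.get(..end)
}

fn main() {
    let hello = "Здравствуйте";
    println!("{:?}", prefix(hello, 4)); // ends on a boundary
    println!("{:?}", prefix(hello, 3)); // cuts through 'д'
    println!("{}", hello.is_char_boundary(3));
}
```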

Ways to iterate over a String

For scalar values: the chars() method
For bytes: the bytes() method
For grapheme clusters: complex, and not provided by the standard library

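Strings are not simple
Rust chooses to make correct handling of String data the default behavior for all Rust programs
- Programmers have to put more thought into handling UTF-8 data up front
This prevents errors involving non-ASCII characters from surfacing late in development.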
