Rust has several string types.

1. Categories

Rust has six main kinds of text literals: character, string, raw string, byte, byte string, and raw byte string.

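1.1 character (Rust type: char)

A character literal is a single valid Unicode character enclosed in a pair of single quotes. It can contain the single quote itself, which must then be escaped with a backslash. All of the following are valid character literals: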

    let s1 = 'H';
    let s2 = '\'';
    let s3 = '"';
    let s4 = '\n';
    let s5 = '\x41'; // same as s6
    let s6 = 'A';
    let s7 = '\u{6211}'; // same as s8
    let s8 = '我';

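Note: a valid char value lies in the range 0x0000 to 0xD7FF or 0xE000 to 0x10FFFF (the surrogate code points 0xD800 to 0xDFFF are excluded). A char is stored as its UTF-32 code point and always occupies 4 bytes.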
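
The range and size rules above can be checked directly. This is a minimal sketch: the helper name `is_valid_char_code` is made up for illustration, while `char::from_u32`, `len_utf8` and `size_of` are standard library items.

```rust
use std::mem::size_of;

/// Returns true if `code` is a valid `char` value (a Unicode scalar value).
/// Hypothetical helper, used only for this sketch.
fn is_valid_char_code(code: u32) -> bool {
    char::from_u32(code).is_some()
}

fn main() {
    // A char always occupies 4 bytes, whatever character it holds.
    assert_eq!(size_of::<char>(), 4);

    // Both ends of the two valid ranges are accepted.
    assert!(is_valid_char_code(0x0000));
    assert!(is_valid_char_code(0xD7FF));
    assert!(is_valid_char_code(0xE000));
    assert!(is_valid_char_code(0x10FFFF));

    // Surrogates and values above 0x10FFFF are rejected.
    assert!(!is_valid_char_code(0xD800));
    assert!(!is_valid_char_code(0xDFFF));
    assert!(!is_valid_char_code(0x110000));

    // Once encoded as UTF-8, the width depends on the character.
    assert_eq!('A'.len_utf8(), 1);
    assert_eq!('我'.len_utf8(), 3);
    assert_eq!('\u{6211}', '我');
}
```

The fixed 4-byte size applies to a standalone char only; inside a &str or String the same character is stored as 1 to 4 UTF-8 bytes.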

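1.2 string (Rust type: &str)

A string literal is a sequence of valid Unicode characters enclosed in a pair of double quotes. It can contain the double quote itself, which must then be escaped with a backslash. A backslash at the end of a line continues the literal on the next line; the line break and all leading whitespace on the next line are dropped. All of the following are valid string literals: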

    let s1 = "HHHH";
    let s2 = "\"";
    let s3 = "''";
    let s4 = "\n\n\n";
    let s5 = "\x41\x41"; // same as s6
    let s6 = "AA";
    let s7 = "\u{6211}\u{6211}"; // same as s8
    let s8 = "我我";
    let s9 = "hello"; // same as s10
    let s10 = "he\
    llo";
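
To make the line-continuation rule concrete, here is a small sketch that compares a continued literal with the same text written without the backslash. The function names `continued` and `not_continued` are illustrative only.

```rust
/// Builds "hello" from a literal continued with a trailing backslash.
/// The line break and the leading whitespace of the next line are dropped.
fn continued() -> &'static str {
    "he\
     llo"
}

/// The same two lines without the backslash: the line break is kept.
fn not_continued() -> &'static str {
    "he
llo"
}

fn main() {
    assert_eq!(continued(), "hello");
    assert_eq!(not_continued(), "he\nllo");

    // Hex and Unicode escapes produce the same text as the plain literals.
    assert_eq!("\x41\x41", "AA");
    assert_eq!("\u{6211}\u{6211}", "我我");

    // Inside double quotes a single quote needs no escape, but may have one.
    assert_eq!("''", "\'\'");
}
```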

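1.3 raw string (Rust type: &str)

A raw string literal does not process any escape sequences. It starts with r, followed by zero or more # characters and a double quote; then comes an arbitrary sequence of Unicode characters, closed by a double quote and the same number of # characters. All of the following are valid raw string literals: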

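    let s1 = r"abc"; // -> abc
    let s2 = r"abc'"; // -> abc'
    let s3 = r"我"; // -> 我
    let s4 = r"\x41"; // -> \x41
    let s5 = r"\n"; // -> \n (a backslash and the letter n)
    let s6 = r"\u{6211}\u{6211}"; // -> \u{6211}\u{6211}
    let s7 = r#"""#; // -> "
    let s8 = r"###"; // -> ###
    let s9 = r#"hello
    world"#; // -> hello, a line break, then world (line breaks and whitespace are kept verbatim)
  • If the character sequence contains no double quote, the leading and trailing # can be omitted.
  • If the character sequence contains a double quote followed by # characters, the number of # at the start and end must exceed the longest run of # that directly follows a double quote. Using one more # than the longest run of # anywhere in the sequence is always enough.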
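
The delimiter rule can be expressed as a small function. The sketch below is not part of any library: `min_hashes` is a hypothetical helper that computes the smallest number of # a raw string literal needs for a given content, assuming the literal ends at the first double quote followed by that many #.

```rust
/// Smallest number of `#` delimiters a raw string literal needs to hold
/// `content`. Hypothetical helper, written only to illustrate the rule.
fn min_hashes(content: &str) -> usize {
    let bytes = content.as_bytes();
    let mut needed = 0;
    for (i, &b) in bytes.iter().enumerate() {
        if b == b'"' {
            // Count the run of `#` that directly follows this quote.
            let run = bytes[i + 1..].iter().take_while(|&&c| c == b'#').count();
            // The delimiter must be longer than that run.
            needed = needed.max(run + 1);
        }
    }
    needed
}

fn main() {
    // No double quote in the content: no # needed, even if # occurs.
    assert_eq!(min_hashes(r"abc"), 0);
    assert_eq!(min_hashes(r"###"), 0);

    // A double quote in the content needs at least one #.
    assert_eq!(min_hashes(r#"say "hi""#), 1);

    // The content contains `"#`, so two # are needed.
    assert_eq!(min_hashes(r##"end with "# here"##), 2);
}
```

Scanning bytes is safe here because both `"` and `#` are ASCII and never appear inside a multi-byte UTF-8 sequence.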

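1.4 byte literals (Rust type: u8)

A byte literal starts with b, followed by a single ASCII character enclosed in a pair of single quotes. It can contain the single quote itself, which must then be escaped with a backslash. The difference from a character literal is that a character literal supports Unicode while a byte literal does not. All of the following are valid byte literals: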

    let s1 = b'\x41';
    let s2 = b'A';
    let s3 = b'\t';
    let s4 = b'\'';
    let s5 = b'"';
    let s6 = b'\\';
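
A short sketch of how a byte literal relates to a char. The helper `ascii_byte` is an illustrative name, not a standard function; it returns the byte only for characters that could be written as a byte literal.

```rust
use std::mem::size_of;

/// Returns the byte for `c` if `c` is ASCII, i.e. if it could be written
/// as a byte literal. Hypothetical helper for this sketch.
fn ascii_byte(c: char) -> Option<u8> {
    if c.is_ascii() {
        Some(c as u8)
    } else {
        None
    }
}

fn main() {
    // A byte literal is simply a u8 value; a char is four times as large.
    let a: u8 = b'A';
    assert_eq!(a, 65);
    assert_eq!(b'\x41', b'A');
    assert_eq!(size_of::<u8>(), 1);
    assert_eq!(size_of::<char>(), 4);

    assert_eq!(ascii_byte('A'), Some(b'A'));
    assert_eq!(ascii_byte('\''), Some(b'\''));
    // A non-ASCII character has no byte literal form.
    assert_eq!(ascii_byte('我'), None);
}
```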

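1.5 byte string literals (Rust type: &[u8; N], where N is the number of bytes)

A byte string literal starts with b, followed by a sequence of ASCII characters enclosed in a pair of double quotes. It can contain the double quote itself, which must then be escaped with a backslash. A backslash at the end of a line continues the literal on the next line; the line break and all leading whitespace on the next line are dropped. All of the following are valid byte string literals: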

    let s1 = b"\x41\x41";
    let s2 = b"AA";
    let s3 = b"\t\t";
    let s4 = b"\'\'";
    let s5 = b"\"\"";
    let s6 = b"\\\\";
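
Because the length is part of a byte string literal's type, it is worth seeing how such a literal is passed around. In this sketch, `count_byte` is an illustrative helper that takes a plain byte slice; the array reference coerces to a slice at the call site.

```rust
/// Counts how often `needle` occurs in `haystack`.
/// Hypothetical helper; a `&[u8; N]` literal coerces to `&[u8]` when passed in.
fn count_byte(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

fn main() {
    // The literal is a reference to a fixed-size array of bytes.
    let fixed: &[u8; 2] = b"AA";
    assert_eq!(fixed.len(), 2);
    assert_eq!(*fixed, [65u8, 65u8]);

    // Each escape is resolved to a single byte.
    assert_eq!(b"\x41\x41", b"AA");
    assert_eq!(count_byte(b"\t\t", b'\t'), 2);
    assert_eq!(count_byte(b"\\\\", b'\\'), 2);
    assert_eq!(count_byte(b"\"\"", b'"'), 2);
}
```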

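1.6 raw byte string literals (Rust type: &[u8; N], where N is the number of bytes)

A raw byte string literal does not process any escape sequences. It starts with br, followed by zero or more # characters and a double quote; then comes an arbitrary sequence of ASCII characters, closed by a double quote and the same number of # characters. All of the following are valid raw byte string literals: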

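    let s1 = br"abc";
    let s2 = br"abc'";
    let s3 = br"\x41";
    let s4 = br"\n";
    let s5 = br"\u{6211}\u{6211}";
    let s6 = br#"""#;
    let s7 = br"###";
    let s8 = br#"hello
    world"#; // -> hello, a line break, then world (line breaks and whitespace are kept verbatim)
  • If the raw byte string contains no double quote, the leading and trailing # can be omitted.
  • If the raw byte string contains a double quote followed by # characters, the number of # at the start and end must exceed the longest run of # that directly follows a double quote. Using one more # than the longest run of # anywhere in the sequence is always enough.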

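The String type

String is essentially a smart pointer: it wraps a Vec<u8> whose contents are always valid UTF-8. Its internal definition is:

    pub struct String {
        vec: Vec<u8>,
    }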
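
The relationship to Vec<u8> is visible through the public API. The sketch below uses the standard `into_bytes` and `String::from_utf8`; the function name `round_trip` is made up for this example.

```rust
/// Moves the byte buffer out of a String and rebuilds the String from it.
/// Hypothetical helper; the bytes are validated but not re-encoded.
fn round_trip(s: String) -> String {
    let bytes: Vec<u8> = s.into_bytes();
    String::from_utf8(bytes).expect("bytes taken from a String are valid UTF-8")
}

fn main() {
    let s = String::from("中国-China");

    // len and capacity describe the underlying byte vector.
    assert_eq!(s.len(), 12);
    assert!(s.capacity() >= s.len());

    assert_eq!(round_trip(s), "中国-China");

    // What String adds on top of Vec<u8> is the UTF-8 guarantee:
    // arbitrary bytes are rejected.
    assert!(String::from_utf8(vec![0xFF, 0xFE]).is_err());
}
```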

2. Usage

2.1 Type conversions

String to &str

```rust
let s1 = String::from("中国-China");
let s2 = s1.as_str();
```

&str to String

```rust
let s1 = "中国-China";
let s2 = s1.to_string();
let s3 = String::from("中国-China");
let s4: String = s1.into();
```

String/&str to byte slice: &[u8]

    // from a String
    let s1 = String::from("中国-China");
    let s2 = s1.as_bytes();
    // from a &str
    let s1 = "中国-China";
    let s2 = s1.as_bytes();
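
Going back from bytes to text requires a validity check. A minimal sketch, assuming a made-up helper named `decode` around the standard `std::str::from_utf8`:

```rust
use std::str;

/// Decodes UTF-8 bytes back into text, returning None for invalid input.
/// Hypothetical helper for this sketch.
fn decode(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    let text = "中国-China";
    let bytes = text.as_bytes();

    // Each Chinese character takes 3 bytes in UTF-8, each ASCII character 1.
    assert_eq!(bytes.len(), 12);
    assert_eq!(&bytes[..3], &[0xE4u8, 0xB8, 0xAD][..]); // the bytes of '中'

    // The full byte slice decodes back to the original text.
    assert_eq!(decode(bytes), Some("中国-China"));

    // Cutting a character in half yields invalid UTF-8.
    assert_eq!(decode(&bytes[..2]), None);
}
```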

u8 to char, char to u8

    let s1: u8 = 70;
    let s2 = s1 as char; // 'F'
    let s3 = 'H';
    let s4 = s3 as u8; // 72
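
One caveat with `as`: converting a char to u8 silently keeps only the lowest 8 bits. The sketch below contrasts it with the checked conversion `u8::try_from`, which is assumed to be available (it exists for char in recent Rust releases, 1.59 or later). The name `char_to_byte` is illustrative.

```rust
use std::convert::TryFrom;

/// Converts a char to a byte only when its code point fits in a u8.
/// Hypothetical helper for this sketch.
fn char_to_byte(c: char) -> Option<u8> {
    u8::try_from(c).ok()
}

fn main() {
    // u8 -> char never fails: every byte maps to U+0000..=U+00FF.
    assert_eq!(70u8 as char, 'F');
    assert_eq!(char::from(70u8), 'F');

    // char -> u8 with `as` truncates to the lowest 8 bits.
    let h = 'H';
    let wo = '我'; // U+6211
    assert_eq!(h as u8, 72);
    assert_eq!(wo as u8, 0x11);

    // The checked conversion reports the problem instead.
    assert_eq!(char_to_byte(h), Some(72));
    assert_eq!(char_to_byte(wo), None);
}
```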

2.2 Common tasks

Get the length in bytes

    fn main() {
        let s1 = "中国-China";
        println!("{:?}", s1.len()); // -> 12
        let s2 = String::from("中国-China");
        println!("{:?}", s2.len()); // -> 12
    }

Count the characters

    fn main() {
        let s1 = "中国-China";
        println!("{:?}", s1.chars().count()); // -> 8
        let s2 = String::from("中国-China");
        println!("{:?}", s2.chars().count()); // -> 8
    }
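
The gap between 12 bytes and 8 characters comes from the per-character UTF-8 width. A small sketch, with `char_widths` as an illustrative helper name:

```rust
/// Returns each character together with its UTF-8 width in bytes.
/// Hypothetical helper for this sketch.
fn char_widths(s: &str) -> Vec<(char, usize)> {
    s.chars().map(|c| (c, c.len_utf8())).collect()
}

fn main() {
    let s = "中国-China";
    let widths = char_widths(s);

    // 8 characters ...
    assert_eq!(widths.len(), 8);
    assert_eq!(widths.len(), s.chars().count());

    // ... whose widths add up to the 12 bytes reported by len().
    let total: usize = widths.iter().map(|&(_, w)| w).sum();
    assert_eq!(total, s.len());

    assert_eq!(widths[0], ('中', 3));
    assert_eq!(widths[2], ('-', 1));
}
```

Note that `len()` is a constant-time lookup, while `chars().count()` has to walk the whole string.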

Take n characters starting at a given character position

    fn substr(s: &str, start: usize, length: usize) -> String {
        s.chars().skip(start).take(length).collect()
    }
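
A usage example for `substr`, together with a borrowing variant that avoids allocating a new String. The variant `substr_slice` is not from the original text; it is one possible sketch based on `char_indices`.

```rust
/// Character-based substring, as defined in the text: returns a new String.
fn substr(s: &str, start: usize, length: usize) -> String {
    s.chars().skip(start).take(length).collect()
}

/// Borrowing variant (hypothetical): returns a slice of `s` instead.
fn substr_slice(s: &str, start: usize, length: usize) -> &str {
    // Byte offset of the `start`-th character, or the end of the string.
    let begin = s.char_indices().nth(start).map_or(s.len(), |(i, _)| i);
    let rest = &s[begin..];
    // Byte offset within `rest` just past `length` characters.
    let end = rest.char_indices().nth(length).map_or(rest.len(), |(i, _)| i);
    &rest[..end]
}

fn main() {
    let s = "中国-China";
    // Positions and lengths are counted in characters, not bytes.
    assert_eq!(substr(s, 1, 3), "国-C");
    assert_eq!(substr_slice(s, 1, 3), "国-C");

    // Out-of-range requests are clamped instead of panicking.
    assert_eq!(substr(s, 6, 10), "na");
    assert_eq!(substr_slice(s, 6, 10), "na");
    assert_eq!(substr_slice(s, 20, 3), "");
}
```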

Get the bytes in a given range as a string slice (returns None if the range does not fall on character boundaries)

    fn main() {
        let s = String::from("中国-China");
        println!("{:?}", s.get(0..=5)); // -> Some("中国")
        println!("{:?}", s.get(0..=4)); // -> None
    }
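
`get` is the non-panicking counterpart of slicing with `&s[..]`, which panics on a bad boundary. A minimal sketch, with `byte_prefix` as an illustrative helper name:

```rust
/// Returns the first `n` bytes of `s` as text, or None if `n` is out of
/// range or falls inside a multi-byte character. Hypothetical helper.
fn byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

fn main() {
    let s = "中国-China";

    assert_eq!(byte_prefix(s, 0), Some(""));
    assert_eq!(byte_prefix(s, 3), Some("中"));
    assert_eq!(byte_prefix(s, 6), Some("中国"));

    // Byte 5 lies inside '国', so there is no valid slice.
    assert_eq!(byte_prefix(s, 5), None);

    // Past the end of the string is also None rather than a panic.
    assert_eq!(byte_prefix(s, 100), None);
}
```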

Check whether a string contains a substring

    fn main() {
        let s1 = "中国-China";
        let s2 = String::from("中国-China");
        assert!(s1.contains("中国"));
        assert!(s2.contains("中国"));
    }

Check whether a string starts with a given string

    fn main() {
        let s1 = "中国-China";
        let s2 = String::from("中国-China");
        assert!(s1.starts_with("中国"));
        assert!(s2.starts_with("中国"));
    }

Check whether a string ends with a given string

    fn main() {
        let s1 = "中国-China";
        let s2 = String::from("中国-China");
        assert!(s1.ends_with("China"));
        assert!(s2.ends_with("China"));
    }

Convert to uppercase

    fn main() {
        let s1 = "中国-China";
        let s2 = String::from("中国-China");
        println!("{:?}", s1.to_uppercase()); // -> "中国-CHINA"
        println!("{:?}", s2.to_uppercase()); // -> "中国-CHINA"
        // Unlike to_uppercase(), make_ascii_uppercase() modifies the String
        // in place and only changes the ASCII letters a-z.
        let mut s3 = String::from("中国-China");
        s3.make_ascii_uppercase();
        println!("{:?}", s3); // -> "中国-CHINA"
    }
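
With the sample text both approaches print the same result, so the difference only shows up with non-ASCII letters. The sketch below uses two sample words chosen for illustration (written with `\u{...}` escapes to keep the source unambiguous); `upper_both` is a made-up helper.

```rust
/// Returns the Unicode-aware and the ASCII-only uppercase form of `s`.
/// Hypothetical helper for this sketch.
fn upper_both(s: &str) -> (String, String) {
    (s.to_uppercase(), s.to_ascii_uppercase())
}

fn main() {
    // "café": to_uppercase also converts the accented letter.
    let (unicode, ascii) = upper_both("caf\u{E9}");
    assert_eq!(unicode, "CAF\u{C9}");
    assert_eq!(ascii, "CAF\u{E9}");

    // "straße": the Unicode uppercase of U+00DF is "SS", so the
    // result is longer than the input.
    let (unicode, ascii) = upper_both("stra\u{DF}e");
    assert_eq!(unicode, "STRASSE");
    assert_eq!(ascii, "STRA\u{DF}E");

    // make_ascii_uppercase does the ASCII-only conversion in place.
    let mut owned = String::from("stra\u{DF}e");
    owned.make_ascii_uppercase();
    assert_eq!(owned, "STRA\u{DF}E");
}
```

Because the Unicode conversion can change the byte length, it has to return a new String, while the ASCII-only version can work in place.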

Convert to lowercase

    fn main() {
        let s1 = "中国-China";
        let s2 = String::from("中国-China");
        println!("{:?}", s1.to_lowercase()); // -> "中国-china"
        println!("{:?}", s2.to_lowercase()); // -> "中国-china"
        // Unlike to_lowercase(), make_ascii_lowercase() modifies the String
        // in place and only changes the ASCII letters A-Z.
        let mut s3 = String::from("中国-China");
        s3.make_ascii_lowercase();
        println!("{:?}", s3); // -> "中国-china"
    }
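
A common use of case conversion is case-insensitive comparison. The following sketch shows an ASCII-only and a Unicode-aware variant; both function names are illustrative, and comparing lowercased strings is only an approximation of full Unicode case folding.

```rust
/// Compares ignoring ASCII case only; does not allocate.
fn eq_ascii_nocase(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}

/// Compares after Unicode lowercasing; allocates two temporary Strings.
fn eq_unicode_nocase(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}

fn main() {
    // Only ASCII letters differ in case: both variants agree.
    assert!(eq_ascii_nocase("中国-China", "中国-CHINA"));
    assert!(eq_unicode_nocase("中国-China", "中国-CHINA"));

    // "École" vs "école": the accented letter is not ASCII.
    assert!(!eq_ascii_nocase("\u{C9}cole", "\u{E9}cole"));
    assert!(eq_unicode_nocase("\u{C9}cole", "\u{E9}cole"));
}
```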

Check whether a string is pure ASCII

    fn main() {
        let s1 = "中国-China";
        let s2 = "China";
        assert!(!s1.is_ascii());
        assert!(s2.is_ascii());
    }

Check whether a byte position is a valid UTF-8 character boundary

    fn main() {
        let s = String::from("中国-China");
        assert!(s.is_char_boundary(0));
        assert!(s.is_char_boundary(12)); // the end of the string counts as a boundary
        assert!(!s.is_char_boundary(2)); // inside '中'
        assert!(s.is_char_boundary(3));
    }
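
A typical use of `is_char_boundary` is truncating a string to a byte budget without splitting a character. The helper `floor_boundary` below is a hypothetical sketch, not a standard library function.

```rust
/// Largest character boundary that is <= `index`, clamped to the length
/// of `s`. Hypothetical helper for this sketch.
fn floor_boundary(s: &str, index: usize) -> usize {
    let mut i = index.min(s.len());
    // Index 0 is always a boundary, so the loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

fn main() {
    let s = "中国-China";

    // Byte 5 is inside '国'; the nearest boundary below it is 3.
    assert_eq!(floor_boundary(s, 5), 3);
    assert_eq!(floor_boundary(s, 6), 6);
    assert_eq!(floor_boundary(s, 100), 12);

    // Truncate to at most 5 bytes without cutting a character in half.
    let cut = &s[..floor_boundary(s, 5)];
    assert_eq!(cut, "中");
}
```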

Replace a substring

    fn main() {
        let s = String::from("中国-China");
        // replace() returns a new String; s itself is unchanged
        println!("{:?}", s.replace("中国", "China")); // -> "China-China"
    }

Split a string

    fn main() {
        let s = String::from("中国-China");
        let result: Vec<&str> = s.split("-").collect();
        println!("{:?}", result); // -> ["中国", "China"]
    }
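
Besides `split`, the standard library has a few related methods that are handy for key/value style text. A short sketch; `split_pair` is an illustrative wrapper around `split_once`, which is assumed to be available (Rust 1.52 or later).

```rust
/// Splits text at the first '-' into a (left, right) pair.
/// Hypothetical helper for this sketch.
fn split_pair(s: &str) -> Option<(&str, &str)> {
    s.split_once('-')
}

fn main() {
    assert_eq!(split_pair("中国-China"), Some(("中国", "China")));
    // No separator: None instead of a one-element list.
    assert_eq!(split_pair("China"), None);
    // Only the first separator splits.
    assert_eq!(split_pair("a-b-c"), Some(("a", "b-c")));

    // split keeps the empty piece between adjacent separators.
    let parts: Vec<&str> = "a--b".split('-').collect();
    assert_eq!(parts, ["a", "", "b"]);

    // splitn limits the number of pieces.
    let limited: Vec<&str> = "a-b-c".splitn(2, '-').collect();
    assert_eq!(limited, ["a", "b-c"]);
}
```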