Regular expressions - Methods of RegExp and String - The Modern JavaScript Tutorial

str.match(regexp)
str.matchAll(regexp)
str.split(regexp|substr, limit)
str.search(regexp)
str.replace(str|reg, str|func)
regexp.exec(str)
regexp.test(str)

This chapter covers the methods that work with regular expressions in depth.

`str.match(regexp)`

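The str.match(regexp) method finds matches for regexp in the string str.

It has three modes:

1. If regexp doesn't have the flag g, it returns the first match as an array with the capturing groups, plus the properties index (position of the match) and input (the source string, equal to str).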

let str = "I love JavaScript";
let result = str.match(/Java(Script)/);
alert( result[0] );     // JavaScript (full match)
alert( result[1] );     // Script (first capturing group)
alert( result.length ); // 2
// Additional information:
alert( result.index );  // 7 (match position)
alert( result.input );  // I love JavaScript (source string)

2. If regexp has the flag g, it returns an array of all matches as strings, without capturing groups or other details.

let str = "I love JavaScript";
let result = str.match(/Java(Script)/g);
alert( result[0] ); // JavaScript
alert( result.length ); // 1

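3. If there are no matches, null is returned, with or without the flag g.

Note that when there is no match we get null, not an empty array. Forgetting this is an easy way to make a mistake: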

let str = "I love JavaScript";
let result = str.match(/HTML/);
alert(result); // null
alert(result.length); // Error: Cannot read property 'length' of null

If we want the result to always be an array, we can add a fallback:

let result = str.match(regexp) || [];
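As a small illustration of the fallback above, the sketch below counts matches without a separate null check. The helper name countMatches is made up for this example, and it assumes the regexp carries the flag g (without it, the array length would reflect capturing groups rather than the number of matches).

```javascript
// Hypothetical helper: counts matches of a global regexp, treating "no match" as 0
function countMatches(str, regexp) {
  // str.match returns null when nothing matches, so fall back to an empty array
  return (str.match(regexp) || []).length;
}

let withMatches = countMatches("I love JavaScript", /a/g);
let withoutMatches = countMatches("I love JavaScript", /HTML/g);

console.log(withMatches);    // 2
console.log(withoutMatches); // 0
```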

`str.matchAll(regexp)`

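The str.matchAll(regexp) method is an "improved" variant of str.match(regexp). It differs from match in 3 ways:

1. It returns an iterable object instead of an array. We can make a regular array from it using Array.from.
2. Every match is returned as an array with capturing groups, in the same format as str.match without the flag g.
3. If there are no matches, it still returns an iterable object, just an empty one, rather than null.

For example: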

let str = '<h1>Hello, world!</h1>';
let regexp = /<(.*?)>/g;
let matchAll = str.matchAll(regexp);
alert(matchAll); // [object RegExp String Iterator], not array, but an iterable
matchAll = Array.from(matchAll); // array now
let firstMatch = matchAll[0];
alert( firstMatch[0] );  // <h1>
alert( firstMatch[1] );  // h1
alert( firstMatch.index );  // 0
alert( firstMatch.input );  // <h1>Hello, world!</h1>

We can also loop over the result of str.matchAll with for..of, in which case Array.from isn't needed.
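A minimal sketch of that for..of variant, reusing the string and regexp from the example above. The tags array is only there to collect the output, and console.log is used instead of alert so that the snippet also runs outside a browser.

```javascript
let str = '<h1>Hello, world!</h1>';
let regexp = /<(.*?)>/g;

let tags = [];
for (let match of str.matchAll(regexp)) {
  // each match is an array with capturing groups, plus index and input
  tags.push(`${match[1]} at ${match.index}`);
}

console.log(tags); // [ 'h1 at 0', '/h1 at 17' ]
```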

`str.split(regexp|substr, limit)`

Splits str using the regexp (or the substring substr) as a delimiter.

With a string delimiter:

alert('12-34-56'.split('-')) // array of [12, 34, 56]

With a regexp delimiter:

alert('12, 34, 56'.split(/,\s*/)) // array of [12, 34, 56]
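The limit argument in the signature is not shown in the examples above. As a short sketch: it caps the number of items in the resulting array, and the rest of the string is dropped. The variable names here are illustrative only.

```javascript
// limit = 2: stop after two items
let firstTwo = '12-34-56'.split('-', 2);

// limit works the same way with a regexp delimiter
let firstOnly = '12, 34, 56'.split(/,\s*/, 1);

console.log(firstTwo);  // [ '12', '34' ]
console.log(firstOnly); // [ '12' ]
```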

`str.search(regexp)`

str.search(regexp) returns the position of the first match, or -1 if none is found:

let str = "A drop of ink may make a million think";
alert( str.search( /ink/i ) ); // 10 (first match position)

Note: **search** only finds the first match.

If we need the positions of further matches, we should use other means, such as str.matchAll(regexp), which finds them all.
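One possible way to get every position, sketched with str.matchAll on the same sentence. Note the added flag g, which matchAll requires; the positions variable is just a name chosen for this example.

```javascript
let str = "A drop of ink may make a million think";

// Array.from accepts an iterable plus a mapping function,
// so we can keep only the index of every match
let positions = Array.from(str.matchAll(/ink/gi), match => match.index);

console.log(positions); // [ 10, 35 ]
```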

`str.replace(str|reg, str|func)`

The str.replace method is a powerful tool for searching and replacing in strings.

A simple example:

// replace a dash by a colon
alert('12-34-56'.replace("-", ":")) // 12:34-56

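There's a pitfall, though.

When the first argument of replace is a string, only the first match is replaced.

In the example above, only the first "-" is replaced by ":". To replace all of them, we should use the regexp /-/g instead of the string "-":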

// replace all dashes by a colon
alert( '12-34-56'.replace( /-/g, ":" ) )  // 12:34:56

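The second argument is the replacement string. We can use special character combinations in it to insert fragments of the match:

$& - the whole match
$` - the part of the string before the match
$' - the part of the string after the match
$n - the contents of the n-th capturing group (n is a 1-2 digit number)
$<name> - the contents of the named capturing group name
$$ - the character $

For instance: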

let str = "John Smith";
// swap first and last name
alert(str.replace(/(john) (smith)/i, '$2, $1')) // Smith, John
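To see a few more of the special replacement patterns in action, here is a small sketch. It relies on the standard patterns $& (whole match), $` (text before the match), $' (text after the match), $$ (a literal $) and $<name> (named group); the sample strings and variable names are made up for illustration.

```javascript
let str = "John Smith";

// $& inserts the whole match
let withTitle = str.replace(/john/i, 'Mr. $&');

// $` and $' insert the text before and after the match (here, the space)
let around = str.replace(/ /, "[$`|$']");

// $$ inserts a literal dollar sign, followed here by the match itself
let price = "Price: 10".replace(/\d+/, '$$$&');

// $<name> inserts a named capturing group
let swapped = str.replace(/(?<first>\w+) (?<last>\w+)/, '$<last>, $<first>');

console.log(withTitle); // Mr. John Smith
console.log(around);    // John[John|Smith]Smith
console.log(price);     // Price: $10
console.log(swapped);   // Smith, John
```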

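The second argument can also be a function, for situations that require "smart" replacements.

It is called for each match, and the returned value is inserted into the result as the replacement.

The function is called with the arguments func(str, p1, p2, ..., pn, offset, input, groups):

str - the match,
p1, p2, ..., pn - contents of the capturing groups (the parts in parentheses),
offset - position of the match in the string,
input - the source string,
groups - an object with named groups (passed only if the regexp has named groups).

If there are no capturing groups in the regexp, the function receives only 3 arguments: func(str, offset, input).

For example, let's uppercase all matches: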

let str = "html and css";
let result = str.replace(/html|css/gi, str => str.toUpperCase());
alert(result); // HTML and CSS

Replace each match by its position in the string:

alert("Ho-Ho-ho".replace(/ho/gi, (match, offset) => offset)); // 0-3-6

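In the example below the regexp has two capturing groups, so the replacement function is called with 5 arguments: the full match, the contents of the first group, the contents of the second group, and then (not used in the example) the match position and the source string: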

let str = "John Smith";
let result = str.replace(/(\w+) (\w+)/, (match, name, surname) => `${surname}, ${name}`);
alert(result); // Smith, John
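To check the argument list for ourselves, the sketch below records everything the callback receives. The seenArgs variable is introduced only for this inspection and is not part of the original example.

```javascript
let str = "John Smith";
let seenArgs;

let result = str.replace(/(\w+) (\w+)/, (...args) => {
  seenArgs = args; // keep the arguments for inspection
  return `${args[2]}, ${args[1]}`;
});

console.log(result);   // Smith, John
console.log(seenArgs); // [ 'John Smith', 'John', 'Smith', 0, 'John Smith' ]
```

There is no groups object at the end here, because the regexp has no named groups.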

If there are many groups, it's convenient to collect and access them with rest parameters:

let str = "John Smith";
let result = str.replace(/(\w+) (\w+)/, (...match) => `${match[2]}, ${match[1]}`);
alert(result); // Smith, John

If we use named groups, the groups object is always the last argument, so we can obtain it like this:

let str = "John Smith";
let result = str.replace(/(?<name>\w+) (?<surname>\w+)/, (...match) => {
  let groups = match.pop();
  return `${groups.surname}, ${groups.name}`;
});
alert(result); // Smith, John

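Using a function gives us the ultimate replacement power, because it gets all the information about the match, has access to outer variables and can do almost anything.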
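As one illustration of a replacement function that uses outer variables, the sketch below expands abbreviations from a lookup table and counts the replacements. Both fullNames and replacedCount are hypothetical names introduced for this example.

```javascript
// Hypothetical lookup table and counter, both defined outside the callback
const fullNames = {
  html: "HyperText Markup Language",
  css: "Cascading Style Sheets"
};
let replacedCount = 0;

let result = "html and css".replace(/html|css/gi, match => {
  replacedCount++;                       // update an outer variable
  return fullNames[match.toLowerCase()]; // read from an outer object
});

console.log(result);        // HyperText Markup Language and Cascading Style Sheets
console.log(replacedCount); // 2
```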

`regexp.exec(str)`

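The regexp.exec(str) method returns a match for regexp in the string str. Unlike the previous methods, it is called on a regexp, not on a string.

It behaves differently depending on whether the regexp has the flag g.

If there's no g, regexp.exec(str) returns the first match, exactly as str.match(regexp) does.

If there's the flag g:

A call to regexp.exec(str) returns the first match and saves the position immediately after it in the property regexp.lastIndex.
The next call to regexp.exec(str) starts the search from the position stored in regexp.lastIndex.
...
If there are no more matches, regexp.exec returns null and resets regexp.lastIndex to 0.

So repeated calls on the same string return all matches one after another, using the property regexp.lastIndex to keep track of the current search position.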
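The steps above can be traced call by call. This is a minimal sketch; the sample string and the variable names are chosen for illustration.

```javascript
let str = 'More about JavaScript at https://javascript.info';
let regexp = /javascript/ig;

let first = regexp.exec(str);
let afterFirst = regexp.lastIndex;   // position right after the first match

let second = regexp.exec(str);       // search continues from afterFirst
let afterSecond = regexp.lastIndex;

let third = regexp.exec(str);        // no more matches
let afterThird = regexp.lastIndex;   // reset to 0

console.log(first.index, afterFirst);   // 11 21
console.log(second.index, afterSecond); // 33 43
console.log(third, afterThird);         // null 0
```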

Before the method str.matchAll was added to JavaScript, calls to regexp.exec in a loop were the usual way to get all matches:

let str = 'More about JavaScript at https://javascript.info';
let regexp = /javascript/ig;
let result;
while (result = regexp.exec(str)) {
  alert( `Found ${result[0]} at position ${result.index}` );
  // Found JavaScript at position 11, then
  // Found javascript at position 33
}

This still works, although the newer str.matchAll method is usually more convenient.
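For comparison, here is a sketch of the same search written with str.matchAll. The found array is only used to collect the messages; no manual lastIndex bookkeeping or assignment inside a loop condition is needed.

```javascript
let str = 'More about JavaScript at https://javascript.info';

let found = [];
for (let result of str.matchAll(/javascript/ig)) {
  found.push(`Found ${result[0]} at position ${result.index}`);
}

console.log(found);
// [ 'Found JavaScript at position 11', 'Found javascript at position 33' ]
```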

We can also use regexp.exec to search from a given position by setting lastIndex manually.

For instance:

let str = 'Hello, world!';
let regexp = /\w+/g; // without flag "g", lastIndex property is ignored
regexp.lastIndex = 5; // search from 5th position (from the comma)
alert( regexp.exec(str) ); // world

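If the regexp has the flag y, the search is performed exactly at the position regexp.lastIndex, not any further.

Let's replace the flag g with y in the example above. There will be no match, because there's no word at position 5: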

let str = 'Hello, world!';
let regexp = /\w+/y;
regexp.lastIndex = 5; // search exactly at position 5
alert( regexp.exec(str) ); // null

That's convenient when we need to "read" something from a string with a regexp at an exact position, not somewhere further.
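A short sketch contrasting a failed and a successful sticky match on the same string. Position 7 is where the word "world" starts; the variable names are illustrative.

```javascript
let str = 'Hello, world!';
let regexp = /\w+/y;

regexp.lastIndex = 5;
let atComma = regexp.exec(str);    // null: position 5 holds the comma
let afterFailure = regexp.lastIndex; // a failed match resets lastIndex to 0

regexp.lastIndex = 7;
let atWord = regexp.exec(str);     // matches exactly at position 7

console.log(atComma, afterFailure);        // null 0
console.log(atWord[0], regexp.lastIndex);  // world 12
```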

`regexp.test(str)`

The regexp.test(str) method looks for a match and returns true if one exists, false otherwise.

An example with a match:

let str = "I love JavaScript";
// these two tests do the same
alert( /love/i.test(str) ); // true
alert( str.search(/love/i) != -1 ); // true

An example without a match:

let str = "Bla-bla-bla";
alert( /love/i.test(str) ); // false
alert( str.search(/love/i) != -1 ); // false

If the regexp has the flag g, regexp.test reads and updates the regexp.lastIndex property, just like regexp.exec.

So we can use it to search from a given position:

let regexp = /love/gi;
let str = "I love JavaScript";
// start the search from position 10:
regexp.lastIndex = 10;
alert( regexp.test(str) ); // false (no match)

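Note that applying the same global regexp to different strings may lead to a wrong result, because a call to regexp.test updates regexp.lastIndex, so the search in the next string may start from a non-zero position.

In the example below we call regexp.test twice on the same text, and the second call fails: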

let regexp = /javascript/g;  // (regexp just created: regexp.lastIndex=0)
alert( regexp.test("javascript") ); // true (regexp.lastIndex=10 now)
alert( regexp.test("javascript") ); // false

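That's because the second test starts searching at position 10.

To work around that, we can set regexp.lastIndex = 0 before each search, or switch to the string methods str.match/search/..., which avoids the lastIndex problem.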
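Both workarounds can be sketched as follows. The variable names are made up for this example, and str.search stands in for the string methods mentioned above.

```javascript
let regexp = /javascript/g;

let first = regexp.test("javascript");  // true, lastIndex becomes 10

regexp.lastIndex = 0;                   // reset before testing another string
let second = regexp.test("javascript"); // true again, thanks to the reset

// Alternative: str.search always starts from the beginning of the string,
// even though regexp.lastIndex is 10 at this point
let viaSearch = "javascript".search(regexp) != -1;

console.log(first, second, viaSearch); // true true true
```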

📄 Document info

🕘 Last updated: 2020/04/21
🔗 Original article: http://javascript.info/regexp-methods