Specify Pattern Using RegEx
- MetaCharacters
Python RegEx
- re.findall()
  - Example 1: re.findall()
- re.split()
  - Example 2: re.split()
- re.sub()
  - Example 3: re.sub()
- re.subn()
  - Example 4: re.subn()
- re.search()
  - Example 5: re.search()
- Match object

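A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. For example,
^a...s$
The above code defines a RegEx pattern. The pattern is: any five-letter string starting with a and ending with s.
A pattern defined using RegEx can be used to match against a string.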

Expression	String	Matched?
^a...s$	abs	No match
	alias	Match
	abyss	Match
	Alias	No match
	An abacus	No match

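Python has a module named re to work with RegEx. Here's an example:

import re

pattern = r'^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)

if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")

Here, we used the re.match() function to check whether pattern matches at the beginning of test_string. The method returns a match object if the search is successful. If not, it returns None.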
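To connect the table of sample strings with the re.match() call, here is a minimal sketch that runs the same pattern against every string from the table. The names candidates and results are chosen for illustration only.

```python
import re

pattern = r'^a...s$'  # 'a', any three characters, then 's'
candidates = ['abs', 'alias', 'abyss', 'Alias', 'An abacus']

# re.match() returns a match object or None; bool() turns that into True/False
results = {text: bool(re.match(pattern, text)) for text in candidates}

for text, matched in results.items():
    print(f'{text!r}: {"match" if matched else "no match"}')
```

Only alias and abyss are reported as matches, which agrees with the table.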

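There are several other functions defined in the re module to work with RegEx. Before we explore them, let's learn about regular expressions themselves.
If you already know the basics of RegEx, skip ahead to the Python RegEx section.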

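Specify Pattern Using RegEx

To specify regular expressions, metacharacters are used. In the above example, ^ and $ are metacharacters.

MetaCharacters

Metacharacters are characters that are interpreted in a special way by a RegEx engine. Here's a list of metacharacters:
[] . ^ $ * + ? {} () \ |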

[] - Square brackets
Square brackets specify a set of characters you wish to match.

Expression	String	Matched?
[abc]	a	1 match
	ac	2 matches
	Hey Jude	No match
	abc de ca	5 matches

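Here, [abc] will match if the string you are trying to match contains any of a, b or c.
You can also specify a range of characters using - inside square brackets.

[a-e] is the same as [abcde].
[1-4] is the same as [1234].
[0-39] is the same as [01239].

You can complement (invert) the character set by using the caret ^ symbol at the start of a square bracket.

[^abc] means any character except a or b or c.
[^0-9] means any non-digit character.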
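The character-set rules above can be checked with re.findall(), which is covered later in this tutorial. This is a small sketch; the sample strings '0123456789' and 'a1b22c' are made up for illustration.

```python
import re

# Count how many characters of each string belong to the set [abc]
set_counts = {text: len(re.findall(r'[abc]', text))
              for text in ['a', 'ac', 'Hey Jude', 'abc de ca']}
print(set_counts)

# [0-39] is the range 0-3 plus the single character 9
range_hits = re.findall(r'[0-39]', '0123456789')
print(range_hits)

# A leading ^ inverts the set: everything except digits
non_digits = re.findall(r'[^0-9]', 'a1b22c')
print(non_digits)
```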

. - Period
A period matches any single character (except newline '\n').

Expression	String	Matched?
..	a	No match
	ac	1 match
	acd	1 match
	acde	2 matches (contains 4 characters)
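The following sketch reproduces the table for the pattern .. and also shows the newline exception. The re.DOTALL flag used at the end is not mentioned in this tutorial; it is a standard flag of the re module that lets the period match a newline as well.

```python
import re

# Two periods consume two characters at a time
pairs = {text: re.findall(r'..', text) for text in ['a', 'ac', 'acd', 'acde']}
print(pairs)

# By default the period skips the newline character
no_newline = re.findall(r'.', 'a\nb')
print(no_newline)

# With re.DOTALL the newline is matched too
with_newline = re.findall(r'.', 'a\nb', flags=re.DOTALL)
print(with_newline)
```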

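^ - Caret
The caret symbol ^ is used to check if a string starts with a certain character.

Expression	String	Matched?
^a	a	1 match
	abc	1 match
	bac	No match
^ab	abc	1 match
	acb	No match (starts with a but is not followed by b)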

$ - Dollar
The dollar symbol $ is used to check if a string ends with a certain character.

Expression	String	Matched?
a$	a	1 match
	formula	1 match
	cab	No match
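As a quick check of both anchors, the sketch below tests ^ab and a$ against strings from the tables. The helper matches() is a hypothetical convenience function written for this example, not part of the re module.

```python
import re

def matches(pattern, text):
    """Return True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

# ^ab only succeeds when the string begins with 'ab'
starts = {text: matches(r'^ab', text) for text in ['abc', 'acb', 'cab']}
print(starts)

# a$ only succeeds when the string ends with 'a'
ends = {text: matches(r'a$', text) for text in ['a', 'formula', 'cab']}
print(ends)
```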

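* - Star
The star symbol * matches zero or more occurrences of the pattern left to it.

Expression	String	Matched?
ma*n	mn	1 match
	man	1 match
	maaan	1 match
	main	No match (a is not followed by n)
	woman	1 match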

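+ - Plus
The plus symbol + matches one or more occurrences of the pattern left to it.

Expression	String	Matched?
ma+n	mn	No match (no a character)
	man	1 match
	maaan	1 match
	main	No match (a is not followed by n)
	woman	1 match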

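? - Question Mark
The question mark symbol ? matches zero or one occurrence of the pattern left to it.

Expression	String	Matched?
ma?n	mn	1 match
	man	1 match
	maaan	No match (more than one a character)
	main	No match (a is not followed by n)
	woman	1 match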
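The three quantifier tables use the same five words, so they can be compared side by side. This is a minimal sketch; the names words and found are chosen for illustration.

```python
import re

words = ['mn', 'man', 'maaan', 'main', 'woman']

# For each quantifier, record whether the pattern is found in each word
found = {
    pattern: [re.search(pattern, word) is not None for word in words]
    for pattern in (r'ma*n', r'ma+n', r'ma?n')
}

for pattern, flags in found.items():
    print(pattern, flags)
```

The only differences are mn, which needs zero a characters (rejected by +), and maaan, which needs three (rejected by ?).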

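{} - Braces
Consider this code: {n,m}. This means at least n, and at most m repetitions of the pattern left to it.

Expression	String	Matched?
a{2,3}	abc dat	No match
	abc daat	1 match (aa in daat)
	aabc daaat	2 matches (aa in aabc and aaa in daaat)
	aabc daaaat	2 matches (aa in aabc and aaa in daaaat)

Let's try one more example. This RegEx [0-9]{2,4} matches at least 2 digits but not more than 4 digits.

Expression	String	Matched?
[0-9]{2,4}	ab123csde	1 match (123 in ab123csde)
	12 and 345673	3 matches (12, 3456, 73)
	1 and 2	No match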
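The brace tables can be reproduced with re.findall(). This sketch shows that the engine takes as many repetitions as the upper bound allows before moving on, which is why 345673 splits into 3456 and 73.

```python
import re

# a{2,3}: runs of two or three a characters
a_runs = re.findall(r'a{2,3}', 'aabc daaaat')
print(a_runs)

# [0-9]{2,4}: runs of two to four digits
digit_runs = {text: re.findall(r'[0-9]{2,4}', text)
              for text in ['ab123csde', '12 and 345673', '1 and 2']}
print(digit_runs)
```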

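| - Alternation
The vertical bar | is used for alternation (the or operator).

Expression	String	Matched?
a|b	cde	No match
	ade	1 match (a in ade)
	acdbea	3 matches (a, b and a in acdbea)

Here, a|b matches any string that contains either a or b.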

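() - Group
Parentheses () are used to group sub-patterns. For example, (a|b|c)xz matches any string that contains a, b or c followed by xz.

Expression	String	Matched?
(a|b|c)xz	ab xz	No match
	abxz	1 match (bxz in abxz)
	axz cabxz	2 matches (axz and bxz in cabxz)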
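Here is a short sketch of alternation and grouping in Python. It uses re.finditer(), a function of the re module that this tutorial does not otherwise cover, because re.findall() returns only the captured group when a pattern contains parentheses.

```python
import re

# Alternation: every a or b in the string
alternation = re.findall(r'a|b', 'acdbea')
print(alternation)

texts = ['ab xz', 'abxz', 'axz cabxz']
# finditer() yields match objects, so group() gives the whole matched text
grouped = {t: [m.group() for m in re.finditer(r'(a|b|c)xz', t)] for t in texts}
print(grouped)

# With a capturing group, findall() returns the group contents instead
captured = re.findall(r'(a|b|c)xz', 'axz cabxz')
print(captured)
```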

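\ - Backslash
Backslash \ is used to escape various characters, including all metacharacters. For example,
\$a matches if a string contains $ followed by a. Here, $ is not interpreted by the RegEx engine in a special way.
If you are unsure whether a punctuation character has a special meaning, you can put \ in front of it. This makes sure the character is not treated in a special way. A backslash before a letter or digit is different: it may form a special sequence such as \d.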
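The effect of escaping can be seen by comparing an escaped and an unescaped metacharacter. The sketch below also uses re.escape(), a function of the re module not covered in this tutorial, which adds the needed backslashes for you. The sample text is invented for illustration.

```python
import re

text = 'price: $a, sum: 1+1=2'

# Escaped: \$ means a literal dollar sign
literal_dollar = re.findall(r'\$a', text)

# Unescaped: + is a quantifier, so '1+1' means "one or more 1s, then a 1"
unescaped = re.findall(r'1+1', text)

# re.escape() adds the backslashes needed to match a string literally
escaped = re.findall(re.escape('1+1'), text)

print(literal_dollar)
print(unescaped)
print(escaped)
```

The unescaped pattern finds nothing because the text never contains two consecutive 1 characters.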

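Special Sequences
Special sequences make commonly used patterns easier to write. Here's a list of special sequences:

\A - Matches if the specified characters are at the start of a string.

Expression	String	Matched?
\Athe	the sun	Match
	In the sun	No match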

\b - Matches if the specified characters are at the beginning or end of a word.

Expression	String	Matched?
\bfoo	football	Match
	a football	Match
	afootball	No match
foo\b	the foo	Match
	the afoo test	Match
	the afootest	No match

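\B - Opposite of \b. Matches if the specified characters are not at the beginning or end of a word.

Expression	String	Matched?
\Bfoo	football	No match
	a football	No match
	afootball	Match
foo\B	the foo	No match
	the afoo test	No match
	the afootest	Match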
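The two boundary tables can be verified with the sketch below. The helper found() is a hypothetical convenience function for this example. Note that the patterns are written as raw strings: in an ordinary Python string, '\b' is the backspace character rather than a word boundary.

```python
import re

def found(pattern, text):
    """Return True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

# \b: foo must start (or end) at a word boundary
boundary_start = [found(r'\bfoo', t) for t in ['football', 'a football', 'afootball']]
boundary_end = [found(r'foo\b', t) for t in ['the foo', 'the afoo test', 'the afootest']]

# \B: foo must NOT start (or end) at a word boundary
inner_start = [found(r'\Bfoo', t) for t in ['football', 'a football', 'afootball']]
inner_end = [found(r'foo\B', t) for t in ['the foo', 'the afoo test', 'the afootest']]

# Without the r prefix, '\b' is a backspace character, so nothing is found
without_raw = found('\bfoo', 'football')

print(boundary_start, boundary_end)
print(inner_start, inner_end)
print(without_raw)
```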

\d - Matches any decimal digit. Equivalent to [0-9].

Expression	String	Matched?
\d	12abc3	3 matches (1, 2 and 3)
	Python	No match

\D - Matches any character that is not a decimal digit. Equivalent to [^0-9].

Expression	String	Matched?
\D	1ab34"50	3 matches (a, b and ")
	1345	No match

\s - Matches where a string contains any whitespace character. Equivalent to [ \t\n\r\f\v].

Expression	String	Matched?
\s	Python RegEx	1 match
	PythonRegEx	No match

\S - Matches where a string contains any non-whitespace character. Equivalent to [^ \t\n\r\f\v].

Expression	String	Matched?
\S	a b	2 matches (a and b)
	   	No match

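\w - Matches any alphanumeric character (digits and letters). For ASCII text, equivalent to [a-zA-Z0-9_]. By the way, the underscore _ is also considered an alphanumeric character.

Expression	String	Matched?
\w	12&": ;c	3 matches (1, 2 and c)
	%"> !	No match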

\W - Matches any non-alphanumeric character. For ASCII text, equivalent to [^a-zA-Z0-9_].

Expression	String	Matched?
\W	1a2%c	1 match (%)
	Python	No match
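The six character-class sequences can be checked against the first row of each table with one loop. This is a small sketch; the names samples and class_matches are chosen for illustration.

```python
import re

# (pattern, string) pairs taken from the tables for \d, \D, \s, \S, \w and \W
samples = [
    (r'\d', '12abc3'),
    (r'\D', '1ab34"50'),
    (r'\s', 'Python RegEx'),
    (r'\S', 'a b'),
    (r'\w', '12&": ;c'),
    (r'\W', '1a2%c'),
]

class_matches = {pattern: re.findall(pattern, text) for pattern, text in samples}
for pattern, hits in class_matches.items():
    print(pattern, hits)

# The underscore counts as a word character
underscore_is_word = re.findall(r'\w', '_')
print(underscore_is_word)
```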

\Z - Matches if the specified characters are at the end of a string.

Expression	String	Matched?
Python\Z	I like Python	1 match
	I like Python Programming	No match
	Python is fun.	No match
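The sketch below checks the \A and \Z tables and adds one comparison that the tutorial does not make: in Python, $ also matches just before a single trailing newline, while \Z matches only at the very end of the string.

```python
import re

starts_with_the = [re.search(r'\Athe', t) is not None
                   for t in ['the sun', 'In the sun']]

ends_with_python = [re.search(r'Python\Z', t) is not None
                    for t in ['I like Python',
                              'I like Python Programming',
                              'Python is fun.']]

# $ tolerates one trailing newline; \Z does not
dollar_before_newline = re.search(r'Python$', 'I like Python\n') is not None
z_before_newline = re.search(r'Python\Z', 'I like Python\n') is not None

print(starts_with_the, ends_with_python)
print(dollar_before_newline, z_before_newline)
```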

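Tip: To build and test regular expressions, you can use a RegEx tester tool such as regex101. This tool not only helps you create regular expressions, it also helps you learn them.
Now that you understand the basics of RegEx, let's discuss how to use RegEx in your Python code.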

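Python RegEx

Python has a module named re to work with regular expressions. To use it, we need to import the module.
import re
The module defines several functions and constants to work with RegEx.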

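re.findall()

The re.findall() method returns a list of strings containing all matches.

Example 1: re.findall()

# Program to extract numbers from a string

import re

string = 'hello 12 hi 89. Howdy 34'
pattern = r'\d+'  # raw string, so \d reaches the RegEx engine unchanged

result = re.findall(pattern, string)
print(result)

# Output: ['12', '89', '34']

If the pattern is not found, re.findall() returns an empty list.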

re.split()

The re.split() method splits the string where there is a match and returns a list of the resulting substrings.

Example 2: re.split()

import re

string = 'Twelve:12 Eighty nine:89.'
pattern = r'\d+'

result = re.split(pattern, string)
print(result)

# Output: ['Twelve:', ' Eighty nine:', '.']

If the pattern is not found, re.split() returns a list containing the original string.

You can pass the maxsplit argument to the re.split() method. It is the maximum number of splits that will occur.

import re

string = 'Twelve:12 Eighty nine:89 Nine:9.'
pattern = r'\d+'

# maxsplit = 1
# split only at the first occurrence
result = re.split(pattern, string, maxsplit=1)
print(result)

# Output: ['Twelve:', ' Eighty nine:89 Nine:9.']

By the way, the default value of maxsplit is 0, meaning all possible splits.

re.sub()

The syntax of re.sub() is:
re.sub(pattern, replace, string)
The method returns a string where matched occurrences are replaced with the content of the replace variable.

Example 3: re.sub()

# Program to remove all whitespaces

import re

# multiline string
string = 'abc 12\
de 23 \n f45 6'

# matches all whitespace characters
pattern = r'\s+'

# empty string
replace = ''

new_string = re.sub(pattern, replace, string)
print(new_string)

# Output: abc12de23f456

If the pattern is not found, re.sub() returns the original string.

You can pass count as a fourth parameter to the re.sub() method. If omitted, it defaults to 0, which replaces all occurrences.

import re

# multiline string
string = 'abc 12\
de 23 \n f45 6'

# matches all whitespace characters
pattern = r'\s+'
replace = ''

# replace only the first occurrence
new_string = re.sub(pattern, replace, string, count=1)
print(new_string)

# Output:
# abc12de 23
#  f45 6

re.subn()

re.subn() is similar to re.sub(), except that it returns a tuple of 2 items containing the new string and the number of substitutions made.

Example 4: re.subn()

# Program to remove all whitespaces

import re

# multiline string
string = 'abc 12\
de 23 \n f45 6'

# matches all whitespace characters
pattern = r'\s+'

# empty string
replace = ''

new_string = re.subn(pattern, replace, string)
print(new_string)

# Output: ('abc12de23f456', 4)

re.search()

The re.search() method takes two arguments: a pattern and a string. The method looks for the first location where the RegEx pattern produces a match with the string.
If the search is successful, re.search() returns a match object; if not, it returns None.
match = re.search(pattern, str)

Example 5: re.search()

import re

string = "Python is fun"

# check if 'Python' is at the beginning
match = re.search(r'\APython', string)

if match:
    print("pattern found inside the string")
else:
    print("pattern not found")

# Output: pattern found inside the string

Here, match contains a match object.
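Because re.search() scans the whole string while re.match() only tries the beginning, the two can give different results for the same pattern. The sketch below illustrates this with an invented sample string.

```python
import re

text = 'hello 12'

at_start = re.match(r'\d+', text)    # None: the text does not start with a digit
anywhere = re.search(r'\d+', text)   # match object for '12'

print(at_start)
print(anywhere.group())
print(anywhere.span())
```

Here re.search() finds 12 at positions 6 to 8, while re.match() returns None.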

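Match object

You can get the methods and attributes of a match object using the dir() function.
Some of the commonly used methods and attributes of match objects are: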
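As a quick illustration of the dir() approach, the sketch below lists the public names of a match object. The sample string and the name public_names are chosen for this example; the exact list depends on your Python version.

```python
import re

match = re.search(r'\d+', 'hello 12')

# dir() lists every attribute; keep only the public names (no leading underscore)
public_names = [name for name in dir(match) if not name.startswith('_')]
print(public_names)
```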

match.group()

The group() method returns the part of the string where there is a match.

Example 6: Match object

import re

string = '39801 356, 2102 1111'

# Three digit number followed by space followed by two digit number
pattern = r'(\d{3}) (\d{2})'

# match variable contains a Match object.
match = re.search(pattern, string)

if match:
    print(match.group())
else:
    print("pattern not found")

# Output: 801 35

Here, the match variable contains a match object.
Our pattern (\d{3}) (\d{2}) has two subgroups, (\d{3}) and (\d{2}). You can get the part of the string matched by each of these parenthesized subgroups. Here's how:
>>> match.group(1)
'801'
>>> match.group(2)
'35'
>>> match.group(1, 2)
('801', '35')
>>> match.groups()
('801', '35')

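match.start(), match.end() and match.span()

The start() function returns the index of the start of the matched substring. Similarly, end() returns the end index of the matched substring, which is one past its last character.
>>> match.start()
2
>>> match.end()
8
The span() function returns a tuple containing the start and end index of the matched part.
>>> match.span()
(2, 8)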
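The indices returned by span() can be used directly for slicing, as this sketch shows with the string from Example 6. It also passes a group number to span(), which these methods accept as an optional argument.

```python
import re

string = '39801 356, 2102 1111'
match = re.search(r'(\d{3}) (\d{2})', string)

start, end = match.span()
# Slicing with the span reproduces match.group()
whole = string[start:end]

# start(), end() and span() also accept a group number
first_group_span = match.span(1)
second_group_span = match.span(2)

print(whole)
print(first_group_span)
print(second_group_span)
```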

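match.re and match.string

The re attribute of a match object returns a regular expression object. Similarly, the string attribute returns the passed string.
>>> match.re
re.compile('(\\d{3}) (\\d{2})')
>>> match.string
'39801 356, 2102 1111'

We have covered all commonly used methods defined in the re module. If you want to learn more, see the documentation for the Python 3 re module.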

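Using r prefix before RegEx

When the r or R prefix is used before a regular expression, it means raw string. For example, '\n' is a new line whereas r'\n' means two characters: a backslash \ followed by n.
Backslash \ is used to escape various characters, including all metacharacters. However, using the r prefix makes \ be treated as a normal character.

Example 7: Raw string using r prefix

import re

string = '\n and \r are escape sequences.'

result = re.findall(r'[\n\r]', string)
print(result)

# Output: ['\n', '\r']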
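The difference between a normal and a raw string literal is easiest to see by comparing lengths. This is a minimal sketch; the variable names are chosen for illustration.

```python
normal = '\n'     # one character: a newline
raw = r'\n'       # two characters: a backslash and the letter n

print(len(normal), len(raw))

# A raw pattern is the same string as one with the backslash doubled by hand
same_pattern = (r'\d+' == '\\d+')
print(same_pattern)
```

This is why patterns such as r'\d+' are usually written as raw strings: the backslash reaches the RegEx engine without needing to be doubled.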
