10 Regex Searching - Python Regex Handbook

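The following functions perform regex searches:

re.findall: finds all non-overlapping substrings of the string that match the regular expression and returns them as a list; if nothing matches, it returns an empty list.
re.finditer: finds all non-overlapping substrings of the string that match the regular expression and returns them as an iterator.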
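
One detail the descriptions above do not spell out: when the pattern contains capture groups, re.findall returns the group contents rather than the whole match (tuples when there are several groups). The following is a minimal sketch; the sample string and variable names are illustrative assumptions, not part of the handbook's examples.

```python
import re

s = "taobao 123 google 456"  # illustrative sample text

# No capture groups: each list item is the whole matched substring.
whole = re.findall(r"\d+", s)

# Two capture groups: each list item is a tuple with one entry per group.
pairs = re.findall(r"(\w+) (\d+)", s)

# Nothing matches: the result is an empty list, not None.
nothing = re.findall(r"\d+", "no digits here")

print(whole)    # ['123', '456']
print(pairs)    # [('taobao', '123'), ('google', '456')]
print(nothing)  # []
```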

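Having covered regex matching, you should find regex searching straightforward, so we go straight to a few examples.
Example 1: find all the numbers in this text: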

import re

s = ' taobao 123 google 456'
re.findall(r"\d+", s)

Result:

['123', '456']

Using finditer, which returns an iterator:

it = re.finditer(r"\d+", s)
for match_obj in it:
    print(match_obj.group(), end=" ")

Result:

123 456

Example 2
Suppose we want to find all the four-letter words in the following English sentence:

s = "Clothes are so significant in our daily life that we can't live without them"
re.findall(r"\b[a-z]{4}\b", s, re.I)

Result:

['life', 'that', 'live', 'them']

As you can see, this finds exactly the words we wanted.

\b denotes a word boundary; see the regex rule reference table for details.
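
To see what the word boundary contributes, here is a small sketch comparing the same pattern with and without \b. The sample text is a hypothetical two-word string chosen for illustration.

```python
import re

text = "daily life"  # hypothetical sample text

# Without \b, any run of four letters matches, even inside a longer word.
without_boundary = re.findall(r"[a-z]{4}", text)

# With \b on both sides, only words that are exactly four letters long match.
with_boundary = re.findall(r"\b[a-z]{4}\b", text)

print(without_boundary)  # ['dail', 'life']
print(with_boundary)     # ['life']
```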

You can also use re.finditer, which returns an iterator:

for match_obj in re.finditer(r"\b[a-z]{4}\b", s, re.I):
    print(match_obj.group(), end=" ")

Result:

life that live them

Note: every object yielded by the iterator that re.finditer returns is an re.Match object.
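
Because each item is an re.Match object, it carries position information in addition to the matched text. A minimal sketch, assuming a sample string similar to the one in Example 1 (without the leading space); the variable names are illustrative.

```python
import re

s = "taobao 123 google 456"  # illustrative sample text

spans = []
for m in re.finditer(r"\d+", s):
    # group() is the matched text; start() and end() are its indices in s.
    print(m.group(), m.start(), m.end())
    spans.append(m.span())  # span() returns the (start, end) pair

print(spans)  # [(7, 10), (18, 21)]
```

This is the main reason to prefer finditer over findall when you need to know where each match occurred, not just what it was.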
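Example 3
Extract all the words from the text below (a phrase enclosed in double quotes counts as a single word, for example "the little cat"; the final result does not need to be deduplicated): we found "the little cat" is in the hat, we like "the little cat"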

s = 'we found "the little cat" is in the hat, we like "the little cat"'
print(re.findall(r'\w+|".*?"', s))

Result:

['we', 'found', '"the little cat"', 'is', 'in', 'the', 'hat', 'we', 'like', '"the little cat"']
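
The ? after .* in the quoted branch makes the match lazy, which is what keeps each quoted phrase separate. The sketch below, using the same sentence, contrasts the greedy and lazy forms of just that branch; the variable names are illustrative.

```python
import re

s = 'we found "the little cat" is in the hat, we like "the little cat"'

# Greedy: .* runs from the first double quote to the last one in the string.
greedy = re.findall(r'".*"', s)

# Lazy: .*? stops at the nearest closing double quote.
lazy = re.findall(r'".*?"', s)

print(greedy)  # ['"the little cat" is in the hat, we like "the little cat"']
print(lazy)    # ['"the little cat"', '"the little cat"']
```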

Example 4: extract the content of the head tag from the following web page:

<html>
    <head>
        <title>学习正则表达式</title>
    </head>
    <body></body>
</html>

Reference solution:

s = """<html>
    <head>
        <title>学习正则表达式</title>
    </head>
    <body></body>
</html>"""
re.findall(r"(?si)<head>(.*)</head>", s)

Result:

['\n        <title>学习正则表达式</title>\n    ']
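
The inline prefix (?si) in the pattern turns on two flags: s (re.S, so that . also matches newlines) and i (re.I, case-insensitive matching). The sketch below shows their effect on a small hypothetical snippet with an uppercase tag; the snippet and variable names are assumptions made for illustration.

```python
import re

html = "<HEAD>\n  <title>demo</title>\n</HEAD>"  # hypothetical snippet

# No flags: the tag case differs and . cannot cross the newlines, so nothing matches.
no_flags = re.findall(r"<head>(.*)</head>", html)

# Inline flags at the start of the pattern.
inline = re.findall(r"(?si)<head>(.*)</head>", html)

# The same flags passed through the flags argument.
by_argument = re.findall(r"<head>(.*)</head>", html, re.S | re.I)

print(no_flags)     # []
print(inline)       # ['\n  <title>demo</title>\n']
print(by_argument)  # ['\n  <title>demo</title>\n']
```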