1-1 Regular Expressions (Python)

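The advantage of regular expressions

Node selection can narrow the target down only as far as a text string, and string slicing adapts poorly once the text varies. A regular expression instead lets you process the string with a pattern tailored to the data.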
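A minimal sketch of that contrast, using two hypothetical sample strings (the second one is not from these notes). A fixed slice works only while the layout stays fixed, whereas a pattern keeps working when the prefix length changes. It uses re.search, which scans the whole string instead of anchoring at the start.

```python
import re

samples = ["gengdan2020-11-3", "bj2021-1-15"]  # hypothetical sample strings

# Slicing assumes the date always starts at index 7; true only for the first sample.
sliced = [s[7:] for s in samples]

# The pattern describes what a date looks like, so the prefix length does not matter.
date_re = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")
found = [date_re.search(s).group(0) for s in samples]

print(sliced)  # ['2020-11-3', '1-15']
print(found)   # ['2020-11-3', '2021-1-15']
```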

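The re package

re.match(pattern, string) returns a match object on success and None on failure. Note the argument order: the pattern comes first, then the string to be matched.
group(n) returns the text captured by the n-th pair of parentheses in the pattern; groups are numbered from the outside in, and group(0) is the whole match.
For example, with the pattern a(bc), group(0) is abc and group(1) is bc.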
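The a(bc) example can be checked directly. The sketch below also shows the failure case: because re.match only tries at the start of the string, the result should be tested for None before group() is called. The input strings are made up for illustration.

```python
import re

m = re.match(r"a(bc)", "abcdef")
if m is not None:
    whole = m.group(0)  # the entire match
    inner = m.group(1)  # the first parenthesised group
    print(whole, inner)  # abc bc

# The string does not start with "a", so re.match returns None.
miss = re.match(r"a(bc)", "xabc")
print(miss)  # None
```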

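Regular expression characters

Any character / zero or more times / one or more times

. * +

import re

line = "gengdan2020-11-3"
regex = "...g"  # any three characters followed by "g"
result = re.match(regex, line)
print(result.group(0))  # geng

line = "gengdan2020-11-3"
regex = ".*"  # any character, repeated as many times as possible
result = re.match(regex, line)
print(result.group(0))  # gengdan2020-11-3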
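The difference between * and + only shows up when the repeated element is absent. A small sketch, using the letter x (which does not occur in the sample string) as an illustrative choice:

```python
import re

line = "gengdan2020-11-3"

# "*" allows zero repetitions, so the match succeeds with an empty string.
star = re.match(r"x*", line).group(0)

# "+" needs at least one repetition, so there is no match at all.
plus = re.match(r"x+", line)

# "." followed by "+": "g", then one or more of any character.
rest = re.match(r"g.+", line).group(0)

print(repr(star))  # ''
print(plus)        # None
print(rest)        # gengdan2020-11-3
```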

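Grouping

group ( )
Groups are numbered by the position of their opening parenthesis: from left to right, and from the outside in.

line = "gengdan2020-11-3"
regex = "....(...)"  # skip four characters, then capture the next three
result = re.match(regex, line)
print(result.group(1))  # dan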
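The numbering rule is easiest to see with nested parentheses. The following sketch uses a made-up pattern built only from dots; groups() returns all captured groups in order.

```python
import re

# Opening parentheses, left to right: group 1 is the outer pair,
# groups 2 and 3 sit inside it, group 4 comes last.
m = re.match(r"((....)-(..))-(.)", "2020-11-3")

print(m.group(0))  # 2020-11-3
print(m.groups())  # ('2020-11', '2020', '11', '3')
```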

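Anchoring the start and end

^ anchors the match at the start of the string
$ anchors the match at the end of the string

The $ character
line = "genggdan2020-11-3"
regex = ".*3$"  # the string must end with "3"
result = re.match(regex, line)
print(result.group(0))  # genggdan2020-11-3

The ^ and $ characters
line = "genggdan2020-11-3"
regex = "^g.*3$"  # the string must start with "g" and end with "3"
result = re.match(regex, line)
print(result.group(0))  # genggdan2020-11-3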
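Since re.match already starts at the beginning of the string, the effect of ^ is easier to observe with re.search, which scans the whole string. A sketch under that assumption, with illustrative patterns:

```python
import re

line = "genggdan2020-11-3"

unanchored = re.search(r"dan", line)  # found in the middle of the string
at_start = re.search(r"^dan", line)   # None: the string does not start with "dan"
at_end = re.search(r"3$", line)       # matches the final "3"
not_end = re.match(r".*1$", line)     # None: the string does not end with "1"

print(unanchored.group(0), at_start, at_end.group(0), not_end)
```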

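Non-greedy matching

? the non-greedy modifier
It controls how much a quantifier consumes. A greedy quantifier takes as many characters as possible and then gives characters back, so the match looks as if it were found from the right. A non-greedy quantifier takes as few as possible and extends only when needed, so the match looks as if it were found from the left.

line = "boooobby123"
regex = ".*?(b.*b).*"  # non-greedy prefix, so the group starts at the first "b"
match_obj = re.match(regex, line)
print(match_obj.group(1))  # boooobb

? also marks the preceding element as optional (0 or 1 occurrence): put ? after a character that need not appear, and the match succeeds whether or not that character is present.

line = "2020"
regex = r"^-?[1-9]\d*$"  # raw string; optional minus sign, then an integer without a leading zero
match_obj = re.match(regex, line)
print(match_obj.group(0))  # 2020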
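Running the same string through three variants of the pattern makes the effect of each ? visible. This is a sketch for comparison; the two extra variants are not from the original notes.

```python
import re

line = "boooobby123"

# Greedy prefix: ".*" swallows as much as it can, leaving only "bb" for the group.
greedy_prefix = re.match(r".*(b.*b).*", line).group(1)

# Non-greedy prefix, greedy group: starts at the first "b", runs to the last "b".
lazy_prefix = re.match(r".*?(b.*b).*", line).group(1)

# Non-greedy prefix and group: starts at the first "b", stops at the next "b".
lazy_both = re.match(r".*?(b.*?b).*", line).group(1)

print(greedy_prefix)  # bb
print(lazy_prefix)    # boooobb
print(lazy_both)      # boooob
```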
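To see the optional minus sign at work, the same pattern can be tried against several inputs. The test strings below are illustrative assumptions.

```python
import re

int_re = re.compile(r"^-?[1-9]\d*$")

candidates = ["2020", "-15", "0", "12a", "--3"]
results = {s: int_re.match(s) is not None for s in candidates}

# "0" fails because the first digit must be 1-9; "--3" fails because
# "?" allows at most one minus sign.
print(results)
```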

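Specifying repetition counts

{2} exactly 2 times; {2,4} 2 to 4 times (both bounds inclusive)

line = "gengdan222"
regex = ".*(2{2})"  # greedy prefix, so the group is the last two "2"s
match_obj = re.match(regex, line)
print(match_obj.group(1))  # 22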
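The range form {2,4} is not exercised by the snippet above, so here is a short sketch with made-up inputs showing both bounds:

```python
import re

exact = re.match(r"2{2}", "2222").group(0)      # exactly two
ranged = re.match(r"2{2,4}", "22222").group(0)  # greedy: takes up to four
too_few = re.match(r"2{2,4}", "2")              # fewer than two: no match

print(exact, ranged, too_few)  # 22 2222 None
```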

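Custom character sets

Ranges follow ASCII code order
[ ] matches any one of the characters listed inside
[^ ] matches any one character that is not listed inside
[0-9a-zA-Z] matches any single digit or letter
[\u4E00-\u9FA5] matches a Chinese character

line = "gengdan2020-11-3"
regex = ".*?([^3]{4}[-/]{1}[0-9]{2}[-/][0-9])"  # four non-"3" characters, a separator, two digits, a separator, one digit
match_obj = re.match(regex, line)
print(match_obj.group(1))  # 2020-11-3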
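Each of the bracket forms listed above can be tried on its own. The sketch uses re.findall, which returns every non-overlapping match as a list; the sample strings are illustrative.

```python
import re

text = "ab_12-Cd"

alnum_runs = re.findall(r"[0-9a-zA-Z]+", text)  # runs of digits and letters
not_alnum = re.findall(r"[^0-9a-zA-Z]", text)   # everything else, one character at a time
chinese = re.findall(r"[\u4E00-\u9FA5]+", "XXX出生于2001年")  # runs of Chinese characters

print(alnum_runs)  # ['ab', '12', 'Cd']
print(not_alnum)   # ['_', '-']
print(chinese)     # ['出生于', '年']
```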

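Alternation (or)

| usually used together with ( )

line = "gengdan2020-11-3"
regex = "gengdan|gengdan2020-11-3"  # alternatives are tried left to right; the first that matches wins
match_obj = re.match(regex, line)
print(match_obj.group(0))  # gengdan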
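Two things are worth checking here: the order of the alternatives matters, and parentheses limit how far | reaches. A sketch with variations on the pattern above (the 2019 alternative is made up):

```python
import re

line = "gengdan2020-11-3"

# The first alternative that succeeds wins, even if a later one is longer.
first_wins = re.match(r"gengdan|gengdan2020-11-3", line).group(0)
longer_first = re.match(r"gengdan2020-11-3|gengdan", line).group(0)

# Parentheses restrict the alternation to the year part only.
year = re.match(r"gengdan(2019|2020)-11", line).group(1)

print(first_wins)    # gengdan
print(longer_first)  # gengdan2020-11-3
print(year)          # 2020
```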

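Whitespace / non-whitespace characters

\s \S

line = "gengdan 2020-11-3"
regex = r"gengdan\s2020-11-3"  # raw string keeps the backslash intact
match_obj = re.match(regex, line)
print(match_obj.group(0))  # gengdan 2020-11-3

Letters, underscore, and digits / everything else

\w \W

line = "n_2020-11-3"
regex = r"\w{6}\W"  # six word characters, then one non-word character
match_obj = re.match(regex, line)
print(match_obj.group(0))  # n_2020-

Digits / non-digits

\d \D

line = "2020-11-3"
regex = r"\d+"  # one or more digits
match_obj = re.match(regex, line)
print(match_obj.group(0))  # 2020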
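The shorthand classes and their uppercase complements can be compared side by side on one string. This sketch uses re.findall and an illustrative input that mixes all three kinds of character.

```python
import re

line = "n_2020 11-3"

words = re.findall(r"\w+", line)       # runs of letters, digits, underscore
spaces = re.findall(r"\s", line)       # single whitespace characters
digits = re.findall(r"\d+", line)      # runs of digits
non_digits = re.findall(r"\D+", line)  # runs of anything that is not a digit

print(words)       # ['n_2020', '11', '3']
print(spaces)      # [' ']
print(digits)      # ['2020', '11', '3']
print(non_digits)  # ['n_', ' ', '-']
```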

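A real-world use case

# "出生于" means "was born on"; the samples stay in Chinese because the pattern matches the date markers 年 and 月.
line = "XXX出生于2001/6/1"
# line = "XXX出生于2001-6-1"
# line = "XXX出生于2001年06月01日"
# line = "XXX出生于2001-06"
regex = r".*出生于(\d{4}[-/年]\d{1,2}([-月/]\d{1,2}|[-月/]$|$))"
match_obj = re.match(regex, line)
print(match_obj.group(1))  # 2001/6/1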

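# The same pattern spaced out to show its structure; the spaces are ignored only if re.VERBOSE is passed to re.match.
regex = r".*出生于(\d{4}[-/年]\d{1,2}   ([-月/]\d{1,2}  |  [-月/]$  |  $))"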
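One way to exercise the pattern against all four date formats at once is to wrap it in a small helper. The function name extract_birth_date and the last sample (which contains no date) are assumptions added for illustration.

```python
import re

# Same pattern as above, compiled once. "出生于" means "was born on".
BIRTH_RE = re.compile(r".*出生于(\d{4}[-/年]\d{1,2}([-月/]\d{1,2}|[-月/]$|$))")

def extract_birth_date(text):
    """Return the date substring, or None if the text does not match."""
    m = BIRTH_RE.match(text)
    return m.group(1) if m else None

samples = [
    "XXX出生于2001/6/1",
    "XXX出生于2001-6-1",
    "XXX出生于2001年06月01日",
    "XXX出生于2001-06",
    "XXX出生于北京",  # hypothetical input with no date
]
dates = [extract_birth_date(s) for s in samples]
print(dates)  # ['2001/6/1', '2001-6-1', '2001年06月01', '2001-06', None]
```

Note that the trailing 日 is not captured, because the pattern stops after the day digits.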

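Notes:

1 A modifier comes after the element it modifies (the negation, start, and end characters are exceptions).
2 ^ means both "start of string" (outside brackets) and "not" (as the first character inside [ ]).
3 The escape character \: when the character you want is one that has a special meaning in regular expressions, and you need the literal character rather than its function, escape it.
4 https://www.bejson.com/othertools/regex/ is a site for generating and testing regular expressions.
5 Understand the underlying principles and concepts before reaching for ready-made tools. Otherwise any small bug can eat up a lot of time, and generated patterns have serious limitations: once the business scenario changes, they may well no longer fit.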
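Notes 2 and 3 can be illustrated together. The sketch below uses a made-up price string; re.escape is the standard-library helper that adds the backslashes automatically.

```python
import re

price = "cost: $3.50"

# Escaped: "$" and "." are matched literally.
literal = re.search(r"\$3\.50", price).group(0)

# Unescaped "." still means "any character", so it also accepts "x".
loose = re.search(r"3.50", "3x50").group(0)

# re.escape builds the escaped form of a literal string.
escaped = re.escape("3.50")

# "^" outside brackets anchors at the start; first inside brackets it negates.
starts_with_c = re.match(r"^c", price) is not None
non_digits = re.match(r"[^0-9]+", price).group(0)

print(literal)        # $3.50
print(loose)          # 3x50
print(escaped)        # 3\.50
print(starts_with_c)  # True
print(non_digits)     # cost: $
```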