# Groups

A group is created by surrounding part of a regular expression with parentheses, **()**.
Because a group is treated as a single unit, it can be given as an argument to metacharacters such as * and ?.
The content of the groups in a match can be accessed with the match object's **group** method.
A call of **group(0)** or **group()** returns the whole match.
A call of **group(n)**, where n is greater than 0, returns the nth group from the left.
The method **groups()** returns a tuple of all groups, starting from group 1.
```python
import re

pattern = r"a(bc)(de)(f(g)h)i"

match = re.match(pattern, "abcdefghijklmnop")

if match:
    print(match.group())   # whole match: abcdefghi
    print(match.group(0))  # same as above: the whole match
    print(match.group(1))  # group 1: bc
    print(match.group(2))  # group 2: de
    print(match.group(3))  # group 3: fgh
    print(match.group(4))  # group 4: g (nested inside group 3)
    print(match.groups())  # ('bc', 'de', 'fgh', 'g')
```
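
To illustrate a group being used as the argument of a metacharacter, here is a minimal sketch. The patterns and test strings are chosen for illustration only, and **re.fullmatch** (which requires the whole string to match) is used so that partial matches do not hide the difference.

```python
import re

# (ab)* repeats the whole group "ab"; without parentheses, ab* repeats only "b".
grouped = re.fullmatch(r"(ab)*c", "ababc")
ungrouped = re.fullmatch(r"ab*c", "ababc")
print(bool(grouped))    # True
print(bool(ungrouped))  # False

# (very )? makes the whole group optional.
optional = r"(very )?good"
print(bool(re.fullmatch(optional, "good")))       # True
print(bool(re.fullmatch(optional, "very good")))  # True
```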


There are several kinds of special groups.<br />
Two useful ones are **named groups** and **non-capturing groups**.<br />
**Named groups** have the format **(?P&lt;name&gt;...)**, where **name** is the name of the group, and **...** is the content. They behave exactly the same as normal groups, except that they can be accessed by **group(name)** in addition to their number.<br />
**Non-capturing groups** have the format **(?:...)**. They are not accessible by the group method, so they can be added to an existing regular expression without breaking the numbering.
```python
import re

pattern = r"(?P<first>abc)(?:def)(ghi)"
match = re.match(pattern, "abcdefghi")
if match:
    print(match.group("first"))  # abc
    print(match.groups())        # ('abc', 'ghi'): (?:def) is not captured
```

Another important metacharacter is |.
This means "or", so red|blue matches either "red" or "blue".
```python
import re

pattern = r"gr(a|e)y"

match = re.match(pattern, "gray")
if match:
    print("Match 1")  # printed: "a" is one of the alternatives

match = re.match(pattern, "grey")
if match:
    print("Match 2")  # printed: "e" is the other alternative

match = re.match(pattern, "griy")
if match:
    print("Match 3")  # not printed: "i" is neither "a" nor "e"
```
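
The following sketch shows why the parentheses matter in a pattern that uses |. The second pattern is a made-up variant without the group, added only for comparison, and **re.fullmatch** is used so that the whole string has to match.

```python
import re

# | has the lowest precedence, so it splits the whole pattern unless a group limits it.
with_group = r"gr(a|e)y"   # "gr", then "a" or "e", then "y"
without_group = r"gra|ey"  # either "gra" or "ey"

print(bool(re.fullmatch(with_group, "grey")))     # True
print(bool(re.fullmatch(without_group, "grey")))  # False
print(bool(re.fullmatch(without_group, "gra")))   # True
```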


<a name="e990100b"></a>
# Special Sequences
There are various **special sequences** you can use in regular expressions. They are written as a backslash followed by another character.<br />
One useful special sequence is a backslash and a number between 1 and 99, e.g., \1 or \17. It matches the same text that the group with that number matched: \1 repeats whatever the first pair of parentheses matched, and \17 whatever the 17th pair matched.<br />
The pattern must therefore contain a group with that number.
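```python
import re

pattern = r"(.+) \1"

match = re.match(pattern, "word word")
if match:
    print("Match 1")  # printed: "word" is repeated after the space

match = re.match(pattern, "?! ?!")
if match:
    print("Match 2")  # printed: "?!" is repeated after the space

match = re.match(pattern, "abc def")
if match:
    print("Match 3")  # not printed: "def" is not a repeat of "abc"
```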
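
As a further sketch of the point that a backreference repeats the matched text rather than the group's pattern, consider a group with two alternatives. The tag-like strings below are invented for illustration.

```python
import re

# The closing tag must repeat whatever the group (b|i) matched in the opening tag.
pattern = r"<(b|i)>.+</\1>"

print(bool(re.match(pattern, "<b>bold</b>")))    # True
print(bool(re.match(pattern, "<i>italic</i>")))  # True
print(bool(re.match(pattern, "<b>mixed</i>")))   # False: \1 is "b", but the text has "i"
```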

More useful special sequences are \d, \s, and \w.
These match digits, whitespace, and word characters respectively.
In ASCII mode they are equivalent to [0-9], [ \t\n\r\f\v], and [a-zA-Z0-9_].
In Unicode mode they match certain other characters, as well. For instance, \w matches letters with accents.
The upper-case versions of these special sequences (\D, \S, and \W) mean the opposite of the lower-case versions. For instance, \D matches anything that isn't a digit.
```python
import re

pattern = r"(\D+\d)"  # one or more non-digits followed by a single digit

match = re.match(pattern, "Hi 999!")
if match:
    print("Match 1")  # printed: "Hi " is followed by the digit 9

match = re.match(pattern, "1, 23, 456!")
if match:
    print("Match 2")  # not printed: the string starts with a digit

match = re.match(pattern, " ! $?")
if match:
    print("Match 3")  # not printed: the string contains no digit
```
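
The difference between ASCII mode and Unicode mode can be seen with a short sketch. It assumes Python 3 string patterns, where Unicode matching is the default and the **re.ASCII** flag switches to ASCII mode. The sample text is made up, and **re.findall** is used here simply to list every match.

```python
import re

text = "caf\u00e9 123"  # "café 123"; \u00e9 is the accented letter é

# Default (Unicode) mode for str patterns: \w also matches the accented letter.
unicode_words = re.findall(r"\w+", text)
print(unicode_words)  # ['café', '123']

# re.ASCII restricts \w to [a-zA-Z0-9_], so the match stops before é.
ascii_words = re.findall(r"\w+", text, re.ASCII)
print(ascii_words)    # ['caf', '123']
```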


Additional special sequences are **\A**, **\Z**, and **\b**.<br />
The sequences **\A** and **\Z** match the beginning and end of a string, respectively.<br />
The sequence **\b** matches the empty string between **\w** and **\W** characters, or **\w** characters and the beginning or end of the string. Informally, it represents the boundary between words.<br />
The sequence **\B** matches the empty string anywhere else.
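```python
import re

# Matches "cat" only when it is not directly next to a word character
# (a letter, digit, or underscore) on either side.
pattern = r"\b(cat)\b"

match = re.search(pattern, "The cat sat!")
if match:
    print("Match 1")  # printed: spaces surround "cat"

match = re.search(pattern, "We s>cat<tered?")
if match:
    print("Match 2")  # printed: ">" and "<" are not word characters

match = re.search(pattern, "We scattered.")
if match:
    print("Match 3")  # not printed: "cat" sits inside the word "scattered"
```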
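
The sequences **\A**, **\Z**, and **\B** are not covered by the example above, so here is a minimal sketch of each. The test strings are chosen for illustration.

```python
import re

# \A anchors the match at the start of the string, \Z at the end.
print(bool(re.search(r"\Acat", "cat sat")))       # True
print(bool(re.search(r"\Acat", "the cat")))       # False
print(bool(re.search(r"sat\Z", "the cat sat")))   # True
print(bool(re.search(r"sat\Z", "the cat sat!")))  # False

# \B matches only where there is no word boundary, i.e. inside a word.
print(bool(re.search(r"\Bcat\B", "We scattered.")))  # True
print(bool(re.search(r"\Bcat\B", "The cat sat!")))   # False
```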

# Email Extraction

To demonstrate a sample usage of regular expressions, let's create a program to extract email addresses from a string.
Suppose we have a text that contains an email address:
```python
# named "text" rather than "str" to avoid shadowing the built-in str type
text = "Please contact info@sololearn.com for assistance"
```

Our goal is to extract the substring "info@sololearn.com".
A basic email address consists of a word and may include dots or dashes. This is followed by the @ sign and the domain name (the name, a dot, and the domain name suffix).
This is the basis for building our regular expression.

`pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"`

[\w.-]+ matches one or more word characters, dots, or dashes. Inside square brackets the dot needs no backslash, so [\w.-] and [\w\.-] are equivalent.
The regex above says that the string should contain a word (with dots and dashes allowed), followed by the @ sign, then another similar word, then a dot and another word.
Our regex contains three groups:
1 - first part of the email address.
2 - domain name without the suffix.
3 - the domain suffix.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
text = "Please contact info@sololearn.com for assistance"

match = re.search(pattern, text)
if match:
    print(match.group())  # info@sololearn.com
```
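
To see what each of the three groups captures, the match can be unpacked with **groups()**. This is a small sketch using the same pattern and sample sentence. The variable names are chosen for illustration.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
match = re.search(pattern, "Please contact info@sololearn.com for assistance")

if match:
    local_part, domain, suffix = match.groups()
    print(local_part)  # info
    print(domain)      # sololearn
    print(suffix)      # .com
```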


If the string contains multiple email addresses, we can use **re.findall** instead of **re.search** to extract all of them.
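
One detail is worth a sketch: when the pattern contains capturing groups, **re.findall** returns the groups of each match rather than the full matched text. The example below assumes the same pattern as above. The second address is a made-up placeholder, and **re.finditer** is shown as one way to get the full addresses.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
text = "Write to info@sololearn.com or support@example.org for help"

# With capturing groups, findall returns one tuple of groups per match.
parts = re.findall(pattern, text)
print(parts)
# [('info', 'sololearn', '.com'), ('support', 'example', '.org')]

# finditer yields match objects, so group() gives each full address.
emails = [m.group() for m in re.finditer(pattern, text)]
print(emails)  # ['info@sololearn.com', 'support@example.org']
```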
<a name="12f88548"></a>
# Phone Number Validator
You are given a number as input and need to check whether it is a valid phone number.<br />
A valid phone number has exactly 8 digits and starts with **1**, **8** or **9**.<br />
Output "Valid" if the number is valid and "Invalid" if it is not.

**Sample Input**<br />
81239870

**Sample Output**<br />
Valid
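```python
import re

# exactly 8 digits, the first of which is 1, 8, or 9
pattern = r"[189]\d{7}"
number = input()
# fullmatch requires the whole input to match, so extra characters are rejected
if re.fullmatch(pattern, number):
    print("Valid")
else:
    print("Invalid")
```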
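
The same rule can also be wrapped in a function so that several inputs can be checked without reading from standard input. This is a minimal sketch: the function name **is_valid_phone** and the sample inputs are hypothetical, and it uses **re.fullmatch** so that the whole string must match.

```python
import re

def is_valid_phone(number: str) -> bool:
    """Return True if number is exactly 8 digits and starts with 1, 8, or 9."""
    return re.fullmatch(r"[189]\d{7}", number) is not None

# One valid number, then a wrong first digit, too few digits,
# too many digits, and a non-digit character.
for candidate in ["81239870", "71239870", "8123987", "812398700", "8123987a"]:
    print(candidate, "Valid" if is_valid_phone(candidate) else "Invalid")
```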
