Groups

  • A group can be created by surrounding part of a regular expression with parentheses.
    This means that a group can be given as an argument to metacharacters such as * and ?.
    In short, wrapping part of a pattern in () turns it into a group.
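
As a minimal sketch of a group being handed to a metacharacter, the * below repeats the whole group (spam) rather than a single character. The pattern and the test strings are illustrative and are not taken from the notes above.

```python
import re

# The * applies to the entire group (spam), not just to the final "m".
pattern = r"egg(spam)*"

# fullmatch is used so that leftover characters count as a failure.
results = {
    text: bool(re.fullmatch(pattern, text))
    for text in ["egg", "eggspam", "eggspamspam", "eggspa"]
}
print(results)
```

Only "eggspa" is rejected, because the trailing "spa" is not a complete repetition of the group.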

  • The content of groups in a match can be accessed using the group method of the match object.
    A call of group(0) or group() returns the whole match.
    A call of group(n), where n is greater than 0, returns the nth group from the left.
    The method groups() returns a tuple of all groups from 1 upward.

```python
import re

pattern = r"a(bc)(de)(f(g)h)i"

match = re.match(pattern, "abcdefghijklmnop")

if match:
    print(match.group())   # whole match: abcdefghi
    print(match.group(0))  # same as above, the whole match
    print(match.group(1))  # 1st group: bc
    print(match.group(2))  # 2nd group: de
    print(match.group(3))  # 3rd group: fgh
    print(match.group(4))  # 4th group: g
    print(match.groups())  # ('bc', 'de', 'fgh', 'g')
```

  • There are several kinds of special groups.
    Two useful ones are **named groups** and **non-capturing groups**.
    **Named groups** have the format **(?P<name>...)**, where **name** is the name of the group and **...** is the content. They behave exactly like normal groups, except that they can be accessed by **group(name)** in addition to their number.
    **Non-capturing groups** have the format **(?:...)**. They are not accessible through the group method, so they can be added to an existing regular expression without breaking the numbering.

```python
import re

pattern = r"(?P<first>abc)(?:def)(ghi)"

match = re.match(pattern, "abcdefghi")

if match:
    print(match.group("first"))  # abc
    print(match.groups())        # ('abc', 'ghi'): the (?:def) part is not captured
```

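
To illustrate the point about numbering, here is a small sketch that adds an optional prefix to an existing pattern, once as a non-capturing group and once as a normal group. The date-like strings and the variable names are made up for this example.

```python
import re

# Existing pattern: group 1 is the year, group 2 is the month.
plain = re.match(r"(\d{4})-(\d{2})", "2024-05")

# The optional prefix is non-capturing, so year and month stay groups 1 and 2.
prefixed = re.match(r"(?:date:)?(\d{4})-(\d{2})", "date:2024-05")

# For contrast, a capturing prefix shifts every later group by one.
shifted = re.match(r"(date:)?(\d{4})-(\d{2})", "date:2024-05")

print(plain.groups())     # ('2024', '05')
print(prefixed.groups())  # ('2024', '05')
print(shifted.groups())   # ('date:', '2024', '05')
```
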
  • Another important metacharacter is |.
    This means "or", so red|blue matches either "red" or "blue".

```python
import re

pattern = r"gr(a|e)y"

match = re.match(pattern, "gray")
if match:
    print("Match 1")  # printed

match = re.match(pattern, "grey")
if match:
    print("Match 2")  # printed

match = re.match(pattern, "griy")
if match:
    print("Match 3")  # not printed: "i" is neither "a" nor "e"
```

Special Sequences

  • There are various **special sequences** you can use in regular expressions. They are written as a backslash followed by another character.
    One useful special sequence is a backslash and a number between 1 and 99, e.g., \1 or \17. This matches the same text that the group with that number matched.
    Parentheses "()" in a regular expression denote a group. A later \1 must match the same content as the first pair of parentheses, and \17 the same content as the 17th pair, so this sequence is only meaningful together with ().

```python
import re

pattern = r"(.+) \1"

match = re.match(pattern, "word word")
if match:
    print("Match 1")  # printed

match = re.match(pattern, "?! ?!")
if match:
    print("Match 2")  # printed

match = re.match(pattern, "abc def")
if match:
    print("Match 3")  # not printed: "def" does not repeat "abc"
```

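
One possible use of a backreference is spotting accidentally doubled words. The sketch below is only an illustration: the sample sentence and the name `repeated` are assumptions, and it relies on the \b and \w sequences that these notes also cover.

```python
import re

# \b(\w+) captures a whole word, " \1" demands the same word again after one
# space, and the final \b keeps a pair like "the then" from counting.
pattern = r"\b(\w+) \1\b"

text = "this is is a test of the the pattern"
repeated = [m.group(1) for m in re.finditer(pattern, text)]
print(repeated)  # ['is', 'the']
```

The leading \b matters here: without it, the "is" at the end of "this" followed by " is" would already count as a repeat.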
  • More useful special sequences are \d, \s, and \w.
    These match digits, whitespace, and word characters respectively.
    In ASCII mode they are equivalent to [0-9], [ \t\n\r\f\v], and [a-zA-Z0-9_].
    In Unicode mode they match certain other characters as well. For instance, \w matches letters with accents.
    Versions of these special sequences with upper-case letters (\D, \S, and \W) mean the opposite of the lower-case versions. For instance, \D matches anything that isn't a digit.

```python
import re

pattern = r"(\D+\d)"  # one or more non-digits followed by a single digit

match = re.match(pattern, "Hi 999!")
if match:
    print("Match 1")  # printed: "Hi " followed by "9"

match = re.match(pattern, "1, 23, 456!")
if match:
    print("Match 2")  # not printed: the string starts with a digit

match = re.match(pattern, " ! $?")
if match:
    print("Match 3")  # not printed: the string contains no digit
```
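
The difference between ASCII and Unicode mode can be checked with the re.ASCII flag. This is a small sketch, assuming Python 3, where str patterns use Unicode matching by default; the sample word is illustrative.

```python
import re

word = "caf\u00e9"  # "café", written with an escape for the accented letter

# Default (Unicode) matching: \w accepts the accented letter.
unicode_match = re.fullmatch(r"\w+", word)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so the accented letter is rejected.
ascii_match = re.fullmatch(r"\w+", word, flags=re.ASCII)

print(unicode_match is not None)  # True
print(ascii_match is not None)    # False
```
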

  • Additional special sequences are **\A**, **\Z**, and **\b**.
    The sequences **\A** and **\Z** match the beginning and end of a string, respectively.
    The sequence **\b** matches the empty string between **\w** and **\W** characters, or between **\w** characters and the beginning or end of the string. Informally, it represents the boundary between words.
    The sequence **\B** matches the empty string anywhere else.

```python
import re

# Matches "cat" only when it is not adjacent to a word character
# (a letter, digit, or underscore); spaces and punctuation are fine.
pattern = r"\b(cat)\b"

match = re.search(pattern, "The cat sat!")
if match:
    print("Match 1")  # printed

match = re.search(pattern, "We s>cat<tered?")
if match:
    print("Match 2")  # printed: ">" and "<" are not word characters

match = re.search(pattern, "We scattered.")
if match:
    print("Match 3")  # not printed: "cat" sits inside a word
```
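
The example above only exercises \b. As a complementary sketch, the snippet below tries \A, \Z, and \B on one made-up string; the variable names are illustrative.

```python
import re

text = "cat scattered cat"

# \A anchors at the very start of the string, \Z at the very end.
starts_with_cat = re.search(r"\Acat", text) is not None
ends_with_cat = re.search(r"cat\Z", text) is not None

# \bcat\b finds "cat" only as a standalone word ...
standalone = [m.start() for m in re.finditer(r"\bcat\b", text)]
# ... while \Bcat\B finds it only when embedded inside a longer word.
embedded = [m.start() for m in re.finditer(r"\Bcat\B", text)]

print(starts_with_cat, ends_with_cat)  # True True
print(standalone)  # [0, 14]
print(embedded)    # [5]
```
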

Email Extraction

  • To demonstrate a sample usage of regular expressions, let's create a program to extract email addresses from a string.
    Suppose we have a text that contains an email address:

    text = "Please contact info@sololearn.com for assistance"


Our goal is to extract the substring "info@sololearn.com".
A basic email address consists of a word that may include dots or dashes, followed by the @ sign and the domain name (the name, a dot, and the domain name suffix).
This is the basis for building our regular expression.

    pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"


[\w\.-]+ matches one or more word characters, dots, or dashes.
The regex above says that the string should contain a word (with dots and dashes allowed), followed by the @ sign, then another similar word, then a dot and another word.
Our regex contains three groups:
1 - the first part of the email address (before the @ sign).
2 - the domain name without the suffix.
3 - the domain suffix, including its leading dot.
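
The three groups can be inspected directly with groups(). This is a minimal sketch using the sample sentence from above; the names `text`, `local_part`, `domain`, and `suffix` are chosen here for illustration.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
text = "Please contact info@sololearn.com for assistance"

match = re.search(pattern, text)

# groups() returns the three captured parts in order.
local_part, domain, suffix = match.groups()
print(local_part)  # info
print(domain)      # sololearn
print(suffix)      # .com
```
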

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
text = "Please contact info@sololearn.com for assistance"

match = re.search(pattern, text)
if match:
    print(match.group())  # info@sololearn.com
```

  • If the string contains multiple email addresses, we could use the **re.findall** function instead of **re.search** to extract all of them.
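
One detail to watch for: because the pattern contains capturing groups, re.findall returns one tuple of groups per match rather than the full address. The sketch below shows two ways to recover whole addresses. The sample text with two addresses is hypothetical.

```python
import re

pattern = r"([\w\.-]+)@([\w\.-]+)(\.[\w\.]+)"
# Hypothetical sample text containing two addresses.
text = "Contact info@sololearn.com or support@example.org today"

# With capturing groups, findall yields one tuple of groups per match.
parts = re.findall(pattern, text)
print(parts)

# Joining each tuple back together rebuilds the full addresses ...
emails = [f"{user}@{domain}{suffix}" for user, domain, suffix in parts]
# ... and finditer with group() gives the same result directly.
emails_via_finditer = [m.group() for m in re.finditer(pattern, text)]
print(emails)
```
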
Phone Number Validator

  • You are given a number input, and need to check if it is a valid phone number.
    A valid phone number has exactly 8 digits and starts with **1**, **8** or **9**.
    Output "Valid" if the number is valid and "Invalid" if it is not.

    **Sample Input**
    81239870

    **Sample Output**
    Valid

```python
import re

# First digit must be 1, 8, or 9, followed by exactly 7 more digits.
pattern = r"[189]\d{7}"
number = input()

# fullmatch requires the whole input to match, so trailing characters
# such as "81239870!" are rejected.
if re.fullmatch(pattern, number):
    print("Valid")
else:
    print("Invalid")
```
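
To try the rule on several inputs without typing them in, the check can be wrapped in a small helper. This is only a sketch: the function name `is_valid_phone` and the sample numbers are assumptions, not part of the exercise.

```python
import re

def is_valid_phone(number: str) -> bool:
    """Return True for exactly 8 digits starting with 1, 8, or 9."""
    return re.fullmatch(r"[189]\d{7}", number) is not None

# Illustrative inputs: a valid number, a wrong first digit, too short,
# too long, and a trailing non-digit character.
for sample in ["81239870", "71239870", "8123987", "812398700", "81239870!"]:
    print(sample, "Valid" if is_valid_phone(sample) else "Invalid")
```

Only the first sample is reported as valid.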