Regular Expressions

  • Introduction
    Regular expressions are a powerful tool for many kinds of string manipulation.
    They are a domain-specific language (DSL) that is available as a library in most modern programming languages, not just Python.
    They are useful for two main tasks:
      • verifying that strings match a pattern (for instance, that a string has the format of an email address),
      • performing substitutions in a string (such as changing all American spellings to British ones).
    NOTE:
    Domain-specific languages are highly specialized mini programming languages.
    Regular expressions are a popular example, and SQL (for database manipulation) is another.
    Private domain-specific languages are often used for specific industrial purposes.
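
The two tasks listed above can be sketched roughly as follows. The email pattern here is a deliberately loose, hypothetical one (real address validation is far more involved), and the helper names `looks_like_email` and `to_british` are made up for illustration only.

```python
import re

# Hypothetical, simplified shape: something@something.something
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"


def looks_like_email(text):
    """Return True if the whole string fits the simplified email shape."""
    return re.fullmatch(EMAIL_PATTERN, text) is not None


def to_british(text):
    """Replace one American spelling with its British form (illustrative only)."""
    return re.sub(r"\bcolor\b", "colour", text)


print(looks_like_email("user@example.com"))
print(looks_like_email("not an email"))
print(to_british("My favorite color is red."))
```

The first function is a verification task and the second a substitution task; both rely only on the standard re module.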

  • Usage
    Regular expressions in Python can be accessed using the re module, which is part of the standard library.
    After you’ve defined a regular expression, the re.match function can be used to determine whether it matches at the beginning of a string.
    If it does, match returns an object representing the match; if not, it returns None.
    To avoid confusion while working with regular expressions, we will use raw strings, written as r"expression".
    In a raw string, Python does not process escape sequences such as \n, so backslashes reach the regular expression engine unchanged, which makes regular expressions easier to write.
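
As a small sketch of why raw strings help, the snippet below compares a normal string literal with a raw one. The sample text and variable names are assumptions chosen for illustration.

```python
import re

# In a normal string literal, "\n" is a single newline character.
normal = "\n"
# In a raw string literal, r"\n" is two characters: a backslash and an "n".
raw = r"\n"

print(len(normal))
print(len(raw))

# The regex engine needs to receive a backslash followed by "d" to mean "a digit".
# A raw string passes the backslash through untouched.
digits = re.match(r"\d+", "2024 was a year")
print(digits.group())

# Without a raw string, the backslash has to be doubled to build the same pattern.
print(r"\d+" == "\\d+")
```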
    Example:
```python
import re

pattern = r"spam"

# re.match only matches at the beginning of the string
if re.match(pattern, "spamspamspam"):
    print("Match")
else:
    print("No match")
```

Other functions to match patterns are **re.search** and **re.findall**.
The function **re.search** finds a match of a pattern anywhere in the string.
The function **re.findall** returns a list of all non-overlapping substrings that match a pattern.

**Example:**
```python
import re

pattern = r"spam"

# re.match only checks the beginning of the string
if re.match(pattern, "eggspamsausagespam"):
    print("Match")
else:
    print("No match")

# re.search looks for the pattern anywhere in the string
if re.search(pattern, "eggspamsausagespam"):
    print("Match")
else:
    print("No match")

# re.findall finds every match of the pattern and returns them as a list
print(re.findall(pattern, "eggspamsausagespam"))
```


In the example above, the match function did not match the pattern, because it only looks at the beginning of the string.
The search function found a match later in the string.
The function re.finditer does the same job as re.findall, except that it returns an iterator of match objects rather than a list of strings.
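
A minimal sketch of re.finditer on the same sample string is shown below. Each item it yields is a match object, so the position of every occurrence is available as well as the text; the variable names are illustrative.

```python
import re

pattern = r"spam"
text = "eggspamsausagespam"

# Each item produced by re.finditer is a match object, not a plain string
for m in re.finditer(pattern, text):
    print(m.group(), m.span())

# Collect only the start position of every occurrence
positions = [m.start() for m in re.finditer(pattern, text)]
print(positions)  # [3, 14]
```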

  • A successful regex search returns a match object with several methods that give details about the match.
    These methods include group, which returns the matched string; start and end, which return the start and end positions of the match; and span, which returns both positions as a tuple.
    Example:
```python
import re

pattern = r"pam"
match = re.search(pattern, "eggspamsausage")

if match:
    print(match.group())  # the matched string
    print(match.start())  # start position of the match
    print(match.end())    # end position of the match
    print(match.span())   # start and end positions as a tuple
```
  • Search & Replace

One of the most important functions in the re module is sub.
Syntax:

```python
re.sub(pattern, repl, string, count=0)
```

This function replaces every occurrence of pattern in string with repl. If count is provided and is nonzero, at most count occurrences are replaced. It returns the modified string and leaves the original unchanged.
Example:

```python
import re

# Named "text" rather than "str" to avoid shadowing the built-in str type
text = "My name is David. Hi David."
pattern = r"David"

new_text = re.sub(pattern, "Amy", text)
print(new_text)
```
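
To illustrate the count argument from the syntax above, here is a short sketch that limits the number of replacements. The variable names `first_only` and `all_replaced` are illustrative choices.

```python
import re

text = "My name is David. Hi David."

# count=1 replaces only the first occurrence
first_only = re.sub(r"David", "Amy", text, count=1)
print(first_only)  # My name is Amy. Hi David.

# The default count=0 replaces every occurrence
all_replaced = re.sub(r"David", "Amy", text)
print(all_replaced)  # My name is Amy. Hi Amy.
```

Passing count as a keyword argument keeps the call readable and avoids confusing it with other optional parameters.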