Regular Expressions in Excel

Regular expressions are used for Pattern Matching.
To use in Excel follow these steps:

Step 1: Add VBA reference to “Microsoft VBScript Regular Expressions 5.5”

  • Select “Developer” tab (I don’t have this tab what do I do?)
  • Select “Visual Basic” icon from ‘Code’ ribbon section
  • In “Microsoft Visual Basic for Applications” window select “Tools” from the top menu.
  • Select “References”
  • Check the box next to “Microsoft VBScript Regular Expressions 5.5” to include in your workbook.
  • Click “OK”

Step 2: Define your pattern
Basic definitions:
- Range.

  • E.g. a-z matches an lower case letters from a to z
  • E.g. 0-5 matches any number from 0 to 5

[] Match exactly one of the objects inside these brackets.

  • E.g. [a] matches the letter a
  • E.g. [abc] matches a single letter which can be a, b or c
  • E.g. [a-z] matches any single lower case letter of the alphabet.

() Groups different matches for return purposes. See examples below.
{} Multiplier for repeated copies of pattern defined before it.

  • E.g. [a]{2} matches two consecutive lower case letter a: aa
  • E.g. [a]{1,3} matches at least one and up to three lower case letter a, aa, aaa

+ Match at least one, or more, of the pattern defined before it.

  • E.g. a+ will match consecutive a’s a, aa, aaa, and so on

? Match zero or one of the pattern defined before it.

  • E.g. Pattern may or may not be present but can only be matched one time.
  • E.g. [a-z]? matches empty string or any single lower case letter.

* Match zero or more of the pattern defined before it.

  • E.g. Wildcard for pattern that may or may not be present.
  • E.g. [a-z]* matches empty string or string of lower case letters.

. Matches any character except newline \n

  • E.g. a. Matches a two character string starting with a and ending with anything except \n

| OR operator

  • E.g. a|b means either a or b can be matched.
  • E.g. red|white|orange matches exactly one of the colors.

^ NOT operator

  • E.g. [^0-9] character can not contain a number
  • E.g. [^aA] character can not be lower case a or upper case A

\ Escapes special character that follows (overrides above behavior)

  • E.g. \., \\, \(, \?, \$, \^

Anchoring Patterns:
^ Match must occur at start of string

  • E.g. ^a First character must be lower case letter a
  • E.g. ^[0-9] First character must be a number.

$ Match must occur at end of string

  • E.g. a$ Last character must be lower case letter a

Precedence table:

  1. Order Name Representation
  2. 1 Parentheses ( )
  3. 2 Multipliers ? + * {m,n} {m, n}?
  4. 3 Sequence & Anchors abc ^ $
  5. 4 Alternation |

Predefined Character Abbreviations:

  1. abr same as meaning
  2. \d [0-9] Any single digit
  3. \D [^0-9] Any single character that's not a digit
  4. \w [a-zA-Z0-9_] Any word character
  5. \W [^a-zA-Z0-9_] Any non-word character
  6. \s [ \r\t\n\f] Any space character
  7. \S [^ \r\t\n\f] Any non-space character
  8. \n [\n] New line

Example 1: Run as macro
The following example macro looks at the value in cell A1 to see if the first 1 or 2 characters are digits. If so, they are removed and the rest of the string is displayed. If not, then a box appears telling you that no match is found. Cell A1 values of 12abc will return abc, value of 1abc will return abc, value of abc123 will return “Not Matched” because the digits were not at the start of the string.

  1. Private Sub simpleRegex()
  2. Dim strPattern As String: strPattern = "^[0-9]{1,2}"
  3. Dim strReplace As String: strReplace = ""
  4. Dim regEx As New RegExp
  5. Dim strInput As String
  6. Dim Myrange As Range
  7. Set Myrange = ActiveSheet.Range("A1")
  8. If strPattern <> "" Then
  9. strInput = Myrange.Value
  10. With regEx
  11. .Global = True
  12. .MultiLine = True
  13. .IgnoreCase = False
  14. .Pattern = strPattern
  15. End With
  16. If regEx.Test(strInput) Then
  17. MsgBox (regEx.Replace(strInput, strReplace))
  18. Else
  19. MsgBox ("Not matched")
  20. End If
  21. End If
  22. End Sub

Example 2: Run as an in-cell function
This example is the same as example 1 but is setup to run as an in-cell function. To use, change the code to this:

  1. Function simpleCellRegex(Myrange As Range) As String
  2. Dim regEx As New RegExp
  3. Dim strPattern As String
  4. Dim strInput As String
  5. Dim strReplace As String
  6. Dim strOutput As String
  7. strPattern = "^[0-9]{1,3}"
  8. If strPattern <> "" Then
  9. strInput = Myrange.Value
  10. strReplace = ""
  11. With regEx
  12. .Global = True
  13. .MultiLine = True
  14. .IgnoreCase = False
  15. .Pattern = strPattern
  16. End With
  17. If regEx.test(strInput) Then
  18. simpleCellRegex = regEx.Replace(strInput, strReplace)
  19. Else
  20. simpleCellRegex = "Not matched"
  21. End If
  22. End If
  23. End Function

Place your strings (“12abc”) in cell A1. Enter this formula =simpleCellRegex(A1) in cell B1 and the result will be “abc”.
StackOverflow | Regex - 图1


Example 3: Loop Through Range

This example is the same as example 1 but loops through a range of cells.

  1. Private Sub simpleRegex()
  2. Dim strPattern As String: strPattern = "^[0-9]{1,2}"
  3. Dim strReplace As String: strReplace = ""
  4. Dim regEx As New RegExp
  5. Dim strInput As String
  6. Dim Myrange As Range
  7. Set Myrange = ActiveSheet.Range("A1:A5")
  8. For Each cell In Myrange
  9. If strPattern <> "" Then
  10. strInput = cell.Value
  11. With regEx
  12. .Global = True
  13. .MultiLine = True
  14. .IgnoreCase = False
  15. .Pattern = strPattern
  16. End With
  17. If regEx.Test(strInput) Then
  18. MsgBox (regEx.Replace(strInput, strReplace))
  19. Else
  20. MsgBox ("Not matched")
  21. End If
  22. End If
  23. Next
  24. End Sub

Example 4: Splitting apart different patterns

This example loops through a range (A1, A2 & A3) and looks for a string starting with three digits followed by a single alpha character and then 4 numeric digits. The output splits apart the pattern matches into adjacent cells by using the (). $1 represents the first pattern matched within the first set of ().

  1. Private Sub splitUpRegexPattern()
  2. Dim regEx As New RegExp
  3. Dim strPattern As String
  4. Dim strInput As String
  5. Dim Myrange As Range
  6. Set Myrange = ActiveSheet.Range("A1:A3")
  7. For Each C In Myrange
  8. strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
  9. If strPattern <> "" Then
  10. strInput = C.Value
  11. With regEx
  12. .Global = True
  13. .MultiLine = True
  14. .IgnoreCase = False
  15. .Pattern = strPattern
  16. End With
  17. If regEx.test(strInput) Then
  18. C.Offset(0, 1) = regEx.Replace(strInput, "$1")
  19. C.Offset(0, 2) = regEx.Replace(strInput, "$2")
  20. C.Offset(0, 3) = regEx.Replace(strInput, "$3")
  21. Else
  22. C.Offset(0, 1) = "(Not matched)"
  23. End If
  24. End If
  25. Next
  26. End Sub

Results:
StackOverflow | Regex - 图2


Additional Pattern Examples

  1. String Regex Pattern Explanation
  2. a1aaa [a-zA-Z][0-9][a-zA-Z]{3} Single alpha, single digit, three alpha characters
  3. a1aaa [a-zA-Z]?[0-9][a-zA-Z]{3} May or may not have preceding alpha character
  4. a1aaa [a-zA-Z][0-9][a-zA-Z]{0,3} Single alpha, single digit, 0 to 3 alpha characters
  5. a1aaa [a-zA-Z][0-9][a-zA-Z]* Single alpha, single digit, followed by any number of alpha characters
  6. </i8> \<\/[a-zA-Z][0-9]\> Exact non-word character except any single alpha followed by any single digit

Share
Follow