- To use regular expressions, import the re module.
- Use re.search() to check whether a regular expression matches anywhere in a string.
- Use re.findall() to extract all substrings that match a regular expression.
- Reference: https://docs.python.org/3/howto/regex.html
Match a Regular Expression
import re

# Print every line of file.txt that contains "From: "
with open('file.txt') as fileH:
    for line in fileH:
        line = line.rstrip()   # drop the trailing newline
        if re.search('From: ', line):
            print(line)
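The loop above needs a file named file.txt on disk. The following is a minimal file-free sketch of the same idea. It assumes the lines are already in a Python list; the sample lines and the helper name matching_lines are hypothetical. It also shows that re.search() returns a match object when the pattern is found anywhere in the string and None when it is not:

```python
import re

def matching_lines(lines, pattern):
    """Hypothetical helper: return the stripped lines where pattern is found."""
    result = []
    for line in lines:
        line = line.rstrip()
        # re.search scans the whole string and returns a Match object or None
        if re.search(pattern, line):
            result.append(line)
    return result

# Hypothetical sample lines standing in for the contents of file.txt
sample = [
    'From: alice@example.com Sat Jan 5\n',
    'Subject: hello\n',
    'Reply to From: bob@example.com\n',
]

print(matching_lines(sample, 'From: '))
print(re.search('From: ', 'Subject: hello'))  # None
```

Because re.search() looks anywhere in the line, the third sample line also matches. Anchoring the pattern as '^From: ' would keep only lines that start with it.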
Extract Substring Matching a Regular Expression
The function re.findall() extracts every non-overlapping substring that matches a regular expression and returns them as a list of strings.
import re

x = 'My 2 favorite numbers are 14 and 98'
y = re.findall('[0-9]+', x)
print(y)   # ['2', '14', '98']
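re.findall() always returns a list, even when nothing matches. The following small sketch shows the empty-list case and one way to turn the extracted digit strings into integers. The second sample string is illustrative only:

```python
import re

text = 'My 2 favorite numbers are 14 and 98'
digits = re.findall('[0-9]+', text)
print(digits)                                   # ['2', '14', '98']

# The matches are strings; convert them if you need numbers
numbers = [int(d) for d in digits]
print(sum(numbers))                             # 114

# No match: an empty list, not None
print(re.findall('[0-9]+', 'no digits here'))   # []
```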
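Parentheses are not matched as literal characters; they mark a group. When the pattern contains a group, re.findall() returns only the text captured by that group instead of the whole match.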
import re

x = 'From: someone@hotmail.com Sat Jan 5'
# The r'' prefix keeps the backslashes in \S intact
y = re.findall(r'^From: (\S+@\S+)', x)
print(y)   # ['someone@hotmail.com']
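To see what the parentheses change, compare the same pattern with and without a group on the sample line from above. The last part is a sketch of reading the same group through the match object that re.search() returns, using its group() method:

```python
import re

x = 'From: someone@hotmail.com Sat Jan 5'

# Without a group, findall returns the whole match, including 'From: '
print(re.findall(r'^From: \S+@\S+', x))    # ['From: someone@hotmail.com']

# With a group, only the text inside the parentheses is returned
print(re.findall(r'^From: (\S+@\S+)', x))  # ['someone@hotmail.com']

# re.search exposes the same group through the Match object
m = re.search(r'^From: (\S+@\S+)', x)
if m:
    print(m.group(1))                      # someone@hotmail.com
```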
Greedy vs Non-Greedy
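- Greedy. The + and * operators match as many characters as possible, so the overall match is as long as it can be. Example: '^F.+:'
- Non-greedy. A question mark after + or * makes the repetition match as few characters as possible, so the match stops at the first point where the rest of the pattern succeeds. Example: '^F.+?:'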
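The two example patterns from the list above behave differently on a line with more than one colon. The sample string below is illustrative:

```python
import re

x = 'From: Using the : character'

# Greedy: .+ grabs as much as possible, then backs off to the LAST ':'
print(re.findall('^F.+:', x))    # ['From: Using the :']

# Non-greedy: .+? stops at the FIRST ':' that lets the match succeed
print(re.findall('^F.+?:', x))   # ['From:']
```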
Special Characters
^         Matches the beginning of a line
$         Matches the end of the line
.         Matches any character except a newline
\s        Matches a whitespace character
\S        Matches a non-whitespace character
*         Repeats the preceding item zero or more times
*?        Repeats the preceding item zero or more times (non-greedy)
+         Repeats the preceding item one or more times
+?        Repeats the preceding item one or more times (non-greedy)
[aeiou]   Matches a single character in the listed set
[^XYZ]    Matches a single character not in the listed set
[a-z0-9]  The set of characters can include a range
(         Indicates where string extraction is to start
)         Indicates where string extraction is to end
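The following sketch exercises several of the characters in the table on one line of text. The sample strings are illustrative, not from the original notes:

```python
import re

line = 'X-DSPAM-Confidence: 0.8475'

# ^ and $ anchor the pattern to the start and end of the string
print(bool(re.search(r'^X-', line)))      # True
print(bool(re.search(r'8475$', line)))    # True

# \S+ matches a run of non-whitespace; \s matches the space after the colon
print(re.findall(r'^(\S+):\s', line))     # ['X-DSPAM-Confidence']

# A set can mix a range with literal characters; inside [] the dot is literal
print(re.findall(r'[0-9.]+', line))       # ['0.8475']

# [aeiou] matches one vowel; [^XYZ] matches one character other than X, Y, Z
print(re.findall(r'[aeiou]', 'regex'))    # ['e', 'e']
print(re.findall(r'[^XYZ]', 'XaYb'))      # ['a', 'b']

# . does not match a newline by default
print(re.findall(r'a.b', 'a\nb a-b'))     # ['a-b']
```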