This tutorial walks you through Python regular expressions, also known as RegEx. It covers the search(), findall(), match(), and compile() functions of the re module, along with split() and sub(), with full code examples.
Note: The syntax used here is for Python 3. You may modify it to use with other versions of Python.
Python Regular Expression History
Before we begin, here is a brief history of regular expressions in Python:
- Start of Python (1980s): Python began without built-in regular expressions.
- Adding Regex (mid-1990s): Python got basic regex support with the “regex” module.
- Switch to re (1998): The “regex” module was replaced with the better re module.
- Improvements (2000s-2010s): re got faster, more reliable, and Unicode-friendly.
- Python 3.0 (2008): Patterns written as str became Unicode-aware by default.
- Extras (Third-party): Some use third-party libraries like “regex” or “re2” for extra features and speed.
What is Regular Expression?
Regex (short for “Regular Expression”) in Python is a powerful tool for searching, matching, and manipulating text based on patterns.
It allows us to define patterns using a specialized syntax and then search for and manipulate text that matches those patterns.
Python Regular Expression Support
Python provides the re module, which includes functions for matching patterns in strings and manipulating them. We can even use this module for string substitution.
The re module offers capabilities similar to Perl's regular expressions. It comprises functions such as match(), sub(), split(), search(), and findall().
How to Use Regular Expression in Python?
To use a regular expression, you first need to import the re module. You also need to understand how to pass a raw string (r'expression') to a function and how to interpret the result of a RegEx function.
1. Import Re Module
To use any function from the re module, import the module and call the function through it:
import re
re.function_name(list_of_arguments)
Or use this alternative approach.
from re import function_name
function_name(list_of_arguments)
2. Use Raw String Argument
Pass the pattern to Python regular expression functions as a raw string (prefixed with r) so that backslashes in the pattern reach the re module unchanged. For example:
search(r"[a-z]", "yogurt AT 24")
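To see why the raw string matters, here is a small sketch. The sample text and variable names are illustrative and not part of the original tutorial. In a normal string literal, Python converts \b into a backspace character before the re module sees it, whereas the raw string keeps the backslash so that re reads \b as a word boundary.

```python
import re

text = "cat concatenate cat"

# Raw string: the backslash reaches the re module, so \b means "word boundary".
raw_hits = re.findall(r"\bcat\b", text)

# Normal string: Python turns "\b" into a backspace character (\x08) first,
# so the pattern looks for backspace + "cat" + backspace and finds nothing.
plain_hits = re.findall("\bcat\b", text)

print(raw_hits)    # ['cat', 'cat']
print(plain_hits)  # []
```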
3. RegEx Function Return Value
If a Python RegEx function (mainly search() and match()) succeeds, it returns a Match object; otherwise it returns None.
We can call the group() method on the Match object to extract the matched string.
Called with no argument (or with 0), group() returns the whole match; called with a positive number, it returns the text captured by that subgroup.
print("matchResult.group() : ", matchResult.group())
print("matchResult.group(1) : ", matchResult.group(1))
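As a minimal, self-contained sketch of reading a Match object, the snippet below uses a hypothetical pattern with two capturing groups (the pattern and the name match_result are assumptions made for illustration). It also checks for None before calling group(), since a failed search returns None.

```python
import re

# Hypothetical sample: capture the item and its quantity in two groups.
match_result = re.search(r"([a-z]+) AT (\d+)", "yogurt AT 24")

if match_result:  # search() returns None when nothing matches
    print(match_result.group())    # yogurt AT 24  (the whole match)
    print(match_result.group(1))   # yogurt        (first subgroup)
    print(match_result.group(2))   # 24            (second subgroup)
    print(match_result.groups())   # ('yogurt', '24')
    print(match_result.span())     # (0, 12)
```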
6 Most Useful Regular Expression Functions in Python
The two most important functions are search() and match(). When you perform a regular expression search on a string, the engine scans it from left to right. If the pattern matches, the function returns a Match object; otherwise it returns None.
1. RegEx Search in Python
The search() function scans the string and returns a Match object for the first location where the pattern matches.
The syntax for regular expression search is:
import re
re.search(string_pattern, string, flags=0)
Please note that you can use the following metacharacters to form string patterns.
(+ ? . * ^ $ ( ) [ ] { } | \)
Apart from the previous set, there are some more such as:
\A, \Z, \n, \r, \t, \d, \D, \w, and so on.
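A few of these metacharacters in action may help. The sample sentence and variable names below are made up for illustration; the patterns use only \d, \w, ^, $, + and { }.

```python
import re

sample = "Order 66 shipped on 2024-05-17"

digits = re.findall(r"\d+", sample)             # runs of one or more digits
first_words = re.findall(r"\w+", sample)[:2]    # runs of word characters
starts_with_order = bool(re.search(r"^Order", sample))   # ^ anchors at the start
ends_with_two_digits = bool(re.search(r"\d{2}$", sample))  # $ anchors at the end
dates = re.findall(r"\d{4}-\d{2}-\d{2}", sample)  # {n} repeats exactly n times

print(digits)                # ['66', '2024', '05', '17']
print(first_words)           # ['Order', '66']
print(starts_with_order)     # True
print(ends_with_two_digits)  # True
print(dates)                 # ['2024-05-17']
```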
Let’s see the Python RegEx search() example:
from re import search
result = search(r"[a-z]", "yogurt AT 24")
print(result)
The output is as follows (Python 3.7 and later; earlier versions print _sre.SRE_Match instead of re.Match):
<re.Match object; span=(0, 1), match='y'>
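The optional flags argument changes how the pattern is interpreted. The sketch below reuses the sample string from this section with an uppercase pattern, first without a flag and then with re.IGNORECASE; the variable names are illustrative.

```python
import re

text = "yogurt AT 24"

# Without a flag, [A-Z]+ matches only runs of uppercase letters.
upper = re.search(r"[A-Z]+", text)
print(upper.group(), upper.span())       # AT (7, 9)

# With re.IGNORECASE, [A-Z] also matches lowercase letters,
# so the first match now starts at the beginning of the string.
relaxed = re.search(r"[A-Z]+", text, flags=re.IGNORECASE)
print(relaxed.group(), relaxed.span())   # yogurt (0, 6)

# There is no run of three or more digits, so search() returns None.
missing = re.search(r"\d{3,}", text)
print(missing)                           # None
```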
2. RegEx Match
The match() function returns a Match object only if the pattern matches at the very start of the string.
The syntax for regular expression match in Python is:
import re
re.match(string_pattern, string, flags=0)
Let’s see the match() example:
from re import match
print(match(r"PVR", "PVR Cinemas is the best."))
The output is as follows (Python 3.7 and later; earlier versions print _sre.SRE_Match instead of re.Match):
<re.Match object; span=(0, 3), match='PVR'>
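The difference between match() and search() is easiest to see side by side. The sketch below also shows re.fullmatch(), which this tutorial does not otherwise cover and which requires the pattern to span the whole string. Variable names are illustrative.

```python
import re

sentence = "PVR Cinemas is the best."

# match() only looks at position 0, so a word in the middle is not found.
at_start = re.match(r"Cinemas", sentence)
# search() scans the whole string and finds it.
anywhere = re.search(r"Cinemas", sentence)
# fullmatch() succeeds only if the pattern covers the entire string.
partial = re.fullmatch(r"PVR", sentence)
whole = re.fullmatch(r"PVR.*", sentence)

print(at_start)          # None
print(anywhere.span())   # (4, 11)
print(partial)           # None
print(whole.group())     # PVR Cinemas is the best.
```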
3. RegEx Split
The split() function splits a string at every place where the pattern matches and returns the pieces as a list.
The syntax for the split() is:
import re
re.split(string_pattern, string)
Let’s see the split() example:
from re import split
print(split(r"y", "Python"))
The output is as follows:
['P', 'thon']
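split() becomes more useful than str.split() once the delimiter is a pattern. The sketch below, using made-up sample strings, splits on several delimiters at once, limits the number of splits with maxsplit, and keeps the delimiters by wrapping the pattern in a capturing group.

```python
import re

line = "apples, oranges;bananas   grapes"

# Split on a comma, semicolon, or whitespace, plus any whitespace that follows.
parts = re.split(r"[,;\s]\s*", line)
print(parts)        # ['apples', 'oranges', 'bananas', 'grapes']

# maxsplit=1 stops after the first split and leaves the rest untouched.
first_rest = re.split(r"[,;\s]\s*", line, maxsplit=1)
print(first_rest)   # ['apples', 'oranges;bananas   grapes']

# A capturing group in the pattern keeps the delimiters in the result.
with_delims = re.split(r"(-)", "2024-05-17")
print(with_delims)  # ['2024', '-', '05', '-', '17']
```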
4. RegEx Sub String
The sub() function replaces the parts of a string that match the pattern with a replacement string.
The syntax for the sub() is:
import re
re.sub(string_pattern, replacement, string)
Let’s see the sub() example:
from re import sub
print(sub(r"Machine Learning", "Artificial Intelligence", "Machine Learning is the Future."))
The result is as follows:
Artificial Intelligence is the Future.
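Beyond fixed text, the replacement can refer back to captured groups, be limited with count, or be computed by a function. The following is a sketch with made-up inputs; the helper name double is hypothetical.

```python
import re

# Backreferences (\1, \2, \3) reuse captured groups in the replacement.
iso = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\2-\1", "Released on 17/05/2024")
print(iso)        # Released on 2024-05-17

# count=1 replaces only the first match.
once = re.sub(r"\s+", " ", "too   many    spaces", count=1)
print(once)       # too many    spaces


def double(match):
    """Hypothetical helper: return the matched number multiplied by two."""
    return str(int(match.group()) * 2)


# A function as the replacement is called once per match.
doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)    # 6 apples and 20 pears
```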
5. Python RegEx Findall
The findall() function returns a list of all non-overlapping matches of the pattern in the string.
The syntax for findall() is:
import re
re.findall(string_pattern, string)
Let’s see the Python RegEx Findall() example:
from re import findall
print(findall(r"[a-e]", "I am interested in Python Programming Language"))
The output is as follows:
['a', 'e', 'e', 'e', 'd', 'a', 'a', 'a', 'e']
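One detail worth knowing is that capturing groups change what findall() returns: with no group you get whole matches, with one group you get that group's text, and with several groups you get tuples. A short sketch with made-up data:

```python
import re

text = "alice=30, bob=25, carol=41"

# No groups: each item is the whole match.
pairs = re.findall(r"\w+=\d+", text)
# One group: each item is the text captured by that group.
ages = re.findall(r"\w+=(\d+)", text)
# Two groups: each item is a tuple with one entry per group.
records = re.findall(r"(\w+)=(\d+)", text)

print(pairs)    # ['alice=30', 'bob=25', 'carol=41']
print(ages)     # ['30', '25', '41']
print(records)  # [('alice', '30'), ('bob', '25'), ('carol', '41')]
```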
6. RegEx Compile in Python
The compile() function turns a pattern into a reusable pattern object, so you can store it and call its search(), match(), and other methods later instead of passing the pattern string each time.
The syntax for compile() is:
import re
re.compile(string_pattern)
Let’s see the Python RegEx compile() example:
import re
future_pattern = re.compile(r"[0-9]")  # This pattern object can be stored for future use.
print(future_pattern.search("1 s d f 2 d f 3 f d f 4 A l s"))
print(future_pattern.match("1 s d f 2 d f 3 f d f 4 "))
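The result is as follows, one line per print() call (Python 3.7 and later; earlier versions print _sre.SRE_Match instead of re.Match):
<re.Match object; span=(0, 1), match='1'>
<re.Match object; span=(0, 1), match='1'>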
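A compiled pattern carries its flags with it and exposes the same operations as methods. The sketch below uses an illustrative pattern and inputs to show a flag set at compile time, reuse across several calls, and the optional start position that the pattern's search() method accepts.

```python
import re

# Compile once; the flag becomes part of the pattern object.
word_pattern = re.compile(r"[a-z]+", re.IGNORECASE)

# The same object can be reused with different methods and inputs.
words = word_pattern.findall("Spam, Eggs and HAM")
masked = word_pattern.sub("#", "a1 b2")

# Pattern methods take an optional start position as the second argument.
later = word_pattern.search("123 abc", 4)

print(words)          # ['Spam', 'Eggs', 'and', 'HAM']
print(masked)         # #1 #2
print(later.group())  # abc
```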
Python RegEx Examples: Search, Findall, Match, and Compile
At this point, we can explore some more complex examples of using regular expressions in Python, covering re.search(), re.findall(), re.match(), and re.compile(). Each example starts with a problem statement:
Example#1: Use Python RegEx Search to Find Phone Numbers
Problem: Given a text, find and extract valid U.S. phone numbers in different formats (e.g., (123) 456-7890, 123-456-7890, 123.456.7890).
import re
# Sample text containing phone numbers
text = "Contact us at (123) 456-7890 or 123-456-7890. For support, call 555.555.5555 or (987) 654-3210."
# Define a regular expression pattern for matching U.S. phone numbers
phone_pattern = r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
# Search for and print all phone numbers in the text
start = 0
while True:
    match = re.search(phone_pattern, text[start:])
    if match:
        phone_number = match.group()
        print("Found phone number:", phone_number)
        start += match.end()
    else:
        break
Output:
Found phone number: (123) 456-7890
Found phone number: 123-456-7890
Found phone number: 555.555.5555
Found phone number: (987) 654-3210
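The manual loop above can also be written with re.finditer(), which yields one Match object per occurrence. This is an alternative sketch, not the approach used in the example; it reuses the same text and pattern, and the name found is illustrative.

```python
import re

text = "Contact us at (123) 456-7890 or 123-456-7890. For support, call 555.555.5555 or (987) 654-3210."
phone_pattern = re.compile(r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}')

# finditer() walks the string once and keeps track of the position itself.
found = [(m.group(), m.start()) for m in phone_pattern.finditer(text)]

for number, position in found:
    print(f"Found phone number: {number} (starts at index {position})")
```

Because each Match object keeps its own start and end, there is no need to slice the text or maintain a start counter by hand.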
Example#2: Use Python RegEx Findall to Parse HTML Tags
Problem: Extract and list all HTML tags from an HTML document.
import re
# Sample HTML document
html_text = "<h1>Hello, <b>World</b></h1> <p>Welcome to <a href='https://example.com'>Example</a></p>"
# Define a regular expression pattern for matching HTML tags
html_tag_pattern = r'<[^>]+>'
# Find and print all HTML tags in the document
html_tags = re.findall(html_tag_pattern, html_text)
print("Found HTML tags:")
for tag in html_tags:
    print(tag)
Output:
Found HTML tags:
<h1>
<b>
</b>
</h1>
<p>
<a href='https://example.com'>
</a>
</p>
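A capturing group lets findall() return just part of each tag. The sketch below, run on the same sample document, pulls out the names of opening tags and the href values; both patterns are illustrative assumptions that fit this small sample. Regular expressions do not handle arbitrary HTML reliably, so a real parser such as the standard library's html.parser is usually a safer choice for real pages.

```python
import re

html_text = "<h1>Hello, <b>World</b></h1> <p>Welcome to <a href='https://example.com'>Example</a></p>"

# Capture only the name of each opening tag; closing tags start with "</"
# and therefore do not match.
opening_tags = re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>", html_text)

# Capture the value of single-quoted href attributes.
hrefs = re.findall(r"href='([^']*)'", html_text)

print(opening_tags)  # ['h1', 'b', 'p', 'a']
print(hrefs)         # ['https://example.com']
```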
Example#3: Use Python RegEx Match to Validate Passwords
Problem: Check if a password meets certain criteria: it must be at least 8 characters long, consist only of letters and digits, and contain at least one uppercase letter, one lowercase letter, and one digit.
import re
# Sample passwords
passwords = ["Passw0rd123", "Weakpass", "Secure@123", "ABCDabcd", "P@ss"]
# Define a regular expression pattern for password validation
password_pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$'
# Validate passwords
for password in passwords:
    if re.match(password_pattern, password):
        print(f"'{password}' is a valid password.")
    else:
        print(f"'{password}' is not a valid password.")
Output:
'Passw0rd123' is a valid password.
'Weakpass' is not a valid password.
'Secure@123' is not a valid password.
'ABCDabcd' is not a valid password.
'P@ss' is not a valid password.
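The single pattern above packs every rule into lookaheads, which makes it hard to tell why a password failed. As a sketch of an alternative, the hypothetical helper failed_rules() below checks each rule separately and reports the ones that are not met. The rule descriptions are assumptions based on the pattern in this example.

```python
import re


def failed_rules(password):
    """Hypothetical helper: list the rules that the password does not meet."""
    checks = [
        ("at least 8 characters", len(password) >= 8),
        ("a lowercase letter", re.search(r"[a-z]", password)),
        ("an uppercase letter", re.search(r"[A-Z]", password)),
        ("a digit", re.search(r"\d", password)),
        ("only letters and digits", re.fullmatch(r"[A-Za-z\d]*", password)),
    ]
    # A rule fails when its check is False or None.
    return [rule for rule, ok in checks if not ok]


passwords = ["Passw0rd123", "Weakpass", "Secure@123", "ABCDabcd", "P@ss"]
report = {password: failed_rules(password) for password in passwords}

for password, problems in report.items():
    if problems:
        print(f"'{password}' is missing: {', '.join(problems)}")
    else:
        print(f"'{password}' is a valid password.")
```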
Example#4: Use Python RegEx Compile to Validate Email
Problem: You need to validate a list of email addresses to check if they follow a valid email format. Valid email addresses should have the following characteristics:
- They should contain an alphanumeric username (including dots and underscores) followed by ‘@’.
- The domain name should contain alphanumeric characters, dots, and hyphens.
- The top-level domain (TLD) should be 2-6 characters long and consist of only alphabetic characters.
import re
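# Sample list of email addresses
email_list = [
    "user@example.com",
    "user@my-website.net",
    "name.123@email.co.uk",
    "invalid.email@.com",
    "user@website.",
]
# Define a regular expression pattern for email validation
email_pat = re.compile(r'^[a-zA-Z0-9._]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}$')
# Validate email addresses
for email in email_list:
    if email_pat.match(email):
        print(f"'{email}' is a valid email addr.")
    else:
        print(f"'{email}' is not a valid email addr.")
Output:
'user@example.com' is a valid email addr.
'user@my-website.net' is a valid email addr.
'name.123@email.co.uk' is a valid email addr.
'invalid.email@.com' is not a valid email addr.
'user@website.' is not a valid email addr.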
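If you also want the pieces of an address and not just a yes or no answer, named groups are one option. The sketch below adapts the pattern from this example by adding the hypothetical group names user, domain, and tld, and it uses fullmatch() in place of the ^ and $ anchors.

```python
import re

# Same character classes as the example pattern, with named groups added.
email_parts = re.compile(
    r"(?P<user>[a-zA-Z0-9._]+)@(?P<domain>[a-zA-Z0-9.-]+)\.(?P<tld>[a-zA-Z]{2,6})"
)

good = email_parts.fullmatch("name.123@email.co.uk")
bad = email_parts.fullmatch("user@website.")

print(good.group("user"))    # name.123
print(good.group("domain"))  # email.co
print(good.group("tld"))     # uk
print(good.groupdict())      # {'user': 'name.123', 'domain': 'email.co', 'tld': 'uk'}
print(bad)                   # None
```

Note that the domain group is greedy, so for a multi-part suffix such as co.uk only the last label ends up in tld.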
These examples use regular expressions to solve specific problems: extracting phone numbers, parsing HTML tags, validating passwords, and validating email addresses. Regular expressions are a powerful tool for text manipulation and pattern matching in Python.
Further References
If you want to learn more about the module re in Python 3, visit the following link.
REF: https://docs.python.org/3/library/re.html
The documentation may be a bit too abstract for beginners or intermediate users, but advanced users will find it worth consulting.
Enjoy coding,
TechBeamers