Regular expressions are a powerful tool for searching and manipulating text. They match patterns in strings and support operations such as searching, replacing, and extracting. In Python, regular expressions are supported through the re module.
A regular expression pattern is a sequence of characters that defines a search pattern. The most common special characters are:
. : Matches any character except a newline
^ : Matches the start of the string (or the start of each line when the re.MULTILINE flag is set)
$ : Matches the end of the string or just before a trailing newline (or the end of each line when re.MULTILINE is set)
* : Matches zero or more occurrences of the preceding element
+ : Matches one or more occurrences of the preceding element
? : Matches zero or one occurrence of the preceding element
{n} : Matches exactly n occurrences of the preceding element
{n,m} : Matches at least n and at most m occurrences of the preceding element
[] : Matches any single character listed inside the square brackets
| : Matches either the expression before or the expression after the vertical bar
\ : Escapes the special character that follows it, or introduces a special sequence such as \w
Here "preceding element" means the single character, group, or character class immediately before the quantifier.
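As a rough illustration of the characters listed above, the sketch below runs one small pattern per character against a short sample string. The patterns and sample strings are made up for this illustration, and re.findall() is used only because it makes every match visible at once.

```python
import re

# Each pair is (pattern, sample text); all values are illustrative assumptions.
examples = [
    (r"c.t", "cat cot cut"),       # .     any single character
    (r"^The", "The end"),          # ^     anchored at the start
    (r"end$", "The end"),          # $     anchored at the end
    (r"ab*", "a ab abbb"),         # *     zero or more "b"
    (r"ab+", "a ab abbb"),         # +     one or more "b"
    (r"colou?r", "color colour"),  # ?     optional "u"
    (r"\d{3}", "12 345 6789"),     # {n}   exactly three digits
    (r"\d{2,3}", "1 22 333"),      # {n,m} two or three digits
    (r"[aeiou]", "regex"),         # []    any one listed character
    (r"cat|dog", "cat and dog"),   # |     either alternative
    (r"3\.14", "3.14 and 3x14"),   # \     a literal dot
]

# Map each pattern to the list of matches found in its sample text.
results = {pattern: re.findall(pattern, sample) for pattern, sample in examples}

for pattern, matches in results.items():
    print(pattern, "->", matches)
```

The patterns are written as raw strings (the r prefix) so that backslashes reach the regular expression engine unchanged.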
To use regular expressions in Python, first import the re module, then call the re.search() function to search for a pattern in a string. re.search() scans the string for the first location where the pattern matches and returns a match object if one is found, or None if there is no match.
Here is an example that uses regular expressions to search for a pattern in a string:
import re

text = "The quick brown fox jumps over the lazy dog"
pattern = "fox"
# re.search() returns a match object for the first match, or None
match = re.search(pattern, text)
if match:
    print("Match found at index", match.start())
else:
    print("No match found")
In this example, the pattern fox is searched for in the string text. Because a match is found, re.search() returns a match object, whose start() method returns the starting index of the match in the string. Here the script prints "Match found at index 16".
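Besides start(), a match object also offers group(), end(), and span(). The following is a minimal sketch of how they relate to one another; the pattern f\w+ is an illustrative choice and not part of the example above.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# Illustrative pattern: the letter "f" followed by one or more word characters.
match = re.search(r"f\w+", text)
if match is not None:
    matched_text = match.group()  # the substring that matched
    start, end = match.span()     # same as (match.start(), match.end())
    print(matched_text, start, end)
    # Slicing the original string with the span gives back the match.
    print(text[start:end] == matched_text)

# A pattern that does not occur yields None rather than raising an error.
missing = re.search("cat", text)
print(missing)
```

Checking the result against None before calling any method avoids an AttributeError when the pattern is absent.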
You can also use the re.findall() function to find all non-overlapping occurrences of a pattern in a string. This function returns a list of all matches as strings. Here is an example:
import re

text = "The quick brown fox jumps over the lazy dog"
# Raw string, so the backslash in \w reaches the regex engine unchanged
pattern = r"\w+"
matches = re.findall(pattern, text)
print(matches)
In this example, the pattern \w+ matches runs of one or more word characters (letters, digits, and underscores) in the string text. The re.findall() function returns a list of all matches, which here are the individual words in the string.
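One detail worth knowing is that the shape of the list returned by re.findall() depends on whether the pattern contains capturing groups. The sketch below uses a made-up key=value string to show the three cases; the sample data and variable names are assumptions for illustration only.

```python
import re

# Hypothetical sample data, chosen only for illustration.
text = "alice=30, bob=25"

# No groups: each item is the full matched text.
whole = re.findall(r"\w+=\d+", text)

# One group: each item is the text captured by that group.
names = re.findall(r"(\w+)=\d+", text)

# Two groups: each item is a tuple with one string per group.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)
print(names)
print(pairs)
```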
You can also use the re.sub() function to replace all occurrences of a pattern in a string with a new string. Here is an example:
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = "fox"
new_text = re.sub(pattern, "cat", text)
print(new_text)
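In this example, the pattern fox is searched for in the string text and replaced with the string cat. The re.sub() function returns a new string with all occurrences of the pattern replaced, so the script prints "The quick brown cat jumps over the lazy dog"; the original string is left unchanged.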
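re.sub() also accepts optional count and flags arguments, and the replacement string can refer back to captured groups. The following sketch shows these options under illustrative assumptions: the patterns, replacement strings, and the "hello world" sample are not taken from the example above.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# count=1 limits the replacement to the first match;
# re.IGNORECASE lets "the" also match "The".
first_only = re.sub(r"the", "a", text, count=1, flags=re.IGNORECASE)

# Without count, every match is replaced.
all_matches = re.sub(r"the", "a", text, flags=re.IGNORECASE)

# \1 and \2 in the replacement refer to the first and second captured groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

print(first_only)
print(all_matches)
print(swapped)
```

Passing count and flags as keyword arguments keeps the call readable and avoids mixing up their positions.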