Regular Expressions

Regular expressions are a powerful tool for searching and manipulating text. A regular expression describes a pattern that can be matched against strings in order to search, replace, or extract text. In Python, regular expressions are supported through the standard library's re module.

A regular expression pattern is a sequence of characters that defines what to search for. Most characters simply match themselves, while a few special characters have additional meaning. The most common special characters are:

  • .: Matches any single character except a newline
  • ^: Matches the start of the string (or the start of each line when the re.MULTILINE flag is set)
  • $: Matches the end of the string or just before a trailing newline (or the end of each line when the re.MULTILINE flag is set)
  • *: Matches zero or more occurrences of the preceding element (a character, character class, or group)
  • +: Matches one or more occurrences of the preceding element
  • ?: Matches zero or one occurrence of the preceding element
  • {n}: Matches exactly n occurrences of the preceding element
  • {n,m}: Matches at least n and at most m occurrences of the preceding element
  • []: Matches any single character listed inside the square brackets
  • |: Matches either the expression before or after the vertical bar
  • \: Escapes the special character that follows it, or introduces a special sequence such as \w
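
The characters listed above can be tried out one at a time. The following is a minimal sketch, not an exhaustive test: the sample strings are invented for illustration, and each pattern is paired with a string it is expected to match.

```python
import re

# Each pair is (pattern, sample string the pattern should match).
# The sample strings are invented for illustration.
examples = [
    (r"c.t", "cat"),          # . matches any single character
    (r"^The", "The end"),     # ^ anchors the match at the start
    (r"end$", "The end"),     # $ anchors the match at the end
    (r"ab*c", "ac"),          # * allows zero occurrences of b
    (r"ab+c", "abbc"),        # + requires at least one b
    (r"colou?r", "color"),    # ? makes the u optional
    (r"a{3}", "baaad"),       # {n} requires exactly three a's
    (r"^a{2,3}$", "aa"),      # {n,m} accepts two or three a's
    (r"gr[ae]y", "grey"),     # [] accepts either a or e
    (r"cat|dog", "hot dog"),  # | accepts either alternative
    (r"3\.14", "3.14"),       # \ makes the dot literal
]

results = [re.search(pattern, sample) is not None for pattern, sample in examples]

for (pattern, sample), matched in zip(examples, results):
    print(f"{pattern!r:12} {sample!r:10} {'match' if matched else 'no match'}")

# An unescaped dot matches any character; an escaped dot matches only a literal dot.
unescaped = re.search(r"3.14", "3x14") is not None
escaped = re.search(r"3\.14", "3x14") is not None
print(unescaped, escaped)
```

The patterns are written as raw strings (with the r prefix) so that Python passes backslashes to the re module unchanged.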

To use regular expressions in Python, import the re module and call one of its functions. The re.search() function scans a string for the first location where the pattern matches. It returns a match object if a match is found, and None otherwise.

Here is an example that uses regular expressions to search for a pattern in a string:

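import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r"fox"  # raw string, so any backslashes reach the re module unchanged

# re.search() returns a match object for the first match, or None
match = re.search(pattern, text)

if match:
    print("Match found at index", match.start())  # prints: Match found at index 16
else:
    print("No match found")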

In this example, re.search() looks for the pattern fox in the string text. Because a match is found, it returns a match object, whose start() method returns the index at which the match begins (16 here).
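
A match object carries more than the starting index. The sketch below reuses the same sentence and shows a few other match object methods from the re module (group(), end(), and span()), along with the None result for a pattern that does not occur. The variable names are chosen for illustration.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

match = re.search(r"fox", text)
if match:
    matched_text = match.group()             # the text that was matched
    start, end = match.start(), match.end()  # end is one past the last matched character
    span = match.span()                      # (start, end) as a tuple
    print(matched_text, start, end, span)
    print(text[start:end])                   # slicing with the span gives back the match

# A pattern that does not occur anywhere in the string yields None.
missing = re.search(r"cat", text)
print(missing)
```

Checking the result against None (or simply testing its truth value, as above) before calling any method avoids an AttributeError when there is no match.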

You can also use the re.findall() function to find all occurrences of a pattern in a string. For a pattern without capturing groups, this function returns a list of all non-overlapping matches as strings, in the order they are found. Here is an example:

import re

text = "The quick brown fox jumps over the lazy dog"

pattern = r"\w+"  # raw string keeps the backslash intact

matches = re.findall(pattern, text)

print(matches)

In this example, the pattern \w+ matches one or more word characters (letters, digits, and underscores) in the string text. The re.findall() function returns a list of all matches, which here are the individual words in the string.
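
To see how the result of re.findall() changes with the pattern, here is a small sketch on the same sentence. The variable names are illustrative, and the second pattern uses the square-bracket syntax from the list of special characters.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# One or more word characters: every word in the sentence.
words = re.findall(r"\w+", text)
print(len(words), words)

# [Tt] accepts either an uppercase or a lowercase t.
articles = re.findall(r"[Tt]he", text)
print(articles)

# When nothing matches, the result is an empty list rather than None.
no_matches = re.findall(r"cat", text)
print(no_matches)
```

Because an empty list is returned when there are no matches, the result can be looped over directly without a None check.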

You can also use the re.sub() function to replace all occurrences of a pattern in a string with a new string. Here is an example:

import re

text = "The quick brown fox jumps over the lazy dog"

pattern = "fox"

new_text = re.sub(pattern, "cat", text)

print(new_text)

In this example, the pattern fox is searched for in the string text and replaced with the string cat. The re.sub() function returns a new string with all occurrences of the pattern replaced; the original string is left unchanged.
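
The re.sub() function also accepts patterns that are more general than a fixed word, and an optional count argument that limits the number of replacements. The following sketch illustrates both. The messy string is made up for the example, and \s (a whitespace character) is a special sequence not covered in the list above.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# Replace every lowercase o with a zero.
all_replaced = re.sub(r"o", "0", text)
print(all_replaced)

# count=1 replaces only the first occurrence.
first_only = re.sub(r"o", "0", text, count=1)
print(first_only)

# Collapse each run of whitespace (spaces or tabs) into a single space.
messy = "The   quick\tbrown  fox"
tidy = re.sub(r"\s+", " ", messy)
print(tidy)
```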