Brad's PyNotes

RE Module: Regular Expressions for Pattern Matching

TL;DR

The re module provides regular expression operations for pattern matching, searching, and text manipulation with functions like search(), match(), findall(), and sub().

Interesting!

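Python’s re module automatically caches the most recently used compiled patterns, so calling re.search() repeatedly with the same pattern string does not recompile it each time. Compiling explicitly with re.compile() still avoids the cache lookup on every call and gives the pattern a reusable name, which helps when the same pattern is applied many times.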
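As a rough sketch of explicit compilation, the snippet below compiles a pattern once and reuses the resulting object across several strings. The name `year_pattern` and the sample lines are illustrative assumptions, not part of the re module.

```python
import re

# Compile once, then reuse the pattern object (names and data are illustrative)
year_pattern = re.compile(r'\d{4}')

lines = ["Released in 2019", "No year here", "Updated 2021 and 2023"]
years_per_line = [year_pattern.findall(line) for line in lines]
print(years_per_line)  # [['2019'], [], ['2021', '2023']]
```

The compiled object exposes the same operations as the module-level functions (search(), match(), findall(), sub()), so switching between the two styles is mostly a matter of where the pattern is passed.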

Basic Pattern Matching

python code snippet start

import re

text = "The year 2024 was great, but 2025 will be better!"

# Find first match
match = re.search(r'\d{4}', text)
if match:
    print(match.group())  # 2024

# Find all matches  
years = re.findall(r'\d{4}', text)
print(years)  # ['2024', '2025']

# Check if string starts with pattern
if re.match(r'The', text):
    print("Starts with 'The'")

python code snippet end
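The difference between match() and search() is easy to miss: match() only looks at the start of the string, while search() scans the whole string. A minimal sketch with a made-up sample string, also showing fullmatch(), which requires the entire string to match:

```python
import re

sample = "Year 2024"  # illustrative input

# match() anchors at the start, so the digits are not found here
at_start = re.match(r'\d{4}', sample)
print(at_start)  # None

# search() scans the whole string for the first match
anywhere = re.search(r'\d{4}', sample)
print(anywhere.group())  # 2024

# fullmatch() succeeds only if the entire string matches the pattern
whole_ok = re.fullmatch(r'\d{4}', "2024") is not None
whole_bad = re.fullmatch(r'\d{4}', "2024!") is not None
print(whole_ok, whole_bad)  # True False
```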

Common Patterns

python code snippet start

import re

# Email validation (basic; real-world addresses can be more varied)
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
email = "user@example.com"
if re.match(email_pattern, email):
    print("Valid email")

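# Phone number extraction (area code may be wrapped in parentheses)
text = "Call me at 555-123-4567 or (555) 987-6543"
phones = re.findall(r'\(?\d{3}\)?[-\s]?\d{3}-\d{4}', text)
print(phones)  # ['555-123-4567', '(555) 987-6543']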

python code snippet end

Text Substitution

python code snippet start

import re

# Replace patterns (case-insensitive; the replacement is inserted as written)
text = "Hello world, hello universe"
new_text = re.sub(r'hello', 'hi', text, flags=re.IGNORECASE)
print(new_text)  # hi world, hi universe

# Strip HTML tags (naive; does not handle every HTML edge case)
html = "<p>Hello</p><div>World</div>"
text_only = re.sub(r'<[^>]+>', '', html)
print(text_only)  # HelloWorld

python code snippet end
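Capture groups can also be referenced in the replacement, either by number in the replacement string or through a function that receives the match object. A minimal sketch, where the sample strings and the `double` helper are made up for illustration:

```python
import re

# Reorder a date using numbered group backreferences (sample data is made up)
date = "2024-03-15"
us_style = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', date)
print(us_style)  # 03/15/2024

# A function can compute each replacement from the match object
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```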

The re module transforms complex text processing tasks into concise, readable pattern-based operations.

Regular expressions complement Python's built-in string methods and pair naturally with file I/O for text analysis and data extraction.

Reference: Python RE Module Documentation