Difflib Module

December 5, 2025

TL;DR

The difflib module provides tools for comparing sequences (especially text strings) and generating difference reports in various formats. It can find the similarity between strings, produce unified or context diffs like Unix diff tools, and identify close matches from a list of possibilities.

Interesting!

The get_close_matches() function makes fuzzy string matching trivially easy, which is perfect for “did you mean?” suggestions in command-line tools or for fixing typos.

Finding Close Matches

The simplest entry point is get_close_matches(), which finds similar strings from a list:

from difflib import get_close_matches

words = ['apple', 'banana', 'apricot', 'avocado', 'grape']
possibilities = get_close_matches('aple', words, n=3, cutoff=0.6)
print(possibilities)
# Output: ['apple', 'grape']

# Great for command suggestions
valid_commands = ['start', 'stop', 'restart', 'status']
user_input = 'stat'
suggestions = get_close_matches(user_input, valid_commands)
if suggestions:
    print(f"Did you mean: {suggestions[0]}?")
    # Output: Did you mean: start?
else:
    print("No suggestions found")

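The cutoff parameter (default 0.6) sets the minimum similarity score a candidate must reach, from 0.0 (accept anything) to 1.0 (identical strings only). The n parameter (default 3) caps how many matches are returned, best match first.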
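
To see how cutoff and n interact, here is a small sketch that reuses the fruit list from above. The specific cutoff values (0.8 and 0.3) are picked only for illustration:

```python
from difflib import get_close_matches

words = ['apple', 'banana', 'apricot', 'avocado', 'grape']

# A strict cutoff keeps only very close candidates
strict = get_close_matches('aple', words, cutoff=0.8)
print(strict)   # ['apple']

# A loose cutoff admits weaker candidates, best match first
loose = get_close_matches('aple', words, cutoff=0.3)
print(loose)    # ['apple', 'grape', 'apricot']

# n caps the number of results, however many pass the cutoff
top_one = get_close_matches('aple', words, n=1, cutoff=0.3)
print(top_one)  # ['apple']
```

For user-facing suggestions, a higher cutoff usually means fewer but more plausible hints; the right value depends on how long and how similar your candidate strings are.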

Measuring Similarity

SequenceMatcher calculates how similar two sequences are:

from difflib import SequenceMatcher

def similarity_ratio(str1, str2):
    return SequenceMatcher(None, str1, str2).ratio()

print(similarity_ratio('hello world', 'hello there'))  # 0.6363...
print(similarity_ratio('Python', 'Python'))            # 1.0
print(similarity_ratio('Python', 'Java'))              # 0.0

# Works with any sequence
list1 = [1, 2, 3, 4, 5]
list2 = [1, 2, 4, 5, 6]
print(SequenceMatcher(None, list1, list2).ratio())    # 0.8

The ratio() method returns a value between 0 and 1, where higher values indicate greater similarity. If you only need to reject poor candidates quickly, quick_ratio() and real_quick_ratio() are cheaper to compute and return upper bounds on ratio() rather than the exact value.
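
As a rough illustration of where the number comes from, the sketch below recomputes ratio() by hand as 2*M/T, where M is the number of matched elements and T is the combined length of both sequences, and then compares it with the cheaper upper-bound variants. The variable names are illustrative only:

```python
from difflib import SequenceMatcher

sm = SequenceMatcher(None, 'abcd', 'bcde')

# M = total size of all matching blocks, T = combined length of both inputs
matched = sum(block.size for block in sm.get_matching_blocks())
total = len('abcd') + len('bcde')
print(2 * matched / total)     # 0.75
print(sm.ratio())              # 0.75

# The cheaper variants never fall below ratio()
print(sm.quick_ratio())        # 0.75
print(sm.real_quick_ratio())   # 1.0
```

Checking the cheap bounds first and calling ratio() only when they pass the threshold is a common way to keep large comparisons fast.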

Generating Unified Diffs

Create Unix-style unified diffs between two lists of lines, such as the contents of two text files:

from difflib import unified_diff

original = ['Line 1\n', 'Line 2\n', 'Line 3\n', 'Line 4\n']
modified = ['Line 1\n', 'Line 2 modified\n', 'Line 3\n', 'Line 5\n']

diff = unified_diff(original, modified,
                   fromfile='original.txt',
                   tofile='modified.txt')

for line in diff:
    print(line, end='')

Output:

--- original.txt
+++ modified.txt
@@ -1,4 +1,4 @@
 Line 1
-Line 2
+Line 2 modified
 Line 3
-Line 4
+Line 5

This is the same unified format that diff -u produces, so the output is familiar to anyone who reads version control diffs and is well suited to generating patches.
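
When the input lines come from str.splitlines() and therefore have no trailing newlines, passing lineterm='' keeps the header lines consistent with the content lines, and n controls how many lines of context surround each change. A minimal sketch, with made-up file names and text:

```python
from difflib import unified_diff

# Hypothetical sample text, used only for illustration
old_text = "alpha\nbeta\ngamma\ndelta\n"
new_text = "alpha\nbeta\ngamma changed\ndelta\n"

diff_lines = list(unified_diff(
    old_text.splitlines(),   # lines without trailing newlines
    new_text.splitlines(),
    fromfile='old.txt',
    tofile='new.txt',
    n=1,                     # one line of context instead of the default 3
    lineterm='',             # do not append newlines to the header lines
))

print('\n'.join(diff_lines))
# --- old.txt
# +++ new.txt
# @@ -2,3 +2,3 @@
#  beta
# -gamma
# +gamma changed
#  delta
```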

Line-by-Line Comparison

For more detailed comparison with intra-line changes highlighted:

from difflib import Differ

d = Differ()
text1 = ['Hello world\n', '''Python is great isn't it!\n''']
text2 = ['Hello world\n', '''Python is fabo isn't it!\n''']

result = list(d.compare(text1, text2))
for line in result:
    print(repr(line))

Output shows '- ' for removed lines, '+ ' for added lines, '  ' (two spaces) for unchanged lines, and '? ' for guide lines that mark intra-line changes and appear in neither input:

'  Hello world\n'
"- Python is great isn't it!\n"
'?           ^^^ ^\n'
"+ Python is fabo isn't it!\n"
'?           ^ ^^\n'
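
Because every line in a Differ delta carries a two-character prefix, the delta holds enough information to rebuild either input. The sketch below uses difflib.restore() for that; the variable names are illustrative:

```python
from difflib import Differ, restore

text1 = ['Hello world\n', "Python is great isn't it!\n"]
text2 = ['Hello world\n', "Python is fabo isn't it!\n"]

delta = list(Differ().compare(text1, text2))

# which=1 recovers the first input, which=2 recovers the second
recovered_first = list(restore(delta, 1))
recovered_second = list(restore(delta, 2))

print(recovered_first == text1)    # True
print(recovered_second == text2)   # True
```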

HTML Output

For web applications, HtmlDiff generates side-by-side HTML comparison tables:

from difflib import HtmlDiff

html_diff = HtmlDiff()
text1_lines = ['Line 1', 'Line 2', 'Line 3']
text2_lines = ['Line 1', 'Modified Line 2', 'Line 3']

# Generate complete HTML document with built-in styling
html_doc = html_diff.make_file(text1_lines, text2_lines,
                               fromdesc='Original',
                               todesc='Modified')

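# Write to file; make_file() declares a utf-8 charset by default,
# so open the file with the same encoding
with open('diff_output.html', 'w', encoding='utf-8') as f:
    f.write(html_doc)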

# Note: make_table() is also available for embedding, but requires
# you to provide CSS styling to match the diff table classes

The make_file() method produces a complete HTML document with all necessary styling, while make_table() gives you just the table for embedding (but you’ll need to add CSS for the diff highlighting). See an HTML diff example generated with HtmlDiff.make_file().
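
For the embedding case, a sketch along these lines may help. The stylesheet and page wrapper are hypothetical; the class names (diff, diff_header, diff_add, diff_chg, diff_sub) are the ones used by the stylesheet that make_file() embeds, so it is worth checking them against the output of your Python version:

```python
from difflib import HtmlDiff

text1_lines = ['Line 1', 'Line 2', 'Line 3']
text2_lines = ['Line 1', 'Modified Line 2', 'Line 3']

# make_table() returns only the <table> markup, with no <html> wrapper or CSS
table_html = HtmlDiff().make_table(text1_lines, text2_lines,
                                   fromdesc='Original',
                                   todesc='Modified')

# Hypothetical minimal stylesheet; the colours are arbitrary choices
css = """
table.diff { font-family: monospace; border: 1px solid #999; }
.diff_header { background-color: #e0e0e0; }
.diff_add { background-color: #aaffaa; }
.diff_chg { background-color: #ffff77; }
.diff_sub { background-color: #ffaaaa; }
"""

# Embed the table in a page of your own
page = f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><style>{css}</style></head>
<body>
<h1>Changes</h1>
{table_html}
</body>
</html>"""

print('<html' in table_html)   # False: the table is a fragment, not a document
```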

The difflib module simplifies sequence comparison, combining programmatic tools for measuring similarity with standardised diff output that shows changes to people in a familiar format.

Also see: the string module provides the building blocks for text manipulation, while regular expressions offer pattern-based text processing. For formatting text output, check out the textwrap module.

Reference: difflib - Helpers for computing deltas
