Brad's PyNotes

Built-in Types: Str, List, Dict, and Set

TL;DR

Python’s built-in types (str, list, dict, set) come packed with dozens of methods for manipulation, searching, and transformation. Dictionaries have preserved insertion order since Python 3.7. Lesser-known features like dict.setdefault(), str.removeprefix(), and set operations on dictionary views can simplify common patterns.

Interesting!

Dictionary views support set operations. You can apply intersection (&), union (|), and difference (-) directly to dict.keys() without converting to sets first. These find the keys two dictionaries share, all keys across both, or the keys found in only one. Each result is an ordinary set.

python code snippet start

d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'b': 4, 'c': 5, 'd': 6}

# Common keys
d1.keys() & d2.keys()  # {'b', 'c'}

# All keys
d1.keys() | d2.keys()  # {'a', 'b', 'c', 'd'}

# Keys only in d1
d1.keys() - d2.keys()  # {'a'}

python code snippet end
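
The same operators also work on dict.items() views, as long as every value is hashable, because each item is a (key, value) tuple. Below is a minimal sketch using two made-up config dictionaries. It separates unchanged pairs from keys whose values differ:

```python
old_config = {'host': 'localhost', 'port': 8080, 'debug': True}
new_config = {'host': 'localhost', 'port': 9090, 'workers': 4}

# Pairs present with the same value in both dicts
unchanged = old_config.items() & new_config.items()  # {('host', 'localhost')}

# Keys present in both dicts whose values differ
changed = {k for k in old_config.keys() & new_config.keys()
           if old_config[k] != new_config[k]}         # {'port'}

# Keys that were dropped or added
removed = old_config.keys() - new_config.keys()       # {'debug'}
added = new_config.keys() - old_config.keys()         # {'workers'}
```

Items views only behave like sets when the values are hashable. With unhashable values such as lists, these operations can raise TypeError, so the key-based comprehension is the safer fallback.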

String Power Methods

Beyond basic slicing and concatenation, strings have specialized methods for common tasks:

python code snippet start

# Remove prefix/suffix (Python 3.9+)
url = "https://example.com"
url.removeprefix("https://")  # "example.com"

filename = "report.pdf"
filename.removesuffix(".pdf")  # "report"

# Partition - split into exactly 3 parts
email = "user@example.com"
user, sep, domain = email.partition("@")
# user='user', sep='@', domain='example.com'

# Split on line boundaries (\n, \r\n, \r, and other Unicode line breaks)
text = "line1\nline2\r\nline3"
text.splitlines()  # ['line1', 'line2', 'line3']

# Zero-fill numbers for fixed-width formatting
"42".zfill(5)      # "00042"
"-42".zfill(5)     # "-0042" - sign aware

# Center/justify text
"Title".center(20, "=")  # "=======Title========"
"Left".ljust(10, ".")    # "Left......"

python code snippet end
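
A common reason to prefer removeprefix() over lstrip() is that the strip methods take a set of characters, not a prefix, so they can silently over-strip. Here is a short comparison (the URL is just an example), plus how partition() behaves when the separator is missing:

```python
url = "https://test.com"

# lstrip() removes any leading characters found in "https:/",
# so it also eats the "t" at the start of "test"
url.lstrip("https://")        # 'est.com'

# removeprefix() removes the exact prefix, at most once
url.removeprefix("https://")  # 'test.com'

# partition() always returns a 3-tuple; sep is '' when not found
host, sep, port = "localhost".partition(":")
# host='localhost', sep='', port=''

# rpartition() splits on the last occurrence instead
"archive.tar.gz".rpartition(".")  # ('archive.tar', '.', 'gz')
```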

String formatting has evolved through several generations. F-strings arrived in Python 3.6, and Python 3.8 added the = specifier for self-documenting debug output:

python code snippet start

name = "Alice"
age = 30

# Debug format - includes variable name
print(f"{name=}, {age=}")  # name='Alice', age=30

python code snippet end
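
The = specifier also combines with arbitrary expressions, format specs, and conversions, and it keeps any whitespace you write around the =. A few variations, with made-up values:

```python
price = 19.987
name = "Alice"
age = 30

f"{price=:.2f}"   # 'price=19.99'  (a format spec replaces the default repr())
f"{name=!s}"      # 'name=Alice'   (!s forces str() instead of repr())
f"{age + 1=}"     # 'age + 1=31'   (any expression works)
f"{age = }"       # 'age = 30'     (whitespace around = is preserved)
```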

Unicode Strings

Python 3 strings are Unicode by default, with specialized methods for international text handling.

Case folding for comparisons: The casefold() method performs Unicode case folding for case-insensitive string matching. Unlike lower(), which performs simple lowercase conversion, casefold() applies the more aggressive case mappings defined in the Unicode standard. For example, it expands German ß to “ss” (one character becoming two) and maps Greek final sigma ς to σ. Use casefold() when comparing user input, search terms, or any international text where case should be ignored.

python code snippet start

# casefold() handles Unicode case folding rules
"Maße".casefold()  # "masse" (ß becomes ss)
"Maße".lower()     # "maße" (ß unchanged)

# Critical for case-insensitive matching
"Straße".casefold() == "STRASSE".casefold()  # True
"Straße".lower() == "STRASSE".lower()        # False

# Greek example
"ΣΊΣΥΦΟΣ".casefold() == "σίσυφος".casefold()  # True

python code snippet end
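
casefold() alone does not equate strings that differ only in Unicode normalization. One example is a precomposed é versus e followed by a combining accent. The Unicode HOWTO suggests combining casefold() with unicodedata.normalize() for this case. Below is a minimal sketch; caseless_equal() is a hypothetical helper name, not a standard function:

```python
import unicodedata

def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: case- and normalization-insensitive comparison."""
    def nfd(s: str) -> str:
        return unicodedata.normalize("NFD", s)
    # Normalize, fold case, then normalize again, because folding can
    # produce characters that need re-normalizing
    return nfd(nfd(a).casefold()) == nfd(nfd(b).casefold())

composed = "Caf\u00e9"     # 'Café' with a single é code point
decomposed = "CAFE\u0301"  # 'CAFÉ' spelled as E + combining acute accent

composed.casefold() == decomposed.casefold()  # False
caseless_equal(composed, decomposed)          # True
```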

Encoding and decoding: Strings encode to bytes, bytes decode to strings.

python code snippet start

# Encode to different formats
text = "Hello, 世界"
text.encode('utf-8')      # b'Hello, \xe4\xb8\x96\xe7\x95\x8c'
text.encode('utf-16')     # b'\xff\xfeH\x00e\x00...' - BOM, then native byte order (little-endian shown)

# Decode with error handling
data = b'caf\xe9'
data.decode('latin-1')              # "café"
data.decode('utf-8', errors='ignore')    # "caf" - skip invalid
data.decode('utf-8', errors='replace')   # "caf�" - replacement char

python code snippet end
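
There are two related details worth knowing. First, len() of a string counts code points, while the encoded length counts bytes. Second, decoding without an errors argument uses 'strict' mode, which raises UnicodeDecodeError. The 'backslashreplace' handler is a handy middle ground when you want to see the offending bytes:

```python
text = "Hello, 世界"
len(text)                  # 9 code points
len(text.encode("utf-8"))  # 13 bytes (each of these CJK characters takes 3)

data = b'caf\xe9'
try:
    data.decode("utf-8")   # errors='strict' is the default
except UnicodeDecodeError as exc:
    bad_position = exc.start  # 3: index of the first undecodable byte

data.decode("utf-8", errors="backslashreplace")  # 'caf\\xe9'
```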

Unicode character access: Every character has a code point accessible via ord() and chr().

python code snippet start

# Character to code point
ord('A')      # 65
ord('€')      # 8364
ord('🐍')     # 128013

# Code point to character
chr(65)       # 'A'
chr(128013)   # '🐍'

# Useful for character ranges
"".join(chr(i) for i in range(0x1F600, 0x1F610))  # Emoji range

python code snippet end
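
The unicodedata module complements ord() and chr() with official character names. Below is a small sketch; describe() is a hypothetical helper, not part of the standard library:

```python
import unicodedata

def describe(ch: str) -> str:
    """Hypothetical helper: format a character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

describe("€")   # 'U+20AC EURO SIGN'
describe("🐍")  # 'U+1F40D SNAKE'

# Going the other way: look a character up by its name
unicodedata.lookup("SNAKE")  # '🐍'
"\N{EURO SIGN}"              # '€' (named escape inside a string literal)
```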

Reference: Unicode HOWTO - Python Documentation

List Methods and Pitfalls

Lists provide methods that modify them in place. Key distinction: these in-place methods return None, not the modified list.

python code snippet start

numbers = [3, 1, 4, 1, 5]

# In-place operations return None
result = numbers.sort()  # result is None
print(numbers)           # [1, 1, 3, 4, 5]

# Use sorted() for a new list
numbers = [3, 1, 4, 1, 5]
result = sorted(numbers)  # result is [1, 1, 3, 4, 5]

python code snippet end
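
The same None-returning rule applies to append(), extend(), reverse(), and the other mutating methods, so assigning their result is a common bug. sort() and sorted() also share the key and reverse parameters. Below is a short sketch with example data:

```python
numbers = [3, 1, 4, 1, 5]

# Bug: append() mutates the list and returns None,
# so this rebinds numbers to None
numbers = numbers.append(9)

words = ["banana", "apple", "Cherry"]

sorted(words)                    # ['Cherry', 'apple', 'banana'] (uppercase sorts first)
sorted(words, key=str.casefold)  # ['apple', 'banana', 'Cherry']

# In place, longest first; equal lengths keep their original order
words.sort(key=len, reverse=True)
# words is now ['banana', 'Cherry', 'apple']

# reversed() returns an iterator, not a list
list(reversed(words))            # ['apple', 'Cherry', 'banana']
```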

Common pitfall with list repetition:

python code snippet start

# Wrong - all sublists are the same object
matrix = [[]] * 3
matrix[0].append(1)
# Result: [[1], [1], [1]]

# Correct - create separate lists
matrix = [[] for _ in range(3)]
matrix[0].append(1)
# Result: [[1], [], []]

python code snippet end
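
An identity check makes the difference visible. Repetition with * is still fine for immutable elements like numbers. Assigning to one slot rebinds that slot rather than mutating a shared object:

```python
shared = [[]] * 3
separate = [[] for _ in range(3)]

shared[0] is shared[1]      # True: one list, three references
separate[0] is separate[1]  # False: three distinct lists

# Safe: ints are immutable, so assignment replaces just one slot
row = [0] * 3
row[0] = 5
# row is [5, 0, 0]

# For grids, use a comprehension for the outer list;
# [[0] * 3] * 2 would make both rows the same object
grid = [[0] * 3 for _ in range(2)]
grid[0][0] = 1
# grid is [[1, 0, 0], [0, 0, 0]]
```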

Dictionary Convenience Methods

The setdefault() method combines get-or-create logic in one call:

python code snippet start

# Without setdefault
counts = {}
for word in ["apple", "banana", "apple"]:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1

# With setdefault
counts = {}
for word in ["apple", "banana", "apple"]:
    counts[word] = counts.setdefault(word, 0) + 1

python code snippet end
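
For plain counting, counts.get(word, 0) + 1 arguably reads more naturally, since nothing needs to be stored before the assignment. setdefault() shines when the default is a mutable container you append to, as in the grouping sketch below (the word list is just example data). Note that the default argument is evaluated on every call, even when the key already exists:

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Group words by first letter: setdefault() returns the existing list,
# or inserts the new empty list and returns it
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Counting with get(): no placeholder value is stored first
counts = {}
for word in ["apple", "banana", "apple"]:
    counts[word] = counts.get(word, 0) + 1
# {'apple': 2, 'banana': 1}
```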

Dictionary merging became cleaner in Python 3.9:

python code snippet start

defaults = {'color': 'blue', 'size': 'medium'}
custom = {'size': 'large', 'style': 'bold'}

# Merge operator (values from the right operand win)
merged = defaults | custom
# {'color': 'blue', 'size': 'large', 'style': 'bold'}

# In-place merge
defaults |= custom

python code snippet end
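
Before 3.9, the usual spelling was unpacking both dicts into a new one. The operators also differ in what they accept. | requires both operands to be dicts, while |= accepts any mapping or iterable of key/value pairs, just like update(). Below is a sketch reusing the example dictionaries:

```python
defaults = {'color': 'blue', 'size': 'medium'}
custom = {'size': 'large', 'style': 'bold'}

# Pre-3.9 equivalent of defaults | custom; neither input is modified
merged = {**defaults, **custom}

settings = dict(defaults)
settings |= [('size', 'small')]  # in-place merge accepts key/value pairs
# settings == {'color': 'blue', 'size': 'small'}

try:
    defaults | [('size', 'small')]
except TypeError:
    pass  # | only works between two dicts
```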

Since Python 3.7, dictionaries are guaranteed to preserve insertion order. The popitem() method relies on this, removing and returning the most recently added item first (LIFO order):

python code snippet start

d = {"first": 1, "second": 2, "third": 3}
d.popitem()  # ('third', 3) - most recently added
d.popitem()  # ('second', 2)

python code snippet end
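
Order is set by first insertion only. Assigning a new value to an existing key keeps its position, so moving a key to the end means popping and reinserting it. Also, popitem() on an empty dict raises KeyError. A short sketch:

```python
d = {"first": 1, "second": 2, "third": 3}

d["first"] = 10               # update in place; position unchanged
list(d)                       # ['first', 'second', 'third']

d["first"] = d.pop("first")   # remove, then reinsert at the end
list(d)                       # ['second', 'third', 'first']
d.popitem()                   # ('first', 10)

empty = {}
try:
    empty.popitem()
except KeyError:
    pass  # nothing left to pop
```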

Set Operations

Sets excel at membership testing and eliminating duplicates. They support mathematical set operations with intuitive operators:

python code snippet start

evens = {2, 4, 6, 8}
primes = {2, 3, 5, 7}

evens | primes    # Union: {2, 3, 4, 5, 6, 7, 8}
evens & primes    # Intersection: {2}
evens - primes    # Difference: {4, 6, 8}
evens ^ primes    # Symmetric difference: {3, 4, 5, 6, 7, 8}

# Test relationships
{1, 2} <= {1, 2, 3}     # Subset: True
{1, 2} < {1, 2}         # Proper subset: False
evens.isdisjoint({1, 3, 5})  # No overlap: True

python code snippet end
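
The named methods behind these operators accept any iterable, while the operators themselves require sets (or frozensets) on both sides. There is also a related trick. Because dicts keep insertion order, dict.fromkeys() deduplicates a list while keeping first-seen order, which a plain set() does not promise. The values below are only examples:

```python
evens = {2, 4, 6, 8}

evens.union([1, 3])          # {1, 2, 3, 4, 6, 8}: methods take any iterable
evens.issubset(range(10))    # True

try:
    evens | [1, 3]           # operators need a set on both sides
except TypeError:
    pass

# Order-preserving deduplication
items = [3, 1, 3, 2, 1]
list(dict.fromkeys(items))   # [3, 1, 2]

# frozenset is hashable, so it can serve as a dict key or set member
edges = {frozenset({"a", "b"}): 1.5}
edges[frozenset({"b", "a"})]  # 1.5
```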

Safe removal:

python code snippet start

s = {1, 2, 3}

try:
    s.remove(4)  # Raises KeyError: 4 is not in the set
except KeyError:
    pass
s.discard(4)   # No error, silent no-op

python code snippet end

Lesser-Known Type Features

String translation for character mapping:

python code snippet start

# Form 1: Two equal-length strings (character-to-character mapping)
trans = str.maketrans('aeiou', '12345')
'hello world'.translate(trans)  # 'h2ll4 w4rld'

# Form 2: Three arguments (mapping + deletion)
# Third argument lists characters to delete; deletion wins over the
# mapping, so 'o' is removed rather than mapped to '4'
trans = str.maketrans('aeiou', '12345', 'world')
'hello world'.translate(trans)  # 'h2 '

# Form 3: Dictionary mapping (most flexible)
# Maps ordinals or characters to ordinals/strings/None
trans = str.maketrans({
    'h': 'H',           # Character to character
    ord('e'): '3',      # Ordinal to string
    'o': None,          # Character to None (delete)
    108: 'L'            # Ordinal (for 'l') to character
})
'hello world'.translate(trans)  # 'H3LL wrLd'

python code snippet end
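
A common use of the deletion form is stripping punctuation in a single pass, with the string module supplying the characters. Below is a short sketch; the sample sentence is arbitrary:

```python
import string

# Empty from/to strings, so the table only deletes characters
no_punct = str.maketrans("", "", string.punctuation)
"Hello, world! (It's 3:00.)".translate(no_punct)  # 'Hello world Its 300'

# The table is a plain dict keyed by ordinals, so build it once and reuse it
no_punct[ord("!")]  # None
```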

Numeric type conversions:

python code snippet start

# Float to exact rational representation
(3.14).as_integer_ratio()  # (7070651414971679, 2251799813685248)

# Bit manipulation on integers
(42).bit_length()   # 6 (needs 6 bits)
(42).bit_count()    # 3 (three 1-bits) - Python 3.10+

# Hexadecimal round-trip for exact float storage
h = (3.14159).hex()         # '0x1.921f9f01b866ep+1'
float.fromhex(h)            # 3.14159

python code snippet end
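
as_integer_ratio() exposes the exact binary value, which is why the float 3.14 is not exactly 314/100. The fractions module makes this easy to check. Fraction accepts either a float (giving its exact binary value) or a decimal string (giving the exact decimal value):

```python
from fractions import Fraction

num, den = (3.14).as_integer_ratio()

Fraction(3.14) == Fraction(num, den)  # True: same exact binary value
Fraction(3.14) == Fraction(314, 100)  # False: 3.14 is not exactly representable
Fraction("3.14")                      # Fraction(157, 50): exact decimal value
(den & (den - 1)) == 0                # True: the denominator is a power of two
```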

Bytes to hex conversion:

python code snippet start

data = b'\xde\xad\xbe\xef'
data.hex()                  # 'deadbeef'
bytes.fromhex('deadbeef')   # b'\xde\xad\xbe\xef'

# With separators (Python 3.8+)
data.hex(' ')               # 'de ad be ef'
data.hex(':', 2)            # 'dead:beef'

python code snippet end
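
bytes.fromhex() skips whitespace, so spaced-out hex dumps parse directly. When you need the numeric value, int.from_bytes() and int.to_bytes() convert between bytes and integers. Below is a sketch with the same four bytes:

```python
data = b'\xde\xad\xbe\xef'

bytes.fromhex('de ad be ef') == data          # True: spaces are ignored

value = int.from_bytes(data, 'big')           # 3735928559, i.e. 0xdeadbeef
value.to_bytes(4, 'big') == data              # True: round trip
int.from_bytes(data, 'little') == 0xefbeadde  # True: byte order matters
```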

Python’s built-in types form the foundation for everyday data manipulation. Understanding their methods and operators lets you write more concise and efficient code.

For specialized container types beyond these basics, check out the collections module with defaultdict, Counter, and deque. If you’re working with data structures more generally, the data structures tutorial provides comprehensive coverage. Dictionary comprehensions are explored in detail in the PEP 274 article.

Reference: Built-in Types - Python Documentation