gc Module - Garbage Collection Control
TL;DR
The gc module provides an interface to Python’s garbage collector, allowing you to manually trigger collection, disable automatic collection, debug memory leaks, and tune performance. It’s particularly useful for finding reference cycles, optimizing memory usage in long-running processes, and understanding what objects are consuming memory.
Interesting!
Python’s garbage collector uses a generation-based system that can detect and clean up reference cycles that normal reference counting can’t handle. You can actually inspect every single object being tracked by the garbage collector using gc.get_objects(), which is incredibly powerful for debugging mysterious memory leaks.
Manual Collection Control
You can take control of when garbage collection happens:
python code snippet start
import gc

# Disable automatic collection
gc.disable()

# Create reference cycles that need GC to clean up
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

# Create a circular linked list and then abandon it
def create_cycle():
    nodes = [Node(i) for i in range(1000)]
    for i in range(len(nodes)):
        nodes[i].next = nodes[(i + 1) % len(nodes)]
    # When this function returns, nodes becomes unreachable
    # but the circular references keep objects alive

create_cycle()  # Creates 1000 nodes with circular references

# Manually trigger collection when you're ready
collected = gc.collect()
print(f"Collected {collected} objects")  # Will show ~1000+ objects

# Re-enable automatic collection
gc.enable()
python code snippet end
The collect() function returns the number of objects collected. You can also specify which generation to collect: gc.collect(0) for young generation only, or gc.collect(2) (the default) for a full collection.
Understanding Generations
Python uses a two-generation system as of Python 3.14: generation 0 (young) and generation 2 (old). However, gc.collect() still accepts a single generation argument of 0, 1, or 2 to control collection behavior:
python code snippet start
import gc
# Different collection modes
gc.collect(0) # Collect young generation only
gc.collect(1) # Collect young + increment old generation (partial)
gc.collect(2) # Full collection of all generations (default)
gc.collect() # Same as gc.collect(2)
# Check collection thresholds
print(gc.get_threshold()) # (700, 10, 10)
# Check current collection counts
print(gc.get_count()) # (121, 0, 0) - allocations since last collection
# See which objects are in which generation
young_objects = gc.get_objects(generation=0)
old_objects = gc.get_objects(generation=2)
print(f"Young: {len(young_objects)}, Old: {len(old_objects)}")
# Note: gc.get_objects(generation=1) returns an empty list (no generation 1)
python code snippet end
Generation 0 holds new objects, while generation 2 holds objects that survived previous collections. A collection of the young generation starts once the number of allocations (minus deallocations) since the last collection exceeds the first threshold of 700. The threshold of 10 for generation 2 means roughly 1% of old objects are scanned per collection.
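To watch the counters move, here is a small experiment; the exact numbers depend on your interpreter version and on whatever else the program has allocated:
python code snippet start
import gc

gc.collect()                     # start from a clean slate
print(gc.get_count())            # e.g. (5, 0, 0)

junk = [[] for _ in range(300)]  # every new list bumps the generation-0 counter
print(gc.get_count())            # the first number has grown by roughly 300

threshold0 = gc.get_threshold()[0]
print(f"A young-generation pass runs once that counter passes {threshold0}")
python code snippet end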
Debugging Memory Leaks
The gc module helps debug memory issues by saving objects to gc.garbage:
python code snippet start
import gc

# DEBUG_SAVEALL saves all collected objects to gc.garbage
gc.set_debug(gc.DEBUG_SAVEALL)

def create_cycle():
    a = []
    b = [a]
    a.append(b)  # Reference cycle!
    return  # Cycles become unreachable

create_cycle()

# Force collection
collected = gc.collect()

# With DEBUG_SAVEALL, gc.garbage contains ALL collected objects
if gc.garbage:
    print(f"Found {len(gc.garbage)} collected objects")
    for obj in gc.garbage:
        print(type(obj), id(obj))
    gc.garbage.clear()  # Clear for next test

# Turn off debugging
gc.set_debug(0)
python code snippet end
The DEBUG_LEAK flag combines DEBUG_COLLECTABLE, DEBUG_UNCOLLECTABLE, and DEBUG_SAVEALL. Note that DEBUG_SAVEALL saves ALL collected objects, not just uncollectable ones. Truly uncollectable objects are rare in modern Python: since PEP 442 (Python 3.4), cycles involving objects with __del__ methods are collected normally, so uncollectable garbage mostly comes from C extension types with legacy finalizers.
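If you just want a quick overview of what each collection is doing, the other debug flags help too. A short sketch (reusing create_cycle from the snippet above) that prints per-collection statistics and each collectable object found to stderr:
python code snippet start
import gc

# DEBUG_STATS prints timing and per-generation statistics for every collection,
# DEBUG_COLLECTABLE reports each collectable object the collector finds
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_COLLECTABLE)

create_cycle()
gc.collect()

gc.set_debug(0)  # switch debugging back off
python code snippet end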
Finding Object References
When tracking down memory leaks, you can discover what’s holding references to your objects:
python code snippet start
import gc

class MyClass:
    pass

obj = MyClass()
container = [obj]

# Find what references this object
referrers = gc.get_referrers(obj)
print(f"Found {len(referrers)} referrers")  # Will include 'container'

# Find what this object references
referents = gc.get_referents(container)
print(f"Found {len(referents)} referents")  # Will include 'obj'
python code snippet end
Note the warning in the documentation: objects returned by get_referrers() might be in a temporarily invalid state during construction, so use this for debugging only.
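In practice the raw get_referrers() result is noisy: module globals (a dict) and any active frames that hold your object also show up. A small filtering sketch, using the obj and container names from the snippet above:
python code snippet start
import gc

# Keep only plain lists; module globals and frames also hold references,
# but they are rarely what you're hunting when tracking a leak
holders = [r for r in gc.get_referrers(obj) if isinstance(r, list)]
print(holders)  # the 'container' list shows up here
python code snippet end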
Inspecting All Tracked Objects
You can get a snapshot of all objects being tracked:
python code snippet start
import gc

# Get all tracked objects
all_objects = gc.get_objects()
print(f"Tracking {len(all_objects)} objects")

# Find all instances of a specific type (MyClass from the previous snippet)
my_objects = [obj for obj in gc.get_objects()
              if isinstance(obj, MyClass)]
python code snippet end
This is incredibly useful for finding memory leaks in production systems - you can see exactly what types of objects are accumulating.
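A quick way to see that is to group the tracked objects by type and compare snapshots taken some time apart; the sketch below just uses collections.Counter from the standard library:
python code snippet start
import gc
from collections import Counter

# Count live tracked objects by type; diff two of these snapshots
# to see which types keep accumulating
by_type = Counter(type(o).__name__ for o in gc.get_objects())
for name, count in by_type.most_common(10):
    print(f"{name:25} {count}")
python code snippet end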
What Gets Tracked?
Not all objects need garbage collection tracking:
python code snippet start
import gc

# Immutable atoms aren't tracked
print(gc.is_tracked(42))         # False
print(gc.is_tracked("hello"))    # False

# Containers are tracked when they could take part in a cycle
print(gc.is_tracked([]))         # True
print(gc.is_tracked({}))         # False (nothing trackable inside yet)
print(gc.is_tracked({"a": 1}))   # False (still only untracked values)
print(gc.is_tracked({"a": []}))  # True (now holds a tracked container)
python code snippet end
Python only tracks objects that could potentially be part of reference cycles - primarily containers like lists, dicts, sets, and custom objects with instance dictionaries. As an optimization, CPython leaves containers untracked while they hold nothing that itself needs tracking (as with the dicts above), and it can untrack some containers again during collection.
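Tuples show that second behavior: they start out tracked, and a later collection can untrack them if they hold only atomic values. A small experiment, with the caveat that exact results can differ between CPython versions:
python code snippet start
import gc

t = tuple([1, 2, 3])     # built at runtime so it isn't a compile-time constant
print(gc.is_tracked(t))  # typically True right after creation
gc.collect()
print(gc.is_tracked(t))  # typically False: the GC untracked a tuple of atoms
python code snippet end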
Collection Callbacks
You can register callbacks to run before and after garbage collection:
python code snippet start
import gc

def gc_callback(phase, info):
    if phase == "start":
        print("GC starting...")
    elif phase == "stop":
        print(f"Collected: {info['collected']}")
        print(f"Uncollectable: {info['uncollectable']}")

gc.callbacks.append(gc_callback)

# Now collections will trigger your callback
gc.collect()
python code snippet end
This is useful for monitoring GC activity in production and understanding its performance impact.
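For example, here is a rough sketch of a monitoring hook that accumulates how much wall-clock time collections cost; timing_callback and stats are local names, not part of the gc API:
python code snippet start
import gc
import time

_starts = {}
stats = {"runs": 0, "seconds": 0.0}

def timing_callback(phase, info):
    # info["generation"] is present for both the "start" and "stop" phases
    if phase == "start":
        _starts[info["generation"]] = time.perf_counter()
    elif phase == "stop":
        begun = _starts.pop(info["generation"], None)
        if begun is not None:
            stats["runs"] += 1
            stats["seconds"] += time.perf_counter() - begun

gc.callbacks.append(timing_callback)
gc.collect()
gc.callbacks.remove(timing_callback)
print(stats)
python code snippet end
Anything left in gc.callbacks runs on every collection for the rest of the process, so remove monitoring hooks once you have the numbers you need.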
Performance Tuning
For long-running applications, you might want to tune collection frequency:
python code snippet start
import gc
# More aggressive young generation collection
gc.set_threshold(500, 5) # Collect more frequently
# Or disable for critical sections
gc.disable()
# ... time-critical code ...
gc.enable()
python code snippet end
Disabling GC during performance-critical sections can provide significant speedups if you know you’re not creating reference cycles.
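If you pause collection around critical sections, wrap the disable/enable pair so automatic collection is restored even when the section raises. The gc_paused helper below is our own sketch, not part of the gc module:
python code snippet start
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    """Temporarily pause automatic collection, then restore the previous state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

with gc_paused():
    # ... time-critical code, e.g. a large allocation burst ...
    chunks = [bytes(1024) for _ in range(10_000)]
python code snippet end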
Fork Optimization
When using multiprocessing, you can optimize memory usage across forked processes:
python code snippet start
import gc
import os

# In parent process before fork
gc.disable()
gc.freeze()  # Mark current objects as permanent

# Fork creates child processes
pid = os.fork()
if pid == 0:  # Child process
    gc.enable()  # Only track new objects
    # ... child process work ...
python code snippet end
This prevents the garbage collector from scanning shared memory that won’t change, improving performance.
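Once the workers are running, the parent can undo the freeze: gc.unfreeze() moves the frozen objects back into the oldest generation, and gc.get_freeze_count() reports how many objects are currently parked in the permanent generation. A sketch of the parent-side cleanup:
python code snippet start
import gc

# In the parent, after the children have been forked
print(gc.get_freeze_count())  # objects sitting in the permanent generation
gc.unfreeze()                 # move them back into the oldest generation
gc.enable()                   # resume automatic collection in the parent
python code snippet end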
The gc module gives you fine-grained control over Python’s memory management. While most applications can rely on automatic garbage collection, understanding and using this module becomes essential for debugging memory leaks, optimizing long-running processes, and building high-performance systems.
The sys module also provides runtime inspection capabilities, while the gc module focuses specifically on memory management and garbage collection.
Reference: gc - Garbage Collector interface