Statistics Module
TL;DR
The statistics module provides mathematical statistics functions for calculating averages, spread measures, and correlations without external dependencies.
Interesting!
The statistics module is not limited to floats. It also accepts Decimal and Fraction values, and functions such as mean, median and variance return results of the same type, so statistical calculations can stay exact!
Central Tendency
python code snippet start
import statistics
data = [1, 2, 2, 3, 4, 4, 4, 5, 6]
# Different types of averages
print(statistics.mean(data)) # 3.444... (arithmetic mean)
print(statistics.median(data)) # 4 (middle value)
print(statistics.mode(data)) # 4 (most frequent)
print(statistics.harmonic_mean(data)) # 2.608... (n / sum of reciprocals)
print(statistics.geometric_mean(data)) # 3.052... (nth root of the product)
# Handle even-length datasets
even_data = [1, 2, 3, 4]
print(statistics.median(even_data)) # 2.5 (average of middle two)
python code snippet end
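The harmonic and geometric means are easier to trust once you see their definitions. The sketch below recomputes both from the textbook formulas (n divided by the sum of reciprocals, and the exponential of the mean logarithm) and compares them with the module's results. The helper names are hypothetical and exist only for this illustration.

```python
import math
import statistics

def harmonic_mean_by_hand(values):
    # Hypothetical helper: n / sum(1/x), assumes all values are positive
    return len(values) / sum(1 / x for x in values)

def geometric_mean_by_hand(values):
    # Hypothetical helper: exp(mean(log x)), i.e. the nth root of the product
    return math.exp(sum(math.log(x) for x in values) / len(values))

data = [1, 2, 2, 3, 4, 4, 4, 5, 6]

hm = harmonic_mean_by_hand(data)
gm = geometric_mean_by_hand(data)

print(f"harmonic:  {hm:.3f} vs {statistics.harmonic_mean(data):.3f}")
print(f"geometric: {gm:.3f} vs {statistics.geometric_mean(data):.3f}")
```

For positive data the three averages always satisfy harmonic <= geometric <= arithmetic, which is why 2.608... < 3.052... < 3.444... here.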
Spread and Variability
python code snippet start
import statistics
test_scores = [82, 85, 78, 92, 87, 83, 91, 79, 86, 84]
# Sample statistics: divide by n - 1 (use when the data is a sample of a larger group)
sample_std = statistics.stdev(test_scores) # Sample standard deviation
sample_var = statistics.variance(test_scores) # Sample variance
# Population statistics: divide by n (use when the data is the entire population)
pop_std = statistics.pstdev(test_scores) # Population standard deviation
pop_var = statistics.pvariance(test_scores) # Population variance
print(f"Sample StdDev: {sample_std:.2f}") # Sample StdDev: 4.57
print(f"Population StdDev: {pop_std:.2f}") # Population StdDev: 4.34
python code snippet end
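The sample and population versions differ only in the divisor applied to the sum of squared deviations from the mean. Here is a sketch that redoes both by hand with exact Fraction arithmetic and compares the results with the module. The variable names are illustrative.

```python
import statistics
from fractions import Fraction

test_scores = [82, 85, 78, 92, 87, 83, 91, 79, 86, 84]

n = len(test_scores)
xbar = Fraction(sum(test_scores), n)  # exact mean (84.7)

# Sum of squared deviations from the mean
ss = sum((x - xbar) ** 2 for x in test_scores)

sample_var = ss / (n - 1)  # Bessel's correction: divide by n - 1
pop_var = ss / n           # population: divide by n

print(float(sample_var), statistics.variance(test_scores))
print(float(pop_var), statistics.pvariance(test_scores))
```

Because the sample version divides by a smaller number, it is always larger than the population version, by a factor of n / (n - 1).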
Working with Different Number Types
python code snippet start
import statistics
from decimal import Decimal
from fractions import Fraction
# Exact decimal calculations
decimal_data = [Decimal('1.1'), Decimal('2.2'), Decimal('3.3')]
exact_mean = statistics.mean(decimal_data) # Decimal('2.2') - exact!
# Fraction calculations
fraction_data = [Fraction(1, 3), Fraction(2, 3), Fraction(1, 2)]
fraction_mean = statistics.mean(fraction_data) # Fraction(1, 2) - exact!
# int, float and Fraction can be mixed; the result is a float
mixed_data = [1, 2.5, Fraction(4, 1)]
print(statistics.mean(mixed_data)) # 2.5
# Decimal mixes only with int: adding a float or Fraction raises TypeError
python code snippet end
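Two behaviors are worth knowing when you choose number types. With Fraction input, variance and pvariance also return exact Fraction results. Decimal values combine with int, but mixing Decimal with float makes the module raise TypeError instead of guessing a common type. This is a small sketch, and the variable names are illustrative.

```python
import statistics
from decimal import Decimal
from fractions import Fraction

fraction_data = [Fraction(1, 3), Fraction(2, 3), Fraction(1, 2)]

# Exact spread measures: no rounding error at all
var_exact = statistics.variance(fraction_data)
pvar_exact = statistics.pvariance(fraction_data)
print(var_exact, pvar_exact)  # 1/36 1/54

# Decimal + int is fine and stays Decimal
mixed_ok = statistics.mean([1, Decimal("2.5")])
print(repr(mixed_ok))  # Decimal('1.75')

# Decimal + float is rejected
try:
    statistics.mean([Decimal("1.5"), 2.5])
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```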
Correlation and Regression
python code snippet start
import statistics
# Correlation between two datasets (correlation and linear_regression need Python 3.10+)
heights = [60, 62, 64, 66, 68, 70]
weights = [115, 125, 135, 145, 155, 165]
correlation = statistics.correlation(heights, weights)
print(f"Correlation: {correlation:.3f}") # 1.000 (perfectly linear, strong positive)
# Linear regression (ordinary least squares)
slope, intercept = statistics.linear_regression(heights, weights)
print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}") # Slope: 5.00, Intercept: -185.00
# Predict weight for a 72-inch height
predicted_weight = slope * 72 + intercept
print(f"Predicted weight: {predicted_weight:.1f}") # 175.0
python code snippet end
Probability Distributions
python code snippet start
from statistics import NormalDist
# Create normal distribution
grades = NormalDist(75, 12) # mean=75, stdev=12
# Calculate probabilities
prob_pass = 1 - grades.cdf(60) # P(grade > 60)
prob_a_grade = 1 - grades.cdf(90) # P(grade > 90)
print(f"Pass rate: {prob_pass:.1%}") # Pass rate: 89.4%
print(f"A grade rate: {prob_a_grade:.1%}") # A grade rate: 10.6%
# Generate 1,000 random samples (the seed makes the run reproducible)
samples = grades.samples(1000, seed=42)
python code snippet end
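Behind cdf is the standard closed form of the normal CDF, Phi(x) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2)))). Its inverse, inv_cdf, converts a probability back into a value. The sketch below evaluates that formula with math.erf, checks it against NormalDist.cdf, and finds the 90th-percentile grade. The normal_cdf helper is hypothetical and written only for this comparison.

```python
import math
from statistics import NormalDist

mu, sigma = 75, 12
grades = NormalDist(mu, sigma)

def normal_cdf(x, mu, sigma):
    # Hypothetical helper: closed-form normal CDF via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

for cutoff in (60, 90):
    print(cutoff, normal_cdf(cutoff, mu, sigma), grades.cdf(cutoff))

# inv_cdf undoes cdf: the grade below which 90% of students fall
p90 = grades.inv_cdf(0.90)
print(f"90th percentile: {p90:.1f}")
```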
The statistics module is perfect for quick, calculator-level statistical analysis in pure Python, without needing NumPy or pandas. Combine it with the decimal module when you need exact decimal arithmetic and with the random module when you need sample data. For real datasets, feed it values loaded from CSV files or collected from performance measurements.
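Feeding CSV data into the module takes only the csv module and a list comprehension. The sketch below assumes a hypothetical two-column file with a numeric score column. To keep the example self-contained, it reads made-up text through io.StringIO. With a real file you would pass open(path, newline="") instead.

```python
import csv
import io
import statistics

# Hypothetical CSV content standing in for a real file
csv_text = """name,score
alice,82
bob,85
carol,78
dave,92
"""

reader = csv.DictReader(io.StringIO(csv_text))
# CSV fields arrive as strings, so convert them before computing statistics
scores = [int(row["score"]) for row in reader]

print(f"mean={statistics.mean(scores):.2f}")
print(f"median={statistics.median(scores)}")
print(f"stdev={statistics.stdev(scores):.2f}")
```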