Brad's PyNotes

Statistics Module

TL;DR

The statistics module is part of the standard library and provides functions for calculating averages, measures of spread, and correlation with no external dependencies.

Interesting!

Besides int and float, the statistics module accepts Decimal and Fraction data, so results such as the mean can be computed exactly instead of with floating-point rounding!

Central Tendency

python code snippet start

import statistics

data = [1, 2, 2, 3, 4, 4, 4, 5, 6]

# Different types of averages
print(statistics.mean(data))           # 3.444... (arithmetic mean)
print(statistics.median(data))         # 4 (middle value)
print(statistics.mode(data))           # 4 (most frequent)
print(statistics.harmonic_mean(data))  # 2.608... (reciprocal average)
print(statistics.geometric_mean(data)) # 3.05... (product-based)

# Handle even-length datasets
even_data = [1, 2, 3, 4]
print(statistics.median(even_data))    # 2.5 (average of middle two)

python code snippet end
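The examples above each have one clear answer. As a small sketch of the less tidy cases, `statistics.multimode` (Python 3.8+) returns every most-frequent value, and `median_low` / `median_high` return an actual data point when the length is even. The `tied` list is made up for illustration.

```python
import statistics

# Hypothetical data with two equally frequent values (2 and 4)
tied = [1, 2, 2, 3, 4, 4]

# multimode() returns all modes, in order of first appearance
modes = statistics.multimode(tied)

# For even-length data, median() averages the two middle values, while
# median_low() and median_high() return one of those two data points
even_data = [1, 2, 3, 4]
low = statistics.median_low(even_data)
high = statistics.median_high(even_data)

print(modes)      # [2, 4]
print(low, high)  # 2 3
```

These variants are handy when the result has to be a value that really occurs in the data, such as a discrete rating.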

Spread and Variability

python code snippet start

test_scores = [82, 85, 78, 92, 87, 83, 91, 79, 86, 84]

# Sample statistics (default)
sample_std = statistics.stdev(test_scores)     # Sample standard deviation
sample_var = statistics.variance(test_scores)  # Sample variance

# Population statistics
pop_std = statistics.pstdev(test_scores)       # Population standard deviation
pop_var = statistics.pvariance(test_scores)    # Population variance

print(f"Sample StdDev: {sample_std:.2f}")
print(f"Population StdDev: {pop_std:.2f}")

python code snippet end
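The sample and population versions differ only in the divisor: n - 1 for a sample, n for a population. A minimal sketch that recomputes both variances by hand and compares them with the module's results (the `manual_*` names are only for illustration):

```python
import math
import statistics

test_scores = [82, 85, 78, 92, 87, 83, 91, 79, 86, 84]

n = len(test_scores)
mean = statistics.mean(test_scores)

# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in test_scores)

manual_pop_var = ss / n           # population: divide by n
manual_sample_var = ss / (n - 1)  # sample: divide by n - 1 (Bessel's correction)

print(math.isclose(manual_pop_var, statistics.pvariance(test_scores)))    # True
print(math.isclose(manual_sample_var, statistics.variance(test_scores)))  # True
```

Use the sample functions when the data is a subset of a larger group, and the population functions when it covers the whole group.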

Working with Different Number Types

python code snippet start

from decimal import Decimal
from fractions import Fraction

# Exact decimal calculations
decimal_data = [Decimal('1.1'), Decimal('2.2'), Decimal('3.3')]
exact_mean = statistics.mean(decimal_data)  # Decimal('2.2') - exact!

# Fraction calculations
fraction_data = [Fraction(1, 3), Fraction(2, 3), Fraction(1, 2)]
fraction_mean = statistics.mean(fraction_data)  # Fraction(1, 2) - exact!

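# Mixing types is not supported: combining Decimal with float or Fraction
# raises TypeError in CPython, so convert to a single type first
mixed_data = [1, 2.5, Decimal('3.1'), Fraction(4, 1)]
uniform_data = [Fraction(x) for x in mixed_data]
print(statistics.mean(uniform_data))  # 53/20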

python code snippet end
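To make the type handling concrete, here is a short sketch contrasting `mean`, which returns the same type as its input, with `fmean` (Python 3.8+), which always converts to float. It reuses the data from the snippet above.

```python
import statistics
from decimal import Decimal
from fractions import Fraction

decimal_data = [Decimal('1.1'), Decimal('2.2'), Decimal('3.3')]
fraction_data = [Fraction(1, 3), Fraction(2, 3), Fraction(1, 2)]

# mean() keeps the input type, so the result stays exact
print(type(statistics.mean(decimal_data)).__name__)   # Decimal
print(type(statistics.mean(fraction_data)).__name__)  # Fraction

# fmean() converts everything to float: faster, but no longer exact
fast_mean = statistics.fmean(decimal_data)
print(type(fast_mean).__name__)  # float
```

Choosing between the two is a trade-off: `mean` for exactness, `fmean` when speed matters more than the last digit.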

Correlation and Regression

python code snippet start

# Correlation between two datasets
# (correlation() and linear_regression() require Python 3.10+)
heights = [60, 62, 64, 66, 68, 70]
weights = [115, 125, 135, 145, 155, 165]

correlation = statistics.correlation(heights, weights)
print(f"Correlation: {correlation:.3f}")  # 1.000 (this sample data is perfectly linear)

# Linear regression
slope, intercept = statistics.linear_regression(heights, weights)
print(f"Slope: {slope:.2f}, Intercept: {intercept:.2f}")

# Predict weight for 72" height
predicted_weight = slope * 72 + intercept

python code snippet end

Probability Distributions

python code snippet start

import random
from statistics import NormalDist

# Create normal distribution
grades = NormalDist(75, 12)  # mean=75, stdev=12

# Calculate probabilities
prob_pass = 1 - grades.cdf(60)        # P(grade > 60)
prob_a_grade = 1 - grades.cdf(90)     # P(grade > 90)

print(f"Pass rate: {prob_pass:.1%}")
print(f"A grade rate: {prob_a_grade:.1%}")

# Generate samples by inverse transform sampling (grades.samples(1000) also works)
samples = [grades.inv_cdf(random.random()) for _ in range(1000)]

python code snippet end
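A few other `NormalDist` operations are worth a quick sketch: the probability of a range, the cutoff for a given percentile, and fitting a distribution to observed data with `NormalDist.from_samples`. The grade distribution uses the same assumed parameters as above, and the test scores are reused from the spread section.

```python
from statistics import NormalDist

grades = NormalDist(75, 12)  # same assumed mean and stdev as above

# Probability of a grade between 60 and 90
prob_between = grades.cdf(90) - grades.cdf(60)

# Grade needed to land in the top 10% (inverse of the CDF)
top_10_cutoff = grades.inv_cdf(0.90)

# Estimate the parameters from observed data instead of assuming them
test_scores = [82, 85, 78, 92, 87, 83, 91, 79, 86, 84]
fitted = NormalDist.from_samples(test_scores)

print(f"Between 60 and 90: {prob_between:.1%}")
print(f"Top 10% cutoff: {top_10_cutoff:.1f}")
print(f"Fitted mean: {fitted.mean:.1f}, stdev: {fitted.stdev:.2f}")
```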

The statistics module is perfect for quick, calculator-level analysis in pure Python, with no need for NumPy or pandas! Use it with decimal precision for exact calculations and with random data generation for sampling. For data collection, it combines well with CSV data processing and performance measurement analysis. When optimizing algorithms, priority queue algorithms offer efficient data structures to pair with it.

Reference: statistics — Mathematical statistics functions