Density Estimation API
The density module provides smooth density estimation via Poisson P-splines on histogram counts.
density_estimate
Estimate a smooth density from raw data via Poisson P-spline on histogram counts.
Parameters:
- x (array-like): Raw data values.
- bins (int, default: 100): Number of histogram bins. Narrow bins work well (Eilers & Marx 2021, §3.3).
- xl (float, default: None): Left boundary for the histogram and the B-spline domain. Defaults to min(x). Set carefully for bounded data (e.g., 0 for times).
- xr (float, default: None): Right boundary for the histogram and the B-spline domain. Defaults to max(x).
- nseg (int, default: 20): Number of B-spline segments.
- degree (int, default: 3): B-spline degree.
- penalty_order (int, default: 3): Order d of the difference penalty. d=3 preserves the mean and variance of the raw histogram.
- lambda_ (float, default: None): Fixed smoothing parameter. If None, it is selected via AIC.
- lambda_bounds (tuple, default: (1e-6, 1e6)): Bounds for the AIC search when lambda_ is None.
Returns:
-
DensityResult–Density estimation result with grid, density, and fitted PSpline.
Source code in src/psplines/density.py
DensityResult
Result of density estimation.
Attributes:
- grid (NDArray): Evaluation points (bin midpoints).
- density (NDArray): Normalized smooth density values (integrates to ~1).
- mu (NDArray): Raw fitted counts (before normalization).
- lambda_ (float): Smoothing parameter used.
- pspline (PSpline): The underlying fitted PSpline object.
- bin_width (float): Width of histogram bins.
Usage Examples
Basic Density Estimation
import numpy as np
from psplines import density_estimate
# Generate sample data
np.random.seed(42)
data = np.random.normal(0, 1, 500)
# Estimate density (AIC-optimal lambda)
result = density_estimate(data, bins=100)
# Plot
import matplotlib.pyplot as plt
plt.plot(result.grid, result.density)
plt.xlabel("x")
plt.ylabel("Density")
plt.title("Smooth Density Estimate")
plt.show()
Bimodal Distribution
# Mixture of two Gaussians
data = np.concatenate([
    np.random.normal(-2, 0.5, 300),
    np.random.normal(2, 0.8, 200),
])
result = density_estimate(data, bins=100, penalty_order=3)
# result.density integrates to ~1
# (np.trapz is deprecated since NumPy 2.0; use np.trapezoid there)
print(f"Integral: {np.trapz(result.density, result.grid):.4f}")
Fixed Smoothing Parameter
# Use a specific lambda instead of AIC selection
result = density_estimate(data, bins=80, lambda_=10.0)
Custom Domain
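A minimal sketch of what the domain arguments change, assuming (per the parameter descriptions) that the histogram uses equal-width bins on [xl, xr] and that grid holds their midpoints. The helper bin_grid is hypothetical and written in plain Python for illustration; it is not part of psplines.

```python
import random

def bin_grid(x, bins, xl=None, xr=None):
    """Hypothetical helper: equal-width bins on [xl, xr].

    Returns (midpoints, counts, bin_width). Boundaries default to the
    data range, mirroring the documented defaults of xl and xr.
    """
    xl = min(x) if xl is None else xl
    xr = max(x) if xr is None else xr
    width = (xr - xl) / bins
    counts = [0] * bins
    for v in x:
        if xl <= v <= xr:
            # A value equal to xr is placed in the last bin
            j = min(int((v - xl) / width), bins - 1)
            counts[j] += 1
    mids = [xl + (j + 0.5) * width for j in range(bins)]
    return mids, counts, width

random.seed(0)
# Waiting times: bounded below by 0, but the smallest observation is > 0
times = [random.expovariate(1.0) for _ in range(500)]

# Default domain: the grid starts just above min(times), not at 0
mids_default, counts_default, width_default = bin_grid(times, bins=50)

# Explicit left boundary: the first bin now starts exactly at 0
mids_zero, counts_zero, width_zero = bin_grid(times, bins=50, xl=0.0)

print(f"first midpoint, default domain: {mids_default[0]:.4f}")
print(f"first midpoint, xl=0:           {mids_zero[0]:.4f}")

# With psplines, the matching call would be:
#   result = density_estimate(times, bins=50, xl=0.0)
```

With the default domain the estimate carries no information about the interval between 0 and the smallest observation, which is why the parameter description recommends setting xl explicitly for bounded data.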
Method
The density estimation procedure (Eilers & Marx 2021, §3.3):
- Bin the raw data into a histogram with `bins` equally spaced bins
- Fit a Poisson P-spline to the bin counts (log link)
- Select \(\lambda\) via AIC (or use a fixed value)
- Normalize the fitted counts to integrate to 1
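The procedure can be sketched end to end in plain Python. This is an illustrative reimplementation under stated assumptions, not the code in src/psplines/density.py: the smoothing parameter is fixed (the AIC search is omitted), the Poisson fit uses standard iteratively reweighted least squares started from log(count + 1), and normalization uses the midpoint rule. All function names (bspline_basis, diff_matrix, solve, poisson_pspline_density) are hypothetical.

```python
import math
import random

def bspline_basis(x, xl, xr, nseg, degree):
    """Equally spaced B-spline basis via Cox-de Boor; one row per x value."""
    dx = (xr - xl) / nseg
    knots = [xl + (j - degree) * dx for j in range(nseg + 2 * degree + 1)]
    rows = []
    for xi in x:
        # Degree-0 indicators, then raise the degree one step at a time
        b = [1.0 if knots[j] <= xi < knots[j + 1] else 0.0
             for j in range(len(knots) - 1)]
        for k in range(1, degree + 1):
            b = [(xi - knots[j]) / (k * dx) * b[j]
                 + (knots[j + k + 1] - xi) / (k * dx) * b[j + 1]
                 for j in range(len(b) - 1)]
        rows.append(b)
    return rows

def diff_matrix(n, order):
    """Difference matrix of the given order acting on n coefficients."""
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        D = [[D[i + 1][j] - D[i][j] for j in range(n)]
             for i in range(len(D) - 1)]
    return D

def solve(A, b):
    """Solve A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * a[k] for k in range(r + 1, n))
        a[r] = (M[r][n] - s) / M[r][r]
    return a

def poisson_pspline_density(x, bins=50, nseg=20, degree=3, order=3, lam=10.0):
    # Histogram on [min(x), max(x)] with equal-width bins
    xl, xr = min(x), max(x)
    width = (xr - xl) / bins
    mids = [xl + (j + 0.5) * width for j in range(bins)]
    y = [0] * bins
    for v in x:
        y[min(int((v - xl) / width), bins - 1)] += 1
    # Penalized Poisson regression with log link, fitted by IRLS
    B = bspline_basis(mids, xl, xr, nseg, degree)
    n = nseg + degree
    D = diff_matrix(n, order)
    P = [[lam * sum(row[i] * row[j] for row in D) for j in range(n)]
         for i in range(n)]
    eta = [math.log(c + 1.0) for c in y]
    a = [0.0] * n
    for _ in range(100):
        mu = [math.exp(e) for e in eta]
        z = [e + (c - m) / m for e, c, m in zip(eta, y, mu)]  # working response
        A = [[P[i][j] + sum(m * r[i] * r[j] for m, r in zip(mu, B))
              for j in range(n)] for i in range(n)]
        rhs = [sum(m * r[i] * zi for m, r, zi in zip(mu, B, z))
               for i in range(n)]
        a_new = solve(A, rhs)
        eta = [sum(bi * ai for bi, ai in zip(r, a_new)) for r in B]
        converged = max(abs(p - q) for p, q in zip(a_new, a)) < 1e-8
        a = a_new
        if converged:
            break
    mu = [math.exp(e) for e in eta]
    # Normalize fitted counts so the density integrates to 1 (midpoint rule)
    total = sum(mu) * width
    density = [m / total for m in mu]
    return mids, y, mu, density, width

random.seed(42)
data = [random.gauss(0.0, 1.0) for _ in range(500)]
mids, counts, mu, density, width = poisson_pspline_density(data)
integral = sum(density) * width
print(f"Integral: {integral:.4f}")
```

The dense pure-Python linear algebra is only there to keep the sketch dependency-free; a NumPy version would replace solve and the nested sums with matrix operations.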
Using penalty_order=3 (default) preserves the mean and variance of the raw histogram
(conservation of moments: a penalty of order \(d\) preserves moments up to order \(d-1\)).
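A short sketch of why this holds. The notation is an assumption introduced here, not taken from the library source: \(u_i\) are the bin midpoints, \(y_i\) the bin counts, \(B\) the B-spline basis evaluated at the midpoints, \(\alpha\) the coefficients, and \(D_d\) the difference matrix of order \(d\). The penalized Poisson log-likelihood is

\[
\ell_P(\alpha) = \sum_i \left( y_i \eta_i - \mu_i \right) - \frac{\lambda}{2} \lVert D_d \alpha \rVert^2,
\qquad \eta = B\alpha, \quad \mu_i = e^{\eta_i}.
\]

Setting its gradient to zero gives

\[
B^\top (y - \mu) = \lambda D_d^\top D_d \alpha .
\]

For \(k < d\) (and \(k\) no larger than the B-spline degree), the power \(u^k\) can be written as \(u^k = B c_k\) with coefficients \(c_k\) that are a polynomial of degree \(k\) in the index, so \(D_d c_k = 0\). Multiplying the equation above by \(c_k^\top\) yields

\[
\sum_i u_i^k \,(y_i - \mu_i) = c_k^\top B^\top (y - \mu) = \lambda \,(D_d c_k)^\top D_d \alpha = 0 .
\]

With \(d = 3\) this covers \(k = 0, 1, 2\): the fitted counts \(\mu\) have the same total, mean, and variance as the histogram, whatever the value of \(\lambda\).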
DensityResult Attributes
| Attribute | Description |
|---|---|
| grid | Bin midpoints (evaluation points) |
| density | Normalized density values (integrates to ~1) |
| mu | Fitted Poisson counts (before normalization) |
| lambda_ | Selected (or fixed) smoothing parameter |
| pspline | The underlying fitted PSpline object |
| bin_width | Width of each histogram bin |