Penalty Matrices API¶
The penalty module provides functions for constructing difference penalty matrices used in P-spline smoothing.
Functions¶
difference_matrix¶
Create a sparse difference matrix of shape (n − order) × n.
Parameters:
- n (int) – Length of the coefficient vector α.
- order (int, default: 2) – Order d of the finite difference (must be >= 0).
Returns:
- D (csr_matrix) – Sparse (n − order) × n matrix implementing d-th order differences. If order=0, returns the identity; if order>=n, returns an empty matrix.
Source code in src/psplines/penalty.py
asymmetric_penalty_matrix¶
Build the asymmetric penalty D'VD for a shape constraint.
Given the current coefficient vector alpha, this constructs
P_shape = D_d' V D_d,
where D_d is the d-th order difference matrix and V = diag(v) with v_j = 1 when the constraint is violated and 0 otherwise. Multiplied by a large κ and added to the normal equations, this drives the solution toward the constraint.
Parameters:
- alpha (1-D array, length n) – Current coefficient vector.
- constraint_type (str) – One of "increasing", "decreasing", "convex", "concave", or "nonneg".
- mask (bool array, default: None) – Length n − d, where d is the difference order implied by constraint_type. When supplied, the penalty is active only where mask[j] is True (selective constraint, §8.7).
Returns:
- P_shape (csr_matrix, shape (n, n)) – Symmetric positive semi-definite penalty matrix. Must still be scaled by κ before adding to the system.
Source code in src/psplines/penalty.py
variable_penalty_matrix¶
Penalty matrix with exponentially varying weights.
Builds D' V D where V = diag(exp(γ j / m)), m = n − order, and j = 0, …, m − 1. When γ = 0 this reduces to the standard penalty D'D.
Parameters:
- n (int) – Length of the coefficient vector α.
- order (int, default: 2) – Order of the finite-difference penalty.
- gamma (float, default: 0.0) – Exponential rate. Positive → heavier penalty toward the right boundary; negative → heavier toward the left.
Returns:
- P (csr_matrix, shape (n, n)) – Weighted penalty matrix (does not include λ scaling).
Source code in src/psplines/penalty.py
adaptive_penalty_matrix¶
Penalty matrix with arbitrary per-difference weights.
Builds D' V D where V = diag(weights). This is the building block for adaptive smoothing: the caller estimates weights (one per row of D) and passes them here.
Parameters:
- n (int) – Length of the coefficient vector α.
- order (int, default: 2) – Order of the finite-difference penalty.
- weights (1-D array of length n − order, default: None) – Non-negative per-difference penalty weights. If None, all weights are 1 (standard penalty).
Returns:
- P (csr_matrix, shape (n, n)) – Weighted penalty matrix (does not include a global λ factor).
Source code in src/psplines/penalty.py
divided_difference_matrix¶
X-gap-aware sparse difference matrix using divided differences.
For uniformly spaced x this is proportional to the standard difference_matrix; for non-uniform spacing it correctly accounts for variable gaps, so that the roughness penalty is expressed in the units of x rather than index position.
The matrix is built recursively. Order 1 produces:
\[
(D_1 z)_i = \frac{z_{i+1} - z_i}{x_{i+1} - x_i}
\]
Order 2 applies a second divided difference to the order-1 result using midpoints, and so on.
Parameters:
- x (1-D array, length n) – Sorted, strictly increasing sample positions.
- order (int, default: 2) – Order of the divided difference (must be >= 1).
Returns:
- D_x (csr_matrix, shape (n − order, n)) – Sparse divided-difference operator.
Raises:
- ValueError – If order < 1, x is too short, or x is not strictly increasing.
References
Eilers (2003), "A perfect smoother", Anal. Chem. 75, 3631–3636.
Source code in src/psplines/penalty.py
Usage Examples¶
First-Order Differences¶
```python
import numpy as np
from psplines.penalty import difference_matrix

# First-order difference matrix
D1 = difference_matrix(n=5, order=1)
print(D1.toarray())
# [[-1  1  0  0  0]
#  [ 0 -1  1  0  0]
#  [ 0  0 -1  1  0]
#  [ 0  0  0 -1  1]]
```
Second-Order Differences¶
```python
from psplines.penalty import difference_matrix

# Second-order difference matrix
D2 = difference_matrix(n=5, order=2)
print(D2.toarray())
# [[ 1 -2  1  0  0]
#  [ 0  1 -2  1  0]
#  [ 0  0  1 -2  1]]
```
Higher-Order Differences¶
```python
from psplines.penalty import difference_matrix

# Third-order difference matrix
D3 = difference_matrix(n=6, order=3)
print(D3.toarray())
# [[-1  3 -3  1  0  0]
#  [ 0 -1  3 -3  1  0]
#  [ 0  0 -1  3 -3  1]]
```
Asymmetric Penalty for Shape Constraints¶
```python
import numpy as np
from psplines.penalty import asymmetric_penalty_matrix

# Coefficients that violate monotonicity (decreasing segment)
alpha = np.array([1.0, 2.0, 3.0, 2.5, 4.0, 5.0])

# Build a penalty that targets decreasing violations
P = asymmetric_penalty_matrix(alpha, "increasing")
print(P.toarray())  # non-zero only around the violated difference (j = 2)

# With a selective domain mask (only enforce on the first 3 differences)
mask = np.array([True, True, True, False, False])
P_masked = asymmetric_penalty_matrix(alpha, "increasing", mask=mask)
```
Variable (Exponential) Penalty Weights¶
```python
from psplines.penalty import variable_penalty_matrix

# Standard penalty (γ = 0)
P0 = variable_penalty_matrix(n=20, order=2, gamma=0.0)

# Heavier penalty toward the right boundary
P_right = variable_penalty_matrix(n=20, order=2, gamma=5.0)

# Heavier penalty toward the left boundary
P_left = variable_penalty_matrix(n=20, order=2, gamma=-5.0)
```
Adaptive Per-Difference Weights¶
```python
import numpy as np
from psplines.penalty import adaptive_penalty_matrix

# Per-difference weights (e.g. from a secondary smoothing pass)
weights = np.ones(18)   # n=20, order=2 → 18 differences
weights[5:10] = 0.1     # less penalty in the middle
P = adaptive_penalty_matrix(n=20, order=2, weights=weights)
```
Divided Differences for Non-Uniform Spacing¶
When data are non-uniformly spaced, the standard difference matrix treats all gaps as equal. The divided-difference matrix weights each difference by the reciprocal of the gap in \(x\), giving a roughness penalty in the natural units of the independent variable:
```python
import numpy as np
from psplines.penalty import divided_difference_matrix

# Non-uniform sample positions
x = np.array([0.0, 1.0, 4.0, 5.0, 10.0])

# First-order divided differences
D1 = divided_difference_matrix(x, order=1)
print(D1.toarray())
# [[-1.    1.    0.    0.    0.  ]
#  [ 0.   -0.33  0.33  0.    0.  ]
#  [ 0.    0.   -1.    1.    0.  ]
#  [ 0.    0.    0.   -0.2   0.2 ]]

# Second-order divided differences
D2 = divided_difference_matrix(x, order=2)
print(D2.shape)  # (3, 5)

# Key property: second divided differences annihilate linear functions
z_linear = 2.0 * x + 3.0
print(D2 @ z_linear)  # ≈ [0, 0, 0]

# For quadratics, the result is constant
z_quad = x ** 2
print(D2 @ z_quad)  # ≈ [2, 2, 2]
```
On uniformly spaced \(x\), divided_difference_matrix is proportional to the standard difference_matrix:
```python
import numpy as np
from psplines.penalty import difference_matrix, divided_difference_matrix

x_uniform = np.linspace(0, 1, 20)
h = x_uniform[1] - x_uniform[0]
D_std = difference_matrix(20, order=2)
D_div = divided_difference_matrix(x_uniform, order=2)

# D_div ≈ D_std / h²
np.allclose(D_div.toarray(), D_std.toarray() / h**2)  # True
```
Mathematical Background¶
Difference Operators¶
The \(p\)-th order difference operator \(\Delta^p\) is defined recursively:
- \(\Delta^0 \alpha_i = \alpha_i\) (identity)
- \(\Delta^1 \alpha_i = \alpha_{i+1} - \alpha_i\) (first difference)
- \(\Delta^p \alpha_i = \Delta^{p-1} \alpha_{i+1} - \Delta^{p-1} \alpha_i\)
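The recursion above can be checked numerically with NumPy's np.diff, whose n argument applies the difference operator repeatedly (a minimal sketch, independent of the library):

```python
import numpy as np

alpha = np.array([1.0, 4.0, 9.0, 16.0, 25.0])  # alpha_i = (i+1)^2

d1 = np.diff(alpha)   # first differences: [3. 5. 7. 9.]
d2 = np.diff(d1)      # recursion: difference of the first differences
print(d2)             # [2. 2. 2.], constant because alpha is quadratic

# np.diff(alpha, n=p) applies the recursion p times in one call
print(np.allclose(d2, np.diff(alpha, n=2)))  # True
```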
Matrix Representation¶
The difference matrix \(D_p\) has entries such that \((D_p \alpha)_i = \Delta^p \alpha_i\).
For first-order differences (\(p=1\)):
\[
D_1 = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 \\ 0 & -1 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 1 \end{pmatrix}
\]
For second-order differences (\(p=2\)):
\[
D_2 = \begin{pmatrix} 1 & -2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -2 & 1 \end{pmatrix}
\]
Properties¶
- Dimension: For \(n\) coefficients, \(D_p\) has size \((n-p) \times n\)
- Rank: \(D_p\) has full row rank \(n-p\)
- Sparsity: Each row has at most \(p+1\) non-zero entries
- Banded structure: All non-zeros lie within \(p+1\) diagonals
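These properties are easy to verify with a dense stand-in for \(D_p\), built by differencing the rows of the identity matrix (a sketch only; the library's difference_matrix returns the sparse equivalent):

```python
import numpy as np

n, p = 8, 2
# Dense stand-in for D_p: p-th differences of the identity's rows
D = np.diff(np.eye(n), n=p, axis=0)

print(D.shape)                   # (6, 8), i.e. (n - p) x n
print(np.linalg.matrix_rank(D))  # 6, full row rank n - p

# Null space of dimension p: polynomials of degree < p are annihilated
lin = 3.0 * np.arange(n) + 1.0
print(np.allclose(D @ lin, 0.0))  # True
```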
Penalty Interpretation¶
The penalty term \(\lambda \|D_p \alpha\|^2\) penalizes:
- \(p=1\): Large first differences (rough slopes)
- \(p=2\): Large second differences (rough curvature)
- \(p=3\): Large third differences (rough rate of curvature change)
Implementation Details¶
Sparse Storage¶
- Returns scipy.sparse.csr_matrix for efficient storage
- Memory usage: \(O((n-p) \times (p+1))\) instead of \(O((n-p) \times n)\)
- Fast matrix-vector multiplication
Numerical Properties¶
- Condition number: Increases with penalty order
- Null space: \(D_p\) has a null space of dimension \(p\)
- Regularization: The penalty \(\lambda \|D_p \alpha\|^2\) regularizes the fit
Construction Algorithm¶
The implementation uses efficient sparse matrix construction:
- Compute binomial coefficients for the difference operator
- Build row indices, column indices, and data arrays
- Construct sparse matrix in COO format
- Convert to CSR format for efficient operations
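The four steps can be sketched as follows (a hypothetical re-implementation for illustration only; the actual code lives in src/psplines/penalty.py):

```python
import numpy as np
from math import comb
from scipy.sparse import coo_matrix

def build_difference_matrix(n, order=2):
    # 1. Binomial coefficients with alternating signs, e.g. [1, -2, 1] for order 2
    coeffs = np.array([(-1.0) ** (order - k) * comb(order, k) for k in range(order + 1)])
    m = n - order
    # 2. Row indices, column indices, and data arrays for the banded structure
    rows = np.repeat(np.arange(m), order + 1)
    cols = (np.arange(m)[:, None] + np.arange(order + 1)[None, :]).ravel()
    data = np.tile(coeffs, m)
    # 3. Assemble in COO format, then 4. convert to CSR
    return coo_matrix((data, (rows, cols)), shape=(m, n)).tocsr()

D2 = build_difference_matrix(5, order=2)
print(D2.toarray())
# [[ 1. -2.  1.  0.  0.]
#  [ 0.  1. -2.  1.  0.]
#  [ 0.  0.  1. -2.  1.]]
```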
Usage in P-Splines¶
In the P-spline objective function:
\[
\min_\alpha \|y - B\alpha\|^2 + \lambda \|D_p \alpha\|^2
\]
The penalty matrix appears as \(P = \lambda D_p^T D_p\), leading to the linear system:
\[
(B^T B + \lambda D_p^T D_p)\, \alpha = B^T y
\]
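As a worked sketch, the same normal equations can be solved directly for the Whittaker special case \(B = I\) (dense stand-ins built with np.diff, not the library's solver):

```python
import numpy as np

n, p, lam = 50, 2, 100.0
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

# With B = I the system (B'B + lam D'D) alpha = B'y reduces to (I + lam D'D) alpha = y
D = np.diff(np.eye(n), n=p, axis=0)
alpha = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# The fit is smoother than the data: its second differences are much smaller
print(np.sum(np.diff(alpha, n=2) ** 2) < np.sum(np.diff(y, n=2) ** 2))  # True
```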
Shape Constraints via Asymmetric Penalties (§8.7)¶
Shape constraints (monotonicity, convexity, etc.) are enforced via the asymmetric penalty of Eilers & Marx (2021, eq. 8.14–8.15). Define a diagonal matrix \(V = \text{diag}(v)\) where \(v_j = 1\) when the \(j\)-th difference violates the constraint and \(v_j = 0\) otherwise. The shape penalty is:
\[
P_{\text{shape}} = \kappa\, D_d^T V D_d
\]
where \(d\) is the implied difference order (1 for monotonicity, 2 for convexity, 0 for non-negativity) and \(\kappa\) is a large constant (default \(10^8\)). This penalty is re-evaluated at each iteration, updating \(V\) from the current solution, until the coefficients converge.
Supported constraint types:
| Type | Difference order | Penalises |
|---|---|---|
| increasing | 1 | \(\Delta\alpha_j < 0\) |
| decreasing | 1 | \(\Delta\alpha_j > 0\) |
| convex | 2 | \(\Delta^2\alpha_j < 0\) |
| concave | 2 | \(\Delta^2\alpha_j > 0\) |
| nonneg | 0 | \(\alpha_j < 0\) |
A selective domain mask restricts the penalty to a sub-range of the coefficient indices so that the constraint applies only in a chosen region of the \(x\)-domain.
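The iteration described above can be sketched end to end for the "increasing" constraint (a toy illustration assuming \(B = I\), a dense np.diff stand-in for the difference matrices, and the default \(\kappa = 10^8\); the hypothetical helper fit_increasing is not the library's solver):

```python
import numpy as np

def fit_increasing(y, lam=10.0, kappa=1e8, max_iter=50):
    # Sketch of the asymmetric-penalty iteration for a monotone-increasing fit
    n = len(y)
    D1 = np.diff(np.eye(n), axis=0)        # d = 1 for monotonicity
    D2 = np.diff(np.eye(n), n=2, axis=0)   # smoothness penalty
    alpha = y.astype(float).copy()
    for _ in range(max_iter):
        v = (D1 @ alpha < 0).astype(float)   # v_j = 1 where Delta alpha_j < 0
        P_shape = D1.T @ np.diag(v) @ D1     # D' V D
        new = np.linalg.solve(np.eye(n) + lam * (D2.T @ D2) + kappa * P_shape, y)
        if np.allclose(new, alpha):          # V no longer changes the solution
            return new
        alpha = new
    return alpha

y = np.array([0.0, 1.0, 0.5, 2.0, 1.8, 3.0])
alpha = fit_increasing(y)
print(np.all(np.diff(alpha) >= -1e-4))  # True: the fit is numerically non-decreasing
```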
Variable and Adaptive Penalty Weights (§8.8)¶
Exponential variable weights¶
Replace the standard penalty \(D^T D\) with \(D^T V D\) where \(V = \text{diag}\!\bigl(\exp(\gamma j/m)\bigr)\), \(j = 0, \ldots, m-1\) and \(m = n - \text{order}\). This yields heavier or lighter regularisation near one boundary.
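A minimal dense sketch of this weighting, mirroring the formula (the library's variable_penalty_matrix returns the sparse version):

```python
import numpy as np

n, order, gamma = 10, 2, 2.0
D = np.diff(np.eye(n), n=order, axis=0)  # dense stand-in for the difference matrix
m = n - order

# V = diag(exp(gamma * j / m)), j = 0, ..., m-1
v = np.exp(gamma * np.arange(m) / m)
P = D.T @ np.diag(v) @ D

print(P.shape)              # (10, 10)
print(np.allclose(P, P.T))  # True: the penalty is symmetric

# gamma = 0 recovers the standard D'D penalty
v0 = np.exp(0.0 * np.arange(m) / m)
print(np.allclose(D.T @ np.diag(v0) @ D, D.T @ D))  # True
```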
Adaptive per-difference weights¶
A secondary B-spline basis of \(K_w\) segments is fitted to the log-residuals to obtain per-difference weights \(w_j\). The penalty becomes \(D^T \text{diag}(w) D\), allowing spatially varying smoothness.
Penalty Order Selection¶
- Order 1: Good for piecewise linear trends
- Order 2: Most common choice, penalizes curvature
- Order 3: For very smooth curves, penalizes jerk
- Higher orders: Rarely needed, may cause numerical issues
Divided Differences for Non-Uniform Data¶
The standard difference matrix \(D_p\) assumes equal spacing between coefficients. When applied to the Whittaker smoother (where \(B = I\) and coefficients are the data values), non-uniform spacing in \(x\) requires correcting the differences.
The divided-difference operator \(D_x\) replaces each finite difference with
\[
(D_x z)_i = \frac{z_{i+1} - z_i}{x_{i+1} - x_i},
\]
and second-order divided differences are built recursively via midpoints. The
penalty \(\lambda \|D_x z\|^2\) then measures roughness in the units of \(x\) rather
than index position. This is used automatically by
WhittakerSmoother when non-uniform spacing is detected.
See also: divided_difference_matrix.