
Optimization API

The optimize module provides functions for automatic selection of the smoothing parameter λ using various criteria.

Functions

cross_validation

Find λ that minimizes GCV = (rss/n) / (1 - edf/n)^2.

For GLM families, the deviance is used in place of the RSS.

Source code in src/psplines/optimize.py
def cross_validation(
    pspline: PSpline,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
) -> tuple[float, float]:
    """
    Find Lambda that minimizes GCV = (rss/n) / (1 - edf/n)^2.

    For GLM families, uses deviance in place of RSS.
    """

    def gcv(
        lam: float,
        coef: np.ndarray,
        rss_or_dev: float,
        Dcoef: np.ndarray,
        C: sp.spmatrix | None,
    ) -> float:
        n = np.asarray(pspline.y).size
        assert pspline.B is not None
        edf = effective_df(
            pspline.B,
            difference_matrix(pspline.B.shape[1], pspline.penalty_order),
            lam,
            W=pspline._W,
        )
        return (rss_or_dev / n) / (1 - edf / n) ** 2

    return _optimise_lambda(pspline, gcv, lambda_bounds)

aic

Find λ that minimizes AIC = n·log(dev/n) + 2·edf.

For Gaussian models dev = RSS; for GLM families it is the deviance from the family. Works for Poisson density estimation (§3.3) and binomial smoothing (§3.2).

Source code in src/psplines/optimize.py
def aic(
    pspline: PSpline,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
) -> tuple[float, float]:
    """
    Find Lambda that minimizes AIC = n*log(dev/n) + 2*edf.

    For Gaussian: dev = RSS. For GLM: deviance from family.
    Works for Poisson density estimation (§3.3) and binomial smoothing (§3.2).
    """

    def crit(
        lam: float,
        coef: np.ndarray,
        rss_or_dev: float,
        Dcoef: np.ndarray,
        C: sp.spmatrix | None,
    ) -> float:
        n = np.asarray(pspline.y).size
        assert pspline.B is not None
        edf = effective_df(
            pspline.B,
            difference_matrix(pspline.B.shape[1], pspline.penalty_order),
            lam,
            W=pspline._W,
        )
        return float(n * np.log(rss_or_dev / n) + 2 * edf)

    return _optimise_lambda(pspline, crit, lambda_bounds)

l_curve

Pick λ via maximum curvature of the L-curve. Implements a two-stage search (coarse + refine), vectorized curvature, optional smoothing, and edge-case warnings.

Parameters:

  • pspline (PSpline) –

    Fitted PSpline instance (coefficient solver ready).

  • lambda_bounds (tuple, default: (1e-06, 1000000.0) ) –

    (min, max) bounds for initial lambda grid (log-uniform).

  • num_lambda (int, default: 81 ) –

    Number of points in the initial lambda grid.

  • refine (bool, default: True ) –

    Whether to perform a second, finer search around the coarse optimum.

  • refine_factor (float, default: 10 ) –

    Factor to widen/narrow bounds for refinement around coarse lambda.

  • refine_points (int, default: 81 ) –

    Number of points in the refined grid.

  • smooth_kappa (bool, default: True ) –

    Whether to apply a 3-point moving average to curvature values.

Source code in src/psplines/optimize.py
def l_curve(
    pspline: PSpline,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
    num_lambda: int = 81,
    refine: bool = True,
    refine_factor: float = 10,
    refine_points: int = 81,
    smooth_kappa: bool = True,
) -> tuple[float, float]:
    """
    Pick lambda via maximum curvature of the L-curve.
    Implements two-stage search (coarse + refine), vectorized curvature,
    optional smoothing, and edge-case warnings.

    Parameters
    ----------
    pspline : PSpline
        Fitted PSpline instance (coefficient solver ready).
    lambda_bounds : tuple
        (min, max) bounds for initial lambda grid (log-uniform).
    num_lambda : int
        Number of points in the initial lambda grid.
    refine : bool
        Whether to perform a second, finer search around the coarse optimum.
    refine_factor : float
        Factor to widen/narrow bounds for refinement around coarse lambda.
    refine_points : int
        Number of points in the refined grid.
    smooth_kappa : bool
        Whether to apply a 3-point moving average to curvature values.
    """
    # Coarse grid search
    log_min, log_max = np.log10(lambda_bounds[0]), np.log10(lambda_bounds[1])
    grid = np.logspace(log_min, log_max, num_lambda)
    lr, lp = _sweep_lambda(pspline, grid)
    valid = np.isfinite(lr) & np.isfinite(lp)
    x, y, lamv = lp[valid], lr[valid], grid[valid]

    kappa = _parametric_curvature(x, y, smooth=smooth_kappa)
    idx = int(np.nanargmax(kappa))
    if idx < 2 or idx > len(x) - 3:
        warnings.warn(
            "L-curve optimum at boundary of grid; consider expanding lambda_bounds",
            UserWarning,
        )
    lam_corner = lamv[idx]
    kappa_corner = kappa[idx]

    # Optional refinement around coarse optimum
    if refine:
        lower = lam_corner / refine_factor
        upper = lam_corner * refine_factor
        log_l, log_u = np.log10(lower), np.log10(upper)
        grid2 = np.logspace(log_l, log_u, refine_points)
        lr2, lp2 = _sweep_lambda(pspline, grid2)
        valid2 = np.isfinite(lr2) & np.isfinite(lp2)
        x2, y2, lamv2 = lp2[valid2], lr2[valid2], grid2[valid2]

        kappa2 = _parametric_curvature(x2, y2, smooth=smooth_kappa)
        idx2 = int(np.nanargmax(kappa2))
        if idx2 < 2 or idx2 > len(x2) - 3:
            warnings.warn(
                "Refined L-curve optimum at boundary; expand refine_factor or refine_points",
                UserWarning,
            )
        lam_corner = lamv2[idx2]
        kappa_corner = kappa2[idx2]

    return _finish(pspline, lam_corner, kappa_corner)

v_curve

Pick λ via the minimum-distance point on the V-curve.

Source code in src/psplines/optimize.py
def v_curve(
    pspline: PSpline,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
    num_lambda: int = 81,
) -> tuple[float, float]:
    """
    Pick Lambda via minimum distance on V‑curve.
    """
    grid = np.logspace(
        np.log10(lambda_bounds[0]), np.log10(lambda_bounds[1]), num_lambda
    )
    lr, lp = _sweep_lambda(pspline, grid)
    valid = np.isfinite(lr) & np.isfinite(lp)
    if valid.sum() < 2:
        raise OptimizationError("Not enough V‑curve points")
    dr = np.diff(lr[valid])
    dp = np.diff(lp[valid])
    dist = np.hypot(dr, dp)
    mid = np.sqrt(grid[valid][:-1] * grid[valid][1:])
    idx = int(np.argmin(dist))
    return _finish(pspline, mid[idx], dist[idx])

variable_penalty_cv

2-D grid search for variable-penalty parameters (λ, γ).

The penalty matrix uses exponentially varying weights v_j = exp(γ j / m) (§8.8). For each (λ, γ) pair the model is solved and scored with GCV or AIC.

Parameters:

  • pspline (PSpline) –

    A fitted PSpline instance (used for basis, knots, data).

  • gamma_range ((float, float), default: (-20.0, 20.0) ) –

    Bounds for the γ grid (linear scale).

  • lambda_bounds ((float, float), default: (1e-06, 1000000.0) ) –

    Bounds for the λ grid (log₁₀ scale).

  • num_gamma (int, default: 41 ) –

    Number of γ grid points.

  • num_lambda (int, default: 41 ) –

    Number of λ grid points.

  • criterion ({'gcv', 'aic'}, default: 'gcv' ) –

    Selection criterion.

Returns:

  • best_lambda ( float ) –
  • best_gamma ( float ) –
  • best_score ( float ) –
  • scores ( ndarray, shape (num_gamma, num_lambda) ) –

    Full score surface.

Source code in src/psplines/optimize.py
def variable_penalty_cv(
    pspline: PSpline,
    gamma_range: tuple[float, float] = (-20.0, 20.0),
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
    num_gamma: int = 41,
    num_lambda: int = 41,
    criterion: str = "gcv",
) -> tuple[float, float, float, np.ndarray]:
    """
    2-D grid search for variable-penalty parameters (λ, γ).

    The penalty matrix uses exponentially varying weights
    ``v_j = exp(γ j / m)`` (§8.8).  For each (λ, γ) pair the model is
    solved and scored with GCV or AIC.

    Parameters
    ----------
    pspline : PSpline
        A *fitted* PSpline instance (used for basis, knots, data).
    gamma_range : (float, float)
        Bounds for the γ grid (linear scale).
    lambda_bounds : (float, float)
        Bounds for the λ grid (log₁₀ scale).
    num_gamma : int
        Number of γ grid points.
    num_lambda : int
        Number of λ grid points.
    criterion : {{'gcv', 'aic'}}
        Selection criterion.

    Returns
    -------
    best_lambda : float
    best_gamma : float
    best_score : float
    scores : ndarray, shape (num_gamma, num_lambda)
        Full score surface.
    """
    if criterion not in ("gcv", "aic"):
        raise OptimizationError(f"criterion must be 'gcv' or 'aic', got {criterion!r}")

    if pspline.B is None:
        pspline.fit()

    B = pspline.B
    assert B is not None
    nb = B.shape[1]
    y = np.asarray(pspline.y, dtype=float)
    n = y.size
    C = pspline._C

    fam = pspline._family_obj
    is_glm = fam is not None and not fam.is_gaussian
    user_weights: np.ndarray | None = (
        np.asarray(pspline.weights) if pspline.weights is not None else None
    )

    # Pre-compute cross-products for Gaussian
    W = pspline._W
    if not is_glm:
        if W is not None:
            BtW = B.T @ W  # type: ignore[attr-defined]
            BtB = (BtW @ B).tocsr()
            Bty = BtW @ y
        else:
            BtB = (B.T @ B).tocsr()  # type: ignore[attr-defined]
            Bty = B.T @ y  # type: ignore[attr-defined]

    gamma_grid = np.linspace(gamma_range[0], gamma_range[1], num_gamma)
    lambda_grid = np.logspace(
        np.log10(lambda_bounds[0]),
        np.log10(lambda_bounds[1]),
        num_lambda,
    )
    scores = np.full((num_gamma, num_lambda), np.inf)

    for i, gamma in enumerate(gamma_grid):
        DtD_g = variable_penalty_matrix(nb, pspline.penalty_order, gamma)
        for j, lam in enumerate(lambda_grid):
            try:
                if is_glm:
                    assert fam is not None
                    coef, mu, eta = _irls_solve(
                        B,
                        DtD_g,
                        y,
                        lam,
                        fam,
                        C,
                        user_weights=user_weights,
                        max_iter=pspline.max_iter,
                        tol=pspline.tol,
                    )
                    dev = fam.deviance(y, mu)
                    # ED with this penalty
                    Bt = B.T  # type: ignore[attr-defined]
                    if W is not None:
                        assert pspline._W is not None
                        A = (Bt @ (pspline._W @ B) + lam * DtD_g).tocsr()  # type: ignore[operator]
                        BtB_glm: NDArray = (Bt @ (pspline._W @ B)).toarray()  # type: ignore[operator, attr-defined]
                    else:
                        A = (Bt @ B + lam * DtD_g).tocsr()
                        BtB_glm = (Bt @ B).toarray()  # type: ignore[attr-defined]
                    A_dense = A.toarray()
                    edf = float(np.trace(BtB_glm @ np.linalg.inv(A_dense)))
                    rss_or_dev = dev
                else:
                    P = (lam * DtD_g).tocsr()
                    A_sys = (BtB + P).tocsr()  # type: ignore[possibly-undefined, operator]
                    coef = solve_penalized(BtB, Bty, P, C)  # type: ignore[possibly-undefined]
                    fit_vals = B @ coef
                    resid = y - fit_vals
                    if W is not None:
                        rss_or_dev = float(resid @ (W @ resid))
                    else:
                        rss_or_dev = float(np.sum(resid**2))

                    A_dense = A_sys.toarray()
                    BtB_d: NDArray = (
                        BtB.toarray()  # type: ignore[possibly-undefined, attr-defined]
                        if sp.issparse(BtB)  # type: ignore[possibly-undefined]
                        else np.asarray(BtB)  # type: ignore[possibly-undefined]
                    )
                    edf = float(np.trace(BtB_d @ np.linalg.inv(A_dense)))

                if criterion == "gcv":
                    scores[i, j] = (rss_or_dev / n) / (1 - edf / n) ** 2
                else:  # aic
                    scores[i, j] = n * np.log(max(rss_or_dev / n, 1e-300)) + 2 * edf
            except (np.linalg.LinAlgError, ValueError, RuntimeError):
                continue

    # Find best
    idx = np.unravel_index(np.argmin(scores), scores.shape)
    best_gamma = float(gamma_grid[idx[0]])
    best_lambda = float(lambda_grid[idx[1]])
    best_score = float(scores[idx])

    # Update pspline
    pspline.lambda_ = best_lambda
    pspline.penalty_gamma = best_gamma
    pspline.fit()

    return best_lambda, best_gamma, best_score, scores

Usage Examples

Basic Cross-Validation

import numpy as np
from psplines import PSpline
from psplines.optimize import cross_validation

# Create spline
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(100)
spline = PSpline(x, y, nseg=20)

# Find optimal lambda using cross-validation
best_lambda, cv_score = cross_validation(spline)
print(f"Optimal λ: {best_lambda:.6f}, CV score: {cv_score:.6f}")

# Use optimal lambda
spline.lambda_ = best_lambda
spline.fit()

AIC-Based Selection

from psplines.optimize import aic

# Find optimal lambda using AIC
best_lambda, aic_score = aic(spline)
print(f"Optimal λ: {best_lambda:.6f}, AIC: {aic_score:.6f}")

GLM Models (Poisson, Binomial)

All optimizers work with GLM P-splines. For non-Gaussian families, scoring uses the deviance instead of the RSS, and each candidate λ requires a full IRLS fit to convergence:

# Poisson P-spline with AIC-optimal lambda
counts = np.random.poisson(np.exp(np.sin(x)), 100)
spline_pois = PSpline(x, counts, nseg=20, family="poisson")
best_lambda, aic_score = aic(spline_pois)
spline_pois.lambda_ = best_lambda
spline_pois.fit()

# Binomial P-spline with cross-validation
trials = np.full(100, 20)
successes = np.random.binomial(trials, 1 / (1 + np.exp(-2 * np.sin(x))))
spline_bin = PSpline(x, successes, nseg=20, family="binomial", trials=trials)
best_lambda, cv_score = cross_validation(spline_bin)
spline_bin.lambda_ = best_lambda
spline_bin.fit()
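To see what each candidate λ costs, here is a minimal penalized IRLS sketch for a Poisson fit with log link. This is hypothetical standalone code, not the library's internal _irls_solve; the names B, D, and irls_poisson are illustrative:

```python
import numpy as np

def irls_poisson(B, y, D, lam, max_iter=50, tol=1e-8):
    """One penalized IRLS fit: Poisson deviance + lam * ||D coef||^2."""
    coef = np.zeros(B.shape[1])
    P = lam * D.T @ D
    for _ in range(max_iter):
        eta = B @ coef
        mu = np.exp(eta)                 # inverse log link
        W = mu                           # Poisson working weights
        z = eta + (y - mu) / mu          # working response
        A = B.T @ (W[:, None] * B) + P
        new = np.linalg.solve(A, B.T @ (W * z))
        if np.max(np.abs(new - coef)) < tol:
            return new
        coef = new
    return coef

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = rng.poisson(np.exp(0.5 + x))
B = np.column_stack([np.ones_like(x), x])   # toy two-column "basis"
D = np.array([[-1.0, 1.0]])                 # first-order difference penalty
coef = irls_poisson(B, y, D, lam=1e-6)
```

An optimizer such as aic repeats a fit like this for every candidate λ, which is why GLM selection is noticeably slower than the Gaussian case.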

L-Curve Method

from psplines.optimize import l_curve

# Find optimal lambda using L-curve
best_lambda, curvature = l_curve(spline)
print(f"Optimal λ: {best_lambda:.6f}, Curvature: {curvature:.6f}")

Variable Penalty Parameter Selection

When using exponentially varying penalty weights (§8.8), variable_penalty_cv performs a 2-D grid search over \((\lambda, \gamma)\):

from psplines.optimize import variable_penalty_cv

# Fit an initial spline (required for basis info)
spline = PSpline(x, y, nseg=20)
spline.fit()

# 2-D grid search for best (λ, γ) using GCV
best_lambda, best_gamma, best_score, scores = variable_penalty_cv(
    spline,
    gamma_range=(-10, 10),
    lambda_bounds=(1e-4, 1e4),
    num_gamma=41,
    num_lambda=41,
    criterion="gcv",
)
print(f"Optimal λ={best_lambda:.4f}, γ={best_gamma:.2f}")

# Apply the optimal parameters
spline.lambda_ = best_lambda
spline.penalty_gamma = best_gamma
spline.fit()

Comparing Methods

import matplotlib.pyplot as plt

from psplines.optimize import aic, cross_validation, l_curve, v_curve

# Compare different methods
methods = {
    'CV': cross_validation,
    'AIC': aic,
    'L-curve': l_curve,
    'V-curve': v_curve
}

results = {}
for name, method in methods.items():
    try:
        lambda_opt, score = method(spline)
        results[name] = lambda_opt
        print(f"{name}: λ = {lambda_opt:.6f}")
    except Exception as e:
        print(f"{name} failed: {e}")

# Plot comparison
if results:
    methods_list = list(results.keys())
    lambdas = list(results.values())

    plt.figure(figsize=(8, 5))
    plt.bar(methods_list, lambdas)
    plt.yscale('log')
    plt.ylabel('Optimal λ')
    plt.title('Comparison of Parameter Selection Methods')
    plt.show()

Mathematical Background

Cross-Validation

Generalized Cross-Validation (GCV) minimizes:

\[\text{GCV}(\lambda) = \frac{n \|y - S_\lambda y\|^2}{[n - \text{tr}(S_\lambda)]^2}\]

where \(S_\lambda = B(B^TB + \lambda D^TD)^{-1}B^T\) is the smoothing matrix.

Advantages:

  • Well-established statistical foundation
  • Good performance across a wide range of problems
  • Automatic selection without user input

Disadvantages:

  • Can be computationally expensive
  • May oversmooth in some cases
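The formula can be checked end-to-end with a few lines of plain NumPy. The sketch below is a standalone illustration, not the library's code: it uses a Whittaker-style smoother, whose hat matrix is \(S_\lambda = (I + \lambda D^TD)^{-1}\), and scans a log grid:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

n = y.size
D = np.diff(np.eye(n), n=2, axis=0)   # second-order difference matrix

best_gcv, best_lam = np.inf, None
for lam in np.logspace(-4, 4, 41):
    S = np.linalg.inv(np.eye(n) + lam * D.T @ D)   # hat matrix S_lambda
    resid = y - S @ y
    edf = np.trace(S)                              # effective df = tr(S)
    gcv = (resid @ resid / n) / (1 - edf / n) ** 2
    if gcv < best_gcv:
        best_gcv, best_lam = gcv, lam
```

Note that (rss/n) / (1 - edf/n)^2 is algebraically the same quantity as the displayed formula, with edf = tr(S_λ).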

Akaike Information Criterion (AIC)

AIC balances fit quality and model complexity:

\[\text{AIC}(\lambda) = n \log(\hat{\sigma}^2) + 2 \cdot \text{ED}(\lambda)\]

where:

  • \(\hat{\sigma}^2 = \|y - S_\lambda y\|^2 / n\) is the residual variance
  • \(\text{ED}(\lambda) = \text{tr}(S_\lambda)\) is the effective degrees of freedom

For GLM families (Poisson, binomial), the deviance replaces the residual sum of squares:

\[\text{AIC}(\lambda) = n \log(\text{Dev}(\lambda)/n) + 2 \cdot \text{ED}(\lambda)\]

Advantages:

  • Information-theoretic foundation
  • Fast computation
  • Good for model comparison
  • Works with both Gaussian and GLM families

Disadvantages:

  • May not work well at all noise levels
  • Less robust than cross-validation
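For a basis with far fewer columns than data points, the AIC trace can be computed directly. This sketch uses Gaussian bumps as a stand-in basis (a hypothetical choice; the library builds B-splines) together with a second-order difference penalty:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.15 * rng.standard_normal(x.size)

# Gaussian bumps stand in for B-splines here, plus a difference penalty
centers = np.linspace(0, 1, 15)
B = np.exp(-0.5 * ((x[:, None] - centers) / 0.08) ** 2)
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)

n, BtB, Bty = y.size, B.T @ B, B.T @ y
lams = np.logspace(-6, 6, 61)
aics = []
for lam in lams:
    A = BtB + lam * D.T @ D
    coef = np.linalg.solve(A, Bty)
    rss = float(np.sum((y - B @ coef) ** 2))
    edf = float(np.trace(BtB @ np.linalg.inv(A)))  # tr(S_lambda)
    aics.append(n * np.log(rss / n) + 2 * edf)
best_lam = lams[int(np.argmin(aics))]
```

The trace identity tr(S_λ) = tr(BᵀB · A⁻¹) avoids forming the n×n smoothing matrix, which is the same shortcut the source of variable_penalty_cv uses.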

L-Curve Method

The L-curve plots \(\log(\|D\alpha_\lambda\|^2)\) vs \(\log(\|y - B\alpha_\lambda\|^2)\) and finds the point of maximum curvature.

Curvature is computed as:

\[\kappa(\lambda) = \frac{2(\rho' \eta'' - \rho'' \eta')}{(\rho'^2 + \eta'^2)^{3/2}}\]

where \(\rho(\lambda) = \log(\|y - B\alpha_\lambda\|^2)\) and \(\eta(\lambda) = \log(\|D\alpha_\lambda\|^2)\).

Advantages:

  • Intuitive geometric interpretation
  • No statistical assumptions
  • Well suited to ill-posed problems

Disadvantages:

  • Can be sensitive to noise
  • May not have a clear corner
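The corner search can be illustrated with np.gradient on a synthetic L-shaped curve. This is a standalone sketch, not the library's _parametric_curvature; the constant factor 2 in the formula above only scales κ and does not change the location of the maximum:

```python
import numpy as np

def parametric_curvature(x, y):
    # Signed curvature of a sampled curve (x(t), y(t)); the formula is
    # invariant to a uniform reparameterization, so index spacing suffices.
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - ddx * dy) / (dx**2 + dy**2) ** 1.5

# Synthetic L-curve with a sharp corner at t = 0
t = np.linspace(-3, 3, 201)
rho = np.log10(1 + np.exp(4 * t))    # "log residual": flat, then rising
eta = np.log10(1 + np.exp(-4 * t))   # "log penalty": falling, then flat
kappa = parametric_curvature(rho, eta)
corner = int(np.nanargmax(np.abs(kappa)))   # index of maximum curvature
```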

V-Curve Method

Instead of curvature, the V-curve measures the Euclidean distance between consecutive L-curve points: plotted against λ, these segment lengths trace a V-shaped valley, and the optimal λ sits at its bottom. This avoids the second derivatives that make L-curve curvature sensitive to noise.
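Mirroring the v_curve source above, the idea can be sketched on a synthetic curve: compute the length of each segment between consecutive (log residual, log penalty) points and take the geometric midpoint of the shortest one. The curve shapes here are made up for illustration:

```python
import numpy as np

lams = np.logspace(-3, 3, 31)
lr = np.log10(1 + lams)        # synthetic log residual norm: grows with lambda
lp = np.log10(1 + 1 / lams)    # synthetic log penalty norm: shrinks with lambda

dist = np.hypot(np.diff(lr), np.diff(lp))   # V-curve: segment lengths
mid = np.sqrt(lams[:-1] * lams[1:])         # geometric midpoints of the grid
best_lam = mid[int(np.argmin(dist))]        # bottom of the "V"
```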

Variable Penalty Selection (§8.8)

For the exponential variable penalty \(P(\gamma) = D^T\text{diag}(e^{\gamma j/m})D\), variable_penalty_cv evaluates GCV (or AIC) over a 2-D grid of \((\lambda, \gamma)\) values and returns the combination that minimises the criterion:

\[(\hat\lambda, \hat\gamma) = \arg\min_{\lambda, \gamma} \text{GCV}(\lambda, \gamma)\]

This adds a single extra hyperparameter \(\gamma\) that controls the spatial distribution of the penalty weight while \(\lambda\) controls overall smoothness.
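The exponentially weighted penalty itself is easy to write down. The sketch below is a hypothetical re-implementation of the idea behind variable_penalty_matrix (the index convention j = 0, …, m−1 is an assumption); it shows that γ = 0 recovers the ordinary \(D^TD\) penalty:

```python
import numpy as np

def variable_penalty(nb, order, gamma):
    # P(gamma) = D^T diag(v) D with v_j = exp(gamma * j / m), j = 0..m-1
    D = np.diff(np.eye(nb), n=order, axis=0)
    m = D.shape[0]
    v = np.exp(gamma * np.arange(m) / m)
    return D.T @ (v[:, None] * D)

P0 = variable_penalty(10, 2, 0.0)  # gamma = 0: plain D'D
P5 = variable_penalty(10, 2, 5.0)  # gamma > 0: right-hand coefficients penalized more
```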

Implementation Details

Optimization Strategy

All methods use:

  1. Logarithmic search: candidate λ values are tested on a logarithmically spaced grid
  2. Refinement: a finer search (grid refinement or golden-section) around the coarse optimum
  3. Sparse linear algebra: efficient computation of the penalized systems
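The coarse-plus-refine pattern can be sketched generically. This is hypothetical code (the library's _optimise_lambda is not shown on this page); grid_minimize and its defaults are illustrative:

```python
import numpy as np

def grid_minimize(score, bounds=(1e-6, 1e6), num=81, refine_factor=10):
    # Stage 1: coarse log-spaced grid over the full bounds
    grid = np.logspace(np.log10(bounds[0]), np.log10(bounds[1]), num)
    coarse = grid[int(np.argmin([score(l) for l in grid]))]
    # Stage 2: finer grid within refine_factor of the coarse winner
    fine = np.logspace(np.log10(coarse / refine_factor),
                       np.log10(coarse * refine_factor), num)
    return fine[int(np.argmin([score(l) for l in fine]))]

# Toy score with a known minimum at lambda = 10**1.3
lam = grid_minimize(lambda l: (np.log10(l) - 1.3) ** 2)
```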

Default Parameter Ranges

  • λ range: \(10^{-6}\) to \(10^6\) (12 orders of magnitude)
  • Grid points: 81 logarithmically spaced values for l_curve and v_curve (num_lambda=81); 41 per axis for variable_penalty_cv
  • Refinement: l_curve re-searches a finer grid within a factor of refine_factor (default 10) of the coarse optimum

Performance Considerations

  • Cross-validation: \(O(n^3)\) for dense problems, \(O(n^2)\) for sparse
  • AIC: \(O(n^2)\) computation per λ value
  • L-curve: \(O(n^2)\) plus curvature computation
  • V-curve: Similar to L-curve

Numerical Stability

  • Uses sparse Cholesky decomposition when possible
  • Handles ill-conditioned matrices gracefully
  • Monitors condition numbers and issues warnings

Method Selection Guidelines

  1. Start with cross-validation: Most robust for general use
  2. Try AIC: If CV is too slow or gives unreasonable results
  3. Use L-curve: For regularization-heavy applications
  4. Compare methods: Check consistency across approaches

Problem-Specific Recommendations

  • Small datasets (n < 100): Cross-validation or AIC
  • Large datasets (n > 1000): AIC or L-curve for speed
  • High noise: Cross-validation (more robust)
  • Low noise: Any method should work well
  • Sparse data: L-curve or V-curve
  • Time series: Cross-validation with temporal structure

Troubleshooting

If optimization fails:

  1. Check input data for issues (NaN, infinite values)
  2. Try different nseg values
  3. Reduce the λ search range
  4. Use a different optimization method
  5. Manually specify λ based on domain knowledge