
Optimization API

The optimize module provides functions for automatic selection of the smoothing parameter λ using various criteria.

Functions

cross_validation

Find λ that minimizes GCV = (rss/n) / (1 - edf/n)^2.

For GLM families, the deviance is used in place of the RSS.

Source code in src/psplines/optimize.py
def cross_validation(
    pspline: PSpline,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
) -> tuple[float, float]:
    """
    Find Lambda that minimizes GCV = (rss/n) / (1 - edf/n)^2.

    For GLM families, uses deviance in place of RSS.
    """

    def gcv(
        lam: float,
        coef: np.ndarray,
        rss_or_dev: float,
        Dcoef: np.ndarray,
        C: sp.spmatrix | None,
    ) -> float:
        n = np.asarray(pspline.y).size
        assert pspline.B is not None
        edf = effective_df(
            pspline.B,
            difference_matrix(pspline.B.shape[1], pspline.penalty_order),
            lam,
            W=pspline._W,
        )
        return (rss_or_dev / n) / (1 - edf / n) ** 2

    return _optimise_lambda(pspline, gcv, lambda_bounds)

aic

Find λ that minimizes AIC = n·log(dev/n) + 2·edf.

For Gaussian models dev = RSS; for GLM families it is the deviance from the family. Works for Poisson density estimation (§3.3) and binomial smoothing (§3.2).

Source code in src/psplines/optimize.py
def aic(
    pspline: PSpline,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
) -> tuple[float, float]:
    """
    Find Lambda that minimizes AIC = n*log(dev/n) + 2*edf.

    For Gaussian: dev = RSS. For GLM: deviance from family.
    Works for Poisson density estimation (§3.3) and binomial smoothing (§3.2).
    """

    def crit(
        lam: float,
        coef: np.ndarray,
        rss_or_dev: float,
        Dcoef: np.ndarray,
        C: sp.spmatrix | None,
    ) -> float:
        n = np.asarray(pspline.y).size
        assert pspline.B is not None
        edf = effective_df(
            pspline.B,
            difference_matrix(pspline.B.shape[1], pspline.penalty_order),
            lam,
            W=pspline._W,
        )
        return float(n * np.log(rss_or_dev / n) + 2 * edf)

    return _optimise_lambda(pspline, crit, lambda_bounds)

l_curve

Pick λ via maximum curvature of the L-curve. Implements a two-stage search (coarse + refine), vectorized curvature, optional smoothing, and edge-case warnings.

Parameters:

  • pspline (PSpline) –

    Fitted PSpline instance (coefficient solver ready).

  • lambda_bounds (tuple, default: (1e-06, 1000000.0) ) –

    (min, max) bounds for initial lambda grid (log-uniform).

  • num_lambda (int, default: 81 ) –

    Number of points in the initial lambda grid.

  • refine (bool, default: True ) –

    Whether to perform a second, finer search around the coarse optimum.

  • refine_factor (float, default: 10 ) –

    Factor to widen/narrow bounds for refinement around coarse lambda.

  • refine_points (int, default: 81 ) –

    Number of points in the refined grid.

  • smooth_kappa (bool, default: True ) –

    Whether to apply a 3-point moving average to curvature values.

Source code in src/psplines/optimize.py
def l_curve(
    pspline: PSpline,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
    num_lambda: int = 81,
    refine: bool = True,
    refine_factor: float = 10,
    refine_points: int = 81,
    smooth_kappa: bool = True,
) -> tuple[float, float]:
    """
    Pick lambda via maximum curvature of the L-curve.
    Implements two-stage search (coarse + refine), vectorized curvature,
    optional smoothing, and edge-case warnings.

    Parameters
    ----------
    pspline : PSpline
        Fitted PSpline instance (coefficient solver ready).
    lambda_bounds : tuple
        (min, max) bounds for initial lambda grid (log-uniform).
    num_lambda : int
        Number of points in the initial lambda grid.
    refine : bool
        Whether to perform a second, finer search around the coarse optimum.
    refine_factor : float
        Factor to widen/narrow bounds for refinement around coarse lambda.
    refine_points : int
        Number of points in the refined grid.
    smooth_kappa : bool
        Whether to apply a 3-point moving average to curvature values.
    """
    # Coarse grid search
    log_min, log_max = np.log10(lambda_bounds[0]), np.log10(lambda_bounds[1])
    grid = np.logspace(log_min, log_max, num_lambda)
    lr, lp = _sweep_lambda(pspline, grid)
    valid = np.isfinite(lr) & np.isfinite(lp)
    x, y, lamv = lp[valid], lr[valid], grid[valid]

    kappa = _parametric_curvature(x, y, smooth=smooth_kappa)
    idx = int(np.nanargmax(kappa))
    if idx < 2 or idx > len(x) - 3:
        warnings.warn(
            "L-curve optimum at boundary of grid; consider expanding lambda_bounds",
            UserWarning,
        )
    lam_corner = lamv[idx]
    kappa_corner = kappa[idx]

    # Optional refinement around coarse optimum
    if refine:
        lower = lam_corner / refine_factor
        upper = lam_corner * refine_factor
        log_l, log_u = np.log10(lower), np.log10(upper)
        grid2 = np.logspace(log_l, log_u, refine_points)
        lr2, lp2 = _sweep_lambda(pspline, grid2)
        valid2 = np.isfinite(lr2) & np.isfinite(lp2)
        x2, y2, lamv2 = lp2[valid2], lr2[valid2], grid2[valid2]

        kappa2 = _parametric_curvature(x2, y2, smooth=smooth_kappa)
        idx2 = int(np.nanargmax(kappa2))
        if idx2 < 2 or idx2 > len(x2) - 3:
            warnings.warn(
                "Refined L-curve optimum at boundary; expand refine_factor or refine_points",
                UserWarning,
            )
        lam_corner = lamv2[idx2]
        kappa_corner = kappa2[idx2]

    return _finish(pspline, lam_corner, kappa_corner)

v_curve

Pick λ via the minimum-distance point on the V-curve.

Source code in src/psplines/optimize.py
def v_curve(
    pspline: PSpline,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
    num_lambda: int = 81,
) -> tuple[float, float]:
    """
    Pick Lambda via minimum distance on V‑curve.
    """
    grid = np.logspace(
        np.log10(lambda_bounds[0]), np.log10(lambda_bounds[1]), num_lambda
    )
    lr, lp = _sweep_lambda(pspline, grid)
    valid = np.isfinite(lr) & np.isfinite(lp)
    if valid.sum() < 2:
        raise OptimizationError("Not enough V‑curve points")
    dr = np.diff(lr[valid])
    dp = np.diff(lp[valid])
    dist = np.hypot(dr, dp)
    mid = np.sqrt(grid[valid][:-1] * grid[valid][1:])
    idx = int(np.argmin(dist))
    return _finish(pspline, mid[idx], dist[idx])

variable_penalty_cv

2-D grid search for variable-penalty parameters (λ, γ).

The penalty matrix uses exponentially varying weights v_j = exp(γ j / m) (§8.8). For each (λ, γ) pair the model is solved and scored with GCV or AIC.

Parameters:

  • pspline (PSpline) –

    A fitted PSpline instance (used for basis, knots, data).

  • gamma_range ((float, float), default: (-20.0, 20.0) ) –

    Bounds for the γ grid (linear scale).

  • lambda_bounds ((float, float), default: (1e-06, 1000000.0) ) –

    Bounds for the λ grid (log₁₀ scale).

  • num_gamma (int, default: 41 ) –

    Number of γ grid points.

  • num_lambda (int, default: 41 ) –

    Number of λ grid points.

  • criterion ({'gcv', 'aic'}, default: 'gcv' ) –

    Selection criterion.

Returns:

  • best_lambda ( float ) –
  • best_gamma ( float ) –
  • best_score ( float ) –
  • scores ( ndarray, shape (num_gamma, num_lambda) ) –

    Full score surface.

Source code in src/psplines/optimize.py
def variable_penalty_cv(
    pspline: PSpline,
    gamma_range: tuple[float, float] = (-20.0, 20.0),
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
    num_gamma: int = 41,
    num_lambda: int = 41,
    criterion: str = "gcv",
) -> tuple[float, float, float, np.ndarray]:
    """
    2-D grid search for variable-penalty parameters (λ, γ).

    The penalty matrix uses exponentially varying weights
    ``v_j = exp(γ j / m)`` (§8.8).  For each (λ, γ) pair the model is
    solved and scored with GCV or AIC.

    Parameters
    ----------
    pspline : PSpline
        A *fitted* PSpline instance (used for basis, knots, data).
    gamma_range : (float, float)
        Bounds for the γ grid (linear scale).
    lambda_bounds : (float, float)
        Bounds for the λ grid (log₁₀ scale).
    num_gamma : int
        Number of γ grid points.
    num_lambda : int
        Number of λ grid points.
    criterion : {{'gcv', 'aic'}}
        Selection criterion.

    Returns
    -------
    best_lambda : float
    best_gamma : float
    best_score : float
    scores : ndarray, shape (num_gamma, num_lambda)
        Full score surface.
    """
    if criterion not in ("gcv", "aic"):
        raise OptimizationError(f"criterion must be 'gcv' or 'aic', got {criterion!r}")

    if pspline.B is None:
        pspline.fit()

    B = pspline.B
    assert B is not None
    nb = B.shape[1]
    y = np.asarray(pspline.y, dtype=float)
    n = y.size
    C = pspline._C

    fam = pspline._family_obj
    is_glm = fam is not None and not fam.is_gaussian
    user_weights: np.ndarray | None = (
        np.asarray(pspline.weights) if pspline.weights is not None else None
    )

    # Pre-compute cross-products for Gaussian
    W = pspline._W
    if not is_glm:
        if W is not None:
            BtW = B.T @ W  # type: ignore[attr-defined]
            BtB = (BtW @ B).tocsr()
            Bty = BtW @ y
        else:
            BtB = (B.T @ B).tocsr()  # type: ignore[attr-defined]
            Bty = B.T @ y  # type: ignore[attr-defined]

    gamma_grid = np.linspace(gamma_range[0], gamma_range[1], num_gamma)
    lambda_grid = np.logspace(
        np.log10(lambda_bounds[0]),
        np.log10(lambda_bounds[1]),
        num_lambda,
    )
    scores = np.full((num_gamma, num_lambda), np.inf)

    for i, gamma in enumerate(gamma_grid):
        DtD_g = variable_penalty_matrix(nb, pspline.penalty_order, gamma)
        for j, lam in enumerate(lambda_grid):
            try:
                if is_glm:
                    assert fam is not None
                    coef, mu, eta = _irls_solve(
                        B,
                        DtD_g,
                        y,
                        lam,
                        fam,
                        C,
                        user_weights=user_weights,
                        max_iter=pspline.max_iter,
                        tol=pspline.tol,
                    )
                    dev = fam.deviance(y, mu)
                    # ED with this penalty
                    Bt = B.T  # type: ignore[attr-defined]
                    if W is not None:
                        assert pspline._W is not None
                        A = (Bt @ (pspline._W @ B) + lam * DtD_g).tocsr()  # type: ignore[operator]
                        BtB_glm: NDArray = (Bt @ (pspline._W @ B)).toarray()  # type: ignore[operator, attr-defined]
                    else:
                        A = (Bt @ B + lam * DtD_g).tocsr()
                        BtB_glm = (Bt @ B).toarray()  # type: ignore[attr-defined]
                    A_dense = A.toarray()
                    edf = float(np.trace(BtB_glm @ np.linalg.inv(A_dense)))
                    rss_or_dev = dev
                else:
                    P = (lam * DtD_g).tocsr()
                    A_sys = (BtB + P).tocsr()  # type: ignore[possibly-undefined, operator]
                    coef = solve_penalized(BtB, Bty, P, C)  # type: ignore[possibly-undefined]
                    fit_vals = B @ coef
                    resid = y - fit_vals
                    if W is not None:
                        rss_or_dev = float(resid @ (W @ resid))
                    else:
                        rss_or_dev = float(np.sum(resid**2))

                    A_dense = A_sys.toarray()
                    BtB_d: NDArray = (
                        BtB.toarray()  # type: ignore[possibly-undefined, attr-defined]
                        if sp.issparse(BtB)  # type: ignore[possibly-undefined]
                        else np.asarray(BtB)  # type: ignore[possibly-undefined]
                    )
                    edf = float(np.trace(BtB_d @ np.linalg.inv(A_dense)))

                if criterion == "gcv":
                    scores[i, j] = (rss_or_dev / n) / (1 - edf / n) ** 2
                else:  # aic
                    scores[i, j] = n * np.log(max(rss_or_dev / n, 1e-300)) + 2 * edf
            except (np.linalg.LinAlgError, ValueError, RuntimeError):
                continue

    # Find best
    idx = np.unravel_index(np.argmin(scores), scores.shape)
    best_gamma = float(gamma_grid[idx[0]])
    best_lambda = float(lambda_grid[idx[1]])
    best_score = float(scores[idx])

    # Update pspline
    pspline.lambda_ = best_lambda
    pspline.penalty_gamma = best_gamma
    pspline.fit()

    return best_lambda, best_gamma, best_score, scores

Usage Examples

Basic Cross-Validation

import numpy as np
from psplines import PSpline
from psplines.optimize import cross_validation

# Create spline
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(100)
spline = PSpline(x, y, nseg=20)

# Find optimal lambda using cross-validation
best_lambda, cv_score = cross_validation(spline)
print(f"Optimal λ: {best_lambda:.6f}, CV score: {cv_score:.6f}")

# Use optimal lambda
spline.lambda_ = best_lambda
spline.fit()

AIC-Based Selection

from psplines.optimize import aic

# Find optimal lambda using AIC
best_lambda, aic_score = aic(spline)
print(f"Optimal λ: {best_lambda:.6f}, AIC: {aic_score:.6f}")

GLM Models (Poisson, Binomial)

All optimizers work with GLM P-splines. For non-Gaussian families, scoring uses the deviance instead of the RSS, and each candidate λ requires a full IRLS fit to convergence:

# Poisson P-spline with AIC-optimal lambda
counts = np.random.poisson(np.exp(np.sin(x)), 100)
spline_pois = PSpline(x, counts, nseg=20, family="poisson")
best_lambda, aic_score = aic(spline_pois)
spline_pois.lambda_ = best_lambda
spline_pois.fit()

# Binomial P-spline with cross-validation
trials = np.full(100, 20)
successes = np.random.binomial(trials, 1 / (1 + np.exp(-2 * np.sin(x))))
spline_bin = PSpline(x, successes, nseg=20, family="binomial", trials=trials)
best_lambda, cv_score = cross_validation(spline_bin)
spline_bin.lambda_ = best_lambda
spline_bin.fit()
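To see what each candidate λ costs, here is a minimal penalized IRLS sketch for a Poisson fit with log link. This is hypothetical standalone code, not the library's internal _irls_solve; the names B, D, and irls_poisson are illustrative:

```python
import numpy as np

def irls_poisson(B, y, D, lam, max_iter=50, tol=1e-8):
    """One penalized IRLS fit: Poisson deviance + lam * ||D coef||^2."""
    coef = np.zeros(B.shape[1])
    P = lam * D.T @ D
    for _ in range(max_iter):
        eta = B @ coef
        mu = np.exp(eta)                 # inverse log link
        W = mu                           # Poisson working weights
        z = eta + (y - mu) / mu          # working response
        A = B.T @ (W[:, None] * B) + P
        new = np.linalg.solve(A, B.T @ (W * z))
        if np.max(np.abs(new - coef)) < tol:
            return new
        coef = new
    return coef

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = rng.poisson(np.exp(0.5 + x))
B = np.column_stack([np.ones_like(x), x])   # toy two-column "basis"
D = np.array([[-1.0, 1.0]])                 # first-order difference penalty
coef = irls_poisson(B, y, D, lam=1e-6)
```

An optimizer such as aic repeats a fit like this for every candidate λ, which is why GLM selection is noticeably slower than the Gaussian case.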

L-Curve Method

from psplines.optimize import l_curve

# Find optimal lambda using L-curve
best_lambda, curvature = l_curve(spline)
print(f"Optimal λ: {best_lambda:.6f}, Curvature: {curvature:.6f}")

Variable Penalty Parameter Selection

When using exponentially varying penalty weights (§8.8), variable_penalty_cv performs a 2-D grid search over \((\lambda, \gamma)\):

from psplines.optimize import variable_penalty_cv

# Fit an initial spline (required for basis info)
spline = PSpline(x, y, nseg=20)
spline.fit()

# 2-D grid search for best (λ, γ) using GCV
best_lambda, best_gamma, best_score, scores = variable_penalty_cv(
    spline,
    gamma_range=(-10, 10),
    lambda_bounds=(1e-4, 1e4),
    num_gamma=41,
    num_lambda=41,
    criterion="gcv",
)
print(f"Optimal λ={best_lambda:.4f}, γ={best_gamma:.2f}")

# Apply the optimal parameters
spline.lambda_ = best_lambda
spline.penalty_gamma = best_gamma
spline.fit()

Comparing Methods

import matplotlib.pyplot as plt

from psplines.optimize import aic, cross_validation, l_curve, v_curve

# Compare different methods
methods = {
    'CV': cross_validation,
    'AIC': aic,
    'L-curve': l_curve,
    'V-curve': v_curve
}

results = {}
for name, method in methods.items():
    try:
        lambda_opt, score = method(spline)
        results[name] = lambda_opt
        print(f"{name}: λ = {lambda_opt:.6f}")
    except Exception as e:
        print(f"{name} failed: {e}")

# Plot comparison
if results:
    methods_list = list(results.keys())
    lambdas = list(results.values())

    plt.figure(figsize=(8, 5))
    plt.bar(methods_list, lambdas)
    plt.yscale('log')
    plt.ylabel('Optimal λ')
    plt.title('Comparison of Parameter Selection Methods')
    plt.show()

Mathematical Background

Cross-Validation

Generalized Cross-Validation (GCV) minimizes:

\[\text{GCV}(\lambda) = \frac{n \|y - S_\lambda y\|^2}{[n - \text{tr}(S_\lambda)]^2}\]

where \(S_\lambda = B(B^TB + \lambda D^TD)^{-1}B^T\) is the smoothing matrix.

Advantages:

  • Well-established statistical foundation
  • Good performance across a wide range of problems
  • Automatic selection without user input

Disadvantages:

  • Can be computationally expensive
  • May oversmooth in some cases
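The formula can be checked end-to-end with a few lines of plain NumPy. The sketch below is a standalone illustration, not the library's code: it uses a Whittaker-style smoother, whose hat matrix is \(S_\lambda = (I + \lambda D^TD)^{-1}\), and scans a log grid:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

n = y.size
D = np.diff(np.eye(n), n=2, axis=0)   # second-order difference matrix

best_gcv, best_lam = np.inf, None
for lam in np.logspace(-4, 4, 41):
    S = np.linalg.inv(np.eye(n) + lam * D.T @ D)   # hat matrix S_lambda
    resid = y - S @ y
    edf = np.trace(S)                              # effective df = tr(S)
    gcv = (resid @ resid / n) / (1 - edf / n) ** 2
    if gcv < best_gcv:
        best_gcv, best_lam = gcv, lam
```

Note that (rss/n) / (1 - edf/n)^2 is algebraically the same quantity as the displayed formula, with edf = tr(S_λ).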

Akaike Information Criterion (AIC)

AIC balances fit quality and model complexity:

\[\text{AIC}(\lambda) = n \log(\hat{\sigma}^2) + 2 \cdot \text{ED}(\lambda)\]

where:

  • \(\hat{\sigma}^2 = \|y - S_\lambda y\|^2 / n\) is the residual variance
  • \(\text{ED}(\lambda) = \text{tr}(S_\lambda)\) is the effective degrees of freedom

For GLM families (Poisson, binomial), the deviance replaces the residual sum of squares:

\[\text{AIC}(\lambda) = n \log(\text{Dev}(\lambda)/n) + 2 \cdot \text{ED}(\lambda)\]

Advantages:

  • Information-theoretic foundation
  • Fast computation
  • Good for model comparison
  • Works with both Gaussian and GLM families

Disadvantages:

  • May not work well at all noise levels
  • Less robust than cross-validation
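For a basis with far fewer columns than data points, the AIC trace can be computed directly. This sketch uses Gaussian bumps as a stand-in basis (a hypothetical choice; the library builds B-splines) together with a second-order difference penalty:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.15 * rng.standard_normal(x.size)

# Gaussian bumps stand in for B-splines here, plus a difference penalty
centers = np.linspace(0, 1, 15)
B = np.exp(-0.5 * ((x[:, None] - centers) / 0.08) ** 2)
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)

n, BtB, Bty = y.size, B.T @ B, B.T @ y
lams = np.logspace(-6, 6, 61)
aics = []
for lam in lams:
    A = BtB + lam * D.T @ D
    coef = np.linalg.solve(A, Bty)
    rss = float(np.sum((y - B @ coef) ** 2))
    edf = float(np.trace(BtB @ np.linalg.inv(A)))  # tr(S_lambda)
    aics.append(n * np.log(rss / n) + 2 * edf)
best_lam = lams[int(np.argmin(aics))]
```

The trace identity tr(S_λ) = tr(BᵀB · A⁻¹) avoids forming the n×n smoothing matrix, which is the same shortcut the source of variable_penalty_cv uses.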

L-Curve Method

The L-curve plots \(\log(\|D\alpha_\lambda\|^2)\) vs \(\log(\|y - B\alpha_\lambda\|^2)\) and finds the point of maximum curvature.

Curvature is computed as:

\[\kappa(\lambda) = \frac{2(\rho' \eta'' - \rho'' \eta')}{(\rho'^2 + \eta'^2)^{3/2}}\]

where \(\rho(\lambda) = \log(\|y - B\alpha_\lambda\|^2)\) and \(\eta(\lambda) = \log(\|D\alpha_\lambda\|^2)\).

Advantages:

  • Intuitive geometric interpretation
  • No statistical assumptions
  • Well suited to ill-posed problems

Disadvantages:

  • Can be sensitive to noise
  • May not have a clear corner
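The corner search can be illustrated with np.gradient on a synthetic L-shaped curve. This is a standalone sketch, not the library's _parametric_curvature; the constant factor 2 in the formula above only scales κ and does not change the location of the maximum:

```python
import numpy as np

def parametric_curvature(x, y):
    # Signed curvature of a sampled curve (x(t), y(t)); the formula is
    # invariant to a uniform reparameterization, so index spacing suffices.
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - ddx * dy) / (dx**2 + dy**2) ** 1.5

# Synthetic L-curve with a sharp corner at t = 0
t = np.linspace(-3, 3, 201)
rho = np.log10(1 + np.exp(4 * t))    # "log residual": flat, then rising
eta = np.log10(1 + np.exp(-4 * t))   # "log penalty": falling, then flat
kappa = parametric_curvature(rho, eta)
corner = int(np.nanargmax(np.abs(kappa)))   # index of maximum curvature
```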

V-Curve Method

Instead of curvature, the V-curve measures the Euclidean distance between consecutive L-curve points: plotted against λ, these segment lengths trace a V-shaped valley, and the optimal λ sits at its bottom. This avoids the second derivatives that make L-curve curvature sensitive to noise.
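Mirroring the v_curve source above, the idea can be sketched on a synthetic curve: compute the length of each segment between consecutive (log residual, log penalty) points and take the geometric midpoint of the shortest one. The curve shapes here are made up for illustration:

```python
import numpy as np

lams = np.logspace(-3, 3, 31)
lr = np.log10(1 + lams)        # synthetic log residual norm: grows with lambda
lp = np.log10(1 + 1 / lams)    # synthetic log penalty norm: shrinks with lambda

dist = np.hypot(np.diff(lr), np.diff(lp))   # V-curve: segment lengths
mid = np.sqrt(lams[:-1] * lams[1:])         # geometric midpoints of the grid
best_lam = mid[int(np.argmin(dist))]        # bottom of the "V"
```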

Variable Penalty Selection (§8.8)

For the exponential variable penalty \(P(\gamma) = D^T\text{diag}(e^{\gamma j/m})D\), variable_penalty_cv evaluates GCV (or AIC) over a 2-D grid of \((\lambda, \gamma)\) values and returns the combination that minimises the criterion:

\[(\hat\lambda, \hat\gamma) = \arg\min_{\lambda, \gamma} \text{GCV}(\lambda, \gamma)\]

This adds a single extra hyperparameter \(\gamma\) that controls the spatial distribution of the penalty weight while \(\lambda\) controls overall smoothness.
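The exponentially weighted penalty itself is easy to write down. The sketch below is a hypothetical re-implementation of the idea behind variable_penalty_matrix (the index convention j = 0, …, m−1 is an assumption); it shows that γ = 0 recovers the ordinary \(D^TD\) penalty:

```python
import numpy as np

def variable_penalty(nb, order, gamma):
    # P(gamma) = D^T diag(v) D with v_j = exp(gamma * j / m), j = 0..m-1
    D = np.diff(np.eye(nb), n=order, axis=0)
    m = D.shape[0]
    v = np.exp(gamma * np.arange(m) / m)
    return D.T @ (v[:, None] * D)

P0 = variable_penalty(10, 2, 0.0)  # gamma = 0: plain D'D
P5 = variable_penalty(10, 2, 5.0)  # gamma > 0: right-hand coefficients penalized more
```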

Implementation Details

Optimization Strategy

All methods use:

  1. Logarithmic search: candidate λ values are tested on a logarithmically spaced grid
  2. Refinement: a finer search (grid refinement or golden-section) around the coarse optimum
  3. Sparse linear algebra: efficient computation of the penalized systems
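The coarse-plus-refine pattern can be sketched generically. This is hypothetical code (the library's _optimise_lambda is not shown on this page); grid_minimize and its defaults are illustrative:

```python
import numpy as np

def grid_minimize(score, bounds=(1e-6, 1e6), num=81, refine_factor=10):
    # Stage 1: coarse log-spaced grid over the full bounds
    grid = np.logspace(np.log10(bounds[0]), np.log10(bounds[1]), num)
    coarse = grid[int(np.argmin([score(l) for l in grid]))]
    # Stage 2: finer grid within refine_factor of the coarse winner
    fine = np.logspace(np.log10(coarse / refine_factor),
                       np.log10(coarse * refine_factor), num)
    return fine[int(np.argmin([score(l) for l in fine]))]

# Toy score with a known minimum at lambda = 10**1.3
lam = grid_minimize(lambda l: (np.log10(l) - 1.3) ** 2)
```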

Default Parameter Ranges

  • λ range: \(10^{-6}\) to \(10^6\) (12 orders of magnitude)
  • Grid points: 81 logarithmically spaced values for l_curve and v_curve (num_lambda=81); 41 per axis for variable_penalty_cv
  • Refinement: l_curve re-searches a finer grid within a factor of refine_factor (default 10) of the coarse optimum

Performance Considerations

  • Cross-validation: \(O(n^3)\) for dense problems, \(O(n^2)\) for sparse
  • AIC: \(O(n^2)\) computation per λ value
  • L-curve: \(O(n^2)\) plus curvature computation
  • V-curve: Similar to L-curve

Numerical Stability

  • Uses sparse Cholesky decomposition when possible
  • Handles ill-conditioned matrices gracefully
  • Monitors condition numbers and issues warnings

Method Selection Guidelines

  1. Start with cross-validation: Most robust for general use
  2. Try AIC: If CV is too slow or gives unreasonable results
  3. Use L-curve: For regularization-heavy applications
  4. Compare methods: Check consistency across approaches

Problem-Specific Recommendations

  • Small datasets (n < 100): Cross-validation or AIC
  • Large datasets (n > 1000): AIC or L-curve for speed
  • High noise: Cross-validation (more robust)
  • Low noise: Any method should work well
  • Sparse data: L-curve or V-curve
  • Time series: Cross-validation with temporal structure

Troubleshooting

If optimization fails:

  1. Check input data for issues (NaN, infinite values)
  2. Try different nseg values
  3. Reduce the λ search range
  4. Use a different optimization method
  5. Manually specify λ based on domain knowledge