Whittaker Smoother API

The Whittaker smoother is the special case of P-splines where the basis matrix \(B = I\) (identity). Instead of estimating B-spline coefficients, the smoother operates directly on the data vector \(y\), solving:

\[(\mathbf{W} + \lambda\, \mathbf{D}^\top \mathbf{D})\, \mathbf{z} = \mathbf{W}\, \mathbf{y}\]

For non-uniformly spaced data, the standard difference operator \(D\) is replaced by the divided-difference operator \(D_x\), which weights each finite difference by the reciprocal of the gap in \(x\). This ensures the roughness penalty is expressed in the natural units of the independent variable rather than in index position.
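For intuition, a divided-difference operator can be built recursively: start from plain first differences and, at each order, scale every row by the reciprocal of the corresponding x-gap. The sketch below is an illustration of the idea, not the library's internal implementation, and conventions for the order-\(k\) gap scaling vary:

```python
import numpy as np
import scipy.sparse as sp

def divided_difference_operator(x, order=2):
    """Sparse divided-difference matrix D_x (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    D = sp.identity(n, format="csc")
    for k in range(1, order + 1):
        m = n - k + 1  # rows of the operator built so far
        # plain first-difference matrix of shape (m - 1, m)
        E = sp.diags([-np.ones(m - 1), np.ones(m - 1)], [0, 1], shape=(m - 1, m))
        # scale each difference by the reciprocal of the order-k x-gap
        gaps = (x[k:] - x[:-k]) / k
        D = sp.diags(1.0 / gaps) @ E @ D
    return D
```

With this scaling, applying the order-2 operator to a quadratic \(z = x^2\) yields the constant second derivative 2 regardless of how unevenly the points are spaced, while any linear trend is annihilated — exactly the behaviour that makes the penalty spacing-aware.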

WhittakerSmoother Class

Whittaker smoother with x-aware penalty for non-uniform data.

Solves (W + λ D'D) z = W y where D is either the standard difference operator (uniform spacing) or the divided-difference operator (non-uniform spacing).

Parameters:

  • x (array-like, shape (n,)) –

    Sample positions — must be finite. Need not be sorted or uniformly spaced; the smoother sorts internally.

  • y (array-like, shape (n,)) –

    Observed values at each x.

  • lambda_ (float, default: 100.0) –

    Smoothing parameter (> 0).

  • penalty_order (int, default: 2) –

    Difference order for the roughness penalty (1 or 2).

  • weights (array-like, shape (n,), default: None) –

    Non-negative observation weights. Zero means "missing".

fit

Fit the smoother via a single sparse solve.

Solves (W + λ D'D) z = W y and stores results in fitted_values, ED, and se_fitted.

Returns:

  • WhittakerSmoother –

    self, for method-chaining.

Source code in src/psplines/whittaker.py
def fit(self) -> WhittakerSmoother:
    """Fit the smoother via a single sparse solve.

    Solves ``(W + λ D'D) z = W y`` and stores results in
    :attr:`fitted_values`, :attr:`ED`, and :attr:`se_fitted`.

    Returns
    -------
    WhittakerSmoother
        *self*, for method-chaining.
    """
    x, y_sorted, w, W, _D, DtD = self._prepare_system()
    n = x.size

    # Solve  (W + λ D'D) z = W y
    A = (W + self.lambda_ * DtD).tocsr()  # type: ignore[operator]
    z = spsolve(A, W @ y_sorted)

    # Diagonal of A⁻¹ via sparse LU — used for both ED and SEs
    diag_invA = _sparse_diag_inv(A)

    # Effective degrees of freedom: ED = trace(W @ A⁻¹) = w · diag(A⁻¹)
    self.ED = float(w @ diag_invA)

    # Pointwise SEs: se_i = sqrt(φ · [A⁻¹]_ii),  φ = RSS / (n − ED)
    resid = y_sorted - z
    dof = max(n - self.ED, 1.0)
    rss = float(resid @ (W @ resid))
    phi = rss / dof
    self.se_fitted = np.sqrt(np.maximum(phi * diag_invA, 0.0))

    # Unsort back to original order
    assert self._unsort_idx is not None
    self.fitted_values = z[self._unsort_idx]
    self.se_fitted = self.se_fitted[self._unsort_idx]

    return self

predict

predict(x_new: ArrayLike) -> NDArray

Interpolate fitted values at new x-locations.

Uses linear interpolation on the fitted (sorted) grid.

Parameters:

  • x_new (array-like) –

    Query positions.

Returns:

  • NDArray

    Interpolated smoothed values.

Source code in src/psplines/whittaker.py
def predict(self, x_new: ArrayLike) -> NDArray:
    """Interpolate fitted values at new x-locations.

    Uses linear interpolation on the fitted (sorted) grid.

    Parameters
    ----------
    x_new : array-like
        Query positions.

    Returns
    -------
    NDArray
        Interpolated smoothed values.
    """
    if self.fitted_values is None:
        raise FittingError("Model not fitted. Call fit() first.")
    assert self._x_sorted is not None and self._sort_idx is not None
    xq = np.asarray(x_new, dtype=float).ravel()
    if xq.size == 0:
        raise ValidationError("x_new cannot be empty")
    # fitted_values is in original order; get sorted version for interp
    z_sorted = self.fitted_values[self._sort_idx]
    return np.asarray(np.interp(xq, self._x_sorted, z_sorted))

cross_validation

cross_validation(
    lambda_bounds: tuple[float, float] = (1e-06, 1000000.0),
) -> tuple[float, float]

Select lambda via generalized cross-validation (GCV).

Minimises GCV(λ) = (RSS / n) / (1 - ED / n)² over a bounded search in log₁₀(λ).

Parameters:

  • lambda_bounds ((float, float), default: (1e-06, 1000000.0)) –

    Lower and upper bounds for the λ search.

Returns:

  • best_lambda ( float ) –

    Selected smoothing parameter.

  • best_score ( float ) –

    GCV score at the optimum.
Source code in src/psplines/whittaker.py
def cross_validation(
    self,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
) -> tuple[float, float]:
    """Select lambda via generalized cross-validation (GCV).

    Minimises ``GCV(λ) = (RSS / n) / (1 - ED / n)²`` over a
    bounded search in log₁₀(λ).

    Parameters
    ----------
    lambda_bounds : (float, float)
        Lower and upper bounds for the λ search.

    Returns
    -------
    best_lambda : float
    best_score : float
    """
    _x, y_sorted, w, W, D, DtD = self._prepare_system()
    n = _x.size

    def obj(loglam: float) -> float:
        lam = 10.0**loglam
        _, rss, _, edf = self._solve_for_lambda(
            lam,
            y_sorted,
            w,
            W,
            D,
            DtD,
            need_edf=True,
        )
        denom = (1.0 - edf / n) ** 2
        return (rss / n) / denom if denom > 0 else np.inf

    res = minimize_scalar(
        obj,
        bounds=(np.log10(lambda_bounds[0]), np.log10(lambda_bounds[1])),
        method="bounded",
    )
    if not res.success:
        raise OptimizationError(f"GCV optimisation failed: {res.message}")
    best_lam = 10.0**res.x
    self.lambda_ = best_lam
    self.fit()
    return best_lam, float(res.fun)

v_curve

v_curve(
    lambda_bounds: tuple[float, float] = (1e-06, 1000000.0),
    num_lambda: int = 81,
) -> tuple[float, float]

Select lambda via the V-curve minimum-distance criterion.

Sweeps a log-uniform grid of λ values, computes (log RSS, log penalty) at each, and picks the λ where consecutive-point distance is minimised.

Parameters:

  • lambda_bounds ((float, float), default: (1e-06, 1000000.0)) –

    Lower and upper bounds for the λ grid.

  • num_lambda (int, default: 81) –

    Number of grid points.

Returns:

  • best_lambda ( float ) –

    Selected smoothing parameter.

  • best_score ( float ) –

    Minimum distance value.

Source code in src/psplines/whittaker.py
def v_curve(
    self,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
    num_lambda: int = 81,
) -> tuple[float, float]:
    """Select lambda via the V-curve minimum-distance criterion.

    Sweeps a log-uniform grid of λ values, computes
    ``(log RSS, log penalty)`` at each, and picks the λ where
    consecutive-point distance is minimised.

    Parameters
    ----------
    lambda_bounds : (float, float)
        Lower and upper bounds for the λ grid.
    num_lambda : int
        Number of grid points.

    Returns
    -------
    best_lambda : float
    best_score : float
        Minimum distance value.
    """
    _x, y_sorted, w, W, D, DtD = self._prepare_system()

    grid = np.logspace(
        np.log10(lambda_bounds[0]), np.log10(lambda_bounds[1]), num_lambda
    )
    log_rss = np.full(num_lambda, np.nan)
    log_pen = np.full(num_lambda, np.nan)

    for i, lam in enumerate(grid):
        _, rss, pen, _ = self._solve_for_lambda(
            lam,
            y_sorted,
            w,
            W,
            D,
            DtD,
            need_penalty=True,
        )
        log_rss[i] = np.log(max(rss, 1e-300))
        log_pen[i] = np.log(max(pen, 1e-300))

    valid = np.isfinite(log_rss) & np.isfinite(log_pen)
    if valid.sum() < 2:
        raise OptimizationError("Not enough valid V-curve points")

    dr = np.diff(log_rss[valid])
    dp = np.diff(log_pen[valid])
    dist = np.hypot(dr, dp)
    mid = np.sqrt(grid[valid][:-1] * grid[valid][1:])
    idx = int(np.argmin(dist))
    best_lam = float(mid[idx])
    self.lambda_ = best_lam
    self.fit()
    return best_lam, float(dist[idx])

Usage Examples

Basic Smoothing

import numpy as np
from psplines import WhittakerSmoother

# Noisy signal
x = np.linspace(0, 10, 200)
y = np.sin(x) + 0.3 * np.random.default_rng(42).standard_normal(200)

# Fit with a fixed lambda
ws = WhittakerSmoother(x, y, lambda_=1e4, penalty_order=2)
ws.fit()

# Smoothed values (same length as input, in original order)
z = ws.fitted_values

Automatic Lambda Selection

# GCV-based selection (recommended)
ws = WhittakerSmoother(x, y)
best_lam, gcv_score = ws.cross_validation()
print(f"Optimal λ = {best_lam:.2f}")

# V-curve selection (alternative)
ws2 = WhittakerSmoother(x, y)
best_lam, v_score = ws2.v_curve()

Non-Uniform Spacing

The key advantage over the standard Whittaker smoother: gaps in \(x\) are handled correctly via divided differences.

# Irregularly sampled signal
rng = np.random.default_rng(7)
x = np.sort(rng.uniform(0, 10, 150))
y = np.sin(x) + 0.2 * rng.standard_normal(150)

ws = WhittakerSmoother(x, y, lambda_=100)
ws.fit()

# The smoother automatically uses divided differences
# when x-spacing is non-uniform

Missing Data via Weights

Zero weights effectively mark observations as missing; the smoother interpolates through them:

x = np.linspace(0, 10, 200)
y_true = np.sin(x)
y = y_true.copy()
y[80:120] = 99.0  # corrupt a region

weights = np.ones(200)
weights[80:120] = 0.0  # mark as missing

ws = WhittakerSmoother(x, y, lambda_=1e3, weights=weights)
ws.fit()
# ws.fitted_values smoothly interpolates through the gap

Interpolation at New Points

ws = WhittakerSmoother(x, y, lambda_=1e3).fit()

# Predict at new locations via linear interpolation
x_new = np.linspace(0, 10, 500)
z_new = ws.predict(x_new)

Unsorted Input

The smoother accepts unsorted \(x\) — it sorts internally and returns results in the original order:

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)  # not sorted
y = np.sin(x) + 0.1 * rng.standard_normal(100)

ws = WhittakerSmoother(x, y, lambda_=1e3).fit()
# ws.fitted_values[i] corresponds to x[i], not to sorted x

When to Use WhittakerSmoother vs PSpline

| Criterion | WhittakerSmoother | PSpline |
| --- | --- | --- |
| Basis | Identity (\(B = I\)) | B-spline basis |
| Prediction at new points | Linear interpolation | Exact B-spline evaluation |
| Derivatives | Not supported | Analytic via derivative basis |
| GLM families | Gaussian only | Gaussian, Poisson, Binomial |
| Shape constraints | Not supported | Full support |
| Speed | Single sparse solve | Single sparse solve (+ basis construction) |
| Non-uniform spacing | Handled via divided differences | Handled via B-spline basis |
| Ideal for | Fast signal smoothing, spectra, time series | General smoothing, derivatives, GLMs |
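Since derivatives are listed as unsupported, a common workaround — when a rough derivative estimate is acceptable — is a numerical gradient of the fitted values. This is a sketch on stand-in values, not a library feature; note that np.gradient accepts the sample positions directly and so handles uneven gaps:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 400))  # non-uniform grid
z = np.sin(x)  # stand-in for the smoother's fitted values (sorted order)

# second argument = sample positions; second-order accurate in the interior
dz = np.gradient(z, x)
# dz approximates cos(x) up to discretisation error
```

For the real smoother, remember that fitted_values is returned in the original input order, so sort by x before differentiating.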

Rule of thumb

Use WhittakerSmoother when you want fast, simple smoothing on a fixed grid — especially for uniformly or near-uniformly sampled signals where you don't need derivatives or fancy GLM features. Use PSpline for everything else.