Whittaker Smoother API

The Whittaker smoother is the special case of P-splines where the basis matrix \(B = I\) (identity). Instead of estimating B-spline coefficients, the smoother operates directly on the data vector \(y\), solving:

\[(\mathbf{W} + \lambda\, \mathbf{D}^\top \mathbf{D})\, \mathbf{z} = \mathbf{W}\, \mathbf{y}\]

For non-uniformly spaced data, the standard difference operator \(D\) is replaced by the divided-difference operator \(D_x\), which weights each finite difference by the reciprocal of the gap in \(x\). This ensures the roughness penalty is expressed in the natural units of the independent variable rather than in index position.
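For intuition, a divided-difference operator can be built recursively: start from plain first differences and, at each order, scale every row by the reciprocal of the corresponding x-gap. The sketch below is an illustration of the idea, not the library's internal implementation, and conventions for the order-\(k\) gap scaling vary:

```python
import numpy as np
import scipy.sparse as sp

def divided_difference_operator(x, order=2):
    """Sparse divided-difference matrix D_x (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    D = sp.identity(n, format="csc")
    for k in range(1, order + 1):
        m = n - k + 1  # rows of the operator built so far
        # plain first-difference matrix of shape (m - 1, m)
        E = sp.diags([-np.ones(m - 1), np.ones(m - 1)], [0, 1], shape=(m - 1, m))
        # scale each difference by the reciprocal of the order-k x-gap
        gaps = (x[k:] - x[:-k]) / k
        D = sp.diags(1.0 / gaps) @ E @ D
    return D
```

With this scaling, applying the order-2 operator to a quadratic \(z = x^2\) yields the constant second derivative 2 regardless of how unevenly the points are spaced, while any linear trend is annihilated — exactly the behaviour that makes the penalty spacing-aware.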

WhittakerSmoother Class

Whittaker smoother with x-aware penalty for non-uniform data.

Solves (W + λ D'D) z = W y where D is either the standard difference operator (uniform spacing) or the divided-difference operator (non-uniform spacing).

Parameters:

  • x (array-like, shape (n,)) –

    Sample positions — must be finite. Need not be sorted or uniformly spaced; the smoother sorts internally.

  • y (array-like, shape (n,)) –

    Observed values at each x.

  • lambda_ (float, default: 100.0) –

    Smoothing parameter (> 0).

  • penalty_order (int, default: 2) –

    Difference order for the roughness penalty (1 or 2).

  • weights (array-like, shape (n,), default: None) –

    Non-negative observation weights. Zero means "missing".

fit

Fit the smoother via a single sparse solve.

Solves (W + λ D'D) z = W y and stores results in fitted_values, ED, and se_fitted.

Returns:

  • WhittakerSmoother –

    self, for method-chaining.

Source code in src/psplines/whittaker.py
def fit(self) -> WhittakerSmoother:
    """Fit the smoother via a single sparse solve.

    Solves ``(W + λ D'D) z = W y`` and stores results in
    :attr:`fitted_values`, :attr:`ED`, and :attr:`se_fitted`.

    Returns
    -------
    WhittakerSmoother
        *self*, for method-chaining.
    """
    x, y_sorted, w, W, _D, DtD = self._prepare_system()
    n = x.size

    # Solve  (W + λ D'D) z = W y
    A = (W + self.lambda_ * DtD).tocsr()  # type: ignore[operator]
    z = spsolve(A, W @ y_sorted)

    # Diagonal of A⁻¹ via sparse LU — used for both ED and SEs
    diag_invA = _sparse_diag_inv(A)

    # Effective degrees of freedom: ED = trace(W @ A⁻¹) = w · diag(A⁻¹)
    self.ED = float(w @ diag_invA)

    # Pointwise SEs: se_i = sqrt(φ · [A⁻¹]_ii),  φ = RSS / (n − ED)
    resid = y_sorted - z
    dof = max(n - self.ED, 1.0)
    rss = float(resid @ (W @ resid))
    phi = rss / dof
    self.se_fitted = np.sqrt(np.maximum(phi * diag_invA, 0.0))

    # Unsort back to original order
    assert self._unsort_idx is not None
    self.fitted_values = z[self._unsort_idx]
    self.se_fitted = self.se_fitted[self._unsort_idx]

    return self

predict

predict(x_new: ArrayLike) -> NDArray

Interpolate fitted values at new x-locations.

Uses linear interpolation on the fitted (sorted) grid.

Parameters:

  • x_new (array-like) –

    Query positions.

Returns:

  • NDArray

    Interpolated smoothed values.

Source code in src/psplines/whittaker.py
def predict(self, x_new: ArrayLike) -> NDArray:
    """Interpolate fitted values at new x-locations.

    Uses linear interpolation on the fitted (sorted) grid.

    Parameters
    ----------
    x_new : array-like
        Query positions.

    Returns
    -------
    NDArray
        Interpolated smoothed values.
    """
    if self.fitted_values is None:
        raise FittingError("Model not fitted. Call fit() first.")
    assert self._x_sorted is not None and self._sort_idx is not None
    xq = np.asarray(x_new, dtype=float).ravel()
    if xq.size == 0:
        raise ValidationError("x_new cannot be empty")
    # fitted_values is in original order; get sorted version for interp
    z_sorted = self.fitted_values[self._sort_idx]
    return np.asarray(np.interp(xq, self._x_sorted, z_sorted))

cross_validation

cross_validation(
    lambda_bounds: tuple[float, float] = (1e-06, 1000000.0),
) -> tuple[float, float]

Select lambda via generalized cross-validation (GCV).

Minimises GCV(λ) = (RSS / n) / (1 - ED / n)² over a bounded search in log₁₀(λ).

Parameters:

  • lambda_bounds ((float, float), default: (1e-06, 1000000.0)) –

    Lower and upper bounds for the λ search.

Returns:

  • best_lambda ( float ) –

    Selected smoothing parameter.

  • best_score ( float ) –

    GCV score at the optimum.
Source code in src/psplines/whittaker.py
def cross_validation(
    self,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
) -> tuple[float, float]:
    """Select lambda via generalized cross-validation (GCV).

    Minimises ``GCV(λ) = (RSS / n) / (1 - ED / n)²`` over a
    bounded search in log₁₀(λ).

    Parameters
    ----------
    lambda_bounds : (float, float)
        Lower and upper bounds for the λ search.

    Returns
    -------
    best_lambda : float
    best_score : float
    """
    _x, y_sorted, w, W, D, DtD = self._prepare_system()
    n = _x.size

    def obj(loglam: float) -> float:
        lam = 10.0**loglam
        _, rss, _, edf = self._solve_for_lambda(
            lam,
            y_sorted,
            w,
            W,
            D,
            DtD,
            need_edf=True,
        )
        denom = (1.0 - edf / n) ** 2
        return (rss / n) / denom if denom > 0 else np.inf

    res = minimize_scalar(
        obj,
        bounds=(np.log10(lambda_bounds[0]), np.log10(lambda_bounds[1])),
        method="bounded",
    )
    if not res.success:
        raise OptimizationError(f"GCV optimisation failed: {res.message}")
    best_lam = 10.0**res.x
    self.lambda_ = best_lam
    self.fit()
    return best_lam, float(res.fun)

v_curve

v_curve(
    lambda_bounds: tuple[float, float] = (1e-06, 1000000.0),
    num_lambda: int = 81,
) -> tuple[float, float]

Select lambda via the V-curve minimum-distance criterion.

Sweeps a log-uniform grid of λ values, computes (log RSS, log penalty) at each, and picks the λ where consecutive-point distance is minimised.

Parameters:

  • lambda_bounds ((float, float), default: (1e-06, 1000000.0)) –

    Lower and upper bounds for the λ grid.

  • num_lambda (int, default: 81) –

    Number of grid points.

Returns:

  • best_lambda ( float ) –

    Selected smoothing parameter.

  • best_score ( float ) –

    Minimum distance value.

Source code in src/psplines/whittaker.py
def v_curve(
    self,
    lambda_bounds: tuple[float, float] = (1e-6, 1e6),
    num_lambda: int = 81,
) -> tuple[float, float]:
    """Select lambda via the V-curve minimum-distance criterion.

    Sweeps a log-uniform grid of λ values, computes
    ``(log RSS, log penalty)`` at each, and picks the λ where
    consecutive-point distance is minimised.

    Parameters
    ----------
    lambda_bounds : (float, float)
        Lower and upper bounds for the λ grid.
    num_lambda : int
        Number of grid points.

    Returns
    -------
    best_lambda : float
    best_score : float
        Minimum distance value.
    """
    _x, y_sorted, w, W, D, DtD = self._prepare_system()

    grid = np.logspace(
        np.log10(lambda_bounds[0]), np.log10(lambda_bounds[1]), num_lambda
    )
    log_rss = np.full(num_lambda, np.nan)
    log_pen = np.full(num_lambda, np.nan)

    for i, lam in enumerate(grid):
        _, rss, pen, _ = self._solve_for_lambda(
            lam,
            y_sorted,
            w,
            W,
            D,
            DtD,
            need_penalty=True,
        )
        log_rss[i] = np.log(max(rss, 1e-300))
        log_pen[i] = np.log(max(pen, 1e-300))

    valid = np.isfinite(log_rss) & np.isfinite(log_pen)
    if valid.sum() < 2:
        raise OptimizationError("Not enough valid V-curve points")

    dr = np.diff(log_rss[valid])
    dp = np.diff(log_pen[valid])
    dist = np.hypot(dr, dp)
    mid = np.sqrt(grid[valid][:-1] * grid[valid][1:])
    idx = int(np.argmin(dist))
    best_lam = float(mid[idx])
    self.lambda_ = best_lam
    self.fit()
    return best_lam, float(dist[idx])

Usage Examples

Basic Smoothing

import numpy as np
from psplines import WhittakerSmoother

# Noisy signal
x = np.linspace(0, 10, 200)
y = np.sin(x) + 0.3 * np.random.default_rng(42).standard_normal(200)

# Fit with a fixed lambda
ws = WhittakerSmoother(x, y, lambda_=1e4, penalty_order=2)
ws.fit()

# Smoothed values (same length as input, in original order)
z = ws.fitted_values

Automatic Lambda Selection

# GCV-based selection (recommended)
ws = WhittakerSmoother(x, y)
best_lam, gcv_score = ws.cross_validation()
print(f"Optimal λ = {best_lam:.2f}")

# V-curve selection (alternative)
ws2 = WhittakerSmoother(x, y)
best_lam, v_score = ws2.v_curve()

Non-Uniform Spacing

The key advantage over the standard Whittaker smoother: gaps in \(x\) are handled correctly via divided differences.

# Irregularly sampled signal
rng = np.random.default_rng(7)
x = np.sort(rng.uniform(0, 10, 150))
y = np.sin(x) + 0.2 * rng.standard_normal(150)

ws = WhittakerSmoother(x, y, lambda_=100)
ws.fit()

# The smoother automatically uses divided differences
# when x-spacing is non-uniform

Missing Data via Weights

Zero weights effectively mark observations as missing; the smoother interpolates through them:

x = np.linspace(0, 10, 200)
y_true = np.sin(x)
y = y_true.copy()
y[80:120] = 99.0  # corrupt a region

weights = np.ones(200)
weights[80:120] = 0.0  # mark as missing

ws = WhittakerSmoother(x, y, lambda_=1e3, weights=weights)
ws.fit()
# ws.fitted_values smoothly interpolates through the gap

Interpolation at New Points

ws = WhittakerSmoother(x, y, lambda_=1e3).fit()

# Predict at new locations via linear interpolation
x_new = np.linspace(0, 10, 500)
z_new = ws.predict(x_new)

Unsorted Input

The smoother accepts unsorted \(x\) — it sorts internally and returns results in the original order:

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)  # not sorted
y = np.sin(x) + 0.1 * rng.standard_normal(100)

ws = WhittakerSmoother(x, y, lambda_=1e3).fit()
# ws.fitted_values[i] corresponds to x[i], not to sorted x

When to Use WhittakerSmoother vs PSpline

| Criterion | WhittakerSmoother | PSpline |
| --- | --- | --- |
| Basis | Identity (\(B = I\)) | B-spline basis |
| Prediction at new points | Linear interpolation | Exact B-spline evaluation |
| Derivatives | Not supported | Analytic via derivative basis |
| GLM families | Gaussian only | Gaussian, Poisson, Binomial |
| Shape constraints | Not supported | Full support |
| Speed | Single sparse solve | Single sparse solve (+ basis construction) |
| Non-uniform spacing | Handled via divided differences | Handled via B-spline basis |
| Ideal for | Fast signal smoothing, spectra, time series | General smoothing, derivatives, GLMs |
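Since derivatives are listed as unsupported, a common workaround — when a rough derivative estimate is acceptable — is a numerical gradient of the fitted values. This is a sketch on stand-in values, not a library feature; note that np.gradient accepts the sample positions directly and so handles uneven gaps:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 400))  # non-uniform grid
z = np.sin(x)  # stand-in for the smoother's fitted values (sorted order)

# second argument = sample positions; second-order accurate in the interior
dz = np.gradient(z, x)
# dz approximates cos(x) up to discretisation error
```

For the real smoother, remember that fitted_values is returned in the original input order, so sort by x before differentiating.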

Rule of thumb

Use WhittakerSmoother when you want fast, simple smoothing on a fixed grid — especially for uniformly or near-uniformly sampled signals where you don't need derivatives or fancy GLM features. Use PSpline for everything else.