Core API¶

The core module contains the main PSpline class that implements penalized B-spline smoothing.

PSpline Class¶

Univariate penalised B-spline smoother.

fit ¶

fit(
    *, xl: float | None = None, xr: float | None = None
) -> PSpline

Fit the P-spline model.

Parameters:

xl (float, default: None ) –

Left boundary of the domain. Defaults to min(x).
xr (float, default: None ) –

Right boundary of the domain. Defaults to max(x).

Returns:

PSpline –

The fitted spline object.

Source code in src/psplines/core.py

def fit(self, *, xl: float | None = None, xr: float | None = None) -> PSpline:
    """
    Fit the P-spline model.

    Parameters
    ----------
    xl : float, optional
        Left boundary of the domain. Defaults to min(x).
    xr : float, optional
        Right boundary of the domain. Defaults to max(x).

    Returns
    -------
    PSpline
        The fitted spline object.
    """
    # Validate and set domain
    x_array = np.asarray(self.x)
    x_min = float(np.min(x_array))
    x_max = float(np.max(x_array))

    if xl is not None:
        if not np.isfinite(xl):
            raise ValueError("xl must be finite")
        if xl > x_min:
            raise ValueError(f"xl ({xl}) must be <= min(x) ({x_min})")

    if xr is not None:
        if not np.isfinite(xr):
            raise ValueError("xr must be finite")
        if xr < x_max:
            raise ValueError(f"xr ({xr}) must be >= max(x) ({x_max})")

    self._xl = xl if xl is not None else x_min
    self._xr = xr if xr is not None else x_max

    if self._xl >= self._xr:
        raise ValueError(f"xl ({self._xl}) must be < xr ({self._xr})")
    # basis and penalty
    self.B, self.knots = b_spline_basis(
        self.x, self._xl, self._xr, self.nseg, self.degree
    )
    nb = self.B.shape[1]
    D = difference_matrix(nb, self.penalty_order)
    # sparse cross-products
    self._BtB = (self.B.T @ self.B).tocsr()
    self._DtD = (D.T @ D).tocsr()
    self._Bty = self.B.T @ self.y
    # constraints
    self._build_constraints(nb)
    # solve coefficients
    P = self.lambda_ * self._DtD
    self.coef = self._solve_coef(P)
    # fitted values
    self.fitted_values = self.B @ self.coef
    # analytic uncertainty
    self._update_uncertainty()
    return self

predict ¶

predict(
    x_new: ArrayLike,
    *,
    derivative_order: int | None = None,
    return_se: bool = False,
    se_method: str = "analytic",
    B_boot: int = 5000,
    seed: int | None = None,
    n_jobs: int = 1,
    hdi_prob: float = 0.95,
) -> NDArray | tuple[NDArray, ...]

Predict smooth (or derivative) with optional uncertainty.

Parameters:

x_new (array - like) –

Points at which to evaluate the spline.
derivative_order (int, default: None ) –

Order of derivative to compute. None for function values.
return_se (bool, default: False ) –

Whether to return standard errors.
se_method (str, default: "analytic" ) –

Method for computing uncertainty: - 'analytic': delta-method SE -> (fhat, se) - 'bootstrap': parametric bootstrap SEs -> (fhat, se) - 'bayes': posterior mean + HDI credible band -> (mean, lower, upper)
B_boot (int, default: 5000 ) –

Number of bootstrap replicates (for bootstrap method).
seed (int, default: None ) –

Random seed (for bootstrap method).
n_jobs (int, default: 1 ) –

Number of parallel jobs (for bootstrap method).
hdi_prob (float, default: 0.95 ) –

Probability for HDI credible band (for Bayesian method).

Returns:

NDArray or tuple –

Predictions, optionally with uncertainty estimates.

Source code in src/psplines/core.py

def predict(
    self,
    x_new: ArrayLike,
    *,
    derivative_order: int | None = None,
    return_se: bool = False,
    se_method: str = "analytic",
    B_boot: int = 5000,
    seed: int | None = None,
    n_jobs: int = 1,
    hdi_prob: float = 0.95,
) -> NDArray | tuple[NDArray, ...]:
    """
    Predict smooth (or derivative) with optional uncertainty.

    Parameters
    ----------
    x_new : array-like
        Points at which to evaluate the spline.
    derivative_order : int, optional
        Order of derivative to compute. None for function values.
    return_se : bool, default False
        Whether to return standard errors.
    se_method : str, default "analytic"
        Method for computing uncertainty:
        - 'analytic': delta-method SE -> (fhat, se)
        - 'bootstrap': parametric bootstrap SEs -> (fhat, se)
        - 'bayes': posterior mean + HDI credible band -> (mean, lower, upper)
    B_boot : int, default 5000
        Number of bootstrap replicates (for bootstrap method).
    seed : int, optional
        Random seed (for bootstrap method).
    n_jobs : int, default 1
        Number of parallel jobs (for bootstrap method).
    hdi_prob : float, default 0.95
        Probability for HDI credible band (for Bayesian method).

    Returns
    -------
    NDArray or tuple
        Predictions, optionally with uncertainty estimates.
    """
    if self.coef is None:
        raise RuntimeError("Model not fitted. Call fit() first.")

    # Validate input
    try:
        xq = _as1d(x_new)
    except (ValueError, TypeError) as e:
        raise ValueError(f"Invalid x_new array: {e}")

    if xq.size == 0:
        raise ValueError("x_new cannot be empty")
    if not np.all(np.isfinite(xq)):
        raise ValueError("x_new contains non-finite values (NaN or inf)")

    # Validate parameters
    if derivative_order is not None and derivative_order <= 0:
        raise ValueError(f"derivative_order must be positive (got {derivative_order})")
    if se_method not in ("analytic", "bootstrap", "bayes"):
        raise ValueError(f"se_method must be 'analytic', 'bootstrap', or 'bayes' (got '{se_method}')")
    if B_boot <= 0:
        raise ValueError(f"B_boot must be positive (got {B_boot})")
    if n_jobs <= 0:
        raise ValueError(f"n_jobs must be positive (got {n_jobs})")
    if not (0 < hdi_prob < 1):
        raise ValueError(f"hdi_prob must be between 0 and 1 (got {hdi_prob})")

    # Bayesian credible band
    if se_method == "bayes":
        if self.trace is None:
            raise RuntimeError("Call bayes_fit() first to sample the posterior.")
        if self._spline is None:
            raise RuntimeError("No BSpline stored. Run bayes_fit() first.")

        # Evaluate stored BSpline (or its k-th derivative)
        if derivative_order is None:
            Bq = self._spline(xq)
        else:
            Bq = self._spline(xq, nu=derivative_order)
        # to array for matmul
        Bq = Bq.toarray() if sp.issparse(Bq) else np.asarray(Bq)

        # posterior alpha draws: shape (n_samples, n_basis)
        alpha_draws = (
            self.trace.posterior["alpha"].stack(sample=("chain", "draw")).values
        )

        # If it came transposed (basis × samples), swap axes
        if (
            alpha_draws.shape[0] == Bq.shape[1]
            and alpha_draws.shape[1] != Bq.shape[1]
        ):
            alpha_draws = alpha_draws.T

        # Check dimensions now match: (n_samples, nb)
        S, p = alpha_draws.shape
        if p != Bq.shape[1]:
            raise ValueError(
                f"Mismatch: posterior alpha has length {p}, but basis has {Bq.shape[1]} cols"
            )

        # draws of f^(k)(x): (n_samples, n_points)
        deriv_draws = alpha_draws @ Bq.T

        # summarize
        mean = deriv_draws.mean(axis=0)
        lower = np.percentile(deriv_draws, (1 - hdi_prob) / 2 * 100, axis=0)
        upper = np.percentile(deriv_draws, (1 + hdi_prob) / 2 * 100, axis=0)
        return mean, lower, upper

    # Bootstrap SEs
    if se_method == "bootstrap" and return_se:
        return self._bootstrap_predict(xq, derivative_order, B_boot, seed, n_jobs)

    # Analytic or plain prediction
    if derivative_order is None:
        assert self._xl is not None and self._xr is not None, "Domain bounds not set. Call fit() first."
        Bq, _ = b_spline_basis(xq, self._xl, self._xr, self.nseg, self.degree)
    else:
        if derivative_order <= 0:
            raise ValueError("derivative_order must be positive")
        assert self._xl is not None and self._xr is not None, "Domain bounds not set. Call fit() first."
        Bq, _ = b_spline_derivative_basis(
            xq,
            self._xl,
            self._xr,
            self.nseg,
            self.degree,
            derivative_order,
            self.knots,
        )

    fhat = Bq @ self.coef

    if not return_se:
        return fhat

    if se_method != "analytic":
        raise ValueError(f"Unknown se_method: {se_method}")

    if self.se_coef is None:
        raise RuntimeError("Analytic SEs unavailable. Call fit() with uncertainty computation.")

    cov_diag = self.se_coef**2
    B2 = Bq.multiply(Bq) if sp.issparse(Bq) else Bq**2
    se = np.sqrt(B2 @ cov_diag)
    return fhat, se

derivative ¶

derivative(
    x_new: ArrayLike,
    *,
    deriv_order: int = 1,
    return_se: bool = False,
    se_method: str = "analytic",
    **kwargs: Any,
) -> NDArray | tuple[NDArray, ...]

Compute k-th derivative of the fitted spline.

Parameters:

x_new (array - like) –

Points at which to evaluate the derivative.
deriv_order (int, default: 1 ) –

Order of derivative to compute.
return_se (bool, default: False ) –

Whether to return standard errors.
se_method (str, default: "analytic" ) –

Method for computing standard errors ("analytic" or "bootstrap").
**kwargs (Any, default: {} ) –

Additional arguments passed to bootstrap method if applicable.

Returns:

NDArray or tuple –

Derivative values, optionally with standard errors.

Source code in src/psplines/core.py

def derivative(
    self,
    x_new: ArrayLike,
    *,
    deriv_order: int = 1,
    return_se: bool = False,
    se_method: str = "analytic",
    **kwargs: Any
) -> NDArray | tuple[NDArray, ...]:
    """
    Compute k-th derivative of the fitted spline.

    Parameters
    ----------
    x_new : array-like
        Points at which to evaluate the derivative.
    deriv_order : int, default 1
        Order of derivative to compute.
    return_se : bool, default False
        Whether to return standard errors.
    se_method : str, default "analytic"
        Method for computing standard errors ("analytic" or "bootstrap").
    **kwargs
        Additional arguments passed to bootstrap method if applicable.

    Returns
    -------
    NDArray or tuple
        Derivative values, optionally with standard errors.
    """
    if self.coef is None:
        raise RuntimeError("Model not fitted. Call fit() first.")

    # Validate input
    try:
        xq = _as1d(x_new)
    except (ValueError, TypeError) as e:
        raise ValueError(f"Invalid x_new array: {e}")

    if xq.size == 0:
        raise ValueError("x_new cannot be empty")
    if not np.all(np.isfinite(xq)):
        raise ValueError("x_new contains non-finite values (NaN or inf)")
    if deriv_order <= 0:
        raise ValueError(f"deriv_order must be positive (got {deriv_order})")
    if se_method not in ("analytic", "bootstrap", "bayes"):
        raise ValueError(f"se_method must be 'analytic', 'bootstrap', or 'bayes' (got '{se_method}')")

    # Use predict method for bootstrap and Bayesian methods
    if se_method in ("bootstrap", "bayes"):
        return self.predict(
            x_new,
            derivative_order=deriv_order,
            return_se=return_se,
            se_method=se_method,
            **kwargs
        )

    # Direct computation for analytic method (more efficient)
    assert self._xl is not None and self._xr is not None, "Domain bounds not set. Call fit() first."
    Bq, _ = b_spline_derivative_basis(
        xq,
        self._xl,
        self._xr,
        self.nseg,
        self.degree,
        deriv_order,
        self.knots,
    )
    fhat = Bq @ self.coef

    if not return_se:
        return fhat

    if self.se_coef is None:
        raise RuntimeError("Analytic SEs unavailable. Call fit() with uncertainty computation.")

    cov_diag = self.se_coef**2
    B2 = Bq.multiply(Bq) if sp.issparse(Bq) else Bq**2
    se = np.sqrt(B2 @ cov_diag)
    return fhat, se

bayes_fit ¶

bayes_fit(
    a: float = 2.0,
    b: float = 0.1,
    c: float = 2.0,
    d: float = 1.0,
    draws: int = 2000,
    tune: int = 2000,
    chains: int = 4,
    cores: int = 4,
    target_accept: float = 0.9,
    random_seed: int | None = None,
) -> Any

Source code in src/psplines/core.py

def bayes_fit(
    self,
    a: float = 2.0,
    b: float = 0.1,
    c: float = 2.0,
    d: float = 1.0,
    draws: int = 2000,
    tune: int = 2000,
    chains: int = 4,
    cores: int = 4,
    target_accept: float = 0.9,
    random_seed: int | None = None,
) -> Any:
    # Prepare basis and penalty
    x_array = np.asarray(self.x)
    self._xl = float(np.min(x_array)) if self._xl is None else self._xl
    self._xr = float(np.max(x_array)) if self._xr is None else self._xr
    B_sp, self.knots = b_spline_basis(
        self.x, self._xl, self._xr, self.nseg, self.degree
    )
    B = B_sp.toarray() if sp.issparse(B_sp) else B_sp
    nb = B.shape[1]
    D = difference_matrix(nb, self.penalty_order).toarray()
    y = self.y
    I_nb = np.eye(nb)

    # store BSpline object for predict
    coeffs = np.eye(nb)
    self._spline = BSpline(self.knots, coeffs, self.degree, extrapolate=False)

    with pm.Model():
        lam = pm.Gamma("lam", alpha=a, beta=b, shape=D.shape[0])
        Q = pm.math.dot(D.T * lam, D)
        Q_j = Q + I_nb * 1e-6
        alpha = pm.MvNormal(
            "alpha",
            mu=pm.math.zeros(Q_j.shape[0]),
            tau=Q_j,
            shape=Q_j.shape[0],
        )
        sigma = pm.InverseGamma("sigma", alpha=c, beta=d)
        mu = pm.Deterministic("mu", pm.math.dot(B, alpha))
        pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
        trace = pm.sample(
            draws=draws,
            tune=tune,
            chains=chains,
            cores=cores,
            target_accept=target_accept,
            random_seed=random_seed,
        )

    # Store results
    self.trace = trace
    self.coef = (
        trace.posterior["alpha"]
        .stack(sample=("chain", "draw"))
        .mean(dim="sample")
        .values
    )
    self.fitted_values = (
        trace.posterior["mu"]
        .stack(sample=("chain", "draw"))
        .mean(dim="sample")
        .values
    )
    self.lambda_post = (
        trace.posterior["lam"]
        .stack(sample=("chain", "draw"))
        .mean(dim="sample")
        .values
    )
    return trace

plot_lam_trace ¶

plot_lam_trace(figsize: tuple[int, int] = (8, 6)) -> None

Plot trace and marginal posterior for each lambda_j.

Source code in src/psplines/core.py

def plot_lam_trace(self, figsize: tuple[int, int] = (8, 6)) -> None:
    """
    Plot trace and marginal posterior for each lambda_j.
    """
    az.plot_trace(self.trace, var_names=["lam"], figsize=figsize)

plot_alpha_trace ¶

plot_alpha_trace(figsize: tuple[int, int] = (8, 6)) -> None

Plot trace and marginal posterior for alpha coefficients.

Source code in src/psplines/core.py

def plot_alpha_trace(self, figsize: tuple[int, int] = (8, 6)) -> None:
    """
    Plot trace and marginal posterior for alpha coefficients.
    """
    az.plot_trace(self.trace, var_names=["alpha"], figsize=figsize)

plot_posterior ¶

plot_posterior(figsize: tuple[int, int] = (8, 6)) -> None

Plot posterior

Source code in src/psplines/core.py

def plot_posterior(self, figsize: tuple[int, int] = (8, 6)) -> None:
    """
    Plot posterior
    """
    az.plot_posterior(
        self.trace,
        var_names=["lam", "sigma"],
        figsize=figsize,
        hdi_prob=0.95,
        point_estimate="mean",
    )

Usage Examples¶

Basic Usage¶

import numpy as np
from psplines import PSpline

# Generate data
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(50)

# Create and fit spline
spline = PSpline(x, y, nseg=20, lambda_=1.0)
spline.fit()

# Make predictions
x_new = np.linspace(0, 1, 100)
y_pred = spline.predict(x_new)

With Uncertainty Quantification¶

# Analytical standard errors
y_pred, se = spline.predict(x_new, return_se=True)

# Bootstrap confidence intervals
y_pred, se_boot = spline.predict(x_new, return_se=True, 
                                se_method="bootstrap", B_boot=1000)

Derivative Computation¶

# First derivative
dy_dx = spline.derivative(x_new, deriv_order=1)

# Second derivative with uncertainty
d2y_dx2, se = spline.derivative(x_new, deriv_order=2, return_se=True)

Bayesian Inference¶

# Fit Bayesian model
trace = spline.bayes_fit(draws=2000, tune=1000)

# Get posterior credible intervals
mean, lower, upper = spline.predict(x_new, se_method="bayes")

Model Parameters¶

Basis Function Parameters¶

nseg: Number of B-spline segments
- Controls the flexibility of the basis
- More segments = more flexible fit
- Typical values: 10-50
degree: Degree of B-spline basis functions
- Controls local smoothness
- Higher degree = smoother derivatives
- Typical values: 1-4 (3 is most common)

Penalty Parameters¶

lambda_: Smoothing parameter
- Controls the trade-off between fit and smoothness
- Higher values = smoother fits
- Can be selected automatically
penalty_order: Order of the difference penalty
- 1: Penalizes first differences (rough penalty on slopes)
- 2: Penalizes second differences (rough penalty on curvature)
- 3: Penalizes third differences (rough penalty on jerk)

Constraint Parameters¶

constraints: Dictionary specifying boundary constraints
- "deriv": Derivative constraints at boundaries
- Example: {"deriv": {"order": 1, "initial": 0, "final": 0}}

Model Attributes¶

After fitting, the following attributes are available:

coef: B-spline coefficients
fitted_values: Fitted values at input points
knots: Knot vector used for B-splines
ED: Effective degrees of freedom
sigma2: Residual variance estimate
se_coef: Standard errors of coefficients
se_fitted: Standard errors of fitted values

Error Handling¶

The PSpline class includes comprehensive input validation:

Array validation: Checks for proper dimensions, finite values
Parameter validation: Ensures valid ranges and types
State validation: Verifies model is fitted before prediction
Constraint validation: Validates constraint specifications

All errors provide descriptive messages with suggested solutions.