Core API

The core module contains the main PSpline class that implements penalized B-spline smoothing.

PSpline Class

Univariate penalized B-spline smoother.

fit

fit(
    *, xl: float | None = None, xr: float | None = None
) -> PSpline

Fit the P-spline model.

For Gaussian family, solves the penalized normal equations directly. For Poisson/Binomial families, uses IRLS (eq. 2.21–2.22).

Parameters:

  • xl (float, default: None ) –

    Left boundary of the domain. Defaults to min(x).

  • xr (float, default: None ) –

    Right boundary of the domain. Defaults to max(x).

Returns:

  • PSpline

    The fitted spline object.
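For the Gaussian family, the fit reduces to the penalized normal equations \((B'B + \lambda D'D)\alpha = B'y\). A minimal NumPy sketch of that step (the Gaussian-bump "basis" here is a stand-in for the real B-spline basis the class builds internally):

```python
import numpy as np

# Sketch of the Gaussian fitting step: solve the penalized normal
# equations (B'B + lambda * D'D) alpha = B'y.  The Gaussian-bump
# "basis" below is a stand-in for the real B-spline basis.
rng = np.random.default_rng(0)
n, nb, lam, order = 50, 10, 1.0, 2

x = np.linspace(0, 1, n)
centers = np.linspace(0, 1, nb)
B = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / 0.1) ** 2)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Difference penalty: D takes order-th differences of the coefficients
D = np.diff(np.eye(nb), n=order, axis=0)

alpha = np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T @ y)
fitted = B @ alpha
```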

Source code in src/psplines/core.py
def fit(self, *, xl: float | None = None, xr: float | None = None) -> PSpline:
    """
    Fit the P-spline model.

    For Gaussian family, solves the penalized normal equations directly.
    For Poisson/Binomial families, uses IRLS (eq. 2.21–2.22).

    Parameters
    ----------
    xl : float, optional
        Left boundary of the domain. Defaults to min(x).
    xr : float, optional
        Right boundary of the domain. Defaults to max(x).

    Returns
    -------
    PSpline
        The fitted spline object.
    """
    # Validate and set domain
    x_array = np.asarray(self.x)
    x_min = float(np.min(x_array))
    x_max = float(np.max(x_array))

    if xl is not None:
        if not np.isfinite(xl):
            raise ValidationError("xl must be finite")
        if xl > x_min:
            raise ValidationError(f"xl ({xl}) must be <= min(x) ({x_min})")

    if xr is not None:
        if not np.isfinite(xr):
            raise ValidationError("xr must be finite")
        if xr < x_max:
            raise ValidationError(f"xr ({xr}) must be >= max(x) ({x_max})")

    self._xl = xl if xl is not None else x_min
    self._xr = xr if xr is not None else x_max

    if self._xl >= self._xr:
        raise ValidationError(f"xl ({self._xl}) must be < xr ({self._xr})")

    # basis and penalty
    self.B, self.knots = b_spline_basis(
        self.x, self._xl, self._xr, self.nseg, self.degree
    )
    nb = self.B.shape[1]

    # Build base penalty (may be modified by variable/adaptive modes)
    if self.penalty_gamma is not None:
        self._DtD = variable_penalty_matrix(
            nb, self.penalty_order, self.penalty_gamma
        )
    else:
        D = difference_matrix(nb, self.penalty_order)
        self._DtD = (D.T @ D).tocsr()

    # constraints
    self._build_constraints(nb)

    fam = self._family_obj
    assert fam is not None

    # Dispatch to the appropriate fitting strategy
    if self.adaptive:
        self._fit_adaptive(fam)
    elif self.shape:
        self._fit_shape_constrained(fam)
    elif fam.is_gaussian:
        self._fit_gaussian()
    else:
        self._fit_irls(fam)

    return self

predict

predict(
    x_new: ArrayLike,
    *,
    return_se: bool = False,
    se_method: str = "analytic",
    scale: str = "response",
    n_boot: int = 5000,
    seed: int | None = None,
    n_jobs: int = 1,
    ci_prob: float = 0.95,
) -> NDArray | tuple[NDArray, ...]

Predict fitted values with optional uncertainty.

For derivatives, use derivative() instead.

Parameters:

  • x_new (array-like) –

    Points at which to evaluate the spline.

  • return_se (bool, default: False ) –

    Whether to return standard errors.

  • se_method (str, default: "analytic" ) –

    Method for computing uncertainty:

      • 'analytic': delta-method SE -> (fhat, se) or (mu_hat, lower, upper)
      • 'bootstrap': parametric bootstrap SEs -> (fhat, se)
      • 'bayes': posterior mean + credible interval -> (mean, lower, upper)

  • scale (str, default: "response" ) –

    Scale for predictions:

      • 'response': predictions on the response scale (mu for Poisson, pi for Binomial)
      • 'link': predictions on the linear predictor scale (eta)

    For Gaussian, both are identical.

  • n_boot (int, default: 5000 ) –

    Number of bootstrap replicates (for bootstrap method).

  • seed (int, default: None ) –

    Random seed (for bootstrap method).

  • n_jobs (int, default: 1 ) –

    Number of parallel jobs (for bootstrap method).

  • ci_prob (float, default: 0.95 ) –

    Probability for confidence/credible interval.

Returns:

  • NDArray or tuple

    Predictions, optionally with uncertainty estimates. For a GLM with return_se=True and scale="response", returns (mu_hat, lower, upper) with the confidence interval on the response scale. For scale="link" with return_se=True, returns (eta_hat, se_eta).
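The response-scale interval described above can be sketched in a few lines. The eta values, standard errors, and the exp inverse link (the Poisson case) below are illustrative stand-ins:

```python
import numpy as np
from scipy.stats import norm

# Sketch of the response-scale interval: a normal CI on the linear
# predictor eta, mapped through the inverse link (exp for Poisson).
# eta_hat and se_eta below are made-up values.
eta_hat = np.array([0.0, 1.0, 2.0])
se_eta = np.array([0.1, 0.2, 0.3])
ci_prob = 0.95

z = norm.ppf((1 + ci_prob) / 2)          # ~1.96 for a 95% interval
mu_hat = np.exp(eta_hat)
mu_lower = np.exp(eta_hat - z * se_eta)  # endpoints transformed,
mu_upper = np.exp(eta_hat + z * se_eta)  # so the CI stays positive
```

Because the endpoints (not the point estimate) are transformed, the resulting interval is asymmetric around mu_hat but always positive.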

Source code in src/psplines/core.py
def predict(
    self,
    x_new: ArrayLike,
    *,
    return_se: bool = False,
    se_method: str = "analytic",
    scale: str = "response",
    n_boot: int = 5000,
    seed: int | None = None,
    n_jobs: int = 1,
    ci_prob: float = 0.95,
) -> NDArray | tuple[NDArray, ...]:
    """
    Predict fitted values with optional uncertainty.

    For derivatives, use :meth:`derivative` instead.

    Parameters
    ----------
    x_new : array-like
        Points at which to evaluate the spline.
    return_se : bool, default False
        Whether to return standard errors.
    se_method : str, default "analytic"
        Method for computing uncertainty:
        - 'analytic': delta-method SE -> (fhat, se) or (mu_hat, lower, upper)
        - 'bootstrap': parametric bootstrap SEs -> (fhat, se)
        - 'bayes': posterior mean + credible interval -> (mean, lower, upper)
    scale : str, default "response"
        Scale for predictions:
        - 'response': predictions on response scale (mu for Poisson, pi for Binomial)
        - 'link': predictions on linear predictor scale (eta)
        For Gaussian, both are identical.
    n_boot : int, default 5000
        Number of bootstrap replicates (for bootstrap method).
    seed : int, optional
        Random seed (for bootstrap method).
    n_jobs : int, default 1
        Number of parallel jobs (for bootstrap method).
    ci_prob : float, default 0.95
        Probability for confidence/credible interval.

    Returns
    -------
    NDArray or tuple
        Predictions, optionally with uncertainty estimates.
        For GLM with return_se=True and scale="response":
          returns (mu_hat, lower, upper) with CI on response scale.
        For scale="link" with return_se=True:
          returns (eta_hat, se_eta).
    """
    if self.coef is None:
        raise FittingError("Model not fitted. Call fit() first.")

    xq = self._validate_query_points(x_new)
    self._validate_predict_params(se_method, scale, n_boot, n_jobs, ci_prob)

    fam = self._family_obj
    assert fam is not None

    # Bayesian credible band
    if se_method == "bayes":
        return self._bayes_predict(xq, deriv_order=None, ci_prob=ci_prob)

    # Bootstrap SEs
    if se_method == "bootstrap" and return_se:
        return self._bootstrap_predict(xq, None, n_boot, seed, n_jobs)

    # Analytic or plain prediction
    if self._xl is None or self._xr is None:
        raise FittingError("Domain bounds not set. Call fit() first.")
    Bq, _ = b_spline_basis(xq, self._xl, self._xr, self.nseg, self.degree)

    eta_hat: NDArray = np.asarray(Bq @ self.coef).ravel()

    if not return_se:
        if scale == "link" or fam.is_gaussian:
            return eta_hat
        return np.asarray(fam.inverse_link(eta_hat))

    if self._Ainv is None:
        raise FittingError("Analytic SEs unavailable. Call fit() first.")

    se_eta = self._compute_se(Bq)

    if scale == "link" or fam.is_gaussian:
        return eta_hat, se_eta

    # Response scale: transform CI endpoints via inverse link (§2.12.3)
    from scipy.stats import norm

    z_val = norm.ppf((1 + ci_prob) / 2)
    eta_lower = eta_hat - z_val * se_eta
    eta_upper = eta_hat + z_val * se_eta
    mu_hat = np.asarray(fam.inverse_link(eta_hat))
    mu_lower = np.asarray(fam.inverse_link(eta_lower))
    mu_upper = np.asarray(fam.inverse_link(eta_upper))
    return mu_hat, mu_lower, mu_upper

derivative

derivative(
    x_new: ArrayLike,
    *,
    deriv_order: int = 1,
    return_se: bool = False,
    se_method: str = "analytic",
    n_boot: int = 5000,
    seed: int | None = None,
    n_jobs: int = 1,
    ci_prob: float = 0.95,
) -> NDArray | tuple[NDArray, ...]

Compute k-th derivative of the fitted spline.

Parameters:

  • x_new (array-like) –

    Points at which to evaluate the derivative.

  • deriv_order (int, default: 1 ) –

    Order of derivative to compute.

  • return_se (bool, default: False ) –

    Whether to return standard errors.

  • se_method (str, default: "analytic" ) –

    Method for computing uncertainty:

      • 'analytic': delta-method SE -> (fhat, se)
      • 'bootstrap': parametric bootstrap SEs -> (fhat, se)
      • 'bayes': posterior mean + credible interval -> (mean, lower, upper)

  • n_boot (int, default: 5000 ) –

    Number of bootstrap replicates (for bootstrap method).

  • seed (int, default: None ) –

    Random seed (for bootstrap method).

  • n_jobs (int, default: 1 ) –

    Number of parallel jobs (for bootstrap method).

  • ci_prob (float, default: 0.95 ) –

    Probability for confidence/credible interval.

Returns:

  • NDArray or tuple

    Derivative values, optionally with standard errors.
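The idea can be sketched with SciPy's BSpline directly (the knots and coefficients below are made up; the real class builds these internally from the fitted model):

```python
import numpy as np
from scipy.interpolate import BSpline

# Sketch using scipy directly: differentiate a spline built on
# equally spaced P-spline knots.  Coefficients here are made up;
# linear-in-index coefficients give a linear spline on [xl, xr].
degree, nseg = 3, 10
xl, xr = 0.0, 1.0
dx = (xr - xl) / nseg
knots = np.linspace(xl - degree * dx, xr + degree * dx,
                    nseg + 2 * degree + 1)
nb = len(knots) - degree - 1              # = nseg + degree basis functions

coef = np.linspace(0.0, 1.0, nb)
spl = BSpline(knots, coef, degree, extrapolate=False)

x_new = np.linspace(0.1, 0.9, 5)
d1 = spl.derivative(1)(x_new)             # constant slope = dc/dx here
```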

Source code in src/psplines/core.py
def derivative(
    self,
    x_new: ArrayLike,
    *,
    deriv_order: int = 1,
    return_se: bool = False,
    se_method: str = "analytic",
    n_boot: int = 5000,
    seed: int | None = None,
    n_jobs: int = 1,
    ci_prob: float = 0.95,
) -> NDArray | tuple[NDArray, ...]:
    """
    Compute k-th derivative of the fitted spline.

    Parameters
    ----------
    x_new : array-like
        Points at which to evaluate the derivative.
    deriv_order : int, default 1
        Order of derivative to compute.
    return_se : bool, default False
        Whether to return standard errors.
    se_method : str, default "analytic"
        Method for computing uncertainty:
        - 'analytic': delta-method SE -> (fhat, se)
        - 'bootstrap': parametric bootstrap SEs -> (fhat, se)
        - 'bayes': posterior mean + credible interval -> (mean, lower, upper)
    n_boot : int, default 5000
        Number of bootstrap replicates (for bootstrap method).
    seed : int, optional
        Random seed (for bootstrap method).
    n_jobs : int, default 1
        Number of parallel jobs (for bootstrap method).
    ci_prob : float, default 0.95
        Probability for confidence/credible interval.

    Returns
    -------
    NDArray or tuple
        Derivative values, optionally with standard errors.
    """
    if self.coef is None:
        raise FittingError("Model not fitted. Call fit() first.")

    xq = self._validate_query_points(x_new)
    if deriv_order <= 0:
        raise ValidationError(f"deriv_order must be positive (got {deriv_order})")
    if se_method not in ("analytic", "bootstrap", "bayes"):
        raise ValidationError(
            f"se_method must be 'analytic', 'bootstrap', or 'bayes' (got '{se_method}')"
        )

    # Bayesian credible band
    if se_method == "bayes":
        return self._bayes_predict(xq, deriv_order=deriv_order, ci_prob=ci_prob)

    # Bootstrap SEs
    if se_method == "bootstrap" and return_se:
        return self._bootstrap_predict(xq, deriv_order, n_boot, seed, n_jobs)

    # Analytic method
    if self._xl is None or self._xr is None:
        raise FittingError("Domain bounds not set. Call fit() first.")
    Bq, _ = b_spline_derivative_basis(
        xq,
        self._xl,
        self._xr,
        self.nseg,
        self.degree,
        deriv_order,
        self.knots,
    )
    fhat: NDArray = np.asarray(Bq @ self.coef).ravel()

    if not return_se:
        return fhat

    if self._Ainv is None:
        raise FittingError("Analytic SEs unavailable. Call fit() first.")

    se = self._compute_se(Bq)
    return fhat, se

bayes_fit

bayes_fit(
    a: float = 2.0,
    b: float = 0.1,
    c: float = 2.0,
    d: float = 1.0,
    draws: int = 2000,
    tune: int = 2000,
    chains: int = 4,
    cores: int = 4,
    target_accept: float = 0.9,
    random_seed: int | None = None,
    adaptive: bool = False,
    nuts_sampler: str = "pymc",
) -> Any

Fit the P-spline model using Bayesian inference via PyMC.

Two modes are available:

Standard (default, adaptive=False) — implements the Bayesian P-spline of Eilers & Marx §3.5 / Lang & Brezger (2003) with a single penalty parameter:

\[
\lambda \sim \text{Gamma}(a, b), \quad
\alpha \mid \lambda \sim N\bigl(0, (\lambda D'D + \epsilon I)^{-1}\bigr), \quad
\sigma \sim \text{InverseGamma}(c, d)
\]

Adaptive (adaptive=True) — uses one penalty parameter per row of the difference matrix, allowing spatially varying smoothness (similar to §8.8):

\[
\lambda_j \sim \text{Gamma}(a, b), \quad
\alpha \mid \lambda \sim N\bigl(0, (D' \Lambda D + \epsilon I)^{-1}\bigr)
\]

Requires the bayes optional dependencies: pip install psplines[bayes]

Parameters:

  • a (float, default: 2.0 ) –

    Shape parameter for the Gamma prior on penalty lambda(s).

  • b (float, default: 0.1 ) –

    Rate parameter for the Gamma prior on penalty lambda(s).

  • c (float, default: 2.0 ) –

    Shape parameter for the InverseGamma prior on sigma.

  • d (float, default: 1.0 ) –

    Scale parameter for the InverseGamma prior on sigma.

  • draws (int, default: 2000 ) –

    Number of MCMC posterior draws per chain.

  • tune (int, default: 2000 ) –

    Number of tuning (warmup) steps per chain.

  • chains (int, default: 4 ) –

    Number of MCMC chains.

  • cores (int, default: 4 ) –

    Number of CPU cores for parallel sampling.

  • target_accept (float, default: 0.9 ) –

    Target acceptance rate for the NUTS sampler.

  • random_seed (int, default: None ) –

    Random seed for reproducibility.

  • adaptive (bool, default: False ) –

    If False, use a single scalar penalty lambda (§3.5). If True, use per-difference lambdas for spatially adaptive smoothing (§8.8).

  • nuts_sampler (str, default: 'pymc' ) –

    NUTS sampler backend passed to pymc.sample(). "pymc" (default) uses the built-in PyTensor/C backend. "nutpie" compiles the model once via numba and samples chains in parallel threads, which is significantly faster for multi-chain runs. Requires nutpie to be installed (pip install psplines[nutpie]).

Returns:

  • InferenceData

    The posterior trace object (an arviz.InferenceData).

Source code in src/psplines/core.py
def bayes_fit(
    self,
    a: float = 2.0,
    b: float = 0.1,
    c: float = 2.0,
    d: float = 1.0,
    draws: int = 2000,
    tune: int = 2000,
    chains: int = 4,
    cores: int = 4,
    target_accept: float = 0.9,
    random_seed: int | None = None,
    adaptive: bool = False,
    nuts_sampler: str = "pymc",
) -> Any:
    """
    Fit the P-spline model using Bayesian inference via PyMC.

    Two modes are available:

    **Standard (default, adaptive=False)** — implements the Bayesian
    P-spline of Eilers & Marx §3.5 / Lang & Brezger (2003) with a
    single penalty parameter:

    .. math::

        \\lambda \\sim \\text{Gamma}(a, b), \\quad
        \\alpha \\mid \\lambda \\sim N(0, (\\lambda D'D + \\epsilon I)^{-1}), \\quad
        \\sigma \\sim \\text{InverseGamma}(c, d)

    **Adaptive (adaptive=True)** — uses one penalty parameter per row of
    the difference matrix, allowing spatially varying smoothness
    (similar to §8.8):

    .. math::

        \\lambda_j \\sim \\text{Gamma}(a, b), \\quad
        \\alpha \\mid \\lambda \\sim N(0, (D' \\Lambda D + \\epsilon I)^{-1})

    Requires the ``bayes`` optional dependencies: ``pip install psplines[bayes]``

    Parameters
    ----------
    a : float, default 2.0
        Shape parameter for the Gamma prior on penalty lambda(s).
    b : float, default 0.1
        Rate parameter for the Gamma prior on penalty lambda(s).
    c : float, default 2.0
        Shape parameter for the InverseGamma prior on sigma.
    d : float, default 1.0
        Scale parameter for the InverseGamma prior on sigma.
    draws : int, default 2000
        Number of MCMC posterior draws per chain.
    tune : int, default 2000
        Number of tuning (warmup) steps per chain.
    chains : int, default 4
        Number of MCMC chains.
    cores : int, default 4
        Number of CPU cores for parallel sampling.
    target_accept : float, default 0.9
        Target acceptance rate for the NUTS sampler.
    random_seed : int, optional
        Random seed for reproducibility.
    adaptive : bool, default False
        If False, use a single scalar penalty lambda (§3.5).
        If True, use per-difference lambdas for spatially adaptive
        smoothing (§8.8).
    nuts_sampler : str, optional
        NUTS sampler backend passed to ``pymc.sample()``.  ``"pymc"``
        (default) uses the built-in PyTensor/C backend.  ``"nutpie"``
        compiles the model once via numba and samples chains in parallel
        threads, which is significantly faster for multi-chain runs.
        Requires ``nutpie`` to be installed (``pip install psplines[nutpie]``).

    Returns
    -------
    arviz.InferenceData
        The posterior trace object.
    """
    pm = _import_pymc()
    # Prepare basis and penalty
    x_array = np.asarray(self.x)
    self._xl = float(np.min(x_array)) if self._xl is None else self._xl
    self._xr = float(np.max(x_array)) if self._xr is None else self._xr
    B_sp, self.knots = b_spline_basis(
        self.x, self._xl, self._xr, self.nseg, self.degree
    )
    B = B_sp.toarray() if sp.issparse(B_sp) else B_sp
    nb = B.shape[1]
    D = difference_matrix(nb, self.penalty_order).toarray()
    DtD = D.T @ D
    y = self.y
    I_nb = np.eye(nb)

    # store BSpline object for predict
    coeffs = np.eye(nb)
    self._spline = BSpline(self.knots, coeffs, self.degree, extrapolate=False)

    with pm.Model():
        if adaptive:
            # Per-difference lambdas (spatially adaptive, §8.8)
            lam = pm.Gamma("lam", alpha=a, beta=b, shape=D.shape[0])
            Q = pm.math.dot(D.T * lam, D)
        else:
            # Single scalar lambda (standard Bayesian P-spline, §3.5)
            lam = pm.Gamma("lam", alpha=a, beta=b)
            Q = lam * DtD

        Q_j = Q + I_nb * 1e-6
        alpha = pm.MvNormal(
            "alpha",
            mu=pm.math.zeros(Q_j.shape[0]),
            tau=Q_j,
            shape=Q_j.shape[0],
        )
        sigma = pm.InverseGamma("sigma", alpha=c, beta=d)
        mu = pm.Deterministic("mu", pm.math.dot(B, alpha))
        pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
        trace = pm.sample(
            draws=draws,
            tune=tune,
            chains=chains,
            cores=cores,
            target_accept=target_accept,
            random_seed=random_seed,
            nuts_sampler=nuts_sampler,
        )

    # Store results
    self.trace = trace
    self.coef = (
        trace.posterior["alpha"]
        .stack(sample=("chain", "draw"))
        .mean(dim="sample")
        .values
    )
    self.fitted_values = (
        trace.posterior["mu"]
        .stack(sample=("chain", "draw"))
        .mean(dim="sample")
        .values
    )
    self.lambda_post = (
        trace.posterior["lam"]
        .stack(sample=("chain", "draw"))
        .mean(dim="sample")
        .values
    )
    return trace

plot_lam_trace

plot_lam_trace(figsize: tuple[int, int] = (8, 6)) -> None

Plot trace and marginal posterior for each lambda_j.

Source code in src/psplines/core.py
def plot_lam_trace(self, figsize: tuple[int, int] = (8, 6)) -> None:
    """
    Plot trace and marginal posterior for each lambda_j.
    """
    az = _import_arviz()
    az.plot_trace(self.trace, var_names=["lam"], figsize=figsize)

plot_alpha_trace

plot_alpha_trace(figsize: tuple[int, int] = (8, 6)) -> None

Plot trace and marginal posterior for alpha coefficients.

Source code in src/psplines/core.py
def plot_alpha_trace(self, figsize: tuple[int, int] = (8, 6)) -> None:
    """
    Plot trace and marginal posterior for alpha coefficients.
    """
    az = _import_arviz()
    az.plot_trace(self.trace, var_names=["alpha"], figsize=figsize)

plot_posterior

plot_posterior(figsize: tuple[int, int] = (8, 6)) -> None

Plot posterior distributions of lambda and sigma.

Source code in src/psplines/core.py
def plot_posterior(self, figsize: tuple[int, int] = (8, 6)) -> None:
    """
    Plot posterior distributions of lambda and sigma.
    """
    az = _import_arviz()
    az.plot_posterior(
        self.trace,
        var_names=["lam", "sigma"],
        figsize=figsize,
        hdi_prob=0.95,
        point_estimate="mean",
    )

Usage Examples

Basic Usage

import numpy as np
from psplines import PSpline, ShapeConstraint, SlopeZeroConstraint

# Generate data
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(50)

# Create and fit spline
spline = PSpline(x, y, nseg=20, lambda_=1.0)
spline.fit()

# Make predictions
x_new = np.linspace(0, 1, 100)
y_pred = spline.predict(x_new)

With Uncertainty Quantification

# Analytical standard errors
y_pred, se = spline.predict(x_new, return_se=True)

# Bootstrap confidence intervals
y_pred, se_boot = spline.predict(x_new, return_se=True,
                                 se_method="bootstrap", n_boot=1000)

Derivative Computation

# First derivative
dy_dx = spline.derivative(x_new, deriv_order=1)

# Second derivative with uncertainty
d2y_dx2, se = spline.derivative(x_new, deriv_order=2, return_se=True)

With Observation Weights

# Heteroscedastic weights (trust right side of data more)
weights = 1.0 + 5.0 * x
spline = PSpline(x, y, weights=weights)
spline.fit()

# Missing data via zero weights
weights = np.ones_like(x)
weights[20:30] = 0.0  # mark observations 20-29 as missing
spline = PSpline(x, y, weights=weights)
spline.fit()  # penalty interpolates smoothly through the gap

Poisson P-Spline (Count Data)

# Smooth count data with log link
years = np.arange(1851, 1963, dtype=float)
counts = ...  # annual event counts

spline = PSpline(years, counts, nseg=20, lambda_=100, family="poisson")
spline.fit()

# Predictions are positive (response scale)
x_new = np.linspace(1851, 1962, 200)
mu_pred = spline.predict(x_new)  # scale="response" is default

# Confidence intervals on response scale (asymmetric, always positive)
mu_hat, lower, upper = spline.predict(x_new, return_se=True)

# Linear predictor (log scale)
eta = spline.predict(x_new, scale="link")

Shape-Constrained Smoothing

# Monotone increasing fit
x = np.linspace(0, 5, 60)
y = np.log(x + 1) + 0.2 * np.random.randn(60)

spline = PSpline(x, y, nseg=20, lambda_=1.0,
                 shape=[ShapeConstraint(type="increasing")])
spline.fit()

# All first differences of the fitted coefficients will be ≥ 0
y_pred = spline.predict(np.linspace(0, 5, 200))

# Concave fit
spline = PSpline(x, y, nseg=20, lambda_=1.0,
                 shape=[ShapeConstraint(type="concave")])
spline.fit()

# Multiple constraints: increasing AND concave
spline = PSpline(x, y, nseg=20, lambda_=1.0,
                 shape=[ShapeConstraint(type="increasing"),
                        ShapeConstraint(type="concave")])
spline.fit()

# Selective domain constraint (monotone only for x ≤ 3)
spline = PSpline(x, y, nseg=20, lambda_=1.0,
                 shape=[ShapeConstraint(type="increasing",
                                        domain=(float(x.min()), 3.0))])
spline.fit()

Variable Penalty (Exponential Weights)

# Heavier smoothing toward the right boundary (γ > 0)
spline = PSpline(x, y, nseg=20, lambda_=1.0, penalty_gamma=5.0)
spline.fit()

Adaptive Penalty (Spatially Varying Smoothness)

# Nonparametric adaptive smoothing — the penalty weights are
# estimated from the data via a secondary B-spline basis.
spline = PSpline(x, y, nseg=20, lambda_=1.0,
                 adaptive=True, adaptive_nseg=10,
                 adaptive_lambda=100.0)
spline.fit()

# The estimated per-difference weights are stored in:
print(spline._adaptive_weights)

Poisson with Exposure Offsets

# Rate modeling: mu = exp(B*alpha) * exposure
exposure = ...  # population at risk
spline = PSpline(x, counts, family="poisson", offset=np.log(exposure))
spline.fit()

Binomial P-Spline (Binary/Proportion Data)

# Bernoulli response (y in {0, 1})
spline = PSpline(age, presence, nseg=15, lambda_=10, family="binomial")
spline.fit()

# Fitted probabilities are in [0, 1]
pi_hat = spline.predict(age_new)

# Grouped binomial (y successes out of t trials)
spline = PSpline(x, successes, family="binomial", trials=trials_vec)
spline.fit()

Density Estimation

from psplines import density_estimate

# Smooth density from raw data (AIC-optimal lambda)
result = density_estimate(raw_data, bins=100, penalty_order=3)

# result.density integrates to ~1
# result.grid contains evaluation points
# result.pspline is the underlying fitted PSpline

Bayesian Inference

# Standard Bayesian P-spline (single λ, §3.5 of Eilers & Marx 2021)
trace = spline.bayes_fit(draws=2000, tune=1000)

# Adaptive Bayesian P-spline (per-difference λ_j, §8.8)
trace = spline.bayes_fit(draws=2000, tune=1000, adaptive=True)

# Get posterior credible intervals (works with either mode)
mean, lower, upper = spline.predict(x_new, se_method="bayes")

Model Parameters

Basis Function Parameters

  • nseg: Number of B-spline segments

    • Controls the flexibility of the basis
    • More segments = more flexible fit
    • Typical values: 10-50
  • degree: Degree of B-spline basis functions

    • Controls local smoothness
    • Higher degree = smoother derivatives
    • Typical values: 1-4 (3 is most common)

Penalty Parameters

  • lambda_: Smoothing parameter

    • Controls the trade-off between fit and smoothness
    • Higher values = smoother fits
    • Can be selected automatically
  • penalty_order: Order of the difference penalty

    • 1: Penalizes first differences (roughness penalty on slopes)
    • 2: Penalizes second differences (roughness penalty on curvature)
    • 3: Penalizes third differences (roughness penalty on jerk)
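A small numerical illustration of what the difference penalty measures (pure NumPy, independent of the library):

```python
import numpy as np

# What the penalty measures: D_k @ alpha holds the k-th differences
# of the coefficients, and the penalty is lambda * ||D_k @ alpha||^2.
nb = 6
alpha = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])   # a linear trend

D1 = np.diff(np.eye(nb), n=1, axis=0)   # first differences
D2 = np.diff(np.eye(nb), n=2, axis=0)   # second differences

pen1 = float(alpha @ D1.T @ D1 @ alpha)  # slopes are penalized
pen2 = float(alpha @ D2.T @ D2 @ alpha)  # zero: a line has no curvature
```

A linear coefficient trend incurs a first-order penalty but no second-order penalty, which is why higher penalty orders shrink toward higher-degree polynomial fits.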

Observation Weights

  • weights: Optional array of non-negative observation weights
    • Same length as x and y
    • Replaces the normal equations with weighted form: \((B'WB + \lambda D'D)\alpha = B'Wy\)
    • Higher weights give more influence to those observations
    • Zero weights effectively mark observations as missing — the penalty interpolates through gaps
    • None (default) treats all observations equally (equivalent to weights=1)
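The weighted normal equations above can be sketched directly in NumPy (the Gaussian-bump basis below is a stand-in for the real B-spline basis):

```python
import numpy as np

# Sketch of the weighted form (B'WB + lambda D'D) alpha = B'Wy with
# W = diag(weights).  Zero weights drop observations; the penalty
# bridges the resulting gap.
rng = np.random.default_rng(1)
n, nb, lam = 40, 8, 1.0
x = np.linspace(0, 1, n)
centers = np.linspace(0, 1, nb)
B = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / 0.12) ** 2)
y = np.cos(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

w = np.ones(n)
w[15:25] = 0.0                           # treat these points as missing
D = np.diff(np.eye(nb), n=2, axis=0)

A = B.T @ (w[:, None] * B) + lam * (D.T @ D)
alpha = np.linalg.solve(A, B.T @ (w * y))
fitted = B @ alpha                       # defined everywhere, gap included
```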

Constraint Parameters

  • constraints: Dictionary specifying boundary constraints

    • "deriv": Derivative constraints at boundaries
    • Example: {"deriv": {"order": 1, "initial": 0, "final": 0}}
  • slope_zero: Enforce flat slope in a subdomain

    • Example: slope_zero=SlopeZeroConstraint(domain=(2.0, 4.0))

Shape Constraint Parameters

  • shape: List of ShapeConstraint specifications (§8.7)

    • Each entry is a ShapeConstraint with type (required) and optional domain
    • Supported types: "increasing", "decreasing", "convex", "concave", "nonneg"
    • Optional domain=(lo, hi) restricts the constraint to that \(x\)-range
    • Multiple constraints can be combined (e.g. increasing + concave)
    • Example: [ShapeConstraint(type="increasing"), ShapeConstraint(type="concave", domain=(0, 5))]
  • shape_kappa: Large penalty weight for shape violations (default \(10^8\))

    • Higher values enforce the constraint more strictly
    • Too large may cause numerical issues
  • max_shape_iter: Maximum iterations for the asymmetric-penalty loop (default 50)

Adaptive / Variable Penalty Parameters

  • adaptive: Enable nonparametric adaptive penalty (§8.8, default False)

    • Estimates spatially varying weights via a secondary B-spline basis
    • Alternates between weight estimation and coefficient estimation
  • adaptive_nseg: Number of segments for the secondary weight basis (default 10)

  • adaptive_lambda: Smoothing parameter for the weight basis (default 100.0)

  • adaptive_max_iter: Maximum outer iterations for adaptive loop (default 20)

  • penalty_gamma: Exponential variable penalty rate (§8.8, default None)

    • When set, uses weights \(v_j = \exp(\gamma j / m)\)
    • Positive \(\gamma\) → heavier penalty toward the right boundary
    • Negative \(\gamma\) → heavier penalty toward the left boundary
    • None (default) uses the standard uniform penalty
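The exponential weights can be illustrated in a line of NumPy:

```python
import numpy as np

# The exponential variable-penalty weights v_j = exp(gamma * j / m):
# positive gamma grows toward the right, negative toward the left.
m, gamma = 10, 5.0
j = np.arange(m)
v = np.exp(gamma * j / m)   # strictly increasing for gamma > 0
```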

Model Attributes

After fitting, the following attributes are available:

  • coef: B-spline coefficients
  • fitted_values: Fitted values at input points
  • knots: Knot vector used for B-splines
  • ED: Effective degrees of freedom
  • phi_: Residual variance estimate
  • se_coef: Standard errors of coefficients
  • se_fitted: Standard errors of fitted values

Error Handling

The PSpline class includes comprehensive input validation:

  • Array validation: Checks for proper dimensions, finite values
  • Parameter validation: Ensures valid ranges and types
  • State validation: Verifies model is fitted before prediction
  • Constraint validation: Validates constraint specifications

All errors provide descriptive messages with suggested solutions.