Optimization API¶
The optimize module provides functions for automatic selection of the smoothing parameter λ using various criteria.
Functions¶
Find the λ that minimizes GCV = (RSS/n) / (1 - EDF/n)^2.
For GLM families, uses deviance in place of RSS.
Source code in src/psplines/optimize.py
Find the λ that minimizes AIC = n·log(dev/n) + 2·EDF.
For Gaussian: dev = RSS. For GLM: deviance from family. Works for Poisson density estimation (§3.3) and binomial smoothing (§3.2).
Source code in src/psplines/optimize.py
Pick λ via maximum curvature of the L-curve. Implements a two-stage search (coarse + refined), vectorized curvature computation, optional smoothing, and edge-case warnings.
Parameters:

- `pspline` (PSpline) – Fitted PSpline instance (coefficient solver ready).
- `lambda_bounds` (tuple, default: (1e-06, 1000000.0)) – (min, max) bounds for the initial λ grid (log-uniform).
- `num_lambda` (int, default: 81) – Number of points in the initial λ grid.
- `refine` (bool, default: True) – Whether to perform a second, finer search around the coarse optimum.
- `refine_factor` (float, default: 10) – Factor to widen/narrow bounds for refinement around the coarse λ.
- `refine_points` (int, default: 81) – Number of points in the refined grid.
- `smooth_kappa` (bool, default: True) – Whether to apply a 3-point moving average to curvature values.
Source code in src/psplines/optimize.py
Pick λ via the minimum distance between consecutive points on the V-curve.
Source code in src/psplines/optimize.py
2-D grid search for variable-penalty parameters (λ, γ).
The penalty matrix uses exponentially varying weights
v_j = exp(γ j / m) (§8.8). For each (λ, γ) pair the model is
solved and scored with GCV or AIC.
Parameters:

- `pspline` (PSpline) – A fitted PSpline instance (used for basis, knots, data).
- `gamma_range` ((float, float), default: (-20.0, 20.0)) – Bounds for the γ grid (linear scale).
- `lambda_bounds` ((float, float), default: (1e-06, 1000000.0)) – Bounds for the λ grid (log₁₀ scale).
- `num_gamma` (int, default: 41) – Number of γ grid points.
- `num_lambda` (int, default: 41) – Number of λ grid points.
- `criterion` ({'gcv', 'aic'}, default: 'gcv') – Selection criterion.
Returns:

- `best_lambda` (float) – Optimal λ value.
- `best_gamma` (float) – Optimal γ value.
- `best_score` (float) – Criterion value at the optimum.
- `scores` (ndarray, shape (num_gamma, num_lambda)) – Full score surface.
Source code in src/psplines/optimize.py
Usage Examples¶
Cross-Validation (Recommended)¶
import numpy as np
from psplines import PSpline
from psplines.optimize import cross_validation
# Create spline
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(100)
spline = PSpline(x, y, nseg=20)
# Find optimal lambda using cross-validation
best_lambda, cv_score = cross_validation(spline)
print(f"Optimal λ: {best_lambda:.6f}, CV score: {cv_score:.6f}")
# Use optimal lambda
spline.lambda_ = best_lambda
spline.fit()
AIC-Based Selection¶
from psplines.optimize import aic
# Find optimal lambda using AIC
best_lambda, aic_score = aic(spline)
print(f"Optimal λ: {best_lambda:.6f}, AIC: {aic_score:.6f}")
GLM Models (Poisson, Binomial)¶
All optimizers work with GLM P-splines. For non-Gaussian families, scoring uses deviance instead of RSS, and each candidate λ requires a full IRLS fit to convergence:
# Poisson P-spline with AIC-optimal lambda
counts = np.random.poisson(np.exp(np.sin(x)), 100)
spline_pois = PSpline(x, counts, nseg=20, family="poisson")
best_lambda, aic_score = aic(spline_pois)
spline_pois.lambda_ = best_lambda
spline_pois.fit()
# Binomial P-spline with cross-validation
trials = np.full(100, 20)
successes = np.random.binomial(trials, 1 / (1 + np.exp(-2 * np.sin(x))))
spline_bin = PSpline(x, successes, nseg=20, family="binomial", trials=trials)
best_lambda, cv_score = cross_validation(spline_bin)
spline_bin.lambda_ = best_lambda
spline_bin.fit()
L-Curve Method¶
from psplines.optimize import l_curve
# Find optimal lambda using L-curve
best_lambda, curvature = l_curve(spline)
print(f"Optimal λ: {best_lambda:.6f}, Curvature: {curvature:.6f}")
Variable Penalty Parameter Selection¶
When using exponentially varying penalty weights (§8.8), variable_penalty_cv
performs a 2-D grid search over \((\lambda, \gamma)\):
from psplines.optimize import variable_penalty_cv
# Fit an initial spline (required for basis info)
spline = PSpline(x, y, nseg=20)
spline.fit()
# 2-D grid search for best (λ, γ) using GCV
best_lambda, best_gamma, best_score, scores = variable_penalty_cv(
spline,
gamma_range=(-10, 10),
lambda_bounds=(1e-4, 1e4),
num_gamma=41,
num_lambda=41,
criterion="gcv",
)
print(f"Optimal λ={best_lambda:.4f}, γ={best_gamma:.2f}")
# Apply the optimal parameters
spline.lambda_ = best_lambda
spline.penalty_gamma = best_gamma
spline.fit()
Comparing Methods¶
import matplotlib.pyplot as plt
from psplines.optimize import cross_validation, aic, l_curve, v_curve
# Compare different methods
methods = {
'CV': cross_validation,
'AIC': aic,
'L-curve': l_curve,
'V-curve': v_curve
}
results = {}
for name, method in methods.items():
try:
lambda_opt, score = method(spline)
results[name] = lambda_opt
print(f"{name}: λ = {lambda_opt:.6f}")
except Exception as e:
print(f"{name} failed: {e}")
# Plot comparison
if results:
methods_list = list(results.keys())
lambdas = list(results.values())
plt.figure(figsize=(8, 5))
plt.bar(methods_list, lambdas)
plt.yscale('log')
plt.ylabel('Optimal λ')
plt.title('Comparison of Parameter Selection Methods')
plt.show()
Mathematical Background¶
Cross-Validation¶
Generalized Cross-Validation (GCV) minimizes:

$$\text{GCV}(\lambda) = \frac{n \|y - S_\lambda y\|^2}{[n - \text{tr}(S_\lambda)]^2}$$
where \(S_\lambda = B(B^TB + \lambda D^TD)^{-1}B^T\) is the smoothing matrix.
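As an illustration of this formula, here is a minimal NumPy sketch that uses a polynomial design matrix as a stand-in for the B-spline basis \(B\) (all names here are illustrative; this is not the library's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Polynomial basis with a second-difference penalty, standing in
# for the B-spline basis B and difference matrix D of the text.
B = np.vander(x, 8, increasing=True)
D = np.diff(np.eye(B.shape[1]), n=2, axis=0)

def gcv(lam):
    # S_lambda = B (B^T B + lam D^T D)^{-1} B^T
    S = B @ np.linalg.solve(B.T @ B + lam * D.T @ D, B.T)
    resid = y - S @ y
    edf = np.trace(S)                       # effective degrees of freedom
    return n * (resid @ resid) / (n - edf) ** 2

lambdas = np.logspace(-6, 6, 25)
scores = np.array([gcv(l) for l in lambdas])
best = lambdas[int(np.argmin(scores))]
```

The library's optimizers perform the analogous grid search over the actual fitted B-spline basis and penalty; the sketch only shows the arithmetic of the criterion.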
Advantages:

- Well-established statistical foundation
- Good performance across various problems
- Automatic selection without user input

Disadvantages:

- Can be computationally expensive
- May oversmooth in some cases
Akaike Information Criterion (AIC)¶
AIC balances fit quality and model complexity:

$$\text{AIC}(\lambda) = n \log(\hat{\sigma}^2) + 2 \cdot \text{ED}(\lambda)$$
where:

- \(\hat{\sigma}^2 = \|y - S_\lambda y\|^2 / n\) is the residual variance
- \(\text{ED}(\lambda) = \text{tr}(S_\lambda)\) is the effective degrees of freedom
For GLM families (Poisson, Binomial), AIC uses deviance instead of RSS:

$$\text{AIC}(\lambda) = \text{Dev}(\lambda) + 2 \cdot \text{ED}(\lambda)$$
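A hedged numeric sketch of the GLM version, computing the Poisson deviance and the resulting AIC from hand-supplied fitted values (`mu` and `ed` are illustrative stand-ins, not produced by a real fit):

```python
import numpy as np

# AIC(lambda) = Dev(lambda) + 2 * ED(lambda) for a Poisson model.
y = np.array([2.0, 4.0, 3.0, 7.0, 5.0])    # observed counts
mu = np.array([2.5, 3.5, 3.5, 6.0, 5.5])   # hypothetical fitted means
ed = 2.3                                   # hypothetical tr(S_lambda)

# Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)),
# with the convention y*log(y/mu) = 0 when y = 0.
term = np.where(y > 0, y * np.log(np.where(y > 0, y, 1.0) / mu), 0.0)
dev = 2.0 * np.sum(term - (y - mu))
aic = dev + 2.0 * ed
```

In the library, `mu` comes from the converged IRLS fit at each candidate λ, so every grid point pays the cost of a full GLM fit.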
Advantages:

- Information-theoretic foundation
- Fast computation
- Good for model comparison
- Works with both Gaussian and GLM families

Disadvantages:

- May not work well for all noise levels
- Less robust than cross-validation
L-Curve Method¶
The L-curve plots \(\log(\|D\alpha_\lambda\|^2)\) vs \(\log(\|y - B\alpha_\lambda\|^2)\) and finds the point of maximum curvature.
Curvature is computed as:

$$\kappa(\lambda) = \frac{2(\rho' \eta'' - \rho'' \eta')}{(\rho'^2 + \eta'^2)^{3/2}}$$
where \(\rho(\lambda) = \log(\|y - B\alpha_\lambda\|^2)\) and \(\eta(\lambda) = \log(\|D\alpha_\lambda\|^2)\).
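A small self-contained sketch of corner detection via this curvature formula, using synthetic \(\rho\) and \(\eta\) curves shaped like a typical L-curve (real usage would compute them from fits across a λ grid):

```python
import numpy as np

t = np.linspace(-6, 6, 201)          # log10(lambda) grid
rho = np.log1p(np.exp(t - 1.0))      # log residual norm: flat, then rising
eta = np.log1p(np.exp(-(t + 1.0)))   # log penalty norm: falling, then flat

# Finite-difference derivatives with respect to t
d_rho, d_eta = np.gradient(rho, t), np.gradient(eta, t)
dd_rho, dd_eta = np.gradient(d_rho, t), np.gradient(d_eta, t)

# kappa = 2*(rho' eta'' - rho'' eta') / (rho'^2 + eta'^2)^{3/2}
kappa = 2 * (d_rho * dd_eta - dd_rho * d_eta) / (d_rho**2 + d_eta**2) ** 1.5
corner = t[int(np.argmax(kappa))]    # log10 of the selected lambda
```

The factor of 2 only rescales \(\kappa\), so it does not change where the maximum lies; the 3-point smoothing mentioned in the parameter list would be applied to `kappa` before taking the argmax.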
Advantages:

- Intuitive geometric interpretation
- No statistical assumptions
- Good for ill-posed problems

Disadvantages:

- Can be sensitive to noise
- May not have a clear corner
V-Curve Method¶
Similar to the L-curve, but instead of maximizing curvature it measures the distance between consecutive points on the \((\rho, \eta)\) path and selects λ at the valley (minimum) of that distance profile.
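A minimal sketch of the valley search, assuming the V-curve is the distance between consecutive points on the \((\rho, \eta)\) path (synthetic curves again; names are illustrative):

```python
import numpy as np

t = np.linspace(-6, 6, 201)          # log10(lambda) grid
rho = np.log1p(np.exp(t - 1.0))      # log residual norm (synthetic)
eta = np.log1p(np.exp(-(t + 1.0)))   # log penalty norm (synthetic)

# Distance between consecutive points on the (rho, eta) path:
# small steps mean the trade-off is changing slowly, i.e. the corner.
dist = np.hypot(np.diff(rho), np.diff(eta))
t_mid = 0.5 * (t[:-1] + t[1:])       # midpoints of the lambda grid
valley = t_mid[int(np.argmin(dist))] # log10 of the selected lambda
```

Unlike the L-curve, this requires no second derivatives, which tends to make the valley search numerically tamer.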
Variable Penalty Selection (§8.8)¶
For the exponential variable penalty \(P(\gamma) = D^T\text{diag}(e^{\gamma j/m})D\),
variable_penalty_cv evaluates GCV (or AIC) over a 2-D grid of
\((\lambda, \gamma)\) values and returns the combination that minimizes the criterion.
This adds a single extra hyperparameter \(\gamma\) that controls the spatial distribution of the penalty weight while \(\lambda\) controls overall smoothness.
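A short sketch of how such a penalty matrix could be assembled with NumPy (illustrative only; the library's actual construction may differ):

```python
import numpy as np

# P(gamma) = D^T diag(exp(gamma * j / m)) D with second differences D,
# following the v_j = exp(gamma * j / m) weights of §8.8.
m = 10                                   # number of basis coefficients
D = np.diff(np.eye(m), n=2, axis=0)      # second-order difference matrix
j = np.arange(D.shape[0])                # index of each difference row

def penalty(gamma):
    v = np.exp(gamma * j / m)            # exponentially varying weights
    return D.T @ (v[:, None] * D)        # D^T diag(v) D without forming diag
```

Setting `gamma = 0` recovers the ordinary penalty \(D^TD\), so the standard P-spline is nested inside the variable-penalty family.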
Implementation Details¶
Optimization Strategy¶
All methods use:

1. Logarithmic search: test λ values on a logarithmic grid
2. Golden section search: refine around the optimal region
3. Sparse linear algebra: efficient computation of smoothing matrices
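The two-stage strategy can be sketched as follows, with a toy score function standing in for GCV/AIC (illustrative, not the library's code):

```python
import numpy as np

def score(lam):                  # stand-in for GCV/AIC; minimum at lam = 10
    return (np.log10(lam) - 1.0) ** 2

# Stage 1: coarse logarithmic grid
grid = np.logspace(-6, 6, 50)
i = int(np.argmin([score(l) for l in grid]))
lo = np.log10(grid[max(i - 1, 0)])
hi = np.log10(grid[min(i + 1, len(grid) - 1)])

# Stage 2: golden-section refinement on the log10 scale
phi = (np.sqrt(5) - 1) / 2
while hi - lo > 1e-6:
    a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
    if score(10**a) < score(10**b):
        hi = b                   # minimum lies in [lo, b]
    else:
        lo = a                   # minimum lies in [a, hi]
best = 10 ** ((lo + hi) / 2)
```

Bracketing the coarse optimum with its two grid neighbors guarantees the refinement interval contains the minimum, provided the score is unimodal there.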
Default Parameter Ranges¶
- λ range: \(10^{-6}\) to \(10^6\) (12 orders of magnitude)
- Grid points: 50 logarithmically spaced values
- Refinement: Golden section search with tolerance \(10^{-6}\)
Performance Considerations¶
- Cross-validation: \(O(n^3)\) for dense problems, \(O(n^2)\) for sparse
- AIC: \(O(n^2)\) computation per λ value
- L-curve: \(O(n^2)\) plus curvature computation
- V-curve: Similar to L-curve
Numerical Stability¶
- Uses sparse Cholesky decomposition when possible
- Handles ill-conditioned matrices gracefully
- Monitors condition numbers and issues warnings
Method Selection Guidelines¶
Recommended Approach¶
- Start with cross-validation: Most robust for general use
- Try AIC: If CV is too slow or gives unreasonable results
- Use L-curve: For regularization-heavy applications
- Compare methods: Check consistency across approaches
Problem-Specific Recommendations¶
- Small datasets (n < 100): Cross-validation or AIC
- Large datasets (n > 1000): AIC or L-curve for speed
- High noise: Cross-validation (more robust)
- Low noise: Any method should work well
- Sparse data: L-curve or V-curve
- Time series: Cross-validation with temporal structure
Troubleshooting¶
If optimization fails:

1. Check input data for issues (NaN, infinite values)
2. Try different nseg values
3. Reduce the λ search range
4. Use a different optimization method
5. Manually specify λ based on domain knowledge