Generalized Additive Models¶
Penalized regression spline GAMs via P-IRLS matching R’s mgcv::gam().
Generalized Additive Models (GAM).
Provides smooth term specification and basis construction for penalized regression spline GAMs following Wood (2017).
Usage:
from pystatistics.gam import gam, s
sol = gam(
y,
smooths=[s('x1', k=15), s('x2', bs='tp')],
smooth_data={'x1': x1, 'x2': x2},
)
print(sol.summary())
- class pystatistics.gam.GAMParams(coefficients, fitted_values, linear_predictor, residuals, edf, total_edf, lambdas, s_scales, covariance, scale, gcv, ubre, reml_score, deviance, null_deviance, log_likelihood, aic, n_obs, family_name, link_name, dispersion_fixed, converged, outer_converged, n_iter, method, backend_name)[source]¶
Bases:
objectParameters from a fitted GAM.
Carries the full numerical output of the stable (augmented-QR) penalized iteratively re-weighted least squares fitting procedure.
- Parameters:
coefficients (ndarray[tuple[Any, ...], dtype[floating[Any]]])
fitted_values (ndarray[tuple[Any, ...], dtype[floating[Any]]])
linear_predictor (ndarray[tuple[Any, ...], dtype[floating[Any]]])
total_edf (float)
scale (float)
gcv (float)
ubre (float)
reml_score (float | None)
deviance (float)
null_deviance (float)
log_likelihood (float)
aic (float)
n_obs (int)
family_name (str)
link_name (str)
dispersion_fixed (bool)
converged (bool)
outer_converged (bool)
n_iter (int)
method (str)
backend_name (str)
- coefficients¶
Full coefficient vector (parametric + constrained basis coefficients; smooth coefficients live in the sum-to-zero constrained parameterisation, so a smooth declared with
kbasis functions contributesk - 1coefficients — same as mgcv).- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.floating[Any]]]
- fitted_values¶
Response-scale predictions (mu_hat).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.floating[Any]]]
- linear_predictor¶
Link-scale predictions (eta_hat).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.floating[Any]]]
- residuals¶
Working residuals (y - mu_hat).
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.floating[Any]]]
- edf¶
Effective degrees of freedom per smooth term.
- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.floating[Any]]]
- lambdas¶
Selected (or user-fixed) smoothing parameters, one per smooth. For
crsmooths these are directly comparable to mgcv’ssp(identical penalty construction and scaling); fortpsmooths the fit is function-space identical to mgcv’s but the penalty parameterisation (and hence the sp units) differs by the eigenbasis convention — compare tp fits via EDF/fitted values, not raw sp.- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.floating[Any]]]
- s_scales¶
Per-smooth
S.scalepenalty normalisation factors (dividelambdasby these to get function-space units).- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.floating[Any]]]
- covariance¶
Bayesian posterior covariance of the coefficients (
scale * (X'WX + S_lambda)^{-1}, mgcv’sVp).- Type:
numpy.ndarray[tuple[Any, …], numpy.dtype[numpy.floating[Any]]]
- gcv¶
GCV score (scale-free families: the UBRE score is the criterion actually minimised; both are reported).
- Type:
- reml_score¶
Laplace REML criterion at the selected lambdas — only populated when
method='REML';Noneotherwise.- Type:
float | None
- aic¶
Akaike information criterion, classical GAM convention
-2 loglik + 2 (total_edf [+1 if scale estimated]). NOTE: mgcv >= 1.8.x corrects the df for smoothing-parameter uncertainty (Wood, Pya & Saefken 2016, ‘edf2’), so mgcv’sAIC()is systematically slightly larger; documented, not hidden.- Type:
- dispersion_fixed¶
True when the family’s dispersion is fixed at 1 (binomial, poisson, negative.binomial) — controls z-vs-t and Chi.sq-vs-F conventions downstream.
- Type:
- outer_converged¶
Whether the smoothing-parameter search converged away from its bounds (always True when
spwas user-fixed).- Type:
- backend_name¶
Execution-path disclosure (always
'cpu_qr_pirls'; the GAM module is CPU-only).- Type:
- class pystatistics.gam.SmoothInfo(term_name, var_name, basis_type, k, edf, ref_df, chi_sq, p_value, coef_indices, lambda_, s_scale)[source]¶
Bases:
objectInformation about a single smooth term in a fitted GAM.
One
SmoothInfois produced pers()term after fitting. It records the basis metadata and the approximate significance test for the smooth.- Parameters:
- basis_type¶
Basis identifier:
'cr'(cubic regression spline) or'tp'(thin plate regression spline).- Type:
- k¶
The basis dimension as declared in
s(x, k=...)(the smooth contributesk - 1coefficients after the sum-to-zero identifiability constraint, same as mgcv).- Type:
- ref_df¶
Reference degrees of freedom,
tr(2H - HH)over the term’s block (mgcv’s Ref.df convention).- Type:
- chi_sq¶
Approximate significance statistic. For fixed-dispersion families this is the chi-squared-referenced quadratic form
beta' pinv(V_beta) beta; for estimated-dispersion families it is that form divided by Ref.df, i.e. an F STATISTIC (the summary labels the column accordingly, matching mgcv). A simplified form of mgcv’s Wood (2013) test — agrees with mgcv’s reported statistic to ~0.2% on validated cases, not bit-identical.- Type:
- coef_indices¶
(start, end)slice into the full coefficient vector identifying this term’s (constrained) coefficients.
- s_scale¶
Penalty normalisation factor for this term (
lambda_ / s_scaleis in function-space units).- Type:
- pystatistics.gam.s(var_name, k=10, bs='cr')[source]¶
Convenience constructor for smooth terms, matching
mgcv::s().Usage:
from pystatistics.gam import s, gam result = gam(y, X, smooths=[s('x1', k=15), s('x2')])
- Parameters:
- Returns:
A
SmoothTermspecification object.- Return type:
- class pystatistics.gam.SmoothTerm(var_name, k=10, bs='cr')[source]¶
Bases:
objectSpecification for a smooth term
s(x)in a GAM formula.A pure, immutable-by-convention specification: variable name, basis type and basis dimension. All fit-derived quantities (basis matrices, penalties, constraint reparameterisations) live inside the fit — a
SmoothTermcan safely be reused across multiplegam()calls and datasets without stale-state hazards.- var_name¶
Name of the predictor variable.
- k¶
Basis dimension (>= 3), exactly as mgcv’s
s(x, k=...). The fitted smooth contributesk - 1coefficients after its sum-to-zero identifiability constraint (same as mgcv).
- bs¶
Basis type:
'cr'(cubic regression spline, default) or'tp'(thin plate regression spline).
- __init__(var_name, k=10, bs='cr')[source]¶
Create a smooth term specification.
- Parameters:
- Raises:
ValidationError – If
var_nameis empty,kis out of range, orbsis not a recognised basis type.- Return type:
None
- var_name¶
- k¶
- bs¶
- pystatistics.gam.gam(y, X=None, *, smooths=None, smooth_data=None, family='gaussian', method='GCV', sp=None, tol=1e-08, max_iter=200, names=None)[source]¶
Fit a Generalized Additive Model.
Matches R’s
mgcv::gam()for common cases. A GAM models the response as:g(E[y]) = beta_0 + f_1(x_1) + f_2(x_2) + ... + X @ beta
where each
f_jis a smooth function estimated from the data using penalized regression splines, with per-smooth sum-to-zero identifiability constraints (a smooth declared withkbasis functions contributesk - 1coefficients, exactly as mgcv).- Parameters:
y (_Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) – Response variable, 1-D array of n observations.
X (_Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str] | None) – Parametric design matrix
(n, p). May include an intercept column. IfNone, an intercept-only parametric part is used.smooths (list[SmoothTerm] | None) – Smooth term specifications from
s().smooth_data (dict[str, _Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]] | None) – Mapping from smooth variable names to
(n,)arrays.family (str | Family) – Exponential family, e.g.
'gaussian','binomial','poisson','Gamma'.method='REML'supports the fixed-dispersion families and the Gaussian-identity model.method (str) – Smoothing-parameter selection criterion:
'GCV'(mgcvGCV.Cpsemantics — UBRE is used automatically for known-scale families, as mgcv does) or'REML'(Laplace, Wood 2011).sp (_Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str] | None) – Optional fixed smoothing parameters, one per smooth (mgcv’s
sp=). Skips selection; use for reproducing a specific fit.tol (float) – P-IRLS convergence tolerance (relative penalized deviance).
max_iter (int) – Maximum P-IRLS iterations.
names (list[str] | None) – Optional display names for the parametric coefficients.
- Returns:
A
GAMSolution.- Raises:
ValidationError – invalid inputs, unknown family/method/basis,
kexceeding the unique values of a smooth variable, or aspvector of the wrong length/sign.ConvergenceError – a genuinely divergent P-IRLS (loud, never a silent answer).
- Return type:
- class pystatistics.gam.GAMSolution(_result, _smooth_infos, _names=None)[source]¶
Bases:
SolutionReprMixinUser-facing GAM result with convenient properties and R-style output.
Wraps a
Result[GAMParams]and provides:Coefficient access.
Smooth-term information (EDF, approximate significance tests).
R-style
summary()matchingmgcv::summary.gam().
- property coefficients: ndarray[tuple[Any, ...], dtype[_ScalarT]]¶
Full coefficient vector (parametric + basis).
- property edf: ndarray[tuple[Any, ...], dtype[_ScalarT]]¶
Effective degrees of freedom per smooth term.
- property smooth_terms: list[SmoothInfo]¶
Information about each smooth term.
- property ubre: float¶
UBRE / scaled-AIC score (the criterion mgcv’s GCV.Cp minimises for known-scale families).
- property lambdas: ndarray[tuple[Any, ...], dtype[_ScalarT]]¶
Selected (or user-fixed) smoothing parameters, one per smooth.
Reported in the same penalty scaling mgcv reports
spin; divide byparams.s_scalesfor function-space units.
- property covariance: ndarray[tuple[Any, ...], dtype[_ScalarT]]¶
Bayesian posterior covariance of the coefficients (mgcv
Vp).
- property se: ndarray[tuple[Any, ...], dtype[_ScalarT]]¶
Standard errors for ALL coefficients (posterior, mgcv-style).
- property r_squared_adj: float¶
Adjusted R-squared based on deviance explained.
Matches the formula used by
mgcv:R_adj = 1 - (deviance / df_resid) / (null_deviance / df_null)
- summary()[source]¶
R-style summary matching
summary(gam(...)).Includes:
Family and link function.
Parametric coefficient table with standard errors and tests.
Approximate significance tests for each smooth term.
Adjusted R-squared, deviance explained, GCV, and scale.
- Returns:
Multi-line summary string.
- Return type: