Multinomial Regression¶
Multinomial logit (softmax) regression matching R’s nnet::multinom().
Multinomial logistic regression (softmax regression).
Matches R’s nnet::multinom() for the multinomial logit model.
- Usage:
from pystatistics.multinomial import multinom, MultinomialSolution
result = multinom(y, X) result.summary()
- pystatistics.multinomial.multinom(y, X, *, names=None, category_names=None, tol=1e-08, max_iter=200, backend=None, conf_level=0.95)[source]¶
Fit a multinomial logistic regression model.
Estimates the multinomial logit (softmax regression) model via maximum likelihood using L-BFGS-B optimization. The last class (highest code) is the reference category, matching the convention used by R’s nnet::multinom().
- The model assumes:
P(Y = j | x) = exp(x’ beta_j) / sum_k exp(x’ beta_k)
with beta_J = 0 for identifiability.
- Parameters:
y (_Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) – Response vector of integer class codes 0, 1, …, J-1. Shape (n,).
X (_Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) – Design matrix of shape (n, p). The user is responsible for including an intercept column if desired, matching R’s behavior where formula syntax handles intercept addition.
names (list[str] | None) – Optional list of predictor names matching columns of X. If None, defaults to [“x0”, “x1”, …].
category_names (list[str] | None) – Optional list of class labels in order 0, 1, …, J-1. If None, defaults to [“0”, “1”, …].
tol (float) – Convergence tolerance for L-BFGS-B optimizer. Both function tolerance (ftol) and gradient tolerance (gtol) are set to this value.
max_iter (int) – Maximum number of optimizer iterations.
backend (str | None) – Compute backend = (device, precision).
'cpu'(default): the numpy reference path, validated against Rnnet::multinom.'gpu': float32, matches CPU at the project’sGPU_FP32tolerance tier (rtol = 1e-4) — raises if no GPU is available (Rule 1: no silent fallback).'gpu_fp64': float64, CUDA only (raises on MPS) for CPU-matching precision, subject to the consumer-NVIDIA FP64 throughput penalty.'auto': GPU-float32 if CUDA is present, else CPU. GPU keepsXand the one-hotyon the device for the full optimization; only the parameter vector and (scalar NLL, gradient vector) cross the bus per L-BFGS-B iteration.conf_level (float)
- Returns:
MultinomialSolution wrapping the fitted model.
- Raises:
ValidationError – If inputs fail validation.
ConvergenceError – If optimization does not converge.
- Return type:
Examples
>>> import numpy as np >>> from pystatistics.multinomial import multinom >>> rng = np.random.default_rng(42) >>> n = 200 >>> X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))]) >>> # True model: 3 classes, last is reference >>> y = rng.choice(3, size=n) >>> result = multinom(y, X) >>> result.summary()
- class pystatistics.multinomial.MultinomialSolution(_result, _conf_level=0.95)[source]¶
Bases:
SolutionReprMixinMultinomial logistic regression solution.
Wraps a Result[MultinomialParams] and provides convenient access to model quantities: coefficients, standard errors, z-values, p-values, predictions, and model fit statistics.
The reference class is the last class (highest code), matching R’s nnet::multinom() convention. Coefficients are reported for each non-reference class relative to the reference.
- _result¶
The underlying Result[MultinomialParams] object.
- property coefficient_matrix: ndarray[tuple[Any, ...], dtype[floating[Any]]]¶
Coefficient matrix of shape (J-1, p).
One row per non-reference class, one column per predictor.
- property coef: dict[str, dict[str, float]]¶
class -> predictor -> value.
Excludes the reference class (whose coefficients are all zero by definition).
- Returns:
Dict mapping each non-reference class name to a dict mapping predictor names to coefficient values.
- Type:
Coefficients as nested dict
- property vcov: ndarray[tuple[Any, ...], dtype[floating[Any]]]¶
Variance-covariance matrix of the estimated coefficients.
Shape
((J-1)*p, (J-1)*p), ordered to matchcoefficient_matrix.ravel()(row-major over (class, predictor)). The diagonal reshaped to(J-1, p)gives the squared standard errors.
- property standard_errors: ndarray[tuple[Any, ...], dtype[floating[Any]]]¶
Standard error matrix of shape (J-1, p).
Derived from the diagonal of the variance-covariance matrix.
- property z_values: ndarray[tuple[Any, ...], dtype[floating[Any]]]¶
Z-value (Wald statistic) matrix of shape (J-1, p).
Computed as coefficient / standard_error.
- property p_values: ndarray[tuple[Any, ...], dtype[floating[Any]]]¶
Two-sided p-value matrix of shape (J-1, p).
Computed from the z-values using the standard normal distribution.
- property conf_int: ndarray[tuple[Any, ...], dtype[floating[Any]]]¶
Wald confidence intervals, shape (J-1, p, 2).
coef ± z * seper (non-reference class, predictor), with the normal quantile forconf_level(multinomial inference is asymptotic-normal). The trailing axis is [lower, upper];exp(conf_int)gives relative-risk-ratio intervals.
- property fitted_probs: ndarray[tuple[Any, ...], dtype[floating[Any]]]¶
Predicted probability matrix of shape (n, J).
Each row sums to 1. Columns correspond to class_names in order.
- property predicted_class: ndarray[tuple[Any, ...], dtype[int64]]¶
Predicted class codes of shape (n,).
The predicted class for each observation is the one with the highest fitted probability (argmax).