Multinomial Regression

Multinomial logit (softmax) regression matching R’s nnet::multinom().

Multinomial logistic regression (softmax regression).

Matches R’s nnet::multinom() for the multinomial logit model.

Usage:

from pystatistics.multinomial import multinom, MultinomialSolution

result = multinom(y, X) result.summary()

pystatistics.multinomial.multinom(y, X, *, names=None, category_names=None, tol=1e-08, max_iter=200, backend=None, conf_level=0.95)[source]

Fit a multinomial logistic regression model.

Estimates the multinomial logit (softmax regression) model via maximum likelihood using L-BFGS-B optimization. The last class (highest code) is the reference category, matching the convention used by R’s nnet::multinom().

The model assumes:

P(Y = j | x) = exp(x’ beta_j) / sum_k exp(x’ beta_k)

with beta_J = 0 for identifiability.

Parameters:
  • y (_Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) – Response vector of integer class codes 0, 1, …, J-1. Shape (n,).

  • X (_Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]) – Design matrix of shape (n, p). The user is responsible for including an intercept column if desired, matching R’s behavior where formula syntax handles intercept addition.

  • names (list[str] | None) – Optional list of predictor names matching columns of X. If None, defaults to [“x0”, “x1”, …].

  • category_names (list[str] | None) – Optional list of class labels in order 0, 1, …, J-1. If None, defaults to [“0”, “1”, …].

  • tol (float) – Convergence tolerance for L-BFGS-B optimizer. Both function tolerance (ftol) and gradient tolerance (gtol) are set to this value.

  • max_iter (int) – Maximum number of optimizer iterations.

  • backend (str | None) – Compute backend = (device, precision). 'cpu' (default): the numpy reference path, validated against R nnet::multinom. 'gpu': float32, matches CPU at the project’s GPU_FP32 tolerance tier (rtol = 1e-4) — raises if no GPU is available (Rule 1: no silent fallback). 'gpu_fp64': float64, CUDA only (raises on MPS) for CPU-matching precision, subject to the consumer-NVIDIA FP64 throughput penalty. 'auto': GPU-float32 if CUDA is present, else CPU. GPU keeps X and the one-hot y on the device for the full optimization; only the parameter vector and (scalar NLL, gradient vector) cross the bus per L-BFGS-B iteration.

  • conf_level (float)

Returns:

MultinomialSolution wrapping the fitted model.

Raises:
Return type:

MultinomialSolution

Examples

>>> import numpy as np
>>> from pystatistics.multinomial import multinom
>>> rng = np.random.default_rng(42)
>>> n = 200
>>> X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
>>> # True model: 3 classes, last is reference
>>> y = rng.choice(3, size=n)
>>> result = multinom(y, X)
>>> result.summary()
class pystatistics.multinomial.MultinomialSolution(_result, _conf_level=0.95)[source]

Bases: SolutionReprMixin

Multinomial logistic regression solution.

Wraps a Result[MultinomialParams] and provides convenient access to model quantities: coefficients, standard errors, z-values, p-values, predictions, and model fit statistics.

The reference class is the last class (highest code), matching R’s nnet::multinom() convention. Coefficients are reported for each non-reference class relative to the reference.

Parameters:
  • _result (Result[MultinomialParams])

  • _conf_level (float)

_result

The underlying Result[MultinomialParams] object.

__init__(_result, _conf_level=0.95)[source]

Initialize from a Result[MultinomialParams].

Parameters:
  • _result (Result[MultinomialParams]) – The fitted model result. Must contain a MultinomialParams payload.

  • _conf_level (float) – Confidence level for conf_int (default 0.95).

Return type:

None

property coefficient_matrix: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Coefficient matrix of shape (J-1, p).

One row per non-reference class, one column per predictor.

property coef: dict[str, dict[str, float]]

class -> predictor -> value.

Excludes the reference class (whose coefficients are all zero by definition).

Returns:

Dict mapping each non-reference class name to a dict mapping predictor names to coefficient values.

Type:

Coefficients as nested dict

property vcov: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Variance-covariance matrix of the estimated coefficients.

Shape ((J-1)*p, (J-1)*p), ordered to match coefficient_matrix.ravel() (row-major over (class, predictor)). The diagonal reshaped to (J-1, p) gives the squared standard errors.

property standard_errors: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Standard error matrix of shape (J-1, p).

Derived from the diagonal of the variance-covariance matrix.

property z_values: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Z-value (Wald statistic) matrix of shape (J-1, p).

Computed as coefficient / standard_error.

property p_values: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Two-sided p-value matrix of shape (J-1, p).

Computed from the z-values using the standard normal distribution.

property conf_level: float

Confidence level for conf_int (default 0.95).

property conf_int: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Wald confidence intervals, shape (J-1, p, 2).

coef ± z * se per (non-reference class, predictor), with the normal quantile for conf_level (multinomial inference is asymptotic-normal). The trailing axis is [lower, upper]; exp(conf_int) gives relative-risk-ratio intervals.

property fitted_probs: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Predicted probability matrix of shape (n, J).

Each row sums to 1. Columns correspond to class_names in order.

property predicted_class: ndarray[tuple[Any, ...], dtype[int64]]

Predicted class codes of shape (n,).

The predicted class for each observation is the one with the highest fitted probability (argmax).

property log_likelihood: float

Maximized log-likelihood of the fitted model.

property aic: float

Akaike Information Criterion.

property deviance: float

Residual deviance (-2 * log_likelihood).

property null_deviance: float

Null deviance (-2 * null_log_likelihood).

property pseudo_r_squared: float

McFadden’s pseudo R-squared.

Defined as 1 - (log_likelihood / null_log_likelihood). Values closer to 1 indicate better fit.

property n_obs: int

Number of observations.

property n_classes: int

Number of response classes J.

property category_names: tuple[str, ...]

Response category labels in order (last is reference).

property feature_names: tuple[str, ...]

Predictor names matching columns of X.

property converged: bool

Whether the optimizer converged.

property n_iter: int

Number of optimizer iterations.

summary()[source]

Generate R-style summary of the fitted model.

Returns a formatted string matching the style of R’s summary.multinom() output, showing coefficients, standard errors, and z-values for each non-reference class.

Returns:

Formatted summary string.

Return type:

str