Multivariate Normal MLE

Maximum likelihood estimation for multivariate normal distributions with missing data. Direct BFGS and EM algorithms. Little’s MCAR test. Missing data pattern analysis.

Multivariate normal maximum likelihood estimation with missing data.

Public API:

mlest(data, …) -> MVNSolution

pystatistics.mvnmle.mlest(data_or_design, *, algorithm='direct', backend='auto', method=None, tol=None, max_iter=None, verbose=False)

Maximum likelihood estimation for multivariate normal with missing data.

Accepts EITHER:
  1. An MVNDesign object

  2. Raw data array or DataFrame (convenience)

Parameters:
  • data_or_design (array-like or MVNDesign) – Data matrix with NaN for missing values, or MVNDesign object.

  • algorithm (str) – Estimation algorithm:

    • 'direct' (default): BFGS optimization on the log-likelihood, using R-exact inverse Cholesky parameterization.

    • 'em': Expectation-Maximization algorithm. Typically slower to converge, but each iteration is guaranteed not to decrease the likelihood.

  • backend (str) – Backend selection: 'auto', 'cpu', 'gpu'.

  • method (str or None) – Optimization method for direct algorithm. If None, auto-selected by backend. Ignored for EM.

  • tol (float or None) – Convergence tolerance. If None, uses algorithm-appropriate default: direct = 1e-5 (gradient tolerance), em = 1e-4 (parameter change).

  • max_iter (int or None) – Maximum iterations. If None, uses algorithm-appropriate default: direct = 100, em = 1000.

  • verbose (bool) – Print progress information.

Return type:

MVNSolution
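
To make the 'em' option more concrete, here is a minimal pure-Python sketch of EM for the simplest case: pairs (x1, x2) where x1 is always observed and x2 may be NaN. It is an illustration under those assumptions, not the library's implementation; the function name em_bivariate and its return dictionary are hypothetical. The stopping rule uses the largest parameter change, mirroring the description of tol for EM above.

```python
import math


def em_bivariate(rows, tol=1e-4, max_iter=1000):
    """EM for (x1, x2) pairs where x1 is always observed and x2 may be NaN.

    Hypothetical helper for illustration; handles only this one pattern.
    """
    n = len(rows)
    complete = [r for r in rows if not math.isnan(r[1])]
    # Starting values: available-case means and variances, zero covariance.
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in complete) / len(complete)
    s11 = sum((r[0] - m1) ** 2 for r in rows) / n
    s22 = sum((r[1] - m2) ** 2 for r in complete) / len(complete)
    theta = (m1, m2, s11, 0.0, s22)
    for n_iter in range(1, max_iter + 1):
        m1, m2, s11, s12, s22 = theta
        beta = s12 / s11              # regression slope of x2 on x1
        resid_var = s22 - beta * s12  # Var(x2 | x1)
        t1 = t2 = t11 = t12 = t22 = 0.0
        for x1, x2 in rows:
            if math.isnan(x2):
                # E-step: conditional expectations of x2 and x2^2 given x1.
                e2 = m2 + beta * (x1 - m1)
                e22 = e2 * e2 + resid_var
            else:
                e2, e22 = x2, x2 * x2
            t1 += x1
            t2 += e2
            t11 += x1 * x1
            t12 += x1 * e2
            t22 += e22
        # M-step: complete-data MLE from the expected sufficient statistics.
        new_m1, new_m2 = t1 / n, t2 / n
        new_theta = (
            new_m1,
            new_m2,
            t11 / n - new_m1 ** 2,
            t12 / n - new_m1 * new_m2,
            t22 / n - new_m2 ** 2,
        )
        change = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if change < tol:
            return {"theta": theta, "n_iter": n_iter, "converged": True}
    return {"theta": theta, "n_iter": max_iter, "converged": False}


nan = float("nan")
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, nan), (6.0, nan)]
fit = em_bivariate(rows, tol=1e-10)  # tight tolerance, for illustration only
mu1, mu2, s11, s12, s22 = fit["theta"]
```

For this monotone pattern the maximum likelihood estimates have a closed form (the mean of all x1 values, and the complete-case mean of x2 adjusted by the complete-case regression slope), which gives a convenient check on the iteration.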

Examples

>>> from pystatistics.mvnmle import mlest, datasets
>>> result = mlest(datasets.apple)
>>> result_em = mlest(datasets.apple, algorithm='em')
>>> print(result.muhat)
>>> print(result.loglik)

class pystatistics.mvnmle.MVNDesign(_data, _n, _p)

Bases: object

Design for multivariate normal MLE with missing data.

Wraps a data matrix (n observations x p variables) that may contain NaN values representing missing data. Immutable after construction.

Construction:

  MVNDesign.from_array(data)
  MVNDesign.from_datasource(ds, columns=['a', 'b', 'c'])

classmethod from_array(data)

Build MVNDesign from array-like data.

Parameters:

data (array-like) – 2D data matrix. Can be numpy array, pandas DataFrame, or any array-like with .values attribute.

Return type:

MVNDesign

classmethod from_datasource(source, *, columns=None)

Build MVNDesign from a DataSource.

Parameters:
  • source (DataSource) – Data source providing columns

  • columns (list of str, optional) – Column names to include. If None, uses all columns.

Return type:

MVNDesign

property data: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Data matrix (n x p), may contain NaN.

property n: int

Number of observations.

property p: int

Number of variables.

property n_missing: int

Total number of missing values.

property missing_rate: float

Overall missing rate (0.0 to 1.0).

property has_missing: bool

Whether data has any missing values.
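
As a rough illustration of what the missing-data properties report, the sketch below computes the same quantities for a small nested list. It assumes that missing_rate is the number of NaN cells divided by n × p, which the description above suggests but does not state; the variable names simply mirror the property names.

```python
import math

nan = float("nan")
data = [
    [1.0, 2.0, nan],
    [4.0, nan, 6.0],
    [7.0, 8.0, 9.0],
    [nan, nan, 3.0],
]

n = len(data)      # number of observations (rows)
p = len(data[0])   # number of variables (columns)

# Count NaN cells across the whole matrix.
n_missing = sum(math.isnan(v) for row in data for v in row)

# Assumed definition: share of all n * p cells that are missing.
missing_rate = n_missing / (n * p)
has_missing = n_missing > 0
```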

class pystatistics.mvnmle.MVNSolution(_result, _design)

Bases: object

User-facing MVN MLE results.

Wraps the backend Result and provides convenient accessors for all MVN estimation outputs.

property muhat: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Estimated mean vector.

property sigmahat: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Estimated covariance matrix.

property loglik: float

Log-likelihood at the estimated parameters.
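
For reference, the observed-data log-likelihood of a multivariate normal with missing values is usually written as below, where o_i is the set of variables observed in row i and p_i is its size. Whether the library reports this quantity with or without the constant term is not stated here, so treat the exact form as an assumption.

```latex
\ell(\mu, \Sigma)
  = -\frac{1}{2} \sum_{i=1}^{n} \Bigl[
      p_i \log(2\pi)
      + \log \lvert \Sigma_{o_i} \rvert
      + (y_{i,o_i} - \mu_{o_i})^{\top} \Sigma_{o_i}^{-1} (y_{i,o_i} - \mu_{o_i})
    \Bigr]
```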

property converged: bool

Whether the optimization converged.

property n_iter: int

Number of iterations.

property gradient_norm: float | None

Final gradient norm, if available.

property correlation_matrix: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Correlation matrix derived from estimated covariance.

property standard_deviations: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Standard deviations from estimated covariance.
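
The two derived quantities above follow from the covariance matrix in the standard way: standard deviations are the square roots of the diagonal, and each correlation is a covariance divided by the product of the two standard deviations. A minimal sketch, using a hypothetical helper cov_to_corr on plain nested lists:

```python
import math


def cov_to_corr(sigma):
    """Return (standard deviations, correlation matrix) for a covariance matrix."""
    size = len(sigma)
    sd = [math.sqrt(sigma[i][i]) for i in range(size)]
    corr = [
        [sigma[i][j] / (sd[i] * sd[j]) for j in range(size)]
        for i in range(size)
    ]
    return sd, corr


sigma = [[4.0, 3.0], [3.0, 9.0]]
sd, corr = cov_to_corr(sigma)
```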

property aic: float

Akaike Information Criterion.

property bic: float

Bayesian Information Criterion.
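
The textbook definitions are AIC = 2k − 2ℓ and BIC = k·ln(n) − 2ℓ, where ℓ is the log-likelihood and k the number of free parameters. The sketch below assumes k = p + p(p + 1)/2 (p means plus the unique covariance entries) and that n is the number of rows; the documentation above does not confirm either choice, and the helper name is hypothetical.

```python
import math


def mvn_information_criteria(loglik, n, p):
    """Textbook AIC/BIC for an unstructured p-variate normal model."""
    # Assumed parameter count: p means + p(p+1)/2 unique covariance entries.
    k = p + p * (p + 1) // 2
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n) - 2 * loglik
    return k, aic, bic


k, aic, bic = mvn_information_criteria(loglik=-50.0, n=20, p=2)
```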

property info: dict[str, Any]

Backend metadata.

property timing: dict[str, float] | None

Execution timing breakdown.

property backend_name: str

Name of the backend that produced this result.

property warnings: tuple[str, ...]

Non-fatal warnings from computation.

summary()

Generate summary output.

Return type:

str

to_dict()

Convert to dictionary for serialization.

Return type:

dict[str, Any]

class pystatistics.mvnmle.MVNParams(muhat, sigmahat, loglik, n_iter, converged, gradient_norm=None)

Bases: object

Parameter payload for MVN MLE.

Immutable data computed by backends.

Parameters:
muhat: ndarray[tuple[Any, ...], dtype[floating[Any]]]
sigmahat: ndarray[tuple[Any, ...], dtype[floating[Any]]]
loglik: float
n_iter: int
converged: bool
gradient_norm: float | None = None

pystatistics.mvnmle.analyze_patterns(data)

Analyze missingness patterns in the data.

Parameters:

data (array-like) – Data matrix with missing values as np.nan. Can be NumPy array or pandas DataFrame.

Returns:

Patterns sorted by frequency (most common first).

Return type:

List[PatternInfo]
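
The idea behind pattern analysis can be sketched in a few lines: map each row to a tuple marking which values are observed, count identical tuples, and sort by frequency. This is an illustration of the concept only; the helper missingness_patterns is hypothetical and returns plain (pattern, count) pairs rather than PatternInfo objects.

```python
import math
from collections import Counter


def missingness_patterns(rows):
    """Return (pattern, count) pairs, most common first.

    A pattern is a tuple of booleans, True where the value is observed.
    """
    counts = Counter(tuple(not math.isnan(v) for v in row) for row in rows)
    return counts.most_common()


nan = float("nan")
rows = [
    [1.0, 2.0],
    [3.0, nan],
    [5.0, 6.0],
    [nan, 8.0],
    [9.0, nan],
    [2.0, 4.0],
]
patterns = missingness_patterns(rows)
```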

pystatistics.mvnmle.pattern_summary(patterns, data_shape=None)

Generate summary statistics for missingness patterns.

Parameters:
  • patterns (List[PatternInfo]) – Pattern information from analyze_patterns()

  • data_shape (Optional[Tuple[int, int]]) – Original data shape (n_obs, n_vars)

Return type:

PatternSummary
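
Several of the PatternSummary fields can be understood from a direct computation on the raw data. The sketch below assumes that the rates are fractions between 0 and 1, that complete_cases_percent is on a 0 to 100 scale, and that variable_missing_rates is keyed by column index; the field names and types suggest this, but the documentation does not state the units.

```python
import math

nan = float("nan")
rows = [
    [1.0, 2.0],
    [3.0, nan],
    [5.0, 6.0],
    [nan, 8.0],
    [9.0, nan],
]
n_obs, n_vars = len(rows), len(rows[0])

# Rows with no missing values at all.
complete_cases = sum(all(not math.isnan(v) for v in row) for row in rows)
complete_cases_percent = 100.0 * complete_cases / n_obs

# Per-column share of missing values, keyed by column index.
variable_missing_rates = {
    j: sum(math.isnan(row[j]) for row in rows) / n_obs for j in range(n_vars)
}

# Share of all cells that are missing.
overall_missing_rate = sum(
    math.isnan(v) for row in rows for v in row
) / (n_obs * n_vars)
```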

class pystatistics.mvnmle.PatternInfo(pattern_id, observed_indices, missing_indices, n_cases, data, pattern_vector)

Bases: object

Information about a single missingness pattern.

Parameters:
pattern_id: int
observed_indices: ndarray
missing_indices: ndarray
n_cases: int
data: ndarray
pattern_vector: ndarray
property n_observed: int
property n_missing: int
property percent_cases: float

class pystatistics.mvnmle.PatternSummary(n_patterns, total_cases, overall_missing_rate, most_common_pattern, complete_cases, complete_cases_percent, variable_missing_rates)

Bases: object

Summary statistics for all missingness patterns in a dataset.

Parameters:
n_patterns: int
total_cases: int
overall_missing_rate: float
most_common_pattern: PatternInfo
complete_cases: int
complete_cases_percent: float
variable_missing_rates: Dict[int, float]

pystatistics.mvnmle.little_mcar_test(data, alpha=0.05, verbose=False)

Little’s test for Missing Completely at Random (MCAR).

Parameters:
  • data (array-like, shape (n_observations, n_variables)) – Data matrix with missing values as np.nan.

  • alpha (float, default=0.05) – Significance level

  • verbose (bool, default=False) – Print detailed progress

Return type:

MCARTestResult
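
For orientation, Little's test statistic in its standard form is shown below. Here J is the number of missingness patterns, n_j the number of cases in pattern j, ȳ_j the mean of the variables observed in that pattern, and μ̂_j and Σ̂_j the corresponding sub-vector and sub-matrix of the ML estimates. Under MCAR, d² is asymptotically chi-squared with the stated degrees of freedom, where p_j is the number of observed variables in pattern j. This is the textbook formulation; details of this implementation (for example, which patterns count toward n_patterns_used) are not specified here.

```latex
d^2 = \sum_{j=1}^{J} n_j \,
      (\bar{y}_j - \hat{\mu}_j)^{\top} \hat{\Sigma}_j^{-1} (\bar{y}_j - \hat{\mu}_j),
\qquad
\mathrm{df} = \sum_{j=1}^{J} p_j - p
```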

class pystatistics.mvnmle.MCARTestResult(statistic, df, p_value, rejected, alpha, patterns, n_patterns, n_patterns_used, ml_mean, ml_cov, convergence_warnings)

Bases: object

Result of Little’s MCAR test.

Parameters:
statistic: float
df: int
p_value: float
rejected: bool
alpha: float
patterns: List[PatternInfo]
n_patterns: int
n_patterns_used: int
ml_mean: ndarray
ml_cov: ndarray
convergence_warnings: List[str]

summary()

Generate human-readable summary of test results.

Return type:

str