Core

Core infrastructure for PyStatistics: the DataSource container, the Result wrapper, device selection, precision management, and exception classes.

class pystatistics.core.DataSource(_data, _capabilities, _metadata=<factory>)[source]

Bases: object

Universal data container. Domain-agnostic.

Construct via factory classmethods, not directly.

The lumber yard analogy: DataSource has data (logs). It doesn’t know or care what you’re building—furniture (regression), paper (MVN MLE), or two-by-fours (survival analysis).

keys()[source]

Return the names of all available arrays.

Returns:

frozenset of array names

Return type:

frozenset[str]

Example

>>> ds = DataSource.from_arrays(X=X, y=y)
>>> ds.keys()
frozenset({'X', 'y'})

property n_observations: int

Number of statistical units (rows).

property metadata: dict[str, Any]

Domain-agnostic metadata.

supports(capability)[source]

Check if this DataSource supports a capability.

Parameters:

capability (str) – Use constants from pystatistics.core.capabilities

Returns:

True if supported, False otherwise

Return type:

bool

Note

Unknown capabilities return False, never raise.
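The "return False, never raise" contract can be illustrated with plain set membership. The sketch below is a stand-in, not the library's implementation: ToySource and both capability strings are hypothetical, and the real constants should be taken from pystatistics.core.capabilities.

```python
from dataclasses import dataclass

# Hypothetical capability names; the real constants live in
# pystatistics.core.capabilities and may be spelled differently.
CAP_MATERIALIZED = "materialized"
CAP_GPU_NATIVE = "gpu_native"


@dataclass(frozen=True)
class ToySource:
    """Stand-in for DataSource that only models capability lookup."""

    _capabilities: frozenset

    def supports(self, capability: str) -> bool:
        # Set membership never raises for an unknown string; it is simply False.
        return capability in self._capabilities


source = ToySource(_capabilities=frozenset({CAP_MATERIALIZED}))
print(source.supports(CAP_MATERIALIZED))   # True
print(source.supports(CAP_GPU_NATIVE))     # False
print(source.supports("no_such_feature"))  # False
```

Because an unknown name yields False, callers can probe for optional features without wrapping the check in try/except.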

classmethod from_arrays(*, X=None, y=None, data=None, columns=None, **named_arrays)[source]

Construct from NumPy arrays.

Return type:

DataSource

classmethod from_file(path, *, columns=None)[source]

Construct from file (CSV, NPY).

Return type:

DataSource

classmethod from_dataframe(df, *, source_path=None)[source]

Construct from pandas DataFrame.

Parameters:
  • df (pd.DataFrame)

  • source_path (str | None)

Return type:

DataSource

classmethod from_tensors(*, X=None, y=None, **named_tensors)[source]

Construct from PyTorch tensors (already on GPU).

Parameters:
  • X (torch.Tensor | None)

  • y (torch.Tensor | None)

  • named_tensors (torch.Tensor)

Return type:

DataSource

classmethod build(*args, **kwargs)[source]

Convenience factory that dispatches to the appropriate from_* method.

Examples

DataSource.build(X=X, y=y)    # from_arrays
DataSource.build("data.csv")  # from_file

Return type:

DataSource
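One way such a dispatcher could decide between factories is by inspecting the argument shapes. The sketch below covers only the two documented calls; choose_factory and its rules are hypothetical and are not taken from the library's source.

```python
from pathlib import Path


def choose_factory(*args, **kwargs) -> str:
    """Return the name of the from_* method a dispatcher might pick.

    Hypothetical rules for illustration, not the library's actual logic.
    """
    # A single positional string or Path is treated as a file path.
    if len(args) == 1 and isinstance(args[0], (str, Path)):
        return "from_file"
    # Keyword-only input is treated as named arrays.
    if not args and kwargs:
        return "from_arrays"
    raise TypeError("cannot infer a factory from these arguments")


print(choose_factory(X=[[1.0, 2.0]], y=[0.5]))  # from_arrays
print(choose_factory("data.csv"))               # from_file
```

When the input type is known in advance, calling the specific from_* classmethod directly keeps the intent explicit.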

class pystatistics.core.Result(params, info, timing, backend_name, warnings=<factory>, provenance=<factory>)[source]

Bases: Generic[P]

Immutable result envelope for statistical computations.

Type Parameters:

P: The domain-specific parameter payload type

params

Domain-specific parameters (coefficients, estimates, etc.)

Type:

pystatistics.core.result.P

info

Structured metadata (method, convergence, diagnostics)

Type:

dict[str, Any]

timing

Execution timing breakdown, or None if not measured

Type:

dict[str, float] | None

backend_name

Identifier of the backend that produced this result

Type:

str

warnings

Non-fatal issues encountered during computation

Type:

tuple[str, ...]

provenance

Reproducibility metadata (versions, device, algorithm)

Type:

dict[str, Any]

The class is declared with frozen=True, so a result is immutable after creation. This supports reproducibility and prevents accidental modification.
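The effect of frozen=True can be seen with a small stand-in. This sketch assumes Result is a standard-library dataclass, which the frozen=True wording suggests but this page does not state; ToyResult is hypothetical and carries only a subset of the documented fields.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any


@dataclass(frozen=True)
class ToyResult:
    """Stand-in with a few of the documented Result fields."""

    params: Any
    info: dict[str, Any]
    backend_name: str
    warnings: tuple[str, ...] = field(default_factory=tuple)


result = ToyResult(params=[1.5, -0.2], info={"method": "qr"}, backend_name="cpu_qr")

try:
    result.backend_name = "gpu_qr"  # rebinding a field is rejected
except FrozenInstanceError:
    rebinding_blocked = True
else:
    rebinding_blocked = False

print(rebinding_blocked)  # True

# Caveat: frozen=True only blocks rebinding; a mutable field can still
# be changed in place.
result.info["rank"] = 2
print(result.info)  # {'method': 'qr', 'rank': 2}
```

The last two lines show that freezing is shallow for a plain dataclass, so dict-valued fields are best treated as read-only by convention.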

Examples

>>> # Direct method (no convergence notion)
>>> Result(
...     params=LinearParams(coefficients=beta),
...     info={'method': 'qr', 'rank': 5},
...     timing={'total_seconds': 0.01},
...     backend_name='cpu_qr'
... )
>>> # Iterative method
>>> Result(
...     params=MVNParams(mu=mu, sigma=sigma),
...     info={'method': 'em', 'converged': True, 'iterations': 23},
...     timing={'total_seconds': 0.5, 'e_step': 0.3, 'm_step': 0.2},
...     backend_name='cpu_em'
... )

params: P
info: dict[str, Any]
timing: dict[str, float] | None
backend_name: str
warnings: tuple[str, ...]
provenance: dict[str, Any]

has_warning(substring)[source]

Check if any warning contains the given substring.

Parameters:

substring (str)

Return type:

bool
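A substring check over a tuple of warnings can be written in one line. The free function below is a sketch of the documented behavior, not the method's source; the warning messages are invented for illustration, and case-sensitive matching is an assumption.

```python
def has_warning(warnings: tuple[str, ...], substring: str) -> bool:
    """Return True if any warning message contains the substring."""
    # Case-sensitive match; this is an assumption, not documented behavior.
    return any(substring in message for message in warnings)


# Illustrative messages only; real warning text may differ.
messages = (
    "design matrix is rank-deficient",
    "EM stopped at max_iterations",
)
print(has_warning(messages, "rank-deficient"))  # True
print(has_warning(messages, "overflow"))        # False
```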

exception pystatistics.core.PyStatisticsError[source]

Bases: Exception

Base exception for all PyStatistics errors.

exception pystatistics.core.ValidationError[source]

Bases: PyStatisticsError

Input validation failed.

Raised when user-provided inputs fail validation checks.

exception pystatistics.core.DimensionError[source]

Bases: ValidationError

Array dimensions are incorrect or inconsistent.

Raised when array shapes don’t match expected dimensions or when multiple arrays have inconsistent shapes.
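Because DimensionError derives from ValidationError, a handler for the parent class also catches it. The sketch below uses stand-in classes that mirror the documented hierarchy so it runs without the library; check_rows is a hypothetical validator.

```python
class PyStatisticsError(Exception):
    """Stand-in for pystatistics.core.PyStatisticsError."""


class ValidationError(PyStatisticsError):
    """Stand-in for pystatistics.core.ValidationError."""


class DimensionError(ValidationError):
    """Stand-in for pystatistics.core.DimensionError."""


def check_rows(X: list[list[float]], y: list[float]) -> None:
    # Hypothetical validator: X and y must have the same number of rows.
    if len(X) != len(y):
        raise DimensionError(f"X has {len(X)} rows but y has {len(y)}")


try:
    check_rows([[1.0], [2.0], [3.0]], [0.1, 0.2])
except ValidationError as exc:  # also catches DimensionError
    caught = exc

print(type(caught).__name__)  # DimensionError
print(caught)                 # X has 3 rows but y has 2
```

Catching PyStatisticsError instead would handle every exception on this page with a single clause.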

exception pystatistics.core.NumericalError[source]

Bases: PyStatisticsError

Numerical computation failed.

Base class for errors arising from numerical issues during computation.

exception pystatistics.core.SingularMatrixError(message, matrix_name=None, condition_number=None, rank=None, expected_rank=None)[source]

Bases: NumericalError

Matrix is singular or nearly singular.

Raised when a matrix operation requires invertibility but the matrix is singular or numerically rank-deficient.

Parameters:
  • message (str)

  • matrix_name (str | None)

  • condition_number (float | None)

  • rank (int | None)

  • expected_rank (int | None)

matrix_name

Name/description of the problematic matrix

condition_number

Estimated condition number, if available

rank

Numerical rank, if computed

expected_rank

Expected rank (typically min(n, p))
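The diagnostic attributes let a handler report why inversion failed. The sketch below uses a stand-in class that copies the documented constructor signature; invert_2x2 is hypothetical and tests for an exactly zero determinant, whereas production code would normally use a tolerance.

```python
class SingularMatrixError(Exception):
    """Stand-in mirroring the documented constructor signature."""

    def __init__(self, message, matrix_name=None, condition_number=None,
                 rank=None, expected_rank=None):
        super().__init__(message)
        self.matrix_name = matrix_name
        self.condition_number = condition_number
        self.rank = rank
        self.expected_rank = expected_rank


def invert_2x2(m, name="matrix"):
    """Invert a 2x2 matrix, raising if its determinant is exactly zero."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        # A nonzero 2x2 matrix with zero determinant has rank 1.
        rank = 1 if any(v != 0 for row in m for v in row) else 0
        raise SingularMatrixError(
            f"{name} is singular", matrix_name=name, rank=rank, expected_rank=2
        )
    return [[d / det, -b / det], [-c / det, a / det]]


try:
    # The second row is twice the first, so the matrix is rank-deficient.
    invert_2x2([[1.0, 2.0], [2.0, 4.0]], name="X'X")
except SingularMatrixError as exc:
    error = exc

print(error.matrix_name, error.rank, error.expected_rank)  # X'X 1 2
```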

exception pystatistics.core.NotPositiveDefiniteError(message, matrix_name=None, min_eigenvalue=None)[source]

Bases: NumericalError

Matrix is not positive definite.

Raised when an operation requires a positive definite matrix (e.g., Cholesky decomposition) but the matrix fails this requirement.

Parameters:
  • message (str)

  • matrix_name (str | None)

  • min_eigenvalue (float | None)

matrix_name

Name/description of the problematic matrix

min_eigenvalue

Minimum eigenvalue, if computed

exception pystatistics.core.ConvergenceError(message, iterations, final_change=None, reason=None, threshold=None)[source]

Bases: PyStatisticsError

Iterative algorithm failed to converge.

Raised when an iterative optimization method (EM, Newton-Raphson, IRLS) fails to meet convergence criteria within the maximum number of iterations.

Parameters:
  • message (str)

  • iterations (int)

  • final_change (float | None)

  • reason (str | None)

  • threshold (float | None)

iterations

Number of iterations completed

final_change

Final parameter or objective change

reason

Why convergence failed (e.g., 'max_iterations', 'diverging')

threshold

The convergence threshold that was not met
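The attributes above map naturally onto the state of an iterative loop at the moment it gives up. The sketch below uses a stand-in class with the documented signature and a toy fixed-point iteration; fixed_point and its defaults are hypothetical and are not part of PyStatistics.

```python
import math


class ConvergenceError(Exception):
    """Stand-in mirroring the documented constructor signature."""

    def __init__(self, message, iterations, final_change=None,
                 reason=None, threshold=None):
        super().__init__(message)
        self.iterations = iterations
        self.final_change = final_change
        self.reason = reason
        self.threshold = threshold


def fixed_point(g, x0, threshold=1e-10, max_iterations=100):
    """Iterate x <- g(x) until successive values differ by less than threshold."""
    x = x0
    change = None
    for iteration in range(1, max_iterations + 1):
        x_new = g(x)
        change = abs(x_new - x)
        x = x_new
        if change < threshold:
            return x, iteration
    raise ConvergenceError(
        f"no convergence after {max_iterations} iterations",
        iterations=max_iterations,
        final_change=change,
        reason="max_iterations",
        threshold=threshold,
    )


# With enough iterations, x = cos(x) converges.
root, used = fixed_point(math.cos, 1.0, max_iterations=200)
print(round(root, 6))  # 0.739085

# With too few iterations, the error carries the loop state.
try:
    fixed_point(math.cos, 1.0, max_iterations=5)
except ConvergenceError as exc:
    failure = exc

print(failure.iterations, failure.reason)  # 5 max_iterations
```

Comparing final_change with threshold tells the caller how far the run was from converging, which helps decide between raising the iteration cap and changing the starting point.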
