Core

Core infrastructure for PyStatistics: the DataSource container, the Result wrapper, device selection, precision management, and exception classes.

class pystatistics.core.DataSource(_data, _capabilities, _metadata=<factory>)

Bases: object

Universal data container. Domain-agnostic.

Construct via factory classmethods, not directly.

The lumber yard analogy: DataSource has data (logs). It doesn’t know or care what you’re building—furniture (regression), paper (MVN MLE), or two-by-fours (survival analysis).

Parameters:

_data (dict[str, Any])
_capabilities (frozenset[str])
_metadata (dict[str, Any])

keys()

Return the names of all available arrays.

Returns: frozenset of array names
Return type: frozenset[str]

Example

>>> ds = DataSource.from_arrays(X=X, y=y)
>>> ds.keys()
frozenset({'X', 'y'})

property n_observations: int

Number of statistical units (rows).

property metadata: dict[str, Any]

Domain-agnostic metadata.

supports(capability)

Check if this DataSource supports a capability.

Parameters: capability (str) – Use constants from pystatistics.core.capabilities
Returns: True if supported, False otherwise
Return type: bool

Note

Unknown capabilities return False, never raise.
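
The no-raise contract described in the note can be pictured with a small stand-in. The class below is not the PyStatistics implementation: `CapabilitySet` and the capability strings are hypothetical names, used only to show a membership test against a frozenset that yields False for anything it does not recognise.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CapabilitySet:
    """Hypothetical stand-in for the capability check on a data container."""

    _capabilities: frozenset = field(default_factory=frozenset)

    def supports(self, capability: str) -> bool:
        # Plain membership test: an unrecognised name is simply absent,
        # so the call returns False instead of raising.
        return capability in self._capabilities


# The capability names here are invented for the illustration.
caps = CapabilitySet(frozenset({"materialized", "second_pass"}))
known = caps.supports("materialized")
unknown = caps.supports("no_such_capability")
```

A caller can therefore branch on `supports(...)` without wrapping the call in a try/except.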

classmethod from_arrays(*, X=None, y=None, data=None, columns=None, **named_arrays)

Construct from NumPy arrays.

Parameters:

X (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None)
y (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None)
data (ndarray[tuple[Any, ...], dtype[_ScalarT]] | None)
columns (list[str] | None)
named_arrays (ndarray[tuple[Any, ...], dtype[_ScalarT]])

Return type:

DataSource

classmethod from_file(path, *, columns=None)

Construct from a file (CSV or NPY).

Parameters:

path (str | Path)
columns (list[str] | None)

Return type:

DataSource

classmethod from_dataframe(df, *, source_path=None)

Construct from a pandas DataFrame.

Parameters:

df (pd.DataFrame)
source_path (str | None)

Return type:

DataSource

classmethod from_tensors(*, X=None, y=None, **named_tensors)

Construct from PyTorch tensors (already on GPU).

Parameters:

X (torch.Tensor | None)
y (torch.Tensor | None)
named_tensors (torch.Tensor)

Return type:

DataSource

classmethod build(*args, **kwargs)

Convenience factory that dispatches to the appropriate from_* method.

Examples

>>> DataSource.build(X=X, y=y)    # dispatches to from_arrays
>>> DataSource.build("data.csv")  # dispatches to from_file

Return type: DataSource
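
One way to picture the dispatch is a function that inspects its arguments and names a constructor. This is a sketch under assumptions, not the library's routing logic: `pick_factory` is a hypothetical helper, and it mirrors only the two cases shown in the examples (keyword arrays and a single path). How DataFrames or tensors are recognised is not covered here.

```python
from pathlib import Path


def pick_factory(*args, **kwargs) -> str:
    """Name the from_* constructor a build()-style dispatcher might select."""
    if len(args) == 1 and isinstance(args[0], (str, Path)) and not kwargs:
        return "from_file"    # a single path-like positional argument
    if not args and kwargs:
        return "from_arrays"  # keyword-only arrays such as X=..., y=...
    # Anything else is ambiguous in this simplified rule set.
    raise TypeError("cannot infer a constructor from these arguments")


file_choice = pick_factory("data.csv")
array_choice = pick_factory(X=[[1.0], [2.0]], y=[0.5, 1.5])
```

Calling the explicit from_* classmethod remains the unambiguous option when the argument types could match more than one rule.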

class pystatistics.core.Result(params, info, timing, backend_name, warnings=<factory>, provenance=<factory>)

Bases: Generic[P]

Immutable result envelope for statistical computations.

Type parameters: P, the domain-specific parameter payload type

Parameters:

params (P)
info (dict[str, Any])
timing (dict[str, float] | None)
backend_name (str)
warnings (tuple[str, ...])
provenance (dict[str, Any])

params

Domain-specific parameters (coefficients, estimates, etc.)

Type: pystatistics.core.result.P

info

Structured metadata (method, convergence, diagnostics)

Type: dict[str, Any]

timing

Execution timing breakdown, or None if not measured

Type: dict[str, float] | None

backend_name

Identifier of the backend that produced this result

Type: str

warnings

Non-fatal issues encountered during computation

Type: tuple[str, ...]

provenance

Reproducibility metadata (versions, device, algorithm)

Type: dict[str, Any]

The class is declared with frozen=True, so a Result cannot be modified after creation. This supports reproducibility and prevents accidental modification.

Examples

>>> # Direct method (no convergence notion)
>>> Result(
...     params=LinearParams(coefficients=beta),
...     info={'method': 'qr', 'rank': 5},
...     timing={'total_seconds': 0.01},
...     backend_name='cpu_qr'
... )

>>> # Iterative method
>>> Result(
...     params=MVNParams(mu=mu, sigma=sigma),
...     info={'method': 'em', 'converged': True, 'iterations': 23},
...     timing={'total_seconds': 0.5, 'e_step': 0.3, 'm_step': 0.2},
...     backend_name='cpu_em'
... )
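
The effect of frozen=True can be seen on a stand-in dataclass with the same fields. `ResultSketch` is a hypothetical class written for this illustration, not the library's Result; the field names and defaults follow the signature above, and the payload is a plain dict rather than a real parameter class.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any, Dict, Generic, Optional, Tuple, TypeVar

P = TypeVar("P")


@dataclass(frozen=True)
class ResultSketch(Generic[P]):
    """Hypothetical stand-in with the same fields as the documented Result."""

    params: P
    info: Dict[str, Any]
    timing: Optional[Dict[str, float]]
    backend_name: str
    warnings: Tuple[str, ...] = field(default_factory=tuple)
    provenance: Dict[str, Any] = field(default_factory=dict)


result = ResultSketch(
    params={"coefficients": [1.0, 2.0]},
    info={"method": "qr", "rank": 2},
    timing={"total_seconds": 0.01},
    backend_name="cpu_qr",
)

try:
    result.backend_name = "gpu_qr"  # rebinding a field on a frozen dataclass
    rebind_blocked = False
except FrozenInstanceError:
    rebind_blocked = True
```

In plain Python, frozen=True blocks rebinding of fields but does not freeze the objects they refer to, so a dict stored in a field can still be mutated in place. Whether the library adds further protection is not stated here.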

has_warning(substring)

Check if any warning contains the given substring.

Parameters: substring (str)
Return type: bool
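
A check of this kind can be sketched as a scan over the warnings tuple. The function below is a free-standing illustration, not the method's source: it assumes plain, case-sensitive substring containment, and the warning messages are invented for the example.

```python
from typing import Tuple


def has_warning(warnings: Tuple[str, ...], substring: str) -> bool:
    # Assumption: case-sensitive containment, True on the first match.
    return any(substring in message for message in warnings)


# Invented messages, for illustration only.
messages = (
    "design matrix is rank deficient; 1 column dropped",
    "condition number is large",
)

found = has_warning(messages, "rank deficient")
missing = has_warning(messages, "did not converge")
```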

exception pystatistics.core.PyStatisticsError

Bases: Exception

Base exception for all PyStatistics errors.

exception pystatistics.core.ValidationError

Bases: PyStatisticsError

Input validation failed.

Raised when user-provided inputs fail validation checks.

exception pystatistics.core.DimensionError

Bases: ValidationError

Array dimensions are incorrect or inconsistent.

Raised when array shapes don’t match expected dimensions or when multiple arrays have inconsistent shapes.
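
Because DimensionError derives from ValidationError, which derives from PyStatisticsError, a handler for either base class also catches it. The classes below are local stand-ins that reproduce only the documented inheritance chain so the snippet runs without the library; `check_rows` is a hypothetical helper.

```python
class PyStatisticsError(Exception):
    """Stand-in for the documented base exception."""


class ValidationError(PyStatisticsError):
    """Stand-in: input validation failed."""


class DimensionError(ValidationError):
    """Stand-in: array dimensions are incorrect or inconsistent."""


def check_rows(n_rows_x: int, n_rows_y: int) -> None:
    # Hypothetical validation step comparing row counts of two arrays.
    if n_rows_x != n_rows_y:
        raise DimensionError(f"X has {n_rows_x} rows but y has {n_rows_y}")


try:
    check_rows(100, 99)
except ValidationError as exc:  # also catches the DimensionError subclass
    caught = type(exc).__name__
    detail = str(exc)
```

Catching ValidationError handles every input problem in one place, while catching DimensionError alone singles out shape mismatches.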

exception pystatistics.core.NumericalError

Bases: PyStatisticsError

Numerical computation failed.

Base class for errors arising from numerical issues during computation.

exception pystatistics.core.SingularMatrixError(message, matrix_name=None, condition_number=None, rank=None, expected_rank=None)

Bases: NumericalError

Matrix is singular or nearly singular.

Raised when a matrix operation requires invertibility but the matrix is singular or numerically rank-deficient.

Parameters:

message (str)
matrix_name (str | None)
condition_number (float | None)
rank (int | None)
expected_rank (int | None)

matrix_name: Name/description of the problematic matrix

condition_number: Estimated condition number, if available

rank: Numerical rank, if computed

expected_rank: Expected rank (typically min(n, p))
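
The diagnostic attributes let a handler report why an inversion failed. In the sketch below, the exception is a local stand-in that follows the documented constructor signature, and `invert_2x2` is a hypothetical helper. The absolute tolerance on the determinant is a deliberately crude singularity test chosen to keep the example short.

```python
class NumericalError(Exception):
    """Stand-in for the documented NumericalError base."""


class SingularMatrixError(NumericalError):
    """Stand-in following the documented constructor signature."""

    def __init__(self, message, matrix_name=None, condition_number=None,
                 rank=None, expected_rank=None):
        super().__init__(message)
        self.matrix_name = matrix_name
        self.condition_number = condition_number
        self.rank = rank
        self.expected_rank = expected_rank


def invert_2x2(a, b, c, d, tol=1e-12):
    """Invert [[a, b], [c, d]]; raise if the determinant is (near) zero."""
    det = a * d - b * c
    if abs(det) < tol:
        # A singular 2x2 matrix has rank 0 (all zeros) or rank 1.
        rank = 0 if a == b == c == d == 0 else 1
        raise SingularMatrixError(
            "matrix is singular", matrix_name="X'X", rank=rank, expected_rank=2
        )
    return (d / det, -b / det, -c / det, a / det)


try:
    invert_2x2(1.0, 2.0, 2.0, 4.0)  # second row is twice the first
except SingularMatrixError as exc:
    summary = (exc.matrix_name, exc.rank, exc.expected_rank)
```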

exception pystatistics.core.NotPositiveDefiniteError(message, matrix_name=None, min_eigenvalue=None)

Bases: NumericalError

Matrix is not positive definite.

Raised when an operation requires a positive definite matrix (e.g., Cholesky decomposition) but the matrix fails this requirement.

Parameters:

message (str)
matrix_name (str | None)
min_eigenvalue (float | None)

matrix_name: Name/description of the problematic matrix

min_eigenvalue: Minimum eigenvalue, if computed

exception pystatistics.core.ConvergenceError(message, iterations, final_change=None, reason=None, threshold=None)

Bases: PyStatisticsError

Iterative algorithm failed to converge.

Raised when an iterative optimization method (EM, Newton-Raphson, IRLS) fails to meet convergence criteria within the maximum number of iterations.

Parameters:

message (str)
iterations (int)
final_change (float | None)
reason (str | None)
threshold (float | None)

iterations: Number of iterations completed

final_change: Final parameter or objective change

reason: Why convergence failed (e.g., 'max_iterations', 'diverging')

threshold: The convergence threshold that was not met
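
A generic iteration loop shows how these attributes might be populated. The exception here is a local stand-in that follows the documented signature, and `fixed_point` is a hypothetical solver written for the illustration; it is not one of the library's algorithms.

```python
class ConvergenceError(Exception):
    """Stand-in following the documented constructor signature."""

    def __init__(self, message, iterations, final_change=None,
                 reason=None, threshold=None):
        super().__init__(message)
        self.iterations = iterations
        self.final_change = final_change
        self.reason = reason
        self.threshold = threshold


def fixed_point(update, x0, tol=1e-8, max_iter=100):
    """Iterate x <- update(x) until successive values differ by less than tol."""
    x = x0
    change = None
    for _ in range(max_iter):
        x_new = update(x)
        change = abs(x_new - x)
        x = x_new
        if change < tol:
            return x
    # Iteration budget exhausted without meeting the tolerance.
    raise ConvergenceError(
        "fixed-point iteration did not converge",
        iterations=max_iter,
        final_change=change,
        reason="max_iterations",
        threshold=tol,
    )


# x <- 0.5 * x + 1 converges to 2, but not within five steps.
try:
    fixed_point(lambda x: 0.5 * x + 1.0, x0=0.0, max_iter=5)
except ConvergenceError as exc:
    report = (exc.iterations, exc.final_change, exc.reason, exc.threshold)
```

With the default budget of 100 iterations the same update converges, so the handler is only reached when the budget is too small for the requested tolerance.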
