Descriptive Statistics

Descriptive statistics, correlation matrices, covariance, quantiles (all 9 R types), skewness, and kurtosis.

Descriptive statistics module.

Provides comprehensive descriptive statistics matching R’s implementations, with optional GPU acceleration for large datasets.

Public API:

  • describe(data) – All statistics at once

  • cor(x) – Correlation matrix (Pearson, Spearman, Kendall)

  • cov(x) – Covariance matrix (Bessel-corrected)

  • var(x) – Variance (Bessel-corrected)

  • quantile(x) – Quantiles (all 9 R types)

  • summary(x) – Six-number summary (Min, Q1, Median, Mean, Q3, Max)

pystatistics.descriptive.describe(data, *, use='everything', quantile_type=7, backend='auto')

Compute comprehensive descriptive statistics.

Computes: mean, variance, standard deviation, covariance matrix, Pearson correlation, quantiles (0, 0.25, 0.5, 0.75, 1), skewness, kurtosis, and six-number summary.

Parameters:
  • data (array-like or DescriptiveDesign) – 1D or 2D data matrix.

  • use (str) – Missing data handling. ‘everything’ (propagate NaN), ‘complete.obs’ (listwise deletion), ‘pairwise.complete.obs’ (pairwise deletion for cor/cov).

  • quantile_type (int) – R quantile type (1-9). Default 7 matches R default.

  • backend (str) – ‘auto’, ‘cpu’, ‘gpu’.

Return type:

DescriptiveSolution with all statistics populated.
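
The three `use` modes listed above can be illustrated without the library. The following is a minimal pure-Python sketch, not pystatistics' implementation: the data set and the helper names (`column_mean_everything`, `complete_obs`, `pairwise_n`) are hypothetical and exist only for this illustration.

```python
import math

NAN = float("nan")

# Hypothetical 4 x 3 data set: rows are observations, columns are variables.
rows = [
    (1.0, 2.0, 3.0),
    (2.0, NAN, 1.0),
    (3.0, 4.0, NAN),
    (4.0, 5.0, 6.0),
]


def column_mean_everything(rows, j):
    """'everything': use every row, so a single NaN propagates into the result."""
    return sum(r[j] for r in rows) / len(rows)


def complete_obs(rows):
    """'complete.obs': listwise deletion, keep only rows with no NaN at all."""
    return [r for r in rows if not any(math.isnan(v) for v in r)]


def pairwise_n(rows):
    """'pairwise.complete.obs': count rows where columns i and j are both observed."""
    p = len(rows[0])
    return [
        [sum(1 for r in rows if not (math.isnan(r[i]) or math.isnan(r[j])))
         for j in range(p)]
        for i in range(p)
    ]


print(math.isnan(column_mean_everything(rows, 1)))  # True: the NaN propagates
print(len(complete_obs(rows)))                      # 2 rows survive listwise deletion
print(pairwise_n(rows))                             # [[4, 3, 3], [3, 3, 2], [3, 2, 3]]
```

Pairwise deletion keeps more data per column pair than listwise deletion, but each entry of the resulting matrix may rest on a different subset of rows.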

pystatistics.descriptive.cor(x, y=None, *, method='pearson', use='everything', backend='auto')

Compute correlation matrix. Matches R cor().

Parameters:
  • x (array-like or DescriptiveDesign) – 2D data matrix (columns are variables), or DescriptiveDesign.

  • y (array-like, optional) – Second variable (1D). If provided, computes cor(x, y) by stacking as a 2-column matrix.

  • method (str) – ‘pearson’, ‘spearman’, ‘kendall’.

  • use (str) – Missing data handling.

  • backend (str) – ‘auto’, ‘cpu’, ‘gpu’.

Return type:

DescriptiveSolution with correlation matrix populated.
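
To make the difference between the 'pearson' and 'spearman' methods concrete, here is a small pure-Python sketch. It assumes the textbook definitions (Spearman as Pearson correlation of average ranks) and is not taken from the library's source; `pearson`, `average_ranks`, and `spearman` are illustrative names.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 2 for v in x]  # monotone but non-linear in x

print(pearson(x, y))   # below 1: the relationship is not linear
print(spearman(x, y))  # 1.0: the relationship is perfectly monotone
```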

pystatistics.descriptive.cov(x, y=None, *, use='everything', backend='auto')

Compute covariance matrix (Bessel-corrected, n-1). Matches R cov().

Parameters:
  • x (array-like or DescriptiveDesign) – 2D data matrix (columns are variables).

  • y (array-like, optional) – Second variable (1D).

  • use (str) – Missing data handling.

  • backend (str) – ‘auto’, ‘cpu’, ‘gpu’.

Return type:

DescriptiveSolution with covariance_matrix populated.
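
As a reference for what "Bessel-corrected" means here, the sketch below computes a covariance matrix with the n-1 divisor in pure Python. It is an illustration under the standard definition, not the library's code; `cov_pair` and `cov_matrix` are hypothetical helper names.

```python
import statistics


def cov_pair(x, y):
    """Sample covariance with the n-1 (Bessel) divisor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def cov_matrix(columns):
    """p x p covariance matrix for a list of p equal-length columns."""
    return [[cov_pair(ci, cj) for cj in columns] for ci in columns]


a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
matrix = cov_matrix([a, b])

print(matrix)                  # [[1.0, 2.0], [2.0, 4.0]]
print(statistics.variance(a))  # equals the first diagonal entry
```

The diagonal of the matrix holds the per-column sample variances, which is why a variance function can return the same matrix for multi-column input.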

pystatistics.descriptive.var(x, *, use='everything', backend='auto')

Compute variance (Bessel-corrected, n-1). Matches R var().

For 1D input (a single column): returns the variance of that column. For 2D input with p > 1: returns the covariance matrix (same as cov()).

Parameters:
  • x (array-like or DescriptiveDesign) – 1D or 2D data.

  • use (str) – Missing data handling.

  • backend (str) – ‘auto’, ‘cpu’, ‘gpu’.

Return type:

DescriptiveSolution with variance or covariance_matrix populated.

pystatistics.descriptive.quantile(x, probs=None, *, type=7, use='everything', backend='auto')

Compute quantiles. Matches R quantile() with all 9 types.

Parameters:
  • x (array-like or DescriptiveDesign) – 1D or 2D data.

  • probs (array-like, optional) – Probabilities in [0, 1]. Default (0, 0.25, 0.5, 0.75, 1.0).

  • type (int) – R quantile type 1-9. Default 7 (R default).

  • use (str) – Missing data handling.

  • backend (str) – ‘auto’, ‘cpu’, ‘gpu’.

Return type:

DescriptiveSolution with quantiles populated.
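
For orientation, R's default type 7 places the p-th quantile at position h = (n - 1) p (0-based) in the sorted data and interpolates linearly between the neighbouring order statistics. The following pure-Python sketch implements only that one type; `quantile_type7` is an illustrative name and the code is not the library's implementation.

```python
import math


def quantile_type7(values, probs=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Quantiles by linear interpolation at h = (n - 1) * p (R type 7)."""
    xs = sorted(values)
    n = len(xs)
    result = []
    for p in probs:
        h = (n - 1) * p          # fractional 0-based position in the sorted data
        lo = math.floor(h)
        hi = min(lo + 1, n - 1)  # clamp so that p = 1 stays in range
        result.append(xs[lo] + (h - lo) * (xs[hi] - xs[lo]))
    return result


data = list(range(1, 11))  # 1, 2, ..., 10
print(quantile_type7(data))  # [1.0, 3.25, 5.5, 7.75, 10.0]
```

The other eight types differ in how the position h is defined and in whether they interpolate or pick a single order statistic.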

pystatistics.descriptive.summary(x, *, use='everything', backend='auto')

Compute six-number summary. Matches R summary() for numeric vectors.

Computes: Min, Q1, Median, Mean, Q3, Max (per column).

Parameters:
  • x (array-like or DescriptiveDesign) – 1D or 2D data.

  • use (str) – Missing data handling.

  • backend (str) – ‘auto’, ‘cpu’, ‘gpu’.

Return type:

DescriptiveSolution with summary_table populated.
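
The six values can be reproduced for a single column with the Python standard library. This sketch assumes that `statistics.quantiles(..., method="inclusive")` follows the same interpolation rule as R's default type 7, which is the usual reading of its documentation; `six_number_summary` is a hypothetical helper, not part of pystatistics.

```python
import statistics


def six_number_summary(values):
    """Min, Q1, Median, Mean, Q3, Max for one column of numbers."""
    # n=4 yields the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min": min(values),
        "Q1": q1,
        "Median": median,
        "Mean": statistics.mean(values),
        "Q3": q3,
        "Max": max(values),
    }


column = list(range(1, 11))  # 1, 2, ..., 10
for name, value in six_number_summary(column).items():
    print(f"{name:>6}: {value}")
```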

class pystatistics.descriptive.DescriptiveDesign(_data, _n, _p, _columns)

Bases: object

Design for descriptive statistics.

Wraps a data matrix (n observations x p variables) that may contain NaN values representing missing data. Immutable after construction.

Construction:

DescriptiveDesign.from_array(data)

DescriptiveDesign.from_datasource(ds, columns=['a', 'b', 'c'])

classmethod from_array(data)

Build DescriptiveDesign from array-like data.

Parameters:

data (array-like) – 1D or 2D data matrix. Can be numpy array, pandas DataFrame, or any array-like with .values attribute. 1D input is reshaped to (n, 1).

Return type:

DescriptiveDesign

classmethod from_datasource(source, *, columns=None)

Build DescriptiveDesign from a DataSource.

Parameters:
  • source (DataSource) – Data source providing columns.

  • columns (list of str, optional) – Column names to include. If None, uses all columns.

Return type:

DescriptiveDesign

property data: ndarray[tuple[Any, ...], dtype[floating[Any]]]

Data matrix (n x p), may contain NaN.

property n: int

Number of observations.

property p: int

Number of variables.

property columns: tuple[str, ...] | None

Column names, or None if not available.

property n_missing: int

Total number of missing values.

property has_missing: bool

Whether data has any missing values.

class pystatistics.descriptive.DescriptiveParams(mean=None, variance=None, sd=None, skewness=None, kurtosis=None, covariance_matrix=None, correlation_pearson=None, correlation_spearman=None, correlation_kendall=None, quantiles=None, quantile_probs=None, quantile_type=None, summary_table=None, n_complete=None, pairwise_n=None)

Bases: object

Parameter payload for descriptive statistics.

All fields are optional (None if not computed). describe() populates all; individual functions populate only their specific fields.

mean: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
variance: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
sd: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
skewness: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
kurtosis: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
covariance_matrix: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
correlation_pearson: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
correlation_spearman: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
correlation_kendall: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
quantiles: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
quantile_probs: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
quantile_type: int | None = None
summary_table: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None
n_complete: int | None = None
pairwise_n: ndarray[tuple[Any, ...], dtype[integer[Any]]] | None = None

class pystatistics.descriptive.DescriptiveSolution(_result, _design)

Bases: object

User-facing descriptive statistics results.

Wraps Result[DescriptiveParams] and provides convenient accessors.

property mean: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Per-column means, shape (p,).

property variance: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Per-column variance (Bessel-corrected, n-1), shape (p,).

property sd: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Per-column standard deviation, shape (p,).

property skewness: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Per-column skewness (bias-adjusted), shape (p,).

property kurtosis: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Per-column excess kurtosis (bias-adjusted), shape (p,).
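
The reference does not state which bias adjustment is applied. One common choice is the adjusted Fisher-Pearson pair G1 and G2 (the form used by many spreadsheet and statistics packages); the sketch below assumes that choice purely for illustration and may not match the library's formulas. All function names are hypothetical.

```python
import math


def central_moment(x, k):
    """k-th central moment with divisor n (the biased moment)."""
    n = len(x)
    m = sum(x) / n
    return sum((v - m) ** k for v in x) / n


def adjusted_skewness(x):
    """G1 = sqrt(n (n - 1)) / (n - 2) * g1; requires n >= 3."""
    n = len(x)
    g1 = central_moment(x, 3) / central_moment(x, 2) ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def adjusted_excess_kurtosis(x):
    """G2 = (n - 1) / ((n - 2)(n - 3)) * ((n + 1) g2 + 6); requires n >= 4."""
    n = len(x)
    g2 = central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]

print(adjusted_skewness(symmetric))         # 0.0 for symmetric data
print(adjusted_skewness(right_skewed))      # positive: long right tail
print(adjusted_excess_kurtosis(symmetric))  # about -1.2 (flatter than normal)
```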

property covariance_matrix: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Covariance matrix (Bessel-corrected), shape (p, p).

property correlation_matrix: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Whichever correlation matrix was computed; Pearson is checked first.

property correlation_pearson: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Pearson correlation matrix, shape (p, p).

property correlation_spearman: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Spearman rank correlation matrix, shape (p, p).

property correlation_kendall: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Kendall tau-b correlation matrix, shape (p, p).
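
Tau-b differs from plain Kendall tau in how it corrects for ties. The following is a minimal O(n²) pure-Python sketch of the standard tau-b definition for one pair of variables; it is illustrative only and says nothing about how the library computes the matrix. `kendall_tau_b` is a hypothetical name.

```python
import math


def kendall_tau_b(x, y):
    """Kendall tau-b for two equal-length sequences, by direct pair counting."""
    concordant = discordant = tied_x = tied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue              # tied in both variables: ignored
            elif dx == 0:
                tied_x += 1           # tied in x only
            elif dy == 0:
                tied_y += 1           # tied in y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    # Each factor removes the pairs tied in one of the two variables.
    denom = math.sqrt((untied + tied_x) * (untied + tied_y))
    return (concordant - discordant) / denom


print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))  # 0.8
print(kendall_tau_b([1, 2, 3], [3, 2, 1]))        # -1.0
```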

property quantiles: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Quantile values, shape (n_probs, p).

property quantile_probs: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Probabilities used for quantile computation.

property quantile_type: int | None

R quantile type used (1-9).

property summary_table: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Six-number summary, shape (6, p): Min, Q1, Median, Mean, Q3, Max.

property n_complete: int | None

Number of complete (no-NaN) observations used.

property pairwise_n: ndarray[tuple[Any, ...], dtype[integer[Any]]] | None

Pairwise observation counts, shape (p, p).

property columns: tuple[str, ...] | None

Column names from the design.

property info: dict[str, Any]
property timing: dict[str, float] | None
property backend_name: str
property warnings: tuple[str, ...]
summary()

R-style summary output.

Return type:

str