Descriptive Statistics
Descriptive statistics, correlation matrices, covariance, quantiles (all 9 R types), skewness, and kurtosis.
Descriptive statistics module.
Provides comprehensive descriptive statistics matching R’s implementations, with optional GPU acceleration for large datasets.
- Public API:
describe(data) - All statistics at once
cor(x) - Correlation matrix (Pearson, Spearman, Kendall)
cov(x) - Covariance matrix (Bessel-corrected)
var(x) - Variance (Bessel-corrected)
quantile(x) - Quantiles (all 9 R types)
summary(x) - Six-number summary (Min, Q1, Median, Mean, Q3, Max)
- pystatistics.descriptive.describe(data, *, use='everything', quantile_type=7, backend='auto')
Compute comprehensive descriptive statistics.
Computes: mean, variance, standard deviation, covariance matrix, Pearson correlation, quantiles (0, 0.25, 0.5, 0.75, 1), skewness, kurtosis, and six-number summary.
- Parameters:
data (array-like or DescriptiveDesign) – 1D or 2D data matrix.
use (str) – Missing data handling. ‘everything’ (propagate NaN), ‘complete.obs’ (listwise deletion), ‘pairwise.complete.obs’ (pairwise deletion for cor/cov).
quantile_type (int) – R quantile type (1-9). Default 7 matches R default.
backend (str) – ‘auto’, ‘cpu’, ‘gpu’.
- Return type:
DescriptiveSolution with all statistics populated.
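The three `use` options differ in which observations contribute to each statistic. The following is a minimal pure-Python sketch of that distinction, not the library's implementation; the helper names `complete_rows` and `pairwise_counts` are hypothetical and exist only for illustration.

```python
import math

def complete_rows(rows):
    """Listwise deletion: keep only rows with no NaN in any column."""
    return [r for r in rows if not any(math.isnan(v) for v in r)]

def pairwise_counts(rows):
    """For each column pair (i, j), count rows where both values are observed."""
    p = len(rows[0])
    counts = [[0] * p for _ in range(p)]
    for r in rows:
        seen = [not math.isnan(v) for v in r]
        for i in range(p):
            for j in range(p):
                if seen[i] and seen[j]:
                    counts[i][j] += 1
    return counts

nan = float("nan")
data = [
    [1.0, 2.0, 3.0],
    [4.0, nan, 6.0],
    [7.0, 8.0, nan],
    [10.0, 11.0, 12.0],
]

# 'complete.obs': any row containing a NaN is dropped for every statistic.
kept = complete_rows(data)

# 'pairwise.complete.obs': each column pair keeps its own sample size.
counts = pairwise_counts(data)

# 'everything': NaN is left in place and propagates into the result.
col_mean_everything = sum(r[1] for r in data) / len(data)
```

Listwise deletion leaves two rows here, while the pairwise approach still uses three rows for columns 0 and 1. That difference is why pairwise deletion is offered only for cor/cov, where each matrix entry depends on just two columns.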
- pystatistics.descriptive.cor(x, y=None, *, method='pearson', use='everything', backend='auto')
Compute correlation matrix. Matches R cor().
- Parameters:
x (array-like or DescriptiveDesign) – 2D data matrix (columns are variables), or DescriptiveDesign.
y (array-like, optional) – Second variable (1D). If provided, computes cor(x, y) by stacking as a 2-column matrix.
method (str) – ‘pearson’, ‘spearman’, ‘kendall’.
use (str) – Missing data handling.
backend (str) – ‘auto’, ‘cpu’, ‘gpu’.
- Return type:
DescriptiveSolution with correlation matrix populated.
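To make the three `method` values concrete, here is a small reference sketch for two variables using only the standard library. It is an illustration of the textbook definitions under the assumption of no missing values, not the code `cor()` runs; the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall tau-b via an O(n^2) pair scan, with the usual tie correction."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    return (concordant - discordant) / denom

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]   # monotone but nonlinear in x

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
r_kendall = kendall_tau_b(x, y)
```

Because `y` is a monotone but nonlinear function of `x`, the two rank-based measures equal 1 while Pearson falls slightly below it.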
- pystatistics.descriptive.cov(x, y=None, *, use='everything', backend='auto')
Compute covariance matrix (Bessel-corrected, n-1). Matches R cov().
- Parameters:
x (array-like or DescriptiveDesign) – 2D data matrix (columns are variables).
y (array-like, optional) – Second variable (1D).
use (str) – Missing data handling.
backend (str) – ‘auto’, ‘cpu’, ‘gpu’.
- Return type:
DescriptiveSolution with covariance_matrix populated.
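As a point of reference for what "Bessel-corrected" means for a matrix result, the sketch below computes a covariance matrix by hand with the n-1 divisor. It assumes complete data with columns as variables; `covariance_matrix` is a hypothetical helper, not part of the documented API.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of an n x p list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            cross = sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
            cov[i][j] = cross / (n - 1)
    return cov

data = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 7.0],
    [4.0, 8.0],
]
cov = covariance_matrix(data)
# The diagonal holds per-column variances; off-diagonals are symmetric.
```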
- pystatistics.descriptive.var(x, *, use='everything', backend='auto')
Compute variance (Bessel-corrected, n-1). Matches R var().
For 1D input: returns per-column variance. For 2D input with p > 1: returns covariance matrix (same as cov()).
- Parameters:
x (array-like or DescriptiveDesign) – 1D or 2D data.
use (str) – Missing data handling.
backend (str) – ‘auto’, ‘cpu’, ‘gpu’.
- Return type:
DescriptiveSolution with variance or covariance_matrix populated.
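The effect of the n-1 divisor can be checked against Python's standard library, where `statistics.variance` uses n-1 and `statistics.pvariance` uses n. This is only a cross-check of the definition, not a demonstration of `var()` itself.

```python
import statistics

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(x)

sample_var = statistics.variance(x)        # Bessel-corrected: divides by n - 1
population_var = statistics.pvariance(x)   # uncorrected: divides by n

# The two estimates differ by exactly the factor n / (n - 1).
rescaled = population_var * n / (n - 1)
```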
- pystatistics.descriptive.quantile(x, probs=None, *, type=7, use='everything', backend='auto')
Compute quantiles. Matches R quantile() with all 9 types.
- Parameters:
x (array-like or DescriptiveDesign) – 1D or 2D data.
probs (array-like, optional) – Probabilities in [0, 1]. Default (0, 0.25, 0.5, 0.75, 1.0).
type (int) – R quantile type 1-9. Default 7 (R default).
use (str) – Missing data handling.
backend (str) – ‘auto’, ‘cpu’, ‘gpu’.
- Return type:
DescriptiveSolution with quantiles populated.
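For orientation, the default type 7 is linear interpolation between order statistics at the fractional 0-based index (n - 1) * p. The sketch below implements only that one type in pure Python for a 1D sample without missing values; `quantile_type7` is a hypothetical name and the other eight types are not covered.

```python
import math

def quantile_type7(values, prob):
    """R type 7 quantile: interpolate at 0-based position (n - 1) * prob."""
    xs = sorted(values)
    h = (len(xs) - 1) * prob
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)   # clamp so prob = 1 stays in range
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

x = [7.0, 1.0, 3.0, 10.0, 5.0]
probs = (0.0, 0.25, 0.5, 0.75, 1.0)   # the documented default probabilities
q = [quantile_type7(x, p) for p in probs]

# With an even sample size the quartiles fall between data points.
q1_even = quantile_type7([1.0, 2.0, 3.0, 4.0], 0.25)
```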
- pystatistics.descriptive.summary(x, *, use='everything', backend='auto')
Compute six-number summary. Matches R summary() for numeric vectors.
Computes: Min, Q1, Median, Mean, Q3, Max (per column).
- Parameters:
x (array-like or DescriptiveDesign) – 1D or 2D data.
use (str) – Missing data handling.
backend (str) – ‘auto’, ‘cpu’, ‘gpu’.
- Return type:
DescriptiveSolution with summary_table populated.
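The six values can be reproduced for a single numeric vector with the standard library, assuming the quartiles follow R's default type 7 (`statistics.quantiles` with `method="inclusive"` uses the same interpolation). The helper below is a hypothetical illustration, not the library's code path.

```python
import statistics

def six_number_summary(values):
    """Min, Q1, Median, Mean, Q3, Max for one complete numeric sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min": min(values),
        "Q1": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "Q3": q3,
        "Max": max(values),
    }

x = [1.0, 2.0, 3.0, 4.0, 10.0]
s = six_number_summary(x)
```

For 2D input the documented result holds one such column of six values per variable.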
- class pystatistics.descriptive.DescriptiveDesign(_data, _n, _p, _columns)
Bases: object
Design for descriptive statistics.
Wraps a data matrix (n observations x p variables) that may contain NaN values representing missing data. Immutable after construction.
- Construction:
DescriptiveDesign.from_array(data)
DescriptiveDesign.from_datasource(ds, columns=['a', 'b', 'c'])
- Parameters:
- classmethod from_array(data)
Build DescriptiveDesign from array-like data.
- Parameters:
data (array-like) – 1D or 2D data matrix. Can be numpy array, pandas DataFrame, or any array-like with .values attribute. 1D input is reshaped to (n, 1).
- Return type:
DescriptiveDesign
- classmethod from_datasource(source, *, columns=None)
Build DescriptiveDesign from a DataSource.
- Parameters:
source (DataSource) – Data source providing columns.
columns (list of str, optional) – Column names to include. If None, uses all columns.
- Return type:
DescriptiveDesign
- class pystatistics.descriptive.DescriptiveParams(mean=None, variance=None, sd=None, skewness=None, kurtosis=None, covariance_matrix=None, correlation_pearson=None, correlation_spearman=None, correlation_kendall=None, quantiles=None, quantile_probs=None, quantile_type=None, summary_table=None, n_complete=None, pairwise_n=None)
Bases: object
Parameter payload for descriptive statistics.
All fields are optional (None if not computed). describe() populates all; individual functions populate only their specific fields.
- Parameters:
mean (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
variance (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
sd (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
skewness (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
kurtosis (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
covariance_matrix (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
correlation_pearson (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
correlation_spearman (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
correlation_kendall (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
quantiles (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
quantile_probs (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
quantile_type (int | None)
summary_table (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
n_complete (int | None)
pairwise_n (ndarray[tuple[Any, ...], dtype[integer[Any]]] | None)
- class pystatistics.descriptive.DescriptiveSolution(_result, _design)
Bases: object
User-facing descriptive statistics results.
Wraps Result[DescriptiveParams] and provides convenient accessors.
- Parameters:
_result (Result[DescriptiveParams])
_design (DescriptiveDesign)
- property variance: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Per-column variance (Bessel-corrected, n-1), shape (p,).
- property sd: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Per-column standard deviation, shape (p,).
- property skewness: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Per-column skewness (bias-adjusted), shape (p,).
- property kurtosis: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Per-column excess kurtosis (bias-adjusted), shape (p,).
- property covariance_matrix: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Covariance matrix (Bessel-corrected), shape (p, p).
- property correlation_matrix: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Returns whichever correlation matrix was computed (Pearson first).
- property correlation_pearson: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Pearson correlation matrix, shape (p, p).
- property correlation_spearman: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Spearman rank correlation matrix, shape (p, p).
- property correlation_kendall: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Kendall tau-b correlation matrix, shape (p, p).
- property quantiles: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Quantile values, shape (n_probs, p).
- property quantile_probs: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Probabilities used for quantile computation.
- property summary_table: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Six-number summary, shape (6, p): Min, Q1, Median, Mean, Q3, Max.
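The skewness and kurtosis properties are described only as "bias-adjusted", without naming a formula. As an assumption, the sketch below uses one common adjustment (the sample-size-corrected G1 and G2 estimators used by several statistics packages); the library may use a different variant, and the function names are hypothetical.

```python
import math

def central_moment(xs, k):
    """k-th central moment with divisor n."""
    m = sum(xs) / len(xs)
    return sum((v - m) ** k for v in xs) / len(xs)

def adjusted_skewness(xs):
    """G1 = g1 * sqrt(n (n - 1)) / (n - 2); requires n >= 3."""
    n = len(xs)
    g1 = central_moment(xs, 3) / central_moment(xs, 2) ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def adjusted_excess_kurtosis(xs):
    """G2 = (n - 1) / ((n - 2)(n - 3)) * ((n + 1) g2 + 6); requires n >= 4."""
    n = len(xs)
    g2 = central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 10.0]

skew_sym = adjusted_skewness(symmetric)          # symmetric sample: zero skew
kurt_sym = adjusted_excess_kurtosis(symmetric)   # flatter than normal: negative
skew_right = adjusted_skewness(right_skewed)     # long right tail: positive
```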