GPU-accelerated statistical computing for Python
Validated against R to machine precision. Free forever.
PyStatistics is a comprehensive statistical computing library for Python that maintains two parallel computational paths: a CPU path validated against R, and a GPU path validated against the CPU path.
The library covers the full spectrum of classical statistics: regression, survival analysis, ANOVA, mixed models, bootstrap methods, hypothesis testing, descriptive statistics, and multivariate normal MLE with missing data.
1. Correctness > Fidelity > Performance > Convenience
2. Fail fast, fail loud — no silent fallbacks
3. Explicit over implicit — require parameters, don't assume intent
4. Two-tier validation — CPU vs R, then GPU vs CPU
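The second validation tier (GPU vs CPU) can be illustrated without a GPU or an R installation. The sketch below is not PyStatistics code: it uses plain NumPy, with a QR solve standing in for the reference path and a Cholesky solve standing in for the accelerated path. The helper names `ols_qr`, `ols_cholesky`, and `check_agreement`, and the tolerance, are assumptions chosen for illustration.

```python
import numpy as np

def ols_qr(X, y):
    """Reference path: least squares via a thin QR decomposition."""
    Q, R = np.linalg.qr(X)             # X = Q R, with R upper triangular
    return np.linalg.solve(R, Q.T @ y)

def ols_cholesky(X, y):
    """Fast path: normal equations solved through a Cholesky factor."""
    L = np.linalg.cholesky(X.T @ X)    # X'X = L L'
    z = np.linalg.solve(L, X.T @ y)    # solve L z = X'y
    return np.linalg.solve(L.T, z)     # solve L' beta = z

def check_agreement(reference, candidate, rtol):
    """Fail loudly instead of silently accepting a drifting result."""
    if not np.allclose(reference, candidate, rtol=rtol, atol=0.0):
        raise AssertionError("paths disagree beyond the stated tolerance")

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
y = X @ np.array([1, 2, 3, -1, 0.5]) + rng.standard_normal(1000) * 0.1

beta_ref = ols_qr(X, y)
beta_fast = ols_cholesky(X, y)
check_agreement(beta_ref, beta_fast, rtol=1e-8)  # hypothetical tolerance
```

Raising on disagreement, rather than falling back to the reference result, mirrors the "fail fast, fail loud" principle in the list above.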
Coming soon to PyPI. For now, install from source:
pip install git+https://github.com/sgcx-org/pystatistics.git
With GPU support (requires PyTorch):
pip install "pystatistics[gpu] @ git+https://github.com/sgcx-org/pystatistics.git"
from pystatistics.regression import fit
import numpy as np
# Linear regression (OLS) on simulated data
X = np.random.randn(1000, 5)
y = X @ [1, 2, 3, -1, 0.5] + np.random.randn(1000) * 0.1
result = fit(X, y)
print(result.summary())
# Logistic regression
y_binary = (X @ [1, -1, 0.5, 0, 0] + np.random.randn(1000) > 0).astype(float)
result = fit(X, y_binary, family='binomial')
# GPU acceleration (any model)
result = fit(X, y, backend='gpu')
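For readers curious about what an IRLS fit of a logistic model involves, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the library's implementation: the function name `logistic_irls`, the iteration cap, the tolerance, and the absence of an intercept column are all choices made for this example.

```python
import numpy as np

def logistic_irls(X, y, max_iter=25, tol=1e-8):
    """Logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                       # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))      # fitted probabilities
        w = mu * (1.0 - mu)                  # IRLS weights
        z = eta + (y - mu) / w               # working response
        XtW = X.T * w                        # X' W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    # No silent fallback: report non-convergence explicitly.
    raise RuntimeError("IRLS did not converge")

# Simulated data from a true logistic model (no intercept).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
prob = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = (rng.random(1000) < prob).astype(float)

beta_hat = logistic_irls(X, y)
print(beta_hat)
```

Each iteration is a weighted least-squares solve, which is why the same linear-algebra backends used for OLS can be reused for generalized linear models.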
from pystatistics.hypothesis import t_test, p_adjust
result = t_test([1,2,3,4,5], [3,4,5,6,7])
print(result.statistic, result.p_value, result.conf_int)
print(result.summary()) # R-style output
# Multiple testing correction
p_adjusted = p_adjust([0.01, 0.04, 0.03, 0.005], method='BH')
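To make the `method='BH'` correction concrete, the following NumPy sketch computes Benjamini-Hochberg adjusted p-values in the same way R's `p.adjust` defines them. The function name `bh_adjust` is hypothetical and this is not the PyStatistics source; it is only meant to show the arithmetic.

```python
import numpy as np

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)[::-1]              # indices from largest to smallest p
    ranks = np.arange(n, 0, -1)              # matching ranks n, n-1, ..., 1
    scaled = p[order] * n / ranks            # p * n / rank
    adjusted_desc = np.minimum.accumulate(scaled)   # enforce monotonicity
    adjusted_desc = np.minimum(adjusted_desc, 1.0)  # cap at 1
    adjusted = np.empty(n)
    adjusted[order] = adjusted_desc          # undo the sort
    return adjusted

adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
print(adjusted)  # approximately [0.02, 0.04, 0.04, 0.02]
```

For the four p-values used above, the two smallest both adjust to about 0.02 and the two largest to about 0.04, because the running minimum ties neighbouring ranks together.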
from pystatistics.survival import kaplan_meier, coxph
import numpy as np
time = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
event = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])
# Kaplan-Meier estimate of the survival function
km = kaplan_meier(time, event)
print(km.survival, km.se, km.ci_lower, km.ci_upper)
# Cox proportional hazards model with one simulated covariate
X = np.column_stack([np.random.randn(10)])
cox = coxph(time, event, X)
print(cox.coefficients, cox.hazard_ratios)
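As a rough cross-check of what a Kaplan-Meier estimate contains, the product-limit formula can be written in a few lines of NumPy. This is a sketch with a hypothetical helper name (`km_survival`); it reports the survival probability only at observed event times, which may differ from the time grid PyStatistics returns in `km.survival`.

```python
import numpy as np

def km_survival(time, event):
    """Product-limit estimate S(t) at each distinct event time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])   # sorted distinct event times
    surv = np.empty(event_times.size)
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(time >= t)                    # still under observation
        deaths = np.sum((time == t) & (event == 1))    # events at exactly t
        s *= 1.0 - deaths / at_risk
        surv[i] = s
    return event_times, surv

time = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
event = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])
event_times, surv = km_survival(time, event)
print(event_times)
print(surv)
```

With the ten observations above, the first event at time 1 leaves 9 of 10 subjects, so the estimate starts at 0.9; censored observations (event 0) only shrink the risk set and never lower the curve themselves.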
Every module follows the same architecture: DataSource → Design → fit() → Backend.solve() → Result
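The pipeline above can be pictured with a small, self-contained sketch. Only the stage names (DataSource, Design, fit, Backend.solve, Result) come from this README; every field, method, and the `fit_sketch` function below are hypothetical stand-ins, and the real classes may look quite different.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class DataSource:
    """Raw inputs as supplied by the caller (hypothetical fields)."""
    X: np.ndarray
    y: np.ndarray

@dataclass(frozen=True)
class Design:
    """Validated numeric arrays ready for a backend."""
    X: np.ndarray
    y: np.ndarray

    @classmethod
    def from_source(cls, source):
        X = np.asarray(source.X, dtype=float)
        y = np.asarray(source.y, dtype=float)
        if X.ndim != 2 or y.shape != (X.shape[0],):
            raise ValueError("X must be (n, p) and y must be (n,)")
        return cls(X, y)

@dataclass(frozen=True)
class Result:
    coefficients: np.ndarray
    backend: str

class CPUBackend:
    name = "cpu"

    def solve(self, design):
        # Least squares via QR, as a stand-in for a CPU solver.
        Q, R = np.linalg.qr(design.X)
        return Result(np.linalg.solve(R, Q.T @ design.y), self.name)

BACKENDS = {"cpu": CPUBackend}

def fit_sketch(X, y, backend="cpu"):
    """DataSource -> Design -> Backend.solve() -> Result."""
    if backend not in BACKENDS:
        # Unknown backends raise instead of silently falling back.
        raise ValueError(f"unknown backend: {backend!r}")
    design = Design.from_source(DataSource(X, y))
    return BACKENDS[backend]().solve(design)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ np.array([2.0, -1.0])
result = fit_sketch(X, y)
print(result.backend, result.coefficients)
```

Keeping validation in the Design step and numerics in the backend means a second backend only has to implement `solve`, which is one plausible reason for a layout like this.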
Linear & generalized linear models. OLS, logistic, Poisson via IRLS. CPU QR, GPU Cholesky.
Mean, SD, correlation, covariance, quantiles (all 9 R types), skewness, kurtosis.
t-test, chi-squared, Fisher exact, Wilcoxon, KS, proportions, F-test, p.adjust.
Bootstrap (ordinary, balanced, parametric), permutation tests, 5 CI methods, batched GPU solver.
One-way, factorial, ANCOVA, repeated measures. Type I/II/III SS. Tukey, Bonferroni, Dunnett.
LMM & GLMM. Random intercepts/slopes, nested/crossed, REML/ML, Satterthwaite df.
Multivariate normal MLE with missing data. Direct & EM algorithms. Little's MCAR test.