Hypothesis Testing
t-test, chi-squared, Fisher exact, Wilcoxon, Kolmogorov-Smirnov, proportions, F-test, and p-value adjustment (Holm, BH, Bonferroni, Hochberg, Hommel, BY).
Hypothesis testing module.
Provides hypothesis tests matching R’s implementations, validated against R to rtol=1e-10.
- Public API:
t_test(x, y) - Student’s t-test (one-sample, two-sample, paired)
chisq_test(x) - Pearson’s chi-squared test (independence, GOF)
fisher_test(x) - Fisher’s exact test (2x2 and r x c)
wilcox_test(x, y) - Wilcoxon rank-sum / signed-rank test
ks_test(x, y) - Kolmogorov-Smirnov test
prop_test(x, n) - Test of proportions
var_test(x, y) - F-test to compare two variances
p_adjust(p) - Multiple testing correction (Holm, BH, etc.)
- pystatistics.hypothesis.t_test(x, y=None, *, alternative='two.sided', mu=0.0, paired=False, var_equal=False, conf_level=0.95, backend='cpu')
Student’s t-test. Matches R t.test().
- Parameters:
x (array-like or HypothesisDesign) – Sample data. 1D numeric vector.
y (array-like or None) – Optional second sample for two-sample test.
alternative (str) – “two.sided” (default), “less”, or “greater”.
mu (float) – Hypothesized mean (one-sample) or difference in means (two-sample). Default 0.
paired (bool) – If True, perform a paired t-test. x and y must have the same length.
var_equal (bool) – If True, use pooled variance (Student’s t). If False (default), use Welch’s approximation with Welch-Satterthwaite degrees of freedom. R defaults to Welch.
conf_level (float) – Confidence level for the interval. Default 0.95.
backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.
- Returns:
Test result with statistic, p_value, conf_int, estimate, etc.
- Return type:
HTestSolution
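The `var_equal` and `paired` options select different textbook formulas. The sketch below uses only the standard library to compute the statistic and degrees of freedom that R's `t.test()` reports for each case. `t_statistic` is a hypothetical helper, not pystatistics code. It leaves out the p-value because the standard library has no t distribution.

```python
import math
import statistics


def t_statistic(x, y=None, *, mu=0.0, paired=False, var_equal=False):
    """Return (t, df) using the formulas behind R's t.test() (sketch only)."""
    if paired:
        if y is None or len(x) != len(y):
            raise ValueError("paired test needs x and y of the same length")
        # A paired test is a one-sample test on the differences x - y.
        x, y = [a - b for a, b in zip(x, y)], None
    nx, mx, vx = len(x), statistics.mean(x), statistics.variance(x)
    if y is None:
        # One-sample: standard error of the mean, n - 1 degrees of freedom.
        return (mx - mu) / math.sqrt(vx / nx), nx - 1
    ny, my, vy = len(y), statistics.mean(y), statistics.variance(y)
    if var_equal:
        # Student: pooled variance, nx + ny - 2 degrees of freedom.
        pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
        se = math.sqrt(pooled * (1 / nx + 1 / ny))
        df = nx + ny - 2
    else:
        # Welch: separate variances, Welch-Satterthwaite degrees of freedom.
        sx, sy = vx / nx, vy / ny
        se = math.sqrt(sx + sy)
        df = (sx + sy) ** 2 / (sx ** 2 / (nx - 1) + sy ** 2 / (ny - 1))
    return (mx - my - mu) / se, df
```

For `x = [1, 2, 3, 4, 5]` and `y = [2, 4, 6, 8, 10]`, Welch and pooled give the same t because the groups are the same size. Only the degrees of freedom differ: about 5.88 for Welch and 8 for pooled.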
- pystatistics.hypothesis.chisq_test(x, y=None, *, correct=True, p=None, rescale_p=False, simulate_p_value=False, B=2000, backend='cpu')
Pearson’s Chi-squared test. Matches R chisq.test().
- Parameters:
x (array-like or HypothesisDesign) – A 2D contingency table, or a 1D vector of observed counts (for goodness-of-fit test). Can also be a pre-built HypothesisDesign.
y (array-like or None) – If x is 1D and y is provided, a contingency table is built from cross-tabulation.
correct (bool) – Apply Yates’ continuity correction for 2x2 tables. Default True (matches R).
p (array-like or None) – Expected proportions for GOF test. If None, assumes uniform.
rescale_p (bool) – If True, rescale p to sum to 1.
simulate_p_value (bool) – If True, compute p-value by Monte Carlo simulation.
B (int) – Number of Monte Carlo replicates. Default 2000.
backend (str) – ‘cpu’ (default). GPU Monte Carlo will be added later.
- Returns:
Test result with statistic, p_value, and extras (observed, expected, residuals, stdres for independence test).
- Return type:
HTestSolution
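The `correct=True` default applies Yates' continuity correction. Its effect is easiest to see on a 2x2 table. The sketch follows R's `chisq.test()` rule, which shrinks each |O - E| by at most 0.5. For 1 degree of freedom, it gets the p-value from the identity P(chi-squared > s) = erfc(sqrt(s / 2)). `chisq_2x2` is a hypothetical helper and is limited to 2x2 tables.

```python
import math


def chisq_2x2(table, correct=True):
    """Pearson chi-squared test for a 2x2 table; returns (statistic, df, p_value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = [[a, b], [c, d]]
    # Expected counts under independence: row total * column total / n.
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    diffs = [abs(observed[i][j] - expected[i][j]) for i in range(2) for j in range(2)]
    # The correction is capped at the smallest |O - E| so it never overshoots.
    yates = min(0.5, min(diffs)) if correct else 0.0
    stat = sum(
        (abs(observed[i][j] - expected[i][j]) - yates) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # Upper tail of chi-squared with 1 df equals P(|Z| > sqrt(stat)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, 1, p_value
```

On `[[12, 5], [7, 9]]` the corrected statistic matches the closed form N(|ad - bc| - N/2)^2 / (r1 r2 c1 c2). It is smaller than the uncorrected value, so the corrected p-value is larger.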
- pystatistics.hypothesis.fisher_test(x, y=None, *, alternative='two.sided', conf_int=True, conf_level=0.95, simulate_p_value=False, B=2000, backend='cpu')
Fisher’s Exact Test for Count Data. Matches R fisher.test().
- Parameters:
x (array-like or HypothesisDesign) – A 2D contingency table, or a 1D factor vector to cross-tabulate with y.
y (array-like or None) – If x is 1D, second factor to cross-tabulate.
alternative (str) – “two.sided” (default), “less”, or “greater”. Only “two.sided” is supported for r x c tables with r > 2 or c > 2.
conf_int (bool) – Compute confidence interval for odds ratio (2x2 only).
conf_level (float) – Confidence level. Default 0.95.
simulate_p_value (bool) – Use Monte Carlo for p-value.
B (int) – Number of Monte Carlo replicates. Default 2000.
backend (str) – ‘cpu’ (default).
- Returns:
Test result with p_value, estimate (odds ratio for 2x2), conf_int (2x2 only).
- Return type:
HTestSolution
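For a 2x2 table, Fisher's test conditions on the margins, so the top-left cell follows a hypergeometric distribution. The sketch computes that distribution exactly with `math.comb`. For the two-sided p-value it sums every table no more probable than the observed one. It uses the same small relative tolerance (1 + 1e-7) that R's `fisher.test()` applies. `fisher_2x2_p` is hypothetical, covers 2x2 tables with integer counts only, and omits the conditional-MLE odds ratio and its interval.

```python
from math import comb


def fisher_2x2_p(table, alternative="two.sided"):
    """Fisher's exact p-value for a 2x2 table of integer counts (sketch)."""
    (a, b), (c, d) = table
    r1, c1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    # Hypergeometric probability of each feasible top-left count given the margins.
    probs = {
        k: comb(r1, k) * comb(n - r1, c1 - k) / comb(n, c1)
        for k in range(lo, hi + 1)
    }
    if alternative == "less":
        return sum(p for k, p in probs.items() if k <= a)
    if alternative == "greater":
        return sum(p for k, p in probs.items() if k >= a)
    # Two-sided: tables at most as likely as the observed one, with a tolerance
    # so that ties broken by rounding error are still counted.
    cutoff = probs[a] * (1 + 1e-7)
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))
```

For the classic tea-tasting table `[[3, 1], [1, 3]]`, the feasible tables have probabilities 1, 16, 36, 16 and 1 (each over 70). The two-sided p-value is therefore 34/70, about 0.486.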
- pystatistics.hypothesis.wilcox_test(x, y=None, *, alternative='two.sided', mu=0.0, paired=False, exact=None, correct=True, conf_int=True, conf_level=0.95, backend='cpu')
Wilcoxon rank-sum or signed-rank test. Matches R wilcox.test().
- Parameters:
x (array-like or HypothesisDesign) – Numeric vector.
y (array-like or None) – Second sample for rank-sum test, or paired sample.
alternative (str) – “two.sided” (default), “less”, or “greater”.
mu (float) – Hypothesized location (one-sample) or shift (two-sample).
paired (bool) – If True, perform paired (signed-rank) test.
exact (bool or None) – If None (default), use exact test for small n without ties.
correct (bool) – Apply continuity correction for normal approximation.
conf_int (bool) – Compute Hodges-Lehmann confidence interval.
conf_level (float) – Confidence level. Default 0.95.
backend (str) – ‘cpu’ (default).
- Returns:
Test result with statistic (V or W), p_value, conf_int, estimate.
- Return type:
HTestSolution
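The reported statistic is W for the rank-sum test and V for the signed-rank test. Both are sums of ranks, with tied values sharing their average rank. The sketch follows R's `wilcox.test()` definitions. It subtracts the minimum possible rank sum from W. For V it drops zero differences before ranking. All function names are hypothetical, and p-values and the Hodges-Lehmann interval are omitted.

```python
def average_ranks(values):
    """Ranks 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def rank_sum_w(x, y, mu=0.0):
    """W: rank sum of (x - mu) in the pooled sample minus nx(nx + 1)/2."""
    ranks = average_ranks([v - mu for v in x] + list(y))
    nx = len(x)
    return sum(ranks[:nx]) - nx * (nx + 1) / 2


def signed_rank_v(x, mu=0.0):
    """V: sum of ranks of |x - mu| over positive differences; zeros dropped."""
    d = [v - mu for v in x]
    d = [v for v in d if v != 0]
    ranks = average_ranks([abs(v) for v in d])
    return sum(r for r, v in zip(ranks, d) if v > 0)
```

For a paired test, pass the differences `x - y` to `signed_rank_v`. W equals the number of pairs (x_i, y_j) with x_i > y_j, where ties count one half. For example, `rank_sum_w([1, 3, 5], [2, 4, 6, 7])` is 3.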
- pystatistics.hypothesis.ks_test(x, y=None, *, alternative='two.sided', distribution=None, backend='cpu', **dist_params)
Kolmogorov-Smirnov test. Matches R ks.test().
- Parameters:
x (array-like or HypothesisDesign) – Numeric vector of observations.
y (array-like or None) – Second sample for two-sample test. If None, performs one-sample test against a theoretical distribution.
alternative (str) – “two.sided” (default), “less”, or “greater”.
distribution (str or None) – Distribution name for one-sample test (“norm”, “unif”, “exp”). If None and y is None, defaults to standard normal.
backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.
**dist_params (float) – Distribution parameters (e.g., mean=0, sd=1 for “norm”).
- Returns:
Test result with statistic (D), p_value.
- Return type:
HTestSolution
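The D statistic is the largest vertical gap between two cumulative distribution functions. Below is a minimal two-sided sketch of both cases. For two samples, it compares the two empirical CDFs at every pooled point. For one sample, it compares the empirical CDF against a continuous CDF, checking just before and just after each jump as R's `ks.test()` does. The standard normal from `statistics.NormalDist` is the default. The function names are hypothetical, and p-values are not computed.

```python
import bisect
from statistics import NormalDist


def ks_two_sample_d(x, y):
    """Two-sided two-sample D = max |F_x(t) - F_y(t)| over the pooled points."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in set(xs) | set(ys):
        fx = bisect.bisect_right(xs, t) / len(xs)  # empirical CDF of x at t
        fy = bisect.bisect_right(ys, t) / len(ys)  # empirical CDF of y at t
        d = max(d, abs(fx - fy))
    return d


def ks_one_sample_d(x, cdf=NormalDist().cdf):
    """Two-sided one-sample D against a continuous CDF (standard normal default)."""
    xs = sorted(x)
    n = len(xs)
    # The ECDF steps from i/n to (i+1)/n at xs[i]; check the gap on both sides.
    return max(max((i + 1) / n - cdf(v), cdf(v) - i / n) for i, v in enumerate(xs))
```

Passing another CDF, such as `lambda v: min(max(v, 0.0), 1.0)` for Uniform(0, 1), plays the role that `distribution="unif"` plays in the documented API.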
- pystatistics.hypothesis.prop_test(x, n=None, *, p=None, alternative='two.sided', conf_level=0.95, correct=True, backend='cpu')
Test of proportions. Matches R prop.test().
- Parameters:
x (array-like or HypothesisDesign) – Number of successes. Scalar or vector.
n (array-like or None) – Number of trials. Scalar or vector (same length as x).
p (float or array-like or None) – Null hypothesis proportion(s). If None, tests equality of proportions (k >= 2 groups).
alternative (str) – “two.sided” (default), “less”, or “greater”. Only “two.sided” is allowed for k > 1 groups.
conf_level (float) – Confidence level for the interval. Default 0.95.
correct (bool) – Apply Yates’ continuity correction. Default True.
backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.
- Returns:
Test result with statistic, p_value, conf_int, estimate.
- Return type:
HTestSolution
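For a single proportion (k = 1), R's `prop.test()` sums a chi-squared statistic over the success and failure cells. That sum reduces to one term, (|x - np| - c)^2 / (np(1 - p)). Here c is the continuity correction, capped at |x - np|. The sketch assumes p defaults to 0.5, mirroring R's default for a single proportion. `prop_test_one` is hypothetical and does not cover the k >= 2 equality test or the Wilson interval.

```python
import math


def prop_test_one(x, n, p=0.5, correct=True):
    """One-sample proportion test: (statistic, two-sided p-value), 1 df."""
    expected = n * p
    # The continuity correction never exceeds |x - n*p|.
    yates = min(0.5, abs(x - expected)) if correct else 0.0
    stat = (abs(x - expected) - yates) ** 2 / (n * p * (1 - p))
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared upper tail, 1 df
    return stat, p_value
```

With 83 successes in 100 trials against p = 0.75, the corrected statistic is (8 - 0.5)^2 / 18.75 = 3.0. Without the correction it is 64 / 18.75.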
- pystatistics.hypothesis.var_test(x, y=None, *, ratio=1.0, alternative='two.sided', conf_level=0.95, backend='cpu')
F-test to compare two variances. Matches R var.test().
- Parameters:
x (array-like or HypothesisDesign) – First sample.
y (array-like or None) – Second sample. Required unless x is a HypothesisDesign.
ratio (float) – Hypothesized ratio of variances (var_x / var_y). Default 1.
alternative (str) – “two.sided” (default), “less”, or “greater”.
conf_level (float) – Confidence level. Default 0.95.
backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.
- Returns:
Test result with statistic (F), p_value, conf_int, estimate (ratio of variances).
- Return type:
HTestSolution
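The F statistic is the sample variance ratio divided by the hypothesized `ratio`, with n - 1 degrees of freedom for each sample. The sketch returns these values under the R-style parameter names used elsewhere on this page. `f_statistic` is hypothetical. It omits the p-value and interval because they need the F distribution, which the standard library lacks.

```python
import statistics


def f_statistic(x, y, ratio=1.0):
    """F statistic, df and variance-ratio estimate in the style of R's var.test()."""
    estimate = statistics.variance(x) / statistics.variance(y)  # var_x / var_y
    return {
        "statistic": estimate / ratio,
        "parameter": {"num df": len(x) - 1, "denom df": len(y) - 1},
        "estimate": estimate,
    }
```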
- pystatistics.hypothesis.p_adjust(p, method='holm', n=None)
Adjust p-values for multiple comparisons. Matches R p.adjust().
- Returns:
Adjusted p-values, same length as input. Clipped to [0, 1]. NaN positions in input produce NaN in output.
- Return type:
ndarray
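The adjustments follow R's `p.adjust()` formulas. Holm is a step-down method: a cumulative maximum over ascending p-values. Hochberg, BH and BY are step-up methods: a cumulative minimum over descending p-values. The sketch implements these four plus Bonferroni, passing NaN through unchanged. It assumes the method strings mirror R's names ("holm", "hochberg", "BH", "BY", "bonferroni"). Hommel is omitted, and `p_adjust_sketch` is a hypothetical name.

```python
import math


def p_adjust_sketch(p, method="holm", n=None):
    """Adjust p-values with R's p.adjust() formulas (Hommel omitted)."""
    out = [math.nan] * len(p)
    valid = [i for i, v in enumerate(p) if not math.isnan(v)]  # NaNs pass through
    lp = len(valid)
    n = lp if n is None else n
    if n < lp:
        raise ValueError("n must be at least the number of non-NaN p-values")
    if lp == 0:
        return out
    if method == "bonferroni":
        for i in valid:
            out[i] = min(1.0, n * p[i])
        return out
    if method == "holm":
        running = 0.0
        for rank, idx in enumerate(sorted(valid, key=lambda i: p[i])):
            running = max(running, (n - rank) * p[idx])  # step-down cumulative max
            out[idx] = min(1.0, running)
        return out
    if method not in ("hochberg", "BH", "BY"):
        raise ValueError(f"unsupported method: {method}")
    q = sum(1.0 / k for k in range(1, n + 1))  # harmonic factor used by BY
    running = math.inf
    for r, idx in enumerate(sorted(valid, key=lambda i: p[i], reverse=True)):
        i = lp - r  # 1-based rank of this p-value in ascending order
        mult = {"hochberg": n + 1 - i, "BH": n / i, "BY": q * n / i}[method]
        running = min(running, mult * p[idx])  # step-up cumulative min
        out[idx] = min(1.0, running)
    return out
```

For `p = [0.01, 0.02, 0.03, 0.04, 0.05]`, Holm gives `[0.05, 0.08, 0.09, 0.09, 0.09]`. BH and Hochberg both give 0.05 everywhere.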
- class pystatistics.hypothesis.HypothesisDesign(test_type, _x=None, _y=None, _mu=0.0, _alternative='two.sided', _conf_level=0.95, _var_equal=False, _paired=False, _correct=True, _table=None, _successes=None, _trials=None, _expected_p=None, _rescale_p=False, _simulate_p_value=False, _n_monte_carlo=2000, _compute_conf_int=True, _exact=None, _compute_wilcox_ci=True, _distribution=None, _dist_params=None, _ratio=1.0, _data_name='')
Bases: object
Design for hypothesis tests.
Uses a tagged-union approach: the test_type field identifies which fields are populated. Factory classmethods validate inputs.
Do not construct directly; use factory classmethods.
- Parameters:
test_type (str)
_mu (float)
_alternative (str)
_conf_level (float)
_var_equal (bool)
_paired (bool)
_correct (bool)
_table (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
_successes (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
_trials (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
_expected_p (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
_rescale_p (bool)
_simulate_p_value (bool)
_n_monte_carlo (int)
_compute_conf_int (bool)
_exact (bool | None)
_compute_wilcox_ci (bool)
_distribution (str | None)
_ratio (float)
_data_name (str)
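The tagged-union pattern can be sketched with a frozen dataclass. A `test_type` tag says which optional fields are meaningful, and validating factory classmethods are the intended constructors. `ToyDesign`, its fields and its tag strings are hypothetical illustrations. They are not the real `HypothesisDesign`, which has many more fields and its own validation rules.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyDesign:
    """Hypothetical tagged union: test_type says which optional fields are set."""

    test_type: str
    x: Optional[Tuple[float, ...]] = None
    y: Optional[Tuple[float, ...]] = None
    mu: float = 0.0
    paired: bool = False
    table: Optional[Tuple[Tuple[float, ...], ...]] = None

    @classmethod
    def for_t_test(cls, x, y=None, *, mu=0.0, paired=False):
        # Factories normalize inputs and reject invalid combinations up front.
        x = tuple(float(v) for v in x)
        y = None if y is None else tuple(float(v) for v in y)
        if len(x) < 2:
            raise ValueError("need at least two observations in x")
        if paired and (y is None or len(x) != len(y)):
            raise ValueError("paired=True needs x and y of the same length")
        return cls("t_test", x=x, y=y, mu=mu, paired=paired)

    @classmethod
    def for_chisq_test(cls, table):
        table = tuple(tuple(float(v) for v in row) for row in table)
        if len({len(row) for row in table}) != 1:
            raise ValueError("contingency table rows must have equal length")
        if any(v < 0 for row in table for v in row):
            raise ValueError("counts must be non-negative")
        return cls("chisq_test", table=table)
```

Calling `ToyDesign(...)` directly would skip every check. That is why the real class asks callers to use the factories.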
- classmethod for_t_test(x, y=None, *, mu=0.0, paired=False, var_equal=False, alternative='two.sided', conf_level=0.95)
Build design for t_test().
- classmethod for_chisq_test(x, y=None, *, correct=True, p=None, rescale_p=False, simulate_p_value=False, B=2000)
Build design for chisq_test().
If x is 2D (or x and y are both 1D), it’s an independence test. If x is 1D with no y, it’s a goodness-of-fit test.
- Return type:
HypothesisDesign
- classmethod for_prop_test(x, n, *, p=None, alternative='two.sided', conf_level=0.95, correct=True)
Build design for prop_test().
- Parameters:
x (array-like) – Number of successes. Scalar or vector.
n (array-like) – Number of trials. Scalar or vector (same length as x).
p (float or array-like or None) – Null hypothesis proportions. If None, tests equality of proportions.
alternative (str) – “two.sided”, “less”, or “greater”.
conf_level (float) – Confidence level for the interval.
correct (bool) – Apply Yates’ continuity correction.
- Return type:
HypothesisDesign
- classmethod for_fisher_test(x, y=None, *, alternative='two.sided', conf_int=True, conf_level=0.95, simulate_p_value=False, B=2000)
Build design for fisher_test().
- Parameters:
x (array-like) – A 2D contingency table, or 1D factor vector.
y (array-like or None) – If x is 1D, second factor vector to cross-tabulate.
alternative (str) – “two.sided”, “less”, or “greater”. Only “two.sided” is supported for r x c tables with r > 2 or c > 2.
conf_int (bool) – Compute confidence interval for odds ratio (2x2 only).
conf_level (float) – Confidence level.
simulate_p_value (bool) – Use Monte Carlo simulation for p-value.
B (int) – Number of Monte Carlo replicates.
- Return type:
HypothesisDesign
- classmethod for_wilcox_test(x, y=None, *, mu=0.0, paired=False, exact=None, correct=True, conf_int=True, conf_level=0.95, alternative='two.sided')
Build design for wilcox_test().
- classmethod for_ks_test(x, y=None, *, alternative='two.sided', distribution=None, **dist_params)
Build design for ks_test().
- Parameters:
x (array-like) – Numeric vector of observations.
y (array-like or None) – Second sample for a two-sample test, or None for a one-sample test.
alternative (str) – “two.sided” (default), “less”, or “greater”.
distribution (str or None) – Distribution name for one-sample test (“norm”, “unif”, “exp”). If None and y is None, defaults to standard normal.
**dist_params (float) – Distribution parameters (e.g., mean=0, sd=1 for norm).
- Return type:
HypothesisDesign
- class pystatistics.hypothesis.HTestParams(statistic, statistic_name, parameter, p_value, conf_int, conf_level, estimate, null_value, alternative, method, data_name, extras=None)
Bases: object
Parameter payload for hypothesis tests.
Maps directly to R’s htest structure. Every hypothesis test returns this same structure; test-specific extras go in the extras dict.
- parameter
Distribution parameters, e.g. {“df”: 9} or {“num df”: 4, “denom df”: 8}. None for exact tests with no degrees of freedom.
- Type:
dict or None
- conf_int
Confidence interval, shape (2,). None if not computed.
- Type:
ndarray or None
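One fixed record shape serves every test: fields a test does not use are set to None, and anything test-specific goes into `extras`. The sketch shows this with a hypothetical `ToyHTestParams` that reuses the field names from the `HTestParams` signature. Its types are assumptions. The example record is for an exact test, so it has no statistic and `parameter` is None. Its p-value, 34/70, is the exact two-sided Fisher value for the table `[[3, 1], [1, 3]]`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class ToyHTestParams:
    """Hypothetical stand-in with the same field names as HTestParams."""

    statistic: Optional[float]
    statistic_name: Optional[str]
    parameter: Optional[Dict[str, float]]  # e.g. {"df": 9}; None for exact tests
    p_value: float
    conf_int: Optional[Tuple[float, float]]
    conf_level: Optional[float]
    estimate: Optional[Dict[str, float]]
    null_value: Optional[Dict[str, float]]
    alternative: str
    method: str
    data_name: str
    extras: Dict[str, Any] = field(default_factory=dict)  # test-specific payload


fisher_record = ToyHTestParams(
    statistic=None,
    statistic_name=None,
    parameter=None,  # an exact test has no degrees of freedom
    p_value=34 / 70,
    conf_int=None,  # as if conf_int=False had been requested
    conf_level=None,
    estimate=None,
    null_value={"odds ratio": 1.0},
    alternative="two.sided",
    method="Fisher's Exact Test for Count Data",
    data_name="table",
)
```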
- class pystatistics.hypothesis.HTestSolution(_result, _design)
Bases: object
User-facing hypothesis test results.
Wraps Result[HTestParams] and provides R’s print.htest output format via summary(). All standard htest fields are available as properties.
- Parameters:
_result (Result[HTestParams])
_design (HypothesisDesign | None)
- property conf_int: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
Confidence interval, shape (2,).
- property observed: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None
For chisq_test: observed counts.
- property expected: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None
For chisq_test: expected counts under H0.
- property residuals: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None
For chisq_test: Pearson residuals.
- property stdres: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None
For chisq_test: standardized residuals.
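The chisq_test extras follow R's `chisq.test()` definitions:
- expected = row total * column total / N
- Pearson residual = (O - E) / sqrt(E)
- standardized residual = (O - E) / sqrt(E (1 - r/N)(1 - c/N))

The sketch computes all three with plain lists. `chisq_residuals` is a hypothetical helper. A quick consistency check: the squared Pearson residuals sum to the uncorrected chi-squared statistic.

```python
import math


def chisq_residuals(table):
    """Return (expected, pearson_residuals, standardized_residuals) for an r x c table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(orow, erow)]
        for orow, erow in zip(table, expected)
    ]
    # The variance of O - E under independence shrinks by (1 - r/n)(1 - c/n).
    stdres = [
        [
            (o - e) / math.sqrt(e * (1 - r / n) * (1 - c / n))
            for o, e, c in zip(orow, erow, col_tot)
        ]
        for orow, erow, r in zip(table, expected, row_tot)
    ]
    return expected, residuals, stdres
```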
- summary()
Format as R’s print.htest output.
- Produces output like:
            Welch Two Sample t-test

    data:  x and y
    t = 2.2345, df = 17.43, p-value = 0.03891
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.1234567 4.5678901
    sample estimates:
    mean of x mean of y
     5.123456  2.789012
- Return type:
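The print.htest layout is mostly fixed text wrapped around the htest fields. The sketch rebuilds it from a plain dict. It is only an approximation:
- `format_htest` and `_fmt` are hypothetical helpers.
- Numbers use Python's "g" formatting with shared decimals, which roughly mimics R's `format()`.
- R's "< 2.2e-16" p-value form and multiple null values are not handled.

```python
def _fmt(values, digits):
    """Roughly mimic R's format(): `digits` significant digits, shared decimals."""
    texts = [f"{v:.{digits}g}" for v in values]
    if any("e" in t for t in texts):
        return texts  # leave scientific notation as-is
    decimals = max(len(t.partition(".")[2]) for t in texts)
    return [f"{v:.{decimals}f}" for v in values]


def format_htest(res):
    """Approximate R's print.htest layout from a dict of htest fields (sketch)."""
    alt_text = {"two.sided": "not equal to", "less": "less than", "greater": "greater than"}
    lines = ["", "\t" + res["method"], "", "data:  " + res["data_name"]]
    named = [] if res.get("statistic") is None else [res["statistic"]]
    named += list((res.get("parameter") or {}).items())
    parts = [f"{name} = {_fmt([value], 5)[0]}" for name, value in named]
    parts.append(f"p-value = {_fmt([res['p_value']], 4)[0]}")
    lines.append(", ".join(parts))
    # Assumes exactly one named null value, the common case.
    (null_name, null_value), = res["null_value"].items()
    lines.append(
        f"alternative hypothesis: true {null_name} is "
        f"{alt_text[res['alternative']]} {null_value:g}"
    )
    if res.get("conf_int") is not None:
        lines.append(f"{100 * res['conf_level']:g} percent confidence interval:")
        lines.append(" " + " ".join(_fmt(res["conf_int"], 7)))
    if res.get("estimate"):
        names = list(res["estimate"])
        values = _fmt(list(res["estimate"].values()), 7)
        widths = [max(len(n), len(v)) for n, v in zip(names, values)]
        lines.append("sample estimates:")
        lines.append(" ".join(n.rjust(w) for n, w in zip(names, widths)))
        lines.append(" ".join(v.rjust(w) for v, w in zip(values, widths)))
    return "\n".join(lines)
```

Feeding in the illustrative numbers from the example output above reproduces that layout line for line. The only differences are that the method line starts with a tab and there is no outer indentation.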