Hypothesis Testing¶

t-test, chi-squared, Fisher exact, Wilcoxon, Kolmogorov-Smirnov, proportions, F-test, and p-value adjustment (Holm, BH, Bonferroni, Hochberg, Hommel, BY).

Hypothesis testing module.

Provides hypothesis tests matching R’s implementations, validated against R to rtol=1e-10.

Public API:

t_test(x, y) - Student’s t-test (one-sample, two-sample, paired)
chisq_test(x) - Pearson’s chi-squared test (independence, GOF)
fisher_test(x) - Fisher’s exact test (2x2 and r x c)
wilcox_test(x, y) - Wilcoxon rank-sum / signed-rank test
ks_test(x, y) - Kolmogorov-Smirnov test
prop_test(x, n) - Test of proportions
var_test(x, y) - F-test to compare two variances
p_adjust(p) - Multiple testing correction (Holm, BH, etc.)

pystatistics.hypothesis.t_test(x, y=None, *, alternative='two.sided', mu=0.0, paired=False, var_equal=False, conf_level=0.95, backend='cpu')[source]¶

Student’s t-test. Matches R t.test().

Parameters:

x (array-like or HypothesisDesign) – Sample data. 1D numeric vector.
y (array-like or None) – Optional second sample for two-sample test.
alternative (str) – “two.sided” (default), “less”, or “greater”.
mu (float) – Hypothesized mean (one-sample) or difference in means (two-sample). Default 0.
paired (bool) – If True, perform paired t-test. x and y must have same length.
var_equal (bool) – If True, use pooled variance (Student’s t). If False (default), use Welch’s approximation with Welch-Satterthwaite degrees of freedom. R defaults to Welch.
conf_level (float) – Confidence level for the interval. Default 0.95.
backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.

Returns:

Test result with statistic, p_value, conf_int, estimate, etc.

Return type:

HTestSolution
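To make the var_equal=False path concrete, here is a minimal pure-Python sketch of the Welch statistic and the Welch-Satterthwaite degrees of freedom. It is not the library's implementation; the helper name welch_t and the sample data are assumptions made for illustration only.

```python
import math
from statistics import mean, variance


def welch_t(x, y, mu=0.0):
    """Return (t, df) for Welch's two-sample test of mean(x) - mean(y) = mu."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean (variance() divides by n - 1).
    se2_x = variance(x) / nx
    se2_y = variance(y) / ny
    t = (mean(x) - mean(y) - mu) / math.sqrt(se2_x + se2_y)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (nx - 1) + se2_y ** 2 / (ny - 1))
    return t, df


# Illustrative data, not taken from the library's test suite.
x = [5.1, 4.9, 6.2, 5.8, 6.0]
y = [4.1, 3.9, 4.5, 4.8, 4.2]
t, df = welch_t(x, y)
print(f"t = {t:.4f}, df = {df:.4f}")
```

The approximate df always lies between min(nx, ny) - 1 and nx + ny - 2, which is why Welch's test reports a non-integer value where the pooled test reports nx + ny - 2.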

pystatistics.hypothesis.chisq_test(x, y=None, *, correct=True, p=None, rescale_p=False, simulate_p_value=False, B=2000, backend='cpu')[source]¶

Pearson’s Chi-squared test. Matches R chisq.test().

Parameters:

x (array-like or HypothesisDesign) – A 2D contingency table, or a 1D vector of observed counts (for goodness-of-fit test). Can also be a pre-built HypothesisDesign.
y (array-like or None) – If x is 1D and y is provided, a contingency table is built from cross-tabulation.
correct (bool) – Apply Yates’ continuity correction for 2x2 tables. Default True (matches R).
p (array-like or None) – Expected proportions for GOF test. If None, assumes uniform.
rescale_p (bool) – If True, rescale p to sum to 1.
simulate_p_value (bool) – If True, compute p-value by Monte Carlo simulation.
B (int) – Number of Monte Carlo replicates. Default 2000.
backend (str) – ‘cpu’ (default). GPU Monte Carlo will be added later.

Returns:

Test result with statistic, p_value, and extras (observed, expected, residuals, stdres for independence test).

Return type:

HTestSolution
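The effect of correct=True on a 2x2 table can be sketched in plain Python. This is an illustrative calculation under the usual textbook definition of Yates' correction, not the library's code; chisq_2x2 is a hypothetical helper name.

```python
import math


def chisq_2x2(table, correct=True):
    """Pearson chi-squared statistic and p-value for a 2x2 table (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    diffs = [abs(table[i][j] - expected[i][j]) for i in range(2) for j in range(2)]
    # Yates' correction shrinks |O - E| by 0.5, but never past zero.
    yates = min(0.5, min(diffs)) if correct else 0.0
    stat = sum(
        (abs(table[i][j] - expected[i][j]) - yates) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # With one degree of freedom the upper tail is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


table = [[12, 5], [3, 10]]
stat_corrected, p_corrected = chisq_2x2(table, correct=True)
stat_plain, p_plain = chisq_2x2(table, correct=False)
print(stat_corrected, stat_plain)
```

The corrected statistic is always the smaller of the two, so the corrected p-value is the more conservative one.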

pystatistics.hypothesis.fisher_test(x, y=None, *, alternative='two.sided', conf_int=True, conf_level=0.95, simulate_p_value=False, B=2000, backend='cpu')[source]¶

Fisher’s Exact Test for Count Data. Matches R fisher.test().

Parameters:

x (array-like or HypothesisDesign) – A 2D contingency table.
y (array-like or None) – If x is 1D, second factor to cross-tabulate.
alternative (str) – “two.sided” (default), “less”, or “greater”. Only “two.sided” is supported for r x c tables (r > 2 or c > 2).
conf_int (bool) – Compute confidence interval for odds ratio (2x2 only).
conf_level (float) – Confidence level. Default 0.95.
simulate_p_value (bool) – Use Monte Carlo for p-value.
B (int) – Number of Monte Carlo replicates. Default 2000.
backend (str) – ‘cpu’ (default).

Returns:

Test result with p_value, estimate (odds ratio for 2x2), conf_int (2x2 only).

Return type:

HTestSolution
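For a 2x2 table the two-sided exact p-value can be computed directly from the hypergeometric distribution. The sketch below is a pure-Python illustration, not the library's implementation. It assumes the common convention of summing every table that is no more probable than the observed one, with a small relative tolerance for floating-point ties. fisher_2x2_two_sided is a hypothetical helper name.

```python
from math import comb


def fisher_2x2_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(k):
        # Probability that the top-left cell equals k with all margins fixed.
        return comb(r1, k) * comb(r2, c1 - k) / total

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Sum all tables at least as extreme (no more probable) as the observed one.
    p_value = sum(
        prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Classic "lady tasting tea" layout: 3 of 4 cups classified correctly.
p_tea = fisher_2x2_two_sided([[3, 1], [1, 3]])
print(round(p_tea, 4))
```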

pystatistics.hypothesis.wilcox_test(x, y=None, *, alternative='two.sided', mu=0.0, paired=False, exact=None, correct=True, conf_int=True, conf_level=0.95, backend='cpu')[source]¶

Wilcoxon rank-sum or signed-rank test. Matches R wilcox.test().

Parameters:

x (array-like or HypothesisDesign) – Numeric vector.
y (array-like or None) – Second sample for rank-sum test, or paired sample.
alternative (str) – “two.sided” (default), “less”, or “greater”.
mu (float) – Hypothesized location (one-sample) or shift (two-sample).
paired (bool) – If True, perform paired (signed-rank) test.
exact (bool or None) – If None (default), use exact test for small n without ties.
correct (bool) – Apply continuity correction for normal approximation.
conf_int (bool) – Compute Hodges-Lehmann confidence interval.
conf_level (float) – Confidence level. Default 0.95.
backend (str) – ‘cpu’ (default).

Returns:

Test result with statistic (V or W), p_value, conf_int, estimate.

Return type:

HTestSolution
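As a rough illustration of the W statistic reported by the rank-sum test, the following pure-Python sketch ranks the pooled sample with midranks for ties and subtracts the minimum possible rank sum. It shows only the statistic, not the exact or normal-approximation p-value, and the helper names are assumptions rather than library functions.

```python
def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """W = (sum of the ranks of x in the pooled sample) - nx * (nx + 1) / 2."""
    ranks = midranks(list(x) + list(y))
    nx = len(x)
    return sum(ranks[:nx]) - nx * (nx + 1) / 2


x = [1.1, 2.3, 3.5]
y = [0.7, 2.3, 4.0, 5.2]
w = rank_sum_w(x, y)
print(w)
```

W equals the number of (x, y) pairs with x greater than y, counting each tied pair as one half.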

pystatistics.hypothesis.ks_test(x, y=None, *, alternative='two.sided', distribution=None, backend='cpu', **dist_params)[source]¶

Kolmogorov-Smirnov test. Matches R ks.test().

Parameters:

x (array-like or HypothesisDesign) – Numeric vector of observations.
y (array-like or None) – Second sample for two-sample test. If None, performs one-sample test against a theoretical distribution.
alternative (str) – “two.sided” (default), “less”, or “greater”.
distribution (str or None) – Distribution name for one-sample test (“norm”, “unif”, “exp”). If None and y is None, defaults to standard normal.
backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.
**dist_params (float) – Distribution parameters (e.g., mean=0, sd=1 for “norm”).

Returns:

Test result with statistic (D), p_value.

Return type:

HTestSolution
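The two-sample D statistic is the largest vertical gap between the two empirical distribution functions. A minimal pure-Python sketch follows. It covers only the two-sided statistic, not the p-value, and ks_two_sample_d is a hypothetical helper, not part of the library.

```python
def ks_two_sample_d(x, y):
    """Two-sided Kolmogorov-Smirnov statistic D for two samples."""
    xs, ys = sorted(x), sorted(y)
    nx, ny = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < nx and j < ny:
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past every observation equal to v.
        while i < nx and xs[i] == v:
            i += 1
        while j < ny and ys[j] == v:
            j += 1
        d = max(d, abs(i / nx - j / ny))
    # Once one sample is exhausted its ECDF is 1 and the gap can only shrink.
    return d


d_stat = ks_two_sample_d([0.1, 0.4, 0.7], [0.5, 0.6, 0.9, 1.2])
print(d_stat)
```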

pystatistics.hypothesis.prop_test(x, n=None, *, p=None, alternative='two.sided', conf_level=0.95, correct=True, backend='cpu')[source]¶

Test of proportions. Matches R prop.test().

Parameters:

x (array-like or HypothesisDesign) – Number of successes. Scalar or vector.
n (array-like or None) – Number of trials. Scalar or vector (same length as x).
p (float or array-like or None) – Null hypothesis proportion(s). If None, tests equality of proportions (k >= 2 groups).
alternative (str) – “two.sided” (default), “less”, or “greater”. Only “two.sided” allowed for k > 1 groups.
conf_level (float) – Confidence level for the interval. Default 0.95.
correct (bool) – Apply Yates’ continuity correction. Default True.
backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.

Returns:

Test result with statistic, p_value, conf_int, estimate.

Return type:

HTestSolution
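For a single group, the test of proportions reduces to a one-degree-of-freedom chi-squared statistic. The sketch below is an illustration in plain Python rather than the library's code. It assumes the continuity correction is capped at |x - n*p|, which is my understanding of how R's prop.test behaves, and prop_test_one_sample is a hypothetical name.

```python
import math


def prop_test_one_sample(x, n, p=0.5, correct=True):
    """Chi-squared statistic and two-sided p-value for H0: proportion = p."""
    diff = abs(x - n * p)
    # Continuity correction of 0.5, capped so the statistic cannot go past 0.
    yates = min(0.5, diff) if correct else 0.0
    stat = (diff - yates) ** 2 / (n * p * (1 - p))
    # Upper tail of chi-squared with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# 15 successes in 50 trials, tested against p = 0.5.
stat, p_value = prop_test_one_sample(15, 50, p=0.5)
print(f"X-squared = {stat:.2f}, p-value = {p_value:.4f}")
```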

pystatistics.hypothesis.var_test(x, y=None, *, ratio=1.0, alternative='two.sided', conf_level=0.95, backend='cpu')[source]¶

F-test to compare two variances. Matches R var.test().

Parameters:

x (array-like or HypothesisDesign) – First sample.
y (array-like or None) – Second sample. Required unless x is a HypothesisDesign.
ratio (float) – Hypothesized ratio of variances (var_x / var_y). Default 1.
alternative (str) – “two.sided” (default), “less”, or “greater”.
conf_level (float) – Confidence level. Default 0.95.
backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.

Returns:

Test result with statistic (F), p_value, conf_int, estimate (ratio of variances).

Return type:

HTestSolution
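The F statistic itself is simple to reproduce by hand. This pure-Python sketch shows the statistic and its two degrees of freedom only; the p-value and confidence interval need the F distribution and are left out. f_statistic is a hypothetical helper and the data are illustrative.

```python
from statistics import variance


def f_statistic(x, y, ratio=1.0):
    """Return (F, num_df, denom_df) for H0: var(x) / var(y) = ratio."""
    # Ratio of sample variances, scaled by the hypothesized ratio.
    f = variance(x) / variance(y) / ratio
    return f, len(x) - 1, len(y) - 1


x = [5.1, 4.9, 6.2, 5.8, 6.0]
y = [4.1, 3.9, 4.5, 4.8, 4.2]
f, num_df, denom_df = f_statistic(x, y)
print(f, num_df, denom_df)
```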

pystatistics.hypothesis.p_adjust(p, method='holm', n=None)[source]¶

Adjust p-values for multiple comparisons. Matches R p.adjust().

Parameters:

p (array-like) – Vector of p-values.
method (str) – Adjustment method. One of: “holm” (default), “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr” (alias for BH), “none”.
n (int or None) – Number of comparisons. Default len(p). Can be larger than len(p) when some p-values are omitted.

Returns:

Adjusted p-values, same length as input. Clipped to [0, 1]. NaN positions in input produce NaN in output.

Return type:

ndarray
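Two of the listed methods, Holm and BH, can be sketched in a few lines of pure Python. This is an illustration of the standard step-down and step-up procedures, not the library's implementation. It ignores the n argument and NaN handling described above, and the function names are hypothetical.

```python
def p_adjust_holm(p):
    """Holm step-down adjustment; returns values in the input order."""
    m = len(p)
    order = sorted(range(m), key=p.__getitem__)  # indices, smallest p first
    adjusted = [0.0] * m
    running = 0.0
    for rank, idx in enumerate(order):
        # Multiply by the number of hypotheses still in play, keep monotone.
        running = max(running, (m - rank) * p[idx])
        adjusted[idx] = min(1.0, running)
    return adjusted


def p_adjust_bh(p):
    """Benjamini-Hochberg step-up adjustment; returns values in input order."""
    m = len(p)
    order = sorted(range(m), key=p.__getitem__, reverse=True)  # largest first
    adjusted = [0.0] * m
    running = 1.0
    for k, idx in enumerate(order):
        rank = m - k  # 1-based rank of p[idx] in ascending order
        running = min(running, p[idx] * m / rank)
        adjusted[idx] = running
    return adjusted


p_values = [0.04, 0.01, 0.03, 0.02, 0.05]
holm = p_adjust_holm(p_values)
bh = p_adjust_bh(p_values)
print(holm, bh)
```

Holm controls the family-wise error rate, while BH controls the false discovery rate, which is why the BH values are never larger than the Holm values for the same input.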

class pystatistics.hypothesis.HypothesisDesign(test_type, _x=None, _y=None, _mu=0.0, _alternative='two.sided', _conf_level=0.95, _var_equal=False, _paired=False, _correct=True, _table=None, _successes=None, _trials=None, _expected_p=None, _rescale_p=False, _simulate_p_value=False, _n_monte_carlo=2000, _compute_conf_int=True, _exact=None, _compute_wilcox_ci=True, _distribution=None, _dist_params=None, _ratio=1.0, _data_name='')[source]¶

Bases: object

Design for hypothesis tests.

Uses a tagged-union approach: the test_type field identifies which fields are populated. Factory classmethods validate inputs.

Do not construct directly; use factory classmethods.
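The tagged-union pattern described above can be illustrated with a much smaller class. Everything in this sketch (MiniDesign, its fields, and describe) is hypothetical and exists only to show the idea: one tag field, optional payload fields, and factory classmethods that validate inputs before constructing the object.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MiniDesign:
    """Toy tagged union: test_type says which optional fields are populated."""

    test_type: str
    x: Optional[Tuple[float, ...]] = None
    y: Optional[Tuple[float, ...]] = None
    table: Optional[Tuple[Tuple[float, ...], ...]] = None
    paired: bool = False

    @classmethod
    def for_t_test(cls, x, y=None, *, paired=False):
        x = tuple(float(v) for v in x)
        y = None if y is None else tuple(float(v) for v in y)
        # Validation lives in the factory, so invalid designs never exist.
        if paired and (y is None or len(x) != len(y)):
            raise ValueError("paired test needs x and y of equal length")
        return cls("t", x=x, y=y, paired=paired)

    @classmethod
    def for_chisq_test(cls, table):
        rows = tuple(tuple(float(v) for v in row) for row in table)
        if any(v < 0 for row in rows for v in row):
            raise ValueError("counts must be non-negative")
        return cls("chisq", table=rows)


def describe(design):
    """Dispatch on the tag, reading only the fields that tag populates."""
    if design.test_type == "t":
        return f"t-test on {len(design.x)} observations"
    if design.test_type == "chisq":
        return f"chi-squared test on a {len(design.table)}x{len(design.table[0])} table"
    raise ValueError(f"unknown test_type: {design.test_type}")


print(describe(MiniDesign.for_t_test([1, 2, 3], [2, 3, 5], paired=True)))
print(describe(MiniDesign.for_chisq_test([[12, 5], [3, 10]])))
```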

Parameters:

test_type (str)
_x (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
_y (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
_mu (float)
_alternative (str)
_conf_level (float)
_var_equal (bool)
_paired (bool)
_correct (bool)
_table (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
_successes (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
_trials (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
_expected_p (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
_rescale_p (bool)
_simulate_p_value (bool)
_n_monte_carlo (int)
_compute_conf_int (bool)
_exact (bool | None)
_compute_wilcox_ci (bool)
_distribution (str | None)
_dist_params (dict[str, float] | None)
_ratio (float)
_data_name (str)

test_type: str¶

property x: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None¶

property y: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None¶

property table: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None¶

property mu: float¶

property alternative: str¶

property conf_level: float¶

property var_equal: bool¶

property paired: bool¶

property correct: bool¶

property simulate_p_value: bool¶

property n_monte_carlo: int¶

property compute_conf_int: bool¶

property exact: bool | None¶

property compute_wilcox_ci: bool¶

property distribution: str | None¶

property dist_params: dict[str, float] | None¶

property ratio: float¶

property successes: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None¶

property trials: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None¶

property expected_p: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None¶

property rescale_p: bool¶

property data_name: str¶

classmethod for_t_test(x, y=None, *, mu=0.0, paired=False, var_equal=False, alternative='two.sided', conf_level=0.95)[source]¶

Build design for t_test().

Parameters:

x (ArrayLike)
y (ArrayLike | None)
mu (float)
paired (bool)
var_equal (bool)
alternative (str)
conf_level (float)

Return type:

HypothesisDesign

classmethod for_chisq_test(x, y=None, *, correct=True, p=None, rescale_p=False, simulate_p_value=False, B=2000)[source]¶

Build design for chisq_test().

If x is 2D (or x and y are both 1D), it’s an independence test. If x is 1D with no y, it’s a goodness-of-fit test.

Parameters:

x (ArrayLike)
y (ArrayLike | None)
correct (bool)
p (ArrayLike | None)
rescale_p (bool)
simulate_p_value (bool)
B (int)

Return type:

HypothesisDesign

classmethod for_prop_test(x, n, *, p=None, alternative='two.sided', conf_level=0.95, correct=True)[source]¶

Build design for prop_test().

Parameters:

x (array-like) – Number of successes. Scalar or vector.
n (array-like) – Number of trials. Scalar or vector (same length as x).
p (float or array-like or None) – Null hypothesis proportions. If None, tests equality of proportions.
alternative (str) – “two.sided”, “less”, or “greater”.
conf_level (float) – Confidence level for the interval.
correct (bool) – Apply Yates’ continuity correction.

Return type:

HypothesisDesign

classmethod for_fisher_test(x, y=None, *, alternative='two.sided', conf_int=True, conf_level=0.95, simulate_p_value=False, B=2000)[source]¶

Build design for fisher_test().

Parameters:

x (array-like) – A 2D contingency table, or 1D factor vector.
y (array-like or None) – If x is 1D, second factor vector to cross-tabulate.
alternative (str) – “two.sided”, “less”, or “greater”. Only “two.sided” is supported for r x c tables with r > 2 or c > 2.
conf_int (bool) – Compute confidence interval for odds ratio (2x2 only).
conf_level (float) – Confidence level.
simulate_p_value (bool) – Use Monte Carlo simulation for p-value.
B (int) – Number of Monte Carlo replicates.

Return type:

HypothesisDesign

classmethod for_wilcox_test(x, y=None, *, mu=0.0, paired=False, exact=None, correct=True, conf_int=True, conf_level=0.95, alternative='two.sided')[source]¶

Build design for wilcox_test().

Parameters:

x (ArrayLike)
y (ArrayLike | None)
mu (float)
paired (bool)
exact (bool | None)
correct (bool)
conf_int (bool)
conf_level (float)
alternative (str)

Return type:

HypothesisDesign

classmethod for_ks_test(x, y=None, *, alternative='two.sided', distribution=None, **dist_params)[source]¶

Build design for ks_test().

Parameters:

x (array-like) – Numeric vector of observations.
y (array-like or None) – Second sample for a two-sample test, or None for a one-sample test.
alternative (str) – “two.sided” (default), “less”, or “greater”.
distribution (str or None) – Distribution name for one-sample test (“norm”, “unif”, “exp”). If None and y is None, defaults to standard normal.
**dist_params (float) – Distribution parameters (e.g., mean=0, sd=1 for norm).

Return type:

HypothesisDesign

classmethod for_var_test(x, y, *, ratio=1.0, alternative='two.sided', conf_level=0.95)[source]¶

Build design for var_test().

Parameters:

x (array-like) – First sample.
y (array-like) – Second sample.
ratio (float) – Hypothesized ratio of variances (var_x / var_y). Default 1.
alternative (str) – “two.sided” (default), “less”, or “greater”.
conf_level (float) – Confidence level. Default 0.95.

Return type:

HypothesisDesign

class pystatistics.hypothesis.HTestParams(statistic, statistic_name, parameter, p_value, conf_int, conf_level, estimate, null_value, alternative, method, data_name, extras=None)[source]¶

Bases: object

Parameter payload for hypothesis tests.

Maps directly to R’s htest structure. Every hypothesis test returns this same structure; test-specific extras go in the extras dict.

Parameters:

statistic (float | None)
statistic_name (str)
parameter (dict[str, float] | None)
p_value (float)
conf_int (ndarray[tuple[Any, ...], dtype[floating[Any]]] | None)
conf_level (float)
estimate (dict[str, float] | None)
null_value (dict[str, float] | None)
alternative (str)
method (str)
data_name (str)
extras (dict[str, Any] | None)

statistic¶

Test statistic value (None for Fisher 2x2 exact test).

Type: float or None

statistic_name¶

Name of the test statistic (“t”, “X-squared”, “W”, “V”, “D”, “F”).

Type: str

parameter¶

Distribution parameters, e.g. {“df”: 9} or {“num df”: 4, “denom df”: 8}. None for exact tests with no degrees of freedom.

Type: dict or None

p_value¶

p-value of the test.

Type: float

conf_int¶

Confidence interval, shape (2,). None if not computed.

Type: ndarray or None

conf_level¶

Confidence level (e.g. 0.95).

Type: float

estimate¶

Point estimate(s), e.g. {“mean of x”: 5.1, “mean of y”: 3.2}.

Type: dict or None

null_value¶

Hypothesized value under H0, e.g. {“difference in means”: 0}.

Type: dict or None

alternative¶

“two.sided”, “less”, or “greater”.

Type: str

method¶

Human-readable method name, e.g. “Welch Two Sample t-test”.

Type: str

data_name¶

Description of the data, e.g. “x and y”.

Type: str

extras¶

Test-specific additional outputs (e.g. observed/expected/residuals for chi-squared test).

Type: dict or None

statistic: float | None¶

statistic_name: str¶

parameter: dict[str, float] | None¶

p_value: float¶

conf_int: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None¶

conf_level: float¶

estimate: dict[str, float] | None¶

null_value: dict[str, float] | None¶

alternative: str¶

method: str¶

data_name: str¶

extras: dict[str, Any] | None = None¶

class pystatistics.hypothesis.HTestSolution(_result, _design)[source]¶

Bases: object

User-facing hypothesis test results.

Wraps Result[HTestParams] and provides R’s print.htest output format via summary(). All standard htest fields are available as properties.

Parameters:

_result (Result[HTestParams])
_design (HypothesisDesign | None)

property statistic: float | None¶: Test statistic value.

property statistic_name: str¶: Name of the test statistic (e.g. ‘t’, ‘X-squared’).

property parameter: dict[str, float] | None¶: Distribution parameters (e.g. {‘df’: 9}).

property p_value: float¶: p-value of the test.

property conf_int: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None¶: Confidence interval, shape (2,).

property conf_level: float¶: Confidence level.

property estimate: dict[str, float] | None¶: Point estimate(s).

property null_value: dict[str, float] | None¶: Hypothesized value under H0.

property alternative: str¶: Alternative hypothesis direction.

property method: str¶: Human-readable method name.

property data_name: str¶: Description of the data.

property extras: dict[str, Any] | None¶: Test-specific additional outputs.

property observed: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None¶: For chisq_test: observed counts.

property expected: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None¶: For chisq_test: expected counts under H0.

property residuals: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None¶: For chisq_test: Pearson residuals.

property stdres: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None¶: For chisq_test: standardized residuals.

property info: dict[str, Any]¶

property timing: dict[str, float] | None¶

property backend_name: str¶

property warnings: tuple[str, ...]¶

summary()[source]¶

Format as R’s print.htest output.

Produces output like:

    Welch Two Sample t-test

    data: x and y
    t = 2.2345, df = 17.43, p-value = 0.03891
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.1234567 4.5678901
    sample estimates:
    mean of x mean of y
     5.123456  2.789012

Return type:

str