Hypothesis Testing

t-test, chi-squared, Fisher exact, Wilcoxon, Kolmogorov-Smirnov, proportions, F-test, and p-value adjustment (Holm, BH, Bonferroni, Hochberg, Hommel, BY).

Hypothesis testing module.

Provides hypothesis tests matching R’s implementations, validated against R to rtol=1e-10.

Public API:

t_test(x, y) - Student’s t-test (one-sample, two-sample, paired)
chisq_test(x) - Pearson’s chi-squared test (independence, GOF)
fisher_test(x) - Fisher’s exact test (2x2 and r x c)
wilcox_test(x, y) - Wilcoxon rank-sum / signed-rank test
ks_test(x, y) - Kolmogorov-Smirnov test
prop_test(x, n) - Test of proportions
var_test(x, y) - F-test to compare two variances
p_adjust(p) - Multiple testing correction (Holm, BH, etc.)

pystatistics.hypothesis.t_test(x, y=None, *, alternative='two.sided', mu=0.0, paired=False, var_equal=False, conf_level=0.95, backend='cpu')

Student’s t-test. Matches R t.test().

Parameters:
  • x (array-like or HypothesisDesign) – Sample data. 1D numeric vector.

  • y (array-like or None) – Optional second sample for two-sample test.

  • alternative (str) – “two.sided” (default), “less”, or “greater”.

  • mu (float) – Hypothesized mean (one-sample) or difference in means (two-sample). Default 0.

  • paired (bool) – If True, perform paired t-test. x and y must have same length.

  • var_equal (bool) – If True, use pooled variance (Student’s t). If False (default), use Welch’s approximation with Welch-Satterthwaite degrees of freedom. R defaults to Welch.

  • conf_level (float) – Confidence level for the interval. Default 0.95.

  • backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.

Returns:

Test result with statistic, p_value, conf_int, estimate, etc.

Return type:

HTestSolution
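To make the var_equal=False branch concrete, here is a minimal standard-library sketch of the Welch statistic and the Welch-Satterthwaite degrees of freedom. It illustrates the textbook formulas only and is not pystatistics' implementation; the helper name welch_t is hypothetical.

```python
import math
import statistics


def welch_t(x, y, mu=0.0):
    """Return (t, df) for Welch's two-sample test of mean(x) - mean(y) = mu."""
    nx, ny = len(x), len(y)
    # Squared standard error contributed by each sample (unbiased variance / n).
    vx = statistics.variance(x) / nx
    vy = statistics.variance(y) / ny
    t = (statistics.mean(x) - statistics.mean(y) - mu) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(round(t, 4), round(df, 4))
```

The degrees of freedom are generally non-integer, which is why a Welch result reports values such as df = 17.43 rather than nx + ny - 2.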

pystatistics.hypothesis.chisq_test(x, y=None, *, correct=True, p=None, rescale_p=False, simulate_p_value=False, B=2000, backend='cpu')

Pearson’s Chi-squared test. Matches R chisq.test().

Parameters:
  • x (array-like or HypothesisDesign) – A 2D contingency table, or a 1D vector of observed counts (for goodness-of-fit test). Can also be a pre-built HypothesisDesign.

  • y (array-like or None) – If x is 1D and y is provided, a contingency table is built from cross-tabulation.

  • correct (bool) – Apply Yates’ continuity correction for 2x2 tables. Default True (matches R).

  • p (array-like or None) – Expected proportions for GOF test. If None, assumes uniform.

  • rescale_p (bool) – If True, rescale p to sum to 1.

  • simulate_p_value (bool) – If True, compute p-value by Monte Carlo simulation.

  • B (int) – Number of Monte Carlo replicates. Default 2000.

  • backend (str) – ‘cpu’ (default). GPU Monte Carlo will be added later.

Returns:

Test result with statistic, p_value, and extras (observed, expected, residuals, stdres for independence test).

Return type:

HTestSolution
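The effect of correct=True on a 2x2 table can be sketched with the standard library alone. The function below is a hypothetical illustration of Pearson's statistic with Yates' continuity correction, following the rule R's chisq.test is understood to apply (subtract min(0.5, |O - E|) from every absolute difference); it is not the library's code.

```python
import math


def chisq_2x2(table, correct=True):
    """Pearson chi-squared statistic and p-value (df = 1) for a 2x2 table."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    # Expected counts under independence: row total * column total / n.
    expected = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]
    diffs = [abs(table[i][j] - expected[i][j]) for i in range(2) for j in range(2)]
    # The correction is capped by the smallest |O - E| so it cannot overshoot.
    yates = min(0.5, min(diffs)) if correct else 0.0
    stat = sum(
        (abs(table[i][j] - expected[i][j]) - yates) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = chisq_2x2([[12, 5], [3, 10]])
print(round(stat, 4), round(p_value, 4))
```

Passing correct=False gives the uncorrected statistic, which is always at least as large as the corrected one.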

pystatistics.hypothesis.fisher_test(x, y=None, *, alternative='two.sided', conf_int=True, conf_level=0.95, simulate_p_value=False, B=2000, backend='cpu')

Fisher’s Exact Test for Count Data. Matches R fisher.test().

Parameters:
  • x (array-like or HypothesisDesign) – A 2D contingency table.

  • y (array-like or None) – If x is 1D, second factor to cross-tabulate.

  • alternative (str) – “two.sided” (default), “less”, or “greater”. Only “two.sided” is supported for r x c tables (r > 2 or c > 2).

  • conf_int (bool) – Compute confidence interval for odds ratio (2x2 only).

  • conf_level (float) – Confidence level. Default 0.95.

  • simulate_p_value (bool) – Use Monte Carlo for p-value.

  • B (int) – Number of Monte Carlo replicates. Default 2000.

  • backend (str) – ‘cpu’ (default).

Returns:

Test result with p_value, estimate (odds ratio for 2x2), conf_int (2x2 only).

Return type:

HTestSolution
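For a 2x2 table, the exact p-value comes from the hypergeometric distribution of the top-left cell with the margins held fixed. The sketch below is a hypothetical standard-library illustration, not the library's implementation. The two-sided rule (sum the probabilities of all tables no more likely than the observed one, with a small relative tolerance) follows the convention R's fisher.test is understood to use.

```python
from math import comb


def fisher_2x2(table, alternative="two.sided"):
    """Exact p-value for a 2x2 table [[a, b], [c, d]] with fixed margins."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    denom = comb(r1 + r2, c1)
    # Hypergeometric probability of each feasible top-left count k.
    prob = {k: comb(r1, k) * comb(r2, c1 - k) / denom for k in range(lo, hi + 1)}
    if alternative == "less":
        return sum(p for k, p in prob.items() if k <= a)
    if alternative == "greater":
        return sum(p for k, p in prob.items() if k >= a)
    # Two-sided: add up every table that is no more probable than the observed one.
    cutoff = prob[a] * (1 + 1e-7)
    return min(1.0, sum(p for p in prob.values() if p <= cutoff))


print(round(fisher_2x2([[3, 1], [1, 3]]), 4))
```

The r x c case and the conditional maximum-likelihood odds ratio need considerably more machinery and are not covered by this sketch.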

pystatistics.hypothesis.wilcox_test(x, y=None, *, alternative='two.sided', mu=0.0, paired=False, exact=None, correct=True, conf_int=True, conf_level=0.95, backend='cpu')

Wilcoxon rank-sum or signed-rank test. Matches R wilcox.test().

Parameters:
  • x (array-like or HypothesisDesign) – Numeric vector.

  • y (array-like or None) – Second sample for rank-sum test, or paired sample.

  • alternative (str) – “two.sided” (default), “less”, or “greater”.

  • mu (float) – Hypothesized location (one-sample) or shift (two-sample).

  • paired (bool) – If True, perform paired (signed-rank) test.

  • exact (bool or None) – If None (default), use exact test for small n without ties.

  • correct (bool) – Apply continuity correction for normal approximation.

  • conf_int (bool) – Compute Hodges-Lehmann confidence interval.

  • conf_level (float) – Confidence level. Default 0.95.

  • backend (str) – ‘cpu’ (default).

Returns:

Test result with statistic (V or W), p_value, conf_int, estimate.

Return type:

HTestSolution
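The rank-sum statistic W reported for the two-sample case can be sketched as follows, using the convention of R's wilcox.test: the sum of the ranks of x in the pooled sample minus n_x(n_x + 1)/2, with tied values sharing their average rank. The helper names are hypothetical, and the sketch covers only the statistic, not the exact or normal-approximation p-value.

```python
def midranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_w(x, y):
    """W = (sum of ranks of x in the pooled sample) - n_x (n_x + 1) / 2."""
    ranks = midranks(list(x) + list(y))
    nx = len(x)
    return sum(ranks[:nx]) - nx * (nx + 1) / 2


print(rank_sum_w([1.1, 2.2, 3.3], [0.5, 2.0, 4.0, 5.0]))
```

W equals the number of pairs (x_i, y_j) with x_i > y_j, counting ties as one half, which gives a quick way to check small examples by hand.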

pystatistics.hypothesis.ks_test(x, y=None, *, alternative='two.sided', distribution=None, backend='cpu', **dist_params)

Kolmogorov-Smirnov test. Matches R ks.test().

Parameters:
  • x (array-like or HypothesisDesign) – Numeric vector of observations.

  • y (array-like or None) – Second sample for two-sample test. If None, performs one-sample test against a theoretical distribution.

  • alternative (str) – “two.sided” (default), “less”, or “greater”.

  • distribution (str or None) – Distribution name for one-sample test (“norm”, “unif”, “exp”). If None and y is None, defaults to standard normal.

  • backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.

  • **dist_params (float) – Distribution parameters (e.g., mean=0, sd=1 for “norm”).

Returns:

Test result with statistic (D), p_value.

Return type:

HTestSolution
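The two-sample statistic D is the largest vertical gap between the two empirical distribution functions. A minimal standard-library sketch, with a hypothetical helper name and no p-value calculation:

```python
from bisect import bisect_right


def ks_two_sample_d(x, y):
    """Two-sided KS statistic: max |F_x(v) - F_y(v)| over all observed v."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The supremum of the gap between two step functions is attained at a data point.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # right-continuous ECDF of x at v
        fy = bisect_right(ys, v) / len(ys)  # right-continuous ECDF of y at v
        d = max(d, abs(fx - fy))
    return d


print(ks_two_sample_d([1, 2, 3, 4], [3, 4, 5, 6]))
```

The one-sided alternatives use the signed gap in one direction instead of the absolute value.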

pystatistics.hypothesis.prop_test(x, n=None, *, p=None, alternative='two.sided', conf_level=0.95, correct=True, backend='cpu')

Test of proportions. Matches R prop.test().

Parameters:
  • x (array-like or HypothesisDesign) – Number of successes. Scalar or vector.

  • n (array-like or None) – Number of trials. Scalar or vector (same length as x).

  • p (float or array-like or None) – Null hypothesis proportion(s). If None, tests equality of proportions (k >= 2 groups).

  • alternative (str) – “two.sided” (default), “less”, or “greater”. Only “two.sided” is allowed for k > 1 groups.

  • conf_level (float) – Confidence level for the interval. Default 0.95.

  • correct (bool) – Apply Yates’ continuity correction. Default True.

  • backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.

Returns:

Test result with statistic, p_value, conf_int, estimate.

Return type:

HTestSolution
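For the single-proportion case, the statistic is a chi-squared value with 1 degree of freedom. The sketch below is a hypothetical illustration of the two-sided test with and without the continuity correction, using the rule R's prop.test is understood to apply (subtract min(0.5, |x - n p|)). It does not compute the confidence interval and is not the library's code.

```python
import math


def prop_test_one_sample(x, n, p=0.5, correct=True):
    """Return (estimate, X-squared, two-sided p-value) for H0: proportion = p."""
    diff = abs(x - n * p)
    # The continuity correction never exceeds the observed difference.
    yates = min(0.5, diff) if correct else 0.0
    stat = (diff - yates) ** 2 / (n * p * (1 - p))
    # Upper tail of chi-squared with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return x / n, stat, p_value


estimate, stat, p_value = prop_test_one_sample(15, 50, p=0.5)
print(estimate, round(stat, 4), round(p_value, 4))
```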

pystatistics.hypothesis.var_test(x, y=None, *, ratio=1.0, alternative='two.sided', conf_level=0.95, backend='cpu')

F-test to compare two variances. Matches R var.test().

Parameters:
  • x (array-like or HypothesisDesign) – First sample.

  • y (array-like or None) – Second sample. Required unless x is a HypothesisDesign.

  • ratio (float) – Hypothesized ratio of variances (var_x / var_y). Default 1.

  • alternative (str) – “two.sided” (default), “less”, or “greater”.

  • conf_level (float) – Confidence level. Default 0.95.

  • backend (str) – ‘cpu’ (default). Hypothesis tests are CPU-only.

Returns:

Test result with statistic (F), p_value, conf_int, estimate (ratio of variances).

Return type:

HTestSolution
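The F statistic is the ratio of the two sample variances divided by the hypothesized ratio, with (n_x - 1, n_y - 1) degrees of freedom. A minimal sketch with a hypothetical helper name, leaving out the p-value and confidence interval (both need the F distribution, which the standard library does not provide):

```python
import statistics


def variance_ratio_f(x, y, ratio=1.0):
    """Return (F, (num df, denom df), estimate) for H0: var_x / var_y = ratio."""
    estimate = statistics.variance(x) / statistics.variance(y)
    f_stat = estimate / ratio
    return f_stat, (len(x) - 1, len(y) - 1), estimate


f_stat, df, estimate = variance_ratio_f(
    [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]
)
print(f_stat, df, estimate)
```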

pystatistics.hypothesis.p_adjust(p, method='holm', n=None)

Adjust p-values for multiple comparisons. Matches R p.adjust().

Parameters:
  • p (array-like) – Vector of p-values.

  • method (str) – Adjustment method. One of: “holm” (default), “hochberg”, “hommel”, “bonferroni”, “BH”, “BY”, “fdr” (alias for BH), “none”.

  • n (int or None) – Number of comparisons. Default len(p). Can be larger than len(p) when some p-values are omitted.

Returns:

Adjusted p-values, same length as input. Clipped to [0, 1]. NaN positions in input produce NaN in output.

Return type:

ndarray
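To show what three of the listed methods compute, here is a standard-library sketch of the Bonferroni, Holm, and BH adjustments. The function name p_adjust_sketch is hypothetical; the sketch assumes n equals len(p) and does not handle NaN, so it covers less than the function documented above.

```python
def p_adjust_sketch(p, method="holm"):
    """Adjusted p-values in the original order (Bonferroni, Holm, or BH)."""
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i])  # indices by ascending p
    adjusted = [0.0] * n
    if method == "bonferroni":
        return [min(1.0, pi * n) for pi in p]
    if method == "holm":
        # Step-down: multiply the k-th smallest p by (n - k + 1), then enforce
        # monotonicity with a running maximum.
        running = 0.0
        for rank, i in enumerate(order):
            running = max(running, (n - rank) * p[i])
            adjusted[i] = min(1.0, running)
        return adjusted
    if method == "BH":
        # Step-up: multiply the k-th smallest p by n / k, then take a running
        # minimum from the largest p-value downwards.
        running = 1.0
        for rank in range(n - 1, -1, -1):
            i = order[rank]
            running = min(running, p[i] * n / (rank + 1))
            adjusted[i] = running
        return adjusted
    raise ValueError(f"unsupported method: {method}")


print(p_adjust_sketch([0.04, 0.01, 0.03], method="holm"))
print(p_adjust_sketch([0.04, 0.01, 0.03], method="BH"))
```

Holm controls the family-wise error rate and is never less powerful than Bonferroni, while BH controls the false discovery rate and is typically less conservative than both.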

class pystatistics.hypothesis.HypothesisDesign(test_type, _x=None, _y=None, _mu=0.0, _alternative='two.sided', _conf_level=0.95, _var_equal=False, _paired=False, _correct=True, _table=None, _successes=None, _trials=None, _expected_p=None, _rescale_p=False, _simulate_p_value=False, _n_monte_carlo=2000, _compute_conf_int=True, _exact=None, _compute_wilcox_ci=True, _distribution=None, _dist_params=None, _ratio=1.0, _data_name='')

Bases: object

Design for hypothesis tests.

Uses a tagged-union approach: the test_type field identifies which fields are populated. Factory classmethods validate inputs.

Do not construct directly; use factory classmethods.

Parameters:
test_type: str
property x: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
property y: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
property table: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
property mu: float
property alternative: str
property conf_level: float
property var_equal: bool
property paired: bool
property correct: bool
property simulate_p_value: bool
property n_monte_carlo: int
property compute_conf_int: bool
property exact: bool | None
property compute_wilcox_ci: bool
property distribution: str | None
property dist_params: dict[str, float] | None
property ratio: float
property successes: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
property trials: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
property expected_p: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
property rescale_p: bool
property data_name: str
classmethod for_t_test(x, y=None, *, mu=0.0, paired=False, var_equal=False, alternative='two.sided', conf_level=0.95)

Build design for t_test().

Parameters:
  • x (ArrayLike)

  • y (ArrayLike | None)

  • mu (float)

  • paired (bool)

  • var_equal (bool)

  • alternative (str)

  • conf_level (float)

Return type:

HypothesisDesign

classmethod for_chisq_test(x, y=None, *, correct=True, p=None, rescale_p=False, simulate_p_value=False, B=2000)

Build design for chisq_test().

If x is 2D (or x and y are both 1D), it’s an independence test. If x is 1D with no y, it’s a goodness-of-fit test.

Parameters:
  • x (ArrayLike)

  • y (ArrayLike | None)

  • correct (bool)

  • p (ArrayLike | None)

  • rescale_p (bool)

  • simulate_p_value (bool)

  • B (int)

Return type:

HypothesisDesign

classmethod for_prop_test(x, n, *, p=None, alternative='two.sided', conf_level=0.95, correct=True)

Build design for prop_test().

Parameters:
  • x (array-like) – Number of successes. Scalar or vector.

  • n (array-like) – Number of trials. Scalar or vector (same length as x).

  • p (float or array-like or None) – Null hypothesis proportions. If None, tests equality of proportions.

  • alternative (str) – “two.sided”, “less”, or “greater”.

  • conf_level (float) – Confidence level for the interval.

  • correct (bool) – Apply Yates’ continuity correction.

Return type:

HypothesisDesign

classmethod for_fisher_test(x, y=None, *, alternative='two.sided', conf_int=True, conf_level=0.95, simulate_p_value=False, B=2000)

Build design for fisher_test().

Parameters:
  • x (array-like) – A 2D contingency table, or 1D factor vector.

  • y (array-like or None) – If x is 1D, second factor vector to cross-tabulate.

  • alternative (str) – “two.sided”, “less”, or “greater”. Only “two.sided” is supported for r x c tables with r > 2 or c > 2.

  • conf_int (bool) – Compute confidence interval for odds ratio (2x2 only).

  • conf_level (float) – Confidence level.

  • simulate_p_value (bool) – Use Monte Carlo simulation for p-value.

  • B (int) – Number of Monte Carlo replicates.

Return type:

HypothesisDesign

classmethod for_wilcox_test(x, y=None, *, mu=0.0, paired=False, exact=None, correct=True, conf_int=True, conf_level=0.95, alternative='two.sided')

Build design for wilcox_test().

Parameters:
  • x (ArrayLike)

  • y (ArrayLike | None)

  • mu (float)

  • paired (bool)

  • exact (bool | None)

  • correct (bool)

  • conf_int (bool)

  • conf_level (float)

  • alternative (str)

Return type:

HypothesisDesign

classmethod for_ks_test(x, y=None, *, alternative='two.sided', distribution=None, **dist_params)

Build design for ks_test().

Parameters:
  • x (array-like) – Numeric vector of observations.

  • y (array-like or None) – Second sample for a two-sample test, or None for a one-sample test.

  • alternative (str) – “two.sided” (default), “less”, or “greater”.

  • distribution (str or None) – Distribution name for one-sample test (“norm”, “unif”, “exp”). If None and y is None, defaults to standard normal.

  • **dist_params (float) – Distribution parameters (e.g., mean=0, sd=1 for norm).

Return type:

HypothesisDesign

classmethod for_var_test(x, y, *, ratio=1.0, alternative='two.sided', conf_level=0.95)

Build design for var_test().

Parameters:
  • x (array-like) – First sample.

  • y (array-like) – Second sample.

  • ratio (float) – Hypothesized ratio of variances (var_x / var_y). Default 1.

  • alternative (str) – “two.sided” (default), “less”, or “greater”.

  • conf_level (float) – Confidence level. Default 0.95.

Return type:

HypothesisDesign
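The tagged-union pattern described for HypothesisDesign (one class, a test_type tag, and validating factory classmethods) can be sketched in a few lines. Everything below is a hypothetical stand-in: the class name DesignSketch, the tag values "t" and "var", and the validation rules are assumptions for illustration and do not reflect the library's internals.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DesignSketch:
    """Hypothetical tagged-union design: test_type says which fields matter."""

    test_type: str
    x: Optional[Tuple[float, ...]] = None
    y: Optional[Tuple[float, ...]] = None
    paired: bool = False
    ratio: float = 1.0

    @classmethod
    def for_t_test(cls, x, y=None, *, paired=False):
        x = tuple(float(v) for v in x)
        y = None if y is None else tuple(float(v) for v in y)
        # Validation lives in the factory, so a built design is always consistent.
        if paired and (y is None or len(x) != len(y)):
            raise ValueError("paired test needs x and y of equal length")
        return cls("t", x=x, y=y, paired=paired)

    @classmethod
    def for_var_test(cls, x, y, *, ratio=1.0):
        if ratio <= 0:
            raise ValueError("ratio must be positive")
        return cls("var", x=tuple(map(float, x)), y=tuple(map(float, y)), ratio=ratio)


def describe(design):
    """Dispatch on the tag; only the fields relevant to that tag are read."""
    if design.test_type == "t":
        return "paired t-test" if design.paired else "t-test"
    if design.test_type == "var":
        return f"F-test with ratio {design.ratio}"
    raise ValueError(f"unknown test_type: {design.test_type}")


print(describe(DesignSketch.for_t_test([1, 2, 3], [2, 3, 5], paired=True)))
```

Keeping validation in the factories is what makes direct construction inadvisable: a hand-built instance can carry a tag that disagrees with the fields it populates.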

class pystatistics.hypothesis.HTestParams(statistic, statistic_name, parameter, p_value, conf_int, conf_level, estimate, null_value, alternative, method, data_name, extras=None)

Bases: object

Parameter payload for hypothesis tests.

Maps directly to R’s htest structure. Every hypothesis test returns this same structure; test-specific extras go in the extras dict.

Parameters:
statistic

Test statistic value (None for Fisher 2x2 exact test).

Type:

float or None

statistic_name

Name of the test statistic (“t”, “X-squared”, “W”, “V”, “D”, “F”).

Type:

str

parameter

Distribution parameters, e.g. {“df”: 9} or {“num df”: 4, “denom df”: 8}. None for exact tests with no degrees of freedom.

Type:

dict or None

p_value

p-value of the test.

Type:

float

conf_int

Confidence interval, shape (2,). None if not computed.

Type:

ndarray or None

conf_level

Confidence level (e.g. 0.95).

Type:

float

estimate

Point estimate(s), e.g. {“mean of x”: 5.1, “mean of y”: 3.2}.

Type:

dict or None

null_value

Hypothesized value under H0, e.g. {“difference in means”: 0}.

Type:

dict or None

alternative

“two.sided”, “less”, or “greater”.

Type:

str

method

Human-readable method name, e.g. “Welch Two Sample t-test”.

Type:

str

data_name

Description of the data, e.g. “x and y”.

Type:

str

extras

Test-specific additional outputs (e.g. observed/expected/residuals for chi-squared test).

Type:

dict or None

statistic: float | None
statistic_name: str
parameter: dict[str, float] | None
p_value: float
conf_int: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None
conf_level: float
estimate: dict[str, float] | None
null_value: dict[str, float] | None
alternative: str
method: str
data_name: str
extras: dict[str, Any] | None = None
class pystatistics.hypothesis.HTestSolution(_result, _design)

Bases: object

User-facing hypothesis test results.

Wraps Result[HTestParams] and provides R’s print.htest output format via summary(). All standard htest fields are available as properties.

property statistic: float | None

Test statistic value.

property statistic_name: str

Name of the test statistic (e.g. ‘t’, ‘X-squared’).

property parameter: dict[str, float] | None

Distribution parameters (e.g. {‘df’: 9}).

property p_value: float

p-value of the test.

property conf_int: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None

Confidence interval, shape (2,).

property conf_level: float

Confidence level.

property estimate: dict[str, float] | None

Point estimate(s).

property null_value: dict[str, float] | None

Hypothesized value under H0.

property alternative: str

Alternative hypothesis direction.

property method: str

Human-readable method name.

property data_name: str

Description of the data.

property extras: dict[str, Any] | None

Test-specific additional outputs.

property observed: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None

For chisq_test: observed counts.

property expected: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None

For chisq_test: expected counts under H0.

property residuals: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None

For chisq_test: Pearson residuals.

property stdres: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None

For chisq_test: standardized residuals.

property info: dict[str, Any]
property timing: dict[str, float] | None
property backend_name: str
property warnings: tuple[str, ...]
summary()

Format as R’s print.htest output.

Produces output like:

        Welch Two Sample t-test

    data: x and y
    t = 2.2345, df = 17.43, p-value = 0.03891
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.1234567 4.5678901
    sample estimates:
    mean of x mean of y
     5.123456  2.789012

Return type:

str
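As a rough illustration of how htest-style fields map onto the header of that printout, here is a hypothetical formatter. The function name format_htest and the number formatting are assumptions; the sketch covers only the method, data, and statistic lines, and the library's summary() may format values differently.

```python
def format_htest(method, data_name, statistic_name, statistic, parameter, p_value):
    """Build the first lines of an R-style htest printout from plain fields."""
    parts = []
    if statistic is not None:  # e.g. an exact test may have no statistic
        parts.append(f"{statistic_name} = {statistic:.5g}")
    # Distribution parameters such as {"df": 17.43} follow the statistic.
    parts += [f"{name} = {value:.5g}" for name, value in (parameter or {}).items()]
    parts.append(f"p-value = {p_value:.4g}")
    return "\n".join(["", f"\t{method}", "", f"data:  {data_name}", ", ".join(parts)])


print(format_htest("Welch Two Sample t-test", "x and y", "t", 2.2345, {"df": 17.43}, 0.03891))
```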