Monte Carlo Methods

Bootstrap resampling (ordinary, balanced, parametric), permutation tests, and five bootstrap confidence interval methods (normal, basic, percentile, BCa, studentized).

PyStatistics Monte Carlo methods.

Provides bootstrap resampling (matching R’s boot package) and permutation testing with CPU and GPU backends.

Usage:

from pystatistics.montecarlo import boot, boot_ci, permutation_test

# Bootstrap
result = boot(data, statistic, R=999, seed=42)
ci_result = boot_ci(result, type="perc")

# Permutation test
result = permutation_test(x, y, statistic, R=9999)

pystatistics.montecarlo.boot(data, statistic, R=999, *, sim='ordinary', stype='i', strata=None, ran_gen=None, mle=None, seed=None, backend='auto')

Bootstrap resampling. Matches R’s boot::boot().

The signature of statistic depends on sim:

- For nonparametric resampling (sim="ordinary" or "balanced"): statistic(data, indices) -> array of shape (k,), where indices holds bootstrap sample indices (stype="i"), frequency counts (stype="f"), or weights (stype="w").
- For parametric resampling (sim="parametric"): statistic(simulated_data) -> array of shape (k,), where simulated_data is generated by ran_gen(data, mle, rng).
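The three stype encodings are three views of the same resample. The sketch below writes a mean statistic each way and checks that they agree on one ordinary bootstrap draw. The function names are hypothetical, and it assumes that "w" weights are the frequencies divided by n (the convention R's boot follows); the library may normalise weights differently.

```python
import numpy as np

# Hypothetical statistics: the same mean written for each stype value.
def mean_from_indices(data, indices):
    # stype="i": indices is an integer array of length n, with repeats
    return np.array([np.mean(data[indices])])

def mean_from_freqs(data, freqs):
    # stype="f": freqs[j] counts how often observation j was drawn
    return np.array([np.sum(freqs * data) / np.sum(freqs)])

def mean_from_weights(data, weights):
    # stype="w": non-negative weights (assumed here to be freqs / n)
    return np.array([np.sum(weights * data) / np.sum(weights)])

rng = np.random.default_rng(0)
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(data)
indices = rng.integers(0, n, size=n)        # one ordinary bootstrap draw
freqs = np.bincount(indices, minlength=n)   # the same draw as counts
weights = freqs / n                         # the same draw as weights

print(mean_from_indices(data, indices),
      mean_from_freqs(data, freqs),
      mean_from_weights(data, weights))
```

Frequencies and weights avoid materialising the resampled data, which can matter for weighted estimators that already accept case weights.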

Parameters:

data (ArrayLike) – Original data, shape (n,) or (n, p).
statistic (Callable) – Function to compute the statistic(s) of interest.
R (int) – Number of bootstrap replicates. Default 999.
sim (Literal['ordinary', 'parametric', 'balanced']) – Simulation type: "ordinary" (default), "balanced", or "parametric".
stype (Literal['i', 'f', 'w']) – Type of the second argument passed to statistic: "i" (indices, default), "f" (frequencies), or "w" (weights).
strata (ArrayLike | None) – Stratification vector; when given, resampling is done within each stratum.
ran_gen (Callable | None) – For parametric bootstrap: fn(data, mle, rng) -> sim_data.
mle (Any) – Parameter estimates for parametric bootstrap.
seed (int | None) – Random seed for reproducibility.
backend (Literal['auto', 'cpu', 'gpu']) – Compute backend: "auto" (default), "cpu", or "gpu".
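For sim="parametric", ran_gen and mle replace resampling: each replicate is simulated from a fitted model. The sketch below assumes a normal model, so mle holds the normal MLEs (mean, standard deviation); ran_gen, statistic, and parametric_boot are hypothetical names written for illustration, not pystatistics code.

```python
import numpy as np

def ran_gen(data, mle, rng):
    # Hypothetical generator: a same-size sample from N(mu, sigma^2).
    mu, sigma = mle
    return rng.normal(mu, sigma, size=len(data))

def statistic(simulated_data):
    # Parametric statistics receive only the simulated data.
    return np.array([np.mean(simulated_data)])

def parametric_boot(data, statistic, ran_gen, mle, R=999, seed=None):
    # Hypothetical sketch: t0 on the observed data, R simulated replicates.
    rng = np.random.default_rng(seed)
    t0 = statistic(data)
    t = np.vstack([statistic(ran_gen(data, mle, rng)) for _ in range(R)])
    return t0, t

data = np.array([2.1, 3.4, 1.9, 2.8, 3.0, 2.5])
mle = (data.mean(), data.std(ddof=0))   # normal-model MLEs of (mu, sigma)
t0, t = parametric_boot(data, statistic, ran_gen, mle, R=999, seed=1)
print(t0, t.shape, t.std(ddof=1))
```

For the sample mean, the spread of the replicates should sit close to sigma / sqrt(n), which is a quick sanity check on a ran_gen implementation.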

Returns:

BootstrapSolution with t0 (observed statistic), t (replicates), bias, and se (standard error).

Return type:

BootstrapSolution

Examples

>>> import numpy as np
>>> from pystatistics.montecarlo import boot
>>> data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
>>> def mean_stat(data, indices):
...     return np.array([np.mean(data[indices])])
>>> result = boot(data, mean_stat, R=999, seed=42)
>>> result.t0  # observed mean
>>> result.bias  # bootstrap bias estimate
>>> result.se  # bootstrap standard error
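The bias and standard error above are simple summaries of the replicates. The NumPy-only sketch below shows ordinary and balanced index generation and those summaries, using the same definitions R's print.boot reports (bias = mean of replicates minus t0, se = sample SD of replicates). It is an illustration with hypothetical names (bootstrap_indices, simple_boot), not the library's implementation, and it ignores strata, stype values other than "i", and GPU backends.

```python
import numpy as np

def bootstrap_indices(n, R, sim="ordinary", rng=None):
    # Hypothetical helper returning an (R, n) array of resample indices.
    rng = np.random.default_rng(rng)
    if sim == "ordinary":
        # Each replicate draws n indices independently, with replacement.
        return rng.integers(0, n, size=(R, n))
    if sim == "balanced":
        # Balanced bootstrap: every index appears exactly R times overall.
        pool = np.tile(np.arange(n), R)
        rng.shuffle(pool)
        return pool.reshape(R, n)
    raise ValueError(f"unknown sim: {sim!r}")

def simple_boot(data, statistic, R=999, sim="ordinary", seed=None):
    # Hypothetical sketch of a nonparametric bootstrap with stype="i".
    data = np.asarray(data)
    t0 = np.atleast_1d(statistic(data, np.arange(len(data))))
    idx = bootstrap_indices(len(data), R, sim, seed)
    t = np.vstack([np.atleast_1d(statistic(data, i)) for i in idx])
    bias = t.mean(axis=0) - t0        # mean of replicates minus t0
    se = t.std(axis=0, ddof=1)        # sample SD of the replicates
    return t0, t, bias, se

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def mean_stat(data, indices):
    return np.array([np.mean(data[indices])])

t0, t, bias, se = simple_boot(data, mean_stat, R=999, sim="balanced", seed=42)
print(t0, bias, se)
```

For a linear statistic such as the mean, the balanced design makes the estimated bias zero up to rounding, since every observation is used equally often across all replicates.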

pystatistics.montecarlo.boot_ci(boot_out, *, conf=0.95, type='all', index=0, var_t0=None, var_t=None)

Compute bootstrap confidence intervals. Matches R’s boot::boot.ci().

Takes a BootstrapSolution from boot() and computes confidence intervals using one or more methods.
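Three of the interval types need only the observed statistic t0 and the replicates t. The sketch below uses the textbook formulas: the normal interval is the bias-corrected estimate plus or minus z times the bootstrap SE, the basic interval reflects the bootstrap quantiles around t0, and the percentile interval reads them off directly. simple_cis is a hypothetical name, and np.quantile's default interpolation stands in for boot.ci's own order-statistic interpolation, so endpoints can differ slightly from the library's.

```python
import numpy as np
from statistics import NormalDist

def simple_cis(t0, t, conf=0.95):
    # Hypothetical sketch of the normal, basic and percentile intervals.
    t = np.asarray(t, dtype=float)
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    bias = t.mean() - t0
    se = t.std(ddof=1)
    lo_q, hi_q = np.quantile(t, [alpha / 2, 1 - alpha / 2])
    return {
        # Normal: bias-corrected estimate +/- z * bootstrap SE
        "normal": (t0 - bias - z * se, t0 - bias + z * se),
        # Basic: bootstrap quantiles reflected around t0
        "basic": (2 * t0 - hi_q, 2 * t0 - lo_q),
        # Percentile: bootstrap quantiles used directly
        "perc": (lo_q, hi_q),
    }

rng = np.random.default_rng(42)
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
idx = rng.integers(0, len(data), size=(999, len(data)))
t = data[idx].mean(axis=1)            # bootstrap replicates of the mean
cis = simple_cis(data.mean(), t)
print(cis)
```

The basic and percentile intervals coincide only when the bootstrap distribution is symmetric about t0; skewed replicates pull them in opposite directions.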

Parameters:

boot_out (BootstrapSolution) – Result from boot().
conf (float | Sequence[float]) – Confidence level(s). Default 0.95.
type (str | Sequence[str]) – CI type(s): "normal", "basic", "perc", "bca", "stud", or "all". "all" computes the normal, basic, percentile, and BCa intervals; the studentized interval is included only when var_t is provided.
index (int) – Which statistic to compute CI for (0-indexed into t0).
var_t0 (float | None) – Variance of the observed statistic (for normal/studentized).
var_t (NDArray | None) – Per-replicate variance estimates, shape (R,). Required for the studentized CI.
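The studentized (bootstrap-t) interval is why var_t0 and var_t exist: each replicate is turned into a pivot z* = (t* - t0) / sqrt(var_t*), and the pivot quantiles are rescaled by the observed SE. The sketch below is a hypothetical illustration for the sample mean, where the variance of each replicate mean is estimated as s^2 / n; studentized_ci is not a pystatistics function.

```python
import numpy as np

def studentized_ci(t0, t, var_t0, var_t, conf=0.95):
    # Hypothetical sketch of the bootstrap-t (studentized) interval.
    t = np.asarray(t, dtype=float)
    var_t = np.asarray(var_t, dtype=float)
    alpha = 1.0 - conf
    # Studentized pivots z* = (t* - t0) / sqrt(var_t*), one per replicate.
    z = (t - t0) / np.sqrt(var_t)
    z_lo, z_hi = np.quantile(z, [alpha / 2, 1 - alpha / 2])
    se0 = np.sqrt(var_t0)
    # The upper pivot quantile sets the lower endpoint, and vice versa.
    return t0 - se0 * z_hi, t0 - se0 * z_lo

rng = np.random.default_rng(7)
data = rng.exponential(2.0, size=40)
n = len(data)
samples = data[rng.integers(0, n, size=(1999, n))]
t = samples.mean(axis=1)                  # replicate means
var_t = samples.var(axis=1, ddof=1) / n   # per-replicate variance of the mean
t0 = data.mean()
var_t0 = data.var(ddof=1) / n             # variance of the observed mean
lo, hi = studentized_ci(t0, t, var_t0, var_t)
print(lo, hi)
```

Unlike the percentile interval, this one can be asymmetric around t0 even after rescaling, which is what lets it adapt to skewed statistics such as the mean of exponential data.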

Returns:

A new BootstrapSolution with its ci attribute populated.

Return type:

BootstrapSolution

Examples

>>> result = boot(data, mean_stat, R=999, seed=42)
>>> ci_result = boot_ci(result, type="perc")
>>> ci_result.ci["perc"]  # shape (k, 2) for [lower, upper]
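The BCa interval adjusts the percentile levels with a bias correction z0 and an acceleration a. The sketch below estimates a from jackknife (leave-one-out) values of the statistic, a common textbook choice; boot.ci may estimate the underlying influence values differently, so this is an illustration rather than the library's method. bca_ci is a hypothetical name, and stat here takes the data directly and returns a float.

```python
import numpy as np
from statistics import NormalDist

def bca_ci(data, stat, t, conf=0.95):
    # Hypothetical BCa sketch: stat(data) -> float, t = bootstrap replicates.
    data = np.asarray(data, dtype=float)
    t = np.asarray(t, dtype=float)
    nd = NormalDist()
    t0 = stat(data)

    # Bias correction z0: normal quantile of the share of replicates below t0.
    prop = np.mean(t < t0)
    if not 0.0 < prop < 1.0:
        raise ValueError("t0 lies outside the replicates; BCa is undefined")
    z0 = nd.inv_cdf(prop)

    # Acceleration a from jackknife (leave-one-out) values of the statistic.
    n = len(data)
    jack = np.array([stat(np.delete(data, i)) for i in range(n)])
    d = jack.mean() - jack
    a = np.sum(d ** 3) / (6.0 * np.sum(d ** 2) ** 1.5)

    # Shift the nominal percentile levels, then read off bootstrap quantiles.
    alpha = 1.0 - conf
    levels = []
    for z_alpha in (nd.inv_cdf(alpha / 2), nd.inv_cdf(1 - alpha / 2)):
        w = z0 + z_alpha
        levels.append(nd.cdf(z0 + w / (1 - a * w)))
    lo, hi = np.quantile(t, levels)
    return lo, hi

rng = np.random.default_rng(3)
data = rng.exponential(1.0, size=30)
idx = rng.integers(0, len(data), size=(1999, len(data)))
t = data[idx].mean(axis=1)
lo, hi = bca_ci(data, np.mean, t)
print(lo, hi)
```

With z0 = 0 and a = 0 the adjusted levels collapse to alpha/2 and 1 - alpha/2, so BCa reduces to the percentile interval.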

pystatistics.montecarlo.permutation_test(x, y, statistic, R=9999, *, alternative='two.sided', seed=None, backend='auto')

Permutation test for two groups.

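Pools x and y and shuffles the pooled data R times, splitting each shuffle into groups of the original sizes and computing the test statistic on each permutation. The p-value uses the Phipson-Smyth correction, (count + 1) / (R + 1), where count is the number of permuted statistics at least as extreme as the observed one, so the p-value is never exactly zero.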
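A NumPy-only sketch of the procedure follows. perm_test is a hypothetical name, and the two-sided count compares absolute values, which suits statistics centred at zero under the null (such as a difference in means); the library's exact definition of "at least as extreme" is not stated here, so treat that choice as an assumption.

```python
import numpy as np

def perm_test(x, y, statistic, R=9999, alternative="two.sided", seed=None):
    # Hypothetical sketch of a two-group permutation test.
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    nx = len(x)
    observed = statistic(x, y)
    perm_stats = np.empty(R)
    for r in range(R):
        # Reassign group labels by shuffling, keeping the group sizes.
        shuffled = rng.permutation(pooled)
        perm_stats[r] = statistic(shuffled[:nx], shuffled[nx:])
    if alternative == "two.sided":
        count = np.sum(np.abs(perm_stats) >= abs(observed))
    elif alternative == "greater":
        count = np.sum(perm_stats >= observed)
    elif alternative == "less":
        count = np.sum(perm_stats <= observed)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    # Phipson-Smyth: count the observed labelling as one of the permutations.
    p_value = (count + 1) / (R + 1)
    return observed, perm_stats, p_value

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 8, 9, 10])

def mean_diff(x, y):
    return np.mean(x) - np.mean(y)

observed, perm_stats, p = perm_test(x, y, mean_diff, R=9999, seed=42)
print(observed, p)
```

The smallest attainable p-value is 1 / (R + 1), which is why a larger R is needed when very small p-values must be resolved.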

Parameters:

x (ArrayLike) – Group 1 data.
y (ArrayLike) – Group 2 data.
statistic (Callable) – fn(x, y) -> float. The test statistic.
R (int) – Number of permutations. Default 9999.
alternative (Literal['two.sided', 'less', 'greater']) – Alternative hypothesis: "two.sided" (default), "less", or "greater".
seed (int | None) – Random seed for reproducibility.
backend (Literal['auto', 'cpu', 'gpu']) – Compute backend: "auto" (default), "cpu", or "gpu".

Returns:

PermutationSolution with observed_stat, perm_stats, p_value.

Return type:

PermutationSolution

Examples

>>> import numpy as np
>>> from pystatistics.montecarlo import permutation_test
>>> x = np.array([1, 2, 3, 4, 5])
>>> y = np.array([6, 7, 8, 9, 10])
>>> def mean_diff(x, y): return np.mean(x) - np.mean(y)
>>> result = permutation_test(x, y, mean_diff, R=9999, seed=42)
>>> result.p_value