PyStatsBio

What It Is

PyStatsBio provides domain-specific statistical methods for biotech and pharmaceutical research. It's built on top of PyStatistics, inheriting its validation rigor and GPU acceleration capabilities.

The library covers four areas of the drug development pipeline: clinical trial planning (power analysis), preclinical pharmacology (dose-response modeling), biomarker evaluation (diagnostic accuracy), and pharmacokinetics (non-compartmental analysis, NCA).

Relationship to PyStatistics

PyStatistics is the general-purpose statistical engine: regression, ANOVA, mixed models, and hypothesis testing. PyStatsBio adds the domain-specific methods that biostatisticians and pharmacologists need. Install PyStatsBio and you get both.

Installation

Coming soon to PyPI. For now, install from source:

pip install git+https://github.com/sgcx-org/pystatsbio.git

This automatically installs PyStatistics as a dependency. For GPU support:

pip install "pystatsbio[gpu] @ git+https://github.com/sgcx-org/pystatsbio.git"

Quick Start

Power Analysis — Clinical Trial Planning

from pystatsbio.power import power_t_test, power_logrank

# How many subjects for a two-sample t-test?
result = power_t_test(d=0.5, alpha=0.05, power=0.8)
print(f"Required n per group: {result.n}")

# Sample size for a survival study (log-rank test)
result = power_logrank(hr=0.7, alpha=0.05, power=0.8, p_event=0.6)
print(f"Required total N: {result.n}")
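
For intuition about what these calls compute, here is a minimal sketch of two textbook approximations using only the Python standard library. It is not PyStatsBio's implementation: the function names are hypothetical, the t-test version uses the normal approximation (exact t-based calculations generally give a slightly larger n), and the log-rank version assumes Schoenfeld's formula with 1:1 allocation.

```python
import math
from statistics import NormalDist


def n_per_group_normal_approx(d, alpha=0.05, power=0.8):
    """Per-group n for a two-sided, two-sample comparison.

    Hypothetical helper using the normal approximation:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


def logrank_total_n_schoenfeld(hr, alpha=0.05, power=0.8, p_event=0.6):
    """Total N for a log-rank test with 1:1 allocation.

    Hypothetical helper using Schoenfeld's approximation:
    events = 4 * (z_{1-alpha/2} + z_{power})^2 / ln(hr)^2
    total N = events / p_event
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    events = 4 * (z_alpha + z_beta) ** 2 / math.log(hr) ** 2
    return math.ceil(events / p_event)


print(n_per_group_normal_approx(0.5))      # 63 under the normal approximation
print(logrank_total_n_schoenfeld(0.7))     # 412 under Schoenfeld's approximation
```

Numbers reported by the library may differ from these approximations, depending on the exact method it uses.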

Dose-Response Modeling

from pystatsbio.doseresponse import fit_drm, ec50
import numpy as np

dose = np.array([0.01, 0.1, 1, 10, 100, 1000])
response = np.array([5, 10, 30, 70, 90, 95])

# Fit a 4-parameter logistic curve (self-starting, no manual guesses)
result = fit_drm(dose, response, model='LL.4')
print(result.summary())

# Extract EC50 with confidence interval
ec = ec50(result, conf_level=0.95)
print(f"EC50: {ec.estimate} ({ec.ci_lower}, {ec.ci_upper})")
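
The curve being fitted can be written out directly. The sketch below assumes that 'LL.4' follows the four-parameter log-logistic parameterization used by R's drc package (slope b, lower asymptote c, upper asymptote d, EC50 e); the function name ll4 is hypothetical and is not part of the PyStatsBio API.

```python
import math


def ll4(dose, b, c, d, e):
    """Four-parameter log-logistic curve (drc-style parameterization, assumed).

    f(x) = c + (d - c) / (1 + exp(b * (log(x) - log(e))))

    b: slope (negative for a response that increases with dose)
    c: lower asymptote, d: upper asymptote, e: EC50
    Only defined here for dose > 0.
    """
    if dose <= 0:
        raise ValueError("dose must be positive")
    return c + (d - c) / (1 + math.exp(b * (math.log(dose) - math.log(e))))


# At dose == e the response sits halfway between the two asymptotes.
print(ll4(10.0, b=-1.0, c=5.0, d=95.0, e=10.0))  # 50.0
```

Under this parameterization the EC50 is simply the fitted value of e, which is why it can be read off the model along with a confidence interval.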

ROC Analysis — Biomarker Evaluation

from pystatsbio.diagnostic import roc, diagnostic_accuracy
import numpy as np

response = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
predictor = np.random.default_rng(0).standard_normal(10)  # seeded placeholder scores, for reproducibility

# ROC curve with DeLong confidence intervals
result = roc(response, predictor)
print(f"AUC: {result.auc} ({result.auc_ci_lower}, {result.auc_ci_upper})")

# Sensitivity, specificity, PPV, NPV at a cutoff
dx = diagnostic_accuracy(response, predictor, cutoff=0.5)
print(dx.sensitivity, dx.specificity)
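
As a rough illustration of the quantities reported above, the following standard-library sketch computes the AUC as the proportion of (positive, negative) pairs ranked correctly, plus sensitivity and specificity at a cutoff. The function names are hypothetical, and two conventions are assumed: higher scores indicate the positive class, and a score at or above the cutoff counts as a positive call. PyStatsBio's own conventions may differ.

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(labels, scores, cutoff):
    """Sensitivity and specificity, calling score >= cutoff positive (assumed)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

print(auc_pairwise(labels, scores))    # 8 of 9 pairs ranked correctly
print(sens_spec(labels, scores, 0.5))  # 2 of 3 positives and 2 of 3 negatives classified correctly
```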

Modules

Each module targets a specific phase of biotech research and is validated against established R packages.