Open Source

PyStatsBio

Biotech & pharmaceutical statistics built on PyStatistics

Domain-specific methods for the drug development pipeline. Free forever.

What It Is

PyStatsBio provides domain-specific statistical methods for biotech and pharmaceutical research. It's built on top of PyStatistics, inheriting its validation rigor and GPU acceleration capabilities.

The library covers four key areas of the drug development pipeline: clinical trial planning (power analysis), preclinical pharmacology (dose-response modeling), biomarker evaluation (diagnostic accuracy), and pharmacokinetics (non-compartmental analysis, NCA).

Relationship to PyStatistics

PyStatistics is the general-purpose statistical engine — regression, ANOVA, mixed models, hypothesis testing. PyStatsBio adds domain-specific methods that biostatisticians and pharmacologists need. Install PyStatsBio and you get both.

Installation

Coming soon to PyPI. For now, install from source:

pip install git+https://github.com/sgcx-org/pystatsbio.git

This automatically installs PyStatistics as a dependency. For GPU support:

pip install "pystatsbio[gpu] @ git+https://github.com/sgcx-org/pystatsbio.git"

Quick Start

Power Analysis — Clinical Trial Planning

from pystatsbio.power import power_t_test, power_logrank

# How many subjects for a two-sample t-test?
result = power_t_test(d=0.5, alpha=0.05, power=0.8)
print(f"Required n per group: {result.n}")

# Sample size for a survival study (log-rank test)
result = power_logrank(hr=0.7, alpha=0.05, power=0.8, p_event=0.6)
print(f"Required total N: {result.n}")
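
For intuition about what these two calls estimate, the textbook normal-approximation formulas can be evaluated with the standard library alone. This is a sketch, not PyStatsBio's implementation: the helper names `approx_n_per_group` and `approx_logrank_total_n` are hypothetical, two-sided tests and 1:1 allocation are assumed, and a library that solves the exact noncentral-t problem will typically report a slightly larger n for the t-test.

```python
import math
from statistics import NormalDist

def approx_n_per_group(d, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison with standardized effect size d."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

def approx_logrank_total_n(hr, alpha=0.05, power=0.8, p_event=0.6):
    """Schoenfeld approximation for a log-rank test with 1:1 allocation.
    Returns (required events, total N given the overall event probability)."""
    z = NormalDist().inv_cdf
    events = 4 * (z(1 - alpha / 2) + z(power)) ** 2 / math.log(hr) ** 2
    return math.ceil(events), math.ceil(events / p_event)

n_group = approx_n_per_group(d=0.5)                  # 63
events, total_n = approx_logrank_total_n(hr=0.7)     # (247, 412)
print(n_group, events, total_n)
```

The log-rank calculation is driven by the number of events rather than the number of subjects, which is why `p_event` only enters in the final step.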

Dose-Response Modeling

from pystatsbio.doseresponse import fit_drm, ec50
import numpy as np

dose = np.array([0.01, 0.1, 1, 10, 100, 1000])
response = np.array([5, 10, 30, 70, 90, 95])

# Fit a 4-parameter logistic curve (self-starting, no manual guesses)
result = fit_drm(dose, response, model='LL.4')
print(result.summary())

# Extract EC50 with confidence interval
ec = ec50(result, conf_level=0.95)
print(f"EC50: {ec.estimate} ({ec.ci_lower}, {ec.ci_upper})")
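
The curve behind a model name like 'LL.4' can be written down directly. The sketch below uses the four-parameter log-logistic form from the R package drc (slope b, lower asymptote c, upper asymptote d, EC50 e); that PyStatsBio uses the same parameterization is an assumption, and the function name `ll4` and the parameter values are illustrative only.

```python
import math

def ll4(dose, b, c, d, e):
    """Four-parameter log-logistic curve in the drc-style parameterization:
    f(x) = c + (d - c) / (1 + exp(b * (log(x) - log(e)))).
    A negative slope b gives a response that increases with dose."""
    return c + (d - c) / (1 + math.exp(b * (math.log(dose) - math.log(e))))

# Illustrative parameter values, not fitted to any data
params = dict(b=-1.0, c=5.0, d=95.0, e=3.0)

midpoint = ll4(3.0, **params)    # at dose == EC50 the response is halfway between c and d
low = ll4(0.001, **params)       # approaches the lower asymptote c
high = ll4(1e6, **params)        # approaches the upper asymptote d
print(midpoint, low, high)
```

In this form the EC50 is a model parameter, so its confidence interval comes from the fit's parameter uncertainty rather than from interpolating between observed doses.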

ROC Analysis — Biomarker Evaluation

from pystatsbio.diagnostic import roc, diagnostic_accuracy
import numpy as np

response = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
# Placeholder biomarker values; seeded so the example is reproducible
predictor = np.random.default_rng(42).normal(size=10)

# ROC curve with DeLong confidence intervals
result = roc(response, predictor)
print(f"AUC: {result.auc} ({result.auc_ci_lower}, {result.auc_ci_upper})")

# Sensitivity, specificity, PPV, NPV at a cutoff
dx = diagnostic_accuracy(response, predictor, cutoff=0.5)
print(dx.sensitivity, dx.specificity)
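
The quantities reported above have simple definitions that can be checked by hand on small data. The following sketch computes the AUC as the Mann-Whitney probability that a positive case scores higher than a negative case, and sensitivity and specificity at a fixed cutoff. The helper names and the data are hypothetical, and the conventions (higher score means more likely positive, score >= cutoff is called positive) are assumptions that may differ from the library's defaults.

```python
def auc_mann_whitney(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(labels, scores, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up example data: 5 cases (1) and 5 controls (0)
labels = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2, 0.55, 0.45]

auc = auc_mann_whitney(labels, scores)            # 22 of 25 pairs -> 0.88
sensitivity, specificity = sens_spec(labels, scores, cutoff=0.5)   # 0.8, 0.8
print(auc, sensitivity, specificity)
```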

Modules

Each module targets a specific phase of biotech research and is validated against established R packages.

power

Sample size and power calculations for t-tests, proportions, log-rank tests, ANOVA, non-inferiority, equivalence, crossover, and cluster trials.

Validates against: pwr, TrialSize, gsDesign, PowerTOST

API Reference →

doseresponse

4PL/5PL curve fitting, EC50/IC50, relative potency, benchmark dose. GPU-accelerated batch fitting for high-throughput screening (HTS).

Validates against: drc, nplr, BMDS

API Reference →

diagnostic

ROC analysis with DeLong CIs, sensitivity/specificity, optimal cutoffs, batch AUC for biomarker panels.

Validates against: pROC, OptimalCutpoints, epiR

API Reference →

pk

Non-compartmental pharmacokinetic analysis. AUC (linear-up/log-down), Cmax, half-life, clearance.

Validates against: PKNCA, NonCompart

API Reference →
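
The linear-up/log-down rule named in the pk module can be sketched in a few lines: the linear trapezoid is used while concentrations rise or stay flat, and the logarithmic trapezoid is used while they decline. This is an illustration of the rule itself, not the pk module's API; the function name `auc_linear_up_log_down` and the concentration-time values are hypothetical.

```python
import math

def auc_linear_up_log_down(times, conc):
    """Area under the concentration-time curve from the first to the last
    sample using the linear-up/log-down trapezoidal rule."""
    auc = 0.0
    points = list(zip(times, conc))
    for (t1, c1), (t2, c2) in zip(points, points[1:]):
        dt = t2 - t1
        if 0 < c2 < c1:
            # Declining segment: logarithmic trapezoid
            auc += dt * (c1 - c2) / math.log(c1 / c2)
        else:
            # Rising or flat segment (or a zero concentration): linear trapezoid
            auc += dt * (c1 + c2) / 2
    return auc

# Made-up profile: time in hours, concentration in ng/mL
times = [0, 1, 2, 4]
conc = [0.0, 10.0, 5.0, 2.5]

auc_last = auc_linear_up_log_down(times, conc)
cmax = max(conc)
tmax = times[conc.index(cmax)]
print(auc_last, cmax, tmax)
```

For this profile the rising segment contributes 5 and each declining segment contributes 5 / ln 2, so `auc_last` is about 19.43 ng·h/mL.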