enrichmap.pl.compare_variograms
- enrichmap.pl.compare_variograms(adata: AnnData, score_key: str, batch_key: str, spatial_key: str = 'spatial', n_lags: int = 20, maxlag: str | float = 'median', model: Literal['spherical', 'exponential', 'gaussian', 'matern'] = 'spherical', n_subsample: int | None = 3000, n_permutations: int = 999, random_state: int = 0, group_key: str | None = None, plot: bool = True, figsize: tuple[float, float] = (10, 4), palette: str | dict | None = None, save: str | None = None, save_kwargs: dict | None = None) -> DataFrame
Compute an empirical semivariogram of EnrichMap scores for each patient, fit a theoretical model to it, and extract structural parameters for cross-patient comparison of spatial organisation.
A semivariogram describes how the dissimilarity between score values changes as a function of the spatial distance (lag) between spots. Three parameters are extracted from a fitted theoretical model:
Effective range: the lag distance at which the variogram plateaus, i.e. the spatial scale of autocorrelation. A short range means the score field is composed of many small patches; a long range means large, coherent spatial domains. Because coordinates are min-max normalised per patient to [0, 1], the range is expressed as a fraction of tissue extent and is directly comparable across slides of different physical sizes.
Sill: the plateau semivariance, representing the total spatial variance of the score field.
Nugget: the semivariance at lag zero, capturing measurement noise or fine-scale variation below the resolution of the spatial graph. A high nugget-to-sill ratio indicates that most of the variance is spatially unstructured.
Two patients can share identical score distributions yet have completely different variograms, which makes the variogram a useful complement to marginal summaries such as violin plots.
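The three parameters above can be illustrated with a minimal NumPy sketch. It is not the EnrichMap implementation: the helper names (`empirical_semivariogram`, `spherical`, `fit_spherical`) are hypothetical, coordinates are assumed to be already normalised, and the fit is a simple grid search over candidate ranges with linear least squares for the nugget and partial sill, used here as a stand-in for a proper nonlinear fit.

```python
import numpy as np


def empirical_semivariogram(coords, values, n_lags=20, maxlag=None):
    """Return lag-bin centres and the mean semivariance in each non-empty bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)        # every unordered spot pair
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diffs = (values[i] - values[j]) ** 2
    if maxlag is None:
        maxlag = np.median(dists)                   # mirrors maxlag="median"
    edges = np.linspace(0.0, maxlag, n_lags + 1)
    bin_idx = np.digitize(dists, edges) - 1         # pairs at or beyond maxlag fall outside
    lags, gamma = [], []
    for b in range(n_lags):
        in_bin = bin_idx == b
        if in_bin.any():
            lags.append(0.5 * (edges[b] + edges[b + 1]))
            # Semivariance is half the mean squared difference within the bin.
            gamma.append(0.5 * sq_diffs[in_bin].mean())
    return np.array(lags), np.array(gamma)


def spherical(h, nugget, sill, range_):
    """Spherical model: rises from the nugget to the sill, flat beyond range_."""
    ratio = np.clip(np.asarray(h, dtype=float) / range_, 0.0, 1.0)
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio**3)


def fit_spherical(lags, gamma, n_candidates=200):
    """Grid-search the range; solve nugget and partial sill by least squares."""
    best = None
    top = 2.0 * lags.max()
    for range_ in np.linspace(top / n_candidates, top, n_candidates):
        ratio = np.clip(lags / range_, 0.0, 1.0)
        shape = 1.5 * ratio - 0.5 * ratio**3
        design = np.column_stack([np.ones_like(lags), shape])
        coef, *_ = np.linalg.lstsq(design, gamma, rcond=None)
        sse = float(np.sum((design @ coef - gamma) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[0] + coef[1], range_)
    _, nugget, sill, range_ = best
    return {
        "nugget": float(nugget),
        "sill": float(sill),                        # total plateau = nugget + partial sill
        "effective_range": float(range_),           # equals the range parameter for this model
        "nugget_sill_ratio": float(nugget / sill),
    }
```

For the spherical model the effective range coincides with the range parameter; asymptotic models such as the exponential need a separate convention for where the plateau is considered reached.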
Statistical testing
When exactly two patients are present, a spot-label permutation test is run automatically on the difference in effective range. All spots from both patients are pooled and randomly reassigned to two groups of the original sizes; variograms are refitted on each permutation to build a null distribution. This tests whether the observed difference in spatial scale is larger than expected if score values were randomly distributed across the two tissue architectures.
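The pool-and-reshuffle scheme described above can be sketched generically. This is an illustration under stated assumptions, not the library's code: `label_permutation_test` is a hypothetical helper, the default statistic is a plain difference in means standing in for the difference in fitted effective range, and the `(k + 1) / (n + 1)` p-value correction is a common convention that the source does not confirm.

```python
import numpy as np


def label_permutation_test(a, b, statistic=None, n_permutations=999, random_state=0):
    """Two-sided permutation test on statistic(a, b) under random relabelling."""
    if statistic is None:
        # Stand-in statistic; the documented test refits variograms instead.
        statistic = lambda x, y: float(np.mean(x) - np.mean(y))
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(random_state)
    observed = statistic(a, b)
    pooled = np.concatenate([a, b])
    n_extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        # Reassign to two groups of the original sizes.
        null_value = statistic(shuffled[: len(a)], shuffled[len(a):])
        if abs(null_value) >= abs(observed):
            n_extreme += 1
    # Counting the observed labelling itself keeps the p-value above zero.
    p_value = (n_extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Under this convention the smallest attainable p-value is `1 / (n_permutations + 1)`, which is 0.001 with 999 permutations.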
When multiple patients are present and group_key is provided, a permutation test is run on the difference in group-mean effective range, analogous to a two-sample t-test but without distributional assumptions.
All test results are stored in df.attrs and printed in the plot title when plot=True.
- param adata:
Annotated data matrix. Must contain the EnrichMap score column in adata.obs and spatial coordinates in adata.obsm[spatial_key].
- type adata:
AnnData
- param score_key:
Column name in adata.obs holding the EnrichMap score to analyse, e.g. "enrichmap_score" or "EMT_score".
- type score_key:
str
- param batch_key:
Column name in adata.obs identifying individual patients or slides, e.g. "patient_id" or "library_id". Each unique value is treated as a separate sample with its own spatial graph and variogram.
- type batch_key:
str
- param spatial_key:
Key in adata.obsm containing the 2D spatial coordinates as an (n_spots, 2) array.
- type spatial_key:
str, default "spatial"
- param n_lags:
Number of evenly spaced lag bins for the empirical variogram. More bins resolve the curve at finer lag intervals but leave fewer spot pairs in each bin, so each estimate is noisier. Values between 15 and 30 work well for Visium-scale data.
- type n_lags:
int, default 20
- param maxlag:
Maximum lag distance considered. "median" (recommended) uses the median pairwise distance within each sample, which is a robust default that avoids noisy estimates at large lags where few spot pairs exist. Can also be a float in normalised coordinate units (e.g. 0.5 to consider lags up to half the tissue extent).
- type maxlag:
str or float, default "median"
- param model:
Theoretical variogram model fitted to the empirical semivariance. "spherical" is the most commonly used and has a clear transition from spatially structured to unstructured variance. "exponential" approaches the sill asymptotically (no sharp plateau). "gaussian" implies very smooth spatial fields. "matern" is the most flexible but may overfit with few lags.
- type model:
{"spherical", "exponential", "gaussian", "matern"}, default "spherical"
- param n_subsample:
If set, subsample each patient to at most this many spots before fitting. Variogram estimation is O(n²) in the number of spots, so subsampling is recommended for slides with more than ~4000 spots. Set to None to use all spots.
- type n_subsample:
int or None, default 3000
- param n_permutations:
Number of permutations for statistical testing. Higher values give finer p-value resolution at the cost of runtime, since variograms are refitted on each permutation. For exploratory analysis 499 is sufficient; for publication-ready results use 999 or higher.
- type n_permutations:
int, default 999
- param random_state:
Seed for the random number generator, used for subsampling and permutation tests. Set for reproducibility.
- type random_state:
int, default 0
- param group_key:
Column name in adata.obs for a higher-level clinical grouping, e.g. "subtype", "treatment_arm" or "response". When provided, the plot is coloured by group and a group-level permutation test is run on the effective range. When None, each patient is plotted individually.
- type group_key:
- param plot:
Whether to produce a three-panel figure showing overlaid variogram curves (left), effective range comparison (centre) and sill comparison (right).
- type plot:
bool, default True
- param figsize:
Figure size in inches (width, height).
- type figsize:
tuple of float, default (10, 4)
- param palette:
Colour palette for the plot. Can be a seaborn palette name (e.g. "Set2"), a dictionary mapping group/patient names to colours, or None for the default palette.
- type palette:
- returns:
One row per patient with columns:
  - patient: patient/sample identifier (from batch_key).
  - group: clinical group label (only if group_key is set).
  - effective_range: spatial autocorrelation scale (normalised).
  - sill: total spatial variance (plateau semivariance).
  - nugget: fine-scale / measurement noise variance.
  - nugget_sill_ratio: proportion of variance that is spatially unstructured (nugget / sill).
Statistical test results, when applicable, are stored as dictionaries in df.attrs:
  - df.attrs["pairwise_test"]: spot-label permutation test result (two-patient case).
  - df.attrs["group_test"]: group-level permutation test result (multi-patient with group_key).
- rtype:
pd.DataFrame
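As a rough sketch of how the returned table might be consumed, the snippet below builds a stand-in DataFrame with the documented columns (the values are copied from the two-patient example on this page, not produced by running the function) and reads the test result defensively, since only one of the two `attrs` keys is expected to be present.

```python
import pandas as pd

# Stand-in for the return value of compare_variograms; not real output.
result = pd.DataFrame(
    {
        "patient": ["patient_01", "patient_02"],
        "effective_range": [0.532, 0.069],
        "sill": [1.170, 0.972],
        "nugget": [0.0, 0.0],
    }
)
result["nugget_sill_ratio"] = result["nugget"] / result["sill"]
result.attrs["pairwise_test"] = {"p_value": 0.005}

# Pick whichever test was run; both keys may be absent with 3+ patients and no group_key.
test = result.attrs.get("pairwise_test") or result.attrs.get("group_test")
p_value = None if test is None else test.get("p_value")

# Patient with the largest spatial domains (longest effective range).
widest = result.loc[result["effective_range"].idxmax(), "patient"]
```

pandas documents `DataFrame.attrs` as experimental, and metadata stored there is not guaranteed to survive every operation, so it is safest to read the test dictionaries from the object returned by the function before transforming it.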
Examples
Compare variograms across two patients (pairwise test):
>>> from enrichmap.pl import compare_variograms
>>> result = compare_variograms(
...     adata,
...     score_key="enrichmap_score",
...     batch_key="patient_id",
...     n_lags=15,
... )
>>> print(result)
   patient  effective_range   sill  nugget  nugget_sill_ratio
patient_01            0.532  1.170       0                0.0
patient_02            0.069  0.972       0                0.0
>>> print(result.attrs["pairwise_test"]["p_value"])
0.005
Compare variograms across clinical groups:
>>> result = compare_variograms(
...     adata,
...     score_key="enrichmap_score",
...     batch_key="patient_id",
...     group_key="subtype",
... )
>>> print(result.attrs["group_test"])
{'test': 'permutation test (difference in group means)', 'group_a': 'luminal', 'group_b': 'basal', 'mean_a': 0.532, 'mean_b': 0.063, 'observed_diff': 0.469, 'p_value': 0.088, ...}
See also
compare_morans_i: Spatial autocorrelation comparison via Moran’s I.
compare_wasserstein: Pairwise optimal transport distance between spatially embedded score fields.