enrichmap.pl.compare_variograms


enrichmap.pl.compare_variograms(adata: AnnData, score_key: str, batch_key: str, spatial_key: str = 'spatial', n_lags: int = 20, maxlag: str | float = 'median', model: Literal['spherical', 'exponential', 'gaussian', 'matern'] = 'spherical', n_subsample: int | None = 3000, n_permutations: int = 999, random_state: int = 0, group_key: str | None = None, plot: bool = True, figsize: tuple[float, float] = (10, 4), palette: str | dict | None = None, save: str | None = None, save_kwargs: dict | None = None) → DataFrame

Fit empirical semivariograms to EnrichMap scores per patient and extract structural parameters for cross-patient comparison of spatial organisation.

A semivariogram describes how the dissimilarity between score values changes as a function of the spatial distance (lag) between spots. Three parameters are extracted from a fitted theoretical model:

  • Effective range: the lag distance at which the variogram plateaus, i.e. the spatial scale of autocorrelation. A short range means the score field is composed of many small patches; a long range means large, coherent spatial domains. Because coordinates are min-max normalised per patient to [0, 1], the range is expressed as a fraction of tissue extent and is directly comparable across slides of different physical sizes.

  • Sill: the plateau semivariance, representing the total spatial variance of the score field.

  • Nugget: the semivariance extrapolated to lag zero (the jump of the variogram at the origin), capturing measurement noise or fine-scale variation below the resolution of the spatial graph. A high nugget-to-sill ratio indicates that most of the variance is spatially unstructured.
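All three parameters are read off a model fitted to the empirical semivariogram. The sketch below shows one standard way to compute that curve, the classical Matheron estimator γ(h) = 1/(2N(h)) Σ (z_i − z_j)² over the N(h) spot pairs in each lag bin, after min-max normalising the coordinates. It is pure Python for illustration only: the helper names are hypothetical, and the estimator, the binning and the per-axis normalisation are assumptions rather than a description of how compare_variograms is implemented.

```python
import math
from itertools import combinations


def minmax_normalise(coords):
    """Rescale each axis of a list of (x, y) spots to [0, 1] (per-axis: an assumption)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x0, y0 = min(xs), min(ys)
    # `or 1.0` guards against a degenerate axis where all spots share one value.
    dx = (max(xs) - x0) or 1.0
    dy = (max(ys) - y0) or 1.0
    return [((x - x0) / dx, (y - y0) / dy) for x, y in coords]


def empirical_semivariogram(coords, values, n_lags, maxlag):
    """Matheron estimator: mean of 0.5 * (z_i - z_j)**2 within each lag bin."""
    width = maxlag / n_lags
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
        d = math.dist(p, q)
        if d > maxlag:
            continue  # pairs beyond maxlag are ignored
        k = min(int(d / width), n_lags - 1)  # d == maxlag goes in the last bin
        sums[k] += 0.5 * (zp - zq) ** 2
        counts[k] += 1
    centres = [(k + 0.5) * width for k in range(n_lags)]
    # Empty bins get NaN rather than a misleading zero.
    gamma = [s / c if c else float("nan") for s, c in zip(sums, counts)]
    return centres, gamma, counts


# Hypothetical transect of four spots with alternating scores.
coords = minmax_normalise([(0, 0), (1, 0), (2, 0), (3, 0)])
centres, gamma, counts = empirical_semivariogram(coords, [0, 1, 0, 1], n_lags=2, maxlag=1.0)
# counts == [3, 3]; gamma ≈ [0.5, 0.167]: neighbours differ more than distant spots.
```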

Two patients can share identical score distributions yet have completely different variograms, making this a powerful complement to marginal summaries like violin plots.
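The claim can be checked on a toy 1D transect, a hypothetical stand-in for a slide: two arrangements of the same eight score values have identical distributions, yet the semivariance between neighbouring spots differs by a factor of about 20.

```python
def semivariance_at_lag(values, lag):
    """Mean of 0.5 * (z[i + lag] - z[i])**2 for spots `lag` steps apart on a transect."""
    halves = [0.5 * (values[i + lag] - values[i]) ** 2 for i in range(len(values) - lag)]
    return sum(halves) / len(halves)


# Same eight scores, two spatial arrangements.
smooth = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]  # one gradual domain
patchy = [0.0, 0.7, 0.1, 0.6, 0.2, 0.5, 0.3, 0.4]  # high and low values interleaved

assert sorted(smooth) == sorted(patchy)  # identical score distributions

print(semivariance_at_lag(smooth, 1))  # ≈ 0.005: neighbours are similar
print(semivariance_at_lag(patchy, 1))  # ≈ 0.1: neighbours differ strongly
```

A violin plot of smooth and patchy would look identical; only a spatial summary separates them.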

Statistical testing

When exactly two patients are present, a spot-label permutation test is run automatically on the difference in effective range. All spots from both patients are pooled and randomly reassigned to two groups of the original sizes; variograms are refitted on each permutation to build a null distribution. This tests whether the observed difference in spatial scale is larger than expected if score values were randomly distributed across the two tissue architectures.
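The shuffling scheme can be sketched generically. Here `statistic` is a hypothetical stand-in for "fit a variogram to these spots and return its effective range"; any cheap function makes the example runnable. The two-sided comparison and the (b + 1)/(n + 1) p-value estimator are assumptions, not confirmed behaviour of compare_variograms.

```python
import random


def spot_label_permutation_test(spots_a, spots_b, statistic, n_permutations=999, seed=0):
    """Two-sided permutation test on statistic(a) - statistic(b).

    `spots_a` / `spots_b` hold per-spot records (e.g. (x, y, score) tuples);
    `statistic` is a placeholder for a variogram refit returning an effective range.
    """
    rng = random.Random(seed)
    observed = statistic(spots_a) - statistic(spots_b)
    pooled = list(spots_a) + list(spots_b)
    n_a = len(spots_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign spots to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = statistic(pooled[:n_a]) - statistic(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 counts the observed labelling as one member of the null distribution.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```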

When multiple patients are present and group_key is provided, a permutation test is run on the difference in group-mean effective range, analogous to a two-sample t-test but without distributional assumptions.
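At the group level, what gets shuffled is each patient's group membership rather than individual spots. A minimal sketch, assuming exactly two groups, a two-sided test on the difference in group-mean effective range and the (b + 1)/(n + 1) estimator; the function name is hypothetical. With few patients the number of distinct labellings is small: four patients split two and two admit only six, so the two-sided p-value cannot meaningfully fall below about 1/3 however many permutations are drawn.

```python
import random
from statistics import mean


def group_permutation_test(ranges, groups, group_a, group_b, n_permutations=999, seed=0):
    """Permute group labels across patients and compare group-mean effective range."""
    rng = random.Random(seed)

    def mean_diff(labels):
        a = [r for r, g in zip(ranges, labels) if g == group_a]
        b = [r for r, g in zip(ranges, labels) if g == group_b]
        return mean(a) - mean(b)

    observed = mean_diff(groups)
    labels = list(groups)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # one group label per patient, group sizes preserved
        if abs(mean_diff(labels)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical per-patient effective ranges for groups "x" and "y".
observed, p_value = group_permutation_test([0.5, 0.6, 0.1, 0.05], ["x", "x", "y", "y"], "x", "y")
```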

All test results are stored in the attrs dictionary of the returned DataFrame (df.attrs) and, when plot=True, shown in the plot title.

param adata:

Annotated data matrix. Must contain the EnrichMap score column in adata.obs and spatial coordinates in adata.obsm[spatial_key].

type adata:

AnnData

param score_key:

Column name in adata.obs holding the EnrichMap score to analyse, e.g. "enrichmap_score" or "EMT_score".

type score_key:

str

param batch_key:

Column name in adata.obs identifying individual patients or slides, e.g. "patient_id" or "library_id". Each unique value is treated as a separate sample with its own spatial graph and variogram.

type batch_key:

str

param spatial_key:

Key in adata.obsm containing the 2D spatial coordinates as an (n_spots, 2) array.

type spatial_key:

str, default "spatial"

param n_lags:

Number of evenly spaced lag bins for the empirical variogram. More bins resolve the curve at finer lag resolution, but each bin then holds fewer spot pairs, so the per-bin estimates become noisier. Values between 15 and 30 work well for Visium-scale data.

type n_lags:

int, default 20
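The trade-off behind n_lags is that a fixed set of spot pairs is split across more, thinner bins. The sketch counts pairs per bin on a hypothetical regular grid of spots: the total stays constant as n_lags grows, while bins narrower than the spot spacing start to come up empty. The helper is illustrative and not part of enrichmap.

```python
import math
from itertools import combinations


def pairs_per_lag_bin(coords, n_lags, maxlag):
    """Count spot pairs in each of `n_lags` equal-width lag bins covering [0, maxlag]."""
    width = maxlag / n_lags
    counts = [0] * n_lags
    for p, q in combinations(coords, 2):
        d = math.dist(p, q)
        if d <= maxlag:
            # min() keeps d == maxlag (and any float round-up) inside the last bin.
            counts[min(int(d / width), n_lags - 1)] += 1
    return counts


# Hypothetical 10 x 10 grid of spots on the unit square (spacing 1/9).
grid = [(i / 9, j / 9) for i in range(10) for j in range(10)]
for n_lags in (5, 20, 40):
    counts = pairs_per_lag_bin(grid, n_lags, maxlag=0.5)
    # Same total number of pairs, spread over more (and thinner) bins.
    print(n_lags, sum(counts), min(counts), counts.count(0))
```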

param maxlag:

Maximum lag distance considered. "median" (recommended) uses the median pairwise distance within each sample, which is a robust default that avoids noisy estimates at large lags where few spot pairs exist. Can also be a float in normalised coordinate units (e.g. 0.5 to consider lags up to half the tissue extent).

type maxlag:

str or float, default "median"
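Cutting at the median distance keeps the shorter half of all spot pairs and discards the sparse long-lag tail. A sketch, assuming the median is taken over all pairwise distances of the (possibly subsampled) normalised coordinates; the function name is hypothetical.

```python
import math
import statistics
from itertools import combinations


def median_pairwise_distance(coords):
    """Median over all spot pairs; one plausible reading of maxlag="median"."""
    return statistics.median(math.dist(p, q) for p, q in combinations(coords, 2))


# Four hypothetical spots at the corners of the normalised unit square:
# four side pairs at distance 1 and two diagonal pairs at distance sqrt(2).
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
maxlag = median_pairwise_distance(corners)  # 1.0, so both diagonals are excluded
```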

param model:

Theoretical variogram model fitted to the empirical semivariance. "spherical" is the most commonly used and has a clear transition from spatially structured to unstructured variance. "exponential" approaches the sill asymptotically (no sharp plateau). "gaussian" implies very smooth spatial fields. "matern" is the most flexible but may overfit with few lags.

type model:

{"spherical", "exponential", "gaussian", "matern"}, default "spherical"

param n_subsample:

If set, subsample each patient to at most this many spots before fitting. Variogram estimation is O(n²) in the number of spots, so subsampling is recommended for slides with more than ~4000 spots. Set to None to use all spots.

type n_subsample:

int or None, default 3000
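The O(n²) cost follows from the number of distinct spot pairs. A sketch of the pair count and of seeded subsampling without replacement; the helpers are hypothetical, and whether compare_variograms draws its subsample exactly this way is an assumption.

```python
import random


def n_pairs(n_spots):
    """Distinct spot pairs; a brute-force empirical variogram visits every one."""
    return n_spots * (n_spots - 1) // 2


def subsample_indices(n_spots, n_subsample, seed=0):
    """Sorted indices of at most `n_subsample` spots, drawn without replacement."""
    if n_subsample is None or n_spots <= n_subsample:
        return list(range(n_spots))
    return sorted(random.Random(seed).sample(range(n_spots), n_subsample))


print(n_pairs(3_000))   # 4498500
print(n_pairs(10_000))  # 49995000, about 11 times more work
idx = subsample_indices(10_000, 3_000, seed=0)
```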

param n_permutations:

Number of permutations for statistical testing. Higher values give more precise p-values. For exploratory analysis 499 is sufficient; for publication-ready results use 999 or higher.

type n_permutations:

int, default 999
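With n permutations, an estimator of the form (b + 1)/(n + 1), where b counts permuted statistics at least as extreme as the observed one, can never fall below 1/(n + 1), and its Monte Carlo standard error shrinks as 1/√n. The sketch assumes that estimator, which this page does not confirm; it makes the 499 versus 999 trade-off concrete.

```python
import math


def min_p_value(n_permutations):
    """Smallest attainable p-value under the (b + 1) / (n + 1) estimator."""
    return 1.0 / (n_permutations + 1)


def p_value_standard_error(p, n_permutations):
    """Monte Carlo standard error of an estimated permutation p-value."""
    return math.sqrt(p * (1.0 - p) / n_permutations)


print(min_p_value(999))                              # 0.001
print(min_p_value(499))                              # 0.002
print(round(p_value_standard_error(0.05, 999), 4))   # 0.0069
print(round(p_value_standard_error(0.05, 499), 4))   # 0.0098
```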

param random_state:

Seed for the random number generator, used for subsampling and permutation tests. Set for reproducibility.

type random_state:

int, default 0

param group_key:

Column name in adata.obs for a higher-level clinical grouping, e.g. "subtype", "treatment_arm" or "response". When provided, the plot is coloured by group and a group-level permutation test is run on the effective range. When None, each patient is plotted individually.

type group_key:

str or None, optional

param plot:

Whether to produce a three-panel figure showing overlaid variogram curves (left), effective range comparison (centre) and sill comparison (right).

type plot:

bool, default True

param figsize:

Figure size in inches (width, height).

type figsize:

tuple of float, default (10, 4)

param palette:

Colour palette for the plot. Can be a seaborn palette name (e.g. "Set2"), a dictionary mapping group/patient names to colours, or None for the default palette.

type palette:

str, dict or None, optional

returns:

One row per patient with columns:

  • patient: patient/sample identifier (from batch_key).

  • group: clinical group label (only if group_key is set).

  • effective_range: spatial autocorrelation scale (normalised).

  • sill: total spatial variance (plateau semivariance).

  • nugget: fine-scale / measurement noise variance.

  • nugget_sill_ratio: proportion of variance that is spatially unstructured (nugget / sill).

Statistical test results, when applicable, are stored as dictionaries in df.attrs:

  • df.attrs["pairwise_test"]: spot-label permutation test result (two-patient case).

  • df.attrs["group_test"]: group-level permutation test result (multi-patient with group_key).

rtype:

pd.DataFrame

Examples

Compare variograms across two patients (pairwise test):

>>> from enrichmap.pl import compare_variograms
>>> result = compare_variograms(
...     adata,
...     score_key="enrichmap_score",
...     batch_key="patient_id",
...     n_lags=15,
... )
>>> print(result)
   patient  effective_range   sill  nugget  nugget_sill_ratio
patient_01            0.532  1.170       0                0.0
patient_02            0.069  0.972       0                0.0
>>> print(result.attrs["pairwise_test"]["p_value"])
0.005

Compare variograms across clinical groups:

>>> result = compare_variograms(
...     adata,
...     score_key="enrichmap_score",
...     batch_key="patient_id",
...     group_key="subtype",
... )
>>> print(result.attrs["group_test"])
{'test': 'permutation test (difference in group means)',
 'group_a': 'luminal', 'group_b': 'basal',
 'mean_a': 0.532, 'mean_b': 0.063,
 'observed_diff': 0.469, 'p_value': 0.088, ...}

See also

compare_morans_i

Spatial autocorrelation comparison via Moran’s I.

compare_wasserstein

Pairwise optimal transport distance between spatially embedded score fields.