enrichmap.pl.compare_wasserstein
- enrichmap.pl.compare_wasserstein(adata: AnnData, score_key: str, batch_key: str, spatial_key: str = 'spatial', spatial_weight: float = 1.0, score_weight: float = 1.0, n_subsample: int | None = 2000, n_permutations: int = 999, random_state: int = 0, group_key: str | None = None, plot: bool = True, figsize: tuple[float, float] = (7, 6), cmap: str = 'magma_r', linkage_method: str = 'average', save: str | None = None, save_kwargs: dict | None = None) -> DataFrame
Pairwise Wasserstein (earth mover’s) distance between patients based on spatially embedded EnrichMap scores.
Each patient’s score field is represented as an empirical distribution over a joint (x, y, score) space. The Wasserstein-2 distance then quantifies how much “work” is needed to transform one patient’s spatial score landscape into another’s, capturing differences in both the spatial arrangement and the magnitude of scores simultaneously. Two patients can have identical marginal score distributions yet produce a large Wasserstein distance if their spatial patterns differ — for instance, a single coherent hotspot versus many scattered foci.
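The point about identical marginals can be checked on a toy example. The sketch below is not the function's internal solver: it assumes two equal-size, uniformly weighted point clouds, for which the optimal transport plan reduces to a one-to-one assignment, and the helper name w2_equal_size is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def w2_equal_size(a, b):
    """Wasserstein-2 distance between two equal-size, uniformly weighted clouds.

    Hypothetical helper: with equal sizes and uniform weights the optimal
    coupling is a permutation, so an assignment solver gives the exact value.
    """
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)  # squared Euclidean
    rows, cols = linear_sum_assignment(cost)
    return float(np.sqrt(cost[rows, cols].mean()))


# A 4 x 4 grid of spots with coordinates already in [0, 1].
xs, ys = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
coords = np.column_stack([xs.ravel(), ys.ravel()])

# Same score values (four 1s, twelve 0s), different spatial layout.
hotspot = np.zeros(16)
hotspot[[0, 1, 4, 5]] = 1.0        # one coherent block in a corner
scattered = np.zeros(16)
scattered[[0, 3, 12, 15]] = 1.0    # one high spot in each corner

patient_a = np.column_stack([coords, hotspot])
patient_b = np.column_stack([coords, scattered])

d_same = w2_equal_size(patient_a, patient_a)   # identical landscapes
d_diff = w2_equal_size(patient_a, patient_b)   # same marginals, new layout
```

Here d_same is zero while d_diff is strictly positive, even though the two score vectors are identical as sets of values.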
The output is a patient-by-patient distance matrix that can be used directly for hierarchical clustering, multidimensional scaling or downstream statistical testing.
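As an illustration of downstream use, a square distance matrix of this kind can be handed to SciPy's hierarchical clustering. The values below are made up for the sketch, and a plain NumPy array stands in for the returned DataFrame (in practice something like dist.to_numpy() would supply it).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Illustrative symmetric distance matrix for four samples (not real output).
names = ["p1", "p2", "p3", "p4"]
dist = np.array([
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.00, 0.85, 0.80],
    [0.80, 0.85, 0.00, 0.15],
    [0.90, 0.80, 0.15, 0.00],
])

# linkage() expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(dist, checks=True)
tree = linkage(condensed, method="average")

# Cut the tree into two flat clusters.
clusters = fcluster(tree, t=2, criterion="maxclust")
assignment = dict(zip(names, clusters.tolist()))
```

With these values p1 and p2 fall into one cluster and p3 and p4 into the other.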
Normalisation
Spatial coordinates are min-max normalised per patient to [0, 1] so that varying slide extents (e.g. different tissue sizes or imaging resolutions) do not dominate the distance. Scores are likewise normalised per patient to [0, 1]. The spatial_weight and score_weight parameters allow tuning the relative contribution of location versus score amplitude to the final distance:
- spatial_weight > score_weight: emphasises where scores are located in tissue space; useful for detecting pattern rearrangements.
- score_weight > spatial_weight: emphasises score magnitudes; useful when amplitude differences are biologically meaningful.
- Equal weights (default): balanced comparison.
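The normalisation and weighting described above can be sketched as follows. This is a minimal illustration, not the library's code: the helper name embed_sample is hypothetical, and it assumes each spatial axis is scaled independently, which the text does not specify.

```python
import numpy as np


def embed_sample(coords, scores, spatial_weight=1.0, score_weight=1.0):
    """Build a weighted (x_norm, y_norm, score_norm) cloud for one sample.

    Hypothetical helper; per-axis min-max scaling is an assumption.
    """
    coords = np.asarray(coords, dtype=float)
    scores = np.asarray(scores, dtype=float)

    def minmax(values):
        lo, hi = values.min(axis=0), values.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
        return (values - lo) / span

    xy = minmax(coords) * spatial_weight
    s = minmax(scores) * score_weight
    return np.column_stack([xy, s])


coords = [[0, 0], [1000, 500], [2000, 1000]]   # pixel-scale coordinates
scores = [-2.0, 0.0, 2.0]
emb = embed_sample(coords, scores, spatial_weight=2.0, score_weight=0.5)
```

After weighting, the spatial columns span [0, 2] and the score column spans [0, 0.5], so squared spatial differences contribute more to the transport cost than squared score differences.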
Statistical testing
When exactly two patients are present, a spot-label permutation test is run automatically. All spots from both patients are pooled and randomly split into two groups of the original sizes. Coordinates are preserved — each spot keeps its original spatial position — so the test asks: “is the observed Wasserstein distance larger than expected if these score values were randomly distributed across these two tissue architectures?” The p-value is computed as the fraction of permuted distances that equal or exceed the observed distance.
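A generic label-permutation test of this shape might look like the sketch below. It is an assumption-laden illustration rather than the function's implementation: the helper names are hypothetical, both samples are assumed to have the same number of spots so that an assignment solver yields the exact distance, and the p-value is the plain fraction described above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def w2_equal_size(a, b):
    """Exact Wasserstein-2 distance for equal-size, uniformly weighted clouds."""
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    return float(np.sqrt(cost[rows, cols].mean()))


def label_permutation_test(a, b, n_permutations=199, random_state=0):
    """Pool the spots, reshuffle sample labels, and recompute the distance."""
    rng = np.random.default_rng(random_state)
    observed = w2_equal_size(a, b)
    pooled = np.vstack([a, b])  # each row keeps its own (x, y, score)
    n_a = len(a)
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        idx = rng.permutation(len(pooled))
        null[i] = w2_equal_size(pooled[idx[:n_a]], pooled[idx[n_a:]])
    return {
        "observed_distance": observed,
        "null_mean": float(null.mean()),
        "null_std": float(null.std()),
        "p_value": float(np.mean(null >= observed)),
        "n_permutations": n_permutations,
    }


# Toy data: the second cloud is the first one shifted by 5 in every dimension.
rng = np.random.default_rng(1)
cloud_a = rng.random((20, 3))
cloud_b = cloud_a + 5.0
result = label_permutation_test(cloud_a, cloud_b)
```

Because the two toy clouds are far apart, random relabelling mixes them and produces much smaller distances than the observed one, so the p-value is small.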
When multiple patients are present and group_key is provided, a PERMANOVA (permutational multivariate analysis of variance) is run on the distance matrix, analogous to vegan::adonis2 in R. This tests whether within-group distances are smaller than between-group distances, i.e. whether the clinical grouping explains a significant proportion of variance in spatial score organisation.
All test results are stored in df.attrs and displayed in the plot title when plot=True.
- param adata:
Annotated data matrix. Must contain the EnrichMap score column in adata.obs and spatial coordinates in adata.obsm[spatial_key].
- type adata:
AnnData
- param score_key:
Column name in adata.obs holding the EnrichMap score to analyse, e.g. "enrichmap_score" or "EMT_score".
- type score_key:
str
- param batch_key:
Column name in adata.obs identifying individual patients or slides, e.g. "patient_id" or "library_id". Each unique value is treated as a separate sample.
- type batch_key:
str
- param spatial_key:
Key in adata.obsm containing the 2D spatial coordinates as an (n_spots, 2) array.
- type spatial_key:
str, default "spatial"
- param spatial_weight:
Multiplicative weight applied to the (normalised) spatial dimensions before computing pairwise distances. Increase to make the metric more sensitive to where scores are located; decrease to downweight spatial location relative to score amplitude.
- type spatial_weight:
float, default 1.0
- param score_weight:
Multiplicative weight applied to the (normalised) score dimension. Increase to make the metric more sensitive to score amplitude differences; decrease to emphasise spatial arrangement.
- type score_weight:
float, default 1.0
- param n_subsample:
If set, subsample each patient to at most this many spots before computing distances. The optimal transport solver operates on an (n × m) cost matrix, so runtime is roughly O(n² · m) for each pair. Subsampling to 2000 keeps pairwise computation under ~1s for typical Visium data. Set to None to use all spots (may be slow for >5000 spots per sample).
- type n_subsample:
int or None, default 2000
- param n_permutations:
Number of permutations for significance testing. Higher values give more precise p-values. For exploratory analysis 499 is sufficient; for publication-ready results use 999 or higher.
- type n_permutations:
int, default 999
- param random_state:
Seed for the random number generator, used for subsampling and permutation tests. Set for reproducibility.
- type random_state:
int, default 0
- param group_key:
Column name in adata.obs for a higher-level clinical grouping, e.g. "subtype", "treatment_arm" or "response". When provided, the clustermap is annotated with a colour sidebar and a PERMANOVA test is run on the distance matrix. When None, no group-level testing is performed.
- type group_key:
str or None, optional
- param plot:
Whether to produce a hierarchically clustered heatmap of the distance matrix. For exactly two patients this is not very informative; consider setting plot=False.
- type plot:
bool, default True
- param figsize:
Figure size in inches (width, height) for the clustermap.
- type figsize:
tuple of float, default (7, 6)
- param cmap:
Colourmap for the distance heatmap. "magma_r" is the reversed magma colourmap, giving a light (low distance) to dark (high distance) gradient.
- type cmap:
str, default "magma_r"
- param linkage_method:
Linkage method for hierarchical clustering of the distance matrix. "average" (UPGMA) is the standard choice for distance matrices. Other options include "ward", "complete" and "single".
- type linkage_method:
str, default "average"
- returns:
Square distance matrix indexed and columned by patient name. Values are Wasserstein-2 distances in the joint (x_norm, y_norm, score_norm) space.
Statistical test results, when applicable, are stored as dictionaries in df.attrs:
  - df.attrs["pairwise_test"]: spot-label permutation test result (two-patient case), containing keys "observed_distance", "null_mean", "null_std", "p_value" and "n_permutations".
  - df.attrs["permanova"]: PERMANOVA result (multi-patient with group_key), containing keys "pseudo_F", "p_value" and "n_permutations".
- rtype:
pd.DataFrame
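The role of n_subsample and random_state can be illustrated with a small sketch. The helper subsample_spots is hypothetical and only shows one plausible way to draw a reproducible subset without replacement; the memory figure is simple arithmetic for a dense float64 cost matrix.

```python
import numpy as np


def subsample_spots(points, n_subsample, rng):
    """Keep at most n_subsample rows, drawn without replacement.

    Hypothetical helper; n_subsample=None keeps every spot.
    """
    points = np.asarray(points)
    if n_subsample is None or len(points) <= n_subsample:
        return points
    keep = rng.choice(len(points), size=n_subsample, replace=False)
    return points[keep]


sample = np.random.default_rng(42).random((5000, 3))  # toy (x, y, score) cloud

reduced = subsample_spots(sample, 2000, np.random.default_rng(0))
repeat = subsample_spots(sample, 2000, np.random.default_rng(0))  # same seed
untouched = subsample_spots(sample, None, np.random.default_rng(0))

# Dense float64 cost matrix for one pair of 2000-spot samples, in megabytes.
cost_matrix_mb = 2000 * 2000 * 8 / 1e6
```

Re-seeding the generator reproduces the same subset, and a 2000 × 2000 cost matrix occupies 32 MB per pair, compared with 200 MB for 5000 × 5000.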
Examples
Pairwise comparison of two patients:
>>> dist = compare_wasserstein(
...     adata,
...     score_key="enrichmap_score",
...     batch_key="patient_id",
...     plot=False,
... )
>>> print(dist)
            patient_01  patient_02
patient_01      0.0000      0.1375
patient_02      0.1375      0.0000
>>> print(dist.attrs["pairwise_test"]["p_value"])
0.025
Multi-patient comparison with PERMANOVA:
>>> dist = compare_wasserstein(
...     adata,
...     score_key="enrichmap_score",
...     batch_key="patient_id",
...     group_key="subtype",
... )
>>> print(dist.attrs["permanova"])
{'test': 'PERMANOVA', 'pseudo_F': 6.22, 'p_value': 0.088, ...}
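For readers unfamiliar with the pseudo_F statistic reported above, the sketch below computes a one-way PERMANOVA directly from a distance matrix using the standard sums-of-squares partition. It is an illustration under stated assumptions, not the function's code: the helper names are hypothetical, the toy distances are made up, and the (count + 1) / (n + 1) p-value follows the convention used by vegan, which may differ from what this function does.

```python
import numpy as np


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, n_permutations=199, random_state=0):
    """Permute group labels and compare pseudo-F values with the observed one."""
    rng = np.random.default_rng(random_state)
    labels = np.asarray(labels)
    observed = pseudo_f(dist, labels)
    exceed = sum(
        pseudo_f(dist, rng.permutation(labels)) >= observed
        for _ in range(n_permutations)
    )
    return {
        "pseudo_F": float(observed),
        "p_value": float((exceed + 1) / (n_permutations + 1)),
        "n_permutations": n_permutations,
    }


# Toy example: six samples on a line, forming two tight groups.
positions = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
toy_dist = np.abs(positions[:, None] - positions[None, :])
toy_labels = ["A", "A", "A", "B", "B", "B"]
res = permanova(toy_dist, toy_labels)
```

With only six samples there are few distinct relabellings, so the attainable p-values are coarse even when the groups are clearly separated; this is worth keeping in mind when interpreting PERMANOVA results from small cohorts.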
Emphasise spatial arrangement over score amplitude:
>>> dist = compare_wasserstein(
...     adata,
...     score_key="enrichmap_score",
...     batch_key="patient_id",
...     spatial_weight=2.0,
...     score_weight=0.5,
... )
See also
compare_morans_i
Spatial autocorrelation comparison via Moran’s I.
compare_variograms
Semivariogram-based comparison of spatial scale and structure.