enrichmap.pl.compare_wasserstein

enrichmap.pl.compare_wasserstein(adata: AnnData, score_key: str, batch_key: str, spatial_key: str = 'spatial', spatial_weight: float = 1.0, score_weight: float = 1.0, n_subsample: int | None = 2000, n_permutations: int = 999, random_state: int = 0, group_key: str | None = None, plot: bool = True, figsize: tuple[float, float] = (7, 6), cmap: str = 'magma', linkage_method: str = 'average', save: str | None = None, save_kwargs: dict | None = None, return_result: bool = False) → pd.DataFrame | None

Pairwise Wasserstein (earth mover’s) distance between patients based on spatially embedded EnrichMap scores.

Each patient’s score field is represented as an empirical distribution over a joint (x, y, score) space. The Wasserstein-2 distance then quantifies how much “work” is needed to transform one patient’s spatial score landscape into another’s, capturing differences in both the spatial arrangement and the magnitude of scores simultaneously. Two patients can have identical marginal score distributions yet produce a large Wasserstein distance if their spatial patterns differ — for instance, a single coherent hotspot versus many scattered foci.
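The hotspot-versus-scattered case can be illustrated with a small sketch. It is not EnrichMap's implementation: it assumes two point clouds of equal size with uniform weights, for which the exact Wasserstein-2 distance reduces to an assignment problem solvable with SciPy. The helper name `w2_equal_size` and the toy 4 × 4 grid are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def w2_equal_size(a, b):
    """Exact Wasserstein-2 distance between two equal-size, uniformly weighted point clouds."""
    # Squared Euclidean cost between every point of a and every point of b.
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # With equal sizes and uniform weights, an optimal transport plan is an assignment.
    rows, cols = linear_sum_assignment(cost)
    return float(np.sqrt(cost[rows, cols].mean()))


# Toy 4 x 4 grid of spots with coordinates already in [0, 1].
xs, ys = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
xy = np.column_stack([xs.ravel(), ys.ravel()])

# Same multiset of scores (four ones, twelve zeros), different spatial layout.
hotspot = np.zeros(16)
hotspot[[0, 1, 4, 5]] = 1.0      # one coherent 2 x 2 block
scattered = np.zeros(16)
scattered[[0, 3, 12, 15]] = 1.0  # four isolated corners

# Distance in the joint (x, y, score) space versus on the scores alone.
d_joint = w2_equal_size(np.column_stack([xy, hotspot]), np.column_stack([xy, scattered]))
d_scores_only = w2_equal_size(hotspot[:, None], scattered[:, None])
```

Here `d_scores_only` is zero because the marginal score distributions coincide, while `d_joint` is positive because the high-scoring spots sit in different places.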

The output is a patient-by-patient distance matrix that can be used directly for hierarchical clustering, multidimensional scaling or downstream statistical testing.
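As a rough sketch of the downstream use mentioned above, a square distance matrix can be passed to SciPy's hierarchical clustering after conversion to condensed form. The 4 × 4 matrix below is invented purely for illustration and does not come from real data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical patient-by-patient distances (toy values, not real results).
dist = np.array([
    [0.00, 0.10, 0.60, 0.65],
    [0.10, 0.00, 0.55, 0.70],
    [0.60, 0.55, 0.00, 0.12],
    [0.65, 0.70, 0.12, 0.00],
])

# linkage expects the condensed (upper-triangle) form of a distance matrix.
condensed = squareform(dist)
tree = linkage(condensed, method="average")

# Cut the tree into two flat clusters.
clusters = fcluster(tree, t=2, criterion="maxclust")
```

With these toy values, the first two patients fall into one cluster and the last two into the other.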

Normalisation

Spatial coordinates are min-max normalised per patient to [0, 1] so that varying slide extents (e.g. different tissue sizes or imaging resolutions) do not dominate the distance. Scores are likewise normalised per patient to [0, 1]. The spatial_weight and score_weight parameters allow tuning the relative contribution of location versus score amplitude to the final distance:

spatial_weight > score_weight: emphasises where scores are located in tissue space; useful for detecting pattern rearrangements.
score_weight > spatial_weight: emphasises score magnitudes; useful when amplitude differences are biologically meaningful.
Equal weights (default): balanced comparison.
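The normalisation and weighting can be sketched as follows. This is a minimal illustration under assumptions, not the library's code: the helper names `minmax` and `embed` are hypothetical, each spatial axis is rescaled independently, and constant columns are mapped to 0. The actual implementation may differ on both points.

```python
import numpy as np


def minmax(values):
    """Rescale each column (or a 1D vector) to [0, 1]; constant input maps to 0."""
    values = np.asarray(values, dtype=float)
    lo = values.min(axis=0)
    span = values.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # avoid division by zero
    return (values - lo) / span


def embed(coords, scores, spatial_weight=1.0, score_weight=1.0):
    """Build the weighted (x_norm, y_norm, score_norm) point cloud for one patient."""
    xy = minmax(coords) * spatial_weight
    s = minmax(scores)[:, None] * score_weight
    return np.hstack([xy, s])


# Toy patient with pixel-scale coordinates and unbounded scores.
coords = np.array([[100.0, 2000.0], [300.0, 2500.0], [500.0, 3000.0]])
scores = np.array([-0.5, 0.0, 1.5])

# Emphasise location over amplitude.
cloud = embed(coords, scores, spatial_weight=2.0, score_weight=0.5)
```

After weighting, the spatial columns span [0, 2] and the score column spans [0, 0.5], so differences in location contribute more to Euclidean distances between points than differences in score.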

Statistical testing

When exactly two patients are present, a spot-label permutation test is run automatically. All spots from both patients are pooled and randomly split into two groups of the original sizes. Coordinates are preserved — each spot keeps its original spatial position — so the test asks: “is the observed Wasserstein distance larger than expected if these score values were randomly distributed across these two tissue architectures?” The p-value is computed as the fraction of permuted distances that equal or exceed the observed distance.
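The procedure above can be sketched as follows. This is an illustration under assumptions rather than the library's code: `label_permutation_test` and `w2_equal_size` are hypothetical names, the distance helper only handles equal-size clouds, and the p-value follows the fraction described in the text. The result keys mirror those documented for `df.attrs["pairwise_test"]`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def w2_equal_size(a, b):
    """Exact Wasserstein-2 distance for equal-size, uniformly weighted clouds."""
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return float(np.sqrt(cost[rows, cols].mean()))


def label_permutation_test(a, b, distance, n_permutations=199, seed=0):
    """Pool all spots, reshuffle patient labels, and recompute the distance."""
    rng = np.random.default_rng(seed)
    observed = distance(a, b)
    pooled = np.vstack([a, b])  # each row keeps its own (x, y, score)
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        idx = rng.permutation(len(pooled))
        # Split into two groups of the original sizes.
        null[i] = distance(pooled[idx[: len(a)]], pooled[idx[len(a):]])
    return {
        "observed_distance": observed,
        "null_mean": float(null.mean()),
        "null_std": float(null.std()),
        "p_value": float(np.mean(null >= observed)),
        "n_permutations": n_permutations,
    }


# Toy data: same spatial spread, clearly different score levels.
rng = np.random.default_rng(1)
patient_a = np.column_stack([rng.random((30, 2)), rng.uniform(0.0, 0.1, 30)])
patient_b = np.column_stack([rng.random((30, 2)), rng.uniform(0.9, 1.0, 30)])

result = label_permutation_test(patient_a, patient_b, w2_equal_size)
```

Because the two toy patients differ strongly in score level, random relabelling almost never reproduces a distance as large as the observed one, so the p-value is small.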

When multiple patients are present and group_key is provided, a PERMANOVA (permutational multivariate analysis of variance) is run on the distance matrix, analogous to vegan::adonis2 in R. This tests whether within-group distances are smaller than between-group distances — i.e. whether the clinical grouping explains a significant proportion of variance in spatial score organisation.
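For orientation, a one-way PERMANOVA on a distance matrix can be sketched in NumPy. The helper names `pseudo_f` and `permanova` are hypothetical, and the `(count + 1) / (n_permutations + 1)` p-value is an assumption borrowed from common practice; the source does not state which convention EnrichMap uses. The result keys mirror those documented for `df.attrs["permanova"]`.

```python
import numpy as np


def pseudo_f(dist, labels):
    """Pseudo-F statistic computed from sums of squared distances."""
    d2 = np.asarray(dist, dtype=float) ** 2
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    # Total sum of squares from all pairwise squared distances.
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, n_permutations=999, seed=0):
    """Permute group labels across samples to build the null distribution of pseudo-F."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = pseudo_f(dist, labels)
    null = np.array([pseudo_f(dist, rng.permutation(labels)) for _ in range(n_permutations)])
    # Assumed convention: count the observed statistic as one of the permutations.
    p_value = (np.sum(null >= observed) + 1) / (n_permutations + 1)
    return {"pseudo_F": float(observed), "p_value": float(p_value), "n_permutations": n_permutations}


# Toy example: ten samples on a line, two well-separated groups of five.
positions = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4])
dist = np.abs(positions[:, None] - positions[None, :])
labels = ["A"] * 5 + ["B"] * 5

result = permanova(dist, labels)
```

A large pseudo-F indicates that between-group distances dominate within-group distances. In this toy case the grouping explains nearly all of the variance.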

All test results are stored in the attrs of the distance matrix DataFrame (df.attrs) and displayed in the plot title when plot=True.

param adata:

Annotated data matrix. Must contain the EnrichMap score column in adata.obs and spatial coordinates in adata.obsm[spatial_key].

type adata:

AnnData

param score_key:

Column name in adata.obs holding the EnrichMap score to analyse, e.g. "enrichmap_score" or "EMT_score".

type score_key:

str

param batch_key:

Column name in adata.obs identifying individual patients or slides, e.g. "patient_id" or "library_id". Each unique value is treated as a separate sample.

type batch_key:

str

param spatial_key:

Key in adata.obsm containing the 2D spatial coordinates as an (n_spots, 2) array.

type spatial_key:

str, default "spatial"

param spatial_weight:

Multiplicative weight applied to the (normalised) spatial dimensions before computing pairwise distances. Increase to make the metric more sensitive to where scores are located; decrease to downweight spatial location relative to score amplitude.

type spatial_weight:

float, default 1.0

param score_weight:

Multiplicative weight applied to the (normalised) score dimension. Increase to make the metric more sensitive to score amplitude differences; decrease to emphasise spatial arrangement.

type score_weight:

float, default 1.0

param n_subsample:

If set, subsample each patient to at most this many spots before computing distances. The optimal transport solver operates on an (n × m) cost matrix, so runtime is roughly O(n² · m) for each pair. Subsampling to 2000 keeps pairwise computation under ~1s for typical Visium data. Set to None to use all spots (may be slow for >5000 spots per sample).

type n_subsample:

int or None, default 2000

param n_permutations:

Number of permutations for significance testing. Higher values give more precise p-values. For exploratory analysis 499 is sufficient; for publication-ready results use 999 or higher.

type n_permutations:

int, default 999

param random_state:

Seed for the random number generator, used for subsampling and permutation tests. Set for reproducibility.

type random_state:

int, default 0

param group_key:

Column name in adata.obs for a higher-level clinical grouping, e.g. "subtype", "treatment_arm" or "response". When provided, the heatmap is annotated with colour strips and a PERMANOVA test is run on the distance matrix. When None, no group-level testing is performed.

type group_key:

str or None, optional

param plot:

Whether to produce a hierarchically clustered heatmap of the distance matrix. For exactly two patients this is not very informative; consider setting plot=False.

type plot:

bool, default True

param figsize:

Controls the figure width in inches. The height is auto-computed from the width so that heatmap cells are exactly square; the height component of this tuple is therefore ignored.

type figsize:

tuple of float, default (7, 6)

param cmap:

Colourmap for the distance heatmap.

type cmap:

str, default "magma"

param linkage_method:

Linkage method for hierarchical clustering of the distance matrix. "average" (UPGMA) is the standard choice for distance matrices. Other options include "ward", "complete" and "single".

type linkage_method:

str, default "average"

param save:

Filename to save the figure (e.g. "wasserstein.pdf"). The file is written to a figures/ subdirectory of the working directory. When None (default), the figure is not saved.

type save:

str or None, optional

param save_kwargs:

Extra keyword arguments forwarded to fig.savefig, e.g. {"dpi": 300, "bbox_inches": "tight"}.

type save_kwargs:

dict or None, optional

param return_result:

When True, return the distance matrix DataFrame. When False (default), return None.

type return_result:

bool, default False

returns:

When return_result=True: square distance matrix with patient names as both row and column labels. Values are Wasserstein-2 distances in the joint (x_norm, y_norm, score_norm) space.

Statistical test results, when applicable, are stored as dictionaries in df.attrs:

df.attrs["pairwise_test"]: spot-label permutation test result (two-patient case), containing keys "observed_distance", "null_mean", "null_std", "p_value" and "n_permutations".
df.attrs["permanova"]: PERMANOVA result (multi-patient with group_key), containing keys "pseudo_F", "p_value" and "n_permutations".

rtype:

pd.DataFrame or None

Examples

Pairwise comparison of two patients:

>>> from enrichmap.pl import compare_wasserstein
>>> dist = compare_wasserstein(
...     adata,
...     score_key="enrichmap_score",
...     batch_key="patient_id",
...     plot=False,
...     return_result=True,
... )
>>> print(dist)
            patient_01  patient_02
patient_01      0.0000      0.1375
patient_02      0.1375      0.0000
>>> print(dist.attrs["pairwise_test"]["p_value"])
0.025

Multi-patient comparison with PERMANOVA:

>>> dist = compare_wasserstein(
...     adata,
...     score_key="enrichmap_score",
...     batch_key="patient_id",
...     group_key="subtype",
...     return_result=True,
... )
>>> print(dist.attrs["permanova"])
{'test': 'PERMANOVA', 'pseudo_F': 6.22, 'p_value': 0.088, ...}

Emphasise spatial arrangement over score amplitude:

>>> dist = compare_wasserstein(
...     adata,
...     score_key="enrichmap_score",
...     batch_key="patient_id",
...     spatial_weight=2.0,
...     score_weight=0.5,
...     return_result=True,
... )
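
The effect of the n_subsample cap on the size of the pairwise cost matrix can be sketched as follows. The helper `subsample_spots` is hypothetical, and uniform sampling without replacement is an assumption; the source only states that each patient is subsampled to at most n_subsample spots.

```python
import numpy as np


def subsample_spots(points, n_subsample, rng):
    """Return at most n_subsample rows, drawn uniformly without replacement."""
    if n_subsample is None or len(points) <= n_subsample:
        return points  # nothing to do: keep all spots
    keep = rng.choice(len(points), size=n_subsample, replace=False)
    return points[keep]


rng = np.random.default_rng(0)
all_spots = rng.random((5000, 3))  # toy (x, y, score) cloud with 5000 spots
capped = subsample_spots(all_spots, 2000, rng)

# A dense float64 cost matrix between two capped patients has n * m entries of 8 bytes each.
cost_matrix_mb = capped.shape[0] * capped.shape[0] * 8 / 1e6
```

For two patients capped at 2000 spots each, the dense cost matrix takes 32 MB, against 200 MB for two uncapped 5000-spot patients.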
