Core

SpaceShip — Main way to get where you’re going

SpaceShip is the recommended entry point. It combines data preprocessing, GRN inference, ligand-receptor communication analysis, and worker spawning in a single configurable object.

import anndata as ad
from SpaceTravLR import SpaceShip

adata = ad.read_h5ad('my_tissue.h5ad')   # placeholder path to your spatial AnnData

ship = SpaceShip(name='MyTissue', outdir='./output')
ship.setup_(adata)                 # preprocess, run CellOracle, optionally COMMOT (run_commot=True)
ship.spawn_worker()                # submit a SLURM training job (or train with SpaceTravLR directly)

class SpaceTravLR.spaceship.SpaceShip(name='AlienTissue', outdir='./output')

Bases: object

SpaceShip is the main entry point for the SpaceTravLR analysis pipeline. It manages the data, directory structure, and execution of the various steps involved in spatial gene regulatory network inference and perturbation.

Parameters:
  • name (str, optional) – Name of the project/analysis, by default ‘AlienTissue’.

  • outdir (str, optional) – Path to the output directory where results will be saved, by default ‘./output’.

__init__(name='AlienTissue', outdir='./output')
Parameters:
  • name (str)

  • outdir (str)

fit(**kwargs)
interactive_select(adata, size=10, annot='cell_type', mode='spatial')

Launches an interactive scatter plot for selecting cells.

Parameters:
  • adata (ad.AnnData) – AnnData object.

  • size (int, optional) – Point size, by default 10.

  • annot (str, optional) – Color by annotation, by default ‘cell_type’.

  • mode (str, optional) – ‘spatial’ or ‘umap’, by default ‘spatial’.

Returns:

Interactive scatter plot widget.

Return type:

jscatter.Scatter

is_everything_ok()[source]

Checks if all necessary output files and directories exist.

Returns:

True if all checks pass.

Return type:

bool

static load_base_GRN(species)
Return type:

DataFrame

load_base_cell_thresholds()
Return type:

DataFrame

perturb(target, propagation=4, gene_expr=0, cells=None)

Performs in silico perturbation of a target gene.

Simulates the effect of changing a gene’s expression (knockout or overexpression) on the entire transcriptome, considering spatial signaling propagation.

Parameters:
  • target (str or list) – Target gene(s) to perturb.

  • propagation (int, optional) – Number of propagation steps (hops) in the network, by default 4.

  • gene_expr (float or list, optional) – Target expression level (0 for knockout), by default 0.

  • cells (list, optional) – List of cell indices to apply perturbation to (None for all cells), by default None.

Returns:

Simulated gene expression matrix after perturbation.

Return type:

pd.DataFrame
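The hop-based propagation controlled by `propagation` can be illustrated with a toy, non-spatial model. This is a minimal sketch assuming one linear coefficient per regulator-target pair, shared by all cells; each hop pushes the current expression change one regulatory step downstream while the perturbed gene stays clamped. SpaceTravLR itself uses spatially varying betas and ligand signalling, and the helper `propagate_delta` is hypothetical.

```python
# Hypothetical toy model for illustration; NOT the SpaceTravLR implementation.
# betas[(regulator, target)] is a linear coefficient assumed shared by all cells.

def propagate_delta(expr, betas, target, gene_expr=0.0, n_hops=4):
    """Clamp `target` to `gene_expr` and propagate the change for `n_hops` steps."""
    delta = {g: 0.0 for g in expr}
    delta[target] = gene_expr - expr[target]
    for _ in range(n_hops):
        new_delta = {g: 0.0 for g in expr}
        new_delta[target] = delta[target]  # the perturbed gene stays clamped
        for (regulator, gene), beta in betas.items():
            if gene != target:
                # change in a gene = sum of (coefficient * change in its regulators)
                new_delta[gene] += beta * delta[regulator]
        delta = new_delta
    return {g: expr[g] + delta[g] for g in expr}
```

On a chain `Myc -> B -> C`, a knockout of `Myc` with one hop changes only `B`; a second hop is needed before `C` moves, which is the sense in which the parameter counts hops in the network.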

process_adata_(adata, annot='cell_type')

Preprocesses the AnnData object for SpaceTravLR analysis.

This method checks for required fields, normalizes data if necessary, computes PCA/Neighbors/UMAP if missing, and imputes gene expression if needed. It saves the processed AnnData to the output directory.

Parameters:
  • adata (ad.AnnData) – The AnnData object containing the spatial transcriptomics data.

  • annot (str, optional) – The column name in adata.obs containing cell type annotations, by default ‘cell_type’.

run_celloracle_(alpha=5)

Runs CellOracle to infer the base Gene Regulatory Network (GRN).

It constructs a cluster-specific GRN based on the base network structure and the expression data in the AnnData object.

Parameters:
  • alpha (int, optional) – Regularization parameter for the model, by default 5.

run_commot_(radius=350)

Runs COMMOT to infer spatial cell-cell communication.

This method identifies ligand-receptor interactions and computes their spatial communication scores. It also computes received ligand signals for each cell.

Parameters:
  • radius (int, optional) – Spatial radius for communication in microns (or coordinate units), by default 350.
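To show what `radius` limits, the sketch below sums the ligand expression of every other cell lying within `radius` of each cell. This is a deliberately simplified stand-in: COMMOT itself uses collective optimal transport rather than a plain thresholded sum, and the function `received_ligand` is hypothetical.

```python
import math

def received_ligand(coords, ligand_expr, radius=350.0):
    """For each cell, sum the ligand expression of all other cells within `radius`.

    Simplified illustration only; COMMOT uses optimal transport, not this rule.
    """
    received = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i != j and math.dist(ci, cj) <= radius:
                total += ligand_expr[j]
        received.append(total)
    return received
```

Raising `radius` lets more distant senders contribute, so received signals grow and become spatially smoother.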

run_spacetravlr(max_epochs=150, learning_rate=0.005, spatial_dim=64, batch_size=512, radius=300, contact_distance=50)

Trains the SpaceTravLR model to learn spatial gene regulation.

This method initializes and trains the SpaceTravLR neural network model to predict gene expression based on TF activity and spatial ligand-receptor interactions.

Parameters:
  • max_epochs (int, optional) – Maximum number of training epochs, by default 150.

  • learning_rate (float, optional) – Learning rate for the optimizer, by default 5e-3.

  • spatial_dim (int, optional) – Dimension of the spatial embedding, by default 64.

  • batch_size (int, optional) – Batch size for training, by default 512.

  • radius (int, optional) – Radius for secreted signaling, by default 300.

  • contact_distance (int, optional) – Distance for contact-dependent signaling, by default 50.
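The two distance parameters can be read as defining two neighbourhoods of different size. A minimal sketch, assuming contact-dependent signalling considers cell pairs within `contact_distance` and secreted signalling considers pairs within `radius` (so every contact pair is also a secreted-range pair); the function `classify_neighbors` is hypothetical.

```python
import math

def classify_neighbors(coords, radius=300.0, contact_distance=50.0):
    """Return (contact_pairs, secreted_pairs) as sets of (i, j) index pairs, i < j."""
    contact, secreted = set(), set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= contact_distance:
                contact.add((i, j))  # close enough for contact-dependent signalling
            if d <= radius:
                secreted.add((i, j))  # within range of secreted ligands
    return contact, secreted
```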

setup_(adata, overwrite=False, run_commot=False)

Sets up the SpaceShip environment and runs the preprocessing pipeline.

This includes creating directories, processing the AnnData object, running CellOracle, and, if requested, running COMMOT.

Parameters:
  • adata (ad.AnnData) – Input AnnData object.

  • overwrite (bool, optional) – If True, overwrites the existing output directory, by default False.

  • run_commot (bool, optional) – If True, also runs COMMOT during setup, by default False.

Returns:

Returns self for method chaining.

Return type:

SpaceShip

setup_perturbations(adata, override_params=None, subsample=None, use_float16=False)

Initializes the GeneFactory for running perturbations.

Parameters:
  • adata (ad.AnnData) – AnnData object used for perturbation simulations.

  • override_params (dict, optional) – Dictionary to override run parameters, by default None.

  • subsample (int, optional) – Number of cells to subsample for faster loading, by default None.

  • use_float16 (bool, optional) – Use float16 for lower memory usage, by default False.

spawn_worker(partition='preempt', clusters='gpu', gres='gpu:1', job_name='SpaceTravLR', lifespan=3, python_path='python')

Submits a SLURM job to run the analysis.

Parameters:
  • partition (str, optional) – SLURM partition, by default ‘preempt’.

  • clusters (str, optional) – SLURM cluster, by default ‘gpu’.

  • gres (str, optional) – SLURM generic resource (GRES) request, by default ‘gpu:1’ (one GPU).

  • job_name (str, optional) – Name of the job, by default ‘SpaceTravLR’.

  • lifespan (int, optional) – Wall-time in hours, by default 3.

  • python_path (str, optional) – Path to the Python executable, by default ‘python’.
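To make the SLURM parameters concrete, the sketch below assembles a batch script from the same arguments. The `#SBATCH` options (`--job-name`, `--partition`, `--clusters`, `--gres`, `--time`) are standard sbatch flags; the helper `build_sbatch_script`, the mapping of `lifespan` to `--time`, and the script name `run_spacetravlr.py` are assumptions for illustration, not what `spawn_worker` actually generates.

```python
def build_sbatch_script(partition='preempt', clusters='gpu', gres='gpu:1',
                        job_name='SpaceTravLR', lifespan=3, python_path='python',
                        script='run_spacetravlr.py'):
    """Return the text of a SLURM batch script (hypothetical layout)."""
    lines = [
        '#!/bin/bash',
        f'#SBATCH --job-name={job_name}',
        f'#SBATCH --partition={partition}',
        f'#SBATCH --clusters={clusters}',
        f'#SBATCH --gres={gres}',
        f'#SBATCH --time={int(lifespan):02d}:00:00',  # wall-time given in whole hours
        f'{python_path} {script}',  # hypothetical training entry script
    ]
    return '\n'.join(lines) + '\n'
```

The resulting text could be saved to a file and submitted with `sbatch`.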

SpaceTravLR — Training Orchestrator

SpaceTravLR.oracles.SpaceTravLR manages the training queue and dispatches SpatialCellularProgramsEstimator for each target gene.

from SpaceTravLR.oracles import SpaceTravLR

model = SpaceTravLR(
    adata,
    save_dir='./models',
    annot='cell_type_int',
    grn=grn,
    max_epochs=100,
    radius=200,
)

model.run()

class SpaceTravLR.oracles.SpaceTravLR(adata, save_dir='./models', annot='cell_type_int', grn=None, max_epochs=50, spatial_dim=64, learning_rate=0.005, batch_size=512, rotate_maps=True, layer='imputed_count', alpha=0.05, threshold_lambda=1e-06, tflinks=None, tf_ligand_cutoff=0.01, radius=200, contact_distance=30, skip_clusters=None, scale_factor=1)

Bases: BaseTravLR

__init__(adata, save_dir='./models', annot='cell_type_int', grn=None, max_epochs=50, spatial_dim=64, learning_rate=0.005, batch_size=512, rotate_maps=True, layer='imputed_count', alpha=0.05, threshold_lambda=1e-06, tflinks=None, tf_ligand_cutoff=0.01, radius=200, contact_distance=30, skip_clusters=None, scale_factor=1)
static imbue_adata_with_space(adata, annot='cell_type_int', spatial_dim=64, in_place=False, method='fast')

Generates and caches 2D spatial maps for each cell location.
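For a concrete picture of a per-cell 2D spatial map, here is one plausible construction: a `spatial_dim` × `spatial_dim` grid per annotated cell type, counting neighbouring cells inside a square window centred on the cell (the centre cell counts itself). The function `spatial_map` and its `window` parameter are hypothetical; the actual `imbue_adata_with_space` (including `method='fast'` and `rotate_maps`) may compute something different.

```python
def spatial_map(coords, labels, center_idx, n_types, spatial_dim=8, window=100.0):
    """Rasterise cells around one cell into a spatial_dim x spatial_dim grid per type.

    Returns grid[type][row][col] holding neighbour counts (hypothetical layout).
    """
    cx, cy = coords[center_idx]
    cell_size = 2 * window / spatial_dim
    grid = [[[0] * spatial_dim for _ in range(spatial_dim)] for _ in range(n_types)]
    for (x, y), label in zip(coords, labels):
        dx, dy = x - cx, y - cy
        # keep only cells inside the square window [-window, window)
        if -window <= dx < window and -window <= dy < window:
            col = int((dx + window) // cell_size)
            row = int((dy + window) // cell_size)
            grid[label][row][col] += 1
    return grid
```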

run()

GeneFactory — Perturbation Engine

GeneFactory loads pre-trained spatial coefficients (betas) and uses them to simulate in-silico perturbations.

from SpaceTravLR.gene_factory import GeneFactory

gf = GeneFactory.from_json(adata, 'output/params.json')
gf.load_betas()

# Single-gene knockout
result_df = gf.perturb(target='Myc', n_propagation=4, gene_expr=0)

# Whole-genome screen
gf.genome_screen(save_to='./ko_results', mode='knockout')

class SpaceTravLR.gene_factory.GeneFactory(adata, models_dir, annot='cell_type_int', radius=200, contact_distance=30, scale_factor=1, beta_scale_factor=1, beta_cap=None, co_grn=None)

Bases: BaseTravLR

GeneFactory loads trained models (betas) and uses them to run in silico perturbations, acting as a factory for simulated gene expression profiles under different perturbation conditions.

Parameters:
  • adata (ad.AnnData) – AnnData object containing the data.

  • models_dir (str) – Directory where the trained models (betadata) are stored.

  • annot (str, optional) – Annotation key in adata.obs, by default ‘cell_type_int’.

  • radius (int, optional) – Spatial radius for signaling, by default 200.

  • contact_distance (int, optional) – Contact distance for signaling, by default 30.

  • scale_factor (int, optional) – Scaling factor for spatial coordinates, by default 1.

  • beta_scale_factor (int, optional) – Scaling factor for beta values, by default 1.

  • beta_cap (float, optional) – Cap on beta values, used to prevent numerical blow-up during simulation, by default None.

  • co_grn (object, optional) – CellOracle GRN object, by default None.
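The effect of `beta_scale_factor` and `beta_cap` can be sketched on a plain list of coefficients. This assumes the factor is applied first and the cap then clips values symmetrically to [-beta_cap, beta_cap]; both the order and the symmetric clipping are assumptions, and `cap_betas` is a hypothetical helper.

```python
def cap_betas(betas, beta_cap=None, beta_scale_factor=1.0):
    """Scale each coefficient, then clip it to [-beta_cap, beta_cap] if a cap is set."""
    scaled = [b * beta_scale_factor for b in betas]
    if beta_cap is None:
        return scaled  # no cap: coefficients are only rescaled
    return [max(-beta_cap, min(beta_cap, b)) for b in scaled]
```

Clipping bounds how much a single large coefficient can amplify a perturbation at each propagation step.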

__init__(adata, models_dir, annot='cell_type_int', radius=200, contact_distance=30, scale_factor=1, beta_scale_factor=1, beta_cap=None, co_grn=None)
compute_betas(**kwargs)
classmethod from_json(adata, json_path, override_params=None, beta_scale_factor=1, beta_cap=None, co_grn=None)

Creates a GeneFactory instance from a parameters JSON file.

Parameters:
  • adata (ad.AnnData) – AnnData object.

  • json_path (str) – Path to the JSON file containing run parameters.

  • override_params (dict, optional) – Dictionary to override parameters from JSON, by default None.

  • beta_scale_factor (int, optional) – Scaling factor for beta values, by default 1.

  • beta_cap (float, optional) – Cap for beta values, by default None.

  • co_grn (object, optional) – CellOracle GRN object, by default None.

Returns:

Initialized GeneFactory instance.

Return type:

GeneFactory
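The interplay between the JSON file and `override_params` can be sketched as a shallow dictionary merge, assuming keys in `override_params` simply replace the same keys read from the file. The helpers `merge_params` and `load_params` and the example keys are hypothetical.

```python
import json

def merge_params(params, override_params=None):
    """Return a copy of `params` in which keys from `override_params` win (shallow merge)."""
    merged = dict(params)
    merged.update(override_params or {})
    return merged

def load_params(json_path, override_params=None):
    """Read run parameters from a JSON file, then apply any overrides."""
    with open(json_path) as fh:
        return merge_params(json.load(fh), override_params)
```

Copying before updating leaves the parameters loaded from disk untouched, so the same file can be reused with different overrides.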

genome_screen(save_to, n_propagation=4, priority_genes=None, mode='knockout', cells=None)

Perform a genome-wide perturbation screen (knockout or overexpression).

Iterates through all possible targets (TFs, ligands, receptors) and performs the specified perturbation, saving the results to disk.

Parameters:
  • save_to (str) – Directory to save the results.

  • n_propagation (int, optional) – Number of propagation steps, by default 4.

  • priority_genes (list, optional) – List of genes to prioritize in the screen, by default None.

  • mode (str, optional) – ‘knockout’ or ‘overexpress’, by default ‘knockout’.

  • cells (list, optional) – List of cell indices to restrict perturbation to, by default None.
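One way to honour `priority_genes` in a screen is to run them first, in the given order, followed by the remaining possible targets, with each target screened once. The sketch below assumes that ordering rule and silently drops priority genes that are not valid targets; `screen_order` is a hypothetical helper.

```python
def screen_order(possible_targets, priority_genes=None):
    """Order targets so priority genes come first, skipping duplicates and non-targets."""
    targets = list(possible_targets)
    available = set(targets)
    ordered, seen = [], set()
    for gene in list(priority_genes or []) + targets:
        if gene in available and gene not in seen:
            ordered.append(gene)
            seen.add(gene)
    return ordered
```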

static get_ko_data(perturb_dir, adata)
static load_betadata(gene, save_dir, obs_names=None)
load_betas(subsample=None, float16=False, obs_names=None)

Loads the spatial gene regulatory coefficients (betas) from disk.

Parameters:
  • subsample (int, optional) – Number of cells to subsample, by default None.

  • float16 (bool, optional) – Use float16 precision to save memory, by default False.

  • obs_names (list, optional) – List of cell names to load betas for, by default None.

perturb(target, n_propagation=4, gene_expr=0, cells=None, save_layer=False, delta_dir=None)

Simulates perturbation of a target gene and propagates the effect.

Parameters:
  • target (str or list) – Target gene(s) to perturb.

  • n_propagation (int, optional) – Number of propagation steps, by default 4.

  • gene_expr (float, optional) – Expression level of the target gene (0 for knockout), by default 0.

  • cells (list, optional) – List of cell indices to apply perturbation to, by default None.

  • save_layer (bool, optional) – Whether to save the result as a layer in adata, by default False.

  • delta_dir (str, optional) – Directory to save delta matrices, by default None.

Returns:

DataFrame containing the simulated gene expression.

Return type:

pd.DataFrame

perturb_batch(target_genes, save_to=None, n_propagation=4, gene_expr=0, cells=None)

Runs perturbations for a batch of target genes.

Parameters:
  • target_genes (list) – List of genes to perturb.

  • save_to (str, optional) – Directory to save results, by default None.

  • n_propagation (int, optional) – Number of propagation steps, by default 4.

  • gene_expr (float, optional) – Target expression level, by default 0.

  • cells (list, optional) – List of cells to apply perturbation to, by default None.

property possible_targets
splash_betas(gene, obs_names=None)

Computes derivatives by splitting ligand terms into their individual gene components, converting betadata from a cell × modulator matrix into a cell × gene matrix.

Parameters:
  • gene (str) – The gene to compute derivatives for.

  • obs_names (list, optional) – List of cell names to compute derivatives for, by default all.

Returns:

DataFrame with derivatives for each cell at each location.

Return type:

pd.DataFrame
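Splitting a ligand term into gene components can be illustrated with the product rule. Assume, for illustration only, that a ligand-receptor modulator enters the model as the ligand signal a cell receives multiplied by that cell's receptor expression. One per-cell coefficient then splits into a derivative with respect to the received ligand and one with respect to the receptor. This is an assumption about the model form, not the exact `splash_betas` computation, and `splash_lr_term` is hypothetical.

```python
def splash_lr_term(betas, ligand_received, receptor_expr):
    """Split one ligand-receptor modulator coefficient into two per-cell gene columns.

    Assumes modulator = ligand_received * receptor_expr, so by the product rule:
      d(target)/d(ligand)   = beta * receptor_expr
      d(target)/d(receptor) = beta * ligand_received
    """
    return {
        'ligand': [b * r for b, r in zip(betas, receptor_expr)],
        'receptor': [b * lig for b, lig in zip(betas, ligand_received)],
    }
```

A fuller treatment would also account for how the received signal depends on ligand expression in neighbouring cells.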

update_status(msg='', color='black_on_green')

VirtualTissue — In-Silico Tissue Simulation

VirtualTissue provides a high-level interface for visualising perturbation effects across the spatial tissue map.

from SpaceTravLR.virtual_tissue import VirtualTissue

vt = VirtualTissue(
    adata,
    betadatas_path='./output/betadata',
    ko_path='./ko_results',
)

impact = vt.compute_ko_impact(['Myc', 'Sox2'])
vt.plot_radar(['Myc'], impact_df=impact)
vt.plot_arrows(perturb_target='Myc', threshold=0.1)

SubsampledTissue

An extension of VirtualTissue that aggregates results across multiple spatial sub-samples.

OracleQueue — Training Job Queue

OracleQueue manages the set of genes waiting to be modelled, with file-based locking for safe multi-agent parallel training.

class SpaceTravLR.oracles.OracleQueue(model_dir, all_genes, priority_genes=None, lock_timeout=3600)

Bases: object

A job manager for training gene models in parallel, well suited to HPC environments.

__init__(model_dir, all_genes, priority_genes=None, lock_timeout=3600)
Parameters:
  • model_dir (str) – Directory to store the trained model weights.

  • all_genes (list) – List of all genes to train models for.

  • priority_genes (list, optional) – List of genes to train models for first, by default None.

  • lock_timeout (int, optional) – Timeout for job locks in seconds, by default 3600.

add_orphan(gene)

Creates a .orphan file to mark a gene as untrainable. This is used when the gene's gene-gene network is too sparse to train a model, for example when the gene has no known TFs.

property age

Return the age of the queue in seconds.

property agents
property completed_genes
create_lock(gene)

Create a lock for a gene while the model is being trained. This prevents multiple processes from training the same gene.

delete_lock(gene)
static extract_gene_name(path)
property is_empty
kill_old_locks()
last_refresh_age()

Return the age of the last refresh in seconds.

property num_orphans
property regulated_genes
property remaining_genes
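File-based locking of the kind OracleQueue describes can be sketched with the standard library. This assumes one `<gene>.lock` file per gene, created atomically with `os.O_CREAT | os.O_EXCL` so that only one worker can claim a gene, treated as stale after `lock_timeout` seconds, plus a `<gene>.orphan` marker for untrainable genes. The file names and helper functions are hypothetical, not OracleQueue's actual on-disk layout.

```python
import os
import time

def try_lock(model_dir, gene, lock_timeout=3600):
    """Atomically claim `gene`; return True if this process now holds the lock."""
    path = os.path.join(model_dir, f'{gene}.lock')
    try:
        # Clear a stale lock left behind by a crashed or timed-out worker.
        if time.time() - os.path.getmtime(path) > lock_timeout:
            os.remove(path)
    except FileNotFoundError:
        pass  # no lock yet, or another worker already cleared it
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another worker is training this gene
    os.close(fd)
    return True

def release_lock(model_dir, gene):
    """Delete the lock file once training has finished."""
    os.remove(os.path.join(model_dir, f'{gene}.lock'))

def mark_orphan(model_dir, gene):
    """Record that `gene` cannot be trained, e.g. because it has no known TFs."""
    open(os.path.join(model_dir, f'{gene}.orphan'), 'w').close()
```

Because `O_EXCL` creation fails if the file already exists, two workers that race for the same gene cannot both succeed.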

See also

Model — full reference for SpatialCellularProgramsEstimator, the core per-gene spatial regression engine.

BetaFrame — Spatial Coefficient Matrix

BetaFrame is a pandas.DataFrame subclass that stores the spatially varying regression coefficients (βs) for a single target gene.

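from SpaceTravLR.beta import BetaFrame

bf = BetaFrame.from_path('output/Myc_betadata.parquet')
# rw_ligands, rw_ligands_tfl and gex_df must already be defined; their construction is not shown here
bf_splashed = bf.splash(rw_ligands, rw_ligands_tfl, gex_df)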
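Because each cell has its own row of coefficients, a cell's predicted target expression can be read as a per-cell linear combination of its modulator values. The sketch below shows that idea with plain dictionaries, assuming an intercept column named `beta0` and modulator values supplied per cell; the column name and the helper `predict_target` are hypothetical, not BetaFrame's API.

```python
def predict_target(betas, modulators, intercept_key='beta0'):
    """Per-cell linear prediction: intercept + sum(beta_m * x_m) over modulators.

    betas: {cell_id: {name: coefficient}}; modulators: {cell_id: {name: value}}.
    """
    preds = {}
    for cell, coefs in betas.items():
        x = modulators[cell]
        preds[cell] = coefs.get(intercept_key, 0.0) + sum(
            b * x[name] for name, b in coefs.items() if name != intercept_key
        )
    return preds
```

Two cells with the same modulator value can receive different predictions because their coefficients differ, which is what "spatially varying" means here.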

Betabase — Collection of BetaFrames

Betabase manages loading and caching of BetaFrame objects from disk for all trained genes.

class SpaceTravLR.beta.Betabase(adata, folder, gene_subset=None, subsample=None, float16=True, obs_names=None, genes=None, randomize=False, auto_load=True)

Bases: object

Holds a collection of BetaFrames for each gene.

__init__(adata, folder, gene_subset=None, subsample=None, float16=True, obs_names=None, genes=None, randomize=False, auto_load=True)
collect_interactions(cell_type, annot='cell_type', aggregate='mean')
load_betadata(gene_name)
load_betas_from_disk(obs_names=None)

obs_names are the string cell identifiers from adata.obs_names.

Visionary — Cross-Dataset Prediction

Visionary enables transferring trained spatial models from a reference dataset to a new test dataset.

class SpaceTravLR.visionary.Visionary(ref_adata, test_adata, ref_json_path, prematching, matching_annot='cell_type', subsample=None, override_params=None)

Bases: GeneFactory

A class for cross-predicting gene expression from a reference dataset to a test dataset. Reference and test datasets can differ in sample or modality, but should have similar spatial resolution so that spots can be mapped to each other.

__init__(ref_adata, test_adata, ref_json_path, prematching, matching_annot='cell_type', subsample=None, override_params=None)
compute_betas(subsample=None, float16=True)
static load_betadata(gene, save_dir, matching)
reformat()
splash_betas(gene)

Computes derivatives by splitting ligand terms into their individual gene components, converting betadata from a cell × modulator matrix into a cell × gene matrix.

Parameters:
  • gene (str) – The gene to compute derivatives for.

Returns:

DataFrame with derivatives for each cell at each location.

Return type:

pd.DataFrame

CyberBoss — Multi-Resolution Transfer

CyberBoss extends Visionary for datasets with different spatial resolutions (e.g. single-cell → Visium spots).

class SpaceTravLR.visionary.CyberBoss(ref_adata, test_adata, ref_json_path, prematching, subsample=None)

Bases: Visionary

A class for cross-predicting gene expression from a reference dataset to a test dataset. Reference and test datasets can differ both in spatial resolution and in context.

__init__(ref_adata, test_adata, ref_json_path, prematching, subsample=None)
compute_betas(subsample=None, float16=False)
reformat()
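When the reference is single-cell and the test data are spots, one simple way to bring reference quantities to spot resolution is to average them over the reference cells matched to each spot. The sketch below assumes such a precomputed spot-to-cells matching and uses plain averaging; it illustrates the resolution gap only and is not necessarily how CyberBoss or `prematching` work. `aggregate_to_spots` is hypothetical.

```python
def aggregate_to_spots(cell_values, matching):
    """Average per-cell values over the reference cells matched to each spot.

    cell_values: {cell_id: float}; matching: {spot_id: [cell_id, ...]}.
    Spots with no matched cells are omitted from the result.
    """
    return {
        spot: sum(cell_values[c] for c in cells) / len(cells)
        for spot, cells in matching.items()
        if cells
    }
```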

Astronaut — Distributed Training Runner

Astronaut is a subclass of SpaceTravLR that uses pre-computed spatial feature maps (e.g. COVET_SQRT) rather than deriving them at training time.

class SpaceTravLR.astronomer.Astronaut(*args, **kwargs)

Bases: SpaceTravLR

__init__(*args, **kwargs)
run(sp_maps_key='COVET_SQRT')