Core¶
SpaceShip — Main way to get where you’re going¶
SpaceShip is the recommended entry point. It combines
data preprocessing, GRN inference, ligand-receptor communication analysis,
and worker spawning in a single configurable object.
from SpaceTravLR import SpaceShip
ship = SpaceShip(name='MyTissue', outdir='./output')
ship.setup_(adata) # preprocess, run CellOracle, (optionally COMMOT)
ship.spawn_worker() # submit a SLURM job, or call SpaceTravLR directly
- class SpaceTravLR.spaceship.SpaceShip(name='AlienTissue', outdir='./output')[source]¶
Bases: object
SpaceShip is the main entry point for the SpaceTravLR analysis pipeline. It manages the data, directory structure, and execution of the various steps involved in spatial gene regulatory network inference and perturbation.
- Parameters:
name (str, optional) – Name of the project/analysis, by default ‘AlienTissue’.
outdir (str, optional) – Path to the output directory where results will be saved, by default ‘./output’.
- interactive_select(adata, size=10, annot='cell_type', mode='spatial')[source]¶
Launches an interactive scatter plot for selecting cells.
- Parameters:
adata (ad.AnnData) – AnnData object.
size (int, optional) – Point size, by default 10.
annot (str, optional) – Color by annotation, by default ‘cell_type’.
mode (str, optional) – ‘spatial’ or ‘umap’, by default ‘spatial’.
- Returns:
Interactive scatter plot widget.
- Return type:
jscatter.Scatter
- is_everything_ok()[source]¶
Checks if all necessary output files and directories exist.
- Returns:
True if all checks pass.
- Return type:
bool
- perturb(target, propagation=4, gene_expr=0, cells=None)[source]¶
Performs in silico perturbation of a target gene.
Simulates the effect of changing a gene’s expression (knockout or overexpression) on the entire transcriptome, considering spatial signaling propagation.
- Parameters:
target (str or list) – Target gene(s) to perturb.
propagation (int, optional) – Number of propagation steps (hops) in the network, by default 4.
gene_expr (float or list, optional) – Target expression level (0 for knockout), by default 0.
cells (list, optional) – List of cell indices to apply perturbation to (None for all cells), by default None.
- Returns:
Simulated gene expression matrix after perturbation.
- Return type:
pd.DataFrame
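The role of the propagation steps can be illustrated with a toy linear model. The sketch below is not SpaceTravLR's implementation: it assumes a small hand-written coefficient table (`coeffs`, hypothetical) in which each gene's change is a weighted sum of its regulators' changes, and it leaves out spatial signaling entirely. It only shows how a knockout reaches one more layer of downstream genes with each step.

```python
# Toy illustration only: hypothetical linear coefficients, no spatial terms.
# coeffs[target][regulator] = effect of the regulator's change on the target.
coeffs = {
    "B": {"A": 0.5},   # A activates B
    "C": {"B": -2.0},  # B represses C
}
baseline = {"A": 4.0, "B": 3.0, "C": 1.0}


def toy_perturb(baseline, coeffs, target, gene_expr=0.0, propagation=2):
    """Clamp `target` to `gene_expr` and propagate the change hop by hop."""
    delta = {gene: 0.0 for gene in baseline}
    delta[target] = gene_expr - baseline[target]
    for _ in range(propagation):
        new_delta = {}
        for gene in baseline:
            if gene == target:
                # The perturbed gene stays clamped at the requested level.
                new_delta[gene] = delta[target]
            else:
                # Each other gene responds to the changes in its regulators.
                new_delta[gene] = sum(
                    weight * delta[reg]
                    for reg, weight in coeffs.get(gene, {}).items()
                )
        delta = new_delta
    # Simulated expression is floored at zero.
    return {gene: max(baseline[gene] + delta[gene], 0.0) for gene in baseline}


one_hop = toy_perturb(baseline, coeffs, "A", propagation=1)
two_hops = toy_perturb(baseline, coeffs, "A", propagation=2)
```

With one step only the direct target B responds to the knockout of A; the effect on C appears at the second step, which is why the number of propagation steps bounds how far a perturbation can travel through the network.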
- process_adata_(adata, annot='cell_type')[source]¶
Preprocesses the AnnData object for SpaceTravLR analysis.
This method checks for required fields, normalizes data if necessary, computes PCA/Neighbors/UMAP if missing, and imputes gene expression if needed. It saves the processed AnnData to the output directory.
- Parameters:
adata (ad.AnnData) – The AnnData object containing the spatial transcriptomics data.
annot (str, optional) – The column name in adata.obs containing cell type annotations, by default ‘cell_type’.
- run_celloracle_(alpha=5)[source]¶
Runs CellOracle to infer the base Gene Regulatory Network (GRN).
It constructs a cluster-specific GRN based on the base network structure and the expression data in the AnnData object.
- Parameters:
alpha (int, optional) – Regularization parameter for the model, by default 5.
- run_commot_(radius=350)[source]¶
Runs COMMOT to infer spatial cell-cell communication.
This method identifies ligand-receptor interactions and computes their spatial communication scores. It also computes received ligand signals for each cell.
- Parameters:
radius (int, optional) – Spatial radius for communication in microns (or coordinate units), by default 350.
- run_spacetravlr(max_epochs=150, learning_rate=0.005, spatial_dim=64, batch_size=512, radius=300, contact_distance=50)[source]¶
Trains the SpaceTravLR model to learn spatial gene regulation.
This method initializes and trains the SpaceTravLR neural network model to predict gene expression based on TF activity and spatial ligand-receptor interactions.
- Parameters:
max_epochs (int, optional) – Maximum number of training epochs, by default 150.
learning_rate (float, optional) – Learning rate for the optimizer, by default 5e-3.
spatial_dim (int, optional) – Dimension of the spatial embedding, by default 64.
batch_size (int, optional) – Batch size for training, by default 512.
radius (int, optional) – Radius for secreted signaling, by default 300.
contact_distance (int, optional) – Distance for contact-dependent signaling, by default 50.
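To make the difference between radius and contact_distance concrete, the following sketch classifies the neighbors of one cell by Euclidean distance. It is an illustration under stated assumptions, not the library's code: the coordinates are invented, both thresholds are treated as inclusive, and cells within contact range are also counted as within secreted range.

```python
import math

# Hypothetical cell coordinates, in the same units as `radius`.
coords = {
    "cell_a": (0.0, 0.0),
    "cell_b": (18.0, 24.0),   # 30 units from cell_a
    "cell_c": (0.0, 250.0),   # 250 units from cell_a
    "cell_d": (400.0, 0.0),   # 400 units from cell_a
}


def classify_neighbors(coords, source, radius=300, contact_distance=50):
    """Split the other cells into contact-range and secreted-range neighbors."""
    contact, secreted = [], []
    for name, xy in coords.items():
        if name == source:
            continue
        dist = math.dist(coords[source], xy)
        if dist <= contact_distance:
            contact.append(name)
        if dist <= radius:
            secreted.append(name)
    return contact, secreted


contact, secreted = classify_neighbors(coords, "cell_a")
```

Here only cell_b is close enough for contact-dependent signaling, cell_b and cell_c are within reach of secreted signals, and cell_d is outside both ranges.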
- setup_(adata, overwrite=False, run_commot=False)[source]¶
Sets up the SpaceShip environment and runs the preprocessing pipeline.
This includes creating directories, processing AnnData, running CellOracle, and, if requested, running COMMOT.
- Parameters:
adata (ad.AnnData) – Input AnnData object.
overwrite (bool, optional) – If True, overwrites existing output directory, by default False.
run_commot (bool, optional) – If True, also runs COMMOT, by default False.
- Returns:
Returns self for method chaining.
- Return type:
SpaceShip
- setup_perturbations(adata, override_params=None, subsample=None, use_float16=False)[source]¶
Initializes the GeneFactory for running perturbations.
- Parameters:
adata (ad.AnnData) – AnnData object used for perturbation simulations.
override_params (dict, optional) – Dictionary to override run parameters, by default None.
subsample (int, optional) – Number of cells to subsample for faster loading, by default None.
use_float16 (bool, optional) – Use float16 for lower memory usage, by default False.
- spawn_worker(partition='preempt', clusters='gpu', gres='gpu:1', job_name='SpaceTravLR', lifespan=3, python_path='python')[source]¶
Submits a SLURM job to run the analysis.
- Parameters:
partition (str, optional) – SLURM partition, by default ‘preempt’.
clusters (str, optional) – SLURM cluster, by default ‘gpu’.
gres (str, optional) – Generic Resource Scheduling (e.g. gpu:1), by default ‘gpu:1’.
job_name (str, optional) – Name of the job, by default ‘SpaceTravLR’.
lifespan (int, optional) – Wall-time in hours, by default 3.
python_path (str, optional) – Path to python executable, by default ‘python’.
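As a rough picture of how these arguments map onto a SLURM submission, the sketch below renders a minimal sbatch script from the same parameter names. How spawn_worker actually builds and submits its job is not documented here, so the function `build_sbatch_script` and the entry-point file `run.py` are hypothetical; only the `#SBATCH` options themselves are standard SLURM flags.

```python
def build_sbatch_script(partition="preempt", clusters="gpu", gres="gpu:1",
                        job_name="SpaceTravLR", lifespan=3,
                        python_path="python", script="run.py"):
    """Render a minimal sbatch script; `script` is a hypothetical entry point."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --clusters={clusters}",
        f"#SBATCH --gres={gres}",
        # `lifespan` is given in hours; sbatch accepts hours:minutes:seconds.
        f"#SBATCH --time={int(lifespan):02d}:00:00",
        "",
        f"{python_path} {script}",
    ]
    return "\n".join(lines) + "\n"


script_text = build_sbatch_script(lifespan=3)
```

Printing `script_text` shows the header that a three-hour GPU job on the `preempt` partition would carry under these assumptions.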
SpaceTravLR — Training Orchestrator¶
SpaceTravLR.oracles.SpaceTravLR manages the training queue and dispatches
SpatialCellularProgramsEstimator
for each target gene.
from SpaceTravLR.oracles import SpaceTravLR
model = SpaceTravLR(
    adata,
    save_dir='./models',
    annot='cell_type_int',
    grn=grn,
    max_epochs=100,
    radius=200,
)
model.run()
- class SpaceTravLR.oracles.SpaceTravLR(adata, save_dir='./models', annot='cell_type_int', grn=None, max_epochs=50, spatial_dim=64, learning_rate=0.005, batch_size=512, rotate_maps=True, layer='imputed_count', alpha=0.05, threshold_lambda=1e-06, tflinks=None, tf_ligand_cutoff=0.01, radius=200, contact_distance=30, skip_clusters=None, scale_factor=1)[source]¶
Bases: BaseTravLR
- __init__(adata, save_dir='./models', annot='cell_type_int', grn=None, max_epochs=50, spatial_dim=64, learning_rate=0.005, batch_size=512, rotate_maps=True, layer='imputed_count', alpha=0.05, threshold_lambda=1e-06, tflinks=None, tf_ligand_cutoff=0.01, radius=200, contact_distance=30, skip_clusters=None, scale_factor=1)[source]¶
GeneFactory — Perturbation Engine¶
GeneFactory loads pre-trained spatial coefficients
(betas) and uses them to simulate in-silico perturbations.
from SpaceTravLR.gene_factory import GeneFactory
gf = GeneFactory.from_json(adata, 'output/params.json')
gf.load_betas()
# Single-gene knockout
result_df = gf.perturb(target='Myc', n_propagation=4, gene_expr=0)
# Whole-genome screen
gf.genome_screen(save_to='./ko_results', mode='knockout')
- class SpaceTravLR.gene_factory.GeneFactory(adata, models_dir, annot='cell_type_int', radius=200, contact_distance=30, scale_factor=1, beta_scale_factor=1, beta_cap=None, co_grn=None)[source]¶
Bases: BaseTravLR
GeneFactory handles the loading of trained models (betas) and facilitates in silico perturbations. It effectively acts as a factory for generating simulated gene expression profiles under various perturbation conditions.
- Parameters:
adata (ad.AnnData) – AnnData object containing the data.
models_dir (str) – Directory where the trained models (betadata) are stored.
annot (str, optional) – Annotation key in adata.obs, by default ‘cell_type_int’.
radius (int, optional) – Spatial radius for signaling, by default 200.
contact_distance (int, optional) – Contact distance for signaling, by default 30.
scale_factor (int, optional) – Scaling factor for spatial coordinates, by default 1.
beta_scale_factor (int, optional) – Scaling factor for beta values, by default 1.
beta_cap (float, optional) – Cap for beta values to prevent explosions in simulation, by default None.
co_grn (object, optional) – CellOracle GRN object, by default None.
- __init__(adata, models_dir, annot='cell_type_int', radius=200, contact_distance=30, scale_factor=1, beta_scale_factor=1, beta_cap=None, co_grn=None)[source]¶
- classmethod from_json(adata, json_path, override_params=None, beta_scale_factor=1, beta_cap=None, co_grn=None)[source]¶
Creates a GeneFactory instance from a parameters JSON file.
- Parameters:
adata (ad.AnnData) – AnnData object.
json_path (str) – Path to the JSON file containing run parameters.
override_params (dict, optional) – Dictionary to override parameters from JSON, by default None.
beta_scale_factor (int, optional) – Scaling factor for beta values, by default 1.
beta_cap (float, optional) – Cap for beta values, by default None.
co_grn (object, optional) – CellOracle GRN object, by default None.
- Returns:
Initialized GeneFactory instance.
- Return type:
GeneFactory
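The interaction between the JSON file and override_params can be pictured as a plain dictionary overlay. The sketch below assumes that overrides simply replace matching keys after the file is read; the helper `load_params` and the keys written to the temporary file are hypothetical and do not describe the real layout of params.json.

```python
import json
import os
import tempfile


def load_params(json_path, override_params=None):
    """Read run parameters from JSON, then apply any overrides on top."""
    with open(json_path) as fh:
        params = json.load(fh)
    params.update(override_params or {})
    return params


# Hypothetical parameter file, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.json")
    with open(path, "w") as fh:
        json.dump({"radius": 200, "contact_distance": 30}, fh)
    params = load_params(path, override_params={"radius": 150})
```

Keys that are not overridden keep the value stored in the file, so a single setting can be changed without editing the saved run parameters.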
- genome_screen(save_to, n_propagation=4, priority_genes=None, mode='knockout', cells=None)[source]¶
Perform a genome-wide perturbation screen (knockout or overexpression).
Iterates through all possible targets (TFs, ligands, receptors) and performs the specified perturbation, saving the results to disk.
- Parameters:
save_to (str) – Directory to save the results.
n_propagation (int, optional) – Number of propagation steps, by default 4.
priority_genes (list, optional) – List of genes to prioritize in the screen, by default None.
mode (str, optional) – ‘knockout’ or ‘overexpress’, by default ‘knockout’.
cells (list, optional) – List of cell indices to restrict perturbation to, by default None.
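One plausible reading of priority_genes is that those genes are screened first and the remaining targets follow. The sketch below shows such an ordering; it is an assumption about the behavior, the helper `order_targets` is hypothetical, and the gene names are illustrative.

```python
def order_targets(possible_targets, priority_genes=None):
    """Return targets with priority genes first, without duplicates."""
    available = set(possible_targets)
    ordered = []
    seen = set()
    for gene in list(priority_genes or []) + list(possible_targets):
        # Skip priority genes that are not valid targets, and repeats.
        if gene in available and gene not in seen:
            ordered.append(gene)
            seen.add(gene)
    return ordered


queue = order_targets(
    ["Gata1", "Myc", "Sox2", "Tgfb1"],
    priority_genes=["Sox2", "Unknown", "Myc"],
)
```

In this sketch a priority gene that is not among the possible targets is silently dropped rather than raising an error.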
- load_betas(subsample=None, float16=False, obs_names=None)[source]¶
Loads the spatial gene regulatory coefficients (betas) from disk.
- Parameters:
subsample (int, optional) – Number of cells to subsample, by default None.
float16 (bool, optional) – Use float16 precision to save memory, by default False.
obs_names (list, optional) – List of cell names to load betas for, by default None.
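The memory trade-off behind the float16 option can be checked with the standard library alone. This sketch does not touch the beta files; it only compares the storage size and rounding error of half, single, and double precision for one example value, using the `struct` format codes `e`, `f`, and `d`.

```python
import struct

value = 0.1

# Round-trip the value through half precision ("e") and single precision ("f").
as_float16 = struct.unpack("e", struct.pack("e", value))[0]
as_float32 = struct.unpack("f", struct.pack("f", value))[0]

# Bytes needed per stored number in each format.
bytes_per_value = {fmt: struct.calcsize(fmt) for fmt in ("e", "f", "d")}

error16 = abs(as_float16 - value)
error32 = abs(as_float32 - value)
```

Half precision halves the footprint relative to float32 at the cost of a larger rounding error, which is the compromise to weigh when loading coefficients for many cells and genes.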
- perturb(target, n_propagation=4, gene_expr=0, cells=None, save_layer=False, delta_dir=None)[source]¶
Simulates perturbation of a target gene and propagates the effect.
- Parameters:
target (str or list) – Target gene(s) to perturb.
n_propagation (int, optional) – Number of propagation steps, by default 4.
gene_expr (float, optional) – Expression level of the target gene (0 for knockout), by default 0.
cells (list, optional) – List of cell indices to apply perturbation to, by default None.
save_layer (bool, optional) – Whether to save the result as a layer in adata, by default False.
delta_dir (str, optional) – Directory to save delta matrices, by default None.
- Returns:
DataFrame containing the simulated gene expression.
- Return type:
pd.DataFrame
- perturb_batch(target_genes, save_to=None, n_propagation=4, gene_expr=0, cells=None)[source]¶
Runs perturbations for a batch of target genes.
- Parameters:
target_genes (list) – List of genes to perturb.
save_to (str, optional) – Directory to save results, by default None.
n_propagation (int, optional) – Number of propagation steps, by default 4.
gene_expr (float, optional) – Target expression level, by default 0.
cells (list, optional) – List of cells to apply perturbation to, by default None.
- property possible_targets¶
- splash_betas(gene, obs_names=None)[source]¶
Computes the derivatives by splitting up ligand terms into individual gene components. This essentially converts betadata of cell x modulators into cell x genes.
- Parameters:
gene (str) – The gene to compute derivatives for.
obs_names (list, optional) – List of cell names to compute derivatives for, by default all.
- Returns:
DataFrame with derivatives for each cell at each location
- Return type:
pd.DataFrame
VirtualTissue — In-Silico Tissue Simulation¶
VirtualTissue provides a high-level interface for
visualising perturbation effects across the spatial tissue map.
from SpaceTravLR.virtual_tissue import VirtualTissue
vt = VirtualTissue(
    adata,
    betadatas_path='./output/betadata',
    ko_path='./ko_results',
)
impact = vt.compute_ko_impact(['Myc', 'Sox2'])
vt.plot_radar(['Myc'], impact_df=impact)
vt.plot_arrows(perturb_target='Myc', threshold=0.1)
SubsampledTissue¶
An extension of VirtualTissue that aggregates
results across multiple spatial sub-samples.
OracleQueue — Training Job Queue¶
OracleQueue manages the set of genes waiting to be
modelled, with file-based locking for safe multi-agent parallel training.
- class SpaceTravLR.oracles.OracleQueue(model_dir, all_genes, priority_genes=None, lock_timeout=3600)[source]¶
Bases: object
A job manager for training gene models in parallel. Ideal for HPC environments.
- __init__(model_dir, all_genes, priority_genes=None, lock_timeout=3600)[source]¶
- Parameters:
model_dir (str) – Directory to store the trained model weights.
all_genes (list) – List of all genes to train models for.
priority_genes (list, optional) – List of genes to train models for first.
lock_timeout (int, optional) – Timeout for job locks in seconds.
- add_orphan(gene)[source]¶
Creates a .orphan file to mark a gene as untrainable. Used when a gene-gene network is too sparse to train a model, for example when a gene has no known TFs.
- property age¶
Return the age of the queue in seconds.
- property agents¶
- property completed_genes¶
- create_lock(gene)[source]¶
Create a lock for a gene while the model is being trained. This prevents multiple processes from training the same gene.
- property is_empty¶
- property num_orphans¶
- property regulated_genes¶
- property remaining_genes¶
See also
Model — full reference for
SpatialCellularProgramsEstimator,
the core per-gene spatial regression engine.
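File-based locking of the kind OracleQueue describes is commonly built on exclusive file creation. The sketch below is a generic version of that pattern, not the library's code: the `.lock` suffix, the helper `try_lock`, and the choice to treat a lock older than lock_timeout as abandoned are all assumptions. Removing a stale lock is itself not race-free, so a production queue would need more care than this.

```python
import os
import tempfile
import time


def try_lock(model_dir, gene, lock_timeout=3600):
    """Try to claim `gene`; return True if this process now holds the lock."""
    lock_path = os.path.join(model_dir, f"{gene}.lock")
    for _ in range(2):  # the second pass runs only after a stale lock is removed
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            age = time.time() - os.path.getmtime(lock_path)
            if age < lock_timeout:
                return False  # another worker holds a fresh lock
            os.remove(lock_path)  # stale lock: assume its owner died
            continue
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))  # record the owner for debugging
        return True
    return False


with tempfile.TemporaryDirectory() as model_dir:
    first = try_lock(model_dir, "Myc")    # claims the gene
    second = try_lock(model_dir, "Myc")   # refused: the lock is fresh
    # Backdate the lock by two hours to simulate an abandoned job.
    lock_path = os.path.join(model_dir, "Myc.lock")
    old = time.time() - 7200
    os.utime(lock_path, (old, old))
    third = try_lock(model_dir, "Myc")    # stale lock is replaced
```

Because the lock lives on the shared filesystem, independent jobs on different nodes can coordinate without a central scheduler, provided they all see the same model directory.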
BetaFrame — Spatial Coefficient Matrix¶
BetaFrame is a pandas.DataFrame subclass that
stores the spatially-varying regression coefficients (βs) for a single target gene.
from SpaceTravLR.beta import BetaFrame
bf = BetaFrame.from_path('output/Myc_betadata.parquet')
bf_splashed = bf.splash(rw_ligands, rw_ligands_tfl, gex_df)
Betabase — Collection of BetaFrames¶
Betabase manages loading and caching of
BetaFrame objects from disk for all trained genes.
- class SpaceTravLR.beta.Betabase(adata, folder, gene_subset=None, subsample=None, float16=True, obs_names=None, genes=None, randomize=False, auto_load=True)[source]¶
Bases: object
Holds a collection of BetaFrames for each gene.
Visionary — Cross-Dataset Prediction¶
Visionary enables transferring trained spatial models
from a reference dataset to a new test dataset.
- class SpaceTravLR.visionary.Visionary(ref_adata, test_adata, ref_json_path, prematching, matching_annot='cell_type', subsample=None, override_params=None)[source]¶
Bases: GeneFactory
A class for cross-predicting gene expression from a reference dataset to a test dataset. Reference and test datasets can differ in sample or modality, but should have similar spatial resolution so that spots can be mapped to each other.
- __init__(ref_adata, test_adata, ref_json_path, prematching, matching_annot='cell_type', subsample=None, override_params=None)[source]¶
- splash_betas(gene)[source]¶
Computes the derivatives by splitting up ligand terms into individual gene components. This essentially converts betadata of cell x modulators into cell x genes.
- Parameters:
gene (str) – The gene to compute derivatives for.
obs_names (list, optional) – List of cell names to compute derivatives for, by default all.
- Returns:
DataFrame with derivatives for each cell at each location
- Return type:
pd.DataFrame
CyberBoss — Multi-Resolution Transfer¶
CyberBoss extends Visionary
for datasets with different spatial resolutions (e.g. single-cell → Visium spots).
- class SpaceTravLR.visionary.CyberBoss(ref_adata, test_adata, ref_json_path, prematching, subsample=None)[source]¶
Bases: Visionary
A class for cross-predicting gene expression from a reference dataset to a test dataset. Reference and test datasets can have different spatial resolutions and differ in context.
Astronaut — Distributed Training Runner¶
Astronaut is a subclass of
SpaceTravLR that uses pre-computed spatial feature maps
(e.g. COVET_SQRT) rather than deriving them at training time.