Getting started

Predicting cellular responses to perturbation across diverse contexts with State

Running on the server

If you are running on the server, follow the steps below:

git clone --branch experimental_setup git@github.com:DeepSpringAI/state.git
cd state

Installation

Downloading the dataset

Run the download script:

python data_download.py

chmod +x run.sh
make setup

Training

make train

Installation (not on the server)

If uv is not already installed, install it with:

pip install uv

Clone the repo and install the package:

git clone --branch experimental_setup git@github.com:DeepSpringAI/state.git
cd state
uv tool install -e .

CLI Usage

state --help

Set the dataset path

Set DATASET_PATH to the location of the examples directory, changing the value if yours differs:

export DATASET_PATH='examples'
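
How the repository's scripts consume this variable is not shown here. As a minimal sketch, a Python script could resolve the directory as follows; the helper name resolve_dataset_path and the fallback to 'examples' are assumptions made for illustration.

```python
import os
from pathlib import Path


def resolve_dataset_path(env=None, default="examples"):
    """Return the dataset directory named by DATASET_PATH.

    Falls back to `default` when the variable is unset or empty.
    `env` can be any mapping, which makes the helper easy to test.
    """
    env = os.environ if env is None else env
    return Path(env.get("DATASET_PATH") or default)


# Explicit value, as if `export DATASET_PATH=/data/state_examples` had been run.
print(resolve_dataset_path({"DATASET_PATH": "/data/state_examples"}))

# Variable not set: the assumed fallback is used.
print(resolve_dataset_path({}))
```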

State Transition Model (ST)

To train with a mixed experiment (including both zeroshot and fewshot tasks):

state tx train \
  data.kwargs.toml_config_path="$(pwd)/examples/mixed.toml" \
  data.kwargs.embed_key=X_hvg \
  data.kwargs.num_workers=32 \
  data.kwargs.batch_col=batch_var \
  data.kwargs.pert_col=target_gene \
  data.kwargs.cell_type_key=cell_type \
  data.kwargs.control_pert=TARGET1 \
  training.max_steps=5000 \
  training.batch_size=64 \
  training.lr=1e-4 \
  model=state \
  output_dir="./mixed_for_competition" \
  name="unified_model_mixed_for_the_meeting"

The cell lines and perturbations specified in the TOML should match the values in the columns named by data.kwargs.cell_type_key and data.kwargs.pert_col above. To evaluate State on the specified task, use the tx predict command:

state tx predict \
  --output-dir ./mixed_for_competition/unified_model_mixed_for_the_meeting/ \
  --checkpoint final.ckpt

The command looks for a checkpoints folder inside the output directory given above.
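
The paths in the two commands suggest that a run is written to output_dir/name and that the checkpoint sits in a checkpoints folder beneath it. A minimal sketch of that layout, useful for confirming that the file exists before calling tx predict, might look like this; the helper name expected_checkpoint is hypothetical and the exact layout is inferred from the commands above, not from the State source.

```python
from pathlib import Path


def expected_checkpoint(output_dir, name, checkpoint="final.ckpt"):
    """Build <output_dir>/<name>/checkpoints/<checkpoint> (assumed layout)."""
    return Path(output_dir) / name / "checkpoints" / checkpoint


ckpt = expected_checkpoint(
    "./mixed_for_competition",
    "unified_model_mixed_for_the_meeting",
)

if ckpt.is_file():
    print(f"found {ckpt}")
else:
    print(f"missing {ckpt}; check that training finished and wrote a checkpoint")
```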

If you instead want to use a trained checkpoint for inference (e.g. on data not specified in the TOML file):

state tx infer \
  --model-dir ./mixed_for_competition/unified_model_mixed_for_the_meeting/ \
  --adata competition_support_set/competition_val_template.h5ad \
  --output competition/prediction_new.h5ad \
  --pert-col target_gene

Data Preprocessing

State provides two preprocessing commands to prepare data for training and inference:

Training Data Preprocessing

Use preprocess_train to normalize, log-transform, and select highly variable genes from your training data:

state tx preprocess_train \
  --adata /path/to/raw_data.h5ad \
  --output /path/to/preprocessed_training_data.h5ad \
  --num_hvgs 2000

This command:

Normalizes total counts per cell (sc.pp.normalize_total)
Applies log1p transformation (sc.pp.log1p)
Identifies highly variable genes (sc.pp.highly_variable_genes)
Stores the HVG expression matrix in .obsm['X_hvg']
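
To make the arithmetic of these steps concrete, here is a small pure-Python illustration on a toy count matrix. It is not the scanpy implementation: the target_sum value is chosen arbitrarily, and genes are ranked by plain variance, which is a simplification of what sc.pp.highly_variable_genes does. All function names are hypothetical.

```python
import math
from statistics import pvariance


def normalize_total(counts, target_sum):
    """Scale each cell (row) so that its counts sum to target_sum."""
    out = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total else 0.0
        out.append([v * scale for v in row])
    return out


def log1p_transform(matrix):
    """Apply log(1 + x) to every entry."""
    return [[math.log1p(v) for v in row] for row in matrix]


def top_variable_genes(matrix, n_top):
    """Indices of the n_top columns with the highest variance (simplified ranking)."""
    n_genes = len(matrix[0])
    variances = [pvariance([row[g] for row in matrix]) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:n_top])


# Toy data: 3 cells (rows) by 4 genes (columns).
counts = [
    [10, 0, 5, 5],
    [2, 8, 5, 5],
    [0, 12, 4, 4],
]

normalized = normalize_total(counts, target_sum=100.0)
logged = log1p_transform(normalized)
hvg_idx = top_variable_genes(logged, n_top=2)

# Counterpart of the matrix stored in .obsm['X_hvg']: selected genes only.
x_hvg = [[row[g] for g in hvg_idx] for row in logged]
print(hvg_idx)
```

The first two genes vary strongly across the three cells while the last two are nearly constant, so the first two are the ones selected.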

Inference Data Preprocessing

Use preprocess_infer to create a "control template" for model inference:

state tx preprocess_infer \
  --adata /path/to/real_data.h5ad \
  --output /path/to/control_template.h5ad \
  --control_condition "DMSO" \
  --pert_col "treatment" \
  --seed 42
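
This README describes the result only as a "control template". One plausible reading, stated here as an assumption and not as documented behavior, is that each non-control cell keeps its perturbation label while its expression is replaced by that of a randomly drawn control cell, with the seed fixing the draw. The sketch below illustrates that idea on plain lists; make_control_template and the toy labels are hypothetical.

```python
import random


def make_control_template(expression, labels, control_condition, seed):
    """Replace the expression of non-control cells with sampled control cells.

    Labels are left untouched; only the expression rows change.
    """
    rng = random.Random(seed)
    control_rows = [
        row for row, label in zip(expression, labels) if label == control_condition
    ]
    if not control_rows:
        raise ValueError(f"no cells labelled {control_condition!r}")

    template = []
    for row, label in zip(expression, labels):
        if label == control_condition:
            template.append(list(row))  # control cells are kept as they are
        else:
            template.append(list(rng.choice(control_rows)))
    return template


# Toy data: two control cells and two treated cells.
expression = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0], [8.0, 7.0]]
labels = ["DMSO", "DMSO", "drugA", "drugB"]

template = make_control_template(expression, labels, "DMSO", seed=42)
for label, row in zip(labels, template):
    print(label, row)
```

Using a dedicated random.Random(seed) instance keeps the sampling reproducible without touching the global random state.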

Converting to the competition template

Run h5da_convertor.py to produce the .vcc file:

python h5da_convertor.py

Licenses

State code is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0).

The model weights and output are licensed under the Arc Research Institute State Model Non-Commercial License and subject to the Arc Research Institute State Model Acceptable Use Policy.

Any publication that uses this source code or model parameters should cite the State paper.
