This is the official codebase for scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI.
scGPT is available on PyPI. To install scGPT, run the following command:
$ pip install scgpt
[Optional] We recommend using wandb for logging and visualization.
$ pip install wandb
For development, we use the Poetry package manager. To install Poetry, follow the instructions here.
$ git clone this-repo-url
$ cd scGPT
$ poetry install
Note: The flash-attn dependency usually requires a specific GPU and CUDA version. If you encounter any issues, please refer to the flash-attn repository for installation instructions. As of May 2023, we recommend using CUDA 11.7 and flash-attn<1.0.5 because of various issues reported when installing newer versions of flash-attn.
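To check an existing environment against the flash-attn<1.0.5 recommendation, a small script along these lines may help. This is a minimal sketch and not part of scGPT: the helper names `version_tuple` and `flash_attn_status` are hypothetical, it assumes the package is registered under the distribution name `flash-attn`, and it compares only the leading numeric parts of the version string.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '1.0.4' into (1, 0, 4), stopping at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # e.g. 'post1' or 'dev0' suffixes are ignored in this sketch
        parts.append(int(match.group()))
    return tuple(parts)


def flash_attn_status(max_exclusive: str = "1.0.5") -> str:
    """Report whether the installed flash-attn is below the recommended bound."""
    try:
        # Assumption: the distribution is installed under the name 'flash-attn'.
        installed = metadata.version("flash-attn")
    except metadata.PackageNotFoundError:
        return "flash-attn is not installed"
    if version_tuple(installed) < version_tuple(max_exclusive):
        return f"flash-attn {installed} satisfies <{max_exclusive}"
    return f"flash-attn {installed} is not below the recommended bound <{max_exclusive}"


if __name__ == "__main__":
    print(flash_attn_status())
```

The script only inspects package metadata, so it can run without a GPU. It does not verify the CUDA version.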
Please download the pretrained scGPT checkpoints from here.
Please see our example code in examples/finetune_integration.py. By default, the script assumes the scGPT checkpoint folder is stored in the examples/save directory.
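Before running the example, it can be useful to confirm that a checkpoint folder is actually present under examples/save. The following is a minimal sketch under stated assumptions: `find_checkpoint_dirs` is a hypothetical helper, not part of scGPT, and it only lists subfolders without checking what files they contain.

```python
from pathlib import Path


def find_checkpoint_dirs(save_dir: str = "examples/save") -> list:
    """Return the subfolders of save_dir, or an empty list if it does not exist."""
    root = Path(save_dir)
    if not root.is_dir():
        return []
    # Each subfolder is treated as a candidate checkpoint folder (assumption).
    return sorted(p for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    candidates = find_checkpoint_dirs()
    if not candidates:
        print("No checkpoint folder found under examples/save")
    for folder in candidates:
        print(f"Found candidate checkpoint folder: {folder}")
```

Run it from the repository root so that the relative path examples/save resolves as the example script expects.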
- Upload the pretrained model checkpoint
- Publish to PyPI
- Provide the pretraining code with generative attention masking
- Finetuning examples for multi-omics integration, cell type annotation, perturbation prediction, cell generation
- Example code for Gene Regulatory Network analysis
- Documentation website with readthedocs
- Bump up to PyTorch 2.0
- New pretraining on larger datasets
- Reference mapping example
We greatly welcome contributions to scGPT. Please submit a pull request if you have any ideas or bug fixes. We also welcome any issues you encounter while using scGPT.
We sincerely thank the authors of the following open-source projects:
@article{cui2023scGPT,
title={scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI},
author={Cui, Haotian and Wang, Chloe and Maan, Hassaan and Wang, Bo},
journal={bioRxiv},
year={2023},
publisher={Cold Spring Harbor Laboratory}
}