GitHub - STOmics/cellbin2: This repository is for CellBin research group

cellbin2: A framework for generating single-cell gene expression data

Introduction

CellBin is an image processing pipeline designed to delineate cell boundaries for spatial analysis. It consists of several image analysis steps. Given image and gene expression data as input, CellBin performs image registration, tissue segmentation, nuclei segmentation, and molecular labeling (i.e., cell border expansion), ultimately defining the molecular boundaries of individual cells. It incorporates a suite of self-developed algorithms, including deep-learning models, for each analysis task. The processed data is then mapped onto the chip to extract molecular information, resulting in an accurate single-cell expression matrix. For more information on CellBin, please refer to the following link.
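The molecular labeling step grows each segmented nucleus outward so that nearby pixels are assigned to that cell. As a rough illustration only, the sketch below expands every labeled nucleus by a fixed number of pixels using a multi-source breadth-first search, similar in spirit to scikit-image's `expand_labels`. It is not CellBin's implementation; the function name `expand_nuclei`, the `max_steps` parameter, and the use of 4-connected pixel distance are assumptions made for this example.

```python
from collections import deque
from typing import List

Grid = List[List[int]]  # 0 = background, positive integers = nucleus labels


def expand_nuclei(nuclei: Grid, max_steps: int) -> Grid:
    """Grow each nonzero label outward by up to max_steps 4-connected pixels.

    Background pixels are claimed by the first label that reaches them in
    breadth-first order, so expanded cells never overlap and the input
    grid is left unmodified.
    """
    h, w = len(nuclei), len(nuclei[0])
    out = [row[:] for row in nuclei]
    # dist = -1 marks unclaimed background; seed pixels start at distance 0.
    dist = [[0 if nuclei[y][x] else -1 for x in range(w)] for y in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if nuclei[y][x])
    while queue:
        y, x = queue.popleft()
        if dist[y][x] == max_steps:
            continue  # reached the expansion limit for this front
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] == -1:
                dist[ny][nx] = dist[y][x] + 1
                out[ny][nx] = out[y][x]
                queue.append((ny, nx))
    return out


if __name__ == "__main__":
    print(expand_nuclei([[0, 1, 0, 0, 0, 2, 0]], 1))  # prints [[1, 1, 1, 0, 2, 2, 2]]
```

Ties between equidistant nuclei are resolved by queue order; a production pipeline would typically work on NumPy arrays with a Euclidean distance transform instead of pure Python loops.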

CellBin2 is an upgraded version of the original CellBin platform with two key enhancements:

  1. Expanded Algorithm Library: Incorporates additional image processing algorithms to support broader application scenarios, such as single-cell RNA-seq and Plant cellbin.
  2. Configurable Architecture: Refactored codebase allows users to customize analysis pipelines through JSON and YAML configuration files.

Installation and Quick Start

Linux

# Clone the main repository
git clone https://github.com/STOmics/cellbin2
# git clone -b <branch> https://github.com/STOmics/cellbin2

# Create and activate a Conda environment
conda create --name cellbin2 python=3.8
conda activate cellbin2

# Install package dependencies
cd cellbin2
pip install ".[cp,rs]"

# For development mode (optional):
# pip install -e ".[cp,rs]"      # Editable install with basic extras
# pip install -e ".[cp,rs,rp]"   # Editable install including the report module
# If package installation fails, refer to pyproject.toml for more details.

# Execute the demo (takes ~30-40 minutes on GPU hardware)
python demo.py

Performance Note:
We strongly recommend using GPU acceleration for optimal performance. Below is a runtime comparison of the two processing modes for an S1 chip (1 cm² chip area):

| Processing Mode | Runtime |
|---|---|
| GPU | 30-40 min |
| CPU | 6-7 hours |

Benchmark hardware:
GPU: NVIDIA GeForce RTX 3060
CPU: AMD Ryzen 7 5800H
Memory: 16GB
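The two runtime ranges above imply a rough GPU speedup range on this benchmark hardware. The small helper below (the name `speedup_range` is hypothetical) just makes the arithmetic explicit: the lower bound compares the fastest CPU run with the slowest GPU run, and the upper bound the reverse. It is a reading of the listed numbers, not a general performance claim.

```python
from typing import Tuple


def speedup_range(cpu_hours: Tuple[float, float],
                  gpu_minutes: Tuple[float, float]) -> Tuple[float, float]:
    """Return (min, max) CPU/GPU runtime ratio from (low, high) ranges."""
    cpu_low, cpu_high = (h * 60 for h in cpu_hours)  # convert hours to minutes
    gpu_low, gpu_high = gpu_minutes
    # Fastest CPU vs slowest GPU gives the lower bound, and vice versa.
    return cpu_low / gpu_high, cpu_high / gpu_low


if __name__ == "__main__":
    low, high = speedup_range((6, 7), (30, 40))
    print(f"GPU speedup: {low:.0f}x to {high:.0f}x")  # prints "GPU speedup: 9x to 14x"
```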

If the pipeline defaults to CPU mode unexpectedly, follow our GPU troubleshooting guide to verify your hardware setup.
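Before working through the troubleshooting guide, a quick look at the environment can rule out the most common causes. The sketch below is a hypothetical helper (not part of cellbin2) that reports the `CUDA_VISIBLE_DEVICES` value, whether `nvidia-smi` is on the `PATH`, and the GPU names it lists. It does not check the Python packages the pipeline uses for GPU inference, and note that `nvidia-smi` lists all GPUs regardless of `CUDA_VISIBLE_DEVICES`.

```python
import os
import shutil
import subprocess
from typing import Any, Dict


def gpu_environment_report() -> Dict[str, Any]:
    """Collect basic facts about GPU visibility for the current process."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    report: Dict[str, Any] = {
        "CUDA_VISIBLE_DEVICES": visible,
        # An empty string hides every GPU from CUDA applications.
        "all_gpus_hidden": visible == "",
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        "gpus": [],
    }
    if report["nvidia_smi_found"]:
        try:
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
                capture_output=True, text=True, timeout=30, check=True,
            )
            report["gpus"] = [line.strip() for line in result.stdout.splitlines() if line.strip()]
        except (subprocess.SubprocessError, OSError):
            pass  # driver tool present but the query failed; leave the list empty
    return report


if __name__ == "__main__":
    for key, value in gpu_environment_report().items():
        print(f"{key}: {value}")
```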

Output Verification:
After completion, verify the results by comparing the generated files against the list in the Outputs section.

Tutorials

Core Workflow

The cellbin_pipeline.py script serves as the main entry point for CellBin2 analysis. It supports two configuration approaches:

  1. Configuration files: Use JSON files for full customization
  2. Command-line arguments: Quick setup using key parameters with kit-based defaults

📘 Configuration Guide:
See JSON Configuration Documentation for full parameter specifications.

Basic Usage

# Minimal configuration (requires complete parameters in JSON)
CUDA_VISIBLE_DEVICES=0 python cellbin2/cellbin_pipeline.py -c <SN> -p <config.json> -o <output_dir> 

# Kit-based configuration (auto-loads predefined settings)
CUDA_VISIBLE_DEVICES=0 python cellbin2/cellbin_pipeline.py -c <SN> -i <image.tif> -s <stain_type> -m <expression.gef> -o <output_dir> -k "Kit Name"

# View all available parameters
python cellbin2/cellbin_pipeline.py -h
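When the pipeline is launched from another Python script rather than a shell, the `CUDA_VISIBLE_DEVICES=0` prefix has to be passed through the child process environment instead. The sketch below builds the configuration-file invocation using only the `-c`, `-p`, and `-o` flags documented here; the helper names `config_mode_command` and `launch` are hypothetical, and it assumes the script is run from the repository root.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional, Tuple


def config_mode_command(sn: str, config_path: str, output_dir: str,
                        gpu: Optional[str] = "0") -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for the JSON-configuration invocation."""
    # sys.executable is the interpreter of the active (e.g. conda) environment.
    cmd = [sys.executable, "cellbin2/cellbin_pipeline.py",
           "-c", sn, "-p", config_path, "-o", output_dir]
    env = dict(os.environ)  # copy, so the current process environment is untouched
    if gpu is not None:
        env["CUDA_VISIBLE_DEVICES"] = gpu  # same effect as the shell prefix
    return cmd, env


def launch(cmd: List[str], env: Dict[str, str]) -> int:
    """Run the pipeline and return its exit code."""
    return subprocess.run(cmd, env=env).returncode


if __name__ == "__main__":
    cmd, env = config_mode_command("SN", "config.json", "results/SN")
    print(cmd)
    # launch(cmd, env)  # uncomment to start a real run
```

Passing an argument list (rather than a shell string) avoids quoting problems with paths that contain spaces.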

Key Parameters

| Parameter | Required* | Description | Example |
|---|---|---|---|
| -c | ✓ | Chip serial number | SN |
| -o | ✓ | Output directory | results/SAMPLE123 |
| -i | ✓△ | Primary image path (required for kit-based mode) | SN.tif |
| -s | ✓△ | Stain type (required for kit-based mode) | DAPI, ssDNA, HE |
| -p | △ | Path to a custom configuration file (see JSON Configuration Documentation) | config/custom.json |
| -m | △ | Gene expression matrix | SN.raw.gef |
| -mi | △ | Multi-channel images | IF=SN_IF.tif |
| -pr | △ | Protein expression matrix | SN_IF.protein.gef |
| -k | ✓△ | Kit type (required for kit-based mode; see the kit list below) | "Stereo-CITE T FF V1.1 R" |

*✓ = always required, ✓△ = required for kit-based mode, △ = optional
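The "required for kit-based mode" rules in the table can be checked before a long run is started. The sketch below is a hypothetical helper (`kit_mode_args` is not part of cellbin2) that assembles the kit-based argument list and rejects missing `-c`, `-o`, `-i`, `-s`, or `-k` values. The `TAG=path` check on `-mi` is an assumption based on the examples in this README, and it accepts a single `-mi` value only, since repeating the flag is not documented.

```python
from typing import List, Optional


def kit_mode_args(sn: str, output_dir: str, image: str, stain: str, kit: str,
                  matrix: Optional[str] = None,
                  multi_image: Optional[str] = None,
                  protein: Optional[str] = None) -> List[str]:
    """Assemble kit-based-mode arguments, enforcing the table's required flags."""
    required = {"-c": sn, "-o": output_dir, "-i": image, "-s": stain, "-k": kit}
    missing = [flag for flag, value in required.items() if not value]
    if missing:
        raise ValueError("missing required kit-mode parameter(s): " + ", ".join(missing))
    # Assumption from the README examples (IF=SN_IF.tif, HE=...): -mi is TAG=path.
    if multi_image is not None and "=" not in multi_image:
        raise ValueError("-mi expects TAG=path, e.g. IF=SN_IF.tif")
    args = ["-c", sn, "-i", image, "-s", stain]
    for flag, value in (("-mi", multi_image), ("-m", matrix), ("-pr", protein)):
        if value:
            args += [flag, value]
    args += ["-o", output_dir, "-k", kit]
    return args


if __name__ == "__main__":
    # Mirrors "Case 1: Stereo-seq T FF" below.
    print(kit_mode_args("SN", "test/SN", "SN.tif", "ssDNA",
                        "Stereo-seq T FF V1.2", matrix="SN.raw.gef"))
```

The resulting list can be appended to `[sys.executable, "cellbin2/cellbin_pipeline.py"]`; `shlex.join` (Python 3.8+) renders it as a correctly quoted shell command if you want to log it.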

Supported Kit Types

KIT_VERSIONS = (
    # Standard product versions
    'Stereo-seq T FF V1.2',       
    'Stereo-seq T FF V1.3',
    'Stereo-CITE T FF V1.0',   
    'Stereo-CITE T FF V1.1',
    'Stereo-seq N FFPE V1.0', 
    
    # Research versions
    'Stereo-seq T FF V1.2 R',
    'Stereo-seq T FF V1.3 R',
    'Stereo-CITE T FF V1.0 R',
    'Stereo-CITE T FF V1.1 R',
    'Stereo-seq N FFPE V1.0 R',     
)

The kit setting controls the module switches and parameters in the JSON configuration, which customizes the analysis workflow.
Detailed configurations per kit: config.md.
For more information about kit types, see the STOmics official website.
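One way to picture how a kit drives the configuration is as layering: the kit supplies default module switches, and a user's own settings are applied on top. The sketch below assumes that layering model, which this README does not spell out. It validates a kit name against the list above (research versions end in " R") and deep-merges two dictionaries. The keys `switches`, `tissue_segmentation`, and `matrix_extract` are hypothetical placeholders; see config.md for the real schema.

```python
import copy
from typing import Any, Dict

# Copied from the "Supported Kit Types" list above.
KIT_VERSIONS = (
    'Stereo-seq T FF V1.2', 'Stereo-seq T FF V1.3',
    'Stereo-CITE T FF V1.0', 'Stereo-CITE T FF V1.1',
    'Stereo-seq N FFPE V1.0',
    'Stereo-seq T FF V1.2 R', 'Stereo-seq T FF V1.3 R',
    'Stereo-CITE T FF V1.0 R', 'Stereo-CITE T FF V1.1 R',
    'Stereo-seq N FFPE V1.0 R',
)


def is_research_kit(kit: str) -> bool:
    """Validate a kit name; research versions carry a trailing ' R'."""
    if kit not in KIT_VERSIONS:
        raise ValueError(f"unsupported kit: {kit!r}")
    return kit.endswith(" R")


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return base updated recursively with override; inputs are not mutated."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical keys for illustration only.
    kit_defaults = {"switches": {"tissue_segmentation": True, "matrix_extract": True}}
    user_overrides = {"switches": {"matrix_extract": False}}
    print(is_research_kit("Stereo-CITE T FF V1.1 R"))
    print(deep_merge(kit_defaults, user_overrides))
```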

Common Use Cases

Case 1: Stereo-seq T FF

ssDNA

CUDA_VISIBLE_DEVICES=0 python cellbin2/cellbin_pipeline.py \
-c SN \
-i SN.tif \
-s ssDNA \
-m SN.raw.gef \
-o test/SN \
-k "Stereo-seq T FF V1.2"

Case 2: Stereo-CITE

DAPI + IF + transcriptomics GEF

CUDA_VISIBLE_DEVICES=0 python cellbin2/cellbin_pipeline.py \
-c SN \
-i SN.tif \
-s DAPI \
-mi IF=SN_IF.tif \
-m SN.raw.gef \
-o test/SN \
-k "Stereo-CITE T FF V1.1 R"

Case 3: Stereo-CITE

DAPI + protein GEF

CUDA_VISIBLE_DEVICES=0 python cellbin2/cellbin_pipeline.py \
-c SN \
-i SN_fov_stitched.tif \
-s DAPI \
-pr IF=SN.protein.tissue.gef \
-o /test/SN \
-k "Stereo-CITE T FF V1.1 R"

Case 4: Stereo-CITE

DAPI + IF + transcriptomics GEF + protein GEF

# -c: chip serial number; -i: ssDNA, DAPI, or HE image path; -mi: IF image
# -s: stain type (ssDNA, DAPI, HE); -m: transcriptomics GEF path; -pr: protein GEF path; -o: output dir
CUDA_VISIBLE_DEVICES=0 python cellbin2/cellbin_pipeline.py \
-c SN \
-i SN_DAPI_fov_stitched.tif \
-mi IF=SN_IF.tif \
-s DAPI \
-m SN.raw.gef \
-pr SN.protein.raw.gef \
-o test/SN \
-k "Stereo-CITE T FF V1.1 R"

Case 5: Stereo-cell

Transcriptomics GEF only

# -c: chip serial number; -p: personalized JSON file; -o: output dir
CUDA_VISIBLE_DEVICES=0 python cellbin2/cellbin_pipeline.py \
-c SN \
-p only_matrix.json \
-o test/SN

Edit only_matrix.json to match your data before running.

Case 6: Plant cellbin

ssDNA + FB + transcriptomics GEF

# -c: chip serial number; -p: personalized JSON file; -o: output dir
CUDA_VISIBLE_DEVICES=0 python cellbin2/cellbin_pipeline.py \
-c SN \
-p Plant.json \
-o test/SN

Edit Plant.json to match your data before running.

Case 7: Multi-stain cellbin

ssDNA + HE + transcriptomics GEF

# -c: chip serial number; -i: ssDNA or DAPI image path; -s: stain type (ssDNA, DAPI)
# -mi: HE image path (this image must already be registered to the ssDNA/DAPI image)
# -m: transcriptomics GEF path; -o: output dir
CUDA_VISIBLE_DEVICES=0 python cellbin2/cellbin_pipeline.py \
-c SN \
-i SN_ssDNA_fov_stitched.tif \
-mi HE=SN_HE_fov_stitched.tif \
-s ssDNA \
-m SN.raw.gef \
-o test/SN \
-k "Stereo-CITE T FF V1.1 R"

For more examples, see example.md.

ErrorCode

Refer to error.md.

Outputs

| File Name | Description |
|---|---|
| SN_cell_mask.tif | Final cell mask |
| SN_mask.tif | Final nuclear mask |
| SN_tissue_mask.tif | Final tissue mask |
| SN_params.json | CellBin 2.0 input parameters |
| SN.ipr | Image processing record |
| metrics.json | CellBin 2.0 metrics |
| CellBin_0.0.1_report.html | CellBin 2.0 report |
| SN.rpi | Recorded image processing (for visualization) |
| SN.stereo | A JSON-formatted manifest file that records the visualization files in the result |
| SN.tar.gz | tar.gz file |
| SN_DAPI_mask.tif | Cell mask on registered image |
| SN_DAPI_regist.tif | Registered image |
| SN_DAPI_tissue_cut.tif | Tissue mask on registered image |
| SN_IF_mask.tif | Cell mask on registered image |
| SN_IF_regist.tif | Registered image |
| SN_IF_tissue_cut.tif | Tissue mask on registered image |
| SN_Transcriptomics_matrix_template.txt | Track template on gene matrix |

  • Image files (*.tif): inspect with ImageJ.
  • Gene expression file (generated only when the matrix_extract module is enabled): visualize with StereoMap v4.
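Checking for the expected files can be scripted after each run. The sketch below is a hypothetical helper (`missing_outputs` is not part of cellbin2) that assumes the files in the table are written directly into the output directory with the chip serial number as prefix. Stain-specific files such as `SN_DAPI_regist.tif` and the versioned report name depend on the inputs and kit, so only the core files are listed; extend `CORE_OUTPUTS` to match your run.

```python
import sys
from pathlib import Path
from typing import Iterable, List

# Core files from the Outputs table; {sn} is replaced by the chip serial number.
CORE_OUTPUTS = (
    "{sn}_cell_mask.tif",
    "{sn}_mask.tif",
    "{sn}_tissue_mask.tif",
    "{sn}_params.json",
    "{sn}.ipr",
    "metrics.json",
)


def missing_outputs(output_dir: str, sn: str,
                    templates: Iterable[str] = CORE_OUTPUTS) -> List[str]:
    """Return expected file names that are absent from output_dir."""
    root = Path(output_dir)
    expected = (template.format(sn=sn) for template in templates)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    out_dir = sys.argv[1] if len(sys.argv) > 1 else "test/SN"
    chip = sys.argv[2] if len(sys.argv) > 2 else "SN"
    missing = missing_outputs(out_dir, chip)
    print("all core outputs present" if not missing else f"missing: {missing}")
```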

Reference

CellBin introduction (Chinese)
https://github.com/STOmics/CellBin
https://github.com/MouseLand/cellpose
https://github.com/matejak/imreg_dft
https://github.com/rezazad68/BCDU-Net
https://github.com/libvips/pyvips
https://github.com/vanvalenlab/deepcell-tf
https://github.com/ultralytics/ultralytics

Tweets
Stereo-seq CellBin introduction (Chinese)
Stereo-seq CellBin application intro (Chinese)
Stereo-seq CellBin cell segmentation database introduction (Chinese)
CellBin: The Core Image Processing Pipeline in SAW for Generating Single-cell Gene Expression Data for Stereo-seq (English)
A Practical Guide to SAW Output Files for Stereo-seq (English)

Paper related
CellBin: a highly accurate single-cell gene expression processing pipeline for high-resolution spatial transcriptomics (GitHub Link)
Generating single-cell gene expression profiles for high-resolution spatial transcriptomics based on cell boundary images (GitHub Link)
CellBinDB: A Large-Scale Multimodal Annotated Dataset for Cell Segmentation with Benchmarking of Universal Models (GitHub Link)

Video tutorial
Cell segmentation tool selection and application (Chinese)
One-stop solution for spatial single-cell data acquisition (Chinese)
Single-cell processing framework for high resolution spatial omics (Chinese)
