
LSD NM experiments tutorial

General notes

  • A tutorial for running training/inference of the networks used in the paper (in singularity containers).

  • Tested on Ubuntu 18.04 with a Quadro P6000 (24 GB GPU RAM)

  • Assuming singularity is installed and set up (some tutorials here and here)

  • Assuming conda is installed and set up (helpful instructions if needed)


Getting started

  • Clone this repo:
git clone https://github.com/funkelab/lsd_nm_experiments.git
  • Create simple environment for fetching containers:
conda create -n test_env python=3.8
conda activate test_env

If you just need boto3 for fetching containers:

pip install boto3

If you also want to view data with neuroglancer:

pip install boto3 neuroglancer h5py zarr

Downloading container(s)

  • download_imgs.py will download the singularity container used in the paper (lsd:v0.8.img)
  • We found that this singularity container sometimes throws errors because its deprecated cuda version doesn't play well with updated drivers
  • We made another image (lsd_legacy.img) that should handle this. If you run into libcublas or gcc errors with the original container, consider using the legacy image instead
  • To download (uncomment the legacy img in the script if desired):
python download_imgs.py
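
If you're curious what the script is doing under the hood, it presumably just pulls the image file from a public bucket with boto3; a minimal sketch of that kind of download, assuming anonymous access and with placeholder bucket/key names (check download_imgs.py for the real ones):

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# public buckets can be read without credentials
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# hypothetical bucket/key, for illustration only
s3.download_file(
    Bucket="example-bucket",
    Key="containers/lsd:v0.8.img",
    Filename="lsd:v0.8.img")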

Or, building the legacy container from source:

  • Nvidia has deprecated the conda packages of the cudnn / cudatoolkit versions used in the original singularity container (6.0 / 8.0)
  • This sometimes causes problems with updated drivers (even though the point of containerization is to solve this...)
  • You can get around this by installing them from tarballs, specifically these:
    https://anaconda.org/numba/cudatoolkit/8.0/download/osx-64/cudatoolkit-8.0-3.tar.bz2
    https://repo.anaconda.com/pkgs/free/linux-64/cudnn-6.0.21-cuda8.0_0.tar.bz2

  • setup.sh will fetch these packages and install them directly into the conda environment when creating the Singularity container
  • The other packages are just specified in lsd_legacy.yml
  • To build the legacy image locally:
./setup.sh
  • Note - the original container runs fine with exec, while the legacy one needs run. To be honest, not sure exactly what the problem is here (possibly because run goes through the container's runscript while exec bypasses it)
  • So singularity exec --nv lsd:v0.8.img python -c "import tensorflow" will work, and
  • singularity run --nv lsd_legacy.img python -c "import tensorflow" will work, but
  • singularity exec --nv lsd_legacy.img python -c "import tensorflow" will cause import errors

Fetching data

  • Navigate to the 01_data directory (cd 01_data)
  • fetch_data.py will download training data from the aws bucket using a json file (datasets.json) specifying the training volumes for each dataset
  • The script defaults to downloading just the first volume for each dataset (one each for zebrafinch, fib25, and hemi, i.e. three total)
  • Download the data:
python fetch_data.py
  • You could also use the singularity container to download the data, but we already have boto3 in the basic env we created anyway
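
The script roughly amounts to reading datasets.json and pulling the objects for the first volume of each dataset. A rough sketch under an assumed json layout and placeholder bucket/key names (the real ones live in the script and json):

import json
import os

import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

with open("datasets.json") as f:
    datasets = json.load(f)  # assumed layout: {"zebrafinch": ["vol_a.zarr", ...], ...}

for dataset, volumes in datasets.items():

    volume = volumes[0]  # default: just the first volume per dataset
    prefix = f"{dataset}/training/{volume}/"  # hypothetical key layout

    # zarr volumes are directories of many small objects, so copy everything under the prefix
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="example-bucket", Prefix=prefix):
        for obj in page.get("Contents", []):
            out = obj["Key"]
            os.makedirs(os.path.dirname(out), exist_ok=True)
            s3.download_file("example-bucket", obj["Key"], out)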

Creating masks

  • create_masks.py will create a labels_mask that we use for training to constrain random locations
  • If you installed zarr into the conda environment, you can just run with:
python create_masks.py
  • otherwise, using the singularity container:
singularity exec --nv ../lsd:v0.8.img python create_masks.py
  • or singularity run... if using legacy container
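
If you want to see roughly what that involves, here is a minimal sketch of creating such a mask with zarr, assuming the mask is simply a ones array covering the labels ROI and that the dataset paths below match your data (check create_masks.py for what is actually done):

import numpy as np
import zarr

# assumed container / dataset paths, for illustration
f = zarr.open("funke/fib25/training/tstvol-520-1.zarr", "a")
labels = f["volumes/labels/neuron_ids"]

# write a ones mask with the same shape as the labels
mask = f.create_dataset(
    "volumes/labels/labels_mask",
    data=np.ones(labels.shape, dtype=np.uint8),
    overwrite=True)

# gunpowder uses offset/resolution attributes to place arrays in world space
mask.attrs["offset"] = labels.attrs["offset"]
mask.attrs["resolution"] = labels.attrs["resolution"]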

Viewing the data

  • If you installed neuroglancer into your environment, you can view the data with view_data.py
  • e.g. python -i view_data.py -d funke/fib25/training/tstvol-520-1.zarr
  • If you are viewing remotely, you can also set the bind address with -b (defaults to localhost)
  • There are some good little packages & tutorials for using neuroglancer in different ways. Examples here, here and here
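
As a rough idea of what view_data.py does, here is a minimal neuroglancer sketch; the dataset paths and voxel scales below are assumptions, so adjust them to your data:

import neuroglancer
import zarr

neuroglancer.set_server_bind_address("localhost")

f = zarr.open("funke/fib25/training/tstvol-520-1.zarr", "r")
raw = f["volumes/raw"][:]                    # assumed dataset paths
labels = f["volumes/labels/neuron_ids"][:]

dims = neuroglancer.CoordinateSpace(
    names=["z", "y", "x"], units="nm", scales=[8, 8, 8])  # assumed 8nm isotropic

viewer = neuroglancer.Viewer()
with viewer.txn() as s:
    s.layers["raw"] = neuroglancer.ImageLayer(
        source=neuroglancer.LocalVolume(data=raw, dimensions=dims))
    s.layers["labels"] = neuroglancer.SegmentationLayer(
        source=neuroglancer.LocalVolume(data=labels, dimensions=dims))

print(viewer)  # prints a URL to open in the browser (run with python -i to keep it alive)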

Example fib25 training data:


Downloading network checkpoints

  • fetch_checkpoint.py will download a specified network checkpoint for a given dataset to the target folder in 02_train
  • We can start with the baseline affinities checkpoint for the zebrafinch dataset, just using boto3 from the conda env (or singularity if desired):
python fetch_checkpoint.py

Training a network

  • Navigate to the zebrafinch baseline directory (cd ../02_train/zebrafinch/baseline)
  • We start by creating our network in mknet.py (e.g. placeholders to match the trained graphs):
singularity exec --nv ../../../lsd:v0.8.img python mknet.py
  • It should print a bunch of layer names and tensor shapes (that look like a sideways U-Net) to the command line
  • Check the files in the directory (e.g. tree .); it should now look like:
.
├── checkpoint
├── config.json
├── config.meta
├── mknet.py
├── predict.py
├── predict_scan.py
├── train_net_checkpoint_400000.data-00000-of-00001
├── train_net_checkpoint_400000.index
├── train_net_checkpoint_400000.meta
├── train_net.json
├── train_net.meta
├── train.py
└── view_batch.py
  • Train for 1 iteration:
singularity exec --nv ../../../lsd:v0.8.img python train.py 1
  • You'll see that gunpowder will print ERROR:tensorflow:Couldn't match files for checkpoint ./train_net_checkpoint_500000

  • This is because gunpowder reads the checkpoint file, which points to iteration 500000 (the network was trained for longer than the optimal checkpoint we downloaded)

  • If we view the batch, we'll see that the predictions are all grey, since it really only trained for a single iteration and didn't use the checkpoint (python -i view_batch.py -f snapshots/batch_1.hdf):

  • To fix this, simply edit the checkpoint file to point to the downloaded checkpoint iteration instead (e.g. 500000 -> 400000); the sketch after this list shows what the edited file looks like
  • Now running the above won't do anything, because of line 27 in train.py:
if trained_until >= max_iteration:
    return
  • So just make sure to train to the checkpoint + n, e.g. for 1 extra iteration:
singularity exec --nv ../../../lsd:v0.8.img python train.py 400001
  • We can then view the saved batch, e.g:
python -i view_batch.py -f snapshots/batch_400001.hdf
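
For reference, that checkpoint file is just TensorFlow's plain-text checkpoint state; after the edit above it should look roughly like this (the model_checkpoint_path line is the one that gets read):

model_checkpoint_path: "train_net_checkpoint_400000"
all_model_checkpoint_paths: "train_net_checkpoint_400000"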


Running Inference

  • For the lsds experiments, we ran everything from inference through evaluation in a blockwise fashion using daisy
  • For inference, this meant having a blockwise prediction script that called a gunpowder predict pipeline inside each process
  • For example, a blockwise script would distribute the predict script across workers using a DaisyRequestBlocks gunpowder node
  • If you just want to run inference on a small volume (in memory), you can instead use a Scan node
  • We adapted all blockwise inference scripts to also use scan nodes (e.g. predict_scan.py); the sketch at the end of this section shows the rough structure
  • Example run on the zebrafinch training data:
singularity exec --nv ../../../lsd:v0.8.img python predict_scan.py
  • Resulting affinities:
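
For orientation, predict_scan.py follows the usual gunpowder pattern: a zarr source, a tensorflow Predict node, a zarr writer, all wrapped in a Scan node that tiles chunk-sized requests over the volume. A rough sketch of that structure; the shapes, tensor names, and dataset paths below are illustrative only (the real ones live in the script and config.json):

import gunpowder as gp
from gunpowder.tensorflow import Predict

raw = gp.ArrayKey("RAW")
affs = gp.ArrayKey("AFFS")

voxel_size = gp.Coordinate((20, 9, 9))       # illustrative
input_shape = gp.Coordinate((84, 268, 268))  # illustrative
output_shape = gp.Coordinate((56, 56, 56))   # illustrative

# reference request for a single chunk; Scan tiles this over the whole volume
scan_request = gp.BatchRequest()
scan_request.add(raw, input_shape * voxel_size)
scan_request.add(affs, output_shape * voxel_size)

source = gp.ZarrSource(
    "training_volume.zarr",                  # hypothetical path
    datasets={raw: "volumes/raw"},
    array_specs={raw: gp.ArraySpec(interpolatable=True)})

pipeline = (
    source
    + gp.Normalize(raw)
    + Predict(
        checkpoint="train_net_checkpoint_400000",
        graph="config.meta",
        inputs={"raw": raw},                 # tensor name -> array key (see config.json)
        outputs={"affs": affs})
    + gp.ZarrWrite(
        dataset_names={affs: "pred_affs"},   # assumed output dataset name
        output_filename="test_prediction.zarr")
    + gp.Scan(scan_request))

with gp.build(pipeline):
    # an empty request makes Scan cover the whole upstream ROI
    pipeline.request_batch(gp.BatchRequest())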


Multitask (MTLSD)

  • This can be run exactly the same as the baseline above.

  • Inside 01_data/fetch_checkpoint.py, uncomment mtlsd and run.

  • Navigate to 02_train/zebrafinch/mtlsd

  • Create network (... python mknet.py)

  • Change checkpoint file iteration to match downloaded checkpoint (500000 -> 400000)

  • Train (... python train.py 400001)

  • View batch (... python -i view_batch.py ...) -> will also show LSDs

  • Predict (... python predict_scan.py) -> will write out LSDs and Affs

  • This will also give us LSDs:

  • Tip - you can view the different components of the LSDs by adjusting the shader in neuroglancer, e.g. changing 0,1,2 to 3,4,5 will show the diagonal entries of the covariance component (or the direction the processes move):
void main() {
    emitRGB(
        vec3(
            toNormalized(getDataValue(3)),
            toNormalized(getDataValue(4)),
            toNormalized(getDataValue(5)))
        );
}
  • Or, to view a single channel of the 10d lsds (e.g. channel 6):
void main() {
    float v = toNormalized(getDataValue(6));
    vec4 rgba = vec4(0,0,0,0);
    if (v != 0.0) {
        rgba = vec4(colormapJet(v), 1.0);
    }
    emitRGBA(rgba);
}

Autocontext (ACLSD and ACRLSD)

  • These networks rely on a pretrained LSD (raw -> LSDs) network, e.g.:
  • ACLSD: Raw -> LSDs -> Affs
  • ACRLSD: Raw -> LSDs + cropped Raw -> Affs
  • Because of this, they are more computationally expensive (since training requires extra context to first predict the lsds)
  • They used 23.5 GB of the available 24 GB GPU RAM when testing on a Quadro P6000. If you have less than that, you will likely run into cuda OOM errors
  • If you have access to sufficient gpu memory, start by navigating to 01_data, uncommenting lsd in fetch_checkpoint.py, and running the script to get the pretrained lsd checkpoint
  • Go to lsd directory (cd ../02_train/zebrafinch/lsd) and follow same instructions as baseline and mtlsd nets above
  • Once you have the lsd checkpoints (and test_prediction.zarr with pred_lsds following prediction), start with a basic autocontext network (aclsd).
  • Get the aclsd checkpoint, navigate to the appropriate directory, change the checkpoint file as before, and train the network.
  • Visualizing the resulting batch shows us the larger raw context needed for predicting the lsds to ensure that the output affinities remain the same size:

  • Run prediction as before - note, this will not run if the lsds have not been predicted first. The script could be adapted to predict the lsds on the fly using just the lsds and affs checkpoints
  • The same can be done for the ACRLSD network (note: this requires a merge provider during prediction, as the network takes both lsds and raw data as input; see the sketch below)
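
For the ACRLSD case, the raw and predicted lsds typically come from two separate sources that are combined with gunpowder's MergeProvider. A rough sketch of just that part, with illustrative paths (pred_lsds in test_prediction.zarr comes from the lsd network prediction above):

import gunpowder as gp

raw = gp.ArrayKey("RAW")
pred_lsds = gp.ArrayKey("PRED_LSDS")

raw_source = gp.ZarrSource(
    "training_volume.zarr",                  # hypothetical path
    datasets={raw: "volumes/raw"},
    array_specs={raw: gp.ArraySpec(interpolatable=True)})

lsds_source = gp.ZarrSource(
    "test_prediction.zarr",                  # written by the lsd network
    datasets={pred_lsds: "pred_lsds"},
    array_specs={pred_lsds: gp.ArraySpec(interpolatable=True)})

# a tuple of providers plus MergeProvider exposes both arrays downstream
pipeline = (raw_source, lsds_source) + gp.MergeProvider()

# ... then Normalize / Predict / ZarrWrite / Scan as in the baseline sketch above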

Todo: add consolidated fib25/hemi nets to the tutorial
