This is a fork from Dani's work (please see below for citing) to remove R as we don't need this for teaching but do have a few more Python packages that we do use. We've also added some JupyterLab extensions to make interacting with the Lab server a bit easier.
We previously experimented with four approaches to installation: VirtualBox; Vagrant; Docker; and Anaconda Python directly. Each of these has pros and cons, but after careful consideration we have come to the conclusion that Docker is the most robust way to ensure a consistent experience in which all students end up with the same versions of each library, difficult-to-diagnose hardware/OS issues are minimised, and running/recovery is the most straightfoward.
=========================
For most users you should really be looking at this through the GitHub.io web site!
=========================
- Work out why the Intel (AMD64) image is so much larger than the Apple Silicon (ARM64) image. I can get a decent report using
docker history --format "{{.Size}}\t{{.CreatedBy}}" --no-trunc jreades/sds:2023-intel | grep -e "[G]B"
which traces the difference back to just two layers:- 6.52GB
RUN |2 USERNAME=jovyan TARGETPLATFORM=linux/amd64 /bin/bash -c mamba env update -n base --quiet --file ./${yaml_nm} && conda clean --all --yes --force-pkgs-dirs && find /opt/conda/ -follow -type f -name '*.a' -delete && find /opt/conda/ -follow -type f -name '*.pyc' -delete && find /opt/conda/ -follow -type f -name '*.js.map' -delete && pip cache purge && rm -rf /home/$NB_USER/.cache/pip && rm ./${yaml_nm} # buildkit
- 5.48GB
RUN |2 USERNAME=jovyan TARGETPLATFORM=linux/amd64 /bin/bash -c fix-permissions $CONDA_DIR && fix-permissions $HOME # buildkit
- My guess is that the second command's effect depends on the effects of the first: there are a lot of files modified by the
mamba
update but they end up with a different/wrong set of permissions from what thefix-permissions
script is expecting so it then has to modify the permissions on all of them which almost doubles the size of image.
- 6.52GB
- Start up the UCL VPN.
- Connect to JupyterHub
- Authenticate using UCL credentials.
- Create a new terminal: File > New > Terminal
I think that these instructions are not correct (see below for the alternative) in the sense the use of a symlink can cause problems and duplicated environments down the line:
course_name="casa0013"
ln -s /shared/.../casa/${course_name} $HOME/${course_name}
conda config --add envs_dirs /shared/groups/.../casa/${course_name}/envs
curl -o /tmp/casa0013.yml https://raw.githubusercontent.com/jreades/sds_env/master/conda/environment_py.yml
conda env create -n casa0013 -f /tmp/casa0013.yml
I now think that the correct way to do this is:
course_name="casa0013"
conda config --add envs_dirs /shared/groups/.../casa/envs
curl -o /tmp/casa0013.yml https://raw.githubusercontent.com/jreades/sds_env/master/conda/environment_py.yml
conda env create -p /shared/groups/.../casa/envs -f /tmp/casa0013.yml
However, note that this now means you have .../casa/casa0013/envs/casa0013...
so it might be more sensible to set envs_dirs
to just ...casa/envs
and then have per-module environments underneath that.
Two shortcomings in the existing approach of generating environment_py.yml
were identified and need to be tweaked in the Makefile:
- Remove anything with ‘linux’ in it
- Remove SOMPY and
mrmr
- Remove version from gitpython.
- Remove python-graphviz entirely.
Additional issues may exist with replication to non-Linux systems.
To connect to JupyterHub:
- Start up the UCL VPN.
- Connect to JupyterHub
- Authenticate using UCL credentials.
- If you see a URL that ends in
tree?
please replace this withlab?
to get the JupyterLab interface and not the original Jupyter Notebook interface. - Create a new terminal: File > New > Terminal
Note that you need to replace ...
with the appropriate path (this will be obvious logged in):
course_name="casa0013"
conda config --append envs_dirs /shared/groups/.../casa/envs
jupyter contrib nbextension install --user
This draws heavily on Dani Arribas-Bel's work for Liverpool. If you use this, you should cite him.
@software{hadoop,
author = {{Dani Arribas-Bel}},
title = {\texttt{gds_env}: A containerised platform for Geographic Data Science},
url = {https://github.com/darribas/gds_env},
version = {3.0},
date = {2019-08-06},
}