Optimize multi-arch Docker images (#298) · SerAcero/python-tutorial@176e30b · GitHub
Commit 176e30b

Optimize multi-arch Docker images (empa-scientific-it#298)
- Reduce image size by implementing multi-stage build
- Remove torchaudio package and optimize PyTorch installation
- Add GPU/CPU variant support via build arguments
- Enable multi-architecture builds (amd64/arm64) with proper manifests
- Fix Renku compatibility by setting provenance: false
- Simplify container environment variables and working directory
- Create architecture-specific tags for better image management
1 parent b14c079 commit 176e30b

4 files changed

+166
-33
lines changed


.github/workflows/docker-build.yml

Lines changed: 91 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -29,6 +29,7 @@ jobs:
2929
strategy:
3030
matrix:
3131
arch: [amd64, arm64]
32+
variant: [cpu, cuda]
3233
fail-fast: false
3334
steps:
3435
- name: Checkout code
@@ -54,10 +55,10 @@ jobs:
5455
with:
5556
images: ghcr.io/${{ github.repository }}
5657
tags: |
57-
type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' }}
58-
type=ref,event=pr
59-
type=ref,event=tag
60-
type=sha,format=short
58+
type=raw,value=${{ matrix.variant }}-${{ matrix.arch }},enable=${{ github.ref == 'refs/heads/main' }}
59+
type=raw,value=${{ matrix.variant }}-${{ matrix.arch }}-pr-${{ github.event.pull_request.number }},enable=${{ github.event_name == 'pull_request' }}
60+
type=raw,value=${{ matrix.variant }}-${{ matrix.arch }}-${{ github.ref_name }},enable=${{ startsWith(github.ref, 'refs/tags/') }}
61+
type=raw,value=${{ matrix.variant }}-${{ matrix.arch }}-sha-${{ github.sha }}
6162
6263
- name: Build and push
6364
uses: docker/build-push-action@v6
@@ -66,4 +67,89 @@ jobs:
6667
platforms: linux/${{ matrix.arch }}
6768
push: ${{ github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository }}
6869
tags: ${{ steps.meta.outputs.tags }}
69-
labels: ${{ steps.meta.outputs.labels }}
70+
provenance: false
71+
build-args: |
72+
PYTORCH_VARIANT=${{ matrix.variant }}
73+
74+
create-manifests:
75+
needs: build-and-push
76+
runs-on: ubuntu-latest
77+
if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
78+
permissions:
79+
packages: write
80+
steps:
81+
- name: Set up QEMU
82+
uses: docker/setup-qemu-action@v3
83+
84+
- name: Setup Docker Buildx
85+
uses: docker/setup-buildx-action@v3
86+
with:
87+
driver-opts: |
88+
image=moby/buildkit:latest
89+
network=host
90+
91+
- name: Log in to GHCR
92+
uses: docker/login-action@v3
93+
with:
94+
registry: ghcr.io
95+
username: ${{ github.actor }}
96+
password: ${{ secrets.GITHUB_TOKEN }}
97+
98+
- name: Create and push CPU manifest
99+
run: |
100+
# Determine the correct tag suffixes based on the event type
101+
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
102+
# PR build - use the commit SHA for more predictable references
103+
AMD64_TAG="cpu-amd64-sha-${{ github.sha }}"
104+
ARM64_TAG="cpu-arm64-sha-${{ github.sha }}"
105+
TARGET_TAG="cpu-sha-${{ github.sha }}"
106+
elif [[ "${{ startsWith(github.ref, 'refs/tags/') }}" == "true" ]]; then
107+
# Tag build - use tag version in the name
108+
AMD64_TAG="cpu-amd64-${{ github.ref_name }}"
109+
ARM64_TAG="cpu-arm64-${{ github.ref_name }}"
110+
TARGET_TAG="cpu-${{ github.ref_name }}"
111+
else
112+
# Main branch build - use simple arch tags
113+
AMD64_TAG="cpu-amd64"
114+
ARM64_TAG="cpu-arm64"
115+
TARGET_TAG="cpu"
116+
fi
117+
118+
# Create the manifest with the correct tag names
119+
echo "Creating CPU manifest using $AMD64_TAG and $ARM64_TAG"
120+
docker buildx imagetools create --tag ghcr.io/${{ github.repository }}:${TARGET_TAG} \
121+
ghcr.io/${{ github.repository }}:${AMD64_TAG} \
122+
ghcr.io/${{ github.repository }}:${ARM64_TAG}
123+
124+
# If on main branch, also tag as latest
125+
if [[ "${{ github.ref }}" == "refs/heads/main" ]]; then
126+
docker buildx imagetools create --tag ghcr.io/${{ github.repository }}:latest \
127+
ghcr.io/${{ github.repository }}:${AMD64_TAG} \
128+
ghcr.io/${{ github.repository }}:${ARM64_TAG}
129+
fi
130+
131+
- name: Create and push CUDA manifest
132+
run: |
133+
# Determine the correct tag suffixes based on the event type
134+
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
135+
# PR build - use the commit SHA for more predictable references
136+
AMD64_TAG="cuda-amd64-sha-${{ github.sha }}"
137+
ARM64_TAG="cuda-arm64-sha-${{ github.sha }}"
138+
TARGET_TAG="cuda-sha-${{ github.sha }}"
139+
elif [[ "${{ startsWith(github.ref, 'refs/tags/') }}" == "true" ]]; then
140+
# Tag build - use tag version in the name
141+
AMD64_TAG="cuda-amd64-${{ github.ref_name }}"
142+
ARM64_TAG="cuda-arm64-${{ github.ref_name }}"
143+
TARGET_TAG="cuda-${{ github.ref_name }}"
144+
else
145+
# Main branch build - use simple arch tags
146+
AMD64_TAG="cuda-amd64"
147+
ARM64_TAG="cuda-arm64"
148+
TARGET_TAG="cuda"
149+
fi
150+
151+
# Create the manifest with the correct tag names
152+
echo "Creating CUDA manifest using $AMD64_TAG and $ARM64_TAG"
153+
docker buildx imagetools create --tag ghcr.io/${{ github.repository }}:${TARGET_TAG} \
154+
ghcr.io/${{ github.repository }}:${AMD64_TAG} \
155+
ghcr.io/${{ github.repository }}:${ARM64_TAG}
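
The CPU and CUDA manifest steps above repeat the same tag-selection branching. As a minimal sketch (not part of this commit), that logic could be factored into one POSIX shell function. The name `manifest_tags`, its argument order, and the sample SHA `abc1234` are hypothetical and chosen only for illustration.

```shell
# Hypothetical helper (not in the commit): print the two per-arch source tags
# and the multi-arch target tag for one variant, separated by spaces.
# Arguments: variant, event name, git ref, ref name, commit SHA.
manifest_tags() {
  variant="$1"; event="$2"; ref="$3"; ref_name="$4"; sha="$5"
  suffix=""
  if [ "$event" = "pull_request" ]; then
    # PR build: reference images by commit SHA
    suffix="-sha-${sha}"
  else
    case "$ref" in
      # Tag build: append the tag name
      refs/tags/*) suffix="-${ref_name}" ;;
    esac
  fi
  # Main branch build falls through with an empty suffix
  echo "${variant}-amd64${suffix} ${variant}-arm64${suffix} ${variant}${suffix}"
}

# Example: show the tags both variants would use on a main-branch push.
for v in cpu cuda; do
  set -- $(manifest_tags "$v" push refs/heads/main main abc1234)
  echo "$v: $1 + $2 -> $3"
done
```

Each triple could then be passed to `docker buildx imagetools create --tag` in the same way as the workflow steps above, so a single step could loop over both variants.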

Dockerfile

Lines changed: 53 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,8 @@
1-
# Use the jupyter/minimal-notebook as the base image
2-
FROM quay.io/jupyter/minimal-notebook:latest
1+
# Stage 1: Build environment
2+
FROM quay.io/jupyter/minimal-notebook:latest AS builder
33

4-
# Metadata labels
5-
LABEL org.opencontainers.image.title="Python Tutorial"
6-
LABEL org.opencontainers.image.description="A containerized Python tutorial environment with Jupyter Lab."
7-
LABEL org.opencontainers.image.authors="Empa Scientific IT <scientificit@empa.ch>"
8-
LABEL org.opencontainers.image.url="https://github.com/empa-scientific-it/python-tutorial"
9-
LABEL org.opencontainers.image.source="https://github.com/empa-scientific-it/python-tutorial"
10-
LABEL org.opencontainers.image.version="1.0.0"
11-
LABEL org.opencontainers.image.licenses="MIT"
12-
13-
# Set environment variables for the tutorial and repository
14-
ENV BASENAME="python-tutorial"
15-
ENV REPO=${HOME}/${BASENAME}
16-
ENV IPYTHONDIR="${HOME}/.ipython"
4+
# Define build argument for PyTorch variant (cpu or cuda)
5+
ARG PYTORCH_VARIANT=cpu
176

187
# Switch to root user to install additional dependencies
198
USER root
@@ -33,16 +22,61 @@ USER ${NB_UID}
3322
# Set up the Conda environment
3423
COPY docker/environment.yml /tmp/environment.yml
3524
RUN mamba env update -n base -f /tmp/environment.yml && \
25+
# Force remove any existing PyTorch installations first
26+
pip uninstall -y torch torchvision && \
27+
# Install PyTorch packages without cache - conditionally based on variant
28+
if [ "$PYTORCH_VARIANT" = "cpu" ]; then \
29+
echo "Installing CPU-only PyTorch" && \
30+
pip install --no-cache-dir --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/cpu; \
31+
else \
32+
echo "Installing CUDA-enabled PyTorch" && \
33+
pip install --no-cache-dir --force-reinstall torch torchvision; \
34+
fi && \
35+
# Clean up all package caches to reduce image size
3636
mamba clean --all -f -y && \
37+
# Remove pip cache
38+
rm -rf ~/.cache/pip && \
3739
fix-permissions "${CONDA_DIR}" && \
3840
fix-permissions "/home/${NB_USER}"
3941

40-
# Prepare IPython configuration (move earlier in the build)
41-
RUN mkdir -p ${HOME}/.ipython/profile_default
42+
# Stage 2: Runtime environment - creates a lighter final image
43+
FROM quay.io/jupyter/minimal-notebook:latest
44+
45+
# Inherit build argument for image labeling
46+
ARG PYTORCH_VARIANT=cpu
47+
48+
# Metadata labels
49+
LABEL org.opencontainers.image.title="Python Tutorial"
50+
LABEL org.opencontainers.image.description="A containerized Python tutorial environment with Jupyter Lab."
51+
LABEL org.opencontainers.image.authors="Empa Scientific IT <scientificit@empa.ch>"
52+
LABEL org.opencontainers.image.url="https://github.com/empa-scientific-it/python-tutorial"
53+
LABEL org.opencontainers.image.source="https://github.com/empa-scientific-it/python-tutorial"
54+
LABEL org.opencontainers.image.version="1.0.0"
55+
LABEL org.opencontainers.image.licenses="MIT"
56+
LABEL org.opencontainers.image.variant="pytorch-${PYTORCH_VARIANT}"
57+
58+
# Switch to root user to install minimal dependencies
59+
USER root
60+
RUN apt-get update && \
61+
apt-get install -y --no-install-recommends \
62+
libgl1 && \
63+
apt-get clean && \
64+
rm -rf /var/lib/apt/lists/*
65+
66+
# Switch back to the default notebook user
67+
USER ${NB_UID}
68+
69+
# Copy the conda environment from the builder stage
70+
COPY --from=builder ${CONDA_DIR} ${CONDA_DIR}
71+
72+
# Copy home directory with configurations
73+
COPY --from=builder --chown=${NB_UID}:${NB_GID} /home/${NB_USER} /home/${NB_USER}
74+
75+
# Prepare IPython configuration
4276
COPY --chown=${NB_UID}:${NB_GID} binder/ipython_config.py ${HOME}/.ipython/profile_default/
4377

44-
# Set the working directory to the repository
45-
WORKDIR ${REPO}
78+
# Set the working directory to user's home (repository will be cloned here by Renku)
79+
WORKDIR /home/${NB_USER}
4680

4781
# Use the default ENTRYPOINT from the base image to start Jupyter Lab
4882
ENTRYPOINT ["tini", "-g", "--", "start.sh"]
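
To try the `PYTORCH_VARIANT` build argument outside CI, a local build command can be assembled per variant and architecture. The sketch below only prints the command so it can be reviewed before running. The helper name `build_cmd` and the local tag `python-tutorial:<variant>-<arch>` are assumptions for illustration, not names used by the commit.

```shell
# Hypothetical helper (not in the commit): print a local buildx command
# for one variant/architecture pair without executing it.
build_cmd() {
  variant="$1"   # cpu or cuda, forwarded as PYTORCH_VARIANT
  arch="$2"      # amd64 or arm64
  echo "docker buildx build --platform linux/${arch} --build-arg PYTORCH_VARIANT=${variant} --provenance=false --tag python-tutorial:${variant}-${arch} --load ."
}

# Print the commands for the CPU variant on both architectures.
build_cmd cpu amd64
build_cmd cpu arm64
```

Building for a non-native architecture locally relies on emulation (the workflow sets up QEMU for this), so such builds may be considerably slower than native ones.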

README.md

Lines changed: 22 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@ You should now create a new environment with `conda`:
4848
conda env create -f binder/environment.yml
4949
```
5050

51-
> **Warning**
51+
> [!WARNING]
5252
>
5353
> If you are on Windows and using Command Prompt or the PowerShell, please make sure to adjust the paths in the commands above accordingly.
5454
@@ -74,7 +74,7 @@ jupyter lab
7474

7575
### 2. With Docker
7676

77-
> **Note**
77+
> [!NOTE]
7878
>
7979
> The following instructions are for Windows. With minor changes, the steps work on macOS or Linux as well.
8080
@@ -84,28 +84,44 @@ jupyter lab
8484

8585
3. Open PowerShell: Once Docker Desktop is installed, open PowerShell on your Windows machine. You can do this by pressing the "Windows" key and typing "PowerShell" in the search bar.
8686

87-
4. Pull the Docker image: In PowerShell, run the following command to pull the "empascientificit/python-tutorial" Docker image:
87+
4. Pull the Docker image: In PowerShell, run the following command to pull the Docker image:
8888

8989
```console
9090
docker pull ghcr.io/empa-scientific-it/python-tutorial:latest
9191
```
9292

93+
> [!NOTE]
94+
>
95+
> The `latest` tag points to the CPU-only variant of the image, which is optimized for size and compatibility. If you have a CUDA-compatible GPU and want to use GPU acceleration for PyTorch operations, you can use the CUDA-enabled variant by replacing `latest` with `cuda`:
96+
>
97+
> ```console
98+
> docker pull ghcr.io/empa-scientific-it/python-tutorial:cuda
99+
> ```
100+
101+
> [!IMPORTANT]
102+
>
103+
> Using the CUDA variant requires a NVIDIA GPU with compatible drivers properly installed and configured for Docker. See [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) for setup instructions.
104+
93105
5. Run the Docker container: Once the image is downloaded, run the following command to start a Docker container from the image:
94106
95107
```console
96108
docker run -p 8888:8888 --name python_tutorial -v /path/to/python-tutorial:/home/jovyan/python-tutorial ghcr.io/empa-scientific-it/python-tutorial:latest jupyter lab --ip 0.0.0.0 --no-browser
97109
```
98110
111+
> [!NOTE]
112+
>
113+
> If you pulled the CUDA variant, replace `:latest` with `:cuda` in the command above.
114+
99115
Replace `/path/to/python-tutorial` with the path to the folder you created in step 2, for example `C:/Users/yourusername/Desktop/python-tutorial`.
100116

101-
> **Note**
117+
> [!NOTE]
102118
>
103-
> The above command will **mirror** the content of your local folder (e.g., `C:/Users/yourusername/Desktop/python-tutorial`) to the `work/` folder **inside the container**. In this way, every file or folder you copy or create into `work/` will be saved on your machine, and will remain there **even if you stop Docker**.
119+
> The above command will **mirror** the content of your local folder (e.g., `C:/Users/yourusername/Desktop/python-tutorial`) to the `~/python-tutorial` folder **inside the container**. In this way, every file or folder you copy or create into `~/python-tutorial` will be saved on your machine, and will remain there **even if you stop Docker**.
104120
105121
6. Access the Jupyter Notebook: Open a web browser and navigate to `http://localhost:8888/lab`. You should see the Jupyter Notebook interface. Enter the token provided in the PowerShell console to access the notebook. Alternatively, you can directly click on the link that appears in the PowerShell after the container has started.
106122

107123
You can now use the Jupyter in the Docker container to run the python-tutorial. When you're done, you can stop the container by pressing `Ctrl+C` in the PowerShell console.
108124

109-
> **Note**
125+
> [!NOTE]
110126
>
111127
> If you want to restart the container, you can simply run the command `docker container start python_tutorial`.
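
After pulling either variant, one way to confirm which PyTorch build is active is to ask PyTorch whether it can see a GPU. The sketch below prints a one-off `docker run` command for a given tag. The helper name `gpu_check_cmd` is hypothetical, and the sketch assumes the image's entrypoint passes the trailing command through, as it does for the `jupyter lab` invocation in the README.

```shell
# Hypothetical helper (not in the commit): print a command that reports
# whether PyTorch inside the container can see a CUDA device.
gpu_check_cmd() {
  tag="$1"
  gpu_flag=""
  # Request GPUs only for the CUDA variant; --gpus needs the NVIDIA Container Toolkit.
  if [ "$tag" = "cuda" ]; then
    gpu_flag="--gpus all "
  fi
  echo "docker run --rm ${gpu_flag}ghcr.io/empa-scientific-it/python-tutorial:${tag} python -c 'import torch; print(torch.cuda.is_available())'"
}

gpu_check_cmd cuda
gpu_check_cmd latest
```

On a correctly configured GPU host the CUDA command is expected to report that a device is available, while the CPU-only `latest` image should report that none is.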

docker/environment.yml

Lines changed: 0 additions & 3 deletions
@@ -26,8 +26,5 @@ dependencies:
2626
- python-dotenv
2727
- pillow
2828
- opencv-python
29-
- torch
30-
- torchaudio
31-
- torchvision
3229
- albumentations
3330
- grad-cam
