add e2e notebook tests #27

briangallagher · 2025-06-10T20:16:02Z

What this PR does / why we need it:
Add e2e tests to ensure that the SDK remains compatible with the latest version of Trainer.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Fixes #18

Checklist:

Docs included if any changes are user facing

briangallagher · 2025-06-11T09:38:36Z

@andreyvelich Can you approve this workflow?

Electronic-Waste · 2025-06-11T09:41:32Z

/ok-to-test
/rerun-all

kramaranya · 2025-06-12T07:03:43Z

Makefile

+LOCALBIN ?= $(PROJECT_DIR)/bin
+
+# Tool versions
+KIND_VERSION ?= $(shell go list -m -f '{{.Version}}' sigs.k8s.io/kind)


We don't have a go.mod in the SDK repo; could we just set a fallback value?

Yes, I'll set it to the current version, v0.27.0. Thanks.

kramaranya · 2025-06-12T07:31:23Z

.github/workflows/test-e2e.yaml

+            NOTEBOOK_INPUT=./examples/training/pytorch/image-classification/mnist.ipynb \
+            NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_mnist.ipynb \
+            PAPERMILL_TIMEOUT=900
+          make test-e2e-notebook \
+            NOTEBOOK_INPUT=./examples/training/pytorch/question-answering/fine-tune-distilbert.ipynb \
+            NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_fine-tune-distilbert.ipynb \


Those should be imported from trainer repo, right?

Suggested change

-            NOTEBOOK_INPUT=./examples/training/pytorch/image-classification/mnist.ipynb \
-            NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_mnist.ipynb \
-            PAPERMILL_TIMEOUT=900
-          make test-e2e-notebook \
-            NOTEBOOK_INPUT=./examples/training/pytorch/question-answering/fine-tune-distilbert.ipynb \
-            NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_fine-tune-distilbert.ipynb \
+            NOTEBOOK_INPUT=./trainer/examples/pytorch/image-classification/mnist.ipynb \
+            NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_mnist.ipynb \
+            PAPERMILL_TIMEOUT=900
+          make test-e2e-notebook \
+            NOTEBOOK_INPUT=./trainer/examples/pytorch/question-answering/fine-tune-distilbert.ipynb \
+            NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_fine-tune-distilbert.ipynb \
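
For readers following the naming scheme in this hunk, the per-notebook invocation could be wrapped roughly as below. This is only a sketch: the helper names `notebook_output_path` and `run_notebook` are hypothetical, while the `test-e2e-notebook` target and the `NOTEBOOK_INPUT`, `NOTEBOOK_OUTPUT` and `PAPERMILL_TIMEOUT` variables are taken from the diff above.

```shell
# Hypothetical helpers; only the make target and variable names come from the diff.
notebook_output_path() {
  local k8s_version="$1" input="$2"
  # Prefix the notebook file name with the Kubernetes version under test.
  echo "./artifacts/notebooks/${k8s_version}_$(basename "${input}")"
}

run_notebook() {
  local k8s_version="$1" input="$2" timeout_s="${3:-900}"
  make test-e2e-notebook \
    NOTEBOOK_INPUT="${input}" \
    NOTEBOOK_OUTPUT="$(notebook_output_path "${k8s_version}" "${input}")" \
    PAPERMILL_TIMEOUT="${timeout_s}"
}
```

With this, `run_notebook 1.32.3 ./trainer/examples/pytorch/image-classification/mnist.ipynb` would write its output under `./artifacts/notebooks/`, keeping one artifact per Kubernetes version in the matrix.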

briangallagher · 2025-06-17T08:41:47Z

@Electronic-Waste @andreyvelich The workflow is waiting for approval again.
Also, I think it got stuck in the queue last time because the workflow is requesting ubuntu-latest-16-cores.
I copied this config from the trainer repo, so I presume it is required to execute the notebooks successfully.
Can this 16-core image be approved and configured?

tenzen-y · 2025-06-20T14:16:48Z

> @Electronic-Waste @andreyvelich The workflow is waiting for approval again. Also, I think it got stuck in the queue last time because the workflow is requesting ubuntu-latest-16-cores. I copied this config from the trainer repo, so I presume it is required to execute the notebooks successfully. Can this 16-core image be approved and configured?

Done.

andreyvelich · 2025-06-26T11:45:23Z

.github/workflows/test-e2e.yaml

+    strategy:
+      fail-fast: false
+      matrix:
+        kubernetes-version: ["1.29.14", "1.30.0", "1.31.0", "1.32.3"]


Can we drop support for Kubernetes v1.29 and add v1.33?

andreyvelich · 2025-06-26T11:47:10Z

.github/workflows/test-e2e.yaml

+      - name: Setup Go
+        uses: actions/setup-go@v5
+        with:
+          go-version: ${{ env.GO_VERSION }} # Use the GO_VERSION environment variable


Why do we need Go for this workflow?

Go is used to install kind (link), following the approach from Trainer.

andreyvelich · 2025-06-26T11:51:39Z

Makefile

+
+.PHONY: test-e2e-setup-cluster
+test-e2e-setup-cluster: kind ## Setup Kind cluster for e2e test.
+	KIND=$(KIND) K8S_VERSION=$(K8S_VERSION) ./hack/e2e-setup-cluster.sh


Shall we just re-use the script from kubeflow/trainer?
Maybe we can download the repo into tmp/ and run the make commands?
In that case, we don't need to maintain the Go and Kind versions in two places.

WDYT @briangallagher @tenzen-y?

@andreyvelich Are you suggesting that we re-use only e2e-setup-cluster.sh?
Over time, the SDK repo might need its own cluster setup, with something additional to what is in Trainer. Is it useful to keep it separate from the beginning and allow it to evolve?
That said, I don't mind either way.

Since we don't initially maintain any control plane components in the SDK, we don't need a separate installation stage. I would suggest re-using the script for now and evolving it later if that is required.
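
To make the proposal concrete, re-using the Trainer setup from a checked-out copy might look like the sketch below. It assumes the Trainer Makefile exposes the same `test-e2e-setup-cluster` target and `K8S_VERSION` variable shown in this PR's diff; `TRAINER_DIR`, `TRAINER_REF`, `DRY_RUN` and the `run` helper are hypothetical names introduced for illustration.

```shell
# Sketch only: TRAINER_DIR, TRAINER_REF, DRY_RUN and run() are hypothetical.
TRAINER_REPO="https://github.com/kubeflow/trainer.git"
TRAINER_DIR="${TRAINER_DIR:-tmp/trainer}"
TRAINER_REF="${TRAINER_REF:-master}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_cluster_from_trainer() {
  local k8s_version="$1"
  # Shallow-clone Trainer so Go and Kind versions are maintained in one place only.
  run git clone --depth 1 --branch "${TRAINER_REF}" "${TRAINER_REPO}" "${TRAINER_DIR}" || return 1
  # Re-use Trainer's own make target instead of keeping a copy in the SDK repo.
  run make -C "${TRAINER_DIR}" test-e2e-setup-cluster "K8S_VERSION=${k8s_version}"
}
```

Running `DRY_RUN=1 setup_cluster_from_trainer 1.32.3` prints the two commands without touching the network, which is handy for checking the wiring before enabling the workflow.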

juliusvonkohout · 2025-06-26T16:40:17Z

Please make sure that you enforce PSS baseline or restricted on the Kubernetes namespaces you create.
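
One way to do this is with the standard Pod Security Admission namespace label, sketched below. The helper names `pss_enforce_label` and `enforce_pss` are hypothetical, and which namespaces the e2e setup actually creates is not stated in this thread.

```shell
# Hypothetical helper: build the Pod Security Standards enforce label.
pss_enforce_label() {
  local level="$1"
  case "${level}" in
    baseline|restricted) echo "pod-security.kubernetes.io/enforce=${level}" ;;
    # Anything else (including "privileged") is rejected on purpose.
    *) echo "unsupported PSS level: ${level}" >&2; return 1 ;;
  esac
}

# Apply the label to an existing namespace; defaults to the baseline level.
enforce_pss() {
  local namespace="$1" level="${2:-baseline}" label
  label="$(pss_enforce_label "${level}")" || return 1
  kubectl label --overwrite namespace "${namespace}" "${label}"
}
```

For example, `enforce_pss my-test-namespace restricted` (with a placeholder namespace name) would label that namespace so that pods violating the restricted profile are rejected at admission.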

andreyvelich · 2025-07-03T12:12:28Z

Hi @briangallagher, did you get a chance to check the comments?

Signed-off-by: Brian Gallagher <briangal@gmail.com>

briangallagher · 2025-07-14T16:12:23Z

> Hi @briangallagher, did you get a chance to check the comments?

Done now. Apologies, I was on extended leave.

briangallagher · 2025-07-16T10:38:27Z

@andreyvelich @Electronic-Waste Can you enable the workflow and support for ubuntu-latest-16-cores? I can't test this fully on my own repo due to resource limitations.

andreyvelich · 2025-07-16T16:14:18Z

> @andreyvelich @Electronic-Waste Can you enable the workflow and support for ubuntu-latest-16-cores? I can't test this fully on my own repo due to resource limitations.

I've enabled them already. Let's see if e2e will succeed.

andreyvelich

Looks great, thank you for this @briangallagher!
/lgtm
/assign @Electronic-Waste @astefanutti

andreyvelich · 2025-07-16T16:17:16Z

/ok-to-test

astefanutti · 2025-07-17T06:56:29Z

.github/workflows/test-e2e.yaml

+
+    strategy:
+      fail-fast: false
+      matrix:


Should we start adding the Trainer version in the matrix right now?

Given that the SDK is compatible with v2 of the Trainer, which is not yet released, I'm not sure what we would put here or how it would work. I think we should address it in a future PR, WDYT? Relatedly, we should finalise the version control and compatibility strategy.

Right, we could put master there right now, but addressing it in a future PR is perfectly fine too.

Added the master ref.

Signed-off-by: Brian Gallagher <briangal@gmail.com>

astefanutti · 2025-07-17T09:34:28Z

/lgtm

Signed-off-by: Brian Gallagher <briangal@gmail.com>

andreyvelich

Thanks @briangallagher!
/lgtm
/approve

google-oss-prow · 2025-07-17T13:47:00Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [andreyvelich]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

google-oss-prow bot added the size/L label Jun 10, 2025

google-oss-prow bot added the ok-to-test label Jun 11, 2025

kramaranya reviewed Jun 12, 2025

tenzen-y reviewed Jun 20, 2025

andreyvelich reviewed Jun 26, 2025

briangallagher added 4 commits July 11, 2025 11:04

add e2e ntoebook tests

93a21d9

Signed-off-by: Brian Gallagher <briangal@gmail.com>

address comments in review for e2e test

3a9ea54

Signed-off-by: Brian Gallagher <briangal@gmail.com>

addressing review comments

8bc9499

Signed-off-by: Brian Gallagher <briangal@gmail.com>

running cluster setup and e2e tests from checked out trainer repo

bce90c0

Signed-off-by: Brian Gallagher <briangal@gmail.com>

briangallagher force-pushed the add-e2e-notebook-tests branch from e7a2398 to bce90c0 Compare July 14, 2025 16:11

google-oss-prow bot added size/M and removed size/L labels Jul 14, 2025

andreyvelich reviewed Jul 16, 2025

google-oss-prow bot assigned astefanutti, Electronic-Waste and andreyvelich Jul 16, 2025

google-oss-prow bot added the lgtm label Jul 16, 2025

andreyvelich reviewed Jul 16, 2025

astefanutti reviewed Jul 17, 2025

update kubernetes-version support

d7fbbed

Signed-off-by: Brian Gallagher <briangal@gmail.com>

google-oss-prow bot removed the lgtm label Jul 17, 2025

google-oss-prow bot added the lgtm label Jul 17, 2025

andreyvelich reviewed Jul 17, 2025

google-oss-prow bot removed the lgtm label Jul 17, 2025

use oci gh arc runner for e2e test workflow

23a8095

Signed-off-by: Brian Gallagher <briangal@gmail.com>

briangallagher force-pushed the add-e2e-notebook-tests branch from fadebd0 to 23a8095 Compare July 17, 2025 12:57

andreyvelich reviewed Jul 17, 2025

google-oss-prow bot added the lgtm label Jul 17, 2025

google-oss-prow bot added the approved label Jul 17, 2025

google-oss-prow bot merged commit 634cda1 into kubeflow:main Jul 17, 2025
7 checks passed

google-oss-prow bot added this to the v0.1 milestone Jul 17, 2025
