8000 add e2e notebook tests by briangallagher · Pull Request #27 · kubeflow/sdk · GitHub
[go: up one dir, main page]

Skip to content

Conversation

briangallagher
Copy link
Contributor

What this PR does / why we need it:
Add e2e tests to ensure that the sdk is still compatible with the latest version of trainer.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Fixes #18

Checklist:

  • Docs included if any changes are user facing

@briangallagher
Copy link
Contributor Author

@andreyvelich Can you approve this workflow?

@Electronic-Waste
Copy link
Member

/ok-to-test
/rerun-all

Makefile Outdated
LOCALBIN ?= $(PROJECT_DIR)/bin

# Tool versions
KIND_VERSION ?= $(shell go list -m -f '{{.Version}}' sigs.k8s.io/kind)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have go.mod in sdk repo, could just set a fallback value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will set to current version, v0.27.0, thanks

Comment on lines 76 to 81
NOTEBOOK_INPUT=./examples/training/pytorch/image-classification/mnist.ipynb \
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_mnist.ipynb \
PAPERMILL_TIMEOUT=900
make test-e2e-notebook \
NOTEBOOK_INPUT=./examples/training/pytorch/question-answering/fine-tune-distilbert.ipynb \
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_fine-tune-distilbert.ipynb \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those should be imported from trainer repo, right?

Suggested change
NOTEBOOK_INPUT=./examples/training/pytorch/image-classification/mnist.ipynb \
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_mnist.ipynb \
PAPERMILL_TIMEOUT=900
make test-e2e-notebook \
NOTEBOOK_INPUT=./examples/training/pytorch/question-answering/fine-tune-distilbert.ipynb \
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_fine-tune-distilbert.ipynb \
NOTEBOOK_INPUT=./trainer/examples/pytorch/image-classification/mnist.ipynb \
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_mnist.ipynb \
PAPERMILL_TIMEOUT=900
make test-e2e-notebook \
NOTEBOOK_INPUT=./trainer/examples/pytorch/question-answering/fine-tune-distilbert.ipynb \
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_fine-tune-distilbert.ipynb \

@briangallagher
Copy link
Contributor Author

@Electronic-Waste @andreyvelich Workflow is waiting approval again.
Also, I think it got stuck in the queue the last time because the workflow is requesting ubuntu-latest-16-cores
I copied this config from the trainer repo so I presume it will be required to execute the notebooks successfully?
Can this 16 core image be approved and configured?

@tenzen-y
Copy link
Member

@Electronic-Waste @andreyvelich Workflow is waiting approval again. Also, I think it got stuck in the queue the last time because the workflow is requesting ubuntu-latest-16-cores I copied this config from the trainer repo so I presume it will be required to execute the notebooks successfully? Can this 16 core image be approved and configured?

Done.

strategy:
fail-fast: false
matrix:
kubernetes-version: ["1.29.14", "1.30.0", "1.31.0", "1.32.3"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop support for Kubernetes v1.29 and add v1.33 ?

Comment on lines 44 to 47
- name: Setup Go
uses: actions/setup-go@v5
with:
go-version: ${{ env.GO_VERSION }} # Use the GO_VERSION environment variable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need Go for this workflow ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go is used to install kind - link Following the approach from Trainer.

Makefile Outdated

.PHONY: test-e2e-setup-cluster
test-e2e-setup-cluster: kind ## Setup Kind cluster for e2e test.
KIND=$(KIND) K8S_VERSION=$(K8S_VERSION) ./hack/e2e-setup-cluster.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we just re-use script from the kubeflow/trainer ?
Maybe we can download repo into tmp/ and run make commands ?
In that case, we don't need to maintain Go and Kind versions in 2 places.

WDYT @briangallagher @tenzen-y ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich Are you suggesting to re-use only e2e-setup-cluster.sh ?
Overtime the sdk repo might need its own cluster setup, something additional to what is in trainer? Is it useful to have it separate from the beginning and allow it to evolve?
That said, I don't mind either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially we don't maintain any control plane components in the SDK, we don't need to have separate installation stage. I would suggest to re-use the script for now and evolve it later if that is required.

@juliusvonkohout
Copy link
Member

Please make sure that you enforce PSS baseline or restricted on the Kubernetes namespaces you create.

@andreyvelich
Copy link
Member

Hi @briangallagher, did you get a chance to check comments ?

Signed-off-by: Brian Gallagher <briangal@gmail.com>
Signed-off-by: Brian Gallagher <briangal@gmail.com>
Signed-off-by: Brian Gallagher <briangal@gmail.com>
Signed-off-by: Brian Gallagher <briangal@gmail.com>
@briangallagher briangallagher force-pushed the add-e2e-notebook-tests branch from e7a2398 to bce90c0 Compare July 14, 2025 16:11
@google-oss-prow google-oss-prow bot added size/M and removed size/L labels Jul 14, 2025
@briangallagher
Copy link
Contributor Author

Hi @briangallagher, did you get a chance to check comments ?

done now, apologies, I was on extended leave.

@briangallagher
Copy link
Contributor Author

@andreyvelich @Electronic-Waste Can you enable the workflow and support for ubuntu-latest-16-cores. I can't test this fully on my own repo due to resource limitations.

@andreyvelich
Copy link
Member

@andreyvelich @Electronic-Waste Can you enable the workflow and support for ubuntu-latest-16-cores. I can't test this fully on my own repo due to resource limitations.

I've enabled them already. Let's see if e2e will succeed.

Copy link
Member
@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you for this @briangallagher!
/lgtm
/assign @Electronic-Waste @astefanutti

@andreyvelich
Copy link
Member

/ok-to-test


strategy:
fail-fast: false
matrix:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we start adding the Trainer version in the matrix right now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the sdk is compatible with v2 of the trainer and it's not yet released, I'm not sure what we would put here or how it would work. I think we should address it in a future PR, wdyt? Related maybe, we should finalise the version control and compatibility strategy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we could either put master there right now, but addressing it in a future PR is perfectly fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added master ref

Signed-off-by: Brian Gallagher <briangal@gmail.com>
@google-oss-prow google-oss-prow bot removed the lgtm label Jul 17, 2025
@astefanutti
Copy link
Contributor

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Jul 17, 2025
@google-oss-prow google-oss-prow bot removed the lgtm label Jul 17, 2025
Signed-off-by: Brian Gallagher <briangal@gmail.com>
@briangallagher briangallagher force-pushed the add-e2e-notebook-tests branch from fadebd0 to 23a8095 Compare July 17, 2025 12:57
Copy link
Member
@andreyvelich andreyvelich left a comment

Choose a reaso 5DA0 n for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @briangallagher!
/lgtm
/approve

@google-oss-prow google-oss-prow bot added the lgtm label Jul 17, 2025
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 634cda1 into kubeflow:main Jul 17, 2025
7 checks passed
@google-oss-prow google-oss-prow bot added this to the v0.1 milestone Jul 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add E2E tests for Kubeflow SDK
7 participants
0