-
Notifications
You must be signed in to change notification settings - Fork 35
add e2e notebook tests #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add e2e notebook tests #27
Conversation
@andreyvelich Can you approve this workflow? |
/ok-to-test |
Makefile
Outdated
LOCALBIN ?= $(PROJECT_DIR)/bin | ||
|
||
# Tool versions | ||
KIND_VERSION ?= $(shell go list -m -f '{{.Version}}' sigs.k8s.io/kind) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have go.mod
in sdk repo, could just set a fallback value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will set to current version, v0.27.0, thanks
.github/workflows/test-e2e.yaml
Outdated
NOTEBOOK_INPUT=./examples/training/pytorch/image-classification/mnist.ipynb \ | ||
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_mnist.ipynb \ | ||
PAPERMILL_TIMEOUT=900 | ||
make test-e2e-notebook \ | ||
NOTEBOOK_INPUT=./examples/training/pytorch/question-answering/fine-tune-distilbert.ipynb \ | ||
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_fine-tune-distilbert.ipynb \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those should be imported from trainer repo, right?
NOTEBOOK_INPUT=./examples/training/pytorch/image-classification/mnist.ipynb \ | |
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_mnist.ipynb \ | |
PAPERMILL_TIMEOUT=900 | |
make test-e2e-notebook \ | |
NOTEBOOK_INPUT=./examples/training/pytorch/question-answering/fine-tune-distilbert.ipynb \ | |
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_fine-tune-distilbert.ipynb \ | |
NOTEBOOK_INPUT=./trainer/examples/pytorch/image-classification/mnist.ipynb \ | |
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_mnist.ipynb \ | |
PAPERMILL_TIMEOUT=900 | |
make test-e2e-notebook \ | |
NOTEBOOK_INPUT=./trainer/examples/pytorch/question-answering/fine-tune-distilbert.ipynb \ | |
NOTEBOOK_OUTPUT=./artifacts/notebooks/${{ matrix.kubernetes-version }}_fine-tune-distilbert.ipynb \ |
@Electronic-Waste @andreyvelich Workflow is waiting approval again. |
Done. |
strategy: | ||
fail-fast: false | ||
matrix: | ||
kubernetes-version: ["1.29.14", "1.30.0", "1.31.0", "1.32.3"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we drop support for Kubernetes v1.29 and add v1.33 ?
.github/workflows/test-e2e.yaml
Outdated
- name: Setup Go | ||
uses: actions/setup-go@v5 | ||
with: | ||
go-version: ${{ env.GO_VERSION }} # Use the GO_VERSION environment variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need Go for this workflow ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go is used to install kind - link Following the approach from Trainer.
Makefile
Outdated
|
||
.PHONY: test-e2e-setup-cluster | ||
test-e2e-setup-cluster: kind ## Setup Kind cluster for e2e test. | ||
KIND=$(KIND) K8S_VERSION=$(K8S_VERSION) ./hack/e2e-setup-cluster.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we just re-use script from the kubeflow/trainer
?
Maybe we can download repo into tmp/
and run make
commands ?
In that case, we don't need to maintain Go and Kind versions in 2 places.
WDYT @briangallagher @tenzen-y ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich Are you suggesting to re-use only e2e-setup-cluster.sh ?
Overtime the sdk repo might need its own cluster setup, something additional to what is in trainer? Is it useful to have it separate from the beginning and allow it to evolve?
That said, I don't mind either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially we don't maintain any control plane components in the SDK, we don't need to have separate installation stage. I would suggest to re-use the script for now and evolve it later if that is required.
Please make sure that you enforce PSS baseline or restricted on the Kubernetes namespaces you create. |
Hi @briangallagher, did you get a chance to check comments ? |
Signed-off-by: Brian Gallagher <briangal@gmail.com>
Signed-off-by: Brian Gallagher <briangal@gmail.com>
Signed-off-by: Brian Gallagher <briangal@gmail.com>
Signed-off-by: Brian Gallagher <briangal@gmail.com>
e7a2398
to
bce90c0
Compare
done now, apologies, I was on extended leave. |
@andreyvelich @Electronic-Waste Can you enable the workflow and support for ubuntu-latest-16-cores. I can't test this fully on my own repo due to resource limitations. |
I've enabled them already. Let's see if e2e will succeed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you for this @briangallagher!
/lgtm
/assign @Electronic-Waste @astefanutti
/ok-to-test |
|
||
strategy: | ||
fail-fast: false | ||
matrix: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we start adding the Trainer version in the matrix right now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the sdk is compatible with v2 of the trainer and it's not yet released, I'm not sure what we would put here or how it would work. I think we should address it in a future PR, wdyt? Related maybe, we should finalise the version control and compatibility strategy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we could either put master
there right now, but addressing it in a future PR is perfectly fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added master ref
Signed-off-by: Brian Gallagher <briangal@gmail.com>
/lgtm |
Signed-off-by: Brian Gallagher <briangal@gmail.com>
fadebd0
to
23a8095
Compare
There was a problem hiding this comment.
Choose a reaso 5DA0 n for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @briangallagher!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreyvelich The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What this PR does / why we need it:
Add e2e tests to ensure that the sdk is still compatible with the latest version of trainer.
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...
format, will close the issue(s) when PR gets merged):Fixes #18
Checklist: