feat(runtimes): Add Framework Label to the Runtimes #2761
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Pull Request Test Coverage Report for Build 16645963859 (Coveralls)
Thank you!!
/lgtm
Out of curiosity, what are the plans with this mpi runtime?
# TODO (andreyvelich): Change this to DeepSpeed or MLX runtime.
metadata:
  name: deepspeed-distributed
  labels:
    trainer.kubeflow.org/trainer-type: custom
Just to be sure, we don't think the trainer type can be safely inferred in the SDK from the framework label?
We can, if we define the mapping of supported built-in trainers in the SDK.
Shall we try to do that initially, @astefanutti?
Yes, I'd be inclined to try that so we keep what has to be exposed on the training runtimes minimal.
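A minimal sketch of what such an SDK-side mapping could look like, so that the runtime only needs to carry the framework label. Everything here is an assumption for illustration and not the SDK's actual API: the `trainer.kubeflow.org/framework` key is modeled on the `trainer.kubeflow.org/trainer-type` key quoted above, and `TrainerType`, `BUILTIN_FRAMEWORKS`, and `infer_trainer_type` are hypothetical names. The choice of `torchtune` as the only built-in framework is also assumed.

```python
from enum import Enum
from typing import Mapping

# Assumed label key, modeled on the trainer-type key quoted in this thread.
FRAMEWORK_LABEL = "trainer.kubeflow.org/framework"


class TrainerType(Enum):
    """Hypothetical trainer types an SDK could distinguish."""

    CUSTOM = "custom"
    BUILTIN = "builtin"


# Hypothetical set of frameworks that ship with a built-in trainer.
# Any framework not listed here is treated as a custom trainer.
BUILTIN_FRAMEWORKS = frozenset({"torchtune"})


def infer_trainer_type(labels: Mapping[str, str]) -> TrainerType:
    """Infer the trainer type from a runtime's labels.

    Raises ValueError when the framework label is missing, since the
    type cannot be inferred without it.
    """
    framework = labels.get(FRAMEWORK_LABEL)
    if framework is None:
        raise ValueError(f"runtime is missing the {FRAMEWORK_LABEL} label")
    if framework in BUILTIN_FRAMEWORKS:
        return TrainerType.BUILTIN
    return TrainerType.CUSTOM


# Usage: labels as they might appear in a runtime's metadata.
runtime_labels = {FRAMEWORK_LABEL: "deepspeed"}
print(infer_trainer_type(runtime_labels))
```

Keeping the mapping in the SDK means adding a new built-in trainer only requires an SDK change, not a new label on every runtime manifest.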
So, YAML users will use the runtime without the trainer.kubeflow.org/trainer-type label? Is this label only intended for validation in the SDK?
So, for yaml users,
@Electronic-Waste I don't think that this is needed for YAML users.
If users are familiar with kubectl, they can always check the TrainJob and TrainingRuntimeSpec by themselves.
Also, it is very tricky to use TorchTune runtimes without the SDK, since users don't know which parameters they can specify (e.g. TorchTuneConfig).
We have a WIP PR to remove it: #2760. We are still discussing how to define the deprecation strategy for the runtimes.
/lgtm
Thx
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull request has been approved by: tenzen-y
* feat(runtimes): Add Trainer Type and Framework Labels
  Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Remove trainer type from the labels
  Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
---------
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
As we discussed in Slack and GitHub, we would like to introduce this label on the runtimes to define the ML framework.
Ref: kubeflow/sdk#31 (comment) and
https://cloud-native.slack.com/archives/C0742LDFZ4K/p1753710956860929
/assign @kubeflow/kubeflow-trainer-team @astefanutti @kramaranya