feat(runtimes): Add Framework Label to the Runtimes #2761

andreyvelich · 2025-07-30T23:05:16Z

As we discussed in Slack and GitHub, we would like to introduce this label on the runtimes to define the ML framework:

trainer.kubeflow.org/framework

Ref: kubeflow/sdk#31 (comment),
https://cloud-native.slack.com/archives/C0742LDFZ4K/p1753710956860929

/assign @kubeflow/kubeflow-trainer-team @astefanutti @kramaranya

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

coveralls · 2025-07-30T23:09:51Z

Pull Request Test Coverage Report for Build 16645963859

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 0.0%

Totals
Change from base Build 16592909581:	0.0%
Covered Lines:	0
Relevant Lines:	0

💛 - Coveralls

kramaranya

Thank you!!
/lgtm

Out of curiosity, what are the plans with this mpi runtime?

trainer/manifests/base/runtimes/mpi_distributed.yaml

Line 1 in dbf13dd

# TODO (andreyvelich): Change this to DeepSpeed or MLX runtime.

astefanutti · 2025-07-31T07:53:51Z

manifests/base/runtimes/deepspeed_distributed.yaml

 metadata:
  name: deepspeed-distributed
+  labels:
+    trainer.kubeflow.org/trainer-type: custom


Just to be sure, we don't think the trainer type can be safely inferred in the SDK from the framework label?

We can if we define the mapping of supported builtin trainers in the SDK.
Shall we try to do that initially @astefanutti ?

Yes, I'd be inclined to try that so we keep what has to be exposed on the training runtimes minimal.

So, for YAML users, they'll use the runtime without the trainer.kubeflow.org/trainer-type label? Is this label only intended for validation in the SDK?

So, for yaml users,

@Electronic-Waste I don't think that this is needed for YAML users.
If users are familiar with kubectl, they can always check the TrainJob and TrainingRuntimeSpec themselves.
Also, it is very tricky to use TorchTune runtimes without the SDK, since users don't know which parameters they can specify (e.g. TorchTuneConfig).
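
The idea raised in this thread, inferring the trainer type in the SDK from the framework label instead of exposing a second label on the runtime, could be sketched roughly as below. This is only an illustration under stated assumptions, not the SDK's actual implementation: the helper names (`get_framework`, `infer_trainer_type`), the set `BUILTIN_TRAINER_FRAMEWORKS`, and the framework values `"torchtune"` and `"deepspeed"` are hypothetical. Only the label key `trainer.kubeflow.org/framework` and the runtime name `deepspeed-distributed` come from this PR.

```python
from typing import Any, Mapping, Optional

# Label key introduced by this PR.
FRAMEWORK_LABEL = "trainer.kubeflow.org/framework"

# Hypothetical: frameworks the SDK would treat as having a builtin trainer.
# The real set, if any, would be defined by the SDK.
BUILTIN_TRAINER_FRAMEWORKS = frozenset({"torchtune"})


def get_framework(runtime: Mapping[str, Any]) -> Optional[str]:
    """Return the framework label from a runtime manifest dict, or None if absent."""
    metadata = runtime.get("metadata") or {}
    labels = metadata.get("labels") or {}
    return labels.get(FRAMEWORK_LABEL)


def infer_trainer_type(runtime: Mapping[str, Any]) -> str:
    """Infer 'builtin' or 'custom' from the framework label alone."""
    framework = get_framework(runtime)
    if framework is None:
        raise ValueError(f"runtime is missing the {FRAMEWORK_LABEL} label")
    return "builtin" if framework in BUILTIN_TRAINER_FRAMEWORKS else "custom"


# Example manifest fragment; the label value "deepspeed" is assumed for illustration.
deepspeed_runtime = {
    "metadata": {
        "name": "deepspeed-distributed",
        "labels": {FRAMEWORK_LABEL: "deepspeed"},
    }
}
print(infer_trainer_type(deepspeed_runtime))
```

Keeping the mapping on the SDK side means the runtime manifests only need to carry the framework label, which matches the preference above for keeping what is exposed on the training runtimes minimal.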

andreyvelich · 2025-07-31T09:51:00Z

Thank you!! /lgtm

Out of curiosity, what are the plans with this mpi runtime?

trainer/manifests/base/runtimes/mpi_distributed.yaml

Line 1 in dbf13dd

# TODO (andreyvelich): Change this to DeepSpeed or MLX runtime.

We have a WIP PR to remove it: #2760. We are still discussing how to define a deprecation strategy for the runtimes.

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

astefanutti · 2025-07-31T10:32:38Z

/lgtm

tenzen-y

Thx
/lgtm
/approve

google-oss-prow · 2025-07-31T10:35:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* feat(runtimes): Add Trainer Type and Framework Labels

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* Remove trainer type from the labels

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

* feat(runtimes): Add Trainer Type and Framework Labels

* Remove trainer type from the labels

---------

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

feat(runtimes): Add Trainer Type and Framework Labels

dbf13dd

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

google-oss-prow bot requested review from astefanutti and kuizhiqing July 30, 2025 23:05

google-oss-prow bot added the size/S label Jul 30, 2025

kramaranya reviewed Jul 31, 2025

google-oss-prow bot assigned kramaranya Jul 31, 2025

google-oss-prow bot added the lgtm label Jul 31, 2025

astefanutti reviewed Jul 31, 2025

Remove trainer type from the labels

90e5bdc

Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>

google-oss-prow bot removed the lgtm label Jul 31, 2025

andreyvelich changed the title ~~feat(runtimes): Add Trainer Type and Framework Labels~~ feat(runtimes): Add Framework Label to the Runtimes Jul 31, 2025

google-oss-prow bot assigned astefanutti Jul 31, 2025

google-oss-prow bot added the lgtm label Jul 31, 2025

tenzen-y reviewed Jul 31, 2025

google-oss-prow bot assigned tenzen-y Jul 31, 2025

google-oss-prow bot added the approved label Jul 31, 2025

google-oss-prow bot merged commit aa46e02 into kubeflow:master Jul 31, 2025
19 checks passed

google-oss-prow bot added this to the v2.1 milestone Jul 31, 2025

andreyvelich deleted the add-runtime-labels branch July 31, 2025 11:30

andreyvelich mentioned this pull request Aug 1, 2025

feat(trainer): Support Framework Labels in Runtimes kubeflow/sdk#56

Merged

andreyvelich mentioned this pull request Aug 19, 2025

feat: Implement TrainerClient Backends & Local Process kubeflow/sdk#33

Merged

1 task

astefanutti mentioned this pull request Sep 24, 2025

[release-2.0] feat(runtimes): Add Framework Label to the Runtimes (#2761) #2851

Merged

1 task
