-
Notifications
You must be signed in to change notification settings - Fork 4
feat: add ability to specify mlflow tag for tesseract #426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #426 +/- ##
==========================================
+ Coverage 66.91% 76.73% +9.81%
==========================================
Files 29 29
Lines 3431 3452 +21
Branches 535 538 +3
==========================================
+ Hits 2296 2649 +353
+ Misses 945 570 -375
- Partials 190 233 +43 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Thanks for the PR. I'm trying to understand what these tags are used for in practice, and whether it really makes sense to set them server-wide (as opposed to per request, like the Would also be interested in @johnbcoughlin's take on this. |
|
@nbowen3 and I spoke about this, here are my takeaways:
Given all of this, I think we should be wary about adding call-time |
| def _ensure_active_run(self) -> None: | ||
| """Ensure there is an active MLflow run, starting one if needed.""" | ||
| if self.mlflow.active_run() is None: | ||
| self.start_run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this all of a sudden?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly protection - someone can call log_... without having an active run which starts one implicitly, which wouldn't include tags. This is to ensure if someone does that we start the run with the tags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example, we don't seem to call start_run in the example metrics tesseract: https://github.com/pasteurlabs/tesseract-core/blob/main/examples/metrics/tesseract_api.py#L19-L21.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC we always start a run internally before every request, so this should be unnecessary unless you actually observed this failure mode in action.
So why not add a generic setting then, like I'd rather have something less pretty but more likely to be useful for a wide range of users than several specialized configs that are always 1 step behind what people actually need. |
@dionhaefner That's fair. I'm not super experienced with python but I believe you could parse that env variable into a dict then do something like |
|
Yes, for example you can use |
@dionhaefner I'll take a pass at implementing that then re-ping. Regarding envvar vs as part of the request, are we fine to keep it as the envvar for now? |
|
Yes that's fine. Thanks! |
Relevant issue or PR
Description of changes
This adds the ability to specify mlflow tags for Tesseract runs via the
TESSERACT_MLFLOW_TAGSenv variable. This is useful to embed additional metadata into the MLFlow run.n.b. As we want to ensure that all runs have the tags if set, we need to make a small change here to check if there is an active run. Previously, if there was no an active run and you logged something, it would implicitly create one. That is a problem as it will not include the tags. Instead we now explicitly check if there an active run; and, if not, we create one with the tags.
Testing done
Local and unit testing