feat: add ability to specify mlflow tag for tesseract #426

nbowen3 · 2025-12-17T20:56:42Z

Relevant issue or PR

Description of changes

This adds the ability to specify mlflow tags for Tesseract runs via the TESSERACT_MLFLOW_TAGS env variable. This is useful to embed additional metadata into the MLFlow run.

n.b. As we want to ensure that all runs have the tags if set, we need to make a small change here to check if there is an active run. Previously, if there was no an active run and you logged something, it would implicitly create one. That is a problem as it will not include the tags. Instead we now explicitly check if there an active run; and, if not, we create one with the tags.

Testing done

Local and unit testing

codecov · 2025-12-17T20:58:52Z

Codecov Report

❌ Patch coverage is 81.81818% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.73%. Comparing base (5284bb6) to head (0a86797).

Files with missing lines	Patch %	Lines
tesseract_core/runtime/mpa.py	80.95%	1 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #426      +/-   ##
==========================================
+ Coverage   66.91%   76.73%   +9.81%     
==========================================
  Files          29       29              
  Lines        3431     3452      +21     
  Branches      535      538       +3     
==========================================
+ Hits         2296     2649     +353     
+ Misses        945      570     -375     
- Partials      190      233      +43

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

dionhaefner · 2025-12-18T08:36:00Z

Thanks for the PR. I'm trying to understand what these tags are used for in practice, and whether it really makes sense to set them server-wide (as opposed to per request, like the run_id parameter we have introduced for a similar purpose). What's the primary use case @nbowen3?

Would also be interested in @johnbcoughlin's take on this.

johnbcoughlin · 2025-12-18T19:28:48Z

@nbowen3 and I spoke about this, here are my takeaways:

P4D uses tesseract run, not tesseract serve, so from P4D's perspective it's much the same to set the env var for the lifetime of the Tesseract.
Passing tags on a per-call basis is definitely more useful than on a per-serve basis, however:
In my opinion the auto-start_run functionality (which uses the TESSERACT_MLFLOW_ family of env vars) is still not useful except for callers who just require a "lowest common denominator" logging experience (like P4D invoking an arbitrary Tesseract).
To make the auto-start_run functionality competitive with just calling start_run yourself, we would need a way to
- thread through most of the parameters listed here, including run_name and parent_run_id.
- disable automatically starting a run in jacobian_vector_product and other autodiff calls.

Given all of this, I think we should be wary about adding call-time start_run parameters to the Tesseract surface area. The benefits of making them call-time apply to non-P4D use cases, and don't begin to materialize until at least 2 or 3 more have been added.

dionhaefner · 2025-12-19T15:10:15Z

tesseract_core/runtime/mpa.py

+    def _ensure_active_run(self) -> None:
+        """Ensure there is an active MLflow run, starting one if needed."""
+        if self.mlflow.active_run() is None:
+            self.start_run()


Why do we need this all of a sudden?

Mostly protection - someone can call log_... without having an active run which starts one implicitly, which wouldn't include tags. This is to ensure if someone does that we start the run with the tags.

As an example, we don't seem to call start_run in the example metrics tesseract: https://github.com/pasteurlabs/tesseract-core/blob/main/examples/metrics/tesseract_api.py#L19-L21.

IIRC we always start a run internally before every request, so this should be unnecessary unless you actually observed this failure mode in action.

dionhaefner · 2025-12-19T15:14:46Z

thread through most of the parameters listed here, including run_name and parent_run_id.

So why not add a generic setting then, like mlflow_extra_args? Then the feature in this PR becomes TESSERACT_MLFLOW_EXTRA_ARGS='{"tags": {"foo": "bar"}, "run_name": "myrun", ...}'.

I'd rather have something less pretty but more likely to be useful for a wide range of users than several specialized configs that are always 1 step behind what people actually need.

nbowen3 · 2025-12-19T16:27:36Z

thread through most of the parameters listed here, including run_name and parent_run_id.

So why not add a generic setting then, like mlflow_extra_args? Then the feature in this PR becomes TESSERACT_MLFLOW_EXTRA_ARGS='{"tags": {"foo": "bar"}, "run_name": "myrun", ...}'.

I'd rather have something less pretty but more likely to be useful for a wide range of users than several specialized configs that are always 1 step behind what people actually need.

@dionhaefner That's fair. I'm not super experienced with python but I believe you could parse that env variable into a dict then do something like mlflow.start_run(**extra_args_dict)? That way we don't have to parse out the keys themselves to individual vars and don't have to make additional parsing logic each time a field is added. We do lose some safety this way but could compare keys against a known list.

dionhaefner · 2025-12-19T16:30:18Z

Yes, for example you can use ast.literal_eval to transform the string contained in the envvar into a Python dict safely.

nbowen3 · 2025-12-19T17:17:19Z

Yes, for example you can use ast.literal_eval to transform the string contained in the envvar into a Python dict safely.

@dionhaefner I'll take a pass at implementing that then re-ping. Regarding envvar vs as part of the request, are we fine to keep it as the envvar for now?

dionhaefner · 2025-12-19T19:50:34Z

Yes that's fine. Thanks!

feat: add ability to specify mlflow tag for tesseract

db739ab

nbowen3 requested review from apaleyes, dionhaefner and xalelax as code owners December 17, 2025 20:56

dionhaefner reviewed Dec 19, 2025

View reviewed changes

nbowen3 added 2 commits December 19, 2025 16:54

update to accept dict for parameters

12310e1

linting

0a86797

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add ability to specify mlflow tag for tesseract #426

feat: add ability to specify mlflow tag for tesseract #426

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat: add ability to specify mlflow tag for tesseract #426

Are you sure you want to change the base?

feat: add ability to specify mlflow tag for tesseract #426

Conversation

Relevant issue or PR

Description of changes

Testing done

Uh oh!

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants