fix: model.fit metric not collected issue. #1085

Genesis929 · 2024-10-15T00:46:27Z

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

shobsi · 2024-10-15T19:07:46Z

bigframes/ml/core.py

        # fit the model, synchronously
        _, job = session._start_query_ml_ddl(sql)
+        if session._metrics is not None:
+            session._metrics.count_job_stats(job)


Should we not do this inside _start_query_ml_ddl? That way all ML DDLs will be accounted for in the session metrics.

Updated, and added an assertion in test for register.
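A minimal sketch of the suggestion above, for readers following the thread. `ToyJob`, `ToyMetrics`, and `ToySession` are hypothetical stand-ins, not the bigframes classes, and the SQL strings are placeholders; only the names `_start_query_ml_ddl`, `_metrics`, `count_job_stats`, and `execution_count` come from the diff and the test in this PR. The point is that counting inside the shared DDL helper covers every caller, so individual call sites such as fit no longer need their own counting code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyJob:
    """Hypothetical stand-in for a finished query job."""
    sql: str


class ToyMetrics:
    """Hypothetical stand-in for the session metrics object."""

    def __init__(self) -> None:
        self.execution_count = 0

    def count_job_stats(self, job: ToyJob) -> None:
        self.execution_count += 1


class ToySession:
    """Hypothetical session; only the counting logic mirrors the thread."""

    def __init__(self, metrics: Optional[ToyMetrics] = None) -> None:
        self._metrics = metrics

    def _start_query_ml_ddl(self, sql: str) -> Tuple[None, ToyJob]:
        job = ToyJob(sql)  # pretend the DDL statement ran
        # Counting here covers every ML DDL, whichever API issued it.
        if self._metrics is not None:
            self._metrics.count_job_stats(job)
        return None, job


session = ToySession(ToyMetrics())
session._start_query_ml_ddl("<DDL issued by fit>")
session._start_query_ml_ddl("<DDL issued by register>")
print(session._metrics.execution_count)  # prints 2
```

A session created without metrics (`ToySession()`) still runs the statement and simply skips the counting step.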

shobsi · 2024-10-15T19:52:42Z

tests/system/large/ml/test_linear_model.py

+
+    start_execution_count = df._block._expr.session._metrics.execution_count
+
    model.fit(X_train, y_train)


I think we have the opportunity to make this work for all ML APIs: fit, score, predict, transform, and so on. We should write such tests for all APIs, with the metrics collection logic written in a more central place in the code.

score and predict already work; it seems they call other functions internally.

Good to know, thanks for checking! We could just pick one or two of the existing tests that do fit, score, predict, register and transform, and assert everywhere that we are counting stats for those operations.

Test Added.
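One way the "assert everywhere" idea could look, sketched with toy objects. `ToyMetrics`, `ToyModel`, and the `assert_jobs_counted` helper are all hypothetical and exist only for illustration; the real tests read `execution_count` from the session metrics as shown in the diff. The helper snapshots the counter before an operation and checks that it grew afterwards.

```python
from contextlib import contextmanager


class ToyMetrics:
    """Hypothetical stand-in for the session metrics object."""

    def __init__(self):
        self.execution_count = 0

    def count_job_stats(self, job=None):
        self.execution_count += 1


class ToyModel:
    """Hypothetical model: each API call runs exactly one counted query."""

    def __init__(self, metrics):
        self._metrics = metrics

    def fit(self, X, y):
        self._metrics.count_job_stats()
        return self

    def score(self, X, y):
        self._metrics.count_job_stats()

    def predict(self, X):
        self._metrics.count_job_stats()


@contextmanager
def assert_jobs_counted(metrics, at_least=1):
    """Fail if fewer than `at_least` jobs were counted inside the block."""
    start = metrics.execution_count
    yield
    delta = metrics.execution_count - start
    assert delta >= at_least, f"expected >= {at_least} counted jobs, got {delta}"


metrics = ToyMetrics()
model = ToyModel(metrics)

with assert_jobs_counted(metrics):
    model.fit([[1.0]], [2.0])
with assert_jobs_counted(metrics):
    model.score([[1.0]], [2.0])
with assert_jobs_counted(metrics):
    model.predict([[1.0]])
```

An exact check (`== 2`, as in the fit test in this PR) is stricter and catches accidental extra queries; the `at_least` form is only a looser variant for operations whose query count is not pinned down.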

shobsi · 2024-10-15T20:53:01Z

bigframes/session/__init__.py

        job_config.destination_encryption_configuration = None

-        return bf_io_bigquery.start_query_with_client(self.bqclient, sql, job_config)
+        results_iterator, query_job = bf_io_bigquery.start_query_with_client(


I see start_query_with_client already has the stats update

python-bigquery-dataframes/bigframes/session/_io/bigquery/__init__.py

Lines 245 to 246 in fd06d31

    if metrics is not None:
        metrics.count_job_stats(query_job)

so it feels like we could be double counting

It's not double counting, but yes, here we can pass the metrics into start_query_with_client. Thanks for pointing that out.
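To make the exchange above concrete, here is a simplified sketch. The `start_query_with_client` below is a toy with a reduced, assumed signature (the real function also takes a client and a job config), and `ToyMetrics` and both wrapper functions are hypothetical. It shows why counting in the caller was not double counting as long as no metrics object was passed down, and why passing it down gives the same count with less code.

```python
from typing import List, Optional, Tuple


class ToyMetrics:
    """Hypothetical stand-in for the session metrics object."""

    def __init__(self) -> None:
        self.execution_count = 0

    def count_job_stats(self, query_job: str) -> None:
        self.execution_count += 1


def start_query_with_client(
    sql: str, metrics: Optional[ToyMetrics] = None
) -> Tuple[List, str]:
    """Toy version: counts the job only when a metrics object is supplied."""
    query_job = f"job for: {sql}"
    if metrics is not None:
        metrics.count_job_stats(query_job)
    return [], query_job


def ddl_counting_in_caller(sql: str, metrics: Optional[ToyMetrics]):
    # No metrics passed down, so the callee skips counting
    # and the caller counts exactly once.
    results, job = start_query_with_client(sql)
    if metrics is not None:
        metrics.count_job_stats(job)
    return results, job


def ddl_passing_metrics(sql: str, metrics: Optional[ToyMetrics]):
    # The callee does the counting; the caller needs no extra logic.
    return start_query_with_client(sql, metrics=metrics)


caller_metrics, passed_metrics = ToyMetrics(), ToyMetrics()
ddl_counting_in_caller("<ML DDL>", caller_metrics)
ddl_passing_metrics("<ML DDL>", passed_metrics)
print(caller_metrics.execution_count, passed_metrics.execution_count)  # prints 1 1
```

Double counting would only appear if a caller both passed the metrics object down and counted the returned job itself.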

shobsi · 2024-10-15T20:53:51Z

tests/system/large/ml/test_linear_model.py

    model.fit(X_train, y_train)

+    end_execution_count = df._block._expr.session._metrics.execution_count
+    assert end_execution_count - start_execution_count == 2


Worth adding a comment on why fit does 2 queries.

fix: model.fit metric not collected issue.

126f1a7

product-auto-label bot added the size: xs and api: bigquery labels Oct 15, 2024

Genesis929 and others added 3 commits October 15, 2024 01:24

update

f343e9e

Merge branch 'main' into model_fit_metric_fix

f97974b

fix unit test

f64eba2

Genesis929 marked this pull request as ready for review October 15, 2024 17:15

Genesis929 requested review from a team as code owners October 15, 2024 17:15

Genesis929 requested a review from TrevorBergeron October 15, 2024 17:15

blunderbuss-gcf bot assigned tswast Oct 15, 2024

Genesis929 added the owlbot:run label Oct 15, 2024

gcf-owl-bot bot removed the owlbot:run label Oct 15, 2024

Genesis929 requested review from chelsea-lin and shobsi and removed request for chelsea-lin October 15, 2024 18:04

shobsi reviewed Oct 15, 2024

update code and test

a10a547

product-auto-label bot added the size: s label and removed the size: xs label Oct 15, 2024

Genesis929 requested a review from shobsi October 15, 2024 20:23

shobsi reviewed Oct 15, 2024

update code

8e7937b

Genesis929 requested a review from shobsi October 15, 2024 21:51

Genesis929 added 2 commits October 15, 2024 22:00

update comment

70f96b0

update test

0a40d19

shobsi approved these changes Oct 15, 2024

Genesis929 merged commit 06cec00 into main Oct 15, 2024

Genesis929 deleted the model_fit_metric_fix branch October 15, 2024 23:13

release-please bot mentioned this pull request Oct 15, 2024

chore(main): release 1.23.0 #1075

Merged

3 participants

