
Fix DaskWorker and add GitHub Actions workflow for Dask tests #686


Merged · 7 commits merged on Aug 19, 2023

Conversation

@adi611 (Contributor) commented Aug 10, 2023

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Summary

Fixes the DaskWorker and adds a GitHub Actions workflow file called testdask.yml for Dask tests.

Checklist

  • I have added tests to cover my changes (if necessary)
  • I have updated documentation (if necessary)

@codecov (bot) commented Aug 11, 2023

Codecov Report

Patch coverage has no change; project coverage changes by -0.10% ⚠️

Comparison is base (29d3d1f) 82.88% compared to head (0f646d2) 82.79%.

❗ Current head 0f646d2 differs from pull request most recent head 3d687b0. Consider uploading reports for the commit 3d687b0 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #686      +/-   ##
==========================================
- Coverage   82.88%   82.79%   -0.10%     
==========================================
  Files          22       22              
  Lines        4845     4848       +3     
  Branches     1391        0    -1391     
==========================================
- Hits         4016     4014       -2     
- Misses        825      834       +9     
+ Partials        4        0       -4     
Flag | Coverage Δ
unittests | 82.79% <0.00%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown.

Files Changed | Coverage Δ
pydra/engine/workers.py | 19.15% <0.00%> (-0.13%) ⬇️

... and 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.

ghisvail and others added 3 commits August 11, 2023 09:35
YAML gotcha 3.10 -> 3.1, versions should be quoted.
- Add concurrency group
- Add permissions
- Bump actions/setup-python to v4
- Add OS (Ubuntu and macOS) to test matrix
- Streamline job steps for testing
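The YAML gotcha mentioned in the first commit above is easy to reproduce: an unquoted 3.10 in the test matrix is parsed as the float 3.1, so Python versions need to be quoted. A minimal sketch using PyYAML (an assumption for illustration; any YAML 1.1 parser behaves the same way):

```python
# Sketch of the YAML version gotcha: unquoted 3.10 is read as the float 3.1.
import yaml  # PyYAML

unquoted = yaml.safe_load("python-version: [3.9, 3.10, 3.11]")
quoted = yaml.safe_load('python-version: ["3.9", "3.10", "3.11"]')

print(unquoted["python-version"])  # [3.9, 3.1, 3.11]  <- 3.10 became 3.1
print(quoted["python-version"])    # ['3.9', '3.10', '3.11']
```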
@ghisvail (Collaborator)

@adi611 @djarecka Interesting: it looks like testing fails on macOS regardless of the Python version.

The tests succeed on Linux for Python 3.9, 3.10 and 3.11, though there are intermittent failures due to some tests running forever without triggering a timeout. This is quite concerning, I think, as we probably don't want Pydra jobs to run indefinitely for unknown reasons.
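One way to keep a hanging test from running forever is a per-test timeout. A hedged sketch assuming the pytest-timeout plugin is installed; the test name here is hypothetical:

```python
# Hypothetical example of bounding a potentially hanging test with pytest-timeout.
import time
import pytest

@pytest.mark.timeout(300)  # fail the test if it does not finish within 5 minutes
def test_wf_finishes_in_time():
    time.sleep(0.1)  # stand-in for running a small workflow
```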

@ghisvail (Collaborator)

pydra/engine/tests/test_workflow.py::test_wf_3nd_st_1[dask] RERUN [ 88%]

This test takes a very long time to run.

@ghisvail (Collaborator)

After more than an hour:

=========================== short test summary info ============================
FAILED pydra/engine/tests/test_workflow.py::test_wf_3nd_st_1[dask] - Exception: graph is not empty, but not able to get more tasks - something may have gone wrong when retrieving the results of predecessor tasks. This could be caused by a file-system error or a bug in the internal workflow logic, but is likely to be caused by the hash of an upstream node being unstable. 

Hash instability can be caused by an input of the node being modified in place, or by psuedo-random ordering of `set` or `frozenset` inputs (or nested attributes of inputs) in the hash calculation. To ensure that sets are hashed consistently you can you can try set the environment variable PYTHONHASHSEED=0 for all processes, but it is best to try to identify where the set objects are occurring and manually hash their sorted elements. (or use list objects instead)

Blocked tasks
-------------

mult (FunctionTask_9119c450eb4aba771bfa0b0d61c16836) is blocked by add2x (FunctionTask_32acf4c9930ee17f343a230ee86c85d3), which matches names of []; add2y (FunctionTask_ad4669811dc84a9749ce5e6c6ecc1204), which matches names of []
= 1 failed, 546 passed, 374 skipped, 3 xfailed, 84 warnings, 3 rerun in 3987.10s (1:06:27) =
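The error message above suggests either pinning PYTHONHASHSEED or hashing the sorted elements of set inputs. A minimal sketch of the second idea (the helper name is made up for illustration; this is not pydra's actual hashing code):

```python
import hashlib

def stable_set_hash(items):
    # Hash the sorted string representations so the digest does not depend on
    # set iteration order, which varies across processes unless PYTHONHASHSEED
    # is fixed.
    digest = hashlib.blake2b()
    for element in sorted(repr(item) for item in items):
        digest.update(element.encode("utf-8"))
    return digest.hexdigest()

assert stable_set_hash({3, 1, 2}) == stable_set_hash({2, 3, 1})
```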

@djarecka (Collaborator) commented Aug 11, 2023

@ghisvail - yes, I had a problem on my laptop.

How is this related to #673? Should we close one of them? If you only run test_workflow (as in #673), does everything work?

@adi611 (Contributor, Author) commented Aug 11, 2023

Should I add a commit updating the GA workflow file to run the tests as two separate jobs: one for test_workflow.py and one for the rest?

@adi611 (Contributor, Author) commented Aug 11, 2023

On ubuntu-latest:

  • for Python 3.10 and 3.11 - all the tests pass
  • for Python 3.9 - test_wf_3nd_st_1[dask] in test_workflow.py fails after running for a considerable time; the other tests pass

On macos-latest:

  • for Python 3.9, 3.10 and 3.11 - test_duplicate_input_on_split_wf in test_workflow.py fails due to a timeout; the other tests pass. Worth noting: this test runs only with the cf plugin, and it passes in other GA workflows such as testpydra.yml

Also, on ubuntu-latest the Dask GA workflow previously failed for both test_duplicate_input_on_split_wf and test_inner_outer_wf_duplicate due to timeouts, but after the recent commits to the Pydra repo it now passes for both.

@ghisvail (Collaborator)

A first remark: the logs contain many occurrences of UserWarning: Port 8787 is already in use.

This might indicate that we are not cleaning up Dask resources properly during the tests.
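Port 8787 is the default Dask dashboard port, so one way to avoid the collisions is to let each LocalCluster bind its dashboard to a random free port. A hedged sketch using dask.distributed directly; whether this belongs in DaskWorker or in the test fixtures is a separate question:

```python
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    # Ask the dashboard to bind to any free port (":0") instead of the default
    # 8787, so parallel test runs do not fight over the same port.
    with LocalCluster(n_workers=2, dashboard_address=":0") as cluster:
        with Client(cluster) as client:
            print(client.submit(sum, [1, 2, 3]).result())
```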

@ghisvail (Collaborator)

I was also wondering: do we have a test workflow that is not too trivial for Dask parallelization, one we could exercise outside of pytest?
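For example, something like the small script below could be run directly against the dask plugin. This is a sketch assuming the pydra 0.x API that this PR targets, not an existing test:

```python
import pydra

@pydra.mark.task
def add2(x):
    return x + 2

@pydra.mark.task
def mult(x, y):
    return x * y

if __name__ == "__main__":
    # Two chained tasks so that Dask actually has a dependency to schedule.
    wf = pydra.Workflow(name="wf", input_spec=["x"], x=5)
    wf.add(add2(name="add2", x=wf.lzin.x))
    wf.add(mult(name="mult", x=wf.lzin.x, y=wf.add2.lzout.out))
    wf.set_output([("out", wf.mult.lzout.out)])

    with pydra.Submitter(plugin="dask") as sub:
        sub(wf)

    print(wf.result())  # expect out == 5 * (5 + 2) == 35
```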

@ghisvail (Collaborator)

More clues that there is probably something going on with Dask resources:

/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/distributed/client.py:1542: RuntimeWarning: coroutine 'wait_for' was never awaited
  self.close()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/distributed/client.py:1542: RuntimeWarning: coroutine 'Client._close' was never awaited
  self.close()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
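Those warnings usually mean a Client created in asynchronous mode was dropped without its close coroutine being awaited. A hedged sketch of the pattern that avoids them, not the actual DaskWorker code:

```python
import asyncio
from dask.distributed import Client

async def run_on_dask(fn, *args):
    # Using the client as an async context manager guarantees the close
    # coroutine is awaited instead of being scheduled and garbage-collected.
    async with Client(processes=False, asynchronous=True) as client:
        future = client.submit(fn, *args)
        return await future

if __name__ == "__main__":
    print(asyncio.run(run_on_dask(sum, [1, 2, 3])))
```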

@ghisvail (Collaborator)

I re-ran the tests for Python 3.10 on ubuntu-latest and it failed this time, sadly.

@djarecka (Collaborator)

@adi611 - so is it exactly the same version that was running fine when each test file was run separately? In my case it doesn't change anything...

I've tried to debug today on my macOS machine but haven't got too far yet... :(

@adi611 (Contributor, Author) commented Aug 17, 2023

> I re-ran the tests for Python 3.10 on ubuntu-latest and it failed this time, sadly.

It is confusing to debug, since even for the same environment the results are inconsistent.

@djarecka (Collaborator)

@adi611 - I think I fixed the DaskWorker. You can either accept the PR that I made to your repository/branch or move to #689.
You can also check how it works on your local machine or on Colab.

@adi611 (Contributor, Author) commented Aug 19, 2023

[adi611-patch-updatedask-1](/adi611/pydra/tree/adi611-patch-updatedask-1)

I tried it on my local machine as well as on Colab, and no test fails! Thank you for the help!

@djarecka mentioned this pull request Aug 19, 2023
@djarecka (Collaborator)

OK, great! I will merge this now. I think it runs much faster now, and we can run more tests with the dask plugin, but we can do that in a separate pull request.
