[FEA] PyTorch and RMM sharing memory pool #501
@brhodes10

Description

Is your feature request related to a problem? Please describe.
Currently I'm running a streamz workflow that uses PyTorch. I notice that I keep encountering errors like the one below, where PyTorch is unable to allocate enough memory.

RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 31.72 GiB total capacity; 29.02 GiB already allocated; 244.88 MiB free; 29.80 GiB reserved in total by PyTorch)

I'm wondering if PyTorch and RMM are competing for memory and, if so, whether there's a recommended way to manage this.
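
One way to check whether the two allocators are actually competing is to compare what PyTorch's caching allocator has reserved against what its live tensors actually use; a minimal diagnostic sketch using standard `torch.cuda` APIs (releasing the cache is only a stopgap, not a fix):

```python
import torch

# Detailed breakdown of PyTorch's CUDA memory usage.
print(torch.cuda.memory_summary())

allocated = torch.cuda.memory_allocated()  # bytes backing live tensors
reserved = torch.cuda.memory_reserved()    # bytes held by PyTorch's caching allocator

# A large reserved-vs-allocated gap means PyTorch is caching GPU memory
# that RMM/cuDF cannot see or use.
print(f"allocated={allocated / 2**30:.2f} GiB, reserved={reserved / 2**30:.2f} GiB")

# Return unused cached blocks to the driver.
torch.cuda.empty_cache()
```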
Describe the solution you'd like
If possible, for PyTorch and RMM to use the same memory pool, or a recommended method for resolving this type of memory issue.
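
For reference, a minimal sketch of what sharing a pool could look like, assuming an RMM version that exposes a PyTorch pluggable allocator (`rmm.allocators.torch.rmm_torch_allocator`) and PyTorch >= 2.0; the pool size is illustrative:

```python
import rmm
import torch
from rmm.allocators.torch import rmm_torch_allocator

# Create a single RMM pool up front (24 GiB here, purely illustrative).
rmm.reinitialize(pool_allocator=True, initial_pool_size=24 * 2**30)

# Route PyTorch's CUDA allocations through RMM instead of its own caching
# allocator. This must run before any CUDA tensors are created.
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)

# From here on, cuDF/RMM allocations and PyTorch tensors draw from the same pool.
x = torch.empty(1024, 1024, device="cuda")
```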

Describe alternatives you've considered
None

Additional context
The end-to-end streamz workflow can be found here. In summary, it first initializes a streamz workflow that uses Dask to read data from Kafka. It then processes that data with cyBERT inferencing, which can be found here. cyBERT uses cuDF for the data pre-processing steps and a BERT model for inferencing. The processed data is then published back to Kafka.
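
Since the workflow runs on Dask, one interim way to keep the two allocators out of each other's way (assuming dask-cuda is used for the workers) is to cap the RMM pool per worker so PyTorch has headroom outside it; `rmm_pool_size` is a LocalCUDACluster option, and the 16GB split below is only illustrative:

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Reserve a fixed RMM pool per GPU worker, leaving the rest of the ~32 GiB
# card for PyTorch's caching allocator.
cluster = LocalCUDACluster(rmm_pool_size="16GB")
client = Client(cluster)
```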
