[FEA] PyTorch and RMM sharing memory pool #501
@brhodes10

Description

Is your feature request related to a problem? Please describe.
Currently I'm running a streamz workflow that uses PyTorch. I notice that I keep encountering errors like the one below, where PyTorch is unable to allocate enough memory.

RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 31.72 GiB total capacity; 29.02 GiB already allocated; 244.88 MiB free; 29.80 GiB reserved in total by PyTorch)

I'm wondering if PyTorch and RMM are competing for memory and, if so, whether there's a recommended way to manage this.
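
One way to check whether the two allocators are actually competing is to compare what PyTorch's caching allocator has reserved against what its live tensors actually use; a minimal diagnostic sketch using standard `torch.cuda` APIs (releasing the cache is only a stopgap, not a fix):

```python
import torch

# Detailed breakdown of PyTorch's CUDA memory usage.
print(torch.cuda.memory_summary())

allocated = torch.cuda.memory_allocated()  # bytes backing live tensors
reserved = torch.cuda.memory_reserved()    # bytes held by PyTorch's caching allocator

# A large reserved-vs-allocated gap means PyTorch is caching GPU memory
# that RMM/cuDF cannot see or use.
print(f"allocated={allocated / 2**30:.2f} GiB, reserved={reserved / 2**30:.2f} GiB")

# Return unused cached blocks to the driver.
torch.cuda.empty_cache()
```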
Describe the solution you'd like
If possible, for PyTorch and RMM to use the same memory pool, or a recommended method for resolving this type of memory issue.
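
For reference, a minimal sketch of what sharing a pool could look like, assuming an RMM version that exposes a PyTorch pluggable allocator (`rmm.allocators.torch.rmm_torch_allocator`) and PyTorch >= 2.0; the pool size is illustrative:

```python
import rmm
import torch
from rmm.allocators.torch import rmm_torch_allocator

# Create a single RMM pool up front (24 GiB here, purely illustrative).
rmm.reinitialize(pool_allocator=True, initial_pool_size=24 * 2**30)

# Route PyTorch's CUDA allocations through RMM instead of its own caching
# allocator. This must run before any CUDA tensors are created.
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)

# From here on, cuDF/RMM allocations and PyTorch tensors draw from the same pool.
x = torch.empty(1024, 1024, device="cuda")
```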

Describe alternatives you've considered
None

Additional context
The end-to-end streamz workflow can be found here. In summary, it first initializes a streamz workflow that uses Dask to read data from Kafka. It then processes that data with cyBERT inferencing, which can be found here. cyBERT uses cuDF for the data pre-processing steps and a BERT model for inferencing. The processed data is then published back to Kafka.
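
Since the workflow runs on Dask, one interim way to keep the two allocators out of each other's way (assuming dask-cuda is used for the workers) is to cap the RMM pool per worker so PyTorch has headroom outside it; `rmm_pool_size` is a LocalCUDACluster option, and the 16GB split below is only illustrative:

```python
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Reserve a fixed RMM pool per GPU worker, leaving the rest of the ~32 GiB
# card for PyTorch's caching allocator.
cluster = LocalCUDACluster(rmm_pool_size="16GB")
client = Client(cluster)
```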
