Description
Is your feature request related to a problem? Please describe.
I'm currently running a streamz workflow that uses PyTorch, and I keep encountering errors like the one below, where PyTorch is unable to allocate enough memory:
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 31.72 GiB total capacity; 29.02 GiB already allocated; 244.88 MiB free; 29.80 GiB reserved in total by PyTorch)
I'm wondering whether PyTorch and RMM are competing for memory and, if so, whether there's a recommended way to manage this.
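As a point of comparison, the device-wide usage can be checked against what PyTorch itself reports; a minimal sketch (using pynvml to read the device totals is my own assumption, not something the workflow already does):

```python
import pynvml
import torch

# Device-wide usage as NVML sees it (includes RMM, PyTorch, and everything else).
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"device used/total: {info.used / 2**30:.2f} / {info.total / 2**30:.2f} GiB")

# PyTorch only accounts for its own caching allocator.
print(f"torch allocated:   {torch.cuda.memory_allocated(0) / 2**30:.2f} GiB")
print(f"torch reserved:    {torch.cuda.memory_reserved(0) / 2**30:.2f} GiB")
```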
Describe the solution you'd like
If possible, a way for PyTorch and RMM to share the same memory pool, or a recommended method for resolving this type of memory issue.
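For what it's worth, a rough sketch of what sharing a single pool could look like, assuming an RMM build that ships a PyTorch-pluggable allocator and a PyTorch version that accepts one (the `rmm_torch_allocator` import and the `change_current_allocator` call below are assumptions about those APIs, not something the current workflow does):

```python
import rmm
import torch

# Let RMM own one pool up front (size is illustrative; adjust to the GPU).
rmm.reinitialize(pool_allocator=True, initial_pool_size=2**34)  # 16 GiB

# Assumption: newer RMM releases expose a PyTorch-pluggable allocator, and
# newer PyTorch releases allow swapping the CUDA allocator at startup.
from rmm.allocators.torch import rmm_torch_allocator
torch.cuda.memory.change_current_allocator(rmm_torch_allocator)
```

Since cuDF already allocates through RMM, something along these lines would leave cuDF and PyTorch drawing from one pool instead of competing for whatever memory the other has left free.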
Describe alternatives you've considered
None
Additional context
The end-to-end streamz workflow can be found here. In short, it first initializes a streamz workflow that uses Dask to read data from Kafka, then processes that data with cyBERT inferencing, which can be found here. cyBERT uses cuDF for the data pre-processing steps and a BERT model for inferencing. The processed data is then published back to Kafka.
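For anyone skimming, a very rough skeleton of that pipeline is below; the broker address, topic names, consumer/producer settings, and the `cybert_inference` placeholder are all illustrative rather than the actual code behind the links, and the exact `from_kafka_batched` parameters may differ between streamz versions:

```python
from dask.distributed import Client
from streamz import Stream

client = Client()  # Dask cluster the streamz pipeline will run on (local here)

consumer_conf = {"bootstrap.servers": "localhost:9092",
                 "group.id": "cybert",
                 "session.timeout.ms": "60000"}
producer_conf = {"bootstrap.servers": "localhost:9092"}

def cybert_inference(messages):
    # Placeholder for the real cyBERT step: cuDF pre-processing of the raw Kafka
    # batch followed by BERT model inference. Here the batch is just echoed back
    # as a serialized string so the sketch stays runnable end to end.
    return str(messages)

# Read batches from Kafka on Dask workers, run inference, publish results back.
source = Stream.from_kafka_batched("input_topic", consumer_conf,
                                   poll_interval="1s", npartitions=1,
                                   asynchronous=True, dask=True)
source.map(cybert_inference).gather().to_kafka("output_topic", producer_conf)
source.start()
```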