Let's see how this makes the docs look · pytorch/pytorch@363fe3c · GitHub

Commit 363fe3c

Let's see how this makes the docs look
1 parent 4d8dca7 commit 363fe3c

3 files changed: +32 -10 lines changed

docs/source/cuda.rst

Lines changed: 2 additions & 0 deletions
@@ -84,6 +84,8 @@ Graphs (beta)
     graph
     make_graphed_callables
 
+.. _cuda-memory-management-api:
+
 Memory management
 -----------------
 .. autosummary::

docs/source/notes/cuda.rst

Lines changed: 26 additions & 10 deletions
@@ -346,29 +346,45 @@ complete snapshot of the memory allocator state via
 :meth:`~torch.cuda.memory_snapshot`, which can help you understand the
 underlying allocation patterns produced by your code.
 
+.. _cuda-memory-envvars:
+
+Environment variables
+^^^^^^^^^^^^^^^^^^^^^
+
 Use of a caching allocator can interfere with memory checking tools such as
 ``cuda-memcheck``. To debug memory errors using ``cuda-memcheck``, set
 ``PYTORCH_NO_CUDA_MEMORY_CACHING=1`` in your environment to disable caching.
 
-The behavior of caching allocator can be controlled via environment variable
+The behavior of the caching allocator can be controlled via the environment variable
 ``PYTORCH_CUDA_ALLOC_CONF``.
 The format is ``PYTORCH_CUDA_ALLOC_CONF=<option>:<value>,<option2>:<value2>...``
 Available options:
 
-* ``max_split_size_mb`` prevents the allocator from splitting blocks larger
-  than this size (in MB). This can help prevent fragmentation and may allow
-  some borderline workloads to complete without running out of memory.
-  Performance cost can range from 'zero' to 'substatial' depending on
-  allocation patterns. Default value is unlimited, i.e. all blocks can be
-  split. The :meth:`~torch.cuda.memory_stats` and
-  :meth:`~torch.cuda.memory_summary` methods are useful for tuning. This
-  option should be used as a last resort for a workload that is aborting
-  due to 'out of memory' and showing a large amount of inactive split blocks.
 * ``backend`` allows selecting the underlying allocator implementation.
   Currently, valid options are ``native``, which uses Pytorch's native
   implementation, and ``cudaMallocAsync``, which uses
   `CUDA's built-in asynchronous allocator`_.
   ``cudaMallocAsync`` requires CUDA 11.4 or newer. The default is ``native``.
+* ``max_split_size_mb`` prevents the native allocator
+  from splitting blocks larger than this size (in MB). This can reduce
+  fragmentation and may allow some borderline workloads to complete without
+  running out of memory. Performance cost can range from 'zero' to 'substantial'
+  depending on allocation patterns. Default value is unlimited, i.e. all blocks
+  can be split. The
+  :meth:`~torch.cuda.memory_stats` and
+  :meth:`~torch.cuda.memory_summary` methods are useful for tuning. This
+  option should be used as a last resort for a workload that is aborting
+  due to 'out of memory' and showing a large amount of inactive split blocks.
+  ``max_split_size_mb`` is only meaningful with ``backend:native``.
+  With ``backend:cudaMallocAsync``, ``max_split_size_mb`` is ignored.
+
+.. note::
+
+    Some stats reported by the
+    :ref:`CUDA memory management API<cuda-memory-management-api>`
+    are specific to ``backend:native``, and are not meaningful with
+    ``backend:cudaMallocAsync``.
+    See each function's docstring for details.
 
 .. _CUDA's built-in asynchronous allocator:
    https://developer.nvidia.com/blog/using-cuda-stream-ordered-memory-allocator-part-1/
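
As a quick illustration of the option string documented above, a minimal sketch (the ``max_split_size_mb:128`` value is arbitrary, and the variable is assumed to be set before the first CUDA allocation so the caching allocator picks it up at initialization):

import os

# Set before the first CUDA allocation: the caching allocator reads
# PYTORCH_CUDA_ALLOC_CONF when it initializes. The 128 MB threshold is
# purely illustrative, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:native,max_split_size_mb:128"

import torch

x = torch.empty(1024, 1024, device="cuda")  # first allocation; settings now in effect
print(torch.cuda.memory_summary())          # useful when tuning max_split_size_mb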

torch/cuda/memory.py

Lines changed: 4 additions & 0 deletions
@@ -183,6 +183,10 @@ def memory_stats(device: Union[Device, int] = None) -> Dict[str, Any]:
     .. note::
         See :ref:`cuda-memory-management` for more details about GPU memory
         management.
+
+    .. note::
+        With :ref:`backend:cudaMallocAsync<cuda-memory-envvars>`, some stats are not
+        meaningful, and are always reported as zero.
     """
     result = []
 
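To make the new ``memory_stats`` note concrete, a minimal sketch of querying the stats it refers to (the specific keys are examples of the documented ``torch.cuda.memory_stats`` output, assumed here for illustration):

import torch

x = torch.randn(4, 1024, 1024, device="cuda")
stats = torch.cuda.memory_stats()

# Allocation totals are reported for either backend.
print(stats["allocated_bytes.all.current"])

# Split-related counters are specific to backend:native; with
# backend:cudaMallocAsync they are reported as zero, per the note above.
print(stats["inactive_split.all.current"])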