Add Python API, semantics and implementation details for DLPack (#106) · data-apis/array-api@e2474ce · GitHub

Commit e2474ce

Add Python API, semantics and implementation details for DLPack (#106)
* Update data interchange section with Python API and semantics
* temporary commit to enable MyST feature; duplicate change with that in the complex64/128 PR
* Add DLPack synchronization semantics; add from_dlpack/__dlpack__ to API
* Update stream numbering for `__dlpack__`. This matches `__cuda_array_interface__`.
* Add __dlpack__ device and update description of stream=None
* Add more device-specific notes for CUDA/ROCm stream handling
* Fix issue where producer/consumer were reversed
* Improve the description of the stream keyword for `__dlpack__`
* Update __dlpack_device__ to use IntEnum for device type
* Add -1 as a sentinel value for DLPack stream handling
* Add supported DLPack version range.
* Add details on strides null and size 0 arrays. Also fix a couple of small textual things.
1 parent 014cef6 commit e2474ce

File tree

4 files changed: +151 -10 lines changed

spec/API_specification/array_object.md

Lines changed: 64 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -377,6 +377,70 @@ Evaluates `x1_i & x2_i` for each element `x1_i` of an array instance `x1` with t
377377
Element-wise results must equal the results returned by the equivalent element-wise function [`bitwise_and(x1, x2)`](elementwise_functions.md#logical_andx1-x2-).
378378
```
379379

380+
(method-__dlpack__)=
381+
### \_\_dlpack\_\_(/, *, stream=None)
382+
383+
Exports the array as a DLPack capsule, for consumption by {ref}`function-from_dlpack`.
384+
385+
#### Parameters
386+
387+
- **stream**: _Optional\[int\]_
388+
389+
- An optional pointer to a stream, as a Python integer, provided by the consumer that the producer will use to make the array safe to operate on. The pointer is a positive integer. `-1` is a special value that may be used by the consumer to signal "producer must not do any synchronization". Device-specific notes:
390+
391+
:::{admonition} CUDA
392+
- `None`: producer must assume the legacy default stream (default),
393+
- `1`: the legacy default stream,
394+
- `2`: the per-thread default stream,
395+
- `> 2`: stream number represented as a Python integer.
396+
397+
Note that `0` is disallowed (it's ambiguous, it could mean either `None`, `1` or `2`).
398+
:::
399+
400+
:::{admonition} ROCm
401+
- `None`: producer must assume the legacy default stream (default),
402+
- `0`: the default stream,
403+
- `> 2`: stream number represented as a Python integer.
404+
405+
Using `1` and `2` is not supported.
406+
:::
407+
408+
```{tip}
409+
It is recommended that implementers explicitly handle streams. If
410+
they use the legacy default stream, specifying `1` (CUDA) or `0`
411+
(ROCm) is preferred. `None` is a safe default for developers who do
412+
not want to think about stream handling at all, potentially at the
413+
cost of more synchronization than necessary.
414+
```
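
The stream rules above can be sketched as a small validation helper. This is only an illustration: the function name `check_stream` and the `"cuda"`/`"rocm"` device labels are hypothetical and not part of the specification.

```python
def check_stream(stream, device):
    """Validate a `stream` value per the device-specific notes above.

    `device` is a hypothetical label ("cuda" or "rocm") used only in this sketch.
    Returns the stream unchanged if it is acceptable, otherwise raises.
    """
    # None: producer assumes the legacy default stream.
    # -1: consumer asks the producer not to synchronize at all.
    if stream is None or stream == -1:
        return stream
    if not isinstance(stream, int):
        raise TypeError("stream must be None or a Python integer")
    if device == "cuda":
        # 1 = legacy default, 2 = per-thread default, > 2 = stream pointer.
        if stream < 1:
            raise ValueError("0 is ambiguous on CUDA; use None, 1 or 2")
    elif device == "rocm":
        # 0 = default stream, > 2 = stream pointer; 1 and 2 are unsupported.
        if stream < 0 or stream in (1, 2):
            raise ValueError("ROCm accepts 0 or a stream pointer > 2")
    else:
        raise ValueError(f"no stream rules known for device {device!r}")
    return stream
```

A producer could call such a helper at the top of `__dlpack__` before deciding whether any synchronization is needed.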
415+
416+
#### Returns
417+
418+
- **capsule**: _<PyCapsule>_
419+
420+
- A DLPack capsule for the array. See {ref}`data-interchange` for details.
421+
422+
(method-__dlpack_device__)=
423+
### \_\_dlpack\_device\_\_()
424+
425+
Returns device type and device ID in DLPack format. Meant for use within {ref}`function-from_dlpack`.
426+
427+
#### Returns
428+
429+
- **device**: _Tuple\[enum.IntEnum, int\]_
430+
431+
- A tuple `(device_type, device_id)` in DLPack format. Valid device type enum members are:
432+
433+
```
CPU = 1
CUDA = 2
CPU_PINNED = 3
OPENCL = 4
VULKAN = 7
METAL = 8
VPI = 9
ROCM = 10
```
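
As a minimal sketch, the enum members listed above could be expressed with `enum.IntEnum`, and a CPU-only array type could return its device as follows. The names `DLDeviceType` and `ToyArray` are hypothetical; the specification only fixes the member values and the shape of the returned tuple.

```python
import enum


class DLDeviceType(enum.IntEnum):
    """Device type values as listed in the specification text above."""
    CPU = 1
    CUDA = 2
    CPU_PINNED = 3
    OPENCL = 4
    VULKAN = 7
    METAL = 8
    VPI = 9
    ROCM = 10


class ToyArray:
    """Hypothetical CPU-only array, used only to show the return value."""

    def __dlpack_device__(self):
        # (device_type, device_id): CPU memory, device 0.
        return (DLDeviceType.CPU, 0)
```

Because `IntEnum` members compare equal to plain integers, a consumer can test `device_type == 1` without importing the producer's enum class.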
443+
380444
(method-__eq__)=
381445
### \_\_eq\_\_(x1, x2, /)
382446

spec/API_specification/creation_functions.md

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -132,6 +132,27 @@ Returns a two-dimensional array with ones on the `k`th diagonal and zeros elsewh
132132

133133
- an array where all elements are equal to zero, except for the `k`th diagonal, whose values are equal to one.
134134

135+
(function-from_dlpack)=
136+
### from_dlpack(x, /)
137+
138+
Returns a new array containing the data from another (array) object with a `__dlpack__` method.
139+
140+
#### Parameters
141+
142+
- **x**: _object_
143+
144+
- input (array) object.
145+
146+
#### Returns
147+
148+
- **out**: _<array>_
149+
150+
- an array containing the data in `x`.
151+
152+
```{note}
153+
The returned array may be either a copy or a view. See {ref}`data-interchange` for details.
154+
```
155+
135156
(function-full)=
136157
### full(shape, fill_value, /, *, dtype=None, device=None)
137158

spec/design_topics/data_interchange.md

Lines changed: 66 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -58,18 +58,74 @@ means the object it is attached to must return a `numpy.ndarray`
5858
containing the data the object holds).
5959
```
6060

61-
TODO: design an appropriate Python API for DLPACK (`to_dlpack` followed by `from_dlpack` is a little clunky, we'd like it to work more like the buffer protocol does on CPU, with a single constructor function).
6261

63-
TODO: specify the expected behaviour with copy/view/move/shared-memory semantics in detail.
62+
## Syntax for data interchange with DLPack
6463

64+
The array API will offer the following syntax for data interchange:
6565

66-
```{note}
66+
1. A `from_dlpack(x)` function, which accepts (array) objects with a
67+
`__dlpack__` method and uses that method to construct a new array
68+
containing the data from `x`.
69+
2. `__dlpack__(self, stream=None)` and `__dlpack_device__` methods on the
70+
array object, which will be called from within `from_dlpack`, to query
71+
what device the array is on (may be needed to pass in the correct
72+
stream, e.g. in the case of multiple GPUs) and to access the data.
73+
74+
75+
## Semantics
76+
77+
DLPack describes the memory layout of strided, n-dimensional arrays.
78+
When a user calls `y = from_dlpack(x)`, the library implementing `x` (the
79+
"producer") will provide access to the data from `x` to the library
80+
containing `from_dlpack` (the "consumer"). If possible, this must be
81+
zero-copy (i.e. `y` will be a _view_ on `x`). If not possible, that library
82+
may make a copy of the data. In both cases:
83+
- the producer keeps owning the memory
84+
- `y` may or may not be a view, therefore the user must keep the
85+
recommendation to avoid mutating `y` in mind - see
86+
{ref}`copyview-mutability`.
87+
- Both `x` and `y` may continue to be used just like arrays created in other ways.
6788

6889
If an array that is accessed via the interchange protocol lives on a
69-
device that the requesting library does not support, one of two things
70-
must happen: moving data to another device, or raising an exception.
71-
Device transfers are typically expensive, hence doing that silently can
72-
lead to hard to detect performance issues. Hence it is recommended to
73-
raise an exception, and let the user explicitly enable device transfers
74-
via, e.g., a `force=False` keyword that they can set to `True`.
75-
```
90+
device that the requesting library does not support, it is recommended to
91+
raise a `TypeError`.
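
The call order and the view semantics described above can be illustrated with a toy, pure-Python sketch. Everything here is a stand-in: `ToyArray` is hypothetical, and a `bytearray` plays the role of the capsule, whereas a real producer returns a `PyCapsule` wrapping a C struct.

```python
CPU = 1   # device type values taken from the enum listed in the spec
CUDA = 2


class ToyArray:
    """Hypothetical producer; a real library returns a PyCapsule."""

    def __init__(self, data, device=(CPU, 0)):
        self.data = bytearray(data)
        self._device = device

    def __dlpack_device__(self):
        return self._device

    def __dlpack__(self, *, stream=None):
        # Stand-in for the capsule: hand out a reference to the buffer.
        return self.data


def from_dlpack(x):
    """Toy CPU-only consumer following the protocol's call order."""
    # 1. Query the device first, so unsupported devices fail early.
    device_type, _device_id = x.__dlpack_device__()
    if device_type != CPU:
        raise TypeError(f"unsupported DLPack device type: {device_type}")
    # 2. Ask the producer for the data; CPU needs no stream.
    capsule = x.__dlpack__(stream=None)
    # 3. Wrap the producer's memory without copying (a view).
    return memoryview(capsule)
```

In this sketch `y = from_dlpack(x)` is a view, so writing to `y` also changes `x`, which is exactly why the text advises against mutating the result.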
92+
93+
Stream handling through the `stream` keyword applies to CUDA and ROCm (perhaps
94+
to other devices that have a stream concept as well, however those haven't been
95+
considered in detail). The consumer must pass the stream it will use to the
96+
producer; the producer must synchronize or wait on the stream when necessary.
97+
In the common case of the default stream being used, synchronization will be
98+
unnecessary so asynchronous execution is enabled.
99+
100+
101+
## Implementation
102+
103+
_Note that while this API standard largely tries to avoid discussing implementation details, some discussion and requirements are needed here because data interchange requires coordination between implementers on, e.g., memory management._
104+
105+
![Diagram of DLPack structs](/_static/images/DLPack_diagram.png)
106+
107+
_DLPack diagram. Dark blue are the structs it defines, light blue struct members, gray text enum values of supported devices and data types._
108+
109+
The `__dlpack__` method will produce a `PyCapsule` containing a
110+
`DLManagedTensor`, which will be consumed immediately within
111+
`from_dlpack` - therefore it is consumed exactly once, and it will not be
112+
visible to users of the Python API.
113+
114+
The consumer must set the PyCapsule name to `"used_dltensor"`, and call the
115+
`deleter` of the `DLManagedTensor` when it no longer needs the data.
116+
117+
When the `strides` field in the `DLTensor` struct is `NULL`, it indicates a
118+
row-major compact array. If the array is of size zero, the data pointer in
119+
`DLTensor` should be set to either `NULL` or `0`.
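
To make the "row-major compact" case concrete, the following sketch computes the strides a consumer would infer from the shape when `strides` is `NULL`. The helper name `row_major_strides` is hypothetical; strides are counted in elements, as in DLPack, not in bytes.

```python
def row_major_strides(shape):
    """Strides (in elements) of a compact row-major array with this shape."""
    strides = []
    step = 1
    # Walk from the last (fastest-varying) axis to the first.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))
```

For example, a shape of `(2, 3, 4)` yields strides `(12, 4, 1)`: moving one step along the first axis skips `3 * 4 = 12` elements.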
120+
121+
DLPack version used must be `0.2 <= DLPACK_VERSION < 1.0`. For further
122+
details on DLPack design and how to implement support for it,
123+
refer to [github.com/dmlc/dlpack](https://github.com/dmlc/dlpack).
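
The version constraint can be written as a simple range check. This sketch assumes the version is available as a hypothetical `(major, minor)` tuple; how a given build of DLPack exposes its version is not covered here.

```python
def dlpack_version_supported(version):
    """Return True if a (major, minor) version satisfies 0.2 <= version < 1.0."""
    # Tuples compare element by element, which matches version ordering.
    return (0, 2) <= tuple(version) < (1, 0)
```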
124+
125+
:::{warning}
126+
DLPack contains a `device_id`, which will be the device ID (an integer, `0, 1, ...`) which the producer library uses. In practice this will likely be the same numbering as that of the consumer, however that is not guaranteed. Depending on the hardware type, it may be possible for the consumer library implementation to look up the actual device from the pointer to the data - this is possible for example for CUDA device pointers.
127+
128+
It is recommended that implementers of this array API consider and document
129+
whether the `.device` attribute of the array returned from `from_dlpack` is
130+
guaranteed to be in a certain order or not.
131+
:::
