|
12 | 12 | 1. A `Buffer` class. A *buffer* is a contiguous block of memory - this is the
|
13 | 13 | only thing that actually maps to a 1-D array in a sense that it could be
|
14 | 14 | converted to NumPy, CuPy, et al.
|
15 |
| -2. A `Column` class. A *column* has a name and a single dtype. It can consist |
| 15 | +2. A `Column` class. A *column* has a single dtype. It can consist |
16 | 16 | of multiple *chunks*. A single chunk of a column (which may be the whole
|
17 | 17 | column if ``num_chunks == 1``) is modeled as again a `Column` instance, and
|
18 | 18 | contains 1 data *buffer* and (optionally) one *mask* for missing data.
|
19 |
| -3. A `DataFrame` class. A *data frame* is an ordered collection of *columns*. |
20 |
| - It has a single device, and all its rows are the same length. It can consist |
21 |
| - of multiple *chunks*. A single chunk of a data frame is modeled as |
22 |
| - again a `DataFrame` instance. |
| 19 | +3. A `DataFrame` class. A *data frame* is an ordered collection of *columns*, |
| 20 | + which are identified with names that are unique strings. All the data |
| 21 | + frame's rows are the same length. It can consist of multiple *chunks*. A |
| 22 | + single chunk of a data frame is modeled as again a `DataFrame` instance. |
23 | 23 | 4. A *mask* concept. A *mask* of a single-chunk column is a *buffer*.
|
24 | 24 | 5. A *chunk* concept. A *chunk* is a sub-dividing element that can be applied
|
25 | 25 | to a *data frame* or a *column*.
|
|
59 | 59 |
|
60 | 60 | Note that row labels could be added in the future - right now there's no clear
|
61 | 61 | requirements for more complex row labels that cannot be represented by a single
|
62 |
| -column. That do exist, for example Modin has has table and tree-based row |
| 62 | +column. These do exist, for example Modin has has table and tree-based row |
63 | 63 | labels.
|
64 | 64 |
|
65 | 65 | """
|
@@ -194,19 +194,19 @@ def offset(self) -> int:
|
194 | 194 | pass
|
195 | 195 |
|
196 | 196 | @property
|
197 |
| - def dtype(self) -> Tuple[int, int, str, str]: |
| 197 | + def dtype(self) -> Tuple[enum.IntEnum, int, str, str]: |
198 | 198 | """
|
199 | 199 | Dtype description as a tuple ``(kind, bit-width, format string, endianness)``
|
200 | 200 |
|
201 | 201 | Kind :
|
202 | 202 |
|
203 |
| - - 0 : signed integer |
204 |
| - - 1 : unsigned integer |
205 |
| - - 2 : IEEE floating point |
206 |
| - - 20 : boolean |
207 |
| - - 21 : string (UTF-8) |
208 |
| - - 22 : datetime |
209 |
| - - 23 : categorical |
| 203 | + - INT = 0 |
| 204 | + - UINT = 1 |
| 205 | + - FLOAT = 2 |
| 206 | + - BOOL = 20 |
| 207 | + - STRING = 21 # UTF-8 |
| 208 | + - DATETIME = 22 |
| 209 | + - CATEGORICAL = 23 |
210 | 210 |
|
211 | 211 | Bit-width : the number of bits as an integer
|
212 | 212 | Format string : data type description format string in Apache Arrow C
|
|
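For readers of this diff, the new `dtype` return annotation and the renamed kind codes can be illustrated with a minimal sketch. The enum name `DtypeKind`, the helper `describe`, and the example tuple (Arrow format string `"g"` for float64, `"="` for native endianness) are assumptions made for illustration; the diff itself only shows the member names, their integer values, and the `Tuple[enum.IntEnum, int, str, str]` annotation.

```python
import enum
from typing import Tuple


class DtypeKind(enum.IntEnum):
    """Kind codes as listed in the updated docstring (class name is hypothetical)."""

    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20
    STRING = 21  # UTF-8
    DATETIME = 22
    CATEGORICAL = 23


def describe(dtype: Tuple[enum.IntEnum, int, str, str]) -> str:
    """Render a (kind, bit-width, format string, endianness) tuple as text.

    Hypothetical helper, not part of the protocol shown in the diff.
    """
    kind, bitwidth, fmt, endianness = dtype
    return f"{kind.name.lower()}{bitwidth} (format={fmt!r}, endianness={endianness!r})"


# Assumed example values: "g" is the Arrow C format string for float64,
# "=" is used here to mean native byte order.
example = (DtypeKind.FLOAT, 64, "g", "=")
print(describe(example))
```

Because `IntEnum` members compare equal to plain integers, code written against the old numeric kinds (for example `kind == 2`) would keep working under this sketch.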