Address some code review comments · data-apis/dataframe-api@c08ec10

Commit c08ec10

Address some code review comments
1 parent 90b4f42 commit c08ec10

2 files changed: +26 -15 lines changed

protocol/dataframe_protocol.py

Lines changed: 14 additions & 14 deletions
@@ -12,14 +12,14 @@
 1. A `Buffer` class. A *buffer* is a contiguous block of memory - this is the
    only thing that actually maps to a 1-D array in a sense that it could be
    converted to NumPy, CuPy, et al.
-2. A `Column` class. A *column* has a name and a single dtype. It can consist
+2. A `Column` class. A *column* has a single dtype. It can consist
    of multiple *chunks*. A single chunk of a column (which may be the whole
    column if ``num_chunks == 1``) is modeled as again a `Column` instance, and
    contains 1 data *buffer* and (optionally) one *mask* for missing data.
-3. A `DataFrame` class. A *data frame* is an ordered collection of *columns*.
-   It has a single device, and all its rows are the same length. It can consist
-   of multiple *chunks*. A single chunk of a data frame is modeled as
-   again a `DataFrame` instance.
+3. A `DataFrame` class. A *data frame* is an ordered collection of *columns*,
+   which are identified with names that are unique strings. All the data
+   frame's rows are the same length. It can consist of multiple *chunks*. A
+   single chunk of a data frame is modeled as again a `DataFrame` instance.
 4. A *mask* concept. A *mask* of a single-chunk column is a *buffer*.
 5. A *chunk* concept. A *chunk* is a sub-dividing element that can be applied
    to a *data frame* or a *column*.
@@ -59,7 +59,7 @@

 Note that row labels could be added in the future - right now there's no clear
 requirements for more complex row labels that cannot be represented by a single
-column. That do exist, for example Modin has has table and tree-based row
+column. These do exist, for example Modin has has table and tree-based row
 labels.

 """
@@ -194,19 +194,19 @@ def offset(self) -> int:
         pass

     @property
-    def dtype(self) -> Tuple[int, int, str, str]:
+    def dtype(self) -> Tuple[enum.IntEnum, int, str, str]:
         """
         Dtype description as a tuple ``(kind, bit-width, format string, endianness)``

         Kind :

-            - 0 : signed integer
-            - 1 : unsigned integer
-            - 2 : IEEE floating point
-            - 20 : boolean
-            - 21 : string (UTF-8)
-            - 22 : datetime
-            - 23 : categorical
+            - INT = 0
+            - UINT = 1
+            - FLOAT = 2
+            - BOOL = 20
+            - STRING = 21 # UTF-8
+            - DATETIME = 22
+            - CATEGORICAL = 23

         Bit-width : the number of bits as an integer
         Format string : data type description format string in Apache Arrow C
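
The change from bare integers to named kinds in the `dtype` docstring stays compatible with consumers that still compare against plain integers, because `enum.IntEnum` members compare equal to their integer values. A minimal sketch, assuming a public enum name `DtypeKind` and a helper `describe` (both hypothetical; the commit itself only defines `_DtypeKind` in the pandas implementation). The format string `"l"` (int64) follows the Apache Arrow C data interface, and `"="` for native endianness is an assumption here:

```python
import enum
from typing import Tuple


class DtypeKind(enum.IntEnum):  # hypothetical name; values copied from the docstring
    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20
    STRING = 21  # UTF-8
    DATETIME = 22
    CATEGORICAL = 23


def describe(dtype: Tuple[int, int, str, str]) -> str:
    """Turn a (kind, bit-width, format string, endianness) tuple into a label."""
    kind, bitwidth, _fmt, _endianness = dtype
    # DtypeKind(kind) accepts either an enum member or the matching plain int.
    return f"{DtypeKind(kind).name.lower()}{bitwidth}"


int64 = (DtypeKind.INT, 64, "l", "=")   # new style: named kind
legacy = (0, 64, "l", "=")              # old style: bare integer kind

# IntEnum members compare equal to ints, so both tuples are interchangeable.
assert int64 == legacy
print(describe(legacy))
```

A producer can therefore return enum members while an older consumer keeps checking `dtype[0] == 0` without modification.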

protocol/pandas_implementation.py

Lines changed: 12 additions & 1 deletion
@@ -68,6 +68,16 @@ def _from_dataframe(df : DataFrameObject) -> pd.DataFrame:
     return pd.DataFrame(columns)


+class _DtypeKind(enum.IntEnum):
+    INT = 0
+    UINT = 1
+    FLOAT = 2
+    BOOL = 20
+    STRING = 21 # UTF-8
+    DATETIME = 22
+    CATEGORICAL = 23
+
+
 def convert_column_to_ndarray(col : ColumnObject) -> np.ndarray:
     """
     """
@@ -82,7 +92,8 @@ def convert_column_to_ndarray(col : ColumnObject) -> np.ndarray:
     _dtype = col.dtype
     kind = _dtype[0]
     bitwidth = _dtype[1]
-    if _dtype[0] not in (0, 1, 2, 20):
+    _k = _DtypeKind
+    if _dtype[0] not in (_k.INT, _k.UINT, _k.FLOAT, _k.BOOL):
         raise RuntimeError("Not a boolean, integer or floating-point dtype")

     _ints = {8: np.int8, 16: np.int16, 32: np.int32, 64: np.int64}
