protocol/dataframe_protocol_summary.md (+14 −6)
@@ -90,7 +90,9 @@ this is a consequence, and that that should be acceptable to them.
## Conceptual model of a dataframe

For a protocol to exchange dataframes between libraries, we need both a model
of what we mean by "dataframe" conceptually for the purposes of the protocol,
and a model of how the data is represented in memory:
@@ -141,9 +143,13 @@ length. A **dataframe** contains one or more chunks.
12. Must support chunking, i.e. accessing the data in "batches" of rows.
    There must be metadata the consumer can access to learn how many chunks
    the data is stored in. The consumer may also convert the data into more
    chunks than it is stored in, i.e. it can ask the producer to slice its
    columns to a shorter length. Such a request must not force the producer
    to concatenate data that is already stored in separate chunks.
    _Rationale: support for chunking is more efficient for libraries that
    natively store chunks, and it is needed for dataframes that do not fit in
    memory (e.g. dataframes stored on disk or lazily evaluated)._
We'll also list some things that were discussed but are not requirements:
@@ -186,7 +192,8 @@ isn't completely necessary, however it's expected that a full dataframe API
standard will have such a function. The array API standard also has such a
function, namely `from_dlpack`. Adding at least a recommendation on syntax
for this function would make sense, e.g., `from_dataframe(df, stream=None)`.
Discussion at https://github.com/data-apis/dataframe-api/issues/29#issuecomment-685903651
is relevant.
## Frequently asked questions
@@ -196,7 +203,8 @@ Discussion at https://github.com/data-apis/dataframe-api/issues/29#issuecomment-
What we are aiming for is quite similar to the Arrow C Data Interface (see
the [rationale for the Arrow C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html#rationale)),
except `__dataframe__` is a Python-level rather than C-level interface.
The data types format specification of that interface is something that could
be used unchanged.
The main (only?) limitation seems to be that it does not have device support
-- @kkraus14 will bring this up on the Arrow dev mailing list. Also note that