Reflow some lines · data-apis/dataframe-api@6839642 · GitHub

Commit 6839642

Reflow some lines

1 parent c0b5759 commit 6839642

File tree: 1 file changed, +14 −6 lines changed

protocol/dataframe_protocol_summary.md

Lines changed: 14 additions & 6 deletions
@@ -90,7 +90,9 @@ this is a consequence, and that that should be acceptable to them.
 
 ## Conceptual model of a dataframe
 
-For a protocol to exchange dataframes between libraries, we need both a model of what we mean by "dataframe" conceptually for the purposes of the protocol, and a model of how the data is represented in memory:
+For a protocol to exchange dataframes between libraries, we need both a model
+of what we mean by "dataframe" conceptually for the purposes of the protocol,
+and a model of how the data is represented in memory:
 
 ![Image of a dataframe model, containing chunks, columns and 1-D arrays](conceptual_model_df_memory.png)
 
@@ -141,9 +143,13 @@ length. A **dataframe** contains one or more chunks.
 12. Must support chunking, i.e. accessing the data in "batches" of rows.
     There must be metadata the consumer can access to learn in how many
     chunks the data is stored. The consumer may also convert the data in
-    more chunks than it is stored in, i.e. it can ask the producer to slice its columns to shorter length. That request may not be such that it would force the producer
-    to concatenate data that is already stored in separate chunks.
-    _Rationale: support for chunking is more efficient for libraries that natively store chunks, and it is needed for dataframes that do not fit in memory (e.g. dataframes stored on disk or lazily evaluated)._
+    more chunks than it is stored in, i.e. it can ask the producer to slice
+    its columns to shorter length. That request may not be such that it would
+    force the producer to concatenate data that is already stored in separate
+    chunks.
+    _Rationale: support for chunking is more efficient for libraries that
+    natively store chunks, and it is needed for dataframes that do not fit in
+    memory (e.g. dataframes stored on disk or lazily evaluated)._
 
 We'll also list some things that were discussed but are not requirements:
 
@@ -186,7 +192,8 @@ isn't completely necessary, however it's expected that a full dataframe API
 standard will have such a function. The array API standard also has such a
 function, namely `from_dlpack`. Adding at least a recommendation on syntax
 for this function would make sense, e.g., `from_dataframe(df, stream=None)`.
-Discussion at https://github.com/data-apis/dataframe-api/issues/29#issuecomment-685903651 is relevant.
+Discussion at https://github.com/data-apis/dataframe-api/issues/29#issuecomment-685903651
+is relevant.
 
 
 ## Frequently asked questions
@@ -196,7 +203,8 @@ Discussion at https://github.com/data-apis/dataframe-api/issues/29#issuecomment-
 What we are aiming for is quite similar to the Arrow C Data Interface (see
 the [rationale for the Arrow C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html#rationale)),
 except `__dataframe__` is a Python-level rather than C-level interface.
-The data types format specification of that interface is something that could be used unchanged.
+The data types format specification of that interface is something that could
+be used unchanged.
 
 The main (only?) limitation seems to be that it does not have device support
 - @kkraus14 will bring this up on the Arrow dev mailing list. Also note that
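The chunking requirement reflowed in the second hunk (item 12) can be illustrated with a short consumer-side sketch. This is a hypothetical illustration only: the method names `num_chunks` and `get_chunks` are assumptions for the sake of the example, not APIs fixed by the protocol at this commit.

```python
# Hypothetical sketch of the chunked access described in item 12.
# The methods `num_chunks()` and `get_chunks(n_chunks=...)` are
# illustrative assumptions, not names fixed by the protocol here.

def consume_in_batches(interchange_df, n_chunks=None):
    """Iterate over an interchange dataframe chunk by chunk.

    `interchange_df` is assumed to be the object returned by
    `obj.__dataframe__()`.
    """
    # Metadata the consumer can access: how many chunks the data
    # is stored in on the producer side.
    stored = interchange_df.num_chunks()

    # The consumer may ask for *more* chunks than are stored (the
    # producer slices its columns to shorter length), but must not
    # ask for fewer, since that could force the producer to
    # concatenate data that is already stored in separate chunks.
    if n_chunks is not None and n_chunks < stored:
        raise ValueError(
            f"requested {n_chunks} chunks, but data is stored in "
            f"{stored}; forcing concatenation is not allowed"
        )

    for chunk in interchange_df.get_chunks(n_chunks=n_chunks):
        yield chunk  # each chunk is itself a (shorter) dataframe
```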
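Similarly, the `from_dataframe(df, stream=None)` syntax recommended in the third hunk could be structured as below for a hypothetical consumer library. Only the signature comes from the text above; the body is an assumption about how such a function might be organized, with `stream` mirroring `from_dlpack` in the array API standard.

```python
# Sketch of the recommended constructor-function syntax for a
# hypothetical consumer library. Only the signature
# `from_dataframe(df, stream=None)` is taken from the text above.

def from_dataframe(df, stream=None):
    """Build this library's dataframe from any `__dataframe__` producer."""
    if not hasattr(df, "__dataframe__"):
        raise TypeError("object does not support the __dataframe__ protocol")
    interchange_df = df.__dataframe__()
    # A real implementation would now walk the chunks and columns of
    # `interchange_df` and build its native storage; `stream` could
    # carry device-stream information, mirroring `from_dlpack`.
    raise NotImplementedError("conversion is library-specific")
```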
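For context on the last hunk: the Arrow C Data Interface encodes data types as short format strings, which is the specification the text suggests could be reused unchanged. A few examples from that specification, shown as a Python mapping for clarity:

```python
# A few Arrow C Data Interface format strings (see the rationale
# page linked in the diff above).
ARROW_FORMAT_EXAMPLES = {
    "b": "boolean",
    "i": "int32",
    "l": "int64",
    "g": "float64",
    "u": "utf8 string",
}
```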
