Add variable-length string support by kgryte · Pull Request #45 · data-apis/dataframe-api · GitHub
Add variable-length string support #45


Closed
34 commits
5a8d030
Update protocol to support returning an offsets buffer
kgryte Jun 23, 2021
3aac477
Add punctuation
kgryte Jun 23, 2021
c7728e2
Update protocol to return dtype along with buffer
kgryte Jun 24, 2021
c8000f7
Add string support in various methods and add todos
kgryte Jun 24, 2021
040f928
Add support for resolving an offsets buffer
kgryte Jun 24, 2021
a982987
Add support for returning a data buffer for string dtypes
kgryte Jun 24, 2021
fd4d71b
Update offsets buffer accessor
kgryte Jun 24, 2021
e40f902
Add implementation to convert a string column
kgryte Jun 24, 2021
c122b3c
Add tests
kgryte Jun 24, 2021
0d04af3
Handle missing values
kgryte Jun 24, 2021
58fee89
Update typing and docs
kgryte Jun 24, 2021
2c4a846
Add comment
kgryte Jun 24, 2021
2e3914f
Requirements document for the dataframe interchange protocol (#35)
rgommers Jun 25, 2021
f9f259c
Remove outdated figures
rgommers Jun 25, 2021
a545faa
Document that strided buffers do not need to be supported
rgommers Jun 25, 2021
52abf7a
Merge pull request #38 from data-apis/protocol-impl
rgommers Jun 25, 2021
6010ae7
Add todo
kgryte Jun 28, 2021
ac1a5ca
Merge branch 'main' of https://github.com/data-apis/dataframe-api int…
kgryte Jun 28, 2021
89a7996
Remove colons
kgryte Jun 28, 2021
a3ff4e7
Fix grammar
kgryte Jul 8, 2021
ff84e8c
Rename methods
kgryte Jul 8, 2021
c954f3c
Rename methods
kgryte Jul 8, 2021
ed64fb7
Update describe_null to indicate a byte array for string dtype
kgryte Jul 8, 2021
9b9aecf
Return encoding for missing values
kgryte Jul 19, 2021
4026900
Update test
kgryte Jul 19, 2021
87d7143
Use invalid value encoding
kgryte Jul 19, 2021
56ee2da
Update copy
kgryte Jul 19, 2021
0035c90
Use Arrow format strings
kgryte Jul 19, 2021
91ed6a1
Add `get_buffers` method to the protocol
kgryte Jul 19, 2021
26fb48d
Remove individual methods
kgryte Jul 19, 2021
0d0e94b
Update copy
kgryte Jul 19, 2021
9ec830c
Refactor to return a dictionary of buffers
kgryte Jul 19, 2021
0dd4e2c
Update comments
kgryte Jul 19, 2021
ade0d76
Fix copy
kgryte Jul 19, 2021
Document that strided buffers do not need to be supported
rgommers committed Jun 25, 2021
commit a545faa1fdc6fa926866600fb989826a6b9721e3
8 changes: 7 additions & 1 deletion protocol/dataframe_protocol_summary.md
@@ -172,7 +172,13 @@ We'll also list some things that were discussed but are not requirements:
 3. Extension dtypes, i.e. a way to extend the set of dtypes that is
 explicitly support, are out of scope.
 _Rationale: complex to support, not used enough to justify that complexity._
-4. "virtual columns", i.e. columns for which the data is not yet in memory
+4. Support for strided storage in buffers.
+_Rationale: this is supported by a subset of dataframes only, mainly those
+that use NumPy arrays. In many real-world use cases, strided arrays will
+force a copy at some point, so requiring contiguous memory layout (and hence
+an extra copy at the moment `__dataframe__` is used) is considered a good
+trade-off for reduced implementation complexity._
+5. "virtual columns", i.e. columns for which the data is not yet in memory
 because it uses lazy evaluation, are not supported other than through
 letting the producer materialize the data in memory when the consumer
 calls `__dataframe__`.
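
To illustrate the trade-off described in the new item 4, here is a minimal sketch of what "an extra copy" could look like on the producer side. It uses only the Python standard library (`array` and `memoryview`) rather than NumPy, and the helper name `as_contiguous` is hypothetical; it is not part of the protocol or of this pull request.

```python
from array import array


def as_contiguous(view: memoryview) -> memoryview:
    """Return `view` unchanged if it is contiguous, else a contiguous copy.

    Hypothetical helper for illustration only; the protocol does not define it.
    """
    if view.contiguous:
        # Already contiguous: no copy is needed.
        return view
    # tobytes() copies the logical elements, in order, into a new bytes object;
    # cast() then reinterprets those bytes using the original element format.
    return memoryview(view.tobytes()).cast(view.format)


values = array("q", [10, 11, 12, 13, 14, 15])  # 8-byte signed integers
strided = memoryview(values)[::2]              # every other element, stride of 16 bytes
packed = as_contiguous(strided)                # contiguous copy, stride of 8 bytes

print(strided.contiguous, strided.strides)
print(packed.contiguous, packed.strides, packed.tolist())
```

A producer whose storage is already contiguous pays nothing here, since the view is returned as-is. Only strided storage incurs the copy, which is the cost the rationale accepts in exchange for a simpler consumer.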