-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
PDEP-10: Add pyarrow as a required dependency #52711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
89a3a3b
cf88b43
dafa709
5e1fbd1
44a3321
ea9f5e3
fbd1aa0
6d667b4
bed5f0b
12622bb
864b8d1
2d4f4fd
bb332ca
a8275fa
1148007
b406dc1
ecc4d5b
ec1c0e3
23eb251
dd7c62a
2ddd82a
3c54d22
1b60fbb
70cdf74
14602a6
2cfb92f
e0e406c
f047032
ed28c04
99de932
99fd739
9384bc7
c3beeb3
8347e83
d740403
959873e
f936280
2db0037
c2b8cfe
4e05151
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
Co-authored-by: Irv Lustig <irv@princeton.com>
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,7 +11,7 @@ | |
|
||
This PDEP proposes that: | ||
|
||
- PyArrow becomes a runtime dependency starting pandas 2.1 | ||
- PyArrow becomes a runtime dependency starting with pandas 2.1 | ||
- The minimum version of PyArrow supported starting pandas 2.1 is version 7. | ||
phofl marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- The minimum version of PyArrow will be bumped every major pandas release to the highest | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm.. This might be too aggressive and might also make it hard to predict what the minimum version will be. I'd recommend following what we do for numpy, which is according to NEP 29, support There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the challenge with offering a similar support window for the two libraries is that NumPy has a very stable ABI whereas PyArrow does not There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure if I'm missing something, but sounds like what @lithomas1 is proposing is pretty much the same as what's written in the proposal but phrased in a different way. Has the proposal been updated, or am I misunderstanding that supporting the releases of the last 24 months, and supporting the highest/oldest version two years old? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I read this as every major release we will bump the min required version of pyarrow to the latest version, but might be misreading here. Note that my proposed change would be different in that we would drop Arrow versions in both major/minor versions (as opposed to every major version), just like we do with numpy (once we reach the end of the NEP support window).
I might have missed some more discussion on this, but I thought we were going to restrict current usage of pyarrow to just what's exposed through Python. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
To the latest version About bumping in major or minor releases, I don't have a preference, either is fine for me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can rephrase this to make it more clear but @datapythonista has it correct. The only distinction here, compared to what we do with numpy today, is that pyarrow would be bumped only during a pandas major release.
Under this proposal, PyArrow will only be used as a runtime dependency There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you consider this to upgrading pyarrow in both major and minor versions to be consistent with numpy? I ask because it is probably tricky for downstream to predict the length of our major release cycle (for 2.0 I think we delayed it twice. IIRC 1.4 was supposed to be 2.0). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sure that would be okay with me too |
||
PyArrow version that has been released for at least 2 years, and the minimum PyArrow version will be | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this version be consistent across the entire pandas API?
e.g. If I wanted to bump the pyarrow version for just the CSV parser to something higher, would I be able to do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The minimum version would be consistent across the library, but IMO that shouldn't stop development of features that exist in newer versions of pyarrow (we already do this with version checking or try/except)