PDEP-10: Add pyarrow as a required dependency by mroeschke · Pull Request #52711 · pandas-dev/pandas

Merged (40 commits) on Jul 30, 2023
Commits
89a3a3b
Start pdep 10
mroeschke Apr 14, 2023
cf88b43
Merge remote-tracking branch 'upstream/main' into pdep/pyarrow
mroeschke Apr 17, 2023
dafa709
finish drawbacks, fix other sections
mroeschke Apr 17, 2023
5e1fbd1
Add number
mroeschke Apr 17, 2023
44a3321
our current version is 7 not 6
mroeschke Apr 17, 2023
ea9f5e3
Merge remote-tracking branch 'upstream/main' into pdep/pyarrow
mroeschke Apr 18, 2023
fbd1aa0
Clarify and fix typo
mroeschke Apr 18, 2023
6d667b4
Update web/pandas/pdeps/0010-required-pyarrow-dependency.md
phofl Apr 21, 2023
bed5f0b
Update web/pandas/pdeps/0010-required-pyarrow-dependency.md
phofl Apr 21, 2023
12622bb
Update web/pandas/pdeps/0010-required-pyarrow-dependency.md
phofl Apr 21, 2023
864b8d1
Add string as a preferential pyarrow type
mroeschke Apr 21, 2023
2d4f4fd
Add metric about number of pyarrow import checks
mroeschke Apr 21, 2023
bb332ca
Clarify with actual call
mroeschke Apr 21, 2023
a8275fa
Clarify with actual call
mroeschke Apr 21, 2023
1148007
Merge remote-tracking branch 'upstream/main' into pdep/pyarrow
mroeschke Apr 28, 2023
b406dc1
Address some comments
mroeschke Apr 28, 2023
ecc4d5b
Update 0010-required-pyarrow-dependency.md
phofl Apr 28, 2023
ec1c0e3
Update 0010-required-pyarrow-dependency.md
phofl Apr 28, 2023
23eb251
add Patrick as an author, remove constraint on only bumping during ma…
mroeschke Apr 28, 2023
dd7c62a
Merge remote-tracking branch 'upstream/main' into pdep/pyarrow
mroeschke May 9, 2023
2ddd82a
Change required proposal for 3.0 to be version requiring pyarrow & st…
mroeschke May 9, 2023
3c54d22
Merge remote-tracking branch 'upstream/main' into pdep/pyarrow
mroeschke May 9, 2023
1b60fbb
Address typos
mroeschke May 9, 2023
70cdf74
Merge branch 'main' into pdep/pyarrow
mroeschke May 24, 2023
14602a6
Merge branch 'main' into pdep/pyarrow
mroeschke Jun 1, 2023
2cfb92f
Merge branch 'main' into pdep/pyarrow
mroeschke Jun 9, 2023
e0e406c
Merge branch 'main' into pdep/pyarrow
mroeschke Jun 20, 2023
f047032
Update 0010-required-pyarrow-dependency.md
phofl Jul 2, 2023
ed28c04
Update web/pandas/pdeps/0010-required-pyarrow-dependency.md
phofl Jul 3, 2023
99de932
Update 0010-required-pyarrow-dependency.md
phofl Jul 4, 2023
99fd739
Update 0010-required-pyarrow-dependency.md
phofl Jul 4, 2023
9384bc7
Update 0010-required-pyarrow-dependency.md
phofl Jul 4, 2023
c3beeb3
Update 0010-required-pyarrow-dependency.md
phofl Jul 4, 2023
8347e83
improve structure, list user benefits more clearly, add faq
MarcoGorelli Jul 5, 2023
d740403
restore little demo
MarcoGorelli Jul 5, 2023
959873e
remove masked part, note that pyarrow dtyeps will likely be ready by 3
MarcoGorelli Jul 5, 2023
f936280
Merge pull request #26 from MarcoGorelli/pdep10-amendments
mroeschke Jul 6, 2023
2db0037
Update 0010-required-pyarrow-dependency.md
phofl Jul 13, 2023
c2b8cfe
Merge branch 'main' into pdep/pyarrow
mroeschke Jul 25, 2023
4e05151
Update 0010-required-pyarrow-dependency.md
phofl Jul 30, 2023
Update web/pandas/pdeps/0010-required-pyarrow-dependency.md
Co-authored-by: Irv Lustig <irv@princeton.com>
phofl and Dr-Irv authored Apr 21, 2023
commit 6d667b483657b72107fee1daa65bfb563788df1b
2 changes: 1 addition & 1 deletion web/pandas/pdeps/0010-required-pyarrow-dependency.md
@@ -11,7 +11,7 @@

This PDEP proposes that:

-- PyArrow becomes a runtime dependency starting pandas 2.1
+- PyArrow becomes a runtime dependency starting with pandas 2.1
- The minimum version of PyArrow supported starting pandas 2.1 is version 7.
Contributor

Would this version be consistent across the entire pandas API?

e.g. If I wanted to bump the pyarrow version for just the CSV parser to something higher, would I be able to do it?

Member Author

The minimum version would be consistent across the library, but IMO that shouldn't stop development of features that exist in newer versions of pyarrow (we already do this with version checking or try/except)
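A minimal sketch of the "version checking or try/except" pattern mentioned above, not the actual pandas implementation. The helper names (`version_tuple`, `installed_pyarrow_version`, `can_use`) and the feature requirement `(11, 0, 0)` in the usage note are hypothetical; only `pyarrow.__version__` and the library-wide floor of 7 come from pyarrow and this proposal.

```python
import re

# Library-wide floor proposed in this PDEP (pyarrow 7).
MIN_PYARROW = (7, 0, 0)


def version_tuple(text):
    """Parse the leading numeric components, e.g. '12.0.1' -> (12, 0, 1)."""
    parts = re.findall(r"\d+", text.split("+")[0])[:3]
    nums = [int(p) for p in parts]
    # Pad short versions such as '7' to three components.
    return tuple(nums + [0] * (3 - len(nums)))


def installed_pyarrow_version():
    """Return the installed pyarrow version as a tuple, or None if it is missing."""
    try:
        import pyarrow
    except ImportError:
        return None
    return version_tuple(pyarrow.__version__)


def can_use(feature_min, installed):
    """True when `installed` meets both the global floor and the feature's own minimum."""
    if installed is None:
        return False
    return installed >= max(MIN_PYARROW, feature_min)
```

A single code path that needs a newer pyarrow could then be gated with something like `can_use((11, 0, 0), installed_pyarrow_version())` and fall back to the existing implementation otherwise, while the declared minimum stays the same for the whole library.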

- The minimum version of PyArrow will be bumped every major pandas release to the highest
Contributor

Hmm.. This might be too aggressive and might also make it hard to predict what the minimum version will be.

I'd recommend following what we do for numpy, which is according to NEP 29, support
"all minor versions of NumPy released in the prior 24 months from the anticipated release date with a minimum of 3 minor versions of NumPy", for arrow as well.
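For illustration, the quoted NEP 29 rule could be sketched roughly as follows. This is only a reading of the sentence above, not code from NumPy or pandas; the function names are hypothetical, and the sketch assumes release dates increase with version number.

```python
import calendar
from datetime import date


def months_before(day, months):
    """Return the date `months` calendar months before `day`."""
    total = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    # Clamp the day so that e.g. Feb 29 maps to Feb 28 in a non-leap year.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def nep29_supported(releases, target, window_months=24, min_versions=3):
    """Versions supported at `target`, oldest first.

    `releases` maps (major, minor) -> release date. Assumes that a higher
    version number always has a later release date.
    """
    released = sorted(v for v, d in releases.items() if d <= target)
    cutoff = months_before(target, window_months)
    in_window = [v for v in released if releases[v] >= cutoff]
    # Keep at least `min_versions` of the newest releases.
    keep = max(len(in_window), min_versions)
    return released[-keep:] if keep else []
```

With made-up release dates (not real NumPy or PyArrow history), only two versions fall inside the 24-month window, so the three-version minimum adds one more:

```python
hypothetical = {
    (1, 0): date(2020, 1, 15),
    (1, 1): date(2020, 9, 1),
    (1, 2): date(2021, 6, 1),
    (1, 3): date(2022, 3, 1),
    (1, 4): date(2023, 1, 10),
}
print(nep29_supported(hypothetical, date(2023, 8, 1)))
```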

Member

I think the challenge with offering a similar support window for the two libraries is that NumPy has a very stable ABI whereas PyArrow does not

Member

Not sure if I'm missing something, but it sounds like what @lithomas1 is proposing is pretty much the same as what's written in the proposal, phrased in a different way. Has the proposal been updated, or am I misunderstanding, and supporting the releases of the last 24 months is not the same as setting the minimum to the highest version that is at least two years old?

Contributor

I read this as every major release we will bump the min required version of pyarrow to the latest version, but might be misreading here.

Note that my proposed change would be different in that we would drop Arrow versions in both major/minor versions (as opposed to every major version), just like we do with numpy (once we reach the end of the NEP support window).

> I think the challenge with offering a similar support window for the two libraries is that NumPy has a very stable ABI whereas PyArrow does not

I might have missed some more discussion on this, but I thought we were going to restrict current usage of pyarrow to just what's exposed through Python.

Member

> I read this as every major release we will bump the min required version of pyarrow to the latest version, but might be misreading here.

To the latest version that has been released for at least 2 years. So, the minimum PyArrow version we support will be around 24 months old, and we should be supporting all the versions since that one, so more or less the same policy as NumPy. @mroeschke not sure if it's easy to rephrase in a way that it's more obvious what's the policy.

About bumping in major or minor releases, I don't have a preference, either is fine for me.
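As a rough sketch of the rule as described in this comment (the minimum is the newest version that has already been out for at least two years at release time), assuming hypothetical function names and made-up release dates:

```python
from datetime import date


def two_years_before(day):
    """Same calendar day two years earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - 2)
    except ValueError:
        return day.replace(year=day.year - 2, day=28)


def minimum_supported(releases, pandas_release):
    """Newest version released at least two years before `pandas_release`.

    `releases` maps a version to its release date. Returns None when no
    release is old enough.
    """
    cutoff = two_years_before(pandas_release)
    old_enough = [v for v, d in releases.items() if d <= cutoff]
    return max(old_enough) if old_enough else None


# Hypothetical major versions and dates, for illustration only.
hypothetical = {
    1: date(2021, 2, 1),
    2: date(2021, 6, 1),
    3: date(2021, 11, 1),
    4: date(2022, 4, 1),
}
floor = minimum_supported(hypothetical, date(2023, 8, 1))
```

Here `floor` is version 2: versions 1 and 2 are both more than two years old on the release date, and 2 is the newer of the two. Everything from that version onward would be supported, which is why the result is close to a 24-month window.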

Member Author

I can rephrase this to make it clearer, but @datapythonista has it correct. The only distinction here, compared to what we do with numpy today, is that pyarrow would be bumped only during a pandas major release.

> I think the challenge with offering a similar support window for the two libraries is that NumPy has a very stable ABI whereas PyArrow does not

Under this proposal, PyArrow will only be used as a runtime dependency

Contributor

Can you consider upgrading pyarrow in both major and minor releases, to be consistent with numpy?

I ask because it is probably tricky for downstream to predict the length of our major release cycle (for 2.0 I think we delayed it twice. IIRC 1.4 was supposed to be 2.0).

Member Author

> Can you consider upgrading pyarrow in both major and minor releases, to be consistent with numpy?

Sure that would be okay with me too

PyArrow version that has been released for at least 2 years, and the minimum PyArrow version will be