Merge pull request #11596 from rgommers/nep-backcompat · numpy/numpy@5f2a5ae · GitHub
Merge pull request #11596 from rgommers/nep-backcompat
NEP: backwards compatibility and deprecation policy
2 parents c233a1e + 5ee09d4 commit 5f2a5ae

1 file changed, 288 additions & 0 deletions
@@ -0,0 +1,288 @@
1+
=======================================================
2+
NEP 23 — Backwards compatibility and deprecation policy
3+
=======================================================
4+
5+
:Author: Ralf Gommers <ralf.gommers@gmail.com>
6+
:Status: Draft
7+
:Type: Process
8+
:Created: 2018-07-14
9+
:Resolution: <url> (required for Accepted | Rejected | Withdrawn)
10+
11+
Abstract
12+
--------
13+
14+
In this NEP we describe NumPy's approach to backwards compatibility,
15+
its deprecation and removal policy, and the trade-offs and decision
16+
processes for individual cases where breaking backwards compatibility
17+
is considered.
18+
19+
20+
Detailed description
21+
--------------------
22+
23+
NumPy has a very large user base. Those users rely on NumPy being stable
24+
and the code they write that uses NumPy functionality to keep working.
25+
NumPy is also actively maintained and improved -- and sometimes improvements
26+
require, or are made much easier by, breaking backwards compatibility.
27+
Finally, there are trade-offs in stability for existing users vs. avoiding
28+
errors or having a better user experience for new users. These competing
29+
needs often give rise to heated debates and delays in accepting or rejecting
30+
contributions. This NEP tries to address that by providing a policy as well
31+
as examples and rationales for when it is or isn't a good idea to break
32+
backwards compatibility.
33+
34+
General principles:
35+
36+
- Aim not to break users' code unnecessarily.
37+
- Aim never to change code in ways that can result in users silently getting
38+
incorrect results from their previously working code.
39+
- Backwards incompatible changes can be made, provided the benefits outweigh
40+
the costs.
41+
- When assessing the costs, keep in mind that most users do not read the mailing
42+
list, do not look at deprecation warnings, and sometimes wait more than one or
43+
two years before upgrading from their old version. Also keep in mind that NumPy has
44+
many hundreds of thousands or even a couple of million users, so "no one will
45+
do or use this" is very likely incorrect.
46+
- Benefits include improved functionality, usability and performance (in order
47+
of importance), as well as lower maintenance cost and improved future
48+
extensibility.
49+
- Bug fixes are exempt from the backwards compatibility policy. However, in case
50+
of serious impact on users (e.g. a downstream library doesn't build anymore),
51+
even bug fixes may have to be delayed for one or more releases.
52+
- The Python API and the C API will be treated in the same way.
53+
54+
55+
Examples
56+
^^^^^^^^
57+
58+
We now discuss a number of concrete examples to illustrate typical issues
59+
and trade-offs.
60+
61+
**Changing the behavior of a function**
62+
63+
``np.histogram`` is probably the most infamous example.
64+
First, a new keyword ``new=False`` was introduced; this was then switched
65+
over to ``None`` one release later, and finally it was removed again.
66+
Also, it has a ``normed`` keyword that had behavior that could be considered
67+
either suboptimal or broken (depending on one's opinion on the statistics).
68+
A new keyword ``density`` was introduced to replace it; ``normed`` started giving
69+
``DeprecationWarning`` only in v1.15.0. Evolution of ``histogram``::
70+
71+
def histogram(a, bins=10, range=None, normed=False):  # v1.0.0
72+
73+
def histogram(a, bins=10, range=None, normed=False, weights=None, new=False):  # v1.1.0
74+
75+
def histogram(a, bins=10, range=None, normed=False, weights=None, new=None):  # v1.2.0
76+
77+
def histogram(a, bins=10, range=None, normed=False, weights=None):  # v1.5.0
78+
79+
def histogram(a, bins=10, range=None, normed=False, weights=None, density=None):  # v1.6.0
80+
81+
def histogram(a, bins=10, range=None, normed=None, weights=None, density=None):  # v1.15.0
82+
# v1.15.0 was the first release where `normed` started emitting
83+
# DeprecationWarnings
84+
85+
The ``new`` keyword was planned from the start to be temporary. Such a plan
86+
forces users to change their code more than once, which is almost never the
87+
right thing to do. Instead, a better approach here would have been to
88+
deprecate ``histogram`` and introduce a new function ``hist`` in its place.
89+
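
The ``normed`` to ``density`` transition described above follows a common pattern: the old keyword defaults to ``None`` so that the function can tell whether the caller passed it, and a ``DeprecationWarning`` is emitted only in that case. The following is a minimal sketch of that pattern, not NumPy's implementation; ``histogram_sketch`` and its toy normalization are hypothetical and exist only for illustration.

```python
import warnings


def histogram_sketch(counts, normed=None, density=None):
    """Toy stand-in (hypothetical) for a function renaming ``normed`` to ``density``."""
    if normed is not None:
        # Warn only when the caller actually used the old keyword.
        warnings.warn(
            "`normed` is deprecated; use `density` instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        if density is None:
            density = normed
    total = sum(counts)
    if density and total:
        # Toy normalization: scale the counts so that they sum to 1.
        return [c / total for c in counts]
    return list(counts)
```

Calling ``histogram_sketch([1, 3], density=True)`` is silent, while ``histogram_sketch([1, 3], normed=True)`` gives the same values and also warns.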
90+
**Returning a view rather than a copy**
91+
92+
The ``ndarray.diagonal`` method used to return a copy. A view would be better for
93+
both performance and design consistency. This change was warned about
94+
(``FutureWarning``) in v1.8.0, and in v1.9.0 ``diagonal`` was changed to return
95+
a *read-only* view. The planned change to a writeable view in v1.10.0 was
96+
postponed due to backwards compatibility concerns, and is still an open issue
97+
(gh-7661).
98+
99+
What should have happened instead: nothing. This change resulted in a lot of
100+
discussions and wasted effort, did not achieve its final goal, and was not that
101+
important in the first place. Finishing the change to a *writeable* view in
102+
the future is not desired, because it will result in users silently getting
103+
different results if they upgraded multiple versions or simply missed the
104+
warnings.
105+
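
The difference between the read-only view and a writeable one can be seen in a short experiment. This is a sketch that assumes NumPy 1.9 or later is installed; the helper ``try_write`` is hypothetical. With a read-only view, old code that writes to the diagonal fails loudly. With a writeable view, the same code would silently modify the parent array instead.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
d = a.diagonal()  # a read-only view on `a` in NumPy >= 1.9


def try_write(arr):
    """Attempt an in-place write and report whether it was refused."""
    try:
        arr[0] = 100
    except ValueError:
        return "refused"
    return "written"


status = try_write(d)  # the read-only view rejects the write

# Code that needs a writeable diagonal can ask for a copy explicitly;
# the parent array `a` is then left untouched.
c = a.diagonal().copy()
c[0] = 100
```

Requesting the copy explicitly behaves the same way whether ``diagonal`` returns a copy or a view.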
106+
**Disallowing indexing with floats**
107+
108+
Indexing an array with floats is asking for something ambiguous, and can be a
109+
sign of a bug in user code. After some discussion, it was deemed a good idea
110+
to deprecate indexing with floats. This was first tried for the v1.8.0
111+
release; however, in pre-release testing it became clear that this would break
112+
many libraries that depend on NumPy. Therefore it was reverted before release,
113+
to give those libraries time to fix their code first. It was finally
114+
introduced for v1.11.0 and turned into a hard error for v1.12.0.
115+
116+
This change was disruptive; however, it did catch real bugs in, e.g., SciPy and
117+
scikit-learn. Overall the change was worth the cost, and introducing it in
118+
master first to allow testing, then removing it again before a release, is a
119+
useful strategy.
120+
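
Downstream code that wants to reject float indices in the same spirit can use the standard library's ``operator.index``, which accepts integer-like objects and raises ``TypeError`` for floats. The sketch below is illustrative only; ``checked_index`` is a hypothetical helper, not part of NumPy.

```python
import operator


def checked_index(i):
    """Return ``i`` as an integer index, refusing floats such as ``2.0``."""
    try:
        # operator.index() calls __index__, which float does not define.
        return operator.index(i)
    except TypeError:
        raise IndexError(
            f"only integers are valid indices, got {type(i).__name__}"
        ) from None


data = [10, 20, 30]
value = data[checked_index(2)]  # fine: 2 is an integer
```

Passing ``2.0`` raises ``IndexError`` instead of being silently truncated, which is how bugs such as an index computed with true division get caught.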
121+
Similar recent deprecations also look like good examples of
122+
cleanups/improvements:
123+
124+
- removing deprecated boolean indexing (gh-8312)
125+
- deprecating truth testing on empty arrays (gh-9718)
126+
- deprecating ``np.sum(generator)`` (gh-10670, one issue with this one is that
127+
its warning message is wrong - this should error in the future).
128+
129+
**Removing the financial functions**
130+
131+
The financial functions (e.g. ``np.pmt``) are badly named, are present in the
132+
main NumPy namespace, and don't really fit well within NumPy's scope.
133+
They were added in 2008 after
134+
`a discussion <https://mail.python.org/pipermail/numpy-discussion/2008-April/032353.html>`_
135+
on the mailing list where opinion was divided (but a majority in favor).
136+
At the moment these functions don't cause a lot of overhead, however there are
137+
multiple issues and PRs a year for them which cost maintainer time to deal
138+
with. They also clutter up the ``numpy`` namespace. Removing them again was
139+
discussed in 2013 (gh-2880).
140+
141+
This case is borderline, but given that they're clearly out of scope,
142+
deprecation and removal out of at least the main ``numpy`` namespace can be
143+
proposed. Alternatively, document clearly that new features for financial
144+
functions are unwanted, to keep the maintenance costs to a minimum.
145+
146+
**Examples of features not added because of backwards compatibility**
147+
148+
TODO: do we have good examples here? Possibly subclassing related?
149+
150+
151+
Removing complete submodules
152+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
153+
154+
This year there have been suggestions to consider removing some or all of
155+
``numpy.distutils``, ``numpy.f2py``, ``numpy.linalg``, and ``numpy.random``.
156+
The motivation was that all these cost maintenance effort, and that they slow
157+
down work on the core of NumPy (ndarrays, dtypes and ufuncs).
158+
159+
The impact on downstream libraries and users would be very large, and
160+
maintenance of these modules would still have to happen. Therefore this is
161+
simply not a good idea; removing these submodules should not happen even for
162+
a new major version of NumPy.
163+
164+
165+
Subclassing of ndarray
166+
^^^^^^^^^^^^^^^^^^^^^^
167+
168+
Subclassing of ``ndarray`` is a pain point. ``ndarray`` was not (or at least
169+
not well) designed to be subclassed. Despite that, a lot of subclasses have
170+
been created even within the NumPy code base itself, and some of those (e.g.
171+
``MaskedArray``, ``astropy.units.Quantity``) are quite popular. The main
172+
problems with subclasses are:
173+
174+
- They make it hard to change ``ndarray`` in ways that would otherwise be
175+
backwards compatible.
176+
- Some of them change the behavior of ``ndarray`` methods, making it difficult to
177+
write code that accepts array duck-types.
178+
179+
Subclassing ``ndarray`` has been officially discouraged for a long time. Of
180+
the most important subclasses, ``np.matrix`` will be deprecated (see gh-10142)
181+
and ``MaskedArray`` will be kept in NumPy (`NEP 17
182+
<http://www.numpy.org/neps/nep-0017-split-out-maskedarray.html>`_).
183+
``MaskedArray`` will ideally be rewritten in a way such that it uses only
184+
public NumPy APIs. For subclasses outside of NumPy, more work is needed to
185+
provide alternatives (e.g. mixins, see gh-9016 and gh-10446) or better support
186+
for custom dtypes (see gh-2899). Until that is done, subclasses need to be
187+
taken into account when making changes to the NumPy code base. A future change
188+
in NumPy to not support subclassing will certainly need a major version
189+
increase.
190+
191+
192+
Policy
193+
------
194+
195+
1. Code changes that have the potential to silently change the results of users'
196+
code must never be made (except in the case of clear bugs).
197+
2. Code changes that break users' code (i.e. the user will see a clear exception)
198+
can be made, *provided the benefit is worth the cost* and suitable deprecation
199+
warnings have been raised first.
200+
3. Deprecation warnings are in all cases warnings that functionality will be removed.
201+
If there is no intent to remove functionality, then deprecation in documentation
202+
only or other types of warnings shall be used.
203+
4. Deprecations for stylistic reasons (e.g. consistency between functions) are
204+
strongly discouraged.
205+
206+
Deprecations:
207+
208+
- shall include the version numbers of both when the functionality was deprecated
209+
and when it will be removed (either two releases after the warning is
210+
introduced, or in the next major version).
211+
- shall include information on alternatives to the deprecated functionality, or a
212+
reason for the deprecation if no clear alternative is available.
213+
- shall use ``VisibleDeprecationWarning`` rather than ``DeprecationWarning``
214+
for cases of relevance to end users (as opposed to cases only relevant to
215+
libraries building on top of NumPy).
216+
- shall be listed in the release notes of the release where the deprecation happened.
217+
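
The requirements in the list above can be folded into a small decorator that builds the warning message from the deprecation version, the removal version and the alternative. This is a sketch under stated assumptions, not NumPy's own helper. The decorator ``deprecated``, the functions ``old_name`` and ``new_name``, and the version numbers are all hypothetical. The local ``VisibleDeprecationWarning`` class only mimics the idea of a warning that derives from ``UserWarning`` and is therefore shown by Python's default filters.

```python
import functools
import warnings


class VisibleDeprecationWarning(UserWarning):
    """Local stand-in: a UserWarning subclass is displayed by default."""


def deprecated(since, removal, alternative, category=DeprecationWarning):
    """Decorator that warns with both version numbers and an alternative."""
    def decorator(func):
        message = (
            f"`{func.__name__}` is deprecated since version {since} and will "
            f"be removed in version {removal}. Use {alternative} instead."
        )

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(message, category, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper
    return decorator


# Placeholder names and versions, for illustration only.
@deprecated("1.15", "1.17", "`new_name`", category=VisibleDeprecationWarning)
def old_name(x):
    return x + 1
```

Choosing the category per call site keeps the end-user versus library-author distinction explicit at the point where the deprecation is declared.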
218+
Removal of deprecated functionality:
219+
220+
- shall be done after 2 releases (assuming a 6-monthly release cycle; if that changes,
221+
there shall be at least 1 year between deprecation and removal), unless the
222+
impact of the removal is such that a major version number increase is
223+
warranted.
224+
- shall be listed in the release notes of the release where the removal happened.
225+
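
The "two releases" rule reduces to simple arithmetic on the minor version number. The helper below is a hypothetical sketch that assumes versions of the form ``major.minor`` or ``major.minor.patch``. It does not model the alternative of waiting for the next major version.

```python
def earliest_removal(deprecated_in, releases_to_wait=2):
    """Return the earliest minor release in which removal is allowed."""
    major, minor = (int(part) for part in deprecated_in.split(".")[:2])
    return f"{major}.{minor + releases_to_wait}"
```

For example, functionality deprecated in ``1.15`` gives ``earliest_removal("1.15") == "1.17"``.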
226+
Versioning:
227+
228+
- removal of deprecated code can be done in any minor (but not bugfix) release.
229+
- for heavily used functionality (e.g. removal of ``np.matrix``, of a whole submodule,
230+
or significant changes to behavior for subclasses) the major version number shall
231+
be increased.
232+
233+
In concrete cases where this policy needs to be applied, decisions are made according
234+
to the `NumPy governance model
235+
<https://docs.scipy.org/doc/numpy/dev/governance/index.html>`_.
236+
237+
Functionality with more strict policies:
238+
239+
- ``numpy.random`` has its own backwards compatibility policy,
240+
see `NEP 19 <http://www.numpy.org/neps/nep-0019-rng-policy.html>`_.
241+
- The file format for ``.npy`` and ``.npz`` files must not be changed in a backwards
242+
incompatible way.
243+
244+
245+
Alternatives
246+
------------
247+
248+
**Being more aggressive with deprecations.**
249+
250+
The goal of being more aggressive is to allow NumPy to move forward faster.
251+
This would avoid others inventing their own solutions (often in multiple
252+
places), as well as be a benefit to users without a legacy code base. We
253+
reject this alternative because of the place NumPy has in the scientific Python
254+
ecosystem - being fairly conservative is required in order to not increase the
255+
extra maintenance for downstream libraries and end users to an unacceptable
256+
level.
257+
258+
**Semantic versioning.**
259+
260+
This would change the versioning scheme for code removals; those could then
261+
only be done when the major version number is increased. Rationale for
262+
rejection: semantic versioning is relatively common in software engineering,
263+
however it is not at all common in the Python world. Also, it would mean that
264+
NumPy's version number simply starts to increase faster, which would be more
265+
confusing than helpful. gh-10156 contains more discussion on this alternative.
266+
267+
268+
Discussion
269+
----------
270+
271+
TODO
272+
273+
This section may just be a bullet list including links to any discussions
274+
regarding the NEP:
275+
276+
- This includes links to mailing list threads or relevant GitHub issues.
277+
278+
279+
References and Footnotes
280+
------------------------
281+
282+
.. [1] TODO
283+
284+
285+
Copyright
286+
---------
287+
288+
This document has been placed in the public domain. [1]_
