=======================================================
NEP 23 — Backwards compatibility and deprecation policy
=======================================================

:Author: Ralf Gommers <ralf.gommers@gmail.com>
:Status: Draft
:Type: Process
:Created: 2018-07-14
:Resolution: <url> (required for Accepted | Rejected | Withdrawn)

Abstract
--------

In this NEP we describe NumPy's approach to backwards compatibility,
its deprecation and removal policy, and the trade-offs and decision
processes for individual cases where breaking backwards compatibility
is considered.


Detailed description
--------------------

NumPy has a very large user base. Those users rely on NumPy being stable,
and on the code they write that uses NumPy functionality continuing to work.
NumPy is also actively maintained and improved, and sometimes improvements
require, or are made much easier by, breaking backwards compatibility.
Finally, there are trade-offs between stability for existing users and
avoiding errors or providing a better user experience for new users. These
competing needs often give rise to heated debates and delays in accepting or
rejecting contributions. This NEP tries to address that by providing a policy
as well as examples and rationales for when it is or isn't a good idea to
break backwards compatibility.

General principles:

- Aim not to break users' code unnecessarily.
- Aim never to change code in ways that can result in users silently getting
  incorrect results from their previously working code.
- Backwards incompatible changes can be made, provided the benefits outweigh
  the costs.
- When assessing the costs, keep in mind that most users do not read the mailing
  list, do not look at deprecation warnings, and sometimes wait more than one or
  two years before upgrading from their old version. Also keep in mind that
  NumPy has many hundreds of thousands or even a couple of million users, so
  "no one will do or use this" is very likely incorrect.
- Benefits include improved functionality, usability and performance (in order
  of importance), as well as lower maintenance cost and improved future
  extensibility.
- Bug fixes are exempt from the backwards compatibility policy. However, in case
  of serious impact on users (e.g. a downstream library doesn't build anymore),
  even bug fixes may have to be delayed for one or more releases.
- The Python API and the C API will be treated in the same way.


Examples
^^^^^^^^

We now discuss a number of concrete examples to illustrate typical issues
and trade-offs.

**Changing the behavior of a function**

``np.histogram`` is probably the most infamous example.
First, a new keyword ``new=False`` was introduced; this was then switched
over to ``None`` one release later, and finally it was removed again.
Also, it has a ``normed`` keyword whose behavior could be considered
either suboptimal or broken (depending on one's opinion on the statistics).
A new keyword ``density`` was introduced to replace it; ``normed`` started giving
``DeprecationWarning`` only in v1.15.0. Evolution of ``histogram``::

    def histogram(a, bins=10, range=None, normed=False): ...  # v1.0.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, new=False): ...  # v1.1.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, new=None): ...  # v1.2.0

    def histogram(a, bins=10, range=None, normed=False, weights=None): ...  # v1.5.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, density=None): ...  # v1.6.0

    def histogram(a, bins=10, range=None, normed=None, weights=None, density=None): ...  # v1.15.0
    # v1.15.0 was the first release where `normed` started emitting
    # DeprecationWarnings

The ``new`` keyword was planned from the start to be temporary. Such a plan
forces users to change their code more than once, which is almost never the
right thing to do. Instead, a better approach here would have been to
deprecate ``histogram`` and introduce a new function ``hist`` in its place.
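
A keyword rename such as ``normed`` to ``density`` is usually handled by
accepting the old keyword for a transition period and warning whenever it is
passed. The following is a minimal sketch of that pattern on a hypothetical
``scaled_counts`` function; it is not NumPy's ``histogram`` code, and the
function name and message text are assumptions::

    import warnings


    def scaled_counts(counts, normed=None, density=None):
        """Hypothetical function: `normed` is the old keyword, `density` the new one."""
        if normed is not None and density is not None:
            raise TypeError("pass either `density` or the deprecated `normed`, not both")
        if normed is not None:
            # Only warn when the caller actually used the old keyword.
            warnings.warn(
                "The `normed` keyword is deprecated; use `density` instead.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            density = normed
        if density:
            total = sum(counts)
            return [c / total for c in counts]
        return list(counts)


    print(scaled_counts([1, 3], density=True))  # [0.25, 0.75]

Using ``None`` as the default (as ``normed`` got in v1.15.0 above) is what
makes it possible to tell whether a caller passed the old keyword at all.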

**Returning a view rather than a copy**

The ``ndarray.diagonal`` method used to return a copy. A view would be better
for both performance and design consistency. This change was warned about
(``FutureWarning``) in v1.8.0, and in v1.9.0 ``diagonal`` was changed to return
a *read-only* view. The planned change to a writeable view in v1.10.0 was
postponed due to backwards compatibility concerns, and is still an open issue
(gh-7661).

What should have happened instead: nothing. This change resulted in a lot of
discussion and wasted effort, did not achieve its final goal, and was not that
important in the first place. Finishing the change to a *writeable* view in
the future is not desired, because it will result in users silently getting
different results if they upgraded multiple versions or simply missed the
warnings.
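
The read-only view can be observed directly. A short check, assuming NumPy
1.9 or later (the array itself is arbitrary)::

    import numpy as np

    a = np.arange(9).reshape(3, 3)
    d = a.diagonal()                  # a view onto `a`, not a copy

    print(np.shares_memory(a, d))     # True: no data was copied
    print(d.flags.writeable)          # False: the view is read-only

    try:
        d[0] = 100                    # writing through the view is refused
    except ValueError as exc:
        print("ValueError:", exc)

    d_copy = a.diagonal().copy()      # an explicit copy is writeable
    d_copy[0] = 100
    print(a[0, 0])                    # 0: the original array is unchanged

Because writing to the view fails loudly, code that relied on the old copy
semantics breaks visibly rather than silently modifying ``a``; a later switch
to a writeable view would remove exactly that safeguard.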

**Disallowing indexing with floats**

Indexing an array with floats is asking for something ambiguous, and can be a
sign of a bug in user code. After some discussion, it was deemed a good idea
to deprecate indexing with floats. This was first tried for the v1.8.0
release; however, in pre-release testing it became clear that this would break
many libraries that depend on NumPy. Therefore it was reverted before release,
to give those libraries time to fix their code first. It was finally
introduced for v1.11.0 and turned into a hard error for v1.12.0.

This change was disruptive; however, it did catch real bugs in, e.g., SciPy and
scikit-learn. Overall the change was worth the cost, and introducing it in
master first to allow testing, then removing it again before a release, is a
useful strategy.
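
With NumPy 1.12 or later, a float index raises an exception instead of being
truncated, while an explicitly converted index still works. A short
illustration; the array and the index value are arbitrary::

    import numpy as np

    x = np.arange(5)
    i = 7 / 2                  # 3.5: a float index, e.g. from true division

    try:
        x[i]
    except IndexError as exc:  # a hard error since v1.12.0
        print("IndexError:", exc)

    print(x[int(i)])           # 3: the caller chooses the rounding explicitly
    print(x[7 // 2])           # 3: floor division already yields an int

The error forces the rounding decision to be made explicitly at the call site.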

Similar recent deprecations also look like good examples of
cleanups/improvements:

- removing deprecated boolean indexing (gh-8312)
- deprecating truth testing on empty arrays (gh-9718)
- deprecating ``np.sum(generator)`` (gh-10670; one issue with this one is that
  its warning message is wrong: this should become an error in the future).
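
Both of the latter deprecations push users toward spelling out what they mean.
A sketch of the explicit forms, using only standard NumPy functions; these
behave the same across NumPy versions, whereas the behavior of the deprecated
spellings depends on the version::

    import numpy as np

    arr = np.array([])

    # Instead of ``if arr:`` (ambiguous for empty arrays), test the size.
    is_nonempty = arr.size > 0
    print(is_nonempty)                        # False

    # Instead of ``np.sum(generator)``, materialize the generator first ...
    squares = (k * k for k in range(4))
    total = np.sum(np.fromiter(squares, dtype=np.int64))
    print(total)                              # 14 (0 + 1 + 4 + 9)

    # ... or use the built-in ``sum``, which consumes generators directly.
    print(sum(k * k for k in range(4)))       # 14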

**Removing the financial functions**

The financial functions (e.g. ``np.pmt``) are badly named, are present in the
main NumPy namespace, and don't really fit well within NumPy's scope.
They were added in 2008 after
`a discussion <https://mail.python.org/pipermail/numpy-discussion/2008-April/032353.html>`_
on the mailing list where opinion was divided (but a majority in favor).
At the moment these functions don't cause a lot of overhead; however, there are
multiple issues and PRs a year for them, which cost maintainer time to deal
with. They also clutter up the ``numpy`` namespace. Removing them again was
discussed in 2013 (gh-2880).

This case is borderline, but given that they're clearly out of scope,
deprecation and removal out of at least the main ``numpy`` namespace can be
proposed. Alternatively, document clearly that new features for financial
functions are unwanted, to keep the maintenance costs to a minimum.
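
One way to deprecate names in a namespace without touching the functions
themselves is a module-level ``__getattr__`` (PEP 562, Python 3.7+), which
Python only calls when normal attribute lookup fails. The sketch below is
hypothetical: ``_pmt`` stands in for a function being moved out of the
namespace, and the ``_DEPRECATED`` mapping and message are assumptions, not
necessarily NumPy's mechanism::

    import warnings


    def _pmt(rate, nper, pv):
        """Hypothetical stand-in: payment per period (future value 0, paid at period end)."""
        if rate == 0:
            return -pv / nper
        growth = (1 + rate) ** nper
        return -pv * rate * growth / (growth - 1)


    # Names that still resolve, but only with a warning.
    _DEPRECATED = {"pmt": _pmt}


    def __getattr__(name):
        # PEP 562: only reached for attributes not found in the module itself.
        if name in _DEPRECATED:
            warnings.warn(
                f"`{name}` is deprecated and will be removed from this namespace.",
                DeprecationWarning,
                stacklevel=2,
            )
            return _DEPRECATED[name]
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

Placed in a package's ``__init__.py``, ``package.pmt`` keeps working for
existing callers but warns; deleting the entry from ``_DEPRECATED`` later turns
the same access into an ``AttributeError``.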

**Examples of features not added because of backwards compatibility**

TODO: do we have good examples here? Possibly subclassing related?


Removing complete submodules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This year (2018) there have been suggestions to consider removing some or all
of ``numpy.distutils``, ``numpy.f2py``, ``numpy.linalg``, and ``numpy.random``.
The motivation was that all of these cost maintenance effort and slow down
work on the core of NumPy (ndarrays, dtypes and ufuncs).

The impact on downstream libraries and users would be very large, and
maintenance of these modules would still have to happen. Therefore this is
simply not a good idea; removing these submodules should not happen even for
a new major version of NumPy.


Subclassing of ndarray
^^^^^^^^^^^^^^^^^^^^^^

Subclassing of ``ndarray`` is a pain point. ``ndarray`` was not (or at least
not well) designed to be subclassed. Despite that, many subclasses have been
created, even within the NumPy code base itself, and some of them (e.g.
``MaskedArray``, ``astropy.units.Quantity``) are quite popular. The main
problems with subclasses are:

- They make it hard to change ``ndarray`` in ways that would otherwise be
  backwards compatible.
- Some of them change the behavior of ``ndarray`` methods, making it difficult
  to write code that accepts array duck-types.

Subclassing ``ndarray`` has been officially discouraged for a long time. Of
the most important subclasses, ``np.matrix`` will be deprecated (see gh-10142)
and ``MaskedArray`` will be kept in NumPy (`NEP 17
<http://www.numpy.org/neps/nep-0017-split-out-maskedarray.html>`_).
``MaskedArray`` will ideally be rewritten in a way such that it uses only
public NumPy APIs. For subclasses outside of NumPy, more work is needed to
provide alternatives (e.g. mixins, see gh-9016 and gh-10446) or better support
for custom dtypes (see gh-2899). Until that is done, subclasses need to be
taken into account when making changes to the NumPy code base. A future change
in NumPy to not support subclassing will certainly need a major version
increase.
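
As an illustration of the mixin route, ``numpy.lib.mixins.NDArrayOperatorsMixin``
(available since NumPy 1.13) defines Python's arithmetic and comparison
operators in terms of ``__array_ufunc__``, so a container can wrap an
``ndarray`` instead of subclassing it. The ``Tagged`` class below is
hypothetical and deliberately minimal: it rejects ``out=`` and implements
nothing beyond ufunc-based operations::

    import numpy as np
    from numpy.lib.mixins import NDArrayOperatorsMixin


    class Tagged(NDArrayOperatorsMixin):
        """Hypothetical wrapper carrying a label alongside an ndarray."""

        def __init__(self, value, tag):
            self.value = np.asarray(value)
            self.tag = tag

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            # Simplification: in-place output arrays are not supported.
            if "out" in kwargs:
                return NotImplemented
            # Unwrap Tagged inputs, run the ufunc on plain data, re-wrap.
            arrays = [x.value if isinstance(x, Tagged) else x for x in inputs]
            result = getattr(ufunc, method)(*arrays, **kwargs)
            return Tagged(result, self.tag)


    t = Tagged([1, 2, 3], tag="metres")
    print((t + 1).value, (t + 1).tag)                 # [2 3 4] metres
    print((2 * t).value)                              # [2 4 6]
    print(np.sqrt(Tagged([4.0, 9.0], "area")).value)  # [2. 3.]

Because ``Tagged`` is not an ``ndarray``, changes to ``ndarray`` internals
cannot break it through inheritance; the cost is that every supported
operation has to be forwarded explicitly.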


Policy
------

1. Code changes that have the potential to silently change the results of users'
   code must never be made (except in the case of clear bugs).
2. Code changes that break users' code (i.e. the user will see a clear exception)
   can be made, *provided the benefit is worth the cost* and suitable deprecation
   warnings have been raised first.
3. Deprecation warnings are in all cases warnings that functionality will be removed.
   If there is no intent to remove functionality, then deprecation in documentation
   only or other types of warnings shall be used.
4. Deprecations for stylistic reasons (e.g. consistency between functions) are
   strongly discouraged.

Deprecations:

- shall include both the version in which the functionality was deprecated and
  the version in which it will be removed (either two releases after the warning
  is introduced, or in the next major version).
- shall include information on alternatives to the deprecated functionality, or a
  reason for the deprecation if no clear alternative is available.
- shall use ``VisibleDeprecationWarning`` rather than ``DeprecationWarning``
  for cases of relevance to end users (as opposed to cases only relevant to
  libraries building on top of NumPy).
- shall be listed in the release notes of the release where the deprecation happened.
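
The first two requirements can be met mechanically when the warning message is
built. The helper below is a hypothetical sketch, not an existing NumPy
function; it defaults to the standard-library ``DeprecationWarning`` and lets
the caller pass another category (such as NumPy's
``VisibleDeprecationWarning``) for end-user-facing cases::

    import warnings


    def warn_deprecated(name, since, remove_in, alternative=None, reason=None,
                        category=DeprecationWarning):
        """Hypothetical helper that emits a policy-conforming deprecation warning."""
        if alternative is None and reason is None:
            raise ValueError("a deprecation needs an alternative or a reason")
        message = (f"`{name}` is deprecated since version {since} "
                   f"and will be removed in version {remove_in}.")
        if alternative is not None:
            message += f" Use `{alternative}` instead."
        else:
            message += f" Reason: {reason}"
        # stacklevel=3 skips this helper and the deprecated function itself,
        # so the warning points at the user's call site.
        warnings.warn(message, category, stacklevel=3)


    def old_mean(values):
        """Hypothetical deprecated function."""
        warn_deprecated("old_mean", since="1.16", remove_in="1.18",
                        alternative="statistics.fmean")
        return sum(values) / len(values)

Here the removal version is two minor releases after the deprecation, matching
the two-release rule in the first requirement.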

Removal of deprecated functionality:

- shall be done after 2 releases (assuming a 6-monthly release cycle; if that changes,
  there shall be at least 1 year between deprecation and removal), unless the
  impact of the removal is such that a major version number increase is
  warranted.
- shall be listed in the release notes of the release where the removal happened.

Versioning:

- removal of deprecated code can be done in any minor (but not bugfix) release.
- for heavily used functionality (e.g. removal of ``np.matrix``, of a whole submodule,
  or significant changes to behavior for subclasses) the major version number shall
  be increased.
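
Read together, the removal and versioning rules give a simple check for
whether a given release may drop a deprecated feature. The sketch below
encodes one reading of them (at least two minor releases *and* at least one
year after the deprecation, never in a bugfix release). The function name, the
``(major, minor, micro)`` tuple convention and the dates in the example are
hypothetical, and high-impact removals that wait for a major version are left
out::

    from datetime import date


    def may_remove(deprecated_in, deprecated_on, candidate, candidate_on):
        """Hypothetical check of the removal rules.

        Versions are (major, minor, micro) tuples; dates are release dates.
        """
        if candidate[0] != deprecated_in[0]:
            raise ValueError("major-version removals are decided case by case")
        if candidate[2] != 0:
            return False  # bugfix releases never remove functionality
        enough_releases = candidate[1] - deprecated_in[1] >= 2
        enough_time = (candidate_on - deprecated_on).days >= 365
        return enough_releases and enough_time


    # Deprecated in 1.16.0; is removal allowed in later releases?
    dep = ((1, 16, 0), date(2020, 1, 1))
    print(may_remove(*dep, (1, 17, 0), date(2020, 7, 1)))  # False: only one release later
    print(may_remove(*dep, (1, 18, 0), date(2021, 1, 1)))  # True: two releases, 366 days
    print(may_remove(*dep, (1, 18, 1), date(2021, 2, 1)))  # False: bugfix release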

In concrete cases where this policy needs to be applied, decisions are made according
to the `NumPy governance model
<https://docs.scipy.org/doc/numpy/dev/governance/index.html>`_.

Functionality with stricter policies:

- ``numpy.random`` has its own backwards compatibility policy,
  see `NEP 19 <http://www.numpy.org/neps/nep-0019-rng-policy.html>`_.
- The file format for ``.npy`` and ``.npz`` files must not be changed in a backwards
  incompatible way.
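
The ``.npy`` format is self-describing: a file starts with the magic string
``\x93NUMPY``, followed by two format-version bytes and a header describing
dtype and shape. Keeping those bytes readable across releases is what the
rule above protects. A quick round-trip check, writing to an in-memory buffer
instead of a file (an arbitrary choice for the example)::

    import io

    import numpy as np

    original = np.arange(6, dtype=np.int32).reshape(2, 3)

    buf = io.BytesIO()
    np.save(buf, original)        # write in .npy format
    raw = buf.getvalue()
    print(raw[:6])                # b'\x93NUMPY': the format's magic string
    print(raw[6], raw[7])         # major and minor format version bytes

    buf.seek(0)
    restored = np.load(buf)       # read it back
    print(np.array_equal(original, restored), restored.dtype)  # True int32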


Alternatives
------------

**Being more aggressive with deprecations.**

The goal of being more aggressive is to allow NumPy to move forward faster.
This would avoid others inventing their own solutions (often in multiple
places), and would benefit users without a legacy code base. We reject this
alternative because of the place NumPy has in the scientific Python
ecosystem: being fairly conservative is required in order not to increase the
extra maintenance for downstream libraries and end users to an unacceptable
level.

**Semantic versioning.**

This would change the versioning scheme for code removals; those could then
only be done when the major version number is increased. Rationale for
rejection: semantic versioning is relatively common in software engineering;
however, it is not at all common in the Python world. Also, it would mean that
NumPy's version number simply starts to increase faster, which would be more
confusing than helpful. gh-10156 contains more discussion on this alternative.


Discussion
----------

TODO

This section may just be a bullet list including links to any discussions
regarding the NEP:

- This includes links to mailing list threads or relevant GitHub issues.


References and Footnotes
------------------------

.. [1] TODO


Copyright
---------

This document has been placed in the public domain. [1]_