diff --git a/doc/neps/nep-0023-backwards-compatibility.rst b/doc/neps/nep-0023-backwards-compatibility.rst
new file mode 100644
index 000000000000..158b08f1f18d
--- /dev/null
+++ b/doc/neps/nep-0023-backwards-compatibility.rst
@@ -0,0 +1,288 @@
+=======================================================
+NEP 23 — Backwards compatibility and deprecation policy
+=======================================================
+
+:Author: Ralf Gommers
+:Status: Draft
+:Type: Process
+:Created: 2018-07-14
+:Resolution: (required for Accepted | Rejected | Withdrawn)
+
+Abstract
+--------
+
+In this NEP we describe NumPy's approach to backwards compatibility,
+its deprecation and removal policy, and the trade-offs and decision
+processes for individual cases where breaking backwards compatibility
+is considered.
+
+
+Detailed description
+--------------------
+
+NumPy has a very large user base. Those users rely on NumPy being stable
+and the code they write that uses NumPy functionality to keep working.
+NumPy is also actively maintained and improved -- and sometimes improvements
+require, or are made much easier by, breaking backwards compatibility.
+Finally, there are trade-offs in stability for existing users vs. avoiding
+errors or having a better user experience for new users. These competing
+needs often give rise to heated debates and delays in accepting or rejecting
+contributions. This NEP tries to address that by providing a policy as well
+as examples and rationales for when it is or isn't a good idea to break
+backwards compatibility.
+
+General principles:
+
+- Aim not to break users' code unnecessarily.
+- Aim never to change code in ways that can result in users silently getting
+  incorrect results from their previously working code.
+- Backwards incompatible changes can be made, provided the benefits outweigh
+  the costs.
+- When assessing the costs, keep in mind that most users do not read the mailing
+  list, do not look at deprecation warnings, and sometimes wait more than one or
+  two years before upgrading from their old version. Also keep in mind that NumPy has
+  many hundreds of thousands or even a couple of million users, so "no one will
+  do or use this" is very likely incorrect.
+- Benefits include improved functionality, usability and performance (in order
+  of importance), as well as lower maintenance cost and improved future
+  extensibility.
+- Bug fixes are exempt from the backwards compatibility policy. However, in case
+  of serious impact on users (e.g. a downstream library doesn't build anymore),
+  even bug fixes may have to be delayed for one or more releases.
+- The Python API and the C API will be treated in the same way.
+
+
+Examples
+^^^^^^^^
+
+We now discuss a number of concrete examples to illustrate typical issues
+and trade-offs.
+
+**Changing the behavior of a function**
+
+``np.histogram`` is probably the most infamous example.
+First, a new keyword ``new=False`` was introduced; this was then switched
+over to ``None`` one release later, and finally it was removed again.
+Also, it has a ``normed`` keyword that had behavior that could be considered
+either suboptimal or broken (depending on one's opinion on the statistics).
+A new keyword ``density`` was introduced to replace it; ``normed`` started giving
+``DeprecationWarning`` only in v1.15.0. Evolution of ``histogram``::
+
+    def histogram(a, bins=10, range=None, normed=False):  # v1.0.0
+
+    def histogram(a, bins=10, range=None, normed=False, weights=None, new=False):  # v1.1.0
+
+    def histogram(a, bins=10, range=None, normed=False, weights=None, new=None):  # v1.2.0
+
+    def histogram(a, bins=10, range=None, normed=False, weights=None):  # v1.5.0
+
+    def histogram(a, bins=10, range=None, normed=False, weights=None, density=None):  # v1.6.0
+
+    def histogram(a, bins=10, range=None, normed=None, weights=None, density=None):  # v1.15.0
+        # v1.15.0 was the first release where `normed` started emitting
+        # DeprecationWarnings
+
+The ``new`` keyword was planned from the start to be temporary. Such a plan
+forces users to change their code more than once, which is almost never the
+right thing to do. Instead, a better approach here would have been to
+deprecate ``histogram`` and introduce a new function ``hist`` in its place.
+
+**Returning a view rather than a copy**
+
+The ``ndarray.diag`` method used to return a copy. A view would be better for
+both performance and design consistency. This change was warned about
+(``FutureWarning``) in v1.8.0, and in v1.9.0 ``diag`` was changed to return
+a *read-only* view. The planned change to a writeable view in v1.10.0 was
+postponed due to backwards compatibility concerns, and is still an open issue
+(gh-7661).
+
+What should have happened instead: nothing. This change resulted in a lot of
+discussions and wasted effort, did not achieve its final goal, and was not that
+important in the first place. Finishing the change to a *writeable* view in
+the future is not desired, because it will result in users silently getting
+different results if they upgraded multiple versions or simply missed the
+warnings.
+
+**Disallowing indexing with floats**
+
+Indexing an array with floats is asking for something ambiguous, and can be a
+sign of a bug in user code. After some discussion, it was deemed a good idea
+to deprecate indexing with floats. This was first tried for the v1.8.0
+release; however, in pre-release testing it became clear that this would break
+many libraries that depend on NumPy. Therefore it was reverted before release,
+to give those libraries time to fix their code first. It was finally
+introduced for v1.11.0 and turned into a hard error for v1.12.0.
+
+This change was disruptive; however, it did catch real bugs in, e.g., SciPy and
+scikit-learn. Overall the change was worth the cost, and introducing it in
+master first to allow testing, then removing it again before a release, is a
+useful strategy.
+
+Similar recent deprecations also look like good examples of
+cleanups/improvements:
+
+- removing deprecated boolean indexing (gh-8312)
+- deprecating truth testing on empty arrays (gh-9718)
+- deprecating ``np.sum(generator)`` (gh-10670; one issue with this one is that
+  its warning message is wrong - this should error in the future).
+
+**Removing the financial functions**
+
+The financial functions (e.g. ``np.pmt``) are badly named, are present in the
+main NumPy namespace, and don't really fit well within NumPy's scope.
+They were added in 2008 after
+`a discussion `_
+on the mailing list where opinion was divided (but a majority in favor).
+At the moment these functions don't cause a lot of overhead; however, there are
+multiple issues and PRs a year for them which cost maintainer time to deal
+with, and they clutter up the ``numpy`` namespace. Removing them again was
+discussed in 2013 (gh-2880).
+
+This case is borderline, but given that they're clearly out of scope,
+deprecation and removal out of at least the main ``numpy`` namespace can be
+proposed. Alternatively, document clearly that new features for financial
+functions are unwanted, to keep the maintenance costs to a minimum.
+
+**Examples of features not added because of backwards compatibility**
+
+TODO: do we have good examples here? Possibly subclassing related?
+
+
+Removing complete submodules
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This year there have been suggestions to consider removing some or all of
+``numpy.distutils``, ``numpy.f2py``, ``numpy.linalg``, and ``numpy.random``.
+The motivation was that all these cost maintenance effort, and that they slow
+down work on the core of NumPy (ndarrays, dtypes and ufuncs).
+
+The impact on downstream libraries and users would be very large, and
+maintenance of these modules would still have to happen. Therefore this is
+simply not a good idea; removing these submodules should not happen even for
+a new major version of NumPy.
+
+
+Subclassing of ndarray
+^^^^^^^^^^^^^^^^^^^^^^
+
+Subclassing of ``ndarray`` is a pain point. ``ndarray`` was not (or at least
+not well) designed to be subclassed. Despite that, a lot of subclasses have
+been created even within the NumPy code base itself, and some of those (e.g.
+``MaskedArray``, ``astropy.units.Quantity``) are quite popular. The main
+problems with subclasses are:
+
+- They make it hard to change ``ndarray`` in ways that would otherwise be
+  backwards compatible.
+- Some of them change the behavior of ``ndarray`` methods, making it difficult to
+  write code that accepts array duck-types.
+
+Subclassing ``ndarray`` has been officially discouraged for a long time. Of
+the most important subclasses, ``np.matrix`` will be deprecated (see gh-10142)
+and ``MaskedArray`` will be kept in NumPy (`NEP 17
+`_).
+``MaskedArray`` will ideally be rewritten in a way such that it uses only
+public NumPy APIs. For subclasses outside of NumPy, more work is needed to
+provide alternatives (e.g. mixins, see gh-9016 and gh-10446) or better support
+for custom dtypes (see gh-2899). Until that is done, subclasses need to be
+taken into account when making changes to the NumPy code base. A future change
+in NumPy to not support subclassing will certainly need a major version
+increase.
+
+
+Policy
+------
+
+1. Code changes that have the potential to silently change the results of a user's
+   code must never be made (except in the case of clear bugs).
+2. Code changes that break users' code (i.e. the user will see a clear exception)
+   can be made, *provided the benefit is worth the cost* and suitable deprecation
+   warnings have been raised first.
+3. Deprecation warnings are in all cases warnings that functionality will be removed.
+   If there is no intent to remove functionality, then deprecation in documentation
+   only or other types of warnings shall be used.
+4. Deprecations for stylistic reasons (e.g. consistency between functions) are
+   strongly discouraged.
+
+Deprecations:
+
+- shall include the version numbers of both when the functionality was deprecated
+  and when it will be removed (either two releases after the warning is
+  introduced, or in the next major version).
+- shall include information on alternatives to the deprecated functionality, or a
+  reason for the deprecation if no clear alternative is available.
+- shall use ``VisibleDeprecationWarning`` rather than ``DeprecationWarning``
+  for cases of relevance to end users (as opposed to cases only relevant to
+  libraries building on top of NumPy).
+- shall be listed in the release notes of the release where the deprecation happened.
+
+Removal of deprecated functionality:
+
+- shall be done after 2 releases (assuming a 6-monthly release cycle; if that changes,
+  there shall be at least 1 year between deprecation and removal), unless the
+  impact of the removal is such that a major version number increase is
+  warranted.
+- shall be listed in the release notes of the release where the removal happened.
+
+Versioning:
+
+- removal of deprecated code can be done in any minor (but not bugfix) release.
+- for heavily used functionality (e.g. removal of ``np.matrix``, of a whole submodule,
+  or significant changes to behavior for subclasses) the major version number shall
+  be increased.
+
+In concrete cases where this policy needs to be applied, decisions are made according
+to the `NumPy governance model
+`_.
+
+Functionality with stricter policies:
+
+- ``numpy.random`` has its own backwards compatibility policy,
+  see `NEP 19 `_.
+- The file format for ``.npy`` and ``.npz`` files must not be changed in a backwards
+  incompatible way.
+
+
+Alternatives
+------------
+
+**Being more aggressive with deprecations.**
+
+The goal of being more aggressive is to allow NumPy to move forward faster.
+This would avoid others inventing their own solutions (often in multiple
+places), as well as be a benefit to users without a legacy code base. We
+reject this alternative because of the place NumPy has in the scientific Python
+ecosystem - being fairly conservative is required in order to not increase the
+extra maintenance for downstream libraries and end users to an unacceptable
+level.
+
+**Semantic versioning.**
+
+This would change the versioning scheme for code removals; those could then
+only be done when the major version number is increased. Rationale for
+rejection: semantic versioning is relatively common in software engineering;
+however, it is not at all common in the Python world. Also, it would mean that
+NumPy's version number simply starts to increase faster, which would be more
+confusing than helpful. gh-10156 contains more discussion on this alternative.
+
+
+Discussion
+----------
+
+TODO
+
+This section may just be a bullet list including links to any discussions
+regarding the NEP:
+
+- This includes links to mailing list threads or relevant GitHub issues.
+
+
+References and Footnotes
+------------------------
+
+.. [1] TODO
+
+
+Copyright
+---------
+
+This document has been placed in the public domain. [1]_