diff --git a/.gitignore b/.gitignore index abba1fdc0c0a..3952be1fea21 100644 --- a/.gitignore +++ b/.gitignore @@ -15,6 +15,12 @@ *.tmp *.vim tags +cscope.out +# gnu global +GPATH +GRTAGS +GSYMS +GTAGS # Compiled source # ################### @@ -123,7 +129,7 @@ numpy/core/src/private/npy_partition.h numpy/core/src/private/scalarmathmodule.h numpy/core/src/scalarmathmodule.c numpy/core/src/umath/funcs.inc -numpy/core/src/umath/loops.c +numpy/core/src/umath/loops.[ch] numpy/core/src/umath/operand_flag_tests.c numpy/core/src/umath/simd.inc numpy/core/src/umath/struct_ufunc_test.c diff --git a/.travis.yml b/.travis.yml index 12a443d41493..eba6890dfb0d 100644 --- a/.travis.yml +++ b/.travis.yml @@ -31,6 +31,10 @@ before_install: - ulimit -a - mkdir builds - pushd builds + # Build into own virtualenv + # We therefore control our own environment, avoid travis' numpy + - virtualenv --python=python venv + - source venv/bin/activate - pip install nose # pip install coverage - python -V diff --git a/doc/HOWTO_DOCUMENT.rst.txt b/doc/HOWTO_DOCUMENT.rst.txt index 2854b6b9098b..650f7d35caf3 100644 --- a/doc/HOWTO_DOCUMENT.rst.txt +++ b/doc/HOWTO_DOCUMENT.rst.txt @@ -30,14 +30,14 @@ A Guide to NumPy/SciPy Documentation Overview -------- -In general, we follow the standard Python style conventions as described here: - * `Style Guide for C Code `_ - * `Style Guide for Python Code `_ - * `Docstring Conventions `_ +We mostly follow the standard Python style conventions as described here: + * `Style Guide for C Code `_ + * `Style Guide for Python Code `_ + * `Docstring Conventions `_ Additional PEPs of interest regarding documentation of code: - * `Docstring Processing Framework `_ - * `Docutils Design Specification `_ + * `Docstring Processing Framework `_ + * `Docutils Design Specification `_ Use a code checker: * `pylint `_ diff --git a/doc/neps/return-of-revenge-of-matmul-pep.rst b/doc/neps/return-of-revenge-of-matmul-pep.rst new file mode 100644 index 000000000000..b19f07d851df --- /dev/null +++ b/doc/neps/return-of-revenge-of-matmul-pep.rst @@ -0,0 +1,1380 @@ +PEP: 465 +Title: A dedicated infix operator for matrix multiplication +Version: $Revision$ +Last-Modified: $Date$ +Author: Nathaniel J. Smith +Status: Draft +Type: Standards Track +Content-Type: text/x-rst +Created: 20-Feb-2014 +Python-Version: 3.5 +Post-History: 13-Mar-2014 + +Abstract +======== + +This PEP proposes a new binary operator to be used for matrix +multiplication, called ``@``. (Mnemonic: ``@`` is ``*`` for +mATrices.) + + +Specification +============= + +A new binary operator is added to the Python language, together +with the corresponding in-place version: + +======= ========================= =============================== + Op Precedence/associativity Methods +======= ========================= =============================== +``@`` Same as ``*`` ``__matmul__``, ``__rmatmul__`` +``@=`` n/a ``__imatmul__`` +======= ========================= =============================== + +No implementations of these methods are added to the builtin or +standard library types. However, a number of projects have reached +consensus on the recommended semantics for these operations; see +`Intended usage details`_ below for details. + +For details on how this operator will be implemented in CPython, see +`Implementation details`_. 
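+For illustration only -- this sketch is not part of the specification,
+and the class here is hypothetical -- here is how a user-defined type
+might implement the new protocol::
+
+    class Mat2:
+        """Toy 2x2 matrix; rows stored as ((a, b), (c, d))."""
+        def __init__(self, rows):
+            self.rows = rows
+
+        def __matmul__(self, other):      # self @ other
+            (a, b), (c, d) = self.rows
+            (e, f), (g, h) = other.rows
+            return Mat2(((a*e + b*g, a*f + b*h),
+                         (c*e + d*g, c*f + d*h)))
+
+        def __rmatmul__(self, other):     # other @ self, if other defers
+            return NotImplemented         # e.g., refuse mixed-type products
+
+        def __imatmul__(self, other):     # self @= other
+            self.rows = (self @ other).rows
+            return self
+
+As with ``*``, evaluating ``a @ b`` would try ``type(a).__matmul__``
+first, falling back to ``type(b).__rmatmul__`` if the left operand's
+method is missing or returns ``NotImplemented``.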
+ + +Motivation +========== + +Executive summary +----------------- + +In numerical code, there are two important operations which compete +for use of Python's ``*`` operator: elementwise multiplication, and +matrix multiplication. In the nearly twenty years since the Numeric +library was first proposed, there have been many attempts to resolve +this tension [#hugunin]_; none have been really satisfactory. +Currently, most numerical Python code uses ``*`` for elementwise +multiplication, and function/method syntax for matrix multiplication; +however, this leads to ugly and unreadable code in common +circumstances. The problem is bad enough that significant amounts of +code continue to use the opposite convention (which has the virtue of +producing ugly and unreadable code in *different* circumstances), and +this API fragmentation across codebases then creates yet more +problems. There does not seem to be any *good* solution to the +problem of designing a numerical API within current Python syntax -- +only a landscape of options that are bad in different ways. The +minimal change to Python syntax which is sufficient to resolve these +problems is the addition of a single new infix operator for matrix +multiplication. + +Matrix multiplication has a singular combination of features which +distinguish it from other binary operations, which together provide a +uniquely compelling case for the addition of a dedicated infix +operator: + +* Just as for the existing numerical operators, there exists a vast + body of prior art supporting the use of infix notation for matrix + multiplication across all fields of mathematics, science, and + engineering; ``@`` harmoniously fills a hole in Python's existing + operator system. + +* ``@`` greatly clarifies real-world code. + +* ``@`` provides a smoother onramp for less experienced users, who are + particularly harmed by hard-to-read code and API fragmentation. + +* ``@`` benefits a substantial and growing portion of the Python user + community. + +* ``@`` will be used frequently -- in fact, evidence suggests it may + be used more frequently than ``//`` or the bitwise operators. + +* ``@`` allows the Python numerical community to reduce fragmentation, + and finally standardize on a single consensus duck type for all + numerical array objects. + + +Background: What's wrong with the status quo? +--------------------------------------------- + +When we crunch numbers on a computer, we usually have lots and lots of +numbers to deal with. Trying to deal with them one at a time is +cumbersome and slow -- especially when using an interpreted language. +Instead, we want the ability to write down simple operations that +apply to large collections of numbers all at once. The *n-dimensional +array* is the basic object that all popular numeric computing +environments use to make this possible. Python has several libraries +that provide such arrays, with numpy being at present the most +prominent. + +When working with n-dimensional arrays, there are two different ways +we might want to define multiplication. One is elementwise +multiplication:: + + [[1, 2], [[11, 12], [[1 * 11, 2 * 12], + [3, 4]] x [13, 14]] = [3 * 13, 4 * 14]] + +and the other is `matrix multiplication`_: + +.. 
_matrix multiplication: https://en.wikipedia.org/wiki/Matrix_multiplication + +:: + + [[1, 2], [[11, 12], [[1 * 11 + 2 * 13, 1 * 12 + 2 * 14], + [3, 4]] x [13, 14]] = [3 * 11 + 4 * 13, 3 * 12 + 4 * 14]] + +Elementwise multiplication is useful because it lets us easily and +quickly perform many multiplications on a large collection of values, +without writing a slow and cumbersome ``for`` loop. And this works as +part of a very general schema: when using the array objects provided +by numpy or other numerical libraries, all Python operators work +elementwise on arrays of all dimensionalities. The result is that one +can write functions using straightforward code like ``a * b + c / d``, +treating the variables as if they were simple values, but then +immediately use this function to efficiently perform this calculation +on large collections of values, while keeping them organized using +whatever arbitrarily complex array layout works best for the problem +at hand. + +Matrix multiplication is more of a special case. It's only defined on +2d arrays (also known as "matrices"), and multiplication is the only +operation that has an important "matrix" version -- "matrix addition" +is the same as elementwise addition; there is no such thing as "matrix +bitwise-or" or "matrix floordiv"; "matrix division" and "matrix +to-the-power-of" can be defined but are not very useful, etc. +However, matrix multiplication is still used very heavily across all +numerical application areas; mathematically, it's one of the most +fundamental operations there is. + +Because Python syntax currently allows for only a single +multiplication operator ``*``, libraries providing array-like objects +must decide: either use ``*`` for elementwise multiplication, or use +``*`` for matrix multiplication. And, unfortunately, it turns out +that when doing general-purpose number crunching, both operations are +used frequently, and there are major advantages to using infix rather +than function call syntax in both cases. Thus it is not at all clear +which convention is optimal, or even acceptable; often it varies on a +case-by-case basis. + +Nonetheless, network effects mean that it is very important that we +pick *just one* convention. In numpy, for example, it is technically +possible to switch between the conventions, because numpy provides two +different types with different ``__mul__`` methods. For +``numpy.ndarray`` objects, ``*`` performs elementwise multiplication, +and matrix multiplication must use a function call (``numpy.dot``). +For ``numpy.matrix`` objects, ``*`` performs matrix multiplication, +and elementwise multiplication requires function syntax. Writing code +using ``numpy.ndarray`` works fine. Writing code using +``numpy.matrix`` also works fine. But trouble begins as soon as we +try to integrate these two pieces of code together. Code that expects +an ``ndarray`` and gets a ``matrix``, or vice-versa, may crash or +return incorrect results. Keeping track of which functions expect +which types as inputs, and return which types as outputs, and then +converting back and forth all the time, is incredibly cumbersome and +impossible to get right at any scale. Functions that defensively try +to handle both types as input and DTRT, find themselves floundering +into a swamp of ``isinstance`` and ``if`` statements. + +PEP 238 split ``/`` into two operators: ``/`` and ``//``. 
Imagine the +chaos that would have resulted if it had instead split ``int`` into +two types: ``classic_int``, whose ``__div__`` implemented floor +division, and ``new_int``, whose ``__div__`` implemented true +division. This, in a more limited way, is the situation that Python +number-crunchers currently find themselves in. + +In practice, the vast majority of projects have settled on the +convention of using ``*`` for elementwise multiplication, and function +call syntax for matrix multiplication (e.g., using ``numpy.ndarray`` +instead of ``numpy.matrix``). This reduces the problems caused by API +fragmentation, but it doesn't eliminate them. The strong desire to +use infix notation for matrix multiplication has caused a number of +specialized array libraries to continue to use the opposing convention +(e.g., scipy.sparse, pyoperators, pyviennacl) despite the problems +this causes, and ``numpy.matrix`` itself still gets used in +introductory programming courses, often appears in StackOverflow +answers, and so forth. Well-written libraries thus must continue to +be prepared to deal with both types of objects, and, of course, are +also stuck using unpleasant funcall syntax for matrix multiplication. +After nearly two decades of trying, the numerical community has still +not found any way to resolve these problems within the constraints of +current Python syntax (see `Rejected alternatives to adding a new +operator`_ below). + +This PEP proposes the minimum effective change to Python syntax that +will allow us to drain this swamp. It splits ``*`` into two +operators, just as was done for ``/``: ``*`` for elementwise +multiplication, and ``@`` for matrix multiplication. (Why not the +reverse? Because this way is compatible with the existing consensus, +and because it gives us a consistent rule that all the built-in +numeric operators also apply in an elementwise manner to arrays; the +reverse convention would lead to more special cases.) + +So that's why matrix multiplication doesn't and can't just use ``*``. +Now, in the rest of this section, we'll explain why it nonetheless +meets the high bar for adding a new operator. + + +Why should matrix multiplication be infix? +------------------------------------------ + +Right now, most numerical code in Python uses syntax like +``numpy.dot(a, b)`` or ``a.dot(b)`` to perform matrix multiplication. +This obviously works, so why do people make such a fuss about it, even +to the point of creating API fragmentation and compatibility swamps? + +Matrix multiplication shares two features with ordinary arithmetic +operations like addition and multiplication on numbers: (a) it is used +very heavily in numerical programs -- often multiple times per line of +code -- and (b) it has an ancient and universally adopted tradition of +being written using infix syntax. This is because, for typical +formulas, this notation is dramatically more readable than any +function call syntax. Here's an example to demonstrate: + +One of the most useful tools for testing a statistical hypothesis is +the linear hypothesis test for OLS regression models. It doesn't +really matter what all those words I just said mean; if we find +ourselves having to implement this thing, what we'll do is look up +some textbook or paper on it, and encounter many mathematical formulas +that look like: + +.. math:: + + S = (H \beta - r)^T (H V H^T)^{-1} (H \beta - r) + +Here the various variables are all vectors or matrices (details for +the curious: [#lht]_).
+ +Now we need to write code to perform this calculation. In current +numpy, matrix multiplication can be performed using either the +function or method call syntax. Neither provides a particularly +readable translation of the formula:: + + import numpy as np + from numpy.linalg import inv, solve + + # Using dot function: + S = np.dot((np.dot(H, beta) - r).T, + np.dot(inv(np.dot(np.dot(H, V), H.T)), np.dot(H, beta) - r)) + + # Using dot method: + S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r) + +With the ``@`` operator, the direct translation of the above formula +becomes:: + + S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r) + +Notice that there is now a transparent, 1-to-1 mapping between the +symbols in the original formula and the code that implements it. + +Of course, an experienced programmer will probably notice that this is +not the best way to compute this expression. The repeated computation +of :math:`H \beta - r` should perhaps be factored out; and, +expressions of the form ``dot(inv(A), B)`` should almost always be +replaced by the more numerically stable ``solve(A, B)``. When using +``@``, performing these two refactorings gives us:: + + # Version 1 (as above) + S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r) + + # Version 2 + trans_coef = H @ beta - r + S = trans_coef.T @ inv(H @ V @ H.T) @ trans_coef + + # Version 3 + S = trans_coef.T @ solve(H @ V @ H.T, trans_coef) + +Notice that when comparing between each pair of steps, it's very easy +to see exactly what was changed. If we apply the equivalent +transformations to the code using the .dot method, then the changes +are much harder to read out or verify for correctness:: + + # Version 1 (as above) + S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r) + + # Version 2 + trans_coef = H.dot(beta) - r + S = trans_coef.T.dot(inv(H.dot(V).dot(H.T))).dot(trans_coef) + + # Version 3 + S = trans_coef.T.dot(solve(H.dot(V).dot(H.T)), trans_coef) + +Readability counts! The statements using ``@`` are shorter, contain +more whitespace, can be directly and easily compared both to each +other and to the textbook formula, and contain only meaningful +parentheses. This last point is particularly important for +readability: when using function-call syntax, the required parentheses +on every operation create visual clutter that makes it very difficult +to parse out the overall structure of the formula by eye, even for a +relatively simple formula like this one. Eyes are terrible at parsing +non-regular languages. I made and caught many errors while trying to +write out the 'dot' formulas above. I know they still contain at +least one error, maybe more. (Exercise: find it. Or them.) The +``@`` examples, by contrast, are not only correct, they're obviously +correct at a glance. + +If we are even more sophisticated programmers, and writing code that +we expect to be reused, then considerations of speed or numerical +accuracy might lead us to prefer some particular order of evaluation. +Because ``@`` makes it possible to omit irrelevant parentheses, we can +be certain that if we *do* write something like ``(H @ V) @ H.T``, +then our readers will know that the parentheses must have been added +intentionally to accomplish some meaningful purpose. In the ``dot`` +examples, it's impossible to know which nesting decisions are +important, and which are arbitrary. + +Infix ``@`` dramatically improves matrix code usability at all stages +of programmer interaction. 
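+As an aside, the claim above that ``dot(inv(A), B)`` should almost
+always be rewritten as ``solve(A, B)`` is easy to check. Here is a
+self-contained sketch (the random test data is made up purely for
+illustration) confirming that the ``inv``- and ``solve``-based
+versions of :math:`S` agree::
+
+    import numpy as np
+    from numpy.linalg import inv, solve
+
+    rng = np.random.RandomState(0)
+    n, p = 4, 6
+    H = rng.randn(n, p)
+    beta = rng.randn(p)
+    r = rng.randn(n)
+    A = rng.randn(p, p)
+    V = A.dot(A.T) + p * np.eye(p)   # symmetric positive definite
+
+    trans_coef = H.dot(beta) - r
+    S_inv = trans_coef.T.dot(inv(H.dot(V).dot(H.T))).dot(trans_coef)
+    S_solve = trans_coef.T.dot(solve(H.dot(V).dot(H.T), trans_coef))
+    assert np.allclose(S_inv, S_solve)
+
+``solve`` wins on numerical stability and speed; the point here is
+only that the refactoring preserves the result.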
+ + +Transparent syntax is especially crucial for non-expert programmers +------------------------------------------------------------------- + +A large proportion of scientific code is written by people who are +experts in their domain, but are not experts in programming. And +there are many university courses run each year with titles like "Data +analysis for social scientists" which assume no programming +background, and teach some combination of mathematical techniques, +introduction to programming, and the use of programming to implement +these mathematical techniques, all within a 10-15 week period. These +courses are more and more often being taught in Python rather than +special-purpose languages like R or Matlab. + +For these kinds of users, whose programming knowledge is fragile, the +existence of a transparent mapping between formulas and code often +means the difference between succeeding and failing to write that code +at all. This is so important that such classes often use the +``numpy.matrix`` type which defines ``*`` to mean matrix +multiplication, even though this type is buggy and heavily +disrecommended by the rest of the numpy community for the +fragmentation that it causes. This pedagogical use case is, in fact, +the *only* reason ``numpy.matrix`` remains a supported part of numpy. +Adding ``@`` will benefit both beginning and advanced users with +better syntax; and furthermore, it will allow both groups to +standardize on the same notation from the start, providing a smoother +on-ramp to expertise. + + +But isn't matrix multiplication a pretty niche requirement? +----------------------------------------------------------- + +The world is full of continuous data, and computers are increasingly +called upon to work with it in sophisticated ways. Arrays are the +lingua franca of finance, machine learning, 3d graphics, computer +vision, robotics, operations research, econometrics, meteorology, +computational linguistics, recommendation systems, neuroscience, +astronomy, bioinformatics (including genetics, cancer research, drug +discovery, etc.), physics engines, quantum mechanics, geophysics, +network analysis, and many other application areas. In most or all of +these areas, Python is rapidly becoming a dominant player, in large +part because of its ability to elegantly mix traditional discrete data +structures (hash tables, strings, etc.) on an equal footing with +modern numerical data types and algorithms. + +We all live in our own little sub-communities, so some Python users +may be surprised to realize the sheer extent to which Python is used +for number crunching -- especially since much of this particular +sub-community's activity occurs outside of traditional Python/FOSS +channels. So, to give some rough idea of just how many numerical +Python programmers are actually out there, here are two numbers: In +2013, there were 7 international conferences organized specifically on +numerical Python [#scipy-conf]_ [#pydata-conf]_. At PyCon 2014, ~20% +of the tutorials appear to involve the use of matrices +[#pycon-tutorials]_. + +To quantify this further, we used Github's "search" function to look +at what modules are actually imported across a wide range of +real-world code (i.e., all the code on Github). We checked for +imports of several popular stdlib modules, a variety of numerically +oriented modules, and various other extremely high-profile modules +like django and lxml (the latter of which is the #1 most downloaded +package on PyPI). 
Starred lines indicate packages which export array- +or matrix-like objects which will adopt ``@`` if this PEP is +approved:: + + Count of Python source files on Github matching given search terms + (as of 2014-04-10, ~21:00 UTC) + ================ ========== =============== ======= =========== + module "import X" "from X import" total total/numpy + ================ ========== =============== ======= =========== + sys 2374638 63301 2437939 5.85 + os 1971515 37571 2009086 4.82 + re 1294651 8358 1303009 3.12 + numpy ************** 337916 ********** 79065 * 416981 ******* 1.00 + warnings 298195 73150 371345 0.89 + subprocess 281290 63644 344934 0.83 + django 62795 219302 282097 0.68 + math 200084 81903 281987 0.68 + threading 212302 45423 257725 0.62 + pickle+cPickle 215349 22672 238021 0.57 + matplotlib 119054 27859 146913 0.35 + sqlalchemy 29842 82850 112692 0.27 + pylab *************** 36754 ********** 41063 ** 77817 ******* 0.19 + scipy *************** 40829 ********** 28263 ** 69092 ******* 0.17 + lxml 19026 38061 57087 0.14 + zlib 40486 6623 47109 0.11 + multiprocessing 25247 19850 45097 0.11 + requests 30896 560 31456 0.08 + jinja2 8057 24047 32104 0.08 + twisted 13858 6404 20262 0.05 + gevent 11309 8529 19838 0.05 + pandas ************** 14923 *********** 4005 ** 18928 ******* 0.05 + sympy 2779 9537 12316 0.03 + theano *************** 3654 *********** 1828 *** 5482 ******* 0.01 + ================ ========== =============== ======= =========== + +These numbers should be taken with several grains of salt (see +footnote for discussion: [#github-details]_), but, to the extent they +can be trusted, they suggest that ``numpy`` might be the single +most-imported non-stdlib module in the entire Pythonverse; it's even +more-imported than such stdlib stalwarts as ``subprocess``, ``math``, +``pickle``, and ``threading``. And numpy users represent only a +subset of the broader numerical community that will benefit from the +``@`` operator. Matrices may once have been a niche data type +restricted to Fortran programs running in university labs and military +clusters, but those days are long gone. Number crunching is a +mainstream part of modern Python usage. + +In addition, there is some precedent for adding an infix operator to +handle a more-specialized arithmetic operation: the floor division +operator ``//``, like the bitwise operators, is very useful under +certain circumstances when performing exact calculations on discrete +values. But it seems likely that there are many Python programmers +who have never had reason to use ``//`` (or, for that matter, the +bitwise operators). ``@`` is no more niche than ``//``. + + +So ``@`` is good for matrix formulas, but how common are those really? +---------------------------------------------------------------------- + +We've seen that ``@`` makes matrix formulas dramatically easier to +work with for both experts and non-experts, that matrix formulas +appear in many important applications, and that numerical libraries +like numpy are used by a substantial proportion of Python's user base. +But numerical libraries aren't just about matrix formulas, and being +important doesn't necessarily mean taking up a lot of code: if matrix +formulas only occurred in one or two places in the average +numerically-oriented project, then it still wouldn't be worth adding a +new operator. So how common is matrix multiplication, really? + +When the going gets tough, the tough get empirical.
To get a rough +estimate of how useful the ``@`` operator will be, the table below +shows the rate at which different Python operators are actually used +in the stdlib, and also in two high-profile numerical packages -- the +scikit-learn machine learning library, and the nipy neuroimaging +library -- normalized by source lines of code (SLOC). Rows are sorted +by the 'combined' column, which pools all three code bases together. +The combined column is thus strongly weighted towards the stdlib, +which is much larger than both projects put together (stdlib: 411575 +SLOC, scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [#sloc-details]_ + +The ``dot`` row (marked ``******``) counts how common matrix multiply +operations are in each codebase. + +:: + + ==== ====== ============ ==== ======== + op stdlib scikit-learn nipy combined + ==== ====== ============ ==== ======== + = 2969 5536 4932 3376 / 10,000 SLOC + - 218 444 496 261 + + 224 201 348 231 + == 177 248 334 196 + * 156 284 465 192 + % 121 114 107 119 + ** 59 111 118 68 + != 40 56 74 44 + / 18 121 183 41 + > 29 70 110 39 + += 34 61 67 39 + < 32 62 76 38 + >= 19 17 17 18 + <= 18 27 12 18 + dot ***** 0 ********** 99 ** 74 ****** 16 + | 18 1 2 15 + & 14 0 6 12 + << 10 1 1 8 + // 9 9 1 8 + -= 5 21 14 8 + *= 2 19 22 5 + /= 0 23 16 4 + >> 4 0 0 3 + ^ 3 0 0 3 + ~ 2 4 5 2 + |= 3 0 0 2 + &= 1 0 0 1 + //= 1 0 0 1 + ^= 1 0 0 0 + **= 0 2 0 0 + %= 0 0 0 0 + <<= 0 0 0 0 + >>= 0 0 0 0 + ==== ====== ============ ==== ======== + +These two numerical packages alone contain ~780 uses of matrix +multiplication. Within these packages, matrix multiplication is used +more heavily than most comparison operators (``<`` ``!=`` ``<=`` +``>=``). Even when we dilute these counts by including the stdlib +into our comparisons, matrix multiplication is still used more often +in total than any of the bitwise operators, and 2x as often as ``//``. +This is true even though the stdlib, which contains a fair amount of +integer arithmetic and no matrix operations, makes up more than 80% of +the combined code base. + +By coincidence, the numeric libraries make up approximately the same +proportion of the 'combined' codebase as numeric tutorials make up of +PyCon 2014's tutorial schedule, which suggests that the 'combined' +column may not be *wildly* unrepresentative of new Python code in +general. While it's impossible to know for certain, from this data it +seems entirely possible that across all Python code currently being +written, matrix multiplication is already used more often than ``//`` +and the bitwise operations. + + +But isn't it weird to add an operator with no stdlib uses? +---------------------------------------------------------- + +It's certainly unusual (though extended slicing existed for some time +before builtin types gained support for it, ``Ellipsis`` is still unused +within the stdlib, etc.). But the important thing is whether a change +will benefit users, not where the software is being downloaded from. +It's clear from the above that ``@`` will be used, and used heavily. +And this PEP provides the critical piece that will allow the Python +numerical community to finally reach consensus on a standard duck type +for all array-like objects, which is a necessary precondition to ever +adding a numerical array type to the stdlib. + + +Compatibility considerations +============================ + +Currently, the only legal use of the ``@`` token in Python code is at +statement beginning in decorators.
The new operators are both infix; +the one place they can never occur is at statement beginning. +Therefore, no existing code will be broken by the addition of these +operators, and there is no possible parsing ambiguity between +decorator-@ and the new operators. + +Another important kind of compatibility is the mental cost paid by +users to update their understanding of the Python language after this +change, particularly for users who do not work with matrices and thus +do not benefit. Here again, ``@`` has minimal impact: even +comprehensive tutorials and references will only need to add a +sentence or two to fully document this PEP's changes for a +non-numerical audience. + + +Intended usage details +====================== + +This section is informative, rather than normative -- it documents the +consensus of a number of libraries that provide array- or matrix-like +objects on how ``@`` will be implemented. + +This section uses the numpy terminology for describing arbitrary +multidimensional arrays of data, because it is a superset of all other +commonly used models. In this model, the *shape* of any array is +represented by a tuple of integers. Because matrices are +two-dimensional, they have len(shape) == 2, while 1d vectors have +len(shape) == 1, and scalars have shape == (), i.e., they are "0 +dimensional". Any array contains prod(shape) total entries. Notice +that `prod(()) == 1`_ (for the same reason that sum(()) == 0); scalars +are just an ordinary kind of array, not a special case. Notice also +that we distinguish between a single scalar value (shape == (), +analogous to ``1``), a vector containing only a single entry (shape == +(1,), analogous to ``[1]``), a matrix containing only a single entry +(shape == (1, 1), analogous to ``[[1]]``), etc., so the dimensionality +of any array is always well-defined. Other libraries with more +restricted representations (e.g., those that support 2d arrays only) +might implement only a subset of the functionality described here. + +.. _prod(()) == 1: https://en.wikipedia.org/wiki/Empty_product + +Semantics +--------- + +The recommended semantics for ``@`` for different inputs are: + +* 2d inputs are conventional matrices, and so the semantics are + obvious: we apply conventional matrix multiplication. If we write + ``arr(2, 3)`` to represent an arbitrary 2x3 array, then ``arr(2, 3) + @ arr(3, 4)`` returns an array with shape (2, 4). + +* 1d vector inputs are promoted to 2d by prepending or appending a '1' + to the shape, the operation is performed, and then the added + dimension is removed from the output. The 1 is always added on the + "outside" of the shape: prepended for left arguments, and appended + for right arguments. The result is that matrix @ vector and vector + @ matrix are both legal (assuming compatible shapes), and both + return 1d vectors; vector @ vector returns a scalar. This is + clearer with examples. + + * ``arr(2, 3) @ arr(3, 1)`` is a regular matrix product, and returns + an array with shape (2, 1), i.e., a column vector. + + * ``arr(2, 3) @ arr(3)`` performs the same computation as the + previous (i.e., treats the 1d vector as a matrix containing a + single *column*, shape = (3, 1)), but returns the result with + shape (2,), i.e., a 1d vector. + + * ``arr(1, 3) @ arr(3, 2)`` is a regular matrix product, and returns + an array with shape (1, 2), i.e., a row vector. 
+ + * ``arr(3) @ arr(3, 2)`` performs the same computation as the + previous (i.e., treats the 1d vector as a matrix containing a + single *row*, shape = (1, 3)), but returns the result with shape + (2,), i.e., a 1d vector. + + * ``arr(1, 3) @ arr(3, 1)`` is a regular matrix product, and returns + an array with shape (1, 1), i.e., a single value in matrix form. + + * ``arr(3) @ arr(3)`` performs the same computation as the + previous, but returns the result with shape (), i.e., a single + scalar value, not in matrix form. So this is the standard inner + product on vectors. + + An infelicity of this definition for 1d vectors is that it makes + ``@`` non-associative in some cases (``(Mat1 @ vec) @ Mat2`` != + ``Mat1 @ (vec @ Mat2)``). But this seems to be a case where + practicality beats purity: non-associativity only arises for strange + expressions that would never be written in practice; if they are + written anyway then there is a consistent rule for understanding + what will happen (``Mat1 @ vec @ Mat2`` is parsed as ``(Mat1 @ vec) + @ Mat2``, just like ``a - b - c``); and, not supporting 1d vectors + would rule out many important use cases that do arise very commonly + in practice. No-one wants to explain to new users why, to solve the + simplest linear system in the obvious way, they have to type + ``(inv(A) @ b[:, np.newaxis]).flatten()`` instead of ``inv(A) @ b``, + or perform an ordinary least-squares regression by typing + ``solve(X.T @ X, X.T @ y[:, np.newaxis]).flatten()`` instead of + ``solve(X.T @ X, X.T @ y)``. No-one wants to type ``(a[np.newaxis, :] + @ b[:, np.newaxis])[0, 0]`` instead of ``a @ b`` every time they + compute an inner product, or ``(a[np.newaxis, :] @ Mat @ b[:, + np.newaxis])[0, 0]`` for general quadratic forms instead of ``a @ + Mat @ b``. In addition, sage and sympy (see below) use these + non-associative semantics with an infix matrix multiplication + operator (they use ``*``), and they report that they haven't + experienced any problems caused by it. + +* For inputs with more than 2 dimensions, we treat the last two + dimensions as being the dimensions of the matrices to multiply, and + 'broadcast' across the other dimensions. This provides a convenient + way to quickly compute many matrix products in a single operation. + For example, ``arr(10, 2, 3) @ arr(10, 3, 4)`` performs 10 separate + matrix multiplies, each of which multiplies a 2x3 and a 3x4 matrix + to produce a 2x4 matrix, and then returns the 10 resulting matrices + together in an array with shape (10, 2, 4). The intuition here is + that we treat these 3d arrays of numbers as if they were 1d arrays + *of matrices*, and then apply matrix multiplication in an + elementwise manner, where now each 'element' is a whole matrix. + Note that broadcasting is not limited to perfectly aligned arrays; + in more complicated cases, it allows several simple but powerful + tricks for controlling how arrays are aligned with each other; see + [#broadcasting]_ for details. (In particular, it turns out that + when broadcasting is taken into account, the standard scalar * + matrix product is a special case of the elementwise multiplication + operator ``*``.) + + If one operand is >2d, and another operand is 1d, then the above + rules apply unchanged, with 1d->2d promotion performed before + broadcasting.
E.g., ``arr(10, 2, 3) @ arr(3)`` first promotes to + ``arr(10, 2, 3) @ arr(3, 1)``, then broadcasts the right argument to + create the aligned operation ``arr(10, 2, 3) @ arr(10, 3, 1)``, + multiplies to get an array with shape (10, 2, 1), and finally + removes the added dimension, returning an array with shape (10, 2). + Similarly, ``arr(2) @ arr(10, 2, 3)`` produces an intermediate array + with shape (10, 1, 3), and a final array with shape (10, 3). + +* 0d (scalar) inputs raise an error. Scalar * matrix multiplication + is a mathematically and algorithmically distinct operation from + matrix @ matrix multiplication, and is already covered by the + elementwise ``*`` operator. Allowing scalar @ matrix would thus + both require an unnecessary special case, and violate TOOWTDI. + + +Adoption +-------- + +We group existing Python projects which provide array- or matrix-like +types based on what API they currently use for elementwise and matrix +multiplication. + +**Projects which currently use * for elementwise multiplication, and +function/method calls for matrix multiplication:** + +The developers of the following projects have expressed an intention +to implement ``@`` on their array-like types using the above +semantics: + +* numpy +* pandas +* blaze +* theano + +The following projects have been alerted to the existence of the PEP, +but it's not yet known what they plan to do if it's accepted. We +don't anticipate that they'll have any objections, though, since +everything proposed here is consistent with how they already do +things: + +* pycuda +* panda3d + +**Projects which currently use * for matrix multiplication, and +function/method calls for elementwise multiplication:** + +The following projects have expressed an intention, if this PEP is +accepted, to migrate from their current API to the elementwise-``*``, +matmul-``@`` convention (i.e., this is a list of projects whose API +fragmentation will probably be eliminated if this PEP is accepted): + +* numpy (``numpy.matrix``) +* scipy.sparse +* pyoperators +* pyviennacl + +The following projects have been alerted to the existence of the PEP, +but it's not known what they plan to do if it's accepted (i.e., this +is a list of projects whose API fragmentation may or may not be +eliminated if this PEP is accepted): + +* cvxopt + +**Projects which currently use * for matrix multiplication, and which +don't really care about elementwise multiplication of matrices:** + +There are several projects which implement matrix types, but from a +very different perspective than the numerical libraries discussed +above. These projects focus on computational methods for analyzing +matrices in the sense of abstract mathematical objects (i.e., linear +maps over free modules over rings), rather than as big bags full of +numbers that need crunching. And it turns out that from the abstract +math point of view, there isn't much use for elementwise operations in +the first place; as discussed in the Background section above, +elementwise operations are motivated by the bag-of-numbers approach. +So these projects don't encounter the basic problem that this PEP +exists to address, making it mostly irrelevant to them; while they +appear superficially similar to projects like numpy, they're actually +doing something quite different. They use ``*`` for matrix +multiplication (and for group actions, and so forth), and if this PEP +is accepted, their expressed intention is to continue doing so, while +perhaps adding ``@`` as an alias. 
These projects include: + +* sympy +* sage + + +Implementation details +====================== + +New functions ``operator.matmul`` and ``operator.__matmul__`` are +added to the standard library, with the usual semantics. + +A corresponding function ``PyObject* PyObject_MatrixMultiply(PyObject +*o1, PyObject *o2)`` is added to the C API. + +A new AST node is added named ``MatMult``, along with a new token +``ATEQUAL`` and new bytecode opcodes ``BINARY_MATRIX_MULTIPLY`` and +``INPLACE_MATRIX_MULTIPLY``. + +Two new type slots are added; whether this is to ``PyNumberMethods`` +or a new ``PyMatrixMethods`` struct remains to be determined. + + +Rationale for specification details +=================================== + +Choice of operator +------------------ + +Why ``@`` instead of some other spelling? There isn't any consensus +across other programming languages about how this operator should be +named [#matmul-other-langs]_; here we discuss the various options. + +Restricting ourselves only to symbols present on US English keyboards, +the punctuation characters that don't already have a meaning in Python +expression context are: ``@``, backtick, ``$``, ``!``, and ``?``. Of +these options, ``@`` is clearly the best; ``!`` and ``?`` are already +heavily freighted with inapplicable meanings in the programming +context, backtick has been banned from Python by BDFL pronouncement +(see PEP 3099), and ``$`` is uglier, even more dissimilar to ``*`` and +:math:`\cdot`, and has Perl/PHP baggage. ``$`` is probably the +second-best option of these, though. + +Symbols which are not present on US English keyboards start at a +significant disadvantage (having to spend 5 minutes at the beginning +of every numeric Python tutorial just going over keyboard layouts is +not a hassle anyone really wants). Plus, even if we somehow overcame +the typing problem, it's not clear there are any that are actually +better than ``@``. Some options that have been suggested include: + +* U+00D7 MULTIPLICATION SIGN: ``A × B`` +* U+22C5 DOT OPERATOR: ``A ⋅ B`` +* U+2297 CIRCLED TIMES: ``A ⊗ B`` +* U+00B0 DEGREE: ``A ° B`` + +What we need, though, is an operator that means "matrix +multiplication, as opposed to scalar/elementwise multiplication". +There is no conventional symbol with this meaning in either +programming or mathematics, where these operations are usually +distinguished by context. (And U+2297 CIRCLED TIMES is actually used +conventionally to mean exactly the wrong things: elementwise +multiplication -- the "Hadamard product" -- or outer product, rather +than matrix/inner product like our operator). ``@`` at least has the +virtue that it *looks* like a funny non-commutative operator; a naive +user who knows maths but not programming couldn't look at ``A * B`` +versus ``A × B``, or ``A * B`` versus ``A ⋅ B``, or ``A * B`` versus +``A ° B`` and guess which one is the usual multiplication, and which +one is the special case. + +Finally, there is the option of using multi-character tokens. Some +options: + +* Matlab and Julia use a ``.*`` operator. Aside from being visually + confusable with ``*``, this would be a terrible choice for us + because in Matlab and Julia, ``*`` means matrix multiplication and + ``.*`` means elementwise multiplication, so using ``.*`` for matrix + multiplication would make us exactly backwards from what Matlab and + Julia users expect. + +* APL apparently used ``+.×``, which by combining a multi-character + token, confusing attribute-access-like .
syntax, and a unicode + character, ranks somewhere below U+2603 SNOWMAN on our candidate + list. If we like the idea of combining addition and multiplication + operators as being evocative of how matrix multiplication actually + works, then something like ``+*`` could be used -- though this may + be too easy to confuse with ``*+``, which is just multiplication + combined with the unary ``+`` operator. + +* PEP 211 suggested ``~*``. This has the downside that it sort of + suggests that there is a unary ``*`` operator that is being combined + with unary ``~``, but it could work. + +* R uses ``%*%`` for matrix multiplication. In R this forms part of a + general extensible infix system in which all tokens of the form + ``%foo%`` are user-defined binary operators. We could steal the + token without stealing the system. + +* Some other plausible candidates that have been suggested: ``><`` (= + ascii drawing of the multiplication sign ×); the footnote operator + ``[*]`` or ``|*|`` (but when used in context, the use of vertical + grouping symbols tends to recreate the nested parentheses visual + clutter that was noted as one of the major downsides of the function + syntax we're trying to get away from); ``^*``. + +So, it doesn't matter much, but ``@`` seems as good or better than any +of the alternatives: + +* It's a friendly character that Pythoneers are already used to typing + in decorators, but the decorator usage and the math expression + usage are sufficiently dissimilar that it would be hard to confuse + them in practice. + +* It's widely accessible across keyboard layouts (and thanks to its + use in email addresses, this is true even of weird keyboards like + those in phones). + +* It's round like ``*`` and :math:`\cdot`. + +* The mATrices mnemonic is cute. + +* The swirly shape is reminiscent of the simultaneous sweeps over rows + and columns that define matrix multiplication. + +* Its asymmetry is evocative of its non-commutative nature. + +* Whatever, we have to pick something. + + +Precedence and associativity +---------------------------- + +There was a long discussion [#associativity-discussions]_ about +whether ``@`` should be right- or left-associative (or even something +more exotic [#group-associativity]_). Almost all Python operators are +left-associative, so following this convention would be the simplest +approach, but there were two arguments that suggested matrix +multiplication might be worth making right-associative as a special +case: + +First, matrix multiplication has a tight conceptual association with +function application/composition, so many mathematically sophisticated +users have an intuition that an expression like :math:`R S x` proceeds +from right-to-left, with first :math:`S` transforming the vector +:math:`x`, and then :math:`R` transforming the result. This isn't +universally agreed (and not all number-crunchers are steeped in the +pure-math conceptual framework that motivates this intuition +[#oil-industry-versus-right-associativity]_), but at the least this +intuition is more common than for other operations like :math:`2 \cdot +3 \cdot 4` which everyone reads as going from left-to-right. + +Second, if expressions like ``Mat @ Mat @ vec`` appear often in code, +then programs will run faster (and efficiency-minded programmers will +be able to use fewer parentheses) if this is evaluated as ``Mat @ (Mat +@ vec)`` than if it is evaluated like ``(Mat @ Mat) @ vec``.
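+To make the efficiency argument concrete: under left-associativity,
+``Mat @ Mat @ vec`` performs a matrix-matrix product first (O(n**3)
+work for n x n matrices), while the right-associative grouping needs
+only two matrix-vector products (O(n**2) work). A sketch, using
+``numpy.dot`` to stand in for ``@`` since no existing types implement
+the new operator yet::
+
+    import numpy as np
+
+    n = 1000
+    Mat = np.ones((n, n))
+    vec = np.ones(n)
+
+    # Left-associative grouping: (Mat @ Mat) @ vec -- O(n**3), slow.
+    left = np.dot(np.dot(Mat, Mat), vec)
+
+    # Right-associative grouping: Mat @ (Mat @ vec) -- O(n**2), fast.
+    right = np.dot(Mat, np.dot(Mat, vec))
+
+    assert np.allclose(left, right)   # same answer, very different cost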
+ +However, weighing against these arguments are the following: + +Regarding the efficiency argument, empirically, we were unable to find +any evidence that ``Mat @ Mat @ vec`` type expressions actually +dominate in real-life code. Parsing a number of large projects that +use numpy, we found that when forced by numpy's current funcall syntax +to choose an order of operations for nested calls to ``dot``, people +actually use left-associative nesting slightly *more* often than +right-associative nesting [#numpy-associativity-counts]_. And anyway, +writing parentheses isn't so bad -- if an efficiency-minded programmer +is going to take the trouble to think through the best way to evaluate +some expression, they probably *should* write down the parentheses +regardless of whether they're needed, just to make it obvious to the +next reader that the order of operations matters. + +In addition, it turns out that other languages, including those with +much more of a focus on linear algebra, overwhelmingly make their +matmul operators left-associative. Specifically, the ``@`` equivalent +is left-associative in R, Matlab, Julia, IDL, and Gauss. The only +exceptions we found are Mathematica, in which ``a @ b @ c`` would be +parsed non-associatively as ``dot(a, b, c)``, and APL, in which all +operators are right-associative. There do not seem to exist any +languages that make ``@`` right-associative and ``*`` +left-associative. And these decisions don't seem to be controversial +-- I've never seen anyone complaining about this particular aspect of +any of these other languages, and the left-associativity of ``*`` +doesn't seem to bother users of the existing Python libraries that use +``*`` for matrix multiplication. So, at the least we can conclude from +this that making ``@`` left-associative will certainly not cause any +disasters. Making ``@`` right-associative, OTOH, would be exploring +new and uncertain ground. + +And another advantage of left-associativity is that it is much easier +to learn and remember that ``@`` acts like ``*``, than it is to +remember first that ``@`` is unlike other Python operators by being +right-associative, and then on top of this, also have to remember +whether it is more tightly or more loosely binding than +``*``. (Right-associativity forces us to choose a precedence, and +intuitions were about equally split on which precedence made more +sense. So this suggests that no matter which choice we made, no-one +would be able to guess or remember it.) + +On net, therefore, the general consensus of the numerical community is +that while matrix multiplication is something of a special case, it's +not special enough to break the rules, and ``@`` should parse like +``*`` does. + + +(Non)-Definitions for built-in types +------------------------------------ + +No ``__matmul__`` or ``__matpow__`` are defined for builtin numeric +types (``float``, ``int``, etc.) or for the ``numbers.Number`` +hierarchy, because these types represent scalars, and the consensus +semantics for ``@`` are that it should raise an error on scalars. + +We do not -- for now -- define a ``__matmul__`` method on the standard +``memoryview`` or ``array.array`` objects, for several reasons. Of +course this could be added if someone wants it, but these types would +require quite a bit of additional work beyond ``__matmul__`` before +they could be used for numeric work -- e.g., they have no way to do +addition or scalar multiplication either! -- and adding such +functionality is beyond the scope of this PEP.
In addition, providing +a quality implementation of matrix multiplication is highly +non-trivial. Naive nested loop implementations are very slow and +shipping such an implementation in CPython would just create a trap +for users. But the alternative -- providing a modern, competitive +matrix multiply -- would require that CPython link to a BLAS library, +which brings a set of new complications. In particular, several +popular BLAS libraries (including the one that ships by default on +OS X) currently break the use of ``multiprocessing`` [#blas-fork]_. +Together, these considerations mean that the cost/benefit of adding +``__matmul__`` to these types just isn't there, so for now we'll +continue to delegate these problems to numpy and friends, and defer a +more systematic solution to a future proposal. + +There are also non-numeric Python builtins which define ``__mul__`` +(``str``, ``list``, ...). We do not define ``__matmul__`` for these +types either, because why would we even do that. + + +Non-definition of matrix power +------------------------------ + +Earlier versions of this PEP also proposed a matrix power operator, +``@@``, analogous to ``**``. But on further consideration, it was +decided that the utility of this was sufficiently unclear that it +would be better to leave it out for now, and only revisit the issue if +-- once we have more experience with ``@`` -- it turns out that ``@@`` +is truly missed. [#atat-discussion]_ + + +Rejected alternatives to adding a new operator +============================================== + +Over the past few decades, the Python numeric community has explored a +variety of ways to resolve the tension between matrix and elementwise +multiplication operations. PEP 211 and PEP 225, both proposed in 2000 +and last seriously discussed in 2008 [#threads-2008]_, were early +attempts to add new operators to solve this problem, but suffered from +serious flaws; in particular, at that time the Python numerical +community had not yet reached consensus on the proper API for array +objects, or on what operators might be needed or useful (e.g., PEP 225 +proposes 6 new operators with unspecified semantics). Experience +since then has now led to consensus that the best solution, for both +numeric Python and core Python, is to add a single infix operator for +matrix multiply (together with the other new operators this implies +like ``@=``). + +We review some of the rejected alternatives here. + +**Use a second type that defines __mul__ as matrix multiplication:** +As discussed above (`Background: What's wrong with the status quo?`_), +this has been tried for many years via the ``numpy.matrix`` type +(and its predecessors in Numeric and numarray). The result is a +strong consensus among both numpy developers and developers of +downstream packages that ``numpy.matrix`` should essentially never be +used, because of the problems caused by having conflicting duck types +for arrays. (Of course one could then argue we should *only* define +``__mul__`` to be matrix multiplication, but then we'd have the same +problem with elementwise multiplication.) There have been several +pushes to remove ``numpy.matrix`` entirely; the only counter-arguments +have come from educators who find that its problems are outweighed by +the need to provide a simple and clear mapping between mathematical +notation and code for novices (see `Transparent syntax is especially +crucial for non-expert programmers`_).
But, of course, starting out +newbies with a dispreferred syntax and then expecting them to +transition later causes its own problems. The two-type solution is +worse than the disease. + +**Add lots of new operators, or add a new generic syntax for defining +infix operators:** In addition to being generally un-Pythonic and +repeatedly rejected by BDFL fiat, this would be using a sledgehammer +to smash a fly. The scientific python community has consensus that +adding one operator for matrix multiplication is enough to fix the one +otherwise unfixable pain point. (In retrospect, we all think PEP 225 +was a bad idea too -- or at least far more complex than it needed to +be.) + +**Add a new @ (or whatever) operator that has some other meaning in +general Python, and then overload it in numeric code:** This was the +approach taken by PEP 211, which proposed defining ``@`` to be the +equivalent of ``itertools.product``. The problem with this is that +when taken on its own terms, it's pretty clear that +``itertools.product`` doesn't actually need a dedicated operator. It +hasn't even been deemed worthy of a builtin. (During discussions of +this PEP, a similar suggestion was made to define ``@`` as a general +purpose function composition operator, and this suffers from the same +problem; ``functools.compose`` isn't even useful enough to exist.) +Matrix multiplication has a uniquely strong rationale for inclusion as +an infix operator. There almost certainly don't exist any other +binary operations that will ever justify adding any other infix +operators to Python. + +**Add a .dot method to array types so as to allow "pseudo-infix" +A.dot(B) syntax:** This has been in numpy for some years, and in many +cases it's better than dot(A, B). But it's still much less readable +than real infix notation, and in particular still suffers from an +extreme overabundance of parentheses. See `Why should matrix +multiplication be infix?`_ above. + +**Use a 'with' block to toggle the meaning of * within a single code +block**: E.g., numpy could define a special context object so that +we'd have:: + + c = a * b # element-wise multiplication + with numpy.mul_as_dot: + c = a * b # matrix multiplication + +However, this has two serious problems: first, it requires that every +array-like type's ``__mul__`` method know how to check some global +state (``numpy.mul_is_currently_dot`` or whatever). This is fine if +``a`` and ``b`` are numpy objects, but the world contains many +non-numpy array-like objects. So this either requires non-local +coupling -- every numpy competitor library has to import numpy and +then check ``numpy.mul_is_currently_dot`` on every operation -- or +else it breaks duck-typing, with the above code doing radically +different things depending on whether ``a`` and ``b`` are numpy +objects or some other sort of object. Second, and worse, ``with`` +blocks are dynamically scoped, not lexically scoped; i.e., any +function that gets called inside the ``with`` block will suddenly find +itself executing inside the mul_as_dot world, and crash and burn +horribly -- if you're lucky. So this is a construct that could only +be used safely in rather limited cases (no function calls), and which +would make it very easy to shoot yourself in the foot without warning.
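+To spell out the second problem, here is a hypothetical sketch --
+neither ``mul_as_dot`` nor the mode flag exist in numpy or anywhere
+else -- of how dynamic scoping leaks into callees::
+
+    from contextlib import contextmanager
+
+    MUL_IS_DOT = False
+
+    @contextmanager
+    def mul_as_dot():
+        global MUL_IS_DOT
+        MUL_IS_DOT = True
+        try:
+            yield
+        finally:
+            MUL_IS_DOT = False
+
+    class Arr:
+        def __init__(self, value):
+            self.value = value
+        def __mul__(self, other):
+            # Toy stand-in: report which meaning of * was in effect.
+            kind = "dot" if MUL_IS_DOT else "elementwise"
+            return (kind, self.value * other.value)
+
+    def library_helper(a):
+        # Written and tested assuming * is always elementwise.
+        return a * a
+
+    with mul_as_dot():
+        print(library_helper(Arr(2.0)))   # ('dot', 4.0) -- the helper's
+                                          # assumption broke, silently
+
+Every function transitively called inside the ``with`` block runs in
+"dot mode", whether its author expected that or not.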
+ +**Use a language preprocessor that adds extra numerically-oriented +operators and perhaps other syntax:** (As per recent BDFL suggestion: +[#preprocessor]_) This suggestion seems based on the idea that +numerical code needs a wide variety of syntax additions. In fact, +given ``@``, most numerical users don't need any other operators or +syntax; it solves the one really painful problem that cannot be solved +by other means, and that causes painful reverberations through the +larger ecosystem. Defining a new language (presumably with its own +parser which would have to be kept in sync with Python's, etc.), just +to support a single binary operator, is neither practical nor +desirable. In the numerical context, Python's competition is +special-purpose numerical languages (Matlab, R, IDL, etc.). Compared +to these, Python's killer feature is exactly that one can mix +specialized numerical code with code for XML parsing, web page +generation, database access, network programming, GUI libraries, and +so forth, and we also gain major benefits from the huge variety of +tutorials, reference material, introductory classes, etc., which use +Python. Fragmenting "numerical Python" from "real Python" would be a +major source of confusion. A major motivation for this PEP is to +*reduce* fragmentation. Having to set up a preprocessor would be an +especially prohibitive complication for unsophisticated users. And we +use Python because we like Python! We don't want +almost-but-not-quite-Python. + +**Use overloading hacks to define a "new infix operator" like *dot*, +as in a well-known Python recipe:** (See: [#infix-hack]_) Beautiful is +better than ugly. This is... not beautiful. And not Pythonic. And +especially unfriendly to beginners, who are just trying to wrap their +heads around the idea that there's a coherent underlying system behind +these magic incantations that they're learning, when along comes an +evil hack like this that violates that system, creates bizarre error +messages when accidentally misused, and whose underlying mechanisms +can't be understood without deep knowledge of how object oriented +systems work. + +**Use a special "facade" type to support syntax like arr.M * arr:** +This is very similar to the previous proposal, in that the ``.M`` +attribute would basically return the same object as ``arr *dot`` would, +and thus suffers the same objections about 'magicalness'. This +approach also has some non-obvious complexities: for example, while +``arr.M * arr`` must return an array, ``arr.M * arr.M`` and ``arr * +arr.M`` must return facade objects, or else ``arr.M * arr.M * arr`` +and ``arr * arr.M * arr`` will not work. But this means that facade +objects must be able to recognize both other array objects and other +facade objects (which creates additional complexity for writing +interoperating array types from different libraries who must now +recognize both each other's array types and their facade types). It +also creates pitfalls for users who may easily type ``arr * arr.M`` or +``arr.M * arr.M`` and expect to get back an array object; instead, +they will get a mysterious object that throws errors when they attempt +to use it. Basically with this approach users must be careful to +think of ``.M*`` as an indivisible unit that acts as an infix operator +-- and as infix-operator-like token strings go, at least ``*dot*`` +is prettier looking (look at its cute little ears!).
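+For reference, the overloading hack rejected above works along the
+following lines (a sketch in the spirit of the recipe in
+[#infix-hack]_, not the exact code)::
+
+    class Infix(object):
+        """Wrap a function so it can be spelled as a fake infix operator."""
+        def __init__(self, function):
+            self.function = function
+        def __rmul__(self, other):        # handles:  a *dot
+            return Infix(lambda x: self.function(other, x))
+        def __mul__(self, other):         # handles:  (a *dot) * b
+            return self.function(other)
+
+    dot = Infix(lambda a, b: sum(x * y for x, y in zip(a, b)))
+    print((1, 2) *dot* (3, 4))            # 11, parsed as ((1, 2) * dot) * (3, 4)
+
+It "works" only because ``tuple.__mul__`` returns ``NotImplemented``
+for an unknown right operand; array types whose ``*`` tries to
+broadcast arbitrary objects can fail in far more confusing ways, which
+is exactly the beginner-hostility complained about above.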
+
+
+Discussions of this PEP
+=======================
+
+Collected here for reference:
+
+* Github pull request containing much of the original discussion and
+  drafting: https://github.com/numpy/numpy/pull/4351
+
+* sympy mailing list discussions of an early draft:
+
+  * https://groups.google.com/forum/#!topic/sympy/22w9ONLa7qo
+  * https://groups.google.com/forum/#!topic/sympy/4tGlBGTggZY
+
+* sage-devel mailing list discussions of an early draft:
+  https://groups.google.com/forum/#!topic/sage-devel/YxEktGu8DeM
+
+* 13-Mar-2014 python-ideas thread:
+  https://mail.python.org/pipermail/python-ideas/2014-March/027053.html
+
+* numpy-discussion thread on whether to keep ``@@``:
+  http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069448.html
+
+* numpy-discussion threads on precedence/associativity of ``@``:
+  * http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069444.html
+  * http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069605.html
+
+
+References
+==========
+
+.. [#preprocessor] From a comment by GvR on a G+ post by GvR; the
+   comment itself does not seem to be directly linkable: https://plus.google.com/115212051037621986145/posts/hZVVtJ9bK3u
+.. [#infix-hack] http://code.activestate.com/recipes/384122-infix-operators/
+   http://www.sagemath.org/doc/reference/misc/sage/misc/decorators.html#sage.misc.decorators.infix_operator
+.. [#scipy-conf] http://conference.scipy.org/past.html
+.. [#pydata-conf] http://pydata.org/events/
+.. [#lht] In this formula, :math:`\beta` is a vector or matrix of
+   regression coefficients, :math:`V` is the estimated
+   variance/covariance matrix for these coefficients, and we want to
+   test the null hypothesis that :math:`H\beta = r`; a large :math:`S`
+   then indicates that this hypothesis is unlikely to be true.  For
+   example, in an analysis of human height, the vector :math:`\beta`
+   might contain one value which was the average height of the
+   measured men, and another value which was the average height of the
+   measured women, and then setting :math:`H = [1, -1], r = 0` would
+   let us test whether men and women are the same height on
+   average.  Compare to eq. 2.139 in
+   http://sfb649.wiwi.hu-berlin.de/fedc_homepage/xplore/tutorials/xegbohtmlnode17.html
+
+   Example code is adapted from https://github.com/rerpy/rerpy/blob/0d274f85e14c3b1625acb22aed1efa85d122ecb7/rerpy/incremental_ls.py#L202
+
+.. [#pycon-tutorials] Out of the 36 tutorials scheduled for PyCon 2014
+   (https://us.pycon.org/2014/schedule/tutorials/), we guess that the
+   8 below will almost certainly deal with matrices:
+
+   * Dynamics and control with Python
+
+   * Exploring machine learning with Scikit-learn
+
+   * How to formulate a (science) problem and analyze it using Python
+     code
+
+   * Diving deeper into Machine Learning with Scikit-learn
+
+   * Data Wrangling for Kaggle Data Science Competitions – An etude
+
+   * Hands-on with Pydata: how to build a minimal recommendation
+     engine.
+
+   * Python for Social Scientists
+
+   * Bayesian statistics made simple
+
+   In addition, the following tutorials could easily involve matrices:
+
+   * Introduction to game programming
+
+   * mrjob: Snakes on a Hadoop *("We'll introduce some data science
+     concepts, such as user-user similarity, and show how to calculate
+     these metrics...")*
+
+   * Mining Social Web APIs with IPython Notebook
+
+   * Beyond Defaults: Creating Polished Visualizations Using Matplotlib
+
+   This gives an estimated range of 8 to 12 / 36 = 22% to 33% of
+   tutorials dealing with matrices; saying ~20% then gives us some
+   wiggle room in case our estimates are high.
+
+.. [#sloc-details] SLOCs were defined as physical lines which contain
+   at least one token that is not a COMMENT, NEWLINE, ENCODING,
+   INDENT, or DEDENT.  Counts were made by using the ``tokenize`` module
+   from Python 3.2.3 to examine the tokens in all files ending ``.py``
+   underneath some directory.  Only tokens which occur at least once
+   in the source trees are included in the table.  The counting script
+   is available `in the PEP repository
+   `_.
+
+   Matrix multiply counts were estimated by counting how often certain
+   tokens which are used as matrix multiply function names occurred in
+   each package.  This creates a small number of false positives for
+   scikit-learn, because we also count instances of the wrappers
+   around ``dot`` that this package uses, and so there are a few dozen
+   tokens which actually occur in ``import`` or ``def`` statements.
+
+   All counts were made using the latest development version of each
+   project as of 21 Feb 2014.
+
+   'stdlib' is the contents of the Lib/ directory in commit
+   d6aa3fa646e2 to the cpython hg repository, and treats the following
+   tokens as indicating matrix multiply: n/a.
+
+   'scikit-learn' is the contents of the sklearn/ directory in commit
+   69b71623273ccfc1181ea83d8fb9e05ae96f57c7 to the scikit-learn
+   repository (https://github.com/scikit-learn/scikit-learn), and
+   treats the following tokens as indicating matrix multiply: ``dot``,
+   ``fast_dot``, ``safe_sparse_dot``.
+
+   'nipy' is the contents of the nipy/ directory in commit
+   5419911e99546401b5a13bd8ccc3ad97f0d31037 to the nipy repository
+   (https://github.com/nipy/nipy/), and treats the following tokens as
+   indicating matrix multiply: ``dot``.
+
+.. [#blas-fork] BLAS libraries have a habit of secretly spawning
+   threads, even when used from single-threaded programs.  And threads
+   play very poorly with ``fork()``; the usual symptom is that
+   attempting to perform linear algebra in a child process causes an
+   immediate deadlock.
+
+.. [#threads-2008] http://fperez.org/py4science/numpy-pep225/numpy-pep225.html
+
+.. [#broadcasting] http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
+
+.. [#matmul-other-langs] http://mail.scipy.org/pipermail/scipy-user/2014-February/035499.html
+
+.. [#github-details] Counts were produced by manually entering the
+   string ``"import foo"`` or ``"from foo import"`` (with quotes) into
+   the Github code search page, e.g.:
+   https://github.com/search?q=%22import+numpy%22&ref=simplesearch&type=Code
+   on 2014-04-10 at ~21:00 UTC.  The reported values are the numbers
+   given in the "Languages" box on the lower-left corner, next to
+   "Python".
This also causes some undercounting (e.g., leaving out
+   Cython code, and possibly one should also count HTML docs and so
+   forth), but these effects are negligible (e.g., only ~1% of numpy
+   usage appears to occur in Cython code, and probably even less for
+   the other modules listed).  The use of this box is crucial,
+   however, because these counts appear to be stable, while the
+   "overall" counts listed at the top of the page ("We've found ___
+   code results") are highly variable even for a single search --
+   simply reloading the page can cause this number to vary by a factor
+   of 2 (!!).  (They do seem to settle down if one reloads the page
+   repeatedly, but nonetheless this is spooky enough that it seemed
+   better to avoid these numbers.)
+
+   These numbers should of course be taken with multiple grains of
+   salt; it's not clear how representative Github is of Python code in
+   general, and limitations of the search tool make it impossible to
+   get precise counts.  AFAIK this is the best data set currently
+   available, but it'd be nice if it were better.  In particular:
+
+   * Lines like ``import sys, os`` will only be counted in the ``sys``
+     row.
+
+   * A file containing both ``import X`` and ``from X import`` will be
+     counted twice.
+
+   * Imports of the form ``from X.foo import ...`` are missed.  We
+     could catch these by instead searching for "from X", but this is
+     a common phrase in English prose, so we'd end up with false
+     positives from comments, strings, etc.  For many of the modules
+     considered this shouldn't matter too much -- for example, the
+     stdlib modules have flat namespaces -- but it might especially
+     lead to undercounting of django, scipy, and twisted.
+
+   Also, it's possible there exist other non-stdlib modules we didn't
+   think to test that are even more-imported than numpy -- though we
+   tried quite a few of the obvious suspects.  If you find one, let us
+   know!  The modules tested here were chosen based on a combination
+   of intuition and the top-100 list at pypi-ranking.info.
+
+   Fortunately, it doesn't really matter if it turns out that numpy
+   is, say, merely the *third* most-imported non-stdlib module, since
+   the point is just that numeric programming is a common and
+   mainstream activity.
+
+   Finally, we should point out the obvious: whether a package is
+   import**ed** is rather different from whether it's import**ant**.
+   No-one's claiming numpy is "the most important package" or anything
+   like that.  Certainly more packages depend on distutils, e.g., than
+   depend on numpy -- and far fewer source files import distutils than
+   import numpy.  But this is fine for our present purposes.  Most
+   source files don't import distutils because most source files don't
+   care how they're distributed, so long as they are; these source
+   files thus don't care about details of how distutils' API works.
+   This PEP is in some sense about changing how numpy's and related
+   packages' APIs work, so the relevant metric is to look at source
+   files that are choosing to directly interact with that API, which
+   is sort of like what we get by looking at import statements.
+
+.. [#hugunin] The first such proposal occurs in Jim Hugunin's very
+   first email to the matrix SIG in 1995, which lays out the first
+   draft of what became Numeric.  He suggests using ``*`` for
+   elementwise multiplication, and ``%`` for matrix multiplication:
+   https://mail.python.org/pipermail/matrix-sig/1995-August/000002.html
+
+..
[#atat-discussion] http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069502.html
+
+.. [#associativity-discussions]
+   http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069444.html
+   http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069605.html
+
+.. [#oil-industry-versus-right-associativity]
+   http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069610.html
+
+.. [#numpy-associativity-counts]
+   http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069578.html
+
+.. [#group-associativity]
+   http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069530.html
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
diff --git a/doc/release/1.8.2-notes.rst b/doc/release/1.8.2-notes.rst
new file mode 100644
index 000000000000..c21f81a27dd4
--- /dev/null
+++ b/doc/release/1.8.2-notes.rst
@@ -0,0 +1,19 @@
+NumPy 1.8.2 Release Notes
+*************************
+
+This is a bugfix only release in the 1.8.x series.
+
+Issues fixed
+============
+
+* gh-4836: partition produces wrong results for multiple selections in equal ranges
+* gh-4656: Make fftpack._raw_fft threadsafe
+* gh-4628: incorrect argument order to _copyto in np.nanmax, np.nanmin
+* gh-4642: Hold GIL for converting dtype types with fields
+* gh-4733: fix np.linalg.svd(b, compute_uv=False)
+* gh-4853: avoid unaligned simd load on reductions on i386
+* gh-4722: Fix seg fault converting empty string to object
+* gh-4613: Fix lack of NULL check in array_richcompare
+* gh-4774: avoid unaligned access for strided byteswap
+* gh-650: Prevent division by zero when creating arrays from some buffers
+* gh-4602: ifort has issues with optimization flag O2, use O1
diff --git a/doc/release/1.9.0-notes.rst b/doc/release/1.9.0-notes.rst
index c00f7f9d6912..37343ec6dbe9 100644
--- a/doc/release/1.9.0-notes.rst
+++ b/doc/release/1.9.0-notes.rst
@@ -6,8 +6,6 @@ This release supports Python 2.6 - 2.7 and 3.2 - 3.4.
 Highlights
 ==========
-* Addition of `__numpy_ufunc__` to allow overriding ufuncs in ndarray
-  subclasses.
 * Numerous performance improvements in various areas, most notably
   indexing and operations on small arrays are significantly faster.
   Indexing operations now also release the GIL.
@@ -35,6 +33,8 @@ Future Changes
 * String version checks will break because, e.g., '1.9' > '1.10' is True. A
   NumpyVersion class has been added that can be used for such comparisons.
 * The diagonal and diag functions will return writeable views in 1.10.0
+* The `S` and/or `a` dtypes may be changed to represent Python strings
+  instead of bytes; in Python 3 these two types are very different.
 
 Compatibility notes
@@ -176,6 +176,11 @@ introduced in advanced indexing operations:
 * Indexing with more than one ellipsis (``...``) is deprecated.
 
+Non-integer reduction axis indexes are deprecated
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Non-integer axis indexes to reduction ufuncs like `add.reduce` or `sum` are
+deprecated.
+
 ``promote_types`` and string dtype
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ``promote_types`` function now returns a valid string length when given an
@@ -262,13 +267,6 @@ ufunc reductions do since 1.7. One can now say axis=(index, index) to pick a
 list of axes for the reduction. The ``keepdims`` keyword argument was also
 added to allow convenient broadcasting to arrays of the original shape.
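+For example, reducing over two axes at once::
+
+    >>> a = np.arange(24).reshape(2, 3, 4)
+    >>> np.add.reduce(a, axis=(0, 2)).shape
+    (3,)
+    >>> np.add.reduce(a, axis=(0, 2), keepdims=True).shape
+    (1, 3, 1)
+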
-Ufunc and Dot Overrides
-~~~~~~~~~~~~~~~~~~~~~~~
-For better compatibility with external objects you can now override
-universal functions (ufuncs), ``numpy.core._dotblas.dot``, and
-``numpy.core.multiarray.dot`` (the numpy.dot functions). By defining a
-``__numpy_ufunc__`` method.
-
 Dtype parameter added to ``np.linspace`` and ``np.logspace``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The returned data type from the ``linspace`` and ``logspace`` functions can
@@ -336,6 +334,12 @@ in either an error being raised, or wrong results computed.
 Improvements
 ============
+Better numerical stability for sum in some cases
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Pairwise summation is now used in the sum method, but only along the fast
+axis and for blocks of at most 8192 elements. This should also
+improve the accuracy of var and std in some common cases; a pure-Python
+sketch of the technique appears below.
+
 Percentile implemented in terms of ``np.partition``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ``np.percentile`` has been implemented in terms of ``np.partition`` which
diff --git a/doc/release/1.9.1-notes.rst b/doc/release/1.9.1-notes.rst
new file mode 100644
index 000000000000..a72e71aae151
--- /dev/null
+++ b/doc/release/1.9.1-notes.rst
@@ -0,0 +1,33 @@
+NumPy 1.9.1 Release Notes
+*************************
+
+This is a bugfix only release in the 1.9.x series.
+
+Issues fixed
+============
+
+* gh-5184: restore the linear edge behaviour of gradient to match versions < 1.9.
+  The second order behaviour is available via the `edge_order` keyword
+* gh-4007: workaround Accelerate sgemv crash on OSX 10.9
+* gh-5100: restore object dtype inference from iterable objects without `len()`
+* gh-5163: avoid gcc-4.1.2 (red hat 5) miscompilation causing a crash
+* gh-5138: fix nanmedian on arrays containing inf
+* gh-5240: fix not returning out array from ufuncs with subok=False set
+* gh-5203: copy inherited masks in MaskedArray.__array_finalize__
+* gh-2317: genfromtxt did not handle filling_values=0 correctly
+* gh-5067: restore api of npy_PyFile_DupClose in python2
+* gh-5063: cannot convert invalid sequence index to tuple
+* gh-5082: Segmentation fault with argmin() on unicode arrays
+* gh-5095: don't propagate subtypes from np.where
+* gh-5104: np.inner segfaults with SciPy's sparse matrices
+* gh-5251: Issue with fromarrays not using correct format for unicode arrays
+* gh-5136: Import dummy_threading if importing threading fails
+* gh-5148: Make numpy import when run with Python flag '-OO'
+* gh-5147: Einsum double contraction in particular order causes ValueError
+* gh-479: Make f2py work with intent(in out)
+* gh-5170: Make python2 .npy files readable in python3
+* gh-5027: Use 'll' as the default length specifier for long long
+* gh-4896: fix build error with MSVC 2013 caused by C99 complex support
+* gh-4465: Make PyArray_PutTo respect writeable flag
+* gh-5225: fix crash when using arange on datetime without dtype set
+* gh-5231: fix build in c99 mode
diff --git a/doc/release/1.9.2-notes.rst b/doc/release/1.9.2-notes.rst
new file mode 100644
index 000000000000..857b6fe30b57
--- /dev/null
+++ b/doc/release/1.9.2-notes.rst
@@ -0,0 +1,25 @@
+NumPy 1.9.2 Release Notes
+*************************
+
+This is a bugfix only release in the 1.9.x series.
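+A pure-Python sketch of the pairwise summation technique mentioned in
+the 1.9.0 improvement notes above -- an illustration only, not numpy's
+C implementation -- assuming the 8192-element block size stated there::
+
+    def pairwise_sum(x, blocksize=8192):
+        # split recursively; rounding error grows ~O(log n) rather than
+        # the O(n) of naive left-to-right accumulation
+        n = len(x)
+        if n <= blocksize:
+            return sum(x)  # plain sequential sum within one block
+        mid = n // 2
+        return pairwise_sum(x[:mid], blocksize) + pairwise_sum(x[mid:], blocksize)
+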
+
+Issues fixed
+============
+
+* `#5316 `__: fix too large dtype alignment of strings and complex types
+* `#5424 `__: fix ma.median when used on ndarrays
+* `#5481 `__: Fix astype for structured array fields of different byte order
+* `#5354 `__: fix segfault when clipping complex arrays
+* `#5524 `__: allow np.argpartition on non ndarrays
+* `#5612 `__: Fixes ndarray.fill to accept full range of uint64
+* `#5155 `__: Fix loadtxt with comments=None and a string None data
+* `#4476 `__: Masked array view fails if structured dtype has datetime component
+* `#5388 `__: Make RandomState.set_state and RandomState.get_state threadsafe
+* `#5390 `__: make seed, randint and shuffle threadsafe
+* `#5374 `__: Fixed incorrect assert_array_almost_equal_nulp documentation
+* `#5393 `__: Add support for ATLAS > 3.9.33.
+* `#5313 `__: PyArray_AsCArray caused segfault for 3d arrays
+* `#5492 `__: handle out of memory in rfftf
+* `#4181 `__: fix a few bugs in the random.pareto docstring
+* `#5359 `__: minor changes to linspace docstring
+* `#4723 `__: fix a compile issue on AIX
diff --git a/doc/release/1.9.3-notes.rst b/doc/release/1.9.3-notes.rst
new file mode 100644
index 000000000000..9abadec5f966
--- /dev/null
+++ b/doc/release/1.9.3-notes.rst
@@ -0,0 +1,23 @@
+NumPy 1.9.3 Release Notes
+*************************
+
+This is a bugfix only release in the 1.9.x series.
+
+The only changes from 1.9.2 are a fix for reading gzipped text files on Python
+3.5 and some build fixes.
+
+Issues fixed
+============
+
+* `#5866 `__: fix error finding
+  Python headers when ``build_ext`` ``--include-dirs`` is set;
+* `#6016 `__: fix ``np.loadtxt``
+  error on Python 3.5 when reading from gzip files;
+* `#5555 `__: Replace deprecated
+  options for ifort;
+* `#6096 `__: remove /GL for VS2015
+  in check_long_double_representation;
+* `#6141 `__: enable Visual Studio
+  2015 C99 features;
+* `#6171 `__: revert C99 complex for
+  MSVC14.
diff --git a/doc/source/reference/arrays.classes.rst b/doc/source/reference/arrays.classes.rst
index 036185782b17..2b97bc309357 100644
--- a/doc/source/reference/arrays.classes.rst
+++ b/doc/source/reference/arrays.classes.rst
@@ -39,76 +39,6 @@ Special attributes and methods
 Numpy provides several hooks that classes can customize:
-.. function:: class.__numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs)
-
-   .. versionadded:: 1.9
-
-   Any class (ndarray subclass or not) can define this method to
-   override behavior of Numpy's ufuncs. This works quite similarly to
-   Python's ``__mul__`` and other binary operation routines.
-
-   - *ufunc* is the ufunc object that was called.
-   - *method* is a string indicating which Ufunc method was called
-     (one of ``"__call__"``, ``"reduce"``, ``"reduceat"``,
-     ``"accumulate"``, ``"outer"``, ``"inner"``).
-   - *i* is the index of *self* in *inputs*.
-   - *inputs* is a tuple of the input arguments to the ``ufunc``
-   - *kwargs* is a dictionary containing the optional input arguments
-     of the ufunc. The ``out`` argument is always contained in
-     *kwargs*, if given. See the discussion in :ref:`ufuncs` for
-     details.
-
-   The method should return either the result of the operation, or
-   :obj:`NotImplemented` if the operation requested is not
-   implemented.
-
-   If one of the arguments has a :func:`__numpy_ufunc__` method, it is
-   executed *instead* of the ufunc. If more than one of the input
-   arguments implements :func:`__numpy_ufunc__`, they are tried in the
-   order: subclasses before superclasses, otherwise left to right.
The - first routine returning something else than :obj:`NotImplemented` - determines the result. If all of the :func:`__numpy_ufunc__` - operations return :obj:`NotImplemented`, a :exc:`TypeError` is - raised. - - If an :class:`ndarray` subclass defines the :func:`__numpy_ufunc__` - method, this disables the :func:`__array_wrap__`, - :func:`__array_prepare__`, :data:`__array_priority__` mechanism - described below. - - .. note:: In addition to ufuncs, :func:`__numpy_ufunc__` also - overrides the behavior of :func:`numpy.dot` even though it is - not an Ufunc. - - .. note:: If you also define right-hand binary operator override - methods (such as ``__rmul__``) or comparison operations (such as - ``__gt__``) in your class, they take precedence over the - :func:`__numpy_ufunc__` mechanism when resolving results of - binary operations (such as ``ndarray_obj * your_obj``). - - The technical special case is: ``ndarray.__mul__`` returns - ``NotImplemented`` if the other object is *not* a subclass of - :class:`ndarray`, and defines both ``__numpy_ufunc__`` and - ``__rmul__``. Similar exception applies for the other operations - than multiplication. - - In such a case, when computing a binary operation such as - ``ndarray_obj * your_obj``, your ``__numpy_ufunc__`` method - *will not* be called. Instead, the execution passes on to your - right-hand ``__rmul__`` operation, as per standard Python - operator override rules. - - Similar special case applies to *in-place operations*: If you - define ``__rmul__``, then ``ndarray_obj *= your_obj`` *will not* - call your ``__numpy_ufunc__`` implementation. Instead, the - default Python behavior ``ndarray_obj = ndarray_obj * your_obj`` - occurs. - - Note that the above discussion applies only to Python's builtin - binary operation mechanism. ``np.multiply(ndarray_obj, - your_obj)`` always calls only your ``__numpy_ufunc__``, as - expected. - .. function:: class.__array_finalize__(self) This method is called whenever the system internally allocates a diff --git a/doc/source/reference/arrays.indexing.rst b/doc/source/reference/arrays.indexing.rst index d04f89897858..ef0180e0f43b 100644 --- a/doc/source/reference/arrays.indexing.rst +++ b/doc/source/reference/arrays.indexing.rst @@ -31,9 +31,9 @@ integer, or a tuple of slice objects and integers. :const:`Ellipsis` and :const:`newaxis` objects can be interspersed with these as well. In order to remain backward compatible with a common usage in Numeric, basic slicing is also initiated if the selection object is -any sequence (such as a :class:`list`) containing :class:`slice` +any non-ndarray sequence (such as a :class:`list`) containing :class:`slice` objects, the :const:`Ellipsis` object, or the :const:`newaxis` object, -but no integer arrays or other embedded sequences. +but not for integer arrays or other embedded sequences. .. index:: triple: ndarray; special methods; getslice @@ -46,8 +46,8 @@ scalar ` representing the corresponding item. As in Python, all indices are zero-based: for the *i*-th index :math:`n_i`, the valid range is :math:`0 \le n_i < d_i` where :math:`d_i` is the *i*-th element of the shape of the array. Negative indices are -interpreted as counting from the end of the array (*i.e.*, if *i < 0*, -it means :math:`n_i + i`). +interpreted as counting from the end of the array (*i.e.*, if +:math:`n_i < 0`, it means :math:`n_i + d_i`). 
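+For example::
+
+    >>> x = np.arange(10)
+    >>> x[-2]      # i.e. x[-2 + 10]
+    8
+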
All arrays generated by basic slicing are always :term:`views `
@@ -84,7 +84,7 @@ concepts to remember include:
 - Assume *n* is the number of elements in the dimension being
   sliced. Then, if *i* is not given it defaults to 0 for *k > 0* and
-  *n* for *k < 0* . If *j* is not given it defaults to *n* for *k > 0*
+  *n - 1* for *k < 0* . If *j* is not given it defaults to *n* for *k > 0*
   and -1 for *k < 0* . If *k* is not given it defaults to 1. Note that
   ``::`` is the same as ``:`` and means select all indices along this
   axis (an illustrative example follows below).
diff --git a/doc/source/reference/c-api.array.rst b/doc/source/reference/c-api.array.rst
index 23355bc91c80..baf8043789b5 100644
--- a/doc/source/reference/c-api.array.rst
+++ b/doc/source/reference/c-api.array.rst
@@ -1632,11 +1632,11 @@ Conversion
 Shape Manipulation
 ^^^^^^^^^^^^^^^^^^
-.. cfunction:: PyObject* PyArray_Newshape(PyArrayObject* self, PyArray_Dims* newshape)
+.. cfunction:: PyObject* PyArray_Newshape(PyArrayObject* self, PyArray_Dims* newshape, NPY_ORDER order)
 
    Result will be a new array (pointing to the same memory location
-   as *self* if possible), but having a shape given by *newshape*
-   . If the new shape is not compatible with the strides of *self*,
+   as *self* if possible), but having a shape given by *newshape*.
+   If the new shape is not compatible with the strides of *self*,
   then a copy of the array with the new specified shape will be
   returned.
@@ -1645,6 +1645,7 @@ Shape Manipulation
   Equivalent to :meth:`ndarray.reshape` (*self*, *shape*) where *shape* is a
   sequence. Converts *shape* to a :ctype:`PyArray_Dims` structure and
   calls :cfunc:`PyArray_Newshape` internally.
+  For backward compatibility -- not recommended.
 
 .. cfunction:: PyObject* PyArray_Squeeze(PyArrayObject* self)
diff --git a/doc/source/reference/c-api.types-and-structures.rst b/doc/source/reference/c-api.types-and-structures.rst
index f1e216a5c5ca..95272c151a1e 100644
--- a/doc/source/reference/c-api.types-and-structures.rst
+++ b/doc/source/reference/c-api.types-and-structures.rst
@@ -244,7 +244,7 @@ PyArrayDescr_Type
   Indicates that items of this data-type must be reference
   counted (using :cfunc:`Py_INCREF` and :cfunc:`Py_DECREF` ).
-  .. cvar:: NPY_ITEM_LISTPICKLE
+  .. cvar:: NPY_LIST_PICKLE
 
   Indicates arrays of this data-type must be converted to a list
   before pickling.
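+To illustrate the basic-slicing defaults described earlier (an
+illustrative example, with *n* = 10)::
+
+    >>> x = np.arange(10)
+    >>> x[::2]     # i -> 0, j -> n, k = 2
+    array([0, 2, 4, 6, 8])
+    >>> x[::-1]    # i -> n - 1, j -> -1 ("one before 0"), k = -1
+    array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
+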
@@ -646,9 +646,9 @@ PyUFunc_Type void **data; int ntypes; int check_return; - char *name; + const char *name; char *types; - char *doc; + const char *doc; void *ptr; PyObject *obj; PyObject *userloops; diff --git a/doc/source/reference/routines.array-creation.rst b/doc/source/reference/routines.array-creation.rst index 23b35243b2f6..c7c6ab8152f9 100644 --- a/doc/source/reference/routines.array-creation.rst +++ b/doc/source/reference/routines.array-creation.rst @@ -20,6 +20,8 @@ Ones and zeros ones_like zeros zeros_like + full + full_like From existing data ------------------ diff --git a/doc/source/reference/routines.array-manipulation.rst b/doc/source/reference/routines.array-manipulation.rst index ca97bb47955c..81af0a315906 100644 --- a/doc/source/reference/routines.array-manipulation.rst +++ b/doc/source/reference/routines.array-manipulation.rst @@ -54,6 +54,8 @@ Changing kind of array asmatrix asfarray asfortranarray + ascontiguousarray + asarray_chkfinite asscalar require diff --git a/doc/source/reference/routines.ma.rst b/doc/source/reference/routines.ma.rst index 5cb38e83f41a..66bcb1f1c10a 100644 --- a/doc/source/reference/routines.ma.rst +++ b/doc/source/reference/routines.ma.rst @@ -65,6 +65,8 @@ Inspecting the array ma.nonzero ma.shape ma.size + ma.is_masked + ma.is_mask ma.MaskedArray.data ma.MaskedArray.mask @@ -141,6 +143,7 @@ Joining arrays ma.column_stack ma.concatenate + ma.append ma.dstack ma.hstack ma.vstack diff --git a/doc/source/reference/routines.maskna.rst b/doc/source/reference/routines.maskna.rst deleted file mode 100644 index 2910acbac8f4..000000000000 --- a/doc/source/reference/routines.maskna.rst +++ /dev/null @@ -1,11 +0,0 @@ -NA-Masked Array Routines -======================== - -.. currentmodule:: numpy - -NA Values ---------- -.. autosummary:: - :toctree: generated/ - - isna diff --git a/doc/source/reference/routines.polynomials.classes.rst b/doc/source/reference/routines.polynomials.classes.rst index 14729f08becb..c40795434be9 100644 --- a/doc/source/reference/routines.polynomials.classes.rst +++ b/doc/source/reference/routines.polynomials.classes.rst @@ -211,7 +211,7 @@ constant are 0, but both can be specified.:: In the first case the lower bound of the integration is set to -1 and the integration constant is 0. In the second the constant of integration is set to 1 as well. Differentiation is simpler since the only option is the -number times the polynomial is differentiated:: +number of times the polynomial is differentiated:: >>> p = P([1, 2, 3]) >>> p.deriv(1) @@ -270,7 +270,7 @@ polynomials up to degree 5 are plotted below. >>> import matplotlib.pyplot as plt >>> from numpy.polynomial import Chebyshev as T >>> x = np.linspace(-1, 1, 100) - >>> for i in range(6): ax = plt.plot(x, T.basis(i)(x), lw=2, label="T_%d"%i) + >>> for i in range(6): ax = plt.plot(x, T.basis(i)(x), lw=2, label="$T_%d$"%i) ... >>> plt.legend(loc="upper left") @@ -284,7 +284,7 @@ The same plots over the range -2 <= `x` <= 2 look very different: >>> import matplotlib.pyplot as plt >>> from numpy.polynomial import Chebyshev as T >>> x = np.linspace(-2, 2, 100) - >>> for i in range(6): ax = plt.plot(x, T.basis(i)(x), lw=2, label="T_%d"%i) + >>> for i in range(6): ax = plt.plot(x, T.basis(i)(x), lw=2, label="$T_%d$"%i) ... >>> plt.legend(loc="lower right") diff --git a/doc/source/release.rst b/doc/source/release.rst index eb366661fa28..8336b091a6a9 100644 --- a/doc/source/release.rst +++ b/doc/source/release.rst @@ -2,7 +2,11 @@ Release Notes ************* +.. 
include:: ../release/1.9.2-notes.rst +.. include:: ../release/1.9.1-notes.rst .. include:: ../release/1.9.0-notes.rst +.. include:: ../release/1.8.2-notes.rst +.. include:: ../release/1.8.1-notes.rst .. include:: ../release/1.8.0-notes.rst .. include:: ../release/1.7.2-notes.rst .. include:: ../release/1.7.1-notes.rst diff --git a/doc/source/user/c-info.python-as-glue.rst b/doc/source/user/c-info.python-as-glue.rst index 985d478e06b6..0560c005ea70 100644 --- a/doc/source/user/c-info.python-as-glue.rst +++ b/doc/source/user/c-info.python-as-glue.rst @@ -249,8 +249,14 @@ necessary to tell f2py that the value of n depends on the input a (so that it won't try to create the variable n until the variable a is created). +After modifying ``add.pyf``, the new python module file can be generated +by compiling both ``add.f95`` and ``add.pyf``:: + + f2py -c add.pyf add.f95 + The new interface has docstring: + >>> import add >>> print add.zadd.__doc__ zadd - Function signature: c = zadd(a,b) diff --git a/doc/source/user/install.rst b/doc/source/user/install.rst index 9d6f61e657db..1da664e08700 100644 --- a/doc/source/user/install.rst +++ b/doc/source/user/install.rst @@ -37,15 +37,16 @@ Most of the major distributions provide packages for NumPy, but these can lag behind the most recent NumPy release. Pre-built binary packages for Ubuntu are available on the `scipy ppa `_. Redhat binaries are -available in the `EPD `_. +available in the `Enthought Canopy +`_. Mac OS X -------- A universal binary installer for NumPy is available from the `download site `_. The `EPD `_ -provides NumPy binaries. +package_id=175103>`_. The `Enthought Canopy +`_ provides NumPy binaries. Building from source ==================== diff --git a/doc/sphinxext b/doc/sphinxext index 447dd0b59c2f..84cc897d266e 160000 --- a/doc/sphinxext +++ b/doc/sphinxext @@ -1 +1 @@ -Subproject commit 447dd0b59c2fe91ca9643701036d3d04919ddc7e +Subproject commit 84cc897d266e0afc28fc5296edf01afb08005472 diff --git a/numpy/add_newdocs.py b/numpy/add_newdocs.py index 86ea4b8b6093..8af64a65f158 100644 --- a/numpy/add_newdocs.py +++ b/numpy/add_newdocs.py @@ -3834,7 +3834,7 @@ def luf(lamdaexpr, *args, **kwargs): add_newdoc('numpy.core.multiarray', 'copyto', """ - copyto(dst, src, casting='same_kind', where=None, preservena=False) + copyto(dst, src, casting='same_kind', where=None) Copies values from one array to another, broadcasting as necessary. @@ -3862,9 +3862,6 @@ def luf(lamdaexpr, *args, **kwargs): A boolean array which is broadcasted to match the dimensions of `dst`, and selects elements to copy from `src` to `dst` wherever it contains the value True. - preservena : bool, optional - If set to True, leaves any NA values in `dst` untouched. This - is similar to the "hard mask" feature in numpy.ma. """) @@ -3879,11 +3876,6 @@ def luf(lamdaexpr, *args, **kwargs): If `values` is not the same size as `a` and `mask` then it will repeat. This gives behavior different from ``a[mask] = values``. - .. note:: The `putmask` functionality is also provided by `copyto`, which - can be significantly faster and in addition is NA-aware - (`preservena` keyword). Replacing `putmask` with - ``np.copyto(a, values, where=mask)`` is recommended. - Parameters ---------- a : array_like @@ -4459,12 +4451,12 @@ def luf(lamdaexpr, *args, **kwargs): tobytesdoc = """ - a.tostring(order='C') + a.{name}(order='C') - Construct a Python string containing the raw data bytes in the array. + Construct Python bytes containing the raw data bytes in the array. 
- Constructs a Python string showing a copy of the raw contents of - data memory. The string can be produced in either 'C' or 'Fortran', + Constructs Python bytes showing a copy of the raw contents of + data memory. The bytes object can be produced in either 'C' or 'Fortran', or 'Any' order (the default is 'C'-order). 'Any' order means C-order unless the F_CONTIGUOUS flag in the array is set, in which case it means 'Fortran' order. @@ -4479,29 +4471,31 @@ def luf(lamdaexpr, *args, **kwargs): Returns ------- - s : str - A Python string exhibiting a copy of `a`'s raw data. + s : bytes + Python bytes exhibiting a copy of `a`'s raw data. Examples -------- >>> x = np.array([[0, 1], [2, 3]]) >>> x.tobytes() - '\\x00\\x00\\x00\\x00\\x01\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x03\\x00\\x00\\x00' + b'\\x00\\x00\\x00\\x00\\x01\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x03\\x00\\x00\\x00' >>> x.tobytes('C') == x.tobytes() True >>> x.tobytes('F') - '\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x01\\x00\\x00\\x00\\x03\\x00\\x00\\x00' + b'\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x01\\x00\\x00\\x00\\x03\\x00\\x00\\x00' """ add_newdoc('numpy.core.multiarray', 'ndarray', - ('tostring', tobytesdoc.format(deprecated= + ('tostring', tobytesdoc.format(name='tostring', + deprecated= 'This function is a compatibility ' 'alias for tobytes. Despite its ' 'name it returns bytes not ' 'strings.'))) add_newdoc('numpy.core.multiarray', 'ndarray', - ('tobytes', tobytesdoc.format(deprecated='.. versionadded:: 1.9.0'))) + ('tobytes', tobytesdoc.format(name='tobytes', + deprecated='.. versionadded:: 1.9.0'))) add_newdoc('numpy.core.multiarray', 'ndarray', ('trace', """ @@ -5519,6 +5513,8 @@ def luf(lamdaexpr, *args, **kwargs): in the result as dimensions with size one. With this option, the result will broadcast correctly against the original `arr`. + .. versionadded:: 1.7.0 + Returns ------- r : ndarray diff --git a/numpy/compat/py3k.py b/numpy/compat/py3k.py index f5ac3f9f8489..4607d9502332 100644 --- a/numpy/compat/py3k.py +++ b/numpy/compat/py3k.py @@ -36,7 +36,7 @@ def asstr(s): return str(s) def isfileobj(f): - return isinstance(f, (io.FileIO, io.BufferedReader)) + return isinstance(f, (io.FileIO, io.BufferedReader, io.BufferedWriter)) def open_latin1(filename, mode='r'): return open(filename, mode=mode, encoding='iso-8859-1') diff --git a/numpy/compat/tests/test_compat.py b/numpy/compat/tests/test_compat.py new file mode 100644 index 000000000000..3df142e042e3 --- /dev/null +++ b/numpy/compat/tests/test_compat.py @@ -0,0 +1,19 @@ +from os.path import join + +from numpy.compat import isfileobj +from numpy.testing import TestCase, assert_ +from numpy.testing.utils import tempdir + + +def test_isfileobj(): + with tempdir(prefix="numpy_test_compat_") as folder: + filename = join(folder, 'a.bin') + + with open(filename, 'wb') as f: + assert_(isfileobj(f)) + + with open(filename, 'ab') as f: + assert_(isfileobj(f)) + + with open(filename, 'rb') as f: + assert_(isfileobj(f)) diff --git a/numpy/core/__init__.py b/numpy/core/__init__.py index 79bc72a8c59c..0b8d5bb17786 100644 --- a/numpy/core/__init__.py +++ b/numpy/core/__init__.py @@ -52,7 +52,11 @@ # The name numpy.core._ufunc_reconstruct must be # available for unpickling to work. def _ufunc_reconstruct(module, name): - mod = __import__(module) + # The `fromlist` kwarg is required to ensure that `mod` points to the + # inner-most module rather than the parent package when module name is + # nested. 
This makes it possible to pickle non-toplevel ufuncs such as + # scipy.special.expit for instance. + mod = __import__(module, fromlist=[name]) return getattr(mod, name) def _ufunc_reduce(func): diff --git a/numpy/core/blasdot/_dotblas.c b/numpy/core/blasdot/_dotblas.c index 48aa39ff87df..0679e38f8162 100644 --- a/numpy/core/blasdot/_dotblas.c +++ b/numpy/core/blasdot/_dotblas.c @@ -9,6 +9,7 @@ #include "numpy/arrayobject.h" #include "npy_config.h" #include "npy_pycompat.h" +#include "common.h" #include "ufunc_override.h" #ifndef CBLAS_HEADER #define CBLAS_HEADER "cblas.h" @@ -529,7 +530,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), PyObject *args, PyObject* kwa l = PyArray_DIM(oap1, PyArray_NDIM(oap1) - 1); if (PyArray_DIM(oap2, 0) != l) { - PyErr_SetString(PyExc_ValueError, "matrices are not aligned"); + not_aligned(oap1, PyArray_NDIM(oap1) - 1, oap2, 0); goto fail; } nd = PyArray_NDIM(ap1) + PyArray_NDIM(ap2) - 2; @@ -579,7 +580,7 @@ dotblas_matrixproduct(PyObject *NPY_UNUSED(dummy), PyObject *args, PyObject* kwa l = PyArray_DIM(ap1, PyArray_NDIM(ap1) - 1); if (PyArray_DIM(ap2, 0) != l) { - PyErr_SetString(PyExc_ValueError, "matrices are not aligned"); + not_aligned(ap1, PyArray_NDIM(ap1) - 1, ap2, 0); goto fail; } nd = PyArray_NDIM(ap1) + PyArray_NDIM(ap2) - 2; @@ -1007,7 +1008,8 @@ dotblas_innerproduct(PyObject *NPY_UNUSED(dummy), PyObject *args) l = PyArray_DIM(ap1, PyArray_NDIM(ap1)-1); if (PyArray_DIM(ap2, PyArray_NDIM(ap2)-1) != l) { - PyErr_SetString(PyExc_ValueError, "matrices are not aligned"); + not_aligned(ap1, PyArray_NDIM(ap1) - 1, + ap2, PyArray_NDIM(ap2) - 1); goto fail; } nd = PyArray_NDIM(ap1)+PyArray_NDIM(ap2)-2; diff --git a/numpy/core/blasdot/apple_sgemv_patch.c b/numpy/core/blasdot/apple_sgemv_patch.c new file mode 100644 index 000000000000..9941a0731338 --- /dev/null +++ b/numpy/core/blasdot/apple_sgemv_patch.c @@ -0,0 +1,216 @@ +#ifdef APPLE_ACCELERATE_SGEMV_PATCH + +/* This is an ugly hack to circumvent a bug in Accelerate's cblas_sgemv. 
+ * + * See: https://github.com/numpy/numpy/issues/4007 + * + */ + +#define NPY_NO_DEPRECATED_API NPY_API_VERSION +#include "Python.h" +#include "numpy/arrayobject.h" + +#include +#include +#include +#include + +/* ----------------------------------------------------------------- */ +/* Original cblas_sgemv */ + +#define VECLIB_FILE "/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/vecLib" + +enum CBLAS_ORDER {CblasRowMajor=101, CblasColMajor=102}; +enum CBLAS_TRANSPOSE {CblasNoTrans=111, CblasTrans=112, CblasConjTrans=113}; +extern void cblas_xerbla(int info, const char *rout, const char *form, ...); + +typedef void cblas_sgemv_t(const enum CBLAS_ORDER order, + const enum CBLAS_TRANSPOSE TransA, const int M, const int N, + const float alpha, const float *A, const int lda, + const float *X, const int incX, + const float beta, float *Y, const int incY); + +typedef void cblas_sgemm_t(const enum CBLAS_ORDER order, + const enum CBLAS_TRANSPOSE TransA, const enum CBLAS_TRANSPOSE TransB, + const int M, const int N, const int K, + const float alpha, const float *A, const int lda, + const float *B, const int ldb, + const float beta, float *C, const int incC); + +typedef void fortran_sgemv_t( const char* trans, const int* m, const int* n, + const float* alpha, const float* A, const int* ldA, + const float* X, const int* incX, + const float* beta, float* Y, const int* incY ); + +static void *veclib = NULL; +static cblas_sgemv_t *accelerate_cblas_sgemv = NULL; +static cblas_sgemm_t *accelerate_cblas_sgemm = NULL; +static fortran_sgemv_t *accelerate_sgemv = NULL; +static int AVX_and_10_9 = 0; + +/* Dynamic check for AVX support + * __builtin_cpu_supports("avx") is available in gcc 4.8, + * but clang and icc do not currently support it. 
*/
+#define cpu_supports_avx()\
+(system("sysctl -n machdep.cpu.features | grep -q AVX") == 0)
+
+/* Check if we are using MacOS X version 10.9 */
+#define using_mavericks()\
+(system("sw_vers -productVersion | grep -q 10\\.9\\.") == 0)
+
+__attribute__((destructor))
+static void unloadlib(void)
+{
+   if (veclib) dlclose(veclib);
+}
+
+__attribute__((constructor))
+static void loadlib()
+/* automatically executed on module import */
+{
+    char errormsg[1024];
+    int AVX, MAVERICKS;
+    memset((void*)errormsg, 0, sizeof(errormsg));
+    /* check if the CPU supports AVX */
+    AVX = cpu_supports_avx();
+    /* check if the OS is MacOS X Mavericks */
+    MAVERICKS = using_mavericks();
+    /* we need the workaround when the CPU supports
+     * AVX and the OS version is Mavericks */
+    AVX_and_10_9 = AVX && MAVERICKS;
+    /* load vecLib */
+    veclib = dlopen(VECLIB_FILE, RTLD_LOCAL | RTLD_FIRST);
+    if (!veclib) {
+        veclib = NULL;
+        sprintf(errormsg,"Failed to open vecLib from location '%s'.", VECLIB_FILE);
+        Py_FatalError(errormsg); /* calls abort() and dumps core */
+    }
+    /* resolve Fortran SGEMV from Accelerate */
+    accelerate_sgemv = (fortran_sgemv_t*) dlsym(veclib, "sgemv_");
+    if (!accelerate_sgemv) {
+        unloadlib();
+        sprintf(errormsg,"Failed to resolve symbol 'sgemv_'.");
+        Py_FatalError(errormsg);
+    }
+    /* resolve cblas_sgemv from Accelerate */
+    accelerate_cblas_sgemv = (cblas_sgemv_t*) dlsym(veclib, "cblas_sgemv");
+    if (!accelerate_cblas_sgemv) {
+        unloadlib();
+        sprintf(errormsg,"Failed to resolve symbol 'cblas_sgemv'.");
+        Py_FatalError(errormsg);
+    }
+    /* resolve cblas_sgemm from Accelerate */
+    accelerate_cblas_sgemm = (cblas_sgemm_t*) dlsym(veclib, "cblas_sgemm");
+    if (!accelerate_cblas_sgemm) {
+        unloadlib();
+        sprintf(errormsg,"Failed to resolve symbol 'cblas_sgemm'.");
+        Py_FatalError(errormsg);
+    }
+}
+
+/* ----------------------------------------------------------------- */
+/* Fortran SGEMV override */
+
+void sgemv_( const char* trans, const int* m, const int* n,
+             const float* alpha, const float* A, const int* ldA,
+             const float* X, const int* incX,
+             const float* beta, float* Y, const int* incY )
+{
+    /* It is safe to use the original SGEMV if we are not using AVX on Mavericks
+     * or the input arrays A, X and Y are all aligned on 32 byte boundaries. */
+    #define BADARRAY(x) (((npy_intp)(void*)x) % 32)
+    const int use_sgemm = AVX_and_10_9 && (BADARRAY(A) || BADARRAY(X) || BADARRAY(Y));
+    if (!use_sgemm) {
+        accelerate_sgemv(trans,m,n,alpha,A,ldA,X,incX,beta,Y,incY);
+        return;
+    }
+
+    /* Arrays are misaligned, the CPU supports AVX, and we are running
+     * Mavericks.
+     *
+     * Emulation of SGEMV with SGEMM:
+     *
+     * SGEMV allows vectors to be strided. SGEMM requires all arrays to be
+     * contiguous along the leading dimension. To emulate striding in SGEMV
+     * with the leading dimension arguments in SGEMM we compute
+     *
+     *    Y = alpha * op(A) @ X + beta * Y
+     *
+     * as
+     *
+     *    Y.T = alpha * X.T @ op(A).T + beta * Y.T
+     *
+     * Because Fortran uses column major order and X.T and Y.T are row vectors,
+     * the leading dimensions of X.T and Y.T in SGEMM become equal to the
+     * strides of the column vectors X and Y in SGEMV.
*/
+
+    switch (*trans) {
+        case 'T':
+        case 't':
+        case 'C':
+        case 'c':
+            accelerate_cblas_sgemm( CblasColMajor, CblasNoTrans, CblasNoTrans,
+                1, *n, *m, *alpha, X, *incX, A, *ldA, *beta, Y, *incY );
+            break;
+        case 'N':
+        case 'n':
+            accelerate_cblas_sgemm( CblasColMajor, CblasNoTrans, CblasTrans,
+                1, *m, *n, *alpha, X, *incX, A, *ldA, *beta, Y, *incY );
+            break;
+        default:
+            cblas_xerbla(1, "SGEMV", "Illegal transpose setting: %c\n", *trans);
+    }
+}
+
+/* ----------------------------------------------------------------- */
+/* Override for an alias symbol for sgemv_ in Accelerate */
+
+void sgemv (char *trans,
+            const int *m, const int *n,
+            const float *alpha,
+            const float *A, const int *lda,
+            const float *B, const int *incB,
+            const float *beta,
+            float *C, const int *incC)
+{
+    sgemv_(trans,m,n,alpha,A,lda,B,incB,beta,C,incC);
+}
+
+/* ----------------------------------------------------------------- */
+/* cblas_sgemv override, based on Netlib CBLAS code */
+
+void cblas_sgemv(const enum CBLAS_ORDER order,
+                 const enum CBLAS_TRANSPOSE TransA, const int M, const int N,
+                 const float alpha, const float *A, const int lda,
+                 const float *X, const int incX, const float beta,
+                 float *Y, const int incY)
+{
+    char TA;
+    if (order == CblasColMajor)
+    {
+        if (TransA == CblasNoTrans) TA = 'N';
+        else if (TransA == CblasTrans) TA = 'T';
+        else if (TransA == CblasConjTrans) TA = 'C';
+        else
+        {
+            cblas_xerbla(2, "cblas_sgemv","Illegal TransA setting, %d\n", TransA);
+            return; /* avoid calling sgemv_ with TA uninitialized */
+        }
+        sgemv_(&TA, &M, &N, &alpha, A, &lda, X, &incX, &beta, Y, &incY);
+    }
+    else if (order == CblasRowMajor)
+    {
+        if (TransA == CblasNoTrans) TA = 'T';
+        else if (TransA == CblasTrans) TA = 'N';
+        else if (TransA == CblasConjTrans) TA = 'N';
+        else
+        {
+            cblas_xerbla(2, "cblas_sgemv", "Illegal TransA setting, %d\n", TransA);
+            return;
+        }
+        sgemv_(&TA, &N, &M, &alpha, A, &lda, X, &incX, &beta, Y, &incY);
+    }
+    else
+        cblas_xerbla(1, "cblas_sgemv", "Illegal Order setting, %d\n", order);
+}
+
+#endif
diff --git a/numpy/core/blasdot/python_xerbla.c b/numpy/core/blasdot/python_xerbla.c
new file mode 100644
index 000000000000..bdf0b9058f7e
--- /dev/null
+++ b/numpy/core/blasdot/python_xerbla.c
@@ -0,0 +1,51 @@
+#include "Python.h"
+
+/*
+ * From f2c.h, this should be safe unless fortran is set to use 64
+ * bit integers. We don't seem to have any good way to detect that.
+ */
+typedef int integer;
+
+/*
+  From the original manpage:
+  --------------------------
+  XERBLA is an error handler for the LAPACK routines.
+  It is called by an LAPACK routine if an input parameter has an invalid value.
+  A message is printed and execution stops.
+
+  Instead of printing a message and stopping the execution, a
+  ValueError is raised with the message.
+
+  Parameters:
+  -----------
+  srname: Subroutine name to use in error message, maximum six characters.
+          Spaces at the end are skipped.
+  info: Number of the invalid parameter.
+*/
+
+int xerbla_(char *srname, integer *info)
+{
+    static const char format[] = "On entry to %.*s" \
+        " parameter number %d had an illegal value";
+    char buf[sizeof(format) + 6 + 4];   /* 6 for name, 4 for param. num.
*/ + + int len = 0; /* length of subroutine name*/ +#ifdef WITH_THREAD + PyGILState_STATE save; +#endif + + while( len<6 && srname[len]!='\0' ) + len++; + while( len && srname[len-1]==' ' ) + len--; +#ifdef WITH_THREAD + save = PyGILState_Ensure(); +#endif + PyOS_snprintf(buf, sizeof(buf), format, len, srname, *info); + PyErr_SetString(PyExc_ValueError, buf); +#ifdef WITH_THREAD + PyGILState_Release(save); +#endif + + return 0; +} diff --git a/numpy/core/code_generators/cversions.txt b/numpy/core/code_generators/cversions.txt index d621152247d4..acfced81262a 100644 --- a/numpy/core/code_generators/cversions.txt +++ b/numpy/core/code_generators/cversions.txt @@ -28,4 +28,4 @@ # Version 9 (NumPy 1.9) Added function annotations. # The interface has not changed, but the hash is different due to # the annotations, so keep the previous version number. -0x00000009 = 49b27dc2dc7206a775a7376fdbc3b80c +0x00000009 = 982c4ebb6e7e4c194bf46b1535b4ef1b diff --git a/numpy/core/code_generators/genapi.py b/numpy/core/code_generators/genapi.py index 5ab60a37cade..84bd042f53ae 100644 --- a/numpy/core/code_generators/genapi.py +++ b/numpy/core/code_generators/genapi.py @@ -473,9 +473,9 @@ def fullapi_hash(api_dicts): of the list of items in the API (as a string).""" a = [] for d in api_dicts: - for name, index in order_dict(d): + for name, data in order_dict(d): a.extend(name) - a.extend(str(index)) + a.extend(','.join(map(str, data))) return md5new(''.join(a).encode('ascii')).hexdigest() diff --git a/numpy/core/code_generators/generate_numpy_api.py b/numpy/core/code_generators/generate_numpy_api.py index a590cfb48d66..415cbf7fcd00 100644 --- a/numpy/core/code_generators/generate_numpy_api.py +++ b/numpy/core/code_generators/generate_numpy_api.py @@ -8,8 +8,9 @@ import numpy_api +# use annotated api when running under cpychecker h_template = r""" -#ifdef _MULTIARRAYMODULE +#if defined(_MULTIARRAYMODULE) || defined(WITH_CPYCHECKER_STEALS_REFERENCE_TO_ARG_ATTRIBUTE) typedef struct { PyObject_HEAD diff --git a/numpy/core/fromnumeric.py b/numpy/core/fromnumeric.py index 49fd57e29c34..72d59fd0caa2 100644 --- a/numpy/core/fromnumeric.py +++ b/numpy/core/fromnumeric.py @@ -679,8 +679,16 @@ def argpartition(a, kth, axis=-1, kind='introselect', order=None): >>> x[np.argpartition(x, (1, 3))] array([1, 2, 3, 4]) + >>> x = [3, 4, 2, 1] + >>> np.array(x)[np.argpartition(x, 3)] + array([2, 1, 3, 4]) + """ - return a.argpartition(kth, axis, kind=kind, order=order) + try: + argpartition = a.argpartition + except AttributeError: + return _wrapit(a, 'argpartition',kth, axis, kind, order) + return argpartition(kth, axis, kind=kind, order=order) def sort(a, axis=-1, kind='quicksort', order=None): diff --git a/numpy/core/function_base.py b/numpy/core/function_base.py index 400e75eb5b52..7e52276ea01e 100644 --- a/numpy/core/function_base.py +++ b/numpy/core/function_base.py @@ -32,17 +32,21 @@ def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None): retstep : bool, optional If True, return (`samples`, `step`), where `step` is the spacing between samples. - dtype : dtype + dtype : dtype, optional The type of the output array. If `dtype` is not given, infer the data type from the other input arguments. + .. versionadded:: 1.9.0 + Returns ------- samples : ndarray There are `num` equally spaced samples in the closed interval ``[start, stop]`` or the half-open interval ``[start, stop)`` (depending on whether `endpoint` is True or False). 
- step : float (only if `retstep` is True) + step : float + Only returned if `retstep` is True + Size of spacing between samples. diff --git a/numpy/core/include/numpy/npy_3kcompat.h b/numpy/core/include/numpy/npy_3kcompat.h index fec95779a1df..8a9109c5c4e8 100644 --- a/numpy/core/include/numpy/npy_3kcompat.h +++ b/numpy/core/include/numpy/npy_3kcompat.h @@ -298,7 +298,7 @@ npy_PyFile_DupClose(PyObject *file, FILE* handle) #else /* DEPRECATED, DO NOT USE */ -#define npy_PyFile_DupClose(f, h, p) npy_PyFile_DupClose2((f), (h), (p)) +#define npy_PyFile_DupClose(f, h) npy_PyFile_DupClose2((f), (h), 0) /* use these */ static NPY_INLINE FILE * diff --git a/numpy/core/include/numpy/npy_common.h b/numpy/core/include/numpy/npy_common.h index 5cba8c9d2a3c..92b03d20cae1 100644 --- a/numpy/core/include/numpy/npy_common.h +++ b/numpy/core/include/numpy/npy_common.h @@ -133,6 +133,7 @@ extern long long __cdecl _ftelli64(FILE *); #else #define npy_ftell ftell #endif + #include #define npy_lseek lseek #define npy_off_t off_t @@ -264,18 +265,9 @@ typedef unsigned PY_LONG_LONG npy_ulonglong; # ifdef _MSC_VER # define NPY_LONGLONG_FMT "I64d" # define NPY_ULONGLONG_FMT "I64u" -# elif defined(__APPLE__) || defined(__FreeBSD__) -/* "%Ld" only parses 4 bytes -- "L" is floating modifier on MacOS X/BSD */ +# else # define NPY_LONGLONG_FMT "lld" # define NPY_ULONGLONG_FMT "llu" -/* - another possible variant -- *quad_t works on *BSD, but is deprecated: - #define LONGLONG_FMT "qd" - #define ULONGLONG_FMT "qu" -*/ -# else -# define NPY_LONGLONG_FMT "Ld" -# define NPY_ULONGLONG_FMT "Lu" # endif # ifdef _MSC_VER # define NPY_LONGLONG_SUFFIX(x) (x##i64) diff --git a/numpy/core/include/numpy/npy_math.h b/numpy/core/include/numpy/npy_math.h index b7920460d88a..461651b08170 100644 --- a/numpy/core/include/numpy/npy_math.h +++ b/numpy/core/include/numpy/npy_math.h @@ -164,7 +164,7 @@ double npy_spacing(double x); #ifndef NPY_HAVE_DECL_ISNAN #define npy_isnan(x) ((x) != (x)) #else - #ifdef _MSC_VER + #if defined(_MSC_VER) && (_MSC_VER < 1900) #define npy_isnan(x) _isnan((x)) #else #define npy_isnan(x) isnan(x) @@ -195,7 +195,7 @@ double npy_spacing(double x); #ifndef NPY_HAVE_DECL_ISINF #define npy_isinf(x) (!npy_isfinite(x) && !npy_isnan(x)) #else - #ifdef _MSC_VER + #if defined(_MSC_VER) && (_MSC_VER < 1900) #define npy_isinf(x) (!_finite((x)) && !_isnan((x))) #else #define npy_isinf(x) isinf((x)) diff --git a/numpy/core/include/numpy/ufuncobject.h b/numpy/core/include/numpy/ufuncobject.h index 38e3dcf0f4c9..a24a0d83774f 100644 --- a/numpy/core/include/numpy/ufuncobject.h +++ b/numpy/core/include/numpy/ufuncobject.h @@ -152,13 +152,13 @@ typedef struct _tagPyUFuncObject { int check_return; /* The name of the ufunc */ - char *name; + const char *name; /* Array of type numbers, of size ('nargs' * 'ntypes') */ char *types; /* Documentation string */ - char *doc; + const char *doc; void *ptr; PyObject *obj; diff --git a/numpy/core/numeric.py b/numpy/core/numeric.py index 57784a51f971..5d7407ce0de9 100644 --- a/numpy/core/numeric.py +++ b/numpy/core/numeric.py @@ -6,9 +6,11 @@ import collections from . import multiarray from . import umath -from .umath import * +from .umath import (invert, sin, UFUNC_BUFSIZE_DEFAULT, ERR_IGNORE, + ERR_WARN, ERR_RAISE, ERR_CALL, ERR_PRINT, ERR_LOG, + ERR_DEFAULT, PINF, NAN) from . 
import numerictypes -from .numerictypes import * +from .numerictypes import longlong, intc, int_, float_, complex_, bool_ if sys.version_info[0] >= 3: import pickle @@ -358,9 +360,6 @@ def extend_all(module): if a not in adict: __all__.append(a) -extend_all(umath) -extend_all(numerictypes) - newaxis = None @@ -2834,6 +2833,10 @@ def _setdef(): False_ = bool_(False) True_ = bool_(True) +from .umath import * +from .numerictypes import * from . import fromnumeric from .fromnumeric import * extend_all(fromnumeric) +extend_all(umath) +extend_all(numerictypes) diff --git a/numpy/core/records.py b/numpy/core/records.py index d0f82a25c6d5..42c832b88fd7 100644 --- a/numpy/core/records.py +++ b/numpy/core/records.py @@ -71,7 +71,6 @@ # are equally allowed numfmt = nt.typeDict -_typestr = nt._typestr def find_duplicate(list): """Find duplication in a list, return a list of duplicated elements""" @@ -527,15 +526,12 @@ def fromarrays(arrayList, dtype=None, shape=None, formats=None, if formats is None and dtype is None: # go through each object in the list to see if it is an ndarray # and determine the formats. - formats = '' + formats = [] for obj in arrayList: if not isinstance(obj, ndarray): raise ValueError("item in the array list must be an ndarray.") - formats += _typestr[obj.dtype.type] - if issubclass(obj.dtype.type, nt.flexible): - formats += repr(obj.itemsize) - formats += ',' - formats = formats[:-1] + formats.append(obj.dtype.str) + formats = ','.join(formats) if dtype is not None: descr = sb.dtype(dtype) diff --git a/numpy/core/setup.py b/numpy/core/setup.py index 5da04241317e..44403e414c85 100644 --- a/numpy/core/setup.py +++ b/numpy/core/setup.py @@ -773,6 +773,7 @@ def generate_multiarray_templated_sources(ext, build_dir): join('src', 'multiarray', 'shape.h'), join('src', 'multiarray', 'ucsnarrow.h'), join('src', 'multiarray', 'usertypes.h'), + join('src', 'private', 'npy_config.h'), join('src', 'private', 'lowlevel_strided_loops.h'), join('include', 'numpy', 'arrayobject.h'), join('include', 'numpy', '_neighborhood_iterator_imp.h'), @@ -954,13 +955,15 @@ def get_dotblas_sources(ext, build_dir): if blas_info: if ('NO_ATLAS_INFO', 1) in blas_info.get('define_macros', []): return None # dotblas needs ATLAS, Fortran compiled blas will not be sufficient. - return ext.depends[:1] + return ext.depends[:3] return None # no extension module will be built config.add_extension('_dotblas', sources = [get_dotblas_sources], depends = [join('blasdot', '_dotblas.c'), - join('blasdot', 'cblas.h'), + join('blasdot', 'python_xerbla.c'), + join('blasdot', 'apple_sgemv_patch.c'), + join('blasdot', 'cblas.h'), ], include_dirs = ['blasdot'], extra_info = blas_info diff --git a/numpy/core/setup_common.py b/numpy/core/setup_common.py index be5673a47873..81eec041b403 100644 --- a/numpy/core/setup_common.py +++ b/numpy/core/setup_common.py @@ -171,6 +171,15 @@ def check_long_double_representation(cmd): cmd._check_compiler() body = LONG_DOUBLE_REPRESENTATION_SRC % {'type': 'long double'} + # Disable whole program optimization (the default on vs2015, with python 3.5+) + # which generates intermediary object files and prevents checking the + # float representation. 
+ if sys.platform == "win32": + try: + cmd.compiler.compile_options.remove("/GL") + except ValueError: + pass + # We need to use _compile because we need the object filename src, object = cmd._compile(body, None, None, 'c') try: diff --git a/numpy/core/src/multiarray/arraytypes.c.src b/numpy/core/src/multiarray/arraytypes.c.src index 92752be92e90..3ce25db17d8e 100644 --- a/numpy/core/src/multiarray/arraytypes.c.src +++ b/numpy/core/src/multiarray/arraytypes.c.src @@ -2991,7 +2991,7 @@ static int memcpy(mp, ip, elsize); *max_ind = 0; for (i = 1; i < n; i++) { - ip += elsize; + ip += elsize / sizeof(@type@); if (@fname@_compare(ip, mp, aip) > 0) { memcpy(mp, ip, elsize); *max_ind = i; @@ -3048,7 +3048,7 @@ static int memcpy(mp, ip, elsize); *min_ind = 0; for(i=1; i 0) { memcpy(mp, ip, elsize); *min_ind=i; @@ -3467,8 +3467,6 @@ static void npy_intp i; @type@ max_val, min_val; - min_val = *min; - max_val = *max; if (max != NULL) { max_val = *max; } @@ -3696,6 +3694,7 @@ static int * #align = char, char, npy_ucs4# * #NAME = Void, String, Unicode# * #endian = |, |, =# + * #flags = 0, 0, NPY_NEEDS_INIT# */ static PyArray_ArrFuncs _Py@NAME@_ArrFuncs = { { @@ -3775,8 +3774,8 @@ static PyArray_Descr @from@_Descr = { NPY_@from@LTR, /* byteorder */ '@endian@', - /* flags */ - 0, + /* flags, unicode needs init as py3.3 does not like printing garbage */ + @flags@, /* type_num */ NPY_@from@, /* elsize */ @@ -3922,7 +3921,8 @@ NPY_NO_EXPORT PyArray_Descr @from@_Descr = { /* elsize */ @num@ * sizeof(@fromtype@), /* alignment */ - @num@ * _ALIGN(@fromtype@), + @num@ * _ALIGN(@fromtype@) > NPY_MAX_COPY_ALIGNMENT ? + NPY_MAX_COPY_ALIGNMENT : @num@ * _ALIGN(@fromtype@), /* subarray */ NULL, /* fields */ @@ -4268,7 +4268,8 @@ set_typeinfo(PyObject *dict) #endif NPY_@name@, NPY_BITSOF_@name@, - @num@ * _ALIGN(@type@), + @num@ * _ALIGN(@type@) > NPY_MAX_COPY_ALIGNMENT ? 
+ NPY_MAX_COPY_ALIGNMENT : @num@ * _ALIGN(@type@), (PyObject *) &Py@Name@ArrType_Type)); Py_DECREF(s); diff --git a/numpy/core/src/multiarray/buffer.c b/numpy/core/src/multiarray/buffer.c index ea1a885ed735..d3dd85b826d1 100644 --- a/numpy/core/src/multiarray/buffer.c +++ b/numpy/core/src/multiarray/buffer.c @@ -269,7 +269,8 @@ _buffer_format_string(PyArray_Descr *descr, _tmp_string_t *str, #else tmp = name; #endif - if (tmp == NULL || PyBytes_AsStringAndSize(tmp, &p, &len) < 0) { + if (tmp == NULL || PyBytes_AsStringAndSize(tmp, &p, &len) == -1) { + PyErr_Clear(); PyErr_SetString(PyExc_ValueError, "invalid field name"); return -1; } diff --git a/numpy/core/src/multiarray/calculation.c b/numpy/core/src/multiarray/calculation.c index 50938be4cb6d..5563a2515c8d 100644 --- a/numpy/core/src/multiarray/calculation.c +++ b/numpy/core/src/multiarray/calculation.c @@ -1182,7 +1182,7 @@ PyArray_Clip(PyArrayObject *self, PyObject *min, PyObject *max, PyArrayObject *o NPY_NO_EXPORT PyObject * PyArray_Conjugate(PyArrayObject *self, PyArrayObject *out) { - if (PyArray_ISCOMPLEX(self)) { + if (PyArray_ISCOMPLEX(self) || PyArray_ISOBJECT(self)) { if (out == NULL) { return PyArray_GenericUnaryFunction(self, n_ops.conjugate); diff --git a/numpy/core/src/multiarray/common.c b/numpy/core/src/multiarray/common.c index 2b3d3c3d267f..a8490f0e8164 100644 --- a/numpy/core/src/multiarray/common.c +++ b/numpy/core/src/multiarray/common.c @@ -518,12 +518,20 @@ PyArray_DTypeFromObjectHelper(PyObject *obj, int maxdims, return 0; } + /* + * fails if convertible to a list but no len is defined, which some + * libraries require to get object arrays + */ + size = PySequence_Size(obj); + if (size < 0) { + goto fail; + } + /* Recursive case, first check the sequence contains only one type */ seq = PySequence_Fast(obj, "Could not convert object to sequence"); if (seq == NULL) { goto fail; } - size = PySequence_Fast_GET_SIZE(seq); objects = PySequence_Fast_ITEMS(seq); common_type = size > 0 ? Py_TYPE(objects[0]) : NULL; for (i = 1; i < size; ++i) { @@ -676,7 +684,16 @@ _IsAligned(PyArrayObject *ap) /* alignment 1 types should have an efficient alignment for copy loops */ if (PyArray_ISFLEXIBLE(ap) || PyArray_ISSTRING(ap)) { - alignment = 16; + npy_intp itemsize = PyArray_ITEMSIZE(ap); + /* power of two sizes may be loaded in larger moves */ + if (((itemsize & (itemsize - 1)) == 0)) { + alignment = itemsize > NPY_MAX_COPY_ALIGNMENT ? + NPY_MAX_COPY_ALIGNMENT : itemsize; + } + else { + /* if not power of two it will be accessed bytewise */ + alignment = 1; + } } if (alignment == 1) { @@ -779,64 +796,3 @@ offset_bounds_from_strides(const int itemsize, const int nd, *lower_offset = lower; *upper_offset = upper; } - - -/** - * Convert an array shape to a string such as "(1, 2)". - * - * @param Dimensionality of the shape - * @param npy_intp pointer to shape array - * @param String to append after the shape `(1, 2)%s`. - * - * @return Python unicode string - */ -NPY_NO_EXPORT PyObject * -convert_shape_to_string(npy_intp n, npy_intp *vals, char *ending) -{ - npy_intp i; - PyObject *ret, *tmp; - - /* - * Negative dimension indicates "newaxis", which can - * be discarded for printing if it's a leading dimension. - * Find the first non-"newaxis" dimension.
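For reference while reviewing the move of this helper, a rough Python rendering of its behaviour (an illustrative sketch, not part of the patch)::

    def shape_to_string(vals, ending=""):
        # leading negative entries stand for "newaxis" and are skipped;
        # later ones print as ",newaxis"; a lone dimension keeps the
        # trailing comma, matching tuple repr such as "(5,)"
        i = 0
        while i < len(vals) and vals[i] < 0:
            i += 1
        if i == len(vals):
            return "()" + ending
        out = "(%d" % vals[i]
        for v in vals[i + 1:]:
            out += ",newaxis" if v < 0 else ",%d" % v
        close = ",)" if len(vals) == 1 else ")"
        return out + close + ending

    print(shape_to_string([3, 2]))  # (3,2)
    print(shape_to_string([5]))     # (5,)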
- */ - for (i = 0; i < n && vals[i] < 0; i++); - - if (i == n) { - return PyUString_FromFormat("()%s", ending); - } - else { - ret = PyUString_FromFormat("(%" NPY_INTP_FMT, vals[i++]); - if (ret == NULL) { - return NULL; - } - } - - for (; i < n; ++i) { - if (vals[i] < 0) { - tmp = PyUString_FromString(",newaxis"); - } - else { - tmp = PyUString_FromFormat(",%" NPY_INTP_FMT, vals[i]); - } - if (tmp == NULL) { - Py_DECREF(ret); - return NULL; - } - - PyUString_ConcatAndDel(&ret, tmp); - if (ret == NULL) { - return NULL; - } - } - - if (i == 1) { - tmp = PyUString_FromFormat(",)%s", ending); - } - else { - tmp = PyUString_FromFormat(")%s", ending); - } - PyUString_ConcatAndDel(&ret, tmp); - return ret; -} diff --git a/numpy/core/src/multiarray/common.h b/numpy/core/src/multiarray/common.h index 6b49d6b4cf5c..2de31e4674ae 100644 --- a/numpy/core/src/multiarray/common.h +++ b/numpy/core/src/multiarray/common.h @@ -3,6 +3,7 @@ #include #include #include +#include #define error_converting(x) (((x) == -1) && PyErr_Occurred()) @@ -69,9 +70,6 @@ offset_bounds_from_strides(const int itemsize, const int nd, const npy_intp *dims, const npy_intp *strides, npy_intp *lower_offset, npy_intp *upper_offset); -NPY_NO_EXPORT PyObject * -convert_shape_to_string(npy_intp n, npy_intp *vals, char *ending); - /* * Returns -1 and sets an exception if *index is an invalid index for @@ -208,6 +206,123 @@ _is_basic_python_type(PyObject * obj) return 0; } + +/** + * Convert an array shape to a string such as "(1, 2)". + * + * @param Dimensionality of the shape + * @param npy_intp pointer to shape array + * @param String to append after the shape `(1, 2)%s`. + * + * @return Python unicode string + */ +static NPY_INLINE PyObject * +convert_shape_to_string(npy_intp n, npy_intp *vals, char *ending) +{ + npy_intp i; + PyObject *ret, *tmp; + + /* + * Negative dimension indicates "newaxis", which can + * be discarded for printing if it's a leading dimension. + * Find the first non-"newaxis" dimension. + */ + for (i = 0; i < n && vals[i] < 0; i++); + + if (i == n) { + return PyUString_FromFormat("()%s", ending); + } + else { + ret = PyUString_FromFormat("(%" NPY_INTP_FMT, vals[i++]); + if (ret == NULL) { + return NULL; + } + } + + for (; i < n; ++i) { + if (vals[i] < 0) { + tmp = PyUString_FromString(",newaxis"); + } + else { + tmp = PyUString_FromFormat(",%" NPY_INTP_FMT, vals[i]); + } + if (tmp == NULL) { + Py_DECREF(ret); + return NULL; + } + + PyUString_ConcatAndDel(&ret, tmp); + if (ret == NULL) { + return NULL; + } + } + + if (i == 1) { + tmp = PyUString_FromFormat(",)%s", ending); + } + else { + tmp = PyUString_FromFormat(")%s", ending); + } + PyUString_ConcatAndDel(&ret, tmp); + return ret; +} + + +/* + * Sets ValueError with "matrices not aligned" message for np.dot and friends + * when a.shape[i] should match b.shape[j], but doesn't. 
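The payoff is a far more specific exception from np.dot and friends; a quick demonstration with arbitrarily chosen shapes::

    import numpy as np

    a = np.ones((3, 2))
    b = np.ones((4, 4))
    try:
        np.dot(a, b)
    except ValueError as e:
        # the helper formats both operands, e.g.
        # "shapes (3,2) and (4,4) not aligned: 2 (dim 1) != 4 (dim 0)"
        print(e)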
+ */ +static NPY_INLINE void +not_aligned(PyArrayObject *a, int i, PyArrayObject *b, int j) +{ + PyObject *errmsg = NULL, *format = NULL, *fmt_args = NULL, + *i_obj = NULL, *j_obj = NULL, + *shape1 = NULL, *shape2 = NULL, + *shape1_i = NULL, *shape2_j = NULL; + + format = PyUString_FromString("shapes %s and %s not aligned:" + " %d (dim %d) != %d (dim %d)"); + + shape1 = convert_shape_to_string(PyArray_NDIM(a), PyArray_DIMS(a), ""); + shape2 = convert_shape_to_string(PyArray_NDIM(b), PyArray_DIMS(b), ""); + + i_obj = PyLong_FromLong(i); + j_obj = PyLong_FromLong(j); + + shape1_i = PyLong_FromSsize_t(PyArray_DIM(a, i)); + shape2_j = PyLong_FromSsize_t(PyArray_DIM(b, j)); + + if (!format || !shape1 || !shape2 || !i_obj || !j_obj || + !shape1_i || !shape2_j) { + goto end; + } + + fmt_args = PyTuple_Pack(6, shape1, shape2, + shape1_i, i_obj, shape2_j, j_obj); + if (fmt_args == NULL) { + goto end; + } + + errmsg = PyUString_Format(format, fmt_args); + if (errmsg != NULL) { + PyErr_SetObject(PyExc_ValueError, errmsg); + } + else { + PyErr_SetString(PyExc_ValueError, "shapes are not aligned"); + } + +end: + Py_XDECREF(errmsg); + Py_XDECREF(fmt_args); + Py_XDECREF(format); + Py_XDECREF(i_obj); + Py_XDECREF(j_obj); + Py_XDECREF(shape1); + Py_XDECREF(shape2); + Py_XDECREF(shape1_i); + Py_XDECREF(shape2_j); +} + #include "ucsnarrow.h" #endif diff --git a/numpy/core/src/multiarray/conversion_utils.c b/numpy/core/src/multiarray/conversion_utils.c index b84dff864f99..d32fcabd2575 100644 --- a/numpy/core/src/multiarray/conversion_utils.c +++ b/numpy/core/src/multiarray/conversion_utils.c @@ -16,6 +16,11 @@ #include "conversion_utils.h" +static int +PyArray_PyIntAsInt_ErrMsg(PyObject *o, const char * msg) NPY_GCC_NONNULL(2); +static npy_intp +PyArray_PyIntAsIntp_ErrMsg(PyObject *o, const char * msg) NPY_GCC_NONNULL(2); + /**************************************************************** * Useful function for conversion when used with PyArg_ParseTuple ****************************************************************/ @@ -215,8 +220,9 @@ PyArray_AxisConverter(PyObject *obj, int *axis) *axis = NPY_MAXDIMS; } else { - *axis = PyArray_PyIntAsInt(obj); - if (PyErr_Occurred()) { + *axis = PyArray_PyIntAsInt_ErrMsg(obj, + "an integer is required for the axis"); + if (error_converting(*axis)) { return NPY_FAIL; } } @@ -251,7 +257,8 @@ PyArray_ConvertMultiAxis(PyObject *axis_in, int ndim, npy_bool *out_axis_flags) } for (i = 0; i < naxes; ++i) { PyObject *tmp = PyTuple_GET_ITEM(axis_in, i); - int axis = PyArray_PyIntAsInt(tmp); + int axis = PyArray_PyIntAsInt_ErrMsg(tmp, + "integers are required for the axis tuple elements"); int axis_orig = axis; if (error_converting(axis)) { return NPY_FAIL; @@ -281,7 +288,8 @@ PyArray_ConvertMultiAxis(PyObject *axis_in, int ndim, npy_bool *out_axis_flags) memset(out_axis_flags, 0, ndim); - axis = PyArray_PyIntAsInt(axis_in); + axis = PyArray_PyIntAsInt_ErrMsg(axis_in, + "an integer is required for the axis"); axis_orig = axis; if (error_converting(axis)) { @@ -736,13 +744,12 @@ PyArray_CastingConverter(PyObject *obj, NPY_CASTING *casting) * Other conversion functions *****************************/ -/*NUMPY_API*/ -NPY_NO_EXPORT int -PyArray_PyIntAsInt(PyObject *o) +static int +PyArray_PyIntAsInt_ErrMsg(PyObject *o, const char * msg) { npy_intp long_value; /* This assumes that NPY_SIZEOF_INTP >= NPY_SIZEOF_INT */ - long_value = PyArray_PyIntAsIntp(o); + long_value = PyArray_PyIntAsIntp_ErrMsg(o, msg); #if (NPY_SIZEOF_INTP > NPY_SIZEOF_INT) if ((long_value < INT_MIN) || (long_value > 
INT_MAX)) { @@ -754,8 +761,14 @@ PyArray_PyIntAsInt(PyObject *o) } /*NUMPY_API*/ -NPY_NO_EXPORT npy_intp -PyArray_PyIntAsIntp(PyObject *o) +NPY_NO_EXPORT int +PyArray_PyIntAsInt(PyObject *o) +{ + return PyArray_PyIntAsInt_ErrMsg(o, "an integer is required"); +} + +static npy_intp +PyArray_PyIntAsIntp_ErrMsg(PyObject *o, const char * msg) { #if (NPY_SIZEOF_LONG < NPY_SIZEOF_INTP) long long long_value = -1; @@ -763,7 +776,6 @@ PyArray_PyIntAsIntp(PyObject *o) long long_value = -1; #endif PyObject *obj, *err; - static char *msg = "an integer is required"; if (!o) { PyErr_SetString(PyExc_TypeError, msg); @@ -909,6 +921,13 @@ PyArray_PyIntAsIntp(PyObject *o) return long_value; } +/*NUMPY_API*/ +NPY_NO_EXPORT npy_intp +PyArray_PyIntAsIntp(PyObject *o) +{ + return PyArray_PyIntAsIntp_ErrMsg(o, "an integer is required"); +} + /* * PyArray_IntpFromIndexSequence diff --git a/numpy/core/src/multiarray/convert_datatype.c b/numpy/core/src/multiarray/convert_datatype.c index 1db3bfe8575b..35503d1e2e2a 100644 --- a/numpy/core/src/multiarray/convert_datatype.c +++ b/numpy/core/src/multiarray/convert_datatype.c @@ -634,6 +634,52 @@ static npy_bool PyArray_CanCastTypeTo_impl(PyArray_Descr *from, PyArray_Descr *to, NPY_CASTING casting); +/* + * Compare two field dictionaries for castability. + * + * Return 1 if 'field1' can be cast to 'field2' according to the rule + * 'casting', 0 if not. + * + * Castability of field dictionaries is defined recursively: 'field1' and + * 'field2' must have the same field names (possibly in different + * orders), and the corresponding field types must be castable according + * to the given casting rule. + */ +static int +can_cast_fields(PyObject *field1, PyObject *field2, NPY_CASTING casting) +{ + Py_ssize_t ppos; + PyObject *key; + PyObject *tuple1, *tuple2; + + if (field1 == field2) { + return 1; + } + if (field1 == NULL || field2 == NULL) { + return 0; + } + if (PyDict_Size(field1) != PyDict_Size(field2)) { + return 0; + } + + /* Iterate over all the fields and compare for castability */ + ppos = 0; + while (PyDict_Next(field1, &ppos, &key, &tuple1)) { + if ((tuple2 = PyDict_GetItem(field2, key)) == NULL) { + return 0; + } + /* Compare the dtype of the field for castability */ + if (!PyArray_CanCastTypeTo( + (PyArray_Descr *)PyTuple_GET_ITEM(tuple1, 0), + (PyArray_Descr *)PyTuple_GET_ITEM(tuple2, 0), + casting)) { + return 0; + } + } + + return 1; +} + /*NUMPY_API * Returns true if data of type 'from' may be cast to data of type * 'to' according to the rule 'casting'. @@ -687,7 +733,6 @@ PyArray_CanCastTypeTo_impl(PyArray_Descr *from, PyArray_Descr *to, else if (PyArray_EquivTypenums(from->type_num, to->type_num)) { /* For complicated case, use EquivTypes (for now) */ if (PyTypeNum_ISUSERDEF(from->type_num) || - PyDataType_HASFIELDS(from) || from->subarray != NULL) { int ret; @@ -715,6 +760,23 @@ PyArray_CanCastTypeTo_impl(PyArray_Descr *from, PyArray_Descr *to, return ret; } + if (PyDataType_HASFIELDS(from)) { + switch (casting) { + case NPY_EQUIV_CASTING: + case NPY_SAFE_CASTING: + case NPY_SAME_KIND_CASTING: + /* + * `from' and `to' must have the same fields, and + * corresponding fields must be (recursively) castable.
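Under the rule spelled out above, structured casting becomes field-wise; a sketch of the intended user-visible behaviour::

    import numpy as np

    a = np.dtype([('x', '<i4'), ('y', '<f4')])
    b = np.dtype([('x', '<i8'), ('y', '<f8')])
    # same field names and each field safely castable -> True
    print(np.can_cast(a, b, casting='safe'))
    # narrowing i8 -> i4 is not a safe cast -> False
    print(np.can_cast(b, a, casting='safe'))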
+ */ + return can_cast_fields(from->fields, to->fields, casting); + + case NPY_NO_CASTING: + default: + return PyArray_EquivTypes(from, to); + } + } + switch (from->type_num) { case NPY_DATETIME: { PyArray_DatetimeMetaData *meta1, *meta2; diff --git a/numpy/core/src/multiarray/ctors.c b/numpy/core/src/multiarray/ctors.c index d93995c8a3e1..3da2dfae7a39 100644 --- a/numpy/core/src/multiarray/ctors.c +++ b/numpy/core/src/multiarray/ctors.c @@ -1054,12 +1054,12 @@ PyArray_NewFromDescr_int(PyTypeObject *subtype, PyArray_Descr *descr, int nd, fa->data = data; /* - * If the strides were provided to the function, need to - * update the flags to get the right CONTIGUOUS, ALIGN properties + * always update the flags to get the right CONTIGUOUS, ALIGN properties + * not owned data and input strides may not be aligned and on some + * platforms (debian sparc) malloc does not provide enough alignment for + * long double types */ - if (strides != NULL) { - PyArray_UpdateFlags((PyArrayObject *)fa, NPY_ARRAY_UPDATE_ALL); - } + PyArray_UpdateFlags((PyArrayObject *)fa, NPY_ARRAY_UPDATE_ALL); /* * call the __array_finalize__ diff --git a/numpy/core/src/multiarray/datetime.c b/numpy/core/src/multiarray/datetime.c index 850c92b44975..e5fb6a16f654 100644 --- a/numpy/core/src/multiarray/datetime.c +++ b/numpy/core/src/multiarray/datetime.c @@ -1840,7 +1840,7 @@ convert_datetime_metadata_tuple_to_datetime_metadata(PyObject *tuple, } unit_str = tmp; } - if (PyBytes_AsStringAndSize(unit_str, &basestr, &len) < 0) { + if (PyBytes_AsStringAndSize(unit_str, &basestr, &len) == -1) { Py_DECREF(unit_str); return -1; } @@ -1919,7 +1919,7 @@ convert_pyobject_to_datetime_metadata(PyObject *obj, return -1; } - if (PyBytes_AsStringAndSize(ascii, &str, &len) < 0) { + if (PyBytes_AsStringAndSize(ascii, &str, &len) == -1) { Py_DECREF(ascii); return -1; } @@ -3047,7 +3047,7 @@ cast_timedelta_to_timedelta(PyArray_DatetimeMetaData *src_meta, * Returns true if the object is something that is best considered * a Datetime, false otherwise. 
*/ -static npy_bool +static NPY_GCC_NONNULL(1) npy_bool is_any_numpy_datetime(PyObject *obj) { return (PyArray_IsScalar(obj, Datetime) || @@ -3296,7 +3296,8 @@ datetime_arange(PyObject *start, PyObject *stop, PyObject *step, } } else { - if (is_any_numpy_datetime(start) || is_any_numpy_datetime(stop)) { + if ((start && is_any_numpy_datetime(start)) || + is_any_numpy_datetime(stop)) { type_nums[0] = NPY_DATETIME; } else { diff --git a/numpy/core/src/multiarray/datetime_busday.c b/numpy/core/src/multiarray/datetime_busday.c index 331e104969ed..c81badcfb2a9 100644 --- a/numpy/core/src/multiarray/datetime_busday.c +++ b/numpy/core/src/multiarray/datetime_busday.c @@ -850,7 +850,7 @@ PyArray_BusDayRollConverter(PyObject *roll_in, NPY_BUSDAY_ROLL *roll) obj = obj_str; } - if (PyBytes_AsStringAndSize(obj, &str, &len) < 0) { + if (PyBytes_AsStringAndSize(obj, &str, &len) == -1) { Py_DECREF(obj); return 0; } diff --git a/numpy/core/src/multiarray/datetime_busdaycal.c b/numpy/core/src/multiarray/datetime_busdaycal.c index 91ba24c97bde..1e0268446b23 100644 --- a/numpy/core/src/multiarray/datetime_busdaycal.c +++ b/numpy/core/src/multiarray/datetime_busdaycal.c @@ -48,7 +48,7 @@ PyArray_WeekMaskConverter(PyObject *weekmask_in, npy_bool *weekmask) Py_ssize_t len; int i; - if (PyBytes_AsStringAndSize(obj, &str, &len) < 0) { + if (PyBytes_AsStringAndSize(obj, &str, &len) == -1) { Py_DECREF(obj); return 0; } diff --git a/numpy/core/src/multiarray/datetime_strings.c b/numpy/core/src/multiarray/datetime_strings.c index 54587cb5c309..a86af52069ee 100644 --- a/numpy/core/src/multiarray/datetime_strings.c +++ b/numpy/core/src/multiarray/datetime_strings.c @@ -1589,7 +1589,7 @@ array_datetime_as_string(PyObject *NPY_UNUSED(self), PyObject *args, Py_INCREF(strobj); } - if (PyBytes_AsStringAndSize(strobj, &str, &len) < 0) { + if (PyBytes_AsStringAndSize(strobj, &str, &len) == -1) { Py_DECREF(strobj); goto fail; } @@ -1637,7 +1637,7 @@ array_datetime_as_string(PyObject *NPY_UNUSED(self), PyObject *args, char *str; Py_ssize_t len; - if (PyBytes_AsStringAndSize(timezone_obj, &str, &len) < 0) { + if (PyBytes_AsStringAndSize(timezone_obj, &str, &len) == -1) { goto fail; } diff --git a/numpy/core/src/multiarray/descriptor.c b/numpy/core/src/multiarray/descriptor.c index 8b55c9fbd79f..e456d98be1c6 100644 --- a/numpy/core/src/multiarray/descriptor.c +++ b/numpy/core/src/multiarray/descriptor.c @@ -1323,7 +1323,7 @@ PyArray_DescrConverter(PyObject *obj, PyArray_Descr **at) Py_ssize_t len = 0; /* Check for a string typecode. 
*/ - if (PyBytes_AsStringAndSize(obj, &type, &len) < 0) { + if (PyBytes_AsStringAndSize(obj, &type, &len) == -1) { goto error; } @@ -2369,11 +2369,8 @@ arraydescr_setstate(PyArray_Descr *self, PyObject *args) { int elsize = -1, alignment = -1; int version = 4; -#if defined(NPY_PY3K) - int endian; -#else char endian; -#endif + PyObject *endian_obj; PyObject *subarray, *fields, *names = NULL, *metadata=NULL; int incref_names = 1; int int_dtypeflags = 0; @@ -2390,68 +2387,39 @@ arraydescr_setstate(PyArray_Descr *self, PyObject *args) } switch (PyTuple_GET_SIZE(PyTuple_GET_ITEM(args,0))) { case 9: -#if defined(NPY_PY3K) -#define _ARGSTR_ "(iCOOOiiiO)" -#else -#define _ARGSTR_ "(icOOOiiiO)" -#endif - if (!PyArg_ParseTuple(args, _ARGSTR_, &version, &endian, + if (!PyArg_ParseTuple(args, "(iOOOOiiiO)", &version, &endian_obj, &subarray, &names, &fields, &elsize, &alignment, &int_dtypeflags, &metadata)) { + PyErr_Clear(); return NULL; -#undef _ARGSTR_ } break; case 8: -#if defined(NPY_PY3K) -#define _ARGSTR_ "(iCOOOiii)" -#else -#define _ARGSTR_ "(icOOOiii)" -#endif - if (!PyArg_ParseTuple(args, _ARGSTR_, &version, &endian, + if (!PyArg_ParseTuple(args, "(iOOOOiii)", &version, &endian_obj, &subarray, &names, &fields, &elsize, &alignment, &int_dtypeflags)) { return NULL; -#undef _ARGSTR_ } break; case 7: -#if defined(NPY_PY3K) -#define _ARGSTR_ "(iCOOOii)" -#else -#define _ARGSTR_ "(icOOOii)" -#endif - if (!PyArg_ParseTuple(args, _ARGSTR_, &version, &endian, + if (!PyArg_ParseTuple(args, "(iOOOOii)", &version, &endian_obj, &subarray, &names, &fields, &elsize, &alignment)) { return NULL; -#undef _ARGSTR_ } break; case 6: -#if defined(NPY_PY3K) -#define _ARGSTR_ "(iCOOii)" -#else -#define _ARGSTR_ "(icOOii)" -#endif - if (!PyArg_ParseTuple(args, _ARGSTR_, &version, - &endian, &subarray, &fields, + if (!PyArg_ParseTuple(args, "(iOOOii)", &version, + &endian_obj, &subarray, &fields, &elsize, &alignment)) { - PyErr_Clear(); -#undef _ARGSTR_ + return NULL; } break; case 5: version = 0; -#if defined(NPY_PY3K) -#define _ARGSTR_ "(COOii)" -#else -#define _ARGSTR_ "(cOOii)" -#endif - if (!PyArg_ParseTuple(args, _ARGSTR_, - &endian, &subarray, &fields, &elsize, + if (!PyArg_ParseTuple(args, "(OOOii)", + &endian_obj, &subarray, &fields, &elsize, &alignment)) { -#undef _ARGSTR_ return NULL; } break; @@ -2494,11 +2462,55 @@ arraydescr_setstate(PyArray_Descr *self, PyObject *args) } } + /* Parse endian */ + if (PyUnicode_Check(endian_obj) || PyBytes_Check(endian_obj)) { + PyObject *tmp = NULL; + char *str; + Py_ssize_t len; + + if (PyUnicode_Check(endian_obj)) { + tmp = PyUnicode_AsASCIIString(endian_obj); + if (tmp == NULL) { + return NULL; + } + endian_obj = tmp; + } + + if (PyBytes_AsStringAndSize(endian_obj, &str, &len) == -1) { + Py_XDECREF(tmp); + return NULL; + } + if (len != 1) { + PyErr_SetString(PyExc_ValueError, + "endian is not 1-char string in Numpy dtype unpickling"); + Py_XDECREF(tmp); + return NULL; + } + endian = str[0]; + Py_XDECREF(tmp); + } + else { + PyErr_SetString(PyExc_ValueError, + "endian is not a string in Numpy dtype unpickling"); + return NULL; + } if ((fields == Py_None && names != Py_None) || (names == Py_None && fields != Py_None)) { PyErr_Format(PyExc_ValueError, - "inconsistent fields and names"); + "inconsistent fields and names in Numpy dtype unpickling"); + return NULL; + } + + if (names != Py_None && !PyTuple_Check(names)) { + PyErr_Format(PyExc_ValueError, + "non-tuple names in Numpy dtype unpickling"); + return NULL; + } + + if (fields != Py_None && !PyDict_Check(fields)) { 
+ PyErr_Format(PyExc_ValueError, + "non-dict fields in Numpy dtype unpickling"); return NULL; } @@ -2563,13 +2575,82 @@ arraydescr_setstate(PyArray_Descr *self, PyObject *args) } if (fields != Py_None) { - Py_XDECREF(self->fields); - self->fields = fields; - Py_INCREF(fields); - Py_XDECREF(self->names); - self->names = names; - if (incref_names) { - Py_INCREF(names); + /* + * Ensure names are of appropriate string type + */ + Py_ssize_t i; + int names_ok = 1; + PyObject *name; + + for (i = 0; i < PyTuple_GET_SIZE(names); ++i) { + name = PyTuple_GET_ITEM(names, i); + if (!PyUString_Check(name)) { + names_ok = 0; + break; + } + } + + if (names_ok) { + Py_XDECREF(self->fields); + self->fields = fields; + Py_INCREF(fields); + Py_XDECREF(self->names); + self->names = names; + if (incref_names) { + Py_INCREF(names); + } + } + else { +#if defined(NPY_PY3K) + /* + * To support pickle.load(f, encoding='bytes') for loading Py2 + * generated pickles on Py3, we need to be more lenient and convert + * field names from byte strings to unicode. + */ + PyObject *tmp, *new_name, *field; + + tmp = PyDict_New(); + if (tmp == NULL) { + return NULL; + } + Py_XDECREF(self->fields); + self->fields = tmp; + + tmp = PyTuple_New(PyTuple_GET_SIZE(names)); + if (tmp == NULL) { + return NULL; + } + Py_XDECREF(self->names); + self->names = tmp; + + for (i = 0; i < PyTuple_GET_SIZE(names); ++i) { + name = PyTuple_GET_ITEM(names, i); + field = PyDict_GetItem(fields, name); + if (!field) { + return NULL; + } + + if (PyUnicode_Check(name)) { + new_name = name; + Py_INCREF(new_name); + } + else { + new_name = PyUnicode_FromEncodedObject(name, "ASCII", "strict"); + if (new_name == NULL) { + return NULL; + } + } + + PyTuple_SET_ITEM(self->names, i, new_name); + if (PyDict_SetItem(self->fields, new_name, field) != 0) { + return NULL; + } + } +#else + PyErr_Format(PyExc_ValueError, + "non-string names in Numpy dtype unpickling"); + return NULL; +#endif } } diff --git a/numpy/core/src/multiarray/lowlevel_strided_loops.c.src b/numpy/core/src/multiarray/lowlevel_strided_loops.c.src index b9063273faf8..38e7656f39f1 100644 --- a/numpy/core/src/multiarray/lowlevel_strided_loops.c.src +++ b/numpy/core/src/multiarray/lowlevel_strided_loops.c.src @@ -1490,7 +1490,9 @@ mapiter_@name@(PyArrayMapIterObject *mit) /* Constant information */ npy_intp fancy_dims[NPY_MAXDIMS]; npy_intp fancy_strides[NPY_MAXDIMS]; +#if @isget@ int iteraxis; +#endif char *baseoffset = mit->baseoffset; char **outer_ptrs = mit->outer_ptrs; @@ -1498,7 +1500,9 @@ mapiter_@name@(PyArrayMapIterObject *mit) PyArrayObject *array= mit->array; /* Fill constant information */ +#if @isget@ iteraxis = mit->iteraxes[0]; +#endif for (i = 0; i < numiter; i++) { fancy_dims[i] = mit->fancy_dims[i]; fancy_strides[i] = mit->fancy_strides[i]; diff --git a/numpy/core/src/multiarray/mapping.c b/numpy/core/src/multiarray/mapping.c index e2b8ef700bd2..40622ca6188b 100644 --- a/numpy/core/src/multiarray/mapping.c +++ b/numpy/core/src/multiarray/mapping.c @@ -206,14 +206,17 @@ prepare_index(PyArrayObject *self, PyObject *index, n = 0; make_tuple = 1; } - n = PySequence_Size(index); + else { + n = PySequence_Size(index); + } if (n < 0 || n >= NPY_MAXDIMS) { n = 0; } for (i = 0; i < n; i++) { PyObject *tmp_obj = PySequence_GetItem(index, i); if (tmp_obj == NULL) { - make_tuple = 1; + PyErr_Clear(); + make_tuple = 0; break; } if (PyArray_Check(tmp_obj) || PySequence_Check(tmp_obj) @@ -1047,7 +1050,7 @@ array_boolean_subscript(PyArrayObject *self, Py_INCREF(dtype); ret = (PyArrayObject 
*)PyArray_NewFromDescr(Py_TYPE(self), dtype, 1, &size, PyArray_STRIDES(ret), PyArray_BYTES(ret), - 0, (PyObject *)self); + PyArray_FLAGS(self), (PyObject *)self); if (ret == NULL) { Py_DECREF(tmp); @@ -1221,7 +1224,7 @@ array_assign_boolean_subscript(PyArrayObject *self, if (needs_api) { /* - * FIXME?: most assignment operations stop after the first occurance + * FIXME?: most assignment operations stop after the first occurrence * of an error. Boolean does not currently, but should at least * report the error. (This is only relevant for things like str->int * casts which call into python) @@ -1436,7 +1439,7 @@ array_subscript(PyArrayObject *self, PyObject *op) /* * TODO: Should this be a view or not? The only reason not would be * optimization (i.e. of array[...] += 1) I think. - * Before, it was just self for a single Ellipis. + * Before, it was just self for a single ellipsis. */ result = PyArray_View(self, NULL, NULL); /* A single ellipsis, so no need to decref */ @@ -1569,7 +1572,7 @@ array_subscript(PyArrayObject *self, PyObject *op) PyArray_SHAPE(tmp_arr), PyArray_STRIDES(tmp_arr), PyArray_BYTES(tmp_arr), - 0, /* TODO: Flags? */ + PyArray_FLAGS(self), (PyObject *)self); if (result == NULL) { @@ -1655,6 +1658,58 @@ array_assign_item(PyArrayObject *self, Py_ssize_t i, PyObject *op) } +/* + * This fallback takes the old route of `arr.flat[index] = values` + * for one dimensional `arr`. The route can sometimes fail slightly + * differently (ValueError instead of IndexError), in which case we + * warn users about the change. But since it does not actually care *at all* + * about shapes, it should only fail for out of bound indexes or + * casting errors. + */ +NPY_NO_EXPORT int +attempt_1d_fallback(PyArrayObject *self, PyObject *ind, PyObject *op) +{ + PyObject *err = PyErr_Occurred(); + PyArrayIterObject *self_iter = NULL; + + Py_INCREF(err); + PyErr_Clear(); + + self_iter = (PyArrayIterObject *)PyArray_IterNew((PyObject *)self); + if (self_iter == NULL) { + goto fail; + } + if (iter_ass_subscript(self_iter, ind, op) < 0) { + goto fail; + } + + Py_XDECREF((PyObject *)self_iter); + Py_DECREF(err); + + if (DEPRECATE( + "assignment will raise an error in the future, most likely " + "because your index result shape does not match the value array " + "shape. You can use `arr.flat[index] = values` to keep the old " + "behaviour.") < 0) { + return -1; + } + return 0; + + fail: + if (!PyErr_ExceptionMatches(err)) { + PyObject *err, *val, *tb; + PyErr_Fetch(&err, &val, &tb); + DEPRECATE_FUTUREWARNING( + "assignment exception type will change in the future"); + PyErr_Restore(err, val, tb); + } + + Py_XDECREF((PyObject *)self_iter); + Py_DECREF(err); + return -1; +} + + /* * General assignment with python indexing objects. */ @@ -1746,9 +1801,21 @@ array_assign_subscript(PyArrayObject *self, PyObject *ind, PyObject *op) Py_INCREF(op); tmp_arr = (PyArrayObject *)op; } + if (array_assign_boolean_subscript(self, (PyArrayObject *)indices[0].object, tmp_arr, NPY_CORDER) < 0) { + /* + * Deprecated case. The old boolean indexing seemed to have some + * check to allow wrong dimensional boolean arrays in all cases. 
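A sketch of an assignment this fallback keeps alive, assuming the shape mismatch takes the deprecated path (values chosen arbitrarily)::

    import numpy as np

    a = np.zeros(3)
    # value shape (1, 3) does not broadcast to the index result
    # shape (3,); the fallback assigns via a.flat and emits the
    # DeprecationWarning defined above instead of raising
    a[np.array([0, 1, 2])] = np.array([[10., 20., 30.]])
    print(a)  # [ 10.  20.  30.]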
+ */ + if (PyArray_NDIM(tmp_arr) > 1) { + if (attempt_1d_fallback(self, indices[0].object, + (PyObject*)tmp_arr) < 0) { + goto fail; + } + goto success; + } goto fail; } goto success; @@ -1899,14 +1966,36 @@ array_assign_subscript(PyArrayObject *self, PyObject *ind, PyObject *op) tmp_arr, descr); if (mit == NULL) { - goto fail; + /* + * This is a deprecated special case to allow non-matching shapes + * for the index and value arrays. + */ + if (index_type != HAS_FANCY || index_num != 1) { + /* This is not a "flat like" 1-d special case */ + goto fail; + } + if (attempt_1d_fallback(self, indices[0].object, op) < 0) { + goto fail; + } + goto success; } if (tmp_arr == NULL) { /* Fill extra op */ if (PyArray_CopyObject(mit->extra_op, op) < 0) { - goto fail; + /* + * This is a deprecated special case to allow non-matching shapes + * for the index and value arrays. + */ + if (index_type != HAS_FANCY || index_num != 1) { + /* This is not a "flat like" 1-d special case */ + goto fail; + } + if (attempt_1d_fallback(self, indices[0].object, op) < 0) { + goto fail; + } + goto success; } } @@ -2357,7 +2446,7 @@ PyArray_MapIterCheckIndices(PyArrayMapIterObject *mit) NPY_BEGIN_THREADS_DEF; if (mit->size == 0) { - /* All indices got broadcasted away, do *not* check as it always was */ + /* All indices got broadcast away, do *not* check as it always was */ return 0; } @@ -2580,7 +2669,7 @@ PyArray_MapIterNew(npy_index_info *indices , int index_num, int index_type, * 1. No subspace iteration is necessary, so the extra_op can * be included into the index iterator (it will be buffered) * 2. Subspace iteration is necessary, so the extra op is iterated - * independendly, and the iteration order is fixed at C (could + * independently, and the iteration order is fixed at C (could * also use Fortran order if the array is Fortran order). * In this case the subspace iterator is not buffered. * @@ -2773,7 +2862,7 @@ PyArray_MapIterNew(npy_index_info *indices , int index_num, int index_type, NPY_ITER_GROWINNER; /* - * For a single 1-d operand, guarantee itertion order + * For a single 1-d operand, guarantee iteration order * (scipy used this). Note that subspace may be used. */ if ((mit->numiter == 1) && (PyArray_NDIM(index_arrays[0]) == 1)) { @@ -2985,7 +3074,7 @@ PyArray_MapIterNew(npy_index_info *indices , int index_num, int index_type, fail: /* - * Check whether the operand was not broadcastable and replace the error + * Check whether the operand could not be broadcast and replace the error * in that case. This should however normally be found early with a * direct goto to broadcast_error */ @@ -3000,7 +3089,7 @@ PyArray_MapIterNew(npy_index_info *indices , int index_num, int index_type, /* (j < 0 is currently impossible, extra_op is reshaped) */ j >= 0 && PyArray_DIM(extra_op, i) != mit->dimensions[j]) { - /* extra_op cannot be broadcasted to the indexing result */ + /* extra_op cannot be broadcast to the indexing result */ goto broadcast_error; } } @@ -3060,7 +3149,7 @@ PyArray_MapIterNew(npy_index_info *indices , int index_num, int index_type, * that most of this public API is currently not guaranteed * to stay the same between versions. If you plan on using * it, please consider adding more utility functions here - * to accomodate new features. + * to accommodate new features. 
*/ NPY_NO_EXPORT PyObject * PyArray_MapIterArray(PyArrayObject * a, PyObject * index) diff --git a/numpy/core/src/multiarray/methods.c b/numpy/core/src/multiarray/methods.c index 5fab174bafa1..5407c14ffe36 100644 --- a/numpy/core/src/multiarray/methods.c +++ b/numpy/core/src/multiarray/methods.c @@ -1670,6 +1670,13 @@ array_setstate(PyArrayObject *self, PyObject *args) tmp = PyUnicode_AsLatin1String(rawdata); Py_DECREF(rawdata); rawdata = tmp; + if (tmp == NULL) { + /* More informative error message */ + PyErr_SetString(PyExc_ValueError, + ("Failed to encode latin1 string when unpickling a Numpy array. " + "pickle.load(a, encoding='latin1') is assumed.")); + return NULL; + } } #endif @@ -1680,7 +1687,7 @@ array_setstate(PyArrayObject *self, PyObject *args) return NULL; } - if (PyBytes_AsStringAndSize(rawdata, &datastr, &len)) { + if (PyBytes_AsStringAndSize(rawdata, &datastr, &len) == -1) { Py_DECREF(rawdata); return NULL; } diff --git a/numpy/core/src/multiarray/multiarray_tests.c.src b/numpy/core/src/multiarray/multiarray_tests.c.src index bd0366bd5db9..8de29e7bf290 100644 --- a/numpy/core/src/multiarray/multiarray_tests.c.src +++ b/numpy/core/src/multiarray/multiarray_tests.c.src @@ -719,6 +719,81 @@ array_indexing(PyObject *NPY_UNUSED(self), PyObject *args) return NULL; } +/* + * Test C-api PyArray_AsCArray item getter + */ +static PyObject * +test_as_c_array(PyObject *NPY_UNUSED(self), PyObject *args) +{ + PyArrayObject *array_obj; + npy_intp dims[3]; // max 3-dim + npy_intp i=0, j=0, k=0; + npy_intp num_dims = 0; + PyArray_Descr *descr = NULL; + double *array1 = NULL; + double **array2 = NULL; + double ***array3 = NULL; + double temp = 9999; + + if (!PyArg_ParseTuple(args, "O!l|ll", + &PyArray_Type, &array_obj, + &i, &j, &k)) { + return NULL; + } + + if (NULL == array_obj) { + return NULL; + } + + num_dims = PyArray_NDIM(array_obj); + descr = PyArray_DESCR(array_obj); + + switch (num_dims) { + case 1: + if (PyArray_AsCArray( + (PyObject **) &array_obj, + (void *) &array1, + dims, + 1, + descr) < 0) { + PyErr_SetString(PyExc_RuntimeError, "error converting 1D array"); + return NULL; + } + temp = array1[i]; + PyArray_Free((PyObject *) array_obj, (void *) array1); + break; + case 2: + if (PyArray_AsCArray( + (PyObject **) &array_obj, + (void **) &array2, + dims, + 2, + descr) < 0) { + PyErr_SetString(PyExc_RuntimeError, "error converting 2D array"); + return NULL; + } + temp = array2[i][j]; + PyArray_Free((PyObject *) array_obj, (void *) array2); + break; + case 3: + if (PyArray_AsCArray( + (PyObject **) &array_obj, + (void ***) &array3, + dims, + 3, + descr) < 0) { + PyErr_SetString(PyExc_RuntimeError, "error converting 3D array"); + return NULL; + } + temp = array3[i][j][k]; + PyArray_Free((PyObject *) array_obj, (void *) array3); + break; + default: + PyErr_SetString(PyExc_ValueError, "array.ndim not in [1, 3]"); + return NULL; + } + return Py_BuildValue("f", temp); +} /* * Test nditer of too large arrays using remove axis, etc. 
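Usage from Python, through the test module this file builds (a minimal sketch; indices picked arbitrarily)::

    import numpy as np
    from numpy.core.multiarray_tests import test_as_c_array

    a = np.arange(24.0).reshape(2, 3, 4)
    # round-trips through PyArray_AsCArray and returns a[1, 2, 3]
    print(test_as_c_array(a, 1, 2, 3))  # 23.0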
@@ -850,6 +925,9 @@ static PyMethodDef Multiarray_TestsMethods[] = { {"array_indexing", array_indexing, METH_VARARGS, NULL}, + {"test_as_c_array", + test_as_c_array, + METH_VARARGS, NULL}, {"test_nditer_too_large", test_nditer_too_large, METH_VARARGS, NULL}, diff --git a/numpy/core/src/multiarray/multiarraymodule.c b/numpy/core/src/multiarray/multiarraymodule.c index 682705a1b8e4..b694b150b7a1 100644 --- a/numpy/core/src/multiarray/multiarraymodule.c +++ b/numpy/core/src/multiarray/multiarraymodule.c @@ -220,7 +220,7 @@ PyArray_AsCArray(PyObject **op, void *ptr, npy_intp *dims, int nd, goto fail; } for (i = 0; i < n; i++) { - ptr3[i] = ptr3[n + (m-1)*i]; + ptr3[i] = (char **) &ptr3[n + m * i]; for (j = 0; j < m; j++) { ptr3[i][j] = PyArray_BYTES(ap) + i*PyArray_STRIDES(ap)[0] + j*PyArray_STRIDES(ap)[1]; } @@ -576,6 +576,12 @@ PyArray_Concatenate(PyObject *op, int axis) PyArrayObject **arrays; PyArrayObject *ret; + if (!PySequence_Check(op)) { + PyErr_SetString(PyExc_TypeError, + "The first input argument needs to be a sequence"); + return NULL; + } + /* Convert the input list into arrays */ narrays = PySequence_Size(op); if (narrays < 0) { @@ -818,6 +824,9 @@ PyArray_InnerProduct(PyObject *op1, PyObject *op2) typenum = PyArray_ObjectType(op2, typenum); typec = PyArray_DescrFromType(typenum); + if (typec == NULL) { + return NULL; + } Py_INCREF(typec); ap1 = (PyArrayObject *)PyArray_FromAny(op1, typec, 0, 0, NPY_ARRAY_ALIGNED, NULL); @@ -841,7 +850,7 @@ PyArray_InnerProduct(PyObject *op1, PyObject *op2) l = PyArray_DIMS(ap1)[PyArray_NDIM(ap1) - 1]; if (PyArray_DIMS(ap2)[PyArray_NDIM(ap2) - 1] != l) { - PyErr_SetString(PyExc_ValueError, "matrices are not aligned"); + not_aligned(ap1, PyArray_NDIM(ap1) - 1, ap2, PyArray_NDIM(ap2) - 1); goto fail; } @@ -961,7 +970,7 @@ PyArray_MatrixProduct2(PyObject *op1, PyObject *op2, PyArrayObject* out) matchDim = 0; } if (PyArray_DIMS(ap2)[matchDim] != l) { - PyErr_SetString(PyExc_ValueError, "objects are not aligned"); + not_aligned(ap1, PyArray_NDIM(ap1) - 1, ap2, matchDim); goto fail; } nd = PyArray_NDIM(ap1) + PyArray_NDIM(ap2) - 2; @@ -1401,9 +1410,7 @@ array_putmask(PyObject *NPY_UNUSED(module), PyObject *args, PyObject *kwds) static int _equivalent_fields(PyObject *field1, PyObject *field2) { - Py_ssize_t ppos; - PyObject *key; - PyObject *tuple1, *tuple2; + int same, val; if (field1 == field2) { return 1; @@ -1411,33 +1418,20 @@ _equivalent_fields(PyObject *field1, PyObject *field2) { if (field1 == NULL || field2 == NULL) { return 0; } - - if (PyDict_Size(field1) != PyDict_Size(field2)) { - return 0; +#if defined(NPY_PY3K) + val = PyObject_RichCompareBool(field1, field2, Py_EQ); + if (val != 1 || PyErr_Occurred()) { +#else + val = PyObject_Compare(field1, field2); + if (val != 0 || PyErr_Occurred()) { +#endif + same = 0; } - - /* Iterate over all the fields and compare for equivalency */ - ppos = 0; - while (PyDict_Next(field1, &ppos, &key, &tuple1)) { - if ((tuple2 = PyDict_GetItem(field2, key)) == NULL) { - return 0; - } - /* Compare the dtype of the field for equivalency */ - if (!PyArray_CanCastTypeTo((PyArray_Descr *)PyTuple_GET_ITEM(tuple1, 0), - (PyArray_Descr *)PyTuple_GET_ITEM(tuple2, 0), - NPY_EQUIV_CASTING)) { - return 0; - } - /* Compare the byte position of the field */ - if (PyObject_RichCompareBool(PyTuple_GET_ITEM(tuple1, 1), - PyTuple_GET_ITEM(tuple2, 1), - Py_EQ) != 1) { - PyErr_Clear(); - return 0; - } + else { + same = 1; } - - return 1; + PyErr_Clear(); + return same; } /* @@ -1839,7 +1833,7 @@ array_scalar(PyObject 
*NPY_UNUSED(ignored), PyObject *args, PyObject *kwds) static char *kwlist[] = {"dtype","obj", NULL}; PyArray_Descr *typecode; - PyObject *obj = NULL; + PyObject *obj = NULL, *tmpobj = NULL; int alloc = 0; void *dptr; PyObject *ret; @@ -1849,11 +1843,6 @@ array_scalar(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *kwds) &PyArrayDescr_Type, &typecode, &obj)) { return NULL; } - if (typecode->elsize == 0) { - PyErr_SetString(PyExc_ValueError, - "itemsize cannot be zero"); - return NULL; - } if (PyDataType_FLAGCHK(typecode, NPY_ITEM_IS_POINTER)) { if (obj == NULL) { @@ -1863,6 +1852,9 @@ array_scalar(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *kwds) } else { if (obj == NULL) { + if (typecode->elsize == 0) { + typecode->elsize = 1; + } dptr = PyArray_malloc(typecode->elsize); if (dptr == NULL) { return PyErr_NoMemory(); @@ -1871,14 +1863,31 @@ array_scalar(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *kwds) alloc = 1; } else { +#if defined(NPY_PY3K) + /* Backward compatibility with Python 2 Numpy pickles */ + if (PyUnicode_Check(obj)) { + tmpobj = PyUnicode_AsLatin1String(obj); + obj = tmpobj; + if (tmpobj == NULL) { + /* More informative error message */ + PyErr_SetString(PyExc_ValueError, + ("Failed to encode Numpy scalar data string to latin1. " + "pickle.load(a, encoding='latin1') is assumed if unpickling.")); + return NULL; + } + } +#endif + if (!PyString_Check(obj)) { PyErr_SetString(PyExc_TypeError, "initializing object must be a string"); + Py_XDECREF(tmpobj); return NULL; } if (PyString_GET_SIZE(obj) < typecode->elsize) { PyErr_SetString(PyExc_ValueError, "initialization string is too small"); + Py_XDECREF(tmpobj); return NULL; } dptr = PyString_AS_STRING(obj); @@ -1890,6 +1899,7 @@ array_scalar(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *kwds) if (alloc) { PyArray_free(dptr); } + Py_XDECREF(tmpobj); return ret; } @@ -2763,7 +2773,7 @@ PyArray_Where(PyObject *condition, PyObject *x, PyObject *y) NULL, arr, ax, ay }; npy_uint32 op_flags[4] = { - NPY_ITER_WRITEONLY | NPY_ITER_ALLOCATE, + NPY_ITER_WRITEONLY | NPY_ITER_ALLOCATE | NPY_ITER_NO_SUBTYPE, NPY_ITER_READONLY, NPY_ITER_READONLY, NPY_ITER_READONLY }; PyArray_Descr * common_dt = PyArray_ResultType(2, &op_in[0] + 2, diff --git a/numpy/core/src/multiarray/number.c b/numpy/core/src/multiarray/number.c index a26a93c1d3eb..16314ea43475 100644 --- a/numpy/core/src/multiarray/number.c +++ b/numpy/core/src/multiarray/number.c @@ -88,6 +88,8 @@ PyArray_SetNumericOps(PyObject *dict) static int has_ufunc_attr(PyObject * obj) { + /* ufunc override disabled for 1.9 */ + return 0; /* attribute check is expensive for scalar operations, avoid if possible */ if (PyArray_CheckExact(obj) || _is_basic_python_type(obj)) { return 0; diff --git a/numpy/core/src/multiarray/scalartypes.c.src b/numpy/core/src/multiarray/scalartypes.c.src index 110bef248a5f..4fa634098bbe 100644 --- a/numpy/core/src/multiarray/scalartypes.c.src +++ b/numpy/core/src/multiarray/scalartypes.c.src @@ -1078,6 +1078,24 @@ gentype_richcompare(PyObject *self, PyObject *other, int cmp_op) { PyObject *arr, *ret; + /* + * If the other object is None, False is always right. This avoids + * the array None comparison, at least until the deprecation is fixed. + * After that, this may be removed and numpy false would be returned. + * + * NOTE: np.equal(NaT, None) evaluates to TRUE! This is + * an inconsistency, which may have to be considered + * when the deprecation is finished.
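The None short-circuit gives scalars the same answer plain Python objects give::

    import numpy as np

    x = np.float64(1.0)
    print(x == None)  # False, returned before any array comparison
    print(x != None)  # True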
*/ + if (other == Py_None) { + if (cmp_op == Py_EQ) { + Py_RETURN_FALSE; + } + if (cmp_op == Py_NE) { + Py_RETURN_TRUE; + } + } + arr = PyArray_FromScalar(self, NULL); if (arr == NULL) { return NULL; } diff --git a/numpy/core/src/multiarray/shape.c b/numpy/core/src/multiarray/shape.c index 2278b5d5b785..7beadf1bc435 100644 --- a/numpy/core/src/multiarray/shape.c +++ b/numpy/core/src/multiarray/shape.c @@ -780,7 +780,8 @@ PyArray_Transpose(PyArrayObject *ap, PyArray_Dims *permute) PyArray_DIMS(ret)[i] = PyArray_DIMS(ap)[permutation[i]]; PyArray_STRIDES(ret)[i] = PyArray_STRIDES(ap)[permutation[i]]; } - PyArray_UpdateFlags(ret, NPY_ARRAY_C_CONTIGUOUS | NPY_ARRAY_F_CONTIGUOUS); + PyArray_UpdateFlags(ret, NPY_ARRAY_C_CONTIGUOUS | NPY_ARRAY_F_CONTIGUOUS | + NPY_ARRAY_ALIGNED); return (PyObject *)ret; } diff --git a/numpy/core/src/npymath/npy_math_complex.c.src b/numpy/core/src/npymath/npy_math_complex.c.src index 920f107b89b2..9cbfd32aeeb4 100644 --- a/numpy/core/src/npymath/npy_math_complex.c.src +++ b/numpy/core/src/npymath/npy_math_complex.c.src @@ -247,7 +247,8 @@ #ifdef HAVE_@KIND@@C@ @type@ npy_@kind@@c@(@ctype@ z) { - __@ctype@_to_c99_cast z1 = {z}; + __@ctype@_to_c99_cast z1; + z1.npy_z = z; return @kind@@c@(z1.c99_z); } #endif @@ -260,8 +261,9 @@ #ifdef HAVE_@KIND@@C@ @ctype@ npy_@kind@@c@(@ctype@ z) { - __@ctype@_to_c99_cast z1 = {z}; + __@ctype@_to_c99_cast z1; __@ctype@_to_c99_cast ret; + z1.npy_z = z; ret.c99_z = @kind@@c@(z1.c99_z); return ret.npy_z; } @@ -275,9 +277,11 @@ #ifdef HAVE_@KIND@@C@ @ctype@ npy_@kind@@c@(@ctype@ x, @ctype@ y) { - __@ctype@_to_c99_cast xcast = {x}; - __@ctype@_to_c99_cast ycast = {y}; + __@ctype@_to_c99_cast xcast; + __@ctype@_to_c99_cast ycast; __@ctype@_to_c99_cast ret; + xcast.npy_z = x; + ycast.npy_z = y; ret.c99_z = @kind@@c@(xcast.c99_z, ycast.c99_z); return ret.npy_z; } diff --git a/numpy/core/src/npymath/npy_math_private.h b/numpy/core/src/npymath/npy_math_private.h index b3b1690bedc8..284d203bff98 100644 --- a/numpy/core/src/npymath/npy_math_private.h +++ b/numpy/core/src/npymath/npy_math_private.h @@ -485,6 +485,24 @@ do { \ * support is available */ #ifdef NPY_USE_C99_COMPLEX + +/* Microsoft C defines _MSC_VER */ +#ifdef _MSC_VER +typedef union { + npy_cdouble npy_z; + _Dcomplex c99_z; +} __npy_cdouble_to_c99_cast; + +typedef union { + npy_cfloat npy_z; + _Fcomplex c99_z; +} __npy_cfloat_to_c99_cast; + +typedef union { + npy_clongdouble npy_z; + _Lcomplex c99_z; +} __npy_clongdouble_to_c99_cast; +#else /* !_MSC_VER */ typedef union { npy_cdouble npy_z; complex double c99_z; @@ -499,7 +517,9 @@ typedef union { npy_clongdouble npy_z; complex long double c99_z; } __npy_clongdouble_to_c99_cast; -#else +#endif /* !_MSC_VER */ + +#else /* !NPY_USE_C99_COMPLEX */ typedef union { npy_cdouble npy_z; npy_cdouble c99_z; @@ -514,6 +534,6 @@ typedef union { npy_clongdouble npy_z; npy_clongdouble c99_z; } __npy_clongdouble_to_c99_cast; -#endif +#endif /* !NPY_USE_C99_COMPLEX */ #endif /* !_NPY_MATH_PRIVATE_H_ */ diff --git a/numpy/core/src/npysort/heapsort.c.src b/numpy/core/src/npysort/heapsort.c.src index 84c9d7bd4a44..ba6c27f48e42 100644 --- a/numpy/core/src/npysort/heapsort.c.src +++ b/numpy/core/src/npysort/heapsort.c.src @@ -28,9 +28,9 @@ #define NPY_NO_DEPRECATED_API NPY_API_VERSION -#include <stdlib.h> #include "npy_sort.h" #include "npysort_common.h" +#include <stdlib.h> #define NOT_USED NPY_UNUSED(unused) #define PYA_QS_STACK 100 diff --git a/numpy/core/src/npysort/mergesort.c.src b/numpy/core/src/npysort/mergesort.c.src index 7f98c4016189..c99c0e614ace 100644 ---
a/numpy/core/src/npysort/mergesort.c.src +++ b/numpy/core/src/npysort/mergesort.c.src @@ -28,9 +28,9 @@ #define NPY_NO_DEPRECATED_API NPY_API_VERSION -#include <stdlib.h> #include "npy_sort.h" #include "npysort_common.h" +#include <stdlib.h> #define NOT_USED NPY_UNUSED(unused) #define PYA_QS_STACK 100 diff --git a/numpy/core/src/npysort/quicksort.c.src b/numpy/core/src/npysort/quicksort.c.src index 272615ab328a..a27530eb4f5d 100644 --- a/numpy/core/src/npysort/quicksort.c.src +++ b/numpy/core/src/npysort/quicksort.c.src @@ -28,9 +28,9 @@ #define NPY_NO_DEPRECATED_API NPY_API_VERSION -#include <stdlib.h> #include "npy_sort.h" #include "npysort_common.h" +#include <stdlib.h> #define NOT_USED NPY_UNUSED(unused) #define PYA_QS_STACK 100 diff --git a/numpy/core/src/npysort/selection.c.src b/numpy/core/src/npysort/selection.c.src index 920c07ec6485..4167b26947a1 100644 --- a/numpy/core/src/npysort/selection.c.src +++ b/numpy/core/src/npysort/selection.c.src @@ -390,7 +390,10 @@ int /* move pivot into position */ SWAP(SORTEE(low), SORTEE(hh)); - store_pivot(hh, kth, pivots, npiv); + /* kth pivot stored later */ + if (hh != kth) { + store_pivot(hh, kth, pivots, npiv); + } if (hh >= kth) high = hh - 1; @@ -400,10 +403,11 @@ int /* two elements */ if (high == low + 1) { - if (@TYPE@_LT(v[IDX(high)], v[IDX(low)])) + if (@TYPE@_LT(v[IDX(high)], v[IDX(low)])) { SWAP(SORTEE(high), SORTEE(low)) - store_pivot(low, kth, pivots, npiv); + } } + store_pivot(kth, kth, pivots, npiv); return 0; } diff --git a/numpy/core/src/private/npy_config.h b/numpy/core/src/private/npy_config.h index 453dbd065af9..70a4c0c1fe82 100644 --- a/numpy/core/src/private/npy_config.h +++ b/numpy/core/src/private/npy_config.h @@ -3,11 +3,51 @@ #include "config.h" #include "numpy/numpyconfig.h" +#include "numpy/npy_cpu.h" /* Disable broken MS math functions */ -#if defined(_MSC_VER) || defined(__MINGW32_VERSION) +#if (defined(_MSC_VER) && (_MSC_VER < 1900)) || defined(__MINGW32_VERSION) + #undef HAVE_ATAN2 +#undef HAVE_ATAN2F +#undef HAVE_ATAN2L + #undef HAVE_HYPOT +#undef HAVE_HYPOTF +#undef HAVE_HYPOTL + +#endif + +#if defined(_MSC_VER) && (_MSC_VER == 1900) + +#undef HAVE_CASIN +#undef HAVE_CASINF +#undef HAVE_CASINL +#undef HAVE_CASINH +#undef HAVE_CASINHF +#undef HAVE_CASINHL +#undef HAVE_CATAN +#undef HAVE_CATANF +#undef HAVE_CATANL +#undef HAVE_CATANH +#undef HAVE_CATANHF +#undef HAVE_CATANHL + +#endif + +/* + * largest alignment the copy loops might require + * required as string, void and complex types might get copied using larger + * instructions than required to operate on them. E.g. complex float is copied + * in 8 byte moves but arithmetic on them only loads in 4 byte moves. + * the sparc platform may need that alignment for long doubles. + * amd64 is not harmed much by the bloat as the system provides 16 byte + * alignment by default.
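A Python mirror of the clamping this constant feeds into, combined with the power-of-two check from _IsAligned earlier in the patch (a sketch using the non-x86 value)::

    NPY_MAX_COPY_ALIGNMENT = 16  # 8 on x86/win32 per the #if below

    def copy_alignment(itemsize):
        # power-of-two itemsizes may be copied with wider moves, but
        # never wider than the platform maximum; others go bytewise
        if itemsize & (itemsize - 1) == 0:
            return min(itemsize, NPY_MAX_COPY_ALIGNMENT)
        return 1

    for n in (1, 2, 8, 16, 32, 6):
        print(n, copy_alignment(n))  # 32 clamps to 16, 6 falls to 1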
+ */ +#if (defined NPY_CPU_X86 || defined _WIN32) +#define NPY_MAX_COPY_ALIGNMENT 8 +#else + #define NPY_MAX_COPY_ALIGNMENT 16 #endif /* Safe to use ldexp and frexp for long double for MSVC builds */ diff --git a/numpy/core/src/private/ufunc_override.h b/numpy/core/src/private/ufunc_override.h index 6b0f73fcfd62..b64391c9f022 100644 --- a/numpy/core/src/private/ufunc_override.h +++ b/numpy/core/src/private/ufunc_override.h @@ -26,6 +26,7 @@ normalize___call___args(PyUFuncObject *ufunc, PyObject *args, else { obj = PyTuple_GetSlice(args, nin, nargs); PyDict_SetItemString(*normal_kwds, "out", obj); + Py_DECREF(obj); } } } @@ -187,6 +188,10 @@ PyUFunc_CheckOverride(PyUFuncObject *ufunc, char *method, /* Pos of each override in args */ int with_override_pos[NPY_MAXARGS]; + /* disabled until remaining issues are fixed */ + *result = NULL; + return 0; + /* * Check inputs */ diff --git a/numpy/core/src/umath/loops.c.src b/numpy/core/src/umath/loops.c.src index 89f1206b4c58..035a27fd2efa 100644 --- a/numpy/core/src/umath/loops.c.src +++ b/numpy/core/src/umath/loops.c.src @@ -2572,6 +2572,7 @@ OBJECT_@kind@(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUS return; } ret = PyObject_IsTrue(ret_obj); + Py_DECREF(ret_obj); if (ret == -1) { #if @identity@ != -1 if (in1 == in2) { @@ -2621,6 +2622,7 @@ OBJECT_sign(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED } ret = PyLong_FromLong(v); if (PyErr_Occurred()) { + Py_DECREF(zero); return; } Py_XDECREF(*out); @@ -2635,6 +2637,7 @@ OBJECT_sign(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED PyObject *ret = PyInt_FromLong( PyObject_Compare(in1 ? in1 : Py_None, zero)); if (PyErr_Occurred()) { + Py_DECREF(zero); return; } Py_XDECREF(*out); diff --git a/numpy/core/src/umath/simd.inc.src b/numpy/core/src/umath/simd.inc.src index 92dc0c659867..5b111eb0d152 100644 --- a/numpy/core/src/umath/simd.inc.src +++ b/numpy/core/src/umath/simd.inc.src @@ -37,7 +37,9 @@ ((abs(args[1] - args[0]) >= (vsize)) || ((abs(args[1] - args[0]) == 0)))) #define IS_BLOCKABLE_REDUCE(esize, vsize) \ - (steps[1] == (esize) && abs(args[1] - args[0]) >= (vsize)) + (steps[1] == (esize) && abs(args[1] - args[0]) >= (vsize) && \ + npy_is_aligned(args[1], (esize)) && \ + npy_is_aligned(args[0], (esize))) #define IS_BLOCKABLE_BINARY(esize, vsize) \ (steps[0] == steps[1] && steps[1] == steps[2] && steps[2] == (esize) && \ @@ -480,14 +482,18 @@ sse2_binary_scalar2_@kind@_@TYPE@(@type@ * op, @type@ * ip1, @type@ * ip2, npy_i /**end repeat1**/ -/* compress 4 vectors to 4/8 bytes in op with filled with 0 or 1 */ +/* + * compress 4 vectors to 4/8 bytes in op with filled with 0 or 1 + * the last vector is passed as a pointer as MSVC 2010 is unable to ignore the + * calling convention leading to C2719 on 32 bit, see #4795 + */ static NPY_INLINE void -sse2_compress4_to_byte_@TYPE@(@vtype@ r1, @vtype@ r2, @vtype@ r3, @vtype@ r4, +sse2_compress4_to_byte_@TYPE@(@vtype@ r1, @vtype@ r2, @vtype@ r3, @vtype@ * r4, npy_bool * op) { const __m128i mask = @vpre@_set1_epi8(0x1); __m128i ir1 = @vpre@_packs_epi32(@cast@(r1), @cast@(r2)); - __m128i ir2 = @vpre@_packs_epi32(@cast@(r3), @cast@(r4)); + __m128i ir2 = @vpre@_packs_epi32(@cast@(r3), @cast@(*r4)); __m128i rr = @vpre@_packs_epi16(ir1, ir2); #if @double@ rr = @vpre@_packs_epi16(rr, rr); @@ -535,7 +541,7 @@ sse2_binary_@kind@_@TYPE@(npy_bool * op, @type@ * ip1, @type@ * ip2, npy_intp n) @vtype@ r2 = @vpre@_@VOP@_@vsuf@(b, b); @vtype@ r3 = @vpre@_@VOP@_@vsuf@(c, c); @vtype@ r4 = @vpre@_@VOP@_@vsuf@(d, d); 
- sse2_compress4_to_byte_@TYPE@(r1, r2, r3, r4, &op[i]); + sse2_compress4_to_byte_@TYPE@(r1, r2, r3, &r4, &op[i]); } } else { @@ -552,7 +558,7 @@ sse2_binary_@kind@_@TYPE@(npy_bool * op, @type@ * ip1, @type@ * ip2, npy_intp n) @vtype@ r2 = @vpre@_@VOP@_@vsuf@(b1, b2); @vtype@ r3 = @vpre@_@VOP@_@vsuf@(c1, c2); @vtype@ r4 = @vpre@_@VOP@_@vsuf@(d1, d2); - sse2_compress4_to_byte_@TYPE@(r1, r2, r3, r4, &op[i]); + sse2_compress4_to_byte_@TYPE@(r1, r2, r3, &r4, &op[i]); } } LOOP_BLOCKED_END { @@ -577,7 +583,7 @@ sse2_binary_scalar1_@kind@_@TYPE@(npy_bool * op, @type@ * ip1, @type@ * ip2, npy @vtype@ r2 = @vpre@_@VOP@_@vsuf@(s, b); @vtype@ r3 = @vpre@_@VOP@_@vsuf@(s, c); @vtype@ r4 = @vpre@_@VOP@_@vsuf@(s, d); - sse2_compress4_to_byte_@TYPE@(r1, r2, r3, r4, &op[i]); + sse2_compress4_to_byte_@TYPE@(r1, r2, r3, &r4, &op[i]); } LOOP_BLOCKED_END { op[i] = sse2_ordered_cmp_@kind@_@TYPE@(ip1[0], ip2[i]); @@ -601,7 +607,7 @@ sse2_binary_scalar2_@kind@_@TYPE@(npy_bool * op, @type@ * ip1, @type@ * ip2, npy @vtype@ r2 = @vpre@_@VOP@_@vsuf@(b, s); @vtype@ r3 = @vpre@_@VOP@_@vsuf@(c, s); @vtype@ r4 = @vpre@_@VOP@_@vsuf@(d, s); - sse2_compress4_to_byte_@TYPE@(r1, r2, r3, r4, &op[i]); + sse2_compress4_to_byte_@TYPE@(r1, r2, r3, &r4, &op[i]); } LOOP_BLOCKED_END { op[i] = sse2_ordered_cmp_@kind@_@TYPE@(ip1[i], ip2[0]); diff --git a/numpy/core/src/umath/ufunc_object.c b/numpy/core/src/umath/ufunc_object.c index d825f15e943d..b0134b8f7f30 100644 --- a/numpy/core/src/umath/ufunc_object.c +++ b/numpy/core/src/umath/ufunc_object.c @@ -73,7 +73,7 @@ static int _does_loop_use_arrays(void *data); static int -_extract_pyvals(PyObject *ref, char *name, int *bufsize, +_extract_pyvals(PyObject *ref, const char *name, int *bufsize, int *errmask, PyObject **errobj); static int @@ -237,7 +237,7 @@ static int PyUFunc_NUM_NODEFAULTS = 0; #endif static PyObject * -_get_global_ext_obj(char * name) +get_global_ext_obj(void) { PyObject *thedict; PyObject *ref = NULL; @@ -259,12 +259,12 @@ _get_global_ext_obj(char * name) static int -_get_bufsize_errmask(PyObject * extobj, char * ufunc_name, +_get_bufsize_errmask(PyObject * extobj, const char *ufunc_name, int *buffersize, int *errormask) { /* Get the buffersize and errormask */ if (extobj == NULL) { - extobj = _get_global_ext_obj(ufunc_name); + extobj = get_global_ext_obj(); } if (_extract_pyvals(extobj, ufunc_name, buffersize, errormask, NULL) < 0) { @@ -430,7 +430,7 @@ _find_array_prepare(PyObject *args, PyObject *kwds, * if an error handling method is 'call' */ static int -_extract_pyvals(PyObject *ref, char *name, int *bufsize, +_extract_pyvals(PyObject *ref, const char *name, int *bufsize, int *errmask, PyObject **errobj) { PyObject *retval; @@ -518,41 +518,41 @@ _extract_pyvals(PyObject *ref, char *name, int *bufsize, NPY_NO_EXPORT int PyUFunc_GetPyValues(char *name, int *bufsize, int *errmask, PyObject **errobj) { - PyObject *ref = _get_global_ext_obj(name); + PyObject *ref = get_global_ext_obj(); return _extract_pyvals(ref, name, bufsize, errmask, errobj); } -#define _GETATTR_(str, rstr) do {if (strcmp(name, #str) == 0) \ +#define GETATTR(str, rstr) do {if (strcmp(name, #str) == 0) \ return PyObject_HasAttrString(op, "__" #rstr "__");} while (0); static int -_has_reflected_op(PyObject *op, char *name) +_has_reflected_op(PyObject *op, const char *name) { - _GETATTR_(add, radd); - _GETATTR_(subtract, rsub); - _GETATTR_(multiply, rmul); - _GETATTR_(divide, rdiv); - _GETATTR_(true_divide, rtruediv); - _GETATTR_(floor_divide, rfloordiv); - _GETATTR_(remainder, rmod); - 
_GETATTR_(power, rpow); - _GETATTR_(left_shift, rlshift); - _GETATTR_(right_shift, rrshift); - _GETATTR_(bitwise_and, rand); - _GETATTR_(bitwise_xor, rxor); - _GETATTR_(bitwise_or, ror); + GETATTR(add, radd); + GETATTR(subtract, rsub); + GETATTR(multiply, rmul); + GETATTR(divide, rdiv); + GETATTR(true_divide, rtruediv); + GETATTR(floor_divide, rfloordiv); + GETATTR(remainder, rmod); + GETATTR(power, rpow); + GETATTR(left_shift, rlshift); + GETATTR(right_shift, rrshift); + GETATTR(bitwise_and, rand); + GETATTR(bitwise_xor, rxor); + GETATTR(bitwise_or, ror); /* Comparisons */ - _GETATTR_(equal, eq); - _GETATTR_(not_equal, ne); - _GETATTR_(greater, lt); - _GETATTR_(less, gt); - _GETATTR_(greater_equal, le); - _GETATTR_(less_equal, ge); + GETATTR(equal, eq); + GETATTR(not_equal, ne); + GETATTR(greater, lt); + GETATTR(less, gt); + GETATTR(greater_equal, le); + GETATTR(less_equal, ge); return 0; } -#undef _GETATTR_ +#undef GETATTR /* Return the position of next non-white-space char in the string */ @@ -779,7 +779,7 @@ static int get_ufunc_arguments(PyUFuncObject *ufunc, int i, nargs, nin = ufunc->nin; PyObject *obj, *context; PyObject *str_key_obj = NULL; - char *ufunc_name; + const char *ufunc_name; int type_num; int any_flexible = 0, any_object = 0, any_flexible_userloops = 0; @@ -1762,7 +1762,7 @@ make_arr_prep_args(npy_intp nin, PyObject *args, PyObject *kwds) * - ufunc_name: name of ufunc */ static int -_check_ufunc_fperr(int errmask, PyObject *extobj, char* ufunc_name) { +_check_ufunc_fperr(int errmask, PyObject *extobj, const char *ufunc_name) { int fperr; PyObject *errobj = NULL; int ret; @@ -1778,7 +1778,7 @@ _check_ufunc_fperr(int errmask, PyObject *extobj, char* ufunc_name) { /* Get error object globals */ if (extobj == NULL) { - extobj = _get_global_ext_obj(ufunc_name); + extobj = get_global_ext_obj(); } if (_extract_pyvals(extobj, ufunc_name, NULL, NULL, &errobj) < 0) { @@ -1800,7 +1800,7 @@ PyUFunc_GeneralizedFunction(PyUFuncObject *ufunc, { int nin, nout; int i, j, idim, nop; - char *ufunc_name; + const char *ufunc_name; int retval = -1, subok = 1; int needs_api = 0; @@ -2325,7 +2325,7 @@ PyUFunc_GenericFunction(PyUFuncObject *ufunc, { int nin, nout; int i, nop; - char *ufunc_name; + const char *ufunc_name; int retval = -1, subok = 1; int need_fancy = 0; @@ -2640,7 +2640,7 @@ reduce_type_resolver(PyUFuncObject *ufunc, PyArrayObject *arr, int i, retcode; PyArrayObject *op[3] = {arr, arr, NULL}; PyArray_Descr *dtypes[3] = {NULL, NULL, NULL}; - char *ufunc_name = ufunc->name ? ufunc->name : "(unknown)"; + const char *ufunc_name = ufunc->name ? ufunc->name : "(unknown)"; PyObject *type_tup = NULL; *out_dtype = NULL; @@ -2816,7 +2816,7 @@ PyUFunc_Reduce(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *out, PyArray_Descr *dtype; PyArrayObject *result; PyArray_AssignReduceIdentityFunc *assign_identity = NULL; - char *ufunc_name = ufunc->name ? ufunc->name : "(unknown)"; + const char *ufunc_name = ufunc->name ? ufunc->name : "(unknown)"; /* These parameters come from a TLS global */ int buffersize = 0, errormask = 0; @@ -2912,7 +2912,7 @@ PyUFunc_Accumulate(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *out, PyUFuncGenericFunction innerloop = NULL; void *innerloopdata = NULL; - char *ufunc_name = ufunc->name ? ufunc->name : "(unknown)"; + const char *ufunc_name = ufunc->name ? 
ufunc->name : "(unknown)"; /* These parameters come from extobj= or from a TLS global */ int buffersize = 0, errormask = 0; @@ -3265,7 +3265,7 @@ PyUFunc_Reduceat(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *ind, PyUFuncGenericFunction innerloop = NULL; void *innerloopdata = NULL; - char *ufunc_name = ufunc->name ? ufunc->name : "(unknown)"; + const char *ufunc_name = ufunc->name ? ufunc->name : "(unknown)"; char *opname = "reduceat"; /* These parameters come from extobj= or from a TLS global */ @@ -3750,7 +3750,7 @@ PyUFunc_GenericReduction(PyUFuncObject *ufunc, PyObject *args, } for (i = 0; i < naxes; ++i) { PyObject *tmp = PyTuple_GET_ITEM(axes_in, i); - long axis = PyInt_AsLong(tmp); + int axis = PyArray_PyIntAsInt(tmp); if (axis == -1 && PyErr_Occurred()) { Py_XDECREF(otype); Py_DECREF(mp); @@ -3771,7 +3771,7 @@ PyUFunc_GenericReduction(PyUFuncObject *ufunc, PyObject *args, } /* Try to interpret axis as an integer */ else { - long axis = PyInt_AsLong(axes_in); + int axis = PyArray_PyIntAsInt(axes_in); /* TODO: PyNumber_Index would be good to use here */ if (axis == -1 && PyErr_Occurred()) { Py_XDECREF(otype); @@ -3932,18 +3932,19 @@ _find_array_wrap(PyObject *args, PyObject *kwds, PyObject *with_wrap[NPY_MAXARGS], *wraps[NPY_MAXARGS]; PyObject *obj, *wrap = NULL; - /* If a 'subok' parameter is passed and isn't True, don't wrap */ + /* + * If a 'subok' parameter is passed and isn't True, don't wrap but put None + * into slots with out arguments which means return the out argument + */ if (kwds != NULL && (obj = PyDict_GetItem(kwds, npy_um_str_subok)) != NULL) { if (obj != Py_True) { - for (i = 0; i < nout; i++) { - output_wrap[i] = NULL; - } - return; + /* skip search for wrap members */ + goto handle_out; } } - nargs = PyTuple_GET_SIZE(args); + for (i = 0; i < nin; i++) { obj = PyTuple_GET_ITEM(args, i); if (PyArray_CheckExact(obj) || PyArray_IsAnyScalar(obj)) { @@ -4001,6 +4002,8 @@ _find_array_wrap(PyObject *args, PyObject *kwds, * exact ndarray so that no PyArray_Return is * done in that case. */ +handle_out: + nargs = PyTuple_GET_SIZE(args); for (i = 0; i < nout; i++) { int j = nin + i; int incref = 1; @@ -4305,7 +4308,7 @@ NPY_NO_EXPORT PyObject * PyUFunc_FromFuncAndData(PyUFuncGenericFunction *func, void **data, char *types, int ntypes, int nin, int nout, int identity, - char *name, char *doc, int check_return) + const char *name, const char *doc, int check_return) { return PyUFunc_FromFuncAndDataAndSignature(func, data, types, ntypes, nin, nout, identity, name, doc, check_return, NULL); @@ -4316,7 +4319,7 @@ NPY_NO_EXPORT PyObject * PyUFunc_FromFuncAndDataAndSignature(PyUFuncGenericFunction *func, void **data, char *types, int ntypes, int nin, int nout, int identity, - char *name, char *doc, + const char *name, const char *doc, int check_return, const char *signature) { PyUFuncObject *ufunc; diff --git a/numpy/core/src/umath/ufunc_type_resolution.c b/numpy/core/src/umath/ufunc_type_resolution.c index 6ef4438b41ca..82f8c94146d3 100644 --- a/numpy/core/src/umath/ufunc_type_resolution.c +++ b/numpy/core/src/umath/ufunc_type_resolution.c @@ -58,7 +58,7 @@ PyUFunc_ValidateCasting(PyUFuncObject *ufunc, PyArray_Descr **dtypes) { int i, nin = ufunc->nin, nop = nin + ufunc->nout; - char *ufunc_name; + const char *ufunc_name; ufunc_name = ufunc->name ? 
ufunc->name : "<unnamed ufunc>"; @@ -186,7 +186,7 @@ PyUFunc_SimpleBinaryComparisonTypeResolver(PyUFuncObject *ufunc, PyArray_Descr **out_dtypes) { int i, type_num1, type_num2; - char *ufunc_name; + const char *ufunc_name; ufunc_name = ufunc->name ? ufunc->name : "<unnamed ufunc>"; @@ -292,7 +292,7 @@ PyUFunc_SimpleUnaryOperationTypeResolver(PyUFuncObject *ufunc, PyArray_Descr **out_dtypes) { int i, type_num1; - char *ufunc_name; + const char *ufunc_name; ufunc_name = ufunc->name ? ufunc->name : "<unnamed ufunc>"; @@ -433,7 +433,7 @@ PyUFunc_SimpleBinaryOperationTypeResolver(PyUFuncObject *ufunc, PyArray_Descr **out_dtypes) { int i, type_num1, type_num2; - char *ufunc_name; + const char *ufunc_name; ufunc_name = ufunc->name ? ufunc->name : "<unnamed ufunc>"; @@ -591,7 +591,7 @@ PyUFunc_AdditionTypeResolver(PyUFuncObject *ufunc, { int type_num1, type_num2; int i; - char *ufunc_name; + const char *ufunc_name; ufunc_name = ufunc->name ? ufunc->name : "<unnamed ufunc>"; @@ -781,7 +781,7 @@ PyUFunc_SubtractionTypeResolver(PyUFuncObject *ufunc, { int type_num1, type_num2; int i; - char *ufunc_name; + const char *ufunc_name; ufunc_name = ufunc->name ? ufunc->name : "<unnamed ufunc>"; @@ -963,7 +963,7 @@ PyUFunc_MultiplicationTypeResolver(PyUFuncObject *ufunc, { int type_num1, type_num2; int i; - char *ufunc_name; + const char *ufunc_name; ufunc_name = ufunc->name ? ufunc->name : "<unnamed ufunc>"; @@ -1106,7 +1106,7 @@ PyUFunc_DivisionTypeResolver(PyUFuncObject *ufunc, { int type_num1, type_num2; int i; - char *ufunc_name; + const char *ufunc_name; ufunc_name = ufunc->name ? ufunc->name : "<unnamed ufunc>"; @@ -1873,7 +1873,7 @@ linear_search_type_resolver(PyUFuncObject *self, { npy_intp i, j, nin = self->nin, nop = nin + self->nout; int types[NPY_MAXARGS]; - char *ufunc_name; + const char *ufunc_name; int no_castable_output, use_min_scalar; /* For making a better error message on coercion error */ @@ -1982,7 +1982,7 @@ type_tuple_type_resolver(PyUFuncObject *self, npy_intp i, j, n, nin = self->nin, nop = nin + self->nout; int n_specified = 0; int specified_types[NPY_MAXARGS], types[NPY_MAXARGS]; - char *ufunc_name; + const char *ufunc_name; int no_castable_output, use_min_scalar; /* For making a better error message on coercion error */ @@ -2042,7 +2042,7 @@ type_tuple_type_resolver(PyUFuncObject *self, type_tup = str_obj; } - if (!PyBytes_AsStringAndSize(type_tup, &str, &length) < 0) { + if (PyBytes_AsStringAndSize(type_tup, &str, &length) == -1) { Py_XDECREF(str_obj); return -1; } diff --git a/numpy/core/src/umath/umathmodule.c b/numpy/core/src/umath/umathmodule.c index 3ed7ee7714ec..52aa2e48d282 100644 --- a/numpy/core/src/umath/umathmodule.c +++ b/numpy/core/src/umath/umathmodule.c @@ -54,16 +54,15 @@ object_ufunc_type_resolver(PyUFuncObject *ufunc, PyArray_Descr **out_dtypes) { int i, nop = ufunc->nin + ufunc->nout; - PyArray_Descr *obj_dtype; - obj_dtype = PyArray_DescrFromType(NPY_OBJECT); - if (obj_dtype == NULL) { + out_dtypes[0] = PyArray_DescrFromType(NPY_OBJECT); + if (out_dtypes[0] == NULL) { return -1; } - for (i = 0; i < nop; ++i) { - Py_INCREF(obj_dtype); - out_dtypes[i] = obj_dtype; + for (i = 1; i < nop; ++i) { + Py_INCREF(out_dtypes[0]); + out_dtypes[i] = out_dtypes[0]; } return 0; @@ -215,9 +214,6 @@ static PyUFuncGenericFunction frexp_functions[] = { #endif }; -static void * blank3_data[] = { (void *)NULL, (void *)NULL, (void *)NULL}; -static void * blank6_data[] = { (void *)NULL, (void *)NULL, (void *)NULL, - (void *)NULL, (void *)NULL, (void *)NULL}; static char frexp_signatures[] = { #ifdef HAVE_FREXPF NPY_HALF, NPY_HALF, NPY_INT, @@ -228,6 +224,7 @@ static char frexp_signatures[] = { 
,NPY_LONGDOUBLE, NPY_LONGDOUBLE, NPY_INT #endif }; +static void * blank_data[12]; #if NPY_SIZEOF_LONG == NPY_SIZEOF_INT #define LDEXP_LONG(typ) typ##_ldexp @@ -358,14 +355,16 @@ InitOtherOperators(PyObject *dictionary) { int num; num = sizeof(frexp_functions) / sizeof(frexp_functions[0]); - f = PyUFunc_FromFuncAndData(frexp_functions, blank3_data, + assert(sizeof(blank_data) / sizeof(blank_data[0]) >= num); + f = PyUFunc_FromFuncAndData(frexp_functions, blank_data, frexp_signatures, num, 1, 2, PyUFunc_None, "frexp", frdoc, 0); PyDict_SetItemString(dictionary, "frexp", f); Py_DECREF(f); num = sizeof(ldexp_functions) / sizeof(ldexp_functions[0]); - f = PyUFunc_FromFuncAndData(ldexp_functions, blank6_data, + assert(sizeof(blank_data) / sizeof(blank_data[0]) >= num); + f = PyUFunc_FromFuncAndData(ldexp_functions, blank_data, ldexp_signatures, num, 2, 1, PyUFunc_None, "ldexp", lddoc, 0); PyDict_SetItemString(dictionary, "ldexp", f); diff --git a/numpy/core/tests/test_blasdot.py b/numpy/core/tests/test_blasdot.py index caa576abcf7e..6b5afef14f86 100644 --- a/numpy/core/tests/test_blasdot.py +++ b/numpy/core/tests/test_blasdot.py @@ -1,7 +1,9 @@ from __future__ import division, absolute_import, print_function -import numpy as np import sys +from itertools import product + +import numpy as np from numpy.core import zeros, float64 from numpy.testing import dec, TestCase, assert_almost_equal, assert_, \ assert_raises, assert_array_equal, assert_allclose, assert_equal @@ -152,6 +154,7 @@ def test_dot_array_order(): assert_almost_equal(b.dot(c), _dot(b, c), decimal=prec) assert_almost_equal(c.T.dot(b.T), _dot(c.T, b.T), decimal=prec) +@dec.skipif(True) # ufunc override disabled for 1.9 def test_dot_override(): class A(object): def __numpy_ufunc__(self, ufunc, method, pos, inputs, **kwargs): @@ -169,3 +172,78 @@ def __numpy_ufunc__(self, ufunc, method, pos, inputs, **kwargs): assert_equal(c.dot(a), "A") assert_raises(TypeError, np.dot, b, c) assert_raises(TypeError, c.dot, b) + + +def test_npdot_segfault(): + if sys.platform != 'darwin': return + # Test for float32 np.dot segfault + # https://github.com/numpy/numpy/issues/4007 + + def aligned_array(shape, align, dtype, order='C'): + # Make array shape `shape` with aligned at `align` bytes + d = dtype() + # Make array of correct size with `align` extra bytes + N = np.prod(shape) + tmp = np.zeros(N * d.nbytes + align, dtype=np.uint8) + address = tmp.__array_interface__["data"][0] + # Find offset into array giving desired alignment + for offset in range(align): + if (address + offset) % align == 0: break + tmp = tmp[offset:offset+N*d.nbytes].view(dtype=dtype) + return tmp.reshape(shape, order=order) + + def as_aligned(arr, align, dtype, order='C'): + # Copy `arr` into an aligned array with same shape + aligned = aligned_array(arr.shape, align, dtype, order) + aligned[:] = arr[:] + return aligned + + def assert_dot_close(A, X, desired): + assert_allclose(np.dot(A, X), desired, rtol=1e-5, atol=1e-7) + + m = aligned_array(100, 15, np.float32) + s = aligned_array((100, 100), 15, np.float32) + # This always segfaults when the sgemv alignment bug is present + np.dot(s, m) + # test the sanity of np.dot after applying patch + for align, m, n, a_order in product( + (15, 32), + (10000,), + (200, 89), + ('C', 'F')): + # Calculation in double precision + A_d = np.random.rand(m, n) + X_d = np.random.rand(n) + desired = np.dot(A_d, X_d) + # Calculation with aligned single precision + A_f = as_aligned(A_d, align, np.float32, order=a_order) + X_f = as_aligned(X_d, align, 
np.float32) + assert_dot_close(A_f, X_f, desired) + # Strided A rows + A_d_2 = A_d[::2] + desired = np.dot(A_d_2, X_d) + A_f_2 = A_f[::2] + assert_dot_close(A_f_2, X_f, desired) + # Strided A columns, strided X vector + A_d_22 = A_d_2[:, ::2] + X_d_2 = X_d[::2] + desired = np.dot(A_d_22, X_d_2) + A_f_22 = A_f_2[:, ::2] + X_f_2 = X_f[::2] + assert_dot_close(A_f_22, X_f_2, desired) + # Check the strides are as expected + if a_order == 'F': + assert_equal(A_f_22.strides, (8, 8 * m)) + else: + assert_equal(A_f_22.strides, (8 * n, 8)) + assert_equal(X_f_2.strides, (8,)) + # Strides in A rows + cols only + X_f_2c = as_aligned(X_f_2, align, np.float32) + assert_dot_close(A_f_22, X_f_2c, desired) + # Strides just in A cols + A_d_12 = A_d[:, ::2] + desired = np.dot(A_d_12, X_d_2) + A_f_12 = A_f[:, ::2] + assert_dot_close(A_f_12, X_f_2c, desired) + # Strides in A cols and X + assert_dot_close(A_f_12, X_f_2, desired) diff --git a/numpy/core/tests/test_datetime.py b/numpy/core/tests/test_datetime.py index bf0ba6807394..4e432f8850e6 100644 --- a/numpy/core/tests/test_datetime.py +++ b/numpy/core/tests/test_datetime.py @@ -1412,6 +1412,11 @@ def test_datetime_arange(self): np.datetime64('2012-02-03T14Z', 's'), np.timedelta64(5, 'Y')) + def test_datetime_arange_no_dtype(self): + d = np.array('2010-01-04', dtype="M8[D]") + assert_equal(np.arange(d, d + 1), d) + assert_raises(ValueError, np.arange, d) + def test_timedelta_arange(self): a = np.arange(3, 10, dtype='m8') assert_equal(a.dtype, np.dtype('m8')) @@ -1430,6 +1435,11 @@ def test_timedelta_arange(self): assert_raises(TypeError, np.arange, np.timedelta64(0, 'Y'), np.timedelta64(5, 'D')) + def test_timedelta_arange_no_dtype(self): + d = np.array(5, dtype="m8[D]") + assert_equal(np.arange(d, d + 1), d) + assert_raises(ValueError, np.arange, d) + def test_datetime_maximum_reduce(self): a = np.array(['2010-01-02', '1999-03-14', '1833-03'], dtype='M8[D]') assert_equal(np.maximum.reduce(a).dtype, np.dtype('M8[D]')) diff --git a/numpy/core/tests/test_deprecations.py b/numpy/core/tests/test_deprecations.py index a1f4664a53a3..ef56766f5f41 100644 --- a/numpy/core/tests/test_deprecations.py +++ b/numpy/core/tests/test_deprecations.py @@ -12,7 +12,7 @@ import numpy as np from numpy.testing import (dec, run_module_suite, assert_raises, - assert_warns, assert_array_equal) + assert_warns, assert_array_equal, assert_) class _DeprecationTestCase(object): @@ -249,6 +249,14 @@ def mult(a, b): self.assert_not_deprecated(mult, args=([1], np.int_(3))) + def test_reduce_axis_float_index(self): + d = np.zeros((3,3,3)) + self.assert_deprecated(np.min, args=(d, 0.5)) + self.assert_deprecated(np.min, num=1, args=(d, (0.5, 1))) + self.assert_deprecated(np.min, num=1, args=(d, (1, 2.2))) + self.assert_deprecated(np.min, num=2, args=(d, (.2, 1.2))) + + class TestBooleanArgumentDeprecation(_DeprecationTestCase): """This tests that using a boolean as integer argument/indexing is deprecated. @@ -426,6 +434,26 @@ def test_none_comparison(self): assert_raises(FutureWarning, operator.eq, np.arange(3), None) assert_raises(FutureWarning, operator.ne, np.arange(3), None) + def test_scalar_none_comparison(self): + # Scalars should still just return false and not give a warnings. 
+ with warnings.catch_warnings(record=True) as w: + warnings.filterwarnings('always', '', FutureWarning) + assert_(not np.float32(1) == None) + assert_(not np.str_('test') == None) + # This is dubious (see below): + assert_(not np.datetime64('NaT') == None) + + assert_(np.float32(1) != None) + assert_(np.str_('test') != None) + # This is dubious (see below): + assert_(np.datetime64('NaT') != None) + assert_(len(w) == 0) + + # For documentation purposes, this is why the datetime is dubious. + # At the time of deprecation this was no behaviour change, but + # it has to be considered when the deprecation is done. + assert_(np.equal(np.datetime64('NaT'), None)) + class TestIdentityComparisonDepreactions(_DeprecationTestCase): """This tests the equal and not_equal object ufuncs identity check diff --git a/numpy/core/tests/test_indexing.py b/numpy/core/tests/test_indexing.py index 6b0b0a0b52f2..ccc0f5fb9365 100644 --- a/numpy/core/tests/test_indexing.py +++ b/numpy/core/tests/test_indexing.py @@ -111,7 +111,9 @@ def test_single_int_index(self): # Index out of bounds produces IndexError assert_raises(IndexError, a.__getitem__, 1<<30) # Index overflow produces IndexError - assert_raises(IndexError, a.__getitem__, 1<<64) + with warnings.catch_warnings(record=True): + warnings.filterwarnings('always', '', DeprecationWarning) + assert_raises(IndexError, a.__getitem__, 1<<64) def test_single_bool_index(self): # Single boolean index @@ -147,7 +149,7 @@ def test_boolean_indexing_onedim(self): def test_boolean_assignment_value_mismatch(self): # A boolean assignment should fail when the shape of the values - # cannot be broadcasted to the subscription. (see also gh-3458) + # cannot be broadcast to the subscription. (see also gh-3458) a = np.arange(4) def f(a, v): a[a > -1] = v @@ -188,12 +190,12 @@ def test_reverse_strides_and_subspace_bufferinit(self): # If the strides are not reversed, the 0 in the arange comes last. assert_equal(a[0], 0) - # This also tests that the subspace buffer is initiliazed: + # This also tests that the subspace buffer is initialized: a = np.ones((5, 2)) c = np.arange(10).reshape(5, 2)[::-1] a[b, :] = c assert_equal(a[0], [0, 1]) - + def test_reversed_strides_result_allocation(self): # Test a bug when calculating the output strides for a result array # when the subspace size was 1 (and test other cases as well) @@ -285,6 +287,17 @@ def __array_finalize__(self, old): assert_((a == 1).all()) + def test_subclass_writeable(self): + d = np.rec.array([('NGC1001', 11), ('NGC1002', 1.), ('NGC1003', 1.)], + dtype=[('target', 'S20'), ('V_mag', '>f4')]) + ind = np.array([False, True, True], dtype=bool) + assert_(d[ind].flags.writeable) + ind = np.array([0, 1]) + assert_(d[ind].flags.writeable) + assert_(d[...].flags.writeable) + assert_(d[0].flags.writeable) + + def test_memory_order(self): # This is not necessary to preserve. Memory layouts for # more complex indices are not as simple. 
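Aside: the scalar/array split that the deprecation tests above pin down can be reproduced standalone. A minimal sketch, assuming the 1.9-era semantics in which array comparisons against None still emit a FutureWarning before becoming elementwise:

    import warnings
    import numpy as np

    # Scalar comparisons against None quietly return plain booleans.
    assert not (np.float32(1) == None)
    assert np.float32(1) != None

    # Array comparisons against None warn, since they are slated to
    # switch from identity-style to elementwise comparison.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        np.arange(3) == None
    assert any(issubclass(w.category, FutureWarning) for w in caught)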
@@ -335,7 +348,7 @@ def test_small_regressions(self): # Reference count of intp for index checks a = np.array([0]) refcount = sys.getrefcount(np.dtype(np.intp)) - # item setting always checks indices in seperate function: + # item setting always checks indices in separate function: a[np.array([0], dtype=np.intp)] = 1 a[np.array([0], dtype=np.uint8)] = 1 assert_raises(IndexError, a.__setitem__, @@ -367,6 +380,37 @@ def test_unaligned(self): d[b % 2 == 0] d[b % 2 == 0] = x[::2] + def test_tuple_subclass(self): + arr = np.ones((5, 5)) + + # A tuple subclass should also be an nd-index + class TupleSubclass(tuple): + pass + index = ([1], [1]) + index = TupleSubclass(index) + assert_(arr[index].shape == (1,)) + # Unlike the non nd-index: + assert_(arr[index,].shape != (1,)) + + def test_broken_sequence_not_nd_index(self): + # See gh-5063: + # If we have an object which claims to be a sequence, but fails + # on item getting, this should not be converted to an nd-index (tuple) + # If this object happens to be a valid index otherwise, it should work + # This object here is very dubious and probably bad though: + class SequenceLike(object): + def __index__(self): + return 0 + + def __len__(self): + return 1 + + def __getitem__(self, item): + raise IndexError('Not possible') + + arr = np.arange(10) + assert_array_equal(arr[SequenceLike()], arr[SequenceLike(),]) + class TestFieldIndexing(TestCase): def test_scalar_return_type(self): @@ -402,8 +446,14 @@ def test_prepend_not_one(self): # Too large and not only ones. assert_raises(ValueError, assign, a, s_[...], np.ones((2, 1))) - assert_raises(ValueError, assign, a, s_[[1, 2, 3],], np.ones((2, 1))) - assert_raises(ValueError, assign, a, s_[[[1], [2]],], np.ones((2,2,1))) + + with warnings.catch_warnings(): + # Will be a ValueError as well. + warnings.simplefilter("error", DeprecationWarning) + assert_raises(DeprecationWarning, assign, a, s_[[1, 2, 3],], + np.ones((2, 1))) + assert_raises(DeprecationWarning, assign, a, s_[[[1], [2]],], + np.ones((2,2,1))) def test_simple_broadcasting_errors(self): @@ -520,11 +570,11 @@ class TestMultiIndexingAutomated(TestCase): These test use code to mimic the C-Code indexing for selection. NOTE: * This still lacks tests for complex item setting. - * If you change behavoir of indexing, you might want to modify + * If you change behavior of indexing, you might want to modify these tests to try more combinations. * Behavior was written to match numpy version 1.8. (though a first version matched 1.7.) - * Only tuple indicies are supported by the mimicing code. + * Only tuple indices are supported by the mimicking code. (and tested as of writing this) * Error types should match most of the time as long as there is only one error. For multiple errors, what gets raised @@ -547,7 +597,7 @@ def setUp(self): slice(4, -1, -2), slice(None, None, -3), # Some Fancy indexes: - np.empty((0, 1, 1), dtype=np.intp), # empty broadcastable + np.empty((0, 1, 1), dtype=np.intp), # empty and can be broadcast np.array([0, 1, -2]), np.array([[2], [0], [1]]), np.array([[0, -1], [0, 1]], dtype=np.dtype('intp').newbyteorder()), @@ -594,7 +644,7 @@ def _get_multi_index(self, arr, indices): fancy_dim = 0 # NOTE: This is a funny twist (and probably OK to change). # The boolean array has illegal indexes, but this is - # allowed if the broadcasted fancy-indices are 0-sized. + # allowed if the broadcast fancy-indices are 0-sized. # This variable is to catch that case. 
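Aside: the tuple-subclass rule exercised by test_tuple_subclass above can be checked in isolation; a short sketch mirroring that test (the class name is hypothetical):

    import numpy as np

    class TupleSubclass(tuple):
        pass

    arr = np.ones((5, 5))
    index = TupleSubclass(([1], [1]))
    # As an nd-index, the two lists address separate axes...
    assert arr[index].shape == (1,)
    # ...whereas wrapping it in one more tuple makes it a single fancy index.
    assert arr[index,].shape != (1,)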
error_unless_broadcast_to_empty = False @@ -639,7 +689,7 @@ def _get_multi_index(self, arr, indices): if arr.ndim - ndim < 0: # we can't take more dimensions then we have, not even for 0-d arrays. # since a[()] makes sense, but not a[(),]. We will raise an error - # lateron, unless a broadcasting error occurs first. + # later on, unless a broadcasting error occurs first. raise IndexError if ndim == 0 and not None in in_indices: @@ -651,7 +701,7 @@ def _get_multi_index(self, arr, indices): for ax, indx in enumerate(in_indices): if isinstance(indx, slice): - # convert to an index array anways: + # convert to an index array indx = np.arange(*indx.indices(arr.shape[ax])) indices.append(['s', indx]) continue @@ -684,7 +734,7 @@ def _get_multi_index(self, arr, indices): indx = flat_indx else: # This could be changed, a 0-d boolean index can - # make sense (even outide the 0-d indexed array case) + # make sense (even outside the 0-d indexed array case) # Note that originally this is could be interpreted as # integer in the full integer special case. raise IndexError @@ -736,7 +786,7 @@ def _get_multi_index(self, arr, indices): arr = arr.transpose(*(fancy_axes + axes)) # We only have one 'f' index now and arr is transposed accordingly. - # Now handle newaxes by reshaping... + # Now handle newaxis by reshaping... ax = 0 for indx in indices: if indx[0] == 'f': @@ -754,7 +804,7 @@ def _get_multi_index(self, arr, indices): res = np.broadcast(*indx[1:]) # raises ValueError... else: res = indx[1] - # unfortunatly the indices might be out of bounds. So check + # unfortunately the indices might be out of bounds. So check # that first, and use mode='wrap' then. However only if # there are any indices... if res.size != 0: @@ -892,7 +942,7 @@ def test_multidim(self): # spot and the simple ones in one other spot. with warnings.catch_warnings(): # This is so that np.array(True) is not accepted in a full integer - # index, when running the file seperatly. + # index, when running the file separately. 
+ # index, when running the file separately. warnings.filterwarnings('error', '', DeprecationWarning) for simple_pos in [0, 2, 3]: tocheck = [self.fill_indices, self.complex_indices, diff --git a/numpy/core/tests/test_multiarray.py b/numpy/core/tests/test_multiarray.py index f768f3acb398..688274dcea43 100644 --- a/numpy/core/tests/test_multiarray.py +++ b/numpy/core/tests/test_multiarray.py @@ -22,7 +22,7 @@ from numpy.core.multiarray_tests import ( test_neighborhood_iterator, test_neighborhood_iterator_oob, test_pydatamem_seteventhook_start, test_pydatamem_seteventhook_end, - test_inplace_increment, get_buffer_info + test_inplace_increment, get_buffer_info, test_as_c_array ) from numpy.testing import ( TestCase, run_module_suite, assert_, assert_raises, @@ -68,6 +68,17 @@ def test_otherflags(self): assert_equal(self.a.flags.aligned, True) assert_equal(self.a.flags.updateifcopy, False) + def test_string_align(self): + a = np.zeros(4, dtype=np.dtype('|S4')) + assert_(a.flags.aligned) + # not power of two are accessed bytewise and thus considered aligned + a = np.zeros(5, dtype=np.dtype('|S4')) + assert_(a.flags.aligned) + + def test_void_align(self): + a = np.zeros(4, dtype=np.dtype([("a", "i4"), ("b", "i4")])) + assert_(a.flags.aligned) + class TestHash(TestCase): # see #3793 def test_int(self): @@ -297,6 +308,10 @@ def test_construction(self): d2 = dtype('f8') assert_equal(d2, dtype(float64)) + def test_byteorders(self): + self.assertNotEqual(dtype('<i4'), dtype('>i4')) + self.assertNotEqual(dtype([('a', '<i4')]), dtype([('a', '>i4')])) + class TestZeroRank(TestCase): def setUp(self): self.d = array(0), array('x', object) @@ -557,6 +572,12 @@ def test_zeros_like_like_zeros(self): assert_array_equal(zeros_like(d), d) assert_equal(zeros_like(d).dtype, d.dtype) + def test_empty_unicode(self): + # don't throw decode errors on garbage memory + for i in range(5, 100, 5): + d = np.empty(i, dtype='U') + str(d) + def test_sequence_non_homogenous(self): assert_equal(np.array([4, 2**80]).dtype, np.object) assert_equal(np.array([4, 2**80, 4]).dtype, np.object) @@ -601,6 +622,20 @@ def __getitem__(self, index): assert_(a.dtype == np.dtype(object)) assert_raises(ValueError, np.array, [Fail()]) + def test_no_len_object_type(self): + # gh-5100, want object array from iterable object without len() + class Point2: + def __init__(self): + pass + + def __getitem__(self, ind): + if ind in [0, 1]: + return ind + else: + raise IndexError() + d = np.array([Point2(), Point2(), Point2()]) + assert_equal(d.dtype, np.dtype(object)) + class TestStructured(TestCase): def test_subarray_field_access(self): @@ -676,6 +711,65 @@ def test_subarray_comparison(self): b = np.array([(5, 43), (10, 1)], dtype=[('a', '<i8'), ('b', '>f8')]) assert_equal(a == b, [False, True]) + def test_casting(self): + # Check that casting a structured array to change its byte order + # works + a = np.array([(1,)], dtype=[('a', '<i4')]) + assert_(np.can_cast(a.dtype, [('a', '>i4')], casting='unsafe')) + b = a.astype([('a', '>i4')]) + assert_equal(b, a.byteswap().newbyteorder()) + assert_equal(a['a'][0], b['a'][0]) + + # Check that equality comparison works on structured arrays if + # they are 'equiv'-castable + a = np.array([(5, 42), (10, 1)], dtype=[('a', '>i4'), ('b', '<f8')]) + b = np.array([(42, 5), (1, 10)], dtype=[('b', '>f8'), ('a', '<i4')]) + assert_(np.can_cast(a.dtype, b.dtype, casting='equiv')) + assert_equal(a == b, [True, True]) + + # Check that 'equiv' casting can reorder fields and change byte + # order + assert_(np.can_cast(a.dtype, b.dtype, casting='equiv')) + c = a.astype(b.dtype, casting='equiv') + assert_equal(a == c, [True, True]) + + # Check that 'safe' casting can change byte order and up-cast + # fields + t = [('a', '<i8'), ('b', '>f8')] + assert_(np.can_cast(a.dtype, t, casting='safe')) + c = a.astype(t, casting='safe') + assert_equal((c == np.array([(5, 42), (10, 1)], dtype=t)), + [True, True]) + + # Check that 'same_kind' casting can change byte order and + # change field widths within a "kind" + t = [('a', '<i4'), ('b', '>f4')] + assert_(np.can_cast(a.dtype, t, casting='same_kind')) + c = a.astype(t, casting='same_kind') + assert_equal((c == 
np.array([(5, 42), (10, 1)], dtype=t)), + [True, True]) + + # Check that casting fails if the casting rule should fail on + # any of the fields + t = [('a', '>i8'), ('b', 'i2'), ('b', 'i8'), ('b', 'i4')] + assert_(not np.can_cast(a.dtype, t, casting=casting)) + t = [('a', '>i4'), ('b', '= 3: + assert_equal(r['col2'].dtype.kind, 'U') + assert_equal(r['col2'].dtype.itemsize, 12) + else: + assert_equal(r['col2'].dtype.kind, 'S') + assert_equal(r['col2'].dtype.itemsize, 3) + assert_equal(r['col3'].dtype.kind, 'f') def test_method_array(self): r = np.rec.array(asbytes('abcdefg') * 100, formats='i2,a3,i4', shape=3, byteorder='big') diff --git a/numpy/core/tests/test_regression.py b/numpy/core/tests/test_regression.py index 9f40d7b54023..431f80534dc1 100644 --- a/numpy/core/tests/test_regression.py +++ b/numpy/core/tests/test_regression.py @@ -181,7 +181,7 @@ def test_endian_bool_indexing(self,level=rlevel): assert_(np.all(b[yb] > 0.5)) def test_endian_where(self,level=rlevel): - """GitHuB issue #369""" + """GitHub issue #369""" net = np.zeros(3, dtype='>f4') net[1] = 0.00458849 net[2] = 0.605202 @@ -290,7 +290,7 @@ def test_unicode_string_comparison(self,level=rlevel): def test_tobytes_FORTRANORDER_discontiguous(self,level=rlevel): """Fix in r2836""" - # Create discontiguous Fortran-ordered array + # Create non-contiguous Fortran ordered array x = np.array(np.random.rand(3, 3), order='F')[:, :2] assert_array_almost_equal(x.ravel(), np.fromstring(x.tobytes())) @@ -311,7 +311,7 @@ def bfb(): x[:] = np.arange(3, dtype=float) self.assertRaises(ValueError, bfb) def test_nonarray_assignment(self): - # See also Issue gh-2870, test for nonarray assignment + # See also Issue gh-2870, test for non-array assignment # and equivalent unsafe casted array assignment a = np.arange(10) b = np.ones(10, dtype=bool) @@ -398,6 +398,41 @@ def __getitem__(self, key): assert_raises(KeyError, np.lexsort, BuggySequence()) + def test_pickle_py2_bytes_encoding(self): + # Check that arrays and scalars pickled on Py2 are + # unpickleable on Py3 using encoding='bytes' + + test_data = [ + # (original, py2_pickle) + (np.unicode_('\u6f2c'), + asbytes("cnumpy.core.multiarray\nscalar\np0\n(cnumpy\ndtype\np1\n" + "(S'U1'\np2\nI0\nI1\ntp3\nRp4\n(I3\nS'<'\np5\nNNNI4\nI4\n" + "I0\ntp6\nbS',o\\x00\\x00'\np7\ntp8\nRp9\n.")), + + (np.array([9e123], dtype=np.float64), + asbytes("cnumpy.core.multiarray\n_reconstruct\np0\n(cnumpy\nndarray\n" + "p1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I1\ntp6\ncnumpy\ndtype\n" + "p7\n(S'f8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\n" + "I0\ntp12\nbI00\nS'O\\x81\\xb7Z\\xaa:\\xabY'\np13\ntp14\nb.")), + + (np.array([(9e123,)], dtype=[('name', float)]), + asbytes("cnumpy.core.multiarray\n_reconstruct\np0\n(cnumpy\nndarray\np1\n" + "(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I1\ntp6\ncnumpy\ndtype\np7\n" + "(S'V8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'|'\np11\nN(S'name'\np12\ntp13\n" + "(dp14\ng12\n(g7\n(S'f8'\np15\nI0\nI1\ntp16\nRp17\n(I3\nS'<'\np18\nNNNI-1\n" + "I-1\nI0\ntp19\nbI0\ntp20\nsI8\nI1\nI0\ntp21\n" + "bI00\nS'O\\x81\\xb7Z\\xaa:\\xabY'\np22\ntp23\nb.")), + ] + + if sys.version_info[:2] >= (3, 4): + # encoding='bytes' was added in Py3.4 + for original, data in test_data: + result = pickle.loads(data, encoding='bytes') + assert_equal(result, original) + + if isinstance(result, np.ndarray) and result.dtype.names: + for name in result.dtype.names: + assert_(isinstance(name, str)) def test_pickle_dtype(self,level=rlevel): """Ticket #251""" @@ -560,7 +595,7 @@ def test_reshape_zero_strides(self, level=rlevel): 
assert_(a.reshape(5, 1).strides[0] == 0) def test_reshape_zero_size(self, level=rlevel): - """Github Issue #2700, setting shape failed for 0-sized arrays""" + """GitHub Issue #2700, setting shape failed for 0-sized arrays""" a = np.ones((0, 2)) a.shape = (-1, 2) @@ -568,7 +603,7 @@ def test_reshape_zero_size(self, level=rlevel): # With NPY_RELAXED_STRIDES_CHECKING the test becomes superfluous. @dec.skipif(np.ones(1).strides[0] == np.iinfo(np.intp).max) def test_reshape_trailing_ones_strides(self): - # Github issue gh-2949, bad strides for trailing ones of new shape + # GitHub issue gh-2949, bad strides for trailing ones of new shape a = np.zeros(12, dtype=np.int32)[::2] # not contiguous strides_c = (16, 8, 8, 8) strides_f = (8, 24, 48, 48) @@ -756,8 +791,12 @@ def test_bool_indexing_invalid_nr_elements(self, level=rlevel): s = np.ones(10, dtype=float) x = np.array((15,), dtype=float) def ia(x, s, v): x[(s>0)]=v - self.assertRaises(ValueError, ia, x, s, np.zeros(9, dtype=float)) - self.assertRaises(ValueError, ia, x, s, np.zeros(11, dtype=float)) + # After removing deprecation, the following are ValueErrors. + # This might seem odd as compared to the value error below. This + # is due to the fact that the new code always uses "nonzero" logic + # and the boolean special case is not taken. + self.assertRaises(IndexError, ia, x, s, np.zeros(9, dtype=float)) + self.assertRaises(IndexError, ia, x, s, np.zeros(11, dtype=float)) # Old special case (different code path): self.assertRaises(ValueError, ia, x.flat, s, np.zeros(9, dtype=float)) @@ -844,7 +883,7 @@ def test_object_array_refcounting(self, level=rlevel): cnt0_b = cnt(b) cnt0_c = cnt(c) - # -- 0d -> 1d broadcasted slice assignment + # -- 0d -> 1-d broadcast slice assignment arr = np.zeros(5, dtype=np.object_) @@ -861,7 +900,7 @@ def test_object_array_refcounting(self, level=rlevel): del arr - # -- 1d -> 2d broadcasted slice assignment + # -- 1-d -> 2-d broadcast slice assignment arr = np.zeros((5, 2), dtype=np.object_) arr0 = np.zeros(2, dtype=np.object_) @@ -880,7 +919,7 @@ def test_object_array_refcounting(self, level=rlevel): del arr, arr0 - # -- 2d copying + flattening + # -- 2-d copying + flattening arr = np.zeros((5, 2), dtype=np.object_) @@ -1025,8 +1064,8 @@ def test_compress_small_type(self, level=rlevel): b = np.zeros((2, 1), dtype = np.single) try: a.compress([True, False], axis = 1, out = b) - raise AssertionError("compress with an out which cannot be " \ - "safely casted should not return "\ + raise AssertionError("compress with an out which cannot be " + "safely casted should not return " "successfully") except TypeError: pass @@ -1794,6 +1833,67 @@ def test_pickle_bytes_overwrite(self): bytestring = "\x01 ".encode('ascii') assert_equal(bytestring[0:1], '\x01'.encode('ascii')) + def test_pickle_py2_array_latin1_hack(self): + # Check that unpickling hacks in Py3 that support + # encoding='latin1' work correctly. 
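Aside: the same machinery can be exercised without the captured Python 2 payloads above; a minimal sketch (Python 3.4+, where pickle.loads accepts the encoding keyword):

    import pickle
    import numpy as np

    arr = np.array([9e123], dtype=np.float64)
    payload = pickle.dumps(arr, protocol=2)  # protocol 2 is what Python 2 wrote
    # encoding='bytes' maps old str objects to bytes instead of guessing a
    # text codec, which is what keeps numpy pickles intact across 2 -> 3.
    restored = pickle.loads(payload, encoding='bytes')
    assert np.array_equal(restored, arr)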
+ + # Python2 output for pickle.dumps(numpy.array([129], dtype='b')) + data = asbytes("cnumpy.core.multiarray\n_reconstruct\np0\n(cnumpy\nndarray\np1\n(I0\n" + "tp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I1\ntp6\ncnumpy\ndtype\np7\n(S'i1'\np8\n" + "I0\nI1\ntp9\nRp10\n(I3\nS'|'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x81'\n" + "p13\ntp14\nb.") + if sys.version_info[0] >= 3: + # This should work: + result = pickle.loads(data, encoding='latin1') + assert_array_equal(result, np.array([129], dtype='b')) + # Should not segfault: + assert_raises(Exception, pickle.loads, data, encoding='koi8-r') + + def test_pickle_py2_scalar_latin1_hack(self): + # Check that scalar unpickling hack in Py3 that supports + # encoding='latin1' work correctly. + + # Python2 output for pickle.dumps(...) + datas = [ + # (original, python2_pickle, koi8r_validity) + (np.unicode_('\u6bd2'), + asbytes("cnumpy.core.multiarray\nscalar\np0\n(cnumpy\ndtype\np1\n" + "(S'U1'\np2\nI0\nI1\ntp3\nRp4\n(I3\nS'<'\np5\nNNNI4\nI4\nI0\n" + "tp6\nbS'\\xd2k\\x00\\x00'\np7\ntp8\nRp9\n."), + 'invalid'), + + (np.float64(9e123), + asbytes("cnumpy.core.multiarray\nscalar\np0\n(cnumpy\ndtype\np1\n(S'f8'\n" + "p2\nI0\nI1\ntp3\nRp4\n(I3\nS'<'\np5\nNNNI-1\nI-1\nI0\ntp6\n" + "bS'O\\x81\\xb7Z\\xaa:\\xabY'\np7\ntp8\nRp9\n."), + 'invalid'), + + (np.bytes_(asbytes('\x9c')), # different 8-bit code point in KOI8-R vs latin1 + asbytes("cnumpy.core.multiarray\nscalar\np0\n(cnumpy\ndtype\np1\n(S'S1'\np2\n" + "I0\nI1\ntp3\nRp4\n(I3\nS'|'\np5\nNNNI1\nI1\nI0\ntp6\nbS'\\x9c'\np7\n" + "tp8\nRp9\n."), + 'different'), + ] + if sys.version_info[0] >= 3: + for original, data, koi8r_validity in datas: + result = pickle.loads(data, encoding='latin1') + assert_equal(result, original) + + # Decoding under non-latin1 encoding (e.g.) KOI8-R can + # produce bad results, but should not segfault. 
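Aside: why the codec choice matters is visible with plain bytes, no pickling involved:

    # The same byte decodes to different code points under different
    # 8-bit encodings; that is the 'different' case listed above.
    raw = b'\x9c'
    assert raw.decode('latin1') != raw.decode('koi8_r')
    # latin1 is the safe fallback: it maps every byte 0x00-0xff to the
    # code point with the same value, so decoding can never fail.
    assert raw.decode('latin1') == '\x9c'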
+ if koi8r_validity == 'different': + # Unicode code points happen to lie within latin1, + # but are different in koi8-r, resulting to silent + # bogus results + result = pickle.loads(data, encoding='koi8-r') + assert_(result != original) + elif koi8r_validity == 'invalid': + # Unicode code points outside latin1, so results + # to an encoding exception + assert_raises(ValueError, pickle.loads, data, encoding='koi8-r') + else: + raise ValueError(koi8r_validity) + def test_structured_type_to_object(self): a_rec = np.array([(0, 1), (3, 2)], dtype='i4,i8') a_obj = np.empty((2,), dtype=object) @@ -2003,6 +2103,13 @@ def __eq__(self, other): assert_equal(np.int32(10) == x, "OK") assert_equal(np.array([10]) == x, "OK") + def test_pickle_empty_string(self): + # gh-3926 + + import pickle + test_string = np.string_('') + assert_equal(pickle.loads(pickle.dumps(test_string)), test_string) + if __name__ == "__main__": run_module_suite() diff --git a/numpy/core/tests/test_scalarmath.py b/numpy/core/tests/test_scalarmath.py index d823e963f77c..afdc06c03d8e 100644 --- a/numpy/core/tests/test_scalarmath.py +++ b/numpy/core/tests/test_scalarmath.py @@ -83,6 +83,18 @@ def test_blocked(self): np.add(1, inp2, out=out) assert_almost_equal(out, exp1, err_msg=msg) + def test_lower_align(self): + # check data that is not aligned to element size + # i.e doubles are aligned to 4 bytes on i386 + d = np.zeros(23 * 8, dtype=np.int8)[4:-4].view(np.float64) + o = np.zeros(23 * 8, dtype=np.int8)[4:-4].view(np.float64) + assert_almost_equal(d + d, d * 2) + np.add(d, d, out=o) + np.add(np.ones_like(d), d, out=o) + np.add(d, np.ones_like(d), out=o) + np.add(np.ones_like(d), d) + np.add(d, np.ones_like(d)) + class TestPower(TestCase): def test_small_types(self): diff --git a/numpy/core/tests/test_ufunc.py b/numpy/core/tests/test_ufunc.py index 080606dce1e5..5a883f4bc73a 100644 --- a/numpy/core/tests/test_ufunc.py +++ b/numpy/core/tests/test_ufunc.py @@ -14,6 +14,10 @@ def test_pickle(self): import pickle assert pickle.loads(pickle.dumps(np.sin)) is np.sin + # Check that ufunc not defined in the top level numpy namespace such as + # numpy.core.test_rational.test_add can also be pickled + assert pickle.loads(pickle.dumps(test_add)) is test_add + def test_pickle_withstring(self): import pickle astring = asbytes("cnumpy.core\n_ufunc_reconstruct\np0\n" @@ -647,7 +651,6 @@ class MyArray(np.ndarray): a = np.array(1).view(MyArray) assert_(type(np.any(a)) is MyArray) - def test_casting_out_param(self): # Test that it's possible to do casts on output a = np.ones((200, 100), np.int64) @@ -1087,5 +1090,64 @@ def test_inplace_fancy_indexing(self): self.assertRaises(TypeError, np.add.at, values, [0, 1], 1) assert_array_equal(values, np.array(['a', 1], dtype=np.object)) + def test_reduce_arguments(self): + f = np.add.reduce + d = np.ones((5,2), dtype=int) + o = np.ones((2,), dtype=d.dtype) + r = o * 5 + assert_equal(f(d), r) + # a, axis=0, dtype=None, out=None, keepdims=False + assert_equal(f(d, axis=0), r) + assert_equal(f(d, 0), r) + assert_equal(f(d, 0, dtype=None), r) + assert_equal(f(d, 0, dtype='i'), r) + assert_equal(f(d, 0, 'i'), r) + assert_equal(f(d, 0, None), r) + assert_equal(f(d, 0, None, out=None), r) + assert_equal(f(d, 0, None, out=o), r) + assert_equal(f(d, 0, None, o), r) + assert_equal(f(d, 0, None, None), r) + assert_equal(f(d, 0, None, None, keepdims=False), r) + assert_equal(f(d, 0, None, None, True), r.reshape((1,) + r.shape)) + # multiple keywords + assert_equal(f(d, axis=0, dtype=None, out=None, keepdims=False), r) 
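Aside: the spellings being checked all resolve against reduce's signature, (a, axis=0, dtype=None, out=None, keepdims=False); a short sketch of equivalent calls:

    import numpy as np

    d = np.ones((5, 2), dtype=int)
    r1 = np.add.reduce(d, 0, None, None, False)    # fully positional
    r2 = np.add.reduce(d, axis=0, keepdims=False)  # keyword form
    assert np.array_equal(r1, r2) and r1.tolist() == [5, 5]
    # keepdims=True retains the reduced axis with length one.
    assert np.add.reduce(d, 0, None, None, True).shape == (1, 2)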
+ assert_equal(f(d, 0, dtype=None, out=None, keepdims=False), r) + assert_equal(f(d, 0, None, out=None, keepdims=False), r) + + # too little + assert_raises(TypeError, f) + # too much + assert_raises(TypeError, f, d, 0, None, None, False, 1) + # invalid axis + assert_raises(TypeError, f, d, "invalid") + assert_raises(TypeError, f, d, axis="invalid") + assert_raises(TypeError, f, d, axis="invalid", dtype=None, + keepdims=True) + # invalid dtype + assert_raises(TypeError, f, d, 0, "invalid") + assert_raises(TypeError, f, d, dtype="invalid") + assert_raises(TypeError, f, d, dtype="invalid", out=None) + # invalid out + assert_raises(TypeError, f, d, 0, None, "invalid") + assert_raises(TypeError, f, d, out="invalid") + assert_raises(TypeError, f, d, out="invalid", dtype=None) + # keepdims boolean, no invalid value + # assert_raises(TypeError, f, d, 0, None, None, "invalid") + # assert_raises(TypeError, f, d, keepdims="invalid", axis=0, dtype=None) + # invalid mix + assert_raises(TypeError, f, d, 0, keepdims="invalid", dtype="invalid", + out=None) + + # invalid keyord + assert_raises(TypeError, f, d, 0, keepdims=True, invalid="invalid", + out=None) + assert_raises(TypeError, f, d, invalid=0) + assert_raises(TypeError, f, d, axis=0, dtype=None, keepdims=True, + out=None, invalid=0) + assert_raises(TypeError, f, d, axis=0, dtype=None, + out=None, invalid=0) + assert_raises(TypeError, f, d, axis=0, dtype=None, invalid=0) + + if __name__ == "__main__": run_module_suite() diff --git a/numpy/core/tests/test_umath.py b/numpy/core/tests/test_umath.py index b3ddc239813b..5f195333a2bc 100644 --- a/numpy/core/tests/test_umath.py +++ b/numpy/core/tests/test_umath.py @@ -36,6 +36,36 @@ def test_e(self): def test_euler_gamma(self): assert_allclose(ncu.euler_gamma, 0.5772156649015329, 1e-15) +class TestOut(TestCase): + def test_out_subok(self): + for b in (True, False): + aout = np.array(0.5) + + r = np.add(aout, 2, out=aout) + assert_(r is aout) + assert_array_equal(r, aout) + + r = np.add(aout, 2, out=aout, subok=b) + assert_(r is aout) + assert_array_equal(r, aout) + + r = np.add(aout, 2, aout, subok=False) + assert_(r is aout) + assert_array_equal(r, aout) + + d = np.ones(5) + o1 = np.zeros(5) + o2 = np.zeros(5, dtype=np.int32) + r1, r2 = np.frexp(d, o1, o2, subok=b) + assert_(r1 is o1) + assert_array_equal(r1, o1) + assert_(r2 is o2) + assert_array_equal(r2, o2) + + r1, r2 = np.frexp(d, out=o1, subok=b) + assert_(r1 is o1) + assert_array_equal(r1, o1) + class TestDivision(TestCase): def test_division_int(self): @@ -333,11 +363,10 @@ def test_log1p(self): assert_almost_equal(ncu.log1p(1e-6), ncu.log(1+1e-6)) def test_special(self): - assert_equal(ncu.log1p(np.nan), np.nan) - assert_equal(ncu.log1p(np.inf), np.inf) - with np.errstate(divide="ignore"): + with np.errstate(invalid="ignore", divide="ignore"): + assert_equal(ncu.log1p(np.nan), np.nan) + assert_equal(ncu.log1p(np.inf), np.inf) assert_equal(ncu.log1p(-1.), -np.inf) - with np.errstate(invalid="ignore"): assert_equal(ncu.log1p(-2.), np.nan) assert_equal(ncu.log1p(-np.inf), np.nan) @@ -753,6 +782,13 @@ def test_minmax_blocked(self): inp[i] = -1e10 assert_equal(inp.min(), -1e10, err_msg=msg) + def test_lower_align(self): + # check data that is not aligned to element size + # i.e doubles are aligned to 4 bytes on i386 + d = np.zeros(23 * 8, dtype=np.int8)[4:-4].view(np.float64) + assert_equal(d.max(), d[0]) + assert_equal(d.min(), d[0]) + class TestAbsoluteNegative(TestCase): def test_abs_neg_blocked(self): @@ -785,6 +821,17 @@ def 
test_abs_neg_blocked(self): np.negative(inp, out=out) assert_array_equal(out, -1*inp, err_msg=msg) + def test_lower_align(self): + # check data that is not aligned to element size + # i.e doubles are aligned to 4 bytes on i386 + d = np.zeros(23 * 8, dtype=np.int8)[4:-4].view(np.float64) + assert_equal(np.abs(d), d) + assert_equal(np.negative(d), -d) + np.negative(d, out=d) + np.negative(np.ones_like(d), out=d) + np.abs(d, out=d) + np.abs(np.ones_like(d), out=d) + class TestSpecialMethods(TestCase): def test_wrap(self): @@ -944,6 +991,7 @@ def __array__(self): assert_equal(ncu.maximum(a, B()), 0) assert_equal(ncu.maximum(a, C()), 0) + @dec.skipif(True) # ufunc override disabled for 1.9 def test_ufunc_override(self): class A(object): def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs): @@ -970,6 +1018,7 @@ def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs): assert_equal(res0[5], {}) assert_equal(res1[5], {}) + @dec.skipif(True) # ufunc override disabled for 1.9 def test_ufunc_override_mro(self): # Some multi arg functions for testing. @@ -1063,6 +1112,7 @@ def __numpy_ufunc__(self, func, method, pos, inputs, **kwargs): assert_raises(TypeError, four_mul_ufunc, 1, 2, c_sub, c) assert_raises(TypeError, four_mul_ufunc, 1, c, c_sub, c) + @dec.skipif(True) # ufunc override disabled for 1.9 def test_ufunc_override_methods(self): class A(object): def __numpy_ufunc__(self, ufunc, method, pos, inputs, **kwargs): @@ -1166,6 +1216,7 @@ def __numpy_ufunc__(self, ufunc, method, pos, inputs, **kwargs): assert_equal(res[3], 0) assert_equal(res[4], (a, [4, 2], 'b0')) + @dec.skipif(True) # ufunc override disabled for 1.9 def test_ufunc_override_out(self): class A(object): def __numpy_ufunc__(self, ufunc, method, pos, inputs, **kwargs): @@ -1200,6 +1251,7 @@ def __numpy_ufunc__(self, ufunc, method, pos, inputs, **kwargs): assert_equal(res7['out'][0], 'out0') assert_equal(res7['out'][1], 'out1') + @dec.skipif(True) # ufunc override disabled for 1.9 def test_ufunc_override_exception(self): class A(object): def __numpy_ufunc__(self, *a, **kwargs): diff --git a/numpy/distutils/ccompiler.py b/numpy/distutils/ccompiler.py index 8484685c0f97..c8cacff7f260 100644 --- a/numpy/distutils/ccompiler.py +++ b/numpy/distutils/ccompiler.py @@ -583,8 +583,8 @@ def gen_lib_options(compiler, library_dirs, runtime_library_dirs, libraries): # Also fix up the various compiler modules, which do # from distutils.ccompiler import gen_lib_options # Don't bother with mwerks, as we don't support Classic Mac. -for _cc in ['msvc', 'bcpp', 'cygwinc', 'emxc', 'unixc']: - _m = sys.modules.get('distutils.'+_cc+'compiler') +for _cc in ['msvc9', 'msvc', 'bcpp', 'cygwinc', 'emxc', 'unixc']: + _m = sys.modules.get('distutils.' + _cc + 'compiler') if _m is not None: setattr(_m, 'gen_lib_options', gen_lib_options) diff --git a/numpy/distutils/command/build_ext.py b/numpy/distutils/command/build_ext.py index b48e4227a03b..b75d19ec402a 100644 --- a/numpy/distutils/command/build_ext.py +++ b/numpy/distutils/command/build_ext.py @@ -46,10 +46,22 @@ def initialize_options(self): self.fcompiler = None def finalize_options(self): - incl_dirs = self.include_dirs + # Ensure that self.include_dirs and self.distribution.include_dirs + # refer to the same list object. finalize_options will modify + # self.include_dirs, but self.distribution.include_dirs is used + # during the actual build. + # self.include_dirs is None unless paths are specified with + # --include-dirs. 
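Aside: the --include-dirs handling described in this comment block boils down to a few lines; a sketch under stated assumptions (the helper name is hypothetical, not the real finalize_options):

    import os

    def normalize_include_dirs(include_dirs):
        # A --include-dirs value may arrive as one os.pathsep-joined string.
        if isinstance(include_dirs, str):
            include_dirs = include_dirs.split(os.pathsep)
        return include_dirs or []

    joined = os.pathsep.join(['/opt/foo/include', '/opt/bar/include'])
    assert normalize_include_dirs(joined) == ['/opt/foo/include',
                                              '/opt/bar/include']
    assert normalize_include_dirs(None) == []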
+ # The include paths will be passed to the compiler in the order: + # numpy paths, --include-dirs paths, Python include path. + if isinstance(self.include_dirs, str): + self.include_dirs = self.include_dirs.split(os.pathsep) + incl_dirs = self.include_dirs or [] + if self.distribution.include_dirs is None: + self.distribution.include_dirs = [] + self.include_dirs = self.distribution.include_dirs + self.include_dirs.extend(incl_dirs) old_build_ext.finalize_options(self) - if incl_dirs is not None: - self.include_dirs.extend(self.distribution.include_dirs or []) def run(self): if not self.extensions: diff --git a/numpy/distutils/command/config.py b/numpy/distutils/command/config.py index 0086e36328ca..1b688bdd67ad 100644 --- a/numpy/distutils/command/config.py +++ b/numpy/distutils/command/config.py @@ -59,17 +59,28 @@ def _check_compiler (self): e = get_exception() msg = """\ Could not initialize compiler instance: do you have Visual Studio -installed ? If you are trying to build with mingw, please use python setup.py -build -c mingw32 instead ). If you have Visual Studio installed, check it is -correctly installed, and the right version (VS 2008 for python 2.6, VS 2003 for -2.5, etc...). Original exception was: %s, and the Compiler -class was %s +installed? If you are trying to build with MinGW, please use "python setup.py +build -c mingw32" instead. If you have Visual Studio installed, check it is +correctly installed, and the right version (VS 2008 for python 2.6, 2.7 and 3.2, +VS 2010 for >= 3.3). + +Original exception was: %s, and the Compiler class was %s ============================================================================""" \ % (e, self.compiler.__class__.__name__) print ("""\ ============================================================================""") raise distutils.errors.DistutilsPlatformError(msg) + # After MSVC is initialized, add an explicit /MANIFEST to linker + # flags. See issues gh-4245 and gh-4101 for details. Also + # relevant are issues 4431 and 16296 on the Python bug tracker. 
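Aside: the linker-flag fix described above amounts to an idempotent append; a standalone sketch (the function name is hypothetical):

    def ensure_manifest(ldflags):
        # Ask the MSVC linker for a manifest file explicitly, without
        # duplicating the flag when the compiler is initialized twice.
        if '/MANIFEST' not in ldflags:
            ldflags.append('/MANIFEST')
        return ldflags

    assert ensure_manifest(['/DLL']) == ['/DLL', '/MANIFEST']
    assert ensure_manifest(['/DLL', '/MANIFEST']) == ['/DLL', '/MANIFEST']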
+ from distutils import msvc9compiler + if msvc9compiler.get_build_version() >= 10: + for ldflags in [self.compiler.ldflags_shared, + self.compiler.ldflags_shared_debug]: + if '/MANIFEST' not in ldflags: + ldflags.append('/MANIFEST') + if not isinstance(self.fcompiler, FCompiler): self.fcompiler = new_fcompiler(compiler=self.fcompiler, dry_run=self.dry_run, force=1, diff --git a/numpy/distutils/fcompiler/gnu.py b/numpy/distutils/fcompiler/gnu.py index b786c0a46927..368506470ad4 100644 --- a/numpy/distutils/fcompiler/gnu.py +++ b/numpy/distutils/fcompiler/gnu.py @@ -220,6 +220,9 @@ def _c_arch_flags(self): def get_flags_arch(self): return [] + def runtime_library_dir_option(self, dir): + return '-Wl,-rpath="%s"' % dir + class Gnu95FCompiler(GnuFCompiler): compiler_type = 'gnu95' compiler_aliases = ('gfortran',) @@ -252,12 +255,13 @@ def version_match(self, version_string): possible_executables = ['gfortran', 'f95'] executables = { 'version_cmd' : ["", "--version"], - 'compiler_f77' : [None, "-Wall", "-ffixed-form", + 'compiler_f77' : [None, "-Wall", "-g", "-ffixed-form", + "-fno-second-underscore"] + _EXTRAFLAGS, + 'compiler_f90' : [None, "-Wall", "-g", "-fno-second-underscore"] + _EXTRAFLAGS, - 'compiler_f90' : [None, "-Wall", "-fno-second-underscore"] + _EXTRAFLAGS, - 'compiler_fix' : [None, "-Wall", "-ffixed-form", + 'compiler_fix' : [None, "-Wall", "-g","-ffixed-form", "-fno-second-underscore"] + _EXTRAFLAGS, - 'linker_so' : ["", "-Wall"], + 'linker_so' : ["", "-Wall", "-g"], 'archiver' : ["ar", "-cr"], 'ranlib' : ["ranlib"], 'linker_exe' : [None, "-Wall"] diff --git a/numpy/distutils/fcompiler/intel.py b/numpy/distutils/fcompiler/intel.py index a80e525e3c7a..f76174c7a1d9 100644 --- a/numpy/distutils/fcompiler/intel.py +++ b/numpy/distutils/fcompiler/intel.py @@ -152,7 +152,7 @@ def update_executables(self): module_include_switch = '/I' def get_flags(self): - opt = ['/nologo', '/MD', '/nbs', '/Qlowercase', '/us'] + opt = ['/nologo', '/MD', '/nbs', '/names:lowercase', '/assume:underscore'] return opt def get_flags_free(self): diff --git a/numpy/distutils/system_info.py b/numpy/distutils/system_info.py index 48c92c548224..a05043055d84 100644 --- a/numpy/distutils/system_info.py +++ b/numpy/distutils/system_info.py @@ -10,6 +10,13 @@ atlas_blas_info atlas_blas_threads_info lapack_atlas_info + lapack_atlas_threads_info + atlas_3_10_info + atlas_3_10_threads_info + atlas_3_10_blas_info, + atlas_3_10_blas_threads_info, + lapack_atlas_3_10_info + lapack_atlas_3_10_threads_info blas_info lapack_info openblas_info @@ -302,6 +309,12 @@ def get_info(name, notfound_action=0): 'atlas_blas_threads': atlas_blas_threads_info, 'lapack_atlas': lapack_atlas_info, # use lapack_opt instead 'lapack_atlas_threads': lapack_atlas_threads_info, # ditto + 'atlas_3_10': atlas_3_10_info, # use lapack_opt or blas_opt instead + 'atlas_3_10_threads': atlas_3_10_threads_info, # ditto + 'atlas_3_10_blas': atlas_3_10_blas_info, + 'atlas_3_10_blas_threads': atlas_3_10_blas_threads_info, + 'lapack_atlas_3_10': lapack_atlas_3_10_info, # use lapack_opt instead + 'lapack_atlas_3_10_threads': lapack_atlas_3_10_threads_info, # ditto 'mkl': mkl_info, # openblas which may or may not have embedded lapack 'openblas': openblas_info, # use blas_opt instead @@ -1148,6 +1161,63 @@ class lapack_atlas_threads_info(atlas_threads_info): _lib_names = ['lapack_atlas'] + atlas_threads_info._lib_names +class atlas_3_10_info(atlas_info): + _lib_names = ['satlas'] + _lib_atlas = _lib_names + _lib_lapack = _lib_names + + +class 
atlas_3_10_blas_info(atlas_3_10_info): + _lib_names = ['satlas'] + + def calc_info(self): + lib_dirs = self.get_lib_dirs() + info = {} + atlas_libs = self.get_libs('atlas_libs', + self._lib_names) + atlas = self.check_libs2(lib_dirs, atlas_libs, []) + if atlas is None: + return + include_dirs = self.get_include_dirs() + h = (self.combine_paths(lib_dirs + include_dirs, 'cblas.h') or [None]) + h = h[0] + if h: + h = os.path.dirname(h) + dict_append(info, include_dirs=[h]) + info['language'] = 'c' + info['define_macros'] = [('HAVE_CBLAS', None)] + + atlas_version, atlas_extra_info = get_atlas_version(**atlas) + dict_append(atlas, **atlas_extra_info) + + dict_append(info, **atlas) + + self.set_info(**info) + return + + +class atlas_3_10_threads_info(atlas_3_10_info): + dir_env_var = ['PTATLAS', 'ATLAS'] + _lib_names = ['tatlas'] + #if sys.platfcorm[:7] == 'freebsd': + ## I don't think freebsd supports 3.10 at this time - 2014 + _lib_atlas = _lib_names + _lib_lapack = _lib_names + + +class atlas_3_10_blas_threads_info(atlas_3_10_blas_info): + dir_env_var = ['PTATLAS', 'ATLAS'] + _lib_names = ['tatlas'] + + +class lapack_atlas_3_10_info(atlas_3_10_info): + pass + + +class lapack_atlas_3_10_threads_info(atlas_3_10_threads_info): + pass + + class lapack_info(system_info): section = 'lapack' dir_env_var = 'LAPACK' @@ -1366,7 +1436,6 @@ def get_atlas_version(**config): return result - class lapack_opt_info(system_info): notfounderror = LapackNotFoundError @@ -1383,7 +1452,11 @@ def calc_info(self): self.set_info(**lapack_mkl_info) return - atlas_info = get_info('atlas_threads') + atlas_info = get_info('atlas_3_10_threads') + if not atlas_info: + atlas_info = get_info('atlas_3_10') + if not atlas_info: + atlas_info = get_info('atlas_threads') if not atlas_info: atlas_info = get_info('atlas') @@ -1400,7 +1473,7 @@ def calc_info(self): if os.path.exists('/System/Library/Frameworks' '/Accelerate.framework/'): if intel: - args.extend(['-msse3']) + args.extend(['-msse3', '-DAPPLE_ACCELERATE_SGEMV_PATCH']) else: args.extend(['-faltivec']) link_args.extend(['-Wl,-framework', '-Wl,Accelerate']) @@ -1480,7 +1553,11 @@ def calc_info(self): self.set_info(**openblas_info) return - atlas_info = get_info('atlas_blas_threads') + atlas_info = get_info('atlas_3_10_blas_threads') + if not atlas_info: + atlas_info = get_info('atlas_3_10_blas') + if not atlas_info: + atlas_info = get_info('atlas_blas_threads') if not atlas_info: atlas_info = get_info('atlas_blas') @@ -1497,7 +1574,7 @@ def calc_info(self): if os.path.exists('/System/Library/Frameworks' '/Accelerate.framework/'): if intel: - args.extend(['-msse3']) + args.extend(['-msse3', '-DAPPLE_ACCELERATE_SGEMV_PATCH']) else: args.extend(['-faltivec']) args.extend([ diff --git a/numpy/f2py/__main__.py b/numpy/f2py/__main__.py new file mode 100644 index 000000000000..11dbf5f52e88 --- /dev/null +++ b/numpy/f2py/__main__.py @@ -0,0 +1,23 @@ +# See http://cens.ioc.ee/projects/f2py2e/ +import os, sys +for mode in ["g3-numpy", "2e-numeric", "2e-numarray", "2e-numpy"]: + try: + i=sys.argv.index("--"+mode) + del sys.argv[i] + break + except ValueError: pass +os.environ["NO_SCIPY_IMPORT"]="f2py" +if mode=="g3-numpy": + sys.stderr.write("G3 f2py support is not implemented, yet.\\n") + sys.exit(1) +elif mode=="2e-numeric": + from f2py2e import main +elif mode=="2e-numarray": + sys.argv.append("-DNUMARRAY") + from f2py2e import main +elif mode=="2e-numpy": + from numpy.f2py import main +else: + sys.stderr.write("Unknown mode: " + repr(mode) + "\\n") + sys.exit(1) +main() diff 
--git a/numpy/f2py/crackfortran.py b/numpy/f2py/crackfortran.py index 8930811269c9..0fde37bcf92c 100755 --- a/numpy/f2py/crackfortran.py +++ b/numpy/f2py/crackfortran.py @@ -1274,7 +1274,7 @@ def markinnerspaces(line): cb='' for c in line: if cb=='\\' and c in ['\\', '\'', '"']: - l=l+c; + l=l+c cb=c continue if f==0 and c in ['\'', '"']: cc=c; cc1={'\'':'"','"':'\''}[c] @@ -2198,8 +2198,10 @@ def analyzevars(block): if 'intent' not in vars[n]: vars[n]['intent']=[] for c in [x.strip() for x in markoutercomma(intent).split('@,@')]: - if not c in vars[n]['intent']: - vars[n]['intent'].append(c) + # Remove spaces so that 'in out' becomes 'inout' + tmp = c.replace(' ', '') + if tmp not in vars[n]['intent']: + vars[n]['intent'].append(tmp) intent=None if note: note=note.replace('\\n\\n', '\n\n') @@ -2220,7 +2222,7 @@ def analyzevars(block): if 'check' not in vars[n]: vars[n]['check']=[] for c in [x.strip() for x in markoutercomma(check).split('@,@')]: - if not c in vars[n]['check']: + if c not in vars[n]['check']: vars[n]['check'].append(c) check=None if dim and 'dimension' not in vars[n]: diff --git a/numpy/f2py/setup.py b/numpy/f2py/setup.py index 2f1fd6a01507..c63ab059a3d8 100644 --- a/numpy/f2py/setup.py +++ b/numpy/f2py/setup.py @@ -29,6 +29,20 @@ from __version__ import version + +def _get_f2py_shebang(): + """ Return shebang line for f2py script + + If we are building a binary distribution format, then the shebang line + should be ``#!python`` rather than ``#!`` followed by the contents of + ``sys.executable``. + """ + if set(('bdist_wheel', 'bdist_egg', 'bdist_wininst', + 'bdist_rpm')).intersection(sys.argv): + return '#!python' + return '#!' + sys.executable + + def configuration(parent_package='',top_path=None): config = Configuration('f2py', parent_package, top_path) @@ -52,32 +66,10 @@ def generate_f2py_py(build_dir): if newer(__file__, target): log.info('Creating %s', target) f = open(target, 'w') - f.write('''\ -#!%s -# See http://cens.ioc.ee/projects/f2py2e/ -import os, sys -for mode in ["g3-numpy", "2e-numeric", "2e-numarray", "2e-numpy"]: - try: - i=sys.argv.index("--"+mode) - del sys.argv[i] - break - except ValueError: pass -os.environ["NO_SCIPY_IMPORT"]="f2py" -if mode=="g3-numpy": - sys.stderr.write("G3 f2py support is not implemented, yet.\\n") - sys.exit(1) -elif mode=="2e-numeric": - from f2py2e import main -elif mode=="2e-numarray": - sys.argv.append("-DNUMARRAY") - from f2py2e import main -elif mode=="2e-numpy": - from numpy.f2py import main -else: - sys.stderr.write("Unknown mode: " + repr(mode) + "\\n") - sys.exit(1) -main() -'''%(sys.executable)) + f.write(_get_f2py_shebang() + '\n') + mainloc = os.path.join(os.path.dirname(__file__), "__main__.py") + with open(mainloc) as mf: + f.write(mf.read()) f.close() return target diff --git a/numpy/f2py/tests/src/regression/inout.f90 b/numpy/f2py/tests/src/regression/inout.f90 new file mode 100644 index 000000000000..80cdad90cec5 --- /dev/null +++ b/numpy/f2py/tests/src/regression/inout.f90 @@ -0,0 +1,9 @@ +! Check that intent(in out) translates as intent(inout). +! The separation seems to be a common usage. 
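Aside: after the crackfortran change above, intent(in out) in this file parses identically to intent(inout); the normalization is plain space-stripping, sketched here:

    # Mirror of the analyzevars fix: spaces inside an intent specifier are
    # dropped before the duplicate check, so 'in out' equals 'inout'.
    seen = []
    for spec in ['inout', 'in out', 'in  out']:
        tmp = spec.replace(' ', '')
        if tmp not in seen:
            seen.append(tmp)
    assert seen == ['inout']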
+ subroutine foo(x) + implicit none + real(4), intent(in out) :: x + dimension x(3) + x(1) = x(1) + x(2) + x(3) + return + end diff --git a/numpy/f2py/tests/test_array_from_pyobj.py b/numpy/f2py/tests/test_array_from_pyobj.py index 3a148e72c735..c51fa39363e4 100644 --- a/numpy/f2py/tests/test_array_from_pyobj.py +++ b/numpy/f2py/tests/test_array_from_pyobj.py @@ -4,11 +4,13 @@ import os import sys import copy +import platform import nose from numpy.testing import * -from numpy import array, alltrue, ndarray, asarray, can_cast, zeros, dtype +from numpy import (array, alltrue, ndarray, asarray, can_cast, zeros, dtype, + intp, clongdouble) from numpy.core.multiarray import typeinfo import util @@ -81,37 +83,46 @@ def is_intent_exact(self,*names): intent = Intent() -class Type(object): - _type_names = ['BOOL', 'BYTE', 'UBYTE', 'SHORT', 'USHORT', 'INT', 'UINT', - 'LONG', 'ULONG', 'LONGLONG', 'ULONGLONG', - 'FLOAT', 'DOUBLE', 'LONGDOUBLE', 'CFLOAT', 'CDOUBLE', - 'CLONGDOUBLE'] - _type_cache = {} - - _cast_dict = {'BOOL':['BOOL']} - _cast_dict['BYTE'] = _cast_dict['BOOL'] + ['BYTE'] - _cast_dict['UBYTE'] = _cast_dict['BOOL'] + ['UBYTE'] - _cast_dict['BYTE'] = ['BYTE'] - _cast_dict['UBYTE'] = ['UBYTE'] - _cast_dict['SHORT'] = _cast_dict['BYTE'] + ['UBYTE', 'SHORT'] - _cast_dict['USHORT'] = _cast_dict['UBYTE'] + ['BYTE', 'USHORT'] - _cast_dict['INT'] = _cast_dict['SHORT'] + ['USHORT', 'INT'] - _cast_dict['UINT'] = _cast_dict['USHORT'] + ['SHORT', 'UINT'] - - _cast_dict['LONG'] = _cast_dict['INT'] + ['LONG'] - _cast_dict['ULONG'] = _cast_dict['UINT'] + ['ULONG'] - - _cast_dict['LONGLONG'] = _cast_dict['LONG'] + ['LONGLONG'] - _cast_dict['ULONGLONG'] = _cast_dict['ULONG'] + ['ULONGLONG'] - - _cast_dict['FLOAT'] = _cast_dict['SHORT'] + ['USHORT', 'FLOAT'] - _cast_dict['DOUBLE'] = _cast_dict['INT'] + ['UINT', 'FLOAT', 'DOUBLE'] - _cast_dict['LONGDOUBLE'] = _cast_dict['LONG'] + ['ULONG', 'FLOAT', 'DOUBLE', 'LONGDOUBLE'] - - _cast_dict['CFLOAT'] = _cast_dict['FLOAT'] + ['CFLOAT'] +_type_names = ['BOOL', 'BYTE', 'UBYTE', 'SHORT', 'USHORT', 'INT', 'UINT', + 'LONG', 'ULONG', 'LONGLONG', 'ULONGLONG', + 'FLOAT', 'DOUBLE', 'CFLOAT'] + +_cast_dict = {'BOOL':['BOOL']} +_cast_dict['BYTE'] = _cast_dict['BOOL'] + ['BYTE'] +_cast_dict['UBYTE'] = _cast_dict['BOOL'] + ['UBYTE'] +_cast_dict['BYTE'] = ['BYTE'] +_cast_dict['UBYTE'] = ['UBYTE'] +_cast_dict['SHORT'] = _cast_dict['BYTE'] + ['UBYTE', 'SHORT'] +_cast_dict['USHORT'] = _cast_dict['UBYTE'] + ['BYTE', 'USHORT'] +_cast_dict['INT'] = _cast_dict['SHORT'] + ['USHORT', 'INT'] +_cast_dict['UINT'] = _cast_dict['USHORT'] + ['SHORT', 'UINT'] + +_cast_dict['LONG'] = _cast_dict['INT'] + ['LONG'] +_cast_dict['ULONG'] = _cast_dict['UINT'] + ['ULONG'] + +_cast_dict['LONGLONG'] = _cast_dict['LONG'] + ['LONGLONG'] +_cast_dict['ULONGLONG'] = _cast_dict['ULONG'] + ['ULONGLONG'] + +_cast_dict['FLOAT'] = _cast_dict['SHORT'] + ['USHORT', 'FLOAT'] +_cast_dict['DOUBLE'] = _cast_dict['INT'] + ['UINT', 'FLOAT', 'DOUBLE'] + +_cast_dict['CFLOAT'] = _cast_dict['FLOAT'] + ['CFLOAT'] + +# 32 bit system malloc typically does not provide the alignment required by +# 16 byte long double types this means the inout intent cannot be satisfied and +# several tests fail as the alignment flag can be randomly true or fals +# when numpy gains an aligned allocator the tests could be enabled again +if ((intp().dtype.itemsize != 4 or clongdouble().dtype.alignment <= 8) and + sys.platform != 'win32'): + _type_names.extend(['LONGDOUBLE', 'CDOUBLE', 'CLONGDOUBLE']) + _cast_dict['LONGDOUBLE'] = 
_cast_dict['LONG'] + \ + ['ULONG', 'FLOAT', 'DOUBLE', 'LONGDOUBLE'] + _cast_dict['CLONGDOUBLE'] = _cast_dict['LONGDOUBLE'] + \ + ['CFLOAT', 'CDOUBLE', 'CLONGDOUBLE'] _cast_dict['CDOUBLE'] = _cast_dict['DOUBLE'] + ['CFLOAT', 'CDOUBLE'] - _cast_dict['CLONGDOUBLE'] = _cast_dict['LONGDOUBLE'] + ['CFLOAT', 'CDOUBLE', 'CLONGDOUBLE'] +class Type(object): + _type_cache = {} def __new__(cls, name): if isinstance(name, dtype): @@ -138,15 +149,15 @@ def _init(self, name): self.dtypechar = typeinfo[self.NAME][0] def cast_types(self): - return [self.__class__(_m) for _m in self._cast_dict[self.NAME]] + return [self.__class__(_m) for _m in _cast_dict[self.NAME]] def all_types(self): - return [self.__class__(_m) for _m in self._type_names] + return [self.__class__(_m) for _m in _type_names] def smaller_types(self): bits = typeinfo[self.NAME][3] types = [] - for name in self._type_names: + for name in _type_names: if typeinfo[name][3] < bits: types.append(Type(name)) return types @@ -532,7 +543,7 @@ def test_inplace_from_casttype(self): assert_(obj.dtype.type is self.type.dtype) # obj type is changed inplace! -for t in Type._type_names: +for t in _type_names: exec('''\ class test_%s_gen(unittest.TestCase, _test_shared_memory diff --git a/numpy/f2py/tests/test_regression.py b/numpy/f2py/tests/test_regression.py new file mode 100644 index 000000000000..9bd3f3fe302b --- /dev/null +++ b/numpy/f2py/tests/test_regression.py @@ -0,0 +1,32 @@ +from __future__ import division, absolute_import, print_function + +import os +import math + +import numpy as np +from numpy.testing import dec, assert_raises, assert_equal + +import util + +def _path(*a): + return os.path.join(*((os.path.dirname(__file__),) + a)) + +class TestIntentInOut(util.F2PyTest): + # Check that intent(in out) translates as intent(inout) + sources = [_path('src', 'regression', 'inout.f90')] + + @dec.slow + def test_inout(self): + # non-contiguous should raise error + x = np.arange(6, dtype=np.float32)[::2] + assert_raises(ValueError, self.module.foo, x) + + # check values with contiguous array + x = np.arange(3, dtype=np.float32) + self.module.foo(x) + assert_equal(x, [3, 1, 2]) + + +if __name__ == "__main__": + import nose + nose.runmodule() diff --git a/numpy/fft/tests/test_fftpack.py b/numpy/fft/tests/test_fftpack.py index ac892c83b8d6..45b5ac784ee9 100644 --- a/numpy/fft/tests/test_fftpack.py +++ b/numpy/fft/tests/test_fftpack.py @@ -48,11 +48,11 @@ def worker(args, q): for i in range(self.threads)] [x.start() for x in t] + [x.join() for x in t] # Make sure all threads returned the correct value for i in range(self.threads): assert_array_equal(q.get(timeout=5), expected, 'Function returned wrong value in multithreaded context') - [x.join() for x in t] def test_fft(self): a = np.ones(self.input_shape) * 1+0j diff --git a/numpy/lib/_iotools.py b/numpy/lib/_iotools.py index 1b1180893475..9108b2e4ce16 100644 --- a/numpy/lib/_iotools.py +++ b/numpy/lib/_iotools.py @@ -687,7 +687,7 @@ def __call__(self, value): def upgrade(self, value): """ - Rind the best converter for a given string, and return the result. + Find the best converter for a given string, and return the result. The supplied string `value` is converted by testing different converters in order. First the `func` method of the diff --git a/numpy/lib/format.py b/numpy/lib/format.py index 7c8dfbafa84a..b93f86ca3bf2 100644 --- a/numpy/lib/format.py +++ b/numpy/lib/format.py @@ -298,7 +298,8 @@ def _write_array_header(fp, d, version=None): # can take advantage of our premature optimization.
current_header_len = MAGIC_LEN + 2 + len(header) + 1 # 1 for the newline topad = 16 - (current_header_len % 16) - header = asbytes(header + ' '*topad + '\n') + header = header + ' '*topad + '\n' + header = asbytes(_filter_header(header)) if len(header) >= (256*256) and version == (1, 0): raise ValueError("header does not fit inside %s bytes required by the" @@ -433,7 +434,7 @@ def _filter_header(s): from io import StringIO else: from StringIO import StringIO - + tokens = [] last_token_was_number = False for token in tokenize.generate_tokens(StringIO(asstr(s)).read): @@ -448,7 +449,7 @@ def _filter_header(s): last_token_was_number = (token_type == tokenize.NUMBER) return tokenize.untokenize(tokens) - + def _read_array_header(fp, version): """ see read_array_header_1_0 diff --git a/numpy/lib/function_base.py b/numpy/lib/function_base.py index 0a1d05f77488..bc73acd6696d 100644 --- a/numpy/lib/function_base.py +++ b/numpy/lib/function_base.py @@ -337,6 +337,11 @@ def histogramdd(sample, bins=10, range=None, normed=False, weights=None): smin[i] = smin[i] - .5 smax[i] = smax[i] + .5 + # avoid rounding issues for comparisons when dealing with inexact types + if np.issubdtype(sample.dtype, np.inexact): + edge_dt = sample.dtype + else: + edge_dt = float # Create edge arrays for i in arange(D): if isscalar(bins[i]): @@ -345,9 +350,9 @@ def histogramdd(sample, bins=10, range=None, normed=False, weights=None): "Element at index %s in `bins` should be a positive " "integer." % i) nbin[i] = bins[i] + 2 # +2 for outlier bins - edges[i] = linspace(smin[i], smax[i], nbin[i]-1) + edges[i] = linspace(smin[i], smax[i], nbin[i]-1, dtype=edge_dt) else: - edges[i] = asarray(bins[i], float) + edges[i] = asarray(bins[i], edge_dt) nbin[i] = len(edges[i]) + 1 # +1 for outlier bins dedges[i] = diff(edges[i]) if np.any(np.asarray(dedges[i]) <= 0): @@ -878,28 +883,33 @@ def copy(a, order='K'): # Basic operations -def gradient(f, *varargs): +def gradient(f, *varargs, **kwargs): """ Return the gradient of an N-dimensional array. - + The gradient is computed using second order accurate central differences - in the interior and second order accurate one-sides (forward or backwards) - differences at the boundaries. The returned gradient hence has the same - shape as the input array. + in the interior and either first differences or second order accurate + one-sided (forward or backwards) differences at the boundaries. The + returned gradient hence has the same shape as the input array. Parameters ---------- f : array_like - An N-dimensional array containing samples of a scalar function. - `*varargs` : scalars - 0, 1, or N scalars specifying the sample distances in each direction, - that is: `dx`, `dy`, `dz`, ... The default distance is 1. + An N-dimensional array containing samples of a scalar function. + varargs : list of scalar, optional + N scalars specifying the sample distances for each dimension, + i.e. `dx`, `dy`, `dz`, ... Default distance: 1. + edge_order : {1, 2}, optional + Gradient is calculated using N\ :sup:`th` order accurate differences + at the boundaries. Default: 1. + + .. versionadded:: 1.9.1 Returns ------- gradient : ndarray - N arrays of the same shape as `f` giving the derivative of `f` with - respect to each dimension. + N arrays of the same shape as `f` giving the derivative of `f` with + respect to each dimension.
Examples -------- @@ -911,15 +921,14 @@ def gradient(f, *varargs): >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)) [array([[ 2., 2., -1.], - [ 2., 2., -1.]]), - array([[ 1. , 2.5, 4. ], - [ 1. , 1. , 1. ]])] + [ 2., 2., -1.]]), array([[ 1. , 2.5, 4. ], + [ 1. , 1. , 1. ]])] - >>> x = np.array([0,1,2,3,4]) - >>> dx = gradient(x) + >>> x = np.array([0, 1, 2, 3, 4]) + >>> dx = np.gradient(x) >>> y = x**2 - >>> gradient(y,dx) - array([0., 2., 4., 6., 8.]) + >>> np.gradient(y, dx, edge_order=2) + array([-0., 2., 4., 6., 8.]) """ f = np.asanyarray(f) N = len(f.shape) # number of dimensions @@ -934,6 +943,13 @@ def gradient(f, *varargs): raise SyntaxError( "invalid number of arguments") + edge_order = kwargs.pop('edge_order', 1) + if kwargs: + raise TypeError('"{}" are not valid keyword arguments.'.format( + '", "'.join(kwargs.keys()))) + if edge_order > 2: + raise ValueError("'edge_order' greater than 2 not supported") + # use central differences on interior and one-sided differences on the # endpoints. This preserves second order-accuracy over the full domain. @@ -973,7 +989,7 @@ def gradient(f, *varargs): "at least two elements are required.") # Numerical differentiation: 1st order edges, 2nd order interior - if y.shape[axis] == 2: + if y.shape[axis] == 2 or edge_order == 1: # Use first order differences for time data out = np.empty_like(y, dtype=otype) @@ -1021,7 +1037,8 @@ def gradient(f, *varargs): out[slice1] = (3.0*y[slice2] - 4.0*y[slice3] + y[slice4])/2.0 # divide by step size - outvals.append(out / dx[axis]) + out /= dx[axis] + outvals.append(out) # reset the slice object in this dimension to ":" slice1[axis] = slice(None) @@ -2998,7 +3015,7 @@ def percentile(a, q, axis=None, out=None, nearest neighbors as well as the `interpolation` parameter will determine the percentile if the normalized ranking does not match q exactly. This function is the same as the median if ``q=50``, the same - as the minimum if ``q=0``and the same as the maximum if ``q=100``. + as the minimum if ``q=0`` and the same as the maximum if ``q=100``. Examples -------- @@ -3031,7 +3048,7 @@ def percentile(a, q, axis=None, out=None, array([ 3.5]) """ - q = asarray(q, dtype=np.float64) + q = array(q, dtype=np.float64, copy=True) r, k = _ureduce(a, func=_percentile, q=q, axis=axis, out=out, overwrite_input=overwrite_input, interpolation=interpolation) @@ -3758,7 +3775,9 @@ def insert(arr, obj, values, axis=None): if (index < 0): index += N - values = array(values, copy=False, ndmin=arr.ndim) + # There are some object array corner cases here, but we cannot avoid + # that: + values = array(values, copy=False, ndmin=arr.ndim, dtype=arr.dtype) if indices.ndim == 0: # broadcasting is very different here, since a[:,0,:] = ... behaves # very different from a[:,[0],:] = ...! This changes values so that diff --git a/numpy/lib/npyio.py b/numpy/lib/npyio.py index fe855a71a38d..d98d5a9577b2 100644 --- a/numpy/lib/npyio.py +++ b/numpy/lib/npyio.py @@ -37,52 +37,6 @@ ] -def seek_gzip_factory(f): - """Use this factory to produce the class so that we can do a lazy - import on gzip. 
- - """ - import gzip - - class GzipFile(gzip.GzipFile): - - def seek(self, offset, whence=0): - # figure out new position (we can only seek forwards) - if whence == 1: - offset = self.offset + offset - - if whence not in [0, 1]: - raise IOError("Illegal argument") - - if offset < self.offset: - # for negative seek, rewind and do positive seek - self.rewind() - count = offset - self.offset - for i in range(count // 1024): - self.read(1024) - self.read(count % 1024) - - def tell(self): - return self.offset - - if isinstance(f, str): - f = GzipFile(f) - elif isinstance(f, gzip.GzipFile): - # cast to our GzipFile if its already a gzip.GzipFile - - try: - name = f.name - except AttributeError: - # Backward compatibility for <= 2.5 - name = f.filename - mode = f.mode - - f = GzipFile(fileobj=f.fileobj, filename=name) - f.mode = mode - - return f - - class BagObj(object): """ BagObj(obj) @@ -288,8 +242,7 @@ def load(file, mmap_mode=None): Parameters ---------- file : file-like object or string - The file to read. Compressed files with the filename extension - ``.gz`` are acceptable. File-like objects must support the + The file to read. File-like objects must support the ``seek()`` and ``read()`` methods. Pickled files require that the file-like object support the ``readline()`` method as well. mmap_mode : {None, 'r+', 'r', 'w+', 'c'}, optional @@ -369,8 +322,6 @@ def load(file, mmap_mode=None): if isinstance(file, basestring): fid = open(file, "rb") own_fid = True - elif isinstance(file, gzip.GzipFile): - fid = seek_gzip_factory(file) else: fid = file @@ -718,7 +669,8 @@ def loadtxt(fname, dtype=float, comments='#', delimiter=None, """ # Type conversions for Py3 convenience - comments = asbytes(comments) + if comments is not None: + comments = asbytes(comments) user_converters = converters if delimiter is not None: delimiter = asbytes(delimiter) @@ -730,7 +682,8 @@ def loadtxt(fname, dtype=float, comments='#', delimiter=None, if _is_string_like(fname): fown = True if fname.endswith('.gz'): - fh = iter(seek_gzip_factory(fname)) + import gzip + fh = iter(gzip.GzipFile(fname)) elif fname.endswith('.bz2'): import bz2 fh = iter(bz2.BZ2File(fname)) @@ -791,7 +744,10 @@ def pack_items(items, packing): def split_line(line): """Chop off comments, strip, and split at delimiter.""" - line = asbytes(line).split(comments)[0].strip(asbytes('\r\n')) + if comments is None: + line = asbytes(line).strip(asbytes('\r\n')) + else: + line = asbytes(line).split(comments)[0].strip(asbytes('\r\n')) if line: return line.split(delimiter) else: @@ -1519,7 +1475,9 @@ def genfromtxt(fname, dtype=float, comments='#', delimiter=None, # Process the filling_values ............................... 
# Rename the input for convenience - user_filling_values = filling_values or [] + user_filling_values = filling_values + if user_filling_values is None: + user_filling_values = [] # Define the default filling_values = [None] * nbcols # We have a dictionary : update each entry individually @@ -1574,22 +1532,25 @@ def genfromtxt(fname, dtype=float, comments='#', delimiter=None, for (miss, fill) in zipit] # Update the converters to use the user-defined ones uc_update = [] - for (i, conv) in user_converters.items(): + for (j, conv) in user_converters.items(): # If the converter is specified by column names, use the index instead - if _is_string_like(i): + if _is_string_like(j): try: - i = names.index(i) + j = names.index(j) + i = j except ValueError: continue elif usecols: try: - i = usecols.index(i) + i = usecols.index(j) except ValueError: # Unused converter specified continue - # Find the value to test: + else: + i = j + # Find the value to test - first_line is not filtered by usecols: if len(first_line): - testing_value = first_values[i] + testing_value = first_values[j] else: testing_value = None converters[i].update(conv, locked=True, diff --git a/numpy/lib/tests/test_format.py b/numpy/lib/tests/test_format.py index b266f1c1586b..ee77386bcc14 100644 --- a/numpy/lib/tests/test_format.py +++ b/numpy/lib/tests/test_format.py @@ -688,28 +688,28 @@ def test_bad_header(): def test_large_file_support(): from nose import SkipTest + if (sys.platform == 'win32' or sys.platform == 'cygwin'): + raise SkipTest("Unknown if Windows has sparse filesystems") # try creating a large sparse file - with tempfile.NamedTemporaryFile() as tf: - try: - # seek past end would work too, but linux truncate somewhat - # increases the chances that we have a sparse filesystem and can - # avoid actually writing 5GB - import subprocess as sp - sp.check_call(["truncate", "-s", "5368709120", tf.name]) - except: - raise SkipTest("Could not create 5GB large file") - # write a small array to the end - f = open(tf.name, "wb") + tf_name = os.path.join(tempdir, 'sparse_file') + try: + # seek past end would work too, but linux truncate somewhat + # increases the chances that we have a sparse filesystem and can + # avoid actually writing 5GB + import subprocess as sp + sp.check_call(["truncate", "-s", "5368709120", tf_name]) + except: + raise SkipTest("Could not create 5GB large file") + # write a small array to the end + with open(tf_name, "wb") as f: f.seek(5368709120) d = np.arange(5) np.save(f, d) - f.close() - # read it back - f = open(tf.name, "rb") + # read it back + with open(tf_name, "rb") as f: f.seek(5368709120) r = np.load(f) - f.close() - assert_array_equal(r, d) + assert_array_equal(r, d) if __name__ == "__main__": diff --git a/numpy/lib/tests/test_function_base.py b/numpy/lib/tests/test_function_base.py index a3f805691690..0eaada789ac7 100644 --- a/numpy/lib/tests/test_function_base.py +++ b/numpy/lib/tests/test_function_base.py @@ -312,6 +312,16 @@ def test_index_array_copied(self): np.insert([0, 1, 2], x, [3, 4, 5]) assert_equal(x, np.array([1, 1, 1])) + def test_structured_array(self): + a = np.array([(1, 'a'), (2, 'b'), (3, 'c')], + dtype=[('foo', 'i'), ('bar', 'a1')]) + val = (4, 'd') + b = np.insert(a, 0, val) + assert_array_equal(b[0], np.array(val, dtype=b.dtype)) + val = [(4, 'd')] * 2 + b = np.insert(a, [0, 2], val) + assert_array_equal(b[[0, 3]], np.array(val, dtype=b.dtype)) + class TestAmax(TestCase): def test_basic(self): @@ -516,8 +526,18 @@ def test_badargs(self): def test_masked(self): # Make sure 
that gradient supports subclasses like masked arrays - x = np.ma.array([[1, 1], [3, 4]]) - assert_equal(type(gradient(x)[0]), type(x)) + x = np.ma.array([[1, 1], [3, 4]], + mask=[[False, False], [False, False]]) + out = gradient(x)[0] + assert_equal(type(out), type(x)) + # And make sure that the output and input don't have aliased mask + # arrays + assert_(x.mask is not out.mask) + # Also check that edge_order=2 doesn't alter the original mask + x2 = np.ma.arange(5) + x2[2] = np.ma.masked + np.gradient(x2, edge_order=2) + assert_array_equal(x2.mask, [False, False, True, False, False]) def test_datetime64(self): # Make sure gradient() can handle special types like datetime64 @@ -526,7 +546,7 @@ def test_datetime64(self): '1910-10-12', '1910-12-12', '1912-12-12'], dtype='datetime64[D]') dx = np.array( - [-7, -3, 0, 31, 61, 396, 1066], + [-5, -3, 0, 31, 61, 396, 731], dtype='timedelta64[D]') assert_array_equal(gradient(x), dx) assert_(dx.dtype == np.dtype('timedelta64[D]')) @@ -537,7 +557,7 @@ def test_timedelta64(self): [-5, -3, 10, 12, 61, 321, 300], dtype='timedelta64[D]') dx = np.array( - [-3, 7, 7, 25, 154, 119, -161], + [2, 7, 7, 25, 154, 119, -21], dtype='timedelta64[D]') assert_array_equal(gradient(x), dx) assert_(dx.dtype == np.dtype('timedelta64[D]')) @@ -551,7 +571,7 @@ def test_second_order_accurate(self): dx = x[1] - x[0] y = 2 * x ** 3 + 4 * x ** 2 + 2 * x analytical = 6 * x ** 2 + 8 * x + 2 - num_error = np.abs((np.gradient(y, dx) / analytical) - 1) + num_error = np.abs((np.gradient(y, dx, edge_order=2) / analytical) - 1) assert_(np.all(num_error < 0.03) == True) @@ -1072,6 +1092,13 @@ def test_type(self): h, b = histogram(a, weights=np.ones(10, float)) assert_(issubdtype(h.dtype, float)) + def test_f32_rounding(self): + # gh-4799, check that the rounding of the edges works with float32 + x = np.array([276.318359 , -69.593948 , 21.329449], dtype=np.float32) + y = np.array([5005.689453, 4481.327637, 6010.369629], dtype=np.float32) + counts_hist, xedges, yedges = np.histogram2d(x, y, bins=100) + assert_equal(counts_hist.sum(), 3.) 
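The ``test_f32_rounding`` test above pins down the ``histogramdd`` change from earlier in this patch: bin edges are now computed in the sample's own inexact dtype, so no float32 sample is lost to rounding at the outermost edge. A minimal sketch of the fixed behavior, reusing the values from the test (the edge-dtype comment is my reading of the patch, not something the test asserts)::

    import numpy as np

    x = np.array([276.318359, -69.593948, 21.329449], dtype=np.float32)
    y = np.array([5005.689453, 4481.327637, 6010.369629], dtype=np.float32)
    counts, xedges, yedges = np.histogram2d(x, y, bins=100)
    assert counts.sum() == 3   # no sample falls outside the outermost edge
    # under this patch the edges are computed in float32, the sample's dtype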
+ def test_weights(self): v = rand(100) w = np.ones(100) * 5 @@ -1460,7 +1487,7 @@ def test_invalid_arguments(self): # Test that meshgrid complains about invalid arguments # Regression test for issue #4755: # https://github.com/numpy/numpy/issues/4755 - assert_raises(TypeError, meshgrid, + assert_raises(TypeError, meshgrid, [1, 2, 3], [4, 5, 6, 7], indices='ij') @@ -1860,6 +1887,14 @@ def test_percentile_no_overwrite(self): np.percentile(a, [50]) assert_equal(a, np.array([2, 3, 4, 1])) + def test_no_p_overwrite(self): + p = np.linspace(0., 100., num=5) + np.percentile(np.arange(100.), p, interpolation="midpoint") + assert_array_equal(p, np.linspace(0., 100., num=5)) + p = np.linspace(0., 100., num=5).tolist() + np.percentile(np.arange(100.), p, interpolation="midpoint") + assert_array_equal(p, np.linspace(0., 100., num=5).tolist()) + def test_percentile_overwrite(self): a = np.array([2, 3, 4, 1]) b = np.percentile(a, [50], overwrite_input=True) diff --git a/numpy/lib/tests/test_io.py b/numpy/lib/tests/test_io.py index 49ad1ba5b006..50fe0f4d6a45 100644 --- a/numpy/lib/tests/test_io.py +++ b/numpy/lib/tests/test_io.py @@ -4,9 +4,7 @@ import gzip import os import threading -import shutil -import contextlib -from tempfile import mkstemp, mkdtemp, NamedTemporaryFile +from tempfile import mkstemp, NamedTemporaryFile import time import warnings import gc @@ -24,13 +22,7 @@ assert_raises, assert_raises_regex, run_module_suite ) from numpy.testing import assert_warns, assert_, build_err_msg - - -@contextlib.contextmanager -def tempdir(change_dir=False): - tmpdir = mkdtemp() - yield tmpdir - shutil.rmtree(tmpdir) +from numpy.testing.utils import tempdir class TextIO(BytesIO): @@ -202,7 +194,7 @@ def roundtrip(self, *args, **kwargs): def test_big_arrays(self): L = (1 << 31) + 100000 a = np.empty(L, dtype=np.uint8) - with tempdir() as tmpdir: + with tempdir(prefix="numpy_test_big_arrays_") as tmpdir: tmp = os.path.join(tmpdir, "file.npz") np.savez(tmp, a=a) del a @@ -311,7 +303,7 @@ def test_closing_zipfile_after_load(self): # Check that zipfile owns file and can close it. # This needs to pass a file name to load for the # test. - with tempdir() as tmpdir: + with tempdir(prefix="numpy_test_closing_zipfile_after_load_") as tmpdir: fd, tmp = mkstemp(suffix='.npz', dir=tmpdir) os.close(fd) np.savez(tmp, lab='place holder') @@ -783,6 +775,14 @@ def test_bad_line(self): # Check for exception and that exception contains line number assert_raises_regex(ValueError, "3", np.loadtxt, c) + def test_none_as_string(self): + # gh-5155, None should work as string when format demands it + c = TextIO() + c.write('100,foo,200\n300,None,400') + c.seek(0) + dt = np.dtype([('x', int), ('a', 'S10'), ('y', int)]) + data = np.loadtxt(c, delimiter=',', dtype=dt, comments=None) + class Testfromregex(TestCase): # np.fromregex expects files opened in binary mode. 
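``test_none_as_string`` exercises the ``loadtxt`` change above (gh-5155): with ``comments=None`` nothing is treated as a comment marker, and the old ``asbytes(None)`` failure is avoided, so a literal field such as ``None`` is read as data. A short usage sketch under that assumption::

    import numpy as np
    from io import BytesIO

    data = BytesIO(b'100,foo,200\n300,None,400')
    dt = np.dtype([('x', int), ('a', 'S10'), ('y', int)])
    arr = np.loadtxt(data, delimiter=',', dtype=dt, comments=None)
    # arr['a'] == [b'foo', b'None']; nothing was stripped as a comment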
@@ -1093,6 +1093,21 @@ def test_dtype_with_converters(self): control = np.array([2009., 23., 46],) assert_equal(test, control) + def test_dtype_with_converters_and_usecols(self): + dstr = "1,5,-1,1:1\n2,8,-1,1:n\n3,3,-2,m:n\n" + dmap = {'1:1':0, '1:n':1, 'm:1':2, 'm:n':3} + dtyp = [('E1','i4'),('E2','i4'),('E3','i2'),('N', 'i1')] + conv = {0: int, 1: int, 2: int, 3: lambda r: dmap[r.decode()]} + test = np.recfromcsv(TextIO(dstr,), dtype=dtyp, delimiter=',', + names=None, converters=conv) + control = np.rec.array([[1,5,-1,0], [2,8,-1,1], [3,3,-2,3]], dtype=dtyp) + assert_equal(test, control) + dtyp = [('E1','i4'),('E2','i4'),('N', 'i1')] + test = np.recfromcsv(TextIO(dstr,), dtype=dtyp, delimiter=',', + usecols=(0,1,3), names=None, converters=conv) + control = np.rec.array([[1,5,0], [2,8,1], [3,3,3]], dtype=dtyp) + assert_equal(test, control) + def test_dtype_with_object(self): "Test using an explicit dtype with an object" from datetime import date @@ -1308,6 +1323,16 @@ def test_user_filling_values(self): ctrl = np.array([(0, 3), (4, -999)], dtype=[(_, int) for _ in "ac"]) assert_equal(test, ctrl) + data2 = "1,2,*,4\n5,*,7,8\n" + test = np.genfromtxt(TextIO(data2), delimiter=',', dtype=int, + missing_values="*", filling_values=0) + ctrl = np.array([[1, 2, 0, 4], [5, 0, 7, 8]]) + assert_equal(test, ctrl) + test = np.genfromtxt(TextIO(data2), delimiter=',', dtype=int, + missing_values="*", filling_values=-1) + ctrl = np.array([[1, 2, -1, 4], [5, -1, 7, 8]]) + assert_equal(test, ctrl) + def test_withmissing_float(self): data = TextIO('A,B\n0,1.5\n2,-999.00') test = np.mafromtxt(data, dtype=None, delimiter=',', diff --git a/numpy/lib/tests/test_twodim_base.py b/numpy/lib/tests/test_twodim_base.py index e9dbef70f62f..739061a5df49 100644 --- a/numpy/lib/tests/test_twodim_base.py +++ b/numpy/lib/tests/test_twodim_base.py @@ -311,6 +311,40 @@ def test_tril_triu_ndim3(): yield assert_equal, a_triu_observed.dtype, a.dtype yield assert_equal, a_tril_observed.dtype, a.dtype +def test_tril_triu_with_inf(): + # Issue 4859 + arr = np.array([[1, 1, np.inf], + [1, 1, 1], + [np.inf, 1, 1]]) + out_tril = np.array([[1, 0, 0], + [1, 1, 0], + [np.inf, 1, 1]]) + out_triu = out_tril.T + assert_array_equal(np.triu(arr), out_triu) + assert_array_equal(np.tril(arr), out_tril) + + +def test_tril_triu_dtype(): + # Issue 4916 + # tril and triu should return the same dtype as input + for c in np.typecodes['All']: + if c == 'V': + continue + arr = np.zeros((3, 3), dtype=c) + assert_equal(np.triu(arr).dtype, arr.dtype) + assert_equal(np.tril(arr).dtype, arr.dtype) + + # check special cases + arr = np.array([['2001-01-01T12:00', '2002-02-03T13:56'], + ['2004-01-01T12:00', '2003-01-03T13:45']], + dtype='datetime64') + assert_equal(np.triu(arr).dtype, arr.dtype) + assert_equal(np.tril(arr).dtype, arr.dtype) + + arr = np.zeros((3,3), dtype='f4,f4') + assert_equal(np.triu(arr).dtype, arr.dtype) + assert_equal(np.tril(arr).dtype, arr.dtype) + def test_mask_indices(): # simple test without offset diff --git a/numpy/lib/twodim_base.py b/numpy/lib/twodim_base.py index 2861e1c4afc1..40a140b6b09c 100644 --- a/numpy/lib/twodim_base.py +++ b/numpy/lib/twodim_base.py @@ -387,7 +387,6 @@ def tri(N, M=None, k=0, dtype=float): dtype : dtype, optional Data type of the returned array. The default is float. 
- Returns ------- tri : ndarray of shape (N, M) @@ -452,7 +451,9 @@ def tril(m, k=0): """ m = asanyarray(m) - return multiply(tri(*m.shape[-2:], k=k, dtype=bool), m, dtype=m.dtype) + mask = tri(*m.shape[-2:], k=k, dtype=bool) + + return where(mask, m, zeros(1, m.dtype)) def triu(m, k=0): @@ -478,7 +479,9 @@ def triu(m, k=0): """ m = asanyarray(m) - return multiply(~tri(*m.shape[-2:], k=k-1, dtype=bool), m, dtype=m.dtype) + mask = tri(*m.shape[-2:], k=k-1, dtype=bool) + + return where(mask, zeros(1, m.dtype), m) # Originally borrowed from John Hunter and matplotlib diff --git a/numpy/linalg/tests/test_linalg.py b/numpy/linalg/tests/test_linalg.py index 8edf36aa67e7..dec98db8cec5 100644 --- a/numpy/linalg/tests/test_linalg.py +++ b/numpy/linalg/tests/test_linalg.py @@ -1108,6 +1108,8 @@ def test_xerbla_override(): # and may, or may not, abort the process depending on the LAPACK package. from nose import SkipTest + XERBLA_OK = 255 + try: pid = os.fork() except (OSError, AttributeError): @@ -1137,15 +1139,16 @@ def test_xerbla_override(): a, a, 0, 0) except ValueError as e: if "DORGQR parameter number 5" in str(e): - # success - os._exit(os.EX_OK) + # success, reuse error code to mark success as + # FORTRAN STOP returns as success. + os._exit(XERBLA_OK) # Did not abort, but our xerbla was not linked in. os._exit(os.EX_CONFIG) else: # parent pid, status = os.wait() - if os.WEXITSTATUS(status) != os.EX_OK or os.WIFSIGNALED(status): + if os.WEXITSTATUS(status) != XERBLA_OK: raise SkipTest('Numpy xerbla not linked in.') diff --git a/numpy/ma/core.py b/numpy/ma/core.py index e7427dc46a0c..9e4fb96856c1 100644 --- a/numpy/ma/core.py +++ b/numpy/ma/core.py @@ -145,10 +145,15 @@ class MaskError(MAError): 'S' : 'N/A', 'u' : 999999, 'V' : '???', - 'U' : 'N/A', - 'M8[D]' : np.datetime64('NaT', 'D'), - 'M8[us]' : np.datetime64('NaT', 'us') + 'U' : 'N/A' } + +# Add datetime64 and timedelta64 types +for v in ["Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns", "ps", + "fs", "as"]: + default_filler["M8[" + v + "]"] = np.datetime64("NaT", v) + default_filler["m8[" + v + "]"] = np.timedelta64("NaT", v) + max_filler = ntypes._minvals max_filler.update([(k, -np.inf) for k in [np.float32, np.float64]]) min_filler = ntypes._maxvals @@ -194,7 +199,7 @@ def default_fill_value(obj): 999999 >>> np.ma.default_fill_value(np.array([1.1, 2., np.pi])) 1e+20 - >>> np.ma.default_fill_value(np.dtype(complex)) + >>> np.ma.default_fill_value(np.dtype(complex)) (1e+20+0j) """ @@ -203,7 +208,7 @@ def default_fill_value(obj): elif isinstance(obj, np.dtype): if obj.subdtype: defval = default_filler.get(obj.subdtype[0].kind, '?') - elif obj.kind == 'M': + elif obj.kind in 'Mm': defval = default_filler.get(obj.str[1:], '?') else: defval = default_filler.get(obj.kind, '?') @@ -780,6 +785,8 @@ def __call__ (self, a, b): # component of numpy's import time. if self.tolerance is None: self.tolerance = np.finfo(float).tiny + # don't call ma ufuncs from __array_wrap__ which would fail for scalars + a, b = np.asarray(a), np.asarray(b) return umath.absolute(a) * self.tolerance >= umath.absolute(b) @@ -843,8 +850,7 @@ def __call__ (self, a, *args, **kwargs): d = getdata(a) # Case 1.1. 
: Domained function if self.domain is not None: - with np.errstate(): - np.seterr(divide='ignore', invalid='ignore') + with np.errstate(divide='ignore', invalid='ignore'): result = self.f(d, *args, **kwargs) # Make a mask m = ~umath.isfinite(result) @@ -932,8 +938,7 @@ def __call__ (self, a, b, *args, **kwargs): else: m = umath.logical_or(ma, mb) # Get the result - with np.errstate(): - np.seterr(divide='ignore', invalid='ignore') + with np.errstate(divide='ignore', invalid='ignore'): result = self.f(da, db, *args, **kwargs) # check it worked if result is NotImplemented: @@ -945,11 +950,8 @@ def __call__ (self, a, b, *args, **kwargs): return result # Case 2. : array # Revert result to da where masked - if m.any(): - np.copyto(result, 0, casting='unsafe', where=m) - # This only makes sense if the operation preserved the dtype - if result.dtype == da.dtype: - result += m * da + if m is not nomask: + np.copyto(result, da, casting='unsafe', where=m) # Transforms to a (subclass of) MaskedArray result = result.view(get_masked_subclass(a, b)) result._mask = m @@ -1073,8 +1075,7 @@ def __call__(self, a, b, *args, **kwargs): (da, db) = (getdata(a, subok=False), getdata(b, subok=False)) (ma, mb) = (getmask(a), getmask(b)) # Get the result - with np.errstate(): - np.seterr(divide='ignore', invalid='ignore') + with np.errstate(divide='ignore', invalid='ignore'): result = self.f(da, db, *args, **kwargs) # check it worked if result is NotImplemented: @@ -1094,8 +1095,7 @@ def __call__(self, a, b, *args, **kwargs): else: return result # When the mask is True, put back da - np.copyto(result, 0, casting='unsafe', where=m) - result += m * da + np.copyto(result, da, casting='unsafe', where=m) result = result.view(get_masked_subclass(a, b)) result._mask = m if isinstance(b, MaskedArray): @@ -2790,12 +2790,50 @@ def __array_finalize__(self, obj): """ # Get main attributes ......... self._update_from(obj) + # We have to decide how to initialize self.mask, based on + # obj.mask. This is very difficult. There might be some + # correspondence between the elements in the array we are being + # created from (= obj) and us. Or... there might not. This method can + # be called in all kinds of places for all kinds of reasons -- could + # be empty_like, could be slicing, could be a ufunc, could be a view, + # ... The numpy subclassing interface simply doesn't give us any way + # to know, which means that at best this method will be based on + # guesswork and heuristics. To make things worse, there isn't even any + # clear consensus about what the desired behavior is. For instance, + # most users think that np.empty_like(marr) -- which goes via this + # method -- should return a masked array with an empty mask (see + # gh-3404 and linked discussions), but others disagree, and they have + # existing code which depends on empty_like returning an array that + # matches the input mask. + # + # Historically our algorithm was: if the template object mask had the + # same *number of elements* as us, then we used *its mask object + # itself* as our mask, so that writes to us would also write to the + # original array. This is horribly broken in multiple ways. + # + # Now what we do instead is, if the template object mask has the same + # number of elements as us, and we do not have the same base pointer + # as the template object (b/c views like arr[...] should keep the same + # mask), then we make a copy of the template object mask and use + # that. This is also horribly broken but somewhat less so. Maybe.
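A small sketch of the behavior this heuristic aims for (my illustration, assuming the patched ``__array_finalize__``)::

    import numpy as np

    marr = np.ma.array([1., 2., 3.], mask=[False, True, False])

    view = marr[...]        # same data pointer: the mask object is shared
    view.mask[0] = True     # ... so this change is visible via marr.mask

    fresh = np.empty_like(marr)  # new data: the template mask is copied
    fresh.mask[2] = True         # ... so marr.mask is left untouched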
if isinstance(obj, ndarray): - odtype = obj.dtype - if odtype.names: - _mask = getattr(obj, '_mask', make_mask_none(obj.shape, odtype)) + # XX: This looks like a bug -- shouldn't it check self.dtype + # instead? + if obj.dtype.names: + _mask = getattr(obj, '_mask', + make_mask_none(obj.shape, obj.dtype)) else: _mask = getattr(obj, '_mask', nomask) + # If self and obj point to exactly the same data, then probably + # self is a simple view of obj (e.g., self = obj[...]), so they + # should share the same mask. (This isn't 100% reliable, e.g. self + # could be the first row of obj, or have strange strides, but as a + # heuristic it's not bad.) In all other cases, we make a copy of + # the mask, so that future modifications to 'self' do not end up + # side-effecting 'obj' as well. + if (obj.__array_interface__["data"][0] + != self.__array_interface__["data"][0]): + _mask = _mask.copy() else: _mask = nomask self._mask = _mask @@ -3840,8 +3878,7 @@ def __ipow__(self, other): "Raise self to the power other, in place." other_data = getdata(other) other_mask = getmask(other) - with np.errstate(): - np.seterr(divide='ignore', invalid='ignore') + with np.errstate(divide='ignore', invalid='ignore'): ndarray.__ipow__(self._data, np.where(self._mask, 1, other_data)) invalid = np.logical_not(np.isfinite(self._data)) if invalid.any(): @@ -5029,6 +5066,10 @@ def sort(self, axis= -1, kind='quicksort', order=None, endwith : {True, False}, optional Whether missing values (if any) should be forced in the upper indices (at the end of the array) (True) or lower indices (at the beginning). + When the array contains unmasked values of the largest (or smallest if + False) representable value of the datatype, the ordering of these values + and the masked values is undefined. To enforce that the masked values are + at the end (beginning) in this case, one must sort the mask. fill_value : {var}, optional Value used internally for the masked values. If ``fill_value`` is not None, it supersedes ``endwith``. @@ -5598,9 +5639,8 @@ class mvoid(MaskedArray): """ # def __new__(self, data, mask=nomask, dtype=None, fill_value=None, - hardmask=False): - dtype = dtype or data.dtype - _data = np.array(data, dtype=dtype) + hardmask=False, copy=False, subok=True): + _data = np.array(data, copy=copy, subok=subok, dtype=dtype) _data = _data.view(self) _data._hardmask = hardmask if mask is not nomask: @@ -6120,8 +6160,7 @@ def power(a, b, third=None): else: basetype = MaskedArray # Get the result and view it as a (subclass of) MaskedArray - with np.errstate(): - np.seterr(divide='ignore', invalid='ignore') + with np.errstate(divide='ignore', invalid='ignore'): result = np.where(m, fa, umath.power(fa, fb)).view(basetype) result._update_from(a) # Find where we're in trouble w/ NaNs and Infs diff --git a/numpy/ma/tests/test_core.py b/numpy/ma/tests/test_core.py index e6f659041490..d12df5d81606 100644 --- a/numpy/ma/tests/test_core.py +++ b/numpy/ma/tests/test_core.py @@ -194,8 +194,7 @@ def test_asarray(self): def test_fix_invalid(self): # Checks fix_invalid.
- with np.errstate(): - np.seterr(invalid='ignore') + with np.errstate(invalid='ignore'): data = masked_array([np.nan, 0., 1.], mask=[0, 0, 1]) data_fixed = fix_invalid(data) assert_equal(data_fixed._data, [data.fill_value, 0., 1.]) @@ -815,7 +814,7 @@ def test_count_func(self): res = count(ott) self.assertTrue(res.dtype.type is np.intp) assert_equal(3, res) - + ott = ott.reshape((2, 2)) res = count(ott) assert_(res.dtype.type is np.intp) @@ -1490,6 +1489,21 @@ def test_fillvalue_exotic_dtype(self): control = np.array((0, 0, 0), dtype="int, float, float").astype(ndtype) assert_equal(_check_fill_value(0, ndtype), control) + def test_fillvalue_datetime_timedelta(self): + # Test default fillvalue for datetime64 and timedelta64 types. + # See issue #4476, this would return '?' which would cause errors + # elsewhere + + for timecode in ("as", "fs", "ps", "ns", "us", "ms", "s", "m", + "h", "D", "W", "M", "Y"): + control = numpy.datetime64("NaT", timecode) + test = default_fill_value(numpy.dtype("<M8[" + timecode + "]")) + assert_equal(test, control) + + control = numpy.timedelta64("NaT", timecode) + test = default_fill_value(numpy.dtype("<m8[" + timecode + "]")) + assert_equal(test, control) diff --git a/numpy/polynomial/chebyshev.py b/numpy/polynomial/chebyshev.py @@ ... @@ def chebadd(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] += c2 ret = c1 - else : + else: c2[:c1.size] += c1 ret = c2 return pu.trimseq(ret) @@ -647,10 +648,10 @@ def chebsub(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] -= c2 ret = c1 - else : + else: c2 = -c2 c2[:c1.size] += c1 ret = c2 @@ -794,16 +795,16 @@ def chebdiv(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if c2[-1] == 0 : + if c2[-1] == 0: raise ZeroDivisionError() lc1 = len(c1) lc2 = len(c2) - if lc1 < lc2 : + if lc1 < lc2: return c1[:1]*0, c1 - elif lc2 == 1 : + elif lc2 == 1: return c1/c2[-1], c1[:1]*0 - else : + else: z1 = _cseries_to_zseries(c1) z2 = _cseries_to_zseries(c2) quo, rem = _zseries_div(z1, z2) @@ -812,7 +813,7 @@ def chebdiv(c1, c2): return quo, rem -def chebpow(c, pow, maxpower=16) : +def chebpow(c, pow, maxpower=16): """Raise a Chebyshev series to a power. Returns the Chebyshev series `c` raised to the power `pow`. The @@ -846,25 +847,25 @@ def chebpow(c, pow, maxpower=16) : # c is a trimmed copy [c] = pu.as_series([c]) power = int(pow) - if power != pow or power < 0 : + if power != pow or power < 0: raise ValueError("Power must be a non-negative integer.") - elif maxpower is not None and power > maxpower : + elif maxpower is not None and power > maxpower: raise ValueError("Power is too large") - elif power == 0 : + elif power == 0: return np.array([1], dtype=c.dtype) - elif power == 1 : + elif power == 1: return c - else : + else: # This can be made more efficient by using powers of two # in the usual way. zs = _cseries_to_zseries(c) prd = zs - for i in range(2, power + 1) : + for i in range(2, power + 1): prd = np.convolve(prd, zs) return _zseries_to_cseries(prd) -def chebder(c, m=1, scl=1, axis=0) : +def chebder(c, m=1, scl=1, axis=0): """ Differentiate a Chebyshev series.
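``chebpow`` (and its Hermite and Laguerre counterparts below) notes that the repeated-convolution loop "can be made more efficient by using powers of two in the usual way". A hypothetical helper sketching that idea on the z-series representation used above; it is not part of this patch::

    import numpy as np

    def _zseries_pow_by_squaring(zs, power):
        # O(log(power)) convolutions instead of the O(power) loop above.
        result = np.array([1.0])
        while power:
            if power & 1:
                result = np.convolve(result, zs)
            zs = np.convolve(zs, zs)
            power >>= 1
        return result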
@@ -1057,9 +1058,9 @@ def chebint(c, m=1, k=[], lbnd=0, scl=1, axis=0): if cnt != m: raise ValueError("The order of integration must be integer") - if cnt < 0 : + if cnt < 0: raise ValueError("The order of integration must be non-negative") - if len(k) > cnt : + if len(k) > cnt: raise ValueError("Too many integration constants") if iaxis != axis: raise ValueError("The axis must be integer") @@ -1073,7 +1074,7 @@ def chebint(c, m=1, k=[], lbnd=0, scl=1, axis=0): c = np.rollaxis(c, iaxis) k = list(k) + [0]*(cnt - len(k)) - for i in range(cnt) : + for i in range(cnt): n = len(c) c *= scl if n == 1 and np.all(c[0] == 0): @@ -1162,19 +1163,19 @@ def chebval(x, c, tensor=True): if isinstance(x, (tuple, list)): x = np.asarray(x) if isinstance(x, np.ndarray) and tensor: - c = c.reshape(c.shape + (1,)*x.ndim) + c = c.reshape(c.shape + (1,)*x.ndim) - if len(c) == 1 : + if len(c) == 1: c0 = c[0] c1 = 0 - elif len(c) == 2 : + elif len(c) == 2: c0 = c[0] c1 = c[1] - else : + else: x2 = 2*x c0 = c[-2] c1 = c[-1] - for i in range(3, len(c) + 1) : + for i in range(3, len(c) + 1): tmp = c0 c0 = c[-i] - c1 c1 = tmp + c1*x2 @@ -1410,7 +1411,7 @@ def chebgrid3d(x, y, z, c): return c -def chebvander(x, deg) : +def chebvander(x, deg): """Pseudo-Vandermonde matrix of given degree. Returns the pseudo-Vandermonde matrix of degree `deg` and sample points @@ -1457,15 +1458,15 @@ def chebvander(x, deg) : v = np.empty(dims, dtype=dtyp) # Use forward recursion to generate the entries. v[0] = x*0 + 1 - if ideg > 0 : + if ideg > 0: x2 = 2*x v[1] = x - for i in range(2, ideg + 1) : + for i in range(2, ideg + 1): v[i] = v[i-1]*x2 - v[i-2] return np.rollaxis(v, 0, v.ndim) -def chebvander2d(x, y, deg) : +def chebvander2d(x, y, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1528,7 +1529,7 @@ def chebvander2d(x, y, deg) : return v.reshape(v.shape[:-2] + (-1,)) -def chebvander3d(x, y, z, deg) : +def chebvander3d(x, y, z, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1714,13 +1715,13 @@ def chebfit(x, y, deg, rcond=None, full=False, w=None): y = np.asarray(y) + 0.0 # check arguments. - if deg < 0 : + if deg < 0: raise ValueError("expected deg >= 0") if x.ndim != 1: raise TypeError("expected 1D vector for x") if x.size == 0: raise TypeError("expected non-empty vector for x") - if y.ndim < 1 or y.ndim > 2 : + if y.ndim < 1 or y.ndim > 2: raise TypeError("expected 1D or 2D array for y") if len(x) != len(y): raise TypeError("expected x and y to have same length") @@ -1740,7 +1741,7 @@ def chebfit(x, y, deg, rcond=None, full=False, w=None): rhs = rhs * w # set rcond - if rcond is None : + if rcond is None: rcond = len(x)*np.finfo(x.dtype).eps # Determine the norms of the design matrix columns. @@ -1759,9 +1760,9 @@ def chebfit(x, y, deg, rcond=None, full=False, w=None): msg = "The fit may be poorly conditioned" warnings.warn(msg, pu.RankWarning) - if full : + if full: return c, [resids, rank, s, rcond] - else : + else: return c @@ -1916,8 +1917,8 @@ def chebweight(x): The weight function of the Chebyshev polynomials. The weight function is :math:`1/\sqrt{1 - x^2}` and the interval of - integration is :math:`[-1, 1]`. The Chebyshev polynomials are orthogonal, but - not normalized, with respect to this weight function. + integration is :math:`[-1, 1]`. The Chebyshev polynomials are + orthogonal, but not normalized, with respect to this weight function. 
Parameters ---------- diff --git a/numpy/polynomial/hermite.py b/numpy/polynomial/hermite.py index 43ede58ac3c5..1fd49d7745fa 100644 --- a/numpy/polynomial/hermite.py +++ b/numpy/polynomial/hermite.py @@ -66,18 +66,18 @@ from . import polyutils as pu from ._polybase import ABCPolyBase -__all__ = ['hermzero', 'hermone', 'hermx', 'hermdomain', 'hermline', - 'hermadd', 'hermsub', 'hermmulx', 'hermmul', 'hermdiv', 'hermpow', - 'hermval', 'hermder', 'hermint', 'herm2poly', 'poly2herm', - 'hermfromroots', 'hermvander', 'hermfit', 'hermtrim', 'hermroots', - 'Hermite', 'hermval2d', 'hermval3d', 'hermgrid2d', 'hermgrid3d', - 'hermvander2d', 'hermvander3d', 'hermcompanion', 'hermgauss', - 'hermweight'] +__all__ = [ + 'hermzero', 'hermone', 'hermx', 'hermdomain', 'hermline', 'hermadd', + 'hermsub', 'hermmulx', 'hermmul', 'hermdiv', 'hermpow', 'hermval', + 'hermder', 'hermint', 'herm2poly', 'poly2herm', 'hermfromroots', + 'hermvander', 'hermfit', 'hermtrim', 'hermroots', 'Hermite', + 'hermval2d', 'hermval3d', 'hermgrid2d', 'hermgrid3d', 'hermvander2d', + 'hermvander3d', 'hermcompanion', 'hermgauss', 'hermweight'] hermtrim = pu.trimcoef -def poly2herm(pol) : +def poly2herm(pol): """ poly2herm(pol) @@ -118,12 +118,12 @@ def poly2herm(pol) : [pol] = pu.as_series([pol]) deg = len(pol) - 1 res = 0 - for i in range(deg, -1, -1) : + for i in range(deg, -1, -1): res = hermadd(hermmulx(res), pol[i]) return res -def herm2poly(c) : +def herm2poly(c): """ Convert a Hermite series to a polynomial. @@ -174,7 +174,7 @@ def herm2poly(c) : c0 = c[-2] c1 = c[-1] # i is the current degree of c1 - for i in range(n - 1, 1, -1) : + for i in range(n - 1, 1, -1): tmp = c0 c0 = polysub(c[i - 2], c1*(2*(i - 1))) c1 = polyadd(tmp, polymulx(c1)*2) @@ -198,7 +198,7 @@ def herm2poly(c) : hermx = np.array([0, 1/2]) -def hermline(off, scl) : +def hermline(off, scl): """ Hermite series whose graph is a straight line. @@ -228,13 +228,13 @@ def hermline(off, scl) : 5.0 """ - if scl != 0 : + if scl != 0: return np.array([off, scl/2]) - else : + else: return np.array([off]) -def hermfromroots(roots) : +def hermfromroots(roots): """ Generate a Hermite series with given roots. 
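For reference, ``hermline`` above returns ``[off, scl/2]`` because the degree-one physicists' Hermite polynomial is H_1(x) = 2x, so off + scl*x = off*H_0(x) + (scl/2)*H_1(x). A quick check of that identity (my illustration, not part of the patch)::

    from numpy.polynomial import hermite as H

    coef = H.hermline(3, 2)        # array([3., 1.])
    print(H.hermval(0.5, coef))    # 3 + 2*0.5 == 4.0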
@@ -284,9 +284,9 @@ def hermfromroots(roots) : array([ 0.+0.j, 0.+0.j]) """ - if len(roots) == 0 : + if len(roots) == 0: return np.ones(1) - else : + else: [roots] = pu.as_series([roots], trim=False) roots.sort() p = [hermline(-r, 1) for r in roots] @@ -340,10 +340,10 @@ def hermadd(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] += c2 ret = c1 - else : + else: c2[:c1.size] += c1 ret = c2 return pu.trimseq(ret) @@ -388,10 +388,10 @@ def hermsub(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] -= c2 ret = c1 - else : + else: c2 = -c2 c2[:c1.size] += c1 ret = c2 @@ -501,13 +501,13 @@ def hermmul(c1, c2): elif len(c) == 2: c0 = c[0]*xs c1 = c[1]*xs - else : + else: nd = len(c) c0 = c[-2]*xs c1 = c[-1]*xs - for i in range(3, len(c) + 1) : + for i in range(3, len(c) + 1): tmp = c0 - nd = nd - 1 + nd = nd - 1 c0 = hermsub(c[-i]*xs, c1*(2*(nd - 1))) c1 = hermadd(tmp, hermmulx(c1)*2) return hermadd(c0, hermmulx(c1)*2) @@ -560,16 +560,16 @@ def hermdiv(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if c2[-1] == 0 : + if c2[-1] == 0: raise ZeroDivisionError() lc1 = len(c1) lc2 = len(c2) - if lc1 < lc2 : + if lc1 < lc2: return c1[:1]*0, c1 - elif lc2 == 1 : + elif lc2 == 1: return c1/c2[-1], c1[:1]*0 - else : + else: quo = np.empty(lc1 - lc2 + 1, dtype=c1.dtype) rem = c1 for i in range(lc1 - lc2, - 1, -1): @@ -580,7 +580,7 @@ def hermdiv(c1, c2): return quo, pu.trimseq(rem) -def hermpow(c, pow, maxpower=16) : +def hermpow(c, pow, maxpower=16): """Raise a Hermite series to a power. Returns the Hermite series `c` raised to the power `pow`. The @@ -617,24 +617,24 @@ def hermpow(c, pow, maxpower=16) : # c is a trimmed copy [c] = pu.as_series([c]) power = int(pow) - if power != pow or power < 0 : + if power != pow or power < 0: raise ValueError("Power must be a non-negative integer.") - elif maxpower is not None and power > maxpower : + elif maxpower is not None and power > maxpower: raise ValueError("Power is too large") - elif power == 0 : + elif power == 0: return np.array([1], dtype=c.dtype) - elif power == 1 : + elif power == 1: return c - else : + else: # This can be made more efficient by using powers of two # in the usual way. prd = c - for i in range(2, power + 1) : + for i in range(2, power + 1): prd = hermmul(prd, c) return prd -def hermder(c, m=1, scl=1, axis=0) : +def hermder(c, m=1, scl=1, axis=0): """ Differentiate a Hermite series. 
@@ -712,7 +712,7 @@ def hermder(c, m=1, scl=1, axis=0) : n = len(c) if cnt >= n: c = c[:1]*0 - else : + else: for i in range(cnt): n = n - 1 c *= scl @@ -816,9 +816,9 @@ def hermint(c, m=1, k=[], lbnd=0, scl=1, axis=0): if cnt != m: raise ValueError("The order of integration must be integer") - if cnt < 0 : + if cnt < 0: raise ValueError("The order of integration must be non-negative") - if len(k) > cnt : + if len(k) > cnt: raise ValueError("Too many integration constants") if iaxis != axis: raise ValueError("The axis must be integer") @@ -832,7 +832,7 @@ def hermint(c, m=1, k=[], lbnd=0, scl=1, axis=0): c = np.rollaxis(c, iaxis) k = list(k) + [0]*(cnt - len(k)) - for i in range(cnt) : + for i in range(cnt): n = len(c) c *= scl if n == 1 and np.all(c[0] == 0): @@ -924,22 +924,22 @@ def hermval(x, c, tensor=True): if isinstance(x, (tuple, list)): x = np.asarray(x) if isinstance(x, np.ndarray) and tensor: - c = c.reshape(c.shape + (1,)*x.ndim) + c = c.reshape(c.shape + (1,)*x.ndim) x2 = x*2 - if len(c) == 1 : + if len(c) == 1: c0 = c[0] c1 = 0 - elif len(c) == 2 : + elif len(c) == 2: c0 = c[0] c1 = c[1] - else : + else: nd = len(c) c0 = c[-2] c1 = c[-1] - for i in range(3, len(c) + 1) : + for i in range(3, len(c) + 1): tmp = c0 - nd = nd - 1 + nd = nd - 1 c0 = c[-i] - c1*(2*(nd - 1)) c1 = tmp + c1*x2 return c0 + c1*x2 @@ -1174,7 +1174,7 @@ def hermgrid3d(x, y, z, c): return c -def hermvander(x, deg) : +def hermvander(x, deg): """Pseudo-Vandermonde matrix of given degree. Returns the pseudo-Vandermonde matrix of degree `deg` and sample points @@ -1229,15 +1229,15 @@ def hermvander(x, deg) : dtyp = x.dtype v = np.empty(dims, dtype=dtyp) v[0] = x*0 + 1 - if ideg > 0 : + if ideg > 0: x2 = x*2 v[1] = x2 - for i in range(2, ideg + 1) : + for i in range(2, ideg + 1): v[i] = (v[i-1]*x2 - v[i-2]*(2*(i - 1))) return np.rollaxis(v, 0, v.ndim) -def hermvander2d(x, y, deg) : +def hermvander2d(x, y, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1300,7 +1300,7 @@ def hermvander2d(x, y, deg) : return v.reshape(v.shape[:-2] + (-1,)) -def hermvander3d(x, y, z, deg) : +def hermvander3d(x, y, z, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1491,13 +1491,13 @@ def hermfit(x, y, deg, rcond=None, full=False, w=None): y = np.asarray(y) + 0.0 # check arguments. - if deg < 0 : + if deg < 0: raise ValueError("expected deg >= 0") if x.ndim != 1: raise TypeError("expected 1D vector for x") if x.size == 0: raise TypeError("expected non-empty vector for x") - if y.ndim < 1 or y.ndim > 2 : + if y.ndim < 1 or y.ndim > 2: raise TypeError("expected 1D or 2D array for y") if len(x) != len(y): raise TypeError("expected x and y to have same length") @@ -1517,7 +1517,7 @@ def hermfit(x, y, deg, rcond=None, full=False, w=None): rhs = rhs * w # set rcond - if rcond is None : + if rcond is None: rcond = len(x)*np.finfo(x.dtype).eps # Determine the norms of the design matrix columns. @@ -1536,9 +1536,9 @@ def hermfit(x, y, deg, rcond=None, full=False, w=None): msg = "The fit may be poorly conditioned" warnings.warn(msg, pu.RankWarning) - if full : + if full: return c, [resids, rank, s, rcond] - else : + else: return c @@ -1568,7 +1568,6 @@ def hermcompanion(c): .. 
versionadded::1.7.0 """ - accprod = np.multiply.accumulate # c is a trimmed copy [c] = pu.as_series([c]) if len(c) < 2: @@ -1636,9 +1635,9 @@ def hermroots(c): """ # c is a trimmed copy [c] = pu.as_series([c]) - if len(c) <= 1 : + if len(c) <= 1: return np.array([], dtype=c.dtype) - if len(c) == 2 : + if len(c) == 2: return np.array([-.5*c[0]/c[1]]) m = hermcompanion(c) diff --git a/numpy/polynomial/hermite_e.py b/numpy/polynomial/hermite_e.py index 874b42470637..6e33dc0bc31c 100644 --- a/numpy/polynomial/hermite_e.py +++ b/numpy/polynomial/hermite_e.py @@ -66,18 +66,19 @@ from . import polyutils as pu from ._polybase import ABCPolyBase -__all__ = ['hermezero', 'hermeone', 'hermex', 'hermedomain', 'hermeline', - 'hermeadd', 'hermesub', 'hermemulx', 'hermemul', 'hermediv', 'hermpow', - 'hermeval', - 'hermeder', 'hermeint', 'herme2poly', 'poly2herme', 'hermefromroots', - 'hermevander', 'hermefit', 'hermetrim', 'hermeroots', 'HermiteE', - 'hermeval2d', 'hermeval3d', 'hermegrid2d', 'hermegrid3d', 'hermevander2d', - 'hermevander3d', 'hermecompanion', 'hermegauss', 'hermeweight'] +__all__ = [ + 'hermezero', 'hermeone', 'hermex', 'hermedomain', 'hermeline', + 'hermeadd', 'hermesub', 'hermemulx', 'hermemul', 'hermediv', + 'hermepow', 'hermeval', 'hermeder', 'hermeint', 'herme2poly', + 'poly2herme', 'hermefromroots', 'hermevander', 'hermefit', 'hermetrim', + 'hermeroots', 'HermiteE', 'hermeval2d', 'hermeval3d', 'hermegrid2d', + 'hermegrid3d', 'hermevander2d', 'hermevander3d', 'hermecompanion', + 'hermegauss', 'hermeweight'] hermetrim = pu.trimcoef -def poly2herme(pol) : +def poly2herme(pol): """ poly2herme(pol) @@ -118,12 +119,12 @@ def poly2herme(pol) : [pol] = pu.as_series([pol]) deg = len(pol) - 1 res = 0 - for i in range(deg, -1, -1) : + for i in range(deg, -1, -1): res = hermeadd(hermemulx(res), pol[i]) return res -def herme2poly(c) : +def herme2poly(c): """ Convert a Hermite series to a polynomial. @@ -173,7 +174,7 @@ def herme2poly(c) : c0 = c[-2] c1 = c[-1] # i is the current degree of c1 - for i in range(n - 1, 1, -1) : + for i in range(n - 1, 1, -1): tmp = c0 c0 = polysub(c[i - 2], c1*(i - 1)) c1 = polyadd(tmp, polymulx(c1)) @@ -197,7 +198,7 @@ def herme2poly(c) : hermex = np.array([0, 1]) -def hermeline(off, scl) : +def hermeline(off, scl): """ Hermite series whose graph is a straight line. @@ -228,13 +229,13 @@ def hermeline(off, scl) : 5.0 """ - if scl != 0 : + if scl != 0: return np.array([off, scl]) - else : + else: return np.array([off]) -def hermefromroots(roots) : +def hermefromroots(roots): """ Generate a HermiteE series with given roots. 
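``hermefromroots`` builds its series as a product of the degree-one factors ``hermeline(-r, 1)``, so the resulting coefficients reproduce the monic polynomial with exactly the given roots. A quick check (my illustration)::

    import numpy as np
    from numpy.polynomial import hermite_e as He

    c = He.hermefromroots([1, -2])   # (x - 1)*(x + 2) in the He basis
    assert np.allclose(He.hermeval([1, -2], c), 0)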
@@ -284,9 +285,9 @@ def hermefromroots(roots) : array([ 0.+0.j, 0.+0.j]) """ - if len(roots) == 0 : + if len(roots) == 0: return np.ones(1) - else : + else: [roots] = pu.as_series([roots], trim=False) roots.sort() p = [hermeline(-r, 1) for r in roots] @@ -340,10 +341,10 @@ def hermeadd(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] += c2 ret = c1 - else : + else: c2[:c1.size] += c1 ret = c2 return pu.trimseq(ret) @@ -388,10 +389,10 @@ def hermesub(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] -= c2 ret = c1 - else : + else: c2 = -c2 c2[:c1.size] += c1 ret = c2 @@ -501,13 +502,13 @@ def hermemul(c1, c2): elif len(c) == 2: c0 = c[0]*xs c1 = c[1]*xs - else : + else: nd = len(c) c0 = c[-2]*xs c1 = c[-1]*xs - for i in range(3, len(c) + 1) : + for i in range(3, len(c) + 1): tmp = c0 - nd = nd - 1 + nd = nd - 1 c0 = hermesub(c[-i]*xs, c1*(nd - 1)) c1 = hermeadd(tmp, hermemulx(c1)) return hermeadd(c0, hermemulx(c1)) @@ -558,16 +559,16 @@ def hermediv(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if c2[-1] == 0 : + if c2[-1] == 0: raise ZeroDivisionError() lc1 = len(c1) lc2 = len(c2) - if lc1 < lc2 : + if lc1 < lc2: return c1[:1]*0, c1 - elif lc2 == 1 : + elif lc2 == 1: return c1/c2[-1], c1[:1]*0 - else : + else: quo = np.empty(lc1 - lc2 + 1, dtype=c1.dtype) rem = c1 for i in range(lc1 - lc2, - 1, -1): @@ -578,7 +579,7 @@ def hermediv(c1, c2): return quo, pu.trimseq(rem) -def hermepow(c, pow, maxpower=16) : +def hermepow(c, pow, maxpower=16): """Raise a Hermite series to a power. Returns the Hermite series `c` raised to the power `pow`. The @@ -615,24 +616,24 @@ def hermepow(c, pow, maxpower=16) : # c is a trimmed copy [c] = pu.as_series([c]) power = int(pow) - if power != pow or power < 0 : + if power != pow or power < 0: raise ValueError("Power must be a non-negative integer.") - elif maxpower is not None and power > maxpower : + elif maxpower is not None and power > maxpower: raise ValueError("Power is too large") - elif power == 0 : + elif power == 0: return np.array([1], dtype=c.dtype) - elif power == 1 : + elif power == 1: return c - else : + else: # This can be made more efficient by using powers of two # in the usual way. prd = c - for i in range(2, power + 1) : + for i in range(2, power + 1): prd = hermemul(prd, c) return prd -def hermeder(c, m=1, scl=1, axis=0) : +def hermeder(c, m=1, scl=1, axis=0): """ Differentiate a Hermite_e series. 
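``hermeder`` uses the probabilists' derivative relation He_n'(x) = n*He_{n-1}(x), so differentiation just shifts and scales coefficients. For example, for the series of He_3 alone (my illustration)::

    import numpy as np
    from numpy.polynomial import hermite_e as He

    # d/dx He_3 = 3*He_2, i.e. coefficients [0, 0, 0, 1] -> [0, 0, 3]
    assert np.allclose(He.hermeder([0, 0, 0, 1]), [0, 0, 3])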
@@ -710,7 +711,7 @@ def hermeder(c, m=1, scl=1, axis=0) : n = len(c) if cnt >= n: return c[:1]*0 - else : + else: for i in range(cnt): n = n - 1 c *= scl @@ -814,9 +815,9 @@ def hermeint(c, m=1, k=[], lbnd=0, scl=1, axis=0): if cnt != m: raise ValueError("The order of integration must be integer") - if cnt < 0 : + if cnt < 0: raise ValueError("The order of integration must be non-negative") - if len(k) > cnt : + if len(k) > cnt: raise ValueError("Too many integration constants") if iaxis != axis: raise ValueError("The axis must be integer") @@ -830,7 +831,7 @@ def hermeint(c, m=1, k=[], lbnd=0, scl=1, axis=0): c = np.rollaxis(c, iaxis) k = list(k) + [0]*(cnt - len(k)) - for i in range(cnt) : + for i in range(cnt): n = len(c) c *= scl if n == 1 and np.all(c[0] == 0): @@ -922,21 +923,21 @@ def hermeval(x, c, tensor=True): if isinstance(x, (tuple, list)): x = np.asarray(x) if isinstance(x, np.ndarray) and tensor: - c = c.reshape(c.shape + (1,)*x.ndim) + c = c.reshape(c.shape + (1,)*x.ndim) - if len(c) == 1 : + if len(c) == 1: c0 = c[0] c1 = 0 - elif len(c) == 2 : + elif len(c) == 2: c0 = c[0] c1 = c[1] - else : + else: nd = len(c) c0 = c[-2] c1 = c[-1] - for i in range(3, len(c) + 1) : + for i in range(3, len(c) + 1): tmp = c0 - nd = nd - 1 + nd = nd - 1 c0 = c[-i] - c1*(nd - 1) c1 = tmp + c1*x return c0 + c1*x @@ -1171,7 +1172,7 @@ def hermegrid3d(x, y, z, c): return c -def hermevander(x, deg) : +def hermevander(x, deg): """Pseudo-Vandermonde matrix of given degree. Returns the pseudo-Vandermonde matrix of degree `deg` and sample points @@ -1226,14 +1227,14 @@ def hermevander(x, deg) : dtyp = x.dtype v = np.empty(dims, dtype=dtyp) v[0] = x*0 + 1 - if ideg > 0 : + if ideg > 0: v[1] = x - for i in range(2, ideg + 1) : + for i in range(2, ideg + 1): v[i] = (v[i-1]*x - v[i-2]*(i - 1)) return np.rollaxis(v, 0, v.ndim) -def hermevander2d(x, y, deg) : +def hermevander2d(x, y, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1296,7 +1297,7 @@ def hermevander2d(x, y, deg) : return v.reshape(v.shape[:-2] + (-1,)) -def hermevander3d(x, y, z, deg) : +def hermevander3d(x, y, z, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1487,13 +1488,13 @@ def hermefit(x, y, deg, rcond=None, full=False, w=None): y = np.asarray(y) + 0.0 # check arguments. - if deg < 0 : + if deg < 0: raise ValueError("expected deg >= 0") if x.ndim != 1: raise TypeError("expected 1D vector for x") if x.size == 0: raise TypeError("expected non-empty vector for x") - if y.ndim < 1 or y.ndim > 2 : + if y.ndim < 1 or y.ndim > 2: raise TypeError("expected 1D or 2D array for y") if len(x) != len(y): raise TypeError("expected x and y to have same length") @@ -1513,7 +1514,7 @@ def hermefit(x, y, deg, rcond=None, full=False, w=None): rhs = rhs * w # set rcond - if rcond is None : + if rcond is None: rcond = len(x)*np.finfo(x.dtype).eps # Determine the norms of the design matrix columns. @@ -1532,9 +1533,9 @@ def hermefit(x, y, deg, rcond=None, full=False, w=None): msg = "The fit may be poorly conditioned" warnings.warn(msg, pu.RankWarning) - if full : + if full: return c, [resids, rank, s, rcond] - else : + else: return c @@ -1565,7 +1566,6 @@ def hermecompanion(c): .. 
versionadded::1.7.0 """ - accprod = np.multiply.accumulate # c is a trimmed copy [c] = pu.as_series([c]) if len(c) < 2: @@ -1633,9 +1633,9 @@ def hermeroots(c): """ # c is a trimmed copy [c] = pu.as_series([c]) - if len(c) <= 1 : + if len(c) <= 1: return np.array([], dtype=c.dtype) - if len(c) == 2 : + if len(c) == 2: return np.array([-c[0]/c[1]]) m = hermecompanion(c) diff --git a/numpy/polynomial/laguerre.py b/numpy/polynomial/laguerre.py index 9d88162ce056..8d2705d5d314 100644 --- a/numpy/polynomial/laguerre.py +++ b/numpy/polynomial/laguerre.py @@ -66,17 +66,18 @@ from . import polyutils as pu from ._polybase import ABCPolyBase -__all__ = ['lagzero', 'lagone', 'lagx', 'lagdomain', 'lagline', - 'lagadd', 'lagsub', 'lagmulx', 'lagmul', 'lagdiv', 'lagpow', - 'lagval', 'lagder', 'lagint', 'lag2poly', 'poly2lag', 'lagfromroots', - 'lagvander', 'lagfit', 'lagtrim', 'lagroots', 'Laguerre', 'lagval2d', - 'lagval3d', 'laggrid2d', 'laggrid3d', 'lagvander2d', 'lagvander3d', - 'lagcompanion', 'laggauss', 'lagweight'] +__all__ = [ + 'lagzero', 'lagone', 'lagx', 'lagdomain', 'lagline', 'lagadd', + 'lagsub', 'lagmulx', 'lagmul', 'lagdiv', 'lagpow', 'lagval', 'lagder', + 'lagint', 'lag2poly', 'poly2lag', 'lagfromroots', 'lagvander', + 'lagfit', 'lagtrim', 'lagroots', 'Laguerre', 'lagval2d', 'lagval3d', + 'laggrid2d', 'laggrid3d', 'lagvander2d', 'lagvander3d', 'lagcompanion', + 'laggauss', 'lagweight'] lagtrim = pu.trimcoef -def poly2lag(pol) : +def poly2lag(pol): """ poly2lag(pol) @@ -117,12 +118,12 @@ def poly2lag(pol) : [pol] = pu.as_series([pol]) deg = len(pol) - 1 res = 0 - for i in range(deg, -1, -1) : + for i in range(deg, -1, -1): res = lagadd(lagmulx(res), pol[i]) return res -def lag2poly(c) : +def lag2poly(c): """ Convert a Laguerre series to a polynomial. @@ -194,7 +195,7 @@ def lag2poly(c) : lagx = np.array([1, -1]) -def lagline(off, scl) : +def lagline(off, scl): """ Laguerre series whose graph is a straight line. @@ -224,13 +225,13 @@ def lagline(off, scl) : 5.0 """ - if scl != 0 : + if scl != 0: return np.array([off + scl, -scl]) - else : + else: return np.array([off]) -def lagfromroots(roots) : +def lagfromroots(roots): """ Generate a Laguerre series with given roots. 
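As a quick illustrative aside (not in the patch), poly2lag and lag2poly above are mutual inverses, which a doctest can show:

    >>> from numpy.polynomial import laguerre as L
    >>> c = L.poly2lag((1, 2, 3))    # power series 1 + 2x + 3x**2
    >>> L.lag2poly(c)
    array([ 1.,  2.,  3.])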
@@ -280,9 +281,9 @@ def lagfromroots(roots) : array([ 0.+0.j, 0.+0.j]) """ - if len(roots) == 0 : + if len(roots) == 0: return np.ones(1) - else : + else: [roots] = pu.as_series([roots], trim=False) roots.sort() p = [lagline(-r, 1) for r in roots] @@ -337,10 +338,10 @@ def lagadd(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] += c2 ret = c1 - else : + else: c2[:c1.size] += c1 ret = c2 return pu.trimseq(ret) @@ -385,10 +386,10 @@ def lagsub(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] -= c2 ret = c1 - else : + else: c2 = -c2 c2[:c1.size] += c1 ret = c2 @@ -499,13 +500,13 @@ def lagmul(c1, c2): elif len(c) == 2: c0 = c[0]*xs c1 = c[1]*xs - else : + else: nd = len(c) c0 = c[-2]*xs c1 = c[-1]*xs - for i in range(3, len(c) + 1) : + for i in range(3, len(c) + 1): tmp = c0 - nd = nd - 1 + nd = nd - 1 c0 = lagsub(c[-i]*xs, (c1*(nd - 1))/nd) c1 = lagadd(tmp, lagsub((2*nd - 1)*c1, lagmulx(c1))/nd) return lagadd(c0, lagsub(c1, lagmulx(c1))) @@ -556,16 +557,16 @@ def lagdiv(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if c2[-1] == 0 : + if c2[-1] == 0: raise ZeroDivisionError() lc1 = len(c1) lc2 = len(c2) - if lc1 < lc2 : + if lc1 < lc2: return c1[:1]*0, c1 - elif lc2 == 1 : + elif lc2 == 1: return c1/c2[-1], c1[:1]*0 - else : + else: quo = np.empty(lc1 - lc2 + 1, dtype=c1.dtype) rem = c1 for i in range(lc1 - lc2, - 1, -1): @@ -576,7 +577,7 @@ def lagdiv(c1, c2): return quo, pu.trimseq(rem) -def lagpow(c, pow, maxpower=16) : +def lagpow(c, pow, maxpower=16): """Raise a Laguerre series to a power. Returns the Laguerre series `c` raised to the power `pow`. The @@ -613,24 +614,24 @@ def lagpow(c, pow, maxpower=16) : # c is a trimmed copy [c] = pu.as_series([c]) power = int(pow) - if power != pow or power < 0 : + if power != pow or power < 0: raise ValueError("Power must be a non-negative integer.") - elif maxpower is not None and power > maxpower : + elif maxpower is not None and power > maxpower: raise ValueError("Power is too large") - elif power == 0 : + elif power == 0: return np.array([1], dtype=c.dtype) - elif power == 1 : + elif power == 1: return c - else : + else: # This can be made more efficient by using powers of two # in the usual way. prd = c - for i in range(2, power + 1) : + for i in range(2, power + 1): prd = lagmul(prd, c) return prd -def lagder(c, m=1, scl=1, axis=0) : +def lagder(c, m=1, scl=1, axis=0): """ Differentiate a Laguerre series. 
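For reference (an aside, not in the patch): the index juggling in the lagmul loop above is the standard Laguerre three-term recurrence run backwards,

.. math:: (n + 1)L_{n+1}(x) = (2n + 1 - x)L_n(x) - nL_{n-1}(x)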
@@ -708,7 +709,7 @@ def lagder(c, m=1, scl=1, axis=0) : n = len(c) if cnt >= n: c = c[:1]*0 - else : + else: for i in range(cnt): n = n - 1 c *= scl @@ -815,9 +816,9 @@ def lagint(c, m=1, k=[], lbnd=0, scl=1, axis=0): if cnt != m: raise ValueError("The order of integration must be integer") - if cnt < 0 : + if cnt < 0: raise ValueError("The order of integration must be non-negative") - if len(k) > cnt : + if len(k) > cnt: raise ValueError("Too many integration constants") if iaxis != axis: raise ValueError("The axis must be integer") @@ -831,7 +832,7 @@ def lagint(c, m=1, k=[], lbnd=0, scl=1, axis=0): c = np.rollaxis(c, iaxis) k = list(k) + [0]*(cnt - len(k)) - for i in range(cnt) : + for i in range(cnt): n = len(c) c *= scl if n == 1 and np.all(c[0] == 0): @@ -924,22 +925,21 @@ def lagval(x, c, tensor=True): if isinstance(x, (tuple, list)): x = np.asarray(x) if isinstance(x, np.ndarray) and tensor: - c = c.reshape(c.shape + (1,)*x.ndim) + c = c.reshape(c.shape + (1,)*x.ndim) - - if len(c) == 1 : + if len(c) == 1: c0 = c[0] c1 = 0 - elif len(c) == 2 : + elif len(c) == 2: c0 = c[0] c1 = c[1] - else : + else: nd = len(c) c0 = c[-2] c1 = c[-1] - for i in range(3, len(c) + 1) : + for i in range(3, len(c) + 1): tmp = c0 - nd = nd - 1 + nd = nd - 1 c0 = c[-i] - (c1*(nd - 1))/nd c1 = tmp + (c1*((2*nd - 1) - x))/nd return c0 + c1*(1 - x) @@ -1174,7 +1174,7 @@ def laggrid3d(x, y, z, c): return c -def lagvander(x, deg) : +def lagvander(x, deg): """Pseudo-Vandermonde matrix of given degree. Returns the pseudo-Vandermonde matrix of degree `deg` and sample points @@ -1229,14 +1229,14 @@ def lagvander(x, deg) : dtyp = x.dtype v = np.empty(dims, dtype=dtyp) v[0] = x*0 + 1 - if ideg > 0 : + if ideg > 0: v[1] = 1 - x - for i in range(2, ideg + 1) : + for i in range(2, ideg + 1): v[i] = (v[i-1]*(2*i - 1 - x) - v[i-2]*(i - 1))/i return np.rollaxis(v, 0, v.ndim) -def lagvander2d(x, y, deg) : +def lagvander2d(x, y, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1299,7 +1299,7 @@ def lagvander2d(x, y, deg) : return v.reshape(v.shape[:-2] + (-1,)) -def lagvander3d(x, y, z, deg) : +def lagvander3d(x, y, z, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1490,13 +1490,13 @@ def lagfit(x, y, deg, rcond=None, full=False, w=None): y = np.asarray(y) + 0.0 # check arguments. - if deg < 0 : + if deg < 0: raise ValueError("expected deg >= 0") if x.ndim != 1: raise TypeError("expected 1D vector for x") if x.size == 0: raise TypeError("expected non-empty vector for x") - if y.ndim < 1 or y.ndim > 2 : + if y.ndim < 1 or y.ndim > 2: raise TypeError("expected 1D or 2D array for y") if len(x) != len(y): raise TypeError("expected x and y to have same length") @@ -1516,7 +1516,7 @@ def lagfit(x, y, deg, rcond=None, full=False, w=None): rhs = rhs * w # set rcond - if rcond is None : + if rcond is None: rcond = len(x)*np.finfo(x.dtype).eps # Determine the norms of the design matrix columns. @@ -1535,9 +1535,9 @@ def lagfit(x, y, deg, rcond=None, full=False, w=None): msg = "The fit may be poorly conditioned" warnings.warn(msg, pu.RankWarning) - if full : + if full: return c, [resids, rank, s, rcond] - else : + else: return c @@ -1566,7 +1566,6 @@ def lagcompanion(c): .. 
versionadded::1.7.0 """ - accprod = np.multiply.accumulate # c is a trimmed copy [c] = pu.as_series([c]) if len(c) < 2: @@ -1634,9 +1633,9 @@ def lagroots(c): """ # c is a trimmed copy [c] = pu.as_series([c]) - if len(c) <= 1 : + if len(c) <= 1: return np.array([], dtype=c.dtype) - if len(c) == 2 : + if len(c) == 2: return np.array([1 + c[0]/c[1]]) m = lagcompanion(c) @@ -1651,8 +1650,8 @@ def laggauss(deg): Computes the sample points and weights for Gauss-Laguerre quadrature. These sample points and weights will correctly integrate polynomials of - degree :math:`2*deg - 1` or less over the interval :math:`[0, \inf]` with the - weight function :math:`f(x) = \exp(-x)`. + degree :math:`2*deg - 1` or less over the interval :math:`[0, \infty]` + with the weight function :math:`f(x) = \exp(-x)`. Parameters ---------- diff --git a/numpy/polynomial/legendre.py b/numpy/polynomial/legendre.py index 58c130b7e655..d2de282692d8 100644 --- a/numpy/polynomial/legendre.py +++ b/numpy/polynomial/legendre.py @@ -90,17 +90,18 @@ from . import polyutils as pu from ._polybase import ABCPolyBase -__all__ = ['legzero', 'legone', 'legx', 'legdomain', 'legline', - 'legadd', 'legsub', 'legmulx', 'legmul', 'legdiv', 'legpow', 'legval', - 'legder', 'legint', 'leg2poly', 'poly2leg', 'legfromroots', - 'legvander', 'legfit', 'legtrim', 'legroots', 'Legendre', 'legval2d', - 'legval3d', 'leggrid2d', 'leggrid3d', 'legvander2d', 'legvander3d', - 'legcompanion', 'leggauss', 'legweight'] +__all__ = [ + 'legzero', 'legone', 'legx', 'legdomain', 'legline', 'legadd', + 'legsub', 'legmulx', 'legmul', 'legdiv', 'legpow', 'legval', 'legder', + 'legint', 'leg2poly', 'poly2leg', 'legfromroots', 'legvander', + 'legfit', 'legtrim', 'legroots', 'Legendre', 'legval2d', 'legval3d', + 'leggrid2d', 'leggrid3d', 'legvander2d', 'legvander3d', 'legcompanion', + 'leggauss', 'legweight'] legtrim = pu.trimcoef -def poly2leg(pol) : +def poly2leg(pol): """ Convert a polynomial to a Legendre series. @@ -143,12 +144,12 @@ def poly2leg(pol) : [pol] = pu.as_series([pol]) deg = len(pol) - 1 res = 0 - for i in range(deg, -1, -1) : + for i in range(deg, -1, -1): res = legadd(legmulx(res), pol[i]) return res -def leg2poly(c) : +def leg2poly(c): """ Convert a Legendre series to a polynomial. @@ -202,7 +203,7 @@ def leg2poly(c) : c0 = c[-2] c1 = c[-1] # i is the current degree of c1 - for i in range(n - 1, 1, -1) : + for i in range(n - 1, 1, -1): tmp = c0 c0 = polysub(c[i - 2], (c1*(i - 1))/i) c1 = polyadd(tmp, (polymulx(c1)*(2*i - 1))/i) @@ -226,7 +227,7 @@ def leg2poly(c) : legx = np.array([0, 1]) -def legline(off, scl) : +def legline(off, scl): """ Legendre series whose graph is a straight line. @@ -256,13 +257,13 @@ def legline(off, scl) : -3.0 """ - if scl != 0 : + if scl != 0: return np.array([off, scl]) - else : + else: return np.array([off]) -def legfromroots(roots) : +def legfromroots(roots): """ Generate a Legendre series with given roots.
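A small numerical check (illustrative, not in the patch) of the Gauss-Laguerre property stated in the laggauss docstring above; the monomial x**2 is an arbitrary test integrand:

    import numpy as np
    from numpy.polynomial import laguerre as lag

    x, w = lag.laggauss(3)    # 3-point rule, exact through degree 5
    # integral over [0, inf) of x**2 * exp(-x) dx equals 2! = 2
    print(np.dot(w, x**2))    # prints approximately 2.0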
@@ -311,9 +312,9 @@ def legfromroots(roots) : array([ 1.33333333+0.j, 0.00000000+0.j, 0.66666667+0.j]) """ - if len(roots) == 0 : + if len(roots) == 0: return np.ones(1) - else : + else: [roots] = pu.as_series([roots], trim=False) roots.sort() p = [legline(-r, 1) for r in roots] @@ -369,10 +370,10 @@ def legadd(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] += c2 ret = c1 - else : + else: c2[:c1.size] += c1 ret = c2 return pu.trimseq(ret) @@ -421,10 +422,10 @@ def legsub(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] -= c2 ret = c1 - else : + else: c2 = -c2 c2[:c1.size] += c1 ret = c2 @@ -533,13 +534,13 @@ def legmul(c1, c2): elif len(c) == 2: c0 = c[0]*xs c1 = c[1]*xs - else : + else: nd = len(c) c0 = c[-2]*xs c1 = c[-1]*xs - for i in range(3, len(c) + 1) : + for i in range(3, len(c) + 1): tmp = c0 - nd = nd - 1 + nd = nd - 1 c0 = legsub(c[-i]*xs, (c1*(nd - 1))/nd) c1 = legadd(tmp, (legmulx(c1)*(2*nd - 1))/nd) return legadd(c0, legmulx(c1)) @@ -593,16 +594,16 @@ def legdiv(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if c2[-1] == 0 : + if c2[-1] == 0: raise ZeroDivisionError() lc1 = len(c1) lc2 = len(c2) - if lc1 < lc2 : + if lc1 < lc2: return c1[:1]*0, c1 - elif lc2 == 1 : + elif lc2 == 1: return c1/c2[-1], c1[:1]*0 - else : + else: quo = np.empty(lc1 - lc2 + 1, dtype=c1.dtype) rem = c1 for i in range(lc1 - lc2, - 1, -1): @@ -613,7 +614,7 @@ def legdiv(c1, c2): return quo, pu.trimseq(rem) -def legpow(c, pow, maxpower=16) : +def legpow(c, pow, maxpower=16): """Raise a Legendre series to a power. Returns the Legendre series `c` raised to the power `pow`. The @@ -647,24 +648,24 @@ def legpow(c, pow, maxpower=16) : # c is a trimmed copy [c] = pu.as_series([c]) power = int(pow) - if power != pow or power < 0 : + if power != pow or power < 0: raise ValueError("Power must be a non-negative integer.") - elif maxpower is not None and power > maxpower : + elif maxpower is not None and power > maxpower: raise ValueError("Power is too large") - elif power == 0 : + elif power == 0: return np.array([1], dtype=c.dtype) - elif power == 1 : + elif power == 1: return c - else : + else: # This can be made more efficient by using powers of two # in the usual way. prd = c - for i in range(2, power + 1) : + for i in range(2, power + 1): prd = legmul(prd, c) return prd -def legder(c, m=1, scl=1, axis=0) : +def legder(c, m=1, scl=1, axis=0): """ Differentiate a Legendre series. 
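Again for reference (not part of the patch), the legmul loop above leans on Bonnet's recurrence for Legendre polynomials,

.. math:: (n + 1)P_{n+1}(x) = (2n + 1)xP_n(x) - nP_{n-1}(x)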
@@ -747,7 +748,7 @@ def legder(c, m=1, scl=1, axis=0) : n = len(c) if cnt >= n: c = c[:1]*0 - else : + else: for i in range(cnt): n = n - 1 c *= scl @@ -857,9 +858,9 @@ def legint(c, m=1, k=[], lbnd=0, scl=1, axis=0): if cnt != m: raise ValueError("The order of integration must be integer") - if cnt < 0 : + if cnt < 0: raise ValueError("The order of integration must be non-negative") - if len(k) > cnt : + if len(k) > cnt: raise ValueError("Too many integration constants") if iaxis != axis: raise ValueError("The axis must be integer") @@ -873,7 +874,7 @@ def legint(c, m=1, k=[], lbnd=0, scl=1, axis=0): c = np.rollaxis(c, iaxis) k = list(k) + [0]*(cnt - len(k)) - for i in range(cnt) : + for i in range(cnt): n = len(c) c *= scl if n == 1 and np.all(c[0] == 0): @@ -964,19 +965,19 @@ def legval(x, c, tensor=True): if isinstance(x, np.ndarray) and tensor: c = c.reshape(c.shape + (1,)*x.ndim) - if len(c) == 1 : + if len(c) == 1: c0 = c[0] c1 = 0 - elif len(c) == 2 : + elif len(c) == 2: c0 = c[0] c1 = c[1] - else : + else: nd = len(c) c0 = c[-2] c1 = c[-1] - for i in range(3, len(c) + 1) : + for i in range(3, len(c) + 1): tmp = c0 - nd = nd - 1 + nd = nd - 1 c0 = c[-i] - (c1*(nd - 1))/nd c1 = tmp + (c1*x*(2*nd - 1))/nd return c0 + c1*x @@ -1211,7 +1212,7 @@ def leggrid3d(x, y, z, c): return c -def legvander(x, deg) : +def legvander(x, deg): """Pseudo-Vandermonde matrix of given degree. Returns the pseudo-Vandermonde matrix of degree `deg` and sample points @@ -1259,14 +1260,14 @@ def legvander(x, deg) : # Use forward recursion to generate the entries. This is not as accurate # as reverse recursion in this application but it is more efficient. v[0] = x*0 + 1 - if ideg > 0 : + if ideg > 0: v[1] = x - for i in range(2, ideg + 1) : + for i in range(2, ideg + 1): v[i] = (v[i-1]*x*(2*i - 1) - v[i-2]*(i - 1))/i return np.rollaxis(v, 0, v.ndim) -def legvander2d(x, y, deg) : +def legvander2d(x, y, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1329,7 +1330,7 @@ def legvander2d(x, y, deg) : return v.reshape(v.shape[:-2] + (-1,)) -def legvander3d(x, y, z, deg) : +def legvander3d(x, y, z, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1515,13 +1516,13 @@ def legfit(x, y, deg, rcond=None, full=False, w=None): y = np.asarray(y) + 0.0 # check arguments. - if deg < 0 : + if deg < 0: raise ValueError("expected deg >= 0") if x.ndim != 1: raise TypeError("expected 1D vector for x") if x.size == 0: raise TypeError("expected non-empty vector for x") - if y.ndim < 1 or y.ndim > 2 : + if y.ndim < 1 or y.ndim > 2: raise TypeError("expected 1D or 2D array for y") if len(x) != len(y): raise TypeError("expected x and y to have same length") @@ -1541,7 +1542,7 @@ def legfit(x, y, deg, rcond=None, full=False, w=None): rhs = rhs * w # set rcond - if rcond is None : + if rcond is None: rcond = len(x)*np.finfo(x.dtype).eps # Determine the norms of the design matrix columns. @@ -1560,9 +1561,9 @@ def legfit(x, y, deg, rcond=None, full=False, w=None): msg = "The fit may be poorly conditioned" warnings.warn(msg, pu.RankWarning) - if full : + if full: return c, [resids, rank, s, rcond] - else : + else: return c @@ -1637,11 +1638,11 @@ def legroots(c): ----- The root estimates are obtained as the eigenvalues of the companion matrix, Roots far from the origin of the complex plane may have large - errors due to the numerical instability of the series for such - values. 
Roots with multiplicity greater than 1 will also show larger - errors as the value of the series near such points is relatively - insensitive to errors in the roots. Isolated roots near the origin can - be improved by a few iterations of Newton's method. + errors due to the numerical instability of the series for such values. + Roots with multiplicity greater than 1 will also show larger errors as + the value of the series near such points is relatively insensitive to + errors in the roots. Isolated roots near the origin can be improved by + a few iterations of Newton's method. The Legendre series basis polynomials aren't powers of ``x`` so the results of this function may seem unintuitive. @@ -1649,7 +1650,7 @@ def legroots(c): Examples -------- >>> import numpy.polynomial.legendre as leg - >>> leg.legroots((1, 2, 3, 4)) # 4L_3 + 3L_2 + 2L_1 + 1L_0 has only real roots + >>> leg.legroots((1, 2, 3, 4)) # 4L_3 + 3L_2 + 2L_1 + 1L_0, all real roots array([-0.85099543, -0.11407192, 0.51506735]) """ diff --git a/numpy/polynomial/polynomial.py b/numpy/polynomial/polynomial.py index 60aaff83f5de..60e339a1d2ca 100644 --- a/numpy/polynomial/polynomial.py +++ b/numpy/polynomial/polynomial.py @@ -55,11 +55,12 @@ """ from __future__ import division, absolute_import, print_function -__all__ = ['polyzero', 'polyone', 'polyx', 'polydomain', 'polyline', - 'polyadd', 'polysub', 'polymulx', 'polymul', 'polydiv', 'polypow', - 'polyval', 'polyder', 'polyint', 'polyfromroots', 'polyvander', - 'polyfit', 'polytrim', 'polyroots', 'Polynomial', 'polyval2d', - 'polyval3d', 'polygrid2d', 'polygrid3d', 'polyvander2d', 'polyvander3d'] +__all__ = [ + 'polyzero', 'polyone', 'polyx', 'polydomain', 'polyline', 'polyadd', + 'polysub', 'polymulx', 'polymul', 'polydiv', 'polypow', 'polyval', + 'polyder', 'polyint', 'polyfromroots', 'polyvander', 'polyfit', + 'polytrim', 'polyroots', 'Polynomial', 'polyval2d', 'polyval3d', + 'polygrid2d', 'polygrid3d', 'polyvander2d', 'polyvander3d'] import warnings import numpy as np @@ -92,7 +93,7 @@ # -def polyline(off, scl) : +def polyline(off, scl): """ Returns an array representing a linear polynomial. @@ -113,20 +114,20 @@ def polyline(off, scl) : Examples -------- - >>> from numpy import polynomial as P + >>> from numpy.polynomial import polynomial as P >>> P.polyline(1,-1) array([ 1, -1]) >>> P.polyval(1, P.polyline(1,-1)) # should be 0 0.0 """ - if scl != 0 : + if scl != 0: return np.array([off, scl]) - else : + else: return np.array([off]) -def polyfromroots(roots) : +def polyfromroots(roots): """ Generate a monic polynomial with given roots. 
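A minimal sketch (an aside, not in the patch) of the Newton polishing suggested in the legroots Notes above, built only from legval and legder; three iterations is an arbitrary choice:

    import numpy as np
    from numpy.polynomial import legendre as leg

    c = (1, 2, 3, 4)            # 4L_3 + 3L_2 + 2L_1 + 1L_0
    roots = leg.legroots(c)     # eigenvalues of the companion matrix
    dc = leg.legder(c)          # coefficients of the derivative series
    for _ in range(3):          # Newton: x <- x - f(x)/f'(x)
        roots = roots - leg.legval(roots, c)/leg.legval(roots, dc)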
@@ -176,7 +177,7 @@ def polyfromroots(roots) : Examples -------- - >>> import numpy.polynomial as P + >>> from numpy.polynomial import polynomial as P >>> P.polyfromroots((-1,0,1)) # x(x - 1)(x + 1) = x^3 - x array([ 0., -1., 0., 1.]) >>> j = complex(0,1) @@ -184,9 +185,9 @@ def polyfromroots(roots) : array([ 1.+0.j, 0.+0.j, 1.+0.j]) """ - if len(roots) == 0 : + if len(roots) == 0: return np.ones(1) - else : + else: [roots] = pu.as_series([roots], trim=False) roots.sort() p = [polyline(-r, 1) for r in roots] @@ -225,7 +226,7 @@ def polyadd(c1, c2): Examples -------- - >>> from numpy import polynomial as P + >>> from numpy.polynomial import polynomial as P >>> c1 = (1,2,3) >>> c2 = (3,2,1) >>> sum = P.polyadd(c1,c2); sum @@ -236,10 +237,10 @@ def polyadd(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] += c2 ret = c1 - else : + else: c2[:c1.size] += c1 ret = c2 return pu.trimseq(ret) @@ -270,7 +271,7 @@ def polysub(c1, c2): Examples -------- - >>> from numpy import polynomial as P + >>> from numpy.polynomial import polynomial as P >>> c1 = (1,2,3) >>> c2 = (3,2,1) >>> P.polysub(c1,c2) @@ -281,10 +282,10 @@ def polysub(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if len(c1) > len(c2) : + if len(c1) > len(c2): c1[:c2.size] -= c2 ret = c1 - else : + else: c2 = -c2 c2[:c1.size] += c1 ret = c2 @@ -352,7 +353,7 @@ def polymul(c1, c2): Examples -------- - >>> import numpy.polynomial as P + >>> from numpy.polynomial import polynomial as P >>> c1 = (1,2,3) >>> c2 = (3,2,1) >>> P.polymul(c1,c2) @@ -389,7 +390,7 @@ def polydiv(c1, c2): Examples -------- - >>> import numpy.polynomial as P + >>> from numpy.polynomial import polynomial as P >>> c1 = (1,2,3) >>> c2 = (3,2,1) >>> P.polydiv(c1,c2) @@ -400,29 +401,29 @@ def polydiv(c1, c2): """ # c1, c2 are trimmed copies [c1, c2] = pu.as_series([c1, c2]) - if c2[-1] == 0 : + if c2[-1] == 0: raise ZeroDivisionError() len1 = len(c1) len2 = len(c2) - if len2 == 1 : + if len2 == 1: return c1/c2[-1], c1[:1]*0 - elif len1 < len2 : + elif len1 < len2: return c1[:1]*0, c1 - else : + else: dlen = len1 - len2 scl = c2[-1] - c2 = c2[:-1]/scl + c2 = c2[:-1]/scl i = dlen j = len1 - 1 - while i >= 0 : + while i >= 0: c1[i:j] -= c2*c1[j] i -= 1 j -= 1 return c1[j+1:]/scl, pu.trimseq(c1[:j+1]) -def polypow(c, pow, maxpower=None) : +def polypow(c, pow, maxpower=None): """Raise a polynomial to a power. Returns the polynomial `c` raised to the power `pow`. The argument @@ -456,19 +457,19 @@ def polypow(c, pow, maxpower=None) : # c is a trimmed copy [c] = pu.as_series([c]) power = int(pow) - if power != pow or power < 0 : + if power != pow or power < 0: raise ValueError("Power must be a non-negative integer.") - elif maxpower is not None and power > maxpower : + elif maxpower is not None and power > maxpower: raise ValueError("Power is too large") - elif power == 0 : + elif power == 0: return np.array([1], dtype=c.dtype) - elif power == 1 : + elif power == 1: return c - else : + else: # This can be made more efficient by using powers of two # in the usual way. 
prd = c - for i in range(2, power + 1) : + for i in range(2, power + 1): prd = np.convolve(prd, c) return prd @@ -513,7 +514,7 @@ def polyder(c, m=1, scl=1, axis=0): Examples -------- - >>> from numpy import polynomial as P + >>> from numpy.polynomial import polynomial as P >>> c = (1,2,3,4) # 1 + 2x + 3x**2 + 4x**3 >>> P.polyder(c) # (d/dx)(c) = 2 + 6x + 12x**2 array([ 2., 6., 12.]) @@ -550,7 +551,7 @@ def polyder(c, m=1, scl=1, axis=0): n = len(c) if cnt >= n: c = c[:1]*0 - else : + else: for i in range(cnt): n = n - 1 c *= scl @@ -624,7 +625,7 @@ def polyint(c, m=1, k=[], lbnd=0, scl=1, axis=0): Examples -------- - >>> from numpy import polynomial as P + >>> from numpy.polynomial import polynomial as P >>> c = (1,2,3) >>> P.polyint(c) # should return array([0, 1, 1, 1]) array([ 0., 1., 1., 1.]) @@ -650,9 +651,9 @@ def polyint(c, m=1, k=[], lbnd=0, scl=1, axis=0): if cnt != m: raise ValueError("The order of integration must be integer") - if cnt < 0 : + if cnt < 0: raise ValueError("The order of integration must be non-negative") - if len(k) > cnt : + if len(k) > cnt: raise ValueError("Too many integration constants") if iaxis != axis: raise ValueError("The axis must be integer") @@ -661,7 +662,6 @@ def polyint(c, m=1, k=[], lbnd=0, scl=1, axis=0): if iaxis < 0: iaxis += c.ndim - if cnt == 0: return c @@ -775,7 +775,7 @@ def polyval(x, c, tensor=True): c = c.reshape(c.shape + (1,)*x.ndim) c0 = c[-1] + x*0 - for i in range(2, len(c) + 1) : + for i in range(2, len(c) + 1): c0 = c[-i] + c0*x return c0 @@ -1010,7 +1010,7 @@ def polygrid3d(x, y, z, c): return c -def polyvander(x, deg) : +def polyvander(x, deg): """Vandermonde matrix of given degree. Returns the Vandermonde matrix of degree `deg` and sample points @@ -1059,14 +1059,14 @@ def polyvander(x, deg) : dtyp = x.dtype v = np.empty(dims, dtype=dtyp) v[0] = x*0 + 1 - if ideg > 0 : + if ideg > 0: v[1] = x - for i in range(2, ideg + 1) : + for i in range(2, ideg + 1): v[i] = v[i-1]*x return np.rollaxis(v, 0, v.ndim) -def polyvander2d(x, y, deg) : +def polyvander2d(x, y, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1126,7 +1126,7 @@ def polyvander2d(x, y, deg) : return v.reshape(v.shape[:-2] + (-1,)) -def polyvander3d(x, y, z, deg) : +def polyvander3d(x, y, z, deg): """Pseudo-Vandermonde matrix of given degrees. Returns the pseudo-Vandermonde matrix of degrees `deg` and sample @@ -1254,7 +1254,7 @@ def polyfit(x, y, deg, rcond=None, full=False, w=None): rcond -- value of `rcond`. For more details, see `linalg.lstsq`. - + Raises ------ RankWarning @@ -1310,7 +1310,7 @@ def polyfit(x, y, deg, rcond=None, full=False, w=None): Examples -------- - >>> from numpy import polynomial as P + >>> from numpy.polynomial import polynomial as P >>> x = np.linspace(-1,1,51) # x "data": [-1, -0.96, ..., 0.96, 1] >>> y = x**3 - x + np.random.randn(len(x)) # x^3 - x + N(0,1) "noise" >>> c, stats = P.polyfit(x,y,3,full=True) @@ -1337,13 +1337,13 @@ def polyfit(x, y, deg, rcond=None, full=False, w=None): y = np.asarray(y) + 0.0 # check arguments. 
- if deg < 0 : + if deg < 0: raise ValueError("expected deg >= 0") if x.ndim != 1: raise TypeError("expected 1D vector for x") if x.size == 0: raise TypeError("expected non-empty vector for x") - if y.ndim < 1 or y.ndim > 2 : + if y.ndim < 1 or y.ndim > 2: raise TypeError("expected 1D or 2D array for y") if len(x) != len(y): raise TypeError("expected x and y to have same length") @@ -1363,7 +1363,7 @@ def polyfit(x, y, deg, rcond=None, full=False, w=None): rhs = rhs * w # set rcond - if rcond is None : + if rcond is None: rcond = len(x)*np.finfo(x.dtype).eps # Determine the norms of the design matrix columns. @@ -1382,9 +1382,9 @@ def polyfit(x, y, deg, rcond=None, full=False, w=None): msg = "The fit may be poorly conditioned" warnings.warn(msg, pu.RankWarning) - if full : + if full: return c, [resids, rank, s, rcond] - else : + else: return c @@ -1415,7 +1415,7 @@ def polycompanion(c): """ # c is a trimmed copy [c] = pu.as_series([c]) - if len(c) < 2 : + if len(c) < 2: raise ValueError('Series must have maximum degree of at least 1.') if len(c) == 2: return np.array([[-c[0]/c[1]]]) diff --git a/numpy/polynomial/polyutils.py b/numpy/polynomial/polyutils.py index 99f508521c64..9348559edb97 100644 --- a/numpy/polynomial/polyutils.py +++ b/numpy/polynomial/polyutils.py @@ -45,27 +45,25 @@ """ from __future__ import division, absolute_import, print_function -__all__ = ['RankWarning', 'PolyError', 'PolyDomainError', 'as_series', - 'trimseq', 'trimcoef', 'getdomain', 'mapdomain', 'mapparms', - 'PolyBase'] - -import warnings import numpy as np -import sys + +__all__ = [ + 'RankWarning', 'PolyError', 'PolyDomainError', 'as_series', 'trimseq', + 'trimcoef', 'getdomain', 'mapdomain', 'mapparms', 'PolyBase'] # # Warnings and Exceptions # -class RankWarning(UserWarning) : +class RankWarning(UserWarning): """Issued by chebfit when the design matrix is rank deficient.""" pass -class PolyError(Exception) : +class PolyError(Exception): """Base class for errors in this module.""" pass -class PolyDomainError(PolyError) : +class PolyDomainError(PolyError): """Issued by the generic Poly class when two domains don't match. This is raised when an binary operation is passed Poly objects with @@ -78,7 +76,7 @@ class PolyDomainError(PolyError) : # Base class for all polynomial types # -class PolyBase(object) : +class PolyBase(object): """ Base class for all polynomial types. @@ -93,7 +91,7 @@ class PolyBase(object) : # # Helper functions to convert inputs to 1-D arrays # -def trimseq(seq) : +def trimseq(seq): """Remove small Poly series coefficients. Parameters @@ -114,16 +112,16 @@ def trimseq(seq) : Do not lose the type info if the sequence contains unknown objects. """ - if len(seq) == 0 : + if len(seq) == 0: return seq - else : - for i in range(len(seq) - 1, -1, -1) : - if seq[i] != 0 : + else: + for i in range(len(seq) - 1, -1, -1): + if seq[i] != 0: break return seq[:i+1] -def as_series(alist, trim=True) : +def as_series(alist, trim=True): """ Return argument as a list of 1-d arrays. 
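An illustrative doctest for trimseq above (not in the patch); only exact trailing zeros are removed, and an all-zero input keeps a single element:

    >>> import numpy as np
    >>> import numpy.polynomial.polyutils as pu
    >>> pu.trimseq(np.array([1., 2., 0., 0.]))
    array([ 1.,  2.])
    >>> pu.trimseq(np.array([0., 0.]))
    array([ 0.])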
@@ -165,32 +163,32 @@ def as_series(alist, trim=True) : """ arrays = [np.array(a, ndmin=1, copy=0) for a in alist] - if min([a.size for a in arrays]) == 0 : + if min([a.size for a in arrays]) == 0: raise ValueError("Coefficient array is empty") - if any([a.ndim != 1 for a in arrays]) : + if any([a.ndim != 1 for a in arrays]): raise ValueError("Coefficient array is not 1-d") - if trim : + if trim: arrays = [trimseq(a) for a in arrays] - if any([a.dtype == np.dtype(object) for a in arrays]) : + if any([a.dtype == np.dtype(object) for a in arrays]): ret = [] - for a in arrays : - if a.dtype != np.dtype(object) : + for a in arrays: + if a.dtype != np.dtype(object): tmp = np.empty(len(a), dtype=np.dtype(object)) tmp[:] = a[:] ret.append(tmp) - else : + else: ret.append(a.copy()) - else : - try : + else: + try: dtype = np.common_type(*arrays) - except : + except: raise ValueError("Coefficient arrays have no common type") ret = [np.array(a, copy=1, dtype=dtype) for a in arrays] return ret -def trimcoef(c, tol=0) : +def trimcoef(c, tol=0): """ Remove "small" "trailing" coefficients from a polynomial. @@ -234,17 +232,17 @@ def trimcoef(c, tol=0) : array([ 0.0003+0.j , 0.0010-0.001j]) """ - if tol < 0 : + if tol < 0: raise ValueError("tol must be non-negative") [c] = as_series([c]) [ind] = np.where(np.abs(c) > tol) - if len(ind) == 0 : + if len(ind) == 0: return c[:1]*0 - else : + else: return c[:ind[-1] + 1].copy() -def getdomain(x) : +def getdomain(x): """ Return a domain suitable for given abscissae. @@ -283,14 +281,14 @@ def getdomain(x) : """ [x] = as_series([x], trim=False) - if x.dtype.char in np.typecodes['Complex'] : + if x.dtype.char in np.typecodes['Complex']: rmin, rmax = x.real.min(), x.real.max() imin, imax = x.imag.min(), x.imag.max() return np.array((complex(rmin, imin), complex(rmax, imax))) - else : + else: return np.array((x.min(), x.max())) -def mapparms(old, new) : +def mapparms(old, new): """ Linear map parameters between domains. @@ -337,7 +335,7 @@ def mapparms(old, new) : scl = newlen/oldlen return off, scl -def mapdomain(x, old, new) : +def mapdomain(x, old, new): """ Apply linear map to input points. 
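A short doctest (an aside, not in the patch) tying mapparms and mapdomain together: mapparms returns (off, scl) such that new = off + scl*old, and mapdomain applies that map pointwise:

    >>> import numpy.polynomial.polyutils as pu
    >>> pu.mapparms((-1, 1), (0, 4))    # maps [-1, 1] onto [0, 4]
    (2.0, 2.0)
    >>> pu.mapdomain([-1.0, 0.0, 1.0], (-1, 1), (0, 4))
    array([ 0.,  2.,  4.])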
diff --git a/numpy/polynomial/tests/test_chebyshev.py b/numpy/polynomial/tests/test_chebyshev.py index 82c3ba9ea696..a596905f6771 100644 --- a/numpy/polynomial/tests/test_chebyshev.py +++ b/numpy/polynomial/tests/test_chebyshev.py @@ -400,14 +400,14 @@ def f(x): return x*(x - 1)*(x - 2) # Test exceptions - assert_raises(ValueError, cheb.chebfit, [1], [1], -1) - assert_raises(TypeError, cheb.chebfit, [[1]], [1], 0) - assert_raises(TypeError, cheb.chebfit, [], [1], 0) - assert_raises(TypeError, cheb.chebfit, [1], [[[1]]], 0) - assert_raises(TypeError, cheb.chebfit, [1, 2], [1], 0) - assert_raises(TypeError, cheb.chebfit, [1], [1, 2], 0) - assert_raises(TypeError, cheb.chebfit, [1], [1], 0, w=[[1]]) - assert_raises(TypeError, cheb.chebfit, [1], [1], 0, w=[1, 1]) + assert_raises(ValueError, cheb.chebfit, [1], [1], -1) + assert_raises(TypeError, cheb.chebfit, [[1]], [1], 0) + assert_raises(TypeError, cheb.chebfit, [], [1], 0) + assert_raises(TypeError, cheb.chebfit, [1], [[[1]]], 0) + assert_raises(TypeError, cheb.chebfit, [1, 2], [1], 0) + assert_raises(TypeError, cheb.chebfit, [1], [1, 2], 0) + assert_raises(TypeError, cheb.chebfit, [1], [1], 0, w=[[1]]) + assert_raises(TypeError, cheb.chebfit, [1], [1], 0, w=[1, 1]) # Test fit x = np.linspace(0, 2) @@ -532,7 +532,7 @@ def test_chebpts1(self): assert_almost_equal(cheb.chebpts1(2), tgt) tgt = [-0.86602540378443871, 0, 0.86602540378443871] assert_almost_equal(cheb.chebpts1(3), tgt) - tgt = [-0.9238795325, -0.3826834323, 0.3826834323, 0.9238795325] + tgt = [-0.9238795325, -0.3826834323, 0.3826834323, 0.9238795325] assert_almost_equal(cheb.chebpts1(4), tgt) def test_chebpts2(self): diff --git a/numpy/polynomial/tests/test_classes.py b/numpy/polynomial/tests/test_classes.py index f9134b8c10ec..cd5a54687939 100644 --- a/numpy/polynomial/tests/test_classes.py +++ b/numpy/polynomial/tests/test_classes.py @@ -10,12 +10,10 @@ import numpy as np from numpy.polynomial import ( - Polynomial, Legendre, Chebyshev, Laguerre, - Hermite, HermiteE) + Polynomial, Legendre, Chebyshev, Laguerre, Hermite, HermiteE) from numpy.testing import ( - TestCase, assert_almost_equal, assert_raises, - assert_equal, assert_, run_module_suite, dec) -from numpy.testing.noseclasses import KnownFailure + assert_almost_equal, assert_raises, assert_equal, assert_, + run_module_suite) from numpy.compat import long @@ -410,6 +408,9 @@ def check_roots(Poly): d = Poly.domain + random((2,))*.25 w = Poly.window + random((2,))*.25 tgt = np.sort(random((5,))) + res = np.sort(Poly.fromroots(tgt, domain=d, window=w).roots()) + assert_almost_equal(res, tgt) + # default domain and window res = np.sort(Poly.fromroots(tgt).roots()) assert_almost_equal(res, tgt) @@ -468,6 +469,12 @@ def check_deriv(Poly): p3 = p1.integ(1, k=[1]) assert_almost_equal(p2.deriv(1).coef, p3.coef) assert_almost_equal(p2.deriv(2).coef, p1.coef) + # default domain and window + p1 = Poly([1, 2, 3]) + p2 = p1.integ(2, k=[1, 2]) + p3 = p1.integ(1, k=[1]) + assert_almost_equal(p2.deriv(1).coef, p3.coef) + assert_almost_equal(p2.deriv(2).coef, p1.coef) def check_linspace(Poly): @@ -491,11 +498,18 @@ def check_linspace(Poly): def check_pow(Poly): d = Poly.domain + random((2,))*.25 w = Poly.window + random((2,))*.25 - tgt = Poly([1], domain=d, window=d) - tst = Poly([1, 2, 3], domain=d, window=d) + tgt = Poly([1], domain=d, window=w) + tst = Poly([1, 2, 3], domain=d, window=w) + for i in range(5): + assert_poly_almost_equal(tst**i, tgt) + tgt = tgt * tst + # default domain and window + tgt = Poly([1]) + tst = Poly([1, 2, 3]) 
for i in range(5): assert_poly_almost_equal(tst**i, tgt) tgt = tgt * tst + # check error for invalid powers assert_raises(ValueError, op.pow, tgt, 1.5) assert_raises(ValueError, op.pow, tgt, -1) diff --git a/numpy/polynomial/tests/test_hermite.py b/numpy/polynomial/tests/test_hermite.py index ac60007d1cf4..e67625a88139 100644 --- a/numpy/polynomial/tests/test_hermite.py +++ b/numpy/polynomial/tests/test_hermite.py @@ -119,7 +119,6 @@ def test_hermval(self): y = [polyval(x, c) for c in Hlist] for i in range(10): msg = "At i=%d" % i - ser = np.zeros tgt = y[i] res = herm.hermval(x, [0]*i + [1]) assert_almost_equal(res, tgt, err_msg=msg) @@ -389,14 +388,14 @@ def f(x): return x*(x - 1)*(x - 2) # Test exceptions - assert_raises(ValueError, herm.hermfit, [1], [1], -1) - assert_raises(TypeError, herm.hermfit, [[1]], [1], 0) - assert_raises(TypeError, herm.hermfit, [], [1], 0) - assert_raises(TypeError, herm.hermfit, [1], [[[1]]], 0) - assert_raises(TypeError, herm.hermfit, [1, 2], [1], 0) - assert_raises(TypeError, herm.hermfit, [1], [1, 2], 0) - assert_raises(TypeError, herm.hermfit, [1], [1], 0, w=[[1]]) - assert_raises(TypeError, herm.hermfit, [1], [1], 0, w=[1, 1]) + assert_raises(ValueError, herm.hermfit, [1], [1], -1) + assert_raises(TypeError, herm.hermfit, [[1]], [1], 0) + assert_raises(TypeError, herm.hermfit, [], [1], 0) + assert_raises(TypeError, herm.hermfit, [1], [[[1]]], 0) + assert_raises(TypeError, herm.hermfit, [1, 2], [1], 0) + assert_raises(TypeError, herm.hermfit, [1], [1, 2], 0) + assert_raises(TypeError, herm.hermfit, [1], [1], 0, w=[[1]]) + assert_raises(TypeError, herm.hermfit, [1], [1], 0, w=[1, 1]) # Test fit x = np.linspace(0, 2) diff --git a/numpy/polynomial/tests/test_hermite_e.py b/numpy/polynomial/tests/test_hermite_e.py index 5341dc7ff046..f8601a82846a 100644 --- a/numpy/polynomial/tests/test_hermite_e.py +++ b/numpy/polynomial/tests/test_hermite_e.py @@ -6,7 +6,9 @@ import numpy as np import numpy.polynomial.hermite_e as herme from numpy.polynomial.polynomial import polyval -from numpy.testing import * +from numpy.testing import ( + TestCase, assert_almost_equal, assert_raises, + assert_equal, assert_, run_module_suite) He0 = np.array([1]) He1 = np.array([0, 1]) @@ -117,7 +119,6 @@ def test_hermeval(self): y = [polyval(x, c) for c in Helist] for i in range(10): msg = "At i=%d" % i - ser = np.zeros tgt = y[i] res = herme.hermeval(x, [0]*i + [1]) assert_almost_equal(res, tgt, err_msg=msg) @@ -388,14 +389,14 @@ def f(x): return x*(x - 1)*(x - 2) # Test exceptions - assert_raises(ValueError, herme.hermefit, [1], [1], -1) - assert_raises(TypeError, herme.hermefit, [[1]], [1], 0) - assert_raises(TypeError, herme.hermefit, [], [1], 0) - assert_raises(TypeError, herme.hermefit, [1], [[[1]]], 0) - assert_raises(TypeError, herme.hermefit, [1, 2], [1], 0) - assert_raises(TypeError, herme.hermefit, [1], [1, 2], 0) - assert_raises(TypeError, herme.hermefit, [1], [1], 0, w=[[1]]) - assert_raises(TypeError, herme.hermefit, [1], [1], 0, w=[1, 1]) + assert_raises(ValueError, herme.hermefit, [1], [1], -1) + assert_raises(TypeError, herme.hermefit, [[1]], [1], 0) + assert_raises(TypeError, herme.hermefit, [], [1], 0) + assert_raises(TypeError, herme.hermefit, [1], [[[1]]], 0) + assert_raises(TypeError, herme.hermefit, [1, 2], [1], 0) + assert_raises(TypeError, herme.hermefit, [1], [1, 2], 0) + assert_raises(TypeError, herme.hermefit, [1], [1], 0, w=[[1]]) + assert_raises(TypeError, herme.hermefit, [1], [1], 0, w=[1, 1]) # Test fit x = np.linspace(0, 2) diff --git 
a/numpy/polynomial/tests/test_laguerre.py b/numpy/polynomial/tests/test_laguerre.py index b3d8fe5ee818..1dc57a960294 100644 --- a/numpy/polynomial/tests/test_laguerre.py +++ b/numpy/polynomial/tests/test_laguerre.py @@ -116,7 +116,6 @@ def test_lagval(self): y = [polyval(x, c) for c in Llist] for i in range(7): msg = "At i=%d" % i - ser = np.zeros tgt = y[i] res = lag.lagval(x, [0]*i + [1]) assert_almost_equal(res, tgt, err_msg=msg) @@ -386,14 +385,14 @@ def f(x): return x*(x - 1)*(x - 2) # Test exceptions - assert_raises(ValueError, lag.lagfit, [1], [1], -1) - assert_raises(TypeError, lag.lagfit, [[1]], [1], 0) - assert_raises(TypeError, lag.lagfit, [], [1], 0) - assert_raises(TypeError, lag.lagfit, [1], [[[1]]], 0) - assert_raises(TypeError, lag.lagfit, [1, 2], [1], 0) - assert_raises(TypeError, lag.lagfit, [1], [1, 2], 0) - assert_raises(TypeError, lag.lagfit, [1], [1], 0, w=[[1]]) - assert_raises(TypeError, lag.lagfit, [1], [1], 0, w=[1, 1]) + assert_raises(ValueError, lag.lagfit, [1], [1], -1) + assert_raises(TypeError, lag.lagfit, [[1]], [1], 0) + assert_raises(TypeError, lag.lagfit, [], [1], 0) + assert_raises(TypeError, lag.lagfit, [1], [[[1]]], 0) + assert_raises(TypeError, lag.lagfit, [1, 2], [1], 0) + assert_raises(TypeError, lag.lagfit, [1], [1, 2], 0) + assert_raises(TypeError, lag.lagfit, [1], [1], 0, w=[[1]]) + assert_raises(TypeError, lag.lagfit, [1], [1], 0, w=[1, 1]) # Test fit x = np.linspace(0, 2) diff --git a/numpy/polynomial/tests/test_legendre.py b/numpy/polynomial/tests/test_legendre.py index e248f005d492..8ac1feb589d4 100644 --- a/numpy/polynomial/tests/test_legendre.py +++ b/numpy/polynomial/tests/test_legendre.py @@ -120,7 +120,6 @@ def test_legval(self): y = [polyval(x, c) for c in Llist] for i in range(10): msg = "At i=%d" % i - ser = np.zeros tgt = y[i] res = leg.legval(x, [0]*i + [1]) assert_almost_equal(res, tgt, err_msg=msg) @@ -390,14 +389,14 @@ def f(x): return x*(x - 1)*(x - 2) # Test exceptions - assert_raises(ValueError, leg.legfit, [1], [1], -1) - assert_raises(TypeError, leg.legfit, [[1]], [1], 0) - assert_raises(TypeError, leg.legfit, [], [1], 0) - assert_raises(TypeError, leg.legfit, [1], [[[1]]], 0) - assert_raises(TypeError, leg.legfit, [1, 2], [1], 0) - assert_raises(TypeError, leg.legfit, [1], [1, 2], 0) - assert_raises(TypeError, leg.legfit, [1], [1], 0, w=[[1]]) - assert_raises(TypeError, leg.legfit, [1], [1], 0, w=[1, 1]) + assert_raises(ValueError, leg.legfit, [1], [1], -1) + assert_raises(TypeError, leg.legfit, [[1]], [1], 0) + assert_raises(TypeError, leg.legfit, [], [1], 0) + assert_raises(TypeError, leg.legfit, [1], [[[1]]], 0) + assert_raises(TypeError, leg.legfit, [1, 2], [1], 0) + assert_raises(TypeError, leg.legfit, [1], [1, 2], 0) + assert_raises(TypeError, leg.legfit, [1], [1], 0, w=[[1]]) + assert_raises(TypeError, leg.legfit, [1], [1], 0, w=[1, 1]) # Test fit x = np.linspace(0, 2) diff --git a/numpy/polynomial/tests/test_polynomial.py b/numpy/polynomial/tests/test_polynomial.py index 77092cd2f812..c806a8497492 100644 --- a/numpy/polynomial/tests/test_polynomial.py +++ b/numpy/polynomial/tests/test_polynomial.py @@ -420,14 +420,14 @@ def f(x): return x*(x - 1)*(x - 2) # Test exceptions - assert_raises(ValueError, poly.polyfit, [1], [1], -1) - assert_raises(TypeError, poly.polyfit, [[1]], [1], 0) - assert_raises(TypeError, poly.polyfit, [], [1], 0) - assert_raises(TypeError, poly.polyfit, [1], [[[1]]], 0) - assert_raises(TypeError, poly.polyfit, [1, 2], [1], 0) - assert_raises(TypeError, poly.polyfit, [1], [1, 2], 0) - 
assert_raises(TypeError, poly.polyfit, [1], [1], 0, w=[[1]]) - assert_raises(TypeError, poly.polyfit, [1], [1], 0, w=[1, 1]) + assert_raises(ValueError, poly.polyfit, [1], [1], -1) + assert_raises(TypeError, poly.polyfit, [[1]], [1], 0) + assert_raises(TypeError, poly.polyfit, [], [1], 0) + assert_raises(TypeError, poly.polyfit, [1], [[[1]]], 0) + assert_raises(TypeError, poly.polyfit, [1, 2], [1], 0) + assert_raises(TypeError, poly.polyfit, [1], [1, 2], 0) + assert_raises(TypeError, poly.polyfit, [1], [1], 0, w=[[1]]) + assert_raises(TypeError, poly.polyfit, [1], [1], 0, w=[1, 1]) # Test fit x = np.linspace(0, 2) diff --git a/numpy/polynomial/tests/test_polyutils.py b/numpy/polynomial/tests/test_polyutils.py index c77ee24354ed..974e2e09a388 100644 --- a/numpy/polynomial/tests/test_polyutils.py +++ b/numpy/polynomial/tests/test_polyutils.py @@ -5,7 +5,9 @@ import numpy as np import numpy.polynomial.polyutils as pu -from numpy.testing import * +from numpy.testing import ( + TestCase, assert_almost_equal, assert_raises, + assert_equal, assert_, run_module_suite) class TestMisc(TestCase): @@ -101,3 +103,7 @@ def test_mapparms(self): tgt = [-1 + 1j, 1 - 1j] res = pu.mapparms(dom1, dom2) assert_almost_equal(res, tgt) + + +if __name__ == "__main__": + run_module_suite() diff --git a/numpy/random/mtrand/mtrand.pyx b/numpy/random/mtrand/mtrand.pyx index c2603543d631..b811726d3eca 100644 --- a/numpy/random/mtrand/mtrand.pyx +++ b/numpy/random/mtrand/mtrand.pyx @@ -127,7 +127,10 @@ import_array() import numpy as np import operator import warnings -from threading import Lock +try: + from threading import Lock +except ImportError: + from dummy_threading import Lock cdef object cont0_array(rk_state *state, rk_cont0 func, object size, object lock): @@ -607,8 +610,8 @@ cdef class RandomState: def __init__(self, seed=None): self.internal_state = PyMem_Malloc(sizeof(rk_state)) - self.seed(seed) self.lock = Lock() + self.seed(seed) def __dealloc__(self): if self.internal_state != NULL: @@ -639,19 +642,22 @@ cdef class RandomState: cdef ndarray obj "arrayObject_obj" try: if seed is None: - errcode = rk_randomseed(self.internal_state) + with self.lock: + errcode = rk_randomseed(self.internal_state) else: idx = operator.index(seed) if idx > int(2**32 - 1) or idx < 0: raise ValueError("Seed must be between 0 and 4294967295") - rk_seed(idx, self.internal_state) + with self.lock: + rk_seed(idx, self.internal_state) except TypeError: obj = np.asarray(seed).astype(np.int64, casting='safe') if ((obj > int(2**32 - 1)) | (obj < 0)).any(): raise ValueError("Seed must be between 0 and 4294967295") obj = obj.astype('L', casting='unsafe') - init_by_array(self.internal_state, PyArray_DATA(obj), - PyArray_DIM(obj, 0)) + with self.lock: + init_by_array(self.internal_state, PyArray_DATA(obj), + PyArray_DIM(obj, 0)) def get_state(self): """ @@ -685,10 +691,13 @@ cdef class RandomState: """ cdef ndarray state "arrayObject_state" state = np.empty(624, np.uint) - memcpy(PyArray_DATA(state), (self.internal_state.key), 624*sizeof(long)) + with self.lock: + memcpy(PyArray_DATA(state), (self.internal_state.key), 624*sizeof(long)) + has_gauss = self.internal_state.has_gauss + gauss = self.internal_state.gauss + pos = self.internal_state.pos state = np.asarray(state, np.uint32) - return ('MT19937', state, self.internal_state.pos, - self.internal_state.has_gauss, self.internal_state.gauss) + return ('MT19937', state, pos, has_gauss, gauss) def set_state(self, state): """ @@ -755,10 +764,11 @@ cdef class RandomState: obj = 
PyArray_ContiguousFromObject(key, NPY_LONG, 1, 1) if PyArray_DIM(obj, 0) != 624: raise ValueError("state must be 624 longs") - memcpy((self.internal_state.key), PyArray_DATA(obj), 624*sizeof(long)) - self.internal_state.pos = pos - self.internal_state.has_gauss = has_gauss - self.internal_state.gauss = cached_gaussian + with self.lock: + memcpy((self.internal_state.key), PyArray_DATA(obj), 624*sizeof(long)) + self.internal_state.pos = pos + self.internal_state.has_gauss = has_gauss + self.internal_state.gauss = cached_gaussian # Pickling support: def __getstate__(self): @@ -932,7 +942,8 @@ cdef class RandomState: diff = hi - lo - 1UL if size is None: - rv = lo + rk_interval(diff, self. internal_state) + with self.lock: + rv = lo + rk_interval(diff, self. internal_state) return rv else: array = np.empty(size, int) @@ -1068,7 +1079,7 @@ cdef class RandomState: if pop_size is 0: raise ValueError("a must be non-empty") - if None != p: + if p is not None: d = len(p) p = PyArray_ContiguousFromObject(p, NPY_DOUBLE, 1, 1) pix = PyArray_DATA(p) @@ -1090,7 +1101,7 @@ cdef class RandomState: # Actual sampling if replace: - if None != p: + if p is not None: cdf = p.cumsum() cdf /= cdf[-1] uniform_samples = self.random_sample(shape) @@ -1103,7 +1114,7 @@ cdef class RandomState: raise ValueError("Cannot take a larger sample than " "population when 'replace=False'") - if None != p: + if p is not None: if np.count_nonzero(p > 0) < size: raise ValueError("Fewer non-zero entries in p than size") n_uniq = 0 @@ -2532,12 +2543,12 @@ cdef class RandomState: The Lomax or Pareto II distribution is a shifted Pareto distribution. The classical Pareto distribution can be obtained from the Lomax distribution - by adding the location parameter m, see below. The smallest value of the - Lomax distribution is zero while for the classical Pareto distribution it - is m, where the standard Pareto distribution has location m=1. - Lomax can also be considered as a simplified version of the Generalized - Pareto distribution (available in SciPy), with the scale set to one and - the location set to zero. + by adding 1 and multiplying by the scale parameter ``m`` (see Notes). + The smallest value of the Lomax distribution is zero while for the + classical Pareto distribution it is ``mu``, where the standard Pareto + distribution has location ``mu = 1``. Lomax can also be considered as a + simplified version of the Generalized Pareto distribution (available in + SciPy), with the scale set to one and the location set to zero. The Pareto distribution must be greater than zero, and is unbounded above. It is also known as the "80-20 rule". In this distribution, 80 percent of @@ -2566,7 +2577,7 @@ cdef class RandomState: .. math:: p(x) = \\frac{am^a}{x^{a+1}} - where :math:`a` is the shape and :math:`m` the location + where :math:`a` is the shape and :math:`m` the scale. The Pareto distribution, named after the Italian economist Vilfredo Pareto, is a power law probability distribution useful in many real world problems. @@ -2574,7 +2585,7 @@ cdef class RandomState: distribution. Pareto developed the distribution to describe the distribution of wealth in an economy. It has also found use in insurance, web page access statistics, oil field sizes, and many other problems, - including the download frequency for projects in Sourceforge [1]. It is + including the download frequency for projects in Sourceforge [1]_. It is one of the so-called "fat-tailed" distributions. 
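A hedged numerical illustration (not part of the patch) of the Lomax-to-classical-Pareto shift described above; the values of a and m are arbitrary:

    import numpy as np

    a, m = 3.0, 2.0
    s = (np.random.pareto(a, 100000) + 1)*m    # classical Pareto(a, m)
    # support starts at the scale m; the mean is a*m/(a - 1) for a > 1
    print(s.min() >= m)    # True
    print(s.mean())        # close to a*m/(a - 1) = 3.0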
@@ -2592,16 +2603,16 @@ cdef class RandomState: -------- Draw samples from the distribution: - >>> a, m = 3., 1. # shape and mode - >>> s = np.random.pareto(a, 1000) + m + >>> a, m = 3., 2. # shape and mode + >>> s = (np.random.pareto(a, 1000) + 1) * m - Display the histogram of the samples, along with - the probability density function: + Display the histogram of the samples, along with the probability + density function: >>> import matplotlib.pyplot as plt - >>> count, bins, ignored = plt.hist(s, 100, normed=True, align='center') - >>> fit = a*m**a/bins**(a+1) - >>> plt.plot(bins, max(count)*fit/max(fit),linewidth=2, color='r') + >>> count, bins, _ = plt.hist(s, 100, normed=True) + >>> fit = a*m**a / bins**(a+1) + >>> plt.plot(bins, max(count)*fit/max(fit), linewidth=2, color='r') >>> plt.show() """ @@ -3752,8 +3763,9 @@ cdef class RandomState: Parameters ---------- - lam : float - Expectation of interval, should be >= 0. + lam : float or sequence of float + Expectation of interval, should be >= 0. A sequence of expectation + intervals must be broadcastable over the requested size. size : int or tuple of ints, optional Output shape. If the given shape is, e.g., ``(m, n, k)``, then ``m * n * k`` samples are drawn. Default is None, in which case a @@ -3793,6 +3805,10 @@ cdef class RandomState: >>> count, bins, ignored = plt.hist(s, 14, normed=True) >>> plt.show() + Draw 100 values each for lambda 100 and 500: + + >>> s = np.random.poisson(lam=(100., 500.), size=(100, 2)) + """ cdef ndarray olam cdef double flam @@ -4577,20 +4593,22 @@ cdef class RandomState: # each row. So we can't just use ordinary assignment to swap the # rows; we need a bounce buffer. buf = np.empty_like(x[0]) - while i > 0: - j = rk_interval(i, self.internal_state) - buf[...] = x[j] - x[j] = x[i] - x[i] = buf - i = i - 1 + with self.lock: + while i > 0: + j = rk_interval(i, self.internal_state) + buf[...] = x[j] + x[j] = x[i] + x[i] = buf + i = i - 1 else: # For single-dimensional arrays, lists, and any other Python # sequence types, indexing returns a real object that's # independent of the array contents, so we can just swap directly.
- while i > 0: - j = rk_interval(i, self.internal_state) - x[i], x[j] = x[j], x[i] - i = i - 1 + with self.lock: + while i > 0: + j = rk_interval(i, self.internal_state) + x[i], x[j] = x[j], x[i] + i = i - 1 def permutation(self, object x): """ diff --git a/numpy/random/setup.py b/numpy/random/setup.py index 55cca69dabd1..33c12975b662 100644 --- a/numpy/random/setup.py +++ b/numpy/random/setup.py @@ -45,12 +45,11 @@ def generate_libraries(ext, build_dir): ['mtrand.c', 'randomkit.c', 'initarray.c', 'distributions.c']]+[generate_libraries], libraries=libs, - depends = [join('mtrand', '*.h'), - join('mtrand', '*.pyx'), - join('mtrand', '*.pxi'), - ], - define_macros = defs, - ) + depends=[join('mtrand', '*.h'), + join('mtrand', '*.pyx'), + join('mtrand', '*.pxi'),], + define_macros=defs, + ) config.add_data_files(('.', join('mtrand', 'randomkit.h'))) config.add_data_dir('tests') diff --git a/numpy/random/tests/test_random.py b/numpy/random/tests/test_random.py index b64c9d6cd69f..1bf25a92613c 100644 --- a/numpy/random/tests/test_random.py +++ b/numpy/random/tests/test_random.py @@ -6,6 +6,7 @@ assert_warns) from numpy import random from numpy.compat import asbytes +import sys class TestSeed(TestCase): def test_scalar(self): @@ -60,7 +61,7 @@ def test_zero_probability(self): random.multinomial(100, [0.2, 0.8, 0.0, 0.0, 0.0]) def test_int_negative_interval(self): - assert_( -5 <= random.randint(-5, -1) < -1) + assert_(-5 <= random.randint(-5, -1) < -1) x = random.randint(-5, -1, 5) assert_(np.all(-5 <= x)) assert_(np.all(x < -1)) @@ -68,15 +69,15 @@ def test_int_negative_interval(self): def test_size(self): # gh-3173 p = [0.5, 0.5] - assert_equal(np.random.multinomial(1 ,p, np.uint32(1)).shape, (1, 2)) - assert_equal(np.random.multinomial(1 ,p, np.uint32(1)).shape, (1, 2)) - assert_equal(np.random.multinomial(1 ,p, np.uint32(1)).shape, (1, 2)) - assert_equal(np.random.multinomial(1 ,p, [2, 2]).shape, (2, 2, 2)) - assert_equal(np.random.multinomial(1 ,p, (2, 2)).shape, (2, 2, 2)) - assert_equal(np.random.multinomial(1 ,p, np.array((2, 2))).shape, + assert_equal(np.random.multinomial(1, p, np.uint32(1)).shape, (1, 2)) + assert_equal(np.random.multinomial(1, p, np.uint32(1)).shape, (1, 2)) + assert_equal(np.random.multinomial(1, p, np.uint32(1)).shape, (1, 2)) + assert_equal(np.random.multinomial(1, p, [2, 2]).shape, (2, 2, 2)) + assert_equal(np.random.multinomial(1, p, (2, 2)).shape, (2, 2, 2)) + assert_equal(np.random.multinomial(1, p, np.array((2, 2))).shape, (2, 2, 2)) - assert_raises(TypeError, np.random.multinomial, 1 , p, + assert_raises(TypeError, np.random.multinomial, 1, p, np.float(1)) @@ -93,17 +94,16 @@ def test_basic(self): assert_(np.all(old == new)) def test_gaussian_reset(self): - """ Make sure the cached every-other-Gaussian is reset. - """ + # Make sure the cached every-other-Gaussian is reset. old = self.prng.standard_normal(size=3) self.prng.set_state(self.state) new = self.prng.standard_normal(size=3) assert_(np.all(old == new)) def test_gaussian_reset_in_media_res(self): - """ When the state is saved with a cached Gaussian, make sure the cached - Gaussian is restored. - """ + # When the state is saved with a cached Gaussian, make sure the + # cached Gaussian is restored. 
+ self.prng.standard_normal() state = self.prng.get_state() old = self.prng.standard_normal(size=3) @@ -112,9 +112,8 @@ def test_gaussian_reset_in_media_res(self): assert_(np.all(old == new)) def test_backwards_compatibility(self): - """ Make sure we can accept old state tuples that do not have the cached - Gaussian value. - """ + # Make sure we can accept old state tuples that do not have the + # cached Gaussian value. old_state = self.state[:-2] x1 = self.prng.standard_normal(size=16) self.prng.set_state(old_state) @@ -125,56 +124,55 @@ def test_backwards_compatibility(self): assert_(np.all(x1 == x3)) def test_negative_binomial(self): - """ Ensure that the negative binomial results take floating point - arguments without truncation. - """ + # Ensure that the negative binomial results take floating point + # arguments without truncation. self.prng.negative_binomial(0.5, 0.5) class TestRandomDist(TestCase): - """ Make sure the random distrobution return the correct value for a - given seed - """ + # Make sure the random distributions return the correct value for a + # given seed + def setUp(self): self.seed = 1234567890 def test_rand(self): np.random.seed(self.seed) actual = np.random.rand(3, 2) - desired = np.array([[ 0.61879477158567997, 0.59162362775974664], - [ 0.88868358904449662, 0.89165480011560816], - [ 0.4575674820298663, 0.7781880808593471 ]]) + desired = np.array([[0.61879477158567997, 0.59162362775974664], + [0.88868358904449662, 0.89165480011560816], + [0.4575674820298663, 0.7781880808593471]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_randn(self): np.random.seed(self.seed) actual = np.random.randn(3, 2) - desired = np.array([[ 1.34016345771863121, 1.73759122771936081], - [ 1.498988344300628, -0.2286433324536169 ], - [ 2.031033998682787, 2.17032494605655257]]) + desired = np.array([[1.34016345771863121, 1.73759122771936081], + [1.498988344300628, -0.2286433324536169], + [2.031033998682787, 2.17032494605655257]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_randint(self): np.random.seed(self.seed) actual = np.random.randint(-99, 99, size=(3, 2)) - desired = np.array([[ 31, 3], - [-52, 41], - [-48, -66]]) + desired = np.array([[31, 3], + [-52, 41], + [-48, -66]]) np.testing.assert_array_equal(actual, desired) def test_random_integers(self): np.random.seed(self.seed) actual = np.random.random_integers(-99, 99, size=(3, 2)) - desired = np.array([[ 31, 3], - [-52, 41], - [-48, -66]]) + desired = np.array([[31, 3], + [-52, 41], + [-48, -66]]) np.testing.assert_array_equal(actual, desired) def test_random_sample(self): np.random.seed(self.seed) actual = np.random.random_sample((3, 2)) - desired = np.array([[ 0.61879477158567997, 0.59162362775974664], - [ 0.88868358904449662, 0.89165480011560816], - [ 0.4575674820298663, 0.7781880808593471 ]]) + desired = np.array([[0.61879477158567997, 0.59162362775974664], + [0.88868358904449662, 0.89165480011560816], + [0.4575674820298663, 0.7781880808593471]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_choice_uniform_replace(self): @@ -304,9 +302,10 @@ def test_shuffle_masked(self): def test_beta(self): np.random.seed(self.seed) actual = np.random.beta(.1, .9, size=(3, 2)) - desired = np.array([[ 1.45341850513746058e-02, 5.31297615662868145e-04], - [ 1.85366619058432324e-06, 4.19214516800110563e-03], - [ 1.58405155108498093e-04, 1.26252891949397652e-04]]) + desired = np.array( [[1.45341850513746058e-02, 5.31297615662868145e-04],
[1.85366619058432324e-06, 4.19214516800110563e-03], + [1.58405155108498093e-04, 1.26252891949397652e-04]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_binomial(self): @@ -320,26 +319,26 @@ def test_binomial(self): def test_chisquare(self): np.random.seed(self.seed) actual = np.random.chisquare(50, size=(3, 2)) - desired = np.array([[ 63.87858175501090585, 68.68407748911370447], - [ 65.77116116901505904, 47.09686762438974483], - [ 72.3828403199695174, 74.18408615260374006]]) + desired = np.array([[63.87858175501090585, 68.68407748911370447], + [65.77116116901505904, 47.09686762438974483], + [72.3828403199695174, 74.18408615260374006]]) np.testing.assert_array_almost_equal(actual, desired, decimal=13) def test_dirichlet(self): np.random.seed(self.seed) - alpha = np.array([51.72840233779265162, 39.74494232180943953]) + alpha = np.array([51.72840233779265162, 39.74494232180943953]) actual = np.random.mtrand.dirichlet(alpha, size=(3, 2)) - desired = np.array([[[ 0.54539444573611562, 0.45460555426388438], - [ 0.62345816822039413, 0.37654183177960598]], - [[ 0.55206000085785778, 0.44793999914214233], - [ 0.58964023305154301, 0.41035976694845688]], - [[ 0.59266909280647828, 0.40733090719352177], - [ 0.56974431743975207, 0.43025568256024799]]]) + desired = np.array([[[0.54539444573611562, 0.45460555426388438], + [0.62345816822039413, 0.37654183177960598]], + [[0.55206000085785778, 0.44793999914214233], + [0.58964023305154301, 0.41035976694845688]], + [[0.59266909280647828, 0.40733090719352177], + [0.56974431743975207, 0.43025568256024799]]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_dirichlet_size(self): # gh-3173 - p = np.array([51.72840233779265162, 39.74494232180943953]) + p = np.array([51.72840233779265162, 39.74494232180943953]) assert_equal(np.random.dirichlet(p, np.uint32(1)).shape, (1, 2)) assert_equal(np.random.dirichlet(p, np.uint32(1)).shape, (1, 2)) assert_equal(np.random.dirichlet(p, np.uint32(1)).shape, (1, 2)) @@ -352,49 +351,49 @@ def test_dirichlet_size(self): def test_exponential(self): np.random.seed(self.seed) actual = np.random.exponential(1.1234, size=(3, 2)) - desired = np.array([[ 1.08342649775011624, 1.00607889924557314], - [ 2.46628830085216721, 2.49668106809923884], - [ 0.68717433461363442, 1.69175666993575979]]) + desired = np.array([[1.08342649775011624, 1.00607889924557314], + [2.46628830085216721, 2.49668106809923884], + [0.68717433461363442, 1.69175666993575979]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_f(self): np.random.seed(self.seed) actual = np.random.f(12, 77, size=(3, 2)) - desired = np.array([[ 1.21975394418575878, 1.75135759791559775], - [ 1.44803115017146489, 1.22108959480396262], - [ 1.02176975757740629, 1.34431827623300415]]) + desired = np.array([[1.21975394418575878, 1.75135759791559775], + [1.44803115017146489, 1.22108959480396262], + [1.02176975757740629, 1.34431827623300415]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_gamma(self): np.random.seed(self.seed) actual = np.random.gamma(5, 3, size=(3, 2)) - desired = np.array([[ 24.60509188649287182, 28.54993563207210627], - [ 26.13476110204064184, 12.56988482927716078], - [ 31.71863275789960568, 33.30143302795922011]]) + desired = np.array([[24.60509188649287182, 28.54993563207210627], + [26.13476110204064184, 12.56988482927716078], + [31.71863275789960568, 33.30143302795922011]]) np.testing.assert_array_almost_equal(actual, desired, decimal=14) def test_geometric(self): 
np.random.seed(self.seed) actual = np.random.geometric(.123456789, size=(3, 2)) - desired = np.array([[ 8, 7], - [17, 17], - [ 5, 12]]) + desired = np.array([[8, 7], + [17, 17], + [5, 12]]) np.testing.assert_array_equal(actual, desired) def test_gumbel(self): np.random.seed(self.seed) - actual = np.random.gumbel(loc = .123456789, scale = 2.0, size = (3, 2)) - desired = np.array([[ 0.19591898743416816, 0.34405539668096674], - [-1.4492522252274278, -1.47374816298446865], - [ 1.10651090478803416, -0.69535848626236174]]) + actual = np.random.gumbel(loc=.123456789, scale=2.0, size=(3, 2)) + desired = np.array([[0.19591898743416816, 0.34405539668096674], + [-1.4492522252274278, -1.47374816298446865], + [1.10651090478803416, -0.69535848626236174]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_hypergeometric(self): np.random.seed(self.seed) actual = np.random.hypergeometric(10.1, 5.5, 14, size=(3, 2)) desired = np.array([[10, 10], - [10, 10], - [ 9, 9]]) + [10, 10], + [9, 9]]) np.testing.assert_array_equal(actual, desired) # Test nbad = 0 @@ -418,49 +417,49 @@ def test_hypergeometric(self): def test_laplace(self): np.random.seed(self.seed) actual = np.random.laplace(loc=.123456789, scale=2.0, size=(3, 2)) - desired = np.array([[ 0.66599721112760157, 0.52829452552221945], - [ 3.12791959514407125, 3.18202813572992005], - [-0.05391065675859356, 1.74901336242837324]]) + desired = np.array([[0.66599721112760157, 0.52829452552221945], + [3.12791959514407125, 3.18202813572992005], + [-0.05391065675859356, 1.74901336242837324]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_logistic(self): np.random.seed(self.seed) actual = np.random.logistic(loc=.123456789, scale=2.0, size=(3, 2)) - desired = np.array([[ 1.09232835305011444, 0.8648196662399954 ], - [ 4.27818590694950185, 4.33897006346929714], - [-0.21682183359214885, 2.63373365386060332]]) + desired = np.array([[1.09232835305011444, 0.8648196662399954], + [4.27818590694950185, 4.33897006346929714], + [-0.21682183359214885, 2.63373365386060332]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_lognormal(self): np.random.seed(self.seed) actual = np.random.lognormal(mean=.123456789, sigma=2.0, size=(3, 2)) - desired = np.array([[ 16.50698631688883822, 36.54846706092654784], - [ 22.67886599981281748, 0.71617561058995771], - [ 65.72798501792723869, 86.84341601437161273]]) + desired = np.array([[16.50698631688883822, 36.54846706092654784], + [22.67886599981281748, 0.71617561058995771], + [65.72798501792723869, 86.84341601437161273]]) np.testing.assert_array_almost_equal(actual, desired, decimal=13) def test_logseries(self): np.random.seed(self.seed) actual = np.random.logseries(p=.923456789, size=(3, 2)) - desired = np.array([[ 2, 2], - [ 6, 17], - [ 3, 6]]) + desired = np.array([[2, 2], + [6, 17], + [3, 6]]) np.testing.assert_array_equal(actual, desired) def test_multinomial(self): np.random.seed(self.seed) actual = np.random.multinomial(20, [1/6.]*6, size=(3, 2)) desired = np.array([[[4, 3, 5, 4, 2, 2], - [5, 2, 8, 2, 2, 1]], - [[3, 4, 3, 6, 0, 4], - [2, 1, 4, 3, 6, 4]], - [[4, 4, 2, 5, 2, 3], - [4, 3, 4, 2, 3, 4]]]) + [5, 2, 8, 2, 2, 1]], + [[3, 4, 3, 6, 0, 4], + [2, 1, 4, 3, 6, 4]], + [[4, 4, 2, 5, 2, 3], + [4, 3, 4, 2, 3, 4]]]) np.testing.assert_array_equal(actual, desired) def test_multivariate_normal(self): np.random.seed(self.seed) - mean= (.123456789, 10) + mean = (.123456789, 10) # Hmm... not even symmetric. 
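Editor's note: the covariance assembled just below is deliberately malformed (neither symmetric nor positive-semidefinite), and as a later hunk checks, ``multivariate_normal`` only emits a ``RuntimeWarning`` for such input. A hedged sketch of the validation a caller can do up front; ``is_valid_cov`` is a hypothetical helper, not a numpy function::

    import numpy as np

    def is_valid_cov(cov, tol=1e-8):
        # a covariance matrix must be square and symmetric ...
        cov = np.asarray(cov, dtype=float)
        if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
            return False
        if not np.allclose(cov, cov.T, atol=tol):
            return False
        # ... and positive-semidefinite: eigenvalues >= 0 up to noise
        return bool(np.linalg.eigvalsh(cov).min() >= -tol)

    assert not is_valid_cov([[1, 0], [1, 0]])   # the matrix from this test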
cov = [[1, 0], [1, 0]] size = (3, 2) @@ -470,7 +469,7 @@ def test_multivariate_normal(self): [[-2.29186329304599745, 10.], [-1.77505606019580053, 10.]], [[-0.54970369430044119, 10.], - [ 0.29768848031692957, 10.]]]) + [0.29768848031692957, 10.]]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) # Check for default size, was raising deprecation warning @@ -479,50 +478,50 @@ def test_multivariate_normal(self): np.testing.assert_array_almost_equal(actual, desired, decimal=15) # Check that non positive-semidefinite covariance raises warning - mean= [0, 0] + mean = [0, 0] cov = [[1, 1 + 1e-10], [1 + 1e-10, 1]] - rng = np.random.multivariate_normal assert_warns(RuntimeWarning, np.random.multivariate_normal, mean, cov) def test_negative_binomial(self): np.random.seed(self.seed) actual = np.random.negative_binomial(n=100, p=.12345, size=(3, 2)) desired = np.array([[848, 841], - [892, 611], - [779, 647]]) + [892, 611], + [779, 647]]) np.testing.assert_array_equal(actual, desired) def test_noncentral_chisquare(self): np.random.seed(self.seed) - actual = np.random.noncentral_chisquare(df = 5, nonc = 5, size = (3, 2)) - desired = np.array([[ 23.91905354498517511, 13.35324692733826346], - [ 31.22452661329736401, 16.60047399466177254], - [ 5.03461598262724586, 17.94973089023519464]]) + actual = np.random.noncentral_chisquare(df=5, nonc=5, size=(3, 2)) + desired = np.array([[23.91905354498517511, 13.35324692733826346], + [31.22452661329736401, 16.60047399466177254], + [5.03461598262724586, 17.94973089023519464]]) np.testing.assert_array_almost_equal(actual, desired, decimal=14) def test_noncentral_f(self): np.random.seed(self.seed) - actual = np.random.noncentral_f(dfnum = 5, dfden = 2, nonc = 1, - size = (3, 2)) - desired = np.array([[ 1.40598099674926669, 0.34207973179285761], - [ 3.57715069265772545, 7.92632662577829805], - [ 0.43741599463544162, 1.1774208752428319 ]]) + actual = np.random.noncentral_f(dfnum=5, dfden=2, nonc=1, + size=(3, 2)) + desired = np.array([[1.40598099674926669, 0.34207973179285761], + [3.57715069265772545, 7.92632662577829805], + [0.43741599463544162, 1.1774208752428319]]) np.testing.assert_array_almost_equal(actual, desired, decimal=14) def test_normal(self): np.random.seed(self.seed) - actual = np.random.normal(loc = .123456789, scale = 2.0, size = (3, 2)) - desired = np.array([[ 2.80378370443726244, 3.59863924443872163], - [ 3.121433477601256, -0.33382987590723379], - [ 4.18552478636557357, 4.46410668111310471]]) + actual = np.random.normal(loc=.123456789, scale=2.0, size=(3, 2)) + desired = np.array([[2.80378370443726244, 3.59863924443872163], + [3.121433477601256, -0.33382987590723379], + [4.18552478636557357, 4.46410668111310471]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_pareto(self): np.random.seed(self.seed) - actual = np.random.pareto(a =.123456789, size = (3, 2)) - desired = np.array([[ 2.46852460439034849e+03, 1.41286880810518346e+03], - [ 5.28287797029485181e+07, 6.57720981047328785e+07], - [ 1.40840323350391515e+02, 1.98390255135251704e+05]]) + actual = np.random.pareto(a=.123456789, size=(3, 2)) + desired = np.array( + [[2.46852460439034849e+03, 1.41286880810518346e+03], + [5.28287797029485181e+07, 6.57720981047328785e+07], + [1.40840323350391515e+02, 1.98390255135251704e+05]]) # For some reason on 32-bit x86 Ubuntu 12.10 the [1, 0] entry in this # matrix differs by 24 nulps. 
Discussion: # http://mail.scipy.org/pipermail/numpy-discussion/2012-September/063801.html @@ -533,7 +532,7 @@ def test_pareto(self): def test_poisson(self): np.random.seed(self.seed) - actual = np.random.poisson(lam = .123456789, size=(3, 2)) + actual = np.random.poisson(lam=.123456789, size=(3, 2)) desired = np.array([[0, 0], [1, 0], [0, 0]]) @@ -549,84 +548,83 @@ def test_poisson_exceptions(self): def test_power(self): np.random.seed(self.seed) - actual = np.random.power(a =.123456789, size = (3, 2)) - desired = np.array([[ 0.02048932883240791, 0.01424192241128213], - [ 0.38446073748535298, 0.39499689943484395], - [ 0.00177699707563439, 0.13115505880863756]]) + actual = np.random.power(a=.123456789, size=(3, 2)) + desired = np.array([[0.02048932883240791, 0.01424192241128213], + [0.38446073748535298, 0.39499689943484395], + [0.00177699707563439, 0.13115505880863756]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_rayleigh(self): np.random.seed(self.seed) - actual = np.random.rayleigh(scale = 10, size = (3, 2)) - desired = np.array([[ 13.8882496494248393, 13.383318339044731 ], - [ 20.95413364294492098, 21.08285015800712614], - [ 11.06066537006854311, 17.35468505778271009]]) + actual = np.random.rayleigh(scale=10, size=(3, 2)) + desired = np.array([[13.8882496494248393, 13.383318339044731], + [20.95413364294492098, 21.08285015800712614], + [11.06066537006854311, 17.35468505778271009]]) np.testing.assert_array_almost_equal(actual, desired, decimal=14) def test_standard_cauchy(self): np.random.seed(self.seed) - actual = np.random.standard_cauchy(size = (3, 2)) - desired = np.array([[ 0.77127660196445336, -6.55601161955910605], - [ 0.93582023391158309, -2.07479293013759447], - [-4.74601644297011926, 0.18338989290760804]]) + actual = np.random.standard_cauchy(size=(3, 2)) + desired = np.array([[0.77127660196445336, -6.55601161955910605], + [0.93582023391158309, -2.07479293013759447], + [-4.74601644297011926, 0.18338989290760804]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_standard_exponential(self): np.random.seed(self.seed) - actual = np.random.standard_exponential(size = (3, 2)) - desired = np.array([[ 0.96441739162374596, 0.89556604882105506], - [ 2.1953785836319808, 2.22243285392490542], - [ 0.6116915921431676, 1.50592546727413201]]) + actual = np.random.standard_exponential(size=(3, 2)) + desired = np.array([[0.96441739162374596, 0.89556604882105506], + [2.1953785836319808, 2.22243285392490542], + [0.6116915921431676, 1.50592546727413201]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_standard_gamma(self): np.random.seed(self.seed) - actual = np.random.standard_gamma(shape = 3, size = (3, 2)) - desired = np.array([[ 5.50841531318455058, 6.62953470301903103], - [ 5.93988484943779227, 2.31044849402133989], - [ 7.54838614231317084, 8.012756093271868 ]]) + actual = np.random.standard_gamma(shape=3, size=(3, 2)) + desired = np.array([[5.50841531318455058, 6.62953470301903103], + [5.93988484943779227, 2.31044849402133989], + [7.54838614231317084, 8.012756093271868]]) np.testing.assert_array_almost_equal(actual, desired, decimal=14) def test_standard_normal(self): np.random.seed(self.seed) - actual = np.random.standard_normal(size = (3, 2)) - desired = np.array([[ 1.34016345771863121, 1.73759122771936081], - [ 1.498988344300628, -0.2286433324536169 ], - [ 2.031033998682787, 2.17032494605655257]]) + actual = np.random.standard_normal(size=(3, 2)) + desired = np.array([[1.34016345771863121, 
1.73759122771936081], + [1.498988344300628, -0.2286433324536169], + [2.031033998682787, 2.17032494605655257]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_standard_t(self): np.random.seed(self.seed) - actual = np.random.standard_t(df = 10, size = (3, 2)) - desired = np.array([[ 0.97140611862659965, -0.08830486548450577], - [ 1.36311143689505321, -0.55317463909867071], - [-0.18473749069684214, 0.61181537341755321]]) + actual = np.random.standard_t(df=10, size=(3, 2)) + desired = np.array([[0.97140611862659965, -0.08830486548450577], + [1.36311143689505321, -0.55317463909867071], + [-0.18473749069684214, 0.61181537341755321]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_triangular(self): np.random.seed(self.seed) - actual = np.random.triangular(left = 5.12, mode = 10.23, right = 20.34, - size = (3, 2)) - desired = np.array([[ 12.68117178949215784, 12.4129206149193152 ], - [ 16.20131377335158263, 16.25692138747600524], - [ 11.20400690911820263, 14.4978144835829923 ]]) + actual = np.random.triangular(left=5.12, mode=10.23, right=20.34, + size=(3, 2)) + desired = np.array([[12.68117178949215784, 12.4129206149193152], + [16.20131377335158263, 16.25692138747600524], + [11.20400690911820263, 14.4978144835829923]]) np.testing.assert_array_almost_equal(actual, desired, decimal=14) def test_uniform(self): np.random.seed(self.seed) - actual = np.random.uniform(low = 1.23, high=10.54, size = (3, 2)) - desired = np.array([[ 6.99097932346268003, 6.73801597444323974], - [ 9.50364421400426274, 9.53130618907631089], - [ 5.48995325769805476, 8.47493103280052118]]) + actual = np.random.uniform(low=1.23, high=10.54, size=(3, 2)) + desired = np.array([[6.99097932346268003, 6.73801597444323974], + [9.50364421400426274, 9.53130618907631089], + [5.48995325769805476, 8.47493103280052118]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) - def test_vonmises(self): np.random.seed(self.seed) - actual = np.random.vonmises(mu = 1.23, kappa = 1.54, size = (3, 2)) - desired = np.array([[ 2.28567572673902042, 2.89163838442285037], - [ 0.38198375564286025, 2.57638023113890746], - [ 1.19153771588353052, 1.83509849681825354]]) + actual = np.random.vonmises(mu=1.23, kappa=1.54, size=(3, 2)) + desired = np.array([[2.28567572673902042, 2.89163838442285037], + [0.38198375564286025, 2.57638023113890746], + [1.19153771588353052, 1.83509849681825354]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_vonmises_small(self): @@ -637,31 +635,31 @@ def test_vonmises_small(self): def test_wald(self): np.random.seed(self.seed) - actual = np.random.wald(mean = 1.23, scale = 1.54, size = (3, 2)) - desired = np.array([[ 3.82935265715889983, 5.13125249184285526], - [ 0.35045403618358717, 1.50832396872003538], - [ 0.24124319895843183, 0.22031101461955038]]) + actual = np.random.wald(mean=1.23, scale=1.54, size=(3, 2)) + desired = np.array([[3.82935265715889983, 5.13125249184285526], + [0.35045403618358717, 1.50832396872003538], + [0.24124319895843183, 0.22031101461955038]]) np.testing.assert_array_almost_equal(actual, desired, decimal=14) def test_weibull(self): np.random.seed(self.seed) - actual = np.random.weibull(a = 1.23, size = (3, 2)) - desired = np.array([[ 0.97097342648766727, 0.91422896443565516], - [ 1.89517770034962929, 1.91414357960479564], - [ 0.67057783752390987, 1.39494046635066793]]) + actual = np.random.weibull(a=1.23, size=(3, 2)) + desired = np.array([[0.97097342648766727, 0.91422896443565516], + [1.89517770034962929, 
1.91414357960479564], + [0.67057783752390987, 1.39494046635066793]]) np.testing.assert_array_almost_equal(actual, desired, decimal=15) def test_zipf(self): np.random.seed(self.seed) - actual = np.random.zipf(a = 1.23, size = (3, 2)) + actual = np.random.zipf(a=1.23, size=(3, 2)) desired = np.array([[66, 29], - [ 1, 1], - [ 3, 13]]) + [1, 1], + [3, 13]]) np.testing.assert_array_equal(actual, desired) -class TestThread: - """ make sure each state produces the same sequence even in threads """ +class TestThread(object): + # make sure each state produces the same sequence even in threads def setUp(self): self.seeds = range(4) @@ -681,7 +679,13 @@ def check_function(self, function, sz): for s, o in zip(self.seeds, out2): function(np.random.RandomState(s), o) - np.testing.assert_array_equal(out1, out2) + # these platforms change x87 fpu precision mode in threads + if (np.intp().dtype.itemsize == 4 and + (sys.platform == "win32" or + sys.platform.startswith("gnukfreebsd"))): + np.testing.assert_array_almost_equal(out1, out2) + else: + np.testing.assert_array_equal(out1, out2) def test_normal(self): def gen_random(state, out): diff --git a/numpy/random/tests/test_regression.py b/numpy/random/tests/test_regression.py index 1bba5d91dffb..ccffd033e55c 100644 --- a/numpy/random/tests/test_regression.py +++ b/numpy/random/tests/test_regression.py @@ -44,7 +44,7 @@ def test_permutation_longs(self): b = np.random.permutation(long(12)) assert_array_equal(a, b) - def test_randint_range(self) : + def test_randint_range(self): # Test for ticket #1690 lmax = np.iinfo('l').max lmin = np.iinfo('l').min diff --git a/numpy/testing/nosetester.py b/numpy/testing/nosetester.py index a06d559e7637..8183444eb8f5 100644 --- a/numpy/testing/nosetester.py +++ b/numpy/testing/nosetester.py @@ -178,7 +178,7 @@ class NoseTester(object): 'pyrex_ext', 'swig_ext'] - def __init__(self, package=None, raise_warnings="develop"): + def __init__(self, package=None, raise_warnings="release"): package_name = None if package is None: f = sys._getframe(1) diff --git a/numpy/testing/tests/test_utils.py b/numpy/testing/tests/test_utils.py index aa0a2669fd7d..41a48ea65dd5 100644 --- a/numpy/testing/tests/test_utils.py +++ b/numpy/testing/tests/test_utils.py @@ -244,6 +244,14 @@ def test_inf(self): self.assertRaises(AssertionError, lambda : self._assert_func(a, b)) + def test_subclass(self): + a = np.array([[1., 2.], [3., 4.]]) + b = np.ma.masked_array([[1., 2.], [0., 4.]], + [[False, False], [True, False]]) + assert_array_almost_equal(a, b) + assert_array_almost_equal(b, a) + assert_array_almost_equal(b, b) + class TestAlmostEqual(_GenericTest, unittest.TestCase): def setUp(self): self._assert_func = assert_almost_equal diff --git a/numpy/testing/utils.py b/numpy/testing/utils.py index ddf21e2bcc8f..a078897e9295 100644 --- a/numpy/testing/utils.py +++ b/numpy/testing/utils.py @@ -10,6 +10,9 @@ import operator import warnings from functools import partial +import shutil +import contextlib +from tempfile import mkdtemp from .nosetester import import_nose from numpy.core import float32, empty, arange, array_repr, ndarray @@ -219,7 +222,7 @@ def build_err_msg(arrays, err_msg, header='Items are not equal:', def assert_equal(actual,desired,err_msg='',verbose=True): """ - Raise an assertion if two objects are not equal. + Raises an AssertionError if two objects are not equal. Given two objects (scalars, lists, tuples, dictionaries or numpy arrays), check that all elements of these objects are equal. 
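Editor's note: the ``TestThread`` change above also special-cases 32-bit platforms whose threads flip the x87 FPU precision mode, where only near-equality can be asserted. The invariant being tested is that identically seeded generators produce the same stream no matter which thread drives them; a minimal sketch, with ``draw`` and the sizes purely illustrative::

    import threading
    import numpy as np

    def draw(seed, out):
        # each thread owns its RandomState, so no locking is needed here
        out[:] = np.random.RandomState(seed).uniform(size=out.shape)

    seeds = range(4)
    serial = [np.empty(100) for _ in seeds]
    threaded = [np.empty(100) for _ in seeds]
    for s, o in zip(seeds, serial):
        draw(s, o)
    threads = [threading.Thread(target=draw, args=(s, o))
               for s, o in zip(seeds, threaded)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    np.testing.assert_array_equal(serial, threaded)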
An exception is raised @@ -371,7 +374,8 @@ def print_assert_equal(test_string, actual, desired): def assert_almost_equal(actual,desired,decimal=7,err_msg='',verbose=True): """ - Raise an assertion if two items are not equal up to desired precision. + Raises an AssertionError if two items are not equal up to desired + precision. .. note:: It is recommended to use one of `assert_allclose`, `assert_array_almost_equal_nulp` or `assert_array_max_ulp` @@ -488,7 +492,8 @@ def _build_err_msg(): def assert_approx_equal(actual,desired,significant=7,err_msg='',verbose=True): """ - Raise an assertion if two items are not equal up to significant digits. + Raises an AssertionError if two items are not equal up to significant + digits. .. note:: It is recommended to use one of `assert_allclose`, `assert_array_almost_equal_nulp` or `assert_array_max_ulp` @@ -669,7 +674,7 @@ def chk_same_position(x_id, y_id, hasval='nan'): def assert_array_equal(x, y, err_msg='', verbose=True): """ - Raise an assertion if two array_like objects are not equal. + Raises an AssertionError if two array_like objects are not equal. Given two array_like objects, check that the shape is equal and all elements of these objects are equal. An exception is raised at @@ -735,7 +740,8 @@ def assert_array_equal(x, y, err_msg='', verbose=True): def assert_array_almost_equal(x, y, decimal=6, err_msg='', verbose=True): """ - Raise an assertion if two objects are not equal up to desired precision. + Raises an AssertionError if two objects are not equal up to desired + precision. .. note:: It is recommended to use one of `assert_allclose`, `assert_array_almost_equal_nulp` or `assert_array_max_ulp` @@ -823,7 +829,7 @@ def compare(x, y): # make sure y is an inexact type to avoid abs(MIN_INT); will cause # casting of x later. dtype = result_type(y, 1.) - y = array(y, dtype=dtype, copy=False) + y = array(y, dtype=dtype, copy=False, subok=True) z = abs(x-y) if not issubdtype(z.dtype, number): @@ -838,7 +844,8 @@ def compare(x, y): def assert_array_less(x, y, err_msg='', verbose=True): """ - Raise an assertion if two array_like objects are not ordered by less than. + Raises an AssertionError if two array_like objects are not ordered by less + than. Given two array_like objects, check that the shape is equal and all elements of the first object are strictly smaller than those of the @@ -1240,7 +1247,8 @@ def _assert_valid_refcount(op): def assert_allclose(actual, desired, rtol=1e-7, atol=0, err_msg='', verbose=True): """ - Raise an assertion if two objects are not equal up to desired tolerance. + Raises an AssertionError if two objects are not equal up to desired + tolerance. The test is equivalent to ``allclose(actual, desired, rtol, atol)``. It compares the difference between `actual` and `desired` to @@ -1323,7 +1331,7 @@ def assert_array_almost_equal_nulp(x, y, nulp=1): ----- An assertion is raised if the following condition is not met:: - abs(x - y) <= nulps * spacing(max(abs(x), abs(y))) + abs(x - y) <= nulps * spacing(maximum(abs(x), abs(y))) Examples -------- @@ -1692,3 +1700,16 @@ def _gen_alignment_data(dtype=float32, type='binary', max_size=24): class IgnoreException(Exception): "Ignoring this exception due to disabled feature" + + +@contextlib.contextmanager +def tempdir(*args, **kwargs): + """Context manager to provide a temporary test folder. + + All arguments are passed as this to the underlying tempfile.mkdtemp + function. 
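Editor's note: typical usage of the new ``tempdir`` helper, plus one hedged observation: as committed, ``rmtree`` only runs when the ``with`` block exits cleanly, so a raising body leaves the directory behind. The ``tempdir_robust`` variant below is a sketch of a try/finally alternative, not part of the patch::

    import os
    import shutil
    import contextlib
    from tempfile import mkdtemp

    from numpy.testing.utils import tempdir

    with tempdir(prefix='nptest') as d:      # args pass through to mkdtemp
        open(os.path.join(d, 'scratch.txt'), 'w').close()
    # d has been removed here, provided the block did not raise

    @contextlib.contextmanager
    def tempdir_robust(*args, **kwargs):
        # hypothetical variant: cleans up even when the body raises
        d = mkdtemp(*args, **kwargs)
        try:
            yield d
        finally:
            shutil.rmtree(d)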
+ + """ + tmpdir = mkdtemp(*args, **kwargs) + yield tmpdir + shutil.rmtree(tmpdir) diff --git a/numpy/tests/test_scripts.py b/numpy/tests/test_scripts.py new file mode 100644 index 000000000000..b48e3f3f776a --- /dev/null +++ b/numpy/tests/test_scripts.py @@ -0,0 +1,65 @@ +""" Test scripts + +Test that we can run executable scripts that have been installed with numpy. +""" +from __future__ import division, print_function, absolute_import + +import os +from os.path import join as pathjoin, isfile, dirname, basename +import sys +from subprocess import Popen, PIPE +import numpy as np +from numpy.compat.py3k import basestring, asbytes +from nose.tools import assert_equal +from numpy.testing.decorators import skipif + +skipif_inplace = skipif(isfile(pathjoin(dirname(np.__file__), '..', 'setup.py'))) + +def run_command(cmd, check_code=True): + """ Run command sequence `cmd` returning exit code, stdout, stderr + + Parameters + ---------- + cmd : str or sequence + string with command name or sequence of strings defining command + check_code : {True, False}, optional + If True, raise error for non-zero return code + + Returns + ------- + returncode : int + return code from execution of `cmd` + stdout : bytes (python 3) or str (python 2) + stdout from `cmd` + stderr : bytes (python 3) or str (python 2) + stderr from `cmd` + + Raises + ------ + RuntimeError + If `check_code` is True, and return code !=0 + """ + cmd = [cmd] if isinstance(cmd, basestring) else list(cmd) + if os.name == 'nt': + # Quote any arguments with spaces. The quotes delimit the arguments + # on Windows, and the arguments might be file paths with spaces. + # On Unix the list elements are each separate arguments. + cmd = ['"{0}"'.format(c) if ' ' in c else c for c in cmd] + proc = Popen(cmd, stdout=PIPE, stderr=PIPE) + stdout, stderr = proc.communicate() + if proc.poll() == None: + proc.terminate() + if check_code and proc.returncode != 0: + raise RuntimeError('\n'.join( + ['Command "{0}" failed with', + 'stdout', '------', '{1}', '', + 'stderr', '------', '{2}']).format(cmd, stdout, stderr)) + return proc.returncode, stdout, stderr + + +@skipif_inplace +def test_f2py(): + # test that we can run f2py script + f2py_cmd = 'f2py' + basename(sys.executable)[6:] + code, stdout, stderr = run_command([f2py_cmd, '-v']) + assert_equal(stdout.strip(), asbytes('2')) diff --git a/pavement.py b/pavement.py index c0b5cb2d43b2..43e882c964f5 100644 --- a/pavement.py +++ b/pavement.py @@ -99,11 +99,11 @@ #----------------------------------- # Source of the release notes -RELEASE_NOTES = 'doc/release/1.9.0-notes.rst' +RELEASE_NOTES = 'doc/release/1.9.3-notes.rst' # Start/end of the log (from git) -LOG_START = 'v1.8.0b1' -LOG_END = 'master' +LOG_START = 'v1.9.2' +LOG_END = 'maintenance/1.9.x' #------------------------------------------------------- diff --git a/setup.py b/setup.py index 12eb290932ea..1e0810777dba 100755 --- a/setup.py +++ b/setup.py @@ -49,8 +49,8 @@ MAJOR = 1 MINOR = 9 -MICRO = 0 -ISRELEASED = False +MICRO = 3 +ISRELEASED = True VERSION = '%d.%d.%d' % (MAJOR, MINOR, MICRO) diff --git a/site.cfg.example b/site.cfg.example index 714ab63110ff..4a59f10e29ae 100644 --- a/site.cfg.example +++ b/site.cfg.example @@ -126,7 +126,7 @@ # better performance. Note that the AMD library has nothing to do with AMD # (Advanced Micro Devices), the CPU company. # -# UMFPACK is not needed for numpy or scipy. +# UMFPACK is not used by numpy. 
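Editor's note: returning to the ``run_command`` helper added in ``test_scripts.py`` above, it is easiest to understand from a call site. A sketch, assuming the module is importable as ``numpy.tests.test_scripts`` (it imports ``nose.tools``, so nose must be installed)::

    import sys
    from numpy.compat import asbytes
    from numpy.tests.test_scripts import run_command

    code, stdout, stderr = run_command(
        [sys.executable, '-c', 'print("hi")'])
    # check_code=True is the default, so a non-zero exit would have raised
    assert code == 0
    assert stdout.strip() == asbytes('hi')   # bytes on py3, str on py2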
# # http://www.cise.ufl.edu/research/sparse/umfpack/ # http://www.cise.ufl.edu/research/sparse/amd/ @@ -141,7 +141,7 @@ # FFT libraries # ------------- # There are two FFT libraries that we can configure here: FFTW (2 and 3) and djbfft. -# Note that these libraries are not needed for numpy or scipy. +# Note that these libraries are not used by for numpy or scipy. # # http://fftw.org/ # http://cr.yp.to/djbfft.html diff --git a/tools/allocation_tracking/track_allocations.py b/tools/allocation_tracking/track_allocations.py index 2006217c2d61..dfc354eb5dbf 100644 --- a/tools/allocation_tracking/track_allocations.py +++ b/tools/allocation_tracking/track_allocations.py @@ -1,6 +1,7 @@ from __future__ import division, absolute_import, print_function import numpy as np +import gc import inspect from alloc_hook import NumpyAllocHook @@ -35,12 +36,21 @@ def __exit__(self, type, value, traceback): self.numpy_hook.__exit__() def hook(self, inptr, outptr, size): + # minimize the chances that the garbage collector kicks in during a + # cython __dealloc__ call and causes a double delete of the current + # object. To avoid this fully the hook would have to avoid all python + # api calls, e.g. by being implemented in C like python 3.4's + # tracemalloc module + gc_on = gc.isenabled() + gc.disable() if outptr == 0: # it's a free self.free_cb(inptr) elif inptr != 0: # realloc self.realloc_cb(inptr, outptr, size) else: # malloc self.alloc_cb(outptr, size) + if gc_on: + gc.enable() def alloc_cb(self, ptr, size): if size >= self.threshold: diff --git a/tools/swig/numpy.i b/tools/swig/numpy.i index e250e78bfa58..217acd5bff69 100644 --- a/tools/swig/numpy.i +++ b/tools/swig/numpy.i @@ -872,7 +872,7 @@ (PyArrayObject* array=NULL, int is_new_object=0) { npy_intp size[2] = { -1, -1 }; - array = obj_to_array_contiguous_allow_conversion($input, + array = obj_to_array_fortran_allow_conversion($input, DATA_TYPECODE, &is_new_object); if (!array || !require_dimensions(array, 2) || @@ -1106,7 +1106,7 @@ (PyArrayObject* array=NULL, int is_new_object=0) { npy_intp size[3] = { -1, -1, -1 }; - array = obj_to_array_contiguous_allow_conversion($input, + array = obj_to_array_fortran_allow_conversion($input, DATA_TYPECODE, &is_new_object); if (!array || !require_dimensions(array, 3) || @@ -1345,7 +1345,7 @@ (PyArrayObject* array=NULL, int is_new_object=0) { npy_intp size[4] = { -1, -1, -1 , -1 }; - array = obj_to_array_contiguous_allow_conversion($input, DATA_TYPECODE, + array = obj_to_array_fortran_allow_conversion($input, DATA_TYPECODE, &is_new_object); if (!array || !require_dimensions(array, 4) || !require_size(array, size, 4) || !require_fortran(array)) SWIG_fail; diff --git a/tools/travis-test.sh b/tools/travis-test.sh index de078edf721e..9a215314fc3c 100755 --- a/tools/travis-test.sh +++ b/tools/travis-test.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash set -ex # setup env @@ -117,10 +117,17 @@ fi export PYTHON export PIP if [ -n "$USE_WHEEL" ] && [ $# -eq 0 ]; then - $PIP install --upgrade pip + # Build wheel $PIP install wheel $PYTHON setup.py bdist_wheel - $PIP install --pre --upgrade --find-links dist numpy + # Make another virtualenv to install into + virtualenv --python=python venv-for-wheel + . venv-for-wheel/bin/activate + # Move out of source directory to avoid finding local numpy + pushd dist + $PIP install --pre --upgrade --find-links . numpy + $PIP install nose + popd run_test elif [ "$USE_CHROOT" != "1" ] && [ "$USE_BENTO" != "1" ]; then setup_base
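Editor's note: the ``gc`` guard added to ``track_allocations.py`` above follows a general pattern: keep the collector from running re-entrantly inside a low-level callback, then restore whatever state it was in before. A standalone sketch of the same pattern as a context manager; ``gc_paused`` is a hypothetical name, not numpy API::

    import gc
    import contextlib

    @contextlib.contextmanager
    def gc_paused():
        # remember whether collection was enabled so we can restore it;
        # gc.disable() is idempotent, so nested use is safe
        was_enabled = gc.isenabled()
        gc.disable()
        try:
            yield
        finally:
            if was_enabled:
                gc.enable()

    with gc_paused():
        pass  # e.g. run an allocation callback that must not trigger GC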