repr doesn't roundtrip for float32 dtype · Issue #9360 · numpy/numpy · GitHub
Closed
mdickinson opened this issue Jul 4, 2017 · 7 comments
Comments

mdickinson (Contributor) commented Jul 4, 2017

It seems that the np.float32 type doesn't have a round-trippable repr:

>>> x = np.float32(1024 - 2**-14)
>>> y = np.float32(1024 - 2**-13)
>>> x == y  # get False, as expected
False
>>> repr(x) == repr(y)  # expecting False
True
>>> np.float32(repr(x)) == x  # expecting True
False
>>> np.float32(repr(y)) == y  # get True, as expected
True

Looking at the source, 8 significant digits are used for the repr of an np.float32, but the IEEE 754 binary32 format requires 9 digits to roundtrip correctly.
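A quick sketch of the problem (the digit counts here are the 8 mentioned above and the 9 that binary32 requires; the `%g` formatting is just an illustration, not how NumPy builds its repr):

```python
import numpy as np

# Two adjacent float32 values just below 1024; their 8-digit strings collide:
x = np.float32(1024 - 2**-14)
y = np.float32(1024 - 2**-13)
print('%.8g' % x, '%.8g' % y)   # both print 1023.9999
# 9 significant digits are enough to distinguish any two binary32 values:
print('%.9g' % x, '%.9g' % y)   # 1023.99994 1023.99988
assert np.float32('%.9g' % x) == x and np.float32('%.9g' % y) == y
```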

Perhaps this is intentional, but it seems surprising.

[Versions: Python 3.6.1, numpy 1.13.0, macOS 10.10.5]

seberg (Member) commented Jul 4, 2017

Frankly, I find this slightly disturbing. It must have been there for many, many years, but it should be fixed in any case, in my opinion.

eric-wieser (Member) commented

the IEEE 754 binary32 format requires 9 digits to roundtrip correctly.

If that's the case, why is np.finfo(np.float32).precision == 6?

mdickinson (Contributor, Author) commented Jul 4, 2017

@eric-wieser: the np.finfo precision values are the maximum decimal precisions for which decimal -> binary -> decimal recovers the original value; for repr, you want something different: the minimum decimal precision for which binary -> decimal -> binary recovers the value. (Assuming IEEE 754 formats, the relevant values are 3 and 5 for float16, 6 and 9 for float32 and 15 and 17 for float64.)
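To make the distinction concrete, here is a small sketch comparing np.finfo's precision with the larger repr-roundtrip digit counts quoted above (the repr counts are hard-coded from this comment, since finfo doesn't expose them):

```python
import numpy as np

# finfo.precision: max decimal digits where decimal -> binary -> decimal
#                  recovers the original decimal string.
# repr_digits:     min decimal digits where binary -> decimal -> binary
#                  recovers the original binary value (what repr needs).
for dtype, repr_digits in [(np.float16, 5), (np.float32, 9), (np.float64, 17)]:
    print(dtype.__name__, np.finfo(dtype).precision, repr_digits)
```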

eric-wieser (Member) commented

Well explained. Seems like we should also expose those larger numbers in finfo too then (and perhaps be more precise in the documentation for precision). Can you think of a suitable name?

Also, can you cite a source for those numbers?

mdickinson (Contributor, Author) commented

As far as sources go, the C99 standard has the relevant formulas in section 5.2.4.2.2p9: for a binary -> decimal -> binary roundtrip of a binary format with precision p (which is what we want for repr), the formula is 1 + ceil(p * log10(2)); for p = 11, 24 and 53 this gives 5, 9 and 17 respectively. For the precision, we want floor((p - 1) * log10(2)), which is where the 3, 6 and 15 values come from. IEEE 754-2008 also gives the 5, 9 and 17 values explicitly in section 5.12.2.
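The two formulas above can be checked directly (the function names here are just for illustration):

```python
from math import ceil, floor, log10

def roundtrip_digits(p):
    # binary -> decimal -> binary (C99 5.2.4.2.2p9): 1 + ceil(p * log10(2))
    return 1 + ceil(p * log10(2))

def safe_decimal_digits(p):
    # decimal -> binary -> decimal: floor((p - 1) * log10(2))
    return floor((p - 1) * log10(2))

# p = significand bits of binary16/32/64, including the implicit bit
for p in (11, 24, 53):
    print(p, safe_decimal_digits(p), roundtrip_digits(p))
# -> 11 3 5 / 24 6 9 / 53 15 17
```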

I've never found non-paywalled proofs of those formulas, but they're not hard to prove directly: here are some proofs I wrote up last year, after getting annoyed at not finding anything online.

Seems like we should also expose those larger numbers in finfo too then

Sounds good in principle. The catch would be that C99 provides the precision numbers directly for float, double and long double, without any assumption of IEEE 754, under the names FLT_DIG, DBL_DIG and LDBL_DIG; I assume that that's where np.finfo is getting them from. In the other direction, it only defines one number, DECIMAL_DIG, which is the:

number of decimal digits, n, such that any floating-point number in the widest supported floating type with pmax radix b digits can be rounded to a floating-point number with n decimal digits and back again without change to the value,

C11 does provide separate FLT_DECIMAL_DIG, DBL_DECIMAL_DIG and LDBL_DECIMAL_DIG macros, but I don't know how well current compilers support those.

Can you think of a suitable name?

Not right now. Naming things is hard. :-)

eric-wieser (Member) commented

for a binary format with precision p

Here you're using precision to refer to the number of bits in the mantissa, right?

The catch would be that C99 provides the precision numbers directly for float, double and long double, without any assumption of IEEE 754, under the names FLT_DIG, DBL_DIG and LDBL_DIG; I assume that that's where np.finfo is getting them from.

Nope, they're hard-coded in the Python code, calculated from finfo.eps, so this isn't a problem. I don't think the calculations there match your equations, though, so perhaps they should be fixed.

mdickinson (Contributor, Author) commented

Here you're using precision to refer to the number of bits in the mantissa, right?

Aargh, yes. Too many precisions. Yes, the number of bits in the significand, including the implicit bit where relevant.

Nope, they're hard-coded in the python code, calculated from finfo.eps

Ah, right. I was making bad assumptions, then.

I don't think the calculations there match your equations though

At a quick glance, it looks the same to me: for a binary format with (binary) precision p, eps should be 2**(1-p), so self.precision = int(-log10(self.eps)) is computing floor((p-1) * log10(2)).
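A quick check that the eps-based calculation really does agree with the floor formula for the three IEEE 754 formats (a sketch, assuming eps = 2**(1-p) as stated above):

```python
from math import log10
import numpy as np

for dtype, p in [(np.float16, 11), (np.float32, 24), (np.float64, 53)]:
    eps = np.finfo(dtype).eps
    # eps == 2**(1-p), so int(-log10(eps)) == floor((p-1) * log10(2))
    assert eps == 2.0 ** (1 - p)
    assert int(-log10(eps)) == np.finfo(dtype).precision
    print(dtype.__name__, p, np.finfo(dtype).precision)
```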
