BUG: Inconsistent and potentially misleading conversion of Polynomials to strings when the domain and window differ · Issue #27903 · numpy/numpy · GitHub

BUG: Inconsistent and potentially misleading conversion of Polynomials to strings when the domain and window differ #27903


Closed
ortk95 opened this issue Dec 4, 2024 · 2 comments

@ortk95
ortk95 commented Dec 4, 2024

Describe the issue:

The string value of a Polynomial is inconsistent and potentially misleading, as it does not include any information about the domain/window of the polynomial. If the domain and window of the polynomial are not equal, str(poly) returns a string where x refers to the scaled coordinate, rather than the unscaled coordinate used when e.g. calling the polynomial. This means that the returned string can be mathematically different from the 'actual' values of the polynomial, which would be very misleading to any users who are not aware of the domain/window conversions in the Polynomial, and could lead to errors in data analysis.

Furthermore, the value of str(poly) is inconsistent with the value of IPython's display(poly), adding to the potential confusion. With str(poly), x refers to the scaled variable, whereas for display(poly), x refers to the original unscaled variable.

For example, with y = 1 + 2*x + 3*x**2 (see code example), the value of str(poly) is 7601.0 + 15100.0·x + 7500.0·x², which looks very different from the expected 1.0 + 2.0·x + 3.0·x². The IPython display(poly) version of the polynomial, x↦7601.0+15100.0(-1.0+0.02x)+7500.0(-1.0+0.02x)^2, is more complex, but mathematically gives the correct result.
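
The arithmetic behind these two readings can be checked directly. The following is a minimal sketch, assuming the same domain=[0, 100] and window=[-1, 1] as in the code example in this report; it uses mapparms() to recover the linear map u = off + scl*x and compares both interpretations of the coefficients against 1 + 2*x + 3*x**2.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Coefficients as printed by repr(p) in this report, stored in the scaled variable.
p = Polynomial([7601.0, 15100.0, 7500.0], domain=[0, 100], window=[-1, 1])

# mapparms() returns (off, scl) such that u = off + scl*x maps domain to window.
off, scl = p.mapparms()

x = np.linspace(0, 100, 5)
u = off + scl * x
expected = 1 + 2 * x + 3 * x**2

# Reading the coefficients against the scaled variable u reproduces the data.
scaled = p.coef[0] + p.coef[1] * u + p.coef[2] * u**2

# Reading the same coefficients against the unscaled x does not.
naive = p.coef[0] + p.coef[1] * x + p.coef[2] * x**2

print(off, scl)
print(np.allclose(scaled, expected), np.allclose(naive, expected))
```

Here off and scl come out as -1 and 0.02, matching the (-1.0+0.02x) term shown by display(poly).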

To avoid confusion, I feel like it may make more sense for str(poly) to return a value formatted similarly to the existing display(poly) behaviour:

  • IPython display(poly): x↦7601.0+15100.0(-1.0+0.02x)+7500.0(-1.0+0.02x)^2
  • Current str(poly): 7601.0 + 15100.0·x + 7500.0·x²
  • Proposed str(poly): 7601.0 + 15100.0·(-1.0+0.02x) + 7500.0·(-1.0+0.02x)²

This would avoid any misleading ambiguity about whether x refers to the scaled or unscaled variable, and ensure the IPython and string versions of the polynomial are consistent. It would also help to signpost and emphasise the effect of a differing domain/window for any users who are using e.g. Polynomial.fit() for the first time.

Reproduce the code example:

import numpy as np
x = np.linspace(0, 100)
y = 1 + 2*x + 3*x**2
p = np.polynomial.Polynomial.fit(x, y, 2)

print(repr(p))
# Polynomial([ 7601., 15100.,  7500.], domain=[  0., 100.], window=[-1.,  1.], symbol='x')

print(str(p))
# 7601.0 + 15100.0·x + 7500.0·x²

display(p) # Display formatted polynomial when using IPython
# x↦7601.0+15100.0(-1.0+0.02x)+7500.0(-1.0+0.02x)^2

print(str(p.convert()))
# 1.0 + 2.0·x + 3.0·x²

Error message:

No response

Python and NumPy Versions:

1.25.2
3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]

Runtime Environment:

[{'numpy_version': '1.25.2',
'python': '3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]',
'uname': uname_result(system='Linux', node='alice-login02', release='5.14.0-427.42.1.el9_4.x86_64', version='#1 SMP PREEMPT_DYNAMIC Thu Oct 31 14:01:51 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Zen',
'filepath': '/alice-home/3/o/ortk2/miniconda3/envs/py311/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-5007b62f.3.23.dev.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23.dev'},
{'architecture': 'Zen',
'filepath': '/alice-home/3/o/ortk2/miniconda3/envs/py311/lib/python3.11/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.18'}]

Context for the issue:

The inconsistent and confusing nature of the conversion of a Polynomial to a string makes it more challenging to use Polynomials for quick exploration of datasets, and increases the chance of bugs/mistakes in data analysis caused by users incorrectly interpreting the value of str(poly).

I originally encountered this issue when fitting a dataset with Polynomial.fit(), then plotting the fitted polynomial with matplotlib: ax.plot(x, poly(x), label=str(poly)). This produced a legend entry that was inconsistent with the x coordinates in the graph, due to the domain/window scaling - I now use str(poly.convert()) to produce the 'correct' legend entry.
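
The workaround described here can be sketched as follows, assuming the same data as the code example in this report. The names fit, unscaled and label are illustrative, and the matplotlib call is only shown as a comment so the snippet runs without it.

```python
import numpy as np
from numpy.polynomial import Polynomial

x = np.linspace(0, 100)
y = 1 + 2 * x + 3 * x**2
fit = Polynomial.fit(x, y, 2)

# convert() re-expresses the fit with the default domain and window
# (both [-1, 1] for Polynomial), so its coefficients refer to the unscaled x.
unscaled = fit.convert()
label = str(unscaled)

# With matplotlib, the legend entry would then match the plotted x axis:
#     ax.plot(x, fit(x), label=label)
print(unscaled.coef)
```

Evaluating fit(x) and unscaled(x) gives the same values; only the coefficients, and therefore the string, change.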

This issue, however, could easily go unnoticed if the domain and window are similar but different (e.g. domain=[-1, 1.1], window=[-1, 1]), as it may not be immediately obvious that the x coordinate in str(poly) is different from the x coordinate in the graph.
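
A small sketch of that near-identical case, with coefficients [1, 2, 3] chosen purely for illustration: the stored coefficients look plausible on their own, yet differ from the ones that apply to the unscaled x.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Domain and window differ only slightly, as in the example above.
p = Polynomial([1.0, 2.0, 3.0], domain=[-1, 1.1], window=[-1, 1])

raw = p.coef                # coefficients in the scaled variable
actual = p.convert().coef   # coefficients in the unscaled x

x = np.linspace(-1, 1.1, 7)
print(raw)
print(actual)
print(np.allclose(p.convert()(x), p(x)))
```

The second array is roughly [0.912, 1.633, 2.721], close enough to [1, 2, 3] that a mislabelled legend could pass a casual inspection.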

@ortk95 ortk95 added the 00 - Bug label Dec 4, 2024
@eendebakpt
Contributor

@ortk95 This issue has already been addressed in #21760. Could you try with a recent version of numpy to see whether the issue still exists?

@ortk95
Author
ortk95 commented Dec 6, 2024

That works, thanks!

@ortk95 ortk95 closed this as completed Dec 6, 2024