8000 Remove unnecessary calls to float() before division. by anntzer · Pull Request #10331 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

Remove unnecessary calls to float() before division. #10331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 29, 2018

Conversation

anntzer
Copy link
Contributor
@anntzer anntzer commented Jan 27, 2018

Most of these calls probably date back to the time before
__future__-division existed...

Note that this will have the side effect that some functions that were
accidentally accepting strings instead of floats before will no longer
accept strings -- which seems fine to me.

PR Summary

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@anntzer anntzer force-pushed the future-division branch 3 times, most recently from ea8439a to e19e48d Compare January 27, 2018 10:22
Copy link
Contributor
@afvincent afvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me as far as I can tell.

One typo (I think) and a couple very minor comments.

sum_of_squares = np.sum(histogram * np.arange(len(histogram)) ** 2)
rms = np.sqrt(float(sum_of_squares) / num_values)
return rms
return ((expectedImage - actualImage) ** 2).mean()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I may have missed something but this is "Mean Square" rather than "Root Mean Square", is it not? Is it not missing a call to np.sqrt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous method (going through the histogram) seems extra complex, why was it done that way? Is this just a weird historical artifact of supporting numarray/numeric or an attempt to get performance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the missing sqrt is just a typo, fixed.
pretty sure performance of this specific part of the code is irrelevant compared to rendering the png or shelling out to inkscape and ghostscript...

@@ -61,7 +61,7 @@ def line2d_seg_dist(p1, p2, p0):
x01 = np.asarray(p0[0]) - p1[0]
y01 = np.asarray(p0[1]) - p1[1]

u = (x01*x21 + y01*y21)/float(abs(x21**2 + y21**2))
u = (x01*x21 + y01*y21) / abs(x21**2 + y21**2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very genuine comment: are x21 and y21 complex or real numbers? Because in the latter case, what is the point of abs(...) then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure they're intended to be reals (it's computing a 3D distance...). Removed the call to abs, good catch.

@@ -13,7 +13,7 @@
colors = prop_cycle.by_key()['color']

lwbase = plt.rcParams['lines.linewidth']
thin = float('%.1f' % (lwbase / 2))
thin = lwbase / 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By curiosity, was there a purpose to the former "rounding to 0.1" scheme?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably to make the title label look better(?), but I think 0.75 looks just fine too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, and I think the same

@afvincent
Copy link
Contributor

I have the feeling that Travis agrees with me about the "typo" in the image comparison test 😈.

@@ -56,10 +52,10 @@ def select_step_hour(dv):
minsec_limits_ = [1.5, 2.5, 3.5, 4.5, 5.5, 8, 11, 14, 18, 25, 45]
minsec_steps_ = [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30]

minute_limits_ = A(minsec_limits_)*(1./60.)
minute_limits_ = np.array(minsec_limits_) / 60
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like another case of old-school micro-optomizations like (x*x vs x**2), but this one still work!

In [3]: d = np.array(range(52))

In [4]: %timeit d / 60
936 ns ± 0.538 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit d * (1/ 60)
683 ns ± 1.43 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: d = np.array(range(50000))

In [7]: %timeit d / 60
102 µs ± 95.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: %timeit d * (1/ 60)
37.4 µs ± 554 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can repro, but I think axisartist/angle_helper is far enough from being on a critical code path to make it worth obfuscating...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@tacaswell tacaswell added this to the v2.2 milestone Jan 27, 2018
@tacaswell
Copy link
Member

👍 in principle, but some concern about the RMS changes.

@anntzer
Copy link
Contributor Author
anntzer commented Jan 27, 2018

hopefully fixed the rms changes

@anntzer anntzer force-pushed the future-division branch 2 times, most recently from 6aa7aba to 6664dd5 Compare January 27, 2018 19:46
@@ -22,7 +22,7 @@
r'some other string', color='red', fontsize=20, dpi=200)

fig = plt.figure()
fig.figimage(rgba1.astype(float)/255., 100, 100)
fig.figimage(rgba2.astype(float)/255., 100, 300)
fig.figimage(rgba1, 100, 100)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the division by 255?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we can display uint8 images just as well as float images, and the real point of the example is to show how to get the mathtext as an image array...

float(l) / self.dpi * self._figdpi,
(float(height)-b-h) / self.dpi * self._figdpi,
l / self.dpi * self._figdpi,
(height-b-h) / self.dpi * self._figdpi,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, since you are in here, these would be much clearer as

self._figdpi / self.dpi,
self._figdpi * (height-b-h) / self.dpi,

Seeing a/b*c always slows me down in a way that c*a/b or (a/b)*c doesn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, fixed

@anntzer anntzer force-pushed the future-division branch 2 times, most recently from c13813d to 7c30859 Compare January 27, 2018 23:24
Copy link
Member
@efiring efiring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly good, but I have a couple more questions.

@@ -268,7 +268,7 @@ def from_any(size, fraction_ref=None):
return Fixed(size)
elif isinstance(size, six.string_types):
if size[-1] == "%":
return Fraction(float(size[:-1])/100., fraction_ref)
return Fraction(float(size[:-1]) / 100, fraction_ref)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this float remains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the purpose here is to convert "3%' to 0.03? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Sorry, I didn't pay enough attention to the context.

@@ -250,7 +245,7 @@ def _get_number_fraction(self, factor):
break

d = factor // threshold
int_log_d = int(floor(math.log10(d)))
int_log_d = int(math.floor(math.log10(d)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier it looked like you were eliminating the use of the math module in favor of numpy functions--but not here. Why? (The math module is faster; I assumed you were simplifying by eliminating it in contexts where the speed difference doesn't matter.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the rest of the expression used math and I was too lazy to change both. Obviously this is not on a critical path, changed.

rms = np.sqrt(float(sum_of_squares) / num_values)
return rms
# Convert to float to avoid overflowing finite integer types.
return np.sqrt(((expectedImage - actualImage).astype(float) ** 2).mean())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is seems to be a substantive change to a genuine RMS, from something I don't understand. There must have been a reason for the convoluted original.

Copy link
Contributor Author
@anntzer anntzer Jan 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the original either, but the relevant tests (which check for exact return values of this function) are still passing...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ours are, but I suspect this might cause problems for packagers who put in larger tolerances than ours, as I believe some do. Therefore I recommend sticking to the original algorithm for the purposes of this PR. Perhaps add a comment that this is not a simple rms. @mdboom can probably tell us the rationale for the algorithm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the results should be the same: the old method counted, for each delta, the number of pixels that have this delta value, then computed a RMS based on that histogram.
Again, https://github.com/matplotlib/matplotlib/pull/10335/files shows tests where we check that the reported RMS values are stable to 1e-4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am correctly expressing the two algorithms below, then their results are not the same:

In [26]: x = np.zeros(1000, dtype=int); x[0] = 1

In [27]: h = np.bincount(x, minlength=256)

In [28]: np.sqrt(((h * np.arange(len(h)))**2).mean())
0.0625

In [29]: np.sqrt((x**2).mean())
0.031622776601683791

In [30]: x[:10]
array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [31]: (h * np.arange(len(h)))[:10]
array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0])

In [32]: len(x)
1000

In [33]: len(h)
256

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the previous algorithm was dividing by num_values (so sum() / len(x) in your case, not mean()). this fix shows that the values are indentical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops... sorry I missed that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I can run the Cartopy test suite without any issue using this PR.

@@ -1574,7 +1574,7 @@ def tick_values(self, vmin, vmax):
"""
if self.nbins is None:
return self.locs
step = max(int(0.99 + len(self.locs) / float(self.nbins)), 1)
step = max(int(np.ceil(len(self.locs) / self.nbins)), 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this change the results in rare cases? It does look nicer, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think the code really, really looks like it was written by someone who meant ceil but didn't remember that function existed.
  2. For the change to actually matter you need a) to use bins with FixedLocator, which seems to be quite rare in my experience, and b) at least 99 ticks and 100 bins (to fall in the "edge" region), which is also quite rare...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go along with this one.

Most of these calls probably date back to the time before
`__future__`-division existed...

Note that this will have the side effect that some functions that were
accidentally accepting strings instead of floats before will no longer
accept strings -- which seems fine to me.
rms = np.sqrt(float(sum_of_squares) / num_values)
return rms
# Convert to float to avoid overflowing finite integer types.
return np.sqrt(((expectedImage - actualImage).astype(float) ** 2).mean())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I can run the Cartopy test suite without any issue using this PR.

@jklymak jklymak merged commit 8650c2d into matplotlib:master Jan 29, 2018
@anntzer anntzer deleted the future-division branch January 29, 2018 19:17
@QuLogic QuLogic modified the milestones: needs sorting, v2.2.0 Feb 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0