-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
ENH: Add the geometric standard deviation function #9810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in my view.
Two comments:
- function can handle normal arrays and masked arrays; no separate
mstats
version. I think this is a clean solution. - the
try … catch
construction is normally not used in otherstats
functions that rather test a certain condition such asif any(a < 0) raise ...
. my understanding is that try catch is slower (https://docs.python.org/2/faq/design.html#how-fast-are-exceptions), but not sure if that really matters here. any views?
The idea behind the try..except statement is to avoid checking the input arrays twice. This way (should) work faster when a valid array is passed. But would be significantly slower when it wouldn't work, to me, this seemed like a good trade-off. I've implemented them as checks to avoid the try..except statement. Here are some timings (units of seconds as floats for 30 runs) for a 1000x1000 array of
For the valid case, using try..except is ~2x as fast as using checks. HOWEVER for invalid cases try..except is much slower than trying to catch errors. As it stands I'm not sure which one would be preferred. It depends on whether we expect the user to provide "correct" input, and to be able to provide a faster function. What do you think? |
Thanks for the additional info. I think the trade-off higher speed for correct input vs slower speed for incorrect input is fine. Let's see what others say |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR looks good in my view
@tylerjereddy any thoughts on using try..except vs checks? Otherwise, I'll look to merge it soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like domain expert review is positive now. The try/except tradeoffs seem sensible to me too. We sometimes like to add asv
benchmarks for new functions where there are performance concerns.
I checked the line coverage of the addition locally--there's a partial hit and one miss, but I'm not saying you have to hit 100% on this PR; here's the screen cap of the html cov with pertinent lines & cmd used:
python runtests.py --mode=full -t "scipy/stats/tests/test_stats.py" -- --cov-report html --cov=scipy/stats
Docstring render in CircleCI looks nice -- we sometimes use versionadded
for new functions I think
@Kai-Striega since this now has 3 merges of master into your branch, it would be useful to rebase to get rid of those. If it doesn't rebase cleanly, perhaps squash and rebase? |
@tylerjereddy I included the timing to demonstrate the difference between checking and try..except. The performance (shouldn't) really be a concern and I don't think adding I've added a @rgommers will do. |
This PR implements a geometric standard deviation function. The geometric standard deviation describes the spread of a set of numbers where the geometric mean is preferred.
Minor simplification to use np.asanyarray in favour of checking if passed value is not an array, then manually converting to an array.
@rgommers does the version history look better now? If so could you merge it once the tests pass? |
Thanks, version history looks good now. Had a quick look at the try-except, that seems fine here. Normally we do explicit checks, but when the checks become expensive compared to the one-liner that's the actual functionality, then I'm fine with try-except to speed that up. Everyone is happy it seems, so in it goes. Thanks @Kai-Striega & reviewers! |
@Kai-Striega could you add an entry to https://github.com/scipy/scipy/wiki/Release-note-entries-for-SciPy-1.3.0 for this? |
Redo of #9395.
This PR implements a geometric standard deviation function and closes #9154. The geometric standard deviation describes the spread of a set of numbers where the geometric mean is preferred.
The original PR included several revisions handling masked arrays,
mstats
and invalid values. These revisions obfuscate the commit history and are starting to repeat the original commits.Several issues occurred checking for and handling invalid values (see #9558 and numpy/numpy#4959) . In particular the use of
nan_policy
. The current implementation does not usenan_policy
. It attempts a naive calculation of the geometric standard deviation. If this fails it checks for invalid arguments, raising a descriptive error message.Another discussion centered around the use of the
mstats
module. The original PR included support for both regular and masked arrays. Later changing to include an implementation in bothstats
andmstats
. This later changed back to only include astats
version.@chrisb83 @WarrenWeckesser would you be willing to take a look over this?