Impl Series.skew() #813

Rubtsowa · 2020-04-21T11:02:56Z

name	nthreads	type	size	median	min	max	compile	boxing
Series.skew	1	Python	100000000	5.041041	4.553035	6.009044	NaN	NaN
		SDC	100000000	1.082	1.066	1.091	0.205984	0.00001
Series.skew	4	Python	100000000	6.681195	4.275043	14.061077	NaN	NaN
		SDC	100000000	1.07	0.99	1.135	0.329002	0.00001

pep8speaks · 2020-04-21T11:03:03Z

Hello @Rubtsowa! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-24 16:12:26 UTC

AlexanderKalistratov · 2020-04-21T11:09:21Z

sdc/datatypes/hpat_pandas_series_functions.py

+        else:
+            _skipna = skipna
+
+        infinite_mask = numpy.isfinite(self._data)


No. That kills performance and scalability.

AlexanderKalistratov · 2020-04-21T11:09:36Z

sdc/datatypes/hpat_pandas_series_functions.py

+
+        infinite_mask = numpy.isfinite(self._data)
+        len_val = len(infinite_mask)
+        data = self._data[infinite_mask]


That does too, actually.
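
For context on the two comments above: `numpy.isfinite(self._data)` allocates a full boolean mask, and `self._data[infinite_mask]` then allocates a filtered copy, so the data is traversed several times before any statistic is computed. A minimal sketch of the alternative, assuming NaN is the only value that has to be skipped: accumulate the power sums in one pass with no temporaries. The function name `power_sums_skipna` and its return layout are hypothetical and are not taken from the PR.

```python
import math

import numpy as np


def power_sums_skipna(arr):
    """Return (n, s1, s2, s3) over the non-NaN values of arr in a single pass.

    Hypothetical helper for illustration; no mask or filtered copy is created.
    """
    n = 0
    s1 = 0.0  # sum of x
    s2 = 0.0  # sum of x**2
    s3 = 0.0  # sum of x**3
    for x in arr:
        if math.isnan(x):
            continue  # skip missing values instead of filtering them out first
        n += 1
        s1 += x
        s2 += x * x
        s3 += x * x * x
    return n, s1, s2, s3


data = np.array([1.0, np.nan, 2.0, 4.0])
n, s1, s2, s3 = power_sums_skipna(data)
```

A loop of this shape is also the kind of code that a parallel reduction can be built on, which is presumably what the scalability remark is about.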

AlexanderKalistratov · 2020-04-21T18:37:14Z

@Rubtsowa looks good. Could you please remeasure performance?

Rubtsowa · 2020-04-21T19:44:14Z

name	nthreads	type	size	median	min	max	compile	boxing
Series.skew	1	Python	100000000	9.940206	5.385197	19.73827	NaN	NaN
		SDC	100000000	0.989	0.711	1.241	0.167002	0
Series.skew	4	SDC	100000000	0.731	0.719	0.847	0.105322	0.000417

Rubtsowa · 2020-04-22T07:25:33Z

name	nthreads	type	size	median	min	max	compile	boxing
Series.skew	1	Python	100000000	6.973869	4.221513	9.153432	NaN	NaN
		SDC	100000000	0.597	0.595	0.604	0.12701	0.00008
Series.skew	4	SDC	100000000	0.199	0.197	0.208	0.134013	0.000026

AlexanderKalistratov · 2020-04-22T13:41:42Z

sdc/functions/numpy_like.py

+                return numpy.nan
+
+        n = nfinite
+        m2 = (square_sum - _sum * _sum / n) / n


I believe we could move this formula somewhere shared and then reuse it in the Series/Rolling/GroupBy methods.

@densmirn ?
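
A sketch of what such a shared helper could look like, assuming the callers already hold the count and the raw power sums. The name `skew_from_power_sums`, its signature, and the zero-variance handling are assumptions made for illustration; the `m2` line follows the formula quoted in the diff above, and the final scaling is the adjusted Fisher-Pearson coefficient, which is what pandas reports for `Series.skew()`.

```python
import math


def skew_from_power_sums(n, s1, s2, s3):
    """Sample skewness from n, sum(x), sum(x**2), sum(x**3).

    Hypothetical helper: the name and signature are not from the PR.
    """
    if n < 3:
        return math.nan  # the bias-corrected estimator is undefined below 3 points

    # Second and third central moments expressed through raw power sums.
    m2 = (s2 - s1 * s1 / n) / n
    m3 = (s3 - 3.0 * s1 * s2 / n + 2.0 * s1 * s1 * s1 / (n * n)) / n

    if m2 <= 0.0:
        # Constant data (or rounding noise); returning 0.0 is a choice made here.
        return 0.0

    return math.sqrt(n * (n - 1.0)) / (n - 2.0) * m3 / m2 ** 1.5


# Power sums of [1.0, 2.0, 4.0]: n=3, s1=7, s2=21, s3=73.
result = skew_from_power_sums(3, 7.0, 21.0, 73.0)
```

Keeping the formula in one function means Series, Rolling and GroupBy code would only differ in how they accumulate the sums.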

Rubtsowa · 2020-04-22T14:03:05Z

name	nthreads	type	size	median	min	max	compile	boxing
Series.skew(skipna=False)	1	Python	100000000	3.396047	3.068078	4.135047	NaN	NaN
		SDC	100000000	0.165	0.165	0.171	0.200001	0.000002
	4	SDC	100000000	0.08	0.078	0.09	0.211993	0.000009

AlexanderKalistratov · 2020-04-22T16:09:29Z

sdc/functions/numpy_like.py

+    def skew_impl(arr):
+        len_val = len(arr)
+        n = 0
+        _sum = 0


Let's initialize _sum, square_sum and cube_sum with float values (0.0).

this will lead to errors

what kind of errors?

AlexanderKalistratov · 2020-04-22T16:10:57Z

sdc/tests/tests_perf/test_perf_series.py

@@ -135,6 +135,8 @@ def _test_case(self, pyfunc, name, total_data_length, input_data=None, data_num=
    TC(name='shape', size=[10 ** 7], call_expr='data.shape', usecase_params='data'),
    TC(name='shift', size=[10 ** 8]),
    TC(name='size', size=[10 ** 7], call_expr='data.size', usecase_params='data'),
+    TC(name='skew', size=[10 ** 8], params='skipna=True'),
+    TC(name='skew', size=[10 ** 8], params='skipna=False', input_data=[test_global_input_data_float64[0]]),


I don't think using test_global_input_data_float64[0] is actually a good idea. It contains max_float values, which makes it impossible to compare results and may cause other issues.
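
To illustrate the concern, a small sketch (with made-up input values, not the actual contents of `test_global_input_data_float64`) of what happens to the `m2` formula from the diff when one element is the largest finite double: the square overflows to infinity, and the subtraction then yields NaN.

```python
import math
import sys

big = sys.float_info.max          # largest finite double
values = [big, 1.0, 2.0]          # illustrative data only
n = len(values)

_sum = 0.0
square_sum = 0.0
for x in values:
    _sum += x
    square_sum += x * x           # big * big overflows to inf

# Same shape as the formula in the diff: inf - inf produces NaN.
m2 = (square_sum - _sum * _sum / n) / n
```

Since NaN never compares equal to anything, including another NaN, a plain equality check between the reference result and the compiled result stops being meaningful on such input.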

AlexanderKalistratov · 2020-04-24T15:38:23Z

sdc/functions/static.py

@@ -0,0 +1,41 @@
+# -*- coding: utf-8 -*-


Let's name this file 'statistics'

AlexanderKalistratov · 2020-04-24T16:08:08Z

sdc/functions/statics.py

@@ -0,0 +1,41 @@
+# -*- coding: utf-8 -*-


statistics not statics

* Turn on Azure CI for branch (#822) * Redesign DataFrame structure (#817) * Merge master (#840) * Df.at impl (#738) * Series.add / Series.lt with fill_value (#655) * Impl Series.skew() (#813) * Run tests in separate processes (#833) * Run tests in separate processes * Take tests list from sdc/tests/__init__.py * change README (#818) * change README * change README for doc * add refs * change ref * change ref * change ref * change readme * Improve boxing (#832) * Specify sdc version from channel for examples testing (#837) * Specify sdc version from channel for examples testing It occurs that conda resolver can take Intel SDC package not from first channel where it is found. Specify particular SDC version to avoid this in examples for now. Also print info for environment creation and package installing * Fix incerrectly used f-string * Fix log_info call * Numba 0.49.0 all (#824) * Fix run tests Remove import of _getitem_array1d * expectedFailure * expectedFailure-2 * expectedFailure-3 * Conda recipe numba==0.49 * expectedFailure-4 * Refactor imports from Numba * Unskip tests * Fix using of numpy_support.from_dtype() * Unskip tests * Fix DataFrame tests with rewrite IR without Del statements * Unskip tests * Fix corr_overload with type inference error for none < 1 * Fix hpat_pandas_series_cov with type inference error for none < 2 * Unskip tests * Unskip tests * Fixed iternext_series_array with using _getitem_array1d. _getitem_array1d is replaced with _getitem_array_single_int in Numba 0.49. * Unskip tests * Unskip old test * Fix Series.at * Unskip tests * Add decrefs in boxing (#836) * Adding extension type for pd.RangeIndex (#820) * Adding extension type for pd.RangeIndex This commit adds Numba extension types for pandas.RangeIndex class, allowing creation of pd.RangeIndex objects and passing and returning them to/from nopython functions. * Applying review comments * Fix for PR 831 (#839) * Update pyarrow version to 0.17.0 Update recipe, code and docs. 
* Disable intel channel * Disable intel channel for testing * Fix remarks Co-authored-by: Vyacheslav Smirnov <vyacheslav.s.smirnov@intel.com> * Update to Numba 0.49.1 (#838) * Update to Numba 0.49.1 * Fix requirements.txt * Add travis Co-authored-by: Elena Totmenina <totmeninal@mail.ru> Co-authored-by: Rubtsowa <36762665+Rubtsowa@users.noreply.github.com> Co-authored-by: Sergey Pokhodenko <sergey.pokhodenko@intel.com> Co-authored-by: Vyacheslav-Smirnov <51660067+Vyacheslav-Smirnov@users.noreply.github.com> Co-authored-by: Alexey Kozlov <52973316+kozlov-alexey@users.noreply.github.com> Co-authored-by: Vyacheslav Smirnov <vyacheslav.s.smirnov@intel.com> * Re-implement df.getitem based on new structure (#845) * Re-implement df.getitem based on new structure * Re-implemented remaining getitem overloads, add tests * Re-implement df.values based on new structure (#846) * Re-implement df.pct_change based on new structure (#847) * Re-implement df.drop based on new structure (#848) * Re-implement df.append based on new structure (#857) * Re-implement df.reset_index based on new structure (#849) * Re-implement df._set_column based on new strcture (#850) * Re-implement df.rolling methods based on new structure (#852) * Re-implement df.index based on new structure (#853) * Re-implement df.copy based on new structure (#854) * Re-implement df.isna based on new structure (#856) * Re-implement df.at/iat/loc/iloc based on new structure (#858) * Re-implement df.head based on new structure (#855) * Re-implement df.head based on new structure * Simplify codegen docstring * Re-implement df.groupby methods based on new structure (#859) * Re-implement dataframe boxing based on new structure (#861) * Re-implement DataFrame unboxing (#860) * Boxing draft Merge branch 'master' of https://github.com/IntelPython/sdc into merge_master # Conflicts: # sdc/hiframes/pd_dataframe_ext.py # sdc/tests/test_dataframe.py * Implement unboxing in new structure * Improve variable names + add error handling 
* Return error status * Move getting list size to if_ok block * Unskipped unexpected success tests * Unskipped unexpected success tests in GroupBy * Remove decorators * Change to incref False * Skip tests failed due to unimplemented df structure * Bug in rolling * Fix rolling (#865) * Undecorate tests on reading CSV (#866) * Re-implement df structure: enable rolling tests that pass (#867) * Re-implement df structure: refactor len (#868) * Re-implement df structure: refactor len * Undecorated all the remaining methods Co-authored-by: Denis <denis.smirnov@intel.com> * Merge master to feature/dataframe_model_refactoring (#869) * Enable CI on master Co-authored-by: Angelina Kharchevnikova <angelina.kharchevnikova@intel.com> Co-authored-by: Elena Totmenina <totmeninal@mail.ru> Co-authored-by: Rubtsowa <36762665+Rubtsowa@users.noreply.github.com> Co-authored-by: Sergey Pokhodenko <sergey.pokhodenko@intel.com> Co-authored-by: Vyacheslav-Smirnov <51660067+Vyacheslav-Smirnov@users.noreply.github.com> Co-authored-by: Alexey Kozlov <52973316+kozlov-alexey@users.noreply.github.com> Co-authored-by: Vyacheslav Smirnov <vyacheslav.s.smirnov@intel.com>

Impl Series.skew()

a109d42

Rubtsowa requested review from akharche, AlexanderKalistratov and densmirn April 21, 2020 11:02

Rubtsowa added the Ready for Review label Apr 21, 2020

fix problem with PEP8

61d93ab

AlexanderKalistratov reviewed Apr 21, 2020

Rubtsowa added 2 commits April 21, 2020 14:41

change impl

cf2b556

not copy data

c4e85fa

add functions for skew in numpy_like

b7948d4

AlexanderKalistratov reviewed Apr 22, 2020

change functions in numpy_like, extension unittest, add perf test

5ce5343

AlexanderKalistratov reviewed Apr 22, 2020

Rubtsowa added 6 commits April 23, 2020 10:20

change perf test & change functions in numpy_like

1f49c89

change functions in numpy_like

6bfca08

not use global_input_data_integer in tests

9888b7b

change perf test

8089aa0

Merge branch 'master' of https://github.com/IntelPython/hpat into ser…

fdd2b7e

…ies_skew

putting the formula in a separate file

75b007d

AlexanderKalistratov reviewed Apr 24, 2020

Rubtsowa added 2 commits April 24, 2020 18:46

rename file

a39a951

add statics.py

df54e1b

AlexanderKalistratov reviewed Apr 24, 2020

Rubtsowa and others added 2 commits April 24, 2020 19:09

rename file

acf9943

delete statics

73299c5

AlexanderKalistratov approved these changes Apr 27, 2020

AlexanderKalistratov merged commit 0e155d4 into IntelPython:master Apr 27, 2020

Rubtsowa deleted the series_skew branch April 27, 2020 09:45
