Series.add / Series.lt with fill_value #655

1e-to · 2020-02-28T15:26:39Z

No description provided.

pep8speaks · 2020-03-02T13:23:40Z

Hello @1e-to! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-14 15:20:52 UTC

AlexanderKalistratov · 2020-03-05T08:51:23Z

@kozlov-alexey ?

kozlov-alexey · 2020-03-11T16:04:57Z

sdc/datatypes/hpat_pandas_series_functions.py

+    fill_value_is_none = False
+    if isinstance(fill_value, (types.NoneType, types.Omitted)) or fill_value is None:
+        fill_value_is_none = True


Maybe just:
fill_value_is_none = isinstance(fill_value, (types.NoneType, types.Omitted)) or fill_value is None
?

kozlov-alexey · 2020-03-11T16:38:49Z

sdc/functions/numpy_like.py

@@ -531,7 +531,7 @@ def sdc_fillna_inplace_int_impl(self, inplace=False, value=None):

        def sdc_fillna_inplace_float_impl(self, inplace=False, value=None):
            length = len(self)
-            for i in prange(length):
+            for i in range(length):


Why are you reverting to the non-scalable implementation (range instead of prange)?

kozlov-alexey · 2020-03-11T22:12:43Z

sdc/tests/test_series.py

@@ -1482,7 +1482,7 @@ def test_series_op5(self):

    @skip_parallel
    def test_series_op5_integer_scalar(self):
-        arithmetic_methods = ('add', 'sub', 'mul', 'div', 'truediv', 'floordiv', 'mod', 'pow')
+        arithmetic_methods = ('sub', 'mul', 'div', 'truediv', 'floordiv', 'mod', 'pow')


I don't think you should remove this function from the test (this use case is still valid). Instead, you can pass check_dtype=False to assert_series_equal and add a comment stating why it's used.
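A minimal sketch of the relaxed comparison suggested above. The two Series are illustrative stand-ins, assuming the compiled function returns float64 where pandas returns int64; they are not taken from the PR's tests.

```python
import pandas as pd

# Stand-ins for the two results: pandas keeps int64 for integer input,
# while the compiled version is assumed here to return float64.
expected = pd.Series([3, 4, 5])        # dtype: int64
actual = pd.Series([3.0, 4.0, 5.0])    # dtype: float64

# The default comparison fails because the dtypes differ.
try:
    pd.testing.assert_series_equal(actual, expected)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True

# check_dtype=False compares values and index only. In a real test, keep a
# comment next to this call explaining why the dtype check is relaxed.
pd.testing.assert_series_equal(actual, expected, check_dtype=False)
```

This keeps 'add' in the list of tested methods while still checking the computed values.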

kozlov-alexey · 2020-03-11T22:25:34Z

sdc/tests/test_series.py

+        for data in cases_data:
+            for index in cases_index:
+                for scalar in cases_scalar:
+                    for value in cases_value:
+                        with self.subTest(data=data, index=index, scalar=scalar, value=value):
+                            S1 = pd.Series(data, index)
+                            pd.testing.assert_series_equal(sdc_func(S1, scalar, value), test_impl(S1, scalar, value))


You can use product instead of nested for-loops:

for data, index, scalar, fill_value in product(cases_data, cases_index, cases_scalar, cases_value):
    S1 = pd.Series(data, index)
    with self.subTest(data=data, index=index, scalar=scalar, value=fill_value):
        result = sdc_func(S1, scalar, fill_value)
        result_ref = test_impl(S1, scalar, fill_value)
        pd.testing.assert_series_equal(result, result_ref)
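As a quick check that the suggestion does not change what is tested, the sketch below compares itertools.product against the equivalent nested loops. The case lists are made-up placeholders, not the values used in the PR.

```python
from itertools import product

# Placeholder case lists (assumed values for illustration only).
cases_data = [[1, 2], [3, 4]]
cases_index = [None, ['a', 'b']]
cases_scalar = [0, 1]
cases_value = [None, 4]

# Combinations produced by four nested loops.
nested = []
for data in cases_data:
    for index in cases_index:
        for scalar in cases_scalar:
            for value in cases_value:
                nested.append((data, index, scalar, value))

# product yields the same tuples in the same order
# (the rightmost iterable varies fastest).
flat = list(product(cases_data, cases_index, cases_scalar, cases_value))
```

Both lists contain the same combinations in the same order, so the rewrite only removes indentation.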

kozlov-alexey · 2020-03-11T22:29:45Z

sdc/tests/test_series.py

+                            S1 = pd.Series(data, index)
+                            pd.testing.assert_series_equal(sdc_func(S1, scalar, value), test_impl(S1, scalar, value))
+
+    def test_series_lt(self):


The names of these tests should reflect that you're testing the fill_value param; the current name better fits the default use case, where all optional args are omitted.

kozlov-alexey · 2020-03-11T22:31:39Z

sdc/tests/test_series.py

+        S2 = pd.Series(data, index2)
+        pd.testing.assert_series_equal(sdc_func(S1, S2), test_impl(S1, S2))
+
+    @unittest.skip('SDC returns only float Series')


Generally it's not good to add some functionality and keep the tests for it skipped, so the same comment as above applies: use check_dtype=False.

kozlov-alexey · 2020-03-11T22:43:42Z

sdc/tests/test_series.py

+                        S1 = pd.Series(data, index)
+                        S2 = pd.Series(index, data)


Hmm, this uses data as the index for the other series... That isn't very transparent from a functional-coverage point of view. For instance, you test the case where S2 has index=[3.3, 5.4, np.nan, 7.9, np.nan], but that is never S1's index, so alignment of float indexes with NaNs is not tested. I would advise separate tests for the different branches in the impl, so e.g. if you have a separate branch for S1.index=None and S2.index=None, test it in a separate test. And for indexes you can just use:
for index1, index2 in product(cases_index, cases_index):
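To illustrate why index alignment deserves its own coverage, here is a small pandas-only sketch of how Series.add with fill_value treats two series whose indexes only partly overlap. The values are illustrative and use integer labels rather than the float/NaN indexes mentioned above.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, np.nan], index=[0, 1, 2])
s2 = pd.Series([10.0, np.nan, 30.0], index=[1, 2, 3])

# Labels 0 and 3 exist on one side only, so the missing side is filled with 0.
# At label 2 both sides are missing, so the result stays NaN.
result = s1.add(s2, fill_value=0)

expected = pd.Series([1.0, 12.0, np.nan, 30.0], index=[0, 1, 2, 3])
pd.testing.assert_series_equal(result, expected)
```

A test that only ever pairs identical indexes never exercises the one-sided fill at labels 0 and 3.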

kozlov-alexey · 2020-03-11T23:44:38Z

sdc/datatypes/hpat_pandas_series_functions.py

+            fill_value_is_nan = False
+            if fill_value is None:
+                fill_value = numpy.nan
+            if not fill_value_is_none == True:  # noqa
+                fill_value_is_nan = numpy.isnan(fill_value)
+            if not (fill_value_is_nan or fill_value_is_none == True):  # noqa


You do not need to capture fill_value_is_none as a compile-time constant, as there's no need to eliminate dead branches here. If all you need is to run fillna when fill_value is not None/np.nan, you can just write:

if (fill_value is not None and not numpy.isnan(fill_value)):
    numpy_like.fillna(self._data, inplace=True, value=fill_value)
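The guard can be tried outside the compiler with plain NumPy. The helper below, fill_before_binop, is a hypothetical stand-in for the numpy_like.fillna call and not SDC's implementation; it only shows which fill_value inputs trigger the fill.

```python
import numpy as np


def fill_before_binop(data, fill_value=None):
    """Hypothetical helper: return a float copy of `data`, with NaNs
    replaced only when fill_value is neither None nor NaN."""
    result = np.array(data, dtype=np.float64)  # np.array copies by default
    if fill_value is not None and not np.isnan(fill_value):
        result[np.isnan(result)] = fill_value
    return result


filled = fill_before_binop([1.0, np.nan, 3.0], fill_value=4)
untouched_none = fill_before_binop([1.0, np.nan, 3.0], fill_value=None)
untouched_nan = fill_before_binop([1.0, np.nan, 3.0], fill_value=np.nan)
```

The `is not None` test comes first, so numpy.isnan is never called on None.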

kozlov-alexey · 2020-03-16T11:39:27Z

sdc/datatypes/hpat_pandas_series_functions.py

-            if not fill_value_is_none == True:  # noqa
-                fill_value_is_nan = numpy.isnan(fill_value)
-            if not (fill_value_is_nan or fill_value_is_none == True):  # noqa
+            if (fill_value is not None and not numpy.isnan(fill_value)):


Don't you like the shorter form?

Suggested change

if (fill_value is not None and not numpy.isnan(fill_value)):

if not (fill_value is None or numpy.isnan(fill_value)):
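The two spellings are equivalent by De Morgan's law. A quick sanity check in plain Python, assuming scalar fill values only (the function names are made up for this sketch):

```python
import numpy as np


def long_form(fill_value):
    return fill_value is not None and not np.isnan(fill_value)


def short_form(fill_value):
    return not (fill_value is None or np.isnan(fill_value))


# Both forms short-circuit on None, so np.isnan is never called with None.
candidates = [None, np.nan, 0, 4, 5.5]
agree = all(bool(long_form(v)) == bool(short_form(v)) for v in candidates)
```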

kozlov-alexey · 2020-03-20T15:40:23Z

sdc/sdc_methods_templates.py

+    .. note::
+
+        Parameter axis is currently unsupported by Intel Scalable Dataframe Compiler


Is it correct to have this here? Shouldn't axis be listed as one of the unsupported parameters under Limitations?

kozlov-alexey · 2020-03-20T15:52:47Z

sdc/sdc_methods_templates.py

+
+    Limitations
+    -----------
+    - Parameters level, fill_value are currently unsupported by Intel Scalable Dataframe Compiler


You are adding support for fill_value in this PR, so it should not be listed as unsupported.

kozlov-alexey · 2020-03-20T17:07:48Z

sdc/sdc_methods_templates.py

+        Given: self={}, other={}'.format(_func_name, self, other))
+
+    def sdc_pandas_series_operator_binop_impl(self, other):
+        return sdc_pandas_series_binop(self, other)


Why not

Suggested change

return sdc_pandas_series_binop(self, other)

return self.binop(other)

kozlov-alexey · 2020-03-23T23:18:57Z

sdc/tests/test_series.py

+        cases_value = [None, 4, 5.5]
+        for data, index, value in product(cases_data, cases_index, cases_value):
+            with self.subTest(data=data, index=index, value=value):
+                S1 = pd.Series(data, index)


It seems the corruption/compilation failures in the tests are caused by using index=None as data in the S2 series. I still suggest not mixing the data and indexes under test with each other, e.g.:

for value in cases_value:
    for data1, data2 in product(cases_data, cases_data):
        for index1, index2 in product(cases_index, cases_index):
            S1 = pd.Series(data1, index1)
            S2 = pd.Series(data2, index2)
            with self.subTest(left=S1, right=S2, value=value):
                ...
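A runnable sketch of that loop structure, with data and indexes kept in separate case lists. Since the compiled function is not available here, pandas' own Series.add is checked against reference_add, a hypothetical pure-Python reference written for this example. The case values are assumptions, and index=None is deliberately left out.

```python
from itertools import product

import numpy as np
import pandas as pd


def reference_add(s1, s2, fill_value):
    """Hypothetical label-by-label reference for Series.add(fill_value=...)."""
    labels = sorted(set(s1.index) | set(s2.index))
    values = []
    for label in labels:
        a = s1.get(label, np.nan)
        b = s2.get(label, np.nan)
        a_missing, b_missing = pd.isna(a), pd.isna(b)
        # Fill only when exactly one side is missing; if both sides are
        # missing, the result stays NaN.
        if fill_value is not None and a_missing != b_missing:
            a = fill_value if a_missing else a
            b = fill_value if b_missing else b
        values.append(a + b)
    return pd.Series(values, index=labels, dtype=np.float64)


cases_data = [[1.0, np.nan, 3.0], [np.nan, 5.0, 6.0]]
cases_index = [[0, 1, 2], [1, 2, 3]]
cases_value = [None, 4, 5.5]

checked = 0
for value in cases_value:
    for data1, data2 in product(cases_data, cases_data):
        for index1, index2 in product(cases_index, cases_index):
            S1 = pd.Series(data1, index1)
            S2 = pd.Series(data2, index2)
            result = S1.add(S2, fill_value=value)
            pd.testing.assert_series_equal(result, reference_add(S1, S2, value))
            checked += 1
```

Every data pair is combined with every index pair, so equal and differing indexes are both covered without data ever being reused as an index.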

kozlov-alexey · 2020-03-23T23:21:17Z

sdc/sdc_autogenerated.py


-    _func_name = 'Operator add().'


_func_name will no longer be defined in the function, but it's still used in the error-handling branches.

AlexanderKalistratov · 2020-04-17T19:30:31Z

@kozlov-alexey can we merge this?

kozlov-alexey · 2020-04-20T13:34:25Z

buildscripts/autogen_sources_methods.py

+    template_series_arithmetic_binop = inspect.getsource(templates_module.sdc_pandas_series_binop)
+    template_series_comparison_binop = inspect.getsource(templates_module.sdc_pandas_series_comp_binop)
+    template_series_arithmetic_binop_operator = inspect.getsource(templates_module.sdc_pandas_series_operator_binop)
+    template_series_comparison_binop_operator = series_operator_comp_binop


Shouldn't template_series_comparison_binop_operator be assigned inspect.getsource(...)?
I tried to run autogen_sources_methods.py but got the following:

(SDC_MASTER) C:\Users\akozlov\AppData\Local\Continuum\anaconda3\hpat>python buildscripts\autogen_sources_methods.py
c:\users\akozlov\appdata\local\continuum\anaconda3\hpat\sdc\utilities\utils.py:459: SyntaxWarning: "is not" with a literal. Did you mean "!="?
  if init_vals is not ():
Exception of type AttributeError: 'function' object has no attribute 'replace' while writing to a file: C:\Users\akozlov\AppData\Local\Continuum\anaconda3\hpat\sdc\sdc_autogenerated.py
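The AttributeError is what one would expect if a function object reaches code that calls str.replace on a template. The sketch below reproduces that in isolation; it assumes, based only on the error message, that the generator script performs text substitution on template source. The names template, sdc_add and the string stand-in are hypothetical.

```python
def template(self, other):
    return self.binop(other)


# Calling .replace on the function object reproduces the reported failure.
try:
    template.replace('binop', 'add')
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# A string stand-in for the text that inspect.getsource(template) would
# return in a real script; strings do support .replace.
template_source = "def template(self, other):\n    return self.binop(other)\n"
generated = template_source.replace('binop', 'add').replace('template', 'sdc_add')
```

Assigning the result of inspect.getsource(...) instead of the function itself gives the substitution step a string to work on.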

kozlov-alexey · 2020-04-20T13:37:45Z

buildscripts/autogen_sources_methods.py

+# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+# *****************************************************************************
+'''
+


Why do we need a separate file, autogen_sources_methods.py? And why do we then need to keep the old file, autogen_sources.py?

kozlov-alexey · 2020-04-20T16:30:01Z

buildscripts/autogen_sources.py

@@ -116,9 +117,12 @@
    imports_start_line, import_end_line = min(imports_line_numbers), max(imports_line_numbers)
    import_section_text = ''.join(module_text_lines[imports_start_line: import_end_line + 1])

+    series_operator_comp_binop = inspect.getsource(templates_module.sdc_pandas_series_operator_comp_binop)


I don't think the series_operator_comp_binop var is needed, since we are using it only once.
Let's maybe rename and shorten the template names, e.g.:

template_series_arithmetic_binop -> template_series_binop
template_series_comparison_binop -> template_series_comp_binop
template_series_arithmetic_binop_operator -> template_series_operator
template_series_comparison_binop_operator -> template_series_comp_operator
template_str_arr_comparison_binop -> template_str_arr_comp_binop

elena.totmenina added 3 commits February 28, 2020 18:13

Series.add / Series.lt with fill_value

d331183

Merge branch 'master' of https://github.com/IntelPython/sdc into oper…

eb4a728

…_ser

del useless

722e966

1e-to requested a review from kozlov-alexey February 28, 2020 15:26

elena.totmenina added 2 commits March 2, 2020 16:22

Add oprimization for fill_value=None

52560c1

Merge branch 'master' of https://github.com/IntelPython/sdc into oper…

8839fc1

…_ser

pep

475cb58

kozlov-alexey reviewed Mar 11, 2020

test and impl fix

b483957

1e-to added the Ready for Review label Mar 13, 2020

non index impl fix

1684822

kozlov-alexey reviewed Mar 16, 2020

kozlov-alexey approved these changes Mar 16, 2020

elena.totmenina and others added 7 commits March 17, 2020 13:34

update script

b2ec0a6

fix

c0ca1ad

Merge branch 'master' of https://github.com/IntelPython/sdc into oper…

af3bbdd

…_ser

fix operators

92d0692

add doc

15e2806

div fix

fc0510b

Merge branch 'master' into oper_ser

90e5647

kozlov-alexey reviewed Mar 20, 2020

elena.totmenina added 2 commits March 21, 2020 13:27

doc fix

ef8fe79

fix algo for scalars

c48f2e3

kozlov-alexey reviewed Mar 23, 2020

etotmeni and others added 3 commits April 14, 2020 15:33

small fixes

70a5484

Merge branch 'master' into oper_ser

bf87286

del prange

8f76b36

kozlov-alexey reviewed Apr 20, 2020

Fix script

e09a6fc

kozlov-alexey reviewed Apr 20, 2020

kozlov-alexey approved these changes Apr 20, 2020

fix vars

542a59c

1e-to added Ready to merge and removed Ready for Review labels Apr 20, 2020

AlexanderKalistratov merged commit 6d77677 into IntelPython:master Apr 23, 2020
