Adding extension type for pd.RangeIndex #820

kozlov-alexey · 2020-04-28T01:24:56Z

This commit adds Numba extension types for pandas.RangeIndex class,
allowing creation of pd.RangeIndex objects inside and passing and returning them
to/from nopython functions.

pep8speaks · 2020-04-28T01:25:05Z

Hello @kozlov-alexey! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-13 17:02:27 UTC

This commit adds Numba extension types for pandas.RangeIndex class, allowing creation of pd.RangeIndex objects and passing and returning them to/from nopython functions.

densmirn · 2020-04-28T08:08:56Z

sdc/datatypes/range_index_type.py

+    @property
+    def dtype(self):
+        return types.int64


Is the property going to be expanded? If isn't maybe replace the property with the attribute self.dtype = types.int64 to simplify a bit the code.

No, probably it won't be. I changed to use class attribute instead, that should work too.

densmirn · 2020-04-28T08:17:19Z

sdc/tests/test_indexes.py

+
+        for params in _generate_valid_range_params():
+            for name in test_global_index_names:
+                start, stop, step = params


This line could be moved from internal loop to external one. The action could be applied in several places.

Yes, thank you, fixed it.

sdc/tests/test_indexes.py

AlexanderKalistratov

Looks great!

AlexanderKalistratov · 2020-04-28T20:13:52Z

sdc/extensions/indexes/range_index_ext.py

+            and (isinstance(step, (types.NoneType, types.Omitted)) or step is None)):
+        def pd_range_index_ctor_dummy_impl(
+                start=None, stop=None, step=None, dtype=None, copy=False, name=None, fastpath=None):
+            raise TypeError("RangeIndex(...) must be called with integers")


Why not raise error at compile time?

@AlexanderKalistratov We would have different runtime behavior with pandas then. I'm not sure which way we should take - either comply with pandas as much as possible, or intercept at compile time errors like this that pandas could only catch at runtime, but have compilation failure instead of runtime error. What do you think?

AlexanderKalistratov · 2020-04-28T20:16:34Z

sdc/extensions/indexes/range_index_ext.py

+            raise ValueError("Step must not be zero")
+
+        if _start == 0 and _stop == 0 and _step == 0:
+            raise TypeError("RangeIndex(...) must be called with integers")


Is this correct error message? We are checking _start, _stop and _step to be non-zero. But reports what it must be integers

Thanks! I incorrectly merged two different errors into one.

AlexanderKalistratov · 2020-04-28T20:20:39Z

sdc/extensions/indexes/range_index_ext.py

+
+    res = c.pyapi.call_method(pd_class_obj, "RangeIndex", (start, stop, step, dtype, copy, name))
+
+    c.pyapi.decref(pd_class_obj)


Shouldn't we also decrease reference for start, stop and step objects?
And for dtype, copy and name?

Or, creating objects from native doesn't increment reference for it?

Nice catch! Most likely yes, we need to decref for all python object we created using Python C API functions that return new references. It appears that we don't do this in box/unbox of Series and DFs which I used as a template too, but it's better to address that in a separate PR/task.

* Df.at impl (#738) * Series.add / Series.lt with fill_value (#655) * Impl Series.skew() (#813) * Run tests in separate processes (#833) * Run tests in separate processes * Take tests list from sdc/tests/__init__.py * change README (#818) * change README * change README for doc * add refs * change ref * change ref * change ref * change readme * Improve boxing (#832) * Specify sdc version from channel for examples testing (#837) * Specify sdc version from channel for examples testing It occurs that conda resolver can take Intel SDC package not from first channel where it is found. Specify particular SDC version to avoid this in examples for now. Also print info for environment creation and package installing * Fix incerrectly used f-string * Fix log_info call * Numba 0.49.0 all (#824) * Fix run tests Remove import of _getitem_array1d * expectedFailure * expectedFailure-2 * expectedFailure-3 * Conda recipe numba==0.49 * expectedFailure-4 * Refactor imports from Numba * Unskip tests * Fix using of numpy_support.from_dtype() * Unskip tests * Fix DataFrame tests with rewrite IR without Del statements * Unskip tests * Fix corr_overload with type inference error for none < 1 * Fix hpat_pandas_series_cov with type inference error for none < 2 * Unskip tests * Unskip tests * Fixed iternext_series_array with using _getitem_array1d. _getitem_array1d is replaced with _getitem_array_single_int in Numba 0.49. * Unskip tests * Unskip old test * Fix Series.at * Unskip tests * Add decrefs in boxing (#836) * Adding extension type for pd.RangeIndex (#820) * Adding extension type for pd.RangeIndex This commit adds Numba extension types for pandas.RangeIndex class, allowing creation of pd.RangeIndex objects and passing and returning them to/from nopython functions. * Applying review comments * Fix for PR 831 (#839) * Update pyarrow version to 0.17.0 Update recipe, code and docs. * Disable intel channel * Disable intel channel for testing * Fix remarks Co-authored-by: Vyacheslav Smirnov <vyacheslav.s.smirnov@intel.com> * Update to Numba 0.49.1 (#838) * Update to Numba 0.49.1 * Fix requirements.txt * Add travis Co-authored-by: Elena Totmenina <totmeninal@mail.ru> Co-authored-by: Rubtsowa <36762665+Rubtsowa@users.noreply.github.com> Co-authored-by: Sergey Pokhodenko <sergey.pokhodenko@intel.com> Co-authored-by: Vyacheslav-Smirnov <51660067+Vyacheslav-Smirnov@users.noreply.github.com> Co-authored-by: Alexey Kozlov <52973316+kozlov-alexey@users.noreply.github.com> Co-authored-by: Vyacheslav Smirnov <vyacheslav.s.smirnov@intel.com>

* Improve boxing (#832) * Specify sdc version from channel for examples testing (#837) * Specify sdc version from channel for examples testing It occurs that conda resolver can take Intel SDC package not from first channel where it is found. Specify particular SDC version to avoid this in examples for now. Also print info for environment creation and package installing * Fix incerrectly used f-string * Fix log_info call * Numba 0.49.0 all (#824) * Fix run tests Remove import of _getitem_array1d * expectedFailure * expectedFailure-2 * expectedFailure-3 * Conda recipe numba==0.49 * expectedFailure-4 * Refactor imports from Numba * Unskip tests * Fix using of numpy_support.from_dtype() * Unskip tests * Fix DataFrame tests with rewrite IR without Del statements * Unskip tests * Fix corr_overload with type inference error for none < 1 * Fix hpat_pandas_series_cov with type inference error for none < 2 * Unskip tests * Unskip tests * Fixed iternext_series_array with using _getitem_array1d. _getitem_array1d is replaced with _getitem_array_single_int in Numba 0.49. * Unskip tests * Unskip old test * Fix Series.at * Unskip tests * Add decrefs in boxing (#836) * Adding extension type for pd.RangeIndex (#820) * Adding extension type for pd.RangeIndex This commit adds Numba extension types for pandas.RangeIndex class, allowing creation of pd.RangeIndex objects and passing and returning them to/from nopython functions. * Applying review comments * Fix for PR 831 (#839) * Update pyarrow version to 0.17.0 Update recipe, code and docs. * Disable intel channel * Disable intel channel for testing * Fix remarks Co-authored-by: Vyacheslav Smirnov <vyacheslav.s.smirnov@intel.com> * Update to Numba 0.49.1 (#838) * Update to Numba 0.49.1 * Fix requirements.txt Co-authored-by: Elena Totmenina <totmeninal@mail.ru> Co-authored-by: Vyacheslav-Smirnov <51660067+Vyacheslav-Smirnov@users.noreply.github.com> Co-authored-by: Sergey Pokhodenko <sergey.pokhodenko@intel.com> Co-authored-by: Alexey Kozlov <52973316+kozlov-alexey@users.noreply.github.com> Co-authored-by: Vyacheslav Smirnov <vyacheslav.s.smirnov@intel.com>

* Turn on Azure CI for branch (#822) * Redesign DataFrame structure (#817) * Merge master (#840) * Df.at impl (#738) * Series.add / Series.lt with fill_value (#655) * Impl Series.skew() (#813) * Run tests in separate processes (#833) * Run tests in separate processes * Take tests list from sdc/tests/__init__.py * change README (#818) * change README * change README for doc * add refs * change ref * change ref * change ref * change readme * Improve boxing (#832) * Specify sdc version from channel for examples testing (#837) * Specify sdc version from channel for examples testing It occurs that conda resolver can take Intel SDC package not from first channel where it is found. Specify particular SDC version to avoid this in examples for now. Also print info for environment creation and package installing * Fix incerrectly used f-string * Fix log_info call * Numba 0.49.0 all (#824) * Fix run tests Remove import of _getitem_array1d * expectedFailure * expectedFailure-2 * expectedFailure-3 * Conda recipe numba==0.49 * expectedFailure-4 * Refactor imports from Numba * Unskip tests * Fix using of numpy_support.from_dtype() * Unskip tests * Fix DataFrame tests with rewrite IR without Del statements * Unskip tests * Fix corr_overload with type inference error for none < 1 * Fix hpat_pandas_series_cov with type inference error for none < 2 * Unskip tests * Unskip tests * Fixed iternext_series_array with using _getitem_array1d. _getitem_array1d is replaced with _getitem_array_single_int in Numba 0.49. * Unskip tests * Unskip old test * Fix Series.at * Unskip tests * Add decrefs in boxing (#836) * Adding extension type for pd.RangeIndex (#820) * Adding extension type for pd.RangeIndex This commit adds Numba extension types for pandas.RangeIndex class, allowing creation of pd.RangeIndex objects and passing and returning them to/from nopython functions. * Applying review comments * Fix for PR 831 (#839) * Update pyarrow version to 0.17.0 Update recipe, code and docs. * Disable intel channel * Disable intel channel for testing * Fix remarks Co-authored-by: Vyacheslav Smirnov <vyacheslav.s.smirnov@intel.com> * Update to Numba 0.49.1 (#838) * Update to Numba 0.49.1 * Fix requirements.txt * Add travis Co-authored-by: Elena Totmenina <totmeninal@mail.ru> Co-authored-by: Rubtsowa <36762665+Rubtsowa@users.noreply.github.com> Co-authored-by: Sergey Pokhodenko <sergey.pokhodenko@intel.com> Co-authored-by: Vyacheslav-Smirnov <51660067+Vyacheslav-Smirnov@users.noreply.github.com> Co-authored-by: Alexey Kozlov <52973316+kozlov-alexey@users.noreply.github.com> Co-authored-by: Vyacheslav Smirnov <vyacheslav.s.smirnov@intel.com> * Re-implement df.getitem based on new structure (#845) * Re-implement df.getitem based on new structure * Re-implemented remaining getitem overloads, add tests * Re-implement df.values based on new structure (#846) * Re-implement df.pct_change based on new structure (#847) * Re-implement df.drop based on new structure (#848) * Re-implement df.append based on new structure (#857) * Re-implement df.reset_index based on new structure (#849) * Re-implement df._set_column based on new strcture (#850) * Re-implement df.rolling methods based on new structure (#852) * Re-implement df.index based on new structure (#853) * Re-implement df.copy based on new structure (#854) * Re-implement df.isna based on new structure (#856) * Re-implement df.at/iat/loc/iloc based on new structure (#858) * Re-implement df.head based on new structure (#855) * Re-implement df.head based on new structure * Simplify codegen docstring * Re-implement df.groupby methods based on new structure (#859) * Re-implement dataframe boxing based on new structure (#861) * Re-implement DataFrame unboxing (#860) * Boxing draft Merge branch 'master' of https://github.com/IntelPython/sdc into merge_master # Conflicts: # sdc/hiframes/pd_dataframe_ext.py # sdc/tests/test_dataframe.py * Implement unboxing in new structure * Improve variable names + add error handling * Return error status * Move getting list size to if_ok block * Unskipped unexpected success tests * Unskipped unexpected success tests in GroupBy * Remove decorators * Change to incref False * Skip tests failed due to unimplemented df structure * Bug in rolling * Fix rolling (#865) * Undecorate tests on reading CSV (#866) * Re-implement df structure: enable rolling tests that pass (#867) * Re-implement df structure: refactor len (#868) * Re-implement df structure: refactor len * Undecorated all the remaining methods Co-authored-by: Denis <denis.smirnov@intel.com> * Merge master to feature/dataframe_model_refactoring (#869) * Enable CI on master Co-authored-by: Angelina Kharchevnikova <angelina.kharchevnikova@intel.com> Co-authored-by: Elena Totmenina <totmeninal@mail.ru> Co-authored-by: Rubtsowa <36762665+Rubtsowa@users.noreply.github.com> Co-authored-by: Sergey Pokhodenko <sergey.pokhodenko@intel.com> Co-authored-by: Vyacheslav-Smirnov <51660067+Vyacheslav-Smirnov@users.noreply.github.com> Co-authored-by: Alexey Kozlov <52973316+kozlov-alexey@users.noreply.github.com> Co-authored-by: Vyacheslav Smirnov <vyacheslav.s.smirnov@intel.com>

kozlov-alexey requested review from PokhodenkoSA, AlexanderKalistratov and densmirn April 28, 2020 01:24

kozlov-alexey added the Ready for Review label Apr 28, 2020

Adding extension type for pd.RangeIndex

716bbb4

This commit adds Numba extension types for pandas.RangeIndex class, allowing creation of pd.RangeIndex objects and passing and returning them to/from nopython functions.

kozlov-alexey force-pushed the feature/add_range_index_ext branch from b076836 to 716bbb4 Compare April 28, 2020 01:29

densmirn reviewed Apr 28, 2020

View reviewed changes

AlexanderKalistratov reviewed Apr 28, 2020

View reviewed changes

Applying review comments

1a63480

densmirn approved these changes May 7, 2020

View reviewed changes

AlexanderKalistratov approved these changes May 12, 2020

View reviewed changes

Merge branch 'master' into feature/add_range_index_ext

5b8192b

kozlov-alexey added Ready to merge and removed Ready for Review labels May 13, 2020

kozlov-alexey merged commit c381f30 into IntelPython:master May 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Adding extension type for pd.RangeIndex #820

Adding extension type for pd.RangeIndex #820

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!


		res = c.pyapi.call_method(pd_class_obj, "RangeIndex", (start, stop, step, dtype, copy, name))

		c.pyapi.decref(pd_class_obj)

Adding extension type for pd.RangeIndex #820

Adding extension type for pd.RangeIndex #820

Uh oh!

Conversation

Uh oh!

Uh oh!

Comment last updated at 2020-05-13 17:02:27 UTC

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!