Merge master by akharche · Pull Request #840 · IntelPython/sdc · GitHub

This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Merge master #840

Merged
Changes from 13 commits
149 changes: 3 additions & 146 deletions README.rst
Original file line number Diff line number Diff line change
@@ -23,9 +23,6 @@ the code by leveraging modern hardware instructions and by utilizing all availab

Intel® SDC documentation can be found `here <https://intelpython.github.io/sdc-doc/>`_.

Intel® SDC uses special Numba build based on ``0.48.0`` tag for build and run.
Required Numba version can be installed from ``intel/label/beta`` channel from the Anaconda Cloud.

.. note::
For maximum performance and stability, please use numba from ``intel/label/beta`` channel.

@@ -61,9 +58,6 @@ If you do not have conda, we recommend using Miniconda3::
./miniconda.sh -b
export PATH=$HOME/miniconda3/bin:$PATH

Intel® SDC uses special Numba build based on ``0.48.0`` tag for build and run.
Required Numba version can be installed from ``intel/label/beta`` channel from the Anaconda Cloud.

.. note::
For maximum performance and stability, please use numba from ``intel/label/beta`` channel.

@@ -88,7 +82,7 @@ Building on Linux with setuptools

PYVER=<3.6 or 3.7>
NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -q -y -c intel/label/beta -c defaults -c intel -c conda-forge python=$PYVER numpy=$NUMPYVER numba=0.48.0 pandas=0.25.3 pyarrow=0.15.1 gcc_linux-64 gxx_linux-64
conda create -n sdc-env -q -y -c intel/label/beta -c defaults -c intel -c conda-forge python=$PYVER numpy=$NUMPYVER numba=0.49 pandas=0.25.3 pyarrow=0.17.0 gcc_linux-64 gxx_linux-64
source activate sdc-env
git clone https://github.com/IntelPython/sdc.git
cd sdc
@@ -126,7 +120,7 @@ Building on Windows with setuptools

set PYVER=<3.6 or 3.7>
set NUMPYVER=<1.16 or 1.17>
conda create -n sdc-env -c intel/label/beta -c defaults -c intel -c conda-forge python=%PYVER% numpy=%NUMPYVER% numba=0.48.0 pandas=0.25.3 pyarrow=0.15.1
conda create -n sdc-env -c intel/label/beta -c defaults -c intel -c conda-forge python=%PYVER% numpy=%NUMPYVER% numba=0.49 pandas=0.25.3 pyarrow=0.17.0
conda activate sdc-env
set INCLUDE=%INCLUDE%;%CONDA_PREFIX%\Library\include
set LIB=%LIB%;%CONDA_PREFIX%\Library\lib
@@ -176,146 +170,9 @@ To build HTML documentation you will need to run:
The built documentation will be located in the ``./sdc/docs/build/html`` directory.
To preview the documentation open ``index.html`` file.

`Sphinx*`_ Generation Internals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The documentation generation is controlled by ``conf.py`` script automatically invoked by `Sphinx*`_.
See `Sphinx documentation <http://www.sphinx-doc.org/en/master/usage/configuration.html>`_ for details.

The API Reference for Intel® SDC User's Guide is auto-generated by inspecting ``pandas`` and ``sdc`` modules.
That's why these modules must be pre-installed for documentation generation using `Sphinx*`_.
However, there is a possibility to skip API Reference auto-generation by setting environment variable ``SDC_DOC_NO_API_REF_STR=1``.

If the environment variable ``SDC_DOC_NO_API_REF_STR`` is unset then Sphinx's ``conf.py``
invokes ``generate_api_reference()`` function located in ``./sdc/docs/source/buildscripts/apiref_generator`` module.
This function parses ``pandas`` and ``sdc`` docstrings for each API, combines those into single docstring and
writes it into RST file with respective `Pandas*`_ API name. The auto-generated RST files are
located at ``./sdc/docs/source/_api_ref`` directory.

.. note::
`Sphinx*`_ will automatically clean the ``_api_ref`` directory on the next invocation of the documentation build.

Intel® SDC docstring decoration rules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since Intel® SDC API Reference is auto-generated from respective `Pandas*`_ and Intel® SDC docstrings there are certain rules that must be
followed to accurately generate the API description.

1. Every Intel® SDC API must have the docstring.
If developer does not provide the docstring then `Sphinx*`_ will not be able to match `Pandas*`_ docstring with respective SDC one.
In this situation `Sphinx*`_ assumes that SDC does not support such API and will include respective note in the API Reference that
**This API is currently unsupported**.

2. Follow 'one function - one docstring' rule.
You cannot have one docstring for multiple APIs, even if those are very similar. Auto-generator assumes every SDC API is covered by
respective docstring. If `Sphinx*`_ does not find the docstring for particular API then it assumes that SDC does not support such API
and will include respective note in the API Reference that **This API is currently unsupported**.

3. Description (introductory section, the very first few paragraphs without a title) is taken from `Pandas*`_.
Intel® SDC developers should not include API description in SDC docstring.
But developers are encouraged to follow Pandas API description naming conventions
so that the combined docstring appears consistent.

4. Parameters, Returns, and Raises sections' description is taken from `Pandas*`_ docstring.
Intel® SDC developers should not include such descriptions in their SDC docstrings.
Rather developers are encouraged to follow Pandas naming conventions
so that the combined docstring appears consistent.

5. Every SDC docstring must be of the following structure:
::

"""
Intel Scalable Dataframe Compiler User Guide
********************************************
Pandas API: <full pandas name, e.g. pandas.Series.nlargest>

<Intel® SDC specific sections>

Intel Scalable Dataframe Compiler Developer Guide
*************************************************
<Developer's Guide specific sections>
"""

The first two lines must be the User Guide header. This is an indication to `Sphinx*`_ that this section is intended for public API
and it will be combined with respective Pandas API docstring.

Line 3 must specify what Pandas API this Intel® SDC docstring does correspond to. It must start with ``Pandas API:`` followed by
full Pandas API name that corresponds to this SDC docstring. Remember to include full name, for example, ``nlargest`` is not
sufficient for auto-generator to perform the match. The full name must be ``pandas.Series.nlargest``.

After User Guide sections in the docstring there can be another header indicating that the remaining part of the docstring belongs to
Developer's Guide and must not be included into User's Guide.

6. Examples, See Also, References sections are **NOT** taken from Pandas docstring. SDC developers are expected to complete these sections in SDC docstrings.
This is so because respective Pandas sections are sometimes too Pandas specific and are not relevant to SDC. SDC developers have to
rewrite those sections in Intel® SDC style. Do not forget about User Guide header and Pandas API name prior to adding SDC specific
sections.

7. Examples section is mandatory for every SDC API. 'One API - at least one example' rule is applied.
Examples are essential part of user experience and must accompany every API docstring.

8. Embed examples into Examples section from ``./sdc/examples``.
Rather than writing example in the docstring (which is error-prone) embed relevant example scripts into the docstring. For example,
here is an example how to embed example for ``pandas.Series.get()`` function into respective Intel® SDC docstring:

::

"""
...
Examples
--------
.. literalinclude:: ../../../examples/series_getitem.py
:language: python
:lines: 27-
:caption: Getting Pandas Series elements
:name: ex_series_getitem

.. code-block:: console

> python ./series_getitem.py
55

In the above snapshot the script ``series_getitem.py`` is embedded into the docstring. ``:lines: 27-`` allows to skip lengthy
copyright header of the file. ``:caption:`` provides meaningful description of the example. It is a good tone to have the caption
for every example. ``:name:`` is the `Sphinx*`_ name that allows referencing example from other parts of the documentation. It is a good
tone to include this field. Please follow the naming convention ``ex_<example file name>`` for consistency.

Accompany every example with the expected output using ``.. code-block:: console`` decorator.


**Every Examples section must come with one or more examples illustrating all major variations of supported API parameter combinations. It is highly recommended to illustrate SDC API limitations (e.g. unsupported parameters) in example script comments.**

9. See Also sections are highly encouraged.
This is a good practice to include relevant references into the See Also section. Embedding references which are not directly
related to the topic may be distracting if those appear across API description. A good style is to have a dedicated section for
relevant topics.

See Also section may include references to relevant SDC and Pandas as well as to external topics.

A special form of See Also section is References to publications. Pandas documentation sometimes uses References section to refer to
external projects. While it is not prohibited to use References section in SDC docstrings, it is better to combine all references
under See Also umbrella.

10. Notes and Warnings must be decorated with ``.. note::`` and ``.. warning::`` respectively.
Do not use
::
Notes
-----

Warning
-------

Pay attention to indentation and required blank lines. `Sphinx*`_ is very sensitive to that.

11. If SDC API does not support all variations of respective Pandas API then Limitations section is mandatory.
While there is not specific guideline how Limitations section must be written, a good style is to follow Pandas Parameters section
description style and naming conventions.

12. Before committing your code for public SDC API you are expected to:
More information about building and adding documentation can be found `here <docs/README.rst>`_.

- have SDC docstring implemented;
- have respective SDC examples implemented and tested
- API Reference documentation generated and visually inspected. New warnings in the documentation build are not allowed.

Running unit tests
------------------
30 changes: 23 additions & 7 deletions buildscripts/autogen_sources.py
@@ -77,6 +77,7 @@

arithmetic_binops_symbols = {
'add': '+',
'div': '/',
'sub': '-',
'mul': '*',
'truediv': '/',
@@ -117,9 +118,11 @@
import_section_text = ''.join(module_text_lines[imports_start_line: import_end_line + 1])

# read function templates for arithmetic and comparison operators from templates module
template_series_arithmetic_binop = inspect.getsource(templates_module.sdc_pandas_series_operator_binop)
template_series_comparison_binop = inspect.getsource(templates_module.sdc_pandas_series_operator_comp_binop)
template_str_arr_comparison_binop = inspect.getsource(templates_module.sdc_str_arr_operator_comp_binop)
template_series_binop = inspect.getsource(templates_module.sdc_pandas_series_binop)
template_series_comp_binop = inspect.getsource(templates_module.sdc_pandas_series_comp_binop)
template_series_operator = inspect.getsource(templates_module.sdc_pandas_series_operator_binop)
template_series_comp_operator = inspect.getsource(templates_module.sdc_pandas_series_operator_comp_binop)
template_str_arr_comp_binop = inspect.getsource(templates_module.sdc_str_arr_operator_comp_binop)

exit_status = -1
try:
@@ -133,19 +136,32 @@
# certain modifications are needed to be applied for templates, so
# verify correctness of produced code manually
for name in arithmetic_binops_symbols:
func_text = template_series_arithmetic_binop.replace('binop', name)
func_text = template_series_binop.replace('binop', name)
func_text = func_text.replace(' + ', f' {arithmetic_binops_symbols[name]} ')
func_text = func_text.replace('def ', f'@sdc_overload(operator.{name})\ndef ', 1)
func_text = func_text.replace('def ', f"@sdc_overload_method(SeriesType, '{name}')\ndef ", 1)
file.write(f'\n\n{func_text}')

for name in comparison_binops_symbols:
func_text = template_series_comparison_binop.replace('comp_binop', name)
func_text = template_series_comp_binop.replace('comp_binop', name)
func_text = func_text.replace(' < ', f' {comparison_binops_symbols[name]} ')
func_text = func_text.replace('def ', f"@sdc_overload_method(SeriesType, '{name}')\ndef ", 1)
file.write(f'\n\n{func_text}')

for name in arithmetic_binops_symbols:
if name != "div":
func_text = template_series_operator.replace('binop', name)
func_text = func_text.replace(' + ', f' {arithmetic_binops_symbols[name]} ')
func_text = func_text.replace('def ', f'@sdc_overload(operator.{name})\ndef ', 1)
file.write(f'\n\n{func_text}')

for name in comparison_binops_symbols:
func_text = template_series_comp_operator.replace('comp_binop', name)
func_text = func_text.replace(' < ', f' {comparison_binops_symbols[name]} ')
func_text = func_text.replace('def ', f'@sdc_overload(operator.{name})\ndef ', 1)
file.write(f'\n\n{func_text}')

for name in comparison_binops_symbols:
func_text = template_str_arr_comparison_binop.replace('comp_binop', name)
func_text = template_str_arr_comp_binop.replace('comp_binop', name)
func_text = func_text.replace(' < ', f' {comparison_binops_symbols[name]} ')
if name == 'ne':
func_text = func_text.replace('and not', 'or')
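The generation loops in this hunk all follow one pattern: take the source text of a template function, substitute the operator name and symbol, prepend a decorator to the first ``def``, and write the result out. A minimal standalone sketch of that pattern follows. The template text, the ``register`` decorator and the ``REGISTRY`` dict are hypothetical stand-ins for illustration only (the real script reads templates with ``inspect.getsource`` and uses ``sdc_overload`` / ``sdc_overload_method``), and the generated text is executed here rather than written to a file.

```python
# Hypothetical registry standing in for the overload decorators used by the real script.
REGISTRY = {}


def register(name):
    """Return a decorator that records the generated function under `name`."""
    def wrap(func):
        REGISTRY[name] = func
        return func
    return wrap


# Hypothetical template; the real script obtains template text via inspect.getsource().
TEMPLATE = '''def series_binop(left, right):
    return [a + b for a, b in zip(left, right)]
'''

SYMBOLS = {'add': '+', 'sub': '-', 'mul': '*', 'truediv': '/'}


def generate(name, symbol):
    """Specialize the template for one operator using plain string replacement."""
    text = TEMPLATE.replace('binop', name)      # rename the function
    text = text.replace(' + ', f' {symbol} ')   # swap the placeholder operator
    # Prepend the decorator to the first 'def' only (count=1).
    return text.replace('def ', f"@register('{name}')\ndef ", 1)


def build_all():
    """Generate and execute every specialization, filling REGISTRY."""
    for name, symbol in SYMBOLS.items():
        exec(generate(name, symbol), {'register': register})
    return REGISTRY
```

Because the substitution is purely textual, the placeholder tokens (``binop`` and `` + `` here) must not occur anywhere else in the template, which is likely why the script's own comment asks for the produced code to be verified manually.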
3 changes: 0 additions & 3 deletions buildscripts/build.py
@@ -35,9 +35,6 @@

def build(sdc_utils):
os.chdir(str(sdc_utils.src_path))
# For Windows build do not use intel channel due to build issue
if platform.system() == 'Windows':
sdc_utils.channels = '-c intel/label/beta -c defaults -c conda-forge'

sdc_utils.log_info('Start Intel SDC build', separate=True)
conda_build_cmd = ' '.join([
3 changes: 2 additions & 1 deletion buildscripts/run_examples.py
@@ -90,6 +90,7 @@ def run_examples(sdc_utils):
sdc_utils.log_info('Run Intel(R) SDC examples', separate=True)
sdc_utils.log_info(sdc_utils.line_double)
sdc_utils.create_environment()
sdc_utils.install_conda_package(['sdc'])
sdc_package = f'sdc={sdc_utils.get_sdc_version_from_channel()}'
sdc_utils.install_conda_package([sdc_package])

run_examples(sdc_utils)
4 changes: 2 additions & 2 deletions buildscripts/sdc-conda-recipe/meta.yaml
@@ -1,6 +1,6 @@
{% set NUMBA_VERSION = "==0.48" %}
{% set NUMBA_VERSION = "==0.49.1" %}
{% set PANDAS_VERSION = "==0.25.3" %}
{% set PYARROW_VERSION = "==0.15.1" %}
{% set PYARROW_VERSION = "==0.17.0" %}

package:
name: sdc
29 changes: 27 additions & 2 deletions buildscripts/sdc-conda-recipe/run_test.bat
@@ -9,10 +9,35 @@ if errorlevel 1 exit 1

@rem TODO investigate root cause of NumbaPerformanceWarning
@rem http://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics
python -W ignore -u -m sdc.runtests -v
python -W ignore -u -m sdc.runtests -v sdc.tests.test_basic
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_series
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_dataframe
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_hiframes
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_date
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_strings
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_groupby
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_join
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_rolling
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_ml
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_io
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_hpat_jit
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_sdc_numpy
if errorlevel 1 exit 1
python -W ignore -u -m sdc.runtests -v sdc.tests.test_prange_utils
if errorlevel 1 exit 1

REM Link check for Documentation using Sphinx's in-built linkchecker
REM sphinx-build -b linkcheck -j1 usersource _build/html
REM if errorlevel 1 exit 1

15 changes: 14 additions & 1 deletion buildscripts/sdc-conda-recipe/run_test.sh
@@ -13,4 +13,17 @@ python -m sdc.tests.gen_test_data

# TODO investigate root cause of NumbaPerformanceWarning
# http://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics
python -W ignore -u -m sdc.runtests -v
python -W ignore -u -m sdc.runtests -v sdc.tests.test_basic
python -W ignore -u -m sdc.runtests -v sdc.tests.test_series
python -W ignore -u -m sdc.runtests -v sdc.tests.test_dataframe
python -W ignore -u -m sdc.runtests -v sdc.tests.test_hiframes
python -W ignore -u -m sdc.runtests -v sdc.tests.test_date
python -W ignore -u -m sdc.runtests -v sdc.tests.test_strings
python -W ignore -u -m sdc.runtests -v sdc.tests.test_groupby
python -W ignore -u -m sdc.runtests -v sdc.tests.test_join
python -W ignore -u -m sdc.runtests -v sdc.tests.test_rolling
python -W ignore -u -m sdc.runtests -v sdc.tests.test_ml
python -W ignore -u -m sdc.runtests -v sdc.tests.test_io
python -W ignore -u -m sdc.runtests -v sdc.tests.test_hpat_jit
python -W ignore -u -m sdc.runtests -v sdc.tests.test_sdc_numpy
python -W ignore -u -m sdc.runtests -v sdc.tests.test_prange_utils
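Both test scripts now invoke ``sdc.runtests`` once per test module instead of once for the whole suite, and the Windows script stops at the first failing module. The same stop-on-first-failure loop can be sketched in Python as below. This is only an illustration: the ``runner`` parameter is a hypothetical hook added so the loop can be exercised without SDC installed, and the module list is a subset of the ones named in the scripts.

```python
import subprocess
import sys

# Subset of the modules listed in run_test.sh / run_test.bat.
TEST_MODULES = [
    'sdc.tests.test_basic',
    'sdc.tests.test_series',
    'sdc.tests.test_dataframe',
]


def build_command(module):
    """Build the argv used by the scripts for a single test module."""
    return [sys.executable, '-W', 'ignore', '-u', '-m', 'sdc.runtests', '-v', module]


def run_modules(modules, runner=subprocess.call):
    """Run each module in order; return the first non-zero exit code, else 0."""
    for module in modules:
        code = runner(build_command(module))
        if code != 0:
            return code
    return 0
```

Calling ``run_modules(TEST_MODULES)`` would launch one interpreter per module, so a crash in one module cannot take down the remaining ones before they report.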
23 changes: 20 additions & 3 deletions buildscripts/utilities.py
@@ -26,6 +26,7 @@



import json
import os
import platform
import re
@@ -51,7 +52,7 @@ def __init__(self, python, sdc_local_channel=None):
self.line_single = '-'*80

# Set channels
self.channel_list = ['-c', 'intel/label/beta', '-c', 'intel', '-c', 'defaults', '-c', 'conda-forge']
self.channel_list = ['-c', 'intel/label/beta', '-c', 'defaults', '-c', 'conda-forge']
if sdc_local_channel:
sdc_local_channel = Path(sdc_local_channel).resolve().as_uri()
self.channel_list = ['-c', sdc_local_channel] + self.channel_list
@@ -87,7 +88,7 @@ def create_environment(self, packages_list=[]):
# Create Intel SDC environment
create_args = ['-q', '-y', '-n', self.env_name, f'python={self.python}']
create_args += packages_list + self.channel_list + ['--override-channels']
self.__run_conda_command(Conda_Commands.CREATE, create_args)
self.log_info(self.__run_conda_command(Conda_Commands.CREATE, create_args))

return

@@ -97,7 +98,7 @@ def install_conda_package(self, packages_list):
self.log_info(f'Install {" ".join(packages_list)} to {self.env_name} conda environment')
install_args = ['-n', self.env_name]
install_args += self.channel_list + ['--override-channels', '-q', '-y'] + packages_list
self.__run_conda_command(Conda_Commands.INSTALL, install_args)
self.log_info(self.__run_conda_command(Conda_Commands.INSTALL, install_args))

return

@@ -135,3 +136,19 @@ def log_info(self, msg, separate=False):
if separate:
print(f'{time.strftime("%d/%m/%Y %H:%M:%S")}: {self.line_double}', flush=True)
print(f'{time.strftime("%d/%m/%Y %H:%M:%S")}: {msg}', flush=True)

def get_sdc_version_from_channel(self):
python_version = 'py' + self.python.replace('.', '')

# Get Intel SDC version from first channel in channel_list
search_args = ['sdc', '-c', self.channel_list[1], '--override-channels', '--json']
search_result = self.__run_conda_command(Conda_Commands.SEARCH, search_args)

repo_data = json.loads(search_result)
for package_data in repo_data['sdc']:
sdc_version = package_data['version']
sdc_build = package_data['build']
if python_version in sdc_build:
break

return f'{sdc_version}={sdc_build}'
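The new ``get_sdc_version_from_channel`` method parses the JSON printed by ``conda search`` and picks the build whose string contains the ``pyXY`` tag for the requested Python version. The selection step can be sketched on its own as below. The function name ``pick_sdc_build`` and the sample payload are hypothetical, and the sketch assumes the JSON maps the package name to a list of records with ``version`` and ``build`` keys, as the method above does. Unlike the method in the diff, which falls through to the last listed record when nothing matches, this sketch returns ``None`` in that case.

```python
import json


def pick_sdc_build(search_json, python_version):
    """Return 'version=build' for the first sdc record built for python_version."""
    tag = 'py' + python_version.replace('.', '')   # e.g. '3.7' -> 'py37'
    repo_data = json.loads(search_json)
    for package in repo_data.get('sdc', []):
        if tag in package['build']:
            return f"{package['version']}={package['build']}"
    return None  # no matching build; the caller decides how to handle this


# Hypothetical payload shaped like the data the method iterates over.
SAMPLE = json.dumps({'sdc': [
    {'version': '0.1.0', 'build': 'py36_0'},
    {'version': '0.1.0', 'build': 'py37_0'},
]})
```

Returning ``None`` makes a missing build explicit, so a caller could fail early instead of asking conda to install a package built for a different Python version.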