Logistic regression example not working · Issue #1 · IntelPython/sdc · GitHub
This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Logistic regression example not working #1

Closed
alanmarazzi opened this issue Oct 5, 2017 · 7 comments · Fixed by #918

Comments

@alanmarazzi

Hi, I have an issue running the logistic regression example. After installing h5py as recommended in the documentation, running it gives this import error:

Traceback (most recent call last):
  File "generate_data/gen_logistic_regression.py", line 7, in <module>
    @hpat.jit
  File "/home/alanm/miniconda3/envs/HPAT/lib/python3.6/site-packages/hpat-0.1.0-py3.6-linux-x86_64.egg/hpat/__init__.py", line 12, in jit
    from .compiler import add_hpat_stages
  File "/home/alanm/miniconda3/envs/HPAT/lib/python3.6/site-packages/hpat-0.1.0-py3.6-linux-x86_64.egg/hpat/compiler.py", line 5, in <module>
    from .hiframes import HiFrames
  File "/home/alanm/miniconda3/envs/HPAT/lib/python3.6/site-packages/hpat-0.1.0-py3.6-linux-x86_64.egg/hpat/hiframes.py", line 23, in <module>
    from hpat import pio
  File "/home/alanm/miniconda3/envs/HPAT/lib/python3.6/site-packages/hpat-0.1.0-py3.6-linux-x86_64.egg/hpat/pio.py", line 13, in <module>
    from hpat import pio_api, pio_lower, utils
  File "/home/alanm/miniconda3/envs/HPAT/lib/python3.6/site-packages/hpat-0.1.0-py3.6-linux-x86_64.egg/hpat/pio_lower.py", line 10, in <module>
    import hio
ModuleNotFoundError: No module named 'hio'
@ehsantn
Contributor
ehsantn commented Oct 5, 2017

HPAT needs to be rebuilt after installing h5py/HDF5:

LDSHARED="mpicxx -shared" CXX=mpicxx LD=mpicxx \
    CC="mpicxx -std=c++11" python setup.py install

@alanmarazzi
Author

Ok, thanks; at least now I get a build error 😄

mpicxx -shared build/temp.linux-x86_64-3.6/hpat/_io.o -L/home/alanm/miniconda3/envs/HPAT/lib -lpython3.6m -o build/lib.linux-x86_64-3.6/hio.cpython-36m-x86_64-linux-gnu.so -lmpi -lhdf5

/usr/bin/ld: cannot find -lmpi
collect2: error: ld returned 1 exit status
error: command 'mpicxx' failed with exit status 1

Anyway, I installed PyArrow as well; should I get rid of it or keep it, since I don't see it mentioned in the documentation?

@ehsantn
Contributor
ehsantn commented Oct 5, 2017

You can remove -lmpi from setup.py and try again. PyArrow should be ok.
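
For reference, a purely hypothetical sketch of what the relevant Extension entry might look like with 'mpi' dropped from the libraries list (the real names and source paths in HPAT's setup.py may differ; the mpicxx wrapper already links MPI itself):

# Hypothetical sketch only -- not HPAT's actual setup.py.
from setuptools import setup, Extension

ext_io = Extension(
    name="hio",                 # the extension built from hpat/_io.o in the log above
    sources=["hpat/_io.cpp"],   # illustrative source path
    libraries=["hdf5"],         # 'mpi' removed; the MPI wrapper compiler links MPI
)

setup(name="hpat", ext_modules=[ext_io])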

@alanmarazzi
Author

I tried on another machine and was able to rebuild HPAT correctly by removing -lmpi (on the other one I still couldn't; I will retry from scratch), but when I launch mpirun -n 4 python examples/logistic_regression.py I get

OSError: Failed at nopython (convert DataFrames)
Unable to open file (file signature not found)

It seems to be an HDF5 issue, because when I check the file with h5debug lr.hdf5 I get

HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 140576896845568:
  #000: H5F.c line 604 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1087 in H5F_open(): unable to read superblock
    major: File accessibilty
    minor: Read failed
  #002: H5Fsuper.c line 277 in H5F_super_read(): file signature not found
    major: File accessibilty
    minor: Not an HDF5 file
cannot open file

By the way, while the linear regression example generates its data by running python generate_data/gen_linear_regression.py, for logistic regression I had to generate the data from an interactive Python session, and I get the same HDF5 error with the linear regression example as well.

@ehsantn
Contributor
ehsantn commented Oct 10, 2017

Seems like an issue with your HDF5 setup. Maybe the HDF5 library used by h5py doesn't match the HDF5 library in the system path. There might also be file corruption or file system access issues. Can you read and write HDF5 files using C code?
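
An equivalent check from Python with h5py (instead of C) shows which HDF5 version h5py was built against and whether a plain write/read round trip works; a minimal sketch using a throwaway file name:

# HDF5 sanity check via h5py; the file name is arbitrary.
import numpy as np
import h5py

print(h5py.version.hdf5_version)  # HDF5 library h5py was built against

with h5py.File("check.hdf5", "w") as f:
    f.create_dataset("x", data=np.arange(10.0))

with h5py.File("check.hdf5", "r") as f:
    assert np.allclose(f["x"][:], np.arange(10.0))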

@alanmarazzi
Author

That's likely one issue. I'm currently trying to install HPAT without HDF5 support, and to avoid problems with previous installs I'm doing it in a Docker container.

I'll let you know how it goes.

@ehsantn
Contributor
ehsantn commented Jan 11, 2018

The new installation provides the HDF5 package, which avoids these issues. Closing; feel free to reopen if you still have issues.

@ehsantn ehsantn closed this as completed Jan 11, 2018
Hardcode84 pushed a commit to Hardcode84/hpat that referenced this issue Oct 10, 2019
AlexanderKalistratov pushed a commit that referenced this issue Sep 9, 2020
* Changing csv_reader_py impl to return df from objmode

Motivation: returning a Tuple of columns read from a csv file with the
pyarrow csv reader from objmode and then calling the init_dataframe
ctor to create a native DF turned out to be inefficient in terms of
LLVM IR size and compilation time. With this PR we instead rely on DF
unboxing and return a py DF from objmode.

* Capture dtype dict instead of building in objmode

* Applying comments #1
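
For context, a minimal sketch of numba's objmode mechanism referred to here (it returns a NumPy array rather than a DataFrame, since returning a DataFrame relies on SDC's own unboxing support; this is not the actual csv_reader_py code):

# Minimal numba objmode sketch, not SDC's csv_reader_py implementation.
import numpy as np
from numba import njit, objmode

@njit
def read_column():
    # Code inside 'with objmode' runs in the regular interpreter; the type of
    # every value passed back to nopython code must be declared up front.
    with objmode(col="float64[:]"):
        # In SDC this is where the csv read / DataFrame unboxing would happen.
        col = np.array([1.0, 2.0, 3.0])
    return col.sum()

print(read_column())  # 6.0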
kozlov-alexey added a commit that referenced this issue Jan 29, 2021
* Adds Int64Index type and updates Series and DF methods to use it

Motivation: as part of the work on supporting common pandas indexes
a new type (Int64IndexType) representing pandas.Int64Index is added.
Boxing/unboxing of Series and DataFrames as well as common numpy-like
functions are changed accordingly to handle it.

* Fixing DateTime tests and PEP remarks

* Fixing review comments #1
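
As a reminder of the pandas-level object involved (pandas 1.x behavior; Int64Index was removed in pandas 2.0), a Series built with an explicit integer index carries an Int64Index, the index type SDC now boxes and unboxes:

# pandas 1.x behavior; Int64Index no longer exists in pandas 2.x.
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=[10, 20, 30])
print(s.index)  # Int64Index([10, 20, 30], dtype='int64')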
kozlov-alexey added a commit that referenced this issue Feb 19, 2021
* Adds Int64Index type and updates Series and DF methods to use it (#950)

* Adds Int64Index type and updates Series and DF methods to use it

Motivation: as part of the work on supporting common pandas indexes
a new type (Int64IndexType) representing pandas.Int64Index is added.
Boxing/unboxing of Series and DataFrames as well as common numpy-like
functions are changed accordingly to handle it.

* Fixing DateTime tests and PEP remarks

* Fixing review comments #1

* Move to Numba 0.52 (#939)

* Taking numba from master

* Moving to Numba 0.52

commit 3182540b127268ace11cf4042cd87f044875d9fa
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date:   Wed Oct 21 19:49:58 2020 +0300

    Cleaning up before squash

commit 895668116542fe3057f73fcb276c441cbde66747
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date:   Tue Oct 13 17:31:34 2020 +0300

    Workaround for set from str_arr problem

* Fixing correct NUMBA_VERSION

* Remove intel/label/beta channel from Azure CI builds

* Move to pandas=1.2.0 (#959)

* Move to pandas=1.2.0

Motivation: use latest versions of dependencies.

* More failed tests are fixed

* Fixing doc build

* Fixing bug in stability of mergesort impl for StringArray (#961)

Motivation: for the StringArray type, the legacy stable-sort implementation
computed the result for ascending=False by reversing the result of
argsorting with ascending=True, which produces the wrong order within
groups of elements having the same value. The implemented solution adds a
new function argument 'ascending' and uses it when calling the native
function impl via serial stable_sort.
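
A small NumPy illustration of the problem described above (not SDC's implementation): reversing a stable ascending argsort flips the relative order of equal elements, whereas a genuinely stable descending sort must preserve it:

import numpy as np

values = np.array(['b', 'a', 'b', 'a'])

asc = np.argsort(values, kind='mergesort')  # stable ascending: [1, 3, 0, 2]
rev = asc[::-1]                             # "descending" by reversal: [2, 0, 3, 1]
# Equal elements now appear in reversed original order (2 before 0, 3 before 1);
# a stable descending argsort should give [0, 2, 1, 3] for this input.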
kozlov-alexey added a commit that referenced this issue May 14, 2021
* Initial version of ConcurrentDict container via TBB hashmap

Motivation: SDC relies on the typed.Dict implementation in many core
pandas algorithms, and it doesn't support concurrent reads/writes.
To fill this gap we add a ConcurrentDict type, which will be used when
the threading layer is TBB.

* Fixing PEP and updating failing import

* Fixing builds, warnings and complying to C++11 syntax

* Fixing PEP and review comments #1

* Fixing remarks #2

* Applying remarks #3
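
For reference, the container the commit message refers to is numba's typed.Dict, which works inside nopython code but is not safe for concurrent writes, as noted above; a minimal usage sketch (this is not SDC's ConcurrentDict):

# Minimal numba typed.Dict sketch; single-threaded use only.
import numpy as np
from numba import njit, types
from numba.typed import Dict

@njit
def count(values):
    d = Dict.empty(key_type=types.int64, value_type=types.int64)
    for v in values:
        if v in d:
            d[v] += 1
        else:
            d[v] = 1
    return d

print(dict(count(np.array([1, 2, 2, 3], dtype=np.int64))))  # {1: 1, 2: 2, 3: 1}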