Logistic regression example not working · Issue #1 · IntelPython/sdc · GitHub
This repository was archived by the owner on Feb 2, 2024. It is now read-only.

Logistic regression example not working #1

Closed
alanmarazzi opened this issue Oct 5, 2017 · 7 comments · Fixed by #918

Comments

@alanmarazzi

Hi, I have an issue running the logistic regression example. After installing h5py as recommended in the documentation, running it gives this import error:

Traceback (most recent call last):
  File "generate_data/gen_logistic_regression.py", line 7, in <module>
    @hpat.jit
  File "/home/alanm/miniconda3/envs/HPAT/lib/python3.6/site-packages/hpat-0.1.0-py3.6-linux-x86_64.egg/hpat/__init__.py", line 12, in jit
    from .compiler import add_hpat_stages
  File "/home/alanm/miniconda3/envs/HPAT/lib/python3.6/site-packages/hpat-0.1.0-py3.6-linux-x86_64.egg/hpat/compiler.py", line 5, in <module>
    from .hiframes import HiFrames
  File "/home/alanm/miniconda3/envs/HPAT/lib/python3.6/site-packages/hpat-0.1.0-py3.6-linux-x86_64.egg/hpat/hiframes.py", line 23, in <module>
    from hpat import pio
  File "/home/alanm/miniconda3/envs/HPAT/lib/python3.6/site-packages/hpat-0.1.0-py3.6-linux-x86_64.egg/hpat/pio.py", line 13, in <module>
    from hpat import pio_api, pio_lower, utils
  File "/home/alanm/miniconda3/envs/HPAT/lib/python3.6/site-packages/hpat-0.1.0-py3.6-linux-x86_64.egg/hpat/pio_lower.py", line 10, in <module>
    import hio
ModuleNotFoundError: No module named 'hio'
@ehsantn
Contributor
ehsantn commented Oct 5, 2017

HPAT needs to be rebuilt after installing h5py/HDF5:

LDSHARED="mpicxx -shared" CXX=mpicxx LD=mpicxx \
    CC="mpicxx -std=c++11" python setup.py install

@alanmarazzi
Author

Ok, thanks; at least now I get a build error 😄

mpicxx -shared build/temp.linux-x86_64-3.6/hpat/_io.o -L/home/alanm/miniconda3/envs/HPAT/lib -lpython3.6m -o build/lib.linux-x86_64-3.6/hio.cpython-36m-x86_64-linux-gnu.so -lmpi -lhdf5

/usr/bin/ld: cannot find -lmpi
collect2: error: ld returned 1 exit status
error: command 'mpicxx' failed with exit status 1

Anyway, I installed PyArrow as well; should I get rid of it or keep it, since I don't see it mentioned in the documentation?

@ehsantn
Contributor
ehsantn commented Oct 5, 2017

You can remove -lmpi from setup.py and try again. PyArrow should be ok.
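
For reference, a purely hypothetical sketch of what the relevant Extension entry might look like with 'mpi' dropped from the libraries list (the real names and source paths in HPAT's setup.py may differ; the mpicxx wrapper already links MPI itself):

# Hypothetical sketch only -- not HPAT's actual setup.py.
from setuptools import setup, Extension

ext_io = Extension(
    name="hio",                 # the extension built from hpat/_io.o in the log above
    sources=["hpat/_io.cpp"],   # illustrative source path
    libraries=["hdf5"],         # 'mpi' removed; the MPI wrapper compiler links MPI
)

setup(name="hpat", ext_modules=[ext_io])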

@alanmarazzi
Author

I tried on another machine and was able to rebuild HPAT correctly by removing -lmpi (on the other one I still couldn't; I will retry from scratch), but when I launch mpirun -n 4 python examples/logistic_regression.py I get

OSError: Failed at nopython (convert DataFrames)
Unable to open file (file signature not found)

It seems to be an HDF5 issue, because when I check the file with h5debug lr.hdf5 I get

HDF5-DIAG: Error detected in HDF5 (1.8.17) thread 140576896845568:
  #000: H5F.c line 604 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1087 in H5F_open(): unable to read superblock
    major: File accessibilty
    minor: Read failed
  #002: H5Fsuper.c line 277 in H5F_super_read(): file signature not found
    major: File accessibilty
    minor: Not an HDF5 file
cannot open file

By the way, while the linear regression example generates its data by running python generate_data/gen_linear_regression.py, for logistic regression I had to generate the data from an interactive Python session, and I get the same HDF5 error with the linear regression example as well.

@ehsantn
Contributor
ehsantn commented Oct 10, 2017

Seems like an issue with your HDF5 setup. Maybe the HDF5 library used by h5py doesn't match the HDF5 library in the system path. There might also be file corruption or file system access issues. Can you read and write HDF5 files using C code?
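
An equivalent check from Python with h5py (instead of C) shows which HDF5 version h5py was built against and whether a plain write/read round trip works; a minimal sketch using a throwaway file name:

# HDF5 sanity check via h5py; the file name is arbitrary.
import numpy as np
import h5py

print(h5py.version.hdf5_version)  # HDF5 library h5py was built against

with h5py.File("check.hdf5", "w") as f:
    f.create_dataset("x", data=np.arange(10.0))

with h5py.File("check.hdf5", "r") as f:
    assert np.allclose(f["x"][:], np.arange(10.0))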

@alanmarazzi
Author

That's likely one issue. I'm currently trying to install HPAT without HDF5 support, and to avoid problems with previous installs I'm doing it in a Docker container.

I'll let you know how it goes.

@ehsantn
Contributor
ehsantn commented Jan 11, 2018

The new installation provides the HDF5 package, which avoids these issues. Closing; feel free to reopen if you still have issues.

@ehsantn ehsantn closed this as completed Jan 11, 2018
Hardcode84 pushed a commit to Hardcode84/hpat that referenced this issue Oct 10, 2019
AlexanderKalistratov pushed a commit that referenced this issue Sep 9, 2020
* Changing csv_reader_py impl to return df from objmode

Motivation: returning a Tuple of columns read from a csv file with the
pyarrow csv reader from objmode and then calling the init_dataframe
ctor to create a native DF turned out to be inefficient in terms of
LLVM IR size and compilation time. With this PR we instead rely on DF
unboxing and return a py DF from objmode.

* Capture dtype dict instead of building in objmode

* Applying comments #1
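
For context, a minimal sketch of numba's objmode mechanism referred to here (it returns a NumPy array rather than a DataFrame, since returning a DataFrame relies on SDC's own unboxing support; this is not the actual csv_reader_py code):

# Minimal numba objmode sketch, not SDC's csv_reader_py implementation.
import numpy as np
from numba import njit, objmode

@njit
def read_column():
    # Code inside 'with objmode' runs in the regular interpreter; the type of
    # every value passed back to nopython code must be declared up front.
    with objmode(col="float64[:]"):
        # In SDC this is where the csv read / DataFrame unboxing would happen.
        col = np.array([1.0, 2.0, 3.0])
    return col.sum()

print(read_column())  # 6.0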
kozlov-alexey added a commit that referenced this issue Jan 29, 2021
* Adds Int64Index type and updates Series and DF methods to use it

Motivation: as part of the work on supporting common pandas indexes
a new type (Int64IndexType) representing pandas.Int64Index is added.
Boxing/unboxing of Series and DataFrames as well as common numpy-like
functions are changed accordingly to handle it.

* Fixing DateTime tests and PEP remarks

* Fixing review comments #1
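
As a reminder of the pandas-level object involved (pandas 1.x behavior; Int64Index was removed in pandas 2.0), a Series built with an explicit integer index carries an Int64Index, the index type SDC now boxes and unboxes:

# pandas 1.x behavior; Int64Index no longer exists in pandas 2.x.
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=[10, 20, 30])
print(s.index)  # Int64Index([10, 20, 30], dtype='int64')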
kozlov-alexey added a commit that referenced this issue Feb 19, 2021
* Adds Int64Index type and updates Series and DF methods to use it (#950)

* Adds Int64Index type and updates Series and DF methods to use it

Motivation: as part of the work on supporting common pandas indexes
a new type (Int64IndexType) representing pandas.Int64Index is added.
Boxing/unboxing of Series and DataFrames as well as common numpy-like
functions are changed accordingly to handle it.

* Fixing DateTime tests and PEP remarks

* Fixing review comments #1

* Move to Numba 0.52 (#939)

* Taking numba from master

* Moving to Numba 0.52

commit 3182540b127268ace11cf4042cd87f044875d9fa
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date:   Wed Oct 21 19:49:58 2020 +0300

    Cleaning up before squash

commit 895668116542fe3057f73fcb276c441cbde66747
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date:   Tue Oct 13 17:31:34 2020 +0300

    Workaround for set from str_arr problem

* Fixing correct NUMBA_VERSION

* Remove intel/label/beta channel from Azure CI builds

* Move to pandas=1.2.0 (#959)

* Move to pandas=1.2.0

Motivation: use latest versions of dependencies.

* More failed tests are fixed

* Fixing doc build

* Fixing bug in stability of mergesort impl for StringArray (#961)

Motivation: for the StringArray type, the legacy stable-sort implementation
computed the result for ascending=False by reversing the result of
argsorting with ascending=True, which produces the wrong order within
groups of elements having the same value. The implemented solution adds a
new function argument 'ascending' and uses it when calling the native
function impl via serial stable_sort.
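
A small NumPy illustration of the problem described above (not SDC's implementation): reversing a stable ascending argsort flips the relative order of equal elements, whereas a genuinely stable descending sort must preserve it:

import numpy as np

values = np.array(['b', 'a', 'b', 'a'])

asc = np.argsort(values, kind='mergesort')  # stable ascending: [1, 3, 0, 2]
rev = asc[::-1]                             # "descending" by reversal: [2, 0, 3, 1]
# Equal elements now appear in reversed original order (2 before 0, 3 before 1);
# a stable descending argsort should give [0, 2, 1, 3] for this input.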
kozlov-alexey added a commit that referenced this issue May 14, 2021
* Initial version of ConcurrentDict container via TBB hashmap

Motivation: SDC relies on the typed.Dict implementation in many core
pandas algorithms, and it doesn't support concurrent reads/writes.
To fill this gap we add a ConcurrentDict type, which will be used when
the threading layer is TBB.

* Fixing PEP and updating failing import

* Fixing builds, warnings and complying to C++11 syntax

* Fixing PEP and review comments #1

* Fixing remarks #2

* Applying remarks #3
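
For reference, the container the commit message refers to is numba's typed.Dict, which works inside nopython code but is not safe for concurrent writes, as noted above; a minimal usage sketch (this is not SDC's ConcurrentDict):

# Minimal numba typed.Dict sketch; single-threaded use only.
import numpy as np
from numba import njit, types
from numba.typed import Dict

@njit
def count(values):
    d = Dict.empty(key_type=types.int64, value_type=types.int64)
    for v in values:
        if v in d:
            d[v] += 1
        else:
            d[v] = 1
    return d

print(dict(count(np.array([1, 2, 2, 3], dtype=np.int64))))  # {1: 1, 2: 2, 3: 1}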