ValueError: Buffer dtype mismatch, expected 'int' but got 'long' · Issue #10758 · scikit-learn/scikit-learn · GitHub

Closed · hengchao0248 opened this issue Mar 6, 2018 · 4 comments

@hengchao0248

I was trying this code:

for _ in trange(20):
    Q = qdmat.dot(dqmat.dot(Q))
    csr_topn(Q, 10)
    Normalizer(norm="l2", copy=False).fit_transform(Q)
    steps.append(Q[select_indexes])

but get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-5-772329bf5896> in <module>()
      8     Q = qdmat.dot(dqmat.dot(Q))
      9     csr_topn(Q, 10)
---> 10     Normalizer(norm="l1", copy=False).fit_transform(Q)
     11     steps.append(Q[select_indexes])
     12 

/search/odin/tensorflow/lihengchao/anaconda3/lib/python3.6/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    515         if y is None:
    516             # fit method of arity 1 (unsupervised transformation)
--> 517             return self.fit(X, **fit_params).transform(X)
    518         else:
    519             # fit method of arity 2 (supervised transformation)

/search/odin/tensorflow/lihengchao/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py in transform(self, X, y, copy)
   1529         copy = copy if copy is not None else self.copy
   1530         X = check_array(X, accept_sparse='csr')
-> 1531         return normalize(X, norm=self.norm, axis=1, copy=copy)
   1532 
   1533 

/search/odin/tensorflow/lihengchao/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py in normalize(X, norm, axis, copy, return_norm)
   1420                                       "or norm 'l2'")
   1421         if norm == 'l1':
-> 1422             inplace_csr_row_normalize_l1(X)
   1423         elif norm == 'l2':
   1424             inplace_csr_row_normalize_l2(X)

sklearn/utils/sparsefuncs_fast.pyx in sklearn.utils.sparsefuncs_fast.inplace_csr_row_normalize_l1()

sklearn/utils/sparsefuncs_fast.pyx in sklearn.utils.sparsefuncs_fast._inplace_csr_row_normalize_l1()

ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

The csr_matrix Q's size is:

<178726542x2000000 sparse matrix of type '<class 'numpy.float32'>'
	with 1570091377 stored elements in Compressed Sparse Row format>

my os: Linux version 2.6.32-504.23.4.el6.x86_64
my python: Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58)
my scipy: 0.19.0
my sklearn: 0.19.0
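The error occurs because the matrix carries 64-bit index arrays while the Cython routine expects 32-bit ones. Note that this particular matrix's nnz (1,570,091,377) and dimensions all still fit in int32, so one possible workaround is to downcast the index arrays before calling the transformer. A minimal sketch, assuming scipy's public CSR attributes; `downcast_indices` is a hypothetical helper, not part of scikit-learn:

```python
import numpy as np
import scipy.sparse as sp

def downcast_indices(X):
    """Cast a CSR matrix's index arrays to int32 in place when every
    value fits; leave the matrix unchanged otherwise."""
    imax = np.iinfo(np.int32).max
    # indptr values run up to nnz; indices values up to n_cols - 1
    if X.nnz <= imax and max(X.shape) <= imax:
        X.indices = X.indices.astype(np.int32)
        X.indptr = X.indptr.astype(np.int32)
    return X

# small demonstration: force int64 index arrays, then downcast
Q = sp.random(5, 5, density=0.5, format="csr")
Q.indices = Q.indices.astype(np.int64)
Q.indptr = Q.indptr.astype(np.int64)
downcast_indices(Q)
print(Q.indices.dtype)  # int32
```

When nnz genuinely exceeds the int32 maximum (2**31 - 1), no such downcast is possible and the 64-bit indices are required.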

@jnothman
Member
jnothman commented Mar 6, 2018

Large sparse matrices are not yet supported, but may be in the next release, thanks to @kdhingra307's work in #9678.

Closing as a near-duplicate of #2969 etc.

@jnothman jnothman closed this as completed Mar 6, 2018
@kdhingra307

@jnothman I knew this would happen; it is caused by the handling of large_indices directly in Cython.

I have completed most aspects of the fix. I just wanted to test it thoroughly, because we have changed the default behavior.

@hengchao0248
Author

Thanks for your effort, guys!

@Hoeze
Hoeze commented Mar 26, 2019

I still got the same error with v0.20.3:

model = LogisticRegression(
    C=1,
    solver='sag',
    random_state=0,
    tol=0.0001,
    max_iter=100,
    verbose=1,
    warm_start=True,
    n_jobs=64,
    penalty='l2',
    dual=False,
    multi_class='ovr',
)
model.fit(train_data.inputs, train_data.targets)
/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/base.py:253: UserWarning: Trying to unpickle estimator StandardScaler from version 0.20.0 when using version 0.20.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
[Parallel(n_jobs=64)]: Using backend ThreadingBackend with 64 concurrent workers.
Traceback (most recent call last):
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-5936f410e945>", line 58, in <module>
    model.fit(train_data_cadd.inputs, train_data_cadd.targets)
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/comet_ml/monkey_patching.py", line 244, in wrapper
    return_value = original(*args, **kwargs)
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/linear_model/logistic.py", line 1363, in fit
    for class_, warm_start_coef_ in zip(classes_, warm_start_coef))
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 930, in __call__
    self.retrieve()
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/linear_model/logistic.py", line 792, in logistic_regression_path
    is_saga=(solver == 'saga'))
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/linear_model/sag.py", line 305, in sag_solver
    dataset, intercept_decay = make_dataset(X, y, sample_weight, random_state)
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 84, in make_dataset
    seed=seed)
  File "sklearn/utils/seq_dataset.pyx", line 259, in sklearn.utils.seq_dataset.CSRDataset.__cinit__
ValueError: Buffer dtype mismatch, expected 'int' but got 'long'
>>> sklearn.__version__
Out[5]: '0.20.3'
>>> train_data.inputs
Out[6]: 
<28034374x904 sparse matrix of type '<class 'numpy.float32'>'
	with 2223406363 stored elements in Compressed Sparse Row format>

Is there some way I can still train my data?
