ValueError: Buffer dtype mismatch, expected 'int' but got 'long' · Issue #10758 · scikit-learn/scikit-learn · GitHub

Closed · hengchao0248 opened this issue Mar 6, 2018 · 4 comments

@hengchao0248

I was trying this code:

for _ in trange(20):
    Q = qdmat.dot(dqmat.dot(Q))
    csr_topn(Q, 10)
    Normalizer(norm="l2", copy=False).fit_transform(Q)
    steps.append(Q[select_indexes])

but get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-5-772329bf5896> in <module>()
      8     Q = qdmat.dot(dqmat.dot(Q))
      9     csr_topn(Q, 10)
---> 10     Normalizer(norm="l1", copy=False).fit_transform(Q)
     11     steps.append(Q[select_indexes])
     12 

/search/odin/tensorflow/lihengchao/anaconda3/lib/python3.6/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    515         if y is None:
    516             # fit method of arity 1 (unsupervised transformation)
--> 517             return self.fit(X, **fit_params).transform(X)
    518         else:
    519             # fit method of arity 2 (supervised transformation)

/search/odin/tensorflow/lihengchao/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py in transform(self, X, y, copy)
   1529         copy = copy if copy is not None else self.copy
   1530         X = check_array(X, accept_sparse='csr')
-> 1531         return normalize(X, norm=self.norm, axis=1, copy=copy)
   1532 
   1533 

/search/odin/tensorflow/lihengchao/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py in normalize(X, norm, axis, copy, return_norm)
   1420                                       "or norm 'l2'")
   1421         if norm == 'l1':
-> 1422             inplace_csr_row_normalize_l1(X)
   1423         elif norm == 'l2':
   1424             inplace_csr_row_normalize_l2(X)

sklearn/utils/sparsefuncs_fast.pyx in sklearn.utils.sparsefuncs_fast.inplace_csr_row_normalize_l1()

sklearn/utils/sparsefuncs_fast.pyx in sklearn.utils.sparsefuncs_fast._inplace_csr_row_normalize_l1()

ValueError: Buffer dtype mismatch, expected 'int' but got 'long'

The csr_matrix Q's size is:

<178726542x2000000 sparse matrix of type '<class 'numpy.float32'>'
	with 1570091377 stored elements in Compressed Sparse Row format>

my os: Linux version 2.6.32-504.23.4.el6.x86_64
my python: Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58)
my scipy: 0.19.0
my sklearn: 0.19.0
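The error occurs because the matrix carries 64-bit index arrays while the Cython routine expects 32-bit ones. Note that this particular matrix's nnz (1,570,091,377) and dimensions all still fit in int32, so one possible workaround is to downcast the index arrays before calling the transformer. A minimal sketch, assuming scipy's public CSR attributes; `downcast_indices` is a hypothetical helper, not part of scikit-learn:

```python
import numpy as np
import scipy.sparse as sp

def downcast_indices(X):
    """Cast a CSR matrix's index arrays to int32 in place when every
    value fits; leave the matrix unchanged otherwise."""
    imax = np.iinfo(np.int32).max
    # indptr values run up to nnz; indices values up to n_cols - 1
    if X.nnz <= imax and max(X.shape) <= imax:
        X.indices = X.indices.astype(np.int32)
        X.indptr = X.indptr.astype(np.int32)
    return X

# small demonstration: force int64 index arrays, then downcast
Q = sp.random(5, 5, density=0.5, format="csr")
Q.indices = Q.indices.astype(np.int64)
Q.indptr = Q.indptr.astype(np.int64)
downcast_indices(Q)
print(Q.indices.dtype)  # int32
```

When nnz genuinely exceeds the int32 maximum (2**31 - 1), no such downcast is possible and the 64-bit indices are required.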

@jnothman
Member
jnothman commented Mar 6, 2018

Large sparse matrices are not yet supported, but may be in the next release, thanks to @kdhingra307's work in #9678.

Closing as a near-duplicate of #2969 etc.

@jnothman jnothman closed this as completed Mar 6, 2018
@kdhingra307

@jnothman I knew this would happen; it is caused by the handling of large_indices directly in Cython.

I have completed most aspects of the fix. I just wanted to test it thoroughly, because we have changed the default behavior.

@hengchao0248
Author

Thanks for your effort, guys!

@Hoeze
Hoeze commented Mar 26, 2019

I still got the same error with v0.20.3:

model = LogisticRegression(
    C=1,
    solver='sag',
    random_state=0,
    tol=0.0001,
    max_iter=100,
    verbose=1,
    warm_start=True,
    n_jobs=64,
    penalty='l2',
    dual=False,
    multi_class='ovr',
)
model.fit(train_data.inputs, train_data.targets)
/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/base.py:253: UserWarning: Trying to unpickle estimator StandardScaler from version 0.20.0 when using version 0.20.3. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
[Parallel(n_jobs=64)]: Using backend ThreadingBackend with 64 concurrent workers.
Traceback (most recent call last):
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2862, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-5936f410e945>", line 58, in <module>
    model.fit(train_data_cadd.inputs, train_data_cadd.targets)
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/comet_ml/monkey_patching.py", line 244, in wrapper
    return_value = original(*args, **kwargs)
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/linear_model/logistic.py", line 1363, in fit
    for class_, warm_start_coef_ in zip(classes_, warm_start_coef))
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 930, in __call__
    self.retrieve()
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/linear_model/logistic.py", line 792, in logistic_regression_path
    is_saga=(solver == 'saga'))
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/linear_model/sag.py", line 305, in sag_solver
    dataset, intercept_decay = make_dataset(X, y, sample_weight, random_state)
  File "/opt/modules/i12g/anaconda/3-5.0.1/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 84, in make_dataset
    seed=seed)
  File "sklearn/utils/seq_dataset.pyx", line 259, in sklearn.utils.seq_dataset.CSRDataset.__cinit__
ValueError: Buffer dtype mismatch, expected 'int' but got 'long'
>>> sklearn.__version__
Out[5]: '0.20.3'
>>> train_data.inputs
Out[6]: 
<28034374x904 sparse matrix of type '<class 'numpy.float32'>'
	with 2223406363 stored elements in Compressed Sparse Row format>

Is there some way I can still train my data?
