Closed
Description
Description
When using 'seuclidean' distance metric, the algorithm produces different predictions for different values of the n_jobs parameter if no V is passed as additional metric_params. This implies that if configured with n_jobs=-1 two different machines show different results depending on the number of cores. The same happens for 'mahalanobis' distance metric if no V and VI are passed as metric_params.
Steps/Code to Reproduce
# Import required packages
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
# Prepare the dataset
dataset = load_boston()
target = dataset.target
data = pd.DataFrame(dataset.data, columns=dataset.feature_names)
# Split the dataset
np.random.seed(42)
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)
# Create a regressor with seuclidean distance and passing V as additional argument
model_n_jobs_1 = KNeighborsRegressor(n_jobs=1, algorithm='brute', metric='seuclidean')
model_n_jobs_1.fit(X_train, y_train)
np.sum(model_n_jobs_1.predict(X_test)) # --> 2127.99999
# Create a regressor with seuclidean distance and passing V as additional argument
model_n_jobs_3 = KNeighborsRegressor(n_jobs=3, algorithm='brute', metric='seuclidean')
model_n_jobs_3.fit(X_train, y_train)
np.sum(model_n_jobs_3.predict(X_test)) # --> 2129.38
# Create a regressor with seuclidean distance and passing V as additional argument
model_n_jobs_all = KNeighborsRegressor(n_jobs=-1, algorithm='brute', metric='seuclidean')
model_n_jobs_all.fit(X_train, y_train)
np.sum(model_n_jobs_all.predict(X_test)) # --> 2125.29999
Expected Results
The prediction should be always the same and not depend on the value passed to the n_jobs parameter.
Actual Results
The prediction value changes depending on the value passed to n_jobs which, in case of n_jobs=-1, makes the prediction depend on the number of cores of the machine running the code.
Versions
System
python: 3.6.6 (default, Jun 28 2018, 04:42:43) [GCC 5.4.0 20160609]
executable: /home/mcorella/.local/share/virtualenvs/outlier_detection-8L4UL10d/bin/python3.6
machine: Linux-4.15.0-39-generic-x86_64-with-Ubuntu-16.04-xenial
BLAS
macros: NO_ATLAS_INFO=1, HAVE_CBLAS=None
lib_dirs: /usr/lib
cblas_libs: cblas
Python deps
pip: 18.1
setuptools: 40.5.0
sklearn: 0.20.0
numpy: 1.15.4
scipy: 1.1.0
Cython: None
pandas: 0.23.4