Description
Describe the bug
In sklearn.metrics._ndcg_sample_scores, samples whose true relevances are all equal to zero are handled counterintuitively. For such a sample, DCG = 0 and IDCG = 0, so NDCG = DCG / IDCG is undefined. The sklearn implementation defines it as 0 and includes it in the averaged NDCG. This leads to strange effects, such as ndcg_score(y, y) != 1; moreover, it also affects the metric value in non-trivial cases.
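To make the arithmetic concrete, here is a minimal pure-Python sketch of per-sample NDCG with the usual log2 discount. It is not scikit-learn's code: the helper names `dcg` and `ndcg_or_none` are hypothetical, tie handling is ignored, and the sketch returns `None` for the 0/0 case instead of picking a value.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevances given in ranked order."""
    # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_or_none(y_true, y_score):
    """NDCG for one sample, or None when the ideal DCG is zero (0/0)."""
    # Rank items by predicted score, highest first.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    actual = dcg([y_true[i] for i in order])
    ideal = dcg(sorted(y_true, reverse=True))
    if ideal == 0:
        return None  # undefined: the sample has no relevant items
    return actual / ideal


samples = [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
per_sample = [ndcg_or_none(y, y) for y in samples]

# Replacing the undefined value with 0 before averaging reproduces
# the 0.5 reported in this issue.
mean_with_zero = sum(v or 0.0 for v in per_sample) / len(per_sample)
```

The first sample scores exactly 1 against itself, while the second has no defined value; averaging it in as 0 is what pulls the result down.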
The original 2002 paper that proposed NDCG does not say how to handle such situations, but it clearly states that:
The (D)CG vectors for each IR technique can be normalized by dividing them
by the corresponding ideal (D)CG vectors, component by component. In this way,
for any vector position, the normalized value 1 represents ideal performance,
and values in the range [0, 1) the share of ideal performance cumulated by each
technique.
meaning that NDCG(y,y) must always be 1.
I suggest excluding samples without relevant items from the average and/or emitting a warning.
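One possible shape for this suggestion, as a library-independent sketch rather than a proposed patch: the function name `mean_ndcg_skip_undefined` and the helper `_dcg` are hypothetical, tie handling is ignored, and returning NaN when every sample is excluded is an assumption made here, not something the issue specifies.

```python
import math
import warnings


def _dcg(relevances):
    """Discounted cumulative gain of relevances given in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def mean_ndcg_skip_undefined(y_true, y_score):
    """Mean NDCG over the samples that have at least one relevant item."""
    scores = []
    skipped = 0
    for truth, pred in zip(y_true, y_score):
        ideal = _dcg(sorted(truth, reverse=True))
        if ideal == 0:
            # NDCG is 0/0 for this sample, so leave it out of the average.
            skipped += 1
            continue
        order = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)
        scores.append(_dcg([truth[i] for i in order]) / ideal)
    if skipped:
        warnings.warn(
            f"{skipped} sample(s) without relevant items were excluded "
            "from the NDCG average.",
            UserWarning,
        )
    if not scores:
        return float("nan")  # assumption: nothing left to average
    return sum(scores) / len(scores)


y = [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
result = mean_ndcg_skip_undefined(y, y)
```

With the data from this report, the all-zero row is skipped with a warning and the remaining sample averages to 1.0.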
Steps/Code to Reproduce
>>> import numpy as np
>>> from sklearn.metrics import ndcg_score
>>> y = np.array([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
>>> ndcg_score(y, y)
Expected Results
1.0
Actual Results
0.5
Versions
This code was not changed in 1.5, so I assume the issue is still present in newer versions.
System:
python: 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801]
executable: /usr/bin/python3
machine: Linux-6.6.19-1-MANJARO-x86_64-with-glibc2.39
Python dependencies:
sklearn: 1.3.1
pip: 24.0
setuptools: 69.0.3
numpy: 1.26.4
scipy: 1.10.1
Cython: 3.0.9
pandas: 1.5.3
matplotlib: 3.7.1
joblib: 1.2.0
threadpoolctl: 3.1.0
Built with OpenMP: True
threadpoolctl info:
user_api: openmp
internal_api: openmp
prefix: libgomp
filepath: /home/arabella/.local/lib/python3.11/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
version: None
num_threads: 8
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: /home/arabella/.local/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
version: 0.3.23.dev
threading_layer: pthreads
architecture: Haswell
num_threads: 8
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: /home/arabella/.local/lib/python3.11/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
version: 0.3.18
threading_layer: pthreads
architecture: Haswell
num_threads: 8