NDCG in case of absence of relevant items · Issue #29521 · scikit-learn/scikit-learn · GitHub

Open
@arabel1a

Description

Describe the bug

In sklearn.metrics._ndcg_sample_scores, the case where all true relevances of a sample are zero is handled counterintuitively. In this case DCG = 0 and IDCG = 0, so NDCG is undefined. The sklearn implementation defines it as 0 and includes it in the averaged NDCG. The latter leads to strange effects such as ndcg_score(y, y) != 1; moreover, it also affects the metric value in non-trivial cases.
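To make the degenerate case explicit, here is a short derivation. It assumes the linear-gain, log2-discount form of DCG; the symbols rel_i (true relevance of the item ranked at position i) and n (number of items) are notation introduced here for illustration, not taken from the sklearn source.

```latex
\[
\mathrm{DCG} = \sum_{i=1}^{n} \frac{rel_i}{\log_2(i+1)}, \qquad
\mathrm{NDCG} = \frac{\mathrm{DCG}}{\mathrm{IDCG}}
\]
% IDCG is the DCG of the ideal ranking (items sorted by decreasing true relevance).
% If rel_i = 0 for every i, both sums vanish:
\[
\mathrm{DCG} = \mathrm{IDCG} = 0
\quad\Longrightarrow\quad
\mathrm{NDCG} = \frac{0}{0} \ \text{(undefined)}
\]
```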

The original 2002 paper that proposed NDCG does not say how to handle such situations, but it clearly states that

The (D)CG vectors for each IR technique can be normalized by dividing them
by the corresponding ideal (D)CG vectors, component by component. In this way,
for any vector position, the normalized value 1 represents ideal performance,
and values in the range [0, 1) the share of ideal performance cumulated by each
technique.

meaning that NDCG(y,y) must always be 1.

I suggest excluding observations without relevant items and/or throwing a warning.
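A minimal sketch of what this suggestion could look like from the user side. The helper name `ndcg_score_skip_empty` is hypothetical and not part of scikit-learn; it only wraps the public `ndcg_score`, and it deliberately does not handle `sample_weight`. Returning NaN when every sample is empty is also just one possible choice.

```python
import warnings

import numpy as np
from sklearn.metrics import ndcg_score


def ndcg_score_skip_empty(y_true, y_score, k=None):
    """Hypothetical helper (not a scikit-learn API): average NDCG over
    samples that contain at least one relevant item."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)

    # A sample is "empty" when all of its true relevances are zero,
    # which is exactly the case where IDCG = 0.
    has_relevant = np.any(y_true != 0, axis=1)
    n_empty = int((~has_relevant).sum())
    if n_empty:
        warnings.warn(
            f"{n_empty} sample(s) have no relevant items and were excluded.",
            UserWarning,
        )

    # Nothing left to average: report NaN instead of inventing a value.
    if not has_relevant.any():
        return float("nan")

    return ndcg_score(y_true[has_relevant], y_score[has_relevant], k=k)


y = np.array([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for this demo
    result = ndcg_score_skip_empty(y, y)
print(result)
```

With the all-zero row excluded, only the first sample contributes, so the result should be 1 up to floating-point rounding.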

Steps/Code to Reproduce

>>> import numpy as np
>>> from sklearn.metrics import ndcg_score
>>> y = np.array([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
>>> ndcg_score(y, y)

Expected Results

1.0

Actual Results

0.5
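
The 0.5 is the mean of a per-sample score of 1 for the first row and the 0 assigned to the all-zero row. Below is a minimal NumPy sketch of that arithmetic. It assumes the convention described in this issue (score 0 when IDCG is 0) and ignores tie handling; it is not the sklearn source, and `dcg_rows` is a name made up for this example.

```python
import numpy as np


def dcg_rows(relevance):
    # Discount position i (0-based) by log2(i + 2), then sum along each row.
    discounts = np.log2(np.arange(relevance.shape[1]) + 2)
    return (relevance / discounts).sum(axis=1)


y = np.array([[1.0, 0.0, 1.0], [0.0, 0.0, 0.0]])

# Scores equal the true relevances here, so ranking by score is the same
# as sorting each row in decreasing order of relevance.
ranked = -np.sort(-y, axis=1)
dcg = dcg_rows(ranked)
idcg = dcg_rows(ranked)  # ideal ranking coincides with the actual one

# Assumed convention: rows with IDCG == 0 get a score of 0.
per_sample = np.divide(dcg, idcg, out=np.zeros_like(dcg), where=idcg > 0)
average = per_sample.mean()

print(per_sample.tolist())  # [1.0, 0.0]
print(average)              # 0.5
```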

Versions

This code has not changed in 1.5, so I expect the issue also applies to newer versions.



System:
    python: 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801]
executable: /usr/bin/python3
   machine: Linux-6.6.19-1-MANJARO-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.3.1
          pip: 24.0
   setuptools: 69.0.3
        numpy: 1.26.4
        scipy: 1.10.1
       Cython: 3.0.9
       pandas: 1.5.3
   matplotlib: 3.7.1
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/arabella/.local/lib/python3.11/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 8

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/arabella/.local/lib/python3.11/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell
    num_threads: 8

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/arabella/.local/lib/python3.11/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: Haswell
    num_threads: 8
