ENH vectorize labeled ranking average precision by jnothman · Pull Request #3 · arjoly/scikit-learn · GitHub

ENH vectorize labeled ranking average precision #3


Closed
wants to merge 7 commits into from

Conversation

jnothman

No description provided.

@jnothman
Author

Sorry, got distracted by this. It would be nice not to have to use memory in n_labels ** 2, but I haven't found a nice way to avoid it.
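(For readers following along: a minimal sketch of what a broadcasting-based LRAP computation looks like, to show where the quadratic intermediate comes from. This is an assumed reconstruction for illustration, not the exact code in this PR.)

```python
import numpy as np

def lrap_vectorized(y_true, y_score):
    # ge[i, j, k] is True when label k scores at or above label j for sample i.
    # This (n_samples, n_labels, n_labels) intermediate is the memory problem:
    # at 20000 samples x 4000 labels it is 20000 * 4000**2 bytes = 320 GB even as bool.
    ge = y_score[:, None, :] >= y_score[:, :, None]
    rank = ge.sum(axis=2)                                 # 1-based rank, ties counted
    hits = (ge & (y_true[:, None, :] == 1)).sum(axis=2)   # relevant labels at/above each label
    per_label = np.where(y_true == 1, hits / rank, 0.0)
    # mean precision over relevant labels, then over samples
    # (rows with no relevant labels ignored here for brevity)
    return (per_label.sum(axis=1) / y_true.sum(axis=1)).mean()
```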

@arjoly
Owner
arjoly commented Apr 24, 2014

Thanks a lot for your help!!

However, I fear that it will not scale with the number of labels.
With this benchmark script (https://gist.github.com/arjoly/11259383), I got the following timing results:

n_labels = 500
-------------------------------------
original     1.83617901802s
vectorized   16.3870398998s

n_labels = 750
-------------------------------------
original     2.10820603371s
vectorized   114.211375952s

Note that one of my applications has around 20000 samples and 4000 labels.
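(The actual benchmark script lives at the gist linked above. Below is a minimal harness in the same spirit; lrap_original and lrap_vectorized are placeholder names for the loop-based and broadcasting implementations being compared, and the sample count and label density are assumptions, not taken from the gist.)

```python
import time
import numpy as np

# lrap_original / lrap_vectorized must be defined elsewhere, e.g. the
# broadcasting sketch earlier in this thread; these names are placeholders.
rng = np.random.RandomState(0)
n_samples = 1000  # assumed; the gist's sample count is not quoted here

for n_labels in (500, 750):
    y_true = (rng.rand(n_samples, n_labels) < 0.05).astype(int)
    y_score = rng.rand(n_samples, n_labels)
    print("n_labels = %d" % n_labels)
    for name, func in [("original", lrap_original), ("vectorized", lrap_vectorized)]:
        tic = time.time()
        func(y_true, y_score)
        print("%-12s %.3fs" % (name, time.time() - tic))
```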

@jnothman
Author

Okay. I thought this might be an issue. :(

I'll keep thinking about whether there's a linear memory solution.
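(One possible direction, sketched here as an assumption rather than a settled design: vectorize within each sample using sorting and searchsorted, which keeps peak working memory at O(n_labels) per sample instead of materializing an n_labels × n_labels comparison matrix.)

```python
import numpy as np

def lrap_per_sample(y_true, y_score):
    # O(n_labels) working memory per sample: ranks come from a sorted copy
    # of the scores rather than a pairwise comparison matrix.
    n_samples, n_labels = y_true.shape
    out = np.empty(n_samples)
    for i in range(n_samples):
        relevant = np.flatnonzero(y_true[i])
        if relevant.size in (0, n_labels):
            out[i] = 1.0  # degenerate rows; convention may differ
            continue
        scores = y_score[i]
        rel_scores = scores[relevant]
        # rank[j] = #{k : scores[k] >= rel_scores[j]}  (ties counted)
        rank = n_labels - np.searchsorted(np.sort(scores), rel_scores, side="left")
        # hits[j] = the same count restricted to relevant labels
        hits = relevant.size - np.searchsorted(np.sort(rel_scores), rel_scores, side="left")
        out[i] = np.mean(hits / rank)
    return out.mean()
```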


@jnothman
Author

Btw, that benchmark result is probably because, in the n_labels=500 case, the
vectorized version is allocating a 19GB array, if my calculations are
correct.
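(A back-of-envelope check of that figure. The gist's sample count is not quoted in this thread, so the count below is a guess chosen to reproduce 19GB, not a value from the benchmark.)

```python
# float64 intermediate of shape (n_samples, n_labels, n_labels)
n_samples, n_labels = 9500, 500   # n_samples is assumed, not from the gist
print(n_samples * n_labels ** 2 * 8 / 1e9)  # -> 19.0 (GB)
```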


@arjoly arjoly closed this Sep 1, 2014