ENH vectorize labeled ranking average precision by jnothman · Pull Request #3 · arjoly/scikit-learn · GitHub

ENH vectorize labeled ranking average precision #3


Closed
wants to merge 7 commits into from

Conversation

jnothman

No description provided.

@jnothman
Author

Sorry, got distracted by this. It would be nice not to have to use memory in n_labels ** 2, but I haven't found a nice way to avoid it.
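(For readers following along: a minimal sketch of what a broadcasting-based LRAP computation looks like, to show where the quadratic intermediate comes from. This is an assumed reconstruction for illustration, not the exact code in this PR.)

```python
import numpy as np

def lrap_vectorized(y_true, y_score):
    # ge[i, j, k] is True when label k scores at or above label j for sample i.
    # This (n_samples, n_labels, n_labels) intermediate is the memory problem:
    # at 20000 samples x 4000 labels it is 20000 * 4000**2 bytes = 320 GB even as bool.
    ge = y_score[:, None, :] >= y_score[:, :, None]
    rank = ge.sum(axis=2)                                 # 1-based rank, ties counted
    hits = (ge & (y_true[:, None, :] == 1)).sum(axis=2)   # relevant labels at/above each label
    per_label = np.where(y_true == 1, hits / rank, 0.0)
    # mean precision over relevant labels, then over samples
    # (rows with no relevant labels ignored here for brevity)
    return (per_label.sum(axis=1) / y_true.sum(axis=1)).mean()
```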

@arjoly
Owner
arjoly commented Apr 24, 2014

Thanks a lot for your help!!

However, I fear that it will not scale with the number of labels.
With this benchmark script (https://gist.github.com/arjoly/11259383), I got the following timing results:

n_labels = 500
-------------------------------------
original     1.83617901802s
vectorized   16.3870398998s

n_labels = 750
-------------------------------------
original     2.10820603371s
vectorized   114.211375952s

Note that one of my applications has around 20000 samples and 4000 labels.
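(The actual benchmark script lives at the gist linked above. Below is a minimal harness in the same spirit; lrap_original and lrap_vectorized are placeholder names for the loop-based and broadcasting implementations being compared, and the sample count and label density are assumptions, not taken from the gist.)

```python
import time
import numpy as np

# lrap_original / lrap_vectorized must be defined elsewhere, e.g. the
# broadcasting sketch earlier in this thread; these names are placeholders.
rng = np.random.RandomState(0)
n_samples = 1000  # assumed; the gist's sample count is not quoted here

for n_labels in (500, 750):
    y_true = (rng.rand(n_samples, n_labels) < 0.05).astype(int)
    y_score = rng.rand(n_samples, n_labels)
    print("n_labels = %d" % n_labels)
    for name, func in [("original", lrap_original), ("vectorized", lrap_vectorized)]:
        tic = time.time()
        func(y_true, y_score)
        print("%-12s %.3fs" % (name, time.time() - tic))
```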

@jnothman
Author

Okay. I thought this might be an issue. :(

I'll keep thinking about whether there's a linear memory solution.
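(One possible direction, sketched here as an assumption rather than a settled design: vectorize within each sample using sorting and searchsorted, which keeps peak working memory at O(n_labels) per sample instead of materializing an n_labels × n_labels comparison matrix.)

```python
import numpy as np

def lrap_per_sample(y_true, y_score):
    # O(n_labels) working memory per sample: ranks come from a sorted copy
    # of the scores rather than a pairwise comparison matrix.
    n_samples, n_labels = y_true.shape
    out = np.empty(n_samples)
    for i in range(n_samples):
        relevant = np.flatnonzero(y_true[i])
        if relevant.size in (0, n_labels):
            out[i] = 1.0  # degenerate rows; convention may differ
            continue
        scores = y_score[i]
        rel_scores = scores[relevant]
        # rank[j] = #{k : scores[k] >= rel_scores[j]}  (ties counted)
        rank = n_labels - np.searchsorted(np.sort(scores), rel_scores, side="left")
        # hits[j] = the same count restricted to relevant labels
        hits = relevant.size - np.searchsorted(np.sort(rel_scores), rel_scores, side="left")
        out[i] = np.mean(hits / rank)
    return out.mean()
```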


@jnothman
Author

Btw, that benchmark result is probably because, in the n_labels=500 case, the
vectorized version is allocating a 19GB array, if my calculations are
correct.
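(A back-of-envelope check of that figure. The gist's sample count is not quoted in this thread, so the count below is a guess chosen to reproduce 19GB, not a value from the benchmark.)

```python
# float64 intermediate of shape (n_samples, n_labels, n_labels)
n_samples, n_labels = 9500, 500   # n_samples is assumed, not from the gist
print(n_samples * n_labels ** 2 * 8 / 1e9)  # -> 19.0 (GB)
```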


@arjoly arjoly closed this Sep 1, 2014