Common tests: Check that random_state is not altered #1108
scikit-learn/scikit-learn · GitHub

Closed · amueller opened this issue Sep 3, 2012 · 13 comments

Comments

@amueller (Member) commented Sep 3, 2012

Not sure this is what we want, but I think it is:
fit should not change the random_state so that fitting twice on the same data set should produce the same result.
There was a comment by @ogrisel in #455 along these lines, I think.
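
In test form, the property being asked for would look roughly like this (a minimal sketch using SGDClassifier as one example of a randomized estimator; this is not the actual common test):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy data with a fixed generator, so only the estimator's own randomness matters.
X = np.random.RandomState(0).randn(50, 3)
y = (X[:, 0] > 0).astype(int)

est = SGDClassifier(random_state=42)
param_before = est.get_params()["random_state"]

coef_first = est.fit(X, y).coef_.copy()
coef_second = est.fit(X, y).coef_.copy()

# fit must not mutate the random_state parameter ...
assert est.get_params()["random_state"] == param_before
# ... and fitting twice on the same data must give the same result.
np.testing.assert_array_almost_equal(coef_first, coef_second)
```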

@ogrisel (Member) commented Sep 3, 2012

+1 but this will have a very wide impact.

@amueller (Member, Author) commented Sep 3, 2012

Yeah, this is gonna be real fun ^^

@ogrisel (Member) commented Sep 3, 2012

We should probably change the current fit idiom:

self.random_state = check_random_state(self.random_state)

to:

self.random_state_ = random_state = check_random_state(self.random_state)

and use that one instead in fit. That would also require passing random_state=0 (a fixed seed) in the constructors of all the randomized estimators. Some people might not like it...
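
Roughly, the proposed idiom would look like this in a toy estimator (a sketch with a hypothetical NoisyMeanEstimator, not actual scikit-learn code):

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.utils import check_random_state


class NoisyMeanEstimator(BaseEstimator):
    """Toy estimator: stores a slightly jittered column mean of X."""

    def __init__(self, random_state=0):  # fixed seed by default, as proposed
        self.random_state = random_state  # constructor parameter, never overwritten

    def fit(self, X, y=None):
        # Derive the RNG into a separate fitted attribute instead of
        # reassigning self.random_state, so fit does not mutate its params.
        self.random_state_ = random_state = check_random_state(self.random_state)
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0) + 1e-6 * random_state.randn(X.shape[1])
        return self
```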

@amueller (Member, Author) commented Sep 3, 2012

Why do you want to make random_state_ an attribute? It only needs to be local to fit. If you need another random state somewhere else, why not generate it again from self.random_state?

I'm -1 on fixing seeds by default.
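
As an illustration of the local-RNG alternative described above, a minimal sketch of a hypothetical fit body (not scikit-learn code):

```python
from sklearn.utils import check_random_state


def fit(self, X, y=None):
    # The RNG is derived fresh on every call and never stored, so the
    # random_state parameter set in __init__ is left untouched.
    random_state = check_random_state(self.random_state)
    # ... all randomized work in this fit call uses `random_state` ...
    return self
```

Note one caveat: if the user passes a RandomState instance rather than an integer, consuming it still advances its internal state, so two successive fits would differ; that is the case the later comments about storing an integer seed address.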

@GaelVaroquaux (Member)

> +1 but this will have a very wide impact.

+1

@GaelVaroquaux (Member)

> Why do you want to make random_state_ an attribute? It only needs to be
> local to fit. If you need another random state somewhere else, why not
> generate it again from self.random_state?

GMMs and HMMs can use it to sample observations from the model. People
expect that sampling twice will give them different observations.

@ogrisel (Member) commented Sep 3, 2012

> I'm -1 on fixing seeds by default.

Indeed, this is not a requirement.

> GMMs and HMMs can use it to sample observations from the model. People
> expect that sampling twice will give them different observations.

I don't think so. The sampling method should have its own random_state kwarg to make that controllable.
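
A sketch of that pattern for a hypothetical diagonal-covariance mixture (the attribute names weights_, means_ and covars_ are assumptions, not the actual GMM API):

```python
import numpy as np
from sklearn.utils import check_random_state


def sample(self, n_samples=1, random_state=None):
    # The caller controls reproducibility here: pass an int or a RandomState
    # instance for repeatable draws, or leave None for fresh samples each call.
    rng = check_random_state(random_state)
    # Pick a mixture component per sample, then draw from that component.
    comp = rng.choice(len(self.weights_), size=n_samples, p=self.weights_)
    noise = rng.randn(n_samples, self.means_.shape[1])
    return self.means_[comp] + noise * np.sqrt(self.covars_[comp])
```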

@amueller (Member, Author) commented Sep 3, 2012

I agree with @ogrisel on the HMM issue.
About self.random_state_: I don't like this duplication of attributes for the sake of not changing the model.
I think it becomes pretty confusing to distinguish the two. I know we use this in some places and it cannot always be avoided, but I think we should try.

@ogrisel (Member) commented Sep 3, 2012

If you want idempotent fit calls, you need to remember the original seed somewhere. But it is actually not the random_state itself that should be stored, but an integer seed. I think we should write a use case from the API caller's point of view before moving on with this discussion.
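
One way to read the "store an integer seed" idea (a sketch only, not an agreed-on API; seed_ is a hypothetical attribute):

```python
import numpy as np
from sklearn.utils import check_random_state


def fit(self, X, y=None):
    # Draw a plain integer seed once and remember it; storing the RandomState
    # object itself would not help, since consuming it mutates its state.
    if not hasattr(self, "seed_"):
        self.seed_ = check_random_state(self.random_state).randint(np.iinfo(np.int32).max)
    random_state = np.random.RandomState(self.seed_)
    # ... all randomized work in fit replays the same stream on every call ...
    return self
```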

@amueller (Member, Author) commented Sep 3, 2012

agreed :)

@glouppe (Contributor) commented Sep 4, 2012

To avoid confusion, we may rename random_state to seed in the constructors and then set random_state = check_random_state(self.seed) (or self.random_state_ = ... if it needs to be stored) in `fit`? Too bad this would break the current API though... :(

@mblondel (Member)
+1 for not overwriting self.random_state

PR #1200 had this problem too.

@amueller (Member, Author)
Closed by #1518 and #1670.
