Futures: Parallel random number generation (RNG) #60

Closed
HenrikBengtsson opened this issue Dec 8, 2022 · 5 comments

@HenrikBengtsson

To prevent statistically unsound random numbers from being produced when running in parallel, futureverse asks the developer to declare when their code needs the RNG. If this is not declared, it still checks whether the RNG was used (i.e. whether .Random.seed was updated). If it was, a warning is produced.

Here is an example:

> library(pbapply)
> future::plan("multisession")
> y <- pblapply(1:2, FUN = rnorm, cl = "future")
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Warning messages:
1: UNRELIABLE VALUE: One of the 'future.apply' iterations ('future_lapply-1') unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 
2: UNRELIABLE VALUE: One of the 'future.apply' iterations ('future_lapply-2') unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 
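
The kind of check described above can be approximated in base R by comparing .Random.seed before and after evaluating an expression. This is only a minimal sketch of the idea, not the futureverse implementation; the helper names rng_state() and used_rng() are hypothetical.

```r
# Hypothetical helper: return the current RNG state, or NULL if the RNG
# has not been used or seeded yet in this session.
rng_state <- function() {
  # .Random.seed lives in the global environment once it exists.
  get0(".Random.seed", envir = globalenv(), inherits = FALSE)
}

# Hypothetical helper: report whether evaluating `expr` changed the RNG state.
used_rng <- function(expr) {
  before <- rng_state()
  force(expr)                 # evaluate the lazily passed expression
  after <- rng_state()
  !identical(before, after)
}

used_rng(sum(1:10))   # FALSE: no random numbers were drawn
used_rng(rnorm(1))    # TRUE: .Random.seed was updated
```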

To avoid this warning, a quick fix is to always pass future.seed = TRUE. That will set up a parallel RNG regardless of whether random numbers are generated or not. The downside is that doing so can be computationally expensive. To give the developer control, you'd have to introduce a new argument that lets them set the future.seed argument of future_lapply() and the like. One way to do that without adding a new argument could be via attributes, e.g.

y <- pblapply(1:2, FUN = rnorm, cl = structure("future", future.seed = TRUE))
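
As a rough illustration of how such an attribute could be read on the receiving side, here is a base R sketch. The helper get_future_seed() and its default value are assumptions made for illustration; nothing here is taken from pbapply's code.

```r
# Hypothetical helper: read a 'future.seed' attribute off the `cl` argument,
# falling back to a default when the attribute is absent.
get_future_seed <- function(cl, default = FALSE) {
  seed <- attr(cl, "future.seed", exact = TRUE)
  if (is.null(seed)) default else seed
}

cl_plain  <- "future"
cl_seeded <- structure("future", future.seed = TRUE)

get_future_seed(cl_plain)    # FALSE (the assumed default)
get_future_seed(cl_seeded)   # TRUE

# identical() compares attributes as well, so a check written as
# identical(cl, "future") would not match the attributed value.
identical(cl_seeded, "future")              # FALSE
identical(as.vector(cl_seeded), "future")   # TRUE: as.vector() drops attributes
```

One thing this sketch brings out is that any code dispatching on cl would need to ignore attributes when comparing it against "future".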
@psolymos
Owner

I like the attribute for the cl argument, but it might be a bit alien for some users. How about adding it to pboptions()? I.e. have it unset (NULL) on load, but check for the existence of the future.seed option and use that value.

@HenrikBengtsson
Author
HenrikBengtsson commented Dec 10, 2022

How about adding it to pboptions()?

This is something the developer should control in their code. I don't think it should be modifiable by the end-user via an option - that would give different results depending on the option's value, which is probably not what the developer intended.

@psolymos
Owner

I see the distinction. If the user is calling pb*apply(..., cl = "future") directly, they should be able to set it as an attribute, but if it is being used as part of another package, it is baked in.

@psolymos
Owner
psolymos commented Dec 10, 2022

One can pass the future.seed argument directly through ... because ?future.apply::future_lapply says:

For future_*apply() functions and replicate(), any future.* arguments part of \dots are passed on to future_lapply() used internally.

See:

r$> y <- pblapply(1:2, FUN = rnorm, cl = "future")
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Warning messages:
1: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_lapply-1’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 
2: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_lapply-2’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 

r$> y <- pblapply(1:2, FUN = rnorm, cl = "future", future.seed = TRUE)
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
# no warnings

So developers can utilize this behaviour to set the future seed.
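
To make the routing of arguments through ... more concrete, here is a base R sketch that separates future.* arguments from the ones meant for FUN. The helper split_dots() is hypothetical and only mimics the documented behaviour quoted above; it is not how future.apply or pbapply are implemented.

```r
# Hypothetical helper: split the arguments in `...` into those whose names
# start with "future." and everything else (intended for FUN).
split_dots <- function(...) {
  args <- list(...)
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))   # all arguments unnamed
  is_future <- startsWith(nms, "future.")
  list(future = args[is_future], fun = args[!is_future])
}

parts <- split_dots(mean = 10, sd = 0.1, future.seed = TRUE)
names(parts$future)   # "future.seed"
names(parts$fun)      # "mean" "sd"
```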

psolymos added a commit that referenced this issue Dec 10, 2022
Signed-off-by: Peter Solymos <psolymos@gmail.com>
@HenrikBengtsson
Author

So developers can utilize this behaviour to set the future seed.

Good point. Yes, that looks like the cleanest solution. Then a rule of thumb can be: pass any additional arguments to FUN immediately after the FUN argument, and any additional arguments to the futureverse after cl = "future":

y <- pblapply(1:2, FUN = my_fcn, {additional my_fcn args}, cl = "future", {additional future args})
