Code for On the Sample Complexity of Representation Learning in Multi-Task Bandits with Global and Local Structure
OSRL (Optimal Representation Learning in Multi-Task Bandits) is an algorithm that addresses the fixed-confidence sample-complexity problem in multi-task bandits. The accompanying paper was accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23).
Author: Alessio Russo
Besides the algorithm above, the code also implements KL-UCB [1] and D-Track-and-Stop [2], including a variant of D-Track-and-Stop with a challenger modification.
All the code is written in Python and C.
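As a quick illustration of the kind of index policy involved, the KL-UCB index [1] for Bernoulli rewards is the largest mean still compatible with the observed samples under a KL-divergence budget. The sketch below is a minimal standalone illustration, not the repository's implementation; the function names and the bisection tolerance are our own choices.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=0.0, iters=50):
    """Largest q >= mean with pulls * kl(mean, q) <= log(t) + c*log(log(t)),
    found by bisection on [mean, 1]."""
    budget = math.log(t) + c * math.log(max(math.log(t), 1.0))
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still statistically plausible
        else:
            hi = mid
    return lo
```

The index shrinks toward the empirical mean as the number of pulls grows, which is what drives exploration in KL-UCB.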
All experiments were executed on a desktop computer with an Intel Xeon Silver 4110 CPU and 48 GB of RAM, running Ubuntu 18.04. Ubuntu is an open-source operating system based on Debian and the Linux kernel; for more information, see https://ubuntu.com/.
We set up our experiments using the following software and libraries:
- Python 3.7.7
- Cython version 0.29.15
- NumPy version 1.18.1
- SciPy version 1.4.1
- PyTorch version 1.4.0
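For reference, an environment matching these versions could be set up roughly as follows (a sketch assuming the standard PyPI package names and a Python 3.7 interpreter on your PATH; exact commands may differ on your system):

```shell
# Create and activate an isolated environment (assumed layout)
python3.7 -m venv osrl-env
source osrl-env/bin/activate

# Pin the library versions listed above; "jupyter" is added for the notebooks
pip install cython==0.29.15 numpy==1.18.1 scipy==1.4.1 torch==1.4.0 jupyter
```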
All the code can be found in the folder `src`.
You can run sample simulations via the Jupyter notebooks located in the folder `notebooks`.
To run the notebooks, first install Jupyter. Then open a shell in the `notebooks` directory and run

    jupyter notebook

This opens the Jupyter interface, where you can select which notebook to run.
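Notebooks can also be executed non-interactively, which may be convenient for reproducing results on a headless machine. A possible command is shown below; the notebook filename is a placeholder, not an actual file from this repository:

```shell
# Execute a notebook in place without opening the browser UI
# (replace example.ipynb with one of the notebooks in the notebooks/ folder)
jupyter nbconvert --to notebook --execute --inplace example.ipynb
```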
This project is released under the MIT license.
[1] Garivier, Aurélien, and Olivier Cappé. "The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond." Proceedings of the 24th Annual Conference on Learning Theory, 2011.

[2] Garivier, Aurélien, and Emilie Kaufmann. "Optimal Best Arm Identification with Fixed Confidence." Conference on Learning Theory, 2016.