[go: up one dir, main page]

skip to main content
research-article
Open access

Dremel: Adaptive Configuration Tuning of RocksDB KV-Store

Published: 06 June 2022 Publication History

Abstract

LSM-tree-based key-value stores like RocksDB are widely used to support many applications. However, configuring a RocksDB instance is challenging for the following reasons: 1) RocksDB has a massive parameter space to configure; 2) there are inherent trade-offs and dependencies between parameters; 3) right configurations are dependent on workload and hardware; and 4) evaluating configurations is time-consuming. Prior works struggle with handling the curse of dimensionality, capturing relationships between parameters, adapting configurations to workload and hardware, and evaluating quickly. In this work, we present a system, Dremel, to adaptively and quickly configure RocksDB with strategies based on the Multi-Armed Bandit model. To handle the massive parameter space, we propose using fused features, which encode domain-specific knowledge, to work as a compact and powerful representation for configurations. To adapt to the workload and hardware, we build an online bandit model to identify the best configuration. To evaluate quickly, we enable multi-fidelity evaluation and upper-confidence-bound sampling to speed up identifying the best configuration. Dremel not only achieves up to ×2.61 higher IOPS and 57% less latency than default configurations but also achieves up to 63% improvements over prior works on 18 different settings with the same or less time budget.

References

[1]
Sami Alabed and Eiko Yoneki. 2021. High-Dimensional Bayesian Optimization with Multi-Task Learning for RocksDB. In Proceedings of the 1st Workshop on Machine Learning and Systems . 111--119.
[2]
Apache. 2021. Flink . https://flink.apache.org/.
[3]
Apache. 2022. Cassandra . https://cassandra.apache.org/.
[4]
Jean-Yves Audibert, Sébastien Bubeck, and Rémi Munos. 2010. Best arm identification in multi-armed bandits. In COLT. Citeseer, 41--53.
[5]
Peter Auer. 2002. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, Vol. 3, Nov (2002), 397--422.
[6]
Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. 2020. BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization. In Advances in Neural Information Processing Systems 33 . http://arxiv.org/abs/1910.06403
[7]
Oana Balmau, Florin Dinu, Willy Zwaenepoel, Karan Gupta, Ravishankar Chandhiramoorthi, and Diego Didona. 2019. SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores. In 2019 USENIX Annual Technical Conference (USENIX ATC 19) . USENIX Association, Renton, WA, 753--766. https://www.usenix.org/conference/atc19/presentation/balmau
[8]
James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. Journal of machine learning research, Vol. 13, 2 (2012).
[9]
Burton H Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, Vol. 13, 7 (1970), 422--426.
[10]
Sébastien Bubeck and Nicolo Cesa-Bianchi. 2012. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721 (2012).
[11]
Sébastien Bubeck, Rémi Munos, and Gilles Stoltz. 2009. Pure exploration in multi-armed bandits problems. In International conference on Algorithmic learning theory. Springer, 23--37.
[12]
Zhichao Cao, Siying Dong, Sagar Vemuri, and David HC Du. 2020. Characterizing, modeling, and benchmarking rocksdb key-value workloads at facebook. In 18th USENIX Conference on File and Storage Technologies (FAST 20). 209--223.
[13]
Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John Lui, Don Towsley, et almbox. 2021. Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback. Advances in Neural Information Processing Systems, Vol. 34 (2021).
[14]
Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM symposium on Cloud computing. 143--154.
[15]
Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2020. From wisckey to bourbon: A learned index for log-structured merge trees. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) . 155--171.
[16]
Valentin Dalibard, Michael Schaarschmidt, and Eiko Yoneki. 2017. BOAT: Building auto-tuners with structured Bayesian optimization. In Proceedings of the 26th International Conference on World Wide Web. 479--488.
[17]
Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. Monkey: Optimal navigable key-value store. In Proceedings of the 2017 ACM International Conference on Management of Data. 79--94.
[18]
Niv Dayan and Stratos Idreos. 2018. Dostoevsky: Better space-time trade-offs for LSM-tree based key-value stores via adaptive removal of superfluous merging. In Proceedings of the 2018 International Conference on Management of Data. 505--520.
[19]
Niv Dayan and Stratos Idreos. 2019. The log-structured merge-bush & the wacky continuum. In Proceedings of the 2019 International Conference on Management of Data. 449--466.
[20]
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: Amazon's highly available key-value store. ACM SIGOPS operating systems review, Vol. 41, 6 (2007), 205--220.
[21]
Rémy Degenne and Vianney Perchet. 2016. Anytime optimal algorithms in stochastic multi-armed bandits. In International Conference on Machine Learning. PMLR, 1587--1595.
[22]
Diego Didona, Nikolas Ioannou, Radu Stoica, and Kornilios Kourtis. 2020. Toward a better understanding and evaluation of tree structures on flash ssds. arXiv preprint arXiv:2006.04658 (2020).
[23]
Tobias Domhan, Jost Tobias Springenberg, and Frank Hutter. 2015. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In Twenty-fourth international joint conference on artificial intelligence .
[24]
Siying Dong, Mark Callaghan, Leonidas Galanis, Dhruba Borthakur, Tony Savor, and Michael Strum. 2017. Optimizing Space Amplification in RocksDB. In CIDR, Vol. 3. 3.
[25]
Siying Dong, Andrew Kryczka, Yanqin Jin, and Michael Stumm. 2021. Evolution of Development Priorities in Key-value Stores Serving Large-scale Applications: The RocksDB Experience. In 19th USENIX Conference on File and Storage Technologies (FAST 21). 33--49.
[26]
Yihan Du, Siwei Wang, Zhixuan Fang, and Longbo Huang. 2021. Continuous Mean-Covariance Bandits. Advances in Neural Information Processing Systems, Vol. 34 (2021).
[27]
Eyal Even-Dar, Shie Mannor, Yishay Mansour, and Sridhar Mahadevan. 2006. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems. Journal of machine learning research, Vol. 7, 6 (2006).
[28]
Facebook. 2021. RocksDB . https://github.com/facebook/rocksdb .
[29]
Jacob R Gardner, Geoff Pleiss, David Bindel, Kilian Q Weinberger, and Andrew Gordon Wilson. 2018. Gpytorch: Blackbox matrix-matrix gaussian process inference with gpu acceleration. arXiv preprint arXiv:1809.11165 (2018).
[30]
Arnob Ghosh and Vaneet Aggarwal. 2020. Model free reinforcement learning algorithm for stationary mean field equilibrium for multiple types of agents. arXiv preprint arXiv:2012.15377 (2020).
[31]
Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and David Sculley. 2017. Google vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining . 1487--1495.
[32]
Google. 2021. LevelDB . https://github.com/google/leveldb/.
[33]
Joseph M Hellerstein, Michael Stonebraker, and James Hamilton. 2007. Architecture of a database system .Now Publishers Inc.
[34]
Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, et almbox. 2020. TiDB: a Raft-based HTAP database. Proceedings of the VLDB Endowment, Vol. 13, 12 (2020), 3072--3084.
[35]
Gui Huang, Xuntao Cheng, Jianying Wang, Yujie Wang, Dengcheng He, Tieying Zhang, Feifei Li, Sheng Wang, Wei Cao, and Qiang Li. 2019. X-Engine: An optimized storage engine for large-scale E-commerce transaction processing. In Proceedings of the 2019 International Conference on Management of Data. 651--665.
[36]
Haoyu Huang and Shahram Ghandeharizadeh. 2021. Nova-LSM: A Distributed, Component-based LSM-tree Key-value Store. In Proceedings of the 2021 International Conference on Management of Data. 749--763.
[37]
Stratos Idreos, Manos Athanassoulis, Niv Dayan, Demi Guo, Mike S Kester, Lukas Maas, and Kostas Zoumpatianos. 2015. Past and future steps for adaptive storage data systems: From shallow to deep adaptivity. In Real-Time Business Intelligence and Analytics. Springer, 85--94.
[38]
Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, et almbox. 2019. Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn. In CIDR .
[39]
Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S Kester, and Demi Guo. 2018. The data calculator: Data structure design and cost synthesis from first principles and learned cost models. In Proceedings of the 2018 International Conference on Management of Data. 535--550.
[40]
Yichen Jia and Feng Chen. 2020. Kill Two Birds with One Stone: Auto-tuning RocksDB for High Bandwidth and Low Latency. In 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS) . IEEE, 652--664.
[41]
Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, and Arvind Krishnamurthy. 2021. AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly. arXiv preprint arXiv:2105.10762 (2021).
[42]
Kirthevasan Kandasamy, Gautam Dasarathy, Junier Oliva, Jeff Schneider, and Barnabás Póczos. 2016a. Gaussian process optimisation with multi-fidelity evaluations. In Proceedings of the 30th/International Conference on Advances in Neural Information Processing Systems (NIPS'30) .
[43]
Kirthevasan Kandasamy, Gautam Dasarathy, Barnabas Poczos, and Jeff Schneider. 2016b. The multi-fidelity multi-armed bandit. Advances in neural information processing systems, Vol. 29 (2016), 1777--1785.
[44]
Zohar Karnin, Tomer Koren, and Oren Somekh. 2013. Almost optimal exploration in multi-armed bandits. In International Conference on Machine Learning . PMLR, 1238--1246.
[45]
Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. 2017. Fast bayesian optimization of machine learning hyperparameters on large datasets. In Artificial Intelligence and Statistics. PMLR, 528--536.
[46]
Aaron Klein, Stefan Falkner, Jost Tobias Springenberg, and Frank Hutter. 2016. Learning curve prediction with Bayesian neural networks. (2016).
[47]
Avinash Lakshman and Prashant Malik. 2010. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, Vol. 44, 2 (2010), 35--40.
[48]
Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. 2019. Kvell: the design and implementation of a fast persistent key-value store. In Proceedings of the 27th ACM Symposium on Operating Systems Principles . 447--461.
[49]
Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2019. Qtune: A query-aware database tuning system with deep reinforcement learning. Proceedings of the VLDB Endowment, Vol. 12, 12 (2019), 2118--2130.
[50]
Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. 2017. Hyperband: A novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research, Vol. 18, 1 (2017), 6765--6816.
[51]
Hyeontaek Lim, David G Andersen, and Michael Kaminsky. 2016. Towards accurate and fast evaluation of multi-stage log-structured designs. In 14th USENIX Conference on File and Storage Technologies (FAST 16) . 149--166.
[52]
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Hariharan Gopalakrishnan, Andrea C Arpaci-Dusseau, and Remzi H Arpaci-Dusseau. 2017. Wisckey: Separating keys from values in ssd-conscious storage. ACM Transactions on Storage (TOS), Vol. 13, 1 (2017), 1--28.
[53]
Chen Luo and Michael J Carey. 2019. On performance stability in LSM-based storage systems (extended version). arXiv preprint arXiv:1906.09667 (2019).
[54]
Chen Luo and Michael J Carey. 2020. Breaking down memory walls: adaptive memory management in LSM-based storage systems. Proceedings of the VLDB Endowment, Vol. 14, 3 (2020), 241--254.
[55]
Jaehong Min, Ming Liu, Tapan Chugh, Chenxingyu Zhao, Andrew Wei, In Hwan Doh, and Arvind Krishnamurthy. 2021. Gimbal: Enabling Multi-Tenant Storage Disaggregation on SmartNIC JBOFs. In Proceedings of the 2021 ACM SIGCOMM 2021 Conference (Virtual Event, USA) (SIGCOMM '21). Association for Computing Machinery, New York, NY, USA, 106--122. https://doi.org/10.1145/3452296.3472940
[56]
MongoDB. 2021. WiredTiger . https://github.com/wiredtiger/wiredtiger .
[57]
Conor Newton, Ayalvadi Ganesh, and Henry Reeve. 2022. Asymptotic Optimality for Decentralised Bandits. SIGMETRICS Perform. Eval. Rev., Vol. 49, 2 (jan 2022), 51--53. https://doi.org/10.1145/3512798.3512817
[58]
Gijun Oh, Junseok Yang, and Sungyong Ahn. 2021. Efficient Key-Value Data Placement for ZNS SSD. Applied Sciences, Vol. 11, 24 (2021), 11842.
[59]
Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Informatica, Vol. 33, 4 (1996), 351--385.
[60]
Fengfeng Pan, Yinliang Yue, and Jin Xiong. 2017. dCompaction: Delayed compaction for the LSM-tree. International Journal of Parallel Programming, Vol. 45, 6 (2017), 1310--1325.
[61]
Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, and Ittai Abraham. 2017. Pebblesdb: Building key-value stores using fragmented log-structured merge trees. In Proceedings of the 26th Symposium on Operating Systems Principles. 497--514.
[62]
Herbert Robbins. 1952. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., Vol. 58, 5 (1952), 527--535.
[63]
Abishek Sankararaman, Ayalvadi Ganesh, and Sanjay Shakkottai. 2019. Social Learning in Multi Agent Multi Armed Bandits. Proc. ACM Meas. Anal. Comput. Syst., Vol. 3, 3, Article 53 (dec 2019), bibinfonumpages35 pages. https://doi.org/10.1145/3366701
[64]
Subhadeep Sarkar, Dimitris Staratzis, Ziehen Zhu, and Manos Athanassoulis. 2021. Constructing and analyzing the LSM compaction design space. Proceedings of the VLDB Endowment, Vol. 14, 11 (2021), 2216--2229.
[65]
Niranjan Srinivas, Andreas Krause, Sham M Kakade, and Matthias Seeger. 2009. Gaussian process optimization in the bandit setting: No regret and experimental design. arXiv preprint arXiv:0912.3995 (2009).
[66]
Jian Tan, Tieying Zhang, Feifei Li, Jie Chen, Qixing Zheng, Ping Zhang, Honglin Qiao, Yue Shi, Wei Cao, and Rui Zhang. 2019. ibtune: Individualized buffer tuning for large-scale cloud databases. Proceedings of the VLDB Endowment, Vol. 12, 10 (2019), 1221--1234.
[67]
Dana Van Aken, Andrew Pavlo, Geoffrey J Gordon, and Bohan Zhang. 2017. Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data . 1009--1024.
[68]
Ting Yao, Yiwen Zhang, Jiguang Wan, Qiu Cui, Liu Tang, Hong Jiang, Changsheng Xie, and Xubin He. 2020. MatrixKV: Reducing Write Stalls and Write Amplification in LSM-tree Based KV Stores with Matrix Container in NVM. In 2020 USENIX Annual Technical Conference (USENIXATC 20) . 17--31.
[69]
Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, Zhili Xiao, Bin Cheng, Jiashu Xing, Yangtao Wang, Tianheng Cheng, Li Liu, et almbox. 2019. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In Proceedings of the 2019 International Conference on Management of Data. 415--432.
[70]
Xinyi Zhang, Hong Wu, Zhuo Chang, Shuowei Jin, Jian Tan, Feifei Li, Tieying Zhang, and Bin Cui. 2021. Restune: Resource oriented tuning boosted by meta-learning for cloud databases. In Proceedings of the 2021 International Conference on Management of Data . 2102--2114.

Cited By

View all
  • (2024)LavaStore: ByteDance's Purpose-Built, High-Performance, Cost-Effective Local Storage Engine for Cloud ServicesProceedings of the VLDB Endowment10.14778/3685800.368580717:12(3799-3812)Online publication date: 1-Aug-2024
  • (2024)VDTuner: Automated Performance Tuning for Vector Data Management Systems2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00332(4357-4369)Online publication date: 13-May-2024
  • (2022)DremelACM SIGMETRICS Performance Evaluation Review10.1145/3547353.353097050:1(61-62)Online publication date: 7-Jul-2022

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Measurement and Analysis of Computing Systems
Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 6, Issue 2
POMACS
June 2022
499 pages
EISSN:2476-1249
DOI:10.1145/3543145
Issue’s Table of Contents
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2022
Published in POMACS Volume 6, Issue 2

Check for updates

Author Tags

  1. configuration management
  2. multi-armed bandit
  3. rocksdb

Qualifiers

  • Research-article

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)346
  • Downloads (Last 6 weeks)36
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)LavaStore: ByteDance's Purpose-Built, High-Performance, Cost-Effective Local Storage Engine for Cloud ServicesProceedings of the VLDB Endowment10.14778/3685800.368580717:12(3799-3812)Online publication date: 1-Aug-2024
  • (2024)VDTuner: Automated Performance Tuning for Vector Data Management Systems2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00332(4357-4369)Online publication date: 13-May-2024
  • (2022)DremelACM SIGMETRICS Performance Evaluation Review10.1145/3547353.353097050:1(61-62)Online publication date: 7-Jul-2022

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media