research-article

Open access

Dremel: Adaptive Configuration Tuning of RocksDB KV-Store

Authors:

Chenxingyu Zhao,

Arvind KrishnamurthyAuthors Info & Claims

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 2

Article No.: 37, Pages 1 - 30

https://doi.org/10.1145/3530903

Published: 06 June 2022 Publication History

Abstract

LSM-tree-based key-value stores like RocksDB are widely used to support many applications. However, configuring a RocksDB instance is challenging for the following reasons: 1) RocksDB has a massive parameter space to configure; 2) there are inherent trade-offs and dependencies between parameters; 3) right configurations are dependent on workload and hardware; and 4) evaluating configurations is time-consuming. Prior works struggle with handling the curse of dimensionality, capturing relationships between parameters, adapting configurations to workload and hardware, and evaluating quickly. In this work, we present a system, Dremel, to adaptively and quickly configure RocksDB with strategies based on the Multi-Armed Bandit model. To handle the massive parameter space, we propose using fused features, which encode domain-specific knowledge, to work as a compact and powerful representation for configurations. To adapt to the workload and hardware, we build an online bandit model to identify the best configuration. To evaluate quickly, we enable multi-fidelity evaluation and upper-confidence-bound sampling to speed up identifying the best configuration. Dremel not only achieves up to ×2.61 higher IOPS and 57% less latency than default configurations but also achieves up to 63% improvements over prior works on 18 different settings with the same or less time budget.

References

[1]

Sami Alabed and Eiko Yoneki. 2021. High-Dimensional Bayesian Optimization with Multi-Task Learning for RocksDB. In Proceedings of the 1st Workshop on Machine Learning and Systems . 111--119.

Digital Library

[2]

Apache. 2021. Flink . https://flink.apache.org/.

[3]

Apache. 2022. Cassandra . https://cassandra.apache.org/.

[4]

Jean-Yves Audibert, Sébastien Bubeck, and Rémi Munos. 2010. Best arm identification in multi-armed bandits. In COLT. Citeseer, 41--53.

[5]

Peter Auer. 2002. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, Vol. 3, Nov (2002), 397--422.

Digital Library

[6]

Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. 2020. BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization. In Advances in Neural Information Processing Systems 33 . http://arxiv.org/abs/1910.06403

[7]

Oana Balmau, Florin Dinu, Willy Zwaenepoel, Karan Gupta, Ravishankar Chandhiramoorthi, and Diego Didona. 2019. SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores. In 2019 USENIX Annual Technical Conference (USENIX ATC 19) . USENIX Association, Renton, WA, 753--766. https://www.usenix.org/conference/atc19/presentation/balmau

[8]

James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. Journal of machine learning research, Vol. 13, 2 (2012).

[9]

Burton H Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, Vol. 13, 7 (1970), 422--426.

Digital Library

[10]

Sébastien Bubeck and Nicolo Cesa-Bianchi. 2012. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. arXiv preprint arXiv:1204.5721 (2012).

[11]

Sébastien Bubeck, Rémi Munos, and Gilles Stoltz. 2009. Pure exploration in multi-armed bandits problems. In International conference on Algorithmic learning theory. Springer, 23--37.

Digital Library

[12]

Zhichao Cao, Siying Dong, Sagar Vemuri, and David HC Du. 2020. Characterizing, modeling, and benchmarking rocksdb key-value workloads at facebook. In 18th USENIX Conference on File and Storage Technologies (FAST 20). 209--223.

Digital Library

[13]

Yu-Zhen Janice Chen, Stephen Pasteris, Mohammad Hajiesmaili, John Lui, Don Towsley, et almbox. 2021. Cooperative Stochastic Bandits with Asynchronous Agents and Constrained Feedback. Advances in Neural Information Processing Systems, Vol. 34 (2021).

[14]

Brian F Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM symposium on Cloud computing. 143--154.

Digital Library

[15]

Yifan Dai, Yien Xu, Aishwarya Ganesan, Ramnatthan Alagappan, Brian Kroth, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. 2020. From wisckey to bourbon: A learned index for log-structured merge trees. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20) . 155--171.

[16]

Valentin Dalibard, Michael Schaarschmidt, and Eiko Yoneki. 2017. BOAT: Building auto-tuners with structured Bayesian optimization. In Proceedings of the 26th International Conference on World Wide Web. 479--488.

Digital Library

[17]

Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. Monkey: Optimal navigable key-value store. In Proceedings of the 2017 ACM International Conference on Management of Data. 79--94.

Digital Library

[18]

Niv Dayan and Stratos Idreos. 2018. Dostoevsky: Better space-time trade-offs for LSM-tree based key-value stores via adaptive removal of superfluous merging. In Proceedings of the 2018 International Conference on Management of Data. 505--520.

Digital Library

[19]

Niv Dayan and Stratos Idreos. 2019. The log-structured merge-bush & the wacky continuum. In Proceedings of the 2019 International Conference on Management of Data. 449--466.

Digital Library

[20]

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: Amazon's highly available key-value store. ACM SIGOPS operating systems review, Vol. 41, 6 (2007), 205--220.

Digital Library

[21]

Rémy Degenne and Vianney Perchet. 2016. Anytime optimal algorithms in stochastic multi-armed bandits. In International Conference on Machine Learning. PMLR, 1587--1595.

[22]

Diego Didona, Nikolas Ioannou, Radu Stoica, and Kornilios Kourtis. 2020. Toward a better understanding and evaluation of tree structures on flash ssds. arXiv preprint arXiv:2006.04658 (2020).

[23]

Tobias Domhan, Jost Tobias Springenberg, and Frank Hutter. 2015. Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves. In Twenty-fourth international joint conference on artificial intelligence .

Digital Library

[24]

Siying Dong, Mark Callaghan, Leonidas Galanis, Dhruba Borthakur, Tony Savor, and Michael Strum. 2017. Optimizing Space Amplification in RocksDB. In CIDR, Vol. 3. 3.

[25]

Siying Dong, Andrew Kryczka, Yanqin Jin, and Michael Stumm. 2021. Evolution of Development Priorities in Key-value Stores Serving Large-scale Applications: The RocksDB Experience. In 19th USENIX Conference on File and Storage Technologies (FAST 21). 33--49.

Digital Library

[26]

Yihan Du, Siwei Wang, Zhixuan Fang, and Longbo Huang. 2021. Continuous Mean-Covariance Bandits. Advances in Neural Information Processing Systems, Vol. 34 (2021).

[27]

Eyal Even-Dar, Shie Mannor, Yishay Mansour, and Sridhar Mahadevan. 2006. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems. Journal of machine learning research, Vol. 7, 6 (2006).

[28]

Facebook. 2021. RocksDB . https://github.com/facebook/rocksdb .

[29]

Jacob R Gardner, Geoff Pleiss, David Bindel, Kilian Q Weinberger, and Andrew Gordon Wilson. 2018. Gpytorch: Blackbox matrix-matrix gaussian process inference with gpu acceleration. arXiv preprint arXiv:1809.11165 (2018).

[30]

Arnob Ghosh and Vaneet Aggarwal. 2020. Model free reinforcement learning algorithm for stationary mean field equilibrium for multiple types of agents. arXiv preprint arXiv:2012.15377 (2020).

[31]

Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and David Sculley. 2017. Google vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining . 1487--1495.

Digital Library

[32]

Google. 2021. LevelDB . https://github.com/google/leveldb/.

[33]

Joseph M Hellerstein, Michael Stonebraker, and James Hamilton. 2007. Architecture of a database system .Now Publishers Inc.

Digital Library

[34]

Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, et almbox. 2020. TiDB: a Raft-based HTAP database. Proceedings of the VLDB Endowment, Vol. 13, 12 (2020), 3072--3084.

Digital Library

[35]

Gui Huang, Xuntao Cheng, Jianying Wang, Yujie Wang, Dengcheng He, Tieying Zhang, Feifei Li, Sheng Wang, Wei Cao, and Qiang Li. 2019. X-Engine: An optimized storage engine for large-scale E-commerce transaction processing. In Proceedings of the 2019 International Conference on Management of Data. 651--665.

Digital Library

[36]

Haoyu Huang and Shahram Ghandeharizadeh. 2021. Nova-LSM: A Distributed, Component-based LSM-tree Key-value Store. In Proceedings of the 2021 International Conference on Management of Data. 749--763.

Digital Library

[37]

Stratos Idreos, Manos Athanassoulis, Niv Dayan, Demi Guo, Mike S Kester, Lukas Maas, and Kostas Zoumpatianos. 2015. Past and future steps for adaptive storage data systems: From shallow to deep adaptivity. In Real-Time Business Intelligence and Analytics. Springer, 85--94.

[38]

Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, et almbox. 2019. Design Continuums and the Path Toward Self-Designing Key-Value Stores that Know and Learn. In CIDR .

[39]

Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S Kester, and Demi Guo. 2018. The data calculator: Data structure design and cost synthesis from first principles and learned cost models. In Proceedings of the 2018 International Conference on Management of Data. 535--550.

Digital Library

[40]

Yichen Jia and Feng Chen. 2020. Kill Two Birds with One Stone: Auto-tuning RocksDB for High Bandwidth and Low Latency. In 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS) . IEEE, 652--664.

[41]

Yuchen Jin, Tianyi Zhou, Liangyu Zhao, Yibo Zhu, Chuanxiong Guo, Marco Canini, and Arvind Krishnamurthy. 2021. AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly. arXiv preprint arXiv:2105.10762 (2021).

[42]

Kirthevasan Kandasamy, Gautam Dasarathy, Junier Oliva, Jeff Schneider, and Barnabás Póczos. 2016a. Gaussian process optimisation with multi-fidelity evaluations. In Proceedings of the 30th/International Conference on Advances in Neural Information Processing Systems (NIPS'30) .

[43]

Kirthevasan Kandasamy, Gautam Dasarathy, Barnabas Poczos, and Jeff Schneider. 2016b. The multi-fidelity multi-armed bandit. Advances in neural information processing systems, Vol. 29 (2016), 1777--1785.

[44]

Zohar Karnin, Tomer Koren, and Oren Somekh. 2013. Almost optimal exploration in multi-armed bandits. In International Conference on Machine Learning . PMLR, 1238--1246.

[45]

Aaron Klein, Stefan Falkner, Simon Bartels, Philipp Hennig, and Frank Hutter. 2017. Fast bayesian optimization of machine learning hyperparameters on large datasets. In Artificial Intelligence and Statistics. PMLR, 528--536.

[46]

Aaron Klein, Stefan Falkner, Jost Tobias Springenberg, and Frank Hutter. 2016. Learning curve prediction with Bayesian neural networks. (2016).

[47]

Avinash Lakshman and Prashant Malik. 2010. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, Vol. 44, 2 (2010), 35--40.

Digital Library

[48]

Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. 2019. Kvell: the design and implementation of a fast persistent key-value store. In Proceedings of the 27th ACM Symposium on Operating Systems Principles . 447--461.

Digital Library

[49]

Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2019. Qtune: A query-aware database tuning system with deep reinforcement learning. Proceedings of the VLDB Endowment, Vol. 12, 12 (2019), 2118--2130.

Digital Library

[50]

Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. 2017. Hyperband: A novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research, Vol. 18, 1 (2017), 6765--6816.

Digital Library

[51]

Hyeontaek Lim, David G Andersen, and Michael Kaminsky. 2016. Towards accurate and fast evaluation of multi-stage log-structured designs. In 14th USENIX Conference on File and Storage Technologies (FAST 16) . 149--166.

Digital Library

[52]

Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Hariharan Gopalakrishnan, Andrea C Arpaci-Dusseau, and Remzi H Arpaci-Dusseau. 2017. Wisckey: Separating keys from values in ssd-conscious storage. ACM Transactions on Storage (TOS), Vol. 13, 1 (2017), 1--28.

Digital Library

[53]

Chen Luo and Michael J Carey. 2019. On performance stability in LSM-based storage systems (extended version). arXiv preprint arXiv:1906.09667 (2019).

[54]

Chen Luo and Michael J Carey. 2020. Breaking down memory walls: adaptive memory management in LSM-based storage systems. Proceedings of the VLDB Endowment, Vol. 14, 3 (2020), 241--254.

Digital Library

[55]

Jaehong Min, Ming Liu, Tapan Chugh, Chenxingyu Zhao, Andrew Wei, In Hwan Doh, and Arvind Krishnamurthy. 2021. Gimbal: Enabling Multi-Tenant Storage Disaggregation on SmartNIC JBOFs. In Proceedings of the 2021 ACM SIGCOMM 2021 Conference (Virtual Event, USA) (SIGCOMM '21). Association for Computing Machinery, New York, NY, USA, 106--122. https://doi.org/10.1145/3452296.3472940

Digital Library

[56]

MongoDB. 2021. WiredTiger . https://github.com/wiredtiger/wiredtiger .

[57]

Conor Newton, Ayalvadi Ganesh, and Henry Reeve. 2022. Asymptotic Optimality for Decentralised Bandits. SIGMETRICS Perform. Eval. Rev., Vol. 49, 2 (jan 2022), 51--53. https://doi.org/10.1145/3512798.3512817

Digital Library

[58]

Gijun Oh, Junseok Yang, and Sungyong Ahn. 2021. Efficient Key-Value Data Placement for ZNS SSD. Applied Sciences, Vol. 11, 24 (2021), 11842.

[59]

Patrick O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O'Neil. 1996. The log-structured merge-tree (LSM-tree). Acta Informatica, Vol. 33, 4 (1996), 351--385.

Digital Library

[60]

Fengfeng Pan, Yinliang Yue, and Jin Xiong. 2017. dCompaction: Delayed compaction for the LSM-tree. International Journal of Parallel Programming, Vol. 45, 6 (2017), 1310--1325.

Digital Library

[61]

Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, and Ittai Abraham. 2017. Pebblesdb: Building key-value stores using fragmented log-structured merge trees. In Proceedings of the 26th Symposium on Operating Systems Principles. 497--514.

Digital Library

[62]

Herbert Robbins. 1952. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., Vol. 58, 5 (1952), 527--535.

[63]

Abishek Sankararaman, Ayalvadi Ganesh, and Sanjay Shakkottai. 2019. Social Learning in Multi Agent Multi Armed Bandits. Proc. ACM Meas. Anal. Comput. Syst., Vol. 3, 3, Article 53 (dec 2019), bibinfonumpages35 pages. https://doi.org/10.1145/3366701

Digital Library

[64]

Subhadeep Sarkar, Dimitris Staratzis, Ziehen Zhu, and Manos Athanassoulis. 2021. Constructing and analyzing the LSM compaction design space. Proceedings of the VLDB Endowment, Vol. 14, 11 (2021), 2216--2229.

Digital Library

[65]

Niranjan Srinivas, Andreas Krause, Sham M Kakade, and Matthias Seeger. 2009. Gaussian process optimization in the bandit setting: No regret and experimental design. arXiv preprint arXiv:0912.3995 (2009).

[66]

Jian Tan, Tieying Zhang, Feifei Li, Jie Chen, Qixing Zheng, Ping Zhang, Honglin Qiao, Yue Shi, Wei Cao, and Rui Zhang. 2019. ibtune: Individualized buffer tuning for large-scale cloud databases. Proceedings of the VLDB Endowment, Vol. 12, 10 (2019), 1221--1234.

Digital Library

[67]

Dana Van Aken, Andrew Pavlo, Geoffrey J Gordon, and Bohan Zhang. 2017. Automatic database management system tuning through large-scale machine learning. In Proceedings of the 2017 ACM International Conference on Management of Data . 1009--1024.

Digital Library

[68]

Ting Yao, Yiwen Zhang, Jiguang Wan, Qiu Cui, Liu Tang, Hong Jiang, Changsheng Xie, and Xubin He. 2020. MatrixKV: Reducing Write Stalls and Write Amplification in LSM-tree Based KV Stores with Matrix Container in NVM. In 2020 USENIX Annual Technical Conference (USENIXATC 20) . 17--31.

[69]

Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, Zhili Xiao, Bin Cheng, Jiashu Xing, Yangtao Wang, Tianheng Cheng, Li Liu, et almbox. 2019. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In Proceedings of the 2019 International Conference on Management of Data. 415--432.

Digital Library

[70]

Xinyi Zhang, Hong Wu, Zhuo Chang, Shuowei Jin, Jian Tan, Feifei Li, Tieying Zhang, and Bin Cui. 2021. Restune: Resource oriented tuning boosted by meta-learning for cloud databases. In Proceedings of the 2021 International Conference on Management of Data . 2102--2114.

Digital Library

Cited By

Wang HOu JZhao MQiu SJiao YWang YMao QYang ZLiu YZhang JHu JZhang JLiu JChen JShen YCao LZhang HLi HLi MMa YZhang LLiu JZhang GLiu FChen J(2024)LavaStore: ByteDance's Purpose-Built, High-Performance, Cost-Effective Local Storage Engine for Cloud ServicesProceedings of the VLDB Endowment10.14778/3685800.368580717:12(3799-3812)Online publication date: 1-Aug-2024
https://dl.acm.org/doi/10.14778/3685800.3685807
Yang THu WPeng WLi YLi JWang GLiu X(2024)VDTuner: Automated Performance Tuning for Vector Data Management Systems2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00332(4357-4369)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00332
Zhao CChugh TMin JLiu MKrishnamurthy A(2022)DremelACM SIGMETRICS Performance Evaluation Review10.1145/3547353.353097050:1(61-62)Online publication date: 7-Jul-2022
https://dl.acm.org/doi/10.1145/3547353.3530970

Index Terms

Dremel: Adaptive Configuration Tuning of RocksDB KV-Store
1. Computing methodologies
  1. Machine learning
    1. Learning settings
      1. Active learning settings
  2. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs
        Key-value stores

Recommendations

Disaggregating RocksDB: A Production Experience
PACMMOD

As in the general industry, there is a trend in Meta's data centers to migrate data from locally attached SSDs to cloud storage. We extended RocksDB [26], a widely used open-source storage engine designed and built for local SSDs, to leverage ...
Dremel: Adaptive Configuration Tuning of RocksDB KV-Store
SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems

LSM-tree-based key-value stores like RocksDB are widely used to support many applications. However, configuring a RocksDB instance is challenging for the following reasons: 1) RocksDB has a massive parameter space to configure; 2) there are inherent ...
RocksDB: Evolution of Development Priorities in a Key-value Store Serving Large-scale Applications
This article is an eight-year retrospective on development priorities for RocksDB, a key-value store developed at Facebook that targets large-scale distributed systems and that is optimized for Solid State Drives (SSDs). We describe how the priorities ...

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Measurement and Analysis of Computing Systems

Proceedings of the ACM on Measurement and Analysis of Computing Systems Volume 6, Issue 2

POMACS

June 2022

499 pages

EISSN:2476-1249

DOI:10.1145/3543145

Editors:
Augustin Chaintreau
Columbia University
,
Leana Golubchik
University of Southern California
,
Zhi-Li Zhang
University of Minnesota

Issue’s Table of Contents

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 June 2022

Published in POMACS Volume 6, Issue 2

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Cisco Systems

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

3
Total Citations
View Citations
861
Total Downloads

Downloads (Last 12 months)346
Downloads (Last 6 weeks)36

Reflects downloads up to 16 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Wang HOu JZhao MQiu SJiao YWang YMao QYang ZLiu YZhang JHu JZhang JLiu JChen JShen YCao LZhang HLi HLi MMa YZhang LLiu JZhang GLiu FChen J(2024)LavaStore: ByteDance's Purpose-Built, High-Performance, Cost-Effective Local Storage Engine for Cloud ServicesProceedings of the VLDB Endowment10.14778/3685800.368580717:12(3799-3812)Online publication date: 1-Aug-2024
https://dl.acm.org/doi/10.14778/3685800.3685807
Yang THu WPeng WLi YLi JWang GLiu X(2024)VDTuner: Automated Performance Tuning for Vector Data Management Systems2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00332(4357-4369)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00332
Zhao CChugh TMin JLiu MKrishnamurthy A(2022)DremelACM SIGMETRICS Performance Evaluation Review10.1145/3547353.353097050:1(61-62)Online publication date: 7-Jul-2022
https://dl.acm.org/doi/10.1145/3547353.3530970

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents