Bringing Order to Chaos: Barrier-Enabled I/O Stack for Flash Storage

Published: 03 October 2018

Abstract

This work is dedicated to eliminating the overhead required to guarantee the storage order in the modern IO stack. The existing block device adopts a prohibitively expensive approach to ensuring the storage order among write requests: interleaving the write requests with Transfer-and-Flush. To exploit the cache barrier command for flash storage, we overhaul the IO scheduler, the dispatch module, and the filesystem so that these layers are orchestrated to preserve the ordering condition imposed by the application, that is, the order in which the associated data blocks are made durable. The key ingredients of the Barrier-Enabled IO stack are Epoch-based IO scheduling, Order-Preserving Dispatch, and Dual-Mode Journaling. The Barrier-Enabled IO stack can control the storage order without the Transfer-and-Flush overhead. We implement the Barrier-Enabled IO stack on server as well as mobile platforms. SQLite performance increases by 270% on the server and by 75% on the smartphone. On server storage, BarrierFS achieves performance gains of as much as 43× in MySQL and 73× in SQLite over EXT4 by relaxing the durability of a transaction.
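
To make the Transfer-and-Flush cost concrete, the following is a minimal sketch of the conventional way an application enforces storage order today, assuming a POSIX-like system. It is not code from the paper: the helper names `write_all` and `ordered_append` and the file name in the usage note are hypothetical. On a typical Linux filesystem, each `os.fsync` call blocks until the data has been transferred and the device cache has been flushed, so the second write cannot even be issued until the first is durable.

```python
import os


def write_all(fd, data):
    """Write every byte of `data`, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def ordered_append(path, first, second):
    """Append `first`, then `second`, so that `second` never reaches
    storage ahead of `first`.

    Ordering is obtained only as a side effect of durability: the
    caller stalls in fsync() between the two writes.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        write_all(fd, first)
        os.fsync(fd)  # wait for the transfer and the cache flush
        write_all(fd, second)
        os.fsync(fd)  # make the second write durable as well
    finally:
        os.close(fd)
```

For example, `ordered_append("journal.log", b"journal-record\n", b"commit-record\n")` guarantees that the commit record is not persisted before the journal record, at the price of one blocking flush per ordering point. The abstract's claim is that a barrier-enabled stack can keep the same ordering guarantee without that wait.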

Cited By

  • (2022) PM-AIO: An Effective Asynchronous I/O System for Persistent Memory. IEEE Transactions on Emerging Topics in Computing 10:3 (1558-1574). DOI: 10.1109/TETC.2021.3109047. Online publication date: 1-Jul-2022
  • (2021) Better atomic writes by exposing the flash out-of-band area to file systems. Proceedings of the 22nd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (12-23). DOI: 10.1145/3461648.3463843. Online publication date: 22-Jun-2021

Published In

ACM Transactions on Storage, Volume 14, Issue 3
Special Issue on FAST 2018 and Regular Papers
August 2018
210 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/3282875
  • Editor: Sam H. Noh
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 October 2018
Accepted: 01 July 2018
Received: 01 June 2018
Published in TOS Volume 14, Issue 3

Author Tags

  1. Filesystem
  2. block device
  3. linux
  4. storage

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Future OS project (IITP)
  • BK21 plus (NRF)
  • ICT R&D program (IITP)
  • Basic Research Lab Program (NRF)
