CN113779014A - A data storage method, apparatus, device and storage medium - Google Patents
- Publication number
- CN113779014A (application number CN202010522791.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- prefix
- file
- split
- files
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention provide a data storage method, an apparatus, an electronic device, and a computer storage medium. The method includes: splitting, by prefix, a data file in a database that contains at least two different prefixes, to obtain at least two split data files, where each split data file corresponds to one of the at least two different prefixes and the data file is a file sorted by key; and performing data processing on the at least two split data files. Because each split data file corresponds to only one prefix, and no data file corresponds to two or more prefixes, the relevant data files can be processed directly by prefix, which reduces the overhead of data processing.
Description
Technical Field
The present invention relates to information technology, and in particular to a data storage method, an apparatus, an electronic device, and a computer storage medium.
Background
In the related art, in the field of data storage, when an upper-layer service shards its data, the data in the Sorted Sequence Table (SST) files of a RocksDB database is stored in order, according to the sort order of the keys and the size of the key-value data. As a result, some SST files contain data from multiple data shards. When a shard-level operation is performed on such SST files, the shard that needs to be processed cannot be handled directly.
Summary of the Invention
Embodiments of the present invention provide a data storage method, an apparatus, an electronic device, and a computer storage medium, which can solve the problem that some SST files contain data from multiple data shards, so that shard-level operations incur high overhead.
An embodiment of the present invention provides a data storage method, the method including:
splitting, by prefix, a data file in a database that contains at least two different prefixes, to obtain at least two split data files, where each split data file corresponds to one of the at least two different prefixes, and the data file is a file sorted by key; and
performing data processing on the at least two split data files.
Optionally, splitting, by prefix, the data file in the database that contains at least two different prefixes, to obtain at least two split data files, includes:
splitting the data file that contains at least two different prefixes, in the order of the at least two different prefixes, into the at least two split data files arranged in order.
Optionally, performing data processing on the at least two split data files includes: deleting data from at least one of the at least two split data files, and/or inserting data with respect to the at least two split data files.
Optionally, deleting data from at least one of the at least two split data files includes:
when a data deletion instruction containing a first prefix is received, and the first prefix includes the prefix of at least one of the at least two split data files, deleting, from the at least two split data files, the data file corresponding to the first prefix.
Optionally, inserting data with respect to the at least two split data files includes:
when a data insertion instruction containing a data file to be inserted is received, and the data file to be inserted contains a second prefix, determining a data insertion position according to the second prefix, and inserting the data file to be inserted at the determined insertion position.
Optionally, determining the data insertion position according to the second prefix includes:
determining the data insertion position according to the second prefix and the prefix order of the at least two split data files.
Optionally, the data file is an SST file.
An embodiment of the present invention further provides a data storage apparatus, the apparatus including a splitting module and a processing module, where:
the splitting module is configured to split, by prefix, a data file in a database that contains at least two different prefixes, to obtain at least two split data files, where each split data file corresponds to one of the at least two different prefixes, and the data file is a file sorted by key; and
the processing module is configured to perform data processing on the at least two split data files.
Optionally, the splitting module is configured to split the data file that contains at least two different prefixes, in the order of the at least two different prefixes, into the at least two split data files arranged in order.
Optionally, the processing module is configured to delete data from at least one of the at least two split data files, and/or to insert data with respect to the at least two split data files.
Optionally, the processing module is configured to, when a data deletion instruction containing a first prefix is received and the first prefix includes the prefix of at least one of the at least two split data files, delete, from the at least two split data files, the data file corresponding to the first prefix.
Optionally, the processing module is configured to, when a data insertion instruction containing a data file to be inserted is received and the data file to be inserted contains a second prefix, determine a data insertion position according to the second prefix and insert the data file to be inserted at the determined insertion position.
Optionally, the processing module is configured to determine the data insertion position according to the second prefix and the prefix order of the at least two split data files.
Optionally, the data file is an SST file.
An embodiment of the present invention further provides an electronic device, including a processor and a memory for storing a computer program that can run on the processor, where
the processor is configured to perform any one of the above data storage methods when running the computer program.
An embodiment of the present invention further provides a computer storage medium storing a computer program that, when executed by a processor, implements any one of the data storage methods described above.
With the data storage method, apparatus, electronic device, and computer storage medium proposed by the embodiments of the present invention, a data file in a database that contains at least two different prefixes is split by prefix to obtain at least two split data files, where each split data file corresponds to one of the at least two different prefixes and the data file is a file sorted by key, and data processing is performed on the at least two split data files. Because each split data file corresponds to only one prefix, and no data file corresponds to two or more prefixes, the relevant data files can be processed directly by prefix, which reduces the overhead of data processing.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present invention.
Brief Description of the Drawings
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the technical solutions of the present invention.
Figure 1 is a schematic diagram of how SST files are split in RocksDB in the related art;
Figure 2 is a schematic structural diagram of the Log Structured Merge (LSM) layering mechanism of RocksDB in the related art;
Figure 3 is a schematic diagram of adding prefixes in RocksDB when managing data in the related art;
Figure 4 is a schematic diagram of inserting a data shard into RocksDB in the related art;
Figure 5 is a schematic diagram of the process of inserting a data shard in the related art;
Figure 6 is a schematic flowchart of a data storage method according to an embodiment of the present invention;
Figure 7 is a schematic diagram of a prefix splitting process according to an embodiment of the present invention;
Figure 8 is a schematic diagram of the process of inserting a data shard according to an embodiment of the present invention;
Figure 9 is a schematic diagram of the structure of a data storage apparatus according to an embodiment of the present invention;
Figure 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments provided here serve only to explain the present invention and do not limit it. In addition, the embodiments provided below are some, not all, of the embodiments for implementing the present invention; where no conflict arises, the technical solutions described in the embodiments may be implemented in any combination.
The term "and/or" in this document merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean that A exists alone, that A and B both exist, or that B exists alone. In addition, the term "at least one" in this document means any one of a plurality, or any combination of at least two of a plurality. For example, "including at least one of A, B, and C" can mean including any one or more elements selected from the set consisting of A, B, and C.
For example, the data storage method provided by the embodiments of the present invention contains a series of steps but is not limited to the steps described. Similarly, the data storage apparatus provided by the embodiments of the present invention includes a series of modules but is not limited to the modules explicitly described; it may also include modules needed to obtain relevant information or to perform processing based on that information.
Embodiments of the present invention can be applied to a key-value store (KV store) engine. A KV store, also called a key-value database, is a data storage paradigm designed for storing, retrieving, and managing associative arrays; in many cases an associative array is a data structure known as a "dictionary" or hash table. The key-value storage engine here may be the RocksDB storage engine or another storage engine. RocksDB is a database developed on the basis of LevelDB that provides a backward-compatible LevelDB application programming interface (API). RocksDB is also an embedded KV store engine written in C++, in which both keys and values may be arbitrary byte streams. LevelDB is an open-source, persistent, single-node KV database with high random-write and sequential read/write performance; because its random-read performance is weaker, LevelDB is well suited to workloads with few queries and many writes.
LevelDB uses the Log-Structured Merge-tree (LSM-tree) file management strategy. An LSM-tree is a disk-based data structure that, compared with a balanced multiway search tree (B-tree), significantly reduces disk-arm movement overhead and sustains high-speed insertion and/or deletion over long periods. An LSM-tree writes data sequentially into a series of smaller files, so each file holds a batch of changes made over a short period. Each file is sorted before it is written, which speeds up later retrieval, and the files are immutable: they are never updated in place, and every update is made by writing a new file. The LSM-tree periodically checks and merges the files to keep their number down. It defers and batches index changes, and migrates updates to disk efficiently in a manner similar to merge sort, which lowers the cost of index insertion. In addition, RocksDB can be optimized for flash storage, with very low latency.
In the related art, RocksDB uses SST files for persistent data storage, and the data stored in an SST file after compaction is sorted. The data is sorted by key and is split only by size: when the data size exceeds the size limit of an SST file, the data is written to the next SST file. Specifically, when the value data for a key cannot be written completely into one SST file, the remaining data for that key is written to the next SST file. The keys in an SST file are organized in sorted order, so a key or an iteration position can be located by binary search.
Figure 1 is a schematic diagram of how SST files are split in RocksDB in the related art. As shown in Figure 1, SST1 and SST2 are two different SST files, each 128 MB in size; 111, 112, 113, ... are keys in SST1, and 223, 224, 225, ... are keys in SST2. Because the values for key 111, key 112, and so on may differ in size while the capacity of SST1 is limited, the number of keys in SST1 varies with the sizes of the corresponding values. Once the key-value data stored in SST1 fills its storage space, subsequent key-value data is written to SST2.
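As an illustration only, the size-based split described above can be sketched in Python. The function name `split_by_size`, the byte limit, and the value sizes are assumptions made for this sketch; it is not RocksDB code, and, unlike the description above, it never splits the data of a single key across two files.

```python
from typing import List, Tuple

Pair = Tuple[str, bytes]


def split_by_size(items: List[Pair], limit: int) -> List[List[Pair]]:
    """Pack key-sorted (key, value) pairs into files of at most `limit` bytes.

    A new file is started whenever adding the next pair would exceed the
    limit, so the number of keys per file depends on the value sizes.
    """
    files: List[List[Pair]] = []
    current: List[Pair] = []
    used = 0
    for key, value in sorted(items):
        size = len(key) + len(value)
        if current and used + size > limit:
            files.append(current)
            current, used = [], 0
        current.append((key, value))
        used += size
    if current:
        files.append(current)
    return files


# Hypothetical data: values of different sizes, and a 40-byte file limit.
example = [
    ("111", b"x" * 20), ("112", b"x" * 5), ("113", b"x" * 5),
    ("223", b"x" * 20), ("224", b"x" * 5),
]
sst_files = split_by_size(example, limit=40)
```

With this data the first file receives keys 111 to 113 and the second receives keys 223 and 224; the boundary is decided by size alone, not by any property of the keys.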
Further, to keep each compaction from involving a large number of files, RocksDB uses a layering mechanism. Upper levels hold newly written data and lower levels hold older data. The same key may exist in both an upper and a lower level, in which case the value from the most recently written data prevails. The data within each level is contiguous and ordered. A query proceeds from the upper levels down, and the first match found is taken as the actual stored value. Compaction is a background task that merges some SST files into other SST files; specifically, compacting SST files removes duplicate writes that share the same key.
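The lookup order and the effect of compaction described above can be sketched as follows. This is a simplified model in which each level is a plain Python dictionary; the names `lookup` and `compact` are assumptions for illustration and do not correspond to RocksDB functions.

```python
from typing import Dict, List, Optional


def lookup(levels: List[Dict[str, str]], key: str) -> Optional[str]:
    """Search from the uppermost (newest) level down; the first hit wins."""
    for level in levels:  # levels[0] is the top level
        if key in level:
            return level[key]
    return None


def compact(levels: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge all levels into one sorted mapping, keeping the newest value per key."""
    merged: Dict[str, str] = {}
    for level in reversed(levels):  # apply oldest first so newer values overwrite
        merged.update(level)
    return dict(sorted(merged.items()))


# Hypothetical data: key "111" was rewritten, so it exists in both levels.
levels = [{"111": "new"}, {"111": "old", "112": "b"}]
value = lookup(levels, "111")   # the upper level's value is returned
merged = compact(levels)        # the duplicate write for "111" is removed
```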
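Figure 2 is a schematic structural diagram of the LSM layering mechanism of RocksDB in the related art. As shown in Figure 2, the levels differ in capacity, and capacity increases from one level to the next: level 1 holds 300 MB, level 2 holds 3 GB, level 3 holds 30 GB, and level 4 holds 300 GB. Specifically, data may be written as follows. The data is first written to level 1. It is then determined whether level 2 is full; if level 2 is not full, the data is moved from level 1 to level 2. It is next determined whether level 3 is full; if not, the data is moved from level 2 to level 3. Finally, it is determined whether level 4 is full; if not, the data is moved from level 3 to level 4. In this way, level 1 (the top level) receives new data, and level 4 (the bottom level) holds the data that was written longest ago.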
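The cascade just described can be modeled with a few lines of Python. This is a sketch of the description only, under two assumptions that are not in the source: a level counts as "full" when it cannot hold the incoming data, and 1 GB is taken as 1000 MB so that each level is ten times larger than the one above it. The name `settle_level` is hypothetical.

```python
from typing import List

MB = 1
GB = 1000 * MB  # assumption: decimal units, so each level is 10x the previous

# Capacities of levels 1 to 4 as given for Figure 2, in MB.
CAPACITY: List[int] = [300 * MB, 3 * GB, 30 * GB, 300 * GB]


def settle_level(used: List[int], size: int) -> int:
    """Return the 1-based level where newly written data comes to rest.

    `used[i]` is the space already occupied in level i + 1. Data starts in
    level 1 and moves down one level at a time while the next level has room.
    """
    level = 0  # index 0 is level 1, where data is first written
    while level + 1 < len(CAPACITY) and used[level + 1] + size <= CAPACITY[level + 1]:
        level += 1
    return level + 1


resting_level = settle_level([0, 0, 30 * GB, 0], size=100 * MB)  # level 3 is full
```

In the example, level 3 is already full, so the data moves from level 1 to level 2 and stops there.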
Meanwhile, to make data easier to manage, the upper-layer service shards the data and adds a corresponding prefix to the keys of the data stored in the underlying RocksDB, for management and differentiation. Specifically, within the same RocksDB, different prefixes can be set for operations by different operators on the same key. For example, when operator A and operator B each operate on key 111, prefix A can be added before key 111 for operator A's operation and prefix B can be added before key 111 for operator B's operation, which distinguishes the data management operations of the different operators. The names of the prefixes are not specifically limited here; they may be characters such as "A" and "B", digits such as "1" and "2", and so on.
Figure 3 is a schematic diagram of adding prefixes in RocksDB when managing data in the related art. As shown in Figure 3, the prefix is added in front of the key.
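A minimal sketch of prefixing keys is shown below. The ":" separator and the helper names `add_prefix` and `prefix_of` are assumptions made for illustration; the source does not specify how the prefix and the key are joined.

```python
def add_prefix(prefix: str, key: str, sep: str = ":") -> str:
    """Build the stored key by placing the shard prefix in front of the key."""
    return f"{prefix}{sep}{key}"


def prefix_of(stored_key: str, sep: str = ":") -> str:
    """Recover the shard prefix from a stored key."""
    return stored_key.split(sep, 1)[0]


# Operators A and B both work on key 111; the prefixes keep their data apart.
key_a = add_prefix("A", "111")
key_b = add_prefix("B", "111")

# Because the prefix leads the key, sorting groups each shard's keys together.
ordered = sorted([key_b, add_prefix("A", "112"), key_a])
```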
In the related art of data storage, because of the way RocksDB stores data, data in the underlying SST files is stored in order according to the size of the key-value data, so some SST files contain data from multiple data shards. When a data shard needs to be processed, for example deleted, the data in these SST files must be read and examined, which is costly.
Specifically, in a bulk deletion scenario in which an entire data shard is deleted, a large amount of data is deleted at once, and the deletion must take effect promptly and free the space. RocksDB can delete a data shard in two ways. In method 1, a bulk deletion marker is issued, and the marked data is deleted when RocksDB later performs compaction. In method 2, the SST files that store the data are deleted directly, according to the range of the data to be deleted. Method 1 cannot delete data promptly, causes some space amplification during compaction, and increases central processing unit (CPU) overhead. Method 2 deletes quickly, but it fails when the data range does not cover an entire SST file, that is, when some SST files contain data from multiple data shards. Because these SST files contain at least two different prefixes, deleting a data shard (that is, deleting the SST files of the prefix corresponding to that shard) cannot delete them, and data is left behind. For example, when the data shard with prefix A is deleted, an SST file that contains both prefix A and prefix B cannot be deleted.
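The residue left by method 2 can be illustrated with a simplified model in which each SST file is a sorted list of stored keys. The "prefix:key" layout and the names `prefix_of` and `delete_whole_files` are assumptions for this sketch, not RocksDB APIs.

```python
from typing import List, Tuple


def prefix_of(stored_key: str, sep: str = ":") -> str:
    """Return the shard prefix of a stored key (assumed "prefix:key" layout)."""
    return stored_key.split(sep, 1)[0]


def delete_whole_files(files: List[List[str]], prefix: str) -> Tuple[List[List[str]], List[str]]:
    """Model of method 2: drop only files that lie entirely inside the shard.

    Returns the files that remain and the keys of the deleted shard that
    survive because their file also holds keys of another prefix.
    """
    kept = [f for f in files if not all(prefix_of(k) == prefix for k in f)]
    residue = [k for f in kept for k in f if prefix_of(k) == prefix]
    return kept, residue


# Hypothetical layout: the middle file holds both prefix A and prefix B.
sst_files = [["A:111", "A:112"], ["A:113", "B:111"], ["B:112"]]
remaining, leftover = delete_whole_files(sst_files, "A")
```

Here the first file is removed as a whole, but key A:113 remains because its file also contains a prefix B key, which is the residue the paragraph above describes.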
Figure 4 is a schematic diagram of inserting a data shard into RocksDB in the related art. As shown in Figure 4, Prefix-1 denotes N data files whose prefix is Prefix-1, Prefix-1 to Prefix-3 denotes one data file whose prefixes lie between Prefix-1 and Prefix-3, and Prefix-3 denotes N data files whose prefix is Prefix-3, where N is an integer greater than or equal to 1. Prefix-2 denotes a data shard to be inserted whose prefix is Prefix-2. Specifically, a data shard may be inserted when a RocksDB instance has failed and its data needs to be inserted into a healthy RocksDB instance. The insertion then serves to repair the data of the failed instance, and that data is the data shard to be repaired. SST files can be generated from the data shard to be repaired and then inserted into RocksDB, which restores the external data to the local RocksDB. Because RocksDB uses an LSM storage engine and is therefore layered, data conflicts must be prevented: if a conflict is detected during insertion, the data received on the data-receiving interface is placed in an upper level, for example level 1 in Figure 2, and is examined and handled at the next compaction. The data-receiving interface here may be the ingest interface.
Further, in Figure 4, the data shard inserted during data repair has a single prefix, Prefix-2, and in theory does not overlap the existing data. However, because some SST files span multiple prefixes, such as the Prefix-1 to Prefix-3 file in Figure 4, inserting Prefix-2 leads to the conclusion that the original RocksDB already holds an SST file covering prefix Prefix-2, that is, conflicting data is judged to exist. The inserted data therefore cannot be placed in the bottom level, for example level 4 in Figure 2. When a large data block is placed in an upper level, subsequent compaction suffers severe space amplification, possibly by a factor of 2 or more.
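The false conflict can be reproduced with a small model, assuming that a conflict is decided by comparing the key range (smallest and largest key) of each existing file with that of the incoming file. The key layout and the names `key_range`, `overlaps`, and `can_place_in_level` are hypothetical.

```python
from typing import List, Tuple


def key_range(file: List[str]) -> Tuple[str, str]:
    """Smallest and largest key of a key-sorted file."""
    return file[0], file[-1]


def overlaps(file: List[str], new_file: List[str]) -> bool:
    """True if the two key ranges intersect, even if no key is shared."""
    lo, hi = key_range(file)
    new_lo, new_hi = key_range(new_file)
    return lo <= new_hi and new_lo <= hi


def can_place_in_level(level_files: List[List[str]], new_file: List[str]) -> bool:
    """The new file fits in a level only if it overlaps none of its files."""
    return not any(overlaps(f, new_file) for f in level_files)


# Bottom level as in Figure 4: the middle file spans Prefix-1 to Prefix-3.
bottom = [
    ["Prefix-1:001", "Prefix-1:002"],
    ["Prefix-1:003", "Prefix-3:001"],
    ["Prefix-3:002", "Prefix-3:003"],
]
incoming = ["Prefix-2:001", "Prefix-2:002"]
fits = can_place_in_level(bottom, incoming)
```

In this model `fits` is false: the middle file contains no Prefix-2 key, yet its range from Prefix-1:003 to Prefix-3:001 covers every Prefix-2 key, so the incoming file is treated as conflicting.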
Figure 5 is a schematic diagram of inserting a data shard in the related art. As shown in Figure 5, when the storage space numbered 51 in level 5 and the storage space numbered 62 in level 6 contain data files that span prefixes, a new data file cannot be inserted directly into storage space 51 in level 5 or storage space 62 in level 6.
To address the above technical problems, some embodiments of the present invention propose a data storage method.
Figure 6 is a flowchart of a data storage method according to an embodiment of the present invention. As shown in Figure 6, the flow may include:
Step 601: Split, by prefix, a data file in the database that contains at least two different prefixes, to obtain at least two split data files, where each split data file corresponds to one of the at least two different prefixes, and the data file is a file sorted by key.
Here, the database may be a database indexed as a key-value store, for example RocksDB. The data file is a file sorted by key; specifically, it may be a file, in a database indexed as a key-value store, in which data is stored in the order of pre-sorted keys. For example, the data file may be an SST file in RocksDB.
In one implementation, a data file that contains at least two different prefixes may be a data file whose prefixes lie between Prefix-1 and Prefix-3. That is, the prefix of any key in the file may be Prefix-1 or Prefix-3 or, if prefix Prefix-2 exists, Prefix-2, which lies between Prefix-1 and Prefix-3. Splitting such a file by prefix, so that each split data file corresponds to one prefix, may mean splitting a data file C whose prefixes lie between Prefix-1 and Prefix-3 (here, the file contains no Prefix-2 keys) into a data file A with prefix Prefix-1 and a data file B with prefix Prefix-3.
Figure 7 is a schematic diagram of splitting a data file according to an embodiment of the present invention. As shown in Figure 7, Prefix-1 denotes N data files whose prefix is Prefix-1, Prefix-1 to Prefix-3 denotes one data file whose prefixes lie between Prefix-1 and Prefix-3, and Prefix-3 denotes N data files whose prefix is Prefix-3, where N is an integer greater than or equal to 1. Through splitting, the one data file corresponding to Prefix-1 to Prefix-3 is divided into a data file with prefix Prefix-1 and a data file with prefix Prefix-3.
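Step 601 can be sketched in Python as follows, treating a data file as a key-sorted list of stored keys. The "prefix:key" layout and the names `prefix_of` and `split_by_prefix` are assumptions for illustration; the source does not prescribe an implementation.

```python
from itertools import groupby
from typing import List, Tuple


def prefix_of(stored_key: str, sep: str = ":") -> str:
    """Return the shard prefix of a stored key (assumed "prefix:key" layout)."""
    return stored_key.split(sep, 1)[0]


def split_by_prefix(file: List[str]) -> List[Tuple[str, List[str]]]:
    """Split one key-sorted file into one file per prefix, preserving key order.

    Because the prefix leads each key and the file is sorted, all keys of a
    prefix are contiguous, so grouping consecutive keys is sufficient.
    """
    return [(prefix, list(keys)) for prefix, keys in groupby(file, key=prefix_of)]


# Data file C from the example: it holds Prefix-1 and Prefix-3 keys only.
file_c = ["Prefix-1:003", "Prefix-1:004", "Prefix-3:001"]
split_files = split_by_prefix(file_c)
```

The result contains a Prefix-1 file (data file A) followed by a Prefix-3 file (data file B); a file that already holds a single prefix comes back unchanged as one group.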
Step 602: Perform data processing on the at least two split data files.
Here, performing data processing on the at least two split data files may mean processing each data file when a processing instruction for that file is received. Specifically, when a processing request that covers at least one of the at least two split data files is received, data processing is performed on that at least one data file; for example, when a processing request for Prefix-1 data files is received, the Prefix-1 data file obtained from the split is processed. As an example of how such a request may be implemented, the upper-layer service's processing request for a data shard may carry the Prefix-1 prefix information. A request carrying the Prefix-1 prefix information also applies to all other Prefix-1 data files, not only the Prefix-1 data file obtained from the split.
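As one hedged example of step 602, a deletion request that carries a prefix can be served by dropping whole files once every file holds a single prefix. The key layout and the names `prefix_of` and `delete_prefix` are assumptions for this sketch.

```python
from typing import List


def prefix_of(stored_key: str, sep: str = ":") -> str:
    """Return the shard prefix of a stored key (assumed "prefix:key" layout)."""
    return stored_key.split(sep, 1)[0]


def delete_prefix(files: List[List[str]], prefix: str) -> List[List[str]]:
    """Remove every file of the requested prefix without reading individual keys.

    Each file is expected to hold exactly one prefix, as after step 601;
    a file with mixed prefixes is rejected.
    """
    for f in files:
        if len({prefix_of(k) for k in f}) != 1:
            raise ValueError("file holds more than one prefix; split it first")
    return [f for f in files if prefix_of(f[0]) != prefix]


# Files after the split: two Prefix-1 files and one Prefix-3 file.
after_split = [["Prefix-1:001", "Prefix-1:002"], ["Prefix-1:003"], ["Prefix-3:001"]]
remaining = delete_prefix(after_split, "Prefix-1")
```

The request removes both Prefix-1 files, the one produced by the split and the one that already existed, and leaves no Prefix-1 key behind.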
In practical applications, steps 601 to 602 may be implemented by a processor in an electronic device. The processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), an FPGA, a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor.
It can be seen that the data storage method proposed in the embodiment of the present invention splits, by prefix, a data file in the database that contains at least two different prefixes, yielding at least two split data files, where each split data file corresponds to one of the at least two different prefixes and the data file is a file sorted by key, and then performs data processing on the at least two split data files. Because each split data file corresponds to exactly one prefix and no data file corresponds to two or more prefixes, the relevant data files can be processed directly by prefix, which reduces the overhead of data processing.
In one embodiment, splitting, by prefix, the data file in the database that contains at least two different prefixes to obtain at least two split data files includes: splitting the data file containing at least two different prefixes, in the order of the at least two different prefixes, into the at least two split data files arranged in that order.
Here, as an example, the data file spanning Prefix-1 to Prefix-3 may be split, following the order of the prefixes Prefix-1 and Prefix-3, into two data files ordered as Prefix-1 and then Prefix-3. This ensures that the data remains arranged in key order after the split.
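The ordered split can be illustrated with a minimal Python sketch. It is not the patent's implementation: the in-memory list of key-value pairs, the `:` separator between prefix and the rest of the key, and the names `key_prefix` and `split_by_prefix` are all assumptions made for illustration.

```python
from itertools import groupby


def key_prefix(key: str, sep: str = ":") -> str:
    """Return the part of the key before the first separator (assumed key layout)."""
    return key.split(sep, 1)[0]


def split_by_prefix(sorted_entries):
    """Split one key-sorted file into one key-sorted file per prefix.

    The input is sorted by key, so entries that share a prefix are
    contiguous. A single pass with groupby is therefore enough, and the
    resulting files come out in prefix order.
    """
    return [
        (prefix, list(group))
        for prefix, group in groupby(sorted_entries, key=lambda kv: key_prefix(kv[0]))
    ]


# Hypothetical file whose keys span Prefix-1 and Prefix-3.
mixed_file = [
    ("Prefix-1:a", 1),
    ("Prefix-1:b", 2),
    ("Prefix-3:a", 3),
]
split_files = split_by_prefix(mixed_file)
```

Each element of `split_files` holds exactly one prefix, and entries keep their original relative order, so concatenating the split files reproduces the original key order.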
In one embodiment, performing data processing on the at least two split data files includes: deleting data from at least one of the at least two split data files, and/or inserting data with respect to the at least two split data files.
Here, deleting data from at least one of the at least two split data files may, for example, mean deleting the Prefix-1 data file and the Prefix-3 data file obtained by splitting. Inserting data with respect to the data files may mean inserting a Prefix-2 data file from outside RocksDB between the Prefix-1 data file and the Prefix-3 data file obtained by splitting.
In one example, deleting data from at least one of the at least two split data files includes: when a data deletion instruction containing a first prefix is received, and the first prefix includes the prefix of at least one of the at least two split data files, deleting the data file corresponding to the first prefix from the at least two split data files.
Here, the first prefix may refer to first prefix information; for example, it may be the prefix Prefix-1 itself, or any other characters that represent the prefix Prefix-1. Receiving a data deletion instruction containing the first prefix may mean that the instruction is sent to the database through a manual operation and the database receives it. The condition that the first prefix includes the prefix of at least one of the at least two split data files may mean that the first prefix includes the prefix of at least one of the Prefix-1 and Prefix-3 data files obtained by splitting; for example, the first prefix may include Prefix-1 and/or Prefix-3. Deleting the data file corresponding to the first prefix may, for example, consist of identifying, among the at least two split data files, the data file corresponding to the first prefix named in the deletion command, and deleting that file. For example, when the first prefix is Prefix-1, the Prefix-1 data file is deleted from among the Prefix-1 and Prefix-3 data files obtained by splitting.
It can be understood that the data deletion instruction containing the first prefix may be a deletion instruction issued by an upper-layer service for a data shard. In that case, not only the Prefix-1 data file obtained by splitting but also every other data file with prefix Prefix-1 needs to be deleted. Deleting data at the granularity of the split data files therefore removes all of the data that should be deleted quickly and in one pass, without space amplification or residual data.
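Deletion by prefix can be sketched as dropping whole files, as shown below. This is an illustrative model only: the dictionaries describing files, the file names, and the function `delete_prefix` are hypothetical and do not correspond to any RocksDB API.

```python
def delete_prefix(files, first_prefix):
    """Drop every file whose single prefix equals first_prefix.

    Since each file holds exactly one prefix, deletion never has to
    rewrite a file that also contains keys of another prefix.
    Returns the remaining files and the number of files removed.
    """
    kept = [f for f in files if f["prefix"] != first_prefix]
    return kept, len(files) - len(kept)


# Hypothetical level contents after splitting: one prefix per file.
level = [
    {"prefix": "Prefix-1", "name": "001.sst"},
    {"prefix": "Prefix-1", "name": "002.sst"},
    {"prefix": "Prefix-3", "name": "003.sst"},
]
remaining, removed = delete_prefix(level, "Prefix-1")
```

Here both Prefix-1 files are removed in one pass and the Prefix-3 file is left untouched.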
In one embodiment, inserting data with respect to the at least two split data files includes:
when a data insertion instruction containing a data file to be inserted is received, and the data file to be inserted contains a second prefix, determining a data insertion position according to the second prefix, and inserting the data file to be inserted at the determined insertion position.
Here, the data file to be inserted may be a data file outside RocksDB that needs to be repaired, the data insertion instruction may be an instruction formed from that data file, and the second prefix may be the prefix of that data file. As for determining the data insertion position according to the second prefix: when the second prefix is Prefix-2, the insertion position may be determined to lie between the data file with prefix Prefix-1 and the data file with prefix Prefix-3. Inserting the data file at the determined position then means inserting the Prefix-2 data file between the Prefix-1 data file and the Prefix-3 data file.
It can be seen that, when data is inserted with respect to the at least two split data files, no data file spans multiple prefixes. The file to be inserted can therefore be placed directly into any level of the LSM hierarchy, at exactly the position where it belongs, without going through compaction, which avoids the space amplification and resource consumption that compaction could cause.
FIG. 8 is a schematic diagram of the process of inserting a data shard according to an embodiment of the present invention. As shown in FIG. 8, because level 0, level 5, and level 6 contain no data files that span prefixes, the data shard to be inserted can be inserted into any level.
In one embodiment, determining the data insertion position according to the second prefix includes: determining the data insertion position according to the second prefix and the prefix order of the at least two split data files.
As one implementation, the ordering relationship between the second prefix and the prefixes of the at least two split data files is determined from the second prefix and the prefix order of those files, and the data insertion position is determined from that relationship. For example, when the second prefix is the letter B and the prefixes of the at least two split data files are, in order, the letters A and C, the letter B is determined to fall between A and C, so the insertion position is between the data file with prefix A and the data file with prefix C.
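Finding the insertion position from the prefix order amounts to a search in a sorted list. The following minimal sketch uses Python's standard `bisect` module; the list of single-letter prefixes and the function name `insert_position` are assumptions for illustration, not part of the described method.

```python
import bisect


def insert_position(file_prefixes, second_prefix):
    """Return the index at which a file with second_prefix belongs.

    file_prefixes must already be sorted, which holds when the split
    files are kept in prefix order.
    """
    return bisect.bisect_left(file_prefixes, second_prefix)


# Prefixes of the split files, in order.
prefixes = ["A", "C"]
pos = insert_position(prefixes, "B")
prefixes.insert(pos, "B")
```

With prefixes A and C already present, the computed position for B is index 1, that is, between the A file and the C file.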
In one embodiment, the data file is an SST file.
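FIG. 9 is a schematic structural diagram of a data storage apparatus according to an embodiment of the present invention. As shown in FIG. 9, the apparatus may include a splitting module 901 and a processing module 902, where:
the splitting module 901 is configured to split, by prefix, a data file in the database that contains at least two different prefixes, to obtain at least two split data files, where each split data file corresponds to one of the at least two different prefixes and the data file is a file sorted by key;
the processing module 902 is configured to perform data processing on the at least two split data files.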
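Optionally, the splitting module 901 is configured to split the data file containing at least two different prefixes, in the order of the at least two different prefixes, into the at least two split data files arranged in that order.
Optionally, the processing module 902 is configured to delete data from at least one of the at least two split data files, and/or to insert data with respect to the at least two split data files.
Optionally, the processing module 902 is configured to, when a data deletion instruction containing a first prefix is received and the first prefix includes the prefix of at least one of the at least two split data files, delete the data file corresponding to the first prefix from the at least two split data files.
Optionally, the processing module 902 is configured to, when a data insertion instruction containing a data file to be inserted is received and the data file to be inserted contains a second prefix, determine a data insertion position according to the second prefix and insert the data file to be inserted at the determined insertion position.
Optionally, the processing module 902 is configured to determine the data insertion position according to the second prefix and the prefix order of the at least two split data files.
Optionally, the data file is an SST file.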
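In practical applications, both the splitting module 901 and the processing module 902 may be implemented by a processor in an electronic device. The processor may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor.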
It can be seen that, in the data storage apparatus proposed in the embodiment of the present invention, the splitting module splits, by prefix, a data file in the database that contains at least two different prefixes, yielding at least two split data files, where each split data file corresponds to one of the at least two different prefixes and the data file is a file sorted by key; the processing module then performs data processing on the at least two split data files. Because each split data file corresponds to exactly one prefix and no data file corresponds to two or more prefixes, the relevant data files can be processed directly by prefix, which reduces the overhead of data processing.
In addition, the functional modules in this embodiment may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional module.
If the integrated unit is implemented as a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the method described in this embodiment. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
Specifically, the computer program instructions corresponding to the data storage method in this embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive. When the computer program instructions in the storage medium that correspond to the data storage method are read or executed by an electronic device, any one of the data storage methods of the foregoing embodiments is implemented.
Based on the same technical concept as the foregoing embodiments, FIG. 10 shows an electronic device provided by an embodiment of the present invention, which may include a memory 1001 and a processor 1002, where:
the memory 1001 is configured to store computer programs and data;
the processor 1002 is configured to execute the computer program stored in the memory to implement any one of the data storage methods of the foregoing embodiments.
In practical applications, the memory 1001 may be a volatile memory, such as RAM; a non-volatile memory, such as ROM, flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of these kinds of memory. The memory 1001 provides instructions and data to the processor 1002.
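The processor 1002 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor. It can be understood that, depending on the platform, other electronic components may be used to implement the processor functions described above; the embodiments of the present invention do not specifically limit this.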
In some embodiments, the functions or modules of the apparatus provided in the embodiments of the present invention may be used to execute the methods described in the method embodiments above. For their specific implementation, refer to the description of those method embodiments; for brevity, details are not repeated here.
The descriptions of the embodiments above emphasize the differences between them. For parts that are the same or similar, the embodiments may be referred to one another; for brevity, details are not repeated here.
The methods disclosed in the method embodiments provided in this application may be combined arbitrarily, provided they do not conflict, to obtain new method embodiments.
The features disclosed in the product embodiments provided in this application may be combined arbitrarily, provided they do not conflict, to obtain new product embodiments.
The features disclosed in the method or device embodiments provided in this application may be combined arbitrarily, provided they do not conflict, to obtain new method embodiments or device embodiments.
From the description of the embodiments above, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the specific embodiments described, which are illustrative rather than restrictive. Guided by the present invention, those of ordinary skill in the art may devise many other forms without departing from the spirit of the present invention or the scope protected by the claims, and all of these fall within the protection of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010522791.XA CN113779014A (en) | 2020-06-10 | 2020-06-10 | A data storage method, apparatus, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010522791.XA CN113779014A (en) | 2020-06-10 | 2020-06-10 | A data storage method, apparatus, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113779014A true CN113779014A (en) | 2021-12-10 |
Family
ID=78834743
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010522791.XA Pending CN113779014A (en) | 2020-06-10 | 2020-06-10 | A data storage method, apparatus, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113779014A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103870492A (en) * | 2012-12-14 | 2014-06-18 | 腾讯科技(深圳)有限公司 | Data storing method and device based on key sorting |
CN109388641A (en) * | 2018-10-22 | 2019-02-26 | 无锡华云数据技术服务有限公司 | Method, the equipment, medium of the common prefix of key in a kind of retrieval key value database |
US10338972B1 (en) * | 2014-05-28 | 2019-07-02 | Amazon Technologies, Inc. | Prefix based partitioned data storage |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9880746B1 (en) | Method to increase random I/O performance with low memory overheads | |
US10642515B2 (en) | Data storage method, electronic device, and computer non-volatile storage medium | |
US7567973B1 (en) | Storing a sparse table using locality groups | |
US7548928B1 (en) | Data compression of large scale data stored in sparse tables | |
US10678654B2 (en) | Systems and methods for data backup using data binning and deduplication | |
CN100565512C (en) | Eliminate the system and method for redundant file in the document storage system | |
US7065619B1 (en) | Efficient data storage system | |
US8271456B2 (en) | Efficient backup data retrieval | |
US11841826B2 (en) | Embedded reference counts for file clones | |
CN111417939A (en) | Tiered Storage in Distributed File System | |
US11226934B2 (en) | Storage system garbage collection and defragmentation | |
WO2019045959A1 (en) | Kvs tree database | |
US20080183767A1 (en) | Efficient data storage system | |
JP2022500727A (en) | Systems and methods for early removal of tombstone records in databases | |
CN110888837B (en) | Object storage small file merging method and device | |
KR20190019805A (en) | Method and device for storing data object, and computer readable storage medium having a computer program using the same | |
KR20160100216A (en) | Method and device for constructing on-line real-time updating of massive audio fingerprint database | |
CN115454994A (en) | Metadata storage method and device based on distributed key value database | |
CN113535670A (en) | Virtual resource mirror image storage system and implementation method thereof | |
CN114780500A (en) | Data storage method, device, equipment and storage medium based on log merging tree | |
KR101652436B1 (en) | Apparatus for data de-duplication in a distributed file system and method thereof | |
WO2014157243A1 (en) | Storage control device, control method for storage control device, and control program for storage control device | |
Tulkinbekov et al. | CaseDB: Lightweight key-value store for edge computing environment | |
CN108595589A (en) | A kind of efficient access method of magnanimity science data picture | |
US8156126B2 (en) | Method for the allocation of data on physical media by a file system that eliminates duplicate data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20211210 |