[go: up one dir, main page]

CN112527735A - Data merging method and device applied to key value storage system - Google Patents

Data merging method and device applied to key value storage system Download PDF

Info

Publication number
CN112527735A
CN112527735A CN202011415970.XA CN202011415970A CN112527735A CN 112527735 A CN112527735 A CN 112527735A CN 202011415970 A CN202011415970 A CN 202011415970A CN 112527735 A CN112527735 A CN 112527735A
Authority
CN
China
Prior art keywords
sst
sst file
file
storage system
operation records
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011415970.XA
Other languages
Chinese (zh)
Inventor
瓦伦丁·库茨奈特索夫
李逸
倪敏芳
鹿智婷
欧佳欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202011415970.XA priority Critical patent/CN112527735A/en
Publication of CN112527735A publication Critical patent/CN112527735A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本申请实施例提供一种应用于键值存储系统中的数据合并方法和装置,涉及信息技术领域,能够解决KV存储系统中SST文件中存在大量删除操作记录时导致的读取操作性能低的问题。其方法为:判断键值存储系统的第n层级的SST文件中的删除操作记录的数目是否大于第一预设阈值;其中,n为非负整数;当删除操作记录的数目大于该第一预设阈值时,将该第n层级中的第一SST文件与第二SST文件合并;其中,该第二SST文件位于该键值存储系统的第n+1层级,该键值存储系统接收该第一SST文件中的操作记录的时间晚于该键值存储系统接收该第二SST文件中的操作记录的时间。本申请实施例应用于KV存储系统存在大量数据删除的场景。

Figure 202011415970

The embodiments of the present application provide a data merging method and device applied in a key-value storage system, relate to the field of information technology, and can solve the problem of low read operation performance caused by a large number of deletion operation records in the SST file in the KV storage system . The method is: judging whether the number of deletion operation records in the SST file of the nth level of the key-value storage system is greater than a first preset threshold; wherein, n is a non-negative integer; when the number of deletion operation records is greater than the first preset threshold; When the threshold is set, the first SST file in the nth level is merged with the second SST file; wherein, the second SST file is located at the n+1th level of the key-value storage system, and the key-value storage system receives the The time of the operation record in one SST file is later than the time when the key-value storage system receives the operation record in the second SST file. The embodiments of the present application are applied to a scenario where a large amount of data is deleted in the KV storage system.

Figure 202011415970

Description

Data merging method and device applied to key value storage system
The scheme is a divisional application with the patent application number of 201810825117.1, the application date of 2018, 7 and 24, and the invention name of 'a data merging method and device applied to a key value storage system'.
Technical Field
The present application relates to the field of information technologies, and in particular, to a data merging method and apparatus applied to a key value storage system.
Background
Currently, when a Key Value (KV) storage system based on a log structured merge Tree (LSM Tree) performs a write operation, an additional write mode is adopted. For example, when an upper-level application writes a record, the KV memory system first writes the record to the Write Ahead Log (WAL), and then writes the record to Memtable in memory. When the memory occupied by Memtable reaches a certain upper limit, the KV memory system can freeze Memtable into Immutable Memtable, and the data of the Immutable Memtable are sorted and then dumped to a hard disk to form a Static Sorted Table (SST) file. The hard disk of the KV storage system may include multiple hierarchies, and each hierarchy may contain one or more SST files. When the Immunable Memtable is written from the memory to the hard disk, it is first stored in the 0 th level. For delete data operations, the KV memory system can append the write data deleted identification record in the WAL and Memtable, respectively.
When the KV memory system performs reading operation, the KV memory system needs to search in the memory and SST files of each hierarchy in sequence. When a large number of deletion operations are recorded in the KV storage system, a large number of deletion operation records (identification records in which data is deleted) may exist in the SST file. At this time, the reading operation needs to skip these deletion operation records in the SST file, which greatly reduces the reading performance.
Disclosure of Invention
The embodiment of the application provides a data merging method and device applied to a key value storage system, which can solve the problem of low reading operation performance caused by the fact that a large number of deletion operation records exist in SST files in a KV storage system.
In a first aspect, an embodiment of the present application provides a data merging method applied to a key value storage system, including: judging whether the number of deletion operation records in the SST file of the nth level of the key value storage system is greater than a first preset threshold value or not; wherein n is a non-negative integer; when the number of the deletion operation records is larger than the first preset threshold value, combining the first SST file and the second SST file in the nth level; wherein the second SST file is located at the n +1 th level of the key-value storage system, and the time when the key-value storage system receives the operation records in the first SST file is later than the time when the key-value storage system receives the operation records in the second SST file.
In the prior art, when a storage system performs a read operation, it is necessary to sequentially search in each level file in a memory and a hard disk. If a large amount of data is deleted, a large amount of deletion operation records exist in the SST file. At this time, the reading operation of the database needs to skip the deletion operation record in the SST file, thereby greatly reducing the reading performance. Based on the data merging method applied to the key value storage system provided by the embodiment of the application, if the data processing device determines that the number of the deletion operation records in the SST file at the nth level is greater than a first preset threshold, the data of the first SST file at the nth level and the data of the second SST file at the (n + 1) th level can be merged, redundant data of the first SST file and the second SST file can be removed, namely useless data and the deletion operation records corresponding to the useless data can be removed in time, the reading speed can be improved, better reading performance is ensured, and the problem of low reading operation performance caused by the existence of a large number of deletion operation records in the SST file can be solved.
In one possible implementation manner, the determining whether the number of deletion operation records in the SST file at the nth level of the key value storage system is greater than a first preset threshold includes: and judging whether the number of the deletion operation records in all the SST files in the nth level is greater than the first preset threshold value.
In this way, when all SST files of the nth level commonly include a large number of deletion operation records (greater than a first preset threshold), the first SST file of the nth level and the second SST file of the (n + 1) th level can be merged in time to reduce the deletion operation records in the SST file of the nth level. Therefore, the reading speed can be improved, better reading performance is ensured, and the problem of low reading performance caused by the fact that a large number of deletion operation records exist in the SST file is solved.
In one possible implementation manner, the determining whether the number of deletion operation records in the SST file at the nth level of the key value storage system is greater than a first preset threshold includes: and judging whether the number of the deletion operation records in the first SST file is greater than the first preset threshold value.
In this way, when the first SST file at the nth level includes a large number of deletion operation records (greater than the first preset threshold), the first SST file and the second SST file can be merged in time to reduce the deletion operation records in the first SST file, that is, the deletion operation records in the SST file at the nth level are reduced. Therefore, the reading speed can be improved, better reading performance is ensured, and the problem of low reading performance caused by the fact that a large number of deletion operation records exist in the SST file can be solved.
In one possible implementation, the primary key values of the operation records in the first SST file overlap the primary key values of the operation records in the second SST file by a maximum or minimum amount.
If the SST file with the least overlap with the primary key value of the second SST file in the nth level is selected as the first SST file, the iteration times of the data in the first SST file and the data in the second SST file can be reduced, and the data merging speed is increased. If the SST file which overlaps the primary key value of the second SST file most in the nth level is selected as the first SST file, data redundancy in the first SST file and the second SST file can be removed as much as possible, more deletion operation records and corresponding data records can be removed, and therefore reading speed can be improved, and better reading performance is guaranteed.
In one possible implementation, the first SST file is an SST file in which the deletion operation records are the most included among all SST files in the nth hierarchy.
That is, SST files whose number of operation records in the nth hierarchy is greater than the remaining SST files may be regarded as the first SST file. Therefore, more deletion operation records can be sunk to the next layer, and more deletion operation records and corresponding data records can be removed. Therefore, the reading speed can be improved, and better reading performance is ensured.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including a determining unit and a merging unit: the judging unit is used for judging whether the number of the deletion operation records in the SST file of the nth level of the key value storage system is larger than a first preset threshold value or not; wherein n is a non-negative integer; the merging unit is used for merging the first SST file and the second SST file in the nth level when the number of the deletion operation records is larger than the first preset threshold value; wherein the second SST file is located at the n +1 th level of the key-value storage system, and the time when the key-value storage system receives the operation records in the first SST file is later than the time when the key-value storage system receives the operation records in the second SST file.
In a possible implementation manner, the determining unit is specifically configured to: and judging whether the number of the deletion operation records in all the SST files in the nth level is greater than the first preset threshold value.
In a possible implementation manner, the determining unit is specifically configured to: and judging whether the number of the deletion operation records in the first SST file is greater than the first preset threshold value.
In one possible implementation, the primary key values of the operation records in the first SST file overlap the primary key values of the operation records in the second SST file by a maximum or minimum amount.
In one possible implementation, the first SST file is an SST file in which the deletion operation records are the most included among all SST files in the nth hierarchy.
For technical effects of the second aspect and various possible implementations thereof, reference may be made to the technical effects of the first aspect and various possible implementations thereof, which are not described herein in detail.
In a third aspect, the present invention provides an apparatus, which exists in the form of a chip product, and the apparatus includes a processor and a memory, the memory is configured to be coupled to the processor and stores program instructions and data necessary for the apparatus, and the processor is configured to execute the program instructions stored in the memory, so that the apparatus performs the functions of the data processing apparatus in the method.
In a fourth aspect, an embodiment of the present application provides a data processing apparatus, where the data processing apparatus may implement corresponding functions in the foregoing method embodiments, and the functions may be implemented by hardware or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the data processing apparatus includes a processor and a communication interface, and the processor is configured to support the data processing apparatus to execute the corresponding functions of the method. The communication interface is used for supporting communication between the data processing device and other network elements. The data processing apparatus may also include a memory for coupling with the processor that retains program instructions and data necessary for the data processing apparatus.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform any one of the methods provided in the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to perform any one of the methods provided in the first aspect.
Drawings
Fig. 1 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 2 is a schematic hardware structure diagram of an apparatus according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a data merging method applied to a key value storage system according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a hierarchical structure of an SST file provided by an embodiment of the present application;
fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
For clarity and conciseness of the following description of the various embodiments, a brief introduction to related concepts or technologies is first presented:
KV storage system: the system can take values according to a main key (also called as a keyword or a main code), can query data through the main key, and has the advantages of high query speed, large data storage amount and high support for concurrency. The primary key may be composed of one field or a plurality of fields, and is respectively called a single-field primary key or a multi-field primary key. The primary key may uniquely identify a record (an entry) in a file. When a new record is added, the database may automatically check the primary key value of the new record, not allowing the value to be duplicated with the primary key values of other records, or not NULL (NULL).
Deleting the operation record: the record of the deleted data may be marked with the corresponding primary key value as a deleted state. For example, when the phone number of the user a is deleted, if the primary key value corresponding to the phone number of the user a is 100, the primary key value 100 may be marked as a deleted state.
Merging: it can also be called compression, and is used to merge the existing records (data) so as to delete some records that are no longer valid (e.g. remove duplicate updates or delete operation records), and reduce the data size and number of files to speed up the reading speed. In the embodiment of the present application, the merging process may include a Flush process (also referred to as minor compact) and a compact (also referred to as major compact) process. The Flush process dumps immutable Memtable in the memory into a 0 th level (level, which may be referred to as a layer or a level) SST file in the hard disk. The compact process dumps an SST file from a low level to a high level. Hereinafter, the merging process mainly refers to the compact process. In the embodiment of the invention, the 0 th level in the hard disk refers to an organization level of the SST file in a storage resource pool formed by the hard disk.
Rocksdb, LevelDB: are all KV storage systems based on LSM Tree. When the data is stored, the data can be stored in order according to the recorded primary keys, that is, the data corresponding to the adjacent primary keys are sequentially stored in the storage file.
The embodiment of the application provides a data merging method and device applied to a key value storage system, which can be applied to a KV storage system. For example, the method can be applied to a scenario that a large amount of data is deleted in a KV storage system based on an LSM-Tree structure. The LSM-Tree structure based KV storage system is, for example, but not limited to, Rocksdb, LevelDB, etc.
Fig. 1 is a schematic structural diagram of a data processing apparatus 10 according to an embodiment of the present application. The data processing device 10 may comprise business software 11 and a database 12. The database 12 may be used for accepting access of the business software 11, managing SST files, WAL, manifest (manifest) files, and current (current) files, so as to provide functions of reading (querying) and writing (including add (PUT)/DELETE (DELETE) operations) for users. The WAL is a kind of log for recording operation contents in the process of processing data insertion and deletion. Before each service data write operation is performed, it is recorded in the WAL and written into the Memtable. The manifest file records metadata of each SST file, including information such as a file name, a maximum primary key, and a minimum primary key. The current file is used for recording the file name of the current manifest file of the database.
Wherein the database 12 may manage SST files hierarchically, i.e. all SST files constitute a hierarchical structure. Each layer may maintain a specified number or size of files. For example, the level 0 may include 1M SST files, the level 1 may include 10M SST files, the level 2 may include 100M SST files, the level 3 may include 1G SST files, and so on. Level 0-n represents different levels of SST file storage, and level sequence numbers represent that data in the database are from new to old from small to large (namely, the time for the database to receive the data is from early to late). The SST files in each layer may be ordered by primary key size, with each SST file having a smallest primary key and a largest primary key, and the primary keys of each SST file being non-overlapping. Thus, one primary key is looked up at one level, and only one SST file is looked up. Hierarchy information to which each SST file belongs may be recorded in a manifest file.
The data processing apparatus in this embodiment of the present application may be implemented by a device, or may be a functional module in the device, which is not specifically limited in this embodiment of the present application. It is understood that the above functions may be either network elements in a hardware device, software functions running on dedicated hardware, or virtualized functions instantiated on a platform (e.g., a cloud platform).
For example, the data processing apparatus in the embodiment of the present application may be implemented by the apparatus in fig. 2. Fig. 2 is a schematic diagram illustrating a hardware structure of an apparatus according to an embodiment of the present disclosure. The apparatus 200 includes at least one processor 201, a bus 202, a memory 203, and at least one communication interface 204.
The processor 201 may be a general processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more ics for controlling the execution of programs in accordance with the present invention.
A communication bus 202.
The communication interface 204, using any transceiver or the like, is used for communication with other devices or communication networks, such as ethernet, etc.
The memory 203 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), a magnetic disk storage medium, a hard disk or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to. The memory may be self-contained and coupled to the processor via bus 202. The memory may also be integral to the processor.
The memory 203 is used for storing computer execution instructions for executing the scheme of the application, and is controlled by the processor 201 to execute. The processor 201 is configured to execute computer-executable instructions stored in the memory 203 to implement the methods of the embodiments of the present application.
Optionally, the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.
In particular implementations, processor 201 may include one or more CPUs such as CPU0 and CPU1 in fig. 2, for example, as one embodiment.
In particular implementations, apparatus 200 may include multiple processors, such as processor 201 and processor 207 in FIG. 2, for example, as an example. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In one implementation, the apparatus 200 may further include an output device 205 and an input device 206, as an example. The output device 205 is in communication with the processor 201 and may display information in a variety of ways. For example, the output device 205 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 206 is in communication with the processor 201 and may receive user input in a variety of ways. For example, the input device 206 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. In the description of the present application, the term "plurality" means two or more than two unless otherwise specified. In addition, in order to facilitate clear description of technical solutions of the embodiments of the present application, in the embodiments of the present application, terms such as "first" and "second" are used to distinguish the same items or similar items having substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," etc. do not denote any order or quantity, nor do the terms "first," "second," etc. denote any order or importance.
An embodiment of the present application provides a data merging method applied to a key value storage system, as shown in fig. 3, including:
301. the data processing device judges whether the number of the deletion operation records in the SST file of the nth level of the key value storage system is larger than a first preset threshold value or not; wherein n is a non-negative integer.
In one possible design, the SST files at the nth level may include all (all) SST files in the nth level. That is, the data processing apparatus may determine whether the number of deletion operation records in all SST files in the nth hierarchy is greater than a first preset threshold. For example, the first preset threshold may be 10% of the number of deletion operation records of the total entries (records) of all SST files in the nth hierarchy. For example, the number of all entries in the SST file in the second layer is 1000 ten thousand, and if the number of the deletion operation records is 200 ten thousand, the number of the deletion operation records is 20% of the number of all entries, that is, the number of the deletion operation records is greater than the first preset threshold.
In one possible design, the nth level of SST files is the first SST file. That is, the data processing apparatus may determine whether the number of deletion operation records in a first SST file in the SST files of the nth tier is greater than a first preset threshold. For example, the first preset threshold may be that 10 ten thousand deletion operation records exist in the first SST file; or the first preset threshold is 30% of the deletion operation records in the first SST file accounting for the total number of entries of the SST file; alternatively, the first preset threshold is 10% of the number of deletion operation records in the first SST file in all SST files of the nth hierarchy.
In one possible design, if a user expects that the deletion operation records may be concentrated within a certain range, the SST files of the nth level may include SST files within the range. For example, assuming that a user expects to delete SST files whose operation records may be concentrated within the range of primary key values of 100-. That is, the data processing apparatus may determine whether the number of deletion operation records in the SST file having the primary key value falling within the range of 100-300 is greater than a first preset threshold. For example, the first preset threshold may be that the number of deletion operation records of the SST files in the range of 100-300 primary key values in the nth hierarchy accounts for 20% of the total number of entries of the SST files in the range.
If the number of the deletion operation records in the SST file of the nth level is determined to be greater than the first preset threshold, the data processing device executes step 302. For example, if the data processing apparatus determines that the number of deletion operation records in all SST files in the second layer is greater than the first preset threshold, step 302 is performed. Or, if the data processing apparatus determines that the number of deletion operation records in the first SST file of the second layer is greater than the first preset threshold, step 302 is executed.
Note that the deletion operation record of the SST file at the nth level may be continuous, discontinuous, or not completely continuous.
In one possible design, the data processing apparatus may determine whether the number of consecutive deletion operation records in the one or more SST files of the nth tier is greater than a first preset threshold. For example, it is determined whether the number of consecutive deletion operation records in the SST file of the nth hierarchy is greater than 50. If the number of the continuous deletion operation records in the SST file of the nth level is determined to be greater than the first preset threshold, step 302 is executed. In this way, it is possible to avoid that the storage system continuously skips a plurality of deletion operation records when performing a read operation (for example, if data having primary key values of 1 to 100 in the SST file of the nth hierarchy is marked as deleted, when the data processing apparatus queries data having primary key values of 101, it is necessary to continuously skip 100 deletion operation records), and thus read/write efficiency can be improved.
302. The data processing apparatus merges the first SST file and the second SST file in the nth hierarchy.
And the time for receiving the operation records in the first SST file by the key value storage system is later than the time for receiving the operation records in the second SST file by the key value storage system.
When the number of deletion operation records in the SST file at the nth level is greater than a first preset threshold, the data processing apparatus may determine a target SST file (first SST file) from the SST files at the nth level. If the SST files of the nth tier include all SST files of the nth tier, the data processing device may select one first SST file. Then, the data of the first SST file is merged with the data of the second SST file in the SST files of the (n + 1) th tier to remove redundant data of the first SST file and the second SST file.
In one possible design, the SST file including the SST file with the most deleted operation records among all the SST files in the nth hierarchy may be used as the first SST file, that is, the SST file having the operation records in the nth hierarchy greater than the rest of the SST files may be used as the first SST file. In this way, more deletion operation records can be sunk to the next layer, and more deletion operation records and corresponding records can be removed. Therefore, the reading speed can be improved, and better reading performance is ensured.
When the number of deletion operation records per SST file in the SST file at the nth level is the same, the SST file at the nth level that overlaps the primary key value of the second SST file the least may be selected as the first SST file. In this way, the number of iterations of data in the first SST file and data in the second SST file can be reduced, thereby speeding up the speed of data merging. Or, the SST file which overlaps the primary key value of the second SST file most in the nth level may be selected as the first SST file, so that data redundancy in the first SST file and the second SST file may be removed as much as possible, and more deletion operation records and corresponding data records may be removed, thereby improving the reading speed and ensuring better reading performance.
For another example, the first SST file may be a latest generated SST file or an earliest generated SST file among SST files of the nth hierarchy. It should be noted that the above methods for determining the first SST file are only a few examples provided by the embodiments of the present application, and the present application does not limit how the first SST file is determined.
In one possible design, the range of primary key values for the second SST file overlaps (crosses) the range of primary key values for the first SST file. For example, when the number of deletion operation records in all SST files of the second hierarchy is greater than a first preset threshold, one first SST file may be selected from the SST files of the second hierarchy, and data of the first SST file may be merged with data of a second SST file in the SST files of the third hierarchy. Or, when the number of the deletion operation records in the first SST file of the second hierarchy is greater than a first preset threshold, merging the data of the first SST file with the data of the second SST file in the SST files of the third hierarchy. As shown in fig. 4, it is assumed that the range of the primary key values of the first SST file is 100-; thus, the second SST file can include from 2 nd to 4 th SST files of the third hierarchy, i.e., the first SST file and the 2 nd to 4 th SST files of the third hierarchy can be data-merged, respectively.
In one possible design, the minimum primary key value of the second SST file is closest to the maximum primary key value of the first SST file, and/or the maximum primary key value of the second SST file is closest to the minimum primary key value of the first SST file. For example, assume that the range of the primary key values of the first SST file is 100-; thus, the second SST files may include third tier 1 st and 2 nd SST files, and the first SST files may be inserted between the third tier 1 st and 2 nd SST files.
Specifically, the process of merging the data of the first SST file and the second SST file by the data processing device is equivalent to performing multi-path merging and sorting of the data of the first SST file and at least one second SST file. That is, data (records) of the first SST file and the at least one second SST file are sequentially iterated according to a primary key sequence, and are directly discarded if no value is saved; otherwise, it is written to the newly generated SST file. For example, when there is a record of a delete operation for data a in a first SST file and there is just a record of data a in a second SST file, then both data a and the record of the delete operation for data a may be discarded. Thus, the number of deletion operation records in the first SST file can be reduced, the reading speed can be increased, and better reading performance is ensured.
It is understood that, during the merging process of the data, the deletion operation records of the data can sink layer by layer, so that all records related to the deletion operation records in the database and corresponding deletion operation records can be discarded. For example, when the number of deletion operation records in the second-layer SST file is greater than a first preset threshold, the second-layer SST file and the third-layer SST file are merged such that the deletion operation records in the second-layer SST file sink to the third-layer SST file. The corresponding record in the third-layer SST file can be deleted according to the deletion operation record in the second-layer SST file. Further, as the number of deletion operation records in the third layer of files is increased, the number of deletion operation records in the third layer of SST files may be greater than a first preset threshold, and at this time, the third layer of SST files and the fourth layer of SST files are merged, so that the deletion operation records in the third layer of SST files sink to the fourth layer of SST files. The corresponding record in the fourth-layer SST file can be deleted according to the deletion operation record in the third-layer SST file. By analogy, when the deletion operation record sinks to the last layer, all records related to the deletion operation record and the corresponding deletion operation record in the database can be discarded.
In the prior art, when a storage system performs a read operation, it is necessary to sequentially search in each level file in a memory and a hard disk. If a large amount of data is deleted, a large amount of deletion operation records exist in the SST file. At this time, the reading operation of the database needs to skip the deletion operation record in the SST file, thereby greatly reducing the reading performance. According to the data merging method applied to the key value storage system, if the data processing device determines that the number of the deletion operation records in the SST file at the nth level is larger than a first preset threshold, the data of the first SST file at the nth level and the data of the second SST file at the (n + 1) th level can be merged, and the redundant data of the first SST file and the redundant data of the second SST file can be removed, so that the useless data and the deletion operation records corresponding to the useless data can be removed in time, the reading speed can be improved, the better reading performance is ensured, and the problem of low reading operation performance caused by the existence of a large number of deletion operation records in the SST file can be solved.
The above description mainly introduces the solution provided in the embodiments of the present application from the perspective of a data processing apparatus. It will be appreciated that the data processing apparatus, in order to carry out the above-described functions, may comprise corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or a combination of hardware and software. Whether a function is performed as hardware or software-driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the data processing apparatus may be divided into the functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.
In the case of dividing the functional modules according to their respective functions, fig. 5 shows a schematic diagram of a possible structure of the data processing apparatus 5 according to the above-described embodiment, which includes: a judging unit 501 and a combining unit 502. A determining unit 501, configured to determine whether the number of deletion operation records in an SST file at an nth level of the key value storage system is greater than a first preset threshold; wherein n is a non-negative integer. A merging unit 502 configured to merge the first SST file and the second SST file in the nth tier when the number of deletion operation records is greater than a first preset threshold; and the time for receiving the operation records in the first SST file by the key value storage system is later than the time for receiving the operation records in the second SST file by the key value storage system. Wherein the determining unit 501 is configured to support the data processing apparatus to execute the process 301 in fig. 3. The merging unit 502 is used to support the data processing apparatus to execute the process 302 in fig. 3. All relevant contents of each step related to the above method embodiment may be referred to the functional description of the corresponding functional module, and are not described herein again.
In a possible implementation manner, the determining unit is specifically configured to: and judging whether the number of the deletion operation records in all the SST files in the nth level is greater than the first preset threshold value.
In another possible implementation manner, the determining unit is specifically configured to: and judging whether the number of the deletion operation records in the first SST file is greater than the first preset threshold value.
In one possible implementation, the primary key values of the operation records in the first SST file overlap the primary key values of the operation records in the second SST file by a maximum or minimum amount.
In another possible implementation manner, the first SST file is an SST file in which the deletion operation records included in all SST files in the nth hierarchy are the most.
The data processing device in the embodiment of the present invention may be a server, and the hard disk in the embodiment of the present invention may be a local hard disk of the server, or a hard disk in multiple servers, or an external hard disk. The hard Disk can be a mechanical hard Disk or a Solid State Disk (SSD) or a combination of a mechanical hard Disk and an SSD.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disk, a removable hard disk, a compact disk, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in a server and be used to implement various aspects of embodiments of the present invention. .
Those skilled in the art will recognize that in one or more of the examples described above, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a data processing apparatus-readable medium. Data processing device readable media include data processing device storage media and communication media including any medium that facilitates transfer of a data processing device program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose data processing apparatus.
The above-mentioned embodiments, objects, technical solutions and advantages of the present application are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present application, and are not intended to limit the scope of the present application, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present application should be included in the scope of the present application.

Claims (11)

1. A data merging method applied to a key value storage system is characterized by comprising the following steps:
selecting a first SST file in the SST files of the n-th level of the key value storage system; wherein the number of deletion operation records in the first SST file is greater than a first preset threshold; wherein n is a non-negative integer;
merging a first SST file with a second SST file in the nth tier; the second SST file is positioned at the n +1 th level of the key value storage system, and the time for receiving the operation records in the first SST file by the key value storage system is later than the time for receiving the operation records in the second SST file by the key value storage system.
2. The method of claim 1, wherein the first SST file is a range of SST files having primary key values.
3. The method of claim 1 or 2, wherein the primary key values of the operational records in the first SST file overlap the primary key values of the operational records in the second SST file by a maximum or minimum amount.
4. The merge method according to claim 1 or 2, wherein the first SST file is an SST file in which deletion operations included in all SST files in the nth hierarchy are most recorded.
5. A data processing apparatus is characterized by comprising a judging unit and a merging unit:
the selection unit is used for selecting a first SST file in the static sorting table SST files of the nth level of the key value storage system; wherein the number of deletion operation records in the first SST file is greater than a first preset threshold; wherein n is a non-negative integer;
the merging unit is used for merging the first SST file and the second SST file in the nth level when the number of the deletion operation records is greater than the first preset threshold value; the second SST file is positioned at the n +1 th level of the key value storage system, and the time for receiving the operation records in the first SST file by the key value storage system is later than the time for receiving the operation records in the second SST file by the key value storage system.
6. The data processing apparatus of claim 5, wherein the first SST file is an SST file having a range of primary key values.
7. The data processing apparatus of claim 5 or 6, wherein the primary key values of the operation records in the first SST file overlap the primary key values of the operation records in the second SST file by a maximum or minimum amount.
8. The data processing apparatus according to claim 5 or 6, wherein the first SST file is an SST file having a maximum number of deletion operation records included in all SST files in the nth hierarchy.
9. A data processing apparatus comprising a memory storing computer instructions that, when executed, cause the data processing apparatus to perform the data merging method as claimed in any one of claims 1 to 4 applied in a key-value storage system.
10. A computer storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the data merging method applied in a key-value storage system as claimed in any one of claims 1 to 4.
11. A storage system comprising, as a first storage system, a processor and an interface, the processor and the interface being in communication, the processor being configured to perform the data consolidation method applied in a key-value storage system as claimed in any one of claims 1 to 4.
CN202011415970.XA 2018-07-24 2018-07-24 Data merging method and device applied to key value storage system Pending CN112527735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011415970.XA CN112527735A (en) 2018-07-24 2018-07-24 Data merging method and device applied to key value storage system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011415970.XA CN112527735A (en) 2018-07-24 2018-07-24 Data merging method and device applied to key value storage system
CN201810825117.1A CN109271343B (en) 2018-07-24 2018-07-24 A data merging method and device applied in a key-value storage system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201810825117.1A Division CN109271343B (en) 2018-07-24 2018-07-24 A data merging method and device applied in a key-value storage system

Publications (1)

Publication Number Publication Date
CN112527735A true CN112527735A (en) 2021-03-19

Family

ID=65153195

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202011415970.XA Pending CN112527735A (en) 2018-07-24 2018-07-24 Data merging method and device applied to key value storage system
CN201810825117.1A Active CN109271343B (en) 2018-07-24 2018-07-24 A data merging method and device applied in a key-value storage system

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201810825117.1A Active CN109271343B (en) 2018-07-24 2018-07-24 A data merging method and device applied in a key-value storage system

Country Status (1)

Country Link
CN (2) CN112527735A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326262A (en) * 2021-05-14 2021-08-31 锐掣(杭州)科技有限公司 Data processing method, device, equipment and medium based on key value database
CN115480705A (en) * 2022-09-28 2022-12-16 杭州海康威视系统技术有限公司 Data processing method and device, electronic equipment and storage medium
WO2024239898A1 (en) * 2023-05-24 2024-11-28 阿里云计算有限公司 Method, system, and apparatus for data management in data warehouse, and device and medium
US12524443B2 (en) 2023-08-25 2026-01-13 Beijing Volcano Engine Technology Co., Ltd. Log information processing method, device and storage medium

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625531B (en) * 2019-02-28 2023-06-20 阿里巴巴集团控股有限公司 Merging device based on programmable device, data merging method and database system
CN110389942B (en) * 2019-06-21 2021-07-30 华中科技大学 A key-value separation storage method and system without garbage collection
CN112711564B (en) * 2019-10-24 2024-04-09 华为云计算技术有限公司 Merge processing method and related equipment
WO2021061173A1 (en) * 2019-12-04 2021-04-01 Futurewei Technologies, Inc. Data integrity validation on lsm tree snapshots
CN113032340B (en) * 2019-12-24 2024-05-14 阿里巴巴集团控股有限公司 Data file merging method, device, storage medium and processor
CN113051361A (en) * 2019-12-27 2021-06-29 华为技术有限公司 Merging method and related equipment
CN111352908B (en) * 2020-02-28 2023-10-10 北京奇艺世纪科技有限公司 LSM-based data storage method and device, storage medium and computer equipment
CN113495871B (en) * 2020-04-04 2023-06-23 厦门网宿有限公司 File management method and device based on LSM-Tree storage engine
CN111737261B (en) * 2020-06-24 2023-09-22 山东大学 Compressed log caching method and device based on LSM-Tree
CN113568908B (en) * 2021-07-16 2024-11-19 华中科技大学 A key-value request parallel scheduling method and system
CN113934735B (en) * 2021-12-15 2022-05-31 深圳市城市交通规划设计研究中心股份有限公司 Method and device for processing data
CN118152401B (en) * 2024-03-25 2025-03-07 北京力控元通科技有限公司 Data storage method, device, equipment, storage medium and program product
CN118885137A (en) * 2024-10-08 2024-11-01 天津南大通用数据技术股份有限公司 Data storage method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103812877A (en) * 2014-03-12 2014-05-21 西安电子科技大学 Data compression method based on Bigtable distributed storage system
CN105095287A (en) * 2014-05-14 2015-11-25 华为技术有限公司 LSM (Log Structured Merge) data compact method and device
CN106407224A (en) * 2015-07-31 2017-02-15 华为技术有限公司 Method and device for file compaction in KV (Key-Value)-Store system
CN107038206A (en) * 2017-01-17 2017-08-11 阿里巴巴集团控股有限公司 The method for building up of LSM trees, the method for reading data and server of LSM trees
CN107526550A (en) * 2017-09-06 2017-12-29 中国人民大学 A kind of two benches merging method based on log-structured merging tree

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103744628B (en) * 2014-01-27 2016-09-28 北京奇虎科技有限公司 SSTable file storage method and device
CN105468298B (en) * 2015-11-19 2018-11-13 中国科学院信息工程研究所 A kind of key assignments storage method based on log-structured merging tree

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103812877A (en) * 2014-03-12 2014-05-21 西安电子科技大学 Data compression method based on Bigtable distributed storage system
CN105095287A (en) * 2014-05-14 2015-11-25 华为技术有限公司 LSM (Log Structured Merge) data compact method and device
CN106407224A (en) * 2015-07-31 2017-02-15 华为技术有限公司 Method and device for file compaction in KV (Key-Value)-Store system
CN107038206A (en) * 2017-01-17 2017-08-11 阿里巴巴集团控股有限公司 The method for building up of LSM trees, the method for reading data and server of LSM trees
CN107526550A (en) * 2017-09-06 2017-12-29 中国人民大学 A kind of two benches merging method based on log-structured merging tree

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326262A (en) * 2021-05-14 2021-08-31 锐掣(杭州)科技有限公司 Data processing method, device, equipment and medium based on key value database
CN113326262B (en) * 2021-05-14 2022-06-24 锐掣(杭州)科技有限公司 Data processing method, device, equipment and medium based on key value database
CN115480705A (en) * 2022-09-28 2022-12-16 杭州海康威视系统技术有限公司 Data processing method and device, electronic equipment and storage medium
CN115480705B (en) * 2022-09-28 2025-10-03 杭州海康威视系统技术有限公司 Data processing method, device, electronic device and storage medium
WO2024239898A1 (en) * 2023-05-24 2024-11-28 阿里云计算有限公司 Method, system, and apparatus for data management in data warehouse, and device and medium
US12524443B2 (en) 2023-08-25 2026-01-13 Beijing Volcano Engine Technology Co., Ltd. Log information processing method, device and storage medium

Also Published As

Publication number Publication date
CN109271343B (en) 2020-12-15
CN109271343A (en) 2019-01-25

Similar Documents

Publication Publication Date Title
CN112527735A (en) Data merging method and device applied to key value storage system
JP7410181B2 (en) Hybrid indexing methods, systems, and programs
US11048691B2 (en) In-memory database system
CN102722449B (en) Key-Value local storage method and system based on solid state disk (SSD)
US8738673B2 (en) Index partition maintenance over monotonically addressed document sequences
US9514211B2 (en) High throughput data modifications using blind update operations
US10783115B2 (en) Dividing a dataset into sub-datasets having a subset of values of an attribute of the dataset
US20160350302A1 (en) Dynamically splitting a range of a node in a distributed hash table
KR20210058118A (en) Casedb: low-cost put-intensive key-value store for edge computing
CN107665219B (en) Log management method and device
CN115858467A (en) File processing method and device for key value database, electronic equipment and medium
US20110099347A1 (en) Managing allocation and deallocation of storage for data objects
US20180011897A1 (en) Data processing method having structure of cache index specified to transaction in mobile environment dbms
US20220092049A1 (en) Workload-driven database reorganization
CN116450591B (en) Data processing method, device, computer equipment and storage medium
WO2012081165A1 (en) Database management device and database management method
CN113849548A (en) A data extraction method, apparatus, device and medium
CN115658841A (en) Data management method and device, computing equipment and storage medium
US20250245273A1 (en) Similarity information in skip lists
CN120295758A (en) Memory access method, device, computing device, memory device and system
CN118796852A (en) Database-based read-write concurrency control method, medium, device and program product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220225

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technologies Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20210319