CN112527735A

CN112527735A - Data merging method and device applied to key value storage system

Info

Publication number: CN112527735A
Application number: CN202011415970.XA
Authority: CN
Inventors: 瓦伦丁·库茨奈特索夫; 李逸; 倪敏芳; 鹿智婷; 欧佳欣
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Cloud Computing Technologies Co Ltd
Priority date: 2018-07-24
Filing date: 2018-07-24
Publication date: 2021-03-19
Also published as: CN109271343B; CN109271343A

Abstract

The embodiments of the present application provide a data merging method and device applied in a key-value storage system, relate to the field of information technology, and can solve the problem of low read operation performance caused by a large number of deletion operation records in the SST file in the KV storage system . The method is: judging whether the number of deletion operation records in the SST file of the nth level of the key-value storage system is greater than a first preset threshold; wherein, n is a non-negative integer; when the number of deletion operation records is greater than the first preset threshold; When the threshold is set, the first SST file in the nth level is merged with the second SST file; wherein, the second SST file is located at the n+1th level of the key-value storage system, and the key-value storage system receives the The time of the operation record in one SST file is later than the time when the key-value storage system receives the operation record in the second SST file. The embodiments of the present application are applied to a scenario where a large amount of data is deleted in the KV storage system.

Description

Data merging method and device applied to key value storage system

The scheme is a divisional application with the patent application number of 201810825117.1, the application date of 2018, 7 and 24, and the invention name of 'a data merging method and device applied to a key value storage system'.

Technical Field

The present application relates to the field of information technologies, and in particular, to a data merging method and apparatus applied to a key value storage system.

Background

Currently, when a Key Value (KV) storage system based on a log structured merge Tree (LSM Tree) performs a write operation, an additional write mode is adopted. For example, when an upper-level application writes a record, the KV memory system first writes the record to the Write Ahead Log (WAL), and then writes the record to Memtable in memory. When the memory occupied by Memtable reaches a certain upper limit, the KV memory system can freeze Memtable into Immutable Memtable, and the data of the Immutable Memtable are sorted and then dumped to a hard disk to form a Static Sorted Table (SST) file. The hard disk of the KV storage system may include multiple hierarchies, and each hierarchy may contain one or more SST files. When the Immunable Memtable is written from the memory to the hard disk, it is first stored in the 0 th level. For delete data operations, the KV memory system can append the write data deleted identification record in the WAL and Memtable, respectively.

When the KV memory system performs reading operation, the KV memory system needs to search in the memory and SST files of each hierarchy in sequence. When a large number of deletion operations are recorded in the KV storage system, a large number of deletion operation records (identification records in which data is deleted) may exist in the SST file. At this time, the reading operation needs to skip these deletion operation records in the SST file, which greatly reduces the reading performance.

Disclosure of Invention

The embodiment of the application provides a data merging method and device applied to a key value storage system, which can solve the problem of low reading operation performance caused by the fact that a large number of deletion operation records exist in SST files in a KV storage system.

In a first aspect, an embodiment of the present application provides a data merging method applied to a key value storage system, including: judging whether the number of deletion operation records in the SST file of the nth level of the key value storage system is greater than a first preset threshold value or not; wherein n is a non-negative integer; when the number of the deletion operation records is larger than the first preset threshold value, combining the first SST file and the second SST file in the nth level; wherein the second SST file is located at the n +1 th level of the key-value storage system, and the time when the key-value storage system receives the operation records in the first SST file is later than the time when the key-value storage system receives the operation records in the second SST file.

In the prior art, when a storage system performs a read operation, it is necessary to sequentially search in each level file in a memory and a hard disk. If a large amount of data is deleted, a large amount of deletion operation records exist in the SST file. At this time, the reading operation of the database needs to skip the deletion operation record in the SST file, thereby greatly reducing the reading performance. Based on the data merging method applied to the key value storage system provided by the embodiment of the application, if the data processing device determines that the number of the deletion operation records in the SST file at the nth level is greater than a first preset threshold, the data of the first SST file at the nth level and the data of the second SST file at the (n + 1) th level can be merged, redundant data of the first SST file and the second SST file can be removed, namely useless data and the deletion operation records corresponding to the useless data can be removed in time, the reading speed can be improved, better reading performance is ensured, and the problem of low reading operation performance caused by the existence of a large number of deletion operation records in the SST file can be solved.

In one possible implementation manner, the determining whether the number of deletion operation records in the SST file at the nth level of the key value storage system is greater than a first preset threshold includes: and judging whether the number of the deletion operation records in all the SST files in the nth level is greater than the first preset threshold value.

In this way, when all SST files of the nth level commonly include a large number of deletion operation records (greater than a first preset threshold), the first SST file of the nth level and the second SST file of the (n + 1) th level can be merged in time to reduce the deletion operation records in the SST file of the nth level. Therefore, the reading speed can be improved, better reading performance is ensured, and the problem of low reading performance caused by the fact that a large number of deletion operation records exist in the SST file is solved.

In one possible implementation manner, the determining whether the number of deletion operation records in the SST file at the nth level of the key value storage system is greater than a first preset threshold includes: and judging whether the number of the deletion operation records in the first SST file is greater than the first preset threshold value.

In this way, when the first SST file at the nth level includes a large number of deletion operation records (greater than the first preset threshold), the first SST file and the second SST file can be merged in time to reduce the deletion operation records in the first SST file, that is, the deletion operation records in the SST file at the nth level are reduced. Therefore, the reading speed can be improved, better reading performance is ensured, and the problem of low reading performance caused by the fact that a large number of deletion operation records exist in the SST file can be solved.

In one possible implementation, the primary key values of the operation records in the first SST file overlap the primary key values of the operation records in the second SST file by a maximum or minimum amount.

If the SST file with the least overlap with the primary key value of the second SST file in the nth level is selected as the first SST file, the iteration times of the data in the first SST file and the data in the second SST file can be reduced, and the data merging speed is increased. If the SST file which overlaps the primary key value of the second SST file most in the nth level is selected as the first SST file, data redundancy in the first SST file and the second SST file can be removed as much as possible, more deletion operation records and corresponding data records can be removed, and therefore reading speed can be improved, and better reading performance is guaranteed.

In one possible implementation, the first SST file is an SST file in which the deletion operation records are the most included among all SST files in the nth hierarchy.

That is, SST files whose number of operation records in the nth hierarchy is greater than the remaining SST files may be regarded as the first SST file. Therefore, more deletion operation records can be sunk to the next layer, and more deletion operation records and corresponding data records can be removed. Therefore, the reading speed can be improved, and better reading performance is ensured.

In a second aspect, an embodiment of the present application provides a data processing apparatus, including a determining unit and a merging unit: the judging unit is used for judging whether the number of the deletion operation records in the SST file of the nth level of the key value storage system is larger than a first preset threshold value or not; wherein n is a non-negative integer; the merging unit is used for merging the first SST file and the second SST file in the nth level when the number of the deletion operation records is larger than the first preset threshold value; wherein the second SST file is located at the n +1 th level of the key-value storage system, and the time when the key-value storage system receives the operation records in the first SST file is later than the time when the key-value storage system receives the operation records in the second SST file.

In a possible implementation manner, the determining unit is specifically configured to: and judging whether the number of the deletion operation records in all the SST files in the nth level is greater than the first preset threshold value.

In a possible implementation manner, the determining unit is specifically configured to: and judging whether the number of the deletion operation records in the first SST file is greater than the first preset threshold value.

For technical effects of the second aspect and various possible implementations thereof, reference may be made to the technical effects of the first aspect and various possible implementations thereof, which are not described herein in detail.

In a third aspect, the present invention provides an apparatus, which exists in the form of a chip product, and the apparatus includes a processor and a memory, the memory is configured to be coupled to the processor and stores program instructions and data necessary for the apparatus, and the processor is configured to execute the program instructions stored in the memory, so that the apparatus performs the functions of the data processing apparatus in the method.

In a fourth aspect, an embodiment of the present application provides a data processing apparatus, where the data processing apparatus may implement corresponding functions in the foregoing method embodiments, and the functions may be implemented by hardware or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.

In one possible design, the data processing apparatus includes a processor and a communication interface, and the processor is configured to support the data processing apparatus to execute the corresponding functions of the method. The communication interface is used for supporting communication between the data processing device and other network elements. The data processing apparatus may also include a memory for coupling with the processor that retains program instructions and data necessary for the data processing apparatus.

In a fifth aspect, embodiments of the present application provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform any one of the methods provided in the first aspect.

In a sixth aspect, embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to perform any one of the methods provided in the first aspect.

Drawings

Fig. 1 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;

fig. 2 is a schematic hardware structure diagram of an apparatus according to an embodiment of the present disclosure;

fig. 3 is a schematic flowchart of a data merging method applied to a key value storage system according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a hierarchical structure of an SST file provided by an embodiment of the present application;

fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.

Detailed Description

For clarity and conciseness of the following description of the various embodiments, a brief introduction to related concepts or technologies is first presented:

KV storage system: the system can take values according to a main key (also called as a keyword or a main code), can query data through the main key, and has the advantages of high query speed, large data storage amount and high support for concurrency. The primary key may be composed of one field or a plurality of fields, and is respectively called a single-field primary key or a multi-field primary key. The primary key may uniquely identify a record (an entry) in a file. When a new record is added, the database may automatically check the primary key value of the new record, not allowing the value to be duplicated with the primary key values of other records, or not NULL (NULL).

Deleting the operation record: the record of the deleted data may be marked with the corresponding primary key value as a deleted state. For example, when the phone number of the user a is deleted, if the primary key value corresponding to the phone number of the user a is 100, the primary key value 100 may be marked as a deleted state.

Merging: it can also be called compression, and is used to merge the existing records (data) so as to delete some records that are no longer valid (e.g. remove duplicate updates or delete operation records), and reduce the data size and number of files to speed up the reading speed. In the embodiment of the present application, the merging process may include a Flush process (also referred to as minor compact) and a compact (also referred to as major compact) process. The Flush process dumps immutable Memtable in the memory into a 0 th level (level, which may be referred to as a layer or a level) SST file in the hard disk. The compact process dumps an SST file from a low level to a high level. Hereinafter, the merging process mainly refers to the compact process. In the embodiment of the invention, the 0 th level in the hard disk refers to an organization level of the SST file in a storage resource pool formed by the hard disk.

Rocksdb, LevelDB: are all KV storage systems based on LSM Tree. When the data is stored, the data can be stored in order according to the recorded primary keys, that is, the data corresponding to the adjacent primary keys are sequentially stored in the storage file.

The embodiment of the application provides a data merging method and device applied to a key value storage system, which can be applied to a KV storage system. For example, the method can be applied to a scenario that a large amount of data is deleted in a KV storage system based on an LSM-Tree structure. The LSM-Tree structure based KV storage system is, for example, but not limited to, Rocksdb, LevelDB, etc.

Fig. 1 is a schematic structural diagram of a data processing apparatus 10 according to an embodiment of the present application. The data processing device 10 may comprise business software 11 and a database 12. The database 12 may be used for accepting access of the business software 11, managing SST files, WAL, manifest (manifest) files, and current (current) files, so as to provide functions of reading (querying) and writing (including add (PUT)/DELETE (DELETE) operations) for users. The WAL is a kind of log for recording operation contents in the process of processing data insertion and deletion. Before each service data write operation is performed, it is recorded in the WAL and written into the Memtable. The manifest file records metadata of each SST file, including information such as a file name, a maximum primary key, and a minimum primary key. The current file is used for recording the file name of the current manifest file of the database.

Wherein the database 12 may manage SST files hierarchically, i.e. all SST files constitute a hierarchical structure. Each layer may maintain a specified number or size of files. For example, the level 0 may include 1M SST files, the level 1 may include 10M SST files, the level 2 may include 100M SST files, the level 3 may include 1G SST files, and so on. Level 0-n represents different levels of SST file storage, and level sequence numbers represent that data in the database are from new to old from small to large (namely, the time for the database to receive the data is from early to late). The SST files in each layer may be ordered by primary key size, with each SST file having a smallest primary key and a largest primary key, and the primary keys of each SST file being non-overlapping. Thus, one primary key is looked up at one level, and only one SST file is looked up. Hierarchy information to which each SST file belongs may be recorded in a manifest file.

The data processing apparatus in this embodiment of the present application may be implemented by a device, or may be a functional module in the device, which is not specifically limited in this embodiment of the present application. It is understood that the above functions may be either network elements in a hardware device, software functions running on dedicated hardware, or virtualized functions instantiated on a platform (e.g., a cloud platform).

For example, the data processing apparatus in the embodiment of the present application may be implemented by the apparatus in fig. 2. Fig. 2 is a schematic diagram illustrating a hardware structure of an apparatus according to an embodiment of the present disclosure. The apparatus 200 includes at least one processor 201, a bus 202, a memory 203, and at least one communication interface 204.

The processor 201 may be a general processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more ics for controlling the execution of programs in accordance with the present invention.

A communication bus 202.

The communication interface 204, using any transceiver or the like, is used for communication with other devices or communication networks, such as ethernet, etc.

The memory 203 may be a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), a magnetic disk storage medium, a hard disk or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to. The memory may be self-contained and coupled to the processor via bus 202. The memory may also be integral to the processor.

The memory 203 is used for storing computer execution instructions for executing the scheme of the application, and is controlled by the processor 201 to execute. The processor 201 is configured to execute computer-executable instructions stored in the memory 203 to implement the methods of the embodiments of the present application.

Optionally, the computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.

In particular implementations, processor 201 may include one or more CPUs such as CPU0 and CPU1 in fig. 2, for example, as one embodiment.

In particular implementations, apparatus 200 may include multiple processors, such as processor 201 and processor 207 in FIG. 2, for example, as an example. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).

In one implementation, the apparatus 200 may further include an output device 205 and an input device 206, as an example. The output device 205 is in communication with the processor 201 and may display information in a variety of ways. For example, the output device 205 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 206 is in communication with the processor 201 and may receive user input in a variety of ways. For example, the input device 206 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. In the description of the present application, the term "plurality" means two or more than two unless otherwise specified. In addition, in order to facilitate clear description of technical solutions of the embodiments of the present application, in the embodiments of the present application, terms such as "first" and "second" are used to distinguish the same items or similar items having substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," etc. do not denote any order or quantity, nor do the terms "first," "second," etc. denote any order or importance.

An embodiment of the present application provides a data merging method applied to a key value storage system, as shown in fig. 3, including:

301. the data processing device judges whether the number of the deletion operation records in the SST file of the nth level of the key value storage system is larger than a first preset threshold value or not; wherein n is a non-negative integer.

In one possible design, the SST files at the nth level may include all (all) SST files in the nth level. That is, the data processing apparatus may determine whether the number of deletion operation records in all SST files in the nth hierarchy is greater than a first preset threshold. For example, the first preset threshold may be 10% of the number of deletion operation records of the total entries (records) of all SST files in the nth hierarchy. For example, the number of all entries in the SST file in the second layer is 1000 ten thousand, and if the number of the deletion operation records is 200 ten thousand, the number of the deletion operation records is 20% of the number of all entries, that is, the number of the deletion operation records is greater than the first preset threshold.

In one possible design, the nth level of SST files is the first SST file. That is, the data processing apparatus may determine whether the number of deletion operation records in a first SST file in the SST files of the nth tier is greater than a first preset threshold. For example, the first preset threshold may be that 10 ten thousand deletion operation records exist in the first SST file; or the first preset threshold is 30% of the deletion operation records in the first SST file accounting for the total number of entries of the SST file; alternatively, the first preset threshold is 10% of the number of deletion operation records in the first SST file in all SST files of the nth hierarchy.

In one possible design, if a user expects that the deletion operation records may be concentrated within a certain range, the SST files of the nth level may include SST files within the range. For example, assuming that a user expects to delete SST files whose operation records may be concentrated within the range of primary key values of 100-. That is, the data processing apparatus may determine whether the number of deletion operation records in the SST file having the primary key value falling within the range of 100-300 is greater than a first preset threshold. For example, the first preset threshold may be that the number of deletion operation records of the SST files in the range of 100-300 primary key values in the nth hierarchy accounts for 20% of the total number of entries of the SST files in the range.

If the number of the deletion operation records in the SST file of the nth level is determined to be greater than the first preset threshold, the data processing device executes step 302. For example, if the data processing apparatus determines that the number of deletion operation records in all SST files in the second layer is greater than the first preset threshold, step 302 is performed. Or, if the data processing apparatus determines that the number of deletion operation records in the first SST file of the second layer is greater than the first preset threshold, step 302 is executed.

Note that the deletion operation record of the SST file at the nth level may be continuous, discontinuous, or not completely continuous.

In one possible design, the data processing apparatus may determine whether the number of consecutive deletion operation records in the one or more SST files of the nth tier is greater than a first preset threshold. For example, it is determined whether the number of consecutive deletion operation records in the SST file of the nth hierarchy is greater than 50. If the number of the continuous deletion operation records in the SST file of the nth level is determined to be greater than the first preset threshold, step 302 is executed. In this way, it is possible to avoid that the storage system continuously skips a plurality of deletion operation records when performing a read operation (for example, if data having primary key values of 1 to 100 in the SST file of the nth hierarchy is marked as deleted, when the data processing apparatus queries data having primary key values of 101, it is necessary to continuously skip 100 deletion operation records), and thus read/write efficiency can be improved.

302. The data processing apparatus merges the first SST file and the second SST file in the nth hierarchy.

And the time for receiving the operation records in the first SST file by the key value storage system is later than the time for receiving the operation records in the second SST file by the key value storage system.

When the number of deletion operation records in the SST file at the nth level is greater than a first preset threshold, the data processing apparatus may determine a target SST file (first SST file) from the SST files at the nth level. If the SST files of the nth tier include all SST files of the nth tier, the data processing device may select one first SST file. Then, the data of the first SST file is merged with the data of the second SST file in the SST files of the (n + 1) th tier to remove redundant data of the first SST file and the second SST file.

In one possible design, the SST file including the SST file with the most deleted operation records among all the SST files in the nth hierarchy may be used as the first SST file, that is, the SST file having the operation records in the nth hierarchy greater than the rest of the SST files may be used as the first SST file. In this way, more deletion operation records can be sunk to the next layer, and more deletion operation records and corresponding records can be removed. Therefore, the reading speed can be improved, and better reading performance is ensured.

When the number of deletion operation records per SST file in the SST file at the nth level is the same, the SST file at the nth level that overlaps the primary key value of the second SST file the least may be selected as the first SST file. In this way, the number of iterations of data in the first SST file and data in the second SST file can be reduced, thereby speeding up the speed of data merging. Or, the SST file which overlaps the primary key value of the second SST file most in the nth level may be selected as the first SST file, so that data redundancy in the first SST file and the second SST file may be removed as much as possible, and more deletion operation records and corresponding data records may be removed, thereby improving the reading speed and ensuring better reading performance.

For another example, the first SST file may be a latest generated SST file or an earliest generated SST file among SST files of the nth hierarchy. It should be noted that the above methods for determining the first SST file are only a few examples provided by the embodiments of the present application, and the present application does not limit how the first SST file is determined.

In one possible design, the range of primary key values for the second SST file overlaps (crosses) the range of primary key values for the first SST file. For example, when the number of deletion operation records in all SST files of the second hierarchy is greater than a first preset threshold, one first SST file may be selected from the SST files of the second hierarchy, and data of the first SST file may be merged with data of a second SST file in the SST files of the third hierarchy. Or, when the number of the deletion operation records in the first SST file of the second hierarchy is greater than a first preset threshold, merging the data of the first SST file with the data of the second SST file in the SST files of the third hierarchy. As shown in fig. 4, it is assumed that the range of the primary key values of the first SST file is 100-; thus, the second SST file can include from 2 nd to 4 th SST files of the third hierarchy, i.e., the first SST file and the 2 nd to 4 th SST files of the third hierarchy can be data-merged, respectively.

In one possible design, the minimum primary key value of the second SST file is closest to the maximum primary key value of the first SST file, and/or the maximum primary key value of the second SST file is closest to the minimum primary key value of the first SST file. For example, assume that the range of the primary key values of the first SST file is 100-; thus, the second SST files may include third tier 1 st and 2 nd SST files, and the first SST files may be inserted between the third tier 1 st and 2 nd SST files.

Specifically, the process of merging the data of the first SST file and the second SST file by the data processing device is equivalent to performing multi-path merging and sorting of the data of the first SST file and at least one second SST file. That is, data (records) of the first SST file and the at least one second SST file are sequentially iterated according to a primary key sequence, and are directly discarded if no value is saved; otherwise, it is written to the newly generated SST file. For example, when there is a record of a delete operation for data a in a first SST file and there is just a record of data a in a second SST file, then both data a and the record of the delete operation for data a may be discarded. Thus, the number of deletion operation records in the first SST file can be reduced, the reading speed can be increased, and better reading performance is ensured.

It is understood that, during the merging process of the data, the deletion operation records of the data can sink layer by layer, so that all records related to the deletion operation records in the database and corresponding deletion operation records can be discarded. For example, when the number of deletion operation records in the second-layer SST file is greater than a first preset threshold, the second-layer SST file and the third-layer SST file are merged such that the deletion operation records in the second-layer SST file sink to the third-layer SST file. The corresponding record in the third-layer SST file can be deleted according to the deletion operation record in the second-layer SST file. Further, as the number of deletion operation records in the third layer of files is increased, the number of deletion operation records in the third layer of SST files may be greater than a first preset threshold, and at this time, the third layer of SST files and the fourth layer of SST files are merged, so that the deletion operation records in the third layer of SST files sink to the fourth layer of SST files. The corresponding record in the fourth-layer SST file can be deleted according to the deletion operation record in the third-layer SST file. By analogy, when the deletion operation record sinks to the last layer, all records related to the deletion operation record and the corresponding deletion operation record in the database can be discarded.

In the prior art, when a storage system performs a read operation, it is necessary to sequentially search in each level file in a memory and a hard disk. If a large amount of data is deleted, a large amount of deletion operation records exist in the SST file. At this time, the reading operation of the database needs to skip the deletion operation record in the SST file, thereby greatly reducing the reading performance. According to the data merging method applied to the key value storage system, if the data processing device determines that the number of the deletion operation records in the SST file at the nth level is larger than a first preset threshold, the data of the first SST file at the nth level and the data of the second SST file at the (n + 1) th level can be merged, and the redundant data of the first SST file and the redundant data of the second SST file can be removed, so that the useless data and the deletion operation records corresponding to the useless data can be removed in time, the reading speed can be improved, the better reading performance is ensured, and the problem of low reading operation performance caused by the existence of a large number of deletion operation records in the SST file can be solved.

The above description mainly introduces the solution provided in the embodiments of the present application from the perspective of a data processing apparatus. It will be appreciated that the data processing apparatus, in order to carry out the above-described functions, may comprise corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or a combination of hardware and software. Whether a function is performed as hardware or software-driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiment of the present application, the data processing apparatus may be divided into the functional modules according to the method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and there may be another division manner in actual implementation.

In the case of dividing the functional modules according to their respective functions, fig. 5 shows a schematic diagram of a possible structure of the data processing apparatus 5 according to the above-described embodiment, which includes: a judging unit 501 and a combining unit 502. A determining unit 501, configured to determine whether the number of deletion operation records in an SST file at an nth level of the key value storage system is greater than a first preset threshold; wherein n is a non-negative integer. A merging unit 502 configured to merge the first SST file and the second SST file in the nth tier when the number of deletion operation records is greater than a first preset threshold; and the time for receiving the operation records in the first SST file by the key value storage system is later than the time for receiving the operation records in the second SST file by the key value storage system. Wherein the determining unit 501 is configured to support the data processing apparatus to execute the process 301 in fig. 3. The merging unit 502 is used to support the data processing apparatus to execute the process 302 in fig. 3. All relevant contents of each step related to the above method embodiment may be referred to the functional description of the corresponding functional module, and are not described herein again.

In another possible implementation manner, the determining unit is specifically configured to: and judging whether the number of the deletion operation records in the first SST file is greater than the first preset threshold value.

In another possible implementation manner, the first SST file is an SST file in which the deletion operation records included in all SST files in the nth hierarchy are the most.

The data processing device in the embodiment of the present invention may be a server, and the hard disk in the embodiment of the present invention may be a local hard disk of the server, or a hard disk in multiple servers, or an external hard disk. The hard Disk can be a mechanical hard Disk or a Solid State Disk (SSD) or a combination of a mechanical hard Disk and an SSD.

The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disk, a removable hard disk, a compact disk, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in a server and be used to implement various aspects of embodiments of the present invention. .

Those skilled in the art will recognize that in one or more of the examples described above, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a data processing apparatus-readable medium. Data processing device readable media include data processing device storage media and communication media including any medium that facilitates transfer of a data processing device program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose data processing apparatus.

The above-mentioned embodiments, objects, technical solutions and advantages of the present application are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present application, and are not intended to limit the scope of the present application, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present application should be included in the scope of the present application.

Claims

1. A data merging method applied to a key value storage system is characterized by comprising the following steps:

selecting a first SST file in the SST files of the n-th level of the key value storage system; wherein the number of deletion operation records in the first SST file is greater than a first preset threshold; wherein n is a non-negative integer;

merging a first SST file with a second SST file in the nth tier; the second SST file is positioned at the n +1 th level of the key value storage system, and the time for receiving the operation records in the first SST file by the key value storage system is later than the time for receiving the operation records in the second SST file by the key value storage system.

2. The method of claim 1, wherein the first SST file is a range of SST files having primary key values.

3. The method of claim 1 or 2, wherein the primary key values of the operational records in the first SST file overlap the primary key values of the operational records in the second SST file by a maximum or minimum amount.

4. The merge method according to claim 1 or 2, wherein the first SST file is an SST file in which deletion operations included in all SST files in the nth hierarchy are most recorded.

5. A data processing apparatus is characterized by comprising a judging unit and a merging unit:

the selection unit is used for selecting a first SST file in the static sorting table SST files of the nth level of the key value storage system; wherein the number of deletion operation records in the first SST file is greater than a first preset threshold; wherein n is a non-negative integer;

the merging unit is used for merging the first SST file and the second SST file in the nth level when the number of the deletion operation records is greater than the first preset threshold value; the second SST file is positioned at the n +1 th level of the key value storage system, and the time for receiving the operation records in the first SST file by the key value storage system is later than the time for receiving the operation records in the second SST file by the key value storage system.

6. The data processing apparatus of claim 5, wherein the first SST file is an SST file having a range of primary key values.

7. The data processing apparatus of claim 5 or 6, wherein the primary key values of the operation records in the first SST file overlap the primary key values of the operation records in the second SST file by a maximum or minimum amount.

8. The data processing apparatus according to claim 5 or 6, wherein the first SST file is an SST file having a maximum number of deletion operation records included in all SST files in the nth hierarchy.

9. A data processing apparatus comprising a memory storing computer instructions that, when executed, cause the data processing apparatus to perform the data merging method as claimed in any one of claims 1 to 4 applied in a key-value storage system.

10. A computer storage medium storing computer instructions which, when executed by a computer, cause the computer to perform the data merging method applied in a key-value storage system as claimed in any one of claims 1 to 4.

11. A storage system comprising, as a first storage system, a processor and an interface, the processor and the interface being in communication, the processor being configured to perform the data consolidation method applied in a key-value storage system as claimed in any one of claims 1 to 4.