Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the invention provides a kv deletion optimization method, system, device and medium based on distributed storage, which can effectively accelerate deletion, improve data recovery efficiency and solve the problem of thread timeouts during recovery.
In a first aspect, an embodiment of the present invention provides a kv deletion optimization method based on distributed storage, where the kv deletion optimization method based on distributed storage includes:
acquiring data to be recovered in a distributed storage system;
judging the type of the data to be recovered, and if the data to be recovered is a db object, copying the db object to recover the data;
and, after the db object is copied, deleting the kv data of the original db object having the same name as the copied db object and judging the quantity of the kv data; if the quantity of the kv data reaches a first preset value, deleting the kv data in batches by DELETE RANGE, and deleting the invalid kv data generated by this deletion by asynchronous compact range.
Compared with the prior art, the first aspect of the invention has the following beneficial effects:
Data to be recovered is acquired in the distributed storage system and its type is judged; if it is a db object, the db object is copied to recover the data, so that an appropriate object is selected for repair copying according to the type of the data. After the db object is copied, the kv data of the original db object having the same name is deleted: the quantity of the kv data is judged, and if it reaches the first preset value, the kv data is deleted in batches by DELETE RANGE, and the invalid kv data left by the batch deletion is then deleted by asynchronous compact range. Batch deletion by DELETE RANGE effectively accelerates deletion and solves the problem of thread timeouts during recovery. Removing the invalid kv data by asynchronous compact range prevents service io from being stalled in the db for a long time, which reduces the impact of data recovery on service io and improves the robustness of the distributed storage system, while also accelerating data recovery and reducing hardware resource consumption.
According to some embodiments of the invention, before the copying the db object for data recovery, the kv deletion optimization method based on distributed storage further includes:
judging whether the db object is a master copy or a slave copy; if the db object is the master copy, copying a first db object corresponding to the data to be recovered by a data pull operation; and if the db object is the slave copy, copying a second db object corresponding to the data to be recovered by a data push operation.
According to some embodiments of the invention, judging the type of the data to be recovered further includes:
If the data to be recovered is a common data object, directly copying the data object corresponding to the data to be recovered to recover the data.
According to some embodiments of the invention, after determining the number of kv data, the kv deletion optimization method based on distributed storage further includes:
if the quantity of the kv data is smaller than the first preset value, deleting the kv data iteratively.
According to some embodiments of the invention, after deleting the kv data, the kv deletion optimization method based on distributed storage further includes:
judging whether the data to be recovered has been recovered successfully, and if recovery fails, waiting for the distributed storage system to recover;
and after the distributed storage system has recovered, recovering the data according to the recovery log corresponding to the data to be recovered.
According to some embodiments of the invention, deleting, by asynchronous compact range, the invalid kv data generated after the kv data is deleted includes:
temporarily storing the kv data deleted in batches by DELETE RANGE, and marking it as invalid kv data;
if the quantity of the invalid kv data reaches a second preset value, deleting the invalid kv data by asynchronous compact range;
and if the quantity of the invalid kv data is smaller than the second preset value, deleting the invalid kv data iteratively.
According to some embodiments of the invention, deleting the invalid kv data by asynchronous compact range includes:
determining the range of the invalid kv data to be deleted by two parameters, begin and end;
after the range is determined, submitting a compact range task request;
and deleting the invalid kv data asynchronously in the background according to the task request.
In a second aspect, an embodiment of the present invention further provides a kv deletion optimization system based on distributed storage, where the kv deletion optimization system based on distributed storage includes:
The data acquisition unit is used for acquiring data to be recovered in the distributed storage system;
The object copying unit is used for judging the type of the data to be restored, and copying the db object to restore the data if the data to be restored is the db object;
The data deletion unit is used for, after the db object is copied, deleting the kv data of the original db object having the same name as the copied db object, judging the quantity of the kv data, deleting the kv data in batches by DELETE RANGE if the quantity reaches a first preset value, and deleting the invalid kv data generated by that deletion by asynchronous compact range.
In a third aspect, an embodiment of the present invention further provides a kv deletion optimization device based on distributed storage, including at least one control processor and a memory communicatively connected to the at least one control processor, where the memory stores instructions executable by the at least one control processor, and the instructions, when executed by the at least one control processor, enable the at least one control processor to perform the kv deletion optimization method based on distributed storage as described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform a kv deletion optimization method based on distributed storage as described above.
It is to be understood that the advantages of the second to fourth aspects compared with the related art are the same as those of the first aspect compared with the related art, and reference may be made to the related description in the first aspect, which is not repeated herein.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, the description of first, second, etc. is for the purpose of distinguishing between technical features only and should not be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.
In the description of the present invention, it should be understood that the direction or positional relationship indicated with respect to the description of the orientation, such as up, down, etc., is based on the direction or positional relationship shown in the drawings, is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present invention can be determined reasonably by a person skilled in the art in combination with the specific content of the technical solution.
In the existing scheme, kv (key-value pair) data is deleted iteratively through an iterator, one entry at a time. For a db (database) holding a large amount of data, this can cause deletion timeouts that ultimately affect, or even interrupt, service io (input/output, also called service traffic). Moreover, the iterative deletion of kv is performed synchronously, so processing efficiency is low.
To solve these problems, the embodiments of the present invention acquire the data to be recovered in the distributed storage system and judge its type; if the data to be recovered is a db object, the db object is copied to recover the data, so that an appropriate object is selected for repair copying according to the type of the data. After the db object is copied, the kv data of the original db object having the same name is deleted: the quantity of the kv data is judged, and if it reaches a first preset value, the kv data is deleted in batches by DELETE RANGE, and the invalid kv data generated by the deletion is removed by asynchronous compact range. Batch deletion by DELETE RANGE effectively accelerates kv deletion and data recovery and solves the problem of thread timeouts during recovery; removing the invalid kv data by asynchronous compact range effectively reduces the impact on service io, improves the robustness of the distributed storage system, and reduces hardware resource consumption.
Referring to fig. 1, an embodiment of the present invention provides a kv deletion optimization method based on distributed storage, where the kv deletion optimization method based on distributed storage includes, but is not limited to, steps S100 to S300, where:
step S100, obtaining data to be restored in a distributed storage system;
step S200, judging the type of the data to be recovered, and if the data to be recovered is a db object, copying the db object to recover the data;
step S300, after copying the db object, deleting the kv data of the original db object having the same name as the copied db object, judging the quantity of the kv data, deleting the kv data in batches by DELETE RANGE if the quantity reaches a first preset value, and deleting the invalid kv data generated by that deletion by asynchronous compact range.
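As an illustration only, steps S200 and S300 can be sketched in Python against a toy in-memory store. The class `KvStore`, the function `recover`, the key layout `"<object>/<key>"` and the default threshold of 1000 (borrowed from the example given in the flow of fig. 2) are all hypothetical; a real system would call its storage engine's own range-delete primitive rather than this loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

FIRST_PRESET = 1000  # hypothetical default; fig. 2 cites 1000 entries only as an example


@dataclass
class KvStore:
    """Toy key-value store that records which deletion primitive was used."""
    data: Dict[str, str] = field(default_factory=dict)
    ops: List[Tuple] = field(default_factory=list)

    def delete_range(self, begin: str, end: str) -> None:
        # A single range operation covering [begin, end).
        for key in [k for k in self.data if begin <= k < end]:
            del self.data[key]
        self.ops.append(("delete_range", begin, end))

    def delete_iteratively(self, keys: List[str]) -> None:
        # One point delete per key, as in the iterator-based approach.
        for key in keys:
            self.data.pop(key, None)
            self.ops.append(("delete", key))


def recover(obj_type: str, name: str, store: KvStore,
            copy_object: Callable[[str], None],
            first_preset: int = FIRST_PRESET) -> None:
    """Steps S200 and S300 for one object acquired in step S100."""
    # S200: copy the object from an authoritative source (hypothetical callback
    # that writes the copy to a staging location, not into `store`).
    copy_object(name)
    if obj_type != "db":
        return  # a common data object is fully recovered by the copy alone
    # S300: delete the kv data of the same-named original db object.
    prefix = name + "/"
    keys = sorted(k for k in store.data if k.startswith(prefix))
    if len(keys) >= first_preset:
        store.delete_range(prefix, prefix + "\xff")
    else:
        store.delete_iteratively(keys)
```

The `"\xff"` upper bound is a simplifying assumption that object keys use ASCII suffixes, so that `[prefix, prefix + "\xff")` covers every key of one object with a single range.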
In steps S100 to S300 of some embodiments, to select a suitable object for repair copying, the data to be recovered in the distributed storage system is acquired and its type is judged, and if it is a db object, the db object is copied to recover the data. To accelerate deletion and data recovery and solve the problem of thread timeouts during recovery, after the db object is copied, the kv data of the original db object having the same name is deleted: its quantity is judged, and if it reaches the first preset value, the kv data is deleted in batches by DELETE RANGE and the resulting invalid kv data is deleted by asynchronous compact range.
It should be noted that, in this embodiment, the first preset value may be changed according to actual needs, and this embodiment is not limited specifically.
In some embodiments, the distributed storage based kv deletion optimization method further comprises, prior to copying the db object for data recovery:
Judging whether the db object is a master copy or a slave copy; if it is the master copy, copying a first db object corresponding to the data to be recovered by a data pull operation; and if it is the slave copy, copying a second db object corresponding to the data to be recovered by a data push operation.
In this embodiment, copy recovery is performed in a manner suited to the role of the db object (master copy or slave copy), which improves data recovery efficiency.
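A minimal sketch of the pull/push choice, under the stated assumptions that each replica keeps a version number per object, that the master copy holds the authoritative version when a slave copy is degraded, and that "authoritative" means the newest version among the other replicas when the master copy is degraded. `Replica`, `authoritative` and `repair` are hypothetical names, and the missing log of the description is reduced here to a simple version comparison.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Replica:
    name: str
    is_master: bool
    # object name -> (version, payload)
    objects: Dict[str, Tuple[int, str]] = field(default_factory=dict)


def authoritative(candidates: List[Replica], obj: str) -> Replica:
    """Hypothetical rule: the candidate holding the newest version of obj."""
    holders = [r for r in candidates if obj in r.objects]
    return max(holders, key=lambda r: r.objects[obj][0])


def repair(replicas: List[Replica], obj: str, degraded: Replica) -> Tuple[str, str, str]:
    """Repair obj on `degraded`: pull if it is the master copy, push otherwise."""
    master = next(r for r in replicas if r.is_master)
    if degraded is master:
        # Master copy degraded: pull the authoritative version from a peer.
        source = authoritative([r for r in replicas if r is not master], obj)
        master.objects[obj] = source.objects[obj]
        return ("pull", source.name, master.name)
    # Slave copy degraded: the master pushes its (assumed authoritative) version.
    degraded.objects[obj] = master.objects[obj]
    return ("push", master.name, degraded.name)
```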
In some embodiments, determining the type of data to be restored further comprises:
If the data to be recovered is a common data object, directly copying the data object corresponding to the data to be recovered to recover the data.
In this embodiment, because a common data object does not record many data entries, it is copied directly; choosing the recovery manner according to the object type improves data recovery efficiency.
In some embodiments, after determining the number of kv data, the kv deletion optimization method based on distributed storage further includes:
If the quantity of the kv data is smaller than the first preset value, deleting the kv data iteratively.
In this embodiment, DELETE RANGE leaves invalid kv data behind when it deletes kv data, and reads that encounter this invalid kv data become invalid reads that affect service io. Therefore, when the quantity of kv data is relatively small, deleting it iteratively avoids such invalid reads.
In some embodiments, after deleting the kv data, the kv deletion optimization method based on the distributed storage further includes:
judging the recovery condition of the data to be recovered, and if the recovery of the data to be recovered fails, waiting for the recovery of the distributed storage system;
and after the distributed storage system is restored, restoring the data to be restored according to the restoration log corresponding to the data to be restored.
In this embodiment, if an abnormal situation such as a node power failure or a network interruption occurs during data recovery, recovery of the affected object fails. After the distributed storage system recovers, the recovery process of that object is re-executed using the corresponding recovery log, which improves data recovery efficiency and avoids restarting the entire process from the beginning after a partial failure.
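The resume-from-log behaviour can be sketched as follows, assuming the recovery log is a mapping from object name to status (a plain dict here; a real log would be persisted) and that an object is marked done only after both copying and deletion have succeeded. `run_recovery` and the `"done"` status are hypothetical.

```python
from typing import Callable, Dict, List


def run_recovery(pending: List[str], log: Dict[str, str],
                 recover_one: Callable[[str], None]) -> List[str]:
    """Recover objects in order, recording each completion in a recovery log.

    Objects already marked "done" are skipped, so a rerun after a failure
    resumes at the failed object instead of starting over.
    Returns the objects recovered during this call.
    """
    recovered = []
    for obj in pending:
        if log.get(obj) == "done":
            continue
        # May raise, e.g. on node power failure or network interruption;
        # the log is then left untouched for this object.
        recover_one(obj)
        log[obj] = "done"  # update only after copy and delete both finished
        recovered.append(obj)
    return recovered
```

In a first run that fails on one object, the log records only the objects before it; a second run after the system recovers re-executes the failed object and continues with the rest.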
In some embodiments, deleting, by asynchronous compact range, the invalid kv data generated after the kv data is deleted includes:
temporarily storing the kv data deleted in batches by DELETE RANGE, and marking it as invalid kv data;
if the quantity of the invalid kv data reaches a second preset value, deleting the invalid kv data by asynchronous compact range;
and if the quantity of the invalid kv data is smaller than the second preset value, deleting the invalid kv data iteratively.
In this embodiment, the invalid kv data is cleared promptly by asynchronous compact range to achieve a final, thorough cleanup, which prevents service io from being stalled in the db for a long time without returning, accelerates deletion and speeds up data recovery.
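One way to picture the second-preset-value decision is to track each interval removed by DELETE RANGE together with the number of entries it covered. In this hypothetical sketch, `InvalidTracker` and the default of 10 000 are assumptions (the description leaves the second preset value open), and merging all intervals into a single `[begin, end)` for one compaction is a design choice rather than something the description specifies.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SECOND_PRESET = 10_000  # hypothetical; the description leaves this value open


@dataclass
class InvalidTracker:
    """Records [begin, end) intervals removed by DELETE RANGE and their entry counts."""
    intervals: List[Tuple[str, str, int]] = field(default_factory=list)

    def mark(self, begin: str, end: str, count: int) -> None:
        # Temporarily store the deleted interval and treat it as invalid kv data.
        self.intervals.append((begin, end, count))

    def invalid_count(self) -> int:
        return sum(c for _, _, c in self.intervals)

    def plan_cleanup(self, threshold: int = SECOND_PRESET):
        """Return ("compact", begin, end), ("iterate", intervals) or None."""
        if not self.intervals:
            return None
        if self.invalid_count() >= threshold:
            begin = min(b for b, _, _ in self.intervals)
            end = max(e for _, e, _ in self.intervals)
            return ("compact", begin, end)
        return ("iterate", list(self.intervals))
```

In an LSM store, spanning all marked intervals with one range keeps the number of compaction tasks low at the cost of also compacting any live keys between them; submitting one task per interval is the obvious alternative.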
In some embodiments, deleting the invalid kv data by asynchronous compact range includes:
Determining the range of invalid kv data to be deleted through two parameters of begin and end;
after determining the range, submitting a task request of the compact range;
and according to the task request, deleting the invalid kv data asynchronously in the background.
In this embodiment, compact range is executed as an asynchronous task, which prevents the compaction from affecting normal service io.
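The asynchronous task mode can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`: the caller passes begin and end, receives a `Future` immediately, and keeps serving reads while a background thread purges invalid entries in that range. `ToyLsm` and its methods are hypothetical stand-ins for a storage engine, and the single lock is a simplification; a real LSM engine does not serialize reads behind compaction this way.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Optional


class ToyLsm:
    """Toy store in which deleted entries linger as None until compacted."""

    def __init__(self, items: Dict[str, int]):
        self._lock = threading.Lock()
        self._entries: Dict[str, Optional[int]] = dict(items)
        self._bg = ThreadPoolExecutor(max_workers=1, thread_name_prefix="compact")

    def delete_range(self, begin: str, end: str) -> None:
        # Mark entries invalid rather than removing them, like a range tombstone.
        with self._lock:
            for key in self._entries:
                if begin <= key < end:
                    self._entries[key] = None

    def _compact(self, begin: str, end: str) -> int:
        # Background job: physically drop invalid entries inside [begin, end).
        with self._lock:
            dead = [k for k, v in self._entries.items()
                    if v is None and begin <= k < end]
            for key in dead:
                del self._entries[key]
            return len(dead)

    def submit_compact_range(self, begin: str, end: str) -> "Future[int]":
        """Queue compaction of [begin, end) and return immediately."""
        return self._bg.submit(self._compact, begin, end)

    def get(self, key: str) -> Optional[int]:
        with self._lock:
            return self._entries.get(key)

    def physical_size(self) -> int:
        with self._lock:
            return len(self._entries)

    def close(self) -> None:
        self._bg.shutdown(wait=True)
```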
For ease of understanding by those skilled in the art, a set of preferred embodiments are provided below:
In a distributed storage system, range deletion is used in the data recovery stage. For example, when the data on the primary and secondary copies differ, data recovery is needed to bring them back into agreement. Likewise, if the disk service process holding the data repeatedly goes down and comes back up, a large amount of temporary object data may be left behind and must be cleared when the process starts, which also uses range deletion. Object data recorded in the db is repaired by copying the whole object, and the local object with the same name is deleted as a whole after the repair.
For a db object, all object contents, including data, extended attributes and omap (object map), are copied from the source to the destination as a whole. If the target object is large, for example with a large number of extended attributes or omap entries, the repair cannot be completed in one interaction, and multiple rounds of transfer are needed to recover the data completely. After the db object content has been copied completely, the local db object with the same name is deleted. Because a db object stores many kv entries, conventional iterative deletion inevitably takes a long time; range deletion by DELETE RANGE effectively increases deletion speed and improves data recovery efficiency. The specific scheme is as follows:
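The multi-round copy can be sketched as a cursor-based transfer: each round returns at most a fixed number of omap entries whose keys follow the last key already received. `next_chunk`, `copy_in_rounds` and the per-round limit are hypothetical; extended attributes and object data would be transferred the same way or in separate rounds.

```python
from typing import Dict, List, Optional, Tuple


def next_chunk(src: Dict[str, bytes], after: Optional[str],
               max_entries: int) -> List[Tuple[str, bytes]]:
    """Source side: up to max_entries pairs whose key sorts after `after`."""
    keys = sorted(k for k in src if after is None or k > after)[:max_entries]
    return [(k, src[k]) for k in keys]


def copy_in_rounds(src: Dict[str, bytes],
                   max_entries: int) -> Tuple[Dict[str, bytes], int]:
    """Target side: request chunks until the source has nothing left."""
    dst: Dict[str, bytes] = {}
    cursor: Optional[str] = None
    rounds = 0
    while True:
        chunk = next_chunk(src, cursor, max_entries)  # one interaction
        if not chunk:
            break
        dst.update(chunk)
        cursor = chunk[-1][0]  # resume point for the next round
        rounds += 1
    return dst, rounds
```

Because each round carries its own resume key, an interrupted transfer can continue from the last key received rather than from the beginning.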
1. The process flow using DELETE RANGE (range deletion) is shown in fig. 2 and comprises the following steps:
(1) Before data recovery starts, judging whether the data to be recovered is a common data object or a db object stored in omap. A common data object does not record many data entries, so it is copied and recovered directly; a db object stored in omap is copied in batches according to whether it is a master copy or a slave copy;
(2) When the db object is a slave copy, the master copy determines that one or more copies currently hold degraded objects, actively pushes the authoritative version of each degraded object to the corresponding copy, and that copy then completes the repair copy;
(3) When the db object is a master copy and the master copy itself holds a degraded object, the master copy selects a suitable copy according to the missing log records, pulls the authoritative version of the degraded object to the local node, and then completes the repair copy;
(4) During data recovery, the original object to be recovered (that is, the original db object with the same name as the copied db object) is deleted, which produces a large amount of kv data to delete. When the object is deleted, the quantity of kv data stored on the db object is judged: if it exceeds a certain range (for example, more than 1000 kv entries on a single object), the kv data is deleted through DELETE RANGE; otherwise, the original iterative deletion is used;
(5) Once copying and deletion are complete, data recovery for the object is finished; the distributed storage system updates the corresponding recovery log and processes the data of the remaining objects to be recovered according to the same flow;
(6) If an abnormal condition such as a node power failure or a network interruption occurs during data recovery and causes recovery of an object to fail, then after the environment (the distributed storage system) has recovered, the recovery flow of the object is re-executed using the corresponding recovery log.
2. The process flow using compact range is shown in fig. 3 and comprises the following steps:
(1) Deletion of kv data by DELETE RANGE is performed as a batch write. To keep the whole process consistent, the intervals of the deleted kv data are temporarily stored in batches and marked as invalid kv data, so the invalid kv data occupies extra db space;
(2) DELETE RANGE generates invalid kv data, and whether a compaction operation is needed is judged according to its quantity. If the quantity of invalid kv data does not reach the preset value, the default iterative processing is used. If it reaches the preset value, the range of the compaction is determined first, with the begin and end parameters delimiting the invalid kv data to be compacted, and then a compact range task request is submitted. The whole process runs asynchronously in the background so that compaction does not affect normal service io.
After DELETE RANGE deletes a large range of kv data, the db holds a large amount of temporarily invalid kv data. When much of this invalid kv data lies behind the object being iterated in key order, iteration triggers a large number of invalid reads. To prevent service io from being stalled in the db for a long time without returning, the invalid kv data must be cleaned up promptly by asynchronous compact range to achieve a final, thorough cleanup.
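The cost of invalid reads can be shown with a toy model in which an iterator physically walks every entry, live or invalid, until compaction drops the invalid ones. The sorted list of `(key, is_live)` pairs and the names `scan` and `compact` are hypothetical; the point is only that the number of skipped entries grows with the size of the deleted range until that range is compacted.

```python
import bisect
from typing import List, Tuple

Entry = Tuple[str, bool]  # (key, is_live)


def scan(entries: List[Entry], start: str, limit: int) -> Tuple[List[str], int]:
    """Return up to `limit` live keys >= start and the number of invalid entries skipped.

    `entries` must be sorted; it stands in for what an iterator physically
    walks in an LSM store before compaction.
    """
    i = bisect.bisect_left(entries, (start,))
    live: List[str] = []
    skipped = 0
    while i < len(entries) and len(live) < limit:
        key, is_live = entries[i]
        if is_live:
            live.append(key)
        else:
            skipped += 1  # an invalid read: work done that returns nothing
        i += 1
    return live, skipped


def compact(entries: List[Entry]) -> List[Entry]:
    """Drop invalid entries, as a compaction over the range would."""
    return [(k, live) for k, live in entries if live]
```

Looking up the first live key after a freshly range-deleted object with 5000 entries steps over all 5000 invalid entries; after `compact`, the same lookup skips none.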
Referring to fig. 4, the embodiment of the present invention further provides a kv deletion optimization system based on distributed storage, where the kv deletion optimization system based on distributed storage includes a data acquisition unit 100, an object copy unit 200, and a data deletion unit 300, where:
A data acquisition unit 100, configured to acquire data to be restored in the distributed storage system;
The object copying unit 200 is configured to determine a type of data to be restored, and copy the db object to perform data restoration if the data to be restored is the db object;
The data deletion unit 300 is configured to, after the db object is copied, delete the kv data of the original db object having the same name as the copied db object, judge the quantity of the kv data, delete the kv data in batches by DELETE RANGE if the quantity reaches a first preset value, and delete the invalid kv data generated by that deletion by asynchronous compact range.
It should be noted that, since a kv deletion optimization system based on distributed storage in this embodiment and a kv deletion optimization method based on distributed storage described above are based on the same inventive concept, the corresponding content in the method embodiment is also applicable to this system embodiment, and will not be described in detail here.
The embodiment of the invention also provides a kv deletion optimization device based on distributed storage, which comprises at least one control processor and a memory communicatively connected to the at least one control processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the distributed storage-based kv deletion optimization method of the above embodiments are stored in the memory, and when executed by the processor, they perform that method, for example, method steps S100 to S300 in fig. 1 described above.
The system embodiments described above are merely illustrative, in that the units illustrated as separate components may or may not be physically separate, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions that are executed by one or more control processors to cause the one or more control processors to perform a kv deletion optimization method based on distributed storage in the above method embodiments, for example, to perform the functions of the method steps S100 to S300 in fig. 1 described above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
While the preferred embodiments of the present application have been described in detail, the embodiments of the present application are not limited to the above-described embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the embodiments of the present application, and these equivalent modifications or substitutions are included in the scope of the embodiments of the present application as defined in the appended claims.