
CN117055805B - KV deletion optimization method, system, device and medium based on distributed storage - Google Patents

Info

Publication number
CN117055805B
Authority
CN
China
Prior art keywords
data
invalid
distributed storage
deleted
restored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310819534.6A
Other languages
Chinese (zh)
Other versions
CN117055805A (en)
Inventor
沈大勇
王涛
姚锋
张忠山
王沛
何磊
陈宇宁
陈盈果
刘晓路
杜永浩
闫俊刚
吕济民
陈英武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202310819534.6A
Publication of CN117055805A
Application granted
Publication of CN117055805B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a kv deletion optimization method, system, device and medium based on distributed storage. The method obtains the data to be recovered in a distributed storage system and determines its type. If the data to be recovered is a db object, the db object is copied to recover the data. After the copy completes, the original db object with the same name is deleted, which generates kv data to be deleted, and the number of kv data is determined. If the number of kv data reaches a first preset value, the kv data is deleted in batches using delete range, and the invalid kv data left behind by that deletion is removed using an asynchronous compact range. The invention speeds up deletion, improves the efficiency of data recovery, and solves the problem of thread timeouts during recovery.

Description

Kv deletion optimization method, system, equipment and medium based on distributed storage
Technical Field
The invention relates to the technical field of data processing, in particular to a kv deletion optimization method, a system, equipment and a medium based on distributed storage.
Background
In current practical applications of distributed storage, large-scale deletion of kv (key-value pairs) occurs mainly in data recovery scenarios such as bad disks, disk resets and capacity expansion, where kv must be deleted while a large amount of data is being recovered.
In the existing scheme, which deletes kv by iterating with an iterator, the kv entries must be deleted one by one. When a db (database) holds a large amount of data, the deletion can time out, which ultimately affects the service io (input/output, also called the service flow) and may even interrupt it. Moreover, the iterative deletion runs synchronously, so processing efficiency is low.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the invention provides a kv deletion optimization method, system, equipment and medium based on distributed storage, which can effectively speed up deletion, improve data recovery efficiency and solve the problem of thread timeouts during recovery.
In a first aspect, an embodiment of the present invention provides a kv deletion optimization method based on distributed storage, where the kv deletion optimization method based on distributed storage includes:
acquiring data to be recovered in a distributed storage system;
judging the type of the data to be recovered, and if the data to be recovered is a db object, copying the db object to recover the data;
after the db object is copied, deleting the kv data generated when the original db object with the same name as the db object is deleted, and judging the quantity of the kv data; if the quantity of the kv data reaches a first preset value, deleting the kv data in batches using DELETE RANGE, and deleting the invalid kv data generated after the kv data is deleted using an asynchronous compact range.
Compared with the prior art, the first aspect of the invention has the following beneficial effects:
The method obtains the data to be recovered in the distributed storage system and judges its type; if the data to be recovered is a db object, the db object is copied to recover the data. Judging the type of the data to be recovered allows a suitable object to be selected for the repair copy. After the db object is copied, the kv data generated when the original db object with the same name is deleted is itself deleted, and the quantity of the kv data is judged. If the quantity reaches a first preset value, the kv data is deleted in batches using DELETE RANGE, and the invalid kv data generated by that deletion is deleted using an asynchronous compact range. Batch deletion through DELETE RANGE effectively speeds up deletion and data recovery and solves the problem of thread timeouts during recovery. Deleting the invalid kv data with an asynchronous compact range prevents the service io from being stuck in the db for a long time without returning, which reduces the impact of data recovery on the service io, improves the robustness of the distributed storage system, accelerates data recovery and reduces hardware resource consumption.
According to some embodiments of the invention, before the copying the db object for data recovery, the kv deletion optimization method based on distributed storage further includes:
judging whether the db object is a master copy or a slave copy; if the db object is a master copy, copying a first db object corresponding to the data to be recovered using a data pull operation; and if the db object is a slave copy, copying a second db object corresponding to the data to be recovered using a data push operation.
According to some embodiments of the invention, the determining the type of the data to be restored further includes:
If the data to be recovered is a common data object, directly copying the data object corresponding to the data to be recovered to recover the data.
According to some embodiments of the invention, after determining the number of kv data, the kv deletion optimization method based on distributed storage further includes:
And if the number of the kv data is smaller than the first preset value, deleting the kv data in an iterative mode.
According to some embodiments of the invention, after deleting the kv data, the kv deletion optimization method based on distributed storage further includes:
judging the recovery condition of the data to be recovered, and if the data to be recovered fails to recover, waiting for the recovery of the distributed storage system;
and after the distributed storage system is restored, restoring the data to be restored according to a restoration log corresponding to the data to be restored.
According to some embodiments of the invention, the deleting the invalid kv data generated after deleting the kv data using an asynchronous compact range includes:
temporarily storing the kv data deleted in batches through DELETE RANGE, and marking the kv data as invalid kv data;
if the number of the invalid kv data reaches a second preset value, deleting the invalid kv data using the asynchronous compact range;
if the number of the invalid kv data is smaller than the second preset value, deleting the invalid kv data by iteration.
According to some embodiments of the invention, the deleting the invalid kv data using the asynchronous compact range includes:
determining the range of the invalid kv data to be deleted through the two parameters begin and end;
after determining the range, submitting a compact range task request;
and according to the task request, deleting the invalid kv data asynchronously in the background.
In a second aspect, an embodiment of the present invention further provides a kv deletion optimization system based on distributed storage, where the kv deletion optimization system based on distributed storage includes:
The data acquisition unit is used for acquiring data to be recovered in the distributed storage system;
The object copying unit is used for judging the type of the data to be restored, and copying the db object to restore the data if the data to be restored is the db object;
The data deleting unit is used for deleting, after the db object is copied, the kv data generated when the original db object with the same name as the db object is deleted, and for judging the quantity of the kv data; if the quantity of the kv data reaches a first preset value, the unit deletes the kv data in batches using DELETE RANGE and deletes the invalid kv data generated after the kv data is deleted using an asynchronous compact range.
In a third aspect, an embodiment of the present invention further provides a kv deletion optimization device based on distributed storage, including at least one control processor and a memory for communication connection with the at least one control processor, where the memory stores instructions executable by the at least one control processor, where the instructions are executed by the at least one control processor, so that the at least one control processor can perform a kv deletion optimization method based on distributed storage as described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform a kv deletion optimization method based on distributed storage as described above.
It is to be understood that the advantages of the second to fourth aspects compared with the related art are the same as those of the first aspect compared with the related art, and reference may be made to the related description in the first aspect, which is not repeated herein.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of a kv deletion optimization method based on distributed storage according to an embodiment of the invention;
FIG. 2 is a flow chart of range deletion in accordance with one embodiment of the present invention;
FIG. 3 is a flow chart of range compression of an embodiment of the present invention;
fig. 4 is a block diagram of a kv deletion optimization system based on distributed storage according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, the description of first, second, etc. is for the purpose of distinguishing between technical features only and should not be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated or implicitly indicating the precedence of the technical features indicated.
In the description of the present invention, it should be understood that the direction or positional relationship indicated with respect to the description of the orientation, such as up, down, etc., is based on the direction or positional relationship shown in the drawings, is merely for convenience of describing the present invention and simplifying the description, and does not indicate or imply that the apparatus or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the description of the present invention, unless explicitly defined otherwise, terms such as arrangement, installation, connection, etc. should be construed broadly and the specific meaning of the terms in the present invention can be determined reasonably by a person skilled in the art in combination with the specific content of the technical solution.
In the existing scheme, which deletes kv (key-value pairs) by iterating with an iterator, the kv entries must be deleted one by one. When a db (database) holds a large amount of data, the deletion can time out, which ultimately affects the service io (input/output, also called the service flow) and may even interrupt it. Moreover, the iterative deletion runs synchronously, so processing efficiency is low.
To solve these problems, the invention obtains the data to be recovered in the distributed storage system and judges its type; if the data to be recovered is a db object, the db object is copied to recover the data. Judging the type of the data to be recovered allows a suitable object to be selected for the repair copy. After the db object is copied, the kv data generated when the original db object with the same name is deleted is itself deleted, and the quantity of the kv data is judged. If the quantity reaches a first preset value, the kv data is deleted in batches using DELETE RANGE, and the invalid kv data generated by that deletion is deleted using an asynchronous compact range. Batch deletion through DELETE RANGE effectively speeds up deletion and data recovery and solves the problem of thread timeouts during recovery. Deleting the invalid kv data with an asynchronous compact range prevents the service io from being stuck in the db for a long time without returning, which reduces the impact of data recovery on the service io, improves the robustness of the distributed storage system, accelerates data recovery and reduces hardware resource consumption.
Referring to fig. 1, an embodiment of the present invention provides a kv deletion optimization method based on distributed storage, where the kv deletion optimization method based on distributed storage includes, but is not limited to, steps S100 to S300, where:
step S100, obtaining data to be restored in a distributed storage system;
step S200, judging the type of the data to be recovered, and if the data to be recovered is a db object, copying the db object to recover the data;
step S300, after the db object is copied, deleting the kv data generated when the original db object with the same name as the db object is deleted, and judging the quantity of the kv data; if the quantity of the kv data reaches a first preset value, deleting the kv data in batches using DELETE RANGE, and deleting the invalid kv data generated after the kv data is deleted using an asynchronous compact range.
In steps S100 to S300 of some embodiments, in order to select a suitable object for the data repair copy, the embodiment obtains the data to be recovered in the distributed storage system, judges its type, and copies the db object to recover the data if the data to be recovered is a db object. In order to speed up deletion and data recovery and to solve the problem of thread timeouts during recovery, after the db object is copied the embodiment deletes the kv data generated when the original db object with the same name is deleted and judges the quantity of the kv data; if the quantity reaches a first preset value, the kv data is deleted in batches using DELETE RANGE, and the invalid kv data generated by that deletion is deleted using an asynchronous compact range.
It should be noted that, in this embodiment, the first preset value may be changed according to actual needs, and this embodiment is not limited specifically.
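As a non-limiting illustration, the threshold decision of step S300 can be sketched in Python as follows. The `ToyKV` class, its methods, the `prefix_range` helper and the key layout are hypothetical stand-ins for whatever kv engine the distributed storage system uses; the value 1000 is only the example figure given later in this description.

```python
import bisect


class ToyKV:
    """Hypothetical in-memory ordered kv store; not a real storage engine API."""

    def __init__(self):
        self.keys = []              # sorted list of live keys
        self.values = {}
        self.range_tombstones = []  # ranges left behind by delete_range

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def keys_in(self, begin, end):
        """Return a copy of the live keys in [begin, end)."""
        lo = bisect.bisect_left(self.keys, begin)
        hi = bisect.bisect_left(self.keys, end)
        return self.keys[lo:hi]

    def delete(self, key):
        """Point delete, used by the iterative path."""
        if key in self.values:
            del self.values[key]
            self.keys.pop(bisect.bisect_left(self.keys, key))

    def delete_range(self, begin, end):
        """Batch delete of [begin, end); records the range for later compaction."""
        for key in self.keys_in(begin, end):
            del self.values[key]
        lo = bisect.bisect_left(self.keys, begin)
        hi = bisect.bisect_left(self.keys, end)
        del self.keys[lo:hi]
        self.range_tombstones.append((begin, end))


def prefix_range(prefix):
    """Half-open key range covering every key that starts with `prefix`."""
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)


FIRST_PRESET = 1000  # assumed example value; tunable per deployment


def delete_object_kv(store, begin, end, threshold=FIRST_PRESET):
    """Pick range deletion or iterative deletion from the kv count."""
    count = len(store.keys_in(begin, end))
    if count >= threshold:
        store.delete_range(begin, end)
        return "delete_range"
    for key in store.keys_in(begin, end):
        store.delete(key)
    return "iterative"
```

In this sketch an object holding 1500 entries takes the range path and leaves one range record behind, while an object holding 10 entries is deleted key by key and leaves nothing to compact.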
In some embodiments, the distributed storage based kv deletion optimization method further comprises, prior to copying the db object for data recovery:
judging whether the db object is a master copy or a slave copy; if the db object is a master copy, copying a first db object corresponding to the data to be recovered using a data pull operation; and if the db object is a slave copy, copying a second db object corresponding to the data to be recovered using a data push operation.
In this embodiment, according to the type of the db object, the copy recovery is performed in a suitable manner, so that the data recovery efficiency can be improved.
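The choice between pulling and pushing can be sketched as follows. This is a minimal illustration only: the `Replica` class, the `repair` function and the node names are hypothetical, and a plain dict stands in for the content of a db object.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical copy of a db object held on one node."""
    name: str
    is_master: bool
    omap: dict = field(default_factory=dict)


def repair(degraded, master, healthy_peer):
    """Overwrite the degraded copy with the authoritative content.

    Returns (operation, source_name) describing how the data travelled.
    """
    if degraded.is_master:
        # The master copy itself is degraded: it pulls from a healthy copy.
        source, operation = healthy_peer, "pull"
    else:
        # A slave copy is degraded: the master pushes its version to it.
        source, operation = master, "push"
    degraded.omap = dict(source.omap)
    return operation, source.name
```

For example, repairing a degraded master from a healthy slave returns `("pull", <slave name>)`, whereas repairing a degraded slave returns `("push", <master name>)`.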
In some embodiments, determining the type of data to be restored further comprises:
If the data to be recovered is a common data object, directly copying the data object corresponding to the data to be recovered to recover the data.
In this embodiment, since a common data object does not record many data entries, it is copied and recovered directly; recovering each kind of data in a manner suited to it improves the efficiency of data recovery.
In some embodiments, after determining the number of kv data, the kv deletion optimization method based on distributed storage further includes:
And if the number of the kv data is smaller than the first preset value, deleting the kv data in an iterative mode.
In this embodiment, DELETE RANGE leaves invalid kv data behind when it deletes kv data, and that invalid kv data triggers invalid reads that affect the service io. When the data volume is relatively small, deleting the kv data iteratively therefore avoids invalid reads affecting the service io.
In some embodiments, after deleting the kv data, the kv deletion optimization method based on the distributed storage further includes:
judging the recovery condition of the data to be recovered, and if the recovery of the data to be recovered fails, waiting for the recovery of the distributed storage system;
and after the distributed storage system is restored, restoring the data to be restored according to the restoration log corresponding to the data to be restored.
In this embodiment, if an abnormal situation such as node power failure or network interruption occurs during data recovery, recovery of the object's data fails. After the distributed storage system recovers, the recovery flow of that object is re-executed using the corresponding recovery log. This improves the efficiency of data recovery, because a failure in recovering some data does not force the whole process to be rerun from the beginning.
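A minimal sketch of resuming from a recovery log is given below. It assumes, for illustration only, that the log is a set of object names that have already been recovered and that a failure surfaces as a `ConnectionError`; the function name `run_recovery` is likewise hypothetical.

```python
def run_recovery(objects, recover_one, recovery_log):
    """Recover objects in order, skipping those already in the log.

    Returns True when every object is recovered, or False when a failure
    interrupts the pass; the caller retries after the system is back.
    """
    for name in objects:
        if name in recovery_log:
            continue  # finished in an earlier pass, do not redo it
        try:
            recover_one(name)
        except ConnectionError:
            return False  # e.g. node power failure or network interruption
        recovery_log.add(name)  # record completion for this object
    return True
```

A second call with the same log resumes at the object that failed, so objects recovered before the interruption are not copied again.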
In some embodiments, deleting the invalid kv data generated after deleting the kv data using an asynchronous compact range includes:
temporarily storing the kv data deleted in batches through DELETE RANGE, and marking the kv data as invalid kv data;
if the number of the invalid kv data reaches a second preset value, deleting the invalid kv data using the asynchronous compact range;
if the number of the invalid kv data is smaller than the second preset value, deleting the invalid kv data by iteration.
In this embodiment, the invalid kv data is cleared in time through the asynchronous compact range so that it is eventually cleaned up completely, which prevents the service io from being stuck in the db for a long time without returning, speeds up deletion and accelerates data recovery.
In some embodiments, deleting the invalid kv data using the asynchronous compact range includes:
Determining the range of invalid kv data to be deleted through two parameters of begin and end;
after determining the range, submitting a task request of the compact range;
and according to the task request, deleting the invalid kv data asynchronously in the background.
In this embodiment, the compact range is processed by adopting an asynchronous task mode, so that the influence of the compact range on the normal service io in the compression process can be avoided.
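The three steps above (determine the range from begin and end, submit the task request, process it in the background) can be sketched as follows. The `TombstoneStore` class and `submit_compact_range` function are hypothetical; a single-worker thread pool from the Python standard library stands in for the background task mechanism.

```python
from concurrent.futures import ThreadPoolExecutor


class TombstoneStore:
    """Hypothetical store that tracks only the invalid (range-deleted) keys."""

    def __init__(self, invalid_keys):
        self.invalid = sorted(invalid_keys)

    def compact_range(self, begin, end):
        """Physically drop invalid keys in [begin, end); return how many."""
        kept = [k for k in self.invalid if not (begin <= k < end)]
        dropped = len(self.invalid) - len(kept)
        self.invalid = kept
        return dropped


# One background worker, so compaction never runs on the service io path.
_compactor = ThreadPoolExecutor(max_workers=1)


def submit_compact_range(store, begin, end):
    """Submit the compaction task and return a future immediately."""
    return _compactor.submit(store.compact_range, begin, end)
```

The caller receives a future at once and may continue serving requests; calling `result()` on the future is only needed if the caller wants to know how many invalid entries were dropped.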
For ease of understanding by those skilled in the art, a set of preferred embodiments are provided below:
In a distributed storage system, range deletion is used in the data recovery stage. For example, when the data of the master and slave copies differ, the copies must be brought into agreement through data recovery. As another example, if the disk service process hosting the data repeatedly starts and stops, a large amount of temporary object data may be left behind and must be cleaned up when the disk service process starts, which also uses range deletion. Object data recorded in the db is repaired by copying the whole object, and the whole object is deleted after the repair.
For a db object, all object content, including data, extended attributes and omap (object map), is copied in its entirety from the source to the destination. If the target object has relatively large content, such as a large number of extended attributes or omap entries, the repair is difficult to complete in one interaction, and multiple transfers are required to recover the data completely. After the db object content has been copied completely, the local db object with the same name is deleted. Because a db object stores many kv entries, conventional iterative deletion is inevitably time-consuming; range deletion with DELETE RANGE therefore effectively speeds up deletion and improves data recovery efficiency. The specific scheme is as follows:
1. The process flow using DELETE RANGE (range deletion) is shown in fig. 2 and comprises the following steps:
(1) Before data recovery begins, judge whether the data to be recovered is a common data object or a db object stored in omap. If it is a common data object, copy and recover it directly, since a common data object does not record many data entries. If it is a db object stored in omap, copy it in batches to recover the data, according to whether the db object is a master copy or a slave copy;
(2) When the db object is a slave copy, the master copy determines that one or more copies currently hold degraded objects, actively pushes the authoritative version of each degraded object to the corresponding copy, and that copy then completes the data repair copy;
(3) When the db object is a master copy and the master copy holds a degraded object, the master copy selects a suitable copy according to the missing-log record, pulls the authoritative version of the degraded object to the local node, and then completes the data repair copy;
(4) The original object to be recovered (that is, the original db object with the same name as the copied db object) is deleted during data recovery, and deleting it generates a large amount of kv data to delete. When the object is deleted, judge the quantity of kv data stored on the db object: if it exceeds a certain range (for example, more than 1000 kv entries on a single object), delete through DELETE RANGE; otherwise, use the original iterative deletion;
(5) After the data to be recovered has been copied and the original deleted, data recovery for that object is complete. The distributed storage system updates the corresponding recovery log and recovers the remaining objects according to the same flow;
(6) If an abnormal condition such as node power failure or network interruption occurs during data recovery, recovery of the object data fails. After the storage environment (the distributed storage system) recovers, the recovery flow of the object is re-executed using the corresponding recovery log.
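The batch copying mentioned in step (1), where a db object with many omap entries cannot be transferred in one interaction, might look like the following sketch. The function name `copy_in_batches`, the batch size and the use of dicts for omap content are assumptions made for illustration.

```python
from itertools import islice


def copy_in_batches(source, target, batch_size):
    """Copy every entry of `source` into `target` in bounded rounds.

    Each round carries at most `batch_size` entries, standing in for one
    transfer between copies. Returns the number of rounds used.
    """
    entries = iter(sorted(source.items()))
    rounds = 0
    while True:
        batch = list(islice(entries, batch_size))
        if not batch:
            break
        target.update(batch)  # apply one transfer to the destination
        rounds += 1
    return rounds
```

With 2500 entries and a batch size of 1000, the copy completes in three rounds and the destination ends up equal to the source.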
2. The process flow using compact range is shown in fig. 3 and comprises the following steps:
(1) Deleting kv data through DELETE RANGE operates as a batch write. To ensure consistency of the whole process, the intervals of the deleted kv data are temporarily stored in batches and marked as invalid kv data, so the invalid kv data occupies extra db space;
(2) After DELETE RANGE completes, invalid kv data has been generated. Whether a compact range operation is needed is judged from the quantity of invalid kv data. If the quantity does not reach the preset value, the default iterative compact processing is used. If the quantity reaches the preset value, the range of invalid kv data to be compressed is first determined through the begin and end parameters, and a compact range task request is then submitted. To avoid affecting normal service io during compression, the whole process is handled asynchronously in the background.
After DELETE RANGE deletes a large range of kv data, the db holds a large amount of temporarily invalid kv data. When a large amount of invalid kv data lies behind the object being iterated, a large number of invalid reads are triggered. To prevent the service io from being stuck in the db for a long time without returning, the invalid kv data must be cleaned up in time through the asynchronous compact range, which achieves the final, thorough cleanup.
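The cost of invalid reads can be illustrated with a toy scan. In this sketch, which does not model any particular storage engine, the db is a sorted list of `(key, is_invalid)` pairs, and the names `scan_live` and `compact` are hypothetical.

```python
def scan_live(entries, start, limit):
    """Scan sorted (key, is_invalid) pairs from `start`.

    Returns (live_keys, skipped), where `skipped` counts the invalid
    entries that had to be read and discarded along the way.
    """
    live, skipped = [], 0
    for key, invalid in entries:
        if key < start:
            continue
        if invalid:
            skipped += 1  # an invalid read: work done, nothing returned
            continue
        live.append(key)
        if len(live) == limit:
            break
    return live, skipped


def compact(entries):
    """Drop the invalid entries, as a compact range over the whole db would."""
    return [(key, invalid) for key, invalid in entries if not invalid]
```

With 5000 invalid entries in front of a single live key, a scan for the first live key performs 5000 invalid reads before compaction and none afterwards, which is the effect the asynchronous compact range is meant to achieve.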
Referring to fig. 4, the embodiment of the present invention further provides a kv deletion optimization system based on distributed storage, where the kv deletion optimization system based on distributed storage includes a data acquisition unit 100, an object copy unit 200, and a data deletion unit 300, where:
A data acquisition unit 100, configured to acquire data to be restored in the distributed storage system;
The object copying unit 200 is configured to determine a type of data to be restored, and copy the db object to perform data restoration if the data to be restored is the db object;
The data deleting unit 300 is configured to delete, after the db object is copied, the kv data generated when the original db object with the same name as the db object is deleted, and to judge the quantity of the kv data; if the quantity of the kv data reaches a first preset value, the unit deletes the kv data in batches using DELETE RANGE and deletes the invalid kv data generated after the kv data is deleted using an asynchronous compact range.
It should be noted that, since a kv deletion optimization system based on distributed storage in this embodiment and a kv deletion optimization method based on distributed storage described above are based on the same inventive concept, the corresponding content in the method embodiment is also applicable to this system embodiment, and will not be described in detail here.
The embodiment of the invention also provides a kv deletion optimizing device based on the distributed storage, which comprises at least one control processor and a memory for communication connection with the at least one control processor.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software program and instructions required to implement the kv deletion optimization method based on distributed storage of the above embodiments are stored in the memory, and when they are executed by the processor, the kv deletion optimization method based on distributed storage of the above embodiments is performed, for example, the method steps S100 to S300 in fig. 1 described above.
The system embodiments described above are merely illustrative, in that the units illustrated as separate components may or may not be physically separate, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions that are executed by one or more control processors to cause the one or more control processors to perform the kv deletion optimization method based on distributed storage in the above method embodiments, for example, to perform the method steps S100 to S300 in fig. 1 described above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
While the preferred embodiments of the present application have been described in detail, the embodiments of the present application are not limited to the above-described embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the embodiments of the present application, and these equivalent modifications or substitutions are included in the scope of the embodiments of the present application as defined in the appended claims.

Claims (7)

1. A kv deletion optimization method based on distributed storage, characterized in that the kv deletion optimization method based on distributed storage comprises:
acquiring data to be restored in a distributed storage system;
determining the type of the data to be restored, and if the data to be restored is a db object, copying the db object to perform data restoration;
after the db object has been copied, kv data is generated when the original db object with the same name as the db object is deleted, and the quantity of the kv data is determined; if the quantity of the kv data is less than a first preset value, the kv data is deleted iteratively; if the quantity of the kv data reaches the first preset value, the kv data is deleted in batches using delete range, and the invalid kv data generated after deleting the kv data is deleted using asynchronous compact range, wherein:
the kv data deleted in batches through delete range is temporarily saved and marked as invalid kv data;
if the quantity of the invalid kv data reaches a second preset value, the invalid kv data is deleted using asynchronous compact range, wherein:
the range of the invalid kv data to be deleted is determined by the two parameters begin and end;
after the range is determined, a compact range task request is submitted;
according to the task request, the invalid kv data is deleted asynchronously in the background;
if the quantity of the invalid kv data is less than the second preset value, the invalid kv data is deleted iteratively.

2. The kv deletion optimization method based on distributed storage according to claim 1, characterized in that before copying the db object to perform data restoration, the kv deletion optimization method based on distributed storage further comprises:
determining whether the db object is a master copy or a slave copy; if the db object is a master copy, using a data pull operation to copy a first db object corresponding to the data to be restored; if the db object is a slave copy, using a data push operation to copy a second db object corresponding to the data to be restored.

3. The kv deletion optimization method based on distributed storage according to claim 1, characterized in that determining the type of the data to be restored further comprises:
if the data to be restored is an ordinary data object, directly copying the data object corresponding to the data to be restored to perform data restoration.

4. The kv deletion optimization method based on distributed storage according to claim 1, characterized in that after deleting the kv data, the kv deletion optimization method based on distributed storage further comprises:
determining the restoration status of the data to be restored, and if restoration of the data to be restored fails, waiting for the distributed storage system to recover;
after the distributed storage system recovers, restoring the data to be restored again according to the restoration log corresponding to the data to be restored.

5. A kv deletion optimization system based on distributed storage, characterized in that the kv deletion optimization system based on distributed storage comprises:
a data acquisition unit, configured to acquire data to be restored in a distributed storage system;
an object copying unit, configured to determine the type of the data to be restored, and if the data to be restored is a db object, to copy the db object to perform data restoration;
a data deleting unit, configured such that, after the db object has been copied, kv data is generated when the original db object with the same name as the db object is deleted, and the quantity of the kv data is determined; if the quantity of the kv data is less than a first preset value, the kv data is deleted iteratively; if the quantity of the kv data reaches the first preset value, the kv data is deleted in batches using delete range, and the invalid kv data generated after deleting the kv data is deleted using asynchronous compact range, wherein:
the kv data deleted in batches through delete range is temporarily saved and marked as invalid kv data;
if the quantity of the invalid kv data reaches a second preset value, the invalid kv data is deleted using asynchronous compact range, wherein:
the range of the invalid kv data to be deleted is determined by the two parameters begin and end;
after the range is determined, a compact range task request is submitted;
according to the task request, the invalid kv data is deleted asynchronously in the background;
if the quantity of the invalid kv data is less than the second preset value, the invalid kv data is deleted iteratively.

6. A kv deletion optimization device based on distributed storage, characterized in that it comprises at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor, and the instructions are executed by the at least one control processor so that the at least one control processor can perform the kv deletion optimization method based on distributed storage according to any one of claims 1 to 4.

7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are used to cause a computer to perform the kv deletion optimization method based on distributed storage according to any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310819534.6A CN117055805B (en) 2023-07-05 2023-07-05 KV deletion optimization method, system, device and medium based on distributed storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310819534.6A CN117055805B (en) 2023-07-05 2023-07-05 KV deletion optimization method, system, device and medium based on distributed storage

Publications (2)

Publication Number Publication Date
CN117055805A CN117055805A (en) 2023-11-14
CN117055805B true CN117055805B (en) 2024-12-06

Family

ID=88656158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310819534.6A Active CN117055805B (en) 2023-07-05 2023-07-05 KV deletion optimization method, system, device and medium based on distributed storage

Country Status (1)

Country Link
CN (1) CN117055805B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955530A (en) * 2014-05-12 2014-07-30 暨南大学 Data reconstruction and optimization method of on-line repeating data deletion system
CN110249321A (en) * 2017-09-29 2019-09-17 甲骨文国际公司 For the system and method that capture change data use from distributed data source for heterogeneous target

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10614037B2 (en) * 2017-03-31 2020-04-07 International Business Machines Corporation Optimized deduplicated object storage system
CN111625506A (en) * 2020-05-29 2020-09-04 浪潮电子信息产业股份有限公司 Distributed data deleting method, device and equipment based on deleting queue
CN112269781B (en) * 2020-11-13 2023-07-25 网易(杭州)网络有限公司 Data life cycle management method, device, medium and electronic equipment
CN112364278A (en) * 2020-11-23 2021-02-12 浪潮云信息技术股份公司 Data classification optimization method based on CockroachDB bottom key values
KR102408676B1 (en) * 2021-10-06 2022-06-15 유춘열 Data security video surveillance method using distributed self-replication technology
CN114138180A (en) * 2021-10-24 2022-03-04 济南浪潮数据技术有限公司 Method, device and equipment for deleting object and readable medium
CN115203190A (en) * 2022-07-27 2022-10-18 济南浪潮数据技术有限公司 Method, device and medium for deleting garbage object

Also Published As

Publication number Publication date
CN117055805A (en) 2023-11-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant