CN119576226B - Metadata garbage collection method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN119576226B (application CN202411622074.9A)
- Authority
- CN
- China
- Prior art keywords
- metadata
- storage
- persistence
- storage area
- segment
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0652—Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/128—Details of file system snapshots on the file-level, e.g. snapshot creation, administration, deletion
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Memory System (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the disclosure provide a metadata garbage collection method and apparatus, an electronic device, and a storage medium. In response to a garbage collection instruction, the method detects a metadata persistence storage area in a zoned namespace (ZNS) storage device and determines a target storage segment. The metadata persistence storage area comprises storage segments that store metadata by sequential writes. The target storage segment is a storage segment to be garbage-collected, and the metadata is used to recover the corresponding service data. A corresponding metadata snapshot is generated for the target storage segment and stored in the metadata persistence storage area. The metadata snapshot records the most recently generated metadata in the target storage segment. The data in the target storage segment is then cleared. In this way, the storage device can still recover its data, storage space utilization and space reclamation efficiency are improved, and the storage device operates more stably.
Description
Technical Field
Embodiments of the present disclosure relate to the field of cloud storage technology, and in particular to a metadata garbage collection method and apparatus, an electronic device, and a storage medium.
Background
In cloud storage, storage devices based on Zoned Namespaces (ZNS) technology manage their storage space by dividing it into multiple storage segments (segments). This optimizes how data is stored and accessed and improves read/write performance.
In the prior art, only a small portion of the space is reserved for the metadata used in data recovery, so that as much space as possible is left for users. Moreover, a ZNS-based storage device can only write data sequentially and cannot overwrite it in place. As a result, metadata suffers from insufficient storage space and low space reclamation efficiency, which affects the operational stability of the storage device.
Disclosure of Invention
Embodiments of the present disclosure provide a metadata garbage collection method and apparatus, an electronic device, and a storage medium. They address the problems of insufficient metadata storage space and low space reclamation efficiency.
In a first aspect, an embodiment of the present disclosure provides a metadata garbage collection method, including:
In response to a garbage collection instruction, detecting a metadata persistence storage area in a ZNS-based storage device and determining a target storage segment. The metadata persistence storage area comprises storage segments that store metadata by sequential writes. The target storage segment is a storage segment to be garbage-collected, and the metadata is used to recover the corresponding service data. Generating a corresponding metadata snapshot for the target storage segment and storing the metadata snapshot in the metadata persistence storage area, where the metadata snapshot records the most recently generated metadata in the target storage segment. Clearing the data in the target storage segment.
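To make the three steps concrete, here is a minimal in-memory sketch in Python. It is not the disclosed implementation. It rests on three hypothetical assumptions:

- each metadata record is a `(key, value)` pair appended to a segment;
- a record is out of date once a later record with the same key exists;
- a metadata snapshot is simply the latest value per key.

The names `Segment` and `collect_garbage` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical append-only metadata segment: records are only ever appended."""
    name: str
    records: list = field(default_factory=list)  # (key, value) pairs in write order

    def append(self, key, value):
        self.records.append((key, value))

    def invalid_ratio(self) -> float:
        # A record is invalid (out of date) if a later record in the segment has the same key.
        if not self.records:
            return 0.0
        last_index = {key: i for i, (key, _) in enumerate(self.records)}
        invalid = sum(1 for i, (key, _) in enumerate(self.records) if last_index[key] != i)
        return invalid / len(self.records)

    def reset(self):
        # Models a zone reset: the whole segment is cleared at once.
        self.records.clear()


def collect_garbage(segments, snapshot_segment):
    """S101-S103 in miniature; returns the name of the segment that was collected."""
    # S101: target = segment with the largest proportion of invalid data.
    target = max(segments, key=lambda s: s.invalid_ratio())
    # S102: snapshot = latest value for each key; later writes overwrite earlier ones.
    snapshot = {}
    for key, value in target.records:
        snapshot[key] = value
    snapshot_segment.append(f"snapshot:{target.name}", snapshot)
    # S103: only after the snapshot is stored is the target segment cleared.
    target.reset()
    return target.name
```

Note the ordering: the snapshot is appended before the target segment is reset. The latest metadata therefore never exists only in the segment being erased.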
In a second aspect, an embodiment of the present disclosure provides a metadata garbage collection apparatus, including:
a detection module, configured to, in response to a garbage collection instruction, detect a metadata persistence storage area in a ZNS-based storage device and determine a target storage segment, where the metadata persistence storage area comprises storage segments that store metadata by sequential writes, the target storage segment is a storage segment to be garbage-collected, and the metadata is used to recover the corresponding service data;
a generation module, configured to generate a corresponding metadata snapshot for the target storage segment and store the metadata snapshot in the metadata persistence storage area, where the metadata snapshot records the most recently generated metadata in the target storage segment; and
a cleaning module, configured to clear the data in the target storage segment.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor and a memory;
The memory stores computer-executable instructions;
The processor executes the computer-executable instructions stored in the memory, causing the processor to perform the metadata garbage collection method according to the first aspect and its various possible designs.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the metadata garbage collection method according to the first aspect and its various possible designs.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the metadata garbage collection method according to the first aspect and the various possible designs of the first aspect.
The embodiments provide a metadata garbage collection method and apparatus, an electronic device, and a storage medium. In response to a garbage collection instruction, the metadata persistence storage area of a ZNS-based storage device is detected and a target storage segment is determined. The metadata persistence storage area comprises storage segments that store metadata by sequential writes, the target storage segment is a storage segment to be garbage-collected, and the metadata is used to recover the corresponding service data. A corresponding metadata snapshot, which records the most recently generated metadata in the target storage segment, is generated and stored in the metadata persistence storage area, and the data in the target storage segment is cleared.

In outline, the method proceeds as follows:

- it determines the target storage segment that needs garbage collection in the metadata persistence storage area;
- it triggers a snapshot generation event for that segment to produce a snapshot of its metadata (the metadata snapshot);
- it stores the metadata snapshot in the metadata persistence storage area;
- it clears the data in the target storage segment, which completes garbage collection of that segment.

Because the metadata snapshot is persisted in the metadata persistence storage area, the storage device can still recover its data. Storage space utilization and space reclamation efficiency improve, and the storage device runs more stably.
Drawings
To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present disclosure. A person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is an application scenario diagram of a metadata garbage collection method provided in an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a metadata garbage collection method according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a metadata persistence storage area provided by an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a process for determining a target storage segment according to an embodiment of the present disclosure;
Fig. 5 is a flowchart of a specific implementation of step S102 in the embodiment shown in Fig. 2;
Fig. 6 is a second schematic flowchart of a metadata garbage collection method according to an embodiment of the present disclosure;
Fig. 7 is a flowchart of a specific implementation of step S200 in the embodiment shown in Fig. 6;
Fig. 8 is a schematic diagram of a process for persisting metadata snapshots provided by an embodiment of the present disclosure;
Fig. 9 is a flowchart of a specific implementation of step S203 in the embodiment shown in Fig. 6;
Fig. 10 is a block diagram of a metadata garbage collection apparatus according to an embodiment of the present disclosure;
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
Fig. 12 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions of the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings, to make the objectives, technical solutions, and advantages of the embodiments clearer. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this disclosure without creative effort fall within the scope of this disclosure.
It should be noted that the user information and data involved in the present disclosure have been authorized by the user or fully authorized by all parties. User information includes, but is not limited to, user device information and user personal information. Data includes, but is not limited to, data used for analysis, stored data, and displayed data. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. Corresponding entry points are provided for users to grant or refuse authorization.
The application scenario of the embodiments of the present disclosure is explained below:
Fig. 1 is an application scenario diagram of the metadata garbage collection method provided by an embodiment of the present disclosure. The method may be applied to cloud storage scenarios, and more particularly to distributed cloud storage scenarios. The execution body of the embodiment may be a storage node in a distributed cloud storage system, for example, a storage device or another electronic device with similar functions.

In some embodiments, the storage device implements the metadata garbage collection method provided by the embodiments of the present disclosure by running computer-executable instructions or computer programs. For example, the computer-executable instructions may be program-level commands, machine instructions, or software instructions. The computer program may take any of the following forms:

- a native program or a software module in an operating system;
- a local application, that is, a program that must be installed in the operating system to run;
- an applet embedded in any app, that is, a program that runs in a browser environment.

In summary, the computer-executable instructions may take any form, and the computer program may be any form of application, module, or plug-in, configured as needed. Further, when implementing the method, the storage device may run locally stored computer-executable instructions or computer programs. Alternatively, it may call computer-executable instructions or computer programs on an external server.
In some embodiments, the server may be an independent physical server, or a server cluster or distributed system formed by multiple physical servers. It may also be a cloud server that provides basic cloud computing services. Such services include cloud services, cloud storage, cloud communication, cloud databases, cloud computing, cloud functions, network services, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms.
Referring to Fig. 1, a ZNS-based storage device abstracts its storage space into consecutive storage segments (segments) through a specific interface. Each segment is a software-layer mapping of the physical storage medium. Most segments, such as segment_00 to segment_95 shown in the figure, store service data and form the service data storage area. The remaining few segments, such as segment_96 to segment_99, store the metadata corresponding to the service data and form the metadata persistence storage area. This division is fixed once the partitions are formatted: the space allocated to service data segments and to metadata segments no longer changes.
Because of how zoned namespace (ZNS) technology works, metadata, like service data, can only be written sequentially into a storage segment. Typically, a log record is appended to the segment for each write operation. Once a storage segment is full, its space can be reused only after the valid data in it has been migrated elsewhere and the segment has been erased.

However, the space reserved for metadata is small. When service data is modified frequently, the resulting flood of operation log records quickly fills the metadata segments and triggers garbage collection. Frequent garbage collection repeatedly moves the same log records, which incurs additional resource overhead and leads to insufficient metadata storage space and low space reclamation efficiency.
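The sequential-write constraint, and the cost of reclaiming a full segment, can be modelled as follows. This is a hypothetical sketch, not a ZNS driver API. The names `ZonedSegment`, `reclaim`, and the `is_valid` callback are assumptions, and capacity is counted in records rather than bytes.

```python
class SegmentFullError(Exception):
    """Raised when appending to a segment whose write pointer has reached capacity."""


class ZonedSegment:
    """Hypothetical ZNS-style segment: append-only, and erased only as a whole."""

    def __init__(self, name: str, capacity: int):
        self.name = name
        self.capacity = capacity
        self.entries = []  # log records in write order

    @property
    def write_pointer(self) -> int:
        return len(self.entries)

    def append(self, entry) -> int:
        # Sequential write: data can only be added at the write pointer.
        if self.write_pointer >= self.capacity:
            raise SegmentFullError(self.name)
        self.entries.append(entry)
        return self.write_pointer - 1  # offset at which the entry was written

    def reset(self):
        # Erase the whole segment; individual entries cannot be overwritten in place.
        self.entries = []


def reclaim(full: ZonedSegment, spare: ZonedSegment, is_valid) -> int:
    """Migrate valid entries from `full` into `spare`, then erase `full`."""
    moved = 0
    for entry in full.entries:
        if is_valid(entry):
            spare.append(entry)
            moved += 1
    full.reset()
    return moved
```

Every call to `reclaim` rewrites all still-valid records into another segment. When the metadata area is small, this repeated movement of log records is what makes frequent garbage collection expensive.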
The embodiment of the disclosure provides a metadata garbage collection method to solve the above problems.
Fig. 2 is a schematic flowchart of a metadata garbage collection method according to an embodiment of the disclosure. The method of this embodiment can be applied to a storage device and includes:
Step S101, in response to a garbage collection instruction, detecting a metadata persistence storage area in a ZNS-based storage device and determining a target storage segment. The metadata persistence storage area comprises storage segments that store metadata by sequential writes, and the target storage segment is a storage segment to be garbage-collected.
For example, referring to the application scenario shown in Fig. 1, the execution body of the method may be a storage device or a control unit within the storage device. The following description takes the control unit as the execution body. The storage device is provided with one or more storage segments and controls them through the control unit to implement the method of this embodiment.

Specifically, in response to a garbage collection instruction, the control unit detects the metadata persistence storage area in the ZNS-based storage device, that is, the storage space in the device used to store metadata-related information. This area comprises one or more storage segments (segments), each of which stores metadata written sequentially. The control unit inspects each storage segment in the area using preset detection logic and determines one or more of them as target storage segments.

The target storage segments are determined from the storage load of each storage segment. In one possible implementation, the storage load is the proportion of invalid data in a storage segment, and the one or more segments with the largest proportion of invalid data become target storage segments. Invalid data is out-of-date metadata in the storage segment. The higher its proportion, the more the segment needs garbage collection to release its storage space.
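If storage load is read as the proportion of invalid data, selecting target segments reduces to ranking segments by that proportion. The following is a minimal sketch under that reading. It assumes hypothetical per-segment byte counters (`total_bytes`, `invalid_bytes`) that the control unit would maintain; the names are illustrative.

```python
import heapq
from dataclasses import dataclass


@dataclass
class SegmentStats:
    name: str
    total_bytes: int    # bytes written to the segment so far (hypothetical counter)
    invalid_bytes: int  # bytes holding out-of-date metadata (hypothetical counter)

    @property
    def storage_load(self) -> float:
        # Storage load read as the proportion of invalid data; an empty segment has none.
        return self.invalid_bytes / self.total_bytes if self.total_bytes else 0.0


def pick_targets(stats, count=1):
    """Return the names of the `count` segments with the largest invalid-data proportion."""
    return [s.name for s in heapq.nlargest(count, stats, key=lambda s: s.storage_load)]
```

For example, with segment_97 at 70% invalid data and segment_99 at 50%, `pick_targets(stats, 2)` returns `["segment_97", "segment_99"]`.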
In one possible implementation, Fig. 3 is a schematic diagram of a metadata persistence storage area provided by an embodiment of the present disclosure. As shown in Fig. 3, the metadata persistence storage area includes a first metadata persistence storage area and a second metadata persistence storage area.

The first metadata persistence storage area stores historical operation records for the service data. These operation records (journal logs) capture operations on stand-alone storage engine files (sessions), such as creating, freezing (freeze), discarding (discard), and deleting. They also capture operations on storage segments, such as opening, closing, and erasing. A stand-alone storage engine file is a subunit within a storage segment that stores service data and is not described in detail here. Replaying these operation records reproduces the operations on the service data and thereby enables its recovery. For this reason, the first metadata persistence storage area is also called the operation log persistence storage area (journal segment).

The second metadata persistence storage area stores snapshot (checkpoint) data of the metadata, that is, metadata snapshots. Snapshot generation is a data persistence technique: taking a snapshot of a storage segment or a stand-alone storage engine file persists the information about its metadata, which in turn enables recovery of the service data. Because it stores the system-generated metadata snapshots, the second metadata persistence storage area is also called the metadata snapshot persistence storage area (meta segment).
Accordingly, step S101 may be implemented in either or both of two ways. In response to a first garbage collection instruction, the control unit detects the first metadata persistence storage area and determines a target storage segment in it, so that garbage collection of the first area is completed in the subsequent steps. In response to a second garbage collection instruction, it detects the second metadata persistence storage area and determines a target storage segment in it, so that garbage collection of the second area is completed in the subsequent steps.
It can be understood that, in another possible implementation, the garbage collection instruction may include both the first and the second garbage collection instructions. In that case, in response to the garbage collection instruction, the control unit detects target storage segments in both the first and the second metadata persistence storage areas. Garbage collection of both areas is thereby completed. This can be configured as needed and is not further illustrated here.
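The first and second garbage collection instructions, and their combination, map naturally onto bit flags. The encoding below is a hypothetical illustration, not an interface defined in the disclosure. This applies to `GcInstruction`, `areas_to_scan`, and the area names alike.

```python
from enum import IntFlag


class GcInstruction(IntFlag):
    """Hypothetical encoding of the garbage collection instructions."""
    FIRST = 1   # collect the first area (operation log / journal segments)
    SECOND = 2  # collect the second area (metadata snapshot / meta segments)


def areas_to_scan(instruction: GcInstruction) -> list:
    """Return the metadata persistence storage areas the control unit should detect."""
    areas = []
    if instruction & GcInstruction.FIRST:
        areas.append("first_metadata_area")
    if instruction & GcInstruction.SECOND:
        areas.append("second_metadata_area")
    return areas
```

Passing `GcInstruction.FIRST | GcInstruction.SECOND` models the case where one instruction triggers garbage collection of both areas.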
Further, the two areas use different selection rules:

- When the first metadata persistence storage area is garbage-collected in response to the first garbage collection instruction, the control unit obtains the data write timestamp of each storage segment in that area while detecting it. The storage segment with the earliest write timestamp becomes the target storage segment.
- When the second metadata persistence storage area is garbage-collected in response to the second garbage collection instruction, the control unit obtains the storage load of each storage segment in that area while detecting it. At least one storage segment with the largest storage load becomes a target storage segment.
Fig. 4 is a schematic diagram of a process for determining a target storage segment according to an embodiment of the present disclosure.

As shown in Fig. 4, the control unit may detect the first metadata persistence storage area in response to the first garbage collection instruction. That area stores historical operation records of service data. These records are inherently time-ordered, so recovering data from them requires replaying the records strictly in chronological order. Garbage collection in this area (segments shown as journ_segment_1, journ_segment_2, and so on, with storage loads of 30% and 76%, respectively) therefore starts from the earliest-written storage segment. The control unit obtains the data write timestamp of each storage segment in the area. It then determines the segment with the earliest write timestamp, for example journ_segment_1 in the figure, as the target storage segment, even though its storage load is lower than that of journ_segment_2.

The control unit may also detect the second metadata persistence storage area in response to the second garbage collection instruction. That area stores metadata snapshots, which can recover the corresponding service data regardless of log time. The target storage segments can therefore be chosen directly by storage load: the one or more segments with the largest storage load become target storage segments. In the figure, the two segments with the largest storage loads are meta_segment_1 (storage load = 60%) and meta_segment_2 (storage load = 75%), so both are determined as target storage segments.
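The two selection policies of Fig. 4 can be sketched as two small functions. The `first_write_ts` field assumes that a segment's data write timestamp is the time of its first write. Its values, like the helper names, are illustrative rather than taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class SegmentInfo:
    name: str
    first_write_ts: int  # assumed: timestamp of the first record written to the segment
    storage_load: float  # fraction in [0.0, 1.0]


def select_journal_target(segments):
    # Journal records must be replayed in time order, so reclaim the oldest segment first.
    return min(segments, key=lambda s: s.first_write_ts).name


def select_meta_targets(segments, count):
    # Snapshots do not depend on log order, so reclaim the most heavily loaded segments.
    ranked = sorted(segments, key=lambda s: s.storage_load, reverse=True)
    return [s.name for s in ranked[:count]]
```

Suppose journ_segment_1 was written before journ_segment_2. Then `select_journal_target` returns `journ_segment_1` even though its load (30%) is below journ_segment_2's (76%). `select_meta_targets` with a count of 2 ranks meta_segment_2 (75%) ahead of meta_segment_1 (60%).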
Further, determining at least one storage segment with the largest storage load as the target storage segment may specifically include:
Step S1011, obtaining the comprehensive load value of the second metadata persistence storage area, and determining a target number according to the comprehensive load value and preset load mapping information. The preset load mapping information maps comprehensive load values to numbers of storage segments.
Step S1012, determining the target number of storage segments with the largest storage loads in the second metadata persistence storage area as target storage segments.
Before determining the target storage segments, the control unit may first obtain the comprehensive load value of the second metadata persistence storage area. This value is the current usage of the storage segments that store metadata snapshots. From it, the control unit determines how many storage segments in the second area to select as targets, that is, the target number. Within a certain range, the target number increases with the comprehensive load value. For example:

- when the comprehensive load value is 60%, the target number is 1, and the single storage segment with the largest storage load becomes the target storage segment;
- when the comprehensive load value is 80%, the target number is 3, and the three storage segments with the largest storage loads become target storage segments.

With these steps, more garbage is collected when the comprehensive load value of the second metadata persistence storage area is high. The comprehensive load value therefore drops quickly, and garbage collection efficiency improves.
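Steps S1011 and S1012 can be sketched with a threshold table. Only the two data points given in the text come from the disclosure: 60% maps to 1 segment and 80% maps to 3. The following are assumptions:

- the `LOAD_MAPPING` table itself;
- its behaviour between and below those two points;
- the function names.

```python
import bisect

# Hypothetical preset load mapping: (minimum comprehensive load, number of segments to collect).
# Only 0.6 -> 1 and 0.8 -> 3 come from the text; the 0.0 -> 1 floor is an assumption.
LOAD_MAPPING = [(0.0, 1), (0.6, 1), (0.8, 3)]


def target_number(comprehensive_load: float) -> int:
    """Step S1011: look up how many segments to collect for the current load."""
    thresholds = [t for t, _ in LOAD_MAPPING]
    idx = bisect.bisect_right(thresholds, comprehensive_load) - 1
    return LOAD_MAPPING[max(idx, 0)][1]


def select_targets(loads: dict, comprehensive_load: float) -> list:
    """Step S1012: the target number of segments with the largest storage loads."""
    n = target_number(comprehensive_load)
    return sorted(loads, key=loads.get, reverse=True)[:n]
```

With this table, any load from 60% up to (but not including) 80% selects one segment, and 80% or more selects three.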
Step S102, generating a corresponding metadata snapshot for the target storage segment and storing the metadata snapshot in the metadata persistence storage area. The metadata snapshot records the most recently generated metadata in the target storage segment and is used to recover the service data corresponding to the metadata.
Step S103, clearing the data in the target storage segment.
For example, after the target storage segment in the metadata persistence storage area is determined, a corresponding metadata snapshot is generated for it. Generation may be implemented through a functional interface provided by the ZNS system; the specific process is not described in detail here. The metadata snapshot is then stored in the metadata persistence storage area, which persists it. If the service data needs to be recovered when the system restarts, the corresponding metadata snapshot can be read from the metadata persistence storage area. The service data is then restored, based on that snapshot, to its state at the time the snapshot was generated. Once the metadata snapshot has been generated and persisted, the data in the target storage segment can be cleared, or the target storage segment can be deleted, which completes garbage collection.
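The persistence and restart path can be sketched as follows. The sketch rests on two hypothetical assumptions:

- each snapshot carries a monotonically increasing sequence number;
- each snapshot covers only the metadata of the segment it was taken from, so recovery applies snapshots in sequence order.

JSON serialization and the `seq`/`state` field names are illustrative choices, not the disclosed on-disk format.

```python
import json


def persist_snapshot(meta_area: list, seq: int, snapshot: dict) -> None:
    """Append a serialized snapshot; the area is append-only, like a ZNS segment."""
    meta_area.append(json.dumps({"seq": seq, "state": snapshot}))


def recover(meta_area: list) -> dict:
    """Rebuild metadata on restart by applying snapshots in sequence order."""
    state = {}
    for record in sorted((json.loads(r) for r in meta_area), key=lambda r: r["seq"]):
        state.update(record["state"])  # newer snapshots override older entries
    return state
```

Ordering by `seq` rather than by position in the area keeps recovery correct even if snapshots from different segments were appended out of order.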
In one possible implementation, the ZNS-based storage device includes at least one stand-alone storage engine file, which stores the service data corresponding to the metadata. Before step S102, the method further includes obtaining validity information of the stand-alone storage engine file corresponding to the target storage segment. The validity information indicates whether that file is valid.
Correspondingly, step S102 may specifically include generating a corresponding metadata snapshot for the target storage segment if the validity information indicates that the stand-alone storage engine file is in a valid state.
In this embodiment, before generating the metadata snapshot, the control unit first obtains the stand-alone storage engine file corresponding to the metadata in the target storage segment and checks its validity information. If the file has been deleted or moved to another storage device or medium, it is in an invalid state, and the subsequent snapshot generation step can be skipped to save resource overhead. If the file is stored normally, it is in a valid state, and the corresponding functional interface is called for the file to generate the metadata snapshot.
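The validity check in front of snapshot generation can be sketched as below. The `FileState` values mirror the cases named in the text: deleted, moved, and stored normally. Two details are assumptions: an unknown file is treated as deleted, and snapshot generation is passed in as the `make_snapshot` callback.

```python
from enum import Enum


class FileState(Enum):
    VALID = "valid"      # stored normally
    DELETED = "deleted"
    MOVED = "moved"      # relocated to another storage device or medium


def maybe_snapshot(file_states: dict, file_id: str, make_snapshot):
    """Generate a snapshot only for a valid stand-alone storage engine file."""
    # Assumption: a file with no recorded state is treated as deleted.
    if file_states.get(file_id, FileState.DELETED) is not FileState.VALID:
        return None  # skip generation to save resource overhead
    return make_snapshot(file_id)
```

Skipped files incur no snapshot work at all, which is where the resource saving described above comes from.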
Further, in one possible implementation, as shown in Fig. 5, step S102 specifically includes:
Step S1021, acquiring the storage priorities of the first metadata persistence storage area and the second metadata persistence storage area.
Step S1022, determining a proportional value between first metadata snapshots and second metadata snapshots according to the storage priorities. A first metadata snapshot is generated from a target storage segment in the first metadata persistence storage area, and a second metadata snapshot is generated from a target storage segment in the second metadata persistence storage area.
Step S1023, executing the metadata relocation requests of each area through a pre-configured task queue according to the proportional value, so as to generate the corresponding numbers of first and second metadata snapshots per unit time.
Step S1024, storing the first metadata snapshot and the second metadata snapshot into the metadata persistence storage area.
In this embodiment, the storage priorities of the first and second metadata persistence storage areas are determined first, and these priorities are converted into a corresponding proportion value. Metadata relocation requests are then executed through the pre-configured task queue according to the proportion value. This generates the corresponding numbers of first and second metadata snapshots per unit time, and each snapshot records the latest metadata involved in the corresponding relocation. The process amounts to triggering snapshot generation events for the target storage segments in the two areas in the given proportion and allocating computing resources to each area accordingly. The garbage collection process is thereby adjusted dynamically, and the overall garbage collection efficiency of the storage device is improved.
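Steps S1021 to S1023 can be sketched as a per-unit-time quota split between two request queues. This is a minimal sketch under stated assumptions. The integer priorities, the `budget` of relocation requests per unit time, the floor-based split and the helper names `split_budget` / `run_one_unit` are all hypothetical. The source does not specify how the proportion value is computed.

```python
from collections import deque
from typing import Deque, List, Tuple

def split_budget(priority_first: int, priority_second: int, budget: int) -> Tuple[int, int]:
    """Convert the two storage priorities into per-unit-time snapshot quotas (S1022)."""
    total = priority_first + priority_second
    if total <= 0:
        raise ValueError("at least one storage priority must be positive")
    n_first = budget * priority_first // total  # floor, so the quotas always sum to budget
    return n_first, budget - n_first

def run_one_unit(queue_first: Deque[str], queue_second: Deque[str],
                 priority_first: int, priority_second: int, budget: int) -> List[str]:
    """Drain each area's relocation-request queue up to its quota for one unit of time (S1023)."""
    n_first, n_second = split_budget(priority_first, priority_second, budget)
    produced: List[str] = []
    for queue, quota, kind in ((queue_first, n_first, "first"),
                               (queue_second, n_second, "second")):
        for _ in range(min(quota, len(queue))):
            segment = queue.popleft()  # one relocation request per target segment
            produced.append(f"{kind}-snapshot:{segment}")
    return produced
```

For example, with priorities 3:1 and a budget of 8, `split_budget(3, 1, 8)` yields quotas of 6 and 2. In this sketch an unused quota in one queue is not lent to the other queue; whether the actual scheduler does so is not stated in the text.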
Further, this embodiment may also include the following steps:
Step S1025, obtaining the comprehensive load value corresponding to the first metadata persistence storage area and the second metadata persistence storage area.
Step S1026, adjusting the storage priorities of the first metadata persistence storage area and the second metadata persistence storage area according to their respective comprehensive load values.
The control unit may adjust the storage priorities of the first and second metadata persistence storage areas according to their comprehensive load values. For example, the storage priority of the first metadata persistence storage area is increased when its comprehensive load value exceeds a preset load proportion and/or exceeds a preset multiple of the comprehensive load value of the second metadata persistence storage area. Likewise, the storage priority of the second metadata persistence storage area is increased when its comprehensive load value exceeds the preset load proportion and/or exceeds the preset multiple of the comprehensive load value of the first metadata persistence storage area. The specific implementation may be set according to the resource configuration and specific requirements of the storage device, and is not repeated here.
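The priority adjustment rule above can be sketched as follows. It is a minimal illustration that reads "and/or" as a logical OR. The load threshold (0.8), the multiple (2.0), the step size, the priority cap and the `adjust_priorities` name are all hypothetical values chosen only for the example.

```python
from dataclasses import dataclass

@dataclass
class Priorities:
    first: int   # storage priority of the first metadata persistence storage area
    second: int  # storage priority of the second metadata persistence storage area

def adjust_priorities(p: Priorities, load_first: float, load_second: float,
                      load_threshold: float = 0.8, multiple: float = 2.0,
                      step: int = 1, max_priority: int = 10) -> Priorities:
    """Raise the priority of an overloaded area (step S1026).
    load_* are comprehensive load values in [0, 1]; all thresholds are illustrative."""
    first, second = p.first, p.second
    if load_first > load_threshold or load_first > multiple * load_second:
        first = min(first + step, max_priority)
    if load_second > load_threshold or load_second > multiple * load_first:
        second = min(second + step, max_priority)
    return Priorities(first, second)
```

Feeding the adjusted priorities back into the quota split of step S1022 closes the loop: an area that becomes heavily loaded receives a larger share of snapshot generation in the next unit of time.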
In this embodiment, in response to a garbage collection instruction, a metadata persistence storage area in a storage device based on a partition namespace is detected and a target storage segment is determined. The metadata persistence storage area includes storage segments that store metadata in a sequential write mode. The target storage segment is a storage segment to be garbage collected, and the metadata is used to recover the corresponding service data. A corresponding metadata snapshot, which records the most recently generated metadata in the target storage segment, is generated for the target storage segment and stored in the metadata persistence storage area. The data in the target storage segment is then cleared.

In other words, once the target storage segment requiring garbage collection has been determined, a snapshot generation event is triggered for it, producing a snapshot of its metadata (the metadata snapshot). The snapshot is stored in the metadata persistence storage area and the segment is cleared, which completes garbage collection of that segment. Because the metadata snapshot is kept in the metadata persistence storage area, the storage device can still recover its data. At the same time, storage space utilization and space reclamation efficiency are improved, and the storage device runs more stably.
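The detect, snapshot and clear cycle summarized above can be modelled in memory as follows. This is a minimal sketch, not the disclosed implementation. It assumes metadata records are (key, value) pairs appended in order, that a later record for a key supersedes earlier ones, and that the target is the segment with the highest share of superseded records. That selection criterion and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Segment:
    seg_id: int
    # (key, value) metadata records, appended sequentially; later records supersede earlier ones
    records: List[Tuple[str, str]] = field(default_factory=list)

def latest_metadata(seg: Segment) -> Dict[str, str]:
    """Replay the segment in write order so that each key keeps only its newest value."""
    latest: Dict[str, str] = {}
    for key, value in seg.records:
        latest[key] = value
    return latest

def invalid_ratio(seg: Segment) -> float:
    """Share of records that a newer record for the same key has superseded."""
    if not seg.records:
        return 0.0
    return 1 - len(latest_metadata(seg)) / len(seg.records)

@dataclass
class MetadataArea:
    segments: List[Segment]
    snapshots: List[Dict[str, str]] = field(default_factory=list)

    def pick_target(self) -> Segment:
        # S101 (assumed criterion): the segment holding the most superseded metadata
        return max(self.segments, key=invalid_ratio)

    def collect(self, target: Segment) -> Dict[str, str]:
        snapshot = latest_metadata(target)  # S102: snapshot of the newest metadata
        self.snapshots.append(snapshot)     # S102: persist it in the metadata area
        target.records.clear()              # S103: clear the target segment
        return snapshot
```

For instance, a segment holding `a=1, b=1, a=2` collapses to the snapshot `{"a": "2", "b": "1"}` and is then emptied. In a real partition-namespace device the snapshot would itself be written sequentially into a segment rather than kept in a Python list.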
Referring to fig. 6, fig. 6 is a second flowchart of a metadata garbage collection method according to an embodiment of the present disclosure. On the basis of the embodiment shown in fig. 2, this embodiment further refines steps S102 and S103. The metadata garbage collection method includes:
Step S200, generating a garbage collection instruction, wherein the garbage collection instruction includes a first garbage collection instruction and/or a second garbage collection instruction.
The control unit generates the garbage collection instruction according to preset program logic to trigger the subsequent metadata garbage collection process. For example, the instruction may be generated periodically at a preset time interval or upon detecting an indicator in the storage device.
In one possible implementation, as shown in fig. 7, step S200 includes:
Step S2001, obtaining a comprehensive load value of the metadata persistence storage area;
Step S2002, determining a corresponding trigger collection load interval according to the comprehensive load value, wherein the comprehensive load value is inversely related to the interval median of the trigger collection load interval;
Step S2003, generating a garbage collection instruction when the storage load of at least one storage segment in the metadata persistence storage area falls within the trigger collection load interval.
Illustratively, the control unit first obtains the comprehensive load value of the metadata persistence storage area. This value may be, for example, the amount of data stored, the amount of space occupied, or the occupancy rate of the area. The control unit then determines the corresponding trigger collection load interval according to the comprehensive load value. The two are inversely related: the higher the comprehensive load value, the lower the interval value of the corresponding trigger collection load interval. Without limitation, the interval value may be the upper limit, the lower limit, or the median of the interval.
Take the lower limit as the interval value. When the comprehensive load value is 50%, the trigger collection load interval has a lower limit of 70% and an upper limit fixed at 100%. A garbage collection instruction is then generated when the storage load of a storage segment falls within this interval, that is, exceeds 70%. When the comprehensive load value is 70%, the trigger collection load interval has a lower limit of 50% and an upper limit fixed at 100%. A garbage collection instruction is then generated when the storage load of a storage segment exceeds 50%.
In this step, the trigger collection load interval is determined dynamically from the comprehensive load value of the metadata persistence storage area. The timing and frequency of garbage collection are therefore adjusted dynamically, performance loss from excessive garbage collection is avoided, and the overall read/write efficiency of the storage device is improved.
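Steps S2001 to S2003 can be sketched with integer percentages. The source gives only two example points (a comprehensive load of 50% maps to a lower limit of 70%, and 70% maps to 50%) and does not state the mapping. The linear rule `lower = 120 - load` below is a hypothetical choice that reproduces exactly those two points. The helper names are likewise illustrative.

```python
from typing import Dict, List, Tuple

UPPER_LIMIT_PCT = 100  # the upper limit is fixed at 100%, as in the example above

def trigger_interval(comprehensive_load_pct: int) -> Tuple[int, int]:
    """Hypothetical linear rule: a higher area-wide load gives a lower trigger threshold.
    Chosen only because it reproduces the examples 50% -> 70% and 70% -> 50%."""
    lower = max(0, min(UPPER_LIMIT_PCT, 120 - comprehensive_load_pct))
    return lower, UPPER_LIMIT_PCT

def segments_to_collect(comprehensive_load_pct: int,
                        segment_loads_pct: Dict[int, int]) -> List[int]:
    """Step S2003: segments whose storage load exceeds the lower limit of the interval.
    A non-empty result means a garbage collection instruction is generated."""
    lower, upper = trigger_interval(comprehensive_load_pct)
    return sorted(seg for seg, load in segment_loads_pct.items() if lower < load <= upper)
```

With segment loads of 60%, 75% and 40%, only the 75% segment triggers collection at a comprehensive load of 50%. At a comprehensive load of 70%, both the 60% and the 75% segments trigger it. The interval medians (85% and 75%) fall as the load rises, which matches the inverse relation stated in step S2002.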
Step S201, in response to a first garbage collection instruction, detecting a first metadata persistence storage area and determining a target storage segment in it, and/or, in response to a second garbage collection instruction, detecting a second metadata persistence storage area and determining a target storage segment in it.
Step S202, generating a first metadata snapshot for the target storage segment in the first metadata persistence storage area, and storing the first metadata snapshot into the second metadata persistence storage area.
Step S203, for the target storage segment in the second metadata persistence storage area, generating a second metadata snapshot in the second metadata persistence storage area.
Illustratively, after the garbage collection instruction is generated, the control unit acts according to its content, that is, whether it is the first garbage collection instruction, the second garbage collection instruction, or both. It determines a target storage segment in the first metadata persistence storage area and/or in the second metadata persistence storage area accordingly. In response to the first garbage collection instruction, it generates a first metadata snapshot for the target storage segment in the first metadata persistence storage area and stores that snapshot in the second metadata persistence storage area. In response to the second garbage collection instruction, it generates a second metadata snapshot for the target storage segment in the second metadata persistence storage area.

FIG. 8 is a schematic diagram of a process for persisting metadata snapshots according to an embodiment of the present disclosure. As shown in FIG. 8, in response to the first garbage collection instruction, the first metadata snapshot is generated for a target storage segment in the first metadata persistence storage area (journ_segment) and then transferred to the second metadata persistence storage area (meta_segment). In response to the second garbage collection instruction, the second metadata snapshot is generated directly in the second metadata persistence storage area (meta_segment) for a target storage segment in that same area.
In other possible implementations, step S202 and step S203 may be performed separately. In one possible embodiment, steps S200, S201, S202 and S204 are performed to complete garbage collection of the first metadata persistence storage area. In another possible embodiment, steps S200, S201, S203 and S204 are performed to complete garbage collection of the second metadata persistence storage area. As described in the above embodiments, step S202 and step S203 may also be performed together. Which of them is performed is determined by the specific garbage collection instruction, and details are not repeated here.
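The routing of the two instruction types shown in FIG. 8 can be sketched as follows. This is a minimal in-memory illustration. Modelling the instructions as a `Flag` (so that both can be issued together), representing each segment by its latest metadata, and the names `Areas` / `handle` are all assumptions. Only the labels journ_segment and meta_segment come from the figure.

```python
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import Dict, List, Optional

class GcInstruction(Flag):
    FIRST = auto()   # collect a segment of the first area (journ_segment)
    SECOND = auto()  # collect a segment of the second area (meta_segment)

@dataclass
class Areas:
    journ_segments: Dict[int, Dict[str, str]]  # first area: segment id -> latest metadata
    meta_segments: Dict[int, Dict[str, str]]   # second area: segment id -> latest metadata
    meta_snapshots: List[dict] = field(default_factory=list)  # snapshots persisted in the second area

def handle(instr: GcInstruction, areas: Areas,
           journ_target: Optional[int] = None, meta_target: Optional[int] = None) -> None:
    if GcInstruction.FIRST in instr and journ_target is not None:
        # S202: snapshot the journ_segment target and store it in the second area
        areas.meta_snapshots.append({"from": "journ", "seg": journ_target,
                                     "meta": dict(areas.journ_segments[journ_target])})
        areas.journ_segments[journ_target] = {}  # S204: clear the target segment
    if GcInstruction.SECOND in instr and meta_target is not None:
        # S203: snapshot the meta_segment target directly inside the second area
        areas.meta_snapshots.append({"from": "meta", "seg": meta_target,
                                     "meta": dict(areas.meta_segments[meta_target])})
        areas.meta_segments[meta_target] = {}  # S204: clear the target segment
```

Passing `GcInstruction.FIRST | GcInstruction.SECOND` runs S202 and S203 in the same pass. Passing a single flag reproduces the two separate embodiments described above.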
This implementation applies when the second metadata snapshot is generated in the second metadata persistence storage area for target storage segments in that same area. Here there are at least two target storage segments, and the second metadata persistence storage area includes a first sub-storage area and a second sub-storage area. As shown in fig. 9, step S203 includes:
Step S203-1, generating metadata snapshots of all the target storage segments in the second metadata persistence storage area, and obtaining the data heat of each target storage segment, wherein the data heat represents the update probability of the metadata corresponding to that target storage segment;
Step S203-2, storing the metadata snapshots of the target storage segments whose data heat is greater than a heat threshold into the first sub-storage area;
Step S203-3, storing the metadata snapshots of the target storage segments whose data heat is not greater than the heat threshold into the second sub-storage area.
In this implementation, when a snapshot generation event is triggered for each target storage segment, the data heat of each target storage segment is determined first. The data heat represents the update probability of the metadata in that segment. Metadata (and the corresponding storage segment) that is not updated, or is updated with low probability, can have its snapshot stored in the second sub-storage area. That area is checked less frequently, which lowers the frequency of generating new metadata snapshots and saves computing resources. Metadata (and the corresponding storage segment) that is updated frequently or with high probability has its snapshot stored in the first sub-storage area. This keeps data recovery timely and improves data security.
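Steps S203-1 to S203-3 amount to a hot/cold split on a heat threshold. The sketch below is illustrative only. The source does not say how data heat is measured, so `estimate_heat` (the fraction of recent observation windows in which a segment's metadata changed) is a hypothetical estimator. The threshold value and function names are also assumptions.

```python
from typing import Dict, List, Tuple

def estimate_heat(update_history: List[bool]) -> float:
    """Hypothetical estimator: fraction of recent observation windows in which the metadata changed."""
    return sum(update_history) / len(update_history) if update_history else 0.0

def place_snapshots(snapshots: Dict[int, dict], heat: Dict[int, float],
                    heat_threshold: float) -> Tuple[Dict[int, dict], Dict[int, dict]]:
    """Route each target segment's snapshot by data heat.
    Returns (first sub-storage area contents, second sub-storage area contents)."""
    first_sub: Dict[int, dict] = {}   # hot: frequently updated, checked often
    second_sub: Dict[int, dict] = {}  # cold: rarely updated, checked less often
    for seg_id, snapshot in snapshots.items():
        if heat[seg_id] > heat_threshold:  # strictly greater, as in step S203-2
            first_sub[seg_id] = snapshot
        else:
            second_sub[seg_id] = snapshot
    return first_sub, second_sub
```

A segment whose heat exactly equals the threshold goes to the second sub-storage area, matching the "not greater than" wording of step S203-3.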
Step S204, clearing the data in the target storage segment.
In this embodiment, step S204 is implemented in the same way as step S103 in the embodiment shown in fig. 2 of the present disclosure, and details are not repeated here.
Corresponding to the metadata garbage collection method of the above embodiments, fig. 10 is a block diagram of a metadata garbage collection device provided in an embodiment of the present disclosure. The method described in the above embodiments may be performed by this metadata garbage collection device, which may be implemented in software and/or hardware. The device may be integrated in an electronic device with data processing capability. The electronic device may include, but is not limited to, mobile terminals and stationary terminals with strong data processing capability, such as desktop computers and supercomputers.
For ease of illustration, only portions relevant to embodiments of the present disclosure are shown. Referring to fig. 10, the metadata garbage collection apparatus 3 includes:
The detection module 31 is configured to, in response to a garbage collection instruction, detect a metadata persistence storage area in a storage device based on a partition namespace and determine a target storage segment. The metadata persistence storage area includes storage segments storing metadata in a sequential write mode, the target storage segment is a storage segment to be garbage collected, and the metadata is used to recover the corresponding service data;
The generating module 32 is configured to generate, for the target storage segment, a corresponding metadata snapshot, and store the metadata snapshot into the metadata persistence storage area, where the metadata snapshot is used to record metadata that is newly generated in the target storage segment;
The cleaning module 33 is configured to clear the data in the target storage segment.
According to one or more embodiments of the present disclosure, the metadata persistence storage area includes a first metadata persistence storage area for storing the operation history of the service data and a second metadata persistence storage area for storing metadata snapshots of the metadata. The detection module 31 is specifically configured to, in response to a first garbage collection instruction, detect the first metadata persistence storage area and determine a target storage segment in it. Alternatively or additionally, in response to a second garbage collection instruction, it detects the second metadata persistence storage area and determines a target storage segment in it.
According to one or more embodiments of the present disclosure, the detection module 31 determines target storage segments as follows. In response to the first garbage collection instruction, it obtains the data write timestamp of each storage segment in the first metadata persistence storage area and determines the storage segment with the earliest write timestamp as the target storage segment. In response to the second garbage collection instruction, it obtains the storage load of each storage segment in the second metadata persistence storage area and determines at least one storage segment with the largest storage load as the target storage segment. The storage load characterizes the proportion of invalid data in a storage segment.
According to one or more embodiments of the present disclosure, when determining at least one storage segment with the largest storage load as the target storage segment, the detection module 31 is specifically configured to obtain the comprehensive load value of the second metadata persistence storage area and determine a target number according to that value. It then determines the target number of storage segments with the largest storage loads in the second metadata persistence storage area as the target storage segments.
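The two target-selection rules handled by the detection module 31 can be sketched as follows. The earliest-timestamp rule and the largest-storage-load rule follow the text. The mapping from the comprehensive load value to the target number is not given in the source, so `target_number` (scaling linearly up to a hypothetical cap of 4 segments) is an assumption, as are the function names.

```python
from typing import Dict, List

def first_area_target(write_ts: Dict[int, float]) -> int:
    """First area: pick the segment whose data write timestamp is the earliest."""
    return min(write_ts, key=lambda seg: write_ts[seg])

def target_number(comprehensive_load: float, max_targets: int = 4) -> int:
    """Hypothetical rule: collect more segments per round as the second area fills up."""
    return max(1, min(max_targets, int(comprehensive_load * max_targets)))

def second_area_targets(storage_load: Dict[int, float], comprehensive_load: float) -> List[int]:
    """Second area: the target-number segments with the largest storage load,
    where storage load is the share of invalid data in a segment."""
    n = target_number(comprehensive_load)
    return sorted(storage_load, key=lambda seg: storage_load[seg], reverse=True)[:n]
```

Choosing the oldest segment in the first area resembles a circular log that is reclaimed from its tail. Ranking the second area by invalid-data share instead favours segments whose collection frees the most space.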
In accordance with one or more embodiments of the present disclosure, the generation module 32 is specifically configured to generate a first metadata snapshot for a target storage segment in a first metadata persistence storage area and store the first metadata snapshot to a second metadata persistence storage area, and/or generate a second metadata snapshot in the second metadata persistence storage area for the target storage segment in the second metadata persistence storage area.
According to one or more embodiments of the present disclosure, there are at least two target storage segments, and the second metadata persistence storage area includes a first sub-storage area and a second sub-storage area. When generating second metadata snapshots in the second metadata persistence storage area for the target storage segments in that area, the generating module 32 is specifically configured to do the following. It generates a metadata snapshot of each target storage segment and obtains the data heat of each target storage segment, where the data heat characterizes the update probability of the corresponding metadata. It then stores the metadata snapshots of the target storage segments whose data heat is greater than the heat threshold into the first sub-storage area, and the remaining snapshots into the second sub-storage area.
According to one or more embodiments of the present disclosure, the generating module 32 is specifically configured to obtain storage priorities of a first metadata persistence storage area and a second metadata persistence storage area, determine a proportion value of the first metadata snapshot and the second metadata snapshot according to the storage priorities, where the first metadata snapshot is a metadata snapshot generated based on a target storage segment in the first metadata persistence storage area, the second metadata snapshot is a metadata snapshot generated based on a target storage segment in the second metadata persistence storage area, and generate a corresponding number of the first metadata snapshot and the second metadata snapshot in a unit time based on the proportion value by executing metadata relocation requests respectively through a pre-configured task queue.
According to one or more embodiments of the present disclosure, the generating module 32 is further configured to obtain the comprehensive load values corresponding to the first metadata persistence storage area and the second metadata persistence storage area, and to adjust the storage priorities of the two areas according to those comprehensive load values.
According to one or more embodiments of the present disclosure, the detection module 31 performs additional steps before detecting the metadata persistence storage area in the partition namespace-based storage device and determining the target storage segment in response to the garbage collection instruction. It obtains the comprehensive load value of the metadata persistence storage area and determines a corresponding trigger collection load interval according to that value; the comprehensive load value is inversely related to the interval median of the trigger collection load interval. It then generates the garbage collection instruction based on the trigger collection load interval and the storage load of each storage segment in the metadata persistence storage area.
According to one or more embodiments of the present disclosure, the storage device based on the partition namespace includes at least one single storage engine file. The generating module 32 is further configured to obtain validity information of the single storage engine file corresponding to the target storage segment. When generating the corresponding metadata snapshot for the target storage segment, the generating module 32 is specifically configured to generate it according to the validity information, that is, when the single storage engine file is in a valid state.
The detection module 31, the generation module 32 and the cleaning module 33 are connected in sequence. The metadata garbage collection device 3 provided in this embodiment may execute the technical solution of the foregoing method embodiments. Its implementation principle and technical effects are similar and are not repeated here.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, as shown in fig. 11, the electronic device 4 includes:
a processor 41 and a memory 42 communicatively connected to the processor 41;
memory 42 stores computer-executable instructions;
Processor 41 executes computer-executable instructions stored in memory 42 to implement the metadata garbage collection method in the embodiments shown in fig. 2-9.
Optionally, the processor 41 and the memory 42 are connected by a bus 43.
For the relevant descriptions and effects of each step, refer to the embodiments corresponding to fig. 2 to fig. 9; details are not repeated here.
The embodiments of the present disclosure provide a computer readable storage medium, in which computer executable instructions are stored, where the computer executable instructions are used to implement the metadata garbage collection method provided in any one of the embodiments corresponding to fig. 2 to 9 of the present disclosure when executed by a processor.
The embodiments of the present disclosure provide a computer program product, including a computer program, which when executed by a processor implements the metadata garbage collection method provided in any of the embodiments corresponding to fig. 2 to 9 of the present disclosure.
In order to achieve the above embodiments, the embodiments of the present disclosure further provide an electronic device.
Referring to fig. 12, there is shown a schematic structural diagram of an electronic device 900 suitable for implementing embodiments of the present disclosure; the electronic device 900 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals and fixed terminals. Mobile terminals include a mobile phone, a notebook computer, a digital broadcast receiver, a personal digital assistant (PDA), a tablet computer, a portable multimedia player (PMP) or an in-vehicle terminal (e.g., an in-vehicle navigation terminal). Fixed terminals include a digital TV or a desktop computer. The electronic device shown in fig. 12 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 12, the electronic device 900 may include a processing device 901 (e.g., a central processing unit, a graphics processing unit, etc.). The processing device 901 may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage device 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the electronic device 900. The processing device 901, the ROM 902 and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
In general, the following devices may be connected to the I/O interface 905:
- input devices 906, such as a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer or gyroscope;
- output devices 907, such as a liquid crystal display (LCD), speaker or vibrator;
- storage devices 908, such as magnetic tape or a hard disk;
- communication devices 909.

The communication device 909 may allow the electronic device 900 to communicate with other devices, wirelessly or by wire, to exchange data. Although fig. 12 shows an electronic device 900 with various devices, not all of the illustrated devices need to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program comprising program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 909, installed from the storage device 908, or installed from the ROM 902. When the computer program is executed by the processing device 901, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to electrical wiring, fiber optic cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be included in the electronic device or may exist alone without being incorporated into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above-described embodiments.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages. These include object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit or module does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems-on-a-Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a first aspect, according to one or more embodiments of the present disclosure, there is provided a metadata garbage collection method, including:
In response to a garbage collection instruction, detecting a metadata persistence storage area in a storage device based on a partition namespace and determining a target storage segment, where the metadata persistence storage area includes storage segments that store metadata in a sequential-write mode and the target storage segment is a storage segment to be garbage-collected; generating a corresponding metadata snapshot for the target storage segment and storing the metadata snapshot into the metadata persistence storage area, where the metadata snapshot records the most recently generated metadata in the target storage segment and is used to recover the service data corresponding to the metadata; and clearing the data in the target storage segment.
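The three steps above (select a target segment, persist a snapshot holding the latest metadata, then clear the segment) can be illustrated with a minimal Python sketch. It assumes that each segment is an append-only list of key/value metadata records and that "most recently generated metadata" means the last value written for each key. The names `Segment`, `make_snapshot`, and `collect` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical append-only metadata segment (sequential writes only)."""
    seg_id: int
    records: list = field(default_factory=list)  # (key, value) pairs in write order

    def append(self, key, value):
        # Sequential-write mode: new records are only ever appended at the end.
        self.records.append((key, value))

    def reset(self):
        # Models clearing the segment (e.g. resetting a zone) once it is collected.
        self.records.clear()


def make_snapshot(segment):
    """Return the most recently written value for every key in the segment."""
    latest = {}
    for key, value in segment.records:
        latest[key] = value  # a later write overwrites an earlier one
    return latest


def collect(area, target):
    """Snapshot the target segment into a new segment of `area`, then clear it."""
    snap_seg = Segment(seg_id=len(area))
    for key, value in make_snapshot(target).items():
        snap_seg.append(key, value)
    area.append(snap_seg)  # persist the snapshot first...
    target.reset()         # ...so that clearing the target cannot lose metadata
    return snap_seg
```

Writing the snapshot before resetting the target keeps the latest metadata recoverable if the process stops between the two steps. A real system would also need the snapshot write to be durable before the reset is issued.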
According to one or more embodiments of the present disclosure, the metadata persistence storage area includes a first metadata persistence storage area and a second metadata persistence storage area. The first area stores historical operation records of the service data, and the second area stores metadata snapshots of the metadata. Detecting the metadata persistence storage area in the storage device based on the partition namespace and determining a target storage segment in response to a garbage collection instruction includes: in response to a first garbage collection instruction, detecting the first metadata persistence storage area and determining a target storage segment in it; and/or, in response to a second garbage collection instruction, detecting the second metadata persistence storage area and determining a target storage segment in it.
According to one or more embodiments of the present disclosure, detecting the first metadata persistence storage area and determining a target storage segment in it in response to the first garbage collection instruction includes: obtaining the data write timestamp of each storage segment in the first metadata persistence storage area, and determining the storage segment with the earliest write timestamp as the target storage segment. Detecting the second metadata persistence storage area and determining a target storage segment in it in response to the second garbage collection instruction includes: obtaining the storage load of each storage segment in the second metadata persistence storage area, and determining at least one storage segment with the largest storage load as the target storage segment, where the storage load represents the proportion of invalid data in a storage segment.
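The two selection policies can be sketched as follows. The sketch assumes each segment exposes a write timestamp and counts of invalid and total bytes; these fields are hypothetical. "Storage load" is computed as the invalid-data proportion described above. `select_target` dispatches on a hypothetical string label that stands in for the first and second garbage collection instructions.

```python
import heapq
from dataclasses import dataclass


@dataclass
class SegmentStats:
    seg_id: int
    write_ts: float      # hypothetical: time the segment's data was written
    invalid_bytes: int
    total_bytes: int

    @property
    def storage_load(self) -> float:
        # Storage load = proportion of invalid data in the segment.
        return self.invalid_bytes / self.total_bytes if self.total_bytes else 0.0


def select_target(instruction, first_area, second_area, k=1):
    """Return the target segment(s) for a 'first' or 'second' GC instruction."""
    if instruction == "first":
        # Operation-record area: collect the oldest segment (earliest write timestamp).
        return [min(first_area, key=lambda s: s.write_ts)]
    if instruction == "second":
        # Snapshot area: collect the k segments with the most invalid data.
        return heapq.nlargest(k, second_area, key=lambda s: s.storage_load)
    raise ValueError(f"unknown instruction: {instruction!r}")
```

One plausible reading of the two rules is as follows. An operation-record log tends to become obsolete in write order, so oldest-first fits it. For snapshot segments, picking the most-invalid segment reclaims the most space per byte of valid data that has to be relocated.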
According to one or more embodiments of the present disclosure, determining at least one storage segment with the largest storage load as the target storage segment includes: obtaining a comprehensive load value of the second metadata persistence storage area; determining a target number according to the comprehensive load value and preset load mapping information, where the preset load mapping information represents a mapping between comprehensive load values and numbers of storage segments; and determining the target number of storage segments with the largest storage loads in the second metadata persistence storage area as the target storage segments.
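A sketch of the target-number lookup follows. Several choices here are illustrative and are not stated in the disclosure: the mapping thresholds, the use of the mean per-segment storage load as the comprehensive load value, and the assumption that a higher load selects more segments.

```python
import bisect

# Hypothetical preset load mapping: (upper bound of comprehensive load, number of segments).
LOAD_MAPPING = [(0.3, 1), (0.6, 2), (0.8, 4), (1.0, 8)]


def comprehensive_load(loads):
    """Assumption: the area's comprehensive load is the mean per-segment storage load."""
    return sum(loads) / len(loads) if loads else 0.0


def target_number(load, mapping=LOAD_MAPPING):
    """Look up how many segments to collect for a given comprehensive load."""
    bounds = [upper for upper, _ in mapping]
    idx = min(bisect.bisect_left(bounds, load), len(mapping) - 1)
    return mapping[idx][1]


def select_targets(loads_by_segment):
    """Return the ids of the target_number segments with the largest storage loads."""
    k = target_number(comprehensive_load(list(loads_by_segment.values())))
    ranked = sorted(loads_by_segment, key=loads_by_segment.get, reverse=True)
    return ranked[:k]
```

A table lookup keeps the policy tunable without code changes. Loads above the last bound fall into the final bucket.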
According to one or more embodiments of the present disclosure, generating a corresponding metadata snapshot for the target storage segment and storing it into the metadata persistence storage area includes: generating a first metadata snapshot for a target storage segment in the first metadata persistence storage area and storing the first metadata snapshot into the second metadata persistence storage area; and/or generating, in the second metadata persistence storage area, a second metadata snapshot for a target storage segment in the second metadata persistence storage area.
According to one or more embodiments of the present disclosure, there are at least two target storage segments, and the second metadata persistence storage area includes a first sub-storage area and a second sub-storage area. Generating, in the second metadata persistence storage area, second metadata snapshots for the target storage segments in the second metadata persistence storage area includes:
Generating a metadata snapshot for each target storage segment in the second metadata persistence storage area; obtaining the data hotness of each target storage segment, where the data hotness represents the update probability of the metadata corresponding to that target storage segment; storing the metadata snapshots of target storage segments whose data hotness is greater than a hotness threshold into the first sub-storage area; and storing the metadata snapshots of target storage segments whose data hotness is not greater than the hotness threshold into the second sub-storage area.
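The hot/cold split can be sketched as below. The text does not specify how data hotness is estimated, so the sketch simply takes a precomputed hotness per segment as a hypothetical input. A segment whose hotness equals the threshold goes to the second sub-storage area, matching "not greater than".

```python
def place_snapshots(snapshots, hotness, threshold):
    """Split per-segment snapshots into hot and cold sub-areas.

    snapshots: {segment_id: snapshot}; hotness: {segment_id: update probability}.
    Returns (first_sub_area, second_sub_area) as lists of snapshots.
    """
    first_sub_area, second_sub_area = [], []
    for seg_id, snapshot in snapshots.items():
        if hotness[seg_id] > threshold:
            first_sub_area.append(snapshot)   # hot: likely to be updated soon
        else:
            second_sub_area.append(snapshot)  # cold: not greater than the threshold
    return first_sub_area, second_sub_area
```

Grouping snapshots by update probability is the usual rationale for hot/cold separation. Snapshots in the hot sub-area presumably become invalid at around the same time, so later collections there relocate little valid data, while the cold sub-area stays stable.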
According to one or more embodiments of the present disclosure, generating the corresponding metadata snapshot for the target storage segment includes the following steps. First, obtaining the storage priorities of the first metadata persistence storage area and the second metadata persistence storage area. Second, determining a ratio between first metadata snapshots and second metadata snapshots according to the storage priorities, where a first metadata snapshot is generated from a target storage segment in the first metadata persistence storage area and a second metadata snapshot is generated from a target storage segment in the second metadata persistence storage area. Third, executing metadata relocation requests through pre-configured task queues according to the ratio, so that corresponding numbers of first metadata snapshots and second metadata snapshots are generated per unit time.
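A sketch of ratio-based scheduling follows. It assumes that each metadata relocation request is a callable that produces one snapshot and that each area has its own `collections.deque` as its task queue. It also assumes a fixed per-unit-time budget of requests (`budget`, hypothetical) that is split in proportion to the two storage priorities.

```python
from collections import deque


def split_budget(priority_first, priority_second, budget):
    """Split a per-unit-time request budget in proportion to the storage priorities."""
    total = priority_first + priority_second
    if total <= 0:
        raise ValueError("priorities must sum to a positive value")
    n_first = round(budget * priority_first / total)
    return n_first, budget - n_first


def run_unit_of_time(queue_first: deque, queue_second: deque,
                     priority_first, priority_second, budget):
    """Execute up to the allotted number of relocation requests from each queue."""
    n_first, n_second = split_budget(priority_first, priority_second, budget)
    produced = []
    for queue, quota in ((queue_first, n_first), (queue_second, n_second)):
        for _ in range(min(quota, len(queue))):
            request = queue.popleft()
            produced.append(request())  # each request yields one metadata snapshot
    return produced
```

In this sketch, quota left unused by a short queue is simply dropped. A real scheduler might hand it to the other queue instead.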
According to one or more embodiments of the present disclosure, the method further includes obtaining the comprehensive load values of the first metadata persistence storage area and the second metadata persistence storage area, and adjusting the storage priorities of the two areas according to these comprehensive load values.
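The disclosure does not give a formula for this adjustment. One simple approach, offered only as an assumption, is to make each area's priority proportional to its comprehensive load, with a hypothetical floor so that a lightly loaded area never receives zero relocation work.

```python
def adjust_priorities(load_first, load_second, floor=0.1):
    """Return normalized storage priorities proportional to each area's load.

    The floor (hypothetical) keeps either area from receiving zero priority.
    """
    weight_first = max(load_first, floor)
    weight_second = max(load_second, floor)
    total = weight_first + weight_second
    return weight_first / total, weight_second / total
```

The returned pair sums to 1 and can serve as the storage priorities from which the ratio of first to second metadata snapshots is derived.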
According to one or more embodiments of the present disclosure, the method performs additional steps before detecting the metadata persistence storage area in the storage device based on the partition namespace and determining a target storage segment in response to a garbage collection instruction. It obtains a comprehensive load value of the metadata persistence storage area. It then determines a corresponding trigger-collection load interval according to the comprehensive load value, where the comprehensive load value is inversely proportional to the median of the trigger-collection load interval. Finally, it generates the garbage collection instruction based on the trigger-collection load interval and the storage loads of the storage segments in the metadata persistence storage area.
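The disclosure states only that the median of the trigger-collection load interval is inversely proportional to the comprehensive load value. The sketch below adds further hypothetical assumptions. The interval's upper bound is fixed at 1.0 (a fully invalid segment). Its midpoint is `k / load`, clamped to a range. A garbage collection instruction is generated when at least one segment's storage load falls inside the interval. The effect is that a busier area starts collecting segments that contain less invalid data.

```python
def trigger_interval(comprehensive_load, k=0.35, mid_bounds=(0.55, 0.95)):
    """Return (lower, upper) with upper fixed at 1.0 and midpoint = k / load, clamped."""
    lo_mid, hi_mid = mid_bounds
    if comprehensive_load <= 0:
        mid = hi_mid
    else:
        mid = min(max(k / comprehensive_load, lo_mid), hi_mid)
    # For [lower, 1.0] to have midpoint `mid`, lower must equal 2 * mid - 1.
    return 2 * mid - 1.0, 1.0


def segments_triggering_gc(segment_loads, comprehensive_load):
    """Indexes of segments whose storage load lies in the trigger interval.

    A non-empty result corresponds to generating a garbage collection instruction.
    """
    lower, upper = trigger_interval(comprehensive_load)
    return [i for i, load in enumerate(segment_loads) if lower <= load <= upper]
```

Because of the clamp, the midpoint is inversely proportional to the load only inside the clamped range. Outside that range it stays at a bound.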
According to one or more embodiments of the present disclosure, the storage device based on the partition namespace includes at least one standalone storage engine file, which stores the service data corresponding to the metadata. The method further includes obtaining validity information of the standalone storage engine file corresponding to the target storage segment. Generating the corresponding metadata snapshot for the target storage segment then includes generating the metadata snapshot only if the validity information indicates that the standalone storage engine file is valid.
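The validity check can be sketched as a guard in front of snapshot generation. The sketch assumes that validity information is available as a set of identifiers of live standalone storage engine files. It also assumes that a segment whose file is no longer valid gets no snapshot, presumably so that it can be cleared without relocating metadata. Both the names and that behavior are assumptions.

```python
from typing import Optional


def snapshot_if_file_valid(records, file_id, valid_files) -> Optional[dict]:
    """Return the latest-value snapshot of `records`, or None if the file is invalid.

    records: (key, value) metadata pairs in write order for one target segment.
    file_id: hypothetical id of the standalone storage engine file behind the segment.
    valid_files: hypothetical set of ids of files that are still valid.
    """
    if file_id not in valid_files:
        return None  # metadata for a file that is no longer valid need not be kept
    latest = {}
    for key, value in records:
        latest[key] = value  # keep only the most recent value per key
    return latest
```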
In a second aspect, according to one or more embodiments of the present disclosure, there is provided a metadata garbage collection apparatus, comprising:
A detection module, configured to detect, in response to a garbage collection instruction, a metadata persistence storage area in a storage device based on a partition namespace and determine a target storage segment, where the metadata persistence storage area includes storage segments that store metadata in a sequential-write mode, the target storage segment is a storage segment to be garbage-collected, and the metadata is used to recover the corresponding service data;
A generation module, configured to generate a corresponding metadata snapshot for the target storage segment and store the metadata snapshot into the metadata persistence storage area, where the metadata snapshot records the most recently generated metadata in the target storage segment;
And a cleaning module, configured to clear the data in the target storage segment.
According to one or more embodiments of the present disclosure, the metadata persistence storage area includes a first metadata persistence storage area for storing historical operation records of the service data and a second metadata persistence storage area for storing metadata snapshots of the metadata. The detection module is specifically configured to: in response to a first garbage collection instruction, detect the first metadata persistence storage area and determine a target storage segment in it; and/or, in response to a second garbage collection instruction, detect the second metadata persistence storage area and determine a target storage segment in it.
According to one or more embodiments of the present disclosure, when responding to the first garbage collection instruction, the detection module is specifically configured to obtain the data write timestamp of each storage segment in the first metadata persistence storage area and determine the storage segment with the earliest write timestamp as the target storage segment. When responding to the second garbage collection instruction, the detection module is specifically configured to obtain the storage load of each storage segment in the second metadata persistence storage area and determine at least one storage segment with the largest storage load as the target storage segment, where the storage load represents the proportion of invalid data in a storage segment.
According to one or more embodiments of the present disclosure, when determining at least one storage segment with the largest storage load as the target storage segment, the detection module is specifically configured to perform three steps. It obtains a comprehensive load value of the second metadata persistence storage area. It determines a target number according to the comprehensive load value and preset load mapping information, where the preset load mapping information represents a mapping between comprehensive load values and numbers of storage segments. It then determines the target number of storage segments with the largest storage loads in the second metadata persistence storage area as the target storage segments.
According to one or more embodiments of the present disclosure, the generation module is specifically configured to generate a first metadata snapshot for a target storage segment in the first metadata persistence storage area and store it into the second metadata persistence storage area, and/or to generate, in the second metadata persistence storage area, a second metadata snapshot for a target storage segment in the second metadata persistence storage area.
According to one or more embodiments of the present disclosure, there are at least two target storage segments, and the second metadata persistence storage area includes a first sub-storage area and a second sub-storage area. When generating second metadata snapshots for the target storage segments in the second metadata persistence storage area, the generation module is specifically configured to perform the following. It generates a metadata snapshot for each such target storage segment. It obtains the data hotness of each target storage segment, where the data hotness represents the update probability of the metadata corresponding to that segment. It stores the metadata snapshots of target storage segments whose data hotness is greater than a hotness threshold into the first sub-storage area. It stores the metadata snapshots of target storage segments whose data hotness is not greater than the hotness threshold into the second sub-storage area.
According to one or more embodiments of the present disclosure, the generation module is specifically configured to obtain the storage priorities of the first metadata persistence storage area and the second metadata persistence storage area. It determines a ratio between first metadata snapshots and second metadata snapshots according to the storage priorities, where a first metadata snapshot is generated from a target storage segment in the first metadata persistence storage area and a second metadata snapshot is generated from a target storage segment in the second metadata persistence storage area. It then executes metadata relocation requests through pre-configured task queues according to the ratio, so that corresponding numbers of first and second metadata snapshots are generated per unit time.
According to one or more embodiments of the present disclosure, the generation module is further configured to obtain the comprehensive load values of the first metadata persistence storage area and the second metadata persistence storage area, and adjust the storage priorities of the two areas according to these comprehensive load values.
According to one or more embodiments of the present disclosure, the detection module performs additional steps before it detects the metadata persistence storage area in the storage device based on the partition namespace and determines a target storage segment in response to a garbage collection instruction. It obtains a comprehensive load value of the metadata persistence storage area. It determines a corresponding trigger-collection load interval according to the comprehensive load value, where the comprehensive load value is inversely proportional to the median of the trigger-collection load interval. It then generates the garbage collection instruction based on the trigger-collection load interval and the storage loads of the storage segments in the metadata persistence storage area.
According to one or more embodiments of the present disclosure, the storage device based on the partition namespace includes at least one standalone storage engine file, and the generation module is further configured to obtain validity information of the standalone storage engine file corresponding to the target storage segment. When generating the corresponding metadata snapshot for the target storage segment, the generation module is specifically configured to generate the metadata snapshot if the validity information indicates that the standalone storage engine file is valid.
In a third aspect, according to one or more embodiments of the present disclosure, there is provided an electronic device comprising at least one processor and a memory;
The memory stores computer-executable instructions;
The at least one processor executes the computer-executable instructions stored by the memory, causing the at least one processor to perform the metadata garbage collection method as described above in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, according to one or more embodiments of the present disclosure, there is provided a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the metadata garbage collection method as described in the first aspect and the various possible designs of the first aspect.
In a fifth aspect, according to one or more embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the metadata garbage collection method according to the first aspect and the various possible designs of the first aspect.
The foregoing description covers only the preferred embodiments of the present disclosure and the technical principles employed. Persons skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of the above technical features. It also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.
Claims (12)
1. A metadata garbage collection method, comprising:
In response to a garbage collection instruction, detecting a metadata persistence storage area in a storage device based on a partition namespace, and determining a target storage segment, wherein the metadata persistence storage area comprises storage segments that store metadata in a sequential-write mode, and the target storage segment is a storage segment to be garbage-collected;
Generating a corresponding metadata snapshot for the target storage segment, and storing the metadata snapshot into the metadata persistence storage area, wherein the metadata snapshot records the most recently generated metadata in the target storage segment and is used to recover the service data corresponding to the metadata;
Clearing the data in the target storage segment;
Wherein the metadata persistence storage area comprises a first metadata persistence storage area for storing historical operation records of the service data and a second metadata persistence storage area for storing metadata snapshots of the metadata, and the detecting, in response to a garbage collection instruction, a metadata persistence storage area in a storage device based on a partition namespace and determining a target storage segment comprises:
In response to a first garbage collection instruction, obtaining a data write timestamp of each storage segment in the first metadata persistence storage area and determining the storage segment with the earliest write timestamp as the target storage segment; or, in response to a second garbage collection instruction, obtaining a storage load of each storage segment in the second metadata persistence storage area and determining at least one storage segment with the largest storage load as the target storage segment, wherein the storage load represents the proportion of invalid data in the storage segment.
2. The method of claim 1, wherein the determining at least one storage segment with the largest storage load as the target storage segment comprises:
acquiring a comprehensive load value of the second metadata persistence storage area, and determining a target number according to the comprehensive load value;
and determining the target number of storage segments with the largest storage loads in the second metadata persistence storage area as the target storage segments.
3. The method of claim 1, wherein the generating a corresponding metadata snapshot for the target storage segment and storing the metadata snapshot into the metadata persistence storage area comprises:
Generating a first metadata snapshot for a target storage segment in the first metadata persistence storage area, and storing the first metadata snapshot into the second metadata persistence storage area; and/or
Generating, in the second metadata persistence storage area, a second metadata snapshot for a target storage segment in the second metadata persistence storage area.
4. The method of claim 3, wherein there are at least two target storage segments, the second metadata persistence storage area comprises a first sub-storage area and a second sub-storage area, and the generating, in the second metadata persistence storage area, a second metadata snapshot for the target storage segments in the second metadata persistence storage area comprises:
Generating metadata snapshots of all the target storage segments in the second metadata persistence storage area, and acquiring the data hotness of each target storage segment, wherein the data hotness represents the update probability of the metadata corresponding to the target storage segment;
Storing the metadata snapshots of target storage segments whose data hotness is greater than a hotness threshold into the first sub-storage area;
and storing the metadata snapshots of target storage segments whose data hotness is not greater than the hotness threshold into the second sub-storage area.
5. The method of claim 1, wherein the generating the corresponding metadata snapshot for the target storage segment comprises:
Acquiring storage priorities of the first metadata persistence storage area and the second metadata persistence storage area;
Determining a ratio between first metadata snapshots and second metadata snapshots according to the storage priorities, wherein a first metadata snapshot is a metadata snapshot generated from a target storage segment in the first metadata persistence storage area, and a second metadata snapshot is a metadata snapshot generated from a target storage segment in the second metadata persistence storage area;
and executing metadata relocation requests through pre-configured task queues according to the ratio, so as to generate corresponding numbers of first metadata snapshots and second metadata snapshots per unit time.
6. The method of claim 5, wherein the method further comprises:
Acquiring comprehensive load values corresponding to the first metadata persistence storage area and the second metadata persistence storage area;
and adjusting the storage priorities of the first metadata persistence storage area and the second metadata persistence storage area according to their comprehensive load values.
7. The method of claim 1, wherein before the detecting, in response to a garbage collection instruction, a metadata persistence storage area in a storage device based on a partition namespace and determining a target storage segment, the method further comprises:
Acquiring a comprehensive load value of the metadata persistence storage area;
determining a corresponding trigger-collection load interval according to the comprehensive load value, wherein the comprehensive load value is inversely proportional to the median of the trigger-collection load interval;
and generating the garbage collection instruction when the storage load of at least one storage segment in the metadata persistence storage area falls within the trigger-collection load interval.
8. The method of claim 1, wherein the storage device based on the partition namespace includes at least one standalone storage engine file for storing the service data corresponding to the metadata;
The method further comprises:
acquiring validity information of the standalone storage engine file corresponding to the target storage segment, wherein the validity information indicates whether the standalone storage engine file is valid;
The generating a corresponding metadata snapshot for the target storage segment comprises:
generating the corresponding metadata snapshot for the target storage segment if the validity information indicates that the standalone storage engine file is valid.
9. A metadata garbage collection device, comprising:
A detection module, configured to detect, in response to a garbage collection instruction, a metadata persistence storage area in a storage device based on a partition namespace and determine a target storage segment, wherein the metadata persistence storage area comprises storage segments that store metadata in a sequential-write mode, the target storage segment is a storage segment to be garbage-collected, and the metadata is used to recover the corresponding service data;
A generation module, configured to generate a corresponding metadata snapshot for the target storage segment and store the metadata snapshot into the metadata persistence storage area, wherein the metadata snapshot records the most recently generated metadata in the target storage segment;
A cleaning module, configured to clear the data in the target storage segment;
Wherein the metadata persistence storage area includes a first metadata persistence storage area for storing historical operation records of the service data and a second metadata persistence storage area for storing metadata snapshots of the metadata;
The detection module is specifically configured to: in response to a first garbage collection instruction, obtain a data write timestamp of each storage segment in the first metadata persistence storage area and determine the storage segment with the earliest write timestamp as the target storage segment; or, in response to a second garbage collection instruction, obtain a storage load of each storage segment in the second metadata persistence storage area and determine at least one storage segment with the largest storage load as the target storage segment, wherein the storage load represents the proportion of invalid data in the storage segment.
10. An electronic device, comprising a processor and a memory;
The memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the metadata garbage collection method of any one of claims 1 to 8.
11. A computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement the metadata garbage collection method of any of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the metadata garbage collection method of any of claims 1 to 8.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411622074.9A (CN119576226B) | 2024-11-13 | 2024-11-13 | Metadata garbage collection method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411622074.9A (CN119576226B) | 2024-11-13 | 2024-11-13 | Metadata garbage collection method and device, electronic equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119576226A | 2025-03-07 |
| CN119576226B | 2025-12-19 |
Family ID: 94811324
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411622074.9A (CN119576226B, Active) | 2024-11-13 | 2024-11-13 | Metadata garbage collection method and device, electronic equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119576226B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117093542A (en) * | 2023-08-31 | 2023-11-21 | 济南浪潮数据技术有限公司 | Metadata snapshot rollback method, system, equipment and storage medium |
| CN118708570A (en) * | 2024-06-28 | 2024-09-27 | 新华三信息技术有限公司 | Distributed storage metadata expansion method, device and equipment |
Family Cites Families (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10180951B2 (en) * | 2013-03-15 | 2019-01-15 | Amazon Technologies, Inc. | Place snapshots |
| CN107957918B (en) * | 2016-10-14 | 2019-05-10 | 腾讯科技(深圳)有限公司 | Data reconstruction method and device |
| US11010335B2 (en) * | 2018-08-09 | 2021-05-18 | Netapp, Inc. | Methods and systems for protecting data of a persistent memory based file system |
| US10802726B2 (en) * | 2018-10-29 | 2020-10-13 | Microsoft Technology Licensing, Llc | Optimized placement of data contained in a garbage collected storage system |
| CN111143112B (en) * | 2018-11-02 | 2023-08-25 | 伊姆西Ip控股有限责任公司 | Method, apparatus and computer readable storage medium for restoring metadata |
| US10929288B1 (en) * | 2019-10-08 | 2021-02-23 | International Business Machines Corporation | Protecting against data loss during garbage collection |
| CN113741796B (en) * | 2020-07-16 | 2024-04-16 | 北京沃东天骏信息技术有限公司 | A method and device for data persistence of terminal applications |
| CN112685360B (en) * | 2020-12-29 | 2023-09-22 | 湖北华中电力科技开发有限责任公司 | Memory data persistence method and device, storage medium, computer equipment |
| CN117763636A (en) * | 2023-12-08 | 2024-03-26 | 支付宝(杭州)信息技术有限公司 | Data writing method, recovery method, reading method and corresponding device |
| CN117724996B (en) * | 2024-01-03 | 2025-08-22 | 北京火山引擎科技有限公司 | Data storage method and device |
Worldwide applications
- 2024-11-13: CN CN202411622074.9A (CN119576226B), Active
Also Published As
| Publication number | Publication date |
|---|---|
| CN119576226A (en) | 2025-03-07 |
Similar Documents
| Publication | Title |
|---|---|
| CN111309732B (en) | Data processing method, device, medium and computing equipment |
| US20160266815A1 (en) | Free space collection in log structured storage systems |
| US20110107053A1 (en) | Allocating Storage Memory Based on Future Use Estimates |
| CN110545313B (en) | Message push control method and device and electronic equipment |
| CN109960686A (en) | Database log processing method and device |
| US10037270B2 (en) | Reducing memory commit charge when compressing memory |
| CN115599707A (en) | Memory management method, device, medium and electronic equipment |
| CN114647379B (en) | File storage method, device, electronic device and computer readable medium |
| CN113934692B (en) | File cleaning method, device, storage medium and equipment |
| CN111930684A (en) | Small file processing method, device and equipment based on HDFS (Hadoop distributed File System) and storage medium |
| CN116541174A (en) | Storage device capacity processing method, device, equipment and storage medium |
| CN119576226B (en) | Metadata garbage collection method and device, electronic equipment and storage medium |
| EP4668123A1 (en) | Method and apparatus for evicting cached data, electronic device, and storage medium |
| CN116643702A (en) | Distributed file processing method, device, equipment and medium |
| CN115065685A (en) | Cloud computing resource scheduling method, device, equipment and medium |
| CN120596031A (en) | Data writing method and system, data reading method and system |
| CN115080143A (en) | Page resource preloading method, device, equipment and storage medium |
| CN111681267A (en) | Track Intrusion Prevention Method Based on Image Recognition |
| CN118733482A (en) | Garbage collection method, device and storage medium for partition storage device |
| CN111782588A (en) | File reading method, device, equipment and medium |
| CN115080233A (en) | Resource allocation management method, device, equipment and storage medium for application software |
| CN119226331B (en) | Hot spot data detection method, equipment and storage medium of database |
| CN116627592B (en) | Virtual machine load detection method, device, electronic device and storage medium |
| CN120508479B (en) | Log processing methods, log query methods, equipment, media, and products |
| CN119964271B (en) | Data recording method, device, equipment and storage medium of event data recorder |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |