CN114936188B

CN114936188B - Data processing method, device, electronic equipment and storage medium

Info

Publication number: CN114936188B
Application number: CN202210601204.5A
Authority: CN
Inventors: 余涛
Original assignee: Chongqing Unisinsight Technology Co Ltd
Current assignee: Chongqing Unisinsight Technology Co Ltd
Priority date: 2022-05-30
Filing date: 2022-05-30
Publication date: 2024-10-11
Anticipated expiration: 2042-05-30
Also published as: CN114936188A

Abstract

The application provides a data processing method, a data processing device, electronic equipment and a storage medium, and relates to the technical field of data processing. The method comprises: obtaining a file creation request comprising the size and the name of a target file; determining, according to the size of the target file, a preset erasure ratio and a preset data block size, the number of objects into which the target file is divided and the number of data blocks contained in each object; creating a data block of each object on each selected disk according to the preset erasure ratio, and generating file creation information; and generating first type metadata and second type metadata of the target file according to the file creation information, storing the first type metadata in a system disk of the file system, and storing the second type metadata in a data disk of the file system. The method can reduce the data access amount and improve the data access efficiency.

Description

Data processing method, device, electronic equipment and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a data processing method, a data processing device, an electronic device, and a storage medium.

Background

With the advent of the big data age, massive amounts of data need to be stored. Erasure Code (EC) is a commonly used data protection method that achieves high data reliability with little data redundancy. Massive data storage generates a large amount of metadata, which mainly describes attribute information of the data. Because erasure calculation requires the data to be split into segments before it is encoded and stored, storage based on erasure correction produces even more metadata. Since the metadata is generally accessed first whenever data is accessed, improving metadata access performance becomes important.

Existing methods for optimizing metadata access speed mainly rely on hardware acceleration, in which a plurality of servers share the access pressure, to improve data access efficiency.

These methods, however, increase hardware consumption to some extent, which makes data processing costly.

Disclosure of Invention

The application aims to overcome the defects in the prior art and provide a data processing method, a device, electronic equipment and a storage medium, so as to improve metadata access efficiency and data processing efficiency.

In order to achieve the above purpose, the technical scheme adopted by the embodiment of the application is as follows:

In a first aspect, an embodiment of the present application provides a data processing method, applied to a file system based on erasure correction storage, where the method includes:

obtaining a file creation request, the file creation request comprising: the size of the target file, the name of the target file;

determining the number of objects into which the target file is divided and the number of data blocks contained in each object according to the size of the target file, a preset erasure ratio and the size of a preset data block;

According to the preset erasure ratio, respectively creating a data block of each object on each selected disk, and generating file creation information, wherein the file creation information comprises: the file name, the file size, the file storage path, the file identification, the file creation time, the number of objects contained in the file and the information of each data block in each object contained in the file, wherein the data blocks in each object comprise effective data blocks and redundant data blocks, and each data block under the same object is distributed on different selected magnetic discs of the file system;

According to the file creation information, respectively generating first type metadata and second type metadata of the target file, storing the first type metadata into a system disk of the file system, and storing the second type metadata into a data disk of the file system; the first type metadata includes: basic information of the target file, and an association relation between the first type metadata and the second type metadata; the second type metadata includes: data block information of the target file; and the access frequency of the first type metadata is greater than that of the second type metadata.

Optionally, the generating the first type metadata and the second type metadata of the target file according to the file creation information includes:

Generating basic information of the target file according to the file name, the file size, the file creation time, the file storage path and the number of objects contained in the file, and generating an association relation between the first type metadata and the second type metadata according to the file identification and the information of each data block in each object contained in the file, wherein the association relation is used for representing the mapping relation between the file identification of the target file and each data block in each object contained in the file;

obtaining first-type metadata of the target file according to the basic information of the target file and the association relation between the first-type metadata and the second-type metadata;

and generating data block information of the target file according to the information of each data block in each object contained in the file, and taking the data block information as the second type metadata.
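The split described above can be illustrated with a minimal Python sketch. The function name `split_metadata` and the dictionary keys (`file_id`, `blocks`, `object`, `block` and so on) are hypothetical and chosen only for illustration; the application does not prescribe a concrete data layout.

```python
from typing import Any


def split_metadata(creation_info: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split file creation information into first type and second type metadata.

    All field names here are illustrative assumptions, not taken from the application.
    """
    blocks = creation_info["blocks"]  # one record per data block of every object

    # Basic information of the target file: name, size, creation time,
    # storage path and number of objects.
    basic = {
        key: creation_info[key]
        for key in ("name", "size", "created_at", "path", "object_count")
    }

    # Association relation: maps the file identification to the
    # (object index, block index) pair of each data block of the file.
    association = {
        creation_info["file_id"]: [(b["object"], b["block"]) for b in blocks]
    }

    first_type = {"basic": basic, "association": association}
    # Second type metadata: the data block information itself.
    second_type = {"blocks": list(blocks)}
    return first_type, second_type
```

In this sketch a request that only needs basic information reads `first_type["basic"]` and never touches the per-block records.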

Optionally, after generating the first type metadata and the second type metadata of the target file according to the file creation information, and storing the first type metadata into a system disk of the file system, and storing the second type metadata into a data disk of the file system, the method includes:

obtaining a file write request for the target file, the file write request comprising: name of the target file, file data of the target file;

determining whether the target file exists or not from the created files according to the name of the target file;

if the target file exists, determining redundant data contained in each object of the target file according to the file data of the target file, wherein the redundant data is used for recovering the file data of the target file;

And writing redundant data contained in each object of the target file into redundant data blocks of each object, writing file data of the target file into effective data blocks of each object respectively, and updating first-type metadata and second-type metadata corresponding to the target file.

Optionally, after the writing the redundant data contained in each object of the target file into the redundant data blocks of each object and writing the file data of the target file into the valid data blocks of each object, the method includes:

acquiring a file reading request aiming at the target file, wherein the file reading request comprises the following steps: the name of the target file, the size of the target file, the reading offset information and the reading length information;

Inquiring first type metadata of each file in a system disk according to the name of the target file and the read offset information, and determining information of each data block corresponding to the target file;

And reading the target file from the corresponding disk according to the read offset information and the read length information according to the information of each data block corresponding to the target file.

Optionally, the querying first type metadata of each file in a system disk according to the name of the target file and the read offset information, and determining information of each data block corresponding to the target file includes:

determining a file identification of the target file according to the name of the target file;

inquiring the association relation between the first type metadata and the second type metadata of each file according to the file identification of the target file, and determining the second type metadata corresponding to the target file;

inquiring the second type metadata to obtain information of a magnetic disk to which each data block of the target file belongs;

and determining the information of the magnetic disk to which the target data block corresponding to the target file belongs from the information of each data block corresponding to the target file according to the read offset information.
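The last step, locating the target data blocks from the read offset, can be sketched as follows. This is an illustration only: it assumes that file data fills the valid data blocks sequentially, object by object, which the application does not state explicitly, and the function name `locate_blocks` is hypothetical.

```python
def locate_blocks(offset: int, length: int, block_size: int,
                  valid_per_object: int) -> list[tuple[int, int, int, int]]:
    """Map a byte range of the file to the valid data blocks that hold it.

    Returns (object index, block index in object, offset in block, bytes to read)
    tuples. Assumes a sequential layout across valid blocks (an assumption).
    """
    if offset < 0 or length < 0 or block_size <= 0 or valid_per_object <= 0:
        raise ValueError("invalid read range or layout parameters")
    spans = []
    pos, end = offset, offset + length
    while pos < end:
        global_block = pos // block_size        # index among all valid blocks
        in_block = pos % block_size             # offset inside that block
        take = min(block_size - in_block, end - pos)
        spans.append((global_block // valid_per_object,
                      global_block % valid_per_object,
                      in_block, take))
        pos += take
    return spans
```

With the disk of each block known from the second type metadata, each returned tuple corresponds to one read on one disk.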

Optionally, the reading the target file from the corresponding disk according to the information of each corresponding data block of the target file, according to the read offset information and the read length information, includes:

According to the information of the magnetic disk to which the target data block corresponding to the target file belongs, respectively reading file data stored in each target data block from the magnetic disk to which each target data block belongs according to the read offset information and the read length information;

And combining according to the file data stored in each read target data block to obtain the target file.

Optionally, the reading the file data stored in each target data block from the disk to which each target data block belongs includes:

If the target disk fails to read and the data stored in the target data block on the target disk is valid data, performing erasure correction calculation to recover the valid data in the target data block on the target disk, wherein the target disk is any one of the disks corresponding to each target data block.

Optionally, the performing erasure calculation restores valid data in the target data block on the target disk, including:

And calculating the effective data in the target data blocks on the target disk according to the effective data read from the target data blocks on the disks except the target disk and the redundant data.
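As a simplified illustration of this recovery, the sketch below uses a single XOR parity block as the redundant data. This is an assumption made for brevity: XOR parity is one erasure code that tolerates the loss of one block per object (matching a ratio such as 4:1), and the application does not specify which erasure code algorithm is used. The function names are hypothetical.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_redundant_block(valid_blocks: list[bytes]) -> bytes:
    """Redundant data for one object: XOR of all of its valid data blocks."""
    return xor_blocks(valid_blocks)


def recover_valid_block(surviving_valid: list[bytes], redundant: bytes) -> bytes:
    """Rebuild the one unreadable valid block from the others plus the redundant block."""
    return xor_blocks([*surviving_valid, redundant])
```

Because XOR is its own inverse, combining the surviving valid blocks with the redundant block yields exactly the missing block.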

Optionally, before creating a data block of each object on each selected disk according to the preset erasure ratio, the method includes:

Determining the number of magnetic discs to be selected according to the preset erasure ratio;

And selecting the number of the disks from a plurality of disks in the file system according to the disk weights of the disks in the file system, wherein the disk weights of the disks are determined according to the capacity of the disks and the total capacity of the system disks.
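A possible reading of this selection step is sketched below. It assumes that the weight of a disk is its capacity divided by the total capacity of all disks, and that disks are drawn by weighted random sampling without replacement; the application only says that selection is made according to the disk weights, so both points are assumptions, and the function names are hypothetical.

```python
import random
from typing import Optional


def disks_needed(valid_blocks: int, redundant_blocks: int) -> int:
    """An erasure ratio of valid:redundant needs one disk per data block of an object."""
    return valid_blocks + redundant_blocks


def disk_weights(capacities: dict[str, int]) -> dict[str, float]:
    """Weight of each disk = its capacity / total capacity (assumed formula)."""
    total = sum(capacities.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return {disk: cap / total for disk, cap in capacities.items()}


def select_disks(capacities: dict[str, int], count: int,
                 seed: Optional[int] = None) -> list[str]:
    """Pick `count` distinct disks, favouring disks with larger weights."""
    if count > len(capacities):
        raise ValueError("not enough disks for the erasure ratio")
    rng = random.Random(seed)
    remaining = disk_weights(capacities)
    chosen = []
    for _ in range(count):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # never place two blocks of one object on the same disk
    return chosen
```

For a ratio of 4:1, `select_disks(capacities, disks_needed(4, 1))` returns five distinct disks.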

Optionally, the first type of metadata is stored in the form of key-value pairs.

In a second aspect, an embodiment of the present application further provides a data processing apparatus, applied to a file system based on erasure correction storage, where the apparatus includes: the device comprises an acquisition module, a determination module and a generation module;

the obtaining module is configured to obtain a file creation request, where the file creation request includes: the size of the target file, the name of the target file;

The determining module is used for determining the number of objects into which the target file is divided and the number of data blocks contained in each object according to the size of the target file, the preset erasure ratio and the preset data block size;

the generating module is configured to create a data block of each object on each selected disk according to the preset erasure ratio, and generate file creation information, where the file creation information includes: the file name, the file size, the file storage path, the file identification, the file creation time, the number of objects contained in the file and the information of each data block in each object contained in the file, wherein the data blocks in each object comprise effective data blocks and redundant data blocks, and each data block under the same object is distributed on different selected magnetic discs of the file system;

The generating module is used for respectively generating first type metadata and second type metadata of the target file according to the file creation information, storing the first type metadata into a system disk of the file system and storing the second type metadata into a data disk of the file system; the first type metadata includes: basic information of the target file, and an association relation between the first type metadata and the second type metadata; the second type metadata includes: data block information of the target file; and the access frequency of the first type metadata is greater than that of the second type metadata.

Optionally, the generating module is specifically configured to generate basic information of the target file according to the file name, the file size, the file creation time, the file storage path and the number of objects contained in the file, and generate, according to the file identifier and information of each data block in each object contained in the file, an association relationship between the first type metadata and the second type metadata, where the association relationship is used to characterize a mapping relationship between the file identifier of the target file and each data block in each object contained in the file;

Optionally, the apparatus further comprises: a write module;

the obtaining module is further configured to obtain a file write request for the target file, where the file write request includes: name of the target file, file data of the target file;

The determining module is further used for determining whether the target file exists or not from the created files according to the name of the target file;

The determining module is further configured to determine, if the target file exists, redundant data included in each object of the target file according to the file data of the target file, where the redundant data is used to restore the file data of the target file;

the writing module is used for writing the redundant data contained in each object of the target file into the redundant data block of each object, writing the file data of the target file into the effective data block of each object respectively, and updating the first type metadata and the second type metadata corresponding to the target file.

Optionally, the apparatus further comprises: a reading module;

optionally, the obtaining module is further configured to obtain a file reading request for the target file, where the file reading request includes: the name of the target file, the size of the target file, the reading offset information and the reading length information;

The determining module is further used for querying first type metadata of each file in the system disk according to the name of the target file and the read offset information, and determining information of each data block corresponding to the target file;

And the reading module is used for reading the target file from the corresponding magnetic disk according to the reading offset information and the reading length information according to the information of each data block corresponding to the target file.

Optionally, the determining module is specifically configured to determine a file identifier of the target file according to the name of the target file;

Optionally, the reading module reads the file data stored in each target data block from the disk to which each target data block belongs according to the read offset information and the read length information according to the information of the disk to which the target data block corresponding to the target file belongs;

Optionally, the reading module is specifically configured to, if reading from a target disk fails and the data stored in the target data block on the target disk is valid data, perform erasure calculation to recover the valid data in the target data block on the target disk, where the target disk is any one of the disks corresponding to the target data blocks.

Optionally, the reading module calculates the valid data in the target data block on the target disk according to the valid data read from the target data block on the disk except the target disk and the redundant data.

Optionally, the determining module is further configured to determine, according to the preset erasure ratio, the number of disks to be selected;

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium, and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method as provided in the first aspect when executed.

In a fourth aspect, an embodiment of the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method as provided in the first aspect.

The beneficial effects of the application are as follows:

The application provides a data processing method, a device, electronic equipment and a storage medium. The method comprises: obtaining a file creation request comprising the size and the name of a target file; determining, according to the size of the target file, a preset erasure ratio and a preset data block size, the number of objects into which the target file is divided and the number of data blocks contained in each object; creating a data block of each object on each selected disk according to the preset erasure ratio, and generating file creation information; and generating first type metadata and second type metadata of the target file according to the file creation information, storing the first type metadata in a system disk of the file system, and storing the second type metadata in a data disk of the file system. In this method, the metadata of a file is classified by access frequency into first type metadata and second type metadata; the first type metadata is stored in the system disk of the file system and the second type metadata is stored in a data disk of the file system. Because not all of the information in the metadata is accessed during a given data access, storing the frequently accessed first type metadata separately from the infrequently accessed second type metadata allows the metadata to be accessed according to the actual access requirement. This reduces unnecessary metadata access, which reduces the data access amount to a certain extent and improves data access efficiency. Furthermore, because the first type metadata and the second type metadata are stored in the system disk and the data disk respectively, the number of accesses to the data disk during data access is reduced, which relieves the pressure on the data disk.

In addition, based on the classified storage of the first type metadata and the second type metadata, the data amount of metadata which needs to be updated when the metadata information is modified can be reduced, and the read-write performance of the metadata is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a file system for erasure correction storage according to an embodiment of the present application;

FIG. 2 is a schematic diagram of disk space division according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating a data processing method according to an embodiment of the present application;

FIG. 4 is a second flowchart of a data processing method according to an embodiment of the present application;

FIG. 5 is a flowchart illustrating a data processing method according to an embodiment of the present application;

FIG. 6 is a schematic diagram of data separation and storage based on erasure correction and storage according to an embodiment of the present application;

FIG. 7 is a flowchart illustrating a data processing method according to an embodiment of the present application;

FIG. 8 is a flowchart of a data processing method according to an embodiment of the present application;

FIG. 9 is a flowchart illustrating a data processing method according to an embodiment of the present application;

Fig. 10 is a flow chart of a data processing method according to an embodiment of the present application;

FIG. 11 is a schematic diagram of a data processing system according to an embodiment of the present application;

FIG. 12 is a schematic diagram of reading a data block according to an embodiment of the present application;

FIG. 13 is another schematic diagram of reading a data block according to an embodiment of the present application;

FIG. 14 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;

Fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for the purpose of illustration and description only and are not intended to limit the scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.

In addition, the described embodiments are only some, but not all, embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.

It should be noted that the term "comprising" will be used in embodiments of the application to indicate the presence of the features stated hereafter, but not to exclude the addition of other features.

Before introducing the method of the application, the concept of metadata is described. Metadata is mainly information that describes the attributes of data. For a file, the metadata records not only the name, type, access rights, size, storage path and the like of the file data, but also the storage location, source and the like of the file. When data is accessed, the metadata of the file is usually accessed first, so the metadata is accessed very frequently during data access, far more often than the file data itself.

When a file is read, the metadata must be accessed before the data: only after information such as the size, access authorization and storage location of the file data has been obtained from information such as the file name can the specific file data be read from the storage disk according to the stored location information. However, a large number of file access requests do not need the specific data of the file and only need basic information about the file, in which case only the metadata needs to be accessed. Moreover, not all of the metadata is needed on every access; if all metadata of the file were obtained on every access, metadata access efficiency would be low.

Based on this, in the context of storing data by the erasure technique, the method separates the metadata of a file: metadata with high heat is stored in the system disk as key-value pairs, and metadata with low heat is stored in the data disk. Depending on the access requirement, either only the metadata with high heat or both the metadata with high heat and the metadata with low heat can be accessed, which avoids accessing both on every request and thereby improves metadata access efficiency. Here, metadata with high heat is metadata that is accessed more frequently than metadata with low heat.

Fig. 1 is a schematic structural diagram of a file system of erasure correction storage according to an embodiment of the present application. Fig. 1 illustrates, by way of example, the storage structure of one file in the file system; every file in the file system has the same storage structure. As shown in fig. 1, the file data may be divided into a plurality of data segments, each of which is stored as an object. Each object may include a plurality of data blocks, comprising valid data blocks and redundant data blocks: a data block storing file data is a valid data block, and a data block storing redundant data is a redundant data block. The redundant data may be used to restore damaged valid data. The number of data blocks included under each object may be determined according to a preset erasure ratio, and the data blocks under each object are distributed on different disks of the file system.

Fig. 2 is a schematic diagram of disk space division provided in an embodiment of the present application. As shown in fig. 2, for any one of the disks in fig. 1, the disk space may include a reserved area, a super block, and a plurality of block groups. Each block group may include an index area and a data area. Each index area may include a main index area and a spare index area, which contain the same information. Taking the main index area as an example, it may include a plurality of sub index units, and one sub index unit may include 64 file nodes. Each file node stores metadata information of one data block, which may include: the use state of the data block, the file name, the file creation time, the file modification time, the file serial number, the occupied capacity of the current data block, and the like.

The data area may include 4096 data blocks, each of which stores specific file data and has a size of 64M; each data block is stored in the same block group as its corresponding metadata.

The super block stores information such as the number of index units, the number of block groups, and the size of the data block.
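The figures given above imply some simple capacity arithmetic for one block group, sketched below. The sketch assumes that each data block in a block group has exactly one file node in the main index area of that group; the text says each file node describes one data block but does not state the count explicitly. The constant and function names are hypothetical.

```python
DATA_BLOCKS_PER_GROUP = 4096         # data blocks in one data area (from the text)
DATA_BLOCK_SIZE_M = 64               # size of one data block, in M (from the text)
FILE_NODES_PER_SUB_INDEX_UNIT = 64   # file nodes in one sub index unit (from the text)


def data_capacity_per_group_m() -> int:
    """File data that one block group can hold, in the same unit as the block size."""
    return DATA_BLOCKS_PER_GROUP * DATA_BLOCK_SIZE_M


def sub_index_units_per_group() -> int:
    """Sub index units needed if every data block has one file node (assumption)."""
    # Ceiling division, so a partially filled unit still counts as one unit.
    return -(-DATA_BLOCKS_PER_GROUP // FILE_NODES_PER_SUB_INDEX_UNIT)
```

Under these assumptions one block group holds 4096 × 64M = 262144M of file data and needs 64 sub index units in its main index area.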

FIG. 3 is a flowchart illustrating a data processing method according to an embodiment of the present application; the execution subject of the method may be an electronic device or a processing device such as a processor. The method may be applied to the file system based on erasure storage shown in fig. 1, as shown in fig. 3, and the method may include:

s301, acquiring a file creation request, wherein the file creation request comprises: the size of the target file, the name of the target file.

The premise of file reading and writing is that a file is firstly created, data can be written into the created file on the basis of creating the file, and after the data writing is finished, the file reading can be performed.

Optionally, a file may be created according to a file creation request; the created file is an empty file that does not yet contain specific data.

The file creation request may include: the size of the target file, the name of the target file, where the target file may refer to any file that is currently being created.

S302, determining the number of objects into which the target file is divided and the number of data blocks contained in each object according to the size of the target file, the preset erasure ratio and the preset data block size.

Here, the erasure ratio can be determined according to the erasure policy: once an erasure policy is selected, the erasure ratio is fixed. Under an erasure policy, the original data is encoded by an erasure code algorithm to obtain redundant data, and the file data and the redundant data are stored together, thereby achieving fault tolerance.

As explained above, the preset data block size may be 64M. Assume that the size of the target file is 1GB (1024M) and the preset erasure ratio is 4:1. The number of data blocks needed to hold the file data is obtained from the size of the target file and the preset data block size: 1024/64 = 16. A preset erasure ratio of 4:1 means that one object contains 5 data blocks, of which 4 are valid data blocks and 1 is a redundant data block. Since each object can hold 4 valid data blocks of file data, the 16 data blocks of the target file require 16/4 = 4 objects, and the data of the target file is stored across these objects.

Based on the analysis, the number of objects into which the target file is divided and the number of data blocks contained in each object can be determined, and when the size of the target file and the preset erasure ratio are changed, the number of objects into which the target file is divided and the number of data blocks contained in each object can be still determined according to the method.
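The calculation in the example above can be written as a short Python sketch. The function name `plan_layout` is hypothetical, and rounding up when the file size is not an exact multiple of the block size is an assumption, since the example in the text only covers the exact case.

```python
def plan_layout(file_size: int, block_size: int,
                valid_blocks: int, redundant_blocks: int) -> tuple[int, int]:
    """Return (number of objects, number of data blocks per object).

    file_size and block_size use the same unit (for example M).
    valid_blocks:redundant_blocks is the preset erasure ratio, such as 4:1.
    """
    if min(file_size, block_size, valid_blocks) <= 0 or redundant_blocks < 0:
        raise ValueError("sizes and ratio parts must be positive")
    # Valid data blocks needed to hold the file data (ceiling division, assumed).
    total_valid = -(-file_size // block_size)
    # Each object holds `valid_blocks` valid data blocks (ceiling division, assumed).
    objects = -(-total_valid // valid_blocks)
    return objects, valid_blocks + redundant_blocks
```

For the 1GB file in the example, `plan_layout(1024, 64, 4, 1)` gives 4 objects with 5 data blocks each.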

S303, respectively creating a data block of each object on each selected disk according to a preset erasure ratio, and generating file creation information.

The file creation information may include: the file name, the file size, the file storage path, the file identification, the file creation time, the number of objects contained in the file, and information of each data block in each object contained in the file, where the data blocks in each object comprise valid data blocks and redundant data blocks, and the data blocks under the same object are distributed on different selected disks of the file system.

In some embodiments, the number of disks to be used to create the data blocks may also be determined from the preset erasure ratio. As described above, the preset erasure ratio determines the number of data blocks contained under one object, and since the data blocks under the same object are distributed on different disks of the file system, the number of disks to be selected equals the number of data blocks under one object.

Optionally, based on the selected disks, one data block of an object may be created on each disk, thereby completing creation of the target file and generating the file creation information.

S304, generating first type metadata and second type metadata of the target file respectively according to the file creation information and storing them, wherein the first type metadata is stored in a system disk of the file system and the second type metadata is stored in a data disk of the file system.

The first type metadata includes: basic information of the target file, and the association relationship between the first type metadata and the second type metadata. The second type metadata includes: the data block information of the target file. The access frequency of the first type metadata is greater than that of the second type metadata.

The first type of metadata may refer to the high-heat metadata, and the second type of metadata may refer to the low-heat metadata.

In some embodiments, the first type metadata and the second type metadata of the target file may be generated according to the generated file creation information, where the first type metadata may be generated according to some information in the file creation information, the second type metadata may be generated according to other information in the file creation information, and the generated first type metadata and second type metadata may be stored.

Optionally, in this embodiment, the generated first type metadata may be stored in a system disk of the file system. From the perspective of a computer, the system disk is the disk on which the system is installed, commonly referred to as the C disk. The second type metadata may be stored in a data disk of the file system, where a data disk is a disk other than the system disk that is used for storing data, commonly referred to as the D disk, E disk, and so on.

It should be noted that the first type metadata may further include the association relationship between the first type metadata and the second type metadata. In some scenarios, when only the basic information of the target file needs to be accessed, it is sufficient to access the basic information of the target file in the first type metadata. In other scenarios, when the file data of the target file needs to be accessed, the basic information of the target file in the first type metadata and the association relationship between the first type metadata and the second type metadata are accessed to determine the information of the data blocks of the target file, and the file data of the target file is then read from the disks according to the information of the data blocks. Through the established association relationship between the first type metadata and the second type metadata, the amount of metadata that needs to be acquired during metadata access can be reduced.

Because the information in the metadata is not all accessed in the data access process, based on the classification of the metadata, the metadata can be accessed according to the access requirement by separately storing the high-frequency access first type metadata and the low-frequency access second type metadata, thereby reducing unnecessary metadata access, reducing the data access quantity to a certain extent and improving the data access efficiency. And the first type metadata and the second type metadata are respectively stored in the system disk and the data disk, so that the access times of the data disk can be reduced and the pressure of the data disk can be reduced when the data access is performed. In addition, based on the classified storage of the first type metadata and the second type metadata, the data amount of metadata which needs to be updated when the metadata information is modified can be reduced, and the read-write performance of the metadata is improved.

In summary, the data processing method provided in this embodiment includes: obtaining a file creation request, the file creation request comprising: the size of the target file and the name of the target file; determining the number of objects into which the target file is divided and the number of data blocks contained in each object according to the size of the target file, the preset erasure ratio and the preset data block size; respectively creating a data block of each object on each selected disk according to the preset erasure ratio, and generating file creation information; and, according to the file creation information, respectively generating and storing first type metadata and second type metadata of the target file, wherein the first type metadata is stored in a system disk of the file system and the second type metadata is stored in a data disk of the file system. In this method, the metadata information of the file is classified according to access frequency into first type metadata and second type metadata, the first type metadata is stored in a system disk of the file system, and the second type metadata is stored in a data disk of the file system. Because not all information in the metadata is accessed during data access, storing the frequently accessed first type metadata separately from the infrequently accessed second type metadata allows metadata to be accessed according to the access requirement, which reduces unnecessary metadata access, reduces the data access amount to a certain extent and improves data access efficiency. In addition, because the first type metadata and the second type metadata are stored in the system disk and the data disk respectively, the number of accesses to the data disk and the pressure on the data disk can be reduced during data access.

FIG. 4 is a second flowchart of a data processing method according to an embodiment of the present application; optionally, in step S304, generating the first type metadata and the second type metadata of the target file according to the file creation information and storing the first type metadata and the second type metadata, respectively, may include:

S401, generating basic information of a target file according to a file name, a file size, a file creation time, a file storage path and the number of objects contained in the file, and generating an association relation between first type metadata and second type metadata according to a file identifier and information of each data block in each object contained in the file, wherein the association relation is used for representing a mapping relation between the file identifier of the target file and each data block in each object contained in the file.

Optionally, the first type metadata includes basic information of the target file, where the basic information may include some basic data such as a name, a file identifier, a file size, a file creation time, a modification time, a storage path, and the like, and the association relationship between the first type metadata and the second type metadata may be constructed according to the file identifier of the target file in the first type metadata and information of each data block included in the target file, where the association relationship may be used to query and obtain information of each data block of the target file according to the file identifier of the target file.

S402, obtaining the first type metadata of the target file according to the basic information of the target file and the association relation between the first type metadata and the second type metadata.

Therefore, the first type metadata of the target file can be obtained by combining the obtained basic information of the target file and the association relationship between the first type metadata and the second type metadata.

S403, generating data block information of the target file according to the information of each data block in each object contained in the file, and taking the data block information as second type metadata.

The information of each data block in each object included in the file may refer to the storage location information of each data block, that is, the disk location corresponding to each data block, and the second type metadata may include the information of all data blocks of the target file.

And according to the information of each data block, each data block can be inquired and obtained from the corresponding disk, and the stored file data can be read from the data block.

In addition, the second type of metadata may further include: data block size, data block name, etc.

The association relationship between the generated first type metadata and second type metadata is described here. It should be noted that, in this embodiment, the first type metadata is stored in the form of key-value pairs, that is, the first type metadata is stored in a system disk of the file system through a key-value database.

A key-value database is a type of database distinct from relational databases. Each record in the database is a key-value pair consisting of two elements, a key and a value, both of which are variable-length byte sequences. Keys and values can be either binary data or text strings, and keys in a database must be unique. A key-value database can provide functions such as a persistence mechanism and data synchronization, has characteristics such as high concurrency, high scalability and high reliability, and is an effective way of storing metadata.

The association relationship between the first type metadata and the second type metadata may be represented in the form of tables, mainly comprising the following table structures. The file information table is used for storing entries of the first type metadata of a file, and includes the file path, file creation time, modification time, file size, the number of objects included in the file, the file state, the POOL to which the file belongs, and the file ID (file identifier), as shown in Table 1:

Table 1: file information table

The table uses a key of the form "FI@file ID", where FI means File Info, the file ID is the unique identification of the file, and the joining character is '@'. The file status is represented by '0', '1' or '2', where '0' indicates that the file has not been written since creation, '1' indicates that a write is in progress, and '2' indicates that writing of the file has been closed. The Pool ID is the number of the storage resource pool to which the file belongs. The number of objects indicates how many OBJ objects the file occupies.

The data block information table records the information of all data blocks (Blocks) contained under each file. The key is of the form "FB@file ID", where the file ID indicates the specific file to which the blocks in the entry belong, and the Block description information stores the basic information of each Block, as shown in Table 2.

Table 2: data block information table

The description information of a Block is represented by a character string in a specified format, wherein the 5th, 10th and 12th characters are connection symbols; the first 4 characters are '0000' and are reserved; characters 6-9 are a 4-digit number recording the disk number corresponding to the block; character 11 records the state of the block, with the values 0-5 respectively representing the initialization state, normal state, writing state, damaged state, missing state and off-line state; and character 13 indicates the capacity status of the block. In the data block information table, multiple pieces of block information are separated by ';'.

The Block description information format table records a specific format of each data Block and information represented by each bit of the data Block.

Table 3: block description information format
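As an illustration of the format described above, the following Python sketch parses Block description strings by character position. The function names are hypothetical, and the '-' connection symbol in the sample value is an assumption, since the description does not name the connection character; the parser itself relies only on positions. The capacity status is kept as a raw character because its encoding is not specified.

```python
# State codes 0-5 as listed in the description.
BLOCK_STATES = {
    0: "initialization", 1: "normal", 2: "writing",
    3: "damaged", 4: "missing", 5: "offline",
}

def parse_block_description(desc):
    """Parse one 13-character Block description string by position."""
    if len(desc) != 13:
        raise ValueError("expected 13 characters, got %d" % len(desc))
    return {
        "reserved": desc[0:4],          # characters 1-4, '0000'
        "disk_number": int(desc[5:9]),  # characters 6-9
        "state": BLOCK_STATES[int(desc[10])],  # character 11
        "capacity_status": desc[12],    # character 13, meaning unspecified
    }

def parse_block_table_value(value):
    """Split a data block information table value on ';' and parse each entry."""
    return [parse_block_description(d) for d in value.split(";") if d]

# Hypothetical value; the '-' connector is assumed for illustration only.
blocks = parse_block_table_value("0000-0003-1-0;0000-0004-3-1")
print(blocks[0]["disk_number"], blocks[1]["state"])
```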

The file index table is used for indexing files so that the information of a specified file can be found quickly. The key is of the form "DIMF@file directory:file name", and the value is the file ID.

Table 4: file index table

Optionally, after the target file is successfully created, the data block information table can be generated according to the generated file creation information, then the file information table is generated according to the metadata information of the target file, the database is called, and the file index table, the data block information table and the file information table are written into the key value pair database.
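The three tables can be pictured as entries in one key-value store. The sketch below is only illustrative: a plain Python dict stands in for the key-value database, the function names and the field names inside the file information value are hypothetical, and the '-' connector in the sample Block descriptions is assumed. Only the key prefixes "DIMF@", "FI@" and "FB@" and the ';' separator come from the description.

```python
def write_file_records(store, file_id, directory, name, size,
                       object_count, pool_id, block_descriptions):
    """Write the file index, file information and data block entries."""
    # File index table: "DIMF@<directory>:<name>" -> file ID.
    store["DIMF@%s:%s" % (directory, name)] = file_id
    # File information table: "FI@<file ID>" -> basic file information.
    store["FI@%s" % file_id] = {
        "path": "%s/%s" % (directory, name),
        "size": size,
        "object_count": object_count,
        "state": 0,        # 0: created but not yet written
        "pool_id": pool_id,
    }
    # Data block information table: "FB@<file ID>" -> ';'-joined descriptions.
    store["FB@%s" % file_id] = ";".join(block_descriptions)

def lookup_block_descriptions(store, directory, name):
    """Resolve name -> file ID -> Block descriptions, as a read would."""
    file_id = store["DIMF@%s:%s" % (directory, name)]
    return store["FB@%s" % file_id].split(";")

store = {}
write_file_records(store, "f001", "/video", "a", 1024, 4, 1,
                   ["0000-0001-0-0", "0000-0002-0-0"])
print(lookup_block_descriptions(store, "/video", "a"))
```

A request that needs only the basic file information stops at the "FI@" entry; the "FB@" entry is fetched only when block locations are required.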

FIG. 5 is a flowchart of a data processing method according to an embodiment of the present application; optionally, after step S304, in which the first type metadata and the second type metadata of the target file are respectively generated and stored according to the file creation information, the first type metadata being stored in a system disk of the file system and the second type metadata in a data disk of the file system, the method of the present application may further include:

S501, acquiring a file writing request for a target file, wherein the file writing request comprises: the name of the target file and the file data of the target file.

Based on the created target file, this embodiment writes data into the target file according to the acquired file writing request. The file writing request may include: the name of the target file and the file data of the target file. The file data is the specific data to be written into the created target file; in other words, the created target file is an empty file, and the file data of the target file is the specific content to be written into that empty file.

It should be noted that, the target file may be any file, and for convenience of understanding, the target file to be written and the created target file may be considered as one file.

S502, determining whether the target file exists or not from the created files according to the names of the target files.

Before the target file is written, a plurality of files may already have been created in the file system. According to the name of the target file, the created files can be searched to determine whether the target file exists, that is, whether the target file has been created in the file system.

Optionally, the obtained name of the target file may be compared with the file names of the files created in the file system to determine whether the target file exists.

S503, if so, determining the redundant data contained in each object of the target file according to the file data of the target file, wherein the redundant data is used for recovering the file data of the target file.

If the file system contains a created file whose name is the same as the name of the target file, the target file is confirmed to exist.

In the file system based on erasure correction storage, it has been described above that among the data blocks included in one object of one file, there are a valid data block for storing valid data of the file and a redundant data block for storing redundant data.

When the effective data stored in any effective data block under an object is damaged, erasure calculation can be performed according to the effective data in the rest effective data blocks under the object and the redundant data in the redundant data blocks, so that the damaged effective data is recovered.

Here, the redundant data may be calculated using an erasure calculation function. Erasure calculations are known methods of calculation and can be understood with reference to the description. Alternatively, the corresponding redundant data under an object may be calculated from each valid data under the object.
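The description does not name a specific erasure calculation function. As a minimal sketch, assuming the simplest case of a single redundant block per object (an N:1 ratio such as 4:1), the redundant data can be computed as the bytewise XOR of the valid data blocks. The function name `xor_blocks` is hypothetical, and practical systems with more than one redundant block typically use a code such as Reed-Solomon instead.

```python
def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    result = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Four valid data blocks under one object (tiny blocks for illustration).
valid = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
redundant = xor_blocks(valid)

# XOR of all five blocks is zero, which is what makes recovery possible.
print(xor_blocks(valid + [redundant]) == bytes(4))  # True
```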

S504, writing redundant data contained in each object of the target file into redundant data blocks of each object, writing file data of the target file into effective data blocks of each object respectively, and updating first type metadata and second type metadata corresponding to the target file.

Optionally, the terms valid data block and redundant data block only distinguish the type of data stored in a data block. Both are ordinary data blocks under one object: any data block serves as a valid data block once it stores valid data, and as a redundant data block once it stores redundant data. The redundant data block and the valid data blocks are not specific, predetermined data blocks under an object.

After the writing of the file data of the target file is completed, the first type metadata and the second type metadata of the target file, which are generated when the target file is created, can be adaptively updated.

The update may include updating the size of the target file in the first type metadata. Because some data may fail to be written during the writing of the file data, or the user may write only part of the data, the size of the written file may be inconsistent with the size of the created file; in that case the file size in the first type metadata of the target file can be updated according to the size actually written. The state of the data blocks in the second type metadata may also be updated. The state of each data block may be: the initialization state, normal state, writing state, damaged state, missing state or off-line state, and can be updated according to the current actual state of each data block. Different states can be recorded by different numeric identifiers, and the state of a data block is updated by changing its identifier.

Fig. 6 is a schematic diagram of data separation storage based on erasure correction storage according to an embodiment of the present application. As shown in fig. 6, the first type metadata of the file may be stored as replicas in a system disk of the file system through a database, the second type metadata of the file may be stored as replicas in a data disk of the file system through a database, and the association relationship between the first type metadata and the second type metadata may also be stored as replicas in a system disk through a database. Storing the data as replicas improves data security and prevents data loss.

The effective data of the file and the redundant data obtained through erasure correction calculation are stored in a data disk, so that the first type metadata and the second type metadata of the file are stored separately.

FIG. 7 is a flowchart illustrating a data processing method according to an embodiment of the present application; optionally, in step S504, after writing the redundant data included in each object of the target file into the redundant data block of each object and writing the file data of the target file into the valid data block of each object, the method of the present application may further include:

S701, acquiring a file reading request for a target file, wherein the file reading request comprises: the name of the target file, the size of the target file, the read offset information and the read length information.

After the target file is successfully written, the file can be further read, and the file data of the target file can be read from each data block of the target file according to the acquired read request of the target file.

The read offset information may refer to an offset of data to be read in a data block, and the read length information may refer to a length of the data to be read.

S702, querying the first type metadata of each file in the system disk according to the name of the target file and the read offset information, and determining the information of each data block corresponding to the target file.

In one implementation manner, the state of the target file can be searched from the database according to the name of the target file, whether the target file exists or not is determined, the identification of the target file can be obtained according to the name of the target file under the condition that the target file exists, so that the data block information table is called, and the information of each data block corresponding to the target file is determined according to the association relation between the identification of the target file and each data block.

S703, reading the target file from the corresponding disks according to the information of each data block corresponding to the target file, the read offset information and the read length information.

Based on the information of the determined data blocks of the target file, the file data of the target file can be read from the disk corresponding to each data block according to the read offset information and the read length information in each data block.

FIG. 8 is a flowchart of a data processing method according to an embodiment of the present application; optionally, in step S702, querying the first type metadata of each file in the system disk according to the name of the target file and the read offset information, and determining the information of each data block corresponding to the target file, may include:

S801, determining a file identification of the target file according to the name of the target file.

Assuming that the file name of the target file is a, the key used to look up the target file in the file index table may be expressed as "DIMF@file path:a", and the file identification of the target file is the value stored under that key.

S802, according to the file identification of the target file, inquiring the association relation between the first type metadata and the second type metadata of each file, and determining the second type metadata corresponding to the target file.

Here, the data block information table is queried, and the information of the data blocks corresponding to the target file is determined according to the association relationship between the data blocks in the data block information table and the identifier of each file; that is, the second type metadata corresponding to the target file is determined.

S803, inquiring the second type metadata to obtain the information of the magnetic disk to which each data block of the target file belongs.

Optionally, the identification of the disk to which each data Block recorded in the Block description information format table belongs may be searched to determine the disk information to which each data Block of the target file belongs.

S804, according to the read offset information, determining the information of the magnetic disk to which the target data block corresponding to the target file belongs from the information of each data block corresponding to the target file.

In some embodiments, the file data of the target file to be read may be only part of all the file data of the target file, and then the target data block corresponding to the target file to be read may be determined from the data blocks according to the read offset information and the length to be read of the data blocks, so as to determine the information of the disk to which the target data block belongs.
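One way to picture this step is to map a read offset and length onto the data blocks they touch. The sketch below assumes that file data is laid out sequentially, filling the valid data blocks of object 0 in order, then object 1, and so on; the description does not state the layout, so this is an assumption, and the function name `locate_blocks` is hypothetical.

```python
MIB = 1024 * 1024

def locate_blocks(offset, length, block_size=64 * MIB, valid_per_object=4):
    """Return (object_index, block_index, offset_in_block, byte_count) tuples.

    Assumes sequential layout of file bytes across valid data blocks.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    spans = []
    position = offset
    end = offset + length
    while position < end:
        global_block = position // block_size      # index among all valid blocks
        in_block = position % block_size           # start offset inside the block
        count = min(block_size - in_block, end - position)
        spans.append((global_block // valid_per_object,
                      global_block % valid_per_object,
                      in_block, count))
        position += count
    return spans

# A 10M read starting at offset 60M crosses from block 0 into block 1.
for span in locate_blocks(60 * MIB, 10 * MIB):
    print(span)
```

Each tuple identifies a target data block; its disk is then found from the Block description information recorded for that block.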

FIG. 9 is a flowchart of a data processing method according to an embodiment of the present application; optionally, in step S703, reading the target file from the corresponding disks according to the information of each data block corresponding to the target file, the read offset information and the read length information may include:

S901, according to the information of the magnetic disk to which the target data block corresponding to the target file belongs, respectively reading the file data stored in each target data block from the magnetic disk to which each target data block belongs according to the read offset information and the read length information.

In some embodiments, for the determined target data blocks of the target file, part of the file data may be read from each target data block, where in each target data block, file data reading may be performed according to the reading offset information and the reading length information.

S902, combining according to file data stored in each read target data block to obtain a target file.

In general, only partial file data of a target file to be read is read from one target data block, and the complete target file data can be obtained by combining the partial target file data read from each target data block.

Optionally, in step S901, reading file data stored in each target data block from a disk to which each target data block belongs, respectively, may include: if the target disk fails to read and the data stored in the target data block on the target disk is effective data, performing erasure correction calculation to recover the effective data in the target data block on the target disk, wherein the target disk is any disk in the disks corresponding to the target data blocks.

In one implementation, if the disk holding a valid data block is damaged when the target file data is read, so that reading that part of the valid data fails, that part of the valid data can be recovered by erasure calculation, ensuring the integrity of the target file data that is finally read.

Optionally, performing erasure calculation to recover the valid data in the target data block on the target disk may include: calculating the valid data in the target data block on the target disk from the valid data and the redundant data read from the target data blocks on the disks other than the target disk.

For example: the file data of the target file to be read is distributed over the data blocks under object 1 of the target file. Taking an erasure ratio of 4:1 as an example, object 1 includes valid data block 1, valid data block 2, valid data block 3, valid data block 4 and redundant data block 5, which are distributed on disk 1, disk 2, disk 3, disk 4 and disk 5 respectively. If disk 4 is a damaged disk, then reading from disk 4, that is, reading the valid data stored in valid data block 4, fails when the file data is read. In that case, the valid data stored in valid data block 4 can be calculated by erasure calculation from the valid data read from valid data blocks 1, 2 and 3 and the redundant data read from redundant data block 5.
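The 4:1 example can be reproduced with a small Python sketch. It assumes XOR parity as the erasure calculation, which is only one possible choice for a single redundant block and is not stated in the description; the helper names and the sample block contents are hypothetical.

```python
from functools import reduce

def xor_pair(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def xor_all(blocks):
    """Bytewise XOR of a list of equal-length byte strings."""
    return reduce(xor_pair, blocks)

# Valid data blocks 1-4 on disks 1-4 (tiny blocks for illustration).
valid = {1: b"data", 2: b"from", 3: b"four", 4: b"disk"}
# Redundant data block 5 on disk 5, computed when the file was written.
redundant = xor_all(list(valid.values()))

# Disk 4 is damaged: only blocks 1, 2, 3 and the redundant block are readable.
surviving = [valid[1], valid[2], valid[3], redundant]
recovered = xor_all(surviving)
print(recovered)  # b'disk'
```

Under this assumption, XOR-ing the three surviving valid blocks with the redundant block cancels their contributions and leaves exactly the content of valid data block 4.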

Fig. 10 is a flow chart of a data processing method according to an embodiment of the present application; optionally, in step S303, before creating a data block of each object on each selected disk according to the preset erasure ratio, the method may further include:

S110, determining the number of the magnetic discs to be selected according to a preset erasure ratio.

The preset erasure ratio can be determined according to the selected erasure algorithm. If the erasure algorithm is N+M, the preset erasure ratio is N:M, where N:M represents the ratio of valid data blocks to redundant data blocks. Because the data blocks are distributed on different disks, the number of disks to be selected is N+M.

S111, selecting the determined number of disks from a plurality of disks in the file system according to the disk weights of the disks in the file system, wherein the disk weight of each disk is determined according to the capacity of the disk and the total disk capacity of the system.

In order to keep disk capacity balanced, the weight of each disk can be determined according to the share of the total disk capacity that the disk accounts for, and disks with higher weights are selected preferentially. The disk selection can be carried out with the following algorithm:

An arbitrary random number is taken and divided by the total weight of the disks to obtain a first-round result. The first-round result is compared with the weight of each disk; if it falls within the weight range of a certain disk, that disk is determined to be a disk to be selected, the disk is removed from the set of all disks, and the weights of the remaining disks are recalculated.

The process is then repeated: the random number is divided by the total weight of the remaining disks to obtain a second-round result, the second-round result is compared with the weights of the remaining disks, and if it falls within the weight range of a certain disk, that disk is determined to be a disk to be selected. These steps are executed in sequence until all disks to be selected have been determined.
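The selection rounds described above can be sketched as weighted random selection without replacement. Several details below are assumptions rather than statements from the description: the random number is reduced by taking its remainder modulo the total weight, weights are positive integers (for example, capacity in GB), a fresh random number is drawn in each round, and each disk's weight range is its slice of the cumulative weight. The function name `select_disks` and the sample capacities are hypothetical.

```python
import random

def select_disks(disk_weights, count, rng=None):
    """Pick `count` distinct disks, favoring disks with higher weights."""
    if rng is None:
        rng = random.Random()
    if count > len(disk_weights):
        raise ValueError("not enough disks")
    if any(w <= 0 for w in disk_weights.values()):
        raise ValueError("weights must be positive integers")
    remaining = dict(disk_weights)
    selected = []
    while len(selected) < count:
        total = sum(remaining.values())        # recomputed every round
        point = rng.randrange(2 ** 32) % total  # round result in [0, total)
        cumulative = 0
        chosen = None
        for disk, weight in remaining.items():
            cumulative += weight
            if point < cumulative:             # point falls in this disk's range
                chosen = disk
                break
        selected.append(chosen)
        del remaining[chosen]                  # remove the disk for later rounds
    return selected

# Erasure ratio 4:1 needs N+M = 5 disks out of six candidates.
disks = {"disk1": 4000, "disk2": 4000, "disk3": 8000,
         "disk4": 2000, "disk5": 8000, "disk6": 4000}
print(select_disks(disks, 5, random.Random(0)))
```

Because a disk is removed once chosen, the data blocks of one object always land on different disks, while larger disks are still chosen more often across many objects.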

In one implementation, the electronic device that executes the method may be divided into a plurality of functional modules that interact to execute the method. The functional modules here are virtual modules; the device need not be subdivided by function, and the steps are the same when the method is executed directly by the electronic device.

FIG. 11 is a schematic diagram of a data processing system according to an embodiment of the present application, where the data processing system may include: a file management module (FILE MANAGER, FM), a DataBase module (DB), an object management module (OBJ management, OM), an erasure calculation module (Erasure Correction, EC), and a disk data management module (DISK DATA MANAGE, DDM); the function of each module may be as follows:

File management module: after receiving a file creation request from a client, the FM generates a unique file ID for the file and notifies the OM module to allocate free space for the file, where the allocated space is composed of a plurality of OBJs (objects) and each OBJ is composed of a plurality of BLKs (data blocks). After the OM returns that creation succeeded, the FM writes the file information (first type metadata) and the information returned by the OM (the association relationship between the first type metadata and the second type metadata) into the database, and the database is responsible for storing this information in the system disk of the file system. When a file is read, the FM parses the request issued by the client and, according to the file name, finds the first type metadata of the corresponding file and the associated second type metadata, thereby obtaining the data block information of the file to be read.

Object management module: the module that manages OBJs. When the OM receives a file creation request from the FM, it acquires the disk information of the server, allocates OBJs according to the current erasure ratio and the file size, selects a suitable disk for each BLK in each OBJ, creates the BLK on the selected disk, and writes the metadata of the BLK. After creation is complete, the OM returns the integrated information to the FM. When data is read, the OM uses the block file information issued by the FM to find the disk corresponding to the data block to be accessed and accesses the data of the BLK. The OM is also responsible for updating the database with changes in the file-to-block-file relationship that result from data recovery.

A database module: the database module stores server disk information, metadata information of the file and association relation between the file and the data block. Double writing is realized through the function of the database, so that the metadata security is ensured, and the metadata access speed is accelerated in a cache mode.

And an erasure calculation module: and the method is responsible for calculating read-write data, and performs erasure correction calculation according to FM-transmitted data information when the file is read-written, so as to obtain checked data or restore correct data.

Disk data management system: and managing data on the disk, finding out a corresponding disk according to BLK information transmitted by the OBJ, and reading and writing BLK files.

In the file creation process:

a) When the FM receives a client request to create a file, it first queries the database to check whether the file exists. If the file exists, the FM informs the client that the file already exists; if the file does not exist, the FM generates a file index table and a file identifier;

b) The OM acquires all disk information from the database and, according to the erasure ratio, randomly selects N+M disks on which to create BLKs; to keep disk capacity balanced, disk weights are set according to disk capacity, and disks with a high weight are selected preferentially;

c) After the OM has selected enough disks, a BLK file is created on each disk through the DDM;

d) The DDM creates the BLK file and writes the BLK file information (namely the BLK metadata) in the disk partition;

e) After the OM returns that the file was created successfully, the FM generates a data block information table according to the information of the created file, and generates a file information table from the file metadata information. The FM then calls the database and writes the file index table, the data block information table, and the file information table into the database at the same time.
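The creation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: a plain dict stands in for the database, `create_file` and `allocate_blocks` are hypothetical names, and the `FB@` key prefix is invented for the data block information table (the `DIMF@` and `fi@` prefixes follow the key forms that appear in the embodiments).

```python
import itertools

_next_id = itertools.count(123456)  # hypothetical file-ID generator


def allocate_blocks(file_id, n_objects, n_blocks):
    """Stand-in for OM + DDM (steps b-d): return placeholder BLK names per OBJ."""
    return [[f"blk-{file_id}-{obj}-{blk}" for blk in range(n_blocks)]
            for obj in range(n_objects)]


def create_file(db, path, size, n_objects, n_blocks):
    """Return the new file ID, or None if the file already exists."""
    index_key = f"DIMF@{path}"
    if index_key in db:              # step a): the file already exists
        return None
    file_id = next(_next_id)         # step a): generate a file identifier
    blocks = allocate_blocks(file_id, n_objects, n_blocks)
    # step e): write the three tables in a single update
    db.update({
        index_key: file_id,                                # file index table
        f"fi@{file_id}": {"path": path, "size": size,
                          "objects": n_objects},           # file information table
        f"FB@{file_id}": blocks,                           # data block information table
    })
    return file_id
```

Writing all three entries in one update mirrors the statement that the tables are written to the database at the same time; a real database would need a transaction or batch write to give the same guarantee.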

During the file reading process:

a) The client reads a file. After the FM receives the read request, it first queries the database for information such as the state, existence, and size of the file, and calculates the OBJ to be read;

b) The OM calls the DDM to read the BLK files on each disk according to the received OBJ information, the read offset, and the read length;

c) The DDM finds the position of the BLK to be read on the disk according to the INode (index unit) information on the disk, reads the data at the corresponding offset and length, and returns it to the OM;

d) After the OM receives the result, it passes the result to the EC. The EC judges from the OM's result whether erasure calculation is needed: if not, the data is returned directly to the FM; if so, the data is returned to the FM after the calculation.

The method of the present application will be illustrated by the following specific examples:

Embodiment 1 is a case where erasure calculation is not required:

With an erasure ratio of 4:1, there are 5 disks in the file system, numbered 1, 2, 3, 4, and 5, of which disk 5 is damaged. The client needs to read a 900M file with the file path /storagecli/file.

At this time, in the database, the file index table is:

Key: DIMF@storage:file
Value (file ID): 123456

At this time, the file information table is

The data block information table is:

The client first needs to read data at an offset of 16MB with a length of 1M (1024K). After the FM receives the request, it queries the database and obtains the file ID 123456 and the file size 900M. The file state is normal and the file can be read. The FM calculates that offset 16MB falls on the first OBJ, and looks up the BLK files under the first OBJ from the FB table. The BLK information is 0000-0001-01-1; 0000-0002-01-1; 0000-0003-01-1; 0000-0004-01-1; 0000-0005-03-1. The FM also calculates that the 16MB offset corresponds to an offset of 4MB in each BLK (16 divided by 4), and that 256KB needs to be read from each block.
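The offset arithmetic in this read can be sketched as follows. This is an illustration only: `locate` is a hypothetical helper, and the 256MB of data per OBJ is an assumption (it is consistent with offset 16MB falling on the first OBJ and offset 257MB falling on the second, but the source does not state it). The sketch also assumes that the read does not cross an OBJ boundary and that the offset and length divide evenly across the data blocks.

```python
KB = 1024
MB = 1024 * KB


def locate(offset, length, data_blocks=4, obj_data_size=256 * MB):
    """Map a file offset/length to (OBJ index, offset in each BLK, bytes per BLK).

    obj_data_size is an assumed value, not taken from the source.
    """
    obj_index = offset // obj_data_size       # 0 means the first OBJ
    offset_in_obj = offset % obj_data_size
    blk_offset = offset_in_obj // data_blocks  # data is spread over the data BLKs
    blk_length = length // data_blocks
    return obj_index, blk_offset, blk_length
```

With the figures of this embodiment, `locate(16 * MB, 1 * MB)` gives OBJ index 0, a 4MB offset in each BLK, and 256KB to read from each BLK.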

Fig. 12 is a schematic diagram of reading a data block according to an embodiment of the present application.

The FM sends the BLK information to be read, together with the offset and the read length, to the OM. The OM obtains the information of each BLK, finds the disk corresponding to each BLK, and sends the BLK name, the disk, and the read offset and length to the DDM. The DDM finds the corresponding BLK file name through the file index (Fnode) in the designated disk partition, obtains the offset of the BLK on the disk, reads the corresponding data according to that offset and the length to be read, and returns the data to the OM.

The OM returns the read data to the EC module. Because disk 5 is damaged, the DDM fails to read it, so in the data returned by the OM the 5th BLK has a read length of 0 and a read result of failure. The EC module finds that, with the erasure ratio of 4:1, the valid data are all on the first 4 BLKs, so the data are combined directly without erasure calculation. After the EC combines the data, it returns the data to the FM, the FM returns the data to the client, and the read ends.
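The EC module's decision can be sketched as a small predicate. This is a hedged illustration: the function names are hypothetical, and it assumes, as in this example, that the data BLKs come first in each OBJ's BLK list and the redundant BLKs come last.

```python
def read_results(blk_disks, failed_disks):
    """One boolean per BLK, in OBJ order: True if its disk could be read."""
    return [disk not in failed_disks for disk in blk_disks]


def needs_erasure_calculation(read_ok, data_blocks):
    """Erasure calculation is needed only if a data BLK failed to read.

    A failed redundant BLK can be ignored, because the data BLKs already
    hold all of the valid data.
    """
    return not all(read_ok[:data_blocks])
```

For the first OBJ, whose BLKs sit on disks 1 to 5 in order, the failure of disk 5 affects only the redundant BLK, so the predicate returns False. If the same failed disk held one of the first four BLKs of an OBJ, it would return True.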

Embodiment 2 is a case where erasure calculations are required:

The client reads data for the second time, with a read offset of 257MB and a read length of 1M. After the FM receives the request, it calculates, in the same way as for the first read, that the data required by the client is on the second OBJ. The BLKs under the second OBJ are 0000-0002-01-1; 0000-0003-01-1; 0000-0004-01-1; 0000-0005-03-1; 0000-0001-01-1. The offset in each BLK is the 9th set of stripes, and 8 sets of stripes need to be read.

Fig. 13 is a schematic diagram of another data block reading according to an embodiment of the present application.

Similarly to the first read, the FM sends the BLK information to be read to the OM, and the OM forwards it to the DDM. Because the 4th BLK of the second OBJ is on disk 5, the DDM fails to read the 4th BLK. The OM returns the read data to the EC. From the OM's result, the EC finds that the fourth stripe is a data stripe and that its read failed, so erasure calculation is needed to recover the data; after the calculation, the recovered data is returned to the FM. The FM returns the data to the client, and the read ends.
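The source does not specify which erasure code is used. As one possible illustration for a 4:1 ratio, the sketch below assumes a single XOR parity block, which is a common choice when there is only one redundant block; `xor_blocks` and `recover` are hypothetical names.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover(blocks):
    """blocks: the data chunks followed by one XOR parity chunk.

    Exactly one entry is None (the chunk that failed to read). With XOR
    parity, the missing chunk equals the XOR of all surviving chunks.
    """
    missing = blocks.index(None)
    survivors = [b for b in blocks if b is not None]
    restored = list(blocks)
    restored[missing] = xor_blocks(survivors)
    return restored
```

In the second read, the fourth chunk would be the `None` entry, and it would be rebuilt from the three readable data chunks and the parity chunk. Codes that tolerate more than one failure, such as Reed-Solomon, need a different calculation.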

Embodiment 3 is directed to the case where only basic information of a file is read, and specific data of the file is not read:

The client scans how many files are stored on the server and obtains information about these files. The client sends a file information acquisition request to the FM to acquire the information of the files under the specified directory /storage/filegroup1/. After the FM receives the request, it queries the database using the key formed from the directory /storage/filegroup1/, namely DIMF@storage/filegroup1, and scans the files under the directory. After acquiring the files, it also obtains their file IDs, forms the key fi@fileid from each file ID, and scans the file information table. The queried information is returned to the client.

In this process, the client only obtains the file information and does not need to read the content of each file. Therefore, after the FM receives the request, it only needs to query the specified directory or the specified file information in the database (cache) to respond to the client; it does not need to query the BLK information under the file or the specific distribution of the BLKs on the disks. That is, in this case the request can be served by accessing only the first type metadata of the file, without accessing the second type metadata, which reduces the number of disk reads during the query and reduces the amount of data queried.
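A directory scan of this kind can be sketched against a key-value store. The sketch is illustrative only: a dict stands in for the database, `list_directory` is a hypothetical name, and the exact key layout (directory, then `:`, then file name) is an assumption modelled on the `DIMF@storage:file` key shown earlier.

```python
def list_directory(db, directory):
    """Return file information for every file under `directory`.

    Only the file index table (DIMF@ keys) and the file information table
    (fi@ keys) are read; no data block information is touched.
    """
    prefix = f"DIMF@{directory}:"
    file_ids = [fid for key, fid in db.items() if key.startswith(prefix)]
    return [db[f"fi@{fid}"] for fid in file_ids]
```

A production key-value store would normally offer a prefix or range scan instead of iterating over every key as this sketch does.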

Embodiment 4 updates the first type metadata and the second type metadata for the case where the data block is corrupted:

When a BLK is damaged, data recovery is needed and the BLK must be rewritten on a disk. If the BLK is recovered on the disk where it was originally located, the BLK may no longer be in its original block group after recovery, but the serial number of the disk where the BLK is located is unchanged. In this case only the second type metadata needs to be changed; the first type metadata does not need to be changed, and the association relation between the first type metadata and the second type metadata does not need to be changed, so no modification in the database is needed and the client's acquisition of the file information is not affected. Similarly, if the information of the file is modified, for example the file name, only the first type metadata needs to be updated, and the association relation between the first type metadata and the second type metadata does not need to be updated, which effectively reduces the write frequency of the data disks.
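The independence of the two updates can be illustrated with two small helpers. Everything here is hypothetical: `disk_meta` stands in for the BLK metadata kept on one data disk, `db` for the database on the system disk, and the `DIMF@`, `fi@`, and `FB@` key layouts are assumptions.

```python
def relocate_block(disk_meta, blk_name, new_block_group):
    """BLK recovered on the same disk: only on-disk BLK metadata changes."""
    disk_meta[blk_name]["block_group"] = new_block_group


def rename_file(db, file_id, old_path, new_path):
    """File renamed: only first type metadata in the database changes.

    The file-ID-to-BLK mapping (the FB@ entry) is keyed by file ID, so it
    is left untouched.
    """
    db[f"DIMF@{new_path}"] = db.pop(f"DIMF@{old_path}")
    db[f"fi@{file_id}"]["path"] = new_path
```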

In summary, the data processing method provided by the application comprises the following steps: obtaining a file creation request, the file creation request comprising the size of the target file and the name of the target file; determining, according to the size of the target file, the preset erasure ratio, and the preset data block size, the number of objects into which the target file is divided and the number of data blocks contained in each object; creating the data blocks of each object on the selected disks according to the preset erasure ratio, and generating file creation information; and generating first type metadata and second type metadata of the target file according to the file creation information, storing the first type metadata in a system disk of the file system, and storing the second type metadata in a data disk of the file system. In this method, the metadata information of a file is classified according to access frequency into first type metadata and second type metadata; the first type metadata is stored in a system disk of the file system, and the second type metadata is stored in a data disk of the file system. Because not all of the information in the metadata is accessed during every data access, storing the frequently accessed first type metadata separately from the infrequently accessed second type metadata allows metadata to be accessed according to the actual access requirement. This reduces unnecessary metadata access, which can reduce the amount of data accessed to a certain extent and improve data access efficiency. Because the first type metadata and the second type metadata are stored in the system disk and the data disk respectively, the number of accesses to the data disk during data access can also be reduced, reducing the pressure on the data disk.
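The step that determines the number of objects and the number of data blocks per object can be sketched as follows. The source names the inputs (file size, erasure ratio, data block size) but not the formula, so the ceiling division below is an assumption, as are the name `plan_objects` and the 64MB block size used in the usage note.

```python
MB = 1024 * 1024


def plan_objects(file_size, data_blocks, redundant_blocks, block_size):
    """Return (number of OBJs, number of BLKs per OBJ).

    Assumes each OBJ holds data_blocks * block_size bytes of file data
    plus redundant_blocks blocks of redundancy.
    """
    obj_data_size = data_blocks * block_size
    n_objects = -(-file_size // obj_data_size)  # ceiling division
    return n_objects, data_blocks + redundant_blocks
```

Under these assumptions, a 900MB file with a 4:1 erasure ratio and 64MB blocks needs `plan_objects(900 * MB, 4, 1, 64 * MB)`, that is, 4 objects of 5 blocks each.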

The following describes a device, equipment, a storage medium, etc. for executing the method provided by the present application; for their specific implementation processes and technical effects, refer to the description above, which is not repeated below.

Fig. 14 is a schematic diagram of a data processing apparatus according to an embodiment of the present application, where the functions implemented by the data processing apparatus correspond to the steps executed by the method. The apparatus may be understood as an electronic device or a server, or a processor of a server, or as a component that is independent of the server or the processor and performs the functions of the present application under the control of the server. As shown in Fig. 14, the apparatus may include: an acquisition module 140, a determination module 141, and a generation module 142;

the obtaining module 140 is configured to obtain a file creation request, where the file creation request includes: the size of the target file, the name of the target file;

A determining module 141, configured to determine, according to the size of the target file, a preset erasure ratio, and a preset size of the data block, the number of objects into which the target file is divided, and the number of data blocks included in each object;

the generating module 142 is configured to create the data blocks of each object on the selected disks according to the preset erasure ratio, and generate file creation information, where the file creation information includes: a file name, a file size, a file storage path, a file identifier, a file creation time, the number of objects contained in the file, and information of the data blocks in each object contained in the file; the data blocks in each object include valid data blocks and redundant data blocks, and the data blocks under the same object are distributed on different selected disks of the file system;

The generating module 142 is configured to generate first type metadata and second type metadata of the target file according to the file creation information, store the first type metadata into a system disk of the file system, and store the second type metadata into a data disk of the file system; the first type of metadata includes: basic information of the target file, and association relation between first type metadata and second type metadata, wherein the second type metadata comprises: the data block information of the target file, the access frequency of the first type metadata is greater than that of the second type metadata.

Optionally, the generating module 142 is specifically configured to generate basic information of the target file according to a file name, a file size, a file creation time, a file storage path, and the number of objects included in the file, and generate an association relationship between the first type metadata and the second type metadata according to a file identifier and information of each data block in each object included in the file, where the association relationship is used to characterize a mapping relationship between the file identifier of the target file and each data block in each object included in the file;

Obtaining first type metadata of the target file according to the basic information of the target file and the association relation between the first type metadata and the second type metadata;

and generating data block information of the target file according to the information of each data block in each object contained in the file, and taking the data block information as the second type metadata.

Optionally, the apparatus further comprises: a write module;

The obtaining module 140 is further configured to obtain a file write request for the target file, where the file write request includes: name of the target file, file data of the target file;

the determining module 141 is further configured to determine whether the target file exists from the created files according to the name of the target file;

the determining module 141 is further configured to determine, if the target file exists, redundant data included in each object of the target file according to file data of the target file, where the redundant data is used for recovering the file data of the target file;

And the writing module is used for writing the redundant data contained in each object of the target file into the redundant data blocks of each object, respectively writing the file data of the target file into the effective data blocks of each object, and updating the first type metadata and the second type metadata corresponding to the target file.

Optionally, the apparatus further comprises: a reading module;

optionally, the obtaining module 140 is further configured to obtain a file read request for the target file, where the file read request includes: the name of the target file, the size of the target file, the reading offset information and the reading length information;

The determining module 141 is further configured to query the first type metadata of each file in the system disk according to the name of the target file and the read offset information, and determine information of each data block corresponding to the target file;

And the reading module is used for reading the target file from the corresponding magnetic disk according to the information of each corresponding data block of the target file, the reading offset information and the reading length information.

Optionally, the determining module 141 is specifically configured to determine a file identifier of the target file according to the name of the target file;

Inquiring the second type metadata to obtain the information of the disk to which each data block of the target file belongs;

Optionally, if the reading of the target disk fails, and the data stored in the target data block on the target disk is valid data, the reading module executes erasure correction calculation to recover the valid data in the target data block on the target disk, where the target disk is any disk corresponding to each target data block.

Optionally, the reading module calculates valid data in the target data block on the target disk according to valid data read from the target data block on the disk other than the target disk and redundant data.

Optionally, the determining module 141 is further configured to determine the number of disks to be selected according to a preset erasure ratio;

and selecting that number of disks from the plurality of disks in the file system according to the disk weights of the disks in the file system, wherein the disk weight of each disk is determined according to the capacity of the disk and the total capacity of the system disks.
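Weighted disk selection of this kind can be sketched as weighted random sampling without replacement. The sketch is an illustration under assumptions: the weight formula (a disk's capacity divided by the total capacity) is one plausible reading of the text, all capacities are assumed positive, and `select_disks` is a hypothetical name.

```python
import random


def select_disks(capacities, n_data, n_redundant, rng=random):
    """Pick n_data + n_redundant distinct disks, favouring high weights.

    capacities: dict mapping disk ID to capacity (assumed positive).
    rng: the random module or a random.Random instance.
    """
    count = n_data + n_redundant
    if count > len(capacities):
        raise ValueError("not enough disks for the erasure ratio")
    total = sum(capacities.values())
    candidates = {disk: cap / total for disk, cap in capacities.items()}
    chosen = []
    for _ in range(count):
        disks = list(candidates)
        weights = [candidates[d] for d in disks]
        pick = rng.choices(disks, weights=weights, k=1)[0]
        chosen.append(pick)
        del candidates[pick]  # each BLK of an OBJ goes to a different disk
    return chosen
```

Removing each chosen disk from the candidate set keeps the blocks of one object on different disks, while the weights make larger disks more likely to be chosen, which helps balance capacity over many file creations.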

The foregoing apparatus is used for executing the method provided in the foregoing embodiment, and its implementation principle and technical effects are similar, and are not described herein again.

The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASIC), one or more Digital Signal Processors (DSP), or one or more Field Programmable Gate Arrays (FPGA), etc. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor that can invoke the program code. For another example, the modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).

The modules may be connected or communicate with each other via wired or wireless connections. The wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof. The wireless connection may include a connection through a LAN, WAN, Bluetooth, ZigBee, or NFC, or any combination thereof. Two or more modules may be combined into a single module, and any one module may be divided into two or more units. It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the above-described system and apparatus may refer to the corresponding procedures in the method embodiments, and are not repeated in the present disclosure.

Fig. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the device may be a computing device with a data processing function.

The apparatus may include: a processor 801, and a memory 802.

The memory 802 is used for storing a program, and the processor 801 calls the program stored in the memory 802 to execute the above-described method embodiment. The specific implementation manner and the technical effect are similar, and are not repeated here.

The memory 802 stores program code that, when executed by the processor 801, causes the processor 801 to perform the steps of the methods according to the various exemplary embodiments of the application described in the method sections above.

The processor 801 may be a general-purpose processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the application. The general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.

The memory 802, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory may include at least one type of storage medium, for example, flash memory, a hard disk, a multimedia card, card memory, Random Access Memory (RAM), Static Random Access Memory (SRAM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), magnetic memory, a magnetic disk, an optical disk, and the like. The memory may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 802 of embodiments of the present application may also be circuitry or any other device capable of performing storage functions for storing program instructions and/or data.

Optionally, the present application also provides a program product, such as a computer readable storage medium, comprising a program for performing the above-described method embodiments when being executed by a processor.

In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.

The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods according to the embodiments of the application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or the like.

Claims

1. A data processing method, applied to a file system based on erasure storage, the method comprising:

According to the file creation information, respectively generating first type metadata and second type metadata of the target file, storing the first type metadata into a system disk of the file system, and storing the second type metadata into a data disk of the file system; the first type of metadata includes: basic information of the target file, and association relation between the first type metadata and second type metadata, wherein the second type metadata comprises: the data block information of the target file, and the access frequency of the first type metadata is greater than that of the second type metadata;

The generating the first type metadata and the second type metadata of the target file according to the file creation information respectively includes:

2. The method according to claim 1, wherein after generating the first type metadata and the second type metadata of the target file according to the file creation information, and storing the first type metadata in a system disk of the file system and the second type metadata in a data disk of the file system, the method includes:

3. The method according to claim 2, wherein after writing the redundant data included in each object of the target file into the redundant data block of each object, and writing the file data of the target file into the valid data block of each object, respectively, the method comprises:

4. The method of claim 3, wherein the determining the information of each data block corresponding to the target file by querying the first metadata of each file in the system disk according to the name of the target file and the read offset information comprises:

5. The method according to claim 3, wherein reading the target file from the corresponding disk according to the read offset information and the read length information according to the information of each corresponding data block of the target file includes:

6. The method according to claim 5, wherein the reading file data stored in each target data block from the disk to which each target data block belongs includes:

7. The method of claim 6, wherein the performing erasure calculations restores valid data in the target data blocks on the target disk, comprising:

8. The method of claim 1, wherein the method comprises, prior to creating a data block for each object on each selected disk, respectively, according to the preset erasure ratio:

9. The method of claim 1, wherein the first type of metadata is stored in the form of key-value pairs.

10. A data processing apparatus for use in an erasure-storage-based file system, the apparatus comprising: the device comprises an acquisition module, a determination module and a generation module;

The generating module is used for respectively generating first type metadata and second type metadata of the target file according to the file creation information, storing the first type metadata into a system disk of the file system and storing the second type metadata into a data disk of the file system; the first type of metadata includes: basic information of the target file, and association relation between the first type metadata and second type metadata, wherein the second type metadata comprises: the data block information of the target file, and the access frequency of the first type metadata is greater than that of the second type metadata;

the generating module is specifically configured to generate basic information of the target file according to the file name, the file size, the file creation time, the file storage path and the number of objects contained in the file, and generate an association relationship between the first type metadata and the second type metadata according to the file identifier and information of each data block in each object contained in the file, where the association relationship is used to characterize a mapping relationship between the file identifier of the target file and each data block in each object contained in the file;

11. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing program instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is running, the processor executing the program instructions to perform the steps of the data processing method according to any one of claims 1 to 9 when executed.

12. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the data processing method according to any of claims 1 to 9.