Detailed Description
In practical application, the Ceph distributed data storage system mainly includes four parts: a client, a metadata server, an object storage cluster, and a monitor (hereinafter referred to as Ceph Mon). The client represents the storage node where the current data user is located; the metadata server is used for caching and synchronizing metadata describing data attributes (such as storage positions of data, historical data, record files, and the like); the object storage cluster comprises a plurality of storage nodes for data storage; and the monitor is used for monitoring the whole Ceph distributed data storage system.
In the process of storing data in the Ceph distributed data storage system, an inode number (INO) is allocated to a file to be stored (File), and the INO serves as the unique identifier of the File. When the data size of the File is large, the File needs to be divided into a series of Objects of uniform size for storage; here, the size of the last Object may differ from that of the preceding Objects.
In a large-scale Ceph storage cluster, the number of Objects is large and the amount of data contained in each Object is small. If Objects were addressed by traversal during read and write operations, the data storage rate would be seriously reduced. Meanwhile, if an Object were mapped to an object storage device (OSD) through some fixed hashing algorithm, the Object could not be automatically migrated to another idle OSD when that OSD is damaged, and data loss would result. Therefore, the Objects are typically allocated into several placement groups (PGs).
The Object identification code (OID) of any Object is determined by the INO and the Object number (ONO). For any Object, a static hash function is applied to the OID to obtain a hash value of the Object, and the hash value modulo the number of PGs gives the PG identification code (PGID) of the PG corresponding to the Object, thereby realizing the mapping from the Object to the PG.
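For illustration only, the following Python sketch reproduces this conventional mapping. The OID layout and the SHA-1-based static hash are assumptions made for the example; Ceph internally uses its own hash function, and the sketch is not the system's actual implementation.

```python
# A minimal sketch of the conventional Object-to-PG mapping described above.
# The OID layout and the use of hashlib.sha1 are illustrative assumptions.
import hashlib

def make_oid(ino: int, ono: int) -> str:
    """Combine the file's inode number (INO) and the Object number (ONO)
    into an Object identification code (OID)."""
    return f"{ino:x}.{ono:08x}"

def oid_to_pgid(oid: str, pg_num: int) -> int:
    """Hash the OID with a static hash function, then take the result
    modulo the number of PGs to obtain the PG identification code (PGID)."""
    digest = hashlib.sha1(oid.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "little")
    return hash_value % pg_num

if __name__ == "__main__":
    oid = make_oid(ino=0x10000001, ono=3)
    print(oid, "->", oid_to_pgid(oid, pg_num=128))
```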
A PG is a logical container of Objects; it is a virtual existence in the Ceph distributed data storage system and is used to organize and map the storage of Objects. One PG is responsible for organizing several Objects, but one Object can be mapped into only one PG, i.e., there is a "one-to-many" mapping between PG and Object. A reasonable setting of the number of PGs ensures the uniformity of data distribution.
The Ceph distributed data storage system determines the OSDs corresponding to any PG through the pseudo-random data distribution algorithm (CRUSH), and then stores each Object in the PG into the corresponding OSDs, thereby realizing the mapping from the PG to the OSD. One OSD carries a large number of PGs, and one PG is distributed over multiple OSDs, i.e., there is a "many-to-many" mapping between PG and OSD. Through the CRUSH algorithm, data loss upon a single point of failure of a storage node can be avoided, the storage node no longer needs to rely on metadata for storage, and the data storage efficiency is effectively improved.
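The deterministic character of this PG-to-OSD placement can be illustrated with a simplified rendezvous-hashing stand-in for CRUSH. Real CRUSH walks a weighted cluster hierarchy; the sketch below only demonstrates the key property that the replica set is computed from the PGID and the list of OSDs, with no per-Object metadata lookup. All names and the hash choice are assumptions.

```python
# A simplified, hash-based stand-in for CRUSH-style deterministic placement.
# This is not the CRUSH algorithm itself; it only mirrors its metadata-free,
# deterministic selection of a replica set from the PGID and the OSD list.
import hashlib

def select_osds(pgid: int, osd_ids: list[int], replicas: int = 3) -> list[int]:
    """Deterministically rank OSDs for a PG and return the top `replicas`."""
    def score(osd_id: int) -> int:
        key = f"{pgid}:{osd_id}".encode()
        return int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
    ranked = sorted(osd_ids, key=score, reverse=True)
    return ranked[:replicas]

if __name__ == "__main__":
    print(select_osds(pgid=42, osd_ids=[0, 1, 2, 3, 4, 5]))
```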
However, since the Ceph distributed data storage system needs to perform the hash operation and the modulo operation between the hash value and the number of PGs during data storage, the data storage efficiency is low and the requirement of high-speed reading and writing cannot be met.
In order to address this, embodiments of the present application provide a data storage method and device. After the data to be stored is divided into N Objects and the N Objects are allocated to M PGs, at least three OSDs corresponding to any PG are found directly through a predetermined storage mapping table, and each Object contained in the PG is then stored into the corresponding OSD through the CRUSH algorithm. In this way, data storage efficiency in the Ceph distributed data storage system is improved, and high-speed reading and writing of data in the Ceph distributed data storage system are effectively achieved.
The technical solutions of the present application will be described clearly and completely below with reference to the specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Example 1
Fig. 1 is a schematic flowchart of a data storage method according to an embodiment of the present application. The data storage method can be applied to a Ceph distributed data storage system and can comprise the following steps.
Step 11: dividing data to be stored into N objects, wherein N is a positive integer.
In step 11, to implement Object-based storage, the data to be stored is divided into N Objects. Here, each Object has an Object identification code different from those of the other Objects. The data amounts of the N Objects may be the same or different, which is not specifically limited here.
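As a minimal sketch of step 11, the following Python code splits a byte buffer into fixed-size Objects; the 4 MiB object size is an illustrative assumption, and the last Object may be shorter than the others.

```python
# A minimal sketch of step 11: splitting the data to be stored into N Objects.
def split_into_objects(data: bytes, object_size: int = 4 * 1024 * 1024) -> list[bytes]:
    """Divide the data to be stored into fixed-size Objects (the last may be shorter)."""
    return [data[i:i + object_size] for i in range(0, len(data), object_size)]

if __name__ == "__main__":
    payload = b"x" * (10 * 1024 * 1024 + 123)   # 10 MiB + 123 bytes of sample data
    objects = split_into_objects(payload)
    print(len(objects), [len(o) for o in objects])
```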
Step 12: allocating the N Objects into M placement groups (PGs) according to Object size, wherein M is a positive integer smaller than N.
In step 12, the N Objects obtained by the division in step 11 are allocated into M PGs according to Object size, so as to implement grouped storage of the Objects. It should be noted that any PG has a group identification code different from those of the other PGs. To achieve uniform distribution of data, the N Objects are evenly distributed into the M PGs by Object size. For example, when 500 Objects are allocated to 100 PGs, each PG contains 5 Objects.
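A sketch of one possible way to carry out step 12 is given below, assuming a greedy "least-loaded PG first" rule so that the PGs end up with roughly equal total size; the rule itself is an assumption made for illustration and is not mandated by the embodiment.

```python
# A sketch of step 12: distributing N Objects evenly into M PGs by Object size.
import heapq

def allocate_objects_to_pgs(objects: list[bytes], m: int) -> dict[int, list[int]]:
    """Return a mapping PGID -> list of Object indices, balanced by total bytes."""
    heap = [(0, pgid) for pgid in range(m)]        # (bytes already assigned, PGID)
    heapq.heapify(heap)
    placement: dict[int, list[int]] = {pgid: [] for pgid in range(m)}
    # Place the largest Objects first so the groups end up close in total size.
    for idx in sorted(range(len(objects)), key=lambda i: len(objects[i]), reverse=True):
        load, pgid = heapq.heappop(heap)
        placement[pgid].append(idx)
        heapq.heappush(heap, (load + len(objects[idx]), pgid))
    return placement

if __name__ == "__main__":
    objs = [b"a" * n for n in (500, 400, 300, 200, 100, 100)]
    print(allocate_objects_to_pgs(objs, m=3))
```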
Step 13: determining at least three object storage devices (OSDs) corresponding to any PG through a storage mapping table, wherein the storage mapping table comprises the mapping relationship between PGs and OSDs.
In step 13, Ceph Mon determines at least three OSDs corresponding to any PG according to a storage mapping table pre-stored in the Ceph distributed data storage system. In the Ceph distributed data storage system, at least three copies of each Object are to be saved, i.e., one Object is to be stored in at least three OSDs. Since one Object is mapped to only one PG, any PG needs to be mapped to at least three OSDs to ensure that each Object can be stored in at least three OSDs.
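A minimal sketch of the lookup in step 13 is shown below, assuming the storage mapping table is held as an in-memory dictionary maintained by Ceph Mon; the table contents are illustrative.

```python
# A sketch of step 13: a direct lookup in the storage mapping table,
# avoiding any hash or modulo computation at storage time.
STORAGE_MAPPING_TABLE: dict[int, list[int]] = {
    0: [1, 4, 7],
    1: [2, 5, 8],
    2: [3, 6, 9],
}

def lookup_osds(pgid: int) -> list[int]:
    """Find the at-least-three OSDs mapped to a PG with a table lookup."""
    osds = STORAGE_MAPPING_TABLE[pgid]
    if len(osds) < 3:
        raise ValueError(f"PG {pgid} must map to at least three OSDs")
    return osds

if __name__ == "__main__":
    print(lookup_osds(1))
```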
Step 14: for any PG, storing each Object contained in the PG into the corresponding OSDs of the PG through the CRUSH algorithm.
In step 14, for any PG, since the at least three OSDs corresponding to the PG have been determined in step 13, each Object contained in the PG can be stored into those OSDs by the CRUSH algorithm, thereby realizing distributed storage of the data to be stored.
In alternative embodiments of the present application, the storage mapping table may be created in the following manner. Specifically, the process of creating the storage mapping table includes:
First, the hash value of each OSD is read from the memory. Specifically, Ceph Mon reads the hash value of each OSD stored in the memory.
Second, a mapping relationship between any PG and at least three OSDs is established. Specifically, Ceph Mon establishes a mapping relationship between any PG and at least three OSDs. It should be noted that the OSDs that establish the mapping relationship with the PG are idle OSDs, that is, OSDs capable of implementing the data storage function.
Finally, the mapping relationship between the PGs and the OSDs is stored in the storage mapping table.
Because the mapping relationship between each PG and at least three OSDs is established by reading the hash value of each OSD already stored in the memory, the low storage efficiency caused by computing hash values during storage is avoided.
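One possible way to build such a table is sketched below. It assumes the cached OSD hash values are available as a dictionary of idle OSDs and ranks them per PG by a value derived from the cached hash and the PGID; the ranking rule is an illustrative assumption.

```python
# A sketch of creating the storage mapping table from hash values cached in memory:
# each PG is bound to at least three idle OSDs without recomputing any hashes.
def build_storage_mapping_table(pg_count: int,
                                osd_hashes: dict[int, int],
                                replicas: int = 3) -> dict[int, list[int]]:
    """Map every PGID to `replicas` idle OSDs, chosen by their cached hash values."""
    if len(osd_hashes) < replicas:
        raise ValueError("not enough idle OSDs to satisfy the replica count")
    table: dict[int, list[int]] = {}
    for pgid in range(pg_count):
        # Rank OSDs by a value derived from the cached hash and the PGID,
        # so different PGs land on different OSD sets.
        ranked = sorted(osd_hashes, key=lambda osd: (osd_hashes[osd] ^ pgid, osd))
        table[pgid] = ranked[:replicas]
    return table

if __name__ == "__main__":
    cached_hashes = {1: 0x1A2B, 2: 0x3C4D, 3: 0x5E6F, 4: 0x7081, 5: 0x92A3}
    print(build_storage_mapping_table(pg_count=4, osd_hashes=cached_hashes))
```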
In an alternative embodiment of the present application, the hash value of each OSD may be determined in the following manner. Specifically, the process of determining the hash value of each OSD includes:
First, the device information of a preset number of storage nodes is called from a system folder, wherein any storage node comprises at least three OSDs. The device information of a storage node includes, but is not limited to, the IP address and machine name corresponding to the storage node.
Second, for any storage node, the hash value of each OSD in that storage node is calculated according to the device information of the storage node.
Finally, the hash values of the OSDs in the preset number of storage nodes are stored in the memory.
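The boot-time computation can be sketched as follows, assuming the device information of a node is available as a small record with an IP address, a machine name, and the identifiers of its OSDs; the field names and the SHA-1 hash are assumptions.

```python
# A sketch of deriving and caching a hash value for each OSD of a storage node
# from the node's device information (IP address and machine name).
import hashlib

OSD_HASH_CACHE: dict[int, int] = {}   # stands in for "the memory"

def hash_osds_of_node(node_info: dict) -> None:
    """Compute and cache a hash value for every OSD of one storage node."""
    for osd_id in node_info["osds"]:
        key = f"{node_info['ip']}:{node_info['hostname']}:{osd_id}".encode()
        OSD_HASH_CACHE[osd_id] = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")

if __name__ == "__main__":
    node = {"ip": "192.168.1.11", "hostname": "node1", "osds": [1, 2, 3]}
    hash_osds_of_node(node)
    print(OSD_HASH_CACHE)
```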
Fig. 2 is a schematic diagram of the system boot process of the Ceph distributed data storage system according to an embodiment of the present application. As shown in Fig. 2, when the Ceph distributed data storage system is started, Ceph Mon calls the device information of 3 storage nodes stored in the system folder, calculates, for each storage node, the hash value of each OSD in that storage node according to its device information, and stores the hash values of the OSDs in the memory.
In an alternative embodiment of the present application, the device information of the storage nodes may be determined in the following manner:
First, the device information of a preset number of storage nodes is set in a node scanning script;
Then, the device information of the preset number of storage nodes is stored in the system folder by parsing the node scanning script.
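Purely as an illustration, the sketch below parses a hypothetical node scanning script with one "hostname ip osd,osd,osd" line per storage node and writes the parsed device information to a folder standing in for the system folder; the script format and file names are assumptions, not part of the embodiment.

```python
# A sketch of parsing a hypothetical node scanning script and persisting the
# parsed device information; the format and paths are illustrative only.
import json
from pathlib import Path

NODE_SCAN_SCRIPT = """\
node1 192.168.1.11 1,2,3
node2 192.168.1.12 4,5,6
node3 192.168.1.13 7,8,9
"""

def parse_node_scan_script(text: str) -> list[dict]:
    nodes = []
    for line in text.splitlines():
        hostname, ip, osds = line.split()
        nodes.append({"hostname": hostname, "ip": ip,
                      "osds": [int(o) for o in osds.split(",")]})
    return nodes

if __name__ == "__main__":
    nodes = parse_node_scan_script(NODE_SCAN_SCRIPT)
    Path("system_folder").mkdir(exist_ok=True)       # stands in for the system folder
    Path("system_folder/storage_nodes.json").write_text(json.dumps(nodes, indent=2))
    print(nodes)
```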
Fig. 3 is a schematic diagram of a process of setting storage nodes in the Ceph distributed data storage system according to an embodiment of the present application. As shown in Fig. 3, Ceph Mon sets the device information of 3 storage nodes in the node scanning script of the Ceph distributed data storage system, and then stores the device information of the 3 storage nodes in the system folder by parsing the node scanning script.
Fig. 4 is a schematic process diagram of data storage in the Ceph distributed data storage system according to an embodiment of the present application. As shown in Fig. 4, after determining the three OSDs (OSD1, OSD2, and OSD3) corresponding to a PG, Ceph Mon stores each Object included in the PG into those OSDs by the CRUSH algorithm.
In an optional embodiment of the present application, the data storage method according to the embodiment of the present application further includes: when an OSD storing an Object fails, calculating an updated hash value of each OSD in the storage node where the failed OSD is located; determining an idle OSD in the storage node according to the updated hash values of the OSDs in the storage node; and storing the Object stored in the failed OSD into the idle OSD.
Fig. 5 is a schematic diagram illustrating an OSD fault repair process of the Ceph distributed data storage system according to an embodiment of the present application. As shown in fig. 5, when the OSD2 storing the Object in the Object storage cluster of the Ceph distributed data storage system fails, the storage node where the OSD2 is located starts fault repair, the hash values of the OSDs in the storage node are recalculated (i.e., the updated hash values of the OSDs are calculated), and the idle OSDx in the storage node is determined according to the updated hash values of the OSDs in the storage node, so that the Object stored in the failed OSD2 is stored in the idle OSDx.
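The repair flow can be sketched as follows, assuming the updated hash values of the surviving OSDs are available and that the idle OSD is the one with the smallest updated hash; the selection rule and data structures are illustrative assumptions.

```python
# A sketch of the OSD fault-repair flow: recompute hash values within the node,
# pick an idle OSD, and move the failed OSD's Objects onto it.
def repair_failed_osd(failed_osd: int,
                      node_osds: list[int],
                      updated_hashes: dict[int, int],
                      osd_objects: dict[int, list[bytes]]) -> int:
    """Return the idle OSD that takes over the Objects of `failed_osd`."""
    candidates = [o for o in node_osds if o != failed_osd]
    idle_osd = min(candidates, key=lambda o: updated_hashes[o])   # assumed rule
    osd_objects.setdefault(idle_osd, []).extend(osd_objects.pop(failed_osd, []))
    return idle_osd

if __name__ == "__main__":
    store = {2: [b"obj-a", b"obj-b"], 3: [b"obj-c"], 4: []}
    target = repair_failed_osd(failed_osd=2, node_osds=[2, 3, 4],
                               updated_hashes={3: 71, 4: 12}, osd_objects=store)
    print(target, store)
```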
In an optional embodiment of the present application, the data storage method according to the embodiment of the present application further includes: when a storage node storing a placement group fails, adding an idle storage node in the node scanning script; and storing the placement group stored in the failed storage node into the idle storage node through the CRUSH algorithm.
Fig. 6 is a schematic diagram of a process of repairing a storage node failure in the Ceph distributed data storage system according to an embodiment of the present application. As shown in Fig. 6, when any storage Node3 in the object storage cluster of the Ceph distributed data storage system fails, Ceph Mon updates the node scanning script by adding the device information of the idle storage Node4, and then stores the PGs stored in the failed storage Node3 into the idle storage Node4 by the CRUSH algorithm.
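A simplified sketch of this node-level repair is given below. Reassignment is modeled as a direct substitution of the failed node's OSDs in the storage mapping table, standing in for the CRUSH-driven migration of the embodiment; all identifiers are illustrative.

```python
# A sketch of storage-node fault repair: replace the OSDs of the failed node
# with OSDs of the newly added idle node throughout the storage mapping table.
def repair_failed_node(failed_node_osds: set[int],
                       replacement_osds: list[int],
                       mapping_table: dict[int, list[int]]) -> dict[int, list[int]]:
    """Replace every OSD of the failed node with an OSD of the idle node."""
    substitution = dict(zip(sorted(failed_node_osds), replacement_osds))
    return {pgid: [substitution.get(osd, osd) for osd in osds]
            for pgid, osds in mapping_table.items()}

if __name__ == "__main__":
    table = {0: [1, 4, 7], 1: [2, 5, 8], 2: [3, 6, 9]}
    print(repair_failed_node({7, 8, 9}, replacement_osds=[10, 11, 12],
                             mapping_table=table))
```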
In the Ceph distributed data storage system, after the data to be stored is divided into N Objects and the N Objects are allocated into M PGs according to Object size, the at least three OSDs corresponding to any PG can be found directly through the predetermined storage mapping table, and each Object contained in the PG can then be stored into the corresponding OSDs through the CRUSH algorithm. Thus, the data storage efficiency in the Ceph distributed data storage system is improved, and high-speed reading and writing of data in the Ceph distributed data storage system are effectively realized.
Example 2
Fig. 7 is a schematic structural diagram of a data storage device according to an embodiment of the present application. As shown in Fig. 7, the data storage device 70 according to the embodiment of the present application includes a dividing unit 701, an allocating unit 702, and a storage unit 703, wherein: the dividing unit 701 is configured to divide data to be stored into N Objects, where N is a positive integer; the allocating unit 702 is configured to allocate the N Objects into M placement groups (PGs) according to Object size, where M is a positive integer smaller than N; and the storage unit 703 is configured to, for any one of the M placement groups, determine at least three object storage devices (OSDs) corresponding to the PG based on a storage mapping table, where the storage mapping table includes the mapping relationship between PGs and OSDs, and store each Object included in the PG into the corresponding OSDs based on the pseudo-random data distribution (CRUSH) algorithm.
In an alternative embodiment of the present application, the data storage device 70 further comprises a reading unit 704 and a mapping unit 705, wherein: the reading unit 704 is configured to read the hash value of each OSD from the memory; and the mapping unit 705 is configured to establish, for any one of the M placement groups, a mapping relationship between the PG and at least three OSDs, and store the established mapping relationship in the storage mapping table.
In an alternative embodiment of the present application, the data storage device 70 further comprises a calling unit 706 and a calculating unit 707, wherein: the calling unit 706 is configured to call device information of a preset number of storage nodes from a system folder, where any one storage node includes at least three OSDs; the calculation unit 707 is configured to, for any one of a preset number of storage nodes, calculate a hash value of each OSD in the storage node according to the device information of the storage node, and store the hash value of each OSD in the memory.
In an alternative embodiment of the present application, the data storage device 70 further comprises a setting unit 708, wherein: the setting unit 708 is configured to set the device information of a preset number of storage nodes in the node scanning script, and store the device information of the preset number of storage nodes in the system folder by parsing the node scanning script.
In an optional embodiment of the present application, the calculating unit 707 is further configured to, when an OSD storing an object fails, calculate an updated hash value of each OSD in a storage node where the OSD is located; the storage unit 703 is further configured to determine a free OSD in the storage node according to the updated hash value of each OSD in the storage node, and store an object stored in the failed OSD into the free OSD.
In an optional embodiment of the present application, the setting unit 708 is further configured to, when any storage node storing a placement group fails, add an idle storage node in the node scanning script; and the storage unit 703 is further configured to store the PG stored in the failed storage node into the idle storage node through the CRUSH algorithm.
According to the data storage device of the embodiment of the present application, after the data to be stored is divided into N Objects and the N Objects are allocated into M PGs according to Object size, the at least three OSDs corresponding to any PG can be found directly through the predetermined storage mapping table, and each Object contained in the PG can then be stored into the corresponding OSDs through the CRUSH algorithm. Thus, the data storage efficiency in the Ceph distributed data storage system is improved, and high-speed reading and writing of data in the Ceph distributed data storage system are effectively achieved.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.