Detailed Description
In practical application, the Ceph distributed data storage system mainly includes four parts: a client, a metadata server, an object storage cluster, and a monitor (hereinafter referred to as Ceph Mon). The client represents the storage node where the current data user is located; the metadata server is used for caching and synchronizing metadata describing data attributes (such as storage positions of data, historical data, record files, and the like); the object storage cluster comprises a plurality of storage nodes for data storage; and the monitor is used for monitoring the whole Ceph distributed data storage system.
In the process of storing data in the Ceph distributed data storage system, an inode number (INO) is allocated to a file to be stored (File), and the INO serves as the unique identifier of the File. When the data size of the File is large, the File needs to be divided into a series of Objects of uniform size for storage; here, the size of the last Object may differ from that of the preceding Objects.
In a large-scale Ceph storage cluster, the number of Objects is large and the amount of data contained in each Object is small. If Objects were addressed by traversal during read and write operations, the data storage rate would be seriously reduced. Meanwhile, if an Object were mapped to an object storage device (OSD) through some fixed hashing algorithm, the Object could not be automatically migrated to another idle OSD when that OSD is damaged, and data loss would result. Therefore, the Objects are typically allocated into several placement groups (PGs).
The Object identification code (OID) of any Object is determined by the INO and the Object number (ONO). For any Object, a static hash function is applied to the OID to obtain a hash value of the Object, and the hash value modulo the number of PGs gives the PG identification code (PGID) of the PG corresponding to the Object, thereby realizing the mapping from the Object to the PG.
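For illustration only, the following Python sketch reproduces this conventional mapping. The OID layout and the SHA-1-based static hash are assumptions made for the example; Ceph internally uses its own hash function, and the sketch is not the system's actual implementation.

```python
# A minimal sketch of the conventional Object-to-PG mapping described above.
# The OID layout and the use of hashlib.sha1 are illustrative assumptions.
import hashlib

def make_oid(ino: int, ono: int) -> str:
    """Combine the file's inode number (INO) and the Object number (ONO)
    into an Object identification code (OID)."""
    return f"{ino:x}.{ono:08x}"

def oid_to_pgid(oid: str, pg_num: int) -> int:
    """Hash the OID with a static hash function, then take the result
    modulo the number of PGs to obtain the PG identification code (PGID)."""
    digest = hashlib.sha1(oid.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "little")
    return hash_value % pg_num

if __name__ == "__main__":
    oid = make_oid(ino=0x10000001, ono=3)
    print(oid, "->", oid_to_pgid(oid, pg_num=128))
```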
A PG is a logical container of Objects; it is a virtual existence in the Ceph distributed data storage system and is used to organize and map the storage of Objects. One PG is responsible for organizing several Objects, but one Object can be mapped into only one PG, i.e., there is a "one-to-many" mapping between PG and Object. A reasonable setting of the number of PGs ensures the uniformity of data distribution.
The Ceph distributed data storage system determines the OSDs corresponding to any PG through the pseudo-random data distribution algorithm (CRUSH), and then stores each Object in the PG into the corresponding OSDs, thereby realizing the mapping from the PG to the OSD. One OSD carries a large number of PGs, and one PG is distributed over multiple OSDs, i.e., there is a "many-to-many" mapping between PG and OSD. Through the CRUSH algorithm, data loss upon a single point of failure of a storage node can be avoided, the storage node no longer needs to rely on metadata for storage, and the data storage efficiency is effectively improved.
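The deterministic character of this PG-to-OSD placement can be illustrated with a simplified rendezvous-hashing stand-in for CRUSH. Real CRUSH walks a weighted cluster hierarchy; the sketch below only demonstrates the key property that the replica set is computed from the PGID and the list of OSDs, with no per-Object metadata lookup. All names and the hash choice are assumptions.

```python
# A simplified, hash-based stand-in for CRUSH-style deterministic placement.
# This is not the CRUSH algorithm itself; it only mirrors its metadata-free,
# deterministic selection of a replica set from the PGID and the OSD list.
import hashlib

def select_osds(pgid: int, osd_ids: list[int], replicas: int = 3) -> list[int]:
    """Deterministically rank OSDs for a PG and return the top `replicas`."""
    def score(osd_id: int) -> int:
        key = f"{pgid}:{osd_id}".encode()
        return int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
    ranked = sorted(osd_ids, key=score, reverse=True)
    return ranked[:replicas]

if __name__ == "__main__":
    print(select_osds(pgid=42, osd_ids=[0, 1, 2, 3, 4, 5]))
```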
However, since the Ceph distributed data storage system needs to perform the hash operation and the modulo operation between the hash value and the number of PGs during data storage, the data storage efficiency is low and the requirement of high-speed reading and writing cannot be met.
In order to address this, embodiments of the present application provide a data storage method and device. After the data to be stored is divided into N Objects and the N Objects are allocated to M PGs, at least three OSDs corresponding to any PG are found directly through a predetermined storage mapping table, and each Object contained in the PG is then stored into the corresponding OSD through the CRUSH algorithm. In this way, data storage efficiency in the Ceph distributed data storage system is improved, and high-speed reading and writing of data in the Ceph distributed data storage system are effectively achieved.
The technical solutions of the present application will be described clearly and completely below with reference to the specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Example 1
Fig. 1 is a schematic flowchart of a data storage method according to an embodiment of the present application. The data storage method can be applied to a Ceph distributed data storage system and can comprise the following steps.
Step 11: dividing data to be stored into N objects, wherein N is a positive integer.
In step 11, to implement Object-based storage, the data to be stored is divided into N Objects. Here, each Object has an Object identification code different from those of the other Objects. The data amounts of the N Objects may be the same or different, which is not specifically limited here.
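As a minimal sketch of step 11, the following Python code splits a byte buffer into fixed-size Objects; the 4 MiB object size is an illustrative assumption, and the last Object may be shorter than the others.

```python
# A minimal sketch of step 11: splitting the data to be stored into N Objects.
def split_into_objects(data: bytes, object_size: int = 4 * 1024 * 1024) -> list[bytes]:
    """Divide the data to be stored into fixed-size Objects (the last may be shorter)."""
    return [data[i:i + object_size] for i in range(0, len(data), object_size)]

if __name__ == "__main__":
    payload = b"x" * (10 * 1024 * 1024 + 123)   # 10 MiB + 123 bytes of sample data
    objects = split_into_objects(payload)
    print(len(objects), [len(o) for o in objects])
```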
Step 12: allocating the N Objects into M placement groups (PGs) according to Object size, wherein M is a positive integer smaller than N.
In step 12, the N Objects obtained by the division in step 11 are allocated into M PGs according to Object size, so as to implement grouped storage of the Objects. It should be noted that any PG has a group identification code different from those of the other PGs. To achieve uniform distribution of data, the N Objects are evenly distributed into the M PGs by Object size. For example, when 500 Objects are allocated to 100 PGs, each PG contains 5 Objects.
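A sketch of one possible way to carry out step 12 is given below, assuming a greedy "least-loaded PG first" rule so that the PGs end up with roughly equal total size; the rule itself is an assumption made for illustration and is not mandated by the embodiment.

```python
# A sketch of step 12: distributing N Objects evenly into M PGs by Object size.
import heapq

def allocate_objects_to_pgs(objects: list[bytes], m: int) -> dict[int, list[int]]:
    """Return a mapping PGID -> list of Object indices, balanced by total bytes."""
    heap = [(0, pgid) for pgid in range(m)]        # (bytes already assigned, PGID)
    heapq.heapify(heap)
    placement: dict[int, list[int]] = {pgid: [] for pgid in range(m)}
    # Place the largest Objects first so the groups end up close in total size.
    for idx in sorted(range(len(objects)), key=lambda i: len(objects[i]), reverse=True):
        load, pgid = heapq.heappop(heap)
        placement[pgid].append(idx)
        heapq.heappush(heap, (load + len(objects[idx]), pgid))
    return placement

if __name__ == "__main__":
    objs = [b"a" * n for n in (500, 400, 300, 200, 100, 100)]
    print(allocate_objects_to_pgs(objs, m=3))
```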
Step 13: determining at least three object storage devices (OSDs) corresponding to any PG through a storage mapping table, wherein the storage mapping table comprises the mapping relationship between PGs and OSDs.
In step 13, Ceph Mon determines at least three OSDs corresponding to any PG according to a storage mapping table pre-stored in the Ceph distributed data storage system. In the Ceph distributed data storage system, at least three copies of each Object are to be saved, i.e., one Object is to be stored in at least three OSDs. Since one Object is mapped to only one PG, any PG needs to be mapped to at least three OSDs to ensure that each Object can be stored in at least three OSDs.
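A minimal sketch of the lookup in step 13 is shown below, assuming the storage mapping table is held as an in-memory dictionary maintained by Ceph Mon; the table contents are illustrative.

```python
# A sketch of step 13: a direct lookup in the storage mapping table,
# avoiding any hash or modulo computation at storage time.
STORAGE_MAPPING_TABLE: dict[int, list[int]] = {
    0: [1, 4, 7],
    1: [2, 5, 8],
    2: [3, 6, 9],
}

def lookup_osds(pgid: int) -> list[int]:
    """Find the at-least-three OSDs mapped to a PG with a table lookup."""
    osds = STORAGE_MAPPING_TABLE[pgid]
    if len(osds) < 3:
        raise ValueError(f"PG {pgid} must map to at least three OSDs")
    return osds

if __name__ == "__main__":
    print(lookup_osds(1))
```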
Step 14: for any PG, storing each Object contained in the PG into the corresponding OSDs of the PG through the CRUSH algorithm.
In step 14, for any PG, since the at least three OSDs corresponding to the PG have been determined in step 13, each Object contained in the PG can be stored into those OSDs by the CRUSH algorithm, thereby realizing distributed storage of the data to be stored.
In alternative embodiments of the present application, the storage mapping table may be created in the following manner. Specifically, the process of creating the storage mapping table includes:
First, the hash value of each OSD is read from the memory. Specifically, Ceph Mon reads the hash value of each OSD stored in the memory.
Second, a mapping relationship between any PG and at least three OSDs is established. Specifically, Ceph Mon establishes a mapping relationship between any PG and at least three OSDs. It should be noted that the OSDs that establish the mapping relationship with the PG are idle OSDs, that is, OSDs capable of implementing the data storage function.
Finally, the mapping relationship between the PGs and the OSDs is stored in the storage mapping table.
Because the mapping relationship between each PG and at least three OSDs is established by reading the hash value of each OSD already stored in the memory, the low storage efficiency caused by computing hash values during storage is avoided.
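One possible way to build such a table is sketched below. It assumes the cached OSD hash values are available as a dictionary of idle OSDs and ranks them per PG by a value derived from the cached hash and the PGID; the ranking rule is an illustrative assumption.

```python
# A sketch of creating the storage mapping table from hash values cached in memory:
# each PG is bound to at least three idle OSDs without recomputing any hashes.
def build_storage_mapping_table(pg_count: int,
                                osd_hashes: dict[int, int],
                                replicas: int = 3) -> dict[int, list[int]]:
    """Map every PGID to `replicas` idle OSDs, chosen by their cached hash values."""
    if len(osd_hashes) < replicas:
        raise ValueError("not enough idle OSDs to satisfy the replica count")
    table: dict[int, list[int]] = {}
    for pgid in range(pg_count):
        # Rank OSDs by a value derived from the cached hash and the PGID,
        # so different PGs land on different OSD sets.
        ranked = sorted(osd_hashes, key=lambda osd: (osd_hashes[osd] ^ pgid, osd))
        table[pgid] = ranked[:replicas]
    return table

if __name__ == "__main__":
    cached_hashes = {1: 0x1A2B, 2: 0x3C4D, 3: 0x5E6F, 4: 0x7081, 5: 0x92A3}
    print(build_storage_mapping_table(pg_count=4, osd_hashes=cached_hashes))
```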
In an alternative embodiment of the present application, the hash value of each OSD may be determined in the following manner. Specifically, the process of determining the hash value of each OSD includes:
First, the device information of a preset number of storage nodes is called from a system folder, wherein any storage node comprises at least three OSDs. The device information of a storage node includes, but is not limited to, the IP address and machine name corresponding to the storage node.
Second, for any storage node, the hash value of each OSD in that storage node is calculated according to the device information of the storage node.
Finally, the hash values of the OSDs in the preset number of storage nodes are stored in the memory.
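The boot-time computation can be sketched as follows, assuming the device information of a node is available as a small record with an IP address, a machine name, and the identifiers of its OSDs; the field names and the SHA-1 hash are assumptions.

```python
# A sketch of deriving and caching a hash value for each OSD of a storage node
# from the node's device information (IP address and machine name).
import hashlib

OSD_HASH_CACHE: dict[int, int] = {}   # stands in for "the memory"

def hash_osds_of_node(node_info: dict) -> None:
    """Compute and cache a hash value for every OSD of one storage node."""
    for osd_id in node_info["osds"]:
        key = f"{node_info['ip']}:{node_info['hostname']}:{osd_id}".encode()
        OSD_HASH_CACHE[osd_id] = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")

if __name__ == "__main__":
    node = {"ip": "192.168.1.11", "hostname": "node1", "osds": [1, 2, 3]}
    hash_osds_of_node(node)
    print(OSD_HASH_CACHE)
```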
Fig. 2 is a schematic diagram of the system boot process of the Ceph distributed data storage system according to an embodiment of the present application. As shown in Fig. 2, when the Ceph distributed data storage system is started, Ceph Mon calls the device information of 3 storage nodes stored in the system folder, calculates, for each storage node, the hash value of each OSD in that storage node according to its device information, and stores the hash values of the OSDs in the memory.
In an alternative embodiment of the present application, the device information of the storage nodes may be determined in the following manner:
First, the device information of a preset number of storage nodes is set in a node scanning script;
Then, the device information of the preset number of storage nodes is stored in the system folder by parsing the node scanning script.
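Purely as an illustration, the sketch below parses a hypothetical node scanning script with one "hostname ip osd,osd,osd" line per storage node and writes the parsed device information to a folder standing in for the system folder; the script format and file names are assumptions, not part of the embodiment.

```python
# A sketch of parsing a hypothetical node scanning script and persisting the
# parsed device information; the format and paths are illustrative only.
import json
from pathlib import Path

NODE_SCAN_SCRIPT = """\
node1 192.168.1.11 1,2,3
node2 192.168.1.12 4,5,6
node3 192.168.1.13 7,8,9
"""

def parse_node_scan_script(text: str) -> list[dict]:
    nodes = []
    for line in text.splitlines():
        hostname, ip, osds = line.split()
        nodes.append({"hostname": hostname, "ip": ip,
                      "osds": [int(o) for o in osds.split(",")]})
    return nodes

if __name__ == "__main__":
    nodes = parse_node_scan_script(NODE_SCAN_SCRIPT)
    Path("system_folder").mkdir(exist_ok=True)       # stands in for the system folder
    Path("system_folder/storage_nodes.json").write_text(json.dumps(nodes, indent=2))
    print(nodes)
```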
Fig. 3 is a schematic diagram of a process of setting storage nodes in the Ceph distributed data storage system according to an embodiment of the present application. As shown in Fig. 3, Ceph Mon sets the device information of 3 storage nodes in the node scanning script of the Ceph distributed data storage system, and then stores the device information of the 3 storage nodes in the system folder by parsing the node scanning script.
Fig. 4 is a schematic process diagram of data storage in the Ceph distributed data storage system according to an embodiment of the present application. As shown in Fig. 4, after determining the three OSDs (OSD1, OSD2, and OSD3) corresponding to a PG, Ceph Mon stores each Object included in the PG into those OSDs by the CRUSH algorithm.
In an optional embodiment of the present application, the data storage method according to the embodiment of the present application further includes: when an OSD storing an Object fails, calculating an updated hash value of each OSD in the storage node where the failed OSD is located; determining an idle OSD in the storage node according to the updated hash values of the OSDs in the storage node; and storing the Object stored in the failed OSD into the idle OSD.
Fig. 5 is a schematic diagram illustrating an OSD fault repair process of the Ceph distributed data storage system according to an embodiment of the present application. As shown in fig. 5, when the OSD2 storing the Object in the Object storage cluster of the Ceph distributed data storage system fails, the storage node where the OSD2 is located starts fault repair, the hash values of the OSDs in the storage node are recalculated (i.e., the updated hash values of the OSDs are calculated), and the idle OSDx in the storage node is determined according to the updated hash values of the OSDs in the storage node, so that the Object stored in the failed OSD2 is stored in the idle OSDx.
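The repair flow can be sketched as follows, assuming the updated hash values of the surviving OSDs are available and that the idle OSD is the one with the smallest updated hash; the selection rule and data structures are illustrative assumptions.

```python
# A sketch of the OSD fault-repair flow: recompute hash values within the node,
# pick an idle OSD, and move the failed OSD's Objects onto it.
def repair_failed_osd(failed_osd: int,
                      node_osds: list[int],
                      updated_hashes: dict[int, int],
                      osd_objects: dict[int, list[bytes]]) -> int:
    """Return the idle OSD that takes over the Objects of `failed_osd`."""
    candidates = [o for o in node_osds if o != failed_osd]
    idle_osd = min(candidates, key=lambda o: updated_hashes[o])   # assumed rule
    osd_objects.setdefault(idle_osd, []).extend(osd_objects.pop(failed_osd, []))
    return idle_osd

if __name__ == "__main__":
    store = {2: [b"obj-a", b"obj-b"], 3: [b"obj-c"], 4: []}
    target = repair_failed_osd(failed_osd=2, node_osds=[2, 3, 4],
                               updated_hashes={3: 71, 4: 12}, osd_objects=store)
    print(target, store)
```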
In an optional embodiment of the present application, the data storage method according to the embodiment of the present application further includes: when a storage node storing a placement group fails, adding an idle storage node in the node scanning script; and storing the placement group stored in the failed storage node into the idle storage node through the CRUSH algorithm.
Fig. 6 is a schematic diagram of a process of repairing a storage node failure in the Ceph distributed data storage system according to an embodiment of the present application. As shown in Fig. 6, when any storage Node3 in the object storage cluster of the Ceph distributed data storage system fails, Ceph Mon updates the node scanning script by adding the device information of the idle storage Node4, and then stores the PGs stored in the failed storage Node3 into the idle storage Node4 by the CRUSH algorithm.
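A simplified sketch of this node-level repair is given below. Reassignment is modeled as a direct substitution of the failed node's OSDs in the storage mapping table, standing in for the CRUSH-driven migration of the embodiment; all identifiers are illustrative.

```python
# A sketch of storage-node fault repair: replace the OSDs of the failed node
# with OSDs of the newly added idle node throughout the storage mapping table.
def repair_failed_node(failed_node_osds: set[int],
                       replacement_osds: list[int],
                       mapping_table: dict[int, list[int]]) -> dict[int, list[int]]:
    """Replace every OSD of the failed node with an OSD of the idle node."""
    substitution = dict(zip(sorted(failed_node_osds), replacement_osds))
    return {pgid: [substitution.get(osd, osd) for osd in osds]
            for pgid, osds in mapping_table.items()}

if __name__ == "__main__":
    table = {0: [1, 4, 7], 1: [2, 5, 8], 2: [3, 6, 9]}
    print(repair_failed_node({7, 8, 9}, replacement_osds=[10, 11, 12],
                             mapping_table=table))
```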
In the Ceph distributed data storage system, after the data to be stored is divided into N Objects and the N Objects are allocated into M PGs according to Object size, the at least three OSDs corresponding to any PG can be found directly through the predetermined storage mapping table, and each Object contained in the PG can then be stored into the corresponding OSDs through the CRUSH algorithm. Thus, the data storage efficiency in the Ceph distributed data storage system is improved, and high-speed reading and writing of data in the Ceph distributed data storage system are effectively realized.
Example 2
Fig. 7 is a schematic structural diagram of a data storage device according to an embodiment of the present application. As shown in Fig. 7, the data storage device 70 according to the embodiment of the present application includes a dividing unit 701, an allocating unit 702, and a storage unit 703, wherein: the dividing unit 701 is configured to divide data to be stored into N Objects, where N is a positive integer; the allocating unit 702 is configured to allocate the N Objects into M placement groups (PGs) according to Object size, where M is a positive integer smaller than N; and the storage unit 703 is configured to, for any one of the M placement groups, determine at least three object storage devices (OSDs) corresponding to the PG based on a storage mapping table, where the storage mapping table includes the mapping relationship between PGs and OSDs, and store each Object included in the PG into the corresponding OSDs based on the pseudo-random data distribution (CRUSH) algorithm.
In an alternative embodiment of the present application, the data storage device 70 further comprises a reading unit 704 and a mapping unit 705, wherein: the reading unit 704 is configured to read the hash value of each OSD from the memory; and the mapping unit 705 is configured to establish, for any one of the M placement groups, a mapping relationship between the PG and at least three OSDs, and store the established mapping relationship in the storage mapping table.
In an alternative embodiment of the present application, the data storage device 70 further comprises a calling unit 706 and a calculating unit 707, wherein: the calling unit 706 is configured to call device information of a preset number of storage nodes from a system folder, where any one storage node includes at least three OSDs; the calculation unit 707 is configured to, for any one of a preset number of storage nodes, calculate a hash value of each OSD in the storage node according to the device information of the storage node, and store the hash value of each OSD in the memory.
In an alternative embodiment of the present application, the data storage device 70 further comprises a setting unit 708, wherein: the setting unit 708 is configured to set the device information of a preset number of storage nodes in the node scanning script, and store the device information of the preset number of storage nodes in the system folder by parsing the node scanning script.
In an optional embodiment of the present application, the calculating unit 707 is further configured to, when an OSD storing an object fails, calculate an updated hash value of each OSD in a storage node where the OSD is located; the storage unit 703 is further configured to determine a free OSD in the storage node according to the updated hash value of each OSD in the storage node, and store an object stored in the failed OSD into the free OSD.
In an optional embodiment of the present application, the setting unit 708 is further configured to, when any storage node storing a placement group fails, add an idle storage node in the node scanning script; and the storage unit 703 is further configured to store the PG stored in the failed storage node into the idle storage node through the CRUSH algorithm.
According to the data storage device of the embodiment of the present application, after the data to be stored is divided into N Objects and the N Objects are allocated into M PGs according to Object size, the at least three OSDs corresponding to any PG can be found directly through the predetermined storage mapping table, and each Object contained in the PG can then be stored into the corresponding OSDs through the CRUSH algorithm. Thus, the data storage efficiency in the Ceph distributed data storage system is improved, and high-speed reading and writing of data in the Ceph distributed data storage system are effectively achieved.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.