WO2019144553A1

WO2019144553A1 - Data storage method and apparatus, and storage medium

Info

Publication number: WO2019144553A1
Application number: PCT/CN2018/089342
Authority: WO
Inventors: 刘源
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-01-26
Filing date: 2018-05-31
Publication date: 2019-08-01
Anticipated expiration: 2020-07-26
Also published as: CN108287669A; CN108287669B

Abstract

The present application provides a data storage method and apparatus, and a computer-readable storage medium. The method comprises the following steps: determining, at every preset cycle and according to a preset rule, the number of OSDs used by each service group, and allocating a unique sub-cluster identifier to the OSDs of the same service group; receiving a user's request to store a data file in a distributed storage system; determining, according to identification information of the user, the service group to which the user belongs and the sub-cluster identifier of that service group's OSDs; and evenly dividing the data file into multiple data blocks and, using the CRUSH algorithm, storing multiple copies of each data block in the OSDs that have the corresponding sub-cluster identifier. By allocating OSDs to each service group, the application configures resources reasonably; by adding a logical sub-cluster division to the cluster topology, it confines the impact of OSD faults on the storage system to a single sub-cluster.

Description

Data storage method, device and storage medium

Priority claim

The present application claims priority to Chinese Patent Application No. 201810079304.X, filed with the Chinese Patent Office on January 26, 2018 and entitled "Data storage method, device and storage medium", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of data storage technologies, and in particular to a data storage method, an apparatus, and a computer-readable storage medium.

Background

A distributed storage system stores data dispersed across multiple independent devices. Ceph is a widely used open-source distributed storage system that treats data as objects and uses the Controlled Replication Under Scalable Hashing (CRUSH) algorithm to distribute objects evenly across a cluster of storage devices, while providing dynamic scaling, rebalancing, and recovery.

At present, distributed storage systems mostly use a three-replica mechanism that stores data in three different locations to improve reliability. When data is kept as three replicas, fault domains are planned along the hierarchy of data center, machine room, cabinet, host, and object storage device (OSD). Once one OSD fails, data is migrated across the entire cluster to keep all three replicas available. This migration consumes a large amount of system resources and greatly degrades system performance.

Summary of the invention

To remedy the deficiencies of the prior art, the present application provides a data storage method, an apparatus, and a computer-readable storage medium that allocate OSDs to each service group and add a logical sub-cluster division above the host level in the cluster topology, so that resources are configured reasonably and the impact of an OSD failure on the storage system is confined to a sub-cluster.

To achieve the above objective, the present application provides a data storage method applied to an electronic device that is connected through a network to a distributed storage system, the distributed storage system providing a series of hosts and OSDs, wherein the method includes:

Device allocation step: at every preset period, determining the number of OSDs used by each service group according to a preset rule, and assigning a unique sub-cluster identifier to the OSDs of the same service group;

Request receiving step: receiving a user's request to store a data file in the distributed storage system;

Service group determining step: determining, according to the user's identification information, the service group to which the user belongs and the sub-cluster identifier of that service group's OSDs; and

File storage step: dividing the data file evenly into a plurality of data blocks, and using the CRUSH algorithm to store multiple copies of each data block in OSDs that have the corresponding sub-cluster identifier.

The present application also provides an electronic device including a memory and a processor, the memory including a data storage program. When the processor of the electronic device executes the data storage program in the memory, the steps of the data storage method described above are implemented.

In addition, to achieve the above objective, the present application further provides a computer-readable storage medium that includes a data storage program; when the data storage program is executed by a processor, any step of the data storage method described above is implemented.

The data storage method, apparatus, and computer-readable storage medium provided by the present application determine the number of OSDs used by each service group according to a preset rule and assign a unique sub-cluster identifier to the OSDs of the same service group, thereby adding a logical sub-cluster division above the host level in the cluster topology. The data file is then divided evenly into multiple data blocks, each data block is mapped to a placement group (PG), and finally the CRUSH algorithm stores the multiple copies of each PG in OSDs that have the corresponding sub-cluster identifier. The application thus configures resources reasonably and confines the impact of an OSD failure on the storage system to a sub-cluster.

DRAWINGS

FIG. 1 is an application environment diagram of a preferred embodiment of the electronic device of the present application.

FIG. 2 is a program module diagram of the data storage program in FIG. 1.

FIG. 3 is a schematic diagram of data file storage in the present application.

FIG. 4 is a flowchart of a preferred embodiment of the data storage method of the present application.

The realization of the objectives, the functional features, and the advantages of the present application are further described below with reference to the embodiments and the accompanying drawings.

Detailed description

Those skilled in the art will appreciate that embodiments of the present application may be implemented as a method, apparatus, device, system, or computer program product. Accordingly, the application may be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, and the like), or in a combination of hardware and software.

The principles and spirit of the present application are described below with reference to several specific embodiments. It should be understood that the specific embodiments described here merely explain the application and do not limit it.

FIG. 1 shows the application environment of a preferred embodiment of the electronic device of the present application. In this embodiment, the electronic device 1 is connected to the distributed storage system 3 through the network 2 and to the client 5 through the network 4. The distributed storage system 3 comprises several machine rooms; each machine room holds several cabinets, each cabinet houses several hosts, and each host contains several OSDs. Hosts are connected to one another through high-speed fiber-optic switches, and cabinets are interconnected through higher-bandwidth fiber-optic switching. Using the data storage program 10 provided by the present application, the electronic device 1 stores data files sent by the user at the client 5 into the OSDs of the distributed storage system 3.

The electronic device 1 may be a terminal device with storage and computing capabilities, such as a server, smartphone, tablet computer, portable computer, or desktop computer.

The electronic device 1 includes a memory 11, a processor 12, a network interface 13, and a communication bus 14.

The network interface 13 may include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The communication bus 14 provides connection and communication among these components.

The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, or card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as its hard disk. In other embodiments, the readable storage medium may be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1.

In this embodiment, the memory 11 stores the program code of the data storage program 10, other data used by the processor 12 when executing that program code, and the data that is finally output.

In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip.

FIG. 1 shows only the electronic device 1 with components 11 to 14, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.

Optionally, the electronic device 1 may further include a user interface. The user interface may include an input unit such as a keyboard, a voice input device with speech recognition capability such as a microphone, and a voice output device such as a loudspeaker or headphones. Optionally, the user interface may also include a standard wired interface and a wireless interface.

Optionally, the electronic device 1 may further include a display. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch display, or the like. The display shows the information processed by the electronic device 1 and a visual user interface.

Optionally, the electronic device 1 further includes a touch sensor. The area the touch sensor provides for the user's touch operations is called the touch area. The touch sensor may be resistive, capacitive, or of another type; it may be a contact sensor or a proximity sensor; and it may be a single sensor or multiple sensors arranged, for example, in an array. The user can start the data storage program 10 by touch.

The electronic device 1 may further include a radio frequency (RF) circuit, sensors, an audio circuit, and the like, which are not described in detail here.

When executed by the processor 12, the data storage program 10 in FIG. 1 implements the device allocation, request receiving, service group determining, and file storage steps set out above.

For details of these steps, see the descriptions below of FIG. 2 (program module diagram of the data storage program 10), FIG. 3 (schematic diagram of data file storage), and FIG. 4 (flowchart of a preferred embodiment of the data storage method).

FIG. 2 is the program module diagram of the data storage program 10 in FIG. 1. In this embodiment, the data storage program 10 is divided into multiple modules, which are stored in the memory 11 and executed by the processor 12 to carry out the present application. A module, as the term is used in this application, is a series of computer program instruction segments capable of performing a specific function.

The data storage program 10 may be divided into a device allocation module 110, a request receiving module 120, a service group determining module 130, and a file storage module 140.

The device allocation module 110 is configured to determine, at every preset period and according to a preset rule, the number of OSDs used by each service group, and to assign a unique sub-cluster identifier to the OSDs of the same service group. In this embodiment, the preset period is one quarter, and the preset rule includes:

collecting, for each service group, the history of the data it stored in the distributed storage system during one preset period (for example, the previous three months), including the total data size, the total number of OSDs involved, and the number of OSDs on which data migration occurred;

calculating, from each service group's total data size, total number of OSDs involved, and number of OSDs on which data migration occurred, the averages over all service groups for that period: the average data size, the average number of OSDs involved, and the average number of OSDs on which data migration occurred;

for each first preset threshold (for example, 500 GB) by which a service group's total data size exceeds the average data size, adding a first preset number (for example, 2) of OSDs for that service group on top of its total number of OSDs involved; and

for each second preset threshold (for example, 2) by which the number of OSDs on which data migration occurred in a service group exceeds the average number of such OSDs over all service groups, adding a second preset number (for example, 1) of OSDs for that service group on top of its total number of OSDs involved.

The purpose of the preset rule is to allocate, dynamically and according to actual business needs, enough OSDs to service groups that have a large business volume and frequent data migration. When OSDs are allocated to service groups, OSDs with different sub-cluster identifiers should be located on different hosts, so that the logical sub-cluster division added to the cluster topology sits above the host level.
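The preset rule can be illustrated with a minimal Python sketch. The class and function names, the field names, and the group names are hypothetical, and the sketch assumes that one bonus is granted per whole multiple of the threshold by which a group exceeds the average; the application does not spell out the rounding.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GroupStats:
    """History of one service group over one preset period (illustrative fields)."""
    total_data_gb: float   # total size of the data stored by the group
    osds_used: int         # total number of OSDs involved
    osds_migrated: int     # number of OSDs on which data migration occurred


def plan_osd_counts(stats, size_step_gb=500, size_bonus=2,
                    migration_step=2, migration_bonus=1):
    """Return {group name: number of OSDs to allocate for the next period}.

    Assumption: a difference of k whole thresholds earns k bonuses, so a
    difference of exactly one threshold already counts as one step.
    """
    avg_size = mean(s.total_data_gb for s in stats.values())
    avg_migrated = mean(s.osds_migrated for s in stats.values())

    plan = {}
    for group, s in stats.items():
        count = s.osds_used  # start from the OSDs the group already involved

        size_excess = s.total_data_gb - avg_size
        if size_excess > 0:
            count += int(size_excess // size_step_gb) * size_bonus

        migration_excess = s.osds_migrated - avg_migrated
        if migration_excess > 0:
            count += int(migration_excess // migration_step) * migration_bonus

        plan[group] = count
    return plan


if __name__ == "__main__":
    history = {
        "life": GroupStats(total_data_gb=2000, osds_used=10, osds_migrated=5),
        "property": GroupStats(total_data_gb=500, osds_used=6, osds_migrated=1),
        "bank": GroupStats(total_data_gb=500, osds_used=6, osds_migrated=0),
    }
    print(plan_osd_counts(history))
```

In this example the averages are 1000 GB and 2 migrated OSDs, so only the "life" group, which is 1000 GB and 3 migrated OSDs above average, receives extra OSDs.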

The request receiving module 120 is configured to receive a user's request to store a data file in the distributed storage system. When the user issues such a request at the client 5, the request receiving module 120 receives the request and forwards it to the distributed storage system 3.

The service group determining module 130 is configured to determine, according to the user's identification information, the service group to which the user belongs and the sub-cluster identifier of that service group's OSDs. Because the device allocation module 110 has allocated specific OSDs to each service group, storing a data file first requires determining which service group the file belongs to, which in turn fixes the file's storage range. The service group of the data file can be learned by identifying the user who issued the storage request. The user's identification information may include the user's IP address, system login name, authentication information, and so on.
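One way to picture the lookup performed by the service group determining module is the following Python sketch. The lookup tables, login names, network range, and identifier strings are all hypothetical; the application only states which kinds of identification information may be used, not how they are mapped.

```python
import ipaddress

# Hypothetical lookup tables; a real deployment would load these from
# whatever directory or configuration service it uses.
GROUP_BY_LOGIN = {"alice": "life"}
GROUP_BY_NETWORK = {ipaddress.ip_network("10.1.0.0/16"): "property"}
SUBCLUSTER_BY_GROUP = {"life": "subcluster-1", "property": "subcluster-2"}


def resolve_subcluster(login=None, ip=None):
    """Return (service group, sub-cluster identifier) for a requesting user.

    The login name is tried first; the client IP address is the fallback.
    """
    group = GROUP_BY_LOGIN.get(login)
    if group is None and ip is not None:
        address = ipaddress.ip_address(ip)
        for network, candidate in GROUP_BY_NETWORK.items():
            if address in network:
                group = candidate
                break
    if group is None:
        raise LookupError("user does not belong to any known service group")
    return group, SUBCLUSTER_BY_GROUP[group]


if __name__ == "__main__":
    print(resolve_subcluster(login="alice"))
    print(resolve_subcluster(ip="10.1.2.3"))
```

Raising an error for an unknown user is a design choice of this sketch; it keeps a file from being stored outside every sub-cluster by accident.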

The file storage module 140 is configured to divide the data file evenly into multiple data blocks and to use the CRUSH algorithm to store multiple copies of each data block in OSDs that have the corresponding sub-cluster identifier. In this embodiment, when a data file is stored, it is first divided evenly into multiple data blocks of a specific size; each data block is then mapped to a PG, yielding the PG identifier (placement group ID, PGid) of each PG; finally, the CRUSH algorithm uses the PGid to store the multiple copies of the PG in OSDs that have the corresponding sub-cluster identifier. A PG with n copies is stored on n OSDs. The value of n can be configured according to the reliability requirements of the actual application and is usually 3.

Dividing a data file evenly into data blocks resembles disk striping, and it serves two purposes. First, data files of differing sizes become data blocks of uniform capacity that are easy to manage efficiently. Second, serial processing of a data file becomes parallel processing of multiple data blocks, which increases processing speed.

When the Ceph distributed storage system divides a data file into data blocks, the default block size is 4 MB, so a single data file usually yields a very large number of data blocks. Traversing such a large number of objects to address them in the storage device cluster would be very slow, and the PG concept is introduced to solve this problem. A PG is an abstract storage node: each data block is mapped to one fixed PG, and one PG usually contains multiple objects. All data blocks in the same PG share the same storage policy, that is, they are stored on the same OSDs. Different copies of the same PG can be regarded as similar PGs, but they are stored on different OSDs. In this embodiment, both data addressing and data migration use the PG as the basic unit and never operate on data blocks directly.

The service group to which a data file belongs is known from the service group determining module 130, and the number of OSDs used by each service group is determined by the device allocation module 110, so the number of PGs can be set. The number of PGs usually follows these rules:

when fewer than 5 OSDs are used, the number of PGs is set to 128;

when 5 to 10 OSDs are used, the number of PGs is set to 512;

when 10 to 50 OSDs are used, the number of PGs is set to 4096;

when more than 50 OSDs are used, the number of PGs can be calculated with a tool such as PGCalculator.
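The rules above can be written as a small Python function. The function name is hypothetical, and because the stated ranges overlap at exactly 10 OSDs, the sketch assumes that 10 OSDs fall into the 512 tier; the application does not resolve that boundary.

```python
from typing import Optional


def default_pg_count(osd_count: int) -> Optional[int]:
    """Return the PG count suggested by the rules above.

    None means "no fixed value": above 50 OSDs the text defers to a
    calculator tool. Assumption: exactly 10 OSDs maps to 512.
    """
    if osd_count < 1:
        raise ValueError("at least one OSD is required")
    if osd_count < 5:
        return 128
    if osd_count <= 10:
        return 512
    if osd_count <= 50:
        return 4096
    return None


if __name__ == "__main__":
    for n in (3, 5, 10, 11, 50, 51):
        print(n, default_pg_count(n))
```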

Once the number of PGs is set, it is usually not changed, because it significantly affects the behavior of the cluster and the durability of data when an OSD or other component fails, that is, the probability that a catastrophic event causes data loss.
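The splitting and PG mapping described for the file storage module can be sketched in Python as follows. This is an illustration under stated assumptions, not the storage system's actual code: the function names and the block naming scheme are hypothetical, and SHA-256 reduced modulo the PG count is only a stand-in for whatever hash the system really uses. The sketch also shows why the PG count is normally left alone: the PG of a block depends on that count.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the default block size mentioned in the text


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut the file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_to_pg(file_id: str, block_index: int, pg_count: int) -> int:
    """Map one block to a PG number in the range [0, pg_count).

    The block name (file id plus index) is hashed so that the same block
    always lands in the same PG for a given pg_count.
    """
    name = f"{file_id}.{block_index:08d}".encode()
    digest = hashlib.sha256(name).digest()
    return int.from_bytes(digest[:8], "big") % pg_count


if __name__ == "__main__":
    payload = b"x" * (9 * 1024 * 1024)  # a 9 MB file yields three blocks
    blocks = split_into_blocks(payload)
    for index, block in enumerate(blocks):
        print(index, len(block), block_to_pg("policy-2018.dat", index, 128))
```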

The CRUSH algorithm computes where data is stored, that is, on which OSD each copy of each PG is placed, and it guarantees that different copies of the same PG are stored on different OSDs. To compute storage locations, the CRUSH algorithm needs a map that describes the cluster topology (the CRUSH Map). In this embodiment, the CRUSH Map is modified so that a logical sub-cluster division is added above the host level in the cluster topology. As shown in FIG. 3, a schematic diagram of file storage in the present application, the hosts that contain the OSDs used by each service group form one sub-cluster, and each sub-cluster is dedicated to storing the data of one service group. When the CRUSH algorithm computes storage locations, it can therefore guarantee that the multiple copies of each PG are stored within the same sub-cluster and that different copies of the same PG are stored on different hosts.
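The effect of the added sub-cluster level can be illustrated with a simplified Python sketch. It is not the CRUSH algorithm: it uses rendezvous (highest-random-weight) hashing as a stand-in, and the topology, host names, and OSD numbers are hypothetical. What it does reproduce are the two guarantees stated above: all copies of a PG stay inside one sub-cluster, and each copy lands on a different host.

```python
import hashlib

# Hypothetical topology: sub-cluster -> host -> OSD ids.
TOPOLOGY = {
    "subcluster-1": {"host-a": [0, 1], "host-b": [2, 3], "host-c": [4, 5]},
    "subcluster-2": {"host-d": [6, 7], "host-e": [8, 9], "host-f": [10, 11]},
}


def _score(pg_id: int, item: str) -> int:
    """Deterministic pseudo-random score for a (PG, item) pair."""
    digest = hashlib.sha256(f"{pg_id}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_pg(pg_id: int, subcluster: str, replicas: int = 3, topology=TOPOLOGY):
    """Pick one OSD on each of `replicas` distinct hosts of one sub-cluster."""
    hosts = topology[subcluster]
    if replicas > len(hosts):
        raise ValueError("not enough hosts in the sub-cluster for distinct copies")
    # Rank the hosts for this PG and keep the best `replicas` of them.
    ranked_hosts = sorted(hosts, key=lambda h: _score(pg_id, h), reverse=True)
    placement = []
    for host in ranked_hosts[:replicas]:
        # Within a host, pick the OSD with the highest score for this PG.
        osd = max(hosts[host], key=lambda o: _score(pg_id, f"osd.{o}"))
        placement.append(osd)
    return placement


if __name__ == "__main__":
    print(place_pg(42, "subcluster-1"))
    print(place_pg(42, "subcluster-2"))
```

The first OSD in the returned list can be read as the one holding the primary copy, which matches the ordering convention described for step S40.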

It will be appreciated that data files of different service groups are stored in different sub-clusters. For example, Ping An Insurance (Group) Company of China, Ltd. stores its life insurance data files and its property insurance data files in separate sub-clusters. The benefits are as follows. When one OSD fails, the data it held can be restored from the replica data stored on other OSDs of the same sub-cluster, so no data is lost; the recovery does not consume the system resources of other service groups and does not affect their data access performance. When the number of failed OSDs is greater than or equal to the number of replicas, data may be lost, but only data of the service group that corresponds to the sub-cluster containing the failed OSDs; the data and data access performance of other service groups are unaffected.
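The containment argument can be checked with a short Python sketch. The PG names, OSD numbers, and status labels are hypothetical; the point is only that a PG is affected by a failure exactly when one of its own OSDs fails, and those OSDs all belong to one sub-cluster.

```python
def classify_pgs(placements, failed_osds):
    """Classify each PG by how many of its copies survive a set of OSD failures.

    placements: {pg name: list of OSD ids holding its copies}
    Returns {pg name: "healthy" | "degraded" | "lost"}.
    """
    failed = set(failed_osds)
    status = {}
    for pg_id, osds in placements.items():
        surviving = [osd for osd in osds if osd not in failed]
        if len(surviving) == len(osds):
            status[pg_id] = "healthy"    # no copy was touched
        elif surviving:
            status[pg_id] = "degraded"   # recoverable from the remaining copies
        else:
            status[pg_id] = "lost"       # every copy was on a failed OSD
    return status


if __name__ == "__main__":
    # Life insurance data lives on OSDs 0-5, property insurance data on 6-11.
    placements = {"life-pg0": [0, 2, 4], "property-pg0": [6, 8, 10]}
    print(classify_pgs(placements, failed_osds=[0]))
    print(classify_pgs(placements, failed_osds=[0, 2, 4]))
```

In both runs the property insurance PG stays healthy, because none of the failed OSDs belongs to its sub-cluster.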

FIG. 4 is a flowchart of a preferred embodiment of the data storage method of the present application. With the architecture shown in FIG. 1, the electronic device 1 is started and the processor 12 executes the data storage program 10 stored in the memory 11 to implement the following steps:

Step S10: the device allocation module 110 determines, at every preset period and according to a preset rule, the number of OSDs used by each service group, and assigns a unique sub-cluster identifier to the OSDs of the same service group. In this embodiment, OSDs with different sub-cluster identifiers should be located on different hosts, so that the logical sub-cluster division added to the cluster topology sits above the host level. For the preset period and the preset rule, see the detailed description of the device allocation module 110 above.

Step S20: the request receiving module 120 receives a user's request to store a data file in the distributed storage system. When the user issues such a request on the electronic device 1, the request receiving module 120 receives the request and forwards it to the distributed storage system 3.

Step S30: the service group determining module 130 determines, according to the user's identification information, the service group to which the user belongs and the sub-cluster identifier of that service group's OSDs. The user's identification information may include an IP address, system login name, authentication information, and so on. Determining the service group of the user who issued the storage request establishes the storage range of the data file.

Step S40: the file storage module 140 divides the data file evenly into multiple data blocks and stores multiple copies of each data block in OSDs that have the corresponding sub-cluster identifier. For the detailed storage process, see the description of the file storage module 140 above.

It should be further noted that when a user stores a data file, that is, when the client writes a data file, the multiple copies of a PG comprise one primary copy and several secondary copies, and the CRUSH algorithm computes for each PG as many OSDs as there are copies. For example, if a PG has one primary copy and two secondary copies, the CRUSH algorithm computes three OSDs for the PG and numbers them so that each has a different sequence number. The first OSD in the sequence stores the primary copy, and the other two store the secondary copies. When a data file is stored, every copy must be written to its OSD; when data is read, all reads are served from the primary copy.

When the OSD holding the primary copy fails, the file storage module 140 automatically uses the CRUSH algorithm to recompute an OSD for data migration and recovery, and the first OSD in the sequence among those that held secondary copies takes over from the failed OSD and begins serving reads. When an OSD holding a secondary copy fails, the file storage module 140 recomputes an OSD with the CRUSH algorithm in the same way, but the location of the OSD holding the primary copy is unaffected.
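The failover behavior described above can be sketched in Python. The function name and the `spare_osds` parameter are hypothetical: the list of spares stands in for the OSD that the CRUSH algorithm would recompute inside the same sub-cluster, which this sketch does not model.

```python
def acting_set_after_failure(acting, failed_osd, spare_osds):
    """Return the new ordered OSD list for a PG after one OSD fails.

    acting: ordered list of OSD ids; acting[0] holds the primary copy.
    spare_osds: candidate replacements, in order of preference.
    """
    if failed_osd not in acting:
        return list(acting)
    # Dropping the failed OSD keeps the relative order of the survivors, so
    # if the primary failed, the first secondary moves to the front.
    survivors = [osd for osd in acting if osd != failed_osd]
    replacement = next((osd for osd in spare_osds if osd not in acting), None)
    if replacement is not None:
        survivors.append(replacement)  # the new copy is rebuilt here
    return survivors


if __name__ == "__main__":
    acting = [3, 7, 9]  # OSD 3 holds the primary copy
    print(acting_set_after_failure(acting, failed_osd=3, spare_osds=[11]))
    print(acting_set_after_failure(acting, failed_osd=7, spare_osds=[11]))
```

With these inputs, a failure of OSD 3 promotes OSD 7 to primary, while a failure of OSD 7 leaves OSD 3 as primary; in both cases OSD 11 joins at the end of the list as a secondary.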

In addition, an embodiment of the present application further provides a computer-readable storage medium, which may be any one of, or any combination of, a hard disk, a multimedia card, an SD card, a flash card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable storage medium includes a file to be stored and the data storage program 10; when executed by a processor, the data storage program 10 implements the operations of the data storage method described above.

The specific implementation of the computer-readable storage medium is substantially the same as that of the data storage method and the electronic device 1 described above, and is not repeated here.

需要说明的是，在本文中，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、装置、物品或者方法不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、装置、物品或者方法所固有的要素。It is to be understood that the term "comprises", "comprising", or any other variants thereof, is intended to encompass a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a series of elements includes those elements. It also includes other elements not explicitly listed, or elements that are inherent to such a process, device, item, or method.

The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments. From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.

The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims

A data storage method applied to an electronic device, the electronic device being connected through a network to a distributed storage system that provides a series of hosts and OSDs, wherein the method comprises:

Device allocation step: determining, in every preset period, the number of OSDs used by each service group according to a preset rule, and assigning a unique sub-cluster identifier to the OSDs of the same service group;

Request receiving step: receiving a request from a user to store a data file in the distributed storage system;

Service group determining step: determining, according to the identification information of the user, the service group to which the user belongs and the sub-cluster identifier of the OSDs of that service group; and

File storage step: evenly dividing the data file into a plurality of data blocks, and storing multiple copies of each data block in the OSDs having the corresponding sub-cluster identifier by using a CRUSH algorithm.

The data storage method according to claim 1, wherein the preset rule in the device allocation step comprises:

Collecting statistics on the historical data stored in the distributed storage system by each service group in a preset period, including the total data size, the total number of OSDs involved, and the number of OSDs in which data migration occurs;

Calculating, from the total data size, the total number of OSDs involved, and the number of OSDs in which data migration occurs for each service group, the average data size, the average number of OSDs, and the average number of OSDs in which data migration occurs for the data stored in the distributed storage system by all service groups in the preset period;

When the difference between the total data size of a service group and the average data size of all service groups is greater than a first preset threshold, adding a first preset number of OSDs to the service group on the basis of the total number of OSDs it involves;

When the difference between the number of OSDs in which data migration occurs in a service group and the average number of OSDs in which data migration occurs across the service groups is greater than a second preset threshold, adding a second preset number of OSDs to the service group on the basis of the total number of OSDs it involves.
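As an illustration only, the preset rule recited above can be sketched in Python. The names `GroupStats` and `allocate_osds`, the units, and the choice to apply both increments cumulatively when both thresholds are exceeded are assumptions made for this sketch; the claim does not specify them. The average OSD count recited above plays no part in the two threshold tests and is therefore not computed here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict


@dataclass
class GroupStats:
    total_data_size: float   # data stored in the period (unit is an assumption)
    osd_count: int           # total number of OSDs involved
    migrated_osd_count: int  # OSDs in which data migration occurred


def allocate_osds(stats: Dict[str, GroupStats],
                  size_threshold: float,
                  migration_threshold: float,
                  size_increment: int,
                  migration_increment: int) -> Dict[str, int]:
    """Return the number of OSDs for each service group in the next period."""
    avg_size = mean(s.total_data_size for s in stats.values())
    avg_migrated = mean(s.migrated_osd_count for s in stats.values())

    allocation = {}
    for group, s in stats.items():
        count = s.osd_count
        # Rule 1: the group stores much more data than the average group.
        if s.total_data_size - avg_size > size_threshold:
            count += size_increment
        # Rule 2: the group sees much more data migration than the average.
        if s.migrated_osd_count - avg_migrated > migration_threshold:
            count += migration_increment
        allocation[group] = count
    return allocation
```

Running this once per preset period, and then tagging each group's OSDs with that group's sub-cluster identifier, would correspond to the device allocation step of claim 1.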

The data storage method according to claim 1, wherein the OSDs having different sub-cluster identifiers are located on different hosts.

The data storage method according to claim 2, wherein the OSDs having different sub-cluster identifiers are located on different hosts.

The data storage method according to claim 2, wherein said file storage step comprises the following steps:

Evenly dividing the data file into a plurality of data blocks;

Mapping each data block to a PG;

Storing multiple copies of each PG in the OSDs having the corresponding sub-cluster identifier by using the CRUSH algorithm.
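The three sub-steps above (split, map to a PG, place within the sub-cluster) can be sketched as follows. This is a simplified illustration under stated assumptions: fixed-size blocks stand in for "even" division (the last block may be shorter), a hash modulo the PG count stands in for the block-to-PG mapping, and a hash ranking restricted to OSDs with the matching sub-cluster identifier stands in for CRUSH. All function names and parameters are hypothetical.

```python
import hashlib
from typing import Dict, List


def split_evenly(data: bytes, block_size: int) -> List[bytes]:
    # Fixed-size blocks; the final block may be shorter than block_size.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_to_pg(block_name: str, pg_count: int) -> int:
    # Hash the block name and reduce it to a PG index.
    digest = hashlib.sha256(block_name.encode()).digest()
    return int.from_bytes(digest[:8], "big") % pg_count


def osds_for_pg(pg: int, osd_to_subcluster: Dict[str, str],
                subcluster: str, replicas: int = 3) -> List[str]:
    # Only OSDs carrying the requested sub-cluster identifier are eligible.
    eligible = [osd for osd, sc in osd_to_subcluster.items() if sc == subcluster]
    if len(eligible) < replicas:
        raise ValueError("not enough OSDs in the sub-cluster")
    # Stand-in for CRUSH: rank eligible OSDs by a per-PG hash.
    eligible.sort(key=lambda osd: hashlib.sha256(f"{pg}:{osd}".encode()).digest(),
                  reverse=True)
    return eligible[:replicas]


def store_file(name: str, data: bytes, subcluster: str,
               osd_to_subcluster: Dict[str, str],
               block_size: int = 4, pg_count: int = 8,
               replicas: int = 3) -> Dict[str, List[str]]:
    """Return, for each block, the OSDs that would hold its copies."""
    placement = {}
    for idx, _block in enumerate(split_evenly(data, block_size)):
        block_name = f"{name}.{idx}"
        pg = block_to_pg(block_name, pg_count)
        placement[block_name] = osds_for_pg(pg, osd_to_subcluster,
                                            subcluster, replicas)
    return placement
```

Because the candidate list is filtered by sub-cluster identifier before ranking, every copy of every block lands inside the service group's own sub-cluster, so a failed OSD only affects that sub-cluster.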

The data storage method according to claim 1, further comprising: when an OSD fails, recovering data stored in the OSD by using replica data stored by other OSDs of the sub-cluster to which the OSD belongs.

The data storage method according to claim 2, wherein the method further comprises: when an OSD fails, recovering data stored in the OSD by using replica data stored by other OSDs of the sub-cluster to which the OSD belongs.

An electronic device includes a memory and a processor, wherein the memory includes a data storage program, and the data storage program is executed by the processor to implement the following steps:

The electronic device according to claim 8, wherein the preset rule in the device allocation step comprises:

The electronic device of claim 8, wherein the OSDs having different sub-cluster identifiers are located on different hosts.

The electronic device of claim 9, wherein the OSDs having different sub-cluster identifiers are located on different hosts.

The electronic device of claim 8, wherein the file storing step comprises the steps of:

Evenly dividing the data file into a plurality of data blocks;

Mapping each data block to a PG;

The electronic device according to claim 8, wherein the data storage program, when executed by the processor, further implements the following step: when an OSD fails, recovering the data stored in the OSD by using the replica data stored by other OSDs of the sub-cluster to which the OSD belongs.

The electronic device according to claim 9, wherein the data storage program, when executed by the processor, further implements the following step: when an OSD fails, recovering the data stored in the OSD by using the replica data stored by other OSDs of the sub-cluster to which the OSD belongs.

A computer readable storage medium, comprising: a data storage program, wherein when the data storage program is executed by a processor, the following steps are implemented:

The storage medium according to claim 15, wherein the preset rule in the device allocation step comprises:

Collecting statistics on the historical data stored in the distributed storage system by each service group in a preset period, including the total data size, the total number of OSDs involved, and the number of OSDs in which data migration occurs;

The storage medium of claim 15, wherein the OSDs having different sub-cluster identifiers are located on different hosts.

The storage medium of claim 16, wherein the OSDs having different sub-cluster identifiers are located on different hosts.

A storage medium according to claim 15, wherein said file storing step comprises the steps of:

Evenly dividing the data file into a plurality of data blocks;

Mapping each data block to a PG;

A storage medium according to claim 15 or 16, wherein the data storage program, when executed by the processor, further implements the following step: when an OSD fails, recovering the data stored in the OSD by using the replica data stored by other OSDs of the sub-cluster to which the OSD belongs.