
CN111522514A - Cluster file system, data processing method, computer equipment and storage medium - Google Patents

Info

Publication number
CN111522514A
Authority
CN
China
Prior art keywords
virtual block
instruction
data
metadata
block device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010343972.6A
Other languages
Chinese (zh)
Other versions
CN111522514B (en)
Inventor
张和泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Puli Technology Co ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202010343972.6A
Publication of CN111522514A
Application granted
Publication of CN111522514B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • G06F3/0607Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0614Improving the reliability of storage systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • G06F3/0631Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0643Management of files
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/065Replication mechanisms
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a cluster file system, a data processing method, a computer device, and a storage medium. The cluster file system includes an object storage server, a plurality of object storage devices, and a plurality of first virtual block devices. The object storage server is connected to the plurality of object storage devices, and each object storage device is connected to at least one first virtual block device. The object storage server is configured to receive a first input/output (IO) instruction sent by a client and, based on the first IO instruction, to access the first virtual block device through the object storage device. Target data is stored in the first virtual block device as at least three data copies.

Description

Cluster file system, data processing method, computer equipment and storage medium

Technical Field

The present disclosure relates to the technical field of computer storage, and in particular to a cluster file system, a data processing method, a computer device, and a storage medium.

Background

A cluster file system is a file system that runs across multiple computers, which communicate with one another to integrate and virtualize all storage resources in the cluster and to provide file access services to external clients.

Data reliability is an important measure of the stability of a cluster file system. Lustre, one such cluster file system, usually relies on the dynamic file system ZFS (Zettabyte File System) together with hardware redundancy to prevent data loss and ensure data reliability. ZFS is a kernel file system that combines multiple disks into a software disk array (Redundant Arrays of Independent Disks, RAID), which serves as the data storage disk for the Lustre cluster.

This data storage approach scales poorly.

Summary of the Invention

Embodiments of the present disclosure provide at least a cluster file system, a data processing method, a computer device, and a storage medium.

In a first aspect, an embodiment of the present disclosure provides a cluster file system, including: an object storage server, a plurality of object storage devices, and a plurality of first virtual block devices; wherein the object storage server is connected to the plurality of object storage devices; each object storage device is connected to at least one first virtual block device; the object storage server is configured to receive a first input/output (IO) instruction sent by a client and, based on the first IO instruction, to access the first virtual block device through the object storage device; target data is stored in the first virtual block device; and the target data is stored in the first virtual block device as at least three data copies.

In this way, the first virtual block device can aggregate multiple physical disks (Disks) into one large-capacity virtual device and therefore scales better. In addition, because the target data can be stored as at least three copies in the first virtual block device, data reliability is greatly improved.

In a possible implementation, the cluster file system further includes: a metadata server, a plurality of metadata storage devices, and a plurality of second virtual block devices; wherein the metadata server is connected to the plurality of metadata storage devices; each metadata storage device is connected to at least one second virtual block device; the metadata server is configured to receive a second IO instruction sent by the client and, based on the second IO instruction, to access the second virtual block device through the metadata storage device; metadata corresponding to the target data is stored in the second virtual block device; and the metadata is stored in the second virtual block device as at least three metadata copies.

In this way, because the second virtual block device can aggregate multiple physical disks (Disks) into one large-capacity virtual device, and the number of Disks it contains is relatively large, it scales better.

In addition, the second virtual block device has a larger storage capacity, and because the metadata can be stored as at least three copies in the second virtual block device, data reliability is greatly improved.

In a possible implementation, the target data includes a data block aggregated from multiple files, and the metadata corresponding to the data block includes the location information of each file within the data block.

In this way, because the target data includes data blocks aggregated from multiple files, the file directory that the metadata server has to maintain shrinks significantly, so the cluster file system provided by the embodiments of the present disclosure is suitable for applications with massive numbers of small files.
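The aggregation described above can be illustrated with a minimal Python sketch. It assumes the simplest possible layout, in which files are concatenated and the per-file location information is an (offset, length) pair; the names `FileLocation`, `pack_files`, and `read_file` are hypothetical and do not come from the disclosure, which does not specify the block format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileLocation:
    """Position of one file inside an aggregated data block (assumed layout)."""
    offset: int
    length: int


def pack_files(files: dict[str, bytes]) -> tuple[bytes, dict[str, FileLocation]]:
    """Concatenate small files into one block and record where each one lives."""
    block = bytearray()
    index: dict[str, FileLocation] = {}
    for name, payload in files.items():
        index[name] = FileLocation(offset=len(block), length=len(payload))
        block.extend(payload)
    return bytes(block), index


def read_file(block: bytes, index: dict[str, FileLocation], name: str) -> bytes:
    """Recover one file from the block using its recorded location."""
    loc = index[name]
    return block[loc.offset:loc.offset + loc.length]


block, index = pack_files({"a.txt": b"hello", "b.txt": b"cluster", "c.txt": b"fs"})
```

Under this assumed layout the storage layer handles a single data block rather than three separate files, and only the small index has to be consulted to locate any one file.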

In a possible implementation, the first virtual block device is further configured to determine, according to a preset first reading order, a target data copy from the at least three data copies corresponding to the first IO instruction, and to return the target data copy to the object storage device.

This ensures that the target data can be read successfully.

In a second aspect, an embodiment of the present disclosure provides a data processing method, including: receiving an input/output (IO) instruction sent by a client; and, based on the IO instruction, controlling a storage device to write data corresponding to the IO instruction into a virtual block device, or to read data corresponding to the IO instruction from the virtual block device; wherein the data is stored in the virtual block device as at least three data copies.

In a possible implementation, the storage device includes an object storage device; the virtual block device includes a first virtual block device; the IO instruction includes a first IO instruction; the data includes target data; and the step of controlling, based on the IO instruction, the storage device to write data corresponding to the IO instruction into the virtual block device, or to read data corresponding to the IO instruction from the virtual block device, includes: based on the first IO instruction, controlling the object storage device to write the target data corresponding to the first IO instruction into the first virtual block device, or to read the target data corresponding to the first IO instruction from the first virtual block device.

In a possible implementation, the storage device includes a metadata storage device; the virtual block device includes a second virtual block device; the IO instruction includes a second IO instruction; the data includes metadata; and the step of controlling, based on the IO instruction, the storage device to write data corresponding to the IO instruction into the virtual block device, or to read data corresponding to the IO instruction from the virtual block device, includes: based on the second IO instruction, controlling the metadata storage device to write the metadata corresponding to the second IO instruction into the second virtual block device, or to read the metadata corresponding to the second IO instruction from the second virtual block device.

In a possible implementation, reading the data corresponding to the IO instruction from the virtual block device includes: sending the IO instruction to the virtual block device, so that the virtual block device determines, according to a preset reading order, a target data copy from the at least three data copies corresponding to the IO instruction and returns it to the object storage device.

In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor and a memory connected to each other, wherein the memory stores machine-readable instructions executable by the processor, and when the computer device runs, the machine-readable instructions are executed by the processor to implement the steps of the data processing method of the second aspect or of any possible implementation of the second aspect.

In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program that, when run by a processor, performs the steps of the data processing method of the second aspect or of any possible implementation of the second aspect.

To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. The drawings are incorporated into and form part of this specification; they show embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.

FIG. 1 shows a schematic diagram of a cluster file system provided by an embodiment of the present disclosure;

FIG. 2 shows a schematic diagram of the data reading process in the cluster file system provided by an embodiment of the present disclosure;

FIG. 3 shows a schematic diagram of the process by which an OST reads target data from a VBD in the cluster file system provided by an embodiment of the present disclosure;

FIG. 4 shows a flowchart of a data processing method provided by an embodiment of the present disclosure;

FIG. 5 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.

Detailed Description

To make the purposes, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. All other embodiments that a person skilled in the art obtains from the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.

Research has found that the cluster file system Lustre usually stores data in disk (Disk) arrays. Because an object storage device can hold only a limited number of Disks, expanding Lustre's storage capacity requires adding object storage devices; physical devices are expensive, however, so Lustre scales poorly.

Moreover, because an object storage device can hold only a limited number of Disks, Lustre currently provides data redundancy of at most two copies, which results in poor data reliability.

In addition, Lustre's cluster and parallel architecture is better suited to scenarios in which many clients concurrently read and write large files, and is not well suited to small-file applications. For applications with massive numbers of small files in particular, both the metadata server and the object storage server must maintain a huge file directory, which slows Lustre's response.

Based on the above research, the present disclosure provides a cluster file system and a data processing method. The cluster file system includes an object storage server, a plurality of object storage devices, and a plurality of first virtual block devices; the object storage server is connected to the plurality of object storage devices; each object storage device is connected to at least one first virtual block device; the object storage server is configured to receive a first input/output (Input/Output, IO) instruction sent by a client and, based on the first IO instruction, to access the first virtual block device through the object storage device; and target data is stored in the first virtual block device. The first virtual block device (Virtual Block Disk, VBD) can aggregate multiple physical disks (Disks) into one large-capacity virtual device and therefore scales better. In addition, because the target data can be stored as at least three copies in the first virtual block device, data reliability is greatly improved.

The defects of the above solutions were all identified by the inventor through practice and careful study. Therefore, both the process of discovering these problems and the solutions that the present disclosure proposes for them below should be regarded as the inventor's contribution to the present disclosure.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be defined or explained again in subsequent drawings.

To facilitate understanding of this embodiment, a cluster file system disclosed by an embodiment of the present disclosure is first described in detail.

A Lustre cluster includes a metadata server (Meta Data Server, MDS) and an object storage server (Object Storage Server, OSS). The MDS is connected to multiple metadata storage devices (Meta Data Target, MDT), and the OSS is connected to multiple object storage devices (Object Storage Target, OST). Each MDT and each OST is connected to physical storage (Disk), and the Disks provide data storage services for the MDTs and OSTs.

The MDS provides metadata services for the cluster and manages the namespace of the entire cluster; multiple MDSs share access to one MDT; each MDT stores file metadata objects, such as file names, directory structures, and access permissions; a client (Client) reads the metadata stored on an MDT through the MDS.

The OSS handles the interaction between the Client and the Disks and the storage of data, and provides the Client with a data input/output (Input/Output, I/O) interface; each OSS manages one or more OSTs, and each OST stores file data objects.
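The conventional Lustre layout just described can be pictured with a small Python model. This is only an illustrative sketch: the class names `Target` and `Server` and the instance names (`mds0`, `ost0`, `disk1`, and so on) are invented for the example and are not part of Lustre or of the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """An MDT or OST backed directly by physical disks (illustrative names)."""
    name: str
    disks: list[str] = field(default_factory=list)


@dataclass
class Server:
    """An MDS or OSS together with the targets it manages."""
    name: str
    targets: list[Target] = field(default_factory=list)


mds = Server("mds0", [Target("mdt0", ["disk0"])])
oss = Server("oss0", [Target("ost0", ["disk1"]), Target("ost1", ["disk2"])])

# A client reaches metadata through the MDS and file data through the OSS.
for server in (mds, oss):
    for target in server.targets:
        print(f"{server.name} -> {target.name} -> {target.disks}")
```

In this model every target is tied to the disks physically attached to it, which is the coupling the disclosure identifies as the source of the scalability limit.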

FIG. 1 is a schematic structural diagram of a cluster file system provided by an embodiment of the present disclosure. The cluster file system includes:

an object storage server 11, a plurality of object storage devices 12, and a plurality of first virtual block devices 13;

wherein the object storage server 11 is connected to the plurality of object storage devices 12, and each object storage device 12 is connected to at least one first virtual block device 13;

the object storage server 11 is configured to receive a first input/output (IO) instruction sent by a client and, based on the first IO instruction, to access the first virtual block device 13 through the object storage device 12;

target data is stored in the first virtual block device 13;

wherein the target data is stored in the first virtual block device 13 as at least three data copies.

In a specific implementation, after the object storage server (Object Storage Server, OSS) 11 receives the first IO instruction sent by the client, if the first IO instruction is a data write instruction, it also carries the target data to be written. The object storage server 11 determines, according to the first IO instruction, a target object storage device from the plurality of object storage devices (Object Storage Target, OST) 12 and passes the first IO instruction to it. The target object storage device is connected to a first virtual block device (Virtual Block Disk, VBD) 13 and passes the target data carried in the first IO instruction to that first virtual block device 13, which stores the target data as at least three data copies.
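The write path can be sketched in Python as follows. This is a toy in-memory model under stated assumptions, not the patented implementation: the class and method names are hypothetical, each replica is represented by a plain dictionary, and the rule for choosing the target OST (a byte sum modulo the number of targets) is invented for the example because the disclosure does not say how the target is determined.

```python
class VirtualBlockDevice:
    """Toy VBD that keeps every object as several replicas (hypothetical model)."""

    def __init__(self, replica_count: int = 3) -> None:
        if replica_count < 3:
            raise ValueError("the disclosure calls for at least three copies")
        # One dict per replica stands in for a separate physical disk.
        self.replicas: list[dict[str, bytes]] = [{} for _ in range(replica_count)]

    def write(self, key: str, data: bytes) -> None:
        for replica in self.replicas:
            replica[key] = data


class ObjectStorageTarget:
    """Toy OST that forwards writes to the VBD attached to it."""

    def __init__(self, vbd: VirtualBlockDevice) -> None:
        self.vbd = vbd

    def write(self, key: str, data: bytes) -> None:
        self.vbd.write(key, data)


class ObjectStorageServer:
    """Toy OSS that routes each write instruction to one target OST."""

    def __init__(self, targets: list[ObjectStorageTarget]) -> None:
        self.targets = targets

    def pick_target(self, key: str) -> ObjectStorageTarget:
        # Assumption: a stable byte sum selects the OST; the source gives no rule.
        return self.targets[sum(key.encode()) % len(self.targets)]

    def handle_write(self, key: str, data: bytes) -> None:
        self.pick_target(key).write(key, data)


server = ObjectStorageServer(
    [ObjectStorageTarget(VirtualBlockDevice()) for _ in range(2)]
)
server.handle_write("obj-1", b"payload")
```

After the call, the VBD behind the selected OST holds three identical copies of `obj-1`, mirroring the OSS to OST to VBD hand-off described above.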

If the first IO instruction is a data read instruction, it carries the metadata of the target data to be read. The object storage server 11 determines, according to the first IO instruction, which of the plurality of object storage devices 12 stores the target data corresponding to the metadata, and passes the first IO instruction to that target object storage device. The target object storage device then reads, from the first virtual block device 13 connected to it, the target data corresponding to the metadata carried in the first IO instruction. When returning the target data to the target object storage device, the first virtual block device 13 reads a target data copy from the at least three stored data copies in a preset first reading order: if the first data copy in the reading order cannot be found, the second data copy in the reading order is read; if the second data copy cannot be found either, the third data copy in the reading order is read. After reading the target data from the first virtual block device connected to it, the target object storage device returns the target data to the object storage server 11, which in turn returns it to the client.
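The fallback read described above can be sketched in a few lines of Python. The function name `read_with_fallback`, the representation of each replica as a dictionary, and the encoding of the preset reading order as a list of replica indices are all assumptions made for illustration.

```python
from typing import Optional


def read_with_fallback(
    replicas: list[dict[str, bytes]],
    key: str,
    read_order: list[int],
) -> Optional[bytes]:
    """Return the first copy of `key` found when probing replicas in `read_order`."""
    for replica_index in read_order:
        data = replicas[replica_index].get(key)
        if data is not None:
            return data
    return None  # no copy could be found in any replica


# Three replicas; the first one in the read order has lost the object.
replicas = [{}, {"obj-1": b"payload"}, {"obj-1": b"payload"}]
result = read_with_fallback(replicas, "obj-1", read_order=[0, 1, 2])
```

Here the read still succeeds because the second replica in the order holds the object; a read fails only when every replica in the order is missing it.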

The cluster file system provided by another embodiment of the present disclosure further includes: a metadata server 14, a plurality of metadata storage devices 15, and a plurality of second virtual block devices 16;

wherein the metadata server 14 is connected to the plurality of metadata storage devices 15, and each metadata storage device 15 is connected to at least one second virtual block device 16;

the metadata server 14 is configured to receive a second IO instruction sent by the client and, based on the second IO instruction, to access the second virtual block device 16 through the metadata storage device 15;

the second virtual block device 16 stores metadata corresponding to the target data;

wherein the metadata is stored in the second virtual block device 16 as at least three metadata copies.

In a specific implementation, the metadata server (Meta Data Server, MDS) 14 receives the second IO instruction sent by the client. Where the second IO instruction is a data write instruction, it also carries the metadata to be written. According to the second IO instruction, the metadata server 14 determines a target metadata storage device from the plurality of metadata storage devices (Meta Data Target, MDT) 15 and passes the second IO instruction to the determined target metadata storage device. The target metadata storage device is connected to a second virtual block device 16 and passes the metadata carried in the second IO instruction to it; the second virtual block device 16 stores the metadata as at least three metadata copies.

Where the second IO instruction is a data read instruction, it carries information about the metadata to be read, such as a file name. According to this information, the metadata server 14 determines a target metadata storage device from the plurality of metadata storage devices 15 connected to it and passes the second IO instruction to that target metadata storage device. Using the information carried in the second IO instruction, the target metadata storage device reads the corresponding metadata from the second virtual block device 16 connected to it. When returning the metadata to the target metadata storage device, the second virtual block device 16 reads a target metadata copy from the at least three stored metadata copies according to a preset second reading order: if the first metadata copy in the reading order cannot be found, the second metadata copy is read; if the second metadata copy cannot be found either, the third metadata copy is read, and so on, until a metadata copy is found. If none of the metadata copies can be found, read-failure information is returned to the target metadata storage device. After reading the metadata from the second virtual block device connected to it, the target metadata storage device returns the metadata to the metadata server 14, and the metadata server returns the metadata to the client, so that the client can, according to the obtained metadata, issue a first IO instruction to the object storage server to read the target data corresponding to that metadata.

In the above process, each time the client initiates a data read, it first accesses the metadata server to obtain the metadata of the file to be read, and then, according to that metadata, accesses the object storage server to read the target data of the file.

Similarly, each time the client initiates a data write, it first accesses the metadata server; after the metadata server generates the metadata, the client accesses the object storage server to store the target data corresponding to the metadata in the first virtual block device connected to the object storage device.

In a possible implementation, the metadata includes the file name, directory structure, access rights, and the like of the target data.

In another possible implementation, the target data includes a data block aggregated from a plurality of files;

the metadata corresponding to the data block includes location information of each of the files within the data block.

In this way, a plurality of files can be aggregated into a data block; when the client accesses a certain file in the data block, it can determine, based on the location information, the storage location of the file to be accessed in the second virtual block device.
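As a non-limiting illustration, aggregating several small files into one data block and locating a file by its location information may be sketched as follows. Representing the location information as an (offset, length) pair, and the function names `pack_files` and `read_file`, are assumptions made for this example only.

```python
from typing import Dict, Tuple

# Hypothetical form of the location information: (offset, length) in the block.
Location = Tuple[int, int]


def pack_files(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, Location]]:
    """Concatenate files into one data block and record where each one sits."""
    block = bytearray()
    index: Dict[str, Location] = {}
    for name, content in files.items():
        index[name] = (len(block), len(content))
        block.extend(content)
    return bytes(block), index


def read_file(block: bytes, index: Dict[str, Location], name: str) -> bytes:
    """Slice a single file back out of the data block using its location."""
    offset, length = index[name]
    return block[offset:offset + length]
```

With this layout the metadata side keeps one index per data block rather than one directory entry per small file.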

As shown in FIG. 2, an embodiment of the present disclosure provides a complete data read flow, including:

The client (Client) requests metadata from the metadata server (MDS); based on the Client's request, the MDS initiates a data read to a specific metadata storage device (MDT); the MDT reads the metadata from the second virtual block device (VBD2), which returns the metadata to the MDT; the MDT returns the metadata to the MDS; the MDS returns the metadata to the Client.

According to the obtained metadata, the Client locates the corresponding object storage server (OSS) to obtain the target data; the OSS forwards the IO request to a specific object storage device (OST); the OST obtains the target data from the first virtual block device (VBD1) connected to it, which returns the target data to the OST; the OST returns the target data to the OSS; the OSS returns the target data to the Client.

Through the above process, one data read is completed.
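As a non-limiting illustration, the two-phase read flow of FIG. 2 may be sketched as follows. All class names, the `object_key` metadata field, and the linear scan used by a server to find the target device are assumptions for this example; replication inside each virtual block device is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class VirtualBlockDevice:
    """Toy key-value stand-in for VBD1 or VBD2."""
    blobs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class StorageTarget:
    """Plays the role of an MDT or an OST: reads from its attached VBD."""
    vbd: VirtualBlockDevice

    def has(self, key: str) -> bool:
        return key in self.vbd.blobs

    def read(self, key: str) -> Any:
        return self.vbd.blobs[key]


@dataclass
class Server:
    """Plays the role of the MDS or the OSS: picks a target and forwards."""
    targets: List[StorageTarget]

    def read(self, key: str) -> Any:
        for target in self.targets:  # simplistic target selection
            if target.has(key):
                return target.read(key)
        raise KeyError(key)


def client_read(file_name: str, mds: Server, oss: Server) -> bytes:
    # Phase 1: Client -> MDS -> MDT -> VBD2, yielding the metadata.
    metadata = mds.read(file_name)
    # Phase 2: Client -> OSS -> OST -> VBD1, yielding the target data.
    return oss.read(metadata["object_key"])
```

The same `Server` and `StorageTarget` shapes serve both halves of the flow, which mirrors the symmetry between the MDS/MDT/VBD2 path and the OSS/OST/VBD1 path.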

Referring to FIG. 3, an embodiment of the present disclosure further provides a specific data processing procedure by which the first virtual block device 13 provides a data storage service for an OST. The first virtual block device 13 is a virtual storage device obtained by virtualizing at least one physical disk (Disk). A virtual block device (Virtual Block Disk, VBD) service is deployed in the object storage device 12; alternatively, the VBD service may be deployed separately on another device, referred to as a VBD server.

The OST requests secondary metadata from the VBD service; based on the request, the VBD service obtains the specific secondary metadata from the manager node (Monitor) in the VBD cluster.

Here, the metadata includes the file name of the target data, its directory structure within the virtual block device, access rights, and the like.

The secondary metadata includes the file name of the target data, the directory structure of the target data within the at least one physical disk that makes up the virtual block device, access rights, and the like.

The Monitor returns the secondary metadata to the VBD service; the VBD service returns the secondary metadata to the OSD; according to the secondary metadata, the OSD obtains the corresponding target data copy from the specific physical disk (Disk).

It should be noted here that different target data copies are deployed, for example, on different Disks that make up the first virtual block device. In this example, the Disks that make up the first virtual block device are Disk1, Disk2, and Disk3, each storing one target data copy, and the preset first reading order is Disk1→Disk2→Disk3. Following this order, the OSD first tries Disk1; if the target data copy cannot be read from Disk1, it reads from Disk2; if it cannot be read from Disk2 either, it reads from Disk3.

The probability that all three Disks making up the virtual block device lose their target data copies at the same time is extremely low, which ensures the reliability of the data.
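As a rough, non-limiting estimate of this reliability gain, assume (an assumption not stated in the disclosure) that each Disk loses its copy independently with the same probability p over a given period. The symbol p and the value used below are purely illustrative.

```latex
P(\text{all three copies lost}) = p \cdot p \cdot p = p^{3},
\qquad \text{e.g. } p = 10^{-2} \;\Rightarrow\; p^{3} = 10^{-6}.
```

Correlated failures, such as Disks sharing a host or power supply, would make the real figure higher than this independent-failure estimate.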

In addition, when data is written, the OST likewise sends a write request to the VBD service; the VBD service forwards the write request to the Monitor, which generates the secondary metadata; and the VBD service writes the corresponding target data to the different Disks that make up the VBD, so that it is stored as separate target data copies across multiple Disks.
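As a non-limiting illustration, the write path described above may be sketched as follows. The `Monitor` class, the shape of the secondary metadata (a list of Disk names per key), and the choice of the first three Disks are assumptions for this example only.

```python
from typing import Dict, List


class Monitor:
    """Hypothetical manager node that records where each copy was placed."""

    def __init__(self) -> None:
        self.secondary_metadata: Dict[str, List[str]] = {}

    def record(self, key: str, disk_names: List[str]) -> None:
        self.secondary_metadata[key] = list(disk_names)


def replicated_write(
    key: str,
    data: bytes,
    disks: Dict[str, Dict[str, bytes]],
    monitor: Monitor,
    copies: int = 3,
) -> List[str]:
    """Store `data` as `copies` separate copies and register their placement."""
    if len(disks) < copies:
        raise ValueError("not enough disks for the requested number of copies")
    chosen = list(disks)[:copies]  # placement policy is an assumption
    monitor.record(key, chosen)    # Monitor generates the secondary metadata
    for name in chosen:
        disks[name][key] = data    # one target data copy per Disk
    return chosen
```

Recording the placement at write time is what later allows a read to walk the Disks in a preset order.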

The cluster file system provided by the embodiments of the present disclosure includes an object storage server, a plurality of object storage devices, and a plurality of first virtual block devices, wherein the object storage server is connected to the plurality of object storage devices, and each object storage device is connected to at least one first virtual block device. The object storage server is configured to receive a first input/output (Input/Output, IO) instruction sent by a client and, based on the first IO instruction, to access the first virtual block device through the object storage device; the first virtual block device stores target data. Because the first virtual block device can aggregate a plurality of physical disks (Disks) into one large-capacity virtual device, the system has stronger scalability.

At the same time, the target data can be stored in the first virtual block device as at least three copies, which greatly improves the reliability of the data.

In addition, the target data of the embodiments of the present disclosure includes data blocks aggregated from a plurality of files. For the metadata server, the scale of the file directory it needs to maintain is therefore significantly reduced, so the cluster file system provided by the embodiments of the present disclosure is suitable for applications with massive numbers of small files.

Referring to FIG. 4, an embodiment of the present disclosure further provides a data processing method, including:

S401: receiving an input/output (IO) instruction sent by a client.

S402: based on the IO instruction, controlling a storage device to write data corresponding to the IO instruction into a virtual block device, or to read data corresponding to the IO instruction from the virtual block device.

The data is stored in the virtual block device as at least three data copies.
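As a non-limiting illustration, steps S401 and S402 may be sketched as a single dispatch function. The `IOInstruction` fields, the `ReplicatedVBD` class, and the string values "write" and "read" are assumptions for this example and are not defined by the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IOInstruction:
    op: str                          # "write" or "read" (hypothetical encoding)
    key: str
    payload: Optional[bytes] = None  # only present for writes


class ReplicatedVBD:
    """Toy virtual block device that keeps at least three copies of each item."""

    def __init__(self, copies: int = 3) -> None:
        if copies < 3:
            raise ValueError("at least three data copies are required")
        self.replicas: List[Dict[str, bytes]] = [{} for _ in range(copies)]

    def write(self, key: str, data: bytes) -> None:
        for replica in self.replicas:
            replica[key] = data

    def read(self, key: str) -> bytes:
        for replica in self.replicas:  # preset reading order
            if key in replica:
                return replica[key]
        raise LookupError(key)


def handle_io(instruction: IOInstruction, vbd: ReplicatedVBD) -> Optional[bytes]:
    """S401: an instruction has been received. S402: write to or read from the VBD."""
    if instruction.op == "write":
        if instruction.payload is None:
            raise ValueError("a write instruction must carry data")
        vbd.write(instruction.key, instruction.payload)
        return None
    if instruction.op == "read":
        return vbd.read(instruction.key)
    raise ValueError(f"unsupported operation: {instruction.op!r}")
```

The same function shape applies whether the storage device is an object storage device with a first virtual block device or a metadata storage device with a second virtual block device.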

In a possible implementation, the storage device includes an object storage device; the virtual block device includes a first virtual block device; the IO instruction includes a first IO instruction; and the data includes target data;

the controlling, based on the IO instruction, the storage device to write data corresponding to the IO instruction into the virtual block device, or to read data corresponding to the IO instruction from the virtual block device, includes:

based on the first IO instruction, controlling the object storage device to write the target data corresponding to the first IO instruction into the first virtual block device, or to read the target data corresponding to the first IO instruction from the first virtual block device.

Here, for the specific manner of reading and writing the target data, reference may be made to the embodiment corresponding to FIG. 1 above, which is not repeated here.

In another possible implementation, the storage device includes a metadata storage device; the virtual block device includes a second virtual block device; the IO instruction includes a second IO instruction; and the data includes metadata;

the controlling, based on the IO instruction, the storage device to write data corresponding to the IO instruction into the virtual block device, or to read data corresponding to the IO instruction from the virtual block device, includes:

based on the second IO instruction, controlling the metadata storage device to write the metadata corresponding to the second IO instruction into the second virtual block device, or to read the metadata corresponding to the second IO instruction from the second virtual block device.

Here, for the specific manner of reading and writing the metadata, reference may be made to the embodiment corresponding to FIG. 1 above, which is not repeated here.

Those skilled in the art will understand that, in the above method of the specific implementations, the order in which the steps are written does not imply a strict order of execution and does not limit the implementation process in any way; the specific order of execution of the steps should be determined by their functions and possible internal logic.

An embodiment of the present disclosure further provides a computer device 10. FIG. 5 is a schematic structural diagram of the computer device 10 provided by the embodiment of the present disclosure, which includes:

a processor 11 and a memory 12 connected to each other, the memory 12 storing machine-readable instructions executable by the processor 11; when the computer device runs, the machine-readable instructions are executed by the processor 11 to implement the following steps:

receiving an input/output (IO) instruction sent by a client;

based on the IO instruction, controlling a storage device to write data corresponding to the IO instruction into a virtual block device, or to read data corresponding to the IO instruction from the virtual block device;

wherein the data is stored in the virtual block device as at least three data copies.

For the specific execution process of the above instructions, reference may be made to the steps of the data processing method described in the embodiments of the present disclosure, which are not repeated here.

An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when run by a processor, the computer program executes the steps of the data processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.

The computer program product of the data processing method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the data processing method described in the above method embodiments. For details, refer to the above method embodiments, which are not repeated here.

An embodiment of the present disclosure further provides a computer program which, when executed by a processor, implements any one of the methods of the foregoing embodiments. The computer program product may be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are only specific implementations of the present disclosure, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A cluster file system, characterized by comprising: an object storage server, a plurality of object storage devices, and a plurality of first virtual block devices;
wherein the object storage server is connected to the plurality of object storage devices, and each object storage device is connected to at least one first virtual block device;
the object storage server is configured to receive a first input/output (IO) instruction sent by a client and, based on the first IO instruction, to access the first virtual block device through the object storage device;
the first virtual block device stores target data;
wherein the target data is stored in the first virtual block device as at least three data copies.

2. The cluster file system according to claim 1, characterized by further comprising: a metadata server, a plurality of metadata storage devices, and a plurality of second virtual block devices;
wherein the metadata server is connected to the plurality of metadata storage devices, and each metadata storage device is connected to at least one second virtual block device;
the metadata server is configured to receive a second IO instruction sent by the client and, based on the second IO instruction, to access the second virtual block device through the metadata storage device;
the second virtual block device stores metadata corresponding to the target data;
wherein the metadata is stored in the second virtual block device as at least three metadata copies.

3. The cluster file system according to claim 2, characterized in that the target data comprises a data block aggregated from a plurality of files;
the metadata corresponding to the data block includes location information of each of the files within the data block.

4. The cluster file system according to any one of claims 1-3, characterized in that the first virtual block device is further configured to determine, according to a preset first reading order, a target data copy from the at least three data copies corresponding to the first IO instruction and to return it to the object storage device.

5. A data processing method, characterized by comprising:
receiving an input/output (IO) instruction sent by a client;
based on the IO instruction, controlling a storage device to write data corresponding to the IO instruction into a virtual block device, or to read data corresponding to the IO instruction from the virtual block device;
wherein the data is stored in the virtual block device as at least three data copies.

6. The data processing method according to claim 5, characterized in that the storage device comprises an object storage device; the virtual block device comprises a first virtual block device; the IO instruction comprises a first IO instruction; and the data comprises target data;
the controlling, based on the IO instruction, the storage device to write data corresponding to the IO instruction into the virtual block device, or to read data corresponding to the IO instruction from the virtual block device, comprises:
based on the first IO instruction, controlling the object storage device to write the target data corresponding to the first IO instruction into the first virtual block device, or to read the target data corresponding to the first IO instruction from the first virtual block device.

7. The data processing method according to claim 5 or 6, characterized in that the storage device comprises a metadata storage device; the virtual block device comprises a second virtual block device; the IO instruction comprises a second IO instruction; and the data comprises metadata;
the controlling, based on the IO instruction, the storage device to write data corresponding to the IO instruction into the virtual block device, or to read data corresponding to the IO instruction from the virtual block device, comprises:
based on the second IO instruction, controlling the metadata storage device to write the metadata corresponding to the second IO instruction into the second virtual block device, or to read the metadata corresponding to the second IO instruction from the second virtual block device.

8. The data processing method according to any one of claims 5-8, characterized in that the reading data corresponding to the IO instruction from the virtual block device comprises:
sending the IO instruction to the virtual block device, so that the virtual block device determines, according to a preset reading order, a target data copy from the at least three data copies corresponding to the IO instruction and returns it to the object storage device.

9. An electronic device, characterized by comprising: a processor and a memory connected to each other, the memory storing machine-readable instructions executable by the processor; when the computer device runs, the machine-readable instructions are executed by the processor to implement the data processing method according to any one of claims 5-8.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when run by a processor the computer program executes the data processing method according to any one of claims 5-8.
CN202010343972.6A 2020-04-27 2020-04-27 Cluster file system, data processing method, computer equipment and storage medium Active CN111522514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010343972.6A CN111522514B (en) 2020-04-27 2020-04-27 Cluster file system, data processing method, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010343972.6A CN111522514B (en) 2020-04-27 2020-04-27 Cluster file system, data processing method, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111522514A true CN111522514A (en) 2020-08-11
CN111522514B CN111522514B (en) 2023-11-03

Family

ID=71906212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010343972.6A Active CN111522514B (en) 2020-04-27 2020-04-27 Cluster file system, data processing method, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111522514B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590309A (en) * 2021-06-30 2021-11-02 郑州云海信息技术有限公司 Data processing method, device, equipment and storage medium
CN114035750A (en) * 2021-11-24 2022-02-11 北京度友信息技术有限公司 File processing method, device, equipment, medium and product
CN114079659A (en) * 2020-08-13 2022-02-22 支付宝(杭州)信息技术有限公司 Server of distributed storage system, data storage method and data access system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160026672A1 (en) * 2014-07-23 2016-01-28 NetApp, Inc. Data and metadata consistency in object storage systems
CN105446794A (en) * 2014-09-30 2016-03-30 北京金山云网络技术有限公司 Disc operation method, apparatus and system based on virtual machine
CN105468296A (en) * 2015-11-18 2016-04-06 南京格睿信息技术有限公司 No-sharing storage management method based on virtualization platform
US9558208B1 (en) * 2013-12-19 2017-01-31 EMC IP Holding Company LLC Cluster file system comprising virtual file system having corresponding metadata server
CN107203639A (en) * 2017-06-09 2017-09-26 联泰集群(北京)科技有限责任公司 Parallel file system based on High Performance Computing
CN107807794A (en) * 2017-10-31 2018-03-16 新华三技术有限公司 A kind of data storage method and device
CN109347896A (en) * 2018-08-14 2019-02-15 联想(北京)有限公司 A kind of information processing method, equipment and computer readable storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9558208B1 (en) * 2013-12-19 2017-01-31 EMC IP Holding Company LLC Cluster file system comprising virtual file system having corresponding metadata server
US20160026672A1 (en) * 2014-07-23 2016-01-28 NetApp, Inc. Data and metadata consistency in object storage systems
CN105446794A (en) * 2014-09-30 2016-03-30 北京金山云网络技术有限公司 Disc operation method, apparatus and system based on virtual machine
CN105468296A (en) * 2015-11-18 2016-04-06 南京格睿信息技术有限公司 No-sharing storage management method based on virtualization platform
CN107203639A (en) * 2017-06-09 2017-09-26 联泰集群(北京)科技有限责任公司 Parallel file system based on High Performance Computing
CN107807794A (en) * 2017-10-31 2018-03-16 新华三技术有限公司 A kind of data storage method and device
CN109347896A (en) * 2018-08-14 2019-02-15 联想(北京)有限公司 A kind of information processing method, equipment and computer readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
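KRISTAL T. POLLACK: "Quota enforcement for high-performance distributed storage systems", IEEE Xplore *
LIU Zhong, ZHANG Wensong, WANG Zhaofu, ZHOU Xingming: "Design of a cluster storage system based on object storage", Computer Engineering & Science, no. 02 *
LUO Shengmei: "A metadata optimization technique for distributed file systems that exploits SSD characteristics", Journal of Chinese Computer Systems, no. 5 *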

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114079659A (en) * 2020-08-13 2022-02-22 支付宝(杭州)信息技术有限公司 Server of distributed storage system, data storage method and data access system
CN114079659B (en) * 2020-08-13 2025-01-10 支付宝(杭州)信息技术有限公司 Distributed storage system server, distributed storage system, data storage and data access method and system
CN113590309A (en) * 2021-06-30 2021-11-02 郑州云海信息技术有限公司 Data processing method, device, equipment and storage medium
CN113590309B (en) * 2021-06-30 2024-01-23 郑州云海信息技术有限公司 Data processing method, device, equipment and storage medium
CN114035750A (en) * 2021-11-24 2022-02-11 北京度友信息技术有限公司 File processing method, device, equipment, medium and product

Also Published As

Publication number Publication date
CN111522514B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
JP5276218B2 (en) Convert LUNs to files or files to LUNs in real time
US9594514B1 (en) Managing host data placed in a container file system on a data storage array having multiple storage tiers
US9122697B1 (en) Unified data services for block and file objects
US8966476B2 (en) Providing object-level input/output requests between virtual machines to access a storage subsystem
US8473462B1 (en) Change tracking for shared disks
EP4139781B1 (en) Persistent memory architecture
US9286007B1 (en) Unified datapath architecture
US9569455B1 (en) Deduplicating container files
US10031703B1 (en) Extent-based tiering for virtual storage using full LUNs
US8347060B2 (en) Storage system, storage extent release method and storage apparatus
US9430480B1 (en) Active-active metro-cluster scale-out for unified data path architecture
WO2020236353A1 (en) Memory disaggregation for compute nodes
US9122712B1 (en) Compressing container files
US9842117B1 (en) Managing replication of file systems
US9069783B1 (en) Active-active scale-out for unified data path architecture
US8046534B2 (en) Managing snapshots in storage systems
CN111522514A (en) Cluster file system, data processing method, computer equipment and storage medium
US20250306790A1 (en) Co-located Journaling and Data Storage for Write Requests
US11327895B1 (en) Protocol for processing requests that assigns each request received by a node a sequence identifier, stores data written by the request in a cache page block, stores a descriptor for the request in a cache page descriptor, and returns a completion acknowledgement of the request
WO2016013075A1 (en) Storage, computer, and control method therefor
CN109508255B (en) A method and device for data processing
US8356016B1 (en) Forwarding filesystem-level information to a storage management system
GB2502288A (en) Modifying the order of checking virtual machines for cached disc data
He et al. Coordinating parallel hierarchical storage management in object-base cluster file systems
US11994954B2 (en) Fast disaster recover from backup storage using smart links

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20251215

Address after: 518000 Guangdong Province Shenzhen City Futian District Futian Street Fuan Community Fuhan 1st Road 138 International Chamber of Commerce Building A Block 821E26

Patentee after: Shenzhen Puli Technology Co.,Ltd.

Country or region after: China

Address before: Room 1605a, building 3, 391 Guiping Road, Xuhui District, Shanghai

Patentee before: SHANGHAI SENSETIME INTELLIGENT TECHNOLOGY Co.,Ltd.

Country or region before: China
