CN110535692B - Fault handling method, device, computer equipment, storage medium and storage system - Google Patents
- Publication number
- CN110535692B (CN201910741190.5A)
- Authority
- CN
- China
- Prior art keywords
- fault
- storage node
- target
- node
- storage system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Debugging And Monitoring (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
Abstract
The invention discloses a fault handling method, apparatus, computer device, storage medium and storage system for a distributed storage system, and belongs to the technical field of fault handling. The method determines the fault state of the distributed storage system according to at least one faulty storage node among the plurality of storage nodes, so the fault state does not have to wait until all storage nodes have failed. Once the fault state is determined, it can be sent immediately to every storage node in the distributed storage system so that each storage node performs fault handling according to the determined fault state, which reduces the time the distributed storage system takes to return to normal.
Description
Technical field
The present invention relates to the technical field of fault handling, and in particular to a fault handling method, apparatus, computer device, storage medium and distributed storage system for a distributed storage system.
Background
With the development of big data technology, enterprises increasingly favor distributed storage systems in order to store more data and prevent data loss. As the storage nodes in a distributed storage system age, failures are inevitable. When all storage nodes in the distributed storage system have failed, the computing nodes that provide services to users can perform fault handling on the failures so that the failed storage nodes do not affect normal services.
Fault handling may proceed as follows. In the distributed storage system, the client sends small computer system interface (SCSI) requests to multiple storage nodes. When all storage nodes in the distributed storage system have failed and the failure can be repaired within a short time, the storage nodes do not respond to the SCSI requests. When the client receives no response from any storage node, it determines the fault state of the distributed storage system to be the all paths down (APD) state. The APD state is a storage-node fault state defined by the VMWare virtual machine; it indicates that none of the paths to the back-end storage nodes can respond to host requests. The client suspends the unprocessed SCSI requests and waits for technicians to repair the fault in the distributed storage system.
When a storage node has a failure that cannot be repaired within a short time, the storage node returns a storage exception message to the client. On receiving this message, the client determines the fault state of the distributed storage system to be the permanent device loss (PDL) state. The PDL state is also a storage-node fault state defined by the VMWare virtual machine; it indicates a long-term or permanent failure of the back-end storage nodes.
Because a prolonged storage-node failure can damage the file system in the distributed storage system, when the client determines that the fault state is the PDL state, it powers off the file system and waits for technicians to repair the fault.
In the fault handling process described above, the client performs fault handling only when all storage nodes in the distributed storage system have failed; when only some of the storage nodes have failed, it does not. However, partial failure is fairly common in distributed storage systems. Once some storage nodes fail, unless fault diagnosis is performed on the storage nodes, the client cannot determine whether any storage node has failed. Technicians therefore cannot learn of the failure from the client in time and cannot repair the failed storage nodes immediately, which prolongs the time the distributed storage system takes to return to normal.
Summary of the invention
Embodiments of the present invention provide a fault handling method, apparatus, computer device, storage medium and storage system for a distributed storage system, which can reduce the time the distributed storage system takes to return to normal. The technical solution is as follows:
In a first aspect, a fault handling method for a distributed storage system is provided, where the distributed storage system includes a plurality of storage nodes. The method includes:
determining a fault state of the distributed storage system according to at least one faulty storage node among the plurality of storage nodes, where the fault state indicates whether all of the at least one faulty storage node can be repaired within a first preset duration; and
sending the fault state to each of the plurality of storage nodes.
Based on the above implementation, the fault state of the distributed storage system is determined according to at least one faulty storage node among the plurality of storage nodes, so the fault state does not have to wait until all storage nodes have failed. Once the fault state is determined, it can be sent immediately to every storage node in the distributed storage system so that each storage node performs fault handling according to the determined fault state, which reduces the time the distributed storage system takes to return to normal.
In a possible implementation, the method further includes:
performing fault handling according to the fault state of the distributed storage system.
In a possible implementation, determining the fault state of the distributed storage system according to at least one faulty storage node among the plurality of storage nodes includes:
determining at least one storage node that has failed in the distributed storage system and target data in the at least one storage node that cannot be accessed; and
determining the fault state according to the at least one storage node and the target data.
In a possible implementation, determining the fault state according to the at least one storage node and the target data includes:
when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determining the fault state to be a first fault state, where the first fault state indicates that all of the at least one storage node can be repaired within the first preset duration.
Based on the above possible implementation, in the first fault state the target device does not power off the file system. If all of the at least one storage node are repaired within the first preset duration, powering off the file system is avoided, which reduces the time spent repairing the file system and lets the distributed storage system resume service as soon as possible, ensuring quality of service.
In a possible implementation, determining the fault state according to the at least one storage node and the target data includes:
when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determining the fault state according to a failure scenario of the distributed storage system, where the failure scenario indicates whether the at least one storage node failed simultaneously.
In a possible implementation, the preset condition includes either of the following:
the ratio of the data volume of the target data to a first preset data volume is greater than a preset ratio, where the first preset data volume is the total volume of all data stored in the distributed storage system;
the data volume of the target data is greater than a second preset data volume.
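The trigger described above (failed-node count above the redundancy, plus a data-volume condition) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the byte units and the idea of passing whichever threshold is configured are all assumptions made for the example.

```python
from typing import Optional


def meets_preset_condition(
    target_bytes: int,
    total_bytes: int,
    preset_ratio: Optional[float] = None,
    second_preset_bytes: Optional[int] = None,
) -> bool:
    """Check the inaccessible (target) data volume against the configured threshold.

    Hypothetical helper: a deployment is assumed to configure either the
    ratio threshold or the absolute threshold.
    """
    checks = []
    if preset_ratio is not None and total_bytes > 0:
        # Condition 1: share of all stored data that is inaccessible.
        checks.append(target_bytes / total_bytes > preset_ratio)
    if second_preset_bytes is not None:
        # Condition 2: absolute volume of inaccessible data.
        checks.append(target_bytes > second_preset_bytes)
    return any(checks)


def fault_state_required(failed_count: int, redundancy: int, condition_met: bool) -> bool:
    """A fault state is determined only when failures exceed the redundancy
    and the data-volume condition holds."""
    return failed_count > redundancy and condition_met
```

For example, with a redundancy of 2, three failed nodes and 30% of the data inaccessible against a preset ratio of 0.2, both checks pass and a fault state would be determined.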
In a possible implementation, after determining the fault state according to the at least one storage node and the target data, the method further includes:
when the fault state is the first fault state, if not all of the at least one storage node have been repaired within the first preset duration, updating the fault state from the first fault state to a second fault state, where the second fault state indicates that the at least one storage node cannot all be repaired within the first preset duration.
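The escalation from the first fault state to the second can be sketched as a pure function over timestamps. The names `FaultState` and `refresh_fault_state`, and the use of seconds as the time unit, are assumptions for illustration only.

```python
from enum import Enum


class FaultState(Enum):
    FIRST = "first"    # all failed nodes expected to be repaired in time
    SECOND = "second"  # not all failed nodes can be repaired in time


def refresh_fault_state(
    state: FaultState,
    declared_at: float,
    now: float,
    first_preset_duration: float,
    all_repaired: bool,
) -> FaultState:
    """Escalate FIRST to SECOND once the first preset duration has elapsed
    without every failed node being repaired; otherwise keep the state."""
    expired = now - declared_at >= first_preset_duration
    if state is FaultState.FIRST and expired and not all_repaired:
        return FaultState.SECOND
    return state
```

Passing the current time in explicitly, rather than reading a clock inside the function, keeps the sketch deterministic and easy to test.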
In a possible implementation, before determining the fault state of the distributed storage system according to the failure scenario of the distributed storage system, the method further includes:
determining the failure scenario according to the times at which the at least one storage node failed.
In a possible implementation, determining the failure scenario according to the times at which the at least one storage node failed includes:
when all of the at least one storage node failed within a target duration, determining the failure scenario to be a first failure scenario; otherwise, determining the failure scenario to be a second failure scenario, where the first failure scenario indicates that the at least one storage node failed simultaneously and the second failure scenario indicates that the at least one storage node failed at different times.
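One way to read "all failed within a target duration" is that the gap between the earliest and the latest failure does not exceed the target duration. The sketch below uses that reading, which is an assumption, as are the function name and the string labels.

```python
from typing import Sequence


def classify_scenario(failure_times: Sequence[float], target_duration: float) -> str:
    """Return "first" (simultaneous failures) or "second" (staggered failures).

    Assumed interpretation: the failures count as simultaneous when the
    spread between the earliest and latest failure time is within the
    target duration.
    """
    if not failure_times:
        raise ValueError("at least one failure time is required")
    spread = max(failure_times) - min(failure_times)
    return "first" if spread <= target_duration else "second"
```

With a target duration of 2 seconds, failures at 100.0, 100.5 and 101.0 fall in the first scenario, while failures at 100.0 and 400.0 fall in the second.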
In a possible implementation, determining the fault state of the distributed storage system according to the failure scenario of the distributed storage system includes:
if the failure scenario is the first failure scenario, determining the fault state according to the fault type of each of the at least one storage node, where the first failure scenario indicates that the at least one storage node failed simultaneously and the fault type indicates whether the fault of a storage node can be repaired within a second preset duration;
if the failure scenario is the second failure scenario, determining the fault state according to the fault type of a first storage node, which is the storage node that failed latest among the at least one storage node, where the second failure scenario indicates that the at least one storage node failed at different times.
In a possible implementation, determining the fault state according to the fault type of each of the at least one storage node includes:
when the fault type of every one of the at least one storage node is a first fault type, determining the fault state to be the first fault state, where the first fault type indicates that the fault of a storage node can be repaired within the second preset duration and the first fault state indicates that all of the at least one storage node can be repaired within the first preset duration;
when the fault type of a target number of storage nodes among the at least one storage node is a second fault type: if the target number is less than or equal to the redundancy of the distributed storage system, determining the fault state to be the first fault state; otherwise, determining the fault state to be a second fault state, where the second fault type indicates that the fault of a storage node cannot be repaired within the second preset duration and the second fault state indicates that the at least one storage node cannot all be repaired within the first preset duration.
In a possible implementation, determining the fault state according to the fault type of the first storage node, which failed latest among the at least one storage node, includes:
when the fault type of the first storage node is the first fault type, determining the fault state to be the first fault state, where the first fault type indicates that the fault of a storage node can be repaired within the second preset duration and the first fault state indicates that all of the at least one storage node can be repaired within the first preset duration;
when the fault type of the first storage node is the second fault type, determining the fault state to be the second fault state, where the second fault type indicates that the fault of a storage node cannot be repaired within the second preset duration and the second fault state indicates that the at least one storage node cannot all be repaired within the first preset duration.
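The two decision rules above (per-node fault types for simultaneous failures, latest-failed node for staggered failures) can be combined into one function. The sketch below is illustrative only; `FailedNode`, its fields and the string labels are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FailedNode:
    name: str
    failed_at: float   # time at which the node failed
    fault_type: str    # "first": repairable within the second preset duration
                       # "second": not repairable within it


def determine_fault_state(nodes: Sequence[FailedNode], scenario: str, redundancy: int) -> str:
    """Return "first" or "second" for the whole distributed storage system."""
    if not nodes:
        raise ValueError("at least one failed node is required")
    if scenario == "first":
        # Simultaneous failures: count nodes whose fault is of the second type.
        # Zero such nodes means every fault is of the first type.
        target_number = sum(1 for n in nodes if n.fault_type == "second")
        return "first" if target_number <= redundancy else "second"
    # Staggered failures: the node that failed latest decides the state.
    latest = max(nodes, key=lambda n: n.failed_at)
    return "first" if latest.fault_type == "first" else "second"
```

For instance, with a redundancy of 2 and three simultaneous failures of which one is of the second type, the result is the first fault state, because the remaining two faults are expected to clear and the system can tolerate the one that does not.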
In a possible implementation, before determining the fault state of the distributed storage system according to the failure scenario of the distributed storage system, the method further includes:
for any storage node among the at least one storage node, when that storage node has a preset network fault, a preset abnormal power-off fault, a preset misoperation fault, a preset hardware fault or a preset software fault, determining the fault type of that storage node to be the first fault type; otherwise, determining the fault type of that storage node to be the second fault type, where the first fault type indicates that the fault of a storage node can be repaired within the second preset duration and the second fault type indicates that the fault of a storage node cannot be repaired within the second preset duration.
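Classifying a node's fault type amounts to a lookup against a configured set of preset faults. In the sketch below, the categories mirror the five listed above, but every fault code (for example `link_down`) is a made-up placeholder; the text does not say which concrete faults are preset.

```python
from typing import FrozenSet, Tuple

# Hypothetical (category, code) pairs standing in for the "preset" faults.
PRESET_FAULTS: FrozenSet[Tuple[str, str]] = frozenset({
    ("network", "link_down"),
    ("power", "abnormal_power_off"),
    ("misoperation", "cable_removed_by_mistake"),
    ("hardware", "memory_error"),
    ("software", "process_crash"),
})


def classify_fault_type(
    category: str,
    code: str,
    preset_faults: FrozenSet[Tuple[str, str]] = PRESET_FAULTS,
) -> str:
    """Return "first" for a preset fault (repairable within the second
    preset duration) and "second" for anything else."""
    return "first" if (category, code) in preset_faults else "second"
```

Any fault that is not in the preset set, such as a hypothetical `("hardware", "disk_destroyed")`, falls through to the second fault type.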
In a possible implementation, after performing fault handling according to the fault state of the distributed storage system, the method further includes:
when repair of the at least one storage node is complete, sending a repair completion response to each device in the distributed storage system, where the repair completion response indicates that there is no faulty device in the distributed storage system.
In a second aspect, a fault handling method for a distributed storage system is provided, where the distributed storage system includes a plurality of storage nodes. The method includes:
sending an access request to a target storage node in the distributed storage system; and
receiving a response returned by the target storage node, where the response contains the fault state of the distributed storage system and the fault state indicates whether all of at least one faulty storage node can be repaired within a first preset duration.
In a possible implementation, after receiving the response returned by the target storage node, the method further includes:
performing fault handling based on the fault state contained in the response.
In a possible implementation, the fault identifier of the fault state is either a first fault identifier or a second fault identifier, where the first fault identifier indicates a first fault state and the second fault identifier indicates a second fault state. The first fault state indicates that all of at least one storage node can be repaired within the first preset duration, the second fault state indicates that the at least one storage node cannot all be repaired within the first preset duration, and the at least one storage node is a storage node that has failed in the distributed storage system.
In a possible implementation, performing fault handling based on the fault state contained in the response includes:
when the access request is sent by a target client in the distributed storage system based on a target virtual machine and the target virtual machine is a VMWare virtual machine, if the fault state is the first fault state, not responding to the target virtual machine for the access request, where the first fault state indicates that all of at least one storage node can be repaired within the first preset duration;
when the access request is sent by the target client based on the target virtual machine and the target virtual machine is a VMWare virtual machine, if the fault state is the second fault state, returning a storage exception message to the target virtual machine, where the second fault state indicates that the at least one storage node cannot all be repaired within the first preset duration.
In a possible implementation, performing fault handling based on the fault state contained in the response includes:
when the access request is sent by a target client in the distributed storage system based on a target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is the first fault state, sending a retry request to the target virtual machine, where the retry request instructs the target virtual machine to reissue the access request and the first fault state indicates that all of at least one storage node can be repaired within the first preset duration;
when the access request is sent by the target client based on the target virtual machine and the target virtual machine is not a VMWare virtual machine, if the fault state is the second fault state, returning to the target virtual machine a target error that the target virtual machine can recognize, where the target error indicates a storage medium failure and the second fault state indicates that the at least one storage node cannot all be repaired within the first preset duration.
Based on the above possible implementations, different fault handling manners are provided, which makes the fault handling provided by the embodiments of the present invention more broadly applicable.
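The four cases above form a small decision table keyed on the virtual machine kind and the fault state. A minimal sketch follows; the enum members and the function name are illustrative assumptions, and the actions are returned as labels rather than performed.

```python
from enum import Enum


class FaultState(Enum):
    FIRST = "first"
    SECOND = "second"


class Action(Enum):
    NO_RESPONSE = "do not respond to the access request"
    STORAGE_EXCEPTION = "return a storage exception message"
    RETRY_REQUEST = "send a retry request"
    STORAGE_MEDIUM_ERROR = "return a recognizable storage medium error"


def choose_action(fault_state: FaultState, is_vmware_vm: bool) -> Action:
    """Pick the handling action for an access request issued via a virtual machine."""
    if is_vmware_vm:
        # Withholding the response lets a VMWare virtual machine treat the
        # outage as APD; a storage exception corresponds to PDL.
        if fault_state is FaultState.FIRST:
            return Action.NO_RESPONSE
        return Action.STORAGE_EXCEPTION
    if fault_state is FaultState.FIRST:
        return Action.RETRY_REQUEST
    return Action.STORAGE_MEDIUM_ERROR
```

Keeping the table in one function makes it straightforward to add further virtual machine kinds later without touching the callers.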
In a possible implementation, before sending the access request to any storage node in the distributed storage system, the method further includes:
receiving a target access request sent by a target client in the distributed storage system;
and sending the access request to the target storage node in the distributed storage system includes:
sending the access request to the target storage node in the distributed storage system based on the target access request.
In a possible implementation, after receiving the response returned by the target storage node, the method further includes:
receiving a repair completion response returned by the target storage node, where the repair completion response indicates that there is no faulty device in the distributed storage system.
In a third aspect, a distributed storage system is provided. The distributed storage system includes a supervisory node and a plurality of storage nodes.
The supervisory node is configured to:
determine a fault state of the distributed storage system according to at least one faulty storage node among the plurality of storage nodes, where the fault state indicates whether all of the at least one faulty storage node can be repaired within a first preset duration; and
send the fault state to each of the plurality of storage nodes.
Each of the plurality of storage nodes is configured to receive the fault state.
In a possible implementation, the fault identifier of the fault state is either a first fault identifier or a second fault identifier, where the first fault identifier indicates the first fault state and the second fault identifier indicates the second fault state. The first fault state indicates that all of the at least one storage node can be repaired within the first preset duration, and the second fault state indicates that the at least one storage node cannot all be repaired within the first preset duration.
In a possible implementation, each of the plurality of storage nodes is further configured to, after receiving the fault state, suspend any access request it subsequently receives and perform fault handling based on the received fault state.
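The storage-node behavior described above (store the broadcast fault state, then suspend later access requests) can be sketched with a small class. The class name, the dictionary-shaped responses and the choice to echo the fault state back to the requester are assumptions for illustration; the echo follows the second aspect, in which the response carries the fault state.

```python
from collections import deque
from typing import Deque, Dict, Optional


class StorageNode:
    """Toy storage node that suspends requests while a fault state is set."""

    def __init__(self) -> None:
        self.fault_state: Optional[str] = None
        self.suspended: Deque[str] = deque()

    def receive_fault_state(self, fault_state: str) -> None:
        # Called when the supervisory node sends the fault state.
        self.fault_state = fault_state

    def handle_access_request(self, request: str) -> Dict[str, str]:
        if self.fault_state is None:
            # No fault state received: serve the request normally.
            return {"status": "ok", "request": request}
        # Fault state received: suspend the request and report the state.
        self.suspended.append(request)
        return {"status": "fault", "fault_state": self.fault_state}
```

A short usage example: after `node.receive_fault_state("first")`, a call to `node.handle_access_request("read:42")` queues the request in `node.suspended` and returns the fault state instead of data.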
In a fourth aspect, a fault handling apparatus is provided for performing the above fault handling method for a distributed storage system. Specifically, the fault handling apparatus includes functional modules for performing the fault handling method provided in the first aspect or in any optional manner of the first aspect.
In a fifth aspect, a fault handling apparatus is provided for performing the above fault handling method for a distributed storage system. Specifically, the fault handling apparatus includes functional modules for performing the fault handling method provided in the second aspect or in any optional manner of the second aspect.
In a sixth aspect, a computer device is provided. The computer device includes a processor and a memory; the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the operations performed by the above fault handling method for a distributed storage system.
In a seventh aspect, a storage medium is provided. The storage medium stores at least one instruction, and the instruction is loaded and executed by a processor to implement the operations performed by the above fault handling method for a distributed storage system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a distributed storage system provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network environment of a distributed storage system provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the interaction between devices in a distributed storage system provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present invention;
FIG. 5 is a flowchart of a fault handling method for a distributed storage system provided by an embodiment of the present invention;
FIG. 6 is a flowchart of a fault handling method for a distributed storage system provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a fault handling apparatus provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a fault handling apparatus provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
图1是本发明实施例提供的一种分布式存储系统的示意图,参见图1,该分布式存储系统包括至少一个客户端101、多个存储节点102以及监管节点103。其中,客户端101用于为用户提供数据存储以及数据读取的业务,也即是,客户端101可以将用户上传的数据存储在存储节点102中,也可以从存储节点102中读取数据。FIG. 1 is a schematic diagram of a distributed storage system according to an embodiment of the present invention. Referring to FIG. 1 , the distributed storage system includes at least one client 101 , multiple storage nodes 102 , and a supervisory node 103 . The client 101 is used to provide users with data storage and data reading services, that is, the client 101 can store the data uploaded by the user in the storage node 102 or read data from the storage node 102 .
The storage node 102 stores data written by the client 101 and returns data to the client 101. The returned data may be data that the client 101 requested to read, or it may be the fault state of the distributed storage system delivered by the supervisory node 103, so that the client 101 can handle the fault according to that fault state.
The supervisory node 103 monitors whether each storage node 102 in the distributed storage system has failed. When the number of failed storage nodes in the distributed storage system exceeds the redundancy of the distributed storage system, the normal operation of services may be affected. The supervisory node 103 can therefore determine the fault state of the distributed storage system according to the failed storage nodes and deliver the determined fault state to all storage nodes 102. A storage node 102 can inform the client 101 of the fault state of the distributed storage system, so that the client 101 or the storage node 102 can perform fault handling according to that fault state.
The baseboard management controller (BMC) of each storage node 102 can monitor in real time whether its own storage node has failed. When the BMC of any one of the storage nodes 102 detects that its storage node has failed, the BMC can store the cause of the failure and the time of the failure, and can also send the cause and the time of the failure to the supervisory node 103, so that the supervisory node 103 learns whether that storage node has failed.
When the BMC of a storage node does not send the cause and time of the failure to the supervisory node 103, the supervisory node 103 can learn whether each storage node 102 has failed by accessing the storage nodes 102. If a storage node has failed, the supervisory node 103 can obtain the cause and time of the failure from the failed storage node.
Whether the supervisory node 103 receives the cause and time of the failure from the failed storage node or actively obtains them from the failed storage node, the supervisory node 103 can store the cause and time of the failure so that the fault type and the fault scenario can subsequently be determined from them. The fault type is described in step 603 below, and the fault scenario is described in step 602 below.
Storing the cause and time of the failure of a failed storage node may include storing them in a fault table. The fault table can store a number, a storage node identifier, a failure time, and a failure cause. The number indicates the ordinal position of the failed storage node, that is, whether it is the first, second, or a later failed storage node. A storage node identifier uniquely identifies one storage node; it may be the Internet Protocol (IP) address of the storage node, the media access control (MAC) address of the storage node, or the number of the storage node within the distributed storage system. The embodiments of the present invention do not specifically limit the storage node identifier. The failure time is the time at which the storage node failed, and the failure cause is the reason the storage node failed.
For example, Table 1 shows a fault table. According to Table 1, two storage nodes in the distributed storage system have currently failed: the storage node identified as X and the storage node identified as Y. The storage node identified as X failed at time A with cause D, and the storage node identified as Y failed at time B with cause C.
Table 1
It should be noted that after a storage node recorded in the fault table has been repaired, the supervisory node can delete the information about the repaired storage node from the fault table. The supervisory node can therefore determine the number of failed storage nodes in the distributed storage system from the number of the last storage node in the fault table.
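The fault table described above can be sketched as a small in-memory structure. This is only an illustration under stated assumptions: the names `FaultRecord`, `FaultTable`, `record_fault`, `mark_repaired`, and `faulty_count` are hypothetical, and the sketch assumes that rows are renumbered when a repaired node is deleted, which the description implies but does not state.

```python
from dataclasses import dataclass


@dataclass
class FaultRecord:
    node_id: str      # IP address, MAC address, or in-system number of the node
    fault_time: str   # time at which the node failed
    fault_cause: str  # reason the node failed


class FaultTable:
    """Hypothetical in-memory fault table kept by the supervisory node."""

    def __init__(self):
        # The list position plus one plays the role of the "number" column.
        self._records = []

    def record_fault(self, node_id, fault_time, fault_cause):
        self._records.append(FaultRecord(node_id, fault_time, fault_cause))

    def mark_repaired(self, node_id):
        # Assumption: deleting a repaired node renumbers the remaining rows.
        self._records = [r for r in self._records if r.node_id != node_id]

    def rows(self):
        return [(i + 1, r.node_id, r.fault_time, r.fault_cause)
                for i, r in enumerate(self._records)]

    def faulty_count(self):
        # The count is read from the number of the last row.
        rows = self.rows()
        return rows[-1][0] if rows else 0


# The two entries of the example: node X (time A, cause D), node Y (time B, cause C).
table = FaultTable()
table.record_fault("X", "A", "D")
table.record_fault("Y", "B", "C")
```

With the two example entries loaded, `table.faulty_count()` reads the count from the last row; calling `table.mark_repaired("X")` leaves only the row for node Y.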
When a storage node in the distributed storage system fails, the file system in the distributed storage system may be damaged, and when the file system is damaged, the metadata in the file system may contain errors. Metadata is system data that describes the characteristics of a file, such as access permissions, the file owner, and the distribution of the data blocks in the file. When the metadata contains errors, the data blocks of the file indicated by the metadata may become inaccessible.
In some embodiments, when a client fails to access a data block in any storage node, the client can send the data amount of the data block and a data identifier that uniquely identifies the data block to the supervisory node. After receiving the data amount and the data identifier, the supervisory node stores them, specifically in a data table. The data table can store a total data amount, data identifiers, and the data amount corresponding to each data identifier, where the total data amount is the amount of all data in the distributed storage system that currently cannot be accessed.
For example, Table 2 shows a data table. According to Table 2, the data in the distributed storage system that currently cannot be accessed is the data in the data block indicated by data identifier M and the data in the data block indicated by data identifier N. The total amount of currently inaccessible data is 30 kilobytes (KB), consisting of 10 KB of data in the data block indicated by data identifier M and 20 KB of data in the data block indicated by data identifier N.
Table 2
It should be noted that when the client again accesses a data block that could not be accessed, and the access succeeds, the client sends the supervisory node a data access success response carrying the data identifier of the data block. The data access success response indicates that the data in the data block can be accessed. After receiving the response, the supervisory node can delete the data amount corresponding to the data identifier from the data table and update the total data amount in the data table. For example, if the data access success response carries data identifier M, the supervisory node deletes the information related to data identifier M from the data table and updates the total data amount to 20 KB. It should also be noted that the data table may store the identifier of the storage node corresponding to each data identifier, to indicate which data block in which storage node cannot be accessed.
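The bookkeeping for the data table can be sketched as follows. The class and method names (`DataTable`, `report_access_failure`, `report_access_success`, `total_kb`) are hypothetical, and the sketch derives the total on demand rather than storing it, which is one possible design and not necessarily the one used in the embodiment.

```python
class DataTable:
    """Hypothetical table of data blocks that currently cannot be accessed."""

    def __init__(self):
        # Maps a data identifier to the inaccessible data amount in KB.
        self._sizes_kb = {}

    def report_access_failure(self, data_id, size_kb):
        # Called when a client reports that it failed to access a data block.
        self._sizes_kb[data_id] = size_kb

    def report_access_success(self, data_id):
        # Called when a data access success response carrying data_id arrives.
        self._sizes_kb.pop(data_id, None)

    @property
    def total_kb(self):
        # The total is derived from the entries, so it never goes stale.
        return sum(self._sizes_kb.values())


# Reproduce the example: blocks M (10 KB) and N (20 KB) are inaccessible,
# then a success response for M arrives.
data_table = DataTable()
data_table.report_access_failure("M", 10)
data_table.report_access_failure("N", 20)
total_before = data_table.total_kb
data_table.report_access_success("M")
total_after = data_table.total_kb
```

Here `total_before` corresponds to the 30 KB total of Table 2 and `total_after` to the 20 KB total after the entry for M is deleted.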
In some embodiments, because the distributed storage system handles a large volume of services, the numbers of clients 101 and storage nodes 102 may be large. To facilitate data transmission between the clients 101 and the storage nodes 102, at least one service switch may be provided at the application layer where the clients 101 reside, to implement interaction between the clients 101 and the storage nodes 102. To facilitate data transmission among the storage nodes 102, at least one storage switch may be provided at the storage layer where the storage nodes 102 reside, to implement interaction among the storage nodes 102. To facilitate data transmission between the supervisory node 103 and the storage nodes 102, a supervisory switch may be provided, to implement interaction between the supervisory node 103 and the storage nodes 102.
As can be seen from the above description, the distributed storage system needs to provide monitoring services in addition to business services, and different services can be carried over different networks. To connect the client, the storage nodes, and the supervisory node over a network, each of them may have at least one network port installed. The network ports can connect to different networks, and different networks can carry data for different services. The network ports may be a service network port connected to the service network, a supervisory network port connected to the supervisory network, and a BMC network port connected to the BMC network.
To illustrate the network environment in the distributed storage system, refer to FIG. 2, a schematic diagram of a network environment of a distributed storage system according to an embodiment of the present invention. The networks in the distributed storage system may include a service network, a supervisory network, and a BMC network.
The service network is the network that storage nodes use for heartbeat, data synchronization, and mirroring. For example, to synchronize data block 1 stored in storage node 1 to storage node 2, storage node 1 can send data block 1 to storage node 2 over the service network through its service network port. Storage node 2 then receives data block 1 through its own service network port and stores data block 1 locally.
The supervisory network is the network used to monitor whether storage nodes have failed and to query information. The supervisory network can carry the fault state of the distributed storage system delivered by the supervisory node, and can also be used to query which storage nodes have failed. In some possible implementations, the supervisory node sends the fault state of the distributed storage system over the supervisory network, from its supervisory network port to the supervisory network port of a storage node, and the storage node receives the fault state sent by the supervisory node from the supervisory network through its own supervisory network port. If the storage node subsequently receives a service request from the client (that is, a SCSI request as described below), it may leave the received service request unprocessed and directly return the delivered fault state to the client, so that the client can perform the corresponding fault handling according to the fault state.
The BMC network is the network used to manage the BMCs. By accessing the BMC network port of the BMC network, the supervisory node can monitor the state of a BMC and determine from that state whether the storage node has failed. It should be noted that the BMC network is optional. In some implementations, storage node failures may be monitored by means other than the BMC, so the distributed storage system may omit the BMC network and perform monitoring directly over the supervisory network.
When every storage node in the distributed storage system is connected to the service network, the supervisory network, and the BMC network, the supervisory node can receive status information about the storage nodes, such as whether they have failed, from all three networks in real time.
To further illustrate the interaction among the client, the storage nodes, and the supervisory node, refer to FIG. 3, a schematic diagram of interaction between devices in a distributed storage system according to an embodiment of the present invention. In FIG. 3, a storage node may have installed at least one object storage device (OSD) process, a SCSI processing process, and a node monitor service (NMS) agent process.
An OSD process may correspond to one or more storage media in the storage node that are used to store data; a storage medium may be a hard disk. The OSD process manages access requests for the one or more storage media. An access request indicates that data to be processed is to be processed. Processing the data may include reading a data block stored in the one or more storage media, where the data block contains the data to be processed, and may also include writing the data to be processed to the one or more storage media. When an access request is sent using SCSI, the access request can be regarded as a SCSI request.
The SCSI processing process obtains the SCSI request sent by the client from the service network, converts and decomposes the SCSI request into multiple SCSI sub-requests, and delivers the SCSI sub-requests to the corresponding OSD processes. For example, suppose a SCSI request carries logical block addresses (LBAs) 100-200 for the data to be read. Because the storage locations indicated by LBAs 100-150 are in storage medium 1 of storage node 1 and the storage locations indicated by LBAs 151-200 are in storage medium 2 of storage node 1, the SCSI processing process can convert and decompose the SCSI request into two SCSI sub-requests. SCSI sub-request 1 requests a read of the data stored at LBAs 100-150 in storage medium 1, and SCSI sub-request 2 requests a read of the data stored at LBAs 151-200 in storage medium 2. The SCSI processing process can then send SCSI sub-request 1 to the OSD process corresponding to storage medium 1 and SCSI sub-request 2 to the OSD process corresponding to storage medium 2.
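The decomposition of a SCSI request by LBA range can be sketched as below. The function name `split_request` and the `layout` list of `(first_lba, last_lba, medium)` tuples are hypothetical; the description does not say how the SCSI processing process learns which medium holds which LBA range, so the layout is simply passed in.

```python
def split_request(start_lba, end_lba, layout):
    """Split an inclusive LBA range into sub-requests, one per storage medium.

    `layout` is a hypothetical list of (first_lba, last_lba, medium) tuples
    describing which medium holds which inclusive LBA range.
    """
    sub_requests = []
    for first, last, medium in layout:
        lo = max(start_lba, first)
        hi = min(end_lba, last)
        if lo <= hi:  # the request overlaps this medium's range
            sub_requests.append((medium, lo, hi))
    return sub_requests


# Layout from the example: LBAs 100-150 on medium 1, LBAs 151-200 on medium 2.
layout = [(100, 150, "medium 1"), (151, 200, "medium 2")]
subs = split_request(100, 200, layout)
```

For the example request covering LBAs 100-200, `subs` holds one sub-request per storage medium, each of which would be delivered to the OSD process for that medium.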
An NMS agent process receives the fault state of the distributed storage system delivered by the supervisory node and delivers the received fault state to all OSD processes of a storage node. For example, the supervisory node sends the fault state of the distributed storage system to a storage node over the supervisory network. The NMS agent process in the storage node obtains the fault state sent by the supervisory node from the supervisory network and delivers it to each OSD process in the storage node. After an OSD process has received the fault state, if it then receives a SCSI sub-request or SCSI request from the SCSI processing process, it directly returns the received fault state to the SCSI processing process, so that the device on which the SCSI processing process is installed can perform fault handling according to the received fault state.
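The fan-out from the NMS agent process to the OSD processes, and the short-circuit that an OSD process applies once it holds a fault state, can be sketched as follows. The classes `OSDProcess` and `NMSAgent`, their methods, and the tuple return values are hypothetical and only model the control flow; real processes would communicate over the supervisory and service networks.

```python
class OSDProcess:
    """Hypothetical OSD process that stops serving once a fault state arrives."""

    def __init__(self):
        self.fault_state = None  # no fault state has been received yet

    def handle(self, sub_request):
        if self.fault_state is not None:
            # Do not process the request; hand the fault state back instead.
            return ("fault", self.fault_state)
        return ("ok", f"processed {sub_request}")


class NMSAgent:
    """Hypothetical NMS agent that forwards the fault state to every OSD process."""

    def __init__(self, osds):
        self.osds = osds

    def on_fault_state(self, fault_state):
        for osd in self.osds:
            osd.fault_state = fault_state


osds = [OSDProcess(), OSDProcess()]
agent = NMSAgent(osds)
before = osds[0].handle("read LBA 100-150")
agent.on_fault_state("fault state")  # placeholder value for the delivered state
after = [osd.handle("read LBA 100-150") for osd in osds]
```

Before the fault state is delivered the sub-request is processed normally; afterwards every OSD process returns the fault state to the caller instead of processing the request.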
It should be noted that in some implementations the SCSI processing process is installed in the client rather than in a storage node; the embodiments of the present invention do not specifically limit the device on which the SCSI processing process is installed. For example, suppose a SCSI request in the client's SCSI processing process carries LBAs 0-100 for the data to be read. Because the storage locations indicated by LBAs 0-50 are in storage node 1 and the storage locations indicated by LBAs 51-100 are in storage node 2, the SCSI processing process can convert and decompose the SCSI request into two SCSI sub-requests. SCSI sub-request 1 requests a read of the data stored at LBAs 0-50 in storage node 1, and SCSI sub-request 2 requests a read of the data stored at LBAs 51-100 in storage node 2. The SCSI processing process can then send SCSI sub-request 1 to the OSD process in storage node 1 and SCSI sub-request 2 to the OSD process in storage node 2.
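The client, the storage nodes, and the supervisory node may each be a computer device. To further illustrate the hardware structure of a computer device, refer to FIG. 4, a schematic structural diagram of a computer device according to an embodiment of the present invention. The computer device 400 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 401 and one or more memories 402. The memory 402 stores at least one instruction, which is loaded and executed by the processor 401 to implement the methods provided by the fault handling method embodiments below. The computer device 400 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may include other components for implementing device functions, which are not described in detail here.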
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including instructions, where the instructions can be executed by a processor in a terminal to perform the fault handling method in the following embodiments. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In the embodiments of the present invention, the supervisory node can determine the fault state of the distributed storage system according to the failed storage nodes and deliver the determined fault state to all storage nodes in the distributed storage system. After receiving a read request or write request from a user, the client can send a SCSI request to a storage node in the distributed storage system to fulfill the user's request. When any storage node receives the SCSI request, the storage node does not process it for the time being and instead returns the received fault state to the client, so that the client can perform fault handling based on the returned fault state. In some embodiments, a storage node can also perform fault handling based on the fault state. In one possible implementation, after a storage node has received the fault state sent by the supervisory node, if it then receives a SCSI request from the client, the storage node can perform fault handling according to the fault state.
To further illustrate the above process, refer to FIG. 5, a flowchart of a fault handling method for a distributed storage system according to an embodiment of the present invention. The method specifically includes the following steps.
501. The supervisory node determines at least one failed storage node in the distributed storage system and the target data in the at least one storage node that cannot be accessed.
The target data may be any data stored in the distributed storage system that cannot be accessed, or it may be only the inaccessible data stored in the at least one storage node. The embodiments of the present invention are described using the example in which the target data is the data in the at least one storage node that cannot be accessed.
The supervisory node can determine the at least one storage node and the target data by querying. In one possible implementation, the supervisory node queries the fault table and the data table each time an eighth preset duration elapses, determines the failed storage nodes from the fault table, and determines the inaccessible data from the data table. The eighth preset duration may be 10 minutes or 1 hour; the embodiments of the present invention do not specifically limit the eighth preset duration.
It should be noted that how the failed storage nodes are determined from the fault table and how the inaccessible data is determined from the data table have been described above and are not repeated here.
After performing step 501, the supervisory node can also determine the fault state according to the at least one storage node and the target data. Taken together, determining the at least one failed storage node and its inaccessible target data, and then determining the fault state from them, constitute the process of determining the fault state of the distributed storage system according to at least one failed storage node among the plurality of storage nodes. This process can be implemented as shown in step 502 below.
502. When the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data amount of the target data meets a preset condition, the supervisory node determines the fault state of the distributed storage system to be a first fault state.
The redundancy is the redundancy of the data stored in the distributed storage system, that is, the number of copies of the data stored in the distributed storage system. The fault state of the distributed storage system indicates whether the at least one storage node can all be repaired within a first preset duration. The fault state may be either a first fault state or a second fault state, where the second fault state indicates that the at least one storage node cannot all be repaired within the first preset duration. That is, when the at least one storage node can be repaired within the first preset duration, the fault in the distributed storage system is considered repairable in a short time and the distributed storage system is in the first fault state, which is the transient node down (TND) state. When the at least one storage node cannot be repaired within the first preset duration, the fault in the distributed storage system is considered not repairable in a short time and the distributed storage system is in the second fault state, which is the permanent node down (PND) state. The first preset duration may be 20 minutes or 2 hours; the embodiments of the present invention do not specifically limit the first preset duration.
When the distributed storage system stores a large amount of data across a large number of storage nodes, and the number of failed storage nodes is less than the redundancy of the distributed storage system, the storage nodes that have not failed can reconstruct the inaccessible data of the failed storage nodes from the data that can still be accessed. In that case the normal services of the distributed storage system are not affected, and the failed storage nodes do not need to be repaired. However, if the number of failed storage nodes is greater than the redundancy of the distributed storage system, the storage nodes that have not failed cannot reconstruct the inaccessible data of the failed storage nodes from the data that can still be accessed, which may affect the normal services of the distributed storage system. Whether the data stored in the distributed storage system can be accessed is also an important factor affecting services. When too much target data in the distributed storage system cannot be accessed, the impact on the services provided by the distributed storage system is large and may disrupt their normal operation. The supervisory node can therefore first determine the fault state of the distributed storage system, so that fault handling can be performed quickly according to the fault state and the impact on services is minimized. When the amount of target data is small, the impact on services is relatively small and normal operation may not be affected; to keep providing services to users, the supervisory node may defer fault handling.
The supervisory node can use a preset condition to determine whether the data amount of the target data could affect the normal operation of services. The preset condition may include either of the following: the ratio of the data amount of the target data to a first preset data amount is greater than a preset ratio, where the first preset data amount is the total amount of all data stored in the distributed storage system; or the data amount of the target data is greater than a second preset data amount. When either holds, the data amount of the target data is relatively large and may affect the normal operation of services. Therefore, when the data amount of the target data meets the preset condition, the supervisory node can first set the fault state of the distributed storage system to the first fault state, so that fault handling can be performed quickly according to the fault state and the impact on services is reduced. If the at least one storage node has not all been repaired once the first preset duration has elapsed, the fault state can then be updated to the second fault state. It should be noted that the preset ratio may be 0.4, 0.5, or 0.6; the embodiments of the present invention do not specifically limit the preset ratio, the first preset data amount, or the second preset data amount.
To compare the number of the at least one storage node with the redundancy of the distributed storage system, the supervisory node can query the fault table that it stores and determine the number of the at least one storage node from the fault table. To determine whether the data volume of the target data meets the preset condition, the supervisory node can query the data table that it stores and determine from it the data volume of the target data that is currently inaccessible in the distributed storage system. For example, by querying Table 2 the supervisory node can determine that the target data includes 10 KB of data in the data block indicated by data identifier M and 20 KB of data in the data block indicated by data identifier N.
It should be noted that the process of determining, from the fault table, the number of faulty storage nodes in the distributed storage system was described earlier when the fault table was introduced, so it is not repeated here.
It should be noted that the process shown in step 502 is the process of determining the fault state of the distributed storage system according to the at least one storage node and the target data.
503. The supervisory node sends a first fault identifier, which indicates the first fault state, to all storage nodes in the distributed storage system.
The fault identifier of a fault state is either a first fault identifier or a second fault identifier, where the first fault identifier indicates the first fault state and the second fault identifier indicates the second fault state. The two identifiers may be different; for example, the first fault identifier may be s and the second fault identifier may be t. The embodiments of the present invention do not specifically limit how the first and second fault identifiers are represented.
The supervisory node can send the first fault identifier to the NMS agent process of each storage node, thereby delivering it to all storage nodes and informing them that the current fault state of the distributed storage system is the first fault state.
It should be noted that the process shown in step 503 is the process in which the supervisory node sends the fault state to each of the multiple storage nodes included in the distributed storage system.
It should be noted that, in some embodiments, after determining the fault state, the supervisory node may also perform fault handling according to the fault state of the storage system. The embodiments of the present invention do not specifically limit how the supervisory node performs fault handling.
504. The target storage node in the distributed storage system receives the first fault identifier.
The target storage node is any storage node in the distributed storage system. Each OSD process in the target storage node can obtain the first fault identifier from the NMS agent process of the target storage node. It should be noted that every storage node in the distributed storage system can perform step 504; a faulty storage node may or may not be able to receive the first fault identifier.
505. The target device sends an access request to the target storage node.
The access request instructs the target storage node to read data that it stores or to write data to it. The target device is a device on which a SCSI processing process is installed; it may be the target client or a target storage node, where the target client is any client in the distributed storage system. Step 505 may be implemented by the SCSI processing process in the target device.
Before step 505, the target client in the distributed storage system can send a target access request to the target device. The target access request is used to process first target data, where the first target data includes the data indicated by the access request. The target access request can carry a target storage address, which may be the storage address of the first target data. The target access request may be sent by a target virtual machine installed on the target client. Specifically, the target virtual machine can send the target access request to the SCSI processing process in the target device. This can be triggered by a user action: for example, when the user enters the storage address of the data to be read in the client interface and clicks the read button, the target virtual machine in the client is triggered to send the target access request to the SCSI processing process, requesting the data stored at the address entered by the user.
Then, the target device receives the target access request sent by the target client in the distributed storage system. Step 505 can be implemented as follows: based on the target access request, the target device sends access requests to the target storage node in the distributed storage system. Specifically, after receiving the target access request, the SCSI processing process converts and decomposes it according to the target storage address to obtain multiple access requests. Each access request can carry part of the target storage address, and that part may be an offset address in the storage medium managed by any OSD process in the target storage node. The SCSI processing process then sends each access request to the corresponding OSD process in the target storage node. Converting the target access request into access requests is the SCSI request conversion process described earlier.
506. After the target storage node in the distributed storage system has received the first fault identifier, if it then receives an access request, the target storage node suspends the access request and sends the first fault identifier to the target device.
Step 506 may be implemented by the OSD process in the target storage node that receives the access request. Once the target storage node has received the first fault identifier, it knows that there is a faulty storage node in the distributed storage system and that the current fault state is the first fault state. Because a storage node in the distributed storage system has failed, the target storage node can suspend the access request, leaving it unprocessed for the time being while waiting for the faulty storage node to be repaired automatically or manually.
So that the target device also learns the fault state of the distributed storage system, the target storage node can output the first fault identifier to the target device when it receives the access request sent by the target device. Specifically, after any OSD process in the target storage node receives an access request sent by the SCSI processing process of the target device, that OSD process sends the first fault identifier to the SCSI processing process. Of course, in some embodiments, after determining the fault state of the distributed storage system, the supervisory node can also send the fault state directly to the target device, in which case the storage node does not need to send the fault state to the target device.
It should be noted that the process shown in step 506 is the process in which the target storage node in the distributed storage system, after receiving the fault identifier, outputs the fault identifier if it then receives an access request.
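The behavior of an OSD process in steps 504 and 506 can be sketched as follows. This is a hedged, minimal model in Python: the class `OsdProcess`, its method names, and the dictionary shapes of requests and replies are all hypothetical. Whether suspended requests are replayed after repair is not stated in the text; the sketch simply hands them back to the caller as an assumption.

```python
from collections import deque
from typing import Deque, List, Optional


class OsdProcess:
    """Hypothetical model of one OSD process on a storage node."""

    def __init__(self) -> None:
        self.fault_id: Optional[str] = None    # e.g. "s" or "t"; None means no known fault
        self.suspended: Deque[dict] = deque()  # requests parked while a fault is active

    def on_fault_identifier(self, fault_id: str) -> None:
        # Step 504: record the identifier obtained from the NMS agent process.
        self.fault_id = fault_id

    def on_access_request(self, request: dict) -> dict:
        # Step 506: with a fault identifier present, suspend the request
        # and return the identifier so the target device learns the state.
        if self.fault_id is not None:
            self.suspended.append(request)
            return {"status": "suspended", "fault_id": self.fault_id}
        return {"status": "ok", "fault_id": None}

    def on_repair_complete(self) -> List[dict]:
        # Assumption: clearing the identifier releases the parked requests.
        self.fault_id = None
        released = list(self.suspended)
        self.suspended.clear()
        return released
```

Keeping the suspended requests in a queue, rather than rejecting them, reflects the statement that the request is left unprocessed while the faulty node is being repaired.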
507. The target device receives the first fault identifier returned by the target storage node based on the access request.
Step 507 may be implemented by the SCSI processing process in the target device, and the process shown in step 507 is the process of receiving the fault identifier returned by the target storage node based on the access request. When any OSD process sends the first fault identifier to the SCSI processing process, the SCSI processing process can receive it. It should be noted that the process shown in step 507 is the process of receiving a response returned by the target storage node, where the response contains the fault state of the distributed storage system and the fault state indicates whether the at least one faulty storage node can all be repaired within the first preset duration. The fault state contained in the response is the fault identifier.
508. The target device performs fault handling based on the received first fault identifier.
Step 508 may be performed by the SCSI processing process installed in the target device. After receiving the first fault identifier, the SCSI processing process can perform fault handling based on it, so that the target client can respond accordingly.
In some embodiments, the target virtual machine in the target client that interfaces with the SCSI processing process may or may not be a VMWare virtual machine. Because the target device handles faults differently for different virtual machines, step 508 can be implemented in either of the following modes 1 and 2.
Mode 1: When the access request is sent by the target client in the distributed storage system through a target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is the first fault state, the SCSI processing process does not respond to the target virtual machine for the access request.
When the SCSI processing process receives the first fault identifier, the distributed storage system is in the first fault state, which for a VMWare virtual machine corresponds to the APD state. Because the access request received by the SCSI processing process was sent by the target virtual machine, the SCSI processing process does not respond to the target virtual machine, so that the VMWare virtual machine perceives the distributed storage system as being in the APD state that VMWare itself defines. Moreover, every SCSI processing process in the distributed storage system receives the first fault identifier as soon as it issues an access request to an OSD process, and none of the SCSI processing processes handling access requests responds to the target virtual machine. This simulates all links in the distributed storage system being unresponsive (DOWN). Having received no response from the SCSI processing process, the target virtual machine keeps sending access requests. In this way, fault handling can follow the fault state defined by the VMWare virtual machine even when not all storage nodes in the distributed storage system have failed.
Mode 2: When the access request is sent by the target client through a target virtual machine, and the target virtual machine is not a VMWare virtual machine, if the fault state is the first fault state, the SCSI processing process sends a retry request to the target virtual machine. The retry request instructs the target virtual machine to reissue the access request.
The sense key carried in the retry request may be the Unit Attention (0x6) error code, which can indicate that the state of the storage medium or the link in the storage node has changed, that is, that a fault has occurred. After receiving the retry request, the target virtual machine reissues an access request to the SCSI processing process, so as to achieve the handling described in mode 1 above.
Because the embodiments of the present invention provide different fault handling modes for different virtual machines, the fault handling provided by the embodiments is more widely applicable.
It should be noted that the process shown in step 508 is the process of performing fault handling based on the fault state contained in the response.
509. If all of the at least one storage node is repaired within the first preset duration, the supervisory node sends a repair completion response to each device in the distributed storage system. The repair completion response indicates that there is no faulty device in the distributed storage system.
The devices include the storage nodes and the clients. When all of the at least one storage node is repaired within the first preset duration, there is no faulty device in the distributed storage system. If the supervisory node has stored the first fault identifier that identifies the first fault state, it can delete the first fault identifier and send a repair completion response to each device in the distributed storage system, informing each device that there is no faulty device in the distributed storage system and that it can work normally. After receiving the repair completion response, each device deletes the fault identifier it received earlier and can resume normal operation. It should be noted that the process shown in step 509 is the process of sending a repair completion response to each device in the distributed storage system when repair of the at least one storage node is complete.
In the prior art, when a client receives no response from any storage node, it directly treats the fault state of the distributed storage system as the APD state, and it treats the fault state as the PDL state only when a storage node explicitly returns a storage exception message. If a storage node suffers a long-term fault, it may be unable to return a storage exception message to the client, so the client may determine the fault state to be the APD state. The fault state determined in the prior art is therefore imprecise, and mistaking the PDL state for the APD state prevents repair work on the business side from starting, which ultimately prolongs fault repair. In the embodiments provided by the present invention, every storage node in the distributed storage system knows the fault state of the distributed storage system, and a storage node that has not failed can return the fault state to the target device in response to a SCSI request. The target device therefore knows the fault state of the distributed storage system unambiguously, which improves the accuracy with which the target device determines the fault state.
510. When the fault state is the first fault state, if not all of the at least one storage node is repaired within the first preset duration, the supervisory node updates the fault state from the first fault state to the second fault state.
Because the first preset duration is only a preset length of time, the at least one storage node may not all be repaired within it. In that case, repairing the remaining storage nodes may take longer. Because the time needed to repair the remaining storage nodes is uncertain and may be long, the supervisory node can directly update the fault state from the first fault state to the second fault state.
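The transition made by the supervisory node in steps 509 and 510 can be sketched as a pure function. This is a minimal sketch under stated assumptions: the names `FaultState` and `next_fault_state` are hypothetical, time is measured in seconds, and the window is treated as expired once the elapsed time reaches the first preset duration. The values `"s"` and `"t"` reuse the example identifiers given for the two fault states.

```python
from enum import Enum
from typing import Optional


class FaultState(Enum):
    FIRST = "s"   # expected to be repaired within the first preset duration
    SECOND = "t"  # repair needs longer than the first preset duration


def next_fault_state(
    current: FaultState,
    elapsed_seconds: float,
    first_preset_duration: float,
    unrepaired_nodes: int,
) -> Optional[FaultState]:
    """Return the state the supervisory node should hold, or None once repaired."""
    if unrepaired_nodes == 0:
        # Step 509: everything is repaired, so no fault state remains.
        return None
    if current is FaultState.FIRST and elapsed_seconds >= first_preset_duration:
        # Step 510: the window expired with nodes still faulty.
        return FaultState.SECOND
    return current
```

For example, `next_fault_state(FaultState.FIRST, 60, 60, 1)` yields `FaultState.SECOND`, after which the supervisory node would distribute the second fault identifier.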
511. When the fault state is updated from the first fault state to the second fault state, the supervisory node sends a second fault identifier, which indicates the second fault state, to all storage nodes in the distributed storage system.
The supervisory node sends the second fault identifier to all storage nodes in the same way as it sends the first fault identifier to all storage nodes in step 503, so step 511 is not described again here. The process shown in step 511 is the process of sending the fault state to each of the multiple storage nodes.
512. The target storage node receives the second fault identifier.
The target storage node receives the second fault identifier in the same way as it receives the first fault identifier in step 504, so step 512 is not described again here.
513. The target device sends an access request to the target storage node.
The way in which the target device sends an access request to the target storage node is described in step 505, so step 513 is not described again here.
514. After the target storage node in the distributed storage system has received the second fault identifier, if it then receives an access request, the target storage node suspends the access request and outputs the second fault identifier.
The target storage node suspends the access request and outputs the second fault identifier in the same way as it suspends the access request and outputs the first fault identifier in step 506, so step 514 is not described again here.
515. The target device receives the second fault identifier returned by the target storage node based on the access request.
The target device receives the second fault identifier in the same way as it receives the first fault identifier in step 507, so this is not described again here.
516. The target device performs fault handling based on the received second fault identifier.
Step 516 may be performed by the SCSI processing process of the target device. In some embodiments, the target virtual machine in the client that interfaces with the SCSI processing process may or may not be a VMWare virtual machine. Because the target device handles faults differently for different virtual machines, step 516 can be implemented in either of the following modes 3 and 4.
Mode 3: When the access request is sent by the target client through a target virtual machine, and the target virtual machine is a VMWare virtual machine, if the fault state is the second fault state, the SCSI processing process returns a storage exception message to the VMWare virtual machine.
When the fault identifier is the second fault identifier, the at least one storage node cannot all be repaired within the first preset duration and more time is needed, so the target virtual machine can perform fault handling for the PDL state. So that the target virtual machine can perceive the PDL state, the storage exception message can carry a SCSI error defined by the VMWare virtual machine, such as SK 0x0, ASC&ASCQ 0x0200 or SK 0x5, ASC&ASCQ 0x2500. Such a SCSI error can indicate that the distributed storage system is in the PDL state, so the target virtual machine perceives the PDL state on receiving the storage exception message. The target virtual machine can then power off the file system in the distributed storage system and wait for technicians to repair the fault in the distributed storage system, or handle the faulty storage node using a preferred fault handling method selected according to user-defined fault handling methods, for example by powering off the faulty storage node.
It should be noted that a fault that cannot be repaired in a short time on a storage node in the distributed storage system may cause file system exceptions. To ensure that the file system can be used normally after the storage node fault is repaired, the file system must first be powered off. When the repair is complete, the file system is powered on again and repaired.
Mode 4: When the access request is sent by the target client through a target virtual machine, and the target virtual machine is not a VMWare virtual machine, if the fault state is the second fault state, the target device returns to the target virtual machine a target error that the target virtual machine can recognize. The target error indicates a storage medium fault.
The target error may be a Sense key 0x3 error, that is, a storage medium error (Medium Error), which virtual machines in general can recognize. When the target virtual machine receives the target error, the distributed storage system is in the second fault state. The target device can then power off the distributed file system and wait for technicians to repair the fault in the distributed storage system, or handle the faulty storage node using a preferred fault handling method selected according to user-defined fault handling methods.
It should be noted that the process shown in step 516 is the process in which the target device performs fault handling based on the received fault identifier, that is, the process of performing fault handling based on the fault state contained in the response.
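Modes 1 to 4 amount to a dispatch on the fault state and on whether the target virtual machine is a VMWare virtual machine. The following Python sketch illustrates that dispatch only; `FaultState`, `Reply`, `handle_fault`, and the action labels are hypothetical names, and for mode 3 the sketch picks SK 0x5 with ASC&ASCQ 0x2500, which is just one of the example codes listed for the storage exception message.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FaultState(Enum):
    FIRST = "s"
    SECOND = "t"


@dataclass(frozen=True)
class Reply:
    """Hypothetical description of what the SCSI processing process does."""
    action: str                      # illustrative label, not a protocol field
    sense_key: Optional[int] = None  # SCSI sense key, when one is returned
    asc_ascq: Optional[int] = None   # additional sense code and qualifier


def handle_fault(state: FaultState, is_vmware: bool) -> Reply:
    if state is FaultState.FIRST:
        if is_vmware:
            # Mode 1: stay silent so the virtual machine perceives APD.
            return Reply("no_response")
        # Mode 2: ask for a retry with sense key Unit Attention (0x6).
        return Reply("retry", sense_key=0x6)
    if is_vmware:
        # Mode 3: storage exception message that signals PDL.
        return Reply("storage_exception", sense_key=0x5, asc_ascq=0x2500)
    # Mode 4: Medium Error (sense key 0x3), recognizable by general virtual machines.
    return Reply("medium_error", sense_key=0x3)
```

Keeping the decision in one function makes the asymmetry visible: only the first fault state avoids powering off the file system, while the second fault state always returns an explicit error.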
517. When repair of the at least one storage node is complete, a repair completion response is sent to each device in the distributed storage system.
Step 517 is the same as step 509 and is not described again here. It should be noted that a fault on each storage node may be repaired by the node itself or by a technician; the embodiments of the present invention do not specifically limit how storage nodes are repaired.
It should be noted that, when a client receives the repair completion response, if the file system has been powered off, the client powers on the file system and repairs it. Because the file system stores a large amount of metadata, repairing it requires scanning all metadata in the file system and correcting any erroneous metadata found, so repairing the file system generally takes some time. In the first fault state, the client does not power off the file system. If the at least one storage node can all be repaired within the first preset duration, powering off the file system is avoided, which reduces the time spent repairing the file system and lets the distributed storage system restore business as soon as possible to guarantee quality of service.
In the method shown in the embodiments of the present invention, the fault state of the distributed storage system is determined according to at least one faulty storage node among the multiple storage nodes, so determining the fault state does not have to wait until all storage nodes have failed. Once the fault state is determined, it can be sent immediately to every storage node in the distributed storage system so that each storage node performs fault handling according to it, which reduces the time the distributed storage system takes to return to normal. In addition, the embodiments provide different fault handling modes for different virtual machines, which makes the fault handling more widely applicable. Furthermore, every storage node in the distributed storage system knows the fault state of the distributed storage system, and a storage node that has not failed can return the fault state to the target device in response to an access request, so the target device knows the fault state unambiguously, which improves the accuracy with which it determines the fault state. Finally, in the first fault state the target device does not power off the file system. If the at least one storage node can all be repaired within the first preset duration, powering off the file system is avoided, and once the at least one storage node recovers, the file system and the business can be restored immediately. This reduces the time spent repairing the file system and lets the distributed storage system restore business as soon as possible to guarantee quality of service.
A fault on a storage node may be repairable in a short time or may take a long time to repair, and the repair time of each storage node can affect the repair time of the entire distributed storage system. Therefore, in some embodiments, the fault state of the distributed storage system may also be determined according to the fault type of each storage node. To further illustrate this process, refer to FIG. 6, a flowchart of a fault handling method for a distributed storage system provided by an embodiment of the present invention. The method may include the following steps.
601. The supervisory node determines at least one faulty storage node in the distributed storage system and the target data on the at least one storage node that cannot be accessed.
Step 601 is the same as step 501 and is not described again here.
602. The supervisory node determines the fault scenario of the distributed storage system according to the times at which the at least one storage node failed, where the fault scenario indicates whether the at least one storage node failed simultaneously.
The fault scenario may be either a first fault scenario or a second fault scenario. The first fault scenario indicates that the at least one storage node failed simultaneously; the second fault scenario indicates that the at least one storage node failed at different times.
The supervisory node can look up the time at which each storage node failed in its stored fault table, and can then determine the fault scenario according to whether the failure times of the at least one storage node are the same.
In a possible implementation, when all of the at least one storage node fail within a target duration, the supervisory node determines the fault scenario to be the first fault scenario; otherwise, it determines the fault scenario to be the second fault scenario. Note that the process shown in step 602 is the process of determining the fault scenario according to the times at which the at least one storage node failed.
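The scenario decision in step 602 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `FaultScenario` and `classify_scenario` are hypothetical, and reading "fail within a target duration" as "the gap between the earliest and latest failure does not exceed the target duration" is an assumption.

```python
from enum import Enum


class FaultScenario(Enum):
    FIRST = "simultaneous"   # all failures fall within the target duration
    SECOND = "staggered"     # failures happen at different times


def classify_scenario(failure_times, target_duration):
    """Classify the fault scenario from per-node failure timestamps (seconds).

    Assumption: "within the target duration" means the spread between the
    earliest and the latest failure is at most target_duration.
    """
    if not failure_times:
        raise ValueError("at least one faulty storage node is required")
    spread = max(failure_times) - min(failure_times)
    if spread <= target_duration:
        return FaultScenario.FIRST
    return FaultScenario.SECOND
```

In this sketch the failure timestamps would come from the fault table that the supervisory node maintains.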
603. For any storage node among the at least one storage node, when that storage node has a preset network fault, a preset abnormal power-loss fault, a preset misoperation fault, a preset hardware fault, or a preset software fault, the supervisory node determines the fault type of that storage node to be the first fault type; otherwise, it determines the fault type of that storage node to be the second fault type.
The fault type of a storage node indicates whether the fault on that storage node can be repaired within a second preset duration. The fault type may be either the first fault type or the second fault type; the second fault type indicates that the fault on a storage node cannot be repaired within the second preset duration. The second preset duration may be less than or equal to the first preset duration, and this embodiment of the present invention does not specifically limit the second preset duration.
The preset network fault may include any one of items 1.1 to 1.7 below:
Item 1.1: The service network port of the storage node cannot be accessed, but the supervision network port of the storage node can be accessed. The service network port is the port of the service network used between storage nodes for heartbeat, data synchronization, and mirroring; the supervision network port is the port of the supervision network used to monitor whether storage nodes have failed and to query information.
The supervisory node can send a ping (Packet Internet Groper) request from its own service network port to the service network port of the storage node. The ping request is used to request a connection: if the connection succeeds, the service network port of the storage node is considered accessible; otherwise, it is considered inaccessible. Likewise, the supervisory node can send a ping request from its own supervision network port to the supervision network port of the storage node: if the connection succeeds, the supervision network port is considered accessible; otherwise, it is considered inaccessible.
If the supervisory node cannot access the storage node through the service network port, the storage node has a network fault; if the supervisory node can nevertheless access the storage node through the supervision network port, the fault can be repaired in a short time, and the storage node is considered to have a preset network fault.
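The check in item 1.1 can be sketched as a small predicate. The function name `matches_item_1_1` and the two probe callables are hypothetical; each probe stands in for a ping sent over the corresponding network and is assumed to return `True` when the port answers.

```python
from typing import Callable


def matches_item_1_1(node_address: str,
                     ping_service: Callable[[str], bool],
                     ping_supervision: Callable[[str], bool]) -> bool:
    """Return True when the node matches item 1.1.

    Item 1.1 holds when the service network port cannot be reached
    but the supervision network port can.
    """
    service_ok = ping_service(node_address)
    supervision_ok = ping_supervision(node_address)
    return (not service_ok) and supervision_ok
```

Passing the probes in as callables keeps the decision logic separate from how reachability is actually measured.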
Item 1.2: When the service network and the supervision network are the same target network, the data packets transmitted by the storage node in the target network show a first preset number of lost packets or a second preset number of malformed packets, and the service network port, supervision network port, and baseboard management controller (BMC) network port of the storage node are all inaccessible. The BMC network port is the port of the BMC network used to manage the BMC.
The supervisory node can send a ping request from its own BMC network port to the BMC network port of the storage node. If the connection succeeds, the BMC network port of the storage node is considered accessible; otherwise, it is considered inaccessible.
At the delivery stage, a technician may configure the service network and the supervision network as the same network, namely the target network. In that case, if the data packets transmitted by the storage node in the target network show the first preset number of lost packets or the second preset number of malformed packets, the storage node has a network fault; if, in addition, the supervisory node cannot access the storage node, the fault is regarded as repairable in a short time, and the storage node is considered to have a preset network fault.
Item 1.3: When the service network and the supervision network are the same target network, the data packets transmitted by the storage node in the target network show more than the first preset number of lost packets or more than the second preset number of malformed packets, and the delay of the storage node in transmitting data in the target network is greater than a third preset duration.
Note that when the supervisory node sends a ping request to the storage node and the connection fails, the storage node sends a connection failure response to the supervisory node. The connection failure response indicates that the connection failed and may carry delay information, which indicates the delay of the storage node in transmitting data in the target network. The supervisory node can then check whether the delay indicated by the delay information is greater than the third preset duration.
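The threshold test in item 1.3 can be sketched as below. All parameter names are hypothetical, the source does not fix any of the preset values, and the example assumes the packet counters and the delay have already been collected for the storage node.

```python
def matches_item_1_3(lost_packets, malformed_packets, delay_s,
                     first_preset_number, second_preset_number,
                     third_preset_duration_s):
    """Return True when the node matches item 1.3.

    Item 1.3 holds when lost packets exceed the first preset number OR
    malformed packets exceed the second preset number, AND the transmission
    delay exceeds the third preset duration.
    """
    excessive_errors = (lost_packets > first_preset_number
                        or malformed_packets > second_preset_number)
    return excessive_errors and delay_s > third_preset_duration_s
```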
Item 1.4: When the service network and the supervision network are the same target network, the number of priority-based flow control (PFC) packets from the storage node that appear in the target network is greater than a third preset number, and the storage node is inaccessible.
The supervisory node can detect the priority-based flow control (PFC) packets in the target network and thereby determine whether the number of PFC packets sent by the storage node is greater than the third preset number, and hence whether the storage node meets the preset condition. This embodiment of the present invention does not specifically limit the third preset number.
Item 1.5: When the service network and the supervision network are the same target network, the storage node sends the third preset number of priority-based flow control (PFC) packets in the target network, and the delay of the storage node in transmitting data in the target network is greater than a fourth preset duration.
Note that this embodiment of the present invention does not specifically limit the fourth preset duration.
Item 1.6: When the service network and the supervision network are the same target network, a broadcast storm caused by the storage node occurs in the target network, and the service network port, supervision network port, and BMC network port of the storage node are all inaccessible.
Note that a broadcast storm may occur in the target network when a storage node sends a large number of broadcast packets in the target network.
Item 1.7: When the service network and the supervision network are the same target network, a broadcast storm caused by the storage node occurs in the target network, and the delay of the storage node in the target network is greater than a fifth preset duration.
Note that this embodiment of the present invention does not specifically limit the fifth preset duration.
The preset abnormal power-loss fault may include either of items 2.1 and 2.2 below:
Item 2.1: The service network ports, supervision network ports, and BMC network ports of all storage nodes in a chassis are inaccessible, where the chassis contains the storage node.
A chassis may contain at least one storage node. When the service network ports, supervision network ports, and BMC network ports of all storage nodes in the chassis are inaccessible, all storage nodes in the chassis can be considered powered off. If the storage node is in that chassis, it has also been powered off, and its fault can be repaired simply by powering the chassis back on; the storage node is therefore considered to have a preset abnormal power-loss fault.
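Item 2.1 amounts to an "every port of every node in the chassis is unreachable" check, sketched below. `PortReachability` and `matches_item_2_1` are hypothetical names, and the mapping from node identifier to port reachability is assumed to have been filled in from earlier ping results.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PortReachability:
    service: bool       # service network port answers
    supervision: bool   # supervision network port answers
    bmc: bool           # BMC network port answers

    def all_unreachable(self) -> bool:
        return not (self.service or self.supervision or self.bmc)


def matches_item_2_1(node_id: str,
                     chassis: Mapping[str, PortReachability]) -> bool:
    """Return True when node_id sits in a chassis whose nodes are all dark."""
    if node_id not in chassis:
        return False
    return all(ports.all_unreachable() for ports in chassis.values())
```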
Item 2.2: Within a seventh preset duration, the service network ports, supervision network ports, and BMC network ports of a first target number of storage nodes are all inaccessible, where the first target number of storage nodes includes the storage node.
When the service network ports, supervision network ports, and BMC network ports of the first target number of storage nodes are all inaccessible within the seventh preset duration, all of those storage nodes can be considered to have a preset abnormal power-loss fault. Note that this embodiment of the present invention does not specifically limit the seventh preset duration.
The preset misoperation fault may include: the storage node is actively powered off. For example, when a user presses the shutdown button or restart button of a storage node, the storage node considers itself actively powered off and sends active power-off information to the supervisory node, so that the supervisory node determines that the storage node has a preset misoperation fault.
The preset hardware fault includes: the storage node exits abnormally, the BMC network port of the storage node can be accessed, and the storage node has a loose component.
When a storage node exits abnormally, it can send abnormal-exit information to the supervisory node to indicate that it has exited abnormally. An abnormal exit may be caused by a loose internal component, such as a memory module or a card. A loose component can be restored immediately by reseating it, so the fault is a short-term one. Note that when the storage node detects that a component is poorly connected, the storage node has a loose component and can send loose-component information to the supervisory node, so that the supervisory node can determine from that information that the storage node has a preset hardware fault.
The preset software fault may include any one of items 3.1 to 3.3 below:
Item 3.1: An operating system exception on the storage node causes the storage node to reset abnormally.
For example, insufficient memory on the storage node may leave its operating system unable to continue running so that a reset is required, or a watchdog may trigger an abnormal reset of the storage node. When the storage node resets abnormally, it can send an abnormal-reset message to the supervisory node, so the supervisory node learns of the abnormal reset, which indicates that the storage node has a preset software fault.
Item 3.2: A software exception on the storage node causes the target process of the storage node to exit.
The target process may be an OSD process. When the target process of the storage node exits, the storage node can send a target-process-exit message to the supervisory node, so the supervisory node learns that the target process has exited, which indicates that the storage node has a preset software fault.
Item 3.3: A software exception on the storage node causes the operating system of the storage node to reset.
When the operating system of the storage node resets because of a software exception, the storage node can send an operating-system-reset message to the supervisory node, so the supervisory node learns that the operating system has reset, which indicates that the storage node has a preset software fault.
Note that whenever one of the at least one storage node fails, the supervisory node can use step 603 to determine whether that node's fault type is the first fault type or the second fault type, and can store the fault type of each storage node in the fault table, so that when the supervisory node needs the fault type of any storage node it can obtain it directly from the fault table.
Note that the fault type determination methods in step 603 allow the fault type of each storage node to be determined accurately, and the fault state of the distributed storage system can in turn be determined more accurately from the fault type of each storage node.
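Step 603 as a whole can be sketched as a classification followed by a write to the fault table. The names below (`FaultType`, `PRESET_FAULTS`, `classify_fault_type`, `record_fault`) and the dictionary layout of the fault table are assumptions made for illustration; the source only states that the fault type and failure time are stored in a fault table.

```python
from enum import Enum


class FaultType(Enum):
    FIRST = 1    # matches one of the preset faults listed in step 603
    SECOND = 2   # matches none of them


# Hypothetical labels for the five preset fault categories of step 603.
PRESET_FAULTS = frozenset({
    "network", "abnormal_power_loss", "misoperation", "hardware", "software",
})


def classify_fault_type(detected_faults):
    """First fault type if any detected fault is a preset fault, else second."""
    if PRESET_FAULTS.intersection(detected_faults):
        return FaultType.FIRST
    return FaultType.SECOND


def record_fault(fault_table, node_id, failed_at, detected_faults):
    """Store the failure time and fault type so later steps can look them up."""
    fault_table[node_id] = {
        "failed_at": failed_at,
        "fault_type": classify_fault_type(detected_faults),
    }
    return fault_table[node_id]
```

Keeping both fields in one table lets steps 602, 604, and 605 read from the same record.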
604. When the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, if the fault scenario is the first fault scenario, the supervisory node determines the fault state according to the fault type of each storage node among the at least one storage node.
Because the first fault scenario indicates that the at least one storage node failed simultaneously, the supervisory node can determine the fault state of the distributed storage system according to the fault type of each storage node.
In a possible implementation, determining the fault state according to the fault type of each storage node among the at least one storage node may include the following. When the fault type of every storage node among the at least one storage node is the first fault type, the supervisory node determines the fault state to be the first fault state, which indicates that the at least one storage node can all be repaired within the first preset duration. When the fault type of a target number of storage nodes among the at least one storage node is the second fault type, the supervisory node determines the fault state to be the first fault state if the target number is less than or equal to the redundancy of the distributed storage system; otherwise, it determines the fault state to be the second fault state, which indicates that the at least one storage node cannot all be repaired within the first preset duration.
When the fault type of every storage node among the at least one storage node is the first fault type, the faults in the distributed storage system can be considered repairable in a short time, so the fault state can be determined to be the first fault state. Even if a target number of storage nodes have the second fault type, when the target number is less than or equal to the redundancy of the distributed storage system, those storage nodes have little impact on the distributed storage system, so the fault state can still be determined to be the first fault state. Once the target number exceeds the redundancy of the distributed storage system, those storage nodes have a significant impact on the distributed storage system, so the fault state can be determined to be the second fault state so that the fault can be repaired promptly.
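The rule in step 604 reduces to counting the storage nodes with the second fault type and comparing that count with the redundancy. The sketch below uses hypothetical names (`FaultType`, `FaultState`, `state_for_first_scenario`) and assumes the preconditions of step 604 (node count above the redundancy, data volume meeting the preset condition) have already been checked.

```python
from enum import Enum


class FaultType(Enum):
    FIRST = 1
    SECOND = 2


class FaultState(Enum):
    FIRST = 1    # all faulty nodes repairable within the first preset duration
    SECOND = 2   # not all faulty nodes repairable within that duration


def state_for_first_scenario(fault_types, redundancy):
    """Fault state when the faulty nodes failed simultaneously (step 604)."""
    # "Target number": how many faulty nodes have the second fault type.
    target_number = sum(1 for t in fault_types if t is FaultType.SECOND)
    # Zero second-type nodes is the "all first fault type" case.
    if target_number <= redundancy:
        return FaultState.FIRST
    return FaultState.SECOND
```

Because a count of zero is always within the redundancy, the "every node has the first fault type" branch needs no separate handling here.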
Note that whether the number of the at least one storage node is greater than the redundancy of the distributed storage system, and whether the data volume of the target data meets the preset condition, are described earlier and are not repeated here.
Note that when the fault scenario is the first fault scenario, the supervisory node can determine the fault type of each storage node using any of the preset network fault, the preset abnormal power-loss fault, or the preset misoperation fault from step 603.
605. When the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets the preset condition, if the fault scenario is the second fault scenario, the supervisory node determines the fault state according to the fault type of the first storage node, that is, the last storage node among the at least one storage node to fail.
Because the second fault scenario indicates that the at least one storage node failed at different times, the supervisory node can determine the fault state of the distributed storage system according to the fault type of the last storage node to fail among the at least one storage node; this last storage node to fail is the first storage node.
In a possible implementation, when the fault type of the first storage node is the first fault type, the supervisory node determines the fault state to be the first fault state, which indicates that the at least one storage node can all be repaired within the first preset duration; when the fault type of the first storage node is the second fault type, the supervisory node determines the fault state to be the second fault state, which indicates that the at least one storage node cannot all be repaired within the first preset duration.
Note that the processes shown in steps 604 and 605 are the process of determining the fault state of the distributed storage system according to its fault scenario.
Note that when the fault scenario is the second fault scenario, the supervisory node only needs to determine the fault type of the first storage node and does not need to determine the fault types of all storage nodes. It can determine the fault type of the first storage node using any of the preset network fault, the preset abnormal power-loss fault, the preset misoperation fault, the preset hardware fault, or the preset software fault from step 603.
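Step 605 can be sketched as picking the most recently failed node from the fault table and mapping its fault type directly to the fault state. The names and the fault table layout (`failed_at`, `fault_type`) are hypothetical, chosen only for this illustration.

```python
from enum import Enum


class FaultType(Enum):
    FIRST = 1
    SECOND = 2


class FaultState(Enum):
    FIRST = 1
    SECOND = 2


def state_for_second_scenario(fault_table):
    """Fault state when the faulty nodes failed at different times (step 605).

    fault_table maps node id -> {"failed_at": seconds, "fault_type": FaultType}.
    Only the last node to fail (the "first storage node") is consulted.
    """
    if not fault_table:
        raise ValueError("at least one faulty storage node is required")
    first_storage_node = max(fault_table,
                             key=lambda node: fault_table[node]["failed_at"])
    if fault_table[first_storage_node]["fault_type"] is FaultType.FIRST:
        return FaultState.FIRST
    return FaultState.SECOND
```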
606. The supervisory node sends a fault identifier indicating the fault state to all storage nodes in the distributed storage system.
When the fault state is the first fault state, the supervisory node sends a first fault identifier to all storage nodes in the distributed storage system; when the fault state is the second fault state, the supervisory node sends a second fault identifier to all storage nodes in the distributed storage system. The specific process is the same as step 503 and is not described again here. Note that the process shown in step 606 is the process of sending the fault state to each of the plurality of storage nodes.
607. The target storage node in the distributed storage system receives the fault identifier.
When the fault identifier is the first fault identifier, the target storage node receives the first fault identifier; when the fault identifier is the second fault identifier, the target storage node receives the second fault identifier. The specific process is the same as step 504 and is not described again here.
608. The target device sends an access request to the target storage node.
Step 608 is the same as the process shown in step 505 and is not described again here.
609. After the target storage node in the distributed storage system has received the fault identifier, if it then receives an access request, the target storage node suspends the access request and outputs the fault identifier.
When the fault identifier is the first fault identifier, the target storage node outputs the first fault identifier to the target device; when the fault identifier is the second fault identifier, the target storage node outputs the second fault identifier to the target device. The specific process is the same as step 506 and is not described again here.
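Steps 607 to 609 can be sketched from the storage node's side as follows. The class `TargetStorageNode`, its method names, and the shape of the response dictionary are all hypothetical; the source only states that the node suspends the request and outputs the fault identifier.

```python
class TargetStorageNode:
    """Minimal model of a healthy storage node that relays the fault state."""

    def __init__(self):
        self.fault_identifier = None   # set once the supervisory node reports
        self.suspended_requests = []   # requests held while a fault is active

    def receive_fault_identifier(self, fault_identifier):
        # Step 607: remember the identifier sent by the supervisory node.
        self.fault_identifier = fault_identifier

    def handle_access_request(self, request):
        # Step 609: with an active fault, suspend the request and answer
        # with the fault identifier instead of serving the data.
        if self.fault_identifier is None:
            return {"status": "served", "request": request}
        self.suspended_requests.append(request)
        return {"status": "suspended",
                "fault_identifier": self.fault_identifier}
```

A target device that receives the `suspended` response would then branch on the identifier, as described in steps 610 and 611.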
610. The target device receives the fault identifier returned by the target storage node in response to the access request.
Step 610 is the same as the process shown in step 505 and is not described again here. Note that the process shown in step 610 is the process of receiving the response returned by the target storage node, where the response contains the fault state of the distributed storage system and the fault state indicates whether the at least one faulty storage node can all be repaired within the first preset duration. The fault state contained in the response is the fault identifier.
611. The target device performs fault handling based on the received fault identifier.
When the fault identifier is the first fault identifier, the target device performs fault handling based on the received first fault identifier, and the specific process is the same as the process shown in step 508. When the fault identifier is the second fault identifier, the target device performs fault handling based on the received second fault identifier, and the specific process is the same as the process shown in step 516. Step 611 is therefore not described again here.
Note that every storage node in the distributed storage system knows the fault state of the system, so a storage node that has not failed can return the fault state to the target device in response to an access request. The target device therefore knows the fault state of the distributed storage system explicitly, which improves the accuracy with which the target device determines the fault state.
Note that the embodiments of the present invention provide different fault handling methods for different virtual machines, which makes the fault handling more widely applicable.
Note that the process shown in step 611 is the process of performing fault handling based on the fault state contained in the response.
612. When repair of the at least one storage node is complete, the supervisory node sends a repair completion response to each device in the distributed storage system.
Step 612 is the same as step 509 and is not described again here. Note that when the fault state is the first fault state, if the at least one storage node is completely repaired within the first preset duration, step 612 can be performed directly; if the at least one storage node is not completely repaired within the first preset duration, the supervisory node updates the fault state from the first fault state to the second fault state and jumps back to step 606. Note also that in the first fault state the client does not power off the file system. If the at least one storage node can be repaired within the first preset duration, powering off the file system is avoided, which reduces the time spent repairing the file system and lets the distributed storage system restore service as soon as possible, preserving service quality.
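The escalation described for step 612 can be sketched as a small decision function that the supervisory node might evaluate periodically. The function name, the action labels, and the choice to escalate once the elapsed time reaches the first preset duration are assumptions for illustration.

```python
from enum import Enum


class FaultState(Enum):
    FIRST = 1
    SECOND = 2


def after_repair_check(state, all_repaired, elapsed_s, first_preset_duration_s):
    """Return (new_state, action) for the supervisory node.

    - All nodes repaired: send the repair completion response (step 612).
    - First fault state, but the first preset duration has run out: escalate
      to the second fault state and resend the identifier (back to step 606).
    - Otherwise: keep waiting.
    """
    if all_repaired:
        return state, "send_repair_complete"
    if state is FaultState.FIRST and elapsed_s >= first_preset_duration_s:
        return FaultState.SECOND, "resend_fault_identifier"
    return state, "keep_waiting"
```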
In the method shown in this embodiment of the present invention, the fault state of the distributed storage system is determined according to at least one faulty storage node among the plurality of storage nodes, so the fault state need not wait until all storage nodes have failed. Once the fault state is determined, it can be sent immediately to each storage node in the distributed storage system, so that each storage node performs fault handling according to the determined fault state, which reduces the time the distributed storage system takes to return to normal. In addition, the embodiments of the present invention provide different fault handling manners for different virtual machines, which makes the fault handling manners more widely applicable. In addition, every storage node in the distributed storage system knows the fault state of the distributed storage system. A storage node that has not failed can return the fault state to the target device in response to an access request, so the target device knows the fault state explicitly, which improves the accuracy with which the target device determines the fault state. In addition, in the first fault state, the target device does not power off the file system. If all of the at least one storage node can be repaired within the first preset time period, powering off the file system is avoided, which reduces the time spent repairing the file system, so that the distributed storage system can resume services as soon as possible and service quality is ensured. In addition, the multiple fault type determination methods embodied in step 603 can accurately determine the fault type of each storage node, and the fault state of the distributed storage system can then be determined more accurately according to the fault type of each storage node.
FIG. 7 is a schematic structural diagram of a fault handling apparatus provided by an embodiment of the present invention. The apparatus is applied to a distributed storage system that includes a plurality of storage nodes, and the apparatus includes:
a determining module 701, configured to determine a fault state of the distributed storage system according to at least one faulty storage node among the plurality of storage nodes, where the fault state indicates whether all of the at least one faulty storage node can be repaired within a first preset time period; and
a sending module 702, configured to send the fault state to each of the plurality of storage nodes.
Optionally, the apparatus further includes:
a processing module, configured to send the fault state to each of the plurality of storage nodes.
Optionally, the determining module 701 includes:
a first determining unit, configured to perform step 501; and
a second determining unit, configured to determine the fault state according to the at least one storage node and the target data.
Optionally, the second determining unit is configured to:
when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determine the fault state to be a first fault state, where the first fault state indicates that all of the at least one storage node can be repaired within the first preset time period.
Optionally, the second determining unit is configured to:
when the number of the at least one storage node is greater than the redundancy of the distributed storage system and the data volume of the target data meets a preset condition, determine the fault state according to a fault scenario of the distributed storage system, where the fault scenario indicates whether the at least one storage node failed simultaneously.
Optionally, the preset condition includes any one of the following:
the ratio of the data volume of the target data to a first preset data volume is greater than a preset ratio, where the first preset data volume is the total data volume of all data stored in the distributed storage system; or
the data volume of the target data is greater than a second preset data volume.
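The two alternative preset conditions can be illustrated with a short sketch. The function name `meets_preset_condition` and its parameters are hypothetical, and the sketch assumes that a deployment configures exactly one of the two thresholds.

```python
def meets_preset_condition(target_bytes, total_bytes,
                           preset_ratio=None, second_preset_bytes=None):
    """Hypothetical check of the preset condition on the target data volume.

    Exactly one of preset_ratio / second_preset_bytes is assumed to be set.
    """
    if (preset_ratio is None) == (second_preset_bytes is None):
        raise ValueError("configure exactly one threshold")
    if preset_ratio is not None:
        # Condition 1: share of the target data in all stored data exceeds the preset ratio.
        return total_bytes > 0 and target_bytes / total_bytes > preset_ratio
    # Condition 2: absolute volume of the target data exceeds the second preset volume.
    return target_bytes > second_preset_bytes
```

With 30 units of target data out of 100 stored units, a preset ratio of 0.2 is exceeded while a preset ratio of 0.5 is not.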
Optionally, the apparatus further includes:
an update module, configured to: when the fault state is the first fault state, if not all of the at least one storage node are repaired within the first preset time period, update the fault state from the first fault state to a second fault state, where the second fault state indicates that the at least one storage node cannot all be repaired within the first preset time period.
Optionally, the second determining unit is configured to perform step 602.
Optionally, the second determining unit is configured to: when all of the at least one storage node fail within a target duration, determine the fault scenario to be a first fault scenario; otherwise, determine the fault scenario to be a second fault scenario, where the first fault scenario indicates that the at least one storage node failed simultaneously, and the second fault scenario indicates that the at least one storage node failed at different times.
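One way to read the scenario rule is sketched below. It rests on an assumption that the text does not spell out here: "fail within a target duration" is taken to mean that the earliest and latest failure timestamps are no more than the target duration apart. The name `fault_scenario` and the string labels are hypothetical.

```python
def fault_scenario(failure_times_s, target_duration_s):
    """Hypothetical classifier for the fault scenario.

    failure_times_s: failure timestamps (seconds) of the faulty storage nodes.
    Returns "first" (simultaneous failures) or "second" (failures at different times).
    """
    if not failure_times_s:
        raise ValueError("at least one faulty storage node is required")
    # Assumed interpretation: all failures fall inside one window of target_duration_s.
    spread = max(failure_times_s) - min(failure_times_s)
    return "first" if spread <= target_duration_s else "second"
```

Under this reading, failures at 100.0 s, 100.5 s, and 101.0 s with a 2-second target duration count as simultaneous, while failures 30 seconds apart do not.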
Optionally, the second determining unit includes:
a first determining subunit, configured to perform step 604; and
a second determining subunit, configured to perform step 605.
Optionally, the first determining subunit is configured to:
when the fault type of every storage node among the at least one storage node is a first fault type, determine the fault state to be a first fault state, where the first fault type indicates that the fault of a storage node can be repaired within the second preset time period, and the first fault state indicates that all of the at least one storage node can be repaired within the first preset time period; and
when the fault type of a target number of storage nodes among the at least one storage node is a second fault type, determine the fault state to be the first fault state if the target number is less than or equal to the redundancy of the distributed storage system, and otherwise determine the fault state to be a second fault state, where the second fault type indicates that the fault of a storage node cannot be repaired within the second preset time period, and the second fault state indicates that the at least one storage node cannot all be repaired within the first preset time period.
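The rule applied by the first determining subunit can be sketched as a count compared against the redundancy. The function name `fault_state_from_types` and the string labels are hypothetical; the sketch treats "every node has the first fault type" as the special case in which the target number is zero.

```python
def fault_state_from_types(fault_types, redundancy):
    """Hypothetical mapping from per-node fault types to the system fault state.

    fault_types: one entry per faulty storage node, each "first" or "second".
    redundancy: redundancy of the distributed storage system.
    """
    # Target number: nodes whose fault cannot be repaired within the second preset time period.
    target_number = sum(1 for t in fault_types if t == "second")
    # Up to `redundancy` such nodes can be tolerated in the first fault state.
    return "first" if target_number <= redundancy else "second"
```

For instance, with a redundancy of 2, three faulty nodes of which two have the second fault type still yield the first fault state, while three nodes of the second fault type yield the second fault state.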
Optionally, the second determining subunit is configured to:
when the fault type of the first storage node is a first fault type, determine the fault state to be a first fault state, where the first fault type indicates that the fault of a storage node can be repaired within the second preset time period, and the first fault state indicates that all of the at least one storage node can be repaired within the first preset time period; and
when the fault type of the first storage node is a second fault type, determine the fault state to be a second fault state, where the second fault type indicates that the fault of a storage node cannot be repaired within the second preset time period, and the second fault state indicates that the at least one storage node cannot all be repaired within the first preset time period.
Optionally, the determining module 701 is further configured to perform step 603.
Optionally, the sending module 702 is further configured to perform step 509.
FIG. 8 is a schematic structural diagram of a fault handling apparatus provided by an embodiment of the present invention, where the distributed storage system includes a plurality of storage nodes, and the apparatus includes:
a sending module 801, configured to perform step 608; and
a receiving module 802, configured to receive a response returned by the target storage node, where the response contains the fault state of the distributed storage system, and the fault state indicates whether all of at least one faulty storage node can be repaired within a first preset time period.
Optionally, the apparatus further includes:
a processing module, configured to perform fault handling based on the fault state included in the response.
Optionally, the fault identifier of the fault state includes either a first fault identifier or a second fault identifier, where the first fault identifier indicates a first fault state and the second fault identifier indicates a second fault state; the first fault state indicates that all of at least one storage node can be repaired within a first preset time period, the second fault state indicates that the at least one storage node cannot all be repaired within the first preset time period, and the storage node is a faulty storage node in the distributed storage system.
Optionally, the processing module is configured to:
when the access request is sent by a target client in the distributed storage system based on a target virtual machine, and the target virtual machine is a VMware virtual machine, not respond to the target virtual machine with respect to the access request if the fault state is the first fault state, where the first fault state indicates that all of at least one storage node can be repaired within a first preset time period; and
when the access request is sent by the target client based on the target virtual machine, and the target virtual machine is a VMware virtual machine, return a storage exception message to the target virtual machine if the fault state is the second fault state, where the second fault state indicates that the at least one storage node cannot all be repaired within the first preset time period.
Optionally, the processing module is configured to:
when the target access request is sent by a target client in the distributed storage system based on a target virtual machine, and the target virtual machine is not a VMware virtual machine, send a retry request to the target virtual machine if the fault state is the first fault state, where the retry request instructs the target virtual machine to reissue the access request, and the first fault state indicates that all of at least one storage node can be repaired within a first preset time period; and
when the target access request is sent by the target client based on the target virtual machine, and the target virtual machine is not a VMware virtual machine, return to the target virtual machine a target error that the target virtual machine can recognize if the fault state is the second fault state, where the target error indicates a storage medium failure, and the second fault state indicates that the at least one storage node cannot all be repaired within the first preset time period.
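The four cases handled by the processing module (VMware or not, first or second fault state) can be summarized in a small dispatch sketch. The function name `handle_access_request` and the returned action labels are hypothetical placeholders for the behaviors described above, not identifiers from the patent or from any VMware interface.

```python
def handle_access_request(is_vmware_vm, fault_state):
    """Hypothetical dispatch of the processing module's fault handling.

    fault_state: "first" or "second".
    Returns a label naming the action taken toward the target virtual machine.
    """
    if fault_state not in ("first", "second"):
        raise ValueError("unknown fault state")
    if is_vmware_vm:
        # VMware virtual machine: stay silent while repair is expected in time,
        # otherwise report a storage exception.
        return "no_response" if fault_state == "first" else "storage_exception_message"
    # Other virtual machines: ask for a retry while repair is expected in time,
    # otherwise return a recognizable error indicating a storage medium failure.
    return "retry_request" if fault_state == "first" else "target_error_storage_medium_failure"
```

Keeping the mapping in one function makes it easy to see that each virtual machine type gets one behavior per fault state.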
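Optionally, the receiving module 802 is further configured to receive a target access request sent by a target client in the distributed storage system, where the target access request instructs processing of first target data, and the first target data includes the target data; and
the sending module 801 is configured to send the access request to a target storage node in the distributed storage system based on the target access request.
Optionally, the receiving module 802 is configured to receive a repair completion response returned by the target storage node, where the repair completion response indicates that there is no faulty device in the distributed storage system.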
An embodiment of the present invention further provides a distributed storage system, where the distributed storage system includes a supervisory node and a plurality of storage nodes.
The supervisory node is configured to:
determine a fault state of the distributed storage system according to at least one faulty storage node among the plurality of storage nodes, where the fault state indicates whether all of the at least one faulty storage node can be repaired within a first preset time period; and
send the fault state to each of the plurality of storage nodes.
Each of the plurality of storage nodes is configured to receive the fault state.
Optionally, the fault identifier of the fault state includes either a first fault identifier or a second fault identifier, where the first fault identifier indicates the first fault state and the second fault identifier indicates a second fault state; the first fault state indicates that all of the at least one storage node can be repaired within the first preset time period, and the second fault state indicates that the at least one storage node cannot all be repaired within the first preset time period.
Optionally, each of the plurality of storage nodes is further configured to: after receiving the fault identifier, if the access request is subsequently received, suspend the access request and perform fault handling based on the received fault state.
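The storage-node behavior described above can be sketched as a small state holder. The class `StorageNode`, its method names, and the tuple it returns are all hypothetical; in particular, returning the fault identifier alongside the "suspended" result is only one plausible way to model "fault handling based on the received fault state".

```python
class StorageNode:
    """Hypothetical model of a storage node that suspends requests while a fault is active."""

    def __init__(self):
        self.fault_identifier = None  # None means no fault state has been received
        self.suspended = []           # access requests held back during the fault

    def receive_fault_identifier(self, identifier):
        # Sent by the supervisory node: "first" or "second" in this sketch.
        self.fault_identifier = identifier

    def receive_access_request(self, request):
        if self.fault_identifier is None:
            return ("served", None)
        # A fault state is known: suspend the request and report the state to the sender.
        self.suspended.append(request)
        return ("suspended", self.fault_identifier)
```

A node that has not yet received a fault identifier serves requests normally; once it has, later requests are queued in `suspended` and answered with the fault state.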
It should be noted that each device in the distributed storage system provided above may be a device in Embodiments 5 and 6.
All of the foregoing optional technical solutions may be combined in any manner to form optional embodiments of the present disclosure, and are not described one by one here.
It should be noted that, when the fault handling apparatus provided in the foregoing embodiments handles a fault, the division into the functional modules described above is used only as an example. In practice, the functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or some of the functions described above. In addition, the fault handling apparatus provided in the foregoing embodiments and the embodiments of the fault handling method for a distributed storage system belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
A person of ordinary skill in the art can understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (19)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910741190.5A CN110535692B (en) | 2019-08-12 | 2019-08-12 | Fault handling method, device, computer equipment, storage medium and storage system |
PCT/CN2020/102302 WO2021027481A1 (en) | 2019-08-12 | 2020-07-16 | Fault processing method, apparatus, computer device, storage medium and storage system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910741190.5A CN110535692B (en) | 2019-08-12 | 2019-08-12 | Fault handling method, device, computer equipment, storage medium and storage system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110535692A CN110535692A (en) | 2019-12-03 |
CN110535692B true CN110535692B (en) | 2020-12-18 |
Family
ID=68662506
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910741190.5A Active CN110535692B (en) | 2019-08-12 | 2019-08-12 | Fault handling method, device, computer equipment, storage medium and storage system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110535692B (en) |
WO (1) | WO2021027481A1 (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110535692B (en) * | 2019-08-12 | 2020-12-18 | 华为技术有限公司 | Fault handling method, device, computer equipment, storage medium and storage system |
CN111371848A (en) * | 2020-02-21 | 2020-07-03 | 苏州浪潮智能科技有限公司 | A request processing method, apparatus, device and storage medium |
CN113805788B (en) * | 2020-06-12 | 2024-04-09 | 华为技术有限公司 | Distributed storage system and exception handling method and related device thereof |
CN112187919B (en) * | 2020-09-28 | 2024-01-23 | 腾讯科技(深圳)有限公司 | Storage node management method and related device |
CN113032106B (en) * | 2021-04-29 | 2024-07-09 | 中国工商银行股份有限公司 | Automatic detection method and device for IO suspension abnormality of computing node |
CN113326251B (en) * | 2021-06-25 | 2024-02-23 | 深信服科技股份有限公司 | Data management method, system, device and storage medium |
US11544139B1 (en) * | 2021-11-30 | 2023-01-03 | Vast Data Ltd. | Resolving erred IO flows |
CN114584454B (en) * | 2022-02-21 | 2023-08-11 | 苏州浪潮智能科技有限公司 | Processing method and device of server information, electronic equipment and storage medium |
CN117008815A (en) * | 2022-04-28 | 2023-11-07 | 华为技术有限公司 | Storage device and data processing method |
CN116382850B (en) * | 2023-04-10 | 2023-11-07 | 北京志凌海纳科技有限公司 | Virtual machine high availability management device and system using multi-storage heartbeat detection |
WO2025000362A1 (en) * | 2023-06-29 | 2025-01-02 | Nokia Shanghai Bell Co., Ltd. | Supervision on supervision object |
CN118567576B (en) * | 2024-07-31 | 2024-10-29 | 浪潮电子信息产业股份有限公司 | Multi-control memory system and data storage method, device, medium and product thereof |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104935481A (en) * | 2015-06-24 | 2015-09-23 | 华中科技大学 | A Data Recovery Method Based on Redundancy Mechanism in Distributed Storage |
CN108984107A (en) * | 2017-06-02 | 2018-12-11 | 伊姆西Ip控股有限责任公司 | Improve the availability of storage system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103092712B (en) * | 2011-11-04 | 2016-03-30 | 阿里巴巴集团控股有限公司 | A kind of tasks interrupt restoration methods and equipment |
US10691479B2 (en) * | 2017-06-28 | 2020-06-23 | Vmware, Inc. | Virtual machine placement based on device profiles |
CN109831342A (en) * | 2019-03-19 | 2019-05-31 | 江苏汇智达信息科技有限公司 | A kind of fault recovery method based on distributed system |
CN110535692B (en) * | 2019-08-12 | 2020-12-18 | 华为技术有限公司 | Fault handling method, device, computer equipment, storage medium and storage system |
- 2019-08-12: CN201910741190.5A, patent CN110535692B (en), status Active
- 2020-07-16: PCT/CN2020/102302, patent WO2021027481A1 (en), status Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2021027481A1 (en) | 2021-02-18 |
CN110535692A (en) | 2019-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110535692B (en) | Fault handling method, device, computer equipment, storage medium and storage system | |
CN109842651B (en) | Uninterrupted service load balancing method and system | |
CN105095001B (en) | Virtual machine abnormal restoring method under distributed environment | |
CN106936616B (en) | Backup communication method and device | |
US9588542B2 (en) | Rack server system and method for automatically managing rack configuration information | |
CN109947586A (en) | A method, apparatus and medium for isolating faulty equipment | |
JP2006127201A (en) | Storage system and continuity check method | |
US10275330B2 (en) | Computer readable non-transitory recording medium storing pseudo failure generation program, generation method, and generation apparatus | |
CN114868117A (en) | Peer-to-peer storage device messaging over a control bus | |
CN115550291B (en) | Switch reset system and method, storage medium, electronic equipment | |
CN110275793A (en) | Detection method and equipment for MongoDB data fragment cluster | |
CN108769170A (en) | A kind of cluster network fault self-checking system and method | |
CN105607973A (en) | Method, device and system for processing equipment failures in virtual machine system | |
US7499987B2 (en) | Deterministically electing an active node | |
CN108512753B (en) | A method and device for message transmission in a cluster file system | |
CN112612653B (en) | A business recovery method, device, arbitration server and storage system | |
TWI518680B (en) | Method for maintaining file system of computer system | |
CN111342986B (en) | Distributed node management method and device, distributed system and storage medium | |
CN115705261A (en) | Memory fault repairing method, CPU, OS, BIOS and server | |
CN113868058A (en) | Method, device and server for fault detection of peripheral component high-speed interconnection equipment | |
US8819481B2 (en) | Managing storage providers in a clustered appliance environment | |
US20230106077A1 (en) | Distributed Storage System, Exception Handling Method Thereof, and Related Apparatus | |
US6990609B2 (en) | System and method for isolating faults in a network | |
CN111309504A (en) | Control method for embedded module serial port redundant transmission and related components | |
US11947431B1 (en) | Replication data facility failure detection and failover automation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||