CN113535473B - Cluster Server - Google Patents


Info

Publication number
CN113535473B
CN113535473B (application CN202110721354.5A)
Authority
CN
China
Prior art keywords
server
servers
disk array
disk
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110721354.5A
Other languages
Chinese (zh)
Other versions
CN113535473A (en)
Inventor
张弛
吴峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Huaqi Intelligent Technology Co ltd
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110721354.5A priority Critical patent/CN113535473B/en
Publication of CN113535473A publication Critical patent/CN113535473A/en
Application granted granted Critical
Publication of CN113535473B publication Critical patent/CN113535473B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2089Redundant storage control functionality
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1415Saving, restoring, recovering or retrying at system level
    • G06F11/1438Restarting or rejuvenating

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The present application relates to a cluster server comprising a switch and at least three servers, each connected to the switch. Each server includes a storage device comprising a hard disk controller and a disk array; each hard disk controller is connected through disk connectors to the disk arrays of two other servers, so that the storage devices of the servers form a ring topology. The at least three servers include a master server, which controls each server to acquire or release control of the disk array of the current server and/or the disk array of at least one other server. The present application thereby solves the problem of low storage resource utilization in related-art cluster servers and improves the storage resource utilization of the cluster server.

Description

Cluster server
Technical Field
The present application relates to the field of server clusters, and in particular, to a cluster server.
Background
A server cluster is a group of servers brought together to perform the same service; to a client, the cluster appears as a single server. A cluster can use multiple computers for parallel computation to obtain high computing speed, and can also use multiple computers for backup, so that the failure of any single machine does not prevent the system as a whole from operating normally.
Existing cluster servers implement clustering only at the software-system level: when one server fails, the applications running on it can be switched to other servers, but the hard disk resources on the failed machine can no longer be used, and the storage links to that server are cut off, so the content stored on it becomes inaccessible and the storage resources are not fully utilized.
Disclosure of Invention
In this embodiment, a cluster server is provided to solve the problem of low storage resource utilization rate of the cluster server in the related art.
The embodiment provides a cluster server, which comprises a switch and at least three servers, wherein the servers are connected with the switch;
The server comprises a storage device, wherein the storage device comprises a hard disk controller and a disk array, and each hard disk controller is connected with the disk array of at least one other server through a disk connector;
the at least three servers comprise a main server, and the main server is used for controlling each server to acquire or release control rights to the disk array of the current server and/or the disk array of at least one other server.
In some embodiments, each hard disk controller is connected to a disk array of two other servers through a disk connector, and the storage devices of each server are connected in a ring topology.
In some embodiments, each server includes a central processor connected to the switch; the central processor of the master server monitors the online state of the other servers and, when another server goes offline, notifies the servers adjacent to it to obtain control of the offline server's disk array.
In some embodiments, the other servers establish heartbeat connections with the master server, and the master server monitors their online state through the heartbeat information.
In some embodiments, the server further includes a baseboard management controller, the baseboard management controller is connected with the switch, the baseboard management controller is further connected with a hard disk controller of the current server, and the baseboard management controller is used for acquiring or releasing control rights to a disk array of the current server and/or a disk array of at least one other server.
In some of these embodiments,
the baseboard management controller monitors the running state of each piece of hardware in the current server and sends the running state to the central processing unit of the current server;
the central processor of each of the other servers releases control of its disk array when the running state is abnormal, and notifies the master server of the abnormality;
and the central processor of the master server, after receiving the notification of the abnormal running state, notifies the adjacent servers to acquire control of the disk array of the server whose running state is abnormal.
In some embodiments, a server is further configured to perform a self-check and repair of its own hardware after transferring control of its disk array to another server, and to re-acquire control of its disk array once the self-check and repair succeed.
In some embodiments, the disk array of each server is powered by an independent power supply, so a server can perform the self-check and repair by restarting itself without powering down its disk array.
In some of these embodiments, the disk connector is a serial attached small computer system interface connector.
In some of these embodiments, the storage devices of each of the servers are physically centrally disposed within the server.
Compared with the related art, the cluster server provided in this embodiment comprises a switch and at least three servers connected to the switch. Each server includes a storage device comprising a hard disk controller and a disk array, and each hard disk controller is connected through disk connectors to the disk array of at least one other server. The at least three servers include a master server, which controls each server to acquire or release control of the disk array of the current server and/or the disk array of at least one other server. This solves the problem of low storage resource utilization in related-art cluster servers and improves the storage resource utilization of the cluster server.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a schematic diagram of a server of the present embodiment.
Fig. 2 is a schematic structural diagram of a cluster server according to the present embodiment.
Fig. 3 is a schematic diagram of the linear topology of the present embodiment.
Fig. 4 is a schematic diagram of the ring topology of the present embodiment.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples for a clearer understanding of the objects, technical solutions and advantages of the present application.
Unless defined otherwise, technical or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these" and similar terms in this application are not intended to be limiting in number, but may be singular or plural. The terms "comprises," "comprising," "includes," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, and system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the list of steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this disclosure are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "and/or" describes the association relationship of the association object, and indicates that three relationships may exist, for example, "a and/or B" may indicate that a exists alone, a and B exist simultaneously, and B exists alone. Typically, the character "/" indicates that the associated object is an "or" relationship. The terms "first," "second," "third," and the like, as referred to in this disclosure, merely distinguish similar objects and do not represent a particular ordering for objects.
The present embodiment provides a cluster server, the cluster server includes three or more servers. Fig. 1 is a schematic diagram of the servers of the present embodiment, which may also be referred to as hosts, each of which includes a computing section 10 and a storage section 20 as shown in fig. 1. The computing section 10 typically includes a central processing unit 110 (CPU, also referred to as a master controller or master), and the storage section typically consists of a storage device 210.
Storage device 210 includes disk array 211. It should be noted that, in this embodiment, the disk array 211 may consist of a single disk drive or of a disk group combining several disk drives. The drives making up the array are not limited to HDDs or SSDs; in some embodiments the array may combine HDDs and SSDs. In addition, the disk array 211 may be a high-capacity volume formed by concatenating all the drives with JBOD (Just a Bunch Of Disks) technology, or a volume built by the server with RAID (redundant array of independent disks) technology to improve fault tolerance.
The interface device between the computing section 10 and the disk array 211 is the hard disk controller 212, also called a disk drive adapter. At the software level, the hard disk controller 212 interprets commands from the computing section 10, sends control signals to the disk drives, detects drive status, and writes and reads data to and from the disks in the prescribed disk data format. At the hardware level, the hard disk controller 212 provides one or more physical interfaces for connecting to disk arrays 211; through these interfaces it may connect to one or more disk arrays 211 and gain or release control of the physically connected arrays.
Each disk array 211 may likewise include one or more physical interfaces for connecting to hard disk controllers 212. For example, a disk array 211 based on SAS (serial attached small computer system interface) technology can be connected to the hard disk controllers 212 of several servers, allowing those servers to share the same disk array 211.
The computing section 10 and the storage section 20 of each server may be physically centrally located, for example within the same server chassis. They may be provided on the same main circuit board or separately; for example, the storage section 20 may sit on a server backplane and the computing section 10 on the main circuit board.
In addition to the storage section 20 and the computing section 10, a server typically carries two pieces of core firmware: the BIOS (basic input output system, not shown) and the BMC (baseboard management controller, not shown). The BIOS sits below the server's operating system and is chiefly responsible for detecting, accessing, and debugging the low-level hardware resources and handing them to the operating system, ensuring that the system as a whole runs smoothly and safely. The BMC is a small operating system independent of the server's operating system; it is usually integrated on the motherboard or plugged in through PCIe or another interface. Externally the BMC usually presents a standard RJ45 network port and runs its own firmware with an independent IP address. A server can typically be managed unattended through BMC instructions, such as remote management, monitoring, installation, and restarting.
Fig. 2 is a schematic structural diagram of a cluster server according to the present embodiment. Fig. 2 uses five servers as an example; in other embodiments the number of servers may be any number of at least three, typically chosen according to the computing and storage resource requirements of the cluster, and this embodiment does not limit it.
The cluster server as shown in fig. 2 includes a switch 40 and five servers. Each server is connected to a switch 40. The hard disk controller 212 of each server is connected to the disk array 211 of the current server and the disk array 211 of at least one other server by a disk connector (e.g., SAS connector). Wherein, the other servers refer to other servers except the current server in the cluster server.
Of these five servers, one server is a master server (server a in fig. 2) by self-election or user configuration, and the other servers are referred to as slave servers with respect to the master server. The main server is used for controlling each server to acquire or release control rights to the disk array of the current server and/or the disk array of at least one other server.
The master and slave designations in this embodiment do not imply a master-slave relationship in actual business processing; they only mean that the master server plays the coordinating role in implementing this embodiment. In actual business processing, the master server may rank the same as, lower than, or higher than any other slave server.
In the cluster server provided by this embodiment, one server acts as the master server, the hard disk controller of each server is connected through disk connectors to the disk array of the current server and the disk array of at least one other server, and the master server controls the other servers to acquire or release control of the disk array of the current server and/or of at least one other server. When one server fails, the master server can therefore direct another server to take over the failed server's disk array 211, improving the utilization of the disk arrays 211. Compared with prior approaches that share disk arrays 211 through an expensive SAS switch, this embodiment needs no additional SAS switch: the cluster directly reuses the switch 40 already used for service processing, greatly reducing cost.
The computing section 10 of each server includes a central processor 110, which is connected to the switch 40. The central processor 110 of each slave server may report the state of the current server to the master server through the switch 40, periodically or aperiodically. Once the reported state indicates that a slave server cannot complete service processing normally, the master server notifies that slave server to release control of its disk array and notifies other slave servers to acquire control of the failed server's disk array. In some embodiments, a slave server may also actively release control of its own disk array upon failure.
However, when a slave server goes offline or loses power abnormally, it cannot report its state to the master server. Therefore, in some embodiments, the master server actively monitors the state of the slave servers, for example their online state. When a slave server goes offline, the central processor 110 of the master server notifies the servers adjacent to it to acquire control of the offline server's disk array 211.
A heartbeat connection may be used to detect the online state of the slave servers: each slave server establishes a heartbeat connection with the master server and periodically sends heartbeat (keep-alive) information over it. If the master server receives no heartbeat from a slave server within a set time interval, that slave server is considered offline.
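As a concrete illustration of this heartbeat scheme, the following is a minimal Python sketch of the master-side bookkeeping; the class name, server identifiers, and the 5-second timeout are illustrative assumptions, not details from the patent.

```python
import time

class HeartbeatMonitor:
    """Master-side tracker of slave keep-alive messages (illustrative sketch)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # set interval after which a slave counts as offline
        self.last_seen = {}         # server id -> timestamp of its last heartbeat

    def heartbeat(self, server_id, now=None):
        """Record a heartbeat from a slave server."""
        self.last_seen[server_id] = time.monotonic() if now is None else now

    def offline(self, now=None):
        """Return the slave servers whose heartbeats have timed out."""
        now = time.monotonic() if now is None else now
        return sorted(sid for sid, t in self.last_seen.items()
                      if now - t > self.timeout_s)

monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.heartbeat("B", now=100.0)
monitor.heartbeat("C", now=103.0)
assert monitor.offline(now=104.0) == []      # everyone inside the window
assert monitor.offline(now=106.5) == ["B"]   # B has been silent too long
```

In a real cluster the timestamps would come from messages received through the switch 40; fixed `now` values are passed here only to make the sketch deterministic.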
In each server, both the central processor 110 and the BMC may be used to control the hard disk controller 212 to gain or release control of the disk array 211. The BMC is connected with the switch 40 through an RJ45 network port, and is also connected with the hard disk controller 212 of the current server. The BMC is configured to control the hard disk controller 212 to acquire or release control of the disk array 211 of the current server and/or the disk array 211 of at least one other server. In some cases, if the slave server's operating system crashes or a CPU failure results in an inability to control the hard disk controller 212 to release control of the current server's disk array 211, the master server may control the slave server's BMC through the switch 40 to release control of the current server's disk array 211. Similarly, the master server may also control the BMCs of other slave servers to obtain control of a certain disk array 211 through the switch 40.
Since the BMC is a small operating system independent of the server operating system, the BMC can still operate normally even if the operating system of the slave server crashes due to hardware failure or software failure, and it is ensured that the control right of the disk array 211 of the cluster server can be handed over normally.
The BMC exists in the server as an independent third party and can monitor the hardware information of the whole server, such as system temperature, supply voltages, and fan speeds, as well as the working state of the network module, user-interaction modules (such as USB and display modules), and other modules. Once a module develops an abnormality that affects the server's normal service capability, the BMC judges that the server can no longer fulfil its storage function and transmits the abnormality information to the central processor 110 of the current server, or directly to the central processor of the master server through the switch 40. The central processor 110's monitoring of the current server's running state may thus be implemented through the BMC. For example, the BMC monitors the running state of each piece of hardware in the current server and reports it to the central processor 110 of the current server; when the state is abnormal, the slave server's central processor 110 releases control of the disk array 211 and notifies the master server of the abnormality; and on receiving that notification, the central processor 110 of the master server notifies an adjacent server to acquire control of the disk array 211 of the abnormal server.
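The BMC-driven escalation path described above can be sketched as follows; the threshold values, reading names, and callback signatures are hypothetical, chosen only to show the release-then-notify sequence.

```python
# Thresholds and reading names are illustrative assumptions, not patent values.
THRESHOLDS = {"cpu_temp_c": 95.0, "fan_rpm_min": 800}

def bmc_check(readings):
    """Return the abnormal hardware conditions detected by the BMC."""
    faults = []
    if readings.get("cpu_temp_c", 0.0) > THRESHOLDS["cpu_temp_c"]:
        faults.append("cpu_overheat")
    if readings.get("fan_rpm", 10_000) < THRESHOLDS["fan_rpm_min"]:
        faults.append("fan_failure")
    return faults

def on_bmc_report(server_id, readings, release_control, notify_master):
    """CPU-side handler: release the local disk array and escalate on any fault."""
    faults = bmc_check(readings)
    if faults:
        release_control(server_id)        # give up control of the local disk array
        notify_master(server_id, faults)  # tell the master server about the abnormality
    return faults

events = []
faults = on_bmc_report(
    "B", {"cpu_temp_c": 101.0, "fan_rpm": 5_000},
    release_control=lambda sid: events.append(("release", sid)),
    notify_master=lambda sid, f: events.append(("notify", sid, tuple(f))),
)
assert faults == ["cpu_overheat"]
assert events == [("release", "B"), ("notify", "B", ("cpu_overheat",))]
```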
To avoid the cost of interconnecting all disk arrays 211 in the cluster through a SAS switch, each hard disk controller 212 in this embodiment connects to the disk array 211 of the current server and the disk array 211 of at least one other server via disk connectors (e.g., SAS connectors). With this wiring, the storage devices of the servers may form a linear topology as shown in fig. 3. Under a linear topology, when a server at either end of the chain fails, its storage device can be taken over by only one adjacent server; if that neighbor is already heavily loaded, taking over the storage device may push it into failure as well, reducing the stability of the cluster. Moreover, if two consecutive servers at one end of the chain fail, the outermost server's storage device cannot be taken over by any server, so the utilization of the storage devices still leaves room for improvement.
For this reason, in some embodiments each hard disk controller 212 connects, through disk connectors (SAS connectors), to the disk array 211 of the current server and to the disk arrays of two other servers, so that the storage devices of the servers form a ring topology as shown in fig. 4. With this wiring, when any single server fails, both of its neighbors can take over its storage device; even if two consecutive servers fail, each of their disk arrays can still be taken over by one server; only when three consecutive servers fail does the storage device of one server become impossible to take over. The ring topology therefore improves both the stability of the cluster server and the utilization of the storage devices.
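The takeover properties claimed for the ring topology can be checked mechanically. The sketch below, with illustrative server names, computes which healthy neighbors can take over each failed server's disk array for one, two, and three consecutive failures.

```python
def neighbors(ring, server):
    """Adjacent servers of `server` in the ring topology."""
    i = ring.index(server)
    return {ring[i - 1], ring[(i + 1) % len(ring)]}

def takeover_candidates(ring, failed):
    """For each failed server, the healthy neighbors able to take over its array."""
    failed = set(failed)
    return {s: sorted(neighbors(ring, s) - failed) for s in failed}

ring = ["A", "B", "C", "D", "E"]
# single failure: both neighbors can take over
assert takeover_candidates(ring, ["B"]) == {"B": ["A", "C"]}
# two consecutive failures: each array still has one healthy neighbor
assert takeover_candidates(ring, ["B", "C"]) == {"B": ["A"], "C": ["D"]}
# three consecutive failures: only the middle server's array is stranded
assert takeover_candidates(ring, ["B", "C", "D"]) == {"B": ["A"], "C": [], "D": ["E"]}
```

The same functions applied to a linear (non-wrapping) adjacency would show the end-of-chain weakness described for fig. 3.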
The working procedure of the cluster server of this embodiment is described below.
Example 1
In this example, the hard disk controller 212 of each server is controlled by the master server to acquire or release control of the disk array of the current server and/or of other servers.
Referring to the topology of fig. 4, with server A as the master server and the other servers as slave servers, the working process of the cluster server provided in this example comprises the following steps:
Step 1: the master server and the slave servers each monitor the running state of each piece of hardware in their respective servers.
Step 2: when the running state of server B becomes abnormal, server B controls its hard disk controller 212 to release control of server B's disk array 211.
Step 3: server B sends a disk array control right handover instruction to server A.
The handover instruction sent by server B to server A carries the identification information of server B, or the identification information of server B's disk array.
Step 4: when server A receives the handover instruction from server B through the switch 40, it queries a preconfigured mapping table using the identification information carried in the instruction and determines that the servers adjacent to server B are server A and server C.
Step 5: since server A confirms that it is itself adjacent to server B, it can directly control its own hard disk controller 212 to acquire control of server B's disk array 211.
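Steps 4 and 5 above can be condensed into a short sketch; the adjacency mapping table and the instruction format are assumptions for illustration.

```python
# Preconfigured mapping table for the 5-server ring of fig. 4 (assumed layout).
ADJACENCY = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E"], "E": ["D", "A"]}

def handle_handover(master_id, instruction, acquire):
    """Master-side handling of a disk array control right handover instruction."""
    failed = instruction["server_id"]   # identification carried in the instruction
    adjacent = ADJACENCY[failed]        # step 4: query the mapping table
    if master_id in adjacent:           # step 5: the master itself is adjacent
        acquire(master_id, failed)
        return master_id
    return None

taken = []
owner = handle_handover("A", {"server_id": "B"},
                        acquire=lambda who, whom: taken.append((who, whom)))
assert owner == "A" and taken == [("A", "B")]
```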
Example 2
In this example, the hard disk controller 212 of each server is controlled by the master server to acquire or release control of the disk array of the current server and/or of other servers.
Referring to the topology of fig. 4, with server A as the master server and the other servers as slave servers, the working process of the cluster server provided in this example comprises the following steps:
Step 1: the master server and the slave servers each monitor the running state of each piece of hardware in their respective servers.
Step 2: when the running state of server B becomes abnormal, server B controls its hard disk controller 212 to release control of server B's disk array 211.
Step 3: server B sends a disk array control right handover instruction to server A.
The handover instruction sent by server B to server A carries the identification information of server B, or the identification information of server B's disk array.
Step 4: when server A receives the handover instruction from server B through the switch 40, it queries a preconfigured mapping table using the identification information carried in the instruction and determines that the servers adjacent to server B are server A and server C.
Step 5: server A finds that it is itself adjacent to server B, but its own running state indicates that its workload is already heavy; server A therefore sends a disk array control right acquisition instruction to server C, carrying the identification information of server B or of the failed server's disk array 211.
Step 6: after receiving the disk array control right acquisition instruction, server C acquires control of server B's disk array 211 according to the identification information carried in the instruction.
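The load-aware delegation of steps 5 and 6 can be sketched as follows; the load metric, its threshold, and the mapping-table excerpt are hypothetical.

```python
ADJACENCY = {"B": ["A", "C"]}  # assumed mapping-table excerpt for failed server B

def assign_takeover(master_id, failed, load, max_load, send_acquire):
    """Pick a takeover server for the failed server's array, skipping loaded ones."""
    for candidate in ADJACENCY[failed]:
        if load.get(candidate, 0.0) < max_load:
            send_acquire(candidate, failed)  # control right acquisition instruction
            return candidate
    return None  # every neighbor is overloaded; no takeover assigned

sent = []
chosen = assign_takeover("A", "B", load={"A": 0.9, "C": 0.3}, max_load=0.8,
                         send_acquire=lambda to, whom: sent.append((to, whom)))
assert chosen == "C" and sent == [("C", "B")]   # A is too busy, so C takes over
```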
The adjacent servers may be one or more. For example, in a ring topology each server has two adjacent servers, and in embodiments where the servers use disk arrays 211 based on technology such as SAS, the two adjacent servers may jointly take over control of the same failed server's disk array 211.
The master server may maintain a mapping table between the physical interfaces of the hard disk controllers 212 and the disk arrays 211, recording for each physical interface the identification of the connected disk array 211 or of the server to which it belongs; it may also maintain the topology information of the cluster so as to know each server's adjacent servers. After receiving a disk array control right acquisition instruction, a server determines from the identification information carried in the instruction which physical interface is connected to the disk array 211 to be taken over, and controls its hard disk controller 212 to acquire control of the failed server's disk array 211 on that interface.
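A minimal sketch of such a mapping table from local physical interfaces to reachable disk arrays, with made-up port and array identifiers:

```python
# Per-server map from local SAS physical interface to the disk array reachable
# through it; "sasN" and "array-X" names are illustrative, not from the patent.
PORT_MAP = {
    "C": {"sas0": "array-C", "sas1": "array-B", "sas2": "array-D"},
}

def port_for_array(server_id, array_id):
    """Find which local physical interface reaches the array to be taken over."""
    for port, array in PORT_MAP[server_id].items():
        if array == array_id:
            return port
    raise KeyError(f"{array_id} not reachable from {server_id}")

# Server C receives an acquisition instruction naming server B's disk array.
assert port_for_array("C", "array-B") == "sas1"
```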
Example 3
In this example, the hard disk controller 212 of each server is likewise controlled by the master server to acquire or release control of the disk array of the current server and/or of other servers, but the slave servers need not actively report their running states.
Referring to fig. 4, a topology is shown in which a master server is taken as a server a, and other servers are taken as slave servers. The working process of the cluster server provided by the embodiment comprises the following steps:
and 2, monitoring heartbeat information of each server in the cluster server by the server A, and determining that the server B is disconnected according to the heartbeat information.
And 3, the server A determines that the adjacent servers of the server B are the server A and the server C by inquiring a pre-configured mapping table according to the identification information of the server B.
In step 4, the server a finds itself adjacent to the server B, but the running state of the server a indicates that the workload is already large, and at this time, the server a sends a disk array control right acquisition instruction to the server C, where the disk array control right acquisition instruction carries the identification information of the server B or the identification information of the disk array 211 of the server B.
Step 5: After receiving the disk array control right acquisition instruction, server C acquires control of server B's disk array 211 according to the identification information carried in the instruction.
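Steps 2 through 5 amount to a small decision routine on the master: detect a missing heartbeat, look up the failed server's neighbours, skip any neighbour that is itself overloaded, and instruct the chosen one to take over. A minimal sketch follows; the timeout, the load threshold, and the instruction format are assumptions for illustration and are not specified by the patent:

```python
import time

# Hypothetical sketch of the master's failover decision (steps 2-5).
HEARTBEAT_TIMEOUT = 10.0   # assumed: seconds of silence before "offline"
LOAD_LIMIT = 0.8           # assumed: workload above which a neighbour is skipped

last_heartbeat = {"server-B": 0.0}                    # B has stopped responding
load = {"server-A": 0.9, "server-C": 0.2}             # A is busy, C is idle
neighbours = {"server-B": ["server-A", "server-C"]}   # from the topology table

def pick_takeover_target(failed, now=None):
    """Choose the adjacent server that should acquire the failed server's array."""
    now = time.time() if now is None else now
    if now - last_heartbeat[failed] <= HEARTBEAT_TIMEOUT:
        raise RuntimeError(f"{failed} still appears to be online")
    for candidate in neighbours[failed]:
        if load.get(candidate, 0.0) < LOAD_LIMIT:
            return candidate                          # server C in this example
    raise RuntimeError("no neighbour is able to take over")

def takeover_instruction(failed):
    """Build the disk array control right acquisition instruction of step 4
    (the dictionary format here is an assumption)."""
    return {
        "cmd": "acquire_array",
        "target": pick_takeover_target(failed),
        "failed_server": failed,                      # identification information
    }
```

In the worked example, server A (load 0.9) is skipped and server C (load 0.2) is selected, reproducing the choice made in step 4.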
Step 6: After server B, actively or passively, hands over control of its disk array 211 to server C because of a failure or disconnection, it performs self-check and repair on its own hardware.
Step 7: If the self-check and repair of server B succeeds, server B re-acquires control of its disk array 211.
When server B re-acquires control of its disk array 211, it may send a disk array control right acquisition request to server A. After receiving the request, server A sends a disk array control right release instruction to server C, which currently holds control of server B's disk array, and, upon being notified that server C has successfully released the control right, sends a confirmation message to server B. Server B then receives the confirmation message and re-acquires control of its disk array. In this way, self-check and self-repair of a failed server is realized.
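The hand-back in steps 6 and 7 is a three-party handshake: B asks A, A tells C to release, C reports success, A confirms to B, and only then does B re-acquire its array. A synchronous sketch of that message order follows; the class and method names are invented for illustration, not taken from the patent:

```python
# Hypothetical sketch of the control-right hand-back handshake (steps 6-7).

class HolderC:
    """Server C: currently holds control of server B's disk array."""
    def __init__(self):
        self.controls_foreign_array = True

    def release_array(self):
        # Release instruction from the master: the hard disk controller lets go.
        self.controls_foreign_array = False
        return True                      # notify the master of success

class MasterA:
    """Server A: brokers the return of control to the repaired owner."""
    def __init__(self, holder):
        self.holder = holder

    def handle_reacquire_request(self, requester):
        # Order from the text: instruct C to release, wait for its success
        # notice, then send the confirmation message back to B.
        return "confirmed" if self.holder.release_array() else "denied"

class OwnerB:
    """Server B: re-acquires its own array only after A's confirmation."""
    def __init__(self):
        self.controls_own_array = False

    def reacquire(self, master):
        if master.handle_reacquire_request(self) == "confirmed":
            self.controls_own_array = True
        return self.controls_own_array
```

The ordering matters: B never touches the array until C's release has been confirmed, so at no point do two hard disk controllers claim the same disk array.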
The disk array 211 of each server is powered by an independent power supply, so a server can perform self-check and repair by restarting itself while its disk array 211 remains continuously powered and can be taken over and used by other servers.
The cluster server may further comprise a control node connected to the switch 40 for configuring each server, for example configuring each server's control program, its identification information, or the mapping table stored on it. In addition, the BMC of each server can be controlled by the control node to provide remote unattended functions, such as remote restart.
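The control node's two roles can be sketched as follows. This is an assumption-laden illustration: the dictionary-based server records, the method names, and the out-of-band restart mechanism are all invented here, not specified by the patent:

```python
# Hypothetical sketch of the control node; all names are illustrative.

class ControlNode:
    def __init__(self, servers):
        self.servers = servers           # server_id -> mutable config record

    def push_config(self, server_id, mapping_table, identity):
        """Configure a server: the mapping table it stores and its
        identification information (control-program settings would be
        handled the same way)."""
        record = self.servers[server_id]
        record["mapping_table"] = dict(mapping_table)
        record["identity"] = identity
        return True

    def remote_restart(self, server_id):
        """Remote unattended restart; in a real deployment this would go
        out-of-band through the server's BMC (e.g. an IPMI power cycle).
        Here we only record the request."""
        self.servers[server_id]["restart_requested"] = True
```

Because the restart path goes through the BMC rather than the server's operating system, it still works when the host side of a server has hung, which is what makes the unattended recovery described above possible.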
In summary, a conventional cluster scheme usually fences off an abnormal node entirely, so its storage can no longer be accessed. This embodiment completes the cluster service at the hardware level and effectively reuses the storage of the abnormal device, keeping its contents accessible. By interconnecting the servers' disk arrays with disk connectors, the storage parts of the servers become a single whole over which control rights can be handed over, and one of the servers acts as the master server participating in cluster control. Once an abnormality occurs, the handover of disk array control rights can be determined quickly, which greatly improves the stability and safety of the cluster scheme.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure in accordance with the embodiments provided herein.
It is to be understood that the drawings are merely illustrative of some embodiments of the present application and that it is possible for those skilled in the art to adapt the present application to other similar situations without the need for inventive work. In addition, it should be appreciated that while the development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as a departure from the disclosure.
The term "embodiment" in this disclosure means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. It will be clear or implicitly understood by those of ordinary skill in the art that the embodiments described in the present application can be combined with other embodiments without conflict.
The above examples represent only a few embodiments of the present application, described in relative detail, and are not to be construed as limiting the scope of the patent claims. It should be noted that several variations and modifications may be made by those skilled in the art without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of protection should be determined by the appended claims.

Claims (7)

1. A cluster server, characterized by comprising: a switch and at least three servers, the servers being connected to the switch;
each server comprises a storage device, the storage device comprising a hard disk controller and a disk array, and each hard disk controller is connected through a disk connector to the disk array of at least one other server;
the at least three servers include a master server, the master server being configured to control each server to acquire or release control of the disk array of the current server and/or the disk array of at least one other server;
each server comprises a central processor connected to the switch; the central processor of the master server is configured to monitor the online status of the other servers and, when another server goes offline, to notify an adjacent server to acquire control of the disk array of the offline server;
each server further comprises a baseboard management controller (BMC), the BMC being connected to the switch and to the hard disk controller of the current server, and being configured to acquire or release control of the disk array of the current server and/or the disk array of at least one other server;
the BMC is configured to monitor the running state of each piece of hardware in the current server and to send the running state to the central processor of the current server;
the central processors of the other servers are configured, when the running state is abnormal, to release control of the disk array and to notify the master server of the abnormal running state; and
the central processor of the master server is configured, after receiving the notification of the abnormal running state, to notify an adjacent server to acquire control of the disk array of the server whose running state is abnormal.
2. The cluster server according to claim 1, characterized in that each hard disk controller is connected through disk connectors to the disk arrays of two other servers, and the storage devices of the servers are connected in a ring topology.
3. The cluster server according to claim 1, characterized in that the other servers establish heartbeat connections with the master server, and the master server monitors the online status of the other servers through heartbeat information.
4. The cluster server according to claim 1, characterized in that the other servers are further configured, after control of the current server's disk array has been handed over to other servers, to perform self-check and repair on the hardware of the current server and, after the self-check and repair succeeds, to re-acquire control of the disk array of the current server.
5. The cluster server according to claim 4, characterized in that the disk array of each server is powered by an independent power supply, and the other servers perform self-check and repair by restarting the current server.
6. The cluster server according to any one of claims 1 to 5, characterized in that the disk connector is an SAS connector.
7. The cluster server according to any one of claims 1 to 5, characterized in that the storage device of each server is physically centralized within the server.
CN202110721354.5A 2021-06-28 2021-06-28 Cluster Server Active CN113535473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110721354.5A CN113535473B (en) 2021-06-28 2021-06-28 Cluster Server


Publications (2)

Publication Number Publication Date
CN113535473A CN113535473A (en) 2021-10-22
CN113535473B true CN113535473B (en) 2025-06-27

Family

ID=78126129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110721354.5A Active CN113535473B (en) 2021-06-28 2021-06-28 Cluster Server

Country Status (1)

Country Link
CN (1) CN113535473B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535472B (en) * 2021-06-28 2025-03-11 浙江大华技术股份有限公司 Cluster Server
CN113535471B (en) * 2021-06-28 2025-03-07 浙江大华技术股份有限公司 Cluster Server

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103095796A (en) * 2011-11-04 2013-05-08 Lsi公司 Server direct attached storage shared through virtual sas expanders
CN110058803A (en) * 2017-12-20 2019-07-26 三星电子株式会社 Local management console for a storage device
CN111045602A (en) * 2019-11-25 2020-04-21 浙江大华技术股份有限公司 Cluster system control method and cluster system

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3360719B2 (en) * 1998-06-19 2002-12-24 日本電気株式会社 Disk array clustering reporting method and system
JP4826077B2 (en) * 2004-08-31 2011-11-30 株式会社日立製作所 Boot disk management method
JP4462024B2 (en) * 2004-12-09 2010-05-12 株式会社日立製作所 Failover method by disk takeover
JP5074351B2 (en) * 2008-10-30 2012-11-14 株式会社日立製作所 System construction method and management server
JP5266590B2 (en) * 2009-09-18 2013-08-21 株式会社日立製作所 Computer system management method, computer system, and program
US20130346532A1 (en) * 2012-06-21 2013-12-26 Microsoft Corporation Virtual shared storage in a cluster
CN103257908A (en) * 2013-05-24 2013-08-21 浪潮电子信息产业股份有限公司 Software and hardware cooperative multi-controller disk array designing method
CN106341437B (en) * 2015-07-09 2019-11-15 营邦企业股份有限公司 JBOD device with BMC module and control method thereof
US9836368B2 (en) * 2015-10-22 2017-12-05 Netapp, Inc. Implementing automatic switchover
CN106814976A (en) * 2017-01-19 2017-06-09 东莞市阿普奥云电子有限公司 Cluster storage system and apply its data interactive method
CN108762667A (en) * 2018-04-20 2018-11-06 烽火通信科技股份有限公司 The method that the multi node server of disk can be dynamically distributed and dynamically distribute disk
CN109284207A (en) * 2018-08-30 2019-01-29 紫光华山信息技术有限公司 Hard disc failure processing method, device, server and computer-readable medium
CN109474694A (en) * 2018-12-04 2019-03-15 郑州云海信息技术有限公司 A management and control method and device for a NAS cluster based on a SAN storage array
CN112099990A (en) * 2020-08-31 2020-12-18 新华三信息技术有限公司 Disaster recovery backup method, device, equipment and machine readable storage medium


Also Published As

Publication number Publication date
CN113535473A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN101651559B (en) Failover method of storage service in double controller storage system
US8943258B2 (en) Server direct attached storage shared through virtual SAS expanders
CN101594383B (en) Method for monitoring service and status of controllers of double-controller storage system
CN111767244B (en) Dual-redundancy computer equipment based on domestic Loongson platform
US20060117211A1 (en) Fail-over storage system
CN107733684A (en) A kind of multi-controller computing redundancy cluster based on Loongson processor
JP5561622B2 (en) Multiplexing system, data communication card, state abnormality detection method, and program
CN105072029B (en) The redundant link design method and system of a kind of dual-active dual control storage system
CN102402395A (en) Method for uninterrupted operation of high-availability system based on quorum disk
CN113535473B (en) Cluster Server
CN111737037A (en) Baseboard management control method, master-slave heterogeneous BMC control system and storage medium
US20230409498A1 (en) Redundant baseboard management controller (bmc) system and method
CN103475695A (en) Interconnection method and device for storage system
CN113535471B (en) Cluster Server
WO2020000275A1 (en) Storage system, and method for switching operating mode of storage system
CN101488105B (en) Method for implementing high availability of memory double-controller and memory double-controller system
US20160246746A1 (en) Sas configuration management
CN112612653B (en) A business recovery method, device, arbitration server and storage system
CN212541329U (en) Dual-redundancy computer equipment based on domestic Loongson platform
CN110985426B (en) Fan control system and method for PCIE Switch product
CN113535472B (en) Cluster Server
CN107071189B (en) Connection method of communication equipment physical interface
WO2021238579A1 (en) Method for managing sata hard disk by means of storage system, and storage system
JP2002136000A (en) Uninterruptible power supply system
CN117666746B (en) Multi-node server, method, device and medium applied to multi-node server

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20251009

Address after: 310051 Zhejiang Province, Hangzhou City, Binjiang District, Changhe Street, Liyeh Road 580, Building 1, 15th Floor

Patentee after: Zhejiang Huaqi Intelligent Technology Co.,Ltd.

Country or region after: China

Patentee after: ZHEJIANG DAHUA TECHNOLOGY Co.,Ltd.

Address before: No. 1187 Bin'an Road, Binjiang District, Hangzhou, Zhejiang Province

Patentee before: ZHEJIANG DAHUA TECHNOLOGY Co.,Ltd.

Country or region before: China