CN118277167A - Method for preventing split-brain in a computing power server based on a quorum mechanism - Google Patents
Method for preventing split-brain in a computing power server based on a quorum mechanism
- Publication number
- CN118277167A (application CN202410382774.9A)
- Authority
- CN
- China
- Prior art keywords
- node
- master node
- quorum
- tiebreaker
- nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2056—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
- G06F11/2069—Management of state, configuration or failover
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2056—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Hardware Redundancy (AREA)
Abstract
The application discloses a method for preventing split-brain in a computing power server based on a quorum mechanism, comprising the following steps: adding a Tiebreaker node to the DRBD mirrored block device, the Tiebreaker node being used to determine whether the DRBD mirrored block device has quorum, configuring the disk state of the Tiebreaker node as Diskless, and connecting the Tiebreaker node to the other nodes in a high-availability cluster, where the number of nodes in the high-availability cluster is even; modifying the configuration script so that the arbitration resource option is majority, that is, quorum is configured as majority and the minimum redundancy, quorum-minimum-redundancy, is configured as 2; and, if the master node and the slave node in the high-availability cluster are disconnected, determining the connection states of the master node, the slave node and the Tiebreaker node and handling the failure when the votes satisfy majority, so that after the network connection is restored the data of the master node and the slave node are updated accordingly. The method prevents DRBD split-brain and thereby avoids the loss of user data that split-brain causes.
Description
Technical Field
The invention relates to the technical field of preventing DRBD split-brain, and in particular to a method for preventing split-brain in a computing power server based on a quorum mechanism.
Background
DRBD (Distributed Replicated Block Device) is a software-based, shared-nothing, replicated storage system that mirrors the content of block devices (disks, partitions, logical volumes, etc.) between hosts. DRBD comprises a core module, user-space management tools and scripts, and is typically used in high-availability (HA) clusters. DRBD resembles RAID 1 (disk mirroring), except that RAID 1 mirrors within a single computer whereas DRBD mirrors partitions on different computers over a network. DRBD treats each disk device (partition) it manages as a resource, and each resource has one of two roles: the Primary role (master) or the Secondary role (slave). On a DRBD device in the Primary role (the master node), operations such as reading, writing and mounting are allowed; a DRBD device in the Secondary role (the slave node) can only passively receive the data replicated from the master node, and DRBD does not allow the slave node to be accessed or operated on. DRBD split-brain refers to the following situation: because of a temporary network failure between cluster nodes, both nodes of the cluster become master nodes (the slave node's resource role changes to Primary), and the upper-layer application may write data to both master nodes at the same time. The two nodes are then no longer in the normal Connected state but in the StandAlone or WFConnection state, so neither node synchronizes the data of its peer. As a result, the written data become inconsistent and incomplete.
To address split-brain, DRBD provides an automatic recovery policy: by comparing the order in which the two nodes became master nodes, or how much data each node wrote during the failure, the policy selects one node as the victim after the network connection is restored. The data written on the victim node during the failure are discarded, and the data on the other node are synchronized to the victim node, so that DRBD recovers automatically from the split-brain failure.
It can be seen that this automatic split-brain recovery policy cannot guarantee the integrity of user data: no matter which node is chosen as the victim, user data are always lost whenever the upper-layer application wrote to both nodes during the network failure.
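The data loss described above can be illustrated with a minimal Python sketch. It models only the "how much data was written during the failure" criterion from the text; the function name `auto_recover` and the sample write lists are hypothetical and are not part of DRBD.

```python
from typing import List, Tuple


def auto_recover(writes_a: List[str], writes_b: List[str]) -> Tuple[List[str], List[str]]:
    """Resolve a split-brain by sacrificing the node that wrote less.

    Returns (kept, discarded): the survivor's writes, and the victim's
    writes that are thrown away when the victim resynchronizes.
    Hypothetical model, not DRBD's actual implementation.
    """
    if len(writes_a) >= len(writes_b):
        return list(writes_a), list(writes_b)
    return list(writes_b), list(writes_a)


# During the outage both nodes acted as master and accepted writes.
node_a = ["order-1", "order-2", "order-3"]
node_b = ["order-4"]

kept, discarded = auto_recover(node_a, node_b)
print("kept:", kept)
print("lost:", discarded)
```

Whichever node is chosen as the victim, `discarded` is non-empty as soon as both nodes accepted writes, which is the loss the invention aims to rule out by never letting both sides write.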
Disclosure of Invention
One advantage of the present invention is to provide a method for preventing split-brain in a computing power server based on a quorum mechanism. By adding a Tiebreaker node and relying on the quorum mechanism, the invention ensures that a node without quorum does not modify the replicated data set, which avoids inconsistent node data, prevents DRBD split-brain, and thereby avoids the loss of user data caused by split-brain.
To achieve at least one of the above advantages, the present invention provides a method for preventing split-brain in a computing power server based on a quorum mechanism, comprising the following steps:
adding a Tiebreaker node to the DRBD mirrored block device, wherein the Tiebreaker node is used to determine whether the DRBD mirrored block device has quorum, configuring the disk state of the Tiebreaker node as Diskless, and connecting the Tiebreaker node to the other nodes in a high-availability cluster, wherein the number of nodes in the high-availability cluster is even;
modifying the configuration script so that the arbitration resource option is majority, i.e., more than half of all nodes: quorum is configured as majority, and the minimum redundancy, quorum-minimum-redundancy, is configured as 2;
if the master node and the slave node in the high-availability cluster are disconnected, determining the connection states of the master node, the slave node and the Tiebreaker node, and handling the failure when the votes satisfy majority, so that after the network connection is restored the data of the master node and the slave node are updated accordingly.
According to an embodiment of the present invention, when the master node and the slave node are disconnected, the master node and the slave node each remain connected to the Tiebreaker node, and the votes satisfy majority, the master node continues to operate, and the slave node is demoted and its data becomes Outdated; after the master node and the slave node resume connection, the slave node synchronizes the data of the master node, and the data of the slave node becomes UpToDate.
According to an embodiment of the present invention, when the master node and the slave node are disconnected, the master node remains connected to the Tiebreaker node, the slave node is disconnected from the Tiebreaker node, and the votes satisfy majority, the master node continues to operate, and the slave node becomes Outdated and loses quorum; after the master node and the slave node resume connection, the slave node synchronizes the data of the master node, and the data of the slave node becomes UpToDate.
According to an embodiment of the invention, the on-no-quorum resource option is set to io-error, and the on-no-quorum option is configured as suspend-io;
when the master node and the slave node are disconnected and the master node is also disconnected from the Tiebreaker node, the master node loses quorum and is suspended, and applications on the master node's DRBD device receive I/O errors; when the votes satisfy majority, the slave node is promoted to be the new master node and services continue to run on the new master node; after the master node and the slave node resume connection, the original master node synchronizes the data of the new master node, and the data of the original master node becomes UpToDate.
According to an embodiment of the present invention, if the master node remains connected to the slave node, the slave node synchronizes the data of the master node normally.
These and other objects, features and advantages of the present invention will become more fully apparent from the following detailed description.
Drawings
FIG. 1 is a schematic diagram of the method for preventing split-brain in a computing power server based on a quorum mechanism according to the present application.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art. The basic principles of the invention defined in the following description may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It will be appreciated by those skilled in the art that in the disclosure of the present specification, the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," etc. refer to an orientation or positional relationship based on that shown in the drawings, which is merely for convenience of description and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore, the above terms should not be construed as limiting the present invention.
It will be understood that the terms "a" and "an" should be interpreted as referring to "at least one" or "one or more," i.e., in one embodiment, the number of elements may be one, while in another embodiment, the number of elements may be plural, and the term "a" should not be interpreted as limiting the number.
Referring to FIG. 1, a method for preventing split-brain in a computing power server based on a quorum mechanism according to a preferred embodiment of the present invention is described in detail below. The method mainly includes the following steps:
A Tiebreaker node is added to the DRBD mirrored block device. The Tiebreaker node, also called the deciding node, decides the outcome when the cluster is split into equal halves; that is, it is used to determine whether a node has quorum. Note that the Tiebreaker node does not need the same storage hardware as the other nodes and is not used to replicate backup data. The disk state of the Tiebreaker node is configured as Diskless, and the Tiebreaker node is connected to the other nodes in the high-availability cluster, where the number of nodes in the high-availability cluster is even;
The configuration script is modified so that the arbitration resource option is majority, i.e., more than half of all nodes: a cluster partition may modify the replicated data set only when the number of nodes it can communicate with is more than half of the total number of nodes. A node without quorum must ensure that the replicated data set is not modified, so that node data never become inconsistent. Accordingly, quorum is configured as majority, and the minimum redundancy, quorum-minimum-redundancy, is configured as 2;
If the master node and the slave node in the high-availability cluster are disconnected, the connection states of the master node, the slave node and the Tiebreaker node are determined, and the failure is handled when the votes satisfy majority, so that after the network connection is restored the data of the master node and the slave node are updated accordingly.
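The majority rule in the steps above comes down to simple vote arithmetic. The Python sketch below shows why an even-sized cluster needs the extra Tiebreaker vote. The function names are hypothetical, and the quorum-minimum-redundancy setting is deliberately not modeled.

```python
def majority_threshold(total_nodes: int) -> int:
    """Smallest vote count that is strictly more than half of all nodes."""
    return total_nodes // 2 + 1


def has_quorum(reachable_nodes: int, total_nodes: int) -> bool:
    """A partition has quorum if it holds a strict majority of all votes.

    reachable_nodes counts the node itself plus every peer it can reach.
    """
    return reachable_nodes >= majority_threshold(total_nodes)


# Two data nodes without a Tiebreaker, split 1/1: neither side has quorum.
print(has_quorum(1, 2))  # False

# Two data nodes plus a Tiebreaker (3 votes): the side that still reaches
# the Tiebreaker holds 2 votes, while an isolated node holds only 1.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False
```

With an even node count, a symmetric split leaves both halves at exactly half of the votes, so no side can proceed. Adding one diskless vote makes the total odd, so at most one side of any two-way split can hold a strict majority.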
More specifically, in the case where the master node and the slave node are disconnected, the master node and the slave node each remain connected to the Tiebreaker node, and the votes satisfy majority:
The master node continues to work, and the slave node is demoted and its data becomes Outdated; after the master node and the slave node resume connection, the slave node synchronizes the data of the master node, and the data of the slave node becomes UpToDate.
In another case, when the master node and the slave node are disconnected, the master node remains connected to the Tiebreaker node, the slave node is disconnected from the Tiebreaker node, and the votes satisfy majority:
The master node has quorum and continues to work, while the slave node becomes Outdated and lacks quorum, so the slave node cannot be promoted to master node; after the master node and the slave node resume connection, the slave node synchronizes the data of the master node, and the data of the slave node becomes UpToDate.
In a third case, the on-no-quorum resource option is set to io-error, and the on-no-quorum option is configured as suspend-io;
When the master node and the slave node are disconnected and the master node is also disconnected from the Tiebreaker node, the master node loses quorum and is suspended, applications on the master node's DRBD device receive I/O errors, and the votes satisfy majority:
The slave node is promoted to be the new master node, and services continue to run on the new master node; after the master node and the slave node resume connection, the original master node synchronizes the data of the new master node, and the data of the original master node becomes UpToDate.
In one embodiment, if the master node remains connected to the slave node, the slave node synchronizes the data of the master node normally.
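The cases above can be summarized as a small decision function. The Python sketch below maps the three link states to the outcomes stated in the text. The names `Links`, `votes`, `has_quorum` and `resolve` are hypothetical, and the final branch (no partition has quorum) is an extrapolation from the rule that a node without quorum must not modify the replicated data set.

```python
from dataclasses import dataclass

TOTAL_NODES = 3  # master + slave + diskless Tiebreaker


@dataclass(frozen=True)
class Links:
    """Which of the three network links are currently up."""
    master_slave: bool
    master_tiebreaker: bool
    slave_tiebreaker: bool


def votes(links: Links, node: str) -> int:
    """Count the node itself plus each peer it still reaches directly."""
    if node == "master":
        peers = (links.master_slave, links.master_tiebreaker)
    else:
        peers = (links.master_slave, links.slave_tiebreaker)
    return 1 + sum(peers)


def has_quorum(links: Links, node: str) -> bool:
    """Strict majority: more than half of all votes."""
    return 2 * votes(links, node) > TOTAL_NODES


def resolve(links: Links) -> str:
    """Return the outcome the description states for this link state."""
    if links.master_slave:
        return "normal replication"
    if has_quorum(links, "master"):
        # First and second cases: the current master keeps its role,
        # whether or not the slave still reaches the Tiebreaker.
        return "master continues; slave Outdated until resync"
    if has_quorum(links, "slave"):
        # Third case: the isolated master is blocked, the slave takes over.
        return "master blocked; slave promoted; old master resyncs"
    # Assumption: no partition holds a majority, so nobody may write.
    return "no quorum; no node modifies the data"


print(resolve(Links(False, True, True)))
print(resolve(Links(False, True, False)))
print(resolve(Links(False, False, True)))
```

In every branch at most one node is allowed to write, which is the property that prevents the two data sets from diverging.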
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The advantages of the present invention have been fully and effectively realized. The functional and structural principles of the present invention have been shown and described in the embodiments, and the embodiments of the invention may be varied or modified without departing from those principles.
Claims (5)
1. A method for preventing split-brain in a computing power server based on a quorum mechanism, comprising the steps of:
adding a Tiebreaker node to the DRBD mirrored block device, wherein the Tiebreaker node is used to determine whether the DRBD mirrored block device has quorum, configuring the disk state of the Tiebreaker node as Diskless, and connecting the Tiebreaker node to the other nodes in a high-availability cluster, wherein the number of nodes in the high-availability cluster is even;
modifying the configuration script so that the arbitration resource option is majority, i.e., more than half of all nodes: quorum is configured as majority, and the minimum redundancy, quorum-minimum-redundancy, is configured as 2;
if the master node and the slave node in the high-availability cluster are disconnected, determining the connection states of the master node, the slave node and the Tiebreaker node, and handling the failure when the votes satisfy majority, so that after the network connection is restored the data of the master node and the slave node are updated accordingly.
2. The method for preventing split-brain in a computing power server based on a quorum mechanism according to claim 1, wherein when the master node and the slave node are disconnected, the master node and the slave node each remain connected to the Tiebreaker node, and the votes satisfy majority, the master node continues to operate, and the slave node is demoted and its data becomes Outdated; after the master node and the slave node resume connection, the slave node synchronizes the data of the master node, and the data of the slave node becomes UpToDate.
3. The method for preventing split-brain in a computing power server based on a quorum mechanism according to claim 1, wherein when the master node and the slave node are disconnected, the master node remains connected to the Tiebreaker node, the slave node is disconnected from the Tiebreaker node, and the votes satisfy majority, the master node continues to operate, and the slave node becomes Outdated and loses quorum; after the master node and the slave node resume connection, the slave node synchronizes the data of the master node, and the data of the slave node becomes UpToDate.
4. The method for preventing split-brain in a computing power server based on a quorum mechanism according to claim 1, wherein the on-no-quorum resource option is set to io-error, and the on-no-quorum option is configured as suspend-io;
when the master node and the slave node are disconnected and the master node is also disconnected from the Tiebreaker node, the master node loses quorum and is suspended, and applications on the master node's DRBD device receive I/O errors; when the votes satisfy majority, the slave node is promoted to be the new master node and services continue to run on the new master node; after the master node and the slave node resume connection, the original master node synchronizes the data of the new master node, and the data of the original master node becomes UpToDate.
5. The method for preventing split-brain in a computing power server based on a quorum mechanism according to claim 1, wherein if the master node remains connected to the slave node, the slave node synchronizes the data of the master node normally.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410382774.9A CN118277167A (en) | 2024-04-01 | 2024-04-01 | Method for preventing split-brain in a computing power server based on a quorum mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410382774.9A CN118277167A (en) | 2024-04-01 | 2024-04-01 | Method for preventing split-brain in a computing power server based on a quorum mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118277167A (en) | 2024-07-02 |
Family
ID=91635518
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410382774.9A Pending CN118277167A (en) | Method for preventing split-brain in a computing power server based on a quorum mechanism | 2024-04-01 | 2024-04-01 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118277167A (en) |
-
2024
- 2024-04-01 CN CN202410382774.9A patent/CN118277167A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||