CN118277167A - Method for preventing split-brain in a computing power server based on a quorum mechanism - Google Patents

Info

Publication number
CN118277167A
Authority
CN
China
Prior art keywords
node
master node
quorum
tiebreaker
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410382774.9A
Other languages
Chinese (zh)
Inventor
张海斌
孙骥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Faith Information Technology Co ltd
Original Assignee
Shanghai Faith Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Faith Information Technology Co ltd
Priority to CN202410382774.9A
Publication of CN118277167A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2069 Management of state, configuration or failover
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The application discloses a method for preventing split-brain in a computing power server based on a quorum mechanism, comprising the following steps: adding a Tiebreaker node to the DRBD mirrored block device, the Tiebreaker node being used to determine whether the DRBD mirrored block device has quorum; configuring the disk state of the Tiebreaker node as Diskless and connecting the Tiebreaker node to the other nodes in a high-availability cluster, where the number of nodes in the high-availability cluster is even; modifying the script to set the quorum resource option to majority and to set the minimum redundancy, quorum-minimum-redundancy, to 2; and, if the master node and the slave node in the high-availability cluster are disconnected, determining the connection states among the master node, the slave node, and the Tiebreaker node and acting when the votes satisfy majority, so that once the network connection is restored the data of the master node and the slave node is updated accordingly. The method prevents DRBD split-brain and thereby avoids the loss of user data that split-brain causes.

Description

Method for preventing split-brain in a computing power server based on a quorum mechanism
Technical Field
The invention relates to the technical field of DRBD split-brain prevention, and in particular to a method for preventing split-brain in a computing power server based on a quorum mechanism.
Background
DRBD (Distributed Replicated Block Device) is a software-based, shared-nothing, replicated storage system that mirrors the content of block devices (disks, partitions, volumes, and so on) between hosts. DRBD consists of a kernel module, user-space management tools, and scripts, and is typically used in high-availability (HA) clusters. DRBD resembles RAID 1 (disk mirroring), except that RAID 1 mirrors within a single computer, whereas DRBD mirrors partitions across different computers over a network. DRBD treats each disk device (partition) it manages as a resource, and each resource has one of two roles: the Primary role (master) or the Secondary role (slave). On a DRBD device in the Primary role (the master node), DRBD allows operations such as reading, writing, and mounting; a DRBD device in the Secondary role (the slave node) can only passively receive the data synchronized from the master node, and DRBD does not allow it to be accessed or operated on.
DRBD split-brain refers to the following situation: because of a temporary network failure between cluster nodes, both nodes of the cluster become master nodes (the slave node's resource role changes to Primary), and the upper-layer application may write data to both master nodes at the same time. The two nodes are then no longer in the normal Connected state but in the StandAlone or WFConnection state, so neither node synchronizes data from its peer. As a result, the integrity of the written data is damaged.
To address split-brain, DRBD provides an automatic recovery strategy. After the network connection is restored, the strategy selects one node as the victim, either by comparing the order in which the two nodes became master or by comparing how much data each node wrote during the failure. The data that the victim node wrote during the failure is discarded, and the data on the other node is synchronized to the victim node, so that DRBD recovers from the split-brain automatically.
Such automatic split-brain recovery cannot guarantee the integrity of user data: whichever node is chosen as the victim, user data is lost whenever the upper-layer application wrote to both nodes during the network failure.
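The data loss described above can be illustrated with a minimal Python sketch. It assumes a toy model in which each node's writes during the outage are a list of strings and the victim is the node that wrote less data (one of the two criteria mentioned above). The function name `recover_split_brain` and the sample write labels are hypothetical and are not part of DRBD.

```python
from typing import List, Tuple


def recover_split_brain(
    writes_a: List[str], writes_b: List[str]
) -> Tuple[List[str], List[str]]:
    """Toy model of automatic split-brain recovery.

    The node that wrote less during the outage is treated as the victim.
    Returns (writes kept by the survivor, writes discarded on the victim).
    """
    if len(writes_a) >= len(writes_b):
        return list(writes_a), list(writes_b)
    return list(writes_b), list(writes_a)


# Both nodes acted as master during the outage and accepted different writes.
node_a_writes = ["a-1", "a-2", "a-3"]
node_b_writes = ["b-1"]

kept, lost = recover_split_brain(node_a_writes, node_b_writes)
print("kept:", kept)
print("lost:", lost)
```

Whichever selection rule is used, the `lost` list is non-empty as soon as both nodes accepted writes, which is the gap the quorum-based method is meant to close.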
Disclosure of Invention
One advantage of the present invention is that it provides a method for preventing split-brain in a computing power server based on a quorum mechanism. By adding a Tiebreaker node and relying on the quorum mechanism, the invention ensures that a node without quorum does not modify the replicated data set, which avoids inconsistent data between nodes, prevents DRBD split-brain, and thereby avoids the loss of user data that split-brain causes.
To achieve at least one of the above advantages, the present invention provides a method for preventing split-brain in a computing power server based on a quorum mechanism, comprising the following steps:
adding a Tiebreaker node to the DRBD mirrored block device, wherein the Tiebreaker node is used to determine whether the DRBD mirrored block device has quorum, configuring the disk state of the Tiebreaker node as Diskless, and connecting the Tiebreaker node to the other nodes in a high-availability cluster, wherein the number of nodes in the high-availability cluster is even;
modifying the script to set the quorum resource option to majority, that is, more than half of the total number of nodes, and to set the minimum redundancy, quorum-minimum-redundancy, to 2;
if the master node and the slave node in the high-availability cluster are disconnected, determining the connection states among the master node, the slave node, and the Tiebreaker node, and acting when the votes satisfy majority, so that once the network connection is restored the data of the master node and the slave node is updated accordingly.
According to an embodiment of the present invention, when the master node and the slave node are disconnected from each other, the master node and the slave node each remain connected to the Tiebreaker node, and the votes satisfy majority, the master node continues to work, the slave node is downgraded, and its data becomes Outdated; after the master node and the slave node reconnect, the slave node synchronizes the master node's data and the slave node's data becomes UpToDate.
According to an embodiment of the present invention, when the master node and the slave node are disconnected from each other, the master node remains connected to the Tiebreaker node, the slave node is disconnected from the Tiebreaker node, and the votes satisfy majority, the master node continues to work, and the slave node becomes Outdated and lacks quorum; after the master node and the slave node reconnect, the slave node synchronizes the master node's data and the slave node's data becomes UpToDate.
According to an embodiment of the invention, the on-no-quorum resource option is set to io-error, or alternatively configured as suspend-io;
when the master node and the slave node are disconnected from each other and the master node is also disconnected from the Tiebreaker node, the master node loses quorum, and its I/O is suspended or the applications using DRBD on the master node receive I/O errors; when the votes satisfy majority, the slave node is promoted to be the new master node and service continues to run on the new master node; after the master node and the slave node reconnect, the original master node synchronizes the data of the new master node, and the original master node's data becomes UpToDate.
According to an embodiment of the present invention, if the master node remains connected to the slave node, the slave node synchronizes the master node's data normally.
These and other objects, features and advantages of the present invention will become more fully apparent from the following detailed description.
Drawings
FIG. 1 is a schematic diagram of the method for preventing split-brain in a computing power server based on a quorum mechanism according to the present application.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art. The basic principles of the invention defined in the following description may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It will be appreciated by those skilled in the art that in the disclosure of the present specification, the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," etc. refer to an orientation or positional relationship based on that shown in the drawings, which is merely for convenience of description and to simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore, the above terms should not be construed as limiting the present invention.
It will be understood that the terms "a" and "an" should be interpreted as referring to "at least one" or "one or more," i.e., in one embodiment, the number of elements may be one, while in another embodiment, the number of elements may be plural, and the term "a" should not be interpreted as limiting the number.
Referring to FIG. 1, a method for preventing split-brain in a computing power server based on a quorum mechanism according to a preferred embodiment of the present invention is described in detail below. The method mainly includes the following steps:
A Tiebreaker node is added to the DRBD mirrored block device. The Tiebreaker node, also called a tie-breaking node, decides which side prevails when the cluster is split evenly, according to the relevant data of the nodes; that is, it is used to determine whether a node has quorum. Note that the Tiebreaker node does not require the same storage hardware as the other nodes, nor is it used to synchronize or back up data. The disk state of the Tiebreaker node is configured as Diskless, and the Tiebreaker node is connected to the other nodes in a high-availability cluster, wherein the number of nodes in the high-availability cluster is even;
The script is modified to set the quorum resource option to majority, that is, more than half of the total number of nodes: a cluster partition may modify the replicated data set only when the number of nodes it can communicate with is more than half of the total number of nodes. A node without quorum must ensure that the replicated data set is not modified, so the nodes cannot end up with inconsistent data. The script also sets the minimum redundancy, quorum-minimum-redundancy, to 2;
If the master node and the slave node in the high-availability cluster are disconnected, the connection states among the master node, the slave node, and the Tiebreaker node are determined, and action is taken when the votes satisfy majority, so that once the network connection is restored the data of the master node and the slave node is updated accordingly.
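The topology and the majority rule in the steps above can be sketched in Python as follows. This is an illustrative model only, not DRBD code: the names `Node`, `build_cluster`, and `has_quorum` are hypothetical, and the sketch assumes that the "even number of nodes" refers to the data-bearing nodes, so that adding the Diskless Tiebreaker makes the total vote count odd. The quorum-minimum-redundancy setting is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    diskless: bool = False  # True only for the Tiebreaker node


def build_cluster(data_nodes: List[str], tiebreaker: str) -> List[Node]:
    """Return the data nodes plus one Diskless Tiebreaker node.

    Assumption: the even node count in the text means the data-bearing
    nodes, so the Tiebreaker makes the number of votes odd.
    """
    if len(data_nodes) == 0 or len(data_nodes) % 2 != 0:
        raise ValueError("expected a non-zero, even number of data nodes")
    nodes = [Node(name) for name in data_nodes]
    nodes.append(Node(tiebreaker, diskless=True))
    return nodes


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """majority: strictly more than half of all votes (own vote included)."""
    return 2 * reachable_votes > total_votes


cluster = build_cluster(["master", "slave"], "tiebreaker")
total = len(cluster)

print(has_quorum(2, total))  # a data node that still reaches the Tiebreaker
print(has_quorum(1, total))  # a fully isolated node
print(has_quorum(1, 2))      # two nodes without a Tiebreaker, split one against one
```

The last call shows why the extra vote matters under this strict-majority reading: with only two nodes, neither side of an even split holds more than half of the votes.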
More specifically, when the master node and the slave node are disconnected from each other, the master node and the slave node each remain connected to the Tiebreaker node, and the votes satisfy majority:
the master node continues to work, the slave node is downgraded, and its data becomes Outdated; after the master node and the slave node reconnect, the slave node synchronizes the master node's data and the slave node's data becomes UpToDate.
In another case, when the master node and the slave node are disconnected from each other, the master node remains connected to the Tiebreaker node, the slave node is disconnected from the Tiebreaker node, and the votes satisfy majority:
the master node has quorum and continues to work, while the slave node becomes Outdated and lacks quorum, so the slave node cannot be promoted to master at this time; after the master node and the slave node reconnect, the slave node synchronizes the master node's data and the slave node's data becomes UpToDate.
In a third case, the on-no-quorum resource option is set to io-error, or alternatively configured as suspend-io;
when the master node and the slave node are disconnected from each other, the master node is also disconnected from the Tiebreaker node, the master node loses quorum so that its I/O is suspended or the applications using DRBD on the master node receive I/O errors, and the votes satisfy majority:
the slave node is promoted to be the new master node and service continues to run on the new master node; after the master node and the slave node reconnect, the original master node synchronizes the data of the new master node, and the original master node's data becomes UpToDate.
In one embodiment, if the master node remains connected to the slave node, the slave node synchronizes the master node's data normally.
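The difference between the two on-no-quorum settings named in the third case can be sketched with a toy Python model. `ToyDevice` and `OnNoQuorum` are hypothetical names used for illustration. In this sketch a suspended request is simply held in a list, which is a simplification of freezing I/O until quorum returns.

```python
from enum import Enum
from typing import List


class OnNoQuorum(Enum):
    IO_ERROR = "io-error"
    SUSPEND_IO = "suspend-io"


class ToyDevice:
    """Hypothetical stand-in for the replicated device on the master node."""

    def __init__(self, policy: OnNoQuorum) -> None:
        self.policy = policy
        self.has_quorum = True
        self.committed: List[bytes] = []
        self.suspended: List[bytes] = []

    def write(self, data: bytes) -> str:
        if self.has_quorum:
            self.committed.append(data)
            return "written"
        if self.policy is OnNoQuorum.IO_ERROR:
            # The application sees the failure immediately.
            raise IOError("no quorum: write rejected")
        # suspend-io: hold the request instead of completing it.
        self.suspended.append(data)
        return "suspended"


dev = ToyDevice(OnNoQuorum.IO_ERROR)
dev.write(b"before partition")
dev.has_quorum = False  # the master lost both the slave and the Tiebreaker
try:
    dev.write(b"during partition")
except IOError as exc:
    print("application error:", exc)
```

In both settings the isolated master does not commit new data, so nothing it holds can diverge from what the newly promoted master writes.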
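The cases described above can be summarized as a small decision function. This is a sketch of the outcomes as the text describes them, not of DRBD's internal logic. The function name `resolve` and the outcome strings are hypothetical, and the final branch (every link down) is an assumption, since the text does not cover it.

```python
from typing import Dict


def resolve(master_slave: bool, master_tb: bool, slave_tb: bool) -> Dict[str, str]:
    """Map link states (True = connected) to the outcome described in the text."""
    if master_slave:
        # Normal operation: the slave keeps synchronizing from the master.
        return {"active": "master", "slave": "replicating normally"}
    if master_tb:
        # Cases 1 and 2: the master keeps quorum together with the Tiebreaker.
        return {"active": "master", "slave": "Outdated, cannot be promoted"}
    if slave_tb:
        # Case 3: the master is isolated; the slave and the Tiebreaker hold the majority.
        return {
            "active": "slave (promoted)",
            "master": "no quorum, on-no-quorum action applies",
        }
    # Assumption (not covered in the text): every node is isolated.
    return {"active": "none", "master": "no quorum", "slave": "no quorum"}


scenarios = {
    "case 1": (False, True, True),
    "case 2": (False, True, False),
    "case 3": (False, False, True),
}
for label, links in scenarios.items():
    print(label, resolve(*links))
```

In every branch at most one node is active, which is the property that rules out two simultaneous masters.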
Those skilled in the art will appreciate that the embodiments of the invention described above and shown in the drawings are given by way of example only and are not limiting. The advantages of the present invention have been fully and effectively realized. The functional and structural principles of the present invention have been shown and described in the embodiments, and the embodiments may be varied or modified without departing from those principles.

Claims (5)

1. A method for preventing split-brain in a computing power server based on a quorum mechanism, comprising the following steps:
adding a Tiebreaker node to the DRBD mirrored block device, wherein the Tiebreaker node is used to determine whether the DRBD mirrored block device has quorum, configuring the disk state of the Tiebreaker node as Diskless, and connecting the Tiebreaker node to the other nodes in a high-availability cluster, wherein the number of nodes in the high-availability cluster is even;
modifying the script to set the quorum resource option to majority, that is, more than half of the total number of nodes, and to set the minimum redundancy, quorum-minimum-redundancy, to 2;
if the master node and the slave node in the high-availability cluster are disconnected, determining the connection states among the master node, the slave node, and the Tiebreaker node, and acting when the votes satisfy majority, so that once the network connection is restored the data of the master node and the slave node is updated accordingly.
2. The method for preventing split-brain in a computing power server based on a quorum mechanism according to claim 1, wherein when the master node and the slave node are disconnected from each other, the master node and the slave node each remain connected to the Tiebreaker node, and the votes satisfy majority, the master node continues to work, the slave node is downgraded, and its data becomes Outdated; and after the master node and the slave node reconnect, the slave node synchronizes the master node's data and the slave node's data becomes UpToDate.
3. The method for preventing split-brain in a computing power server based on a quorum mechanism according to claim 1, wherein when the master node and the slave node are disconnected from each other, the master node remains connected to the Tiebreaker node, the slave node is disconnected from the Tiebreaker node, and the votes satisfy majority, the master node continues to work, and the slave node becomes Outdated and loses quorum; and after the master node and the slave node reconnect, the slave node synchronizes the master node's data and the slave node's data becomes UpToDate.
4. The method for preventing split-brain in a computing power server based on a quorum mechanism according to claim 1, wherein the on-no-quorum resource option is set to io-error, or alternatively configured as suspend-io;
when the master node and the slave node are disconnected from each other and the master node is also disconnected from the Tiebreaker node, the master node loses quorum, and its I/O is suspended or the applications using DRBD on the master node receive I/O errors; when the votes satisfy majority, the slave node is promoted to be the new master node and service continues to run on the new master node; and after the master node and the slave node reconnect, the original master node synchronizes the data of the new master node, and the original master node's data becomes UpToDate.
5. The method for preventing split-brain in a computing power server based on a quorum mechanism according to claim 1, wherein if the master node remains connected to the slave node, the slave node synchronizes the master node's data normally.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410382774.9A CN118277167A (en) 2024-04-01 2024-04-01 Method for preventing split-brain in a computing power server based on a quorum mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410382774.9A CN118277167A (en) 2024-04-01 2024-04-01 Method for preventing split-brain in a computing power server based on a quorum mechanism

Publications (1)

Publication Number Publication Date
CN118277167A true CN118277167A (en) 2024-07-02

Family

ID=91635518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410382774.9A Pending CN118277167A (en) 2024-04-01 2024-04-01 Method for preventing split-brain in a computing power server based on a quorum mechanism

Country Status (1)

Country Link
CN (1) CN118277167A (en)

Similar Documents

Publication Publication Date Title
CN110083662B (en) Double-living framework construction method based on platform system
US8335899B1 (en) Active/active remote synchronous mirroring
AU2005207573B2 (en) Geographically distributed clusters
US8595546B2 (en) Split brain resistant failover in high availability clusters
EP2281240B1 (en) Maintaining data integrity in data servers across data centers
CN101755257B (en) Managing the copying of writes from primary storages to secondary storages across different networks
CN106776121B (en) Data disaster recovery device, system and method
US20020194015A1 (en) Distributed database clustering using asynchronous transactional replication
CN104536971A (en) High-availability database
US7831550B1 (en) Propagating results of a volume-changing operation to replicated nodes
WO2017041616A1 (en) Data reading and writing method and device, double active storage system and realization method thereof
JP2007086972A (en) Storage system, duplex control method, and program
CN115794499B (en) Method and system for dual-activity replication data among distributed block storage clusters
CN112783694B (en) Long-distance disaster recovery method for high-availability Redis
CN113254275A (en) MySQL high-availability architecture method based on distributed block device
CN113326251B (en) Data management method, system, device and storage medium
US20050097391A1 (en) Method, system, and article of manufacture for data replication
AU2005207572B2 (en) Cluster database with remote data mirroring
CN106325768B (en) A kind of two-shipper storage system and method
CN107357800A (en) A kind of database High Availabitity zero loses solution method
US7979396B1 (en) System and method for performing consistent resynchronization between synchronized copies
CN114089923A (en) Double-live storage system and data processing method thereof
CN107168656B (en) Volume copy set system based on multipath disk drive and implementation method thereof
CN118277167A (en) Method for preventing split-brain in a computing power server based on a quorum mechanism
CN112231399A (en) A method and device applied to a graph database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination