CN118535551A

CN118535551A - High availability data management method and device, program product, equipment and storage medium

Info

Publication number: CN118535551A
Application number: CN202410703924.1A
Authority: CN
Inventors: 阳振坤; 徐虎; 韩富晟; 刘浩; 陈斌
Original assignee: Beijing Oceanbase Technology Co Ltd
Current assignee: Beijing Oceanbase Technology Co Ltd
Priority date: 2024-05-31
Filing date: 2024-05-31
Publication date: 2024-08-23
Also published as: WO2025246814A1

Abstract

The present specification provides a high availability data management method and apparatus, a program product, a device, and a storage medium. The method is applied to a first database of a data system that further comprises an arbitration node and a second database, the first database being a master library and the second database being a standby library. The method comprises: writing target data indicated by a data writing request into the first database, and writing the target data into the second database; in response to failure in writing the target data into the second database, performing transaction rollback on the data writing request, and sending a standby library degradation request to the arbitration node and the second database; and in response to confirmation information returned by the arbitration node and/or the second database for the standby library degradation request, switching the data system from a standby normal state to a standby abnormal state. In the standby normal state, the master library determines that writing is successful in response to the data being written successfully in both the master library and the standby library; in the standby abnormal state, the master library determines that writing is successful in response to the data being written successfully in the master library.

Description

High availability data management method and apparatus, program product, device and storage medium

Technical Field

One or more embodiments of the present disclosure relate to the field of database technology, and in particular, to a method and apparatus for managing high availability data, a program product, a device, and a storage medium.

Background

Today, the internet and informatization are developing rapidly and the volume of generated data is growing explosively, so the requirements on databases and their management keep increasing. Existing distributed databases typically cope with temporary faults by providing a master library and a standby library: when the master library fails, the standby library can be switched to become the master library and continue to provide service, which avoids the loss of service quality and efficiency caused by a service outage when a database fails.

In general, the master library of a distributed database provides the write service externally, and the standby library is used to back up data. In the related art, when the standby library is abnormal, writes cannot be performed and the distributed database goes down, so that service cannot be provided externally.

Disclosure of Invention

In view of this, one or more embodiments of the present description provide a high availability data management method and apparatus, program product, device, and storage medium.

In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:

According to a first aspect of one or more embodiments of the present specification, a high availability data management method is provided, applied to a first database in a data system, the data system further including an arbitration node and a second database, the first database being a master database and the second database being a slave database, the method comprising:

In response to receiving a data writing request sent by a client, writing target data indicated by the data writing request in the first database, and writing the target data into the second database;

In response to failure of the write operation that writes the target data into the second database, performing transaction rollback on the data writing request, and sending a standby library degradation request to the arbitration node and to the second database respectively;

Switching the data system from a standby normal state to a standby abnormal state in response to receiving confirmation information returned by the arbitration node and/or the second database for the standby library degradation request;

Wherein, when the data system is in the standby normal state, the master library returns a write success message to the client in response to the target data being written successfully in both the master library and the standby library; and when the data system is in the standby abnormal state, the master library returns a write success message to the client in response to the target data being written successfully in the master library.
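The steps of the first aspect can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Primary`, `SystemState`, `write_standby` and `degrade_voters` are hypothetical, the standby write and the degradation confirmations are stubbed as callables, and returning the string "failed" after a rollback is an assumption, since the text does not specify the message sent to the client in that case.

```python
from enum import Enum
from typing import Callable, List


class SystemState(Enum):
    STANDBY_NORMAL = "standby_normal"
    STANDBY_ABNORMAL = "standby_abnormal"


class Primary:
    """Hypothetical model of the master library; all names are illustrative."""

    def __init__(self, write_standby: Callable[[str], bool],
                 degrade_voters: List[Callable[[], bool]]):
        self.state = SystemState.STANDBY_NORMAL
        self.data: List[str] = []
        self.write_standby = write_standby    # True if the standby wrote the data
        self.degrade_voters = degrade_voters  # arbitration node and standby library

    def handle_write(self, item: str) -> str:
        self.data.append(item)                # write in the master library
        if self.state is SystemState.STANDBY_ABNORMAL:
            return "success"                  # a master-only write is sufficient
        if self.write_standby(item):
            return "success"                  # both libraries hold the data
        self.data.pop()                       # transaction rollback
        # The degradation request goes to every voter; one confirmation suffices.
        if any([vote() for vote in self.degrade_voters]):
            self.state = SystemState.STANDBY_ABNORMAL
        return "failed"                       # assumed reply, not given in the text
```

With a standby library that cannot be written and an arbitration node that confirms, the first request is rolled back and the system degrades; the next request then succeeds with a master-only write.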

In one embodiment of the present disclosure, the writing, in the first database, the target data indicated by the data writing request, and writing, in the second database, the target data includes:

Writing target data indicated by the data writing request into a leader copy of the first database, and generating a data writing log;

And respectively sending the data writing log to other copies of the first database and the second database so that the other copies of the first database and the second database play back the data writing log respectively to write the target data.
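The log-based write path can be illustrated as follows. This is a sketch under assumptions: the `Replica` class, the `(sequence number, key, value)` log format and the strict sequence check are hypothetical choices made for the example; the text only states that a data writing log is generated on the leader copy and played back by the other copies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LogEntry = Tuple[int, str, str]  # (sequence number, key, value); hypothetical format


@dataclass
class Replica:
    name: str
    store: Dict[str, str] = field(default_factory=dict)
    applied: int = 0  # sequence number of the last replayed entry

    def replay(self, entry: LogEntry) -> bool:
        seq, key, value = entry
        if seq != self.applied + 1:
            return False          # out-of-order entry: no acknowledgement
        self.store[key] = value
        self.applied = seq
        return True               # stands in for the write confirmation message


def leader_write(leader: Replica, followers: List[Replica],
                 key: str, value: str) -> List[str]:
    """Write on the leader, generate a log entry, and send it to every other copy."""
    entry: LogEntry = (leader.applied + 1, key, value)
    leader.replay(entry)          # the leader applies its own entry
    return [f.name for f in followers if f.replay(entry)]
```

Here `followers` would contain both the other copies of the first database and the copies of the second database, and the returned names are the copies that acknowledged the write.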

In one embodiment of the present specification, the method further comprises:

Determining that the write operation of writing the target data in the first database is successfully executed, in response to receiving, within a preset time period, write confirmation messages returned by not less than a first preset number of the other copies of the first database;

And determining that the write operation of writing the target data in the first database fails to be executed, in response to not receiving, within the preset time period, write confirmation messages returned by not less than the first preset number of the other copies of the first database.

In one embodiment of the present disclosure, the writing the target data to the second database includes:

writing the target data to multiple copies of the second database, respectively;

determining that the write operation of writing the target data into the second database is successfully executed, in response to receiving, within a first preset time period, write confirmation messages returned by not less than a second preset number of the plurality of copies of the second database;

and determining that the write operation of writing the target data into the second database fails to be executed, in response to not receiving, within the first preset time period, write confirmation messages returned by not less than the second preset number of the plurality of copies of the second database.

In one embodiment of the present specification, the method further comprises:

In response to the first database being in a normal state and a lease renewal condition being met between the current moment and the master library lease, sending a first master library election request to the arbitration node and to the second database respectively;

And in response to receiving confirmation information returned by the arbitration node and/or the second database for the first master library election request, extending the master library lease by a second preset time period.
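The lease renewal step can be sketched as below. The function name and parameters are hypothetical. Two points are assumptions not fixed by the text: the renewal condition is modelled as "the current moment falls within a window before the lease expires", and the extension is added to the existing expiry time.

```python
from typing import Callable, Iterable


def renew_lease(now: float, lease_expiry: float, renew_window: float,
                extension: float, primary_healthy: bool,
                voters: Iterable[Callable[[], bool]]) -> float:
    """Return the (possibly extended) expiry time of the master library lease."""
    in_window = lease_expiry - renew_window <= now < lease_expiry
    if not (primary_healthy and in_window):
        return lease_expiry                      # no election request is sent
    confirmations = [vote() for vote in voters]  # arbitration node and standby
    if any(confirmations):
        return lease_expiry + extension          # the second preset time period
    return lease_expiry
```

A single confirmation, from either the arbitration node or the second database, is enough to extend the lease, matching the "and/or" wording above.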

In one embodiment of the present specification, the method further comprises:

In response to the first database being in an abnormal state and the lease renewal condition being met between the current moment and the master library lease, refraining from sending the first master library election request to the arbitration node and the second database, so that the second database, in response to the lease renewal condition being met between the current moment and the master library lease and the first master library election request not having been received, sends a second master library election request to the arbitration node and to the first database respectively, and switches the second database to be the master library in response to receiving confirmation information returned by the arbitration node and/or the first database for the second master library election request.
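Seen from the second database, the takeover logic reduces to one check per renewal point. The sketch below uses hypothetical names (`standby_tick`, `first_request_seen`) and stubs the confirmations from the arbitration node and the first database as callables.

```python
from typing import Callable, Iterable


def standby_tick(renewal_due: bool, first_request_seen: bool,
                 voters: Iterable[Callable[[], bool]]) -> str:
    """Return the role the second database holds after one check."""
    if not renewal_due or first_request_seen:
        return "standby"                      # the current master is still renewing
    # Send the second election request to the arbitration node and the first database.
    confirmations = [vote() for vote in voters]
    return "master" if any(confirmations) else "standby"
```

Because an abnormal first database deliberately stays silent, the arbitration node's confirmation alone can complete the switch.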

In one embodiment of the present specification, the method further comprises:

In response to receiving a data writing request sent by a client while the first database is in an abnormal state, returning to the client a message indicating that writing is not possible.

In one embodiment of the present specification, the method further comprises:

And determining that the first database is in an abnormal state in response to failure of execution of a write operation for writing the target data in the first database.

In one embodiment of the present specification, the method further comprises:

The system partition periodically sends a master library lease to the user partition in response to the first database being in a normal state, and determines that the first database is in an abnormal state in response to not receiving the master library lease periodically sent by the user partition;

The user partition periodically sends a master library lease to the system partition in response to the first database being in a normal state, and determines that the first database is in an abnormal state in response to not receiving the master library lease periodically sent by the system partition.
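The mutual lease check between the two partitions can be sketched with a simple monitor that each partition keeps for its peer. The class name, the `grace` tolerance and the time units are assumptions made for illustration; the text only states that a missing periodic lease leads to the abnormal determination.

```python
from dataclasses import dataclass


@dataclass
class PartitionMonitor:
    """Tracks the lease messages one partition expects from its peer partition."""
    period: float                 # interval at which the peer sends the lease
    grace: float = 2.0            # hypothetical tolerance, in multiples of `period`
    last_seen: float = 0.0

    def on_lease(self, now: float) -> None:
        self.last_seen = now      # a lease arrived from the peer partition

    def database_abnormal(self, now: float) -> bool:
        # No lease within the tolerated interval: treat the first database as abnormal.
        return now - self.last_seen > self.period * self.grace
```

The system partition and the user partition would each hold one such monitor, so that a failure on either side is noticed by the other.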

According to a second aspect of one or more embodiments of the present specification, there is provided a high availability data management method applied to a second database of a data system, the data system further comprising an arbitration node and a first database, the first database being a master database and the second database being a slave database, the method comprising:

In response to successful execution of a write operation by the first database to write target data to the second database, a write acknowledge message is returned to the first database to cause the first database to:

Returning a write success message to a client in response to the target data being written successfully in the first database and the second database, respectively;

In response to failure of the write operation that writes the target data into the second database, performing transaction rollback on the data writing request, and sending a standby library degradation request to the arbitration node and to the second database respectively; and switching the data system from a standby normal state to a standby abnormal state in response to receiving confirmation information returned by the arbitration node and/or the second database for the standby library degradation request;

Wherein the target data is indicated by a data writing request sent to the first database by the client, to be written into the first database and the second database respectively; when the data system is in the standby normal state, the master library returns a write success message to the client in response to the target data being written successfully in both the master library and the standby library; and when the data system is in the standby abnormal state, the master library returns a write success message to the client in response to the target data being written successfully in the master library.

In one embodiment of the present specification, the method further comprises:

In response to a lease renewal condition being met between the current moment and the master library lease and a first master library election request sent by the first database not having been received, sending a second master library election request to the arbitration node and to the first database respectively, and switching the second database to be the master library in response to receiving confirmation information returned by the arbitration node and/or the first database for the second master library election request.

According to a third aspect of one or more embodiments of the present specification, there is provided a high availability data management apparatus applied to a first database in a data system, the data system further comprising an arbitration node and a second database, the first database being a master library and the second database being a standby library, the apparatus comprising:

the writing module is used for writing, in response to receiving a data writing request sent by a client, target data indicated by the data writing request in the first database, and writing the target data into the second database;

the sending module is used for responding to the failure of the execution of the write operation of writing the target data into the second database, carrying out transaction rollback on the data writing request and respectively sending backup database degradation requests to the arbitration node and the second database;

the downgrade module is used for responding to the received determination information returned by the arbitration node and/or the second database for the backup database downgrade request and switching the data system from the backup database normal state to the backup database abnormal state;

In one embodiment of the present specification, the writing module is configured to:

In one embodiment of the present specification, the apparatus further includes a first determining module configured to:

In one embodiment of the present disclosure, the writing module is configured to, when writing the target data to the second database:

In one embodiment of the present specification, the apparatus further includes a contract module for:

In one embodiment of the present disclosure, the apparatus further includes a master-slave switching module, configured to:

In one embodiment of the present specification, the apparatus further includes an exception handling module configured to:

In one embodiment of the present specification, the apparatus further includes an anomaly identification module configured to:

In one embodiment of the present specification, the apparatus further includes an anomaly notification module configured to:

According to a fourth aspect of one or more embodiments of the present specification, there is provided a high availability data management apparatus for use in a second database in a data system, the data system further comprising an arbitration node and a first database, the first database being a master and the second database being a slave, the apparatus comprising:

A return module, configured to return, in response to successful execution of a write operation of the first database to write target data to the second database, a write acknowledgement message to the first database, so that the first database:

According to a fifth aspect of one or more embodiments of the present description, a computer program product is presented, comprising a computer program/instruction which, when executed by a processor, implements the steps of the method of the first or second aspect.

According to a sixth aspect of one or more embodiments of the present specification, there is provided an electronic device comprising:

A processor;

A memory for storing processor-executable instructions;

Wherein the processor implements the method of the first or second aspect by executing the executable instructions.

According to a seventh aspect of one or more embodiments of the present description, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to the first or second aspect.

The technical scheme provided by the embodiment of the specification can comprise the following beneficial effects:

The high-availability data management method provided by the embodiment of the specification is applied to a first database in a data system comprising a first database currently serving as a main database, a second database currently serving as a standby database and an arbitration node, and firstly, target data indicated by a data writing request are written in the first database in response to receiving the data writing request sent by a client, and the target data are written in the second database; then, in response to failure of execution of a write operation of writing the target data into the second database, performing transaction rollback on the data writing request, and respectively sending backup database degradation requests to the arbitration node and the second database; and finally, switching the data system from the standby normal state to the standby abnormal state in response to receiving the determination information returned by the arbitration node and/or the second database for the standby degradation request.

In other words, the method is applied to the master library of a data system. When a data writing request sent by a client is received, the master library writes the target data indicated by the request in the master library and also writes it to the standby library. If the target data is written successfully in both the master library and the standby library, a write success message can be returned to the client, i.e., the data writing request has been processed successfully. If the target data is not written successfully in the standby library, a standby library degradation request can be initiated in the data system, and the standby library is degraded once confirmation information from any member of the data system is received, so that in subsequent processing of data writing requests the master library can return the write success message to the client as soon as the target data is written successfully in the master library alone. For a data system that contains only two data copies, the master library and the standby library, the method is efficient when writing data. When the standby library is normal, data security is ensured, since a write success message is returned only after the data is written successfully in both the master library and the standby library. When the standby library is abnormal, normal data writing is still ensured, since the data system does not withhold the write success message and go down merely because the standby library cannot be written. In addition, when the abnormality of the standby library has not yet been detected (that is, the standby library cannot be written but the data system is still in the standby normal state), the method performs transaction rollback on data that was written successfully only in the master library. This prevents data that failed to be written to the standby library from being overlooked because the actual write outcome is inconsistent with the write requirement of the current state, and therefore avoids data loss when the standby library recovers or is even switched to be the master library.

Drawings

Fig. 1 is a block diagram of a data system provided by an exemplary embodiment.

Fig. 2 is a block diagram of a first database provided by an exemplary embodiment.

Fig. 3 is a block diagram of a second database provided by an exemplary embodiment.

Fig. 4 is a flow chart of a high availability data management method applied to a first database provided by an exemplary embodiment.

Fig. 5 is a block diagram of a data system provided by an exemplary embodiment.

Fig. 6 is a scenario diagram of normal data writing provided by an exemplary embodiment.

Fig. 7 is a scenario diagram of a standby library abnormality provided by an exemplary embodiment.

Fig. 8 is a scenario diagram of a master library abnormality provided by an exemplary embodiment.

Fig. 9 is a flow chart of a high availability data management method applied to a second database provided by an exemplary embodiment.

Fig. 10 is a schematic diagram of an apparatus according to an exemplary embodiment.

Fig. 11 is a block diagram of a high availability data management device for application to a first database, as provided by an example embodiment.

Fig. 12 is a block diagram of a high availability data management device for application to a second database, as provided by an example embodiment.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.

It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.

In general, the master library of a distributed database provides the write service externally, and the standby library is used to back up data. In the related art, when the standby library is abnormal, writes cannot be performed and the distributed database goes down, so that service cannot be provided externally. For example, a distributed database may operate in a maximum performance mode, which guarantees the efficiency of data writing but cannot guarantee data security. In this mode, a response message is returned to the client as soon as the data is written to the master library, and the written data is synchronized to the standby library asynchronously. The data writing efficiency of the master library (i.e., how quickly the response message is returned to the client) is therefore not affected by the delay of synchronizing data to the standby library, but there is no guarantee that the data has been synchronized to the standby library by the time the master library returns the response message. The standby library may thus be missing data relative to the master library, and that data is lost when the master library and the standby library are switched. As another example, the distributed database may operate in a maximum protection mode, which ensures data security but reduces the efficiency of data writing. In this mode, a response message is returned to the client only after the data has been written successfully in the master library and synchronized successfully to the standby library. In particular, when the standby library is abnormal, the master library can never return the response message to the client, and service cannot be provided externally because data cannot be written.

Based on this, at least one embodiment of the present disclosure provides a high availability data management method that automatically and dynamically adjusts the state of the data system. In the standby normal state, the data system returns a response message to the client when the data is written successfully in both the master library and the standby library; in the standby abnormal state, it returns the response message to the client when the data is written successfully in the master library. Data security and processing efficiency are thus both taken into account when processing a data writing request: when the standby library is normal, the data is guaranteed to be written successfully in both the master library and the standby library, and when the standby library is abnormal, the response message can still be returned normally rather than being blocked because the standby library cannot be written.
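The difference between the two related-art modes and the state-dependent behaviour described here can be written as a single acknowledgement predicate. The mode labels `max_performance`, `max_protection` and `adaptive` are hypothetical names used only for this comparison.

```python
def should_ack(mode: str, primary_ok: bool, standby_ok: bool,
               standby_normal: bool = True) -> bool:
    """Decide whether the master library may return a response message."""
    if mode == "max_performance":
        return primary_ok                      # standby is synchronized asynchronously
    if mode == "max_protection":
        return primary_ok and standby_ok       # blocks while the standby is abnormal
    if mode == "adaptive":                     # state-dependent rule of this method
        return primary_ok and (standby_ok or not standby_normal)
    raise ValueError(f"unknown mode: {mode}")
```

In the standby normal state the adaptive rule coincides with maximum protection, and in the standby abnormal state it coincides with maximum performance.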

The method is applied to the distributed data system shown in Fig. 1. The data system comprises a first database A, a second database B and an arbitration node C, each of which runs a distributed consensus protocol such as the Paxos protocol. That is, the first database A, the second database B and the arbitration node C form a distributed consensus protocol member group, each acting as one member of the group. Master library, standby library and arbitration are the three roles of the member group and are held by the three members; the arbitration node C always holds the arbitration role, and the master library role can be held by only one member at a time. For example, in Fig. 1 the first database A serves as the master library and the second database B serves as the standby library, and together with the arbitration node C they form the Paxos member group {[A, master library], [B, standby library], [C, arbitration]}.
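The role constraints of the three-member group can be expressed as a small check. The dictionary representation and the function names are assumptions for illustration; the consensus protocol itself is not modelled.

```python
from typing import Dict

ROLES = {"master", "standby", "arbitration"}


def validate_group(group: Dict[str, str], arbiter: str = "C") -> bool:
    """Check the role constraints stated for the three-member group."""
    if len(group) != 3 or set(group.values()) != ROLES:
        return False                            # each role is held by exactly one member
    return group.get(arbiter) == "arbitration"  # the arbitration node never changes role


def switch_roles(group: Dict[str, str]) -> Dict[str, str]:
    """Swap master and standby; the arbitration role is left untouched."""
    swap = {"master": "standby", "standby": "master", "arbitration": "arbitration"}
    return {member: swap[role] for member, role in group.items()}
```

For the group of Fig. 1, `{"A": "master", "B": "standby", "C": "arbitration"}` is valid, and it remains valid after a master/standby switch.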

The first database may be in the form of a database cluster that internally contains a plurality of partitions, each partition containing a plurality of copies, namely a leader copy and normal copies. For example, the first database A shown in Fig. 2 has only one partition, which includes three copies, copy R1, copy R2 and copy R3, with copy R1 as the leader copy; the member group information of the internal partition of the first database may then be [A, master library, (R1, R2, R3)]. Similarly, the second database may be in the form of a database cluster that internally contains a plurality of partitions, each partition containing a plurality of copies, namely a leader copy and normal copies. For example, the second database B shown in Fig. 3 has only one partition, which includes three copies, copy R4, copy R5 and copy R6, with copy R4 as the leader copy; the member group information of the internal partition of the second database may then be [B, standby library, (R4, R5, R6)]. The way in which the leader copy of each partition in the databases is elected is not limited by this disclosure.

The method may be performed by a first database as a primary database and a second database as a backup database within the data system, and the method is described in detail below from both sides of the first database and the second database, respectively.

Referring to fig. 4, a flow of a high availability data management method applied to a first database is schematically shown, and includes steps S401 to S403.

In step S401, in response to receiving a data writing request sent by a client, writing target data indicated by the data writing request in the first database, and writing the target data into the second database.

The data system has a standby normal state and a standby abnormal state. When the data system is in the standby normal state, the master library returns a write success message to the client in response to the target data being written successfully in both the master library and the standby library; when the data system is in the standby abnormal state, the master library returns a write success message to the client in response to the target data being written successfully in the master library. In other words, the master library needs to write data into both the master library and the standby library in the standby normal state, and only needs to write data into the master library in the standby abnormal state. It should be understood that the state of the data system and the actual state of the standby library are not necessarily identical at every moment; the method can correct the state of the data system and guard against the data loss that the difference between the two may cause.

The data writing request is used to instruct the data system to write target data. The data writing request is received and processed by the first database acting as the master library. Because the current state of the data system is the standby normal state, the first database acting as the master library writes the target data into the first database and the second database respectively.

It should be appreciated that if the data system has a system partition and a user partition, the data write request needs to be assigned to the corresponding partition to perform the write operation of this step. Preferably, if the write operation is performed by the user partition, the result of the write operation may be synchronized to the system partition, especially if the write operation to write the target data to the second database fails to be performed.

Illustratively, in this step, the target data may be written in the first database and the target data may be written in the second database in the following manner:

First, target data indicated by the data writing request is written in a leader copy of the first database, and a data writing log is generated.

Then, the data write log is sent to the other copies of the first database and to the second database respectively, so that the other copies of the first database and the second database each play back the data write log to write the target data. For example, the leader copy of the first database sends the data write log to the other copies of the first database and to the second database, respectively.

It should be understood that after the other copies of the first database receive the data write log sent by the leader copy, each of them writes the target data into itself by playing back the data write log and, after the write succeeds, returns a write confirmation message to the leader copy. Based on this, the leader copy may determine that the write operation of writing the target data in the first database is successfully executed in response to receiving, within a preset time period, write confirmation messages returned by not less than a first preset number of the other copies of the first database; and may determine that the write operation of writing the target data in the first database fails to be executed in response to not receiving, within the preset time period, write confirmation messages returned by not less than the first preset number of the other copies of the first database.

In other words, the first database determines that the target data is written successfully when the leader copy and not less than the first preset number of normal copies in the first database have all written the target data successfully. That is, if the first database has m copies, the target data is determined to be written successfully in the first database once it has been written successfully in Q(m) copies, where Q(m) equals the first preset number plus 1. For example, the first database A shown in Fig. 2 has 3 copies (m = 3) and Q(m) is 2, so its internal member group information may be [A, master library, (R1, R2, R3), 3, 2].
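The relationship between Q(m) and the first preset number can be checked with a few lines. The `majority` helper is an assumption: a simple majority is one common choice of Q(m) that reproduces the Fig. 2 values (m = 3, Q(m) = 2), but the text does not fix the formula.

```python
def write_succeeded(acks_from_others: int, first_preset_number: int) -> bool:
    """The leader counts as one copy, so Q(m) = first_preset_number + 1."""
    return acks_from_others >= first_preset_number


def majority(m: int) -> int:
    """One common choice of Q(m); an assumption, not stated in the text."""
    return m // 2 + 1
```

For three copies, `majority(3)` gives 2, so the first preset number is 1 and a single confirmation from R2 or R3 completes the write in the first database.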

In this example, after completing the writing of the target data and generating the data write log, the leader copy of the first database synchronizes the data write log to the other copies of the first database and to the second database respectively. Compared with the related-art approach, in which the leader copy first synchronizes the target data to the other copies of the first database through the data write log and only then synchronizes it to the second database, this improves the efficiency of synchronizing the target data to the second database and reduces the data synchronization delay between the master library and the standby library.
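The delay reduction can be illustrated with a deliberately simplified latency model. It assumes that the two synchronizations overlap completely when the log is sent to both destinations in one step, and the millisecond values are hypothetical.

```python
def sequential_latency(local_ms: float, standby_ms: float) -> float:
    """Related art: synchronize inside the first database, then to the second database."""
    return local_ms + standby_ms


def parallel_latency(local_ms: float, standby_ms: float) -> float:
    """This example: both synchronizations start together and overlap (assumed)."""
    return max(local_ms, standby_ms)
```

With 2 ms for the local copies and 5 ms for the standby library, the sequential path takes 7 ms while the overlapping path takes 5 ms under this model.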

Also by way of example, the target data may be written to the second database in this step as follows:

First, the target data is written to multiple copies of the second database, respectively.

For example, the target data is written to multiple copies of the second database in the manner described in the above example, that is, the leader copy of the first database sends the data write log to each copy of the second database, so that each copy of the second database plays back the data write log to write the target data within that copy. In other words, the specific process of writing the target data in the first database and writing the target data to the second database in this step is shown in fig. 5, namely: first, the target data indicated by the data write request is written in the leader copy of the first database, and a data write log is generated; second, the data write log is sent to the other copies of the first database and to each copy of the second database, so that the other copies of the first database and each copy of the second database play back the data write log to write the target data.

For another example, after the target data has been written in the leader copy and in other copies (e.g., the first preset number of copies) of the first database, the first database writes the target data to each copy of the second database simultaneously, e.g., by sending the data write log.

It should be appreciated that the copy of the second database may return a write acknowledge message to the first database (e.g., the leader copy of the first database) after the target data is successfully written.

Then, in response to receiving, within a first preset time period, write confirmation messages returned by not fewer than a second preset number of the plurality of copies of the second database, it is determined that the write operation of writing the target data to the second database is successfully executed; and in response to not receiving, within the first preset time period, write confirmation messages returned by not fewer than the second preset number of the plurality of copies of the second database, it is determined that the write operation of writing the target data to the second database fails to be executed.

In other words, the target data is considered successfully written to the second database only when it is successfully written in not fewer than the second preset number of copies of the second database. That is, if the second database has n copies, the target data is determined to be successfully written to the second database only when it is successfully written in Q(n) copies. For example, fig. 3 shows a second database B that has 3 copies with Q(n) equal to 2, so its internal member group information may be [B, backup library, (R4, R5, R6), 3, 2].

In this example, the first database determines that the target data is successfully written to the second database only when the target data has been successfully written in multiple copies of the second database. This increases the safety and reliability of the target data: more backups of the data exist, which avoids the situation in which the data cannot be recovered because a backup is lost.

As can be seen from the above description of this step, in fig. 1 the first database A, the second database B, and the arbitration node C form the Paxos member group {[A, master library, (master library copy list), m, Q(m)], [B, backup library, (backup library copy list), n, Q(n)], [C, arbitration]}; for example, the data system shown in fig. 5 forms the Paxos member group {[A, master library, (R1, R2, R3), 3, 2], [B, backup library, (R4, R5, R6), 3, 2], [C, arbitration]}.
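One way to picture the member group information is as a small immutable data structure. The sketch below is illustrative only; the class names `LibraryMember` and `MemberGroup`, their field names, and the `describe` helper are assumptions introduced for this example and do not come from the specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LibraryMember:
    name: str                 # e.g. "A" or "B"
    role: str                 # "master library" or "backup library"
    copies: Tuple[str, ...]   # copy list, e.g. ("R1", "R2", "R3")
    quorum: int               # Q(m) or Q(n)

    @property
    def copy_count(self) -> int:
        return len(self.copies)


@dataclass(frozen=True)
class MemberGroup:
    libraries: Tuple[LibraryMember, ...]
    arbiter: str

    def describe(self) -> str:
        """Render the group in the bracket notation used in the text."""
        parts = [
            f"[{m.name}, {m.role}, ({', '.join(m.copies)}), {m.copy_count}, {m.quorum}]"
            for m in self.libraries
        ]
        parts.append(f"[{self.arbiter}, arbitration]")
        return "{" + ", ".join(parts) + "}"


group = MemberGroup(
    libraries=(
        LibraryMember("A", "master library", ("R1", "R2", "R3"), 2),
        LibraryMember("B", "backup library", ("R4", "R5", "R6"), 2),
    ),
    arbiter="C",
)
print(group.describe())
```

Keeping the structure immutable is a design choice of this sketch: every member group change then yields a new value that could be agreed on as a whole.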

In step S402, in response to failure in execution of the write operation to write the target data to the second database, transaction rollback is performed on the data write request, and a backup database degradation request is sent to the arbitration node and the second database, respectively.

Performing transaction rollback on the data write request means undoing the actions that have already been completed in response to the data write request, for example deleting the target data written in the first database. At the same time, a write failure message may be returned to the client. Transaction rollback keeps the data of the master library and the backup library consistent. It prevents a mismatch between what was actually written and what the current state requires, which would otherwise cause the data that failed to reach the backup library to be overlooked, and it thereby avoids data loss when the backup library later recovers or is even switched to master library.

Illustratively, taking the data system shown in fig. 5 as an example, the backup database degradation request may be a request to change the Paxos member group {[A, master library, (R1, R2, R3), 3, 2], [B, backup library, (R4, R5, R6), 3, 2], [C, arbitration]} to {[A, master library, (R1, R2, R3), 3, 2], [B, backup library, (R4, R5, R6), 3, 0], [C, arbitration]}.

Upon receiving the backup database degradation request, the arbitration node and (the leader copy of) the second database return determination information for the request to the first database. It should be appreciated that when the second database has failed to write the target data, it may also fail to receive the backup database degradation request and therefore may not return determination information for it.

Preferably, if the system partition of the data system executes the writing operation in step S401, when the writing operation of writing the target data into the second database fails to execute, the system partition may initiate a backup database degradation in the distributed consensus protocol membership group, that is, send a backup database degradation request to the arbitration node and the second database respectively; if the user partition of the data system performs the write operation in step S401, when the write operation of writing the target data into the second database fails, the result may be synchronized to the system partition, so that the system partition initiates the backup degradation in the distributed consensus protocol member group, that is, sends a backup degradation request to the arbitration node and the second database, respectively.

It should be appreciated that, in response to successful execution of both the write operation of writing the target data in the first database and the write operation of writing the target data to the second database, the first database may return to the client a write success message indicating that processing of the data write request is complete.

In step S403, in response to receiving the determination information returned by the arbitration node and/or the second database for the backup degradation request, the data system is switched from a backup normal state to a backup abnormal state.

When the data system is in the backup library normal state, the master library returns a write success message to the client in response to the target data being successfully written in both the master library and the backup library; when the data system is in the backup library abnormal state, the master library returns a write success message to the client in response to the target data being successfully written in the master library alone. Therefore, after the switch, when the first database receives a new data write request or re-executes the data write request that was rolled back in step S402, it may return a write success message to the client in response to successful execution of the write operation in the first database only.
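The state-dependent success rule can be summarized in a few lines. The following is a hedged sketch that assumes the write in the master library has already succeeded; the names `SystemState`, `Outcome`, and `decide_outcome` are hypothetical and introduced only for illustration.

```python
from enum import Enum
from typing import NamedTuple


class SystemState(Enum):
    BACKUP_NORMAL = "backup library normal"
    BACKUP_ABNORMAL = "backup library abnormal"


class Outcome(NamedTuple):
    reply: str                  # message returned to the client
    rollback: bool              # whether the transaction is rolled back
    request_degradation: bool   # whether a degradation request is sent


def decide_outcome(state: SystemState, backup_write_ok: bool) -> Outcome:
    """Decide what happens after the master library write has succeeded."""
    if state is SystemState.BACKUP_ABNORMAL or backup_write_ok:
        # Either the backup library is not required, or it confirmed the write.
        return Outcome("write success", rollback=False, request_degradation=False)
    # Backup library normal state, but the backup library write failed (step S402).
    return Outcome("write failure", rollback=True, request_degradation=True)


print(decide_outcome(SystemState.BACKUP_NORMAL, backup_write_ok=False))
print(decide_outcome(SystemState.BACKUP_ABNORMAL, backup_write_ok=False))
```

The second call shows why a rolled-back request can succeed when it is re-executed after the state switch: in the backup library abnormal state, the backup library write no longer influences the result.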

It should be understood that, because the moment at which the data system switches from the backup library normal state to the backup library abnormal state is recorded, data that was not backed up to the backup library during the abnormal state can be backed up by an operator through a specific operation after the backup library returns to normal. This also shows indirectly why the transaction rollback in step S402 is necessary: the data actually written must match the write requirement of the current state so that this record is accurate and usable.

After the backup database is recovered to be normal, the first database can send backup database upgrading requests to the arbitration node and the second database respectively, and switch the state of the data system from the backup database abnormal state to the backup database normal state in response to receiving confirmation messages for the backup database upgrading requests returned by the arbitration node and/or the second database.

For example, taking the data system shown in fig. 5 as an example, the backup library upgrade request may be to change Paxos membership { [ a, master library, (R1, R2, R3), 3,2], [ B, backup library, (R4, R5, R6), 3,0], [ C, arbitration ] } to { [ a, master library, (R1, R2, R3), 3,2], [ B, backup library, (R4, R5, R6), 3,2], [ C, arbitration ] }.
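Both the degradation and the upgrade of the backup library amount to changing one number in the member group, namely Q(n). The sketch below illustrates this round trip; the dictionary layout and the helper name `set_backup_quorum` are assumptions made for the example, and the arbitration node entry is omitted for brevity.

```python
from typing import Dict, Tuple

# name -> (role, copy list, quorum); a hypothetical in-memory layout.
Entry = Tuple[str, Tuple[str, ...], int]


def set_backup_quorum(group: Dict[str, Entry], quorum: int) -> Dict[str, Entry]:
    """Return a new member group whose backup library entry uses `quorum`."""
    changed = dict(group)
    for name, (role, copies, _old_quorum) in group.items():
        if role == "backup library":
            if not 0 <= quorum <= len(copies):
                raise ValueError("quorum must be between 0 and the copy count")
            changed[name] = (role, copies, quorum)
    return changed


normal = {
    "A": ("master library", ("R1", "R2", "R3"), 2),
    "B": ("backup library", ("R4", "R5", "R6"), 2),
}
degraded = set_backup_quorum(normal, 0)    # backup library degradation
restored = set_backup_quorum(degraded, 2)  # backup library upgrade
print(degraded["B"][2], restored == normal)  # 0 True
```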

Next, the effect of the high availability data management method provided in this embodiment will be described with reference to a specific scenario and the data system shown in fig. 5. The Paxos member group of the data system shown in fig. 5 in the backup library normal state is {[A, master library, (R1, R2, R3), 3, 2], [B, backup library, (R4, R5, R6), 3, 2], [C, arbitration]}.

Referring to the scenario shown in fig. 6 and the scenario shown in fig. 7, one data write can ensure that at least 2 of the copies R1, R2, R3 are successfully written, and at least 2 of the copies R4, R5, R6 are successfully written, so that both the master library and the slave library are normal.

Referring to the scenario shown in fig. 7, 2 copies of the backup library are abnormal, so a data write cannot satisfy the requirement of being written to at least 2 copies of the backup library, and the data system cannot write data. At this time, the master library and the arbitration node are still alive and form the majority required for a member group change. The leader copy of the system partition of the master library finds that data cannot be written to 2 copies of the backup library within the set time, and changes the Paxos member group of the data system from {[A, master library, (R1, R2, R3), 3, 2], [B, backup library, (R4, R5, R6), 3, 2], [C, arbitration]} to {[A, master library, (R1, R2, R3), 3, 2], [B, backup library, (R4, R5, R6), 3, 0], [C, arbitration]}. From then on, a data write succeeds as soon as it is written to any 2 copies of the master library, and the system recovers its normal data writing capability. After the backup library copies R5 and R6 are repaired, the leader copy of the system partition of the master library can, with the master library and the arbitration node as the majority for the member group change, change the Paxos member group of the data system from {[A, master library, (R1, R2, R3), 3, 2], [B, backup library, (R4, R5, R6), 3, 0], [C, arbitration]} back to {[A, master library, (R1, R2, R3), 3, 2], [B, backup library, (R4, R5, R6), 3, 2], [C, arbitration]}. A data write then counts as successful only when it is written to 2 copies of the master library and 2 copies of the backup library, and the data protection capability of the system returns to normal.
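The role of the majority in a member group change can be illustrated with a short check. This sketch assumes that the member proposing the change counts as one vote for its own request, which is how the text's "master library and arbitration node" majority is read here; the function name `change_accepted` is hypothetical.

```python
from typing import Iterable, Sequence


def change_accepted(
    proposer: str,
    confirmations: Iterable[str],
    members: Sequence[str] = ("A", "B", "C"),
) -> bool:
    """Return True if the proposer plus its confirmers form a majority."""
    if proposer not in members:
        return False
    # Assumption: the proposer counts as one vote for its own request.
    voters = {proposer} | {c for c in confirmations if c in members}
    return len(voters) > len(members) // 2


# Backup library B is unreachable, but arbitration node C confirms.
print(change_accepted("A", ["C"]))  # True
# Neither B nor C answers, so no majority exists.
print(change_accepted("A", []))     # False
```

With three members, a single confirmation from either the arbitration node or the other database is enough, which matches the "and/or" wording used for the confirmation messages.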

In the distributed consensus protocol member group formed by the data system, the master library role is unique and is backed by a master library lease; once the master library lease expires, the master library role lapses. A suitable lease renewal mechanism, together with a matching master-backup switching mechanism, can therefore be set up to identify whether the database serving as the master library is normal, so that an abnormality of that database is discovered in time and the database serving as the master library is always able to provide service. Preferably, once a database becomes the master library, it continues as the master library through the renewal mechanism as long as no abnormality occurs; if an abnormality occurs, it steps down as master library and the backup library takes over as master library through the master-backup switching mechanism.

In some embodiments of the present disclosure, the renewal mechanism proceeds as follows: first, in response to the first database being in a normal state and a renewal condition being met between the current moment and the master library lease, a first master library election request is sent to the arbitration node and the second database, respectively; then, in response to receiving confirmation information returned by the arbitration node and/or the second database for the first master library election request, the master library lease is extended by a second preset time period.

The master-backup switching mechanism is as follows: in response to the first database being in an abnormal state and the renewal condition being met between the current moment and the master library lease, the first database does not send the first master library election request to the arbitration node and the second database. As a result, the second database, in response to the renewal condition being met between the current moment and the master library lease and the first master library election request not having been received, sends a second master library election request to the arbitration node and the first database respectively, and switches itself to master library in response to receiving confirmation information returned by the arbitration node and/or the first database for the second master library election request.

Preferably, the master-backup switching mechanism may further include: in response to the first database being in an abnormal state and receiving a data write request sent by a client, returning to the client a message indicating that the data cannot be written. That is, when the first database is abnormal, it steps down as master library in time and no longer processes data write requests from the client.
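The renewal mechanism and the master-backup switching mechanism can be viewed as two small decision rules, one evaluated by the master library and one by the backup library. The sketch below is a simplified illustration; the function names and the returned strings are assumptions, and message transport and confirmation handling are left out.

```python
def master_step(is_normal: bool, renewal_due: bool) -> str:
    """What the master library does when it evaluates its lease."""
    if not renewal_due:
        return "wait"
    if is_normal:
        return "send first master library election request"
    # An abnormal master library deliberately lets its lease run out.
    return "stay silent"


def backup_step(renewal_due: bool, first_request_received: bool) -> str:
    """What the backup library does when it evaluates the same lease."""
    if renewal_due and not first_request_received:
        return "send second master library election request"
    return "wait"


def answer_client_write(is_normal: bool) -> str:
    """An abnormal master library refuses client writes."""
    return "process write" if is_normal else "cannot write"


# An abnormal master library stays silent, so the backup library steps in.
print(master_step(is_normal=False, renewal_due=True))
print(backup_step(renewal_due=True, first_request_received=False))
print(answer_client_write(is_normal=False))
```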

It should be understood that the renewal mechanism and the master-backup switching mechanism described above may be executed by the system partition of the first database, which after renewing notifies the user partitions so that they renew automatically (i.e., continue to act as master library within their own partition); alternatively, the renewal mechanism and the master-backup switching mechanism may be executed separately by each partition (i.e., the system partition and the user partitions) of the first database within that partition.

For example, using the data system shown in FIG. 5 as an example, the second master election request may be to change Paxos membership { [ A, master, (R1, R2, R3), 3,2], [ B, backup, (R4, R5, R6), 3,2], [ C, arbitration ] } to { [ A, backup, (R1, R2, R3), 3,2], [ B, master, (R4, R5, R6), 3,2], [ C, arbitration ] }.

The renewal condition being met between the current moment and the master library lease may mean that a specific time interval remains between the current moment and the expiration time of the master library lease, i.e., the master library lease is approaching expiry.
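As an illustration, the renewal condition can be expressed as a comparison between the remaining lease time and a window. The sketch below is one possible reading, not the specified rule: the name `renewal_condition_met`, the parameter `renewal_window`, and the choice to treat an already expired lease as not meeting the condition are all assumptions.

```python
def renewal_condition_met(now: float, lease_expiry: float, renewal_window: float) -> bool:
    """Return True when the lease is close to expiry but not yet expired.

    All times are in seconds on the same clock. `renewal_window` is the
    assumed "specific time interval" before expiry in which renewal starts.
    """
    remaining = lease_expiry - now
    return 0.0 <= remaining <= renewal_window


print(renewal_condition_met(now=97.0, lease_expiry=100.0, renewal_window=5.0))   # True
print(renewal_condition_met(now=90.0, lease_expiry=100.0, renewal_window=5.0))   # False
print(renewal_condition_met(now=101.0, lease_expiry=100.0, renewal_window=5.0))  # False
```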

If the first database has not been determined to be in an abnormal state, it can be regarded as being in a normal state. By way of example, whether the first database is in an abnormal state may be determined in the manner provided by any of the following alternative examples:

Alternative example 1

In this example, if a partition (e.g., a system partition or a user partition) fails to perform a write operation to write target data in the first database, the partition determines that the first database is in an abnormal state.

Alternative example 2

In this example, the system partition and the user partition periodically send the master library lease to each other to notify the other party that they themselves are normal. The first database can be confirmed to be in a normal state only when both the system partition and the user partition are normal and each has confirmed this to the other through the notification; through the periodic mutual notification, each partition can also learn of an abnormality of the other in time. For example, if the system partition has not elected a leader copy, it is in an abnormal state, and without a leader copy it cannot send the master library lease to the user partition, so the user partition can learn in time that the system partition is abnormal and confirm that the first database is in an abnormal state. For another example, if the system partition determines that the first database is in an abnormal state because a write operation of writing the target data fails (i.e., in the manner provided in alternative example 1), it will not send the master library lease to the user partition, so the user partition can likewise learn in time that the system partition is abnormal and confirm that the first database is in an abnormal state. The same applies in the opposite direction. If the user partition has not elected a leader copy, it is in an abnormal state, and without a leader copy it cannot send the master library lease to the system partition, so the system partition can learn in time that the user partition is abnormal and confirm that the first database is in an abnormal state. And if the user partition determines that the first database is in an abnormal state because a write operation of writing the target data fails (i.e., in the manner provided in alternative example 1), it will not send the master library lease to the system partition, so the system partition can learn in time that the user partition is abnormal and confirm that the first database is in an abnormal state.
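The mutual notification described in this example behaves like a heartbeat check. The sketch below shows one way such a check could look; the function name `first_database_abnormal`, the `max_silence` threshold, and the use of `None` for a partition that has never sent a notification are assumptions for illustration.

```python
from typing import Dict, Optional


def first_database_abnormal(
    last_notice: Dict[str, Optional[float]],
    now: float,
    max_silence: float,
) -> bool:
    """Return True if any partition has been silent for too long.

    last_notice maps a partition name to the time (in seconds) at which its
    latest master library lease notification was received, or None if no
    notification has ever been received from it.
    """
    return any(
        sent_at is None or now - sent_at > max_silence
        for sent_at in last_notice.values()
    )


# The user partition last reported 3 seconds ago, beyond the 1 second limit.
print(first_database_abnormal({"system": 9.5, "user": 7.0}, now=10.0, max_silence=1.0))  # True
print(first_database_abnormal({"system": 9.5, "user": 9.8}, now=10.0, max_silence=1.0))  # False
```

A partition that has no leader copy, or that has already judged the first database abnormal, simply stops sending, so both causes named in the text surface through the same silence check.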

Based on this, if the renewal mechanism and the master-backup switching mechanism of this embodiment are executed by the system partition of the first database, then with this example the first database can be determined in time to be in an abnormal state whenever the system partition itself or a user partition is abnormal, after which it steps down as master library so that the master-backup switching mechanism runs and the master-backup switch is completed. If the renewal mechanism and the master-backup switching mechanism of this embodiment are executed separately in each partition (i.e., the system partition and the user partitions) of the first database, then with this example the system partition or a user partition can learn in time of an abnormality of the other party and determine that the first database is in an abnormal state, after which it steps down as master library so that each partition completes the master-backup switch by executing the master-backup switching mechanism.

Next, the effect of the master-backup switching mechanism provided in this embodiment will be described with reference to a specific scenario and the data system shown in fig. 5. The Paxos member group of the data system shown in fig. 5 in the backup library normal state is {[A, master library, (R1, R2, R3), 3, 2], [B, backup library, (R4, R5, R6), 3, 2], [C, arbitration]}.

Referring to the scenario shown in fig. 6, one data write can ensure that at least 2 of the copies R1, R2, R3 are successfully written, and at least 2 of the copies R4, R5, R6 are successfully written, so that both the master library and the slave library are normal.

Referring to the scenario shown in fig. 8, 2 or more copies of a partition of master library A are abnormal, so a data write cannot satisfy the 2-copy requirement, and the data system cannot write data. If the abnormality occurs in the system partition of the master library, the leader copies of the other partitions in master library A step down as master and switch to backup, because they no longer receive the master library lease periodically sent by the leader copy of the system partition; if the abnormality occurs in a user partition of the master library, the leader copy of the system partition steps down as master and switches to backup, because it no longer receives the master library lease periodically sent by the leader copy of that user partition. In either case, the abnormality of one partition eventually causes all partitions to switch to backup. Backup library B and arbitration node C then act as the majority for the member group change and change the Paxos member group of the data system from {[A, master library, (R1, R2, R3), 3, 2], [B, backup library, (R4, R5, R6), 3, 2], [C, arbitration]} to {[A, backup library, (R1, R2, R3), 3, 2], [B, master library, (R4, R5, R6), 3, 2], [C, arbitration]}, completing the master-backup switch. Furthermore, when the leader copy of the system partition of the new master library B finds that data cannot be written to 2 copies of backup library A within the set time, it can further change the Paxos member group of the data system from {[A, backup library, (R1, R2, R3), 3, 2], [B, master library, (R4, R5, R6), 3, 2], [C, arbitration]} to {[A, backup library, (R1, R2, R3), 3, 0], [B, master library, (R4, R5, R6), 3, 2], [C, arbitration]}; from then on, a data write succeeds as soon as it is written to any 2 copies of master library B, and the system recovers its normal data writing capability. After copies R1 and R2 of backup library A are repaired, the leader copy of the system partition of master library B can, with master library B and the arbitration node as the majority for the member group change, change the Paxos member group from {[A, backup library, (R1, R2, R3), 3, 0], [B, master library, (R4, R5, R6), 3, 2], [C, arbitration]} to {[A, backup library, (R1, R2, R3), 3, 2], [B, master library, (R4, R5, R6), 3, 2], [C, arbitration]}. A data write then counts as successful only when it is written to 2 copies of the master library and 2 copies of the backup library, and the data protection capability of the system returns to normal.

In the above scenario, the abnormality of master library A may be caused by master library A being network-isolated from backup library B and arbitration node C. If master library A has not yet stepped down after the isolation and receives a data write request sent by the client, it cannot write the data successfully in both master library A and backup library B because of the isolation, so the data write request times out or fails. If master library A has already stepped down after the isolation and receives a data write request sent by the client, the request is rejected. Afterwards, the original master library A is switched to backup library and the original backup library B is switched to master library; the data of databases A and B are completely consistent, so no data is lost during the switch and lossless switching is achieved.

In this embodiment, each time the master library lease approaches expiry, the master library, if it is in a normal state, sends a first master library election request to the arbitration node and the backup library respectively, that is, it claims to continue serving as the master library. After receiving the first master library election request, the arbitration node and the backup library may return confirmation information to the master library so that the master library lease is renewed. If the master library is in an abnormal state, for example because a write operation of writing target data in some partition fails or some partition has not elected a leader copy, the master library steps down and no longer processes data write requests sent by the client; and when the master library lease approaches expiry, it does not send the first master library election request to the arbitration node and the backup library, so that the arbitration node and the backup library can learn that the master library is abnormal. The backup library can then initiate a master library election in the distributed consensus protocol member group according to the master-backup switching mechanism, that is, send a second master library election request to the arbitration node and the original master library respectively, and take over as master library upon receiving confirmation information returned by the arbitration node and/or the original master library for the second master library election request.

It should be appreciated that the operations of the backup library initiating the master library election in the distributed consensus protocol member group and taking over as master library may be performed by the system partition of the backup library, which broadcasts the result to the other partitions after taking over. After the original backup library becomes the new master library, it checks, according to the flow shown in fig. 4, whether the original master library that has been demoted to backup library is in a normal state, and determines from the result whether the data system is in the backup library normal state or the backup library abnormal state. That is, after the master library becomes abnormal and the master-backup switch is performed, the new master library can degrade the abnormal original master library, so that the data system does not become unable to provide normal service. Taking the data system shown in fig. 5 as an example, the Paxos member group {[A, backup library, (R1, R2, R3), 3, 2], [B, master library, (R4, R5, R6), 3, 2], [C, arbitration]} can be further changed to {[A, backup library, (R1, R2, R3), 3, 0], [B, master library, (R4, R5, R6), 3, 2], [C, arbitration]}.

Taken together, the above embodiments show that when either the master library or the backup library of the data system in the present disclosure becomes abnormal, the system can automatically perform degradation processing, keep the data lossless, and continue to provide read-write service. In particular, when the master library fails, the system can automatically promote the backup library to master library, keep the data lossless, and continue to provide read-write service. If the master library, the backup library, and the arbitration node are deployed in three machine rooms, the system can keep the data lossless and continue to provide read-write service when any single machine room fails. In other words, the present disclosure applies an arbitration node based on a distributed consensus protocol to the master-backup disaster recovery scenario of a distributed database, so that when the master library fails, the master library and backup library are switched automatically as a whole with the data of all partitions lossless, achieving zero loss of user data and uninterrupted service when a single library fails.

Referring to fig. 9, a flow of a high availability data management method applied to a second database is schematically shown, which includes step S901.

In step S901, in response to successful execution of the write operation of the first database to write the target data to the second database, a write acknowledge message is returned to the first database to cause the first database to:

In some embodiments of the present disclosure, the method further comprises: in response to the renewal condition being met between the current moment and the master library lease and no first master library election request sent by the first database having been received, sending a second master library election request to the arbitration node and the first database respectively; and, in response to receiving confirmation information returned by the arbitration node and/or the first database for the second master library election request, switching the second database to master library.

The details of the method for managing high availability data applied to the second database side are described in detail in the foregoing embodiments of the method for managing high availability data applied to the first database side, and are not repeated here.

Fig. 10 is a schematic structural diagram of a device provided in an exemplary embodiment. Referring to fig. 10, at the hardware level, the device includes a processor 1002, an internal bus 1004, a network interface 1006, a memory 1008, and a non-volatile memory 1010, and may of course also include hardware required for other tasks. One or more embodiments of the present description may be implemented in a software-based manner, for example by the processor 1002 reading a corresponding computer program from the non-volatile memory 1010 into the memory 1008 and then running it. Of course, in addition to a software implementation, one or more embodiments of the present disclosure do not exclude other implementations, such as a logic device or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units, and may also be hardware or a logic device.

Referring to fig. 11, the high availability data management apparatus may be applied to the device shown in fig. 10 to implement the technical solution of the present specification. The device may serve as a first database in a data system, the first database being a master library; the data system further comprises an arbitration node and a second database, the second database being a backup library. The high availability data management apparatus may comprise:

a writing module 1101, configured to write, in response to receiving a data writing request sent by a client, target data indicated by the data writing request in the first database, and write the target data to the second database;

A sending module 1102, configured to perform transaction rollback on the data write request in response to failure of execution of a write operation for writing the target data into the second database, and send backup degradation requests to the arbitration node and the second database, respectively;

A downgrade module 1103, configured to switch the data system from a standby normal state to a standby abnormal state in response to receiving determination information returned by the arbitration node and/or the second database for the standby downgrade request;

Referring to fig. 12, the high availability data management apparatus may be applied to the device shown in fig. 10 to implement the technical solution of the present specification. The device may serve as a second database in a data system, the second database being a backup library; the data system further comprises an arbitration node and a first database, the first database being a master library. The high availability data management apparatus may comprise:

A return module 1201, configured to return, in response to successful execution of a write operation of the first database to write the target data to the second database, a write acknowledgement message to the first database, so that the first database:

One or more embodiments of the present specification also provide a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method provided by any of the embodiments described above.

One or more embodiments of the present specification also provide a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method provided by any of the embodiments described above.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.

In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit one or more embodiments of this specification. As used in one or more embodiments of this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

The user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to grant or refuse authorization.

It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments of the present specification. Depending on the context, the word "if" as used herein may be interpreted as "when ...", "upon ...", or "in response to determining".

The foregoing describes merely preferred embodiments of one or more embodiments of the present specification and is not intended to limit the one or more embodiments of the present specification to the particular embodiments described.

Claims

1. A high availability data management method applied to a first database in a data system, the data system further comprising an arbitration node and a second database, the first database being a master database and the second database being a slave database, the method comprising:

2. The high availability data management method according to claim 1, the writing target data indicated by the data write request in the first database, and writing the target data to the second database, comprising:

3. The high availability data management method of claim 2, the method further comprising:

4. The high availability data management method according to claim 1 or 2, the writing the target data to the second database comprising:

5. The high availability data management method of claim 1, the method further comprising:

6. The high availability data management method of claim 1, the method further comprising:

7. The high availability data management method of claim 1, the method further comprising:

8. The high availability data management method of claim 1, the method further comprising:

9. The high availability data management method according to claim 5 or 8, the method further comprising:

10. A method of high availability data management for a second database in a data system, the data system further comprising an arbitration node and a first database, the first database being a master and the second database being a slave, the method comprising:

11. The high availability data management method of claim 10, the method further comprising:

12. A high availability data management apparatus applied to a first database in a data system, the data system further comprising an arbitration node and a second database, the first database being a master and the second database being a slave, the apparatus comprising:

13. A high availability data management apparatus for use with a second database in a data system, the data system further comprising an arbitration node and a first database, the first database being a master and the second database being a slave, the apparatus comprising:

14. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 11.

15. An electronic device, comprising:

A processor;

A memory for storing processor-executable instructions;

Wherein the processor is configured to implement the method of any one of claims 1 to 11 by executing the executable instructions.

16. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 11.