
CN111858626B - Parallel execution-based data synchronization method and device - Google Patents


Info

Publication number
CN111858626B
CN111858626B (application CN202010499491.4A)
Authority
CN
China
Prior art keywords
transaction
execution
thread
row lock
transactions
Prior art date
Legal status
Active
Application number
CN202010499491.4A
Other languages
Chinese (zh)
Other versions
CN111858626A (en)
Inventor
孙峰
彭青松
刘启春
Current Assignee
Wuhan Dream Database Co ltd
Original Assignee
Wuhan Dream Database Co ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Dream Database Co ltd filed Critical Wuhan Dream Database Co ltd
Priority to CN202010499491.4A priority Critical patent/CN111858626B/en
Publication of CN111858626A publication Critical patent/CN111858626A/en
Application granted granted Critical
Publication of CN111858626B publication Critical patent/CN111858626B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of databases, and in particular to a method and apparatus for data synchronization based on parallel execution. The method mainly comprises the following steps: acquiring transactions to be synchronized; creating at least two execution threads; distributing the transactions one by one, in commit order, to different idle execution threads; acquiring each operation in the current transaction of each execution thread and constructing a row lock for each operation according to its unique identifier; judging whether the current transaction of each execution thread has a row lock conflict with any transaction that precedes it in the commit order, and if so, placing the current transaction into the wake-up queue of the execution thread holding the conflicting transaction, where it waits; executing, in parallel, the current transactions that have no row lock conflict; and, after each execution thread finishes its current transaction, waking the transactions in that thread's wake-up queue. The invention improves synchronization efficiency, reduces control complexity, and realizes efficient and simple parallel synchronization of heterogeneous databases.

Description

Parallel execution-based data synchronization method and device
[ Field of technology ]
The present invention relates to the field of databases, and in particular, to a method and apparatus for data synchronization based on parallel execution.
[ Background Art ]
The traditional primary-standby mechanism of a database realizes real-time replication of database data and is an important solution for disaster recovery backup and for guaranteeing data safety. However, the primary-standby mechanism requires that the database system of the standby machine be identical to that of the host, so for heterogeneous database environments it cannot provide effective real-time data replication. To realize real-time data replication between heterogeneous databases, a software-based heterogeneous database synchronization method is generally used at present: incremental data of the source database is captured at the source end, sent to the target end, and applied to the target database at the target end through a general database access interface, thereby realizing data replication.
During real-time synchronization, transactions must be replayed according to the operation order of each transaction recorded in the database log file; otherwise the data consistency between the source database and the target database is destroyed. If the data synchronization software at the target end replays transactions strictly in the order they appear in the source database log, the consistency of data replication is effectively guaranteed, but the parallelism of data synchronization is severely limited and the synchronization efficiency is very low.
In view of this, how to overcome the defects of the prior art and resolve the conflict between data consistency and parallel execution during heterogeneous database synchronization is a problem to be solved in this technical field.
[ Invention ]
Aiming at the above defects of the prior art or demands for improvement, the invention resolves the parallelism conflicts that arise from guaranteeing data consistency during database synchronization, and realizes efficient and simple parallel synchronization of heterogeneous databases while ensuring the correctness of the synchronized data.
The embodiment of the invention adopts the following technical scheme:
In a first aspect, the present invention provides a method for data synchronization based on parallel execution, specifically: acquiring transactions to be synchronized; creating at least two execution threads, wherein each execution thread comprises a wake-up queue; distributing the transactions one by one, in commit order, to different idle execution threads, wherein the transaction distributed to each thread is the current transaction of that execution thread; acquiring each operation in the current transaction of each execution thread, and constructing a row lock for each operation according to its unique identifier; judging whether the current transaction of each execution thread has a row lock conflict with any transaction that precedes it in the commit order, and if so, placing the current transaction into the wake-up queue of the execution thread holding the conflicting transaction, where it waits; executing, in parallel, the current transactions that have no row lock conflict; and, after each execution thread finishes its current transaction, waking the transactions in that thread's wake-up queue.
Preferably, the row lock is constructed for each operation according to the unique identifier of each operation, which specifically comprises: a row lock hash table is created in each execution thread, the unique identifier of each operation is used as a key of the row lock hash table, and each operation is used as a record of the row lock hash table.
Preferably, determining whether the current transaction in each execution thread has a row lock conflict with a transaction before the current transaction in the commit order specifically includes: acquiring the row lock hash table of a first execution thread as a first row lock hash table; acquiring the row lock hash table of a second execution thread as a second row lock hash table, wherein the current transaction commit order of the second execution thread is before that of the first thread; judging whether the same key exists in the first row lock hash table and the second row lock hash table; if the same key exists, the current transaction of the first thread and the current transaction of the second thread have a row lock conflict; if the same key does not exist, the current transaction of the first thread and the current transaction of the second thread have no row lock conflict; and comparing each group of first threads with each group of second threads one by one, judging whether a row lock conflict exists between every two threads.
Preferably, the current transaction is put into a wake-up queue corresponding to the transaction with the row lock conflict for waiting, which specifically comprises: if the current transaction and the plurality of transactions have row lock conflicts, searching for the transaction with the commit order before the current transaction and closest to the current transaction in all the transactions with row lock conflicts, and placing the current transaction into a wake-up queue of an execution thread where the searched transaction is located.
Preferably, the method further comprises: and if the current transaction execution of the execution thread is completed, releasing all row locks in the row lock hash table of the execution thread.
Preferably, the method further comprises executing the transaction list, and after the transaction to be synchronized is acquired, placing the transaction to be synchronized into the executing transaction list according to the submitting sequence so as to search and distribute the transaction to be synchronized.
Preferably, the determining whether the current transaction in each execution thread has a row lock conflict with the transaction before the current transaction in the commit order specifically includes: searching the position of the current transaction in the execution transaction list, starting from the position of the current transaction, sequentially acquiring the transactions to be executed in reverse order, judging whether a row lock conflict exists between the current transaction and each transaction to be executed, and stopping judging when the first transaction to be executed with the row lock conflict is found.
Preferably, the system further comprises a transaction receiving thread, wherein the transaction receiving thread receives the acquired transaction to be synchronized, stores the transaction to be synchronized in an execution transaction list, and sequentially distributes the transaction to be executed in the execution transaction list to the execution threads when the execution threads are idle.
Preferably, after receiving the acquired transaction to be synchronized, the transaction receiving thread classifies the transaction to be synchronized according to the transaction ID to be synchronized, and only stores the transaction corresponding to the commit operation in the execution transaction list.
In another aspect, the present invention provides a device for synchronizing data based on parallel execution, specifically: comprising at least one processor and a memory connected by a data bus, the memory storing instructions to be executed by the at least one processor, the instructions, after being executed by the processor, for performing the method of parallel execution based data synchronization of any of claims 1-9.
Compared with the prior art, the embodiment of the invention has the beneficial effects that: the synchronization efficiency is improved through multithreading parallel synchronization, mutual exclusion locks are added to operations possibly causing synchronous data errors to avoid the data errors, and the control complexity of the transaction execution sequence is simplified through a wakeup queue, so that efficient and simple heterogeneous database parallel synchronization is realized.
Furthermore, in the scheme provided by the invention, the resources of the row lock hash table are released as a whole after the transaction finishes executing, and the operations to be synchronized are classified by the transaction receiving thread, so that the synchronization efficiency of heterogeneous databases is further improved.
[ Description of the drawings ]
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the embodiments of the present invention will be briefly described below. It is evident that the drawings described below are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a flow chart of a method for data synchronization based on parallel execution according to an embodiment of the present invention;
FIG. 2 is a timing diagram of another method for data synchronization based on parallel execution according to an embodiment of the present invention;
FIG. 3 is a flowchart of another method for data synchronization based on parallel execution according to an embodiment of the present invention;
FIG. 4 is a flowchart of another method for data synchronization based on parallel execution according to an embodiment of the present invention;
FIG. 5 is a flowchart of another method for data synchronization based on parallel execution according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a device for data synchronization based on parallel execution according to an embodiment of the present invention;
Wherein, the reference numerals are as follows:
21: a processor; 22: a memory.
[ Detailed description ] of the invention
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The present invention is an architecture of a specific functional system, so that in a specific embodiment, functional logic relationships of each structural module are mainly described, and specific software and hardware implementations are not limited.
In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other. The invention will be described in detail below with reference to the drawings and examples.
Example 1:
In a database system, all operations of the transactions executed in the source database are recorded in the database log; if these operations are acquired and replayed in the destination database, the data in the destination database becomes consistent with the data in the source database. When the number of transactions to be synchronized is large, single-threaded sequential execution causes long waiting times and poor real-time performance, but if the operations are simply executed in parallel, their execution order in the target database may differ from that in the source database, causing synchronization errors. Therefore, the embodiment of the invention provides a data synchronization method that can execute transactions in parallel while ensuring that the effective execution order of the transactions remains consistent.
As shown in fig. 1, the method for synchronizing data based on parallel execution provided by the embodiment of the invention specifically includes the following steps:
step 101: and acquiring the transaction to be synchronized.
When data synchronization is performed, the data in the source database needs to be synchronized to the destination database, so the data of the source end must be acquired first. In the database field, because the amount of data to be synchronized is large, an incremental synchronization mode may be adopted to reduce the total amount of synchronized data: on the basis of the previous synchronization, unchanged data is not synchronized again and only changed data is synchronized. The source database generates data changes by executing transactions; each transaction includes one or more database operations, and each operation generates a data change. Operations include reading data, writing data, updating data, deleting data, and so on; in a specific implementation scenario, an operation may correspond to an SQL statement. The transactions executed in the source database are numbered according to their commit order and stored in the database log, and replaying the same transactions in the destination database according to the commit order recorded in that log changes the data of the destination database in the same way, achieving the goal of data synchronization. Outside the database field, transactions and operations denote actions corresponding to database transactions and operations in systems that also generate data changes, such as file systems or distributed systems; there an operation may be the creation, modification, or deletion of a file or data block.
Step 102: at least two execution threads are created, each of which includes a wakeup queue.
In order to improve the execution efficiency at the target end, a parallel execution mode is adopted for data synchronization, so several execution threads that can run in parallel must be created and the transactions to be executed are distributed to different execution threads. To ensure that the execution order is correct and no conflict occurs, if two transactions contain operations on the same data, the transaction that commits later must wait until the transaction that commits earlier has finished executing. Therefore, each execution thread also includes a wake-up queue, which stores transactions whose commit order is after the thread's current transaction and which contain operations that conflict with it; such a transaction is placed into the wake-up queue of the current transaction and is woken after the current transaction completes and the conflict no longer exists.
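As a concrete illustration of the structures just described, the following Go sketch models an execution thread together with its current transaction, row-lock hash table and wake-up queue. All package, type and field names here are assumptions made for illustration only; they are not part of the patented implementation.
package parsync // illustrative package name
import "sync/atomic"
// Operation is one logged change; RowID is the unique identifier later used to build row locks.
type Operation struct {
    RowID string // unique identifier of the affected row (ROWID in the database scenario)
    SQL   string // statement replayed on the target database
}
// Transaction is one unit of work to be replayed on the target database.
type Transaction struct {
    CommitSeq      int           // position in the source commit order (e.g. derived from the LSN)
    Ops            []Operation
    pendingWakeups atomic.Int32  // wake-ups still outstanding before this transaction may run
    resume         chan struct{} // closed when the last conflicting predecessor finishes
}
// ExecThread is one of the parallel execution threads of steps 102-107.
type ExecThread struct {
    id        int
    current   *Transaction
    rowLocks  map[string]Operation // row-lock hash table: key = unique identifier
    wakeQueue []*Transaction       // transactions waiting for the current transaction to finish
}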
Step 103: the transactions are allocated to different idle execution threads one by one according to the commit order, and the transaction allocated to each thread is the current transaction of each execution thread.
Each execution thread can execute only one transaction at a time, so each execution thread can be allocated only one transaction at a time, which is called the current transaction of the execution thread. An execution thread to which no current transaction is allocated is idle, and an idle execution thread can accept the next transaction allocated according to the transaction commit order. If several idle execution threads exist at the same time, one transaction is allocated to each idle execution thread.
Step 104: each operation in the current transaction of each execution thread is acquired, and a row lock is constructed for each operation according to the unique identifier of each operation.
In order to determine whether there are conflicting operations among transactions, a row lock may be constructed for each operation according to the unique identifier of the data it touches. Specifically, the value of the unique identifier may be used as the value of the row lock; if two transactions contain operations with the same row lock value, there is a row lock conflict between the two transactions. In the database usage scenario, ROWID may be used as the unique identifier of an operation. Different databases implement ROWID differently (some use a physical address structure, such as ORACLE; others use a logical integer, such as SQL SERVER and DM7), but all follow the principle that the ROWID value of each row of data in a single table is unique. Meanwhile, in the operations recorded in the database log, the log record of each operation carries the corresponding ROWID information marking the data row it affects, so the ROWID of each operation in a transaction can be obtained as its unique identifier at the same time the transaction to be synchronized is obtained from the log. The running mechanism of the database guarantees that data with the same ROWID is never modified by several transactions in parallel; therefore, when operations carrying ROWID information are applied during data synchronization, transactions with the same ROWID information can be mutually excluded and executed serially while conflict-free transactions are executed in parallel. This effectively increases the parallelism of transaction application, and the transaction commit order can be partially ignored where no conflict occurs, improving synchronization performance. Outside the database field, an appropriate unique identifier, such as a file ID, can be selected according to the actual scenario; a combination of several characteristic values may also be used as the unique identifier, such as table name + ROWID when several tables are synchronized at the same time.
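Continuing the illustrative Go sketch above, step 104 amounts to filling the thread's row-lock hash table from each operation's unique identifier. This is only a sketch under the assumed types, not the patented code.
// buildRowLocks constructs one row lock per operation of the current transaction,
// keyed by the operation's unique identifier (ROWID in the database scenario).
func (t *ExecThread) buildRowLocks() {
    t.rowLocks = make(map[string]Operation, len(t.current.Ops))
    for _, op := range t.current.Ops {
        // if another thread's table later contains the same key,
        // the two current transactions have a row-lock conflict
        t.rowLocks[op.RowID] = op
    }
}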
Step 105: judging whether the current transaction in each execution thread has row lock conflict with the transaction before the current transaction in the submitting sequence, if so, putting the current transaction into a wakeup queue of the execution thread where the transaction having row lock conflict exists for waiting.
In order to avoid conflicts when transactions are executed, whether the current transaction of each execution thread has a row lock conflict can be judged from the row locks. If the current transactions of two threads contain the same row lock, there is a row lock conflict between the two transactions, and the one that commits later must wait until the one that commits earlier has finished executing. Therefore, when two transactions have a row lock conflict, the transaction with the later commit order is placed into the wake-up queue of the execution thread holding the transaction with the earlier commit order, and the execution thread holding the later transaction is suspended, so that the later transaction is woken after the earlier transaction finishes. By using a wake-up queue, the execution of the next transaction is triggered by a wake-up message, which makes it convenient to manage the execution order among mutually conflicting transactions without additional execution-state monitoring and scheduling.
Step 106: the execution threads execute the current transaction in parallel without a row lock conflict.
Transactions with no row lock conflict do not cause synchronization conflicts, so they can be executed in parallel, making full use of system resources, reducing the total time required to execute the transactions, and improving data synchronization efficiency. Since transactions are allocated to execution threads according to the transaction commit order in step 103, and the execution order is further constrained by the row locks and wake-up queues of steps 104-105, parallel execution does not make the effective execution order of conflicting transactions inconsistent with the commit order, and the data is guaranteed to be correct after the transactions complete.
Step 107: each execution thread wakes up the transaction in the wake queue of the execution thread after the current transaction is executed.
After each execution thread finishes its current transaction, the transactions in its wake-up queue are woken and begin to execute. Adding the wake-up queue and the wake-up mechanism makes sequential dispatching of transactions convenient, requires no additional scheduling, reduces scheduling difficulty and the resources it consumes, and improves scheduling efficiency and stability. If several waiting transactions exist in the wake-up queue of an execution thread, all of them are woken, and the execution threads where the woken transactions reside, that is, the threads suspended in step 105, are activated; the woken transactions are then executed according to step 106. The steps of joining the wake-up queue and waking up are executed cyclically until all transactions to be synchronized have been executed. In addition, when a transaction exists in the wake-up queues of several threads, it must wait until all of those wake-up queues have woken it before its execution is activated.
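Under the same assumed types, steps 105-107 can be pictured as follows: a waiting transaction records how many wake-ups it still needs, and a thread that finishes its current transaction releases its row locks and wakes every queued waiter; the suspended thread resumes only after the last outstanding wake-up arrives. This is a minimal sketch under those assumptions, not the patented code.
// finishCurrent is called after the current transaction has been applied to the
// target database; pendingWakeups is assumed to have been incremented once for
// every wake-up queue the waiting transaction was placed into.
func (t *ExecThread) finishCurrent() {
    t.rowLocks = nil // release all row locks held by the completed transaction
    for _, waiter := range t.wakeQueue {
        if waiter.pendingWakeups.Add(-1) == 0 {
            close(waiter.resume) // last outstanding wake-up: the suspended thread may run
        }
    }
    t.wakeQueue = t.wakeQueue[:0]
    t.current = nil // the thread is now idle and can accept the next transaction
}
A suspended execution thread then simply blocks on <-tx.resume before replaying its operations, mirroring the suspension in step 105 and the activation in step 107.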
Further, after an execution thread completes its current transaction, it is allocated the next unassigned transaction according to the transaction commit order, and steps 104-106 are repeated for that transaction until all transactions to be synchronized have been completed.
As shown in the timing diagram of fig. 2, steps 101-107 are specifically described using an example of a specific implementation scenario.
Both the source and target databases contain a table T (ID INT PRIMARY KEY, C1 INT).
The source application performs the following operations on the table T:
Operation 1: INSERT INTO T (ID, C1) VALUES (1, 1);
Operation 2: INSERT INTO T (ID, C1) VALUES (2, 2);
COMMIT;
Operation 3: UPDATE T SET C1=10 WHERE ID=1;
COMMIT;
Operation 4: UPDATE T SET C1=20 WHERE ID=2;
COMMIT;
This serially executed sequence includes three COMMIT operations, which generate three transactions in the log of the source database, denoted TRX1, TRX2 and TRX3 according to the transaction commit order. TRX1 includes operation 1, operation 2 and a COMMIT operation, and inserts the two rows with ID 1 and ID 2 into table T; TRX2 includes operation 3 and a COMMIT operation, and updates the row with ID 1 in table T; TRX3 includes operation 4 and a COMMIT operation, and updates the row with ID 2 in table T. All three transactions cause data changes in the source database, so TRX1, TRX2 and TRX3 are acquired as the transactions to be synchronized, per step 101.
According to step 102, three execution threads are created in the destination database, EXEC1, EXEC2 and EXEC3. The number of execution threads is determined by the resources of the destination end; creating three execution threads in this embodiment does not limit the specific number of threads.
According to step 103, the current execution threads EXEC1, EXEC2 and EXEC3 are all in idle state, and the transactions TRX1, TRX2 and TRX3 are allocated to the three execution threads one by one according to the transaction commit order. After allocation, TRX1 is the current transaction of EXEC1, TRX2 is the current transaction of EXEC2, and TRX3 is the current transaction of EXEC 3.
Since the operations in the transaction are performed on a row basis except for the COMMIT operation, the row number ID, ROWID, is selected as the unique identifier for the operation, per step 104. Constructing a row lock for each operation based on the unique identifier ROWID of the operation: the row lock of operation 1 is id=1, the row lock of operation 2 is id=2, the row lock of operation 3 is id=1, and the row lock of operation 4 is id=2.
According to step 105, it is determined whether each transaction has a row lock conflict with the transactions preceding it in commit order. TRX1 commits first, so there is no transaction that conflicts with it; the row lock ID=1 of operation 3 in TRX2 is the same as the row lock ID=1 of operation 1 in TRX1, so TRX2 has a row lock conflict with TRX1 and is put into the wake-up queue of thread EXEC1 where TRX1 resides; the row lock ID=2 of operation 4 in TRX3 is the same as the row lock ID=2 of operation 2 in TRX1, so TRX3 also has a row lock conflict with TRX1 and is put into the wake-up queue of EXEC1; the row lock ID=2 of operation 4 in TRX3 differs from the row lock ID=1 of operation 3 in TRX2, so TRX3 and TRX2 do not conflict.
Per step 106, transaction TRX1 in EXEC1 is first in the commit order and can therefore begin execution immediately. Transaction TRX2 in EXEC2 and transaction TRX3 in EXEC3 were placed into the wake-up queue of EXEC1 because their commit order is after TRX1 and they have row lock conflicts with TRX1; they can start execution only after TRX1 finishes and wakes them.
After TRX1 finishes executing, EXEC1 accepts the next allocated transaction, and TRX2 and TRX3 in the wake-up queue of EXEC1 are woken according to step 107. Since there is no row lock conflict between TRX2 and TRX3, they can be executed in parallel.
In the above example, because of multithreaded concurrent execution, the total time of the original serial execution TRX1+TRX2+TRX3 is reduced to the execution time of TRX1+TRX2 or TRX1+TRX3, so the total transaction execution time decreases and synchronization efficiency improves. At the same time, the row locks prevent the conflict between operation 3 in TRX2 and operation 1 in TRX1, and between operation 4 in TRX3 and operation 2 in TRX1, ensuring the correctness of the synchronized data.
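Expressed with the assumed types of the sketch above, the worked example looks as follows; the row-lock keys make the two conflicts, and the absence of a conflict between TRX2 and TRX3, directly visible. The literal values are taken from the example, everything else is illustrative.
// example mirrors TRX1/TRX2/TRX3 above: TRX2 and TRX3 each share a row-lock key
// with TRX1 ("1" and "2" respectively) but not with each other.
func example() {
    trx1 := &Transaction{CommitSeq: 1, Ops: []Operation{{RowID: "1"}, {RowID: "2"}}}
    trx2 := &Transaction{CommitSeq: 2, Ops: []Operation{{RowID: "1"}}}
    trx3 := &Transaction{CommitSeq: 3, Ops: []Operation{{RowID: "2"}}}
    exec1 := &ExecThread{id: 1, current: trx1}
    exec2 := &ExecThread{id: 2, current: trx2}
    exec3 := &ExecThread{id: 3, current: trx3}
    for _, t := range []*ExecThread{exec1, exec2, exec3} {
        t.buildRowLocks()
    }
    // exec2.rowLocks["1"] and exec3.rowLocks["2"] collide with keys held by exec1,
    // so TRX2 and TRX3 are queued behind TRX1 in EXEC1's wake-up queue and run after it.
}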
To facilitate comparison of the row locks, they may be stored in the form of hash tables. In step 104, constructing a row lock for each operation based on its unique identifier may be implemented as follows: a row lock hash table is created in each execution thread, and each operation of the thread's current transaction is placed into the table, with the unique identifier of the operation used as the key. In the implementation scenario of the above example, the key of the row lock hash table is the unique identifier ID of the operation. The row lock hash table of EXEC1 stores two (key, value) entries, (ID=1, value=operation 1) and (ID=2, value=operation 2); the row lock hash table of EXEC2 stores one entry, (ID=1, value=operation 3); and the row lock hash table of EXEC3 stores one entry, (ID=2, value=operation 4). By comparing the keys of the row lock hash tables in different threads, the properties of a hash table can be used to judge quickly and simply whether a row lock conflict exists.
Further, as shown in fig. 3, when the execution threads use row lock hash tables to store row locks, the following specific steps may be used in step 105 to determine whether the current transaction in each execution thread has a row lock conflict with the transactions that precede it in the commit order:
step 201: the line lock hash table of the first execution thread is obtained as a first line lock hash table.
Step 202: and acquiring a row lock hash table of the second execution thread as a second row lock hash table, wherein the current transaction commit order of the second execution thread is before the first thread.
In the above example, three pairs of row lock hash tables need to be compared according to the transaction commit order: the row lock hash table of EXEC2 as the first row lock hash table against the row lock hash table of EXEC1 as the second row lock hash table; the row lock hash table of EXEC3 as the first row lock hash table against the row lock hash table of EXEC1 as the second row lock hash table; and the row lock hash table of EXEC3 as the first row lock hash table against the row lock hash table of EXEC2 as the second row lock hash table.
Step 203: judge whether the same key exists in the first row lock hash table and the second row lock hash table.
Step 204: if so, the current transaction of the first thread and the current transaction of the second thread have row lock conflict.
Step 205: if not, the current transaction of the first thread and the current transaction of the second thread have no row lock conflict.
In the above scenario, the row lock hash tables of EXEC1 and EXEC2 contain the same key ID=1, so a row lock conflict exists; the row lock hash tables of EXEC1 and EXEC3 contain the same key ID=2, so a row lock conflict exists; the row lock hash tables of EXEC2 and EXEC3 do not contain the same key, so no row lock conflict exists.
Step 206: compare each pair of first and second threads one by one, judging whether a row lock conflict exists between every two threads.
When judging row lock conflicts, the row lock hash tables of all execution threads need to be compared pairwise, one pair at a time, using the comparison method exemplified in steps 202-205, so as to ensure that all row lock conflicts are found.
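The pairwise comparison of steps 201-206 reduces to a key-intersection test on the two row-lock hash tables. A sketch under the assumed types introduced earlier; iterating over the smaller table keeps the check proportional to the smaller transaction.
// hasRowLockConflict reports whether the current transactions of two execution
// threads touch the same row, i.e. whether their row-lock hash tables share a key.
func hasRowLockConflict(a, b *ExecThread) bool {
    small, large := a.rowLocks, b.rowLocks
    if len(large) < len(small) {
        small, large = large, small
    }
    for key := range small {
        if _, ok := large[key]; ok {
            return true // same key in both tables: row-lock conflict
        }
    }
    return false
}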
When synchronization is performed, execution must respect the transaction commit order; if the current transaction has row lock conflicts with several transactions, it can execute only after the conflicting transaction with the latest commit order has finished. Therefore, it is only necessary to find, among all the conflicting transactions, the one whose commit order is before the current transaction and closest to it, and to place the current transaction into the wake-up queue of the execution thread where that transaction resides. In the above example, as shown in the timing diagram of fig. 4, suppose TRX2 also contains operation 5: UPDATE T SET C1=15 WHERE ID=2; then TRX3 has row lock conflicts with both TRX1 and TRX2. According to the transaction commit order, and to keep the data correct, TRX3 must wait for TRX1 and TRX2 to finish before starting execution, and TRX2 must wait for TRX1 to finish before starting execution. TRX2 is the transaction whose commit order is before TRX3 and closest to it among all transactions that conflict with TRX3, so placing TRX3 only into the wake-up queue of the execution thread EXEC2 where TRX2 resides, and waking TRX3 after TRX2 finishes, is enough to keep both the execution order of the transactions and the synchronized data correct. Placing the waiting transaction only into the wake-up queue corresponding to the conflicting transaction with the closest commit order avoids multiple unnecessary wake-ups when several groups of row lock conflicts exist, saving resources and reducing the complexity of execution thread scheduling.
When the current transaction of an execution thread has finished executing, the resources corresponding to its row locks may be modified by later transactions, so all the row locks in the thread's row lock hash table must be released; that is, all the (key, value) entries stored in the row lock hash table are cleared, and a new row lock hash table is rebuilt after the next current transaction is distributed to the execution thread.
To facilitate managing the transactions to be synchronized, they may be organized in an executing-transaction list. After the transactions to be synchronized are acquired, they are placed into the executing-transaction list according to the commit order, which makes it convenient to look up and distribute them in that order. In the above example, the executing-transaction list is: TRX1 -> TRX2 -> TRX3. In step 103, distributing the transactions one by one to different idle execution threads according to the commit order means fetching the transactions one by one from the head of the list, in the stored order, and allocating them to the idle execution threads in turn. Specifically, in a database usage scenario, the database transactions in the log are managed by a log sequence number (Log Sequence Number, LSN for short), and LSNs increase according to the transaction commit order.
Further, after the executing-transaction list is used to organize the transactions to be synchronized, it is also convenient to find conflicting transactions through the list. When judging whether the current transaction of each execution thread has a row lock conflict with a transaction that precedes it in the commit order, the position of the current transaction in the executing-transaction list is located, the preceding transactions are fetched one by one in reverse order starting from that position, each is checked for a row lock conflict with the current transaction, and the check stops as soon as the first conflicting transaction is found. In the above example, when operation 5 exists, the transactions preceding TRX3 are fetched in reverse order starting from TRX3; the first transaction obtained is TRX2, and since TRX3 conflicts with TRX2, the check stops there. Searching for row lock conflicts in reverse order over the executing-transaction list makes it convenient and fast to find the last transaction that conflicts with the current one, reducing the cost of searching and comparison.
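The reverse-order search over the executing-transaction list, combined with the "closest conflicting predecessor" rule of the previous paragraph, can be sketched as follows. The list is assumed to be a slice ordered by commit sequence, and threadOf is an assumed lookup that returns the thread currently holding a given transaction (or nil once it has completed); neither detail is prescribed by the patent text.
// enqueueBehindClosestConflict walks the executing-transaction list backwards from
// the current transaction and stops at the first, i.e. closest, earlier transaction
// with a row-lock conflict; the current transaction is queued only in that thread's
// wake-up queue. It returns true if the current transaction has to wait.
func enqueueBehindClosestConflict(list []*Transaction, cur *Transaction, curThread *ExecThread,
    threadOf func(*Transaction) *ExecThread) bool {
    pos := -1
    for i, tx := range list {
        if tx == cur {
            pos = i
            break
        }
    }
    for i := pos - 1; i >= 0; i-- {
        prev := threadOf(list[i])
        if prev != nil && hasRowLockConflict(prev, curThread) {
            if cur.resume == nil {
                cur.resume = make(chan struct{}) // the waiter will block on this channel
            }
            cur.pendingWakeups.Add(1)
            prev.wakeQueue = append(prev.wakeQueue, cur)
            return true // closest conflicting predecessor found: stop searching
        }
    }
    return false // no conflict with any earlier transaction: execute immediately
}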
In order to separate the acceptance and distribution of the transactions to be synchronized from their execution, and to facilitate management of the executing-transaction list, a separate transaction receiving thread may also be created. The transaction receiving thread receives the acquired transactions to be synchronized, stores them in the executing-transaction list, and distributes the transactions in the list to the execution threads in order whenever an execution thread is idle. Having the transaction receiving thread manage reception and distribution allows the reception/distribution process and the execution process to proceed in parallel, further improving the efficiency of the data synchronization process.
When data synchronization is performed, only transactions containing a COMMIT operation need to be executed, because only committed transactions represent data changes that have taken effect. Therefore, after receiving the acquired transactions to be synchronized, the transaction receiving thread classifies them according to their transaction IDs and stores only the transactions corresponding to a commit operation into the executing-transaction list. In the above example, TRX1, TRX2 and TRX3 all contain a COMMIT operation, all generate data changes, and therefore all need to be executed. Specifically, in a database usage scenario, the transaction receiving thread can group the logged operations by their transaction ID and then determine whether each transaction contains a COMMIT operation. Classifying the transactions before storing them in the executing-transaction list screens out in advance the transactions that do not need to be synchronized, reduces the number of transactions to be processed, avoids handling transactions that will never be executed, and further improves the execution efficiency of data synchronization.
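As a small illustration of the receiving thread's filtering step, under the assumed types; the COMMIT marker below is an assumption about how the captured log records are represented, not a detail given in the patent.
// keepCommitted keeps only the transactions whose operations include a COMMIT,
// i.e. the transactions whose data changes actually took effect on the source.
func keepCommitted(captured []*Transaction) []*Transaction {
    kept := make([]*Transaction, 0, len(captured))
    for _, tx := range captured {
        for _, op := range tx.Ops {
            if op.SQL == "COMMIT" { // assumed representation of the commit record
                kept = append(kept, tx)
                break
            }
        }
    }
    return kept
}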
With the steps provided in this embodiment, several execution threads are created and the row locks and wake-up queues are used together, so that multiple transactions to be synchronized can be executed in parallel while the correctness of the data is guaranteed, improving data synchronization efficiency. Furthermore, the processes of conflict judgment and transaction allocation are further optimized by the row lock hash table, the executing-transaction list and the transaction receiving thread, further improving data synchronization efficiency.
Example 2:
Based on the method for synchronizing data based on parallel execution provided in embodiment 1, in different specific application scenarios, the method can be supplemented and adjusted according to different use requirements or actual scenarios. In the following technical solutions, in the case that there is no conflict, one or more technical solutions may be selected and used in combination with the technical solution in embodiment 1.
In order to further improve the efficiency of concurrent execution, in the specific implementation scenario of this embodiment, when transactions are allocated one by one to idle execution threads according to the commit order in step 103, the allocation can be further optimized according to the characteristics of the transactions and threads: conflicting transactions are identified in advance and formed into queues, the execution time of the transactions in different queues is estimated, and suitable execution threads are selected for allocation according to that time. In a specific implementation scenario, as shown in fig. 5, steps 101-107 may be changed to the following steps:
step 301: and acquiring the transactions to be synchronized, wherein each transaction comprises at least one operation.
Step 302: at least two execution threads are created, each of which includes a wakeup queue.
Step 303: each operation in each transaction to be synchronized is acquired, and a row lock is constructed for each operation according to the unique identifier of each operation.
Step 304: whether each transaction to be synchronized has a row lock conflict with any transaction before it in the commit order is judged, and conflicting transactions are put into the same distribution queue according to the commit order, so that no transaction conflict exists between the transactions of any two distribution queues.
Step 305: the execution time of the first transaction in each distribution queue is estimated.
Step 306: when an execution thread is idle, the transaction with the longer estimated execution time is preferentially distributed to that thread, and the next transaction of the distribution queue where the distributed transaction resides is put into the thread's wake-up queue.
Step 307: the execution threads execute the current transaction without row lock conflicts in parallel in the transaction commit order.
Step 308: after each execution thread completes the current transaction, the next assigned transaction is accepted, and the transaction in the wakeup queue of the execution thread is awakened.
Performing the row lock conflict judgment before a transaction is distributed to an execution thread allows operation execution within the threads and row lock judgment to proceed concurrently, saving the time spent on row lock judgment; meanwhile, estimating the execution time of the transactions and distributing the transaction with the longer execution time to the first thread to become idle balances the execution time of the threads and makes full use of thread resources.
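A sketch of the distribution queues and the execution-time estimate of steps 303-306, again using the assumed types. The cost model here is deliberately naive (operation count as a stand-in for execution time); a real implementation would presumably calibrate it on operation types and data volumes.
// distQueue groups transactions that conflict with one another (steps 303-304);
// transactions in different queues never conflict and can be replayed fully in parallel.
type distQueue struct {
    txs      []*Transaction
    estimate float64 // estimated execution time of the head transaction (step 305)
}
// estimateHead uses the number of operations as a stand-in for execution time.
func (q *distQueue) estimateHead() {
    q.estimate = 0
    if len(q.txs) > 0 {
        q.estimate = float64(len(q.txs[0].Ops))
    }
}
// pickQueue returns the queue whose head transaction is expected to run longest,
// which is the one an idle thread should take first (step 306).
func pickQueue(queues []*distQueue) *distQueue {
    var best *distQueue
    for _, q := range queues {
        if len(q.txs) > 0 && (best == nil || q.estimate > best.estimate) {
            best = q
        }
    }
    return best
}
Step 306 then hands the head of the chosen queue to the idle thread and places the next transaction of that queue into the thread's wake-up queue, so the conflicting successor is woken as soon as its predecessor finishes.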
When the row lock hash table is used, each row lock hash table can be placed in a contiguous block of memory to make its release easier: when the table is released, the whole block of memory is freed directly, which is simpler and more efficient. Further, when a transaction finishes executing, the other resources occupied by the transaction also need to be released, in addition to the resources of the row lock hash table.
When data synchronization is performed, the source data and the destination data may differ in format, or only part of the data may need to be synchronized. To facilitate synchronizing data of different formats, when a transaction receiving thread is used to manage the transactions to be synchronized, it can also clean the operations in each transaction, converting a source-end operation into a destination-end operation that produces the same data change, or removing the parts that do not need to be synchronized.
The data synchronization methods provided in embodiments 1 and 2 are used in combination, so that system resources can be fully utilized, and the efficiency of data synchronization is further improved.
Example 3:
On the basis of the method for data synchronization based on parallel execution provided in the foregoing embodiments 1 to 2, the present invention further provides a device for implementing the data synchronization based on parallel execution of the foregoing method, as shown in fig. 6, which is a schematic device architecture diagram of an embodiment of the present invention. The apparatus for data synchronization based on parallel execution of the present embodiment includes one or more processors 21 and a memory 22. In fig. 6, a processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or otherwise, for example in fig. 6.
The memory 22, as a non-volatile computer-readable storage medium, is used for storing non-volatile software programs, non-volatile computer-executable programs and modules corresponding to the parallel-execution-based data synchronization method of embodiments 1 to 2. The processor 21 executes the various functional applications and data processing of the parallel-execution-based data synchronization device, that is, implements the data synchronization method of embodiments 1 to 2, by running the non-volatile software programs, instructions and modules stored in the memory 22.
The memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 22 may optionally include memory located remotely from processor 21, which may be connected to processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22 and when executed by the one or more processors 21 perform the method of data synchronization based on parallel execution in the above-described embodiments 1 to 2, for example, performing the respective steps shown in fig. 1 to 5 described above.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the embodiments may be implemented by a program that instructs associated hardware; the program may be stored on a computer-readable storage medium, and the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disk, optical disk, and the like.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (8)

1. A method of data synchronization based on parallel execution, the method comprising:
acquiring the transactions to be synchronized, and putting the transactions to be synchronized into an execution transaction list according to the submitting sequence so as to search and distribute the transactions to be synchronized;
creating at least two execution threads, wherein each execution thread comprises a wake-up queue;
distributing the transactions to different idle execution threads one by one according to the commit order, wherein the transaction distributed to each thread is the current transaction of that execution thread;
acquiring each operation in the current transaction of each execution thread, and constructing a row lock for each operation according to the unique identifier of each operation;
Searching the position of the current transaction in the execution transaction list, starting from the position of the current transaction, sequentially acquiring the transactions to be executed in reverse order, judging whether a row lock conflict exists between the current transaction and each transaction to be executed, stopping judging when the first transaction to be executed with the row lock conflict is found, and if the row lock conflict exists, putting the current transaction into a wakeup queue of an execution thread where the transaction with the row lock conflict exists for waiting;
The execution thread concurrently executes the current transaction without row lock conflict;
after each execution thread completes the execution of its current transaction, waking the transactions in the wake-up queue of that execution thread so that they start to be executed; when a plurality of waiting transactions exist in the wake-up queue of an execution thread, all the waiting transactions are woken and the execution threads where the woken transactions reside are activated; and when a certain transaction exists in the wake-up queues of a plurality of threads, it starts execution only after all of those wake-up queues have woken it.
2. The method for synchronizing data based on parallel execution according to claim 1, wherein said constructing a row lock for each operation based on a unique identifier of said each operation, comprises: a row lock hash table is created in each execution thread, a unique identifier of each operation is used as a key of the row lock hash table, and each operation is used as a record of the row lock hash table.
3. The method for synchronizing data based on parallel execution according to claim 2, wherein the determining whether there is a line lock conflict between the current transaction and each transaction to be executed specifically comprises:
Acquiring a line lock hash table of a first execution thread as a first line lock hash table;
acquiring a row lock hash table of a second execution thread as a second row lock hash table, wherein the current transaction commit order of the second execution thread is before the first thread;
judging whether the same key exists in the first row lock hash table and the second row lock hash table;
if the same key exists, the current transaction of the first thread and the current transaction of the second thread have row lock conflict;
If the same key does not exist, the current transaction of the first thread and the current transaction of the second thread do not have row lock conflict;
And comparing each group of first threads with each group of second threads one by one, and judging whether a row lock conflict exists between every two threads.
4. The method for synchronizing data based on parallel execution according to claim 1, wherein the step of placing the current transaction into a wakeup queue corresponding to the transaction for which there is a line lock conflict for waiting, specifically comprises:
If the current transaction and the plurality of transactions have row lock conflicts, searching for the transaction with the commit order before the current transaction and closest to the current transaction in all the transactions with row lock conflicts, and placing the current transaction into a wake-up queue of an execution thread where the searched transaction is located.
5. The method of parallel execution-based data synchronization of claim 2, further comprising: and if the current transaction execution of the execution thread is completed, releasing all row locks in the row lock hash table of the execution thread.
6. The method for parallel execution-based data synchronization according to claim 1, wherein:
The system also comprises a transaction receiving thread, wherein the transaction receiving thread receives the acquired transaction to be synchronized, stores the transaction to be synchronized in an execution transaction list, and sequentially distributes the transaction to be executed in the execution transaction list to the execution threads when the execution threads are idle.
7. The method for parallel execution-based data synchronization of claim 6, wherein:
After receiving the acquired transaction to be synchronized, the transaction receiving thread classifies the transaction to be synchronized according to the transaction ID to be synchronized, and only stores the transaction corresponding to the commit operation into an execution transaction list.
8. An apparatus for data synchronization based on parallel execution, characterized in that:
Comprising at least one processor and a memory connected by a data bus, said memory storing instructions for execution by said at least one processor, said instructions, after being executed by said processor, for performing the method of parallel execution based data synchronization of any of claims 1-7.
CN202010499491.4A 2020-06-04 2020-06-04 Parallel execution-based data synchronization method and device Active CN111858626B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010499491.4A CN111858626B (en) 2020-06-04 2020-06-04 Parallel execution-based data synchronization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010499491.4A CN111858626B (en) 2020-06-04 2020-06-04 Parallel execution-based data synchronization method and device

Publications (2)

Publication Number Publication Date
CN111858626A CN111858626A (en) 2020-10-30
CN111858626B true CN111858626B (en) 2024-06-21

Family

ID=72985491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010499491.4A Active CN111858626B (en) 2020-06-04 2020-06-04 Parallel execution-based data synchronization method and device

Country Status (1)

Country Link
CN (1) CN111858626B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434007B (en) * 2020-11-12 2024-09-03 北京金山云网络技术有限公司 Database transaction group submitting method, device, equipment and storage medium
CN113220335B (en) * 2021-05-26 2023-03-14 西安热工研究院有限公司 Method for avoiding disorder of multithreading concurrent writing snapshot data
CN114741395A (en) * 2022-04-25 2022-07-12 北京海量数据技术股份有限公司 Method for realizing event trigger in OpenGauss database
CN115167316B (en) * 2022-08-04 2024-05-14 中国核动力研究设计院 Cooperative processing method, system and storage medium of nuclear power plant DCS platform
CN115718694A (en) * 2022-11-24 2023-02-28 中国银行股份有限公司 Bank database row lock processing method and device
WO2024221433A1 (en) * 2023-04-28 2024-10-31 华为技术有限公司 Transaction flow processing system, method, and related apparatus
CN119645578B (en) * 2025-02-18 2025-05-16 阿里云计算有限公司 Transaction processing method, electronic device, storage medium, and program product

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593257A (en) * 2012-08-15 2014-02-19 阿里巴巴集团控股有限公司 Data backup method and device
CN103885986A (en) * 2012-12-21 2014-06-25 阿里巴巴集团控股有限公司 Main and auxiliary database synchronization method and device
CN106204217A (en) * 2016-07-08 2016-12-07 腾讯科技(深圳)有限公司 The methods, devices and systems of resource numerical value transfer, the method and apparatus of resource numerical value transfer request

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7716249B2 (en) * 2005-09-16 2010-05-11 Microsoft Corporation Transaction and task scheduler
CN101615203B (en) * 2009-07-23 2012-04-04 中兴通讯股份有限公司 Concurrency control method and device
US8965860B2 (en) * 2010-04-01 2015-02-24 Salesforce.Com, Inc. Methods and systems for bulk uploading of data in an on-demand service environment
CN106610865A (en) * 2015-10-21 2017-05-03 阿里巴巴集团控股有限公司 Data locking and unlocking method and apparatus
US10346386B2 (en) * 2016-11-04 2019-07-09 Salesforce.Com, Inc. Multiversion concurrency control of database records with uncommitted transactions
CN108132831A (en) * 2016-12-01 2018-06-08 阿里巴巴集团控股有限公司 The processing method and processing unit of task

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593257A (en) * 2012-08-15 2014-02-19 阿里巴巴集团控股有限公司 Data backup method and device
CN103885986A (en) * 2012-12-21 2014-06-25 阿里巴巴集团控股有限公司 Main and auxiliary database synchronization method and device
CN106204217A (en) * 2016-07-08 2016-12-07 腾讯科技(深圳)有限公司 The methods, devices and systems of resource numerical value transfer, the method and apparatus of resource numerical value transfer request

Also Published As

Publication number Publication date
CN111858626A (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN111858626B (en) Parallel execution-based data synchronization method and device
CN109977171B (en) Distributed system and method for ensuring transaction consistency and linear consistency
CN111338766B (en) Transaction processing method, apparatus, computer equipment and storage medium
CN109739935B (en) Data reading method and device, electronic equipment and storage medium
JP6688835B2 (en) Multi-database log with multi-item transaction support
Ren et al. Lightweight locking for main memory database systems
CN110196856B (en) Distributed data reading method and device
Thomson et al. The case for determinism in database systems
CN111190935B (en) Data reading method and device, computer equipment and storage medium
CN111597015B (en) Transaction processing method and device, computer equipment and storage medium
US8694647B2 (en) Read-only operations processing in a paxos replication system
CN110473100B (en) Transaction processing method and device based on blockchain system
CN104750720B (en) The realization that high-performance data is handled under multi-thread concurrent access environment
CN107148617B (en) Automatic configuration of log-coordinated storage groups
CN108874588A (en) A kind of database instance restoration methods and device
Chairunnanda et al. ConfluxDB: Multi-master replication for partitioned snapshot isolation databases
WO2011009274A1 (en) Method and apparatus of concurrency control
WO2020025049A1 (en) Data synchronization method and apparatus, database host, and storage medium
US20240362253A1 (en) Data processing method and system for distributed database, and device, and storage medium
CN111858503B (en) Parallel execution method and data synchronization system based on log analysis synchronization
CN109783578B (en) Data reading method and device, electronic equipment and storage medium
CN111858505A (en) Parallel execution method and data synchronization system based on log analysis synchronization
CN115629822B (en) Concurrent transaction processing method and system based on multi-core processor
CN107122354A (en) Affairs perform method, apparatus and system
CN111858504A (en) Operation merging execution method based on log analysis synchronization and data synchronization system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant after: Wuhan dream database Co.,Ltd.

Address before: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant before: WUHAN DAMENG DATABASE Co.,Ltd.

CB03 Change of inventor or designer information

Inventor after: Sun Feng

Inventor after: Peng Qingsong

Inventor after: Liu Qichun

Inventor before: Sun Feng

Inventor before: Fu Quan

Inventor before: Peng Qingsong

Inventor before: Liu Qichun

GR01 Patent grant