CN119201874B - Data synchronization method, device, equipment, medium and product - Google Patents
- Publication number
- CN119201874B CN119201874B CN202411676002.2A CN202411676002A CN119201874B CN 119201874 B CN119201874 B CN 119201874B CN 202411676002 A CN202411676002 A CN 202411676002A CN 119201874 B CN119201874 B CN 119201874B
- Authority
- CN
- China
- Prior art keywords
- log data
- data
- write
- threads
- ahead log
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/178—Techniques for file synchronisation in file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/162—Delete operations
Abstract
The embodiments of this specification provide a data synchronization method and device. The method comprises: obtaining a pre-written log data set recorded during data processing on a database master node; reading each piece of pre-written log data in the set based on a plurality of parallel read threads; parsing each piece of pre-written log data based on a plurality of parallel parse threads; and applying each piece of pre-written log data to a database slave node based on a plurality of parallel application threads, in order of the generation time of each piece of pre-written log data, to obtain a data-synchronized database slave node. When the read, parse, and application operations target different pieces of pre-written log data, the read threads, parse threads, and application threads run in parallel. By parallelizing and pipelining the read, parse, and application operations, this scheme improves the efficiency of data synchronization.
Description
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular to a data synchronization method. It also relates to a data synchronization device, a computing device, a computer-readable storage medium, and a computer program product.
Background
Data synchronization between the master node and the slave nodes of a database can be realized based on pre-written log data; this synchronization mechanism is called physical replication. In existing physical replication mechanisms, the data read operation, data parse operation, and data application operation involved in synchronization are processed serially on the same thread. Under serial processing, only one of these three operations can run at any moment while the other two wait. The existing serialized physical replication mechanism therefore cannot perform the read, parse, and application operations simultaneously.
How to provide a physical replication mechanism with better performance, so as to improve the efficiency of data synchronization, has thus become a technical problem to be solved.
Disclosure of Invention
In view of this, the present embodiments provide a data synchronization method. One or more embodiments of this specification also relate to a data synchronization apparatus, a computing device, a computer-readable storage medium, and a computer program product that address the shortcomings of the prior art.
According to a first aspect of embodiments of the present disclosure, there is provided a data synchronization method, including:
Acquiring a pre-written log data set recorded during data processing on a database master node;
reading each piece of pre-written log data in the set based on a plurality of read threads, to obtain each piece of read pre-written log data;
parsing each piece of read pre-written log data based on a plurality of parse threads, to obtain each piece of parsed pre-written log data;
and, according to the generation time of each piece of pre-written log data in the set, applying each piece of parsed pre-written log data based on a plurality of application threads and synchronizing it to a database slave node, wherein the plurality of read threads, the plurality of parse threads, and the plurality of application threads all run in parallel, and when the read, parse, and application operations target different pieces of pre-written log data, the read threads, parse threads, and application threads run in parallel with one another.
According to a second aspect of embodiments of the present specification, there is provided a data synchronization apparatus comprising:
an acquisition module configured to acquire a pre-written log data set recorded during data processing on a database master node;
a reading module configured to read each piece of pre-written log data in the set based on a plurality of read threads, to obtain each piece of read pre-written log data;
a parsing module configured to parse each piece of read pre-written log data based on a plurality of parse threads, to obtain each piece of parsed pre-written log data;
an application module configured to apply, according to the generation time of each piece of pre-written log data in the set, each piece of parsed pre-written log data based on a plurality of application threads and synchronize it to a database slave node, wherein the plurality of read threads, the plurality of parse threads, and the plurality of application threads all run in parallel, and when the read, parse, and application operations target different pieces of pre-written log data, the read threads, parse threads, and application threads run in parallel with one another.
According to a third aspect of embodiments of the present specification, there is provided a computing device comprising:
A memory and a processor;
The memory stores a computer program/instructions, and the processor executes the computer program/instructions; when executed by the processor, the computer program/instructions implement the steps of the data synchronization method described above.
According to a fourth aspect of embodiments of the present specification, there is provided a computer readable storage medium storing a computer program/instruction which, when executed by a processor, performs the steps of the above-described data synchronization method.
According to a fifth aspect of embodiments of the present specification, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the data synchronization method described above.
When data synchronization between a database master node and a database slave node is realized based on pre-written log data, synchronizing each piece of pre-written log data to the slave node according to its generation time guarantees the temporal order of the synchronization, while completing the data read, parse, and application operations on different threads allows these three operations to execute at the same time. The read, parse, and application operations can therefore be parallelized and pipelined while the temporal order of synchronization is preserved, which improves the efficiency of data synchronization for the database slave node.
Drawings
FIG. 1 is a flow chart of a method of data synchronization provided in one embodiment of the present disclosure;
- FIG. 2 is a flow chart of allocating each buffer to each parse thread according to an embodiment of the present disclosure;
- FIG. 3 is a schematic flow chart of a parse-thread queue performing parse operations according to an embodiment of the present disclosure;
- FIG. 4 is a schematic flow chart of data synchronization for a database slave node based on each RedoLog according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a data synchronization device according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth to provide a thorough understanding of this specification. This specification may, however, be embodied in many forms other than those described herein, and those skilled in the art may make similar generalizations without departing from its spirit; this disclosure is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in this specification, its one or more embodiments, and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of this specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of this specification, a "first" may also be referred to as a "second", and similarly a "second" as a "first". The term "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
Furthermore, it should be noted that the user information (including, but not limited to, user equipment information and user personal information) and data (including, but not limited to, data for analysis, stored data, and presented data) involved in one or more embodiments of this specification are information and data authorized by the user or fully authorized by all parties; the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose authorization or denial.
First, terms related to one or more embodiments of the present specification will be explained.
Pre-written log data: the RedoLog, i.e., write-ahead log data that records modifications to the database's physical pages. It stores the specific change operations on data pages, is generally generated before a transaction commits, and is mainly used for fast recovery after a database failure.
Physical replication: the specific physical operations a database system performs to restore the state of the database. These operations target physical storage and are typically tied to the actual storage location and structure of the data.
Apply: performing the modification operation on a data page according to a RedoLog record.
Data definition language (Data Definition Language, DDL) for creating, modifying, or deleting database objects.
Cache area: the Redo Buf, i.e., a buffer area that stores pre-written log data.
In the present specification, a data synchronization method is provided; this specification also relates to a data synchronization apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 102, obtaining a pre-written log data set recorded during data processing on a database master node.
In this embodiment, when a user performs data processing on the database master node, the master node records pre-written log data (RedoLog) into a pre-written log file; here the database may be a relational database such as MySQL. RedoLog may be recorded in the pre-written log file in a binary data format, and the records may be ordered from early to late by their generation time. The data processing may include any of data addition, data deletion, and data update on the database master node, and may further include updates to the database structure, for example deleting a data table, creating a data table, adding or removing columns of a data table, or creating an index.
In this embodiment, the pre-written log data set may be obtained from the shared storage of the database master node, for example by fetching it periodically, and it may include all or part of the pre-written log data in the shared storage. Specifically, if the shared storage automatically deletes stored pre-written log data once slave-node synchronization based on it is completed, the obtained set may be all of the pre-written log data stored in the shared storage; if the shared storage does not automatically delete such data, the obtained set may be only the pre-written log data that has not yet been synchronized to the database slave node.
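The two fetch cases above can be sketched as follows. This is an illustrative Python sketch, not the patented implementation; the function and parameter names (`fetch_wal_set`, `auto_delete`, `last_synced_lsn`) are assumptions introduced for demonstration.

```python
# Hypothetical sketch: selecting which pre-written log records to fetch from
# shared storage, depending on whether applied records are auto-deleted.
def fetch_wal_set(records, auto_delete, last_synced_lsn):
    """Return the pre-written log records to synchronize.

    records:         list of (lsn, payload) tuples held in shared storage
    auto_delete:     True if storage drops records once they are applied,
                     so everything still present needs synchronizing
    last_synced_lsn: highest LSN already applied on the slave node
    """
    if auto_delete:
        # Storage retains only unapplied records: take them all.
        return list(records)
    # Storage keeps everything: skip records already synchronized.
    return [(lsn, p) for lsn, p in records if lsn > last_synced_lsn]
```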
Step 104, reading each piece of pre-written log data in the pre-written log data set based on a plurality of read threads, to obtain each piece of read pre-written log data.
In this embodiment, the read threads may be threads in the database slave node. The number of read threads may be configured dynamically, adjusted according to the data amount of the pre-written log data set.
In this embodiment, the earlier a RedoLog record was generated, the earlier its storage location in the pre-written log file may be. The plurality of read threads may be arranged in a queue, and each read thread in the queue may read RedoLog records from the pre-written log file in front-to-back order.
Optionally, any read thread may read RedoLog records in chunks of a preset data size (batch size), and the data read may be a binary data stream. Records may be read sequentially, from front to back, in their storage order.
In this embodiment, every RedoLog record carries a log sequence number (LogSequenceNumber, LSN), which may represent the record's storage location in the pre-written log file. The earlier a RedoLog record was generated, the smaller its LSN may be; the generation-time order of the records can therefore be expressed by their LSN order.
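The LSN-to-time relationship above can be sketched in a few lines of Python. This is a minimal illustration, assuming nothing beyond what the paragraph states: each record carries an integer LSN, and ascending LSN order reproduces generation-time order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RedoLog:
    lsn: int        # log sequence number: smaller LSN => earlier generation
    payload: bytes  # binary record content

def in_generation_order(records):
    """Order RedoLog records from earliest to latest by their LSN."""
    return sorted(records, key=lambda r: r.lsn)
```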
Optionally, each read thread may read RedoLog records into the memory of the database slave node as follows: for the first free cache area (Redo Buf) in memory, the read threads cache the records they read into the first Redo Buf in parallel; when the first Redo Buf is full, the read threads cache subsequent records into a second free Redo Buf in parallel.
After RedoLog records are cached into the first and second Redo Bufs, the two buffers may be arranged in a queue in ascending order of the LSNs of the records they hold, so that the LSNs in a Redo Buf nearer the front of the queue are smaller than those in a Redo Buf behind it. Within any single Redo Buf, the cached RedoLog records may likewise be arranged in ascending LSN order.
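The buffer-switching scheme above can be sketched as follows. This is an illustrative Python sketch; the class name, the fixed per-buffer capacity, and the use of a deque are assumptions made for demonstration, not details of the patented design.

```python
from collections import deque

class RedoBufQueue:
    """Sketch of Redo Buf caching: fill one buffer, then switch to the next
    free one; filled buffers are queued in ascending LSN order."""

    def __init__(self, capacity):
        self.capacity = capacity  # max records per Redo Buf (assumed fixed)
        self.queue = deque()      # filled Redo Bufs, front holds smaller LSNs
        self.current = []         # Redo Buf currently being filled

    def cache(self, record_lsn):
        self.current.append(record_lsn)
        if len(self.current) == self.capacity:
            # Within a buffer, records are kept in ascending LSN order.
            self.queue.append(sorted(self.current))
            self.current = []
```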
Step 106, parsing each piece of read pre-written log data based on a plurality of parse threads, to obtain each piece of parsed pre-written log data.
In this embodiment, the parse threads may be threads in the database slave node. The number of parse threads may be configured dynamically, adjusted according to the amount of pre-written log data to be parsed.
In this embodiment, after a parse thread performs the parse operation on a RedoLog record, the record's type can be determined. The parsed type may be any one of Parallel, Before, and Serial. The Parallel type indicates a physical modification of a data page (page), such as RedoLog recorded for additions, deletions, and updates of data on the page. The Before type indicates a record that must be applied ahead of physical page modifications, such as RedoLog recorded for logical events like transaction start and commit, or global timestamp changes of transactions. The Serial type indicates forced synchronization, such as RedoLog recorded for updates to the database structure.
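The three parsed types can be illustrated with a small classifier. The mapping from record kind to type below follows the examples in the paragraph above, but the concrete kind names (`insert`, `txn_commit`, `ddl`, etc.) are assumptions introduced for demonstration.

```python
PARALLEL, BEFORE, SERIAL = "Parallel", "Before", "Serial"

def classify(record_kind):
    """Map a parsed RedoLog record kind to its application type (sketch)."""
    if record_kind in {"insert", "delete", "update"}:
        return PARALLEL   # physical modification of a data page
    if record_kind in {"txn_begin", "txn_commit", "timestamp"}:
        return BEFORE     # logical event applied ahead of page changes
    if record_kind == "ddl":
        return SERIAL     # structural change, forces synchronization
    raise ValueError(f"unknown record kind: {record_kind}")
```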
Step 108, according to the generation time of each piece of pre-written log data in the set, applying each piece of parsed pre-written log data based on a plurality of application threads and synchronizing it to the database slave node, wherein the plurality of read threads, the plurality of parse threads, and the plurality of application threads all run in parallel, and when the read, parse, and application operations target different pieces of pre-written log data, the read threads, parse threads, and application threads run in parallel with one another.
Optionally, applying each piece of parsed pre-written log data based on the application threads according to generation time may comprise applying the parsed RedoLog records, based on their LSNs, in ascending LSN order; the ascending order of the records' LSNs reflects the early-to-late order of their generation times.
In this embodiment, the application threads may correspond to data pages in the database, with each application thread responsible for the data application operations of at least one data page.
Optionally, the application operation for each RedoLog record may comprise: for each parsed record, generating an application address list (recv_addr list) from the record's specific information, then traversing the recv_addr list and applying the record to the corresponding data page. The specific information may include information indicating the location at which the record is applied.
Optionally, for any RedoLog record, applying it to a data page may comprise: determining the data page corresponding to the record from its application address information, determining the corresponding application thread from the correspondence between application threads and data pages, and applying the record to the corresponding data page on that thread.
Note that the recv_addr list of each RedoLog record may be stored in a hash table on the application-thread side, so that all RedoLog information relevant to a page can be queried through the hash table.
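The per-page hash table can be sketched as below. This is an illustrative Python sketch using a dict keyed by page id; the class and method names are assumptions, and the real structure indexes recv_addr lists rather than raw records.

```python
from collections import defaultdict

class PageApplyIndex:
    """Sketch: hash table mapping a data page to all RedoLog records
    awaiting application on it, queryable per page."""

    def __init__(self):
        # page_id -> list of (lsn, record) pending application
        self._by_page = defaultdict(list)

    def add(self, page_id, lsn, record):
        self._by_page[page_id].append((lsn, record))

    def records_for(self, page_id):
        """All relevant records for a page, in LSN (generation) order."""
        return sorted(self._by_page[page_id])
```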
In this embodiment, the plurality of read threads may include read thread 1, read thread 2, ..., read thread n; the plurality of parse threads may include parse thread 1, parse thread 2, ..., parse thread n; and the plurality of application threads may include application thread 1, application thread 2, ..., application thread n. That all of them run in parallel means that read threads 1 through n run in parallel with one another, parse threads 1 through n run in parallel with one another, and application threads 1 through n run in parallel with one another. That the read, parse, and application threads run in parallel when the read, parse, and application operations target different pieces of pre-written log data means that if, at time t, read thread a reads pre-written log data a, parse thread b parses pre-written log data b, and application thread c applies pre-written log data c, then read thread a, parse thread b, and application thread c run in parallel with one another, where read thread a is any one of the read threads, parse thread b is any one of the parse threads, and application thread c is any one of the application threads.
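The read-parse-apply pipelining described above can be sketched with queue-connected threads. This minimal Python sketch uses one worker per stage for determinism (the method described uses several parallel workers per stage) and simulates the three operations with trivial transformations; all names are illustrative.

```python
import queue
import threading

STOP = object()  # sentinel marking end of stream

def pipeline(records):
    """Run read -> parse -> apply stages on separate threads so the three
    operations overlap on different records."""
    read_q, parse_q, applied = queue.Queue(), queue.Queue(), []

    def reader():
        for rec in records:            # simulate reading each record
            read_q.put(rec)
        read_q.put(STOP)

    def parser():
        while (rec := read_q.get()) is not STOP:
            parse_q.put(("parsed", rec))
        parse_q.put(STOP)

    def applier():
        while (item := parse_q.get()) is not STOP:
            applied.append(item)       # simulate applying to a data page

    threads = [threading.Thread(target=f) for f in (reader, parser, applier)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return applied
```

Because each stage consumes from a FIFO queue, record order (and hence LSN order) is preserved end to end while the stages execute concurrently.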
It should be understood that in the method of one or more embodiments of this specification, the order of some steps may be exchanged as needed, and some steps may be omitted.
In the method of FIG. 1, while data synchronization between the database master node and the database slave node is realized by applying pre-written log data to the slave node, the read, parse, and application operations can execute in parallel on different threads, which removes the need for any one operation to wait on the others and improves the efficiency of data synchronization. And because each RedoLog record is applied to the slave node in early-to-late order of generation time, the temporal order of the application process is maintained even as efficiency improves, avoiding the risk of data conflicts during synchronization and improving the timeliness of data synchronization for the database slave node.
Based on the method of fig. 1, the examples of the present specification also provide some specific implementations of the method, as described below.
To improve the reasonable utilization of resources, the number of read threads can be configured dynamically.
Optionally, reading each piece of pre-written log data in the set based on a plurality of read threads may comprise: determining the total data amount of the pre-written log data in the set; determining, based on that total, the number of read threads to configure; and reading each piece of pre-written log data using that number of read threads, to obtain each piece of read pre-written log data.
In this embodiment, the total data amount may be either the number of pre-written log records or the storage space the records occupy.
Optionally, the number of read threads may be determined from the total data amount by querying a pre-established correspondence table between total data amounts and thread counts, or by comparing the total data amount against a preset threshold and deriving the thread count from the comparison result.
In this embodiment, the number of configured read threads can be adjusted dynamically according to the total data amount of the pre-written log data set: when the total is large, the number of read threads can be increased to speed up reading; when the total is small, the number can be reduced, realizing reasonable utilization of system resources.
This embodiment also proposes a specific way of determining the number of read threads from the total data amount.
Optionally, determining the number of read threads from the total data amount may comprise: obtaining a unit read data amount that reflects the read performance of a single read thread, calculating the quotient of the total data amount and the unit read data amount, and determining the number of read threads from the quotient.
In this embodiment, the unit read data amount may be either the amount of data a read thread reads per read, or the amount of data a read thread reads per unit of time.
Optionally, the unit read data amount may be obtained from the thread's historical reads: either take the total amount read over a preset number of reads and divide by that number, or take the total amount read within a preset time period and divide by the length of the period.
Optionally, the quotient of the total data amount and the unit read data amount may be computed with rounding up (taking the ceiling) or with rounding down (truncating the remainder).
In this embodiment, the quotient may be taken directly as the number of read threads, or it may serve as a parameter from which the number of read threads is reasonably adjusted according to the range the quotient falls in.
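The quotient-based sizing can be sketched as follows. The round-up choice and the clamping range `[lo, hi]` are illustrative assumptions standing in for the "reasonable adjustment" described above.

```python
import math

def read_thread_count(total_bytes, unit_bytes, lo=1, hi=32):
    """Sketch: number of read threads from the quotient of the total data
    amount over one thread's unit read amount, rounded up and clamped."""
    quotient = math.ceil(total_bytes / unit_bytes)  # round-up variant
    return max(lo, min(hi, quotient))
```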
Because the read performance of a single thread is considered when determining the number of read threads from the total data amount of the pre-written log data set, the accuracy of that determination can be improved.
In the embodiment of the present disclosure, a specific embodiment of performing a read operation on each of the pre-written log data based on the dynamically determined preset number of read threads is also provided.
Optionally, performing the read operation on each pre-written log data in the pre-written log data set based on the number of threads to obtain each read pre-written log data may include: sorting the pre-written log data based on their generation times to obtain a target sequence; for each pre-written log data, determining a correspondence, reflecting which target read thread reads that pre-written log data, according to the result of a modulo operation between the sequence number corresponding to the position of that pre-written log data in the target sequence and the number of threads; and performing the read operation on each pre-written log data in the pre-written log data set based on the correspondence, to obtain each read pre-written log data.
Optionally, sorting the pre-written log data based on the generation time of the pre-written log data to obtain the target sequence may include sorting the pre-written log data in order from small to large based on the LSN of the pre-written log data to obtain the target sequence.
In this embodiment of the present disclosure, for any pre-written log data in the pre-written log data set: determine the position information (sorting number) of that pre-written log data in the target sequence; perform a modulo operation between the sorting number and the number of read threads; based on the value of the modulo operation result, query the target read thread at that position in the read thread queue; establish a correspondence between the pre-written log data and the target read thread; and instruct the target read thread to perform the read operation on the pre-written log data, thereby obtaining the read pre-written log data.
Any of the above-mentioned pre-written log data may be any one of the pre-written log data sets, and a read thread responsible for reading the pre-written log data may be determined for each of the pre-written log data in the pre-written log data set in the above-mentioned manner.
In the embodiment of the present disclosure, each pre-written log data in the pre-written log data set is allocated to each read thread according to the above modulo manner, so that the read tasks of each read thread can be balanced, so as to improve the read performance of the read thread.
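The modulo-based correspondence described above can be sketched in Python as follows; the function and variable names are illustrative, and records are represented as (LSN, payload) pairs for simplicity:

```python
def assign_reads(log_records, num_threads):
    """Balance WAL records across read threads by modulo on LSN order.

    log_records: list of (lsn, payload) pairs. They are first sorted by LSN
    (i.e. by generation time) to form the target sequence; record i in that
    sequence is then assigned to read thread i % num_threads, which balances
    the read tasks across threads as described in the text.
    Returns a mapping thread_index -> list of records that thread reads.
    """
    ordered = sorted(log_records, key=lambda r: r[0])  # target sequence by LSN
    buckets = {t: [] for t in range(num_threads)}
    for seq_no, record in enumerate(ordered):
        buckets[seq_no % num_threads].append(record)   # modulo correspondence
    return buckets
```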
In the process of analyzing the pre-written log data by each analysis thread, in order to improve the analysis performance of the analysis thread, the pre-written log data can be distributed to each analysis thread in a mode of balancing the analysis task amount.
Optionally, the analyzing the read pre-written log data based on the plurality of analyzing threads to obtain the analyzed pre-written log data may include distributing the read pre-written log data to each analyzing thread based on a preset rule, where the preset rule is used to reflect that the pre-written log data obtained by each analyzing thread is complete pre-written log data, and analyzing the read pre-written log data based on each analyzing thread to obtain the analyzed pre-written log data.
Optionally, the method for distributing the read pre-written log data to each analysis thread based on the preset rule may include: distributing each complete pre-written log data to each analysis thread based on the result of a modulo operation between the sequence number assigned to each pre-written log data in sorting and the number of analysis threads; or calculating a quotient between the number of pieces of pre-written log data and the number of analysis threads, sequentially distributing to each analysis thread a number of continuous complete pre-written log data equal to the quotient, and then, if remaining pre-written log data still exists after each analysis thread has been distributed its quotient of continuous complete pre-written log data, randomly distributing the remaining pre-written log data to some of the analysis threads.
In this embodiment of the present disclosure, after the read pre-write log data is distributed to each analysis thread based on a preset rule, the pre-write log data acquired by each analysis thread may be complete pre-write log data.
In order to facilitate the timing of the application of each parsed pre-written log data, the present disclosure further proposes a specific embodiment for distributing each read pre-written log data to each parsing thread.
Optionally, each read pre-written log data is cached in a plurality of cache regions distributed in a first queue, and in the first queue, if a first cache region is located before a second cache region, the generation time of the pre-written log data included in the first cache region is earlier than the generation time of the pre-written log data included in the second cache region. Distributing the read pre-written log data to each analysis thread based on the preset rule may include: for any cache region in the first queue, splitting the data in that cache region into a first cache data block and a second cache data block at the splitting site in that cache region, based on an analysis coordination thread for coordinating the analysis threads, where the first cache data block includes at least one complete pre-written log data and the splitting site is used to split out complete pre-written log data; distributing the first cache data block to a first analysis thread among the analysis threads based on the analysis coordination thread; for the cache region adjacent to and following that cache region, splitting its data into a third cache data block and a fourth cache data block at the splitting site in the adjacent cache region; and distributing the second cache data block together with the third cache data block to a second analysis thread among the analysis threads based on the analysis coordination thread.
In the embodiment of the present disclosure, the buffer may be a log buffer (Redo Buf) for buffering log data in the memory of the database slave node. The buffers are sorted in ascending order of the LSNs of the pre-written log data they contain, to obtain the first queue. In the first queue, the LSN of the pre-written log data in a buffer at the front of the first queue is smaller than the LSN of the pre-written log data in a buffer at the rear, so the generation time of the pre-written log data in the buffer at the front is earlier than that of the pre-written log data in the buffer at the rear. Within each buffer, the cached pre-written log data are arranged in a queue in ascending LSN order.
Fig. 2 is a schematic flow chart of allocating each buffer to each parsing thread according to the embodiment of the present disclosure. Steps 202 through 216 are included as shown in fig. 2.
Step 202, selecting a first buffer area from the first queue based on the analysis coordination thread according to the sequence of LSN from small to large aiming at the first queue.
In the embodiment of the present disclosure, the parsing coordination thread (Parse_coordinator) may be a core component responsible for coordinating and scheduling parsing tasks, and a parsing thread (Parse_worker) may be a work unit that specifically performs parsing operations. Parse_coordinator ensures that tasks are scheduled in order and coordinates the efficient collaboration of multiple Parse_workers, ensuring the accuracy and efficiency of data parsing.
Step 204, judging whether a splitting site exists in the first cache region to obtain a first judgment result, where the splitting site is used to split out complete pre-written log data.
In the embodiment of the present disclosure, the splitting site may be an mtr (mini-transaction) boundary, which may be expressed as a boundary of pre-written log records (redo records). Since one piece of pre-written log data can be understood as log data for one page, the mtr boundary can be understood as the boundary of an atomic change to page content. Splitting at mtr boundaries ensures the integrity of the redo records of each page; within the range delimited by mtr boundaries, the redo records of all pages are complete. A data set cut out by mtr boundaries can be understood as a complete set of redo records for physical modifications to a group of pages; the set may include multiple complete redo records physically modifying multiple pages, or a single complete redo record physically modifying one page.
In this embodiment of the present disclosure, the pre-written log data cached in a buffer may be a binary data stream. Because the storage space of a buffer is limited, part of one piece of binary pre-written log data may be stored in one buffer and the rest in another buffer, so a buffer may hold incomplete pre-written log data. The splitting site may be the junction between the last complete pre-written log data and the trailing incomplete pre-written log data in the binary data stream stored in the buffer; splitting at this site separates the trailing incomplete pre-written log data from the preceding complete pre-written log data.
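The splitting-site behavior described above can be illustrated with a minimal Python sketch; real redo records carry their boundaries inside the binary format itself, so the explicit list of boundary offsets used here is a simplifying assumption:

```python
def split_at_boundary(buf, boundary_offsets):
    """Split a buffer's byte stream at its last record (mtr) boundary.

    boundary_offsets: offsets in `buf` where a complete record set ends,
    i.e. the candidate splitting sites. Returns (complete_block, tail_block):
    complete_block holds only whole records, tail_block holds the trailing
    incomplete record to be merged with the following buffer. If no boundary
    falls inside the buffer, the entire buffer is an incomplete tail.
    """
    sites = [o for o in boundary_offsets if 0 < o <= len(buf)]
    if not sites:
        return b"", bytes(buf)          # no splitting site in this buffer
    cut = max(sites)                    # junction after the last complete record
    return bytes(buf[:cut]), bytes(buf[cut:])
```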
Step 206, if the first judgment result indicates that a splitting site exists in the first cache region, splitting the data in the first cache region into a first cache data block and a second cache data block based on the analysis coordination thread.
In this embodiment of the present disclosure, the first buffered data block includes at least one complete pre-written log data, and the second buffered data block includes incomplete pre-written log data.
Step 208, distributing the first cache data block to an idle first analysis thread based on the analysis coordination thread, and adding the first analysis thread to the analysis thread queue.
In this embodiment of the present disclosure, after an idle resolution thread is allocated with a resolution task, the resolution thread may be added to a resolution thread queue, where each resolution thread in the resolution thread queue is sequentially arranged from front to back according to the time sequence of addition.
Step 210, selecting a second buffer area from the first queue based on the parsing coordination thread according to the sequence of LSN from small to large for the first queue.
Step 212, if a splitting site exists in the second cache region, splitting the data in the second cache region into a third cache data block and a fourth cache data block based on the analysis coordination thread.
In this embodiment of the present disclosure, the third buffered data block includes at least one complete pre-written log data, and the fourth buffered data block includes incomplete pre-written log data.
Step 214, distributing the second cache data block and the third cache data block to an idle second analysis thread based on the analysis coordination thread, and adding the second analysis thread to the analysis thread queue.
In this embodiment of the present disclosure, after the data in the first buffer area is allocated, the first buffer area may be set as a free buffer area to buffer the subsequent pre-written log data.
Step 216, repeating the above steps in a loop until all cache regions in the first queue have been allocated or no free analysis thread remains.
It should be noted that, if no pre-written log data is cached in the second cache region, indicating that the second cache region contains no data to be synchronized to the database slave node, the second cache data block is allocated to an idle second analysis thread, and the second analysis thread is added to the analysis thread queue.
Optionally, the allocating the read pre-written log data to each of the parsing threads based on a preset rule may further include allocating, if the splitting site is not included in any one of the cache regions and the splitting site is included in the adjacent cache region, any one of the cache regions and the third cache data block to a third parsing thread of the parsing threads based on the parsing coordination thread.
In the embodiment of the present disclosure, in fig. 2, if step 206 finds that the first buffer has no splitting site, and step 212 finds that the second buffer has one, the pre-written log data in the first buffer and the data in the third cache data block are together allocated to one parsing thread.
In this embodiment of the present disclosure, if the first buffer area does not have the splitting site, it may be reflected that the data stored in the first buffer area is incomplete pre-written log data, and part of the incomplete pre-written log data may be stored in the second buffer area located behind the first buffer area. Therefore, the data of the first cache region and the third cache data block segmented from the second cache region are commonly distributed to the first analysis thread, so that the data acquired by the first analysis thread can be complete pre-written log data.
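Combining the splitting and merging rules above, a simplified Python sketch of the coordination logic might look as follows; the round-robin worker selection and the callback-style splitter are illustrative assumptions, not the claimed scheduling mechanism:

```python
def coordinate_parse(buffers, split_fn, num_workers):
    """Hand buffers to parse workers so every worker gets whole records.

    buffers: ordered list of byte buffers (the LSN-ascending first queue).
    split_fn(buf) -> (complete_block, tail_block) cuts a buffer at its
    splitting site; a buffer with no splitting site yields an empty
    complete_block. A buffer's trailing incomplete bytes are carried over
    and prepended to the next buffer's complete block, so each assignment
    contains only complete WAL records.
    Returns the list of byte blocks assigned to workers, round-robin.
    """
    assignments = [b"" for _ in range(num_workers)]
    carry = b""   # incomplete tail carried into the next buffer
    task = 0
    for buf in buffers:
        complete, tail = split_fn(buf)
        block = carry + complete
        carry = tail
        if block:                       # no splitting site: defer whole buffer
            assignments[task % num_workers] += block
            task += 1
    return assignments
```

Any bytes left in `carry` after the loop belong to a record whose remainder has not yet arrived; they would wait for the next batch of buffers.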
In the embodiment of the present disclosure, the binary data streams allocated to each analysis thread are all complete pre-written log data, so that each pre-written log data analyzed by each analysis thread is all complete pre-written log data, and when a subsequent application thread applies to the data analyzed by the analysis thread, the complete pre-written log data can be applied to the database slave node, so as to improve the data integrity of data synchronization for the database slave node.
In the embodiment of the present disclosure, a specific embodiment of the parsing thread performing the parsing operation on each pre-written log data to be parsed is further provided.
Optionally, for any analysis thread, the pieces of pre-written log data to be analyzed that are distributed to that analysis thread are arranged in a second queue according to generation time. Performing the analyzing operation on each read pre-written log data based on each analysis thread to obtain each analyzed pre-written log data may include: for any analysis thread, traversing each pre-written log data to be analyzed in the second queue; if pre-written log data of a specific type exists among them, analyzing the pre-written log data to be analyzed located before the specific-type pre-written log data in the second queue into a first data set, where the specific-type pre-written log data at least includes log data for updating the database structure; analyzing the specific-type pre-written log data into a second data set; and analyzing the pre-written log data to be analyzed located after the specific-type pre-written log data in the second queue into a third data set.
Optionally, each of the pre-written log data to be parsed is distributed in a second queue according to the generating time, and may include that each of the pre-written log data to be parsed is distributed in the second queue according to the LSN of the pre-written log data and in the order from small to large. The earlier the generation time of the pre-written log data is, the smaller the LSN of the pre-written log data is.
In this embodiment of the present disclosure, each of the pre-write log data to be parsed allocated to any one of the parsing threads may be distributed in the second queue. The process of analyzing the to-be-analyzed pre-written log data by any analyzing thread in the second queue can comprise traversing the to-be-analyzed pre-written log data in sequence from front to back in the second queue, and dividing the to-be-analyzed log data into data sets (LogBatch) according to the type of the to-be-analyzed pre-written log data.
Optionally, dividing the log data to be parsed into data sets according to type may include: if log data of the specific type exists among the log data to be parsed, parsing the log data to be parsed located before the specific-type log data into a first LogBatch, parsing the specific-type log data itself into a second LogBatch, and parsing the log data to be parsed located after the specific-type log data into a third LogBatch. The specific type of pre-written log data may include at least data definition language (Data Definition Language, DDL) statements, which may be used to create, modify, and delete database objects; these objects may include meta-information such as databases (Database) and tables (Table).
In the embodiment of the present disclosure, the first LogBatch, the second LogBatch, and the third LogBatch may be distributed in a queue from small to large according to the LSNs of the pre-written log data. So that the subsequent application thread can perform data application according to the queues ordered by the first LogBatch, the second LogBatch and the third LogBatch. Therefore, the time sequence of each pre-written log data to be applied can be improved, and the effectiveness of data synchronization aiming at the database slave nodes is improved.
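The division of one parsing thread's LSN-ordered records into the first, second, and third LogBatches around a DDL record can be sketched as follows; the predicate `is_ddl` stands in for whatever specific-type check the implementation actually uses:

```python
def split_into_batches(records, is_ddl):
    """Split one parse worker's LSN-ordered records into LogBatches.

    Records before a DDL form one batch, the DDL record forms its own
    batch, and records after it start a new batch. LSN order is preserved,
    so the apply phase can treat each DDL batch as an ordering barrier.
    is_ddl(record) -> bool flags the specific-type (DDL) records.
    """
    batches, current = [], []
    for rec in records:
        if is_ddl(rec):
            if current:
                batches.append(current)      # first batch: records before the DDL
            batches.append([rec])            # second batch: the DDL itself
            current = []                     # third batch: records after the DDL
        else:
            current.append(rec)
    if current:
        batches.append(current)
    return batches
```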
In the embodiment of the present disclosure, a specific embodiment of performing an application operation on each to-be-applied pre-written log data parsed by the parsing thread is further provided.
Optionally, the analysis threads are distributed in a third queue according to the generation time of the pre-written log data they are to analyze. Synchronizing each analyzed pre-written log data to the database slave node based on a plurality of application threads, according to the generation time of each pre-written log data in the pre-written log data set, may include: traversing each analysis thread in the third queue, by generation time, based on an application coordination thread for coordinating the plurality of application threads; and, for any traversed analysis thread, applying the pre-written log data analyzed by that thread to the database slave node based on the plurality of application threads, to obtain the database slave node after data synchronization.
In the embodiment of the present disclosure, the application coordination thread (Apply_coordinator) may be responsible for coordinating and scheduling application tasks, ensuring that all application threads (Apply_worker) operate in the correct order and with the correct dependencies. Apply_coordinator typically manages the workflow, transactions, and coordination tasks during processing to ensure the consistency of the system. An Apply_worker may be responsible for executing specific data application or transactional operations; each Apply_worker processes the tasks assigned to it by Apply_coordinator to complete the actual data modification or change operations.
In the embodiment of the present disclosure, after the analysis threads are assigned analysis tasks, they may be ordered from earliest to latest according to the generation times of the pre-written log data they are responsible for analyzing, to obtain the third queue. Specifically, the analysis threads may be sorted in ascending order of the LSNs of the pre-written log data to obtain the third queue.
Optionally, traversing each of the resolved threads in the third queue based on an application coordination thread for coordinating the plurality of application threads according to the generation time may include traversing each of the resolved threads in the third queue based on the application coordination thread in order of from small to large LSNs according to LSNs of the pre-written log data to be applied.
In the embodiment of the present disclosure, for each traversed analysis thread, each pre-written log data analyzed by the analysis thread is applied to a database slave node by using an application thread, so as to obtain a database slave node after data synchronization.
In the embodiment of the present disclosure, according to the LSN of the pre-written log data, the application thread may apply, according to the order of the LSN from small to large, each pre-written log data analyzed by the analysis thread, so as to improve the time sequence in which each pre-written log data is applied, so as to improve the effectiveness of data synchronization with respect to the slave node of the database.
In the embodiment of the present disclosure, a specific embodiment of performing an application operation on each to-be-applied pre-written log data parsed by one of the parsing threads is further provided.
Optionally, the data sets generated by any analysis thread are distributed in a fourth queue according to the generation time of the pre-written log data to be applied. For any traversed analysis thread, applying the pre-written log data analyzed by that thread to the database slave node based on the plurality of application threads to obtain the database slave node after data synchronization may include: traversing each data set in the fourth queue, by generation time, based on the application coordination thread; and, for any traversed data set, applying each pre-written log data in that data set to the database slave node based on the plurality of application threads, to obtain the database slave node after data synchronization.
In this embodiment of the present disclosure, after the parsing thread parses each pre-written log data to be parsed, each data set may be generated according to the type of the pre-written log data. The fourth queue may be obtained by sorting the data sets according to the generation time of the pre-written log data included in the data sets in order of early and late generation times. Specifically, according to the LSN of the pre-written log data, the data sets may be ordered according to the order of the LSNs from small to large, so as to obtain the fourth queue.
Optionally, traversing each data set in the fourth queue based on the application coordination thread according to the generation time may include traversing each data set in the fourth queue in order of from small to large according to LSN of the pre-written log data to be applied.
In the embodiment of the present disclosure, for each traversed data set, each pre-written log data in the data set is applied to a database slave node by using an application thread, so as to obtain a database slave node after data synchronization.
In the embodiment of the present disclosure, an application thread may apply, according to LSNs of the pre-written log data, each pre-written log data included in the data set in order of from small to large LSNs, so as to improve the time sequence in which each pre-written log data is applied, so as to improve the effectiveness of data synchronization with respect to the slave node of the database.
In the embodiment of the present disclosure, a specific embodiment of performing an application operation on each to-be-applied pre-written log data in one of the data sets generated by the parsing thread is also provided.
Optionally, applying each pre-written log data in any one of the data sets to the database slave node based on the plurality of application threads to obtain the database slave node after data synchronization may include: determining whether that data set contains pre-written log data of the specific type, to obtain a determination result; if the determination result indicates that the data set does not contain the specific-type pre-written log data, applying each first-type pre-written log data in the data set to the database slave node based on the application coordination thread, where the first-type pre-written log data includes pre-written log data recorded for logical events of executing transactions, and then applying each second-type pre-written log data in the data set to the database slave node based on the plurality of application threads, where the second-type pre-written log data includes pre-written log data recorded for physical modification operations performed on data pages, to obtain the database slave node after data synchronization; and if the determination result indicates that the data set contains the specific-type pre-written log data, applying the specific-type pre-written log data to the database slave node based on the application coordination thread.
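The branch logic for applying one data set can be sketched in Python as follows; the callbacks standing in for the application coordination thread and the application worker pool, and the type predicates, are illustrative assumptions:

```python
def apply_batch(batch, coordinator_apply, worker_apply, is_ddl, is_logical):
    """Apply one LogBatch following the branch described above.

    If the batch holds a specific-type (DDL) record, the coordination thread
    applies it alone. Otherwise the coordination thread first applies all
    first-type (logical) records, and only then are the second-type physical
    page modifications fanned out to the worker threads.
    coordinator_apply / worker_apply are callbacks standing in for the apply
    coordination thread and the apply worker pool.
    """
    if any(is_ddl(r) for r in batch):
        for r in batch:
            coordinator_apply(r)              # specific type: single-threaded
        return
    for r in batch:
        if is_logical(r):
            coordinator_apply(r)              # first type: logical log records
    for r in batch:
        if not is_logical(r):
            worker_apply(r)                   # second type: physical page changes
```

Applying the logical records on one thread before fanning out the physical ones mirrors the ordering rationale given in the text: the logical structure must not be interleaved.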
Fig. 3 is a schematic flow chart of a resolving operation for resolving a thread queue according to an embodiment of the present disclosure. Steps 302 through 316 are included as shown in fig. 3.
Step 302, selecting a first analysis thread based on the application coordination thread according to the LSN of the pre-written log data and the order from small to large for a third queue formed by each analysis thread.
Step 304, selecting the first LogBatch from the fourth queue constituted by the LogBatches included in the first analysis thread, based on the application coordination thread, in ascending order of the LSNs of the pre-written log data.
Step 306, determining whether the pre-written log data of the specific type exists in the first LogBatch.
Step 308, if the specific type of pre-written log data does not exist in the first LogBatch, applying the first-type pre-written log data in the first LogBatch to the database slave node based on the application coordination thread.
In the embodiment of the present disclosure, the first type of pre-written log data may include pre-written log data recorded by a logical event of executing a transaction. For example, the logical events of a transaction may include the start and commit of the transaction, or a global timestamp change of the transaction.
In the embodiment of the present disclosure, the first type of pre-written log data may include logical log content, so, to avoid destroying the logical structure of each logical log, all first-type pre-written log data need to be applied to the database slave node by a single thread. Within one application batch, when the application coordination thread performs the application operation for the first-type pre-written log data, it switches to subsequent operations only after all first-type pre-written log data have been applied; this reduces the number of switches performed by the application coordination thread and improves the application efficiency for the first-type pre-written log data.
Step 310, applying the second-type pre-written log data in the first LogBatch to the database slave node based on the plurality of application threads, to obtain the database slave node after data synchronization.
In the embodiment of the present specification, the second type of pre-written log data may include pre-written log data recorded for physical modification operations performed on data pages. The application operation for the second-type pre-written log data may be performed after the application of the first-type pre-written log data is completed. In this way, the logical log data is first synchronized to the database slave node so that its running environment meets the preset logical requirements, and the physical log data is then synchronized, which improves the fluency of data synchronization for the database slave node.
In the embodiment of the present disclosure, each application thread of the plurality of application threads may have a correspondence relationship with at least one data page in the database slave node. And distributing the pre-written log data to corresponding application threads according to information carried by the pre-written log data and related to the data table and the data page and according to the corresponding relation between the application threads and the data page, and applying the pre-written log data to the corresponding data page based on the corresponding application threads to finish the modification operation of the physical content of the data page so as to obtain the database slave node after data synchronization.
Optionally, establishing the correspondence between application threads and data pages may include: acquiring the data table ID information (space_id) and data page number information (page_no) corresponding to each data page, and calculating a hash value for a target data page based on the space_id and page_no, where the hash value may serve as a sequence number set for the target data page; calculating the result of a modulo operation between the hash value and the number of application threads, where the application threads are distributed in a queue; and, according to the value of the modulo operation result, querying the application thread at the corresponding sequence number in the application thread queue and establishing a correspondence between that application thread and the target data page.
The target data page may be any one of the data pages in the database slave node, and for each data page in the database slave node, a correspondence relationship with the application thread may be established in the manner described above.
In the embodiment of the present disclosure, the application threads and the data pages have the above correspondence, so that each application thread can perform application operations on a plurality of data pages, and thus, the task amount of each application thread can be balanced, so as to improve the performance of application threads in performing application operations. The application threads can be operated in parallel, so that the efficiency of application operation of the application threads can be improved.
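The hash-then-modulo mapping from data pages to application threads can be sketched as follows; the specific hash function is an illustrative choice, since any stable hash of (space_id, page_no) yields the described balancing:

```python
def page_to_apply_thread(space_id, page_no, num_apply_threads):
    """Map a data page to an apply thread via hash-then-modulo.

    The (space_id, page_no) pair identifies a data page; hashing it and
    taking the result modulo the apply-thread count pins every modification
    of one page to the same thread, so per-page redo stays ordered while
    different pages are applied in parallel.
    """
    h = hash((space_id, page_no))      # any stable hash works here
    return h % num_apply_threads
```

Because every redo record for a given page lands on the same thread, the threads can run in parallel without coordinating on individual pages.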
Step 312, if the specific type of pre-written log data exists in the first LogBatch, applying the specific type of pre-written log data to the database slave node based on the application coordination thread.
In the embodiment of the present disclosure, if pre-written log data of the specific type exists in the first LogBatch, that LogBatch may store only this one piece of pre-written log data. The specific type of pre-written log data may include pre-written log data recorded for operations that force synchronization on the database, such as DDL.
In the embodiment of the present disclosure, when the application operation is performed on a DDL, the database structure of the database slave node needs to be updated. Before the DDL is applied, all first pre-written log data generated before the DDL must already have been applied, and the application operation for second pre-written log data generated after the DDL must be prohibited. The reason is as follows. If the result of applying the DDL is that an old data page is deleted from the database slave node, then performing the application operation for first pre-written log data only after the DDL application has completed, in a scenario where that first pre-written log data is a physical modification to the deleted old page, would cause a conflict event of data application during data synchronization. Likewise, if the result of applying the DDL is that a new data page is created for the database slave node, then performing the application operation for second pre-written log data before the DDL application has completed, in a scenario where that second pre-written log data is a physical modification to the newly created page, would also cause a conflict event of data application. Therefore, applying each pre-written log data in the order of the third queue and the fourth queue avoids conflict events of data application, improves the time sequence in which the pre-written log data are applied, and improves the effectiveness of data synchronization for the database slave node.
Step 314, processing each LogBatch in the fourth queue in a loop, following the processing manner used for the first LogBatch.
Step 316, processing each analysis thread in the third queue in a loop, following the processing manner used for the first analysis thread.
In this embodiment of the present disclosure, after all the pre-written log data that an analysis thread is responsible for parsing has been applied, that analysis thread may be marked as an idle analysis thread, so that it can receive pre-written log data to be parsed subsequently.
Fig. 4 is a schematic flow chart of data synchronization for a database slave node based on each RedoLog according to an embodiment of the present disclosure. As shown in Fig. 4, the flow includes steps 402 through 408.
Step 402, the reading thread reads each RedoLog into a Redo Buf, and the Redo Bufs are arranged in order of increasing RedoLog LSN.
Step 404, the RedoLogs in the Redo Bufs are sequentially distributed to the analysis threads in order of increasing LSN; after being assigned RedoLogs to be parsed, the analysis threads form an analysis thread queue, also in order of increasing LSN.
Step 406, each analysis thread parses the RedoLogs it is responsible for into LogBatches according to the type of each RedoLog to be parsed, and the LogBatches form a LogBatch queue in order of increasing LSN.
Step 408, the application thread traverses the analysis threads in the analysis thread queue and the LogBatches in each LogBatch queue in order of increasing LSN, and performs the application operation for each LogBatch to obtain the database slave node after data synchronization.
In the embodiment of the present disclosure, the analysis thread queue described above is maintained among the analysis threads that have been assigned parsing tasks, and the LogBatches generated by each analysis thread form the LogBatch queue described above. By traversing the analysis thread queue and the LogBatch queues during the application phase, each RedoLog can be applied to the database slave node in order of increasing LSN. As a result, when a DDL is parsed during the analysis phase, the RedoLogs after the DDL can be parsed without waiting for the DDL application to complete, which avoids the delay risk that waiting for the DDL application operation would introduce into the analysis phase and improves the timeliness of data synchronization for the database slave node.
In the embodiment of the present disclosure, the parallel, pipelined processing among the read threads, analysis threads, and application threads improves the rate of data synchronization for the database slave node, while each RedoLog is still applied to the database slave node strictly in order of increasing LSN. This preserves the ordering of data synchronization, avoids data conflict events during the synchronization process, and makes the parallel pipeline among the read, analysis, and application threads an effective and timely synchronization process for the database slave node.
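As a rough single-file sketch (structure and names are assumptions, not the patented implementation), the pipeline can be imitated with a thread pool for the parallel parse phase while a single ordered traversal performs the apply phase:

```python
# Sketch: parse RedoLogs in parallel, apply them strictly in LSN order.
from concurrent.futures import ThreadPoolExecutor

def synchronize(redo_logs, n_parse_threads=3):
    # Read phase: arrange the logs in order of increasing LSN.
    ordered = sorted(redo_logs, key=lambda r: r["lsn"])

    # Parse phase: threads parse concurrently; map() returns results in
    # input order, standing in for the analysis-thread and LogBatch queues.
    def parse(rec):
        return {**rec, "parsed": True}          # stand-in for real parsing
    with ThreadPoolExecutor(max_workers=n_parse_threads) as pool:
        parsed = list(pool.map(parse, ordered))

    # Apply phase: a single ordered traversal applies each parsed record.
    return [rec["lsn"] for rec in parsed]

synchronize([{"lsn": 3}, {"lsn": 1}, {"lsn": 2}])   # → [1, 2, 3]
```

The point of the sketch is that parallelism lives entirely in the parse phase; the apply phase consumes the queues in order, so LSN ordering survives the concurrency.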
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a data synchronization device, and fig. 5 shows a schematic structural diagram of a data synchronization device provided in one embodiment of the present disclosure. As shown in fig. 5, the apparatus includes:
An acquisition module 502 configured to acquire a set of pre-written log data recorded for data processing by a database master node.
And a reading module 504, configured to perform a reading operation on each of the pre-written log data in the pre-written log data set based on a plurality of reading threads, so as to obtain each of the read pre-written log data.
The parsing module 506 is configured to parse the read pre-written log data based on a plurality of parsing threads, so as to obtain parsed pre-written log data.
The application module 508 is configured to perform, based on a plurality of application threads and according to the generation time of each pre-written log data in the pre-written log data set, an application operation on each parsed pre-written log data, so as to synchronize the parsed pre-written log data to a database slave node. The plurality of reading threads, the plurality of parsing threads, and the plurality of application threads are all threads running in parallel; the reading, parsing, and application operations proceed concurrently as long as they act on different pre-written log data.
Optionally, the reading module 504 may include:
A first determination unit configured to determine a total amount of data of each of the pre-write log data included in the pre-write log data set.
And a second determination unit configured to determine the number of threads configured for the plurality of read threads based on the total amount of data.
And the reading unit is configured to read each piece of pre-written log data in the pre-written log data set based on the reading threads of the thread number, so as to obtain each piece of read pre-written log data.
Optionally, the second determining unit may include:
an acquisition subunit configured to acquire a unit read data amount for reflecting read performance of one read thread.
A calculation subunit configured to calculate a quotient between the total amount of data and the unit read amount of data.
A first determination subunit configured to determine, based on the quotient, a number of threads configured for the plurality of read threads.
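For instance, the thread-count determination could look like the following sketch. The ceiling rounding and the upper cap are assumptions added for illustration; the description above only requires that the count be determined based on the quotient.

```python
# Sketch: derive the number of read threads from the quotient between the
# total data amount and the unit read amount of one read thread.
def read_thread_count(total_bytes, unit_read_bytes, max_threads=32):
    if unit_read_bytes <= 0:
        raise ValueError("unit read amount must be positive")
    quotient = -(-total_bytes // unit_read_bytes)   # ceiling division
    return max(1, min(quotient, max_threads))       # assumed cap and floor

read_thread_count(10_000, 4_000)   # → 3 read threads
```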
Optionally, the reading unit may include:
And the sequencing subunit is configured to sequence each pre-written log data based on the generation time of each pre-written log data to obtain a target sequence.
The second determining subunit is configured to determine, for any pre-written log data, the target read thread that reads it, according to the result of a modulo operation between the sequence number corresponding to that pre-written log data's position in the target sequence and the number of threads.
And the reading subunit is configured to read any one of the pre-written log data based on the corresponding relation to obtain the read pre-written log data.
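The modulo mapping can be sketched as follows (record identifiers are hypothetical):

```python
# Sketch: the record at position i of the time-ordered target sequence is
# read by thread (i mod n), so consecutive records spread across threads.
def assign_to_read_threads(logs_in_time_order, n_threads):
    assignment = {t: [] for t in range(n_threads)}
    for seq_no, log in enumerate(logs_in_time_order):
        assignment[seq_no % n_threads].append(log)
    return assignment

assign_to_read_threads(["r0", "r1", "r2", "r3", "r4"], 2)
# → {0: ['r0', 'r2', 'r4'], 1: ['r1', 'r3']}
```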
Optionally, the parsing module 506 may include:
the distribution unit is configured to distribute the read pre-written log data to each analysis thread based on a preset rule, wherein the preset rule is used for reflecting that the pre-written log data acquired by each analysis thread are all complete pre-written log data.
And the analysis unit is configured to analyze the read pre-written log data based on the analysis threads to obtain the analyzed pre-written log data.
Optionally, each read pre-written log data is cached in a plurality of cache areas distributed in a first queue; in the first queue, if a first cache area is located before a second cache area, the generation time of the pre-written log data in the first cache area is earlier than that of the pre-written log data in the second cache area.
The distribution unit may include:
A first segmentation subunit is configured to segment, based on an analysis coordination thread for coordinating the plurality of analysis threads and according to the segmentation site in any buffer area in the first queue, the data in that buffer area into a first cache data block and a second cache data block, wherein the first cache data block contains at least one complete pre-written log data, and the segmentation site marks a boundary between complete pre-written log data.
A first allocation subunit configured to allocate the first cache data block to a first analysis thread of the respective analysis threads based on the analysis coordination thread.
A second segmentation subunit is configured to segment, based on the analysis coordination thread and according to the segmentation site in the adjacent buffer area located after and adjacent to the any buffer area, the data in that adjacent buffer area into a third cache data block and a fourth cache data block, wherein the third cache data block contains at least one complete pre-written log data.
And a second allocation subunit configured to allocate the second cache data block and the third cache data block to a second analysis thread in the analysis threads based on the analysis coordination threads.
Optionally, the distribution unit may further include:
And a third allocation subunit configured to allocate, based on the analysis coordination thread, both the any buffer area and the third cache data block to a third analysis thread of the analysis threads if the any buffer area contains no segmentation site while the adjacent buffer area does.
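A toy sketch of this boundary-aware distribution follows. Using ';' as a record terminator is a stand-in assumption; real write-ahead log records carry lengths and LSNs instead.

```python
# Sketch: cut each cache buffer at its segmentation site (the last record
# boundary), join each incomplete tail with the head of the next buffer,
# and hand every resulting chunk of complete records to one parse thread.
# A buffer containing no boundary at all is folded into the spanning chunk.

def split_site(buf):
    """Index just past the last complete record in buf, or None."""
    i = buf.rfind(";")
    return None if i < 0 else i + 1

def distribute(buffers):
    chunks, carry = [], ""
    for buf in buffers:
        site = split_site(buf)
        if site is None:                    # no complete record boundary
            carry += buf
            continue
        chunks.append(carry + buf[:site])   # spanning tail + complete head
        carry = buf[site:]                  # incomplete tail spans onward
    if carry:
        chunks.append(carry)
    return chunks                           # one chunk per parse thread

distribute(["a;b;cc", "c;d;"])   # → ['a;b;', 'ccc;d;']
```

Every chunk handed out contains only complete records, which is the guarantee the preset rule above requires of each analysis thread's input.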
Optionally, for any one of the parsing threads, each of the pre-written log data to be parsed allocated to the any one of the parsing threads is distributed in a second queue according to the generation time.
Optionally, the parsing unit may include:
and the first traversing subunit is configured to traverse each to-be-parsed pre-written log data in the second queue for any one of the parsing threads.
And the first analysis subunit is configured to analyze the pre-written log data to be analyzed, which is positioned before the pre-written log data of the specific type in the second queue, to a first data set if the pre-written log data of the specific type exists in the pre-written log data to be analyzed, wherein the pre-written log data of the specific type at least comprises log data for updating a database structure.
And a second parsing subunit configured to parse the pre-written log data of the specific type to a second data set.
And the third analysis subunit is configured to analyze the pre-written log data to be analyzed, which is positioned behind the pre-written log data of the specific type in the second queue, to a third data set.
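This three-way split could be sketched as follows. The record shape is an assumption, and the specific type is shown as a DDL per the description above.

```python
# Sketch: a parse thread splits its second-queue records around the
# specific-type (DDL) record into the first, second, and third data sets.
def split_around_specific(records, specific_type="ddl"):
    for i, rec in enumerate(records):
        if rec["type"] == specific_type:
            return records[:i], [rec], records[i + 1:]
    return records, [], []               # no specific-type record present

before, ddl, after = split_around_specific(
    [{"type": "dml"}, {"type": "ddl"}, {"type": "dml"}])
```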
Optionally, the parsing threads are distributed in a third queue according to the generation time of the pre-written log data to be parsed.
The application module 508 may include:
A traversing unit configured to traverse each of the resolution threads in the third queue based on an application coordination thread for coordinating the plurality of application threads according to the generation time;
The application unit is configured to apply the pre-written log data analyzed by any one of the analysis threads to the database slave node based on the plurality of application threads for any one of the traversed analysis threads, and obtain the database slave node after data synchronization.
Optionally, each data set generated by analyzing by any analyzing thread is distributed in a fourth queue according to the generating time of the to-be-applied pre-written log data.
The application unit may include:
A second traversal subunit configured to traverse each of the data sets in the fourth queue based on the application coordination thread according to the generation time.
And the application subunit is configured to apply each pre-written log data in any data set to the database slave node based on the plurality of application threads for any traversed data set, and obtain the database slave node after data synchronization.
Optionally, the application subunit is specifically configured to:
and judging whether any data set contains the pre-written log data of the specific type or not to obtain a judging result.
And if the judging result indicates that the specific type of pre-written log data is not contained in any data set, applying each first type of pre-written log data in any data set to a database slave node based on the application coordination thread, wherein the first type of pre-written log data comprises the pre-written log data recorded by a logic event for executing a transaction.
And applying each second type of pre-written log data in any data set to the database slave node based on the plurality of application threads to obtain the database slave node after data synchronization, wherein the second type of pre-written log data comprises the pre-written log data recorded by physical modification operation on a data page.
And if the judging result indicates that the data set contains the pre-written log data of the specific type, applying the pre-written log data of the specific type to the database slave node based on the application coordination thread to obtain the database slave node after data synchronization.
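The branching apply policy above can be sketched roughly as follows (the names and the thread-pool fan-out are assumptions):

```python
# Sketch: if the data set holds a specific-type (DDL) record, the
# coordination thread applies it alone; otherwise logical transaction
# records are applied by the coordinator in order, while physical page
# modifications are fanned out to the parallel application threads.
from concurrent.futures import ThreadPoolExecutor

def apply_data_set(records, apply_one, n_threads=4):
    ddls = [r for r in records if r["kind"] == "ddl"]
    if ddls:
        for rec in ddls:                     # coordinator applies DDL alone
            apply_one(rec)
        return
    for rec in records:
        if rec["kind"] == "logical":         # first type: transaction logic
            apply_one(rec)
    physical = [r for r in records if r["kind"] == "physical"]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(apply_one, physical))  # second type: page changes
```

Isolating the DDL in the coordinator keeps it from racing with page-level modifications, which matches the conflict-avoidance argument made for the queues earlier.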
According to the data synchronization device provided in this specification, the parallel, pipelined processing among the read threads, analysis threads, and application threads improves the rate of data synchronization for the database slave node, while each RedoLog is still applied to the database slave node in order of increasing LSN. This preserves the ordering of data synchronization, avoids data conflict events during the synchronization process, and makes the parallel pipeline among the read, analysis, and application threads an effective synchronization process.
The foregoing is a schematic scheme of a data synchronization apparatus of this embodiment. It should be noted that, the technical solution of the data synchronization device and the technical solution of the data synchronization method belong to the same concept, and details of the technical solution of the data synchronization device, which are not described in detail, can be referred to the description of the technical solution of the data synchronization method.
Fig. 6 illustrates a block diagram of a computing device 600 provided in accordance with one embodiment of the present description. The components of computing device 600 include, but are not limited to, memory 610 and processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to hold data.
Computing device 600 also includes an access device 640 that enables computing device 600 to communicate via one or more networks 660. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 640 may include one or more of any type of network interface, wired or wireless, such as a network interface card (NIC), an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, or a Near Field Communication (NFC) interface.
In one embodiment of the present description, the above-described components of computing device 600, as well as other components not shown in FIG. 6, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device shown in FIG. 6 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC, personal Computer). Computing device 600 may also be a mobile or stationary server.
Wherein the processor 620 is configured to execute computer programs/instructions that, when executed by the processor, perform the steps of the data synchronization method described above.
In this specification, each embodiment is described in a progressive manner; identical and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the computing device embodiment is described relatively simply because it is substantially similar to the data synchronization method embodiments; for relevant parts, refer to the description of the data synchronization method embodiments.
An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program/instruction that, when executed by a processor, implements the steps of the data synchronization method described above.
In this specification, each embodiment is described in a progressive manner; identical and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, the computer-readable storage medium embodiments are described relatively simply because they are substantially similar to the data synchronization method embodiments; for relevant parts, refer to the description of the data synchronization method embodiments.
An embodiment of the present specification also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the data synchronization method described above.
The foregoing is a schematic version of a computer program product of this embodiment. It should be noted that, the technical solution of the computer program product and the technical solution of the data synchronization method belong to the same concept, and details of the technical solution of the computer program product, which are not described in detail, can be referred to the description of the technical solution of the data synchronization method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code: a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the embodiments are not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the embodiments described in the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely intended to help clarify the specification. The alternative embodiments are not exhaustive and do not limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain their principles and practical application, thereby enabling others skilled in the art to understand and utilize the invention. The specification is limited only by the claims and their full scope and equivalents.
Claims (14)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411676002.2A CN119201874B (en) | 2024-11-21 | 2024-11-21 | Data synchronization method, device, equipment, medium and product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411676002.2A CN119201874B (en) | 2024-11-21 | 2024-11-21 | Data synchronization method, device, equipment, medium and product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN119201874A CN119201874A (en) | 2024-12-27 |
CN119201874B true CN119201874B (en) | 2025-03-14 |
Family
ID=94064408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411676002.2A Active CN119201874B (en) | 2024-11-21 | 2024-11-21 | Data synchronization method, device, equipment, medium and product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119201874B (en) |
Family Cites Families (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7031974B1 (en) * | 2002-08-01 | 2006-04-18 | Oracle International Corporation | Replicating DDL changes using streams |
US20050289186A1 (en) * | 2004-06-29 | 2005-12-29 | Microsoft Corporation | DDL replication without user intervention |
US8566326B2 (en) * | 2004-11-05 | 2013-10-22 | Oracle International Corporation | High-performance log-based processing |
KR101322401B1 (en) * | 2012-01-31 | 2013-10-28 | 주식회사 알티베이스 | Apparatus and method for parallel processing in database management system for synchronous replication |
US9830372B2 (en) * | 2013-07-24 | 2017-11-28 | Oracle International Corporation | Scalable coordination aware static partitioning for database replication |
IN2013CH06072A (en) * | 2013-12-24 | 2015-06-26 | Huawei Technologies India Pvt Ltd | |
CN104978313A (en) * | 2014-04-01 | 2015-10-14 | 中兴通讯股份有限公司 | Data synchronization method and apparatus for database system, and server |
KR101956236B1 (en) * | 2016-11-16 | 2019-03-11 | 주식회사 실크로드소프트 | Data replication technique in database management system |
US20180144015A1 (en) * | 2016-11-18 | 2018-05-24 | Microsoft Technology Licensing, Llc | Redoing transaction log records in parallel |
CN108874588A (en) * | 2018-06-08 | 2018-11-23 | 郑州云海信息技术有限公司 | A kind of database instance restoration methods and device |
CN112541074B (en) * | 2019-09-20 | 2025-02-14 | 中兴通讯股份有限公司 | Log parsing method, device, server and storage medium |
CN111061690B (en) * | 2019-11-22 | 2023-08-22 | 武汉达梦数据库股份有限公司 | RAC-based database log file reading method and device |
US11580110B2 (en) * | 2019-12-31 | 2023-02-14 | Huawei Cloud Computing Technologies Co., Ltd. | Methods and apparatuses for generating redo records for cloud-based database |
CN111858505B (en) * | 2020-06-04 | 2024-04-16 | 武汉达梦数据库股份有限公司 | Parallel execution method and data synchronization system based on log analysis synchronization |
CN112035410B (en) * | 2020-08-18 | 2023-08-18 | 腾讯科技(深圳)有限公司 | Log storage method, device, node device and storage medium |
CN112416654B (en) * | 2020-11-26 | 2024-04-09 | 上海达梦数据库有限公司 | Database log replay method, device, equipment and storage medium |
CN112181902B (en) * | 2020-11-30 | 2021-08-31 | 阿里云计算有限公司 | Database storage method and device and electronic equipment |
CN113760846B (en) * | 2021-01-08 | 2025-02-21 | 北京沃东天骏信息技术有限公司 | A data processing method and device |
CN112765251A (en) * | 2021-01-22 | 2021-05-07 | 苏州浪潮智能科技有限公司 | Method, system and medium for importing database |
CN113419824A (en) * | 2021-01-25 | 2021-09-21 | 阿里巴巴集团控股有限公司 | Data processing method, device, system and computer storage medium |
CN112905390B (en) * | 2021-03-31 | 2025-03-28 | 恒生电子股份有限公司 | Log data backup method, device, equipment and storage medium |
CN113590596A (en) * | 2021-07-02 | 2021-11-02 | 阿里巴巴新加坡控股有限公司 | Data processing method, system, device, computer program product and storage medium |
CN113626399B (en) * | 2021-08-17 | 2023-10-20 | 深圳市恒源昊信息科技有限公司 | Data synchronization method, device, server and storage medium |
US11822570B2 (en) * | 2021-11-03 | 2023-11-21 | International Business Machines Corporation | Database synchronization employing parallel poll threads |
CN115114370B (en) * | 2022-01-20 | 2023-06-13 | 腾讯科技(深圳)有限公司 | Master-slave database synchronization method and device, electronic equipment and storage medium |
KR20240010137A (en) * | 2022-07-15 | 2024-01-23 | 주식회사 아크데이타 | System and control method for integrated replication of cloud |
CN115422286A (en) * | 2022-08-19 | 2022-12-02 | 武汉达梦数据库股份有限公司 | Data synchronization method and device for distributed database |
US12086041B2 (en) * | 2022-10-10 | 2024-09-10 | Salesforce, Inc. | Early database transaction visibility |
CN115629901A (en) * | 2022-10-26 | 2023-01-20 | 北京奥星贝斯科技有限公司 | Log playback method and device, data recovery method and device and electronic equipment |
CN115904817A (en) * | 2022-12-30 | 2023-04-04 | 金篆信科有限责任公司 | Distributed database parallel playback method and device, electronic equipment and storage medium |
US20240289329A1 (en) * | 2023-02-24 | 2024-08-29 | Google Llc | Technique for Parallel Recovery on Read Replica |
WO2024177779A1 (en) * | 2023-02-24 | 2024-08-29 | Google Llc | Technique for parallel recovery on read replica |
CN116860874A (en) * | 2023-07-06 | 2023-10-10 | 金篆信科有限责任公司 | Method, device, equipment and storage medium for realizing rapid DDL synchronization of database standby machine |
CN117389696A (en) * | 2023-10-17 | 2024-01-12 | 浙江智臾科技有限公司 | Parallel recovery method and storage medium applied to OLTP memory database |
CN117591552A (en) * | 2023-10-19 | 2024-02-23 | 网易(杭州)网络有限公司 | Data processing method, medium, device and computing equipment |
CN118035255A (en) * | 2024-01-23 | 2024-05-14 | 天津大学 | Non-invasive log pushing method for storing and calculating separated database |
CN118445306A (en) * | 2024-05-24 | 2024-08-06 | 浪潮云信息技术股份公司 | Method, device, equipment and medium for offline analysis of pre-written log |
2024-11-21: application CN202411676002.2A granted as patent CN119201874B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN119201874A (en) | 2024-12-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||