
CN119201874B - Data synchronization method, device, equipment, medium and product - Google Patents


Info

Publication number
CN119201874B
CN119201874B (application CN202411676002.2A)
Authority
CN
China
Prior art keywords
log data
data
write
threads
ahead log
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN202411676002.2A
Other languages
Chinese (zh)
Other versions
CN119201874A (en)
Inventor
庄泽超
Current Assignee (listed assignees may be inaccurate)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd filed Critical Alibaba Cloud Computing Ltd
Priority: CN202411676002.2A
Publication of CN119201874A
Application granted
Publication of CN119201874B
Status: Active

Classifications

    • G — Physics
    • G06 — Computing; Calculating or Counting
    • G06F — Electric Digital Data Processing
    • G06F 16/00 — Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/10 — File systems; file servers
    • G06F 16/17 — Details of further file system functions
    • G06F 16/178 — Techniques for file synchronisation in file systems
    • G06F 16/16 — File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F 16/162 — Delete operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this specification provide a data synchronization method and apparatus. The method comprises: obtaining a set of write-ahead log data recorded during data processing on a database master node; reading each piece of write-ahead log data in the set based on a plurality of parallel read threads; parsing each piece of write-ahead log data based on a plurality of parallel parse threads; and, in order of the generation time of each piece of write-ahead log data, applying each piece to a database slave node based on a plurality of parallel apply threads, thereby obtaining a data-synchronized slave node. When the read, parse, and apply operations target different pieces of write-ahead log data, the read threads, parse threads, and apply threads run in parallel. By parallelizing and pipelining the read, parse, and apply operations, the scheme improves the efficiency of data synchronization.

Description

Data synchronization method, device, equipment, medium and product
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular to a data synchronization method. This specification also relates to a data synchronization apparatus, a computing device, a computer-readable storage medium, and a computer program product.
Background
Data synchronization between a database master node and a slave node can be realized on the basis of write-ahead log data; this synchronization mechanism is called physical replication. In existing physical replication mechanisms, the data read, parse, and apply operations involved in synchronization are processed serially on a single thread. Under serial processing, only one of the three operations can execute at any moment while the other two wait; the read, parse, and apply operations can never run simultaneously.
How to provide a high-performance physical replication mechanism that improves the efficiency of data synchronization therefore becomes a technical problem to be solved.
Disclosure of Invention
In view of this, the embodiments of this specification provide a data synchronization method. One or more embodiments of this specification also relate to a data synchronization apparatus, a computing device, a computer-readable storage medium, and a computer program product, so as to address the technical shortcomings of the prior art.
According to a first aspect of embodiments of the present disclosure, there is provided a data synchronization method, including:
acquiring a set of write-ahead log data recorded during data processing on a database master node;
reading each piece of write-ahead log data in the set based on a plurality of read threads to obtain the read write-ahead log data;
parsing the read write-ahead log data based on a plurality of parse threads to obtain the parsed write-ahead log data;
and applying, in order of the generation time of each piece of write-ahead log data in the set, the parsed write-ahead log data based on a plurality of apply threads, and synchronizing it to a database slave node, wherein the plurality of read threads, the plurality of parse threads, and the plurality of apply threads all run in parallel, and when the read, parse, and apply operations target different pieces of write-ahead log data, the read threads, parse threads, and apply threads run in parallel with one another.
According to a second aspect of embodiments of the present specification, there is provided a data synchronization apparatus comprising:
an acquisition module configured to acquire a set of write-ahead log data recorded during data processing on a database master node;
a read module configured to read each piece of write-ahead log data in the set based on a plurality of read threads to obtain the read write-ahead log data;
a parse module configured to parse the read write-ahead log data based on a plurality of parse threads to obtain the parsed write-ahead log data;
an apply module configured to apply, in order of the generation time of each piece of write-ahead log data in the set, the parsed write-ahead log data based on a plurality of apply threads and to synchronize it to a database slave node, wherein the plurality of read threads, the plurality of parse threads, and the plurality of apply threads all run in parallel, and when the read, parse, and apply operations target different pieces of write-ahead log data, the read threads, parse threads, and apply threads run in parallel with one another.
According to a third aspect of embodiments of the present specification, there is provided a computing device comprising:
A memory and a processor;
the memory is configured to store computer programs/instructions, and the processor is configured to execute them; when executed by the processor, the computer programs/instructions implement the steps of the data synchronization method described above.
According to a fourth aspect of embodiments of the present specification, there is provided a computer-readable storage medium storing computer programs/instructions which, when executed by a processor, implement the steps of the data synchronization method described above.
According to a fifth aspect of embodiments of the present specification, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the data synchronization method described above.
When data synchronization between the database master node and the database slave node is realized on the basis of write-ahead log data, each piece of write-ahead log data is synchronized to the slave node in order of its generation time, which guarantees the temporal ordering of synchronization. Because the read, parse, and apply operations involved in the synchronization process are carried out by different threads, the three operations can execute simultaneously. The read, parse, and apply operations can thus be parallelized and pipelined while the ordering of synchronization is preserved, which improves the efficiency of data synchronization to the database slave node.
Drawings
FIG. 1 is a flow chart of a data synchronization method provided in an embodiment of the present disclosure;
FIG. 2 is a flow chart of allocating buffer areas to parse threads according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of a parsing operation performed by a parse-thread queue according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart of data synchronization for a database slave node based on each RedoLog according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a data synchronization apparatus according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of a computing device provided in an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may, however, be embodied in many forms other than those described herein, and those skilled in the art may make similar generalizations without departing from its spirit; the description is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to be limiting. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms, which serve only to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments, "first" may also be referred to as "second," and vice versa. Depending on the context, the term "if" as used herein may be interpreted as "at the time of," "when," or "in response to determining."
Furthermore, it should be noted that the user information (including, but not limited to, user equipment information and user personal information) and data (including, but not limited to, data used for analysis, stored data, and presented data) involved in one or more embodiments of this specification are information and data authorized by the user or fully authorized by all parties; the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for users to choose to authorize or refuse.
First, terms related to one or more embodiments of the present specification will be explained.
Write-ahead log data: RedoLog, data used to record modifications to the physical pages of the database. It stores the specific change operations performed on data pages, is usually generated before a transaction commits, and is mainly used for fast recovery after a database failure.
Physical replication: the specific physical operations a database system performs in order to restore the state of the database. These operations target physical storage and are typically tied to the actual storage location and structure of the data.
Apply: performing modification operations on data pages according to RedoLog.
Data definition language (Data Definition Language, DDL): language used for creating, modifying, or deleting database objects.
Redo Buf: the buffer area, mainly referring to the buffer used to store write-ahead log data.
This specification provides a data synchronization method, and further relates to a data synchronization apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 102: acquire a set of write-ahead log data recorded during data processing on a database master node.
In this embodiment, when a user performs data processing on the database master node, the write-ahead log data (RedoLog) is recorded into a write-ahead log file; the database may be a relational database such as MySQL. RedoLog may be recorded in the write-ahead log file in a binary format, with entries ordered from earliest to latest by generation time. The data processing may include any of adding, deleting, or updating data on the master node, or it may include updates to the database structure, for example deleting a data table, creating a data table, adding or removing fields of a data table, or creating an index.
The set of write-ahead log data may be obtained from the shared storage of the database master node, for example by fetching it periodically; the set may contain all or part of the write-ahead log data in the shared storage. Specifically, if the shared storage automatically deletes write-ahead log data once slave-node synchronization based on it completes, the obtained set may be all write-ahead log data currently stored; if the shared storage does not delete such data automatically, the obtained set may be the write-ahead log data that has not yet been synchronized to the slave node.
Step 104: read each piece of write-ahead log data in the set based on a plurality of read threads to obtain the read write-ahead log data.
In this embodiment, the read threads may be threads on the database slave node. The number of read threads may be configured dynamically according to the data volume of the write-ahead log data set.
The earlier a RedoLog entry was generated, the earlier its storage location in the write-ahead log file may be. The read threads may be arranged in a queue, and each read thread in the queue may read RedoLog from the write-ahead log file in front-to-back order.
Optionally, any read thread may read RedoLog in batches of a preset data size (batch size); the data read may be a binary stream, and RedoLog may be read sequentially from front to back in storage order.
Any RedoLog entry carries a log sequence number (LogSequenceNumber, LSN), which represents its storage location within the write-ahead log file. The earlier a RedoLog entry was generated, the smaller its LSN may be; the generation-time order of RedoLog entries can therefore be expressed by their LSN order.
Alternatively, each read thread may read RedoLog into the memory of the database slave node as follows: for the first free buffer area (Redo Buf) in memory, the read threads cache the RedoLog they read into the first Redo Buf in parallel; once the first Redo Buf is full, they cache subsequent RedoLog into a second free Redo Buf in parallel.
After RedoLog has been cached to the first and second Redo Buf, the two buffers may be arranged in a queue ordered by the LSNs of the RedoLog they hold, from small to large: the LSNs in a Redo Buf nearer the front of the queue are smaller than those in a Redo Buf behind it. Within any single Redo Buf, the cached RedoLog entries may likewise be arranged in ascending LSN order.
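The buffer-filling scheme above can be sketched as follows. This is an illustrative model only: the function name `fill_buffers`, the dict-shaped records, and the fixed `buf_capacity` are assumptions for the sketch, not structures from the patent.

```python
def fill_buffers(records, buf_capacity):
    """Cache RedoLog entries into fixed-size Redo Buf areas in LSN order.

    When the current buffer fills, subsequent entries go to the next free
    buffer, so the resulting buffer queue is ordered by ascending LSN.
    """
    buffers, current = [], []
    for rec in sorted(records, key=lambda r: r["lsn"]):
        current.append(rec)
        if len(current) == buf_capacity:
            buffers.append(current)  # this Redo Buf is full; queue it
            current = []
    if current:
        buffers.append(current)      # partially filled tail buffer
    return buffers
```

The invariant to check is that every LSN in an earlier buffer is smaller than every LSN in a later one.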
Step 106: parse the read write-ahead log data based on a plurality of parse threads to obtain the parsed write-ahead log data.
The parse threads may be threads on the database slave node, and their number may be adjusted dynamically according to the amount of write-ahead log data to be parsed.
After a parse thread parses a RedoLog entry, the entry's type is known. The parsed type may be any of Parallel, Before, and Serial. The Parallel type denotes a physical modification of a data page (page), such as RedoLog recorded for adding, deleting, or updating data on a page. The Before type denotes records that must be applied before the physical modification of a page, such as RedoLog recorded for logical events like transaction start and commit, or changes of a transaction's global timestamp. The Serial type denotes records that force synchronization, such as RedoLog recorded for updates to the database structure.
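The three parsed types can be modeled with a small enum. The classifier below is a sketch under assumed record-kind names (`insert`, `trx_commit`, and so on); the actual record-type codes are internal to the storage engine and are not given in the text.

```python
from enum import Enum

class RedoLogType(Enum):
    PARALLEL = "parallel"  # physical page modification; may be applied in parallel
    BEFORE = "before"      # transaction/timestamp events; must be applied first
    SERIAL = "serial"      # e.g. database-structure updates; force synchronization

def classify(record_kind):
    # Hypothetical mapping from a record kind to the three parsed types.
    if record_kind in {"insert", "delete", "update"}:
        return RedoLogType.PARALLEL
    if record_kind in {"trx_begin", "trx_commit", "timestamp_change"}:
        return RedoLogType.BEFORE
    return RedoLogType.SERIAL  # DDL and other structure updates
```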
Step 108: apply, in order of the generation time of each piece of write-ahead log data in the set, the parsed write-ahead log data based on a plurality of apply threads, and synchronize it to the database slave node, wherein the plurality of read threads, the plurality of parse threads, and the plurality of apply threads all run in parallel, and when the read, parse, and apply operations target different pieces of write-ahead log data, the read threads, parse threads, and apply threads run in parallel with one another.
Optionally, applying the parsed write-ahead log data in order of generation time based on the plurality of apply threads may include: applying each parsed RedoLog entry on the apply threads in ascending order of LSN, since the ascending LSN order of the entries reflects the earliest-to-latest order of their generation times.
In this embodiment, the apply threads may have a correspondence with the data pages of the database, each apply thread being responsible for the apply operations of at least one data page.
Optionally, the apply process for each RedoLog entry may include: for each parsed entry, generating an apply-address list (recv_addr list) from the entry's specific information, traversing the recv_addr list, and applying the entry to the corresponding data pages, where the specific information indicates the locations at which the entry is to be applied.
Optionally, for any RedoLog entry, applying it to data pages may include: determining the corresponding data page from the entry's apply-address information, determining the responsible apply thread from the correspondence between apply threads and data pages, and applying the entry to the corresponding page on the determined thread.
Note that the recv_addr list of each RedoLog entry may be stored in a hash table on the apply-thread side, so that all RedoLog information relevant to a given page can be queried through the hash table.
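The per-page hash table can be sketched like this. The field names `space_id`, `page_no`, and `lsn` are assumptions for illustration; the text only states that the address lists are kept in a hash table so that all RedoLog touching a page can be found with one lookup.

```python
from collections import defaultdict

def build_page_index(parsed_records):
    """Group parsed RedoLog entries by target page so that all entries
    touching a given page can be fetched with one hash-table lookup."""
    index = defaultdict(list)
    for rec in parsed_records:
        index[(rec["space_id"], rec["page_no"])].append(rec["lsn"])
    for lsns in index.values():
        lsns.sort()  # keep per-page entries in LSN (generation-time) order
    return dict(index)
```

An apply thread responsible for a page can then replay that page's entries in ascending LSN order.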
In this embodiment, the plurality of read threads may comprise read thread 1, read thread 2, ..., read thread n; likewise, the plurality of parse threads may comprise parse thread 1 through parse thread n, and the plurality of apply threads may comprise apply thread 1 through apply thread n. That all three groups run in parallel means: read threads 1 through n run in parallel with one another, parse threads 1 through n run in parallel with one another, and apply threads 1 through n run in parallel with one another. That the read, parse, and apply threads run in parallel when operating on different pieces of write-ahead log data means: if at time t read thread a reads log data a, parse thread b parses log data b, and apply thread c applies log data c, then threads a, b, and c run in parallel with one another, where a is any of the read threads, b is any of the parse threads, and c is any of the apply threads.
It should be understood that in the methods of one or more embodiments of this specification, the order of some steps may be exchanged as needed, and some steps may also be omitted or removed.
In the method of fig. 1, when data synchronization between the database master node and slave node is realized by applying write-ahead log data to the slave node, the read, parse, and apply operations can execute in parallel on different threads, which eliminates the waiting that any one operation would otherwise impose on the others and improves the efficiency of synchronization. Because each RedoLog entry is applied to the slave node in order of generation time, from earliest to latest, the temporal ordering of the apply process is preserved even as efficiency improves, the risk of data conflicts during synchronization is avoided, and the timeliness of synchronization to the database slave node is improved.
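The read-parse-apply pipelining described above can be sketched with one Python thread per stage connected by queues; the patented scheme uses a pool of threads per stage, and the `(lsn, payload)` record shape and the uppercase "parse" step here are assumptions made purely for illustration.

```python
import queue
import threading

SENTINEL = object()  # end-of-stream marker

def start_stage(fn, in_q, out_q):
    """Run fn over every item from in_q, forwarding results to out_q."""
    def loop():
        while True:
            item = in_q.get()
            if item is SENTINEL:
                out_q.put(SENTINEL)
                return
            out_q.put(fn(item))
    t = threading.Thread(target=loop)
    t.start()
    return t

def replicate(wal_records):
    """Pipeline: read -> parse -> apply; stages overlap on different records."""
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        start_stage(lambda r: {"lsn": r[0], "raw": r[1]}, q1, q2),        # read
        start_stage(lambda r: dict(r, parsed=r["raw"].upper()), q2, q3),  # parse
    ]
    applied = []
    def apply_loop():
        while True:
            item = q3.get()
            if item is SENTINEL:
                return
            applied.append((item["lsn"], item["parsed"]))                 # apply
    t = threading.Thread(target=apply_loop)
    t.start()
    threads.append(t)
    for rec in sorted(wal_records):  # feed in ascending-LSN order
        q1.put(rec)
    q1.put(SENTINEL)
    for t in threads:
        t.join()
    return applied
```

Because each stage runs on its own thread, the reader can fetch record n+1 while the parser works on record n and the applier on record n-1, yet the FIFO queues preserve the LSN order of the apply step.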
Based on the method of fig. 1, the examples of the present specification also provide some specific implementations of the method, as described below.
To improve the reasonable utilization of resources, the number of read threads can be configured dynamically.
Optionally, reading each piece of write-ahead log data based on a plurality of read threads may include: determining the total data volume of the write-ahead log data set; determining, based on that total, the number of read threads to configure; and reading each piece of write-ahead log data with that number of read threads.
The total data volume may be either the number of write-ahead log entries or the storage space they occupy.
Optionally, the thread count may be determined from the total data volume by looking it up in a pre-established correspondence table between data volume and thread count, or by comparing the total against a preset threshold.
Dynamically adjusting the number of read threads according to the total data volume means that when the set is large, more read threads can be configured to speed up reading, and when the set is small, fewer read threads can be configured, realizing reasonable utilization of system resources.
In the embodiment of the present specification, a specific embodiment of determining the number of threads of the read thread based on the total amount of data is also proposed.
Optionally, determining the number of threads configured for the plurality of read threads based on the total data volume may include: obtaining a unit read volume that reflects the read performance of a single read thread; computing the quotient of the total data volume and the unit read volume; and determining the thread count from that quotient.
The unit read volume of a read thread may be either the amount of data the thread reads per read operation or the amount of data it reads per unit time.
Optionally, the unit read volume may be obtained from the thread's historical read data: either take the total volume read over a preset number of reads and divide by that number, or take the total volume read within a preset time period and divide by the length of the period.
Alternatively, the quotient of the total data volume and the unit read volume may be computed by rounding up to the next integer (the round-up method) or by discarding the fractional part (the round-down method).
The quotient may be used directly as the number of read threads, or treated as a reference parameter from which the thread count is adjusted within a reasonable range.
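A round-up variant of this quotient computation can be sketched as follows; the clamping bounds `lo` and `hi` stand in for the "reasonable adjustment" step and are assumed values, not figures from the text.

```python
import math

def read_thread_count(total_bytes, unit_read_bytes, lo=1, hi=16):
    """Derive a read-thread count from total workload / per-thread read volume."""
    if unit_read_bytes <= 0:
        raise ValueError("unit_read_bytes must be positive")
    n = math.ceil(total_bytes / unit_read_bytes)  # round-up quotient
    return max(lo, min(n, hi))                    # adjust within a reasonable range
</imports>```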
Because the read performance of a single read thread is taken into account when deriving the thread count from the total data volume of the write-ahead log data set, the accuracy of the thread-count determination is improved.
A specific embodiment of reading the write-ahead log data with the dynamically determined number of read threads is also provided.
Optionally, the reading operation is performed on each of the pre-written log data in the pre-written log data set based on the number of the threads to obtain each of the pre-written log data after reading, which may include sorting each of the pre-written log data based on a generation time of each of the pre-written log data to obtain a target sequence, determining, for each of the pre-written log data, a correspondence for reflecting that each of the pre-written log data is read by a target reading thread according to a modulo operation result between a sequence number corresponding to position information of each of the pre-written log data in the target sequence and the number of the threads, and performing the reading operation on each of the pre-written log data in the pre-written log data set based on the correspondence to obtain each of the pre-written log data after reading.
Optionally, sorting the pre-written log data based on their generation time to obtain the target sequence may include sorting the pre-written log data in ascending order of their log sequence numbers (LSNs) to obtain the target sequence.
In this embodiment of the present disclosure, for any pre-written log data in the pre-written log data set, its position information (sequence number) in the target sequence is determined, a modulo operation is performed between the sequence number and the number of read threads, the read thread located at the position indicated by the modulo result is looked up in the read thread queue as the target read thread, a correspondence between the pre-written log data and the target read thread is established, and the target read thread is instructed to perform the read operation on the pre-written log data, thereby obtaining the read pre-written log data.
Any of the above-mentioned pre-written log data may be any one of the pre-written log data sets, and a read thread responsible for reading the pre-written log data may be determined for each of the pre-written log data in the pre-written log data set in the above-mentioned manner.
In the embodiment of the present disclosure, each pre-written log data in the pre-written log data set is allocated to each read thread according to the above modulo manner, so that the read tasks of each read thread can be balanced, so as to improve the read performance of the read thread.
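The modulo-based assignment above can be sketched as follows; the return shape (a dict from thread index to LSN list) is an illustrative assumption:

```python
def assign_by_modulo(sorted_lsns, num_threads):
    """Map each record of an LSN-sorted sequence to a read thread by taking
    its sequence number modulo the thread count, balancing per-thread load."""
    assignment = {}
    for seq, lsn in enumerate(sorted_lsns):
        # thread index = sequence number mod number of read threads
        assignment.setdefault(seq % num_threads, []).append(lsn)
    return assignment
```

Because consecutive records go to different threads, each thread receives roughly `len(sorted_lsns) / num_threads` records.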
In the process of analyzing the pre-written log data by each analysis thread, in order to improve the analysis performance of the analysis thread, the pre-written log data can be distributed to each analysis thread in a mode of balancing the analysis task amount.
Optionally, analyzing the read pre-written log data based on the plurality of analysis threads to obtain the analyzed pre-written log data may include distributing the read pre-written log data to each analysis thread based on a preset rule, where the preset rule ensures that the pre-written log data obtained by each analysis thread is complete pre-written log data, and analyzing the read pre-written log data based on each analysis thread to obtain the analyzed pre-written log data.
Optionally, distributing the read pre-written log data to each analysis thread based on the preset rule may include: distributing each complete pre-written log data to the analysis threads according to the result of a modulo operation between the sequence number assigned to each pre-written log data and the number of analysis threads; or, based on the quotient between the number of pre-written log data and the number of analysis threads, sequentially distributing to each analysis thread a contiguous run of complete pre-written log data whose count equals the quotient, and, if pre-written log data remain after every analysis thread has received its run, randomly distributing the remaining pre-written log data among the analysis threads.
In this embodiment of the present disclosure, after the read pre-write log data is distributed to each analysis thread based on a preset rule, the pre-write log data acquired by each analysis thread may be complete pre-write log data.
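The second, quotient-based rule above might look like this in outline; the list-of-chunks return shape and the way the remainder is scattered are assumptions for illustration:

```python
import random

def distribute_contiguous(records, num_threads):
    """Give each analysis thread a contiguous run of quotient-many records,
    then scatter any leftover records randomly across the threads."""
    q = len(records) // num_threads                 # records per thread
    chunks = [records[i * q:(i + 1) * q] for i in range(num_threads)]
    for rec in records[num_threads * q:]:           # remainder, if any
        chunks[random.randrange(num_threads)].append(rec)
    return chunks
```

Keeping each thread's share contiguous preserves the LSN ordering within a thread, which simplifies the later application phase.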
In order to facilitate the timing of the application of each parsed pre-written log data, the present disclosure further proposes a specific embodiment for distributing each read pre-written log data to each parsing thread.
Optionally, each read pre-written log data is cached in a plurality of cache regions distributed in a first queue, and in the first queue, if a first cache region is located before a second cache region, the generation time of the pre-written log data in the first cache region is earlier than that of the pre-written log data in the second cache region. Distributing the read pre-written log data to each analysis thread based on the preset rule may include: for any cache region in the first queue, splitting the data in that cache region into a first cache data block and a second cache data block at a splitting site in that cache region, based on an analysis coordination thread for coordinating the analysis threads, where the first cache data block contains at least one complete pre-written log data, the second cache data block contains incomplete pre-written log data, and the splitting site is used to split out complete pre-written log data; allocating the first cache data block to a first analysis thread among the analysis threads based on the analysis coordination thread; for the cache region adjacent to and after that cache region, splitting its data into a third cache data block and a fourth cache data block at the splitting site in the adjacent cache region; and allocating the second cache data block together with the third cache data block to a second analysis thread among the analysis threads based on the analysis coordination thread.
In the embodiment of the present disclosure, the buffer may be a log buffer (Redo Buf) for buffering log data in the memory of the database slave node. The buffer regions are sorted in ascending order of the LSNs of their pre-written log data to obtain the first queue. In the first queue, the LSNs of the pre-written log data in buffer regions at the front of the queue are smaller than those in buffer regions at the rear, so the generation time of the pre-written log data in the front buffer regions is earlier than that in the rear buffer regions. Within each buffer region, the cached pre-written log data are likewise arranged in a queue in ascending LSN order.
Fig. 2 is a schematic flow chart of allocating each buffer to each parsing thread according to the embodiment of the present disclosure. Steps 202 through 216 are included as shown in fig. 2.
Step 202, selecting a first buffer area from the first queue based on the analysis coordination thread according to the sequence of LSN from small to large aiming at the first queue.
In the embodiment of the present disclosure, the parsing coordination thread (Parse_coordinator) may be a core component responsible for coordinating and scheduling parsing tasks, and a parsing thread (Parse_worker) may be a work unit that actually performs parsing operations. Parse_coordinator ensures that tasks are scheduled in order and coordinates the efficient collaboration of multiple Parse_workers to ensure the accuracy and efficiency of data parsing.
Step 204, judging whether a splitting site exists in the first cache region to obtain a first judgment result, where the splitting site is used to split out complete pre-written log data.
In the embodiment of the present disclosure, the splitting site may be an mtr (mini-transaction) boundary, which can be expressed as the boundary of a write-ahead log record (redo record). Since one pre-written log data can be understood as log data for one page, the mtr boundary can be understood as the boundary of an atomic change to page content. Splitting at mtr boundaries guarantees the integrity of the redo records for each page: within the range delimited by mtr boundaries, the redo records of all pages involved are complete. The data set cut out by an mtr boundary may be understood as a complete set of redo records describing physical modifications to a group of pages; the set may contain multiple complete redo records modifying multiple pages, or a single complete redo record modifying one page.
In this embodiment of the present disclosure, the pre-written log data cached in a buffer region may be a binary data stream. Because the storage space of a buffer region is limited, part of one binary pre-written log data may be stored in one buffer region and the rest in another, so a buffer region may hold incomplete pre-written log data. The splitting site may be the junction, in the binary data stored in a buffer region, between the last complete pre-written log data and the trailing incomplete pre-written log data; splitting at this site separates the incomplete tail from the preceding complete pre-written log data.
Step 206, if the first judgment result indicates that a splitting site exists in the first cache region, splitting the data in the first cache region into a first cache data block and a second cache data block based on the analysis coordination thread.
In this embodiment of the present disclosure, the first buffered data block includes at least one complete pre-written log data, and the second buffered data block includes incomplete pre-written log data.
Step 208, allocating the first cache data block to an idle first analysis thread based on the analysis coordination thread, and adding the first analysis thread to the analysis thread queue.
In this embodiment of the present disclosure, after an idle resolution thread is allocated with a resolution task, the resolution thread may be added to a resolution thread queue, where each resolution thread in the resolution thread queue is sequentially arranged from front to back according to the time sequence of addition.
Step 210, selecting a second buffer area from the first queue based on the parsing coordination thread according to the sequence of LSN from small to large for the first queue.
Step 212, if a splitting site exists in the second cache region, splitting the data in the second cache region into a third cache data block and a fourth cache data block based on the analysis coordination thread.
In this embodiment of the present disclosure, the third buffered data block includes at least one complete pre-written log data, and the fourth buffered data block includes incomplete pre-written log data.
Step 214, allocating the second cache data block and the third cache data block to an idle second analysis thread based on the analysis coordination thread, and adding the second analysis thread to the analysis thread queue.
In this embodiment of the present disclosure, after the data in the first buffer area is allocated, the first buffer area may be set as a free buffer area to buffer the subsequent pre-written log data.
Step 216, repeating the above steps until every buffer region in the first queue has been allocated or no idle analysis thread remains.
It should be noted that, if no pre-written log data is cached in the second cache region, this indicates that no data in the second cache region needs to be synchronized to the database slave node; in that case the second cache data block is allocated on its own to an idle second analysis thread, and the second analysis thread is added to the analysis thread queue.
Optionally, the allocating the read pre-written log data to each of the parsing threads based on a preset rule may further include allocating, if the splitting site is not included in any one of the cache regions and the splitting site is included in the adjacent cache region, any one of the cache regions and the third cache data block to a third parsing thread of the parsing threads based on the parsing coordination thread.
In the embodiment of the present disclosure, in fig. 2, if step 206 determines that the first buffer region has no splitting site while step 212 determines that the second buffer region does have one, the pre-written log data in the first buffer region and the data in the third cache data block are allocated together to the same analysis thread.
In this embodiment of the present disclosure, if the first buffer region has no splitting site, this reflects that the data stored in the first buffer region is incomplete pre-written log data, the remaining part of which may be stored in the second buffer region located after it. Therefore, the data of the first buffer region and the third cache data block split from the second buffer region are allocated together to one analysis thread, so that the data acquired by that analysis thread is complete pre-written log data.
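The buffer-splitting and tail-carrying scheme of steps 202 through 216 can be sketched under a strong simplifying assumption: here every record has a fixed length, so the splitting site is simply the last multiple of that length, whereas real redo records are variable-length and the mtr boundary must be found by inspecting the stream:

```python
def split_at_boundary(buf: bytes, record_len: int):
    """Split a byte stream at the last complete-record boundary (a stand-in
    for the mtr boundary): the head holds only complete records, the tail
    holds the trailing incomplete fragment."""
    cut = (len(buf) // record_len) * record_len
    return buf[:cut], buf[cut:]

def plan_parse_tasks(buffers, record_len: int):
    """Build per-thread parse tasks containing only complete records: a
    buffer's incomplete tail is carried forward and joined with the head
    of the next buffer, mirroring the second-block + third-block pairing."""
    tasks, carry = [], b""
    for buf in buffers:
        head, carry = split_at_boundary(carry + buf, record_len)
        if head:                      # a task of complete records only
            tasks.append(head)
    return tasks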
In the embodiment of the present disclosure, the binary data streams allocated to each analysis thread all consist of complete pre-written log data, so every pre-written log data parsed by each analysis thread is complete. When a subsequent application thread applies the data parsed by an analysis thread, complete pre-written log data can therefore be applied to the database slave node, improving the data integrity of data synchronization for the database slave node.
In the embodiment of the present disclosure, a specific embodiment of the parsing thread performing the parsing operation on each pre-written log data to be parsed is further provided.
Optionally, for any analysis thread, the pre-written log data to be parsed that are allocated to that thread are distributed in a second queue according to their generation time. Performing the analyzing operation on each read pre-written log data based on each analysis thread to obtain each analyzed pre-written log data may include: for any analysis thread, traversing each pre-written log data to be parsed in the second queue; if a specific type of pre-written log data exists among them, parsing the pre-written log data located before the specific type in the second queue into a first data set, where the specific type at least includes log data for updating the database structure; parsing the specific type of pre-written log data into a second data set; and parsing the pre-written log data located after the specific type in the second queue into a third data set.
Optionally, each of the pre-written log data to be parsed is distributed in a second queue according to the generating time, and may include that each of the pre-written log data to be parsed is distributed in the second queue according to the LSN of the pre-written log data and in the order from small to large. The earlier the generation time of the pre-written log data is, the smaller the LSN of the pre-written log data is.
In this embodiment of the present disclosure, each of the pre-write log data to be parsed allocated to any one of the parsing threads may be distributed in the second queue. The process of analyzing the to-be-analyzed pre-written log data by any analyzing thread in the second queue can comprise traversing the to-be-analyzed pre-written log data in sequence from front to back in the second queue, and dividing the to-be-analyzed log data into data sets (LogBatch) according to the type of the to-be-analyzed pre-written log data.
Optionally, dividing the log data to be parsed into data sets according to their type may include: if a specific type of log data exists among the log data to be parsed, parsing the log data located before the specific type into a first LogBatch, parsing the specific type of log data itself into a second LogBatch, and parsing the log data located after the specific type into a third LogBatch. The specific type of pre-written log data may at least include data definition language (Data Definition Language, DDL) statements, which may be used to create, modify, and delete database objects; the database objects may include meta-information such as databases and tables.
In the embodiment of the present disclosure, the first LogBatch, the second LogBatch, and the third LogBatch may be arranged in a queue in ascending order of the LSNs of their pre-written log data, so that the subsequent application thread can apply data according to the ordered queue of the first, second, and third LogBatches. This improves the time sequence in which the pre-written log data are applied and thus the effectiveness of data synchronization for the database slave node.
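The three-way LogBatch partition around a specific-type record can be sketched as follows; representing records as (LSN, kind) pairs and the kind label "ddl" are illustrative assumptions:

```python
def split_into_batches(records):
    """Partition an LSN-ordered record list into LogBatches: records before
    a specific-type (e.g. DDL) record, the specific-type record alone, and
    records after it, so that the DDL can later be applied in isolation."""
    batches, current = [], []
    for rec in records:
        if rec[1] == "ddl":
            if current:
                batches.append(current)  # close the batch before the DDL
            batches.append([rec])        # the DDL occupies its own batch
            current = []
        else:
            current.append(rec)
    if current:
        batches.append(current)
    return batches
```

Because the input is already LSN-ordered, concatenating the batches in order reproduces the original sequence, which is what the application phase relies on.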
In the embodiment of the present disclosure, a specific embodiment of performing an application operation on each to-be-applied pre-written log data parsed by the parsing thread is further provided.
Optionally, the analysis threads are distributed in a third queue according to the generation time of the pre-written log data they are to parse. Synchronizing each parsed pre-written log data to the database slave node based on the plurality of application threads, according to the generation time of each pre-written log data in the pre-written log data set, may include: traversing each analysis thread in the third queue according to the generation time, based on an application coordination thread for coordinating the plurality of application threads; and, for any traversed analysis thread, applying the pre-written log data parsed by that thread to the database slave node based on the plurality of application threads, to obtain the database slave node after data synchronization.
In the embodiment of the present disclosure, the application coordination thread (apply_coordinator) may be responsible for coordinating and scheduling application tasks, ensuring that all application threads (apply_worker) operate in the correct order and with the correct dependencies. The apply_coordinator typically manages the workflow, transactions, and coordination tasks during processing to ensure system consistency. An apply_worker may be responsible for executing specific data application or transactional operations; each apply_worker processes the tasks assigned by the apply_coordinator to complete the actual data modification operations.
In the embodiment of the present disclosure, after each analysis thread is assigned with an analysis task, the analysis threads may be ordered according to the order of the generation time from early to late according to the generation time of the pre-written log data responsible for analysis, so as to obtain the third queue. Specifically, according to the LSN of the pre-written log data, the third queue may be obtained by sequencing each analysis thread according to the order of the LSN from small to large.
Optionally, traversing each of the resolved threads in the third queue based on an application coordination thread for coordinating the plurality of application threads according to the generation time may include traversing each of the resolved threads in the third queue based on the application coordination thread in order of from small to large LSNs according to LSNs of the pre-written log data to be applied.
In the embodiment of the present disclosure, for each traversed analysis thread, each pre-written log data analyzed by the analysis thread is applied to a database slave node by using an application thread, so as to obtain a database slave node after data synchronization.
In the embodiment of the present disclosure, according to the LSN of the pre-written log data, the application thread may apply, according to the order of the LSN from small to large, each pre-written log data analyzed by the analysis thread, so as to improve the time sequence in which each pre-written log data is applied, so as to improve the effectiveness of data synchronization with respect to the slave node of the database.
In the embodiment of the present disclosure, a specific embodiment of performing an application operation on each to-be-applied pre-written log data parsed by one of the parsing threads is further provided.
The data sets generated by any analysis thread are distributed in a fourth queue according to the generation time of the pre-written log data to be applied. For any traversed analysis thread, applying the pre-written log data parsed by that thread to the database slave node based on the plurality of application threads to obtain the database slave node after data synchronization may include: traversing each data set in the fourth queue according to the generation time, based on the application coordination thread; and, for any traversed data set, applying each pre-written log data in that data set to the database slave node based on the plurality of application threads, to obtain the database slave node after data synchronization.
In this embodiment of the present disclosure, after the parsing thread parses each pre-written log data to be parsed, each data set may be generated according to the type of the pre-written log data. The fourth queue may be obtained by sorting the data sets according to the generation time of the pre-written log data included in the data sets in order of early and late generation times. Specifically, according to the LSN of the pre-written log data, the data sets may be ordered according to the order of the LSNs from small to large, so as to obtain the fourth queue.
Optionally, traversing each data set in the fourth queue based on the application coordination thread according to the generation time may include traversing each data set in the fourth queue in order of from small to large according to LSN of the pre-written log data to be applied.
In the embodiment of the present disclosure, for each traversed data set, each pre-written log data in the data set is applied to a database slave node by using an application thread, so as to obtain a database slave node after data synchronization.
In the embodiment of the present disclosure, an application thread may apply, according to LSNs of the pre-written log data, each pre-written log data included in the data set in order of from small to large LSNs, so as to improve the time sequence in which each pre-written log data is applied, so as to improve the effectiveness of data synchronization with respect to the slave node of the database.
In the embodiment of the present disclosure, a specific embodiment of performing an application operation on each to-be-applied pre-written log data in one of the data sets generated by the parsing thread is also provided.
Optionally, applying each pre-written log data in any data set to the database slave node based on the plurality of application threads to obtain the database slave node after data synchronization may include: determining whether the data set contains the specific type of pre-written log data, to obtain a determination result; if the determination result indicates that the data set does not contain the specific type, first applying each first-type pre-written log data to the database slave node based on the application coordination thread, where the first-type pre-written log data includes pre-written log data recorded by logical events of executing a transaction, and then applying each second-type pre-written log data to the database slave node based on the plurality of application threads to obtain the database slave node after data synchronization, where the second-type pre-written log data records physical modification operations performed on data pages; and if the determination result indicates that the data set contains the specific type of pre-written log data, applying that specific type of pre-written log data to the database slave node based on the application coordination thread.
Fig. 3 is a schematic flowchart of an application operation performed on the analysis thread queue according to an embodiment of the present disclosure. Steps 302 through 316 are included as shown in fig. 3.
Step 302, selecting a first analysis thread based on the application coordination thread according to the LSN of the pre-written log data and the order from small to large for a third queue formed by each analysis thread.
Step 304, selecting the first LogBatch from the fourth queue formed by the LogBatches of the first analysis thread, based on the application coordination thread, in ascending order of the LSNs of the pre-written log data.
Step 306, determining whether the pre-written log data of the specific type exists in the first LogBatch.
If the specific type of pre-written log data does not exist in the first LogBatch, the first type of pre-written log data in the first LogBatch is applied to the database slave node based on the application coordination thread in step 308.
In the embodiment of the present disclosure, the first type of pre-written log data may include pre-written log data recorded by a logical event of executing a transaction. For example, the logical events of a transaction may include the start and commit of the transaction, or a global timestamp change of the transaction.
In the embodiment of the present disclosure, the first type of pre-written log data may include logical log content, so, to avoid destroying the logical structure of the logical logs, all first-type pre-written log data need to be applied to the database slave node by a single thread. Within an application batch, the application coordination thread executes the application operation for each first-type pre-written log data and switches to subsequent operations only after all of them have been applied; this reduces the number of context switches of the application coordination thread and improves the application efficiency for the first-type pre-written log data.
Step 310, applying each second-type pre-written log data in the first LogBatch to the database slave node based on the plurality of application threads, to obtain the database slave node after data synchronization.
In the embodiment of the present specification, the second type of pre-written log data may include pre-written log data recorded by physical modification operations on data pages. The application operation for the second type may be performed after the first type has been fully applied. In this way, the logical log data are synchronized to the database slave node first, so that its running environment meets the preset logical requirements, and the physical log data are synchronized afterwards, which improves the fluency of data synchronization for the database slave node.
In the embodiment of the present disclosure, each of the plurality of application threads may have a correspondence with at least one data page in the database slave node. According to the information about the data table and data page carried in each pre-written log data, and according to the correspondence between application threads and data pages, each pre-written log data is distributed to its corresponding application thread, which applies it to the corresponding data page to complete the modification of the page's physical content, thereby obtaining the database slave node after data synchronization.
Optionally, establishing the correspondence between application threads and data pages may include: acquiring the data table ID information (space_id) and data page number information (page_no) of each data page; calculating a hash value for a target data page based on its space_id and page_no, where the hash value may serve as a sequence number for that page; calculating the result of a modulo operation between the hash value and the number of application threads; and, with the application threads arranged in a queue, looking up the application thread at the position corresponding to the modulo result and establishing the correspondence between that application thread and the target data page.
The target data page may be any one of the data pages in the database slave node, and for each data page in the database slave node, a correspondence relationship with the application thread may be established in the manner described above.
In the embodiment of the present disclosure, the application threads and the data pages have the above correspondence, so that each application thread can perform application operations on a plurality of data pages, and thus, the task amount of each application thread can be balanced, so as to improve the performance of application threads in performing application operations. The application threads can be operated in parallel, so that the efficiency of application operation of the application threads can be improved.
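The page-to-apply-thread correspondence can be sketched as below; the disclosure does not specify the hash function, so the multiplier here is a hypothetical stand-in:

```python
def apply_thread_for_page(space_id: int, page_no: int, num_apply_threads: int) -> int:
    """Map a data page to an apply thread: hash (space_id, page_no) and take
    the result modulo the thread count, so all modifications to one page are
    replayed by one thread (preserving order) while different pages proceed
    in parallel."""
    page_hash = space_id * 1_000_003 + page_no  # illustrative hash, not from the source
    return page_hash % num_apply_threads
```

The mapping is deterministic, which is the property the scheme relies on: two redo records touching the same page always land on the same apply thread.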
Step 312, if the specific type of pre-written log data exists in the first LogBatch, applying the specific type of pre-written log data to the database slave node based on the application coordination thread.
In the embodiment of the present disclosure, if a specific type of pre-written log data exists, that LogBatch may store only this one piece of pre-written log data. The specific type of pre-written log data may include pre-written log data recorded by operations requiring forced synchronization on the database, such as DDL.
In the embodiment of the present disclosure, when the application operation is performed on a DDL, the database structure of the database slave node needs to be updated. Before the DDL is applied, all pre-written log data generated before the DDL must already have been applied, and the application of pre-written log data generated after the DDL must be prohibited until the DDL application completes. The reason is as follows: if applying the DDL deletes an old data page from the database slave node, then applying earlier pre-written log data after the DDL has completed would, in the scenario where that earlier data is a physical modification to the deleted page, produce a conflict event in the data application during synchronization. Conversely, if applying the DDL creates a new data page, then applying later pre-written log data before the DDL has completed would, in the scenario where that later data is a physical modification to the newly created page, likewise produce a conflict event. Therefore, applying the pre-written log data in the order of the third queue and the fourth queue avoids such conflict events, improves the time sequence in which the pre-written log data are applied, and thus improves the effectiveness of data synchronization for the database slave node.
Step 314, looping through each LogBatch in the fourth queue according to the processing manner used for the first LogBatch.
Step 316, looping through each parsing thread in the third queue according to the processing manner used for the first parsing thread.
In this embodiment of the present disclosure, after all the pre-written log data that a parsing thread is responsible for parsing has been applied, the parsing thread may be marked as an idle parsing thread so that it can receive subsequent pre-written log data to be parsed.
Fig. 4 is a schematic flow chart of data synchronization for a database slave node based on each RedoLog according to an embodiment of the present disclosure. As shown in fig. 4, the flow includes steps 402 through 408.
Step 402, the reading thread reads each RedoLog into each Redo Buf, and the Redo Bufs are arranged in ascending order of the RedoLog LSNs.
Step 404, the RedoLogs of each Redo Buf are sequentially distributed to the parsing threads in ascending LSN order, and after being assigned the RedoLogs to be parsed, the parsing threads form a parsing thread queue in ascending LSN order.
Step 406, any parsing thread parses each RedoLog it is responsible for into LogBatches according to the type of the RedoLog to be parsed, and the LogBatches form a LogBatch queue in ascending LSN order.
Step 408, the application thread traverses each parsing thread in the parsing thread queue and each LogBatch in the LogBatch queue in ascending LSN order, and executes the application operation for each LogBatch to obtain the database slave node after data synchronization.
In the embodiment of the present disclosure, the parsing threads that have been assigned parsing tasks form the parsing thread queue described above, and the LogBatches generated by each parsing thread form the LogBatch queue described above. By traversing the parsing thread queue and the LogBatch queues during the application phase, each RedoLog can therefore be applied to the database slave node in ascending LSN order. As a result, when a DDL is parsed during the parsing phase, the RedoLogs that follow the DDL can be parsed without waiting for the DDL application to complete, which avoids the data synchronization delay that waiting for the DDL application operation would introduce in the parsing phase and improves the timeliness of data synchronization for the database slave node.
In the embodiment of the present disclosure, the parallelized and pipelined processing among the reading threads, parsing threads, and application threads increases the rate of data synchronization for the database slave node, while each RedoLog is still applied to the database slave node in ascending LSN order. This preserves the ordering of data synchronization, avoids data conflict events during the synchronization process, and makes the parallelized and pipelined processing an effective synchronization procedure, thereby improving the timeliness of data synchronization for the database slave node.
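The read, parse, and apply stages of Fig. 4 can be sketched as follows. This is a hypothetical single-process illustration (names such as `run_pipeline` and the fixed-length record payloads are assumptions, not from the patent): records are distributed to parser queues in ascending LSN order, parsed in parallel, and then drained in the same round-robin order, which restores the global ascending-LSN apply order.

```python
import queue
import threading

def run_pipeline(records, n_parsers=3):
    """Sketch of read -> parse -> apply with per-parser FIFO queues.

    records: (lsn, payload) pairs already sorted by LSN, standing in
    for the RedoLogs a read thread has placed into Redo Bufs. Because
    distribution is round-robin in LSN order, draining the parser
    outputs in the same round-robin order reproduces ascending LSN
    order even though parsing itself runs in parallel.
    """
    parser_queues = [queue.Queue() for _ in range(n_parsers)]

    # Read stage: distribute records to parsers in ascending LSN order.
    for i, rec in enumerate(records):
        parser_queues[i % n_parsers].put(rec)
    for q in parser_queues:
        q.put(None)  # end-of-stream marker

    parsed = [[] for _ in range(n_parsers)]

    def parse_worker(idx):
        while True:
            rec = parser_queues[idx].get()
            if rec is None:
                return
            lsn, payload = rec
            parsed[idx].append((lsn, payload.upper()))  # stand-in for parsing

    workers = [threading.Thread(target=parse_worker, args=(i,))
               for i in range(n_parsers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Apply stage: traverse the parsed batches round-robin, which
    # restores ascending LSN order across all parsers.
    applied, pos = [], 0
    while True:
        row = [p[pos] for p in parsed if pos < len(p)]
        if not row:
            break
        applied.extend(row)
        pos += 1
    return applied
```

With seven records and three parsers, the apply stage emits LSNs 0 through 6 in order, even though the three parse workers ran concurrently.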
Corresponding to the above method embodiments, the present disclosure further provides an embodiment of a data synchronization device, and fig. 5 shows a schematic structural diagram of a data synchronization device provided in one embodiment of the present disclosure. As shown in fig. 5, the apparatus includes:
An acquisition module 502 configured to acquire a set of pre-written log data recorded for data processing by a database master node.
A reading module 504, configured to perform a reading operation on each of the pre-written log data in the pre-written log data set based on a plurality of reading threads, so as to obtain each of the read pre-written log data.
The parsing module 506 is configured to parse the read pre-written log data based on a plurality of parsing threads, so as to obtain parsed pre-written log data.
The application module 508 is configured to perform an application operation on each of the parsed pre-written log data based on a plurality of application threads according to the generation time of each of the pre-written log data in the pre-written log data set, and synchronize each of the parsed pre-written log data to a database slave node, where the plurality of reading threads, the plurality of parsing threads, and the plurality of application threads are all threads running in parallel, and the reading threads, the parsing threads, and the application threads run in parallel under the condition of performing reading, parsing, and application operations on different pre-written log data.
Optionally, the reading module 504 may include:
A first determination unit configured to determine a total amount of data of each of the pre-write log data included in the pre-write log data set.
A second determination unit configured to determine the number of threads configured for the plurality of read threads based on the total amount of data.
A reading unit configured to read each piece of pre-written log data in the pre-written log data set based on the determined number of reading threads, so as to obtain each piece of read pre-written log data.
Optionally, the second determining unit may include:
An acquisition subunit configured to acquire a unit read data amount that reflects the read performance of one read thread.
A calculation subunit configured to calculate the quotient between the total amount of data and the unit read data amount.
A first determination subunit configured to determine, based on the quotient, a number of threads configured for the plurality of read threads.
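The thread-count derivation of the second determining unit can be sketched as follows. The rounding-up step and the `max_threads` cap are assumed policy details (the function name is also hypothetical); the patent only specifies that the count is derived from the quotient between the total data amount and the unit read data amount.

```python
import math

def reader_thread_count(total_bytes, unit_read_bytes, max_threads=32):
    """Derive the reader pool size from the quotient between the total
    write-ahead log volume and one thread's unit read capacity.
    Rounds up so all data is covered, keeps at least one thread, and
    caps the pool at an assumed configured maximum."""
    if total_bytes <= 0:
        return 1
    return max(1, min(max_threads, math.ceil(total_bytes / unit_read_bytes)))
```

For example, 10 KiB of log data with a 4 KiB unit read amount yields three reader threads.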
Optionally, the reading unit may include:
A sorting subunit configured to sort each piece of pre-written log data based on its generation time to obtain a target sequence.
A second determining subunit configured to determine, for any piece of pre-written log data, a correspondence reflecting that the piece of pre-written log data is read by a target read thread, according to the result of a modulo operation between the sequence number corresponding to the position of the piece of pre-written log data in the target sequence and the number of threads.
A reading subunit configured to read the piece of pre-written log data based on the correspondence to obtain the read pre-written log data.
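The modulo mapping used by the reading unit can be sketched as follows (the function name is hypothetical): a record's position in the generation-time-sorted target sequence, taken modulo the thread count, selects the reader thread responsible for it.

```python
def assign_reader(position, n_threads):
    """Map a record's position in the sorted target sequence to the
    reader thread responsible for it, via position mod thread count."""
    return position % n_threads

# With 3 reader threads, positions 0..7 map to threads 0,1,2,0,1,2,0,1:
# thread 0 reads positions 0, 3, 6; thread 1 reads 1, 4, 7; and so on.
```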
Optionally, the parsing module 506 may include:
A distribution unit configured to distribute the read pre-written log data to each analysis thread based on a preset rule, wherein the preset rule is used to reflect that the pre-written log data acquired by each analysis thread is complete pre-written log data.
An analysis unit configured to analyze the read pre-written log data based on the analysis threads to obtain the analyzed pre-written log data.
Optionally, each read pre-written log data is cached in a plurality of cache areas distributed in a first queue, and in the first queue, if the first cache area is located before the second cache area, the generation time of the pre-written log data included in the first cache area is earlier than the generation time of the pre-written log data included in the second cache area.
The distribution unit may include:
A first segmentation subunit configured to segment, for any buffer area in the first queue and based on an analysis coordination thread for coordinating the plurality of analysis threads, the data in the buffer area into a first cache data block and a second cache data block according to the segmentation sites in the buffer area, wherein the first cache data block includes at least one piece of complete pre-written log data, and a segmentation site is a position at which complete pre-written log data can be split out.
A first allocation subunit configured to allocate the first cache data block to a first analysis thread of the analysis threads based on the analysis coordination thread.
A second segmentation subunit configured to segment, based on the analysis coordination thread, the data in an adjacent buffer area that is located after and adjacent to the buffer area into a third cache data block and a fourth cache data block according to the segmentation sites in the adjacent buffer area, wherein the third cache data block includes at least one piece of complete pre-written log data.
A second allocation subunit configured to allocate the second cache data block and the third cache data block to a second analysis thread of the analysis threads based on the analysis coordination thread.
Optionally, the distribution unit may further include:
A third allocation subunit configured to allocate, if the buffer area does not include a segmentation site and the adjacent buffer area does, the buffer area and the third cache data block to a third analysis thread of the analysis threads based on the analysis coordination thread.
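The segmentation-and-allocation scheme above can be sketched as follows. For illustration the records are fixed-length, which is an assumption: real write-ahead log records are variable-length and the segmentation site would come from record headers. The function names are hypothetical.

```python
def split_at_boundary(buf, record_len):
    """Split a cache area at its last segmentation site, i.e. the last
    point that closes a complete record (fixed-length here for
    simplicity). Returns (complete-records block, trailing fragment)."""
    cut = (len(buf) // record_len) * record_len
    return buf[:cut], buf[cut:]

def distribute_buffers(buffers, record_len):
    """Mimic the analysis coordination thread: the head block of each
    buffer, together with any fragment carried over from the previous
    buffer, forms one analysis thread's chunk, and the trailing
    fragment is carried into the next buffer so every thread sees only
    complete records. A buffer with no segmentation site is merged
    entirely into the carried fragment (the third-analysis-thread
    case). Returns the per-thread chunks and any unfinished tail."""
    chunks, carry = [], b""
    for buf in buffers:
        head, carry = split_at_boundary(carry + buf, record_len)
        if head:
            chunks.append(head)
    return chunks, carry
```

With 4-byte records, the buffers `AAAABB`, `BBCCCC`, `DD` become the chunks `AAAA` and `BBBBCCCC`, with `DD` carried forward until more data arrives; no record is ever split across two analysis threads.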
Optionally, for any one of the parsing threads, each of the pre-written log data to be parsed allocated to the any one of the parsing threads is distributed in a second queue according to the generation time.
Optionally, the parsing unit may include:
A first traversing subunit configured to traverse, for any one of the parsing threads, each piece of pre-written log data to be parsed in the second queue.
A first parsing subunit configured to parse, if pre-written log data of a specific type exists in the pre-written log data to be parsed, the pre-written log data to be parsed that is located before the pre-written log data of the specific type in the second queue into a first data set, wherein the pre-written log data of the specific type includes at least log data for updating a database structure.
A second parsing subunit configured to parse the pre-written log data of the specific type into a second data set.
A third parsing subunit configured to parse the pre-written log data to be parsed that is located after the pre-written log data of the specific type in the second queue into a third data set.
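The three-way partition performed by these parsing subunits can be sketched as follows (the function name is hypothetical, and a single specific-type record per batch is assumed, consistent with the description): records before the structure-changing record go to the first data set, the record itself to the second, and records after it to the third, so the apply stage can treat it as an ordering barrier.

```python
def split_around_ddl(batch, is_ddl):
    """Partition one parsing thread's records into the first, second,
    and third data sets around a structure-changing (DDL-like) record.
    Assumes at most one specific-type record per batch."""
    first, specific, third = [], [], []
    for rec in batch:
        if is_ddl(rec):
            specific.append(rec)
        elif specific:
            third.append(rec)
        else:
            first.append(rec)
    return first, specific, third
```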
Optionally, the parsing threads are distributed in a third queue according to the generation time of the pre-written log data to be parsed.
The application module 508 may include:
A traversing unit configured to traverse each of the resolution threads in the third queue based on an application coordination thread for coordinating the plurality of application threads according to the generation time;
The application unit is configured to apply the pre-written log data analyzed by any one of the analysis threads to the database slave node based on the plurality of application threads for any one of the traversed analysis threads, and obtain the database slave node after data synchronization.
Optionally, each data set generated by any one of the analysis threads is distributed in a fourth queue according to the generation time of the pre-written log data to be applied.
The application unit may include:
A second traversal subunit configured to traverse each of the data sets in the fourth queue based on the application coordination thread according to the generation time.
An application subunit configured to apply, for any traversed data set, each piece of pre-written log data in the data set to the database slave node based on the plurality of application threads, so as to obtain the database slave node after data synchronization.
Optionally, the application subunit is specifically configured to:
Judge whether the data set contains pre-written log data of the specific type, to obtain a judgment result.
If the judgment result indicates that the data set does not contain pre-written log data of the specific type, apply each piece of pre-written log data of a first type in the data set to the database slave node based on the application coordination thread, wherein the pre-written log data of the first type includes pre-written log data recorded for logical events of executing transactions.
Apply each piece of pre-written log data of a second type in the data set to the database slave node based on the plurality of application threads to obtain the database slave node after data synchronization, wherein the pre-written log data of the second type includes pre-written log data recorded for physical modification operations on data pages.
If the judgment result indicates that the data set contains pre-written log data of the specific type, apply the pre-written log data of the specific type to the database slave node based on the application coordination thread to obtain the database slave node after data synchronization.
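The application subunit's dispatch logic can be sketched as follows (the function name, record layout, and callback parameters are hypothetical): a data set containing a specific-type (DDL) record is applied by the coordination thread alone, while in the ordinary case logical transaction-event records go through the coordination thread and physical page modifications fan out to the parallel application threads.

```python
def apply_data_set(data_set, apply_logical, apply_physical, apply_ddl):
    """Dispatch one data set. The three callbacks stand in for the
    application coordination thread (logical records and DDL) and the
    parallel application threads (physical page modifications)."""
    if any(rec["type"] == "ddl" for rec in data_set):
        for rec in data_set:
            apply_ddl(rec)        # coordinator applies the barrier itself
        return
    for rec in data_set:
        if rec["type"] == "logical":
            apply_logical(rec)    # first type: application coordination thread
        else:
            apply_physical(rec)   # second type: parallel application threads
```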
The data synchronization device provided in this specification leverages the parallelized and pipelined processing among the reading threads, analysis threads, and application threads to increase the rate of data synchronization for the database slave node, while still applying each RedoLog to the database slave node in ascending LSN order. This preserves the ordering of data synchronization for the database slave node, avoids data conflict events during the synchronization process, and makes the parallelized and pipelined processing an effective synchronization procedure.
The foregoing is a schematic scheme of a data synchronization apparatus of this embodiment. It should be noted that, the technical solution of the data synchronization device and the technical solution of the data synchronization method belong to the same concept, and details of the technical solution of the data synchronization device, which are not described in detail, can be referred to the description of the technical solution of the data synchronization method.
Fig. 6 illustrates a block diagram of a computing device 600 provided in accordance with one embodiment of the present description. The components of computing device 600 include, but are not limited to, memory 610 and processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to hold data.
Computing device 600 also includes an access device 640 that enables computing device 600 to communicate via one or more networks 660. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 640 may include one or more of any type of wired or wireless network interface, such as a Network Interface Card (NIC), an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, or a Near Field Communication (NFC) interface.
In one embodiment of the present description, the above-described components of computing device 600, as well as other components not shown in FIG. 6, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device shown in FIG. 6 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., a tablet, personal digital assistant, laptop, notebook, or netbook), a mobile phone (e.g., a smart phone), a wearable computing device (e.g., a smart watch or smart glasses), another type of mobile device, or a stationary computing device such as a desktop computer or Personal Computer (PC). Computing device 600 may also be a mobile or stationary server.
The processor 620 is configured to execute computer programs/instructions that, when executed by the processor, implement the steps of the data synchronization method described above.
In this specification, the embodiments are described in a progressive manner; identical or similar parts among the embodiments may be cross-referenced, and each embodiment focuses on its differences from the other embodiments. In particular, the computing device embodiment is described relatively briefly because it is substantially similar to the data synchronization method embodiment; for relevant details, refer to the description of the data synchronization method embodiment.
An embodiment of the present disclosure also provides a computer-readable storage medium storing a computer program/instruction that, when executed by a processor, implements the steps of the data synchronization method described above.
In this specification, the embodiments are described in a progressive manner; identical or similar parts among the embodiments may be cross-referenced, and each embodiment focuses on its differences from the other embodiments. In particular, the computer-readable storage medium embodiment is described relatively briefly because it is substantially similar to the data synchronization method embodiment; for relevant details, refer to the description of the data synchronization method embodiment.
An embodiment of the present specification also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the data synchronization method described above.
The foregoing is a schematic version of a computer program product of this embodiment. It should be noted that, the technical solution of the computer program product and the technical solution of the data synchronization method belong to the same concept, and details of the technical solution of the computer program product, which are not described in detail, can be referred to the description of the technical solution of the data synchronization method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of patent practice in a given jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals or telecommunications signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the embodiments are not limited by the order of the actions described, as some steps may be performed in another order or simultaneously according to the embodiments of the present disclosure. Furthermore, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the embodiments.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely intended to help clarify the present specification. The alternative embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the teachings of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and their practical application, thereby enabling others skilled in the art to best understand and utilize the invention. This specification is to be limited only by the claims and their full scope and equivalents.

Claims (14)

1. A data synchronization method, comprising: acquiring a set of write-ahead log data recorded for data processing performed on a database master node; performing a reading operation on each piece of write-ahead log data in the write-ahead log data set based on a plurality of reading threads to obtain each piece of read write-ahead log data; performing a parsing operation on each piece of read write-ahead log data based on a plurality of parsing threads to obtain each piece of parsed write-ahead log data, wherein the parsing operation is used to parse write-ahead log data of a specific type, used for updating a database structure, into a second data set, parse write-ahead log data located before the write-ahead log data of the specific type into a first data set, and parse write-ahead log data located after the write-ahead log data of the specific type into a third data set, and the data sets generated by any one of the parsing threads are distributed in a queue according to the generation time of the write-ahead log data; and performing an application operation on each piece of parsed write-ahead log data based on a plurality of application threads according to the generation time of each piece of write-ahead log data in the write-ahead log data set, and synchronizing each piece of parsed write-ahead log data to a database slave node, wherein the plurality of reading threads, the plurality of parsing threads, and the plurality of application threads are all threads running in parallel, and when reading, parsing, and application operations are performed on different pieces of write-ahead log data, the reading threads, the parsing threads, and the application threads run in parallel.

2. The method according to claim 1, wherein performing a reading operation on each piece of write-ahead log data in the write-ahead log data set based on a plurality of reading threads to obtain each piece of read write-ahead log data comprises: determining the total amount of data of the write-ahead log data included in the write-ahead log data set; determining, based on the total amount of data, the number of threads configured for the plurality of reading threads; and performing, based on reading threads of the determined number of threads, a reading operation on each piece of write-ahead log data in the write-ahead log data set to obtain each piece of read write-ahead log data.

3. The method according to claim 2, wherein determining, based on the total amount of data, the number of threads configured for the plurality of reading threads comprises: acquiring a unit read data amount reflecting the read performance of one reading thread; calculating the quotient between the total amount of data and the unit read data amount; and determining, based on the quotient, the number of threads configured for the plurality of reading threads.

4. The method according to claim 2, wherein performing, based on reading threads of the determined number of threads, a reading operation on each piece of write-ahead log data in the write-ahead log data set to obtain each piece of read write-ahead log data comprises: sorting the write-ahead log data based on the generation time of each piece of write-ahead log data to obtain a target sequence; for any piece of write-ahead log data, determining, according to the result of a modulo operation between the sequence number corresponding to the position of the piece of write-ahead log data in the target sequence and the number of threads, a correspondence reflecting that the piece of write-ahead log data is read by a target reading thread; and performing a reading operation on the piece of write-ahead log data based on the correspondence to obtain the read write-ahead log data.

5. The method according to claim 1, wherein performing a parsing operation on each piece of read write-ahead log data based on a plurality of parsing threads to obtain each piece of parsed write-ahead log data comprises: distributing each piece of read write-ahead log data to the parsing threads based on a preset rule, wherein the preset rule is used to reflect that the write-ahead log data acquired by each parsing thread is complete write-ahead log data; and performing a parsing operation on each piece of read write-ahead log data based on the parsing threads to obtain each piece of parsed write-ahead log data.

6. The method according to claim 5, wherein each piece of read write-ahead log data is cached in a plurality of cache areas distributed in a first queue, and in the first queue, if a first cache area is located before a second cache area, the generation time of the write-ahead log data included in the first cache area is earlier than that of the write-ahead log data included in the second cache area; and distributing each piece of read write-ahead log data to the parsing threads based on the preset rule comprises: for any cache area in the first queue, splitting, according to a split site in the cache area and based on a parsing coordination thread for coordinating the plurality of parsing threads, the data in the cache area into a first cache data block and a second cache data block, wherein the first cache data block includes at least one piece of complete write-ahead log data, and the split site is used to split out complete write-ahead log data; allocating the first cache data block to a first parsing thread among the parsing threads based on the parsing coordination thread; for an adjacent cache area located after and adjacent to the cache area, splitting, according to the split site in the adjacent cache area and based on the parsing coordination thread, the data in the adjacent cache area into a third cache data block and a fourth cache data block, wherein the third cache data block includes at least one piece of complete write-ahead log data; and allocating the second cache data block and the third cache data block to a second parsing thread among the parsing threads based on the parsing coordination thread.

7. The method according to claim 6, wherein distributing each piece of read write-ahead log data to the parsing threads based on the preset rule further comprises: if the cache area does not contain a split site and the adjacent cache area contains a split site, allocating the cache area and the third cache data block to a third parsing thread among the parsing threads based on the parsing coordination thread.

8. The method according to claim 5, wherein the parsing threads are distributed in a third queue according to the generation time of the write-ahead log data to be parsed; and performing an application operation on each piece of parsed write-ahead log data based on a plurality of application threads according to the generation time of each piece of write-ahead log data in the write-ahead log data set, and synchronizing each piece of parsed write-ahead log data to a database slave node comprises: traversing, according to the generation time, each parsing thread in the third queue based on an application coordination thread for coordinating the plurality of application threads; and for any traversed parsing thread, applying the write-ahead log data parsed by the parsing thread to the database slave node based on the plurality of application threads to obtain a data-synchronized database slave node.

9. The method according to claim 8, wherein the data sets generated by any parsing thread are distributed in a fourth queue according to the generation time of the write-ahead log data to be applied; and for any traversed parsing thread, applying the write-ahead log data parsed by the parsing thread to the database slave node based on the plurality of application threads to obtain a data-synchronized database slave node comprises: traversing, according to the generation time, each data set in the fourth queue based on the application coordination thread; and for any traversed data set, applying each piece of write-ahead log data in the data set to the database slave node based on the plurality of application threads to obtain a data-synchronized database slave node.

10. The method according to claim 9, wherein for any traversed data set, applying each piece of write-ahead log data in the data set to the database slave node based on the plurality of application threads to obtain a data-synchronized database slave node comprises: judging whether the data set contains write-ahead log data of the specific type to obtain a judgment result; if the judgment result indicates that the data set does not contain write-ahead log data of the specific type, applying each piece of write-ahead log data of a first type in the data set to the database slave node based on the application coordination thread, wherein the write-ahead log data of the first type includes write-ahead log data recorded for logical events of executing transactions, and applying each piece of write-ahead log data of a second type in the data set to the database slave node based on the plurality of application threads to obtain a data-synchronized database slave node, wherein the write-ahead log data of the second type includes write-ahead log data recorded for physical modification operations on data pages; and if the judgment result indicates that the data set contains write-ahead log data of the specific type, applying the write-ahead log data of the specific type to the database slave node based on the application coordination thread to obtain a data-synchronized database slave node.

11. A data synchronization device, comprising: an acquisition module configured to acquire a set of write-ahead log data recorded for data processing performed on a database master node; a reading module configured to perform a reading operation on each piece of write-ahead log data in the write-ahead log data set based on a plurality of reading threads to obtain each piece of read write-ahead log data; a parsing module configured to perform a parsing operation on each piece of read write-ahead log data based on a plurality of parsing threads to obtain each piece of parsed write-ahead log data, wherein the parsing operation is used to parse write-ahead log data of a specific type, used for updating a database structure, into a second data set, parse write-ahead log data located before the write-ahead log data of the specific type into a first data set, and parse write-ahead log data located after the write-ahead log data of the specific type into a third data set, and the data sets generated by any one of the parsing threads are distributed in a queue according to the generation time of the write-ahead log data; and an application module configured to perform an application operation on each piece of parsed write-ahead log data based on a plurality of application threads according to the generation time of each piece of write-ahead log data in the write-ahead log data set, and synchronize each piece of parsed write-ahead log data to a database slave node, wherein the plurality of reading threads, the plurality of parsing threads, and the plurality of application threads are all threads running in parallel, and when reading, parsing, and application operations are performed on different pieces of write-ahead log data, the reading threads, the parsing threads, and the application threads run in parallel.
application threads are all threads running in parallel, and when reading, parsing and applying operations are performed on different write-ahead log data, the reading thread, the parsing thread and the application thread run in parallel. 12.一种计算设备,包括:12. A computing device comprising: 存储器和处理器;Memory and processor; 所述存储器用于存储计算机程序/指令,所述处理器用于执行所述计算机程序/指令,该计算机程序/指令被处理器执行时实现权利要求1至10任意一项所述数据同步方法的步骤。The memory is used to store computer programs/instructions, and the processor is used to execute the computer programs/instructions. When the computer programs/instructions are executed by the processor, the steps of the data synchronization method according to any one of claims 1 to 10 are implemented. 13.一种计算机可读存储介质,其存储有计算机程序/指令,该计算机程序/指令被处理器执行时实现权利要求1至10任意一项所述数据同步方法的步骤。13. A computer-readable storage medium storing a computer program/instruction, wherein the computer program/instruction, when executed by a processor, implements the steps of the data synchronization method according to any one of claims 1 to 10. 14.一种计算机程序产品,包括计算机程序/指令,该计算机程序/指令被处理器执行时实现权利要求1至10任意一项所述数据同步方法的步骤。14. A computer program product, comprising a computer program/instruction, which, when executed by a processor, implements the steps of the data synchronization method according to any one of claims 1 to 10.
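The claims above describe a pipeline in which write-ahead log (WAL) records are partitioned around a structure-updating ("specific type", i.e. DDL-like) record into first/second/third data sets, and then applied in generation-time order: logical transaction records and the DDL record serially by a coordination thread, physical page-modification records in parallel by worker threads. As a rough illustrative sketch only (not the patented implementation — all names such as `WalRecord`, `split_around_ddl`, and `apply_data_set` are invented here, and `lsn` stands in for the generation time), the barrier behavior could look like:

```python
# Illustrative sketch of the DDL-barrier apply order described in the claims.
# Records before the DDL record, the DDL record itself, and records after it
# form three data sets that are applied strictly in that order.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class WalRecord:
    lsn: int           # generation order (stand-in for generation time)
    kind: str          # "ddl" | "logical" | "physical" (hypothetical labels)
    payload: str = ""


def split_around_ddl(records):
    """Partition a batch around the first DDL record: the records before it
    (first data set), the DDL record itself (second), and those after (third)."""
    for i, r in enumerate(records):
        if r.kind == "ddl":
            return records[:i], [r], records[i + 1:]
    return records, [], []


def apply_data_set(data_set, replica, pool):
    """Apply one data set to the replica: DDL and logical transaction records
    serially (coordinator role), physical page records in parallel (workers)."""
    serial = [r for r in data_set if r.kind in ("ddl", "logical")]
    parallel = [r for r in data_set if r.kind == "physical"]
    for r in serial:                                  # coordinator, in order
        replica.append(r.lsn)
    if parallel:                                      # parallel page applies
        list(pool.map(lambda r: replica.append(r.lsn), parallel))


records = [WalRecord(1, "physical"), WalRecord(2, "logical"),
           WalRecord(3, "ddl"), WalRecord(4, "physical")]
first, second, third = split_around_ddl(records)

replica = []
with ThreadPoolExecutor(max_workers=4) as pool:
    # Data sets are consumed queue-wise, preserving generation-time order
    # across the DDL barrier even though page applies within a set may race.
    for ds in (first, second, third):
        apply_data_set(ds, replica, pool)
```

The point of the three-way split is that the DDL record acts as a serialization barrier: everything generated before it is fully applied first, the structural change is applied alone, and only then does parallel apply resume for later records. The cache-area allocation, coordination threads, and queue bookkeeping of the actual claims are omitted from this sketch.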
CN202411676002.2A 2024-11-21 2024-11-21 Data synchronization method, device, equipment, medium and product Active CN119201874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411676002.2A CN119201874B (en) 2024-11-21 2024-11-21 Data synchronization method, device, equipment, medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411676002.2A CN119201874B (en) 2024-11-21 2024-11-21 Data synchronization method, device, equipment, medium and product

Publications (2)

Publication Number Publication Date
CN119201874A CN119201874A (en) 2024-12-27
CN119201874B true CN119201874B (en) 2025-03-14

Family

ID=94064408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411676002.2A Active CN119201874B (en) 2024-11-21 2024-11-21 Data synchronization method, device, equipment, medium and product

Country Status (1)

Country Link
CN (1) CN119201874B (en)

Family Cites Families (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7031974B1 (en) * 2002-08-01 2006-04-18 Oracle International Corporation Replicating DDL changes using streams
US20050289186A1 (en) * 2004-06-29 2005-12-29 Microsoft Corporation DDL replication without user intervention
US8566326B2 (en) * 2004-11-05 2013-10-22 Oracle International Corporation High-performance log-based processing
KR101322401B1 (en) * 2012-01-31 2013-10-28 주식회사 알티베이스 Apparatus and method for parallel processing in database management system for synchronous replication
US9830372B2 (en) * 2013-07-24 2017-11-28 Oracle International Corporation Scalable coordination aware static partitioning for database replication
IN2013CH06072A (en) * 2013-12-24 2015-06-26 Huawei Technologies India Pvt Ltd
CN104978313A (en) * 2014-04-01 2015-10-14 中兴通讯股份有限公司 Data synchronization method and apparatus for database system, and server
KR101956236B1 (en) * 2016-11-16 2019-03-11 주식회사 실크로드소프트 Data replication technique in database management system
US20180144015A1 (en) * 2016-11-18 2018-05-24 Microsoft Technology Licensing, Llc Redoing transaction log records in parallel
CN108874588A (en) * 2018-06-08 2018-11-23 郑州云海信息技术有限公司 A kind of database instance restoration methods and device
CN112541074B (en) * 2019-09-20 2025-02-14 中兴通讯股份有限公司 Log parsing method, device, server and storage medium
CN111061690B (en) * 2019-11-22 2023-08-22 武汉达梦数据库股份有限公司 RAC-based database log file reading method and device
US11580110B2 (en) * 2019-12-31 2023-02-14 Huawei Cloud Computing Technologies Co., Ltd. Methods and apparatuses for generating redo records for cloud-based database
CN111858505B (en) * 2020-06-04 2024-04-16 武汉达梦数据库股份有限公司 Parallel execution method and data synchronization system based on log analysis synchronization
CN112035410B (en) * 2020-08-18 2023-08-18 腾讯科技(深圳)有限公司 Log storage method, device, node device and storage medium
CN112416654B (en) * 2020-11-26 2024-04-09 上海达梦数据库有限公司 Database log replay method, device, equipment and storage medium
CN112181902B (en) * 2020-11-30 2021-08-31 阿里云计算有限公司 Database storage method and device and electronic equipment
CN113760846B (en) * 2021-01-08 2025-02-21 北京沃东天骏信息技术有限公司 A data processing method and device
CN112765251A (en) * 2021-01-22 2021-05-07 苏州浪潮智能科技有限公司 Method, system and medium for importing database
CN113419824A (en) * 2021-01-25 2021-09-21 阿里巴巴集团控股有限公司 Data processing method, device, system and computer storage medium
CN112905390B (en) * 2021-03-31 2025-03-28 恒生电子股份有限公司 Log data backup method, device, equipment and storage medium
CN113590596A (en) * 2021-07-02 2021-11-02 阿里巴巴新加坡控股有限公司 Data processing method, system, device, computer program product and storage medium
CN113626399B (en) * 2021-08-17 2023-10-20 深圳市恒源昊信息科技有限公司 Data synchronization method, device, server and storage medium
US11822570B2 (en) * 2021-11-03 2023-11-21 International Business Machines Corporation Database synchronization employing parallel poll threads
CN115114370B (en) * 2022-01-20 2023-06-13 腾讯科技(深圳)有限公司 Master-slave database synchronization method and device, electronic equipment and storage medium
KR20240010137A (en) * 2022-07-15 2024-01-23 주식회사 아크데이타 System and control method for integrated replication of cloud
CN115422286A (en) * 2022-08-19 2022-12-02 武汉达梦数据库股份有限公司 Data synchronization method and device for distributed database
US12086041B2 (en) * 2022-10-10 2024-09-10 Salesforce, Inc. Early database transaction visibility
CN115629901A (en) * 2022-10-26 2023-01-20 北京奥星贝斯科技有限公司 Log playback method and device, data recovery method and device and electronic equipment
CN115904817A (en) * 2022-12-30 2023-04-04 金篆信科有限责任公司 Distributed database parallel playback method and device, electronic equipment and storage medium
US20240289329A1 (en) * 2023-02-24 2024-08-29 Google Llc Technique for Parallel Recovery on Read Replica
WO2024177779A1 (en) * 2023-02-24 2024-08-29 Google Llc Technique for parallel recovery on read replica
CN116860874A (en) * 2023-07-06 2023-10-10 金篆信科有限责任公司 Method, device, equipment and storage medium for realizing rapid DDL synchronization of database standby machine
CN117389696A (en) * 2023-10-17 2024-01-12 浙江智臾科技有限公司 Parallel recovery method and storage medium applied to OLTP memory database
CN117591552A (en) * 2023-10-19 2024-02-23 网易(杭州)网络有限公司 Data processing method, medium, device and computing equipment
CN118035255A (en) * 2024-01-23 2024-05-14 天津大学 Non-invasive log pushing method for storing and calculating separated database
CN118445306A (en) * 2024-05-24 2024-08-06 浪潮云信息技术股份公司 Method, device, equipment and medium for offline analysis of pre-written log

Also Published As

Publication number Publication date
CN119201874A (en) 2024-12-27

Similar Documents

Publication Publication Date Title
JP3779263B2 (en) Conflict resolution for collaborative work systems
CN111797121B (en) Strong consistency query method, device and system of read-write separation architecture service system
CN109643310B (en) System and method for redistribution of data in a database
CN105468473B (en) Data migration method and data migration device
Xue et al. Seraph: an efficient, low-cost system for concurrent graph processing
CN113297320B (en) Distributed database system and data processing method
US20120239722A1 (en) Read-only operations processing in a paxos replication system
US20230418811A1 (en) Transaction processing method and apparatus, computing device, and storage medium
CN109726250A (en) Data-storage system, metadatabase synchronization and data cross-domain calculation method
CN111813760A (en) Data migration method and device
CN105468720A (en) Method for integrating distributed data processing systems, corresponding systems and data processing method
US12050603B2 (en) Opportunistic cloud data platform pipeline scheduler
Jiang et al. Alibaba hologres: A cloud-native service for hybrid serving/analytical processing
CN111177244A (en) Data association analysis method for multiple heterogeneous databases
CN113672556A (en) Method and device for migrating batch files
JP6204753B2 (en) Distributed query processing apparatus, processing method, and processing program
Qi et al. Schain: Scalable concurrency over flexible permissioned blockchain
CN116303346A (en) Database migration method and system
CN111125248A (en) Big data storage analysis query system
US20210064602A1 (en) Change service for shared database object
CN113297159B (en) Data storage method and device
CN119201874B (en) Data synchronization method, device, equipment, medium and product
CN113297326B (en) Data processing method and device, computer readable storage medium, and processor
CN113297231B (en) Database processing method and device
CN118093647A (en) Distributed database query system, method, equipment and medium supporting multi-copy consistency reading

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant