
CN109710388B - Data reading method and device, electronic equipment and storage medium - Google Patents


Info

Publication number: CN109710388B
Authority: CN (China)
Prior art keywords: transaction, global, node, node device, tuple
Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201910021182.3A
Other languages: Chinese (zh)
Other versions: CN109710388A
Inventors: 李海翔 (Li Haixiang), 卢卫 (Lu Wei), 杜小勇 (Du Xiaoyong)
Current Assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910021182.3A
Publication of CN109710388A
Application granted
Publication of CN109710388B
Current legal status: Active

Landscapes: Information Retrieval, DB Structures and FS Structures Therefor (AREA)

Abstract

The invention discloses a data reading method and device, electronic equipment and a storage medium, and belongs to the technical field of databases and big data. At the moment the current global read transaction occurs, the method finds a transaction-consistent common point, based on MVCC, across the multiple node devices involved. The read can therefore be regarded as an instantaneous read that returns only data committed before that moment. Global write transactions that could cause transaction inconsistency are excluded from the read, so the data read is transaction-consistent, the database system achieves external data consistency, and data reading is accurate.

Description

Data reading method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of database and big data technologies, and in particular, to a data reading method and apparatus, an electronic device, and a storage medium.
Background
Many existing distributed database systems support write operations across nodes. In other words, a single write operation may involve writing to several node devices in the distributed database system, and this can cause a transaction consistency problem when data is read. For example, suppose a cross-node write operation involves two node devices and has passed the prepare-commit phase, so the transaction can be committed. The first node device has finished committing, but the second node device has not yet committed. If the distributed database system now performs a new global read operation, it reads the data committed on the first node device. It does not read the corresponding data on the second node device, because that device has not finished committing. This anomaly is called Distributed Read Committed-Committing (DRCC). As a result, current data reading cannot guarantee that the data read is in a transaction-consistent state.
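The DRCC anomaly can be reproduced with a toy model. The sketch below is illustrative only. The `Node` class, the account names, and the balances are hypothetical. Each node keeps committed and prepared values, and a naive global read merges whatever each node has committed, with no common consistency point.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    committed: Dict[str, int] = field(default_factory=dict)  # values visible to readers
    prepared: Dict[str, int] = field(default_factory=dict)   # prepared, not yet committed

    def commit_prepared(self) -> None:
        """Finish the commit phase on this node only."""
        self.committed.update(self.prepared)
        self.prepared.clear()


def naive_global_read(nodes: Dict[str, Node]) -> Dict[str, int]:
    """Read committed data from each node independently (no consistency point)."""
    result: Dict[str, int] = {}
    for node in nodes.values():
        result.update(node.committed)
    return result


# Hypothetical transfer of 10 from account 100 (node N1) to account 900 (node N2).
nodes = {"N1": Node(committed={"acct100": 50}), "N2": Node(committed={"acct900": 50})}
nodes["N1"].prepared["acct100"] = 40  # both nodes pass the prepare phase
nodes["N2"].prepared["acct900"] = 60
nodes["N1"].commit_prepared()         # N1 has committed, N2 has not yet

snapshot = naive_global_read(nodes)
print(snapshot, sum(snapshot.values()))  # total is 90, not 100
```

The read observes half of the transfer, so 10 units appear to vanish. Once N2 also commits, the total returns to 100.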
Disclosure of Invention
The invention provides a data reading method, a data reading device, electronic equipment and a storage medium, which can solve the problem of transaction inconsistency in the data that is read. The technical solution is as follows:
In one aspect, a data reading method is provided, and the method includes:
sending an indication message to a plurality of node devices corresponding to a global read transaction, wherein the indication message instructs the node devices to return their related active transaction lists for the global read transaction;
receiving the related active transaction lists of the plurality of node devices, wherein the related active transaction list of each node device represents the related global write transactions in an active state on that node device, and each related global write transaction corresponds to at least two of the plurality of node devices;
determining a target global write transaction group according to the related active transaction lists of the plurality of node devices, wherein target global write transactions included in the target global write transaction group are in an active state on the corresponding node device;
sending the target global write transaction group to the plurality of node devices;
and receiving data returned by the plurality of node devices, wherein the data comprises data acquired based on the global read transaction and the target global write transaction group.
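The gateway-side steps above can be sketched as a small coordinator function. This is a minimal sketch under stated assumptions, not the patented procedure. The section does not say how the target group is derived from the lists. The sketch therefore assumes it is the union of every related global write transaction that any node reports as active. `ask_active` and `read_excluding` are hypothetical stand-ins for the RPCs to node devices.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Row = Tuple[str, int, int]  # (key, value, writer gxid); hypothetical row shape


def determine_target_group(active_lists: Dict[str, Iterable[int]]) -> Set[int]:
    """ASSUMPTION: the target group is the union of all related global write
    transactions that any node device reports as still active."""
    group: Set[int] = set()
    for gxids in active_lists.values():
        group.update(gxids)
    return group


def global_read(
    read_nodes: List[str],
    ask_active: Callable[[str], List[int]],                # send indication, get list
    read_excluding: Callable[[str, Set[int]], List[Row]],  # send group, get data
) -> List[Row]:
    active_lists = {node: ask_active(node) for node in read_nodes}
    target_group = determine_target_group(active_lists)
    rows: List[Row] = []
    for node in read_nodes:
        rows.extend(read_excluding(node, target_group))
    return rows


# Hypothetical versions per node: key -> [(writer gxid, value), ...], oldest first.
versions = {
    "N1": {"k100": [(5, 50), (20, 40)]},  # gxid 20 already committed on N1
    "N2": {"k900": [(7, 50)]},            # gxid 20 still active on N2
}


def read_excluding(node: str, group: Set[int]) -> List[Row]:
    """Return, per key, the newest version whose writer is not in the target group."""
    rows: List[Row] = []
    for key, chain in versions[node].items():
        visible = [(g, v) for g, v in chain if g not in group]
        if visible:
            g, v = visible[-1]
            rows.append((key, v, g))
    return rows


active = {"N1": [], "N2": [20]}
rows = global_read(["N1", "N2"], lambda n: active[n], read_excluding)
print(rows)  # [('k100', 50, 5), ('k900', 50, 7)]
```

Because gxid 20 is excluded on every node, the read sees neither half of the transfer, and the total stays consistent.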
In one possible implementation, the method further comprises:
when a database with a master-slave structure uses master-slave logical replication and a standby machine receives an operation instruction of a global transaction transmitted by the host, assigning the transaction identifier of the transaction executed on the standby machine according to the global transaction identifier of the global transaction;
when the database with the master-slave structure uses master-slave physical replication, reading, during the read based on the global read transaction, both the current-state data stored on the standby machine and the transition-state data in the rollback segment information of the standby machine.
In one possible implementation, the method further comprises:
rolling back the global read transaction when a reply of at least one of the plurality of node devices is not received within a target duration.
In one aspect, a data reading method applied to a node device is provided, and the method includes:
receiving an indication message, wherein the indication message instructs the plurality of node devices to return their related active transaction lists for the global read transaction;
obtaining a related active transaction list, where the related active transaction list includes the related global write transactions in progress on the node device and the transaction state of each related global write transaction, and each related global write transaction corresponds to at least two of the plurality of node devices;
sending the related active transaction list;
receiving a target global write transaction group, wherein each target global write transaction in the group is, on its corresponding node device, in the executing state or the ready-to-commit state;
and outputting data according to the target global write transaction group and the global read transaction, wherein the data comprises data acquired based on the global read transaction and the target global write transaction group.
In one aspect, there is provided a data reading apparatus, the apparatus including:
a sending module, configured to send an indication message to multiple node devices corresponding to a global read transaction, where the indication message instructs the multiple node devices to return their related active transaction lists for the global read transaction;
a receiving module, configured to receive the related active transaction lists of the multiple node devices, where the related active transaction list of each node device represents the related global write transactions in an active state on that node device, and each related global write transaction corresponds to at least two of the multiple node devices;
a determining module, configured to determine a target global write transaction group according to a related active transaction list of the multiple node devices, where a target global write transaction included in the target global write transaction group is in an active state on a corresponding node device;
the sending module is further configured to send the target global write transaction group to the plurality of node devices;
the receiving module is further configured to receive data returned by the plurality of node devices, where the data includes data obtained based on the global read transaction and the target global write transaction group.
In one aspect, a data reading apparatus is provided, which is applied to a node device, and includes:
a receiving module, configured to receive an indication message, where the indication message instructs the multiple node devices to return their related active transaction lists for the global read transaction;
an obtaining module, configured to obtain a related active transaction list, where the related active transaction list includes the related global write transactions in progress on the node device and the transaction state of each related global write transaction, and each related global write transaction corresponds to at least two of the multiple node devices;
a sending module, configured to send the related active transaction list;
the receiving module is further configured to receive a target global write transaction group, where each target global write transaction in the group is, on its corresponding node device, in the executing state or the ready-to-commit state;
and an output module, configured to output data according to the target global write transaction group and the global read transaction, where the data includes data acquired based on the global read transaction and the target global write transaction group.
In one aspect, an electronic device is provided and includes a processor and a memory, where at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the operations performed by the data reading method.
In one aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the operations performed by the data reading method as described above.
In the method provided by the embodiments of the invention, a transaction-consistent common point is found, based on MVCC, across the multiple node devices at the moment the current global read transaction occurs. The read can therefore be regarded as an instantaneous read that returns only data committed before that moment. Global write transactions that could cause transaction inconsistency are excluded, so the data read is transaction-consistent, the database system achieves external data consistency, and data reading is accurate.
Drawings
FIG. 1 is a schematic diagram of an implementation environment of a data reading method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data reading method according to an embodiment of the present invention;
FIGS. 3 and 4 are diagrams comparing transaction identifier formats;
FIG. 5 is a diagram of the signaling interactions between node devices acting as the host and the standby machine in a database system;
FIGS. 6-8 show data transfer diagrams in three different ways, respectively;
FIG. 9 is a schematic illustration of the visibility situation for S0 and S1;
FIG. 10 is a schematic structural diagram of a data reading apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a data reading apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The database according to the embodiments of the present invention stores a plurality of data tables. Each data table may be used to store tuples, and a tuple may have one or more versions. The database may be any type of database based on MVCC (Multi-Version Concurrency Control), and the embodiments of the present invention do not specifically limit the type of database. Based on its state attribute, data in the database may be in one of three states: the current state, the transition state, or the historical state. Together these are called the "full state of data", or full-state data for short. The different state attributes in full-state data identify where a piece of data is in its life cycle.
Current state: the latest version of a tuple, that is, the data at the current stage. The state of data in this phase is called the current state.
Transition state: data that is neither the latest version nor a historical-state version of the tuple. It is in the process of changing from the current state to the historical state and is also called half-decay data.
Historical state: a past state of the tuple, whose value is an old value rather than the current value. The state of data in this phase is called the historical state. A tuple can have multiple historical states, which reflect the course of its state transitions. Data in the historical state can only be read. It cannot be modified or deleted.
It should be noted that all three states exist under the MVCC mechanism, whereas under a non-MVCC mechanism data may exist only in the historical and current states. Under either MVCC or lock-based concurrency control, the new value written by a transaction is in the current state once the transaction commits. Under MVCC, data produced by transactions earlier than the smallest transaction in the current active transaction list is in the historical state. Under lock-based concurrency control, once a transaction commits, the value the data had before the commit becomes a historical-state value, that is, the tuple's old value is in the historical state. Now suppose a version being read is still in use by an active transaction that is not the latest related transaction, while the latest related transaction has already modified the tuple. In that case the tuple's latest value is in the current state, and the value being read is already historical relative to it. This data lies between the current state and the historical state, so its state is called the transition state.
For example, under the MVCC mechanism, the balance of account A in the User table changes from 10 yuan to 20 yuan, and then 15 yuan is spent, leaving 5 yuan. At this point financial institution B starts reading the data to audit the transactions and is still reading. Meanwhile, A is recharged with 20 yuan, bringing the balance to 25 yuan. Here 25 yuan is current-state data, and the 5 yuan that B is reading is in the transition state. The remaining two values, 20 and 10, existed in the past and are both historical-state data.
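The account example can be expressed as a small classification routine. This is a simplified sketch that assumes a version chain stored oldest first. It also assumes the set of versions still in use by active readers is known, with each version identified by its index. The names `DataState` and `classify_versions` are hypothetical.

```python
from enum import Enum
from typing import List, Set, Tuple


class DataState(Enum):
    CURRENT = "current"
    TRANSITION = "transition"
    HISTORICAL = "historical"


def classify_versions(versions: List[int], in_use: Set[int]) -> List[Tuple[int, DataState]]:
    """versions: tuple values, oldest first.
    in_use: indexes of non-latest versions still read by active transactions."""
    last = len(versions) - 1
    states = []
    for i, value in enumerate(versions):
        if i == last:
            state = DataState.CURRENT      # latest version of the tuple
        elif i in in_use:
            state = DataState.TRANSITION   # superseded but still being read
        else:
            state = DataState.HISTORICAL   # superseded and no longer read
        states.append((value, state))
    return states


balances = [10, 20, 5, 25]  # account A's balance history from the example
result = classify_versions(balances, in_use={2})  # B is still reading the 5-yuan version
for value, state in result:
    print(value, state.value)
```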
Fig. 1 is a schematic diagram of an implementation environment of a data reading method according to an embodiment of the present invention. Referring to fig. 1, an implementation environment provided in fig. 1 may be a distributed database system, which may include a gateway server, a global transaction identifier generation cluster, and a distributed storage cluster, where the distributed storage cluster may include a plurality of node devices, and the distributed storage cluster may adopt a master-slave structure. In some embodiments, the gateway server may be merged with any node device in the distributed storage cluster on the same physical machine, that is, a node device participating in a read operation is allowed to act as the gateway server.
The gateway server receives read and write requests. If the read transaction or write transaction corresponding to a request is a global transaction, the gateway server applies to the global transaction identifier generation cluster for a unique global transaction identifier for that transaction. This ensures the consistency of data reads and writes across the whole distributed database system.
The global transaction identifier generation cluster generates global transaction identifiers (gxid) to identify global transactions. A global transaction is a transaction involving multiple node devices. For example, a global read transaction may read data stored on multiple node devices, and a global write transaction may write data on multiple node devices. Generating global transaction identifiers in a cluster prevents a single point of failure. When a global transaction occurs, the gateway server can apply to the global transaction identifier generation cluster for a globally unique identifier value.
In some embodiments, the global transaction identifier generation cluster may be physically independent, or it may be merged with a distributed storage cluster (e.g., ZooKeeper) to provide a global transaction identifier generation service for each gateway server. The architecture shown in Fig. 1 provides lightweight global transactions and is one kind of distributed database system.
Fig. 2 is a flowchart of a data reading method according to an embodiment of the present invention. Referring to fig. 2, the method includes:
201. When a read transaction involves a cross-node operation, the gateway server determines that the read transaction is a global read transaction and sends a generation request to the global transaction identifier generation cluster.
When the gateway server receives an operation statement (such as an SQL statement), it acts as the upper computing layer of the database and may parse the statement. If the operation statement of the read transaction carries a specified keyword, the gateway server determines that the read transaction involves a cross-node operation. For example, the specified keyword may be "GLOBAL", indicating that the read object of the operation statement is all data in the database system, that is, all node devices in the database system. When the operation statement includes "GLOBAL", step 201 is executed.
Of course, in some embodiments, the gateway server may instead use the operation statement of the read transaction to determine whether the data to be read resides on a single node device. If the data to be read is not on the same node device, the read transaction is determined to involve a cross-node operation. Specifically, the gateway server takes the range of data to be read from the operation statement and the metadata for that range, and checks whether the data is stored on two or more node devices. If it is, the read transaction involves a cross-node operation. This works because the metadata records the device on which the data is currently stored. For example, for each operation statement (SQL, Structured Query Language, statement) of each transaction, the gateway server may use the metadata to determine which node devices are accessed and record them. When the number of distinct node devices accessed reaches 2 or more, step 201 is executed. This metadata-based determination may be applied to single-statement SELECT transactions.
In summary, cross-node operations are identified either by a specified keyword or automatically by the gateway server. A transaction with cross-node operations is a global transaction and is assigned a global transaction identifier. If the operation statements (e.g., SQL statements) involve only a single node device, the transaction is a local transaction. A local transaction does not need to apply for a global transaction identifier and is assigned only a local transaction identifier.
Taking the example of specifying the keyword as "GLOBAL", the operation statement may take the following form:
BEGIN GLOBAL; -- global transaction: apply to the global gxid generation cluster for a globally unique global transaction identifier value (gxid)
SELECT…
END;
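The two detection paths (keyword and metadata) can be combined in a short sketch. The following assumptions are hypothetical and are not taken from the source. Metadata is modeled as a list of `(low_key, high_key, node)` ranges. Each statement is passed with the key range it reads. The keyword check is a plain case-insensitive word match on `GLOBAL`.

```python
import re
from typing import List, Set, Tuple

# Hypothetical metadata: keys in [low, high] are currently stored on `node`.
Metadata = List[Tuple[int, int, str]]

GLOBAL_KEYWORD = re.compile(r"\bGLOBAL\b", re.IGNORECASE)


def nodes_for_range(lo: int, hi: int, metadata: Metadata) -> Set[str]:
    """Node devices whose key ranges overlap the requested range [lo, hi]."""
    return {node for (m_lo, m_hi, node) in metadata if m_lo <= hi and lo <= m_hi}


def involves_cross_node(statements: List[Tuple[str, int, int]], metadata: Metadata) -> bool:
    """statements: (sql_text, low_key, high_key) for each statement of one transaction."""
    touched: Set[str] = set()
    for sql, lo, hi in statements:
        if GLOBAL_KEYWORD.search(sql):  # explicit keyword marks a global transaction
            return True
        touched |= nodes_for_range(lo, hi, metadata)
    return len(touched) >= 2            # two or more distinct node devices accessed


md: Metadata = [(0, 499, "N1"), (500, 999, "N2")]
print(involves_cross_node([("SELECT * FROM t WHERE key = 100", 100, 100)], md))  # False
print(involves_cross_node([("SELECT * FROM t WHERE key BETWEEN 100 AND 900", 100, 900)], md))  # True
```

A transaction for which this returns False would be treated as local and receive only a local transaction identifier.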
202. After receiving the generation request, the global transaction identifier generation cluster generates a global transaction identifier for the global read transaction and sends it to the gateway server.
In the embodiments of the present invention, the values of the global transaction identifiers generated by the cluster increase monotonically with time and are essentially timestamps. The value of a global transaction identifier represents when the global write transaction occurred. Among committed global write transactions, a larger value means a later occurrence. The global transaction identifier may take any form that can represent a timestamp, such as a numeric, time, or character type.
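A monotonically increasing, timestamp-like identifier can be sketched as follows. This is a single-process stand-in for the generation cluster and is a hypothetical illustration. A real cluster would also need replication and failover. The generator uses nanosecond wall-clock time and never returns a value less than or equal to the previous one, even if the clock stalls or moves backwards.

```python
import threading
import time
from typing import Callable


class GxidGenerator:
    """Hypothetical single-node stand-in for the gxid generation cluster."""

    def __init__(self, clock: Callable[[], int] = time.time_ns):
        self._clock = clock
        self._last = 0
        self._lock = threading.Lock()

    def next_gxid(self) -> int:
        with self._lock:
            now = self._clock()
            # Strictly increasing: fall back to last + 1 if the clock has not advanced.
            self._last = max(now, self._last + 1)
            return self._last


gen = GxidGenerator()
ids = [gen.next_gxid() for _ in range(5)]
print(all(a < b for a, b in zip(ids, ids[1:])))  # True
```

Because the value tracks the clock, a larger gxid marks a later transaction, matching the timestamp semantics described above.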
203. The gateway server uses the global transaction identifier as the transaction identifier of the global read transaction.
Consider a database system that supports MVCC, in which each tuple already carries an xid field. A gxid field may be added to represent the global transaction identifier, and the xid field may be renamed lxid to represent the local transaction identifier. FIG. 3 and FIG. 4 show this difference in format. The values of both local transaction identifiers and global transaction identifiers may increase monotonically.
For example, if a transaction T needs to write to two node devices to perform a transfer, the operation statements may take the following form:
BEGIN GLOBAL; -- apply for a gxid, assume 20
UPDATE user_account SET my_wallet = my_wallet - 10 WHERE key = 100; -- node 1, local lxid is 18
UPDATE user_account SET my_wallet = my_wallet + 10 WHERE key = 900; -- node 2, local lxid is 22
COMMIT;
On node device 1, the tuple with key 100 carries the transaction identifier pair {gxid, lxid} = {20, 18}.
On node device 2, the tuple with key 900 carries the transaction identifier pair {gxid, lxid} = {20, 22}.
These transaction identifiers make it possible to tell whether data from different node devices was written by the same global transaction, that is, whether it belongs to the same transaction. If the next transaction on node device 1 is a global transaction, its transaction identifier is {gxid, lxid} = {21, 19}. If the transaction after that is a local transaction, its identifier is {gxid, lxid} = {0, 20}. If the one after that is a global transaction, its identifier is {gxid, lxid} = {22, 21}, and so on.
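The {gxid, lxid} pairs can be modeled with a small value type. This sketch follows the example's conventions: gxid 0 marks a local transaction, and lxid grows by one per transaction on a node. `TxnId`, `NodeTxnIdAllocator`, and `same_global_txn` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL_GXID = 0  # gxid value carried by purely local transactions, as in the example


@dataclass(frozen=True)
class TxnId:
    gxid: int  # global transaction identifier (0 for local transactions)
    lxid: int  # node-local transaction identifier


class NodeTxnIdAllocator:
    """Hypothetical per-node allocator: lxid increases by one per transaction."""

    def __init__(self, next_lxid: int):
        self._next_lxid = next_lxid

    def begin(self, gxid: Optional[int] = None) -> TxnId:
        txn = TxnId(gxid if gxid is not None else LOCAL_GXID, self._next_lxid)
        self._next_lxid += 1
        return txn


def same_global_txn(a: TxnId, b: TxnId) -> bool:
    """True if two tuple versions (possibly on different nodes) share one global writer."""
    return a.gxid != LOCAL_GXID and a.gxid == b.gxid


# Tuples written by transaction T on node 1 and node 2.
print(same_global_txn(TxnId(20, 18), TxnId(20, 22)))  # True

node1 = NodeTxnIdAllocator(next_lxid=19)
t1 = node1.begin(gxid=21)  # next global transaction -> TxnId(gxid=21, lxid=19)
t2 = node1.begin()         # local transaction       -> TxnId(gxid=0, lxid=20)
t3 = node1.begin(gxid=22)  # global transaction      -> TxnId(gxid=22, lxid=21)
```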
204. The gateway server sends an indication message to the plurality of node devices corresponding to the global read transaction, instructing them to return their related active transaction lists for the global read transaction.
The gateway server may decompose the global read transaction and send an indication message to each of the node devices involved. The indication message may include the transaction identifier of the global read transaction and the node identifiers of the node devices involved. For example, if the global read transaction is denoted T-current and involves node devices N1, N3, N5, and N9, the node identifiers may be written as {N1, N3, N5, N9} to indicate that the global read transaction operates on these node devices.
In a distributed database system, each node device may have active global write transactions, and if such a transaction commits, a new data inconsistency may arise. The gateway server therefore instructs each node device to report its active global write transactions that are related to the global read transaction. The active state means the executing state or the ready-to-commit state. The executing state means the transaction has not yet entered the prepare-to-commit phase in the distributed database system, and the ready-to-commit state means the transaction has entered the commit phase.
It should be noted that the gateway server may use a timeout mechanism. For example, a node device may not have returned its related active transaction list when the timeout expires, which suggests that it may have a network problem or be down. In that case the gateway server may resend the indication message to that node device or to all of the node devices, so that every node device returns its related active transaction list. To save signaling and avoid wasting resources, a threshold may be set on retransmissions, for example at most 3 retries. If the threshold is reached and some node device still has not returned its related active transaction list, the global read transaction is rolled back. A timeout mechanism may likewise be applied to any message the gateway server sends, so as not to affect the normal operation of the database system.
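The timeout-and-retry behavior can be sketched as a collection loop. This sketch makes the following assumptions. `request` is a hypothetical call that returns a node's list, or `None` if no reply arrives within the timeout. Retries go only to nodes that have not answered. The retry threshold defaults to the example value of 3. `GlobalReadRolledBack` is an illustrative exception used to signal the rollback.

```python
from typing import Callable, Dict, Iterable, List, Optional


class GlobalReadRolledBack(Exception):
    """Raised when some node never replies; the global read transaction is rolled back."""


def collect_active_lists(
    nodes: Iterable[str],
    request: Callable[[str, float], Optional[List[int]]],  # None = no reply in time
    timeout_s: float = 1.0,
    max_retries: int = 3,
) -> Dict[str, List[int]]:
    pending = list(nodes)
    replies: Dict[str, List[int]] = {}
    for _ in range(1 + max_retries):  # first attempt plus up to max_retries resends
        still_pending = []
        for node in pending:
            reply = request(node, timeout_s)
            if reply is None:
                still_pending.append(node)
            else:
                replies[node] = reply
        pending = still_pending
        if not pending:
            return replies
    raise GlobalReadRolledBack(f"no reply from {pending} after {max_retries} retries")


# Usage: N3 misses the first timeout but answers the resend.
calls: Dict[str, int] = {}


def flaky(node: str, timeout: float) -> Optional[List[int]]:
    calls[node] = calls.get(node, 0) + 1
    return None if (node == "N3" and calls[node] == 1) else [105, 107]


print(collect_active_lists(["N1", "N3"], flaky))
```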
205. Each of the node devices obtains its related active transaction list according to the received indication message. The related active transaction list of each node device contains the related global write transactions in an active state on that node device, and each related global write transaction corresponds to at least two of the plurality of node devices.
The node device provides the gateway server with the related global write transactions that are active on it (e.g., in the executing state or the ready-to-commit state). A related global write transaction is one whose node devices are all covered by the node devices involved in the global read transaction. For example, if global write transaction T9 writes node devices N1 and N3, both are covered, so T9 must be reported to the gateway server. If global write transaction T10 writes node devices N3 and N7, and N7 is not within the node range of the global read transaction, then T10 is not covered and does not need to be reported. Each transaction in the related active transaction list must involve at least two of the node devices corresponding to the global read transaction. For example, if the global read transaction involves {N1, N3, N5, N9}, each of its related global write transactions must involve at least two of these node devices.
In this embodiment of the present invention, obtaining the related active transaction list may optionally include two steps. First, the node device traverses its global write transactions that are in the executing state or the ready-to-commit state. Second, when the node devices corresponding to a global write transaction include at least two of the plurality of node devices, the node device adds that write transaction to the related active transaction list.
The related active transaction list may be carried in the reply that the node device returns to the gateway server. Each node device needs to know which write transactions on it may affect the consistency of the global read transaction. For this purpose, each global write transaction must also inform every node device it involves of the full set of node devices it involves. For example, for global write transaction T9, N1 knows that T9 involves only N1 and N3, both of which are within the range of node devices involved in the global read transaction, so information about T9 is included in the reply. For global write transaction T10, N3 knows that T10 involves only N3 and N7, and N7 is not within that range, so information about T10 is not included in the reply.
Further, if the node devices involved in a global write transaction are not a subset of the node devices involved in the global read transaction, the global write transaction is not a related global write transaction of the global read transaction, and its information need not be included in the reply.
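The node-side filter can be sketched by combining the two conditions stated above. The first condition is that the write is in an active state. The second is that the write's node set lies within the read's node set and spans at least two nodes. This sketch assumes each write already carries the full set of nodes it involves, as the text requires. The numeric gxids 9 and 10 stand in for T9 and T10 and are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, List


class TxnState(Enum):
    EXECUTING = "executing"
    READY_TO_COMMIT = "ready-to-commit"
    COMMITTED = "committed"


ACTIVE_STATES = {TxnState.EXECUTING, TxnState.READY_TO_COMMIT}


@dataclass(frozen=True)
class GlobalWriteTxn:
    gxid: int
    nodes: FrozenSet[str]  # every node device the write involves
    state: TxnState


def related_active_list(
    local_writes: Iterable[GlobalWriteTxn], read_nodes: Iterable[str]
) -> List[GlobalWriteTxn]:
    read_set = frozenset(read_nodes)
    related = []
    for txn in local_writes:
        if txn.state not in ACTIVE_STATES:
            continue  # only executing / ready-to-commit writes are reported
        if txn.nodes <= read_set and len(txn.nodes) >= 2:
            related.append(txn)  # covered by the read and spans at least two nodes
    return related


# Node N3's view, for a global read over {N1, N3, N5, N9}.
t9 = GlobalWriteTxn(9, frozenset({"N1", "N3"}), TxnState.EXECUTING)
t10 = GlobalWriteTxn(10, frozenset({"N3", "N7"}), TxnState.READY_TO_COMMIT)
print([t.gxid for t in related_active_list([t9, t10], ["N1", "N3", "N5", "N9"])])  # [9]
```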
206. Each of the node devices sends its related active transaction list.
In step 206, each node device has already taken a snapshot of the related global write transactions active on it, so it can build the related active transaction list from that snapshot for transmission.
It should be noted that while taking the snapshot or sending the list, a node device does not hold back the commit of any global write transaction. A transaction in the executing state is allowed to move to the ready-to-commit state or the committed state.
In one possible implementation, the related active transaction list that a node device returns to the gateway server consists of the node identifier of the node device, followed by the global transaction identifiers of the related global write transactions in the active state on that node device. That is, the list may use a variable-length format to represent multiple global write transaction identifiers. For example, when the number of distributed concurrent transactions is small, the related active transaction list of node N1 may be {N1, 105, 107, 110}.
Of course, the related active transaction list may also include the number of related global write transactions. For example, the related active transaction list of node N1 may be {N1, 3, 105, 107, 110}.
Further, the transaction status of each relevant global write transaction may also be included, for example:
list of related active transactions for N1: { N1, { 105-ready to commit state, 107-executing state, 110-executing state } }.
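The three variable-length layouts above differ only in which fields accompany the node identifier. A minimal sketch of building each one follows. The Python tuples and dicts stand in for an unspecified wire encoding, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def plain_list(node: str, txns: Dict[int, str]) -> Tuple:
    # {N1, 105, 107, 110}: node identifier followed by sorted transaction ids
    return (node, *sorted(txns))


def counted_list(node: str, txns: Dict[int, str]) -> Tuple:
    # {N1, 3, 105, 107, 110}: node identifier, count, then the ids
    return (node, len(txns), *sorted(txns))


def list_with_states(node: str, txns: Dict[int, str]) -> Tuple:
    # {N1, {105: ready-to-commit, 107: executing, 110: executing}}
    return (node, {gxid: txns[gxid] for gxid in sorted(txns)})


n1 = {107: "executing", 105: "ready-to-commit", 110: "executing"}
print(plain_list("N1", n1))    # ('N1', 105, 107, 110)
print(counted_list("N1", n1))  # ('N1', 3, 105, 107, 110)
```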
In some embodiments, the related active transaction list of a node device includes four items:
(1) the node identifier of the node device;
(2) the number of related global write transactions;
(3) the global transaction identifier of the smallest related global write transaction on the node device;
(4) a target bitmap block that indicates the transaction identifiers of all other related global write transactions on the node device.
In other words, the global transaction identifiers of the related global write transactions active on the node device may be expressed in a bitmap format. This suits the case where the distributed concurrent transaction volume is large and each node device also carries many global transactions. Representing the related global write transactions as a bitmap avoids consuming a large amount of network bandwidth during transmission.
In the embodiment of the present invention, each node device may take the global transaction identifier of the minimum active global write transaction among its related active write transactions as a baseline. It then derives the target bitmap block relative to that baseline. The minimum active global write transaction is the active global write transaction with the smallest assigned global transaction identifier. The baseline is stored as an unsigned 64-bit integer. This width is merely an example, and other widths may be used in some embodiments.
In the related active transaction list, the baseline and a bitmap block form one communication unit. A bitmap block may be 128 bytes or 1280 bytes long, and its size may be made adjustable through a parameter. For example, suppose the default size is 4 KB and each bit represents one transaction. A block can then represent 4 × 1024 × 8 = 32768 global transactions. If the physical device hosting a node device has stronger transaction processing capability, more bits can be used to represent more active transactions. In one possible implementation, a bit set to 1 indicates that the corresponding transaction occurs on the node device, and a bit set to 0 indicates that it does not. This data structure reduces the amount of data transmitted over the network.
For example, suppose the global transaction identifier of the minimum active global write transaction among the related active write transactions is 12333. The first of the 32768 bits then corresponds to transaction 12334, the second to transaction 12335, and so on. If the first bit is 1, transaction 12334 occurs on the node device. If the second bit is 0, transaction 12335 does not.
For example, the distributed write transactions in the related active transaction list may be sorted in ascending order. Assume the minimum active global transaction identifier among the related active write transactions is 105, and the other two are 107 and 110, giving 3 transactions. The related active transaction list of node N1 is then the quadruple:
{N1, 3, 105, {01001}}.
Here 01001 is the target bitmap block. The first bit, 0, indicates that transaction 106 is not a related distributed transaction. The second bit, 1, indicates that transaction 107 is a related transaction, and so on. The bitmap is 5 bits long because the largest identifier minus the baseline is 110 - 105 = 5. The baseline 105 is already carried as the third element of the quadruple, so it is not repeated in the target bitmap block.
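The baseline-plus-bitmap encoding can be sketched as follows, reproducing the {N1, 3, 105, {01001}} example. This is a minimal illustration, not the patent's implementation. For readability, the bitmap is kept as a string of '0'/'1' characters rather than packed bytes. The function names are hypothetical.

```python
from typing import List, Tuple

BitmapList = Tuple[str, int, int, str]


def encode_bitmap_list(node: str, txn_ids: List[int]) -> BitmapList:
    """Build (node, count, baseline, bitmap); bit i (0-based) stands for baseline + i + 1."""
    ids = sorted(set(txn_ids))
    baseline = ids[0]              # smallest related transaction, sent as a plain value
    width = ids[-1] - baseline     # e.g. 110 - 105 = 5 bits
    bits = ["0"] * width
    for gxid in ids[1:]:
        bits[gxid - baseline - 1] = "1"
    return node, len(ids), baseline, "".join(bits)


def decode_bitmap_list(entry: BitmapList) -> List[int]:
    _node, count, baseline, bitmap = entry
    ids = [baseline] + [baseline + i + 1 for i, b in enumerate(bitmap) if b == "1"]
    if len(ids) != count:
        raise ValueError("count field disagrees with bitmap")
    return ids


entry = encode_bitmap_list("N1", [105, 107, 110])
print(entry)                      # ('N1', 3, 105, '01001')
print(decode_bitmap_list(entry))  # [105, 107, 110]
```

A packed implementation would store 8 transactions per byte, which is where the 4 × 1024 × 8 = 32768 capacity of a 4 KB block comes from.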
Further, in some embodiments, the target bitmap block may also be compressed with an algorithm such as RLE (Run-Length Encoding). In that case the target bitmap block is a compressed bitmap block, which the gateway server decompresses after receiving it. This further reduces the amount of data transmitted over the network.
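Sparse bitmaps consist mostly of long runs of zeros, which is why run-length encoding suits them. A minimal RLE sketch over the string form of a bitmap follows. The (bit, run_length) pair representation is an assumption chosen for clarity, not a format defined in the text.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(bitmap: str) -> List[Tuple[str, int]]:
    # Collapse each run of identical bits into (bit, run_length)
    return [(bit, len(list(run))) for bit, run in groupby(bitmap)]


def rle_decode(runs: List[Tuple[str, int]]) -> str:
    # Expand every (bit, run_length) pair back into the original bits
    return "".join(bit * length for bit, length in runs)


sparse = "0" * 30 + "1" + "0" * 20 + "1"
runs = rle_encode(sparse)
print(runs)  # [('0', 30), ('1', 1), ('0', 20), ('1', 1)]
assert rle_decode(runs) == sparse
```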
Correspondingly, the node identifiers of the plurality of node devices carried in the indication message sent by the gateway server may use a similar bitmap format. The indication message may include three items:
(1) the number of node devices involved in the global read transaction;
(2) the minimum node identifier among them;
(3) a node bitmap block representing the node identifiers of all other involved node devices.
For example, the indication message may contain {3, N1, 0101}. In 0101, the first bit corresponds to node device N2, and its value 0 indicates that N2 is not involved in the global read transaction. The second bit corresponds to node device N3, and its value 1 indicates that N3 is involved, and so on. The node devices involved in the global read transaction are therefore N1, N3, and N5. Optionally, the node bitmap block in the indication message may also be compressed to save data volume. The gateway server may instead use a variable-length format analogous to that of the related active transaction list; details are not repeated here.
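Decoding the node bitmap in the indication message mirrors decoding the transaction bitmap. A minimal sketch that recovers N1, N3, and N5 from {3, N1, 0101} follows. It assumes node identifiers of the form "N<integer>" and a string bitmap, and the function name is hypothetical.

```python
from typing import List, Tuple


def decode_node_list(msg: Tuple[int, str, str]) -> List[str]:
    """Decode (count, min_node, node_bitmap); bit i (0-based) stands for node min + i + 1."""
    count, min_node, bitmap = msg
    base = int(min_node.lstrip("N"))  # assumes identifiers like "N1", "N3", ...
    nodes = [min_node] + [f"N{base + i + 1}" for i, b in enumerate(bitmap) if b == "1"]
    if len(nodes) != count:
        raise ValueError("node count does not match node bitmap")
    return nodes


print(decode_node_list((3, "N1", "0101")))  # ['N1', 'N3', 'N5']
```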
It should be noted that the node device or the gateway server may choose the representation of the node device list or the related active transaction list according to the current volume of concurrent transactions. For example, when the node device determines that the current number of concurrent transactions exceeds a first preset threshold, it may represent the related active transaction list in bitmap format to relieve transmission pressure. When the node device determines that the current number of concurrent transactions is below a second preset threshold, it may use the variable-length format, which spares the receiving side a decompression step. The first preset threshold is greater than the second. The gateway server may make a similar decision. Alternatively, the gateway server may make the decision and notify the node devices so that they encode and transmit in the corresponding format. The embodiment of the present invention does not limit this.
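The two-threshold rule can be sketched as a small decision function. The threshold values below are hypothetical placeholders. The text does not say what happens between the two thresholds, so keeping the current format there is purely an assumption of this sketch.

```python
def choose_list_format(concurrent_txns: int, current: str,
                       high: int = 10_000, low: int = 1_000) -> str:
    """Pick 'bitmap' or 'variable'; high is the first threshold, low the second (high > low)."""
    if concurrent_txns > high:
        return "bitmap"    # heavy load: shrink the message
    if concurrent_txns < low:
        return "variable"  # light load: skip decompression at the receiver
    return current         # between thresholds: keep the current format (assumption)


print(choose_list_format(50_000, "variable"))  # bitmap
print(choose_list_format(10, "bitmap"))        # variable
print(choose_list_format(5_000, "bitmap"))     # bitmap
```

Keeping the current format in the gap between thresholds avoids flipping back and forth when the load hovers near a single cutoff.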
207. After receiving the related active transaction lists of the plurality of node devices, the gateway server determines a target global write transaction group from these lists.
Each target global write transaction in the target global write transaction group is in the executing state or the ready-to-commit state on its corresponding node devices. If the transactions in this group commit later, they may introduce new inconsistencies. They are therefore recorded as a group, and the tuples they commit are subsequently excluded from the read range, which avoids data inconsistency.
In this embodiment of the present invention, step 207 may include: adding the related global write transactions in the related active transaction lists of the plurality of node devices to the target global write transaction group as target global write transactions. The lists of several node devices may contain the same related global write transaction. In that case the group may be deduplicated: if it contains several identical global write transaction identifiers, one is retained and the rest are deleted. This avoids redundant data and repeated work in the subsequent read process. Alternatively, before adding a related global write transaction from a node device's list, the gateway server may check whether the group already contains it. If so, the transaction is discarded; if not, its global write transaction identifier is added to the group.
In one possible implementation, the related active transaction lists are expressed in bitmap format. In that case, adding their related global write transactions to the target global write transaction group may include two steps. First, the gateway server determines the global transaction identifiers of the related global write transactions on each node device. It uses three fields of that node device's list for this: the number of related global write transactions, the global transaction identifier of the smallest related global write transaction on that node device, and the target bitmap block. Second, it adds these global transaction identifiers to the target global write transaction group.
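Building the target group amounts to a set union over all replies, which deduplicates identifiers automatically. The sketch below accepts both list formats. The ("variable", ...) / ("bitmap", ...) tagging of replies is a hypothetical convention introduced only to tell the two formats apart.

```python
from typing import Dict, List, Set, Tuple


def expand_bitmap(count: int, baseline: int, bitmap: str) -> List[int]:
    # The baseline is the smallest related transaction; bit i stands for baseline + i + 1
    ids = [baseline] + [baseline + i + 1 for i, b in enumerate(bitmap) if b == "1"]
    if len(ids) != count:
        raise ValueError("count field disagrees with bitmap")
    return ids


def build_target_group(replies: Dict[str, Tuple]) -> Set[int]:
    """Union the related active transaction lists returned by all node devices."""
    group: Set[int] = set()  # a set discards duplicate identifiers automatically
    for reply in replies.values():
        if reply[0] == "bitmap":
            _, count, baseline, bitmap = reply
            group.update(expand_bitmap(count, baseline, bitmap))
        else:
            group.update(reply[1])
    return group


replies = {
    "N1": ("bitmap", 3, 105, "01001"),  # 105, 107, 110
    "N3": ("variable", [107, 112]),     # 107 is also reported by N1
}
print(sorted(build_target_group(replies)))  # [105, 107, 110, 112]
```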
In another possible embodiment, the gateway server first determines each target global write transaction in the target global write transaction group. It may then notify the node devices corresponding to each target global write transaction to resume the commit operation of that transaction. Once every target global write transaction is in the committed state on its node devices, the sending in step 208 below may be performed.
For global write transactions outside the target global write transaction group, the gateway server may likewise notify the node devices to resume their commit operations. Alternatively, the node devices may resume them on their own. The embodiment of the present invention does not limit this.
To illustrate the transaction states on node devices more concretely, Table 1 below shows example states of global transactions executing on the node devices.
TABLE 1
gxid2 transactions gxid5 transactions gxid9 transactions gxid10 transactions
N1 Prepare commit state Executing state Executing state
N3 Executing state Executing state Prepare commit state Executing state
N5 Prepare commit state Executing state
N9 Executing state
N7 Prepare commit state
Based on the example in Table 1, a global write transaction on the node devices may fall into one of four cases:
case 1: the sub-transactions of the global write transaction are in the ready-to-commit state on every node device;
case 2: the sub-transactions of the global write transaction are in the executing state on every node device. The transaction has not yet entered the commit phase and therefore cannot cause a distributed read inconsistency;
case 3: among the sub-transactions of the global write transaction on the node devices, at least one is in the executing state and at least one is in the ready-to-commit state;
case 4: the sub-transactions of the global write transaction are in neither the ready-to-commit state nor the executing state on any node device; that is, the global write transaction is not active. This case is already excluded when the related active transaction lists are built, so it is not reported to the gateway server.
Based on cases 1 to 3, the gateway server may traverse each related global write transaction, such as gxid2, gxid5, and gxid9 in Table 1. It filters out the global write transactions that fall into case 2 without further processing.
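The case analysis can be sketched as a classifier over the reported sub-transaction states. It assumes each related global write transaction arrives with the list of its sub-transaction states. The state values in the usage example are hypothetical, because the flattened Table 1 cannot be mapped reliably to specific transactions. The function names are also hypothetical.

```python
from typing import Dict, List

EXECUTING = "executing"
READY = "ready-to-commit"


def classify(states: List[str]) -> int:
    """Return case 1, 2 or 3 for an active global write transaction."""
    if all(s == READY for s in states):
        return 1
    if all(s == EXECUTING for s in states):
        return 2
    return 3  # a mix of executing and ready-to-commit sub-transactions


def keep_for_target_group(txn_states: Dict[str, List[str]]) -> List[str]:
    # Case 2 transactions have not entered the commit phase, so they are filtered out
    return [gxid for gxid, states in txn_states.items() if classify(states) != 2]


example = {  # hypothetical sub-transaction states
    "gxid2": [READY, READY],
    "gxid5": [EXECUTING, EXECUTING],
    "gxid9": [EXECUTING, READY],
}
print(keep_for_target_group(example))  # ['gxid2', 'gxid9']
```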
The target global write transaction group may be a set of transaction identifiers. It can be regarded as a constructed snapshot, called a Snapshot in PostgreSQL and a ReadView in MySQL/InnoDB. The snapshot effectively registers all currently active global write transactions in a list. Combined with tuple visibility judgment, this list is then used to determine whether a read tuple is a version written by an already committed transaction. This is how read consistency is ensured on a single machine.
208. The gateway server sends the target global write transaction group to the plurality of node devices.
Since the actual reading is completed by the node device, the gateway server needs to send the target global write transaction group to the node device, so that the node device performs visibility determination, thereby implementing the reading of data.
209. After receiving the target global write transaction group, each node device of the plurality of node devices selects the first tuples from the target tuples of the global read transaction. A first tuple is a tuple visible to the global read transaction according to its global transaction snapshot.
A node device proceeds with the data visibility judgment and the actual data read only after every target global write transaction in the group has reached the committed state on it. This includes the transactions that were previously in the ready-to-commit state.
For the global read transaction, the visibility of each version of a tuple may be determined from its global transaction snapshot. For example, the creation time of that snapshot can be compared with the creation time, deletion time, or commit time of the tuple version. Here, a tuple is visible if it can be read by a transaction at the time corresponding to the global transaction snapshot of the global read transaction. For each tuple read from the data table, its life-cycle information (creation time, deletion time, commit time, and so on) can also be read. Taking visibility judgment against a historical time, namely the creation time of the transaction snapshot, as an example:
(I): if a version of the tuple was generated by an insert operation, it is visible when both its creation time and its commit time precede the creation time of the transaction snapshot.
(II): if a version of the tuple was generated by a delete operation, it is visible when both its deletion time and its commit time precede the creation time of the transaction snapshot.
(III): if a version of the tuple was generated by an update operation, it is visible when both its creation time and its commit time precede the creation time of the transaction snapshot.
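Rules (I) to (III) share one shape: an event time and a commit time must both precede the snapshot's creation time. Only the event time differs: deletion time for deletes, creation time for inserts and updates. A minimal sketch follows. It assumes integer logical timestamps and a `TupleVersion` record, both of which are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    op: str                     # "insert", "update" or "delete"
    created_at: int             # creation time of the version (logical time)
    deleted_at: Optional[int]   # deletion time, only meaningful for "delete"
    commit_time: Optional[int]  # None while the writing transaction is uncommitted


def is_visible(v: TupleVersion, snapshot_time: int) -> bool:
    """Rules (I)-(III): the relevant event time and the commit time must both
    precede the creation time of the global read transaction's snapshot."""
    if v.commit_time is None:
        return False
    # (II) uses the deletion time; (I) and (III) use the creation time
    event_time = v.deleted_at if v.op == "delete" else v.created_at
    if event_time is None:
        return False
    return event_time < snapshot_time and v.commit_time < snapshot_time


print(is_visible(TupleVersion("insert", 10, None, 12), snapshot_time=20))  # True
print(is_visible(TupleVersion("insert", 10, None, 25), snapshot_time=20))  # False
```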
210. The first tuples of the global read transaction may include second tuples. A second tuple is a tuple whose committing transaction is one of the target global write transactions. In that case, each node device of the plurality of node devices outputs only the first tuples that are not second tuples.
In the visibility judgment of step 209, a version of a tuple may be visible to the global read transaction, that is, it satisfies the visibility condition. The node device then further checks whether the transaction that committed this version is a target global write transaction in the target global write transaction group. If so, that version is excluded and only the remaining tuples are output to the gateway server.
In this process, a target global write transaction's data is excluded from the read range regardless of whether the transaction has finished committing when the node device starts reading. Consequently, the read never observes a write transaction that is committed on some node devices but not yet committed on others, so no read inconsistency arises. The distributed "read half-committed" anomaly cannot occur for the data of any transaction, and the consistency of data reading is ensured.
For example, suppose a tuple is visible to the global read transaction but was committed by gxid2, and gxid2 is in the target global write transaction group. The tuple is then not output.
Steps 209 to 210 output data according to the target global write transaction group and the global read transaction; the output data is the data obtained based on both.
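Steps 209 and 210 combine into a single filter on each node device: keep a version if it is visible, then drop it if its committing transaction is in the target group. In the minimal sketch below, visibility is simplified to "committed before the snapshot was created". The `Version` record and `read_for_global_txn` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Version:
    key: str
    value: str
    committed_by: int           # global transaction identifier of the writer
    commit_time: Optional[int]  # None while uncommitted


def read_for_global_txn(versions: Iterable[Version], snapshot_time: int,
                        target_group: Set[int]) -> List[Version]:
    out: List[Version] = []
    for v in versions:
        # Step 209 (simplified): visible if committed before the snapshot was created
        visible = v.commit_time is not None and v.commit_time < snapshot_time
        # Step 210: drop second tuples, i.e. versions committed by a target transaction
        if visible and v.committed_by not in target_group:
            out.append(v)
    return out


rows = [Version("a", "1", committed_by=1, commit_time=5),
        Version("b", "2", committed_by=2, commit_time=6)]  # gxid2 is in the group
print([v.key for v in read_for_global_txn(rows, snapshot_time=10, target_group={2})])
# prints ['a']
```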
211. Each node device of the plurality of node devices sends the acquired data to the gateway server.
212. The gateway server outputs the received data.
Each node device obtains the visible data according to the global read transaction and the transactions in the received target global write transaction group. It then outputs this data for the global read transaction. In one possible implementation, the gateway server outputs data as soon as it receives the data returned by any node device. In another, it outputs the data only after receiving the data returned by all node devices. The embodiment of the present invention does not limit this.
The method provided by the embodiment of the invention can be regarded as an instantaneous read. At the moment the current global read transaction occurs, it uses MVCC to find a transaction-consistent point across the plurality of node devices, so that only data committed before that moment is read. Global write transactions that could cause transaction inconsistency are excluded during reading. As a result, the data read is transactionally consistent, the database system achieves external data consistency, and data reading is accurate.
It should be noted that the gateway server may apply a timeout mechanism to every message it sends, so that a stalled exchange does not disrupt the normal operation of the database system. For example, some node device may not have returned data when the timeout expires, which suggests a network problem or that the node device is down. The gateway server may then resend the read instruction to that node device, or to all of the plurality of node devices, so that every node device returns data. To save signaling and avoid wasting resources, a threshold may be set on retransmission, for example at most 3 retries. If the threshold is reached and some node device still has not returned data, the global read transaction is rolled back. Rolling back a global read transaction means restoring the database system to the state in which the global read transaction started. For example, all data already read or output for that global read transaction is invalidated.
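The timeout-and-retry policy can be sketched as a loop with a bounded number of resends. In this minimal sketch, `send_read` is a hypothetical callable that returns a node's rows, or None to model a timeout. Raising `GlobalReadRolledBack` stands in for the rollback described above, and the retry budget of 3 follows the example value.

```python
from typing import Callable, Dict, Iterable, List, Optional


class GlobalReadRolledBack(Exception):
    """Raised when some node device never answers within the retry budget."""


def collect_with_retry(nodes: Iterable[str],
                       send_read: Callable[[str], Optional[list]],
                       max_retries: int = 3) -> Dict[str, list]:
    results: Dict[str, list] = {}
    pending: List[str] = list(nodes)
    # One initial send plus up to max_retries resends to the silent node devices
    for _ in range(1 + max_retries):
        still_pending = []
        for node in pending:
            reply = send_read(node)  # None models a timeout (assumption)
            if reply is None:
                still_pending.append(node)
            else:
                results[node] = reply
        pending = still_pending
        if not pending:
            return results
    # Retry budget exhausted: roll back; the caller discards any partial results
    raise GlobalReadRolledBack(f"no reply from {pending}")
```

Only the node devices that timed out are resent to here, which is one of the two options the text allows; resending to all node devices would also fit.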
In some embodiments, a read-only global read transaction need not be rolled back when a message times out. Instead, the gateway server may obtain a new GTID (Global Transaction ID) value and distribute it to each node device, and each node device re-executes the read-only operation. Avoiding rollback of read-only transactions improves the transaction throughput of the system. This is especially beneficial for read-heavy workloads in HTAP systems.
The method provided by the embodiment of the invention can also support concurrent execution of multiple global read transactions. Reuse requires two conditions. First, the interval between two global read transactions must be less than a preset interval, for example 1 second, configurable as a parameter. Second, the two transactions must read the same node devices, or the node range read by the earlier transaction must contain the node range of the later one. When both hold, the later global read transaction may reuse the earlier transaction's target global write transaction group, further improving overall performance.
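The reuse condition combines a time check with a containment check on node ranges. A minimal sketch follows. `CachedGroup` and `reusable_group` are hypothetical names, and times are plain floats in seconds.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class CachedGroup:
    start_time: float             # start of the earlier global read transaction (seconds)
    nodes: FrozenSet[str]         # node devices it read
    target_group: FrozenSet[int]  # its target global write transaction group


def reusable_group(prev: Optional[CachedGroup], now: float, nodes: Set[str],
                   max_interval: float = 1.0) -> Optional[FrozenSet[int]]:
    """Return the earlier group if the later read may reuse it, else None."""
    if prev is None:
        return None
    close_in_time = now - prev.start_time < max_interval
    covered = frozenset(nodes) <= prev.nodes  # same range, or contained in it
    return prev.target_group if close_in_time and covered else None
```

For instance, a read of {N1, N3} issued 0.4 s after a read of {N1, N3, N5} would reuse the earlier group. A read that adds N7, or one issued 1.5 s later, would build its own group.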
Furthermore, the system architecture of the embodiment of the present invention provides a lightweight, decentralized transaction processing architecture for distributed database systems. It is lightweight because it relies on a global transaction identifier generation cluster with a single function: generating global transaction identifiers in batches in memory, which is very efficient. The performance advantage is especially significant compared with a global transaction manager that implements global transaction management, conflict access control, and MVCC. It is decentralized because transaction processing depends on the individual node devices rather than on a global transaction manager. The architecture therefore has no single-point, complex, and time-consuming global transaction manager, and it achieves decentralization without sacrificing functionality.
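Batch generation of identifiers in memory can be sketched as follows. The allocator reserves a block of identifiers at once and serves requests from memory until the block runs out. This is an illustrative assumption about how batching might work, not the patent's design. The durable recording and replication of each new upper bound is only simulated by a counter.

```python
import threading


class GtidAllocator:
    """Hands out globally unique, monotonically increasing identifiers,
    reserving them in batches so most requests are served from memory."""

    def __init__(self, batch_size: int = 1000):
        self._lock = threading.Lock()
        self._batch_size = batch_size
        self._persisted_high = 0  # highest reserved identifier (simulated persistence)
        self._next = 1

    def _reserve_batch(self) -> None:
        # A real cluster would durably record the new upper bound and replicate it
        # to secondary copies before using it; here it is only a counter.
        self._persisted_high += self._batch_size

    def next_gtid(self) -> int:
        with self._lock:
            if self._next > self._persisted_high:
                self._reserve_batch()
            gtid = self._next
            self._next += 1
            return gtid


alloc = GtidAllocator(batch_size=2)
print([alloc.next_gtid() for _ in range(5)])  # [1, 2, 3, 4, 5]
```

If the primary restarts, it would resume above the last reserved bound. Any unused identifiers in that batch are skipped, which keeps the sequence unique and monotonic.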
Further, the embodiment of the present invention ensures the external consistency of the distributed database system. Events occurring in the system are observed in the order in which they occur, and that original order is still reflected when the data is queried. The global transaction identifier generation cluster provides globally unique, monotonically increasing logical time identifiers for TDSQL-like systems, which effectively ensures external consistency. For example, consider a new transaction that starts after the global read transaction. Its data remains invisible to the global read transaction, even if it commits before the global read transaction does. Moreover, the global transaction identifier generation cluster is logically a single point, and only the primary copy serves requests. However, the secondary copies can elect a new primary through a Paxos- or Raft-like protocol after the primary fails. This reduces the likelihood that the cluster becomes a single-point bottleneck.
The embodiment of the invention is suitable for any transactional database system that supports cross-node global write operations. Examples include distributed databases (SQL, NoSQL, NewSQL, relational and non-relational), MVCC-based relational databases, MVCC-based non-relational databases, and distributed big data processing systems. It is particularly suitable for building a distributed HTAP (Hybrid Transactional/Analytical Processing) database, especially a massive temporal distributed HTAP database. The embodiment lightens the transaction processing burden on global transactional and analytical database architectures, making the transaction processing mechanism simple and efficient. Each node uses a single-machine MVCC database system as its base, combined with a decentralized, lightweight transaction processing mechanism. As a result, concurrent queries face few restrictions and query operations are autonomous across nodes, which suits analytical systems well. In summary, the embodiments make high performance feasible for distributed hybrid (transactional and analytical) databases and improve overall system performance at the architectural level.
For database systems that implement MVCC on a single machine, the particular implementation affects how the embodiment applies. Consider concurrent access control that achieves serializability. A database relying on TO (timestamp ordering) + MVCC, such as PostgreSQL, guarantees serializability with SSI (Serializable Snapshot Isolation). Since SSI is also essentially an MVCC technique, the data reading method of the embodiment can be applied at any isolation level. The isolation level set by a cross-node transaction on each node device must be consistent. A database relying on locking, such as MySQL/InnoDB, uses MVCC to implement the RR (Repeatable Read) and RC (Read Committed) isolation levels. Again, the isolation level set by a cross-node transaction on each node must be consistent. The embodiment is also applicable to databases using SI (Snapshot Isolation).
It should be noted that the data reading method may be applied to backup. For a global backup, the objects to be read are all node devices in the cluster. In this case the method may suspend the commit of all write transactions that are executing, for a theoretically shorter time than a global transaction manager would. It does not, however, affect the start and execution of new transactions, or of distributed write transactions whose sub-transactions are all still in the executing state. For a non-global backup, the impact on the database system is small because only a limited number of nodes is involved. The method removes the single-point bottleneck of global transactions from the architecture, so it can fundamentally improve the overall transaction throughput of the system. In practice, the performance cost of suspending the commit of some global write transactions is very small. Overall performance is therefore not affected, and read and write operations do not block each other.
In some embodiments, node devices may maintain heartbeats with one another. Suppose a node device being read fails to respond because of a crash or a similar event. The failure is then reported to the gateway server, which is responsible for handling it, for example by rolling back the transaction, prohibiting further writes, or handling the failed write transaction. If some node devices have already completed the transaction, the user may be notified that the returned data is invalid. With a centralized result-return mechanism, no result is output to the user when any node device fails, so no notification is needed.
In some embodiments, each node cluster may use a different primary/standby architecture, and a global read transaction is executed according to that architecture. The global read transaction preferentially reads the node devices that store primary copies. Depending on the primary/standby architecture, the method provided by the embodiment of the present invention may be applied in different ways:
The first way achieves high reliability through master-slave logical replication, as in MySQL, which performs logical replication using the binlog. Write transactions occur first on the node device holding the primary copy, and secondary copies serve only for backup and read-only services. Performing reads on the host, that is, the node device storing the primary copy, with the reading method of the embodiment therefore ensures the transaction consistency of global reads. Take MySQL master-slave replication as an example. Its replication mechanism re-executes the user's SQL statements on the standby machine, that is, the node device storing the secondary copy. Each database engine assigns transaction identifiers automatically, so the transaction identifiers on tuples in the primary and secondary copies may differ. For this reason, some signaling interaction may be performed between the host and the standby machine. For example, Fig. 5 provides a signaling interaction diagram between node devices acting as host and standby machine in the database system. As shown in Fig. 5, the method may further include the following step. A master-slave database uses logical replication, and the standby machine receives an operation instruction of a global transaction transmitted by the host. The standby machine then assigns the transaction identifier of the transaction it executes according to the global transaction identifier of the global transaction. The global transaction may be a global read transaction or a global write transaction; the embodiment of the present invention does not limit this.
For example, when the host generates the binlog, it may pass the {gxid, lxid} of each global transaction to the standby machine. The standby machine receives {gxid, lxid}, executes the SQL statements in the binlog, and assigns each transaction's gxid and lxid from the received {gxid, lxid}. Table 2 below compares the original and improved MySQL binlog formats.
TABLE 2
Original binlog segment format:
#180115 16:45:48 server id 2392862531 end_log_pos 1151 Query thread_id=73 exec_time=3553 error_code=0 Xid = 22
Improved binlog format:
#180115 16:45:48 server id 2392862531 end_log_pos 1151 Query thread_id=73 exec_time=3553 error_code=0 gxid = 20 lxid = 22
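On the standby side, the improved header only needs to be parsed for the gxid and lxid fields so that the replayed transaction can take the host's identifiers. The following minimal sketch works on the header text shown in Table 2. The regular expression and the function names are assumptions for illustration; they are not part of MySQL.

```python
import re
from typing import Optional, Tuple

# Matches the "gxid = <n> lxid = <n>" suffix of the improved binlog header (hypothetical)
_IDS = re.compile(r"\bgxid\s*=\s*(\d+)\s+lxid\s*=\s*(\d+)")


def parse_ids(header: str) -> Optional[Tuple[int, int]]:
    m = _IDS.search(header)
    return (int(m.group(1)), int(m.group(2))) if m else None


def assign_on_standby(header: str, txn: dict) -> dict:
    """Give the transaction replayed on the standby the host's identifiers."""
    ids = parse_ids(header)
    if ids is None:
        raise ValueError("binlog event carries no gxid/lxid")
    txn["gxid"], txn["lxid"] = ids
    return txn


improved = ("#180115 16:45:48 server id 2392862531 end_log_pos 1151 Query "
            "thread_id=73 exec_time=3553 error_code=0 gxid = 20 lxid = 22")
print(assign_on_standby(improved, {}))  # {'gxid': 20, 'lxid': 22}
```

A header in the original format, which carries only `Xid = 22`, yields None from `parse_ids`.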
In this way, the global read consistency problem under a logical replication architecture is solved. The same approach also solves global read data consistency for cross-node transactions in any combination: master-master, master-standby, or standby-standby. For example, all data read by a global read transaction may come from standby machines. The hosts in the distributed database system are then unaffected, and overall performance remains high.
The second way achieves high reliability through master-slave physical replication, such as physical replication based on the REDO log. For example, PostgreSQL's streaming replication is a physical replication technique that relies on the REDO log. In this mode tuples are replicated at the physical level, so their transaction identifiers are preserved. However, different databases organize multi-version data differently, so their recovery processes may differ. Two typical modes are as follows:
PostgreSQL-like mode: multi-version data is stored in the data pages. When recovery is performed via the REDO log, the historical versions are therefore also recovered onto the standby machine. Historical data can be read successfully on the standby, and the data reading method of the embodiment of the present invention is unaffected.
MySQL/InnoDB-like mode: multi-version data is stored in the rollback segment in memory. Recovery is performed based on the REDO log, but the data recovered on the standby machine lacks the rollback segment information. The rollback segment information can therefore be synchronized to the standby machine. For example, the rollback segment information can be recorded into the REDO log and restored in memory from the recorded information during recovery.
The third way achieves high reliability by keeping multiple logical or physical copies consistent through a distributed consensus protocol. An example is a high-reliability system built on Paxos, Raft, or a similar protocol. This way can be further divided into logical and physical variants, which are handled as in the first and second ways, respectively.
To implement the data reading method provided by the embodiment of the present invention, both the gateway server and the node devices must be able to distinguish local transactions from global transactions. This prepares for constructing the related active transaction list.
When the gateway server allocates the transaction identifier for any transaction, the following situations are involved:
In the transaction start phase, a client starts a transaction (such as transaction T1), and the operation that opens the transaction is executed on the gateway server. There are generally three cases:
For an explicitly opened global transaction, the gateway server assigns a global transaction identifier to the transaction. For an explicitly opened local transaction, the gateway server must not allocate a global transaction identifier and allocates a local transaction identifier instead. If such a transaction later changes from a local transaction into a global transaction, it is rolled back directly. For an implicitly opened transaction, the transaction state is not specified when the transaction is opened and is therefore unknown, so the transitions of the transaction state must be tracked during execution. According to these three cases, the state of a transaction is defined as:
T1.state = GLOBAL | LOCAL | UNKNOWN
BEGIN TRANSACTION [ GLOBAL | LOCAL ] [ PRIORITY HIGH | LOW ] // GLOBAL explicitly begins a global transaction and LOCAL explicitly begins a local transaction; if neither is given, the transaction is begun implicitly. [ PRIORITY HIGH | LOW ] sets the priority of the transaction to HIGH or LOW; a default priority applies when it is omitted.
For an implicitly opened transaction, the SQL operations occurring in the transaction block must be identified to determine the state of the transaction. A transaction may contain multiple SQL statements; such a transaction is called a transaction block.
Identifying the SQL operations in the transaction block and determining the transaction state specifically includes three steps. First, the node device operated on by each SQL statement is identified through the meta-information service of the distributed database. Second, that node device is registered in the transaction's node device set. Third, the state of the transaction is changed to GLOBAL when the node device set contains two or more node devices.
When the first node device involved in the transaction is operated on, a local transaction identifier is allocated to the transaction on that node device. The node device (for example, node device N1) constructs a node-level transaction snapshot, called Snapshot-Local-N1, to obtain its related active transaction list, which it sends to the gateway server. Sending the list at this point moves the transmission of the related active transaction list in the above embodiment earlier. In essence, the list of active global transactions on the node device, including the related global write transactions, is sent to the gateway server in advance, which turns the original synchronous operation into an asynchronous one.
When any node device other than the first one involved in the transaction is operated on, for example node device N3, a local transaction identifier is allocated to the transaction on N3. N3 then constructs a node-level transaction snapshot, called Snapshot-Local-N3, to obtain its related active transaction list, which it sends to the gateway server. Further, the gateway server may forward the following to the newly added node device N3: the related active transaction list of the transaction, the global transaction identifier of the transaction, the tuples already operated on at previously operated node devices (e.g., items-N1), and so on.
On the newly added node device, when a read/write instruction of a first transaction is received, each tuple is traversed according to that instruction. If no concurrent transaction is operating on a tuple, the tuple is determined as a target tuple. If a second transaction is writing the tuple and the first transaction is a write transaction, the first transaction is rolled back. If a second transaction is writing the tuple and the first transaction is a read transaction, there are two cases. When the start time of the second transaction is later than that of the first transaction, the first transaction is allowed to read a third tuple, i.e., the old version of the tuple. When the start time of the second transaction is earlier than that of the first transaction, the priorities of the two transactions are compared and the lower-priority transaction is rolled back; if the priorities are equal, the first transaction is rolled back or retried.
In other words, while traversing the tuples, if no concurrent transaction is operating on (reading or writing) a tuple, there is no conflict and the traversal continues. Suppose a tuple is being written by another transaction T2, meaning the write has finished but is not committed. If T1 is a write transaction, T1 is rolled back. If T1 is a read transaction and T2 started later than T1, T1 is allowed to read the old version of the tuple (the version before T2's write). Otherwise, the priorities of T1 and T2 are compared and the lower-priority transaction is rolled back. If the priorities are equal, either T1 is rolled back or T1 is retried under a timeout mechanism. If another transaction T2 is only reading the tuple, there is no conflict between T1 and T2 and execution continues.
This is an incremental process within a global transaction. Each time the global transaction operates on a new node device, the tuples on that node device can be read in this way.
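The conflict rules above can be condensed into a single decision function. This is a minimal sketch, assuming each transaction exposes a start time and a numeric priority; the Txn and Action types and the resolve function are hypothetical names. The text does not say what happens when the two start times are equal; the sketch treats that case like "T2 started earlier", which is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    PROCEED = "proceed"                  # no conflict: keep traversing
    READ_OLD_VERSION = "read_old"        # T1 reads the version before T2's write
    ROLLBACK_T1 = "rollback_t1"
    ROLLBACK_T2 = "rollback_t2"
    ROLLBACK_OR_RETRY_T1 = "rollback_or_retry_t1"


@dataclass
class Txn:
    name: str
    is_write: bool
    start_time: float
    priority: int


def resolve(t1: Txn, t2: Optional[Txn], t2_is_writing: bool) -> Action:
    """Decide what T1 does on a tuple that T2 may be operating on."""
    if t2 is None or not t2_is_writing:
        return Action.PROCEED            # no concurrent op, or T2 only reads
    if t1.is_write:
        return Action.ROLLBACK_T1        # T2 wrote but has not committed
    if t2.start_time > t1.start_time:
        return Action.READ_OLD_VERSION   # T2 started later than T1
    # T2 started earlier (equal start times handled here too: an assumption)
    if t1.priority < t2.priority:
        return Action.ROLLBACK_T1
    if t1.priority > t2.priority:
        return Action.ROLLBACK_T2
    return Action.ROLLBACK_OR_RETRY_T1   # equal priorities
```

Returning an action rather than performing the rollback keeps the decision separate from the node device's actual transaction manager.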
Further, when the node device set of the transaction contains exactly one node device, the state of the transaction is set to LOCAL. In some embodiments, a transaction whose state is LOCAL is committed with a one-phase commit (1PC) when it is committed. Otherwise, a two-phase (2PC) or three-phase (3PC) commit is used.
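The state-tracking rules for a transaction block can be sketched as gateway-side bookkeeping. This covers registering node devices, promoting to GLOBAL at two or more node devices, LOCAL at exactly one, rolling back an explicitly LOCAL transaction that turns global, and choosing the commit protocol. It is a minimal sketch, assuming node identifiers are already resolved by the meta-information service. TxnTracker and its methods are hypothetical. The sketch also assumes that an explicitly declared GLOBAL transaction is never downgraded, and it returns "2PC" where the text allows either 2PC or 3PC.

```python
from enum import Enum
from typing import Set


class TxnState(Enum):
    GLOBAL = "GLOBAL"
    LOCAL = "LOCAL"
    UNKNOWN = "UNKNOWN"


class TxnTracker:
    """Hypothetical gateway-side bookkeeping for one transaction block."""

    def __init__(self, declared: TxnState = TxnState.UNKNOWN) -> None:
        self.declared = declared          # how BEGIN TRANSACTION opened it
        self.state = declared
        self.nodes: Set[str] = set()      # the transaction's node device set
        self.rolled_back = False

    def register(self, node_id: str) -> None:
        # node_id is assumed to come from the meta-information service
        self.nodes.add(node_id)
        if len(self.nodes) >= 2:
            if self.declared is TxnState.LOCAL:
                self.rolled_back = True   # explicit LOCAL turned global: roll back
            else:
                self.state = TxnState.GLOBAL
        elif self.declared is TxnState.UNKNOWN:
            self.state = TxnState.LOCAL   # exactly one node device so far

    def commit_protocol(self) -> str:
        # 1PC for local transactions; 2PC shown here (3PC is also allowed)
        return "1PC" if self.state is TxnState.LOCAL else "2PC"
```

For an implicitly opened transaction, registering N1 yields LOCAL (1PC), and a later statement touching N3 promotes it to GLOBAL (2PC).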
In some embodiments, a transaction may also be assigned a priority. In one embodiment, the priority is specified when the transaction starts. In another embodiment, the priority is determined from the state of the transaction. In particular, global transactions may be given a higher priority than local transactions to improve the processing efficiency of global transactions. For example, a global transaction may have priority 1 and a local transaction priority 0, and in a priority comparison the transaction with the higher value is processed first.
In one embodiment, the priority of a transaction may be updated according to the type of operation performed when a node device is accessed. This also improves the processing efficiency of global transactions.
For example, each time a global transaction accesses one node device for the first time, if a read operation is performed, the priority is increased by 1; if a write operation is performed, the priority is increased by 2.
In one embodiment, the priority of a transaction may also be updated according to its running time. This prevents some transactions from being repeatedly preempted by higher-priority transactions, which would make transaction processing inefficient. For example, with 1 second as the unit of running time, the priority is increased by 1 for each unit elapsed. The longer a transaction runs, the higher its relative priority, which avoids repeated rollbacks and the resulting backlog.
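The example numbers above can be combined into a single priority value, as in the sketch below. The starting values are 1 for a global and 0 for a local transaction. A global transaction then gains +1 or +2 for a read or write on its first access to each node device, and any transaction gains +1 per full second of running time. This is a minimal sketch and one of several possible combinations. The function name and parameters are hypothetical, and applying the access bonus only to global transactions follows the wording of the example rather than a stated rule.

```python
from typing import Iterable


def transaction_priority(is_global: bool,
                         first_access_ops: Iterable[str],
                         running_seconds: float) -> int:
    """Example priority built from the numbers given in the text.

    first_access_ops lists, for each node device the transaction has
    accessed, the operation ("read" or "write") of that first access.
    """
    priority = 1 if is_global else 0                 # initial value by state
    if is_global:
        for op in first_access_ops:
            priority += 2 if op == "write" else 1    # +2 write, +1 read
    priority += int(running_seconds)                 # +1 per elapsed second
    return priority
```

For instance, a global transaction that first read one node device, then wrote another, and has run for 2.5 seconds gets 1 + 1 + 2 + 2 = 6.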
It should be noted that, after updating the priority of the transaction, each node device may send the updated priority to the gateway server, so that the gateway server controls the processing of the transaction based on the updated priority.
In some embodiments, node devices in the distributed database system may construct the transaction snapshot of a global transaction in three different ways. These are illustrated by the data transfer shown in fig. 6, fig. 7, and fig. 8, respectively. In fig. 6 to fig. 8, User-1 and User-2 represent two users or sessions. User-1 initiates transaction T1, a global transaction that operates on three node devices: Server-1, Server-2, and Server-3. T1, drawn with thin lines, is an older transaction that is about to commit; assume it has committed on Server-1 and Server-2 but not yet on Server-3. User-2 initiates transaction T2, a global transaction that operates on two node devices, Server-2 and Server-3. T2, drawn with bold lines, is a new transaction that has just started. Each global transaction is initiated on a gateway server (Proxy), which acts as the coordinator. The dashed lines in fig. 6 to fig. 8 represent the transfer of related active transaction lists, and the arrows indicate the direction of transfer.
First, fig. 6 shows general-control data transfer. Referring to fig. 6, when a transaction arrives, the gateway server (Proxy) starts the transaction. It then sends a first snapshot instruction to each node device involved in the transaction, instructing each node device to return its node-level transaction snapshot. Based on the received snapshots, the gateway server determines whether concurrent transactions on the node devices conflict, and so on. Specifically, the gateway server may start the transaction by executing "Begin Transaction" and determine the node devices to be accessed by analyzing the SQL statements in the transaction block. Through a snapshot instruction, it then requests each node device to construct a node-level transaction snapshot and transmit it to the gateway server. If one SQL statement involves multiple node devices at once (a cross-node query), each of those node devices transmits its node-level transaction snapshot to the gateway server simultaneously. Note that the node devices involved in the transaction return their node-level transaction snapshots to the gateway server that started the transaction. In fig. 6, for example, Server-2 and Server-3 send their node-level transaction snapshots to Proxy-2.
Next, fig. 7 shows progressive data transfer. When a transaction arrives, the gateway server may send a second snapshot instruction to the first of the node devices involved in the transaction, instructing it to return its node-level transaction snapshot. The gateway server then forwards that snapshot to the second node device. After receiving it, the second node device takes a transaction snapshot of the global transactions in an active state on itself. It computes the union of this snapshot and the first node device's node-level transaction snapshot and sends the union to the gateway server as its own node-level transaction snapshot. The gateway server forwards the union to the next node device. In this way node-level transaction snapshots are passed between node devices via the gateway server. The gateway server processes the transaction based on the node-level transaction snapshots received from each node device to obtain the final transaction snapshot of the global transaction.
For example, referring to fig. 7, after transaction T2 is started on Proxy-2, Proxy-2 first requests Server-2 to transmit its node-level transaction snapshot S2. After receiving S2, Proxy-2 transmits it to Server-3. Server-3 constructs its own node-level transaction snapshot S3 from S2 and transmits S3 to Proxy-2. Proxy-2 then determines the global transaction snapshot of the transaction, whether concurrent transactions on the node devices conflict, and so on.
Finally, fig. 8 shows progressive transitive data transfer. The gateway server sends a third snapshot instruction to the first of the node devices involved in the transaction, instructing it to send its node-level transaction snapshot directly to the second node device. The second node device takes a transaction snapshot of the global transactions in an active state on itself. It computes the union of this snapshot and the first node device's node-level transaction snapshot and sends the union to the next node device as its own node-level transaction snapshot. By analogy, each node device receives the node-level transaction snapshot of the previous node device and combines it with the transaction snapshot taken locally to obtain its own node-level transaction snapshot, which it sends to the next node device. The last node device sends its node-level transaction snapshot to the gateway server. Snapshots are thus transferred from node device to node device, and only the final one is transferred to the gateway server.
For example, referring to fig. 8, Server-2 knows from the query that Server-3 is also involved. It therefore passes its node-level transaction snapshot S2 to Server-3 instead of to Proxy-2. When no other node devices are involved in the same query statement, the accumulated snapshots (S2, S3) are passed to Proxy-2. As an improvement, because S3 is constructed from S2, S3 is already the global transaction snapshot (assuming transaction T2 involves only two node devices), so only this global transaction snapshot needs to be passed to Proxy-2. When few node devices are involved in a transaction, little data is transmitted along the chain, which effectively reduces network congestion. When many node devices are involved, a node device threshold can be set. Progressive transitive transfer between node devices is then used only when the number of node devices involved in one SQL statement is less than or equal to the threshold, so that a long chain does not prolong the execution of the SQL statement. For example, a node device threshold of 5 means that progressive transitive transfer is used when an SQL statement involves at most 5 node devices.
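The progressive transitive transfer can be simulated as a loop over the route of node devices. Each hop unions the received snapshot with the node device's own active global transactions, and only the last hop reaches the gateway. This is a minimal sketch, assuming snapshots are represented as sets of global transaction identifiers. The function name, the dictionary input and the constant NODE_DEVICE_THRESHOLD are hypothetical; the value 5 is the example threshold from the text. What the caller does above the threshold is not specified here, so the sketch simply raises.

```python
from typing import Dict, List, Set, Tuple

NODE_DEVICE_THRESHOLD = 5  # example threshold value from the text


def chained_snapshot(route: List[str],
                     active_global: Dict[str, Set[int]]) -> Tuple[Set[int], List[str]]:
    """Pass a node-level snapshot along `route`; return the snapshot that
    reaches the gateway and the sequence of hops it travelled."""
    if len(route) > NODE_DEVICE_THRESHOLD:
        # the caller would fall back to another construction mode
        raise ValueError("route exceeds the node device threshold")
    snapshot: Set[int] = set()
    hops: List[str] = []
    for node in route:
        # union of the received snapshot and this node's active global txns
        snapshot = snapshot | active_global.get(node, set())
        hops.append(node)
    hops.append("gateway")  # only the last node device sends to the gateway
    return snapshot, hops
```

With hypothetical identifiers, chained_snapshot(["Server-2", "Server-3"], {"Server-2": {2}, "Server-3": {1, 4}}) yields the snapshot {1, 2, 4}, delivered after the hops Server-2, Server-3, gateway.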
In the embodiment of the present invention, some SQL statements obtained by decomposing a transaction involve only one node device and are update or delete operations. No snapshot needs to be constructed for such a statement, because update and delete operations act only on the latest version of the data. These single-node update and delete operations can therefore be skipped when obtaining the related active transaction list.
For query transactions, if an SQL statement in the transaction block involves multiple node devices, the related active transaction list is constructed progressively. If an SQL statement in the transaction block involves only one node device, the related active transaction list can be constructed under general control. Table 3 shows the rules for choosing the construction method.
TABLE 3 (rules for selecting the snapshot construction method; available only as an image in the original publication)
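Because Table 3 itself is not reproduced, the sketch below encodes only the rules stated in the surrounding text. A single-node update or delete needs no snapshot, a single-node query uses general control, and a multi-node statement is built progressively. It is a simplified sketch, not the content of Table 3. The function name and the string labels are hypothetical. Treating multi-node update and delete statements like multi-node queries is an assumption, since the text states the progressive rule only for queries.

```python
def snapshot_construction(stmt_kind: str, node_count: int) -> str:
    """Pick how to build the related active transaction list for one SQL
    statement of a transaction block (simplified reading of the text)."""
    if node_count < 1:
        raise ValueError("a statement touches at least one node device")
    if node_count == 1 and stmt_kind in ("update", "delete"):
        return "none"             # only the latest version is touched
    if node_count == 1:
        return "general-control"  # single-node query
    return "progressive"          # multi-node statement (assumed for DML too)
```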
Note that under progressive snapshot construction, each node device's related active transaction list must be constructed on the basis of the related active transaction list already received. The construction may include the following process:
The first node device takes a transaction snapshot of the global transactions in an active state on itself to obtain its node-level transaction snapshot, denoted S0.
Each subsequent node device takes a transaction snapshot of the global transactions in an active state on itself. Denote the result S1, and denote the final node-level transaction snapshot of this node device S. The relationship between S0 and S1 is shown in fig. 9, and visibility under S falls into three cases:
Global transactions that appear only in S0 are not active on the node device of S1, possibly because they have already committed there. Even so, S can only read the previous version of the tuples they wrote, not their newest tuples.
Global transactions that appear only in S1 are active only on the node device of S1, and S can only read the previous version of their tuples.
For global transactions that appear in both S0 and S1, the newest tuples they generate are invisible under the single-node MVCC visibility rule, so S can only read the previous version of their tuples.
Therefore, in all three cases S can only read the previous version of the tuples written by these global transactions, i.e., S = S0 ∪ S1. The same reasoning extends to any number of accessed node devices. The node-level transaction snapshot of a newly added node device is the union of two sets: the node-level transaction snapshots of all previously accessed node devices, and the global transactions in an active state on the newly added node device itself.
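The union rule can be written as a recurrence over the order in which the global read transaction accesses node devices. The symbols below are introduced only for this summary. Let A_k be the set of global transactions in an active state on the k-th accessed node device, i.e., its local transaction snapshot. Let S_k be that node device's node-level transaction snapshot. In the two-node case of the text, S0 corresponds to S_1 = A_1, S1 to A_2, and S to S_2.

```latex
S_1 = A_1, \qquad
S_k = S_{k-1} \cup A_k \quad (2 \le k \le n)
\;\Longrightarrow\;
S_n = \bigcup_{i=1}^{n} A_i
```

The snapshot delivered by the last of n node devices therefore contains every global transaction active on any accessed node device. For all of these transactions, only the previous version of their tuples is readable.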
In summary, after a node device receives the node-level transaction snapshot of the previous node device, it can obtain its own node-level transaction snapshot from two inputs: that snapshot and the global transactions in an active state on itself. If the node device is the first one accessed by the global read transaction, its node-level transaction snapshot represents the global transactions in an active state on itself. Otherwise, its node-level transaction snapshot represents the global transactions included in the previous node device's snapshot together with the global transactions in an active state on itself.
For the general-control construction of the global transaction snapshot shown in fig. 6, the gateway server receives the node-level transaction snapshots sent by the node devices involved in the global transaction. It takes their union to obtain the global transaction snapshot of the global transaction. During reading, the global transaction snapshot is issued to each node device for visibility judgment. Through this judgment, data modified on a node device by any global transaction in the union is invisible to that node device's read operations.
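The general-control flow has two parts: the gateway takes the union, and each node device walks its version chain and skips versions written by transactions in that union. This is a minimal sketch, assuming each version records the global transaction that wrote it and chains are ordered newest first. The Version type and both function names are hypothetical. The sketch deliberately ignores the rest of MVCC visibility, such as commit status and timestamps of writers outside the snapshot, and shows only the exclusion the text describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Version:
    value: str
    writer_gxid: int  # global transaction that wrote this version


def global_snapshot(node_snapshots: Dict[str, Set[int]]) -> Set[int]:
    """Gateway side: union of the node-level transaction snapshots."""
    union: Set[int] = set()
    for snap in node_snapshots.values():
        union |= snap
    return union


def visible_version(chain: List[Version], snapshot: Set[int]) -> Optional[Version]:
    """Node side: `chain` is ordered newest first; versions written by a
    transaction in the global snapshot are invisible and skipped."""
    for version in chain:
        if version.writer_gxid not in snapshot:
            return version
    return None
```

With hypothetical identifiers, a union of {1, 2} hides a newest version written by transaction 1, so the read falls back to the next older version.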
In summary, the gateway server performs the following functions under the three construction modes. In the first (general-control) mode, it receives the node-level transaction snapshots sent by each of the node devices and obtains the global transaction snapshot of the global read transaction from them. In the second (progressive) mode, it receives the node-level transaction snapshot of the first node device and sends it to the second node device. This continues until the node-level transaction snapshots of all the node devices have been received, and the gateway then obtains the global transaction snapshot of the global read transaction from them. In the third (progressive transitive) mode, it receives a global transaction snapshot sent by a third node device and uses it as the global transaction snapshot of the global read transaction. The third node device may be the node device at the end of the transfer sequence; the embodiment of the present invention does not limit which node device serves as the third node device.
The embodiment of the present invention may also involve handling the following data anomalies. To eliminate the non-repeatable read anomaly, each node device only needs to use the locally received global transaction snapshot every time data is read. The same snapshot can be used throughout the execution of one transaction without constructing a new one.
The phantom read anomaly can be eliminated in combination with gap locks for databases organized as index trees, such as InnoDB. For heap-table databases such as PostgreSQL, it can be eliminated with SSI (Serializable Snapshot Isolation).
The distributed read half-committed anomaly discussed in the embodiment of the present invention can be eliminated at the serializable isolation level by using SSI. Under SI (Snapshot Isolation), the data reading method provided by the embodiment of the present invention can eliminate it. The write skew anomaly specific to SI can be eliminated by SSI.
The snapshot construction modes for global transactions described above construct a snapshot within a single transaction. They can serve as the basis of data reading in the data reading process shown in fig. 2.
Fig. 10 is a schematic structural diagram of a data reading apparatus according to an embodiment of the present invention, and referring to fig. 10, the apparatus includes:
a sending module 1001, configured to send an indication message to multiple node devices corresponding to a global read transaction, where the indication message is used to indicate the multiple node devices to return a relevant active transaction list of the global read transaction;
a receiving module 1002, configured to receive a list of relevant active transactions of the multiple node devices, where the list of relevant active transactions of each node device is used to represent relevant global write transactions in an active state on the node device, and each relevant global write transaction corresponds to at least two node devices in the multiple node devices;
a determining module 1003, configured to determine a target global write transaction group according to the related active transaction list of the multiple node devices, where a target global write transaction included in the target global write transaction group is in an active state on a corresponding node device;
the sending module 1001 is further configured to send the target global write transaction group to the plurality of node devices;
the receiving module 1002 is further configured to receive data returned by the plurality of node devices, where the data includes data obtained based on the global read transaction and the target global write transaction group.
In a possible implementation manner, the determining module is configured to take any transaction from the related active transaction lists of the plurality of node devices whose transaction state on the corresponding node device is the executing state or the ready-to-commit state. It adds each such transaction to the target global write transaction group as a target global write transaction.
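The determining module's rule amounts to a filter over the merged lists. This is a minimal sketch, assuming each node device reports (global transaction identifier, status) pairs. The TxnStatus enumeration and the function name are hypothetical. A transaction that has committed on some node devices but is still ready-to-commit on another, like T1 in fig. 6 to fig. 8, is selected, because it is active on at least one node device.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class TxnStatus(Enum):
    EXECUTING = "executing"
    READY_TO_COMMIT = "ready-to-commit"
    COMMITTED = "committed"
    ABORTED = "aborted"


def target_global_write_group(
        lists: Dict[str, List[Tuple[int, TxnStatus]]]) -> Set[int]:
    """`lists` maps a node identifier to (gxid, status) pairs taken from
    that node device's related active transaction list."""
    group: Set[int] = set()
    for entries in lists.values():
        for gxid, status in entries:
            if status in (TxnStatus.EXECUTING, TxnStatus.READY_TO_COMMIT):
                group.add(gxid)
    return group
```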
In a possible implementation manner, the sending module is further configured to determine, when any read transaction involves a cross-node operation, the read transaction as a global read transaction, and send a generation request to the global transaction identifier generation cluster;
the receiving module is further configured to receive a global transaction identifier returned by the global transaction identifier generation cluster, and use the global transaction identifier as a transaction identifier of the global read transaction.
In one possible implementation, the apparatus further includes:
an identification module, configured to do the following for any transaction: identify the node device operated on by each SQL statement of the transaction according to the meta-information service of the distributed database, register the operated node devices in the transaction's node device set, and change the state of the transaction to a global transaction when the node device set contains two or more node devices.
In a possible implementation manner, the related active transaction list of a node device includes the node identifier of the node device and the global transaction identifiers of the related global write transactions in an active state on the node device; or
the related active transaction list of a node device includes four items: the node identifier of the node device, the number of related global write transactions, the global transaction identifier of the smallest related global write transaction on the node device, and a target bitmap block. The target bitmap block represents the transaction identifiers of the related global write transactions on the node device other than the smallest one.
In one possible implementation, the target bitmap block is a compressed bitmap block.
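The compact list form can be illustrated with an encode/decode pair. The list stores the smallest identifier explicitly, and bit i of the bitmap marks identifier min + 1 + i. This is a minimal sketch, assuming global transaction identifiers are integers. CompactActiveList, encode and decode are hypothetical names, and a plain Python integer stands in for the bitmap block; no compression is attempted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompactActiveList:
    node_id: str
    count: int      # number of related global write transactions
    min_gxid: int   # smallest related global write transaction identifier
    bitmap: int     # bit i set => min_gxid + 1 + i is also in the list


def encode(node_id: str, gxids: List[int]) -> CompactActiveList:
    ids = sorted(set(gxids))
    if not ids:
        raise ValueError("no related global write transactions")
    lowest = ids[0]
    bits = 0
    for gxid in ids[1:]:
        bits |= 1 << (gxid - lowest - 1)
    return CompactActiveList(node_id, len(ids), lowest, bits)


def decode(compact: CompactActiveList) -> List[int]:
    ids = [compact.min_gxid]
    bits, offset = compact.bitmap, 1
    while bits:
        if bits & 1:
            ids.append(compact.min_gxid + offset)
        bits >>= 1
        offset += 1
    return ids
```

For example, identifiers 100, 103 and 105 encode as min_gxid 100 with bitmap 0b10100, and decoding restores [100, 103, 105].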
In a possible implementation manner, the sending module is further configured to send a snapshot instruction to the plurality of node devices corresponding to the global read transaction. The snapshot instruction instructs each node device to return its node-level transaction snapshot, which represents the global transactions in an active state on that node device;
a snapshot obtaining module is configured to obtain a global transaction snapshot of the global read transaction based on the received node-level transaction snapshots;
the sending module is further configured to send a global transaction snapshot of the global read transaction to the plurality of node devices.
It should be noted that: in the data reading apparatus provided in the above embodiment, only the division of the functional modules is illustrated when data is read, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the data reading apparatus and the data reading method provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments, and are not described herein again.
Fig. 11 is a schematic structural diagram of a data reading apparatus according to an embodiment of the present invention, and referring to fig. 11, the apparatus includes:
a receiving module 1101, configured to receive an indication message, where the indication message is used to instruct the multiple node devices to return a relevant active transaction list of the global read transaction;
an obtaining module 1102, configured to obtain a related active transaction list, where the related active transaction list includes related global write transactions that are in progress on the node device and a transaction status of each related global write transaction, and each related global write transaction corresponds to at least two node devices in the multiple node devices;
a sending module 1103, configured to send the related active transaction list;
the receiving module 1101 is further configured to receive a target global write transaction group, where a transaction state of a target global write transaction included in the target global write transaction group on a corresponding node device is an executing state or a ready-to-commit state;
an output module 1104, configured to output data according to the target global write transaction group and the global read transaction, where the data includes data obtained based on the global read transaction and the target global write transaction group.
In one possible implementation, the obtaining module is configured to traverse the ongoing global write transactions on the node device and, when the node devices corresponding to any global write transaction include at least two of the plurality of node devices, add that global write transaction to the related active transaction list.
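As an illustration of the filtering step above, the following Python sketch keeps only the ongoing global write transactions that touch at least two of the node devices involved in the global read. The class and function names (`GlobalWriteTxn`, `build_related_active_list`) and the state labels are hypothetical; the patent does not specify data structures.

```python
from dataclasses import dataclass
from enum import Enum


class TxnState(Enum):
    # Hypothetical labels for the two states named in the text.
    EXECUTING = "executing"
    READY_TO_COMMIT = "ready_to_commit"


@dataclass
class GlobalWriteTxn:
    gtid: int               # global transaction identifier
    node_ids: frozenset     # node devices this write transaction operates on
    state: TxnState


def build_related_active_list(ongoing, read_node_ids):
    """Return (gtid, state) pairs for ongoing global write transactions
    that span at least two of the node devices of the global read."""
    related = []
    for txn in ongoing:
        # Only a write spanning two or more of the read's node devices can be
        # committed on one of them while still pending on another.
        if len(txn.node_ids & read_node_ids) >= 2:
            related.append((txn.gtid, txn.state))
    return related
```

A write that touches only one of the read's node devices cannot be seen half-committed by this read, which is why the sketch discards it.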
In one possible implementation, the output module is configured to obtain, according to a global transaction snapshot of the global read transaction, first tuples from the target tuples of the global read transaction, where a first tuple is a tuple visible to the global read transaction; and, if the first tuples of the global read transaction include any second tuples, output the first tuples other than the second tuples, where a second tuple is a tuple whose commit transaction is any target global write transaction.
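The visibility-then-exclusion rule can be sketched as follows. This assumes a simplified MVCC model that is not taken from the patent: each tuple version records the global transaction that committed it on this node, and a global snapshot is an upper bound `xmax` plus a set of still-active transaction identifiers. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TupleVersion:
    key: str
    value: str
    commit_gtid: Optional[int]  # committing global txn on this node; None if uncommitted


@dataclass(frozen=True)
class GlobalSnapshot:
    # Assumed model: a txn with gtid >= xmax, or listed in `active`,
    # had not committed when the snapshot was taken.
    xmax: int
    active: frozenset


def is_visible(t, snap):
    if t.commit_gtid is None:
        return False
    return t.commit_gtid < snap.xmax and t.commit_gtid not in snap.active


def read_output(target_tuples, snap, target_group):
    # First tuples: target tuples visible under the global snapshot.
    first = [t for t in target_tuples if is_visible(t, snap)]
    # Second tuples (committed by a target global write transaction) are dropped.
    return [t for t in first if t.commit_gtid not in target_group]
```

A version committed locally by a transaction in the target group (for example, one already committed on this node but still ready-to-commit on another) passes the snapshot check yet is dropped, which is exactly the inconsistency the exclusion step guards against. The sketch follows the text literally and simply omits such tuples.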
In one possible implementation, the apparatus further includes: a snapshot module for performing any one of the following steps:
when a first snapshot instruction is received, performing a transaction snapshot of the global transactions in an active state on the node device to obtain a node-level transaction snapshot of the node device, and sending the node-level transaction snapshot of the node device to a gateway server; or
when receiving, from a gateway server, the node-level transaction snapshot of a previous node device among the plurality of node devices, performing a transaction snapshot of the global transactions in an active state on the node device to obtain the transaction snapshot of the node device, obtaining the union of this transaction snapshot and the node-level transaction snapshot of the previous node device, and sending the union to the gateway server as the node-level transaction snapshot of the node device; or
when receiving the node-level transaction snapshot sent by the previous node device, performing a transaction snapshot of the global transactions in an active state on the node device to obtain the transaction snapshot of the node device, obtaining the union of this transaction snapshot and the node-level transaction snapshot of the previous node device, and sending the union, as the node-level transaction snapshot of the node device, to the next node device among the plurality of node devices.
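The three snapshot-collection modes above differ only in where the union is computed. A minimal sketch, assuming a node-level snapshot is just the set of global transaction identifiers active on that node, and assuming that in the first mode the gateway unions the snapshots it receives (the text does not say how the gateway combines them). Function names are hypothetical.

```python
from functools import reduce


def local_snapshot(active_global_txns):
    # Node-level snapshot: the global transaction IDs active on this node.
    return frozenset(active_global_txns)


def gateway_collect(per_node_active):
    # Mode 1: every node returns its own snapshot; the gateway takes the union.
    return frozenset().union(*(local_snapshot(a) for a in per_node_active))


def chained_collect(per_node_active):
    # Modes 2 and 3: each node unions its own snapshot with the one received
    # from the previous node and passes the result on (through the gateway or
    # directly to the next node). The last node's result covers all nodes.
    return reduce(lambda prev, active: prev | local_snapshot(active),
                  per_node_active, frozenset())
```

Both paths yield the same set; the chained variants trade one round of fan-in at the gateway for sequential hops between nodes.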
In one possible implementation, the apparatus further includes a priority determination module configured to perform at least one of the following steps:
determining the priority of a transaction based on the state of the transaction;
updating the priority of the transaction based on the type of operation performed when the node device is accessed; or
updating the priority of the transaction based on the run time of the transaction.
It should be noted that when the data reading apparatus provided in the foregoing embodiment reads data, the division into the functional modules described above is merely illustrative. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the data reading apparatus and the data reading method provided in the above embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Fig. 12 is a schematic structural diagram of an electronic device 1200 according to an embodiment of the present invention. The electronic device 1200 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 1201 and one or more memories 1202, where the memory 1202 stores at least one instruction that is loaded and executed by the processor 1201 to implement the data reading method provided in each of the foregoing method embodiments. The electronic device may further include components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, as well as other components for implementing device functions, which are not described here. Both the gateway server and the node device may adopt the hardware structure of this electronic device.
An embodiment of the present invention further provides a computer-readable storage medium applied to a server. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the operations performed by the gateway server or the node device in the data reading method of the foregoing embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (14)

1. A data reading method, applied to a gateway server, the method comprising:
sending an indication message to a plurality of node devices corresponding to a global read transaction, wherein the indication message is used for indicating the node devices to return a related active transaction list of the global read transaction;
receiving a related active transaction list of the plurality of node devices, wherein the related active transaction list of each node device is used for representing related global write transactions in an active state on the node device, and each related global write transaction corresponds to at least two node devices in the plurality of node devices;
determining a target global write transaction group according to the related active transaction lists of the plurality of node devices, wherein target global write transactions included in the target global write transaction group are in an active state on the corresponding node devices;
sending the target global write transaction group to the plurality of node devices;
receiving data returned by the plurality of node devices, wherein the data consists of the first tuples output by the plurality of node devices, excluding any second tuples when the first tuples of the global read transaction include second tuples; a first tuple is a tuple visible to the global read transaction and is obtained from the target tuples of the global read transaction according to a global transaction snapshot of the global read transaction, and a second tuple is a tuple whose commit transaction is any target global write transaction.
2. The method of claim 1, wherein determining a target global write transaction group from the list of related active transactions for the plurality of node devices comprises:
when, according to the related active transaction lists of the plurality of node devices, the transaction state of any transaction on its corresponding node device is an executing state or a ready-to-commit state, adding the transaction to the target global write transaction group as a target global write transaction.
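On the gateway side, the selection in claim 2 amounts to collecting every global write transaction that any node reports as executing or ready-to-commit. A hedged Python sketch; the per-node `{gtid: state}` mapping and the string state labels are assumptions for illustration, not the patent's wire format.

```python
from typing import Dict, Iterable, Set

# Hypothetical state labels for the two states named in claim 2.
EXECUTING = "executing"
READY_TO_COMMIT = "ready_to_commit"


def determine_target_group(related_lists: Iterable[Dict[int, str]]) -> Set[int]:
    """related_lists: one {gtid: state} mapping per node device."""
    target: Set[int] = set()
    for node_list in related_lists:
        for gtid, state in node_list.items():
            # Still executing or only prepared on some node: the transaction
            # may not be committed everywhere, so the read must exclude it.
            if state in (EXECUTING, READY_TO_COMMIT):
                target.add(gtid)
    return target
```

A transaction needs to qualify on only one node to enter the group, because a single pending participant is enough to make its effects inconsistent across nodes.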
3. The method of claim 1, further comprising:
when any read transaction relates to cross-node operation, determining the read transaction as a global read transaction, and sending a generation request to a global transaction identifier generation cluster;
receiving the global transaction identifier returned by the global transaction identifier generation cluster, and using the global transaction identifier as the transaction identifier of the global read transaction.
4. The method of claim 1, further comprising:
for any transaction, identifying, according to the meta-information service of a distributed database, the node devices operated on by each SQL statement of the transaction, registering those node devices in the node device set of the transaction, and changing the state of the transaction to a global transaction when the number of node devices in the node device set is greater than or equal to 2.
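The promotion rule in claim 4 can be sketched as a per-transaction context that accumulates node devices statement by statement. The `TxnContext` class is hypothetical, and the `lookup_nodes` callable is a stand-in for the distributed database's meta-information service; no real API is implied.

```python
from typing import Callable, Iterable, Set


class TxnContext:
    """Tracks the node devices one transaction touches (hypothetical helper)."""

    def __init__(self, txn_id: int, lookup_nodes: Callable[[str], Iterable[str]]):
        self.txn_id = txn_id
        self.lookup_nodes = lookup_nodes  # stand-in for the meta-information service
        self.node_set: Set[str] = set()
        self.is_global = False

    def register_statement(self, sql: str) -> None:
        # Register every node device this SQL statement operates on.
        self.node_set.update(self.lookup_nodes(sql))
        # Promote to a global transaction once two or more nodes are involved.
        if len(self.node_set) >= 2:
            self.is_global = True
```

Because the node set only grows, a transaction promoted to global stays global for the rest of its lifetime.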
5. The method of claim 1,
the related active transaction list of a node device comprises a node identifier of the node device and the global transaction identifiers of the related global write transactions in an active state on the node device; or
the related active transaction list of a node device comprises a node identifier of the node device, the number of related global write transactions, the global transaction identifier of the minimum related global write transaction on the node device, and a target bitmap block, wherein the target bitmap block represents the transaction identifiers of the related global write transactions on the node device other than the minimum related global write transaction.
6. The method of claim 5 wherein the target bitmap block is a compressed bitmap block.
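The second list format in claims 5 and 6 stores the smallest global transaction identifier explicitly and encodes the remaining identifiers as bit offsets from it. The sketch below is one possible encoding under that reading; zlib is used only as a stand-in for the unspecified bitmap compression (production systems often use schemes such as Roaring bitmaps instead), and the function names are hypothetical.

```python
import zlib
from typing import List, Tuple


def encode_related_list(node_id: str, gtids: List[int]) -> Tuple[str, int, int, bytes]:
    """Encode as (node id, count, minimum gtid, compressed bitmap block)."""
    ids = sorted(set(gtids))
    if not ids:
        raise ValueError("empty related active transaction list")
    base = ids[0]
    bitmap = 0
    for g in ids[1:]:
        # Bit i set  <=>  transaction base + 1 + i is in the list.
        bitmap |= 1 << (g - base - 1)
    raw = bitmap.to_bytes((bitmap.bit_length() + 7) // 8, "little")
    return node_id, len(ids), base, zlib.compress(raw)


def decode_related_list(block: Tuple[str, int, int, bytes]) -> List[int]:
    _node_id, count, base, packed = block
    bitmap = int.from_bytes(zlib.decompress(packed), "little")
    ids = [base]
    i = 0
    while bitmap >> i:
        if (bitmap >> i) & 1:
            ids.append(base + 1 + i)
        i += 1
    if len(ids) != count:
        raise ValueError("count field does not match bitmap contents")
    return ids
```

Because active transaction identifiers on one node tend to lie close together, the offsets stay small and the bitmap stays compact, which is presumably the motivation for this format.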
7. The method of claim 1, further comprising:
sending a snapshot instruction to the plurality of node devices corresponding to the global read transaction, wherein the snapshot instruction is used to instruct the node devices to return their node-level transaction snapshots, and the node-level transaction snapshot of a node device represents the global transactions in an active state on that node device;
acquiring a global transaction snapshot of the global read transaction based on the received node-level transaction snapshots;
and sending the global transaction snapshot of the global read transaction to the plurality of node devices.
8. A data reading method, applied to a node device, the method comprising:
receiving an indication message sent by a gateway server, wherein the indication message is used for indicating a plurality of node devices to return a related active transaction list of global read transactions;
obtaining a related active transaction list, where the related active transaction list includes the related global write transactions in progress on the node device and the transaction state of each related global write transaction, and each related global write transaction corresponds to at least two node devices in the plurality of node devices;
sending the related active transaction list;
receiving a target global write transaction group sent by the gateway server, wherein each target global write transaction in the group is in an executing state or a ready-to-commit state on its corresponding node device;
acquiring, according to the global transaction snapshot of the global read transaction, first tuples from the target tuples of the global read transaction, wherein a first tuple is a tuple visible to the global read transaction;
and if the first tuples of the global read transaction include any second tuples, outputting the first tuples other than the second tuples, wherein a second tuple is a tuple whose commit transaction is any target global write transaction.
9. The method of claim 8, wherein obtaining the list of related active transactions comprises:
traversing an ongoing global write transaction on the node device;
when the node device corresponding to any global write transaction comprises at least two node devices in the plurality of node devices, adding the global write transaction to the related active transaction list.
10. The method of claim 8, further comprising:
when a first snapshot instruction is received, performing a transaction snapshot of the global transactions in an active state on the node device to obtain a node-level transaction snapshot of the node device, and sending the node-level transaction snapshot of the node device to the gateway server; or
when receiving, from the gateway server, the node-level transaction snapshot of a previous node device among the plurality of node devices, performing a transaction snapshot of the global transactions in an active state on the node device to obtain the transaction snapshot of the node device, obtaining the union of this transaction snapshot and the node-level transaction snapshot of the previous node device, and sending the union to the gateway server as the node-level transaction snapshot of the node device; or
when receiving the node-level transaction snapshot sent by the previous node device, performing a transaction snapshot of the global transactions in an active state on the node device to obtain the transaction snapshot of the node device, obtaining the union of this transaction snapshot and the node-level transaction snapshot of the previous node device, and sending the union, as the node-level transaction snapshot of the node device, to the next node device among the plurality of node devices.
11. The method of claim 8, further comprising at least one of the following steps:
determining the priority of a transaction based on the state of the transaction;
updating the priority of the transaction based on the type of operation performed when the node device is accessed; or
updating the priority of the transaction based on the run time of the transaction.
12. A data reading apparatus, applied to a gateway server, the apparatus comprising:
a sending module, configured to send an indication message to a plurality of node devices corresponding to a global read transaction, where the indication message is used to instruct the plurality of node devices to return a related active transaction list of the global read transaction;
a receiving module, configured to receive a related active transaction list of the multiple node devices, where the related active transaction list of each node device is used to represent related global write transactions in an active state on the node device, and each related global write transaction corresponds to at least two node devices in the multiple node devices;
a determining module, configured to determine a target global write transaction group according to a related active transaction list of the multiple node devices, where a target global write transaction included in the target global write transaction group is in an active state on a corresponding node device;
the sending module is further configured to send the target global write transaction group to the plurality of node devices;
the receiving module is further configured to receive data returned by the plurality of node devices, wherein the data consists of the first tuples output by the plurality of node devices, excluding any second tuples when the first tuples of the global read transaction include second tuples; a first tuple is a tuple visible to the global read transaction and is obtained from the target tuples of the global read transaction according to a global transaction snapshot of the global read transaction, and a second tuple is a tuple whose commit transaction is any target global write transaction.
13. An electronic device, comprising a processor and a memory, wherein at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the operations performed by the data reading method as claimed in any one of claims 1 to 11.
14. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, which is loaded and executed by a processor to implement the operations performed by the data reading method as provided in any one of claims 1 to 11.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910021182.3A CN109710388B (en) 2019-01-09 2019-01-09 Data reading method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910021182.3A CN109710388B (en) 2019-01-09 2019-01-09 Data reading method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109710388A CN109710388A (en) 2019-05-03
CN109710388B true CN109710388B (en) 2022-10-21

Family

ID=66261199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910021182.3A Active CN109710388B (en) 2019-01-09 2019-01-09 Data reading method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109710388B (en)

Also Published As

Publication number Publication date
CN109710388A (en) 2019-05-03

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40001814

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230918

Address after: 518057 Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen City, Guangdong Province, 35 floors

Patentee after: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.

Patentee after: TENCENT CLOUD COMPUTING (BEIJING) Co.,Ltd.

Address before: 518057 Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen City, Guangdong Province, 35 floors

Patentee before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.