CN113377774B

CN113377774B - Data query method, device and electronic device

Info

Publication number: CN113377774B
Application number: CN202110645512.3A
Authority: CN
Inventors: 李江伟
Original assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Current assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2025-03-07
Anticipated expiration: 2041-06-09
Also published as: CN113377774A

Abstract

The present invention provides a data query method, device and electronic device. A computing node receives a data query instruction that is sent by a user end and carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition; a hash table of the first logic table is established in a designated memory according to the data of the first logic table pulled from multiple storage nodes; first data satisfying the query condition is obtained from the data of the second logic table stored in the multiple storage nodes; second data satisfying the query condition is obtained from the hash table; and a query result determined based on the first data and the second data is returned to the user end. In this method, the computing node constructs a single hash table in a designated memory accessible to each storage node, and implements the data query through that hash table and the data stored in the multiple storage nodes. Compared with an approach in which the computing node stores multiple hash tables, this reduces memory and CPU usage, as well as network bandwidth usage during the data query.

Description

Data query method and device and electronic equipment

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data query method, a data query device, and an electronic device.

Background

The hash join algorithm is generally a method of joining two tables by means of a hash operation to obtain a joined result set.
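As a minimal sketch of the general technique (not code from this patent), a hash join can be written as a build phase that indexes one table by its join key and a probe phase that looks up each row of the other table. The function name `hash_join` and the sample rows are hypothetical.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Join two lists of dict rows on equality of the given key columns."""
    # Build phase: index one table by its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: look up each row of the other table in the hash table.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append((row, match))
    return joined

# Hypothetical sample data.
t1 = [{"id": 1, "city": "beijing"}, {"id": 2, "city": "tianjin"}]
t2 = [{"id": 2, "name": "xiaowang"}, {"id": 3, "name": "xiaoli"}]
result = hash_join(t1, t2, "id", "id")
```

Only the row with id 2 appears in both tables, so `result` holds a single matched pair.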

In a distributed database, a computing node is connected to a plurality of storage nodes, and the computing node communicates with each storage node through a hash join algorithm. Specifically, each storage node stores different inner table data. The computing node pulls the inner table data stored in all the storage nodes to build a full hash table and, so that data between the computing node and the storage nodes can be processed in parallel, keeps one copy of the hash table for each storage node in its local memory. Because the computing node stores multiple copies of the hash table, memory and CPU usage is high; in addition, the data for each of the multiple copies must be pulled from the storage nodes before the hash table can be queried, so network bandwidth usage is also high.

Disclosure of Invention

The invention aims to provide a data query method, a data query device and electronic equipment, so as to reduce the occupied amount of a memory and a CPU and reduce the occupied amount of network bandwidth.

In a first aspect, the invention provides a data query method applied to a computing node of a distributed database. The computing node is communicatively connected to a plurality of storage nodes, which store data of a first logic table and data of a second logic table. The method comprises: receiving a data query instruction sent by a user side, wherein the data query instruction carries a first identifier of the first logic table, a second identifier of the second logic table and a query condition; pulling the data of the first logic table from the plurality of storage nodes, and establishing a hash table corresponding to the first logic table in a specified memory according to the pulled data, wherein the hash table in the specified memory is used for providing data to be queried for each storage node; acquiring first data meeting the query condition from the data of the second logic table stored by the plurality of storage nodes, and acquiring second data meeting the query condition from the hash table; and determining a query result corresponding to the data query instruction based on the first data and the second data, and returning the query result to the user side.

In an optional embodiment, the step of determining the query result corresponding to the data query instruction based on the first data and the second data includes determining a plurality of target information meeting the query condition from the first data and the second data, performing splicing processing on the plurality of target information to obtain a spliced result, and determining the spliced result as the query result corresponding to the data query instruction.

In an optional embodiment, the step of performing the splicing processing on the plurality of target information to obtain a splicing result includes splicing the plurality of target information by using a plurality of threads to obtain the splicing result.

In an alternative embodiment, the step of acquiring the first data meeting the query condition from the data of the second logic table stored by the plurality of storage nodes includes, for each storage node in the plurality of storage nodes, executing the following operations in parallel, namely pulling the data of the second logic table from the storage node, and querying the first data meeting the query condition from the data of the second logic table.

In an alternative embodiment, after the step of querying the first data meeting the query condition from the data of the second logic table, the method further includes deleting the pulled data of the second logic table.

In an alternative embodiment, after the step of returning the query result to the client, the method further includes releasing the hash table from the specified memory.

In a second aspect, the invention provides a data query device arranged on a computing node of a distributed database. The computing node is communicatively connected to a plurality of storage nodes, which store data of a first logic table and data of a second logic table. The device comprises: an instruction receiving module, configured to receive a data query instruction sent by a user end, wherein the data query instruction carries a first identifier of the first logic table, a second identifier of the second logic table and a query condition; a hash table establishing module, configured to pull the data of the first logic table from the plurality of storage nodes and establish a hash table corresponding to the first logic table in a designated memory according to the pulled data, wherein the hash table in the designated memory is used for providing data to be queried for each storage node; a data query module, configured to acquire first data meeting the query condition from the data of the second logic table stored by the plurality of storage nodes, and acquire second data meeting the query condition from the hash table; and a result returning module, configured to determine a query result corresponding to the data query instruction based on the first data and the second data, and return the query result to the user end.

In an optional embodiment, the result returning module is further configured to determine a plurality of target information that meets the query condition from the first data and the second data, perform a splicing process on the plurality of target information to obtain a spliced result, and determine the spliced result as a query result corresponding to the data query instruction.

In a third aspect, the present invention provides an electronic device comprising a processor and a memory storing machine executable instructions executable by the processor to implement the data querying method of any of the preceding embodiments.

In a fourth aspect, the present invention provides a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement a data query method according to any one of the preceding embodiments.

The embodiment of the invention has the following beneficial effects:

In the data query method, the data query device and the electronic equipment provided by the invention, a computing node first receives a data query instruction sent by a user side, where the data query instruction carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition. The computing node then pulls the data of the first logic table from the plurality of storage nodes connected to it and builds a hash table corresponding to the first logic table in a designated memory according to the pulled data. Next, it acquires first data meeting the query condition from the data of the second logic table stored by the plurality of storage nodes, and acquires second data meeting the query condition from the hash table. Finally, it determines a query result corresponding to the data query instruction based on the first data and the second data, and returns the query result to the user side. In this approach, the computing node constructs a single hash table in a designated memory that each storage node can access, and performs the data query using that hash table together with the data stored in the plurality of storage nodes. Compared with an approach in which the computing node stores a plurality of hash tables, this reduces memory and CPU usage and reduces network bandwidth usage during the data query.

Additional features and advantages of the invention will be set forth in the description which follows, or in part will be obvious from the description, or may be learned by practice of the invention.

In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a hash join algorithm in the related art;

FIG. 2 is a flowchart of a data query method according to an embodiment of the present invention;

FIG. 3 is a flowchart of another data query method according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a data query device according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In a distributed database, a computing node is connected to a plurality of storage nodes, and the computing node communicates with each storage node through a hash join algorithm. FIG. 1 is a schematic diagram of the hash join algorithm in the related art. The computing node in FIG. 1 is responsible for parsing SQL (Structured Query Language) and for parallel computation. FIG. 1 includes two storage nodes (or, alternatively, two storage node clusters), Group0 and Group1, and each storage node stores different inner table data and outer table data. In FIG. 1, T1 is the inner table and T2 is the outer table; T1 and T2 are both logic tables; t1_0, t1_1, t1_2 and t1_3 are the physical sub-tables corresponding to T1, and t2_0, t2_1, t2_2 and t2_3 are the physical sub-tables corresponding to T2.

Specifically, the computing node may pull the inner table data stored in all the storage nodes and build a full hash table for each storage node, so the hash tables corresponding to the storage nodes are identical. The hash table corresponding to a given storage node is stored in a region of the computing node's memory that can be accessed only by that storage node. In FIG. 1, two hash tables are constructed in the computing node so that data between the computing node and the storage nodes can be processed in parallel. However, the computing node then stores multiple copies of the hash table, which increases memory and CPU usage, and the data for each of those copies must be pulled from the storage nodes before the hash table can be queried, which increases network bandwidth usage.

Based on the problems, the embodiment of the invention provides a data query method, a data query device and electronic equipment, and the technology can be applied to the scenes of data access, data query and the like of a distributed database. For the sake of understanding the present embodiment, first, a data query method disclosed in the present embodiment of the present invention is described in detail, where the method is applied to a computing node of a distributed database, where the computing node is communicatively connected to a plurality of storage nodes, where the plurality of storage nodes store data of a first logical table and data of a second logical table, and as shown in fig. 2, the method includes the following specific steps:

Step S202, a data query instruction sent by a user terminal is received, wherein the data query instruction carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition.

The data query instruction may be sent by the user through a user side, where the user side may be a mobile terminal (for example, a mobile phone or a tablet computer) or a computer. The data query instruction carries the identifiers of the first logic table and the second logic table to be queried, together with the query condition. The data of the first logic table may be inner table data stored in the storage nodes, and the data of the second logic table may be outer table data stored in the storage nodes. Each storage node may store part of the data of the first logic table and part of the data of the second logic table, and the data stored in different storage nodes are different; in other words, combining the data stored in the plurality of storage nodes yields the complete data of the first logic table and the complete data of the second logic table.
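The relationship between a logic table and the parts held by each storage node can be sketched as follows. The dictionary layout, the node names reused from FIG. 1 and the sample rows are assumptions made for illustration only.

```python
from itertools import chain

# Hypothetical contents: each storage node holds a different part of T1 and T2.
storage_nodes = {
    "Group0": {"T1": [(0, "xiaoming")], "T2": [(0, "a")]},
    "Group1": {"T1": [(1, "xiaohong")], "T2": [(1, "b")]},
}

def logical_table(nodes, table_id):
    """Concatenate the parts of one logic table held by every storage node."""
    return list(chain.from_iterable(node[table_id] for node in nodes.values()))

t1_full = logical_table(storage_nodes, "T1")
t2_full = logical_table(storage_nodes, "T2")
```

Combining the parts from both nodes gives the complete rows of each logic table.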

In a specific implementation, a unique identifier is set for each logic table when the logic table is constructed, so that the corresponding logic table can be found. A logic table is usually created by an SQL statement, and the inner table and the outer table are defined by the positions of the table identifiers in the SQL statement. For example, in the SQL statement select * from T2 left join T1 on T2.id = T1.id, T2 is on the left and T1 is on the right, so T2 is the outer table and T1 is the inner table; T1 and T2 can also be understood as the identifiers of the inner table and the outer table.
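The roles of the two tables in a left join can be checked with a small sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in; the distributed database of the patent is not SQLite, and the column `v` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER, v TEXT);  -- inner table (right side of the join)
    CREATE TABLE T2 (id INTEGER, v TEXT);  -- outer table (left side of the join)
    INSERT INTO T1 VALUES (1, 'inner-1');
    INSERT INTO T2 VALUES (1, 'outer-1'), (2, 'outer-2');
""")
rows = conn.execute(
    "SELECT T2.id, T2.v, T1.v FROM T2 LEFT JOIN T1 ON T2.id = T1.id ORDER BY T2.id"
).fetchall()
conn.close()
```

Every row of the outer table T2 appears in `rows`; the row with id 2 has no partner in the inner table T1, so its T1 column is `None`.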

Step S204, pulling data of the first logic table from a plurality of storage nodes, and establishing a hash table corresponding to the first logic table in a designated memory according to the pulled data, wherein the hash table of the designated memory is used for providing data to be queried for each storage node.

After receiving the data query instruction, the computing node pulls the data of the first logic table (corresponding to the inner table data) from each of the storage nodes connected to it, and then builds a full hash table corresponding to the first logic table in a preset specified memory according to the pulled data. The specified memory is a global memory in the computing node. Because the hash table corresponding to the first logic table is stored in the global memory, each storage node connected to the computing node can access the specified memory and obtain the data to be queried from it.

In the invention, the computing node stores only one copy of the inner table data and builds only one full hash table, rather than one full hash table per storage node, which reduces memory usage.
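The difference between one shared hash table and one copy per storage node can be sketched as below. The function name `build_shared_hash_table`, the tuple row layout and the split of rows across two nodes are assumptions; the patent does not specify an implementation.

```python
def build_shared_hash_table(inner_shards, key_index):
    """Build one full hash table from the inner-table rows of every shard."""
    shared = {}
    for shard in inner_shards:
        for row in shard:
            shared.setdefault(row[key_index], []).append(row)
    return shared

# Hypothetical parts of the inner table held by two storage nodes.
group0 = [(0, "xiaoming", "beijing", 20), (1, "xiaohong", "tianjin", 18)]
group1 = [(2, "xiaojing", "xian", 21), (3, "xiaowei", "chengdu", 22)]

# Approach of the invention: a single table in global memory.
shared_table = build_shared_hash_table([group0, group1], key_index=2)

# Related-art layout for comparison: one identical full copy per storage node.
per_node_copies = [build_shared_hash_table([group0, group1], 2) for _ in range(2)]
```

The per-node copies hold the same content as the shared table but are separate objects, so the related-art layout in this sketch stores the same rows once per storage node.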

Step S206, acquiring first data meeting the query condition from the data of the second logic table stored by the plurality of storage nodes, and acquiring second data meeting the query condition from the hash table.

After the hash table is constructed, the computing node acquires the data meeting the query condition in the data query instruction (corresponding to the first data) from the data of the second logic table stored in the plurality of storage nodes; in other words, the computing node acquires the first data meeting the query condition from the outer table data stored in the storage nodes. The computing node then acquires the second data meeting the query condition from the hash table stored in the specified memory.

Step S208, based on the first data and the second data, determining a query result corresponding to the data query instruction, and returning the query result to the user side.

In a specific implementation, after the first data and the second data are obtained, they are combined and processed according to the query condition to determine the final query result, and the query result is returned to the user side, which completes the data query.

In the above data query method, the computing node first receives a data query instruction sent by a user side, where the data query instruction carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition. The computing node then pulls the data of the first logic table from the plurality of storage nodes connected to it and builds a hash table corresponding to the first logic table in a specified memory according to the pulled data. Next, it acquires first data meeting the query condition from the data of the second logic table stored by the plurality of storage nodes, and acquires second data meeting the query condition from the hash table. Finally, it determines a query result corresponding to the data query instruction based on the first data and the second data, and returns the query result to the user side. In this approach, the computing node constructs a single hash table in a specified memory that each storage node can access, and performs the data query using that hash table together with the data stored in the plurality of storage nodes. Compared with an approach in which the computing node stores a plurality of hash tables, this reduces memory and CPU usage and reduces network bandwidth usage during the data query.
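Steps S204 to S208 can be sketched end to end for the simple case of an equality condition on one column. This is an illustrative single-process model, not the patented implementation: the function name `run_query`, the dict-based rows and the in-memory "storage nodes" are assumptions, and only matching rows are returned.

```python
def run_query(storage_nodes, first_id, second_id, key):
    """Sketch of steps S204 to S208 for an equality condition on `key`."""
    # S204: pull first-table rows from every node and build one hash table.
    hash_table = {}
    for node in storage_nodes:
        for row in node[first_id]:
            hash_table.setdefault(row[key], []).append(row)
    result = []
    for node in storage_nodes:
        # S206: take second-table rows (first data) and probe the hash
        # table for the matching first-table rows (second data).
        for outer in node[second_id]:
            for inner in hash_table.get(outer[key], []):
                # S208: combine both sides into one result entry.
                result.append((outer, inner))
    return result

# Hypothetical data: the matching rows live on different storage nodes.
nodes = [
    {"t1": [{"id": 1, "v": "a"}], "t2": [{"id": 2, "w": "x"}]},
    {"t1": [{"id": 2, "v": "b"}], "t2": [{"id": 1, "w": "y"}]},
]
query_result = run_query(nodes, "t1", "t2", "id")
```

In this sample each second-table row matches a first-table row stored on the other node, which is why the hash table has to cover the first table's data from all storage nodes.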

An embodiment of the invention also provides another data query method, implemented on the basis of the method of the above embodiment. This method mainly describes the specific process of acquiring first data meeting the query condition from the data of the second logic table stored by the plurality of storage nodes (implemented by step S306 below), and the specific process of determining the query result corresponding to the data query instruction based on the first data and the second data (implemented by steps S310 to S312 below). As shown in FIG. 3, the method includes the following specific steps:

Step S302, a data query instruction sent by a user terminal is received, wherein the data query instruction carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition.

Step S304, the data of the first logic table is pulled from the plurality of storage nodes, and a hash table corresponding to the first logic table is built in the appointed memory according to the pulled data.

Step S306, for each storage node in the plurality of storage nodes, the following operation is executed in parallel, namely the data of the second logic table is pulled from the storage nodes, and the first data meeting the query condition is queried from the data of the second logic table.

The computing node may concurrently pull the data of the second logical table (corresponding to the outer table data) from each of the plurality of storage nodes, and then query the first data satisfying the query condition from the pulled data of the second logical table.

In a specific implementation, after the first data meeting the query condition is queried from the data of the second logic table, the pulled data of the second logic table needs to be deleted to release the memory, so that the occupation of the memory is reduced.
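Step S306 and the subsequent deletion of the pulled data can be sketched with a thread pool, one task per storage node. The helper names `pull_and_filter` and `matches_key`, the in-memory node dictionaries and the sample rows are assumptions; a real system would pull the rows over the network.

```python
from concurrent.futures import ThreadPoolExecutor

def pull_and_filter(node, second_id, predicate):
    """Pull one node's outer-table rows and keep only the matching rows."""
    pulled = list(node[second_id])      # stands in for a network pull
    first_data = [row for row in pulled if predicate(row)]
    del pulled                          # drop the local copy of the pulled batch
    return first_data

# Hypothetical outer-table parts on two storage nodes.
nodes = [
    {"t2": [(0, "xiaoli", "shanghai", 30)]},
    {"t2": [(1, "xiaowang", "tianjin", 25)]},
]
hash_keys = {"beijing", "tianjin", "xian", "chengdu"}  # keys assumed to be in the hash table

def matches_key(row):
    return row[2] in hash_keys

# One task per storage node; pool.map returns results in input order.
with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
    parts = list(pool.map(lambda n: pull_and_filter(n, "t2", matches_key), nodes))
first_data = [row for part in parts for row in part]
```

Only the filtered rows survive each task, so the full pulled batch does not stay in memory once its node has been processed.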

Step S308, obtaining second data meeting the query condition from the hash table.

In step S310, a plurality of target information satisfying the query condition is determined from the first data and the second data.

And step S312, performing splicing processing on the plurality of target information to obtain a splicing result, and determining the splicing result as a query result corresponding to the data query instruction.

In specific implementation, a plurality of target information meeting the query conditions can be determined from the first data and the second data, then the target information is required to be spliced, and a spliced result obtained after the splicing is used as a query result corresponding to the data query instruction.

In some embodiments, in order to return the data meeting the query condition as soon as possible, multiple threads may be used to splice the plurality of target information to obtain the spliced result. In other words, this method may use one thread to pull the outer table data from the plurality of storage nodes and then use multiple threads to splice the data meeting the query condition (corresponding to the target information).
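Multi-threaded splicing can be sketched as mapping a splice function over the matched pairs with a thread pool. The function name `splice`, the column positions and the second sample pair are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def splice(pair):
    """Join the wanted fields of one matched (inner, outer) pair into a row."""
    inner, outer = pair
    return (inner[2], inner[1], outer[1])  # native place, inner name, outer name

# Hypothetical matched pairs: (row from the hash table, row from the outer table).
matched = [
    ((1, "xiaohong", "tianjin", 18), (1, "xiaowang", "tianjin", 25)),
    ((0, "xiaoming", "beijing", 20), (7, "xiaoli", "beijing", 31)),
]

with ThreadPoolExecutor(max_workers=4) as pool:
    spliced = list(pool.map(splice, matched))
```

`pool.map` keeps the input order, so the spliced rows line up with the matched pairs even though they may be processed by different threads.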

Step S314, the query result is returned to the user terminal.

In a specific implementation, after the query result is returned to the user side, the hash table is released from the specified memory. This frees the space of the specified memory and leaves sufficient memory for storing inner table data and constructing the hash table in the next data query.
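One way to make sure the hash table is always released after a query is to tie its lifetime to a context manager. This is only a sketch: the dictionary `global_memory` stands in for the specified memory, and the name `query_hash_table` is hypothetical.

```python
from contextlib import contextmanager

global_memory = {}  # hypothetical stand-in for the specified (global) memory

@contextmanager
def query_hash_table(name, rows, key_index):
    """Keep a hash table in global memory only for the duration of one query."""
    global_memory[name] = {row[key_index]: row for row in rows}
    try:
        yield global_memory[name]
    finally:
        # Release the table once the query has finished.
        global_memory.pop(name, None)

with query_hash_table("h1", [(1, "xiaohong", "tianjin", 18)], 2) as h1:
    hit = h1["tianjin"]
```

After the `with` block ends, `global_memory` no longer contains `h1`, so the space is available for the next query.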

To facilitate understanding of the embodiments of the present invention, a detailed description is given below, taking as an example a data query request whose query condition is to find the data with the same native place in two logic tables. Suppose the data query request in SQL sent by the client is select t2.birthplace, t2.name, t1.name from t2 left join t1 on t2.birthplace = t1.birthplace, where t1 is the first identifier of the first logic table, t2 is the second identifier of the second logic table, and birthplace denotes the native place. After receiving the data query request, the computing node pulls the stored data of the first logic table from the plurality of storage nodes connected to it according to the first identifier, and constructs a hash table corresponding to the first logic table according to the pulled data. The data of the first logic table and the hash table are both stored in a global memory (corresponding to the specified memory). For example, a hash table h1 is constructed; because the query looks for data with the same native place in the two tables, the hash table uses the third column of the data as the key, and its contents are as follows:

beijing:0,xiaoming,beijing,20;

tianjin:1,xiaohong,tianjin,18;

xian:2,xiaojing,xian,21;

chengdu:3,xiaowei,chengdu,22;

wherein the colon is preceded by a key and followed by a value.

The computing node then concurrently pulls the data of the second logic table from each storage node according to the second identifier, determines the first data meeting the query condition from that data, and queries the hash table h1 stored in the global memory to obtain the second data. Suppose the first data determined from the second logic table is 1,xiaowang,tianjin,25. The native place tianjin is extracted and used to query the hash table, which gives the second data 1,xiaohong,tianjin,18, whose native place is also tianjin. The data meeting the condition in the two tables are therefore 1,xiaohong,tianjin,18 in the hash table and 1,xiaowang,tianjin,25 in the second logic table. Assuming that only the native place, the inner table name (the name in the hash table) and the outer table name (the name in the second logic table) are required, the data meeting the condition are processed and spliced (multi-threaded concurrent processing may be used) to obtain the splicing result tianjin,xiaohong,xiaowang, which is then returned to the caller.
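The worked example can be reproduced in a few lines. The sketch below assumes the outer-table row reads 1,xiaowang,tianjin,25 and models h1 as a plain dictionary keyed by the third column; the variable names are hypothetical.

```python
# Hash table h1 from the text: the key is the third column (native place).
h1 = {
    "beijing": (0, "xiaoming", "beijing", 20),
    "tianjin": (1, "xiaohong", "tianjin", 18),
    "xian": (2, "xiaojing", "xian", 21),
    "chengdu": (3, "xiaowei", "chengdu", 22),
}

first_data = (1, "xiaowang", "tianjin", 25)  # row from the second logic table
second_data = h1[first_data[2]]              # probe h1 with the native place

# Splice: native place, inner table name, outer table name.
spliced = (second_data[2], second_data[1], first_data[1])
```

Probing h1 with tianjin returns the xiaohong row, and the spliced row combines the native place with the two names.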

In the above data query method, the computing node first receives a data query instruction that is sent by a user terminal and carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition, and builds a hash table corresponding to the first logic table in a designated memory according to the data of the first logic table pulled from a plurality of storage nodes. Then, for each of the plurality of storage nodes, the following operations are executed in parallel: the data of the second logic table is pulled from the storage node, and first data meeting the query condition is queried from the data of the second logic table. Next, second data meeting the query condition is obtained from the hash table, a plurality of target information meeting the query condition is determined from the first data and the second data, and the plurality of target information is spliced to obtain a splicing result. The splicing result is determined as the query result corresponding to the data query instruction, and the query result is returned to the user terminal. This method only needs to keep one copy of the inner table data in the computing node to construct a hash table, which saves memory, CPU and network resources and improves hash join efficiency.

Corresponding to the above embodiment of the data query method, an embodiment of the invention provides a data query device arranged at a computing node of a distributed database, where the computing node is communicatively connected to a plurality of storage nodes, and the plurality of storage nodes store data of a first logic table and data of a second logic table. As shown in FIG. 4, the device includes:

The instruction receiving module 40 is configured to receive a data query instruction sent by the user side, where the data query instruction carries a first identifier of the first logic table, a second identifier of the second logic table, and a query condition.

The hash table establishing module 41 is configured to pull data of the first logical table from a plurality of storage nodes, and establish a hash table corresponding to the first logical table in a specified memory according to the pulled data, where the hash table of the specified memory is used for providing data to be queried for each storage node.

The data query module 42 is configured to obtain first data meeting the query condition from the data of the second logic tables stored in the plurality of storage nodes, and obtain second data meeting the query condition from the hash table.

The result returning module 43 is configured to determine a query result corresponding to the data query instruction based on the first data and the second data, and return the query result to the user side.

In the above data query device, the computing node first receives a data query instruction sent by a user terminal, where the data query instruction carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition. The computing node then pulls the data of the first logic table from the plurality of storage nodes connected to it and builds a hash table corresponding to the first logic table in a specified memory according to the pulled data. Next, it acquires first data meeting the query condition from the data of the second logic table stored by the plurality of storage nodes, and acquires second data meeting the query condition from the hash table. Finally, it determines a query result corresponding to the data query instruction based on the first data and the second data, and returns the query result to the user terminal. In this approach, the computing node constructs a single hash table in a specified memory that each storage node can access, and performs the data query using that hash table together with the data stored in the plurality of storage nodes. Compared with an approach in which the computing node stores a plurality of hash tables, this reduces memory and CPU usage and reduces network bandwidth usage during the data query.

Further, the result returning module 43 is further configured to determine a plurality of target information that satisfies the query condition from the first data and the second data, perform a stitching process on the plurality of target information to obtain a stitching result, and determine the stitching result as a query result corresponding to the data query command.

In a specific implementation, the result returning module 43 is further configured to splice multiple target information by multiple threads to obtain a spliced result.

Further, the data query module 42 is further configured to, for each storage node of the plurality of storage nodes, perform in parallel an operation of pulling data of the second logical table from the storage node, and query the data of the second logical table for the first data satisfying the query condition.

The device further comprises a data deleting module, wherein the data deleting module is used for deleting the pulled data of the second logic table after the first data meeting the query condition is queried from the data of the second logic table.

In some embodiments, the apparatus further includes a table deleting module, configured to release the hash table from the specified memory after returning the query result to the client.

The data query device provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiment. For brevity, where the device embodiment does not mention a detail, reference may be made to the corresponding content in the foregoing method embodiment.

An embodiment of the present invention further provides an electronic device, as shown in fig. 5, where the electronic device includes a processor 101 and a memory 100, where the memory 100 stores machine executable instructions that can be executed by the processor 101, and the processor 101 executes the machine executable instructions to implement the above-mentioned data query method.

Further, the electronic device shown in fig. 5 further includes a bus 102 and a communication interface 103, and the processor 101, the communication interface 103, and the memory 100 are connected through the bus 102.

The memory 100 may include a high-speed random access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the system network element and at least one other network element is implemented via at least one communication interface 103 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc. The bus 102 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 5, but this does not mean that there is only one bus or only one type of bus.

The processor 101 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 101 or by instructions in the form of software. The processor 101 may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP), or may be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 100, and the processor 101 reads the information in the memory 100 and, in combination with its hardware, performs the steps of the method of the foregoing embodiments.

An embodiment of the present invention further provides a machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the above data query method. For the specific implementation, reference may be made to the method embodiment, and details are not repeated here.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

It should be noted that the foregoing embodiments are merely specific implementations of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited to these embodiments. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications or readily conceivable variations of the technical solutions described in the foregoing embodiments, or equivalent substitutions of some of their technical features, remain within the scope of the present invention, and that such modifications, variations or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A data query method, characterized in that the method is applied to a computing node of a distributed database, wherein the computing node is in communication connection with a plurality of storage nodes, and the plurality of storage nodes store data of a first logic table and data of a second logic table;

The data of the first logic table is internal table data stored in the storage nodes, and the data of the second logic table is external table data stored in the storage nodes; each storage node may store part of the data of the first logic table and part of the data of the second logic table, the data stored in each storage node is different, and the data stored in the plurality of storage nodes together form the complete data of the first logic table and the complete data of the second logic table;

The method comprises the following steps:

Receiving a data query instruction sent by a user terminal, wherein the data query instruction carries a first identifier of the first logic table, a second identifier of the second logic table and a query condition;

Pulling data of the first logic table from the plurality of storage nodes, and establishing a hash table corresponding to the first logic table in a designated memory according to the pulled data, wherein the hash table in the designated memory is used for providing data to be queried for each storage node, the designated memory is a global memory in the computing node, the hash table corresponding to the first logic table is stored in the global memory, and each storage node connected with the computing node can access the designated memory and acquire the data to be queried from the designated memory;

Acquiring first data meeting the query condition from the data of the second logic table stored by the plurality of storage nodes, and acquiring second data meeting the query condition from the hash table;

and determining a query result corresponding to the data query instruction based on the first data and the second data, and returning the query result to the user side.

2. The method of claim 1, wherein the step of determining the query result corresponding to the data query instruction based on the first data and the second data comprises:

Determining a plurality of target information meeting the query condition from the first data and the second data;

Performing a splicing process on the plurality of target information to obtain a splicing result;

and determining the splicing result as a query result corresponding to the data query instruction.

3. The method according to claim 2, wherein the step of performing a splicing process on the plurality of target information to obtain a splicing result includes:

And splicing the plurality of target information by adopting a plurality of threads to obtain the splicing result.

4. A method according to any one of claims 1-3, wherein the step of retrieving the first data satisfying the query condition from the data of the second logical table stored by the plurality of storage nodes comprises:

for each storage node of the plurality of storage nodes, performing the following in parallel:

pulling data of the second logical table from the storage node;

and inquiring the first data meeting the inquiry condition from the data of the second logic table.

5. The method of claim 4, wherein after the step of querying the first data satisfying the query condition from the data of the second logical table, the method further comprises:

And deleting the pulled data of the second logic table.

6. The method of claim 1, wherein after the step of returning the query result to the user side, the method further comprises:

Releasing the hash table from the specified memory.

7. A data query device, characterized in that the device is arranged at a computing node of a distributed database, wherein the computing node is in communication connection with a plurality of storage nodes, and the plurality of storage nodes store data of a first logic table and data of a second logic table;

The device comprises:

The instruction receiving module is used for receiving a data query instruction sent by a user terminal, wherein the data query instruction carries a first identifier of the first logic table, a second identifier of the second logic table and a query condition;

the hash table establishing module is used for pulling the data of the first logic table from the plurality of storage nodes, and establishing a hash table corresponding to the first logic table in a designated memory according to the pulled data, wherein the hash table of the designated memory is used for providing the data to be queried for each storage node;

The data query module is used for acquiring first data meeting the query condition from the data of the second logic table stored by the plurality of storage nodes, and acquiring second data meeting the query condition from the hash table;

And the result returning module is used for determining a query result corresponding to the data query instruction based on the first data and the second data and returning the query result to the user side.

8. The apparatus of claim 7, wherein the result return module is further configured to:

9. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to implement the data query method of any of claims 1 to 6.

10. A machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the data query method of any one of claims 1 to 6.