
CN113377774A - Data query method and device and electronic equipment - Google Patents

Info

Publication number
CN113377774A
Authority
CN
China
Prior art keywords
data
query
storage nodes
result
hash table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110645512.3A
Other languages
Chinese (zh)
Other versions
CN113377774B (en)
Inventor
李江伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202110645512.3A
Publication of CN113377774A
Application granted
Publication of CN113377774B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data query method, a data query apparatus, and an electronic device. A computing node receives a data query instruction sent by a user side; the instruction carries a first identifier of a first logical table, a second identifier of a second logical table, and a query condition. The computing node pulls the data of the first logical table from a plurality of storage nodes and builds a hash table of the first logical table in a designated memory. It then obtains first data satisfying the query condition from the data of the second logical table stored in the storage nodes, obtains second data satisfying the query condition from the hash table, and returns a query result determined from the first data and the second data to the user side. Because the computing node builds a single hash table in a designated memory that every storage node can access, and answers the query using that hash table together with the data stored in the storage nodes, this approach uses less memory and CPU, and less network bandwidth during the query, than an approach in which the computing node keeps multiple copies of the hash table.

Description

Data query method and device and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data query method, an apparatus, and an electronic device.
Background
The hash join algorithm (hash join for short) is a table join method that, when two tables are joined, obtains the join result set by means of a hash operation.
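As a point of reference, a single-machine hash join can be sketched in a few lines of Python. This is a generic inner equi-join for illustration only, not the distributed method of this patent; the function name `hash_join`, the row layout (dicts), and the sample rows are assumptions.

```python
from collections import defaultdict


def hash_join(inner_rows, outer_rows, inner_key, outer_key):
    """Join two lists of dict rows on equality of the given key columns."""
    # Build phase: index the inner table by its join key.
    buckets = defaultdict(list)
    for row in inner_rows:
        buckets[row[inner_key]].append(row)

    # Probe phase: look up each outer row's key in the hash table.
    result = []
    for outer in outer_rows:
        for inner in buckets.get(outer[outer_key], []):
            result.append((outer, inner))
    return result


pairs = hash_join(
    inner_rows=[{"id": 1, "city": "tianjin"}, {"id": 2, "city": "xian"}],
    outer_rows=[{"id": 1, "name": "xiaowang"}, {"id": 3, "name": "xiaoli"}],
    inner_key="id",
    outer_key="id",
)
```

Only the outer row with `id` 1 finds a partner, so `pairs` holds one (outer, inner) tuple.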
In a distributed database, a computing node is connected to a plurality of storage nodes, and the computing node and the storage nodes exchange data by means of the hash join algorithm. Specifically, each storage node stores different inner table data. The computing node pulls the inner table data stored in all the storage nodes to build a full hash table and, so that data from the storage nodes can be processed in parallel, keeps a separate copy of that hash table in its local memory for each storage node.
Disclosure of Invention
The invention aims to provide a data query method, a data query device and electronic equipment, so as to reduce the occupation amount of a memory and a CPU (central processing unit) and reduce the occupation amount of network bandwidth.
In a first aspect, the invention provides a data query method, which is applied to a computing node of a distributed database; the computing node is in communication connection with a plurality of storage nodes; the plurality of storage nodes store data of the first logic table and data of the second logic table; the method comprises the following steps: receiving a data query instruction sent by a user side; the data query instruction carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition; pulling data of the first logic table from the plurality of storage nodes, and establishing a hash table corresponding to the first logic table in a designated memory according to the pulled data; wherein the hash table of the specified memory is used for: providing data to be queried for each storage node; acquiring first data meeting the query condition from data of a second logic table stored by a plurality of storage nodes; acquiring second data meeting the query condition from the hash table; and determining a query result corresponding to the data query instruction based on the first data and the second data, and returning the query result to the user side.
In an optional implementation manner, the step of determining a query result corresponding to the data query instruction based on the first data and the second data includes: determining a plurality of target information meeting the query condition from the first data and the second data; splicing the target information to obtain a splicing result; and determining the splicing result as a query result corresponding to the data query instruction.
In an optional embodiment, the step of performing a splicing process on the multiple pieces of target data information to obtain a splicing result includes: and splicing the target information by adopting a plurality of threads to obtain a splicing result.
In an optional implementation manner, the step of obtaining first data satisfying the query condition from data of the second logical table stored in the plurality of storage nodes includes: for each storage node of the plurality of storage nodes, performing the following operations in parallel: pulling data of the second logic table from the storage node; and querying the first data meeting the query condition from the data of the second logic table.
In an optional implementation manner, after the step of querying the first data satisfying the query condition from the data of the second logical table, the method further includes: and deleting the data of the pulled second logic table.
In an optional implementation manner, after the step of returning the query result to the user side, the method further includes: and releasing the hash table from the specified memory.
In a second aspect, the present invention provides a data query apparatus, which is disposed at a computing node of a distributed database; the computing node is in communication connection with a plurality of storage nodes; the plurality of storage nodes store data of the first logic table and data of the second logic table; the device includes: the instruction receiving module is used for receiving a data query instruction sent by a user side; the data query instruction carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition; the hash table establishing module is used for pulling the data of the first logic table from the plurality of storage nodes and establishing a hash table corresponding to the first logic table in the appointed memory according to the pulled data; wherein the hash table of the specified memory is used for: providing data to be queried for each storage node; the data query module is used for acquiring first data meeting query conditions from data of a second logic table stored by a plurality of storage nodes; acquiring second data meeting the query condition from the hash table; and the result returning module is used for determining a query result corresponding to the data query instruction based on the first data and the second data and returning the query result to the user side.
In an optional implementation manner, the result returning module is further configured to: determine a plurality of target information meeting the query condition from the first data and the second data; splice the target information to obtain a splicing result; and determine the splicing result as the query result corresponding to the data query instruction.
In a third aspect, the present invention provides an electronic device comprising a processor and a memory, the memory storing machine executable instructions capable of being executed by the processor, the processor executing the machine executable instructions to implement the data query method of any one of the preceding embodiments.
In a fourth aspect, the present invention provides a machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to carry out the data query method of any one of the preceding embodiments.
The embodiment of the invention has the following beneficial effects:
according to the data query method, the data query device and the electronic equipment, a computing node firstly receives a data query instruction sent by a user side, wherein the data query instruction carries a first identifier of a first logic table, a second identifier of a second logic table and query conditions; pulling data of the first logic table from a plurality of storage nodes connected with the computing node, and establishing a hash table corresponding to the first logic table in a designated memory according to the pulled data; acquiring first data meeting the query conditions from data of a second logic table stored by a plurality of storage nodes, and acquiring second data meeting the query conditions from a hash table; and then determining a query result corresponding to the data query instruction based on the first data and the second data, and returning the query result to the user side. The computing node of the method constructs a hash table in the appointed memory which can be accessed by each storage node, and realizes data query through the hash table and the data stored in the plurality of storage nodes.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention as set forth above.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of a hash join algorithm in the related art, provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a data query method according to an embodiment of the present invention;
FIG. 3 is a flow chart of another data query method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data query device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a distributed database, a computing node is connected to a plurality of storage nodes, and the computing node and each storage node exchange data by means of a hash join algorithm. As shown in FIG. 1, the computing node is responsible for parsing SQL (Structured Query Language) statements and for parallel computation. FIG. 1 includes two storage nodes (or two storage node clusters), Group0 and Group1, each of which stores different inner table data and outer table data. In FIG. 1, T1 is the inner table and T2 is the outer table; both are logical tables. T1_0, T1_1, T1_2, and T1_3 are the physical sub-tables corresponding to T1, and T2_0, T2_1, T2_2, and T2_3 are the physical sub-tables corresponding to T2.
Specifically, the computing node pulls the inner table data stored in all the storage nodes and builds a full hash table for each storage node, so the hash tables corresponding to the storage nodes are identical. The hash table for a given storage node is kept in a region of the computing node's memory that only that storage node can access. Because FIG. 1 includes two storage nodes, two hash tables are built in the computing node so that data from the storage nodes can be processed in parallel. However, keeping multiple copies of the hash table in the computing node consumes a large amount of memory and CPU, and the data for every one of those copies has to be pulled from the storage nodes before the hash table can be queried, which consumes a large amount of network bandwidth.
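The duplication cost described above can be made concrete with a small Python sketch. It contrasts one private copy of the hash table per storage node with a single table referenced by every node. The function names, the `id`-keyed rows, and the use of `copy.deepcopy` to stand in for "a separate copy in local memory" are all illustrative assumptions.

```python
import copy


def build_hash_table(inner_rows):
    # Index the inner table rows by their "id" column.
    return {row["id"]: row for row in inner_rows}


def per_node_copies(inner_rows, storage_nodes):
    # One full, private copy of the hash table for each storage node.
    full = build_hash_table(inner_rows)
    return {node: copy.deepcopy(full) for node in storage_nodes}


def single_shared_table(inner_rows, storage_nodes):
    # Every storage node is handed a reference to the same table.
    full = build_hash_table(inner_rows)
    return {node: full for node in storage_nodes}


def distinct_tables(tables):
    # Count how many separate table objects are actually held in memory.
    return len({id(table) for table in tables.values()})


inner_rows = [{"id": 0, "name": "xiaoming"}, {"id": 1, "name": "xiaohong"}]
nodes = ["Group0", "Group1"]
copies = per_node_copies(inner_rows, nodes)
shared = single_shared_table(inner_rows, nodes)
```

With two storage nodes, `distinct_tables(copies)` is 2 while `distinct_tables(shared)` is 1; the gap grows linearly with the number of storage nodes.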
Based on the above problems, embodiments of the present invention provide a data query method, apparatus, and electronic device; the technology can be applied to scenes of data access, data query and the like of the distributed database. In order to facilitate understanding of the embodiment, a detailed description is first given of a data query method disclosed in the embodiment of the present invention, where the method is applied to a computing node of a distributed database; the computing node is in communication connection with a plurality of storage nodes; the plurality of storage nodes store data of the first logic table and data of the second logic table; as shown in fig. 2, the method comprises the following specific steps:
Step S202, receiving a data query instruction sent by a user side; the data query instruction carries a first identifier of the first logic table, a second identifier of the second logic table and a query condition.
The data query instruction may be sent by a user through the user side, which may be a mobile terminal (e.g., a mobile phone or a tablet computer), a computer, or the like. The instruction carries the identifiers of the first and second logical tables to be queried, together with the query condition. The data of the first logical table may be the inner table data stored in the storage nodes, and the data of the second logical table may be the outer table data stored in the storage nodes. Each storage node may store part of the data of the first logical table and part of the data of the second logical table, and the data stored in each storage node is different; in other words, combining the data stored in all the storage nodes yields the complete data of the first logical table and of the second logical table.
In a specific implementation, each logical table is assigned a unique identifier when it is constructed, so that the table can be looked up by that identifier. A logical table is usually constructed with an SQL statement, and which table is the inner table and which is the outer table is determined by the position of each table's identifier in the SQL statement. For example, in the statement select * from T2 left join T1 on T2.id = T1.id; T2 is on the left and T1 is on the right, so T2 is the outer table and T1 is the inner table. T1 and T2 can also be understood as the identifiers of the inner table and the outer table.
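The positional rule can be illustrated with a toy Python helper. It handles only the single statement shape `FROM <table> LEFT JOIN <table>` and is not a SQL parser; the function name `classify_tables`, the regular expression, and the `*` column list in the sample statement are assumptions made for the example.

```python
import re

# Matches "from <left table> left join <right table>", case-insensitively.
_LEFT_JOIN = re.compile(
    r"\bfrom\s+(\w+)\s+left\s+join\s+(\w+)\b", re.IGNORECASE
)


def classify_tables(sql):
    """Return the outer (left) and inner (right) table identifiers."""
    match = _LEFT_JOIN.search(sql)
    if match is None:
        raise ValueError("expected a 'FROM <table> LEFT JOIN <table>' clause")
    return {"outer": match.group(1), "inner": match.group(2)}


roles = classify_tables("select * from T2 left join T1 on T2.id = T1.id;")
```

Here `roles` maps `"outer"` to `T2` and `"inner"` to `T1`, matching the reading given in the text.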
Step S204, pulling data of the first logic table from the plurality of storage nodes, and establishing a hash table corresponding to the first logic table in a designated memory according to the pulled data; wherein the hash table of the specified memory is used for: providing each storage node with data to be queried.
After receiving the data query instruction, the computing node pulls the data of the first logical table (that is, the inner table data) from each of the storage nodes connected to it, and then uses the pulled data to build a full hash table for the first logical table in a designated memory that was requested in advance. The designated memory is a global memory in the computing node. Storing the hash table of the first logical table in global memory ensures that every storage node connected to the computing node can access the designated memory and obtain from it the data to be queried.
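A minimal Python sketch of this build step follows, assuming each storage node exposes its shard of the first logical table as a list of tuples. The node names come from FIG. 1; the row values, the `storage_nodes` layout, and the function names are illustrative assumptions, and an ordinary in-process dict stands in for the designated global memory.

```python
from collections import defaultdict

# Hypothetical shards of the first (inner) logical table, one per storage node.
storage_nodes = {
    "Group0": [(0, "xiaoming", "beijing", 20), (1, "xiaohong", "tianjin", 18)],
    "Group1": [(2, "xiaojing", "xian", 21), (3, "xiaowei", "chengdu", 22)],
}


def pull_first_table(nodes):
    # Stand-in for the network pull: yield every inner-table row from every node.
    for shard in nodes.values():
        yield from shard


def build_full_hash_table(nodes, key_column):
    # Build ONE hash table covering the rows of all shards.
    table = defaultdict(list)
    for row in pull_first_table(nodes):
        table[row[key_column]].append(row)
    return dict(table)


full_table = build_full_hash_table(storage_nodes, key_column=2)
```

The resulting table has one entry per distinct key across both shards, regardless of which storage node a row came from.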
In the present invention, the computing node stores only one copy of the inner table data and builds only one full hash table; it does not need to build a full hash table for each storage node, which reduces memory usage.
Step S206, acquiring first data meeting the query condition from the data of the second logic table stored in the plurality of storage nodes; and acquiring second data meeting the query condition from the hash table.
After the hash table is built, the computing node obtains the data that satisfies the query condition in the data query instruction (that is, the first data) from the data of the second logical table stored in the storage nodes; in other words, the computing node obtains the first data satisfying the query condition from the outer table data stored in the storage nodes. The computing node then obtains the second data satisfying the query condition from the hash table stored in the designated memory.
Step S208, based on the first data and the second data, determining a query result corresponding to the data query instruction, and returning the query result to the user side.
In specific implementation, after the first data and the second data are queried, the first data and the second data need to be integrated and processed according to query conditions to determine a final query result, and the query result is returned to the user side, so that the query of the data is completed.
In the data query method provided by the embodiment of the invention, a computing node firstly receives a data query instruction sent by a user side, wherein the data query instruction carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition; pulling data of the first logic table from a plurality of storage nodes connected with the computing node, and establishing a hash table corresponding to the first logic table in a designated memory according to the pulled data; acquiring first data meeting the query conditions from data of a second logic table stored by a plurality of storage nodes, and acquiring second data meeting the query conditions from a hash table; and then determining a query result corresponding to the data query instruction based on the first data and the second data, and returning the query result to the user side. The computing node of the method constructs a hash table in the appointed memory which can be accessed by each storage node, and realizes data query through the hash table and the data stored in the plurality of storage nodes.
The embodiment of the invention also provides another data query method, which is realized on the basis of the method of the embodiment; the method focuses on a specific process of acquiring first data meeting the query condition from data of a second logic table stored in a plurality of storage nodes (implemented by the following step S306), and includes: determining a specific process of a query result corresponding to the data query instruction based on the first data and the second data (realized by steps S310 to S312 described below); as shown in fig. 3, the method comprises the following specific steps:
Step S302, receiving a data query instruction sent by a user side; the data query instruction carries a first identifier of the first logic table, a second identifier of the second logic table and a query condition.
Step S304, pulling data of the first logic table from the plurality of storage nodes, and establishing a hash table corresponding to the first logic table in the designated memory according to the pulled data.
Step S306, for each storage node of the plurality of storage nodes, performing the following operations in parallel: pulling data of the second logic table from the storage node; and querying the first data meeting the query condition from the data of the second logic table.
The computing node may concurrently pull the data of the second logical table (that is, the outer table data) from each of the storage nodes, and then query the pulled data for the first data satisfying the query condition.
In a specific implementation, after the first data meeting the query condition is queried from the data of the second logic table, the pulled data of the second logic table needs to be deleted to release the memory, so that the memory occupation is reduced.
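Step S306 can be sketched with a thread pool in Python: one task per storage node pulls that node's outer table rows, keeps the rows that satisfy the condition, and drops the pulled copy. The sketch assumes the query condition is "the join key exists in the hash table"; `STORAGE_NODES`, the row values, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards of the second (outer) logical table.
STORAGE_NODES = {
    "Group0": [(0, "xiaoli", "shanghai", 30)],
    "Group1": [(1, "xiaowang", "tianjin", 25)],
}


def pull_second_table(node):
    # Stand-in for the network pull of one node's outer-table data.
    return list(STORAGE_NODES[node])


def scan_node(node, hash_table, key_index):
    pulled = pull_second_table(node)
    # Assumed query condition: the row's join key is present in the hash table.
    first_data = [row for row in pulled if row[key_index] in hash_table]
    del pulled  # drop the reference so the pulled copy can be reclaimed
    return first_data


def scan_all(nodes, hash_table, key_index):
    # One worker per storage node; results come back in node order.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        per_node = pool.map(lambda n: scan_node(n, hash_table, key_index), nodes)
        return [row for rows in per_node for row in rows]


h1 = {"tianjin": (1, "xiaohong", "tianjin", 18)}
first_data = scan_all(list(STORAGE_NODES), h1, key_index=2)
```

Only the `tianjin` row survives the filter; the full pulled shards are not retained once each task returns.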
In step S308, second data satisfying the query condition is obtained from the hash table.
In step S310, a plurality of target information satisfying the query condition is determined from the first data and the second data.
Step S312, splicing the target information to obtain a splicing result; and determining the splicing result as a query result corresponding to the data query instruction.
During specific implementation, a plurality of target information meeting the query conditions can be determined from the first data and the second data, then the target information needs to be spliced, and a splicing result obtained after splicing is used as a query result corresponding to the data query instruction.
In some embodiments, in order to return the data satisfying the query condition as soon as possible, multiple threads may be used to splice the pieces of target information into the splicing result. In other words, one thread may pull the outer table data from the storage nodes, and multiple threads may then splice the data that satisfies the query condition (that is, the target information).
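A minimal sketch of multi-threaded splicing in Python, assuming each piece of target information is an (outer row, inner row) pair and that the spliced output is a comma-separated string of native place, inner table name, and outer table name. The function names, the worker count, and the output layout are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def splice(pair):
    outer_row, inner_row = pair
    # Output layout: native place, inner-table name, outer-table name.
    return ",".join([inner_row[2], inner_row[1], outer_row[1]])


def splice_all(pairs, workers=4):
    # Each matched pair is spliced by a worker thread; order is preserved.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(splice, pairs))


matched = [((1, "xiaowang", "tianjin", 25), (1, "xiaohong", "tianjin", 18))]
spliced = splice_all(matched)
```

In CPython, threads mainly help when the splicing work involves I/O; for purely CPU-bound string work the benefit may be limited, so the worker count here is only a placeholder.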
Step S314, returning the query result to the user side.
In specific implementation, after the query result is returned to the user side, the hash table needs to be released from the specified memory to release the space of the specified memory, so that sufficient memory space is provided for internal table data storage and hash table construction in the next data query.
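One way to guarantee that the hash table is released after the result has been returned is to tie its lifetime to a scope. The Python sketch below uses a context manager for this; `GLOBAL_MEMORY` is an ordinary dict standing in for the designated memory, and every name in it is an illustrative assumption.

```python
from contextlib import contextmanager

# Stand-in for the designated (global) memory of the computing node.
GLOBAL_MEMORY = {}


@contextmanager
def designated_hash_table(name, rows, key_index):
    # Build the hash table in the designated memory for the duration of a query.
    GLOBAL_MEMORY[name] = {row[key_index]: row for row in rows}
    try:
        yield GLOBAL_MEMORY[name]
    finally:
        # Release the table even if the query raised an exception.
        GLOBAL_MEMORY.pop(name, None)


with designated_hash_table("h1", [(1, "xiaohong", "tianjin", 18)], 2) as h1:
    match = h1["tianjin"]

released = "h1" not in GLOBAL_MEMORY
```

After the `with` block exits, the entry is gone from `GLOBAL_MEMORY`, leaving the space free for the next query.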
For ease of understanding, the following example assumes that the query condition of the data query request is to find the data that is the same in the two logical tables. Suppose the computing node receives the following SQL data query request from the user side: select t2.birthdaplace, t2.name, t1.name from t2 left join t1 on t2.birthdaplace = t1.birthdaplace; where t1 is the first identifier of the first logical table, t2 is the second identifier of the second logical table, and birthdaplace denotes the native place. After receiving the request, the computing node uses the first identifier to pull the stored data of the first logical table from the storage nodes connected to it, and builds the hash table of the first logical table from the pulled data. Both the data of the first logical table and the hash table are stored in global memory (that is, the designated memory). For example, a hash table h1 is built that uses the third column of each row as the key for finding the data that is the same in the two tables. The hash table contains the following entries:
beijing:0,xiaoming,beijing,20;
tianjin:1,xiaohong,tianjin,18;
xian:2,xiaojing,xian,21;
chengdu:3,xiaowei,chengdu,22;
where the colon is preceded by a key and followed by a value.
The computing node then concurrently pulls the data of the second logical table from each storage node according to the second identifier and determines from it the first data satisfying the query condition; it then looks up the hash table h1 stored in global memory to obtain the second data. Suppose the first data determined from the second logical table is 1,xiaowang,tianjin,25, whose native place is tianjin. Looking up tianjin in the hash table gives the second data 1,xiaohong,tianjin,18, whose native place is also tianjin. The data satisfying the condition in the two tables is therefore 1,xiaohong,tianjin,18 in the hash table and 1,xiaowang,tianjin,25 in the second logical table. Assuming that only the native place, the inner table name (the name in the hash table), and the outer table name (the name in the second logical table) are to be returned, the matching data is processed and spliced (multiple threads may do this concurrently). The splicing result is tianjin,xiaohong,xiaowang, which is then returned to the caller.
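The worked example can be replayed directly in Python. The four inner table rows and the outer table row are taken from the example; representing rows as tuples and the hash table as a dict keyed on the third column is an assumption made for illustration.

```python
# Rows of the first (inner) logical table, as listed for hash table h1.
inner_rows = [
    (0, "xiaoming", "beijing", 20),
    (1, "xiaohong", "tianjin", 18),
    (2, "xiaojing", "xian", 21),
    (3, "xiaowei", "chengdu", 22),
]

# h1 uses the third column (native place) as the key.
h1 = {row[2]: row for row in inner_rows}

# First data: the matching row found in the second (outer) logical table.
first_data = (1, "xiaowang", "tianjin", 25)

# Second data: the hash table entry with the same native place.
second_data = h1[first_data[2]]

# Splice native place, inner-table name, outer-table name.
spliced = (second_data[2], second_data[1], first_data[1])
```

Running this gives `second_data` equal to `(1, "xiaohong", "tianjin", 18)` and `spliced` equal to `("tianjin", "xiaohong", "xiaowang")`.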
In this data query method, the computing node first receives a data query instruction that is sent by a user side and carries a first identifier of a first logical table, a second identifier of a second logical table, and a query condition. It then builds a hash table corresponding to the first logical table in the designated memory from the data of the first logical table pulled from the plurality of storage nodes. Next, for each of the storage nodes, it performs the following operations in parallel: pulling the data of the second logical table from the storage node, and querying that data for the first data satisfying the query condition. It then obtains the second data satisfying the query condition from the hash table, determines a plurality of target information satisfying the query condition from the first data and the second data, splices the target information to obtain a splicing result, takes the splicing result as the query result corresponding to the data query instruction, and returns the query result to the user side. In this method, the computing node keeps only one copy of the inner table data and builds only one hash table from it, which saves memory, CPU, and network resources and improves the efficiency of the hash join.
For the embodiment of the data query method, the embodiment of the invention provides a data query device, which is arranged at a computing node of a distributed database; the computing node is in communication connection with a plurality of storage nodes; the plurality of storage nodes store data of the first logic table and data of the second logic table; as shown in fig. 4, the apparatus includes:
The instruction receiving module 40 is configured to receive a data query instruction sent by a user side; the data query instruction carries a first identifier of the first logic table, a second identifier of the second logic table and a query condition.
A hash table establishing module 41, configured to pull data of the first logic table from the multiple storage nodes, and establish a hash table corresponding to the first logic table in the specified memory according to the pulled data; wherein the hash table of the specified memory is used for: providing each storage node with data to be queried.
A data query module 42, configured to obtain first data that meets a query condition from data in the second logical table stored in the plurality of storage nodes; and acquiring second data meeting the query condition from the hash table.
A result returning module 43, configured to determine a query result corresponding to the data query instruction based on the first data and the second data, and return the query result to the user side.
In the data query device, the computing node first receives a data query instruction sent by a user side, wherein the data query instruction carries a first identifier of a first logic table, a second identifier of a second logic table and a query condition. It pulls data of the first logic table from the plurality of storage nodes connected with the computing node, and establishes a hash table corresponding to the first logic table in a specified memory according to the pulled data. It then acquires first data meeting the query condition from data of the second logic table stored by the plurality of storage nodes, and acquires second data meeting the query condition from the hash table. Finally, it determines a query result corresponding to the data query instruction based on the first data and the second data, and returns the query result to the user side. In this device, the computing node constructs the hash table in the specified memory, which can be accessed by each storage node, and realizes data query through the hash table and the data stored in the plurality of storage nodes.
Further, the result returning module 43 is further configured to: determine a plurality of target information meeting the query condition from the first data and the second data; splice the target information to obtain a splicing result; and determine the splicing result as the query result corresponding to the data query instruction.
In a specific implementation, the result returning module 43 is further configured to: splice the target information using a plurality of threads to obtain the splicing result.
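As a rough illustration of splicing with multiple threads, the sketch below hands each matched pair of rows to a thread pool. The pair format, the field positions, and the names splice_pair and splice_concurrently are assumptions made for this example only. Note also that in CPython, threads mainly help when the per-record work waits on I/O; the sketch shows the structure rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def splice_pair(pair):
    """Splice one (inner_row, outer_row) pair into (native place, inner name, outer name)."""
    inner, outer = pair
    return (inner[2], inner[1], outer[1])


def splice_concurrently(pairs, workers=4):
    """Splice all matched pairs using a pool of threads; input order is preserved."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(splice_pair, pairs))


# Assumed row layout: (id, name, native_place, age).
matched = [((1, "xiaohong", "tianjin", 18), (1, "xiaowang", "tianjin", 25))]
print(splice_concurrently(matched))  # [('tianjin', 'xiaohong', 'xiaowang')]
```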
Further, the data query module 42 is further configured to: for each storage node of the plurality of storage nodes, perform the following operations in parallel: pulling data of the second logic table from the storage node; and querying the first data meeting the query condition from the data of the second logic table.
Specifically, the apparatus further includes a data deleting module, configured to: delete the pulled data of the second logic table after the first data meeting the query condition has been queried from the data of the second logic table.
In some embodiments, the apparatus further includes a table deleting module, configured to release the hash table from the specified memory after returning the query result to the user side.
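The two cleanup steps (deleting the pulled second-table data after it has been queried, and releasing the hash table after the result is returned) can be pictured with a small Python sketch. The context manager scoped_hash_table, the function run_query, and the row layout are hypothetical; the embodiment does not prescribe how the memory is actually released.

```python
from contextlib import contextmanager


@contextmanager
def scoped_hash_table(inner_rows, key_index=2):
    """Build the hash table on entry and release its contents on exit."""
    table = {}
    for row in inner_rows:
        table.setdefault(row[key_index], []).append(row)
    try:
        yield table
    finally:
        table.clear()  # stands in for freeing the hash table from the specified memory


def run_query(inner_rows, outer_rows, key_index=2):
    """Probe the hash table with the outer rows, then clean up both data sets."""
    with scoped_hash_table(inner_rows, key_index) as hash_table:
        pulled = list(outer_rows)  # stands in for data pulled from a storage node
        result = [
            (outer[key_index], inner[1], outer[1])
            for outer in pulled
            for inner in hash_table.get(outer[key_index], [])
        ]
        pulled.clear()  # delete the pulled second-table data once it has been queried
    return result


# Assumed row layout: (id, name, native_place, age).
print(run_query([(1, "xiaohong", "tianjin", 18)], [(1, "xiaowang", "tianjin", 25)]))
```

Tying the release to the end of a with-block means the hash table is cleared even if the query raises an error partway through.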
The data query apparatus provided in the embodiment of the present invention has the same implementation principle and technical effect as those of the foregoing method embodiments, and for the sake of brief description, reference may be made to corresponding contents in the foregoing method embodiments for parts of embodiments that are not mentioned in the apparatus embodiments.
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, the electronic device includes a processor 101 and a memory 100, where the memory 100 stores machine executable instructions capable of being executed by the processor 101, and the processor 101 executes the machine executable instructions to implement the data query method.
Further, the electronic device shown in fig. 5 further includes a bus 102 and a communication interface 103, and the processor 101, the communication interface 103, and the memory 100 are connected through the bus 102.
The memory 100 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile memory (non-volatile memory), such as at least one disk memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 103 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. The bus 102 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus.
The processor 101 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 101 or by instructions in the form of software. The processor 101 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 100; the processor 101 reads the information in the memory 100 and completes the steps of the method of the foregoing embodiment in combination with its hardware.
The embodiment of the present invention further provides a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are called and executed by a processor, the machine-executable instructions cause the processor to implement the data query method.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and shall be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A data query method, characterized in that the method is applied to a computing node of a distributed database; the computing node is communicatively connected to a plurality of storage nodes; the plurality of storage nodes store data of a first logical table and data of a second logical table; the method comprises: receiving a data query instruction sent by a user side, wherein the data query instruction carries a first identifier of the first logical table, a second identifier of the second logical table, and a query condition; pulling the data of the first logical table from the plurality of storage nodes, and establishing a hash table corresponding to the first logical table in a specified memory according to the pulled data, wherein the hash table in the specified memory is used for providing each of the storage nodes with data to be queried; acquiring first data satisfying the query condition from the data of the second logical table stored by the plurality of storage nodes; acquiring second data satisfying the query condition from the hash table; and determining, based on the first data and the second data, a query result corresponding to the data query instruction, and returning the query result to the user side.
2. The method according to claim 1, characterized in that the step of determining, based on the first data and the second data, the query result corresponding to the data query instruction comprises: determining a plurality of target information satisfying the query condition from the first data and the second data; splicing the plurality of target information to obtain a splicing result; and determining the splicing result as the query result corresponding to the data query instruction.
3. The method according to claim 2, characterized in that the step of splicing the plurality of target data information to obtain the splicing result comprises: splicing the plurality of target information using a plurality of threads to obtain the splicing result.
4. The method according to any one of claims 1 to 3, characterized in that the step of acquiring the first data satisfying the query condition from the data of the second logical table stored by the plurality of storage nodes comprises: for each storage node of the plurality of storage nodes, performing the following operations in parallel: pulling the data of the second logical table from the storage node; and querying the first data satisfying the query condition from the data of the second logical table.
5. The method according to claim 4, characterized in that, after the step of querying the first data satisfying the query condition from the data of the second logical table, the method further comprises: deleting the pulled data of the second logical table.
6. The method according to claim 1, characterized in that, after the step of returning the query result to the user side, the method further comprises: releasing the hash table from the specified memory.
7. A data query device, characterized in that the device is arranged at a computing node of a distributed database; the computing node is communicatively connected to a plurality of storage nodes; the plurality of storage nodes store data of a first logical table and data of a second logical table; the device comprises: an instruction receiving module, configured to receive a data query instruction sent by a user side, wherein the data query instruction carries a first identifier of the first logical table, a second identifier of the second logical table, and a query condition; a hash table establishing module, configured to pull the data of the first logical table from the plurality of storage nodes, and establish a hash table corresponding to the first logical table in a specified memory according to the pulled data, wherein the hash table in the specified memory is used for providing each of the storage nodes with data to be queried; a data query module, configured to acquire first data satisfying the query condition from the data of the second logical table stored by the plurality of storage nodes, and acquire second data satisfying the query condition from the hash table; and a result returning module, configured to determine, based on the first data and the second data, a query result corresponding to the data query instruction, and return the query result to the user side.
8. The device according to claim 7, characterized in that the result returning module is further configured to: determine a plurality of target information satisfying the query condition from the first data and the second data; splice the plurality of target information to obtain a splicing result; and determine the splicing result as the query result corresponding to the data query instruction.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the memory stores machine-executable instructions executable by the processor, and the processor executes the machine-executable instructions to implement the data query method according to any one of claims 1 to 6.
10. A machine-readable storage medium, characterized in that the machine-readable storage medium stores machine-executable instructions which, when called and executed by a processor, cause the processor to implement the data query method according to any one of claims 1 to 6.
CN202110645512.3A 2021-06-09 2021-06-09 Data query method, device and electronic device Active CN113377774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110645512.3A CN113377774B (en) 2021-06-09 2021-06-09 Data query method, device and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110645512.3A CN113377774B (en) 2021-06-09 2021-06-09 Data query method, device and electronic device

Publications (2)

Publication Number Publication Date
CN113377774A true CN113377774A (en) 2021-09-10
CN113377774B CN113377774B (en) 2025-03-07

Family

ID=77573480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110645512.3A Active CN113377774B (en) 2021-06-09 2021-06-09 Data query method, device and electronic device

Country Status (1)

Country Link
CN (1) CN113377774B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060063544A (en) * 2004-12-06 2006-06-12 한국전자통신연구원 Flow measuring device and method
CN101452486A (en) * 2008-12-31 2009-06-10 中国建设银行股份有限公司 System data management method for [inscriptions on bones or tortoise shells and apparatus thereof
US20100179954A1 (en) * 2009-01-09 2010-07-15 Linkage Technology Group Co., Ltd. Quick Mass Data Manipulation Method Based on Two-Dimension Hash
KR20130047431A (en) * 2011-10-31 2013-05-08 에스케이씨앤씨 주식회사 Method for storaging in memory and pararell-processing for batch process of mass information
CN110442574A (en) * 2019-07-01 2019-11-12 上海赜睿信息科技有限公司 A kind of data processing method, electronic equipment and computer readable storage medium
CN111639078A (en) * 2020-05-25 2020-09-08 北京百度网讯科技有限公司 Data query method and device, electronic equipment and readable storage medium
CN112417227A (en) * 2021-01-21 2021-02-26 国能信控互联技术有限公司 Real-time data storage and query method based on hash table and red-black tree
CN112463795A (en) * 2020-11-26 2021-03-09 杭州安恒信息技术股份有限公司 Dynamic hash method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113377774B (en) 2025-03-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant