CN106250565A - Query method and system based on a sharded relational database - Google Patents
Query method and system based on a sharded relational database
- Publication number
- CN106250565A; CN201610771058.5A
- Authority
- CN
- China
- Prior art keywords
- value
- column name
- record
- count
- burst
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a query method and system based on a sharded relational database. It addresses the prior-art problem that a large amount of data is read into the memory of a central node and processed there sequentially, so that performance is very poor and the requirements of online queries cannot be met when the data volume is large. The method includes the step of receiving a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G, where column name G is a subset of column name A. The method avoids excessive memory consumption, or even memory overflow, at the central node.
Description
Technical field
The present invention relates to querying distributed data, and in particular to queries that combine count, distinct and group by.
Background technology
In the big-data era, data statistics is a common business requirement. Statistical services today are usually built on relational databases such as MySQL. Because such a database is limited by a single machine, statistical performance drops sharply once the data volume passes the ten-million-row level. Typically, the relational database is sharded horizontally so that the data is spread over multiple machines, which removes the single-machine bottleneck.
After sharding, a statistical SQL statement is executed by computing the statistic once on each shard node and then aggregating the per-shard statistics once at the central node, so that the semantics of the SQL are preserved. SQL that combines grouping, deduplication and counting runs into both functional and performance problems. Such SQL looks like this: select group_col, count(distinct dist_col) from table group by group_col, where group_col and dist_col can each be one or more field names.
Commonly, if such SQL is sent directly to every shard and the results returned by the shards are summed at the central node, the result is wrong. The same combination of group_col and dist_col values may appear in more than one shard, so executing per shard and then summing counts it repeatedly and inflates the result.
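The double counting described above can be reproduced with a few lines of Python. This is only an illustrative sketch: the rows, the dates and the helper name count_distinct are made up for the example and do not come from the patent.

```python
from collections import Counter

# Hypothetical rows of (group_col, dist_col) held by two shards.
shard_1 = [("2016-08-01", 1), ("2016-08-01", 2)]
shard_2 = [("2016-08-01", 1), ("2016-08-01", 3)]

def count_distinct(rows):
    """Per-group COUNT(DISTINCT dist_col) over one collection of rows."""
    seen = {}
    for group, value in rows:
        seen.setdefault(group, set()).add(value)
    return {group: len(values) for group, values in seen.items()}

# Naive plan: run the query on every shard, then add up the per-shard counts.
naive = Counter()
for shard in (shard_1, shard_2):
    naive.update(count_distinct(shard))

# Reference answer: deduplicate over the union of all shards.
correct = count_distinct(shard_1 + shard_2)

print(naive["2016-08-01"], correct["2016-08-01"])  # 4 versus 3
```

The value 1 exists in both shards for the same group, so the naive sum counts it twice.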
In addition, to guarantee a correct result, such SQL can be executed as follows.
1. Read the values of group_col and dist_col from every shard node. The SQL is: select group_col, dist_col from table;
2. Traverse all the data sequentially: first determine the group each row belongs to from group_col, then deduplicate within that group by dist_col.
3. After all the data has been traversed, count the deduplicated dist_col values in each group.
The problem with this method is that it reads a large amount of data into the memory of the central node, and sequential execution on that node performs very poorly; when the data volume is large, it cannot meet the requirements of online queries.
Summary of the invention
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of them. This summary is not an extensive overview of all contemplated aspects, and is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in simplified form as a prelude to the more detailed description given later.
The present invention provides a query method and system based on a sharded relational database to solve the prior-art problem that a large amount of data is read into the memory of a central node and processed there sequentially, so that performance is very poor and the requirements of online queries cannot be met when the data volume is large.
To achieve the above object, the inventors provide a query method based on a sharded relational database, including the steps:
S101: receive a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G; column name G is a subset of column name A;
S102: execute SELECT column name C FROM table name T on each shard node, where column name C is the union of column name A and column name B; process every record in the query result of each shard node, that is, take the hash value of the record's b value and put records with the same hash value into the same data pipe, where the b value is the record's value for column name B;
S103: within the records of each data pipe, group by the value of column G and, in each group, compute count, the number of distinct b values occurring in that group; each group's result is one correspondence (g, count);
S104: merge the correspondences (G, COUNT) computed in all pipes, that is, merge the records with the same g value according to column name G; the COUNT value of a merged record is the sum of the COUNT values of all the correspondences merged into it; the merged result is the query result.
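Steps S102 to S104 can be sketched in Python as below. The sketch makes several assumptions that are not in the patent text: records are reduced to (g, b) pairs (the case where column name G equals column name A), CRC32 of the value's text form stands in for the unspecified hash function, the pipe is chosen by hash modulo the pipe count, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict

def stable_hash(b_value):
    # Assumption: CRC32 of the text form stands in for the unspecified hash.
    return zlib.crc32(repr(b_value).encode("utf-8"))

def distribute(shard_results, num_pipes):
    """S102: route each (g, b) record to a pipe chosen by the hash of b."""
    pipes = [[] for _ in range(num_pipes)]
    for rows in shard_results:
        for g, b in rows:
            pipes[stable_hash(b) % num_pipes].append((g, b))
    return pipes

def count_in_pipe(pipe):
    """S103: per group g, count the distinct b values seen in this pipe."""
    groups = defaultdict(set)
    for g, b in pipe:
        groups[g].add(b)
    return {g: len(bs) for g, bs in groups.items()}

def merge(partials):
    """S104: add up the counts of correspondences that share the same g."""
    total = defaultdict(int)
    for partial in partials:
        for g, count in partial.items():
            total[g] += count
    return dict(total)

def sharded_count_distinct(shard_results, num_pipes=4):
    pipes = distribute(shard_results, num_pipes)
    return merge(count_in_pipe(p) for p in pipes)

# Hypothetical query results of two shard nodes.
shards = [[("d1", 1), ("d1", 2)], [("d1", 1), ("d1", 3), ("d2", 1)]]
result = sharded_count_distinct(shards)
```

Because equal b values always land in the same pipe, each distinct (g, b) pair is counted in exactly one pipe, which is why the per-pipe counts can simply be added in S104.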
Further, column name G is equal to column name A.
Further, the number N of data pipes is twice the number of CPU cores of the machine, and the hashing part of step S102 is specifically: compute the hash value of each record's b value, take this hash value modulo N, and assign the record to the corresponding data pipe according to the result, where each of the values 0 to N-1 corresponds to exactly one data pipe.
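A minimal sketch of this sizing rule follows. It assumes that os.cpu_count(), which reports logical CPUs and may return None, is an acceptable stand-in for "CPU core number"; the function names are hypothetical.

```python
import os

def pipe_count():
    # N = 2 x cores; os.cpu_count() may return None, so fall back to 1.
    return 2 * (os.cpu_count() or 1)

def pipe_index(hash_value, n):
    # Each value 0..n-1 names exactly one data pipe.
    return hash_value % n

N = pipe_count()
indexes = {pipe_index(h, N) for h in range(10 * N)}
```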
Further, before the step of receiving the query statement with the semantics "SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G", the method further includes the steps:
Read the database query statement;
Determine whether the query statement matches these semantics;
If it does not, return;
If it does, perform the above step of receiving the query statement.
Further, S101 and S102 above are executed in batches according to the size of table T in the shard node, and step S103 above is executed at the shard node.
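The patent does not say how the semantic check is carried out. One simplified way to sketch it is a regular expression that accepts only the supported statement shape and then verifies that G is a subset of A. The pattern and the function name parse_supported_query are assumptions for illustration; a production system would more likely use a real SQL parser.

```python
import re

# Assumed statement shape: SELECT A, COUNT(DISTINCT B) FROM T GROUP BY G
PATTERN = re.compile(
    r"^\s*select\s+(?P<a>.+?),\s*count\s*\(\s*distinct\s+(?P<b>.+?)\)\s+"
    r"from\s+(?P<t>\w+)\s+group\s+by\s+(?P<g>.+?)\s*;?\s*$",
    re.IGNORECASE,
)

def split_columns(text):
    return [column.strip() for column in text.split(",")]

def parse_supported_query(sql):
    """Return the parts of a supported query, or None if it does not match."""
    match = PATTERN.match(sql)
    if match is None:
        return None
    a = split_columns(match["a"])
    b = split_columns(match["b"])
    g = split_columns(match["g"])
    if not set(g) <= set(a):  # column name G must be a subset of column name A
        return None
    return {"A": a, "B": b, "T": match["t"], "G": g}

parsed = parse_supported_query(
    "select date, count(distinct user_id) from table group by date"
)
```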
A distributed database system is also provided, comprising at least a first shard node and a second shard node, a receiving module, a first processing module, a second processing module and a third processing module; the first shard node and the second shard node each store horizontally sliced data of the database;
The receiving module is configured to receive a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G; column name G is a subset of column name A. The first processing module is configured to send the query statement SELECT column name C FROM table name T to the first shard node and the second shard node respectively, where column name C is the union of column name A and column name B, and to process every record in the query result of each shard node, that is, take the hash value of the record's b value and put records with the same hash value into the same data pipe, where the b value is the record's value for column name B;
The second processing module is configured to group the records of each data pipe by the value of column G and, in each group, compute count, the number of distinct b values occurring in that group; each group's result is one correspondence (g, count);
The third processing module is configured to merge the correspondences (G, COUNT) computed in all pipes, that is, merge the records with the same g value according to column name G; the COUNT value of a merged record is the sum of the COUNT values of all the correspondences merged into it; the merged result is the query result.
Further, column name G is equal to column name A.
Further, the number N of data pipes is twice the number of CPU cores of the machine, and the first processing module is configured to compute the hash value of each record's b value, take this hash value modulo N, and assign the record to the corresponding data pipe according to the result, where each of the values 0 to N-1 corresponds to exactly one data pipe.
Further, before receiving the query statement with the semantics "SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G", the receiving module is also configured to read the database query statement and determine whether it matches these semantics; if it does not, return; if it does, continue.
Further, the first processing module and the second processing module process the data of table T in batches according to the size of table T in the shard node; the third processing module is located at a shard node.
Unlike the prior art, the above technical solution reads data in a streaming (small-batch) fashion, pulling data from each shard into memory in batches as needed. This prevents a large amount of data from flooding the central node at once and causing excessive memory consumption or even memory overflow.
To accomplish the foregoing and related ends, the one or more aspects comprise the features fully described below and particularly pointed out in the claims. The following description and the drawings set forth certain illustrative features of the one or more aspects. These features indicate only a few of the various ways in which the principles of the various aspects may be employed, and this description is intended to include all such aspects and their equivalents.
Accompanying drawing explanation
The disclosed aspects are described below with reference to the accompanying drawings, which are provided to illustrate and not to limit the disclosed aspects, in which like reference numerals denote like elements, and in which:
Fig. 1 illustrates the query method based on a sharded relational database described herein using concrete data;
Fig. 2 shows the query method based on a sharded relational database described herein.
The reference numerals in Fig. 1 denote the tables they point to.
Detailed description of the invention
To explain in detail the technical content, structural features, objects achieved and effects of the technical solution, a detailed explanation is given below with reference to specific embodiments and the accompanying drawings. In the following description, numerous details are set forth for purposes of explanation in order to provide a thorough understanding of one or more aspects. It will be evident, however, that such aspects can also be practiced without these details.
The present invention discloses a query method based on a sharded relational database, where the sharded relational database is a horizontally sharded distributed database. The method includes the steps:
S101: receive a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G; column name G is a subset of column name A;
S102: execute SELECT column name C FROM table name T on each shard node, where column name C is the union of column name A and column name B; process every record in the query result of each shard node, that is, take the hash value of the record's b value and put records with the same hash value into the same data pipe, where the b value is the record's value for column name B;
S103: within the records of each data pipe, group by the value of column G and, in each group, compute count, the number of distinct b values occurring in that group; each group's result is one correspondence (g, count);
S104: merge the correspondences (G, COUNT) computed in all pipes, that is, merge the records with the same g value according to column name G; the COUNT value of a merged record is the sum of the COUNT values of all the correspondences merged into it; the merged result is the query result.
Steps S101-S103 may be executed in the shard nodes. Alternatively, the data of the shard nodes, or the data obtained by the corresponding step, may be read as a data stream into a third node (for example the central node) and the steps executed there. The central node is a node with strong data-processing capability; it may also be the master node (the other shard nodes being slave nodes). Preferably, steps S102 and S103 are executed in the shard nodes, which reduces the load on the central node to some extent and, because the computation is distributed, improves efficiency and speed.
To make step S104 easier to understand, a further illustration follows. In some embodiments there are 4 data pipes. In each data pipe, the records are grouped by the value of column G and the number of distinct b values occurring in each group is computed. For example, for the group whose G value is 2, the four data pipes compute the correspondences (2, 3), (2, 4), (2, 1) and (2, 2) respectively; merging the correspondences whose g value is 2 gives (2, 3+4+1+2), that is, (2, 10). Merging all the correspondences of all data pipes in this way yields the query result for the query statement entered by the user; this query result has the form (G, COUNT);
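The arithmetic of this merge can be checked with a short Python snippet that uses the four correspondences from the example; the variable names are illustrative only.

```python
from collections import Counter

# One (g, count) correspondence per data pipe for the group with g = 2.
per_pipe = [(2, 3), (2, 4), (2, 1), (2, 2)]

merged = Counter()
for g, count in per_pipe:
    merged[g] += count  # same g: add the counts

print(dict(merged))  # {2: 10}
```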
It will be understood that, in the above steps, the data in the shard nodes can be read in small batches, SELECT column name C FROM table name T executed on the data as it is read, and the query results of these small batches processed record by record (either in the shard node or at the central node). There is no need to read or process a large amount of data at once, which avoids overloading the computation or memory of any one node.
Unlike the prior art, the above steps read data in a streaming (small-batch) fashion, pulling data from each shard into memory in batches as needed. This prevents a large amount of data from flooding the central node at once and causing excessive memory consumption or even memory overflow.
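Small-batch reading can be sketched with Python's standard sqlite3 module, using an in-memory SQLite table as a stand-in for one shard. The table name t, the sample rows and the batch size are assumptions for illustration; the patent does not name a client library or a batch size.

```python
import sqlite3

def stream_rows(conn, sql, batch_size=2):
    """Yield rows batch by batch so only one batch is fetched at a time."""
    cursor = conn.execute(sql)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch

# Hypothetical in-memory table standing in for one shard.
shard = sqlite3.connect(":memory:")
shard.execute("CREATE TABLE t (id INTEGER, user_id INTEGER, date TEXT)")
shard.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "d1"), (2, 2, "d1"), (3, 4, "d2")],
)

rows = list(stream_rows(shard, "SELECT user_id, date FROM t"))
```

In a real pipeline the consumer would hash and route each row as it arrives instead of collecting the rows into a list.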
It will be understood that the B field in "DISTINCT column name B" need not be numeric; it may be a string or a time, or even a combination of several fields.
For numeric, string and time types, taking the hash value yields a numeric result. For the multi-field case, the hash values of the individual fields can be added together to give the final hash result. In short, the purpose of hashing is to compute a numeric result for every case in which the B field is non-numeric or is a combination of several fields (guaranteeing that two identical B field values always produce the same hash result).
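A minimal sketch of hashing a B value of any type, including a multi-field B, is shown below. CRC32 over the value's text form is an assumed stand-in for the unspecified hash function, and the names field_hash and b_hash are hypothetical.

```python
import zlib
from datetime import date

def field_hash(value):
    # Assumption: CRC32 of the text form turns any field into a number.
    return zlib.crc32(str(value).encode("utf-8"))

def b_hash(*fields):
    # Multi-field B: add up the per-field hash values.
    return sum(field_hash(field) for field in fields)

h1 = b_hash("alice", date(2016, 8, 1))
h2 = b_hash("alice", date(2016, 8, 1))
single = b_hash(42)
```

Summing is insensitive to field order, so different B values can share a hash. That only affects how evenly records spread over the pipes: deduplication inside a pipe compares the b values themselves, so the counts stay correct.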
The query method based on a sharded relational database described herein is explained below with concrete data (see Fig. 1 and Fig. 2):
Step1. The original data is stored in two shards, 111 and 121. Each shard holds 3 rows, and each row has 3 columns: id, user_id and date.
Step2. The SQL to be executed is: select date, count(distinct user_id) from table group by date.
Step3. According to the number of shards, the data query module uses 2 threads to read data from the two shards. The SQL used for reading is: select user_id, date from table, and the read results (the query results of step S102) are 112 and 122 respectively. Step2 and Step3 correspond to the data query in Fig. 2;
Step4. During data distribution, the distinct_col field of every record (user_id in this example) is traversed. Taking the hash value of this field modulo 3 (there are 3 data pipes, numbered 0, 1 and 2) gives one of the values 0, 1 or 2, and the record is placed in the data pipe with that number. In this example the hash algorithm simply takes the value of user_id, so the two records with user_id 1 and 4 are assigned to the data pipe with index=1, since 1%3=1 and 4%3=1. In the figure, 113 and 123 assign the records to pipes 214, 224 and 234 according to the modulo result. Step4 corresponds to the data distribution in Fig. 2;
Step5. Three threads are responsible for data calculation (matching the number of data pipes), with each calculation module serving one data pipe. During calculation, the data is grouped by the group_col field (date here): in Fig. 1, 214 grouped by date gives 216, 224 grouped by date gives 226, and 234 grouped by date gives 236 (215, 225 and 235 denote the intermediate results of this process). Each group is then deduplicated by the distinct_col field (user_id here).
Step6. After a calculation module has consumed all the data in the data pipe it is responsible for, it holds a mapping from date to a deduplicated user_id list. Counting each user_id list gives a mapping from date to the deduplicated number of user_ids (in Fig. 1, merging the tables labelled 216, 226 and 236 gives the table labelled 300). Step5 and Step6 correspond to the data calculation in Fig. 2;
Step7. Result merging is done by the main thread: after the results of all data calculations have been collected, the count(distinct user_id) values of records with the same date are added to obtain the final result. Step7 corresponds to the result merging in Fig. 2.
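The walk-through above can be sketched as a small multi-threaded Python program built on the standard threading and queue modules. The rows of Fig. 1 are not reproduced in the text, so the shard contents below are made up; only the pipe count of 3, the rule user_id % 3 and the assignment of user_id 1 and 4 to pipe 1 follow the example. All names are hypothetical.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical shard contents as (id, user_id, date) rows.
SHARDS = [
    [(1, 1, "d1"), (2, 2, "d1"), (3, 3, "d2")],
    [(4, 4, "d1"), (5, 1, "d1"), (6, 2, "d2")],
]
NUM_PIPES = 3
DONE = object()  # sentinel telling a compute thread its pipe is finished

def read_shard(rows, pipes):
    # Step3 + Step4: project (user_id, date) and route by user_id % 3.
    for _id, user_id, day in rows:
        pipes[user_id % NUM_PIPES].put((user_id, day))

def compute(pipe, out, index):
    # Step5 + Step6: group by date, deduplicate user_id, then count.
    users_by_date = defaultdict(set)
    while True:
        item = pipe.get()
        if item is DONE:
            break
        user_id, day = item
        users_by_date[day].add(user_id)
    out[index] = {day: len(users) for day, users in users_by_date.items()}

def run():
    pipes = [queue.Queue() for _ in range(NUM_PIPES)]
    partials = [None] * NUM_PIPES
    workers = [
        threading.Thread(target=compute, args=(pipes[i], partials, i))
        for i in range(NUM_PIPES)
    ]
    readers = [
        threading.Thread(target=read_shard, args=(rows, pipes))
        for rows in SHARDS
    ]
    for thread in workers + readers:
        thread.start()
    for thread in readers:
        thread.join()
    for pipe in pipes:
        pipe.put(DONE)  # all readers are done, so each pipe can be closed
    for thread in workers:
        thread.join()
    # Step7: the main thread adds up the counts for equal dates.
    result = defaultdict(int)
    for partial in partials:
        for day, count in partial.items():
            result[day] += count
    return dict(result)

final = run()
```

With these made-up rows, user_id 1 appears in both shards for date d1 but is routed to the same pipe both times, so it is counted once.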
It will be understood that Fig. 2 depicts the data query, data distribution and data calculation of the present invention in simplified form; each of them may be executed by multiple threads.
The present invention also provides a distributed database system for implementing the above query method based on a sharded relational database. It comprises at least a first shard node and a second shard node, a receiving module, a first processing module, a second processing module and a third processing module; the first shard node and the second shard node each store horizontally sliced data of the database. The receiving module, the first processing module and the second processing module may be located in a shard node or in the central node.
The receiving module is connected to the first processing module, the first processing module is connected to the second processing module, and the second processing module is connected to the third processing module.
The receiving module is configured to receive a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G; column name G is a subset of column name A. The first processing module is configured to send the query statement SELECT column name C FROM table name T to the first shard node and the second shard node respectively, where column name C is the union of column name A and column name B, and to process every record in the query result of each shard node, that is, take the hash value of the record's b value and put records with the same hash value into the same data pipe, where the b value is the record's value for column name B;
The second processing module is configured to group the records of each data pipe by the value of column G and, in each group, compute count, the number of distinct b values occurring in that group; each group's result is one correspondence (g, count);
The third processing module is configured to merge the correspondences (G, COUNT) computed in all pipes, that is, merge the records with the same g value according to column name G; the COUNT value of a merged record is the sum of the COUNT values of all the correspondences merged into it; the merged result is the query result.
In certain embodiments, column name G is equal to column name A.
In some embodiments, the number N of data pipes is twice the number of CPU cores of the machine, and the first processing module is configured to compute the hash value of each record's b value, take this hash value modulo N, and assign the record to the corresponding data pipe according to the result, where each of the values 0 to N-1 corresponds to exactly one data pipe.
In some embodiments, before receiving the query statement with the semantics "SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G", the receiving module is also configured to read the database query statement and determine whether it matches these semantics; if it does not, return; if it does, continue.
In some embodiments, the first processing module and the second processing module process the data of table T in batches according to the size of table T in the shard node; the third processing module is located at the central node.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article or terminal device. Without further limitation, an element qualified by "including ..." or "comprising ..." does not exclude the presence of additional elements in the process, method, article or terminal device that includes it. In addition, in this document, "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including it.
Those skilled in the art should understand that the above embodiments may be provided as a method, a device or a computer program product. These embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. All or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware; the program may be stored in a storage medium readable by a computer device and is used to perform all or part of the steps of those methods. The computer device includes, but is not limited to: personal computers, servers, general-purpose computers, special-purpose computers, network devices, embedded devices, programmable devices, smart mobile terminals, smart home devices, wearable smart devices, in-vehicle smart devices and the like. The storage medium includes, but is not limited to: RAM, ROM, magnetic disks, magnetic tape, optical discs, flash memory, USB flash drives, portable hard disks, memory cards, memory sticks, network server storage, network cloud storage and the like.
The above embodiments are described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to the embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a computer device to produce a machine, such that the instructions executed by the processor of the computer device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-device-readable memory that can direct the computer device to operate in a specific manner, such that the instructions stored in that memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer device, so that a series of operating steps is performed on the computer device to produce a computer-implemented process, whereby the instructions executed on the computer device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although the above embodiments have been described, those skilled in the art, once they know the basic inventive concept, can make further changes and modifications to these embodiments. The foregoing is therefore only an embodiment of the present invention and does not limit the scope of patent protection of the present invention. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the present invention, or any direct or indirect use in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A query method based on a sharded relational database, characterized in that it includes the steps:
S101: receive a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G; column name G is a subset of column name A;
S102: execute SELECT column name C FROM table name T on each shard node, where column name C is the union of column name A and column name B; process every record in the query result of each shard node, that is, take the hash value of the record's b value and put records with the same hash value into the same data pipe, where the b value is the record's value for column name B;
S103: for the records in each data pipe, group by the value of column G and, in each group, compute count, the number of distinct b values occurring in that group; each group's result is one correspondence (g, count);
S104: merge the correspondences (G, COUNT) computed in all pipes, that is, merge the records with the same g value according to column name G; the COUNT value of a merged record is the sum of the COUNT values of all the correspondences merged into it; the merged result is the query result.
Querying method based on burst relevant database the most according to claim 1, it is characterised in that column name G is equal to
Column name A.
Querying method based on burst relevant database the most according to claim 1, it is characterised in that data pipe
Number N is the twice of machine cpu core number, and described step " processes and i.e. takes its hash value, hash value phase according to every record b value
With record put into identical data pipe " according to the b value of every record ask for the hash value of this b value correspondence, and should
Hash value mould N, distributes this data pipe that recorded correspondence according to the value of mould N, numerical value 0 to N-1 the most corresponding 1 number respectively
According to pipeline.
4. The querying method based on a sharded relational database according to claim 1, characterized in that, before the step "receiving a query statement with the following semantics: SELECT column name A, COUNT (DISTINCT column name B) FROM table name T GROUP BY column name G;", the method further comprises:
reading a database query statement;
judging whether the query statement matches these semantics;
if it does not match, returning;
if it matches, performing the above step of receiving the query statement.
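One way to picture this check is a pattern match on the statement text. The sketch below is an assumption for illustration only: the claim does not say how the semantics are judged, a real system would use a SQL parser, and `QUERY_SHAPE` and `parse_query` are hypothetical names.

```python
import re

# Hypothetical pattern for the claimed query shape; real SQL needs a parser.
QUERY_SHAPE = re.compile(
    r"^\s*SELECT\s+(?P<a>[\w\s,]+?)\s*,\s*COUNT\s*\(\s*DISTINCT\s+(?P<b>\w+)\s*\)"
    r"\s+FROM\s+(?P<t>\w+)\s+GROUP\s+BY\s+(?P<g>[\w\s,]+?)\s*;?\s*$",
    re.IGNORECASE,
)

def parse_query(sql):
    """Return (a_cols, b_col, table, g_cols) if sql has the claimed shape, else None."""
    m = QUERY_SHAPE.match(sql)
    if m is None:
        return None
    a_cols = [c.strip() for c in m.group("a").split(",")]
    g_cols = [c.strip() for c in m.group("g").split(",")]
    # Claim 1 requires column name G to be a subset of column name A.
    if not set(g_cols) <= set(a_cols):
        return None
    return a_cols, m.group("b"), m.group("t"), g_cols
```

A statement that does not match returns `None`, which corresponds to the "return" branch of the claim; a matching statement yields the column and table names needed by the later steps.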
5. The querying method based on a sharded relational database according to claim 1, characterized in that the above steps S101 and S102 are performed in batches according to the size of table T in the shard node, and the above step S103 is performed at the shard node.
6. A distributed database system, characterized in that it comprises at least a first shard node and a second shard node, a receiving module, a first processing module, a second processing module and a third processing module; the first shard node and the second shard node each store horizontally sliced data of the database; characterized in that:
the receiving module receives a query statement with the following semantics: SELECT column name A, COUNT (DISTINCT column name B) FROM table name T GROUP BY column name G; column name G is a subset of column name A;
the first processing module is used to send the query statement SELECT column name C FROM table name T to the first shard node and the second shard node respectively, and to process every record in the query result of each shard node, that is, to compute the hash value of each record's b value and put records with the same hash value into the same data pipeline, where the b value is the value of column name B in the record;
the second processing module is used, for the records in the same data pipeline, to group them by the value of column G and, within each group, to count the number count of distinct b values that occur in the group; the result for each group is a correspondence (g, count);
the third processing module is used to merge the correspondences (G, COUNT) computed for the groups in every pipeline, that is, to merge the records with the same g value according to column name G, the COUNT value of a merged record being the sum of the COUNT values of all the correspondences merged into it; the merged result is the query result.
7. The distributed database system according to claim 6, characterized in that column name G is equal to column name A.
8. The distributed database system according to claim 6, characterized in that the number N of data pipelines is twice the number of CPU cores of the machine, and the first processing module is used to compute the hash value of the b value of each record, take that hash value modulo N, and assign the record to the corresponding data pipeline according to the result, each of the values 0 to N-1 corresponding to one data pipeline.
9. The distributed database system according to claim 6, characterized in that the receiving module, before "receiving a query statement with the following semantics: SELECT column name A, COUNT (DISTINCT column name B) FROM table name T GROUP BY column name G;", is further used to read a database query statement and judge whether the query statement matches these semantics; if it does not match, it returns; if it matches, it continues.
10. The distributed database system according to claim 6, characterized in that the first processing module and the second processing module process the data of table T in batches according to the size of table T in the shard node; the third processing module is located at a shard node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610771058.5A CN106250565B (en) | 2016-08-30 | 2016-08-30 | Querying method and system based on fragment relevant database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106250565A (en) | 2016-12-21 |
CN106250565B (en) | 2019-05-07 |
Family
ID=58080520
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610771058.5A Active CN106250565B (en) | 2016-08-30 | 2016-08-30 | Querying method and system based on fragment relevant database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106250565B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101093493A (en) * | 2006-06-23 | 2007-12-26 | 国际商业机器公司 | Speech conversion method for database inquiry, converter, and database inquiry system |
US20100077107A1 (en) * | 2008-09-19 | 2010-03-25 | Oracle International Corporation | Storage-side storage request management |
CN102663116A (en) * | 2012-04-11 | 2012-09-12 | 中国人民大学 | Multi-dimensional OLAP (On Line Analytical Processing) inquiry processing method facing column storage data warehouse |
CN102722531A (en) * | 2012-05-17 | 2012-10-10 | 北京大学 | Query method based on regional bitmap indexes in cloud environment |
CN103310023A (en) * | 2013-07-05 | 2013-09-18 | 深圳中兴网信科技有限公司 | Distributed searching system and method |
CN104756101A (en) * | 2012-10-31 | 2015-07-01 | 惠普发展公司,有限责任合伙企业 | Executing a query having multiple set operators |
CN105335403A (en) * | 2014-07-23 | 2016-02-17 | 华为技术有限公司 | Database access method and device, and database system |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851483A (en) * | 2019-11-07 | 2020-02-28 | 京东数字科技控股有限公司 | Method, apparatus, electronic device, and medium for screening objects |
CN110851483B (en) * | 2019-11-07 | 2021-03-05 | 京东数字科技控股有限公司 | Method, apparatus, electronic device, and medium for screening objects |
CN110825816A (en) * | 2020-01-09 | 2020-02-21 | 四川新网银行股份有限公司 | System and method for data acquisition of partitioned database |
CN110825816B (en) * | 2020-01-09 | 2020-04-21 | 四川新网银行股份有限公司 | System and method for data acquisition of partitioned database |
CN118484473A (en) * | 2024-07-15 | 2024-08-13 | 浙江智臾科技有限公司 | Database query optimization method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106250565B (en) | 2019-05-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN103778135B (en) | A kind of distribution storage of real time data and paging query method | |
US10558664B2 (en) | Structured cluster execution for data streams | |
CN103577440B (en) | A kind of data processing method and device in non-relational database | |
CN106528787B (en) | query method and device based on multidimensional analysis of mass data | |
CN103617295B (en) | A kind of method and apparatus of geographical vector data processing | |
CN103902701B (en) | A kind of data-storage system and storage method | |
CN102024062B (en) | Device and method for realizing data dynamic cache | |
CN106325756B (en) | Data storage method, data calculation method and equipment | |
CN110209686A (en) | Storage, querying method and the device of data | |
CN108268586A (en) | Across the data processing method of more tables of data, device, medium and computing device | |
CN103714096A (en) | Lucene-based inverted index system construction method and device, and Lucene-based inverted index system data processing method and device | |
CN106155630A (en) | Sequencing method, unserializing method, serializing device and unserializing device | |
CN103440246A (en) | Intermediate result data sequencing method and system for MapReduce | |
CN106649828A (en) | Data query method and system | |
CN113722415B (en) | Point cloud data processing method and device, electronic equipment and storage medium | |
CN106095863A (en) | A kind of multidimensional data query and storage system and method | |
CN106250565A (en) | Querying method based on burst relevant database and system | |
CN106250457A (en) | The inquiry processing method of big data platform Materialized View and system | |
CN104809246A (en) | Method and device for processing charging data | |
CN106844320A (en) | A kind of financial statement integration method and equipment | |
CN104199821B (en) | A kind of flow data cube construction method based on Sketch | |
CN106802787B (en) | MapReduce optimization method based on GPU sequence | |
CN108062378A (en) | The Connection inquiring method and system of more time serieses under a kind of column storage | |
CN111026759A (en) | Hbase-based report generation method and device | |
CN110245978A (en) | Policy evaluation, policy selection method and device in tactful group |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |