Summary of the Invention
Embodiments of the present invention provide a method, apparatus, and system for distributed parallel task processing, which can solve the problem in the prior art that distributed parallel task processing systems are highly complex and process distributed parallel tasks slowly.
In a first aspect, an embodiment of the present invention provides a method for distributed parallel task processing, including:
receiving pending data;
splitting the pending data into multiple data fragments;
allocating the multiple data fragments to multiple processing nodes, respectively, for processing;
receiving the sub-result data produced by each processing node; and
merging the sub-result data to form result data.
In a second aspect, an embodiment of the present invention provides a method for distributed parallel task processing, including:
receiving a data fragment sent by a control node, where the data fragment is obtained by the control node splitting pending data, and the pending data have not been grouped or sorted;
processing the data in the data fragment to form sub-result data; and
sending the sub-result data to the control node.
In a third aspect, an embodiment of the present invention provides a control node, including:
a receiving unit, configured to receive pending data;
a splitting unit, configured to split the pending data received by the receiving unit into multiple data fragments;
an allocation unit, configured to allocate the multiple data fragments to multiple processing nodes, respectively, for processing;
where the receiving unit is further configured to receive the sub-result data produced by each processing node; and
a merging unit, configured to merge the sub-result data received by the receiving unit to form result data.
In a fourth aspect, an embodiment of the present invention provides a processing node, including:
a receiving unit, configured to receive a data fragment sent by a control node, where the data fragment is obtained by the control node splitting pending data, and the pending data have not been grouped or sorted;
a processing unit, configured to process the data in the data fragment received by the receiving unit to form sub-result data; and
a sending unit, configured to send the sub-result data formed by the processing unit to the control node.
In a fifth aspect, an embodiment of the present invention provides a system for distributed parallel task processing, including a control node and multiple processing nodes, where:
the control node is configured to receive pending data, split the pending data into multiple data fragments, and allocate the multiple data fragments to the multiple processing nodes, respectively, for processing;
each processing node is configured to receive the data fragment sent by the control node, process the data in the data fragment to form sub-result data, and send the sub-result data to the control node; and
the control node is further configured to receive the sub-result data produced by each processing node and merge the sub-result data to form result data.
In the method, apparatus, and system for distributed parallel task processing provided by the present invention, the control node receives pending data, splits the pending data into multiple data fragments, allocates the multiple data fragments to multiple processing nodes, respectively, for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, after receiving pending data the control node must first group and sort the pending data. In scenarios that do not require grouping and sorting, the prior-art approach increases the complexity of the entire distributed parallel task processing system and slows distributed parallel task processing. The approach provided by the present invention does not need to group or sort the pending data, which can reduce the complexity of the entire distributed parallel task processing system and increase the speed of distributed parallel task processing.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
To make the advantages of the technical solutions of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, the method for distributed parallel task processing provided by an embodiment of the present invention is described from the control node side. The method includes:
101. Receive pending data.
In a distributed parallel task, the volume of the pending data is generally large, typically more than 1 terabyte (TB), but is not limited thereto.
102. Split the pending data into multiple data fragments.
The pending data may be split into data fragments according to the number of processing nodes, so that the number of data fragments equals the number of processing nodes. Each data fragment may hold the same amount of data, but this is not limiting.
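For illustration only, the splitting in step 102 might be sketched in Python as follows. The function name split_into_fragments and the representation of the pending data as an in-memory list of rows are assumptions of this sketch, not features of the embodiment. The sketch yields one fragment per processing node, with fragment sizes differing by at most one row.

```python
def split_into_fragments(rows, node_count):
    """Split rows into node_count contiguous fragments of near-equal size.

    The rows are neither grouped nor sorted before splitting.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    base, extra = divmod(len(rows), node_count)
    fragments = []
    start = 0
    for i in range(node_count):
        # The first `extra` fragments each take one additional row.
        size = base + (1 if i < extra else 0)
        fragments.append(rows[start:start + size])
        start += size
    return fragments
```

When the number of rows is not divisible by the number of nodes, the leading fragments are one row larger, which is consistent with the statement that equal fragment sizes are optional.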
103. Allocate the multiple data fragments to multiple processing nodes, respectively, for processing.
The multiple data fragments may be allocated according to the load information of each processing node: in each allocation, one of the data fragments is allocated to the processing node with the lowest load. Alternatively, one of the data fragments may be randomly allocated to a processing node that has not yet obtained a data fragment. The allocation is not limited to these two approaches; the data fragments may also be allocated to the processing nodes in various other ways, which are not enumerated here.
104. Receive the sub-result data produced by each processing node.
The sub-result data are formed by the processing performed on the processing node. A processing node may read and process the data fragment it obtains line by line. The rows of data are independent of one another, so the operation logic run on the processing node can be executed on multiple rows simultaneously.
105. Merge the sub-result data to form result data.
The control node may merge the sub-result data returned by each processing node to form the result data. The result data may be stored in a database for subsequent data analysis applications.
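As a rough sketch only, steps 101 to 105 might be arranged in Python as follows. Worker threads stand in for the processing nodes, and the names run_task and process_fragment, as well as the per-row operation (trimming and upper-casing a string), are hypothetical placeholders chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def process_fragment(fragment):
    # Hypothetical per-row work: each row is handled independently.
    return [row.strip().upper() for row in fragment]


def run_task(pending_rows, node_count):
    # Step 102: split without grouping or sorting (simple striding).
    fragments = [pending_rows[i::node_count] for i in range(node_count)]
    # Steps 103-104: hand one fragment to each worker and collect
    # the sub-result data that each worker returns.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        sub_results = list(pool.map(process_fragment, fragments))
    # Step 105: merge the sub-result data into the result data.
    result = []
    for sub_result in sub_results:
        result.extend(sub_result)
    return result
```

Because the pending data are not grouped or sorted, the merged result carries no ordering guarantee relative to the input; a caller that needs a particular order would have to impose it afterwards.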
In the method for distributed parallel task processing provided by this embodiment of the present invention, the control node receives pending data, splits the pending data into multiple data fragments, allocates the multiple data fragments to multiple processing nodes, respectively, for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, after receiving pending data the control node must first group and sort the pending data. In scenarios that do not require grouping and sorting, the prior-art approach increases the complexity of the entire distributed parallel task processing system and slows distributed parallel task processing. The approach provided by the present invention does not need to group or sort the pending data, which can reduce the complexity of the entire distributed parallel task processing system and increase the speed of distributed parallel task processing.
Corresponding to the control node side is the processing node side. As shown in Fig. 2, the method for distributed parallel task processing provided by an embodiment of the present invention is described from the processing node side and includes:
201. Receive a data fragment sent by the control node.
The data fragment originates from the pending data received by the control node. The pending data are not grouped or sorted by the control node; the control node splits them directly to form the data fragments.
202. Process the data in the data fragment to form sub-result data.
The processing node may read and process the data fragment it obtains line by line. The rows of data are independent of one another, so the operation logic run on the processing node can be executed on multiple rows simultaneously.
203. Send the sub-result data to the control node.
The purpose of step 203 is that, once the sub-result data produced by each processing node from its data fragment reach the control node, the control node merges them to form the result data.
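Steps 201 to 203 on the processing node side could, under stated assumptions, look like the following Python sketch. It assumes that a data fragment arrives as newline-separated text, that the per-row operation (handle_row, here counting comma-separated fields) is a placeholder, and that send is a hypothetical callback that delivers the sub-result data to the control node. A thread pool is used only to show that independent rows may be handled at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_row(row):
    # Hypothetical per-row logic: count comma-separated fields.
    return len(row.split(","))


def process_fragment(fragment_text, workers=4):
    # Step 202: read the fragment line by line. Rows are independent,
    # so the same logic may run on several rows simultaneously.
    rows = fragment_text.splitlines()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(handle_row, rows))


def handle_fragment(fragment_text, send):
    # Step 201 is the arrival of fragment_text; step 203 hands the
    # sub-result data to the (hypothetical) send callback.
    send(process_fragment(fragment_text))
```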
In the method for distributed parallel task processing provided by this embodiment of the present invention, the processing node receives a data fragment, where the data fragment is obtained by the control node splitting pending data and the pending data have not been grouped or sorted. The processing node processes the data fragment to form sub-result data and then sends the sub-result data to the control node. In the prior art, by contrast, after receiving pending data the control node must first group and sort the pending data. In scenarios that do not require grouping and sorting, the prior-art approach increases the complexity of the entire distributed parallel task processing system and slows distributed parallel task processing. The approach provided by the present invention does not need to group or sort the pending data, which can reduce the complexity of the entire distributed parallel task processing system and increase the speed of distributed parallel task processing.
The method shown in Fig. 1 or Fig. 2 is described in detail and further extended below.
As shown in Fig. 3, a method for distributed parallel task processing provided by another embodiment of the present invention includes:
301. The control node receives pending data.
In a distributed parallel task, the volume of the pending data is generally large, typically more than 1 terabyte (TB), but is not limited thereto. For example, the pending data may be the login information of an application for one day, where the login information includes the online time, the offline time, and the like of each account under the application, but is not limited thereto.
302. The control node splits the pending data into multiple data fragments according to the number of processing nodes. After step 302, either step 303 or step 304 may be performed.
The pending data may be split into data fragments according to the number of processing nodes, so that the number of data fragments equals the number of processing nodes. Each data fragment may hold the same amount of data, but this is not limiting.
303. The control node randomly allocates one of the multiple data fragments to a processing node that has not yet obtained a data fragment, until all of the data fragments have been allocated. Step 308 is then performed.
To ensure that the load on each processing node does not become excessive, the data fragments need to be allocated reasonably. Specifically, the data fragments may be allocated randomly, and once a processing node has received a data fragment, it will not receive another data fragment of the same pending data.
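One possible way to express the random allocation of step 303 is sketched below in Python. The function name assign_randomly, the use of string identifiers for processing nodes, and the optional rng parameter are assumptions made for illustration. Sampling distinct nodes without replacement ensures that no node receives more than one fragment.

```python
import random


def assign_randomly(fragments, nodes, rng=random):
    """Pair each fragment with a distinct, randomly chosen node."""
    if len(fragments) > len(nodes):
        raise ValueError("each node may receive at most one fragment")
    # random.sample draws without replacement, so a node that has
    # already obtained a fragment is never chosen again.
    chosen = rng.sample(nodes, len(fragments))
    return dict(zip(chosen, fragments))
```

Passing a seeded random.Random instance as rng makes the allocation reproducible, which can be convenient when testing.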
304. Each processing node sends its own load information to the control node. Steps 305 and 306 are then performed.
Likewise, to allocate the data fragments reasonably, they may also be allocated according to the load of each processing node. The load information carries the load of the processing node.
305. The control node determines the processing node with the lowest load according to the received load information of each processing node.
Specifically, after the control node obtains the load information of each processing node, it can identify the processing node with the lowest load, because the load information carries the load of each processing node.
306. The control node allocates one of the multiple data fragments to the processing node with the lowest load. Step 307 is then performed.
In this way, each of the multiple data fragments is allocated to the processing node that has the lowest load at the time of allocation, so that the data fragments are distributed more evenly and the load across the processing nodes is balanced.
307. The control node determines whether all of the multiple data fragments have been allocated. If so, step 308 is performed; otherwise, the procedure returns to step 304.
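The loop formed by steps 304 to 307 might be sketched in Python as follows. SimulatedCluster is a hypothetical stand-in for the processing nodes and their load reports, and the load model (load grows by the number of rows received) is an assumption of this sketch; the embodiment does not specify how load is measured.

```python
class SimulatedCluster:
    """Hypothetical stand-in for processing nodes and their load reports."""

    def __init__(self, loads):
        self.loads = dict(loads)

    def report_loads(self):
        # Step 304: every node reports its current load.
        return dict(self.loads)

    def send(self, node, fragment):
        # Assumed load model: load grows by the number of rows received.
        self.loads[node] += len(fragment)


def assign_by_load(fragments, cluster):
    assignments = []
    for fragment in fragments:              # Step 307: repeat until done.
        loads = cluster.report_loads()      # Step 304.
        target = min(loads, key=loads.get)  # Step 305: lowest load.
        cluster.send(target, fragment)      # Step 306.
        assignments.append((target, fragment))
    return assignments
```

Re-reading the loads before every allocation mirrors the return from step 307 to step 304, so a node that has just received a fragment is less likely to be chosen again immediately.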
308. The processing node processes the multiple rows of data in the data fragment line by line to form sub-result data.
The processing node may read and process the data fragment it obtains line by line. The rows of data are independent of one another, so the operation logic run on the processing node can be executed on multiple rows simultaneously.
Take as an example the case where the pending data are the login information of an application for one day. If the accounts that were online at a certain moment need to be screened out, the control node may split the login information into data fragments, which are then handled by the processing nodes. Each processing node screens out the accounts online at that moment according to the online time and offline time of each account in the login information. Because multiple processing nodes perform the screening at the same time, the accounts online at that moment are screened out quickly.
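The login-information example can be made concrete with a small Python sketch. It assumes that each row is a tuple of (account, online time, offline time) with times given as minutes since midnight, and that an account counts as online when online time <= moment < offline time; the record layout, the time unit, and the boundary convention are all assumptions of this sketch.

```python
def online_accounts(fragment, moment):
    # Run on one processing node: each row is judged on its own,
    # with no need for the rows to be grouped or sorted.
    return [
        account
        for account, online, offline in fragment
        if online <= moment < offline
    ]


def merge(sub_results):
    # Run on the control node: concatenate the sub-result data.
    return [account for sub_result in sub_results for account in sub_result]
```

A usage example, in which each call to online_accounts could run on a different processing node:

```python
fragments = [
    [("alice", 60, 120), ("carol", 200, 400)],
    [("bob", 90, 300), ("dave", 0, 100)],
]
sub_results = [online_accounts(fragment, 100) for fragment in fragments]
result = merge(sub_results)
```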
309. The processing node sends the sub-result data to the control node.
310. The control node merges the sub-result data to form result data.
It should be noted that the control node and the processing node in the embodiments of the present invention may each be an electronic device with computing capability, such as a computer.
In the method for distributed parallel task processing provided by this further embodiment of the present invention, the control node receives pending data, splits the pending data into multiple data fragments, allocates the multiple data fragments to multiple processing nodes, respectively, for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, after receiving pending data the control node must first group and sort the pending data. In scenarios that do not require grouping and sorting, the prior-art approach increases the complexity of the entire distributed parallel task processing system and slows distributed parallel task processing. The approach provided by the present invention does not need to group or sort the pending data, which can reduce the complexity of the entire distributed parallel task processing system and increase the speed of distributed parallel task processing.
Corresponding to the implementation of the methods shown in Fig. 1 and Fig. 3, as shown in Fig. 4, the control node provided by an embodiment of the present invention includes:
A receiving unit 41, configured to receive pending data.
A splitting unit 42, configured to split the pending data received by the receiving unit 41 into multiple data fragments.
An allocation unit 43, configured to allocate the multiple data fragments to multiple processing nodes, respectively, for processing.
The receiving unit 41 is further configured to receive the sub-result data produced by each processing node.
A merging unit 44, configured to merge the sub-result data received by the receiving unit 41 to form result data.
Specifically, as shown in Fig. 5, the splitting unit 42 is configured to:
split the pending data received by the receiving unit 41 into multiple data fragments according to the number of processing nodes,
where the number of data fragments equals the number of processing nodes.
Further, as shown in Fig. 5, the allocation unit 43 is further configured to:
randomly allocate one of the multiple data fragments produced by the splitting unit 42 to a processing node that has not yet obtained a data fragment.
Further, as shown in Fig. 5, the control node further includes a determination unit 45.
The receiving unit 41 is further configured to receive the load information of each processing node.
The determination unit 45 is configured to determine the processing node with the lowest load according to the load information received by the receiving unit 41.
The allocation unit 43 is further configured to allocate one of the multiple data fragments produced by the splitting unit 42 to the processing node with the lowest load.
It should be noted that, for the specific implementation of the control node provided by this embodiment of the present invention, reference may be made to the specific implementation of the method for distributed parallel task processing in Fig. 3; details are not repeated here. The control node may be an electronic device with computing capability, such as a computer.
The control node provided by this embodiment of the present invention receives pending data, splits the pending data into multiple data fragments, allocates the multiple data fragments to multiple processing nodes, respectively, for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, after receiving pending data the control node must first group and sort the pending data. In scenarios that do not require grouping and sorting, the prior-art approach increases the complexity of the entire distributed parallel task processing system and slows distributed parallel task processing. The approach provided by the present invention does not need to group or sort the pending data, which can reduce the complexity of the entire distributed parallel task processing system and increase the speed of distributed parallel task processing.
Corresponding to the implementation of the methods shown in Fig. 2 and Fig. 3, as shown in Fig. 6, the processing node provided by an embodiment of the present invention includes:
A receiving unit 51, configured to receive a data fragment sent by a control node.
The data fragment is obtained by the control node splitting pending data, and the pending data have not been grouped or sorted.
A processing unit 52, configured to process the data in the data fragment received by the receiving unit 51 to form sub-result data.
A sending unit 53, configured to send the sub-result data formed by the processing unit 52 to the control node.
It should be noted that the data fragment includes multiple rows of data.
As shown in Fig. 6, the processing unit 52 is specifically configured to:
process the multiple rows of data in the data fragment line by line.
Specifically, as shown in Fig. 6, the sending unit 53 is further configured to:
send load information to the control node, where the load information carries the load of the processing node.
It should be noted that, for the specific implementation of the processing node provided by this embodiment of the present invention, reference may be made to the specific implementation of the method for distributed parallel task processing in Fig. 3; details are not repeated here. The processing node may be an electronic device with computing capability, such as a computer.
The processing node provided by this embodiment of the present invention receives a data fragment, where the data fragment is obtained by the control node splitting pending data and the pending data have not been grouped or sorted. The processing node processes the data fragment to form sub-result data and then sends the sub-result data to the control node. In the prior art, by contrast, after receiving pending data the control node must first group and sort the pending data. In scenarios that do not require grouping and sorting, the prior-art approach increases the complexity of the entire distributed parallel task processing system and slows distributed parallel task processing. The approach provided by the present invention does not need to group or sort the pending data, which can reduce the complexity of the entire distributed parallel task processing system and increase the speed of distributed parallel task processing.
As shown in Fig. 7, the system for distributed parallel task processing provided by an embodiment of the present invention includes a control node 61 and multiple processing nodes 62, where:
the control node 61 is configured to receive pending data, split the pending data into multiple data fragments, and allocate the multiple data fragments to the multiple processing nodes 62, respectively, for processing;
each processing node 62 is configured to receive the data fragment sent by the control node 61, process the data in the data fragment to form sub-result data, and send the sub-result data to the control node 61; and
the control node 61 is further configured to receive the sub-result data produced by each processing node 62 and merge the sub-result data to form result data.
It should be noted that, for the specific implementation of the system for distributed parallel task processing provided by this embodiment of the present invention, reference may be made to the specific implementation of the method for distributed parallel task processing in Fig. 3; details are not repeated here.
In the system for distributed parallel task processing provided by this embodiment of the present invention, the control node receives pending data, splits the pending data into multiple data fragments, allocates the multiple data fragments to multiple processing nodes, respectively, for processing, receives the sub-result data produced by each processing node, and merges the sub-result data to form result data. In the prior art, by contrast, after receiving pending data the control node must first group and sort the pending data. In scenarios that do not require grouping and sorting, the prior-art approach increases the complexity of the entire distributed parallel task processing system and slows distributed parallel task processing. The approach provided by the present invention does not need to group or sort the pending data, which can reduce the complexity of the entire distributed parallel task processing system and increase the speed of distributed parallel task processing.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware, or of course by hardware alone, although the former is the preferred implementation in many cases. Based on this understanding, the technical solutions of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, hard disk, or optical disc of a computer, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The foregoing is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.