
CN105637482A - Method and device for processing data stream based on gpu - Google Patents

Method and device for processing data stream based on GPU

Info

Publication number
CN105637482A
CN105637482A (application CN201480038261.0A)
Authority
CN
China
Prior art keywords
operator
subtask
operation operator
tuple
candidate operations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201480038261.0A
Other languages
Chinese (zh)
Inventor
邓利群
朱俊华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN105637482A publication Critical patent/CN105637482A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and device for processing a data stream based on a GPU merge multiple subtasks having the same operation logic into a single merge task, and invoke the GPU to perform data stream processing on the merge task, thereby reducing the GPU scheduling frequency and scheduling overhead and improving the throughput of the stream data processing system.

Description

GPU-based data stream processing method and device

Technical field
The embodiments of the present invention relate to computer technology, and in particular to a data stream processing method and device based on a graphics processing unit (Graphics Processing Unit, hereinafter GPU).

Background technology
At present, using the GPU as a coprocessor or accelerator in general-purpose computing fields (such as databases and data compression) has become a major trend in the industry. Compared with the central processing unit (Central Processing Unit, hereinafter CPU), the GPU offers advantages such as larger-scale concurrent threads and higher memory bandwidth, and is therefore better suited to large-scale data-parallel or compute-parallel tasks.
However, in application scenarios with many data streams and a high data generation frequency, stream processing tasks are continuous and highly concurrent, yet the computation amount of any single stream processing task is small. Consequently, when the GPU is used to accelerate data stream processing, it must be scheduled frequently, which incurs a large GPU scheduling overhead.
Summary of the invention
The embodiments of the present invention provide a GPU-based data stream processing method and device, so as to reduce the GPU scheduling overhead and improve the throughput of a stream data processing system.
A first aspect of the embodiments of the present invention provides a GPU-based data stream processing method, including:
receiving a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is a first operation logic;
merging the first subtask and at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic; and
scheduling the graphics processing unit (GPU) to perform data stream processing on the merge task.
With reference to the first aspect, in a first possible implementation, before the merging of the first subtask and the at least one second subtask into one merge task, the method further includes:
determining that the processing of the individual data records of the first operation data in the first subtask has no result dependence.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, the data stream processed by the first operation operator is a first data stream; and
the merging of the first subtask and the at least one second subtask into one merge task includes:
judging whether a kernel profile contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream;
if the kernel profile does not contain a second operation operator identical to the first operation operator, adding the first operation operator to the kernel profile, and merging the first subtask and the at least one second subtask into one merge task; and
if the kernel profile contains a second operation operator identical to the first operation operator, merging the first subtask and the at least one second subtask into one merge task.
With reference to the second possible implementation of the first aspect, in a third possible implementation, the adding of the first operation operator to the kernel profile includes:
if the kernel profile contains at least one first candidate operation operator group, adding the first operation operator to a first candidate operation operator group; and
if the kernel profile contains no first candidate operation operator group, creating a first candidate operation operator group and adding the first operation operator to it;
where the operation logic of every operation operator in the first candidate operation operator group is the same as the first operation logic.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, if the kernel profile contains at least two first candidate operation operator groups, the adding of the first operation operator to a first candidate operation operator group includes:
selecting a first operation operator group from the at least two first candidate operation operator groups according to a first preset rule; and
adding the first operation operator to the first operation operator group.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, the first preset rule is either of the following rules:
selecting the first candidate operation operator group with the fewest operation operators as the first operation operator group; or
selecting the first candidate operation operator group whose operation operators have the smallest average data volume as the first operation operator group.
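The two variants of the first preset rule can be sketched as follows. This is an illustrative host-side sketch only; `OperatorGroup`, the `operator_data_volumes` field, and both helper names are hypothetical, assuming each candidate group tracks the pending data volume of each of its operation operators.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation of a candidate operation operator group:
// one entry per operation operator, holding its pending data volume.
struct OperatorGroup {
    std::vector<size_t> operator_data_volumes;
};

// Rule variant 1: pick the group with the fewest operation operators.
size_t pick_fewest_operators(const std::vector<OperatorGroup>& groups) {
    size_t best = 0;
    for (size_t i = 1; i < groups.size(); ++i)
        if (groups[i].operator_data_volumes.size() <
            groups[best].operator_data_volumes.size())
            best = i;
    return best;
}

// Rule variant 2: pick the group whose operators have the smallest
// average data volume.
size_t pick_smallest_average_volume(const std::vector<OperatorGroup>& groups) {
    size_t best = 0;
    double best_avg = -1.0;
    for (size_t i = 0; i < groups.size(); ++i) {
        const auto& v = groups[i].operator_data_volumes;
        if (v.empty()) continue;
        double sum = 0;
        for (size_t x : v) sum += static_cast<double>(x);
        double avg = sum / static_cast<double>(v.size());
        if (best_avg < 0 || avg < best_avg) { best_avg = avg; best = i; }
    }
    return best;
}
```

Note that the two rules can pick different groups: a group with few operators may still carry a large average data volume, so the choice of variant trades operator count against per-operator load.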
With reference to the third possible implementation of the first aspect, in a sixth possible implementation, the method further includes: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel profile, regrouping the first operation operator and every operation operator in the at least one first candidate operation operator group according to a second preset rule.
With reference to the sixth possible implementation of the first aspect, in a seventh possible implementation, the regrouping of the first operation operator and every operation operator in the at least one first candidate operation operator group according to the second preset rule includes:
calculating the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator group and the per-unit-data execution cost of the first operation operator; and
storing operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator group.
With reference to any one of the fourth to the seventh possible implementations of the first aspect, in an eighth possible implementation, the merging of the first subtask and the at least one second subtask into one merge task includes:
when the first subtask is triggered, merging the operation logic of every operation operator in the first operation operator group into one merged operation logic, and merging the operation data of the subtask corresponding to each operation operator in the first operation operator group into one data structure; and
generating metadata information from the storage location, in that data structure, of the operation data of the subtask corresponding to each operation operator in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
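The merging of operation data with metadata generation can be sketched as follows. This is a minimal host-side sketch under the assumption of fixed-length records per subtask; `SubtaskMeta`, `MergeTask`, and `merge_subtask` are hypothetical names, not the patent's own definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-subtask metadata: where a subtask's records start in
// the merged buffer, how many records it has, and each record's length.
struct SubtaskMeta {
    size_t offset;        // storage location in the merged data structure
    size_t record_count;  // number of data records in this subtask
    size_t record_len;    // length of each data record, in bytes
};

struct MergeTask {
    std::vector<uint8_t> data;      // all subtasks' operation data, back to back
    std::vector<SubtaskMeta> meta;  // one metadata entry per merged subtask
};

// Append one subtask's records (fixed record length) to the merge task
// and record the metadata needed to locate them later.
void merge_subtask(MergeTask& task, const std::vector<uint8_t>& records,
                   size_t record_len) {
    SubtaskMeta m{task.data.size(), records.size() / record_len, record_len};
    task.data.insert(task.data.end(), records.begin(), records.end());
    task.meta.push_back(m);
}
```

With this layout, a single GPU launch can cover all merged subtasks, since the metadata lets each thread find the record it is responsible for.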
A second aspect of the embodiments of the present invention provides a GPU-based data stream processing device, including:
a receiving module, configured to receive a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is a first operation logic;
a merging module, configured to merge the first subtask and at least one second subtask into one merge task, where the operation logic of the second subtask is the same as the first operation logic; and
a processing module, configured to schedule the graphics processing unit (GPU) to perform data stream processing on the merge task.
With reference to the second aspect, in a first possible implementation, the merging module is further configured to:
determine that the processing of the individual data records of the first operation data in the first subtask has no result dependence.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the data stream processed by the first operation operator is a first data stream; and
the merging module includes:
a judging unit, configured to judge whether a kernel profile contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream;
a first merging unit, configured to: if the kernel profile does not contain a second operation operator identical to the first operation operator, add the first operation operator to the kernel profile, and merge the first subtask and the at least one second subtask into one merge task; and
a second merging unit, configured to: if the kernel profile contains a second operation operator identical to the first operation operator, merge the first subtask and the at least one second subtask into one merge task.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the first merging unit is specifically configured to: if the kernel profile contains at least one first candidate operation operator group, add the first operation operator to a first candidate operation operator group; and if the kernel profile contains no first candidate operation operator group, create a first candidate operation operator group and add the first operation operator to it; where the operation logic of every operation operator in the first candidate operation operator group is the same as the first operation logic.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation, the first merging unit is specifically configured to: if the kernel profile contains at least two first candidate operation operator groups, select a first operation operator group from the at least two first candidate operation operator groups according to a first preset rule, and add the first operation operator to the first operation operator group.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the first preset rule is either of the following rules: selecting the first candidate operation operator group with the fewest operation operators as the first operation operator group; or selecting the first candidate operation operator group whose operation operators have the smallest average data volume as the first operation operator group.
With reference to the third possible implementation of the second aspect, in a sixth possible implementation, the first merging unit is further configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel profile, regroup the first operation operator and every operation operator in the at least one first candidate operation operator group according to a second preset rule.
With reference to the sixth possible implementation of the second aspect, in a seventh possible implementation, the first merging unit is specifically configured to: calculate the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator group and the per-unit-data execution cost of the first operation operator; and store operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator group.
With reference to any one of the fourth to the seventh possible implementations of the second aspect, in an eighth possible implementation, the merging module is specifically configured to: when the first subtask is triggered, merge the operation logic of every operation operator in the first operation operator group into one merged operation logic, and merge the operation data of the subtask corresponding to each operation operator in the first operation operator group into one data structure; and generate metadata information from the storage location, in that data structure, of the operation data of the subtask corresponding to each operation operator in the first operation operator group, the number of subtasks corresponding to each operation operator in the first operation operator group, the number of data records in each subtask, and the length of each data record.
A third aspect of the embodiments of the present invention provides a GPU-based data stream processing device, including:
a processor, a memory, and a system bus, where the processor and the memory are connected by the system bus and communicate with each other;
the memory is configured to store computer execution instructions; and
the processor is configured to run the computer execution instructions, so that the GPU-based data stream processing device performs the method of any possible implementation of the first aspect.
With the GPU-based data stream processing method and device provided by the embodiments of the present invention, multiple subtasks with the same operation logic are merged into one merge task, and the GPU is invoked to perform data stream processing on the merge task, thereby reducing the GPU scheduling frequency, reducing the GPU scheduling overhead, and improving the throughput of the stream data processing system.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of Embodiment 1 of the GPU-based data stream processing method of the present invention;
Fig. 2 is a schematic flowchart of Embodiment 2 of the GPU-based data stream processing method of the present invention;
Fig. 3 is a schematic diagram of a data merging result of the present invention;
Fig. 4 is a schematic flowchart of Embodiment 3 of the GPU-based data stream processing method of the present invention;
Fig. 5 is a schematic structural diagram of Embodiment 1 of the GPU-based data stream processing device of the present invention;
Fig. 6 is a schematic structural diagram of Embodiment 2 of the GPU-based data stream processing device of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are only a part, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and so on (if present) in the specification, the claims, and the accompanying drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the data so used may be interchanged where appropriate, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Moreover, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
To reduce the GPU scheduling overhead and improve the throughput of a stream processing system, the present invention merges multiple mergeable subtasks into one merge task and schedules the GPU to perform data stream processing on the merge task, thereby reducing the frequency of scheduling the GPU and reducing the GPU scheduling overhead. Furthermore, because the amount of pending data in the merge task becomes large and is processed uniformly, the large-scale parallel processing capability of the GPU can be fully exploited, which in turn improves the processing throughput of the system.
The technical solution of the present invention is described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
For ease of description, in the following embodiments of the present invention, the task currently received by the CPU is called the "first subtask"; the pending data included in the first subtask is called the "first operation data"; the operation operator of the first subtask is called the "first operation operator"; and the operation logic of the first operation operator is called the "first operation logic". Here, an operation operator specifies which processing is performed on which data stream, while operation logic specifies only which processing is performed. That is, an operation operator has two attributes: (1) the data stream it processes and (2) the processing mode; operation logic refers only to the processing mode, so different operation operators may have the same operation logic. The operator group in the GPU kernel profile (Kernel Profile) to which the first operation operator belongs is called the "first operation operator group", and a group in the kernel profile containing operation operators whose operation logic is the same as that of the first operation operator is called a "first candidate operation operator group", where the kernel profile is a set that stores kernel information. Among the pending tasks in the system, a task whose operation logic is the same as the first operation logic is called a "second subtask". An operation operator whose operation logic is the same as that of the first operation operator, and whose processed data belongs to the same data stream as the data processed by the first operation operator, is called a "second operation operator".
Fig. 1 is a schematic flowchart of Embodiment 1 of the GPU-based data stream processing method of the present invention. As shown in Fig. 1, the executing entity of this embodiment is the CPU, and the method of this embodiment is as follows:
S101: Receive a first subtask.
In the system memory, each data stream has a memory buffer, and data from the same data stream is stored in the same memory buffer. When the data volume of a memory buffer exceeds a first preset threshold, or when the buffering time exceeds a second preset threshold, the stream processing system sends a task request (that is, the first subtask) to the CPU, requesting the CPU to process the data in that buffer. The sizes of the first preset threshold and the second preset threshold are set according to the actual application, and the invention is not limited in this regard.
The first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is a first operation logic. The first operation data is the data in the data stream that the first subtask needs to process; the first operation operator specifies which operation is performed on the first operation data, for example a selection (Selection) operation or a projection (Projection) operation.
S102: Merge the first subtask and at least one second subtask into one merge task.
Here, the operation logic of the second subtask is the same as the first operation logic.
That is, subtasks with the same operation logic are merged into one merge task.
S103: Schedule the GPU to perform data stream processing on the merge task.
In this embodiment, because multiple subtasks with the same operation logic are merged into one merge task and the GPU is scheduled to perform data stream processing on the merge task, compared with scheduling the GPU once per subtask as in the prior art, the frequency of scheduling the GPU is reduced and the GPU scheduling overhead is reduced. Moreover, because the amount of pending data in the merge task becomes large, the large-scale parallel processing capability of the GPU can be fully exploited, which in turn improves the processing throughput of the system.
In the above embodiment, before S102 is performed, the method further includes: determining that the processing of the individual data records of the first operation data in the first subtask has no result dependence. "No result dependence" means that each data record in the first subtask can be processed in parallel and the processing result of each data record does not affect the processing results of the other data records, as with selection operations and projection operations. That is, only when the processing of the individual data records of the first operation data in the first subtask has no result dependence can the first subtask be merged with other subtasks that have the same operation logic. If there are result dependences between the processing of the individual data records of the first operation data in the first subtask, the GPU is directly scheduled to process the first subtask, and it is not merged with other subtasks.
In the above embodiment, suppose the data stream processed by the first operation operator is a first data stream. S102 is then specifically implemented as follows: judge whether the kernel profile contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic, and the data stream processed by the second operation operator is the first data stream. That is, two operation operators are identical only if two conditions are met: (1) their operation logic is identical; and (2) the data they process belongs to the same data stream. If the GPU kernel (Kernel) profile does not contain a second operation operator identical to the first operation operator, the system has not previously executed a subtask of the first data stream; in this case, the first operation operator is added to the kernel profile, and the first subtask and the at least one second subtask are merged into one merge task. If the kernel profile contains a second operation operator identical to the first operation operator, the system has previously processed a subtask of the first data stream; in this case, the first subtask and the at least one second subtask are directly merged into one merge task.
Further, the specific implementation of adding the first operation operator to the kernel profile is shown in Fig. 2, which is a schematic flowchart of Embodiment 2 of the GPU-based data stream processing method of the present invention.
S201: Judge whether the kernel profile contains at least one first candidate operation operator group. If so, perform S202; if not, perform S205.
Here, the operation logic of every operation operator in the first candidate operation operator group is the same as the first operation logic.
S202: Judge whether the first operation operator can be added to any of the at least one first candidate operation operator group. If so, perform S203; if not, perform S204.
Specifically, the method for judging whether the first operation operator can be added to a first candidate operation operator group may depend on the specific application. For example, for applications with strict processing-delay requirements, it is necessary to consider whether adding the first operation operator to a first candidate operation operator group would impose a larger delay cost on subsequent merge tasks of that group. That is, if, after the first operation operator is added to a first candidate operation operator group, the estimated delay of a merge task based on that group exceeds the maximum delay requirement of some operation operator in the group, that group cannot accept the first operation operator; otherwise, the first operation operator can be added to it, and it is determined that at least one first candidate operation operator group can accept the first operation operator. If none of the first candidate operation operator groups in the kernel profile can accept the first operation operator, it is determined that no first candidate operation operator group can accept it.
S203: Add the first operation operator to the first operation operator group.
When only one first candidate operation operator group can accept the first operation operator, that group is determined to be the first operation operator group, and the first operation operator is added to it. When at least two first candidate operation operator groups can accept the first operation operator, one of them is selected as the first operation operator group according to the first preset rule, and the first operation operator is added to it. The first preset rule may be: selecting, from the at least two first candidate operation operator groups, the one with the fewest operation operators as the first operation operator group; or selecting, from the at least two first candidate operation operator groups, the one whose operation operators have the smallest average data volume as the first operation operator group.
S204: Regroup each operation operator in the at least one first candidate operation operator group together with the first operation operator.
If the first operation operator cannot be added to any of the at least one first candidate operation operator group in the kernel profile, the first operation operator and every operation operator in the at least one first candidate operation operator group are regrouped according to the second preset rule. Specifically, the regrouping according to the second preset rule includes the following steps: (1) calculate the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator group and the per-unit-data execution cost of the first operation operator; and (2) store operation operators whose per-unit-data execution costs differ within a preset range in the same operation operator group.
By storing operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator group, the GPU threads are kept as load-balanced as possible during execution.
S205: Create a first candidate operation operator group, and add the first operation operator to it.
If the kernel profile contains no first candidate operation operator group, a first candidate operation operator group is created and the first operation operator is added to it; that is, the first operation operator is the first operation operator in that first candidate operation operator group.
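One possible realization of the cost-based regrouping in S204 can be sketched as follows: sort the operators by per-unit-data execution cost, then keep an operator in the current group as long as its cost stays within the preset range of the group's cheapest member. The greedy cut point is an illustrative choice under that assumption, not something the patent mandates.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Regroup operators by per-unit-data execution cost: operators whose
// costs differ within `range` end up in the same operator group, so GPU
// threads executing a merged group stay roughly load-balanced.
std::vector<std::vector<double>> regroup_by_cost(std::vector<double> costs,
                                                 double range) {
    std::sort(costs.begin(), costs.end());
    std::vector<std::vector<double>> groups;
    for (double c : costs) {
        // Start a new group when this cost drifts too far from the
        // current group's cheapest (i.e. first, since sorted) member.
        if (groups.empty() || c - groups.back().front() > range)
            groups.push_back({c});
        else
            groups.back().push_back(c);
    }
    return groups;
}
```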
In Fig. 1 or embodiment illustrated in fig. 2, a merging task is merged into the first subtask and at least one second subtask, specifically included:Operation logic merges merging two parts with corresponding pending data.
After the first subtask is triggered, the operation logic of all operation operators in the first operation operator group is merged into a union operation logic, and the corresponding pending peration data of each operation operator in the group is merged in same data structure.Storage location of the peration data of the corresponding subtask of each operation operator in the first operation operator group in the same data structure, the number of the corresponding subtask of each operation operator in the first operation operator group, the length generation metadata information of the data record bar number in each subtask and each data record.
Wherein, for the operation logic of each operation operator in the first operation operator group is merged into a union operation logic, because each operation logic in the first operation operator group is identical, therefore, their interface define it is roughly the same, for example:By taking selection operation operator as an example, its unifiedly calculate equipment framework (Compute Unified Device Architecture, hereinafter referred to as:CUDA) interface definition may be as follows:
"__global__ void selection(data, n, result, filter)"
Here "data" is the pending data, "n" is the number of data records in "data" (an integer greater than or equal to 1), "result" is an n-dimensional array that stores the result of the selection operation, and "filter" is the filter function interface of the selection operation operator. Its behavior is as follows:
Thus, if the i-th data record in "data" (where i is a positive integer less than or equal to n) satisfies the "filter" condition, that is, filter(data[i]) evaluates to true, then "result[i]" is true; otherwise it is false.
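The per-record semantics just described can be sketched in host-side Python; on the GPU the same rule is applied once per thread. This mirrors the "selection" signature above, with `filter_fn` as an illustrative stand-in for the "filter" interface:

```python
def selection(data, n, result, filter_fn):
    """Set result[i] = True iff data[i] satisfies the filter condition."""
    for i in range(n):          # on the GPU, each i would be handled by one thread
        result[i] = bool(filter_fn(data[i]))
    return result

data = [3, 8, 15, 4]
result = [False] * len(data)
selection(data, len(data), result, lambda rec: rec > 5)
# result is now [False, True, True, False]
```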
Selection operation operators of different data streams handle different data types and have different filter function definitions. The purpose of operator merging is to provide a unified interface for these different selection operation operators, so that to the GPU they appear as a single, undifferentiated operation operator. For example, the general-purpose interface of the merged selection operation operator can be defined as follows:
"__global__ void MergedSelection(mergedData, n, result, filters);"
Here the "mergedData" data structure contains the data records to be processed by all merged operation operators, together with the corresponding metadata; "n" is the total number of data records in mergedData; "result" again stores the result of the selection operation; and "filters" is a function array that records, in order, the filter function corresponding to each data stream.
Because different input data streams may have different schema definitions, take three data streams as an example whose schema definitions differ. The schema definition of data stream A is as follows:
A suitable data structure is therefore needed to store the merged operation data so that it can be processed uniformly by the "MergedSelection" interface.
For example, suppose three data streams (data stream A, data stream B and data stream C) with the schema definitions above need to be merged; their numbers of data attributes, attribute types, data record sizes and record counts may all differ. So that each stream's own data records can still be accessed without distinction after merging, some necessary metadata must also be stored: the storage location of each data stream's operation data within the shared data structure, the number of subtasks of each data stream, the number of data records in each subtask, and the length of each data record. The data structure "MergedData" that stores the result of merging the data streams can then be defined as follows:
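As a rough illustration only (the patent does not give the concrete definition), a "MergedData"-like structure holding the byte-stream data plus the per-stream "position", "count" and "length" metadata fields might look like this Python sketch; `add_stream` is a hypothetical helper, not from the source:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MergedData:
    data: bytearray = field(default_factory=bytearray)  # records of all streams, as one byte stream
    position: List[int] = field(default_factory=list)   # start offset of each stream in `data`
    count: List[int] = field(default_factory=list)      # number of records per stream
    length: List[int] = field(default_factory=list)     # record length (bytes) per stream

    def add_stream(self, records: List[bytes], record_len: int) -> None:
        """Append one stream's fixed-length records and its metadata."""
        self.position.append(len(self.data))
        self.count.append(len(records))
        self.length.append(record_len)
        for rec in records:
            self.data += rec

merged = MergedData()
merged.add_stream([b"aa", b"bb", b"cc"], record_len=2)   # stream A: 3 records
merged.add_stream([b"dddd"], record_len=4)               # stream B: 1 record
# merged.position == [0, 6], merged.count == [3, 1], merged.length == [2, 4]
```

Because "position", "count" and "length" each have one entry per merged stream, their footprint is tiny compared with the record data itself, matching the observation below about Fig. 3.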
The data field "data" stores the data records of each input data stream in byte-stream form; its layout in GPU memory is shown in Fig. 3, a schematic diagram of the data merging result of the present invention, where the record counts of data streams A, B and C are nA, nB and nC respectively. The dimensions of the "position", "count" and "length" fields equal the number of merged data streams, so the space they occupy is minimal.
Based on the "MergedData" data structure, the "MergedSelection" general-purpose interface can be implemented as follows:
That is, each GPU thread determines from its thread ID which data record it needs to process, and uses the thread ID together with the metadata to determine the data stream to which the pending record belongs and, for example, the record's start address within that stream, so that the record can be read correctly and the corresponding filter function invoked. The "MergedSelection" interface (and, similarly, other merged interfaces such as "MergedProjection", the general-purpose interface of the "Projection" operation) can be compiled in advance; at run time only the concrete parameters need to be passed to it and it is invoked dynamically. In this way the original multiple subtasks are merged into a single batch task.
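The thread-to-record mapping described above can be sketched as follows; each GPU thread would execute the loop body once for its own `tid`. The helper names (`locate_record`, `merged_selection`) are illustrative assumptions, not the patent's code:

```python
def locate_record(tid, position, count, length):
    """Map a global thread id to (stream index, byte offset of its record)."""
    stream, base = 0, 0
    while tid >= base + count[stream]:      # find which stream this tid falls in
        base += count[stream]
        stream += 1
    offset = position[stream] + (tid - base) * length[stream]
    return stream, offset

def merged_selection(data, n, result, filters, position, count, length):
    """Apply each stream's own filter to every record in the merged buffer."""
    for tid in range(n):                    # one GPU thread per tid
        stream, off = locate_record(tid, position, count, length)
        record = data[off:off + length[stream]]
        result[tid] = bool(filters[stream](record))
    return result

data = b"aabbccdddd"                        # stream A: 3 records of 2 bytes; B: 1 of 4
position, count, length = [0, 6], [3, 1], [2, 4]
filters = [lambda r: r == b"bb", lambda r: len(r) == 4]
out = merged_selection(data, 4, [False] * 4, filters, position, count, length)
# out == [False, True, False, True]
```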
Fig. 4 is a schematic flowchart of Embodiment 3 of the GPU-based data stream processing method of the present invention, taking the selection operation as an example. As shown in Fig. 4, the method of this embodiment includes:
S401: Collect the input data and the corresponding metadata information of the data stream corresponding to each operation operator to be merged.
Here the metadata information includes the record count, the record length, and so on.
S402: Create a new "MergedData" object and assign the collected data to each data field of "MergedData".
S403: Pass the "MergedData" object and the filter function of each operation operator as parameters to the "MergedSelection" kernel.
S404: Dispatch "MergedSelection" to the GPU for execution.
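Steps S401 to S404 can be summarised as the following driver sketch. Here `gpu_launch` is a hypothetical stand-in for the actual CUDA kernel launch, which the patent leaves unspecified:

```python
def run_merged_selection(streams, filters, gpu_launch):
    """streams: list of (records, record_len) per operator to be merged."""
    # S401: collect input data and metadata (record counts, record lengths).
    data, position, count, length = bytearray(), [], [], []
    for records, record_len in streams:      # S402: fill the MergedData-style fields
        position.append(len(data))
        count.append(len(records))
        length.append(record_len)
        for rec in records:
            data += rec
    n = sum(count)
    result = [False] * n
    # S403/S404: pass the merged data and the filter array to the kernel
    # and dispatch it for execution.
    gpu_launch(bytes(data), n, result, filters, position, count, length)
    return result
```

A caller would supply the real kernel launch in place of `gpu_launch`; the sketch only shows how the four steps chain together.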
Fig. 5 is a schematic structural diagram of Embodiment 1 of the GPU-based data stream processing apparatus of the present invention. The apparatus of this embodiment includes a receiving module 501, a merging module 502 and a processing module 503. The receiving module 501 is configured to receive a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is a first operation logic. The merging module 502 is configured to merge the first subtask and at least one second subtask into one merging task, where the operation logic of the second subtask is the same as the first operation logic. The processing module 503 is configured to dispatch a graphics processing unit (GPU) to perform data stream processing on the merging task.
The apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effect are similar and are not repeated here.
In the above embodiment, the merging module 502 is further configured to determine that there is no result dependence between the processing of the individual data records of the first operation data in the first subtask.
In the above embodiment, the data stream processed by the first operation operator is a first data stream, and the merging module 502 further includes a judging unit, a first combining unit and a second combining unit. The judging unit is configured to judge whether the kernel archive contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic and the data stream processed by the second operation operator is the first data stream. The first combining unit is configured to, if the kernel archive does not contain a second operation operator identical to the first operation operator, add the first operation operator to the kernel archive and merge the first subtask and at least one second subtask into one merging task. The second combining unit is configured to, if the kernel archive contains a second operation operator identical to the first operation operator, merge the first subtask and at least one second subtask into one merging task.
In the above embodiment, the first combining unit is specifically configured to: if the kernel archive contains at least one first candidate operation operator tuple, add the first operation operator to one of the first candidate operation operator tuples, where the operation logic of each operation operator in the first candidate operation operator tuple is the same as the first operation logic; and if the kernel archive contains no first candidate operation operator tuple, create the first candidate operation operator tuple and add the first operation operator to it.
In the above embodiment, the first combining unit is specifically configured to: if the kernel archive contains at least two first candidate operation operator tuples, select a first operation operator group from the at least two first candidate operation operator tuples according to a first preset rule, and add the first operation operator to the first operation operator group.
In the above embodiment, the first preset rule is either of the following rules: select the first candidate operation operator tuple with the fewest operation operators as the first operation operator group; or select the first candidate operation operator tuple whose operation operators have the smallest average data volume as the first operation operator group.
In the above embodiment, the first combining unit is further configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator tuple in the kernel archive, regroup the first operation operator and the operation operators in the at least one first candidate operation operator tuple according to a second preset rule.
In the above embodiment, the first combining unit is specifically configured to calculate the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator tuple and the per-unit-data execution cost of the first operation operator, and to store operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator tuple.
In the above embodiment, the merging module 502 is specifically configured to: when the first subtask is triggered, merge the operation logic of each operation operator in the first operation operator group into one union operation logic, and merge the operation data of the subtasks corresponding to the operation operators in the first operation operator group into the same data structure; and generate metadata information from the storage location of that operation data within the data structure, the number of subtasks corresponding to each operation operator in the group, the number of data records in each subtask, and the length of each data record.
The apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 2; its implementation principle and technical effect are similar and are not repeated here.
Fig. 6 is a schematic structural diagram of Embodiment 2 of the GPU-based data stream processing apparatus of the present invention. As shown in Fig. 6, the GPU-based data stream processing apparatus 600 of this embodiment includes a processor 601, a memory 602 and a system bus 603. The processor 601 and the memory 602 are connected by the system bus 603 and communicate with each other through it. The memory 602 is configured to store computer-executable instructions 6021, and the processor 601 is configured to run the computer-executable instructions 6021, causing the GPU-based data stream processing apparatus to perform the following method:
Receive a first subtask, where the first subtask includes first operation data and a first operation operator, and the operation logic of the first operation operator is a first operation logic; merge the first subtask and at least one second subtask into one merging task, where the operation logic of the second subtask is the same as the first operation logic; and dispatch a graphics processing unit (GPU) to perform data stream processing on the merging task.
The apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effect are similar and are not repeated here.
Further, the processor 601 is specifically configured to determine that there is no result dependence between the processing of the individual data records of the first operation data in the first subtask.
Further, the data stream processed by the first operation operator is a first data stream, and the processor 601 is specifically configured to judge whether the kernel archive contains a second operation operator identical to the first operation operator, where the operation logic of the second operation operator is the same as the first operation logic and the data stream processed by the second operation operator is the first data stream;
if the kernel archive does not contain a second operation operator identical to the first operation operator, add the first operation operator to the kernel archive and merge the first subtask and at least one second subtask into one merging task; and
if the kernel archive contains a second operation operator identical to the first operation operator, merge the first subtask and at least one second subtask into one merging task.
Further, the processor 601 is specifically configured to: if the kernel archive contains at least one first candidate operation operator tuple, add the first operation operator to one of the first candidate operation operator tuples, where the operation logic of each operation operator in the first candidate operation operator tuple is the same as the first operation logic; and
if the kernel archive contains no first candidate operation operator tuple, create the first candidate operation operator tuple and add the first operation operator to it.
Further, if the kernel archive contains at least two first candidate operation operator tuples, the processor 601 is specifically configured to select a first operation operator group from the at least two first candidate operation operator tuples according to a first preset rule, and add the first operation operator to the first operation operator group.
Further, the first preset rule is either of the following rules: select the first candidate operation operator tuple with the fewest operation operators as the first operation operator group; or select the first candidate operation operator tuple whose operation operators have the smallest average data volume as the first operation operator group.
Further, if the first operation operator cannot be added to any of the at least one first candidate operation operator tuple in the kernel archive, the processor 601 is specifically configured to regroup the first operation operator and the operation operators in the at least one first candidate operation operator tuple according to a second preset rule.
Further, the processor 601 is specifically configured to calculate the per-unit-data execution cost of each operation operator in the at least one first candidate operation operator tuple and the per-unit-data execution cost of the first operation operator, and to store operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator tuple.
Further, the processor 601 is specifically configured to: when the first subtask is triggered, merge the operation logic of each operation operator in the first operation operator group into one union operation logic, and merge the operation data of the subtasks corresponding to the operation operators in the first operation operator group into the same data structure; and generate metadata information from the storage location of that operation data within the data structure, the number of subtasks corresponding to each operation operator in the group, the number of data records in each subtask, and the length of each data record.
The apparatus of this embodiment can be used to execute the technical solution of the method embodiment shown in Fig. 2; its implementation principle and technical effect are similar and are not repeated here.
An embodiment of the present invention further provides a computer-readable medium containing computer-executable instructions, where the computer-executable instructions are used to cause a GPU-based data stream processing apparatus to perform the methods described in Embodiments 1 to 3 of the GPU-based data stream processing method of the present invention.
A person of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art will understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some or all of the technical features, without the essence of the corresponding technical solutions departing from the scope of the technical solutions of the embodiments of the present invention.

Claims (19)

  1. A GPU-based data stream processing method, characterised by comprising:
    receiving a first subtask, wherein the first subtask comprises first operation data and a first operation operator, and an operation logic of the first operation operator is a first operation logic;
    merging the first subtask and at least one second subtask into one merging task, wherein an operation logic of the second subtask is the same as the first operation logic; and
    dispatching a graphics processing unit (GPU) to perform data stream processing on the merging task.
  2. The method according to claim 1, characterised in that before the merging the first subtask and the at least one second subtask into one merging task, the method further comprises:
    determining that there is no result dependence between the processing of individual data records of the first operation data in the first subtask.
  3. The method according to claim 1 or 2, characterised in that a data stream processed by the first operation operator is a first data stream; and
    the merging the first subtask and the at least one second subtask into one merging task comprises:
    judging whether a kernel archive contains a second operation operator identical to the first operation operator, wherein an operation logic of the second operation operator is the same as the first operation logic, and a data stream processed by the second operation operator is the first data stream;
    if the kernel archive does not contain a second operation operator identical to the first operation operator, adding the first operation operator to the kernel archive, and merging the first subtask and the at least one second subtask into one merging task; and
    if the kernel archive contains a second operation operator identical to the first operation operator, merging the first subtask and the at least one second subtask into one merging task.
  4. The method according to claim 3, characterised in that the adding the first operation operator to the kernel archive comprises:
    if the kernel archive contains at least one first candidate operation operator tuple, adding the first operation operator to one of the first candidate operation operator tuples; and
    if the kernel archive contains no first candidate operation operator tuple, creating the first candidate operation operator tuple, and adding the first operation operator to the first candidate operation operator tuple;
    wherein an operation logic of each operation operator in the first candidate operation operator tuple is the same as the first operation logic.
  5. The method according to claim 4, characterised in that if the kernel archive contains at least two first candidate operation operator tuples, the adding the first operation operator to one of the first candidate operation operator tuples comprises:
    selecting a first operation operator group from the at least two first candidate operation operator tuples according to a first preset rule; and
    adding the first operation operator to the first operation operator group.
  6. The method according to claim 5, characterised in that the first preset rule is either of the following rules:
    selecting the first candidate operation operator tuple with the fewest operation operators as the first operation operator group; or
    selecting the first candidate operation operator tuple whose operation operators have the smallest average data volume as the first operation operator group.
  7. The method according to claim 4, characterised by further comprising: if the first operation operator cannot be added to any of the at least one first candidate operation operator tuple in the kernel archive, regrouping the first operation operator and the operation operators in the at least one first candidate operation operator tuple according to a second preset rule.
  8. The method according to claim 7, characterised in that the regrouping the first operation operator and the operation operators in the at least one first candidate operation operator tuple according to the second preset rule comprises:
    calculating a per-unit-data execution cost of each operation operator in the at least one first candidate operation operator tuple and a per-unit-data execution cost of the first operation operator; and
    storing operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator tuple.
  9. The method according to any one of claims 5 to 8, characterised in that the merging the first subtask and the at least one second subtask into one merging task comprises:
    when the first subtask is triggered, merging the operation logic of each operation operator in the first operation operator group into one union operation logic, and merging the operation data of the subtasks corresponding to the operation operators in the first operation operator group into a same data structure; and
    generating metadata information from a storage location, in the same data structure, of the operation data of the subtask corresponding to each operation operator in the first operation operator group, a number of subtasks corresponding to each operation operator in the first operation operator group, a number of data records in each subtask, and a length of each data record.
  10. A GPU-based data stream processing apparatus, characterised by comprising:
    a receiving module, configured to receive a first subtask, wherein the first subtask comprises first operation data and a first operation operator, and an operation logic of the first operation operator is a first operation logic;
    a merging module, configured to merge the first subtask and at least one second subtask into one merging task, wherein an operation logic of the second subtask is the same as the first operation logic; and
    a processing module, configured to dispatch a graphics processing unit (GPU) to perform data stream processing on the merging task.
  11. The apparatus according to claim 10, characterised in that the merging module is further configured to determine that there is no result dependence between the processing of individual data records of the first operation data in the first subtask.
  12. The apparatus according to claim 10 or 11, characterised in that a data stream processed by the first operation operator is a first data stream; and
    the merging module comprises:
    a judging unit, configured to judge whether a kernel archive contains a second operation operator identical to the first operation operator, wherein an operation logic of the second operation operator is the same as the first operation logic, and a data stream processed by the second operation operator is the first data stream;
    a first combining unit, configured to: if the kernel archive does not contain a second operation operator identical to the first operation operator, add the first operation operator to the kernel archive, and merge the first subtask and the at least one second subtask into one merging task; and
    a second combining unit, configured to: if the kernel archive contains a second operation operator identical to the first operation operator, merge the first subtask and the at least one second subtask into one merging task.
  13. The apparatus according to claim 12, characterised in that the first combining unit is specifically configured to: if the kernel archive contains at least one first candidate operation operator tuple, add the first operation operator to one of the first candidate operation operator tuples; and if the kernel archive contains no first candidate operation operator tuple, create the first candidate operation operator tuple and add the first operation operator to it; wherein an operation logic of each operation operator in the first candidate operation operator tuple is the same as the first operation logic.
  14. The apparatus according to claim 13, characterised in that the first combining unit is specifically configured to: if the kernel archive contains at least two first candidate operation operator tuples, select a first operation operator group from the at least two first candidate operation operator tuples according to a first preset rule, and add the first operation operator to the first operation operator group.
  15. The apparatus according to claim 14, characterised in that the first preset rule is either of the following rules: selecting the first candidate operation operator tuple with the fewest operation operators as the first operation operator group; or selecting the first candidate operation operator tuple whose operation operators have the smallest average data volume as the first operation operator group.
  16. The apparatus according to claim 13, characterised in that the first combining unit is further configured to: if the first operation operator cannot be added to any of the at least one first candidate operation operator tuple in the kernel archive, regroup the first operation operator and the operation operators in the at least one first candidate operation operator tuple according to a second preset rule.
  17. The apparatus according to claim 16, characterised in that the first combining unit is specifically configured to calculate a per-unit-data execution cost of each operation operator in the at least one first candidate operation operator tuple and a per-unit-data execution cost of the first operation operator, and to store operation operators whose per-unit-data execution costs differ within a preset range in the same first candidate operation operator tuple.
  18. The apparatus according to any one of claims 14 to 17, characterised in that the merging module is specifically configured to: when the first subtask is triggered, merge the operation logic of each operation operator in the first operation operator group into one union operation logic, and merge the operation data of the subtasks corresponding to the operation operators in the first operation operator group into a same data structure; and generate metadata information from a storage location, in the same data structure, of the operation data of the subtask corresponding to each operation operator in the first operation operator group, a number of subtasks corresponding to each operation operator in the first operation operator group, a number of data records in each subtask, and a length of each data record.
  19. A GPU-based data stream processing apparatus, characterised by comprising:
    a processor, a memory and a system bus;
    wherein the processor and the memory are connected by the system bus and communicate with each other through it;
    the memory is configured to store computer-executable instructions; and
    the processor is configured to run the computer-executable instructions, causing the GPU-based data stream processing apparatus to perform the method according to any one of claims 1 to 9.
CN201480038261.0A 2014-09-15 2014-09-15 Method and device for processing data stream based on gpu Pending CN105637482A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/086523 WO2016041126A1 (en) 2014-09-15 2014-09-15 Method and device for processing data stream based on gpu

Publications (1)

Publication Number Publication Date
CN105637482A true CN105637482A (en) 2016-06-01


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106686352A (en) * 2016-12-23 2017-05-17 北京大学 Real-time processing method of multi-channel video data on multi-GPU platform
CN111694675A (en) * 2019-03-15 2020-09-22 上海商汤智能科技有限公司 Task scheduling method and device and storage medium
CN111831425A (en) * 2019-04-18 2020-10-27 阿里巴巴集团控股有限公司 Data processing method, device and equipment
CN111899149A (en) * 2020-07-09 2020-11-06 浙江大华技术股份有限公司 Image processing method and device based on operator fusion and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395234A (en) * 2019-08-16 2021-02-23 阿里巴巴集团控股有限公司 Request processing method and device
CN112463158B (en) * 2020-11-25 2023-05-23 安徽寒武纪信息科技有限公司 Compiling method, compiling device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7598953B2 (en) * 2004-11-05 2009-10-06 Microsoft Corporation Interpreter for simplified programming of graphics processor units in general purpose programming languages
US8136104B2 (en) * 2006-06-20 2012-03-13 Google Inc. Systems and methods for determining compute kernels for an application in a parallel-processing computer system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609978A (en) * 2012-01-13 2012-07-25 中国人民解放军信息工程大学 Method for accelerating cone-beam CT (computerized tomography) image reconstruction by using GPU (graphics processing unit) based on CUDA (compute unified device architecture) architecture
EP2620873A1 (en) * 2012-01-27 2013-07-31 Samsung Electronics Co., Ltd Resource allocation method and apparatus of GPU
CN104137075A (en) * 2012-01-27 2014-11-05 三星电子株式会社 Resource allocation method and apparatus of GPU
CN103309889A (en) * 2012-03-15 2013-09-18 华北计算机系统工程研究所 Method for realizing real-time data parallel compression by utilizing GPU (graphics processing unit) cooperative computing
CN102708009A (en) * 2012-04-19 2012-10-03 华为技术有限公司 Method for sharing GPU (graphics processing unit) by multiple tasks based on CUDA (compute unified device architecture)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Beijing Society of Image and Graphics: "Image and Graphics Technology Research and Applications 2009", 30 May 2009, Communication University of China Press *
Xu Pin et al.: "Research on General-Purpose Numerical Computation Using GPUs", Journal of Communication University of China (Science and Technology) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106686352A (en) * 2016-12-23 2017-05-17 北京大学 Real-time processing method of multi-channel video data on multi-GPU platform
CN106686352B (en) * 2016-12-23 2019-06-07 北京大学 Real-time processing method of multi-channel video data on multi-GPU platform
CN111694675A (en) * 2019-03-15 2020-09-22 上海商汤智能科技有限公司 Task scheduling method and device and storage medium
CN111694675B (en) * 2019-03-15 2022-03-08 上海商汤智能科技有限公司 Task scheduling method and device, and storage medium
US11347546B2 (en) 2019-03-15 2022-05-31 Shanghai Sensetime Intelligent Technology Co., Ltd Task scheduling method and device, and computer storage medium
CN111831425A (en) * 2019-04-18 2020-10-27 阿里巴巴集团控股有限公司 Data processing method, device and equipment
CN111831425B (en) * 2019-04-18 2024-07-16 阿里巴巴集团控股有限公司 Data processing method, device and equipment
CN111899149A (en) * 2020-07-09 2020-11-06 浙江大华技术股份有限公司 Image processing method and device based on operator fusion and storage medium

Also Published As

Publication number Publication date
WO2016041126A1 (en) 2016-03-24

Similar Documents

Publication Publication Date Title
CN104834561B (en) 2018-03-09 Data processing method and device
CN105637482A (en) 2016-06-01 Method and device for processing data stream based on GPU
US8990827B2 (en) Optimizing data warehousing applications for GPUs using dynamic stream scheduling and dispatch of fused and split kernels
KR101609079B1 (en) Instruction culling in graphics processing unit
CN103608776A (en) Dynamic work partitioning on heterogeneous processing device
CN110851236A (en) Real-time resource scheduling method and device, computer equipment and storage medium
CN107633001A (en) Hash partition optimization method and device
CN116954929B (en) Dynamic GPU scheduling method and system for live migration
US20140244846A1 (en) Information processing apparatus, resource control method, and program
CN117808048A (en) Operator execution method, device, equipment and storage medium
US9830731B2 (en) Methods of a graphics-processing unit for tile-based rendering of a display area and graphics-processing apparatus
CN110064198A (en) Processing method and processing device, storage medium and the electronic device of resource
CN102831102A (en) Method and system for carrying out matrix product operation on computer cluster
US20110185366A1 (en) Load-balancing of processes based on inertia
US20250053828A1 (en) Task solving method and apparatus thereof
CN112130977B (en) Task scheduling method, device, equipment and medium
GB2504737A (en) Load balancing in SAP (RTM) system
CN113434265A (en) Workflow scheduling method, server and medium
CN102662729A (en) Method for dispatching growth models for simulation of vast forests
CN114356550B (en) Automatic computing resource allocation method and system for three-level parallel middleware
CN116069480A (en) Processor and computing device
CN110415162B (en) Adaptive Graph Partitioning Method for Heterogeneous Fusion Processors in Big Data
CN118210615A (en) Resource allocation method and device
KR20210080749A (en) Apparatus and method for performing spatial join
CN113469282B (en) Feature comparison method, device and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160601
