
CN111199274A - Two-dimensional convolution implementation method and device - Google Patents

Two-dimensional convolution implementation method and device

Info

Publication number
CN111199274A
Authority
CN
China
Prior art keywords
dimensional
convolution
result
sub
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010011113.7A
Other languages
Chinese (zh)
Inventor
鄢贵海
卢文岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yusur Technology Co ltd
Original Assignee
Yusur Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yusur Technology Co ltd filed Critical Yusur Technology Co ltd
Priority to CN202010011113.7A priority Critical patent/CN111199274A/en
Publication of CN111199274A publication Critical patent/CN111199274A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a method and a device for implementing two-dimensional convolution. The method comprises the following steps: one-dimensionalizing two-dimensional input data, namely expanding the two-dimensional input data into one-dimensional data along a set dimension; performing convolution kernel decomposition, namely decomposing the two-dimensional convolution kernel into independent rows or independent columns according to the expansion mode of the two-dimensional input data; calculating sub-results, namely performing a one-dimensional convolution operation between the one-dimensional data and each decomposed convolution kernel to obtain a corresponding plurality of one-dimensional sub-result sequences; and merging the sub-results, namely merging the corresponding data in the plurality of one-dimensional sub-result sequences to obtain and output the final two-dimensional convolution calculation result. The invention also provides a device for realizing the two-dimensional convolution. Used in combination, the method and the device convert all two-dimensional data into one-dimensional data and complete the two-dimensional convolution through one-dimensional convolution operations, thereby greatly simplifying the control complexity and the on-chip data routing difficulty.

Description

Two-dimensional convolution implementation method and device
Technical Field
The invention relates to data and database processing, and in particular to a two-dimensional convolution implementation method and device for sequence data processing.
Background
Two-dimensional convolution is one of the core operations in many scientific computations and accounts for most of the computation in an entire application; for example, the computation between the input feature map and the two-dimensional convolution kernel in each layer of a convolutional neural network accounts for more than 90% of the computation of the whole network model. Improving the computational efficiency of two-dimensional convolution is therefore key to optimizing many core scientific computations.
To accelerate two-dimensional convolution, more and more dedicated computing architectures have been proposed. These architectures are designed around the characteristics of two-dimensional convolution: the computing units are organized two-dimensionally and process two-dimensional data directly, giving high computational efficiency. However, addressing two-dimensional data in this way is very complicated and poses great challenges for chip design. To realize such complicated addressing and supply data to the computing units in time, most existing schemes rely on a multi-level on-chip storage structure in which the storage is finely partitioned and the data is carefully arranged, making the control very complex. Even with such complicated schemes, data conflicts still frequently leave the computing units short of data, and the overall computing performance remains low.
Disclosure of Invention
The invention aims to solve the technical problems of complex two-dimensional convolution addressing and complex control in the prior art and provides a method and a device for realizing two-dimensional convolution.
In order to achieve the above object, the present invention provides a two-dimensional convolution implementation method, wherein the method comprises the following steps:
one-dimensionalizing the two-dimensional input data, namely expanding the two-dimensional input data into one-dimensional data along a set dimension;
performing convolution kernel decomposition, namely decomposing the two-dimensional convolution kernel into independent rows or independent columns according to the expansion mode of the two-dimensional input data;
calculating sub-results, namely performing one-dimensional convolution operations between the one-dimensional data and each of the decomposed convolution kernels to obtain a corresponding plurality of one-dimensional sub-result sequences; and
merging the sub-results, namely merging the corresponding data in the plurality of one-dimensional sub-result sequences to obtain and output the final two-dimensional convolution calculation result.
Optionally, in the above two-dimensional convolution implementation method, the two-dimensional input data is expanded into the one-dimensional data according to rows, columns, diagonal lines, or irregular manners.
Optionally, in the above two-dimensional convolution implementation method, a manner of the convolution kernel decomposition is completely consistent with a manner of expanding the two-dimensional input data into the one-dimensional data.
Optionally, in the above two-dimensional convolution implementation method, the one-dimensionalization of the two-dimensional input data further comprises: expanding the two-dimensional input data by rows and joining the ends of adjacent rows to form the one-dimensional data;
the convolution kernel decomposition further comprises: decomposing the two-dimensional convolution kernel, likewise by rows, into a first convolution kernel row and a second convolution kernel row;
the sub-result calculation further comprises: performing one-dimensional convolution operations between the one-dimensional data and the first row of the convolution kernel and between the one-dimensional data and the second row of the convolution kernel respectively, wherein, when the one-dimensional data is convolved with the first row of the convolution kernel, the sliding window moves from the first row of the one-dimensional data and ends at the second-to-last row; and when the one-dimensional data is convolved with the second row of the convolution kernel, the sliding window moves from the second row of the one-dimensional data to the last row; and
the sub-result merging further comprises: adding the corresponding elements of the sub-results obtained by convolving the one-dimensional data with the first row of the convolution kernel and with the second row of the convolution kernel, to obtain the final calculation result.
Optionally, in the above two-dimensional convolution implementation method, the intermediate convolution result generated when the one-dimensional input data is convolved with one decomposed convolution kernel is accumulated with the previously calculated sub-result in the subsequent sub-result merging operation and cached, and is then accumulated and merged with the newly calculated sub-result, until the sub-results of all the decomposed convolution kernels have been calculated and merged.
In order to better achieve the above object, the present invention further provides a two-dimensional convolution implementation apparatus, including:
the one-dimensional convolution operation unit is used for realizing one-dimensional convolution operation between one-dimensional input data obtained by expanding two-dimensional input data according to a set dimension and a convolution kernel obtained by decomposing a two-dimensional convolution kernel according to an expansion mode of the two-dimensional input data to obtain a plurality of one-dimensional sub-result sequences;
the sub-result merging operation unit is used for completing merging operation of a plurality of one-dimensional sub-results; and
and the controller is respectively connected with the one-dimensional convolution operation unit and the sub-result merging operation unit and is used for generating a control signal so as to coordinate the operation between the one-dimensional convolution operation unit and the sub-result merging operation unit.
In the above two-dimensional convolution implementation apparatus, the one-dimensional convolution operation unit is in the form of a multiply-accumulate tree, and the controller generates a corresponding control signal for implementing discontinuous control of the convolution of the one-dimensional input data.
The above two-dimensional convolution implementation apparatus further includes a plurality of buffer units, where the plurality of buffer units are used to synchronize the data streams, and the plurality of buffer units further include:
the input data buffer unit is used for buffering the one-dimensional input data and is respectively connected with the controller and the one-dimensional convolution operation unit;
the convolution kernel buffer unit is used for buffering decomposed convolution kernel data and is respectively connected with the controller and the one-dimensional convolution operation unit; and
and the sub-result buffer unit is used for buffering the calculation result of the one-dimensional convolution operation unit and is respectively connected with the controller, the one-dimensional convolution operation unit and the sub-result merging operation unit.
The above two-dimensional convolution implementation apparatus, wherein the plurality of buffer units further include:
and the merging result buffer unit is used for reading in again and buffering the convolution result of the one-dimensional input data with the first row of the convolution kernel from the previous pass, and is connected to the controller and the sub-result merging operation unit respectively.
In the above two-dimensional convolution implementation apparatus, the input data buffer unit reads in the one-dimensional input data again and buffers it; the convolution kernel buffer unit reads in and buffers the second row of the convolution kernel; the merging result buffer unit reads in and buffers the intermediate convolution result of the one-dimensional input data with the first row of the convolution kernel from the previous pass; the one-dimensional convolution operation unit reads the one-dimensional input data and the second row of the convolution kernel from the input data buffer unit and the convolution kernel buffer unit respectively, performs the one-dimensional convolution operation, and caches the operation result in the sub-result buffer unit; and the sub-result merging operation unit reads the sub-result currently calculated and output by the one-dimensional convolution operation unit from the sub-result buffer unit, reads the intermediate convolution result from the merging result buffer unit, accumulates and merges the two, and outputs the final calculation result, until the final two-dimensional convolution operation is completed.
The technical effects of the embodiment of the invention comprise:
the two-dimensional convolution one-dimensional method of the embodiment of the invention overcomes the problems of complicated addressing and large control difficulty of two-dimensional convolution operation. All two-dimensional data can be converted into one-dimensional data, and the two-dimensional convolution operation can be completed through the one-dimensional convolution operation, so that the on-chip data storage difficulty is greatly reduced, and the routing complexity and the control complexity are simplified. Meanwhile, the device of the embodiment of the invention can perfectly support one-dimensional convolution operation, greatly improve the calculation efficiency, and reduce the addressing difficulty of data in storage and the complexity of the whole calculation control on the premise of not reducing the processing performance.
It will be appreciated by those skilled in the art that the objects and advantages that can be achieved with the present invention are not limited to the specific details set forth above, and that these and other objects that can be achieved with the present invention will be more clearly understood from the detailed description that follows.
Drawings
FIG. 1 is a schematic flow chart of a two-dimensional convolution implementation method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a two-dimensional convolution implementation apparatus according to an embodiment of the present invention;
FIG. 3 is a diagram of a one-dimensional convolution operation unit according to an embodiment of the present invention;
FIG. 4 is an example of a two-dimensional convolution;
FIG. 5 is a diagram illustrating a two-dimensional convolution and one-dimensional process according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the structures and/or processing steps closely related to the scheme according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.
It should be emphasized that the term "comprising/including/having", when used herein, specifies the presence of stated features, elements, steps or components, but does not preclude the presence or addition of one or more other features, elements, steps or components.
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.
In order to solve the problems of complicated addressing and high control difficulty in the two-dimensional convolution operation, the embodiment of the invention provides a method for one-dimensionalizing the two-dimensional convolution operation. Fig. 1 is a schematic flow chart of a two-dimensional convolution implementation method according to an embodiment of the present invention. Referring to Fig. 1, the two-dimensional convolution implementation method of this embodiment includes the following steps:
Step S100: one-dimensionalizing the two-dimensional input data, namely expanding the two-dimensional input data into one-dimensional data along a set dimension.
Step S200: performing convolution kernel decomposition, namely decomposing the two-dimensional convolution kernel into independent rows or independent columns according to the expansion mode of the two-dimensional input data in step S100.
Step S300: calculating sub-results, namely performing one-dimensional convolution operations between the one-dimensional data obtained in step S100 and each of the convolution kernels decomposed in step S200 to obtain a corresponding plurality of one-dimensional sub-result sequences.
Step S400: merging the sub-results, namely merging the corresponding data in the plurality of one-dimensional sub-result sequences calculated in step S300 to obtain and output the final two-dimensional convolution calculation result.
In step S100, the set dimension may be the row dimension, the column dimension, or a diagonal dimension; that is, the two-dimensional input data is typically expanded into a one-dimensional sequence by rows or by columns, but it may also be expanded into one-dimensional data along diagonals or in another, more irregular manner.
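For illustration only (this is a software sketch using NumPy, not part of the claimed hardware; the 3x3 array and the particular anti-diagonal ordering are hypothetical examples), the row-wise, column-wise and diagonal expansions of step S100 can be written as:

    import numpy as np

    # Hypothetical 3x3 two-dimensional input, used only to illustrate step S100.
    X = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

    row_wise = X.reshape(-1)       # row expansion:    1 2 3 4 5 6 7 8 9
    col_wise = X.T.reshape(-1)     # column expansion: 1 4 7 2 5 8 3 6 9

    # One possible anti-diagonal expansion: 1 2 4 3 5 7 6 8 9
    diag_wise = np.concatenate(
        [np.diag(np.fliplr(X), k) for k in range(X.shape[1] - 1, -X.shape[0], -1)])

    print(row_wise, col_wise, diag_wise, sep="\n")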
In step S200, the convolution kernel is decomposed in the same manner in which the two-dimensional input data is expanded into one-dimensional data in step S100; for example, if the input data is expanded by rows, the convolution kernel is also decomposed by rows.
In step S300, the one-dimensional input data is subjected to a one-dimensional convolution operation with each of the parts into which the convolution kernel was decomposed in step S200, so as to obtain a plurality of one-dimensional sub-results.
In step S400, the summation operation is performed on the corresponding elements in all the one-dimensional sub-results to obtain the final two-dimensional convolution calculation result.
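Taken together, steps S100 to S400 can be sketched in software as follows. This is a minimal NumPy sketch for checking the arithmetic, not the claimed implementation; the function names, the test sizes and the random data are assumptions made only for this illustration.

    import numpy as np

    def conv1d_valid(x, k):
        """Sliding-window multiply-accumulate over a 1-D sequence (step S300)."""
        n = len(x) - len(k) + 1
        return np.array([np.dot(x[p:p + len(k)], k) for p in range(n)])

    def conv2d_via_1d(X, K):
        H, W = X.shape
        kh, kw = K.shape
        x = X.reshape(-1)                        # step S100: row-wise expansion
        rows = [K[a, :] for a in range(kh)]      # step S200: kernel decomposed into rows

        # Step S300: one 1-D sub-result sequence per decomposed kernel row.
        subs = [conv1d_valid(x, r) for r in rows]

        # Step S400: merge corresponding data.  The sub-result of kernel row a that
        # contributes to output element (i, j) sits at position (i + a) * W + j,
        # and only positions with j <= W - kw correspond to windows that do not
        # straddle a row boundary of the original 2-D data.
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[i, j] = sum(subs[a][(i + a) * W + j] for a in range(kh))
        return out

    # Check against a direct 2-D convolution on hypothetical data.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 5, size=(5, 6)).astype(float)
    K = rng.integers(0, 3, size=(3, 3)).astype(float)
    direct = np.array([[np.sum(X[i:i + 3, j:j + 3] * K) for j in range(4)]
                       for i in range(3)])
    assert np.allclose(conv2d_via_1d(X, K), direct)

The index arithmetic in step S400 is exactly the "corresponding data" relationship described above: sub-results belonging to different kernel rows are offset from one another by one row of the expanded input.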
Referring to fig. 2, fig. 2 is a schematic structural diagram of a two-dimensional convolution implementation apparatus for rapidly implementing two-dimensional convolution according to an embodiment of the present invention. The two-dimensional convolution implementation device of the embodiment comprises: the device comprises a one-dimensional convolution operation unit, a sub-result merging operation unit and a controller.
The one-dimensional convolution operation unit is used for realizing the one-dimensional convolution operation between the one-dimensional input data and the decomposed convolution kernel. Its organization is relatively flexible: any unit capable of performing a one-dimensional convolution can be used, the most typical structure being a multiply-accumulate tree, although it is not limited to this form. An additional control signal from the controller is, however, required to handle the discontinuity of the one-dimensional input data convolution; for example, with row-wise expansion, a new convolution window calculation must be started whenever the end of one row switches over to the data of the next row.
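A purely illustrative software analogue of such a unit is sketched below (Python/NumPy assumed; the names mac_tree and conv1d_with_row_control and the modulo-based validity check are hypothetical stand-ins for the hardware controller described here):

    import numpy as np

    def mac_tree(products):
        """Pairwise adder tree, the typical organisation of the 1-D convolution unit."""
        vals = list(products)
        while len(vals) > 1:
            nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
            if len(vals) % 2:            # an odd leftover is carried to the next level
                nxt.append(vals[-1])
            vals = nxt
        return vals[0]

    def conv1d_with_row_control(x, kernel_row, W):
        """1-D convolution over row-expanded data of original row length W.
        The 'control signal' is the check j + kw <= W: a window that would run
        past the end of an original row is skipped, and a new window starts at
        the head of the next row."""
        kw = len(kernel_row)
        out = []
        for p in range(len(x) - kw + 1):
            j = p % W                    # column of the window start in the 2-D data
            if j + kw <= W:              # window stays inside one row: valid
                out.append(mac_tree([x[p + b] * kernel_row[b] for b in range(kw)]))
        return np.array(out)

    x = np.arange(9.0)                   # hypothetical row-expanded 3x3 input
    print(conv1d_with_row_control(x, np.array([1.0, -1.0]), W=3))

Because invalid window positions are dropped here, each sub-result sequence contains only the windows that lie entirely within one row of the original two-dimensional data.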
The sub-result merging operation unit is used for completing the merging operation of the plurality of sub-results.
The controller is respectively connected with the one-dimensional convolution operation unit and the sub-result combination operation unit and is used for generating control signals so as to coordinate the operation between the one-dimensional convolution operation unit and the sub-result combination operation unit.
In this embodiment, the two-dimensional convolution implementation apparatus may further include a plurality of buffer units, where the plurality of buffer units are used to synchronize the data streams, and the plurality of buffer units further include: the device comprises an input data buffer unit, a convolution kernel buffer unit, a sub-result buffer unit and a merging result buffer unit.
The input data buffer unit is used for buffering one-dimensional input data and is respectively connected with the controller and the one-dimensional convolution operation unit.
And the convolution kernel buffer unit is used for buffering decomposed convolution kernel data and is respectively connected with the controller and the one-dimensional convolution operation unit.
And the sub-result buffer unit is used for caching the intermediate calculation result of the one-dimensional convolution operation unit and is respectively connected with the controller, the one-dimensional convolution operation unit and the sub-result merging operation unit.
The merging result buffer unit is used for reading in again and buffering the convolution result of the one-dimensional input data with the first row of the convolution kernel from the previous pass, and is connected to the controller and the sub-result merging operation unit respectively.
The intermediate convolution result generated in the one-dimensional convolution operation unit by the one-dimensional input data and one part of the decomposed convolution kernel is accumulated with the previously calculated sub-result by the subsequent sub-result merging operation unit. When the one-dimensional input data is convolved with another part of the decomposed convolution kernel, the merged sub-result is fed back into the computation: it is cached by the merging result buffer unit and then accumulated and merged with the newly calculated sub-result in the sub-result merging operation unit. This continues until the sub-results of all the decomposed convolution kernels have been calculated and merged, at which point the whole two-dimensional convolution calculation is complete. The input data buffer unit, the convolution kernel buffer unit, the merging result buffer unit and the sub-result buffer unit are used to synchronize the data streams; when the data streams are already well synchronized, these buffer units are not indispensable modules and may be omitted.
Referring to fig. 3, fig. 3 is a schematic diagram of a one-dimensional convolution operation unit according to an embodiment of the present invention. In this embodiment, the one-dimensional convolution operation unit is in the form of a multiply-accumulate tree, and the controller generates a corresponding control signal for realizing discontinuous control of the one-dimensional input data convolution.
The working process of the one-dimensional convolution operation is as follows:
First, the one-dimensional input data and the first row of the convolution kernel are read into the input data buffer unit and the convolution kernel buffer unit respectively for buffering. The one-dimensional convolution operation unit then takes the input data out of the input data buffer unit element by element and takes each data element of the first row of the convolution kernel out of the convolution kernel buffer unit, performs the one-dimensional convolution operation, and buffers the calculation result in the sub-result buffer unit. Because this is the one-dimensional convolution of the one-dimensional input data with the first row of the convolution kernel, no earlier result needs to be merged, and the result can be output and stored directly until the whole one-dimensional input data has been processed.
Then, the one-dimensional input data is read into the input data buffer unit again for buffering, the second row of the convolution kernel is read into the convolution kernel buffer unit for buffering, and at the same time the convolution result of the one-dimensional input data with the first row of the convolution kernel from the previous pass is read into the merging result buffer unit for buffering.
Next, the one-dimensional convolution operation unit reads the one-dimensional input data and the second row of the convolution kernel from the input data buffer unit and the convolution kernel buffer unit respectively, performs the one-dimensional convolution operation, and buffers the operation result in the sub-result buffer unit.
Finally, the sub-result merging operation unit reads the sub-result just calculated and output by the one-dimensional convolution operation unit from the sub-result buffer unit, reads the buffered intermediate convolution result from the merging result buffer unit, accumulates and merges the two, and outputs the final calculation result, continuing until the whole one-dimensional input data has been processed and the final two-dimensional convolution operation is complete.
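The working process above can be modelled behaviourally as follows. This is a software sketch only: the buffer units are modelled as ordinary Python containers, a two-row convolution kernel is assumed as in the embodiment of Fig. 5, and all names and sizes are illustrative assumptions.

    from collections import deque

    import numpy as np

    def two_pass_dataflow(X, K):
        """Behavioural model of the two-pass working process for a 2-row kernel."""
        H, W = X.shape
        kw = K.shape[1]
        x = X.reshape(-1)                                 # row-expanded 1-D input data
        # Window start positions whose windows stay inside one original row.
        valid = [p for p in range(len(x) - kw + 1) if p % W + kw <= W]

        def one_dim_conv(kernel_row):
            input_buffer = list(x)                        # input data buffer unit (re-read each pass)
            kernel_buffer = list(kernel_row)              # convolution kernel buffer unit
            # Sub-result buffer unit: one value per valid window position.
            return deque(np.dot(input_buffer[p:p + kw], kernel_buffer) for p in valid)

        # Pass 1: convolve with kernel row 1; there is nothing to merge yet, so the
        # sub-results are stored directly (here they become the merging result buffer).
        merge_buffer = one_dim_conv(K[0])

        # Pass 2: convolve with kernel row 2 and merge with the buffered pass-1 results.
        sub_buffer = one_dim_conv(K[1])
        per_row = W - kw + 1
        out = [merge_buffer[idx] + sub_buffer[idx + per_row]   # align row i with row i + 1
               for idx in range(len(sub_buffer) - per_row)]
        return np.array(out).reshape(H - 1, per_row)

    X = np.arange(12, dtype=float).reshape(3, 4)          # hypothetical 3x4 input
    K = np.array([[1., 0.], [0., 1.]])                    # hypothetical 2x2 kernel
    print(two_pass_dataflow(X, K))                        # [[ 5.  7.  9.] [13. 15. 17.]]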
Fig. 4 shows an example of a two-dimensional convolution. The input two-dimensional data is convolved with the two-dimensional convolution kernel in a sliding-window manner: the sliding window moves along the row and column directions until it reaches the end of each, and at every window position the data elements inside the window are multiplied one by one with the corresponding elements of the convolution kernel and the products are accumulated to produce one element of the result.
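For reference, the direct sliding-window computation of Fig. 4 corresponds to the following sketch (sizes and data are hypothetical; this is the result that the one-dimensionalized method must reproduce):

    import numpy as np

    def conv2d_direct(X, K):
        """Direct sliding-window 2-D convolution as illustrated in Fig. 4."""
        H, W = X.shape
        kh, kw = K.shape
        out = np.empty((H - kh + 1, W - kw + 1))
        for i in range(H - kh + 1):           # slide the window along the rows
            for j in range(W - kw + 1):       # ... and along the columns
                out[i, j] = np.sum(X[i:i + kh, j:j + kw] * K)   # multiply-accumulate
        return out

    print(conv2d_direct(np.arange(9.0).reshape(3, 3), np.ones((2, 2))))
    # [[ 8. 12.]
    #  [20. 24.]]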
Fig. 5 is a schematic diagram illustrating the one-dimensionalization of a two-dimensional convolution according to an embodiment of the present invention. In this embodiment, the one-dimensionalization of the two-dimensional input data further includes:
First, the two-dimensional input data is expanded by rows, and the ends of adjacent rows are joined to form the one-dimensional input data (one-dimensional data for short). Second, the convolution kernel is decomposed: the two-dimensional convolution kernel is likewise decomposed by rows into a first convolution kernel row and a second convolution kernel row. Third, the sub-results are calculated: the one-dimensional data is convolved with the first convolution kernel row and with the second convolution kernel row respectively; when the one-dimensional data is convolved with the first row of the convolution kernel, the sliding window moves from the first row of the one-dimensional data and ends at the second-to-last row; when the one-dimensional data is convolved with the second row of the convolution kernel, the sliding window moves from the second row of the one-dimensional data to the last row. Finally, the sub-results are merged: the corresponding elements of the sub-results obtained by convolving the one-dimensional data with the first row and with the second row of the convolution kernel are added to obtain the final calculation result.
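A minimal numeric sketch of this embodiment (a hypothetical 3x3 input and 2x2 kernel, NumPy assumed) makes the two window ranges and the merging step explicit:

    import numpy as np

    X = np.array([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])          # hypothetical 2-D input data
    K = np.array([[1., 0.],
                  [0., 1.]])              # hypothetical 2-D convolution kernel
    H, W = X.shape
    kw = K.shape[1]
    x = X.reshape(-1)                     # rows joined end to end: 1 2 3 4 5 6 7 8 9

    def windows(first_row, last_row):
        """Valid sliding-window start positions between the given rows (1-based)."""
        return [i * W + j
                for i in range(first_row - 1, last_row)
                for j in range(W - kw + 1)]

    # Sub-result with kernel row 1: the window moves from row 1 to the second-to-last row.
    sub1 = [np.dot(x[p:p + kw], K[0]) for p in windows(1, H - 1)]
    # Sub-result with kernel row 2: the window moves from row 2 to the last row.
    sub2 = [np.dot(x[p:p + kw], K[1]) for p in windows(2, H)]

    # Merging: corresponding elements of the two sub-result sequences are added.
    result = (np.array(sub1) + np.array(sub2)).reshape(H - 1, W - kw + 1)

    direct = np.array([[np.sum(X[i:i + 2, j:j + 2] * K) for j in range(W - 1)]
                       for i in range(H - 1)])
    assert np.allclose(result, direct)
    print(result)                         # [[ 6.  8.]
                                          #  [12. 14.]]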
The intermediate convolution result generated when the one-dimensional input data is convolved with one decomposed convolution kernel is accumulated with the previously calculated sub-result in the subsequent sub-result merging operation and cached, and is then accumulated and merged with the newly calculated sub-result, until the sub-results of all the decomposed convolution kernels have been calculated and merged.
In convolution, many of the operations are accumulations. Often the final result cannot be obtained in a single pass, so the computation is completed by iterating the accumulation several times: the previously merged sub-result is taken as an input and merged with the newly calculated result to produce an intermediate or final result.
The method of the embodiment of the invention converts all two-dimensional data into one-dimensional data and completes the two-dimensional convolution operation through one-dimensional convolution operations, thereby greatly simplifying the control complexity and the on-chip data routing difficulty. Used in combination with the device of the embodiment of the invention, the calculation efficiency is greatly improved, and the difficulty of addressing data in storage and the complexity of the overall computation control are reduced without reducing processing performance.
Those of ordinary skill in the art will appreciate that the various illustrative components, systems, and methods described in connection with the embodiments disclosed herein may be implemented as hardware, software, or combinations of both. Whether this is done in hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, plug-in, function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted by a data signal carried in a carrier wave over a transmission medium or a communication link. A "machine-readable medium" may include any medium that can store or transfer information. Examples of a machine-readable medium include electronic circuits, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, Radio Frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the internet, intranet, etc.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments and/or in combination with or instead of the features of the other embodiments in the present invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A two-dimensional convolution implementation method is characterized by comprising the following steps:
one-dimensionalizing the two-dimensional input data, namely expanding the two-dimensional input data into one-dimensional data along a set dimension;
performing convolution kernel decomposition, namely decomposing the two-dimensional convolution kernel into independent rows or independent columns according to the expansion mode of the two-dimensional input data;
calculating sub-results, namely performing one-dimensional convolution operations between the one-dimensional data and each of the decomposed convolution kernels to obtain a corresponding plurality of one-dimensional sub-result sequences; and
merging the sub-results, namely merging the corresponding data in the plurality of one-dimensional sub-result sequences to obtain and output the final two-dimensional convolution calculation result.
2. The method of claim 1, wherein the two-dimensional input data is expanded into the one-dimensional data in a row, column, diagonal, or irregular manner.
3. The method of claim 2, wherein the convolution kernel is decomposed in a manner substantially identical to a manner of unfolding the two-dimensional input data into the one-dimensional data.
4. A method for two-dimensional convolution implementation according to claim 1, 2 or 3,
the one-dimensionalization of the two-dimensional input data further comprises: expanding the two-dimensional input data by rows and joining the ends of adjacent rows to form the one-dimensional data;
the convolution kernel decomposition further comprises: decomposing the two-dimensional convolution kernel, likewise by rows, into a first convolution kernel row and a second convolution kernel row;
the sub-result calculation further comprises: performing one-dimensional convolution operations between the one-dimensional data and the first row of the convolution kernel and between the one-dimensional data and the second row of the convolution kernel respectively, wherein, when the one-dimensional data is convolved with the first row of the convolution kernel, the sliding window moves from the first row of the one-dimensional data and ends at the second-to-last row; and when the one-dimensional data is convolved with the second row of the convolution kernel, the sliding window moves from the second row of the one-dimensional data to the last row; and
the sub-result merging further comprises: adding the corresponding elements of the sub-results obtained by convolving the one-dimensional data with the first row of the convolution kernel and with the second row of the convolution kernel, to obtain the final calculation result.
5. The method of claim 4, wherein the intermediate convolution result generated when the one-dimensional input data is convolved with one decomposed convolution kernel is accumulated with the previously calculated sub-result in the subsequent sub-result merging operation and cached, and is then accumulated and merged with the newly calculated sub-result, until the sub-results of all the decomposed convolution kernels have been calculated and merged.
6. A two-dimensional convolution implementation apparatus, comprising:
the one-dimensional convolution operation unit is used for realizing one-dimensional convolution operation between one-dimensional input data obtained by expanding two-dimensional input data according to a set dimension and a convolution kernel obtained by decomposing a two-dimensional convolution kernel according to an expansion mode of the two-dimensional input data to obtain a plurality of one-dimensional sub-result sequences;
the sub-result merging operation unit is used for completing merging operation of a plurality of one-dimensional sub-results; and
and the controller is respectively connected with the one-dimensional convolution operation unit and the sub-result merging operation unit and is used for generating a control signal so as to coordinate the operation between the one-dimensional convolution operation unit and the sub-result merging operation unit.
7. The apparatus of claim 6, wherein said one-dimensional convolution operation unit is in the form of a multiply-accumulate tree, and said controller generates corresponding control signals for implementing discontinuous control of convolution of the one-dimensional input data.
8. The apparatus of claim 6 or 7, further comprising a plurality of buffer units, the plurality of buffer units configured to synchronize the data streams, the plurality of buffer units further comprising:
the input data buffer unit is used for buffering the one-dimensional input data and is respectively connected with the controller and the one-dimensional convolution operation unit;
the convolution kernel buffer unit is used for buffering decomposed convolution kernel data and is respectively connected with the controller and the one-dimensional convolution operation unit; and
and the sub-result buffer unit is used for buffering the calculation result of the one-dimensional convolution operation unit and is respectively connected with the controller, the one-dimensional convolution operation unit and the sub-result merging operation unit.
9. The apparatus of claim 8, wherein the plurality of buffer cells further comprises:
and the merging result buffer unit is used for reading in again and buffering the convolution result of the one-dimensional input data with the first row of the convolution kernel from the previous pass, and is connected to the controller and the sub-result merging operation unit respectively.
10. The apparatus of claim 9, wherein the input data buffer unit reads in the one-dimensional input data again and buffers it; the convolution kernel buffer unit reads in and buffers the second row of the convolution kernel; the merging result buffer unit reads in and buffers the intermediate convolution result of the one-dimensional input data with the first row of the convolution kernel from the previous pass; the one-dimensional convolution operation unit reads the one-dimensional input data and the second row of the convolution kernel from the input data buffer unit and the convolution kernel buffer unit respectively, performs the one-dimensional convolution operation, and caches the operation result in the sub-result buffer unit; and the sub-result merging operation unit reads the sub-result currently calculated and output by the one-dimensional convolution operation unit from the sub-result buffer unit, reads the intermediate convolution result from the merging result buffer unit, accumulates and merges the two, and outputs the final calculation result, until the final two-dimensional convolution operation is completed.
CN202010011113.7A 2020-01-06 2020-01-06 Two-dimensional convolution implementation method and device Pending CN111199274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010011113.7A CN111199274A (en) 2020-01-06 2020-01-06 Two-dimensional convolution implementation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010011113.7A CN111199274A (en) 2020-01-06 2020-01-06 Two-dimensional convolution implementation method and device

Publications (1)

Publication Number Publication Date
CN111199274A true CN111199274A (en) 2020-05-26

Family

ID=70746790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010011113.7A Pending CN111199274A (en) 2020-01-06 2020-01-06 Two-dimensional convolution implementation method and device

Country Status (1)

Country Link
CN (1) CN111199274A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116126750A (en) * 2023-02-24 2023-05-16 之江实验室 Data processing method and device based on hardware characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341544A (en) * 2017-06-30 2017-11-10 清华大学 A kind of reconfigurable accelerator and its implementation based on divisible array
US20170344880A1 (en) * 2016-05-24 2017-11-30 Cavium, Inc. Systems and methods for vectorized fft for multi-dimensional convolution operations
CN108388537A (en) * 2018-03-06 2018-08-10 上海熠知电子科技有限公司 A kind of convolutional neural networks accelerator and method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170344880A1 (en) * 2016-05-24 2017-11-30 Cavium, Inc. Systems and methods for vectorized fft for multi-dimensional convolution operations
CN107341544A (en) * 2017-06-30 2017-11-10 清华大学 A kind of reconfigurable accelerator and its implementation based on divisible array
CN108388537A (en) * 2018-03-06 2018-08-10 上海熠知电子科技有限公司 A kind of convolutional neural networks accelerator and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
叶韵 (Ye Yun): 《深度学习与计算机视觉：算法原理、框架应用与代码实现》 (Deep Learning and Computer Vision: Algorithm Principles, Framework Applications and Code Implementation), 16 July 2017 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116126750A (en) * 2023-02-24 2023-05-16 之江实验室 Data processing method and device based on hardware characteristics
CN116126750B (en) * 2023-02-24 2023-08-22 之江实验室 Data processing method and device based on hardware characteristics

Similar Documents

Publication Publication Date Title
CN109934339B (en) A Universal Convolutional Neural Network Accelerator Based on a 1D Systolic Array
CN109036522B (en) Image processing method, device, equipment and readable storage medium
CN113344179A (en) IP core of binary convolution neural network algorithm based on FPGA
CN113743587A (en) Convolutional neural network pooling calculation method, system and storage medium
US8063918B2 (en) Method and apparatus for rendering images with and without radially symmetric distortions
CN112767280A (en) Single image raindrop removing method based on loop iteration mechanism
CN111199274A (en) Two-dimensional convolution implementation method and device
CN117217274A (en) Vector processor, neural network accelerator, chip and electronic equipment
CN114254740B (en) Convolution neural network accelerated calculation method, calculation system, chip and receiver
CN114003201B (en) Matrix transformation method, device and convolutional neural network accelerator
CN112334915B (en) Processing device
CN111222090B (en) Convolution calculation module, neural network processor, chip and electronic equipment
CN107085827A (en) The super-resolution image recovery method realized based on hardware platform
JP2007148623A (en) High-speed fourier conversion circuit
CN114936633B (en) Data processing unit for transposition operation and image transposition operation method
US20230267310A1 (en) Neural network processing apparatus, information processing apparatus, information processing system, electronic device, neural network processing method, and program
CN113872610B (en) LDPC code neural network training and decoding method and system thereof
US6731820B2 (en) Image filter circuit and image filtering method
US9542719B2 (en) Device for image decomposition using a wavelet transform
CN113128658B (en) Neural network processing method, accelerator and storage medium
CN112667963A (en) Inversion algorithm of positive definite conjugate symmetric matrix, system and medium based on algorithm
CN112181356A (en) Design method and device of configurable MIMO FIFO
WO2024156204A1 (en) Data processing method, apparatus and device
CN118799942A (en) A hardware-accelerated embedded system design method for face recognition
CN118070855B (en) A convolutional neural network accelerator based on RISC-V architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100089 room 801, 8 / F, building 3, yard 1, 81 Beiqing Road, Haidian District, Beijing

Applicant after: YUSUR TECHNOLOGY Co.,Ltd.

Address before: 100190 scientific research complex building, Institute of computing technology, Chinese Academy of Sciences, no.6, Academy of Sciences South Road, Haidian District, Beijing

Applicant before: YUSUR TECHNOLOGY Co.,Ltd.