Summary of the invention
The main purpose of the present invention is to overcome the deficiencies of the prior art and to provide a new multiply-accumulate device that solves the problem of supporting multiple multiply-accumulate modes in a processor.
To solve the above technical problem, the present invention adopts the following technical solutions:
The invention provides a multiply-accumulate device, comprising: a pre-decode unit module (10), a partial product generation unit module (20), a Wallace tree adder unit module (30), an accumulation unit module (40) and a final result unit module (50), where the pre-decode unit module (10), the partial product generation unit module (20), the Wallace tree adder unit module (30), the accumulation unit module (40) and the final result unit module (50) are connected in sequence;
The pre-decode unit module (10) comprises a multi-bit multiplier input module (101), a multi-bit multiplicand input module (102), a multi-bit multiply-accumulate function selection module (104), a square flag bit module (105), a multiplication data type flag bit module (106) and a pre-decode logic module (103); the multi-bit multiplier input module (101), the multi-bit multiplicand input module (102), the multi-bit multiply-accumulate function selection module (104), the square flag bit module (105) and the multiplication data type flag bit module (106) are each connected to the pre-decode logic module (103);
The multi-bit multiply-accumulate function selection module (104) selects whether the current operation is a multiplication or a multiply-accumulate operation, selects the signedness of the multiplier and multiplicand taking part in the operation, and selects whether the result of the current operation is to be rounded;
The square flag bit module (105) indicates whether the current multiplication or multiply-accumulate operation is a squaring operation;
The multiplication data type flag bit module (106) selects the data type of the current multiplier and multiplicand;
The pre-decode logic module (103) outputs, as indicated by the MF, SQUARE and MODE signals, the multi-bit data that take part in the multiplication, together with their signedness bits. The multi-bit multiply-accumulate function signal MF selects whether the current operation is a multiplication or a multiply-accumulate operation, selects the signedness of the multiplier and multiplicand, and selects whether the current operation is to be rounded; the square flag signal SQUARE indicates whether the current operation is a squaring operation; the multiplication data type flag signal MODE selects the data type of the current multiplier and multiplicand.
As an improvement, the partial product generation unit module (20) comprises a sign extension logic module (201) and a partial product generation logic module (202); the sign extension logic module (201) and the partial product generation logic module (202) are connected to each other, and each is also connected to the pre-decode logic module (103) in the pre-decode unit module (10);
The sign extension logic module (201) takes one of the multi-bit data output by the pre-decode unit module (10), together with that data's signedness bit, as input, extends the data according to the signedness bit, and outputs the extended multi-bit data.
As an improvement, the Wallace tree adder unit module (30) comprises a Wallace tree adder logic module (301), which is connected to the partial product generation logic module (202) through multiple signal paths and to the pre-decode logic module (103) through two signal paths;
The Wallace tree adder logic module (301) receives the multi-bit partial product results and multi-bit carry results output by the partial product generation unit module (20), and also receives the two-bit rounding flag signal and the mode select signal output by the pre-decode unit module (10). The Wallace tree adder unit module (30) further includes data type processing logic, which determines the final sum and carry according to whether the mode select signal is true or false.
As an improvement, the accumulation unit module (40) comprises an accumulation logic module (401) and a 2-to-1 selection logic module (402); the accumulation logic module (401) is connected to the Wallace tree adder logic module (301) through two signal paths, and the selection logic module (402) is connected to the pre-decode logic module (103) and to the accumulation logic module (401);
The accumulation logic module (401) receives the multi-bit sum and carry output by the Wallace tree adder unit module (30), together with the multi-bit output data of the selection logic module (402), and produces multi-bit output result data;
The selection logic module (402) receives the previous operation result of the multiply-accumulate device, output by the final result unit module (50), and zero data, and selects one of the two as its multi-bit output data according to the accumulate enable signal.
As an improvement, the final result unit module (50) comprises a final result selection logic module (501), which is connected to the accumulation logic module (401), the selection logic module (402) and the pre-decode logic module (103);
The final result selection logic module (501) receives the multi-bit accumulation result output by the accumulation unit module (40) and produces the final operation result of the multiply-accumulate device.
Compared with the prior art, the invention has the following beneficial effects. The embodiment of the present invention proposes a MAC device for performing multiplication and multiply-accumulate operations in various modes. The MAC device is divided into five sequential structural units, each of which is handled and optimized separately. It proposes a combined partial product generation method that does not need to generate Booth code coefficients, which removes one stage from the partial product generation logic and reduces the delay and gate count of the partial product generation circuit. It proposes a method of handling rounding in the Wallace tree (moving the rounding operation earlier), which does not affect the Wallace tree addition itself and eliminates the extra adder that the carry would otherwise introduce in the final result generation unit, reducing circuit cost while preserving functionality. It proposes a method of matching the MAC functional unit to the DSP processor pipeline stages, which balances the delay of each pipeline stage and meets the DSP's high operating frequency requirement. The methods proposed by the embodiment can be used in combination or separately, and can be applied in DSP processors as well as in any circuit implementation that requires a MAC functional unit.
Embodiment
Specific embodiment 1 of the invention is described in detail below with reference to the accompanying Figs. 1 to 6.
The multiply-accumulate device in this embodiment comprises a pre-decode unit module 10, a partial product generation unit module 20, a Wallace tree adder unit module 30, an accumulation unit module 40 and a final result unit module 50; the pre-decode unit module 10, the partial product generation unit module 20, the Wallace tree adder unit module 30, the accumulation unit module 40 and the final result selection logic module 501 are connected in sequence. The pre-decode unit module 10 comprises a multi-bit multiplier input module 101, a multi-bit multiplicand input module 102, a multi-bit multiply-accumulate function selection module 104, a square flag bit module 105, a multiplication data type flag bit module 106 and a pre-decode logic module 103; modules 101, 102, 104, 105 and 106 are each connected to the pre-decode logic module 103. The partial product generation unit module 20 comprises a sign extension logic module 201 and a partial product generation logic module 202, which are connected to each other and are each also connected to the pre-decode logic module 103 in the pre-decode unit module 10. The Wallace tree adder logic module 301 is connected to the partial product generation logic module 202 through multiple signal paths and to the pre-decode logic module 103 through two signal paths. The accumulation unit module 40 comprises an accumulation logic module 401 and a 2-to-1 selection logic module 402; the accumulation logic module 401 is connected to the Wallace tree adder logic module 301 through two signal paths, and the selection logic module 402 is connected to the pre-decode logic module 103 and the accumulation logic module 401. The final result selection logic module 501 is connected to the accumulation logic module 401, the selection logic module 402 and the pre-decode logic module 103.
In Fig. 1:
The multi-bit data A and B, the two-bit data R, and the one-bit data sign_A, sign_B, MAC, accumulation_en, round_en and mode are the outputs of the pre-decode unit module 10.
The multi-bit data A*0, A*1, ..., A*m-2, A*m-1, A*m and sub_carry are the outputs of the partial product generation unit module 20.
The multi-bit data sum and carry are the outputs of the Wallace tree adder unit module 30.
The multi-bit data mux_product and accu_product are the outputs of the accumulation unit module 40.
The multi-bit data product is the output of the final result unit module 50.
In Fig. 2:
B*: multi-bit data, input to the MUX case logic;
2n+1: an odd bit position of B*;
2n-1: an odd bit position of B*;
Booth_encoder: Booth encoding logic;
2x: one of the Booth code coefficients;
1x: one of the Booth code coefficients;
0x: one of the Booth code coefficients;
sign: the sign bit of the Booth code;
partial_product_gen: partial product generation logic;
A: multi-bit data, input to the partial product generation logic partial_product_gen;
PnA*: multi-bit data, output of the partial product generation logic partial_product_gen.
In Fig. 3:
B*: multi-bit data, input to the MUX case logic;
2n+1: an odd bit position of B*;
2n-1: an odd bit position of B*;
case: MUX case logic;
000: one case of the MUX case logic;
001: one case of the MUX case logic;
010: one case of the MUX case logic;
011: one case of the MUX case logic;
100: one case of the MUX case logic;
101: one case of the MUX case logic;
110: one case of the MUX case logic;
111: one case of the MUX case logic;
Booth_partial_product_gen: combined Booth encoding and partial product generation logic;
A: multi-bit data, an input of the combined logic Booth_partial_product_gen;
PnA*: multi-bit data, the output of the combined logic Booth_partial_product_gen.
In Fig. 4:
Pxy: a specific bit of a partial product, where x is the index of the partial product, 0 ≤ x ≤ 8, and y is the bit position within that partial product, 0 ≤ y ≤ 17;
~Pxy: the inverse of bit Pxy, where x is the index of the partial product, 0 ≤ x ≤ 8, and y is the bit position within that partial product, 0 ≤ y ≤ 17;
Si: a specific bit of the partial product carry result, 0 ≤ i ≤ 7.
In Fig. 5:
accumulator: multi-bit data, input to the decision logic;
accumulator[16:0] == 17'h10000: the decision logic;
{accumulator[39:17], 17'b0}: multi-bit data, one of the output results of the decision logic;
{accumulator[39:16], 16'b0}: multi-bit data, the other output result of the decision logic;
YES: the true branch of the decision logic;
NO: the false branch of the decision logic.
In Fig. 6:
interface: processor pipeline stage interface;
clock: processor clock;
EX1: first stage of the processor's extensible execute stage;
EX2: second stage of the processor's extensible execute stage;
MAC_in_EX1: the part of the MAC functional unit in the first stage of the extensible execute stage;
MAC_in_EX2: the part of the MAC functional unit in the second stage of the extensible execute stage;
PART I (10): pre-decode unit module 10 of the MAC functional unit;
PART II (20): partial product generation unit module 20 of the MAC functional unit;
PART III (30): Wallace tree adder unit module 30 of the MAC functional unit;
PART IV (40): accumulation unit module 40 of the MAC functional unit;
PART V (50): final result unit module 50 of the MAC functional unit.
In essence, the function of the MAC is to receive two multi-bit operands (the multiplier and the multiplicand), perform the specified multiplication, and, as requested, decide whether to add this result to the previous result. For the multi-bit data received by the MAC device, signed and unsigned numbers are multiplied differently, so the data types of the multiplier and multiplicand must be taken into account. Moreover, the result of the current multiplication may need to be added to or subtracted from the previous result to obtain the final result: in digital signal processing algorithms such as FIR and IIR filters, the input data of a preceding period in the time or frequency domain must be multiplied by corresponding coefficients, and these products are accumulated (added or subtracted) to obtain the final result for the current time-domain or frequency-domain point. Multiply-accumulate and multiply-subtract functions are therefore essential for a DSP processor running DSP programs. In addition, processors, and fixed-point processors in particular, always have limited bit widths, so multiply-accumulate and multiply-subtract operations must trade off precision; a rounding function is therefore needed to discard the data bits beyond the guaranteed precision while preserving system precision as far as possible.
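To make the role of accumulation concrete, the following minimal Python sketch computes an FIR filter as a chain of multiply-accumulate steps. It is a generic illustration rather than the device's datapath: unbounded Python integers stand in for fixed-width registers, and `fir_mac` is a hypothetical name.

```python
def fir_mac(samples, coeffs):
    """Compute y[n] = sum_k coeffs[k] * samples[n - k] as a chain of MAC steps."""
    outputs = []
    for n in range(len(samples)):
        acc = 0  # each output starts from zero (no accumulation onto an old result)
        for k, h in enumerate(coeffs):
            if n - k < 0:
                break  # taps that reach before the first sample contribute nothing
            acc += h * samples[n - k]  # one multiply-accumulate step
        outputs.append(acc)
    return outputs


print(fir_mac([1, 2, 3], [1, 1]))  # [1, 3, 5]
```

A multiply-subtract variant would simply use `acc -= ...` for the taps that must be subtracted.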
For the multiplication itself, the best performance and smallest circuit can be obtained by departing from the conventional multiplication structure and adopting a particular algorithm or a specially optimized structure. In this respect, the Booth encoding algorithm encodes overlapping groups of three consecutive bits of the multiplicand and, according to the resulting coefficient and sign bit, determines which partial product to generate. In effect, every two bits of the multiplicand determine one partial product, so this method halves the number of partial products required by the multiplication.
The Booth encoding algorithm is shown in Table 1.
The partial products can be added with a Wallace tree adder structure, which sums each column of the partial product results vertically using 3:2 full adders or 4:2/5:2 compressors. The Wallace tree addition finally produces two results: a sum vector (sum) and a carry vector (carry). The Wallace tree significantly reduces the number of partial product additions: for example, a four-level Wallace tree built from 3:2 full adders can accept nine partial product input vectors simultaneously and produce two output vectors (the sum vector and the carry vector), which greatly reduces the complexity and time consumption of the partial product addition.
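The column-wise reduction can be sketched in Python by treating each partial product as an integer bit vector. In this sketch, `csa` applies a 3:2 full adder to every bit position at once, and `wallace_reduce` (a hypothetical helper name) groups the rows in threes per level. With nine rows it reaches two vectors after four levels, matching the example in the text. The model covers only the arithmetic, not gate-level timing or wiring.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 full adder applied to every bit position: x + y + z == s + c."""
    s = x ^ y ^ z                            # per-column sum bits
    c = ((x & y) | (x & z) | (y & z)) << 1   # per-column carries move one column left
    return s, c


def wallace_reduce(rows: list[int]) -> tuple[int, int, int]:
    """Reduce rows to (sum, carry) with levels of 3:2 adders; also return the level count."""
    rows = list(rows)
    levels = 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        nxt: list[int] = []
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])              # leftover rows pass straight to the next level
        rows = nxt
        levels += 1
    rows += [0] * (2 - len(rows))            # pad if fewer than two rows were given
    return rows[0], rows[1], levels


# Nine 18-bit partial products, each offset by two bits as in a radix-4 Booth array.
pps = [((0x2AAAB * (i + 1)) & 0x3FFFF) << (2 * i) for i in range(9)]
total_sum, total_carry, levels = wallace_reduce(pps)
print(total_sum + total_carry == sum(pps), levels)  # True 4
```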
Table 1
Booth encoding logic table
B(2n+1, 2n-1) | 2x | 1x | 0x | sign
000 | 0 | 0 | 1 | 0
001 | 0 | 1 | 0 | 0
010 | 0 | 1 | 0 | 0
011 | 1 | 0 | 0 | 0
100 | 1 | 0 | 0 | 1
101 | 0 | 1 | 0 | 1
110 | 0 | 1 | 0 | 1
111 | 0 | 0 | 1 | 1
(2x, 1x, 0x: Booth code coefficients; sign: Booth code sign bit)
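As a cross-check of Table 1, the following Python sketch decodes each overlapping 3-bit group of an even-width pattern B* into the table's (2x, 1x, 0x, sign) flags and sums the signed multiples of A. It is a behavioral model that assumes B* is read as a two's complement pattern with an implicit B*[-1] = 0. It does not reproduce the bit-level PnA* format, and `BOOTH_TABLE`, `booth_digits` and `booth_multiply` are hypothetical names.

```python
# Table 1: group (B*[2n+1], B*[2n], B*[2n-1]) -> (2x, 1x, 0x, sign)
BOOTH_TABLE = {
    0b000: (0, 0, 1, 0), 0b001: (0, 1, 0, 0),
    0b010: (0, 1, 0, 0), 0b011: (1, 0, 0, 0),
    0b100: (1, 0, 0, 1), 0b101: (0, 1, 0, 1),
    0b110: (0, 1, 0, 1), 0b111: (0, 0, 1, 1),
}


def booth_digits(b_star: int, s: int):
    """Yield the radix-4 Booth digit of each 3-bit group of the s-bit pattern b_star (s even)."""
    for n in range(s // 2):
        hi = (b_star >> (2 * n + 1)) & 1
        mid = (b_star >> (2 * n)) & 1
        lo = (b_star >> (2 * n - 1)) & 1 if n > 0 else 0  # B*[-1] is taken as 0
        x2, x1, _x0, sign = BOOTH_TABLE[(hi << 2) | (mid << 1) | lo]
        magnitude = 2 * x2 + x1                           # the 0x coefficient contributes 0
        yield -magnitude if sign else magnitude


def booth_multiply(a: int, b_star: int, s: int) -> int:
    """Sum one partial product per group: digit * A, weighted by 4**n."""
    return sum(d * a * 4 ** n for n, d in enumerate(booth_digits(b_star, s)))


print(booth_multiply(7, 3, 4))                                # 21
print(booth_multiply(1000, -12345 & ((1 << 18) - 1), 18))     # -12345000
```

For a 16-bit operand sign-extended to 18 bits, the loop produces nine digits, which is the nine-partial-product count used for 16-bit multiplication.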
To match the MAC device to the DSP processor, the MAC device is split across several pipeline stages and executed over several clock cycles, so that the DSP's high operating frequency requirement is met.
The multiply-accumulate device in this embodiment comprises:
Pre-decode unit module 10, which takes as input the signals of the multi-bit multiplier input module 101, the multi-bit multiplicand input module 102, the multi-bit multiply-accumulate function selection module 104, the square flag bit module 105 and the multiplication data type flag bit module 106. The multi-bit multiply-accumulate function selection module 104 selects whether the current operation is a multiplication or a multiply-accumulate operation, selects the signedness of the multiplier mltiplicand and the multiplicand mltiplicator taking part in the operation, and selects whether the result of the current operation is to be rounded; the square flag bit module 105 indicates whether the current multiplication or multiply-accumulate operation is a squaring operation; the multiplication data type flag bit module 106 selects the data type of the current mltiplicand and mltiplicator. The MAC device described here supports both integer multiplication (for example, 16.0 format) and fractional multiplication (for example, 1.15 format). In this embodiment, the binary point of integer data is placed after the least significant bit, so all bits of the data lie before the binary point; the binary point of fractional data is placed after the most significant bit, so every bit except the most significant bit lies after the binary point.
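The two binary-point conventions can be illustrated with a small Python sketch that reads the same 16-bit pattern in both formats. It assumes signed two's complement data (the device also handles unsigned data, which this sketch does not cover), and `interpret` is a hypothetical name.

```python
def interpret(raw: int, fmt: str) -> float:
    """Read a 16-bit two's complement pattern as a 16.0 integer or a 1.15 fraction."""
    value = raw - (1 << 16) if raw & 0x8000 else raw   # two's complement decode
    if fmt == "16.0":
        return float(value)          # binary point after the least significant bit
    if fmt == "1.15":
        return value / (1 << 15)     # binary point after the most significant bit
    raise ValueError(f"unsupported format: {fmt}")


print(interpret(0x4000, "1.15"))  # 0.5
print(interpret(0x8000, "1.15"))  # -1.0
print(interpret(0xFFFF, "16.0"))  # -1.0
```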
The logic in the pre-decode unit module 10 is the multiplication preprocessing logic module 103. In this embodiment, as indicated by the MF, SQUARE and MODE signals, it outputs the multi-bit data A and B that take part in the multiplication, together with their signedness bits sign_A and sign_B. Its logic is as follows:
A = mltiplicand;
MF selects an unsigned multiplier: sign_A = 1;
MF selects a signed multiplier: sign_A = 0;
SQUARE high: B = mltiplicand;
SQUARE low: B = mltiplicator;
MF selects an unsigned multiplicand: sign_B = 1;
MF selects a signed multiplicand: sign_B = 0;
In addition, the multiplication preprocessing logic module 103 outputs the multiply-accumulate enable signal accumulation_en, the rounding enable signal round_en, the mode select signal mode and the two-bit rounding flag R, where R is determined as follows:
Rounding disabled: R = 00;
Rounding enabled: R = 10 in integer multiplication mode, R = 01 in fractional multiplication mode;
Reserved: R = 11;
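The selection rules above can be summarized in a short Python sketch. The text does not specify how MF and MODE are encoded, so this sketch assumes they have already been decoded into the hypothetical booleans `a_unsigned`, `b_unsigned`, `round_en` and `integer_mode`; `PreDecodeOut` and `pre_decode` are likewise hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PreDecodeOut:
    A: int
    B: int
    sign_A: int  # 1 = unsigned, 0 = signed (the convention used by the sign extension rules)
    sign_B: int
    R: int       # two-bit rounding flag


def pre_decode(mltiplicand: int, mltiplicator: int, a_unsigned: bool, b_unsigned: bool,
               square: bool, round_en: bool, integer_mode: bool) -> PreDecodeOut:
    A = mltiplicand
    B = mltiplicand if square else mltiplicator  # squaring feeds the same operand twice
    sign_A = 1 if a_unsigned else 0
    sign_B = 1 if b_unsigned else 0
    if not round_en:
        R = 0b00                                 # rounding disabled
    elif integer_mode:
        R = 0b10                                 # rounding enabled, integer multiplication
    else:
        R = 0b01                                 # rounding enabled, fractional multiplication
    return PreDecodeOut(A, B, sign_A, sign_B, R) # 0b11 is reserved and never produced here


print(pre_decode(5, 9, False, True, True, True, False))
```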
Partial product generation unit module 20, which receives the multi-bit data A and B output by the pre-decode unit module 10 and the corresponding signedness bits sign_A and sign_B. Its sign extension logic module 201 takes B and sign_B as input and extends B according to the signedness bit sign_B, outputting multi-bit data B* (assumed to have s bits). The extension logic is:
B has an even number of bits (j = 2n, n = 0, 1, 2, ..., where j is the width of B; the same notation is used below), s = j + 2:
sign_B = 1: B* = {0, 0, B}, where { } denotes concatenation, i.e., B is extended by two 0 bits to the left of its most significant bit;
sign_B = 0: B* = {B[j-1], B[j-1], B}, where { } denotes concatenation and B[j-1] is the value of the most significant bit of B, i.e., B is extended by two copies of that value to the left of its most significant bit;
B has an odd number of bits (j = 2n + 1, n = 0, 1, 2, ..., where j is the width of B), s = j + 1:
sign_B = 1: B* = {0, B}, where { } denotes concatenation, i.e., B is extended by one 0 bit to the left of its most significant bit;
sign_B = 0: B* = {B[j-1], B}, where { } denotes concatenation and B[j-1] is the value of the most significant bit of B, i.e., B is extended by one copy of that value to the left of its most significant bit;
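The four extension rules reduce to "prepend one or two fill bits so that s is even". A minimal Python sketch, assuming B is given as a non-negative integer that fits in j bits and using the text's convention that sign_B = 1 means unsigned (`sign_extend` is a hypothetical name):

```python
def sign_extend(B: int, j: int, sign_B: int) -> tuple[int, int]:
    """Return (B_star, s); sign_B = 1 means unsigned, 0 means signed; B fits in j bits."""
    ext = 2 if j % 2 == 0 else 1          # makes s = j + ext even in both cases
    msb = (B >> (j - 1)) & 1              # B[j-1]
    fill = 0 if sign_B == 1 else msb      # zero-extend unsigned data, replicate the MSB for signed data
    B_star = B
    for i in range(ext):
        B_star |= fill << (j + i)         # prepend one fill bit per iteration
    return B_star, j + ext


print(sign_extend(0x8000, 16, 0))  # (229376, 18), i.e. 0x38000: two copies of the sign bit
print(sign_extend(0b101, 3, 1))    # (5, 4): one leading 0
```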
The partial product generation logic module 202 in the partial product generation unit module 20 produces the partial product results. For this logic, the embodiment proposes a combined partial product generation method that does not need to generate Booth code coefficients. Taking Figs. 2 and 3 as examples, the embodiment explains the principle of the proposed method and how it differs from the traditional method, in which Booth code coefficients and partial products are generated separately.
Fig. 2 is a schematic diagram of the traditional logic in which Booth code coefficients and partial products are generated separately. This logic can be divided into two sub-logic modules, Booth_encoder and partial_product_gen. Booth_encoder is the Booth encoding logic: it takes bits 2n-1 through 2n+1 of the multi-bit data B* as input and produces three coefficient output flags, 2x, 1x and 0x, and one sign output, sign. In this way the Booth encoding information is obtained for every group of three consecutive bits of B* that begins and ends at odd bit positions; the encoding logic is shown in Table 1. partial_product_gen is the partial product generation logic: it takes the four outputs of Booth_encoder as input together with the multi-bit data A, uses the signals passed from Booth_encoder as select signals to process A (assumed to have k bits), and outputs one partial product PnA*. The logic is as follows:
2x | 1x | 0x | sign | PnA*
0 | 0 | 1 | 0 | 0
0 | 1 | 0 | 0 | {0,0,A}
1 | 0 | 0 | 0 | {0,A[k-1],A}
0 | 0 | 1 | 1 | 0
0 | 1 | 0 | 1 | {1,1,~A}
1 | 0 | 0 | 1 | {1,~A[k-1],~A}
Here ~A denotes the one's complement (bitwise inversion) of the multi-bit data A.
The flow of the traditional logic, in which Booth code coefficients and partial products are generated separately, can be summarized as:
case(B*[2n+1,2n-1]) → Booth encoding → PnA*
Fig. 3 is a schematic diagram of the combined partial product generation logic proposed by the embodiment, which does not need to generate Booth code coefficients. It can be divided into two sub-logic modules, case and Booth_partial_product_gen. case is the MUX logic: it takes bits 2n-1 through 2n+1 of the multi-bit data B* as input and produces eight cases, 000, 001, 010, 011, 100, 101, 110 and 111. Booth_partial_product_gen is the combined Booth encoding and partial product generation logic: taking the eight cases output by case as input, it processes the multi-bit data A (assumed to have k bits) and outputs one partial product PnA*. The logic is shown in Table 2:
Table 2 gives the correspondence for the combined partial product generation logic, which does not need to generate Booth code coefficients.
Table 2 Combined Booth encoding and partial product generation logic table
case(B*[2n+1,2n-1]) | PnA* | sub_carry[n]
000, 111 | 0 | 0
001, 010 | {0,0,A} | 0
011 | {0,A[k-1],A} | 0
100 | {1,~A[k-1],~A} | 1
101, 110 | {1,1,~A} | 1
Because cases with identical results are merged into the same PnA*, the MUX logic case actually needs only five outputs.
The flow of the combined partial product generation logic proposed by the embodiment, which does not need Booth code coefficients, can be summarized as:
case(B*[2n+1,2n-1]) → PnA*
The method proposed by the embodiment can therefore omit the Booth encoding step and directly map the input multi-bit data B (the multiplicand) to the partial product PnA*, keeping the circuit implementation as small as possible while preserving functionality.
The partial product generation logic partial_generator in the partial product generation unit module 20 must also produce the partial product carry result sub_carry. The combined logic proposed by the embodiment, which does not need Booth code coefficients, meets this requirement at the same time; its logic is also shown in Table 2.
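The arithmetic effect of Table 2 can be checked with a Python sketch that maps each 3-bit case straight to a multiple of A plus its sub_carry bit, with no intermediate coefficient flags. Two assumptions apply here. First, the sketch models a negative case as the one's complement of the multiple with sub_carry[n] supplying the missing +1, which is how the table's ~A entries and sub_carry column fit together in two's complement. Second, it works at an assumed 40-bit accumulator width rather than the k+2 bit PnA* layout. `COMBINED`, `partial_products` and `product_from_rows` are hypothetical names.

```python
# Table 2, collapsed to its five distinct MUX outputs: case -> (multiple of A, sub_carry[n]).
COMBINED = {
    0b000: (0, 0), 0b111: (0, 0),
    0b001: (1, 0), 0b010: (1, 0),
    0b011: (2, 0),
    0b100: (2, 1),
    0b101: (1, 1), 0b110: (1, 1),
}


def partial_products(a: int, b_star: int, s: int, width: int) -> tuple[list[int], list[int]]:
    """Map each 3-bit case of B* directly to an aligned partial product and its sub_carry bit."""
    mask = (1 << width) - 1
    rows, sub_carry = [], []
    for n in range(s // 2):
        if n == 0:
            case = (b_star << 1) & 0b111             # implicit B*[-1] = 0
        else:
            case = (b_star >> (2 * n - 1)) & 0b111   # bits 2n+1, 2n, 2n-1
        multiple, carry = COMBINED[case]
        pp = (multiple * a) & mask
        if carry:
            pp = ~pp & mask                          # one's complement; the +1 is sub_carry[n]
        rows.append((pp << (2 * n)) & mask)
        sub_carry.append(carry)
    return rows, sub_carry


def product_from_rows(rows: list[int], sub_carry: list[int], width: int) -> int:
    """Add all rows plus each sub_carry bit at its row's offset, wrapped to the given width."""
    total = sum(rows) + sum(c << (2 * n) for n, c in enumerate(sub_carry))
    return total & ((1 << width) - 1)


a, b = -1234, -5678
rows, carries = partial_products(a, b & ((1 << 18) - 1), 18, 40)
print(len(rows), product_from_rows(rows, carries, 40) == (a * b) & ((1 << 40) - 1))  # 9 True
```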
The Wallace tree adder unit module 30 comprises a Wallace tree adder module 301, which receives the multi-bit partial product results A*0, A*1, ..., A*m-2, A*m-1, A*m output by the partial product generation unit module 20 (where m depends on the width of B*, m = s/2) and the multi-bit carry results sub_carry, and also receives the two-bit rounding flag R and the mode select signal mode output by the pre-decode unit module 10. It contains a Wallace_tree logic, for which the embodiment proposes a method of handling rounding in the Wallace tree (moving the rounding operation earlier). Taking Fig. 4 as an example, this description explains how the proposed method handles rounding in the Wallace tree.
Fig. 4 is the Wallace tree adder logic diagram for 16-bit multiplication. A 16-bit multiplication produces nine partial products, which are stacked with a successive two-bit offset to form the Wallace tree. Viewing this tree as a matrix, each vertical column consists of specific bits from several partial products. 3:2 full adders can sum every three bits in a column, producing one sum bit and one carry bit; combining these, the tree can add the nine partial products with four levels of full adders, finally producing one sum and one carry. Without increasing the number of full adder levels needed for the Wallace tree addition, the embodiment extends the lowest bit of the ninth partial product by two bits to the right; these two positions hold exactly the two-bit rounding flag R. Rounding is thus moved from the final result generation unit into the Wallace tree adder logic without affecting the Wallace tree addition, which saves the extra adder that the carry would otherwise require in the final result generation unit and reduces circuit cost while preserving functionality.
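The point of placing R in the ninth row is that it fills empty bit positions instead of adding a tenth row. A Python sketch of that idea follows. It assumes the rows are aligned at offsets 2n, so the ninth row starts at bit 16 and the two extra positions are bits 15:14, which is an inference from the text rather than a stated bit map. With that placement, R = 10 contributes 2^15 and R = 01 contributes 2^14. `merge_rounding` is a hypothetical name, and `csa`/`wallace_reduce` repeat the generic 3:2 reduction so the block is self-contained.

```python
def csa(x, y, z):
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1   # 3:2 full adder on every column


def wallace_reduce(rows):
    """Reduce rows with levels of 3:2 adders; return (sum, carry, levels)."""
    rows, levels = list(rows), 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        nxt = [v for i in range(0, full, 3) for v in csa(*rows[i:i + 3])]
        rows, levels = nxt + rows[full:], levels + 1
    return rows[0], rows[1], levels


def merge_rounding(rows, R):
    """Place the two-bit flag R in bits 15:14, the empty slots just right of the 9th row."""
    assert len(rows) == 9 and rows[8] & 0xFFFF == 0   # the 9th row starts at bit 2 * 8 = 16
    merged = list(rows)
    merged[8] |= (R & 0b11) << 14
    return merged


pps = [((0x2AAAB * (i + 1)) & 0x3FFFF) << (2 * i) for i in range(9)]
s, c, levels = wallace_reduce(merge_rounding(pps, 0b10))
print(s + c - sum(pps) == 1 << 15, levels)  # True 4
```

In this model, appending the rounding constant as a separate tenth row would need five levels of 3:2 adders instead of four.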
The Wallace tree adder unit module 30 also includes data type processing logic, which determines the final sum and carry according to whether the mode select signal mode is true or false. Its logic is as follows:
mode = 1: the sum and carry are the two results of the Wallace tree addition;
mode = 0: the sum and carry are the two results of the Wallace tree addition, each shifted left by one bit.
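One plausible reading, which the text does not state explicitly, is that mode = 0 corresponds to fractional (1.15) multiplication. The product of two 1.15 operands has 30 fraction bits and a redundant sign position, and a one-bit left shift realigns it to 31 fraction bits. This reading is also consistent with R = 01 in fractional mode landing on the same weight as R = 10 in integer mode after the shift. A minimal sketch under that assumption (`data_type_stage` is a hypothetical name):

```python
def data_type_stage(total_sum: int, total_carry: int, mode: int) -> tuple[int, int]:
    """mode = 1: pass sum and carry through; mode = 0: shift both left by one bit."""
    if mode == 1:
        return total_sum, total_carry
    return total_sum << 1, total_carry << 1    # doubles the represented value sum + carry


q15_half = 0x4000                   # 0.5 in 1.15 format
raw = q15_half * q15_half           # 2**28: the value 0.25 with 30 fraction bits
s, c = data_type_stage(raw, 0, mode=0)
print((s + c) / 2**31)              # 0.25, now with 31 fraction bits
```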
The accumulation unit module 40 receives the multi-bit sum and carry output by the Wallace tree adder unit module 30, and also receives the accumulate enable signal accumulation_en output by the pre-decode unit module 10 and the previous operation result product of the MAC device output by the final result unit module 50, and produces the multi-bit accumulation result data accu_product.
It comprises two sub-logic modules. The MUX logic module is a 2-to-1 selector: it receives the MAC device's previous operation result product, output by the final result unit module 50, and zero data, and according to the accumulate enable signal accumulation_en selects one of the two as the multi-bit output data mux_product. The logic is:
accumulation_en=1: mux_product=product;
accumulation_en=0: mux_product=0;
Accumulator is the accumulation logic module 401: it receives the multi-bit sum and carry output by the Wallace tree adder unit module 30, together with the multi-bit output data mux_product of the selection logic module 402, and produces the multi-bit output result data accu_product.
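The two sub-modules of the accumulation unit can be modeled in a few lines of Python. The 40-bit width is borrowed from the Fig. 5 example, and wrap-around on overflow is an assumption, since the text does not say whether the accumulator saturates. `accumulation_stage` is a hypothetical name.

```python
ACC_WIDTH = 40                          # accumulator width assumed from the Fig. 5 example
ACC_MASK = (1 << ACC_WIDTH) - 1


def accumulation_stage(total_sum: int, total_carry: int, product: int,
                       accumulation_en: int) -> tuple[int, int]:
    """Selection logic 402 followed by accumulation logic 401."""
    mux_product = product if accumulation_en else 0                    # choose old result or zero
    accu_product = (total_sum + total_carry + mux_product) & ACC_MASK  # three-input add, 40-bit wrap
    return mux_product, accu_product


print(accumulation_stage(3, 4, 100, 1))  # (100, 107)
print(accumulation_stage(3, 4, 100, 0))  # (0, 7)
```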
The final result unit module 50 comprises a final result selection logic module 501, which receives the multi-bit accumulation result accu_product output by the accumulation unit module 40 and produces the final MAC device operation result final_product_generator. The logic of this module is related to the method proposed earlier in the embodiment for handling rounding in the Wallace tree (moving the rounding operation earlier). Fig. 5 takes 16-bit multiplication or multiply-accumulate as an example, assumes a 40-bit final accumulation result, and illustrates the implementation.
In Fig. 5, the decision logic receives the multi-bit accumulation result accumulator and, according to whether accumulator[16:0] == 17'h10000 is true or false, selects the output from the two candidates {accumulator[39:17], 17'b0} and {accumulator[39:16], 16'b0}. The logic is as follows:
accumulator[16:0] == 17'h10000 is true: the embodiment uses unbiased (round-half-to-even) rounding, and the raw result without the R flag added in the Wallace tree addition would have been accumulator[16:0] = 17'h08000. This is an exact halfway value whose retained least significant bit (accumulator[16]) is even, so it is rounded down: the low sixteen bits are discarded, and the final result is given by the YES branch in Fig. 5;
accumulator[16:0] == 17'h10000 is false: whether bit 15 of the raw result without the rounding flag, accumulator[15], is 1 (in which case the value is rounded up) or 0 (in which case it is truncated), the NO branch in Fig. 5 gives the actual result of this rounding operation.
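Why this decision implements round-half-to-even can be checked in Python. After the Wallace tree has added 2^15, taking bits [39:16] is round-half-up. The only case where that differs from round-half-to-even is an exact tie whose retained LSB was even, and that is exactly when accumulator[16:0] equals 17'h10000; clearing bit 16 then restores the even value. The sketch compares the Fig. 5 decision against a direct round-half-to-even reference. It assumes a 40-bit accumulator and uses hypothetical names.

```python
ACC_MASK = (1 << 40) - 1
LOW17 = (1 << 17) - 1
HALF = 1 << 15        # rounding constant already added by R in the Wallace tree


def final_result(accumulator: int) -> int:
    """Fig. 5 decision logic applied to an accumulator that already includes HALF."""
    if accumulator & LOW17 == 0x10000:      # accumulator[16:0] == 17'h10000
        return (accumulator >> 17) << 17    # tie with even LSB: {accumulator[39:17], 17'b0}
    return (accumulator >> 16) << 16        # otherwise: {accumulator[39:16], 16'b0}


def round_half_even_reference(raw: int) -> int:
    """Direct round-half-to-even of the raw (unrounded) result at bit 16."""
    q, r = divmod(raw, 1 << 16)
    if r > HALF or (r == HALF and q & 1):
        q += 1
    return (q << 16) & ACC_MASK


for raw in (0x08000, 0x18000, 0x08001, 0x07FFF, 0x123458000):
    assert final_result((raw + HALF) & ACC_MASK) == round_half_even_reference(raw)
```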
The embodiment proposes a method of matching the MAC functional unit to the DSP processor pipeline stages. Taking Fig. 6 as an example, this description explains how the MAC functions are partitioned, how the functional units are combined, and how they match the DSP pipeline.
Suppose the DSP processor performs multiplication or multiply-accumulate operations in the EX (execute) stage. Because of the physical limitations of the MAC functional unit, it is difficult to complete the operation within one DSP clock cycle. Therefore, in the embodiment, the DSP processor uses an extensible EX stage structure: the EX pipeline stage is scalable and automatically adjusts to the number of clock cycles the functional module requires.
Because the embodiment divides the MAC device into five sequential functional units, combinations of several consecutive functional units can be formed on this basis. By analyzing how the critical delay of each combination's circuit matches the delay allowed in each DSP pipeline stage, the functional units of the MAC and their combinations are distributed evenly across the pipeline stages. This trial matching between combinations of MAC functional units and the DSP pipeline helps balance the delay of each pipeline stage and thereby achieves a balanced DSP processor design. The functional unit and pipeline stage matching method proposed by the embodiment can be extended further: each functional unit can be subdivided into several consecutive sub-logic modules, and by trying combinations of sub-logic modules within and across functional units against the DSP pipeline, the processor can meet even higher operating frequency requirements with better-balanced pipeline stages. Taking Fig. 6 as an example, based on the pipeline stage delay requirements of the target DSP processor and trials of functional unit combinations against the DSP pipeline, the embodiment divides the MAC functional unit into two parts, MAC_in_EX1 and MAC_in_EX2, whose circuit delays are roughly equal. MAC_in_EX1 comprises three functional units, the pre-decode unit module 10, the partial product generation unit module 20 and the Wallace tree adder unit module 30, and is executed in the EX1 pipeline stage; MAC_in_EX2 comprises two functional units, the accumulation unit module 40 and the final result unit module 50, and is executed in the EX2 pipeline stage. Because the results of both MAC parts are latched by the clock at each pipeline stage interface before being output, the delays of the two stages do not chain together. Moreover, for consecutive MAC operations, the EX2-to-EX1 feedback mechanism allows two MAC operations to complete in two consecutive clock cycles; by exploiting the pipeline, this is equivalent to a single-cycle MAC operation, which achieves the best match between the MAC module and the DSP system architecture.
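The throughput claim can be illustrated with a cycle-level Python sketch of the two-stage split. EX1 forms the product and latches it at the stage interface; on the next clock, EX2 adds it to the fed-back previous result. This is a simplified model under stated assumptions: accumulation is always enabled, rounding and data types are ignored, and `run_mac_pipeline` is a hypothetical name. It shows n MAC operations finishing in n + 1 cycles, i.e. one result per cycle after a single cycle of latency.

```python
def run_mac_pipeline(pairs: list[tuple[int, int]]) -> list[int]:
    """Cycle-level sketch: EX1 multiplies, EX2 accumulates onto the fed-back result.

    Each loop iteration is one clock edge; ex1_reg models the EX1/EX2 interface latch
    and product models the latched EX2 output. Returns product after every cycle.
    """
    ex1_reg = None                      # nothing latched between the stages yet
    product = 0                         # previous MAC result, fed back for accumulation
    history = []
    for op in list(pairs) + [None]:     # one extra cycle drains the last operation from EX2
        if ex1_reg is not None:         # EX2 works on what EX1 latched in the previous cycle
            product += ex1_reg
        ex1_reg = op[0] * op[1] if op is not None else None   # EX1 handles the new operation
        history.append(product)
    return history


print(run_mac_pipeline([(1, 2), (3, 4), (5, 6)]))  # [0, 2, 14, 44]
```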
Finally, it should be noted that the above are only specific embodiments of the invention. Obviously, the invention is not limited to the above embodiments, and many variations are possible. All variations that a person of ordinary skill in the art can directly derive or conceive from the content disclosed by the invention shall be considered within the protection scope of the invention.