Summary of the invention
(1) Technical problem to be solved
In view of the defects and deficiencies of the prior art, the purpose of the present invention is to provide a CABAC-based parallel normalization coding circuit and coding method for the H.264/AVC video coding standard. The invention removes the computational bottleneck that inter-bit dependencies cause in conventional normalization and output bitstream generation, avoids the loop operations of existing algorithms, and avoids the CABAC pipeline stalls caused by multi-cycle normalization.
(2) Technical solution
To achieve the above object, the invention provides a CABAC-based parallel normalization coding circuit comprising a first pipeline stage, which performs the normalization operation, and a second pipeline stage, which generates the output bitstream. The two stages are connected by a first-in first-out (FIFO) queue. The first pipeline stage comprises:
A standard-mode L and R update engine, which updates the coding lower bound L[9:0] and the coding interval R[8:0] before the normalization operation in the standard coding mode. Its inputs are binVal, valMPS, pStateIdx[5:0], R[8:0] and L[9:0]; its outputs are the intermediate results R_M/LPS[8:0] and L_M/LPS[9:0];
A standard-mode normalization engine, which normalizes the coding interval and the coding lower bound in the standard coding mode. Its inputs are R_M/LPS[8:0] and L_M/LPS[9:0]; its outputs are ρ[3:0], the normalized coding interval (9 bits, [8:0]) and the normalized coding lower bound (10 bits, [9:0]);
A standard-mode bits_follow update engine, which in the standard coding mode updates the outstanding-bit counter bits_follow and generates the variable β[2:0] that is written to the inter-stage FIFO. Its input signals are L_M/LPS[9:0], ρ[3:0], mode and bits_follow[7:0]; its output signals are btf_Reg[7:0], the updated value of bits_follow[7:0], and the β[2:0] written to the inter-stage FIFO;
A bypass-mode L update engine, which computes the updated value of the coding lower bound in the bypass coding mode. Its inputs are binVal, R[8:0] and L[9:0]; its output is L_Byp[10:0];
A bypass-mode normalization engine, which normalizes the coding lower bound in the bypass coding mode. Its input is L_Byp[10:0]; its output is the normalized coding lower bound (10 bits, [9:0]);
A bypass-mode bits_follow update engine, which updates the variable bits_follow in the bypass coding mode. Its input signals are L_Byp[10:9] and bits_follow[7:0]; its output signal is btf_Byp[7:0], the updated value of bits_follow;
Here, the bits_follow[7:0] register stores the current value of the variable bits_follow; the coding interval register R[8:0] stores the current value of the coding interval R; the coding lower bound register L[9:0] stores the current value of the coding lower bound L; pStateIdx[5:0] is the probability index of the bit currently being encoded, produced by the preceding context modeling engine; valMPS is the value of the most probable symbol, also produced by the preceding context modeling engine; binVal is the value of the bit currently being encoded. In the standard coding mode, R_M/LPS[8:0] and L_M/LPS[9:0] are the coding interval and the coding lower bound before the normalization operation, and the outputs of the normalization engine (9 bits, [8:0], and 10 bits, [9:0]) are the coding interval and the coding lower bound after it; the intermediate variable n is the number of leading zeros in R_M/LPS[8:0], and the variable ρ equals 9-n. In the bypass coding mode, L_Byp[10:0] is the coding lower bound before the normalization operation, and the 10-bit output [9:0] of the bypass-mode normalization engine is the coding lower bound after it;
The two coding modes, standard and bypass, are selected by the input signal mode. In the standard coding mode, the outputs of the standard-mode normalization engine (the normalized coding interval, [8:0], and the normalized coding lower bound, [9:0]) update the coding interval register R[8:0] and the coding lower bound register L[9:0], and the output btf_Reg[7:0] of the standard-mode bits_follow update engine updates the register bits_follow[7:0]. If the value of R_M/LPS[8:0] is less than 256, the output of the register bits_follow[7:0], the output L_M/LPS[9:3] of the standard-mode L and R update engine, and the output β[2:0] of the standard-mode bits_follow update engine are written to the btf[7:0], low[6:0] and beta[2:0] fields of the FIFO tail entry, respectively. In the bypass coding mode, the output of the bypass-mode normalization engine (the normalized coding lower bound, [9:0]) updates the coding lower bound register L[9:0], the value of the coding interval register R[8:0] remains unchanged, and the output btf_Byp[7:0] of the bypass-mode bits_follow update engine is selected to update the register bits_follow[7:0]; the output of the register bits_follow[7:0] is written to the btf[7:0] field of the FIFO tail entry, L_Byp[10] is written to the low[6] field of the FIFO tail entry, and 0 is written to the beta[2:0] field of the FIFO tail entry.
The depth of the FIFO is 5 entries; each entry is 18 bits wide and consists of the fields btf[7:0], low[6:0] and beta[2:0].
The second pipeline stage is an output bitstream generation engine that produces multiple output bits per cycle. It comprises a prefix bit output engine and a suffix bit output engine, both connected to the same selector. The input signal low[6] of the prefix bit output engine is connected to low[6] of the FIFO output entry; the input signal btf[7:0] of the prefix bit output engine is connected to btf[7:0] of the FIFO output entry; the input signal low[5:0] of the suffix bit output engine is connected to low[5:0] of the FIFO output entry; the input signal beta[2:0] of the suffix bit output engine is connected to beta[2:0] of the FIFO output entry.
The invention also provides a CABAC-based parallel normalization coding method implemented with the above circuit.
In the standard coding mode, γ is defined as the position of the last 0 bit in L_M/LPS[8:9-n]; when L_M/LPS[8:9-n] contains no 0 bit, γ equals 0. The normalized coding interval (9 bits, [8:0]) and the normalized coding lower bound (10 bits, [9:0]) are obtained from the variables R_M/LPS[8:0], L_M/LPS[9:0], n and γ by shift and logic operations.
In the standard coding mode, btf_Reg[7:0] is the value of the variable bits_follow[7:0] after the normalization operation; btf_Reg[7:0] is obtained from the current value of bits_follow[7:0], n and γ by arithmetic and logic operations.
In the bypass coding mode, the normalized coding lower bound (10 bits, [9:0]) is obtained from the current value of L_Byp[10:0] by logic operations.
In the bypass coding mode, btf_Byp[7:0] is the value of the variable bits_follow[7:0] after the normalization operation; btf_Byp[7:0] is obtained from bits_follow[7:0] and the value of L_Byp[10:9] by arithmetic and logic operations.
(3) Beneficial effects
Compared with the prior art, the present invention has the following beneficial effects. In a software implementation, the normalization of the current bit needs no loop, which speeds up normalization. In a circuit implementation, the normalization of any bit completes in a single cycle, which avoids the pipeline stalls introduced by the former multi-cycle normalization operation. Normalization and RBSP bitstream generation are split into two pipeline stages connected by a 5-entry first-in first-out (FIFO) register queue, a structure that effectively avoids pipeline stalls. The design therefore avoids the CABAC pipeline stall problem caused by the multi-cycle normalization of the original algorithm. Moreover, the throughput of the proposed circuit is constant: it does not depend on the probability of occurrence of least probable symbols in the bitstream being processed.
Embodiments
Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following embodiments illustrate the invention and are not intended to limit its scope.
The overall architecture block diagram of the circuit according to an embodiment of the invention is shown in Figure 3. Some variables are defined first:
Input signals:
pStateIdx[5:0]: the probability index of the bit currently being encoded, produced by the preceding context modeling engine;
valMPS: the value of the most probable symbol, produced by the preceding context modeling engine;
binVal: the value of the bit currently being encoded;
mode: 0 indicates the standard coding mode; 1 indicates the bypass coding mode.
Output signals:
toRBPS[7:0]: the bitstream written to the RBSP, one byte at a time;
RBPS_we: write-enable signal for the RBSP.
The coding interval and the coding lower bound before the normalization operation are defined as R_M/LPS[8:0] and L_M/LPS[9:0], respectively. The coding interval and the coding lower bound after the normalization operation are the normalized coding interval (9 bits, [8:0]) and the normalized coding lower bound (10 bits, [9:0]), respectively. The number of leading zeros in R_M/LPS[8:0] is defined as n. The position of the last 0 bit in L_M/LPS[8:9-n] is defined as γ (when L_M/LPS[8:9-n], i.e. bits 8 down to 9-n of L_M/LPS[9:0], contains no 0 bit, γ equals 0). btf_Reg[7:0] is defined as the value of the variable bits_follow after the normalization operation in the standard coding mode. In the bypass coding mode, the coding lower bound before the normalization operation is defined as L_Byp[10:0], the coding lower bound after the normalization operation is the normalized coding lower bound (10 bits, [9:0]), and btf_Byp[7:0] is defined as the value of the variable bits_follow after the normalization operation.
The circuit shown in Figure 3 consists of two pipeline stages. The first pipeline stage comprises the following functional units:
bits_follow[7:0] register: stores the current value of the variable bits_follow;
Coding interval register R[8:0]: stores the current value of the coding interval R;
Coding lower bound register L[9:0]: stores the current value of the coding lower bound L. L[9:0] denotes that the coding lower bound is a 10-bit signal whose most significant bit is numbered 9 and whose least significant bit is numbered 0; in this document, all other variables written in the same form follow this notation.
Standard-mode L and R update engine: performs the logic that precedes the normalization operation in the standard coding mode (shown in Figure 9-7 of Reference 1). Its inputs are binVal, valMPS, pStateIdx[5:0], R[8:0] and L[9:0]; its outputs are the intermediate results R_M/LPS[8:0] and L_M/LPS[9:0];
Standard-mode normalization engine: normalizes the coding interval and the coding lower bound in the standard coding mode. Its inputs are R_M/LPS[8:0] and L_M/LPS[9:0]; its outputs are ρ[3:0], the normalized coding interval (9 bits, [8:0]) and the normalized coding lower bound (10 bits, [9:0]);
Standard-mode bits_follow update engine: in the standard coding mode, updates the variable bits_follow and generates the variable β[2:0] that is written to the inter-stage FIFO. Its input signals are L_M/LPS[9:0], ρ[3:0], mode and bits_follow[7:0]; its output signals are btf_Reg[7:0], the updated value of bits_follow, and the β[2:0] written to the inter-stage FIFO;
Bypass-mode L update engine: computes the updated value of the coding lower bound in the bypass coding mode. Its inputs are binVal, R[8:0] and L[9:0]; its output is L_Byp[10:0];
Bypass-mode normalization engine: normalizes the coding lower bound in the bypass coding mode. Its input is L_Byp[10:0]; its output is the normalized coding lower bound (10 bits, [9:0]); and
Bypass-mode bits_follow update engine: updates the variable bits_follow in the bypass coding mode. Its input signals are L_Byp[10:9] and bits_follow[7:0]; its output signal is btf_Byp[7:0], the updated value of bits_follow.
The first pipeline stage has two operating modes, the standard coding mode and the bypass coding mode, selected by the input signal mode.
When the first pipeline stage operates in the standard coding mode, the standard-mode L and R update engine, the standard-mode normalization engine and the standard-mode bits_follow update engine are active. The outputs of the standard-mode normalization engine (the normalized coding interval, [8:0], and the normalized coding lower bound, [9:0]) update the coding interval register R[8:0] and the coding lower bound register L[9:0]. The output btf_Reg[7:0] of the standard-mode bits_follow update engine updates the register bits_follow[7:0]. If the value of R_M/LPS[8:0] is less than 256, the output of the register bits_follow[7:0], the output L_M/LPS[9:3] of the standard-mode L and R update engine, and the output β[2:0] of the standard-mode bits_follow update engine are written to the btf[7:0], low[6:0] and beta[2:0] fields of the FIFO tail entry, respectively.
When the first pipeline stage operates in the bypass coding mode, the bypass-mode L update engine, the bypass-mode normalization engine and the bypass-mode bits_follow update engine are active. The output of the bypass-mode normalization engine (the normalized coding lower bound, [9:0]) is selected to update the coding lower bound register L[9:0]. Note that in the bypass coding mode the value of the coding interval register R[8:0] remains unchanged. The output btf_Byp[7:0] of the bypass-mode bits_follow update engine is selected to update the register bits_follow[7:0]. The output of the register bits_follow[7:0] is written to the btf[7:0] field of the FIFO tail entry, L_Byp[10] is written to the low[6] field of the FIFO tail entry, and 0 is written to the beta[2:0] field of the FIFO tail entry.
The first and second pipeline stages are connected by a FIFO. The depth of the FIFO may be greater than or equal to 5 entries and is preferably 5 entries. Each entry in the FIFO is 18 bits wide and consists of the fields btf[7:0], low[6:0] and beta[2:0].
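The 18-bit entry layout can be illustrated with a minimal sketch. The text only states that an entry consists of btf[7:0], low[6:0] and beta[2:0]; the placement of the fields inside the word (btf in the top 8 bits, beta in the bottom 3) and the helper names pack_entry and unpack_entry are assumptions made for illustration.

```python
# Sketch of one 18-bit FIFO entry: btf[7:0] | low[6:0] | beta[2:0].
# The field order inside the word is an assumption, not taken from the text.

def pack_entry(btf, low, beta):
    """Pack the three fields into a single 18-bit word."""
    assert 0 <= btf < 256 and 0 <= low < 128 and 0 <= beta < 8
    return (btf << 10) | (low << 3) | beta


def unpack_entry(word):
    """Split an 18-bit word back into (btf, low, beta)."""
    return (word >> 10) & 0xFF, (word >> 3) & 0x7F, word & 0x7
```

The widths add up to 8 + 7 + 3 = 18 bits, which matches the stated entry width.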
The second pipeline stage comprises the RBSP bit generation engine, which contains an 8-bit buffer register buf[7:0], a prefix bit output engine and a suffix bit output engine. If the FIFO is not empty, the RBSP bit generation engine reads the entry addressed by the FIFO head pointer and generates the output bitstream from the btf[7:0], low[6:0] and beta[2:0] fields of that entry. The output ports of the RBSP bit generation engine are: 1. the bitstream data output port toRBPS[7:0], which writes to the RBSP one byte at a time; 2. the output enable signal RBPS_we. The bit output process of the RBSP bit generation engine has two phases: the first is prefix bit output and the second is suffix bit output. The bit generation in these two phases is described in detail below.
Prefix bit output: suppose that before the prefix bit output operation the number of bits remaining in the buffer register that have not yet been output is m (denoted buf[m-1:0]). The prefix bit output takes i clock cycles to complete, where i is obtained with a round-up (ceiling) operation. The prefix bit output operation has two cases:
1) If the value of btf+m+1 is less than 8, the bit string {buf[m-1:0], low[6], btf{!low[6]}} is written back to the low btf+m+1 bits of the 8-bit buffer register buf[7:0], i.e. buf[btf+m:0]. (The symbol "{ }" denotes bitwise concatenation; btf{!low[6]} denotes a bit string of btf consecutive bits, each equal to !low[6]; !low[6] denotes the inverse of the bit variable low[6].)
2) Otherwise, when i ≥ 1, the prefix bits are output in three steps:
a) In the 1st clock cycle, {buf[m-1:0], low[6], (8-m-1){!low[6]}} is output to the RBSP bitstream. If i equals 1, the prefix bit output operation is complete at this point; otherwise:
b) If i is greater than 2, from the 2nd to the (i-1)-th clock cycle, 8{!low[6]} is written to the RBSP bitstream in each cycle;
c) Let the variable k equal btf+m+1-8×(i-1). In the i-th clock cycle, k{!low[6]} is written to buf[k-1:0].
Suffix bit output: when beta[2:0] > 0, the suffix bit output engine outputs low[5:6-beta]. Let k be the number of bits remaining in the buffer register buf[7:0] that have not yet been output before the suffix bit output. There are two cases:
a) If k+beta < 8, {buf[k-1:0], low[5:6-beta]} is written back to the low k+beta bits of the buffer register buf[7:0];
b) Otherwise, i.e. k+beta ≥ 8, k is necessarily greater than 2. In the first cycle, {buf[k-1:0], low[5:k-2]} is written to the RBSP bitstream; if k+beta equals 8, the suffix bit output operation is complete at this point. Otherwise, in the second cycle, low[k-3:6-beta] is written to the low k+beta-8 bits of the buffer register buf[7:0].
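The two output phases can be modelled behaviourally as follows. This is a minimal sketch, not a cycle-accurate model: it only reproduces the bit order described above (prefix {low[6], btf{!low[6]}} followed by suffix low[5:6-beta]) and flushes a byte whenever eight bits are pending. The class name RbspBitGenerator, its attributes, and the policy of flushing as soon as eight bits are available are assumptions made for illustration.

```python
# Behavioural sketch of the RBSP bit generation for one FIFO entry.
# Not cycle-accurate; names and flush policy are illustrative assumptions.

class RbspBitGenerator:
    def __init__(self):
        self.buf = []                  # pending bits, oldest first (fewer than 8)
        self.bytes_out = bytearray()   # bytes written to the RBSP

    def push_entry(self, btf, low, beta):
        msb = (low >> 6) & 1
        # Prefix: low[6] followed by btf copies of its inverse.
        bits = [msb] + [1 - msb] * btf
        # Suffix: low[5:6-beta], most significant bit first (empty if beta == 0).
        bits += [(low >> i) & 1 for i in range(5, 5 - beta, -1)]
        self.buf += bits
        # Emit every complete byte, keep the remainder in the buffer.
        while len(self.buf) >= 8:
            byte = 0
            for b in self.buf[:8]:
                byte = (byte << 1) | b
            self.bytes_out.append(byte)
            del self.buf[:8]
```

For example, pushing the entries (btf=2, low=0b1010000, beta=2) and (btf=0, low=0b0110000, beta=3) produces the bit sequence 1 0 0 0 1 0 1 1 0, so one byte 0x8B is written and one bit stays in the buffer.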
The implementation of each unit in the overall architecture block diagram of the embodiment is described below.
The circuit design of the standard-mode L and R update engine is shown in Figure 4. The combinational logic of the coding-range lookup table performs a table lookup on R[7:6] of the current coding interval and pStateIdx[5:0]; its truth table is given in Table 9-33 of Reference 1.
The circuit design of the standard-mode normalization engine is shown in Figure 5. The symbol M>>n denotes shifting the variable M right by n bits, with the high-order bits filled with zeros. The symbol R_M/LPS<<n denotes shifting the variable R_M/LPS[8:0] left by n bits, with the low-order bits filled with zeros. The symbol L_M/LPS<<n denotes shifting the variable L_M/LPS[9:0] left by n bits, with the low-order bits filled with zeros. The leading-zero counter detects the number of leading zeros in its input signal and uses the circuit design in Reference 2 (Synopsys Inc., "DesignWare Building Block IP Documentation Overview", October 2009).
The circuit that generates btf_Reg[7:0] in the standard-mode bits_follow update engine is shown in Figure 6. The symbol L_M/LPS[8:0]>>ρ denotes shifting L_M/LPS[8:0] right by ρ bits, with the high-order bits filled with zeros. The bit-reversal engine connects bit i of its input signal to bit 8-i of its output signal. The leading-one counter detects the number of leading ones in its input signal and uses the circuit design in Reference 2. The expression δ ≠ 9-ρ ? 0 : 1 denotes the following logic: the output is 0 when δ is not equal to 9-ρ, and 1 otherwise.
The circuit that generates β[2:0] in the standard-mode bits_follow update engine is shown in Figure 7. The input signal δ is the output of the leading-one counter in Figure 6. In the bypass coding mode, β[2:0] is always 0.
The bypass-mode L update engine performs the following logic: if the input variable binVal equals 0, L_Byp[10:0] equals L[9:0] shifted left by one bit; otherwise, i.e. binVal equals 1, L_Byp[10:0] equals the sum of R[8:0] and L[9:0] shifted left by one bit.
The bypass-mode normalization engine performs the following function: bits [8:0] of the normalized coding lower bound equal L_Byp[8:0]; bit 9 of the normalized coding lower bound is 1 when L_Byp[10] (bit 10, the most significant bit of L_Byp[10:0]) and L_Byp[9] are both 1, and 0 otherwise.
The bypass-mode bits_follow update engine performs the following function: when L_Byp[10:9] is 01, btf_Byp[7:0] equals bits_follow[7:0] plus 1; otherwise, btf_Byp[7:0] equals 0.
The data path of the RBSP bit generation engine in the second pipeline stage is shown in Figure 8. It consists mainly of the 8-bit buffer register buf[7:0], the prefix bit output engine and the suffix bit output engine. The input signal low[6] of the prefix bit output engine is connected to low[6] of the FIFO output entry; the input signal btf[7:0] of the prefix bit output engine is connected to btf[7:0] of the FIFO output entry; the input signal low[5:0] of the suffix bit output engine is connected to low[5:0] of the FIFO output entry; the input signal beta[2:0] of the suffix bit output engine is connected to beta[2:0] of the FIFO output entry.
The above circuit implements the CABAC-based parallel normalization coding method as follows:
For the first pipeline stage:
1) In the standard coding mode, the normalized coding interval (9 bits, [8:0]) and the normalized coding lower bound (10 bits, [9:0]) are obtained from the variables R_M/LPS[8:0], L_M/LPS[9:0], n and γ by shift and logic operations. The algorithm needs no loop in a software implementation and completes in a single cycle of combinational logic in a circuit implementation.
2) In the standard coding mode, the proposed algorithm obtains btf_Reg[7:0] from the current value of bits_follow, n and γ by arithmetic and logic operations. The algorithm needs no loop in a software implementation and completes in a single cycle of combinational logic in a circuit implementation.
3) In the bypass coding mode, the normalization operation involves only the coding lower bound and the variable bits_follow. The proposed algorithm obtains the normalized coding lower bound (10 bits, [9:0]) from the current value of L_Byp[10:0] by logic operations. It needs no loop in a software implementation and completes in a single cycle of combinational logic in a circuit implementation.
4) In the bypass coding mode, btf_Byp[7:0] is defined as the value of the variable bits_follow after the normalization operation. The proposed algorithm obtains btf_Byp[7:0] from bits_follow[7:0] and the value of L_Byp[10:9] by arithmetic and logic operations. It needs no loop in a software implementation and completes in a single cycle of combinational logic in a circuit implementation.
5) In the circuit implementation, the throughput of the RBSP bitstream generation engine is improved, and the normalization engine and the output bitstream generation engine are split into a two-stage pipeline whose stages are connected by a FIFO of depth 5.
6) If the current normalization operation needs to produce output bits and the FIFO is not full, the normalization engine writes low[6:0], beta[2:0] and btf[7:0] to the entry addressed by the FIFO tail pointer. If the FIFO is not empty, the output bitstream generation engine reads low[6:0], beta[2:0] and btf[7:0] from the entry addressed by the FIFO head pointer and generates the output bitstream by parsing these fields.
In addition, the output bitstream generation engine of the second pipeline stage can produce multiple output bits per cycle and writes them to the RBSP one byte at a time. Because the output bitstream throughput is higher than that of the CABAC normalization operation (the operation of the first pipeline stage), the preceding normalization pipeline does not stall even when the current FIFO entry needs several cycles to process, provided the FIFO is deep enough. According to experiments, a FIFO depth of 5 entries is sufficient.
Specifically, in the standard coding mode, the proposed normalization algorithm and output bitstream algorithm are as follows:
(1) The normalized coding interval [8:0] equals R_M/LPS[8:0] shifted left by n bits;
(2) The normalized coding lower bound [9:0] is obtained as follows:
Define a temporary variable equal to the low 10 bits of L_M/LPS[9:0] shifted left by n bits;
If L_M/LPS[9:9-n+1] contains no 0 bit, bit 9 of the normalized coding lower bound equals bit 9 of the temporary variable; otherwise, bit 9 of the normalized coding lower bound equals 0.
(3) For convenience of notation, define ρ = 9-n. The generation of the output bitstream and the update of the variable bits_follow[7:0] fall into two cases:
If L_M/LPS[9] equals 1:
First write {L_M/LPS[9], bits_follow{!L_M/LPS[9]}}, i.e. a 1 followed by bits_follow 0s, to the RBSP;
If γ equals 0, write L_M/LPS[8:ρ+1] to the RBSP; otherwise, write L_M/LPS[8:γ+1] to the RBSP;
The variable btf_Reg[7:0] equals max(0, γ-ρ).
Otherwise:
If γ equals 0, btf_Reg[7:0] equals bits_follow+(9-ρ); in this case no data needs to be written to the RBSP;
Otherwise:
First write {L_M/LPS[9], bits_follow{!L_M/LPS[9]}}, i.e. a 0 followed by bits_follow 1s, to the RBSP;
Then write L_M/LPS[8:γ+1] to the RBSP;
The variable btf_Reg[7:0] equals γ-ρ.
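The standard-mode steps above can be checked against a bit-serial renormalization loop with a short sketch. Several points are assumptions made for illustration: the function names, the reading of "last 0 bit" as the lowest-numbered 0 bit in L_M/LPS[8:ρ], the choice that bits [8:0] of the normalized lower bound equal bits [8:0] of the temporary variable, and the form of the reference loop (the usual shift-by-one loop with a 0/1/outstanding decision per iteration, without any first-bit special case). R is assumed to be in the range 1 to 511.

```python
# Sketch: loop-free standard-mode normalization versus a bit-serial loop.
# Names and the interpretation of gamma are illustrative assumptions.

def renorm_reference(R, L, bits_follow):
    """Bit-serial renormalization loop, used here only as a reference."""
    out = []

    def put_bit(b):
        nonlocal bits_follow
        out.append(b)
        out.extend([1 - b] * bits_follow)   # flush outstanding bits
        bits_follow = 0

    while R < 256:
        if L < 256:
            put_bit(0)
        elif L >= 512:
            L -= 512
            put_bit(1)
        else:
            L -= 256
            bits_follow += 1
        R <<= 1
        L <<= 1
    return R, L, bits_follow, out


def renorm_parallel(R, L, bits_follow):
    """Loop-free normalization following steps (1) to (3)."""
    n = 9 - R.bit_length()                  # leading zeros of the 9-bit R
    if n == 0:
        return R, L, bits_follow, []        # R >= 256: nothing to normalize
    rho = 9 - n
    # Inverted window L[8:rho]; its lowest set bit marks the last 0 bit.
    window = ((L ^ 0x3FF) >> rho) & ((1 << n) - 1)
    gamma = rho + (window & -window).bit_length() - 1 if window else 0
    r_norm = (R << n) & 0x1FF
    tmp = (L << n) & 0x3FF                  # temporary variable of step (2)
    top_all_ones = (L >> (rho + 1)) == (1 << n) - 1   # L[9:9-n+1] has no 0
    l_norm = tmp if top_all_ones else tmp & 0x1FF
    # Helper that unpacks the bit slice L[hi:lo], most significant bit first.
    bits = lambda hi, lo: [(L >> i) & 1 for i in range(hi, lo - 1, -1)]
    out = []
    if (L >> 9) & 1:
        out = [1] + [0] * bits_follow + bits(8, (gamma or rho) + 1)
        btf = max(0, gamma - rho)
    elif gamma == 0:
        btf = bits_follow + (9 - rho)
    else:
        out = [0] + [1] * bits_follow + bits(8, gamma + 1)
        btf = gamma - rho
    return r_norm, l_norm, btf, out
```

As a usage example, renorm_parallel(100, 716, 1) gives n = 2, ρ = 7 and γ = 8, and returns the same result as renorm_reference(100, 716, 1). The sketch does not model the 8-bit width of the bits_follow register.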
In the bypass coding mode, the coding lower bound before the normalization operation is defined as L_Byp[10:0]; the coding lower bound after the normalization operation is the normalized coding lower bound (10 bits, [9:0]); and btf_Byp[7:0] is defined as the value of bits_follow[7:0] after the normalization operation. The proposed normalization algorithm and output bitstream algorithm are as follows:
(1) If L_Byp[10] equals 1, the normalized coding lower bound [9:0] equals L_Byp[9:0]; otherwise, bit 9 of the normalized coding lower bound equals 0 and bits [8:0] equal L_Byp[8:0];
(2) If L_Byp[10:9] equals 01, btf_Byp[7:0] equals bits_follow+1; otherwise, btf_Byp[7:0] equals 0;
(3) The output bitstream has three cases:
If L_Byp[10] equals 1, write {L_Byp[10], bits_follow{!L_Byp[10]}}, i.e. a 1 followed by bits_follow 0s, to the RBSP;
Otherwise, if L_Byp[10:9] equals 00, write {L_Byp[10], bits_follow{!L_Byp[10]}}, i.e. a 0 followed by bits_follow 1s, to the RBSP;
Otherwise, no data needs to be written to the RBSP.
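The bypass-mode steps can be sketched in the same way. The function names and the reference routine (shift L left by one, add R when binVal is 1, then a single 0/1/outstanding decision) are assumptions made for illustration, and the comparison assumes L + R ≤ 1024 so that L_Byp fits in 11 bits.

```python
# Sketch: bypass-mode normalization by bit tests on L_Byp[10:9].
# Names are illustrative; assumes L + R <= 1024 so L_Byp fits in 11 bits.

def bypass_reference(R, L, bits_follow, bin_val):
    """Sequential bypass encoding step, used here only as a reference."""
    out = []
    L <<= 1
    if bin_val:
        L += R
    if L >= 1024:
        L -= 1024
        out = [1] + [0] * bits_follow
        bits_follow = 0
    elif L < 512:
        out = [0] + [1] * bits_follow
        bits_follow = 0
    else:
        L -= 512
        bits_follow += 1
    return L, bits_follow, out


def bypass_parallel(R, L, bits_follow, bin_val):
    """Bypass update following steps (1) to (3)."""
    l_byp = ((L << 1) + (R if bin_val else 0)) & 0x7FF   # L_Byp[10:0]
    b10 = (l_byp >> 10) & 1
    b9 = (l_byp >> 9) & 1
    # Bit 9 of the normalized bound is L_Byp[10] AND L_Byp[9]; bits [8:0] pass through.
    l_norm = ((b10 & b9) << 9) | (l_byp & 0x1FF)
    if (b10, b9) == (0, 1):
        return l_norm, bits_follow + 1, []               # outstanding bit, no output
    return l_norm, 0, [b10] + [1 - b10] * bits_follow
```

For example, with R = 300, L = 400, bits_follow = 2 and binVal = 1, L_Byp is 1100, so L_Byp[10] is 1: the normalized lower bound is 76, bits_follow resets to 0, and the bits 1 0 0 are emitted.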
As can be seen from the above embodiments, with the solution of the present invention the normalization of the current bit needs no loop in a software implementation, which speeds up normalization. In a circuit implementation, the normalization of any bit completes in a single cycle, which avoids the pipeline stalls introduced by the former multi-cycle normalization operation. Normalization and RBSP bitstream generation are split into two pipeline stages connected by a 5-entry first-in first-out (FIFO) register queue, a structure that effectively avoids pipeline stalls. The design therefore avoids the CABAC pipeline stall problem caused by the multi-cycle normalization of the original algorithm. Moreover, the throughput of the circuit proposed by the invention is constant: it does not depend on the probability of occurrence of least probable symbols in the bitstream being processed.
The above are only embodiments of the present invention. It should be noted that those skilled in the art can make improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications shall also fall within the scope of protection of the invention.