CN104168032A

CN104168032A - High-performance 16-base Turbo decoder with four degrees of parallelism and compatibility with LTE and WiMAX

Info

Publication number: CN104168032A
Application number: CN201410403907.2A
Authority: CN
Inventors: 陈赟; 陈旭斌; 程劼; 曾晓洋
Original assignee: Fudan University
Current assignee: Fudan University
Priority date: 2014-08-16
Filing date: 2014-08-16
Publication date: 2014-11-26

Abstract

The invention belongs to the technical field of wireless communication, and specifically discloses a turbo decoder compatible with LTE and WiMAX standards, adopting 4 degrees of parallelism and a radix-16 structure. The Turbo decoder adopts 4 degrees of parallelism and a radix-16 structure. The chip design is completed based on the TSMC65nm1P9MLPCMOS process. The core area of the chip is 1.39mm ² and the maximum clock frequency is 600MHz. Compared with other Turbo decoders, it has the following improvements: a new high-radix decoding structure is proposed, which reduces the complexity of the recursive operation unit; for high-parallel, high-radix decoders, a simplified interleaver is proposed, making LTE The interleaver in the WiMAX standard can be based on a simple barrel shift network and a two-dimensional address generator; the memory structure is optimized, storage access conflicts are avoided, and the space and number of memory are reduced; through algorithm optimization and resource Sharing reduces the computational complexity.

Description

4 degree of parallelisms, the base-16 high-performance Turbo decoder of compatible LTE and WiMAX

Technical field

The invention belongs to wireless communication technology field, be specifically related to employing 4 degree of parallelisms of a kind of compatible LTE and WiMAX standard, the Turbo decoder of base-16 structure.

Background technology

The people such as the C. Berrou of France are in the theoretical foundation of convolution code and cascaded code, a kind of brand-new coding and decoding scheme---Turbo code was proposed in 1993, obtain the error-correcting performance that approaches shannon limit, in the evolution of information theory, obtained breakthrough, there is milestone significance.Turbo code is a kind of parallel cascade codes, is made up of two less recursive systematic convolutional codes (RSC) of constraint length, between two component codings, is connected by a pseudo random interleaver.Turbo decoding, based on improved maximal posterior probability algorithm (also referred to as bcjr algorithm) and introduced iterative decoding thought, has obtained very excellent coding gain.Nonbinary Turbo code was suggested in 1999, in the single clock cycle in its coding and decoding process, can process multiple information bits, thereby reduce the interleaver depth under equal code length, reduced decoding delay, and there is even better performance identical with Binary Turbo codes.

Owing to thering is decoding performance excellence, coding is simple, code length and code check can be adjusted flexibly, the series of advantages such as hardware complexity is moderate, Turbo code, once proposing to be just applied to the numerous areas such as deep space communication, mobile communication, DMB, power line communication rapidly, has especially obtained unprecedented success in mobile communication system.Table 1-1 has listed and has adopted Turbo code some Typical Representatives as chnnel coding.

People, to demand high-speed, that high quality broadband communication rapidly increases, have promoted the development of wireless communication technology and the update of communication standard greatly.During the decade short, the third generation (3G) mobile communication standard that the normal structures such as 3GPP, 3GPP2, IEEE are formulated has obtained fast development, and super three generations (B3G) standard is also in continuous evolution process.By introducing a series of technology such as OFDM, MIMO, high order modulation, advanced chnnel coding, new communication standard has been realized the continuous lifting of data throughput and the availability of frequency spectrum.For example HSPA+ standard will progressively be transitioned into 672Mbps peak value throughput from 14.4Mbps.And 3GPP tissue and the IEEE 802.16 technology collection to 4G mobile communication for International Telecommunication Union, LTE-Advanced standard and WiMAX-Advanced standard are formulated respectively, its throughput will meet or exceed the long term object of ITU for 4G: when mobile reception, reach 100Mbps, reach 1Gbps when fixed reception.

In the process of these standard evolution, there is not too large variation in the coded system of Turbo code, and is mainly to have changed traditional interleaver, makes it to have the uncontested feature that interweaves so that parallel decoding.However being applied in high-speed mobile communication system of future generation, still there is lot of challenges in Turbo code.First be the sharp increase of throughput, rise to the desired 1Gbps throughput of following 4G from the throughput of the initial desired 2Mbps of 3G, it is far from being enough that large leap like this relies on merely the development of integrated circuit fabrication process, necessarily requires to excavate from many levels such as decoding algorithm, hardware configuration, circuit realizations the ability of parallel computation; Next is the harsh requirement of mobile terminal for chip area, power consumption, also requires the improvement and bring new ideas of implementation; Be finally mutually to coexist in various standards, in the process of Parallel Development, require the designer of band receiver of base must consider the realization of the configurable terminal of multi-mode, therefore the design of the Turbo decoder of multi-mode multi-standard also becomes one of following developing direction.

Summary of the invention

The object of the present invention is to provide and a kind ofly can support employing 4 degree of parallelisms of binary system and duobinary compatible LTE and WiMAX standard, the Turbo decoder of base-16 structure simultaneously, to be applicable to the 4G mobile communication system such as 3GPP LTE/LTE-Advanced, WiMAX/WiMAX-Advanced (IEEE 802.16e/m).

Turbo decoder provided by the invention, adopts 4 degree of parallelisms, base-16 structure, completes chip design based on TSMC 65nm 1P9M LP CMOS technique, and chip core area is 1.39 mm ², maximum clock frequency is 600MHz, the high-throughput under 5.5 iteration is: 821Mbps under LTE pattern, 810.6Mbps under WiMAX pattern.There is following improvement compared to other Turbo decoders: (1) proposes new high radix decoding architecture, by comparing in advance the branch in grid chart with same initial state and same state of termination, reduce the complexity of recursive operation unit and reduced critical path simultaneously; (2) for high degree of parallelism, high radix decoder, the interleaver of simplifying has been proposed, the interleaver in LTE and WiMAX standard all can be formed based on simple barrel shift network and two-dimensional address maker; (3) optimize memory construction, avoided memory access conflict, reduced space and the number of memory; (4) by algorithm optimization and resource-sharing, reduced computational complexity.

Turbo decoder provided by the invention, as shown in Figure 8, system configuration comprises its system configuration: the soft inputting and soft output sub-decoder (SISO) of external information memory cell, system information memory cell, check digit 1 memory cell, check digit 2 memory cell, P-road permutation network, base-16, corresponding control module, address generation module and hard decision buffer memory.Wherein, P-road permutation network, soft inputting and soft output sub-decoder and corresponding memory cell are all 4 degree of parallelisms, i.e. P=4 shown in figure.External information memory cell is mainly used to be stored in last iterative process and obtains external information, replace by permutation network, input soft inputting and soft output sub-decoder is as the prior information of next iteration, and the size of external information memory cell is N/P, N is code length, and P is degree of parallelism.System information memory cell is the system information to code word for storing received, the systematic code receiving, be input to soft inputting and soft output sub-decoder by permutation network, carry out iterative decoding as the system information in iterative decoding, system information memory cell size is N/P equally.And check digit 1 memory cell and check digit 2 memory cell are respectively for corresponding 2 check codes of storing received code word, by a selector, during different sub-iteration, send into different check digit, to complete iterative decoding.The effect of address generation module is according to the progress of iterative decoding and requirement, generates corresponding address and visits these memory cell, to read corresponding external information, system information and check digit information.Soft inputting and soft output sub-decoder is the core of whole decoder, input corresponding external information, system information and selection check position by permutation network, obtain the required initial value of decoding, carry out iterative decoding, after sub-iteration completes, obtain posterior information, posterior information stores the external information as next second son iteration in external information memory cell into by permutation network.And control module is for total control address generation module and the process of permutation network and iterative decoding.After iterative decoding completes, the code word that obtains translating by hard decision, exists in hard decision buffer memory, exports as decoding.

In the present invention, system information and external information adopt four-quadrant to divide, to ensure the real-time storage of data in the decoding of base-16.Each clock cycle, 4 parallel sub-decoders need read 16 groups of intrinsic information and external information altogether, they and through 4 group of 4 parallel permutation network in tunnel.After maximum posteriori decoding, each clock cycle writes back 16 class values to external information.Wherein, address generation module provides 16 addresses under LTE pattern, 8 addresses are only provided under WiMAX pattern, their hardware resource can be completely multiplexing, and permutation network has stronger configurability, all code lengths and 1,2 under can two kinds of patterns of complete support, 4 three kinds of degree of parallelisms, permutation network is all based on simple barrel shift network.

Below some modules are described further:

1. configurable base-16 soft inputting and soft output sub-decoder

Soft inputting and soft output sub-decoder is to forward state metric α, the unit that backward state measurement β and branch metric γ calculate, as shown in Figure 9, the RAM that the inside has comprised twoport is for depositing calculative numeral information, branch metric calculation unit calculates after branch metric, be kept in register, for follow-up forward state metric and the calculating of backward state measurement, the forward state metric calculating and backward state measurement are kept in by register again, calculate forward and backward state measurement as next iteration, the corresponding LLR value of forward-backward algorithm metric calculation output that LLR arithmetic element utilization is simultaneously current are preserved.

The soft inputting and soft output sub-decoder of the present invention's design can be supported binary system turbo code and duobinary system turbo code, can support sliding window (SW) and two kinds of modes of two-way simultaneous window (PW), can support information transmission and training calculate two kinds of recursive operation initialization, and according to the pattern of rate adjust window and initialize mode, there is dynamic control power consumption.

In fact this configuration provides several important mechanism: be first to peek by two-way simultaneous, this is the prerequisite that reduces time delay, secondly LLR and backward training can be carried out to hardware resource sharing.Due to LLR arithmetic element can with forward recursive walked abreast also can with backward recursive arithmetic element, therefore the key of concurrency is the concurrency of forward direction and backward recursive.In the middle of traditional sliding window flow process, it should be noted that the grid chart between sub-block joins, therefore also can transmit mode metric.

Number at window has very high similitude in fact compared with hour two kinds of flow processs, and PW is more effective, and at this moment in PW, needed training calculating does not need to increase hardware, only need two existing arithmetic elements to calculate, therefore need classification discussion, comprise a window, two windows, the situation of multiple windows.Training computing in the time of high code check, open, in the time of low code check, can close or reduce to train length, performance loss is not obvious because in the time of different code check performance loss difference, this becomes the basis of optimised power consumption.

2. address generation module---configurable ARP and QPP interleaver

The interleaver of the turbo code using in distinct communication standards is conventionally different, and in the design of many standards decoder, interleaver designs is a key issue flexibly.ARP interleaver is a polynomial form, and as Fig. 5, QPP interleaver is quadratic polynomial form.In fact parallel interleaver can utilize the relation between parallel address, and recurrence relation between adjacent periods address is simplified the calculating of interleaving address.What in WiMAX standard, adopt is ARP interleaver, also can calculate interleaving address by recursive form.

Because window is long larger, write address is larger by the method cost of reading to store address, therefore adopts the mode of calculating in real time here, adopts the address generator of two forward directions; Two backward addresses can be obtained by LIFO, or use two reverse address generators.

3. memory cell

Storage resources under two kinds of patterns can efficient multiplexing, notice that duobinary system turbo code carries out decoding based on symbol, the external information of transmitting between adjacent sub-iteration is 2 times of binary system turbo, by the definition of log-likelihood ratio in algorithm, can be reduced to 1.5 times.For base-16 decoding, application four-quadrant splitting scheme.The each symbol of WiMAX (2-bit information bit) need to be stored 3 external information values, and therefore, under code length same case, WiMAX need to increase by 50% external information memory capacity.External information is converted to bit-level storage from symbol level, while reading, is again converted to symbol level, make like this external information memory cell of duobinary system and traditional binary decoding can be completely multiplexing.But this method can cause the performance loss of 0.1 ~ 0.2dB left and right, in the situation that degree of parallelism is lower, this part external information is higher at the proportion of whole decoder, and in the decoding architecture of highly-parallel, proportion can constantly decline.Therefore the present invention does not adopt this external information conversion method, but retains this department's storage resources to obtain good decoding performance.

Brief description of the drawings

Fig. 1 base-16 grid path in graphs merges schematic diagram.

Fig. 2 tradition base- state measurement recursive operation unit.

Fig. 3 based on path merge base- state measurement recursive operation unit.

Log-likelihood calculations unit, improved base-16 of Fig. 4.

Fig. 5 ARP address generator.

Fig. 6 duobinary system Turbo code grid chart.

Fig. 7 binary system and duobinary system Turbo code grid chart state transitions situation and with the relation of coded message.

Fig. 8 restructural LTE/WiMAX encoder/decoder system structure.

Fig. 9 soft inputting and soft output unit.

Embodiment

Below in conjunction with accompanying drawing to the present invention specifically realize optimization and improvement, be described in further details.

1. the fusion of decoding algorithm in the soft inputting and soft of base-16 output sub-decoder (SISO):

Although binary system and duobinary grid chart are distinct, by making grid chart there is identical annexation after reordering.Concrete, the position of 2,7 two states of WiMAX is exchanged, the position of 3,6 two states exchanges, and just can obtain the grid chart through rearrangement.The grid chart shape of it and Binary Turbo codes is closely similar, and in each like this state measurement recursive operation unit, the input of state variable is consistent, does not therefore need extra selector.Because inside, recursive operation unit is difficult to insert pipeline register conventionally, therefore, this effect that can reduce critical path is very useful.

Can make the two there is similar state transitions situation by rearrangement, but due to their difference of coding structure, its output situation in the situation that input information bits is identical is also inconsistent, therefore in branch metric and log-likelihood calculations, considers this species diversity again.Fig. 6 has provided the situation of two kinds of code state transitions and State-output.For its base-4 grid chart of Binary Turbo codes, obtain by conversion, therefore its state transitions be from moment is transferred to moment.For duobinary system Turbo code, each moment of encoder input dibit, therefore its grid progression only has the half of code length, its state transitions be all from arrive moment.

According to the information bit of each state transitions output, the combined situation of check bit, have 16 kinds of branch metrics, table 2 has provided the branch metric calculation formula after fortran.

Each branch metric can be split as two parts, and wherein a part is only relevant to information bit, and another part is only relevant to check digit.The former is prior information and the intrinsic information sum of information bit, we are referred to as measure information (Information-Only Metric, IOM), and the latter is check digit intrinsic information sum, we are referred to as verification tolerance (Parity-Only Metric, POM).By this fractionation, the branch metric calculation under two kinds of patterns can be completely multiplexing.After the branch metric publicity conversion of introducing, the measure information under LTE pattern has , , , 0 four kinds of values, the measure information under WiMAX pattern has , , , 0 four kinds of values.And verification tolerance is under two kinds of patterns , , , 0 these four kinds of values.Meanwhile, in branch metric buffer memory, only need to store three measure informations and two check digit intrinsic information (or three verification tolerance).

Searching out the similitude of two kinds of pattern inferior division metric calculation formula and carrying out after fortran, the amount of calculation of branch metric greatly reduces, and needed computational resource and storage resources be corresponding reducing also.Adopt original method, the calculating of each branch metric needs 5 adders, altogether needs 80 adders.And adopt after the computational methods of simplifying, altogether only need 13 adders, reduce 83.75%

Table 2 bimodulus base-4 decoding branch metric calculation formula

。

In order further to make hardware resource better multiplexing, the computing formula of base-4 log-likelihood ratio to two-stage system Turbo code converts, and can obtain the computational methods of the symbol level posterior probability identical with duobinary system Turbo code.32 possible state transitions branches in grid chart, according to the value condition of information bit, are divided into 4 classes, in each class, have 8 kinds of situations, then by the posterior probability of forward state metric and backward state measurement and branch metric addition acquisition symbol level.Due to rear in the computational process of state measurement, as calculated all situations of branch metric and backward state sum, therefore the two sum in formula , , , results of intermediate calculations that can backward state measurement is carried out buffer memory and is obtained, and this improvement can be saved 32 adders, and for high degree of parallelism decoder, this improvement can be saved a large amount of calculation resources and power consumption.

（3）

By 4 symbol level posterior probability , , , can calculate the external information of Binary Turbo codes , or the external information of duobinary system Turbo code , , .But both external information computational process difference are very large, can not allow fusion, need to calculate log-likelihood ratio and the external information of bit-level for LTE pattern, to calculate the soft output of symbol level for WiMAX pattern.Under LTE pattern, their computational methods are:

（4）

（5）

Notice , these two were just calculated before Branch Computed tolerance with value, therefore need not recalculate.

Under WiMAX pattern, their computational methods are:

（6）

（7）

Notice , , these three were just calculated before Branch Computed tolerance with value, therefore need not recalculate.

The output hard decision of Binary Turbo codes can obtain according to bit-level log-likelihood ratio information, is normally obtained by following method for the output judgement of Turbo code:

（8）

The consistency of considering subtraction operation and compare operation, the generation of the two hard decision information all can, according to formula below, be adjudicated according to the sign bit of bit-level posterior probability log-likelihood ratio.

（9）。

The Binary Turbo codes that adopts base-4 decoding, the calculating of its posterior probability also can be converted to the calculating of symbol level probability.It should be noted that, although formula 3 shows that both posterior probability computing formula are in full accord, but because the state transitions situation of the two encoder is also inconsistent, 32 kinds of paths situation in the time being divided into 4 groups according to information bit is also inconsistent, therefore must increase extra data selector, realize two kinds of compatibilities under pattern.

Also can calculate the posterior probability of bit-level for duobinary system Turbo code, but because its coding and decoding process is based on symbol, therefore can cause obvious performance loss.There is document to propose a kind of method converting between symbol level and bit-level, by stored bits level external information, in serial decoding device, reduced the storage resources of 20% left and right, can reduce performance loss by inverse transformation.But existing by performance simulation hair, this method still can cause the BER/FER performance loss of 0.1dB ~ 0.2dB left and right, on the other hand, for high degree of parallelism decoder, external information memory shared area proportion in whole decoder less (in 4 degree of parallelism base-16 decoders that the present invention realizes, external information memory proportion is less than 2%), therefore, symbolization level log-likelihood ratio storage external information of the present invention, to reduce performance loss.

2. a new high radix decoding architecture, by comparing in advance the branch in grid chart with same initial state and same state of termination, reduces the complexity of recursive operation unit and has reduced critical path simultaneously:

Computation complexity increases along with radix increases and is index, and the critical path of state measurement recursive operation unit also can constantly increase, and therefore, must take effective method reduce the complexity of high radix decoding and promote its operating frequency.

Due to the high radix Jia-ratio of state recursive operation unit-select circuit to occupy the more hardware resource of SISO sub-decoder, and limit the maximum clock frequency of decoding, high radix Jia-ratio-select the needs of circuit improve and optimize.There is not recursive operation in the calculating of branch metric and posterior probability, can adopt pipelining to promote operating frequency, but their computational complexity also can be along with radix increases and increases rapidly.Therefore, also the low complex degree of considering branch metric calculation unit and posterior probability computing unit is realized.

Gao Jijia ratio selects circuit implementation to mainly contain two kinds: a kind of is grid chart based on cascade by some groups of low order Jia-ratios-select circuit to carry out cascade, and another kind is that addition and comparison operation are carried out cascade by the grid chart based on merging.The present invention proposes a kind of high radix recursive operation unit of novelty, it is limited observing state number, therefore Gao Jijia can not exceed status number than the possible situation of the state measurement input of selecting circuit, have in the path of common initial state or common state of termination for these, maximum path only need to can obtain by the maximum of respective branch tolerance relatively, instead of must first computing mode tolerance and branch metric with could calculate maximum path.

In the renewal process of each state measurement, need the number of path of comparison for being radix, because the number of state measurement is certain, therefore branched measurement value that only need to be more identical from two paths of same state, is kept to state number by path number relatively like this.This is equivalent to and has compressed the path number that participates in recursive operation, and obviously, in the time that radix is greater than status number, this method is very effective.Because the status number of Turbo code is generally 8, therefore, in the time of base 16, can adopting said method the required 16 road acs unit of base 16 be reduced to 8 road acs units, can reach the effect that reduces complexity and reduce critical path simultaneously.As shown in Figure 1.

Fig. 2 provided traditional base- state measurement recursive operation unit, the number of adder is consistent with radix, and need to be from individual adder output compares maximum.Base provided by the invention- state measurement recursive operation cellular construction as shown in Figure 3, first compares individual branch metric, is divided into 2 ^mgroup compares, and draws the branch metric of each group of maximum, is then added with the larger branch metric drawing and corresponding state measurement, by 2 ^mthe comparator of input compares, and finally obtains optimum path.In recursive operation unit in Fig. 3 owing to having compared in advance part branch metric, therefore the number of adder is reduced to status number, also corresponding reducing of the input number of comparator, in reducing computational complexity, also make the critical path of this recursive operation unit reduce, thereby obtain higher operating frequency.

In the decoding of base-16, owing to having increased the step-length that in grid chart, state measurement upgrades, every four grid progression only calculate and store one-level state measurement value, and this makes the calculating of soft output more complicated.The method of calculating the log-likelihood ratio of base-16 has two kinds at present, a kind of method is that 4 grades of 128 kinds of state status corresponding to grid are divided into 8 groups according to every bit value condition, every group has 64 kinds of state transitions situations, this method need to be used 256 adders and 8 Zu64 road comparators, and amount of calculation is very large.Another kind method is the posterior probability that first calculates 4 bit symbols, is then converted to the log-likelihood ratio of bit-level, and this method still needs number of adders constant, and comparator is reduced to 16 group of 8 input comparator and 8 group of 8 input comparator.By using the intermediate object program of add operation in backward state measurement, the number of adder can reduce half.For base-16 in this paper, owing to having simplified the add operation of recursive operation unit, therefore the intermediate object program of adder no longer can be unit multiplexed by log-likelihood calculations.What the present invention taked is the state measurement that first recalculates the middle one-level of 4 grades of grids, then calculate respectively 2 bit symbol posterior probability corresponding to front two-stage and rear two-stage, then be converted to respectively symbol rank probability, by this new method, the number of adder and comparator can reduce, and its complexity is low.The computational methods of log-likelihood ratio in base-16 decoding architecture of the present invention's design, as shown in Figure 4, wherein, base-16 to be split into two base-4 calculate the LLR value of each bit (top and the bottom on the right in figure), in figure, the state measurement in the middle of base-16 is first calculated by ACS-4 in the left side, above 8 ACS-4 unit calculate forward state metric, below 8 ACS-4 calculate backward state measurement, 4 paths before and after comparing respectively, select optimal path and have calculated state measurement.And utilize intermediate value alpha+γ, the β+γ of the calculating of 16 ACS-4, then send into ACS-8 and be added and compare with corresponding state measurement, obtain LL _uvthe value of (uv=00,01,10,11), obtains the finally LLR value of each bit according to these values.

3. for high degree of parallelism, high radix decoder, proposed the interleaver of simplifying, the interleaver in LTE and WiMAX standard all can be formed based on simple barrel shift network and two-dimensional address maker:

The interleaver of the turbo code using in distinct communication standards is conventionally different, and in the design of many standards decoder, interleaver designs is a key issue flexibly.ARP interleaver is a polynomial form, and QPP interleaver is quadratic polynomial form.In fact parallel interleaver can utilize the relation between parallel address, and recurrence relation between adjacent periods address is simplified the calculating of interleaving address.What in WiMAX standard, adopt is ARP interleaver, also can calculate interleaving address by recursive form.

Because window is long larger, write address is larger by the method cost of reading to store address, therefore, the present invention adopts the mode of real-time calculating, the address generator that adopts two forward directions, two backward addresses can be obtained by LIFO, or use two reverse address generators.

4. optimize storage structure, avoids memory access conflict, reduces space and the number of memory:

Except total storage size, the bit wide of memory, the degree of depth, type (single port, twoport, SRAM, Register Files etc.), Deng area and power consumption that all can appreciable impact memory, in addition, except these factors, read-write clock frequency, read-write number of times in the unit interval also can appreciable impact power consumption.

In storage resources, the storage size of channel intrinsic information and external information is determined by maximum code length and the lowest bit rate of system, does not change along with sub-block degree of parallelism.Adopt after sliding window setting technique, the branch metric in each MAP sub-decoder and state measurement memory space are by the long decision of window, and the value of boundary condition tolerance reduces along with the increase of degree of parallelism.And when degree of parallelism is lower, channel intrinsic information and external information storage account for the largest percentage, therefore can consider the optimization of its memory.Along with the increase of degree of parallelism, the storage resources proportion in MAP sub-decoder constantly increases.

External information need to provide access hole conventionally simultaneously, and the method therefore realizing has double port memory, adopts two single port memories of ping-pong operation, can be operated in the memory of Clock Doubled.Find by the research to ARP and QPP interleaver, it can be divided into Ji Qiouji or four-quadrant set according to the lowest order of address, and between sequence address and interleaving address, meet the rule of mapping one by one, therefore divide to realize by rational memory block and use single port memory.State measurement employing method for normalizing can increase the bit wide of storage, but the increase of this part is very limited.

5. by algorithm optimization and resource-sharing, reduced computational complexity:

Although the present invention uses after (Scaling)-MAX-Log-MAP decoding algorithm, addition and comparison operation in decode procedure, are only comprised, but meeting under high-throughput requirement, concurrency in algorithm is fully excavated, and be mapped to a large amount of hardware resources, how in the process that completes necessary calculating, to excavate shared data and computing can be saved great amount of hardware resources.

From the arthmetic statement process of MAX-Log-MAP, can see that in decoding, main arithmetic operation comprises: branch metric calculation, state measurement recursive calculation, posterior probability are calculated, in fact in existing document, conventionally only describe state measurement recursive calculation and improve one's methods, and branch metric and posterior probability are calculated description and improve less, but in high radix decoding circuit, the area that Zhe Liang department is shared and power dissipation ratio regular meeting enlarge markedly, and therefore become one of key of design of encoder.

The computation complexity of branch metric is also relevant to lowest bit rate and decoding radix that system is supported.The lowest bit rate of CDMA2000, the EV-DO that for example 3GPP2 organizes to set up, middle regulation is 1/5, and along with the increase of decoding radix, the complexity of branch metric also increases rapidly on the other hand.

What in the calculating due to Max-Log-MAP algorithm state tolerance and posterior probability, rely on is the relative size of state measurement and branch metric, therefore can do a fortran to branch metric, difference is participated in to computing as new branch metric, thereby reduce storage and the amount of calculation of branch metric.Be decoded as example with base-2, original branch metric formula is:

（1）

Here we deduct 4 kinds of branch metrics this branch metric, thereby the branch metric calculation formula being simplified:

（2）。

Table 1 has shown and utilizes above-mentioned fortran's method can save more calculation resources.

The branch metric calculation formula (3GPP standard, 1/3 code check) of table 1 base-4 decoding architecture

。

Claims

1. A kind of 4 parallel degrees compatible with LTE and WiMAX, base-16 high-performance Turbo decoder, it is characterized in that comprising: external information storage unit, system information storage unit, check digit 1 storage unit, check digit 2 storage Unit, P-way permutation network, base-16 soft-input and soft-output sub-decoder, corresponding control module, address generation module and hard decision buffer; wherein, way permutation network, soft-input and soft-output sub-decoder and corresponding The storage units of all are 4 degrees of parallelism; the extrinsic information storage unit is mainly used to store the extrinsic information obtained in the previous iteration process, which is replaced by the permutation network and input to the soft-input and soft-output sub-decoder as the prior information for the next iteration , the size of the external information storage unit is N/P, N is the code length, and P is the degree of parallelism; the system information storage unit is used to store the system information of the received codeword, that is, the received system code, which is input to the soft Input the soft output sub-decoder to perform iterative decoding as the system information in iterative decoding. The size of the system information storage unit is also N/P; the parity bit 1 storage unit and the parity bit 2 storage unit are used to store and receive The corresponding two check codes in the code word are sent to different check digits during different sub-iterations through a selector to complete iterative decoding; the function of the address generation module is based on the progress and requirements of iterative decoding , to generate corresponding addresses to access these storage units to read the corresponding external information, system information and check bit information; the soft input and soft output sub-decoder inputs the corresponding external information, system information and selected calibration information through the replacement network. Verify the position, obtain the initial value required for decoding, and perform iterative decoding. After the sub-iteration is completed, the posterior information is obtained, and the posterior information is stored in the external information storage unit through the replacement network as the external information of the next sub-iteration; the control module It is used for overall control of the address generation module, permutation network, and iterative decoding process; when the iterative decoding is completed, the decoded codeword is obtained through hard decision, stored in the hard decision buffer, and used as a decoding output.

2. Turbo decoder according to claim 1, it is characterized in that, system information and external information adopt four quadrants to divide, to guarantee the real-time storage of data in base-16 decoding; 4 parallel software of each clock cycle The input soft output sub-decoder needs to read a total of 16 sets of intrinsic information and extrinsic information, and their sum passes through 4 sets of 4-way parallel permutation networks; after decoding with maximum a posteriori probability, the external information is written back every clock cycle 16 sets of values; among them, the address generation module provides 16 addresses in LTE mode, but only 8 addresses in WiMAX mode, and their hardware resources are fully reused; the replacement network is configurable and can fully support LTE mode And all code lengths and three parallel degrees of 1, 2, and 4 in WiMAX mode, the permutation network is based on a simple barrel shift network.

3. The Turbo decoder according to claim 1, wherein said soft-input and soft-output sub-decoder is a unit for calculating forward state metric α, backward state metric β and branch metric γ , wherein the dual-port RAM is used to store the code sub-information that needs to be calculated. After the branch metric calculation unit calculates the branch metric, it is stored in the register for subsequent calculation of the forward state metric and backward state metric. The forward state metric and backward state metric are temporarily stored in registers for the next iteration to calculate the forward and backward state metric. At the same time, the LLR operation unit uses this forward and backward metric to calculate the corresponding LLR value and output save;

The soft-input and soft-output sub-decoder supports binary turbo codes and duobinary turbo codes, supports sliding window (SW) and bidirectional parallel window (PW), and supports two recursive operation initializations, information transfer and training calculation, And adjust the window mode and initialization mode according to the code rate.

4. Turbo decoder according to claim 3, it is characterized in that, described address generation module configures ARP and QPP interleaver, and parallel interleaver utilizes the relation between parallel addresses, and the relation between adjacent cycle addresses The recursive relationship simplifies the calculation of the interleaved address; the real-time calculation method is used, using two forward address generators; the two backward addresses are obtained by LIFO, or two reverse address generators are used.

5. Turbo decoder according to claim 4, is characterized in that, in described soft input soft output sub-decoder, decoding algorithm is as follows:

Swap the positions of the two states 2 and 7 of WiMAX, and the positions of the two states 3 and 6 to obtain a reordered grid diagram;

According to the state transition and state output of the two codes, that is, the combination of information bits and parity bits output by each state transition, there are 16 kinds of branch metrics. The transformed branch metric calculation formula is shown in Table 2:

Table 2 Calculation formula of bimodal radix-4 decoding branch metric

.

6. Turbo decoder according to claim 5, is characterized in that, in order to further make hardware resources better multiplexing, the calculation formula of the base-4 logarithmic likelihood ratio of two-level system Turbo code is transformed, Obtain the same symbol-level posterior probability calculation method as the duobinary Turbo code; that is, the 32 possible state transition branches in the grid diagram are divided into 4 categories according to the value of the information bit, and there are 8 situations in each category , and then add the forward state metric, the backward state metric and the branch metric to obtain the symbol-level posterior probability;

(3)

by 4 symbol-level posterior probabilities , , , , can calculate the extrinsic information of the binary Turbo code , Or the extrinsic information of the duobinary Turbo code , , ;

For LTE mode, it is necessary to calculate bit-level log-likelihood ratio and external information, while for WiMAX mode, it is necessary to calculate symbol-level soft output; in LTE mode, their calculation methods are:

(4)

(5)

noticed , These two sums have already been calculated before calculating the branch metric, so there is no need to recalculate;

In WiMAX mode, they are calculated as:

(6)

(7)

noticed , , These three sums have already been calculated before calculating the branch metrics, so there is no need to recalculate.