CN100508405C

CN100508405C - Parallel decoding method and decoding device for improving turbo code decoding speed

Info

Publication number: CN100508405C
Application number: CNB2005101157769A
Authority: CN
Inventors: 郑银香; 张秀军; 周世东; 许希斌; 粟欣; 肖立民; 赵明; 王京
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2005-11-11
Filing date: 2005-11-11
Publication date: 2009-07-01
Anticipated expiration: 2025-11-11
Also published as: CN1758543A

Abstract

A parallel decoding method and a decoding device for improving the speed of a Turbo code decoder belong to the technical field of Turbo code decoders, and is characterized in that it is based on the existing iterative decoding and utilizes P individual The small decoder realizes the decoding of P parallelism and thus improves the speed of the decoder. Correspondingly, the design method and device of interleaver, small decoder, external information memory, received signal memory, state controller and other devices suitable for this parallel decoding structure are proposed. Using such a method and device, the decoding speed can be increased by about P times, but the total storage capacity does not increase, and the occupied resources do not increase by P times.

Description

Parallel decoding method and decoding device for improving turbo code decoding speed

技术领域 technical field

提高Turbo码译码速度的并行译码方法及译码装置属于Turbo码译码器技术领域。A parallel decoding method and a decoding device for improving the decoding speed of Turbo codes belong to the technical field of Turbo code decoders.

背景技术 Background technique

Turbo码是接近香农极限的一种信道编码，建立在卷积码的基础上，在编码长度较长的情况下能得到接近香农极限的纠错性能，而且译码的复杂性不是很高，如果译码器可以达到一定的速度，满足在3G以及未来的4G通信中对数据率以及速度越来越高的要求，则会具有更广泛的应用前景。Turbo code is a kind of channel coding close to the Shannon limit. Based on the convolutional code, the error correction performance close to the Shannon limit can be obtained when the code length is long, and the decoding complexity is not very high. If The decoder can reach a certain speed to meet the increasingly higher data rate and speed requirements in 3G and future 4G communications, and it will have a wider application prospect.

Turbo码编码本质上是并行级联卷积码，设其编码为(n，k)，则编码效率是k/n，它是由两个子编码器和一个交织器组成的编码器。编码结构示意图如图1所示。以1/3码率为例，输入是1比特，输出是3比特，包括信息位X和校验位Y、Y’，两个校验位Y、Y’分别是不经过交织器和经过交织器编码后的输出。The Turbo code encoding is essentially a parallel concatenated convolutional code, if its encoding is (n, k), then the encoding efficiency is k/n, and it is an encoder composed of two sub-encoders and an interleaver. The schematic diagram of the encoding structure is shown in Figure 1. Taking the code rate of 1/3 as an example, the input is 1 bit, and the output is 3 bits, including information bit X and parity bits Y, Y', and the two parity bits Y, Y' are respectively without the interleaver and after interleaving output after encoding.

一)一种没有读写冲突的交织器。1) An interleaver without read-write conflicts.

对于Turbo码并行译码方案来说，每个子译码块是单独同时译码的，如果采用随机交织的方案，那么在同一个时刻，可能会对同一个译码子块的不同位置进行读或者写操作，而对于一个存储结构RAM来说，相同的时刻只能进行一个位置的读或者写，这样就产生了读写冲突。因此，要实现并行译码，就要设计出一种没有这样的读写冲突的交织器。For the turbo code parallel decoding scheme, each sub-decoding block is decoded independently and simultaneously. If the random interleaving scheme is adopted, then at the same time, different positions of the same decoding sub-block may be read or read. Write operation, and for a storage structure RAM, only one location can be read or written at the same time, so a read-write conflict occurs. Therefore, to realize parallel decoding, it is necessary to design an interleaver without such read-write conflicts.

根据相关文献调研，设计思想如下：According to relevant literature research, the design idea is as follows:

取块长度为W，N为码长，则P＝N/W为并行度。根据P和W的不同情况采用不同的交织方案(要求P<＝W)，描述如下：Take the block length as W, and N as the code length, then P=N/W as the degree of parallelism. According to the different situations of P and W, different interleaving schemes are adopted (requiring P<=W), which are described as follows:

i.P为整数，且P与W互为素数，则将N写成P*W(P行W列)的矩阵，然后按照列的顺序读，每行为W个元素，还是P行。这样的交织表是没有冲突的，同一列的元素进行同时的读写操作，但属于不同的RAM，这样就避免了冲突。如N＝28，W＝7，P＝4，交织表见表1所示。i. P is an integer, and P and W are mutually prime numbers, then write N as a matrix of P*W (P rows and W columns), and then read in the order of columns, each row has W elements, or P rows. There is no conflict in such an interleaving table. Elements in the same column perform simultaneous read and write operations, but belong to different RAMs, thus avoiding conflicts. For example, N=28, W=7, P=4, see Table 1 for the interleaving table.

ii.当P为整数，但P与W不互为素数时，设K为所述P与W的最大公约数，则把N写成P行W列的矩阵，将该矩阵以列的顺序的方式分成K块，在变换第0块时，按照列的顺序读，将该块读成每一行有W个元素的矩阵；在变换第I块时，将本块最后I个元素顺次提到本块首，然后按照列的顺序读，将该块读成每一行有W个元素的矩阵，其中I＝1，…，K-1，由此得到一个交织表；由此得到一个交织表；如N＝64，W＝8，P＝8，交织表见表2所示。iii.当P不为整数时，对译码数据尾部加0，使新的码长N满足P＝N/W变为整数，再按照上面当P为整数时的方法，得到一个交织表，在译码后舍弃尾部添加的0。ii. When P is an integer, but P and W are not mutually prime numbers, let K be the greatest common divisor of said P and W, then write N as a matrix of P rows and W columns, and the matrix is in the order of columns Divided into K blocks, when transforming the 0th block, read in the order of the columns, and read the block into a matrix with W elements in each row; when transforming the I-th block, refer the last I elements of this block to The head of the block is then read in the order of the columns, and the block is read into a matrix with W elements in each row, where I=1,...,K-1, thus obtaining an interleaving table; thus obtaining an interleaving table; as N=64, W=8, P=8, see Table 2 for the interleaving table. iii. When P is not an integer, add 0 to the tail of the decoded data, so that the new code length N satisfies P=N/W to become an integer, and then obtain an interleaving table according to the above method when P is an integer. The trailing 0s are discarded after decoding.

表1 N＝28，W＝7交织表Table 1 N=28, W=7 interleaving table

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 twenty one twenty two twenty three twenty four 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

交织之前before weaving

0 8 16 24 32 40 48 56 57 1 9 17 25 33 41 49 50 58 2 10 18 26 34 42 43 51 59 3 11 19 27 35 36 44 52 60 4 12 20 28 29 37 45 53 61 5 13 21 22 30 38 46 54 62 6 14 15 23 31 39 47 55 63 7 0 8 16 twenty four 32 40 48 56 57 1 9 17 25 33 41 49 50 58 2 10 18 26 34 42 43 51 59 3 11 19 27 35 36 44 52 60 4 12 20 28 29 37 45 53 61 5 13 twenty one twenty two 30 38 46 54 62 6 14 15 twenty three 31 39 47 55 63 7

表2 N＝64，W＝8交织表Table 2 N=64, W=8 interleaving table

以上设计方法能够满足并行的读写没有冲突，但是随机性显然不够。相邻的码之间的相对关系比较固定，比如都间隔W，相邻以及相隔一定关系的码的随机性不够，译码性能不好。要使之能适用于Turbo码译码器，还需要进行随机化处理。随机化处理的方法有很多，比如进行行内交织、行间交织或者同时进行行内和行间交织都可以达到这样的效果。最终交织方案的确定，是要通过大量的仿真来实现的，我们最终采用先进行行内交织再进行行间交织的方式，这也是我们发明的一部分。The above design method can satisfy parallel reading and writing without conflict, but the randomness is obviously not enough. The relative relationship between adjacent codes is relatively fixed, for example, the distance between them is W, and the randomness of adjacent codes and codes separated by a certain relationship is not enough, and the decoding performance is not good. To make it applicable to the Turbo code decoder, it also needs to be randomized. There are many methods of randomization processing, such as intra-row interleaving, inter-row interleaving, or both intra-row and inter-row interleaving can achieve this effect. The determination of the final interleaving scheme must be realized through a large number of simulations. We finally adopt the method of performing intra-row interleaving and then inter-row interleaving, which is also a part of our invention.

二)迭代译码方法2) Iterative decoding method

Turbo译码是采用迭代译码的方法进行的，迭代原理示意图如图3所示。具体方法原理因为已经是成熟技术，在此不作详细描述。Turbo decoding is carried out by means of iterative decoding, and the schematic diagram of the iterative principle is shown in Figure 3. Since the principle of the specific method is already a mature technology, it will not be described in detail here.

发明内容 Contents of the invention

本发明的目的在于提供一种可以提高Turbo码译码器速度的方法及译码装置。The object of the present invention is to provide a method and a decoding device capable of increasing the speed of a Turbo code decoder.

本发明的提高Turbo码译码器速度的并行译码方法的特征在于：该方法是利用并行工作的P个单独的基于Turbo码迭代译码的小译码器和在并行译码时没有读写冲突的交织器来提高译码速度的，所述的译码方法依次含有以下步骤：The feature of the parallel decoding method improving the speed of Turbo code decoder of the present invention is: the method utilizes P independent small decoders based on Turbo code iterative decoding of parallel work and does not read and write when parallel decoding The conflicting interleaver is used to improve the decoding speed, and the described decoding method contains the following steps in turn:

步骤1：对所有存储器及控制器进行初始化；Step 1: Initialize all memories and controllers;

步骤2：把P个小交织表存入P个存储器构成的小交织器，所述P个小交织表按照以下步骤得到：Step 2: P small interleaving tables are stored in a small interleaver formed by P memories, and the P small interleaving tables are obtained according to the following steps:

当P为整数，且P与W互为素数，则将N写成P行W列的矩阵，然后按照列的顺序读写，每行为W个元素，共P行，其中，块长度为W，N为码长，P＝N/W，所述P代表并行度；由此，得到一个交织表；When P is an integer, and P and W are mutually prime numbers, write N as a matrix of P rows and W columns, and then read and write according to the order of the columns, each row has W elements, a total of P rows, where the block length is W, N is code length, P=N/W, and described P represents degree of parallelism; Thus, obtain an interleaving table;

当P为整数，但P与W不互为素数时，设K为所述P与W的最大公约数，则把N写成P行W列的矩阵，将该矩阵以列的顺序的方式分成K块，在变换第0块时，按照列的顺序读，将该块读成每一行有W个元素的矩阵；在变换第I块时，将本块最后I个元素顺次提到本块首，然后按照列的顺序读，将该块读成每一行有W个元素的矩阵，其中I＝1，…，K-1，由此得到一个交织表；When P is an integer, but P and W are not mutually prime numbers, let K be the greatest common divisor of P and W, then write N as a matrix of P rows and W columns, and divide the matrix into K in the order of columns When transforming the 0th block, read it in the order of the columns, and read the block as a matrix with W elements in each row; when transforming the Ith block, refer the last I elements of the block to the beginning of the block , then read in the order of the columns, read the block into a matrix with W elements in each row, where I=1,...,K-1, thus obtaining an interleaving table;

当P不为整数时，对译码数据尾部加0，使新的码长N满足P＝N/W变为整数，再按照上面当P为整数时的方法，得到一个交织表，在译码后舍弃尾部添加的0；When P is not an integer, add 0 to the tail of the decoded data, so that the new code length N satisfies P=N/W to become an integer, and then obtain an interleaving table according to the above method when P is an integer, and then decode Discard the 0 added at the end;

再把所述交织表先进行行内交织、再进行行间交织，或者同时进行行内交织和行间交织，得到大交织表，上述大交织表的每一行形成一个小的交织表，得到并行译码的读写不冲突的大交织表；Then carry out intra-row interleaving and inter-row interleaving on the described interleaving table, or perform intra-row interleaving and inter-row interleaving at the same time to obtain a large interleaving table, and each row of the above-mentioned large interleaving table forms a small interleaving table to obtain parallel decoding A large interleaving table with non-conflicting reading and writing;

步骤3：把一帧待译码数据送入RAM读写控制器；Step 3: Send a frame of data to be decoded to the RAM read-write controller;

步骤4：迭代控制器向所属RAM读写控制器发送迭代信号，同时，迭代次数计数器开始计数；Step 4: The iteration controller sends an iteration signal to the RAM read-write controller, and at the same time, the iteration number counter starts counting;

步骤5：所述RAM读写控制器在迭代控制信号控制下，从所述接收信号存储器读出P块数据，并行地送入P个子译码器，同时从P个存储交织表的交织器中读取交织表，进行Turbo码分段递推式译码；Step 5: Under the control of the iterative control signal, the RAM read-write controller reads P blocks of data from the received signal memory, and sends them to P sub-decoders in parallel, and at the same time reads out P blocks of data from P interleavers storing interleaving tables Read the interleaving table and perform segmental recursive decoding of Turbo codes;

步骤6：所述每一个子译码器对输入的每一块数据先进行预推，以得到前向递推的状态似然值的初始值，然后在每一块内进行分段递推，此时，前向递推和反向递推同时进行，前向递推的同时进行似然比和外信息的计算；Step 6: Each of the sub-decoders pre-push each block of input data to obtain the initial value of the forward-recursive state likelihood value, and then perform segmental recursion in each block, at this time , the forward recursion and reverse recursion are performed at the same time, and the likelihood ratio and external information are calculated at the same time as the forward recursion;

步骤7：P个子译码器在每次迭代后读出接收信号和相应的外信息之后，经过迭代译码计算将外信息送往RAM读写控制器写入外信息存储器，供所述P个子译码器读出用于下一次迭代；Step 7: After the P sub-decoders read the received signal and the corresponding external information after each iteration, the external information is sent to the RAM read-write controller to be written into the external information memory through iterative decoding calculation for the P sub-decoders Decoder readout for next iteration;

步骤8：所述迭代次数计数器迭代次数满之后向迭代控制其发出迭代结束信号，所述P个子译码器发出硬判决信息。Step 8: After the number of iterations of the iteration count counter is full, send an iteration end signal to the iteration controller, and the P sub-decoders send hard decision information.

本发明的提高Turbo码译码器速度的并行译码装置的特征在于：RAM读写控制器，P个子译码器，P个接收信号存储器，P个外信息存储器，P个小交织器、一个迭代控制器以及迭代次数计数器，其中，P＝N/W，N为一帧待译码数据码长，W为块长，P为并行度，其中：The feature of the parallel decoding device improving the speed of Turbo code decoder of the present invention is: RAM read-write controller, P sub-decoders, P receiving signal memory, P external information memory, P small interleavers, a An iteration controller and an iteration count counter, wherein, P=N/W, N is the code length of a frame of data to be decoded, W is the block length, and P is the degree of parallelism, wherein:

迭代控制器，输出迭代控制信号；An iterative controller that outputs an iterative control signal;

RAM读写控制器，设有迭代控制信号输入端，待译码数据的接收信号以及外信息的输入端；The RAM read-write controller is provided with an iteration control signal input terminal, a receiving signal of data to be decoded and an input terminal of external information;

P个接收信号存储器，该存储器的读写控制信号和写数据的输入端与所述RAM读写控制器相应的信号输出端相连，该存储器的输出端与所述的RAM读写控制器的相应输入端相连；P receiving signal memory, the input terminal of the read-write control signal and write data of the memory is connected with the corresponding signal output end of the RAM read-write controller, and the output end of the memory is connected with the corresponding signal output end of the RAM read-write controller connected to the input;

迭代次数计数器，该计数器的计数脉冲输出端与所述RAM读写控制器相应输入端相连；The number of iterations counter, the count pulse output end of the counter is connected with the corresponding input end of the RAM read-write controller;

P个外信息存储器，该存储器的外信息数据、地址信号输入端与所述的RAM读写控制器的相应输出端相连；P external information storages, the external information data and address signal input terminals of the storage are connected to the corresponding output terminals of the RAM read-write controller;

P个子译码器，采用可编程逻辑器件，该译码装置的P个数据、地址信号输入端与所述RAM读写控制器的P个交织后的待译码数据的输出端及地址信号输出端相连；该P个子译码器的硬判决信号输出端及外信息输出端与所述RAM读写控制器相应输入端相连；P sub-decoders adopt programmable logic devices, and the P data and address signal input terminals of the decoding device are output with the P interleaved data to be decoded data output terminals and address signal output of the RAM read-write controller. The hard decision signal output terminals and the external information output terminals of the P sub-decoders are connected to the corresponding input terminals of the RAM read-write controller;

P个小交织器，都由存储器构成，所述P个小交织器以下述步骤构成并存储交织表：P small interleavers are all made of memory, and the P small interleavers form and store an interleaving table with the following steps:

再把所述交织表先进行行内交织、再进行行间交织，或者同时进行行内交织和行间交织，上述大交织表的每一行形成一个小的交织表；对小交织表进行存储得到P个小交织器。Then the interleaving table is first carried out in-row interleaving, then inter-row interleaving, or simultaneously in-row interleaving and inter-row interleaving, each row of the above-mentioned large interleaving table forms a small interleaving table; the small interleaving table is stored to obtain P Small interleaver.

本方法可在各种可编程逻辑器件中实施，一个具体实施是用Xilinx公司的Virtex2px2vp70ff1704-5芯片来作Turbo码译码器，也可用专用集成电路实现。This method can be implemented in various programmable logic devices, and a specific implementation is to use the Virtex2px2vp70ff1704-5 chip of Xilinx Company as a Turbo code decoder, and can also be realized by an application-specific integrated circuit.

附图说明 Description of drawings

图1Turbo码编码结构图。Figure 1 Turbo code encoding structure diagram.

图2Turbo码的迭代译码示意图。Figure 2 Schematic diagram of iterative decoding of Turbo codes.

图3迭代译码原理示意图。Fig. 3 is a schematic diagram of the principle of iterative decoding.

弯箭头代表译该位置码元时用到的相邻位置的前后状态似然值，直线箭头指交织前后相应的位置(或者说是序号)。Curved arrows represent the state likelihood values before and after adjacent positions used when translating the symbol at this position, and straight arrows refer to the corresponding positions (or sequence numbers) before and after interleaving.

图4块内递推过程示意图。Figure 4. Schematic diagram of the recursive process within the block.

横轴代表码元，按照顺序排列，每一格表示一个段，相同的数字编号表示同时的操作。The horizontal axis represents code units, which are arranged in order, each grid represents a segment, and the same number represents simultaneous operations.

图5块内递推示意图。Figure 5. Schematic diagram of the recursion within the block.

图6译码过程流程图。Figure 6 is a flowchart of the decoding process.

图7译码装置实现方框图。Figure 7 is a block diagram of the implementation of the decoding device.

具体实施方式 Detailed ways

现有的译码算法是非并行的，采用分段递推的方法，对待译码数据只能串行处理，速度慢，无法满足对译码器速度越来越高的要求。而本方法则能较大地提高译码速度，采用P并行度可以将速度大约提高P倍。当然，译码器的复杂度有所提高，但是并没有增加P倍，相对所得到的速度来说，这样的代价是可以承受的。The existing decoding algorithm is non-parallel, adopts the segmented recursive method, and the decoded data can only be processed serially, and the speed is slow, which cannot meet the increasingly higher requirements for the speed of the decoder. However, this method can greatly increase the decoding speed, and the speed can be increased by about P times by using P parallelism. Of course, the complexity of the decoder has increased, but it has not increased by a factor of P, which is an acceptable price relative to the obtained speed.

1.本发明所提出的方法的特征在于：它是一种利用适用于并行译码的交织器和并行译码方法，P个译码子块可以同时独立地互不干扰地单独译码，相当于P个原无并行的译码器同时工作的译码方法。它所述的译码方法依次含有以下步骤：1. The method proposed by the present invention is characterized in that: it is a method for utilizing an interleaver and a parallel decoding method suitable for parallel decoding, and P decoding sub-blocks can be independently decoded independently without interfering with each other at the same time. A decoding method that works simultaneously on P decoders without parallelism. Its described decoding method contains following steps successively:

1)译码开始，相应的计数控制器初始化。1) The decoding starts, and the corresponding counting controller is initialized.

2)将一帧待译码数据分成P块，作为P个子待译码数据，信息位信息和校验位信息分别存储在P个RAM中。在后面的译码过程中，外信息(包括交织的和不交织的)都要存储在相应的译码子块RAM里。2) A frame of data to be decoded is divided into P blocks, which are used as P pieces of data to be decoded, and information bit information and parity bit information are stored in P RAMs respectively. In the subsequent decoding process, the external information (including interleaved and non-interleaved) must be stored in the corresponding decoding sub-block RAM.

3)每个子块按照并行迭代的方法进行译码。对于每个子块采用分段递推的方法，将子块分成m个小的段，当对某一段进行前向递推计算状态似然值和似然比时，同时进行下一段的反向递推计算反向状态似然值。即在块内同时进行反向和前向的递推以提高吞吐量。在前向递推的过程中计算出似然比及外信息，并存储到相应的RAM中。具体迭代过程如下：判断迭代过程是否结束，是结束，输出译码结果，转5)，否则进行如下迭代：译第一个码时，对于存储外信息以及校验位及信息位比特等存储器，读和写时直接用地址信号即可，而对于译第二个码，需要通过交织器(具体见4))，寻找交织之前、之后对应的地址关系从而对存储器进行正确的读写。块内递推过程如下：对于每一个子块的第一段，先进行反向预推得到第一段反向递推初始值，然后对第一段进行反向递推得到反向递推状态似然值，同时进行下一段的反向预推；然后对一段进行前向递推计算前向递推状态似然值以及似然比和外信息，同时，对第二段进行反向递推得到反向递推状态似然值，依此递推。也就是说，每一个时刻，除了第一段刚开始，都会有两个反向递推(一个预推一个正式递推)和一个前向递推过程在进行。这一块的递推过程如附图4所示，横轴上标注的为分块表示，序号1、2、3表示执行步骤，相同的标注为同时进行的操作。对于每一子块中，第一段前向递推的状态似然值初始值的确定：由前一块的最后一段的前向递推到最后一位的似然值记录得到，最后一段反向递推的状态似然值初始值由后一子块第一段的反向递推的最后一位似然值记录得到，其原理示意及过程如图5所示。3) Each sub-block is decoded according to the method of parallel iteration. For each sub-block, the segmented recursive method is used to divide the sub-block into m small segments. When performing forward recursion to calculate the state likelihood value and likelihood ratio for a certain segment, the reverse recursion of the next segment is performed at the same time. Push computes the reverse state likelihood. That is, both reverse and forward recursion are performed within the block to improve throughput. In the process of forward recursion, the likelihood ratio and extrinsic information are calculated and stored in the corresponding RAM. The concrete iterative process is as follows: judge whether the iterative process is over, if it is over, output the decoding result, and turn to 5), otherwise the following iterations are carried out: when deciphering the first code, for storage of external information and check digits and information bit bits, etc., When reading and writing, you can directly use the address signal, and for the second code to be translated, you need to use the interleaver (see 4 for details) to find the corresponding address relationship before and after interleaving to read and write the memory correctly. The recursion process in the block is as follows: For the first section of each sub-block, first perform reverse pre-push to obtain the initial value of the first section of reverse recursion, and then perform reverse recursion on the first section to obtain the reverse recursion state Likelihood value, at the same time carry out the reverse pre-push of the next section; then perform forward recursion on one section to calculate the forward recursion state likelihood value, likelihood ratio and external information, and at the same time, perform reverse recursion on the second section Obtain the state likelihood value of reverse recursion, and recurse accordingly. That is to say, at every moment, except for the beginning of the first paragraph, there will be two reverse recursions (one pre-recursion and one formal recursion) and one forward recursion process. The recursive process of this block is shown in Figure 4. The horizontal axis marks are divided into blocks, and the serial numbers 1, 2, and 3 represent execution steps, and the same marks represent simultaneous operations. For each sub-block, the determination of the initial value of the state likelihood value of the first forward recursion: it is obtained from the record of the likelihood value recorded from the forward recursion of the last segment of the previous block to the last digit, and the last segment is reversed The initial value of the recursive state likelihood value is obtained from the record of the last bit likelihood value of the reverse recursive first segment of the next sub-block, and its schematic diagram and process are shown in Figure 5.

4)适用于并行译码器的交织器。这种交织器必须能保证在迭代过程中对同时执行操作的并行的P个存储外信息RAM读写的时候没有冲突，即必须能保证对每次对P个RAM分别有一个数据的读写，随机的交织器是不能满足这样的要求的。4) Interleaver suitable for parallel decoders. This kind of interleaver must be able to ensure that there is no conflict when reading and writing the parallel P RAMs that perform operations at the same time during the iterative process, that is, it must be able to ensure that there is one data read and write for each of the P RAMs. A random interleaver cannot meet such requirements.

5)经过几次迭代后，判断迭代译码过程结束，将并行的译码结果转化为串行的数据，输出。5) After several iterations, it is judged that the iterative decoding process is over, and the parallel decoding results are converted into serial data and output.

译码器主要关键技术方法如下：The main key technical methods of the decoder are as follows:

设一帧数据长度为N，译码并行度为P，即将一帧接收到的数据分成P个子块，每块同时进行译码操作，块大小为W＝N/P。让P和W都为整数，如果不能满足，可在译码数据后面添0，译码结束后不将其输出即可。因为每块可以单独互不影响译码，则在一个译码周期里我们可以同时完成P个译码。Let the data length of a frame be N, and the degree of decoding parallelism be P, that is, the data received in a frame is divided into P sub-blocks, each block is decoded at the same time, and the block size is W=N/P. Let both P and W be integers, if not satisfied, you can add 0 after the decoded data, and do not output it after decoding. Because each block can be independently decoded independently of each other, we can simultaneously complete P decodings in one decoding cycle.

对于每块数据的译码，采用原方案的分段递推方法，前向递推计算状态似然值和反向递推计算状态似然值同时进行，在前向递推的过程中同时计算似然比和外信息。For the decoding of each block of data, the piecewise recursive method of the original scheme is adopted, and the forward recursive calculation of the state likelihood value and the reverse recursive calculation of the state likelihood value are carried out at the same time, and the calculation is performed simultaneously in the forward recursive process Likelihood ratios and extrinsic information.

记α为前向递推状态似然值，β为反向递推状态似然值，块内译码过程如图4所示，具体描述如下：Note that α is the forward recursive state likelihood value, and β is the reverse recursive state likelihood value. The intra-block decoding process is shown in Figure 4, and the specific description is as follows:

将一个块分为几个段，各个段之间采用“滑动块”的方法，即先从第二个段做反向预推β，得到第一个段的β较为精确的初始值，然后做第一个段的反向递推β，同时做第二个段的反向预推β得到初始值，接着做第一个段的正向递推α，同时计算似然比，同时做第二个段的反向递推β，第三个段的反向预推β得到初始值……如此进行下去。则每个周期，完成两个反向递推，一个正向递推同时计算似然值。这样分段做可以减小存储量，而两段并行，同时第三段预推的方法则可以提高吞吐量。Divide a block into several segments, and use the method of "sliding blocks" between each segment, that is, first do reverse pre-push β from the second segment to obtain a more accurate initial value of β in the first segment, and then do The reverse recursion β of the first segment, and at the same time do the reverse pre-calculation β of the second segment to get the initial value, then do the forward recursion α of the first segment, calculate the likelihood ratio at the same time, and do the second The reverse recursion β of the first segment, the reverse pre-calculation β of the third segment gets the initial value...and so on. Then, in each cycle, two reverse recursions are completed, and one forward recursion is used to calculate the likelihood value at the same time. In this way, the storage capacity can be reduced by segmenting, and the method of parallelizing the two segments and pre-pushing the third segment can increase the throughput.

图4中，横向为码长，分为P块，假设每一段长度与预推长度相同，都为32，而①、②、③表示时间的先后，相同的标识为同时进行的操作。In Figure 4, the horizontal direction is the code length, which is divided into P blocks. Assume that the length of each segment is the same as the pre-push length, which is 32, while ①, ②, and ③ represent the sequence of time, and the same symbols are operations performed at the same time.

下面我们来看一个具体例子的实现。设一帧数据长度即码长为4096，并行度P为16，则每个子块长度为W＝4096/16＝256。要求每一块能同时而且独立互不影响地译码。将一块数据分成8段来译码，则块内的每一段数据长度为256/8＝32。Let's look at the implementation of a specific example. Assuming that the data length of one frame, that is, the code length, is 4096, and the degree of parallelism P is 16, then the length of each sub-block is W=4096/16=256. Each block is required to be decoded simultaneously and independently without affecting each other. A block of data is divided into 8 segments for decoding, and the length of each segment of data in the block is 256/8=32.

由于前向递推计算状态似然值时，第一段的α的初始值应该来自于前一块的最后一段，而反向递推计算状态似然值时，最后一段的β的初始值应该来自于后一块的第一段。也就是说本块的初始值来自于别的块，因此必须先进行预推得到这些初始值作为另一块的输入。具体预推方案为：每一次迭代开始时，先对最后一段(长度为32)进行α的预推，得到最后一个码元的α值，作为下一块前向递推第一段α的初始预置值。而每一块的第一段的反向递推完成时记下块首的β作为前一块的最后一段的反向递推的β的初始值。初始值的递推关系示意图如图5所示。When calculating the state likelihood value by forward recursion, the initial value of α in the first segment should come from the last segment of the previous block, and when calculating the state likelihood value by reverse recursion, the initial value of β in the last segment should come from in the first paragraph of the following block. That is to say, the initial value of this block comes from other blocks, so it must be pre-deduced to obtain these initial values as the input of another block. The specific pre-push scheme is as follows: at the beginning of each iteration, α pre-push is first performed on the last segment (length 32), and the α value of the last symbol is obtained as the initial pre-push of the first segment α for the next forward recursion. set value. And when the reverse recursion of the first section of each block is completed, write down the β at the beginning of the block as the initial value of the reverse recursion β of the last section of the previous block. The schematic diagram of the recursive relationship of the initial value is shown in Figure 5.

并行译码的实现还有一个关键技术就是适用于这种译码结构的交织器的设计与实现。按照背景技术中介绍的没有冲突的交织器的设计的基本方法，再进行随机化处理：先进行行内交织再进行行间交织。Another key technology for the realization of parallel decoding is the design and implementation of an interleaver suitable for this decoding structure. According to the basic method of designing a conflict-free interleaver introduced in the background art, the randomization process is performed: intra-row interleaving is performed first, and then inter-row interleaving is performed.

下面以码长为64，单位块长度为8来说明交织器的实现。The implementation of the interleaver is described below with the code length being 64 and the unit block length being 8.

N＝64，P＝8，W＝8，W与P的最大公约数为8，则对行写列读的交织表根据背景技术中介绍的规则要进行8个循环变换。在此基础上得到的交织表如表2所示。因为随机性不够，因此还要做随机化处理，即在此基础上再作行内和行间交织，当然，为了保证读写的没有冲突，交织要保证同列元素属于同一列的不变性。下面根据这个码长给出一个随机化处理的方法：先进行行内交织，对于每一行所采用的交织方式是一样的，比如N=64, P=8, W=8, the greatest common divisor of W and P is 8, then the interleaving table for row writing and column reading needs to perform 8 cyclic transformations according to the rules introduced in the background technology. The interleaving table obtained on this basis is shown in Table 2. Because the randomness is not enough, it is necessary to perform randomization processing, that is, to interleave within and between rows on this basis. Of course, in order to ensure that there is no conflict between reading and writing, the interleaving must ensure the invariance of elements in the same column belonging to the same column. According to the code length, a method of randomization processing is given below: first perform intra-row interleaving, and the interleaving method used for each row is the same, for example

5 0 6 2 1 7 4 3 5 0 6 2 1 7 4 3

那么通过行内交织之后，交织表变为：Then after in-row interleaving, the interleaving table becomes:

40 0 48 16 8 56 32 24 33 57 41 9 1 49 25 17 26 50 34 2 58 42 18 10 19 43 27 59 51 35 11 3 12 36 20 52 44 28 4 60 5 29 13 45 37 21 61 53 40 0 48 16 8 56 32 twenty four 33 57 41 9 1 49 25 17 26 50 34 2 58 42 18 10 19 43 27 59 51 35 11 3 12 36 20 52 44 28 4 60 5 29 13 45 37 twenty one 61 53

62 22 6 38 30 14 54 46 55 15 63 31 23 7 47 39 62 twenty two 6 38 30 14 54 46 55 15 63 31 twenty three 7 47 39

表3 行内交织之后的交织表Table 3 Interleaving table after intra-row interleaving

再对得到的交织表进行行间交织，采用模4的方式，也就是列数除以4同余的为同一种交织模式。Then perform inter-row interleaving on the obtained interleaving table, and adopt the mode of modulo 4, that is, the number of columns divided by 4 congruent is the same interleaving mode.

例如可以采用下面的行间交织方案：For example, the following inter-row interleaving scheme can be used:

余数为0：3 0 5 2 7 1 4 6The remainder is 0: 3 0 5 2 7 1 4 6

余数为1：5 1 7 3 0 6 4 2The remainder is 1: 5 1 7 3 0 6 4 2

余数为2：1 4 0 7 5 2 6 3The remainder is 2: 1 4 0 7 5 2 6 3

余数为3：2 7 4 0 1 5 3 6The remainder is 3: 2 7 4 0 1 5 3 6

则经过行间交织之后得到的交织表如表4所示Then the interleaving table obtained after interleaving between rows is shown in Table 4

19 29 41 2 51 21 25 10 40 57 20 31 8 49 4 39 5 15 48 52 37 7 32 60 26 43 63 16 58 35 47 24 55 0 13 9 23 56 61 17 33 22 34 45 1 14 18 53 12 36 6 59 44 28 54 3 62 50 27 38 30 42 11 46 19 29 41 2 51 twenty one 25 10 40 57 20 31 8 49 4 39 5 15 48 52 37 7 32 60 26 43 63 16 58 35 47 twenty four 55 0 13 9 twenty three 56 61 17 33 twenty two 34 45 1 14 18 53 12 36 6 59 44 28 54 3 62 50 27 38 30 42 11 46

表4 行间交织之后的交织表Table 4 Interleaving table after row interleaving

当然，对于不同的码长，行内行间交织的模式会有所不同，要通过大量的仿真来确定。以上是以码长64为例来说明交织器的设计过程。Of course, for different code lengths, the mode of interleaving within a row and between rows will be different, which must be determined through a large number of simulations. The code length 64 is used as an example to illustrate the design process of the interleaver.

综上所述，译码过程如下所述：In summary, the decoding process is as follows:

1)设计适合于并行译码的交织器。1) Design an interleaver suitable for parallel decoding.

2)译码开始，对所有的状态存储器以及相关的控制器初始化。2) Decoding starts, and all state memories and related controllers are initialized.

3)将一帧数据分成P个子块，对接收数据进行分块处理、存储。3) A frame of data is divided into P sub-blocks, and the received data is processed and stored in blocks.

4)每一块先进行预推，以得到前向和反向递推的状态似然值的初始值。4) Each block is pre-deduced first to obtain the initial value of the state likelihood value of forward and backward recursion.

5)每一块内进行分段递推，前向递推和反向递推同时进行，前向递推的同时计算似然比和外信息。5) Segmented recursion is performed in each block, forward recursion and reverse recursion are performed simultaneously, and likelihood ratio and external information are calculated at the same time of forward recursion.

6)迭代译码，迭代结束时，输出硬判信息。6) Iterative decoding, when the iteration ends, hard decision information is output.

并行度为16的并行译码方案，在不考虑工作时钟的条件下，吞吐量比原先方案提高了16倍，而占用的资源并没有无法忍受的提高，因此，这样的代价是可以接受的。The parallel decoding scheme with a degree of parallelism of 16 increases the throughput by 16 times compared with the original scheme without considering the working clock, and the resources occupied are not unbearably increased. Therefore, such a cost is acceptable.

以下对译码装置进行说明：The decoding device is described as follows:

译码装置即Turbo码译码器包括并行处理的P个子译码器(包括内部递推状态似然值存储器、适用于并行译码的存储小交织表的存储器等)、接收信号存储器、外信息存储器、RAM读写控制器、迭代控制器。The decoding device, i.e., the Turbo code decoder, includes P sub-decoders for parallel processing (including internal recursive state likelihood value memory, memory for storing small interleaving tables suitable for parallel decoding, etc.), received signal memory, external information Memory, RAM read and write controller, iterative controller.

子译码器是同时进行独立译码的装置，根据接收到的待译码数据以及上次迭代的计算结果更新每个符号的外信息，包括内部迭代控制器以及存储小交织表和状态似然值的存储器等。P个子译码器并行处理数据，每个时钟周期内并行处理P个元素。The sub-decoder is a device that performs independent decoding at the same time. It updates the external information of each symbol according to the received data to be decoded and the calculation results of the last iteration, including an internal iteration controller and storing small interleaving tables and state likelihoods. Value storage, etc. P sub-decoders process data in parallel, and process P elements in parallel in each clock cycle.

接收信号存储器用于存储接收信号，包括2*P个RAM，分别用于存储信息位和校验位。每接收一个信号，按照其在帧中的位置，选择并存入RAM。The received signal memory is used to store the received signal, including 2*P RAMs, which are respectively used to store information bits and check bits. Each time a signal is received, it is selected and stored in RAM according to its position in the frame.

外信息存储器用于存储外信息，包括P个RAM。第一次迭代时外信息为初始值0(不读外信息)，每次迭代时读出接收信号和相应的外信息，经过处理后把更新的外信息写入用于下一次迭代。因为每次所读的外信息的位置和接收信号的位置相同，所以外信息RAM的组织方式和接收信号RAM的相同。The external information memory is used to store external information, including P RAMs. In the first iteration, the extrinsic information is the initial value 0 (do not read the extrinsic information). In each iteration, the received signal and the corresponding extrinsic information are read out. After processing, the updated extrinsic information is written for the next iteration. Because the location of the external information read each time is the same as that of the received signal, the organization of the external information RAM is the same as that of the received signal RAM.

RAM读写控制器，主要是产生RAM的读写控制信号以及地址信号等。The RAM read-write controller mainly generates RAM read-write control signals and address signals.

迭代控制器，主要是控制迭代过程的。The iterative controller mainly controls the iterative process.

P个小交织表，是按照上述的交织原则设计的，存储在存储器中。The P small interleaving tables are designed according to the above interleaving principle and stored in the memory.

Claims

1. improve the parallel decoding method of Turbo code decoding speed, it is characterized in that, this method be utilize the P of concurrent working independent do not have the interleaver of read/write conflict to improve decoding speed based on the little decoder of Turbo code iterative decoding with when the parallel decoding, described interpretation method contains following steps successively:

Step 1: all memories and controller are carried out initialization;

Step 2: P the table that interweaves for a short time deposited in the little interleaver that P memory constitutes, and described P the table that interweaves for a short time obtains according to following steps:

When P is an integer, and P and W prime number each other, then N is write as the matrix of the capable W row of P, then according to the order read-write of row, every behavior W element, common P is capable, and wherein, block length is W, and N is a code length, P=N/W, described P represents degree of parallelism; Thus, obtain the table that interweaves;

When P is an integer, but P and W establish the greatest common divisor that K is described P and W not each other during prime number, then N is write as the matrix of the capable W of P row, this matrix is divided into the K piece in the mode of the order of row, when the 0th of conversion, according to the sequential read of row, this piece is read as the matrix that each row has W element; When conversion I piece, the last I of this a piece element is mentioned this piece head in turn, according to the sequential read of row, this piece is read as each row matrix of W element then, I=1 wherein ..., K-1 obtains the table that interweaves thus;

When P is not integer, the decoding data afterbody is added 0, make new code length N satisfy P=N/W and become integer, again according to top method when P is integer, obtain the table that interweaves, after decoding, give up 0 of afterbody interpolation;

Again interweaving, in the ranks interweave again in the described advanced every trade of table that interweaves, interweave in perhaps going simultaneously and in the ranks interweave, the table that interweaved greatly, each row of the above-mentioned table that interweaves greatly forms the little table that interweaves, and obtains the table that interweaves greatly that the read-write of parallel decoding does not conflict;

Step 3: a frame data to decode is sent into the RAM read-write controller;

Step 4: the iteration control device sends iteration signal to affiliated RAM read-write controller, and simultaneously, the iterations counter begins counting;

Step 5: described RAM read-write controller is under the iteration control signal controlling, read the P blocks of data from described receiving signal memory, send into P sub-decoder concurrently, from P storage interweaves the interleaver of showing, read the table that interweaves simultaneously, carry out the decoding of Turbo code segmentation stepping type;

Step 6: described each sub-decoder pushes away earlier in advance to each blocks of data of input, initial value with the state likelihood value that obtains forward recursion, in each piece, carry out the segmentation recursion then, at this moment, forward recursion and reverse recursion carry out simultaneously, carry out the calculating of likelihood ratio and external information in the time of forward recursion;

Step 7:P sub-decoder read after each iteration after received signal and the corresponding external information, and calculating is sent to the RAM read-write controller with external information and is write the external information memory through iterative decoding, reads for a described P sub-decoder to be used for next iteration;

Step 8: it sends the iteration end signal to described iterations counter iterations to iteration control completely afterwards, and a described P sub-decoder sends hard decision information.

2. improve the code translator of Turbo code decoding speed, its feature mainly is: RAM read-write controller, P sub-decoder, P receiving signal memory, P external information memory, P little interleaver, iteration control device and iterations counter, wherein, P=N/W, N is a frame data to decode code length, W is a block length, and P is a degree of parallelism, wherein:

The iteration control device, output iteration control signal;

The RAM read-write controller is provided with the iteration control signal input part, the received signal of data to decode and the input of external information;

P receiving signal memory, the read-write control signal of this memory links to each other with the corresponding signal output part of described RAM read-write controller with the input of write data, and the output of this memory links to each other with the respective input of described RAM read-write controller;

The iterations counter, the count pulse output of this counter links to each other with described RAM read-write controller respective input;

P external information memory, the extrinsic information data of this memory, address signal input link to each other with the corresponding output end of described RAM read-write controller;

P sub-decoder adopts programmable logic device, and the output and the address signal output of the data to decode that the P of the P of this a code translator data, address signal input and described RAM read-write controller is individual after interweaving link to each other; The hard decision signal output part of this P sub-decoder and external information output link to each other with described RAM read-write controller respective input;

P little interleaver all is made of memory, and the individual little interleaver of described P is with following step formation and store the table that interweaves:

Interweaving, in the ranks interweave in the described advanced every trade of table that interweaves, interweave in perhaps going simultaneously and in the ranks interweave again, each row of the above-mentioned table that interweaves greatly forms the little table that interweaves; For a short time the table that interweaves stored obtain P little interleaver.

3. the code translator of raising Turbo code decoding speed according to claim 2 is characterized in that described Turbo code code translator has adopted the Virtex2p x2vp70ff1704-5 chip of xilinx company.