ECC decoding system and method with a branch pipeline structure
Technical Field
The invention belongs to the technical field of communication data storage, and particularly relates to an ECC decoding system and method of a branch pipeline structure.
Background
At present, a satellite-borne switching system is indispensable for satellite communication. In the special environment of space, network switching data often cannot be transmitted in time and must be temporarily stored. NAND FLASH is well suited to data storage in a satellite-borne switching system because of its small size, large capacity, long service life and non-volatility on power failure. However, owing to the physical characteristics of NAND FLASH, problems such as drift effects, program-disturb errors and read-disturb errors can occur during data reading and writing, so bit flips occur with a certain probability. To ensure data reliability, a corresponding error detection and correction mechanism, namely ECC (Error Checking and Correction), is required. Satellite communication places high demands on data reliability and requires data to be stored for a long time; SLC-type NAND FLASH offers higher transfer speed and lower power consumption, and therefore meets the requirements of satellite communication. In common applications, Hamming codes are generally used for ECC on SLC-type NAND FLASH, but the weak error correction capability and low coding efficiency of Hamming codes seriously affect the reliability of data storage in the satellite-borne switching system. In contrast, BCH codes are better suited to NAND FLASH error detection and correction in this application scenario.
The ECC implementation for NAND FLASH is divided into two parts: encoding, and decoding with error correction. BCH-based ECC encoding is realized mainly with shift registers and is simple. The BCH-based ECC decoding and error correction process comprises three steps: syndrome solving, error position polynomial solving and Chien search. At present, research on ECC decoding and error correction focuses mainly on one of these three steps, concentrating on internal algorithm optimization of a single module and the corresponding hardware circuit improvement. The overall structure of the ECC decoding system is rarely considered, so the overall decoding speed still has room for improvement. In the prior art there are two main overall structures for ECC decoding systems: the non-pipelined BCH decoding structure and the common two-stage pipelined BCH decoding structure, both of which perform only 8-bit parallel decoding. Although the common two-stage pipelined structure decodes faster than the non-pipelined structure, functional redundancy remains in the decoding process: error-free data must still pass through error position polynomial solving and the Chien search error correction process, so path delay is large, power consumption is high, and decoding speed can still be improved.
Through the above analysis, the problems and defects of the prior art are as follows: the existing ECC decoding system still has functional redundancy in the data decoding process, the path delay is large, and the decoding speed can still be improved.
The difficulty in solving the above problems and defects is as follows: the algorithmic implementation of the decoding process is already close to mature, so improving speed through overall control at the peripheral-system level is a new direction of improvement. In addition, to maximize speed, error-free data must bypass the error correction path as far as possible, which inevitably means that different sectors have different optimal decoding routes and that the pipelines fall out of step. More cases must be considered, and the design difficulty increases greatly.
The significance of solving the problems and the defects is as follows: the method provides a new direction for ECC system design, accelerates decoding speed, improves system performance, has strong portability and wide application range, can meet the special situation of the satellite-borne switching system, and is also suitable for NAND FLASH of daily or commercial SLC type and MLC type.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an ECC decoding system and method of a branch pipeline structure.
The ECC decoding system of the branch pipeline structure comprises a RAM storage array, a BCH decoder module, a data output module and a branch pipeline control module, wherein the BCH decoder module comprises a syndrome solving module, an error position polynomial solving module and an error locator module;
the RAM storage array is used for caching the input pure data, wherein each RAM corresponds to the data of one sector, and a plurality of sectors are written by polling;
the syndrome solving module is used for calculating the corresponding syndrome from the input data to be decoded, and judging according to the syndrome whether the input data are in error;
the error position polynomial solving module is used for solving the error position polynomial from the syndrome when the data are in error, and detecting the number of errors;
the error locator module is used for determining error positions according to the solved error position polynomial and shifting out 16-bit-wide error patterns one by one, wherein each error pattern corresponds to one group of data error positions;
the data output module is used for skipping the subsequent modules of the BCH decoder and directly outputting the data when the syndrome solving module judges that the data have no error, and for outputting error-corrected data according to the error pattern when the data are in error, after the error pattern has been determined by the error position polynomial solving module and the error locator module;
the branch pipeline control module is used for controlling the whole decoding system to carry out two-stage branch pipeline operation, wherein the first-stage pipeline is syndrome solving, the second-stage pipeline branch is selected according to the syndrome solving result, and the two branches correspond to the two cases of data with errors and data without errors.
Further, in the ECC decoding system and method of the branch pipeline structure, a group of encoding and decoding data units is determined according to the BCH code, where one such data unit corresponds to one sector of the FLASH, and the data decoding process for one sector is as follows:
the data input is divided into two paths, one path discards check bits, and pure data is stored into a polled RAM; the other path of input data and the check bit are subjected to syndrome solving together, and whether the data have errors is judged according to the syndrome;
when the data are not wrong, reading correct data from the same RAM, and directly outputting the data;
when the data are in error, the errors are located through the iBM algorithm and the Chien search algorithm, the data read from the same RAM are XORed with the located error pattern so that the erroneous bits are inverted, and correct data are output.
Further, in the ECC decoding system and method of the branch pipeline structure, decoding the data of one sector has four stages: the syndrome solving stage syn, the error position polynomial solving stage iBM, the error locating stage chien and the data output stage tx. Whether the data are in error is judged according to the syndrome, which produces two different decoding branches.
Furthermore, NAND FLASH is read and written in units of pages, so reading the data of 1 page requires multiple single-sector decoding operations. Decoding the data of different sectors forms a two-stage branch pipeline structure in which the first-stage pipeline does not depend on either of the two second-stage pipelines. All data must pass through the syndrome solving stage syn, which is defined as the first-stage pipeline syn. Erroneous data pass through the error position polynomial solving stage iBM, the error locating stage chien and the corresponding error correction output stage tx; these three stages are combined into one branch, the second-stage pipeline ibm_chien_tx. Error-free data pass only through the data output stage tx, which is defined as the other branch, the second-stage pipeline tx.
Further, when the ECC decoding system and method of the branch pipeline structure are used for the BCH (4200, 4096, 8) code, the first-stage pipeline syn occupies 263 clock cycles and the second-stage pipeline ibm_chien_tx for erroneous data occupies 275 clock cycles. The number of clock cycles occupied by the second-stage pipeline tx for error-free data depends on the effective data length: it occupies 258 clock cycles when a sector is fully loaded with 4096 bits, and fewer clock cycles when the effective data length is less than 4096 bits.
Furthermore, the ECC decoding method of the branch pipeline structure performs pipeline synchronization for the case where the two pipeline stages occupy unequal numbers of clock cycles. When the two stages run simultaneously, the pipeline that occupies fewer clock cycles must wait for the longer one to finish before the data flow onward. When no sector in the 1 page of data read contains an error, the data pass only through the first-stage pipeline syn and the second-stage pipeline tx, and the branch pipeline decoding system reaches its upper performance limit. When every sector in the 1 page of data read contains an error, the data pass only through the first-stage pipeline syn and the second-stage pipeline ibm_chien_tx, and the branch pipeline decoding system reaches its lower performance limit.
Another object of the present invention is to provide an ECC decoding system with a branch pipeline structure, which is mounted in a satellite-borne switching system.
By combining all the technical schemes, the invention has the following advantages and positive effects: the bit width of the NAND FLASH ECC operation is expanded from 8 bits to 16 bits, realizing 16-bit parallel ECC operation and increasing the transmission rate of the data bus. Syndrome solving is expanded to 16-bit parallel operation, and the bit width expansion of the whole ECC decoding system is completed by using a 16-bit parallel Chien search.
The invention provides a pipeline structure with branches, which ensures that the decoded data of each sector only passes through the optimal pipeline branch by selectively carrying out pipeline operation, error-free data does not need to carry out error positioning and error correction, the integral decoding speed is accelerated, and the clock period is saved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments of the present application will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained from the drawings without creative efforts.
FIG. 1 is a block diagram of an ECC decoding system with a branch pipeline structure according to an embodiment of the present invention;
In Fig. 1: 1. RAM memory array; 2. BCH decoder; 3. data output module; 4. branch pipeline control module.
Fig. 2 is a schematic structural diagram of an ECC decoding system with a branch pipeline structure according to an embodiment of the present invention.
FIG. 3 is a block diagram of a sector data decoding process according to an embodiment of the present invention.
FIG. 4 is a pipeline branch diagram illustrating an ECC decoding method for a branch pipeline structure according to an embodiment of the present invention.
FIG. 5 is a schematic pipeline flow chart of an ECC decoding method for a branch pipeline structure according to an embodiment of the present invention.
Fig. 6 is a decoding diagram for the worst case of the branch pipeline, which is equivalent to the common pipeline, according to an embodiment of the present invention.
Fig. 7 is a decoding diagram for the best case of the branch pipeline according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In view of the problems in the prior art, the present invention provides an ECC decoding system and method for a branch pipeline structure, and the present invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the present invention provides an ECC decoding system with a branch pipeline structure, which includes:
the RAM memory array is used for caching the input pure data;
the BCH decoder is used for judging whether the input data is in error, correcting error of the error data and outputting a 16-bit parallel error pattern;
the data output module is used for realizing correct data output according to different conditions, and skipping an error correction part when the data have no errors, and directly outputting the data; when the data is wrong, carrying out data error correction output according to the error pattern;
and the branch pipeline control module is used for controlling the whole decoding system to carry out two-stage branch pipeline operation.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
The ECC decoding system of the branch pipeline structure is a BCH decoding controller developed for NAND FLASH. It supports correct data storage on SLC-type and MLC-type NAND FLASH, suits the special environment of satellite communication, realizes 16-bit parallel ECC decoding, and adopts a branch pipeline structure overall to improve decoding speed. The implementation is illustrated here with the BCH (4200, 4096, 8) code. Fig. 2 shows the overall structure of the ECC decoding system of the branch pipeline structure.
RAM memory array: the incoming pure data is buffered by multiple RAMs. Each RAM corresponds to one sector of data, and a plurality of sectors are polled and written.
The BCH decoder consists of a syndrome solving module, an error position polynomial solving module and an error locator module, with the following specific functions:
Syndrome solving module: calculates the corresponding syndrome from the input data to be decoded, and judges from the syndrome whether the input data contain errors.
Error position polynomial solving module: solves the error position polynomial from the syndrome and detects the number of errors, here using the parallel inversionless Berlekamp-Massey algorithm (iBM algorithm).
Error locator module: determines error positions from the solved error position polynomial and shifts out 16-bit-wide error patterns one by one, each error pattern corresponding to one group of data error positions. A 16-bit parallel Chien search is used here to locate the errors.
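The syndrome check performed by the first module can be illustrated with a minimal bit-serial sketch. It is not the 16-bit parallel circuit of the invention. The field GF(2^13), the primitive polynomial x^13 + x^4 + x^3 + x + 1, the reading of the "8" in BCH (4200, 4096, 8) as the correction capability t, and all function names are assumptions made for illustration only.

```python
# Assumed parameters (not given in the text): field GF(2^13), primitive
# polynomial x^13 + x^4 + x^3 + x + 1, correction capability t = 8.
M = 13
PRIM_POLY = 0x201B
FIELD_ORDER = (1 << M) - 1
T = 8
CODEWORD_BITS = 4200

# Antilog table: EXP[i] = alpha^i for i = 0 .. FIELD_ORDER - 1.
EXP = [0] * FIELD_ORDER
value = 1
for i in range(FIELD_ORDER):
    EXP[i] = value
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def syndromes(received_bits):
    """Return [S_1, ..., S_2t] with S_j = r(alpha^j).

    received_bits[i] is taken as the coefficient of x^i (an assumed
    bit ordering).
    """
    result = []
    for j in range(1, 2 * T + 1):
        s = 0
        for i, bit in enumerate(received_bits):
            if bit:
                s ^= EXP[(i * j) % FIELD_ORDER]
        result.append(s)
    return result


def has_error(received_bits):
    """Data are judged erroneous when any syndrome component is non-zero."""
    return any(syndromes(received_bits))


# The all-zero word is a codeword of every linear code, so it serves as a
# valid test codeword without needing an encoder.
clean = [0] * CODEWORD_BITS
corrupted = list(clean)
corrupted[5] ^= 1  # flip one bit
```

With these definitions, `has_error(clean)` selects the error-free branch and `has_error(corrupted)` selects the error correction branch.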
The data output module is mainly divided into two sub-modules which respectively correspond to two conditions:
When the syndrome solving module judges that the data have no error, the subsequent modules of the BCH decoder are skipped and the data are output directly. When the data are in error, the error pattern is first determined by the error position polynomial solving module and the error locator module, and the data are then output with error correction according to the error pattern.
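How an error locator polynomial turns into 16-bit-wide error patterns can be sketched as follows. This is a software model, not the parallel hardware. The field parameters, the mapping of lane k of a pattern to bit position base + k, and the helper `locator_from_positions`, which builds the polynomial directly from known error positions as a stand-in for the iBM output, are all assumptions for illustration.

```python
# Assumed field: GF(2^13) with primitive polynomial x^13 + x^4 + x^3 + x + 1.
M = 13
PRIM_POLY = 0x201B
ORDER = (1 << M) - 1

EXP = [0] * ORDER          # EXP[i] = alpha^i
LOG = [0] * (ORDER + 1)    # LOG[alpha^i] = i
value = 1
for i in range(ORDER):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << M):
        value ^= PRIM_POLY


def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % ORDER]


def locator_from_positions(positions):
    """Hypothetical stand-in for the iBM stage.

    Builds sigma(x) = product of (1 + alpha^p * x) over the error
    positions p, with coefficients stored lowest order first.
    """
    sigma = [1]
    for p in positions:
        term = EXP[p % ORDER]
        nxt = sigma + [0]
        for k in range(len(sigma)):
            nxt[k + 1] ^= gf_mul(sigma[k], term)
        sigma = nxt
    return sigma


def chien_search_patterns(sigma, n_bits, width=16):
    """Test every bit position and emit one error pattern per `width` bits.

    Position i is in error when sigma(alpha^(-i)) == 0.
    """
    patterns = []
    for base in range(0, n_bits, width):
        pattern = 0
        for lane in range(width):
            i = base + lane
            if i >= n_bits:
                break
            acc = 0
            for k, coeff in enumerate(sigma):
                if coeff:
                    acc ^= EXP[(LOG[coeff] - i * k) % ORDER]
            if acc == 0:
                pattern |= 1 << lane
        patterns.append(pattern)
    return patterns


sigma = locator_from_positions([5, 40])
patterns = chien_search_patterns(sigma, 4200)
```

Each entry of `patterns` can then be XORed with the corresponding 16-bit data word; an all-zero entry leaves the word unchanged.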
The branch pipeline control module: controls the whole decoding system to carry out two-stage branch pipeline operation. The first-stage pipeline is syndrome solving, and the second-stage pipeline branch is selected according to whether the data are in error; the two branches correspond to the two cases of data with errors and data without errors.
The present invention determines a group of encoding and decoding data units according to the BCH code. Taking the BCH (4200, 4096, 8) code as an example, one data unit is 512 Byte with 14 Byte of check bits; in general, one data unit is one NAND FLASH sector. The data decoding flow of one sector is shown in Fig. 3 and is as follows:
the data input is divided into two paths, one path discards check bits, and pure data is stored into a polled RAM; and the other path of input data and the check bit are subjected to syndrome solving, and whether the data have errors is judged according to the syndrome.
When the data has no error, the correct data is read from the same RAM and directly output.
When the data are in error, the errors are located through the iBM algorithm and the Chien search algorithm, the data read from the same RAM are XORed with the located error pattern so that the erroneous bits are inverted, and correct data are output.
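The two-path flow above can be modelled in a few lines. This is a simplified sketch: the class `PolledRamArray`, the functions `split_sector` and `output_sector`, and the layout of 256 data words followed by 7 check words (512 Byte and 14 Byte at 16 bits per word) are assumptions for illustration, and the error patterns are taken as already produced by the error locator.

```python
SECTOR_DATA_WORDS = 256   # 512 Byte of pure data as 16-bit words
CHECK_WORDS = 7           # 14 Byte of check bits as 16-bit words


def split_sector(words):
    """First path: discard the check bits and keep only the pure data."""
    return list(words[:SECTOR_DATA_WORDS])


class PolledRamArray:
    """Several RAMs written in round-robin order, one sector per RAM."""

    def __init__(self, ram_count):
        self.rams = [None] * ram_count
        self.next_write = 0

    def write_sector(self, data_words):
        index = self.next_write
        self.rams[index] = list(data_words)
        self.next_write = (index + 1) % len(self.rams)
        return index

    def read_sector(self, index):
        return self.rams[index]


def output_sector(ram_array, index, error_patterns=None):
    """Data output stage.

    error_patterns is None when the syndrome reported no error (tx branch);
    otherwise each 16-bit pattern is XORed with the matching data word.
    """
    data = ram_array.read_sector(index)
    if error_patterns is None:
        return list(data)
    return [word ^ pattern for word, pattern in zip(data, error_patterns)]


# One sector whose word 10 has bit 3 flipped on the way in.
original = [0xABCD] * SECTOR_DATA_WORDS
received = list(original) + [0] * CHECK_WORDS
received[10] ^= 1 << 3

rams = PolledRamArray(ram_count=2)
slot = rams.write_sector(split_sector(received))

patterns = [0] * SECTOR_DATA_WORDS
patterns[10] = 1 << 3
corrected = output_sector(rams, slot, patterns)
```

Because correction is a plain XOR against data already held in RAM, the error-free branch can start output as soon as the syndrome result is known.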
NAND FLASH is read and written in units of pages, so reading 1 page of data requires decoding a plurality of sectors. Decoding the data of one sector has four main stages: the syndrome solving stage (syn), the error position polynomial solving stage (iBM), the error locating stage (chien) and the data output stage (tx). Whether the data are in error is judged according to the syndrome, which produces two different decoding branches, as shown in Fig. 4:
All data must pass through the syndrome solving stage (syn), which is defined as the first-stage pipeline syn. Erroneous data pass through the error position polynomial solving stage (iBM), the error locating stage (chien) and the corresponding error correction output stage (tx); these three stages are combined into one branch, the second-stage pipeline ibm_chien_tx. Error-free data pass only through the data output stage (tx), which is defined as the other branch, the second-stage pipeline tx. The whole decoding process forms a two-stage pipeline structure with branches, in which the first-stage pipeline is independent of the two second-stage pipelines. The specific flow is shown in Fig. 5.
The invention realizes 16-bit parallel ECC decoding. For the BCH (4200, 4096, 8) code, the first-stage pipeline syn occupies 263 clock cycles and the second-stage pipeline ibm_chien_tx for erroneous data occupies 275 clock cycles. The number of clock cycles occupied by the second-stage pipeline tx for error-free data depends on the effective data length: it occupies 258 clock cycles when a sector is fully loaded with 4096 bits, and fewer clock cycles when the effective data length is less than 4096 bits.
Pipeline synchronization is performed for the case where the two pipeline stages occupy different numbers of clock cycles. When the two stages run simultaneously, the pipeline that occupies fewer clock cycles must wait for the longer one to finish before the data flow onward. Because the pipeline has branches, the flow shown in Fig. 5 is only one of the possible cases. When no sector in the 1 page of data read contains an error, the data pass only through the first-stage pipeline syn and the second-stage pipeline tx, and the branch pipeline decoding system reaches its upper performance limit. When every sector in the 1 page of data read contains an error, the data pass only through the first-stage pipeline syn and the second-stage pipeline ibm_chien_tx, and the branch pipeline decoding system reaches its lower performance limit.
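The synchronization rule can be explored with a simplified timing model. The stage lengths are the figures stated above; the lockstep assumption (each time slot lasts as long as the slower of the two stages running in it) and the function names are illustrative assumptions, so the totals are estimates under this model rather than measured results.

```python
# Stage lengths in clock cycles for BCH (4200, 4096, 8), as stated in the text.
SYN_CYCLES = 263            # first-stage pipeline syn
IBM_CHIEN_TX_CYCLES = 275   # second-stage branch for erroneous sectors
TX_FULL_CYCLES = 258        # second-stage branch, error-free full sector


def second_stage_cycles(has_error, tx_cycles=TX_FULL_CYCLES):
    """Select the second-stage branch from the syndrome result."""
    return IBM_CHIEN_TX_CYCLES if has_error else tx_cycles


def page_cycles(sector_has_error, tx_cycles=TX_FULL_CYCLES):
    """Total cycles for one page under the assumed lockstep model.

    In slot k, sector k is in syn while sector k-1 is in its second-stage
    branch; the shorter stage waits for the longer one.
    """
    n = len(sector_has_error)
    total = 0
    for slot in range(n + 1):
        first = SYN_CYCLES if slot < n else 0
        if slot > 0:
            second = second_stage_cycles(sector_has_error[slot - 1], tx_cycles)
        else:
            second = 0
        total += max(first, second)
    return total


best = page_cycles([False] * 4)                   # all sectors take tx
worst = page_cycles([True] * 4)                   # all take ibm_chien_tx
mixed = page_cycles([True, False, False, True])   # branches differ per sector
```

In this model any mix of erroneous and error-free sectors falls between the all-clean and all-erroneous totals, which mirrors the upper and lower performance limits described above.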
The technical effects of the present invention will be described in detail with reference to simulations.
Because of the characteristics of BCH codes, an ECC decoding system has a long decoding time and high hardware cost, so shortening the decoding time is the problem to be solved. Besides improving the decoding algorithm inside the BCH decoder, improving the ECC decoding control logic outside the BCH decoder can also save considerable decoding time and improve performance. The 16-bit parallel branch pipeline structure provided by the invention improves decoding performance from the perspective of peripheral logic control.
When 1 page of data is read for decoding, the 16-bit parallel operation saves about half of the decoding time compared with the 8-bit parallel operation.
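The factor of about one half follows from the number of bus words per codeword. The short calculation below assumes that one bus word is consumed per clock cycle and ignores control overhead; the helper name `words_needed` is hypothetical.

```python
CODEWORD_BITS = 4200  # length of one BCH (4200, 4096, 8) codeword


def words_needed(bits, bus_width):
    """Number of bus words needed to carry `bits` bits (ceiling division)."""
    return -(-bits // bus_width)


cycles_16 = words_needed(CODEWORD_BITS, 16)   # 16-bit parallel operation
cycles_8 = words_needed(CODEWORD_BITS, 8)     # 8-bit parallel operation
ratio = cycles_16 / cycles_8
```

Under this assumption the 16-bit figure equals the 263 clock cycles stated for the first-stage pipeline syn, and the 8-bit figure is roughly twice as large.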
In addition, for NAND FLASH with a page size of 8 KB, in the best case, where the branch pipeline structure of the present invention is used and the data of 1 page contain no errors, at least 200 clock cycles are saved compared with the common two-stage pipeline. When the spare area of the NAND FLASH is not fully loaded and the sector is error-free, the tx pipeline saves still more clock cycles relative to the ibm_chien_tx pipeline. In the worst case, even if every sector in the 1 page of data is in error, the decoding speed equals that of the common pipeline structure. The combination of 16-bit parallel operation and the branch pipeline structure can greatly shorten the ECC decoding time and improve decoding efficiency.
In the implementation example, the NAND FLASH has a 1-page capacity of 8192+448 Byte, of which the spare area stores 96 Byte of valid data, and the BCH (4200, 4096, 8) code is used for 16-bit parallel branch pipeline decoding. The decoding simulation waveform for the worst case, which is equivalent to the common two-stage pipeline structure, is shown in Fig. 6, and the 1-page decoding simulation waveform for the best case is shown in Fig. 7.
The ECC decoding system of the branch pipeline structure designed by the invention can meet the special requirements of the satellite-borne switching system, and is also suitable for the design of everyday or commercial SLC-type and MLC-type NAND FLASH controllers and similar memory controllers.
The above description covers only specific embodiments of the present invention and is not to be construed as limiting its scope of protection; all modifications, equivalents and improvements made within the spirit and principles of the invention fall within the scope defined by the appended claims.