Fast Decoding ECC For Future Memories
Fast Decoding ECC For Future Memories
Fast Decoding ECC For Future Memories
9, SEPTEMBER 2016
I. I NTRODUCTION the same sign in ever smaller spaces. Two different strategies
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
AMATO et al.: FAST DECODING ECC 2487
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
2488 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 34, NO. 9, SEPTEMBER 2016
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
AMATO et al.: FAST DECODING ECC 2489
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
2490 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 34, NO. 9, SEPTEMBER 2016
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
AMATO et al.: FAST DECODING ECC 2491
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
2492 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 34, NO. 9, SEPTEMBER 2016
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
AMATO et al.: FAST DECODING ECC 2493
TABLE V went into mass production in 2010 and was sold in millions
T HEORETICAL E STIMATES OF A REA AND L ATENCY FORTHE of units.
BCH2 D ECODER B LOCKS I MPLEMENTED IN GF 2m
A similar, fully parallel solution is proposed in [15]. The
expressions independent of the bit position are evaluated sepa-
rately from the ones depending on α i . Our proposal introduces
the further optimizations of separating linear and nonlinear
operations, and of efficiently computing all the needed linear
combinations. In fact, just to give an example, by taking
k = 256 and the NAND delay (∼0.03 ns) used in [15], we
have that the delay of the BCH2 in [15] is ∼2.5 ns while the
estimated delay of our solution is ∼1.25 ns.
time consuming. The nonlinear part of S13 , given in (1), is
m−1
m−2 VI. U LTRA -FAST T RIPLE -E RROR -C ORRECTING
nl(S13 ) S1,i S1, j (α 2i+ j + α i+2 j ). BCH C ODE
i=0 j =i+1
A triple-error correcting BCH code C (n, k) for a data size
Its computation requires a single level of m(m − 1)/2 A ND k = 2m−1 can be obtained by shortening a primitive BCH code
gates (T A ) and, on average, m(m − 1)/4 terms to add for with length N = 2m − 1, d = 7, 3m parity bits, and generator
each bit, thus a total of m 2 (m − 1)/4 X OR gates and a polynomial given by
worst case latency of log2 (m(m − 1)/2)TX . The nonlinear
part nl(S13 ) is then added to the syndrome S3 . This requires
m−1
x − α3 · 2 x − α5 · 2
i i i
g(x) = x − α2 . (15)
m X OR, and an additional X OR level. By carefully selecting
i=0
the order of the operations (see Section III-.3) in [32], despite
m = 9, we obtained a total latency of T A + 4TX , including the
addition of S3 . A. BCH3 Decoding Algorithm
3) Root Search and Correction: The last stage checks if BCH3 decoding is more complex than BCH2. The BM algo-
rithm is replaced again by the parallel evaluation of the
Ai + nl(S13 ) + S3 = 0 ELP symbolic expressions, that are now selected through a
for i = 1, . . . , n. In this case, ei = 1 and ri must be corrected. more complicated decision tree [30]. It is important to deal
The check is obtained by N ORing m bits, and the correction with a limited number of ELP expressions. Given the above
by X OR ing the result with the bit ri precautions, the decoding algorithm is still composed of the
following stages, as the BCH2.
v̂i = ri + N OR m (Ai + nl(S13 ) + S3 ) 1) Syndrome Evaluation: A triple ECC BCH code needs a
third syndrome evaluation. Eq. (9) can be trivially extended to
where N ORm gate is a N OR gate with m inputs.2
⎡ ⎤ ⎡ ⎤T
S1 W1
⎣ S3 ⎦ = rWT = r ⎣W3 ⎦ (16)
C. BCH2 Decoder Implementation
S5 W5
In Table V we summarize a first-order approximation of
the area occupancy and of the latency as a function of m. where r is the word read from memory, and each Wb is an
In general these expressions overestimate the actual costs, m × n binary matrix whose columns are given by the poly-
because they do not take into account any optimization for nomial representations of α bi with i = 0, 1, . . . , n and
the specific GF. b = 1, 3, 5.
For instance, with m = 9, k = 256, n = 274, from 2) Error Locator Polynomial: When three errors occur
Tables V and I, we deduce a BCH2 decoder area of 7.6 kX OR (ν = 3), say in positions i 1 , i 2 , i 3 , the ELP with roots in α −in
and a latency of 18.4 TX . An efficient implementation of the reads
standard BCH decoder is outlined by Strukov in [12]. By using
Strukov’s formulae and the values in Table I, for the same (x) = 1 − xα i1 1 − xα i2 1 − xα i3
ECC we get a decoder area of 31.24 kX OR, and a latency = 1 + 1 x + 2 x 2 + 3 x 3 .
of 67.2 TX levels, approximately four times larger and slower
than our solution. The latency of this architecture has been The coefficients of (x), evaluated running the BM algorithm
further reduced to 16.2 TX in [32] with a careful optimization. in symbolic form [30], are
This BCH2 implemented in a 45 nm 1G bit PCM device [3]
S5 + S12 S3 S5 S1 + S16 + S32 + S13 S3
1 = S1 , 2 = , 3 = .
2 Such a gate can be implemented by using a tree of elementary N OR and
S3 + S13 S3 + S13
N AND gates. The tree consists of n N OR = 1≤i<m,i odd m/2i N OR gates,
m − 1 − n N OR N AND gates, and it’s latency is log2 (m) TN OR (assuming If S3 + S13 = 0, an inversionless ELP with the same roots is
TN OR = TNAND as in Table I). An I NV gate has to be added to both area and
latency if log2 (m) is even. (x) = A + Bx + C x 2 + Dx 3 (17)
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
2494 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 34, NO. 9, SEPTEMBER 2016
where
A S3 + S13
B S1 A = S1 S3 + S14
C S5 + S12 S3
D S5 S1 + S16 + S32 + S13 S3 . (18)
Note that D = + S5 S1 +
A2 + D2 .
S13 S3 A2
If A = 0 but D = 0 the last discrepancy of the
BM algorithm is zero, and we obtain the ELP of the
BCH2 decoder.
If D = 0 and also A = 0, the ELP computed by the
BM algorithm has degree one and reads
(x) = A
+ B
x (19)
with A
= 1 e B
= S1 . Finally, if also S1 = 0 we are in the
Fig. 5. Schematic representation of the fast BCH3 decoder.
error free case (S1 = S3 = S5 = 0). Note that the ELP (11)
reduces to (19) when S3 + S13 = 0, whereas the error free case
TABLE VI
requires a separate test because if also S1 = 0, the ELP (11)
T HEORETICAL E STIMATES OF A REA AND L ATENCY
FOR
is null for any x. THE BCH3 D ECODER I MPLEMENTED IN GF 2m
Following [30] and the BM algorithm we would need three
different ELPs, whose choice is driven by the value of the
most time-consuming term D.
An alternative is the symbolic solution of the Key Equation.
If ν = 2 we can pick two equations among
S5 1 + S4 2 = S6 (20)
S4 1 + S3 2 = S5 (21)
S3 1 + S2 2 = S4 (22)
S2 1 + S1 2 = S3 . (23) 3) Root Search and correction: The fastest search for the
From (22) and (23) we get 1 = S1 . From (21) and (23) ELP roots is the parallel test α −i = 0, ∀i = 0, 1..n. When
we obtain ν ≥ 2 ( A = 0), the test can be run checking the expression
S5 + S12 S3 Aα 3i + A2 + Bα 2i + Cα i + D2 = 0 (24)
2 =
S3 + S13
where we highlight that the constant term A2 can be taken
that is the same 2 found for ν = 3. Therefore (17)-(18) hold
into account while computing the linear combinations Aα 3i .
even in case of two errors. This can be verified also by the
Since the evaluation of D2 = S5 S1 + S13 S3 is still the most
Peterson decoding algorithm (see, e.g., [17]).
time-consuming, expression (24) enables the fastest test. The
From (21) and (22) we can infer the condition for ν = 2
computation of Aα 3i + A2 , Bα 2i and Cα i can be carried in
that reads
parallel with D2 as soon as A, B and C get available.
S4 S2 + S32 = 0 ⇔ S16 + S32 = 0 ⇔ S3 + S13 = 0 In the simplified ELP case of ν ≤ 1 ( A = 0), the search
can be run by the same hardware, simply replacing the coef-
and thus the condition for ν = 2 or 3 (and for the use of the
ficient C with 1 and D2 with S1 . In fact, in this case, A = 0,
ELP (17)) is simply A = 0. This is a great advantage because
and B = S1 A = 0.
the computation of A is much faster than the computation
of D, as we will show in Section VI-B.
When A = 0 and S1 = 0, we have ν ≤ 1 and five equations B. BCH3 Decoder Architecture
for 1 . They all lead to Fig. 5 shows the architecture of the proposed decoder. The
scheme follows the algorithm steps described in the previous
1 = S1 .
subsection. In Table VI we give the general expressions for
Finally, if also S1 = 0 the sequence is error free, and the Area and Latency of the three stages as a function of m. In the
correction must be disabled. discussion, to focus on a practical implementation we assume
In conclusion, the decision tree has just two ELPs to choose m = 9, k = 256.
from: 1) Syndrome Evaluation: The blocks labelled Sb gen with
• when A = 0, (x) = A + Bx + C x 2 + Dx 3 , with A, B, b = 1, 3, 5 compute the three m-bit syndromes S1 , S3 and S5
C, D given in (18) given by (16), as for the BCH2, in a completely bit-parallel
• when A = 0, (x) = 1 + S1 x, that includes as a special implementation. Each bit of Sb is obtained by a separate
case the ELP with no roots (x) = 1 when S1 = 0. X OR-tree with inputs taken from r. The complexity and
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
AMATO et al.: FAST DECODING ECC 2495
latency for each syndrome is the same as in the BCH2: m by the latency required by the other computations. Finally, the
X OR -trees with (n/2 − 1) X OR each, on average. The depth ELP root test requires an m-inputs NOR gate to produce ei and
of the logic is log2 (w) = m − 1 and the total latency of the correction v̂i = ri + ei is completed T A + 20TX + Tm N O R
this stage is again (m − 1)TX . For instance, with m = 9, the seconds after data reading.
latency is 8TX . As to logic area A, the three linear combination blocks
2) Error Locator Polynomial: The second step is run in require (2m−1 (m − 2) + 1)X OR gates each, as in the BCH2
parallel for the terms of the four coefficients of the ELP. The decoder. For each of the k bit positions, (3m+1)X OR gates and
fastest to compute is A, that requires a cube and one addition an m-input N OR are needed. In Table VI we summarize the
that can actually be inserted without latency penalty: it takes theoretical estimates of Area and Latency of the main BCH3
4 X OR levels and one A ND level (see Table III). The result blocks as a function of m.
of the test A = 0 is conveyed at the output of the blocks
computing C and D2 , because their values must be replaced
C. BCH3 Decoder Implementation
if the test is true.
With SPB representation of the GF(512) elements with The solution proposed, with m = 9, k = 256 [33],
v = 3 or 4 (see Table III) even the value of B that is of has been implemented following the Synopsys topographical
type ab + c4 can be computed with the same latency as synthesis methodology using a 54 nm logic gate length CMOS
A. Conversely, whatever the SPB choice, the value of C is technology. In such implementation the decoding latency is
available TX seconds later as it requires a product operation smaller than 3 ns, and the area occupancy of the decoder is
with a squared term. The computed coefficients A, B and C about 250 · 103 µm2 . We remark that the area overhead associ-
are conveyed to the Root Search stage, to compute their linear ated to ECC is mainly due to the additional memory required
combinations. to store the parity check bits. In particular, for the mentioned
The computation of D2 is the most time consum- implementation the overhead due to ECC circuitry is below 5%
ing. An upper bound to its latency is given by 2T A + of the additional array area for the redundancy. Unlike the
estimates of number of elementary gates, these results take
log2 (m(m − 1)/2) + log2 (m) TX + TO , including the
into account the buffering of the longer interconnections as
change D2 → S1 if A = 0. In GF (512) , by using the opti- well as the routing of signals, which may increase the total
mization described in Sec. III-.3, the latency is 2T A +8TX +TO . decoder area occupancy. A trade-off between latency and area
In fact, with SPB the term S1 S5 is available T A + 4TX after occupancy is possible, depending on the specific application.
syndrome evaluation, and the sum of this term can be inserted
during the computation of S13 S3 without latency penalty. The
VII. C ONCLUSIONS
computation of S13 S3 can be completed within 2T A + 8TX ,
for any GF and basis, thanks to the Mastrovito multiplier that ECC solutions are being used nowadays in an increasing
allows the precomputation of the terms based on S3 . The value number of memory and storage devices. Since the latency of
of D2 is thus ready 2T A +16TX after data reading and no linear NAND flash memories is in the order of µs, the major concern
combinations of D2 are needed. for their ECC solutions is the optimization of the throughput.
Note that the replacements D2 → S1 and C → 1 do not Conversely, DRAMs and storage class memory devices have
need standard switches. In fact, the replaced terms C and D access times in the order of ns. For instance, a typical access
are null, thus a simple OR with the replacing values (provided time of current DRAM devices is around 50 ns. Consequently,
they are nulled when not needed) will produce the same also the latency of the ECC decoder plays a critical role for
result. such applications.
The ELP stage requires four multipliers, four adders, In this paper we have described double- an triple-error-
one squarer, one third and one fourth power. An upper bound correcting codes suitable for DRAM-like devices. We have
to the total logic area of this stage is m 3 /4 + 257m 2/ shown that by parallelizing the decoding algorithm, we are
able to achieve low-latency decoding, in the order of tens of
24 + 6m X OR + 9m 2 /2 − m/2 A ND gates.
X OR levels, that translate to around 1 ns in state-of-the-art
3) Root Search and Correction: For the root search, the
CMOS technology. Moreover we have also shown that a high
blocks labeled Linear comb compute Aα 3i + A2 , Bα 2i and
level of parallelism is possible without an exponential increase
Cα i for each bit position i . This computation requires m linear
of the ECC logic area.
combinations of the bits of A, B and C, for each i . Once
again 2m − 1 mk and all possible linear combinations
are computed with a depth of log2 m levels of X OR . The R EFERENCES
linear combinations of A and B are the first available, at the [1] G. Atwood, “Current and emerging memory technology landscape,”
same time in GF(512) with SPB. For each position i , the in Proc. Flash Memory Summit, Santa Clara, CA, USA, Aug. 2011,
term Aα 3i + A2 + Bα 2i is ready at the same time as Cα i , pp. 1–24.
[2] (Apr. 2016). Breakthrough Nonvolatile Memory Technology. [Online].
TX seconds later. This is shown for a single position i Available: https://www.micron.com/about/emerging-technologies/3d-
in Fig. 5. When the addition Aα 3i + A2 + Bα 2i +Cα i is ready, xpoint-technology
TX seconds later, the value of the coefficient D2 is also ready, [3] C. Villa, D. Mills, G. Barkley, H. Giduturi, S. Schippers, and
D. Vimercati, “A 45nm 1Gb 1.8V phase-change memory,” in IEEE Int.
and the overall ELP computation is completed T A + 19TX Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), San Francisco,
seconds after data reading. Thus only one TX is not hidden CA, USA, Feb. 2010, pp. 270–271.
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
2496 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 34, NO. 9, SEPTEMBER 2016
[4] T.-Y. Liu et al., “A 130.7mm2 2-layer 32Gb ReRAM memory device [30] C. Kraft, “Closed solution of Berlekamp’s algorithm for fast decoding
in 24nm technology,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. of BCH codes,” IEEE Trans. Commun., vol. 39, no. 12, pp. 1721–1725,
Papers (ISSCC), San Francisco, CA, USA, Feb. 2013, pp. 210–211. Dec. 1991.
[5] W. Otsuka et al., “A 4Mb conductive-bridge resistive memory with [31] H. Lee, “High-speed VLSI architecture for parallel Reed–Solomon
2.3GB/s read-throughput and 216MB/s program-throughput,” in IEEE decoder,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11,
Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), San Francisco, no. 2, pp. 288–294, Apr. 2003.
CA, USA, Feb. 2011, pp. 210–211. [32] P. Amato, C. Laurent, M. Sforzin, S. Bellini, M. Ferrari, and
[6] K. Tsuchida et al., “A 64Mb MRAM with clamped-reference and A. Tomasoni, “Ultra fast, two-bit ECC for emerging memories,” in Proc.
adequate-reference schemes,” in Proc. IEEE Int. Solid-State Circuits 6th IEEE Int. Memory Workshop (IMW), Taipei, Taiwan, May 2014,
Conf. Dig. Tech. Papers (ISSCC), San Francisco, CA, USA, Feb. 2010, pp. 79–82.
pp. 258–259. [33] M. Ferrari, A. Tomasoni, S. Bellini, P. Amato, M. Sforzin, and
[7] K. R. Udayakumar et al., “Low-power ferroelectric random access C. Laurent, “Embedded ECC solutions for emerging memories (PCMs),”
memory embedded in 180nm analog friendly CMOS technology,” in in Proc. DATE, 2016, p. 6. [Online]. Available: https://kluedo.ub.uni-
Proc. 5th IEEE Int. Memory Workshop (IMW), Monterey, CA, USA, kl.de/frontdoor/index/index/year/2016/docId/4320
May 2013, pp. 128–131.
[8] N. F. Mott, Metal-Insulator Transitions, 2nd ed. London, U.K.:
Taylor & Francis, 1990.
[9] Low Power Double Data Rate 4 (LPDDR4), JEDEC, Paolo Amato received the Laurea (cum laude)
document JESD209-4, Aug. 2014. [Online]. Available: https://www. degree in computer science from the University of
jedec.org/ Milano, Italy, in 1997, and the Ph.D. degree in
[10] B. L. Ji et al., “In-line-test of variability and bit-error-rate of HfOx - computer science from the University of Milano-
based resistive memory,” in Proc. IEEE Int. Memory Workshop (IMW), Bicocca in 2013.
May 2015, pp. 1–4. Dr. Amato joined Micron in 2010, where he
[11] Y. Emre, C. Yang, K. Sutaria, Y. Cao, and C. Chakrabarti, “Enhancing investigates storage and memory architectures based
the reliability of STT-RAM through circuit and system level techniques,” on mainstream and emerging technologies for next
in Proc. IEEE Workshop Signal Process. Syst. (SiPS), Oct. 2012, generations of mobile systems. From 1998 to 2008,
pp. 125–130. he was with STMicroelectronics, Agrate, Italy. From
[12] D. Strukov, “The area and latency tradeoffs of binary bit-parallel 2000 to 2005, he was the Leader of the Methodology
BCH decoders for prospective nanoelectronic memories,” in Proc. 40th for Complexity Team (a global research and development team of about
Asilomar Conf. Signals, Syst. Comput. (ACSSC), Pacific Grove, CA, 30 people in Milan, Napoli, Catania, Moscow, and Singapore), which is
USA, Oct./Nov. 2006, pp. 1183–1187. aimed at developing methods, algorithms, and software tools for complex
[13] J. Yeon, S.-J. Yang, C. Kim, and H. Lee, “Low-complexity triple-error- system management. From 2005 to 2010, he was the Manager of Statistical
correcting parallel BCH decoder,” J. Semicond. Technol. Sci., vol. 13, Methods for the Non Volatile Memory-Technology Development Group, ST,
no. 5, pp. 465–472, Oct. 2013. and Numonyx, and he started the development of ECC solutions for PCM
[14] X. Wang, D. Wu, L. Pan, R. Zhou, and C. Hu, “An on-chip high-speed devices. He is currently a Distinguished Member of the Technical Staff with
4-bit BCH decoder in MLC NOR flash memories,” in Proc. IEEE Asian Micron Semiconductor Italia S.r.l. He is also an expert on statistical methods,
Solid-State Circuits Conf., Taipei, Taiwan, Nov. 2009, pp. 229–232. error correcting codes and security, and of their application to mobile systems.
[15] C. Badack, T. Kern, and M. Gössel, “Modified DEC BCH codes for He has authored over 40 papers published in peer-reviewed international
parallel correction of 3-bit errors comprising a pair of adjacent errors,” journals and international conferences, and filed over 20 patents.
in Proc. 20th IEEE IOLTS, Girona, Spain, Jul. 2014, pp. 116–121.
[16] I. Yoo and I.-C. Park, “A search-less DEC BCH decoder for low-
complexity fault-tolerant systems,” in Proc. IEEE Workshop Signal Sandro Bellini received the Dr.Ing. (cum laude)
Process. Syst. (SiPS), Belfast, U.K., 2014, pp. 1–6. degree in electronic engineering from the Politecnico
[17] X. Zhang, VLSI Architectures for Modern Error-Correcting Codes. di Milano, Italy, in 1971. He began his research
New York, NY, USA: Taylor & Francis, 2015. activity with the Italian National Research Council.
[18] E. D. Mastrovito, “Vlsi designs for multiplication over finite fields He became an Associate Professor in 1982, and has
G F(2m ),” in Proc. 6th Int. Conf. Appl. Algebra, Algebraic Algo- been a Full Professor of telecommunications since
rithms Error-Correcting Codes (AAECC), Jul. 1989, pp. 297–309, doi: 1990.
10.1007/3-540-51083-4_67. During this time, his main research themes have
[19] B. Sunar and cC. K. Kocc, “Mastrovito multiplier for all trinomials,” been analysis and design of pulse modulation sys-
IEEE Trans. Comput., vol. 48, no. 5, pp. 522–527, May 1999. tems, digital frequency modulation with continu-
[20] H. Fan and M. A. Hasan, “A survey of some recent bit-parallel GF(2n ) ous phase, efficient simulation techniques based on
multipliers,” Finite Fields Appl., vol. 32, pp. 5–43, Mar. 2015. importance sampling, computer emission tomography, multicarrier demodula-
[21] H. Fan and Y. Dai, “Fast bit-parallel GF(2n ) multiplier for all trinomi- tion, channel equalization, blind equalization, symbol and carrier synchroniza-
als,” IEEE Trans. Comput., vol. 54, no. 4, pp. 485–490, Apr. 2005. tion, digital data storage on optical support, and advanced coding techniques.
[22] S.-M. Park and K.-Y. Chang, “Low complexity bit-parallel squarer In recent years, his research activity has mainly been devoted to channel
for G F(2n ) defined by irreducible trinomials,” IEICE Trans. Fundam. coding, with an emphasis on turbo codes, LDPC codes, and turbo product
Electron. Commun. Comput. Sci., vol. E89-A, no. 9, pp. 2451–2452, codes.
Sep. 2006.
[23] X. Xiong and H. Fan, “Bit-parallel G F(2n ) squarer using shifted
polynomial basis,” Cryptol. ePrint Arch., Tech. Rep. 2012/626, 2012.
[Online]. Available: http://eprint.iacr.org/2012/626 Marco Ferrari (M’06) was born in Milano, Italy,
[24] H. Fan and M. A. Hasan, “Fast bit parallel-shifted polynomial basis in 1971. He received the Laurea (cum laude)
multipliers in G F(2n ),” IEEE Trans. Circuits Syst. I, Reg. Papers, degree in telecommunications engineering and the
vol. 53, no. 12, pp. 2606–2615, Dec. 2006. Ph.D. degree in electronics and communication
[25] A. Cilardo, “Fast parallel GF(2m ) polynomial multiplication for all engineering from the Politecnico di Milano, Italy,
degrees,” IEEE Trans. Comput., vol. 62, no. 5, pp. 929–943, May 2013. in 1996 and 2000, respectively.
[26] R. C. Mullin, I. M. Onyszchuk, S. A. Vanstone, and R. M. Wilson, Since 2001, he has been a Researcher with the
“Optimal normal bases in GF( pn ),” Discrete Appl. Math., vol. 22, no. 2, Department of Electronics, Information and Bio-
pp. 149–161, 1989. engineering, Institute IEIIT of the Italian National
[27] B. Sunar and C. K. Koc, “An efficient optimal normal basis type II Research Council (CNR), Politecnico di Milano.
multiplier,” IEEE Trans. Comput., vol. 50, no. 1, pp. 83–87, Jan. 2001. In 2002, he was an EPRSC Research Fellow with
[28] A. Reyhani-Masoleh and M. A. Hasan, “A new construction of the University of Plymouth, U.K. He has co-authored tens of scientific publi-
Massey–Omura parallel multiplier over GF(2m ),” IEEE Trans. Comput., cations in leading international journals and conference proceedings, and a few
vol. 51, no. 5, pp. 511–520, May 2002. patents. His main research interests are in digital transmission, information
[29] S. Lin and D. J. Costello, Error Control Coding, 2nd ed. theory and channel coding. He is a member of the IEEE Communications
Upper Saddle River, NJ, USA: Prentice-Hall, 2004. Society.
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.
AMATO et al.: FAST DECODING ECC 2497
Christophe Laurent received the Laurea degree Alessandro Tomasoni (M’09) was born in Milano,
in microelectronics and robotics from Polytech Italy, in 1980. He received the M.S. (cum laude)
Montpellier, France, and Politecnico di Torino, Italy, degree in telecommunications engineering in 2005,
in 1998. From 1998 to 2002, he was a Consul- and the Ph.D. degree in information engineering
tant with Alcatel, STMicroelectronics, Stepmind, from the Politecnico di Milano, Italy, in 2009.
and Philips Semiconductors, in France and Italy, Since 2012, he has been a Researcher with
designing GSM baseband circuits, Bluetooth base- the Dipartimento di Elettronica, Informazione
band circuits, and cryptographic-related peripherals. e Bioingegneria, Italian National Research Council,
From 2002 to 2008, he designed novel architec- Institute of Electronics, Computer and Telecommu-
tures for flash NOR, including ECC circuits for nication Engineering, Politecnico di Milano, Italy.
STMicoelectronics and Numonyx. He is currently From 2009 to 2011, he was a Temporary Research
SMTS with Micron Semiconductor Italia S.r.l., Vimercate, Italy, where he is Assistant with the Dipartimento di Elettronica, Informazione e Bioingegneria.
involved in defining, implementing and synthesizing novel emerging memories In 2008, he was a Visiting Researcher with the Viterbi School of Engineer-
architectures. He is serving as an Expert in ECC code development and ing, University of Southern California, Los Angeles, CA, USA. He was a
implementation, and serving as a Consultant for all the design teams. consultant for leading industries, such as STMicroelectronics, Alcatel-Lucent
Mr. Laurent received three company recognition awards, filed eight inter- and Micron. He has co-authored several publications on journal papers and
national patents and published one article in TLP Technical Journal. He also conference proceedings, and is co-inventor of many patents. His main research
received the Best Technical Paper Award for an article in the SNUG interests are in wireless communications, optical communications and solid
France 2012. state data storage, with emphasis on information theory, advanced channel
coding and channel estimation.
Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SILCHAR. Downloaded on November 09,2022 at 08:24:40 UTC from IEEE Xplore. Restrictions apply.