IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 16, NO. 2, MARCH 2005

Encoding Strategy for Maximum Noise Tolerance Bidirectional Associative Memory

Dan Shen and Jose B. Cruz, Jr., Life Fellow, IEEE

Abstract—In this paper, the basic bidirectional associative memory (BAM) is extended by choosing the weights in the correlation matrix, for a given set of training pairs, so that the resulting BAM has a maximum noise tolerance set. We prove that, for a given set of training pairs, the maximum noise tolerance set is the largest possible in the following sense: the optimized BAM recalls the correct training pair for any input pattern within the maximum noise tolerance set, while at least one pattern lying one Hamming distance outside the set does not converge to the correct training pair. The maximum noise tolerance set is the union of the maximum basins of attraction. A standard genetic algorithm (GA) is used to calculate the weights that maximize the objective function whose maximizer yields the maximum noise tolerance set for BAM. Computer simulations are presented to illustrate the error-correction and fault-tolerance properties of the optimized BAM.

Index Terms—Bidirectional associative memory (BAM), energy well hyper-radius, neural network training, noise tolerance set.

Manuscript received June 1, 2003; revised November 4, 2003. This work was supported by the Defense Advanced Research Projects Agency (DARPA) under Contract F33615-01-C3151 issued by the Air Force Research Laboratory/Air Vehicles Directorate. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or AFRL. The authors are with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA (e-mail: shen.100@osu.edu; jbcruz@ieee.org). Digital Object Identifier 10.1109/TNN.2004.841793

I. INTRODUCTION

In 1968, Anderson [6] proposed a memory structure named linear associative memory (LAM), which can be used in hetero-associative pattern recognition. Since the LAM is noise sensitive, the optimal LAM was introduced by Wee [7] and Kohonen [8], extending the LAM by absorbing the noise. Although good results can be obtained with these early approaches, many theoretical and practical issues, such as network stability and storage capacity, remained unresolved. In 1988, Kosko [1] presented the theory of bidirectional associative memory (BAM) by generalizing the Hopfield network model. As a class of artificial neural networks, BAMs provide massive parallelism, high error-correction ability, and high fault tolerance. However, forming a good BAM requires a good encoding strategy. This field has received extensive attention from researchers, and substantial effort has been devoted to various learning rules. Kosko [1] provided a correlation learning strategy and proved that the BAM recall process converges after a finite number of iterations. However, the correlation matrix used by Kosko cannot guarantee that the energy of every training pair is a local minimum; that is, it cannot guarantee recall of every training pair even for a very small set of training data. In the following years, various encoding strategies and learning rules were proposed to improve the capacity and the performance of BAM. In 1990, Wang et al. [2] introduced two BAM encoding schemes that increase recall performance at the cost of more neurons.
These are multiple training methods, which guarantee the recall of all training pairs [3]. In 1993 and 1994, Leung [9], [10] presented the enhanced householder encoding algorithm (EHCA), which Lenze [11] improved in 2001 to enlarge the capacity. In 1995, Wang and Don [12] introduced the exponential bidirectional associative memory (eBAM), which uses an exponential encoding rule rather than the correlation scheme. For other types of neural networks, good procedures for learning, training, and stability analysis are available in [13]–[18]. For the conventional BAM, however, existing methods have focused only on the training set or on capacity; the noisy neighbor pairs and the noise tolerance set of BAM have been ignored. In this paper, we are especially interested in the approach proposed by Wang et al. [2], [3], and we expand the applicability of the BAM. The principal contribution of this paper is the construction of an objective function whose maximum with respect to the training weight vector corresponds to the weights that result in the maximum noise tolerance set. For a given set of training pairs, any noisy input pair within this tolerance set converges to the correct training pair. Some basic concepts of BAM are reviewed in Section II. The multiple training concept is then extended in Section III with an optimization-based encoding strategy for constructing the correlation matrix; four lemmas and two theorems about the new encoding rule are proved in the same section, providing the foundation for constructing the maximum noise tolerance set. A numerical example is presented in Section IV to illustrate the effectiveness of the extended BAM; in this example, a standard GA is used to solve the nonlinear optimization problem and obtain the optimum training weights. Finally, conclusions are drawn in Section V.

II. BAM

BAM is a two-layer hetero-associative feedback neural network model first introduced by Kosko [1]. As shown in Fig. 1, the input layer consists of n binary-valued neurons and the output layer of p binary-valued neurons, so the network associates an input pattern A of dimension n with an output pattern B of dimension p, and BAM can be viewed as a bidirectional mapping between the two pattern spaces.

Fig. 1. Structure of BAM.

If X_i and Y_i denote the bipolar (+1/-1) modes of the training patterns A_i and B_i, the N training pairs can be stored in the correlation matrix

M = \sum_{i=1}^{N} X_i^T Y_i.

To obtain higher accuracy for associative memory and to retrieve one of the nearest training inputs, the output can be fed back to BAM. Starting with a pair (A_0, B_0), the recall procedure determines a sequence (A_1, B_1), (A_2, B_2), ..., obtained by alternately thresholding B_{k+1} from A_k M and A_{k+1} from B_{k+1} M^T, until it converges to an equilibrium pair (A_F, B_F). If BAM converges for every training pair, it is said to be bidirectionally stable. Each element of A and B has an associated threshold; if all thresholds are zero, the BAM is called homogeneous, and otherwise it is called nonhomogeneous. For each pair, the Lyapunov or energy function is defined as E(A, B) = -A M B^T (with threshold terms added in the nonhomogeneous case). Kosko [1] and Haines et al. [4] have proved that, after a finite number of iterations, the energy converges to a local minimum, and the corresponding pair is a stable point. McEliece et al. [5] have shown that if the training pairs are even coded (each bit takes either value with probability 0.5) and n-dimensional, the storage capacity of the homogeneous BAM is on the order of n/(2 log n); that is, if even-coded stable states are chosen uniformly at random, this is the maximum number of original vectors that can mostly be recalled accurately. For the nonhomogeneous BAM, Haines and Hecht-Nielsen [4] have pointed out that the possible number of stable states lies between 1 and 2^{min(n, p)}. However, since these stable states are chosen by a rigid geometrical procedure, the storage capacity of the nonhomogeneous BAM is less than this maximum number. Haines and Hecht-Nielsen [4] have also shown that, for uniformly randomly chosen training pairs of the same dimension with a fixed number of entries equal to +1 and the remaining entries equal to -1, and with the number of pairs below the bound derived in [4], a nonhomogeneous BAM can be constructed so that approximately 98% of the chosen pairs are stable states.
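To make the recall procedure above concrete, the following is a minimal NumPy sketch of a homogeneous (zero-threshold) Kosko BAM: it builds the correlation matrix from bipolar training patterns, runs the bidirectional thresholded updates, and evaluates the energy of a pair. The function names and the tie-breaking rule (a net input of zero mapped to +1) are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def correlation_matrix(X, Y):
    """Kosko correlation matrix M = sum_i X_i^T Y_i for bipolar patterns.

    X: (N, n) array of bipolar (+1/-1) input patterns.
    Y: (N, p) array of bipolar (+1/-1) output patterns.
    """
    return X.T @ Y  # shape (n, p)

def bam_recall(M, a, max_iters=100):
    """Bidirectional recall for a homogeneous (zero-threshold) BAM.

    Starting from input pattern a, alternate B = sgn(A M) and A = sgn(M B)
    until the pair stops changing (a stable point, i.e., an energy local minimum).
    Ties (net input exactly zero) are broken toward +1 for simplicity.
    """
    a = a.copy()
    b = np.where(a @ M >= 0, 1, -1)
    for _ in range(max_iters):
        a_new = np.where(M @ b >= 0, 1, -1)
        b_new = np.where(a_new @ M >= 0, 1, -1)
        if np.array_equal(a_new, a) and np.array_equal(b_new, b):
            break
        a, b = a_new, b_new
    return a, b

def energy(M, a, b):
    """Lyapunov energy E(A, B) = -A M B^T of a pair (homogeneous case)."""
    return -float(a @ M @ b)
```

With all training weights equal to one this reduces to the unweighted encoding of [1]; Section III replaces the plain sum with the weighted sum in (1).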
III. ENCODING STRATEGY FOR BAM WITH MAXIMUM NOISE TOLERANCE SET

In this new enhanced model, we start with a weighted learning rule for BAM similar to the multiple training strategy in [3]. For a given set of N training pairs (A_i, B_i), i = 1, ..., N, with bipolar modes (X_i, Y_i), the weighted correlation matrix is

M = \sum_{i=1}^{N} q_i X_i^T Y_i    (1)

where n and p are the lengths of the input and output patterns, respectively, and q = (q_1, ..., q_N) is the vector of training weights. In [3], necessary and sufficient conditions are derived for choosing q such that each training pair can be recalled correctly. The energy of a training pair is defined as

E(A_i, B_i) = -X_i M Y_i^T.    (2)

If the energy of a training pair is lower than that of all its neighbors one Hamming distance away from it, then the training pair can be recalled correctly. The set of neighbor pairs r (r a positive integer) Hamming distance away from a pair (A_i, B_i) is defined as

N_r(A_i, B_i) = { (A, B) : H(A, A_i) + H(B, B_i) = r }

where H(A, A_i) is the Hamming distance between the input layers A and A_i, and H(B, B_i) is the Hamming distance between the output layers B and B_i.

Lemma 1: If a training weight vector q satisfies condition (3), the guaranteed-recall condition of [2], [3], then, for every training pair (A_i, B_i), any pair that differs from (A_i, B_i) in exactly one bit position k (in either layer) has higher energy than (A_i, B_i).

Proof: Wang et al. [2] proved that if a training weight vector q satisfies condition (3), then all training pairs can be recalled correctly. Since a training pair (A_i, B_i) can be recalled correctly if and only if it is a local minimum on the energy surface, any pair one Hamming distance away from (A_i, B_i) has higher energy than (A_i, B_i).

Definition 1: For a BAM satisfying condition (3), we define the energy well hyper-radius R as the maximum radius satisfying the following: 1) R >= 1; 2) for every training pair (A_i, B_i) and every r = 1, ..., R, any pair in N_r(A_i, B_i) has higher energy than any pair in N_{r-1}(A_i, B_i); 3) at least one pair in N_{R+1}(A_i, B_i), for some training pair, has energy lower than or equal to that of at least one pair in N_R(A_i, B_i).
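The weighted encoding (1) and the one-Hamming-distance recall requirement used in Lemma 1 can be checked numerically. The sketch below forms the weighted correlation matrix and verifies, by direct enumeration rather than through the closed-form condition (3), that every training pair has strictly lower energy than all of its one-bit neighbors; the function names are illustrative, not from the paper.

```python
import numpy as np

def weighted_correlation_matrix(X, Y, q):
    """Weighted correlation matrix M = sum_i q_i X_i^T Y_i, cf. (1)."""
    return (X * q[:, None]).T @ Y

def neighbors_one_bit(x):
    """Yield every bipolar vector exactly one Hamming distance from x."""
    for k in range(len(x)):
        v = x.copy()
        v[k] = -v[k]
        yield v

def recalls_all_pairs(M, X, Y):
    """Check the guaranteed-recall requirement: each training pair has strictly
    lower energy than every neighbor one Hamming distance away (either layer)."""
    for x, y in zip(X, Y):
        e0 = -float(x @ M @ y)
        for xn in neighbors_one_bit(x):
            if -float(xn @ M @ y) <= e0:
                return False
        for yn in neighbors_one_bit(y):
            if -float(x @ M @ yn) <= e0:
                return False
    return True
```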
Lemma 2: Given a desired training pair set, a weight vector q satisfying condition (3), and the associated energy well hyper-radius R, define for each training pair (A_i, B_i) the set S_i of all pairs within R Hamming distance of (A_i, B_i). Then: 1) any input pair in S_i converges to the training pair (A_i, B_i); 2) for any i and j such that i is not equal to j, S_i and S_j are disjoint; 3) an upper bound on the energy well hyper-radius follows from this disjointness.

Proof: From Lemma 1 and Definition 1, since q satisfies condition (3), the associated energy well hyper-radius satisfies R >= 1.
1) Kosko [1] has pointed out that when a pair is input to a BAM, the network quickly evolves to a system energy local minimum. For any input pair within R Hamming distance of (A_i, B_i), Definition 1 guarantees a high-energy "hill" around it, so BAM evolves to a pair of lower energy inside the well. Since (A_i, B_i) is the only system energy local minimum in the well, any input pair in S_i converges to the training pair (A_i, B_i).
2) For any i and j with i not equal to j, if S_i and S_j intersected, there would be at least one pair in both sets. From conclusion 1), this pair would converge both to (A_i, B_i) and to (A_j, B_j), implying that the two training pairs coincide, which is inconsistent with i and j being distinct. So S_i and S_j are disjoint.
3) From conclusion 2), the sets S_i are pairwise disjoint, so the Hamming distance between any two distinct training pairs must exceed twice the hyper-radius. This yields an upper bound on the energy well hyper-radius determined by the minimum Hamming distance between distinct training pairs.

Definition 2: For a given training pair set with a weight vector q and the associated energy well hyper-radius R, we define the union of the sets S_i as the noise tolerance set of the BAM. Any pair in the noise tolerance set, when input to BAM, converges to the correct training pair.

We want to find the optimal training weight vector q that generates a correlation matrix M with the maximum energy well hyper-radius and, hence, the optimum noise tolerance set. In [3], Wang et al. considered only neighbors one Hamming distance away, corresponding to R = 1; their method does not provide any information for determining a noise tolerance set.

For each training pair (A_i, B_i) in the training set and M formed from the training set by (1), we define in (4) the energy of any neighbor r Hamming distance away in terms of the position indices of the bits with complementary values (in bipolar mode, the complementary value of +1 (-1) is -1 (+1); in binary mode, the complementary value of 1 (0) is 0 (1)) for the input pattern, with analogous expressions (5) and (6) for the output pattern, and we collect the admissible index combinations in (7). Then, for a fixed weight vector q, the objective function F(q) is defined in (8) and (9) as a weighted sum, over all training pairs and over all neighbor pairs within a chosen radius, of the energy differences between neighbor pairs at successive radii, where (9) runs over all combinations of complemented bit positions satisfying conditions (5) and (6). Since a pair in N_r(A_i, B_i) should have higher energy than every pair in N_{r-1}(A_i, B_i), a penalty value (10) is subtracted from the objective function whenever a pair in N_r has energy lower than or equal to that of a pair in N_{r-1}. The penalty values form a series, generated by the formula in (12), that by (11) is strictly decreasing in r, so violations at smaller radii are penalized more heavily.
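The exact objective function (8)–(12) involves a particular penalty series and shell-by-shell energy comparisons; as a rough illustration of its structure, the sketch below enumerates all neighbor pairs up to a tentative radius, compares their energies with that of the training pair (a simplification of the shell-by-shell comparison), and applies penalties that grow as the radius shrinks. The penalty constant and function names are assumptions made for clarity, not the paper's definitions, and the enumeration is exponential in the radius, so it is intended only for small patterns.

```python
import numpy as np
from itertools import combinations

def neighbors_within(x, y, r):
    """Yield all pairs (x', y') whose total Hamming distance from (x, y) is
    exactly r, with the r flipped bits split across the two layers."""
    n, p = len(x), len(y)
    for ra in range(max(0, r - p), min(r, n) + 1):
        rb = r - ra
        for ia in combinations(range(n), ra):
            for ib in combinations(range(p), rb):
                xn, yn = x.copy(), y.copy()
                xn[list(ia)] *= -1
                yn[list(ib)] *= -1
                yield xn, yn

def surrogate_objective(q, X, Y, r_max, penalty=10.0):
    """Simplified surrogate of the noise-tolerance objective: penalize, with a
    weight that strictly decreases in r, every neighbor pair at distance
    r <= r_max whose energy is not strictly higher than that of its training
    pair. Returns 0 when no violations occur, a negative value otherwise."""
    M = (X * q[:, None]).T @ Y
    score = 0.0
    for x, y in zip(X, Y):
        e0 = -float(x @ M @ y)
        for r in range(1, r_max + 1):
            bad = sum(1 for xn, yn in neighbors_within(x, y, r)
                      if -float(xn @ M @ yn) <= e0)
            score -= (penalty ** (r_max - r + 1)) * bad  # heavier at small r
    return score
```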
Theorem 1 (Maximum Noise Tolerance Theorem): Given a set of training pairs and at least one weight vector satisfying the condition of Lemma 1, let q* denote the training weight vector that maximizes the objective function F given in (4)–(12), as expressed in (13). Then the following hold.
1) The BAM constructed from q* has the maximum energy well hyper-radius R*, where R* is the unique radius satisfying inequality (14), which relates the value of the maximized objective function to the penalty series.
2) For any radius larger than R*, there is at least one pair at that Hamming distance from a training pair which, if it is input to BAM, does not converge to the correct training pair; i.e., the noise tolerance set associated with q* cannot be enlarged.

Proof: We divide the proof into three parts. The first is to show that a unique R* satisfies inequality (14). The second is to prove that R* is the maximum energy well hyper-radius. The last is to show that beyond R* at least one input pair is not recalled correctly.

First, for a given training weight vector and energy well hyper-radius, the objective function depends on the training pair set. Because a penalty value is subtracted whenever a pair at radius r has energy lower than or equal to that of a pair at radius r - 1, and because the penalty series is strictly decreasing, the objective function (9) takes its largest value when only one neighbor pair violates this condition and its lowest value when every neighbor pair violates it. Hence inequality (14) holds, and it can be shown by contradiction that only one R* satisfies it: if a second radius also satisfied (14), comparing the corresponding objective values as in (15) would contradict the strict monotonicity of the penalty series.

Second, R* is the maximum energy well hyper-radius. This can be proved by contradiction: if there were a weight vector whose associated energy well hyper-radius exceeded R*, its objective value would exceed the maximum attained at q*, which is inconsistent with (13) defining q* as the optimal solution.

Third, since R* is the maximum energy well hyper-radius, for any radius greater than R* there is at least one neighbor pair whose energy is lower than or equal to that of a pair one Hamming distance closer to a training pair. If this neighbor pair is input to BAM, the output pair will not be the correct training pair. So, there is at least one input pair more than R* Hamming distance away from the training pairs such that, if it is input to BAM, the network does not converge to the correct training pair. Hence, the optimum tolerance set is the union of the sets of pairs within R* Hamming distance of the training pairs.

Remarks: The optimum noise tolerance set will be called the maximum noise tolerance set. Note that each set S_i is the maximum basin of attraction for the training pair (A_i, B_i); that is, the optimum noise tolerance set is the union of the maximum basins of attraction. It is defined for a fixed training pair set. It is possible to use some method, such as the dummy augmentation in [2], to change the set of training pairs to one with increased separation between the training pairs but with the same information content. Intuitively, this can increase the probability of finding a larger maximum noise tolerance set because the upper bound on the energy well hyper-radius increases. There are three types of neighbors for BAM: 1) those within the maximum energy well hyper-radius, whose output pairs converge to the correct training pairs; 2) those whose deviations from every training pair are beyond the upper bound, whose output pairs will not converge to the correct training pairs; and 3) others, which may or may not be recalled correctly.

Definition 3: For the fixed upper bound on the energy well hyper-radius, we define r_t as a tentative value of the hyper-radius.

Definition 4: In the Maximum Noise Tolerance Theorem, if we replace the optimum hyper-radius R* by the tentative value r_t and maximize the corresponding objective function, we obtain a training weight vector q(r_t). Since q(r_t) is not unique, we denote the set of all such weight vectors by Q(r_t).

Lemma 3: The maximized objective function obtained with the tentative value r_t satisfies the analog of inequality (14) if and only if r_t does not exceed the maximum energy well hyper-radius.
Proof: From the proof of the Maximum Noise Tolerance Theorem, if the maximized objective satisfies the analog of (14), then any pair within r_t Hamming distance of a training pair has higher energy than any pair one Hamming distance closer, so r_t does not exceed the maximum hyper-radius. On the other hand, if r_t does not exceed the maximum hyper-radius, then by Definition 1 any pair within r_t Hamming distance has higher energy than any pair one Hamming distance closer, so no penalty is incurred and the analog of (14) holds.

Lemma 4: If r_t does not exceed the maximum energy well hyper-radius, then any training weight vector in Q(r_t) recalls correctly every pattern less than r_t Hamming distance away from a training pair.
Proof: By Lemma 3, if r_t does not exceed the maximum hyper-radius, then any pattern within r_t Hamming distance of a training pair is recalled correctly using the training weight q(r_t); in particular, so is any pattern less than r_t Hamming distance away. Considering Definition 1, the claim follows.

Theorem 2: For any q in Q(r_t), if the maximized objective satisfies the analog of (14) at r_t but fails to satisfy it at r_t + 1, then the maximum energy well hyper-radius is exactly r_t.
Proof: By Definition 4 and Lemmas 3 and 4, we consider the four possible combinations of the two conditions at r_t and at r_t + 1. By Lemma 4, the combinations in which the condition holds at r_t imply that r_t does not exceed the maximum hyper-radius; by Lemma 3, the combination in which the condition also holds at r_t + 1 is not possible under the hypothesis. From the above, we conclude that the maximum energy well hyper-radius equals r_t.

Remarks: Theorem 2 is very useful in saving computation time. Based on the fact that the smaller the tentative radius r_t, the shorter the computation time, we can pick a small tentative r_t and calculate a weight vector in Q(r_t). If we conclude by Lemma 3 that r_t does not exceed the maximum hyper-radius, we increase r_t by 1 and recalculate, by Lemma 4, until the condition fails.
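Theorem 2 suggests searching for the maximum hyper-radius incrementally: optimize the weights for a small tentative radius and increase the radius only while the optimized objective still reports no violations. The sketch below pairs that loop with a tiny real-coded GA; the GA operators, population size, weight range, and the convention that a negative objective signals a violation are all illustrative assumptions (the paper uses a standard GA but its settings are not reproduced here).

```python
import numpy as np

def simple_ga(fitness, dim, pop_size=40, generations=200,
              lo=0.1, hi=10.0, mut_rate=0.2, rng=None):
    """Tiny real-coded GA: tournament selection, uniform crossover,
    Gaussian mutation. Returns the best weight vector found."""
    rng = np.random.default_rng(0) if rng is None else rng
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([fitness(ind) for ind in pop])
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            i, j = rng.integers(pop_size, size=2)
            p1 = pop[i] if fit[i] >= fit[j] else pop[j]   # tournament parent 1
            i, j = rng.integers(pop_size, size=2)
            p2 = pop[i] if fit[i] >= fit[j] else pop[j]   # tournament parent 2
            mask = rng.random(dim) < 0.5                  # uniform crossover
            child = np.where(mask, p1, p2)
            mutate = rng.random(dim) < mut_rate           # per-gene mutation
            child = np.clip(child + mutate * rng.normal(0, 0.5, dim), lo, hi)
            children.append(child)
        pop = np.array(children)
        fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(fit)]

def max_hyper_radius(objective, X, Y, r_upper):
    """Increase the tentative radius one step at a time, re-optimizing the
    weights at each step, and stop when the optimized objective reports a
    violation (assumed to be signaled by a negative value, as in the
    surrogate_objective sketch shown earlier)."""
    best_q, best_r = None, 0
    for r in range(1, r_upper + 1):
        q = simple_ga(lambda w: objective(w, X, Y, r), dim=len(X))
        if objective(q, X, Y, r) < 0:   # some neighbor within r not dominated
            break
        best_q, best_r = q, r
    return best_q, best_r
```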
IV. COMPUTER SIMULATIONS

A numerical example taken from [2] is given in this section to evaluate the performance of the extended BAM with optimized training weights. Suppose one wants to distinguish the three pattern pairs shown in Fig. 2.

Fig. 2. Three training pairs.

For this training set, the upper bound on the energy well hyper-radius is 26. Since 26 is a relatively large number, we use the methodology presented in Theorem 2 and pick 1 as the first tentative value of the hyper-radius. In this example, to find the optimum training weights, the objective function defined in (8) is used as the fitness function of the genetic algorithm (GA). The advantage of the algorithm proved in Theorem 2 is shown in Fig. 3.

Fig. 3. Maximum F versus computation time.

We used 10 000 randomly generated samples to test the optimized BAM. All training pairs were recalled correctly, and all noisy input pairs less than 4 Hamming distance away from the training pairs converged to the correct training pair. We also found a pattern 5 Hamming distance away from training pair 1 that cannot be recalled correctly, as shown in Fig. 4.

Fig. 4. Input pattern with 5 Hamming distance away from training pair 1 (upper) versus the wrong result (lower).

We also compared our optimized BAM with the methodology in [2] and [3]. The simulation results in Figs. 5 and 6 show that our method can find the maximum noise tolerance set, which is not guaranteed by the algorithms in [2] and [3].

Fig. 5. Pattern with 4 Hamming distance away from training pair 2 (upper) cannot be recalled by the methodology in [2] and [3].

Fig. 6. The same pattern can be recalled by the optimized BAM.
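The random-sample test reported above can be reproduced in outline as follows: flip a fixed number of randomly chosen input bits, run the recall iteration, and record how often the correct output pattern is reached at each noise level. The recall routine, sample counts, and function names below are illustrative assumptions; the paper's test used 10 000 random samples against the three training pairs of Fig. 2.

```python
import numpy as np

def flip_random_bits(x, k, rng):
    """Return a copy of bipolar vector x with exactly k randomly chosen bits flipped."""
    idx = rng.choice(len(x), size=k, replace=False)
    v = x.copy()
    v[idx] *= -1
    return v

def recall(M, a, max_iters=100):
    """Homogeneous BAM recall by alternating sign-threshold updates."""
    b = np.where(a @ M >= 0, 1, -1)
    for _ in range(max_iters):
        a_new = np.where(M @ b >= 0, 1, -1)
        b_new = np.where(a_new @ M >= 0, 1, -1)
        if np.array_equal(a_new, a) and np.array_equal(b_new, b):
            break
        a, b = a_new, b_new
    return a, b

def noise_tolerance_test(M, X, Y, max_flips, samples_per_level=1000, seed=0):
    """Empirically estimate the recall rate of noisy inputs at each Hamming distance."""
    rng = np.random.default_rng(seed)
    rates = {}
    for k in range(1, max_flips + 1):
        correct = 0
        for _ in range(samples_per_level):
            i = rng.integers(len(X))
            a0 = flip_random_bits(X[i], k, rng)
            _, b = recall(M, a0)
            correct += np.array_equal(b, Y[i])
        rates[k] = correct / samples_per_level
    return rates
```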
V. CONCLUSION

We extended the basic BAM by using optimized weights in the correlation matrix. For a given set of training pairs, we determined the weights for the training pairs in the BAM correlation matrix that result in the maximum noise tolerance set. Any noisy input pair within the tolerance set converges to the correct training pair. We proved that, for a given set of training pairs, the maximum noise tolerance set is the largest in the sense that at least one pair, with Hamming distance one larger than the hyper-radius associated with the optimum noise tolerance set, will not converge to the correct training pair. A standard GA was used to calculate the weights that maximize the objective function. For BAM applications, the speed of encoding is relatively less important than that of decoding, because the encoding can be calculated offline. However, if adaptive encoding is needed to incorporate new desired pairs in a real-time simulation, the training time should be as short as possible. In the example of this paper, a standard GA was used. It worked well but was relatively inefficient: many generations and fitness evaluations were needed to find the optimal solution, so the calculation times were long. Since this calculation is offline, the limitation is not serious.

ACKNOWLEDGMENT

The authors acknowledge helpful discussions with Dr. G. Chen.

REFERENCES

[1] B. Kosko, “Bidirectional associative memories,” IEEE Trans. Syst., Man, Cybern., vol. 18, no. 1, pp. 49–60, Jan. 1988.
[2] Y.-F. Wang, J. B. Cruz, Jr., and J. H. Mulligan, Jr., “Two coding strategies for bidirectional associative memory,” IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 81–92, Jan. 1990.
[3] Y.-F. Wang, J. B. Cruz, Jr., and J. H. Mulligan, Jr., “Guaranteed recall of all training pairs for bidirectional associative memory,” IEEE Trans. Neural Netw., vol. 2, no. 6, pp. 559–567, Nov. 1991.
[4] K. Haines and R. Hecht-Nielsen, “A BAM with increased information storage capacity,” in Proc. IEEE ICNN ’88, vol. 1, Jul. 1988, pp. 181–190.
[5] R. McEliece, E. Posner, E. Rodemich, and S. Venkatesh, “The capacity of the Hopfield associative memory,” IEEE Trans. Inf. Theory, vol. IT-33, no. 4, pp. 461–482, Jul. 1987.
[6] J. Anderson, “A memory storage model utilizing spatial correlation functions,” Kybernetik, vol. 5, no. 3, pp. 113–119, 1968.
[7] W. G. Wee, “Generalized inverse approach to adaptive multiclass pattern classification,” IEEE Trans. Comput., vol. C-17, no. 12, pp. 1157–1164, Dec. 1969.
[8] T. Kohonen and M. Ruohonen, “Representation of associative pairs by matrix operations,” IEEE Trans. Comput., vol. C-22, no. 7, pp. 701–702, Jul. 1973.
[9] C. S. Leung, “Encoding method for bidirectional associative memory using projection on convex sets,” IEEE Trans. Neural Netw., vol. 4, no. 5, pp. 879–881, Sep. 1993.
[10] C. S. Leung, “Optimum learning for bidirectional associative memory in the sense of capacity,” IEEE Trans. Syst., Man, Cybern., vol. 24, no. 5, pp. 791–796, May 1994.
[11] B. Lenze, “Improving Leung’s bidirectional learning rule for associative memories,” IEEE Trans. Neural Netw., vol. 12, no. 5, pp. 1222–1226, Sep. 2001.
[12] C.-C. Wang and H.-S. Don, “An analysis of high-capacity discrete exponential BAM,” IEEE Trans. Neural Netw., vol. 6, no. 2, pp. 492–496, Mar. 1995.
[13] P. Liu and H. Li, “Efficient learning algorithms for three-layer regular feedforward fuzzy neural networks,” IEEE Trans. Neural Netw., vol. 15, no. 3, pp. 545–558, May 2004.
[14] G.-B. Huang, “Learning capability and storage capacity of two-hidden-layer feedforward networks,” IEEE Trans. Neural Netw., vol. 14, no. 2, pp. 274–281, Mar. 2003.
[15] D. Chakraborty and N. R. Pal, “A novel training scheme for multilayered perceptrons to realize proper generalization and incremental learning,” IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 1–14, Jan. 2003.
[16] P. K. H. Phua and D. Ming, “Parallel nonlinear optimization techniques for training neural networks,” IEEE Trans. Neural Netw., vol. 14, no. 6, pp. 1460–1468, Nov. 2003.
[17] J.-D. Hwang and F.-H. Hsiao, “Stability analysis of neural-network interconnected systems,” IEEE Trans. Neural Netw., vol. 14, no. 1, pp. 201–208, Jan. 2003.
[18] X. Liao and K.-W. Wong, “Robust stability of interval bidirectional associative memory neural network with time delays,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 2, pp. 1142–1154, Apr. 2004.

Dan Shen received the B.S. degree in automation from Tsinghua University, Beijing, China, in 1998 and the M.S. degree in electrical engineering from The Ohio State University (OSU), Columbus, in 2003. Currently, he is working toward the Ph.D. degree at OSU. From 1998 to 2000, he was with Softbrain Software Co., Ltd., Beijing, China, as a Software Engineer. He is currently a Graduate Research Associate in the Department of Electrical and Computer Engineering at OSU. His research interests include game theory and its applications, optimal control, and adaptive control.

Jose B. Cruz, Jr. (M’57–SM’61–F’68–LF’95) received the B.S. degree in electrical engineering (summa cum laude) from the University of the Philippines (UP) in 1953, the S.M. degree in electrical engineering from the Massachusetts Institute of Technology (MIT), Cambridge, in 1956, and the Ph.D. degree in electrical engineering from the University of Illinois, Urbana-Champaign, in 1959.
He is currently a Distinguished Professor of Engineering and Professor of Electrical and Computer Engineering at The Ohio State University (OSU), Columbus. Previously, he served as Dean of the College of Engineering at OSU from 1992 to 1997, Professor of electrical and computer engineering at the University of California, Irvine (UCI), from 1986 to 1992, and at the University of Illinois from 1965 to 1986. He was a Visiting Professor at MIT and Harvard University, Cambridge, in 1973 and Visiting Associate Professor at the University of California, Berkeley, from 1964 to 1965. He served as Instructor at UP in 1953–1954, and Research Assistant at MIT from 1954 to 1956. He is the author or coauthor of six books, 21 chapters in research books, and numerous articles in research journals and refereed conference proceedings. Dr. Cruz was elected as a member of the National Academy of Engineering (NAE) in 1980. In 2003, he was elected a Corresponding Member of the National Academy of Science and Technology (Philippines). He is also a Fellow of the American Association for the Advancement of Science (AAAS), elected 1989, and a Fellow of the American Society for Engineering Education (ASEE), elected in 2004. He received the Curtis W. McGraw Research Award of ASEE in 1972 and the Halliburton Engineering Education Leadership Award in 1981. He is a Distinguished Member of the IEEE Control Systems Society and received the IEEE Centennial Medal in 1984, the IEEE Richard M. Emberson Award in 1989, the ASEE Centennial Medal in 1993, and the Richard E. Bellman Control Heritage Award, American Automatic Control Council (AACC), 1994. In addition to membership in NAE, ASEE, and AAAS, he is a Member of the Philippine American Academy for Science and Engineering (Founding member, 1980, President 1982, and Chairman of the Board, 1998–2000), Philippine Engineers and Scientists Organization (PESO), National Society of Professional Engineers, Sigma Xi, Phi Kappa Phi, and Eta Kappa Nu. He served as a Member of the Board of Examiners for Professional Engineers for the State of Illinois, from 1984 to 1986. He served on various professional society boards and editorial boards, and he served as an officer of professional societies, including IEEE, where he was President of the Control Systems Society in 1979, Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, a Member of the Board of Directors from 1980 to 1985, Vice President for Technical Activities in 1982 and 1983, and Vice President for Publication Activities in 1984 and 1985. Currently, he serves as Chair (2004–2005) of the Engineering Section of the American Association for the Advancement of Science (AAAS).