CN114241245B - Image classification system based on residual capsule neural network
- Publication number: CN114241245B
- Application number: CN202111587100.5A
- Authority: CN (China)
- Prior art keywords: capsule, residual, layer, network, convolution
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; no legal analysis has been performed as to the accuracy of the status listed)
Classifications
- G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/25: Pattern recognition; fusion techniques
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
Abstract
The invention relates to the technical field of image classification, and in particular discloses an image classification system based on a residual capsule neural network. The system is provided with a residual capsule neural network comprising a first residual capsule module, a second residual capsule module, a third residual capsule module and a digital capsule layer; the first residual capsule module comprises a first residual convolution sub-network and a first initial capsule layer, the second residual capsule module comprises a second residual convolution sub-network and a second initial capsule layer, and the third residual capsule module comprises a third residual convolution sub-network and a third initial capsule layer. Based on the idea of residual learning, the invention realizes multi-level information reuse by connecting multi-level residual capsule modules in series, which alleviates the vanishing-gradient problem that arises as the network deepens, and introduces dilated convolution, which helps to extract more features. A reconstruction network is built from transposed convolutions, improving network performance while reducing the number of network parameters. Hardware deployment of the capsule network is realized with memristor crossbar arrays, addressing the network's large computational cost.
Description
Technical Field
The invention relates to the technical field of image classification, in particular to an image classification system based on a residual capsule neural network.
Background
Computer vision based on convolutional neural networks (CNNs) has developed rapidly in recent years, and its impact has spread to almost every industry, with applications in assisted driving, image processing, object recognition, and other fields. As research on convolutional neural networks has deepened, more and more new CNN structures have been proposed, and their performance has improved markedly. However, CNNs at this stage still have drawbacks. First, a CNN is not robust to affine transformations of an object: slight rotation or scaling of the object may lead to erroneous results. Although such problems can be reduced by using data augmentation during the model training phase, this does not make the network robust to arbitrary unknown new changes. Second, a CNN uses regularly sliding receptive fields to learn image features, which means it makes decisions based only on local features in the input image and does not take into account the relative relationships between features. In addition, pooling operations, introduced to reduce the parameters of deep CNNs, retain only the salient features in an image, which exacerbates this shortcoming.
A network architecture called the Capsule Network (CapsNet) overcomes the above drawbacks: data in the network is transformed from scalars into vector form, and the resulting neurons composed of multidimensional vectors are called capsules. To coordinate the transmission and computation of data in capsule neurons between layers of the network, a routing protocol between capsules was proposed, and experiments showed that CapsNet performs excellently on the MNIST dataset. Experiments also showed that the capsule network resists black-box and white-box attacks better than a convolutional network and has stronger robustness. However, the baseline capsule network model has a simple structure, so its performance on datasets with complex backgrounds is not very good. In recent years, with continued in-depth research on capsule networks, many researchers have optimized and improved them in terms of network structure, routing algorithms, and so on. For example, to address the poor performance caused by the simple CapsNet structure, researchers replaced the convolution layer of the capsule network with three densely connected sub-networks, enhancing the convolution layer's ability to extract image features, reducing the number of iterations of the new network, and obtaining good results on image classification problems. Researchers deepened the capsule network, proposed the concept of a deep capsule network, and put forward a dynamic routing method suited to 3D convolution to reduce network parameters, optimizing the network from the dimension of the capsule. Researchers improved the capsule routing algorithm, proposed a dynamic routing algorithm based on the attention mechanism, and implemented an activation function in the capsule network similar to ReLU. Meanwhile, researchers have expanded the application scenarios of the capsule network: a capsule network suitable for optical remote-sensing image processing was proposed that can handle pictures from different viewing angles and achieves satisfactory performance on smaller training sets; a self-attention capsule network (SACN) was proposed that can significantly improve the performance of neural networks on medical datasets; and capsule networks have been used for lung cancer screening in the medical field, where they were shown to perform better than CNNs on small datasets. Research on end-side deployment schemes for high-performance capsule networks therefore has positive significance for pushing capsule networks into application in real-world scenarios.
Compared with CNNs, capsule networks have more parameters and a larger computational load, yet at present almost all deep-learning algorithms run on computers based on the von Neumann architecture. The bottleneck of such computers is that the computing unit and the storage unit are separate: while the computer runs, transferring data between the computing unit and the storage unit consumes considerable power and introduces latency, making it difficult to achieve the efficient, low-power, real-time information processing of a biological brain. A deployment scheme for capsule networks on end-side devices therefore needs further study.
The capsule network is naturally resistant to white-box attacks and is a promising emerging research direction in the field of deep learning, but the baseline capsule network model performs poorly on datasets with complex backgrounds, and the network's large computational load hinders deployment on end-side devices.
Disclosure of Invention
The invention provides an image classification system based on a residual capsule neural network, which addresses the following technical problems: how to realize high-precision image classification based on a capsule network, and how to further make that classification efficient and low-power.
In order to solve the technical problems, the invention provides an image classification system based on a residual capsule neural network, which is provided with the residual capsule neural network; the residual capsule neural network comprises a first residual capsule module, a second residual capsule module, a third residual capsule module and a digital capsule layer; the first residual capsule module comprises a first residual convolution sub-network and a first initial capsule layer, the second residual capsule module comprises a second residual convolution sub-network and a second initial capsule layer, and the third residual capsule module comprises a third residual convolution sub-network and a third initial capsule layer;
The first residual convolution sub-network carries out convolution calculation on an input original image and then outputs a first group of feature images to the second residual convolution sub-network and the first initial capsule layer, the first initial capsule layer converts the first group of feature images into a first initial capsule group, and the digital capsule layer converts the first initial capsule group into a first digital capsule group;
the second residual convolution sub-network carries out convolution calculation on the input first group of feature images and then outputs a second group of feature images to the third residual convolution sub-network and the second initial capsule layer, the second initial capsule layer converts the second group of feature images into a second initial capsule group, and the digital capsule layer converts the second initial capsule group into a second digital capsule group;
The third residual convolution sub-network carries out convolution calculation on the input second group of feature images and then outputs a third group of feature images to the third initial capsule layer, the third initial capsule layer converts the third group of feature images into a third initial capsule group, and the digital capsule layer converts the third initial capsule group into a third digital capsule group;
the digital capsule layer splices the first initial capsule group, the second initial capsule group and the third initial capsule group to obtain a fourth initial capsule group and converts the fourth initial capsule group into a fourth digital capsule group;
and the digital capsule layer is used for outputting M new digital capsules for target classification and image reconstruction after splicing and fusing the four groups of digital capsules in the digital capsule layer, wherein M is equal to the total category number of image classification.
Preferably, the residual capsule neural network further comprises a reconstruction module, wherein the reconstruction module is used for generating a reconstructed image with the same size as the original image according to M new digital capsules.
Preferably, the loss function of the residual capsule neural network is:

$$L_{total} = \sum_{k} L_k + \eta L_r$$

wherein $L_{total}$ represents the total loss; $L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2$ represents the edge loss of capsule k in the last capsule layer; $T_k$ represents the matching parameter of the k-th class classification target corresponding to capsule k, with $T_k = 1$ if and only if the classification is correct and $T_k = 0$ otherwise; $v_k$ represents the activation vector of capsule k; $m^+$, $m^-$ and $\lambda$ are capsule-vector modulus-length control parameters, with $m^+ = 0.9$, $m^- = 0.1$, $\lambda = 0.5$; $L_r$ represents the reconstruction loss, equal to the mean square error between the pixels of the reconstructed image and of the original image; and $\eta$ represents the weight coefficient of the reconstruction loss in the total loss.
Preferably, the first residual convolution sub-network, the second residual convolution sub-network and the third residual convolution sub-network each comprise two convolution layers and a skip connection layer, and each convolution layer adopts 'same' convolution with dilated (hole) convolution introduced; the digital capsule layer converts the initial capsules into digital capsules through a dynamic routing algorithm and a squash activation function.
Preferably, in a convolution layer of the first residual convolution sub-network, the second residual convolution sub-network, or the third residual convolution sub-network, a matrix-vector multiplication operation having h inputs and o outputs is represented by the following formula:

$$y_m = \sum_{n=1}^{h} w_{mn} x_n, \quad m = 1, 2, \ldots, o \qquad (8)$$

In formula (8), $x_n$ represents the n-th element in the input vector, $w_{mn}$ represents the weight in the m-th row and n-th column of the weight matrix, and $y_m$ represents the m-th output in the output vector;
In the hardware design corresponding to the convolution layer, the matrix-vector multiplication with h inputs and o outputs is realized with a first 2D memristor crossbar array;
The first 2D memristor crossbar array comprises h rows and o columns of memristors; $x_1 \sim x_h$ are converted into the row input voltages $V_{x1} \sim V_{xh}$ of the array, the weight $w_{mn}$ is mapped to the conductance value of the memristor connecting input n and output m, each input voltage $V_{xn}$ is multiplied by the conductance of the corresponding memristor, the output currents flowing through the memristors of a column are superposed on the column wire to obtain the column output current, and the output current of each column is converted into a voltage $V_{ym}$ by a current-voltage conversion circuit and then output.
Preferably, for any one of the first initial capsule layer, the second initial capsule layer and the third initial capsule layer, the initial capsule layer has R capsule units, each capsule unit includes X convolution units with convolution kernel size U×U, and in the corresponding hardware design the operation of the initial capsule layer is implemented with a first 3D memristor crossbar array;
The first 3D memristor crossbar array comprises T layers of second 2D memristor crossbar arrays similar in structure to the first 2D memristor crossbar array, the T layers corresponding to T input feature maps; the memristor specification of each layer of second 2D memristor crossbar array is (R×X) columns by (U×U) rows, each layer operates independently, the outputs of different layers are connected in the same column, the output voltage of each layer is summed with the outputs of the other layers along the vertical column direction, and the T feature maps are converted into a one-dimensional voltage output through the first 3D memristor crossbar array;
After one voltage sequence, the first 3D memristor crossbar array outputs R×X one-dimensional voltage signals representing the X-dimensional vectors of R capsule units; after all voltage sequences have been passed through in serial-input mode, the initial capsule layer outputs Y×Y×R X-dimensional vectors, i.e., Y×Y×R initial capsules, where Y×Y×R represents the size of the output feature map.
Preferably, the digital capsule layer has four sub-layers for converting the four groups of initial capsules into the corresponding four groups of digital capsules; each sub-layer comprises M capsule units, the Y×Y×R X-dimensional vectors output by the initial capsule layer and the M capsule units of the sub-layer share Y×Y×R×M weight transformation matrices, and in the hardware design process these weight matrices are mapped onto Y×Y×R independent second 3D memristor crossbar arrays for parallel operation;
Each second 3D memristor crossbar array comprises M layers of third 2D memristor crossbar arrays similar in structure to the first 2D memristor crossbar array; the memristor specification of each layer of third 2D memristor crossbar array is V rows by W columns, where the V rows correspond to a V-dimensional input vector and the W columns correspond to a W-dimensional output vector, and the sub-layer finally outputs M W-dimensional vectors, i.e., M W-dimensional digital capsules.
Preferably, the dimensions of the digital capsules output by the four sub-layers of the digital capsule layer are $W_1$, $W_2$, $W_3$ and $W_4$ respectively, and after the digital capsule layer splices the four groups of digital capsules, M new digital capsules of dimension $W_1 + W_2 + W_3 + W_4$ are obtained for target classification and image reconstruction.
Preferably, the reconstruction module comprises a fully connected layer and four transposed convolution layers, wherein each transposed convolution layer first pads the input feature map with zeros according to a rule, and then performs a convolution operation on the zero-padded feature map;
Defining the size of the input feature map of a transposed convolution layer as H×H×N and the convolution stride as S, the size of each channel of the zero-padded input feature map is $H_{out} \times H_{out}$, calculated as:

$$H_{out} = H + (H - 1)\times(S - 1) + 2P + O \qquad (11)$$

The padding rule is as follows: S-1 zeros are inserted between every two pixels of the input feature map, then P rings of zeros are added around the feature map, and finally O rows of zeros are added below the feature map and O columns of zeros to its right.
Preferably, the dataset used for training and testing the residual capsule neural network comprises more than ten thousand grayscale or RGB color images in M = 10 classes, derived from the MNIST, CIFAR-10 or SVHN datasets.
The image classification system based on the residual capsule neural network has the beneficial effects that:
1. Based on the idea of residual learning, multi-level information reuse is realized by connecting multi-level residual capsule modules in series, alleviating the vanishing-gradient problem that appears as the network deepens; dilated convolution is introduced into the higher-level residual capsule modules, enlarging the receptive field so that the capsule network extracts more features and classification accuracy improves;
2. A reconstruction network built from transposed convolutions is provided, improving network performance while reducing the number of network parameters;
3. An efficient, low-power hardware deployment scheme for the image-classification capsule network is provided: the low power consumption of the memristor crossbar array and its support for parallel matrix computation are used to address the large computational load of the residual capsule neural network.
Drawings
FIG. 1 is a block diagram of the baseline capsule network model provided by an embodiment of the present invention;
FIG. 2 is a block diagram of the image classification system based on a residual capsule neural network provided by an embodiment of the present invention;
FIG. 3 is a block diagram of residual capsule module 1 in the residual capsule neural network provided by an embodiment of the present invention;
FIG. 4 is a schematic representation of dilated (hole) convolution provided by an embodiment of the present invention;
FIG. 5 is a block diagram of the transposed-convolution-based reconstruction network provided by an embodiment of the present invention;
FIG. 6 is a block diagram of the first 2D memristor crossbar array provided by an embodiment of the present invention;
FIG. 7 is a block diagram of the first 3D memristor crossbar array provided by an embodiment of the present invention;
FIG. 8 is a block diagram of the second 3D memristor crossbar array provided by an embodiment of the present invention;
FIG. 9 is a comparison graph of the losses of the MRCapsNet residual capsule modules provided by an embodiment of the present invention;
FIG. 10 is a graph comparing the reconstruction errors of the MRCapsNet and CapsNet networks on the CIFAR-10 dataset, provided by an embodiment of the present invention.
Detailed Description
The following examples are given for the purpose of illustration only and are not to be construed as limiting the invention; the drawings are for reference and description only and are not to be construed as limiting the scope of the invention, since many variations are possible without departing from its spirit and scope.
In a capsule neural network, the capsule is the basic element of the network: it is composed of a group of vector neurons, and its input and output are vectors. The modulus length of a capsule's vector represents the probability that the classified entity exists, and each element of the vector represents feature information of the entity. Similar to a CNN, a capsule network can only detect low-level attributes when its receptive field is small, while it can obtain more numerous and more complex feature attributes when the receptive field is large. Figure 1 shows the simple structure of CapsNet (the baseline capsule network model), which consists of four parts: a convolution layer, a primary capsule (PrimaryCaps) layer, a digital capsule (DigitCaps) layer, and a fully connected layer. Data flows through the capsule layers in vector form according to the dynamic routing algorithm. During training, each low-level capsule outputs a prediction vector to the next-level capsules; by comparison with the actual label, if the prediction result and the actual label match, the coupling coefficient between the two capsules increases.
In the capsule dynamic routing algorithm, let $u_i$ denote the activation vector of capsule i in the lower capsule layer, and let $\hat{u}_{j|i}$ denote the prediction vector of capsule i in the lower layer for capsule j in the next layer. Then:

$$\hat{u}_{j|i} = W_{ij} u_i \qquad (1)$$

where $W_{ij}$ denotes the weight transformation matrix used for matrix transforms between parts and the whole. Also:

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})} \qquad (2)$$

$$v_j = \mathrm{squash}(s_j) = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\,\frac{s_j}{\|s_j\|}, \quad s_j = \sum_i c_{ij}\,\hat{u}_{j|i} \qquad (3)$$

$$b_{ij} = b_{ij} + \hat{u}_{j|i} \cdot v_j \qquad (4)$$

Here $c_{ij}$ denotes the coupling weight of capsule i in the lower capsule layer toward capsule j in the next capsule layer (which contains m capsules), calculated by the softmax function; the input $s_j$ of capsule j is the weighted sum of the $c_{ij}$ and $\hat{u}_{j|i}$, and is passed through the nonlinear activation function squash to keep the vector length in [0, 1) so that it can represent a probability of existence; $v_j$ denotes the activation vector of capsule j in the upper capsule layer; and $b_{ij}$ denotes the degree of matching between capsule i and capsule j, j = 1, 2, 3, ..., m. At the first iteration, $b_{ij}$ is initialized to 0, so equal probability weights $c_{ij}$ are computed. The parameter values are then updated iteratively by the above formulas; after 3 iterations, the weight distribution tends to converge.
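For concreteness, the routing procedure of equations (1) through (4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the capsule counts and dimensions in the example call are placeholders, and the weights are random stand-ins for trained values.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # Squash nonlinearity: preserves direction, maps length into [0, 1).
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u, W, iterations=3):
    # u: (n_in, d_in) lower-layer capsule activations.
    # W: (n_in, n_out, d_in, d_out) weight transformation matrices.
    u_hat = np.einsum('ijvw,iv->ijw', W, u)        # prediction vectors, eq. (1)
    b = np.zeros(u_hat.shape[:2])                  # routing logits, initialized to 0
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax, eq. (2)
        v = squash(np.einsum('ij,ijw->jw', c, u_hat))         # weighted sum + squash, eq. (3)
        b = b + np.einsum('ijw,jw->ij', u_hat, v)             # agreement update, eq. (4)
    return v

rng = np.random.default_rng(0)
v = dynamic_routing(rng.standard_normal((8, 12)),
                    0.1 * rng.standard_normal((8, 10, 12, 16)))
print(v.shape)  # (10, 16): 10 upper-layer capsules of dimension 16
```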
For capsule k in the last layer of the capsule network, the edge loss $L_k$ is calculated by the following formula:

$$L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2 \qquad (6)$$

In formula (6), $v_k$ represents the activation vector of capsule k; $T_k$ denotes the matching parameter of the k-th class classification target corresponding to capsule k, with $T_k = 1$ if and only if the classification is correct and $T_k = 0$ otherwise; and $m^+$, $m^-$, $\lambda$ are capsule-vector modulus-length control parameters, with $m^+ = 0.9$, $m^- = 0.1$, $\lambda = 0.5$. Setting $\lambda$ to 0.5 prevents the lengths of all capsules from being driven down by erroneous classification results.
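A minimal NumPy sketch of the edge loss of equation (6), using the m+ = 0.9, m- = 0.1, λ = 0.5 values from the text; the total_loss helper anticipates the reconstruction-loss weighting η = 0.512 described later in the description and is an assumption in that respect.

```python
import numpy as np

def margin_loss(v, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    # v: (M, d) activation vectors of the last capsule layer; T: (M,) one-hot labels.
    length = np.linalg.norm(v, axis=-1)  # ||v_k||, the existence probability
    L_k = (T * np.maximum(0.0, m_pos - length) ** 2
           + lam * (1 - T) * np.maximum(0.0, length - m_neg) ** 2)
    return L_k.sum()

def total_loss(v, T, recon, original, eta=0.512):
    # Edge loss plus weighted reconstruction mean-square error.
    L_r = np.mean((recon - original) ** 2)
    return margin_loss(v, T) + eta * L_r
```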
In order to achieve high-precision image classification based on a capsule network and give the classification network good expressive power on datasets with complex backgrounds, this embodiment provides an image classification system based on a residual capsule neural network. Because the network is implemented with memristors in the hardware design described below, it is also called the memristive residual capsule network (Memristive Residual CapsNet, MRCapsNet). Its model structure, shown in fig. 2, comprises a first residual capsule module (residual capsule module 1), a second residual capsule module (residual capsule module 2), a third residual capsule module (residual capsule module 3), a digital capsule layer (Digit capsule layer) and a reconstruction module (reconstruction network); the first residual capsule module comprises a first residual convolution sub-network and a first initial capsule layer, the second residual capsule module comprises a second residual convolution sub-network and a second initial capsule layer, and the third residual capsule module comprises a third residual convolution sub-network and a third initial capsule layer.
The first residual convolution sub-network carries out convolution calculation on an input original image and then outputs a first group of feature images to the second residual convolution sub-network and a first initial capsule layer, the first initial capsule layer converts the first group of feature images into a first initial capsule group, and the digital capsule layer converts the first initial capsule group into a first digital capsule group.
The second residual convolution sub-network carries out convolution calculation on the input first group of feature images and then outputs a second group of feature images to the third residual convolution sub-network and a second initial capsule layer, the second initial capsule layer converts the second group of feature images into a second initial capsule group, and the digital capsule layer converts the second initial capsule group into a second digital capsule group.
The third residual convolution sub-network carries out convolution calculation on the input second group of feature images and then outputs a third group of feature images to a third initial capsule layer, the third initial capsule layer converts the third group of feature images into a third initial capsule group, and the digital capsule layer converts the third initial capsule group into a third digital capsule group.
The digital capsule layer splices the first initial capsule group, the second initial capsule group and the third initial capsule group to obtain a fourth initial capsule group and converts the fourth initial capsule group into a fourth digital capsule group.
The digital capsule layer is used for outputting M new digital capsules for target classification and image reconstruction after splicing and fusing four groups of digital capsules in the digital capsule layer, wherein M is equal to the total category number of image classification.
The reconstruction module is used for generating a reconstructed image with the same size as the original image according to the M new digital capsules.
The memristive residual capsule network has a multi-level-reuse neural network structure containing 3 residual capsule modules (ResCapsBlock). The output of the first-stage ResCapsBlock (the first residual capsule module / residual capsule module 1) is used to create the first-stage Digit capsules (the first digital capsule group) while also serving as the input to the second-stage ResCapsBlock (the second residual capsule module / residual capsule module 2); similarly, the outputs of the second-stage and third-stage ResCapsBlocks (the third residual capsule module / residual capsule module 3) are created in the manner described above. The three serially connected residual capsule modules realize multi-level feature reuse, and each stage's residual capsule module also outputs Digit capsules in parallel; these contain features of different granularities learned at different levels, and the capsules are spliced and fused in the Digit capsule layer.
Taking the picture (32×32×3) in the CIFAR-10 dataset as an example of the input image, the output feature map size of each residual capsule module under the CIFAR-10 dataset is shown in table 1.
TABLE 1
The memristive residual capsule network takes images as input and uses capsules as the unit of data flow within the network; training the network based on multi-level feature reuse and capsule splicing makes it better suited to classifying complex images.
Taking residual capsule module 1 as an example, its structure is shown in fig. 3. The module adopts the residual-learning idea proposed by ResNet (residual network) and comprises two convolution layers, a skip connection layer and an initial (Primary) capsule layer; unlike ResNet, and in order to prevent loss of spatial information, the module contains no pooling layer. In residual capsule module 1, dilated (hole) convolution is used to increase the receptive field of the deep network; its principle is shown in fig. 4. The receptive field is enlarged by adding holes to the convolution kernel, so that with the same parameter count and computation, the output of each convolution operation covers a larger range of information. Dilated convolution introduces a hyperparameter called the "dilation rate", which defines the spacing between the points of the convolution kernel (a small geometry sketch follows this paragraph). In residual capsule module 1, the input image (32×32×3) first passes through the first residual convolution sub-network (Res Convolution Subnetwork); after two 'same' convolution calculations, 256 feature maps (Feature-map, 32×32×256) are output and then input to the Primary capsule layer. This layer contains 8 capsule units, each containing 12 convolution units with convolution kernel size 5×5; the feature maps (a third-order tensor) are converted into feature capsules (a fourth-order tensor) through convolution, Reshape and Squash operations, and after the Primary capsule layer, the network's data are transmitted and computed in capsule form. Residual capsule modules 2 and 3 are constructed on a principle similar to residual capsule module 1.
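The effect of the dilation rate on the receptive field can be checked with a short geometry helper; this is an illustrative sketch, and the rate = 2 default is an assumed example value, since the text does not list the rates used per layer.

```python
def dilated_conv_geometry(kernel=3, rate=2, stride=1, in_size=32):
    # Effective kernel size after inserting (rate - 1) holes between taps.
    k_eff = kernel + (kernel - 1) * (rate - 1)
    # 'same' convolution: pad so that the output keeps the input spatial size.
    pad = (k_eff - 1) // 2
    out_size = (in_size + 2 * pad - k_eff) // stride + 1
    return k_eff, pad, out_size

# A 3x3 kernel at dilation rate 2 covers a 5x5 window but still has only 9 weights.
print(dilated_conv_geometry())  # (5, 2, 32)
```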
The Digit capsule layer is used to identify the classification result. The test datasets in this example are all 10-class problems, so the digital capsule layer outputs 10 capsule units. As shown in fig. 2, the memristive residual capsule network includes three residual capsule modules, which respectively output PrimaryCaps1 (the first initial capsule group), PrimaryCaps2 (the second initial capsule group) and PrimaryCaps3 (the third initial capsule group); each Primary capsule group forms a corresponding digital capsule group (Digit capsules) through the dynamic routing algorithm and the squash activation function, giving the first digital capsule group (DigitCaps1), the second digital capsule group (DigitCaps2) and the third digital capsule group (DigitCaps3). As the information extracted by each stage's residual convolution sub-network gradually decreases, the size of the Digit capsules also gradually decreases: DigitCaps1 (10×16D, D denotes dimension), DigitCaps2 (10×12D), DigitCaps3 (10×10D). In addition, to fuse the information of different granularities extracted by the different residual convolution sub-networks, the three independent initial capsule groups are spliced at the capsule level to form a combined initial capsule group PrimaryCaps4 (the fourth initial capsule group), which likewise passes through the dynamic routing algorithm and the squash activation function to form the fourth digital capsule group DigitCaps4 (10×16D). Finally, after the Digit capsule layer completes the capsule-level splicing of the four digital capsule groups, it outputs 10 digital capsules (10×54D) for classification and image reconstruction, as sketched below.
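A minimal NumPy sketch of this capsule-level splicing, with the shapes given above (zeros stand in for the actual capsule activations):

```python
import numpy as np

digit_caps_1 = np.zeros((10, 16))  # DigitCaps1: 10 capsules x 16D
digit_caps_2 = np.zeros((10, 12))  # DigitCaps2: 10 capsules x 12D
digit_caps_3 = np.zeros((10, 10))  # DigitCaps3: 10 capsules x 10D
digit_caps_4 = np.zeros((10, 16))  # DigitCaps4: 10 capsules x 16D

# 10 new digital capsules of dimension 16 + 12 + 10 + 16 = 54.
fused = np.concatenate([digit_caps_1, digit_caps_2,
                        digit_caps_3, digit_caps_4], axis=1)
print(fused.shape)  # (10, 54)
```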
As shown in fig. 5, the reconstruction module/network consists of one fully connected layer and four transposed convolution layers. Unlike the fully connected reconstruction network in the baseline capsule network model, the MRCapsNet reconstruction network creates more spatial relationships between the features of interest when reconstructing an image. The input of the reconstruction network is the output of the Digit capsule layer, and the output is a reconstructed image of the same size as the original image.
The loss of MRCapsNet is composed of two parts: the edge loss and the reconstruction loss.
The length of a vector in the capsule network indicates the probability that the capsule's classification entity exists; for example, if and only if a class-k classification object is present in the image should the class-k capsule in the highest-level digital capsule layer ideally have the longest vector modulus length. For multi-object classification in one image, the edge loss function uses a separate margin loss $L_k$ for each object:

$$L_k = T_k \max(0, m^+ - \|v_k\|)^2 + \lambda (1 - T_k) \max(0, \|v_k\| - m^-)^2 \qquad (6)$$

where $T_k = 1$ if and only if a classification target of class k is present in the image, and $m^+ = 0.9$, $m^- = 0.1$, $\lambda = 0.5$.
This example uses a reconstruction loss to assist capsule network training; the reconstruction loss $L_r$ represents the mean square error between the pixels of the reconstructed image and of the original image.
To balance the ratio of the edge loss to the reconstruction loss, the weight coefficient of the reconstruction loss in the total loss is η; in this scheme η is taken as 0.512. The total loss $L_{total}$ is calculated as follows:

$$L_{total} = \sum_{k=1}^{M} L_k + \eta L_r \qquad (7)$$

In the experiments, the loss is calculated after the multi-level residual capsule blocks are spliced; this combination strengthens the fusion of the features learned by the deep capsules and the shallow capsules.
The training and inference of a capsule network require a large number of matrix multiplication operations, which consume considerable computation time and power. A traditional computer based on the von Neumann architecture separates its memory unit from its computing unit and therefore has a computational bottleneck. In 1990 the concept of "neuromorphic computing" was first proposed: the idea is to combine analog circuits with very-large-scale integrated circuits to simulate the biological nervous system and thereby obtain neuromorphic systems approaching the level of human brain intelligence. In current research on neuromorphic computing devices, memristors are commonly used to emulate biological neurons and to build artificial synapses. The memristor was proposed by Leon Chua in 1971, and a nanoscale memristive element was first fabricated by Hewlett-Packard Laboratories in 2008, laying the foundation for subsequent applications of memristive elements. Nanoscale memristive elements support non-volatile storage, have low power consumption, and are compatible with CMOS systems. In recent years many researchers have proposed neuromorphic computing architectures built on memristors. For example, researchers designed a memristor-CMOS chip that realizes high-speed multiply-accumulate operations and carried out experiments such as principal component analysis and sparse coding on the chip, providing a hardware solution for low-power end-side deployment of algorithms; researchers also built a fully hardware-implemented computing-in-memory system with a non-von-Neumann architecture, composed of multiple memristor crossbar arrays, whose energy efficiency is two orders of magnitude higher than that of a graphics processor and which achieved accuracy comparable to a software algorithm in image recognition experiments, further demonstrating the potential of memristive neuromorphic computing systems for end-side deployment of neural networks.
Based on the memristor crossbar array, this scheme realizes the capsule neural network in memristor crossbar hardware: any matrix can be stored by mapping it to the conductance values of the memristors in a crossbar array, and parallel matrix operation is realized through the current and voltage accumulation characteristics of the crossbar. Storage and operation are both completed inside the crossbar array, no data need be transferred between a computing unit and a storage unit, and computing-in-memory is realized.
A hybrid crossbar circuit composed of memristive elements and CMOS elements is constructed; convolution operations are equivalently converted into matrix-vector multiplications, and these multiplications are performed in parallel in the memristor crossbar array.
In each residual convolution sub-network, to avoid shrinking the output feature map and losing image edge information, this embodiment adopts the 'same' convolution mode, implemented by zero-padding the edges of the input image; with a convolution stride of 1, the feature map output by the convolution operation has the same size as the input image. When 'same' convolution is implemented with a memristor crossbar array, this is achieved by setting the corresponding input signals to 0.
In a convolutional neural network, a matrix-vector multiplication with h inputs and o outputs can be written as:

$$y_m = \sum_{n=1}^{h} w_{mn} x_n, \quad m = 1, 2, \ldots, o \qquad (8)$$

In formula (8), $x_n$ represents the n-th element of the input vector, $w_{mn}$ the weight in row m, column n of the weight matrix, and $y_m$ the m-th element of the output vector.
In the hardware design corresponding to the convolution layer, this embodiment uses a first 2D memristor crossbar array to realize the matrix-vector multiplication with h inputs and o outputs. As shown in fig. 6, the circuit includes h row and o column first wires that cross without touching, a second wire crossing (but not touching) the input ends of the h row wires, and a third wire crossing (but not touching) the input ends of the o column wires; the second and third wires are connected through an amplifying circuit.
Each pair of crossing first wires is connected by a memristor at the crossing point; each crossing of a first wire with the second wire, and of a first wire with the third wire, is connected by a first auxiliary fixed-value resistor ($R_a$); and the output end of each of the o column wires is connected in series with a current-voltage conversion circuit.
The amplifying circuit comprises a first auxiliary fixed-value resistor and a first operational amplifier; the inverting input and the output of the first operational amplifier are connected to the second and third wires respectively, the non-inverting input is grounded, and the first auxiliary fixed-value resistor is connected in parallel between the inverting input and the output of the operational amplifier.
The current-voltage conversion circuit comprises a second auxiliary fixed-value resistor ($R_c$) and a second operational amplifier; the second auxiliary fixed-value resistor is connected in parallel between the inverting input and the output of the second operational amplifier, the output of the second operational amplifier serves as the final voltage output, and its non-inverting input is grounded.
The wires and memristors used in the first 2D memristor crossbar array are both nanoscale.
Of course, circuit structures other than the one shown in fig. 6 may be used for the first 2D memristor crossbar array, as long as the same function is achieved.
When the first 2D memristor crossbar array operates, the input vector $x_1 \sim x_h$ in formula (8) is converted into the row input voltages $V_{x1} \sim V_{xh}$ of the array, and the weight $w_{mn}$ is mapped to the conductance value of the memristor connecting input n and output m; each input voltage $V_{xn}$ is multiplied by the conductance of the corresponding memristor, the output currents flowing through the memristors of a column are superposed on the column wire to give the column output current, and the output current of each column is converted into the voltage $V_{ym}$ by the current-voltage conversion circuit and then output. According to Kirchhoff's law, the output $V_{ym}$ of the first 2D memristor crossbar array is given by formula (9), and the weight $w_{mn}$ by formula (10):

$$V_{ym} = R_c \sum_{n=1}^{h} (G_a - G_{mn})\, V_{xn} = \sum_{n=1}^{h} w_{mn} V_{xn} \qquad (9)$$

$$w_{mn} = (G_a - G_{mn}) \cdot R_c, \quad m = 1, 2, \ldots, o;\; n = 1, 2, \ldots, h \qquad (10)$$

where $R_a$ is the first auxiliary fixed-value resistor, $G_a$ is the conductance of $R_a$, $R_c$ is the second auxiliary fixed-value resistor, and $G_{mn}$ is the conductance of the memristor connecting input n and output m in the first 2D memristor crossbar array.
The weights are written into the first 2D memristor crossbar array by adjusting the conductance value $G_{mn}$ of the memristor at each position. Since the memristor conductance $G_{mn}$ is always positive and cannot itself represent a negative weight, the fixed-value resistor $R_a$ is introduced; taking its conductance as $G_a = (G_{on} + G_{off})/2$, where $G_{on}$ and $G_{off}$ denote the conductance values of the memristor's lowest-resistance and highest-resistance states, the representable weight range of a memristor in the crossbar array is $[-(G_{on} - G_{off}) \cdot R_c/2,\; (G_{on} - G_{off}) \cdot R_c/2]$.
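A behavioral sketch of this mapping, assuming ideal devices; the R_c, G_on and G_off values are assumed example numbers, chosen only so that the scaled weights fall inside the programmable range.

```python
import numpy as np

def program_crossbar(W, R_c=1e4, G_on=1e-3, G_off=1e-6):
    # Mid-point reference conductance G_a lets positive-only G_mn encode signed weights.
    G_a = (G_on + G_off) / 2.0
    G = G_a - W / R_c  # from w_mn = (G_a - G_mn) * R_c, i.e. G_mn = G_a - w_mn / R_c
    assert np.all((G >= G_off) & (G <= G_on)), "weight outside programmable range"
    return G, G_a

def crossbar_matvec(G, G_a, V_x, R_c=1e4):
    # Kirchhoff summation of column currents, then current-to-voltage conversion.
    return R_c * (G_a * V_x.sum() - G @ V_x)  # equals W @ V_x, per equation (9)

rng = np.random.default_rng(1)
W = 4.0 * rng.uniform(-1, 1, size=(64, 27))  # 64 outputs, 27 inputs
V_x = rng.uniform(0.0, 0.2, size=27)         # read voltages within threshold
G, G_a = program_crossbar(W)
print(np.allclose(crossbar_matvec(G, G_a, V_x), W @ V_x))  # True
```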
Taking the first convolution layer in the first residual convolution sub-network as an example: it uses convolution kernels of size 3×3 with 3 input channels and 64 output channels. In hardware, each convolution kernel maps to one column vector of length 27 in the 2D memristor crossbar array of fig. 6, so h = 27 and o = 64. In the MRCapsNet deployment process, the 3×3×3×64 weights are mapped to the conductance values of the memristors in the 27×64 crossbar array. In the network inference stage, the test-set inputs are converted into read voltages within the voltage threshold and applied to the crossbar; the output voltages at the terminals $V_{y1} \sim V_{y64}$ represent the output values of the corresponding channels of the capsule network. The same 2D memristor crossbar array design applies equally to the hardware deployment of the other convolution layers in the first, second and third residual convolution sub-networks.
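How a convolution unrolls into such crossbar reads can be illustrated with an im2col-style sketch: each 3×3×3 kernel becomes one 27-element column, and each image patch becomes one input-voltage vector. The function below is a software illustration under the assumption of stride 1 and a pre-padded input, not the hardware dataflow itself.

```python
import numpy as np

def conv_as_matvec(image, kernels):
    # image: (H, W, C), already zero-padded for 'same' convolution.
    # kernels: (k, k, C, O) -> a (k*k*C, O) matrix; each kernel is one crossbar column.
    k, _, C, O = kernels.shape
    Wmat = kernels.reshape(k * k * C, O)
    rows, cols = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.empty((rows, cols, O))
    for i in range(rows):
        for j in range(cols):
            patch = image[i:i + k, j:j + k, :].reshape(-1)  # 27 "input voltages"
            out[i, j] = patch @ Wmat                        # one crossbar read
    return out

out = conv_as_matvec(np.random.rand(34, 34, 3), np.random.rand(3, 3, 3, 64))
print(out.shape)  # (32, 32, 64)
```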
Compared with a traditional convolutional neural network, the input and output of a capsule neural network are expanded by one dimension, so the weight computation between layers of the capsule network expands by one dimension correspondingly: the matrix-vector computation of a traditional convolutional network becomes matrix-matrix computation, which requires a 3D memristor crossbar array. This embodiment constructs a 3D memristor-based crossbar array to realize the hardware deployment of the capsule layers.
For any one of the first initial capsule layer, the second initial capsule layer and the third initial capsule layer: the initial capsule layer has R capsule units, each capsule unit contains X convolution units, and the convolution kernel size is U×U; in the corresponding hardware design, the operation of the initial capsule layer is realized with a first 3D memristor crossbar array.
The first 3D memristor crossbar array comprises T layers of second 2D memristor crossbar arrays similar in structure to the first 2D memristor crossbar array, the T layers corresponding to the T input feature maps. The memristor specification of each layer is (R×X) columns by (U×U) rows; each layer operates independently, the outputs of different layers are connected in the same column, and the output voltage of each layer is summed with the outputs of the other layers along the vertical column direction, so the T feature maps are converted into a one-dimensional voltage output through the first 3D memristor crossbar array.
After one voltage sequence, the first 3D memristor crossbar array outputs R×X one-dimensional voltage signals, representing the X-dimensional vectors of R capsule units; after all voltage sequences have been applied in serial-input mode, the initial capsule layer outputs Y×Y×R X-dimensional vectors, i.e., Y×Y×R initial capsules, where Y×Y×R is the size of the output feature map.
Taking the first residual capsule module (ResCapsBlock1) as an example: the initial capsule layer has R = 8 capsule units, each containing X = 12 convolution units with convolution kernel size U×U = 5×5. The initial capsule layer is therefore equivalent to 12 parallel ordinary convolution layers, each with T = 256 input channels and 8 output channels, from which it follows that the initial capsule layer contains 12×8×256 convolution kernels in total.
The structure of the first 3D memristor crossbar array adopted by the first initial capsule layer is shown in fig. 7. Each layer is similar in implementation to the first 2D memristor crossbar array: the dark-gray ports are the input and output ports (containing the auxiliary resistors and operational amplifiers), the light-gray strip elements represent nanoscale wires, the elements at the crossings between wires are nano-memristors, and the different layers are connected by vertical buses. The first 3D memristor crossbar array has T = 256 layers of second 2D memristor crossbar arrays; each 5×5 convolution kernel is mapped to one column of a crossbar array, and 96 (8×12) columns represent the 12 convolution kernels in each of the 8 capsule units, so the specification of each layer is 96 columns by 25 rows (8×12 columns, 5×5 rows). Each layer operates independently, and the outputs of different layers are connected in the same column. The 256 layers correspond to the 256 feature maps input to the initial capsule layer; the output voltage of each layer is summed with the outputs of the other layers along the vertical column direction, and the 256 feature maps are converted into a one-dimensional voltage output through the first 3D memristor crossbar array. After one voltage sequence, the array outputs 96 (8×12) one-dimensional voltage signals, representing the 12-dimensional vectors of 8 capsule units. After all voltage sequences, the initial capsule layer outputs 14×14×8 12-dimensional vectors, i.e., 14×14×8 initial capsules.
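A behavioral NumPy sketch of one read of this 3D array, idealized to a plain multiply-accumulate (the G_a offset and amplifier stages shown earlier are omitted); the random conductances and voltages are placeholders.

```python
import numpy as np

# 256 layers, each a 25-row x 96-column 2D array; same-column outputs are summed.
rng = np.random.default_rng(2)
G_layers = rng.random((256, 25, 96))  # per-layer conductance matrices (idealized)
V_in = rng.random((256, 25))          # one 5x5 patch of read voltages per feature map

V_out = np.einsum('trc,tr->c', G_layers, V_in)  # vertical-bus summation over layers
caps = V_out.reshape(8, 12)           # 96 signals = 8 capsule units x 12D vectors
print(caps.shape)                     # (8, 12)
```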
The second initial capsule layer and the third initial capsule layer have the same specification, and the hardware deployment scheme is also realized in the manner described above.
For the digital capsule layer: it has four sub-layers, which convert the four groups of initial capsules into the corresponding four groups of digital capsules. Each sub-layer comprises M capsule units; the Y×Y×R X-dimensional vectors output by the initial capsule layer and the M capsule units of the sub-layer share Y×Y×R×M weight transformation matrices, and in the hardware design process these weight matrices are mapped onto Y×Y×R independent second 3D memristor crossbar arrays for parallel operation.
Each second 3D memristor crossbar array comprises M layers of third 2D memristor crossbar arrays similar in structure to the first 2D memristor crossbar array; the memristor specification of each layer is V rows by W columns, where the V rows correspond to a V-dimensional input vector and the W columns to a W-dimensional output vector. The sub-layer finally outputs M W-dimensional vectors, i.e., M W-dimensional digital capsules.
Taking the CIFAR-10 dataset as an example, MRCapsNet must solve a 10-class problem, i.e., M = 10, so the digital capsule layer of the network contains 10 capsule units corresponding to the 10 picture classes. Each capsule in an initial capsule layer and each capsule in the digital capsule layer have a corresponding weight transformation matrix. Taking the first sub-layer DigitCapsLayer1 of the digital capsule layer as an example, its output is 10 16-dimensional vectors, so the 14×14×8 12-dimensional vectors output by the initial capsule layer PrimaryCapsLayer1 and the 10 capsule units of DigitCapsLayer1 share 14×14×8×10 weight transformation matrices. During inference these matrices can be mapped onto 14×14×8 independent second 3D memristor crossbar arrays; the structure of each second 3D memristor crossbar array is shown in fig. 8. It is composed of 10 layers of 12×16 third 2D memristor crossbar arrays (similar in structure to the first 2D memristor crossbar array): the V = 12 rows of inputs mean the digital capsule layer's input is a 12-dimensional vector, and the W = 16 columns of outputs mean its output is a 16-dimensional vector. In the digital capsule layer, the 10 layers of a second 3D memristor crossbar array all receive the same 12-dimensional input from the initial capsule, and each layer, i.e., each capsule unit, independently outputs one 16-dimensional vector. The first sub-layer DigitCapsLayer1 finally outputs 10 16-dimensional vectors for target classification and image reconstruction, as sketched below. The hardware implementations of the other sub-layers follow the same approach.
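The per-capsule weight transformation performed by these parallel arrays reduces to one batched matrix product, sketched below with random placeholder weights; the resulting prediction tensors would then be fed to the dynamic routing procedure shown earlier.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal((14, 14, 8, 12))                 # primary capsules, 12D each
W = 0.05 * rng.standard_normal((14, 14, 8, 10, 12, 16))  # one 10-layer 12x16 stack per capsule

u_hat = np.einsum('abcmvw,abcv->abcmw', W, u)            # per-capsule 16D predictions
print(u_hat.reshape(-1, 10, 16).shape)                   # (1568, 10, 16) -> routing input
```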
The dimensions of the digital capsules output by the four sub-layers of the digital capsule layer are $W_1$, $W_2$, $W_3$ and $W_4$ respectively, so after the digital capsule layer splices the four groups of digital capsules, M new digital capsules of dimension $W_1 + W_2 + W_3 + W_4$ are obtained for target classification and image reconstruction. As in fig. 2, with $W_1$, $W_2$, $W_3$, $W_4$ equal to 16, 12, 10 and 16 respectively, $W_1 + W_2 + W_3 + W_4 = 54$ and M = 10: the digital capsule layer outputs 10 new digital capsules of dimension 54.
The 3D memristor crossbar array improves packing density and has a more compact structure; compared with the 2D crossbar array, it adds one more dimension of parallel operation and thereby supports more efficient parallel computation.
The transposed convolution layer up-samples the input tensor, and a reconstruction module formed by the multi-layer superimposed transposed convolution restores the feature map to the original size. The first step of transpose convolution implementation is to supplement 0 to the input feature map according to rules, and the second step is to execute normal convolution operation to the feature map after 0 supplementation. The size of the transposed convolution input feature diagram is defined as H.times.H.times.N, the convolution step length is defined as S, the size of each channel of the feature diagram after 0 is complemented is defined as H out*Hout, and the calculation mode is as follows:
Hout = H + (H−1)×(S−1) + 2P + O (11)
The filling rule is as follows: insert S−1 zeros between every two adjacent pixels of the input feature map, then pad P rings of zeros around the feature map, and finally append O rows of zeros below the feature map and O columns of zeros to its right. In the hardware implementation of the transposed convolution, the corresponding input signals are first set to 0 according to this rule, and the weights are then mapped into the memristor cross array according to the circuit implementation scheme of the convolution network to complete deployment.
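A minimal NumPy sketch of this filling rule for one channel; the function name is our own, and the example values of S, P, O are chosen only to make formula (11) easy to verify:

```python
import numpy as np

def dilate_and_pad(x, S, P, O):
    """Apply the filling rule: insert S-1 zeros between pixels of the (H, H)
    map x, add P rings of zeros around it, then O extra zero rows below and
    O zero columns on the right."""
    H = x.shape[0]
    core = np.zeros((H + (H - 1) * (S - 1),) * 2, dtype=x.dtype)
    core[::S, ::S] = x                  # original pixels, S-1 zeros apart
    return np.pad(core, ((P, P + O), (P, P + O)), mode='constant')

x = np.arange(9.0).reshape(3, 3)
y = dilate_and_pad(x, S=2, P=1, O=1)
print(y.shape)  # (8, 8): Hout = 3 + (3-1)*(2-1) + 2*1 + 1 = 8, matching (11)
```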
Experiments and analyses were performed as follows.
The experiments tested the memristive residual capsule network MRCapsNet on three datasets, MNIST, CIFAR-10 and SVHN, and compared it with other capsule models. In addition, an ablation study was performed in this embodiment to verify the effectiveness of the proposed multi-level residual capsule module. The tests were run in software under an Ubuntu operating system with Keras 2.1.6, TensorFlow-GPU 1.15.4 and Python 3.6; the hardware environment for model training was a GeForce RTX 3090 GPU and an Intel Xeon E3 processor.
To fully test the performance of MRCapsNet on both simple and complex datasets, the following three datasets were selected for testing in this example.
1. MNIST: the dataset consists of 70K grayscale handwritten digit images in 10 classes, with 60K images for training and 10K images for testing; image size is 28 × 28.
2. CIFAR-10: the dataset contains 60K color images in 10 categories, sized 32 × 32 × 3, with 50K images for training and 10K for testing. The images in CIFAR-10 carry complex feature and background information, and therefore probe the network's performance under strong interference.
3. Street View House Numbers (SVHN): the dataset comprises 10 categories of 32 × 32 × 3 color images, with a 73K training set and a 26K test set. The images come from house numbers in Google Street View and probe the neural network's performance in real natural scenes.
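A hedged sketch of loading the first two evaluation sets with the Keras version named above; keras.datasets bundles MNIST and CIFAR-10, while SVHN is not bundled and must be fetched separately (e.g. from ufldl.stanford.edu/housenumbers):

```python
from keras.datasets import mnist, cifar10

(x_train, y_train), (x_test, y_test) = mnist.load_data()      # 60K/10K, 28x28
(c_train, cy_train), (c_test, cy_test) = cifar10.load_data()  # 50K/10K, 32x32x3

# Scale pixel values to [0, 1] before training.
x_train = x_train.astype('float32') / 255.0
c_train = c_train.astype('float32') / 255.0
```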
During training, an Adam optimizer was used with an initial learning rate of 0.001, decayed to 0.9× its previous value after each epoch; the batch size was 128 and training ran for 100 epochs. These hyperparameters were kept identical across all datasets.
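A minimal Keras sketch of this training setup, assuming a compiled-ready MRCapsNet model and loss function that are not defined here:

```python
from keras.optimizers import Adam
from keras.callbacks import LearningRateScheduler

optimizer = Adam(lr=0.001)                                     # initial learning rate
decay = LearningRateScheduler(lambda epoch: 0.001 * (0.9 ** epoch))  # x0.9 per epoch

# `model`, `margin_loss`, `x_train`, `y_train` are assumed to exist:
# model.compile(optimizer=optimizer, loss=margin_loss)
# model.fit(x_train, y_train, batch_size=128, epochs=100, callbacks=[decay])
```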
This experiment compared the accuracy of MRCapsNet with the capsule network CapsNet; the results are shown in Table 2. Compared with the CapsNet shown in fig. 1, the single-model performance of MRCapsNet on the CIFAR-10 and SVHN datasets improves by 21.6 and 0.9 percentage points, respectively; on MNIST, MRCapsNet is slightly below CapsNet but still at a high level. MRCapsNet has 16.01M parameters versus 22.48M for CapsNet, indicating that the proposed multi-level residual-module reuse structure can effectively learn multi-level features in an image.
TABLE 2
Model | MNIST | SVHN | CIFAR-10 |
CapsNet | 99.75% | 95.70% | 68.74% |
MRCapsNet | 99.70% | 96.60% | 90.34% |
Next, an ablation study demonstrates the benefit of the multi-level ResCapsBlocks, the dilated convolution, and the transposed-convolution reconstruction network. In the CNN field, research on residual networks shows that multi-stage residual blocks can improve network performance to a certain extent. To understand how the number of residual capsule modules affects the performance of MRCapsNet, a series of experiments was run; the results are shown in Table 3. When the number of residual capsule modules increases from 1 to 3, network accuracy rises from 80.58% to 89.65%; when dilated convolution is added to the residual capsule modules and the outputs of the three capsule modules are spliced, accuracy rises further to 90.34%. The training-loss curves in fig. 9 show a distinct drop in loss after each ResCapsBlock stage, indicating that each stage learns new features.
On top of the comparison over the number of capsule modules, a comparison with and without dilated convolution was added. As Table 3 shows, using dilated convolution improves the accuracy of every residual capsule module stage, and overall network accuracy improves from 89.95% to 90.34%, a clear gain. The shallow residual capsule module already extracts fine-grained features with its 3 × 3 convolution kernels, while dilated convolution in the deeper residual capsule modules enlarges the receptive field and extracts features over a wider range without increasing the amount of computation (see the sketch after Table 3).
TABLE 3
Model | Test accuracy (%) |
ResCapsBlock 1 (without dilated convolution) | 80.58 |
ResCapsBlock 2 (without dilated convolution) | 88.32 |
ResCapsBlock 3 (without dilated convolution) | 89.65 |
Merged ResCapsBlock 1, 2, 3 (without dilated convolution) | 89.95 |
ResCapsBlock 1 (dilated convolution, dilation rate (1, 1)) | 80.59 |
ResCapsBlock 2 (dilated convolution, dilation rate (1, 1)) | 88.71 |
ResCapsBlock 3 (dilated convolution, dilation rate (2, 2)) | 89.80 |
Merged ResCapsBlock 1, 2, 3 (with dilated convolution) | 90.34 |
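As a hedged illustration of the dilated convolution used in the deeper ResCapsBlocks: a 3 × 3 kernel with dilation rate (2, 2) covers a 5 × 5 receptive field with the same 9 weights, i.e. no extra computation. The filter count below is our own illustrative choice, not the patent's configuration:

```python
from keras.layers import Conv2D

dilated = Conv2D(filters=256, kernel_size=3, dilation_rate=(2, 2),
                 padding='same', activation='relu')
```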
In this example, reconstruction experiments on the CIFAR-10 dataset compared the reconstruction network proposed for MRCapsNet with that of the CapsNet baseline. As fig. 10 shows, the reconstruction network composed of stacked transposed convolutions achieves smaller reconstruction error during training, and as Table 4 shows, the parameter count of the transposed-convolution reconstruction sub-network is only 16.39% of that of the CapsNet baseline's, while delivering better performance (a hedged decoder sketch follows Table 4).
TABLE 4
Reconstruction module | Parameters | Accuracy |
Transposed convolution (MRCapsNet) | 303,497 | 90.34% |
Fully connected (CapsNet) | 1,851,904 | 89.77% |
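A hedged Keras sketch of a reconstruction decoder in the spirit of Table 4 and claim 9 (one fully connected layer followed by four transposed convolutions); the layer widths are our assumptions, not the patent's exact configuration. The input is the 10 × 54 digit-capsule output, masked and flattened to 540 values:

```python
from keras.models import Sequential
from keras.layers import Dense, Reshape, Conv2DTranspose

decoder = Sequential([
    Dense(4 * 4 * 32, activation='relu', input_shape=(540,)),
    Reshape((4, 4, 32)),
    Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu'),  # 8x8
    Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu'),  # 16x16
    Conv2DTranspose(16, 3, strides=2, padding='same', activation='relu'),  # 32x32
    Conv2DTranspose(3, 3, padding='same', activation='sigmoid'),           # 32x32x3
])
```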
The power consumption of the MRCapsNet core computing unit consists of two parts: the power required to write the neural-network weights into the cross array, and the power consumed by the memristor cross array during inference. The memristor's on-state resistance R_on and off-state resistance R_off are 100 kΩ and 100 MΩ, respectively; the write voltage is 6.5 V, and the maximum write power of a single memristor is 0.5 μW. In the inference stage the weight written to each memristor differs, so the per-device power differs; the maximum power of a memristor in the limiting case is estimated at 6.9 μW by the above method, whereas a CMOS circuit realizing the same computation consumes close to 60 μW. The proposed hardware scheme can therefore greatly reduce the inference-side power consumption of the capsule network.
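For orientation, a back-of-the-envelope sketch of the inference-power term; the read voltage and the column size below are assumed values for illustration and are not given in the text, so the printed figure is not the patent's 6.9 μW estimate:

```python
R_ON, R_OFF = 100e3, 100e6   # ohms, per the text
V_READ = 0.8                 # volts -- assumed read voltage

def column_static_power(v_read, resistances):
    # Static power of one crossbar column: sum of V^2 / R over its devices.
    return sum(v_read ** 2 / r for r in resistances)

# Worst case for an assumed 12-device column: every device at R_on.
print(column_static_power(V_READ, [R_ON] * 12))  # watts
```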
In summary, on the algorithm side, the embodiment of the invention provides a novel capsule network structure, MRCapsNet: it builds the ResCapsBlock module on the residual-learning idea, alleviating vanishing gradients as the network deepens; it realizes multi-level information reuse by connecting multiple ResCapsBlocks in series; and it introduces dilated convolution in the higher-level ResCapsBlocks to enlarge the receptive field, helping the capsule network extract more features. The embodiment also provides a reconstruction network built from transposed convolutions, which improves network performance while reducing the parameter count. MRCapsNet achieves better performance than current capsule network models.
The embodiment of the invention further provides a capsule-network hardware implementation based on memristor cross arrays; because memristor cross arrays consume little power and support parallel matrix computation, they can address the heavy computational load of capsule networks. Analysis and calculation of the power consumed by the neurons in the circuit show that the memristor-cross-array circuit consumes far less power than a CMOS circuit.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included in the protection scope of the present invention.
Claims (10)
1. An image classification system based on a residual capsule neural network, characterized in that it carries a residual capsule neural network; the residual capsule neural network comprises a first residual capsule module, a second residual capsule module, a third residual capsule module and a digital capsule layer; the first residual capsule module comprises a first residual convolution sub-network and a first initial capsule layer, the second residual capsule module comprises a second residual convolution sub-network and a second initial capsule layer, and the third residual capsule module comprises a third residual convolution sub-network and a third initial capsule layer;
The first residual convolution sub-network carries out convolution calculation on an input original image and then outputs a first group of feature images to the second residual convolution sub-network and the first initial capsule layer, the first initial capsule layer converts the first group of feature images into a first initial capsule group, and the digital capsule layer converts the first initial capsule group into a first digital capsule group;
the second residual convolution sub-network carries out convolution calculation on the input first group of feature images and then outputs a second group of feature images to the third residual convolution sub-network and the second initial capsule layer, the second initial capsule layer converts the second group of feature images into a second initial capsule group, and the digital capsule layer converts the second initial capsule group into a second digital capsule group;
The third residual convolution sub-network carries out convolution calculation on the input second group of feature images and then outputs a third group of feature images to the third initial capsule layer, the third initial capsule layer converts the third group of feature images into a third initial capsule group, and the digital capsule layer converts the third initial capsule group into a third digital capsule group;
the digital capsule layer splices the first initial capsule group, the second initial capsule group and the third initial capsule group to obtain a fourth initial capsule group and converts the fourth initial capsule group into a fourth digital capsule group;
and the digital capsule layer is used for outputting M new digital capsules for target classification and image reconstruction after splicing and fusing the four groups of digital capsules in the digital capsule layer, wherein M is equal to the total category number of image classification.
2. An image classification system based on a residual capsule neural network according to claim 1, characterized in that: the residual capsule neural network further comprises a reconstruction module, wherein the reconstruction module is used for generating a reconstruction image with the same size as the original image according to M new digital capsules.
3. The image classification system based on residual capsule neural network of claim 2, wherein the loss function of the residual capsule neural network is:
L_total = Σ_k L_k + η·L_r
wherein L_total represents the total loss; L_k = T_k·max(0, m+ − ||v_k||)² + λ·(1 − T_k)·max(0, ||v_k|| − m−)² represents the margin loss of capsule k in the last capsule layer; T_k represents the matching parameter of the k-th classification target corresponding to capsule k, with T_k = 1 if and only if the classification is correct and T_k = 0 otherwise; v_k represents the activation vector of capsule k; m+, m−, λ are capsule-vector modulus-length control parameters, with m+ = 0.9, m− = 0.1, λ = 0.5; L_r represents the reconstruction loss, equal to the mean-square error between the pixels of the reconstructed image and the original image; and η represents the weight coefficient of the reconstruction loss in the total loss.
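A minimal NumPy sketch of this margin loss, with function and parameter names of our own choosing; v is the (M, W) matrix of activation vectors and T the one-hot target row:

```python
import numpy as np

def margin_loss(v, T, m_pos=0.9, m_neg=0.1, lam=0.5):
    lengths = np.linalg.norm(v, axis=-1)                  # ||v_k||
    L_k = (T * np.maximum(0.0, m_pos - lengths) ** 2
           + lam * (1 - T) * np.maximum(0.0, lengths - m_neg) ** 2)
    return L_k.sum()

# L_total would add eta * (reconstruction mean-square error) to this sum.
```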
4. An image classification system based on a residual capsule neural network according to claim 3, characterized in that: the first residual convolution sub-network, the second residual convolution sub-network and the third residual convolution sub-network each comprise two convolution layers and a skip connection, each convolution layer using same-padding convolution with dilated convolution introduced; the digital capsule layer converts the initial capsules into digital capsules through a dynamic routing algorithm and a squash activation function.
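A sketch of the common form of the squash activation referenced above (as in Sabour et al.'s capsule networks; the epsilon guard is our addition for numerical safety): it rescales a capsule vector s to length in [0, 1) while preserving its direction.

```python
import numpy as np

def squash(s, eps=1e-8):
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)
```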
5. The residual capsule neural network-based image classification system of claim 4, wherein: in a convolution layer of the first residual convolution sub-network, the second residual convolution sub-network, or the third residual convolution sub-network, a matrix-vector multiplication operation with h inputs and o outputs is expressed by the following formula:
y_m = Σ_{n=1..h} w_mn·x_n, m = 1, …, o (8)
In formula (8), x_n represents the n-th element of the input vector, w_mn represents the weight in row m and column n of the weight matrix, and y_m represents the m-th output of the output vector;
In the hardware design corresponding to the convolution layer, matrix-vector multiplication operation with h inputs and o outputs is realized by adopting a first 2D memristor cross array;
The first 2D memristor cross array comprises h rows and o columns of memristors; x_1 ~ x_h are converted into the row input voltages V_x1 ~ V_xh of the first 2D memristor cross array, and the weight w_mn is mapped to the conductance value of the corresponding memristor. Each input voltage V_xn is multiplied by the conductance of its memristor, the output currents flowing through the memristors of a column are summed on the column wire, and the output current of each column is converted into a voltage V_ym by a current-voltage conversion circuit and output.
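A sketch of the matrix-vector product of formula (8) as a 2D crossbar computes it: input voltages drive the rows, conductances encode the weights, and each column current is a dot product by Ohm's and Kirchhoff's laws. All numeric values below are illustrative assumptions:

```python
import numpy as np

h, o = 4, 3
V_in = np.random.rand(h)          # row voltages encoding x_1..x_h
G = np.random.rand(o, h) * 1e-5   # siemens; conductances encoding w_mn
I_out = G @ V_in                  # column currents: I_m = sum_n G_mn * V_n
V_out = I_out * 1e4               # current-to-voltage conversion (assumed gain)
```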
6. The residual capsule neural network-based image classification system of claim 5, wherein: for any one of the first initial capsule layer, the second initial capsule layer and the third initial capsule layer, the initial capsule layer has R capsule units, each capsule unit comprising X convolution units with convolution kernel size U × U; in the corresponding hardware design, the operation of the initial capsule layer is realized with a first 3D memristor cross array;
The first 3D memristor cross array comprises T layers of second 2D memristor cross arrays similar in structure to the first 2D memristor cross array, the T layers corresponding to T input feature maps; the memristor specification of each layer of second 2D memristor cross array is (U × U) rows × (R × X) columns; each layer operates independently, the outputs of different layers are connected in the same columns, the output of each layer is summed with the outputs of the other layers along the vertical column direction, and the T feature maps are thereby converted into a one-dimensional voltage output through the first 3D memristor cross array;
after one voltage sequence, the first 3D memristor cross array outputs R × X one-dimensional voltage signals, representing the X-dimensional vectors of the R capsule units; after all voltage sequences have been applied in a serial-input manner, the initial capsule layer outputs Y × Y × R X-dimensional vectors, i.e. Y × Y × R initial capsules, where Y × Y × R represents the size of the output feature map.
7. The residual capsule neural network-based image classification system of claim 6, wherein: the digital capsule layer has four sub-layers for converting the four groups of initial capsules into four corresponding groups of digital capsules; any sub-layer comprises M capsule units, the Y × Y × R X-dimensional vectors output by the initial capsule layer and the M capsule units of the sub-layer share Y × Y × R × M weight transformation matrices, and in the hardware design process these weight transformation matrices are mapped to Y × Y × R independent second 3D memristor cross arrays for parallel operation;
Each second 3D memristor cross array comprises M layers of third 2D memristor cross arrays, similar in structure to the first 2D memristor cross arrays; the memristor specification of each layer of third 2D memristor cross array is V rows × W columns, the V rows corresponding to the V-dimensional input vector and the W columns to the W-dimensional output vector, so that the sub-layer finally outputs M W-dimensional vectors, i.e. M digital capsules of dimension W.
8. The residual capsule neural network-based image classification system of claim 7, wherein: the dimensions of the digital capsules output by the four sub-layers of the digital capsule layer are W1, W2, W3 and W4, respectively, so the digital capsule layer splices the four groups of digital capsules into M new digital capsules of dimension W1+W2+W3+W4 for target classification and image reconstruction.
9. An image classification system based on a residual capsule neural network according to claim 3, characterized in that: the reconstruction module comprises a fully connected layer and four transposed convolution layers, each transposed convolution layer first padding the input feature map with zeros according to the filling rule and then performing a convolution operation on the zero-padded feature map;
Defining the size of the input feature map of a transposed convolution layer as H × H × N and the convolution stride as S, the size of each channel after zero padding is Hout × Hout, computed as:
Hout = H + (H−1)×(S−1) + 2P + O (11)
The filling rule is as follows: insert S−1 zeros between every two adjacent pixels of the input feature map, then pad P rings of zeros around the feature map, and finally append O rows of zeros below the feature map and O columns of zeros to its right.
10. An image classification system based on a residual capsule neural network according to any one of claims 1-9, characterized in that: the dataset used to train and test the residual capsule neural network comprises more than ten thousand grayscale or RGB color images in M = 10 classes, derived from the MNIST, CIFAR-10 or SVHN datasets.