CN106845525A - A deep belief network image classification protocol based on low-level fusion features - Google Patents
- Publication number: CN106845525A (application CN201611240795.9A)
- Authority: CN (China)
- Legal status: Pending (assumed by Google Patents; not a legal conclusion)
Classifications
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/253 — Pattern recognition: fusion techniques of extracted features
- G06N3/048 — Neural networks: activation functions
- G06N3/084 — Neural networks: learning by backpropagation, e.g. using gradient descent
Abstract
The present invention provides a deep belief network image classification protocol based on low-level fusion features. First, color, texture and shape features are extracted from the sample images to form a multi-feature fusion weight matrix; the weight matrix is then normalized; finally, the normalized weight matrix serves as the raw data for training and testing a deep belief network, which yields the classification result. The deep belief network is a deep structure composed of several restricted Boltzmann machine (RBM) models connected together and a BP network: the RBM layers are chained so that the output of each RBM layer is the input of the next, and the output of the last RBM layer is the input of the BP network. The raw data enters the first RBM layer, and the output of the BP network is the classification result. The invention offers good scalability; multi-feature fusion improves classification accuracy, and the deep belief network improves classification efficiency.
Description
Technical Field
The present invention relates to a classification algorithm for natural images, and in particular to a deep belief network image classification protocol based on low-level fusion features, applicable to the technical fields of image information, image classification and image retrieval.
Background
With the rapid development of digital, information and multimedia technologies, digital images have become an indispensable part of daily life, and their number is growing at an astonishing rate. Faced with ever more image information, image classification and retrieval have become a research focus. Traditional text- and annotation-based classification and retrieval methods have several drawbacks: they are time-consuming and labor-intensive; the rapid growth of digital images makes annotating all of them practically impossible; and annotators introduce strong subjective bias. These limitations have restricted the development of text- and annotation-based image classification and retrieval. A large body of research on Content-Based Image Retrieval (CBIR) followed; this technology overcomes the shortcomings of manual annotation and enables automatic, intelligent classification, retrieval and management. The current difficulties of image classification lie mainly in two aspects: (1) feature selection and extraction; (2) classifier selection and learning.
Feature selection and extraction are the basis of image classification. Image features fall into two categories: low-level visual features, including color, shape and texture features, SIFT (Scale-Invariant Feature Transform) features, and so on; and mid-level semantic features, mainly semantic features, regional semantic concept features, BOW (bag-of-words) features, and so on.
On the classifier side, most current classification learning algorithms are shallow-structure algorithms, including the common support vector machine (SVM), Boosting and Logistic Regression. The typical SVM pipeline first extracts local image features and forms a codebook, then uses the histogram of feature words formed from each image's local features as its feature vector, and finally trains a model with the SVM. Its limitation is that, with finite samples and computing units, its capacity to represent complex functions is limited, so its generalization ability on complex classification problems is constrained. The BP algorithm is the classical method for training multi-layer networks, but in practice it is already far from ideal even for networks with only a few layers. Deep learning combines low-level features into more abstract high-level representations (attribute categories or features) to discover distributed feature representations of the data. Combining high-dimensional image descriptors with linear classifiers is currently a widely used image classification approach.
LeCun et al. (LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition [J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324) is cited here in connection with the concept of the deep belief network (DBN), which is trained by a greedy layer-wise unsupervised learning process. A DBN is a deep neural network structure composed of multiple layers of restricted Boltzmann machines (RBMs), which overcomes the difficulty of training multi-layer neural networks with the traditional BP algorithm. As a deep learning network, a DBN essentially treats the learning structure as a network, and the core ideas of deep learning are: (1) unsupervised learning is used to pre-train each layer of the network; (2) unsupervised learning trains only one layer at a time, and its output serves as the input of the layer above; (3) supervised learning then fine-tunes all layers, i.e., multiple layers are stacked with each layer's output feeding the next. In this way a hierarchical representation of the input information is obtained. DBN training divides into two stages: the first is unsupervised feature learning; the second is supervised fine-tuning of the network parameters and classification. Deep belief networks have been successfully applied to handwriting recognition, speech recognition and other fields with good results.
Subsequently, many researchers studied and improved the DBN algorithm. Sun Jinguang et al. (Sun Jinguang, Jiang Jinye, Meng Xiangfu. A deep belief network classification method for numerical attributes [J]. Computer Engineering, 2014, 33(18): 125-131) proposed a DBN for numerical attributes and validated it comparatively on several UCI data sets, demonstrating its effectiveness. Fu Yan et al. (Fu Yan, Xian Yanming, et al. Image classification based on multiple features and an improved SVM ensemble [J]. Computer Engineering, 2011, 37(21): 196-199) argued that existing image classification methods cannot fully exploit the complementary strengths of an image's individual features, leading to inaccurate classification; they applied principal component analysis to transform the extracted features and classified with an SVM ensemble, and their simulation experiments showed that multiple features yield better image classification accuracy and faster classification than any single feature.
Summary of the Invention
The purpose of the present invention is to overcome the low classification accuracy and low efficiency of existing single-feature descriptors and shallow-structure classification algorithms, and to provide a deep belief network image classification protocol with high classification accuracy and high efficiency.
To solve the above technical problems, the technical solution of the present invention provides a deep belief network image classification protocol based on low-level fusion features, characterized by the following steps:
Step 1: extract the color, texture and shape features of the sample images to form a multi-feature fusion weight matrix.
Step 2: normalize the weight matrix.
Step 3: use the normalized weight matrix as the raw data to train and test the deep belief network, obtaining the classification result.
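The three steps above can be sketched as follows. The feature dimensions, the min-max normalization scheme and the exact split ratio are illustrative assumptions; the patent does not fix them:

```python
import numpy as np

def fuse_features(color, texture, shape):
    """Step 1: concatenate per-image color, texture and shape feature
    vectors into one multi-feature weight matrix (one row per image)."""
    return np.hstack([color, texture, shape])

def min_max_normalize(X):
    """Step 2: column-wise min-max normalization to [0, 1]; constant
    columns are left unscaled to avoid division by zero."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def split(X, y, train_ratio=0.8, seed=0):
    """Step 3 (data side): random train/test split of the raw data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(train_ratio * len(X))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]

# Toy data: 10 images with 9 color, 8 texture and 7 shape features each.
rng = np.random.default_rng(1)
color = rng.random((10, 9)); texture = rng.random((10, 8)); shape = rng.random((10, 7))
y = np.arange(10) % 2
X = min_max_normalize(fuse_features(color, texture, shape))
Xtr, ytr, Xte, yte = split(X, y)
print(X.shape, Xtr.shape, Xte.shape)  # (10, 24) (8, 24) (2, 24)
```

The normalized matrix would then be fed to the DBN described below in place of raw pixel data.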
Preferably, the deep belief network is a deep structure composed of several restricted Boltzmann machine (RBM) models connected together and a BP neural network. The RBM layers are chained so that the output of each RBM layer serves as the input of the next, and the output of the last RBM layer serves as the input of the BP neural network, forming the complete deep belief network. The raw data enters the first RBM layer, and the output of the BP neural network is the classification result.
Preferably, the RBMs perform layer-by-layer feature extraction on the raw data, moving from the concrete to the abstract, so that the input received by the neural network becomes a feature vector that is easier to classify. At the same time, the deep structure formed by the stacked RBMs weakens errors and redundant information layer by layer during feature extraction, and the backward fine-tuning of the BP neural network finally brings the model to a global optimum.
Preferably, in step 3, 80% to 90% of the raw data is randomly selected as the training set and the remainder is used as the test set.
Preferably, the training set is used to train the deep belief network to obtain its weight and bias parameters; the deep belief network determined by the parameters obtained during training is then applied to the test set, and an error evaluation yields the classification result.
The protocol provided by the present invention is based on low-level image features and the deep belief network (DBN). It focuses on the low-level characteristics of images: after a detailed and comprehensive analysis of these characteristics, color, texture and shape features are extracted from the sample images to form a multi-feature fusion weight matrix, the feature matrix is normalized, and a four-layer DBN classifier is built for training and classification. Tests on the Corel image library with trained weights show that the average classification accuracy of the algorithm is considerably higher than that of classification algorithms using a single feature and of other mainstream classification algorithms.
Compared with the prior art, the deep belief network image classification protocol based on low-level fusion features provided by the present invention has the following beneficial effects:
1. Good scalability. The protocol sets no upper limit on the number of images recognized at one time; in theory, any images can be recognized as long as their comprehensive features are properly selected and fused. The protocol therefore scales well.
2. Multi-feature fusion. Fusing multiple features alleviates the problem that distinguishing images by a single feature easily loses an image's salient features and thus lowers classification accuracy, and likewise mitigates the accuracy loss caused by losing the salient features of an individual image, improving the classification accuracy.
3. DBN classifier. Deep learning combines low-level features into more abstract high-level representations or features to discover feature representations of the data distribution, improving classification efficiency.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the RBM network structure;
Fig. 2 is a schematic diagram of the structure of the DBN classifier;
Fig. 3 is a flow chart of the deep belief network image classification protocol based on low-level fusion features;
Fig. 4 shows sample Corel images;
Fig. 5 shows the classification accuracy of each group in the experiments;
Fig. 6 shows the misclassification rates of the 10 image classes.
Detailed Description
The present invention is further described below in conjunction with specific embodiments. It should be understood that these embodiments are only intended to illustrate the present invention and not to limit its scope. Furthermore, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications to it, and such equivalent forms likewise fall within the scope defined by the claims appended to this application.
The present invention provides a deep belief network image classification protocol based on low-level fusion features (IBFCP). IBFCP draws on multiple features and can therefore provide higher discriminability; the higher the discriminability, the easier the classification.
1. Feature representation
The extraction and representation of image features is the basis of image classification technology. Generally speaking, content-based image retrieval relies mainly on visual features, chiefly of three kinds: color, texture and shape.
1) Color features
Color is the most important and most widely used visual feature in content-based image retrieval, mainly because it is simple to extract, invariant to rotation, scale and translation, and relatively insensitive to changes in viewing angle. The most commonly used color features include the color histogram, color moments (first-, second- and third-order), the color correlogram and color information entropy. These features can be extracted in different color spaces (e.g. RGB or HSV).
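As an illustration of one of these descriptors, the first three color moments can be computed per channel as follows; this is a standard construction, and the choice of color space is left open, as in the text:

```python
import numpy as np

def color_moments(img):
    """First three color moments per channel of an H x W x 3 image:
    mean, standard deviation, and the signed cube root of the third
    central moment (skewness), giving a 9-dimensional color feature."""
    feats = []
    for ch in range(img.shape[2]):
        x = img[..., ch].astype(float).ravel()
        mu = x.mean()
        sigma = x.std()
        skew = np.cbrt(((x - mu) ** 3).mean())
        feats.extend([mu, sigma, skew])
    return np.array(feats)

toy = np.zeros((4, 4, 3)); toy[..., 0] = 1.0   # uniform single-channel image
cm = color_moments(toy)
print(cm)  # channel 0 has mean 1 and zero spread; the rest are zero
```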
2) Texture features
Texture is a visual feature, independent of color or brightness, that reflects homogeneity in an image; texture features carry important information about the arrangement of object surface structures and their relation to the surrounding environment. Texture features are widely used in content-based image classification, allowing images to be classified by texture similarity.
The texture features commonly used in image classification mainly include Tamura texture features, autoregressive texture models, directional features, wavelet transforms and co-occurrence matrices.
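A minimal sketch of the co-occurrence-matrix descriptor mentioned above: the quantization level, the single horizontal offset, and the four Haralick-style statistics are illustrative choices, not prescribed by the patent:

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix of an image with values
    in [0, 1], for a single pixel offset (dx, dy)."""
    q = np.rint(img * (levels - 1)).astype(int)   # quantize to `levels` bins
    M = np.zeros((levels, levels))
    h, w = q.shape
    for i in range(h - dy):
        for j in range(w - dx):
            M[q[i, j], q[i + dy, j + dx]] += 1
    return M / M.sum()

def glcm_features(P):
    """Four classic co-occurrence statistics: contrast, energy,
    homogeneity and entropy."""
    i, j = np.indices(P.shape)
    contrast = ((i - j) ** 2 * P).sum()
    energy = (P ** 2).sum()
    homogeneity = (P / (1.0 + np.abs(i - j))).sum()
    entropy = -(P[P > 0] * np.log2(P[P > 0])).sum()
    return np.array([contrast, energy, homogeneity, entropy])

img = np.tile(np.linspace(0, 1, 8), (8, 1))   # smooth horizontal gradient
P = glcm(img)
feats = glcm_features(P)
print(feats)
```

For this gradient image every horizontal neighbor pair differs by exactly one gray level, so the contrast is 1 and the homogeneity is 0.5.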
3) Shape features
The shape of objects and regions is another important feature for image classification and retrieval. Unlike low-level features such as color or texture, the representation of shape features is based on partitioning the image into objects or regions. Since current techniques cannot achieve accurate and robust automatic image segmentation, shape features can only be used in image classification together with other features. On the other hand, since human perception is largely insensitive to translation, rotation and scaling of object shapes, suitable shape features must be invariant to these transformations, which also makes computing shape similarity difficult.
The shape features used in image classification mainly include Hu invariant moments, edge orientation histograms, Fourier descriptors, Zernike moments and histograms of oriented gradients.
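For illustration, the first two Hu invariant moments can be computed from normalized central moments. This is the standard construction (translation- and scale-invariant), not code from the patent:

```python
import numpy as np

def hu_moments(img):
    """First two Hu invariant moments of a grayscale image, built from
    normalized central moments eta_pq = mu_pq / mu_00^((p+q)/2 + 1)."""
    h, w = img.shape
    y, x = np.mgrid[:h, :w].astype(float)
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00

    def mu(p, q):                       # central moment mu_pq
        return ((x - xc) ** p * (y - yc) ** q * img).sum()

    def eta(p, q):                      # normalized central moment
        return mu(p, q) / m00 ** ((p + q) / 2 + 1)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return np.array([phi1, phi2])

square = np.zeros((16, 16)); square[4:12, 4:12] = 1.0
big = np.zeros((32, 32)); big[8:24, 8:24] = 1.0      # same shape, 2x the scale
h_small, h_big = hu_moments(square), hu_moments(big)
print(h_small, h_big)   # near-equal values illustrate scale invariance
```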
2. The deep belief network (DBN)
Deep learning combines low-level features into more abstract high-level representations or features to discover feature representations of the data distribution. The DBN is a generalization to deep architectures, extended from the RBM model. A DBN is a probabilistic generative model with multiple hidden layers (more than two) that can effectively represent and train on nonlinear data. Each layer captures highly correlated associations from the hidden units of the layer below, building a joint distribution between the observed data and the labels. The core idea of the DBN is that, from the bottom up, each restricted Boltzmann machine layer extracts and abstracts the input data while retaining as much important information as possible.
1) The restricted Boltzmann machine (RBM)
The RBM is an unsupervised energy-based model. It is a stochastic neural network with a two-layer structure, a visible (input) layer and a hidden (output) layer, with symmetric connections and no self-feedback: the two layers are fully connected to each other, with no connections within a layer. If the visible units are binary (taking only the values 0 or 1), the RBM can be described by a joint probability distribution.
The RBM network structure is shown in Fig. 1, where v is the visible layer, representing the observed data; h is the hidden layer, which can be regarded as a set of feature extractors; and W is the matrix of connection weights between the two layers. The hidden and visible units of an RBM may be arbitrary exponential-family units (i.e., given the hidden/visible units, the distribution of the visible/hidden units may be any exponential-family distribution), such as softmax, Gaussian or Poisson units.
In Fig. 1, the RBM network has m visible nodes and n hidden nodes, where m and n are positive integers. Each visible node is connected only to the n hidden nodes and is independent of the other visible nodes, so its state is influenced only by the n hidden nodes; likewise, each hidden node is influenced only by the m visible nodes. This property makes RBM training tractable.
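Because of this bipartite structure, the conditional distributions factorize over units, which is what makes block Gibbs sampling cheap. A sketch with binary units and sigmoid conditionals (the network sizes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, c, rng):
    """Hidden units are conditionally independent given v:
    P(h_n = 1 | v) = sigmoid(c_n + sum_m v_m * w_mn)."""
    p = sigmoid(c + v @ W)
    return p, (rng.random(p.shape) < p).astype(float)

def sample_visible(h, W, b, rng):
    """Symmetric conditional for the visible layer given h:
    P(v_m = 1 | h) = sigmoid(b_m + sum_n w_mn * h_n)."""
    p = sigmoid(b + h @ W.T)
    return p, (rng.random(p.shape) < p).astype(float)

rng = np.random.default_rng(0)
m, n = 6, 4                                # m visible, n hidden units
W = rng.normal(0, 0.1, (m, n))
b, c = np.zeros(m), np.zeros(n)
v = rng.integers(0, 2, m).astype(float)
p_h, h = sample_hidden(v, W, c, rng)
p_v, v1 = sample_visible(h, W, b, rng)
print(p_h.shape, p_v.shape)  # (4,) (6,)
```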
In 2002, Hinton proposed the contrastive divergence (CD) algorithm, later improved it, and in 2006 introduced it into the RBM model, addressing the problem that the expectation under the joint distribution of the RBM is hard to compute exactly, and improving the effectiveness and efficiency of training. Thanks to its convenience, ease of use and flexibility, it is widely applied to feature extraction, classification, denoising, dimensionality reduction and so on. The RBM model consists of a visible layer v and a hidden layer h. The user's retrieval information is transformed into the visible layer v, and the visible and hidden layers are connected by a symmetric weight matrix W. The energy function defined by the RBM is:

E(v, h; θ) = −∑_m b_m v_m − ∑_n c_n h_n − ∑_{m,n} v_m w_{mn} h_n    (1)
where θ = {w_mn, b_m, c_n} are the real-valued parameters of the RBM: w_mn is the connection weight between visible unit m and hidden unit n, b_m is the bias of visible unit m, and c_n is the bias of hidden unit n. Once the parameters are fixed, the joint probability distribution of v and h follows from the energy function of Eq. (1):

P(v, h; θ) = exp(−E(v, h; θ)) / Z(θ)    (2)

where Z(θ) is the normalizing factor (also called the partition function):

Z(θ) = ∑_{v,h} exp(−E(v, h; θ))    (3)

The probability distribution P(v; θ) of the observed data v is the marginal of P(v, h; θ), also called the likelihood function. It is obtained by summing the joint distribution over the hidden units:

P(v; θ) = ∑_h P(v, h; θ) = (1/Z(θ)) ∑_h exp(−E(v, h; θ))    (4)

Similarly,

P(h; θ) = ∑_v P(v, h; θ) = (1/Z(θ)) ∑_v exp(−E(v, h; θ))    (5)
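The contrastive divergence algorithm mentioned above sidesteps the intractable partition function by approximating the model expectation in the log-likelihood gradient with a single Gibbs step. A minimal CD-1 sketch (binary units; the sizes and learning rate are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr, rng):
    """One CD-1 update on a single binary pattern v0: the positive phase
    uses the data, the negative phase uses a one-step reconstruction."""
    ph0 = sigmoid(c + v0 @ W)                       # positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(b + h0 @ W.T)                     # mean-field reconstruction
    ph1 = sigmoid(c + pv1 @ W)                      # negative phase
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b += lr * (v0 - pv1)
    c += lr * (ph0 - ph1)
    return ((v0 - pv1) ** 2).mean()                 # reconstruction error

rng = np.random.default_rng(0)
m, n, lr = 8, 4, 0.1
W = rng.normal(0, 0.1, (m, n))
b, c = np.zeros(m), np.zeros(n)
v = (rng.random(m) < 0.5).astype(float)
errs = [cd1_step(v, W, b, c, lr, rng) for _ in range(200)]
print(f"reconstruction error: {errs[0]:.3f} -> {errs[-1]:.3f}")
```

Repeating the update on the same pattern drives the reconstruction error down as the RBM fits it.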
2) The deep belief network model
What a DBN learns during training is the joint probability distribution, and in machine learning a joint probability defines a generative model of the object. In 2006 Hinton proposed the DBN model, a deep structure composed of several stacked RBM models and a BP neural network. Its training has two main aspects: (1) train the RBM structures to filter feature information from the data; (2) connect the RBM layers, feed the output of the last RBM layer into the BP neural network, and use labeled data for supervised training, forming the complete deep structure. The DBN extracts features from the original input layer by layer, moving from the concrete to the abstract, so that the input reaching the neural network becomes a feature vector that is easier to classify; at the same time, the deep structure of stacked RBMs weakens errors and redundant information layer by layer during feature extraction, and the backward fine-tuning of the BP neural network finally brings the model to a global optimum. Compared with traditional neural networks, the advantage of the deep DBN structure is that it overcomes their long training times, their tendency to fall into local optima, and their slow processing of large data as depth increases. A DBN can be regarded as a neural network with pre-trained initial weights. Existing work has established the following three rules: (1) once the number of top-layer units exceeds a threshold, accuracy stabilizes at a certain level; (2) as the number of layers grows, computational performance tends to decline; (3) RBM performance improves as the number of training iterations grows.
In a deep belief network composed of m RBMs, the n-th RBM (1 &lt; n &lt; m) is trained after the (n-1)-th RBM: the input of P(h_n; h_{n-1}, w_n) is the output P(h_{n-1}; h_{n-2}, w_{n-1}) of the (n-1)-th RBM, and its own output P(h_{n+1}; h_n, w_{n+1}) in turn forms the input of the (n+1)-th RBM. According to Hinton, for a typical DBN with k hidden layers, the relationship between the visible data v and the hidden vectors h can be expressed probabilistically as:

P(v, h_1, h_2, …, h_k) = P(v | h_1) P(h_1 | h_2) ⋯ P(h_{k-2} | h_{k-1}) P(h_{k-1}, h_k)
where h_1, h_2, …, h_k are all hidden vectors.
3. DBN classification method with multi-feature fusion
1) Fusing multiple features
For complex images, a single feature generally cannot provide enough discriminative power. A combination of several features clearly can, and the more discriminative the representation, the easier the classification.
The salient characteristics of a class of images may lie in local feature points, in color, in texture, or in shape. Classifying all images with a single feature therefore easily discards the salient characteristics of a whole class of scene images and lowers the classification accuracy. Moreover, different images of the same scene class differ in which features are salient, so a single feature can also lose the salient characteristics of an individual image, again lowering accuracy. Fusing multiple features alleviates both problems and further improves classification accuracy.
Given the characteristics of complex objects in color images, this application extracts three types of features (color, texture and shape) for every image: 9 color moments, 6 Tamura features, 20 gray-level co-occurrence matrix features, 7 Hu invariant moments and 16 edge-direction histogram features, 48 features in total. A multi-feature fusion algorithm then combines these features for image classification, avoiding the weaknesses of any single feature and improving classification accuracy.
2) Building the DBN classifier
The DBN adopts a deep structure composed of four RBMs, with layer sizes 48-90-90-90-10. The first RBM treats the input as its visible layer, with 48 nodes corresponding to the 48 image features; its hidden (output) layer serves as the visible layer of the second RBM (90 nodes). Likewise, the hidden layer of the second RBM is the visible layer of the third (90 nodes), and the hidden layer of the third is the visible layer of the fourth (90 nodes). The hidden (output) layer of the fourth RBM is the output of the DBN: it contains 10 units, one per image class. A sigmoid function is applied at this fourth layer to produce the final output.
The sigmoid function is defined as:

σ(x) = 1 / (1 + e^{-x})
The structure of the DBN classifier is shown in Figure 2.
3) Implementation flow of the IBFCP protocol
The deep belief network image classification protocol based on low-level fused features uses the Corel 1K database, randomly selecting 90% of it as the training set and the remaining 10% as the test set. The algorithm flow is shown in Figure 3.
The specific steps are as follows:
(1) Feature expression and fusion: extract the three types of feature information (color, texture and shape) from every image, 48 features in total, forming a 48-dimensional feature vector per image and a 1000×48 feature set for the 1000 images.
(2) Normalization: to make subsequent operations more accurate and keep all data on a consistent scale, the feature vectors must be normalized so that every value lies in [0, 1]. The normalization formula is:

X' = (X_i − X_min) / (X_max − X_min)

where X_i is a feature value and X_min and X_max are the minimum and maximum values of that feature in the feature set.
(3) Data split: randomly select 900 samples (90%) from the feature set as the training set and the remaining 100 (10%) as the test set.
(4) Training: a 4-layer DBN structure is trained with the fast contrastive-divergence learning algorithm proposed by Hinton et al. (Hinton G, Osindero S, Teh Y. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7):1527-1554).
(5) Testing: the weights and biases obtained from DBN training are applied to the test set. The error between the samples obtained after one step of Gibbs sampling from the RBM distribution and the original data is evaluated, yielding the classification result.
4) Experiments and analysis of results
To validate the algorithm above, the software simulation environment is Matlab 2013a running on Windows 8.1; the hardware is an Intel(R) Core(TM)2 Duo E8400 CPU at 3.0 GHz with 4 GB of memory and a 320 GB hard disk.
a. Experimental data
The Corel image library is one of the most commonly used collections for image classification and retrieval. It comes in two variants, Corel 10K and Corel 1K. All images are 256×384 or 384×256 pixel JPEG images. Corel 10K contains 10,000 images in 100 classes of 100 images each; Corel 1K contains 10 classes of 100 images each.
To allow comparison with the literature (Hinton G E. Training Products of Experts by Minimizing Contrastive Divergence[J]. Neural Computation, 2002, 14(8):1771-1800; Rao M B, Kavitieval C H. A New Feature Set for Content Based Image Retrieval[J]. Information Communication and Embedded Systems, 2013, 1(1):84-89), this work uses the same image library, Corel 1K. Its 10 classes are flowers, horses, dinosaurs, elephants, buildings, beaches, buses, people, food and mountains, numbered 1 to 10, with 100 images per class and 1000 images in total. Figure 4 shows one image from each of the 10 classes.
b. Data grouping
The whole image library is divided into two parts, a training set containing 90% of the samples and a test set containing the remaining 10%. The assignment into groups is random; the random grouping results are shown in Table 1.
Table 1 Random grouping results
c. Experimental results
Each run takes 9 of the 10 groups as the training set and the remaining group as the test set, producing one result. This is repeated 10 times so that every sample serves once in a test set, yielding 10 accuracy figures.
Figure 5 shows the classification accuracy of each of the 10 runs. Group 10 has the highest accuracy, 92%; group 7 has the lowest, 79%; the average accuracy is 85.1%.
From the statistics of the 10 runs, the misclassifications of each image class are computed, as shown in Table 2. Each row of Table 2 describes the classification of one image class (100 images in total); a_ij (i = 1, 2, …, 10; j = 1, 2, …, 10) is the number of class-i images assigned to class j. The total of column j is the number of the 1000 images assigned to class j (ideally 100 per class), and the last column gives the classification accuracy of the corresponding class. Table 2 shows that accuracy varies across the 10 classes: the dinosaur class is highest at 100%, entirely correct, while the four classes "people", "beach", "building" and "elephant" fall below 80%.
Table 2 Experimental classification results
Figure 6 shows the misclassification rate of each of the 10 classes. The misclassification rate is the number of images wrongly assigned to a class divided by the total number of images assigned to that class. For example, of the 89 images assigned to the "people" class, 17 are wrong, giving 17/89 × 100% = 19.1%. As Figure 6 shows, the "building", "elephant" and "mountain" classes have relatively high misclassification rates, all above 20%, while the "bus", "dinosaur" and "flower" classes have low rates.
d. Comparison of methods
1) Single features versus the proposed algorithm
Table 3 lists the classification results of common single features, mainly the gray-level histogram, color histogram, gray-level co-occurrence matrix and color co-occurrence matrix, together with the result of the proposed algorithm. The classification size of the first five methods is 16.
Table 3 Classification accuracy (%) of single features versus the proposed algorithm
Table 3 shows that the average classification accuracy of every single feature is below 70%, while the proposed feature-fusion algorithm reaches 85.1%, a clearly better classification result.
This invention proposes a new DBN image classification algorithm: general color, texture and shape features are first extracted from the original image, and these features are then used as the input data for deep belief network training. Fusing color, texture and shape features addresses the low accuracy of single features and of existing algorithms, and training with a 4-layer DBN overcomes the poor classification of single features and of shallow-structure algorithms such as support vector machines (SVM) and Boosting, while also avoiding the slowness of training directly at the pixel level. The algorithm reaches an average classification accuracy of nearly 86%, higher than single-feature classifiers and other mainstream classification algorithms.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611240795.9A CN106845525A (en) | 2016-12-28 | 2016-12-28 | A kind of depth confidence network image bracket protocol based on bottom fusion feature |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106845525A true CN106845525A (en) | 2017-06-13 |
Family
ID=59114568
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104851099A (en) * | 2015-05-21 | 2015-08-19 | 周口师范学院 | Method for image fusion based on representation learning |
CN105374019A (en) * | 2015-09-30 | 2016-03-02 | 华为技术有限公司 | A multi-depth image fusion method and device |
Non-Patent Citations (1)
Title |
---|
许庆勇等: "基于多特征融合的深度置信网络图像分类算法" (Xu Qingyong et al.: "Deep belief network image classification algorithm based on multi-feature fusion"), 《计算机工程》 (Computer Engineering) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
WD01 | Invention patent application deemed withdrawn after publication ||
Application publication date: 20170613 |