CN109685076A

CN109685076A - A kind of image-recognizing method based on SIFT and sparse coding

Info

Publication number: CN109685076A
Application number: CN201811481734.0A
Authority: CN
Inventors: 李俊; 李琦铭; 兰晓东
Original assignee: Quanzhou Institute of Equipment Manufacturing
Current assignee: Quanzhou Institute of Equipment Manufacturing
Priority date: 2018-12-05
Filing date: 2018-12-05
Publication date: 2019-04-26

Abstract

The present invention provides an image recognition method based on SIFT and sparse coding, comprising: extracting dense SIFT features and pixel block features from each image of an RGB-D object respectively; and using K-SVD algorithm according to the extracted dense SIFT features Calculate the SIFT feature dictionary, and use the matching pursuit MP algorithm to solve the sparse coding of SIFT to obtain the first image feature; use K-SVD to learn the dictionary of pixel blocks, and use the OMP algorithm to obtain the sparse coding of pixels, and obtain the unit through the maximum pooling algorithm. feature, connect multiple unit features into block features, calculate sparse coding based on block features, and link block features and corresponding sparse coding to obtain second image features; use pyramid pooling algorithm to link all features to obtain fused image features Perform image recognition. By adopting the image recognition method of the present invention, the image recognition accuracy rate can be improved.

Description

An Image Recognition Method Based on SIFT and Sparse Coding

技术领域technical field

本发明涉及图像处理技术领域，尤其涉及一种基于SIFT和稀疏编码的图像识别方法。The invention relates to the technical field of image processing, in particular to an image recognition method based on SIFT and sparse coding.

背景技术Background technique

基于RGB-D信息的物体识别在计算机视觉和机器视觉领域是一个很重要的课题，并且有了相关应用比如人脸识别，手势识别，文字识别和车辆识别。Object recognition based on RGB-D information is an important topic in the field of computer vision and machine vision, and has related applications such as face recognition, gesture recognition, text recognition and vehicle recognition.

目前基于传统算法的物体识别算法中，很多只在初始阶段使用SIFT提取物体特征再结合稀疏编码，识别效果不佳。比如现有的申请号为201510567889.6的中国专利——基于归一化非负稀疏编码器的图像快速特征表示方法，以及申请号为201510874639.7的中国专利——基于多方向上下文信息和稀疏编码模型的图像分类方法，基于这些方法可以有效提取物体梯度信息但是会忽略颜色和形状等特征。还有一些算法直接从图像用稀疏编码提取物体特征的如HMP算法，这种算法可以有效提取图像中物体的颜色和形状信息，但是这种算法会忽略物体梯度信息，在特征提取以及识别上准确率不高，不能很好的识别智能家居中生活常见物体。Among the current object recognition algorithms based on traditional algorithms, many only use SIFT to extract object features and combine sparse coding in the initial stage, and the recognition effect is not good. For example, the existing Chinese patent application No. 201510567889.6 - Image Fast Feature Representation Method Based on Normalized Non-negative Sparse Encoder, and the Chinese Patent Application No. 201510874639.7 - Image Based on Multi-directional Context Information and Sparse Coding Model Classification methods, which can effectively extract object gradient information but ignore features such as color and shape. There are also some algorithms that directly extract object features from the image using sparse coding, such as the HMP algorithm. This algorithm can effectively extract the color and shape information of the object in the image, but this algorithm ignores the gradient information of the object and is accurate in feature extraction and recognition. The rate is not high, and it cannot well identify common objects in smart home life.

发明内容SUMMARY OF THE INVENTION

本发明要解决的技术问题，在于提供一种基于SIFT和稀疏编码的图像识别方法，提高图像识别准确率。The technical problem to be solved by the present invention is to provide an image recognition method based on SIFT and sparse coding to improve the image recognition accuracy.

本发明是这样实现的：一种基于SIFT和稀疏编码的图像识别方法，包括如下步骤：The present invention is realized as follows: an image recognition method based on SIFT and sparse coding, comprising the following steps:

步骤10、从RGB-D对象的每个图像上分别提取密集SIFT特征以及像素块特征；Step 10. Extract dense SIFT features and pixel block features from each image of the RGB-D object;

步骤20、根据提取到的密集SIFT特征，采用K-SVD算法计算SIFT特征字典，并采用匹配追踪MP算法求解SIFT的稀疏编码，得到第一图像特征；Step 20, according to the extracted dense SIFT features, adopt the K-SVD algorithm to calculate the SIFT feature dictionary, and adopt the matching pursuit MP algorithm to solve the sparse coding of the SIFT to obtain the first image feature;

步骤30、采用K-SVD学习像素块的字典，并利用OMP算法得到像素的稀疏编码，通过最大池化算法得到单元特征，将复数个单元特征连接为块特征，计算基于块特征的稀疏编码，将块特征和对应的稀疏编码链接得到第二图像特征；Step 30, using K-SVD to learn the dictionary of pixel blocks, and using the OMP algorithm to obtain the sparse coding of the pixels, obtaining the unit features through the maximum pooling algorithm, connecting the plurality of unit features as block features, and calculating the sparse coding based on the block features, Linking the block feature and the corresponding sparse coding to obtain the second image feature;

步骤40、采用金字塔池化算法分别应用于第一图像特征和第二图像特征，链接所有特征，得到融合后的图像特征进行图像识别；Step 40, adopt the pyramid pooling algorithm to be applied to the first image feature and the second image feature respectively, link all the features, and obtain the fused image feature for image recognition;

上述步骤20和步骤30不分先后进行。The above steps 20 and 30 are performed in no particular order.

进一步的，所述步骤10中的密集SIFT特征的提取方式为：在图像上每隔四像素提取一个图像块，每一图像块提取一个SIFT特征；Further, the method of extracting the dense SIFT feature in the step 10 is: extracting an image block every four pixels on the image, and extracting a SIFT feature from each image block;

所述密集SIFT特征的矩阵表示为：Y＝{y₁,y₂,…,y_u}，其中，y_i是第i个SIFT特征，u是图像块个数。The matrix of the dense SIFT features is expressed as: Y={y ₁ , y ₂ , ..., y _u }, where y _i is the i-th SIFT feature, and u is the number of image blocks.

进一步的，所述步骤20中SIFT的稀疏编码的求解过程如下：Further, the solution process of the sparse coding of SIFT in the step 20 is as follows:

步骤21、在每张图片上采集到复数个SIFT特征，用Y_s＝{y_s1,y_s2,y_s3,…,y_sp}表示样本矩阵，用D_s＝{d_s1,d_s2,…,d_sm}∈R^h×m表示经学习得到的SIFT特征字典，其中，d_si为学习得到的字或者基，X_s＝{x_s1,x_s2,…x_sp}∈R^p×m为稀疏编码，其中x_s代表y_s的稀疏编码；Step 21. Collect a plurality of SIFT features on each image, use Y _s ={y _s1 ,y _s2 ,y _s3 ,...,y _sp } to represent the sample matrix, and use D _s ={d _s1 ,d _s2 ,... ,d _sm }∈R ^h×m represents the learned SIFT feature dictionary, where d _si is the learned word or basis, X _s ={x _s1 ,x _s2 ,…x _sp }∈R ^p×m is Sparse coding, where x _s represents the sparse coding of y _s ;

步骤22、通过下面的公式计算字典：Step 22. Calculate the dictionary by the following formula:

其中，||.||_F代表F范数，||.||₀表示向量中的非零元素个数，o是一个非零数，表示稀疏水平的上限；Among them, ||.|| _F represents the F norm, ||.|| ₀ represents the number of non-zero elements in the vector, o is a non-zero number, representing the upper limit of the sparsity level;

步骤23、在SIFT特征字典的计算过程中采用K-Means算法生成初始SIFT特征字典D；Step 23, using the K-Means algorithm to generate the initial SIFT feature dictionary D in the calculation process of the SIFT feature dictionary;

步骤24、将字典求解问题分解为复数个子问题进行分步求解，具体的采用如下公式进行：Step 24: Decompose the dictionary solution problem into a plurality of sub-problems for step-by-step solution, specifically using the following formula:

采用SVD分解算法来分步更新优化字典，对于第i个字的更新优化过程如下：The SVD decomposition algorithm is used to update the optimization dictionary step by step. The update optimization process for the i-th word is as follows:

其中i表示的取值范围对于SIFT特征是(1，1024)，对于j和i是同一个含义，都是字典中的字，x_i表示第i个字对应的相关特征的稀疏编码在第个维度的值；The value range represented by i is (1, 1024) for the SIFT feature. For j and i, they have the same meaning and are both words in the dictionary. _xi represents the sparse coding of the relevant feature corresponding to the i-th word. the value of the dimension;

其中，表示X_s的行，E_i表示残差矩阵，优化后的字d_i和优化的系数是将SVD算法应用于残差矩阵E_i得到的，更新时利用基中所涉及的行，重复这个过程直到收敛或者达到预设的迭代次数；in, represents the row of X _s , E _i represents the residual matrix, the optimized word d _i and the optimized coefficients It is obtained by applying the SVD algorithm to the residual matrix E _i , using the rows involved in the basis when updating, and repeating this process until convergence or reaching the preset number of iterations;

步骤25、利用最终更新得到的SIFT特征字典采用匹配追踪算法求解SIFT的稀疏编码，得到第一图像特征。Step 25 , using the finally updated SIFT feature dictionary to solve the sparse coding of the SIFT by using the matching pursuit algorithm to obtain the first image feature.

进一步的，所述步骤30进一步包括：Further, the step 30 further includes:

步骤31、采用K-SVD学习像素块的字典；Step 31, using K-SVD to learn the dictionary of pixel blocks;

步骤32、采用OMP算法得到像素的稀疏编码；Step 32, adopt the OMP algorithm to obtain the sparse coding of the pixel;

步骤33、采用最大池化算法得到单元特征；Step 33, using the maximum pooling algorithm to obtain unit features;

步骤34、将复数个单元特征连接为块特征，所述块特征包括颜色块特征，深度块特征和表面法向量块特征；Step 34, connect a plurality of unit features as block features, and the block features include color block features, depth block features and surface normal vector block features;

步骤35、计算基于块特征的稀疏编码；Step 35, calculate the sparse coding based on the block feature;

步骤36、将块特征和对应的稀疏编码链接得到第二图像特征，所述第二图像特征P_v表示为：P_v＝{p_v1,p_v2,...,p_v3}，其中，v表示颜色、深度或法向量。Step 36: Link the block feature and the corresponding sparse coding to obtain a second image feature, where the second image feature P _v is expressed as: P _v ={p _v1 ,p _v2 ,...,p _v3 }, where v Represents a color, depth, or normal vector.

进一步的，所述步骤40具体如下：金子塔池化采用3×3，2×2，1×1三层，共划分为14个子区域，在每个子区域上用最大池化算法求得子区域特征HP_vb，b＝1，2，…，14；Further, the step 40 is specifically as follows: the pyramid pooling adopts three layers of 3×3, 2×2, and 1×1, which are divided into 14 sub-areas, and the maximum pooling algorithm is used to obtain the sub-areas on each sub-area. feature HP _vb , b=1, 2, . . . , 14;

设Θ_v＝{HP_v1,HP_v2,...,HP_v14}，其中v表示颜色，深度和表面法向量，则融合后的物体图像特征表示为：Let Θ _v ={HP _v1 ,HP _v2 ,...,HP _v14 }, where v represents the color, depth and surface normal vector, then the fused object image features are expressed as:

ψ＝{Θ_rgb,Θ_depth,Θ_normal,φ}；ψ = {Θ _rgb , Θ _depth , Θ _normal , φ};

之后将ψ除以进行归一化处理得到最终融合的图像特征，其中ε＝0.001。Then divide ψ by Perform normalization to obtain the final fused image features, where ε=0.001.

本发明具有如下优点：The present invention has the following advantages:

1、设计算法框架用于改进的SIFT的稀疏编码和HMP算法相结合，提高物体识别准确率；1. The algorithm framework is designed for the combination of improved SIFT sparse coding and HMP algorithm to improve the accuracy of object recognition;

2、本发明设计的算法与HMP或基于SIFT的稀疏编码特征可以提取丰富的特征，使用RGB-D信息向HMP添加梯度信息，获得了具有更丰富特征的表示；2. The algorithm designed by the present invention and HMP or SIFT-based sparse coding feature can extract rich features, and use RGB-D information to add gradient information to HMP to obtain a representation with more abundant features;

3、提出了多通道特征融合方案，通过修改HMP算法，以适应基于SIFT的稀疏编码。3. A multi-channel feature fusion scheme is proposed to adapt to SIFT-based sparse coding by modifying the HMP algorithm.

附图说明Description of drawings

下面参照附图结合实施例对本发明作进一步的说明。The present invention will be further described below with reference to the accompanying drawings and embodiments.

图1为本发明一种基于SIFT和稀疏编码的图像识别方法执行流程图。FIG. 1 is an execution flow chart of an image recognition method based on SIFT and sparse coding according to the present invention.

具体实施方式Detailed ways

如图1所示，本发明的一种基于SIFT和稀疏编码的图像识别方法，包括如下步骤：As shown in Figure 1, an image recognition method based on SIFT and sparse coding of the present invention includes the following steps:

较佳的，所述步骤10中的密集SIFT特征的提取方式为：在图像上每隔四像素提取一个图像块，每一图像块提取一个SIFT特征；Preferably, the extraction method of the dense SIFT feature in the step 10 is: extracting an image block every four pixels on the image, and extracting a SIFT feature from each image block;

较佳的，所述步骤20中SIFT的稀疏编码的求解过程如下：Preferably, the solution process of the sparse coding of SIFT in the step 20 is as follows:

步骤21、在每张图片上采集到复数个SIFT特征，用Y_s＝{y_s1,y_s2,y_s3,…,y_sp}表示样本矩阵，用D_s＝{d_s1,d_s2,…,d_sm}∈R^h×m表示经学习得到的SIFT特征字典，其中，d_si为学习得到的字或者基，X_s＝{x_s1,x_s2,…x_sp}∈R^p×m为稀疏编码，其中x_s代表y_s的稀疏编码，其中R^p×m的p和m分别表示特征个数与对应的稀疏编码的维度，R^h×m的h和m分别表示字典个数与字典的维度，H取1024。Step 21. Collect a plurality of SIFT features on each image, use Y _s ={y _s1 ,y _s2 ,y _s3 ,...,y _sp } to represent the sample matrix, and use D _s ={d _s1 ,d _s2 ,... ,d _sm }∈R ^h×m represents the learned SIFT feature dictionary, where d _si is the learned word or basis, X _s ={x _s1 ,x _s2 ,…x _sp }∈R ^p×m is Sparse coding, where x _s represents the sparse coding of y _s , where p and m of R ^p×m represent the number of features and the corresponding dimension of sparse coding, respectively, and h and m of R ^h×m represent the number of dictionaries and the dictionary, respectively dimension, H takes 1024.

其中i表示的取值范围对于SIFT特征是(1，1024)，对于j和i是同一个含义，都是字典中的字，x_i表示第i个字对应的相关特征的稀疏编码在第个维度的值；The value range represented by i is (1, 1024) for the SIFT feature. For j and i, they have the same meaning and are both words in the dictionary. _xi represents the sparse encoding of the relevant feature corresponding to the i-th word. the value of the dimension;

其中，表示X_s的行，E_i表示残差矩阵，优化后的字d_i和优化的系数是将SVD算法应用于残差矩阵E_i得到的，更新时利用基中所涉及的行，重复这个过程直到收敛或者达到预设的迭代次数；in, represents the row of X _s , E _i represents the residual matrix, the optimized word d _i and the optimized coefficients is obtained by applying the SVD algorithm to the residual matrix E _i , using the rows involved in the basis when updating, and repeating this process until convergence or reaching the preset number of iterations;

较佳的，所述步骤30进一步包括：Preferably, the step 30 further includes:

步骤35、计算基于块特征的稀疏编码Step 35. Calculate sparse coding based on block features

步骤36、将块特征和对应的稀疏编码链接得到第二图像特征，所述第二图像特征P_v表示为：P_v＝{p_v1,p_v2,...,p_v3}，v表示颜色、深度和表面法向量，这里的颜色分指红色，绿色和蓝色；Step 36: Link the block feature and the corresponding sparse coding to obtain a second image feature, where the second image feature P _v is expressed as: P _v ={p _v1 ,p _v2 ,...,p _v3 }, where v represents the color , depth and surface normal vector, where the colors refer to red, green and blue;

较佳的，所述步骤40具体如下：金子塔池化采用3×3，2×2，1×1三层，共划分为14个子区域，在每个子区域上用最大池化算法求得子区域特征HP_vb，b＝1，2，…，14；Preferably, the step 40 is as follows: the pyramid pooling adopts three layers of 3×3, 2×2, and 1×1, which are divided into 14 sub-regions, and the maximum pooling algorithm is used to obtain the sub-regions on each sub-region. Region feature HP _vb , b=1, 2, . . . , 14;

其中，Θ_rgb指的是,Θred，Θgreen和Θblue的组合，Θred指的是红色通道图像特征，Θgreen指的是绿色通道图像特征而Θblue指的是蓝色通道图像特征，图像特征指的是将金字塔池化算法用于每一个通道上块特征与对应的稀疏编码的链接。where _Θrgb refers to the combination of Θred, Θgreen and Θblue, Θred refers to the red channel image feature, Θgreen refers to the green channel image feature and Θblue refers to the blue channel image feature, and the image feature refers to the The pyramid pooling algorithm is used to link the block features on each channel with the corresponding sparse coding.

之后将ψ除以进行归一化处理得到最终融合的图像特征，其中ε＝0.001。Then divide ψ by Perform normalization processing to obtain the final fused image features, where ε=0.001.

由于SIFT特征可以提取物体的梯度方向直方图，采用SIFT进行密集采样可以保留物体强化的细节特征。SIFT算法本身具有尺度旋转不变性，同时具有一定的仿射不变性，依靠这种特性可以得到稳定的图像块特征表示。本发明采用SIFT特征进行密集采样之后，根据得到的特征矩阵，利用K-SVD算法进行字典学习，之后在求解块特征时采用MP(matchingpursuit)算法得到块的稀疏表示，目的是尽可能多的保留图像块的细节特征，最后采用金字塔池化算法链接所有特征，得到图像的特征表示。Since the SIFT feature can extract the gradient direction histogram of the object, dense sampling with SIFT can preserve the detailed features of the object enhancement. The SIFT algorithm itself has scale and rotation invariance, and also has a certain affine invariance. Relying on this characteristic, a stable image block feature representation can be obtained. The present invention adopts the SIFT feature to perform intensive sampling, uses the K-SVD algorithm to learn the dictionary according to the obtained feature matrix, and then adopts the MP (matchingpursuit) algorithm to obtain the sparse representation of the block when solving the block feature. The purpose is to retain as much as possible The detailed features of the image block, and finally the pyramid pooling algorithm is used to link all the features to obtain the feature representation of the image.

每个图像划分为若干16*16图像块像素，每个图像块像素划分为16个4*4单元。直接在像素块上利用稀疏编码进行特征学习的优势是，这种像素块学得的字典可以保存像素块的整体特征，包括像素之间空间位置关系，包含的几何特征，包含的像素值之间的联系等。这样借助基于像素块的字典，可以求得像素的稀疏编码。将最大池化算法应用于单元中所有像素对应的稀疏编码可以得到单元特征(这里单元特征指的是4*4的像素块)，然后将单元特征链接起来得到块特征。这样求得的单元特征，可以保留单元中最显著特征，而将单元特征链接在一起则可以在保留显著信息的同时，保留其空间位置信息。针对块特征，继续进行特征学习得到基于块的字典和稀疏编码，这是为了提取更为抽象的特征。最终将金字塔池化算法分别应用于基于SIFT的稀疏编码、基于块的特征和基于块的稀疏编码，将他们链接在一起就可以得到最后的特征表示。其中SIFT提取灰度信息，直接利用稀疏编码的则是提取颜色，深度和法向量信息。Each image is divided into several 16*16 image block pixels, and each image block pixel is divided into 16 4*4 units. The advantage of using sparse coding for feature learning directly on the pixel block is that the dictionary learned by this pixel block can save the overall characteristics of the pixel block, including the spatial position relationship between pixels, the included geometric features, and the included pixel values. contact, etc. In this way, with the help of a pixel block-based dictionary, the sparse coding of pixels can be obtained. Applying the max pooling algorithm to the sparse coding corresponding to all pixels in the unit can obtain the unit feature (here the unit feature refers to a 4*4 pixel block), and then link the unit features to obtain the block feature. The unit features obtained in this way can retain the most salient features in the unit, and linking the unit features together can retain the spatial location information while retaining the salient information. For block features, feature learning is continued to obtain block-based dictionaries and sparse coding, which is to extract more abstract features. Finally, the pyramid pooling algorithm is applied to SIFT-based sparse coding, block-based feature and block-based sparse coding, respectively, and linking them together can get the final feature representation. Among them, SIFT extracts grayscale information, and directly uses sparse coding to extract color, depth and normal vector information.

本发明从RGB-D信息中学习SIFT特征和块特征，块特征能够从RGB-D信息中提取颜色、形状特征、空间几何特征和方向特征。密集的SIFT特征能够捕获梯度特征。然后用K-SVD算法计算SIFT特征字典。这样就可以得到更多的梯度信息。之后使用匹配追踪(MP)来计算SIFT和块特征的稀疏编码。将块特征和相应的稀疏编码的拼接，在SIFT稀疏编码的基础上，采用简单的三层分割3×3，2×2，1×1的空间金字塔池化算法，并利用块特征和相关稀疏编码特征生成最终图像特征，从而提高图像识别准确率。The present invention learns SIFT feature and block feature from RGB-D information, and block feature can extract color, shape feature, spatial geometric feature and directional feature from RGB-D information. Dense SIFT features are able to capture gradient features. Then use the K-SVD algorithm to calculate the SIFT feature dictionary. In this way, more gradient information can be obtained. Match pursuit (MP) is then used to compute SIFT and sparse coding of block features. The block features and the corresponding sparse coding are concatenated, and on the basis of SIFT sparse coding, a simple three-layer segmentation 3 × 3, 2 × 2, 1 × 1 spatial pyramid pooling algorithm is used, and the block features and related sparse coding are used. The encoded features generate final image features, thereby improving image recognition accuracy.

在一实施例中，本发明将多特征结合的稀疏表示与一些经典的对象识别算法在华盛顿RGB-D对象数据集进行比较，数据集包含51个类别和300个实例，每个类别包含多个实例，实例是从30°，45°，60°不同角度拍摄实验，在实验中使用了41788个RGB-D图像，该数据设计视图、旋转、缩放、纹理和纹理较少的照明问题，对于对象识别，通过训练和测试过程，在数据集上使用比较的方法进行，有两种类型的对象识别任务需要要执行：分类识别和实例识别。不可见的物体是由类别识别中的名字来确定的。一个实例被随机地从每个类别中剥离出来进行测试，其余的300-51＝249个对象用于每个踪迹中的训练。在10个随机训练/测试分割上的平均精度。例如识别，从30°和60°角度获得的图像被用于训练，并且使用45°角的图像进行测试。In one embodiment, the present invention compares the sparse representation of the multi-feature combination with some classical object recognition algorithms on the Washington RGB-D object dataset, which contains 51 categories and 300 instances, each category contains multiple Example, the example is the experiment taken from different angles of 30°, 45°, 60°, 41788 RGB-D images are used in the experiment, the data design view, rotation, scaling, texture and texture less lighting problems, for objects Recognition, through training and testing processes, is performed on datasets using comparative methods, and there are two types of object recognition tasks that need to be performed: classification recognition and instance recognition. Invisible objects are identified by name in class recognition. One instance was randomly stripped from each class for testing, and the remaining 300-51=249 objects were used for training in each trace. Average accuracy on 10 random train/test splits. For example recognition, images obtained from 30° and 60° angles were used for training, and images from 45° angles were used for testing.

分类识别实验对比结果如表1所示：The comparison results of classification and recognition experiments are shown in Table 1:

表1Table 1

RGBRGB RGB-DRGB-D Kernel descriptorKernel descriptor 80.7±2.180.7±2.1 86.5±2.186.5±2.1 HMPHMP 74.7±2.574.7±2.5 82.1±3.382.1±3.3 改进的HMPImproved HMP 82.4±3.182.4±3.1 87.5±2.987.5±2.9 VGGVGG 88.9±2.188.9±2.1 91.8±2.491.8±2.4 MJSRMJSR 91.5±1.591.5±1.5 92.8±1.392.8±1.3

实例识别实验对比结果如表2所示：The comparison results of instance recognition experiments are shown in Table 2:

表2Table 2

RGBRGB RGB-DRGB-D Kernel descriptorKernel descriptor 90.890.8 91.291.2 HMPHMP 75.875.8 78.978.9 改进的HMPImproved HMP 92.192.1 92.892.8 MJSRMJSR 92.692.6 92.292.2

通过实验对比可以看出本发明的物体识别方法采用颜色或者将颜色和深度信息相结合都可以得到与Upgraded HMP具有竞争力的结果。整体超出HMP原有算法13个百分点以上，无论是结合颜色信息还是结合颜色和深度信息，与核函数相比，也是具有竞争力的，结合颜色特征比核函数高1.8个百分点，比结合颜色合深度高1个百分点。Through experimental comparison, it can be seen that the object recognition method of the present invention can obtain competitive results with the Upgraded HMP by using color or combining color and depth information. Overall, it exceeds the original algorithm of HMP by more than 13 percentage points. Whether it is combining color information or combining color and depth information, it is also competitive with the kernel function. Depth is 1 percentage point higher.

虽然以上描述了本发明的具体实施方式，但是熟悉本技术领域的技术人员应当理解，我们所描述的具体的实施例只是说明性的，而不是用于对本发明的范围的限定，熟悉本领域的技术人员在依照本发明的精神所作的等效的修饰以及变化，都应当涵盖在本发明的权利要求所保护的范围内。Although the specific embodiments of the present invention have been described above, those skilled in the art should understand that the specific embodiments we describe are only illustrative, rather than used to limit the scope of the present invention. Equivalent modifications and changes made by a skilled person in accordance with the spirit of the present invention should be included within the scope of protection of the claims of the present invention.

Claims

1. An image identification method based on SIFT and sparse coding is characterized in that: the method comprises the following steps:

step 10, extracting dense SIFT features and pixel block features from each image of the RGB-D object respectively;

step 20, calculating an SIFT feature dictionary by adopting a K-SVD algorithm according to the extracted dense SIFT features, and solving sparse coding of SIFT by adopting a matching pursuit MP algorithm to obtain first image features;

step 30, learning a dictionary of the pixel block by adopting K-SVD, obtaining sparse codes of the pixels by utilizing an OMP algorithm, obtaining unit features by utilizing a maximum pooling algorithm, connecting a plurality of unit features into block features, calculating sparse codes based on the block features, and linking the block features and the corresponding sparse codes to obtain second image features;

step 40, respectively applying a pyramid pooling algorithm to the first image feature and the second image feature, linking all the features, obtaining fused image features, and performing image recognition;

the step 20 and the step 30 are not performed successively.

2. The image recognition method based on SIFT and sparse coding as claimed in claim 1, wherein: the extraction method of the dense SIFT features in the step 10 is as follows: extracting an image block every four pixels on the image, wherein each image block extracts an SIFT feature;

the matrix of dense SIFT features is represented as: y ═ Y₁,y₂,…,y_uIn which y_iIs the ith SIFT feature and u is the image block number.

3. The image recognition method based on SIFT and sparse coding as claimed in claim 1, wherein: the solving process of the sparse coding of SIFT in the step 20 is as follows:

step 21, collecting a plurality of SIFT features on each picture, and using Y_s＝{y_s1,y_s2,y_s3,…,y_spDenotes the sample matrix, with D_s＝{d_s1,d_s2,…,d_sm}∈R^h×mRepresents a dictionary of learned SIFT features, wherein d_siFor learning the word or radical, X_s＝{x_s1,x_s2,…x_sp}∈R^p×mIs sparse coding, where x_sRepresents y_sSparse coding of (2);

step 22, calculating the dictionary by the following formula:

wherein | |_FRepresents F norm, | |. | luminance₀The number of non-zero elements in the vector is represented, and o is a non-zero number and represents the upper limit of the sparse level;

step 23, generating an initial SIFT feature dictionary D by adopting a K-Means algorithm in the calculation process of the SIFT feature dictionary;

step 24, decomposing the dictionary solving problem into a plurality of subproblems for stepwise solving, and specifically adopting the following formula:

and (3) updating the optimized dictionary step by adopting an SVD (singular value decomposition) algorithm, wherein the updating optimization process for the ith word is as follows:

wherein,represents X_sLine of (E), E_iRepresenting residual matrices, optimized word d_iAnd optimized coefficientsIs to apply SVD algorithm to residual matrix E_iIf so, the process is repeated by using the rows related in the base during updating until convergence or preset iteration times are reached;

and 25, solving the sparse coding of SIFT by using the finally updated SIFT feature dictionary and adopting a matching pursuit algorithm to obtain first image features.

4. The image recognition method based on SIFT and sparse coding as claimed in claim 1, wherein: said step 30 further comprises:

step 31, learning a dictionary of pixel blocks by adopting K-SVD;

step 32, obtaining sparse coding of the pixels by adopting an OMP algorithm;

step 33, obtaining unit characteristics by adopting a maximum pooling algorithm;

step 34, connecting the plurality of unit features into block features, wherein the block features comprise color block features, depth block features and surface normal vector block features;

step 35, calculating sparse coding based on block characteristics;

step 36, linking the block features and the corresponding sparse codes to obtain second image features, wherein the second image features P_vExpressed as: p_v＝{p_v1,p_v2,...,p_v3Where v represents a color, depth, or normal vector.

5. The SIFT and sparse coding based image recognition method according to claim 4, wherein: the step 40 is specifically as follows: the gold subzone pooling adopts three layers of 3 × 3, 2 × 2 and 1 × 1, which are totally divided into 14 sub-zones, and the maximum pooling algorithm is used for obtaining the sub-zone characteristics HP on each sub-zone_vb，b＝1，2，…，14；

Let Θ be_v＝{HP_v1,HP_v2,...,HP_v14And v represents color, depth and surface normal vectors, the fused object image features are represented as:

ψ＝{Θ_rgb,Θ_depth,Θ_normal,φ}；

after which psi is divided byAnd (5) carrying out normalization processing to obtain the finally fused image characteristics, wherein epsilon is 0.001.