CN101980250B

CN101980250B - Method for identifying target based on dimension reduction local feature descriptor and hidden conditional random field

Info

Publication number: CN101980250B
Application number: CN201010515864.9A
Authority: CN
Inventors: 李超; 池毅韬; 郭信谊; 熊璋
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2010-10-15
Filing date: 2010-10-15
Publication date: 2014-06-18
Anticipated expiration: 2030-10-15
Also published as: CN101980250A

Abstract

The invention provides a target recognition method based on dimension-reduced local feature descriptors and a hidden conditional random field (HCRF). The method builds a target recognition model and then uses it to recognize objects. Model building is supervised: training images serve as samples, and each object in the training images corresponds to a distinct label value. First, SIFT (scale-invariant feature transform) descriptor vectors are computed for the training images containing the different objects, so that each image yields a set of high-dimensional descriptor vectors. The neighbor preserving embedding (NPE) method then reduces the dimension of each SIFT set. The reduced vector set of an image and the label of the object in that image form a pair, one pair per image, and the collection of these pairs is the sample set used to train the HCRF model. For recognition, the SIFT descriptor set of a given test image is computed and reduced with NPE, the reduced vector set is fed into the trained HCRF model, and the model outputs an object label as the recognition result.

Description

Object Recognition Method Based on Dimensionality Reduction Local Feature Descriptor and Hidden Conditional Random Field

Technical field

The invention relates to a target recognition method based on dimension-reduced local feature descriptors and a hidden conditional random field. Specifically, it is a computer vision method that combines local image feature extraction, dimensionality reduction, and a hidden conditional random field to build a model and classify target images.

Background art

As one of the most important directions in computer vision, target recognition is the basis for higher-level processing such as target classification, video retrieval, and behavior understanding. Many methods exist, including detection based on changing contours, detection based on feature modeling, detection based on color statistics with the EM algorithm, region-based detection, and detection based on frame differencing. These classic methods are concise and easy to understand, but their results are unsatisfactory: simple feature information is not sufficient to discriminate objects. In the improved algorithms that followed, some features still tend to cancel each other out, so the more successful target recognition methods to date are all specific to particular scenes.

Local features are a feature extraction approach that has emerged recently in computer vision and has been widely applied in object recognition, image registration, image retrieval, and 3D reconstruction. Local features are invariant to geometric and illumination transformations, robust to noise, occlusion, and background clutter, and highly distinctive.

For a target recognition task, extracting local features completes only the most basic step. The extracted information comprises the feature points and the descriptors corresponding to them. Descriptor matching, filtering of matched pairs, and the application of a probabilistic model are still needed to complete recognition, and this does not include building the object descriptor library. Moreover, the whole process of matching local features and then recognizing also relies on the physical correspondence of the surface of the recognized object.

The invention proposes a target recognition method based on dimension-reduced local feature descriptors and a hidden conditional random field. It first extracts SIFT (scale-invariant feature transform) descriptors from the image, then reduces the high-dimensional descriptors with neighbor preserving embedding (NPE) while preserving the relationships among the SIFT descriptors in the high-dimensional space, and finally builds a hidden conditional random field (HCRF) model and uses it for target recognition.

Summary of the invention

The problem addressed by this target recognition method based on dimension-reduced local feature descriptors and a hidden conditional random field is to extract the descriptors of an image's SIFT features, reduce their dimension with the NPE method, and use a hidden conditional random field to build a model and carry out target recognition.

The proposed method aims to establish a target recognition model for object recognition and comprises two processes, modeling and recognition. The modeling steps are:

(1) For each image in the training sample set that contains an object with a corresponding label value, extract its SIFT feature descriptors;

(2) Reduce the dimension of the extracted high-dimensional SIFT descriptors with the NPE method to obtain the reduced vector set;

(3) The reduced vector set of each image, together with the label number of the object, forms one sample for training the HCRF model; learning from all samples yields a hidden conditional random field model that can be used to recognize objects;

The recognition steps are:

(1) For each image in the test sample set that contains an object to be recognized, extract its SIFT feature descriptors;

(2) Reduce the dimension of the extracted high-dimensional SIFT descriptors with the NPE method to obtain the reduced vector set;

(3) Input the reduced vector set of each image into the trained hidden conditional random field model, which outputs the object label number as the final recognition result.

For each image in the training sample set that contains a labeled object, and for each image in the test sample set that contains an object to be recognized, SIFT feature extraction comprises two processes: feature point detection and descriptor computation. The feature point detection steps are:

(1) Scale-space extremum detection: to detect extrema in scale space, the points of the difference-of-Gaussian (DoG) image $D(x, y, \sigma)$ are traversed. $D(x, y, \sigma)$ is expressed as

$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$

where $k$ is the scale factor between two adjacent scales, $G(x, y, \sigma)$ is a Gaussian function with zero mean and standard deviation $\sigma$, $L(x, y, \sigma)$ is the Gaussian smoothing of the image at variable scale $\sigma$, $I(x, y)$ is the source image, and $*$ denotes convolution. Each point of $D(x, y, \sigma)$ is compared with its 8 neighbors in the same layer and with the 9 neighboring points in each of the layers above and below (26 points in total). If the value at the point is the maximum or minimum of this neighborhood, the point is taken as a candidate keypoint;

(2) Accurate feature point localization: if the detected local extremum is $X_0 = (x_0, y_0, \sigma)$, $D(x, y, \sigma)$ is expanded as a Taylor series, the expansion is differentiated, and the derivative is set to 0, which gives the accurate position corresponding to the local extremum $X_0$:

$X_{acc} = \left[ -\left( \frac{\partial^2 D}{\partial X^2} \right)^{-1} \frac{\partial D}{\partial X} \right] \Bigg|_{X = X_0};$
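The two detection steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names (`gaussian_smooth`, `dog_stack`, `local_extrema`, `refine`), the 3σ kernel truncation, the edge padding, and the default values of `sigma0`, `k` and `levels` are all chosen for the example, and the derivatives in the Taylor step are approximated by central finite differences. The value returned by `refine` is the solution of the zero-derivative condition expressed as an offset from the grid point $X_0$.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3*sigma and normalised to sum to 1."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    return kernel / kernel.sum()

def gaussian_smooth(img, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), computed as two 1-D convolutions."""
    kernel = gaussian_kernel(sigma)
    padded = np.pad(np.asarray(img, dtype=float), len(kernel) // 2, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def dog_stack(img, sigma0=1.6, k=2 ** 0.5, levels=4):
    """D(x, y, sigma_i) = L(x, y, k*sigma_i) - L(x, y, sigma_i); shape (levels, H, W)."""
    sigmas = [sigma0 * k**i for i in range(levels + 1)]   # illustrative scale values
    smoothed = [gaussian_smooth(img, s) for s in sigmas]
    return np.stack([smoothed[i + 1] - smoothed[i] for i in range(levels)])

def local_extrema(D):
    """Indices (s, y, x) whose value is strictly above or below all 26 neighbours."""
    found = []
    for s in range(1, D.shape[0] - 1):
        for y in range(1, D.shape[1] - 1):
            for x in range(1, D.shape[2] - 1):
                cube = D[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2].ravel()
                centre, others = cube[13], np.delete(cube, 13)
                if centre > others.max() or centre < others.min():
                    found.append((s, y, x))
    return found

def refine(D, s, y, x):
    """Offset -(d2D/dX2)^-1 dD/dX at X0 = (x, y, s), returned as (dx, dy, ds)."""
    c = D[s, y, x]
    grad = 0.5 * np.array([D[s, y, x + 1] - D[s, y, x - 1],
                           D[s, y + 1, x] - D[s, y - 1, x],
                           D[s + 1, y, x] - D[s - 1, y, x]])
    dxx = D[s, y, x + 1] - 2 * c + D[s, y, x - 1]
    dyy = D[s, y + 1, x] - 2 * c + D[s, y - 1, x]
    dss = D[s + 1, y, x] - 2 * c + D[s - 1, y, x]
    dxy = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                  - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    dxs = 0.25 * (D[s + 1, y, x + 1] - D[s + 1, y, x - 1]
                  - D[s - 1, y, x + 1] + D[s - 1, y, x - 1])
    dys = 0.25 * (D[s + 1, y + 1, x] - D[s + 1, y - 1, x]
                  - D[s - 1, y + 1, x] + D[s - 1, y - 1, x])
    hessian = np.array([[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]])
    return -np.linalg.solve(hessian, grad)
```

A typical call sequence would be `D = dog_stack(image)`, then `refine(D, *p)` for each `p` in `local_extrema(D)`. The triple loop is written for readability rather than speed.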

The descriptor computation steps are:

(1) Main direction determination: for each Gaussian-smoothed image $L(x, y, \sigma)$, the gradient magnitude $m(x, y)$ and direction $\theta(x, y)$ at the points surrounding the feature point are computed as

$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$

$\theta(x, y) = \arctan\left( \frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \right)$

The range 0° to 360° is divided into 36 equal bins of 10° each, and a histogram of the directions $\theta(x, y)$ weighted by the corresponding magnitudes $m(x, y)$ is built. The direction of the highest histogram bin is taken as the main direction of the feature point;

(2) Descriptor computation: the coordinate axes are rotated about the feature point so that the x direction coincides with its main direction, and a 16×16 window is taken and divided into a 4×4 grid of equal square regions. A Gaussian function whose standard deviation is half the side length of the descriptor window, that is 8, assigns weights to the points in the window. For each region, a histogram of gradient values is then computed over 8 directions (both senses of the horizontal, vertical, main-diagonal, and anti-diagonal axes). The gradient value in each direction becomes one component of the feature descriptor, giving a vector of 4×4×8 = 128 dimensions, which is normalized to produce the final descriptor vector.
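The descriptor steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the function names are hypothetical, `np.arctan2` is used as a quadrant-aware form of the arctangent so that directions cover the full 0° to 360° range, the 16×16 patch is assumed to be sampled already around the feature point (only the gradient directions are rotated by the main direction), and each gradient votes for a single bin with no interpolation between bins or cells.

```python
import numpy as np

def gradients(L):
    """Gradient magnitude m and direction theta (degrees in [0, 360)) of image L."""
    L = np.asarray(L, dtype=float)
    dx = np.zeros_like(L)
    dy = np.zeros_like(L)
    dx[:, 1:-1] = L[:, 2:] - L[:, :-2]   # L(x+1, y) - L(x-1, y)
    dy[1:-1, :] = L[2:, :] - L[:-2, :]   # L(x, y+1) - L(x, y-1)
    m = np.sqrt(dx**2 + dy**2)
    theta = np.degrees(np.arctan2(dy, dx)) % 360.0
    return m, theta

def main_direction(m, theta, bins=36):
    """Peak of the magnitude-weighted direction histogram (36 bins of 10 degrees)."""
    hist, edges = np.histogram(theta, bins=bins, range=(0.0, 360.0), weights=m)
    peak = int(np.argmax(hist))
    return 0.5 * (edges[peak] + edges[peak + 1])   # centre of the peak bin

def descriptor(m, theta, main_dir, sigma=8.0):
    """128-d descriptor from a 16x16 patch: 4x4 cells with 8 direction bins each."""
    assert m.shape == (16, 16) and theta.shape == (16, 16)
    yy, xx = np.mgrid[0:16, 0:16] - 7.5                  # offsets from the patch centre
    weight = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2)) # Gaussian weights, std = 8
    rel = (theta - main_dir) % 360.0                     # direction relative to main_dir
    bins = (rel // 45.0).astype(int) % 8                 # 8 bins of 45 degrees
    weighted = m * weight
    cells = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            block = (slice(4 * cy, 4 * cy + 4), slice(4 * cx, 4 * cx + 4))
            cells[cy, cx] = np.bincount(bins[block].ravel(),
                                        weights=weighted[block].ravel(),
                                        minlength=8)
    vec = cells.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

Given a smoothed image `L` and a feature point at row `r`, column `c`, one would call `gradients(L)`, take the 16×16 slices of `m` and `theta` around `(r, c)`, and pass them to `main_direction` and `descriptor`.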

For each image in the training sample set that contains a labeled object, and for each image in the test sample set that contains an object to be recognized, the extracted SIFT descriptors are reduced in dimension by the NPE method. NPE reduces the dimension of the vectors of an undirected graph whose vertices are high-dimensional vectors of equal dimension and whose edge weights are the mutual distances, while preserving the edge weights. That is, for a given vector sequence $x = [x_1, x_2, \ldots, x_m]$, the reduced vector sequence is $y = [y_1, y_2, \ldots, y_m]$, and the mapping from $x_t$ to $y_t$ is expressed as

$y_t = A_{npe}^T x_t$

where $x_t \in \mathbb{R}^D$, $y_t \in \mathbb{R}^d$, $D = r \times c$, $d \ll D$, and $A_{npe}$ is a $D \times d$ transformation matrix. The steps are as follows:

(1) Construct the adjacency graph: let G denote a graph with m nodes, where t and s are indices into the feature point sequence. The adjacency graph is constructed as follows:

a) If $x_t$ and $x_s$ belong to the same source object, compute the Euclidean distance between them, $dist(t, s) = \|x_t - x_s\|$; otherwise $dist(t, s) = C$, where C is a predefined constant;

b) If $x_s$ is among the k nearest neighbors of $x_t$, add a directed edge from $x_t$ to $x_s$;

(2) Compute the weight matrix: each data point can be reconstructed as a linear combination of its neighboring vectors. Minimizing the objective function $\sum_t \| x_t - \sum_s W_{ts} x_s \|$ subject to $\sum_s W_{ts} = 1$ yields the weight matrix W that best represents the local linear neighborhood relationship, where $W_{ts}$ is the normalized coefficient with which the neighbor $x_s$ contributes to the reconstruction of $x_t$;

(3) Compute the projection matrix: minimize the cost function $\Phi(Y) = \sum_t (y_t - \sum_s W_{ts} y_s)^2 = a^T X M X^T a$, where $M = (I - W)^T (I - W)$ and I is the identity matrix, subject to the constraint

$a^T X X^T a = 1$

The transformation vector a is obtained from the smallest eigenvalues of the generalized eigenvalue problem $X M X^T a = \lambda X X^T a$. If the column vectors $a_1, a_2, \ldots, a_d$ are the solutions ordered by their eigenvalues $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_d$, the final mapping is expressed as

$x_t \rightarrow y_t = A_{npe}^T x_t, \quad A_{npe} = [a_1, a_2, \ldots, a_d]$

where $A_{npe}$ is a $D \times d$ matrix.
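The three NPE steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the function names, the default values of `k`, `C`, `reg` and `ridge`, and the two small regularization terms (added so that the local Gram matrix and $X X^T$ stay invertible) are choices made for the example. The generalized eigenvalue problem $X M X^T a = \lambda X X^T a$ is reduced to an ordinary symmetric one by Cholesky whitening, which also normalizes each solution so that $a^T X X^T a = 1$ up to the ridge term.

```python
import numpy as np

def reconstruction_weights(X, labels, k=5, C=1e6, reg=1e-3):
    """Steps (1)-(2): supervised k-nearest-neighbour graph and weights W.

    X has shape (D, m), one descriptor per column; labels has shape (m,).
    Each row of W sums to 1 and is non-zero only on the k neighbours."""
    labels = np.asarray(labels)
    m = X.shape[1]
    sq = (X**2).sum(axis=0)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0))
    dist[labels[:, None] != labels[None, :]] = C   # different source objects
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbour
    W = np.zeros((m, m))
    for t in range(m):
        nbrs = np.argsort(dist[t])[:k]
        Z = X[:, nbrs] - X[:, [t]]                 # neighbours centred on x_t
        G = Z.T @ Z
        G = G + reg * (np.trace(G) + 1e-12) * np.eye(len(nbrs))
        w = np.linalg.solve(G, np.ones(len(nbrs)))
        W[t, nbrs] = w / w.sum()                   # enforce sum_s W_ts = 1
    return W

def npe_projection(X, W, d=6, ridge=1e-8):
    """Step (3): the d eigenvectors of X M X^T a = lambda X X^T a with smallest lambda.

    Returns (A, lam) with A of shape (D, d); a point is mapped by y = A.T @ x."""
    D, m = X.shape
    I = np.eye(m)
    M = (I - W).T @ (I - W)
    left = X @ M @ X.T
    right = X @ X.T + ridge * np.eye(D)
    chol_inv = np.linalg.inv(np.linalg.cholesky(right))
    S = chol_inv @ left @ chol_inv.T               # whitened symmetric problem
    lam, U = np.linalg.eigh((S + S.T) / 2.0)       # eigenvalues in ascending order
    return chol_inv.T @ U[:, :d], lam[:d]
```

With `X` holding the 128-dimensional descriptors as columns, `A, _ = npe_projection(X, reconstruction_weights(X, labels))` followed by `Y = A.T @ X` gives the reduced vectors. Projecting a test descriptor only needs `A`, so no neighbor graph is required at recognition time.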

The hidden conditional random field (HCRF) model determines the label value z from an input sequence of observation vectors of equal dimension, $y = \{y_1, y_2, \ldots, y_m\}$. The parametric model of an HCRF consists of the hidden state layer, the input observation vectors, and the label value. The HCRF models and evaluates the conditional probability of the label as

$P(z \mid y, \theta, \omega) = \sum_{h} P(z, h \mid y, \theta, \omega) = \frac{\sum_{h} e^{\Psi(z, h, y; \theta, \omega)}}{\sum_{z' \in Z, h \in H} e^{\Psi(z', h, y; \theta, \omega)}}$

where $h = \{h_1, h_2, \ldots, h_m\}$ corresponds to the observation sequence y, $h_i \in H$, and H is the set of all possible hidden states. The potential function $\Psi(z, h, y; \theta, \omega)$ with parameters $\theta = [\theta_h, \theta_z, \theta_e]$ and window size $\omega$ is

$\Psi(z, h, y; \theta, \omega) = \sum_{j=1}^{m} \varphi(y, j, \omega) \cdot \theta_h[h_j] + \sum_{j=1}^{m} \theta_z[z, h_j] + \sum_{(j, k) \in E} \theta_e[z, h_j, h_k]$

Graph E is an undirected graph, (j, k) denotes one of its edges, and each vertex of the graph corresponds to a hidden state. $\varphi(y, j, \omega)$ can represent any feature of the observation window. In the parameter set $\theta = [\theta_h, \theta_z, \theta_e]$, $\theta_h$ denotes the parameters corresponding to the hidden states $h_i \in H$, $\theta_z$ measures the compatibility between hidden state $h_i$ and label z, and $\theta_e$ measures the compatibility between the connected states j and k and label z;
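The conditional probability above can be evaluated directly for very short sequences by enumerating every hidden-state assignment, as in the sketch below. Several things here are assumptions made for illustration: the potential is taken to have the usual three-term HCRF form that matches the parameter descriptions (an observation term weighted by $\theta_h$, a label and hidden-state term $\theta_z$, and an edge term $\theta_e$), the window feature is simply the concatenation of the observations inside the window with zero padding at the ends, and the function names and array layouts are hypothetical. Brute-force enumeration costs $|H|^m$ evaluations per label, so it is only a reference for checking faster inference, not a practical method for hundreds of descriptors.

```python
import itertools
import numpy as np

def window_feature(y, j, omega):
    """Observations y[j-omega .. j+omega] concatenated, zero-padded at the ends."""
    m, d = y.shape
    parts = [y[i] if 0 <= i < m else np.zeros(d)
             for i in range(j - omega, j + omega + 1)]
    return np.concatenate(parts)

def potential(z, h, y, theta, omega, edges):
    """Psi(z, h, y; theta, omega) as the sum of the three assumed terms.

    theta = (theta_h, theta_z, theta_e) with shapes
    (n_states, d * (2 * omega + 1)), (n_labels, n_states), (n_labels, n_states, n_states)."""
    theta_h, theta_z, theta_e = theta
    psi = 0.0
    for j, hj in enumerate(h):
        psi += window_feature(y, j, omega) @ theta_h[hj]   # observation term
        psi += theta_z[z, hj]                              # label / hidden-state term
    for j, k in edges:
        psi += theta_e[z, h[j], h[k]]                      # edge term over graph E
    return psi

def label_posterior(y, theta, omega, edges, n_labels, n_states):
    """P(z | y, theta, omega) for every label z, by summing over all h."""
    m = y.shape[0]
    scores = np.array([[potential(z, h, y, theta, omega, edges)
                        for h in itertools.product(range(n_states), repeat=m)]
                       for z in range(n_labels)])
    scores -= scores.max()                 # stabilise the exponentials
    unnorm = np.exp(scores).sum(axis=1)    # numerator: sum over h for each z
    return unnorm / unnorm.sum()           # denominator: sum over z' and h
```

For a chain of three observations one could pass `edges=[(0, 1), (1, 2)]`; the fully connected hidden layer described for Figure 4 would list every pair of indices instead.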

(1) During training of the HCRF model, the optimal value of the parameter set $\theta = [\theta_h, \theta_z, \theta_e]$ is determined by

$\theta^* = \arg\max_{\theta} L(\theta)$

where the estimation function $L(\theta)$ is

$L(\theta) = \sum_{i=1}^{n} \log P(z_i \mid y_i, \theta, \omega) - \frac{1}{2\sigma_\theta^2} \|\theta\|^2$

where n is the total number of training sample sequences and the parameter $\theta$ follows a Gaussian distribution with variance $\sigma_\theta^2$;

(2) Discrimination: for an input observation vector sequence y, the predicted label value $\tilde{z}$ is

$\tilde{z} = \arg\max_{z \in Z} P(z \mid y, \omega, \theta^*).$
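The training objective and the decision rule above can be written generically, with the conditional model passed in as a function. In the sketch below the names `objective`, `predict` and `toy_log_posterior` are hypothetical, $\theta$ is assumed to be flattened into one vector, and `toy_log_posterior` is a deliberately simple two-label stand-in used only to make the example runnable; it is not an HCRF. Any routine that returns $\log P(z \mid y, \theta, \omega)$ for every label could be substituted for it.

```python
import numpy as np

def objective(theta, samples, log_posterior, sigma_theta=1.0):
    """L(theta) = sum_i log P(z_i | y_i, theta) - ||theta||^2 / (2 * sigma_theta^2).

    samples is a list of (y_i, z_i) pairs; log_posterior(y, theta) returns an
    array of log-probabilities indexed by label."""
    log_lik = sum(log_posterior(y, theta)[z] for y, z in samples)
    return log_lik - np.dot(theta, theta) / (2.0 * sigma_theta**2)

def predict(y, theta_star, log_posterior):
    """Label with the largest conditional probability under the trained parameters."""
    return int(np.argmax(log_posterior(y, theta_star)))

def toy_log_posterior(y, theta):
    """Stand-in model (NOT an HCRF): two-label softmax on the mean observation."""
    scores = np.array([0.0, float(np.mean(y, axis=0) @ theta)])
    log_norm = scores.max() + np.log(np.exp(scores - scores.max()).sum())
    return scores - log_norm
```

The quadratic penalty is the log of the Gaussian prior on $\theta$ up to a constant, so a smaller $\sigma_\theta$ pulls the parameters more strongly toward zero. Maximizing `objective` over `theta` with any gradient-based or derivative-free optimizer gives $\theta^*$, which is then passed to `predict`.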

Description of drawings

Figure 1 is a flow chart of the overall model building and recognition process.

Figure 2 shows the gradients in the 16×16 region around a feature point and the extent of the Gaussian weights.

Figure 3 shows the final descriptor.

Figure 4 is a schematic diagram of the hidden conditional random field model of a single target with 4 hidden states.

Specific technical solution

The model building and target recognition process is shown in Figure 1. The image set used to train the target recognition model contains L objects, and the l-th object corresponds to $k_l$ training images. For a source image $img_i$ containing a particular object, SIFT computation yields a set of feature points, where the information of each feature point can be represented by a tuple:

$Sift_j := \langle j, (x, y), \sigma, \theta, descriptor_{128 \times 1} \rangle$

where j is the index of the feature point in the set, (x, y) is its position in the source image, $\sigma$ is the corresponding scale, $\theta$ is the main direction, and $descriptor_{128 \times 1}$ is the 128-dimensional descriptor vector of the feature point.

The descriptor vector is the decisive part of the feature point information and the main basis of the matching process. For the SIFT features computed from the source image $img_i$, the descriptor parts are therefore extracted to form the descriptor vector set:

$SiftSet_i = \{ SiftDescriptor_j \} = \{ \langle j, descriptor_{128 \times 1} \rangle \}$

The NPE method is then used to reduce the dimension of the SIFT descriptors, with original dimension D = 128 and reduced dimension d = 6. The reduction is expressed as

$SiftSet_i^{(red)} = \{ SiftDescriptor_j^{(red)} \} = \{ \langle j, A_{npe}^T \, descriptor_{128 \times 1} \rangle \}$

where $A_{npe}$ is the dimension-reducing transformation matrix.
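The tuple and the reduction step above map naturally onto a small data structure. The sketch below is illustrative only: the class name `SiftFeature`, its field names, and the function `reduce_sift_set` are hypothetical, and `A_npe` is assumed to be a 128×6 array obtained beforehand, applied as $A_{npe}^T \, descriptor$ so that each result has 6 components.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np

@dataclass
class SiftFeature:
    """One tuple <j, (x, y), sigma, theta, descriptor_128x1>."""
    j: int                    # index of the feature point in the set
    xy: Tuple[float, float]   # position in the source image
    sigma: float              # scale
    theta: float              # main direction
    descriptor: np.ndarray    # 128-dimensional descriptor vector

def reduce_sift_set(features: List[SiftFeature],
                    A_npe: np.ndarray) -> List[Tuple[int, np.ndarray]]:
    """Return the reduced set {<j, A_npe^T descriptor>} as (j, vector) pairs."""
    if A_npe.shape[0] != 128:
        raise ValueError("A_npe must have 128 rows, one per descriptor component")
    return [(f.j, A_npe.T @ f.descriptor) for f in features]
```

Only the index and the reduced vector are kept, mirroring the way the descriptor set drops position, scale, and direction before training.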

Each image containing a source object corresponds to a label value $obj_i$, where $obj_i = l$, $1 \le l \le L$. The input set of the training process is then $\{ \langle SiftSet_i^{(red)}, obj_i \rangle, \ i = 1, \ldots, n \}$, where n is the total number of training images. The model is trained on the training samples to obtain the corresponding model parameters. For an input test image, SIFT feature extraction followed by NPE dimension reduction yields the reduced vector set, which is fed into the model to produce the target recognition result.

SIFT feature extraction first computes the Gaussian-smoothed images and the difference-of-Gaussian (DoG) images. Computing the Gaussian smoothing relies on the concept of scale space. The scale space is divided into layers, each corresponding to a different sampling rate of the image: the sampling step is 1 for the first layer, 2 for the second, 4 for the third, and $2^{k-1}$ for the k-th. Each layer is further divided into S levels, where the Gaussian function used for smoothing at level s has standard deviation $\sigma_s = 2^{s/S} \sigma_0$, with $\sigma_0 = 16$ and S usually set to 5. Subtracting adjacent levels within a layer gives the DoG images, so each layer has S - 1 DoG images. To decide whether a point of a DoG image is a possible feature point, its value is compared with those of the 26 neighboring points in its own level and in the levels immediately above and below; if it is an extremum, it is selected as a candidate feature point.
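The layer and level schedule described above reduces to two small formulas, sketched here in plain Python. The function names are hypothetical, layers are counted from 1 as in the text, and levels are assumed to be counted from 0 (the text does not say where the level index starts); the defaults `S=5` and `sigma0=16.0` are the values stated above.

```python
def sampling_step(layer):
    """Sampling step of the k-th layer (1-based): 2**(k-1)."""
    return 2 ** (layer - 1)

def level_sigma(s, S=5, sigma0=16.0):
    """Standard deviation of the smoothing Gaussian at level s: 2**(s/S) * sigma0."""
    return 2.0 ** (s / S) * sigma0

def scale_table(n_layers, S=5, sigma0=16.0):
    """List of (layer, sampling step, sigmas of the S levels in that layer)."""
    return [(layer, sampling_step(layer),
             [level_sigma(s, S, sigma0) for s in range(S)])
            for layer in range(1, n_layers + 1)]

if __name__ == "__main__":
    for layer, step, sigmas in scale_table(3):
        # S levels give S - 1 difference-of-Gaussian images per layer.
        print(layer, step, [round(v, 2) for v in sigmas], len(sigmas) - 1)
```

Adjacent levels differ by the constant factor $2^{1/S}$, which plays the role of the scale factor k in the DoG definition.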

The gradients around a feature point that are used to compute the descriptor, and the extent of the corresponding Gaussian weights, are shown in Figure 2. In the Gaussian-smoothed image, the 16×16 points around the feature point are taken, and the magnitude and direction at each position are determined by

$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$

$\theta(x, y) = \arctan\left( \frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \right)$

A Gaussian function whose standard deviation is half the side length of the descriptor window, that is 8, assigns weights to the points in the window, which strengthens invariance to illumination and geometric changes. For each region, a histogram of gradient values is then computed over 8 directions (both senses of the horizontal, vertical, main-diagonal, and anti-diagonal axes). The gradient value in each direction becomes one component of the feature descriptor, giving a vector of 4×4×8 = 128 dimensions, which is normalized to produce the final descriptor vector, as shown in Figure 3.

Figure 4 shows the abstract model of the hidden conditional random field for a single object. The model has three layers. The top layer is the label of the target: it is an input when the model is trained and the final output during recognition. The middle layer is an undirected graph of hidden states in which every pair of hidden-state vertices is joined by an edge; the values of the edges between the object label and the hidden states, the values of the hidden states, and the edge weights are adjusted continually during training. Each hidden state in turn corresponds to an observation vector, which in this invention is a vector from the set obtained after NPE dimension reduction. Figure 4 contains only 4 hidden states, corresponding to 4 observation vectors. In practice, an image of a reasonably textured object typically yields hundreds or even thousands of feature point descriptors under the SIFT algorithm, and therefore the same number of reduced vectors, which form the input for both training and recognition in the HCRF-based target recognition model.

Claims

1. A target recognition method based on dimension-reduced local feature descriptors and a hidden conditional random field, characterized in that a target recognition model for object recognition is established, the method comprising two stages, model building and object recognition, wherein the model building steps comprise:

1.1. For each image in the training sample set that contains an object with a corresponding label value, extract its scale-invariant feature (SIFT) descriptors;

1.2. Use the neighbor preserving embedding method to reduce the dimension of the extracted high-dimensional SIFT descriptors and obtain the reduced vector set;

1.3. The reduced vector set of each image, together with the label number of the object, constitutes one sample for training the hidden conditional random field model, and learning from all samples yields a hidden conditional random field model that can be used to recognize objects;

the recognition steps comprise:

2.1. For each image in the test sample set that contains an object to be recognized, extract its scale-invariant feature (SIFT) descriptors;

2.2. Use the neighbor preserving embedding method to reduce the dimension of the extracted high-dimensional SIFT descriptors and obtain the reduced vector set;

2.3. Input the reduced vector set of each image into the trained hidden conditional random field model, which outputs the object label number as the final recognition result;

Wherein, for each image in the training sample set that contains a labeled object, and for each image in the test sample set that contains an object to be recognized, extracting the corresponding SIFT features comprises two processes, feature point detection and descriptor computation, and the feature point detection steps are:

3.1. Scale-space extremum detection: to detect extrema in scale space, the points of the difference-of-Gaussian image $D(x, y, \sigma)$ are traversed; $D(x, y, \sigma)$ is expressed as

$D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$

where $k$ is the scale factor between two adjacent scales; $G(x, y, \sigma)$ is a Gaussian function with zero mean and standard deviation $\sigma$, and $L(x, y, \sigma)$ is the Gaussian smoothing of the image at variable scale $\sigma$; $I(x, y)$ is the source image and $*$ denotes convolution; each point of $D(x, y, \sigma)$ is compared with its 8 neighbors in the same layer and with the 9 neighboring points in each of the layers above and below, and if the value at the point is the maximum or minimum of this neighborhood, the point is taken as a candidate keypoint;

3.2. Accurate feature point localization: if the detected local extremum is $X_0 = (x_0, y_0, \sigma)$, $D(x, y, \sigma)$ is expanded as a Taylor series, the expansion is differentiated, and the derivative is set to 0, which gives the accurate position corresponding to the local extremum $X_0$:

$X_{acc} = \left[ -\left( \frac{\partial^2 D}{\partial X^2} \right)^{-1} \frac{\partial D}{\partial X} \right] \Bigg|_{X = X_0};$

The descriptor computation steps are:

4.1. Main direction determination: for each Gaussian-smoothed image $L(x, y, \sigma)$, the gradient magnitude $m(x, y)$ and direction $\theta(x, y)$ at the points surrounding the feature point are computed by the following two formulas:

$m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$

$\theta(x, y) = \arctan\left( \frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)} \right)$

The range 0° to 360° is divided into 36 equal bins of 10° each, a histogram of the directions $\theta(x, y)$ weighted by the corresponding magnitudes $m(x, y)$ is built, and the direction of the highest histogram bin is taken as the main direction of the feature point;

4.2. Descriptor computation: the coordinate axes are rotated about the feature point so that the x direction coincides with its main direction; a 16×16 window is taken and divided into a 4×4 grid of equal square regions; a Gaussian function whose standard deviation is half the side length of the descriptor window, that is 8, assigns weights to the points in the window; for each region, a histogram of gradient values is computed over 8 directions (both senses of the horizontal, vertical, main-diagonal, and anti-diagonal axes); the gradient value in each direction becomes one component of the feature descriptor, giving a vector of 4×4×8 = 128 dimensions, which is normalized to produce the final descriptor vector;

Wherein, the neighbor preserving embedding method reduces the dimension of the vectors of an undirected graph whose vertices are high-dimensional vectors of equal dimension and whose edge weights are the mutual distances, while preserving the edge weights; that is, for a given vector sequence $x = [x_1, x_2, \ldots, x_m]$, the reduced vector sequence is $y = [y_1, y_2, \ldots, y_m]$, and the mapping from $x_t$ to $y_t$ is expressed as

$y_t = A_{npe}^T x_t$

where $x_t \in \mathbb{R}^D$, $y_t \in \mathbb{R}^d$, $D = r \times c$, $d \ll D$, and $A_{npe}$ is a $D \times d$ transformation matrix; the steps are as follows:

5.1. Construct the adjacency graph: let G denote a graph with m nodes, where t and s are indices into the feature point sequence; the adjacency graph is constructed as follows:

a) If $x_t$ and $x_s$ belong to the same source object, compute the Euclidean distance between them, $dist(t, s) = \|x_t - x_s\|$; otherwise $dist(t, s) = C$, where C is a predefined constant;

b) If $x_s$ is among the k nearest neighbors of $x_t$, add a directed edge from $x_t$ to $x_s$;

5.2. Compute the weight matrix: each data point is reconstructed as a linear combination of its neighboring vectors; minimizing the objective function $\sum_t \| x_t - \sum_s W_{ts} x_s \|$ subject to $\sum_s W_{ts} = 1$ yields the weight matrix W that best represents the local linear neighborhood relationship, where $W_{ts}$ is the normalized coefficient with which the neighbor $x_s$ contributes to the reconstruction of $x_t$;

5.3. Compute the projection matrix: minimize the cost function $\Phi(Y) = \sum_t (y_t - \sum_s W_{ts} y_s)^2 = a^T X M X^T a$, where $M = (I - W)^T (I - W)$ and I is the identity matrix, subject to the constraint

$a^T X X^T a = 1$

The transformation vector a is obtained from the smallest eigenvalues of the generalized eigenvalue problem $X M X^T a = \lambda X X^T a$; if the column vectors $a_1, a_2, \ldots, a_d$ are the solutions ordered by their eigenvalues $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_d$, the final mapping is expressed as

$x_t \rightarrow y_t = A_{npe}^T x_t, \quad A_{npe} = [a_1, a_2, \ldots, a_d]$

where $A_{npe}$ is a $D \times d$ matrix;

Wherein, the hidden conditional random field model determines the label value z from an input sequence of observation vectors of equal dimension, $y = \{y_1, y_2, \ldots, y_m\}$; the parametric model of a hidden conditional random field consists of the hidden state layer, the input observation vectors, and the label value; the hidden conditional random field (HCRF) models and evaluates the conditional probability of the label as

$P(z \mid y, \theta, \omega) = \sum_{h} P(z, h \mid y, \theta, \omega) = \frac{\sum_{h} e^{\Psi(z, h, y; \theta, \omega)}}{\sum_{z' \in Z, h \in H} e^{\Psi(z', h, y; \theta, \omega)}}$

where $h = \{h_1, h_2, \ldots, h_m\}$ corresponds to the observation sequence y, $h_i \in H$, and H is the set of all possible hidden states; the potential function $\Psi(z, h, y; \theta, \omega)$ with parameters $\theta = [\theta_h, \theta_z, \theta_e]$ and window size $\omega$ is

$\Psi(z, h, y; \theta, \omega) = \sum_{j=1}^{m} \varphi(y, j, \omega) \cdot \theta_h[h_j] + \sum_{j=1}^{m} \theta_z[z, h_j] + \sum_{(j, k) \in E} \theta_e[z, h_j, h_k]$

Graph E is an undirected graph, (j, k) denotes one of its edges, and each vertex of the graph corresponds to a hidden state; $\varphi(y, j, \omega)$ represents any feature of the observation window; in the parameter set $\theta = [\theta_h, \theta_z, \theta_e]$, $\theta_h$ denotes the parameters corresponding to the hidden states $h_i \in H$, $\theta_z$ measures the compatibility between hidden state $h_i$ and label z, and $\theta_e$ measures the compatibility between the connected states j and k and label z;

6.1. During training of the hidden conditional random field model, the optimal value of the parameter set $\theta = [\theta_h, \theta_z, \theta_e]$ is determined by

$\theta^* = \arg\max_{\theta} L(\theta)$

where the estimation function $L(\theta)$ is

$L(\theta) = \sum_{i=1}^{n} \log P(z_i \mid y_i, \theta, \omega) - \frac{1}{2\sigma_\theta^2} \|\theta\|^2$

where n is the total number of training sample sequences and the parameter $\theta$ follows a Gaussian distribution with variance $\sigma_\theta^2$;

6.2. Discrimination: for an input observation vector sequence y, the predicted label value $\tilde{z}$ is

$\tilde{z} = \arg\max_{z \in Z} P(z \mid y, \omega, \theta^*).$