CN105389583A - Image classifier generation method, and image classification method and device - Google Patents
- Publication number
- CN105389583A (application number CN201410453884.6A)
- Authority
- CN
- China
- Prior art keywords
- model parameters
- image
- beta
- values
- value
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/87—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using selection of the recognition techniques, e.g. of a classifier in a multiple classifier system
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method and a device for generating an image classifier. The method includes: obtaining a training sample set that includes N image samples belonging to K categories, where N and K are positive integers and N is greater than K; obtaining a feature vector of each image sample, where the feature vector includes a latent variable of the image sample; and training classifiers for the K categories through a multinomial logistic regression model, based on the latent variables of the N image samples. In embodiments of the invention, the classifiers for the K categories are trained simultaneously, in maximum-likelihood form, through the multinomial logistic regression model. The model thus preserves the correlation among the classifiers of the K categories, so the training result is more accurate than with LSVM, which converts the K-class problem of object classification into multiple mutually isolated two-class problems.
Description
Technical Field
The present invention relates to the field of image classification and, more specifically, to a method for generating an image classifier, an image classification method, and corresponding devices.
Background
Latent variables are composite variables that cannot be observed directly but play an important role in practical applications, such as spatial relationships, data structures, and internal states. They are widely used in machine vision, natural language processing, speech recognition, public health, and other fields. Experiments show that, when processing objects such as images and speech, introducing latent variables captures more useful information and significantly improves results compared with using only observed variables.
Early latent-variable models were mostly generative models, such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM). More recently, researchers have explored introducing latent variables into discriminative models. Typical examples include the Conditional Random Field (CRF) and the Latent Support Vector Machine (LSVM), each of which has achieved results in its own field. Notably, LSVM combined with the Deformable Part-based Model (DPM), that is, DPM-LSVM, has become one of the more successful algorithms of recent years for object detection in machine vision. DPM describes the features of objects of the category to be detected and consists of three parts: a root filter, several part filters, and a deformation cost for each part. The root filter describes the overall outline of the object, the part filters describe its detailed features, and the deformation costs ensure that no part deviates too far from its position relative to the root. During object detection, the position of each part relative to the root can vary within a certain range; this position can be treated as a latent variable, and LSVM is used for training.
The objective function of LSVM is similar in form to that of the original SVM, as shown in equation (1):
Here, β is the model parameter of the classifier, y_i is the label of training sample x_i, and s(x_i, β) is the score of sample x_i. This score is the best score over all possible relative part positions (that is, over the value range of the latent variable) and satisfies equation (2):
In equation (2), z is the latent variable, f is the feature extraction method, and f(x_i, z) is the feature vector of sample x_i, for example the gradient-histogram features used in DPM.
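Equations (1) and (2) are not reproduced in this text. As a hedged reconstruction, the standard LSVM formulation from the literature is consistent with the symbol definitions above; the regularization constant C, the hinge-loss form, and the notation Z(x_i) for the value range of the latent variable are assumptions rather than the patent's own rendering:

```latex
\begin{align}
L(\beta) &= \frac{1}{2}\lVert \beta \rVert^{2}
          + C \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i\, s(x_i, \beta)\bigr) \tag{1} \\
s(x_i, \beta) &= \max_{z \in Z(x_i)} \beta^{\top} f(x_i, z) \tag{2}
\end{align}
```

Under this reading, the score in (2) is a maximum over latent values, which is why the objective in (1) is not convex in general.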
It can be shown that the LSVM objective function (equation (1)) is semi-convex: once the latent-variable values of the positive samples are fixed, the objective function is convex. LSVM can therefore be solved by coordinate gradient descent: first fix the model parameters of the classifier and compute the latent-variable values of the positive samples; then fix those values and solve for the optimal model parameters and the latent-variable values of the negative samples; iterate in this way until convergence.
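The alternation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular LSVM implementation: the hinge-loss objective with constant `C`, the plain subgradient step, the fixed iteration counts, and the toy feature function `toy_feature` are all hypothetical choices made for the example.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def best_latent(beta, x, latent_values, f):
    """Latent value z that maximizes beta . f(x, z)."""
    return max(latent_values, key=lambda z: dot(beta, f(x, z)))


def score(beta, x, latent_values, f):
    """s(x, beta): best score over all latent values."""
    return max(dot(beta, f(x, z)) for z in latent_values)


def train_lsvm(samples, latent_values, f, dim,
               C=1.0, lr=0.01, outer_iters=10, inner_iters=100):
    """Alternate between positive-sample latent values and model parameters."""
    beta = [0.0] * dim
    for _ in range(outer_iters):
        # Step 1: fix beta and choose the latent value of every positive sample.
        pos_z = {i: best_latent(beta, x, latent_values, f)
                 for i, (x, y) in enumerate(samples) if y == 1}
        # Step 2: keep those values fixed and run subgradient descent on the
        # objective, which is convex once the positive latent values are fixed.
        for _ in range(inner_iters):
            grad = list(beta)  # gradient of the 0.5 * ||beta||^2 term
            for i, (x, y) in enumerate(samples):
                # Negative samples re-pick their latent value at every step.
                z = pos_z[i] if y == 1 else best_latent(beta, x, latent_values, f)
                feat = f(x, z)
                if 1.0 - y * dot(beta, feat) > 0.0:  # hinge loss is active
                    grad = [g - C * y * v for g, v in zip(grad, feat)]
            beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


# Toy data (hypothetical): x holds two candidate responses, z selects one.
def toy_feature(x, z):
    return [x[z], 1.0]  # selected response plus a bias term


toy_samples = [([0.1, 2.0], 1), ([1.8, 0.2], 1),
               ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
beta = train_lsvm(toy_samples, [0, 1], toy_feature, dim=2)
```

On this toy set the learned `beta` gives positive scores to the samples labeled 1 and negative scores to those labeled -1; real detectors use much richer features and solvers.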
LSVM, like SVM, is mainly suited to object detection. When extended to object classification, LSVM converts the multi-class problem of object classification into two-class problems of the kind found in object detection. With this approach, the training processes of the multiple classifiers used for object classification are isolated from one another. In practice, object categories may be correlated. For example, when buildings are divided into several architectural styles, a building in an image to be classified may show features of two or more styles at once. Converting the training of multiple classifiers into multiple mutually isolated, either-or two-class problems therefore leads to inaccurate classification results.
Summary of the Invention
Embodiments of the present invention provide a method and a device for generating an image classifier, so as to improve the accuracy of classification results.
In a first aspect, a method for generating an image classifier is provided, including: obtaining a training sample set, where the training sample set includes N image samples, the N image samples belong to K categories, N and K are positive integers, and N is greater than K; obtaining a feature vector of each image sample, where the feature vector includes a latent variable of the image sample; and training classifiers of the K categories through a multinomial logistic regression model, based on the latent variables of the N image samples.
With reference to the first aspect, in one implementation of the first aspect, the classifiers of the K categories respectively include K model parameters, and training the classifiers of the K categories through the multinomial logistic regression model based on the latent variables of the N image samples includes: obtaining initial values of the K model parameters; obtaining initial values of the latent variables of the N image samples; and training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine target values of the K model parameters.
With reference to the first aspect or any of its foregoing implementations, in another implementation of the first aspect, the initial values of the latent variables of the N image samples include initial values of positive-image-sample latent variables and initial values of negative-image-sample latent variables, and training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine the target values of the K model parameters includes: training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determining the current values as the target values of the K model parameters; and when the current values do not satisfy the convergence condition, determining current values of the positive-image-sample latent variables based on the feature vectors of the N image samples and the current values of the K model parameters, updating the initial values of the positive-image-sample latent variables with those current values, and repeating this step until the current values of the K model parameters satisfy the convergence condition.
With reference to the first aspect or any of its foregoing implementations, in another implementation of the first aspect, training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine the current values of the K model parameters includes: training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine iterative values of the K model parameters; determining iterative values of the negative-image-sample latent variables based on the feature vectors of the N image samples and the iterative values of the K model parameters, and updating the initial values of the negative-image-sample latent variables with those iterative values; when the iterative values of the K model parameters satisfy a preset iteration stop condition, determining the iterative values as the current values of the K model parameters; and otherwise repeating this step until the current values of the K model parameters satisfy the iteration stop condition.
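The nested procedure in the two implementations above (an inner loop that updates the model parameters and the negative-sample latent values, and an outer loop that updates the positive-sample latent values) can be sketched as follows. This is a rough illustration under several assumptions that the text does not confirm: the log-likelihood is taken to be the usual softmax form over per-class scores β_k · f(x_i, z), "positive" latent values are read as those for a sample's own class and "negative" ones as those for the other classes, the parameter update is a plain gradient-ascent step, and all function names and the toy data are hypothetical.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def argmax_latent(beta, x, latent_values, f):
    """Latent value z that maximizes beta . f(x, z)."""
    return max(latent_values, key=lambda z: dot(beta, f(x, z)))


def class_probs(betas, feats):
    """Softmax over per-class scores beta_k . feats[k] (numerically stabilized)."""
    scores = [dot(b, v) for b, v in zip(betas, feats)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(samples, latent_values, f, num_classes, dim,
          lr=0.1, outer_iters=5, inner_iters=50, tol=1e-4):
    betas = [[0.0] * dim for _ in range(num_classes)]  # initial model parameters
    # Initial latent value of every sample under every class.
    latents = [[argmax_latent(b, x, latent_values, f) for b in betas]
               for x, _ in samples]
    for _ in range(outer_iters):
        previous = [list(b) for b in betas]
        for _ in range(inner_iters):  # inner loop: stop after a preset count
            # One gradient-ascent step on the log-likelihood, latent values fixed.
            grads = [[0.0] * dim for _ in range(num_classes)]
            for i, (x, y) in enumerate(samples):
                feats = [f(x, latents[i][k]) for k in range(num_classes)]
                probs = class_probs(betas, feats)
                for k in range(num_classes):
                    weight = (1.0 if k == y else 0.0) - probs[k]
                    grads[k] = [g + weight * v for g, v in zip(grads[k], feats[k])]
            betas = [[b + lr * g for b, g in zip(bk, gk)]
                     for bk, gk in zip(betas, grads)]
            # Refresh the "negative" latent values (classes other than the label).
            for i, (x, y) in enumerate(samples):
                for t in range(num_classes):
                    if t != y:
                        latents[i][t] = argmax_latent(betas[t], x, latent_values, f)
        # Outer convergence check (assumed here: small change in the parameters).
        change = max(abs(b - p) for bk, pk in zip(betas, previous)
                     for b, p in zip(bk, pk))
        if change < tol:
            break
        # Not converged: refresh the "positive" latent values and repeat.
        for i, (x, y) in enumerate(samples):
            latents[i][y] = argmax_latent(betas[y], x, latent_values, f)
    return betas


def predict(betas, x, latent_values, f):
    scores = [max(dot(b, f(x, z)) for z in latent_values) for b in betas]
    return scores.index(max(scores))


# Toy data (hypothetical): labels are 0-based here for list indexing.
def toy_feature(x, z):
    return [x[z], 1.0]


toy_samples = [([0.1, 2.0], 0), ([1.8, 0.2], 0),
               ([-1.0, -1.5], 1), ([-2.0, -0.5], 1)]
betas = train(toy_samples, [0, 1], toy_feature, num_classes=2, dim=2)
```

Because all K parameter vectors appear in the same softmax denominator, each gradient step updates them jointly, which is the coupling between classifiers that the text contrasts with isolated two-class training.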
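With reference to the first aspect or any of its foregoing implementations, in another implementation of the first aspect, training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine the iterative values of the K model parameters includes: determining the iterative values of the K model parameters according to the formula, where,
With reference to the first aspect or any of its foregoing implementations, in another implementation of the first aspect, determining the iterative values of the K model parameters according to the formula includes: according to the formula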
With reference to the first aspect or any of its foregoing implementations, in another implementation of the first aspect, the iteration stop condition is that the change in the objective function value l(θ) is smaller than a preset threshold; or, the iteration stop condition is that the number of iterations reaches a preset number.
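The two alternative stop conditions can be expressed as a small predicate. This is only a sketch: the text presents the conditions as alternatives, and combining them with a logical "or" is an assumption, as are the function name, the default threshold, and the default iteration limit.

```python
def should_stop(previous_objective: float, current_objective: float,
                iteration: int, threshold: float = 1e-6,
                max_iterations: int = 100) -> bool:
    """Return True when l(theta) has changed by less than `threshold`
    or when `iteration` has reached `max_iterations`."""
    small_change = abs(current_objective - previous_objective) < threshold
    return small_change or iteration >= max_iterations
```

An iteration cap is a common safeguard because the change in l(θ) may shrink slowly on data that the model can separate perfectly.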
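With reference to the first aspect or any of its foregoing implementations, in another implementation of the first aspect, determining the iterative values of the K model parameters according to the formula includes: according to the formula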
With reference to the first aspect or any of its foregoing implementations, in another implementation of the first aspect, determining the iterative values of the negative-image-sample latent variables based on the feature vectors of the N image samples and the iterative values of the K model parameters includes: determining the iterative values of the negative-image-sample latent variables according to a formula, where x_i denotes the i-th of the N image samples; β_t denotes the t-th of the K model parameters; the formula also refers to the model parameter corresponding to the category of x_i; Z(x_i) denotes the value range of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the iterative value of the latent variable of x_i when the model parameter is β_t; i is any integer from 1 to N; and t is any integer from 1 to K.
With reference to the first aspect or any of its foregoing implementations, in another implementation of the first aspect, determining the current values of the positive-image-sample latent variables based on the feature vectors of the N image samples and the current values of the K model parameters includes: determining the current values of the positive-image-sample latent variables according to a formula, where x_i denotes the i-th of the N image samples; the formula refers to the model parameter corresponding to the category of x_i; Z(x_i) denotes the value range of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the current value of the latent variable of x_i under that model parameter; and i is any integer from 1 to N.
With reference to the first aspect or any of its foregoing implementations, in another implementation of the first aspect, determining the initial value of the latent variable of each image sample based on the initial value of each model parameter includes: determining the initial value of the latent variable of each image sample according to a formula, where x_i denotes the i-th of the N image samples; β_k denotes the k-th of the K model parameters; Z(x_i) denotes the value range of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the initial value of the latent variable z of x_i when the model parameter is β_k; i is any integer from 1 to N; and k is any integer from 1 to K.
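The latent-variable formulas referred to in the implementations above did not survive in this text. One form that fits the stated symbol definitions, and that mirrors the best-score rule described for LSVM in the background, is an argmax over the latent range. The superscript notation and the inner-product form below are assumptions, not the patent's confirmed formula:

```latex
z_i^{\beta_k} \;=\; \arg\max_{z \in Z(x_i)} \; \beta_k^{\top} f(x_i, z),
\qquad i = 1, \dots, N, \quad k = 1, \dots, K
```

Under this reading, the initial, iterative, and current latent values would differ only in which parameter values β_k are plugged in and in which (sample, class) pairs are updated.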
In a second aspect, an image classification method is provided, including: obtaining a feature vector of an image to be classified; determining the category of the image to be classified based on that feature vector, using K classifiers, where the K classifiers are K classifiers trained according to the first aspect or any implementation of the first aspect; and determining, according to a formula, the probability of the image to be classified under each of the K categories, where x denotes the image to be classified, β_k denotes the model parameter of the k-th of the K classifiers, f(x, z) denotes the feature vector of x, Z(x) denotes the value range of the latent variable z of x, and k is any integer from 1 to K.
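The probability formula itself is not reproduced in this text. The sketch below assumes the usual multinomial-logistic (softmax) form, applied to each class's best score over the latent range Z(x); that form, the function names, and the toy feature function are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def predict_proba(betas, x, latent_values, f):
    """Class probabilities from per-class scores max_z beta_k . f(x, z)."""
    scores = [max(dot(beta, f(x, z)) for z in latent_values) for beta in betas]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: two classes, two latent positions.
def toy_feature(x, z):
    return [x[z], 1.0]


probs = predict_proba([[1.0, 0.0], [-1.0, 0.0]], [0.1, 2.0], [0, 1], toy_feature)
predicted_class = probs.index(max(probs))
```

Here the first class scores 2.0 (its best latent position is the second one) and the second class scores -0.1, so the first class receives the larger probability.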
In a third aspect, a device for generating an image classifier is provided, including: a first obtaining unit, configured to obtain a training sample set, where the training sample set includes N image samples, the N image samples belong to K categories, N and K are positive integers, and N is greater than K; a second obtaining unit, configured to obtain a feature vector of each image sample obtained by the first obtaining unit, where the feature vector includes a latent variable of the image sample; and a training unit, configured to train classifiers of the K categories through a multinomial logistic regression model, based on the latent variables of the N image samples obtained by the second obtaining unit.
With reference to the third aspect, in one implementation of the third aspect, the classifiers of the K categories respectively include K model parameters, and the training unit is specifically configured to: obtain initial values of the K model parameters; obtain initial values of the latent variables of the N image samples; and train the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine target values of the K model parameters.
With reference to the third aspect, in one implementation of the third aspect, the initial values of the latent variables of the N image samples include initial values of positive-image-sample latent variables and initial values of negative-image-sample latent variables, and the training unit is specifically configured to: train the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determine the current values as the target values of the K model parameters; and when the current values do not satisfy the convergence condition, determine current values of the positive-image-sample latent variables based on the feature vectors of the N image samples and the current values of the K model parameters, update the initial values of the positive-image-sample latent variables with those current values, and repeat this step until the current values of the K model parameters satisfy the convergence condition.
With reference to the third aspect or any of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to: train the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine iterative values of the K model parameters; determine iterative values of the negative-image-sample latent variables based on the feature vectors of the N image samples and the iterative values of the K model parameters, and update the initial values of the negative-image-sample latent variables with those iterative values; when the iterative values of the K model parameters satisfy a preset iteration stop condition, determine the iterative values as the current values of the K model parameters; and otherwise repeat this step until the current values of the K model parameters satisfy the iteration stop condition.
With reference to the third aspect or any of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to determine the iterative values of the K model parameters according to a formula, where x_i denotes the i-th of the N image samples; β_l denotes the l-th of the K model parameters; θ denotes the K-dimensional variable formed by the K model parameters; the formula also refers to the model parameter corresponding to the category of x_i; Z(x_i) denotes the value range of the latent variable z of x_i; and f(x_i, z) denotes the feature vector of x_i.
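With reference to the third aspect or any of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to, according to the formula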
With reference to the third aspect or any of its foregoing implementations, in another implementation of the third aspect, the iteration stop condition is that the change in the objective function value l(θ) is smaller than a preset threshold; or, the iteration stop condition is that the number of iterations reaches a preset number.
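With reference to the third aspect or any of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to, according to the formula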
With reference to the third aspect or any of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to determine the iterative values of the negative-image-sample latent variables according to a formula, where x_i denotes the i-th of the N image samples; β_t denotes the t-th of the K model parameters; the formula also refers to the model parameter corresponding to the category of x_i; Z(x_i) denotes the value range of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the iterative value of the latent variable of x_i when the model parameter is β_t; i is any integer from 1 to N; and t is any integer from 1 to K.
With reference to the third aspect or any of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to determine the current values of the positive-image-sample latent variables according to a formula, where x_i denotes the i-th of the N image samples; the formula refers to the model parameter corresponding to the category of x_i; Z(x_i) denotes the value range of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the current value of the latent variable of x_i under that model parameter; and i is any integer from 1 to N.
With reference to the third aspect or any of its foregoing implementations, in another implementation of the third aspect, the training unit is specifically configured to determine the initial value of the latent variable of each image sample according to a formula, where x_i denotes the i-th of the N image samples; β_k denotes the k-th of the K model parameters; Z(x_i) denotes the value range of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the initial value of the latent variable z of x_i when the model parameter is β_k; i is any integer from 1 to N; and k is any integer from 1 to K.
In a fourth aspect, an image classification device is provided, including: a first obtaining unit, configured to obtain a feature vector of an image to be classified; a first determining unit, configured to determine the category of the image to be classified based on that feature vector, using K classifiers, where the K classifiers are K classifiers trained according to the third aspect or any implementation of the third aspect; and a second determining unit, configured to determine, according to a formula, the probability of the image to be classified under each of the K categories, where x denotes the image to be classified, β_k denotes the model parameter of the k-th of the K classifiers, f(x, z) denotes the feature vector of x, Z(x) denotes the value range of the latent variable z of x, and k is any integer from 1 to K.
In embodiments of the present invention, the K classifiers are trained simultaneously, in maximum-likelihood form, through the multinomial logistic regression model. The model thus preserves the correlation among the classifiers of the K categories, so the training result is more accurate than with LSVM, which converts the K-class problem of object classification into multiple mutually isolated two-class problems.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for generating an image classifier according to an embodiment of the present invention.
Fig. 2 is an example of classifying images using classifier parameters trained according to an embodiment of the present invention.
Fig. 3 is an example of classifying images using classifier parameters trained according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a device for generating an image classifier according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a device for generating an image classifier according to an embodiment of the present invention.
Fig. 6 is a schematic flowchart of an image classification method according to an embodiment of the present invention.
Fig. 7 is a schematic block diagram of an image classification device according to an embodiment of the present invention.
Fig. 8 is a schematic block diagram of an image classification device according to an embodiment of the present invention.
具体实施方式detailed description
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本发明的一部分实施例,而不是全部实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动的前提下所获得的所有其他实施例,都应属于本发明保护的范围。The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a method for generating an image classifier according to an embodiment of the present invention. The method of Fig. 1 includes:
110. Obtain a training sample set. The training sample set includes N image samples belonging to K categories, where N and K are positive integers and N is greater than K.
For example, the training sample set D = {(x_1, y_1), ..., (x_N, y_N)} contains N image samples in total, where y_i is the label of image sample x_i and indicates the category of x_i, which is one of the K categories.
120. Obtain a feature vector of each image sample, where the feature vector includes a latent variable of the image sample.
It should be understood that the image features and the latent variables may be chosen according to the application scenario or actual needs. For example, the image feature may be chosen (or defined) as the Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), Haar features, and so on; the latent variable may be chosen (or defined) as the position of the object in the image, the relative position between a part and the main body in the image, the subcategory of the object, and so on. The feature vector of each image sample is obtained based on the chosen image features and latent variables. The feature vector obtained for an image is therefore not a fixed value: it changes as the latent variable changes. Assuming the latent variable of image x_i is z, the extracted feature vector can be written as f(x_i, z).
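The dependence of the feature vector on the latent variable can be sketched as follows. This is a minimal illustration only: the image is a small grid of numbers, the latent variable z is assumed to be the top-left corner of a square window, and the "feature" is simply the flattened window rather than HOG, LBP, or Haar. The names `candidate_positions` and `feature_vector` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

Image = List[List[float]]


def candidate_positions(image: Image, win: int) -> List[Tuple[int, int]]:
    """Enumerate Z(x): every top-left corner at which a win x win window fits."""
    rows, cols = len(image), len(image[0])
    return [(r, c) for r in range(rows - win + 1) for c in range(cols - win + 1)]


def feature_vector(image: Image, z: Tuple[int, int], win: int) -> List[float]:
    """f(x, z): flatten the window whose top-left corner is the latent position z."""
    r, c = z
    return [image[r + dr][c + dc] for dr in range(win) for dc in range(win)]


# A 3 x 3 toy image whose "object" sits in the lower-right corner.
image = [
    [0.0, 0.0, 0.0],
    [0.0, 5.0, 6.0],
    [0.0, 7.0, 8.0],
]
positions = candidate_positions(image, win=2)
features_at_object = feature_vector(image, (1, 1), win=2)
features_at_corner = feature_vector(image, (0, 0), win=2)
```

The same image yields different feature vectors for different values of z, which is why the latent variable has to be fixed or inferred before a classifier can score the image.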
130. Train classifiers for the K categories through a multinomial logistic regression model, based on the latent variables of the N image samples.
In this embodiment of the present invention, the K classifiers are trained simultaneously in maximum-likelihood form through the multinomial logistic regression model. In other words, using the multinomial logistic regression model preserves the interdependence among the classifiers of the K categories. Compared with LSVM, which converts the K-class problem in object classification into multiple mutually isolated two-class problems, the training result is more accurate.
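The coupling between the K classifiers can be seen in a small numeric sketch. It assumes the usual softmax form of multinomial logistic regression, and it uses independent sigmoids only as a stand-in for "mutually isolated two-class problems"; this is not the LSVM hinge-loss formulation, and the function names are illustrative.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Joint class probabilities: every class shares one normalizer."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score: float) -> float:
    """Probability from one isolated two-class problem."""
    return 1.0 / (1.0 + math.exp(-score))


scores = [2.0, 1.0, -1.0]
joint = softmax(scores)
independent = [sigmoid(s) for s in scores]

# Raising the score of class 1 lowers the joint probability of class 0,
# but leaves the isolated two-class probability of class 0 untouched.
joint_after = softmax([2.0, 3.0, -1.0])
independent_after = [sigmoid(s) for s in [2.0, 3.0, -1.0]]
```

In the joint model a change to one classifier's score changes every class probability, which is the interdependence the paragraph above refers to.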
Optionally, as an embodiment, step 130 may include: obtaining initial values of K model parameters; obtaining initial values of the latent variables of the N image samples; and training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine target values of the K model parameters.
It should be noted that the latent variable of one image sample may have K initial values; that is, the latent variable of an image sample has one corresponding initial value under the initial value of each model parameter. Through step 130, N*K initial values of latent variables can be obtained.
Optionally, as an embodiment, the initial values of the latent variables of the N image samples include initial values of positive-image-sample latent variables and initial values of negative-image-sample latent variables. Training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine the target values of the K model parameters may include: training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determining the current values as the target values of the K model parameters; and when the current values do not satisfy the convergence condition, determining current values of the positive-image-sample latent variables based on the feature vectors of the N image samples and the current values of the K model parameters, updating the initial values of the positive-image-sample latent variables with these current values, and repeating this step until the current values of the K model parameters satisfy the convergence condition.
Specifically, the latent variable of an image sample may have different initial values under different model parameters; that is, the latent variable of one image sample may have K initial values, so the initial values of the latent variables of the N image samples may include K*N initial values. An image sample is a positive sample under the model parameter corresponding to its own category. The initial values of the positive-image-sample latent variables therefore comprise N initial values, namely the initial values of the N image samples under the model parameters corresponding to their respective categories. Of the K*N initial values, the remaining N*(K-1) initial values, other than the positive ones, are the initial values of the negative-image-sample latent variables.
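The bookkeeping above can be checked by enumerating the (sample, model parameter) slots directly. The sketch below assumes labels are integers in 0..K-1; `split_latent_slots` is a hypothetical helper name. With N samples and K model parameters there are K*N slots, N of which are positive, so K*N - N slots remain for negative-sample latent variables.

```python
from typing import List, Tuple


def split_latent_slots(
    labels: List[int], num_classes: int
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return the (sample index, class index) slots for positive and negative latents."""
    positive = [(i, y) for i, y in enumerate(labels)]
    negative = [
        (i, k)
        for i, y in enumerate(labels)
        for k in range(num_classes)
        if k != y
    ]
    return positive, negative


labels = [0, 2, 1, 0]  # N = 4 samples
K = 3                  # K = 3 categories
positive_slots, negative_slots = split_latent_slots(labels, K)
N = len(labels)
```

For this toy case there are 12 slots in total, 4 positive and 8 negative.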
It can be proved that, when the initial values of the positive-image-sample latent variables are fixed, the objective of the multinomial logistic regression model is concave and can therefore be maximized by gradient ascent.
Optionally, as an embodiment, training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine the current values of the K model parameters may include: training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine iteration values of the K model parameters; determining iteration values of the negative-image-sample latent variables based on the feature vectors of the N image samples and the iteration values of the K model parameters, and updating the initial values of the negative-image-sample latent variables with these iteration values; when the iteration values of the K model parameters satisfy a preset iteration stop condition, determining the iteration values as the current values of the K model parameters; otherwise, repeating this step until the iteration values of the K model parameters satisfy the iteration stop condition.
In this embodiment of the present invention, with the values of the positive-sample latent variables fixed, the K model parameters are optimized by continually updating the values of the negative-sample latent variables, which further improves the accuracy of the classification result.
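Optionally, as an embodiment, training the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine the iteration values of the K model parameters may include: determining the iteration values of the K model parameters according to the formula [...], where [...]
Optionally, as an embodiment, determining the iteration values of the K model parameters according to the formula may include: according to the formula [...]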
Optionally, as an embodiment, the iteration stop condition is that the change in the value of the objective function l(θ) is smaller than a preset threshold; alternatively, the iteration stop condition is that the number of iterations reaches a preset number.
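The two stop conditions can be sketched as a single check. The function name `should_stop` and the default values of `tol` and `max_iters` are illustrative assumptions; the source only states that a preset threshold and a preset iteration count exist. The source presents the two conditions as alternatives, and the sketch accepts either one.

```python
def should_stop(
    prev_objective: float,
    curr_objective: float,
    iteration: int,
    tol: float = 1e-4,
    max_iters: int = 100,
) -> bool:
    """Stop when l(theta) barely changes, or when the iteration budget is used up."""
    small_change = abs(curr_objective - prev_objective) < tol
    budget_spent = iteration >= max_iters
    return small_change or budget_spent


stop_on_small_change = should_stop(-2.0, -1.99995, iteration=3)
keep_going = should_stop(-2.0, -1.5, iteration=3)
stop_on_budget = should_stop(-2.0, -1.5, iteration=100)
```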
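Optionally, as an embodiment, determining the iteration values of the K model parameters according to the formula may include: according to the formula [...]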
The objective function l(θ) above contains the logarithm of a sum. It therefore cannot be decomposed into a sum of K per-class subproblems, and the optimization cannot be accelerated by parallel or distributed computing.
This embodiment of the present invention exploits the concavity of the logarithm (log-concavity): a log-concavity upper bound is applied to convert the objective function l(θ) into a sum of K per-class subproblems, which allows parallel computation and accelerates convergence of the algorithm.
Specifically, the log-concavity upper bound has the form: [...] Using this expression, l(θ) can be transformed into: [...]
With the above expression as the objective function and gradient ascent as the solver, the gradient with respect to the classifier parameters has the following form: [...]
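The formulas are not reproduced in this text, so the following is only a numeric sketch of one standard bound of this kind, assumed (not confirmed by the source) to be the one intended: for any a > 0, log(x) <= a*x - log(a) - 1, with equality at a = 1/x. Applied to x = sum over k of exp(s_k), the right-hand side is a sum of one term per class plus a constant, which is what makes a per-class decomposition possible. The names `log_upper_bound` and `per_class_terms` are illustrative.

```python
import math
from typing import List


def log_upper_bound(x: float, a: float) -> float:
    """Right-hand side of log(x) <= a*x - log(a) - 1, valid for any a > 0."""
    return a * x - math.log(a) - 1.0


scores: List[float] = [0.5, -1.0, 2.0]       # hypothetical per-class scores s_k
total = sum(math.exp(s) for s in scores)     # the sum inside the logarithm
exact = math.log(total)

loose = log_upper_bound(total, a=0.05)       # any positive a gives an upper bound
tight = log_upper_bound(total, a=1.0 / total)  # a = 1/x makes the bound exact

# With a fixed, the bound separates into one term per class plus a constant.
a = 1.0 / total
per_class_terms = [a * math.exp(s) for s in scores]
decomposed = sum(per_class_terms) - math.log(a) - 1.0
```

Because each per-class term depends on only one class score, the terms could be evaluated independently once a is held fixed.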
Optionally, as an embodiment, determining the iteration values of the negative-image-sample latent variables based on the feature vectors of the N image samples and the iteration values of the K model parameters may include: determining the iteration values of the negative-image-sample latent variables according to the formula [...], where x_i denotes the i-th of the N image samples; β_t denotes the t-th of the K model parameters and differs from the model parameter corresponding to the category of x_i; Z(x_i) denotes the range of values of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the iteration value of the latent variable of x_i under the model parameter β_t; i is any integer from 1 to N; and t is any integer from 1 to K.
Optionally, as an embodiment, determining the current values of the positive-image-sample latent variables based on the feature vectors of the N image samples and the current values of the K model parameters may include: determining the current values of the positive-image-sample latent variables according to the formula [...], where x_i denotes the i-th of the N image samples; the model parameter used is the one corresponding to the category of x_i; Z(x_i) denotes the range of values of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the current value of the latent variable of x_i under that model parameter; and i is any integer from 1 to N.
Optionally, as an embodiment, determining the initial value of the latent variable of each image sample based on the initial value of each model parameter may include: determining the initial value of the latent variable of each image sample according to the formula [...], where x_i denotes the i-th of the N image samples; β_k denotes the k-th of the K model parameters; Z(x_i) denotes the range of values of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the initial value of the latent variable z of x_i under the model parameter β_k; i is any integer from 1 to N; and k is any integer from 1 to K.
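The three formulas are not reproduced in this text. A common choice in latent-variable models, assumed here rather than taken from the source, is to pick the value of z in Z(x_i) that maximizes the inner product of the model parameter with f(x_i, z). The sketch below applies that assumed rule to one sample; `infer_latent`, the candidate names, and the numbers are all hypothetical.

```python
from typing import Dict, List


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def infer_latent(beta: List[float], candidates: Dict[str, List[float]]) -> str:
    """Pick the latent value z whose feature vector f(x, z) scores highest under beta.

    Assumed rule: z* = argmax over z in Z(x) of beta . f(x, z).
    """
    return max(candidates, key=lambda z: dot(beta, candidates[z]))


# One sample x_i with two candidate latent values and their feature vectors.
candidates = {"left": [1.0, 0.0], "right": [0.0, 1.0]}
betas = [[2.0, 0.5], [-1.0, 3.0]]  # K = 2 model parameters
label = 0                          # category of x_i

# Positive latent: inferred under the model parameter of the sample's own category.
positive_latent = infer_latent(betas[label], candidates)

# Negative latents: inferred under every other model parameter.
negative_latents = {
    t: infer_latent(betas[t], candidates)
    for t in range(len(betas))
    if t != label
}
```

The same helper serves for the initial values as well, with the initial model parameters passed in place of the current ones.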
The embodiments of the present invention are described in detail below with reference to specific examples. These examples are intended only to help a person skilled in the art better understand the embodiments of the present invention, not to limit their scope.
Example 1:
Input: training sample set {(x_1, y_1), ..., (x_N, y_N)}, and the initial values of all latent variables.
Output: classifier parameters θ, θ = {β_1, ..., β_K}.
Example 2:
Input: training sample set {(x_1, y_1), ..., (x_N, y_N)}, and the initial latent variable values {h}.
Output: classifier parameters θ.
In a specific implementation, the values of the constants numOuterLoop and numInnerLoop depend heavily on the application scenario. For example, in digit recognition, where the number of samples is large and the feature dimension is small, numOuterLoop = 50 and numInnerLoop = 1 may be used.
In more complex cases, for example when the number of samples is small and the feature dimension is high, numOuterLoop = 5 and numInnerLoop = 1000 may be used.
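The algorithm bodies of Examples 1 and 2 are not reproduced in this text, so the following is a hedged sketch of how the outer and inner loops described earlier might fit together, not the patent's actual procedure. It assumes a softmax objective over per-class scores β_k · f(x_i, z), latent inference by argmax, a plain gradient-ascent step with learning rate `lr`, and fixed loop counts in place of the convergence checks. `num_outer_loop` and `num_inner_loop` mirror numOuterLoop and numInnerLoop; every other name is hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def best_latent(beta, cands):
    """Index of the candidate feature vector scoring highest under beta (assumed argmax rule)."""
    return max(range(len(cands)), key=lambda z: dot(beta, cands[z]))


def log_likelihood(betas, data, latent):
    """Assumed objective: sum over samples of log softmax probability of the true class."""
    total = 0.0
    for i, (cands, y) in enumerate(data):
        scores = [dot(betas[k], cands[latent[i][k]]) for k in range(len(betas))]
        m = max(scores)
        total += scores[y] - (m + math.log(sum(math.exp(s - m) for s in scores)))
    return total


def gradient_step(betas, data, latent, lr):
    """One gradient-ascent step on the objective with all latent values held fixed."""
    K, dim = len(betas), len(betas[0])
    grads = [[0.0] * dim for _ in range(K)]
    for i, (cands, y) in enumerate(data):
        feats = [cands[latent[i][k]] for k in range(K)]
        scores = [dot(betas[k], feats[k]) for k in range(K)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        norm = sum(exps)
        for k in range(K):
            coeff = (1.0 if k == y else 0.0) - exps[k] / norm
            for d in range(dim):
                grads[k][d] += coeff * feats[k][d]
    return [[b + lr * g for b, g in zip(betas[k], grads[k])] for k in range(K)]


def train_mllr(data, K, dim, num_outer_loop=5, num_inner_loop=50, lr=0.1):
    betas = [[0.0] * dim for _ in range(K)]
    # Initial latent values for every (sample, class) slot.
    latent = [[best_latent(betas[k], cands) for k in range(K)] for cands, _ in data]
    for _ in range(num_outer_loop):
        for _ in range(num_inner_loop):
            betas = gradient_step(betas, data, latent, lr)
            # Inner loop: refresh only the negative-sample latent values.
            for i, (cands, y) in enumerate(data):
                for k in range(K):
                    if k != y:
                        latent[i][k] = best_latent(betas[k], cands)
        # Outer loop: refresh the positive-sample latent values.
        for i, (cands, y) in enumerate(data):
            latent[i][y] = best_latent(betas[y], cands)
    return betas, latent


def predict(betas, cands):
    """Class whose best-scoring latent value gives the highest score."""
    return max(range(len(betas)), key=lambda k: max(dot(betas[k], f) for f in cands))


# Toy data: each sample has two candidate windows; one of them carries the class pattern.
data = [
    ([[1.0, 0.0], [0.0, 0.0]], 0),
    ([[0.0, 0.0], [1.0, 0.0]], 0),
    ([[0.0, 1.0], [0.0, 0.0]], 1),
    ([[0.0, 0.0], [0.0, 1.0]], 1),
]
betas, latent = train_mllr(data, K=2, dim=2)
predictions = [predict(betas, cands) for cands, _ in data]
```

On this toy set the positive latent values end up pointing at the window that carries the class pattern, which is the behaviour the latent-position experiments below rely on.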
The results of classifying images with the trained classifier parameters are given below. In the following description, the classifier training approach of the embodiments of the present invention is referred to as Multinomial Latent Logistic Regression (MLLR).
Fig. 2 is an example of image classification using classifier parameters trained according to an embodiment of the present invention. The example of Fig. 2 studies mammal classification and covers 6 classes of mammals, with about 50 pictures per class. In the experiment, 50% of the pictures are used for training and the other 50% for testing. HOG features are used as the image features. The latent variable is the position of the object to be detected in the picture, and the bounding box of the object is required to cover at least 30% of the total picture size. Linear SVM, LSVM, and MLLR are compared; the test results are as follows:
Table 1. Classification results of the mammal classification experiment
The test results show that the accuracy of MLLR exceeds that of LSVM, and that classifiers trained with either latent-variable approach, LSVM or MLLR, outperform the traditional linear SVM method.
In Fig. 2, the first column shows the classifiers trained by linear SVM (using HOG features), and the second column shows the classifiers trained by MLLR. The rectangular boxes in the small pictures in Fig. 2 mark the object positions detected by MLLR.
Fig. 3 is an example of image classification using classifier parameters trained according to an embodiment of the present invention. Fig. 3 studies the actions of people in sports and covers 6 classes of actions (cricket batting, cricket bowling, volleyball smash, croquet shot, tennis forehand, and tennis serve). HOG is again used as the image feature, and DPM is used as the latent-variable model; that is, both the object position and the relative positions of the parts with respect to the main body are treated as latent variables. The results show that the classification accuracy of MLLR (78.3%) exceeds that of LSVM (74.4%). In Fig. 3, the first column shows the main-body models in the pictures and the second column shows the part models. In the small pictures in Fig. 3, dark rectangles mark the main-body position and light rectangles mark the part positions.
It should be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution. The order of execution of the processes is determined by their functions and internal logic, and the sequence numbers do not limit the implementation of the embodiments of the present invention in any way.
The method for generating an image classifier according to the embodiments of the present invention has been described in detail above with reference to Figs. 1 to 3. The apparatus for generating an image classifier according to the embodiments of the present invention is described below with reference to Figs. 4 and 5.
It should be understood that the apparatus for generating an image classifier according to the embodiments of the present invention can implement each step in Fig. 1; for brevity, details are not repeated here.
Fig. 4 is a schematic structural diagram of an apparatus for generating an image classifier according to an embodiment of the present invention. The apparatus 400 of Fig. 4 includes:
a first acquiring unit 410, configured to acquire a training sample set, where the training sample set includes N image samples, the N image samples belong to K categories, N and K are positive integers, and N is greater than K;
a second acquiring unit 420, configured to acquire a feature vector of each image sample acquired by the first acquiring unit 410, where the feature vector includes a latent variable of the image sample; and
a training unit 430, configured to train the classifiers of the K categories through a multinomial logistic regression model, based on the latent variables of the N image samples acquired by the second acquiring unit 420.
In this embodiment of the present invention, the K classifiers are trained simultaneously in maximum-likelihood form through the multinomial logistic regression model. In other words, using the multinomial logistic regression model preserves the interdependence among the classifiers of the K categories. Compared with LSVM, which converts the K-class problem in object classification into multiple mutually isolated two-class problems, the training result is more accurate.
Optionally, as an embodiment, the classifiers of the K categories respectively include K model parameters, and the training unit 430 is specifically configured to: obtain initial values of the K model parameters; determine an initial value of the latent variable of each image sample based on the initial value of each model parameter; and train the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine target values of the K model parameters.
Optionally, as an embodiment, the initial values of the latent variables of the N image samples include initial values of positive-image-sample latent variables and initial values of negative-image-sample latent variables, and the training unit 430 is specifically configured to: train the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determine the current values as the target values of the K model parameters; and when the current values do not satisfy the convergence condition, determine current values of the positive-image-sample latent variables based on the feature vectors of the N image samples and the current values of the K model parameters, update the initial values of the positive-image-sample latent variables with these current values, and repeat this step until the current values of the K model parameters satisfy the convergence condition.
Optionally, as an embodiment, the training unit 430 is specifically configured to: train the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine iteration values of the K model parameters; determine iteration values of the negative-image-sample latent variables based on the feature vectors of the N image samples and the iteration values of the K model parameters, and update the initial values of the negative-image-sample latent variables with these iteration values; when the iteration values of the K model parameters satisfy a preset iteration stop condition, determine the iteration values as the current values of the K model parameters; otherwise, repeat this step until the iteration values of the K model parameters satisfy the iteration stop condition.
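Optionally, as an embodiment, the training unit 430 is specifically configured to determine the iteration values of the K model parameters according to the formula [...], where [...]
Optionally, as an embodiment, the training unit 430 is specifically configured to, according to the formula [...]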
Optionally, as an embodiment, the iteration stop condition is that the change in the value of the objective function l(θ) is smaller than a preset threshold; alternatively, the iteration stop condition is that the number of iterations reaches a preset number.
Optionally, as an embodiment, the training unit 430 is specifically configured to, according to the formula [...]
Optionally, as an embodiment, the training unit 430 is specifically configured to determine the iteration values of the negative-image-sample latent variables according to the formula [...], where x_i denotes the i-th of the N image samples; β_t denotes the t-th of the K model parameters and differs from the model parameter corresponding to the category of x_i; Z(x_i) denotes the range of values of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the iteration value of the latent variable of x_i under the model parameter β_t; i is any integer from 1 to N; and t is any integer from 1 to K.
Optionally, as an embodiment, the training unit 430 is specifically configured to determine the current values of the positive-image-sample latent variables according to the formula [...], where x_i denotes the i-th of the N image samples; the model parameter used is the one corresponding to the category of x_i; Z(x_i) denotes the range of values of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the current value of the latent variable of x_i under that model parameter; and i is any integer from 1 to N.
Optionally, as an embodiment, the training unit 430 is specifically configured to determine the initial value of the latent variable of each image sample according to the formula [...], where x_i denotes the i-th of the N image samples; β_k denotes the k-th of the K model parameters; Z(x_i) denotes the range of values of the latent variable z of x_i; f(x_i, z) denotes the feature vector of x_i; the result is the initial value of the latent variable z of x_i under the model parameter β_k; i is any integer from 1 to N; and k is any integer from 1 to K.
Fig. 5 is a schematic structural diagram of an apparatus for generating an image classifier according to an embodiment of the present invention. The apparatus 500 of Fig. 5 includes:
a memory 510, configured to store a program; and
a processor 520, configured to execute the program. When the program is executed, the processor 520 is specifically configured to: acquire a training sample set, where the training sample set includes N image samples, the N image samples belong to K categories, N and K are positive integers, and N is greater than K; acquire a feature vector of each image sample, where the feature vector includes a latent variable of the image sample; and train the classifiers of the K categories through a multinomial logistic regression model, based on the latent variables of the N image samples.
In this embodiment of the present invention, the K classifiers are trained simultaneously in maximum-likelihood form through the multinomial logistic regression model. In other words, using the multinomial logistic regression model preserves the interdependence among the classifiers of the K categories. Compared with LSVM, which converts the K-class problem in object classification into multiple mutually isolated two-class problems, the training result is more accurate.
Optionally, as an embodiment, the classifiers of the K categories respectively include K model parameters, and the processor 520 is specifically configured to: obtain initial values of the K model parameters; determine an initial value of the latent variable of each image sample based on the initial value of each model parameter; and train the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine target values of the K model parameters.
Optionally, as an embodiment, the initial values of the latent variables of the N image samples include initial values of positive-image-sample latent variables and initial values of negative-image-sample latent variables, and the processor 520 is specifically configured to: train the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine current values of the K model parameters; when the current values of the K model parameters satisfy a preset convergence condition, determine the current values as the target values of the K model parameters; and when the current values do not satisfy the convergence condition, determine current values of the positive-image-sample latent variables based on the feature vectors of the N image samples and the current values of the K model parameters, update the initial values of the positive-image-sample latent variables with these current values, and repeat this step until the current values of the K model parameters satisfy the convergence condition.
Optionally, as an embodiment, the processor 520 is specifically configured to: train the classifiers of the K categories through the multinomial logistic regression model, based on the feature vectors of the N image samples and the initial values of the latent variables of the N image samples, to determine iteration values of the K model parameters; determine iteration values of the negative-image-sample latent variables based on the feature vectors of the N image samples and the iteration values of the K model parameters, and update the initial values of the negative-image-sample latent variables with these iteration values; when the iteration values of the K model parameters satisfy a preset iteration stop condition, determine the iteration values as the current values of the K model parameters; otherwise, repeat this step until the iteration values of the K model parameters satisfy the iteration stop condition.
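Optionally, as an embodiment, the processor 520 is specifically configured to determine the iteration values of the K model parameters according to the formula [formula], where
Optionally, as an embodiment, the processor 520 is specifically configured to, according to the formula [formula]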
Optionally, as an embodiment, the iteration stop condition is that the change in the objective function value l(θ) is less than a preset threshold; alternatively, the iteration stop condition is that the number of iterations reaches a preset number.
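The two stop conditions can be checked with a small helper like the one below. It is only a sketch: the function name, the default threshold, and the default iteration limit are hypothetical values chosen for illustration, and the two alternatives from the text are combined with a logical OR for convenience.

```python
def should_stop(prev_objective, curr_objective, iteration,
                threshold=1e-4, max_iterations=100):
    """Return True when either stop condition described in the text holds."""
    # Condition 1: the objective l(theta) changed by less than the threshold.
    small_change = abs(curr_objective - prev_objective) < threshold
    # Condition 2: the preset number of iterations has been reached.
    budget_spent = iteration >= max_iterations
    return small_change or budget_spent
```

An implementation that uses only one of the two conditions would simply drop the other term.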
Optionally, as an embodiment, the processor 520 is specifically configured to, according to the formula [formula]
Optionally, as an embodiment, the processor 520 is specifically configured to determine the iteration values of the latent variables of the negative image samples according to the formula [formula], where x_i denotes the i-th of the N image samples, β_t denotes the t-th of the K model parameters, and [symbol] denotes the model parameter corresponding to the category of x_i, Z(x_i) denotes the value range of the latent variable z of x_i, f(x_i, z) denotes the feature vector of x_i, [symbol] denotes the iteration value of the latent variable of x_i when the model parameter is β_t, i is any integer from 1 to N, and t is any integer from 1 to K.
Optionally, as an embodiment, the processor 520 is specifically configured to determine the current values of the latent variables of the positive image samples according to the formula [formula], where x_i denotes the i-th of the N image samples, [symbol] denotes the model parameter corresponding to the category of x_i, Z(x_i) denotes the value range of the latent variable z of x_i, f(x_i, z) denotes the feature vector of x_i, [symbol] denotes the current value of the latent variable of x_i when the model parameter is [symbol], and i is any integer from 1 to N.
Optionally, as an embodiment, the processor 520 is specifically configured to determine the initial value of the latent variable of each image sample according to the formula [formula], where x_i denotes the i-th of the N image samples, β_k denotes the k-th of the K model parameters, Z(x_i) denotes the value range of the latent variable z of x_i, f(x_i, z) denotes the feature vector of x_i, [symbol] denotes the initial value of the latent variable z of x_i when the model parameter is β_k, i is any integer from 1 to N, and k is any integer from 1 to K.
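The latent-variable formulas themselves are not reproduced in this text. A common choice in latent-variable classifiers, and the one assumed in the sketch below, is to pick the value of z in Z(x_i) that maximizes the score β_k · f(x_i, z). The function name `infer_latent`, the dictionary layout, and the example values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def infer_latent(beta_k, candidates):
    """Pick the latent value z with the highest score beta_k . f(x_i, z).

    `candidates` maps each z in Z(x_i) to its feature vector f(x_i, z).
    The argmax rule is an assumption, since the source formula is missing.
    """
    return max(candidates, key=lambda z: dot(beta_k, candidates[z]))

# Example: three candidate latent positions for one image sample.
candidates = {
    "left": [0.2, 0.9],
    "center": [0.8, 0.1],
    "right": [0.5, 0.5],
}
chosen = infer_latent([1.0, -1.0], candidates)
```

Under this assumption the same routine would serve for the initial, iteration, and current values; only the model parameter passed in (β_k, β_t, or the parameter of the sample's own category) differs.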
FIG. 6 is a schematic flowchart of an image classification method according to an embodiment of the present invention. The method of FIG. 6 can classify images using the K classifiers trained by the method of FIG. 1, and includes:
610. Obtain a feature vector of the image to be classified;
620. Determine the category of the image to be classified using the K classifiers, based on the feature vector of the image to be classified;
630. Determine, according to the formula [formula], the probability of the image to be classified under each of the K categories, where x denotes the image to be classified, β_k denotes the model parameter of the k-th of the K classifiers, f(x, z) denotes the feature vector of x, Z(x) denotes the value range of the latent variable z of x, and k is any integer from 1 to K.
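Steps 620 and 630 can be illustrated with the sketch below. Because the probability formula is not reproduced in this text, the code assumes a softmax over the per-class scores max over z in Z(x) of β_k · f(x, z), which is consistent with a multinomial logistic regression model but is not confirmed by the source. The names `class_probabilities` and `classify` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def class_probabilities(betas, candidates):
    """Return P(k | x) for each of the K classes (assumed softmax form).

    `betas[k]` is the model parameter of the k-th classifier and
    `candidates` lists f(x, z) for every z in Z(x).
    """
    # Per-class score: best latent value for that class.
    scores = [max(dot(beta_k, f) for f in candidates) for beta_k in betas]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(betas, candidates):
    """Return the index of the most probable class and all K probabilities."""
    probs = class_probabilities(betas, candidates)
    return max(range(len(probs)), key=probs.__getitem__), probs

label, probs = classify([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 1.0]])
```

In this toy input the first class wins, but the second class still receives a non-zero probability, so the output carries more information than the class index alone.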
The classification result of the existing LSVM states only which category the image to be classified belongs to. In practice, however, different categories may be related, and an image does not necessarily belong to exactly one of them. For example, building styles can be classified as modern, medieval, and so on, and a building in an image may combine elements of the modern style with elements of the medieval style. In that case the existing LSVM reports only a single architectural style for the building, which is clearly not accurate enough. In this embodiment, in addition to the category of the image to be classified, the probability of the image under each category is also given. Compared with the prior art, introducing a probabilistic interpretation makes the description of the image classification result more accurate.
FIG. 7 is a schematic block diagram of an image classification apparatus according to an embodiment of the present invention. The apparatus 700 in FIG. 7 can classify images using the K classifiers trained by the apparatus 400 of FIG. 4. The apparatus 700 includes:
a first obtaining unit 710, configured to obtain a feature vector of the image to be classified;
a first determining unit 720, configured to determine the category of the image to be classified using the K classifiers, based on the feature vector of the image to be classified;
a second determining unit 730, configured to determine, according to the formula [formula], the probability of the image to be classified under each of the K categories, where x denotes the image to be classified, β_k denotes the model parameter of the k-th of the K classifiers, f(x, z) denotes the feature vector of x, Z(x) denotes the value range of the latent variable z of x, and k is any integer from 1 to K.
The classification result of the existing LSVM states only which category the image to be classified belongs to. In practice, however, different categories may be related, and an image does not necessarily belong to exactly one of them. For example, building styles can be classified as modern, medieval, and so on, and a building in an image may combine elements of the modern style with elements of the medieval style. In that case the existing LSVM reports only a single architectural style for the building, which is clearly not accurate enough. In this embodiment, in addition to the category of the image to be classified, the probability of the image under each category is also given. Compared with the prior art, introducing a probabilistic interpretation makes the description of the image classification result more accurate.
FIG. 8 is a schematic block diagram of an image classification apparatus according to an embodiment of the present invention. The image classification apparatus 800 in FIG. 8 can classify images using the K classifiers trained by the apparatus 500 of FIG. 5. The apparatus 800 includes:
a memory 810, configured to store a program;
a processor 820, configured to execute the program. When executed, the program is used to: obtain a feature vector of the image to be classified; determine the category of the image to be classified using the K classifiers, based on the feature vector of the image to be classified; and determine, according to the formula [formula], the probability of the image to be classified under each of the K categories, where x denotes the image to be classified, β_k denotes the model parameter of the k-th of the K classifiers, f(x, z) denotes the feature vector of x, Z(x) denotes the value range of the latent variable z of x, and k is any integer from 1 to K.
The classification result of the existing LSVM states only which category the image to be classified belongs to. In practice, however, different categories may be related, and an image does not necessarily belong to exactly one of them. For example, building styles can be classified as modern, medieval, and so on, and a building in an image may combine elements of the modern style with elements of the medieval style. In that case the existing LSVM reports only a single architectural style for the building, which is clearly not accurate enough. In this embodiment, in addition to the category of the image to be classified, the probability of the image under each category is also given. Compared with the prior art, introducing a probabilistic interpretation makes the description of the image classification result more accurate.
It should be understood that, in the embodiments of the present invention, the term "and/or" merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean A alone, both A and B, or B alone. In addition, the character "/" in this document generally indicates an "or" relationship between the objects before and after it.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may implement the described functions differently for each specific application, but such implementations should not be considered beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The foregoing is only a specific implementation of the present invention, but the protection scope of the present invention is not limited to it. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and all such modifications or replacements shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (24)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410453884.6A CN105389583A (en) | 2014-09-05 | 2014-09-05 | Image classifier generation method, and image classification method and device |
PCT/CN2015/075781 WO2016033965A1 (en) | 2014-09-05 | 2015-04-02 | Method for generating image classifier and image classification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410453884.6A CN105389583A (en) | 2014-09-05 | 2014-09-05 | Image classifier generation method, and image classification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105389583A (en) | 2016-03-09 |
Family ID: 55421853
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410453884.6A Withdrawn CN105389583A (en) | 2014-09-05 | 2014-09-05 | Image classifier generation method, and image classification method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105389583A (en) |
WO (1) | WO2016033965A1 (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109815971B (en) * | 2017-11-20 | 2023-03-10 | 富士通株式会社 | Information processing method and information processing apparatus |
CN109685749B (en) * | 2018-09-25 | 2023-04-18 | 平安科技(深圳)有限公司 | Image style conversion method, device, equipment and computer storage medium |
CN111225299A (en) * | 2018-11-27 | 2020-06-02 | 中国移动通信集团广东有限公司 | A kind of ONU fault identification, repair method and device |
CN111368861B (en) * | 2018-12-25 | 2023-05-09 | 杭州海康威视数字技术股份有限公司 | Method and device for determining the sequence of sub-components in an image object detection process |
CN110516737B (en) * | 2019-08-26 | 2023-05-26 | 南京人工智能高等研究院有限公司 | Method and device for generating image recognition model |
CN111199244B (en) * | 2019-12-19 | 2024-04-09 | 北京航天测控技术有限公司 | Data classification method and device, storage medium and electronic device |
CN112329837B (en) * | 2020-11-02 | 2023-01-17 | 北京邮电大学 | An adversarial sample detection method, device, electronic equipment and medium |
CN113239804B (en) * | 2021-05-13 | 2023-06-02 | 杭州睿胜软件有限公司 | Image recognition method, readable storage medium, and image recognition system |
CN114821210B (en) * | 2022-03-17 | 2025-06-03 | 西北工业大学 | A feature selection method based on multi-classification logistic regression |
CN115661510B (en) * | 2022-09-29 | 2025-07-04 | 西安电子科技大学 | Image classification method based on super-resolution image reconstruction and category consistency constraint |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8442309B2 (en) * | 2009-06-04 | 2013-05-14 | Honda Motor Co., Ltd. | Semantic scene segmentation using random multinomial logit (RML) |
US8842883B2 (en) * | 2011-11-21 | 2014-09-23 | Seiko Epson Corporation | Global classifier with local adaption for objection detection |
CN103324938A (en) * | 2012-03-21 | 2013-09-25 | 日电(中国)有限公司 | Method for training attitude classifier and object classifier and method and device for detecting objects |
US9311564B2 (en) * | 2012-10-05 | 2016-04-12 | Carnegie Mellon University | Face age-estimation and methods, systems, and software therefor |
CN103942558A (en) * | 2013-01-22 | 2014-07-23 | 日电(中国)有限公司 | Method and apparatus for obtaining object detectors |
CN103310230B (en) * | 2013-06-17 | 2016-04-13 | 西北工业大学 | Combine the hyperspectral image classification method separating mixed and self-adaptation Endmember extraction |
CN103530656B (en) * | 2013-09-10 | 2017-01-11 | 浙江大学 | Hidden structure learning-based image digest generation method |
CN103761295B (en) * | 2014-01-16 | 2017-01-11 | 北京雅昌文化发展有限公司 | Automatic picture classification based customized feature extraction method for art pictures |
- 2014-09-05: CN201410453884.6A filed, published as CN105389583A, status: Withdrawn
- 2015-04-02: PCT/CN2015/075781 filed, published as WO2016033965A1, status: Application Filing
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106056146A (en) * | 2016-05-27 | 2016-10-26 | 西安电子科技大学 | Logistic regression-based visual tracking method |
CN106056146B (en) * | 2016-05-27 | 2019-03-26 | 西安电子科技大学 | The visual tracking method that logic-based returns |
CN108875455B (en) * | 2017-05-11 | 2022-01-18 | Tcl科技集团股份有限公司 | Unsupervised intelligent face accurate identification method and system |
CN108875455A (en) * | 2017-05-11 | 2018-11-23 | Tcl集团股份有限公司 | A kind of unsupervised face intelligence precise recognition method and system |
CN107492067A (en) * | 2017-09-07 | 2017-12-19 | 维沃移动通信有限公司 | A kind of image beautification method and mobile terminal |
CN107492067B (en) * | 2017-09-07 | 2019-06-07 | 维沃移动通信有限公司 | A kind of image beautification method and mobile terminal |
CN109784351A (en) * | 2017-11-10 | 2019-05-21 | 财付通支付科技有限公司 | Data classification method, disaggregated model training method and device |
CN108595568A (en) * | 2018-04-13 | 2018-09-28 | 重庆邮电大学 | A kind of text sentiment classification method based on very big unrelated multivariate logistic regression |
CN108549692A (en) * | 2018-04-13 | 2018-09-18 | 重庆邮电大学 | The method that sparse multivariate logistic regression model under Spark frames classifies to text emotion |
CN108549692B (en) * | 2018-04-13 | 2021-05-11 | 重庆邮电大学 | Method for classifying text emotion through sparse multiple logistic regression model under Spark framework |
CN108536838B (en) * | 2018-04-13 | 2021-10-19 | 重庆邮电大学 | A Spark-based Maximum Irrelevant Multiple Logistic Regression Model for Text Sentiment Classification |
CN108536838A (en) * | 2018-04-13 | 2018-09-14 | 重庆邮电大学 | Very big unrelated multivariate logistic regression model based on Spark is to text sentiment classification method |
CN108595568B (en) * | 2018-04-13 | 2022-05-17 | 重庆邮电大学 | Text emotion classification method based on great irrelevant multiple logistic regression |
CN110163794A (en) * | 2018-05-02 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Conversion method, device, storage medium and the electronic device of image |
CN110163794B (en) * | 2018-05-02 | 2023-08-29 | 腾讯科技(深圳)有限公司 | Image conversion method, device, storage medium and electronic device |
CN110633725A (en) * | 2018-06-25 | 2019-12-31 | 富士通株式会社 | Method and device for training classification model and classification method and device |
CN110633725B (en) * | 2018-06-25 | 2023-08-04 | 富士通株式会社 | Method and device for training classification model and classification method and device |
CN110084380A (en) * | 2019-05-10 | 2019-08-02 | 深圳市网心科技有限公司 | A kind of repetitive exercise method, equipment, system and medium |
CN113674219A (en) * | 2021-07-28 | 2021-11-19 | 云南大益微生物技术有限公司 | Tea leaf impurity identification method based on double logistic regression |
Also Published As
Publication number | Publication date |
---|---|
WO2016033965A1 (en) | 2016-03-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20160309 |