CN106845551B

CN106845551B - Histopathological image recognition method

Info

Publication number: CN106845551B
Application number: CN201710059300.0A
Authority: CN
Inventors: 汤红忠; 李骁; 王翔; 毛丽珍
Original assignee: Xiangtan University
Current assignee: Xiangtan University
Priority date: 2017-01-24
Filing date: 2017-01-24
Publication date: 2020-08-11
Anticipated expiration: 2037-01-24
Also published as: CN106845551A

Abstract

The invention discloses a histopathological image recognition method comprising the following steps: selecting disease-free and diseased training samples and disease-free and diseased test samples; establishing a disease-free dictionary learning model and a diseased dictionary learning model by combining the disease-free and diseased training samples, and alternately and iteratively optimizing the two objective functions until the maximum number of iterations is reached, thereby learning a disease-free dictionary and a diseased dictionary; sparsely representing the test sample with the disease-free dictionary and the diseased dictionary, and calculating the sparse reconstruction error vector of the test sample under each dictionary; and obtaining a classification statistic from the sparse reconstruction error vectors, and determining the category of the test sample by comparing the classification statistic with a threshold. The invention provides a new model and method for applying dictionary learning to histopathological image classification; the learned class-labeled dictionaries give good sparse reconstruction and intra-class robustness for samples of the same class and good inter-class discrimination for samples of a different class.

Description

A histopathological image recognition method

Technical field

The invention relates to a histopathological image recognition method.

Background

With the development of computer-aided diagnosis, research on "digital pathology" has attracted growing attention from researchers. How to accurately and automatically extract the discriminative features hidden in an image, so as to provide the necessary information for subsequent histopathological image analysis or classification and thereby give disease grades and classifications quickly and accurately, has become one of the most challenging research topics in digital pathology.

Traditional feature extraction methods fall into two main categories. The first is based on domain-specific or task-specific features, such as the size and morphology of biological cells, the grayscale or color information of the image, and texture. The second relies mainly on spatial structure and multi-scale features, such as morphological features, graph methods, scale-invariant features, and wavelet features. These traditional methods mostly yield pixel-level or hand-crafted features that generally suit only specific data; their range of application is limited, their feature redundancy is high, and their discriminative power is low.

In recent years, sparse representation has received great attention for its outstanding performance on many computer vision problems. Its basic idea is to represent an original signal as a sparse signal over an overcomplete dictionary. Sparse representation has been very successful in image denoising and restoration, face recognition, image classification, and other fields. As the technology has developed, how to learn a dictionary suited to a specific problem (such as image classification), that is, a theoretical framework for dictionary learning, has become the focus of researchers' attention.

The key to dictionary learning is whether the constructed dictionary has good reconstructive and discriminative properties. To address this, Zhang et al. proposed a Discriminative K-SVD (DK-SVD) dictionary learning method. Jiang et al. proposed a dictionary learning method based on Label Consistent K-SVD (LC-KSVD). Yang et al. used the Fisher criterion to propose the Fisher Discrimination Dictionary Learning (FDDL) method, which improves the discriminative performance of the dictionary indirectly by constraining the sparse representation coefficients. Vu et al. proposed a Discriminative Feature-oriented Dictionary Learning (DFDL) method and applied it to histopathological image classification. These methods achieve very good results in image classification.

However, different types of histopathological images present different features, and within the same type the cell morphology and geometric structure vary considerably and the pathological features are diverse. As a result, the feature differences between pathological image samples of the same class can exceed those between samples of different classes, so that the diseased dictionary and the disease-free dictionary learned by the above methods are highly similar, their discrimination between disease-free and diseased samples remains low, and their classification performance still needs improvement.

Summary of the invention

To solve the above technical problems, the present invention provides a histopathological image recognition method with high accuracy and high robustness.

The technical solution of the present invention is a histopathological image recognition method comprising the following steps:

Step 1: select a number of image blocks from the disease-free and diseased images of a tissue as disease-free and diseased training samples and as disease-free and diseased test samples;

Step 2, learning the disease-free dictionary: combine the disease-free and diseased training samples to establish a disease-free dictionary learning model, and minimize its objective function by two-step alternating iterative optimization to learn the disease-free dictionary;

Step 3, learning the diseased dictionary: combine the diseased and disease-free training samples to establish a diseased dictionary learning model, and minimize its objective function by two-step alternating iterative optimization to learn the diseased dictionary;

Step 4: determine whether the maximum number of iterations has been reached; if so, go to step 5; if not, return to step 2;

Step 5, obtaining the reconstruction error vectors of the test sample: sparsely represent the test sample with the learned disease-free and diseased dictionaries, then calculate the sparse reconstruction error vector of the test sample under the disease-free dictionary and under the diseased dictionary;

Step 6, obtaining the classification result of the test sample: obtain a classification statistic from the sparse reconstruction error vectors, then determine the category of the test sample by comparing the classification statistic with a threshold.

In the above histopathological image recognition method, step 1 specifically comprises: selecting an equal number of image blocks from the disease-free and from the diseased images of a tissue; splitting each image block into its R, G and B channels; converting the pixel values of each channel into a column vector and concatenating the three vectors into one feature vector; and finally arranging the feature vectors side by side as the disease-free training samples Y and the corresponding diseased training samples.

Test samples are obtained in the same way.
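The vectorization in step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the row-by-row flattening order within each channel is an assumption, since the text only says that the three channels are turned into column vectors and concatenated.

```python
import numpy as np

def patch_to_feature(patch):
    """Flatten an (h, w, 3) RGB patch into one vector of length 3*h*w.

    The channels are taken in R, G, B order and each channel is
    flattened row by row (the flattening order is an assumption).
    """
    patch = np.asarray(patch, dtype=np.float64)
    channels = [patch[:, :, c].reshape(-1) for c in range(3)]
    return np.concatenate(channels)

def build_sample_matrix(patches):
    """Stack the feature vectors side by side, one patch per column."""
    return np.stack([patch_to_feature(p) for p in patches], axis=1)

# Synthetic stand-in patches; real patches would come from tissue images.
rng = np.random.default_rng(0)
demo_patches = rng.integers(0, 256, size=(5, 20, 20, 3))
Y_demo = build_sample_matrix(demo_patches)  # 1200 rows, 5 columns
```

With 20×20 patches each column has 20 × 20 × 3 = 1200 entries, which matches the sample dimension given later in the embodiment.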

In the above histopathological image recognition method, step 2 specifically comprises:

2-1: Randomly select n column vectors from the disease-free and from the diseased training samples as the initial disease-free dictionary D and the initial diseased dictionary, respectively.
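The initialization in step 2-1 amounts to copying n randomly chosen training columns into each dictionary. A minimal sketch follows; the names `init_dictionary` and `Y_hat` are hypothetical, and drawing the columns without replacement is an assumption the text does not state.

```python
import numpy as np

def init_dictionary(samples, n_atoms, rng):
    """Use n_atoms distinct, randomly chosen columns of `samples` as atoms.

    Sampling without replacement is an assumption; the source only says
    that n column vectors are selected at random.
    """
    idx = rng.choice(samples.shape[1], size=n_atoms, replace=False)
    return samples[:, idx].copy()

rng = np.random.default_rng(0)
Y = rng.random((12, 50))      # stand-in disease-free training samples
Y_hat = rng.random((12, 50))  # stand-in diseased training samples
D0 = init_dictionary(Y, 8, rng)          # initial disease-free dictionary
D0_hat = init_dictionary(Y_hat, 8, rng)  # initial diseased dictionary
```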

2-2: Establish the disease-free dictionary learning model. In this model, argmin denotes the value of the variables at which the objective function attains its minimum; Y and its diseased counterpart denote the disease-free and diseased training samples, respectively; X and its counterpart denote the sparse representation coefficients of the disease-free and diseased training samples, respectively; N and its counterpart denote the numbers of disease-free and diseased image feature vectors, respectively; L₁ is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary; and ρ is a regularization parameter with ρ > 0. One term of the model is the sparse reconstruction error of the disease-free training samples under the disease-free dictionary, and another is the reconstruction error of the diseased training samples under the disease-free dictionary. The subscript F denotes the Frobenius norm, and Ψ(D) is the Fisher criterion constraint term of the disease-free dictionary.

In Ψ(D), m is the mean of all atoms in the disease-free dictionary D, M is the matrix formed from the atom mean m of D, the diseased counterpart of m is the mean of all atoms in the diseased dictionary, and α and β are the penalty coefficients for the intra-class distance and the inter-class distance, respectively, with α, β > 0;

2-3: Fix the disease-free dictionary D and update the sparse coding coefficients. With the training samples and the coding coefficient matrix defined accordingly, and with L₁ the coding sparsity of the disease-free and diseased samples under the disease-free dictionary, solving the objective function is divided into two steps, the sparse representation of the disease-free training samples under D and the sparse representation of the diseased training samples under D, which are written in one unified, simplified form.

The OMP algorithm in the SPAMS toolbox is used to solve for the sparse solutions of the training samples under the disease-free dictionary D.
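For readers unfamiliar with OMP, the sparse coding step can be illustrated with a plain NumPy version of orthogonal matching pursuit. This is a sketch only: the patent uses the SPAMS toolbox, whereas the `omp` and `encode` functions below are hypothetical stand-ins written for illustration, and they assume the dictionary atoms have unit norm.

```python
import numpy as np

def omp(D, y, sparsity):
    """Greedy OMP: approximate y using at most `sparsity` atoms of D."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = -1.0  # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # Least-squares refit on all atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def encode(D, samples, sparsity):
    """Sparse-code every column of `samples` under dictionary D."""
    return np.stack([omp(D, samples[:, i], sparsity)
                     for i in range(samples.shape[1])], axis=1)

# Demo with an orthonormal dictionary, where recovery is exact.
rng = np.random.default_rng(0)
D_demo, _ = np.linalg.qr(rng.standard_normal((10, 10)))
y_demo = 2.0 * D_demo[:, 3] - 1.5 * D_demo[:, 7]
x_demo = omp(D_demo, y_demo, 2)
```

In step 2-3 the same dictionary D would be used to code both the disease-free and the diseased training samples, each with sparsity L₁.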

2-4: Fix the sparse coding coefficients and update the disease-free dictionary D. The resulting objective function is simplified into a form written with tr, the trace of a matrix.

The optimal disease-free dictionary D is obtained by the coordinate gradient descent method.

In the above histopathological image recognition method, step 3 specifically comprises:

3-1: Randomly select n column vectors from the disease-free and from the diseased training samples as the initial disease-free dictionary D and the initial diseased dictionary, respectively.

3-2: Establish the diseased dictionary learning model. In this model, Y and its diseased counterpart denote the disease-free and diseased training samples, respectively; X and its counterpart denote the sparse representation coefficients of the disease-free and diseased training samples, respectively; N and its counterpart denote the numbers of disease-free and diseased image feature vectors, respectively; L₂ is the coding sparsity of the disease-free and diseased samples under the diseased dictionary; and ρ is a regularization parameter with ρ > 0. One term of the model is the sparse reconstruction error of the diseased samples under the diseased dictionary, another is the reconstruction error of the disease-free samples under the diseased dictionary, and the remaining term is the Fisher criterion constraint term of the diseased dictionary.

In the Fisher criterion constraint term, m is the mean of all atoms in the disease-free dictionary D, its diseased counterpart is the mean of all atoms in the diseased dictionary, and M is the matrix formed from the mean of all atoms in the diseased dictionary;

3-3: Fix the diseased dictionary and update the sparse coding coefficients. With the training samples and the coding coefficient matrix defined accordingly, and with L₂ the coding sparsity of the disease-free and diseased samples under the diseased dictionary, solving the objective function is divided into two steps, the sparse representation of the disease-free training samples under the diseased dictionary and the sparse representation of the diseased training samples under the diseased dictionary, which are written in one unified, simplified form.

The OMP algorithm in the SPAMS toolbox is used to solve for the sparse solutions of the training samples under the diseased dictionary.

3-4: Fix the sparse coding coefficients and update the diseased dictionary. The resulting objective function is simplified, and the optimal diseased dictionary is obtained by the coordinate gradient descent method.

In the above histopathological image recognition method, step 5 specifically comprises:

5-1: Divide the test sample image into blocks, treating each block as a column vector h, and randomly take u blocks to form the matrix H as the test sample. The sparse codes of the test sample H under the class-labeled dictionaries are then obtained.

5-2: Calculate the sparse reconstruction error vectors of the test sample under the disease-free dictionary D and under the diseased dictionary. Under the disease-free dictionary, δ₁ = diag((H − DX)(H − DX)^T), and the error vector under the diseased dictionary is computed analogously,

where diag(·) denotes the elements on the main diagonal of a matrix.
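The error vector of step 5-2 can be computed without forming the full product matrix. The sketch below follows the formula exactly as written, δ₁ = diag((H − DX)(H − DX)^T), which yields one squared-error total per row of H. A per-block variant, diag((H − DX)^T(H − DX)), is included only for comparison and is not taken from the source; both function names are hypothetical.

```python
import numpy as np

def reconstruction_error_vector(H, D, X):
    """diag((H - DX)(H - DX)^T): squared error summed along each row."""
    E = H - D @ X
    return np.einsum("ij,ij->i", E, E)

def per_block_error(H, D, X):
    """diag((H - DX)^T (H - DX)): squared error of each column (block).

    Shown for comparison only; this variant is not given in the source.
    """
    E = H - D @ X
    return np.einsum("ij,ij->j", E, E)

rng = np.random.default_rng(0)
H_demo = rng.random((6, 4))   # 4 stand-in test blocks of dimension 6
D_demo = rng.random((6, 3))   # stand-in dictionary with 3 atoms
X_demo = rng.random((3, 4))   # stand-in codes (not actually sparse)
delta_1 = reconstruction_error_vector(H_demo, D_demo, X_demo)
```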

In the above histopathological image recognition method, step 6 specifically comprises:

6-1: Define a vector C, where N_t is the number of test samples;

6-2: Obtain the classification statistic S from the vector C.

When the classification statistic S is greater than or equal to the threshold Th, the test sample is a disease-free sample; otherwise, when S is less than Th, the test sample is a diseased sample.
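The decision rule of step 6 is a simple threshold test on S. The definitions of C and S are not legible in this text, so the statistic below is purely a hypothetical stand-in: it takes S to be the fraction of entries whose error under the disease-free dictionary does not exceed the error under the diseased dictionary. Only the final comparison with Th is taken from the source.

```python
def classification_statistic(err_free, err_diseased):
    """Hypothetical S: share of entries better fit by the disease-free dictionary.

    This definition is an assumption made for illustration; the source's
    own definitions of C and S are not reproduced in the text.
    """
    votes = [1 if a <= b else 0 for a, b in zip(err_free, err_diseased)]
    return sum(votes) / len(votes)

def classify(S, Th):
    """Decision rule from the text: S >= Th means disease-free."""
    return "disease-free" if S >= Th else "diseased"

S_demo = classification_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 5.0])
label_demo = classify(S_demo, 0.5)
```

Under this stand-in, a threshold of 0.5 corresponds to a majority vote, and other values of Th shift the balance between the two kinds of misclassification.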

The beneficial effects of the present invention are as follows. The method first randomly selects a number of image blocks from a histopathological image data set as training samples and test samples; it then feeds the training samples of the different classes into the model, solves the model by alternating iteration so that the objective function is progressively optimized, and learns the class-labeled dictionaries; finally, it sparsely represents the test set matrix with the learned class-labeled dictionaries and determines the category of the test set matrix from the reconstruction error vectors and a threshold. The invention provides a new model and method for applying dictionary learning to histopathological image classification. The learned class-labeled dictionaries give good sparse reconstruction and intra-class robustness for samples of the same class and good inter-class discrimination for samples of a different class, and can effectively improve histopathological image classification performance.

Description of drawings

FIG. 1 is a flow chart of the present invention.

FIG. 2 shows histopathological images of the lung, spleen and kidney from the ADL database, in which (a) shows, from left to right, disease-free images of the lung, spleen and kidney, and (b) shows, from left to right, diseased images of the lung, spleen and kidney.

FIG. 3 shows histopathological images of adenosis and phyllodes tumor from the BreaKHis database, in which (a) is a histopathological image of adenosis and (b) is a histopathological image of phyllodes tumor.

Detailed description

The present invention is further described below with reference to the accompanying drawings and embodiments.

As shown in FIG. 1, the present invention comprises the following steps:

Step 1: select a number of image blocks from the disease-free and diseased images of a tissue as disease-free and diseased training samples and as disease-free and diseased test samples. The specific steps are as follows.

Forty images are randomly selected from each of the disease-free and diseased images of a tissue, and 250 blocks of size 20×20 are randomly extracted from each image, giving 10000 color blocks per class. Each color block is split into its R, G and B channels, the pixel values of each channel are converted into a column vector, and the three vectors are concatenated into one feature vector of 20 × 20 × 3 = 1200 entries. The feature vectors are arranged side by side as the training samples, so that each training sample matrix belongs to R^(1200×10000), which denotes the size of the matrix. From the remaining images of the same tissue, 110 disease-free and 110 diseased images are randomly selected as the test set.
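The sample counts in this embodiment (40 images × 250 blocks = 10000 columns, each of length 1200) can be checked with a short sketch. The images below are synthetic 64×64 stand-ins rather than real tissue images, `sample_patches` is a hypothetical helper, and sampling block positions uniformly and independently is an assumption.

```python
import numpy as np

def sample_patches(image, n_patches, size, rng):
    """Cut n_patches random size-by-size blocks out of an (h, w, 3) image."""
    h, w, _ = image.shape
    rows = rng.integers(0, h - size + 1, size=n_patches)
    cols = rng.integers(0, w - size + 1, size=n_patches)
    return np.stack([image[r:r + size, c:c + size, :]
                     for r, c in zip(rows, cols)])

rng = np.random.default_rng(0)
# 40 synthetic images standing in for one class of tissue images.
images = rng.integers(0, 256, size=(40, 64, 64, 3)).astype(np.uint8)
patches = np.concatenate([sample_patches(img, 250, 20, rng) for img in images])

# Move the channel axis forward so each column is [R block, G block, B block].
Y = patches.transpose(0, 3, 1, 2).reshape(len(patches), -1).T
```

The same procedure, applied to the diseased images, would give the second training matrix of the same size.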

Step 2, learning the disease-free dictionary: combine the disease-free and diseased training samples to establish a disease-free dictionary learning model, and minimize its objective function by two-step alternating iterative optimization to learn the disease-free dictionary. The specific steps are as follows.

In the disease-free dictionary learning model, argmin denotes the value of the variables at which the objective function attains its minimum; Y and its diseased counterpart denote the disease-free and diseased training samples, respectively; X and its counterpart denote the sparse representation coefficients of the disease-free and diseased training samples, respectively; one term of the model is the sparse reconstruction error of the disease-free training samples under the disease-free dictionary; and α and β are the penalty coefficients for the intra-class distance and the inter-class distance, respectively, with α, β > 0. The purpose of the model is to minimize the first and third terms while maximizing the second term, so that the learned class-labeled dictionary reconstructs samples of its own class well and reconstructs samples of the other class poorly or not at all, and so that the learned dictionaries are strongly distinguishable from each other; the resulting discriminative features allow better classification;
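The structure described above, two terms to minimize and one to maximize, can be made concrete with a sketch of how such an objective might be evaluated. The formula itself is not reproduced in this text, so everything below is an assumed form chosen only to match the verbal description: the 1/N scaling, the sign pattern, and in particular the shape of the Fisher term are assumptions, and all names are hypothetical.

```python
import numpy as np

def assumed_objective(D, Y, X, Y_hat, X_hat, D_hat, rho, alpha, beta):
    """Evaluate one ASSUMED form of the disease-free dictionary objective.

    term 1 (minimized): error of disease-free samples under D
    term 2 (maximized): error of diseased samples under D, weighted by rho
    term 3 (minimized): assumed Fisher term, atom spread within D minus
                        the distance between the two dictionaries' atom means
    """
    n_free, n_dis = Y.shape[1], Y_hat.shape[1]
    own = np.linalg.norm(Y - D @ X, "fro") ** 2 / n_free
    other = rho * np.linalg.norm(Y_hat - D @ X_hat, "fro") ** 2 / n_dis
    m = D.mean(axis=1, keepdims=True)          # mean atom of D
    m_hat = D_hat.mean(axis=1, keepdims=True)  # mean atom of the other dictionary
    fisher = (alpha * np.linalg.norm(D - m, "fro") ** 2
              - beta * np.linalg.norm(m - m_hat) ** 2)
    return own - other + fisher

I2 = np.eye(2)
Z2 = np.zeros((2, 2))
value_demo = assumed_objective(I2, I2, I2, Z2, Z2, I2, rho=0.5, alpha=2.0, beta=1.0)
```

In the demo both reconstruction errors vanish and the two dictionaries share the same mean atom, so only the assumed intra-class spread term contributes.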

With the training samples and the coding coefficient matrix defined as in step 2-3, the objective function for the dictionary update is simplified into a form written with tr, the trace of a matrix.

This function is convex, and the optimal disease-free dictionary D is obtained by the coordinate gradient descent method.

Step 3, learning the diseased dictionary: combine the diseased and disease-free training samples to establish a diseased dictionary learning model, and minimize its objective function by two-step alternating iterative optimization to learn the diseased dictionary. The specific steps are as follows.

In the diseased dictionary learning model, Y and its diseased counterpart denote the disease-free and diseased training samples, respectively; X and its counterpart denote the sparse representation coefficients of the disease-free and diseased training samples, respectively; one term of the model is the sparse reconstruction error of the diseased samples under the diseased dictionary, another is the reconstruction error of the disease-free samples under the diseased dictionary, and the remaining term is the Fisher criterion constraint term of the diseased dictionary. In the Fisher criterion constraint term, m is the mean of all atoms in the disease-free dictionary D, its diseased counterpart is the mean of all atoms in the diseased dictionary, and M is the matrix formed from the mean of all atoms in the diseased dictionary. The purpose of the model is to minimize the first and third terms while maximizing the second term, so that the learned class-labeled dictionary reconstructs samples of its own class well and reconstructs samples of the other class poorly or not at all, and so that the learned dictionaries are strongly distinguishable from each other; the resulting discriminative features allow better classification.

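3-3: Fix the diseased dictionary and update the sparse coding coefficients. Solving the objective function is divided into the sparse representation of the disease-free training samples under the diseased dictionary and the sparse representation of the diseased training samples under the diseased dictionary.

The OMP algorithm in the SPAMS toolbox is used to solve for the sparse solutions of the training samples under the diseased dictionary.

3-4: Fix the sparse coding coefficients and update the diseased dictionary. The resulting objective function is simplified, and the optimal diseased dictionary is obtained by the coordinate gradient descent method;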

3-5: Return to step 2; learning the disease-free dictionary and learning the diseased dictionary alternate until the maximum number of iterations is reached.

Step 4: determine whether the maximum number of iterations has been reached; if so, go to step 5; if not, return to step 2.
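The control flow of steps 2 to 4 is an outer loop that alternates the two dictionary updates for a fixed number of iterations. The skeleton below shows only that loop; `update_free` and `update_diseased` are hypothetical callables standing in for the full step 2 and step 3 procedures, and passing the other dictionary into each update is an assumption based on the models using both dictionaries.

```python
def alternate_learning(D, D_hat, update_free, update_diseased, max_iter):
    """Alternate the two dictionary updates until max_iter is reached.

    update_free(D, D_hat) returns the new disease-free dictionary (step 2);
    update_diseased(D_hat, D) returns the new diseased dictionary (step 3).
    Both callables are placeholders, not the patented update rules.
    """
    for _ in range(max_iter):      # step 4: stop at the maximum iteration count
        D = update_free(D, D_hat)
        D_hat = update_diseased(D_hat, D)
    return D, D_hat

# Toy numeric updates, used only to show the order of the calls.
result_demo = alternate_learning(
    0, 0,
    update_free=lambda d, other: d + 1,
    update_diseased=lambda d, other: d + other,
    max_iter=3,
)
```

Because the diseased update runs after the disease-free one within each iteration, it always sees the freshly updated disease-free dictionary.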

Step 5, obtaining the reconstruction error vectors of the test sample: sparsely represent the test sample with the learned disease-free and diseased dictionaries, then calculate the sparse reconstruction error vector of the test sample under the disease-free dictionary and under the diseased dictionary. The specific steps are as follows.

5-1: Divide the test sample image into blocks, treating each block as a column vector h, and randomly take 250 blocks to form the matrix H as the test sample. The sparse codes of the test sample H under the class-labeled dictionaries are then obtained.

5-2: Calculate the sparse reconstruction error vectors of the test sample under the disease-free dictionary D and under the diseased dictionary. Under the disease-free dictionary, δ₁ = diag((H − DX)(H − DX)^T), and the error vector under the diseased dictionary is computed analogously,

where diag(·) denotes the elements on the main diagonal of a matrix.

Step 6, obtaining the classification result of the test sample: obtain a classification statistic from the sparse reconstruction error vectors, then determine the category of the test sample by comparing the classification statistic with a threshold. The specific steps are as follows.

6-1: Define a vector C, where N_t is the number of test samples;

Table 1 compares the classification results of the present invention and other methods on the lung images of the ADL database.

Table 1

Table 2 compares the classification results of the present invention and other methods on the spleen images of the ADL database.

Table 2

Table 3 compares the classification results of the present invention and other methods on the kidney images of the ADL database.

Table 3

Tables 1, 2 and 3 show that the proposed model diagnoses disease in these three organs clearly better than the other methods, with an improved correct classification rate on both disease-free and diseased samples. The lung results in Table 1 are the most pronounced: compared with DFDL, the classification accuracy of the present method is 2-3% higher. As FIG. 2 shows, the disease-free lung images contain large alveoli, whereas in the diseased lung images the alveoli are smaller, are covered with blue-purple inflammatory cells, and have a more complex texture, so the difference between disease-free and diseased lung images is clearly greater than for the spleen and kidney images. The disease-free and diseased spleen images are highly similar in texture and structure but differ considerably in color, so the two classes are the second most separable and the classification performance ranks second. The disease-free and diseased kidney images are highly similar in texture and structure and also in color, so they are the least separable and the classification performance is the weakest. The experimental results in the tables are fully consistent with FIG. 1, which again demonstrates the validity of the proposed model.

To verify the generality of the discriminative feature learning framework for histopathological images constructed by the present invention, the proposed model is also applied to the diagnosis of disease types in the BreaKHis dataset.

Table 4 compares the classification results of the present invention and other methods on the BreaKHis database.

Table 4

Table 4 gives the classification results of the different methods on the BreaKHis database. The experimental results show that the proposed model also achieves good disease classification performance on the two types of benign breast tumor images in FIG. 3. This indicates that the present invention effectively improves the reconstruction quality and robustness of the class-labeled dictionaries' sparse representation of samples of the same class, and also addresses the problem of poor discrimination for samples of a different class.

Claims

1. A histopathological image recognition method, comprising the following steps:

step one, selecting a plurality of image blocks from the disease-free images and the diseased images of a tissue as disease-free and diseased training samples and as disease-free and diseased test samples;

step two, learning the disease-free dictionary: establishing a disease-free dictionary learning model by combining the disease-free training samples and the diseased training samples, and learning the disease-free dictionary by minimizing an objective function through two-step alternating iterative optimization;

step two specifically comprises:

2-1: randomly selecting n column vectors from the disease-free and from the diseased training samples as the initial disease-free dictionary D and the initial diseased dictionary, respectively;

2-2: establishing a disease-free dictionary learning model, in which argmin denotes the value of the variables at which the objective function attains its minimum; Y and its diseased counterpart denote the disease-free and diseased training samples, respectively; X and its counterpart denote the sparse representation coefficients of the disease-free and diseased training samples, respectively; N and its counterpart denote the numbers of disease-free and diseased image feature vectors, respectively; L₁ is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary; ρ is a regularization parameter with ρ > 0; one term of the model is the sparse reconstruction error of the disease-free training samples under the disease-free dictionary, and another is the reconstruction error of the diseased training samples under the disease-free dictionary; the subscript F denotes the Frobenius norm; and Ψ(D) is the Fisher criterion constraint term of the disease-free dictionary,

wherein m is the mean of all atoms in the disease-free dictionary D, M is the matrix formed from the atom mean m of D, the diseased counterpart of m is the mean of all atoms in the diseased dictionary, and α and β are the penalty coefficients for the intra-class distance and the inter-class distance, respectively, with α, β > 0;

2-3: fixing the disease-free dictionary D, and updating the sparse coding coefficient, wherein the objective function at the moment is as follows:

letting the training samples be

and the coding coefficient matrix be

with L₁ being the coding sparsity of the disease-free and diseased samples under the disease-free dictionary, and the optimal sparse solution being

the objective function is then solved in two steps, namely the sparse representation of the disease-free training samples under the disease-free dictionary D and the sparse representation of the diseased training samples under the disease-free dictionary D, which are uniformly simplified as follows:

the sparse solutions of the training samples under the disease-free dictionary D are solved, respectively, using the OMP algorithm in the SPAMS toolbox;
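The sparse coding step with a fixed dictionary can be illustrated with a small orthogonal matching pursuit (OMP) routine. The sketch below is a plain NumPy stand-in written for illustration; it is not the SPAMS toolbox routine named in the claim, and the names `omp` and `sparse_code`, as well as the toy dictionary, are assumptions. It presumes atoms of comparable norm, since atom selection uses raw correlations.

```python
import numpy as np

def omp(D, y, sparsity):
    """Orthogonal matching pursuit: code y with at most `sparsity` atoms of D."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        scores = np.abs(D.T @ residual)   # correlation of each atom with the residual
        if support:
            scores[support] = 0.0         # never reselect an atom
        k = int(np.argmax(scores))
        if scores[k] <= 1e-12:            # residual is numerically zero
            break
        support.append(k)
        # least-squares refit on all atoms selected so far
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    if support:
        x[support] = coef
    return x

def sparse_code(D, Y, sparsity):
    """Code every column of Y under dictionary D; returns one coefficient column per sample."""
    return np.column_stack([omp(D, y, sparsity) for y in Y.T])

D = np.eye(5)                                   # toy dictionary (hypothetical)
Y = np.array([[2.0, 0.0], [0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.0, 0.0]])
X = sparse_code(D, Y, sparsity=2)               # at most 2 nonzeros per column
```

In the setting of step 2-3, the same call would be applied once to the disease-free samples and once to the diseased samples, both under the disease-free dictionary D with sparsity L₁.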

2-4: fixing the sparse coding coefficients and updating the disease-free dictionary D, the objective function then being as follows:

which simplifies to:

where tr denotes the trace of the matrix

the optimal solution of the disease-free dictionary D is solved by a coordinate gradient descent method;

step three, optimizing and learning the diseased dictionary: establishing a diseased dictionary learning model by combining the diseased training samples and the disease-free training samples, and learning the diseased dictionary by minimizing an objective function through two-step alternating iterative optimization;

step three comprises the following specific steps:

3-1: randomly selecting n column vectors from the disease-free training samples and from the diseased training samples, respectively, as the initial disease-free dictionary D and the initial diseased dictionary

3-2: establishing a diseased dictionary learning model, the model being as follows:

wherein Y and

denote the disease-free and diseased training samples, respectively; X and

denote the sparse representation coefficients of the disease-free and diseased training samples, respectively; N and N denote the numbers of feature vectors of the disease-free and diseased images, respectively; L₂ is the coding sparsity of the disease-free and diseased samples under the diseased dictionary; ρ is a regularization parameter, with ρ > 0; in the formula,

represents the sparse reconstruction error of the diseased samples under the diseased dictionary,

represents the reconstruction error of the disease-free samples under the diseased dictionary, and

is the Fisher criterion constraint term of the diseased dictionary, expressed as follows:

where m is the mean of all atoms in the disease-free dictionary D,

is the mean of all atoms in the diseased dictionary

and

is the matrix formed, for the diseased dictionary

from the mean

of all its atoms;

3-3: fixing the diseased dictionary

and updating the sparse coding coefficients, the objective function then being as follows:

letting the training samples be

and the coding coefficient matrix be

with L₂ being the coding sparsity of the disease-free and diseased samples under the diseased dictionary, and the optimal sparse solution being

the objective function is then solved in two steps, namely the sparse representation of the disease-free training samples under the diseased dictionary

and the sparse representation of the diseased training samples under the diseased dictionary

which are uniformly simplified as follows:

the OMP algorithm in the SPAMS toolbox is used to solve, for the training samples under the diseased dictionary

the respective sparse solutions

3-4: fixing the sparse coding coefficients and updating the diseased dictionary

the objective function then being as follows:

which simplifies to:

wherein,

the optimal solution of the diseased dictionary

is solved by a coordinate gradient descent method;

step four, judging whether the maximum number of iterations has been reached; if so, proceeding to step five; if not, returning to step two;
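The control flow of steps two to four can be sketched as a fixed-count outer loop that alternates the two dictionary updates. The sketch only shows the alternation: `learn_dictionaries`, `update_free`, and `update_dis` are hypothetical names, the two update callables stand for the whole of step two and step three respectively, and the assumption that the random initialization of steps 2-1 and 3-1 happens once before the loop is an interpretation of the claim.

```python
def learn_dictionaries(D_free, D_dis, update_free, update_dis, max_iter):
    """Alternate the two dictionary updates for a fixed number of outer iterations."""
    for _ in range(max_iter):
        D_free = update_free(D_free, D_dis)   # step two: refine the disease-free dictionary
        D_dis = update_dis(D_dis, D_free)     # step three: refine the diseased dictionary
    return D_free, D_dis                      # step four: stop once max_iter is reached

# Toy stand-ins so the control flow can be run; they are not the patent's updates.
result = learn_dictionaries(0, 0, lambda d, other: d + 1, lambda d, other: d + other, 3)
```

Each update receives the current state of the other dictionary, reflecting that both learning models combine the two classes.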

step five, obtaining the reconstruction error vectors of the test sample: sparsely representing the test sample using the learned disease-free dictionary and diseased dictionary, and then computing the sparse reconstruction error vectors of the test sample under the disease-free dictionary and the diseased dictionary, respectively;

step six, obtaining the classification result of the test sample: obtaining a classification statistic from the sparse reconstruction error vectors, and then determining the category of the test sample by comparing the classification statistic with a threshold.

2. The histopathological image recognition method according to claim 1, wherein step one comprises: selecting the same number of image blocks from each of the two classes of images, diseased and disease-free, of a given tissue; splitting each image block into its three RGB channels; converting the pixel values of each channel into a column vector and concatenating the three column vectors to obtain a feature vector; and finally arranging the feature vectors side by side as the disease-free and diseased training samples Y and

the test samples being obtained in the same manner.
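The feature construction of claim 2 can be sketched as follows. The function names and the row-major flattening order within each channel are assumptions; the claim only states that each channel is converted to a column vector and the three vectors are concatenated.

```python
import numpy as np

def block_to_feature(block):
    """Flatten an (h, w, 3) RGB block into one vector: all R values, then G, then B."""
    return np.concatenate([block[:, :, c].reshape(-1) for c in range(3)])

def blocks_to_samples(blocks):
    """Place the feature vectors of all blocks side by side, one column per block."""
    return np.column_stack([block_to_feature(b) for b in blocks])

block = np.arange(12).reshape(2, 2, 3)        # toy 2x2 RGB block
feature = block_to_feature(block)             # length 3 * 2 * 2 = 12
samples = blocks_to_samples([block, block + 100])
```

For p x p blocks, each column of the resulting sample matrix has length 3p².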

3. The histopathological image recognition method according to claim 2, wherein step five comprises the following specific steps:

5-1: dividing the image of the test sample into blocks, treating each block as a column vector h, randomly selecting u blocks to form a matrix H as the test sample, and using

to solve, for the test sample H under the class-labelled dictionaries

the sparse codes

5-2: computing the sparse reconstruction error vectors of the test sample under the disease-free dictionary D and the diseased dictionary

the error vector under the disease-free dictionary being diag((H - DX)(H - DX)^T),

where diag(·) denotes the elements on the main diagonal of the matrix.
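The expression diag((H - DX)(H - DX)^T) can be evaluated without forming the full matrix product, because its i-th diagonal entry equals the sum of squares of row i of the residual H - DX. The sketch below follows the formula exactly as printed in the claim; the function name and the random test data are assumptions.

```python
import numpy as np

def reconstruction_error_vector(H, D, X):
    """Return diag((H - D X)(H - D X)^T) as a 1-D array."""
    R = H - D @ X                   # residual of the sparse reconstruction
    return np.sum(R * R, axis=1)    # row-wise sums of squares = main diagonal of R R^T

rng = np.random.default_rng(1)
H = rng.random((6, 4))              # hypothetical test matrix, one block per column
D = rng.random((6, 3))              # hypothetical dictionary
X = rng.random((3, 4))              # hypothetical sparse codes
err = reconstruction_error_vector(H, D, X)
```

The same function applies to the diseased dictionary with its own codes.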

4. The histopathological image recognition method according to claim 3, wherein step six comprises the following specific steps:

6-1: defining a vector C

where N_t is the number of test samples;

6-2: obtaining a classification statistic S from the vector C:

when the classification statistic S is greater than or equal to the threshold Th, the test sample is a disease-free sample; otherwise, when the classification statistic S is smaller than the threshold Th, the test sample is a diseased sample.
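The threshold rule of step 6-2 can be sketched as below. Only `classify` follows the text directly. The formulas for C and S are not reproduced in this text, so `vote_fraction` is a hypothetical stand-in that assumes C marks the entries where the error under the disease-free dictionary is not larger than under the diseased dictionary and that S is the mean of C; the error values and the threshold 0.5 are likewise illustrative.

```python
import numpy as np

def vote_fraction(err_free, err_dis):
    """Hypothetical statistic: share of entries where the disease-free error is not larger."""
    c = (np.asarray(err_free) <= np.asarray(err_dis)).astype(float)
    return float(c.mean())

def classify(S, Th):
    """Decision rule from the claim: S >= Th means disease-free, otherwise diseased."""
    return "disease-free" if S >= Th else "diseased"

S = vote_fraction([0.1, 0.5, 0.2, 0.9], [0.3, 0.4, 0.6, 1.0])
label = classify(S, Th=0.5)
```

Under this assumed statistic, a larger S means the disease-free dictionary reconstructs more of the test sample better, which is consistent with the direction of the comparison in the claim.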