CN106845551B - A kind of histopathological image recognition method - Google Patents
A histopathological image recognition method
- Publication number
- CN106845551B (application CN201710059300.0A)
- Authority
- CN
- China
- Prior art keywords
- disease
- dictionary
- free
- samples
- diseased
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Technical field
The invention relates to a histopathological image recognition method.
Background
With the development of computer-aided diagnosis, research on "digital pathology" has attracted growing attention from researchers. One of its most challenging problems is how to accurately and automatically extract the discriminative features hidden in an image, so that subsequent histopathological image analysis or classification has the information it needs to grade and classify disease quickly and accurately.
Traditional feature extraction methods fall into two main categories. The first consists of domain-specific or task-specific features, such as the size and morphology of biological cells, the grayscale or color information of the image, and texture. The second is based mainly on spatial structure and multi-scale features, such as morphological features, graph methods, scale-invariant features, and wavelet features. These traditional features are mostly pixel-level or hand-crafted; they generally suit only specific kinds of data, so their range of application is limited, and they are highly redundant and weakly discriminative.
In recent years, sparse representation has received considerable attention for its strong performance on many computer vision problems. Its basic idea is to represent an original signal as a sparse combination of atoms from an overcomplete dictionary. Sparse representation has been very successful in image denoising and restoration, face recognition, image classification, and other fields. As the technique has developed, attention has turned to how to learn a dictionary suited to a specific problem (for example, image classification), that is, to a theoretical framework for dictionary learning.
The key question in dictionary learning is whether the constructed dictionary has good reconstruction and discrimination properties. To address it, Zhang et al. proposed the Discriminative K-SVD (DK-SVD) dictionary learning method. Jiang et al. proposed a dictionary learning method based on Label Consistent K-SVD (LC-KSVD). Yang et al. used the Fisher criterion to propose Fisher Discrimination Dictionary Learning (FDDL), which improves the discriminative power of the dictionary indirectly by constraining the sparse representation coefficients. Vu et al. proposed Discriminative Feature-oriented Dictionary Learning (DFDL) and applied it to histopathological image classification. These methods achieve good results in image classification.
However, different types of histopathological images present different features, and within a single type the cell morphology and geometric structure vary widely and the pathological features are diverse. As a result, the feature differences between samples of the same class can exceed those between samples of different classes. The diseased and disease-free dictionaries learned by the above methods are therefore highly similar, their ability to discriminate disease-free samples from diseased samples remains low, and their classification performance still needs improvement.
Summary of the invention
To solve the above technical problem, the invention provides a histopathological image recognition method with high accuracy and high robustness.
The technical solution of the invention is a histopathological image recognition method comprising the following steps:
Step one: from the disease-free and diseased images of a given tissue, select a number of image patches as the disease-free and diseased training samples and the disease-free and diseased test samples;
Step two, learn the disease-free dictionary: combine the disease-free and diseased training samples to build the disease-free dictionary learning model, minimize the objective function by a two-step alternating iteration, and obtain the disease-free dictionary;
Step three, learn the diseased dictionary: combine the diseased and disease-free training samples to build the diseased dictionary learning model, minimize the objective function by a two-step alternating iteration, and obtain the diseased dictionary;
Step four: check whether the maximum number of iterations has been reached; if so, go to step five; if not, return to step two;
Step five, obtain the reconstruction error vectors of the test sample: sparsely represent the test sample over the learned disease-free and diseased dictionaries, then compute the sparse reconstruction error vector of the test sample under each dictionary;
Step six, obtain the classification result of the test sample: compute a classification statistic from the sparse reconstruction error vectors, then determine the class of the test sample by comparing the statistic with a threshold.
In the above histopathological image recognition method, step one specifically comprises: selecting an equal number of image patches from the disease-free and the diseased images of a given tissue; splitting each patch into its R, G, and B channels; converting the pixel values of each channel into a column vector and concatenating the three vectors into one feature vector; and placing the feature vectors side by side as the columns of the disease-free and diseased training samples $Y$ and $\bar{Y}$. The test samples are obtained in the same way.
In the above histopathological image recognition method, step two specifically comprises:
2-1: Randomly select n column vectors from the disease-free training samples and n from the diseased training samples as the initial disease-free dictionary $D$ and diseased dictionary $\bar{D}$.
2-2: Establish the disease-free dictionary learning model, a minimization over the dictionary and the sparse coefficients.
In the model, argmin denotes the value of the variables at which the objective function attains its minimum; $Y$ and $\bar{Y}$ are the disease-free and diseased training samples; $X$ and $\bar{X}$ are their sparse representation coefficients; $N$ and $\bar{N}$ are the numbers of disease-free and diseased image feature vectors; $L_1$ is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary; and $\rho > 0$ is a regularization parameter. One term of the objective is the sparse reconstruction error of the disease-free training samples under the disease-free dictionary, another is the reconstruction error of the diseased training samples under the same dictionary, and $F$ denotes the Frobenius norm. $\Psi(D)$ is the Fisher-criterion constraint term of the disease-free dictionary, in which $m$ is the mean of all atoms of the disease-free dictionary $D$, $M$ is the matrix formed from the atom mean $m$, $\bar{m}$ is the mean of all atoms of the diseased dictionary $\bar{D}$, and $\alpha, \beta > 0$ are the penalty coefficients for the intra-class and inter-class distances, respectively.
2-3: Fix the disease-free dictionary $D$ and update the sparse coding coefficients.
With $L_1$ the coding sparsity of the disease-free and diseased samples under the disease-free dictionary, solving the objective splits into two sparse representation subproblems under $D$, one for the disease-free training samples and one for the diseased training samples, which can be written in a single unified form.
The OMP algorithm in the SPAMS toolbox is used to solve for the sparse codes of both sets of training samples under the disease-free dictionary $D$.
2-4: Fix the sparse coding coefficients and update the disease-free dictionary $D$.
The resulting objective simplifies to an expression in matrix traces, where tr denotes the trace of a matrix.
Coordinate gradient descent is used to find the optimal disease-free dictionary $D$.
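The formula images for this model are not present in the text, so the following is only a sketch of one form that is consistent with the term-by-term definitions given for step two, not the patent's verified equation. It writes $\bar{Y}$, $\bar{X}$, $\bar{N}$, $\bar{D}$, and $\bar{m}$ for the diseased counterparts of $Y$, $X$, $N$, $D$, and $m$; the bar notation, the exact scaling of each term, and the precise shape of the Fisher term $\Psi(D)$ are all assumptions. The signs follow the detailed description, which states that the first and third terms are minimized while the second is maximized.

```latex
\begin{aligned}
\{D^{*}, X^{*}, \bar{X}^{*}\} ={}& \arg\min_{D,\,X,\,\bar{X}}\;
  \frac{1}{N}\lVert Y - DX\rVert_F^{2}
  - \frac{\rho}{\bar{N}}\lVert \bar{Y} - D\bar{X}\rVert_F^{2}
  + \Psi(D) \\
& \text{s.t. } \lVert x_i\rVert_0 \le L_1,\quad \lVert \bar{x}_j\rVert_0 \le L_1, \\
\Psi(D) ={}& \alpha\,\lVert D - M\rVert_F^{2} \;-\; \beta\,\lVert m - \bar{m}\rVert_2^{2}
\end{aligned}
```

Under the same assumptions, the diseased dictionary model of step three would take the mirrored form, with the roles of the two classes exchanged and the sparsity bound $L_2$ in place of $L_1$.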
In the above histopathological image recognition method, step three specifically comprises:
3-1: Randomly select n column vectors from the disease-free training samples and n from the diseased training samples as the initial disease-free dictionary $D$ and diseased dictionary $\bar{D}$.
3-2: Establish the diseased dictionary learning model, a minimization over the dictionary and the sparse coefficients.
In the model, $Y$ and $\bar{Y}$ are the disease-free and diseased training samples; $X$ and $\bar{X}$ are their sparse representation coefficients; $N$ and $\bar{N}$ are the numbers of disease-free and diseased image feature vectors; $L_2$ is the coding sparsity of the disease-free and diseased samples under the diseased dictionary; and $\rho > 0$ is a regularization parameter. One term of the objective is the sparse reconstruction error of the diseased samples under the diseased dictionary, and another is the reconstruction error of the disease-free samples under the same dictionary. $\Psi(\bar{D})$ is the Fisher-criterion constraint term of the diseased dictionary, in which $m$ is the mean of all atoms of the disease-free dictionary $D$, $\bar{m}$ is the mean of all atoms of the diseased dictionary $\bar{D}$, and $M$ is the matrix formed from the atom mean $\bar{m}$ of the diseased dictionary.
3-3: Fix the diseased dictionary $\bar{D}$ and update the sparse coding coefficients.
With $L_2$ the coding sparsity of the disease-free and diseased samples under the diseased dictionary, solving the objective splits into two sparse representation subproblems under $\bar{D}$, one for the disease-free training samples and one for the diseased training samples, which can be written in a single unified form.
The OMP algorithm in the SPAMS toolbox is used to solve for the sparse codes of both sets of training samples under the diseased dictionary $\bar{D}$.
3-4: Fix the sparse coding coefficients and update the diseased dictionary $\bar{D}$.
The resulting objective is simplified in the same way as in step 2-4.
Coordinate gradient descent is used to find the optimal diseased dictionary $\bar{D}$.
In the above histopathological image recognition method, step five specifically comprises:
5-1: Divide the test image into patches, treat each patch as a column vector $h$, and randomly take u patches to form the matrix $H$ as the test sample; then compute the sparse codes of $H$ under each class-labeled dictionary.
5-2: Compute the sparse reconstruction error vectors of the test sample under the disease-free dictionary $D$ and the diseased dictionary $\bar{D}$, namely $\delta_1 = \operatorname{diag}\big((H - DX)(H - DX)^{T}\big)$ for the disease-free dictionary and correspondingly for the diseased dictionary, where $\operatorname{diag}(\cdot)$ denotes the elements on the main diagonal of a matrix.
In the above histopathological image recognition method, step six specifically comprises:
6-1: Define a vector $C$ from the reconstruction error vectors, where $N_t$ is the number of test samples;
6-2: Obtain the classification statistic $S$ from the vector $C$.
When the classification statistic $S$ is greater than or equal to the threshold $Th$, the test sample is disease-free; otherwise, when $S$ is less than $Th$, the test sample is diseased.
The beneficial effects of the invention are as follows. The method first randomly selects image patches from a histopathological image dataset as training and test samples. It then feeds the training samples of both classes into the model, solves the model by alternating iteration while repeatedly optimizing the objective function, and learns the class-labeled dictionaries. Finally, it sparsely represents the test-set matrix over the learned class-labeled dictionaries and determines the class of that matrix by comparing the reconstruction error vectors against a threshold. The invention proposes a new model and method for applying dictionary learning to histopathological image classification. The learned class-labeled dictionaries give good sparse reconstruction and intra-class robustness for samples of their own class and good inter-class discrimination for samples of the other class, which can effectively improve histopathological image classification performance.
Description of drawings
Figure 1 is a flowchart of the invention.
Figure 2 shows histopathological images of lung, spleen, and kidney from the ADL database, where (a) shows, from left to right, disease-free images of lung, spleen, and kidney, and (b) shows, from left to right, diseased images of lung, spleen, and kidney.
Figure 3 shows histopathological images of adenosis and phyllodes tumor from the BreaKHis database, where (a) is a histopathological image of adenosis and (b) is a histopathological image of phyllodes tumor.
Detailed description
The invention is further described below with reference to the drawings and embodiments.
As shown in Figure 1, the invention comprises the following steps:
Step one: from the disease-free and diseased images of a given tissue, select a number of image patches as the disease-free and diseased training samples and the disease-free and diseased test samples. Specifically:
Randomly select 40 images from each of the disease-free and diseased image sets of a given tissue and randomly extract 250 patches of size 20×20 from each image, giving 10,000 color patches per class. Split each color patch into its R, G, and B channels, convert the pixel values of each channel into a column vector, and concatenate the three vectors into one feature vector. Place the feature vectors side by side as the training samples, so that $Y, \bar{Y} \in \mathbb{R}^{1200 \times 10000}$, where $\mathbb{R}^{1200 \times 10000}$ gives the size of the matrix (each 20×20 RGB patch yields 3 × 400 = 1200 features). From the remaining images of the same tissue, randomly select 110 disease-free and 110 diseased images as the test set.
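The patch-to-feature-vector construction of step one can be sketched in NumPy as follows. This is a minimal illustration, assuming each image is an array of shape (height, width, 3); the function names `extract_patch_features` and `build_training_matrix` and their parameters are hypothetical and do not come from the patent.

```python
import numpy as np

def extract_patch_features(image, num_patches=250, patch_size=20, rng=None):
    """Sample random RGB patches and return them as feature columns.

    Each column is the R, G and B channels of one patch, flattened and
    concatenated, so a 20x20 patch gives a 1200-dimensional vector.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width, channels = image.shape
    if channels != 3:
        raise ValueError("expected an RGB image of shape (H, W, 3)")
    features = np.empty((3 * patch_size * patch_size, num_patches))
    for j in range(num_patches):
        # Upper bound is exclusive, so the patch always fits in the image.
        top = rng.integers(0, height - patch_size + 1)
        left = rng.integers(0, width - patch_size + 1)
        patch = image[top:top + patch_size, left:left + patch_size, :]
        # Flatten each channel separately, then concatenate R | G | B.
        features[:, j] = np.concatenate(
            [patch[:, :, c].ravel() for c in range(3)]
        )
    return features

def build_training_matrix(images, **kwargs):
    """Stack the patch features of several images side by side."""
    return np.hstack([extract_patch_features(img, **kwargs) for img in images])
```

With 40 images per class and the defaults above, `build_training_matrix` would return a 1200 × 10000 matrix, matching the sizes stated in step one.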
Step two, learn the disease-free dictionary: combine the disease-free and diseased training samples to build the disease-free dictionary learning model, minimize the objective function by a two-step alternating iteration, and obtain the disease-free dictionary. Specifically:
2-1: Randomly select n column vectors from the disease-free training samples and n from the diseased training samples as the initial disease-free dictionary $D$ and diseased dictionary $\bar{D}$.
2-2: Establish the disease-free dictionary learning model, a minimization over the dictionary and the sparse coefficients.
In the model, argmin denotes the value of the variables at which the objective function attains its minimum; $Y$ and $\bar{Y}$ are the disease-free and diseased training samples; $X$ and $\bar{X}$ are their sparse representation coefficients; $N$ and $\bar{N}$ are the numbers of disease-free and diseased image feature vectors; $L_1$ is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary; and $\rho > 0$ is a regularization parameter. One term of the objective is the sparse reconstruction error of the disease-free training samples under the disease-free dictionary, another is the reconstruction error of the diseased training samples under the same dictionary, and $F$ denotes the Frobenius norm. $\Psi(D)$ is the Fisher-criterion constraint term of the disease-free dictionary, in which $m$ is the mean of all atoms of the disease-free dictionary $D$, $M$ is the matrix formed from the atom mean $m$, $\bar{m}$ is the mean of all atoms of the diseased dictionary $\bar{D}$, and $\alpha, \beta > 0$ are the penalty coefficients for the intra-class and inter-class distances, respectively. The model minimizes the first and third terms while maximizing the second, so that the learned class-labeled dictionary reconstructs samples of its own class well and reconstructs samples of the other class poorly or not at all, and the two learned dictionaries are strongly distinguishable from each other. This yields discriminative features and, in turn, better classification.
2-3: Fix the disease-free dictionary $D$ and update the sparse coding coefficients.
With $L_1$ the coding sparsity of the disease-free and diseased samples under the disease-free dictionary, solving the objective splits into two sparse representation subproblems under $D$, one for the disease-free training samples and one for the diseased training samples, which can be written in a single unified form.
The OMP algorithm in the SPAMS toolbox is used to solve for the sparse codes of both sets of training samples under the disease-free dictionary $D$.
2-4: Fix the sparse coding coefficients and update the disease-free dictionary $D$.
The resulting objective simplifies to an expression in matrix traces, where tr denotes the trace of a matrix.
This function is convex, and coordinate gradient descent is used to find the optimal disease-free dictionary $D$.
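Steps 2-3 and 2-4 can be sketched in plain NumPy. The patent solves the sparse coding step with the OMP routine of the SPAMS toolbox and the dictionary step with coordinate gradient descent; the sketch below instead uses a small hand-written OMP as a stand-in and shows only the gradient of the two reconstruction terms with respect to the dictionary. The objective assumed for that gradient, $(1/N)\lVert Y - DX\rVert_F^2 - (\rho/\bar{N})\lVert \bar{Y} - D\bar{X}\rVert_F^2$, is a reading of the term descriptions rather than the patent's verified formula, the Fisher term is left out, and all function names are hypothetical.

```python
import numpy as np

def omp(D, y, sparsity):
    """Greedy orthogonal matching pursuit: code y with at most `sparsity` atoms of D."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = -1.0  # never select the same atom twice
        support.append(int(np.argmax(correlations)))
        # Least-squares refit on the selected atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def sparse_code(D, Y, sparsity):
    """Step 2-3 (dictionary fixed): code every column of Y independently."""
    return np.column_stack([omp(D, Y[:, j], sparsity) for j in range(Y.shape[1])])

def data_term_gradient(D, Y, X, Y_bar, X_bar, rho):
    """Step 2-4 (codes fixed): gradient of the assumed reconstruction terms w.r.t. D.

    Assumed objective: (1/N)||Y - D X||_F^2 - (rho/N_bar)||Y_bar - D X_bar||_F^2.
    The Fisher term Psi(D) is omitted from this sketch.
    """
    n_own, n_other = Y.shape[1], Y_bar.shape[1]
    grad_own = -2.0 / n_own * (Y - D @ X) @ X.T
    grad_other = -2.0 / n_other * (Y_bar - D @ X_bar) @ X_bar.T
    return grad_own - rho * grad_other
```

A dictionary update would then move $D$ against this gradient (plus the gradient of the Fisher term) with a suitable step size; the step-size rule and the coordinate-wise schedule used in the patent are not specified in the text and are not reproduced here.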
Step three, learn the diseased dictionary: combine the diseased and disease-free training samples to build the diseased dictionary learning model, minimize the objective function by a two-step alternating iteration, and obtain the diseased dictionary. Specifically:
3-1: Randomly select n column vectors from the disease-free training samples and n from the diseased training samples as the initial disease-free dictionary $D$ and diseased dictionary $\bar{D}$.
3-2: Establish the diseased dictionary learning model, a minimization over the dictionary and the sparse coefficients.
In the model, $Y$ and $\bar{Y}$ are the disease-free and diseased training samples; $X$ and $\bar{X}$ are their sparse representation coefficients; $N$ and $\bar{N}$ are the numbers of disease-free and diseased image feature vectors; $L_2$ is the coding sparsity of the disease-free and diseased samples under the diseased dictionary; and $\rho > 0$ is a regularization parameter. One term of the objective is the sparse reconstruction error of the diseased samples under the diseased dictionary, and another is the reconstruction error of the disease-free samples under the same dictionary. $\Psi(\bar{D})$ is the Fisher-criterion constraint term of the diseased dictionary, in which $m$ is the mean of all atoms of the disease-free dictionary $D$, $\bar{m}$ is the mean of all atoms of the diseased dictionary $\bar{D}$, and $M$ is the matrix formed from the atom mean $\bar{m}$ of the diseased dictionary. The model minimizes the first and third terms while maximizing the second, so that the learned class-labeled dictionary reconstructs samples of its own class well and reconstructs samples of the other class poorly or not at all, and the two learned dictionaries are strongly distinguishable from each other. This yields discriminative features and, in turn, better classification.
3-3: Fix the diseased dictionary $\bar{D}$ and update the sparse coding coefficients.
With $L_2$ the coding sparsity of the disease-free and diseased samples under the diseased dictionary, solving the objective splits into two sparse representation subproblems under $\bar{D}$, one for the disease-free training samples and one for the diseased training samples, which can be written in a single unified form.
The OMP algorithm in the SPAMS toolbox is used to solve for the sparse codes of both sets of training samples under the diseased dictionary $\bar{D}$.
3-4: Fix the sparse coding coefficients and update the diseased dictionary $\bar{D}$.
The resulting objective is simplified in the same way as in step 2-4.
Coordinate gradient descent is used to find the optimal diseased dictionary $\bar{D}$;
3-5: Return to step two. Learning the disease-free dictionary and learning the diseased dictionary alternate until the maximum number of iterations is reached.
Step four: check whether the maximum number of iterations has been reached; if so, go to step five; if not, return to step two.
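The outer loop formed by steps two to four can be sketched as below. To keep the control flow separate from the numerical details, the per-dictionary update is passed in as a callable; the name `learn_step` and its signature (own dictionary, other dictionary, own samples, other samples) are hypothetical, and a real implementation would perform the sparse coding and dictionary update of steps 2-3/2-4 or 3-3/3-4 inside it.

```python
import numpy as np

def learn_dictionaries(Y, Y_bar, n_atoms, max_iter, learn_step, rng=None):
    """Alternate disease-free and diseased dictionary learning for max_iter rounds.

    Y, Y_bar   : feature matrices (one column per patch) for the two classes.
    learn_step : callable(own_dict, other_dict, own_samples, other_samples)
                 returning the updated own_dict (hypothetical interface).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Steps 2-1 / 3-1: initialise each dictionary with random training columns.
    D = Y[:, rng.choice(Y.shape[1], size=n_atoms, replace=False)].copy()
    D_bar = Y_bar[:, rng.choice(Y_bar.shape[1], size=n_atoms, replace=False)].copy()
    for _ in range(max_iter):                      # step four: iteration cap
        D = learn_step(D, D_bar, Y, Y_bar)         # step two: disease-free dictionary
        D_bar = learn_step(D_bar, D, Y_bar, Y)     # step three: diseased dictionary
    return D, D_bar
```

Passing the update as a callable is only a design convenience for the sketch: it makes the alternation order explicit and lets the same loop be exercised with a trivial stub.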
Step five, obtain the reconstruction error vectors of the test sample: sparsely represent the test sample over the learned disease-free and diseased dictionaries, then compute the sparse reconstruction error vector of the test sample under each dictionary. Specifically:
5-1: Divide the test image into patches, treat each patch as a column vector $h$, and randomly take 250 patches to form the matrix $H$ as the test sample; then compute the sparse codes of $H$ under each class-labeled dictionary.
5-2: Compute the sparse reconstruction error vectors of the test sample under the disease-free dictionary $D$ and the diseased dictionary $\bar{D}$, namely $\delta_1 = \operatorname{diag}\big((H - DX)(H - DX)^{T}\big)$ for the disease-free dictionary and correspondingly for the diseased dictionary, where $\operatorname{diag}(\cdot)$ denotes the elements on the main diagonal of a matrix.
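A per-patch reconstruction error can be computed as in the sketch below. Note one assumption: the text writes the product as $(H - DX)(H - DX)^{T}$, whose diagonal has one entry per feature dimension, whereas this sketch returns one squared error per patch (per column of $H$), which equals the diagonal of $(H - DX)^{T}(H - DX)$. The per-patch reading is chosen here because step six works with the number of test samples; it is an interpretation, not a confirmed correction. The function name is hypothetical.

```python
import numpy as np

def reconstruction_errors(H, D, X):
    """Squared reconstruction error of each column of H under dictionary D.

    H : (features, patches) test matrix, D : (features, atoms) dictionary,
    X : (atoms, patches) sparse codes of H under D.
    """
    residual = H - D @ X
    # Column-wise sum of squares == diag(residual.T @ residual),
    # computed without forming the full patches-by-patches matrix.
    return np.sum(residual * residual, axis=0)
```

Calling this once with the disease-free dictionary and its codes and once with the diseased dictionary and its codes gives the two error vectors compared in step six.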
Step 6: Obtain the classification result of the test sample. A classification statistic is derived from the sparse reconstruction error vectors, and the class of the test sample is determined by comparing this statistic with a threshold. The specific steps are:
6-1: Define the vector C, where $N_t$ is the number of test samples;
6-2: Obtain the classification statistic S from the vector C:
When the classification statistic S is greater than or equal to the threshold Th, the test sample is classified as disease-free; otherwise, when S is less than Th, it is classified as diseased.
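The decision rule in Step 6 can be illustrated with a short sketch. The definitions of C and S are not reproduced in the text, so the sketch makes two assumptions that should not be read as the patented formulas: C marks each patch whose error under the disease-free dictionary is no larger than its error under the diseased dictionary, and S is the fraction of such patches among the $N_t$ test patches. Only the final comparison with the threshold Th follows the text directly. The function name `classify` is hypothetical.

```python
import numpy as np

def classify(delta_free, delta_diseased, threshold):
    """Label a test image from its two per-patch reconstruction error vectors."""
    delta_free = np.asarray(delta_free, dtype=float)
    delta_diseased = np.asarray(delta_diseased, dtype=float)
    # ASSUMPTION: C[i] = 1 when the disease-free dictionary fits patch i at least as well.
    C = (delta_free <= delta_diseased).astype(float)
    # ASSUMPTION: S is the fraction of patches voting "disease-free".
    S = C.sum() / C.size
    # From the text: S >= Th means disease-free, otherwise diseased.
    label = "disease-free" if S >= threshold else "diseased"
    return label, S
```

For example, `classify([0.1, 0.2, 0.9, 0.1], [0.5, 0.6, 0.3, 0.4], threshold=0.5)` has three of four patches better fitted by the disease-free dictionary, so under these assumptions the image is labeled disease-free.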
Table 1 compares the classification results of the present invention and other methods on lung images from the ADL database.
Table 1
Table 2 compares the classification results of the present invention and other methods on spleen images from the ADL database.
Table 2
Table 3 compares the classification results of the present invention and other methods on kidney images from the ADL database.
Table 3
Tables 1, 2 and 3 show that the proposed model diagnoses disease in these three organs clearly better than the other methods, with higher correct-classification rates for both disease-free and diseased samples. The improvement is most pronounced for the lung results in Table 1, where classification accuracy is 2-3% higher than that of DFDL. As Figure 2 shows, disease-free lung images contain large alveoli, whereas in diseased lung images the alveoli are smaller and densely covered with blue-purple inflammatory cells, and the texture is more complex; the difference between disease-free and diseased lung images is clearly greater than for spleen and kidney images. Disease-free and diseased spleen images are highly similar in texture and structure but differ considerably in color, so they are the second most discriminable and yield the second-best classification performance. Disease-free and diseased kidney images are highly similar in texture, structure and color, so they are the least discriminable and yield the weakest classification performance. The experimental results in the tables are fully consistent with Figure 1, which again demonstrates the effectiveness of the proposed model.
To verify the generality of the discriminative feature learning framework for histopathological images constructed by the present invention, the proposed model was also applied to the diagnosis of disease types in the BreaKHis dataset.
Table 4 compares the classification results of the present invention and other methods on the BreaKHis database.
Table 4
Table 4 gives the classification results of the different methods on the BreaKHis database. The experimental results show that the proposed model also achieves good disease classification performance on the two types of benign breast cancer images in Figure 3. This indicates that the present invention effectively improves the reconstructive power and robustness of the class-labeled dictionaries' sparse representation of same-class samples, while also addressing the problem of poor discrimination for samples of other classes.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710059300.0A CN106845551B (en) | 2017-01-24 | 2017-01-24 | A kind of histopathological image recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710059300.0A CN106845551B (en) | 2017-01-24 | 2017-01-24 | A kind of histopathological image recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106845551A (en) | 2017-06-13 |
CN106845551B (en) | 2020-08-11 |
Family
ID=59122438
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710059300.0A (CN106845551B, Active) | A kind of histopathological image recognition method | 2017-01-24 | 2017-01-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106845551B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107832786B (en) * | 2017-10-31 | 2019-10-25 | 济南大学 | A Face Recognition Classification Method Based on Dictionary Learning |
CN109063766B (en) * | 2018-07-31 | 2021-11-30 | 湘潭大学 | Image classification method based on discriminant prediction sparse decomposition model |
CN109308485B (en) * | 2018-08-02 | 2022-11-29 | 中国矿业大学 | A Migration Sparse Coding Image Classification Method Based on Dictionary Domain Adaptation |
CN109376802B (en) * | 2018-12-12 | 2021-08-03 | 浙江工业大学 | A dictionary learning-based method for classifying gastroscopic organs |
CN111027594B (en) * | 2019-11-18 | 2022-08-12 | 西北工业大学 | A step-by-step anomaly detection method based on dictionary representation |
CN113627556B (en) * | 2021-08-18 | 2023-03-24 | 广东电网有限责任公司 | Method and device for realizing image classification, electronic equipment and storage medium |
CN113793319B (en) * | 2021-09-13 | 2023-08-25 | 浙江理工大学 | Fabric image flaw detection method and system based on category constraint dictionary learning model |
CN114428873B (en) * | 2022-04-07 | 2022-06-28 | 源利腾达(西安)科技有限公司 | Thoracic surgery examination data sorting method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9946931B2 (en) * | 2015-04-20 | 2018-04-17 | Los Alamos National Security, Llc | Change detection and change monitoring of natural and man-made features in multispectral and hyperspectral satellite imagery |
CN104866810B (en) * | 2015-04-10 | 2018-07-13 | 北京工业大学 | A kind of face identification method of depth convolutional neural networks |
CN105844223A (en) * | 2016-03-18 | 2016-08-10 | 常州大学 | Face expression algorithm combining class characteristic dictionary learning and shared dictionary learning |
- 2017-01-24: CN application CN201710059300.0A (patent CN106845551B), status Active
Also Published As
Publication number | Publication date |
---|---|
CN106845551A (en) | 2017-06-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106845551B (en) | A kind of histopathological image recognition method | |
CN107122809B (en) | A neural network feature learning method based on image self-encoding | |
Jia et al. | Image transformation based on learning dictionaries across image spaces | |
CN110533683B (en) | A radiomics analysis method integrating traditional features and deep features | |
CN104008375B (en) | The integrated face identification method of feature based fusion | |
CN104933711A (en) | Automatic fast segmenting method of tumor pathological image | |
Hsu et al. | Capturing implicit hierarchical structure in 3D biomedical images with self-supervised hyperbolic representations | |
CN106778807A (en) | The fine granularity image classification method of dictionary pair is relied on based on public dictionary pair and class | |
CN112836671A (en) | A Data Dimensionality Reduction Method Based on Maximizing Ratio and Linear Discriminant Analysis | |
CN108460400B (en) | Hyperspectral image classification method combining various characteristic information | |
CN110796022B (en) | Low-resolution face recognition method based on multi-manifold coupling mapping | |
CN115496720A (en) | Gastrointestinal cancer pathological image segmentation method and related equipment based on ViT mechanism model | |
CN113256494A (en) | Text image super-resolution method | |
Franco-Barranco et al. | Current progress and challenges in large-scale 3d mitochondria instance segmentation | |
CN111695455B (en) | Low-resolution face recognition method based on coupling discrimination manifold alignment | |
CN104142978B (en) | A kind of image indexing system and method based on multiple features and rarefaction representation | |
CN110298365B (en) | Theme color extraction method based on human vision | |
CN111783796A (en) | A PET/CT Image Recognition System Based on Depth Feature Fusion | |
Lei et al. | HPLTS-GAN: A high-precision remote sensing spatiotemporal fusion method based on low temporal sensitivity | |
CN108121964B (en) | Matrix-based joint sparse locality preserving projection face recognition method | |
CN114022521A (en) | A registration method and system for non-rigid multimodal medical images | |
CN112949422A (en) | Hyperspectral target detection method based on self-supervision spectrum matching framework | |
Xu et al. | Data-efficient histopathology image analysis with deformation representation learning | |
CN113011506A (en) | Texture image classification method based on depth re-fractal spectrum network | |
CN109754001B (en) | Image classification method, computer storage medium and image classification device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |