CN110472694A - Lung cancer image pathological classification method and device - Google Patents
- Publication number: CN110472694A (application number CN201910778738.3A)
- Authority: CN (China)
- Prior art keywords: data set, rgb, lung cancer, pathological, label
- Legal status: Pending (as listed by Google Patents, which states that the status is an assumption and not a legal conclusion)
Classifications
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting (under G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing; G06F18/00: Pattern recognition; G06F18/20: Analysing; G06F18/21: Design or setup of recognition systems or techniques, extraction of features in feature space, blind source separation)
- G06F18/24: Classification techniques (under G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing; G06F18/00: Pattern recognition; G06F18/20: Analysing)
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The present application discloses a lung cancer image pathological classification method and device. First, color features are extracted from a preset RGB three-dimensional lesion image data set to obtain a feature vector set. Second, the matrix formed column-wise from the feature vector set undergoes non-negative matrix factorization to obtain a coefficient matrix. Next, the coefficient matrix is used as the sample data set and divided into a training data set and a test data set, and a first true pathological label is obtained for each training sample and a second true pathological label for each test sample. A classifier is then trained with the training data set and the first true pathological labels to obtain a trained classifier. Finally, the test data set is input into the trained classifier for classification, yielding predicted pathological labels and confidence distances. This addresses the technical problem of the long wait for lung cancer diagnosis results.
Description
Technical field
The present application relates to the field of image processing, and in particular to a method and device for pathological classification of lung cancer images.
Background art
Lung cancer is the most common visceral malignant tumor, with high morbidity and mortality, so early detection and diagnosis are crucial to patients' health. By subcategory, lung cancer is mainly divided into invasive carcinoma, microinvasive carcinoma and carcinoma in situ. Because microinvasive carcinoma and carcinoma in situ are treated with the same surgical approach, they can be grouped together as non-invasive carcinoma.
Traditional lung cancer diagnosis relies mainly on experienced physicians performing a preliminary examination and evaluation of chest imaging such as computed tomography (CT). The lung cancer subcategory, however, can only be determined by a pathologist examining the features of a tissue specimen under a microscope: after the surgeon removes the patient's diseased tissue, it is usually sent to the pathology department for frozen-section and paraffin-section processing, and a result is available only after about 30 minutes, which is time-consuming.
Summary of the invention
This application provides a lung cancer image pathological classification method, device and equipment, to solve the technical problem that a result is only available after a physician has examined, under a microscope, frozen and paraffin sections prepared from lesion tissue taken from the patient, which makes the wait for a lung cancer diagnosis long.
In view of this, a first aspect of the present application provides a lung cancer image pathological classification method, including:
extracting color features of a preset RGB three-dimensional lesion image data set to obtain a feature vector set;
performing non-negative matrix factorization on the matrix formed column-wise from the feature vector set to obtain a coefficient matrix;
dividing the coefficient matrix, as a sample data set, into a training data set and a test data set, and obtaining a first true pathological label for each training sample and a second true pathological label for each test sample;
training a classifier with the training data set and the first true pathological labels to obtain a trained classifier;
inputting the test data set into the trained classifier for classification to obtain a predicted pathological label and a confidence distance, where the confidence distance is the probability value corresponding to the predicted pathological label.
Preferably, extracting the color features of the preset RGB three-dimensional lesion data set to obtain the feature vectors includes:
extracting the color features of the preset RGB three-dimensional lesion data set based on a color histogram to obtain the feature vectors.
Preferably, there are multiple classifiers.
Preferably, the method further includes:
calculating the AUC from the second true pathological labels, the predicted pathological labels and the confidence distances, to evaluate the classification performance of the different classifiers.
Preferably, before extracting the color features of the preset RGB three-dimensional lesion image data set to obtain the feature vectors, the method further includes:
performing first cropping on the lung cancer pathological region in RGB three-dimensional lesion image samples to obtain the RGB three-dimensional lesion image data set.
Preferably, before extracting the color features of the preset RGB three-dimensional lesion image data set to obtain the feature vectors, the method further includes:
performing first scaling on RGB three-dimensional lesion image samples to obtain the RGB three-dimensional lesion image data set.
Preferably, before extracting the color features of the preset RGB three-dimensional lesion image data set to obtain the feature vectors, the method further includes:
performing second scaling on RGB three-dimensional lesion image samples, then performing second cropping on the lung cancer pathological region of the scaled samples, to obtain the RGB three-dimensional lesion image data set.
A second aspect of the present application provides a lung cancer image pathological classification device, including a feature extraction module, a decomposition module, a processing module, a training module and a testing module;
the feature extraction module is configured to extract color features of a preset RGB three-dimensional lesion image data set to obtain a feature vector set;
the decomposition module is configured to perform non-negative matrix factorization on the matrix formed column-wise from the feature vector set to obtain a coefficient matrix;
the processing module is configured to divide the coefficient matrix, as a sample data set, into a training data set and a test data set, and to obtain a first true pathological label for each training sample and a second true pathological label for each test sample;
the training module is configured to train a classifier with the training data set and the first true pathological labels to obtain a trained classifier;
the testing module is configured to input the test data set into the trained classifier for classification to obtain a predicted pathological label and a confidence distance, where the confidence distance is the probability value corresponding to the predicted pathological label.
Preferably, the device further includes an evaluation module;
the evaluation module is configured to calculate the AUC from the second true pathological labels, the predicted pathological labels and the confidence distances, to evaluate the classification performance of the different classifiers.
Preferably, the device further includes an enhancement module;
the enhancement module is configured to enhance RGB three-dimensional lesion image samples with at least one image enhancement method to obtain the RGB three-dimensional lesion image data set, the image enhancement methods including first cropping, first scaling, and second scaling followed by second cropping.
It can be seen from the above technical solutions that the present application has the following advantages:
In the lung cancer image pathological classification method provided in this application, color features are first extracted from a preset RGB three-dimensional lesion image data set to obtain a feature vector set; the matrix formed column-wise from the feature vector set then undergoes non-negative matrix factorization to obtain a coefficient matrix; the coefficient matrix is used as the sample data set and divided into a training data set and a test data set, with a first true pathological label obtained for each training sample and a second true pathological label for each test sample; a classifier is trained with the training data set and the first true pathological labels; and finally the test data set is input into the trained classifier to obtain predicted pathological labels and confidence distances, where the confidence distance is the probability value corresponding to the predicted pathological label. Extracting color features of lung cancer images is simpler and faster than extracting texture and structural features; non-negative matrix factorization effectively reduces the dimensionality of the matrix and thus the amount of computation; and a trained classifier classifies quickly. Through these three aspects, the application addresses the technical problem that a result is only available after a physician has examined, under a microscope, frozen and paraffin sections prepared from lesion tissue taken from the patient, which makes the wait for a lung cancer diagnosis long.
Description of drawings
FIG. 1 is a schematic flow chart of Embodiment 1 of a lung cancer image pathological classification method of the present application;
FIG. 2 is a schematic flow chart of Embodiment 2 of a lung cancer image pathological classification method of the present application;
FIG. 3 is a schematic structural diagram of an embodiment of a lung cancer image pathological classification device of the present application.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of this application.
This application provides a lung cancer image pathological classification method. First, color features are extracted from a preset RGB three-dimensional lesion image data set to obtain a feature vector set. Second, the matrix formed column-wise from the feature vector set undergoes non-negative matrix factorization to obtain a coefficient matrix. Then the coefficient matrix is used as the sample data set and divided into a training data set and a test data set, and a first true pathological label is obtained for each training sample and a second true pathological label for each test sample. Next, a classifier is trained with the training data set and the first true pathological labels to obtain a trained classifier. Finally, the test data set is input into the trained classifier for classification to obtain predicted pathological labels and confidence distances, where the confidence distance is the probability value corresponding to the predicted pathological label. Extracting color features of lung cancer images is simpler and faster than extracting texture and structural features; non-negative matrix factorization effectively reduces the dimensionality of the matrix and thus the amount of computation; and a trained classifier classifies quickly. Through these three aspects, the application addresses the technical problem that a result is only available after a physician has examined, under a microscope, frozen and paraffin sections prepared from lesion tissue taken from the patient, which makes the wait for a lung cancer diagnosis long.
For ease of understanding, referring to FIG. 1, the present application provides Embodiment 1 of a lung cancer image pathological classification method, including:
Step 101: extract color features of a preset RGB three-dimensional lesion image data set to obtain a feature vector set.
Color is one of the most representative features of an RGB three-dimensional image. The extracted features take the form of vectors: one feature vector is extracted from each RGB three-dimensional lesion image, and every extracted feature vector is 256×1-dimensional.
Step 102: perform non-negative matrix factorization on the matrix formed column-wise from the feature vector set to obtain a coefficient matrix.
Specifically, all the feature vectors are concatenated column-wise into a 256×N matrix, where N is the total number of RGB three-dimensional lesion images. The matrix is then decomposed by non-negative matrix factorization to obtain the coefficient matrix; this factorization effectively reduces the dimensionality of the matrix and thereby the amount of computation.
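To make the dimensionality claim concrete, the short sketch below counts the stored values before and after factorization. The rank k (the number of rows of the coefficient matrix) is not given in the text, so the value 20 and the function name `stored_values` are hypothetical and for illustration only.

```python
def stored_values(n_images, rank, n_bins=256):
    """Count matrix entries for V (256 x N), H (k x N) and W (256 x k)."""
    original = n_bins * n_images    # V: one 256-bin histogram column per image
    per_sample = rank * n_images    # H: one k-dimensional column per image
    basis = n_bins * rank           # W: shared basis, computed once
    return original, per_sample, basis

# Hypothetical example: 1000 images factorized with rank k = 20.
print(stored_values(1000, 20))
```

Under these assumed numbers the classifier would see 20 features per image instead of 256, which is where the reduction in computation comes from.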
Step 103: divide the coefficient matrix, as a sample data set, into a training data set and a test data set, and obtain a first true pathological label for each training sample and a second true pathological label for each test sample.
Specifically, the coefficient matrix is split by column into the sample data set, each column being one sample. All samples are divided proportionally into a training data set and a test data set; a first true pathological label is prepared for each training sample and a second true pathological label for each test sample. In this embodiment, a true pathological label is the actual pathological category of a sample.
Step 104: train the classifier with the training data set and the first true pathological labels to obtain a trained classifier.
The training data are in vector form, and the first true pathological labels are the actual categories of the training samples. Feeding both into the classifier for training adapts the classifier to this classification task.
Step 105: input the test data set into the trained classifier for classification to obtain predicted pathological labels and confidence distances.
Here, the confidence distance is the probability value corresponding to the predicted pathological label.
The test data are also in vector form; feeding them into the trained classifier yields the predicted pathological labels quickly.
Extracting color features of lung cancer images is simpler and faster than extracting texture and structural features; non-negative matrix factorization effectively reduces the dimensionality of the matrix and thus the amount of computation; and a trained classifier classifies quickly. Through these three aspects, the application addresses the technical problem that a result is only available after a physician has examined, under a microscope, frozen and paraffin sections prepared from lesion tissue taken from the patient, which makes the wait for a lung cancer diagnosis long.
For ease of understanding, referring to FIG. 2, the present application provides Embodiment 2 of a lung cancer image pathological classification method, including:
Step 201: enhance RGB three-dimensional lesion image samples with an image enhancement method to obtain an RGB three-dimensional lesion image data set.
Depending on the quality and number of the acquired RGB three-dimensional lesion images, different image enhancement methods can be applied. The enhancement can directly crop the lesion region of the RGB three-dimensional lesion image (first cropping); it can apply first scaling, which preserves the integrity of the lesion as far as possible while retaining its morphological specificity; or, after second scaling, it can further crop part of the lesion (second cropping).
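The three enhancement options can be sketched on an H × W × 3 NumPy array as below. This is a minimal illustration, not the patent's implementation: the crop coordinates, the output sizes and the use of nearest-neighbour interpolation for scaling are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def crop(image, top, left, height, width):
    """First cropping: cut a rectangular lesion region out of an H x W x 3 RGB array."""
    return image[top:top + height, left:left + width]

def resize_nearest(image, out_h, out_w):
    """First scaling: nearest-neighbour resize of the whole image (keeps the full lesion)."""
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return image[rows[:, None], cols[None, :]]

def resize_then_crop(image, out_h, out_w, top, left, height, width):
    """Second scaling followed by second cropping of the lesion region."""
    return crop(resize_nearest(image, out_h, out_w), top, left, height, width)
```

For example, `resize_then_crop(img, 200, 240, 50, 60, 64, 64)` first doubles a 100 × 120 image and then keeps a 64 × 64 patch; in practice a library resize with smoother interpolation would likely be preferred.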
Step 202: extract the color features of the RGB three-dimensional lesion image data set to obtain a feature vector set.
The color features are extracted based on a color histogram to obtain the feature vectors. First, the RGB image is converted to the HSV color space, composed of hue (H), saturation (S) and value (V). Second, the H component is quantized to 16 levels and the S and V components to 4 levels each. Finally, the three quantized components are combined into a one-dimensional feature, and its histogram is computed, giving a 256-dimensional column vector (16 × 4 × 4 = 256).
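The 16/4/4 HSV quantization described above can be sketched as follows. Assumptions not stated in the text: the quantization is uniform, the combined index is computed as 16·H + 4·S + V, and the histogram is normalized to sum to 1 so that images of different sizes are comparable. The function name `hsv_histogram` is hypothetical.

```python
import colorsys
import numpy as np

H_BINS, S_BINS, V_BINS = 16, 4, 4  # levels stated in the text: 16 x 4 x 4 = 256

def hsv_histogram(image):
    """Return a normalized 256-bin HSV color histogram of an H x W x 3 uint8 RGB image."""
    rgb = image.reshape(-1, 3).astype(float) / 255.0
    # colorsys works on scalars; np.vectorize applies it pixel by pixel (slow but simple).
    h, s, v = np.vectorize(colorsys.rgb_to_hsv)(rgb[:, 0], rgb[:, 1], rgb[:, 2])
    # Uniform quantization (an assumption); np.minimum keeps the value 1.0 in the top bin.
    hq = np.minimum((h * H_BINS).astype(int), H_BINS - 1)
    sq = np.minimum((s * S_BINS).astype(int), S_BINS - 1)
    vq = np.minimum((v * V_BINS).astype(int), V_BINS - 1)
    # Combine the three components into one index in [0, 255].
    index = hq * (S_BINS * V_BINS) + sq * V_BINS + vq
    hist = np.bincount(index, minlength=H_BINS * S_BINS * V_BINS)
    return hist / hist.sum()  # one 256-dimensional feature vector
```

A pure red image, for instance, has H = 0, S = 1, V = 1, so all its mass falls in bin 0·16 + 3·4 + 3 = 15. Stacking these vectors with `np.column_stack` gives the 256 × N matrix used in the next step.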
Step 203: perform non-negative matrix factorization on the matrix formed column-wise from the feature vector set to obtain a coefficient matrix.
All the column vectors representing the features of the RGB three-dimensional lesion images are arranged column-wise into a matrix, which is then decomposed by non-negative matrix factorization into a basis matrix and a coefficient matrix; the coefficient matrix is retained.
The non-negative matrix factorization proceeds as follows.
First, the non-negative matrix factorization is expressed as:
V ≈ WH
where V is the original matrix, W is the basis matrix, and H is the coefficient matrix.
From this formula, the objective function of the non-negative matrix factorization is obtained.
Taking the partial derivatives of the objective function with respect to the basis matrix and the coefficient matrix, and applying gradient descent, yields the iterative update formulas for the basis matrix W and the coefficient matrix H.
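The objective function and iterative formulas are not written out in this text. Assuming the common squared Frobenius-norm objective, a gradient-descent derivation with the element-wise step sizes of Lee and Seung leads to the standard multiplicative updates shown below (products ⊙ and fractions are element-wise). This is one plausible reconstruction, not necessarily the formulas used in the patent.

```latex
\min_{W \ge 0,\; H \ge 0} \; F(W, H) = \tfrac{1}{2}\,\lVert V - WH \rVert_F^2

\frac{\partial F}{\partial H} = W^{\top} W H - W^{\top} V, \qquad
\frac{\partial F}{\partial W} = W H H^{\top} - V H^{\top}

H \leftarrow H - \frac{H}{W^{\top} W H} \odot \frac{\partial F}{\partial H}
  \;=\; H \odot \frac{W^{\top} V}{W^{\top} W H}

W \leftarrow W - \frac{W}{W H H^{\top}} \odot \frac{\partial F}{\partial W}
  \;=\; W \odot \frac{V H^{\top}}{W H H^{\top}}
```

Choosing these particular step sizes turns the subtractive gradient step into a multiplication, so W and H stay non-negative without any projection.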
Iterating for a preset number of iterations yields the output coefficient matrix.
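A minimal NumPy sketch of the factorization, run for a preset number of iterations, is given below. It assumes the Frobenius objective and Lee-Seung multiplicative updates; the random initialization, the rank and the small `eps` guard against division by zero are illustrative choices, and `nmf` is a hypothetical name.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (e.g. 256 x N) as V ~ W @ H with W, H >= 0.

    Multiplicative updates for the squared Frobenius error, run for a preset
    number of iterations. Returns (W, H): W is 256 x k, H is k x N.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps  # non-negative random start
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update the coefficient matrix
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update the basis matrix
    return W, H
```

With `V = np.column_stack(histograms)`, `W, H = nmf(V, rank=20)` would give one 20-dimensional column of `H` per image (the rank is again hypothetical). scikit-learn's `sklearn.decomposition.NMF` is a maintained alternative; note that it expects samples in rows, so it would be applied to `V.T`.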
Step 204: divide the coefficient matrix, as a sample data set, into a training data set and a test data set, and obtain a first true pathological label for each training sample and a second true pathological label for each test sample.
First, the coefficient matrix is split by column into column vectors, each column being one sample. All samples are divided into a training data set and a test data set according to a preset ratio, and for each sample the corresponding label, i.e. its pathological category, is obtained; these labels are the true labels of the samples.
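The column-wise split can be sketched as follows. The text only says "a preset ratio", so the 30 % test share, the random shuffling and the absence of stratification are assumptions; `split_columns` is a hypothetical helper.

```python
import numpy as np

def split_columns(H, labels, test_ratio=0.3, seed=0):
    """Treat each column of the k x N coefficient matrix H as one sample and split by ratio."""
    n = H.shape[1]
    order = np.random.default_rng(seed).permutation(n)  # shuffle sample indices
    n_test = int(round(n * test_ratio))
    test_idx, train_idx = order[:n_test], order[n_test:]
    X = H.T  # N x k: one row per image, the usual layout for classifiers
    y = np.asarray(labels)
    # Returns training samples with first true labels, test samples with second true labels.
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```

With only a few samples per category, a stratified split (keeping the class proportions equal in both sets) may give a more reliable evaluation.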
Step 205: train an SVM classifier with the training data set and the first true pathological labels to obtain a trained SVM classifier.
The training data set is a set of training column vectors; training the SVM on it improves the classifier's recognition accuracy on pathological images and yields an SVM classifier tailored to the task.
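The text does not specify the SVM kernel, solver or parameters. As a self-contained illustration only, the sketch below trains a soft-margin linear SVM by full-batch subgradient descent on the hinge loss and reports the signed distance to the separating hyperplane as a confidence score. The class name, `C`, learning rate and epoch count are hypothetical, and labels are assumed to be encoded as 0/1.

```python
import numpy as np

class LinearSVM:
    """Soft-margin linear SVM trained by full-batch subgradient descent on the hinge loss."""

    def __init__(self, C=1.0, lr=0.01, n_epochs=500):
        self.C, self.lr, self.n_epochs = C, lr, n_epochs

    def fit(self, X, y):
        t = np.where(np.asarray(y) == 1, 1.0, -1.0)  # labels 0/1 -> -1/+1
        n, d = X.shape
        self.w, self.b = np.zeros(d), 0.0
        for _ in range(self.n_epochs):
            margins = t * (X @ self.w + self.b)
            active = margins < 1  # samples inside or beyond the margin
            # Subgradient of 0.5*|w|^2 + (C/n) * sum(max(0, 1 - margin)).
            grad_w = self.w - self.C * (t[active, None] * X[active]).sum(axis=0) / n
            grad_b = -self.C * t[active].sum() / n
            self.w -= self.lr * grad_w
            self.b -= self.lr * grad_b
        return self

    def decision_function(self, X):
        # Signed distance to the separating hyperplane, usable as a confidence score.
        return (X @ self.w + self.b) / max(np.linalg.norm(self.w), 1e-12)

    def predict(self, X):
        return (X @ self.w + self.b > 0).astype(int)
```

In practice a library SVM would be used; for example, scikit-learn's `sklearn.svm.SVC` exposes `decision_function` for the signed margin and, when constructed with `probability=True`, `predict_proba` for calibrated probabilities, either of which could serve as the "confidence distance".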
Step 206: input the test data set into the trained SVM classifier for classification to obtain predicted pathological labels and confidence distances.
Here, the confidence distance is the probability value corresponding to the predicted pathological label.
Besides SVM, the classifier in this embodiment may be another classifier without affecting the method's ability to solve the technical problem raised in this application. Accordingly, after Steps 201 to 204 of Embodiment 2, the training data set and the first true pathological labels may be input into at least two classifiers, which are trained to obtain different trained classifiers; the test data set is then input into each trained classifier, yielding multiple sets of predicted pathological labels and confidence distances. The AUC (the area under the ROC curve) is calculated from the second true pathological labels, the predicted pathological labels and the confidence distances to evaluate the classification performance of the different classifiers.
With the confidence distance compared against a threshold, each sample falls into one of four categories: true positive (TP), predicted positive and actually positive; false positive (FP), predicted positive but actually negative; true negative (TN), predicted negative and actually negative; and false negative (FN), predicted negative but actually positive. From the second true pathological labels and the predicted pathological labels, the true positive rate and false positive rate are obtained; varying the threshold and plotting the true positive rate against the false positive rate gives the ROC curve, and the area under it is the AUC. As a single number, the AUC clearly and intuitively reflects the performance of different classifiers and gives an objective measure of classification accuracy on imbalanced samples. Based on the AUC of each classifier, the classifier with the better classification performance can be selected.
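The threshold sweep described above can be sketched in a few lines: each distinct confidence value is used in turn as the threshold, the cumulative TP and FP counts give the (FPR, TPR) points, and the trapezoid rule integrates the curve. Encoding the positive pathological category as 1 is an assumption, `roc_auc` is a hypothetical name, and both classes must be present in the test labels.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Sweep every score as a threshold, collect (FPR, TPR) points, integrate by trapezoids."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest confidence first
    y_sorted, s_sorted = y_true[order], scores[order]
    # Keep the last index of each run of tied scores so ties form a single threshold.
    distinct = np.r_[np.nonzero(np.diff(s_sorted))[0], len(s_sorted) - 1]
    tp = np.cumsum(y_sorted == 1)[distinct]  # true positives at each threshold
    fp = np.cumsum(y_sorted != 1)[distinct]  # false positives at each threshold
    tpr = np.r_[0, tp] / tp[-1]
    fpr = np.r_[0, fp] / fp[-1]
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

For example, labels `[1, 0, 1, 0]` with scores `[0.9, 0.8, 0.7, 0.1]` give an AUC of 0.75, since three of the four positive/negative pairs are ranked correctly. Computing this value for each trained classifier allows them to be compared; scikit-learn's `sklearn.metrics.roc_auc_score` provides the same measure.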
For ease of understanding, referring to FIG. 3, the present application provides a lung cancer image pathological classification device, including an enhancement module 301, a feature extraction module 302, a decomposition module 303, a processing module 304, a training module 305, a testing module 306 and an evaluation module 307.
The enhancement module 301 is configured to enhance RGB three-dimensional lesion image samples with an image enhancement method to obtain an RGB three-dimensional lesion image data set.
The feature extraction module 302 is configured to extract color features of a preset RGB three-dimensional lesion image data set to obtain a feature vector set.
The decomposition module 303 is configured to perform non-negative matrix factorization on the matrix formed column-wise from the feature vector set to obtain a coefficient matrix.
The processing module 304 is configured to divide the coefficient matrix, as a sample data set, into a training data set and a test data set, and to obtain a first true pathological label for each training sample and a second true pathological label for each test sample.
The training module 305 is configured to train a classifier with the training data set and the first true pathological labels to obtain a trained classifier.
The testing module 306 is configured to input the test data set into the trained classifier for classification to obtain predicted pathological labels and confidence distances, where the confidence distance is the probability value corresponding to the predicted pathological label.
The evaluation module 307 is configured to calculate the AUC from the second true pathological labels, the predicted pathological labels and the confidence distances, to evaluate the classification performance of the different classifiers.
It should be understood that, in the several embodiments provided in this application, the disclosed device and method may be implemented in other ways. For example, the device embodiment described above is merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through certain interfaces. The indirect couplings or communication connections between devices or modules may be electrical, mechanical, or of other forms.
Modules described as separate components may or may not be physically separate. Likewise, components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, the functional modules in the embodiments of this application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module.
If the integrated module is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application may be embodied as a software product. This applies to the solution in essence, to the part of it that contributes over the prior art, or to all or part of the solution. The computer software product is stored in a storage medium and includes instructions that cause a computer device (a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate, not to limit, the technical solutions of this application. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand the following. They may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of their technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910778738.3A CN110472694A (en) | 2019-08-22 | 2019-08-22 | A kind of Lung Cancer Images pathological classification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110472694A (en) | 2019-11-19 |
Family ID: 68513374
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910778738.3A Pending CN110472694A (en) | 2019-08-22 | 2019-08-22 | A kind of Lung Cancer Images pathological classification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110472694A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111461243A (en) * | 2020-04-08 | 2020-07-28 | 中国医学科学院肿瘤医院 | Classification method, classification device, electronic equipment and computer-readable storage medium |
CN113239974A (en) * | 2021-04-21 | 2021-08-10 | 中国传媒大学 | Rapid image data classification method capable of continuous learning |
CN117173485A (en) * | 2023-09-18 | 2023-12-05 | 西安交通大学医学院第二附属医院 | An intelligent classification system method and system for lung cancer tissue pathology images |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104933711A (en) * | 2015-06-10 | 2015-09-23 | 南通大学 | Automatic fast segmenting method of tumor pathological image |
CN106530296A (en) * | 2016-11-07 | 2017-03-22 | 首都医科大学 | Lung detection method and device based on PET/CT image features |
Non-Patent Citations (2)
Title |
---|
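Yang Fan (杨帆): "Digital Image Processing and Analysis" (《数字图像处理与分析》), 31 October 2007, Shanghai Scientific and Technical Publishers (上海科学技术出版社) * |
Wang Yiqin (王忆勤): "TCM Facial Diagnosis and Computer-Aided Diagnosis" (《中医面诊与计算机辅助诊断》), 30 November 2010 * |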
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111461243A (en) * | 2020-04-08 | 2020-07-28 | 中国医学科学院肿瘤医院 | Classification method, classification device, electronic equipment and computer-readable storage medium |
CN111461243B (en) * | 2020-04-08 | 2023-06-20 | 中国医学科学院肿瘤医院 | Classification method, device, electronic device, and computer-readable storage medium |
CN113239974A (en) * | 2021-04-21 | 2021-08-10 | 中国传媒大学 | Rapid image data classification method capable of continuous learning |
CN117173485A (en) * | 2023-09-18 | 2023-12-05 | 西安交通大学医学院第二附属医院 | An intelligent classification system method and system for lung cancer tissue pathology images |
CN117173485B (en) * | 2023-09-18 | 2024-02-13 | 西安交通大学医学院第二附属医院 | Intelligent classification system method and system for lung cancer tissue pathological images |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Agarwal et al. | | Deep learning for mass detection in full field digital mammograms |
Zhou et al. | | Multi-task learning for segmentation and classification of tumors in 3D automated breast ultrasound images |
Raghavendra et al. | | Computer-aided diagnosis for the identification of breast cancer using thermogram images: A comprehensive review |
Sridar et al. | | Decision fusion-based fetal ultrasound image plane classification using convolutional neural networks |
CN111553892B (en) | | Lung nodule segmentation calculation method, device and system based on deep learning |
CN112884759B (en) | | Method and related device for detecting metastasis state of axillary lymph nodes of breast cancer |
Özbay et al. | | Interpretable pap-smear image retrieval for cervical cancer detection with rotation invariance mask generation deep hashing |
CN110472694A (en) | | A kind of Lung Cancer Images pathological classification method and device |
Zhou et al. | | Deep learning-based breast region extraction of mammographic images combining pre-processing methods and semantic segmentation supported by Deeplab v3+ |
Zhang et al. | | Comparison of multiple feature extractors on Faster RCNN for breast tumor detection |
Iqbal et al. | | Brain tumor segmentation in multimodal MRI using U-Net layered structure |
Kumar et al. | | A Novel Approach for Breast Cancer Detection by Mammograms |
Holzinger et al. | | On the generation of point cloud data sets: Step one in the knowledge discovery process |
Likhitkar et al. | | Automated detection of cancerous lung nodule from the computed tomography images |
AlShourbaji et al. | | Early detection of skin cancer using deep learning approach |
Bhagavan et al. | | A compressive survey on different image processing techniques to identify the brain tumor |
CN111127636B (en) | | Intelligent complex intra-articular fracture desktop-level three-dimensional diagnosis system |
Akshaya et al. | | Identification of Brain Tumor on Mri images with and without Segmentation using DL Techniques |
Roy Medhi | | Lung Cancer Classification from Histologic Images using Capsule Networks |
Seyed Abolghasemi et al. | | Accuracy improvement of breast tumor detection based on dimension reduction in the spatial and edge features and edge structure in the image |
Kuo et al. | | Lymphatic infiltration detection in breast cancer h&e image prior to lymphadenectomy |
Babna et al. | | Multi-class detection of skin disease: detection using HOG and CNN hybrid feature extraction |
Bakasa et al. | | Light Gradient-Boosting Machine Edge Detection With Cropping Layer for Semantic Segmentation of Pancreas. |
CN115132357B (en) | | Device for predicting target disease index state based on medical image map |
Solanki et al. | | An Approach for Classification of Brain Tumor using Fully Connected Deep Convolutional Neural Network |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191119 |