CN117132573A

CN117132573A - Method for predicting pulmonary nodule doubling time based on a single chest CT examination

Info

Publication number: CN117132573A
Application number: CN202311118440.2A
Authority: CN
Inventors: 范丽; 刘士远; 黄文君; 周陶胡; 周秀秀; 王祥
Original assignee: Shanghai Changzheng Hospital
Current assignee: Shanghai Changzheng Hospital
Priority date: 2023-08-31
Filing date: 2023-08-31
Publication date: 2023-11-28

Abstract

The application relates to the field of pulmonary nodule monitoring and discloses a method for predicting pulmonary nodule doubling time based on a single chest CT examination. The method is suitable for execution in a computing device and comprises the following steps: generating feature data for each sample in a pre-acquired sample set, the sample set comprising a plurality of pulmonary nodules and the feature data comprising the location, size, adjacent structures, and calculated doubling time of each pulmonary nodule; performing radiomics feature extraction on the CT images according to the feature data to obtain a first feature set; performing feature dimensionality reduction on the first feature set to determine a training feature set; training an initial model on the training feature set and the calculated doubling time to obtain a prediction model; and inputting the CT image of a single chest examination of a patient to be predicted into the prediction model to obtain the predicted doubling time of that patient's pulmonary nodule. The application can predict the doubling time of a pulmonary nodule from the CT image of a single examination, thereby guiding examination and management and reducing the radiation dose of repeated follow-up examinations.

Description

A prediction method for pulmonary nodule doubling time based on a single chest CT examination

Technical field

The present invention relates to the field of pulmonary nodule prediction, and in particular to a method for predicting pulmonary nodule doubling time based on a single chest CT examination.

Background

Pulmonary nodules can be divided by imaging appearance into solid nodules and ground-glass nodules. Ground-glass nodules (GGNs) fall into two categories: pure GGNs (pGGNs), which contain no solid component, and mixed GGNs (mGGNs), which contain a solid component that obscures part of the lung markings. Compared with solid nodules, GGNs are more closely related to lung adenocarcinoma, which includes adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IAC). Predicting the growth of a GGN is of great value in judging its nature, and doubling time is an important indicator for assessing whether a nodule is growing. At present, the doubling time can be calculated, and growth determined, only from multiple CT examinations. Patients are often anxious during the follow-up period, and multiple CT examinations also increase the radiation dose.

In the prior art, the doubling time of a nodule cannot be obtained from the CT image of a single examination. A new method for predicting pulmonary nodule doubling time based on a single chest CT examination is therefore needed.

Summary of the invention

To this end, the present invention provides a method for predicting pulmonary nodule doubling time based on a single chest CT examination, in an attempt to solve the problems described above.

According to one aspect of the present invention, a method for predicting pulmonary nodule doubling time based on a single chest CT examination is provided, suitable for execution in a computing device. The method includes the steps of: generating feature data for each sample in a pre-acquired sample set, where the sample set includes multiple chest CT examinations of multiple pulmonary nodules and the feature data includes the location, size, density, shape, margin, tumor-lung interface, internal features, adjacent structures, and calculated doubling time of each pulmonary nodule; performing radiomics feature extraction on the CT images according to the feature data to obtain a first feature set; performing feature dimensionality reduction on the first feature set to determine a training feature set; training an initial model on the training feature set and the calculated doubling time to obtain a prediction model; and inputting the CT image of a single chest examination of the patient to be predicted into the prediction model to obtain the predicted doubling time of that patient's pulmonary nodule.

Optionally, in the method according to the present invention, the samples in the sample set are determined according to selection criteria and exclusion criteria.

Optionally, in the method according to the present invention, the selection criteria are the criteria a pulmonary nodule must meet to serve as a sample, including: a nodule diameter of ≤3 cm on the baseline CT scan; at least two thin-section CT scans; an interval of at least 180 days between the baseline CT scan and the last CT scan before surgery; and surgery performed within one month after the last CT scan.

Optionally, the method according to the present invention further includes exclusion criteria, which identify pulmonary nodules that cannot serve as samples, including: nodules that received any treatment before surgery, where treatment includes radiotherapy or chemotherapy; nodules lacking clinical information or with lost nodule image data; and nodules whose display is affected by image artifacts.

Optionally, in the method according to the present invention, generating feature data for each sample in the pre-acquired sample set includes determining the calculated doubling time (VDT) from the time interval between the baseline CT scan of the GGN and the last CT scan before surgery, the volume of the GGN at the baseline CT scan, and the volume of the GGN at the last CT scan before surgery, according to the following formula:

VDT = (log 2 × T) / log(V_f / V_i)

where T is the time interval between the baseline CT scan of the GGN and the last CT scan before surgery, V_i is the volume of the GGN at the baseline CT scan, and V_f is the volume of the GGN at the last CT scan before surgery.
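The formula above can be sketched in code as follows. This is a minimal illustration only: the function name, the argument names, and the handling of an unchanged volume are assumptions and are not taken from the application.

```python
import math


def volume_doubling_time(interval_days, v_initial, v_final):
    """VDT = (log 2 * T) / log(V_f / V_i).

    interval_days: T, days between the baseline scan and the last scan before surgery.
    v_initial, v_final: V_i and V_f, nodule volumes in the same unit (for example mm^3).
    """
    if v_initial <= 0 or v_final <= 0:
        raise ValueError("volumes must be positive")
    if v_final == v_initial:
        # Assumption: an unchanged volume is reported as an infinite doubling time.
        return math.inf
    return math.log(2) * interval_days / math.log(v_final / v_initial)


# Hypothetical nodule that doubles from 500 to 1000 mm^3 over 365 days,
# so the doubling time is about 365 days.
print(volume_doubling_time(365, 500.0, 1000.0))
```

Because the result depends only on the ratio V_f / V_i, any base of logarithm gives the same value as long as the same base is used in the numerator and the denominator. A nodule that shrinks yields a negative value under this formula.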

Optionally, performing radiomics feature extraction on the CT images according to the feature data to obtain the first feature set includes: applying Laplacian of Gaussian filtering and wavelet transforms to the CT images and extracting higher-order image features to obtain a radiomics feature set; extracting first-order, shape, and second-order texture features from the CT images to obtain radiomics features; using the intraclass correlation coefficient to test the intra-observer and inter-observer consistency of the radiomics features; and selecting, from the radiomics feature set, the features whose consistency test meets a preset condition as the first feature set.

Optionally, in the method according to the present invention, performing feature dimensionality reduction on the first feature set to determine the training feature set includes: applying the minimum redundancy maximum relevance method to the features in the first feature set to obtain multiple preset training feature sets, each containing a different number of features; and calculating the AUC of each preset training feature set with the initial models and taking the preset training feature set with the largest AUC as the training feature set, where AUC is the mean area under the receiver operating characteristic curve.
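As an illustration of minimum redundancy maximum relevance selection, the sketch below greedily picks features that correlate strongly with the target and weakly with the features already chosen. The application does not state which relevance or redundancy measure is used; absolute Pearson correlation and the difference form of the score are assumptions made here for brevity, and the feature names and values are toy data.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def mrmr(features, target, k):
    """Greedy selection of k feature names.

    features: dict mapping feature name -> list of values (one per sample).
    Score of a candidate = relevance - mean redundancy with selected features.
    """
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = mean(
                abs(pearson(features[name], features[s])) for s in selected
            )
            return relevance[name] - redundancy

        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: x1_dup nearly repeats x1, while x2 carries independent information.
toy = {
    "x1": [0, 0, 0, 0, 1, 1, 1, 1],
    "x1_dup": [0, 0, 0, 1, 1, 1, 1, 1],
    "x2": [0, 0, 1, 1, 0, 0, 1, 1],
}
toy_target = [0, 0, 1, 1, 2, 2, 3, 3]
print(mrmr(toy, toy_target, 2))  # x1 first, then x2 rather than the near-duplicate
```

Calling the function with several values of k produces feature sets of different sizes, which can then be compared by AUC as described above.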

Optionally, in the method according to the present invention, the initial models include: random forest, linear-kernel support vector machine, radial basis function kernel support vector machine, linear extreme gradient boosting, k-nearest neighbors, artificial neural network, and naive Bayes.

Optionally, in the method according to the present invention, training the initial model on the training feature set and the calculated doubling time to obtain the prediction model includes: using the determined training feature set, training each initial model by nested cross-validation on the training samples of the training set within the sample set; and determining the prediction model from the initial models according to the AUC, accuracy, and stability of each model in the nested cross-validation results.
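The structure of nested cross-validation can be sketched as index bookkeeping: an outer loop holds out a fold for performance estimation, and an inner loop splits only the remaining samples for model tuning. The fold counts (5 outer, 3 inner), the absence of shuffling or stratification, and the function names are assumptions; the application does not specify them.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k folds of near-equal size (no shuffling)."""
    return [indices[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold is never seen during tuning.
    """
    all_idx = list(range(n_samples))
    for outer_test in k_fold_indices(all_idx, outer_k):
        held_out = set(outer_test)
        outer_train = [i for i in all_idx if i not in held_out]
        inner_splits = []
        for inner_val in k_fold_indices(outer_train, inner_k):
            val = set(inner_val)
            inner_train = [i for i in outer_train if i not in val]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# 90 samples, matching the size of the training set described in the embodiment.
for outer_train, outer_test, inner_splits in nested_cv_splits(90):
    print(len(outer_train), len(outer_test), len(inner_splits))
```

In use, each candidate model would be tuned on the inner splits and scored once on the corresponding outer test fold; the spread of the outer-fold scores gives one possible view of model stability.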

According to another aspect of the present invention, a computing device is provided, including: one or more processors; a memory; and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing any of the methods for predicting pulmonary nodule doubling time based on a single chest CT examination according to the present invention.

According to yet another aspect of the present invention, a computer-readable storage medium storing one or more programs is provided. The one or more programs include instructions that, when executed by a computing device, cause the computing device to perform any of the methods for predicting pulmonary nodule doubling time based on a single chest CT examination according to the present invention.

The method of the present invention for predicting pulmonary nodule doubling time based on a single chest CT examination is suitable for execution in a computing device and includes the steps of: generating feature data for each sample in a pre-acquired sample set, where the sample set includes multiple pulmonary nodules and the feature data includes the location, size, density, shape, margin, tumor-lung interface, internal features, adjacent structures, and calculated doubling time of each pulmonary nodule; performing radiomics feature extraction on the CT images according to the feature data to obtain a first feature set; performing feature dimensionality reduction on the first feature set to determine a training feature set; training an initial model on the training feature set and the calculated doubling time to obtain a prediction model; and inputting the CT image of the patient to be predicted into the prediction model to obtain the predicted doubling time of that patient's pulmonary nodule. The present invention can train a prediction model on CT images of pulmonary nodules and thereby predict the doubling time of a pulmonary nodule from a single chest CT examination, guiding examination and management and reducing the radiation dose of repeated follow-up examinations.

Description of the drawings

To accomplish the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the accompanying drawings. These aspects indicate various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, the same reference numerals generally refer to the same parts or elements.

Figure 1 shows a schematic flowchart of a method 100 for predicting pulmonary nodule doubling time based on a single chest CT examination according to an exemplary embodiment of the present invention;

Figure 2 shows a schematic diagram of the process of training a prediction model according to an exemplary embodiment of the present invention;

Figure 3 shows a schematic diagram of the changes in the mean and standard deviation of features before and after data calibration according to an exemplary embodiment of the present invention;

Figure 4 shows a schematic diagram of the mean AUC calculated for four feature sets according to an exemplary embodiment of the present invention;

Figure 5 shows a schematic diagram of the performance of seven models after training according to an exemplary embodiment of the present invention;

Figure 6 shows a schematic diagram of the stability test results of seven models according to an exemplary embodiment of the present invention;

Figure 7a shows a schematic diagram of the ROC curve of the NNet model according to an exemplary embodiment of the present invention;

Figure 7b shows a schematic diagram of the prediction confusion matrix of the NNet model according to an exemplary embodiment of the present invention;

Figures 8a and 8b respectively show a schematic diagram of the baseline CT scan and a schematic diagram of the last CT scan before surgery of a first patient according to an exemplary embodiment of the present invention;

Figures 8c and 8d respectively show a schematic diagram of the baseline CT scan and a schematic diagram of the last CT scan before surgery of a second patient according to an exemplary embodiment of the present invention.

Detailed description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art. The same reference numerals generally refer to the same parts or elements.

Figure 1 shows a schematic flowchart of a method 100 for predicting pulmonary nodule doubling time based on a single chest CT examination according to an exemplary embodiment of the present invention. The method 100 is suitable for execution in a computing device. The method begins at step 110, in which feature data is generated for each sample in a pre-acquired sample set. The sample set includes multiple pulmonary nodules, and the feature data includes the location, size, density, shape, margin, tumor-lung interface, internal features, adjacent structures, and calculated doubling time of each pulmonary nodule.

Figure 2 shows a schematic diagram of the process of training a prediction model according to an exemplary embodiment of the present invention. As shown in Figure 2, to build the pulmonary nodule doubling-time model, the present invention acquires CT images of pulmonary ground-glass nodules (GGNs) in advance. According to one embodiment of the present invention, nodule data for pulmonary ground-glass nodules confirmed by surgical pathology were collected from 4 hospitals as sample data. When the sample data are collected, the samples in the sample set are determined according to selection criteria and exclusion criteria.

Pulmonary ground-glass nodules that can serve as samples are selected according to the following selection criteria:

The pulmonary nodule diameter on the baseline CT scan is ≤3 cm;

At least two thin-section CT scans have been performed;

The interval between the baseline CT scan and the last CT scan before surgery is at least 180 days;

Surgery was performed within one month after the last CT scan.

Pulmonary ground-glass nodules that cannot serve as samples are excluded according to the following exclusion criteria:

The nodule received any treatment before surgery, including radiotherapy or chemotherapy;

Clinical information is missing or nodule image data has been lost;

Image artifacts or other factors affect the display of the GGN.
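The selection and exclusion criteria above can be sketched as a single eligibility check. The record layout and field names are hypothetical, and "within one month" is interpreted here as within 30 days, which is an assumption.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass
class NoduleRecord:
    # Hypothetical record layout; field names are illustrative only.
    baseline_diameter_cm: float
    thin_section_scan_count: int
    baseline_scan: date
    last_preop_scan: date
    surgery: date
    treated_before_surgery: bool          # e.g. radiotherapy or chemotherapy
    missing_clinical_or_image_data: bool
    display_affected_by_artifacts: bool


def is_eligible(n: NoduleRecord) -> bool:
    """Return True if the nodule meets all selection criteria and no exclusion criterion."""
    included = (
        n.baseline_diameter_cm <= 3.0
        and n.thin_section_scan_count >= 2
        and (n.last_preop_scan - n.baseline_scan).days >= 180
        # Assumption: "within one month" is taken as 0 to 30 days.
        and 0 <= (n.surgery - n.last_preop_scan).days <= 30
    )
    excluded = (
        n.treated_before_surgery
        or n.missing_clinical_or_image_data
        or n.display_affected_by_artifacts
    )
    return included and not excluded


example = NoduleRecord(
    baseline_diameter_cm=1.2,
    thin_section_scan_count=3,
    baseline_scan=date(2020, 1, 1),
    last_preop_scan=date(2020, 7, 15),
    surgery=date(2020, 8, 1),
    treated_before_surgery=False,
    missing_clinical_or_image_data=False,
    display_affected_by_artifacts=False,
)
print(is_eligible(example))
print(is_eligible(replace(example, treated_before_surgery=True)))
```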

According to the above selection and exclusion criteria, 176 GGNs from 172 patients at the 4 hospitals were selected as samples, giving a sample set of 176 GGNs. Of the 172 patients, 112 were female and 60 were male, with a median age of 55 years and an age range of 13 to 83 years; 31 patients were from the first hospital, 26 from the second hospital, 33 from the third hospital, and the remaining 82 from the fourth hospital. According to one embodiment of the present invention, the 90 GGNs of the 90 patients from the first, second, and third hospitals are used as training samples, giving a training set of 90 GGNs; the CT images of the GGNs in the training set are CT images from multiple chest examinations of multiple patients. The 86 GGNs of the 82 patients from the fourth hospital are used as test samples, giving a validation set of 86 GGNs. The sample set comprises the training set and the validation set.

According to one embodiment of the present invention, CT images were acquired for all GGNs used as samples. The acquired CT images are stored in Digital Imaging and Communications in Medicine (DICOM) format in a picture archiving and communication system (PACS) for retrieval. The CT images of the GGNs can be acquired with the following scanners: GE MEDICAL SYSTEMS Discovery HD750CT, GE MEDICAL SYSTEMS Optima CT670, Philips Brilliance iCT, Philips IngenuityCT, SIEMENS SOMATOM Definition AS, and United Imaging CTu510.

For the acquired CT images of the GGNs, image segmentation was performed by two physicians: a first radiologist with 5 years of experience in chest CT diagnosis and a second radiologist with 3 years of experience in chest CT diagnosis. The first radiologist delineated the three-dimensional region of interest (3D-ROI) of every sample in the sample set, that is, of all 176 GGNs. Then 25 samples were randomly selected from the sample set, and the second radiologist delineated their 3D-ROIs again one month later. When delineating the 3D-ROIs of the 25 randomly selected samples, the second radiologist used a lung window with a window width of 1400 HU and a window level of -600 HU and outlined each GGN slice by slice in the axial plane to obtain the 3D-ROI.
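For reference, a window width of 1400 HU centred on a level of -600 HU displays attenuation values from -1300 HU to 100 HU. The small sketch below shows that arithmetic; the function names are illustrative and are not part of the application.

```python
def window_bounds(width_hu, level_hu):
    """Return the (lowest, highest) HU values shown by a display window."""
    half = width_hu / 2
    return level_hu - half, level_hu + half


def apply_window(hu, width_hu=1400.0, level_hu=-600.0):
    """Map a HU value to a display intensity in [0, 1], clipping outside the window."""
    low, high = window_bounds(width_hu, level_hu)
    clipped = min(max(hu, low), high)
    return (clipped - low) / (high - low)


print(window_bounds(1400, -600))  # (-1300.0, 100.0)
print(apply_window(-600))         # 0.5, the window level maps to mid-grey
```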

The 3D-ROIs of the 25 randomly selected samples were checked and corrected by a third radiologist to exclude cavities, calcifications, and adjacent structures such as blood vessels, bronchi, and pleura. The third radiologist is a senior physician with 19 years of experience in chest imaging diagnosis.

The CT images and 3D-ROIs of the 176 GGNs and the CT images and 3D-ROIs of the 25 randomly selected GGNs serve as sample data sets. Each sample in a sample data set consists of the CT image and 3D-ROI of one GGN. The CT images and 3D-ROIs of the 176 GGNs are used for feature extraction and model construction, and the CT images of the 25 GGNs are used for consistency evaluation.

Subsequently, feature data is generated for each sample from its sample data (the CT images and 3D-ROIs of the 176 GGNs). The feature data is the result of evaluating the CT image of the GGN and includes location, size, density, shape, margin, tumor-lung interface, internal features, and adjacent structures.

Location includes the position of the GGN, with the lung divided into thirds from the hilum, and the lobe in which the GGN lies. The position is classified as inner third, middle third, or outer third. The lobes are the right upper lobe, right middle lobe, right lower lobe, left upper lobe, and left lower lobe. Location features are assigned according to the zone in which the nodule lies.

Size includes the maximum cross-sectional diameter of the GGN and the minimum cross-sectional diameter perpendicular to it;

Density distinguishes the pure ground-glass nodule (pGGN), whose ground-glass component is visible in the lung window but not in the mediastinal window, from the part-solid nodule (PSN), which contains both solid and ground-glass components;

Shape is either irregular or round/nearly round;

Margin features include lobulation, spiculation, and spinous protrusion; a spinous protrusion is a structure that extends from the lesion and, unlike the boundary with the lung parenchyma, has at least one convex edge;

The tumor-lung interface is one of three types: ill-defined; well-defined and smooth; or well-defined but coarse;

Internal features include vacuoles, air spaces/cystic spaces, cavitation, calcification, bronchial cut-off, and bronchial distortion/dilatation;

Adjacent structures include the pleural indentation sign and the vascular convergence sign;

The feature data also records whether the patient's bronchial walls are thickened and whether emphysema is present.

The clinical and morphological features of the GGNs differed between the training set and the validation set as follows: compared with the validation set, the training set had a higher proportion of spiculation (P=0.009) and of well-defined but coarse tumor-lung interfaces (P<0.001).

The present invention also determines the calculated doubling time (VDT) of each GGN. Specifically, the VDT is computed from the time interval T between the baseline CT scan of the GGN and the last CT scan before surgery, the volume V_i of the GGN at the baseline CT scan, and the volume V_f of the GGN at the last CT scan before surgery, according to the following formula:

VDT = (log 2 × T) / log(V_f / V_i)

According to one embodiment of the present invention, a VDT of 400 days can be used as a threshold to divide the GGNs into two groups: VDT>400 days and VDT≤400 days. For ground-glass nodules, CT follow-up at intervals of 3 to 12 months can be recommended when VDT>400 days, whereas further intervention is recommended when VDT≤400 days.
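The 400-day grouping can be sketched as a simple threshold rule. The group labels and function name are illustrative. The application does not state how a non-positive VDT (a stable or shrinking nodule under the formula) is grouped, so this sketch rejects such values rather than guess.

```python
VDT_THRESHOLD_DAYS = 400

# Recommendations as stated in the embodiment.
RECOMMENDATION = {
    "VDT>400": "CT follow-up at intervals of 3 to 12 months",
    "VDT<=400": "further intervention",
}


def vdt_group(vdt_days):
    """Assign a nodule to the VDT>400 or VDT<=400 group."""
    if vdt_days <= 0:
        # Assumption: grouping of non-positive VDT values is not defined in the source.
        raise ValueError("grouping is not defined for non-positive VDT")
    return "VDT<=400" if vdt_days <= VDT_THRESHOLD_DAYS else "VDT>400"


for vdt in (250.0, 400.0, 900.0):
    group = vdt_group(vdt)
    print(vdt, group, RECOMMENDATION[group])
```

These two groups are the binary labels on which the classification models described later are trained and evaluated by AUC.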

In the sample set, compared with the VDT>400 days group, patients in the VDT≤400 days group were older (P=0.001) and more often had concomitant emphysema (emphysema in the lobe containing the nodule, P<0.001; emphysema in the remaining lung, P=0.011). Their GGNs had a larger maximum cross-sectional diameter (P=0.013) and a more irregular shape (P=0.009), and showed lobulation (P=0.028), spiculation (P=0.017), and the bronchial cut-off sign (P=0.014) more frequently.

Subsequently, step 120 is performed: radiomics feature extraction is carried out on the CT images according to the feature data to obtain the first feature set.

The present invention extracts two groups of radiomics features from each training sample: features computed on the preprocessed image, and higher-order features computed on filtered versions of the image.

Before radiomics feature extraction, the CT images of the GGNs are preprocessed: the voxels are resampled to a resolution of 1×1×1 mm³, and the image gray levels are discretized with a bin width of 25.

Radiomics features can be extracted from the preprocessed CT images with the Pyradiomics software (version 3.0.1, https://pyradiomics.readthedocs.io/en/latest/), which conforms to the International Symposium on Biomedical Imaging (ISBI) standard; first-order, shape, and second-order texture features are extracted.

For the higher-order features, Laplacian of Gaussian (LoG; sigma: 2-5) filtering and wavelet transforms are applied to the CT images, and higher-order image features are extracted. In total, 1158 radiomics features are obtained for each GGN.
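The preprocessing and filter settings described above can be written as a parameter dictionary keyed by PyRadiomics customization names (`resampledPixelSpacing`, `binWidth`, and the `Original`, `LoG`, and `Wavelet` image types). This is a sketch under assumptions: the application gives the LoG sigma only as the range 2-5, so the integer steps below are a guess, and settings the application does not mention are left at the library defaults.

```python
# Settings named in the description: 1 x 1 x 1 mm resampling and a bin width of 25.
settings = {
    "resampledPixelSpacing": [1, 1, 1],  # mm
    "binWidth": 25,
}

# Image types on which features are computed.
image_types = {
    "Original": {},
    "LoG": {"sigma": [2.0, 3.0, 4.0, 5.0]},  # assumption: integer steps across 2-5
    "Wavelet": {},
}

params = {"setting": settings, "imageType": image_types}

# With PyRadiomics installed, a dictionary of this shape is intended to be
# passed to radiomics.featureextractor.RadiomicsFeatureExtractor, whose
# execute(image_path, mask_path) call returns the features for one ROI.
# That call is not run here and should be checked against the installed version.
print(params)
```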

To select robust and reproducible features from those extracted, the intraclass correlation coefficient (ICC) was used to assess the intra-observer and inter-observer agreement of the radiomics features extracted from the 25 GGNs delineated by the first and second radiologists, and Dice coefficients were calculated to assess the intra-observer and inter-observer volumetric agreement of the segmentations.

In the agreement analysis, the median intra-observer ICC of the 1158 radiomics features was 0.93 (25th and 75th percentiles: 0.86, 0.97), and the median inter-observer ICC was 0.89 (25th and 75th percentiles: 0.77, 0.95). The 900 features with ICC>0.75 in the inter-observer agreement analysis were retained for subsequent analysis. The intra-observer and inter-observer Dice coefficients were 0.88 [0.85, 0.90] and 0.82 [0.78, 0.83] respectively, indicating that the agreement of the volume segmentation was acceptable.
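The two checks described above can be illustrated briefly: the Dice coefficient of two segmentations, and retention of features whose ICC exceeds 0.75. In this sketch a mask is represented as a set of voxel coordinates, the ICC values are assumed to have been computed elsewhere, and the feature names are placeholders.

```python
def dice(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) of two voxel-coordinate collections."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def keep_reproducible(icc_by_feature, threshold=0.75):
    """Return the names of features whose ICC is strictly greater than the threshold."""
    return sorted(name for name, icc in icc_by_feature.items() if icc > threshold)


# Two toy segmentations sharing 2 voxels out of 4 and 3.
roi_first = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
roi_second = {(0, 0, 1), (0, 1, 1), (0, 1, 2)}
print(dice(roi_first, roi_second))  # 4/7

# Placeholder ICC values for three hypothetical features.
print(keep_reproducible({"feature_a": 0.93, "feature_b": 0.75, "feature_c": 0.60}))
```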

Subsequently, the present invention applies ComBat to the samples from each hospital for multi-center calibration, to reduce the effect of batch differences in the data between centers. The samples are grouped by hospital of origin: the cz center (first hospital), the dy center (second hospital), the wf center (third hospital), and the zj center (fourth hospital).

图3示出了根据本发明一个示范性实施例的数据校准前和校准后特征的均值和标准差的变化示意图。FIG. 3 shows a schematic diagram of changes in the mean and standard deviation of features before and after data calibration according to an exemplary embodiment of the present invention.

图3中a部分绘制了校准前各中心的均值±标准差，其中，cz中心：-0.024±3.640；dy中心：-0.052±2.997；wf中心：0.025±2.680；zj中心：-0.378±4.818。Part a of Figure 3 plots the mean ± standard deviation of each center before calibration, among which, cz center: -0.024±3.640; dy center: -0.052±2.997; wf center: 0.025±2.680; zj center: -0.378±4.818.

Part b of FIG. 3 plots the mean ± standard deviation for each center after calibration: cz center, -0.102 ± 3.322; dy center, -0.059 ± 3.290; wf center, -0.016 ± 3.290; zj center, -0.120 ± 3.457.
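The idea behind multi-center calibration can be sketched with a much simpler stand-in for ComBat: shift and scale one feature so that every center shares the pooled mean and standard deviation. This is explicitly not ComBat itself, which additionally applies empirical Bayes shrinkage and can preserve covariates; the function name and the toy values are assumptions for illustration only.

```python
from statistics import mean, pstdev


def align_centers(values_by_center):
    """Shift and scale one feature so every center shares the pooled mean and SD.

    Simplified location-scale adjustment; it omits ComBat's empirical Bayes
    shrinkage and covariate handling.
    """
    pooled = [v for values in values_by_center.values() for v in values]
    pooled_mean, pooled_sd = mean(pooled), pstdev(pooled)
    aligned = {}
    for center, values in values_by_center.items():
        center_mean, center_sd = mean(values), pstdev(values)
        # A constant feature within a center cannot be rescaled; only shift it.
        scale = pooled_sd / center_sd if center_sd > 0 else 1.0
        aligned[center] = [pooled_mean + (v - center_mean) * scale for v in values]
    return aligned


# Hypothetical values of one feature at two centers (not data from the patent).
raw = {"cz": [1.0, 2.0, 3.0], "zj": [10.0, 14.0, 18.0]}
aligned = align_centers(raw)
for center, values in aligned.items():
    print(center, round(mean(values), 3), round(pstdev(values), 3))
```

After this adjustment both centers report the same mean and standard deviation for the feature, which is the kind of convergence that parts a and b of FIG. 3 summarize for the real data.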

Subsequently, for the calibrated samples, the present invention retains the features whose ICC exceeds 0.75 in the intra-observer and inter-observer agreement evaluation described above (ICC for the radiomics features, Dice coefficient for the segmentation volumes) and discards the rest. Features with an ICC greater than 0.75 are regarded as reproducible and highly robust.

Subsequently, step 130 is performed: feature dimensionality reduction is carried out on the first feature set to determine the training feature set. Starting from the features that remain after elimination, the minimum redundancy-maximum relevance (MRMR) method is used for dimensionality reduction. Because the number of features affects model performance, MRMR generates four feature sets containing 3, 6, 9, and 12 features, respectively. The mean area under the receiver operating characteristic (ROC) curve across seven initial models is used to determine the optimal number of features and the final feature set. The seven initial models are: random forest (RF), support vector machine with linear kernel (svmLinear), support vector machine with radial basis function kernel (svmRadial), extreme gradient boosting with linear booster (xgbLinear), k-nearest neighbors (KNN), artificial neural network (NNet), and naive Bayes (NB).
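The greedy selection behind MRMR can be sketched as follows. This is a minimal illustration that uses absolute Pearson correlation for both relevance and redundancy; the text does not state which measure was used (mutual information is a common alternative), so the measure, the function names, and the toy data are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_x * var_y)


def mrmr_select(features, target, k):
    """Greedy mRMR: at each step pick the feature with the highest
    relevance minus mean redundancy against the already selected ones."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected, remaining = [], sorted(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: f_b nearly duplicates f_a, f_c is less redundant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_a": [0, 0, 0, 1, 1, 1, 1, 1],
    "f_b": [0, 0, 0, 1, 1, 1, 1, 2],
    "f_c": [1, 0, 0, 0, 1, 1, 1, 0],
}
print(mrmr_select(features, target, 2))  # ['f_a', 'f_c']
```

In the toy data the near-duplicate f_b is more relevant than f_c on its own, yet f_c is chosen second because it adds less redundancy; calling the function with k set to 3, 6, 9, and 12 would give the four candidate sets described above.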

According to an embodiment of the present invention, FIG. 4 is a schematic diagram of the average AUC calculated for the four feature sets. As shown in FIG. 4, among the four MRMR-generated feature sets of different sizes, averaged over the seven initial models, the set containing 3 features has the highest average AUC (AUC = 0.7984). The average AUCs of the feature sets containing 3, 6, 9, and 12 features are 0.7984, 0.7971, 0.7776, and 0.7732, respectively. Therefore, the present invention selects the feature set containing 3 features for subsequent model training.

Subsequently, step 140 is performed: the initial model is trained on the training feature set and the calculated doubling time to obtain the prediction model.

Using the determined feature set, each initial model is trained with nested cross-validation on the training samples of the training set; the GGN CT images in the training set come from multiple chest examinations of multiple patients. For each initial model, the inner loop is 5-fold cross-validation, used to tune the model parameters and prevent overfitting, and the outer loop is leave-group-out cross-validation (LGOCV) repeated 100 times, used to test the stability of the model. In each of the 100 LGOCV repetitions, the training set is split into a training subset, used to train the initial model, and a test subset, used to test the trained model and obtain performance metrics, including accuracy, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the ROC curve (AUC). The 100 LGOCV repetitions yield 100 values of each performance metric, and the relative standard deviation (RSD) is used to evaluate the stability of each model, calculated as follows:
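The index bookkeeping of this nested scheme can be sketched as below. The model names used in this document (svmLinear, xgbLinear, and so on) match method names in the R caret package, where LGOCV denotes repeated random train/test splits; whether caret was actually used, the 80% training fraction, and all function names here are assumptions made for illustration.

```python
import random


def lgocv_splits(n_samples, n_repeats=100, train_fraction=0.8, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random hold-out splits."""
    rng = random.Random(seed)
    n_train = int(round(n_samples * train_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


def kfold(indices, k=5):
    """Yield (fit_idx, val_idx) pairs for the inner k-fold tuning loop."""
    for fold in range(k):
        val = [idx for pos, idx in enumerate(indices) if pos % k == fold]
        fit = [idx for pos, idx in enumerate(indices) if pos % k != fold]
        yield fit, val


# Outer loop: 100 random splits of a hypothetical 50-sample training set.
splits = list(lgocv_splits(n_samples=50, n_repeats=100, train_fraction=0.8, seed=0))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 100 40 10

# Inner loop: 5 folds inside the first outer training subset.
inner = list(kfold(splits[0][0], k=5))
print(len(inner), len(inner[0][1]))  # 5 8
```

In a full pipeline, hyperparameters would be tuned on the inner folds, the model refit on the outer training subset, and one set of metrics recorded per outer repetition.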

$$\mathrm{RSD}\% = \frac{\mathrm{SD}_{\mathrm{metric}}}{\mathrm{Mean}_{\mathrm{metric}}} \times 100\%$$

where $\mathrm{SD}_{\mathrm{metric}}$ is the standard deviation of the 100 metric values, $\mathrm{Mean}_{\mathrm{metric}}$ is the mean of the 100 metric values, and the metric is either accuracy or AUC. A lower RSD indicates a more stable model.
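The RSD formula can be computed directly from a list of per-repetition metric values. In this minimal sketch the sample standard deviation is used; the text does not say whether the sample or population form was intended, so that choice, the function name, and the toy values are assumptions.

```python
from statistics import mean, stdev


def relative_standard_deviation(values):
    """RSD% = standard deviation / mean * 100, using the sample SD."""
    m = mean(values)
    if m == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return stdev(values) / m * 100.0


# Hypothetical AUC values from three repetitions (the text uses 100).
auc_values = [0.8, 0.9, 1.0]
print(round(relative_standard_deviation(auc_values), 2))  # 11.11
```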

Finally, the model with good predictive performance and high stability is selected as the final model and evaluated on an independent validation set. The accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the ROC curve (AUC) of the model are calculated.
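The threshold-based metrics listed here all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function names and the counts are hypothetical and are not the validation-set counts of this application.

```python
def _ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "ppv": _ratio(tp, tp + fp),           # positive predictive value
        "npv": _ratio(tn, tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```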

According to an embodiment of the present invention, statistical analysis may be performed using R software (version 4.2.0, https://cran.r-project.org/) and SPSS 23.0 for Windows (SPSS, Chicago, Illinois). Categorical variables are analyzed with the χ² test or Fisher's exact test, and continuous variables with the Mann-Whitney test. ROC analysis is used to assess model performance, with a larger AUC indicating better performance; a two-tailed P value < 0.05 is considered statistically significant.
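The AUC used throughout can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. A minimal sketch of that pairwise computation follows; the function name and the scores are hypothetical.

```python
def auc_from_scores(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores for 3 positive and 4 negative cases.
positives = [0.9, 0.7, 0.4]
negatives = [0.8, 0.3, 0.2, 0.4]
print(auc_from_scores(positives, negatives))  # 9.5 correct pairs out of 12
```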

FIG. 5 is a schematic diagram of the performance of the seven initial models after training according to an exemplary embodiment of the present invention. As shown in FIG. 5, in the nested cross-validation results the xgbLinear model has the best predictive performance, with an accuracy of 0.890 and an AUC of 0.896; the NNet model is second, with an accuracy of 0.865 and an AUC of 0.886; and the NB model has the lowest predictive performance of the seven algorithms, with an accuracy of only 0.748 and an AUC of only 0.790. The detailed performance parameters are shown in Table 1:

Table 1

Data in the table are mean ± standard deviation.

To select the best of the seven models, their stability was also evaluated.

FIG. 6 is a schematic diagram of the stability test results for the seven models according to an exemplary embodiment of the present invention. As shown in FIG. 6, the left panel shows the RSD of the mean accuracy and the right panel shows the RSD of the mean AUC. A smaller abscissa value indicates more stable predictions by the algorithm; a larger ordinate value indicates a higher mean accuracy or mean AUC of the model.

The results show that the NNet model is the most robust to data perturbation, with RSDs of 11.9% for accuracy and 10.9% for AUC, whereas the xgbLinear model, which has the best predictive performance, has RSDs of 14.4% for accuracy (ranked third) and 14.9% for AUC (ranked second). The RSD of accuracy for the NB model (24.5%) and the RSD of AUC for the svmRadial model (22.2%) are the highest, indicating the poorest robustness.

The stability test results for the different models are shown in Table 2:

Table 2

As shown in Table 2, a lower RSD indicates a more stable model.

Furthermore, the mean AUC of the NNet model was compared with those of the other models. Although the mean AUC of the NNet model was slightly lower than that of the xgbLinear model, the difference was not statistically significant (P = 0.851). The comparison of the mean AUC of the NNet model with those of the other models is shown in Table 3:

Table 3

In summary, considering both predictive performance and robustness, the present invention selects the most robust model, NNet, as the prediction model and further evaluates it on an independent validation set.

FIG. 7a is a schematic diagram of the ROC curve of the NNet model according to an exemplary embodiment of the present invention; FIG. 7b is a schematic diagram of the prediction confusion matrix of the NNet model according to an exemplary embodiment of the present invention.

As shown in FIG. 7a and FIG. 7b, the ROC analysis and the prediction confusion matrix show that the AUC of the model is 0.709 (95% CI: 0.515 to 0.879), and the accuracy, sensitivity, specificity, PPV, and NPV are 0.756, 0.667, 0.766, 0.250, and 0.952, respectively.

Finally, step 150 is performed: the CT image of a single chest examination of the patient to be predicted is input into the prediction model to obtain the predicted doubling time of that patient's pulmonary nodule. The prediction model of the present invention is trained on CT images from multiple chest examinations, but at prediction time only the CT image of a single chest examination is needed to predict the doubling time. This can guide evidence-based examination and management, reduce the radiation dose from repeated follow-up examinations, shorten the patient's examination cycle, reduce the number of hospital visits, and improve the patient's experience of care.

The NNet model trained according to the present invention was used to make predictions on the CT images of pulmonary nodules from 2 patients in the validation set. FIG. 8a and FIG. 8b are schematic diagrams of the baseline CT scan and the last CT scan before surgery, respectively, of the first patient according to an exemplary embodiment of the present invention. FIG. 8c and FIG. 8d are schematic diagrams of the baseline CT scan and the last CT scan before surgery, respectively, of the second patient according to an exemplary embodiment of the present invention.

The first patient is a 65-year-old man. As shown in FIG. 8a, the baseline CT shows an mGGN in the left upper lobe. As shown in FIG. 8b, the last preoperative follow-up CT shows that the nodule has grown markedly and its solid component has increased. The interval between the two scans was 343 days, the doubling time predicted by the model was ≤ 400 days, the actual calculated VDT was 267 days, and pathology confirmed invasive adenocarcinoma.

The second patient is a 55-year-old woman. As shown in FIG. 8c, the baseline CT shows a pGGN in the right middle lobe. As shown in FIG. 8d, the last preoperative follow-up CT shows no obvious change in the nodule. The interval between the two scans was 273 days, the doubling time predicted by the model was > 400 days, the actual calculated VDT was 1000 days, and pathology confirmed adenocarcinoma in situ.
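The calculated VDT values in these two cases come from the formula given in claim 5, VDT = (log 2 × T) / log(V_f / V_i). The sketch below implements that formula and its inverse; the function names are assumptions, and since the nodule volumes of the two patients are not reported, only the growth ratio implied by the reported interval and VDT is derived.

```python
from math import log


def volume_doubling_time(interval_days, volume_initial, volume_final):
    """VDT = (log 2 * T) / log(V_f / V_i); any logarithm base works if used consistently."""
    if volume_initial <= 0 or volume_final <= 0:
        raise ValueError("volumes must be positive")
    if volume_final == volume_initial:
        return float("inf")  # no growth: the nodule never doubles
    return interval_days * log(2) / log(volume_final / volume_initial)


def implied_growth_ratio(interval_days, vdt_days):
    """Inverse of the VDT formula: V_f / V_i = 2 ** (T / VDT)."""
    return 2.0 ** (interval_days / vdt_days)


# A nodule that exactly doubles over the interval has VDT equal to the interval.
print(volume_doubling_time(343, 100.0, 200.0))

# Growth ratios implied by the intervals and VDTs reported for the two cases.
print(round(implied_growth_ratio(343, 267), 2))   # first patient, about 2.4-fold
print(round(implied_growth_ratio(273, 1000), 2))  # second patient, about 1.2-fold
```

Under this formula a shrinking nodule gives a negative VDT, so a caller that dichotomizes at 400 days would need to decide how to treat non-growing nodules.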

It should be noted that the storage medium (computer-readable medium) described above in this application may be a computer-readable signal medium, a non-transitory computer-readable storage medium, or any combination of the two. A non-transitory computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of non-transitory computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

In this application, a non-transitory computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, in this application, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a non-transitory computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wire, optical cable, RF (radio frequency), or any suitable combination of the above.

The computer-readable medium may be included in the electronic device described above, or it may exist separately without being assembled into the electronic device.

Computer program code for performing the operations of this application may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.

The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of this application may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.

The functions described above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), and the like.

The above description covers only some embodiments of this application and explains the technical principles applied. Those skilled in the art should understand that the scope of disclosure of this application is not limited to technical solutions formed by the specific combinations of the technical features described above, and also covers other technical solutions formed by any combination of those technical features or their equivalents without departing from the disclosed concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features of similar function disclosed in this application.

Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that they be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limiting the scope of this application. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments, separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims

1. A method of predicting lung nodule doubling time based on a single chest CT examination, adapted for execution in a computing device, the method comprising the steps of:

generating feature data for each sample in a pre-acquired sample set, the sample set comprising a plurality of lung nodules, the feature data comprising a location, a size, a density, a morphology, an edge, a tumor lung interface, internal features, an adjacent structure, and a calculated doubling time of the lung nodules;

performing radiomics feature extraction on the CT image according to the feature data to obtain a first feature set;

performing feature dimension reduction according to the first feature set to determine a training feature set;

training an initial model according to the training feature set and the calculated doubling time to obtain a prediction model;

and inputting a CT image of a single chest examination of a patient to be predicted into the prediction model to obtain the predicted doubling time of the lung nodule of the patient to be predicted.

2. The method of claim 1, wherein the samples in the sample set are determined based on selection criteria and exclusion criteria.

3. The method of claim 2, wherein the selection criteria are criteria that a lung nodule must meet to be used as a sample, comprising:

the diameter of a lung nodule in baseline CT scanning is less than or equal to 3cm;

performing thin layer CT scanning for more than two times;

the time interval between the baseline CT scan and the last CT scan prior to surgery is greater than or equal to 180 days;

surgery was performed within one month after the last CT scan.

4. A method according to claim 2 or 3, wherein the exclusion criteria are criteria for lung nodules that cannot be taken as a sample, comprising:

nodules that received any treatment before surgery, including radiotherapy or chemotherapy;

nodules lacking clinical information or missing nodule image data;

nodules whose display is affected by image artifacts.

5. The method of claim 1, wherein the generating feature data for each sample in the set of pre-acquired samples comprises:

according to the time interval from the baseline CT scan of the GGN to the last CT scan before surgery, the volume of the GGN at the baseline CT scan, and the volume of the GGN at the last CT scan before surgery, the doubling time is calculated according to the following formula:

$$\mathrm{VDT} = \frac{\log 2 \times T}{\log\left(V_f / V_i\right)}$$

wherein $T$ is the time interval from the baseline CT scan of the GGN to the last CT scan before surgery, $V_i$ is the volume of the GGN at the baseline CT scan, and $V_f$ is the volume of the GGN at the last CT scan before surgery.

6. The method of claim 1, wherein performing radiomics feature extraction on the CT image according to the feature data to obtain a first feature set comprises:

performing Laplacian of Gaussian filtering and wavelet transformation on the CT image, and extracting higher-order features of the image to obtain a radiomics feature set;

extracting first-order, shape, and second-order texture features from the CT image to obtain radiomics features;

and performing intra-observer and inter-observer consistency tests on the radiomics features using the intraclass correlation coefficient, and selecting, from the radiomics feature set, the radiomics features that meet preset conditions in the consistency tests as the first feature set.

7. The method of claim 1, wherein performing feature dimension reduction according to the first feature set to determine a training feature set comprises:

obtaining a plurality of preset training feature sets from the features in the first feature set by a minimum redundancy-maximum relevance method, wherein the number of features included in each preset training feature set is different;

and calculating an AUC for each preset training feature set according to the initial model, and taking the preset training feature set with the largest AUC as the training feature set, wherein the AUC is the average area under the receiver operating characteristic curve.

8. The method of claim 1, wherein the initial model comprises: random forests, linear kernel support vector machines, radial basis function kernel support vector machines, linear extreme gradient boosting, K nearest neighbors, artificial neural networks, and naive bayes.

9. The method of any of claims 1-8, wherein training the initial model according to the training feature set and the calculated doubling time to obtain a prediction model comprises:

using the determined training feature set, and performing nested cross-validation training on each initial model according to training samples of the training set in the sample set;

and determining a prediction model from the initial model according to the AUC, accuracy and stability of each model in the nested cross-validation training result.

10. A computing device, comprising:

one or more processors;

a memory; and

one or more programs comprising instructions for performing any of the methods of claims 1-9.

11. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-9.