CN112506797B - A performance testing method for medical image recognition system - Google Patents
- Publication number: CN112506797B (application CN202011525218.0A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06F11/3684: Test management for test design, e.g. generating new test cases
- G06F11/3688: Test management for test execution, e.g. scheduling of test suites
Description
Technical Field

The invention belongs to the technical field of performance analysis for medical image recognition systems, and in particular relates to a performance testing method for a medical image recognition system.
Background Art

Medical image recognition systems play an important role in clinical diagnosis: they have greatly changed the clinical diagnostic workflow and promoted the development of clinical medicine. Intelligent medical image recognition applies artificial-intelligence techniques to analyze and process surgical videos and the scans produced by common imaging modalities such as X-ray, computed tomography, and magnetic resonance imaging; its main development directions include intelligent image diagnosis, 3D image reconstruction and registration, and intelligent surgical-video analysis. Research in this field has made considerable progress and is gradually moving toward clinical application, so evaluating and testing the performance of medical image recognition systems is particularly important for the future development of clinical medicine. FERET was the first effort to establish a performance benchmark for recognition algorithms; it defined a series of evaluation standards and protocols that greatly advanced recognition technology and still influence face recognition research today. However, although some test schemes exist for general image recognition systems, no test scheme dedicated to medical image recognition systems has yet been proposed. Moreover, because recognition technology was still immature at the time, the systems evaluated in FERET were mostly university-laboratory prototypes whose recognition performance was not very satisfactory.

In recent years, demand has grown for test methods that analyze recognition models and evaluate their performance automatically. With the rapid development of deep learning, the performance of medical image recognition systems has improved quickly, greatly increasing recognition efficiency, so how to test these models urgently needs to be addressed. When testing a medical image recognition system, the visual authenticity of the generated test images must be considered; this invention therefore proposes an adversarial-sample generation network together with an entity/background recombination method to fully ensure the authenticity of the generated samples. Because medical recognition systems also face stricter reliability and security requirements, a multi-angle test scheme is proposed, and adversarial samples are applied to medical images to analyze medical recognition models more thoroughly.
Summary of the Invention

The present invention provides a performance testing method for a medical image recognition system to solve the problems in the prior art.

To achieve the above object, the present invention adopts the following technical solution:

A performance testing method for a medical image recognition system, comprising: a multi-category image test data generation module, which includes an adversarial-sample generation network and an entity/background recombination method; a multi-angle test module, which includes a performance test, a reliability test, and a security test; and a decision evaluation module, which analyzes the input test results, judges model performance, and produces a detailed test report.

The network receives a set of images to be classified. The images are first fed into the multi-category image test data generation module; after image augmentation they are passed to the model for classification, and the classification results are fed into the multi-angle test module. The multi-angle test module tests the model's learning results and passes them to the decision evaluation module, which analyzes the input test results, judges model performance, and produces a detailed test report.
Further, in the adversarial-sample generation network and entity/background recombination method, adversarial augmentation uses multi-loss mixed adversarial camouflage. The multi-loss function L is expressed as:

L = λ·L_adv + L_style + L_content + L_smooth    (1)

where λ is the adversarial strength, L_adv is the adversarial loss, L_style is the style loss used for style generation, L_content is the content loss used to preserve the content of the source image, and L_smooth is the smoothness loss used to ensure the smoothness of the augmented samples.

The user defines an existing image, a target attack region, and an expected target style; the desired style is generated in the desired region, and additional physical-adaptation training is applied to the generated augmented sample at each step.
The style distance between two images is defined by the difference between their style representations:

L_style = Σ_{l∈S_l} || G(F_l(x′)) − G(F_l(x_s)) ||²    (2)

where ||·||² is the feature distance, l is a style layer, S_l is the set of style layers used to extract the style representation, F_l is the style feature extractor at layer l, G(·) is the Gram matrix of the deep features extracted from the set of style layers, x_s is the style reference image, and x′ is the generated adversarial sample.
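As a concrete illustration, a Gram-matrix style distance of this kind can be sketched in plain NumPy. The random "activations" and layer shapes below are placeholders for what a real style feature extractor F_l would produce; this is a minimal sketch, not the patent's implementation:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (channels, height*width) feature map."""
    c, hw = features.shape
    return features @ features.T / (c * hw)

def style_distance(feats_adv, feats_style):
    """Sum of squared Gram-matrix differences over the style layers."""
    return sum(
        np.sum((gram_matrix(fa) - gram_matrix(fs)) ** 2)
        for fa, fs in zip(feats_adv, feats_style)
    )

# Placeholder activations for two style layers (not a real extractor).
rng = np.random.default_rng(0)
layer_shapes = [(8, 64), (16, 32)]
feats_x_adv = [rng.normal(size=s) for s in layer_shapes]
feats_x_style = [rng.normal(size=s) for s in layer_shapes]

d = style_distance(feats_x_adv, feats_x_style)   # distance to the style reference
d_self = style_distance(feats_x_adv, feats_x_adv)  # distance to itself is zero
```

Because the Gram matrix discards spatial arrangement and keeps only channel correlations, matching Gram matrices matches texture and style rather than content.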
Because the style loss L_style drives style generation, the content of the augmented image rendered in the reference style can differ substantially from that of the original image; a content loss is therefore used to preserve the source content:

L_content = Σ_{t∈c_t} || F_t(x′) − F_t(x) ||²    (3)

where L_content is the content loss, t is a content layer, c_t is the set of content layers used to extract the content representation, F_t is the content-layer feature extractor, x is the original image, and x′ is the generated adversarial sample.
The smoothness of the augmented image is improved by reducing the variation between adjacent pixels. For the augmented image, the smoothness loss is defined as:

L_smooth = Σ_{i,j} ( (x′_{i,j} − x′_{i+1,j})² + (x′_{i,j} − x′_{i,j+1})² )    (4)

where x′_{i,j} is the pixel value of the adversarial sample at coordinate (i, j), and x′_{i+1,j} and x′_{i,j+1} are the pixel values at its neighboring coordinates (i+1, j) and (i, j+1).
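A smoothness (total-variation-style) loss of this form is straightforward to sketch in NumPy; the toy images below are illustrative only:

```python
import numpy as np

def smoothness_loss(img):
    """Squared differences between each pixel and its right and lower
    neighbours, summed over the image."""
    dh = img[1:, :] - img[:-1, :]   # vertical neighbour differences
    dw = img[:, 1:] - img[:, :-1]   # horizontal neighbour differences
    return float(np.sum(dh ** 2) + np.sum(dw ** 2))

flat = np.full((4, 4), 0.5)                          # perfectly smooth image
noisy = flat + np.indices((4, 4)).sum(0) % 2 * 0.2   # checkerboard noise

l_flat = smoothness_loss(flat)
l_noisy = smoothness_loss(noisy)
```

Minimizing this term during generation penalizes high-frequency artifacts, which is what keeps the augmented sample visually plausible.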
For the adversarial loss L_adv, the following cross-entropy loss is used:

L_adv = log p_y(x′) − log p_{y_adv}(x′)    (5)

where p_{y_adv}(·) and p_y(·) are the probability outputs of the target model F for the label y_adv (the class of the adversarial sample) and the label y (the class of the original image), respectively. Here F denotes the objective function of a generic model; for VGG, for example, F is the fc8 layer, from which the probability outputs over the 1000 classes are obtained.
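One plausible reading of this targeted loss, with softmax probabilities standing in for the model's probability output, can be sketched as follows; the 3-class logits are hypothetical values, not outputs of any real model:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def adv_loss(logits, y_true, y_adv):
    """Targeted adversarial loss: minimizing it pushes probability mass
    away from the true class y_true and toward the target class y_adv."""
    p = softmax(logits)
    return float(np.log(p[y_true]) - np.log(p[y_adv]))

# Toy 3-class logits (hypothetical model outputs).
logits_clean = np.array([4.0, 1.0, 0.0])     # confident in class 0
logits_attacked = np.array([1.0, 4.0, 0.0])  # confident in class 1

l_clean = adv_loss(logits_clean, y_true=0, y_adv=1)
l_attacked = adv_loss(logits_attacked, y_true=0, y_adv=1)
```

The loss is positive while the model still prefers the true class and turns negative once the target class dominates, so gradient descent on it drives the attack.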
Real-world conditions are introduced into the generation of the augmented samples as follows:

L_adv = E_{t∼T, o} [ log p_y(t(x′, o)) − log p_{y_adv}(t(x′, o)) ]    (6)

where o is a random background image sampled from the physical world, t is a random transformation of rotation, resizing, and color shift, T is the set of such transformations, and t(x′, o) denotes the adversarial sample composited with the background o and transformed by t. Because the augmented sample is generated from the original image x and the background image o, it remains essentially legitimate to a human observer.
The target/background recombination augmentation uses the segmentation algorithm Mask R-CNN to separate the target from the background, uses an interpolation algorithm to fill the blank region of the background with pixels, and finally recombines targets and backgrounds at random to achieve image augmentation.
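The recombination step, once a binary mask is available, is a simple masked paste. The sketch below uses a hand-made mask and toy arrays in place of Mask R-CNN output and the inpainted background, purely to show the compositing logic:

```python
import numpy as np

def recombine(target, mask, background):
    """Paste a segmented target onto a new background.
    `mask` is 1 where the target is, 0 elsewhere."""
    return np.where(mask[..., None].astype(bool), target, background)

# Toy 4x4 RGB images; in the real pipeline the mask would come from
# Mask R-CNN and the hole left in the old background would be inpainted.
rng = np.random.default_rng(1)
target = rng.integers(0, 255, size=(4, 4, 3))
background = np.zeros((4, 4, 3), dtype=int)
mask = np.zeros((4, 4), dtype=int)
mask[1:3, 1:3] = 1  # target occupies the centre

combined = recombine(target, mask, background)
```

Shuffling which target is pasted onto which background then yields the randomly recombined augmented images.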
Further, the performance test in the multi-angle test module covers several angles: judgment of the recognition accuracy (Accuracy), judgment of the recognition loss (Loss), and judgment of the metamorphic relation. For both Accuracy and Loss, the model outputs before and after augmentation are subtracted to obtain the accuracy difference percentage Δacc and the loss difference percentage Δloss before and after augmentation.
The metamorphic test is defined as follows: C_i is the classification label assigned by the image recognition system to the original test image and S_i is its confidence score; C_i′ is the classification label of the new test image synthesized using the metamorphic relation and S_i′ is its confidence score. The metamorphic relation is then expressed as:

C_i = C_i′ and ΔS = |S_i − S_i′| < c    (7)

where c is a hyperparameter with 0 < c < 100 (c is set to 50), and ΔS is the difference between the confidence scores before and after augmentation.
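The metamorphic relation of Equation (7) reduces to a two-part predicate; the labels and confidence scores below are illustrative values:

```python
def metamorphic_holds(c_orig, s_orig, c_new, s_new, c_thresh=50):
    """Check the metamorphic relation: classification label unchanged
    and confidence drift below the threshold c."""
    return c_orig == c_new and abs(s_orig - s_new) < c_thresh

ok = metamorphic_holds("tumor", 92.0, "tumor", 88.5)         # small drift
violated = metamorphic_holds("tumor", 92.0, "normal", 40.0)  # label flip
```

A violation on a synthesized test image flags a potential robustness defect even when no ground-truth label exists for the new image, which is the point of metamorphic testing.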
Further, the reliability test in the multi-angle test module is a certified-robustness test: provided the original image x satisfies the confidence guarantee, the model is immune to attacks within a norm ball of radius R:

∀ ε ∈ B(x; R): arg max g(z(x + ε)) = arg max g(z(x))    (8)

where z(·) is the loss function, g(·) is the objective function to be optimized, ∀ denotes "for all", ε is the introduced noise, B(x; R) is the noise set, R is the norm-ball radius (a value infinitesimally close to 0), and x is the original image.

Finally, the robust accuracy (robacc) is defined as:

robacc = N_cert / N    (9)

where N_cert is the number of test samples that are both correctly classified and certified robust within radius R, and N is the total number of test samples.
The security test in the multi-angle test module is a model-invariance test: a random image is selected, a one-pixel perturbation is generated by one of the four methods described below, and the network's sensitivity to that perturbation is measured. The first method, "Crop", randomly selects a square in the original image and resizes it to 224x224 px; the square is then shifted diagonally by one pixel to create a second image that differs from the first by a single-pixel translation. The second method, "Embedding", shrinks the image to a minimum side of 100 px while keeping the aspect ratio, embeds it at a random position inside a 224x224 px image, fills the rest of the image with black pixels, and then shifts the embedding position by a single pixel to create a second image identical except for that shift. The third method shrinks and embeds the image in the same way and then applies a simple inpainting algorithm in which each black pixel is replaced by the weighted average of the non-black pixels in its neighborhood. The fourth method follows the same protocol as the second, but instead of moving the embedding position it keeps the position fixed and changes the size of the embedded image by a single pixel (for example, from 100x100 px to 101x101 px).
Further, in the security test, sensitivity is measured in two ways as the model-invariance metric. The first, P(Top-1 Change), is the probability that the network's top-1 prediction changes after the single-pixel perturbation. The second, the mean absolute change (MAC), is the mean absolute change in the probability that the network assigns to the top class, i.e. the class with the highest probability in the first of the two frames, after the single-pixel perturbation.
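Both sensitivity measures operate on pairs of probability vectors, one pair per (image, one-pixel-shifted image). A minimal sketch over hypothetical softmax outputs:

```python
import numpy as np

def sensitivity_metrics(probs_a, probs_b):
    """P(Top-1 Change) and MAC over pairs of probability vectors:
    probs_a[k] is the output for frame 1 of pair k, probs_b[k] for frame 2."""
    probs_a = np.asarray(probs_a)
    probs_b = np.asarray(probs_b)
    top1_a = probs_a.argmax(axis=1)
    top1_b = probs_b.argmax(axis=1)
    p_change = float(np.mean(top1_a != top1_b))
    # MAC: mean |change| of the probability of the FIRST frame's top class.
    rows = np.arange(len(probs_a))
    mac = float(np.mean(np.abs(probs_a[rows, top1_a] - probs_b[rows, top1_a])))
    return p_change, mac

# Two hypothetical frame pairs over 3 classes.
before = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]
after = [[0.6, 0.3, 0.1], [0.3, 0.6, 0.1]]  # second pair flips the top-1 class

p_change, mac = sensitivity_metrics(before, after)
```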
Further, the decision evaluation module analyzes the input test results and judges model performance over the following indicators: Accuracy (recognition accuracy after augmentation), Loss (recognition loss after augmentation), Δacc (accuracy difference before and after augmentation), Δloss (loss difference before and after augmentation), CR (model certified robustness, characterized by robacc), ΔS (confidence-score difference before and after augmentation), P(Top-1 Change) (probability that the network's top-1 prediction changes after a single-pixel perturbation), and MAC (mean absolute change); it then produces a detailed test report. When the performance of multiple recognition models is compared, a large set of individual indicators is often too complex for users to judge reasonably. The combined influence of the different indicators on the recognition system is therefore folded into the indicator design, and a composite performance metric CM (Composite Value) is defined to reflect the overall performance of different recognition systems:

CM_i = Σ_{j=1..N} ω_j · (2·max(M_j) − M_ij) / (2·max(M_j) − min(M_j))    (10)

where CM_i is the composite performance value of the i-th recognition system, ω_j is the weight of the j-th performance-indicator value, max(M_j) and min(M_j) are the maximum and minimum values of the j-th indicator over the recognition systems, M_ij is the j-th indicator value of the i-th recognition system, and N is the total number of indicators. The expression (2·max(M_j) − M_ij)/(2·max(M_j) − min(M_j)) normalizes the value of M_ij into the interval [0, 1]. The larger the value of CM, the better the overall performance of the recognition system.

For some performance indicators, such as Loss, P(Top-1 Change), and MAC, a smaller value of M_ij yields a larger composite value CM, so these values can be substituted into the formula directly. For indicators such as recognition accuracy, a larger value indicates better recognition performance, and substituting it directly would lower the CM value, contrary to expectation; the values of these indicators are therefore transformed by substituting (1 − M_ij) for M_ij in the formula.
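The normalization, the (1 − M_ij) substitution for benefit indicators, and the weighted sum can be sketched together; the two-system indicator matrix and the equal weights are hypothetical values for illustration:

```python
import numpy as np

def composite_metric(M, weights, higher_is_better):
    """Composite performance value CM_i for each recognition system.
    M[i, j] is the j-th indicator of system i, assumed in [0, 1].
    Benefit indicators (larger raw value is better, e.g. accuracy)
    are replaced by 1 - M_ij before scoring."""
    M = np.asarray(M, dtype=float).copy()
    for j, hib in enumerate(higher_is_better):
        if hib:
            M[:, j] = 1.0 - M[:, j]
    mx = M.max(axis=0)
    mn = M.min(axis=0)
    scores = (2 * mx - M) / (2 * mx - mn)   # larger score = better
    return scores @ np.asarray(weights)

# Two hypothetical systems; indicators: [accuracy, loss].
M = [[0.95, 0.20],
     [0.80, 0.35]]
cm = composite_metric(M, weights=[0.5, 0.5], higher_is_better=[True, False])
```

System 0 is better on both indicators, so it receives the larger composite value.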
Further, the decision evaluation module finally outputs the results of each test performed by the multi-angle test module and generates the corresponding test-report tables, as shown in Tables 1-3. Because different task scenarios have different requirements, the test system also gives corresponding suggestions.
Compared with the prior art, the present invention has the following beneficial effects:

The invention realizes multi-category image test data generation and a complete multi-angle system test, and finally completes the decision evaluation of the medical image recognition system; it has broad application prospects.
Brief Description of the Drawings

Figure 1 is the framework flowchart of the present invention;
Figure 2 is the flowchart of adversarial augmentation in the present invention;
Figure 3 is the flowchart of target/background recombination augmentation in the present invention;
Figure 4 is the decision evaluation module of the present invention.
Detailed Description of Embodiments

The present invention is further described below in conjunction with the embodiments.
A performance testing method for a medical image recognition system is shown in Figure 1. The method comprises a multi-category image test data generation module, a multi-angle test module, and a decision evaluation module. The multi-category image test data generation module includes the adversarial-sample generation network and the entity/background recombination method; the multi-angle test module includes the performance test, reliability test, and security test; and the decision evaluation module analyzes the input test results, judges model performance, and produces a detailed test report.
For the adversarial-sample generation network and entity/background recombination method, the characteristics of medical images and the visual authenticity of the generated test images are taken into account by combining adversarial-sample generation with entity/background recombination. The adversarial augmentation uses multi-loss mixed adversarial camouflage, a technique that can generate new augmented images that look legitimate to a human observer without relying on large amounts of data to train a generative network. The goal is a mechanism that generates augmented samples with custom styles, using style-transformation techniques for image augmentation and adversarial-attack techniques for image concealment. The final multi-loss function combines the product of the adversarial strength λ and the adversarial loss L_adv, the style loss L_style used for style generation, the content loss L_content used to preserve the source-image content, and the smoothness loss L_smooth used to ensure the smoothness of the augmented samples.
Figure 2 shows an overview of the adversarial augmentation method. The user defines an existing image, a target attack region, and an expected target style, and the desired style is generated in the desired region, as shown on the right side of Figure 2. To make the augmented samples robust to various environmental conditions (including lighting, rotation, etc.), additional physical-adaptation training is applied to the generated samples at each step.
To make the adversarial image samples realizable in the real world, real-world conditions are modeled during the generation of the augmented samples. Because real-world environments usually involve fluctuating conditions such as viewpoint movement, image noise, and other natural transformations, a series of adjustments is used to accommodate these different conditions. In particular, a technique similar to Expectation Over Transformation (EOT) is used; the goal is to improve the adaptability of the augmented samples to different physical conditions. The transformations used to simulate fluctuations in physical-world conditions include rotation, scaling, color shift (to simulate lighting changes), and random backgrounds. These real-world conditions are introduced into the generation of the augmented samples as in Equation (6): the adversarial loss is taken in expectation over the random background image o sampled from the physical world and the random transformations t ∈ T of rotation, resizing, and color shift; because the augmented sample is generated from the original image x and the background image o, it remains essentially legitimate to a human observer.
The target/background recombination augmentation uses the segmentation algorithm Mask R-CNN to separate the target from the background, uses an interpolation algorithm to fill the blank region of the background with pixels, and finally recombines targets and backgrounds at random to achieve image augmentation; the overall framework of the method is shown in Figure 3.
多角度测试模块中的性能测试,包含不同角度:识别准确率Accuracy判别判断,识别损失值 Loss判别判断以及蜕变关系判别;准确率Accuracy和损失值Loss的判断,都是用增广前后的模型输出的准确率Accuracy和识别损失值Loss相减得到的扩充前后识别准确率差值百分比Δacc和扩充前后识别损失差值百分比Δloss;The performance test in the multi-angle test module includes different angles: recognition accuracy rate Accuracy judgment judgment, recognition loss value Loss judgment judgment and transformation relationship judgment; Accuracy rate and loss value Loss judgment are all output from the model before and after augmentation The difference between the recognition accuracy before and after expansion, Δacc, and the difference in recognition loss before and after expansion, Δloss, are obtained by subtracting the accuracy rate Accuracy and the recognition loss value Loss;
The metamorphic test is defined as follows: let Ci be the classification label the image recognition system assigns to the original test image and Si its confidence score; let Ci′ be the classification label of the new test image synthesized via the metamorphic relation, and Si′ its confidence score. The metamorphic relation is then expressed as:

Ci = Ci′ and ΔS = |Si − Si′| < c  (7)

where c is a hyperparameter with 0 < c < 100 (set here to 50), and ΔS is the difference between the confidence scores before and after augmentation.
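The relation in Eq. (7) reduces to a small predicate; the sketch below assumes confidence scores reported on a 0-100 scale, consistent with the 0 < c < 100 constraint:

```python
def metamorphic_relation_holds(label, score, new_label, new_score, c=50):
    """Check the metamorphic relation of Eq. (7): the synthesized test image
    must keep the same class label, and its confidence score (0-100 scale)
    may drift by strictly less than the hyperparameter c."""
    delta_s = abs(score - new_score)
    return label == new_label and delta_s < c

# label preserved and confidence drops from 92 to 61 -> relation holds (c=50)
assert metamorphic_relation_holds("tumor", 92.0, "tumor", 61.0)
# a label change always violates the relation, whatever the scores
assert not metamorphic_relation_holds("tumor", 92.0, "benign", 90.0)
```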
The reliability test in the multi-angle test module is a certified-robustness test: under the condition that the original image x satisfies the confidence guarantee, the model is immune to attacks within the norm ball of radius R:
where z(·) is the loss function, g(·) is the objective function to be optimized, ∀ denotes "for any", ε is the introduced noise, B(x; R) is the noise set, R is the radius of the norm ball (a value approaching 0), and x is the original image.
Finally, the robustness accuracy (robacc) is defined as:
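The robacc formula itself is not reproduced here, but an empirical stand-in can be sketched: count an image as robust only if the prediction stays correct for every sampled perturbation inside the norm ball B(x; R). A true certification procedure would bound this exhaustively; sampling gives only an optimistic estimate, and the function name and sampling scheme below are assumptions:

```python
import numpy as np

def robust_accuracy(predict, images, labels, radius, n_noise=20, seed=0):
    """Empirical stand-in for certified robustness accuracy (robacc):
    an image counts as robust only if `predict` returns the correct label
    for the clean image and for every sampled perturbation whose L2 norm
    is at most `radius`."""
    rng = np.random.default_rng(seed)
    robust = 0
    for x, y in zip(images, labels):
        ok = predict(x) == y
        for _ in range(n_noise):
            if not ok:
                break
            eps = rng.normal(size=x.shape)
            eps *= radius / max(np.linalg.norm(eps), 1e-12)  # project onto the ball
            ok = predict(x + eps) == y
        robust += ok
    return robust / len(images)
```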
The security test in the multi-angle test module is a model-invariance test: a random image is selected, a one-pixel perturbation is produced by one of the four methods described below, and the network's sensitivity to that perturbation is measured. The first method, "Crop", randomly selects a square in the original image and resizes it to 224x224px; the square is then translated diagonally by one pixel to create a second image that differs from the first by a single-pixel shift. The second method, "Embedding", first shrinks the image to a minimum side of 100px while preserving the aspect ratio and embeds it at a random position inside a 224x224px image, filling the remainder with black pixels; the embedding position is then shifted by a single pixel, creating two images that are again identical except for a single-pixel shift. The third method shrinks the image to a minimum side of 100px while preserving the aspect ratio and embeds it at a random position inside a 224x224px image in the same way, but then applies a simple inpainting algorithm in which each black pixel is replaced by a weighted average of the non-black pixels in its neighborhood. The fourth method follows the second protocol, first shrinking the image to a minimum side of 100px, but instead of moving the embedding position it keeps the position fixed and changes the size of the embedded image by a single pixel (for example, from 100x100px to 101x101px).
In the security test, sensitivity is measured in two ways as the model's invariance test. The first, P(Top-1 Change), is the probability that the network's top-1 prediction changes after the single-pixel perturbation. The second, Mean Absolute Change (MAC), measures the mean absolute change in the probability the network assigns to the top class (i.e., the class with the highest probability in the first of the two frames) after the one-pixel perturbation.
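Both invariance metrics can be computed directly from paired softmax outputs (the probability vectors before and after each one-pixel perturbation); the vectorized reading below is an assumed implementation of the two definitions above:

```python
import numpy as np

def sensitivity_metrics(probs_before, probs_after):
    """Compute the two invariance metrics from paired softmax outputs.
    P(Top-1 Change): fraction of pairs whose argmax class flips after
    the one-pixel perturbation.
    MAC: mean absolute change of the probability assigned to the class
    that was top-1 in the first frame of each pair."""
    probs_before = np.asarray(probs_before)
    probs_after = np.asarray(probs_after)
    top1 = probs_before.argmax(axis=1)
    p_change = float(np.mean(top1 != probs_after.argmax(axis=1)))
    idx = np.arange(len(top1))
    mac = float(np.mean(np.abs(probs_before[idx, top1]
                               - probs_after[idx, top1])))
    return p_change, mac
```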
As shown in Figure 4, the decision evaluation module analyzes the input test results and judges model performance over the following indicators: Accuracy (recognition accuracy after augmentation), Loss (recognition loss after augmentation), Δacc (accuracy difference before and after augmentation), Δloss (loss difference before and after augmentation), CR (model robustness, characterized by robacc), ΔS (confidence-score difference before and after augmentation), P(Top-1 Change) (probability that the network's top-1 prediction changes after a single-pixel perturbation), and MAC (mean absolute change); it then produces a detailed test report. When the performance of several recognition models is compared, a large number of individual indicators is often too unwieldy for users to judge sensibly. The combined influence of the different indicators on a recognition system is therefore built into the indicator design, and a composite performance metric CM (Composite Value) is defined to reflect the overall performance of different recognition systems. The formula is as follows:

CMi = Σ(j=1..N) ωj · (2·max(Mj) − Mij) / (2·max(Mj) − min(Mj))
where CMi is the composite performance value of the i-th recognition system, ωj is the weight of the j-th performance indicator, max(Mj) and min(Mj) are the maximum and minimum of the j-th performance indicator across the recognition systems, Mij is the j-th performance indicator value of the i-th recognition system, and N is the total number of performance indicators. The value of Mij is normalized into the [0, 1] interval by the formula (2*max(Mj) − Mij)/(2*max(Mj) − min(Mj)). The larger the value of CM, the better the overall performance of the recognition system.
For some performance indicators, such as Loss, P(Top-1 Change), and MAC, a smaller value of Mij yields a larger composite value CM, so these values can be substituted into the formula directly. For indicators such as recognition accuracy, a larger value indicates better performance, and substituting it directly would lower the CM value, contrary to what is intended; the values of these indicators are therefore preprocessed by replacing Mij in the formula with (1 − Mij).
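A sketch of the CM computation under the normalization and sign conventions described above (the matrix layout and function name are assumptions for illustration):

```python
import numpy as np

def composite_metric(M, weights, higher_is_better):
    """Composite performance value CM_i for each recognition system.
    M is an (n_systems, n_metrics) matrix of raw metric values; metrics
    where larger is better (e.g. accuracy) are replaced by 1 - M_ij
    before normalization, as described in the text.
    Normalization: (2*max - M) / (2*max - min), column-wise."""
    M = np.asarray(M, dtype=float)
    M = np.where(higher_is_better, 1.0 - M, M)      # flip "bigger is better" metrics
    mx, mn = M.max(axis=0), M.min(axis=0)
    norm = (2 * mx - M) / (2 * mx - mn)             # per-indicator normalization
    return norm @ np.asarray(weights, dtype=float)  # weighted sum per system
```

For example, with one loss-like metric and one accuracy-like metric, the system with lower loss and higher accuracy receives the larger CM value.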
Finally, the decision evaluation module outputs the results of the multi-angle test module and generates the corresponding test report tables, as shown in Tables 1-3. Because different task scenarios impose different requirements, the test system also gives corresponding recommendations.
Table 1. Performance metrics report
Table 2. Security metrics report
Table 3. Model stability metrics report and model composite performance report
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011525218.0A CN112506797B (en) | 2020-12-22 | 2020-12-22 | A performance testing method for medical image recognition system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112506797A CN112506797A (en) | 2021-03-16 |
CN112506797B true CN112506797B (en) | 2022-05-24 |