CN117371038B

CN117371038B - Distributed medical image artificial intelligence model evaluation method and device

Info

Publication number: CN117371038B
Application number: CN202311377887.1A
Authority: CN
Inventors: 肖宏旺; 叶启威; 戴腾; 曹岗; 黄铁军
Original assignee: Beijing Zhiyuan Artificial Intelligence Research Institute
Current assignee: Beijing Zhiyuan Artificial Intelligence Research Institute
Priority date: 2023-10-23
Filing date: 2023-10-23
Publication date: 2024-07-19
Anticipated expiration: 2043-10-23
Also published as: CN117371038A

Abstract

The present invention discloses a distributed medical imaging artificial intelligence model evaluation method and device. The method includes: storing medical imaging evaluation data on the respective client servers; registering pre-trained models and the corresponding pre-trained model files on a central server; presetting evaluation tasks and their corresponding evaluation indicators, and sending the relevant pre-trained models to be evaluated to designated client servers, where each client server accesses the pre-trained model files and evaluation indicators and evaluates the pre-trained models based on the medical imaging evaluation data stored on it; and, after the evaluation is completed, sending the comprehensive evaluation result information to the evaluator, the corresponding model developer and the designated hospitals for review and analysis. Based on the ideas of distributed computing and privacy computing, the invention makes data computable but not visible, fully protects patient privacy, copes better with the long-tail distribution of medical imaging data, and improves the robustness and reliability of medical imaging artificial intelligence models in real medical scenarios.

Description

A distributed medical imaging artificial intelligence model evaluation method and device

Technical Field

The present invention relates to the field of intelligent medical technology, and in particular to a distributed medical imaging artificial intelligence model evaluation method and device.

Background

Artificial intelligence medical imaging systems have achieved good results in the auxiliary diagnosis, prediction and prognosis of many diseases. However, moving such systems from research results to deployed applications faces a series of specific technical, ethical and regulatory challenges. Medical data differs from data in other fields: privacy protection requirements are high, legal and regulatory supervision is strict, labeling is difficult, data silos are pronounced, data volumes are small, and long-tail distributions are pronounced. The European Union health technology assessment network EUnetHTA developed a general health technology assessment model with 9 domains; building on the EUnetHTA framework, Odense University Hospital in Denmark developed MAS-AI, a value assessment model with 9 domains for medical imaging artificial intelligence. A notable difference between MAS-AI and the EUnetHTA framework is that MAS-AI emphasizes "AI model development, performance and validation" in the "technology" dimension. These guiding frameworks provide a high-level reference, but they do not provide concrete practices for developing evaluation models. Performance evaluation of medical imaging artificial intelligence systems is a widely recognized challenge in the field.

Traditional artificial intelligence model evaluation is generally centralized: model developers evaluate the model locally on evaluation data and then deploy it to the production environment. Changes in data distribution, data quality and the environment in production then degrade model performance. This is particularly evident when artificial intelligence medical imaging systems are in operation, because in clinical scenarios medical imaging data has a long-tail distribution, which shows in two ways: ① some common diseases appear frequently in medical imaging data, while many rare diseases appear infrequently; ② larger hospitals accumulate more medical imaging data, while smaller hospitals accumulate less, and this data cannot be shared or circulated between hospitals. The long-tail distribution of data poses a challenge for artificial intelligence medical image analysis, and dealing with it effectively is very important for improving the clinical utility of medical imaging artificial intelligence models. Handling the long tail generally requires special methods or techniques for rare-disease data or small-sample data sets, so that the model also performs well in these cases and remains reliable and stable in clinical scenarios. In addition, medical imaging data is subject to extremely high privacy protection and regulatory requirements and cannot be shared or circulated between medical institutions, which makes it difficult to evaluate medical imaging artificial intelligence models with traditional centralized methods.

Summary of the Invention

To solve the problems in the prior art, the present invention provides the following technical solutions.

A first aspect of the present invention provides a distributed medical imaging artificial intelligence model evaluation method, comprising: each hospital uploads and stores its private medical imaging evaluation data on its own client server; each model developer registers pre-trained models on a central server, and uploads and stores the pre-trained model file corresponding to each pre-trained model on the central server; the evaluator presets an evaluation task and the evaluation indicators corresponding to that task on the central server, sends the pre-trained model to be evaluated for that task to the client servers of the designated hospitals, and then starts the evaluation task; the client server of each designated hospital accesses the pre-trained model file and the evaluation indicators, and evaluates the pre-trained model based on the medical imaging evaluation data stored on it; after the evaluation is completed, the central server sends the comprehensive evaluation result information of the pre-trained model to the evaluator, the corresponding model developer and the designated hospitals for review and analysis.

Furthermore, before each hospital uploads and stores its private medical imaging evaluation data on its own client server, the method also includes pre-processing the hospital's medical imaging evaluation data.

Furthermore, each hospital has management authority only over its own private medical imaging evaluation data.

Furthermore, the pre-trained model file includes the model name, the model type and the backbone network structure used in training the pre-trained model.

Furthermore, multiple formats are supported when the pre-trained model file is uploaded to the central server.

Furthermore, the central server and the client servers communicate through a virtual private network (VPN), and the central server and the client servers each adopt a B/S architecture design.

Furthermore, the evaluator reviews the pre-trained models registered by each model developer on the central server.

Furthermore, the comprehensive evaluation result information includes the evaluation data used, the test parameters and the evaluation result indicator values.

Furthermore, the three pre-trained models that rank highest in performance in the comprehensive evaluation results are selected, federated learning is initiated for each of them, and the pre-trained models undergo supervised training on each hospital's local medical imaging optimization data.

A second aspect of the present invention provides a distributed medical imaging artificial intelligence model evaluation device, the device comprising:

a data management module, used by each hospital to upload and store its private medical imaging evaluation data on its own client server;

a model management module, used by each model developer to register pre-trained models on the central server and upload the pre-trained model file corresponding to each pre-trained model to the central server;

an evaluation management module, used by the evaluator to preset an evaluation task and the evaluation indicators corresponding to that task on the central server, and to start the evaluation task after sending the pre-trained model to be evaluated for that task to the client servers of the designated hospitals, where the client server of each designated hospital accesses the pre-trained model file and the evaluation indicators and evaluates the pre-trained model based on the medical imaging evaluation data stored on it;

a visualization module, used by the central server, after the evaluation is completed, to send the comprehensive evaluation result information of the pre-trained model to the evaluator, the corresponding model developer and the designated hospitals for review and analysis.

Beneficial Effects of the Invention

The distributed medical imaging artificial intelligence model evaluation method and device provided by the present invention have the following three advantages:

(1) Medical imaging artificial intelligence models are evaluated on private medical imaging data from real hospital scenarios on the premise that "data does not leave the hospital", so the data is "computable but not visible" and patient privacy is fully protected;

(2) Based on the idea of privacy computing, the value of real-scenario data is fully exploited: feedback from real data is used to optimize model performance and improve the clinical utility of medical imaging models in medical environments;

(3) The long-tail distribution of medical imaging data is handled better, further improving the robustness and reliability of medical imaging artificial intelligence models in real medical scenarios. Even smaller medical institutions that have accumulated little medical imaging data can benefit from the final medical imaging artificial intelligence model through the distributed evaluation method proposed in the present invention.

Brief Description of the Drawings

FIG. 1 is a schematic flow diagram of the distributed medical imaging artificial intelligence model evaluation method according to the present invention;

FIG. 2 is a schematic diagram of the functional structure of the distributed medical imaging artificial intelligence model evaluation device according to the present invention.

Detailed Description

For a better understanding of the above technical solution, it is described in detail below with reference to the accompanying drawings and specific embodiments.

The method provided by the present invention can be implemented in the following terminal environment. The terminal may include one or more of the following components: a processor, a memory and a display screen. The memory stores at least one instruction, which is loaded and executed by the processor to implement the method described in the following embodiments.

The processor may include one or more processing cores. The processor connects the various parts of the terminal through various interfaces and lines, and performs the terminal's functions and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory and by calling the data stored in the memory.

The memory may include random access memory (RAM) and may also include read-only memory (ROM). The memory may be used to store instructions, programs, code, code sets or instruction sets.

The display screen is used to display the user interface of each application.

In addition, those skilled in the art will appreciate that the terminal structure described above does not limit the terminal; the terminal may include more or fewer components, combine certain components, or arrange the components differently. For example, the terminal may also include a radio frequency circuit, an input unit, sensors, an audio circuit, a power supply and other components, which are not described further here.

Embodiment 1

As shown in FIG. 1, this embodiment provides a distributed medical imaging artificial intelligence model evaluation method, comprising the following steps:

S1, each hospital uploads and stores its private medical imaging evaluation data on its own client server;

It should be noted that the hospital always remains the holder of the medical imaging evaluation data it owns. A hospital's private medical imaging evaluation data is stored only on the client server inside that hospital, and each hospital has management authority only over its own private medical imaging evaluation data. The hospital can access the pre-trained model files issued by the central server and use its private medical imaging evaluation data to evaluate the performance of the issued pre-trained models. In addition, each hospital may pre-process its medical imaging evaluation data before uploading it to its client server. The medical imaging evaluation data includes the type of medical imaging (such as CT or X-ray), the data volume, the organs shown in the images, and basic information on the related diseases.

S2, each model developer registers pre-trained models on the central server, and uploads and stores the pre-trained model file corresponding to each pre-trained model on the central server;

It should be noted that the pre-trained model file includes the model name, the model type, the backbone network structure used in training, and so on, and that multiple formats are supported when the pre-trained model file is uploaded to the central server. There are two types of model developers: developers of pre-trained models, and developers of task-specific models built on pre-trained models. At no point do model developers have permission to access the medical imaging evaluation data on the hospitals' client servers.
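The registration step in S2 can be sketched as a small record type plus a registry check. This is a minimal illustration only: the class name `PretrainedModelRecord`, the fields beyond model name, model type and backbone, the `SUPPORTED_FORMATS` set and the `register_model` helper are all assumptions made for the example, since the description does not specify which formats are supported or how the registry is stored.

```python
from dataclasses import dataclass

# Hypothetical set of accepted upload formats; the description only says
# that "multiple formats" are supported.
SUPPORTED_FORMATS = {"onnx", "pt", "h5"}


@dataclass(frozen=True)
class PretrainedModelRecord:
    """Metadata stored on the central server for one registered model."""
    model_name: str
    model_type: str        # e.g. "classification"
    backbone: str          # backbone network structure used in training
    file_format: str       # assumed field: format of the uploaded file
    developer_id: str      # assumed field: who registered the model
    approved: bool = False  # assumed flag: set after the evaluator's review


def register_model(registry: dict, record: PretrainedModelRecord) -> None:
    """Add a record to the in-memory registry, rejecting bad input."""
    if record.file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {record.file_format}")
    if record.model_name in registry:
        raise ValueError(f"model already registered: {record.model_name}")
    registry[record.model_name] = record
```

A record starts unapproved, which mirrors the review step the evaluator performs before any performance test is run.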

S3, the evaluator presets an evaluation task and the evaluation indicators corresponding to that task on the central server, sends the pre-trained model to be evaluated for that task to the client servers of the designated hospitals, and then starts the evaluation task; the client server of each designated hospital accesses the pre-trained model file and the evaluation indicators, and evaluates the pre-trained model based on the medical imaging evaluation data stored on it;

It should be noted that the central server and the client servers communicate through a virtual private network (VPN). The central server and the client servers each adopt a B/S (browser/server) architecture with independent, separated front and back ends; this highly modular design is easy to maintain and deploy. An evaluator user exists by default when model evaluation is initialized. The evaluator has administrator privileges and can manage the registration of hospitals and model developers and their corresponding permissions. The evaluator can see the medical imaging evaluation data that hospitals store on their client servers, as well as the pre-trained models that model developers register on the central server and the corresponding pre-trained model files. After a model developer registers a pre-trained model on the central server, the evaluator reviews it; once the review is passed, the evaluator can run performance tests on the pre-trained model from the evaluator's management interface. Introducing an evaluator in this embodiment ensures that the evaluation of each pre-trained model is carried out fairly, impartially and independently.

S4, after the evaluation is completed, the central server sends the comprehensive evaluation result information of the pre-trained model to the evaluator, the corresponding model developer and the designated hospitals for review and analysis.

It should be noted that the comprehensive evaluation result information includes the evaluated model file, the evaluation data used, the test parameters and the evaluation result indicator values. Each hospital can obtain only the individual evaluation result that the pre-trained model achieved on that hospital's private medical imaging evaluation data. The model developer can obtain the individual evaluation result from each hospital, as well as the average evaluation result of the joint evaluation by multiple hospitals.
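The visibility rule for results in S4 can be sketched as a filter applied on the central server before results are sent out. The function name `visible_results` and the role strings are hypothetical, and giving the evaluator the same view as the model developer is an assumption, since the description only states that the evaluator receives the comprehensive result information.

```python
from statistics import mean


def visible_results(role: str, party_id: str, per_hospital: dict) -> dict:
    """Return the part of one model's results that a given party may see.

    per_hospital maps a hospital id to that hospital's metric value.
    """
    if role == "hospital":
        # A hospital sees only the result computed on its own data.
        if party_id in per_hospital:
            return {party_id: per_hospital[party_id]}
        return {}
    if role in ("developer", "evaluator"):
        # Individual results from every hospital plus their average.
        view = dict(per_hospital)
        if per_hospital:
            view["average"] = mean(per_hospital.values())
        return view
    raise ValueError(f"unknown role: {role}")
```

Applying the filter centrally means a hospital never receives another hospital's individual result, even though all results pass through the same server.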

Furthermore, to optimize pre-trained model performance, the three pre-trained models that rank highest in performance in the comprehensive evaluation results are selected, and federated learning is initiated for each of them. The pre-trained models undergo supervised training on each hospital's local medical imaging optimization data (this data is separate from the medical imaging evaluation data). The resulting fine-tuned pre-trained models are then evaluated again on the hospitals' medical imaging evaluation data, continuously driving improvements in pre-trained model performance. During day-to-day operation of the pre-trained models a data flywheel forms: new patient data supports the evaluation and fine-tuning of the pre-trained models, and the fine-tuned models in turn support more clinical use.
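Selecting the candidates for federated fine-tuning amounts to ranking models by their comprehensive result. The sketch below assumes a single scalar score per model in which higher is better; the helper name `top_k_models` and the alphabetical tie-break are illustrative choices, not taken from the description.

```python
def top_k_models(scores: dict, k: int = 3) -> list:
    """Return the names of the k best-scoring models, best first.

    scores maps a model name to its comprehensive evaluation score.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]
```

Each selected model would then go through its own federated learning round, followed by a repeat of the evaluation task on the hospitals' evaluation data.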

The evaluation result indicator values are explained in this embodiment using a medical image classification problem as an example. Suppose a medical image is used to determine whether a patient has a certain disease, that is, whether the patient is negative or positive for the disease; this is a classification problem. The result of a classification problem can be represented by a confusion matrix, as shown in Table 1 below:

Table 1

                   Predicted positive   Predicted negative
Actual positive    TP                   FN
Actual negative    FP                   TN

where:

TP: True Positive, the true value is positive, the predicted value is positive, and the prediction is correct;

FP: False Positive, the true value is negative, the predicted value is positive, and the prediction is wrong;

FN: False Negative, the true value is positive, the predicted value is negative, and the prediction is wrong;

TN: True Negative, the true value is negative, the predicted value is negative, and the prediction is correct.

Based on Table 1, the evaluation result indicators used in this embodiment are as follows:

Accuracy: the percentage of all samples that are predicted correctly,

Accuracy = (TP+TN)/(TP+TN+FP+FN);

Precision: the probability that a sample predicted to be positive is actually positive,

Precision = TP/(TP+FP);

Recall: the probability that a sample that is actually positive is predicted to be positive,

Recall = TP/(TP+FN);

Specificity: also known as the True Negative Rate (TNR), the proportion of actual negative samples that are true negatives,

TNR = TN/(FP+TN);

Sensitivity: also known as the True Positive Rate (TPR), the proportion of actual positive samples that are true positives,

TPR = TP/(TP+FN);

F1: the harmonic mean of precision and recall. It strikes a balance between the two by taking both into account, and is high only when both precision and recall are high,

F1 = 2×(Precision×Recall)/(Precision+Recall);

AUC (Area Under Curve): the area enclosed between the receiver operating characteristic (ROC) curve and the coordinate axis. For a detection method, AUC typically lies between 0.5 and 1; the closer the AUC is to 1, the more accurate the method, while an AUC of 0.5 corresponds to chance-level discrimination and has no application value.
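The seven indicators above can be computed with a few lines of code. The sketch below is one possible implementation, not the one used in the patent: the function names are hypothetical, undefined ratios are returned as NaN by assumption, and AUC is computed with the rank-based formulation (the fraction of positive/negative pairs in which the positive sample receives the higher score, ties counting one half), which equals the area under the ROC curve.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the six confusion-matrix indicators from the four counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, fp + tn),   # TNR
        "sensitivity": recall,                # TPR, identical to recall
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def auc(labels: list, scores: list) -> float:
    """Rank-based AUC for binary labels (1 = positive, 0 = negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # AUC is undefined without both classes
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for an illustration; a sort-based computation would be preferable on large evaluation sets.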

The 7 evaluation indicators above are widely used in conventional, non-distributed settings. This embodiment computes them with a distributed mechanism. Taking the AUC indicator as an example, three hospitals A, B and C each use their own medical imaging evaluation data (all for the same disease) to run an evaluation task on the pre-trained model (called model A) that model developer A registered on the central server. After the task runs, each hospital obtains an AUC value: AUC_a, AUC_b and AUC_c respectively. The comprehensive evaluation result AUC_{mol_a} for model A finally obtained by the evaluator is as follows:
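The formula itself is not reproduced in this text. Assuming the comprehensive result is the unweighted arithmetic mean of the three per-hospital values, which is consistent with the "average evaluation results" of joint evaluation described under S4, it would read:

```latex
\mathrm{AUC}_{mol\_a} = \frac{\mathrm{AUC}_{a} + \mathrm{AUC}_{b} + \mathrm{AUC}_{c}}{3}
```

A mean weighted by each hospital's sample count would be an alternative reading; the surviving text does not say which was intended.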

Similarly, hospitals A, B and C each use their own medical imaging evaluation data to run an evaluation task on the pre-trained model registered by model developer B (called model B). After the task runs, each hospital obtains an AUC value: AUC_a′, AUC_b′ and AUC_c′ respectively. The comprehensive evaluation result for model B finally obtained by the evaluator is as follows:
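This formula is also missing from the text. Under the assumption that the comprehensive result is an unweighted arithmetic mean of the per-hospital values, and using the hypothetical symbol AUC_{mol_b} for model B's result, it would read:

```latex
\mathrm{AUC}_{mol\_b} = \frac{\mathrm{AUC}_{a}' + \mathrm{AUC}_{b}' + \mathrm{AUC}_{c}'}{3}
```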

The distributed evaluation mechanism used in this embodiment ensures, on the one hand, that hospital patient data "does not leave the hospital", which protects patient privacy. On the other hand, it supplies real-scenario data for evaluating medical imaging artificial intelligence models, which is of great value for further optimizing their performance. This distributed approach to evaluation is of great significance to the development of medical imaging artificial intelligence applications as a whole.
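The flow of a distributed evaluation task can be sketched as follows. This is a single-process simulation under stated assumptions: the function names are hypothetical, the model is any callable that maps an image to a score, the hospitals are plain dictionaries rather than separate servers, and the central step averages the per-hospital values without weighting. The point it illustrates is that only metric values, never images or labels, are returned from the hospital side.

```python
from statistics import mean


def accuracy(labels: list, scores: list, threshold: float = 0.5) -> float:
    """Example metric: fraction of samples classified correctly."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def evaluate_at_hospital(model, images: list, labels: list, metric) -> float:
    """Runs on the hospital's client server; returns one number only."""
    scores = [model(image) for image in images]
    return metric(labels, scores)


def run_evaluation_task(model, hospitals: dict, metric):
    """Runs on the central server: collect per-hospital values and average.

    hospitals maps a hospital name to its private (images, labels) pair,
    which in a real deployment would never be visible to this function.
    """
    per_hospital = {
        name: evaluate_at_hospital(model, images, labels, metric)
        for name, (images, labels) in hospitals.items()
    }
    return per_hospital, mean(per_hospital.values())
```

In a real deployment, `evaluate_at_hospital` would be invoked remotely over the VPN link, and the central server would receive only its return value.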

Embodiment 2

As shown in FIG. 2, an embodiment of the present invention further provides a distributed medical imaging artificial intelligence model evaluation device, comprising:

a data management module, used by each hospital to upload and store its private medical imaging evaluation data on its own client server; the main function of this module is to let hospitals manage their medical imaging evaluation data;

a model management module, used by each model developer to register pre-trained models on the central server and upload the pre-trained model file corresponding to each pre-trained model to the central server; the main function of this module is to let model developers manage their pre-trained models;

an evaluation management module, used by the evaluator to preset an evaluation task and the evaluation indicators corresponding to that task on the central server, and to start the evaluation task after sending the pre-trained model to be evaluated for that task to the client servers of the designated hospitals, where the client server of each designated hospital accesses the pre-trained model file and the evaluation indicators and evaluates the pre-trained model based on the medical imaging evaluation data stored on it; the main function of this module is to let the neutral evaluator manage evaluation tasks and the corresponding evaluation indicators;

a visualization module, used by the central server, after the evaluation is completed, to send the comprehensive evaluation result information of the pre-trained model to the evaluator, the corresponding model developer and the designated hospitals for review and analysis; the main function of this module is to support monitoring of the evaluation task process and comparative analysis of the evaluation results.

Furthermore, the device also includes a system management module, whose main function is to meet the evaluator's system management needs such as account management and permission management.

Although preferred embodiments of the present invention have been described, those skilled in the art may make further changes and modifications to these embodiments once they have learned of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention. Obviously, those skilled in the art may make various changes and variations to the present invention without departing from its spirit and scope. Accordingly, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims

1. A distributed medical imaging artificial intelligence model evaluation method, characterized by comprising:

Each hospital uploads and stores private medical imaging evaluation data to its own client server;

Each model developer registers a pre-trained model on the central server, and uploads and stores the pre-trained model files corresponding to each pre-trained model to the central server;

The evaluator presets the evaluation task and the evaluation index corresponding to the evaluation task on the central server, sends the pre-trained model to be evaluated related to the evaluation task to the client server of the designated hospital, and then starts the evaluation task. The client server of the designated hospital accesses the pre-trained model file and the evaluation index and evaluates the pre-trained model to be evaluated based on the medical image evaluation data stored therein;

After the evaluation is completed, the central server will send the comprehensive evaluation result information of the pre-trained model to be evaluated to the evaluator, the corresponding model developer and the designated hospital for review and analysis;

wherein the comprehensive evaluation result information of the pre-trained model to be evaluated includes the evaluated model file, the evaluation data used, the test parameters and the evaluation result indicator values; each hospital can obtain only the individual evaluation result that the pre-trained model to be evaluated achieved on that hospital's private medical imaging evaluation data; the model developer can obtain the individual evaluation result from each hospital, and can also obtain the average evaluation result of the joint evaluation by multiple hospitals.

2. The model evaluation method according to claim 1, characterized in that, before each hospital uploads and stores its private medical imaging evaluation data on its own client server, the method further includes pre-processing the hospital's medical imaging evaluation data.

3. The model evaluation method according to claim 1, characterized in that each hospital has management authority only over its own private medical imaging evaluation data.

4. The model evaluation method according to claim 1, characterized in that the pre-trained model file includes the model name, the model type and the backbone network structure used in training the pre-trained model.

5. The model evaluation method according to claim 1, characterized in that multiple formats are supported when the pre-trained model file is uploaded to the central server.

6. The model evaluation method according to claim 1, characterized in that the central server and the client server communicate through a virtual private network (VPN), and the central server and the client server each adopt a B/S architecture design.

7. The model evaluation method according to claim 1, characterized in that the evaluator reviews the pre-trained models registered by each model developer on the central server.

8. The model evaluation method according to claim 1, characterized in that the comprehensive evaluation result information includes the evaluation data used, the test parameters and the evaluation result indicator values.

9. The model evaluation method according to claim 1, characterized in that the three pre-trained models that rank highest in performance in the comprehensive evaluation results are selected, federated learning is initiated for each of them, and the pre-trained models undergo supervised training on each hospital's local medical imaging optimization data.

10. A distributed medical image artificial intelligence model evaluation device, characterized by comprising:

The data management module is used by each hospital to upload and store private medical imaging evaluation data to their respective client servers;

The model management module is used by each model developer to register the pre-trained model on the central server and upload the pre-trained model file corresponding to the pre-trained model to the central server;

An evaluation management module is used for the evaluator to preset an evaluation task and an evaluation index corresponding to the evaluation task on a central server, and to initiate the evaluation task after sending a pre-trained model to be evaluated related to the evaluation task to a client server of a designated hospital. The client server of the designated hospital accesses the pre-trained model file and the evaluation index and evaluates the pre-trained model to be evaluated based on the medical image evaluation data stored therein;

The visualization module is used for sending the comprehensive evaluation result information of the pre-trained model to be evaluated to the evaluator, the corresponding model developer and the designated hospital for review and analysis after the evaluation is completed.