CN109658996B

CN109658996B - Physical examination data completion method and device based on side information and application

Info

Publication number: CN109658996B
Application number: CN201811416427.4A
Authority: CN
Inventors: 吴健; 陈晋泰; 郭若乾; 冯芮苇; 雷璧闻; 王文哲; 陆逸飞; 吴福理
Original assignee: Shandong Industrial Technology Research Institute of ZJU
Current assignee: Shandong Industrial Technology Research Institute of ZJU
Priority date: 2018-11-26
Filing date: 2018-11-26
Publication date: 2020-08-18
Anticipated expiration: 2038-11-26
Also published as: CN109658996A

Abstract

The invention discloses a physical examination data completion method based on side information, comprising: (1) constructing a physical examination-disease matrix, a pathogenic factor-disease matrix, and a pathogenic factor-physical examination matrix, and supplementing them according to the side information; (2) establishing the encoding-decoding networks D2F Net, D2C Net, and F2C Net, one between each pair of matrices; (3) jointly training D2F Net, D2C Net, and F2C Net, after which the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix are completed; and (4) inputting the physical examination-disease matrix to be completed into D2F Net and D2C Net and, using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix, and F2C Net, computing the completed physical examination-disease matrix. A physical examination data completion device based on side information is also disclosed, which can complete physical examination data and disease results according to existing information.

Description

A method, device and application for medical examination data completion based on side information

Technical Field

The invention belongs to the fields of data statistics and artificial intelligence, and in particular relates to a method, device, and application for completing physical examination data based on side information.

Background

The traditional approach to physical examination is to screen for disease through a series of examinations: depending on the condition in question, examinations of the relevant physiological characteristics are carried out as arranged or recommended by a doctor or medical manual, and the doctor then diagnoses the diseases the patient may have from the results. Because examination items are numerous, and different hospitals, doctors, and eras use different examination methods, the items are disorderly and cannot be unified, which wastes medical resources and burdens patients needlessly.

With the continuous development of science and technology, research on medical knowledge, such as the correlations among the physiological characteristics implied by different examination items and the degree to which physiological characteristics influence diseases, has matured, and work on matrix completion and side information has also advanced. Matrix completion (MC) is the process of estimating unknown elements from known elements so as to recover the complete matrix. It is a key and difficult topic in artificial intelligence research, whose task is to complete an incomplete matrix by means of artificial intelligence algorithms. The task has important applications in data mining, e-commerce marketing, engineering control, and image and video processing.

In medical settings, unifying different physical examination items relies on matrix completion algorithms, which infer the results of unknown examination items from related ones. However, current matrix completion techniques mostly rely on methods such as linear transformation and local interpolation; nonlinear transformation that exploits background knowledge has received little study, and the results are not yet satisfactory.

Side information refers to existing information Y that is used to assist in encoding information X, so that X can be encoded with a shorter code length. For side information, see multi-user source coding. A familiar example: suppose you go to the racecourse to bet on horses. From the odds on each horse you can derive an optimal betting strategy. If you also know some historical data, such as the results of the previous few races, you can derive a better strategy. The historical data is the side information.
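The claim that side information Y shortens the code length of X corresponds to the standard information-theoretic fact that the conditional entropy H(X | Y) never exceeds the entropy H(X). The sketch below checks this numerically for the horse-racing example. The joint distribution is hypothetical and chosen only for illustration; it is not taken from the patent.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(X | Y) in bits, where joint[(x, y)] = P(X = x, Y = y)."""
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    h = 0.0
    for (_, y), p in joint.items():
        if p > 0:
            h -= p * math.log2(p / p_y[y])
    return h

# Hypothetical joint distribution (illustrative numbers only):
# X = winner of today's race (horse 0 or 1),
# Y = winner of the previous race (the side information).
joint = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.4}

# Marginal distribution of X, obtained by summing out Y.
p_x = defaultdict(float)
for (x, _), p in joint.items():
    p_x[x] += p

h_x = entropy(p_x.values())               # bits needed without side information
h_x_given_y = conditional_entropy(joint)  # bits needed when Y is known

print(f"H(X) = {h_x:.4f} bits, H(X|Y) = {h_x_given_y:.4f} bits")
```

With these numbers, knowing the previous winner reduces the average number of bits needed to describe today's winner, which is the sense in which the historical data "helps".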

A side information algorithm completes the missing entries of a matrix on the basis of side information: it finds related and unrelated data points in the information stream and uses them to constrain and assist matrix completion, and it applies to any field that requires matrix completion. Side information methods remain a branch of traditional machine learning, and few attempts have been made to combine them with artificial neural networks and deep learning.

In the medical field, severe data missingness and scarce labeled data are common, yet matrix completion methods are rarely applied.

Summary of the Invention

The purpose of the present invention is to provide a physical examination data completion method and device based on side information that can complete physical examination data and disease results according to existing information.

Another purpose of the present invention is to provide an application of the side information-based physical examination data completion device, in which the device is used to reconstruct diseases.

To achieve the above purposes, the following technical solutions are provided:

In a first aspect, a physical examination data completion method based on side information comprises the following steps:

(1) Construct a physical examination-disease matrix in which columns represent physiological characteristics and disease subtypes, rows represent patients, and element values are the patients' measured physiological characteristic values and disease types; a pathogenic factor-disease matrix in which columns represent disease subtypes, rows represent pathogenic factors, and element values are the probabilities that a pathogenic factor causes a disease; and a pathogenic factor-physical examination matrix in which columns represent physiological characteristics, rows represent pathogenic factors, and element values are the correlations between pathogenic factors and physiological characteristics;

(2) For the physical examination-disease matrix, fill in the measured physiological characteristic values from the examination item data and the disease types from the doctor's subjective diagnosis; for the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix, fill in, according to medical knowledge, the probabilities that known pathogenic factors cause known disease subtypes and the correlations between known pathogenic factors and physiological characteristics;

(3) Establish the encoding-decoding networks D2F Net, D2C Net, and F2C Net between, respectively, the physical examination-disease matrix and the pathogenic factor-disease matrix; the physical examination-disease matrix and the pathogenic factor-physical examination matrix; and the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix;

(4) Jointly train the encoding-decoding networks D2F Net, D2C Net, and F2C Net; when training ends, the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix have been completed;

(5) Input the physical examination-disease matrix to be completed into D2F Net and D2C Net and, using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix, and F2C Net, compute the completed physical examination-disease matrix.

This physical examination data completion method completes unknown information from existing data by encoding and decoding. It greatly reduces doctors' heavy workload and eases the financial and physical burden on patients. It also helps different hospitals and doctors use one another's examination results in a unified way, so that medical resources are not wasted.

In a second aspect, a physical examination data completion device based on side information comprises a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor.

The computer memory stores the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix completed by the side information-based physical examination data completion method of the first aspect, together with the parameters of D2F Net, D2C Net, and F2C Net;

When executing the computer program, the computer processor implements the following steps:

Receive the input physical examination-disease matrix to be completed; compute on it using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix, D2F Net, D2C Net, and F2C Net; and output the completed physical examination-disease matrix.

This physical examination data completion device completes unknown information by encoding and decoding, based on existing data and the determined pathogenic factor-disease and pathogenic factor-physical examination matrices. It greatly reduces doctors' heavy workload and eases the financial and physical burden on patients. It also helps different hospitals and doctors use one another's examination results in a unified way, so that medical resources are not wasted.

In a third aspect, an application of the side information-based physical examination data completion device of the second aspect obtains disease results by looking them up in the completed physical examination-disease matrix that the device outputs.

The disease subtypes predicted from the completed physical examination-disease matrix output by the device can reach an accuracy above 95%, which can assist doctors in diagnosing disease.

Brief Description of the Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic form of the physical examination-disease matrix provided by the embodiment;

Fig. 2 is a schematic form of the pathogenic factor-disease matrix provided by the embodiment;

Fig. 3 is a schematic form of the pathogenic factor-physical examination matrix provided by the embodiment;

Fig. 4 is a schematic diagram of the encoding-decoding networks constructed among the physical examination-disease matrix, the pathogenic factor-disease matrix, and the pathogenic factor-physical examination matrix provided by the embodiment.

Detailed Description

To make the purposes, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the invention and do not limit its scope of protection.

To address the cost and effort of physical examinations and the heavy examination workload of doctors, the side information-based physical examination data completion method provided by this embodiment comprises the following steps:

S101: Construct the physical examination-disease matrix, the pathogenic factor-disease matrix, and the pathogenic factor-physical examination matrix.

For the physical examination-disease matrix, columns represent physiological characteristics and disease subtypes, rows represent patients, and element values are the patients' measured physiological characteristic values and disease types. Physiological characteristics are items of physiological information about the human body, generally examination items, including height, weight, heart rate, a 20-item routine blood panel, and so on; disease subtypes are the disease types subjectively diagnosed by the doctor, such as hypertension and diabetes. Figure 1 gives a schematic physical examination-disease matrix; it contains no real information and serves only to illustrate the structure of the matrix. As shown in Figure 1, rows represent different patients; some columns represent examination items, such as globulin, 洪锡标 (Hongxibiao), and alanine aminotransferase, and the remaining columns represent the patients' examination results, such as A, B, C, D, E, F, and G.

In the physical examination-disease matrix, for physiological characteristics reported as positive or negative, a positive finding is represented by 1 and a negative finding by 0.
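The structure of the physical examination-disease matrix can be sketched as follows. This is a minimal illustration under assumptions: the record layout, the field names (`marker_x`, `diagnosis`), and the helper `build_exam_disease_matrix` are hypothetical and do not come from the patent; only the positive-to-1 and negative-to-0 encoding is taken from the text.

```python
# Hypothetical patient records; field names and values are illustrative only.
records = [
    {"globulin": 28.0, "marker_x": "positive", "diagnosis": "A"},
    {"globulin": 31.5, "marker_x": "negative", "diagnosis": "C"},
    {"globulin": 26.2, "diagnosis": "B"},  # marker_x was not examined
]

def encode_value(value):
    """Map positive/negative findings to 1/0; pass other values through."""
    if value == "positive":
        return 1
    if value == "negative":
        return 0
    return value

def build_exam_disease_matrix(records, feature_names, disease_key="diagnosis"):
    """Rows are patients; columns are features followed by the disease type.

    Entries absent from a record become None, marking values that a
    completion method would still have to fill in.
    """
    columns = list(feature_names) + [disease_key]
    rows = [[encode_value(rec.get(col)) for col in columns] for rec in records]
    return columns, rows

columns, matrix = build_exam_disease_matrix(records, ["globulin", "marker_x"])
for row in matrix:
    print(row)
```

Using `None` for unexamined items keeps "not measured" distinct from a negative finding, which is encoded as 0.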

For the pathogenic factor-disease matrix, columns represent disease subtypes, which are either explicit or latent: a known disease is an explicit disease subtype and an unknown disease is a latent disease subtype. Rows represent pathogenic factors, which are likewise explicit or latent: known pathogenic factors are explicit and unknown pathogenic factors are latent. Element values are the probabilities that a pathogenic factor causes a disease. Suppose the pathogenic factor-disease matrix is an M×N matrix whose M rows represent M pathogenic factors, of which only m (<M) are explicit, and whose N columns represent N disease subtypes, of which only n (<N) are explicit. Figure 2 is an exemplary pathogenic factor-disease matrix in which disease subtypes A, B, C, D, E, F, and G are known diseases and the remaining unknown type 1, unknown type 2, unknown type 3, and unknown type 4 are unknown disease subtypes; a, b, and c are known pathogenic factors and the remaining 6 are unknown pathogenic factors. In the case of Figure 2, M = 9, m = 3, N = 11, and n = 7. M and N must be larger than m and n; how much larger is estimated from experience.

For the m×n sub-matrix formed by the known disease subtypes and known pathogenic factors, the element values, that is, the probabilities that a pathogenic factor causes a disease, are filled in according to medical knowledge or medical evidence; the values 0.4, 0.1, and so on in Figure 2 are filled in this way. This establishes the side information of the pathogenic factor-disease matrix. The element values of the M×N matrix that correspond to unknown disease subtypes and unknown pathogenic factors cannot be filled in and are left empty.
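The layout of the partially filled M×N pathogenic factor-disease matrix can be sketched as below, using the sizes of the Figure 2 example (M = 9, m = 3, N = 11, n = 7). The helper name `build_factor_disease_matrix` and the positions of the probabilities are hypothetical; the patent mentions values such as 0.4 and 0.1 but the text does not say which cells they occupy.

```python
def build_factor_disease_matrix(known, factors, diseases,
                                n_latent_factors, n_latent_diseases):
    """Return an M x N matrix (list of lists) with None for unknown entries.

    Known factors and diseases occupy the top-left m x n block; rows and
    columns for latent (unknown) factors and subtypes follow and stay empty.
    """
    n_rows = len(factors) + n_latent_factors     # M
    n_cols = len(diseases) + n_latent_diseases   # N
    matrix = [[None] * n_cols for _ in range(n_rows)]
    for (factor, disease), prob in known.items():
        matrix[factors.index(factor)][diseases.index(disease)] = prob
    return matrix

factors = ["a", "b", "c"]                        # m = 3 known factors
diseases = ["A", "B", "C", "D", "E", "F", "G"]   # n = 7 known subtypes

# Hypothetical side information: cell positions are illustrative only.
known = {("a", "A"): 0.4, ("b", "C"): 0.1}

fd_matrix = build_factor_disease_matrix(
    known, factors, diseases, n_latent_factors=6, n_latent_diseases=4
)
print(len(fd_matrix), "x", len(fd_matrix[0]))
```

Every cell outside the 3×7 known block is `None`, matching the statement that entries for unknown factors and unknown subtypes are left empty until training completes them.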

For the pathogenic factor-physical examination matrix, columns represent physiological characteristics (that is, physical examination data), rows represent pathogenic factors, and element values are the correlations between pathogenic factors and physiological characteristics. The correlations are built from medical knowledge and medical statistics. Depending on their strength they can be expressed as high, medium, or low, as shown in Figure 3; alternatively, a positive weight can express positive correlation, a negative weight negative correlation, and 0 no correlation. This establishes the side information of the pathogenic factor-physical examination matrix.
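Both representations of correlation described above can be brought into one numeric form before being fed to a network. The sketch below is one possible convention, not the patent's: the numeric values assigned to "high", "medium", and "low" are assumptions, and `encode_correlation` is a hypothetical helper.

```python
# Assumed numeric values for the qualitative levels; the patent does not
# specify any particular mapping.
LEVEL_TO_WEIGHT = {"high": 1.0, "medium": 0.5, "low": 0.1}

def encode_correlation(entry):
    """Convert one matrix entry to a float weight, or None if unknown.

    Strings are qualitative levels; numbers are signed weights where a
    positive value means positive correlation, a negative value means
    negative correlation, and 0 means no correlation.
    """
    if entry is None:
        return None
    if isinstance(entry, str):
        return LEVEL_TO_WEIGHT[entry]
    return float(entry)

# Hypothetical row of the pathogenic factor-physical examination matrix.
raw_row = ["high", -0.3, 0, None, "low"]
encoded_row = [encode_correlation(e) for e in raw_row]
print(encoded_row)
```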

S102: Establish the encoding-decoding networks D2F Net, D2C Net, and F2C Net between, respectively, the physical examination-disease matrix and the pathogenic factor-disease matrix; the physical examination-disease matrix and the pathogenic factor-physical examination matrix; and the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix, as shown in Figure 4.

D2F Net, D2C Net, and F2C Net all have the same structure: an encoder built from convolutional layers and a decoder built from deconvolution layers. There are generally 3 to 4 convolutional layers and 3 to 4 deconvolution layers, and a reconstruction objective is set up on each layer; in the decoder, the reconstruction error at each layer is required to be as small as possible.

If the physical examination-disease matrix, the pathogenic factor-disease matrix, and the pathogenic factor-physical examination matrix are large, a high-capacity neural network such as ResNeXt is used for encoding, and the decoder is built from deconvolution layers that mirror the convolutional layers of that network. The network must not contain pooling layers, which cause information loss; its pooling layers and dropout layers must be removed.
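The pairing of convolutional layers with mirrored deconvolution layers can be illustrated with a shape-only sketch. It is deliberately simplified and should not be read as the patent's network: it is one-dimensional, uses fixed rather than learned kernels, and omits nonlinearities and the per-layer reconstruction objectives. The function names are hypothetical.

```python
def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation), stride 1, no padding.

    Output length is len(x) - len(kernel) + 1.
    """
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def conv_transpose1d(y, kernel):
    """Transposed 1-D convolution, stride 1.

    Output length is len(y) + len(kernel) - 1, undoing the shrinkage
    of a conv1d layer with the same kernel size.
    """
    k = len(kernel)
    out = [0.0] * (len(y) + k - 1)
    for i, v in enumerate(y):
        for j in range(k):
            out[i + j] += v * kernel[j]
    return out

def encode(x, kernels):
    """Apply the convolutional layers in order (no pooling, no dropout)."""
    for kernel in kernels:
        x = conv1d(x, kernel)
    return x

def decode(z, kernels):
    """Apply one transposed convolution per encoder layer, in reverse."""
    for kernel in reversed(kernels):
        z = conv_transpose1d(z, kernel)
    return z

# Three layers, in line with the 3 to 4 layers mentioned in the text.
kernels = [[0.25, 0.5, 0.25]] * 3
x = [float(i) for i in range(11)]   # one row of length 11 (illustrative)

code = encode(x, kernels)
recon = decode(code, kernels)
print(len(x), "->", len(code), "->", len(recon))
```

Because no pooling is used, each decoder layer restores exactly the length removed by its matching encoder layer, so the reconstruction has the same shape as the input and can be compared with it entry by entry.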

S103: Jointly train the encoding-decoding networks D2F Net, D2C Net, and F2C Net; when training ends, the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix have been completed.

To complete the pathogenic factor-disease matrix, D2F Net and F2C Net are used. Specifically:

For D2F Net, the physical examination-disease matrix is the input. The encoder encodes it to produce a reconstructed pathogenic factor-disease matrix, and the decoder decodes that to produce a reconstructed physical examination-disease matrix. The loss function L₁ of D2F Net is the sum of two terms: the sum-of-squared-deviations loss between the physical examination-disease matrix and its reconstruction, and the sum-of-squared-deviations loss between the pathogenic factor-disease matrix and its reconstruction;

For F2C Net, the pathogenic factor-physical examination matrix is the input. The encoder encodes it to produce a reconstructed pathogenic factor-disease matrix, and the decoder decodes that to produce a reconstructed pathogenic factor-physical examination matrix. The loss function L₂ of F2C Net is the sum of two terms: the sum-of-squared-deviations loss between the pathogenic factor-physical examination matrix and its reconstruction, and the sum-of-squared-deviations loss between the pathogenic factor-disease matrix and its reconstruction;

The sum L¹ = L₁ + L₂ is the total loss function for completing the pathogenic factor-disease matrix.

To complete the pathogenic factor-physical examination matrix, F2C Net and D2C Net are used. Specifically:

For F2C Net, the pathogenic factor-disease matrix is the input. The encoder encodes it to produce a reconstructed pathogenic factor-physical examination matrix, and the decoder decodes that to produce a reconstructed pathogenic factor-disease matrix. The loss function L₃ of F2C Net is the sum of two terms: the sum-of-squared-deviations loss between the pathogenic factor-disease matrix and its reconstruction, and the sum-of-squared-deviations loss between the pathogenic factor-physical examination matrix and its reconstruction;

For D2C Net, the physical examination-disease matrix is the input. The encoder encodes it to produce a reconstructed pathogenic factor-physical examination matrix, and the decoder decodes that to produce a reconstructed physical examination-disease matrix. The loss function L₄ of D2C Net is the sum of two terms: the sum-of-squared-deviations loss between the physical examination-disease matrix and its reconstruction, and the sum-of-squared-deviations loss between the pathogenic factor-disease matrix and its reconstruction;

The sum L² = L₃ + L₄ is the total loss function for completing the pathogenic factor-physical examination matrix.

To complete the physical examination-disease matrix, D2C Net and D2F Net are used. Specifically:

For D2C Net, the pathogenic factor-physical examination matrix is the input. The encoder encodes it to produce a reconstructed physical examination-disease matrix, and the decoder decodes that to produce a reconstructed pathogenic factor-physical examination matrix. The loss function L₅ of D2C Net is the sum of two terms: the sum-of-squared-deviations loss between the pathogenic factor-physical examination matrix and its reconstruction, and the sum-of-squared-deviations loss between the physical examination-disease matrix and its reconstruction;

For D2F Net, the pathogenic factor-disease matrix is the input. The encoder encodes it to produce a reconstructed physical examination-disease matrix, and the decoder decodes that to produce a reconstructed pathogenic factor-disease matrix. The loss function L₆ of D2F Net is the sum of two terms: the sum-of-squared-deviations loss between the pathogenic factor-disease matrix and its reconstruction, and the sum-of-squared-deviations loss between the physical examination-disease matrix and its reconstruction;

The sum L³ = L₅ + L₆ is the total loss function for completing the physical examination-disease matrix.

During joint training, the sum L¹ + L² + L³ is the overall loss function. It is back-propagated to update the network parameters of D2F Net, D2C Net, and F2C Net and to complete the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix.
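The way the six per-network losses combine into the joint training objective can be sketched as follows. Two points are assumptions rather than statements from the patent: the sum of squared deviations is taken only over entries that are known (not `None`) in the target matrix, and the function names `sse` and `joint_loss` are hypothetical.

```python
def sse(target, recon):
    """Sum of squared deviations between a matrix and its reconstruction.

    Assumption: entries that are None in the target (unknown side
    information) contribute nothing to the loss.
    """
    total = 0.0
    for t_row, r_row in zip(target, recon):
        for t, r in zip(t_row, r_row):
            if t is not None:
                total += (t - r) ** 2
    return total

def joint_loss(losses):
    """Combine per-network losses L1..L6 into the overall objective.

    L^1 = L1 + L2 (completing the pathogenic factor-disease matrix),
    L^2 = L3 + L4 (completing the pathogenic factor-physical examination matrix),
    L^3 = L5 + L6 (completing the physical examination-disease matrix),
    overall = L^1 + L^2 + L^3.
    """
    sup1 = losses["L1"] + losses["L2"]
    sup2 = losses["L3"] + losses["L4"]
    sup3 = losses["L5"] + losses["L6"]
    return sup1 + sup2 + sup3

# Toy matrices (illustrative numbers only).
target = [[1.0, None], [0.0, 2.0]]
recon = [[0.5, 9.0], [0.0, 1.0]]
print(sse(target, recon))   # the None entry is skipped

# Each of L1..L6 would itself be a sum of two sse terms, as described above.
example = {"L1": 1.0, "L2": 2.0, "L3": 3.0, "L4": 4.0, "L5": 5.0, "L6": 6.0}
print(joint_loss(example))
```

In an actual training loop the overall value would be computed with an automatic differentiation framework so that it can be back-propagated through all three networks at once.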

The physical examination-disease matrix above is a matrix whose element values are complete, whereas the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix are incomplete matrices built only from the available information; neither includes element values for unknown pathogenic factors or unknown disease subtypes. Through the joint training of S103, the physical examination-disease matrix and the encoding and decoding functions of the three networks D2F Net, D2C Net, and F2C Net are used to complete the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix. This yields the probabilities of occurrence between unknown pathogenic factors and unknown disease subtypes, as well as the correlations between unknown pathogenic factors and physiological characteristics.
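One simple way to turn a network's reconstruction into a completed matrix is to keep every entry that was supplied as side information and take only the missing entries from the reconstruction. This merging rule is an assumption made for illustration; the patent does not state how the final values are assembled, and `fill_missing` is a hypothetical helper.

```python
def fill_missing(partial, reconstructed):
    """Return a completed copy of `partial`.

    Known entries are kept unchanged; entries that are None are replaced
    by the corresponding entry of `reconstructed`.
    """
    return [
        [r if p is None else p for p, r in zip(p_row, r_row)]
        for p_row, r_row in zip(partial, reconstructed)
    ]

# Toy pathogenic factor-disease block (illustrative numbers only).
partial = [[0.4, None], [None, 0.1]]
reconstructed = [[0.38, 0.05], [0.20, 0.12]]

completed = fill_missing(partial, reconstructed)
print(completed)
```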

S104: Input the physical examination-disease matrix to be completed into D2F Net and D2C Net and, using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix, and F2C Net, compute the completed physical examination-disease matrix.

This embodiment also provides a physical examination data completion device based on side information, comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor. The computer memory stores the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix completed by the above physical examination data completion method, together with the parameters of D2F Net, D2C Net, and F2C Net;

When executing the computer program, the computer processor implements the following steps:

The above physical examination data completion method and device complete unknown information by encoding and decoding, based on existing data and the determined pathogenic factor-disease and pathogenic factor-physical examination matrices. They greatly reduce doctors' heavy workload and ease the financial and physical burden on patients. They also help different hospitals and doctors use one another's examination results in a unified way, so that medical resources are not wasted. Once the device outputs the completed physical examination-disease matrix, that matrix contains the completed disease types, and the doctor can look up the disease result in it. The accuracy of the disease result can exceed 95%, which can assist doctors in diagnosing disease.
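Looking up a disease result in the completed matrix could work as sketched below. The sketch assumes that each disease subtype has its own column holding a numeric score and that the result is the highest-scoring subtype; the patent does not specify the lookup rule, and the column names and the helper `lookup_disease` are hypothetical.

```python
def lookup_disease(row, columns, disease_columns):
    """Return the disease subtype with the highest value in a patient row.

    Assumption: the completed matrix stores one numeric score per disease
    subtype column, and the largest score indicates the predicted subtype.
    """
    scores = {d: row[columns.index(d)] for d in disease_columns}
    return max(scores, key=scores.get)

# Illustrative completed row: two examination items, then three subtypes.
columns = ["globulin", "marker_x", "A", "B", "C"]
completed_row = [28.0, 1, 0.1, 0.7, 0.2]

print(lookup_disease(completed_row, columns, ["A", "B", "C"]))
```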

The specific embodiments described above explain the technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are only the most preferred embodiments of the present invention and are not intended to limit it; any modifications, additions, and equivalent substitutions made within the scope of the principles of the present invention shall fall within its scope of protection.

Claims

1. A physical examination data completion method based on side information, comprising the following steps:

(1) constructing a physical examination-disease matrix whose columns represent physiological characteristics and disease subtypes, whose rows represent patients, and whose element values are the patients' physiological characteristic detection values and disease types; a pathogenic factor-disease matrix whose columns represent disease subtypes, whose rows represent pathogenic factors, and whose element values are the probabilities that the pathogenic factors cause the diseases; and a pathogenic factor-physical examination matrix whose columns represent physiological characteristics, whose rows represent pathogenic factors, and whose element values are the correlations between the pathogenic factors and the physiological characteristics;

(2) for the physical examination-disease matrix, filling in physiological characteristic detection values according to physical examination item data and filling in disease types according to doctors' subjective diagnosis results; for the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix, filling in, according to medical knowledge, the probabilities that known pathogenic factors cause known disease subtypes and the correlations between known pathogenic factors and physiological characteristics;

(3) establishing an encoding-decoding network D2F Net between the physical examination-disease matrix and the pathogenic factor-disease matrix, an encoding-decoding network D2C Net between the physical examination-disease matrix and the pathogenic factor-physical examination matrix, and an encoding-decoding network F2C Net between the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix; the network structure of each of D2F Net, D2C Net and F2C Net consists of an auto-encoder built from convolutional layers and an auto-decoder built from deconvolution layers;

(4) jointly training the encoding-decoding networks D2F Net, D2C Net and F2C Net, and obtaining the completed pathogenic factor-disease matrix and the completed pathogenic factor-physical examination matrix after training is finished;

in joint training, the sum of the overall loss function L¹ for completing the pathogenic factor-disease matrix, the overall loss function L² for completing the pathogenic factor-physical examination matrix, and the overall loss function L³ for completing the physical examination-disease matrix is used as the total loss function; backpropagation is carried out, and the network parameters of D2F Net, D2C Net and F2C Net, the completed pathogenic factor-disease matrix, and the completed pathogenic factor-physical examination matrix are updated;

when the pathogenic factor-disease matrix is completed, it is completed using the encoding-decoding networks D2F Net and F2C Net; for D2F Net, the physical examination-disease matrix is used as the input variable, the auto-encoder encodes the physical examination-disease matrix to generate a reconstructed pathogenic factor-disease matrix, and the auto-decoder decodes the reconstructed pathogenic factor-disease matrix to generate a reconstructed physical examination-disease matrix; the sum of the sum of squared deviations between the physical examination-disease matrix and the reconstructed physical examination-disease matrix and the sum of squared deviations between the pathogenic factor-disease matrix and the reconstructed pathogenic factor-disease matrix is used as the loss function L₁ of D2F Net;

for F2C Net, the pathogenic factor-physical examination matrix is used as the input variable, the auto-encoder encodes the pathogenic factor-physical examination matrix to generate a reconstructed pathogenic factor-disease matrix, and the auto-decoder decodes the reconstructed pathogenic factor-disease matrix to generate a reconstructed pathogenic factor-physical examination matrix; the sum of the sum of squared deviations between the pathogenic factor-physical examination matrix and the reconstructed pathogenic factor-physical examination matrix and the sum of squared deviations between the pathogenic factor-disease matrix and the reconstructed pathogenic factor-disease matrix is used as the loss function L₂ of F2C Net;

the sum L¹ of the loss function L₁ and the loss function L₂ is used as the overall loss function for completing the pathogenic factor-disease matrix;

when the pathogenic factor-physical examination matrix is completed, it is completed using the encoding-decoding networks F2C Net and D2C Net; for F2C Net, the pathogenic factor-disease matrix is used as the input variable, the auto-encoder encodes the pathogenic factor-disease matrix to generate a reconstructed pathogenic factor-physical examination matrix, and the auto-decoder decodes the reconstructed pathogenic factor-physical examination matrix to generate a reconstructed pathogenic factor-disease matrix; the sum of the sum of squared deviations between the pathogenic factor-disease matrix and the reconstructed pathogenic factor-disease matrix and the sum of squared deviations between the pathogenic factor-physical examination matrix and the reconstructed pathogenic factor-physical examination matrix is used as the loss function L₃ of F2C Net;

for D2C Net, the physical examination-disease matrix is used as the input variable, the auto-encoder encodes the physical examination-disease matrix to generate a reconstructed pathogenic factor-physical examination matrix, and the auto-decoder decodes the reconstructed pathogenic factor-physical examination matrix to generate a reconstructed physical examination-disease matrix; the sum of the sum of squared deviations between the physical examination-disease matrix and the reconstructed physical examination-disease matrix and the sum of squared deviations between the pathogenic factor-physical examination matrix and the reconstructed pathogenic factor-physical examination matrix is used as the loss function L₄ of D2C Net;

the sum L² of the loss function L₃ and the loss function L₄ is used as the overall loss function for completing the pathogenic factor-physical examination matrix;

(5) inputting the physical examination-disease matrix to be completed into the encoding-decoding networks D2F Net and D2C Net, and computing the completed physical examination-disease matrix using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix, and the encoding-decoding network F2C Net;

when the physical examination-disease matrix is completed, it is completed using the encoding-decoding networks D2C Net and D2F Net; for D2C Net, the pathogenic factor-physical examination matrix is used as the input variable, the auto-encoder encodes the pathogenic factor-physical examination matrix to generate a reconstructed physical examination-disease matrix, and the auto-decoder decodes the reconstructed physical examination-disease matrix to generate a reconstructed pathogenic factor-physical examination matrix; the sum of the sum of squared deviations between the pathogenic factor-physical examination matrix and the reconstructed pathogenic factor-physical examination matrix and the sum of squared deviations between the physical examination-disease matrix and the reconstructed physical examination-disease matrix is used as the loss function L₅ of D2C Net;

for D2F Net, the pathogenic factor-disease matrix is used as the input variable, the auto-encoder encodes the pathogenic factor-disease matrix to generate a reconstructed physical examination-disease matrix, and the auto-decoder decodes the reconstructed physical examination-disease matrix to generate a reconstructed pathogenic factor-disease matrix; the sum of the sum of squared deviations between the pathogenic factor-disease matrix and the reconstructed pathogenic factor-disease matrix and the sum of squared deviations between the physical examination-disease matrix and the reconstructed physical examination-disease matrix is used as the loss function L₆ of D2F Net;

the sum L³ of the loss function L₅ and the loss function L₆ is used as the overall loss function for completing the physical examination-disease matrix.
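The loss structure recited in claim 1 can be illustrated with a small sketch: each encode/decode pass contributes two sums of squared deviations, the six pass losses L₁ to L₆ are grouped pairwise into L¹, L² and L³, and their sum is the total loss. The function names `ssd`, `pass_loss` and `total_loss` are hypothetical, matrices are plain nested lists, and the sketch assumes dense matrices of equal shape. It does not model the convolutional networks themselves or any masking of missing entries.

```python
from typing import List

Matrix = List[List[float]]


def ssd(a: Matrix, b: Matrix) -> float:
    """Sum of squared deviations between two equally shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def pass_loss(source: Matrix, source_rec: Matrix,
              target: Matrix, target_rec: Matrix) -> float:
    """Loss of one encode/decode pass.

    `source` is the input matrix and `source_rec` its decoded reconstruction;
    `target` is the matrix on the encoded side and `target_rec` the encoder output.
    """
    return ssd(source, source_rec) + ssd(target, target_rec)


def total_loss(pass_losses: List[float]) -> float:
    """Group L1..L6 into L^1 = L1 + L2, L^2 = L3 + L4, L^3 = L5 + L6 and sum them."""
    if len(pass_losses) != 6:
        raise ValueError("expected the six pass losses L1..L6")
    l1, l2, l3, l4, l5, l6 = pass_losses
    overall_1 = l1 + l2  # completes the pathogenic factor-disease matrix
    overall_2 = l3 + l4  # completes the pathogenic factor-physical examination matrix
    overall_3 = l5 + l6  # completes the physical examination-disease matrix
    return overall_1 + overall_2 + overall_3
```

In a real training loop the total would be computed on differentiable tensors so that it can be backpropagated; the plain floats here only show how the terms are combined.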

2. The physical examination data completion method based on side information according to claim 1, wherein, in the physical examination-disease matrix, for physiological characteristics reported as negative or positive, the detection value of a physiological characteristic reported as positive is represented by 1, and the detection value of a physiological characteristic reported as negative is represented by 0.
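The encoding in claim 2 amounts to a simple mapping from qualitative results to 0 or 1. The sketch below assumes results arrive as text labels; the accepted labels and the function name `encode_qualitative` are illustrative assumptions, and a missing result is passed through as `None` so that it can be completed later.

```python
from typing import Optional

# Assumed label spellings; a real data source may use different ones.
POSITIVE_LABELS = {"positive", "+", "阳性"}
NEGATIVE_LABELS = {"negative", "-", "阴性"}


def encode_qualitative(result: Optional[str]) -> Optional[int]:
    """Map a positive result to 1, a negative result to 0, and keep missing as None."""
    if result is None:
        return None
    label = result.strip().lower()
    if label in POSITIVE_LABELS:
        return 1
    if label in NEGATIVE_LABELS:
        return 0
    raise ValueError(f"unrecognised qualitative result: {result!r}")
```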

3. A physical examination data completion apparatus based on side information, comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, characterized in that:

the computer memory stores the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix completed by the side information-based physical examination data completion method of claim 1 or 2, together with the parameters of the encoding-decoding networks D2F Net, D2C Net and F2C Net;

the computer processor, when executing the computer program, performs the steps of:

receiving the input physical examination-disease matrix to be completed, computing on it using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix, and the encoding-decoding networks D2F Net, D2C Net and F2C Net, and outputting the completed physical examination-disease matrix.

4. Use of the side-information based physical examination data completion apparatus of claim 3 for obtaining disease results, wherein the disease results are obtained by searching the output completed physical examination-disease matrix.