CN109658996B - Physical examination data completion method and device based on side information and application
- Publication number
- CN109658996B (application CN201811416427.4A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- disease
- physical examination
- net
- pathogenic factor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a side-information-based method for completing physical examination data, comprising: (1) constructing a physical examination-disease matrix, a pathogenic factor-disease matrix, and a pathogenic factor-physical examination matrix, and filling them in according to side information; (2) establishing encoder-decoder networks D2F Net, D2C Net, and F2C Net, one between each pair of matrices; (3) jointly training D2F Net, D2C Net, and F2C Net, so that when training ends the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix have been completed; and (4) inputting the physical examination-disease matrix to be completed into D2F Net and D2C Net and, using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix, and F2C Net, computing the completed physical examination-disease matrix. Also disclosed is a side-information-based device for completing physical examination data, which can complete physical examination data and disease results from existing information.
Description
Technical Field
The invention belongs to the fields of data statistics and artificial intelligence, and specifically relates to a side-information-based method, device, and application for completing physical examination data.
Background
A traditional physical examination program screens for disease through a series of examinations: according to the needs of different conditions, examinations of the relevant physiological features are carried out as arranged or recommended by a doctor or a medical manual, and the doctor then diagnoses the diseases the patient may have from the results. Because examination items are numerous, and different hospitals, doctors, and eras use different examination methods, the items are disorderly and cannot be unified, which wastes medical resources and burdens patients needlessly.
With the continuous development of science and technology, research on medical knowledge, such as the correlations among physiological features implied by different examination items and the degree to which physiological features influence diseases, has matured, and the problems of matrix completion and side information have also advanced. Matrix completion (MC) is the process of estimating unknown elements from known elements so as to restore a complete matrix. It is a key and difficult topic in artificial intelligence research, whose task is to complete an incomplete matrix by means of artificial intelligence algorithms. The task has important applications in data mining, e-commerce marketing, engineering control, and image and video processing.
In medical settings, unifying different physical examination items relies on matrix completion algorithms, which infer the results of unknown examination items from related ones. However, current matrix completion techniques mostly rely on methods such as linear transformation and local information interpolation; there has been little research on nonlinear transformation using background knowledge, and the results are not yet satisfactory.
Side information refers to using existing information Y to assist in encoding information X, which can make the code length of X shorter. Side information arises in multi-user source coding. A common example: suppose one goes to a racecourse to bet on horses; an optimal betting scheme can be derived from the odds of each horse. If, however, some historical data are known, such as the results of the previous few races, an even better scheme can be derived. The historical data in horse betting is the side information.
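The claim that side information Y can shorten the encoding of X corresponds to the standard information-theoretic inequality H(X|Y) ≤ H(X). The following minimal sketch checks it numerically; the joint distribution is a made-up illustration (Y standing for the result of a previous race, X for the winner of the next one) and is not taken from the patent.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(X|Y) in bits, where joint[y][x] = P(X = x, Y = y)."""
    h = 0.0
    for row in joint:
        p_y = sum(row)
        if p_y > 0:
            # Weight the entropy of P(X | Y = y) by P(Y = y).
            h += p_y * entropy([p / p_y for p in row])
    return h


# Hypothetical joint distribution (illustration only).
joint = [
    [0.40, 0.10],  # Y = 0
    [0.10, 0.40],  # Y = 1
]

# Marginal distribution of X, obtained by summing over Y.
p_x = [sum(row[x] for row in joint) for x in range(2)]

h_x = entropy(p_x)                        # bits needed for X alone
h_x_given_y = conditional_entropy(joint)  # bits needed for X once Y is known
```

With this distribution, knowing Y lowers the average number of bits needed to describe X, which is the sense in which the history of past races helps the bettor.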
A side information algorithm completes the missing information in a matrix based on side information: it finds related and unrelated data points in the information flow to constrain and assist matrix completion, and can be applied in any field that requires matrix completion. Side information methods remain a branch of traditional machine learning, and there have been few attempts to combine them with artificial neural networks and deep learning.
In the medical field, severe data missingness and scarce labeled data are common, yet matrix completion methods are rarely applied.
Summary of the Invention
An object of the present invention is to provide a side-information-based method and device for completing physical examination data, which can complete physical examination data and disease results from existing information.
Another object of the present invention is to provide an application of the side-information-based device for completing physical examination data, in which the device is used to reconstruct diseases.
To achieve these objects, the following technical solutions are provided.
In a first aspect, a side-information-based method for completing physical examination data comprises the following steps:
(1) constructing a physical examination-disease matrix, whose columns represent physiological features and disease subtypes, whose rows represent patients, and whose element values are the patients' measured physiological feature values and disease types; a pathogenic factor-disease matrix, whose columns represent disease subtypes, whose rows represent pathogenic factors, and whose element values are the probabilities that a pathogenic factor causes a disease; and a pathogenic factor-physical examination matrix, whose columns represent physiological features, whose rows represent pathogenic factors, and whose element values are the correlations between pathogenic factors and physiological features;
(2) for the physical examination-disease matrix, filling in the measured physiological feature values from the examination item data and the disease types from the doctor's subjective diagnosis; for the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix, filling in, according to medical knowledge, the probabilities that known pathogenic factors cause known disease subtypes and the correlations between known pathogenic factors and physiological features;
(3) establishing encoder-decoder networks D2F Net, D2C Net, and F2C Net between, respectively, the physical examination-disease matrix and the pathogenic factor-disease matrix; the physical examination-disease matrix and the pathogenic factor-physical examination matrix; and the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix;
(4) jointly training the encoder-decoder networks D2F Net, D2C Net, and F2C Net, so that when training ends the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix have been completed;
(5) inputting the physical examination-disease matrix to be completed into D2F Net and D2C Net and, using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix, and F2C Net, computing the completed physical examination-disease matrix.
This method can complete unknown information from existing data by encoding and decoding, which greatly reduces doctors' heavy workload and eases patients' financial and physical burden. It can also help different hospitals and doctors apply different examination results in a unified way, so that medical resources are not wasted.
In a second aspect, a side-information-based device for completing physical examination data comprises a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor.
The computer memory stores the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix completed by the method of the first aspect, together with the parameters of D2F Net, D2C Net, and F2C Net.
When executing the computer program, the computer processor implements the following steps:
receiving an input physical examination-disease matrix to be completed; performing computation on it using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix, D2F Net, D2C Net, and F2C Net; and outputting the completed physical examination-disease matrix.
This device can complete unknown information from existing data and from the determined pathogenic factor-disease and pathogenic factor-physical examination matrices by encoding and decoding, which greatly reduces doctors' heavy workload and eases patients' financial and physical burden. It can also help different hospitals and doctors apply different examination results in a unified way, so that medical resources are not wasted.
In a third aspect, an application uses the device of the second aspect to obtain disease results: the disease result is looked up in the completed physical examination-disease matrix output by the device.
The disease subtypes predicted from the completed physical examination-disease matrix output by the device reach an accuracy above 95% and can assist doctors in diagnosing disease.
Brief Description of the Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic form of the physical examination-disease matrix provided by the embodiment;
Figure 2 is a schematic form of the pathogenic factor-disease matrix provided by the embodiment;
Figure 3 is a schematic form of the pathogenic factor-physical examination matrix provided by the embodiment;
Figure 4 is a schematic diagram of the encoder-decoder networks constructed between the physical examination-disease matrix, the pathogenic factor-disease matrix, and the pathogenic factor-physical examination matrix provided by the embodiment.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the invention and do not limit its scope of protection.
To address the cost and effort of physical examinations and the heavy examination workload of doctors, the side-information-based method for completing physical examination data provided by this embodiment comprises the following steps:
S101: construct the physical examination-disease matrix, the pathogenic factor-disease matrix, and the pathogenic factor-physical examination matrix.
For the physical examination-disease matrix, the columns represent physiological features and disease subtypes, the rows represent patients, and the element values are the patients' measured physiological feature values and disease types. Physiological features are items of physiological information about the human body, generally physical examination items, including height, weight, heart rate, the routine blood test (20 items), and so on. Disease subtypes are the disease types subjectively diagnosed by a doctor, such as hypertension and diabetes. Figure 1 shows a schematic physical examination-disease matrix that contains no real information and serves only to illustrate the structure of the matrix. In Figure 1, the rows represent different patients; some columns represent examination items, such as globulin, "Hongxibiao" (洪锡标), and alanine aminotransferase; and other columns represent the patients' examination results, such as A, B, C, D, E, F, and G.
In the physical examination-disease matrix, for physiological features reported as positive or negative, a positive result is represented by the value 1 and a negative result by the value 0.
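The layout described above can be sketched as follows. This is only an illustration under assumptions: the patient records, the `urine_protein` column, and the helper `encode_result` are hypothetical and do not come from the patent; only the row/column roles and the 1/0 encoding of positive/negative results follow the text.

```python
def encode_result(value):
    """Map a qualitative result to 1 (positive) or 0 (negative).

    Numeric measurements and disease labels are passed through unchanged.
    """
    if value == "positive":
        return 1
    if value == "negative":
        return 0
    return value


# Columns: physiological features first, then the diagnosed disease subtype.
columns = ["globulin", "alanine_aminotransferase", "urine_protein", "disease_subtype"]

# Hypothetical patient records (one row of the matrix per patient).
records = [
    {"globulin": 28.5, "alanine_aminotransferase": 31.0,
     "urine_protein": "positive", "disease_subtype": "A"},
    {"globulin": 24.1, "alanine_aminotransferase": 18.0,
     "urine_protein": "negative", "disease_subtype": "C"},
]

# Rows are patients, columns are features and disease subtype.
exam_disease = [[encode_result(rec[col]) for col in columns] for rec in records]
```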
For the pathogenic factor-disease matrix, the columns represent disease subtypes, which are divided into explicit (显性) and latent (隐性): known diseases are explicit disease subtypes, and unknown diseases are latent disease subtypes. The rows represent pathogenic factors, which are likewise divided into explicit (known) and latent (unknown) pathogenic factors. The element values are the probabilities that a pathogenic factor causes a disease. Suppose the pathogenic factor-disease matrix is an M×N matrix: its M rows represent M pathogenic factors, of which only m (< M) are explicit, and its N columns represent N disease subtypes, of which only n (< N) are explicit. Figure 2 is an exemplary pathogenic factor-disease matrix in which disease subtypes A, B, C, D, E, F, and G are known diseases, and the remaining unknown type 1, unknown type 2, unknown type 3, and unknown type 4 are unknown disease subtypes; a, b, and c are known pathogenic factors, and the other 6 are unknown pathogenic factors. In the case of Figure 2, M = 9, m = 3, N = 11, and n = 7. M and N must be greater than m and n respectively; how much greater is estimated from experience.
For the m×n submatrix formed by the known disease subtypes and known pathogenic factors, the element values, i.e. the probabilities that a pathogenic factor causes a disease, are filled in according to medical knowledge or medical evidence; the values 0.4, 0.1, and so on in Figure 2 are filled in this way. This establishes the side information for the pathogenic factor-disease matrix. Elements of the M×N matrix that correspond to unknown disease subtypes or unknown pathogenic factors cannot be filled in and are left empty.
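A minimal sketch of this partially filled M×N matrix, using the sizes given for Figure 2 (M = 9, m = 3, N = 11, n = 7). Representing empty cells as `None` is an assumption, as are the positions of the two example probabilities: the text mentions the values 0.4 and 0.1 but not which cells hold them. In practice the whole known m×n block would be filled from medical knowledge.

```python
M, N = 9, 11  # total pathogenic factors / disease subtypes (known + unknown)
m, n = 3, 7   # known pathogenic factors / known disease subtypes

factors = ["a", "b", "c"] + [f"unknown_factor_{i}" for i in range(1, M - m + 1)]
subtypes = list("ABCDEFG") + [f"unknown_type_{j}" for j in range(1, N - n + 1)]

# Side information: (factor, subtype) -> probability.
# The cell positions below are hypothetical.
known_probabilities = {("a", "A"): 0.4, ("a", "B"): 0.1}

# Start with every cell empty, then fill only what is known.
factor_disease = [[None] * N for _ in range(M)]
for (factor, subtype), prob in known_probabilities.items():
    factor_disease[factors.index(factor)][subtypes.index(subtype)] = prob

# Cells that the joint training is expected to complete.
num_missing = sum(value is None for row in factor_disease for value in row)
```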
For the pathogenic factor-physical examination matrix, the columns represent physiological features (that is, physical examination data), the rows represent pathogenic factors, and the element values are the correlations between pathogenic factors and physiological features. These correlations are constructed from medical knowledge and medical statistics. Depending on the degree of correlation, they can be expressed as high, medium, or low, as shown in Figure 3; alternatively, a positive weight can represent a positive correlation, a negative weight a negative correlation, and 0 no correlation. This establishes the side information for the pathogenic factor-physical examination matrix.
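One way to combine the two representations mentioned above (high/medium/low levels and signed weights) is sketched below. The numeric scale in `LEVEL_TO_MAGNITUDE`, the helper `correlation_weight`, and the annotations themselves are assumptions made for illustration; the patent does not specify numeric values.

```python
# Assumed numeric scale for the qualitative levels (not given in the patent).
LEVEL_TO_MAGNITUDE = {"high": 1.0, "medium": 0.5, "low": 0.2}


def correlation_weight(level, sign=1):
    """Signed weight: positive or negative correlation, 0.0 if uncorrelated."""
    if level is None:
        return 0.0
    return sign * LEVEL_TO_MAGNITUDE[level]


features = ["height", "weight", "heart_rate"]

# Hypothetical annotations: factor -> {feature: (level, sign)}.
annotations = {
    "a": {"weight": ("high", 1), "heart_rate": ("low", -1)},
    "b": {"heart_rate": ("medium", 1)},
}

# Rows are pathogenic factors, columns are physiological features.
factor_exam = {
    factor: [correlation_weight(*ann.get(feature, (None, 1))) for feature in features]
    for factor, ann in annotations.items()
}
```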
S102: establish the encoder-decoder networks D2F Net, D2C Net, and F2C Net between, respectively, the physical examination-disease matrix and the pathogenic factor-disease matrix; the physical examination-disease matrix and the pathogenic factor-physical examination matrix; and the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix, as shown in Figure 4.
The network structures of D2F Net, D2C Net, and F2C Net each consist of an encoder built from convolutional layers and a decoder built from deconvolution layers. There are generally 3 to 4 convolutional layers and 3 to 4 deconvolution layers, and a reconstruction objective function is established on each layer; in the decoder, the reconstruction difference corresponding to each layer is required to be as small as possible.
If the physical examination-disease matrix, the pathogenic factor-disease matrix, and the pathogenic factor-physical examination matrix are large, a high-capacity neural network such as ResNeXt is used for encoding, and the decoder is built from deconvolution layers corresponding to the convolutional layers of that network. The network must not contain pooling layers, which cause information loss; its pooling layers and dropout layers need to be removed.
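The following sketch illustrates only the shape bookkeeping of such an encoder and decoder: three convolutional layers followed by three matching deconvolution layers, with no pooling or dropout. It assumes that "deconvolution" means transposed convolution, and it uses fixed, untrained 3×3 kernels with stride 1 and no nonlinearity; kernel size, stride, and input size are illustrative choices, not values from the patent.

```python
def conv2d_valid(x, k):
    """2D cross-correlation with stride 1 and no padding (no pooling)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def conv_transpose2d(y, k):
    """Transposed convolution, stride 1: scatter each y[i][j] * k into the output."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(y) + kh - 1, len(y[0]) + kw - 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i, row in enumerate(y):
        for j, value in enumerate(row):
            for a in range(kh):
                for b in range(kw):
                    out[i + a][j + b] += value * k[a][b]
    return out


# Toy 9x11 input matrix and three untrained 3x3 kernels (placeholders).
x = [[float(i * 11 + j) for j in range(11)] for i in range(9)]
kernels = [[[0.1] * 3 for _ in range(3)] for _ in range(3)]

code = x
for k in kernels:            # encoder: 9x11 -> 7x9 -> 5x7 -> 3x5
    code = conv2d_valid(code, k)

recon = code
for k in reversed(kernels):  # decoder: 3x5 -> 5x7 -> 7x9 -> 9x11
    recon = conv_transpose2d(recon, k)
```

Because each deconvolution layer mirrors one convolutional layer, the decoder output has the same shape as the encoder input, which is what allows an element-wise reconstruction loss to be computed.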
S103: jointly train the encoder-decoder networks D2F Net, D2C Net, and F2C Net. When training ends, the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix have been completed.
When completing the pathogenic factor-disease matrix, D2F Net and F2C Net are used. Specifically:
For D2F Net, the physical examination-disease matrix is the input. The encoder encodes it to produce a reconstructed pathogenic factor-disease matrix, and the decoder decodes the reconstructed pathogenic factor-disease matrix to produce a reconstructed physical examination-disease matrix. The loss function $L_1$ of D2F Net is the sum of two sum-of-squared-deviations losses: one between the physical examination-disease matrix and the reconstructed physical examination-disease matrix, and one between the pathogenic factor-disease matrix and the reconstructed pathogenic factor-disease matrix.
For F2C Net, the pathogenic factor-physical examination matrix is the input. The encoder encodes it to produce a reconstructed pathogenic factor-disease matrix, and the decoder decodes the reconstructed pathogenic factor-disease matrix to produce a reconstructed pathogenic factor-physical examination matrix. The loss function $L_2$ of F2C Net is the sum of two sum-of-squared-deviations losses: one between the pathogenic factor-physical examination matrix and the reconstructed pathogenic factor-physical examination matrix, and one between the pathogenic factor-disease matrix and the reconstructed pathogenic factor-disease matrix.
The sum of $L_1$ and $L_2$, denoted $L^{1}$, is the total loss function for completing the pathogenic factor-disease matrix.
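A minimal sketch of how one such loss could be evaluated. The patent specifies sum-of-squared-deviations losses; restricting the sum to cells whose target value is known (here, not `None`) is an assumption made so that the partially filled pathogenic factor-disease matrix can be compared with its reconstruction. All matrix values are toy numbers, and the "reconstructions" are hard-coded stand-ins for network outputs.

```python
def sum_squared_deviation(target, recon):
    """Sum of squared deviations over cells where the target is known."""
    return sum(
        (t - r) ** 2
        for target_row, recon_row in zip(target, recon)
        for t, r in zip(target_row, recon_row)
        if t is not None  # assumption: unknown cells contribute nothing
    )


# Toy physical examination-disease matrix and a stand-in decoder output.
exam_disease = [[1.0, 0.0], [0.0, 1.0]]
exam_disease_recon = [[0.9, 0.1], [0.2, 0.7]]

# Toy pathogenic factor-disease matrix (mostly unknown) and a stand-in encoder output.
factor_disease = [[0.4, None], [None, None]]
factor_disease_recon = [[0.5, 0.3], [0.2, 0.1]]

# Loss of D2F Net in this stage: the two reconstruction terms added together.
loss_d2f = (
    sum_squared_deviation(exam_disease, exam_disease_recon)
    + sum_squared_deviation(factor_disease, factor_disease_recon)
)
```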
When completing the pathogenic factor-physical examination matrix, F2C Net and D2C Net are used. Specifically:
For F2C Net, the pathogenic factor-disease matrix is the input. The encoder encodes it to produce a reconstructed pathogenic factor-physical examination matrix, and the decoder decodes the reconstructed pathogenic factor-physical examination matrix to produce a reconstructed pathogenic factor-disease matrix. The loss function $L_3$ of F2C Net is the sum of two sum-of-squared-deviations losses: one between the pathogenic factor-disease matrix and the reconstructed pathogenic factor-disease matrix, and one between the pathogenic factor-physical examination matrix and the reconstructed pathogenic factor-physical examination matrix.
For D2C Net, the physical examination-disease matrix is the input. The encoder encodes it to produce a reconstructed pathogenic factor-physical examination matrix, and the decoder decodes the reconstructed pathogenic factor-physical examination matrix to produce a reconstructed physical examination-disease matrix. The loss function $L_4$ of D2C Net is the sum of two sum-of-squared-deviations losses: one between the physical examination-disease matrix and the reconstructed physical examination-disease matrix, and one between the pathogenic factor-disease matrix and the reconstructed pathogenic factor-disease matrix.
The sum of $L_3$ and $L_4$, denoted $L^{2}$, is the total loss function for completing the pathogenic factor-physical examination matrix.
当补全体检-疾病矩阵时,采用D2C Net和D2F Net对体检-疾病矩阵进行补全,具体地,When completing the physical examination-disease matrix, D2C Net and D2F Net are used to complete the physical examination-disease matrix. Specifically,
对于D2C Net,以致病因子-体检矩阵作为输入变量,采用自编码器对致病因子-体检矩阵进行编码产生重构体检-疾病矩阵,采用自解码器对重构体检-疾病矩阵进行解码,产生重构致病因子-体检矩阵,以致病因子-体检矩阵与重构致病因子-体检矩阵的离差平方和损失函数,和体检-疾病矩阵与重构体检-疾病矩阵的离差平方和损失函数之和作为D2C Net的损失函数L5;For D2C Net, using the pathogenic factor-physical examination matrix as the input variable, the autoencoder is used to encode the pathogenic factor-physical examination matrix to generate the reconstructed physical examination-disease matrix, and the autodecoder is used to decode the reconstructed physical examination-disease matrix, Generate a reconstructed causative factor-physical examination matrix, with the squared dispersion loss function of the causative factor-physical examination matrix and the reconstructed causative factor-physical examination matrix, and the squared deviation of the physical examination-disease matrix and the reconstructed physical examination-disease matrix. The sum of the loss function is used as the loss function L 5 of D2C Net;
对于D2F Net,以致病因子-疾病矩阵作为输入变量,采用自编码器对致病因子-疾病矩阵进行编码产生重构体检-疾病矩阵,采用自解码器对重构体检-疾病矩阵进行解码,产生重构致病因子-疾病矩阵,以致病因子-疾病矩阵与重构致病因子-疾病矩阵的离差平方和损失函数,和体检-疾病矩阵与重构体检-疾病矩阵的离差平方和损失函数之和作为D2F Net的损失函数L6;For D2F Net, the pathogenic factor-disease matrix is used as the input variable, the autoencoder is used to encode the pathogenic factor-disease matrix to generate the reconstructed medical examination-disease matrix, and the autodecoder is used to decode the reconstructed medical examination-disease matrix, Generate a reconstructed causative factor-disease matrix, with the squared dispersion loss function of the causative factor-disease matrix and the reconstructed causative factor-disease matrix, and the squared deviation of the medical examination-disease matrix and the reconstructed medical examination-disease matrix The sum of the loss function is used as the loss function L 6 of D2F Net;
The sum L3 of the loss function L5 and the loss function L6 is used as the total loss function for completing the physical examination-disease matrix.
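Written out, the three losses described above take the following form. The symbols are introduced here for illustration only and do not appear in the original: M_FC, M_FD and M_CD denote the pathogenic factor-physical examination, pathogenic factor-disease and physical examination-disease matrices, a hat marks a reconstruction, and the superscript names the network that produced it. The sum of squared deviations between two matrices is the squared Frobenius norm of their difference.

```latex
\begin{aligned}
L_5 &= \bigl\lVert M_{FC} - \hat{M}_{FC} \bigr\rVert_F^2
     + \bigl\lVert M_{CD} - \hat{M}_{CD}^{(\mathrm{D2C})} \bigr\rVert_F^2, \\
L_6 &= \bigl\lVert M_{FD} - \hat{M}_{FD} \bigr\rVert_F^2
     + \bigl\lVert M_{CD} - \hat{M}_{CD}^{(\mathrm{D2F})} \bigr\rVert_F^2, \\
L_3 &= L_5 + L_6 .
\end{aligned}
```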
During joint training, the sum of L1, L2 and L3 is used as the total loss function and is backpropagated to update the network parameters of D2F Net, D2C Net and F2C Net and to complete the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix.
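The idea of minimizing a summed loss by backpropagation can be sketched with plain NumPy. This is a simplified illustration under several assumptions: the networks are linear with no bias, three loss terms of the same two-part form stand in for L1, L2 and L3 (whose actual definitions are not repeated here), the networks share no parameters, and the input matrices are held fixed rather than completed during training. The names `ae_loss_and_grads`, `joint_step` and `make_net` are hypothetical.

```python
import numpy as np

def ae_loss_and_grads(x, target, w_enc, w_dec):
    """Two-term squared-error loss of a linear autoencoder, with gradients.

    x: input matrix; target: matrix the encoder output should match.
    Returns (loss, dL/dw_enc, dL/dw_dec).
    """
    code = x @ w_enc        # encoder output
    recon = code @ w_dec    # decoder output, a reconstruction of x
    loss = float(np.sum((x - recon) ** 2) + np.sum((target - code) ** 2))
    d_recon = 2.0 * (recon - x)
    d_code = d_recon @ w_dec.T + 2.0 * (code - target)
    return loss, x.T @ d_code, code.T @ d_recon

def joint_step(nets, lr=0.005):
    """One gradient-descent step on the sum of all networks' losses.

    Returns the summed loss evaluated before the update. Because the
    networks in this sketch share no parameters, the gradient of the sum
    with respect to one network's weights is the gradient of its own term.
    """
    total = 0.0
    for net in nets:
        loss, g_enc, g_dec = ae_loss_and_grads(
            net["x"], net["target"], net["w_enc"], net["w_dec"])
        total += loss
        net["w_enc"] -= lr * g_enc
        net["w_dec"] -= lr * g_dec
    return total

def make_net(rng, n_rows, n_in, n_code):
    """Random toy data and small random weights for one network."""
    return {
        "x": rng.random((n_rows, n_in)),
        "target": rng.random((n_rows, n_code)),
        "w_enc": 0.1 * rng.standard_normal((n_in, n_code)),
        "w_dec": 0.1 * rng.standard_normal((n_code, n_in)),
    }
```

Calling `joint_step` repeatedly on a list of three networks built with `make_net` drives the summed loss down. In the method described above the networks are additionally coupled through the shared matrices, which this sketch does not model.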
The above physical examination-disease matrix is a matrix whose element values are complete, whereas the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix are only incomplete matrices built from available information; that is, neither contains the element values corresponding to unknown pathogenic factors and unknown disease subtypes. Through the joint training of S103, the physical examination-disease matrix and the encoding and decoding functions of the three networks D2F Net, D2C Net and F2C Net are used to complete the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix. This yields the probability of occurrence relating unknown pathogenic factors to unknown disease subtypes, as well as the correlation between unknown pathogenic factors and physiological characteristics.
S104: input the physical examination-disease matrix to be completed into D2F Net and D2C Net, and compute the completed physical examination-disease matrix using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix and F2C Net.
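The final fill-in of step S104 can be sketched as keeping every known entry and taking only the missing entries from the network output. This is an assumption about how the completed matrix is assembled: the text does not say how missing entries are encoded or merged. Here missing entries are marked with NaN, and `complete_matrix` is a hypothetical helper name.

```python
import numpy as np

def complete_matrix(observed, predicted):
    """Fill the NaN entries of `observed` with `predicted`; keep known entries.

    observed: exam-disease matrix to be completed, NaN where a value is missing.
    predicted: network output of the same shape.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("observed and predicted must have the same shape")
    return np.where(np.isnan(observed), predicted, observed)
```

Keeping the observed entries unchanged means the completion can only add information and never overwrites a recorded examination result.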
This embodiment also provides a side-information-based physical examination data completion device, comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor. The computer memory stores the pathogenic factor-disease matrix and the pathogenic factor-physical examination matrix completed by the above physical examination data completion method, together with the parameters of D2F Net, D2C Net and F2C Net;
When executing the computer program, the computer processor implements the following steps:
Receive the input physical examination-disease matrix to be completed, compute on it using the completed pathogenic factor-disease matrix, the completed pathogenic factor-physical examination matrix, D2F Net, D2C Net and F2C Net, and output the completed physical examination-disease matrix.
The above physical examination data completion method and device can complete unknown information by encoding and decoding, based on the existing data and the determined pathogenic factor-disease matrix and pathogenic factor-physical examination matrix. This greatly reduces doctors' heavy workload and lessens the economic and physical burden on patients. It can also help different hospitals and doctors apply differing physical examination results in a uniform way, so that medical resources are not wasted. After the device outputs the completed physical examination-disease matrix, that matrix contains the completed disease types, and a doctor can look up a disease result in it. The accuracy of this disease result can reach more than 95%, so it can assist doctors in diagnosing diseases.
The specific embodiments described above explain the technical solutions and beneficial effects of the present invention in detail. It should be understood that they are only the most preferred embodiments of the present invention and are not intended to limit it. Any modifications, additions and equivalent substitutions made within the scope of the principles of the present invention shall fall within its scope of protection.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811416427.4A CN109658996B (en) | 2018-11-26 | 2018-11-26 | Physical examination data completion method and device based on side information and application |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811416427.4A CN109658996B (en) | 2018-11-26 | 2018-11-26 | Physical examination data completion method and device based on side information and application |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109658996A CN109658996A (en) | 2019-04-19 |
| CN109658996B CN109658996B (en) | 2020-08-18 |
Family
ID=66111605
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811416427.4A Active CN109658996B (en) | 2018-11-26 | 2018-11-26 | Physical examination data completion method and device based on side information and application |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109658996B (en) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107205154A (en) * | 2017-06-07 | 2017-09-26 | 南京邮电大学 | A kind of radio multimedia sensor network compression of images acquisition method based on matrix completion |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105469123A (en) * | 2015-12-30 | 2016-04-06 | 华东理工大学 | Missing data completion method based on k plane regression |
| CN106097355A (en) * | 2016-06-14 | 2016-11-09 | 山东大学 | The micro-Hyperspectral imagery processing method of gastroenteric tumor based on convolutional neural networks |
| US10776963B2 (en) * | 2016-07-01 | 2020-09-15 | Cubismi, Inc. | System and method for forming a super-resolution biomarker map image |
| CN107391906B (en) * | 2017-06-19 | 2020-04-28 | 华南理工大学 | Construction method of healthy diet knowledge network based on neural network and graph structure |
| CN107301331B (en) * | 2017-07-20 | 2020-05-05 | 北京大学 | A mining method of disease influencing factors based on gene chip data |
| CN107808278B (en) * | 2017-10-11 | 2021-09-24 | 河海大学 | A recommendation method for Github open source project based on sparse autoencoder |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109658996A (en) | 2019-04-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Lin et al. | Healthgpt: A medical large vision-language model for unifying comprehension and generation via heterogeneous knowledge adaptation | |
| CN107016438B (en) | System based on traditional Chinese medicine syndrome differentiation artificial neural network algorithm model | |
| CN107391900B (en) | Atrial fibrillation detection method, classification model training method and terminal device | |
| CN110544274A (en) | A method and system for fundus image registration based on multispectral | |
| CN116453706A (en) | Hemodialysis scheme making method and system based on reinforcement learning | |
| CN117034142B (en) | A method and system for filling missing values in unbalanced medical data | |
| CN114822849A (en) | Data monitoring method, device, equipment and storage medium based on digital twins | |
| CN115115736B (en) | Image artifact removal method, device, equipment and storage medium | |
| CN107909653B (en) | A 3D reconstruction method of cardiac soft tissue based on sparse principal component analysis | |
| Bruce et al. | Skeleton-based detection of abnormalities in human actions using graph convolutional networks | |
| CN116309754A (en) | A brain medical image registration method and system based on local-global information collaboration | |
| CN117079093B (en) | Method, device and equipment for predicting abnormal symptoms based on multi-mode image data | |
| CN114742915A (en) | Magnetic resonance reconstruction method of super-resolution convolutional neural network based on cavity convolution | |
| CN115272369B (en) | Dynamic aggregation transformer network and retinal vessel segmentation method | |
| CN109658996B (en) | Physical examination data completion method and device based on side information and application | |
| JP2024537971A (en) | Method, program and device for learning and inference of deep learning model based on medical data | |
| CN109979591B (en) | A method and device for analyzing plaque progression factors based on graph neural network | |
| Zhao et al. | Motor function assessment of children with cerebral palsy using monocular video | |
| Saxena et al. | Adaptive multi-hop deep learning based drug recommendation system with selective coverage mechanism | |
| Fu et al. | EEG2GAIT: A hierarchical graph convolutional network for EEG-based gait decoding | |
| CN119762382A (en) | Low-dose CT image denoising method based on lightweight contextual Transformer network | |
| CN119235354A (en) | Ultrasonic section positioning method, device and product for echocardiography | |
| CN118334051A (en) | Left ventricle segmentation method based on semi-supervised adversarial consistency learning in echocardiography | |
| Liu et al. | Artificial neural networks condensation: A strategy to facilitate adaption of machine learning in medical settings by reducing computational burden | |
| Dinh et al. | Quantitative gait analysis from single RGB videos using a dual-input transformer-based network |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
