CN118227776B

CN118227776B - Disease science popularization method and system based on artificial intelligence

Info

Publication number: CN118227776B
Application number: CN202410643440.2A
Authority: CN
Inventors: 康盛伟; 舒启航; 周嘉琪; 古昌伟; 杨颖�; 比确子拉
Original assignee: Sichuan Cancer Hospital
Current assignee: Sichuan Cancer Hospital
Priority date: 2024-05-23
Filing date: 2024-05-23
Publication date: 2024-07-23
Anticipated expiration: 2044-05-23
Also published as: CN118227776A

Abstract

The invention relates to the technical field of medical information and provides a disease science popularization method and system based on artificial intelligence. The method comprises the following steps: inputting a diagnosis report of a user and extracting a disease result keyword from the diagnosis report; collecting a synonym set of the disease result keyword based on an extreme learning machine classifier; if the disease result keyword is not in the synonym set, updating the synonym set; crawling medical materials to be popularized according to the synonym set of the disease result keyword; and, after translating those medical materials into the language set by the user, recommending them to the user for reading. The invention provides patients with science popularization materials related to their diseases, so that users can understand the diseases from multiple aspects.

Description

Disease popularization method and system based on artificial intelligence

Technical Field

The present invention relates to the field of medical information technology, and in particular to a disease science popularization method and system based on artificial intelligence.

Background

Disease science popularization is an important channel through which the public learns about diseases. For patients with common cancers, medical institutions invest substantial resources in producing science popularization materials (articles, videos, audio, etc.). However, because disease types are complex and diverse, this content is often poorly matched to individual patients. Patients have to search the Internet using the disease keywords in their diagnosis reports, and for uncommon diseases or diseases with many synonymous terms this search is difficult and very limited. Automatically providing patients with science popularization content related to their disease is therefore a problem that urgently needs to be solved.

Summary of the Invention

The purpose of the present invention is to provide patients with science popularization materials related to their diseases, so that users can understand the diseases from multiple aspects. To this end, a disease science popularization method and system based on artificial intelligence are provided.

To achieve the above purpose, embodiments of the present invention provide the following technical solutions:

A disease science popularization method based on artificial intelligence includes the following steps:

Step 1: input the user's diagnosis report and extract disease result keywords from the diagnosis report;

Step 2: based on an extreme learning machine classifier, collect the synonym set of the disease result keyword; if the disease result keyword is not in the synonym set, update the synonym set;

Step 3: crawl the medical materials to be popularized according to the synonym set of the disease result keyword;

Step 4: translate the medical materials to be popularized into the language set by the user, then recommend them to the user for reading.

Before step 1, the method further includes the step of:

Constructing a disease dictionary that contains the disease results that have been named worldwide, where F denotes the disease dictionary, f_j denotes the j-th disease result in the disease dictionary, each of F and f_j has a corresponding vector form, m denotes the number of disease results in the disease dictionary, and j=1,2,...,m.

Step 1 specifically includes the following steps:

Segment the user's diagnosis report into words to obtain the segmented diagnosis report, where X denotes the segmented diagnosis report, x_i denotes the i-th word in the segmented diagnosis report, each of X and x_i has a corresponding vector form, n denotes the number of words in the segmented diagnosis report, and i=1,2,...,n;

Traverse the disease dictionary with each word vector, and compute the score between the vector of each word x_i and the vector of each disease result f_j:

where the terms of the scoring formula denote, in order: the score between the word vector and the disease result vector; the compensation coefficients corresponding to the m-th, the (m-1)-th, and the 1st disease results; the weight corresponding to the m-th disease result, with T denoting matrix transpose; the weight corresponding to the 1st disease result; the bias corresponding to the 1st disease result; and the bias corresponding to the m-th disease result;

Sort the scores in descending order; the user selects the disease result keyword according to the ranking.
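The score-and-rank step can be illustrated with a minimal sketch. The scoring formula itself is not reproduced in this text, so the sketch substitutes plain cosine similarity as a stand-in; the function names, the toy two-dimensional vectors, and the example words are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; a stand-in for the scoring formula, which is not reproduced here."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(word_vecs, dict_vecs):
    """Score every (word, disease result) pair and sort in descending order of score."""
    scores = [
        (cosine(xv, fv), word, disease)
        for word, xv in word_vecs.items()
        for disease, fv in dict_vecs.items()
    ]
    return sorted(scores, key=lambda t: t[0], reverse=True)

# Toy 2-D vectors, purely illustrative (real word vectors would come from an embedding model).
words = {"cough": [1.0, 0.0], "pneumonia": [0.0, 1.0]}
diseases = {"pneumonia": [0.0, 1.0], "asthma": [1.0, 1.0]}
ranking = rank_candidates(words, diseases)
```

With n words and m dictionary entries this produces n×m scored pairs, and the user would pick a keyword from the top of `ranking`.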

In step 2, the step of collecting the synonym set of the disease result keyword based on the extreme learning machine classifier includes:

Assume the disease result keyword is x_i. Based on the segmented diagnosis report X={x₁,x₂,...,x_n}, construct the context set A of the disease result keyword x_i:

where k denotes the context degree, with k<i and k<n-i;
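The context set can be sketched as a symmetric window around the keyword. The defining formula for A is not reproduced in this text, so the sketch assumes that A holds the k words before and the k words after x_i and excludes x_i itself; the function name and sample report are hypothetical.

```python
def context_set(tokens, i, k):
    """Return the k tokens before and the k tokens after position i (1-based).

    Assumption: the keyword x_i itself is excluded from its context set.
    """
    n = len(tokens)
    if not (k < i and k < n - i):
        raise ValueError("k must satisfy k < i and k < n - i")
    idx = i - 1  # convert the 1-based position to a 0-based index
    return tokens[idx - k:idx] + tokens[idx + 1:idx + 1 + k]

report = ["patient", "shows", "signs", "of", "pneumonia", "in", "left", "lung", "today"]
A = context_set(report, i=5, k=2)
```

The constraints k<i and k<n-i guarantee that the window never runs past either end of the segmented report.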

A synonym base Base={B₁,B₂,...,B_R} covering the various disease results is built in advance, where R is the number of synonym sets in the synonym base and B_r is the synonym set corresponding to the r-th disease result, r=1,2,...,R, with:

where each element of B_r is its l-th synonym, L denotes the number of synonyms in B_r, and l=1,2,...,L;

The correlation coefficient between the disease result keyword x_i and the synonym set B_r is computed with the extreme learning machine classifier:

where C_r denotes the correlation coefficient between the disease result keyword x_i and the r-th synonym set; sigmoid denotes the activation function; and ELM denotes the extreme learning machine classifier;

A threshold is preset. If C_r satisfies the threshold condition, the synonym set B_r is a synonym set of the disease result keyword x_i; otherwise, B_r is not a synonym set of x_i.

In step 2, the step of updating the synonym set if the disease result keyword is not in the synonym set includes:

The extreme learning machine classifier limits the correlation coefficient C_r to the range 0≤C_r≤1, and a value range is likewise specified for the threshold. If C_r=1, the disease result keyword x_i is already in the synonym set B_r. If C_r=0 and/or the accompanying threshold condition holds, x_i has no synonym set in the synonym base Base; a new synonym set B_{r+1} is then created and x_i is added to it, so that B_{r+1}={x_i}. If the remaining threshold condition holds, query whether x_i is in the synonym set B_r, and if it is not, add x_i to B_r.
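The update rule can be sketched as follows. The exact threshold inequalities are not reproduced in this text, so the sketch assumes that C_r at or above a threshold `theta` means B_r is a synonym set of the keyword, and that a new set is opened only when no existing set reaches the threshold. The function name and sample data are hypothetical.

```python
def update_synonym_base(base, keyword, coeffs, theta):
    """Update `base` (a list of synonym sets) in place and return the matched indices.

    coeffs[r] is the correlation coefficient C_r in [0, 1] for base[r].
    Assumption: C_r >= theta means "B_r is a synonym set of the keyword".
    """
    matched = [r for r, c in enumerate(coeffs) if c >= theta]
    if not matched:
        # No existing synonym set is related enough: open a new one for the keyword.
        base.append({keyword})
        return [len(base) - 1]
    for r in matched:
        # set.add is a no-op when the keyword is already present in B_r.
        base[r].add(keyword)
    return matched

base = [{"coronary heart disease", "ischemic heart disease"}, {"pneumonia"}]
hit = update_synonym_base(base, "CHD", [0.8, 0.1], theta=0.5)
miss = update_synonym_base(base, "gout", [0.0, 0.2], theta=0.5)
```

Here "CHD" joins the first set, while "gout" matches nothing and gets a new singleton set appended to the base.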

Before step 3, the method further includes the steps of:

Calculating the recommendation value of a medical material:

where the computed quantity is the recommendation value of the medical material; G denotes the number of times the medical material has been read; H denotes the expert score; I denotes the reading behavior, with I=1 if the medical material has been shared and I=0 otherwise; and J denotes the collection behavior, with J=1 if the medical material has been collected and J=0 otherwise;

Calculating the interest value of the medical material:

where the computed quantity is the interest value of the medical material, and O denotes the reading frequency;

Calculating the applicable value of the medical material:

where the computed quantity is the applicable value of the medical material; e denotes Euler's number; and Q denotes the error coefficient, with range -1≤Q≤1;

Setting a threshold: if the applicable value satisfies the threshold condition, the medical material is taken as a medical material suitable for patient science popularization; alternatively, the applicable values of all medical materials are sorted in descending order and the first S are selected as the medical materials suitable for patient science popularization.
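The two selection options (threshold or top S) can be sketched as below. The applicable values are taken as given, because the formulas for the recommendation, interest, and applicable values are not reproduced in this text. The comparison `>=`, the function names, and the sample values are assumptions.

```python
def select_by_threshold(values, threshold):
    """Keep the materials whose applicable value reaches the threshold (assumed >=)."""
    return [name for name, v in values.items() if v >= threshold]

def select_top_s(values, s):
    """Keep the S materials with the largest applicable values."""
    ranked = sorted(values, key=values.get, reverse=True)
    return ranked[:s]

# Hypothetical applicable values for three materials.
applicable = {"article_a": 0.91, "video_b": 0.40, "audio_c": 0.75}
by_threshold = select_by_threshold(applicable, 0.7)
top_two = select_top_s(applicable, 2)
```

The threshold variant returns a variable number of materials, whereas the top-S variant fixes the size of the resulting set.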

Step 3 specifically includes the following steps:

The medical materials suitable for patient science popularization form a set, where D denotes the medical material set, d_s denotes the s-th medical material in the set, each of D and d_s has a corresponding vector form, S is the number of medical materials in the set, and s=1,2,...,S;

The medical material set is constructed as graph-structured data, in which each medical material is a node and an edge between two nodes indicates that a citation relationship exists between them;
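The graph construction can be sketched as an adjacency map built from citation pairs. The text does not say whether edges are directed, so the sketch assumes undirected edges; the function name and the material ids are hypothetical.

```python
from collections import defaultdict

def build_citation_graph(citations):
    """Build an undirected adjacency map from (citing, cited) pairs of material ids."""
    neighbors = defaultdict(set)
    for a, b in citations:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)

graph = build_citation_graph([("d1", "d2"), ("d1", "d3"), ("d3", "d4")])
```

Each entry `graph[s]` then plays the role of Z_s, the set of neighbor nodes that share an edge with node s.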

The global feature of each medical material is extracted based on a graph attention mechanism:

where the computed quantity is the global feature of the medical material; Softplus is an activation function; U is the number of attention heads and u indexes the u-th attention head, u=1,2,...,U; W_u denotes the weight matrix of the u-th attention head; Z_s is the set of neighbor nodes that share an edge with node s, and z is a neighbor node of node s, z∈Z_s; the formula also uses the medical material corresponding to neighbor node z and the association parameter of neighbor node z with respect to node s, which is given by:

where exp denotes the exponential function with base e, and e denotes Euler's number; SoftShrink is an activation function; the formula uses an attention coefficient matrix, with T denoting matrix transpose; p is a neighbor node of node s, with p∈Z_s and p≠z; the medical material corresponding to neighbor node p also appears in the formula; and the symbol || denotes tensor concatenation;
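The association parameters can be illustrated with a generic graph-attention sketch. It is not the patent's formula, which is not reproduced in this text. It assumes a softmax over all neighbors of SoftShrink applied to the dot product between an attention vector `a` and the concatenated features, and it omits the per-head weight matrices W_u. The SoftShrink parameter 0.5, the function names, and the toy vectors are assumptions.

```python
import math

def softshrink(x, lambd=0.5):
    """SoftShrink: move x toward zero by lambd, and return 0 inside [-lambd, lambd]."""
    if x > lambd:
        return x - lambd
    if x < -lambd:
        return x + lambd
    return 0.0

def attention_weights(h_s, neighbor_feats, a):
    """Softmax over the neighbors z of SoftShrink(a . [h_s || h_z])."""
    logits = {}
    for z, h_z in neighbor_feats.items():
        concat = h_s + h_z  # list concatenation plays the role of ||
        logits[z] = softshrink(sum(ai * ci for ai, ci in zip(a, concat)))
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {z: math.exp(v - top) for z, v in logits.items()}
    total = sum(exps.values())
    return {z: e / total for z, e in exps.items()}

h_s = [1.0, 0.0]
neighbors = {"z1": [1.0, 1.0], "z2": [0.0, 0.0]}
a = [1.0, 0.0, 1.0, 1.0]
weights = attention_weights(h_s, neighbors, a)
```

In a multi-head setting, one such weight map would be computed per head u and the weighted neighbor features combined before the Softplus activation.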

The relevance metric of the synonym set B_r with respect to each medical material is calculated:

where the formula uses the vector form of B_r; the computed quantity is the relevance metric of the synonym set B_r with respect to the medical material; the activation function is sigmoid; and a metric coefficient is applied;

A threshold is set; if the relevance metric satisfies the threshold condition, the medical material d_s is selected as a medical material to be popularized.

A disease science popularization system based on artificial intelligence, applied to the disease science popularization method based on artificial intelligence described in any of the above embodiments, includes:

a keyword extraction module, used to extract disease result keywords from the diagnosis report input by the user;

a synonym collection module, used to collect the synonym set of the disease result keyword based on the extreme learning machine classifier;

a synonym update module, used to determine whether the disease result keyword extracted by the keyword extraction module is in the synonym set collected by the synonym collection module and, if it is not, to update that synonym set;

a material popularization module, used to crawl the medical materials to be popularized based on the synonym set of the disease result keyword;

a material translation module, used to translate the medical materials to be popularized into the language set by the user and then recommend them to the user for reading.

The system further includes a dictionary construction module, used to construct the disease dictionary.

Compared with the prior art, the present invention has the following beneficial effects: the invention extracts the disease result keyword from the user's diagnosis report, collects the synonym set of that keyword, and crawls related materials on the Internet to recommend to the user, so that the user can learn about disease-related information from multiple aspects. Compared with users searching the Internet on their own using the single term by which the diagnosis report names the disease result, the invention provides users with more comprehensive science popularization information.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and therefore should not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments generally described and shown in the drawings can be arranged and designed in various configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be defined or explained again in subsequent drawings.

The present invention is implemented by the following technical solution. As shown in FIG. 1, the disease science popularization method based on artificial intelligence includes the following steps:

Step 1: input the user's diagnosis report and extract disease result keywords from the diagnosis report.

A disease dictionary is built in advance. It contains the disease results that have been named worldwide, where F denotes the disease dictionary, f_j denotes the j-th disease result in the disease dictionary, each of F and f_j has a corresponding vector form, m denotes the number of disease results in the disease dictionary, and j=1,2,...,m.

The user's diagnosis report is segmented into words to obtain the segmented diagnosis report, where X denotes the segmented diagnosis report, x_i denotes the i-th word in the segmented diagnosis report, each of X and x_i has a corresponding vector form, n denotes the number of words in the segmented diagnosis report, and i=1,2,...,n.

In the scoring formula, the terms denote, in order: the score between the word vector and the disease result vector; the compensation coefficients corresponding to the m-th, the (m-1)-th, and the 1st disease results; the weight corresponding to the m-th disease result, with T denoting matrix transpose; the weight corresponding to the 1st disease result; the bias corresponding to the 1st disease result; and the bias corresponding to the m-th disease result.

Since a diagnosis report may contain more than one disease result, the user can go through the ranked scores from top to bottom and select, as the disease result keyword, the entry that is the same as their own disease result, closest to it, or of most interest to them.

Step 2: based on the extreme learning machine classifier, collect the synonym set of the disease result keyword; if the disease result keyword is not in the synonym set, update the synonym set.

Assume the disease result keyword selected by the user is x_i. Based on the segmented diagnosis report X={x₁,x₂,...,x_n}, construct the context set A of the disease result keyword x_i:

where k denotes the context degree, with k<i and k<n-i.

In the synonym set B_r, each element is its l-th synonym, L denotes the number of synonyms in B_r, and l=1,2,...,L. For example, if the r-th disease result is coronary heart disease, the synonym set B_r may contain synonyms such as "coronary heart disease", "coronary atherosclerotic heart disease", "ischemic heart disease", "myocardial infarction", "heart failure", and "primary cardiac arrest". It should be noted that, in this scheme, synonyms include full names, abbreviations, common names, hyponyms, hypernyms, and the like, and may also include symptoms.

The correlation coefficient between the disease result keyword x_i and the synonym set B_r is computed with the extreme learning machine (ELM) classifier:

where C_r denotes the correlation coefficient between the disease result keyword x_i and the r-th synonym set; sigmoid denotes the activation function; and ELM denotes the extreme learning machine classifier. Since there are R synonym sets, R correlation coefficients can be computed. A threshold is preset; if C_r satisfies the threshold condition, the synonym set B_r is a synonym set of the disease result keyword x_i, and one disease result keyword is allowed to have multiple synonym sets; otherwise, B_r is not a synonym set of x_i.
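For readers unfamiliar with extreme learning machines, the following is a minimal generic sketch of the technique: a single hidden layer with random, fixed weights and output weights fitted by ridge-regularised least squares, with a sigmoid applied to the output. The text does not specify the classifier's inputs or architecture, so the feature vectors, class name, hidden size, and ridge value here are all hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

class TinyELM:
    """Single-hidden-layer ELM: random fixed hidden weights, least-squares output weights."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = [0.0] * n_hidden

    def hidden(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def fit(self, X, y, ridge=1e-2):
        H = [self.hidden(x) for x in X]
        k = len(self.beta)
        # Ridge-regularised normal equations: (H^T H + ridge * I) beta = H^T y
        A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
              for j in range(k)] for i in range(k)]
        rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(k)]
        self.beta = solve(A, rhs)

    def correlation(self, x):
        """C = sigmoid(ELM(x)), so C lies strictly between 0 and 1."""
        return sigmoid(sum(bj * hj for bj, hj in zip(self.beta, self.hidden(x))))

# Hypothetical 2-D features; targets -1 / +1 mark unrelated / related training pairs.
train_X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
train_y = [-1.0, -1.0, 1.0, 1.0]
elm = TinyELM(n_in=2, n_hidden=6)
elm.fit(train_X, train_y)
C = [elm.correlation(v) for v in ([0.0, 0.1], [1.0, 0.9])]
```

Only the output weights are trained, which is what distinguishes an ELM from a network trained by backpropagation; each coefficient in `C` would then be compared against the preset threshold.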

In addition, the extreme learning machine classifier limits the correlation coefficient C_r to the range 0≤C_r≤1, and a value range is likewise specified for the threshold. If C_r=1, the disease result keyword x_i is already in the synonym set B_r. If C_r=0 and/or the accompanying threshold condition holds, x_i has no synonym set in the synonym base Base; a new synonym set B_{r+1} can then be created and x_i added to it, so that B_{r+1}={x_i}. If the remaining threshold condition holds, query whether x_i is in the synonym set B_r, and if it is not, add x_i to B_r. In this way the synonym base Base is updated dynamically.

This step assumes that only one disease result keyword x_i is selected; when the user selects several disease result keywords, the corresponding synonym sets are obtained in the same way.

Step 3: crawl the medical materials to be popularized based on the synonym set of the disease result keyword.

The Internet contains a large amount of medicine-related material, but not all of it is suitable to recommend to patients. This step therefore crawls medical materials suitable for patient science popularization based on factors such as a material's read count and expert score. First, the recommendation value of a medical material is calculated:

where the computed quantity is the recommendation value of the medical material; G denotes the number of times the medical material has been read; H denotes the expert score (in percent); I denotes the reading behavior, with I=1 if the medical material has been shared and I=0 otherwise; and J denotes the collection behavior, with J=1 if the medical material has been collected and J=0 otherwise.

Then the interest value of the medical material is calculated:

where the computed quantity is the interest value of the medical material, and O denotes the reading frequency (in reads per hour).

Finally, the applicable value of the medical material is calculated:

where e denotes Euler's number; the computed quantity is the applicable value of the medical material; and Q denotes the error coefficient, with range -1≤Q≤1.

In this way, the medical materials suitable for patient science popularization form a set, where D denotes the medical material set, d_s denotes the s-th medical material in the set, each of D and d_s has a corresponding vector form, S is the number of medical materials in the set, and s=1,2,...,S. The medical material set is constructed as graph-structured data, in which each medical material is a node and an edge between two nodes indicates that a citation relationship exists between them.

In the association parameter formula, exp denotes the exponential function with base e, and e denotes Euler's number; SoftShrink is an activation function; the formula uses an attention coefficient matrix, with T denoting matrix transpose; p is a neighbor node of node s, with p∈Z_s and p≠z; the medical material corresponding to neighbor node p also appears in the formula; and the symbol || denotes tensor concatenation.

In the relevance formula, the vector form of B_r is used; the computed quantity is the relevance metric of the synonym set B_r with respect to the medical material; the activation function is sigmoid; and a metric coefficient is applied. A threshold is set; if the relevance metric satisfies the threshold condition, the medical material d_s is selected as a medical material to be popularized.
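The final selection can be sketched as follows. The relevance formula is not reproduced in this text, so the sketch assumes one plausible form: the sigmoid of a metric coefficient times the dot product between the synonym set vector and the material's global feature, kept when it reaches the threshold. The function names, `coeff`, and the toy vectors are hypothetical.

```python
import math

def relevance(b_vec, h_vec, coeff=1.0):
    """sigmoid(coeff * <b_vec, h_vec>): an assumed form of the relevance metric."""
    dot = sum(b * h for b, h in zip(b_vec, h_vec))
    return 1.0 / (1.0 + math.exp(-coeff * dot))

def select_materials(b_vec, features, threshold, coeff=1.0):
    """Return the ids of the materials whose relevance reaches the threshold."""
    return [s for s, h in features.items() if relevance(b_vec, h, coeff) >= threshold]

b_r = [1.0, 0.0]  # toy vector for synonym set B_r
features = {"d1": [2.0, 0.0], "d2": [-2.0, 1.0], "d3": [0.0, 3.0]}
chosen = select_materials(b_r, features, threshold=0.6)
```

Because the sigmoid maps every score into (0, 1), a single threshold can be applied uniformly across all materials and synonym sets.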

There are usually multiple medical materials to be popularized, and users are accustomed to different languages, such as Chinese, English, or minority languages. The user can set a preferred language, and the medical materials to be popularized are translated and then recommended to the user for reading. The translation itself can use existing technology and is not described further here.

The present invention also proposes a disease science popularization system based on artificial intelligence, including: a dictionary construction module, a keyword extraction module, a synonym collection module, a synonym update module, a material crawling module, a material popularization module, and a material translation module.

The dictionary construction module is used to construct the disease dictionary. Specifically, the dictionary construction module builds the disease dictionary in advance. It contains the disease results that have been named worldwide, where F denotes the disease dictionary, f_j denotes the j-th disease result in the disease dictionary, each of F and f_j has a corresponding vector form, m denotes the number of disease results in the disease dictionary, and j=1,2,...,m.

The keyword extraction module is used to extract disease result keywords from the diagnosis report input by the user. Specifically, the keyword extraction module segments the user's diagnosis report into words to obtain the segmented diagnosis report, where X denotes the segmented diagnosis report, x_i denotes the i-th word in the segmented diagnosis report, each of X and x_i has a corresponding vector form, n denotes the number of words in the segmented diagnosis report, and i=1,2,...,n.

In the scoring formula, the terms denote, in order: the score between the word vector and the disease result vector; the compensation coefficients corresponding to the m-th, the (m-1)-th, and the 1st disease results; the weight corresponding to the m-th disease result, with T denoting matrix transpose; the weight corresponding to the 1st disease result; the bias corresponding to the 1st disease result; and the bias corresponding to the m-th disease result. In total, n×m scores are obtained and sorted in descending order; the higher an entry ranks, the closer it is to the disease result in the diagnosis report.

The synonym collection module is used to collect the synonym set of the disease result keyword based on the extreme learning machine classifier. Specifically, assuming the disease result keyword selected by the user is x_i, the synonym collection module constructs the context set A of the disease result keyword x_i based on the segmented diagnosis report X={x₁,x₂,...,x_n}:

In the synonym set B_r, each element is its l-th synonym, L denotes the number of synonyms in B_r, and l=1,2,...,L.

C_r denotes the correlation coefficient between the keyword x_i and the r-th synonym set; sigmoid denotes the activation function; and ELM denotes the extreme learning machine classifier. Since there are R synonym sets, R correlation coefficients can be computed. A threshold is preset; if C_r satisfies the threshold condition, the synonym set B_r is a synonym set of the disease result keyword x_i, and one disease result keyword is allowed to have multiple synonym sets; otherwise, B_r is not a synonym set of x_i.

The synonym update module is used to determine whether the disease result keyword extracted by the keyword extraction module is in the synonym set collected by the synonym collection module and, if it is not, to update that synonym set. Specifically, the synonym update module uses the extreme learning machine classifier to limit the correlation coefficient C_r to the range 0≤C_r≤1, and a value range is likewise specified for the threshold. If C_r=1, the disease result keyword x_i is already in the synonym set B_r. If C_r=0 and/or the accompanying threshold condition holds, x_i has no synonym set in the synonym base Base; a new synonym set B_{r+1} can then be created and x_i added to it, so that B_{r+1}={x_i}. If the remaining threshold condition holds, query whether x_i is in the synonym set B_r, and if it is not, add x_i to B_r. In this way the synonym base Base is updated dynamically.

The material crawling module crawls medical materials suitable for popularizing science to patients. In detail, the material crawling module first calculates the recommendation value of a medical material:

The terms of this formula are the applicable value of the medical material, Euler's number e, and the error coefficient Q, whose value range is -1 ≤ Q ≤ 1.

The material popularization module crawls the medical materials to be popularized based on the synonym set of the disease result keywords. In detail, the module obtains the set of medical materials suitable for popularizing science to patients, D = {d_1, d_2, ..., d_S}, where D denotes the medical material set, d_s denotes the s-th medical material in the set, S is the number of medical materials in the set, and s = 1, 2, ..., S; D and d_s are each used in vector form. The medical material set is structured as graph data: each medical material is a node, and an edge between two nodes indicates that a reference relationship exists between them.
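The graph construction can be illustrated with a small adjacency map. The sketch assumes that reference relationships are available as pairs of material identifiers and that edges are undirected; neither the input format nor the direction of the edges is stated in the text, and `build_reference_graph` is a hypothetical name.

```python
from collections import defaultdict


def build_reference_graph(references):
    """Build an undirected adjacency map from (material_id, referenced_id) pairs."""
    neighbors = defaultdict(set)
    for material_id, referenced_id in references:
        if material_id == referenced_id:
            continue  # a material referencing itself adds no edge
        neighbors[material_id].add(referenced_id)
        neighbors[referenced_id].add(material_id)
    return dict(neighbors)


# Hypothetical reference pairs among four medical materials.
graph = build_reference_graph([(1, 2), (2, 3), (1, 3), (4, 4)])
```

Here `graph[s]` plays the role of the neighbor set of node s; a material with no reference relationship to any other material simply does not appear as a key.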

Here, the formula gives the global feature of the medical material d_s. Softplus is the activation function; U is the number of attention heads, and u denotes the u-th attention head, u = 1, 2, ..., U; W_u denotes the weight matrix of the u-th attention head; Z_s is the set of neighbor nodes that share an edge with node s, and z is a neighbor node of node s, z ∈ Z_s. The remaining terms are the medical material corresponding to neighbor node z and the association parameter of neighbor node z with respect to node s.
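One way to compute association parameters of this kind is a softmax over SoftShrink-activated attention scores, in line with the terms listed in claim 4 (exponential function, SoftShrink, attention coefficients, tensor concatenation). The sketch below rests on several assumptions: vectors are taken as already projected by the head's weight matrix, the normalization runs over all neighbors of node s, the SoftShrink parameter 0.5 is an arbitrary choice, and all names are hypothetical.

```python
import math


def soft_shrink(value, lam=0.5):
    """SoftShrink: shrink values toward zero by lam, zeroing the band [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def association_parameters(node_vec, neighbor_vecs, attention_vec):
    """Return a softmax-normalised weight for each neighbor z of node s.

    node_vec      -- feature vector of node s
    neighbor_vecs -- dict mapping neighbor id z to its feature vector
    attention_vec -- attention coefficients, as long as the concatenated vector
    """
    scores = {}
    for z, vec in neighbor_vecs.items():
        joined = list(node_vec) + list(vec)  # concatenation of the two vectors
        raw = sum(a * x for a, x in zip(attention_vec, joined))
        scores[z] = soft_shrink(raw)
    total = sum(math.exp(v) for v in scores.values())
    return {z: math.exp(v) / total for z, v in scores.items()}


# Hypothetical two-dimensional features for node s and two neighbors.
alpha = association_parameters(
    [1.0, 0.0], {2: [1.0, 1.0], 3: [0.0, 0.0]}, [1.0, 1.0, 1.0, 1.0]
)
```

The weights sum to 1 across the neighbors of s, so they can be used directly to weight neighbor features before an activation such as Softplus is applied.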

The material translation module translates the medical materials to be popularized into the language set by the user and then recommends them to the user for reading. In detail, the module lets the user set a preferred language, such as Chinese, English, or a minority language, and translates the medical materials to be popularized before recommending them to the user.

The above is only a specific embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.

Claims

1. A disease science popularization method based on artificial intelligence, characterized by comprising the following steps:

Step 1, input the user's diagnosis report and extract disease result keywords from the diagnosis report;

Before step 1, the method further comprises:

Building a disease dictionary F = {f_1, f_2, ..., f_m} that contains the named disease results worldwide, where F denotes the disease dictionary, f_j denotes the j-th disease result in the disease dictionary, m denotes the number of disease results in the disease dictionary, and j = 1, 2, ..., m; F and f_j are each used in vector form;

Step 1 specifically comprises the following steps:

Segmenting the user's diagnosis report into words to obtain the segmented diagnosis report X = {x_1, x_2, ..., x_n}, where X denotes the diagnosis report after word segmentation, x_i denotes the i-th word in the segmented diagnosis report, n denotes the number of words in the segmented diagnosis report, and i = 1, 2, ..., n; X and x_i are each used in vector form;

For each word vector x_i, traversing the disease dictionary F and calculating the rating between x_i and each f_j:

Here, the formula gives the rating between the vectors of x_i and f_j; its terms are the compensation coefficients, the weights, and the biases corresponding to the first through m-th disease results, and T denotes the matrix transpose;

The ratings are sorted from highest to lowest, and the user selects the disease result keywords based on the ratings;

Step 2: Based on the extreme learning machine classifier technology, a synonym set of the disease result keyword is collected; if the disease result keyword is not in the synonym set, the synonym set is updated;

Step 3: crawl medical materials to be popularized based on the synonym set of the disease result keywords;

Step 4: After translating the medical materials to be popularized into the language set by the user, the materials are recommended to the user for reading.

2. The disease science popularization method based on artificial intelligence according to claim 1, characterized in that, in step 2, the step of collecting the synonym set of the disease result keyword based on extreme learning machine classifier technology comprises:

Assuming that the disease result keyword is x_i, constructing the context set A of the disease result keyword x_i from the segmented diagnosis report X = {x_1, x_2, ..., x_n}:

where k denotes the context degree, with k < i and k < n - i;

Pre-building the synonym library Base = {B_1, B_2, ..., B_R} of the various disease results, where R is the number of synonym sets in the synonym library, B_r is the synonym set corresponding to the r-th disease result, and r = 1, 2, ..., R, with:

where each element of B_r is a synonym indexed by l, L denotes the number of synonyms in the synonym set B_r, and l = 1, 2, ..., L;

Calculating the correlation coefficient between the disease result keyword x_i and the synonym set B_r based on the extreme learning machine classifier:

where C_r denotes the correlation coefficient between the disease result keyword x_i and the r-th synonym set; sigmoid denotes the activation function; and ELM denotes the extreme learning machine classifier;

Presetting a threshold: if C_r meets the threshold, the synonym set B_r is a synonym set of the disease result keyword x_i; otherwise, the synonym set B_r is not a synonym set of the disease result keyword x_i.

3. The disease science popularization method based on artificial intelligence according to claim 2, characterized in that, in step 2, the step of updating the synonym set if the disease result keyword is not in the synonym set comprises:

The extreme learning machine classifier constrains the correlation coefficient to 0 ≤ C_r ≤ 1, and the threshold has its own specified value range. If C_r = 1, the disease result keyword x_i is already in the synonym set B_r; if C_r = 0 and/or C_r falls below the threshold, the disease result keyword x_i has no synonym set in the synonym library Base, so a new synonym set B_{r+1} is created and x_i is added to it, giving B_{r+1} = {x_i}; if C_r meets the threshold, whether x_i is in the synonym set B_r is checked and, if it is not, x_i is added to the synonym set B_r.

4. The disease science popularization method based on artificial intelligence according to claim 3, characterized in that step 3 specifically comprises the following steps:

The set of medical materials suitable for popularizing science to patients is D = {d_1, d_2, ..., d_S}, where D denotes the medical material set, d_s denotes the s-th medical material in the set, S is the number of medical materials in the set, and s = 1, 2, ..., S; D and d_s are each used in vector form;

Constructing the medical material set as graph-structured data, in which each medical material is a node and an edge between two nodes indicates that a reference relationship exists between them;

Extracting the global feature of the medical material d_s based on a graph attention mechanism:

Here, the formula gives the global feature of the medical material d_s; Softplus is the activation function; U is the number of attention heads, and u denotes the u-th attention head, u = 1, 2, ..., U; W_u denotes the weight matrix of the u-th attention head; Z_s is the set of neighbor nodes that share an edge with node s, and z is a neighbor node of node s, z ∈ Z_s; the remaining terms are the medical material corresponding to neighbor node z and the association parameter of neighbor node z with respect to node s, where:

Here, exp denotes the exponential function with base e, and e denotes Euler's number; SoftShrink is the activation function; the attention coefficient matrix appears in transposed form, with T denoting the matrix transpose; p is a neighbor node of node s, p ∈ Z_s and p ≠ z, with a corresponding medical material; and the symbol || denotes the concatenation of tensors;

Calculating the correlation measure of the synonym set B_r relative to the medical material d_s:

Here, B_r is used in vector form; the formula gives the correlation measure of the synonym set B_r relative to the medical material d_s; its remaining terms are the sigmoid activation function and the measurement coefficient;

Setting a threshold: if the correlation measure meets the threshold, the medical material d_s is selected as a medical material to be popularized.

5. An artificial intelligence-based disease science popularization system, applied to the artificial intelligence-based disease science popularization method according to any one of claims 1 to 4, characterized in that it comprises:

A dictionary building module, used to build the disease dictionary;

The dictionary building module pre-builds a disease dictionary F = {f_1, f_2, ..., f_m} that contains the named disease results worldwide, where F denotes the disease dictionary, f_j denotes the j-th disease result in the disease dictionary, m denotes the number of disease results in the disease dictionary, and j = 1, 2, ..., m; F and f_j are each used in vector form;

A keyword extraction module, used to extract disease result keywords from the diagnosis report input by the user;

The keyword extraction module segments the user's diagnosis report into words to obtain the segmented diagnosis report X = {x_1, x_2, ..., x_n}, where X denotes the diagnosis report after word segmentation, x_i denotes the i-th word in the segmented diagnosis report, n denotes the number of words in the segmented diagnosis report, and i = 1, 2, ..., n; X and x_i are each used in vector form;

For each word vector x_i, the disease dictionary F is traversed and the rating between x_i and each f_j is calculated:

The synonym collection module is used to collect synonym sets of disease result keywords based on extreme learning machine classifier technology;

A synonym updating module, used for judging whether the disease result keyword extracted by the keyword extracting module is in the synonym set collected by the synonym collecting module, and if not, updating the synonym set;

The material popular science module is used to crawl medical materials to be popularized based on the synonym set of disease result keywords;

The material translation module is used to translate the medical materials to be popularized into the language set by the user and recommend them to the users for reading.