CN107103207B - Accurate medical knowledge search system based on case multigroup variation characteristics and implementation method - Google Patents
- Publication number
- CN107103207B (CN201710218630.XA)
- Authority
- CN
- China
- Prior art keywords
- omics
- variation
- case
- model
- knowledge base
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H15/00—ICT specially adapted for medical reports, e.g. generation or transmission thereof
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biotechnology (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Evolutionary Biology (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biomedical Technology (AREA)
- Primary Health Care (AREA)
- Pathology (AREA)
- Bioethics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
Technical Field
The invention belongs to the medicine and health sector and relates to a method for implementing a precision medicine knowledge search system, specifically one based on the multi-omics variation features of cases.
Background
Precision medicine relies on biomarkers to classify disease risk, prognosis and treatment response. The rapid development of omics technologies has greatly increased the number of molecular-level biomarkers, providing a more comprehensive and detailed basis for diagnosing disease, determining disease stage, or evaluating the safety and efficacy of new therapies in target populations.
At present, association information of the form "molecular-level marker or pathological omics variation feature-intervention response (including drug response)" comes mainly from a few channels: companion diagnostics, high-throughput drug screening experiments at the cell line level, and precision medicine clinical trials. The association information provided by companion diagnostics is obtained by observing large sample populations, that is, at the population level, and is direct and easy to obtain. The association information from cell line drug screening experiments and precision medicine trials, however, requires processing of the raw information: the association between molecular-level variation and intervention response can only be established after multi-omics variation features have been extracted from the omics data. The resulting mixture of association information of different sources and types, which is hard to separate, makes it considerably harder for many clinicians to interpret the physiological significance of omics variation features and to exploit their clinical value.
In addition, the integration and clinical translation of omics data must take data stability into account. Factors such as the experimental platform (for example, different laboratories or institutions), the observation scale (cell line level, tissue level, individual level, and so on), the observation modality (transcriptome, proteome or genome level, and so on) and the observation technique (single nucleotide polymorphism arrays, next-generation sequencing, and so on) can all make the observed behavior of the same biomarker unstable. How to integrate this association information as fully as possible and put it to best use therefore remains an urgent open problem.
Summary of the Invention
The purpose of the invention is to use the observable multi-omics variation features of an individual to quickly search the knowledge base for multi-omics variation-intervention response association models that match a new case, and to present the intervention strategy of every matched model, together with the record of whether the response was successful, to the user in a readable and tightly integrated form. The invention is achieved through the following technical solutions:
The invention discloses a precision medicine knowledge search system based on the multi-omics variation features of cases. The system comprises:
a precision medicine knowledge base, which collects multi-omics variation-intervention response association models and thereby gathers and integrates "omics variation feature-intervention response" information at different levels;
an optimizable matching algorithm, which determines whether a case matches a model in the knowledge base and to what degree;
an evaluation algorithm for the matching algorithm, which compares the clustering of the knowledge base models produced by the matching algorithm with the classification of the same models by their intervention response labels, so that the quality of the matching algorithm can be assessed and the algorithm continually optimized;
a report generated directly by the search system, containing the case's omics analysis data and the system's search results, which gives physicians a reference for the physiological significance of the omics data and assists in drawing up a treatment plan.
As a further improvement, the different levels referred to in the invention include the population level, the individual level, the tissue level and the cell line level.
The invention also discloses a method for implementing a precision medicine knowledge search system based on the multi-omics variation features of cases, comprising the following steps:
1) establish a precision medicine knowledge base built on multi-omics variation-intervention response association models;
2) when a new case arrives, extract the multi-omics variation features of the new case;
3) establish a matching algorithm between the new case and the models (known multi-omics variation feature-intervention response associations);
4) generate the analysis report of the case matching system;
5) update the data in the knowledge base and let the matching algorithm self-evolve.
As a further improvement, in step 1) of the invention the multi-omics variation information includes single-base mutations within transcriptionally active genomic regions (single nucleotide polymorphisms and base insertions and deletions), chromosomal variations (such as gene fusions), and the baseline gene expression levels used to determine whether a gene is abnormally expressed.
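As a rough illustration of how a baseline expression level could be used to flag abnormally expressed genes, the sketch below compares a case's expression values with baseline values using a log2 fold change. The function name `flag_abnormal_genes`, the pseudocount and the cutoff of 1.0 are assumptions made for this example; the patent does not specify the statistic or the threshold.

```python
import math


def flag_abnormal_genes(case_expr, baseline_expr, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {gene: "up" | "down"} for genes whose expression departs from baseline.

    case_expr and baseline_expr map gene names to non-negative expression values.
    The cutoff and pseudocount are illustrative assumptions, not values from the patent.
    """
    flagged = {}
    for gene, value in case_expr.items():
        if gene not in baseline_expr:
            continue  # no baseline available, so no call is made for this gene
        ratio = (value + pseudocount) / (baseline_expr[gene] + pseudocount)
        log2fc = math.log2(ratio)
        if abs(log2fc) >= min_abs_log2fc:
            flagged[gene] = "up" if log2fc > 0 else "down"
    return flagged


# Hypothetical gene names and values, for demonstration only.
case = {"GENE_A": 31.0, "GENE_B": 10.0, "GENE_C": 0.0}
baseline = {"GENE_A": 7.0, "GENE_B": 9.0, "GENE_C": 7.0}
abnormal = flag_abnormal_genes(case, baseline)
```

A production workflow would more likely use a statistical test over replicate samples; the fold-change cutoff here only shows where the baseline enters the decision.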
As a further improvement, in step 1) of the invention a multi-omics variation-intervention response association model is one of the following: a "companion diagnostic association model", a set of multi-omics variation features annotated with companion diagnostic drug response; a "cell line association model", which contains drug response information and multi-omics variation features from a drug screening experiment; a "case association model", which contains clinically observed intervention response results and multi-omics variation features; or an "individualized disease model association model", which contains drug screening results and multi-omics variation features. Individualized disease models include, but are not limited to, PDX mouse models and PDO organoid models.
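The four kinds of association model share the same shape: a set of variation features plus an intervention and its observed response. A minimal sketch of one possible record layout follows. The class and field names (`ModelKind`, `AssociationModel`, `responded`, and so on) are assumptions for illustration and do not come from the patent, and the feature identifiers are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class ModelKind(Enum):
    COMPANION_DIAGNOSTIC = "companion_diagnostic"  # population-level annotation
    CELL_LINE = "cell_line"  # drug screening experiment
    CASE = "case"  # clinically observed case
    INDIVIDUALIZED_DISEASE_MODEL = "individualized_disease_model"  # e.g. PDX, PDO


@dataclass(frozen=True)
class AssociationModel:
    """One multi-omics variation-intervention response association (hypothetical layout)."""
    model_id: str
    kind: ModelKind
    intervention: str
    responded: bool
    point_mutations: FrozenSet[str] = frozenset()
    chromosomal_variants: FrozenSet[str] = frozenset()
    abnormal_genes: FrozenSet[str] = frozenset()


# Placeholder identifiers, not real variants or drugs.
example = AssociationModel(
    model_id="demo-001",
    kind=ModelKind.CELL_LINE,
    intervention="drug_A",
    responded=True,
    point_mutations=frozenset({"GENE_A:mut1"}),
    abnormal_genes=frozenset({"GENE_B"}),
)
```

Keeping one record type for all four kinds is what would let a single search run across observation scales, with `kind` available to select a kind-specific matching strategy.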
As a further improvement, in step 2) of the invention the multi-omics variation features include single-base mutations within transcriptionally active genomic regions, chromosomal structural variations, and abnormal gene expression information.
As a further improvement, in step 2) of the invention a standardized omics data analysis workflow is established to extract the multi-omics variations, with quality control and quality assurance applied throughout the process, from sample collection, sequencing and data analysis to knowledge base matching.
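One screening step that such a workflow might include, assuming a matched normal control sample is available for the case, is sketched below: variants also seen in the normal sample are dropped, and the remainder is checked against known disease-associated variants. The function name, the return structure and the variant identifiers are assumptions for this example only.

```python
def screen_case_variants(case_variants, normal_variants=None, known_disease_variants=frozenset()):
    """Filter a case's variants against a normal control and a known-variant list.

    normal_variants may be None when no control sample is available, in which
    case no subtraction is performed.
    """
    retained = set(case_variants)
    if normal_variants is not None:
        retained -= set(normal_variants)  # drop variants shared with the normal sample
    known = retained & set(known_disease_variants)
    return {"retained": retained, "known_disease_associated": known}


# Placeholder variant identifiers.
result = screen_case_variants(
    case_variants={"v1", "v2", "v3"},
    normal_variants={"v2"},
    known_disease_variants={"v3", "v9"},
)
```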
As a further improvement, in step 3) of the invention the search system provides an initial matching algorithm and an evaluation method for matching algorithms. The evaluation method compares how well different matching algorithms cluster the association models in the knowledge base, assesses whether a new algorithm outperforms the existing one, and so decides whether the algorithm should be upgraded.
As a further improvement, in step 4) of the invention the report has two parts. The first part presents statistics on the physiologically relevant multi-omics variation features of the case, describing the omics variation of the diseased tissue in terms of single-base mutations, chromosomal variations and differentially expressed genes. The second part, produced once the knowledge base search is complete, lists the matching evidence and medication information of the matched models in descending order of their similarity to the case.
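The second part of the report amounts to a ranked listing. The sketch below shows one way it could be rendered, assuming each match is a dictionary with the hypothetical keys `model_id`, `intervention`, `responded`, `similarity` and `shared_features`; the line layout is likewise an assumption.

```python
def rank_matches(matches):
    """Render matched models as report lines, most similar first."""
    ordered = sorted(matches, key=lambda m: m["similarity"], reverse=True)
    lines = []
    for rank, m in enumerate(ordered, start=1):
        outcome = "responded" if m["responded"] else "did not respond"
        evidence = ", ".join(sorted(m["shared_features"]))
        lines.append(
            f"{rank}. {m['model_id']} | {m['intervention']} | {outcome} | "
            f"similarity={m['similarity']:.2f} | evidence: {evidence}"
        )
    return lines


# Placeholder matches for demonstration.
report_lines = rank_matches([
    {"model_id": "model_A", "intervention": "drug_A", "responded": False,
     "similarity": 0.55, "shared_features": {"f2"}},
    {"model_id": "model_B", "intervention": "drug_B", "responded": True,
     "similarity": 0.80, "shared_features": {"f1", "f3"}},
])
```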
As a further improvement, in step 5) of the invention, once a case has completed the omics feature extraction of step 2), the effect of its drug treatment is tracked and the case data are added to the precision medicine knowledge base as a case-type model, widening the coverage of the knowledge base and improving its matching accuracy. When no matching association model is found in the knowledge base, the case is treated directly on the basis of the physician's experience; an individualized disease model may also be developed from the case, and the treatment outcome of the case and the drug testing results of the individualized disease model are tracked to build the corresponding "case association model" or "individualized disease model association model", which is added to the precision medicine knowledge base.
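The update rule can be pictured as appending new association records once follow-up data exist. The sketch below is a simplified illustration that stores the knowledge base as a plain list of dictionaries; the function name, the record keys and the identifier suffixes are all assumptions rather than details from the patent.

```python
def update_knowledge_base(knowledge_base, case_id, features,
                          treatment_outcome=None, disease_model_outcome=None):
    """Append case-derived association models to the knowledge base.

    treatment_outcome: follow-up of the patient's own treatment, if tracked.
    disease_model_outcome: drug testing result from an individualized disease
    model (e.g. PDX or PDO) derived from the case, if one was developed.
    Returns the list of model ids that were added.
    """
    added = []
    sources = (("case", treatment_outcome),
               ("individualized_disease_model", disease_model_outcome))
    for kind, outcome in sources:
        if outcome is None:
            continue  # nothing tracked for this source yet
        model_id = f"{case_id}-{kind}"
        knowledge_base.append({
            "model_id": model_id,
            "kind": kind,
            "features": set(features),
            "intervention": outcome["intervention"],
            "responded": outcome["responded"],
        })
        added.append(model_id)
    return added


kb = []
new_ids = update_knowledge_base(
    kb, "case42", {"f1", "f2"},
    treatment_outcome={"intervention": "drug_A", "responded": True},
)
```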
The advantages of the invention are:
1) The invention has a wide search scope and can retrieve association models at different observation scales. It systematically integrates known associations between omics variation and intervention response: by defining a generalized framework of multi-omics variation-intervention response association model classes, it brings intervention responses and omics variation information of different levels and sources into a single knowledge base.
2) The invention offers a rich set of matching features and matching strategies. On the one hand, coordinated matching of multi-omics variation features across single-base variations, chromosomal variations and differentially expressed genes makes the matching results more reliable and reduces the noise that arises when a single variation type is associated with a physiological phenotype. On the other hand, the invention provides specific, optimizable matching strategies for the intervention response models of each scale in the knowledge base, so that the association models supply evidence from several angles for the relationship between a case and an intervention response.
3) The invention is capable of self-evolution, in two respects. First, the number of models in the precision medicine knowledge base grows as the search system runs. When a new case enters, the system records its multi-omics variation features and combines them with the case's subsequent treatment plan and intervention response results, or with the medication results of the case's individualized disease model, to generate an association model for the case and add it to the knowledge base. Second, the matching algorithm can be optimized continually. The invention establishes an evaluation method for matching algorithms: once a new matching algorithm is proposed, the models in the knowledge base can be re-clustered with it and the result compared with the classification based on intervention response labels, and whether the system should be updated is decided by whether the new algorithm outperforms the existing one.
4) The invention fills the gap between the extraction of omics variation information and clinical medication guidance, helping clinical staff to interpret the physiological significance of omics variation systematically and to uncover its clinical value.
Description of the Drawings
Fig. 1 is a schematic flow diagram of the implementation of the technical solution of the invention.
Detailed Description
The invention establishes a precision medicine knowledge search system based on coordinated matching of the multi-omics variations of individual cases. First, the system contains a precision medicine knowledge base. By collecting multi-omics variation-intervention response association models, the knowledge base gathers and integrates "omics variation feature-intervention response" information at different levels (population level, individual level, tissue level, cell line level, and so on). Individual cases entering the system can serve as new models that expand the knowledge base. Second, the system contains an optimizable matching algorithm. The initial matching algorithm supplied with the system does not fully exploit the richness of the omics variation, but the invention provides an evaluation method for matching algorithms: the clustering of the knowledge base models produced by a matching algorithm is compared with the classification of the models by their intervention response labels, so that the quality of the matching algorithm can be assessed and the algorithm continually optimized. Third, the search system directly generates a readable report containing the case's omics analysis data and the system's search results, which gives physicians a reference for the physiological significance of the omics data and assists in drawing up a treatment plan.
The basic mode of the invention is as follows. First, establish a precision medicine knowledge base built on multi-omics variation-intervention response association models. The multi-omics variation information covers three aspects: single-base mutations (single nucleotide polymorphisms and base insertions and deletions), chromosomal variations (such as gene fusions), and the baseline gene expression levels used to determine whether a gene is abnormally expressed. A multi-omics variation-intervention response association model can be a "companion diagnostic association model", a set of multi-omics variation features annotated with companion diagnostic drug response; a "cell line association model", which contains drug response information and multi-omics variation features from a drug screening experiment; a "case association model", which contains clinically observed intervention response results and multi-omics variation features; or an "individualized disease model association model", which contains drug screening results and multi-omics variation features (individualized disease models include, but are not limited to, PDX mouse models and PDO organoid models). Second, when a new case arrives, extract its multi-omics variation features (including but not limited to single-base mutations, chromosomal structural variations and gene expression profile information). A standardized omics data analysis workflow is established to extract the multi-omics variations, with quality control and quality assurance applied throughout the process, from sample collection, sequencing and data analysis to knowledge base matching. Third, establish a matching algorithm between the new case and the association models. The search system provides an initial matching algorithm and an evaluation method for matching algorithms; the evaluation method compares how well different matching algorithms cluster the association models in the knowledge base, assesses whether a new algorithm outperforms the existing one, and so decides whether the algorithm should be upgraded. Fourth, generate a personalized report for the case. The report has two parts: the first presents statistics on the physiologically relevant multi-omics variation features of the case, describing the omics variation of the diseased tissue in terms of single-base mutations, chromosomal variations and differentially expressed genes; the second, produced once the knowledge base search is complete, lists the matching evidence and medication information of the matched models in descending order of their similarity to the case. Fifth, if the case matches no existing model, medication is chosen directly on the basis of the physician's experience; an individualized disease model based on the case may also be developed for drug screening, and a "case association model" and an "individualized disease model association model" are built for the case from the feedback and added to the knowledge base.
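The patent leaves the concrete matching formula open. As a minimal sketch of what coordinated matching across the three feature types could look like, the example below scores each feature type separately and combines the scores with a weighted sum. Jaccard similarity is used here as a stand-in for all three types; the weights, the threshold of 0.5 and the dictionary keys are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Illustrative weights per feature type (assumed, not from the patent).
WEIGHTS = {"point_mutations": 0.4, "chromosomal_variants": 0.3, "abnormal_genes": 0.3}
THRESHOLD = 0.5  # assumed cutoff for declaring a match


def match_score(case, model, weights=WEIGHTS):
    """Weighted sum of per-feature-type similarities between a case and a model."""
    return sum(w * jaccard(case.get(k, ()), model.get(k, ())) for k, w in weights.items())


def is_match(case, model, threshold=THRESHOLD):
    return match_score(case, model) >= threshold


# Placeholder feature identifiers.
case = {"point_mutations": {"m1", "m2"}, "chromosomal_variants": {"f1"},
        "abnormal_genes": {"g1", "g2"}}
model = {"point_mutations": {"m1"}, "chromosomal_variants": {"f1"},
         "abnormal_genes": {"g1", "g3"}}
score = match_score(case, model)  # 0.4*0.5 + 0.3*1.0 + 0.3*(1/3)
```

Because the scoring function is a plain callable, a replacement algorithm can be swapped in and compared with the current one without touching the rest of the system.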
Fig. 1 is a schematic flow diagram of the implementation of the technical solution of the invention. The specific implementation steps are as follows:
1) Build a precision medicine knowledge base based on multi-omics variation-intervention response association models: establish intervention response models at different scales (including but not limited to the population level, individual level, tissue level and cell line level), and collect multi-omics variation features together with the corresponding intervention and intervention response information from perspectives including but not limited to "population omics variation feature-intervention response", "individual case omics variation feature-intervention response", "individualized disease model (such as PDX mouse and PDO models) omics variation feature-intervention response" and "cell line omics variation feature-intervention response". The data in the knowledge base are obtained by web crawling, download from public databases, and import of local data (cases and individualized disease models). Core keywords and data are extracted from the obtained data using techniques such as word segmentation, semantic analysis and regular-expression matching, and then converted in format, so that the raw information is mapped onto a standardized information interface that has reference value for clinical intervention design; the result is added to the database after manual correction. Data for association models of the same type are stored in a uniform format in the database;
2) Set up the workflow for extracting the multi-omics variation features of a case: build a bioinformatics analysis workflow based on next-generation sequencing, and extract from the omics data the single-base mutations, genomic structural mutations and genes with abnormal transcript-level expression that are closely related to physiological changes. These serve as the multi-omics variation features of the case and are matched against the models in the multi-omics variation feature database. Strict quality control is applied during data analysis. Where a normal control sample is available, the normal sample and known disease-omics variation information are used to screen the case's omics variations, which makes the association between the case's multi-omics variation features and the physiological phenotype more reliable;
3) Implement the case-model multi-omics variation coordinated matching algorithm: the precision medicine knowledge base integrates the variation feature information of association models from multiple data sources and multiple omics perspectives. Once the multi-omics variation features of a case have been extracted and the case enters the case matching system, the case is matched against each model according to the type of the model in the knowledge base. When matching against a particular association model, a different method is used for each kind of omics variation feature to score the match between the variation features extracted from the case and those of the model. The scores for the different variation features are then combined by a formula into a total case-drug response model matching score, and the total score determines whether the case matches the model;
4) Generate the analysis report of the case matching system: the report has two layers. The first layer is the omics information report of the individual case, including but not limited to the sequencing quality of the raw data, a description of the data analysis workflow, and statistics on the multi-omics variation features. The second layer gives the results of matching the case against the models in the precision medicine knowledge base. Based on the search results, the intervention strategy, response result and matching evidence of each model are listed in descending order of the model's similarity to the case. The second layer provides readable "individual case omics variation feature-model omics variation feature-intervention response" information and the potential intervention responses of the case, helping physicians interpret the physiological significance of the omics variation features and uncover the clinical value of the omics data;
5) Update the search system: the update has two parts, the data update of the knowledge base and the self-evolution of the matching algorithm.
First, the knowledge base update: when a case matches a model in the knowledge base, the effect of its drug treatment is tracked and the case data are added to the precision medicine knowledge base as a case-type model, widening the coverage of the knowledge base and improving its matching accuracy. When no matching association model is found in the knowledge base, the case is treated directly on the basis of the physician's experience; an individualized disease model (such as a PDX mouse or PDO organoid model) may also be developed from the case, and the case's intervention response results and the drug testing results of the individualized disease model are tracked to build the corresponding case association model or individualized disease model association model, which is added to the precision medicine knowledge base.
Second, the self-evolution of the matching algorithm: the system establishes an evaluation method for comparing a new matching algorithm with the old one in order to optimize the system's matching algorithm. When the system is first put into operation, it provides an initial matching algorithm that remains to be optimized. As new cases accumulate, the number of models in the precision medicine knowledge base keeps growing, which provides the resources for optimizing the matching algorithm. The models in the knowledge base are classified by their response to intervention. The invention randomly selects M association models and scores every pair of the selected models with both the old and the new matching algorithm, giving two similarity score matrices over these models. Clustering each matrix yields the grouping of the models under the old and the new algorithm respectively. Each grouping is compared with the true classification based on drug response information, to judge whether the new algorithm performs better and can replace the system's current algorithm.
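The evaluation procedure described above can be sketched as follows. The patent does not name a clustering method or an agreement measure, so this example assumes a simple threshold-based grouping (connected components of the similarity graph) and the Rand index; the function names, the threshold and the toy data are likewise illustrative.

```python
import random
from itertools import combinations


def similarity_matrix(models, score_fn):
    """Pairwise similarity scores between the selected models."""
    n = len(models)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = score_fn(models[i], models[j])
    return sim


def threshold_clusters(sim, threshold):
    """Group models connected by similarity >= threshold (connected components)."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree (needs >= 2 items)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def new_algorithm_is_better(models, old_fn, new_fn, m, threshold=0.5, seed=0):
    """Compare two matching algorithms on m randomly selected models."""
    sample = random.Random(seed).sample(models, m)
    truth = [mod["responded"] for mod in sample]
    old = rand_index(threshold_clusters(similarity_matrix(sample, old_fn), threshold), truth)
    new = rand_index(threshold_clusters(similarity_matrix(sample, new_fn), threshold), truth)
    return new > old, old, new


def always_similar(a, b):  # deliberately poor "old" algorithm
    return 1.0


def feature_jaccard(a, b):  # candidate "new" algorithm
    return len(a["features"] & b["features"]) / len(a["features"] | b["features"])


demo_models = [
    {"features": {"a", "b"}, "responded": True},
    {"features": {"a", "b"}, "responded": True},
    {"features": {"c", "d"}, "responded": False},
    {"features": {"c", "d"}, "responded": False},
]
better, old_ri, new_ri = new_algorithm_is_better(demo_models, always_similar, feature_jaccard, m=4)
```

In the toy data the old algorithm lumps all four models into one group, while the new one separates responders from non-responders, so the comparison favors the new algorithm.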
The technical scheme of the invention is further described below through specific embodiments:
Example 1: A rapid cancer case matching system based on the transcriptome variation features of cases
This embodiment consists of five main steps:
1) Construction of the precision medicine knowledge base: The knowledge base stores association models. The multi-omics variation features associated with drug response information are collected from three data sources: the list of companion diagnostic drugs approved by the US Food and Drug Administration (FDA), the precision cancer medicine information provided by My Cancer Genome, and the GDSC database of the Sanger Institute. The companion diagnostic drugs and My Cancer Genome provide population-level omics variation feature-drug response information, while the GDSC database provides cell-line-specific omics variation feature-drug response information. Data in different formats are managed uniformly by adopting the nomenclature of international standard databases. In this example, single-base mutations from every source are mapped to their corresponding names in the COSMIC database, and the COSMIC names are used as the standard output. Likewise, gene names are standardized to NCBI Entrez IDs and disease names to OMIM IDs.
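As a rough sketch of the naming unification described above, the following maps source-specific gene, mutation, and disease names onto one standard record. The lookup tables, the record fields, and the function name are hypothetical, and the identifier values are dummy placeholders rather than real COSMIC, Entrez, or OMIM identifiers; a real system would load these tables from the respective databases.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder lookup tables (assumption): dummy identifiers for illustration only.
GENE_TO_ENTREZ = {"BRAF": "ENTREZ_EXAMPLE_1", "B-RAF": "ENTREZ_EXAMPLE_1"}
MUTATION_TO_COSMIC = {("ENTREZ_EXAMPLE_1", "p.V600E"): "COSMIC_EXAMPLE_1"}
DISEASE_TO_OMIM = {"melanoma": "OMIM_EXAMPLE_1"}


@dataclass(frozen=True)
class AssociationRecord:
    source: str                 # e.g. FDA list, My Cancer Genome, GDSC
    gene_id: Optional[str]      # standardized gene identifier, None if unmapped
    mutation_id: Optional[str]  # standardized mutation identifier, None if unmapped
    disease_id: Optional[str]   # standardized disease identifier, None if unmapped
    drug: str
    responds: bool


def normalize(source, gene, change, disease, drug, responds):
    """Map one source-specific entry onto the standard identifiers."""
    gene_id = GENE_TO_ENTREZ.get(gene.strip().upper())
    # Mutations are keyed by the standardized gene ID so that gene aliases collapse.
    mutation_id = MUTATION_TO_COSMIC.get((gene_id, change.strip()))
    disease_id = DISEASE_TO_OMIM.get(disease.strip().lower())
    return AssociationRecord(source, gene_id, mutation_id, disease_id, drug, responds)
```

Entries that cannot be mapped keep `None` in the affected field, so they can be reviewed rather than silently merged.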
2) Extraction of the multi-omics variation features of a case: A bioinformatics analysis pipeline based on transcriptome sequencing (RNA-Seq) data is built. From the transcriptome data it extracts single-base mutations, chromosomal structural mutations, and genes with abnormal transcript-level expression that are closely related to physiological changes. These serve as the multi-omics variation features of the case and are matched against the models in the multi-omics variation feature database.
In this example, the variant extraction pipeline is divided into the following parts: RNA-Seq data preprocessing, single-base mutation detection (single-nucleotide polymorphisms and small insertions/deletions), chromosomal structural variation detection (gene fusions), gene expression and abnormal-expression detection, and visualization of the results.
1. RNA-Seq data preprocessing:
The raw data are checked with a quality-control tool. For data that pass the check, adapter-trimming software removes the adapter sequences and the low-quality bases at the read ends. The cleaned reads are used for the subsequent sequence alignment. In this example, a fast short-read aligner is used, with the human genome as the reference genome.
2. Detection of single-base mutations in the case:
This step follows the best-practice workflow for RNA-seq variant calling provided by GATK (http://gatkforums.broadinstitute.org/gatk/discussion/3892/the-gatk-best-practices-for-variant-calling-on-rnaseq-in-full-detail). Duplicate reads are first removed from the alignment files produced in step 1. The reads are then trimmed and split by exon segment, base quality recalibration is performed, and single-nucleotide polymorphisms and single-nucleotide insertions/deletions are called. Finally, the detected single-base variants are annotated and filtered with variant annotation software, using human genome variation database resources.
3. Detection of chromosomal variation in the case:
The structural variations detectable from transcriptome sequencing data are mainly gene fusions. Here, gene fusion detection software is run on the alignment results from step 1 to identify the gene fusion events visible in the transcriptome.
4. Quantification of gene expression:
This step also takes the alignment files from step 1 as input, here to transcript assembly software, which assembles the transcripts and computes their expression levels. In this embodiment we consider only the situation where no adjacent normal (paracancerous) tissue is provided and none is available in public cancer transcriptome databases either.
5. Visualization of the case's omics results:
The overall multi-omics variation profile of an individual case is shown as a circular plot. From the inside out, the plot has four tracks: the innermost shows the positions of gene fusion events, the next shows the positions of single-base mutation events, the third shows gene expression across the whole transcriptome, and the outermost shows annotated chromosome positions.
The various statistical charts produced during the analysis, such as scatter plots, histograms, and pie charts, are rendered with the statistical software R.
3) Implementation of the case-model multi-omics variation matching algorithm: The multi-omics variation feature database integrates the variation feature information of association models from multiple data sources and multiple omics perspectives. Once the multi-omics variation features of a case have been extracted and the case enters the matching system, a case-model matching algorithm must be supplied for each type of model in the database.
In this example, the knowledge base provides three types of models: 1. companion diagnostic association models; 2. cell line association models; 3. case association models.
The intervention result given by a population-level association model usually describes how one or a few specific omics variation features affect drug response in a large population sample. The strategy used for this model type is therefore as follows: if the case and a population model have exactly the same omics variation features, the case is reported as matching that population model; otherwise the match fails.
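A minimal sketch of this population-level rule is shown below. It rests on one interpretive assumption: since a population model defines only one or a few features while a case carries many, "exactly the same features" is read here as "the case carries every feature the model defines". The function name and the feature strings are hypothetical.

```python
def matches_population_model(case_features, model_features):
    """Return True when the case carries every feature defined by the population model."""
    model_features = set(model_features)
    # An empty model defines nothing to match on, so it never matches (assumption).
    return bool(model_features) and model_features <= set(case_features)
```

For example, a case carrying features `{"A", "B", "C"}` would match a model defined by `{"A"}` but not one defined by `{"A", "D"}`.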
Cell-line-level and individual-level association models both provide complete information on single-base mutations, chromosomal structural mutations, and gene expression profiles. This example therefore uses a similarity scoring method that combines all three kinds of information to measure the similarity between a case and a model. Matching a case against a cell-line-level model differs from matching against an individual-level model only in the threshold parameter that finally decides whether the match succeeds. The scoring method is implemented in the following steps:
1. Single-base mutations: This example uses the DANN method to measure the functional importance of single-base mutations in the case and in the model. For each gene, the DANN values of the sites carrying significant functional single-base mutations are summed, separately for the case and for the model, to measure the physiological impact of functional single-base mutations in that gene. The similarity of the gene's functional mutations between case and model is then $V_1 = 1 - |C_{snv} - M_{snv}| / \max\{C_{snv}, M_{snv}\}$, where $C_{snv}$ is the functional mutation impact value of the gene in the case and $M_{snv}$ is the corresponding value in the model. This score serves as the indicator V1 of gene-function similarity between case and model.
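The V1 calculation can be illustrated with the short sketch below. Two points are assumptions not fixed by the text: the `cutoff` used to decide which DANN values count as "significant", and the convention that V1 is 1 when neither the case nor the model carries any functional mutation in the gene (the formula is otherwise undefined because the denominator is zero). The function names are hypothetical.

```python
def gene_snv_burden(dann_scores, cutoff=0.9):
    """Sum the DANN values of the sites considered significant (cutoff is an assumed value)."""
    return sum(score for score in dann_scores if score >= cutoff)


def relative_similarity(case_value, model_value):
    """1 - |C - M| / max(C, M), for non-negative inputs."""
    largest = max(case_value, model_value)
    if largest == 0:
        return 1.0  # assumption: no burden on either side counts as identical
    return 1.0 - abs(case_value - model_value) / largest


# Example: the case has two significant sites in a gene, the model has one.
v1 = relative_similarity(gene_snv_burden([0.95, 0.99, 0.40]), gene_snv_burden([0.97]))
```

The same `relative_similarity` form applies wherever the text uses the 1 - |C - M| / Max{C, M} pattern.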
2. Chromosomal structural variation: There is currently no method that directly measures the physiological impact of a gene fusion. Because structural variation usually has a severe effect on a gene's physiological function, this example uses a custom binary indicator V2 (0 or 1) to measure the similarity between case and model with respect to gene fusion events. If a given gene is fused in both the case and the model, or fused in neither, V2 is 1; otherwise V2 is 0.
3. Abnormally expressed genes: This example defines an indicator V3 to measure the similarity of gene expression: $V_3 = 1 - |C_{exp} - M_{exp}| / \max\{C_{exp}, M_{exp}\}$, where $C_{exp}$ and $M_{exp}$ are the expression levels of a given gene in the case and in the model, respectively, after the expression profiles have been normalized.
In this example, abnormal gene expression reflects variation at the transcript level, whereas single-base mutations and chromosomal structural variation reflect variation in the genome, so the two effects must be combined when the indicators are integrated. The final similarity score between case and model for a given gene is defined as $V = \min\{V_3 V_1, V_3 V_2\}$, where V1, V2, and V3 are the three similarity indicators defined above. If the score for a gene is higher than 0.5, the gene is considered to behave consistently in the case and the model. When more than half of the genes in the case behave consistently with the model, the case is considered to match the model; otherwise the match fails.
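The combination rule and the final decision can be sketched as follows. The input layout (a dictionary of per-gene `(V1, V2, V3)` tuples), the function names, and the choice to expose both cutoffs as parameters are assumptions; the defaults of 0.5 and "more than half" come from the text, and making the fraction adjustable reflects the statement that cell-line and individual models differ in their final threshold.

```python
def gene_score(v1, v2, v3):
    """Per-gene similarity V = min(V3 * V1, V3 * V2)."""
    return min(v3 * v1, v3 * v2)


def case_matches_model(per_gene, gene_cutoff=0.5, fraction_cutoff=0.5):
    """per_gene maps a gene identifier to its (V1, V2, V3) tuple.

    A gene is consistent when its score exceeds gene_cutoff; the case matches
    when the fraction of consistent genes exceeds fraction_cutoff.
    """
    if not per_gene:
        return False  # assumption: nothing to compare means no match
    consistent = sum(
        1 for v1, v2, v3 in per_gene.values() if gene_score(v1, v2, v3) > gene_cutoff
    )
    return consistent / len(per_gene) > fraction_cutoff


example = {
    "gene_1": (1.0, 1, 0.9),   # score 0.90, consistent
    "gene_2": (0.8, 1, 0.9),   # score 0.72, consistent
    "gene_3": (1.0, 0, 1.0),   # fusion status differs, score 0.0
}
matched = case_matches_model(example)
```

One consequence of taking the minimum is that a gene whose fusion status differs (V2 = 0) always scores 0, regardless of how similar its mutations and expression are.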
4) Generation of an analysis report from the case's matching results:
The analysis report has two main parts: the individual case information and the knowledge base search results.
In this example, the individual case information includes:
1. Basic information about the sequenced sample (sample name, submission date, sequencing date, sequencer model, sample label, and data saturation assessment parameters);
2. An overall plot of the omics data and the summary statistics of the transcriptome sequencing data (number of raw reads, number of reads after cleaning, number of reads aligned to the reference genome, and number of uniquely aligned reads);
3. A histogram of the expression distribution of the expressed genes, and a chart of the differentially expressed genes;
4. Counts of single-base variants and structural variants in the genome, with an explanation of the variant file format;
5. The locations of the raw-data QC report, the gene and transcript expression files, the differentially expressed gene file, the single-base variant file, and the gene fusion file.
The knowledge base search results include:
1. Basic information about each matched model (model type, original data source, model name, disease name, and so on);
2. The evidence supporting the match between the case and the model (the type, name, and measured value of each indicator matched between model and case, and so on);
3. Clinical medication reference information for each matched model (drug name, whether the model responds to the drug, and so on).
5) Self-evolution of the search system:
1. Updating the precision medicine knowledge base: Cases analyzed against the knowledge base are followed up. Based on each case's treatment outcome under medical advice and its long-term outcome, a case omics variation feature-intervention response association model is built and added to the knowledge base. For cases that find no matching model on first entering the knowledge base, an individualized disease model (a PDX mouse model or a PDO organoid model) is considered; based on the responses of this individualized disease model to different drugs, an individualized disease model omics variation feature-drug response association model is built and added to the knowledge base.
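The update flow described above might look like the following sketch. The class names, the field layout, and the representation of follow-up results as `(drug, responded)` pairs are all hypothetical; the sketch only mirrors the two paths in the text (clinical follow-up for every case, plus PDX/PDO drug tests for cases that found no match).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssociationModel:
    kind: str            # "case" or "individual_disease_model"
    features: frozenset  # omics variation features of the case
    drug: str
    responded: bool


@dataclass
class KnowledgeBase:
    models: list = field(default_factory=list)

    def add(self, model):
        self.models.append(model)


def update_knowledge_base(kb, case_features, clinical_outcomes, disease_model_results=None):
    """Add follow-up results for one case to the knowledge base.

    clinical_outcomes: (drug, responded) pairs observed in the patient.
    disease_model_results: (drug, responded) pairs from PDX/PDO drug tests,
    supplied only for cases that found no matching model.
    """
    features = frozenset(case_features)
    for drug, responded in clinical_outcomes:
        kb.add(AssociationModel("case", features, drug, responded))
    for drug, responded in disease_model_results or []:
        kb.add(AssociationModel("individual_disease_model", features, drug, responded))
    return kb
```

Keeping the model kind on each record lets later matching apply a different threshold per model type.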
2. Self-evolution of the matching algorithm: When the number of models of a given type in the knowledge base reaches a certain value, M models of that type can be selected at random and classified by their drug response, and this classification is used to evaluate the matching algorithm for that model type. When a new algorithm for matching cases against that model type is implemented, its evaluation result can be compared with that of the old algorithm. If the grouping produced by the new algorithm agrees better with the drug-response classification, the new algorithm works better in real-world use and replaces the old one. Otherwise the original algorithm performs better and the update is abandoned.
The above are only specific embodiments of the invention. The invention is clearly not limited to these embodiments and admits many variations. All variations that a person of ordinary skill in the art can derive directly from, or associate with, the disclosure of the invention shall be regarded as falling within its scope of protection.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710218630.XA CN107103207B (en) | 2017-04-05 | 2017-04-05 | Accurate medical knowledge search system based on case multigroup variation characteristics and implementation method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN107103207A CN107103207A (en) | 2017-08-29 |
| CN107103207B (en) | 2020-07-03 |
Family
ID=59675265
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710218630.XA Expired - Fee Related CN107103207B (en) | 2017-04-05 | 2017-04-05 | Accurate medical knowledge search system based on case multigroup variation characteristics and implementation method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN107103207B (en) |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108335748A (en) * | 2018-01-18 | 2018-07-27 | 中山大学 | A kind of nasopharyngeal carcinoma artificial intelligence assisting in diagnosis and treatment policy server cluster |
| CN108320797B (en) * | 2018-01-18 | 2022-03-08 | 中山大学 | Nasopharyngeal carcinoma database and comprehensive diagnosis and treatment decision method based on database |
| CN108509771B (en) * | 2018-03-27 | 2020-12-22 | 华南理工大学 | A method for discovering associations in multi-omics data based on sparse matching |
| CN109599157B (en) * | 2018-11-29 | 2020-10-02 | 同济大学 | A precise intelligent diagnosis and treatment big data system |
| CN110656172A (en) * | 2019-01-14 | 2020-01-07 | 南方医科大学珠江医院 | Molecular marker and kit for predicting sensitivity of small cell lung cancer to EP chemotherapy scheme |
| CN110379460B (en) * | 2019-06-14 | 2023-06-20 | 西安电子科技大学 | A cancer typing information processing method based on multi-omics data |
| CN110660055B (en) * | 2019-09-25 | 2022-11-29 | 北京青燕祥云科技有限公司 | Disease data prediction method and device, readable storage medium and electronic equipment |
| CN112053783A (en) * | 2020-08-27 | 2020-12-08 | 北京颢云信息科技股份有限公司 | Disease intelligent prediction modeling method based on multiple groups of mathematical data |
| CN112070731B (en) * | 2020-08-27 | 2021-05-11 | 佛山读图科技有限公司 | Method for guiding registration of human body model atlas and case CT image by artificial intelligence |
| CN117457068B (en) * | 2023-06-30 | 2024-05-24 | 上海睿璟生物科技有限公司 | Multi-genetics-based functional biomarker screening method, system, terminal and medium |
Citations (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1547721A (en) * | 2001-08-28 | 2004-11-17 | | Systems, methods and instruments for storing, acquiring and integrating clinical, diagnostic, genetic and therapeutic data |
| CN102637245A (en) * | 2001-05-25 | 2012-08-15 | 株式会社日立制作所 | Information processing system using nucleotide sequence-related information |
| CN103955608A (en) * | 2014-04-24 | 2014-07-30 | 上海星华生物医药科技有限公司 | Intelligent medical information remote processing system and processing method |
| CN104067278A (en) * | 2011-11-18 | 2014-09-24 | 加利福尼亚大学董事会 | Half eye half eye: Parallel comparative analysis of high-throughput sequencing data |
| CN105229649A (en) * | 2013-03-15 | 2016-01-06 | 百世嘉(上海)医疗技术有限公司 | For the human genome analysis of variance of disease association and the system and method for report |
| CN105229651A (en) * | 2013-05-23 | 2016-01-06 | 皇家飞利浦有限公司 | DNA sequence dna fast and the retrieval of safety |
| CN105701342A (en) * | 2016-01-12 | 2016-06-22 | 西北工业大学 | Agent-based construction method and device of intuitionistic fuzzy theory medical diagnosis model |
| CN105760705A (en) * | 2016-05-20 | 2016-07-13 | 陕西科技大学 | Medical diagnosis system based on big data |
| CN106202936A (en) * | 2016-07-13 | 2016-12-07 | 为朔医学数据科技(北京)有限公司 | A kind of disease risks Forecasting Methodology and system |
| CN106227992A (en) * | 2016-07-13 | 2016-12-14 | 为朔医学数据科技(北京)有限公司 | A kind of recommendation method and system of therapeutic scheme |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20020064792A1 (en) * | 1997-11-13 | 2002-05-30 | Lincoln Stephen E. | Database for storage and analysis of full-length sequences |
| US20110256545A1 (en) * | 2010-04-14 | 2011-10-20 | Nancy Lan Guo | mRNA expression-based prognostic gene signature for non-small cell lung cancer |
Non-Patent Citations (2)
| Title |
|---|
| 《Gene–disease relationship discovery based on model-driven data integration and database view definition》;S. Yilmaz等;《BIOINFORMATICS》;20090228;第25卷(第2期);第230-236页 * |
| 《医疗大数据临床应用的探索与实践》;汪鹏等;《中国数字医学》;20160930;第11卷(第9期);第8-14页 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107103207A (en) | 2017-08-29 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200703 |