CN111785384A - Artificial intelligence-based abnormal data identification method and related equipment

Info

Publication number: CN111785384A
Application number: CN202010609844.1A
Authority: CN
Inventors: 刘伟; 张旭
Original assignee: Ping An Medical and Healthcare Management Co Ltd
Current assignee: Shenzhen Ping An Medical Health Technology Service Co Ltd
Priority date: 2020-06-29
Filing date: 2020-06-29
Publication date: 2020-10-16
Anticipated expiration: 2040-06-29
Also published as: CN111785384B

Abstract

The invention relates to the technical field of artificial intelligence and provides an artificial intelligence-based abnormal data identification method, comprising: receiving input medical data; performing medical processing on the medical data to obtain processed data; inputting the processed data into a pre-trained combined classifier model to obtain a classification prediction result, wherein the combined classifier model is obtained by training based on multiple dimensions; performing, according to the classification prediction result, an abnormality determination on each cluster of the medical data to obtain a cluster abnormality result; and performing, by means of the bootstrap aggregating (bagging) algorithm and according to the cluster abnormality result, an abnormality determination on the visit record matched by the medical data to obtain a visit record abnormality result. The invention also relates to blockchain technology: the visit record abnormality result can be uploaded to a blockchain. The present application can be applied in smart healthcare scenarios, thereby promoting the construction of smart cities.

Description

Artificial intelligence-based abnormal data identification method and related equipment

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to an artificial intelligence-based abnormal data identification method and related equipment.

Background

With the continuous improvement of the social medical security system, the problem of people's access to medical treatment has been addressed. At present, after seeing a doctor, people can use their medical record data to have part of their medical expenses reimbursed, which greatly eases the burden of medical costs.

However, in practice it has been found that some illegitimate users fabricate medical data to claim reimbursement of medical expenses. If the amounts reimbursed to these users are large, there will not be enough funds to guarantee the medical reimbursement of legitimate users, which undoubtedly harms the rights and interests of legitimate users.

Therefore, how to identify abnormal risks in medical data is a technical problem that urgently needs to be solved.

Summary of the Invention

In view of the above, it is necessary to provide an artificial intelligence-based abnormal data identification method and related equipment that can identify abnormal risks in medical data.

A first aspect of the present invention provides an artificial intelligence-based abnormal data identification method, the method comprising:

receiving input medical data;

performing medical processing on the medical data to obtain processed data;

inputting the processed data into a pre-trained combined classifier model to obtain a classification prediction result, wherein the combined classifier model is obtained by training based on multiple dimensions;

performing, according to the classification prediction result, an abnormality determination on each cluster of the medical data to obtain a cluster abnormality result;

performing, by means of the bootstrap aggregating (bagging) algorithm and according to the cluster abnormality result, an abnormality determination on the visit record matched by the medical data to obtain a visit record abnormality result.

In a possible implementation, performing medical processing on the medical data to obtain the processed data includes:

performing standard conversion on the non-standard data in the medical data to obtain first data;

correcting erroneous data in the medical data to obtain second data;

converting the age data in the medical data into age groups to obtain third data;

determining the first data, the second data, the third data, and the data in the medical data that was not processed as the processed data.

In a possible implementation, performing the abnormality determination on each cluster of the medical data according to the classification prediction result to obtain the cluster abnormality result includes:

for each cluster, obtaining the abnormal count threshold of the cluster;

judging whether the number of pieces of processed data of the cluster that the classification prediction result determines to be abnormal data exceeds the abnormal count threshold;

if that number exceeds the abnormal count threshold, determining that the cluster is an abnormal cluster; or

if that number does not exceed the abnormal count threshold, determining that the cluster is a normal cluster.

In a possible implementation, performing, by means of the bootstrap aggregating (bagging) algorithm and according to the cluster abnormality result, the abnormality determination on the visit record matched by the medical data to obtain the visit record abnormality result includes:

dividing the medical data by visit record to obtain multiple pieces of visit record data;

for each piece of visit record data, determining, by means of the bagging algorithm, the first visit record data in which an abnormal cluster appears and the second visit record data in which no abnormal cluster appears;

setting an abnormality flag for the first visit record data and for the second visit record data respectively.

In a possible implementation, before receiving the input medical data, the artificial intelligence-based abnormal data identification method further includes:

obtaining a first test set, a second test set, and a third test set;

training on the first test set based on a Gaussian mixture model (GMM) to obtain a first classifier;

training on the second test set based on KL divergence to obtain a second classifier;

training on the third test set based on deep adversarial subspace clustering (DASC) to obtain a third classifier;

combining the first classifier, the second classifier, and the third classifier to obtain the combined classifier model.

In a possible implementation, the artificial intelligence-based abnormal data identification method further includes:

determining, according to the visit record abnormality result, the abnormal visit record data among the multiple pieces of visit record data;

for the abnormal visit record data, calculating the abnormal score of each piece of abnormal visit record data according to preset clustering weights;

determining the abnormality level of each piece of abnormal visit record data according to the abnormal score;

judging whether the abnormality level is higher than a preset abnormality level threshold;

if the abnormality level is higher than the preset abnormality level threshold, determining that the user to whom the abnormal visit record data belongs is a high-risk user.

A second aspect of the present invention provides an abnormal data identification device, characterized in that the abnormal data identification device includes:

a receiving module, configured to receive input medical data;

a processing module, configured to perform medical processing on the medical data to obtain processed data;

an input module, configured to input the processed data into a pre-trained combined classifier model to obtain a classification prediction result, wherein the combined classifier model is obtained by training based on multiple dimensions;

a determination module, configured to perform, according to the classification prediction result, an abnormality determination on each cluster of the medical data to obtain a cluster abnormality result;

a matching module, configured to perform, by means of the bootstrap aggregating (bagging) algorithm and according to the cluster abnormality result, an abnormality determination on the visit record matched by the medical data to obtain a visit record abnormality result;

an uploading module, configured to upload the visit record abnormality result to the blockchain.

A third aspect of the present invention provides an electronic device. The electronic device includes a processor and a memory, and the processor implements the artificial intelligence-based abnormal data identification method when executing a computer program stored in the memory.

A fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the artificial intelligence-based abnormal data identification method is implemented.

In the above technical solution, classifiers of multiple dimensions can be used to classify and predict the medical data and identify drug abnormalities, examination abnormalities, patient abnormalities, doctor abnormalities, medical institution abnormalities, and so on, and finally to judge whether the visit record is normal; if it is not, the record is marked as an abnormal record. Applying the multi-dimensional abnormality identification model to medical insurance risk control allows abnormalities in medical data to be identified accurately without relying heavily on medical knowledge or a medical background, which helps identify high-risk users who commit insurance fraud.

Description of Drawings

FIG. 1 is a flowchart of a preferred embodiment of an artificial intelligence-based abnormal data identification method disclosed in the present invention.

FIG. 2 is a functional block diagram of a preferred embodiment of an abnormal data identification device disclosed in the present invention.

FIG. 3 is a schematic structural diagram of an electronic device of a preferred embodiment that implements the artificial intelligence-based abnormal data identification method of the present invention.

Detailed Description

In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, not to limit it. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

The terms "comprising" and "having" and any variations thereof in the description and claims of this application are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.

In addition, the technical solutions of the various embodiments can be combined with each other, provided that the combination can be realized by those of ordinary skill in the art. When a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and falls outside the protection scope claimed by the present invention.

The electronic device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), and embedded devices. The electronic device may include, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote control, a touchpad, or a voice-control device, for example a personal computer, a tablet computer, a smartphone, or a personal digital assistant (PDA).

Please refer to FIG. 1, which is a flowchart of a preferred embodiment of an artificial intelligence-based abnormal data identification method disclosed in the present invention. Depending on requirements, the order of the steps in the flowchart can be changed and some steps can be omitted.

S11. Receive the input medical data.

The medical data may include, but is not limited to, medical settlement data such as the patient visit record ID, patient ID, doctor ID, medical institution ID, gender, age, primary diagnosis, social security catalog name, and social security catalog unified code.

S12. Perform medical processing on the medical data to obtain processed data.

Specifically, performing medical processing on the medical data to obtain the processed data includes:

(1) Some of the medical data is non-standard and needs to be converted into standard data. For example, gender may be recorded in different ways, such as "男" and "女" (male and female) or "F" and "M"; these are unified into a single standard. (2) Erroneous data in the medical data is corrected. For example, if the description of a diagnosis and treatment item appears as "Blood cell analysis_five-category surcharge/item", "Blood cell analysis (five-category)", or "Blood cell analysis (five-category surcharge)/time", it is corrected to "(New) Blood cell analysis_five-category surcharge". (3) The patient's age is converted by age group into infant, toddler, teenager, youth, middle-aged, elderly, and so on.

Processing the medical data in this way enables uniform recognition and classification by the subsequent combined classifier model and helps improve classification accuracy.
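The three processing steps of S12 can be sketched as follows. This is only an illustrative sketch: the field names (`gender`, `item`, `age`), the lookup tables, and the age cut-offs are assumptions made for the example and are not specified in the source.

```python
# Hypothetical lookup tables; a real system would load curated dictionaries.
GENDER_MAP = {"男": "M", "male": "M", "m": "M", "女": "F", "female": "F", "f": "F"}
ITEM_ALIASES = {
    "Blood cell analysis_five-category surcharge/item": "(New) Blood cell analysis_five-category surcharge",
    "Blood cell analysis (five-category)": "(New) Blood cell analysis_five-category surcharge",
    "Blood cell analysis (five-category surcharge)/time": "(New) Blood cell analysis_five-category surcharge",
}
# (exclusive upper bound, label); these cut-offs are illustrative assumptions.
AGE_GROUPS = [(1, "infant"), (7, "toddler"), (18, "teenager"), (41, "youth"), (66, "middle-aged")]


def age_group(age):
    """Map a numeric age to an age-group label (step 3)."""
    for upper, label in AGE_GROUPS:
        if age < upper:
            return label
    return "elderly"


def preprocess(record):
    """Return a processed copy of one settlement record (a dict)."""
    out = dict(record)  # fields that need no processing pass through unchanged
    if "gender" in record:
        gender = str(record["gender"]).strip()
        # Step 1: unify non-standard gender spellings; unknown values are kept.
        out["gender"] = GENDER_MAP.get(gender.lower(), gender)
    if "item" in record:
        # Step 2: replace known erroneous item descriptions with the standard one.
        out["item"] = ITEM_ALIASES.get(record["item"], record["item"])
    if "age" in record:
        out["age_group"] = age_group(record["age"])
    return out
```

Unknown gender spellings and item names are deliberately left untouched rather than guessed, so that downstream review can catch them.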

S13. Input the processed data into a pre-trained combined classifier model to obtain a classification prediction result, wherein the combined classifier model is obtained by training based on multiple dimensions.

The combined classifier may be composed of multiple classifiers, each of which can cluster the processed data. Discrete data points that are not clustered into any category are regarded as abnormal data, while data that can be clustered into some category is regarded as normal data.

The classification prediction result includes, for each classifier in the combined classifier, its clustering result for the processed data and its data abnormality result. The clustering result is the type of cluster, including but not limited to the drug category, examination category, patient category, doctor category, and medical institution category. The data abnormality result indicates that the data of a certain cluster is abnormal, for example a drug abnormality, examination abnormality, patient abnormality, doctor abnormality, or medical institution abnormality.

Optionally, before step S11, the method further includes:

A GMM estimates the probability density distribution of the samples, and the model used for the estimate is a weighted sum of several Gaussian models. Each Gaussian model represents one cluster. Evaluating a sample under each of the Gaussian models gives its probability for each class, and the class with the highest probability can then be selected as the clustering result.

Assuming the Gaussian mixture model consists of K Gaussian models (that is, the data contains K classes), the probability density function of the GMM is:

$$p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid u_k, \Sigma_k)$$

where $p(x \mid k) = \mathcal{N}(x \mid u_k, \Sigma_k)$ is the probability density function of the k-th Gaussian model, which can be regarded as the probability that the model produces x once the k-th model has been selected, and $p(k) = \pi_k$ is the weight of the k-th Gaussian model, called the prior probability of selecting the k-th model, which satisfies

$$\sum_{k=1}^{K} \pi_k = 1$$

For example, a patient is described by N-dimensional features such as age, gender, primary diagnosis, prescribed drugs, and treatments. If there are already 2 patients, an N-dimensional two-component Gaussian mixture model is constructed. The mean vectors and covariance matrices of the Gaussian mixture model are determined from the 2 existing patient data points; the expectation-maximization (EM) algorithm can be used to determine these parameters, finally yielding two Gaussian components that describe the two patients. When a new patient arrives, the new patient is represented by N-dimensional features and evaluated under the two fitted components to obtain the probability under each; the new patient belongs to the class whose probability is higher. Repeating this process clusters the patients.
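The assignment step described above, picking the component with the highest probability for a new sample, can be sketched for a one-dimensional mixture as follows. This is a simplified sketch under stated assumptions: the parameters are taken as already fitted (for example by EM), the feature is a single number rather than an N-dimensional vector, and all names and values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(x | mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior(x, weights, means, variances):
    """Return p(k | x) for every component k of a fitted 1-D mixture."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)  # the mixture density p(x)
    if total == 0.0:
        raise ValueError("x is too far from every component to be assigned")
    return [j / total for j in joint]


def assign_cluster(x, weights, means, variances):
    """Index of the component with the highest posterior probability."""
    probs = posterior(x, weights, means, variances)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical fitted parameters for two patient classes (feature: age).
WEIGHTS = [0.6, 0.4]
MEANS = [30.0, 70.0]
VARIANCES = [25.0, 36.0]
```

Calling `assign_cluster(35.0, WEIGHTS, MEANS, VARIANCES)` assigns the sample to the first component, because its weighted density at 35 is far larger than that of the component centred on 70.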

KL divergence is an asymmetric measure of the difference between two probability distributions P and Q, and can be used to measure the distance between two distributions. The formula is:

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$

where P denotes the true distribution of the data and Q denotes the theoretical distribution of the data, the model distribution, or an approximation of P. If characters that follow the distribution P(x) are encoded with the code that is optimal for Q(x) (that is, the code length of character x equals log(1/Q(x))), representing these characters takes more bits on average than the ideal code would. KL divergence measures this average number of extra bits per character.
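As a small illustration of the definition, the following sketch computes the KL divergence of two discrete distributions in bits. The function name and the handling of zero probabilities are choices made for this example, not details taken from the source.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in bits for two discrete distributions over the same outcomes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue  # the term 0 * log(0 / q) is taken as 0
        if q_x == 0.0:
            return math.inf  # P puts mass where Q puts none
        total += p_x * math.log2(p_x / q_x)
    return total
```

Swapping the arguments generally gives a different value, which is the asymmetry the text refers to; the divergence is zero only when the two distributions are identical.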

For example, a doctor's diagnosis and treatment behavior and outcomes are described by N-dimensional features such as the doctor ID, the ID of the doctor's medical institution, the number of patients seen, the number of visits, the proportion of expensive drugs, the proportion of examination costs, the readmission rate, the recurrence rate, and the follow-up visit rate. The doctors are clustered as follows:

(1) Given the information of M doctors, randomly select Q of the M N-dimensional vectors as the initial mean cluster centers, where Q is the number of classes.

(2) For each doctor to be clustered, calculate the fuzzy metric distance to each mean center.

(3) Assign each doctor to be clustered a class number, taken from the class of the mean center with the smallest fuzzy metric distance.

(4) Traverse the vectors and, for each class number, calculate the average of the vectors with that class number; this average vector becomes the new mean center.

(5) For each class, calculate the fuzzy metric distance between the current mean center and the updated mean center.

(6) If the fuzzy metric distance between the previous and updated mean centers is smaller than the preset threshold for every class, the classification ends; otherwise, return to step (2).

Repeating this process clusters the doctors.
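Steps (1) to (6) follow the usual mean-center iteration and can be sketched as below. The source does not define the "fuzzy metric distance", so this sketch assumes plain Euclidean distance as a stand-in and lets the caller pass a different `distance` function; the function names and the optional `init` and `seed` parameters are likewise assumptions for illustration.

```python
import math
import random


def euclidean(a, b):
    """Stand-in distance; the source's fuzzy metric distance is not specified."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def cluster(vectors, q, distance=euclidean, tol=1e-6, max_iter=100, init=None, seed=0):
    """Return (labels, centers) after iterating steps (2)-(6) until convergence."""
    # Step (1): choose Q initial mean centers, randomly unless `init` is given.
    centers = list(init) if init is not None else random.Random(seed).sample(vectors, q)
    labels = [0] * len(vectors)
    for _ in range(max_iter):
        # Steps (2)-(3): label each vector with the class of its nearest center.
        labels = [min(range(q), key=lambda k: distance(v, centers[k])) for v in vectors]
        # Step (4): the average of each class becomes its new mean center.
        new_centers = []
        for k in range(q):
            members = [v for v, label in zip(vectors, labels) if label == k]
            new_centers.append(mean_vector(members) if members else centers[k])
        # Steps (5)-(6): stop once every center has moved less than the threshold.
        shift = max(distance(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return labels, centers
```

A class that ends up with no members keeps its previous center, which avoids averaging an empty list; other strategies, such as re-seeding the center, are equally possible.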

The deep adversarial subspace clustering (DASC) network consists of two parts, a subspace generator and a discriminator. In the generator, a convolutional autoencoder learns the sample representation; a self-expression layer embedded between the encoder and the decoder produces the similarity matrix of the samples, which is clustered with the Ncut method, and "real samples" and "fake samples" are then generated conditioned on the resulting clusters. The discriminator receives the generated real and fake samples and distinguishes between them. Training the DASC network finally yields the coefficient matrix C, and the clustering result is obtained through spectral clustering.

Deep adversarial subspace clustering gives the final clustering result. From the clustering result, the average of the patients in each class is computed as the cluster center, the distance from each patient to the center of its class is then calculated, and the patients in each class that are closest to the cluster center are selected to form a patient subset, which is the result of the patient clustering.

S14. According to the classification prediction result, perform an abnormality determination on each cluster of the medical data to obtain a cluster abnormality result.

Specifically, performing the abnormality determination on each cluster of the medical data according to the classification prediction result to obtain the cluster abnormality result includes:

For example, for the examination cluster, if more than 2 pieces of processed data in the classification prediction result are determined to be abnormal data, the examination cluster is considered an abnormal cluster. Conversely, if only 1 piece of processed data in the classification prediction result is determined to be abnormal data, the examination cluster is not considered abnormal, that is, it is a normal cluster.
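The per-cluster threshold check can be sketched as follows. The data layout, a list of `(cluster_name, is_abnormal)` pairs plus a dictionary of thresholds, is an assumption made for the example.

```python
from collections import Counter


def judge_clusters(predictions, thresholds):
    """Mark a cluster abnormal when its abnormal count exceeds its threshold.

    predictions: iterable of (cluster_name, is_abnormal) pairs, one per
                 piece of processed data (hypothetical layout).
    thresholds:  dict mapping cluster_name to its abnormal count threshold.
    """
    abnormal_counts = Counter(name for name, is_abnormal in predictions if is_abnormal)
    # Clusters with no abnormal data have a count of 0 and are therefore normal.
    return {name: abnormal_counts[name] > limit for name, limit in thresholds.items()}
```

With a threshold of 2, a cluster with three abnormal items is flagged while a cluster with one abnormal item is not, matching the example in the text.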

S15. By means of the bootstrap aggregating (bagging) algorithm and according to the cluster abnormality result, perform an abnormality determination on the visit record matched by the medical data to obtain a visit record abnormality result.

Specifically, performing, by means of the bagging algorithm and according to the cluster abnormality result, the abnormality determination on the visit record matched by the medical data to obtain the visit record abnormality result includes:

Each piece of visit record data can include every cluster. If any cluster of a visit record is abnormal, the visit record can be regarded as an abnormal record and an abnormal flag such as "1" can be set. Conversely, if all clusters of a visit record are normal, the visit record can be regarded as a normal record and a normal flag such as "0" can be set.
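The flagging rule in the paragraph above reduces to an "any cluster abnormal" test per visit record. The sketch below shows only that aggregation rule, not a full bootstrap-aggregating procedure, and the nested-dictionary layout is a hypothetical choice for the example.

```python
def flag_visits(visit_cluster_results):
    """Return {visit_id: 1 or 0}, where 1 marks an abnormal visit record.

    visit_cluster_results: dict mapping visit_id to a dict of
    cluster_name -> bool (True when that cluster is abnormal for the visit).
    """
    return {
        visit_id: 1 if any(results.values()) else 0
        for visit_id, results in visit_cluster_results.items()
    }
```

For instance, a visit whose examination cluster is abnormal receives flag 1 even if its drug, patient, doctor, and institution clusters are all normal.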

As shown in the table below:

The method also includes:

The clustering weight of each cluster can be set in advance, for example clustering weight A for drugs, B for examinations, C for patients, D for doctors, and E for medical institutions. Different clusters have different clustering weights, and the weight mainly reflects the importance of the cluster: the more important the cluster, the higher its clustering weight. For example, the clustering weight of medical institutions is higher than that of drugs.

After the abnormal score of each piece of abnormal visit record data is calculated, the abnormality level can be set according to the score. Specifically, abnormal score thresholds can be set and the abnormal score compared against them to determine the abnormality level, which can be severe, moderate, or mild.
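One way to turn the clustering weights into a score and a level is sketched below. The source does not give the scoring formula or any numeric values, so the weighted sum over abnormal clusters, the weights, and the thresholds are all assumptions chosen for illustration.

```python
# Hypothetical weights; the source only states that institutions outweigh drugs.
CLUSTER_WEIGHTS = {"drug": 1.0, "examination": 1.5, "patient": 2.0, "doctor": 2.5, "institution": 3.0}
# (minimum score, level), checked from the most severe level down; values are illustrative.
LEVEL_THRESHOLDS = [(5.0, "severe"), (2.5, "moderate")]


def abnormal_score(cluster_results, weights=CLUSTER_WEIGHTS):
    """Sum the weights of the clusters that are abnormal for one visit record."""
    return sum(weights[name] for name, abnormal in cluster_results.items() if abnormal)


def abnormal_level(score, thresholds=LEVEL_THRESHOLDS):
    """Map an abnormal score to severe, moderate, or mild."""
    for minimum, level in thresholds:
        if score >= minimum:
            return level
    return "mild"
```

Under these assumed values, a record with abnormal drug and institution clusters scores 4.0 and is rated moderate.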

Optionally, the method further includes:

If the abnormality level is higher than the preset abnormality level threshold, the abnormal visit record data is seriously abnormal, which to some extent suggests that the data may have been deliberately falsified, and the user to whom the abnormal visit record data belongs may be a high-risk user.

In this way, high-risk individuals who commit insurance fraud can be identified, which benefits the management of medical insurance risk control.

Optionally, the method further includes:

Upload the visit record abnormality result to the blockchain.

To ensure the privacy and security of the data, the visit record abnormality result needs to be uploaded to the blockchain for storage.

In the method flow described in FIG. 1, classifiers of multiple dimensions can be used to classify and predict the medical data and identify drug abnormalities, examination abnormalities, patient abnormalities, doctor abnormalities, medical institution abnormalities, and so on, and finally to judge whether the visit record is normal; if it is not, the record is marked as an abnormal record. Applying the multi-dimensional abnormality identification model to medical insurance risk control allows abnormalities in medical data to be identified accurately without relying heavily on medical knowledge or a medical background, which helps identify high-risk users who commit insurance fraud.

The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Those of ordinary skill in the art can make improvements without departing from the inventive concept of the present invention, and these improvements all fall within the protection scope of the present invention.

Please refer to FIG. 2, which is a functional block diagram of a preferred embodiment of an abnormal data identification device disclosed in the present invention.

In some embodiments, the abnormal data identification device runs in an electronic device. The abnormal data identification device may include a plurality of functional modules composed of program code segments. The program code of each program segment in the abnormal data identification device may be stored in a memory and executed by at least one processor to perform some or all of the steps of the artificial intelligence-based abnormal data identification method described in FIG. 1. For details, refer to the description of FIG. 1, which is not repeated here.

In this embodiment, the abnormal data identification device may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include a receiving module 201, a processing module 202, an input module 203, a determination module 204 and a matching module 205. A module, as referred to in the present invention, is a series of computer program segments that are stored in a memory, can be executed by at least one processor and perform a fixed function. The functions of each module are described in detail in this embodiment.

The receiving module 201 is configured to receive input medical data.

The processing module 202 is configured to perform medical processing on the medical data to obtain processed data.

The input module 203 is configured to input the processed data into a pre-trained combined classifier model to obtain a classification prediction result, wherein the combined classifier model is trained on multiple dimensions.

The determination module 204 is configured to perform abnormality determination on each cluster of the medical data according to the classification prediction result to obtain a cluster abnormality result.

The matching module 205 is configured to perform, by means of the bootstrap aggregating (bagging) algorithm and according to the cluster abnormality result, abnormality determination on the medical treatment record matched by the medical data to obtain a medical treatment record abnormality result.
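The work of the determination module 204 and the matching module 205 can be illustrated with a minimal Python sketch. It assumes that the classification prediction result is available as (cluster, is_abnormal) pairs, that each cluster has its own abnormal-count threshold, and that a medical treatment record is flagged when at least one of its matched clusters is abnormal. The function names, the data layout and the "any abnormal cluster" rule are hypothetical stand-ins chosen for illustration; in particular, the rule only approximates the record-level aggregation and is not the bagging procedure itself.

```python
from collections import defaultdict


def abnormal_clusters(predictions, thresholds):
    """Decide, per cluster, whether it is abnormal.

    predictions: iterable of (cluster_id, is_abnormal) pairs (assumed layout).
    thresholds:  dict mapping cluster_id to its abnormal-count threshold.
    A cluster is abnormal when its count of abnormal items exceeds the threshold.
    """
    counts = defaultdict(int)
    seen = set()
    for cluster_id, is_abnormal in predictions:
        seen.add(cluster_id)
        if is_abnormal:
            counts[cluster_id] += 1
    return {c: counts[c] > thresholds[c] for c in seen}


def flag_records(record_clusters, cluster_flags):
    """Flag a record when any cluster matched by the record is abnormal.

    record_clusters: dict mapping record_id to the cluster ids it matches.
    This "any" rule is an illustrative assumption, not the bagging step.
    """
    return {
        record_id: any(cluster_flags.get(c, False) for c in clusters)
        for record_id, clusters in record_clusters.items()
    }


# Hypothetical example data: two clusters, two medical treatment records.
predictions = [
    ("drug", True), ("drug", True), ("drug", False),
    ("exam", True), ("exam", False),
]
thresholds = {"drug": 1, "exam": 1}
cluster_flags = abnormal_clusters(predictions, thresholds)
record_flags = flag_records({"visit-1": ["drug"], "visit-2": ["exam"]}, cluster_flags)
```

Here the "drug" cluster has two abnormal items against a threshold of one and is therefore abnormal, so only "visit-1" is flagged.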

As shown in the table below:

Optionally, the abnormal data identification device further includes:

a determining module, configured to determine abnormal medical treatment record data from a plurality of pieces of medical treatment record data according to the medical treatment record abnormality result;

a calculation module, configured to calculate, for the abnormal medical treatment record data, an abnormality score of each piece of abnormal medical treatment record data according to preset cluster weights;

the determining module being further configured to determine, according to the abnormality score, an abnormality level of each piece of abnormal medical treatment record data.

Optionally, the determination module is further configured to determine whether the abnormality level is higher than a preset abnormality level threshold;

the determining module is further configured to determine, if the abnormality level is higher than the preset abnormality level threshold, that the user to whom the abnormal medical treatment record data belongs is a high-risk user.
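The scoring steps above can be sketched as follows. The text only states that the abnormality score is calculated from preset cluster weights and then mapped to an abnormality level, so the weighted sum, the integer weights, the level bounds and the function names below are all assumptions made for illustration.

```python
def abnormality_score(abnormal_cluster_ids, cluster_weights):
    """Sum the preset weights of the clusters found abnormal for one record."""
    return sum(cluster_weights[c] for c in abnormal_cluster_ids)


def abnormality_level(score, level_bounds):
    """Map a score to a level: the number of lower bounds the score reaches."""
    return sum(1 for bound in level_bounds if score >= bound)


def is_high_risk(level, level_threshold):
    """A user is high-risk when the level is higher than the preset threshold."""
    return level > level_threshold


# Hypothetical weights (integers, summing to 100) and level bounds.
weights = {"drug": 40, "exam": 20, "patient": 10, "doctor": 10, "institution": 20}
score = abnormality_score(["drug", "institution"], weights)
level = abnormality_level(score, [20, 50, 80])
high_risk = is_high_risk(level, 1)
```

With these example values the record scores 60, which reaches the first two bounds and gives level 2; against a level threshold of 1 the user is treated as high-risk.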

an acquisition module, configured to acquire a first test set, a second test set and a third test set;

a training module, configured to train on the first test set based on a Gaussian mixture model (GMM) to obtain a first classifier;

the training module being further configured to train on the second test set based on KL divergence to obtain a second classifier;

the training module being further configured to train on the third test set based on deep adversarial clustering (DASC) to obtain a third classifier;

a combination module, configured to combine the first classifier, the second classifier and the third classifier to obtain the combined classifier model.
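As a rough illustration of a KL-divergence-based classifier and of combining several classifiers, the sketch below flags a discrete distribution (for example, the share of spending per drug category) when its divergence D_KL(P || Q) = sum_i p_i * log(p_i / q_i) from a reference distribution exceeds a threshold, and combines classifiers by majority vote. The text does not specify how the second classifier uses KL divergence or how the three classifiers are combined, so the thresholding, the majority vote and the two constant stand-in classifiers (used in place of the GMM and DASC classifiers) are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps
    to avoid division by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def make_kl_classifier(reference, threshold):
    """Return a classifier that flags distributions far from the reference."""
    def classify(distribution):
        return kl_divergence(distribution, reference) > threshold
    return classify


def combine(classifiers):
    """Combine classifiers by majority vote (an illustrative choice)."""
    def classify(sample):
        votes = sum(1 for clf in classifiers if clf(sample))
        return votes * 2 > len(classifiers)
    return classify


# Hypothetical reference distribution and threshold.
kl_clf = make_kl_classifier(reference=[0.5, 0.5], threshold=0.1)

# Constant stand-ins for the first (GMM) and third (DASC) classifiers.
combined = combine([kl_clf, lambda sample: True, lambda sample: False])

skewed_is_abnormal = combined([0.9, 0.1])
balanced_is_abnormal = combined([0.5, 0.5])
```

Because the two stand-ins always disagree, the combined vote in this example follows the KL-based classifier: the skewed distribution is flagged and the balanced one is not.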

In the abnormal data identification device described in FIG. 2, the medical data can be classified and predicted by classifiers of multiple dimensions to identify drug abnormalities, examination abnormalities, patient abnormalities, doctor abnormalities, medical institution abnormalities and the like, and finally to determine whether the medical treatment record is normal; if it is not, the record is marked as an abnormal record. Applying the multi-dimensional abnormality identification model to medical insurance risk control allows abnormalities in medical data to be identified accurately without relying heavily on medical knowledge or medical background, which helps identify high-risk insurance fraud users.

As shown in FIG. 3, FIG. 3 is a schematic structural diagram of an electronic device of a preferred embodiment of the present invention that implements the artificial intelligence-based abnormal data identification method. The electronic device 3 includes a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and executable on the at least one processor 32, and at least one communication bus 34.

Those skilled in the art will understand that the schematic diagram in FIG. 3 is only an example of the electronic device 3 and does not limit it. The electronic device 3 may include more or fewer components than shown, combine certain components, or use different components; for example, it may further include input/output devices, network access devices and the like.

The at least one processor 32 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 32 may be a microprocessor or any conventional processor. The processor 32 is the control center of the electronic device 3 and connects the parts of the entire electronic device 3 through various interfaces and lines.

The memory 31 may be used to store the computer program 33 and/or the modules/units. The processor 32 implements the various functions of the electronic device 3 by running or executing the computer program and/or the modules/units stored in the memory 31 and by calling the data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the electronic device 3 (such as audio data). In addition, the memory 31 may include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

With reference to FIG. 1, the memory 31 in the electronic device 3 stores a plurality of instructions for implementing an artificial intelligence-based abnormal data identification method, and the processor 32 can execute the plurality of instructions to:

receive input medical data;

perform, by means of the bootstrap aggregating (bagging) algorithm and according to the cluster abnormality result, abnormality determination on the medical treatment record matched by the medical data to obtain a medical treatment record abnormality result.

Specifically, for how the processor 32 implements the above instructions, refer to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.

In the electronic device 3 described in FIG. 3, the medical data can be classified and predicted by classifiers of multiple dimensions to identify drug abnormalities, examination abnormalities, patient abnormalities, doctor abnormalities, medical institution abnormalities and the like, and finally to determine whether the medical treatment record is normal; if it is not, the record is marked as an abnormal record. Applying the multi-dimensional abnormality identification model to medical insurance risk control allows abnormalities in medical data to be identified accurately without relying heavily on medical knowledge or medical background, which helps identify high-risk insurance fraud users.

If the modules/units integrated in the electronic device 3 are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, all or part of the processes of the methods in the above embodiments may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, and a read-only memory (ROM).

In the several embodiments provided by the present invention, it should be understood that the disclosed system, device and method may be implemented in other ways. For example, the device embodiments described above are only illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware plus software functional modules.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in the present invention. No reference sign in the claims shall be construed as limiting the claim concerned. The multiple units or devices recited in the system claims may also be implemented by software or hardware.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or replaced with equivalents without departing from the spirit and scope of those technical solutions.

Claims

1. An artificial intelligence-based abnormal data identification method, characterized in that the method comprises:

receiving input medical data;

performing medical processing on the medical data to obtain processed data;

inputting the processed data into a pre-trained combined classifier model to obtain a classification prediction result, wherein the combined classifier model is trained on multiple dimensions;

performing abnormality determination on each cluster of the medical data according to the classification prediction result to obtain a cluster abnormality result;

performing, by means of the bootstrap aggregating (bagging) algorithm and according to the cluster abnormality result, abnormality determination on the medical treatment record matched by the medical data to obtain a medical treatment record abnormality result.

2. The artificial intelligence-based abnormal data identification method according to claim 1, characterized in that performing medical processing on the medical data to obtain the processed data comprises:

performing standard conversion on non-standard data in the medical data to obtain first data;

correcting erroneous data in the medical data to obtain second data;

converting age data in the medical data according to age to obtain third data;

determining the first data, the second data, the third data and the unprocessed data in the medical data as the processed data.

3. The artificial intelligence-based abnormal data identification method according to claim 1, characterized in that performing abnormality determination on each cluster of the medical data according to the classification prediction result to obtain the cluster abnormality result comprises:

for each cluster, obtaining an abnormal number threshold of the cluster;

determining whether the number of pieces of processed data of the cluster that are abnormal data in the classification prediction result exceeds the abnormal number threshold;

if the number of pieces of processed data of the cluster that are abnormal data in the classification prediction result exceeds the abnormal number threshold, determining that the cluster is an abnormal cluster; or

if the number of pieces of processed data of the cluster that are abnormal data in the classification prediction result does not exceed the abnormal number threshold, determining that the cluster is a normal cluster.

4. The artificial intelligence-based abnormal data identification method according to claim 1, characterized in that performing, by means of the bootstrap aggregating (bagging) algorithm and according to the cluster abnormality result, abnormality determination on the medical treatment record matched by the medical data to obtain the medical treatment record abnormality result comprises:

dividing the medical data by medical treatment record to obtain a plurality of pieces of medical treatment record data;

for each piece of medical treatment record data, determining, by means of the bagging algorithm, first medical treatment record data that has a cluster abnormality and second medical treatment record data that has no cluster abnormality;

setting an abnormality flag for the first medical treatment record data and for the second medical treatment record data respectively.

5. The artificial intelligence-based abnormal data identification method according to any one of claims 1 to 4, characterized in that, before receiving the input medical data, the method further comprises:

acquiring a first test set, a second test set and a third test set;

training on the first test set based on a Gaussian mixture model (GMM) to obtain a first classifier;

training on the second test set based on KL divergence to obtain a second classifier;

training on the third test set based on deep adversarial clustering (DASC) to obtain a third classifier;

combining the first classifier, the second classifier and the third classifier to obtain the combined classifier model.

6. The artificial intelligence-based abnormal data identification method according to any one of claims 1 to 4, characterized in that the method further comprises:

determining abnormal medical treatment record data from a plurality of pieces of medical treatment record data according to the medical treatment record abnormality result;

calculating, for the abnormal medical treatment record data, an abnormality score of each piece of abnormal medical treatment record data according to preset cluster weights;

determining an abnormality level of each piece of abnormal medical treatment record data according to the abnormality score.

7. The artificial intelligence-based abnormal data identification method according to any one of claims 1 to 4, characterized in that the method further comprises:

determining whether the abnormality level is higher than a preset abnormality level threshold;

if the abnormality level is higher than the preset abnormality level threshold, determining that the user to whom the abnormal medical treatment record data belongs is a high-risk user.

8. An abnormal data identification device, characterized in that the abnormal data identification device comprises:

a receiving module, configured to receive input medical data;

a processing module, configured to perform medical processing on the medical data to obtain processed data;

an input module, configured to input the processed data into a pre-trained combined classifier model to obtain a classification prediction result, wherein the combined classifier model is trained on multiple dimensions;

a determination module, configured to perform abnormality determination on each cluster of the medical data according to the classification prediction result to obtain a cluster abnormality result;

a matching module, configured to perform, by means of the bootstrap aggregating (bagging) algorithm and according to the cluster abnormality result, abnormality determination on the medical treatment record matched by the medical data to obtain a medical treatment record abnormality result;

an uploading module, configured to upload the medical treatment record abnormality result to a blockchain.

9. An electronic device, characterized in that the electronic device comprises a processor and a memory, and the processor is configured to execute a computer program stored in the memory to implement the artificial intelligence-based abnormal data identification method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one instruction which, when executed by a processor, implements the artificial intelligence-based abnormal data identification method according to any one of claims 1 to 7.