
CN111105317B - A medical insurance fraud detection method based on drug purchase records - Google Patents

Publication number
CN111105317B
Authority
CN
China
Prior art keywords
chain
drug
abnormal
patient
normal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911383476.7A
Other languages
Chinese (zh)
Other versions
CN111105317A (en
Inventor
孙佰清
鲍鑫
王天辰
高稳
王思霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology Shenzhen
Original Assignee
Harbin Institute of Technology Shenzhen
Application filed by Harbin Institute of Technology Shenzhen
Priority to CN201911383476.7A
Publication of CN111105317A
Application granted
Publication of CN111105317B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08: Insurance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/018: Certifying business or products
    • G06Q30/0185: Product, service or business identity fraud

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a medical insurance fraud detection method based on drug purchase records and belongs to the field of drug fraud detection methods. The method can accurately extract medical insurance fraud information, is convenient to operate, and is widely applicable. A fraudster classification model is constructed with a machine learning algorithm; patient information and drug purchase information are input into the model and a patient-drug bipartite graph is established; from the patient-drug bipartite graph, a drug single-mode projection graph is established to form drug chains; an association chain algorithm divides the drug chains into normal chains and abnormal chains; the similarity between the normal chains and the abnormal chains is calculated with the cosine similarity formula; the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0 are retained; in each combination, the products common to the abnormal chain and the normal chain are removed and the other drugs are retained; the remaining drugs are synthesized into a fraud chain, which is output. The invention is mainly used to detect the fraudulent behavior of fraudulent patients.

Description

A medical insurance fraud detection method based on drug purchase records

Technical Field

The invention belongs to the field of drug fraud detection methods, and in particular relates to a medical insurance fraud detection method.

Background Art

Although medical insurance fraud is uncommon, fraud incidents usually correspond to abnormal drug purchase records. Medical insurance fraud cases also tend to share the following characteristics:

(1) Uncommon: fraud incidents are rare but costly, so the numbers of normal patients and fraudsters are extremely unbalanced.

(2) Knowledge sharing: fraudsters are often influenced by their allies and contacts, and in turn influence others. Fraud knowledge is transferred through, and shows up in, drug purchasing behavior patterns.

(3) Behavioral imitation: fraudulent patients also imitate the drug purchasing behavior of normal insured persons to conceal their fraudulent goals, trying to make their own purchasing behavior look "normal".

Therefore, there is a need for a medical insurance fraud detection method based on drug purchase records that can accurately extract medical insurance fraud information, is easy to operate, and is widely applicable.

Summary of the Invention

Existing medical insurance fraud patterns are diverse, fraud information cannot be determined accurately, and manual extraction of fraud information is cumbersome. To address these problems, the present invention provides a medical insurance fraud detection method based on drug purchase records that can accurately extract medical insurance fraud information, is easy to operate, and is widely applicable.

The technical solution of the medical insurance fraud detection method based on drug purchase records is as follows:

The medical insurance fraud detection method based on drug purchase records comprises the following steps:

Step S1: construct a fraudster classification model with a machine learning algorithm;

Step S2: input patient information and drug purchase information into the model and establish a patient-drug bipartite graph, where the patient information covers normal patients and fraudulent patients;

Step S3: according to the patient-drug bipartite graph, establish the drug single-mode projection relationship to form drug chains;

Step S4: use the association chain algorithm to divide the drug chains of step S3 into normal chains and abnormal chains;

Step S5: calculate the similarity between the normal chains and the abnormal chains with the cosine similarity formula;

Step S6: remove the normal chains whose similarity is 0, and retain the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0;

Step S7: in each combination, remove the products common to the abnormal chain and the normal chain, and keep the other drugs;

Step S8: synthesize the remaining drugs into a fraud chain and output the fraud chain.

Further, in step S1, the patient information is integrated and a machine learning algorithm extracts feature vectors from it; the supervised screening algorithm smbinning calculates the information value (IV) of each feature, and the features whose IV exceeds a threshold are fed into the machine learning algorithm to obtain the fraudster classification model.

Further, in step S2, the drug purchase information of normal patients and of fraudulent patients is represented as patient nodes and drug nodes, and a drug-patient undirected bipartite graph is constructed for the fraudulent patients and another for the normal patients. A first round of derived feature extraction is performed on the patient-drug bipartite graphs; the first-round features are the total number of drug types used and the total amount of drugs used. The drug single-mode projection relationship is then established from these derived features.

Further, in step S3, a second round of derived feature extraction is performed on the abnormal chains; the second-round features are the type abnormal rate, the quantity abnormal rate, and the abnormal drug usage rate in the abnormal chain.

Further, in step S4, the association chain algorithm is as follows: sort the matrix corresponding to the bipartite graph by edge weight; take the drug combination with the highest edge weight as the starting drug combination of the abnormal chain; then retrieve the drugs connected to the drugs of that combination by the next-highest edge weight, and continue retrieving in this way, linking the drugs into a chain. The input is the edge-weight adjacency matrix and the output is a chain.

Further, in step S5, the cosine similarity formula is

Figure SMS_1

where a, b, and c are each a normal chain or an abnormal chain.

Further, in step S8, a third round of derived feature extraction is performed on the synthesized fraud chains; the third-round features are the type abnormal rate, the quantity abnormal rate, and the abnormal drug usage rate in the abnormal chain.

The beneficial effects of the medical insurance fraud detection method based on drug purchase records are as follows:

The method uses the bipartite graph and the single-mode projection relationship derived from it, and applies the association chain algorithm to extract the transfer of fraud patterns and the hidden drug purchase targets. In terms of business logic it is fast and accurate, and it is easy to apply. The extracted fraud chains can also help regulators establish rules that deter fraudulent activity and prevent malicious fraud by fraudulent patients. The method analyzes the drug purchase records in medical insurance data and uses graph theory algorithms to construct effective derived features; it judges fraud with high accuracy and can effectively detect changing medical insurance fraud patterns.

Brief Description of the Drawings

FIG. 1 is a flow chart of the medical insurance fraud detection method based on drug purchase records of the present invention;

FIG. 2 is a flow chart of the medical insurance fraud detection method based on drug purchase records described in Example 2;

FIG. 3 is the drug bipartite graph of the normal patients in Example 2;

FIG. 4 is the drug bipartite graph of the fraudulent patients in Example 2.

Detailed Description of the Embodiments

The technical solution of the present invention is further described below with reference to the embodiments, but is not limited to them. Any modification or equivalent replacement of the technical solution that does not depart from its spirit and scope falls within the protection scope of the present invention.

Example 1

This embodiment is described with reference to FIG. 1. The medical insurance fraud detection method based on drug purchase records of this embodiment comprises the following steps:

Step S1: construct a fraudster classification model with a machine learning algorithm. The patient information is integrated and a machine learning algorithm extracts feature vectors from it; the supervised screening algorithm smbinning calculates the information value (IV) of each feature, and the features whose IV exceeds a threshold are fed into the machine learning algorithm to obtain the fraudster classification model. In detail, the fraud detection model is built as follows. The feature vector of each patient is integrated with the patient's fraud label Y as

Figure SMS_2

The supervised feature screening algorithm smbinning is applied to these features to calculate the IV of each one, and the features with IV greater than 0.05 are fed into the machine learning algorithm to obtain an effective classification model for fraudsters.
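The IV screening in step S1 can be illustrated with a minimal sketch. It does not call smbinning (an R package); it assumes each feature has already been binned into per-bin counts of normal and fraudulent patients, and it uses the common definition of information value. The names `information_value`, `select_features`, and the sample counts are hypothetical.

```python
import math

def information_value(bins):
    """IV of one feature; bins is a list of (n_normal, n_fraud) counts per bin."""
    total_normal = sum(n for n, _ in bins)
    total_fraud = sum(f for _, f in bins)
    iv = 0.0
    for n, f in bins:
        # Assumption: bins with an empty class are skipped to avoid log(0).
        if n == 0 or f == 0:
            continue
        p_normal = n / total_normal
        p_fraud = f / total_fraud
        iv += (p_fraud - p_normal) * math.log(p_fraud / p_normal)
    return iv

def select_features(binned, threshold=0.05):
    """Keep the features whose IV is greater than the threshold (0.05 in Example 1)."""
    return [name for name, bins in binned.items()
            if information_value(bins) > threshold]

# Hypothetical binned counts for two features.
binned = {
    "total_drug_types": [(800, 10), (150, 30), (50, 60)],
    "visit_weekday": [(500, 50), (500, 50)],
}
selected = select_features(binned)
print(selected)
```

A feature whose bins have the same fraud share as the whole population contributes an IV of 0 and is dropped, which is why only the first hypothetical feature survives here.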

Step S2: input patient information and drug purchase information into the model and establish a patient-drug bipartite graph, where the patient information covers normal patients and fraudulent patients. The drug purchase information of normal patients and of fraudulent patients is represented as patient nodes and drug nodes, and a drug-patient undirected bipartite graph is constructed for the fraudulent patients and another for the normal patients. A first round of derived feature extraction is performed on the patient-drug bipartite graphs; the first-round features are the total number of drug types used and the total amount of drugs used. The drug single-mode projection relationship is then established from these derived features.

The training data is split into patients with fraudulent behavior and patients with normal behavior; each patient corresponds to multiple medication records. The medication records of the same patient are collated. Based on graph theory, patients and drugs are treated as two different types of node, and each medication record links a patient node (IDj) to a drug node (Dk). In this way a drug-patient undirected bipartite graph is constructed for the fraudulent patients and another for the normal patients. From these two graphs, the first round of features is extracted for each patient's drug purchasing behavior: (1) Figure SMS_3: the total number of drug types used; (2) Figure SMS_4: the total amount of drugs used.

The drug single-mode projection relationships corresponding to the drug-patient bipartite graph of the fraudulent patients and to that of the normal patients are derived separately. The association chain algorithm then represents the drug purchasing behavior of fraudulent patients as abnormal chains and that of normal patients as normal chains.

The single-mode projection algorithm of a bipartite graph is mainly used to study the relationships between nodes of the same type. It exploits the fact that nodes of one type in a bipartite graph are connected through nodes of the other type, and links nodes of the same type directly, producing a growing network that contains nodes of a single type. The single-mode projection relationship depends on the bipartite graph described above. First, the drug nodes (Dk) under study are added to the new network one by one. Taking each added drug node as the starting point, the other drug nodes connected to it through patients are searched in the bipartite graph in order of edge weight from low to high, and the drug nodes found are connected to the starting node. This process is repeated until all drug nodes have been connected, forming the new single-mode projection relationship. The two projection relationships are then processed with the association chain algorithm to obtain the abnormal chains and normal chains corresponding to patient behavior.

Step S3: according to the patient-drug bipartite graph, establish the drug single-mode projection relationship to form drug chains. A second round of derived feature extraction is performed on the abnormal chains; the second-round features are the type abnormal rate, the quantity abnormal rate, and the abnormal drug usage rate in the abnormal chain. Taking the abnormal chains corresponding to fraudulent patient behavior as the benchmark, the second round of patient behavior features is derived. For each abnormal chain, the following three features are extracted for each patient: (1) the ratio of the number of drug types the patient uses that also appear in the fraud chain to the total number of drug types the patient uses; (2) the ratio of the total amount of fraud-chain drugs the patient uses to the total amount of drugs the patient uses; (3) the ratio of the total amount of fraud-chain drugs the patient uses to the number of abnormal-chain drug types the patient uses.

Step S4: use the association chain algorithm to divide the drug chains of step S3 into normal chains and abnormal chains. The association chain algorithm is as follows: sort the matrix corresponding to the bipartite graph by edge weight; take the drug combination with the highest edge weight as the starting drug combination of the abnormal chain; then retrieve the drugs connected to the drugs of that combination by the next-highest edge weight, and continue retrieving in this way, linking the drugs into a chain. The input is the edge-weight adjacency matrix and the output is a chain.

Step S5: calculate the similarity between the normal chains and the abnormal chains with the cosine similarity formula. The cosine similarity formula is

Figure SMS_5

where a, b, and c are each a normal chain or an abnormal chain.

Step S6: remove the normal chains whose similarity is 0 and retain the comparison combinations of an abnormal chain and a normal chain whose similarity is not 0. The cosine similarity is calculated between the abnormal chain of each fraudulent patient behavior and the normal chain of each normal patient behavior, and the comparison combinations with similarity 0 are removed. The other combinations are retained; in each of them, the drugs on the abnormal chain that also appear on the normal chain are removed, and the remaining drugs on the abnormal chain are combined into a fraud chain.

Step S7: in each combination, remove the products common to the abnormal chain and the normal chain, and keep the other drugs.

Step S8: synthesize the remaining drugs into a fraud chain, and perform a third round of derived feature extraction on the synthesized fraud chains; the third-round features are the type abnormal rate, the quantity abnormal rate, and the abnormal drug usage rate in the abnormal chain. Output the fraud chain.

Taking the fraud chains as the benchmark, the third round of patient behavior features is derived. For each fraud chain, the following three features are extracted for each patient: (1) the ratio of the number of drug types the patient uses that also appear in the fraud chain to the total number of drug types the patient uses; (2) the ratio of the total amount of fraud-chain drugs the patient uses to the total amount of drugs the patient uses; (3) the ratio of the total amount of fraud-chain drugs the patient uses to the number of fraud-chain drug types the patient uses.

Example 2

This embodiment is described with reference to FIG. 2, FIG. 3, FIG. 4, and Example 1. It applies the medical insurance fraud detection method based on drug purchase records to the medical insurance drug purchase data published by a certain city, and establishes a patient-drug bipartite graph for the fraudulent patients and another for the normal patients.

The training data is split into patients with fraudulent behavior and patients with normal behavior. Each patient corresponds to multiple medication records; in total there are 1,368,148 medication records for 15,000 people. The medication records of the same patient are collated. Based on graph theory, patients and drugs are treated as two different types of node, and each medication record links a patient node (IDj) to a drug node (Dk). In this way a drug-patient undirected bipartite graph is constructed for the fraudulent patients and another for the normal patients. In these graphs, patient nodes are connected to each other only through drug nodes, never directly, and drug nodes are likewise connected to each other only through patient nodes. Because the only relationship between a drug node and a patient node is a purchase, an undirected bipartite graph is sufficient to represent drug purchasing behavior. The weight of an edge in the undirected bipartite graph is the number of times the patient purchased the corresponding drug.
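The construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: each medication record is reduced to a hypothetical `(patient_id, drug_id)` pair, and the graph is stored as a nested dictionary whose values are the edge weights (purchase counts). The function name `build_bipartite` and the sample records are not from the source.

```python
from collections import defaultdict

def build_bipartite(records):
    """Build {patient_id: {drug_id: purchase_count}} from (patient_id, drug_id) records.

    Edges exist only between a patient and a drug, so the structure is bipartite
    by construction; the count is the edge weight.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for patient, drug in records:
        graph[patient][drug] += 1
    return {patient: dict(drugs) for patient, drugs in graph.items()}

# Hypothetical records; in the method, fraudulent and normal patients
# would each get their own graph.
records = [("ID1", "D1"), ("ID1", "D1"), ("ID1", "D2"),
           ("ID2", "D2"), ("ID2", "D3")]
graph = build_bipartite(records)
print(graph)
```

Calling the function once on the fraudulent patients' records and once on the normal patients' records would give the two separate graphs the text refers to.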

From the drug-patient undirected bipartite graph of the fraudulent patients and that of the normal patients, the first round of features is extracted for each patient's drug purchasing behavior: (1) Figure SMS_6: the total number of drug types used, where Figure SMS_7 denotes the number of drug nodes connected to patient node j; (2) Figure SMS_8: the total amount of drugs used, where Figure SMS_9 denotes the sum of the edge weights between patient node j and the drug nodes connected to it.
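As a small illustration of the two first-round features, the sketch below assumes the bipartite graph is held as a hypothetical nested dictionary `{patient_id: {drug_id: purchase_count}}`. The feature names `n_types` and `total_amount` are placeholders for the symbols shown in the formula images.

```python
def first_round_features(graph):
    """Per patient: number of connected drug nodes and sum of the edge weights."""
    return {
        patient: {
            "n_types": len(drugs),                 # number of connected drug nodes
            "total_amount": sum(drugs.values()),   # sum of edge weights
        }
        for patient, drugs in graph.items()
    }

# Hypothetical graph: ID1 bought D1 twice and D2 once, and so on.
graph = {"ID1": {"D1": 2, "D2": 1}, "ID2": {"D2": 1, "D3": 1}}
features = first_round_features(graph)
print(features)
```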

The drug single-mode projection relationships corresponding to the drug-patient bipartite graph of the fraudulent patients and to that of the normal patients are derived separately. The association chain algorithm then represents the drug purchasing behavior of fraudulent patients as abnormal chains and that of normal patients as normal chains.

The single-mode projection algorithm of a bipartite graph is mainly used to study the relationships between nodes of the same type. It exploits the fact that nodes of one type in a bipartite graph are connected through nodes of the other type, and links nodes of the same type directly, producing a growing network that contains nodes of a single type. The single-mode projection relationship depends on the bipartite graph described above. First, the drug nodes (Dk) under study are added to the new network one by one. Taking each added drug node as the starting point, the other drug nodes connected to it through patients are searched in the bipartite graph in order of edge weight from low to high, and the drug nodes found are connected to the starting node. This process is repeated until all drug nodes have been connected, forming the new single-mode projection relationship. Finally, the new single-mode projection relationship is represented as a matrix. The weight of an edge between two drug nodes is the number of patients who use both drugs. The matrix has the following form:

Figure SMS_10

where m is the number of drugs, and the number of patients who use both drug Figure SMS_11 and drug Figure SMS_12 is Figure SMS_13.
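The projection matrix can be sketched as follows. The sketch assumes only the end result stated in the text, namely that the weight between two drugs is the number of patients who use both; it counts co-use directly rather than reproducing the node-by-node search order, and it sets the diagonal to 0 because the matrix image is not available. The names `project_drugs` and `to_matrix` and the sample graph are hypothetical.

```python
from collections import Counter
from itertools import combinations

def project_drugs(graph):
    """Return {(drug_a, drug_b): number of patients who use both drugs}."""
    weights = Counter()
    for drugs in graph.values():
        # Every unordered pair of drugs bought by the same patient gains one co-user.
        for a, b in combinations(sorted(drugs), 2):
            weights[(a, b)] += 1
    return dict(weights)

def to_matrix(weights, drugs):
    """Symmetric m x m adjacency matrix over the given drug order (diagonal left at 0)."""
    index = {d: i for i, d in enumerate(drugs)}
    matrix = [[0] * len(drugs) for _ in drugs]
    for (a, b), w in weights.items():
        matrix[index[a]][index[b]] = w
        matrix[index[b]][index[a]] = w
    return matrix

# Hypothetical bipartite graph: {patient_id: {drug_id: purchase_count}}.
graph = {
    "ID1": {"D1": 2, "D2": 1},
    "ID2": {"D2": 1, "D3": 1},
    "ID3": {"D1": 1, "D2": 3, "D3": 1},
}
weights = project_drugs(graph)
matrix = to_matrix(weights, ["D1", "D2", "D3"])
print(matrix)
```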

In this way, the drug single-mode projection relationships corresponding to the drug-patient bipartite graph of the fraudulent patients and to that of the normal patients are obtained. The two projection relationships are processed with the association chain algorithm to obtain the abnormal chains and normal chains corresponding to patient behavior.

Take the processing of the fraudulent patients' drug-patient bipartite graph into abnormal chains as an example. First, the matrix corresponding to the graph is sorted by edge weight. The drug combination with the highest edge weight becomes the starting drug combination of an abnormal chain. For each drug in this combination, the drug connected to it by its next-highest edge weight is retrieved. If such drugs exist, they are attached to the corresponding sides of the starting combination, and the search continues from the two ends of the chain until no further drug can be retrieved for either end without repetition. These steps are repeated until all drugs in the single-mode projection relationship have been traversed. The result is a set of abnormal chains that carry the fraudulent patients' information and contain no repeated drugs.

The drug-patient single-mode projection relationship of the normal patients is processed in the same way, yielding a set of normal chains that carry the normal patients' information and contain no repeated drugs.
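The chain extraction can be sketched as a greedy procedure. The text leaves several details open, so this is one possible reading rather than the patented algorithm: a chain starts at the heaviest edge whose two drugs are both unused, each end is extended by its heaviest edge to an unused drug, and the process restarts on the remaining edges. Ties are broken alphabetically, and drugs with no unused partner are left out; both choices are assumptions. The name `build_chains` and the sample weights are hypothetical.

```python
from collections import defaultdict

def build_chains(weights):
    """weights: {(drug_a, drug_b): edge_weight} of the single-mode projection."""
    neighbors = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbors[a][b] = w
        neighbors[b][a] = w

    used, chains = set(), []
    # Visit edges from the highest weight down (ties broken by drug names).
    ordered = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), _ in ordered:
        if a in used or b in used:
            continue
        chain = [a, b]
        used.update(chain)
        extended = True
        while extended:
            extended = False
            for side in (0, -1):  # left end, then right end
                end = chain[side]
                candidates = [(-w, d) for d, w in neighbors[end].items()
                              if d not in used]
                if not candidates:
                    continue
                _, best = min(candidates)  # heaviest remaining edge at this end
                if side == 0:
                    chain.insert(0, best)
                else:
                    chain.append(best)
                used.add(best)
                extended = True
        chains.append(chain)
    return chains

# Hypothetical projection weights.
weights = {("D1", "D2"): 5, ("D2", "D3"): 3, ("D1", "D4"): 2,
           ("D3", "D4"): 1, ("D5", "D6"): 4}
chains = build_chains(weights)
print(chains)
```

Because a drug is marked as used as soon as it joins a chain, no chain contains a repeated drug, which matches the property the text requires of the result.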

The second round of patient behavior features is derived from the abnormal chains corresponding to fraudulent patient behavior. For each abnormal chain, the following three features are extracted for each patient: (1) the ratio of the number of drug types used by the patient that also appear in the chain to the total number of drug types used by the patient; (2) the ratio of the total quantity of chain drugs used by the patient to the total quantity of drugs used by the patient; (3) the ratio of the total quantity of chain drugs used by the patient to the number of chain drug types the patient uses. If the number of abnormal chains obtained is
Figure SMS_14
, each patient obtains
Figure SMS_15
derived features, labeled
Figure SMS_16
.
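The three ratios above can be sketched for a single patient and a single chain as follows. This is a minimal illustration, not the patent's implementation: the function name `chain_features`, the representation of a patient's purchases as a `drug -> quantity` dictionary, and the choice to return 0.0 when a denominator is zero are all assumptions.

```python
def chain_features(purchases, chain):
    """Return the three per-chain ratios for one patient.

    purchases: dict mapping drug name -> quantity bought (hypothetical format).
    chain:     iterable of drug names forming one abnormal (or fraud) chain.
    """
    chain_set = set(chain)
    total_types = len(purchases)              # number of distinct drugs the patient uses
    total_qty = sum(purchases.values())       # total quantity the patient uses
    in_chain = {d: q for d, q in purchases.items() if d in chain_set}
    chain_types = len(in_chain)               # distinct chain drugs the patient uses
    chain_qty = sum(in_chain.values())        # quantity of chain drugs the patient uses

    # Assumption: a zero denominator yields 0.0 instead of raising an error.
    type_ratio = chain_types / total_types if total_types else 0.0   # feature (1)
    qty_ratio = chain_qty / total_qty if total_qty else 0.0          # feature (2)
    usage = chain_qty / chain_types if chain_types else 0.0          # feature (3)
    return type_ratio, qty_ratio, usage


# Hypothetical patient who bought four drugs, checked against the chain b-c-e-f.
patient = {"a": 2, "b": 1, "e": 4, "f": 3}
features = chain_features(patient, ["b", "c", "e", "f"])
```

Applying the function to every chain and concatenating the results gives three features per chain for each patient.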

The cosine similarity between each abnormal chain (from fraudulent patient behavior) and each normal chain (from normal patient behavior) is calculated, and the comparison combinations whose similarity is 0 are discarded. The other combinations are kept. In each kept combination, the drugs that the abnormal chain shares with the normal chain are removed, and the drugs remaining on the abnormal chain are combined into a fraud chain. This operation is applied to every pair of abnormal and normal chains with non-zero similarity to obtain the corresponding fraud chains.

The third round of patient behavior features is derived from the fraud chains. For each fraud chain, the following three features are extracted for each patient: (1) the ratio of the number of drug types used by the patient that also appear in the fraud chain to the total number of drug types used by the patient; (2) the ratio of the total quantity of fraud-chain drugs used by the patient to the total quantity of drugs used by the patient; (3) the ratio of the total quantity of fraud-chain drugs used by the patient to the number of fraud-chain drug types the patient uses. If the number of fraud chains obtained is r, each patient obtains
Figure SMS_17
derived features, labeled
Figure SMS_18
.

The fraud detection model is then built with a machine learning algorithm.

In summary, the feature vectors of each patient are combined with the patient's fraud label Y into
Figure SMS_19
. The supervised feature screening algorithm smbinning is used to calculate the information value (IV) of each feature, and the features whose IV is greater than 0.05 are fed into the machine learning model. This embodiment takes logistic regression as an example. smbinning is a binning method available in R; here it is used to grade the features of the data set by information value and to discard features that carry too little information about the fraud label. First, ten-fold cross-validation is performed on the data set. For the training data, logistic regression defines a convex optimization objective over the classification coefficients, and gradient descent updates the coefficients iteratively. ROC and AUC are used to evaluate model performance. The results are compared in Table 1 below, and the classification coefficient vector t with the best relative performance is obtained, where
Figure SMS_20
.
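The IV-based screening step can be illustrated with a small sketch. It assumes the feature has already been binned (smbinning finds the bins itself in R, which is not reproduced here) and uses the common definition IV = Σ (g_i − b_i) · ln(g_i / b_i), where g_i and b_i are the shares of normal and fraudulent patients falling in bin i. The function name, the input format, and the decision to skip bins in which one class is absent are assumptions.

```python
import math


def information_value(bins, labels):
    """Information value of one pre-binned feature.

    bins[i]   : bin identifier of sample i (binning itself is assumed done).
    labels[i] : 1 for a fraudulent patient, 0 for a normal patient.
    """
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    if total_bad == 0 or total_good == 0:
        raise ValueError("both classes must be present")

    counts = {}  # bin -> [good count, bad count]
    for b, y in zip(bins, labels):
        entry = counts.setdefault(b, [0, 0])
        entry[y] += 1

    iv = 0.0
    for good, bad in counts.values():
        if good == 0 or bad == 0:
            continue  # assumption: bins with an empty class are skipped
        p_good = good / total_good
        p_bad = bad / total_bad
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv


def select_features(binned_features, labels, threshold=0.05):
    """Keep the names of features whose IV exceeds the threshold (0.05 in the text)."""
    return [name for name, bins in binned_features.items()
            if information_value(bins, labels) > threshold]
```

A feature whose bins contain the same proportion of fraudulent and normal patients has an IV of 0 and is dropped by `select_features`.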

Fold | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Training AUC | 0.86 | 0.86 | 0.82 | 0.85 | 0.85 | 0.86 | 0.86 | 0.86 | 0.86 | 0.85
Test AUC | 0.84 | 0.79 | 0.80 | 0.78 | 0.82 | 0.80 | 0.78 | 0.78 | 0.81 | 0.86

Table 1
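As a rough illustration of the evaluation metric, AUC can be computed as the probability that a randomly chosen fraudulent patient receives a higher score than a randomly chosen normal patient, with ties counted as one half. The sketch below uses this pairwise definition (an assumption about how the AUC was computed; the source does not say) and also averages the per-fold values of Table 1. The function and variable names are hypothetical.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (fraud, normal) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Per-fold values copied from Table 1.
train_auc = [0.86, 0.86, 0.82, 0.85, 0.85, 0.86, 0.86, 0.86, 0.86, 0.85]
test_auc = [0.84, 0.79, 0.80, 0.78, 0.82, 0.80, 0.78, 0.78, 0.81, 0.86]

mean_train = sum(train_auc) / len(train_auc)  # about 0.853
mean_test = sum(test_auc) / len(test_auc)     # about 0.806
```

The gap between the two means (about 0.05) gives a quick sense of how much the model's performance drops from training folds to held-out folds.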

According to the classification coefficient vector t, the fraud probability is

Figure SMS_21
;

Obtain

Figure SMS_22
The logistic function is then used to calculate each patient's fraud probability, which completes the determination of whether the patient is fraudulent. A new credit model can be built in many ways. To obtain a feasible credit scoring model, the attribute set of the new credit model in this embodiment is a subset of the feasible domain of the attribute set, and the algorithm used by the new credit scoring model is chosen according to the properties of that attribute set. Many algorithms can currently be used to generate credit scoring models, for example logistic regression, random forest, and GBDT. In this embodiment, algorithm screening also covers new algorithms obtained by fusing existing ones, and the preferred algorithm is selected according to the following strategy. Logistic regression studies the relationship between the probability of an event and several factors, and yields the probability that the event occurs. When the probability is greater than 0.5, the event is considered to occur; when it is less than 0.5, it is considered not to occur.
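The decision rule can be sketched as below. The exact expressions in the figures are not recoverable from this text, so the sketch assumes the standard logistic form p = 1 / (1 + e^(−z)) with z = bias + t · x. The function names, the optional bias term, and the example coefficients are hypothetical.

```python
import math


def fraud_probability(t, x, bias=0.0):
    """Logistic probability for coefficient vector t and feature vector x."""
    z = bias + sum(ti * xi for ti, xi in zip(t, x))
    # Numerically stable evaluation of 1 / (1 + exp(-z)) for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def is_fraud(t, x, bias=0.0, threshold=0.5):
    """Flag the patient as fraudulent when the probability exceeds 0.5."""
    return fraud_probability(t, x, bias) > threshold


# Hypothetical coefficients and features: z = 2.0 * 1.0 - 1.0 * 0.5 = 1.5
flagged = is_fraud([2.0, -1.0], [1.0, 0.5])
```

With a threshold of 0.5, the rule is equivalent to checking the sign of z, since the logistic function equals 0.5 exactly at z = 0.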

Association chain algorithm: the matrix corresponding to the bipartite graph is sorted by edge weight (the frequency with which a drug combination occurs). The drug combination with the highest edge weight becomes the starting drug combination of the chain, and the drugs connected to the drugs of that combination by the second-highest edge weights are then retrieved. The search proceeds in sequence and links the drugs into a chain. The input is the edge-weight adjacency matrix and the output is one chain. In the bipartite graphs shown in Figures 3 and 4, Jia, Yi, Bing, Ding, and Wu are normal patients, Ji, Geng, Xin, Ren, and Gui are abnormal patients, and a, b, c, d, e, and f are drugs.

Single-mode projection relationship

Figure SMS_23

Edge-weight adjacency matrix:

Head (start of the drug link) | Tail (end of the drug link) | Weight (frequency of the drug link)
a | b | 2
a | c | 1
a | d | 0
b | c | 4
b | d | 1
c | d | 1

The edge-weight matrix of the bipartite graph is sorted by edge weight. The drug combinations occur with the following frequencies: a-b 2, a-c 2, a-d 0, b-c 4, b-d 1, c-d 1. The search starts from the combination with the highest edge weight, so b-c is the starting drug combination of the chain. Next, the drugs connected by the second-highest edge weights are retrieved: a-b occurs twice and b-d once, so a-b is taken. Continuing in sequence and linking the drugs together gives a-b-c-d, a normal chain. In other words, the association chain algorithm outputs one normal chain ordered by edge weight.

Edge-weight adjacency matrix:

Head (start of the drug link) | Tail (end of the drug link) | Weight (frequency of the drug link)
b | c | 4
b | e | 1
b | f | 1
c | e | 3
c | f | 2
e | f | 2

The edge-weight matrix of the bipartite graph is sorted by edge weight. The drug combinations occur with the following frequencies: b-c 4, b-e 1, b-f 1, c-e 3, c-f 2, e-f 2. The search starts from the combination with the highest edge weight, so b-c is the starting drug combination of the abnormal chain. Next, the drugs connected by the second-highest edge weights are retrieved: c-f occurs 2 times and c-e 3 times, so c-e is taken. Continuing in sequence and linking the drugs together gives b-c-e-f, an abnormal chain. In other words, the association chain algorithm outputs one abnormal chain ordered by edge weight.
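The two worked examples can be reproduced with a short greedy sketch. It is one possible reading of the association chain algorithm, not a definitive implementation: at each step it attaches, to whichever end of the chain offers the highest positive edge weight, a drug that is not yet on the chain. The function name, the `(head, tail) -> weight` input format, the treatment of weight 0 as "no link", and the tie-breaking order (left end first, then alphabetical) are assumptions. The weights are copied from the two adjacency tables.

```python
def build_chain(edges):
    """Greedily build one drug chain from an edge-weight table.

    edges: dict mapping (head, tail) -> co-occurrence count.
    """
    positive = {pair: wt for pair, wt in edges.items() if wt > 0}
    if not positive:
        return []

    def weight(u, v):
        # The projection is undirected, so look the pair up in both orders.
        return positive.get((u, v), positive.get((v, u), 0))

    drugs = sorted({d for pair in positive for d in pair})
    head, tail = max(positive, key=positive.get)  # highest edge weight starts the chain
    chain = [head, tail]
    used = {head, tail}

    while True:
        best = None  # (weight, side, drug)
        for side, end in (("left", chain[0]), ("right", chain[-1])):
            for drug in drugs:
                if drug in used:
                    continue
                wt = weight(end, drug)
                if wt > 0 and (best is None or wt > best[0]):
                    best = (wt, side, drug)
        if best is None:
            break  # no unused drug is linked to either end
        _, side, drug = best
        if side == "left":
            chain.insert(0, drug)
        else:
            chain.append(drug)
        used.add(drug)
    return chain


normal_edges = {("a", "b"): 2, ("a", "c"): 1, ("a", "d"): 0,
                ("b", "c"): 4, ("b", "d"): 1, ("c", "d"): 1}
abnormal_edges = {("b", "c"): 4, ("b", "e"): 1, ("b", "f"): 1,
                  ("c", "e"): 3, ("c", "f"): 2, ("e", "f"): 2}

normal_chain = build_chain(normal_edges)      # ['a', 'b', 'c', 'd']
abnormal_chain = build_chain(abnormal_edges)  # ['b', 'c', 'e', 'f']
```

To obtain a set of chains rather than a single one, the same function could be called again on the edges whose drugs have not yet been placed on any chain.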

Because the cosine similarity of the two chains after vectorization is not 0, the drugs b and c that the abnormal chain shares with the normal chain are removed, which leaves the fraud chain e-f.
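The step from the abnormal chain b-c-e-f and the normal chain a-b-c-d to the fraud chain e-f can be sketched as follows. The source does not specify how a chain is vectorized, so the sketch assumes a binary indicator vector over the union of the drugs in both chains. The function names are hypothetical.

```python
import math


def cosine_similarity(chain_a, chain_b):
    """Cosine similarity of two chains encoded as binary drug-indicator vectors."""
    vocab = sorted(set(chain_a) | set(chain_b))
    set_a, set_b = set(chain_a), set(chain_b)
    va = [1 if d in set_a else 0 for d in vocab]
    vb = [1 if d in set_b else 0 for d in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def fraud_chain(abnormal, normal):
    """Drop the drugs shared with the normal chain; None when similarity is 0."""
    if cosine_similarity(abnormal, normal) == 0:
        return None  # pairs with zero similarity are discarded
    normal_set = set(normal)
    return [d for d in abnormal if d not in normal_set]


similarity = cosine_similarity(["b", "c", "e", "f"], ["a", "b", "c", "d"])  # 0.5
result = fraud_chain(["b", "c", "e", "f"], ["a", "b", "c", "d"])           # ['e', 'f']
```

Under the binary encoding, a similarity of 0 means the two chains share no drug at all, so there would be nothing to subtract from the abnormal chain.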

Definition of cosine similarity: cosine similarity evaluates how similar two vectors are by calculating the cosine of the angle between them. The vectors are placed in the vector space according to their coordinates, and the cosine of the angle between them measures the difference between the two individuals. The closer the cosine is to 1, the closer the angle is to 0 degrees and the more similar the two vectors are; the closer it is to 0, the less similar they are.

Formula:

Figure SMS_24
, where a, b, and c are normal chains and abnormal chains;

Here vector a is [x1, y1] and vector b is [x2, y2]; a is the vectorization of the normal chain and b is the vectorization of the abnormal chain. Vectors whose similarity is 0 are then removed.

Figure SMS_25
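The formula images are not recoverable from this text. For reference, the standard definition of cosine similarity, written for the two-dimensional vectors a = [x1, y1] and b = [x2, y2] named above, is given below; whether the figures use exactly this form is an assumption.

```latex
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}
```

For vectors with more components, the numerator and the sums under the square roots simply run over all coordinates.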

Claims (3)

1. A medical insurance fraud detection method based on drug purchase records, characterized in that it comprises the following steps:
Step S1: constructing a fraudster classification model with a machine learning algorithm;
Step S2: inputting patient information and drug purchase information into the model and establishing a patient-drug bipartite graph, the patient information covering normal patients and fraudulent patients;
Step S3: establishing a drug single-mode projection relationship from the patient-drug bipartite graph to form drug chains;
Step S4: using an association chain algorithm to divide the drug chains of step S3 into normal chains and abnormal chains;
Step S5: calculating the similarity of the normal chains and the abnormal chains with the cosine similarity formula;
Step S6: removing the normal chains whose similarity is 0 and retaining the comparison combinations of abnormal chains and normal chains whose similarity is not 0;
Step S7: removing the drugs that are the same in the abnormal chain and the normal chain of each combination and retaining the other drugs;
Step S8: combining the remaining drugs into a fraud chain and outputting the fraud chain;
in step S2, the drug purchase information of normal patients and the drug purchase information of fraudulent patients who have committed fraud are set as patient nodes and drug nodes, and an undirected drug-patient bipartite graph of the fraudulent patients and an undirected drug-patient bipartite graph of the normal patients are constructed respectively; a first round of derived feature extraction is performed on the patient-drug bipartite graph, the features extracted in the first round including the total number of drug types used and the total quantity of drugs used, and the drug single-mode projection relationship is established based on these derived features;
in step S3, a second round of derived feature extraction is performed on the abnormal chains, the features extracted in the second round including the type abnormality rate, the quantity abnormality rate, and the usage rate of abnormal drugs in the abnormal chain;
in step S4, the association chain algorithm is specifically: sorting the matrix corresponding to the bipartite graph by edge weight, starting from the drug combination with the highest edge weight as the starting drug combination of the abnormal chain, further retrieving the drugs connected to the drugs of the combination by the second-highest edge weights, retrieving in sequence and linking the drugs into a chain, with the edge-weight adjacency matrix as input and one chain as output;
in step S8, a third round of derived feature extraction is performed on the resulting fraud chain, the features extracted in the third round including the type abnormality rate, the quantity abnormality rate, and the usage rate of abnormal drugs in the abnormal chain.
2. The medical insurance fraud detection method based on drug purchase records according to claim 1, characterized in that in step S1, patient information is integrated, a machine learning algorithm is used to extract feature vectors from the patient information, the supervised screening algorithm smbinning is used to calculate the information value IV of each feature of the feature vectors, and the features whose IV is greater than a threshold are fed into the machine learning algorithm to obtain the fraudster classification model.
3. The medical insurance fraud detection method based on drug purchase records according to claim 2, characterized in that in step S5, the cosine similarity formula is
Figure QLYQS_1
; where a, b, and c are each a normal chain or an abnormal chain.
CN201911383476.7A 2019-12-28 2019-12-28 A medical insurance fraud detection method based on drug purchase records Active CN111105317B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911383476.7A CN111105317B (en) 2019-12-28 2019-12-28 A medical insurance fraud detection method based on drug purchase records

Publications (2)

Publication Number Publication Date
CN111105317A CN111105317A (en) 2020-05-05
CN111105317B true CN111105317B (en) 2023-05-12

Family

ID=70423707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911383476.7A Active CN111105317B (en) 2019-12-28 2019-12-28 A medical insurance fraud detection method based on drug purchase records

Country Status (1)

Country Link
CN (1) CN111105317B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991075A (en) * 2021-02-04 2021-06-18 浙江大学山东工业技术研究院 Package type medicine purchasing abnormity detection method based on FP-growth and graph network
CN113592517A (en) * 2021-08-09 2021-11-02 深圳前海微众银行股份有限公司 Method and device for identifying cheating passenger groups, terminal equipment and computer storage medium
CN114548207A (en) * 2021-12-21 2022-05-27 深圳云天励飞技术股份有限公司 Drug purchase data processing method and device, computer equipment, storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105894038A (en) * 2016-04-22 2016-08-24 天云融创数据科技(北京)有限公司 Credit card fraud prediction method based on signal transmission and link mode
CN107133437A (en) * 2017-03-03 2017-09-05 平安医疗健康管理股份有限公司 The method and device that monitoring medicine is used
CN108596770A (en) * 2017-12-29 2018-09-28 山大地纬软件股份有限公司 Medicare fraud detection device and method based on outlier analysis
CN109523400A (en) * 2018-10-27 2019-03-26 平安医疗健康管理股份有限公司 Medicine method of specifying error and terminal device are taken based on data processing
CN109545316A (en) * 2018-10-30 2019-03-29 平安科技(深圳)有限公司 Purchase the processing method and Related product of medicine data
CN109559236A (en) * 2018-10-27 2019-04-02 平安医疗健康管理股份有限公司 The method and apparatus of drug reimbursement Information abnormity
CN109615540A (en) * 2018-12-13 2019-04-12 平安医疗健康管理股份有限公司 Recognition methods, device, terminal and the computer readable storage medium of medicine are purchased in violation of rules and regulations
CN109615547A (en) * 2018-12-13 2019-04-12 平安医疗健康管理股份有限公司 Recognition methods, device, terminal and the computer readable storage medium of abnormal purchase medicine
CN109636635A (en) * 2018-12-13 2019-04-16 平安医疗健康管理股份有限公司 Recognition methods, device, terminal and the computer readable storage medium write a prescription in violation of rules and regulations
CN109636652A (en) * 2018-12-13 2019-04-16 平安医疗健康管理股份有限公司 Purchase monitoring method, monitoring service end and the storage medium of medicine abnormal behavior
CN110349004A (en) * 2019-07-02 2019-10-18 北京淇瑀信息科技有限公司 Risk of fraud method for detecting and device based on user node relational network
CN110555455A (en) * 2019-06-18 2019-12-10 东华大学 Online transaction fraud detection method based on entity relationship

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
有向网络下的CoDA社区发现算法评估;郭松;张冬雯;许云峰;杨玉林;郑雅洁;柳晨光;;河北科技大学学报(02);第169-174页 *
采用群体信息的二部图链接预测方法;蔡小雨;陈可佳;安琛;;计算机工程(10);第187-190页 *

Also Published As

Publication number Publication date
CN111105317A (en) 2020-05-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant