CN115050042A - Claims data entry method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN115050042A
Authority
CN
China
Prior art keywords
data
preprocessing
settlement
preset
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210708855.4A
Other languages
Chinese (zh)
Other versions
CN115050042B (en)
Inventor
马亿凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202210708855.4A
Publication of CN115050042A
Application granted
Publication of CN115050042B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this application belong to the field of big data and relate to a claims data entry method and device, computer equipment, and a storage medium. The method comprises: acquiring claims data to be entered, the claims data comprising a plurality of images; processing the images with a preset OCR text detection model to obtain the corresponding text content, performing first preprocessing on the text content, and generating a target feature vector for each image in order to classify the images by document type; and, based on the document classification result, performing second preprocessing on the text content, inputting the second preprocessing result into a preset rule extractor and a preset neural network model respectively, combining their outputs by model ensembling to extract non-standard text elements, and entering the extracted results into a claims system. The application also relates to blockchain technology: users' private information in the claims data can be stored on a blockchain. The application enables efficient, automated and intelligent entry of claims data, which speeds up data entry and reduces cost.

Description

A claims data entry method, device, computer equipment and storage medium

Technical Field

The present application relates to the technical field of big data, and in particular to a claims data entry method, device, computer equipment and storage medium.

Background

During claim settlement, the customer must submit case materials that prove the claim is reasonable and genuine. The case materials include customer identity documents, medical diagnosis and treatment records, and medical expense records. Diagnosis and treatment records include outpatient records, inpatient records, pathology reports and the like; medical expense records include medical invoices, itemized medical bills, social security settlement statements and the like. All of these materials must be entered and archived.

Traditionally, these materials are entered by hand. Manual entry is slow, inefficient and labor-intensive, and it lengthens the claim settlement cycle, which degrades the user experience.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a claims data entry method, device, computer equipment and storage medium that solve the long entry time and low efficiency of claims data entry in the prior art.

To solve the above technical problem, an embodiment of the present application provides a claims data entry method that adopts the following technical solution:

A claims data entry method, comprising the following steps:

acquiring claims data to be entered, the claims data comprising several images;

processing the images with a preset OCR text detection model to obtain the text content corresponding to the images, performing first preprocessing on the text content, generating a target feature vector for each image from the first preprocessing result, and classifying each image by document type according to its target feature vector;

based on the document classification result, performing second preprocessing on the text content, inputting the second preprocessing result into a preset rule extractor and a preset neural network model respectively to obtain their respective outputs, performing model ensembling on the outputs to extract non-standard text elements, and entering the extracted results into the claims system, thereby completing structured entry of the claims data.

Further, after the second preprocessing result is input into the preset rule extractor, the method comprises:

performing, by the preset rule extractor, element matching-word location and then neighborhood search on the input second preprocessing result, so as to perform named entity recognition on the text content and obtain the value corresponding to each named entity together with a confidence for each value.

Further, when the preset rule extractor performs the neighborhood search on the input second preprocessing result, the method comprises:

taking the position of each matched named entity in the text content as the reference point, performing distance filtering, part-of-speech filtering and semantic filtering in sequence on the second-preprocessed text content to obtain the complete value corresponding to the named entity.
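
The matching-word location and neighborhood search described above can be sketched roughly as follows. This is a minimal illustration, not the patented extractor: the `MATCH_WORDS` table, the `SHAPE` patterns (a simple stand-in for the part-of-speech and semantic filters) and the distance-based confidence score are all assumptions introduced for the example.

```python
import re

# Hypothetical matching words per element; a real extractor would load
# these per document type.
MATCH_WORDS = {"name": ["Name", "Patient"], "amount": ["Total", "Amount"]}

# Assumed stand-in for part-of-speech and semantic filtering: a shape
# check that a candidate value must pass for the given element.
SHAPE = {
    "name": re.compile(r"^[A-Za-z][A-Za-z .'-]*$"),
    "amount": re.compile(r"^\d+(\.\d+)?$"),
}


def extract(tokens, element, max_distance=3):
    """Locate a matching word, then search its neighborhood for a value.

    Returns (value, confidence) or None. The confidence simply decays
    with distance from the matching word (an illustrative assumption).
    """
    words = {w.lower() for w in MATCH_WORDS[element]}
    for i, tok in enumerate(tokens):
        if tok.rstrip(":").lower() not in words:
            continue
        # Distance filter: inspect only the next few tokens.
        for d in range(1, max_distance + 1):
            j = i + d
            if j >= len(tokens):
                break
            candidate = tokens[j]
            if SHAPE[element].match(candidate):
                return candidate, round(1.0 - (d - 1) / max_distance, 2)
    return None
```

For example, with the tokens `["Name:", "Zhang", "Total", ":", "128.50"]`, the amount is found two tokens after its matching word, so it receives a lower confidence than the name, which is adjacent to its matching word.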

Further, after the second preprocessing result is input into the preset neural network model, the method comprises:

performing multi-scale sliding-window context splicing on the second preprocessing result and performing named entity recognition on the spliced content to obtain the recognized named entities, their corresponding values, and a confidence for each value.
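
One plausible reading of multi-scale sliding-window context splicing is to join runs of consecutive OCR lines at several window sizes, so that a value split across lines still appears in one input string. The sketch below assumes that reading; the function name and the default scales are hypothetical, and the named entity recognition model that would consume the windows is not shown.

```python
def multi_scale_windows(lines, scales=(1, 2, 3)):
    """Return (start_index, scale, spliced_text) for every window.

    Each window joins `scale` consecutive OCR lines with a space.
    Scales larger than the number of lines produce no windows.
    """
    windows = []
    for scale in scales:
        for start in range(0, len(lines) - scale + 1):
            text = " ".join(lines[start:start + scale])
            windows.append((start, scale, text))
    return windows
```

Keeping the start index and scale alongside the spliced text lets a later step map a recognized entity back to the lines it came from.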

Further, the step of performing model ensembling on the outputs to extract non-standard text elements comprises:

normalizing the confidences of the values output by the preset rule extractor and the preset neural network model onto a common scale, comparing the normalized confidences that the two models assign to the same named entity, selecting the value with the higher confidence as the final value of that named entity, and extracting the non-standard text elements from the recognized named entities and their final values.
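
The ensembling step can be illustrated with a short sketch. It assumes that each extractor reports confidences on its own known range (here, hypothetically, 0 to 100 for the rule extractor and 0 to 1 for the neural network) and uses a simple linear rescaling as the normalization; the source does not specify the normalization formula.

```python
def rescale(score, low, high):
    """Map a raw score from [low, high] onto [0, 1], clamping outliers."""
    if high <= low:
        raise ValueError("high must exceed low")
    return min(1.0, max(0.0, (score - low) / (high - low)))


def merge(rule_out, nn_out, rule_range=(0.0, 100.0), nn_range=(0.0, 1.0)):
    """Pick, per named entity, the value with the higher normalized confidence.

    rule_out and nn_out map entity name -> (value, raw_confidence).
    Entities found by only one extractor are kept as they are.
    On a tie the rule extractor's value wins (an arbitrary choice).
    """
    merged = {}
    for entity in set(rule_out) | set(nn_out):
        candidates = []
        if entity in rule_out:
            value, score = rule_out[entity]
            candidates.append((rescale(score, *rule_range), value))
        if entity in nn_out:
            value, score = nn_out[entity]
            candidates.append((rescale(score, *nn_range), value))
        conf, value = max(candidates, key=lambda c: c[0])
        merged[entity] = (value, conf)
    return merged
```

Without the rescaling step, a rule score of 40 would always beat a network score of 0.75, which is why the confidences are normalized before they are compared.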

Further, after the step of entering the extracted results into the claims system and completing structured entry of the claims data, the method further comprises:

tracing the extracted non-standard text elements back to their source in response to a user's tracing request, and obtaining the position of each extracted element in the original image so that the user can verify the entered text content.
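
Source tracing of this kind presupposes that the OCR stage keeps the bounding box of each recognized text span. A minimal sketch under that assumption follows; the `OcrSpan` record and the substring lookup are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrSpan:
    image_id: str
    text: str
    box: tuple  # (x, y, width, height) in pixels of the original image


def trace(element_value, spans):
    """Return (image_id, box) for every OCR span containing the value."""
    return [(s.image_id, s.box) for s in spans if element_value in s.text]
```

A review screen could use the returned boxes to highlight the region of the uploaded image that each entered value was read from.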

Further, after the step of entering the extracted results into the claims system and completing structured entry of the claims data, the method further comprises:

acquiring new historical claims data, updating the element locating words and matching rules preset in the preset rule extractor according to all the historical claims data, and training the preset neural network model with all the historical claims data as training data to update its parameters.

To solve the above technical problem, an embodiment of the present application further provides a claims data entry device that adopts the following technical solution:

A claims data entry device, comprising:

a data acquisition module, configured to acquire claims data to be entered, the claims data comprising several images;

an image classification module, configured to process the images with a preset OCR text detection model to obtain the text content corresponding to the images, perform first preprocessing on the text content, generate a target feature vector for each image from the first preprocessing result, and classify each image by document type according to its target feature vector;

a data entry module, configured to perform, based on the document classification result, second preprocessing on the text content, input the second preprocessing result into a preset rule extractor and a preset neural network model respectively to obtain their respective outputs, perform model ensembling on the outputs to extract non-standard text elements, and enter the extracted results into the claims system, thereby completing structured entry of the claims data.

To solve the above technical problem, an embodiment of the present application further provides computer equipment that adopts the following technical solution:

Computer equipment comprising a memory and a processor, wherein the memory stores computer-readable instructions and the processor, when executing the computer-readable instructions, implements the steps of the claims data entry method described above.

To solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium that adopts the following technical solution:

A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the claims data entry method described above.

Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:

This solution first classifies the claims data by document type, then uses the preset rule extractor and the preset neural network model to extract non-standard text elements from the text content based on the document classification result, and so enters the claims data in structured form. This achieves efficient, automated and intelligent entry of claims data, which speeds up data entry and saves cost while shortening the claim settlement cycle and improving the customer experience.

Description of the Drawings

To illustrate the solutions of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;

FIG. 2 is a flowchart of an embodiment of the claims data entry method according to the present application;

FIG. 3 is a schematic structural diagram of an embodiment of the claims data entry device according to the present application;

FIG. 4 is a schematic structural diagram of another embodiment of the claims data entry device according to the present application;

FIG. 5 is a schematic structural diagram of an embodiment of the computer equipment according to the present application.

Detailed Description

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification are intended only to describe specific embodiments and are not intended to limit the application. The terms "comprising" and "having", and any variations of them, in the specification, the claims and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the specification, the claims and the above drawings are used to distinguish different objects, not to describe a particular order.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

So that those skilled in the art can better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links or fiber-optic cables.

A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browsers, shopping applications, search applications, instant messaging tools, email clients and social platform software.

The terminal devices 101, 102, 103 may be any electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.

The server 105 may be a server that provides various services, for example a back-end server that supports the pages displayed on the terminal devices 101, 102, 103.

It should be noted that the claims data entry method provided by the embodiments of the present application is generally executed by the server; accordingly, the claims data entry device is generally arranged in the server.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks and servers, depending on implementation needs.

With continued reference to FIG. 2, a flowchart of an embodiment of the claims data entry method according to the present application is shown. The claims data entry method includes the following steps:

Step S201: acquire claims data to be entered, the claims data comprising several images.

When a user needs to make a claim, the user sends a claims data upload request from a mobile phone or computer to the server that handles collection and entry of claims data. After responding to the request, the server receives the claims data uploaded from the user's phone or computer. The claims data includes the customer identity documents, medical diagnosis and treatment records, and medical expense records required during claim settlement. Diagnosis and treatment records include outpatient records, inpatient records, pathology reports and the like; medical expense records include medical invoices, itemized medical bills, social security settlement statements and the like. All of these materials must be collected and stored in the claims system. The claims data is uploaded as pictures; once the server has finished receiving it, the server stores the pictures as a data set.

In this embodiment, after the server obtains the claims data, the method further includes:

inspecting each image to ensure that all images meet the image quality requirements. Specifically, image quality inspection can be performed with one or more of a sharpness detection model, an orientation correction model and a recapture detection model.

The sharpness detection model may be an existing image sharpness detection model, such as one based on a gradient operator or a trained convolutional neural network. The orientation correction model may correct orientation using a projection-based method, a Hough-transform-based method, a linear-fitting-based method, or a method that detects orientation in the frequency domain after a Fourier transform. The recapture detection model may detect re-photographed images by noise analysis, pixel value deviation detection and similar means. For any image that fails inspection, a prompt to re-upload is sent to the user's mobile phone or computer; the newly uploaded image is inspected again, and the subsequent steps proceed only after all images pass.
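
As an illustration of a gradient-operator sharpness check, the sketch below scores a grayscale image by the mean squared difference between neighboring pixels. It is one simple choice among many gradient-based measures, not necessarily the one used in the embodiment; the function names and the `threshold` value are hypothetical, and a real threshold would have to be tuned on sample images.

```python
def gradient_sharpness(gray):
    """Mean squared horizontal plus vertical pixel difference.

    `gray` is a non-empty list of equal-length rows of intensities
    (0-255). Blurry images have small neighbor differences and
    therefore a low score.
    """
    height, width = len(gray), len(gray[0])
    total, count = 0.0, 0
    for y in range(height - 1):
        for x in range(width - 1):
            gx = gray[y][x + 1] - gray[y][x]
            gy = gray[y + 1][x] - gray[y][x]
            total += gx * gx + gy * gy
            count += 1
    return total / count if count else 0.0


def is_sharp(gray, threshold=100.0):
    """Assumed pass/fail rule: score at or above a tuned threshold."""
    return gradient_sharpness(gray) >= threshold
```

An image that fails `is_sharp` would trigger the re-upload prompt described above.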

In some embodiments, the method further includes deduplicating the claims data, so as to remove identical materials that the user uploaded by mistake through different upload entry points, and prompting the user to confirm whether any claims data was left out.
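
A minimal deduplication sketch, assuming that duplicates are byte-identical files, is shown below. The function name is hypothetical. Detecting the same document photographed twice would need a perceptual comparison, which this example does not attempt.

```python
import hashlib


def deduplicate(uploads):
    """Keep the first copy of each byte-identical upload.

    `uploads` is a list of (filename, data_bytes). Returns the unique
    uploads and the names of the dropped duplicates, so the caller can
    ask the user whether a different file was meant to be uploaded.
    """
    seen, unique, dropped = set(), [], []
    for name, data in uploads:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            dropped.append(name)
        else:
            seen.add(digest)
            unique.append((name, data))
    return unique, dropped
```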

Step S202: process the images with a preset OCR text detection model to obtain the text content corresponding to the images, perform first preprocessing on the text content, generate a target feature vector for each image from the first preprocessing result, and classify each image by document type according to its target feature vector.

In this step, an existing OCR text detection model can be used to obtain the text content.

The purpose of the first preprocessing is coarse-grained extraction of text content. Specifically, the text content is input into a preprocessing module that performs operations such as lemmatization, part-of-speech tagging and number normalization. Lemmatization applies to English words in the text; part-of-speech tagging labels each word as a noun, verb, adjective or other part of speech; number normalization mainly regularizes the expense figures that appear in images such as medical invoices. These operations make the images easier to classify by document type. Document classification here means sorting the images in the claims data into expense documents (such as diagnosis sheets and invoices) and identity certificates (such as ID cards and driver's licenses), so that the claims data can later be entered into the claims system in a targeted way. In coarse-grained feature extraction, only the key text content that supports document classification is extracted as classification features.
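
The source does not say what number normalization consists of. As one plausible sketch, the example below folds full-width characters to their ASCII forms with Unicode NFKC normalization, then strips a leading currency sign and thousands separators from amounts. The function name, the regular expression and the choice of currency signs are assumptions made for illustration.

```python
import re
import unicodedata

# Optional currency sign, then an integer part with or without
# thousands separators, then an optional decimal part.
_AMOUNT = re.compile(r"(?:[¥$]\s*)?(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?")


def normalize_numbers(text):
    """Fold full-width characters and rewrite amounts as plain decimals."""
    # NFKC maps full-width digits, commas, periods and the full-width
    # yen sign to their ordinary forms.
    text = unicodedata.normalize("NFKC", text)

    def repl(match):
        integer = match.group(1).replace(",", "")
        return integer + (match.group(2) or "")

    return _AMOUNT.sub(repl, text)
```

After this step, an amount printed as `¥1,280.50` and one printed in full-width digits both become `1280.50`, so later matching sees a single form.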

In this embodiment, classification features can be extracted with a bag-of-words model. Specifically, the text content is segmented into words and several of the words are extracted as feature fields; the feature fields are filtered to obtain a feature field set; the distribution of each feature field in the histogram built from the bag-of-words model is determined, which yields an intermediate feature vector; finally, the intermediate feature vector is normalized and reduced in dimension to obtain the target feature vector. Inputting the target feature vector into the document classification model yields the document classification result. Claims materials can be divided into document types such as identity documents, medical records, diagnostic materials, invoices, itemized lists, reimbursement certificates and bank cards, and each type has different feature fields. For example, the feature fields of an ID card include name, number and validity period, while those of a medical record include hospital, admission and discharge diagnoses, admission and discharge dates, and name. The intermediate feature vector that the bag-of-words model builds from the text content has a large dimension, so dimensionality reduction is needed to obtain the target feature vector.
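
The histogram and normalization steps can be sketched as follows. The vocabulary is a hypothetical feature field set, L2 normalization is an assumed choice, and the dimensionality reduction and the document classification model are deliberately left out of the example.

```python
import math
from collections import Counter


def bow_vector(tokens, vocabulary):
    """Bag-of-words histogram over `vocabulary`, L2-normalized.

    `tokens` is the segmented text of one image; `vocabulary` is an
    ordered list of feature fields. Tokens outside the vocabulary are
    ignored. An image with no vocabulary hits yields a zero vector.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    vector = [float(counts[term]) for term in vocabulary]
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector
```

Normalizing the histogram keeps a long document from dominating a short one of the same type before the vectors are passed to a classifier.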

在一些实施例中,也可以采用卷积神经网络模型进行分类特征的提取,具体为将每幅图像的文本内容进行拼接后输入卷积神经网络模型,得到目标特征向量,将得到的目标特征向量输入单证分类模型即可得到单证分类结果。其中,在对卷积神经网络模型进行训练时,对于不同的理赔资料类型的特征字段进行人工标注,比如身份证的特征字段包括姓名、号码、有效期等,对于病历的特征字段包括医院、出入院诊断、出入院日期、姓名等,不同的材料的布局也有一定的特征,比如发票的标题会在文件的中上部,人为对这些特征进行定义和干涉,卷积神经网络模型的训练主要包括对文本的内容理解,此外还可以训练对文本的文档布局的理解,在不断学习过程中建立不同材料特征字段的布局关系。In some embodiments, a convolutional neural network model can also be used to extract classification features. Specifically, the text content of each image is spliced and input into the convolutional neural network model to obtain a target feature vector, and the obtained target feature vector Enter the document classification model to get the document classification result. Among them, when training the convolutional neural network model, the feature fields of different claims data types are manually marked. For example, the feature fields of ID cards include name, number, validity period, etc., and the feature fields of medical records include hospital, admission and discharge. Diagnosis, admission and discharge date, name, etc., the layout of different materials also has certain characteristics. For example, the title of the invoice will be in the middle and upper part of the document. These characteristics are artificially defined and interfered with. The training of the convolutional neural network model mainly includes the text In addition, it can also train the understanding of the document layout of the text, and establish the layout relationship of different material feature fields in the continuous learning process.

In other embodiments, the bag-of-words model and the convolutional neural network model may be combined to extract the classification features: the probability values of the target feature vectors obtained by the two models are compared, and the target feature vector with the larger probability value is selected as the final target feature vector.

Step S203: based on the document classification result, perform second preprocessing on the text content, input the second preprocessing result into a preset rule extractor and into a preset neural network model to obtain their respective outputs, perform model integration on the outputs to extract non-standard-format text elements, and enter the extracted result into the claims system, completing the structured entry of the claims materials.

In this step, unlike the first preprocessing, the second preprocessing of the text content aims at fine-grained text extraction based on the document classification result. The claims materials, which are presented as images, are first divided according to the document classification result into two image sets, expense documents and identity certificates, and the second preprocessing is performed on each set separately. The difference from the first preprocessing is that the second preprocessing handles expense-document images and certificate images differently: it comprises shared processing (such as lemmatization and part-of-speech tagging) and processing specific to the image type (such as number normalization for expense-document images).

In this embodiment, the outputs that the preset rule extractor and the preset neural network model each produce from the second preprocessing result are the named entities found in the text content after the second preprocessing, the value corresponding to each named entity, and the confidence of each value.

In this embodiment, after the second preprocessing result is input into the preset rule extractor, the method includes: performing, by the preset rule extractor, element-matching-word positioning and then neighborhood search on the input second preprocessing result, so as to perform named entity recognition on the text content and obtain the value corresponding to each named entity and the confidence of each value.

Specifically, the rule extractor extracts key-value pairs from the text content, where the key is a named entity and the value is the value of that named entity, for example "name: Zhang San" or "height: 175cm". The rule extractor locates named entities by means of element matching words, thereby achieving named entity recognition. A number of element locator words, such as "height" and "discharge diagnosis", are predefined in the rule extractor; the element locator words and their synonyms are searched for and matched in the text content after the second preprocessing to identify the named entities.

In this embodiment, when the preset rule extractor performs the neighborhood search on the input second preprocessing result, the method includes: taking the position of the matched named entity in the text content as the reference, and sequentially performing distance filtering, part-of-speech filtering and semantic filtering on the text content after the second preprocessing to obtain the complete value corresponding to the named entity.

Specifically, once a named entity has been located by an element matching word, the value corresponding to the named entity is extracted by neighborhood search. In this embodiment, the neighborhood search takes the position of the matched named entity in the text content as the reference and sequentially performs distance filtering, part-of-speech filtering and semantic filtering on the text content after the second preprocessing to obtain the complete value corresponding to the named entity. Distance filtering spatially queries for several elements whose distance from the matched named entity meets the requirement, generally the few elements that closely follow the matched named entity. Part-of-speech filtering identifies the part of speech of each queried element and then filters the elements according to the matched named entity to obtain a first element. Semantic filtering performs semantic recognition and matching on the other elements spatially adjacent to the first element and takes the matched element as a second element; the first element and the second element are concatenated to form the value corresponding to the matched named entity. For example, the element matching word "height" matches the named entity "height" in the text content after the second preprocessing; the spatially nearest numeral after "height" is then matched, and the element nearest to that numeral is matched to obtain its unit; the numeral and the unit are concatenated to give the value of the height. Along with the named entities and their values, the rule extractor also outputs the confidence of each value, which is the probability that the named entity and its value were recognized correctly.
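The three filtering stages of the neighborhood search can be sketched as below for the "height" example. The `Token` structure, the tag names ("n", "num", "unit", "punct"), the `max_distance` threshold and the `need_unit` switch are all hypothetical; the description does not specify how positions or parts of speech are represented.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    pos: str    # hypothetical tag set: "n", "num", "unit", "punct"
    index: int  # position in the preprocessed token sequence

def neighborhood_search(tokens, key_index, max_distance=3, need_unit=True):
    # Distance filtering: keep tokens shortly after the matched named entity.
    nearby = [t for t in tokens if 0 < t.index - key_index <= max_distance]
    # Part-of-speech filtering: the first numeral becomes the first element.
    first = next((t for t in nearby if t.pos == "num"), None)
    if first is None:
        return None
    if not need_unit:
        return first.text
    # Semantic filtering: an adjacent token tagged as a unit is the second element.
    second = next(
        (t for t in tokens if t.index == first.index + 1 and t.pos == "unit"),
        None,
    )
    # Concatenate the first and second elements into the complete value.
    return first.text + (second.text if second else "")

tokens = [
    Token("height", "n", 0),
    Token(":", "punct", 1),
    Token("175", "num", 2),
    Token("cm", "unit", 3),
]
value = neighborhood_search(tokens, key_index=0)
```

Here distance is measured in token positions; with OCR output it could equally be measured in pixel coordinates of the detected text boxes.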

In some embodiments, the value of a named entity does not necessarily have a unit, so the semantic filtering in the neighborhood search is not mandatory and may be performed adaptively according to the actual situation.

In some embodiments, if no identical named entity is matched by the element locator words, fuzzy matching may additionally be performed by computing the edit distance between the segmented words of the text content after the second preprocessing and the element locator words, thereby achieving named entity recognition. Entity names are often irregular and may appear in various variants and spellings, so named entity recognition based on exact matching of element locator words has low recall; using edit distance to generalize from exact matching to fuzzy matching effectively improves the recognition success rate.
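A minimal sketch of this fuzzy fallback, using the standard Levenshtein edit distance, is given below. The `max_distance` threshold and the sample locator words are assumptions chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_locate(tokens, locator_words, max_distance=1):
    """Return the first (token, locator word) pair within the distance threshold."""
    for token in tokens:
        for word in locator_words:
            if edit_distance(token, word) <= max_distance:
                return token, word
    return None

# "heiht" is a hypothetical OCR misreading of the locator word "height".
match = fuzzy_locate(["patient", "heiht", "175cm"], ["height", "discharge diagnosis"])
```

A small threshold keeps precision high; a threshold relative to word length would be a reasonable refinement for longer locator words.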

In this embodiment, after the second preprocessing result is input into the preset neural network model, the method includes: performing multi-scale sliding-window context concatenation on the second preprocessing result, and performing named entity recognition on the concatenated content to obtain the recognized named entities, their corresponding values, and the confidence of each value. When the preset neural network model performs named entity recognition on the concatenated content, it parses the text content structurally and recognizes all entities in the text together with their values.
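The description does not define multi-scale sliding-window context concatenation in detail. One plausible reading, sketched below under that assumption, is that each token is joined with its neighbors at several window radii so the model sees both narrow and wide context; the function name and the `window_sizes` default are hypothetical.

```python
def multi_scale_windows(tokens, window_sizes=(1, 2)):
    """For every token, concatenate its context at each window radius."""
    spliced = []
    for i in range(len(tokens)):
        for size in window_sizes:
            left = max(0, i - size)
            right = min(len(tokens), i + size + 1)
            # Record the center index, the scale, and the concatenated context.
            spliced.append((i, size, " ".join(tokens[left:right])))
    return spliced

tokens = ["name", "Zhang", "San", "height", "175cm"]
windows = multi_scale_windows(tokens)
```

Each concatenated string would then be passed to the named entity recognition model; that model is outside the scope of this sketch.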

In this embodiment, the step of performing model integration on the outputs to extract non-standard-format text elements includes: normalizing the confidences of the values output by the preset rule extractor and by the preset neural network model to a common scale; comparing the normalized confidences of the same named entity as recognized by the preset rule extractor and by the preset neural network model; selecting the value with the higher confidence as the final value of the named entity; and extracting the non-standard-format text elements based on the recognized named entities and their final values. After the extraction is complete, the extracted result is entered into the claims system, completing the structured entry of the claims materials.
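The integration step can be sketched as follows. Min-max scaling is used here as an assumed normalization, because the description only says the confidences are brought to a common scale; the sample outputs and their confidence ranges are invented for illustration.

```python
def normalize(results):
    """Min-max scale one extractor's confidences to [0, 1] (assumed scheme)."""
    if not results:
        return {}
    confs = [conf for _, conf in results.values()]
    lo, hi = min(confs), max(confs)
    span = hi - lo
    return {
        entity: (value, (conf - lo) / span if span else 1.0)
        for entity, (value, conf) in results.items()
    }

def integrate(rule_out, nn_out):
    """Per named entity, keep the value with the higher normalized confidence."""
    rule_n, nn_n = normalize(rule_out), normalize(nn_out)
    merged = {}
    for entity in rule_n.keys() | nn_n.keys():
        candidates = [r[entity] for r in (rule_n, nn_n) if entity in r]
        merged[entity] = max(candidates, key=lambda vc: vc[1])[0]
    return merged

# Hypothetical outputs: entity -> (value, raw confidence on each model's own scale).
rule_out = {"name": ("Zhang San", 0.9), "height": ("175cm", 0.6)}
nn_out = {"name": ("Zhang San", 40.0), "height": ("176cm", 80.0),
          "diagnosis": ("fracture", 60.0)}
final = integrate(rule_out, nn_out)
```

An entity found by only one extractor is kept as is; on a tie the rule extractor's value wins because it is listed first.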

In this embodiment, after the step of entering the extracted result into the claims system and completing the structured entry of the claims materials, the method further includes: in response to a tracing request from the user, tracing the extracted non-standard-format text elements to obtain their positions in the original image, so that the user can verify the entered text content. Correspondingly, the method further includes a step of establishing the correspondence between text and image position: the position of the text content in the image is determined by object detection, and the mapping between text content and image position is stored in coordinate form. When the user's mouse pointer rests on the entered text in the interface, the corresponding image is automatically retrieved and displayed according to the text under the pointer, which makes it easy to verify the automatically entered text manually. Compared with the existing fully manual entry, this can greatly reduce entry time and improve overall timeliness.
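The stored mapping between entered text and image position might look like the sketch below. The `Box` fields, the `TraceIndex` class and the image identifier are hypothetical names; the description only states that the mapping is stored as coordinates.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    image_id: str  # which uploaded image the text came from
    x: int         # top-left corner, in pixels
    y: int
    width: int
    height: int

class TraceIndex:
    """Maps each entered field to the image region it was extracted from."""

    def __init__(self):
        self._boxes = {}

    def record(self, field, box):
        self._boxes[field] = box

    def trace(self, field):
        # Returns None when no position was recorded for the field.
        return self._boxes.get(field)

index = TraceIndex()
index.record("height", Box("record_page_1.jpg", x=120, y=340, width=90, height=24))
hit = index.trace("height")
```

A hover handler in the entry interface would call `trace` with the field under the pointer and crop or highlight the returned region.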

In this embodiment, after the step of entering the extracted result into the claims system and completing the structured entry of the claims materials, the method further includes: acquiring new historical claims materials, updating the element locator words and matching rules preset in the preset rule extractor according to all historical claims materials, and training the preset neural network model with all historical claims materials as training data to update its parameters. This step optimizes both the preset rule extractor and the preset neural network model: the rule extractor is optimized by updating the preset element locator words and matching rules according to historical claims materials, and the neural network model is optimized by training on historical claims materials to update the model parameters, so that the optimized rule extractor and neural network model improve the claims service.

The above method of the present application can intelligently enter and archive the materials submitted by customers, including customer identity documents, medical diagnosis and treatment materials (outpatient records, inpatient records, pathology reports and the like) and medical expense materials (medical invoices, itemized medical bills, social security settlement statements and the like). Specifically, the claims materials are classified by document type, and then, based on the document classification result, non-standard-format text elements are extracted from the text content by the preset rule extractor and the preset neural network model, so that the claims materials are entered in structured form. This achieves efficient, automated and intelligent entry of claims materials, which increases entry speed, saves cost, shortens the claims cycle and improves the customer experience.

It should be emphasized that, to further ensure the privacy and security of the claims materials, the user's private information in the claims materials may also be stored in a node of a blockchain.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.

Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be implemented by computer-readable instructions that direct the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (ROM), or a random access memory (RAM).

It should be understood that although the steps in the flowcharts of the drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and which are not necessarily executed one after another but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

With further reference to FIG. 3, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of a claims data entry device. This device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied in various electronic devices.

As shown in FIG. 3, the claims data entry device 300 of this embodiment includes a data acquisition module 301, an image classification module 302 and a data entry module 303, wherein:

The data acquisition module 301 is configured to acquire the claims materials to be entered, the claims materials including a number of images. The image classification module 302 is configured to process the images with a preset OCR text detection model to obtain the text content corresponding to each image, perform first preprocessing on the text content, generate a target feature vector for each image from the first preprocessing result, and classify each image by document type according to its target feature vector. The data entry module 303 is configured to perform second preprocessing on the text content based on the document classification result, input the second preprocessing result into a preset rule extractor and into a preset neural network model to obtain their respective outputs, perform model integration on the outputs to extract non-standard-format text elements, and enter the extracted result into the claims system, completing the structured entry of the claims materials.

When a user needs to make a claim, the data acquisition module 301 receives a claims material upload request that the user initiates from a mobile phone or computer to the server that handles the collection and entry of claims materials, and, after responding to the request, receives the claims materials uploaded from the user's mobile phone or computer. These claims materials include the customer identity documents, medical diagnosis and treatment materials, and medical expense materials required in the claims process. The medical diagnosis and treatment materials include outpatient records, inpatient records, pathology reports and the like, and the medical expense materials include medical invoices, itemized medical bills, social security settlement statements and the like. All of these claims materials need to be collected and stored in the claims system. The claims materials are uploaded as pictures, and once they have all been received, the pictures are stored on the server as a data set.

In this embodiment, the data acquisition module 301 is further configured to check each image after the server has obtained the claims materials, to ensure that all images meet the image quality requirements. Image quality inspection may be performed with one or more of a sharpness detection model, an orientation correction model and a recapture detection model. The sharpness detection model may be an existing image sharpness detection model, such as one based on a gradient operator or a trained convolutional neural network model. The orientation correction model may correct orientation with a projection-based method, a Hough-transform-based method, a linear-fitting-based method, a method that detects in the frequency domain after a Fourier transform, or the like. The recapture detection model may detect recaptured images by noise analysis, pixel value deviation detection, or the like. For images that fail the inspection, a prompt to upload again is sent to the user's mobile phone or computer, and the newly uploaded images are inspected again until all images pass.
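As one example of a gradient-operator sharpness check, the sketch below computes the variance of a 4-neighbor Laplacian response over a grayscale image given as a list of rows. The choice of the Laplacian, the `threshold` value and the function names are assumptions; the description does not specify which operator or threshold is used. The image is assumed to be at least 3 by 3 pixels.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbor Laplacian over interior pixels."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def is_sharp(gray, threshold=100.0):
    """Hypothetical pass/fail rule: blurred images have a low Laplacian variance."""
    return laplacian_variance(gray) >= threshold

flat = [[10] * 4 for _ in range(4)]      # no edges at all
checker = [[0, 255, 0, 255],
           [255, 0, 255, 0],
           [0, 255, 0, 255],
           [255, 0, 255, 0]]             # maximal local contrast
```

A uniform image scores zero while a high-contrast pattern scores very high; a real threshold would need to be tuned on sample uploads.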

In some embodiments, the data acquisition module 301 is further configured to deduplicate the claims materials, so as to remove identical claims materials that the user uploaded through different upload entries by mistake, and to prompt the user to confirm whether any claims materials are missing.
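A minimal sketch of such deduplication, assuming duplicates are byte-identical uploads, hashes each file and keeps the first occurrence. The file names are hypothetical, and near-duplicates (for example, the same page photographed twice) would need a perceptual comparison that is not shown here.

```python
import hashlib

def deduplicate(images):
    """Keep the first upload for each distinct SHA-256 content digest."""
    seen = set()
    unique = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, data))
    return unique

uploads = [("invoice_a.jpg", b"x"), ("invoice_b.jpg", b"x"), ("id_card.jpg", b"y")]
kept = deduplicate(uploads)
```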

In this embodiment, the image classification module 302 may use an existing OCR text detection model to obtain the text content.

When the image classification module 302 performs the first preprocessing on the text content, the aim is coarse-grained text extraction. The text content is input to a preprocessing module that performs operations such as lemmatization, part-of-speech tagging and number normalization. Lemmatization applies to the English words in the text; part-of-speech tagging labels each word as a noun, verb, adjective or other part of speech; and number normalization mainly regularizes the amounts in images such as the medical invoices among the claims materials, which facilitates the document classification of the images. Here, document classification means classifying the images of expense documents (such as diagnosis sheets and invoices) and of identity certificates (such as ID cards and driver's licenses) in the claims materials, so that the claims materials can subsequently be entered into the claims system in a targeted way. In this coarse-grained feature extraction, only the key text content that enables document classification is extracted as classification features.

When the image classification module 302 extracts classification features, this embodiment may use a bag-of-words model. The text content is segmented into words, a number of the resulting words are extracted as feature fields, and the feature fields are filtered to obtain a feature field set. The distribution of each feature field of the set in a histogram built from the bag-of-words model is then determined, which yields an intermediate feature vector. Finally, the intermediate feature vector is normalized and reduced in dimension to obtain the target feature vector, which is input to the document classification model to obtain the document classification result. Claims materials can be divided into document types such as identity documents, medical records, diagnostic materials, invoices, itemized lists, reimbursement certificates and bank cards, and each type has different feature fields. For example, the feature fields of an ID card include name, number and validity period, while those of a medical record include hospital, admission and discharge diagnoses, admission and discharge dates, and name. Because the intermediate feature vector that the bag-of-words model forms from the text content has a large dimension, it must be reduced in dimension to obtain the target feature vector.

In some embodiments, the image classification module 302 may instead use a convolutional neural network model to extract the classification features. It is configured to concatenate the text content of each image and input it to the convolutional neural network model to obtain the target feature vector, which is input to the document classification model to obtain the document classification result. When the convolutional neural network model is trained, the feature fields of the different claims material types are labeled manually; for example, the feature fields of an ID card include name, number and validity period, while those of a medical record include hospital, admission and discharge diagnoses, admission and discharge dates, and name. Different materials also have characteristic layouts (the title of an invoice, for instance, appears in the upper middle of the page), and these layout features are likewise defined manually. Training mainly teaches the model to understand the text content; the model can additionally be trained to understand the document layout, so that over the course of learning it establishes the layout relationships between the feature fields of different materials.

In other embodiments, the image classification module 302 may also be configured to combine the bag-of-words model and the convolutional neural network model to extract the classification features: the probability values of the target feature vectors obtained by the two models are compared, and the target feature vector with the larger probability value is selected as the final target feature vector.

In this embodiment, unlike the first preprocessing performed by the image classification module 302, the second preprocessing that the data entry module 303 performs on the text content aims at fine-grained text extraction based on the document classification result. The claims materials, which are presented as images, are first divided according to the document classification result into two image sets, expense documents and identity certificates, and the second preprocessing is performed on each set separately. The difference from the first preprocessing performed by the image classification module 302 is that the data entry module 303 handles expense-document images and certificate images differently in the second preprocessing: it comprises shared processing (such as lemmatization and part-of-speech tagging) and processing specific to the image type (such as number normalization for expense-document images).

In this embodiment, the outputs obtained when the data entry module 303 inputs the second preprocessing result into the preset rule extractor and into the preset neural network model are the named entities found in the text content after the second preprocessing, the value corresponding to each named entity, and the confidence of each value.

In this embodiment, after inputting the second preprocessing result into the preset rule extractor, the data entry module 303 is specifically configured to: perform, by the preset rule extractor, element-matching-word positioning and then neighborhood search on the input second preprocessing result, so as to perform named entity recognition on the text content and obtain the value corresponding to each named entity and the confidence of each value.

Specifically, the data entry module 303 is configured to extract key-value pairs from the text content with the rule extractor, where the key is a named entity and the value is the value of that named entity, for example "name: Zhang San" or "height: 175cm". The rule extractor locates named entities by means of element matching words, thereby achieving named entity recognition. A number of element locator words, such as "height" and "discharge diagnosis", are predefined in the rule extractor; the element locator words and their synonyms are searched for and matched in the text content after the second preprocessing to identify the named entities.

In this embodiment, when the data entry module 303 performs the neighborhood search on the input second preprocessing result with the preset rule extractor, it is specifically configured to: take the position of the matched named entity in the text content as the reference, and sequentially perform distance filtering, part-of-speech filtering and semantic filtering on the text content after the second preprocessing to obtain the complete value corresponding to the named entity.

Specifically, after locating a named entity through the element matching words, the data entry module 303 further extracts the value corresponding to the named entity through neighborhood search. In this embodiment, when performing the neighborhood search, the data entry module 303 takes the position of the matched named entity in the text content as a reference and performs distance filtering, part-of-speech filtering, and semantic filtering in sequence on the text content after the second preprocessing, to obtain the complete value corresponding to the named entity. Distance filtering spatially queries multiple elements whose distance from the matched named entity meets a requirement; generally these are the few elements located shortly after the matched named entity. Part-of-speech filtering performs part-of-speech recognition on the queried elements and then screens them based on the matched named entity to obtain a first element. Semantic filtering performs semantic recognition and matching on the other elements spatially adjacent to the first element and takes the matched element as a second element; the first element and the second element are concatenated to form the value of the matched named entity. For example, based on the element matching word "height", the named entity "height" is matched in the text content after the second preprocessing; the spatially nearest numeral after "height" is then matched, and the element nearest to that numeral is matched in turn to obtain the numeral's unit; the numeral and the unit are concatenated to give the value of the height. While outputting the named entities and their values, the rule extractor also outputs the confidence of each value, which is the probability that the named entity and its corresponding value have been recognized correctly.
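The three filtering steps can be illustrated with the "height" example. The sketch below makes several assumptions that are not in the application: distance is measured in tokens rather than image coordinates, a regular expression stands in for a part-of-speech tagger, and the unit vocabulary `UNITS` and the names `tokenize` and `neighborhood_search` are hypothetical.

```python
import re

# Hypothetical unit vocabulary per entity, used by the semantic filter.
UNITS = {"height": {"cm", "m"}}

NUMBER = r"\d+(?:\.\d+)?"


def tokenize(text):
    """Split into numbers, alphabetic words, and single punctuation marks."""
    return re.findall(NUMBER + r"|[A-Za-z]+|[^\sA-Za-z\d]", text)


def neighborhood_search(tokens, anchor_idx, entity, max_distance=4):
    """Return the value that follows the entity at anchor_idx, or None."""
    # 1. Distance filter: keep only the few tokens right after the anchor.
    start = anchor_idx + 1
    nearby = list(enumerate(tokens[start:start + max_distance], start=start))
    # 2. Part-of-speech filter (regex stand-in): the first numeral is the
    #    "first element".
    first = next(((i, t) for i, t in nearby if re.fullmatch(NUMBER, t)), None)
    if first is None:
        return None
    idx, number = first
    # 3. Semantic filter: an adjacent known unit is the "second element".
    following = tokens[idx + 1] if idx + 1 < len(tokens) else ""
    if following.lower() in UNITS.get(entity, set()):
        return number + following
    return number  # values without a unit skip the concatenation


tokens = tokenize("Height: 175cm Weight: 70kg")
height_value = neighborhood_search(tokens, 0, "height")
```

Because no unit vocabulary is registered for "weight" in this sketch, a call with that entity returns the bare numeral, which mirrors the case where the semantic step is not applied.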

In some embodiments, the value of a named entity does not necessarily have a unit, so semantic filtering is not mandatory when the data entry module 303 performs the neighborhood search and may be applied adaptively according to the actual situation.

In some embodiments, if the data entry module 303 finds no exact match for a named entity using the element positioning words, it is further configured to perform fuzzy matching by computing the edit distance between the segmented words of the text content after the second preprocessing and the element positioning words, thereby achieving named entity recognition. Because entity names often follow no fixed pattern and may appear in many variants and spellings, named entity recognition based on exact matching of element positioning words has low recall; using edit distance to generalize from exact matching to fuzzy matching effectively improves the recognition success rate.
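A minimal sketch of the fuzzy fallback follows, using the standard Levenshtein edit distance. The acceptance threshold `max_ratio` and the function names are assumptions made for illustration; the application does not state how close a token must be to count as a match.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(token, positioning_words, max_ratio=0.34):
    """Return the closest positioning word, or None if nothing is close enough."""
    best_word, best_dist = None, None
    for word in positioning_words:
        dist = edit_distance(token, word)
        if best_dist is None or dist < best_dist:
            best_word, best_dist = word, dist
    if best_word is None:
        return None
    # Hypothetical rule: accept when the distance is a small share of the length.
    limit = max_ratio * max(len(token), len(best_word))
    return best_word if best_dist <= limit else None


matched = fuzzy_match("heigth", ["height", "weight"])
```

A length-relative threshold is one possible design choice; it keeps short tokens from matching unrelated short words while still tolerating OCR slips in longer ones.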

In this embodiment, after inputting the second preprocessing result into the preset neural network model, the data entry module 303 is specifically configured to perform multi-scale sliding-window context splicing on the second preprocessing result and to perform named entity recognition on the spliced content, obtaining the recognized named entities, their corresponding values, and the confidence of each value. When performing named entity recognition on the spliced content, the preset neural network model parses the text content into a structured form and identifies all entities in the text together with their corresponding values.
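The application does not define "multi-scale sliding-window context splicing" in detail. One plausible reading, sketched below purely as an assumption, is that each OCR text line is concatenated with its neighbors at several window radii, so that a key and a value split across lines appear together in at least one spliced input. The function name and the `radii` parameter are hypothetical.

```python
def multi_scale_windows(lines, radii=(0, 1, 2), sep=" "):
    """Splice each line with its neighbors at several window sizes.

    Returns (center_index, radius, spliced_text) tuples; a model would then
    run named entity recognition on each spliced_text.
    """
    spliced = []
    for center in range(len(lines)):
        for radius in radii:
            lo = max(0, center - radius)
            hi = min(len(lines), center + radius + 1)
            spliced.append((center, radius, sep.join(lines[lo:hi])))
    return spliced


ocr_lines = ["Name: Zhang San", "Height:", "175cm"]
windows = multi_scale_windows(ocr_lines, radii=(0, 1))
```

In this example the line "Height:" carries no value on its own, but the radius-1 window around it contains both the key and "175cm".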

In this embodiment, when performing model integration processing on the output results to extract non-standard text elements, the data entry module 303 is specifically configured to: benchmark and normalize the confidences of the values output by the preset rule extractor and the preset neural network model; compare the normalized confidences for the same named entity as identified by the rule extractor and by the neural network model; select the value with the higher confidence as the final value of the named entity; and extract the non-standard text elements based on the identified named entities and their final values. After extraction, the results are entered into the claim settlement system, completing the structured entry of the claim settlement data.
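The integration step can be sketched as follows. The normalization method is not specified in the application; min-max scaling within each extractor's own outputs is used here only as a stand-in, and the data layout (entity mapped to a value and a raw confidence) and function names are hypothetical.

```python
def min_max(scores):
    """Rescale one extractor's raw confidences to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}
    return {key: (val - lo) / (hi - lo) for key, val in scores.items()}


def ensemble(rule_out, model_out):
    """Pick, per entity, the value whose normalized confidence is higher.

    Both inputs map entity -> (value, raw_confidence).
    """
    rule_norm = min_max({k: c for k, (_, c) in rule_out.items()}) if rule_out else {}
    model_norm = min_max({k: c for k, (_, c) in model_out.items()}) if model_out else {}
    final = {}
    for entity in rule_out.keys() | model_out.keys():
        candidates = []
        if entity in rule_out:
            candidates.append((rule_norm[entity], rule_out[entity][0]))
        if entity in model_out:
            candidates.append((model_norm[entity], model_out[entity][0]))
        # On a tie the rule extractor's value wins because it is listed first.
        final[entity] = max(candidates, key=lambda c: c[0])[1]
    return final


rule_out = {"height": ("175cm", 0.9), "name": ("Zhang", 0.6)}
model_out = {"height": ("176cm", 4.0), "name": ("Zhang San", 9.0),
             "diagnosis": ("pneumonia", 7.0)}
final = ensemble(rule_out, model_out)
```

The example deliberately gives the two extractors scores on different scales, which is the situation that makes normalization necessary before the comparison.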

In this embodiment, as shown in FIG. 4, the apparatus further includes a traceability module 304, configured to, after the data entry module 303 has entered the extracted results into the claim settlement system and completed the structured entry of the claim settlement data, trace the extracted non-standard text elements back to their source in response to a user's traceability request, obtaining the position of each extracted element in the original image so that the user can verify the entered text content. Correspondingly, the traceability module 304 is also configured to establish the correspondence between text and image position: it determines the position of the text content in the image through object detection and stores the mapping between text content and image position in the form of coordinates. When the user's mouse pointer rests on the entered text in the interface, the module responds to the hover by automatically retrieving and displaying the image corresponding to the text under the pointer, which facilitates manual verification of the automatically entered text. Compared with the existing fully manual entry approach, this can greatly reduce entry time and improve overall timeliness.
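The text-to-image mapping can be pictured as a small lookup structure. The sketch below is an assumption-laden illustration: the `Box` fields, the `TraceIndex` class, and keying by field name are all hypothetical, since the application says only that the mapping is stored in coordinate form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Location of a piece of text in a source image (pixel coordinates)."""
    image_id: str
    x: int
    y: int
    width: int
    height: int


class TraceIndex:
    """Maps each entered field to the image region it was extracted from."""

    def __init__(self):
        self._boxes = {}

    def record(self, field, box):
        # Called at extraction time, once detection has located the text.
        self._boxes[field] = box

    def trace(self, field):
        # Called when the user hovers over a field; None means no source known.
        return self._boxes.get(field)


index = TraceIndex()
index.record("height", Box("img_001", 120, 340, 80, 24))
source = index.trace("height")
```

A user interface would use the returned box to display or highlight the matching region of the original image next to the entered value.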

In this embodiment, as shown in FIG. 4, the apparatus further includes an update module 305, configured to, after the data entry module 303 has entered the extracted results into the claim settlement system and completed the structured entry of the claim settlement data, acquire new historical claim settlement data, update the element positioning words and matching rules preset in the preset rule extractor according to all of the historical claim settlement data, and train the preset neural network model using all of the historical claim settlement data as training data so as to update its parameters. The update module thus optimizes both the preset rule extractor and the preset neural network model: the rule extractor is optimized by updating the preset element positioning words and matching rules based on historical claim settlement data, and the neural network model is optimized by further training on historical claim settlement data to update the model parameters, so that the claim settlement service can be improved through the optimized rule extractor and neural network model.

The above apparatus of the present application can intelligently enter and archive the materials submitted by customers, including customer identity documents, medical diagnosis and treatment records (outpatient medical records, inpatient medical records, pathology reports, and the like), and medical expense records (medical invoices, itemized medical bills, social security settlement statements, and the like). Specifically, the claim settlement data are classified by document type, and non-standard text elements are then extracted from the text content by the preset rule extractor and the preset neural network model based on the document classification result, so that the claim settlement data are entered in structured form. This achieves efficient, automated, and intelligent entry of claim settlement data, which increases entry speed, saves cost, shortens the claim settlement cycle, and improves customer experience.

To solve the above technical problem, an embodiment of the present application further provides a computer device. Refer to FIG. 5, which is a block diagram of the basic structure of the computer device of this embodiment.

The computer device 5 includes a memory 51, a processor 52, and a network interface 53 that are communicatively connected to one another through a system bus. The memory 51 stores computer-readable instructions, and the processor 52, when executing the computer-readable instructions, implements the steps of the claim settlement data entry method described in the foregoing embodiments.

It should be noted that FIG. 5 shows only the computer device 5 with components 51-53; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touch pad, a voice control device, or the like.

The memory 51 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 51 may be an internal storage unit of the computer device 5, such as a hard disk or internal memory of the computer device 5. In other embodiments, the memory 51 may be an external storage device of the computer device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 5. The memory 51 may also include both an internal storage unit of the computer device 5 and an external storage device. In this embodiment, the memory 51 is generally used to store the operating system and the various application software installed on the computer device 5, such as the computer-readable instructions of the claim settlement data entry method. In addition, the memory 51 may be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 52 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 52 is generally used to control the overall operation of the computer device 5. In this embodiment, the processor 52 is configured to run the computer-readable instructions stored in the memory 51 or to process data, for example to run the computer-readable instructions of the claim settlement data entry method.

The network interface 53 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 5 and other electronic devices.

The above computer device of the present application can intelligently enter and archive the materials submitted by customers, including customer identity documents, medical diagnosis and treatment records (outpatient medical records, inpatient medical records, pathology reports, and the like), and medical expense records (medical invoices, itemized medical bills, social security settlement statements, and the like). Specifically, the claim settlement data are classified by document type, and non-standard text elements are then extracted from the text content by the preset rule extractor and the preset neural network model based on the document classification result, so that the claim settlement data are entered in structured form. This achieves efficient, automated, and intelligent entry of claim settlement data, which increases entry speed, saves cost, shortens the claim settlement cycle, and improves customer experience.

The present application also provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the claim settlement data entry method described above.

When the instructions stored in the above computer-readable storage medium of the present application are executed, the materials submitted by customers can be intelligently entered and archived, including customer identity documents, medical diagnosis and treatment records (outpatient medical records, inpatient medical records, pathology reports, and the like), and medical expense records (medical invoices, itemized medical bills, social security settlement statements, and the like). Specifically, the claim settlement data are classified by document type, and non-standard text elements are then extracted from the text content by the preset rule extractor and the preset neural network model based on the document classification result, so that the claim settlement data are entered in structured form. This achieves efficient, automated, and intelligent entry of claim settlement data, which increases entry speed, saves cost, shortens the claim settlement cycle, and improves customer experience.

From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and can of course also be implemented in hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.

Obviously, the embodiments described above are only some, and not all, of the embodiments of the present application. The accompanying drawings show preferred embodiments of the present application but do not limit its patent scope. The present application may be implemented in many different forms; these embodiments are provided so that the disclosure of the present application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements of some of their technical features. Any equivalent structure made using the contents of the description and drawings of the present application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (10)

1. A claim settlement data entry method, characterized by comprising the following steps:
acquiring claim settlement data to be entered, the claim settlement data including a plurality of images;
processing the images through a preset OCR text detection model to obtain text content corresponding to the images, performing first preprocessing on the text content, generating a target feature vector for each image according to the first preprocessing result, and performing document classification on each image according to the target feature vector;
based on the document classification result, performing second preprocessing on the text content, inputting the second preprocessing result into a preset rule extractor and a preset neural network model respectively to obtain corresponding output results, performing model integration processing according to the output results to extract non-standard text elements, and entering the extracted results into a claim settlement system to complete the structured entry of the claim settlement data.

2. The claim settlement data entry method according to claim 1, characterized in that, after the second preprocessing result is input into the preset rule extractor, the method comprises:
performing, through the preset rule extractor, element matching word localization and neighborhood search in sequence on the input second preprocessing result, so as to perform named entity recognition on the text content and obtain the value corresponding to each named entity and the confidence of each value.

3. The claim settlement data entry method according to claim 2, characterized in that, when the neighborhood search is performed on the input second preprocessing result through the preset rule extractor, the method comprises:
taking the position of the matched named entity in the text content as a reference, performing distance filtering, part-of-speech filtering, and semantic filtering in sequence on the text content after the second preprocessing, to obtain the complete value corresponding to the named entity.

4. The claim settlement data entry method according to any one of claims 1 to 3, characterized in that, after the second preprocessing result is input into the preset neural network model, the method comprises:
performing multi-scale sliding-window context splicing on the second preprocessing result, and performing named entity recognition according to the spliced content, to obtain the recognized named entities, their corresponding values, and the confidence of each value.

5. The claim settlement data entry method according to claim 4, characterized in that the step of performing model integration processing according to the output results to extract non-standard text elements comprises:
benchmarking and normalizing the confidences of the values output by the preset rule extractor and the preset neural network model, comparing the normalized confidences for the same named entity as identified by the preset rule extractor and the preset neural network model, selecting the value with the higher confidence as the final value of the named entity, and extracting the non-standard text elements based on the identified named entities and their final values.

6. The claim settlement data entry method according to claim 5, characterized in that, after the step of entering the extracted results into the claim settlement system to complete the structured entry of the claim settlement data, the method further comprises:
in response to a user's traceability request, tracing the extracted non-standard text elements back to their source and obtaining the position of the extracted non-standard text elements in the original image, so that the user can verify the entered text content.

7. The claim settlement data entry method according to claim 6, characterized in that, after the step of entering the extracted results into the claim settlement system to complete the structured entry of the claim settlement data, the method further comprises:
acquiring new historical claim settlement data, updating the element positioning words and matching rules preset in the preset rule extractor according to all of the historical claim settlement data, and training the preset neural network model using all of the historical claim settlement data as model training data to update the parameters of the preset neural network model.

8. A claim settlement data entry apparatus, characterized by comprising:
a data acquisition module, configured to acquire claim settlement data to be entered, the claim settlement data including a plurality of images;
an image classification module, configured to process the images through a preset OCR text detection model to obtain text content corresponding to the images, perform first preprocessing on the text content, generate a target feature vector for each image according to the first preprocessing result, and perform document classification on each image according to the target feature vector;
a data entry module, configured to perform, based on the document classification result, second preprocessing on the text content, input the second preprocessing result into a preset rule extractor and a preset neural network model respectively to obtain corresponding output results, perform model integration processing according to the output results to extract non-standard text elements, and enter the extracted results into a claim settlement system to complete the structured entry of the claim settlement data.

9. A computer device, characterized by comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of the claim settlement data entry method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that computer-readable instructions are stored on the computer-readable storage medium, and the computer-readable instructions, when executed by a processor, implement the steps of the claim settlement data entry method according to any one of claims 1 to 7.

CN202210708855.4A 2022-06-21 2022-06-21 A method, device, computer equipment and storage medium for entering claims data Active CN115050042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210708855.4A CN115050042B (en) 2022-06-21 2022-06-21 A method, device, computer equipment and storage medium for entering claims data

Publications (2)

Publication Number Publication Date
CN115050042A true CN115050042A (en) 2022-09-13
CN115050042B CN115050042B (en) 2025-04-08

Family

ID=83162668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210708855.4A Active CN115050042B (en) 2022-06-21 2022-06-21 A method, device, computer equipment and storage medium for entering claims data

Country Status (1)

Country Link
CN (1) CN115050042B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115862025A (en) * 2022-11-29 2023-03-28 中国工商银行股份有限公司 Method, device, equipment, medium and program product for extracting elements of product manual
CN117373030A (en) * 2023-06-19 2024-01-09 上海简答数据科技有限公司 OCR-based user material identification method, system, device and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085012A (en) * 2020-09-04 2020-12-15 泰康保险集团股份有限公司 Project name and category identification method and device
CN113688268A (en) * 2021-08-31 2021-11-23 中国平安人寿保险股份有限公司 Picture information extraction method and device, computer equipment and storage medium
CN113762100A (en) * 2021-08-19 2021-12-07 杭州米数科技有限公司 Name extraction and standardization method and device in medical bill, computing equipment and storage medium
CN114299528A (en) * 2021-12-27 2022-04-08 万达信息股份有限公司 Information extraction and structuring method for scanned document
KR20220066737A (en) * 2020-11-16 2022-05-24 주식회사 솔트룩스 Knowledge extraction system for scientific technology papers

Also Published As

Publication number Publication date
CN115050042B (en) 2025-04-08

CN117076775A (en) Information data processing method, information data processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant