CN114820211B - Method, device, computer equipment and storage medium for checking and verifying quality of claim data - Google Patents
- Publication number: CN114820211B (application number CN202210447277.3A)
- Authority: CN (China)
- Legal status: Active (an assumption by Google, not a legal conclusion)
Classifications
- G06Q40/08—Insurance
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification based on parametric or probabilistic models
- G06F18/253—Fusion techniques of extracted features
- G06F40/216—Parsing using statistical methods
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
- G06N3/045—Combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/08—Learning methods
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/90—Determination of colour characteristics
- G06T2207/30168—Image quality inspection
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The embodiment of the application belongs to the field of artificial intelligence and big data, and relates to a method for quality inspection and verification of claims data, comprising the following steps: receiving a plurality of images uploaded by a client; performing image quality inspection through a sharpness detection model, an orientation correction model and a recapture detection model; performing text recognition, through a preset OCR model, on the images that pass quality inspection; taking the recognized text content as model input, outputting a first feature vector and a second feature vector of each image through a first preset model and a second preset model respectively, and performing feature fusion on the first and second feature vectors to obtain a target feature vector of each image; and obtaining a classification label for each image from its target feature vector for verification. The application also provides a claims data quality inspection device, a computer device and a storage medium. In addition, the application relates to blockchain technology: the user's private information produced during verification can be stored on a blockchain. The application enables intelligent verification of claims data, shortens the claims settlement period and improves the customer experience.
Description
Technical Field
The present application relates to the field of big data technologies, and in particular, to a method and apparatus for verifying quality of claim settlement data, a computer device, and a storage medium.
Background
In the claims settlement process, customers must submit supporting materials for a case to prove its reasonableness and authenticity. Traditionally, the collected claims materials are checked manually: when errors are found, or when the material quality (for example image sharpness) does not meet requirements, a defect notice is issued asking the customer to re-submit. This checking process consumes considerable manpower and material resources; manual quality inspection is slow and inefficient, which prolongs the customer's claims settlement period and harms the customer experience.
Disclosure of Invention
The embodiment of the application aims to provide a method, an apparatus, a computer device and a storage medium for quality inspection of claims data, so as to solve the problems in the prior art that checking claims data consumes substantial manpower and material resources and is inefficient.
In order to solve the technical problems, the embodiment of the application provides a method for checking the quality of claim settlement data, which adopts the following technical scheme:
a method for checking the quality of claim data includes the following steps:
receiving, in response to a claims data upload request from a client, the claims data uploaded by the client, wherein the claims data comprise a plurality of images;
performing image quality inspection on the plurality of images through a preset sharpness detection model, an orientation correction model and a recapture detection model, and outputting an image quality inspection result for each image;
performing text recognition, through a preset OCR model, on the images whose quality inspection results meet the requirements, to obtain the corresponding text content;
preprocessing the text content and taking it as model input, outputting a first feature vector of each image through a first preset model and a second feature vector of each image through a second preset model, and performing feature fusion on the first and second feature vectors to obtain a target feature vector of each image; and
inputting the target feature vectors into a preset classification model to obtain a classification label for each image, verifying the claims data based on the classification labels, and outputting the verification result.
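The five steps above can be summarized in a sketch; every callable here is a hypothetical placeholder standing in for the preset models named in the claims, not the patented implementation:

```python
def verify_claims_data(images, quality_models, ocr, model_a, model_b, classifier):
    """End-to-end flow of the claimed method; all callables are placeholders."""
    passed = [img for img in images
              if all(m(img) for m in quality_models)]   # step 2: quality inspection
    results = {}
    for img in passed:
        text = ocr(img)                                 # step 3: OCR
        v1, s1 = model_a(text)                          # first feature vector + score
        v2, s2 = model_b(text)                          # second feature vector + score
        target = v1 if s1 >= s2 else v2                 # step 4: feature fusion
        results[img] = classifier(target)               # step 5: classification label
    return results

# toy stand-ins just to exercise the flow
labels = verify_claims_data(
    images=["invoice.jpg", "blurry.jpg"],
    quality_models=[lambda img: img != "blurry.jpg"],
    ocr=lambda img: "total fee 100",
    model_a=lambda t: ([1.0], 0.9),
    model_b=lambda t: ([0.0], 0.4),
    classifier=lambda v: "invoice" if v == [1.0] else "other",
)
print(labels)  # {'invoice.jpg': 'invoice'}
```

The image that fails quality inspection never reaches OCR, matching the gating described in the claims.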
Further, the first preset model is a bag-of-words model, and the step of outputting the first feature vector of each image through the first preset model includes:
for each image, performing word segmentation on the text content corresponding to the image to obtain a plurality of tokens; extracting a plurality of feature fields from the tokens and screening them to form a feature-field set; determining the distribution of each feature field in the set over a histogram built on the bag-of-words model to obtain an intermediate feature vector, and outputting the corresponding probability value, recorded as the first score;
normalizing the intermediate feature vector and then reducing its dimensionality to obtain the first feature vector together with its first score.
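A minimal sketch of the bag-of-words feature step above. The vocabulary, tokens and L1 normalization used here are illustrative assumptions; the dimensionality reduction and the probability score, which the patent assigns to a trained model, are omitted:

```python
from collections import Counter

def bow_feature(tokens, vocab):
    """Histogram of vocabulary-word counts for one image's OCR tokens.

    `vocab` is the word bag built beforehand; its order fixes the vector layout.
    """
    counts = Counter(t for t in tokens if t in vocab)
    vec = [counts[w] for w in vocab]               # intermediate feature vector
    total = sum(vec)
    # L1 normalization (an assumption); PCA-style reduction would follow here.
    return [v / total if total else 0.0 for v in vec]

vocab = ["invoice", "hospital", "diagnosis", "amount", "date"]
tokens = ["hospital", "invoice", "amount", "amount", "stamp"]
print(bow_feature(tokens, vocab))  # [0.25, 0.25, 0.0, 0.5, 0.0]
```

Words outside the bag ("stamp" above) simply do not contribute, which is the usual behavior of a fixed-vocabulary histogram.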
Further, the second preset model is a convolutional neural network model, and the step of outputting the second feature vector of each image through the second preset model includes:
for each image, concatenating the text content of the image to obtain a concatenated text, inputting the concatenated text into the convolutional neural network model to obtain the second feature vector of the image, and outputting the corresponding probability value, recorded as the second score;
the step of performing feature fusion based on the first feature vector and the second feature vector to obtain the target feature vector of each image specifically comprises:
comparing, for each image, the corresponding first score with the second score; if the first score is larger than the second score, taking the first feature vector corresponding to the first score as the target feature vector, and if the first score is smaller than the second score, taking the second feature vector corresponding to the second score as the target feature vector.
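The score-based fusion above amounts to keeping whichever branch is more confident. A sketch follows; the patent leaves the equal-score case unspecified, so as an assumption ties here fall to the first (bag-of-words) branch:

```python
def fuse(first_vec, first_score, second_vec, second_score):
    """Pick the feature vector whose model reported the higher confidence.

    Tie-breaking toward the first branch is an assumption, not stated
    in the source.
    """
    return first_vec if first_score >= second_score else second_vec

assert fuse([1, 0], 0.8, [0, 1], 0.6) == [1, 0]   # first score wins
assert fuse([1, 0], 0.4, [0, 1], 0.9) == [0, 1]   # second score wins
```

Note that this is a selection, not a concatenation: the target vector always has the dimensionality of exactly one branch.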
Further, before the step of outputting the first feature vector of each image through the first preset model, the method further includes a step of constructing the bag-of-words model, specifically:
extracting text tokens from images corresponding to different types of claims data to form a token set, wherein the extracted tokens are locally invariant features of the images;
merging tokens with similar senses in the token set, constructing a word list of K words, counting the number of occurrences of each word of the word list in each image, representing each image as a K-dimensional numeric vector, and clustering the K-dimensional vectors according to a preset dictionary size to obtain a bag of M words, completing the construction of the bag-of-words model.
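The construction step above can be sketched as follows. As a simplification, this sketch keeps the M most frequent words instead of the synonym merging and clustering the patent describes, and the example documents are invented:

```python
from collections import Counter

def build_bag(image_token_lists, M):
    """Build an M-word bag from per-image OCR tokens.

    Stand-in for the patent's construction: frequency selection replaces
    the synonym-merging and K-to-M clustering steps.
    """
    freq = Counter()
    for tokens in image_token_lists:
        freq.update(tokens)
    vocab = [w for w, _ in freq.most_common(M)]   # the M-word bag
    # each image represented as a count vector over the bag
    vectors = [[Counter(t)[w] for w in vocab] for t in image_token_lists]
    return vocab, vectors

docs = [["fee", "fee", "total"], ["record", "fee"], ["total", "record", "fee"]]
bag, vecs = build_bag(docs, M=2)
print(bag, vecs)
```

Once built, the bag is frozen and reused for every image, so all count vectors share the same layout.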
Further, performing image quality inspection on the plurality of images through the preset recapture detection model includes:
performing color space conversion on the image under recapture detection, converting the original image from the RGB color space to the YCbCr color space, and normalizing the converted luminance values;
transforming the color-converted image through a preset processing function to obtain new luminance values with specular reflection removed, and obtaining a first difference from the new luminance values and the normalized luminance values;
transforming the image before color space conversion through the preset processing function to obtain RGB values with specular reflection removed, and obtaining a second difference from the new RGB values and the RGB values before conversion; and
judging, from the difference between the first difference and the second difference, whether the image under recapture detection is a recaptured photo.
Further, before the step of performing image quality inspection on the plurality of images through the preset sharpness detection model, orientation correction model and recapture detection model, the method further includes:
deduplicating the plurality of images to remove identical claims data uploaded by the user from different clients, checking the data categories of the deduplicated claims data, and prompting the user to upload the missing claims data when any data category lacks the corresponding claims data.
Further, after the step of performing text recognition, through the preset OCR model, on the images whose quality inspection results meet the requirements, the method further includes:
establishing and storing a mapping between the recognized text content and the corresponding image, so that when responding to a text-content tracing request, the corresponding image can be obtained and displayed according to the text contained in the request.
In order to solve the technical problems, the embodiment of the application also provides a device for checking the quality of the claim settlement data, which adopts the following technical scheme:
An apparatus for quality inspection and verification of claims data, comprising:
a receiving module, configured to receive, in response to a claims data upload request from a client, the claims data uploaded by the client, wherein the claims data comprise a plurality of images;
a quality inspection module, configured to perform image quality inspection on the plurality of images through a preset sharpness detection model, orientation correction model and recapture detection model, and to output an image quality inspection result for each image;
a text recognition module, configured to perform text recognition, through a preset OCR model, on the images whose quality inspection results meet the requirements, to obtain the corresponding text content;
a feature extraction module, configured to preprocess the text content and take it as model input, to output a first feature vector of each image through a first preset model and a second feature vector of each image through a second preset model, and to perform feature fusion on the first and second feature vectors to obtain a target feature vector of each image; and
a verification module, configured to input the target feature vectors into a preset classification model to obtain a classification label for each image, to verify the claims data based on the classification labels, and to output the verification result.
In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:
A computer device, comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the method for quality inspection of claims data described above.
In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:
A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the method for quality inspection of claims data described above.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
The scheme performs multi-stage quality inspection and verification on claims data: first, image quality inspection of the plurality of images through the sharpness detection model, the orientation correction model and the recapture detection model; then, extraction of the text content through the OCR model; then, feature extraction and fusion by the two models to obtain the target feature vectors for image classification; and finally, after the classification labels are obtained, intelligent verification of the claims data. The whole quality inspection and verification process is shortened to the millisecond level, and non-compliant claims data can be intercepted quickly, which reduces the issuance of defect notices, shortens the claims settlement period and improves the customer experience.
Drawings
In order to illustrate the solution of the present application more clearly, the drawings needed to describe its embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the application; a person of ordinary skill in the art could obtain other drawings from them without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method of claim document quality verification in accordance with the present application;
FIG. 3 is a schematic diagram illustrating an embodiment of a claim document quality inspection device according to the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the method for verifying the quality of the claim data provided by the embodiment of the application is generally executed by a server, and accordingly, the device for verifying the quality of the claim data is generally disposed in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a method of claim document quality verification according to the present application is shown. The method for verifying the quality of the claim data comprises the following steps:
step S201, receiving the claim settlement data uploaded by the client in response to the claim settlement data uploading request of the client, wherein the claim settlement data comprises a plurality of images.
In this embodiment, the electronic device (for example, the server shown in fig. 1) on which the method for verifying the quality of the claim data operates may respond to the request for uploading the claim data of the client through a wired connection manner or a wireless connection manner, and receive the claim data uploaded by the client. It should be noted that the wireless connection may include, but is not limited to, 3G/4G/5G connection, wiFi connection, bluetooth connection, wiMAX connection, zigbee connection, UWB (ultra wideband) connection, and other now known or later developed wireless connection.
Specifically, when a user needs to file a claim, the user initiates a claims data upload request, from a mobile phone or computer client, to the server that handles claims data collection. After responding to the request, the server receives the claims data uploaded from the user's device. The claims data comprise the customer identity documents, medical treatment materials and medical expense materials required during claims settlement: the medical treatment materials include outpatient records, hospitalization records, pathology reports, and the like; the medical expense materials include medical invoices, itemized medical bills, social security settlement statements, and the like. The claims data are stored after verification in the claims settlement system. In this embodiment, the claims data are received as pictures, and after reception the pictures are stored as a data set.
Step S202, performing image quality inspection on the plurality of images through a preset sharpness detection model, orientation correction model and recapture detection model, and outputting an image quality inspection result for each image.
In this embodiment, the sharpness detection model may be an existing image sharpness detection model, for example one based on gradient operators or a trained convolutional neural network model, which is not limited herein. After sharpness is found qualified, the images that meet the requirements are corrected by the orientation correction model, which automatically detects the tilt direction and tilt angle of an image from its image features and corrects the tilt.
In this embodiment, performing image quality inspection on the plurality of images through the preset recapture detection model includes: performing color space conversion on the image under recapture detection, converting the original image from the RGB color space to the YCbCr color space, and normalizing the converted luminance values; transforming the color-converted image through a preset processing function to obtain new luminance values with specular reflection removed, and obtaining a first difference from the new luminance values and the normalized luminance values; transforming the image before color space conversion through the preset processing function to obtain RGB values with specular reflection removed, and obtaining a second difference from the new RGB values and the RGB values before conversion; and judging, from the difference between the first difference and the second difference, whether the image under recapture detection is a recaptured photo. Specifically, the preset processing function is y = ax³ + bx² + cx, where a, b and c are constants, x is the value before transformation and y is the value after transformation. When the difference between the first difference and the second difference exceeds a preset threshold, the image is judged to be a recaptured photo.
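A hedged sketch of the recapture check described above. The BT.601 luma conversion is standard, but the patent does not publish the coefficients a, b, c of its preset function or the decision threshold, so the values below are placeholders:

```python
import numpy as np

def rgb_to_ycbcr_y(rgb):
    """Luminance (Y) channel of the BT.601 RGB -> YCbCr conversion, inputs in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

def despecular(x, a=-0.5, b=0.2, c=1.3):
    """The patent's preset mapping y = a*x^3 + b*x^2 + c*x.

    a, b, c are placeholder constants; the patent fixes them but does
    not disclose their values.
    """
    return a * x**3 + b * x**2 + c * x

def recapture_score(rgb):
    """|first difference - second difference| over the two branches above."""
    y = rgb_to_ycbcr_y(rgb)                      # normalized luminance
    d1 = np.abs(despecular(y) - y).mean()        # first difference (luminance branch)
    d2 = np.abs(despecular(rgb) - rgb).mean()    # second difference (RGB branch)
    return abs(d1 - d2)

img = np.random.default_rng(1).random((8, 8, 3))
score = recapture_score(img)
print(score >= 0.0)  # True; flag as recaptured when score exceeds a preset threshold
```

The intuition is that a recaptured photo carries screen or paper glare, so removing specular reflection perturbs its luminance and RGB channels by noticeably different amounts.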
In this embodiment, for images whose detected sharpness does not meet the standard and images judged to be recaptured photos, a re-upload prompt is sent to the user's mobile phone or computer client. After a newly uploaded image is received, quality inspection by the sharpness detection model, the orientation correction model and the recapture detection model is performed again; the subsequent steps are performed only after all quality inspections pass.
In other embodiments, the three models may be executed in a different order, for example: the sharpness detection model, then the recapture detection model, then the direction correction model.
In some embodiments, before the step of performing image quality inspection on the plurality of images through the preset sharpness detection model, direction correction model and recapture detection model, the method further includes: performing a deduplication operation on the plurality of images to remove identical claim data uploaded by the user from different clients; performing a data category check on the deduplicated claim data; and, when any data category lacks the corresponding claim data, prompting the user to upload the missing data. This scheme removes identical claim data uploaded by the user through different upload entrances due to improper operation, and prompts the user to confirm whether any claim data is missing.
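The deduplication and category check can be sketched as below. The exact-byte hash and the category names are illustrative assumptions; the patent does not specify how duplicates are detected, and a perceptual hash could be substituted for near-duplicate images.

```python
import hashlib

def dedup_images(images):
    """Drop byte-identical images uploaded via different clients,
    keyed by SHA-256 of the raw image bytes."""
    seen, unique = set(), []
    for img_bytes in images:
        digest = hashlib.sha256(img_bytes).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(img_bytes)
    return unique

def missing_categories(labeled, required=("certificate", "invoice", "medical_record")):
    """Return the required data categories with no uploaded document
    (category names here are assumptions for illustration)."""
    present = {label for label, _ in labeled}
    return [c for c in required if c not in present]
```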
Step S203: perform text recognition, through a preset OCR model, on the images whose image quality inspection results meet the requirements, to obtain the corresponding text content.
In this embodiment, all claim data that passes quality inspection by the sharpness detection model, the direction correction model and the recapture detection model is transmitted to the OCR model, which extracts the text content contained in each image.
In this embodiment, after the step of performing text recognition through the preset OCR model on the images whose quality inspection results meet the requirements, the method further includes: establishing and storing a mapping relation between the recognized text content and the corresponding image, so that, when responding to a text content tracing request, the corresponding image can be obtained and displayed according to the text contained in the request. Establishing this mapping relation facilitates subsequent text content tracing; when a single batch contains many images, the mapping relation can also be used to quickly obtain the set of images for which text recognition failed, so that recognition can be retried.
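A minimal sketch of such a text-to-image mapping follows. The class and method names are illustrative, not from the patent, and a production system would back this index with persistent storage.

```python
class TextImageIndex:
    """Inverted index from recognized text tokens to image identifiers,
    supporting the text content tracing described above."""

    def __init__(self):
        self._index = {}  # token -> set of image ids

    def add(self, image_id, text):
        """Record the mapping between an image and its recognized text."""
        for token in text.split():
            self._index.setdefault(token, set()).add(image_id)

    def trace(self, query):
        """Return ids of images whose recognized text contains every query token."""
        sets = [self._index.get(t, set()) for t in query.split()]
        return set.intersection(*sets) if sets else set()
```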
Step S204: preprocess the text content, take the preprocessed text content as model input, output a first feature vector of each image through a first preset model and a second feature vector of each image through a second preset model, and perform feature fusion based on the first feature vector and the second feature vector to obtain a target feature vector for each image.
In this embodiment, preprocessing the text content specifically means inputting it into a preprocessing module that performs operations such as lemmatization, part-of-speech tagging and number normalization. Lemmatization is applied to the English words in the text; part-of-speech tagging marks each word as a noun, verb, adjective or other part of speech; and number normalization mainly regularizes the fee-related figures in images such as the medical invoices among the claim materials, for later verification of the claim amount.
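The preprocessing step can be sketched as below. The lemma table and the number-normalization rules (stripping thousands separators and the currency marker 元) are assumptions for illustration; a real system would use a full lemmatizer and POS tagger, which are omitted here.

```python
import re

def normalize_numbers(text):
    """Normalize fee-related figures, e.g. '1,234.50元' -> '1234.50',
    so invoice amounts can be summed and checked later (rules assumed)."""
    text = text.replace(",", "")
    return re.sub(r"(\d+(?:\.\d+)?)\s*元", r"\1", text)

# tiny stand-in lemma table; a real lemmatizer would replace this
LEMMAS = {"paid": "pay", "fees": "fee"}

def preprocess(text):
    """Lemmatize English words and normalize numbers (POS tagging omitted)."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in normalize_numbers(text).split()]
```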
In this embodiment, for a first feature vector and a second feature vector of each image output by a first preset model and a second preset model, respectively, the dimensions of the first feature vector and the second feature vector are the same.
In this embodiment, the first preset model is a bag-of-words model, and the step of outputting the first feature vector of each image through the first preset model includes: for each image, performing word segmentation on the text content corresponding to the image to obtain a plurality of segmented words; extracting a plurality of feature fields from the segmented words and screening them to form a feature field set; determining the distribution of each feature field of the set in a histogram built on the bag-of-words model to obtain an intermediate feature vector; outputting the probability value corresponding to each first feature vector, recorded as the first score; and normalizing the intermediate feature vector and then reducing its dimension to obtain the first feature vector and the corresponding first score. In a claim scenario, the claim materials fall into different types such as certificates, medical records, diagnostic materials, invoices, lists, reimbursement certificates and bank cards, and different types have different feature fields: for example, the feature fields of an identity card include name, number and validity period, while those of a medical record include hospital, admission diagnosis, admission date and name. The dimension of the intermediate feature vector formed by the bag-of-words model's understanding of the text content is much larger than the dimension of the second feature vector, so the intermediate feature vector must be dimension-reduced to obtain a first feature vector of the same dimension as the second feature vector.
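The bag-of-words histogram and the subsequent normalize-then-reduce step can be sketched as follows. The random-projection reduction is a stand-in assumption; the patent does not name the dimension-reduction method, and PCA or a learned projection could be used instead.

```python
import numpy as np

def bow_vector(tokens, vocab):
    """Histogram of feature fields over a fixed vocabulary (bag-of-words)."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = np.zeros(len(vocab))
    for t in tokens:
        if t in index:
            vec[index[t]] += 1
    return vec

def reduce_dim(vec, target_dim, seed=0):
    """Normalize the intermediate vector, then project it down to the
    second vector's dimension (random projection used as a stand-in)."""
    rng = np.random.default_rng(seed)
    norm = vec / (np.linalg.norm(vec) + 1e-12)
    proj = rng.standard_normal((target_dim, len(vec)))
    return proj @ norm
```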
Further, the second preset model is a convolutional neural network model, and the step of outputting the second feature vector of each image through the second preset model includes: for each image, concatenating the text content of the image to obtain a concatenated text; inputting the concatenated text into the convolutional neural network model to obtain the second feature vector of the image; and outputting the probability value corresponding to each second feature vector, recorded as the second score. When the convolutional neural network model is trained, the feature fields of the different claim data types are manually annotated (for example, the feature fields of an identity card include name, number and validity period, while those of a medical record include hospital, admission diagnosis, admission date and name), and the layouts of different materials also have certain characteristics, such as an invoice title usually appearing in the upper middle of the document; these characteristics are manually defined and supervised. Training the convolutional neural network model mainly covers understanding of the text content and of the document layout, and the layout relations of the feature fields of different materials are established during continuous learning.
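A toy numpy rendering of the second model's forward pass is shown below: embed the concatenated text, apply 1-D convolution filters with global max pooling, and report the top softmax probability as the second score. All weights are stand-ins; a trained network (e.g. in PyTorch) would replace them.

```python
import numpy as np

def text_cnn_features(tokens, embed, filters):
    """Sketch of a text-CNN: `embed` maps token -> vector, `filters` is a
    list of (width, weights) pairs with weights of shape (width * dim,).
    Returns one max-pooled activation per filter plus the top softmax value."""
    dim = len(next(iter(embed.values())))
    x = np.array([embed.get(t, [0.0] * dim) for t in tokens])  # (T, dim)
    feats = []
    for width, w in filters:
        # slide the filter over every window of `width` tokens
        acts = [float(np.dot(x[i:i + width].ravel(), w))
                for i in range(len(x) - width + 1)]
        feats.append(max(acts) if acts else 0.0)  # global max pooling
    vec = np.array(feats)
    probs = np.exp(vec) / np.exp(vec).sum()
    return vec, float(probs.max())  # second feature vector and second score
```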
The step of performing feature fusion based on the first feature vector and the second feature vector to obtain the target feature vector of each image specifically includes: comparing, for each image, the corresponding first score with the second score; if the first score is larger, taking the first feature vector as the target feature vector; if the second score is larger, taking the second feature vector as the target feature vector.
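This score-based selection is straightforward to express; note that the text does not say how ties are broken, so resolving a tie toward the first model below is an assumption.

```python
def fuse(first_vec, first_score, second_vec, second_score):
    """Keep the feature vector whose model reported the higher confidence
    (ties resolved toward the first model — an assumption)."""
    return first_vec if first_score >= second_score else second_vec
```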
In this embodiment, before the step of outputting the first feature vector of each image through the first preset model, the method further includes a step of constructing the bag-of-words model, specifically: extracting text tokens from images corresponding to different types of claim data to form a token set, the extracted tokens being locally invariant features in the images; merging tokens with similar senses in the token set to construct a word list containing K words; counting the number of occurrences of each word of the word list in each image, so that each image is represented as a K-dimensional count vector; and clustering the plurality of K-dimensional count vectors according to a preset dictionary size to obtain a bag of M words, completing construction of the bag-of-words model.
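The final clustering step can be sketched with a toy k-means over the K-dimensional count vectors; the patent does not fix the clustering algorithm, so k-means is an assumption and the M cluster centers stand in for the bag of M words.

```python
import numpy as np

def build_bag(image_vectors, m, n_iter=10, seed=0):
    """Cluster the K-dimensional count vectors into M groups (toy k-means);
    the returned centers play the role of the M-word bag."""
    rng = np.random.default_rng(seed)
    X = np.asarray(image_vectors, dtype=float)
    centers = X[rng.choice(len(X), size=m, replace=False)]
    for _ in range(n_iter):
        # assign each vector to its nearest center, then recompute centers
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(m):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers
```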
Step S205: input the target feature vectors into a preset classification model to obtain the classification label of each image, verify the claim data based on the classification labels, and output the verification results.
In this embodiment, after feature fusion is completed, the target feature vector of each image is input into a multi-class classification model, so as to classify each image by document/certificate type.
Checking the claim data based on the document/certificate classification result, i.e. the classification label, specifically includes checking whether the data uploaded by the customer is complete. For example, in a medical insurance claim, if the claim limit declared by the customer is X but the total amount of the uploaded invoices is smaller than X, the customer has omitted an invoice or uploaded a wrong one, and in case of omission the user is prompted to supplement the missing data.
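The invoice-total completeness check from this example reduces to a simple comparison; the message text below is illustrative, not the patented prompt.

```python
def check_invoice_total(declared_limit, invoice_amounts):
    """If the uploaded invoices sum to less than the declared claim limit X,
    an invoice is likely missing or wrong, so prompt the user to supplement it."""
    total = sum(invoice_amounts)
    if total < declared_limit:
        return f"total {total} < declared {declared_limit}: please upload the missing invoice(s)"
    return "complete"
```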
In this embodiment, the method further includes optimizing the models adopted in the foregoing embodiments. Specifically, each picture uploaded by the user is labeled based on the result of the current model identification (including sharpness, material type, key parameter values, etc.), and whether the user proceeds with the upload is recorded in the system as a tracking point. For example, if a picture is detected as an invoice but the user insists on uploading it under the list category, the user evidently considers the picture to be a list, so the model identification is most likely wrong; the model then automatically performs error-correction training with the picture labeled as list data, thereby realizing online learning and continuously optimizing the model. Further, the recorded claim information is kept as the customer's history, and the history is used as input data of an optimization algorithm to optimize the claim service.
The method performs multi-stage quality inspection and verification on the claim data: it first performs image quality inspection on the plurality of images through the sharpness detection model, the direction correction model and the recapture detection model, then extracts text content through the OCR model, obtains the target feature vectors for image classification through feature extraction and fusion by the two preset models, and verifies the claim data after obtaining the classification labels. Intelligent claim data verification can thus be realized, shortening the whole quality inspection and verification process to the millisecond level and quickly intercepting non-compliant claim data, which reduces the issuing of defective cases, shortens the claim settlement period and improves the customer experience.
It should be emphasized that, to further ensure the privacy and security of the claim material, the user privacy information in the claim material may also be stored in a node of a blockchain.
The blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, each block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by computer readable instructions stored in a computer readable storage medium which, when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a nonvolatile storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a device for quality inspection and verification of claim data; the device embodiment corresponds to the method embodiment shown in fig. 2, and the device may specifically be applied to various electronic devices.
As shown in fig. 3, the claim data quality inspection and verification device according to the present embodiment includes: a receiving module 301, a quality inspection module 302, a text recognition module 303, a feature extraction module 304, and a verification module 305. Wherein:
The receiving module 301 is configured to respond to a claim data upload request of a client and receive the claim data uploaded by the client, where the claim data includes a plurality of images; the quality inspection module 302 is configured to perform image quality inspection on the plurality of images through a preset sharpness detection model, a preset direction correction model and a preset recapture detection model, and output an image quality inspection result for each image; the text recognition module 303 is configured to perform text recognition, through a preset OCR model, on the images whose quality inspection results meet the requirements, to obtain the corresponding text content; the feature extraction module 304 is configured to preprocess the text content and take it as model input, output a first feature vector of each image through a first preset model and a second feature vector of each image through a second preset model, and perform feature fusion based on the first and second feature vectors to obtain a target feature vector for each image; the verification module 305 is configured to input the target feature vectors into a preset classification model to obtain the classification label of each image, verify the claim data based on the classification labels, and output the verification results.
In this embodiment, when a user needs to file a claim, the user's mobile phone or computer initiates a claim data upload request to the server responsible for claim data collection, and after responding to the request, the receiving module 301 receives the claim data uploaded from the user's mobile phone or computer. The claim data includes the customer certificate data, medical diagnosis and treatment data, and medical fee data required in the claim process: the medical diagnosis and treatment data include outpatient records, inpatient records and pathology reports, the medical fee data include medical invoices, medical manifests, social security settlement sheets, and the like, and the claim data are stored after verification in the claim system. In this embodiment, the claim data are received by the receiving module 301 in the form of pictures, which are stored as data sets after receipt.
In this embodiment, the sharpness detection model adopted by the quality inspection module 302 may be any existing image sharpness detection model, for example one based on gradient operators or a trained convolutional neural network model, which is not limited herein. When the quality inspection module 302 detects that sharpness is qualified, the qualifying image is corrected by the direction correction model; image tilt correction automatically detects the tilt direction and tilt angle of the image from its characteristics. In this embodiment, the quality inspection module 302 may detect the tilt direction and angle by any of a projection-based method, a Hough-transform-based method, a line-fitting-based method, or a method based on Fourier transformation to the frequency domain, and then correct the image accordingly. After correction is completed, the quality inspection module 302 performs image quality inspection through the recapture detection model.
In this embodiment, when performing image quality inspection on the plurality of images through the preset recapture detection model, the quality inspection module 302 is specifically configured to: perform color space conversion on the image to be subjected to recapture detection, converting the original image from the RGB color space to the YCbCr color space, and normalize the converted brightness values; convert the color-space-converted image through a preset processing function to obtain new brightness values after specular reflection removal, and obtain a first difference value from the new brightness values and the normalized brightness values; convert the image before color space conversion through the preset processing function to obtain new RGB values after specular reflection removal, and obtain a second difference value from the new RGB values and the RGB values before conversion; and judge whether the image is a recaptured photograph according to the difference between the first difference value and the second difference value. Specifically, the preset processing function is y = ax³ + bx² + cx, where a, b and c are constants, x is the value before conversion, and y is the value after conversion. When making this judgment, the quality inspection module 302 compares the difference between the first difference value and the second difference value with a preset threshold value, and judges the image to be a recaptured photograph when the difference exceeds the threshold.
In this embodiment, for images whose detected sharpness does not meet the standard and for images judged to be recaptured photographs, the device sends a re-upload prompt message to the mobile phone or computer used by the user; after the receiving module 301 receives a newly uploaded image, the quality inspection module 302 again performs quality inspection with the sharpness detection model, the direction correction model and the recapture detection model, and the subsequent steps are performed only after all quality inspections pass.
In other embodiments, the three models may be executed in a different order, for example: the sharpness detection model, then the recapture detection model, then the direction correction model.
In some embodiments, the receiving module 301 is further configured, before the step of performing image quality inspection on the plurality of images through the preset sharpness detection model, direction correction model and recapture detection model, to perform a deduplication operation on the plurality of images so as to remove identical claim data uploaded by the user from different clients, to perform a data category check on the deduplicated claim data, and to prompt the user to upload the missing data when any data category lacks the corresponding claim data. This scheme removes identical claim data uploaded by the user through different upload entrances due to improper operation, and prompts the user to confirm whether any claim data is missing.
In this embodiment, for all claim data that passes quality inspection by the sharpness detection model, the direction correction model and the recapture detection model, the text recognition module 303 obtains the text content contained in each image through the OCR model, for subsequent processing by the feature extraction module 304 and the verification module 305.
In this embodiment, the text recognition module 303 is further configured, after the step of performing text recognition through the preset OCR model on the images whose quality inspection results meet the requirements, to establish and store a mapping relation between the recognized text content and the corresponding image, so that, when responding to a text content tracing request, the corresponding image can be obtained and displayed according to the text contained in the request. The text recognition module 303 establishes this mapping relation to facilitate subsequent text content tracing; when a single batch contains many images, the mapping relation can also be used to quickly obtain the set of images for which text recognition failed, so that recognition can be retried.
In this embodiment, when preprocessing the text content, the feature extraction module 304 specifically performs operations such as lemmatization, part-of-speech tagging and number normalization on it. Lemmatization is applied to the English words in the text; part-of-speech tagging marks each word as a noun, verb, adjective or other part of speech; and number normalization mainly regularizes the fee-related figures in images such as the medical invoices among the claim materials, for later verification of the claim amount.
In this embodiment, for a first feature vector and a second feature vector of each image output by a first preset model and a second preset model, respectively, the dimensions of the first feature vector and the second feature vector are the same.
In this embodiment, the first preset model is a bag-of-words model, and when outputting the first feature vector of each image through the first preset model, the feature extraction module 304 is specifically configured to: for each image, perform word segmentation on the text content corresponding to the image to obtain a plurality of segmented words; extract a plurality of feature fields from the segmented words and screen them to form a feature field set; determine the distribution of each feature field of the set in a histogram built on the bag-of-words model to obtain an intermediate feature vector; output the probability value corresponding to each first feature vector, recorded as the first score; and normalize the intermediate feature vector and then reduce its dimension to obtain the first feature vector and the corresponding first score. In a claim scenario, the claim materials fall into different types such as certificates, medical records, diagnostic materials, invoices, lists, reimbursement certificates and bank cards, and different types have different feature fields: for example, the feature fields of an identity card include name, number and validity period, while those of a medical record include hospital, admission diagnosis, admission date and name. The dimension of the intermediate feature vector formed by the bag-of-words model's understanding of the text content is much larger than the dimension of the second feature vector, so the intermediate feature vector must be dimension-reduced to obtain a first feature vector of the same dimension as the second feature vector.
Further, the second preset model is a convolutional neural network model, and the feature extraction module 304 is specifically configured to: for each image, concatenate the text content of the image to obtain a concatenated text; input the concatenated text into the convolutional neural network model to obtain the second feature vector of the image; and output the probability value corresponding to each second feature vector, recorded as the second score. When the convolutional neural network model is trained, the feature fields of the different claim data types are manually annotated (for example, the feature fields of an identity card include name, number and validity period, while those of a medical record include hospital, admission diagnosis, admission date and name), and the layouts of different materials also have certain characteristics, such as an invoice title usually appearing in the upper middle of the document; these characteristics are manually defined and supervised. Training the convolutional neural network model mainly covers understanding of the text content and of the document layout, and the layout relations of the feature fields of different materials are established during continuous learning.
Further, when performing feature fusion based on the first feature vector and the second feature vector, the feature extraction module 304 is specifically configured to: compare, for each image, the corresponding first score with the second score; if the first score is larger, take the first feature vector as the target feature vector; if the second score is larger, take the second feature vector as the target feature vector.
In this embodiment, the device further includes a model building module, configured, before the feature extraction module 304 outputs the first feature vector of each image through the first preset model, to: extract text tokens from images corresponding to different types of claim data to form a token set, the extracted tokens being locally invariant features in the images; merge tokens with similar senses in the token set to construct a word list containing K words; count the number of occurrences of each word of the word list in each image, so that each image is represented as a K-dimensional count vector; and cluster the plurality of K-dimensional count vectors according to a preset dictionary size to obtain a bag of M words, completing construction of the bag-of-words model.
In this embodiment, after the feature extraction module 304 completes feature fusion, the verification module 305 inputs the target feature vector of each image into the multi-class classification model, so as to classify each image by document/certificate type. When verifying the claim data based on the document/certificate classification result, i.e. the classification label, the verification module 305 specifically checks whether the data uploaded by the customer is complete. For example, in a medical insurance claim, if the claim limit declared by the customer is X but the total amount of the uploaded invoices is smaller than X, the customer has omitted an invoice or uploaded a wrong one, and in case of omission the user is prompted to supplement the missing data.
In this embodiment, the model building module is further configured to optimize the models adopted in the foregoing embodiments. Specifically, each picture uploaded by the user is labeled based on the result of the current model identification (including sharpness, material type, key parameter values, etc.), and whether the user proceeds with the upload is recorded as a tracking point. For example, if a picture is detected as an invoice but the user insists on uploading it under the list category, the user evidently considers the picture to be a list, so the model identification is most likely wrong; the model then automatically performs error-correction training with the picture labeled as list data, thereby realizing online learning and continuously optimizing the model. Further, the recorded claim information is kept as the customer's history, and the history is used as input data of an optimization algorithm to optimize the claim service.
The device performs multi-stage quality inspection and verification on the claim data: the quality inspection module 302 first performs image quality inspection on the plurality of images through the sharpness detection model, the direction correction model and the recapture detection model; the text recognition module 303 then extracts text content through the OCR model; the feature extraction module 304 obtains the target feature vectors for image classification through feature extraction and fusion by the two preset models; and, after the classification labels are obtained, the verification module 305 verifies the claim data. Intelligent claim data verification can thus be realized, shortening the whole quality inspection and verification process to the millisecond level and quickly intercepting non-compliant claim data, which reduces the issuing of defective cases, shortens the claim settlement period and improves the customer experience.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 that are communicatively connected to each other via a system bus. The memory 41 stores computer readable instructions, and the processor 42 executes the computer readable instructions to implement the steps of the method for quality inspection and verification of claim data according to the above embodiments.
It should be noted that only a computer device 4 having components 41-43 is shown in FIG. 4, but it should be understood that not all of the illustrated components need be implemented, and that more or fewer components may be implemented instead. Those skilled in the art will appreciate that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or other computing device. The computer device may interact with a user through a keyboard, a mouse, a remote controller, a touch pad, a voice control device or the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit and an external storage device of the computer device 4. In this embodiment, the memory 41 is typically used to store the operating system and various application software installed on the computer device 4, such as the computer readable instructions of the claim data quality inspection and verification method. The memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may, in some embodiments, be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer readable instructions stored in the memory 41 or to process data, for example to execute the computer readable instructions of the claim data quality inspection and verification method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 4 and other electronic devices.
The computer device performs multi-stage quality inspection and verification on the claim data: it first checks the image quality of the images through the definition detection model, the direction correction model and the reproduction detection model, then extracts the text content through the OCR model, obtains the target feature vectors for image classification through feature extraction and fusion with the two preset models, and verifies the claim data after the classification labels are obtained. Intelligent verification of claim data can thus be realized, the whole quality inspection process is shortened to the millisecond level, and non-compliant claim data can be intercepted quickly, which reduces the issuance of inferior-quality orders, shortens the claim settlement period and improves the customer experience.
The present application also provides another embodiment, namely a computer readable storage medium storing computer readable instructions executable by at least one processor, so as to cause the at least one processor to perform the steps of the claim data quality inspection and verification method described above.
When the instructions stored in the computer readable storage medium are executed, multi-stage quality inspection is performed on the claim data: the images are first checked through the definition detection model, the direction correction model and the reproduction detection model, the text content is then extracted through the OCR model, the target feature vectors for image classification are obtained through feature extraction and fusion with the two preset models, and the claim data is verified after the classification labels are obtained. Intelligent verification of claim data can thus be realized, the whole quality inspection process is shortened to the millisecond level, and non-compliant claim data can be intercepted quickly, which reduces the issuance of inferior-quality orders, shortens the claim settlement period and improves the customer experience.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by software plus a necessary general hardware platform, or by hardware alone, although in many cases the former is preferred. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, a magnetic disk or an optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to perform the method of the embodiments of the present application.
It is apparent that the above-described embodiments are only some, not all, of the embodiments of the present application; the preferred embodiments are shown in the drawings, which do not limit the scope of the claims. The application may be embodied in many different forms; these embodiments are provided so that the disclosure will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. All equivalent structures made using the content of the specification and drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the application.
Claims (7)
1. A method for quality inspection and verification of claim data, characterized by comprising the following steps:
Responding to a claim data uploading request of a client, and receiving the claim data uploaded by the client, wherein the claim data comprises a plurality of images;
Performing image quality inspection on the images through a preset definition detection model, a direction correction model and a reproduction detection model, and outputting an image quality inspection result for each image;
Performing text recognition, through a preset OCR model, on the images whose image quality inspection results meet the image quality inspection requirement, to obtain the corresponding text content;
Preprocessing the text content and taking the preprocessed text content as model input, outputting a first feature vector of each image through a first preset model, outputting a second feature vector of each image through a second preset model, and performing feature fusion based on the first feature vector and the second feature vector to obtain a target feature vector of each image;
inputting the target feature vector into a preset classification model to obtain the classification label of each image, verifying the claim data based on the classification labels, and outputting a verification result;
the first preset model is a bag-of-words model, and the step of outputting the first feature vector of each image through the first preset model comprises:
For each image, performing word segmentation on the text content corresponding to the image to obtain a plurality of segmented words, extracting a plurality of characteristic fields from the segmented words, screening the characteristic fields to form a characteristic field set, determining the distribution of each characteristic field in the set within a histogram established based on the bag-of-words model to obtain an intermediate feature vector, and outputting a corresponding probability value recorded as a first score; and performing normalization and dimension-reduction processing on the intermediate feature vector to obtain the first feature vector and its corresponding first score;
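As an illustration only, not the patented formulas, the histogram step of this bag-of-words branch can be sketched in Python. The five-word vocabulary, the L1 normalization and the coverage-based "first score" are all assumptions, and the claimed dimension-reduction step is omitted:

```python
from collections import Counter

# Hypothetical vocabulary; the real one comes from the bag-of-words
# construction step described later in the claim.
VOCAB = ["invoice", "hospital", "diagnosis", "amount", "date"]

def bow_feature(tokens, vocab=VOCAB):
    """Count each vocabulary word among the segmented words (a histogram),
    then L1-normalize so the vector sums to 1.
    The score is a stand-in for the claimed "first score": the fraction
    of tokens covered by the vocabulary (an assumption)."""
    counts = Counter(tokens)
    vec = [counts.get(w, 0) for w in vocab]
    total = sum(vec)
    if total == 0:
        return [0.0] * len(vocab), 0.0
    normed = [v / total for v in vec]
    score = total / len(tokens)
    return normed, score

vec, score = bow_feature(["invoice", "amount", "amount", "stamp"])
```

A real implementation would follow this with the normalization and dimension-reduction processing the claim requires (e.g., PCA), which is not shown here.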
The second preset model is a convolutional neural network model, and the step of outputting the second feature vector of each image through the second preset model comprises:
for each image, splicing the text content of the image to obtain a spliced text, inputting the spliced text into the convolutional neural network model to obtain the second feature vector of the image, and outputting a corresponding probability value recorded as a second score; the step of performing feature fusion based on the first feature vector and the second feature vector to obtain the target feature vector of each image specifically comprises: for each image, comparing the corresponding first score with the second score; if the first score is larger than the second score, taking the first feature vector corresponding to the first score as the target feature vector, and if the first score is smaller than the second score, taking the second feature vector corresponding to the second score as the target feature vector;
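The score-based fusion rule of this step is simple to express. The sketch below assumes plain Python lists for the feature vectors and, since the claim leaves the equal-score case unspecified, falls back to the second vector on ties:

```python
def fuse_features(first_vec, first_score, second_vec, second_score):
    """Keep the feature vector whose model reported the higher
    probability score. Ties fall through to the second vector here;
    the claim does not specify the equal-score case."""
    if first_score > second_score:
        return first_vec
    return second_vec

# Bag-of-words branch wins because its score (0.8) beats the CNN's (0.3).
target = fuse_features([0.1, 0.9], 0.8, [0.4, 0.6], 0.3)
```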
Before the step of outputting the first feature vector of each image through the first preset model, the method further comprises a step of constructing the bag-of-words model, specifically:
Extracting segmented words from images corresponding to different types of claim data to form a segmented-word set, wherein the extracted segmented words are locally invariant features in the images; merging segmented words with similar senses in the set, constructing a word list containing K words, counting the occurrences of each word of the word list in each image, representing each image as a K-dimensional numeric vector, and performing a clustering operation on the plurality of K-dimensional numeric vectors based on a preset dictionary size to obtain a bag of words containing M words, thereby completing the construction of the bag-of-words model.
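A minimal sketch of the word-list construction, under the assumption that frequency ranking stands in for the claimed synonym merging; the final clustering of the K-dimensional vectors into an M-word bag is left out:

```python
from collections import Counter

def build_vocab(image_token_lists, k):
    """Build a K-word list from the most frequent segmented words
    across all images (synonym merging is omitted in this sketch)."""
    all_tokens = Counter(t for toks in image_token_lists for t in toks)
    return [w for w, _ in all_tokens.most_common(k)]

def image_vectors(image_token_lists, vocab):
    """Represent each image as a K-dimensional count vector over the
    word list. The claim then clusters these vectors into an M-word
    bag; that clustering step is not shown here."""
    return [[Counter(toks).get(w, 0) for w in vocab]
            for toks in image_token_lists]

docs = [["invoice", "amount", "amount"], ["diagnosis", "invoice"]]
vocab = build_vocab(docs, 3)
vecs = image_vectors(docs, vocab)
```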
2. The method of claim 1, wherein the step of performing image quality inspection on the plurality of images through the preset reproduction detection model comprises:
Performing color space conversion on the image subjected to reproduction detection, converting the original image from the RGB color space into the YCbCr color space, and normalizing the converted brightness values;
Converting the color-space-converted image through a preset processing function to obtain new brightness values with specular reflection removed, and obtaining a first difference value from the new brightness values and the normalized brightness values;
converting the image before the color space conversion through the preset processing function to obtain new RGB values with specular reflection removed, and obtaining a second difference value from the new RGB values and the RGB values before the color space conversion;
And judging, according to the difference between the first difference value and the second difference value, whether the image subjected to reproduction detection is a recaptured photo.
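A toy sketch of this two-difference comparison. The RGB-to-YCbCr formula is the standard ITU-R BT.601 conversion, but the specular-removal function (here, simple highlight clipping) and the final score are illustrative assumptions, not the patented processing function:

```python
def rgb_to_ycbcr(r, g, b):
    """ITU-R BT.601 full-range RGB -> YCbCr conversion."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def remove_specular(value, cap=200.0):
    """Stand-in for the claimed specular-removal function: clip values
    above a highlight threshold. Purely an illustrative choice."""
    return min(value, cap)

def reproduction_score(pixels):
    """Accumulate (a) how much specular removal changes the normalized
    luminance and (b) how much it changes the raw RGB channels, then
    return the gap between the two as a per-pixel average. A larger
    gap is read as evidence of a recaptured photo in this sketch."""
    d_luma = 0.0
    d_rgb = 0.0
    for r, g, b in pixels:
        y, _, _ = rgb_to_ycbcr(r, g, b)
        d_luma += abs(remove_specular(y) - y) / 255.0
        d_rgb += sum(abs(remove_specular(c) - c) for c in (r, g, b)) / (3 * 255.0)
    return abs(d_luma - d_rgb) / max(len(pixels), 1)
```

In practice the two difference values would be thresholded against a calibrated limit to produce the recapture verdict.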
3. The method of claim 2, wherein, before the step of performing image quality inspection on the plurality of images through the preset definition detection model, direction correction model and reproduction detection model, the method further comprises:
And performing a de-duplication operation on the images to remove identical claim data uploaded by the user from different clients, performing a data-category check on the de-duplicated claim data, and, when any data category lacks the corresponding claim data, prompting the user to upload the missing claim data.
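The de-duplication and category check can be sketched as follows; byte-level SHA-256 hashing and the category names are assumptions, since the claim does not specify how duplicates are detected:

```python
import hashlib

# Hypothetical set of data categories a claim must include.
REQUIRED_CATEGORIES = {"invoice", "id_card", "diagnosis_report"}

def dedupe(images):
    """Drop byte-identical uploads (the same material sent from
    different clients) by hashing each image's bytes."""
    seen, unique = set(), []
    for img in images:
        digest = hashlib.sha256(img).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(img)
    return unique

def missing_categories(present_categories):
    """Return the data categories with no uploaded material, so the
    user can be prompted to supply them."""
    return REQUIRED_CATEGORIES - set(present_categories)
```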
4. The method of claim 1, wherein, after the step of performing text recognition through the preset OCR model on the images whose image quality inspection results meet the image quality inspection requirement, the method further comprises:
And establishing and storing a mapping relation between the recognized text content and the corresponding image, so that, when a text content tracing request is received, the corresponding image is obtained and displayed according to the text contained in the tracing request.
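A minimal sketch of the text-to-image mapping used for tracing, assuming an in-memory dictionary and substring matching; the patent does not specify the storage or lookup mechanism:

```python
class TraceIndex:
    """Keep the mapping from each image to its recognized text, and
    look images back up from a text fragment in a tracing request."""

    def __init__(self):
        self._text_by_image = {}

    def add(self, image_id, text):
        """Store the OCR text recognized for one image."""
        self._text_by_image[image_id] = text

    def trace(self, query):
        """Return every image whose recognized text contains the
        queried fragment, for display to the requester."""
        return [img for img, text in self._text_by_image.items()
                if query in text]

idx = TraceIndex()
idx.add("img-001", "outpatient invoice total 128.50")
idx.add("img-002", "discharge summary")
matches = idx.trace("invoice")
```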
5. A claim data quality inspection and verification apparatus for implementing the claim data quality inspection and verification method according to any one of claims 1 to 4, characterized in that the apparatus comprises:
The receiving module is used for responding to the claim data uploading request of the client and receiving the claim data uploaded by the client, wherein the claim data comprises a plurality of images;
The quality inspection module is used for performing image quality inspection on the images through a preset definition detection model, a preset direction correction model and a preset reproduction detection model, and outputting an image quality inspection result for each image;
The text recognition module is used for carrying out text recognition on the image with the image quality inspection result meeting the image quality inspection requirement through a preset OCR model to obtain corresponding text content;
The feature extraction module is used for preprocessing the text content and taking the preprocessed text content as model input, outputting a first feature vector of each image through a first preset model, outputting a second feature vector of each image through a second preset model, and performing feature fusion based on the first feature vector and the second feature vector to obtain a target feature vector of each image;
and the verification module is used for inputting the target feature vector into a preset classification model to obtain the classification label of each image, verifying the claim data based on the classification labels, and outputting a verification result.
6. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, implement the steps of the claim data quality inspection and verification method of any one of claims 1 to 4.
7. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the steps of the claim data quality inspection and verification method of any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210447277.3A CN114820211B (en) | 2022-04-26 | 2022-04-26 | Method, device, computer equipment and storage medium for checking and verifying quality of claim data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210447277.3A CN114820211B (en) | 2022-04-26 | 2022-04-26 | Method, device, computer equipment and storage medium for checking and verifying quality of claim data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114820211A CN114820211A (en) | 2022-07-29 |
CN114820211B true CN114820211B (en) | 2024-06-14 |
Family
ID=82508284
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210447277.3A Active CN114820211B (en) | 2022-04-26 | 2022-04-26 | Method, device, computer equipment and storage medium for checking and verifying quality of claim data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114820211B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354554A (en) * | 2015-11-12 | 2016-02-24 | 西安电子科技大学 | Color and singular value feature-based face in-vivo detection method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4406815B2 (en) * | 2002-06-26 | 2010-02-03 | ソニー株式会社 | Information processing apparatus and method, recording medium, and program |
WO2020223859A1 (en) * | 2019-05-05 | 2020-11-12 | 华为技术有限公司 | Slanted text detection method, apparatus and device |
CN111178064B (en) * | 2019-12-13 | 2022-11-29 | 深圳平安医疗健康科技服务有限公司 | Information pushing method and device based on field word segmentation processing and computer equipment |
CN112001368A (en) * | 2020-09-29 | 2020-11-27 | 北京百度网讯科技有限公司 | Character structured extraction method, device, equipment and storage medium |
CN112330469B (en) * | 2020-11-04 | 2024-08-27 | 阳光保险集团股份有限公司 | Pre-examination method and device for medical insurance claim material |
CN112950559B (en) * | 2021-02-19 | 2022-07-05 | 山东矩阵软件工程股份有限公司 | Method and device for detecting copied image, electronic equipment and storage medium |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354554A (en) * | 2015-11-12 | 2016-02-24 | 西安电子科技大学 | Color and singular value feature-based face in-vivo detection method |
Non-Patent Citations (1)
Title |
---|
Batch annotation method for Web images based on accompanying text information; Guo Leilei; Yu Lu; Duan Guolun; Tao Xingliu; Information Technology and Network Security; 2018-09-10 (No. 09); pp. 74-79 *
Also Published As
Publication number | Publication date |
---|---|
CN114820211A (en) | 2022-07-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200004815A1 (en) | Text entity detection and recognition from images | |
CN113627395B (en) | Text recognition method, device, medium and electronic equipment | |
CN112330331A (en) | Identity verification method, device and equipment based on face recognition and storage medium | |
CN113988223B (en) | Certificate image recognition method, device, computer equipment and storage medium | |
CN116912847A (en) | Medical text recognition method and device, computer equipment and storage medium | |
CN117523586A (en) | Check seal verification method and device, electronic equipment and medium | |
CN116453125A (en) | Data input method, device, equipment and storage medium based on artificial intelligence | |
CN115758451A (en) | Data labeling method, device, equipment and storage medium based on artificial intelligence | |
CN114386013A (en) | Automatic student status authentication method and device, computer equipment and storage medium | |
CN114820211B (en) | Method, device, computer equipment and storage medium for checking and verifying quality of claim data | |
CN114359928B (en) | Electronic invoice identification method and device, computer equipment and storage medium | |
CN117409430A (en) | Medical bill information extraction method, device, equipment and storage medium thereof | |
CN115050042A (en) | Claims data entry method and device, computer equipment and storage medium | |
CN116795707A (en) | Software privacy compliance pre-detection method and related equipment thereof | |
CN117195319A (en) | Verification method and device for electronic part of file, electronic equipment and medium | |
CN116704528A (en) | Bill identification verification method, device, computer equipment and storage medium | |
CN112395450B (en) | Picture character detection method and device, computer equipment and storage medium | |
CN115880702A (en) | Data processing method, device, equipment, program product and storage medium | |
WO2020172767A1 (en) | Electronic purchase order recognition method and apparatus, and terminal device. | |
US20240233426A9 (en) | Method of classifying a document for a straight-through processing | |
US20220237692A1 (en) | Method and system for providing financial process automation to financial organization | |
KR102706677B1 (en) | Signature recommendation method and corresponding system using artificial intelligence module | |
US11956400B2 (en) | Systems and methods for measuring document legibility | |
US12225168B2 (en) | Systems and methods for measuring document legibility | |
US20230316795A1 (en) | Auto-Document Detection & Capture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||