CN115130989A

CN115130989A - Method, device and equipment for auditing service document and storage medium

Info

Publication number: CN115130989A
Application number: CN202210731000.3A
Authority: CN
Inventors: 陈禹燊; 李硕; 岳洪达; 许海洋; 王艺; 韩光耀
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2022-09-30

Abstract

The disclosure provides a method, apparatus, device and storage medium for reviewing business documents, relating to the technical field of image processing, and in particular to the technical fields of artificial intelligence, computer vision and intelligent finance. The specific implementation is as follows: acquiring business document images of at least two business documents, wherein the at least two business documents are all associated with target business content and belong to different document types; extracting key information of the target business content from the business document image of each document type according to the information extraction method corresponding to that document type; and reviewing the at least two business documents according to the extracted key information. The scheme can quickly and accurately complete the review of business documents of different document types.

Description

Business document review method, apparatus, device and storage medium

Technical Field

The present disclosure relates to the technical field of image processing, in particular to the technical fields of artificial intelligence, computer vision and intelligent finance, and specifically to a business document review method, apparatus, device and storage medium.

Background

With the development of artificial intelligence technology, document intelligence applications such as document image review, document image comparison, document image error correction and report writing have gradually emerged. Document image review can be used to review business documents; specifically, it can check the consistency of key information across business documents of at least two document types that are associated with the same business content. However, when business documents of at least two document types are associated with a large amount of business content, completing the review of the different document types quickly and accurately becomes critical.

Summary of the Invention

The present disclosure provides a business document review method, apparatus, device and storage medium.

According to one aspect of the present disclosure, a business document review method is provided, including:

acquiring business document images of at least two business documents, wherein the at least two business documents are all associated with target business content and belong to different document types;

extracting key information of the target business content from the business document image of each document type according to the information extraction method corresponding to that document type; and

reviewing the at least two business documents according to the extracted key information.

According to another aspect of the present disclosure, an electronic device is provided, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the business document review method of any embodiment of the present disclosure.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions cause a computer to perform the business document review method of any embodiment of the present disclosure.

This solution can quickly and accurately complete the review of business documents of different document types, and provides a new approach to business document review.

It should be understood that the content described in this section is not intended to identify key or critical features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief Description of the Drawings

The accompanying drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:

FIG. 1 is a flowchart of a business document review method provided according to an embodiment of the present disclosure;

FIG. 2A is a flowchart of a business document review method provided according to an embodiment of the present disclosure;

FIG. 2B is a schematic diagram of the working principle of a text image perception model provided according to an embodiment of the present disclosure;

FIG. 3 is a schematic block diagram of the parameter tuning and use of the text image perception model provided by an embodiment of the present disclosure;

FIG. 4A is a flowchart of a business document review method provided according to an embodiment of the present disclosure;

FIG. 4B is a schematic diagram of how the input features of a table image perception model are constructed according to an embodiment of the present disclosure;

FIG. 5 is a flowchart of a business document review method provided according to an embodiment of the present disclosure;

FIG. 6 is a flowchart of a business document review method provided according to an embodiment of the present disclosure;

FIG. 7 is a flowchart of a business document review method provided according to an embodiment of the present disclosure;

FIG. 8A is a flowchart of a business document review method provided according to an embodiment of the present disclosure;

FIG. 8B is a schematic diagram of the display of a visual interface provided according to an embodiment of the present disclosure;

FIG. 9 is a flowchart of a business document review method provided according to an embodiment of the present disclosure;

FIG. 10 is a schematic block diagram of the parameter tuning and use of the ranking model provided by an embodiment of the present disclosure;

FIG. 11 is a flowchart of a business document review method provided according to an embodiment of the present disclosure;

FIG. 12A is a flowchart of a work order document review provided by an embodiment of the present disclosure;

FIG. 12B is a flowchart of the working principle of the perception engine provided by an embodiment of the present disclosure;

FIG. 13 is a schematic structural diagram of a business document review apparatus provided according to an embodiment of the present disclosure;

FIG. 14 is a block diagram of an electronic device used to implement the business document review method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.

FIG. 1 is a flowchart of a business document review method provided according to an embodiment of the present disclosure. The embodiment is applicable to reviewing the information consistency of business documents of at least two document types associated with the same target business content, for example, reviewing the information consistency of a business form and a business contract associated with the same target business content. The method may be executed by a business document review apparatus, which may be implemented in software and/or hardware and may be integrated into an electronic device on which a document intelligence application is installed.

S101: acquire business document images of at least two business documents.

A business document image in this embodiment is an image obtained by photographing or scanning a page of a business document. Since there are at least two kinds of business documents in this embodiment, there are also at least two kinds of business document images. Optionally, the document types of the at least two business documents may include, but are not limited to, a text type, a table type and a slide type. A text-type business document is a document organized as paragraphs or chapters, for example a business contract or a business application document. A table-type business document is a document organized as a table, for example a business application form or a business approval form. A slide-type business document is a document organized as slides, for example a business presentation deck.

The business in this embodiment refers broadly to any business whose content must be recorded in several different types of business documents during its execution. Correspondingly, a business document is a document generated to record the relevant business content during the execution of that business. Business documents of different document types may record the business content in different forms and ways. That is, the at least two business documents acquired in this embodiment are all associated with the target business content and belong to different document types. The target business is the business to which the at least two business documents under review correspond, and the target business content may be all relevant content generated while executing the target business. Although the different kinds of business documents in this embodiment have different document types, the target business content they correspond to should be the same. In essence, reviewing the different kinds of business documents means checking whether the information they record about the target business content is consistent.

Optionally, according to the review requirements, this embodiment may acquire the business document images corresponding to business documents of at least two document types for the target business to be reviewed. If what is acquired is the business documents themselves, each page of each type of business document may be scanned or photographed to obtain the business document images for each document type. Note that in this embodiment each kind of business document has at least one corresponding business document image.

Optionally, if the business document images were acquired without distinguishing document types, the document type of the business document behind each acquired image must be determined afterwards. There are many ways to do this, and this embodiment does not limit them. One possible implementation combines the format features of each document type and determines the document type of each business document image by feature comparison. Another possible implementation first recognizes the text in each business document image using a text recognition technique such as optical character recognition (OCR), and then uses a classification model (for example, a multi-class model built on a logistic regression (softmax) function) to determine the document type from the text information of that image.
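
The second implementation above can be illustrated with a minimal sketch. It is not the classifier of the disclosure: the keyword cues, the three type labels and the function names are assumptions made for illustration, and a real system would learn its features and weights from labeled OCR text. The sketch only shows the shape of the step, where OCR text goes in, per-type scores are normalized with softmax, and the most probable document type comes out.

```python
import math

DOC_TYPES = ["text", "table", "slide"]

# Hypothetical keyword cues per document type (illustrative only).
CUES = {
    "text": ["party a", "party b", "hereby", "clause"],
    "table": ["applicant", "approver", "amount", "date"],
    "slide": ["agenda", "overview", "thank you"],
}


def softmax(scores):
    """Turn raw scores into probabilities (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_document_type(ocr_text):
    """Return (document type, probability) for the OCR text of one image."""
    lowered = ocr_text.lower()
    # Score each type by how often its cue phrases occur in the text.
    scores = [float(sum(lowered.count(cue) for cue in CUES[t])) for t in DOC_TYPES]
    probs = softmax(scores)
    best = max(range(len(DOC_TYPES)), key=probs.__getitem__)
    return DOC_TYPES[best], probs[best]


label, prob = classify_document_type(
    "Party A and Party B hereby agree to the following clause"
)
```

The returned probability can serve as a rough signal for routing low-confidence pages to manual type assignment, although the source does not describe such a fallback.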

Optionally, a target business may have many kinds of business documents, but the review may only need to cover some of the document types. In that case, this step may acquire only the business document images of the document types that need to be reviewed.

For example, if reviewing the business documents of a target business only requires checking whether the table-type and text-type documents are consistent, then, to keep other types of business documents from affecting the review result, this step may exclude the business document images of all other document types and keep only those of the table-type and text-type business documents.

S102: extract key information of the target business content from the business document image of each document type according to the information extraction method corresponding to that document type.

The key information is the information, corresponding to the target business content, that is needed to review the business documents of the target business. Optionally, different target businesses may select different key information according to their business requirements. At least one piece of key information is extracted from the business document images of each document type. For example, if the target business is a project contracting business, the key information of the target business content may include the contracting parties, the signing date and the contract terms.

Note that when key information is selected in this embodiment, it must be recorded in the business documents of every document type involved. Business documents of different document types may record the same key information under the same or different fields. Optionally, the key information may be represented as key-value pairs, where the key is the name of the field that holds the key information in the corresponding business document and the value is the specific content extracted from that document.
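
As an illustration of the key-value representation, the following sketch stores each extracted item together with the document type it came from. The class name `KeyInfo`, the helper `as_key_value` and the sample values are hypothetical; only the idea that the key is the field name in the source document and the value is the extracted content comes from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyInfo:
    doc_type: str  # e.g. "table" or "text"
    key: str       # field name as written in that document
    value: str     # content extracted for that field


def as_key_value(items):
    """Group extracted items into one key-value dict per document type."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.doc_type, {})[item.key] = item.value
    return grouped


# Illustrative values only; the same fact is stored under different field names.
extracted = [
    KeyInfo("table", "Transaction counterparty", "Company B"),
    KeyInfo("text", "Party B", "Company B"),
    KeyInfo("text", "Signing date", "2022-06-24"),
]
by_doc = as_key_value(extracted)
```

Keeping the document type next to each pair matters because the same key information may sit under differently named fields in different documents.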

Optionally, in this embodiment, a different information extraction method may be selected for each document type, based on how that type of business document records information, and used to extract the key information of the target business content from the business document images of that type. Specifically, for each acquired business document image, the text information in the image may first be recognized with a text recognition technique such as OCR, and the key information of the target business content may then be extracted from that text information using the information extraction method corresponding to the image.

Optionally, for a business document image of a table-type business document, the key information of the target business content recorded in the image may be extracted from its text information through structured information extraction and/or a pre-trained table image perception model. For a business document image of a text-type business document, the key information may be extracted from its text information through semantic parsing and/or a pre-trained text image perception model.

Note that the text image perception model and the table image perception model are obtained by fine-tuning pre-trained models built with different algorithms. The text image perception model performs better at extracting key information from business document images of the text type, and the table image perception model performs better at extracting key information from business document images of the table type. The specific process by which the two models perform the key information extraction task is not limited here.

S103: review the at least two business documents according to the extracted key information.

Optionally, based on the business document image and the document extraction field of each piece of extracted key information (that is, the field that records the key information in the corresponding business document), this embodiment may group into one information pair at least two pieces of key information that come from different business documents and whose document extraction fields have the same meaning. A content consistency check is then performed on the pieces of key information in each information pair. If they are consistent, the information pair passes the review; otherwise it fails. If every information pair passes, the review result for the at least two business documents of the target business is a pass; otherwise the review result is a fail.

Optionally, there are many ways to determine the document extraction field of a piece of key information. For example, if the key information is represented as a key-value pair, its key can be used directly as the document extraction field. If the key information is only the content extracted from the business document image, entity label recognition for the domain of the target business may be performed on the key information, and the recognition result may be used as the document extraction field.

There are also many ways to determine which document extraction fields have the same meaning. For example, they may be determined by semantic parsing of the different document extraction fields, or they may be preset according to actual business requirements.
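
The pairing and consistency check of S103 can be sketched as follows, under stated assumptions: the preset mapping `FIELD_SEMANTICS`, the function name `review`, exact string equality after trimming whitespace as the consistency test, and treating an empty set of pairs as a failed review are all choices made for this illustration and are not specified by the disclosure.

```python
# Hypothetical preset mapping: (document type, field name) -> shared meaning.
FIELD_SEMANTICS = {
    ("table", "Transaction counterparty"): "counterparty",
    ("text", "Party B"): "counterparty",
    ("table", "Signing date"): "signing_date",
    ("text", "Date of signature"): "signing_date",
}


def review(key_infos):
    """Review key information given as (doc_type, field, value) tuples.

    Returns (passed, results), where results maps each shared meaning to
    True if all documents agree on its value and False otherwise.
    """
    groups = {}
    for doc_type, field, value in key_infos:
        meaning = FIELD_SEMANTICS.get((doc_type, field))
        if meaning is not None:
            groups.setdefault(meaning, {})[doc_type] = value.strip()

    results = {}
    for meaning, by_doc in groups.items():
        if len(by_doc) < 2:
            continue  # a pair needs values from at least two documents
        results[meaning] = len(set(by_doc.values())) == 1

    # Assumption: with nothing to compare, the review does not pass.
    passed = bool(results) and all(results.values())
    return passed, results


ok, ok_detail = review([
    ("table", "Transaction counterparty", "Company B"),
    ("text", "Party B", "Company B "),
])
bad, bad_detail = review([
    ("table", "Signing date", "2022-06-24"),
    ("text", "Date of signature", "2022-06-25"),
])
```

The per-meaning results make it possible to report which information pair failed rather than only the overall outcome.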

In the solution of this embodiment, after the business document images of at least two business documents are acquired, different information extraction methods are applied to the different types of business document images to extract key information from them, and the at least two business documents are then reviewed according to the extracted key information. Because key information is extracted automatically with a method suited to each document type, extraction becomes more accurate and more efficient, which supports a fast and accurate review of business documents of different document types.

FIG. 2A is a flowchart of a business document review method provided according to an embodiment of the present disclosure, and FIG. 2B is a schematic diagram of the working principle of a text image perception model provided according to an embodiment of the present disclosure. Building on the above embodiment, this embodiment explains in detail how to extract the key information of the target business content from the business document image of a text-type business document. As shown in FIGS. 2A-2B, the business document review method provided by this embodiment may include:

S201: acquire business document images of at least two business documents.

The at least two business documents are all associated with the target business content and belong to different document types.

S202: when the document type is the text type, construct an input sequence from preset delimiters, the document extraction field, and the text information in the text-type business document image.

The preset delimiters in this embodiment are predefined symbols that mark the start and end of the input sequence and separate the different kinds of content within it. Three preset delimiters are preferred: a start symbol (CLS) marking the start of the input sequence, an end symbol (SEP1) marking the end of the input sequence, and a separator (SEP2) dividing the different kinds of content in the input sequence.

The document extraction field is the field that records a piece of key information in the corresponding business document. Note that in this embodiment the document extraction fields for the same key information may be the same or different across different kinds of business documents. For example, in a table-type business document the document extraction fields for the project participants (the key information) are "business-initiating institution" and "transaction counterparty", whereas in a text-type business document they are "Party A" and "Party B". If the document extraction fields for the same key information differ across kinds of business documents, this step may use the fields that appear in the text-type business document as the document extraction fields for constructing the input sequence.

Optionally, when the document type is the text type, text recognition may be performed on the text-type business document image. The recognition result includes the recognized characters, the position of each character in the business document image, and the recognition confidence. The recognition result is then tokenized to obtain multiple consecutive pieces of token information, each of which may include the token, its position information and its recognition confidence. Next, as shown in FIG. 2B, the start symbol (CLS), the document extraction field (prompt), the separator (SEP2), the consecutive token information and the end symbol (SEP1) are concatenated in that order to obtain the input sequence.
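
The concatenation order described for S202 can be sketched as below. The class `TokenInfo`, the bracketed delimiter strings and the function `build_input_sequence` are illustrative names, and the document extraction field is kept as a single sequence element for simplicity; the disclosure does not specify how the field itself is tokenized or how position and confidence are encoded for the model.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed string forms of the three preset delimiters.
CLS, SEP1, SEP2 = "[CLS]", "[SEP1]", "[SEP2]"  # start, end, separator


@dataclass
class TokenInfo:
    text: str                        # token recognized from the image
    box: Tuple[int, int, int, int]   # position in the image (x0, y0, x1, y1)
    confidence: float                # OCR recognition confidence


def build_input_sequence(prompt: str, tokens: List[TokenInfo]) -> Tuple[List[str], int]:
    """Concatenate CLS, prompt, SEP2, document tokens and SEP1.

    Also returns the index of the first document token, so that positions
    predicted over the sequence can be mapped back to `tokens`.
    """
    prefix = [CLS, prompt, SEP2]
    sequence = prefix + [t.text for t in tokens] + [SEP1]
    return sequence, len(prefix)


ocr_tokens = [
    TokenInfo("Party", (10, 10, 60, 30), 0.99),
    TokenInfo("A", (65, 10, 80, 30), 0.98),
    TokenInfo("Acme", (90, 10, 150, 30), 0.97),
]
sequence, offset = build_input_sequence("Party A", ocr_tokens)
```

One sequence is built per document extraction field, so a document with several fields of interest yields several sequences over the same tokens.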

S203: using the text image perception model, determine from the input sequence the extraction start/end positions corresponding to the document extraction field, together with the confidence of those positions.

The text image perception model in this embodiment is preferably a lightweight, automatically built model constructed from the encoder of the pre-trained model ERNIE and the Universal Information Extraction (UIE) framework. The framework provides unified modeling of tasks such as entity extraction, relation extraction, event extraction and sentiment analysis, and gives good transfer and generalization across tasks.

Specifically, as shown in FIG. 2B, the input sequence constructed in S202 may be fed into the text image perception model. From the text information in the input sequence (that is, the multiple consecutive tokens), the model identifies the run of consecutive tokens that corresponds to the document extraction field, and takes the position information recorded for the first and last tokens of that run as its extraction start position (Start) and extraction end position (End), which together form the extraction start/end positions of the run. The model also outputs a confidence for the extraction start/end positions. For example, the two sets of outputs shown in FIG. 2B are two sets of extraction start/end positions with their confidences.

Note that the same document extraction field may appear several times in a text-type business document; for example, the business participant information may appear more than once. Several runs of consecutive tokens may therefore be extracted for each document extraction field, and the text image perception model will then output several sets of extraction start/end positions and confidences for that field.

S204: Determine the extraction information corresponding to the document extraction field, and the confidence of that extraction information, according to the extraction start and end positions and their confidences.

Optionally, in this embodiment, for each document extraction field and for each set of extraction start and end positions, the characters of the text information of the text-type business document image that lie between the start position and the end position are taken as one set of extraction information for that field. The confidence of the extraction start and end positions is then used as the confidence of the extraction information. Specifically, when that confidence comprises a confidence for the extraction start position and a confidence for the extraction end position, the two may be fused (for example by summing or averaging), and the fused result is used as the confidence of the extraction information.

Note that if a document extraction field corresponds to several sets of extraction start and end positions and confidences, this step determines several sets of extraction information for that field, each with its own confidence.
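As an illustration of S204, the sketch below slices the text between each predicted start and end position and fuses the two confidences by averaging or summing. The names `SpanPrediction`, `fuse_confidence` and `extract_candidates` are hypothetical, and treating the end position as inclusive is an assumption; the disclosure does not fix either detail.

```python
from dataclasses import dataclass


@dataclass
class SpanPrediction:
    """One set of extraction start/end positions with confidences (hypothetical structure)."""
    start: int         # index of the first character of the span
    end: int           # index of the last character of the span (assumed inclusive)
    start_conf: float  # confidence of the start position
    end_conf: float    # confidence of the end position


def fuse_confidence(start_conf: float, end_conf: float, mode: str = "mean") -> float:
    """Fuse start and end confidences by averaging or summing."""
    if mode == "mean":
        return (start_conf + end_conf) / 2
    if mode == "sum":
        return start_conf + end_conf
    raise ValueError(f"unsupported fusion mode: {mode}")


def extract_candidates(text: str, predictions: list, mode: str = "mean") -> list:
    """Return (extraction information, confidence) for every predicted span of one field."""
    return [
        (text[p.start:p.end + 1], fuse_confidence(p.start_conf, p.end_conf, mode))
        for p in predictions
    ]
```

A field that occurs several times in the document simply yields several entries in the returned list, one per predicted span.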

S205: Determine the key information of the target business content in the text-type business document image according to the extraction information corresponding to the document extraction field and the confidence of that extraction information.

Specifically, for each document extraction field, if there is only one piece of extraction information, that piece is used directly as the key information for the field in the text-type business document image. If there are several pieces, the one with the highest confidence is used. The key information for all document extraction fields is then combined to obtain the key information of the target business content in the text-type business document image.
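The selection rule of S205 can be sketched as follows. The function name and the dictionary layout (field name mapped to a list of `(text, confidence)` tuples) are assumptions made for illustration only.

```python
def select_key_information(candidates_by_field: dict) -> dict:
    """Keep, for each document extraction field, the candidate with the highest confidence.

    candidates_by_field maps a field name to a list of (text, confidence) tuples
    (hypothetical layout). A field with a single candidate keeps that candidate.
    """
    key_info = {}
    for field, candidates in candidates_by_field.items():
        if not candidates:
            continue  # nothing was extracted for this field
        best_text, _ = max(candidates, key=lambda item: item[1])
        key_info[field] = best_text
    return key_info
```

When two candidates tie, `max` returns the first one encountered; the disclosure does not say how ties should be broken.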

S206: When the document type is a document type other than the text type, extract the key information of the target business content from the business document image of that document type according to the information extraction mode corresponding to that document type.

S207: Review the at least two kinds of business documents according to the extracted key information.

In the solution of this embodiment, after the business document images of at least two kinds of business documents are acquired, an input sequence is constructed for the text-type business document image from the preset separator, the document extraction field and the text information of that image. The text image perception model determines, from the input sequence, the extraction start and end positions corresponding to the document extraction field and their confidences, and the key information of the text-type business document image is determined from those positions and confidences. Key information is also extracted from business document images of other types using their own information extraction modes, and the at least two kinds of business documents are then reviewed according to the extracted key information. This embodiment gives a specific implementation of how a text image perception model based on ERNIE and UIE extracts key information from a text-type business document image. The model can extract this key information more accurately and comprehensively, which supports accurate review of the business documents in the subsequent steps.

Optionally, in this embodiment of the present disclosure, constructing the input sequence according to the preset separator, the document extraction field and the text information in the text-type business document image may include: preprocessing the text information in the text-type business document image to obtain preprocessed text, where the preprocessing includes at least one of cleaning, symbol replacement and format conversion; and constructing the input sequence according to the preprocessed text, the preset separator and the document extraction field.

Specifically, cleaning the text information in the business document image removes characters such as the space character " ", the tab character "\t", the newline character "\n" and the underscore "_". Symbol replacement converts all punctuation marks in the text uniformly into either Chinese or English punctuation; this solution preferably converts them into English punctuation to reduce storage overhead. Format conversion converts the text information into a format supported by the text image perception model. Constructing the input sequence from the preprocessed text, the preset separator and the document extraction field is similar to constructing it from the text information, the preset separator and the document extraction field as described in the above embodiment, and is not repeated here.
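A minimal sketch of the cleaning and symbol replacement steps is given below. The punctuation table `PUNCT_MAP` covers only a handful of marks and is an assumption, as are the function names; format conversion is omitted because it depends on the model's input format.

```python
# Characters removed by the cleaning step: space, tab, newline, underscore.
CLEAN_CHARS = {" ", "\t", "\n", "_"}

# Partial Chinese-to-English punctuation table (illustrative, not exhaustive).
PUNCT_MAP = str.maketrans({
    "，": ",", "。": ".", "：": ":", "；": ";",
    "（": "(", "）": ")", "？": "?", "！": "!",
})


def clean(text: str) -> str:
    """Drop whitespace and underscore characters left over from OCR."""
    return "".join(ch for ch in text if ch not in CLEAN_CHARS)


def normalize_punctuation(text: str) -> str:
    """Convert Chinese punctuation marks to their English counterparts."""
    return text.translate(PUNCT_MAP)


def preprocess(text: str) -> str:
    """Apply cleaning followed by symbol replacement."""
    return normalize_punctuation(clean(text))
```

Removing every space suits Chinese text, where spaces carry no meaning; for text that relies on spaces as word boundaries the cleaning set would need to be adjusted.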

Before constructing the input sequence, this embodiment preprocesses the text information to make it more uniform and standardized, so that the text image perception model can more accurately predict the extraction start and end positions corresponding to the document extraction field, together with their confidences.

For example, FIG. 3 is a schematic block diagram of the parameter tuning and use of the text image perception model provided by an embodiment of the present disclosure. As shown in FIG. 3, blocks 301 to 304 correspond to the parameter tuning process of the model. In this process, the business document image of block 301 is a document image of a text-type business document, that is, a training sample image (for example a scanned copy of a PDF document). The training sample image of block 301 is input into block 302 for text recognition (for example OCR), and the recognized text information is input into block 303 for data preprocessing (at least one of cleaning, symbol replacement and format conversion). An input sequence is constructed from the preprocessed text information (the input text), the preset separator and the document extraction field, and the preprocessed data is annotated with the document extraction fields using an annotation tool. This embodiment preferably uses doccano, an annotation tool that supports nested entities; it can label the same word as entities of several different categories, so the document extraction fields can be annotated more accurately and completely. The input sequence obtained by the preprocessing in block 303 and the annotations of its input text are input together into block 304 to fine-tune the text image perception model. The fine-tuned model can then be published and used in block 305 to extract key information.

Blocks 301 to 303 and block 305 correspond to the use of the text image perception model. In actual use, block 301 acquires the business document image of a text-type business document. The image is input into block 302 for text recognition (for example OCR), and the recognized text information is input into block 303 for data preprocessing (at least one of cleaning, symbol replacement and format conversion). An input sequence is constructed from the preprocessed text information (the input text), the preset separator and the document extraction field. The input sequence constructed in block 303 is input into block 305, where the text image perception model published from block 304 extracts the key information of the target business content based on the input sequence. For how the model does this, see S203 to S205 of the above embodiment.

FIG. 4A is a flowchart of a business document review method provided according to an embodiment of the present disclosure, and FIG. 4B is a schematic diagram of how the input features of a table image perception model are constructed according to an embodiment of the present disclosure. On the basis of the above embodiments, this embodiment explains in detail how the key information of the target business content is extracted from the business document image of a table-type business document. As shown in FIGS. 4A and 4B, the business document review method provided by this embodiment may include:

S401: Acquire business document images of at least two kinds of business documents.

S402: When the document type is the table type, divide the table-type business document image into blocks to obtain an image block sequence, and determine the image feature of each image block in the sequence, the block position of each image block in the business document image, and the block sequence number of each image block in the image block sequence.

Specifically, as shown in FIG. 4B, each table-type business document image may first be divided into several image blocks according to a preset blocking strategy. The strategy shown in FIG. 4B divides the business document image into four equal parts, and the four resulting image blocks (the upper-left, upper-right, lower-left and lower-right blocks) are arranged in that order to form the image block sequence.

A visual encoder (Visual Encoder) encodes each image block in the image block sequence to obtain the image feature of that block. A vertex position (such as the upper-left corner) or the center position of each image block in the business document image is used as its block position, and the rank of each image block in the image block sequence is used as its block sequence number. The block sequence numbers may be a run of ascending numbers.

For example, the first four feature maps in the row of the visual and text feature T1 in FIG. 4B are the image features of the image blocks in the image block sequence. The first feature map in that row corresponds to the upper-left image block obtained by dividing the business document image into equal parts, the second to the upper-right image block, the third to the lower-left image block, and the fourth to the lower-right image block.

The first four position coordinates in the row of the coordinate position feature T2 in FIG. 4B are the block positions of the image blocks in the business document image; for example, the first coordinate in the T2 row is the block position of the image block corresponding to the first feature map in T1. The first four sequence numbers (0 to 3) in the row of the sorting position feature T3 in FIG. 4B are the block sequence numbers of the image blocks in the image block sequence; for example, sequence number 0 in the T3 row is the block sequence number of the image block corresponding to the first feature map in T1.
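The blocking step of S402 can be sketched in plain Python as follows. The image is represented as a nested list of pixel rows, the upper-left corner is used as the block position, and the visual encoder is left out; the function name and the returned dictionary layout are assumptions for illustration.

```python
def split_into_blocks(image: list, rows: int = 2, cols: int = 2) -> list:
    """Split an image (list of pixel rows) into rows x cols equal blocks.

    Blocks are ordered left to right, top to bottom, matching the
    upper-left, upper-right, lower-left, lower-right order of the text.
    Each block records its sequence number, its upper-left (x, y) position
    in the original image, and its pixels (which a visual encoder would
    turn into an image feature).
    """
    height, width = len(image), len(image[0])
    block_h, block_w = height // rows, width // cols
    blocks = []
    for r in range(rows):
        for c in range(cols):
            top, left = r * block_h, c * block_w
            pixels = [row[left:left + block_w] for row in image[top:top + block_h]]
            blocks.append({"index": len(blocks), "position": (left, top), "pixels": pixels})
    return blocks
```

With the default 2 x 2 strategy this yields block sequence numbers 0 to 3, as in the T3 row of FIG. 4B. Pixels beyond an exact multiple of the block size are ignored in this sketch.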

S403: Tokenize the text information in the business document image to obtain a word sequence, and determine the word feature of each token (quantized word) in the word sequence, the word position of each token in the business document image, and the word sequence number of each token in the word sequence.

Specifically, as shown in FIG. 4B, the text information of each table-type business document image is recognized using a text recognition technique (such as OCR), and the recognized text is tokenized to obtain a number of tokens (quantized words), which are arranged in order to form the word sequence. Preferably, a preset delimiter is added at each end of the word sequence, such as the start delimiter <S> and the end delimiter </S>. The characters of each token are used as its word feature; the position of each token in the business document image (that is, the position reported by the text recognition technique when it recognizes the text) is used as its word position; and the rank of each token in the word sequence is used as its word sequence number. The word sequence numbers may be a run of ascending numbers.

For example, in FIG. 4B, the words and the start and end delimiters (<S> and </S>) in the row of the visual and text feature T1 are the word features of the tokens in the word sequence; the position coordinates from the fifth to the last in the row of the coordinate position feature T2 are the word positions of the tokens in the business document image; and the values from the fifth position to the last in the row of the sorting position feature T3 (that is, 0 to 9) are the word sequence numbers of the tokens in the word sequence.

Note that S402 and S403 of this embodiment may be performed simultaneously and in no particular order, since S404 depends on the outputs of both.

S404: Determine the model input features according to the image features, block positions and block sequence numbers of the image blocks in the image block sequence, and the word features, word positions and word sequence numbers of the tokens in the word sequence.

The model input features are the features to be input into the table image perception model.

Optionally, as shown in FIG. 4B, the image features of the image blocks in the image block sequence may be concatenated with the word features of the tokens in the word sequence to form the visual and text feature T1; the block positions of the image blocks may be concatenated with the word positions of the tokens to form the coordinate position feature T2; and the block sequence numbers of the image blocks may be concatenated with the word sequence numbers of the tokens to form the sorting position feature T3. The visual and text feature T1, the coordinate position feature T2 and the sorting position feature T3 then form a triple, which is the model input feature.
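The concatenation in S404 can be sketched as below. Features are kept as plain Python values rather than tensors, and the position assigned to the `<S>` and `</S>` delimiters, here a placeholder `(0, 0)`, is an assumption; the disclosure does not state what position the delimiters receive. The function name is hypothetical.

```python
def build_model_input(block_features, block_positions, tokens, token_positions):
    """Assemble the (T1, T2, T3) triple from image blocks and tokens.

    T1: image features followed by word features (with <S> and </S> added).
    T2: block positions followed by word positions.
    T3: block sequence numbers followed by word sequence numbers; each
        part is numbered from 0 independently, as in FIG. 4B.
    """
    words = ["<S>"] + list(tokens) + ["</S>"]
    placeholder = (0, 0)  # assumed position for the two delimiters
    word_positions = [placeholder] + list(token_positions) + [placeholder]

    t1 = list(block_features) + words
    t2 = list(block_positions) + word_positions
    t3 = list(range(len(block_features))) + list(range(len(words)))
    return t1, t2, t3
```

The three lists always have the same length, so position i of T1, T2 and T3 describes the same image block or token.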

S405: Extract the key information of the target business content from the table-type business document image through the table image perception model, according to the model input features.

The table image perception model in this embodiment is preferably a model published after fine-tuning the multimodal pre-trained model LayoutXLM (Multimodal Pre-training for Multilingual Visually-rich Document Understanding). The multimodality referred to in this embodiment comprises the three modalities that make up the input features: the visual and text feature T1, the coordinate position feature T2 and the sorting position feature T3.

Specifically, as shown in FIG. 4B, the model input features containing the three modalities may be input into the table image perception model, which parses the key information of the target business content in the table-type business document image from those features. That is, the model gives the key information of the target business content corresponding to each document extraction field (specifically, each table document extraction field) in the table-type business document.

S406: When the document type is a document type other than the table type, extract the key information of the target business content from the business document image of that document type according to the information extraction mode corresponding to that document type.

S407: Review the at least two kinds of business documents according to the extracted key information.

In the solution of this embodiment, after the business document images of at least two kinds of business documents are acquired, the table-type business document image is divided into several image blocks, and the image features, block positions and block sequence numbers of the blocks are obtained. The text information in the business document image is tokenized, and the word features, word positions and word sequence numbers of the tokens are obtained. This information is combined into multimodal model input features, from which the table image perception model extracts the key information of the table-type business document image. Key information is also extracted from business document images of other types using their own information extraction modes, and the at least two kinds of business documents are then reviewed according to the extracted key information. This embodiment gives a specific implementation in which a table image perception model based on LayoutXLM extracts key information from a table-type business document image using multimodal input features. The model can extract this key information more accurately and comprehensively, which supports accurate review of the business documents in the subsequent steps.

FIG. 5 is a flowchart of a business document review method provided according to an embodiment of the present disclosure. On the basis of the above embodiments, this embodiment further explains in detail how the key information of the target business content is extracted from the business document image of a table-type business document. As shown in FIG. 5, the business document review method provided by this embodiment may include:

S501: Acquire business document images of at least two kinds of business documents.

S502: When the document type is the table type, extract first candidate information of the target business content from the table-type business document image through the table image perception model.

Note that this step is implemented as described in S402 to S405 of the above embodiment, which is not repeated here.

S503: Extract second candidate information of the target business content from the table-type business document image according to structured information extraction logic.

The structured information extraction logic may extract information using a preset structured template that maps document extraction fields to key information. The template may be set up according to the specific table used by the business document. For example, the document extraction fields in the template may be the business field names in the table-type business document, such as name, age and time.

Specifically, each document extraction field in the structured template may be matched by regular expression against the text information in the table-type business document image to obtain the text corresponding to that field, which is the candidate information for the field. The candidate information for all document extraction fields in the template is then combined to obtain the second candidate information of the target business content extracted from the table-type business document image.
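The regular-expression matching in S503 might look like the sketch below. The template contents, field labels and patterns are invented for illustration (the disclosure only names name, age and time as example fields), and the function name is hypothetical.

```python
import re

# Hypothetical structured template: field name -> pattern with one capture group.
TEMPLATE = {
    "name": r"Name[:：]\s*(\S+)",
    "age": r"Age[:：]\s*(\d+)",
    "time": r"Time[:：]\s*(\d{4}-\d{2}-\d{2})",
}


def extract_with_template(text: str, template: dict) -> dict:
    """Return the second candidate information: one value per field that matches."""
    candidates = {}
    for field, pattern in template.items():
        match = re.search(pattern, text)
        if match:
            candidates[field] = match.group(1)
    return candidates
```

Fields whose pattern does not match are simply absent from the result, which is consistent with the lower recall the disclosure attributes to this extraction route.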

S504: Determine the key information of the target business content in the table-type business document image according to the first candidate information and the second candidate information.

Note that the first candidate information, extracted in S502 by the table image perception model from the multimodal model input features, has high recall but low precision, whereas the second candidate information, determined in S503 by the structured information extraction logic, has high precision but low recall. This step may therefore correct the first candidate information according to the second candidate information and use the corrected first candidate information as the key information of the target business content in the table-type business document image. Specifically, for each document extraction field in the first candidate information, if the second candidate information also contains that field, the content of the field in the first candidate information is replaced with its content in the second candidate information.
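The correction rule of S504 amounts to a field-wise override, sketched below with a hypothetical function name. Following the wording of the step, only fields already present in the first candidate information are updated; fields that appear only in the second candidate information are not added.

```python
def merge_candidates(first: dict, second: dict) -> dict:
    """Correct the model output (first) with the template output (second).

    For every field in `first`, the value from `second` takes precedence
    when `second` also contains that field. The inputs are not modified.
    """
    merged = dict(first)
    for field in merged:
        if field in second:
            merged[field] = second[field]
    return merged
```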

S505: When the document type is a document type other than the table type, extract the key information of the target business content from the business document image of that document type according to the information extraction mode corresponding to that document type.

S506: Review the at least two kinds of business documents according to the extracted key information.

In the solution of this embodiment, after the business document images of at least two kinds of business documents are acquired, candidate information is extracted from the table-type business document image by both the table image perception model and the structured information extraction logic, and the two sets of candidates are fused to obtain the key information of the table-type business document image. Key information is also extracted from business document images of other types using their own information extraction modes, and the at least two kinds of business documents are then reviewed according to the extracted key information. When extracting key information from the table-type business document image, this solution combines neural network extraction, which has high recall but low precision, with structured extraction logic, which has low recall but high precision. The result is both comprehensive and accurate, which supports more accurate review of the business documents in the subsequent steps.

FIG. 6 is a flowchart of a business document review method provided according to an embodiment of the present disclosure. On the basis of the above embodiments, this embodiment explains in detail how the at least two kinds of business documents are reviewed according to the extracted key information. As shown in FIG. 6, the business document review method provided by this embodiment may include:

S601: Acquire business document images of at least two kinds of business documents.

S602: Extract the key information of the target business content from the business document image of each document type according to the information extraction mode corresponding to that document type.

S603: Divide the extracted key information into at least one information pair according to the business document image and the document extraction field to which each piece of extracted key information corresponds.

The pieces of key information in the same information pair correspond to document extraction fields with the same semantics and are taken from business document images of different document types.

That is, document extraction fields that represent the same business content may have different names in business documents of different types. This embodiment may therefore compare the key information extracted from the business document images of the different document types, and group into one information pair the pieces of key information whose document extraction fields have the same semantics and which come from business document images of different document types. In this way all the key information extracted in S602 is divided into at least one information pair.
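One way to sketch the grouping in S603 is shown below. How "same semantics" is decided is not specified in this passage, so the sketch assumes a hand-written alias table `canonical_names` that maps each field name to a shared meaning; that table, the function name and the data layout are all hypothetical.

```python
def build_information_pairs(extracted: dict, canonical_names: dict) -> dict:
    """Group key information by field meaning across document types.

    extracted:       {document type: {field name: key information}}
    canonical_names: {field name: shared meaning} (assumed alias table;
                     unknown field names are used as their own meaning)
    Returns {meaning: {document type: key information}}, keeping only
    meanings found in at least two different document types.
    """
    groups = {}
    for doc_type, fields in extracted.items():
        for field, value in fields.items():
            meaning = canonical_names.get(field, field)
            groups.setdefault(meaning, {})[doc_type] = value
    return {m: by_doc for m, by_doc in groups.items() if len(by_doc) >= 2}
```

A learned semantic matcher could replace the alias table without changing the shape of the result.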

S604: Determine a target review rule for the information pair according to the document extraction field corresponding to the information pair.

Optionally, this embodiment may set a corresponding target review rule for the document extraction fields of each semantic, dedicated to reviewing the key information in information pairs whose document extraction fields carry that semantic. Specifically, determining the target review rule for each information pair according to its document extraction field may include at least one of the following cases:

Case 1: Where the document extraction field corresponding to the information pair has a unique information expression mode, the target review rule for the information pair is a character consistency review. Specifically, for a document extraction field whose expression is unique and fixed, such as the field corresponding to the name of a business participant, the target review rule for its information pair is that the characters of all key information in the pair must be exactly identical for the review to pass.

Case 2: Where the document extraction field corresponding to the information pair does not have a unique information expression mode, the target review rule for the information pair is a semantic similarity review. Specifically, some document extraction fields can be expressed flexibly. For example, the field corresponding to the business execution duration may be expressed by a start time and an end time, or by a total duration. For the information pair of such a field, the target review rule is that the characters of the key information need not be identical; the review passes as long as the expressed semantics are the same. For example, the similarity between the pieces of key information in the pair may be calculated, and if the similarity is greater than a similarity threshold (e.g., 0.5), the consistency review is considered passed.

Case 3: Where the document extraction field corresponding to the information pair represents numerical information, the target review rule for the information pair is a numerical character consistency review. Specifically, for a document extraction field representing numerical information, the target review rule for its information pair is to extract the numerical information from each piece of key information in the pair and then review the consistency of the extracted values. If the values are consistent, the review passes; the remaining characters of the key information are not required to be identical.

Case 4: Where the document extraction field corresponding to the information pair represents date information, the target review rule for the information pair is a date character consistency review. Specifically, a date can be expressed in more than one way: in digits only, in words only, or in a mix of words and digits. The target review rule for the information pair of such a field is therefore to convert the date-type key information in the pair into a unified representation (such as XX-XX-XX) and then perform a string consistency review; if the strings are consistent, the review passes.
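The four review rules can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: character-level similarity from `difflib.SequenceMatcher` stands in for the unspecified semantic similarity measure, ISO `YYYY-MM-DD` stands in for the unified date representation, and only a few date formats are handled. All function names are hypothetical.

```python
import re
from datetime import date
from difflib import SequenceMatcher

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def check_exact(a, b):
    """Case 1: every character must match."""
    return a == b


def check_similarity(a, b, threshold=0.5):
    """Case 2: pass when similarity exceeds the threshold.

    Assumption: character-level similarity is only a stand-in for a
    real semantic similarity model.
    """
    return SequenceMatcher(None, a, b).ratio() > threshold


def extract_numbers(text):
    """Pull numeric values out of a string, ignoring thousands separators."""
    return [float(n.replace(",", ""))
            for n in re.findall(r"\d[\d,]*(?:\.\d+)?", text)]


def check_numeric(a, b):
    """Case 3: compare extracted numeric values, not the full strings."""
    numbers_a, numbers_b = extract_numbers(a), extract_numbers(b)
    return bool(numbers_a) and numbers_a == numbers_b


def normalize_date(text):
    """Convert a few common date formats to YYYY-MM-DD, or None if unknown."""
    lowered = text.strip().lower()
    match = re.search(r"(\d{4})\D+(\d{1,2})\D+(\d{1,2})", lowered)
    if match:
        year, month, day = (int(part) for part in match.groups())
    else:
        match = re.search(r"([a-z]+)\s+(\d{1,2}),?\s+(\d{4})", lowered)
        if not match or match.group(1) not in MONTHS:
            return None
        month = MONTHS[match.group(1)]
        day, year = int(match.group(2)), int(match.group(3))
    try:
        return date(year, month, day).isoformat()
    except ValueError:
        return None


def check_date(a, b):
    """Case 4: normalize both dates, then compare the strings."""
    normalized = normalize_date(a)
    return normalized is not None and normalized == normalize_date(b)
```

For example, `check_numeric("USD 1,200.00", "1200 dollars")` passes because both strings reduce to the value 1200, even though their characters differ.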

Table 1. Review rule record table

By way of example, Table 1 is a review rule record table. The first column of Table 1 lists the document extraction fields in table-type business documents (the table document extraction fields), and the second column lists the document extraction fields in text-type business documents (the text document extraction fields). A table document extraction field and a text document extraction field in the same row form a group of document extraction fields with the same semantics. The third column gives the review rule for each group of semantically identical document extraction fields. The first row is the header. The target review rules for the information pairs corresponding to the document extraction fields in the second and third rows are determined by Case 1 above. For the fourth, tenth, and twelfth rows, only one kind of business document has a corresponding document extraction field, so no information pair is formed and no target review rule is determined. The target review rules for the information pairs corresponding to the fifth and sixth rows are determined by Case 4 above; those for the seventh to ninth rows are determined by Case 3 above; and that for the eleventh row is determined by Case 2 above.

This embodiment flexibly sets target review rules for different document extraction fields based on whether a field's expression mode is unique and on the type of content it expresses, which improves the accuracy of the subsequent review of business documents.

It should be noted that this embodiment may assign priorities to the above four cases for determining the target review rule. When a document extraction field satisfies several cases at the same time, the case with the higher priority may be used to determine the final target review rule. For example, if the document extraction field corresponding to an information pair represents a date, Case 1 and Case 4 are both satisfied; because Case 4 has a higher priority than Case 1, the target review rule finally determined for the information pair is the one corresponding to Case 4.
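A minimal sketch of priority-based rule selection is given below. The disclosure only states that Case 4 outranks Case 1; the relative order of the other cases in `RULE_PRIORITY`, and the rule names themselves, are assumptions made for illustration.

```python
# Hypothetical priorities (larger wins). Only "date" > "character" is
# stated in the text; the rest of the ordering is an assumption.
RULE_PRIORITY = {
    "date": 4,                 # Case 4
    "numeric": 3,              # Case 3
    "semantic_similarity": 2,  # Case 2
    "character": 1,            # Case 1
}


def select_target_rule(matching_rules):
    """Return the highest-priority rule among those a field satisfies."""
    if not matching_rules:
        return None  # no information pair, so no rule is determined
    return max(matching_rules, key=lambda rule: RULE_PRIORITY[rule])


# A date field satisfies both Case 1 and Case 4; Case 4 is selected.
chosen = select_target_rule({"character", "date"})
```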

S605: According to the target review rule corresponding to the information pair, perform a consistency review on the key information in the information pair, and determine the review result of the at least two kinds of business documents according to the consistency review results.

Specifically, for each information pair, this embodiment may perform an information consistency review on the key information in the pair based on the target review rule determined for it in S604. If the review results of all information pairs indicate that the key information content is consistent, the review result of the at least two kinds of business documents is determined to be a pass; otherwise, the review result is a fail.
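The aggregation in S605 amounts to a logical AND over the per-pair results. The sketch below assumes each information pair holds exactly two values and that `rules` maps a semantic label to a comparison function; both the data layout and the names are hypothetical.

```python
def review_documents(info_pairs, rules):
    """Review every information pair; the documents pass only if all pairs pass."""
    pair_results = {
        semantic: rules[semantic](value_a, value_b)
        for semantic, (value_a, value_b) in info_pairs.items()
    }
    return all(pair_results.values()), pair_results


# Hypothetical rules: exact match for names, numeric equality for amounts.
rules = {
    "participant_name": lambda a, b: a == b,
    "total_amount": lambda a, b: float(a) == float(b),
}
info_pairs = {
    "participant_name": ("Acme Ltd.", "Acme Ltd."),
    "total_amount": ("1200", "1200.00"),
}
passed, details = review_documents(info_pairs, rules)
```

Returning the per-pair results alongside the overall verdict keeps the information needed to show the user which pair caused a failure.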

In the solution of this embodiment of the present disclosure, after business document images of at least two kinds of business documents are acquired, different information extraction modes are used for different types of business document images to extract key information from the corresponding images. The extracted key information is divided into multiple information pairs. For each information pair, a target review rule is selected based on its document extraction field and used to review the pair, and the review result of the at least two kinds of business documents is then determined from the review results of the information pairs. Because the solution flexibly selects a suitable target review rule for the information pair of each document extraction field, its review of key information is more accurate than a review that applies a single rule.

FIG. 7 is a flowchart of a business document review method provided according to an embodiment of the present disclosure. Building on the above embodiments, this embodiment further explains in detail how the at least two kinds of business documents are reviewed according to the extracted key information. As shown in FIG. 7, the business document review method provided in this embodiment may include:

S701: Acquire business document images of at least two kinds of business documents.

S702: According to the information extraction modes corresponding to the different document types, extract key information of the target business content from the business document image of the corresponding document type.

S703: Perform seal recognition on the business document images of the at least two kinds of business documents to obtain a seal recognition result.

Optionally, seal recognition on the business document images may be performed in many ways, and this embodiment places no limitation on the method. For example, one possible implementation is to perform seal recognition on each kind of business document image with a pre-trained seal recognition model, obtaining a seal recognition result for each kind of business document image. Another possible implementation is based on seal image feature matching: a region that satisfies the seal image features is located in each kind of business document image, and information is then extracted from that region to obtain the seal recognition result.

Optionally, the seal recognition result obtained in this embodiment may include, but is not limited to, the shape, position, and number of seals and the text on each seal.

S704: Review the at least two kinds of business documents according to the seal recognition result and the extracted key information.

Optionally, when reviewing the at least two kinds of business documents, this embodiment may first check, according to the seal recognition result, whether the document seals meet the requirements, for example, whether the text on the seal is correct, whether there are enough seals, whether the seals are in the correct position, and whether the seal pattern is complete. If the seals of the at least two kinds of business documents all meet the requirements, the documents are further reviewed according to the extracted key information. The process of reviewing the at least two kinds of business documents based on the extracted key information has been described in detail in the above embodiments and is not repeated here.
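The two-stage order described above, seals first and key information second, can be sketched as follows. The `SealRecognition` fields and the requirement parameters are hypothetical, and the position check is omitted for brevity; the key-information review is passed in as a callable so that it runs only when every seal check succeeds.

```python
from dataclasses import dataclass


@dataclass
class SealRecognition:
    text: str       # text recognized on the seal
    count: int      # number of seals found in the document image
    complete: bool  # whether the seal pattern is complete


def seals_meet_requirements(seal, expected_text, required_count):
    """Check seal text, seal count, and pattern completeness."""
    return (seal.text == expected_text
            and seal.count >= required_count
            and seal.complete)


def review_with_seals(seals, expected_text, required_count, review_key_info):
    """Run the key-information review only if all documents' seals pass."""
    if not all(seals_meet_requirements(s, expected_text, required_count)
               for s in seals):
        return False
    return review_key_info()
```

Passing the second stage as a callable avoids running the key-information review on documents whose seals have already failed.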

In the solution of this embodiment of the present disclosure, after business document images of at least two kinds of business documents are acquired, different information extraction modes are used for different types of business document images to extract key information from the corresponding images, seal recognition is performed on the business document images, and the at least two kinds of business documents are reviewed according to the seal recognition result and the extracted key information. Because the solution reviews business documents from two aspects, seal detection and key information detection, it covers the document content more comprehensively while keeping the review result accurate.

FIG. 8A is a flowchart of a business document review method provided according to an embodiment of the present disclosure, and FIG. 8B is a schematic diagram of the display effect of a visual interface provided according to an embodiment of the present disclosure. Building on the above embodiments, this embodiment is further optimized by adding a process of displaying the review result. As shown in FIGS. 8A and 8B, the business document review method provided in this embodiment may include:

S801: Acquire business document images of at least two kinds of business documents.

S802: According to the information extraction modes corresponding to the different document types, extract key information of the target business content from the business document image of the corresponding document type.

S803: Review the at least two kinds of business documents according to the extracted key information.

S804: Display the extracted key information, the business document images of the at least two kinds of business documents, and the review result through a visual interface.

Optionally, as shown in FIG. 8B, after the review of the at least two kinds of business documents is completed, this embodiment may present the review result to the user through a visual interface. Specifically, the business document images of the at least two kinds of business documents may be displayed in the visual interface, with the key information extracted from each business document image displayed beside it; for example, the pieces of key information belonging to the same information pair may be displayed together in association. The review result of the at least two kinds of business documents may also be displayed. Optionally, if the review result is a fail, the review result of each information pair may be displayed together with that information pair, which makes it easier for the user to recheck the review result.

In the solution of this embodiment of the present disclosure, after business document images of at least two kinds of business documents are acquired, different information extraction modes are used for different types of business document images to extract key information from the corresponding images, the at least two kinds of business documents are reviewed according to the extracted key information, and the review result, the acquired business document images, and the extracted key information are displayed visually to the user. By presenting the review result and the related review data in a visual interface, the solution lets the user understand the review more intuitively.

Optionally, the solution of this embodiment may also support human-machine collaborative review of the at least two kinds of business documents. That is, after the extracted key information, the at least two types of document images, and the review result of the image set to be reviewed are displayed through the visual interface, the method may further include: receiving modification information generated by an operation on the visual interface; and updating the review result of the image set to be reviewed according to the modification information.

Specifically, after the visual interface displays the extracted key information, the at least two types of business document images, and the device's review result for the two types of business documents, the user can recheck the system's review result based on the displayed content. If the user finds that the system's review result is wrong, the user can trigger the relevant component of the visual interface and input modification information. The device that performs the document review in this embodiment then corrects the review result according to the modification information input by the user and displays the corrected review result in the visual interface again.

Optionally, the modification information input by the user may modify the extracted key information or modify the review result. If the modification information concerns the extracted key information, the system may re-execute the review of the business documents based on the modified key information and update the review result in the visual interface. If the modification information concerns the review result, the system may update the review result in the visual interface directly based on the modification information.
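The two kinds of modification can be sketched as a small dispatch function. The dictionary layout of `state` and `modification`, and the `rerun_review` callback, are hypothetical and chosen only to show the difference between re-running the review and overwriting its result.

```python
def apply_modification(state, modification, rerun_review):
    """Update the review state from one user modification."""
    if modification["target"] == "key_info":
        # Corrected key information: re-run the review on the new values.
        state["key_info"][modification["field"]] = modification["value"]
        state["result"] = rerun_review(state["key_info"])
    elif modification["target"] == "result":
        # Corrected verdict: take the user's result directly.
        state["result"] = modification["value"]
    else:
        raise ValueError(f"unknown modification target: {modification['target']}")
    return state


def rerun_review(key_info):
    """Hypothetical review: the two extracted amounts must match."""
    return key_info["table.amount"] == key_info["text.amount"]


state = {
    "key_info": {"table.amount": "1200", "text.amount": "1280"},
    "result": False,
}
# The user fixes a mis-extracted amount, which triggers a new review.
apply_modification(
    state,
    {"target": "key_info", "field": "text.amount", "value": "1200"},
    rerun_review,
)
```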

Building on the visual display of the review result and related information described above, this solution further provides human-machine collaborative review of document information, in which manual rechecking further ensures the accuracy of the review result.

Optionally, on the basis of this embodiment, displaying the extracted key information through the visual interface includes: displaying the extracted key information and the confidence of the key information through the visual interface. Specifically, when the table image perception model and the text perception model extract key information from the business document images, they also output a confidence for each piece of key information, and this embodiment may display that confidence together with the extracted key information. The advantage is that the user can use the displayed confidence to quickly locate the key information that needs closer review, which makes manual rechecking more efficient.
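One way an interface could use the displayed confidence is to list low-confidence items first. The sketch below is an assumption about presentation only: the disclosure says the confidence is displayed, not that items are sorted or thresholded, and the `threshold` value and dictionary layout are hypothetical.

```python
def flag_for_manual_review(key_info, threshold=0.8):
    """Return field names below the confidence threshold, least confident first."""
    ordered = sorted(key_info, key=lambda item: item["confidence"])
    return [item["field"] for item in ordered if item["confidence"] < threshold]


items = [
    {"field": "participant_name", "value": "Acme Ltd.", "confidence": 0.98},
    {"field": "total_amount", "value": "1280", "confidence": 0.62},
    {"field": "signing_date", "value": "2022-06-24", "confidence": 0.75},
]
to_check = flag_for_manual_review(items)
```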

FIG. 9 is a flowchart of a business document review method provided according to an embodiment of the present disclosure. Building on the above embodiments, this embodiment is further optimized by adding a process of sorting the business document images for display. As shown in FIG. 9, the business document review method provided in this embodiment may include:

S901: Acquire business document images of at least two kinds of business documents.

S902: According to the information extraction modes corresponding to the different document types, extract key information of the target business content from the business document image of the corresponding document type.

S903: Review the at least two kinds of business documents according to the extracted key information.

S904: Determine the display order of the business document images of each kind of business document.

In this embodiment, the display order of a business document image is the page number of that image within the business document to which it belongs.

Optionally, the number of pages of a business document varies with the complexity of the business. When a business document has at least two pages, the business document images corresponding to those pages may be acquired out of order, which impairs the continuity of browsing during the subsequent visual display. To avoid displaying business document images out of order, this embodiment may sort the business document images of each kind of business document before displaying them, that is, determine the display order of the multiple business document images of each kind of business document.

Specifically, the process of determining the display order of the business document images of each kind of business document in this embodiment is as follows: for each kind of business document, text recognition is first performed on each business document image to obtain its text information; a pre-trained sorting model then parses the text information of the business document images and outputs the display order of each image. The sorting model may be a BERT binary classification model.

Preferably, this embodiment may construct context datasets from the text information of the business document images of each kind of business document and a preset head-and-tail sequence. Specifically, if there are N business document images, N+1 context datasets are constructed, each consisting of a preceding part and a following part. In the first context dataset, the preceding part is the start sequence of the preset head-and-tail sequence and the following part is the text information of the N business document images. After that, the text information of each business document image is taken in turn as the preceding part, and the text information of the other business document images together with the end sequence of the preset head-and-tail sequence is taken as the following part. This yields N+1 context datasets. The N+1 context datasets are then input in turn into the pre-trained sorting model, which outputs the display order of the business document images of that kind of business document according to the input context datasets.
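The construction of the N+1 context datasets can be sketched as below. The sentinel strings `"[START]"` and `"[END]"` are placeholders for the preset head-and-tail sequence, whose actual content the disclosure does not give; the tuple layout (preceding part, list of candidate following parts) is likewise an assumption.

```python
START, END = "[START]", "[END]"  # placeholders for the preset head-and-tail sequence


def build_context_sets(page_texts):
    """Build N+1 (preceding, following-candidates) sets for N page texts."""
    # First set: the start sequence is followed by one of the N pages.
    contexts = [(START, list(page_texts))]
    # Remaining N sets: each page is followed by another page or by the end sequence.
    for index, text in enumerate(page_texts):
        others = [t for j, t in enumerate(page_texts) if j != index]
        contexts.append((text, others + [END]))
    return contexts


pages = ["page B text", "page A text", "page C text"]  # acquired out of order
contexts = build_context_sets(pages)
```

With three pages, this produces four context datasets: one anchored on the start sequence and one anchored on each page.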

Optionally, to reduce system power consumption, this embodiment may, before executing this step, first determine whether each kind of business document has multiple pages. If so, the operation of this step is executed; otherwise, the subsequent operation of S905 is executed with the default display order (that is, the image is displayed as the first page).

S905: Display the business document images of each kind of business document in the visual interface according to the display order, and at the same time display the extracted key information and the review result of the at least two kinds of business documents through the visual interface.

Optionally, when displaying the business document images through the visual interface, this embodiment may sort the business document images based on the display order determined in S904 and display them in the visual interface in that order, together with the extracted key information and the review result of the at least two kinds of business documents.

In the solution of this embodiment of the present disclosure, after business document images of at least two kinds of business documents are acquired, different information extraction modes are used for different types of business document images to extract key information from the corresponding images, and the at least two kinds of business documents are reviewed according to the extracted key information. The business document images are also sorted and displayed in the visual interface in the sorted order, and the extracted key information and the review result are displayed in the visual interface as well. Because the solution adjusts the display order of the business document images to match the page numbers of the corresponding business document before displaying them, it greatly improves the continuity of browsing business document images in the visual interface.

By way of example, FIG. 10 is a schematic block diagram of the parameter tuning and use of the sorting model provided by an embodiment of the present disclosure. As shown in FIG. 10, blocks 1001 to 1003 correspond to the parameter tuning process of the sorting model. In this process, the business document images of block 1001 may be a group of images (the training sample images) obtained by shuffling the photographed or scanned page images of a multi-page business document used for training. The training sample images of block 1001 are input to block 1002 for data preprocessing, which may consist of recognizing the text information of the shuffled business document images, constructing context datasets from the recognition results (the construction process has been described in detail in the above embodiment), and obtaining the ground-truth order labels of the group of images. The context datasets and ground-truth order labels produced by block 1002 are input to block 1003 to fine-tune the sorting model. The fine-tuned sorting model can then be published and used for model inference in block 1004.

Blocks 1001 to 1002 and 1004 to 1005 correspond to the use of the sorting model. In actual use, block 1001 holds the business document images of a given document type that need to be sorted. These images are input to block 1002 for preprocessing, that is, context datasets are constructed and then input to block 1004. Based on the sorting model published by block 1003, block 1004 parses the input context datasets, determines the context ordering result of each group of context data, and passes the results to block 1005. Block 1005 aggregates the ordering results of all groups of context data to obtain the overall ordering of the business document images.
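The aggregation step of block 1005 is not specified in detail. One plausible reading, sketched below as an assumption, is that each context dataset yields a predicted successor for its preceding part, and the overall order is obtained by following the successor chain from the start sequence to the end sequence. The `successor` mapping and the sentinel strings are hypothetical.

```python
def chain_order(successor, start="[START]", end="[END]"):
    """Follow predicted successors from the start sequence to the end sequence.

    Assumption: successor[x] is the item the sorting model judged most
    likely to follow x. The walk stops on a missing entry or a repeated
    item so that inconsistent predictions cannot loop forever.
    """
    order, seen = [], set()
    current = successor.get(start)
    while current is not None and current != end and current not in seen:
        order.append(current)
        seen.add(current)
        current = successor.get(current)
    return order


# Hypothetical per-context predictions for three shuffled pages.
predicted = {
    "[START]": "page A",
    "page A": "page B",
    "page B": "page C",
    "page C": "[END]",
}
ordered_pages = chain_order(predicted)
```

If the predictions are inconsistent, this walk returns only a partial order; a production system would need a fallback, which the sketch does not attempt.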

FIG. 11 is a flowchart of a business document review method provided according to an embodiment of the present disclosure. Building on the above embodiments, this embodiment further explains in detail how the business document images of the at least two kinds of business documents are displayed through the visual interface. As shown in FIG. 11, the business document review method provided in this embodiment may include:

S1101: Acquire business document images of at least two kinds of business documents.

S1102: According to the information extraction modes corresponding to the different document types, extract key information of the target business content from the business document image of the corresponding document type.

S1103: Review the at least two kinds of business documents according to the extracted key information.

S1104: According to the business document image and the start-end position corresponding to the extracted key information, mark the extraction position of the key information in the business document images of the at least two kinds of business documents.

Optionally, when key information is extracted from the text information of a business document image, the text information contains not only the text content but also the position of the text, so every character of the extracted key information has position information in the corresponding business document image. For each piece of extracted key information, this embodiment can therefore take the position information of its first and last characters as the start position and end position of the key information, and combine the two to obtain the start-end position of the key information. The start-end position is then mapped onto the business document image corresponding to the key information, and the mapped region can be marked directly as the extraction position of the key information in that image. Exemplarily, the marking can be done by highlighting, adding a text box, or changing the font.
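The mapping from a start-end position to a marked region can be sketched as follows. This is a minimal illustration, assuming each recognized character carries an axis-aligned box in page pixel coordinates; the `CharBox` type and the `key_info_region` function are hypothetical names and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CharBox:
    """Assumed per-character position: an axis-aligned box in page pixels."""
    x0: float
    y0: float
    x1: float
    y1: float


def key_info_region(char_boxes: Sequence[CharBox], start: int, end: int) -> CharBox:
    """Combine the boxes of the first and last characters into one region.

    `start` and `end` are the (inclusive) character indices of the key
    information within the recognized text of the page.
    """
    first, last = char_boxes[start], char_boxes[end]
    return CharBox(
        min(first.x0, last.x0),
        min(first.y0, last.y0),
        max(first.x1, last.x1),
        max(first.y1, last.y1),
    )
```

The returned region is what a viewer would highlight or outline. It is only tight when the key information sits on a single line.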

It should be noted that, when extracting key information of the target business content from the business document image of the corresponding document type, this solution may apply preprocessing operations such as cleaning and symbol replacement to the text information of the image. This can make text positions differ before and after preprocessing and reduce the accuracy of the subsequent position marking. To handle this, before preprocessing the data, this embodiment can cache a copy of the original text information and record the position information that text recognition produced for each character. While the original text information is preprocessed, the same cleaning and symbol replacement operations are applied synchronously to the cached text information, so that the character positions of the cached text information stay consistent with those of the preprocessed text information. When key information is marked, its start-end position can be determined from the cached text information, which ensures the accuracy of the marking.
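One way to keep positions aligned during preprocessing is to carry each character's recorded position along with it. The sketch below is a simplified variant of the caching idea, assuming cleaning means dropping whitespace and symbol replacement is a one-to-one character mapping; the `REPLACEMENTS` table and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

P = TypeVar("P")  # whatever position record the OCR step produced

# Hypothetical one-to-one symbol replacements (full-width to half-width).
REPLACEMENTS = {"（": "(", "）": ")", "：": ":", "，": ","}


def preprocess_with_positions(text: str, positions: Sequence[P]) -> Tuple[str, List[P]]:
    """Clean `text` while keeping the original position of every kept character.

    Whitespace is removed and symbols are replaced; because positions travel
    with their characters, index i of the cleaned text still maps to the
    position of the character it came from.
    """
    if len(text) != len(positions):
        raise ValueError("expected one position per character")
    kept_chars: List[str] = []
    kept_positions: List[P] = []
    for ch, pos in zip(text, positions):
        if ch.isspace():
            continue  # cleaning: the character and its position are dropped together
        kept_chars.append(REPLACEMENTS.get(ch, ch))
        kept_positions.append(pos)
    return "".join(kept_chars), kept_positions
```

An index returned by the extraction model over the cleaned text can then be looked up directly in the returned position list.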

S1105: Display the marked business document images in the visual interface, and at the same time display the extracted key information and the review results of the at least two kinds of business documents through the visual interface.

In the solution of this embodiment, after the business document images of at least two kinds of business documents are acquired, a different information extraction method is used for each document type to extract key information from the corresponding business document images, and the at least two kinds of business documents are reviewed according to the extracted key information. According to the business document image and the start-end position corresponding to the extracted key information, the extraction position of the key information is marked in the business document images, which are then displayed in the visual interface together with the extracted key information and the corresponding review results. Because the displayed business document images carry the extraction positions of the key information, users can find those positions intuitively and quickly and check whether the key information was extracted accurately, which improves the efficiency of the subsequent human-machine collaborative review of the text information.

Optionally, in this embodiment, a piece of key information may be spread over different lines of a business document. In that case, marking simply based on the start-end position of the key information may produce a region that is too large or inaccurate. To address this, this embodiment may: determine, according to the start-end position corresponding to the extracted key information, whether the key information was extracted across lines; and mark the extraction position of the key information in the business document images of the at least two kinds of business documents according to the determination result and the business document image corresponding to the key information.

Specifically, whether key information was extracted across lines can be determined as follows. For each piece of key information, the position information of all characters from the start position to the end position is obtained according to its start-end position, and the X-axis coordinates (the X axis being the direction of a line of characters in the document) are checked in order for a point at which the coordinate suddenly decreases. If there is no such point, the key information is marked in the conventional way described above. If there is, the key information was extracted across lines and the position of that point is a line-break position; the key information is then divided at the line-break position into at least two segments for marking. For example, if a piece of key information has one line-break position, it is divided into two segments: the first runs from the start position of the key information to the position immediately before the line-break position, and the second runs from the line-break position to the end position of the key information. The extraction position of each segment is then marked in the corresponding business document image in the way described in the above embodiment. The advantage of this arrangement is that key information displayed across lines is taken into account, which further improves the accuracy of the positions marked in the business document image.
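The cross-line check described above can be sketched as a single pass over the X coordinates of the characters. This is an illustrative sketch only: the function name and the `min_drop` tolerance are assumptions, and the text does not specify how large a decrease counts as sudden.

```python
from typing import List, Sequence, Tuple


def split_at_line_breaks(x_coords: Sequence[float], min_drop: float = 0.0) -> List[Tuple[int, int]]:
    """Split character indices into per-line segments.

    `x_coords[i]` is the X coordinate of the i-th character of one piece of
    key information, in reading order. A character whose X coordinate is
    smaller than its predecessor's by more than `min_drop` is treated as the
    first character of a new line. Returns inclusive (start, end) index pairs.
    """
    if not x_coords:
        return []
    segments: List[Tuple[int, int]] = []
    seg_start = 0
    for i in range(1, len(x_coords)):
        if x_coords[i] < x_coords[i - 1] - min_drop:
            segments.append((seg_start, i - 1))  # segment ends just before the break
            seg_start = i                        # the break position starts the next one
    segments.append((seg_start, len(x_coords) - 1))
    return segments
```

A single returned segment corresponds to the conventional marking; two or more segments are marked one by one, each with its own region.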

Exemplarily, FIG. 12A is a flowchart of work order document review according to an embodiment of the present disclosure, and FIG. 12B is a flowchart of the working principle of the perception engine according to an embodiment of the present disclosure. This embodiment takes the review of the business documents in a work order as an example to introduce the business document review method. It should be noted that, in the field of financial deposits, each deposit transaction can be regarded as one work order, and each work order produces at least two basic materials: an approval form and a deposit transaction contract. Reviewing a work order in this embodiment means checking whether the contents recorded in the approval form and the transaction contract are consistent for fields such as the transaction parties, value date, maturity date, deposit term, deposit amount, interest rate, and interest calculation method.

Specifically, as shown in FIG. 12A, the work order images of block 1201 (which include the approval form images and the transaction contract images) are input into the perception engine of block 1202. The working principle of the perception engine is shown in FIG. 12B: general-purpose OCR obtains the text information of the work order images (including the character information of each text box, the recognition confidence of each character, and the coordinate information of the characters); seal recognition obtains the seal information of the work order images (including whether an image contains a seal, the character information of each seal, and so on); and table recognition obtains the table information of bordered or borderless tables in the work order images (including whether the table is bordered or borderless, row and column information, cell character information, and the recognition confidence of the cell characters).

After block 1202 obtains the perception results, it can input them into the classifier of block 1203. The classifier may be obtained by fine-tuning a pre-trained model (ERNIE). Based on the input perception results, the classifier outputs the document type corresponding to each perception result, that is, whether the corresponding work order image is an approval form image, a transaction contract image, or an image of another type. According to the classification results, the approval form images and their related information (such as the information perceived by block 1202) are input into the table image information extraction part of the cognitive engine of block 1204, which extracts key information from the approval form images. The transaction contract images and their related information (such as the information perceived by block 1202) are input into the text image sorting part of the cognitive engine of block 1204; the text image sorting part sorts the transaction contract images and passes them to the text image information extraction part, which extracts key information from the sorted transaction contract images. It should be noted that, in the work order review scenario, an approval form usually has a single page while a transaction contract usually has many pages, so this embodiment sorts only the transaction contract images.

Block 1204 passes all the key information it extracted to block 1205. The review module of block 1205 uses this key information to check the content consistency of the approval form and the transaction contract of the work order and displays the review results through the visual interface, where users can take part in the review, which realizes the human-machine collaborative review of block 1206. For the specific review and display process, see the description of the above embodiments.

This embodiment integrates neural-network-based classification, neural-network-based table image information extraction, and neural-network-based text image information extraction to extract key information automatically, and integrates human-machine collaboration to help reviewers improve review efficiency in the financial deposit scenario.

FIG. 13 is a schematic structural diagram of a business document review apparatus according to an embodiment of the present disclosure. The embodiment is applicable to reviewing the information consistency of business documents of at least two document types associated with the same target business content, for example business forms and business contracts associated with the same target business content. The apparatus may be configured in an electronic device on which a document intelligence application is installed, may be implemented in software and/or hardware, and can implement the business document review method of any embodiment of the present disclosure. As shown in FIG. 13, the review apparatus 1300 includes:

A document image acquisition module 1301, configured to acquire business document images of at least two kinds of business documents, wherein the at least two kinds of business documents are all associated with the target business content and belong to different document types;

A key information extraction module 1302, configured to extract key information of the target business content from the business document images of the corresponding document type according to the information extraction methods corresponding to different document types;

A business document review module 1303, configured to review the at least two kinds of business documents according to the extracted key information.

Further, the key information extraction module 1302 includes:

An input sequence construction unit, configured to, when the document type is a text type, construct an input sequence according to a preset separator, a document extraction field, and the text information in the text-type business document image;

A model perception unit, configured to determine, through a text image perception model and according to the input sequence, the extraction start-end position corresponding to the document extraction field and the confidence of the extraction start-end position;

An extraction information determination unit, configured to determine, according to the extraction start-end position and its confidence, the extraction information corresponding to the document extraction field and the confidence of the extraction information;

A key information determination unit, configured to determine the key information of the target business content in the text-type business document image according to the extraction information corresponding to the document extraction field and the confidence of the extraction information.

Further, the input sequence construction unit is specifically configured to:

Preprocess the text information in the text-type business document image to obtain preprocessed text, wherein the preprocessing includes at least one of cleaning, symbol replacement, and format conversion;

Construct the input sequence according to the preprocessed text, the preset separator, and the document extraction field.
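As a rough illustration of the input sequence, the sketch below joins the document extraction field and the preprocessed text with a separator and records where the text begins, so that positions predicted over the sequence can be mapped back to the text. The `[SEP]` token, the field-first ordering, and both function names are assumptions made for this example; the exact layout used by the text image perception model is not given in this passage.

```python
from typing import Tuple


def build_input_sequence(field: str, text: str, sep: str = "[SEP]") -> Tuple[str, int]:
    """Return the input sequence and the offset at which `text` starts in it.

    Assumed layout: <field><sep><text>.
    """
    prefix = f"{field}{sep}"
    return prefix + text, len(prefix)


def to_text_index(sequence_index: int, text_offset: int) -> int:
    """Map an index predicted over the input sequence back to the text."""
    if sequence_index < text_offset:
        raise ValueError("index points into the field or separator, not the text")
    return sequence_index - text_offset
```

Keeping the offset alongside the sequence avoids re-deriving it when the extraction start-end position is later converted to character positions in the document image.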

Further, the key information extraction module 1302 is specifically configured to:

When the document type is a table type, divide the table-type business document image into blocks to obtain an image block sequence, and determine the image features of the image blocks in the image block sequence, the block position of each image block in the business document image, and the block sequence number of each image block in the image block sequence;

Perform word quantization on the text information in the business document image to obtain a word sequence, and determine the word features of the quantized words in the word sequence, the word position of each quantized word in the business document image, and the word sequence number of each quantized word in the word sequence;

Determine the model input features according to the image features, block positions, and block sequence numbers of the image blocks in the image block sequence and the word features, word positions, and word sequence numbers of the quantized words in the word sequence;

Extract, through a table image perception model and according to the model input features, the key information of the target business content from the table-type business document image.
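The bookkeeping behind the block and word sequences can be sketched as below. The example only produces block positions, block sequence numbers, word positions, and word sequence numbers; image features and word features would come from encoders that are not shown here. The grid-based blocking, the record layout, and the function names are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_into_blocks(width: int, height: int, block: int) -> List[Dict]:
    """Cut a page of `width` x `height` pixels into a row-major grid of blocks."""
    blocks: List[Dict] = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            box: Box = (left, top, min(left + block, width), min(top + block, height))
            blocks.append({"kind": "block", "index": len(blocks), "box": box})
    return blocks


def build_model_inputs(width: int, height: int, block: int,
                       words: Sequence[Tuple[str, Box]]) -> List[Dict]:
    """Concatenate block records and word records into one input list.

    Each record keeps its own sequence number (`index`) and its position on
    the page (`box`), mirroring the block/word position and sequence number
    described above.
    """
    records = split_into_blocks(width, height, block)
    for i, (word, box) in enumerate(words):
        records.append({"kind": "word", "index": i, "box": box, "text": word})
    return records
```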

Further, the key information extraction module 1302 is also specifically configured to:

When the document type is a table type, extract first candidate information of the target business content from the table-type business document image through a table image perception model;

Extract second candidate information of the target business content from the table-type business document image according to structured information extraction logic;

Determine the key information of the target business content in the table-type business document image according to the first candidate information and the second candidate information.
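How the two candidate sets are combined is not spelled out in this passage. The sketch below shows one plausible policy, purely as an assumption: trust the model candidate when its confidence reaches a threshold, otherwise fall back to the rule-based candidate if one exists. The function name, the data shapes, and the `min_confidence` value are all hypothetical.

```python
from typing import Dict, Tuple


def merge_candidates(first: Dict[str, Tuple[str, float]],
                     second: Dict[str, str],
                     min_confidence: float = 0.8) -> Dict[str, str]:
    """Merge model candidates with rule-based candidates, field by field.

    `first` maps a field to (value, confidence) from the perception model;
    `second` maps a field to the value found by structured extraction logic.
    """
    merged: Dict[str, str] = {}
    for field in sorted(set(first) | set(second)):
        model_value, confidence = first.get(field, (None, 0.0))
        rule_value = second.get(field)
        if model_value is not None and confidence >= min_confidence:
            merged[field] = model_value   # confident model output wins
        elif rule_value is not None:
            merged[field] = rule_value    # otherwise prefer the rule-based value
        elif model_value is not None:
            merged[field] = model_value   # low confidence, but nothing better
    return merged
```

Other policies, such as flagging disagreements for manual review, would fit the same interface.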

Further, the business document review module 1303 includes:

An information pair division unit, configured to divide the extracted key information into at least one information pair according to the business document images and document extraction fields corresponding to the extracted key information, wherein the pieces of key information in the same information pair correspond to document extraction fields with the same semantics and are taken from business document images of different document types;

A review rule determination unit, configured to determine a target review rule for the information pair according to the document extraction field corresponding to the information pair;

An information review unit, configured to perform a consistency review on the key information in the information pair according to the target review rule corresponding to the information pair, and to determine the review results of the at least two kinds of business documents according to the result of that review.
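The division into information pairs can be sketched as a grouping step. The example assumes that extraction fields with the same semantics have already been normalized to the same field name; the tuple layout, the document type labels, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_info_pairs(items: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Group extracted key information into pairs keyed by field.

    Each item is (document_type, field, value). A field forms a pair only if
    it was extracted from at least two different document types.
    """
    by_field: Dict[str, Dict[str, str]] = defaultdict(dict)
    for doc_type, field, value in items:
        by_field[field][doc_type] = value
    return {field: per_doc for field, per_doc in by_field.items() if len(per_doc) >= 2}
```

Fields found in only one document type are left out of the pairs; a reviewer could still be shown them as unmatched.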

Further, the review rule determination unit specifically performs at least one of the following:

When the information of the document extraction field corresponding to the information pair can be expressed in only one way, the target review rule of the information pair is a character consistency review;

When the information of the document extraction field corresponding to the information pair can be expressed in more than one way, the target review rule of the information pair is a semantic similarity review;

When the document extraction field corresponding to the information pair represents numerical information, the target review rule of the information pair is a numerical character consistency review;

When the document extraction field corresponding to the information pair represents date information, the target review rule of the information pair is a date character consistency review.
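The four rule types can be sketched as a dispatch table from field to comparison function. The field names, the `FIELD_RULES` mapping, and the normalization steps are assumptions for illustration. In particular, `similar_meaning` uses character overlap from `difflib` only as a stand-in for a real semantic similarity model, and the numeric rule does not handle amounts written in Chinese numerals.

```python
import re
from datetime import date
from decimal import Decimal
from difflib import SequenceMatcher
from typing import Callable, Dict


def same_characters(a: str, b: str) -> bool:
    """Character consistency: the two strings must match exactly."""
    return a.strip() == b.strip()


def similar_meaning(a: str, b: str, threshold: float = 0.8) -> bool:
    """Stand-in for semantic similarity: character overlap ratio, not a real model."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def same_number(a: str, b: str) -> bool:
    """Numeric consistency after dropping separators, units, and symbols."""
    def to_decimal(s: str) -> Decimal:
        return Decimal(re.sub(r"[^\d.\-]", "", s))
    return to_decimal(a) == to_decimal(b)


def same_date(a: str, b: str) -> bool:
    """Date consistency, assuming year-month-day order in both strings."""
    def to_date(s: str) -> date:
        year, month, day = (int(part) for part in re.findall(r"\d+", s)[:3])
        return date(year, month, day)
    return to_date(a) == to_date(b)


# Hypothetical mapping from extraction field to target review rule.
FIELD_RULES: Dict[str, Callable[[str, str], bool]] = {
    "party": same_characters,
    "interest_method": similar_meaning,
    "amount": same_number,
    "value_date": same_date,
}


def review_pair(field: str, a: str, b: str) -> bool:
    """Apply the target review rule of `field` to one information pair."""
    return FIELD_RULES[field](a, b)
```

With this table, adding a field only requires choosing which of the four rules applies to it.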

Further, the business document review module 1303 is specifically configured to:

Perform seal recognition on the business document images of the at least two kinds of business documents to obtain a seal recognition result;

Review the at least two kinds of business documents according to the seal recognition result and the extracted key information.

Further, the review apparatus 1300 also includes:

A display module, configured to display the extracted key information, the business document images of the at least two kinds of business documents, and the review results through a visual interface.

Further, the display module includes:

A sorting unit, configured to determine the display order of the business document images of each kind of business document;

An image display unit, configured to display the business document images of each kind of business document in the visual interface according to the display order.

Further, the display module includes:

A position marking unit, configured to mark the extraction position of the key information in the business document images of the at least two kinds of business documents according to the business document image and the start-end position corresponding to the extracted key information;

An image display unit, configured to display the marked business document images in the visual interface.

Further, the position marking unit is specifically configured to:

Determine, according to the start-end position corresponding to the extracted key information, whether the key information was extracted across lines;

Mark the extraction position of the key information in the business document images of the at least two kinds of business documents according to the determination result and the business document image corresponding to the key information.

Further, the display module includes:

A key information display unit, configured to display the extracted key information and the confidence of the key information through a visual interface.

Further, the review apparatus 1300 also includes:

An information receiving module, configured to receive modification information generated by operations on the visual interface;

An interface update module, configured to update the review results of the at least two kinds of business documents in the visual interface according to the modification information.

The above product can execute the method provided by any embodiment of the present disclosure and has the functional modules and beneficial effects corresponding to the executed method.

In the technical solution of the present disclosure, the acquisition, storage, and use of business documents of any document type and of the business document images of those documents comply with relevant laws and regulations and do not violate public order and good customs.

In the technical solution of the present disclosure, the approval forms and transaction contracts involved are not deposit work orders of any specific user and do not reflect the personal information of any specific user.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.

图14示出了可以用来实施本公开的实施例的示例电子设备1400的示意性框图。电子设备旨在表示各种形式的数字计算机，诸如，膝上型计算机、台式计算机、工作台、个人数字助理、服务器、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置，诸如，个人数字助理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例，并且不意在限制本文中描述的和/或者要求的本公开的实现。14 shows a schematic block diagram of an example electronic device 1400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.

如图14所示，设备1400包括计算单元1401，其可以根据存储在只读存储器(ROM)1402中的计算机程序或者从存储单元1408加载到随机访问存储器(RAM)1403中的计算机程序，来执行各种适当的动作和处理。在RAM 1403中，还可存储设备1400操作所需的各种程序和数据。计算单元1401、ROM 1402以及RAM 1403通过总线1404彼此相连。输入/输出(I/O)接口1405也连接至总线1404。As shown in FIG. 14 , the device 1400 includes a computing unit 1401 that can be executed according to a computer program stored in a read only memory (ROM) 1402 or a computer program loaded from a storage unit 1408 into a random access memory (RAM) 1403 Various appropriate actions and handling. In the RAM 1403, various programs and data necessary for the operation of the device 1400 can also be stored. The computing unit 1401 , the ROM 1402 , and the RAM 1403 are connected to each other through a bus 1404 . An input/output (I/O) interface 1405 is also connected to bus 1404 .

设备1400中的多个部件连接至I/O接口1405，包括：输入单元1406，例如键盘、鼠标等；输出单元1407，例如各种类型的显示器、扬声器等；存储单元1408，例如磁盘、光盘等；以及通信单元1409，例如网卡、调制解调器、无线通信收发机等。通信单元1409允许设备1400通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据。Various components in the device 1400 are connected to the I/O interface 1405, including: an input unit 1406, such as a keyboard, mouse, etc.; an output unit 1407, such as various types of displays, speakers, etc.; a storage unit 1408, such as a magnetic disk, an optical disk, etc. ; and a communication unit 1409, such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1409 allows the device 1400 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

计算单元1401可以是各种具有处理和计算能力的通用和/或专用处理组件。计算单元1401的一些示例包括但不限于中央处理单元(CPU)、图形处理单元(GPU)、各种专用的人工智能(AI)计算芯片、各种运行机器学习模型算法的计算单元、数字信号处理器(DSP)、以及任何适当的处理器、控制器、微控制器等。计算单元1401执行上文所描述的各个方法和处理，例如业务文档的审核方法。例如，在一些实施例中，业务文档的审核方法可被实现为计算机软件程序，其被有形地包含于机器可读介质，例如存储单元1408。在一些实施例中，计算机程序的部分或者全部可以经由ROM 1402和/或通信单元1409而被载入和/或安装到设备1400上。当计算机程序加载到RAM 1403并由计算单元1401执行时，可以执行上文描述的业务文档的审核方法的一个或多个步骤。备选地，在其他实施例中，计算单元1401可以通过其他任何适当的方式(例如，借助于固件)而被配置为执行业务文档的审核方法。Computing unit 1401 may be various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of computing units 1401 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various specialized artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processing processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1401 executes the various methods and processes described above, such as the review method of business documents. For example, in some embodiments, the review method of a business document may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 1408 . In some embodiments, part or all of the computer program may be loaded and/or installed on device 1400 via ROM 1402 and/or communication unit 1409 . When the computer program is loaded into RAM 1403 and executed by computing unit 1401, one or more steps of the above-described review method for business documents may be performed. Alternatively, in other embodiments, the computing unit 1401 may be configured to perform the review method of the business document by any other suitable means (eg, by means of firmware).

本文中以上描述的系统和技术的各种实施方式可以在数字电子电路系统、集成电路系统、现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、芯片上系统的系统(SOC)、复杂可编程逻辑设备(CPLD)、计算机硬件、固件、软件、和/或它们的组合中实现。这些各种实施方式可以包括：实施在一个或者多个计算机程序中，该一个或者多个计算机程序可在包括至少一个可编程处理器的可编程系统上执行和/或解释，该可编程处理器可以是专用或者通用可编程处理器，可以从存储系统、至少一个输入装置、和至少一个输出装置接收数据和指令，并且将数据和指令传输至该存储系统、该至少一个输入装置、和该至少一个输出装置。Various implementations of the systems and techniques described herein above can be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chips system (SOC), complex programmable logic device (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor that The processor, which may be a special purpose or general-purpose programmable processor, may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device an output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and VPS services. The server may also be a server of a distributed system, or a server combined with a blockchain.

Artificial intelligence is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and other major directions.

Cloud computing refers to a technical system in which an elastically scalable pool of shared physical or virtual resources is accessed through a network, where the resources may include servers, operating systems, networks, software, applications, storage devices, and the like, and can be deployed and managed on demand in a self-service manner. Cloud computing technology can provide efficient and powerful data processing capabilities for technical applications and model training in fields such as artificial intelligence and blockchain.

It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.

The specific embodiments described above do not limit the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the protection scope of the present disclosure.

Claims

1. A method for auditing business documents, comprising:

acquiring business document images of at least two business documents, wherein the at least two business documents are all associated with target business content and belong to different document types;

extracting key information of the target business content from the business document image of each document type according to the information extraction mode corresponding to that document type;

and auditing the at least two business documents according to the extracted key information.
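The three steps of claim 1 can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the documents are represented as already-recognized strings rather than images, the two extractors (`extract_text_type`, `extract_table_type`) are hypothetical placeholders for the type-specific extraction modes, and the audit is reduced to an exact comparison of fields that both documents share.

```python
from typing import Callable, Dict, List, Tuple


def _parse_pairs(content: str, sep: str) -> Dict[str, str]:
    """Parse lines of the form '<field><sep><value>' into a dict."""
    pairs: Dict[str, str] = {}
    for line in content.splitlines():
        if sep in line:
            field, value = line.split(sep, 1)
            pairs[field.strip()] = value.strip()
    return pairs


# Hypothetical placeholders for the per-type information extraction modes.
def extract_text_type(content: str) -> Dict[str, str]:
    return _parse_pairs(content, ":")


def extract_table_type(content: str) -> Dict[str, str]:
    return _parse_pairs(content, "|")


EXTRACTORS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "text": extract_text_type,
    "table": extract_table_type,
}


def audit_documents(docs: List[Tuple[str, str]]) -> Dict[str, bool]:
    """Extract key information per document type, then compare shared fields."""
    if len(docs) < 2:
        raise ValueError("at least two business documents are required")
    extracted = [EXTRACTORS[doc_type](content) for doc_type, content in docs]
    shared = set(extracted[0]).intersection(*extracted[1:])
    # A field passes when every document reports the same value for it.
    return {f: len({e[f] for e in extracted}) == 1 for f in sorted(shared)}
```

Dispatching through a dictionary keyed by document type keeps each extraction mode independent, so a further document type could be supported by registering one more extractor.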

2. The method according to claim 1, wherein the extracting key information of the target business content from the business document image of each document type according to the information extraction mode corresponding to that document type comprises:

in a case where the document type is a text type, constructing an input sequence according to a preset spacer, a document extraction field, and text information in the business document image of the text type;

determining, through a text image perception model and according to the input sequence, an extraction starting position corresponding to the document extraction field and a confidence of the extraction starting position;

determining, according to the extraction starting position and the confidence of the extraction starting position, extraction information corresponding to the document extraction field and a confidence of the extraction information;

and determining the key information of the target business content in the business document image of the text type according to the extraction information corresponding to the document extraction field and the confidence of the extraction information.
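The text-type extraction of claim 2 can be sketched as follows. This is a hedged illustration, not the claimed model: the spacer token `"[SEP]"`, the character-level sequence, and the per-position start and end probabilities (which stand in for the output of the text image perception model) are all assumptions. In particular, the claim names only an extraction starting position; the end probabilities and the product used as the confidence of the extracted information are hypothetical choices made so the example is complete.

```python
from typing import List, Sequence, Tuple

SPACER = "[SEP]"  # assumed spacer token; the claim only says "preset spacer"


def build_input_sequence(field: str, text: str, spacer: str = SPACER) -> List[str]:
    """Concatenate the field characters, the spacer, and the text characters."""
    return list(field) + [spacer] + list(text)


def pick_span(start_probs: Sequence[float],
              end_probs: Sequence[float]) -> Tuple[int, int, float]:
    """Pick the most likely start, then the most likely end at or after it.

    Returns (start, end, confidence) with confidence = p_start * p_end.
    """
    start = max(range(len(start_probs)), key=lambda i: start_probs[i])
    end = max(range(start, len(end_probs)), key=lambda i: end_probs[i])
    return start, end, start_probs[start] * end_probs[end]


def extract(field: str, text: str,
            start_probs: Sequence[float],
            end_probs: Sequence[float]) -> Tuple[str, float]:
    """Return the extracted value and its confidence for one extraction field."""
    seq = build_input_sequence(field, text)
    offset = len(field) + 1  # the text begins right after the field and spacer
    start, end, conf = pick_span(start_probs[offset:], end_probs[offset:])
    return "".join(seq[offset + start: offset + end + 1]), conf
```

Restricting the span search to positions after the spacer ensures that the extracted value always comes from the document text and never from the field name itself.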

3. The method of claim 2, wherein the constructing an input sequence according to a preset spacer, a document extraction field and text information in the text type service document image comprises:

preprocessing the text information in the service document image of the text type to obtain a preprocessed text; wherein the pre-processing comprises: at least one of a cleaning process, a symbol replacement process, and a format conversion process;

and constructing an input sequence according to the preprocessed text, the preset spacers and the document extraction field.
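The three preprocessing options named in claim 3 can be sketched as below. The concrete operations are assumptions chosen for illustration: cleaning drops non-printable characters and collapses whitespace, symbol replacement uses a small hypothetical mapping (`SYMBOL_MAP`), and format conversion is approximated with Unicode NFKC normalization, which folds full-width letters, digits, and punctuation into their ASCII forms.

```python
import re
import unicodedata

# Hypothetical symbol mapping for punctuation that NFKC leaves unchanged.
SYMBOL_MAP = str.maketrans({"。": ".", "、": ",", "【": "[", "】": "]"})


def clean(text: str) -> str:
    """Drop non-printable characters and collapse runs of whitespace."""
    kept = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return re.sub(r"\s+", " ", kept).strip()


def replace_symbols(text: str) -> str:
    """Replace selected symbols according to SYMBOL_MAP."""
    return text.translate(SYMBOL_MAP)


def convert_format(text: str) -> str:
    """Fold full-width forms into their standard equivalents (NFKC)."""
    return unicodedata.normalize("NFKC", text)


def preprocess(text: str) -> str:
    """Apply cleaning, symbol replacement, and format conversion in order."""
    return convert_format(replace_symbols(clean(text)))
```

For example, `preprocess("Invoice\x00  No：ＡＢ１２３。")` yields `"Invoice No:AB123."`.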

4. The method according to claim 1, wherein the extracting key information of the target service content from the service document image corresponding to the document type according to the information extraction manner corresponding to the different document types includes:

in a case where the document type is a table type, partitioning the business document image of the table type to obtain an image block sequence, and determining image features of the image blocks in the image block sequence, block positions of the image blocks in the business document image, and block sequence numbers of the image blocks in the image block sequence;

performing word quantization processing on text information in the business document image to obtain a word sequence, and determining word features of the quantized words in the word sequence, word positions of the quantized words in the business document image, and word sequence numbers of the quantized words in the word sequence;

determining model input features according to the image features, the block positions, and the block sequence numbers of the image blocks in the image block sequence, and the word features, the word positions, and the word sequence numbers of the quantized words in the word sequence;

and extracting, through a table image perception model, the key information of the target business content from the business document image of the table type according to the model input features.
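How the model input of claim 4 might be assembled can be sketched as follows. All specifics are assumptions: the image is a plain grid of pixel intensities, the image feature of a block is reduced to its mean intensity, the word feature is the word string itself, and `InputItem` is a hypothetical container. A real table image perception model would use learned embeddings in place of these stand-ins.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class InputItem:
    kind: str                    # "block" or "word"
    feature: Union[float, str]   # stand-in for a learned feature vector
    position: Box                # location in the business document image
    seq_no: int                  # index within its own sequence


def split_into_blocks(image: Sequence[Sequence[int]], block: int) -> List[InputItem]:
    """Partition the image row by row into block x block tiles."""
    height, width = len(image), len(image[0])
    items: List[InputItem] = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            pixels = [p for row in image[top:top + block]
                      for p in row[left:left + block]]
            box = (left, top, min(left + block, width), min(top + block, height))
            items.append(InputItem("block", sum(pixels) / len(pixels), box, len(items)))
    return items


def index_words(words: Sequence[Tuple[str, Box]]) -> List[InputItem]:
    """Attach a sequence number to each recognized word and its position."""
    return [InputItem("word", w, box, i) for i, (w, box) in enumerate(words)]


def build_model_input(image: Sequence[Sequence[int]], block: int,
                      words: Sequence[Tuple[str, Box]]) -> List[InputItem]:
    """Combine the image block sequence and the word sequence into one input."""
    return split_into_blocks(image, block) + index_words(words)
```

Keeping the position and sequence number beside each feature lets image blocks and words be related both by layout and by reading order.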

5. The method according to claim 1, wherein the extracting key information of the target service content from the service document image corresponding to the document type according to the information extraction manner corresponding to the different document types includes:

under the condition that the document type is a table type, extracting first candidate information of the target business content from a business document image of the table type through a table image perception model;

extracting second candidate information of the target business content from the business document image of the table type according to structured information extraction logic;

and determining the key information of the target business content in the business document image of the table type according to the first candidate information and the second candidate information.
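One way to combine the two candidate sources of claim 5 is sketched below. The claim does not state how the first and second candidates are reconciled, so the policy here is purely an assumption: when only one source has a value it is used, when both agree the higher confidence is kept, and when they disagree the candidate with the higher confidence wins. The `Candidate` type and the confidence values are hypothetical.

```python
from typing import Dict, Tuple

Candidate = Tuple[str, float]  # (value, confidence)


def merge_candidates(model_out: Dict[str, Candidate],
                     rule_out: Dict[str, Candidate]) -> Dict[str, Candidate]:
    """Merge model-based and rule-based candidates field by field."""
    merged: Dict[str, Candidate] = {}
    for field in model_out.keys() | rule_out.keys():
        first = model_out.get(field)
        second = rule_out.get(field)
        if first is None or second is None:
            # Only one source produced a value for this field.
            merged[field] = first if first is not None else second
        elif first[0] == second[0]:
            # Both sources agree: keep the value with the higher confidence.
            merged[field] = (first[0], max(first[1], second[1]))
        else:
            # Sources disagree: assumed policy is to trust the more confident one.
            merged[field] = first if first[1] >= second[1] else second
    return merged
```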

6. The method of claim 1, wherein the auditing the at least two business documents according to the extracted key information comprises:

dividing the extracted key information into at least one information pair according to the business document image and the document extraction field corresponding to the extracted key information, wherein the semantics of the document extraction fields corresponding to the key information in the same information pair are the same and are taken from business document images of different document types;

determining a target audit rule for the information pair according to the document extraction field corresponding to the information pair;

and performing a consistency audit on the key information in the information pair according to the target audit rule corresponding to the information pair, and determining the audit results of the at least two business documents according to the result of the consistency audit.

7. The method of claim 6, wherein the determining a target audit rule for the information pair according to the document extraction field corresponding to the information pair comprises at least one of:

under the condition that the information expression mode of the document extraction field corresponding to the information pair is unique, the target auditing rule of the information pair is character consistency auditing;

under the condition that the information expression mode of the document extraction field corresponding to the information pair is not unique, the target auditing rule of the information pair is semantic similarity auditing;

under the condition that the document extraction field corresponding to the information pair represents numerical information, the target auditing rule of the information pair is numerical character consistency auditing;

and under the condition that the document extraction field corresponding to the information pair represents date information, the target auditing rule of the information pair is date character consistency auditing.
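The four rule types of claim 7 can be illustrated with a small dispatcher. This is a sketch under assumptions: the field kinds (`"unique"`, `"free_text"`, `"number"`, `"date"`), the accepted date formats, and the 0.8 threshold are hypothetical, and `difflib.SequenceMatcher` measures character-level similarity only, so it is a simple stand-in for a real semantic similarity model rather than an implementation of one.

```python
from datetime import date, datetime
from decimal import Decimal
from difflib import SequenceMatcher

DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d")  # assumed accepted formats


def _parse_date(value: str) -> date:
    """Parse a date written in any of the assumed formats."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def audit_pair(kind: str, left: str, right: str, threshold: float = 0.8) -> bool:
    """Apply the audit rule selected by the kind of document extraction field."""
    if kind == "unique":
        # Character consistency: the two strings must match exactly.
        return left == right
    if kind == "free_text":
        # Stand-in for semantic similarity: character-level ratio in [0, 1].
        return SequenceMatcher(None, left, right).ratio() >= threshold
    if kind == "number":
        # Numerical consistency: compare values after removing separators.
        return Decimal(left.replace(",", "")) == Decimal(right.replace(",", ""))
    if kind == "date":
        # Date consistency: compare calendar dates, not their spelling.
        return _parse_date(left) == _parse_date(right)
    raise ValueError(f"unknown field kind: {kind!r}")
```

Normalizing numbers and dates before comparing them means that `1,000.00` and `1000`, or `2022-06-24` and `2022/06/24`, are treated as consistent even though their characters differ.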

8. The method of claim 1, wherein the auditing the at least two business documents according to the extracted key information comprises:

performing seal identification on the at least two business document images to obtain a seal identification result;

and auditing the at least two business documents according to the seal identification result and the extracted key information.

9. The method according to any one of claims 1-8, further comprising:

displaying the extracted key information, the business document images, and the audit results of the at least two business documents through a visual interface.

10. The method of claim 9, wherein presenting the business document image of at least two business documents through a visualization interface comprises:

determining the display sequence of the business document images of each business document;

and displaying the business document image of each business document in a visual interface according to the display sequence.

11. The method of claim 9, wherein presenting the business document image of at least two business documents through a visualization interface comprises:

marking the extraction positions of the key information in the business document images of the at least two business documents according to the business document images and the starting positions corresponding to the extracted key information, and displaying the marked business document images in a visual interface.

12. The method of claim 11, wherein labeling the extraction positions of the key information in the business document images of the at least two business documents according to the business document images and the starting positions corresponding to the extracted key information comprises:

determining, according to the starting position corresponding to the extracted key information, whether the key information is extracted across rows;

and marking the extraction position of the key information in the business document images of the at least two business documents according to the determination result and the business document images corresponding to the key information.
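The cross-row determination of claim 12 can be sketched as an interval overlap test. The representation is an assumption: each recognized row is described by the half-open character range it occupies in the flattened document text, and the key information is described by its starting offset and length. The claim itself does not specify how rows or positions are encoded.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, int]  # (start offset, end offset exclusive) of one text row


def rows_covered(rows: Sequence[Row], start: int, length: int) -> List[int]:
    """Return the indices of the rows that overlap the extracted span."""
    end = start + length
    return [i for i, (row_start, row_end) in enumerate(rows)
            if row_start < end and start < row_end]


def is_cross_row(rows: Sequence[Row], start: int, length: int) -> bool:
    """True when the extracted key information spans more than one row."""
    return len(rows_covered(rows, start, length)) > 1
```

When a span is cross-row, one marking box per covered row can be drawn, whereas a single-row span needs only one box.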

13. The method of claim 9, wherein presenting the extracted key information through a visualization interface comprises:

displaying the extracted key information and the confidence coefficient of the key information through a visual interface.

14. The method according to any one of claims 9-13, further comprising:

receiving modification information generated by acting on the visual interface;

and updating the auditing results of the at least two business documents on the visual interface according to the modification information.

15. An auditing device for business documents, comprising:

a document image acquisition module, configured to acquire business document images of at least two business documents, wherein the at least two business documents are all associated with target business content and belong to different document types;

a key information extraction module, configured to extract key information of the target business content from the business document image of each document type according to the information extraction mode corresponding to that document type;

and a business document auditing module, configured to audit the at least two business documents according to the extracted key information.

16. The apparatus of claim 15, wherein the key information extraction module comprises:

an input sequence construction unit, configured to construct an input sequence according to a preset spacer, a document extraction field, and text information in the business document image of the text type in a case where the document type is a text type;

a model perception unit, configured to determine, through a text image perception model and according to the input sequence, an extraction starting position corresponding to the document extraction field and a confidence of the extraction starting position;

an extraction information determining unit, configured to determine, according to the extraction starting position and the confidence of the extraction starting position, extraction information corresponding to the document extraction field and a confidence of the extraction information;

and a key information determining unit, configured to determine the key information of the target business content in the business document image of the text type according to the extraction information corresponding to the document extraction field and the confidence of the extraction information.

17. The apparatus according to claim 16, wherein the input sequence construction unit is specifically configured to:

18. The apparatus according to claim 15, wherein the key information extraction module is specifically configured to:

19. The apparatus of claim 15, wherein the key information extraction module is further specifically configured to:

extracting second candidate information of the target business content from the business document image of the table type according to structured information extraction logic;

and determining key information of the target business content in the business document image of the table type according to the first candidate information and the second candidate information.

20. The apparatus of claim 15, wherein the business document auditing module comprises:

the information pair dividing unit is used for dividing the extracted key information into at least one information pair according to the business document image and the document extraction field corresponding to the extracted key information, wherein the semantics of the document extraction fields corresponding to the key information in the same information pair are the same and are taken from the business document images of different document types;

an audit rule determining unit, configured to determine a target audit rule for the information pair according to the document extraction field corresponding to the information pair;

and an information auditing unit, configured to perform a consistency audit on the key information in the information pair according to the target audit rule corresponding to the information pair, and to determine the audit results of the at least two business documents according to the result of the consistency audit.

21. The apparatus according to claim 20, wherein the audit rule determining unit specifically performs at least one of:

under the condition that the information expression mode of the corresponding document extraction field of the information pair is unique, the target auditing rule of the information pair is character consistency auditing;

under the condition that the document extraction field corresponding to the information pair represents numerical information, the target auditing rule of the information pair is numerical character consistency auditing;

22. The apparatus according to claim 15, wherein the service document auditing module is specifically configured to:

23. The apparatus of any of claims 15-22, further comprising:

and the display module is used for displaying the extracted key information, the business document images of the at least two business documents and the auditing result through a visual interface.

24. The apparatus of claim 23, wherein the presentation module comprises:

the sequencing unit is used for determining the display sequence of the business document images of each business document;

and the image display unit is used for displaying the business document image of each business document in a visual interface according to the display sequence.

25. The apparatus of claim 23, wherein the presentation module comprises:

the position marking unit is used for marking the extraction positions of the key information in the business document images of the at least two business documents according to the business document images and the initial positions corresponding to the extracted key information;

and the image display unit is used for displaying the marked business document image in a visual interface.

26. The apparatus according to claim 25, wherein the location labeling unit is specifically configured to:

27. The apparatus of claim 23, wherein the presentation module comprises:

and the key information display unit is used for displaying the extracted key information and the confidence coefficient of the key information through a visual interface.

28. The apparatus of any of claims 23-27, further comprising:

the information receiving module is used for receiving modification information generated by acting on the visual interface;

and an interface updating module, configured to update the audit results of the at least two business documents on the visual interface according to the modification information.

29. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of auditing a business document according to any of claims 1-14.

30. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of auditing a business document according to any of claims 1-14.

31. A computer program product comprising a computer program which, when executed by a processor, implements a method of auditing business documents according to any of claims 1-14.