
CN113836878B - Table generation method, device, electronic device and storage medium combining RPA and AI - Google Patents


Info

Publication number
CN113836878B
Authority
CN
China
Prior art keywords
target
rpa system
cell
cells
intersection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111026974.3A
Other languages
Chinese (zh)
Other versions
CN113836878A (en)
Inventor
黄安
汪冠春
胡一川
褚瑞
李玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Laiye Network Technology Co Ltd
Laiye Technology Beijing Co Ltd
Original Assignee
Beijing Laiye Network Technology Co Ltd
Laiye Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Laiye Network Technology Co Ltd and Laiye Technology Beijing Co Ltd
Priority to CN202111026974.3A
Publication of CN113836878A
Application granted
Publication of CN113836878B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present disclosure provides a table generation method, device, electronic device and storage medium combining RPA and AI, and relates to the field of artificial intelligence. The scheme is executed by an RPA system: the RPA system extracts the horizontal and vertical lines of a first table from an image using artificial intelligence (AI); the RPA system obtains a set of intersection points of the horizontal and vertical lines, where the set includes first-type intersection points, formed where a horizontal line and a vertical line actually cross, and second-type intersection points, formed where the extension of a horizontal line and/or the extension of a vertical line cross; the RPA system generates, from the intersection point set, a blank second table consistent with the first table; and the RPA system fills text entries recognized from the image by OCR into the blank second table to obtain a target table. The present disclosure uses RPA technology to recognize a table in an image and restore it as a table document with the same table structure, automatically converting offline data into online data, replacing a cumbersome manual process and improving the efficiency of table generation.

Description

Table generation method, device, electronic device and storage medium combining RPA and AI

Technical Field

The present disclosure relates to the field of artificial intelligence, and in particular to a table generation method, device, electronic device and storage medium combining RPA and AI.

Background Art

Robotic Process Automation (RPA) uses specific "robot software" to simulate human operations on a computer and automatically execute process tasks according to rules.

Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.

In the related art, staff convert a table in a picture into an online table document by typing, copying and pasting. This work is repetitive and mechanical, wastes manpower, and is inefficient. How to improve the efficiency of table generation and free up manpower is therefore an urgent problem.

Summary of the Invention

The present disclosure provides a table generation method, device, electronic device and storage medium combining RPA and AI.

According to one aspect of the present disclosure, a table generation method combining RPA and AI is provided, including:

the RPA system extracts the horizontal lines and vertical lines of a first table from an image using artificial intelligence (AI);

the RPA system obtains a set of intersection points of the horizontal lines and the vertical lines, where the set includes first-type intersection points formed where a horizontal line and a vertical line cross, and second-type intersection points formed where the extension of a horizontal line and/or the extension of a vertical line cross;

the RPA system generates, according to the intersection point set, a blank second table consistent with the first table;

the RPA system fills text entries recognized from the image by OCR into the blank second table to obtain a target table.

The embodiments of the present disclosure use RPA technology to recognize a table in an image and restore it as a table document with the same table structure, automatically converting offline data into online data, replacing cumbersome manual processing and improving the efficiency of table generation.

According to another aspect of the present disclosure, a table generation device combining RPA and AI is provided, including:

an extraction module, configured to extract the horizontal lines and vertical lines of a first table from an image using artificial intelligence (AI);

an intersection acquisition module, configured to acquire a set of intersection points of the horizontal lines and vertical lines, where the set includes first-type intersection points formed where a horizontal line and a vertical line cross, and second-type intersection points formed where the extension of a horizontal line and/or the extension of a vertical line cross;

a generation module, configured to generate, according to the intersection point set, a blank second table consistent with the first table;

a filling module, configured to fill text entries recognized from the image by OCR into the blank second table to obtain a target table.

According to another aspect of the present disclosure, an electronic device is provided, including a memory and a processor;

the processor reads executable program code stored in the memory and runs a program corresponding to that code, so as to implement the table generation method combining RPA and AI according to the embodiments of the first aspect of the present disclosure.

According to another aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the table generation method combining RPA and AI according to the first aspect of the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the table generation method combining RPA and AI according to the first aspect of the present disclosure.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Brief Description of the Drawings

FIG. 1 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a table to be recognized and the detected horizontal and vertical lines;

FIG. 3 is a schematic diagram of the set of intersection points formed by the detected horizontal and vertical lines;

FIG. 4 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of enumerating candidate cells;

FIG. 6 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure;

FIG. 7 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure;

FIG. 8 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure;

FIG. 9 is a schematic diagram of intersection point pairs formed from intersection points;

FIG. 10 is a schematic diagram of basic cells formed from intersection point pairs;

FIG. 11 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure;

FIG. 12 is a schematic diagram of updating the values of matrix elements after a target cell is determined;

FIG. 13 is a schematic diagram of splitting a target Boolean matrix into two sub-target Boolean matrices after a target row is detected;

FIG. 14 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure;

FIG. 15 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure;

FIG. 16 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure;

FIG. 17 is a structural diagram of a table generation device combining RPA and AI according to an embodiment of the present disclosure;

FIG. 18 is a block diagram of an electronic device used to implement the table generation method combining RPA and AI according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and should not be construed as limiting it.

The table generation method, device, electronic device and storage medium combining RPA and AI of the present disclosure are described below with reference to the drawings.

FIG. 1 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure. As shown in FIG. 1, the method includes the following steps:

S101: the RPA system extracts the horizontal lines and vertical lines of the first table from the image using artificial intelligence (AI).

RPA is a technology that simulates human operations on a PC and has begun to be applied in enterprise production and office work. The core of RPA is to use automation to "replace people" in fixed, process-based operations that are repetitive, low-value and require no human decision-making, thereby effectively improving work efficiency and reducing errors.

The RPA system obtains an image containing a table; the first table is the table to be recognized in the image. A table model detects the horizontal and vertical table lines in the image separately, and these lines may be tilted or curved to some extent. That is, the horizontal and vertical lines of the first table are extracted using AI; optionally, they may instead be extracted with traditional computer vision (CV) algorithms. Line segments whose endpoints are close to each other are then joined to compensate for recognition errors, which yields the final horizontal and vertical lines. In FIG. 2, for example, the panels from left to right show the original image, the detected horizontal lines, and the detected vertical lines.
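The step of joining segments with nearby endpoints can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: it assumes each detected line is a straight segment `(x1, y1, x2, y2)` with `x1 <= x2` for horizontal lines (and `y1 <= y2` for vertical ones), and the helper names and the tolerances `gap_tol`, `y_tol` and `x_tol` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical segment format: (x1, y1, x2, y2), with x1 <= x2 for horizontal lines.
Segment = Tuple[float, float, float, float]


def merge_horizontal(segments: List[Segment], gap_tol: float = 5.0,
                     y_tol: float = 3.0) -> List[Segment]:
    """Join horizontal segments whose facing endpoints are close (illustrative only)."""
    merged: List[Segment] = []
    # Sorting by left endpoint guarantees every merged segment starts at or before
    # the segment currently being examined.
    for x1, y1, x2, y2 in sorted(segments, key=lambda s: (s[0], s[1])):
        for i, (mx1, my1, mx2, my2) in enumerate(merged):
            # Joinable: starts at most gap_tol past the right end, at a similar height.
            if x1 <= mx2 + gap_tol and abs(y1 - my2) <= y_tol:
                if x2 > mx2:  # extend the existing segment to the right
                    merged[i] = (mx1, my1, x2, y2)
                break
        else:
            merged.append((x1, y1, x2, y2))
    return merged


def merge_vertical(segments: List[Segment], gap_tol: float = 5.0,
                   x_tol: float = 3.0) -> List[Segment]:
    """Same idea for vertical segments (y1 <= y2): swap the axes, merge, swap back."""
    swapped = [(y1, x1, y2, x2) for x1, y1, x2, y2 in segments]
    return [(x1, y1, x2, y2)
            for y1, x1, y2, x2 in merge_horizontal(swapped, gap_tol, x_tol)]
```

For example, `merge_horizontal([(0, 10, 50, 10), (53, 11, 100, 11), (0, 40, 100, 40)])` joins the first two pieces across the 3-pixel gap and returns `[(0, 10, 100, 11), (0, 40, 100, 40)]`. A greedy left-to-right pass keeps the sketch short; noisier detections might call for clustering by angle and offset first.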

S102: the RPA system obtains a set of intersection points of the horizontal lines and the vertical lines, where the set includes first-type intersection points formed where a horizontal line and a vertical line cross, and second-type intersection points formed where the extension of a horizontal line and/or the extension of a vertical line cross.

To obtain the intersection point set, suppose the previous step produced x horizontal lines and y vertical lines. The RPA system computes the intersection coordinates of every horizontal-vertical pair, whether or not the two segments actually meet, giving x*y points in total, as shown in FIG. 3.

The first-type intersection points in the set are formed where a horizontal line and a vertical line actually cross; both lines pass through such a point.

The set also includes second-type intersection points, formed where the extension of a horizontal line and/or the extension of a vertical line cross. Such a point is not passed through by both lines: at most one horizontal line or one vertical line passes through it.
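The x*y intersection grid and the split into first-type and second-type points can be illustrated with a short sketch. It is an assumption-laden example, not the patented procedure: lines are treated as straight segments `(x1, y1, x2, y2)`, intersections are computed between the infinite lines through them (which is what "whether or not the segments meet" implies), and a point counts as first-type only if it lies on both segments within a hypothetical tolerance `tol`.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]


def line_intersection(h: Segment, v: Segment) -> Point:
    """Intersection of the infinite lines through segment h and segment v."""
    x1, y1, x2, y2 = h
    x3, y3, x4, y4 = v
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    # A (near-)horizontal and a (near-)vertical line are never parallel, so denom != 0.
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / denom,
            (a * (y3 - y4) - (y1 - y2) * b) / denom)


def on_segment(p: Point, seg: Segment, tol: float) -> bool:
    """p already lies on seg's line, so a tolerant bounding-box test suffices."""
    x1, y1, x2, y2 = seg
    return (min(x1, x2) - tol <= p[0] <= max(x1, x2) + tol
            and min(y1, y2) - tol <= p[1] <= max(y1, y2) + tol)


def intersection_grid(horizontals: List[Segment], verticals: List[Segment],
                      tol: float = 2.0) -> List[List[Tuple[Point, int]]]:
    """x*y grid of (point, kind): kind 1 if both segments pass through it, else 2."""
    grid = []
    for h in horizontals:        # assumed sorted top to bottom
        row = []
        for v in verticals:      # assumed sorted left to right
            p = line_intersection(h, v)
            kind = 1 if on_segment(p, h, tol) and on_segment(p, v, tol) else 2
            row.append((p, kind))
        grid.append(row)
    return grid
```

With a bottom horizontal line that stops at x = 60 while the right vertical line sits at x = 100, the bottom-right grid point is still produced at (100.0, 50.0) but is classified as second-type, since only the vertical line passes through it.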

S103: the RPA system generates, according to the intersection point set, a blank second table consistent with the first table.

As FIG. 3 shows, these intersection points can form many tables. The RPA system recognizes the table structure of the first table from the set of first-type and second-type intersection points, screens and deduplicates the candidate tables, and selects from them the blank table whose structure is consistent with the first table, namely the second table.

S104: the RPA system fills the text entries recognized from the image by OCR into the blank second table to obtain the target table.

Optical Character Recognition (OCR) is a technology that converts the text of bills, newspapers, books, manuscripts and other printed matter into image information through optical input such as scanning, and then uses text recognition to convert that image information into usable computer input.

The RPA system recognizes the text entries of the first table in the image using OCR and fills each of them into the corresponding position of the second table to obtain the target table. The target table has the same format and content as the first table; the difference is that the first table is table data on paper, whereas the target table is electronic table data.
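The text does not say how an OCR entry is matched to its "corresponding position". One plausible rule, shown here purely as an illustration, is to place each entry in the cell that contains the centre of its text box. The sketch assumes an OCR engine that returns `(text, (x0, y0, x1, y1))` pairs and axis-aligned cell boxes; `fill_cells` and both data formats are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1): top-left, bottom-right


def fill_cells(cell_boxes: List[Box], ocr_items: List[Tuple[str, Box]]) -> List[str]:
    """Put each OCR text entry into the cell containing its centre (illustrative)."""
    contents: Dict[int, List[Tuple[float, float, str]]] = {
        i: [] for i in range(len(cell_boxes))}
    for text, (x0, y0, x1, y1) in ocr_items:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        for i, (bx0, by0, bx1, by1) in enumerate(cell_boxes):
            if bx0 <= cx <= bx1 and by0 <= cy <= by1:
                contents[i].append((y0, x0, text))
                break  # entries whose centre falls in no cell are dropped
    # Within a cell, read entries top to bottom, then left to right.
    return [" ".join(t for _, _, t in sorted(contents[i]))
            for i in range(len(cell_boxes))]
```

For two side-by-side cells and the entries "Name", "Age" and "Alice" (the last one below "Name"), the result is `["Name Alice", "Age"]`. Centre containment is simple and tolerates text boxes that slightly overrun a ruled line; an overlap-ratio rule would be a reasonable alternative.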

In this embodiment of the present disclosure, the RPA system extracts the horizontal and vertical lines of the first table from the image using AI; obtains a set of intersection points of these lines, which includes first-type intersection points formed where a horizontal line and a vertical line cross and second-type intersection points formed where the extension of a horizontal line and/or a vertical line cross; generates from the set a blank second table consistent with the first table; and fills the text entries recognized from the image by OCR into the blank second table to obtain the target table. RPA technology is thus used to recognize the table in the image and restore it as a table document with the same table structure, automatically converting offline data into online data, replacing a cumbersome manual process, and improving the efficiency of table generation.

FIG. 4 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure. On the basis of the above embodiment, and with reference to FIG. 4, the process by which the RPA system generates a blank second table consistent with the first table from the intersection point set is explained further. It includes the following steps:

S401: the RPA system enumerates cells from the intersection points in the intersection point set to obtain candidate cells and the attribute information of the candidate cells.

Assuming that the above steps yielded x horizontal lines and y vertical lines, there are x*y intersection points in total. All possible cells over these x*y intersection points are enumerated as candidate cells; each cell is defined by a start row, an end row, a start column, an end column, and the coordinates of its four corner intersection points.
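The enumeration can be sketched as choosing every start row before an end row and every start column before an end column, which gives C(x, 2) * C(y, 2) candidate cells for x horizontal and y vertical lines; with three lines a, b and c along one direction this yields exactly the pairs ab, bc and ac described for FIG. 5. The grid layout `grid[r][c]`, the dictionary keys and the helper names are assumptions made for this illustration, and the area uses the shoelace formula so that slightly skewed cells are handled.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(pts: Sequence[Point]) -> float:
    """Shoelace formula; also works for slightly skewed quadrilaterals."""
    s = 0.0
    for (xa, ya), (xb, yb) in zip(pts, list(pts[1:]) + [pts[0]]):
        s += xa * yb - xb * ya
    return abs(s) / 2


def enumerate_cells(grid: List[List[Point]]) -> List[Dict]:
    """All candidate cells over a grid of intersection points.

    grid[r][c] is the intersection of horizontal line r and vertical line c.
    """
    cells = []
    for r0, r1 in combinations(range(len(grid)), 2):         # start row < end row
        for c0, c1 in combinations(range(len(grid[0])), 2):  # start col < end col
            # Corners in order: top-left, top-right, bottom-right, bottom-left.
            corners = (grid[r0][c0], grid[r0][c1], grid[r1][c1], grid[r1][c0])
            cells.append({"rows": (r0, r1), "cols": (c0, c1),
                          "corners": corners, "area": polygon_area(corners)})
    return cells
```

On a 3 x 3 grid of points spaced 10 apart this produces 9 candidates, from four 1 x 1 cells of area 100 up to the whole 2 x 2 table of area 400.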

The attribute information of a candidate cell includes its area, the coordinates of its four corners, and the start and end points of the horizontal lines and of the vertical lines corresponding to the cell.

The enumeration process is illustrated with the three pairs of intersection points in FIG. 5, labeled a, b and c for convenience. As shown in FIG. 5, a and b can form a candidate cell, b and c can likewise form a candidate cell, and a and c can also form a candidate cell.

S402: the RPA system identifies, from the candidate cells and according to their attribute information, the target cells used to generate the blank second table.

Using the attribute information, the RPA system traverses all candidate cells and checks, one by one, whether all four edges of each candidate cell exist. If all four edges exist, the candidate cell is taken as a target cell.

S403: the RPA system arranges the target cells by position to generate the blank second table.

The RPA system obtains all target cells and their attribute information, arranges the cells according to the coordinates of their four corners, and generates the blank second table.
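As one concrete picture of "arranging the target cells by position", the sketch below lays them out as an empty HTML table with `rowspan`/`colspan`, so merged cells keep their shape. Both choices are assumptions for illustration: each target cell is identified by the grid-line indices `(start_row, end_row, start_col, end_col)` implied by its corner coordinates, and HTML is just one possible output format (a spreadsheet with merged ranges would work the same way); `blank_table_html` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int, int]  # (start_row, end_row, start_col, end_col) grid-line indices


def blank_table_html(cells: List[Cell]) -> str:
    """Lay out target cells as an empty HTML table using rowspan/colspan."""
    rows: Dict[int, List[str]] = {}
    # Each cell is emitted in the table row where it starts, ordered left to right.
    for r0, r1, c0, c1 in sorted(cells, key=lambda c: (c[0], c[2])):
        rows.setdefault(r0, []).append(
            f'<td rowspan="{r1 - r0}" colspan="{c1 - c0}"></td>')
    n_rows = max(r1 for _, r1, _, _ in cells)  # lines 0..n_rows bound n_rows table rows
    lines = ["<table>"]
    lines += ["<tr>" + "".join(rows.get(r, [])) + "</tr>" for r in range(n_rows)]
    lines.append("</table>")
    return "\n".join(lines)
```

For a header cell spanning two columns above two ordinary cells, `blank_table_html([(0, 1, 0, 2), (1, 2, 0, 1), (1, 2, 1, 2)])` yields two `<tr>` rows, the first containing a single `colspan="2"` cell.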

In this embodiment of the present disclosure, the RPA system enumerates cells from the intersection points in the intersection point set to obtain candidate cells and their attribute information, identifies from the candidate cells, according to that attribute information, the target cells used to generate the blank second table, and arranges the target cells by position to generate the blank second table. A blank second table with the same structure as the first table is thus generated from the intersection point set. This detection and generation of the table structure is the most important step in table generation and lays the foundation for subsequently filling in the text entries.

FIG. 6 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure. On the basis of the above embodiment, and with reference to FIG. 6, the process by which the RPA system identifies, according to the attribute information of the candidate cells, the target cells used to generate the blank second table is explained further. It includes the following steps:

S601: the RPA system sorts all enumerated candidate cells in ascending order of cell area.

The area of each candidate cell can be calculated from the coordinates of its four corners, and all enumerated candidate cells are sorted from smallest to largest area.

S602: the RPA system traverses the candidate cells in that order and determines whether each traversed target candidate cell exists.

The candidate cells are traversed in ascending order of area, and the existence of each traversed target candidate cell is determined.

The RPA system obtains the first start and end points (those of the horizontal lines corresponding to the target candidate cell) and the second start and end points (those of the corresponding vertical lines), and determines, from the four corner coordinates of the target candidate cell together with the first and second start and end points, whether all four edges of the cell exist. If all four edges exist, the target candidate cell is determined to exist.

S603: each time the RPA system determines that a target candidate cell exists, it deletes, from the candidate cells not yet traversed, the cells that overlap the target candidate cell, and takes the target candidate cell determined to exist as a target cell.

Once a target candidate cell is determined to exist, no candidate cell that overlaps it can exist. The RPA system therefore deletes the not-yet-traversed candidate cells that overlap the target candidate cell, which reduces the workload of the subsequent traversal.

The RPA system takes the target candidate cell determined to exist as a target cell used to generate the blank second table.

S604: the RPA system continues traversing the remaining, not-yet-traversed candidate cells in order until the traversal ends and all target cells have been obtained.

After each deletion, the RPA system resumes the ordered traversal of the remaining candidate cells. When it obtains a new target cell, it pauses the traversal and deletes the not-yet-traversed candidate cells that overlap the new target cell. This is repeated until the traversal ends and all target cells have been obtained.
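Steps S601 to S604 amount to a smallest-area-first greedy selection with pruning, sketched below. The sketch assumes cells are identified by grid-line indices and treats two cells as overlapping when they share interior area (both their row ranges and column ranges overlap); the existence check is passed in as a function `exists`, standing in for the four-edge test, and `area` likewise. These names are hypothetical.

```python
from typing import Callable, List, Tuple

Cell = Tuple[int, int, int, int]  # (start_row, end_row, start_col, end_col) grid-line indices


def overlaps(a: Cell, b: Cell) -> bool:
    """True if the two cells share interior area (both index ranges overlap)."""
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


def select_target_cells(candidates: List[Cell], area: Callable[[Cell], float],
                        exists: Callable[[Cell], bool]) -> List[Cell]:
    """Smallest-area-first traversal that prunes candidates overlapping a kept cell."""
    remaining = sorted(candidates, key=area)   # S601: ascending area
    targets: List[Cell] = []
    while remaining:
        cell = remaining.pop(0)                # S602: next candidate in order
        if exists(cell):                       # stand-in for the four-edge check
            targets.append(cell)               # S603: keep it as a target cell ...
            # ... and drop every untraversed candidate that overlaps it.
            remaining = [c for c in remaining if not overlaps(c, cell)]
    return targets                             # S604: traversal finished
```

On a 2 x 2 grid whose top-middle vertical line is missing (so the two top 1 x 1 cells fail the existence check), the traversal keeps the two bottom cells and then the merged top cell `(0, 1, 0, 2)`, while every larger candidate is pruned for overlapping a kept cell. Processing small cells first is what guarantees that a merged cell is only accepted when its smaller constituents were rejected.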

In this embodiment of the present disclosure, the RPA system sorts all enumerated candidate cells in ascending order of area, traverses them in that order, and determines whether each traversed target candidate cell exists. Each time a target candidate cell is determined to exist, the not-yet-traversed candidate cells that overlap it are deleted and the cell is taken as a target cell; the traversal then continues over the remaining candidate cells until all target cells have been obtained. The target cells used to generate the second table are thus obtained from the candidate cells, which gives a preliminary structure for the second table and lays the foundation for generating it.

FIG. 7 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure. On the basis of the above embodiment, and with reference to FIG. 7, the process by which the RPA system traverses the candidate cells in order and determines whether each traversed target candidate cell exists is explained further. It includes the following steps:

S701: the RPA system obtains the first start and end points of the horizontal lines and the second start and end points of the vertical lines corresponding to the target candidate cell.

From these start and end points, the RPA system knows the spatial position and length of the corresponding horizontal and vertical lines.

S702: the RPA system determines, according to the four corner coordinates of the target candidate cell and the first and second start and end points, whether all four edges of the target candidate cell exist.

The four corners of a candidate cell define its four edges. Take the edge formed by the upper-left and lower-left corners as an example:

Project this edge onto the Y axis to obtain its position and length on that axis, and likewise project the detected vertical line corresponding to this edge onto the Y axis. If the two projections overlap, the edge exists; if they do not, the edge does not exist. The edge formed by the upper-right and lower-right corners is checked in the same way.

Similarly, take the edge formed by the upper-left and upper-right corners as an example:

Project this edge onto the X axis to obtain its position and length on that axis, and likewise project the detected horizontal line corresponding to this edge onto the X axis. If the two projections overlap, the edge exists; otherwise it does not. The edge formed by the lower-left and lower-right corners is checked in the same way.
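The projection test can be written as an interval-overlap check on each axis. In this sketch "overlap" is read as a shared stretch of positive length, so projections that merely touch at an endpoint do not count; that strictness is an interpretation, and a production system might instead require a minimum overlap ratio or tolerance. Corners are assumed to be ordered top-left, top-right, bottom-right, bottom-left, and the detected line bounding each side is passed in as a segment `(x1, y1, x2, y2)`. The function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def projections_overlap(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if intervals [a0, a1] and [b0, b1] share a stretch of positive length."""
    return max(min(a0, a1), min(b0, b1)) < min(max(a0, a1), max(b0, b1))


def cell_exists(corners: Tuple[Point, Point, Point, Point],
                top: Segment, bottom: Segment, left: Segment, right: Segment) -> bool:
    """Four-edge test; corners are (top-left, top-right, bottom-right, bottom-left)."""
    tl, tr, br, bl = corners
    # Horizontal edges vs. their detected horizontal lines, projected onto the X axis.
    top_ok = projections_overlap(tl[0], tr[0], top[0], top[2])
    bottom_ok = projections_overlap(bl[0], br[0], bottom[0], bottom[2])
    # Vertical edges vs. their detected vertical lines, projected onto the Y axis.
    left_ok = projections_overlap(tl[1], bl[1], left[1], left[3])
    right_ok = projections_overlap(tr[1], br[1], right[1], right[3])
    return top_ok and bottom_ok and left_ok and right_ok
```

For a 50 x 40 cell at the origin, a right-hand vertical line running from y = 0 to y = 80 makes the right edge exist, whereas one that only starts at y = 40 touches the cell at a single point and the cell is rejected.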

S703: When the RPA system determines that all four edges exist, it determines that the target candidate cell exists.

If all four edges of the candidate cell exist, the target candidate cell is determined to exist and can be used as a target cell.
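The edge test in S702 and the rule in S703 reduce to one-dimensional interval overlap. The Python sketch below is illustrative only: it assumes axis-aligned lines given as (start, end) points in image coordinates, and all helper names (`overlap_1d`, `cell_exists`, and so on) are hypothetical. It treats projections that merely touch at a single point as non-overlapping, which is a design assumption; a pixel tolerance could be added for noisy line detection.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]      # (x, y) in image coordinates, y pointing down
Segment = Tuple[Point, Point]    # start and end point of a detected line


def overlap_1d(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if intervals [a0, a1] and [b0, b1] overlap with positive length."""
    return min(max(a0, a1), max(b0, b1)) > max(min(a0, a1), min(b0, b1))


def vertical_edge_exists(upper: Point, lower: Point, line: Optional[Segment]) -> bool:
    """Project the cell edge and its scanned vertical line onto the Y axis."""
    if line is None:
        return False
    (_, ly0), (_, ly1) = line
    return overlap_1d(upper[1], lower[1], ly0, ly1)


def horizontal_edge_exists(left: Point, right: Point, line: Optional[Segment]) -> bool:
    """Project the cell edge and its scanned horizontal line onto the X axis."""
    if line is None:
        return False
    (lx0, _), (lx1, _) = line
    return overlap_1d(left[0], right[0], lx0, lx1)


def cell_exists(tl: Point, tr: Point, bl: Point, br: Point,
                top: Optional[Segment], bottom: Optional[Segment],
                left: Optional[Segment], right: Optional[Segment]) -> bool:
    """S703: the candidate cell exists only if all four edges exist."""
    return (vertical_edge_exists(tl, bl, left)
            and vertical_edge_exists(tr, br, right)
            and horizontal_edge_exists(tl, tr, top)
            and horizontal_edge_exists(bl, br, bottom))


corners = ((0, 0), (100, 0), (0, 50), (100, 50))          # tl, tr, bl, br
top, bottom = ((0, 0), (200, 0)), ((0, 50), (200, 50))
left = ((0, 0), (0, 120))
print(cell_exists(*corners, top, bottom, left, ((100, 0), (100, 50))))    # True
print(cell_exists(*corners, top, bottom, left, ((100, 60), (100, 120))))  # False
```

In the second call the right-hand vertical line only covers y = 60 to 120, so its projection misses the cell's right edge (y = 0 to 50) and the cell is rejected.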

In this embodiment of the present disclosure, the RPA system obtains the first start and end points of the horizontal lines and the second start and end points of the vertical lines corresponding to the target candidate cell, determines from the four corner coordinates and these start and end points whether all four sides of the target candidate cell exist, and, when all four sides exist, determines that the target candidate cell exists. This provides a method for deciding whether a target candidate cell exists, which lays the foundation for the RPA system to traverse the candidate cells and obtain all target cells.

FIG. 8 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure. Building on the above embodiment, as shown in FIG. 8, before the RPA system identifies, from the candidate cells and according to their attribute information, the target cells used to generate the blank second table, the method further includes:

S801: The RPA system sorts the intersections in the intersection set and, along a single direction, groups adjacent intersections into intersection pairs.

The RPA system sorts the intersections in each row and each column of the intersection set by their spatial order, and groups adjacent intersections into intersection pairs along one direction, which may be either horizontal or vertical.

As shown in FIG. 9, when pairs are formed horizontally, an intersection forms one pair with the intersection to its left and another pair with the intersection to its right; that is, an intersection is not limited to a single intersection pair.

Correspondingly, when pairs are formed vertically, an intersection forms one pair with the intersection above it and another pair with the intersection below it.

S802: The RPA system obtains adjacent intersection pairs in sequence along one direction and forms basic cells from adjacent intersection pairs.

The RPA system obtains adjacent intersection pairs in sequence, either horizontally or vertically, and forms basic cells from adjacent pairs. This direction is perpendicular to the pairing direction: when intersections were paired horizontally, adjacent pairs are taken vertically; when intersections were paired vertically, adjacent pairs are taken horizontally.

As shown in FIG. 10, when adjacent intersection pairs are taken horizontally to form basic cells, intersection pair b forms one basic cell with pair a on its left and another basic cell with pair c on its right; that is, an intersection pair is not limited to forming a single basic cell.

Correspondingly, when adjacent intersection pairs are taken vertically, intersection pair e forms one basic cell with pair d above it and another basic cell with pair f below it.
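A minimal Python sketch of S801 and S802, assuming the extracted lines are axis-aligned so that the intersection set (including second-type intersections from extension lines) forms a full grid, built here from the x positions of the vertical lines and the y positions of the horizontal lines. Intersections are paired horizontally and the pairs are then stacked vertically, following the perpendicular-direction rule above. All function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Pair = Tuple[Point, Point]
Cell = Tuple[Point, Point, Point, Point]  # (top-left, top-right, bottom-left, bottom-right)


def intersection_grid(xs: List[float], ys: List[float]) -> List[List[Point]]:
    """All intersections (real and extension-line ones), one row per horizontal line."""
    return [[(x, y) for x in sorted(xs)] for y in sorted(ys)]


def horizontal_pairs(grid: List[List[Point]]) -> List[List[Pair]]:
    """S801: pair each intersection with its right-hand neighbour in the same row.
    Interior intersections end up in two pairs (with the left and right neighbours)."""
    return [[(row[c], row[c + 1]) for c in range(len(row) - 1)] for row in grid]


def basic_cells(grid: List[List[Point]]) -> List[List[Cell]]:
    """S802: combine vertically adjacent pairs into basic cells.
    Interior pairs are shared by the cell above and the cell below them."""
    pairs = horizontal_pairs(grid)
    cells = []
    for r in range(len(pairs) - 1):
        row = []
        for (tl, tr), (bl, br) in zip(pairs[r], pairs[r + 1]):
            row.append((tl, tr, bl, br))
        cells.append(row)
    return cells


grid = intersection_grid(xs=[0, 100, 200], ys=[0, 50, 100, 150])
cells = basic_cells(grid)
print(len(cells), len(cells[0]))  # 3 2
print(cells[0][0])                # ((0, 0), (100, 0), (0, 50), (100, 50))
```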

S803: The RPA system uses the basic cells as matrix elements to construct a Boolean matrix.

The RPA system uses the basic cells as matrix elements to construct a two-dimensional Boolean matrix. If the preceding steps extracted x horizontal lines and y vertical lines, there are x × y intersections (including those formed by extension lines), so an (x-1) × (y-1) Boolean matrix can be built to record whether a cell exists at each position.
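Under the same full-grid assumption the sizing rule is easy to check: x horizontal lines leave x-1 row gaps and y vertical lines leave y-1 column gaps, so element [r][c] stands for the basic cell between horizontal lines r and r+1 and vertical lines c and c+1. A hypothetical Python helper that builds the initial all-F matrix, with F represented as `False`:

```python
from typing import List


def init_boolean_matrix(num_horizontal: int, num_vertical: int) -> List[List[bool]]:
    """(x-1) x (y-1) matrix of F (False), one element per basic cell."""
    if num_horizontal < 2 or num_vertical < 2:
        raise ValueError("need at least two horizontal and two vertical lines")
    return [[False] * (num_vertical - 1) for _ in range(num_horizontal - 1)]


m = init_boolean_matrix(num_horizontal=4, num_vertical=3)  # 4 * 3 = 12 intersections
print(len(m), len(m[0]))  # 3 2
```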

In this embodiment of the present disclosure, the RPA system sorts the intersections in the intersection set, groups adjacent intersections along one direction into intersection pairs, takes adjacent pairs in sequence to form basic cells, and uses the basic cells as matrix elements to construct a Boolean matrix. The Boolean matrix records whether a cell exists at each position, which helps record the target cells found during traversal and their positions, and simplifies the subsequent steps.

FIG. 11 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure. Building on the above embodiment, the process by which the RPA system arranges the target cells by position to generate the blank second table is further explained with reference to FIG. 11, and includes the following steps:

S1101: The RPA system assigns a value to each element of the Boolean matrix according to the target cells, producing a target Boolean matrix.

All elements of the Boolean matrix are initially F, the first value. For each determined target cell, the RPA system uses its four corner coordinates to obtain the first basic cells contained in that target cell, and updates the corresponding matrix elements from the first value F to the second value T.

As shown in FIG. 12, the target cell in the second row contains three basic cells, so the matrix elements of those three basic cells are updated from the first value F to the second value T.

Assigning values in this way yields the target Boolean matrix: the elements covered by target cells take the second value, and all other elements keep the initial first value.
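A Python sketch of S1101, assuming each target cell is given by its top-left and bottom-right corners as (x0, y0, x1, y1) and that these coordinates coincide exactly with grid line positions (real detections might need a tolerance). The example mirrors the idea of FIG. 12, a second-row cell spanning three basic cells; the coordinates are invented for illustration and the function name is hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) from the four corners


def mark_target_cells(xs: List[float], ys: List[float],
                      targets: List[Rect]) -> List[List[bool]]:
    """S1101: start from all-F and set T for every basic cell inside a target cell."""
    m = [[False] * (len(xs) - 1) for _ in range(len(ys) - 1)]
    for x0, y0, x1, y1 in targets:
        for r in range(len(ys) - 1):
            if not (y0 <= ys[r] and ys[r + 1] <= y1):
                continue
            for c in range(len(xs) - 1):
                if x0 <= xs[c] and xs[c + 1] <= x1:
                    m[r][c] = True  # one of the target cell's first basic cells
    return m


xs, ys = [0, 100, 200, 300], [0, 50, 100]
targets = [(0, 0, 100, 50), (100, 0, 200, 50), (200, 0, 300, 50),
           (0, 50, 300, 100)]  # the second-row cell spans three basic cells
print(mark_target_cells(xs, ys, targets))
# [[True, True, True], [True, True, True]]
```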

S1102: The RPA system identifies whether the target Boolean matrix contains target rows and/or target columns whose elements all take the first value.

The RPA system checks the value of every element of the target Boolean matrix row by row and column by column to find any target rows and/or target columns in which every element takes the first value.

S1103: If target rows and/or target columns exist, the RPA system splits the Boolean matrix along them to generate sub-target Boolean matrices.

A target row and/or target column contains no target cell, which indicates that the image holds separate tables arranged one above the other and/or side by side. The Boolean matrix is therefore split along the target rows and/or target columns to generate sub-target Boolean matrices.

As shown in FIG. 13, a target row contains no target cell, which indicates that the image contains two tables arranged one above the other.

A target column contains no target cell, which indicates that the image contains two tables arranged side by side.

When both a target row and a target column exist, neither contains a target cell, which indicates that the image contains four tables arranged in a 2 × 2 grid, like the character "田".
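The checks in S1102 and S1103 can be sketched in Python as follows. This is one possible reading, not necessarily the disclosed implementation: it splits once along every all-F row and column, returns each block with its (row, column) offset in the original matrix, and drops any block that contains no T. For the "田" arrangement it yields four blocks. Names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[bool]]


def true_runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Half-open [start, end) index ranges of consecutive True flags."""
    runs, start = [], None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    return runs


def split_on_empty_lines(m: Matrix) -> List[Tuple[Tuple[int, int], Matrix]]:
    """S1102/S1103: split along all-F rows/columns; return (offset, sub-matrix)."""
    row_used = [any(row) for row in m]
    col_used = [any(row[c] for row in m) for c in range(len(m[0]))]
    blocks = []
    for r0, r1 in true_runs(row_used):
        for c0, c1 in true_runs(col_used):
            sub = [row[c0:c1] for row in m[r0:r1]]
            if any(any(row) for row in sub):  # skip blocks without any target cell
                blocks.append(((r0, c0), sub))
    return blocks


# A "田" layout: row 2 and column 2 are all F.
m = [[r != 2 and c != 2 for c in range(5)] for r in range(5)]
for offset, sub in split_on_empty_lines(m):
    print(offset, len(sub), len(sub[0]))
# (0, 0) 2 2
# (0, 3) 2 2
# (3, 0) 2 2
# (3, 3) 2 2
```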

S1104: The RPA system obtains the target cells corresponding to each sub-target Boolean matrix and arranges them by position to generate the blank second table corresponding to that sub-target Boolean matrix.

For how step S1104 generates the blank second table for a sub-target Boolean matrix, see the process in step S403, in which the target cells are arranged by position to generate the blank second table; it is not repeated here.

In this embodiment of the present disclosure, the RPA system assigns values to the elements of the Boolean matrix according to the target cells to generate a target Boolean matrix; identifies whether that matrix contains target rows and/or target columns whose elements all take the first value; if so, splits the matrix along them into sub-target Boolean matrices; and arranges the target cells of each sub-target Boolean matrix by position to generate its blank second table. By inspecting the values of the matrix elements, the RPA system can determine whether the image contains tables arranged one above the other and/or side by side, which further determines the table structure.

FIG. 14 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure. Building on the above embodiment, as shown in FIG. 14, after the RPA system splits the Boolean matrix along the target rows and/or target columns to generate the sub-target Boolean matrices, the method further includes:

S1401: The RPA system identifies whether the sub-target Boolean matrix contains any target matrix element whose value is the first value.

The RPA system traverses the elements of the sub-target Boolean matrix in order and checks whether each one equals the first value F. Any element that does is marked as a target matrix element. Traversal continues until the last element, yielding all target matrix elements in the sub-target Boolean matrix.

S1402: The RPA system updates the value of each target matrix element from the first value to the second value.

To keep the table complete, the target matrix elements must be updated from the first value to the second value before the final second table is obtained. A target matrix element with the first value, however, can arise in two situations: it may be an independent target cell that was missed, or it may belong to a target cell that has already been identified. In the embodiments of the present application, the target matrix element is therefore evaluated as shown in FIG. 15.

The RPA system obtains the target basic cell corresponding to the target matrix element. In the present disclosure, each element of the Boolean matrix corresponds to one basic cell, so once a target matrix element has been determined, the RPA system can locate its basic cell, referred to here as the target basic cell, from the element's position in the Boolean matrix.

The RPA system determines whether the target basic cell needs to be merged into the target cell corresponding to a matrix element adjacent to the target matrix element; this is referred to as the existing target cell. To decide, the RPA system obtains the four corner coordinates of the target basic cell. If the coverage of the existing target cell includes those four corner coordinates, the target basic cell needs to be merged into the existing target cell. If it does not, no merge is needed, and the target basic cell is an independent target cell.

If the RPA system determines that a merge is needed, it merges the target basic cell into the target cell corresponding to the adjacent matrix element, forming a larger target cell.

If the RPA system determines that no merge is needed, it treats the target basic cell as a missing target cell: the value at that position is set to T so that it becomes an independent target cell, and the missing target cell is filled in when the target cells are arranged by position.
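The merge-or-fill decision of S1401 and S1402 can be sketched in Python under these assumptions: target cells are stored as (x0, y0, x1, y1) rectangles, "adjacent" means the four orthogonal neighbours in the matrix, and the coverage test is the corner-containment check described above. Because a covering neighbour cell already includes the target basic cell's corners, "merging" in this sketch simply means that no new cell is recorded; otherwise the basic cell is added as a standalone target cell. Either way the element becomes T. All names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def contains(outer: Rect, inner: Rect) -> bool:
    """True if all four corners of inner lie within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def basic_rect(xs: List[float], ys: List[float], r: int, c: int) -> Rect:
    return (xs[c], ys[r], xs[c + 1], ys[r + 1])


def complete_matrix(m: List[List[bool]], xs: List[float], ys: List[float],
                    targets: List[Rect]) -> None:
    """Turn every F element into T, merging or adding a cell. Mutates m and targets."""
    rows, cols = len(m), len(m[0])
    for r in range(rows):
        for c in range(cols):
            if m[r][c]:
                continue                      # not a target matrix element
            basic = basic_rect(xs, ys, r, c)  # the target basic cell
            # Existing target cells that cover a T neighbour of (r, c).
            owners = [t
                      for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                      if 0 <= nr < rows and 0 <= nc < cols and m[nr][nc]
                      for t in targets
                      if contains(t, basic_rect(xs, ys, nr, nc))]
            if not any(contains(t, basic) for t in owners):
                targets.append(basic)         # missing, independent target cell
            # Otherwise the basic cell already lies inside a neighbouring target cell.
            m[r][c] = True


xs, ys = [0, 100, 200], [0, 50]
m, targets = [[True, False]], [(0, 0, 100, 50)]
complete_matrix(m, xs, ys, targets)
print(m, targets)  # [[True, True]] [(0, 0, 100, 50), (100, 0, 200, 50)]
```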

In this embodiment of the present disclosure, the RPA system identifies whether the sub-target Boolean matrix contains target matrix elements with the first value and updates them from the first value to the second value. This fills in the missing cells and determines the table structure.

FIG. 16 is a flowchart of a table generation method combining RPA and AI according to an embodiment of the present disclosure. Building on the above embodiment, the process by which the RPA system fills the text entries recognized from the image by OCR into the blank second table to obtain the target table is further explained with reference to FIG. 16, and includes the following steps:

S1601: The RPA system obtains the four corner coordinates of each text entry.

The RPA system obtains the text recognition box containing the text entry and uses the four corner coordinates of that box as the four corner coordinates of the text entry.

Optionally, text entries can be extracted from a text image containing a table by an algorithm that scans the image with a template to form rectangular boxes enclosing the foreground pixels, thereby extracting those pixels, and then combines the boxes into pattern chains. Each pattern is classified using three statistical features, namely its maximum black run, length, and width, and the text is extracted accordingly.
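The three statistical features named above could be computed as in the Python sketch below. The text does not define them precisely, so two readings are assumptions: the maximum black run is taken as the longest horizontal run of foreground pixels, and length and width are taken as the height and width of the pattern's bounding box. The classification thresholds are not specified, so none are shown, and the function name is hypothetical.

```python
from typing import List, Tuple


def pattern_features(mask: List[List[int]]) -> Tuple[int, int, int]:
    """Return (max_black_run, length, width) for one binarized pattern.

    max_black_run: longest horizontal run of foreground (1) pixels (assumed direction).
    length, width: height and width of the foreground bounding box (assumed meaning).
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return 0, 0, 0
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    best = 0
    for row in mask:
        run = 0
        for v in row:
            run = run + 1 if v else 0
            best = max(best, run)
    return best, max(rows) - min(rows) + 1, max(cols) - min(cols) + 1


glyph = [[0, 1, 1, 1, 0],
         [0, 1, 0, 0, 0],
         [0, 1, 1, 0, 0]]
print(pattern_features(glyph))  # (3, 3, 3)
```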

S1602: The RPA system matches the four corner coordinates of the text entries against the four corner coordinates of the target cells to obtain the target text entry corresponding to each target cell.

By matching the four corner coordinates of the text entries against those of the target cells, the RPA system determines whether any text entry is a first text entry, that is, one that occupies at least two target cells.

The four corner coordinates of a text entry define a region. If all four corners of one cell fall inside that region, and two corners of another cell also fall inside it, the text entry occupies two cells.

Similarly, if all four corners of each of two cells fall inside the region, and corners of a third cell also fall inside it, the text entry occupies three cells.
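A literal Python sketch of this corner-counting rule, assuming axis-aligned rectangles (x0, y0, x1, y1) with inclusive boundaries. For two axis-aligned rectangles, the number of one rectangle's corners inside the other can only be 0, 1, 2, or 4, so a cell is counted when at least two of its corners fall inside the text region; the entry is a first text entry when two or more cells are counted. A text box lying wholly inside one cell contains no cell corners and is left to the single-cell case. Function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Point = Tuple[float, float]


def corners(rect: Rect) -> List[Point]:
    x0, y0, x1, y1 = rect
    return [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]


def inside(p: Point, box: Rect) -> bool:
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


def occupied_cells(text_box: Rect, cells: List[Rect]) -> List[Rect]:
    """Cells with all four, or two, of their corners inside the text region."""
    result = []
    for cell in cells:
        n = sum(inside(p, text_box) for p in corners(cell))
        if n >= 2:
            result.append(cell)
    return result


cells = [(0, 0, 100, 50), (100, 0, 200, 50), (200, 0, 300, 50)]
print(occupied_cells((-5, -5, 150, 55), cells))
# [(0, 0, 100, 50), (100, 0, 200, 50)]
```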

If a first text entry exists, the RPA system splits it according to the at least two cells it occupies, obtaining a target text entry for each target cell occupied by the first text entry.

For example, if the first text entry occupies two adjacent cells, it is split along the edge the two cells share, giving a separate text entry for each of the two target cells.

If a text entry occupies only one target cell, it is used directly as the target text entry for that cell.
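For a text entry spanning side-by-side cells, the split along shared edges might look like the Python sketch below. Clipping the recognition box to each cell follows the text; cutting the recognized string in proportion to the x position of each shared edge is an added assumption (roughly uniform character widths), since the disclosure does not say how the characters themselves are divided. Vertical spans are not handled, and the function name is hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def split_horizontal_entry(text: str, box: Rect,
                           cells: List[Rect]) -> List[Tuple[Rect, Rect, str]]:
    """Split a text entry spanning side-by-side cells along their shared vertical edges.

    Returns (cell, clipped sub-box, sub-text) per cell. Cutting the string in proportion
    to x position assumes roughly uniform character widths.
    """
    x0, y0, x1, y1 = box
    cells = sorted(cells)  # left to right
    cuts = [0]
    for cell in cells[:-1]:
        shared_edge = cell[2]  # right edge of this cell = left edge of the next
        frac = (shared_edge - x0) / (x1 - x0)
        cuts.append(min(len(text), max(cuts[-1], round(frac * len(text)))))
    cuts.append(len(text))
    parts = []
    for cell, a, b in zip(cells, cuts, cuts[1:]):
        sub_box = (max(x0, cell[0]), y0, min(x1, cell[2]), y1)
        parts.append((cell, sub_box, text[a:b]))
    return parts


for cell, sub_box, sub_text in split_horizontal_entry(
        "AAAABBBB", (0, 0, 160, 20), [(80, 0, 200, 40), (0, 0, 80, 40)]):
    print(sub_box, sub_text)
# (0, 0, 80, 20) AAAA
# (80, 0, 160, 20) BBBB
```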

S1603: The RPA system fills each target text entry into its corresponding target cell to obtain the target table.

For the specific implementation of step S1603, see the related descriptions in the embodiments of the present disclosure; it is not repeated here.

In this embodiment of the present disclosure, the RPA system obtains the four corner coordinates of each text entry, matches them against the four corner coordinates of the target cells to obtain the target text entry for each target cell, and fills each target text entry into its target cell to obtain the target table. Because text entries that span several cells are split, each piece of text can be placed accurately in its own target cell, producing a complete table document.

FIG. 17 is a structural diagram of a table generation apparatus combining RPA and AI according to an embodiment of the present disclosure. As shown in FIG. 17, the table generation apparatus 1700 combining RPA and AI includes:

an extraction module 1710, configured to extract the horizontal and vertical lines of a first table from an image using artificial intelligence (AI);

an intersection acquisition module 1720, configured to obtain the set of intersections of the horizontal and vertical lines, where the set includes first-type intersections, formed where horizontal lines and vertical lines cross, and second-type intersections, formed where extensions of the horizontal lines and/or extensions of the vertical lines cross;

a generation module 1730, configured to generate, from the intersection set, a blank second table consistent with the first table; and

a filling module 1740, configured to fill the text entries recognized from the image by OCR into the blank second table to obtain a target table.

In this embodiment of the present disclosure, RPA and AI technologies are used to recognize a table in an image and restore it as a table document with the same table structure, automatically converting offline data into online data. This replaces a tedious manual process and improves the efficiency of table generation.

It should be noted that the foregoing explanation of the embodiments of the table generation method combining RPA and AI also applies to the table generation apparatus combining RPA and AI of this embodiment, and is not repeated here.

Further, in a possible implementation of this embodiment of the present disclosure, the generation module 1730 is further configured to: enumerate cells from the intersections in the intersection set to obtain candidate cells and their attribute information; identify, from the candidate cells and according to their attribute information, the target cells used to generate the blank second table; and arrange the target cells by position to generate the blank second table.

Further, in a possible implementation of this embodiment of the present disclosure, the generation module 1730 is further configured to: sort all enumerated candidate cells by area in ascending order; traverse the candidate cells in that order and determine whether each traversed target candidate cell exists; each time a target candidate cell is found to exist, remove the cells that overlap it from the candidate cells not yet traversed and record it as a target cell; and continue traversing the remaining candidate cells in order until the traversal ends and all target cells are obtained.
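The ascending-area traversal performed by the generation module 1730 can be sketched in Python as below. Enumerating candidates as every rectangle whose corners are grid intersections is an assumption about the enumeration step, and `exists` is a hypothetical stand-in for the four-edge existence check; the two example predicates only simulate a table with and without its middle divider line. Candidates with equal area keep their enumeration order because Python's sort is stable.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def enumerate_candidates(xs: List[float], ys: List[float]) -> List[Rect]:
    """Every rectangle whose corners are intersections of the (extended) lines."""
    xs, ys = sorted(xs), sorted(ys)
    return [(xs[a], ys[b], xs[c], ys[d])
            for a, c in combinations(range(len(xs)), 2)
            for b, d in combinations(range(len(ys)), 2)]


def overlaps(p: Rect, q: Rect) -> bool:
    """Positive-area overlap; rectangles that only share an edge do not overlap."""
    return (min(p[2], q[2]) > max(p[0], q[0])
            and min(p[3], q[3]) > max(p[1], q[1]))


def select_targets(candidates: List[Rect], exists: Callable[[Rect], bool]) -> List[Rect]:
    """Traverse candidates by ascending area; keep existing ones, drop their overlaps."""
    pending = sorted(candidates, key=lambda r: (r[2] - r[0]) * (r[3] - r[1]))
    targets = []
    while pending:
        cand = pending.pop(0)
        if exists(cand):
            targets.append(cand)
            pending = [p for p in pending if not overlaps(p, cand)]
    return targets


cands = enumerate_candidates([0, 100, 200], [0, 50])
print(select_targets(cands, lambda r: True))
# [(0, 0, 100, 50), (100, 0, 200, 50)]
print(select_targets(cands, lambda r: r == (0, 0, 200, 50)))  # divider line missing
# [(0, 0, 200, 50)]
```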

Further, in a possible implementation of this embodiment of the present disclosure, the generation module 1730 is further configured to: obtain the first start and end points of the horizontal lines and the second start and end points of the vertical lines corresponding to the target candidate cell; determine, based on the four corner coordinates of the target candidate cell and the first and second start and end points, whether all four sides of the target candidate cell exist; and determine that the target candidate cell exists when all four sides exist.

Further, in a possible implementation of this embodiment of the present disclosure, the generation module 1730 is further configured to: sort the intersections in the intersection set and group adjacent intersections along one direction into intersection pairs; obtain adjacent intersection pairs in sequence along a direction and form basic cells from them; and construct a Boolean matrix with the basic cells as matrix elements.

Further, in a possible implementation of this embodiment of the present disclosure, the generation module 1730 is further configured to: assign a value to each element of the Boolean matrix according to the target cells to generate a target Boolean matrix; identify whether the target Boolean matrix contains target rows and/or target columns whose elements all take the first value; if such target rows and/or target columns exist, split the Boolean matrix along them to generate sub-target Boolean matrices; and obtain the target cells corresponding to each sub-target Boolean matrix and arrange them by position to generate the corresponding blank second table.

Further, in a possible implementation of this embodiment of the present disclosure, the generation module 1730 is further configured to: use the four corner coordinates of a target cell to obtain the first basic cells contained in that target cell, and update the matrix elements corresponding to those first basic cells from the first value to the second value.

Further, in a possible implementation of this embodiment of the present disclosure, the generation module 1730 is further configured to: identify whether the sub-target Boolean matrix contains a target matrix element whose value is the first value, and update the value of that target matrix element from the first value to the second value.

Further, in a possible implementation of this embodiment of the present disclosure, the generation module 1730 is further configured to: obtain the target basic cell corresponding to the target matrix element; determine whether the target basic cell needs to be merged into the target cell corresponding to a matrix element adjacent to the target matrix element; and, if a merge is needed, merge the target basic cell into that target cell.

Further, in a possible implementation of this embodiment of the present disclosure, the generation module 1730 is further configured to: if no merge is needed, determine the target basic cell to be a missing target cell, and fill in the missing target cell when the target cells are arranged by position.

Further, in a possible implementation of this embodiment of the present disclosure, the filling module 1740 is further configured to: obtain the four corner coordinates of each text entry; match them against the four corner coordinates of the target cells to obtain the target text entry corresponding to each target cell; and fill each target text entry into its corresponding target cell to obtain the target table.

Further, in a possible implementation of this embodiment of the present disclosure, the filling module 1740 is further configured to: determine, by matching the four corner coordinates of the text entries against those of the target cells, whether any text entry is a first text entry that occupies at least two target cells; and split the first text entry according to the at least two cells it occupies to obtain a target text entry for each target cell it occupies.

According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.

FIG. 18 shows a schematic block diagram of an example electronic device 1800 that can be used to implement embodiments of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. It may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 18, the electronic device includes a memory 181, a processor 182, and a computer program stored in the memory 181 and executable on the processor 182. When the processor 182 executes the program, the table generation method described above is implemented.

In the description of the present invention, it should be understood that the orientations or positional relationships indicated by the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inside", "outside", "clockwise", "counterclockwise", "axial", "radial", "circumferential", and the like are based on the orientations or positional relationships shown in the drawings. They are used only to simplify the description of the present invention, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; they therefore should not be construed as limiting the present invention.

In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "a plurality of" means two or more, unless otherwise expressly and specifically defined.

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided there is no contradiction, those skilled in the art may join and combine the different embodiments or examples described in this specification and the features thereof.

Although embodiments of the present invention have been shown and described above, it should be understood that they are exemplary and are not to be construed as limiting the present invention. Those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims (16)

1. A table generation method combining RPA and AI, characterized in that it is executed by an RPA system and comprises:
the RPA system extracting, based on artificial intelligence (AI), the horizontal lines and vertical lines of a first table from an image;
the RPA system acquiring a set of intersection points of the horizontal lines and the vertical lines, wherein the intersection set includes first-type intersections formed where the horizontal lines and the vertical lines actually cross, and second-type intersections formed where extensions of the horizontal lines and/or extensions of the vertical lines cross;
the RPA system generating, according to the intersection set, a blank second table consistent with the first table; and
the RPA system filling text entries recognized from the image by optical character recognition (OCR) into the blank second table to obtain a target table;
wherein the RPA system generating the blank second table consistent with the first table according to the intersection set comprises:
the RPA system enumerating cells from the intersection points in the intersection set to obtain candidate cells and attribute information of the candidate cells;
the RPA system sorting all enumerated candidate cells in ascending order of cell area;
the RPA system traversing the candidate cells in that order and determining whether each traversed target candidate cell exists;
each time the RPA system determines that the target candidate cell exists, the RPA system deleting, from the candidate cells not yet traversed, those that overlap the target candidate cell, and determining the existing target candidate cell to be a target cell;
the RPA system continuing to traverse, in order, the candidate cells remaining after deletion until the traversal ends and all target cells are obtained; and
the RPA system arranging the target cells according to their positions to generate the blank second table.
2. The method according to claim 1, characterized in that the RPA system traversing the candidate cells in order and determining whether each traversed target candidate cell exists comprises:
the RPA system obtaining first start and end points of the horizontal lines and second start and end points of the vertical lines corresponding to the target candidate cell;
the RPA system determining, according to the four corner coordinates of the target candidate cell, the first start and end points, and the second start and end points, whether all four sides of the target candidate cell exist; and
the RPA system determining that the target candidate cell exists when all four sides exist.
3. The method according to claim 1 or 2, characterized in that, before the RPA system identifies, according to the attribute information of the candidate cells, the target cells used to generate the blank second table from among the candidate cells, the method further comprises:
the RPA system sorting the intersection points in the intersection set and grouping adjacent intersection points in the same direction into a plurality of intersection point pairs;
the RPA system successively acquiring adjacent intersection point pairs in the same direction and forming basic cells from the adjacent intersection point pairs; and
the RPA system constructing a Boolean matrix with the basic cells as matrix elements.
4. The method according to claim 3, characterized in that the RPA system arranging the target cells according to their positions to generate the blank second table comprises:
the RPA system assigning a value to each matrix element of the Boolean matrix according to the target cells to generate a target Boolean matrix;
the RPA system identifying whether the target Boolean matrix includes a target row and/or a target column whose elements all take a first value;
when the target row and/or the target column exists, the RPA system splitting the Boolean matrix along the target row and/or the target column to generate sub-target Boolean matrices; and
the RPA system obtaining the target cells corresponding to each sub-target Boolean matrix and arranging them according to their positions to generate the blank second table corresponding to that sub-target Boolean matrix.
5. The method according to claim 4, characterized in that the RPA system assigning a value to each matrix element of the Boolean matrix according to the target cells to generate the target Boolean matrix comprises:
the RPA system obtaining, according to the four corner coordinates of a target cell, the first basic cells contained in the target cell, and updating the matrix elements corresponding to the first basic cells from the first value to a second value.
6. The method according to claim 4, characterized in that, after the RPA system splits the Boolean matrix along the target row and/or the target column to generate the sub-target Boolean matrices, the method further comprises:
the RPA system identifying whether a sub-target Boolean matrix contains a target matrix element whose value is the first value; and
the RPA system updating the value of the target matrix element from the first value to a second value.
7. The method according to claim 6, characterized in that, before the RPA system updates the value of the target matrix element from the first value to the second value, the method further comprises:
the RPA system obtaining the target basic cell corresponding to the target matrix element;
the RPA system determining whether the target basic cell needs to be merged into the target cell corresponding to a matrix element adjacent to the target matrix element; and
when merging is determined to be needed, the RPA system merging the target basic cell into the target cell corresponding to the adjacent matrix element.
8. The method according to claim 7, characterized in that the method further comprises:
when the RPA system determines that merging is not needed, determining the target basic cell to be a missing target cell, and completing the missing target cell when the target cells are arranged according to their positions.
9. The method according to claim 1 or 2, characterized in that the RPA system filling the text entries recognized from the image by OCR into the blank second table to obtain the target table comprises:
the RPA system obtaining the four corner coordinates of each text entry;
the RPA system matching the four corner coordinates of the text entries against the four corner coordinates of the target cells to obtain the target text entry corresponding to each target cell; and
the RPA system filling each target text entry into its corresponding target cell to obtain the target table.
10. The method according to claim 9, characterized in that the RPA system matching the four corner coordinates of the text entries against the four corner coordinates of the target cells to obtain the target text entry corresponding to each target cell comprises:
the RPA system matching the four corner coordinates of the text entries against the four corner coordinates of the target cells and determining whether there is a first text entry that occupies at least two target cells; and
the RPA system splitting the first text entry according to the at least two target cells it occupies, to obtain a target text entry for each target cell occupied by the first text entry.
11. A table generation device combining RPA and AI, characterized by comprising:
an extraction module, configured to extract, based on artificial intelligence (AI), the horizontal lines and vertical lines of a first table from an image;
an intersection acquisition module, configured to acquire a set of intersection points of the horizontal lines and the vertical lines, wherein the intersection set includes first-type intersections formed where the horizontal lines and the vertical lines actually cross, and second-type intersections formed where extensions of the horizontal lines and/or extensions of the vertical lines cross;
a generation module, configured to generate, according to the intersection set, a blank second table consistent with the first table; and
a filling module, configured to fill text entries recognized from the image by OCR into the blank second table to obtain a target table;
wherein the generation module is further configured to:
enumerate cells from the intersection points in the intersection set to obtain candidate cells and attribute information of the candidate cells;
sort all enumerated candidate cells in ascending order of cell area;
traverse the candidate cells in that order and determine whether each traversed target candidate cell exists;
each time the target candidate cell is determined to exist, delete, from the candidate cells not yet traversed, those that overlap the target candidate cell, and determine the existing target candidate cell to be a target cell;
continue to traverse, in order, the candidate cells remaining after deletion until the traversal ends and all target cells are obtained; and
arrange the target cells according to their positions to generate the blank second table.
12. The device according to claim 11, characterized in that the generation module is further configured to:
obtain first start and end points of the horizontal lines and second start and end points of the vertical lines corresponding to the target candidate cell;
determine, according to the four corner coordinates of the target candidate cell, the first start and end points, and the second start and end points, whether all four sides of the target candidate cell exist; and
determine that the target candidate cell exists when all four sides exist.
13. The device according to claim 11 or 12, characterized in that the filling module is further configured to:
obtain the four corner coordinates of each text entry;
match the four corner coordinates of the text entries against the four corner coordinates of the target cells to obtain the target text entry corresponding to each target cell; and
fill each target text entry into its corresponding target cell to obtain the target table.
14. An electronic device, comprising a memory and a processor,
wherein the processor reads executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the method according to any one of claims 1 to 10.
15. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 10.
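The candidate-cell enumeration and selection recited in claims 1 and 2 can be illustrated with a short Python sketch. It is not the patentee's implementation and rests on several assumptions. Line segments are taken as already extracted and axis-aligned, given as `(y, x_start, x_end)` for horizontal lines and `(x, y_start, y_end)` for vertical lines. Collinear lines are assumed to share exactly the same coordinate. An edge counts as present only when a single detected segment covers it. Finally, `tol` is a hypothetical pixel tolerance. All names are illustrative.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cell:
    """Axis-aligned rectangle: (x1, y1) top-left corner, (x2, y2) bottom-right corner."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        return (self.x2 - self.x1) * (self.y2 - self.y1)

    def overlaps(self, other: "Cell") -> bool:
        # Interiors intersect; cells that merely share an edge do not overlap.
        return (self.x1 < other.x2 and other.x1 < self.x2
                and self.y1 < other.y2 and other.y1 < self.y2)


def intersection_set(h_lines, v_lines):
    """Every (x, y) pair: real crossings (first type) plus crossings of
    extended lines (second type)."""
    return {(x, y) for x, _, _ in v_lines for y, _, _ in h_lines}


def enumerate_candidates(points):
    """Every rectangle whose four corners lie in the intersection set."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    return [Cell(x1, y1, x2, y2)
            for x1, x2 in combinations(xs, 2)
            for y1, y2 in combinations(ys, 2)]


def edge_exists(segments, pos, lo, hi, tol):
    """True if one detected segment lies on `pos` and spans [lo, hi] within tol."""
    return any(abs(p - pos) <= tol and s <= lo + tol and e >= hi - tol
               for p, s, e in segments)


def cell_exists(c, h_lines, v_lines, tol):
    """Claim 2: a candidate exists only if all four of its sides are drawn."""
    return (edge_exists(h_lines, c.y1, c.x1, c.x2, tol)      # top
            and edge_exists(h_lines, c.y2, c.x1, c.x2, tol)  # bottom
            and edge_exists(v_lines, c.x1, c.y1, c.y2, tol)  # left
            and edge_exists(v_lines, c.x2, c.y1, c.y2, tol))  # right


def select_target_cells(h_lines, v_lines, tol=2.0):
    """Claim 1: visit candidates smallest first; keep existing ones and
    drop every untraversed candidate that overlaps a kept cell."""
    pending = sorted(enumerate_candidates(intersection_set(h_lines, v_lines)),
                     key=lambda c: c.area)
    targets = []
    while pending:
        cand = pending.pop(0)
        if cell_exists(cand, h_lines, v_lines, tol):
            targets.append(cand)
            pending = [c for c in pending if not c.overlaps(cand)]
    return targets
```

Consider a table with horizontal lines at y = 0, 50 and 100 spanning x = 0..200, outer vertical lines at x = 0 and x = 200, and an inner vertical line at x = 100 drawn only from y = 50 to 100. For that table, `select_target_cells` returns the merged header cell (0, 0, 200, 50) and the two lower cells (0, 50, 100, 100) and (100, 50, 200, 100). Because candidates are visited in ascending order of area, a larger rectangle is accepted only when no smaller existing cell inside it was accepted first. This is how merged cells survive while spurious enclosing rectangles are discarded.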
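Claims 3 to 5 build a Boolean matrix over basic cells and split it along rows or columns that no target cell covers, so that several tables in one image become separate sub-tables. The minimal Python sketch below assumes that the sorted grid coordinates `xs` and `ys` are known and that target cells are given as `(x1, y1, x2, y2)` tuples. `False` stands in for the claims' first value and `True` for the second value. The sketch applies the split recursively, so a sub-matrix can be split again. That choice is the sketch's own, since the claims do not say whether splitting repeats.

```python
def build_matrix(xs, ys, targets):
    """Boolean matrix over basic cells (gaps between adjacent grid coordinates).
    False = first value (not covered); True = second value (covered)."""
    m = [[False] * (len(xs) - 1) for _ in range(len(ys) - 1)]
    for x1, y1, x2, y2 in targets:
        for r in range(len(ys) - 1):
            for k in range(len(xs) - 1):
                # Claim 5: mark every basic cell lying inside the target cell.
                if x1 <= xs[k] and xs[k + 1] <= x2 and y1 <= ys[r] and ys[r + 1] <= y2:
                    m[r][k] = True
    return m


def true_runs(flags):
    """Half-open index ranges of consecutive True values."""
    runs, start = [], None
    for i, f in enumerate(flags):
        if f and start is None:
            start = i
        elif not f and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def split_blocks(m, r0, r1, c0, c1):
    """Claim 4: split m[r0:r1][c0:c1] along all-False rows/columns.
    Returns (r0, r1, c0, c1) blocks, one per sub-table."""
    row_used = [any(m[r][c0:c1]) for r in range(r0, r1)]
    col_used = [any(m[r][k] for r in range(r0, r1)) for k in range(c0, c1)]
    row_runs = [(r0 + a, r0 + b) for a, b in true_runs(row_used)]
    col_runs = [(c0 + a, c0 + b) for a, b in true_runs(col_used)]
    if row_runs == [(r0, r1)] and col_runs == [(c0, c1)]:
        return [(r0, r1, c0, c1)]  # no all-False row or column remains
    return [blk for a, b in row_runs for c, d in col_runs
            for blk in split_blocks(m, a, b, c, d)]
```

With two tables side by side separated by an empty column band, `split_blocks(m, 0, len(m), 0, len(m[0]))` returns one block per table. Empty blocks produced by crossing row and column runs are dropped automatically, because they contain no `True` row.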
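Claims 6 to 8 deal with basic cells inside a sub-table that no target cell covers. Each such cell is either merged into an adjacent target cell or completed as a missing cell. The claims do not state the merge criterion. The sketch below therefore takes it as a hypothetical caller-supplied predicate `edge_drawn(a, b)`, assumed to return `True` when a ruling line separates basic cells `a` and `b`. An uncovered cell is merged into the first neighbor (checked up, down, left, right) from which it is not separated. `owner` is an illustrative mapping from each covered basic cell `(row, col)` to an integer target-cell id, and both it and the matrix are updated in place.

```python
def resolve_uncovered(m, owner, edge_drawn):
    """m: Boolean matrix of one sub-table (False = basic cell not covered).
    owner: dict mapping covered basic cells (row, col) to target-cell ids.
    edge_drawn(a, b): hypothetical predicate, True if a ruling line
    separates adjacent basic cells a and b."""
    next_id = max(owner.values(), default=-1) + 1
    for r, row in enumerate(m):
        for k, covered in enumerate(row):
            if covered:
                continue
            for nb in ((r - 1, k), (r + 1, k), (r, k - 1), (r, k + 1)):
                if nb in owner and not edge_drawn((r, k), nb):
                    owner[(r, k)] = owner[nb]  # claim 7: merge into the neighbor's cell
                    break
            else:
                owner[(r, k)] = next_id  # claim 8: complete as a missing cell
                next_id += 1
            row[k] = True  # claim 6: first value -> second value
    return owner
```

Passing the separation test as a function keeps the sketch independent of how line evidence is stored. A real implementation could instead reuse the segment data from line extraction.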
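Claims 9 and 10 match OCR text entries to target cells by corner coordinates and split an entry that spans several cells. The Python sketch below simplifies four-corner coordinates to axis-aligned boxes `(x1, y1, x2, y2)`. It also makes two assumptions beyond what the claims state. First, an entry belongs to a cell when the two overlap horizontally and the cell covers at least `min_v` (a hypothetical threshold) of the entry's height. Second, a multi-cell entry is split by character count in proportion to its horizontal overlap with each cell, which presumes roughly uniform character widths.

```python
from collections import defaultdict


def overlap_1d(a1, a2, b1, b2):
    """Length of the overlap between intervals [a1, a2] and [b1, b2]."""
    return max(0, min(a2, b2) - max(a1, b1))


def fill_cells(entries, cells, min_v=0.5):
    """entries: list of (text, (x1, y1, x2, y2)) from OCR.
    cells: list of target cells as (x1, y1, x2, y2).
    Returns {cell index: text}."""
    pieces = defaultdict(list)
    for text, (tx1, ty1, tx2, ty2) in entries:
        # Cells hit by this entry, ordered left to right.
        hits = sorted(
            (cx1, i, overlap_1d(tx1, tx2, cx1, cx2))
            for i, (cx1, cy1, cx2, cy2) in enumerate(cells)
            if overlap_1d(tx1, tx2, cx1, cx2) > 0
            and overlap_1d(ty1, ty2, cy1, cy2) >= min_v * (ty2 - ty1)
        )
        start, acc = 0, 0
        for j, (_, i, w) in enumerate(hits):
            acc += w
            # The last cell takes the remainder so no characters are lost.
            end = len(text) if j == len(hits) - 1 else round(len(text) * acc / (tx2 - tx1))
            piece = text[start:end].strip()
            if piece:
                pieces[i].append(piece)
            start = end
    return {i: " ".join(p) for i, p in pieces.items()}
```

For example, an entry "AAAABBBB" whose box straddles the boundary between two equal-width cells in the same row is split into "AAAA" and "BBBB". An entry lying inside a single cell is assigned to that cell unchanged.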
CN202111026974.3A 2021-09-02 2021-09-02 Table generation method, device, electronic device and storage medium combining RPA and AI Active CN113836878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111026974.3A CN113836878B (en) 2021-09-02 2021-09-02 Table generation method, device, electronic device and storage medium combining RPA and AI

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111026974.3A CN113836878B (en) 2021-09-02 2021-09-02 Table generation method, device, electronic device and storage medium combining RPA and AI

Publications (2)

Publication Number Publication Date
CN113836878A (en) 2021-12-24
CN113836878B (en) 2024-11-05

Family

ID=78962035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111026974.3A Active CN113836878B (en) 2021-09-02 2021-09-02 Table generation method, device, electronic device and storage medium combining RPA and AI

Country Status (1)

Country Link
CN (1) CN113836878B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment
CN112115774A (en) * 2020-08-07 2020-12-22 北京来也网络科技有限公司 Character recognition method, device, electronic device and storage medium combining RPA and AI

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN108446264B (en) * 2018-03-26 2022-02-15 阿博茨德(北京)科技有限公司 Table vector analysis method and device in PDF document
CN111797685B (en) * 2020-05-27 2022-04-15 贝壳找房(北京)科技有限公司 Identification method and device of table structure
CN111640130A (en) * 2020-05-29 2020-09-08 深圳壹账通智能科技有限公司 Table reduction method and device
CN111860502B (en) * 2020-07-15 2024-07-16 北京思图场景数据科技服务有限公司 Picture form identification method and device, electronic equipment and storage medium
CN113065536B (en) * 2021-06-03 2021-09-14 北京欧应信息技术有限公司 Method of processing table, computing device, and computer-readable storage medium

Also Published As

Publication number Publication date
CN113836878A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
US7350142B2 (en) Method and system for creating a table version of a document
US7149967B2 (en) Method and system for creating a table version of a document
CN102117269B (en) Apparatus and method for digitizing documents
JP5361574B2 (en) Image processing apparatus, image processing method, and program
US8391609B2 (en) Method of massive parallel pattern matching against a progressively-exhaustive knowledge base of patterns
CN110472208A (en) The method, system of form analysis, storage medium and electronic equipment in PDF document
US8693790B2 (en) Form template definition method and form template definition apparatus
JP5854802B2 (en) Image processing apparatus, image processing method, and computer program
CN1859541B (en) Image processing apparatus and its control method
WO2020186779A1 (en) Image information identification method and apparatus, and computer device and storage medium
NL9301004A (en) Apparatus for processing and reproducing digital image information.
CN110633660B (en) Document identification method, device and storage medium
US9298335B2 (en) Information processing device, information processing method, and computer-readable medium
US11908215B2 (en) Information processing apparatus, information processing method, and storage medium
JP2012203491A (en) Document processing device and document processing program
JP2021057783A (en) Image processing device, control method of image processing device and program of the same
CN113836878B (en) Table generation method, device, electronic device and storage medium combining RPA and AI
JP2008108114A (en) Document processing apparatus and document processing method
JP2013020477A (en) Image processing apparatus and program
CN111126029B (en) A method, device, computer equipment and storage medium for generating an electronic document
JP3898645B2 (en) Form format editing device and form format editing program
JP2019169182A (en) Information processing device, control method, and program
JP4031189B2 (en) Document recognition apparatus and document recognition method
CN110390323B (en) Information processing apparatus and computer readable medium
US20210073552A1 (en) Information processing apparatus and non-transitory computer readable medium storing program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant