
CN113887484B - Card type file image identification method and device


Info

Publication number
CN113887484B
CN113887484B (application CN202111219855.XA)
Authority
CN
China
Prior art keywords
image
character
file
characters
card
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111219855.XA
Other languages
Chinese (zh)
Other versions
CN113887484A (en)
Inventor
吴静垠
俞希林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianjin Network Information Technology (Shanghai) Co., Ltd.
Original Assignee
Qianjin Network Information Technology (Shanghai) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qianjin Network Information Technology (Shanghai) Co., Ltd.
Priority to CN202111219855.XA
Publication of CN113887484A
Application granted
Publication of CN113887484B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/109 Font handling; Temporal or kinetic typography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to a card-type file image recognition method and device. The method comprises the following steps: performing character recognition on a target card-type file image to obtain a file character set, where the characters include one or more of text characters, numbers, and symbols; performing image processing on the target card-type file image to obtain at least the image feature of each character; extracting the characters of the file character set according to semantics, and merging and/or splitting them according to at least their semantic features and image features to obtain field contents of multiple categories; and generating a file recognition text that includes the field contents of the multiple categories. By combining these auxiliary cues, the invention recognizes the text, numbers, and symbols in a card-type file image more accurately and extracts categories from the recognized text, giving the invention a wider range of application.

Description

Card type file image identification method and device
Technical Field
The invention relates to the technical field of information processing, in particular to a card type file image identification method and device.
Background
A business card is a carrier of personal information that makes it easy to exchange contact details, and it plays an important role in daily business and social activities. After exchanging business cards, people usually need to digitize the important information on them, that is, convert received paper business cards into electronic business cards or store the business card data on a phone or computer. In addition, in some application scenarios, card owners need to convert their own paper business cards into electronic business cards before the content on the cards can be edited. In other scenarios, when personal data is collected, the data on a personal business card can be gathered and used as the specific scenario requires.
At present, most business card recognition methods adopt Optical Character Recognition (OCR) technology. OCR analyzes an image file that carries text information to recover its characters and layout. It mainly comprises image preprocessing, character detection, and text recognition. Image preprocessing usually corrects imaging problems of the picture; common preprocessing steps include geometric transformations (perspective, warping, rotation, etc.), distortion correction, deblurring, image enhancement, and illumination correction. Character detection locates the position, extent, and layout of the text, and generally includes layout analysis, text-line detection, and the like; its main task is to determine where text lies in the image. Text recognition then, on the basis of the detection, converts the text-image information into character data. Recognition may classify single characters by template matching, or use a deep learning model that brings in context information to improve accuracy.
By application scenario, OCR technologies divide into dedicated OCR that recognizes a specific scene and general OCR that recognizes a variety of scenes. For general OCR applied to natural scenes, an overly rich background; low brightness, low contrast, uneven illumination, perspective deformation, or occlusion in the picture; distortion, folds, or inversion in the text layout; and varied fonts, sizes, weights, and colors of the characters all make text recognition difficult and keep accuracy low. The first problem of OCR business card recognition is therefore that accuracy suffers because background pictures, typesetting, font styles, font sizes, and character colors vary widely, so in existing business card OCR the user must proofread and verify the result after recognition finishes. Even when this first problem is overcome by various means, a second problem remains: conventional business card OCR stops once characters are recognized, yielding only an unstructured pile of the characters on the card, so contents such as the name, company name, and address still have to be identified manually. Such a character set cannot be used directly.
The same problem is present for some card-like documents similar to business cards.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a card-type file image recognition method and device that produce a card recognition text with explicit categories.
In order to solve the above technical problem, according to an aspect of the present invention, there is provided a card-type document image recognition method, including the steps of:
performing character recognition on the target card type file image to obtain a file character set, wherein the characters comprise one or more of characters, numbers, punctuations and special symbols;
the method for processing the target card type file image at least obtains the image characteristics of each character comprises the following steps:
extracting a character image area from the target card type file image;
segmenting the character image area to obtain a single character image area;
carrying out convolution, pooling and normalization operations on a single character image by using a CNN model so as to extract the image characteristics of the character, wherein the image characteristics of the character represent the font, color and height characteristics of the character;
extracting the characters in the file character set according to semantics, and merging and/or splitting the characters in the file character set at least according to semantic features and image features of the characters to obtain field contents of multiple categories, which specifically comprises the following steps:
pre-dividing characters in the file character set according to semantics to obtain a plurality of word segmentation sets;
comparing the image characteristics of the characters of the multiple word segments;
when the character image characteristics of two adjacent separated participles are the same, combining the two adjacent separated participles to form a new participle; when the character image characteristics of two adjacent connected participles are different, splitting the two connected participles;
determining the category of the participle based on the semantic features of the participle; and
generating a file identification text including the contents of the plurality of category fields.
According to another aspect of the present invention, the present invention further provides an apparatus for recognizing an image of a card-type document, including a character recognition module, an image feature extraction module, a semantic extraction module, and a layout module, wherein the character recognition module is configured to perform text recognition on a target card-type document image to obtain a document character set; the image feature extraction module is configured to perform image processing on the target card type file image, and at least obtain an image feature of each character, and specifically includes: extracting a character image area from a target card type file image; segmenting the character image area to obtain a single character image area; carrying out convolution, pooling and normalization operations on a single character image by using a CNN model so as to extract the image characteristics of the character, wherein the image characteristics of the character represent the font, color and height characteristics of the character; the semantic extraction module is connected with the character recognition module and the image feature extraction module, and is configured to merge and/or split characters in a character set according to at least semantic features and image features of the characters to obtain field contents of multiple categories when performing semantic extraction on the characters in the file character set, and specifically includes the following steps: pre-dividing characters in the file character set according to semantics to obtain a plurality of word segmentation sets; comparing the image characteristics of the characters of the multiple participles; when the character image characteristics of two adjacent separated participles are the same, combining the two adjacent separated participles to form a new participle; when the character image characteristics of two adjacent connected participles are different, splitting the two connected participles; determining the category of the participle based on the semantic features of the participle; the layout module is configured to layout the plurality of category field contents according to a preset format to generate a file identification text including the plurality of category field contents.
When semantic extraction is performed, the image features provide assistance, so that the text, numbers, symbols, and the like in a card-type file image can be recognized more accurately and categories can be extracted from the recognized text, giving the invention a wider range of application.
Drawings
Preferred embodiments of the present invention will now be described in further detail with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram of a card file image recognition method according to one embodiment of the invention;
FIG. 2 is a schematic view of a business card according to one embodiment of the present invention;
FIG. 3 is a schematic illustration of a business card in accordance with another embodiment of the invention;
FIG. 4 is a schematic illustration of a business card in accordance with yet another embodiment of the invention;
FIG. 5 is a functional block diagram of a business card recognition device in accordance with one embodiment of the present invention;
FIG. 6 is a schematic diagram of an application of a business card recognition apparatus in accordance with one embodiment of the present invention;
FIG. 7 is a business card recognition flow based on the application embodiment shown in FIG. 6;
FIG. 8 is a schematic illustration of the business card of FIG. 3 undergoing image and text detection;
FIG. 9 is a flow diagram of a method of rectifying an image according to one embodiment of the invention;
FIG. 10 is a flow diagram of a method of rectifying an image according to another embodiment of the invention;
FIG. 11 is a flow diagram of a method of rectifying an image according to yet another embodiment of the invention;
FIG. 12 is a pictorial diagram of an original identification card, in accordance with one embodiment of the present invention;
FIG. 13 is a schematic diagram of an identification card image cropped from an original picture according to an embodiment of the invention;
FIG. 14 is a schematic diagram of an identification card image with a first keypoint predicted, according to an embodiment of the invention;
FIG. 15 is a diagram of a standard positive identification card image with a first keypoint labeled, according to one embodiment of the invention;
FIG. 16 is a schematic illustration of a positive identification card image obtained after rectification according to an embodiment of the invention;
FIG. 17 is a functional block diagram of an image rectification module according to an embodiment of the present invention;
FIG. 18 is a functional block diagram of an image rectification module according to another embodiment of the present invention; and
FIG. 19 is a functional block diagram of an image rectification module according to yet another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In the following detailed description, reference is made to the accompanying drawings that form a part hereof and in which is shown by way of illustration specific embodiments of the application. In the drawings, like numerals describe substantially similar components throughout the different views. Various specific embodiments of the present application are described in sufficient detail below to enable those skilled in the art to practice the teachings of the present application. It is to be understood that other embodiments may be utilized and structural, logical or electrical changes may be made to the embodiments of the present application.
The invention provides a method and a device for recognizing card-type file images, where the card-type files include but are not limited to business cards, identity cards, passports, and the like. For each type of file, the invention presets corresponding category fields that represent the semantics of the words on the file. For business cards, for example, the preset category fields include name, position, address, telephone, e-mail address, and the like. For an identity card, the preset category fields include name, gender, ethnicity, national identification number, and the like. The invention can recognize the field contents of these categories from the files, which saves filling-in time when user information is collected and extracted, and also provides good data resources for other application programs.
Example one
Fig. 1 is a flow chart of a business card image recognition method according to an embodiment of the present invention, which describes the recognition method and device of the invention in detail, taking a business card as one embodiment of a card-type file. The recognition method includes the following steps:
and S1, acquiring a business card image to be identified. According to the application scenario of the present invention, the source of the business card image to be identified includes the following various ways: reading the photographed and stored business card image from the mobile phone photo album, photographing the business card by using the mobile phone camera, and receiving the business card image sent from other programs, modules and devices.
S2: rectify the business card image to be recognized; the details are described later and are not repeated here.
S3: perform character recognition on the target business card image to obtain a file character set. Various algorithms may be used here, for example: extract the character image regions from the business card image with an edge detection algorithm, cut single-character images out of each region with a projection method, and then classify each single-character image with a Convolutional Neural Network (CNN) model to obtain the character, number, punctuation mark, or special symbol it depicts. Other recognition methods may also be used, such as the Convolutional Recurrent Neural Network (CRNN) OCR algorithm and the attention-based OCR algorithm from end-to-end deep learning OCR, which recognize a text-line image through the sequence features learned by their models and so obtain a whole line of text at once.
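The edge detection plus projection segmentation described above can be sketched as follows. This is a minimal illustration assuming OpenCV; the Canny thresholds, the dilation kernel, and the ink-column splitting rule are assumed values, not parameters taken from the patent.

```python
import cv2
import numpy as np

def segment_characters(card_image):
    """Locate text regions with edge detection, then split each region
    into single-character images using a vertical projection profile."""
    gray = cv2.cvtColor(card_image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)                      # edge detection
    # Dilate so the strokes of one text line merge into a single blob.
    dilated = cv2.dilate(edges, np.ones((3, 15), np.uint8))
    contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    chars = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        line = gray[y:y + h, x:x + w]
        binary = cv2.threshold(line, 0, 255,
                               cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
        # Vertical projection: columns with no ink separate characters.
        profile = binary.sum(axis=0)
        in_char, start = False, 0
        for col, v in enumerate(profile):
            if v > 0 and not in_char:
                in_char, start = True, col
            elif v == 0 and in_char:
                in_char = False
                chars.append(line[:, start:col])
        if in_char:                       # flush a trailing character
            chars.append(line[:, start:])
    return chars
```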
S4: perform image processing on the target business card image to obtain at least the image feature of each character. For example, the character image regions are first extracted from the target business card image and divided into single-character regions, at which point the coordinate position and the height and width of each character are obtained. A CNN model then applies convolution, pooling, normalization, and similar operations to each character-region image to extract the character's image features. These features express, to some extent, the character's font, color, height, and similar attributes.
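A hedged sketch of such a per-character feature extractor: a small convolutional network whose convolution, normalization, and pooling stages map one character crop to a fixed-length vector. The layer widths and the 64-dimensional output are assumptions for illustration, not the patent's specification.

```python
import torch
import torch.nn as nn

class CharFeatureNet(nn.Module):
    """Maps a single-character image crop to a fixed-length feature
    vector intended to reflect font, color, and height cues."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),   # convolution
            nn.BatchNorm2d(16),               # normalization
            nn.ReLU(),
            nn.MaxPool2d(2),                  # pooling
            nn.Conv2d(16, 32, 3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling to 1x1
        )
        self.head = nn.Linear(32, feat_dim)

    def forward(self, x):                     # x: (N, 3, H, W) crops
        return self.head(self.body(x).flatten(1))
```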
S5: extract the characters of the file character set according to semantics to obtain field contents of multiple categories. In one embodiment, the characters in the file character set are first pre-segmented according to semantics into several participles. Taking the business card in FIG. 2 as an example, when the recognized first line of text is "Zhang San General Manager" ("张三总经理"), semantic segmentation yields three participles: "Zhang" (张), "San" (三), and "General Manager" (总经理). In the image, "Zhang" and "San" are separated for typesetting reasons, while "San" and "General Manager" run together; semantic segmentation therefore splits the line, according to the semantic features of the words, into participles with definite semantic content. Next, the image features of the five characters are consulted: the features of "Zhang" and "San" are the same, the features of the three characters of "General Manager" are the same, and the two groups differ clearly from each other. The image features represent character height, font, and color; "the same" means the character heights are similar, the fonts or styles are similar, and the colors are similar or correlated. Accordingly, "Zhang" and "San" can be merged together while the three characters of "General Manager" should be split from "San". Then, according to the semantic features of "Zhang San", those two characters are classified into the category "name", and "General Manager" is classified into the category "position". In this way the field content "Zhang San" of category "name" and the field content "General Manager" of category "position" are extracted from the first line of text "Zhang San General Manager".
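The merge/split rule of this example can be sketched as follows, assuming each participle carries the mean image-feature vector of its characters; the cosine-similarity test and its threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np

def regroup(tokens, feats, threshold=0.9):
    """tokens: pre-segmented participles; feats: one mean image-feature
    vector per participle. Merge adjacent participles whose character
    styles match; keep a boundary where they differ."""
    def similar(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)) > threshold

    groups = [[0]]
    for i in range(1, len(tokens)):
        if similar(feats[i - 1], feats[i]):
            groups[-1].append(i)      # same font/color/height: merge
        else:
            groups.append([i])        # style change: split here
    return ["".join(tokens[j] for j in g) for g in groups]
```

Under these assumptions, `regroup(["张", "三", "总经理"], feats)` returns `["张三", "总经理"]` when the features of "张" and "三" match each other and differ from those of "总经理".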
For example, for the category "address", characters belonging to the same address may be detected across two adjacent lines. In this case the image features of the two lines of characters are compared, and since the image features are the same, the characters of the two lines can be merged together.
In one embodiment, a deep learning algorithm may be used: a deep learning model processes each line of text together with its image features and extracts field contents of different categories from the line. The model's classification categories can be set to the field categories commonly used on business cards, such as "name", "position", "company name", "company address", "mobile phone", "fax", "WeChat ID", and "homepage", and the model is trained on a labeled training set so that it extracts fields of the corresponding categories from the input characters and their image features.
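The patent does not name a concrete architecture for this model. One plausible realization, sketched here purely as an assumption, is a bidirectional LSTM tagger that consumes each character's embedding concatenated with its image-feature vector and emits a category per character; the category list and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn

CATEGORIES = ["name", "position", "company name", "company address",
              "mobile phone", "fax", "wechat id", "homepage", "other"]

class FieldTagger(nn.Module):
    """Tags each character of a text line with a field category, using
    the character embedding concatenated with its image feature."""
    def __init__(self, vocab_size, char_dim=64, img_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, char_dim)
        self.rnn = nn.LSTM(char_dim + img_dim, hidden,
                           bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, len(CATEGORIES))

    def forward(self, char_ids, img_feats):
        # char_ids: (N, T); img_feats: (N, T, img_dim)
        x = torch.cat([self.embed(char_ids), img_feats], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)   # (N, T, num_categories) per-character logits
```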
Step S6: generate a file recognition text containing the field contents of the multiple categories, for example a TXT-format or Word-format file. Depending on the requirements, file recognition texts with different layouts or presentation forms can be generated according to a preset format. In one embodiment, the corresponding categories on the business card are expressed in text. For example, the text recognized from the business card shown in FIG. 2 is:

Zhang San  General Manager  XXX Company  No. 1 Renmin Road, XX City, XX Province  zhangsan@163.com

The business card text generated through the above steps is:

Name: Zhang San; Position: General Manager; Company: XXX Company; Address: No. 1 Renmin Road, XX City, XX Province; Email: zhangsan@163.com.
Here "Name", "Position", and so on are field names, characters that do not appear on the original business card; they are added to the generated business card file to express its classification clearly.
In another embodiment, a code may represent each field name, which makes the result easier for other applications to call, for example "01" for "name", "02" for "position", and so on. When the current business card lacks the field content of some preset category, a default value such as the number "0" or the word "none" can be used.
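A minimal sketch of this coded output, assuming a Python dictionary as the field store; in the code table below, only "01" and "02" come from the text, and the remaining codes are illustrative assumptions.

```python
FIELD_CODES = {"name": "01", "position": "02", "company": "03",
               "address": "04", "email": "05"}   # codes beyond 01/02 assumed

def encode_card(fields):
    """Emit one 'code value' pair per preset category; categories
    missing from the card fall back to the default value '0'."""
    return "\n".join(f"{code} {fields.get(cat, '0')}"
                     for cat, code in FIELD_CODES.items())

print(encode_card({"name": "张三", "position": "总经理"}))
```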
Example two
In some business cards, as shown in FIG. 3, a small telephone icon is placed before the telephone number, a fax-machine icon before the fax number, an envelope icon before the e-mail address, and so on, to express the meaning of the text that follows. These marker images help classify the text on the business card correctly. Thus, based on the method flow of Example One, when the target business card image undergoes image processing in step S4, besides extracting the image features of each character, the image region in front of the text is examined to determine whether a marker image, and hence a corresponding category, is present. Specifically, when the character image regions are extracted line by line from the target business card image, the marker image region before the text is also extracted, and, to distinguish it from the other characters, the first character of a continuous run of text is designated the head character. The method then identifies whether the image in the marker image region is one of the known marker images; if a marker image is recognized, the corresponding category is determined from it, and the correspondence between the marker image and the characters after it is established. As shown in FIG. 3, "Mr. Liu / President" in the first row is one continuous text image, and the character "Liu" is a head character with no marker image before it. There are marker images before the head character "Shan" in the second row "Shandong Road, Shibei District, Qingdao City, Shandong Province", before the head character "+" in the third row "+8613888888888", before the head character "z" in the fourth row "zidingyi@qq.com", and before the head character "w" in the fifth row "www.zidingyi.com". The image features of each marker image are extracted, and from the internally preset correspondence between marker images and categories it can be determined that the category before "Shan" is "address", the category before "+" is "telephone", the category before "z" is "mail address", and the category before "w" is "homepage".
When the characters in the file character set are extracted according to semantics in step S5, the method not only uses the semantic features and image features of the characters as in the previous example but also refers to the category of the marker image. For example, when the second-row text "Shandong Road, Shibei District, Qingdao City, Shandong Province" and the image features of each of its characters are input to the deep learning model, the relationship between "Shan" and the preceding marker image, and that marker's "address" category, are input as well. When the deep learning model extracts the second line, semantic recognition, the image features of each character, and the "address" category before the head character "Shan" together determine that "Shandong Road, Shibei District, Qingdao City, Shandong Province" is the field content of category "address".
Similarly, for the fourth-row character set "zidingyi@qq.com", the envelope icon before the head character "z" determines that the marker image before "z" is of the "mail address" category. When the row content is extracted, semantic recognition, the image features of each character, and the "mail address" marker image before "z" together determine that "zidingyi@qq.com" is the field content of category "mail address".
Example three
In other business cards, as shown in FIG. 4, the card may contain words such as "Address:", "Telephone:", "Mail address:", and "Homepage:" that state the category of the text or characters that follow; these are referred to here as flag characters. When semantic extraction is performed in step S5, these flag characters may be consulted for classification. For example, when the characters on a business card are recognized line by line into several line character sets, each line character set is segmented semantically into participles. For the recognized line character set "Telephone: +8618888888888", semantic segmentation yields, in order, the participles "Telephone", ":", and "+8618888888888". Recognizing this participle set shows that "Telephone" is the flag character for the category "telephone number". Semantic analysis of "+8618888888888", with reference to the image features of each digit, shows that it is a telephone number; combining this with the meaning of the flag character "Telephone" and the following ":", the character set "Telephone: +8618888888888" is taken as the field content of category "telephone number".
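A minimal sketch of the flag-character rule, assuming a preset lookup table from flag words to categories; the table entries and the tokenized input are illustrative.

```python
FLAG_WORDS = {"电话": "telephone number", "地址": "address",
              "邮箱": "mail address", "主页": "homepage"}  # assumed table

def classify_line(tokens):
    """If the first participle is a known flag character, the rest of
    the line is the field content for that flag's category."""
    head = tokens[0].rstrip("：:")
    if head in FLAG_WORDS:
        content = "".join(tokens[1:]).lstrip("：:")
        return FLAG_WORDS[head], content
    return None, "".join(tokens)

print(classify_line(["电话", "：", "+8618888888888"]))
# -> ('telephone number', '+8618888888888')
```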
Fig. 5 is a schematic block diagram of a business card recognition apparatus according to an embodiment of the present invention, which includes a character recognition module 1, an image feature extraction module 2, a semantic extraction module 3, and a layout module 4. In one embodiment, the image feature extraction module 2 includes an image extraction unit 21, a character image feature extraction unit 22, a marker image feature extraction unit 23, and a marker image determination unit 24. The image extraction unit 21 is configured to extract one or more character image regions from the target business card image and send them to the character recognition module 1, which performs character recognition on them to obtain the corresponding character sets. In one embodiment, the image extraction unit 21 extracts the character image regions line by line from the target business card image; a character image region usually consists of continuous characters, numbers, letters, punctuation marks, and the like. Besides the character image regions, the image extraction unit 21 also extracts the image region in front of the text, referred to in this embodiment as the marker image region. The character image feature extraction unit 22 is connected to the image extraction unit 21 and extracts the image feature of each character from the character image regions. The marker image feature extraction unit 23 is connected to the image extraction unit 21 and is configured to extract the image features of the marker image region. The marker image determination unit 24 is connected to the marker image feature extraction unit 23 and is configured to identify whether the marker image region contains a marker image, determine the category corresponding to the marker image when one is identified, and establish the correspondence between the marker image and the characters after it.
The semantic extraction module 3 is connected to the character recognition module 1 and the image feature extraction module 2 and is configured to extract the characters of the file character set according to semantics to obtain character sets of multiple categories. In one embodiment, the semantic extraction module 3 includes a semantic pre-segmentation unit 31, a classification unit 32, and a flag character recognition unit 33. The semantic pre-segmentation unit 31 pre-segments the characters of the file character set according to semantics into several participles; as in the foregoing embodiments, the character set "Zhang San General Manager" is segmented into the participles "Zhang", "San", and "General Manager", and semantic segmentation of the character set "Telephone: +8618888888888" yields, in order, the participles "Telephone", ":", and "+8618888888888". The flag character recognition unit 33 is connected to the semantic pre-segmentation unit 31 and the classification unit 32 and is configured to recognize whether the participles produced by the semantic pre-segmentation unit 31 include a flag character and to send the flag character to the classification unit 32, for example recognizing "Telephone" as the flag character in the character set "Telephone: +8618888888888".
The classification unit 32 is connected to the semantic pre-segmentation unit 31, the flag character recognition unit 33, and the character image feature extraction unit 22 and marker image determination unit 24 of the image feature extraction module 2. When there is no marker image and no flag character, the classification unit 32 merges or splits the characters of the participles according to the semantic features of the participles and the image features of their characters to obtain the field contents of the corresponding categories. When the flag character recognition unit 33 recognizes a flag character, the classification unit 32 extracts the field content of the corresponding category from the semantic features, the image features of the characters, and the flag character; when the marker image determination unit 24 determines that a marker image is present, the classification unit 32 determines the field content of the corresponding category from the semantic features of the characters, their image features, and the corresponding marker image.
The layout module 4 lays out the field contents of the multiple categories according to a preset format to generate the file recognition text, adapting the recognition result for use in various scenarios, for example representing each category by text or by a code, placing the field content of one category on each row, and so on.
To improve recognition accuracy, the invention further includes an image rectification module 5 configured to rectify the acquired business card image to be recognized, by cropping, stretching, translation, rotation, image perspective transformation, and the like, to obtain a forward target business card image.
Fig. 6 is a schematic diagram of an application of the business card recognition apparatus according to an embodiment of the present invention. In this embodiment, the business card recognition device 100 is connected to a data platform 200 that is capable of collecting various data, including business card images. After the data platform collects the business card image, the content in the business card needs to be collected, so the data platform 200 sends the business card image to the business card recognition device 100, and after the business card recognition device 100 obtains the file recognition text, the file recognition text is sent to the data platform 200. Referring to the business card recognition flow shown in fig. 7, taking the business card shown in fig. 3 as an example, the business card image recognition method is described as follows:
and step S100, acquiring a business card image to be identified. In this embodiment, a business card image is received from the data platform 200.
Step S110, determining whether the business card image is deformed, if the business card image is not deformed and is a positive rectangular image, executing step S120, and if the business card image is deformed seriously, executing step S111 to perform image correction, wherein the image correction method may be various, and will not be described herein again.
In step S120, text detection is performed on the business card image to obtain a plurality of text image regions, such as text image regions 101-105 shown in fig. 8.
Step S130, extracting the sign image regions 201-204 before the first character in each text image region.
In step S140, characters of the text image area are recognized. In the present embodiment, the text image regions 101 to 105 are input to the deep learning model one by one, so that a plurality of rows of corresponding character sets are recognized, for example, the character set 101 "mr. Liu/president" corresponding to the text image region 101, the character set 102 "shandong road in north of the kan Qingdao city, shandong province" corresponding to the text image region 102, the character set 103"+8 13888888 8" corresponding to the text image region 103, the character set 104"z i d i n g y i@q q.c o m" corresponding to the text image region 104, and the character set 105"w w.z i d i n g y i.c o m" corresponding to the text image region 105.
Step S150, extracts an image feature of each character of the text image region.
Step S160 identifies the marker image regions 201-204. By comparison with the preset marker image, it is determined that the category corresponding to the marker image region 201 is "address", the category corresponding to the marker image region 202 is "telephone", the category corresponding to the marker image region 203 is "mail address", and the category corresponding to the marker image region 204 is "home".
Step S170: establish the correspondence between marker images, character sets, and categories. For example, marker image region 201 corresponds to character set 102 and the category "address", region 202 to character set 103 and the category "telephone", region 203 to character set 104 and the category "mail address", and region 204 to character set 105 and the category "homepage".
Step S180: semantically extract the characters of the file character set with reference to the marker images and the character image features to obtain field contents of multiple categories. For example, the character set 101 "Mr. Liu / President" is first segmented semantically into three participles: "Mr. Liu", "/", and "President". Consulting the image features of these characters, the features of the characters of "Mr. Liu" are consistent with one another, the features of the characters of "President" are consistent with one another, the two groups differ markedly from each other, and "/" is a separator; therefore the characters of "Mr. Liu" are merged together and the characters of "President" are merged together. From the semantic features of the merged participles "Mr. Liu" and "President", the field content "Mr. Liu" of category "name" and the field content "President" of category "position" are determined.
For the character set 102 "Shandong Road, Shibei District, Qingdao City, Shandong Province", processing proceeds as for character set 101: it is first segmented into the participles "Shandong Province", "Qingdao City", "Shibei District", and "Shandong Road". Semantic recognition determines that these participles belong to the category "address", and combining the correspondence of marker image, character set, and category confirms that character set 102 belongs to the category "address". Since the image features of the characters in character set 102 are the same, the participles are merged to give "Shandong Road, Shibei District, Qingdao City, Shandong Province" as the field content of category "address". Similarly, "+8613888888888" is determined to be the field content of category "telephone number", "zidingyi@qq.com" the field content of category "mail address", and "www.zidingyi.com" the field content of category "homepage".
Step S190: generate the business card recognition text and send it to the data platform 200. In this embodiment, so that the data platform 200 can conveniently use the field contents of the recognition text, each category is encoded, for example represented by two digits. If "00" represents the category "name", "01" the category "position", "02" the category "address", "03" the category "telephone number", "04" the category "homepage", and "05" the category "mail address", the recognition text of the business card shown in FIG. 3 is obtained as:
00
Mr. Liu
01
President
02
Shandong Road, Shibei District, Qingdao City, Shandong Province
03
+8613888888888
04
www.zidingyi.com
05
zidingyi@qq.com
The business card recognition text is sent to the data platform 200, and the business card recognition process ends.
Although the business card recognition device in the foregoing embodiment is applied to a data platform, it may also be applied to a mobile terminal, installed on a user's mobile terminal and/or PC in the form of an APP for individual use. The business card image recognition method provided by the invention uses several auxiliary cues to recognize the text, numbers, special characters, and the like in a business card image more accurately, and extracts categories from the recognized text so that a computer can understand the field contents of the card, widening the invention's range of application.
Although business cards are taken as the example, the methods and apparatuses of the above embodiments can also recognize documents such as identity cards, passports, and professional qualification or work certificates and obtain the corresponding recognition texts.
In order to accurately perform character recognition, correction of an original document image is a very critical step, and the present invention provides various image correction methods.
Image correction example one
In this embodiment, edge detection is performed on the original file image and the redundant image outside the file is cut off; the cropped image is then stretched, rotated, and so on to obtain a forward rectangular business card image, which is used for character recognition. To distinguish it from the images used in the subsequent processing, this forward rectangular business card image used for character recognition is called the target business card image.
Image correction example two
The present invention also provides another method for correcting an image, which is a flow shown in fig. 9:
step S21, four vertex positions of the original file image are obtained. The method comprises the steps of obtaining four boundaries of an effective image by adopting linear detection and screening logic of the image, and obtaining an intersection point of the two boundaries to obtain four vertexes of the business card.
Step S22: obtain the mapping matrix. The four detected vertices are mapped to the corresponding vertices of a forward business card, and the mapping matrix is obtained from this mapping relationship.
Step S23: apply an image perspective transformation to the effective image of the original file to be recognized. That is, according to the mapping matrix, the forward target file image is obtained by perspective-transforming the effective image of the original file to be recognized.
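A sketch of steps S21 to S23 using OpenCV, assuming the four card vertices have already been detected; the output size is an arbitrary assumption.

```python
import cv2
import numpy as np

def rectify_card(image, corners, out_w=900, out_h=540):
    """corners: the four detected card vertices, ordered top-left,
    top-right, bottom-right, bottom-left."""
    src = np.float32(corners)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    M = cv2.getPerspectiveTransform(src, dst)   # the mapping matrix
    return cv2.warpPerspective(image, M, (out_w, out_h))
```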
Image correction example three
For some documents that carry a human figure, or images with a regular shape, the mapping matrix for the perspective transformation can be determined from the deformation of those images. The portrait and the regularly shaped images are hereinafter called usable images. As shown in the flowchart of FIG. 10, taking a business card as an example, the image rectification method includes the following steps:
and step S21a, detecting edges of the business card.
In step S22a, the image in the business card is extracted and recognized. For example, one or more images in a picture are identified by object detection methods. The image may be a portrait, a LOGO image, different color blocks in the background for distinguishing content fields, etc.; the portrait in the name card is usually a positive facial image of the head and a positive facial image of the half body, and is perpendicular to one side of the name card, such as the long side of the name card. Some business cards have regular color blocks, two-dimensional codes and the like.
And step S23a, judging whether the identified image is a usable image, if so, executing step S24a, and if not, abandoning the image correction by adopting the method.
In step S24a, a key point is selected on the available image. For example, when the image is a portrait, a plurality of points on the face feature, such as a plurality of contour points on two eyes, a middle point and a contour point on a nose, and a plurality of contour points on a center point of a mouth, can be obtained through a face alignment algorithm. If the portrait has a rectangular background, 4 vertices of the rectangular background box are obtained. If the two-dimensional code image is a two-dimensional code image, 4 vertexes of the boundary are taken as detection points, if the two-dimensional code image is a rectangular color block, 4 vertexes of the rectangular color block are taken as detection points, and if the two-dimensional code image is a circle, the two endpoints of the two vertical diameters are taken as detection points.
In step S25a, key points on the usable image are analyzed, and the usable image is corrected to obtain a standard usable image. For example, for the portrait, whether the face is in the forward direction or the lateral direction is determined according to the face features, when the face is in the forward direction, points on a symmetry axis on the face, such as a nose center point, are selected, and the position relationship between corresponding points on two sides of the nose center point and the nose center point is analyzed, so that the displacement and the rotation angle in the x axis and the y axis are determined, and the positions after correction are obtained through transformation. By this method, a corrected portrait is obtained. And similarly selecting points on the symmetry axis for the two-dimensional code image or color blocks such as rectangles and circles, analyzing the change of the points on two sides of the symmetry axis so as to determine the image deformation rule, and correcting according to the rule to obtain the corrected usable image.
Step S26a: select four pairs of corresponding key points on the usable images before and after rectification, and determine the transformation matrix from them. For example, a 3x3 transformation matrix is computed according to Equation 1-1:

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = M \begin{bmatrix} u \\ v \\ w \end{bmatrix}, \qquad M = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \tag{1-1}$$

where $(u, v, w)$ are the homogeneous coordinates of a point in the usable image before rectification, $(x', y', w')$ are homogeneous coordinates, and $M$ is the transformation matrix. Since the invention deals with two-dimensional images, $w$ and $a_{33}$ are constantly 1, and:

$$x = x'/w' \tag{1-2}$$

$$y = y'/w' \tag{1-3}$$

where $(x, y)$ are the coordinates in the rectified usable image. From Equations 1-1, 1-2, and 1-3 it can be derived that:

$$x = \frac{a_{11}u + a_{12}v + a_{13}}{a_{31}u + a_{32}v + a_{33}} \tag{1-4}$$

$$y = \frac{a_{21}u + a_{22}v + a_{23}}{a_{31}u + a_{32}v + a_{33}} \tag{1-5}$$

From Equations 1-4 and 1-5, when the coordinates of 4 pairs of corresponding points are known, the resulting 8 equations determine the 8 unknown elements of the transformation matrix $M$. In this embodiment, four pairs of corresponding key points are selected from the usable images before and after rectification, from which the transformation matrix $M$ is obtained.
Step S27a: transform the original business card image with the transformation matrix M to obtain the forward target business card image.
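The eight equations implied by Equations 1-4 and 1-5 can also be solved directly. A NumPy sketch, fixing $a_{33} = 1$ as in the text:

```python
import numpy as np

def perspective_matrix(src_pts, dst_pts):
    """Solve the eight linear equations from (1-4)/(1-5) for the eight
    unknowns of M (a33 fixed to 1), given exactly 4 point pairs."""
    A, b = [], []
    for (u, v), (x, y) in zip(src_pts, dst_pts):
        A.append([u, v, 1, 0, 0, 0, -u * x, -v * x]); b.append(x)
        A.append([0, 0, 0, u, v, 1, -u * y, -v * y]); b.append(y)
    m = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(m, 1.0).reshape(3, 3)
```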
The above embodiment takes a rectangular business card as the example, but the business card may of course have other shapes. In that case the key points obtained are not the four vertices of a rectangle but positions matching the shape: for an oval business card, the two endpoints of the major axis and the two endpoints of the minor axis; for a heart-shaped card, the highest and lowest points of the vertical line of symmetry and the farthest points on either side of it. The method thus adapts to business card images of various standard shapes.
Image correction example four
In this embodiment, an identification card picture is taken as an example, and a flowchart shown in fig. 11 is used to describe the method provided by the present invention as follows:
step S1a, an effective image area is obtained from an original picture. As shown in fig. 12, the original picture 1a includes a background image area 10a more or less in addition to an image area 11a of the photographic subject. In order to avoid that too many background image areas 10a affect the correction effect, first an effective image area of the subject is detected from the original picture 1 a. In one embodiment, image detection is performed using a Yolo model, so that the effective image area 12a where the identification card (subject) is located is detected and cut out from the original picture 1a to obtain the image shown in fig. 13.
Step S2a: predict key points at preset positions in the effective image area 12a. Different numbers of key points at different positions can be predicted for different types of file images. Together, the set of key points determines how an image is deformed: its deformation in the horizontal and vertical directions, its rotation angle, and so on. The more key points, the more accurately the deformation is determined, but the more complex and resource-hungry the computation. With a suitable number of key points, high processing speed and a small resource footprint can therefore be obtained while the accuracy requirement is still met.
Key points example one
In one embodiment, when the rectification object is rectangular, as an identity card is, the key points to be predicted include at least the four vertices of the rectangle; to obtain more deformation information, points at certain other positions of the image should also be predicted. In one embodiment, the 13 first key points shown in FIG. 14 are predicted: the four vertices of the rectangle, the midpoints of the four edges, the intersection of the two diagonals, and the midpoints between the diagonal intersection and the four vertices. In one embodiment, the 13 key points are predicted with a CPN (Cascaded Pyramid Network) model (to distinguish them from key points determined by other methods, the key points of this embodiment are called first key points). The CPN consists mainly of a GlobalNet and a RefineNet. The GlobalNet is a U-shaped convolutional neural network based on ResNet (residual network) used for feature extraction; in this embodiment it performs coarse key-point detection, finding key points with obvious features that are easy to locate. The RefineNet detects the key points that are harder to find by upsampling and combining the features of every layer of the GlobalNet feature extractor, obtaining more detailed image features and detecting key points from them.
In order to detect the preset number of key points at the preset positions, the CPN network must be trained on samples to obtain the CPN model. First an image sample set is constructed: according to the type of rectification object, key points are labeled on pictures of that object, for example the 13 key points of FIG. 14 for an identity card image, to build image samples. The sample set should represent as many kinds of image deformation as possible to increase prediction accuracy. The image samples are then fed to the CPN network, which is trained to learn the locations of these key points. During training, monitoring indicators such as loss and accuracy are observed to judge the current state of the model, and the hyperparameters, for example the learning rate (Lr), batch size, optimizer, number of iterations, and activation function, are adjusted in time so that training proceeds more scientifically and resource utilization improves. Training and hyperparameter tuning finally yield an optimal CPN model that can accurately predict the required key points from any image. This embodiment uses the CPN model to predict the required key points from the effective image area 12a.
Key points example two
When the photographic subject has a standard layout and standard parameter items, the key points can be obtained from those parameter items, for example the items "name", "sex", "ethnic group", and "national identification number" on an identity card, or the items "surname", "given name", "date of birth", "validity period", and "place of birth", printed in Chinese and English, on a Chinese passport. These parameter items have fixed, standard styles and identical typesetting, so points placed at the parameter item positions can likewise yield the subsequent matrices. To distinguish them from the key points of the previous example, the points corresponding to the standard parameter item positions are called second key points. By training the CPN model, the corresponding second key points can be predicted from any picture of the subject. For example, on the front-side picture of an identity card, a point is predicted before each of "name", "sex", and "ethnic group", a point at each end of the identification number, and a point before or after each of the year, month, and day; on the reverse-side picture, the two endpoints of the horizontal and of the vertical symmetry axis of the emblem in the upper-left corner are taken, a point is predicted before the first character and after the last character of the first line of text, likewise for the second line, and a point is predicted before each of the "issuing authority" and "validity period" items.
Similarly, for a Chinese passport, key points are set at the four vertices of the photo, at the two end points of the top line of characters and of the bottom line of characters, and before the descriptions of several important middle items, such as the surname, given name, date of birth, date of expiry and place of birth. For the qualification certificates of particular industries, such as an accounting qualification certificate or a lawyer's practising certificate, a corresponding CPN model can likewise be configured to predict the required second key points.
Key point embodiment three
When the photographed object carries a portrait photo in a standard style, the photo can be used to determine key points, referred to here as third key points. The portrait in the original picture is identified, the rectangular frame in which it lies is determined from it, and at least four points on that rectangular frame are selected. For example, the personal photo on a passport has a uniform size and a rectangular boundary that is distinct from the background; the same holds for the photo on an identity card. By identifying the region of the personal photo on the photographed object, the rectangular frame of the portrait is obtained, and its vertices, edge midpoints and the like are predicted as third key points.
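As a minimal illustration (not from the patent), once any face- or photo-detection step has produced the bounding box (x, y, w, h) of the portrait, third key points on that rectangular frame can be laid out as follows; the choice of vertices plus edge midpoints is one of the options named above:

    import numpy as np

    def avatar_keypoints(x: float, y: float, w: float, h: float) -> np.ndarray:
        """Return the 4 vertices and 4 edge midpoints of a detected portrait box."""
        tl, tr = np.array([x, y], float), np.array([x + w, y], float)
        br, bl = np.array([x + w, y + h], float), np.array([x, y + h], float)
        mids = [(tl + tr) / 2, (tr + br) / 2, (br + bl) / 2, (bl + tl) / 2]
        return np.vstack([tl, tr, br, bl] + mids)       # (8, 2) third key points

    print(avatar_keypoints(120, 60, 90, 120))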
Step S3a, acquiring the corresponding standard key points from a standard picture of the same type as the original picture. For the first key points, the 13 key points shown in fig. 15 are obtained from the standard picture. For the second key points, the standard key points are obtained from the positions of the corresponding parameter items in a standard picture of the same category. For the third key points, the standard key points are obtained from the corresponding positions on the portrait rectangle in a standard picture of the same category.
Step S4a, constructing a transformation matrix from the standard key points and the predicted key points; that is, the coordinates of the 13 points predicted from fig. 14 are mapped by a mapping calculation to the coordinates of the corresponding points of the regular identity document image shown in fig. 15, and the resulting mapping matrix is the transformation matrix for the perspective transformation. The calculation is as described in the foregoing embodiments and is not repeated here.
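The mapping calculation itself is the standard four-point perspective (homography) fit. The sketch below (an illustration, not the patent's formulas 1 to 6) builds and solves the 8x8 linear system that maps four predicted points (x, y) to four standard points (u, v) with the matrix element h33 fixed to 1; cv2.getPerspectiveTransform performs the same computation:

    import numpy as np

    def perspective_matrix(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
        """src, dst: (4, 2) arrays of corresponding points; returns the 3x3 matrix."""
        A, b = [], []
        for (x, y), (u, v) in zip(src, dst):
            # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), and likewise for v
            A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
            A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
            b.extend([u, v])
        h = np.linalg.solve(np.array(A, float), np.array(b, float))
        return np.append(h, 1.0).reshape(3, 3)           # h33 fixed to 1

    predicted = np.array([[42, 31], [598, 55], [611, 390], [28, 366]], float)  # illustrative CPN output
    standard = np.array([[0, 0], [640, 0], [640, 400], [0, 400]], float)       # standard picture corners
    M = perspective_matrix(predicted, standard)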
The key points in the present invention (the first, second and third key points) always number at least 4, so a transformation matrix M can be obtained from every 4 pairs of key points, and a linear regression over the resulting plurality of transformation matrices yields an optimal transformation matrix M, improving the precision and accuracy of the transformation.
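A hedged sketch of this fusion: the single call below performs a least-squares fit over all point pairs at once, a common and numerically simpler stand-in for the per-4-pair matrices plus regression described above (passing cv2.RANSAC instead of method 0 would additionally down-weight badly predicted key points); the coordinates are illustrative:

    import cv2
    import numpy as np

    predicted = np.float32([[42, 31], [598, 55], [611, 390], [28, 366],
                            [320, 43], [319, 378]])     # >4 key points from the CPN (illustrative)
    standard = np.float32([[0, 0], [640, 0], [640, 400], [0, 400],
                           [320, 0], [320, 400]])       # corresponding standard key points
    M, _ = cv2.findHomography(predicted, standard, 0)   # least-squares optimal 3x3 matrix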
Step S5a, performing the perspective transformation on the effective image area 12a of the original picture according to the transformation matrix M; after the coordinates of the pixel points of the effective image area 12a have been transformed, pixel values are assigned at the transformed coordinates, yielding the corrected forward image shown in fig. 16.
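A minimal sketch of this step, assuming M was obtained as in step S4a and the standard picture measures 640x400 pixels (the file names and sizes are illustrative):

    import cv2
    import numpy as np

    valid_area = cv2.imread("valid_area.jpg")            # cropped effective image area 12a
    predicted = np.float32([[42, 31], [598, 55], [611, 390], [28, 366]])
    standard = np.float32([[0, 0], [640, 0], [640, 400], [0, 400]])
    M = cv2.getPerspectiveTransform(predicted, standard) # transformation matrix from step S4a

    # warpPerspective transforms the pixel coordinates and assigns (resamples)
    # the pixel values at the transformed positions in a single call
    corrected = cv2.warpPerspective(valid_area, M, (640, 400))
    cv2.imwrite("corrected.jpg", corrected)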
Since the three methods above can yield at least three kinds of key points, any one kind may be used on its own: a first, second or third transformation matrix is determined from that kind of key point and its corresponding standard key points, and the original image is perspective-transformed with the first, second or third transformation matrix. Alternatively, when the original picture meets the conditions for obtaining the second and/or third key points, two or three transformation matrices are obtained from the two or three kinds of key points, and the optimal transformation matrix is then calculated from their respective weights. For example, when the first and second transformation matrices are obtained from the first and second key points, the weight of the first transformation matrix is taken as 0.8 and that of the second as 0.2. Taking the matrix element a_11 as an example: if the first element of the first transformation matrix is a_11^(1) and the first element of the second transformation matrix is a_11^(2), the weighted calculation gives a_11 = 0.8·a_11^(1) + 0.2·a_11^(2).
When the first, second and third transformation matrices are obtained from the first, second and third key points respectively, the weight of the first transformation matrix is taken as 0.7, that of the second as 0.1, and that of the third as 0.2. Taking the element a_11 as an example: if the first elements of the first, second and third transformation matrices are a_11^(1), a_11^(2) and a_11^(3), the weighted calculation gives a_11 = 0.7·a_11^(1) + 0.1·a_11^(2) + 0.2·a_11^(3).
Performing this weighted calculation on every element of the matrices yields the optimal transformation matrix, and performing the perspective transformation on the original image with this optimal transformation matrix produces a better correction effect.
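Because the weighting is element-wise with the same weight applied to every element of a given matrix, it reduces to a weighted sum of the matrices. A minimal sketch with the 0.7/0.1/0.2 weights from the example above (the identity matrices are placeholders for real M1, M2 and M3):

    import numpy as np

    M1 = np.eye(3)   # transformation matrix from the first key points (placeholder)
    M2 = np.eye(3)   # transformation matrix from the second key points (placeholder)
    M3 = np.eye(3)   # transformation matrix from the third key points (placeholder)

    # element-wise: a_ij = 0.7*a_ij^(1) + 0.1*a_ij^(2) + 0.2*a_ij^(3)
    M = 0.7 * M1 + 0.1 * M2 + 0.2 * M3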
Image rectification module embodiment one
Fig. 17 is a functional block diagram of a picture rectification module according to an embodiment of the present invention. This embodiment takes the identity card picture shown in fig. 12 as an example. The picture rectification module 5 comprises an image preprocessing module 51a, a keypoint prediction module 52a, a matrix calculation module 53a and a transformation module 54a. The image preprocessing module 51a is configured to obtain the effective image area from the original picture. In an embodiment, the image preprocessing module 51a is a Yolo model module, which uses the Yolo (You Only Look Once: Unified, Real-Time Object Detection) algorithm to extract features from the original image and make predictions from them, thereby obtaining the effective image area, which is cut out of the original image to yield a processable image such as the one shown in fig. 13. The keypoint prediction module 52a is connected to the image preprocessing module 51a and is configured to predict the key points at the preset positions in the effective image area 12a. In one embodiment, the keypoint prediction module 52a is a trained CPN model module that provides CPN models for predicting the key points of various certificate and document images, and selects the CPN model matching the type of the input image. In this embodiment the input image is an identity card image, and the CPN model outputs the 13 key points shown in fig. 14; when a standard passport image is input, the keypoint prediction module 52a employs a CPN model that predicts passport key points; when a standard form image is input, it employs a CPN model that predicts form key points, and so on. Depending on the photographed object in the image, the number and positions of the key points may also be set on standard parameter items rather than at the identity card key point positions shown in fig. 14, or key points corresponding to the portrait on the photographed object may be acquired. The matrix calculation module 53a is connected to the keypoint prediction module 52a; according to the category of the processed picture, it obtains the corresponding standard key points from the standard picture of that category and performs the mapping calculation on the predicted key points to construct the transformation matrix M. The calculation is as in step S4a of fig. 11 and is not repeated here. The transformation module 54a is connected to the image preprocessing module 51a and the matrix calculation module 53a respectively, and performs the perspective transformation on the effective image area of the original picture according to the transformation matrix M and formulas 1 to 6 to obtain the corrected picture.
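A minimal sketch of this pre-processing step (the box coordinates are illustrative stand-ins for what the detection model would output, and the file name is hypothetical):

    import cv2

    original = cv2.imread("original.jpg")        # original picture to be corrected
    x, y, w, h = 120, 80, 640, 400               # bounding box predicted by the Yolo-style detector
    valid_area = original[y:y + h, x:x + w]      # cropped effective image area handed to keypoint prediction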
Image rectification module embodiment two
Fig. 18 is a functional block diagram of a picture rectification module according to another embodiment of the present invention. In this embodiment, the original picture to be corrected is a certificate picture with a portrait and standard parameter items. Building on the embodiment shown in fig. 17, the keypoint prediction module 52a comprises a first keypoint prediction unit 521a, a second keypoint prediction unit 522a and a third keypoint prediction unit 523a; correspondingly, the matrix calculation module 53a comprises a first transformation matrix calculation unit 531a, a second transformation matrix calculation unit 532a, a third transformation matrix calculation unit 533a and a weighting calculation unit 534a; the picture rectification module further comprises an avatar recognition module 55a.
In this embodiment, each keypoint prediction unit performs keypoint prediction on the effective image area with a CPN model. The first keypoint prediction unit 521a predicts 13 first key points according to key point embodiment one of the foregoing method embodiments; correspondingly, the first transformation matrix calculation unit 531a in the matrix calculation module 53a calculates a first transformation matrix M1 from the 13 first key points.
The second keypoint prediction unit 522a predicts second key points at the positions of the several standard parameter items according to key point embodiment two of the foregoing method embodiments; correspondingly, the second transformation matrix calculation unit 532a in the matrix calculation module 53a calculates a second transformation matrix M2 from the plurality of second key points.
The avatar recognition module 55a recognizes the portrait in the effective image area and thereby obtains the rectangular frame in which the portrait is located.
The third keypoint prediction unit 523a predicts a plurality of third key points on the portrait rectangle of the original image according to key point embodiment three of the foregoing method embodiments; correspondingly, the third transformation matrix calculation unit 533a in the matrix calculation module 53a calculates a third transformation matrix M3 from the plurality of third key points.
The weighting calculation unit 534a is connected to the first transformation matrix calculation unit 531a, the second transformation matrix calculation unit 532a and the third transformation matrix calculation unit 533a respectively, and performs the weighted calculation on the elements of the three transformation matrices according to the weights assigned to the different kinds of key points, finally obtaining the optimal transformation matrix M. The transformation module 54a transforms the effective image area of the original picture according to the optimal transformation matrix M, thereby obtaining the rectified image.
In this embodiment, key points are obtained from the original picture by several different extraction methods, so the deformation of the image is captured more accurately; the weights are determined by the contribution of each kind of key point to the accuracy of the transformation, so the image is corrected more precisely and a better correction effect is achieved.
Image rectification module embodiment three
Fig. 19 is a functional block diagram of a picture rectification module according to yet another embodiment of the present invention. In this embodiment, the original picture to be corrected is a rectangular certificate picture with a portrait and standard parameter items. Building on the embodiment shown in fig. 17, the keypoint prediction module 52a comprises a first keypoint prediction unit 521a, a second keypoint prediction unit 522a and a third keypoint prediction unit 523a, each of which predicts key points in the effective image area with a CPN model. The first keypoint prediction unit 521a predicts 4 to 13 first key points according to key point embodiment one of the foregoing method embodiments; the second keypoint prediction unit 522a predicts second key points at the positions of at least 4 standard parameter items according to key point embodiment two; and the third keypoint prediction unit 523a predicts at least 4 third key points on the portrait rectangle of the original image according to key point embodiment three. The matrix calculation module 53a receives the three kinds of key points, enumerates combinations of them, calculates a plurality of transformation matrices M together with the corresponding standard key points, and performs a linear regression over these transformation matrices to obtain the optimal transformation matrix M; the transformation module 54a transforms the effective image area of the original image according to the optimal transformation matrix M to obtain the corrected image.
Although the image rectification module has been described above with an identity card picture as the example, the image rectification method and module of the present invention are equally applicable to business cards, passports and various certificates with unified specifications. The invention stores business card templates of various specifications in a database; when a business card image to be identified is corrected, the corresponding business card template is queried from the database and the standard key points corresponding to the various predicted key points are obtained, from which the transformation matrix is derived and the business card image is corrected.
The above embodiments are provided only for illustrating the present invention and not for limiting the present invention, and those skilled in the art can make various changes and modifications without departing from the scope of the present invention, and therefore, all equivalent technical solutions should fall within the scope of the present invention.

Claims (17)

1. A card type file image identification method, comprising the following steps:
performing character recognition on a target card type file image to obtain a file character set, wherein the characters comprise one or more of words, digits, punctuation marks and special symbols;
performing image processing on the target card type file image to obtain at least an image feature of each character, which specifically comprises the following steps:
extracting a character image area from a target card type file image;
segmenting the character image area to obtain a single character image area;
carrying out convolution, pooling and normalization operations on a single character image by using a CNN model so as to extract the image characteristics of the character, wherein the image characteristics of the character represent the font, color and height characteristics of the character;
performing semantic extraction on the characters in the file character set, and merging and/or splitting the characters in the file character set at least according to the semantic features and the image features of the characters to obtain field contents of multiple categories, which specifically comprises the following steps:
pre-segmenting the characters in the file character set according to semantics to obtain a plurality of word segments;
comparing the image features of the characters of the plurality of word segments;
when the character image features of two adjacent but separate word segments are the same, merging the two adjacent separate word segments into a new word segment; when the character image features within two connected word segments are different, splitting the two connected word segments;
determining the category of each word segment based on the semantic features of the word segment; and
generating a file identification text including the contents of the plurality of category fields.
2. The card type file image identification method according to claim 1, wherein when the target card type file is a business card, the image processing of the target card type file image further comprises:
identifying the image area before the first character;
in response to a sign image being recognized before the first character, establishing a correspondence between the first character and the sign image;
extracting image features of the sign image to determine the category corresponding to it; and
when the characters in the file character set are merged and/or split according to the semantic features and the image features of the characters, determining the category of the character set in which the first character is located according to the semantic features of that character set and the corresponding sign image.
3. The card type file image identification method according to claim 1, wherein when the target card type file is a business card or a certificate with standard parameter items, the semantic extraction of the characters in the file character set further comprises:
pre-segmenting the characters in the file character set according to semantics to obtain a plurality of word segments;
identifying a marker character in the plurality of word segments according to the semantic features and the image features of the characters, wherein the marker character at least comprises a word representing a category; and
in response to the marker character being identified, when the characters in the file character set are merged and/or split according to the semantic features and the image features of the characters, determining the character set that follows the marker character and corresponds to its semantics as the category represented by the marker character.
4. The card type file image identification method according to claim 1, further comprising: rectifying the card type file image to be identified into a forward target card type file image.
5. The card type file image identification method according to claim 4, further comprising:
obtaining an effective image area from the original card type file image to be identified;
identifying at least four vertices of the effective image area;
mapping the at least four vertices to the four vertices of a forward card type file to obtain a mapping matrix, wherein the forward card type file and the original card type file to be identified are files of the same specification; and
performing an image perspective transformation on the effective image area of the card type file image to be identified according to the mapping matrix to obtain the forward target card type file image.
6. The card type file image identification method according to claim 4, further comprising:
obtaining an effective image area from the original card type file image to be identified;
predicting key points at preset positions in the effective image area;
obtaining the corresponding standard key points from a standard forward image of the same type as the original card type file to be identified;
constructing a transformation matrix from the standard key points and the predicted key points; and
performing a perspective transformation on the effective image area of the original card type file to be identified according to the transformation matrix to obtain the forward target card type file image.
7. The card type file image identification method according to claim 6, wherein the key points at the preset positions are: any four or more first key points among the vertices of the effective image area, the midpoints of its four edges, the intersection of its two diagonals and the midpoints between that intersection and the four vertices; and/or any four or more second key points corresponding to the positions of standard parameter items; and/or any four or more third key points on the rectangular frame where the portrait is located; correspondingly, the transformation matrix is obtained from any one or more combinations of the first key points, the second key points and the third key points together with the corresponding standard key points.
8. The card type file image identification method according to claim 6, wherein the key points are predicted by a CPN model.
9. The card type file image identification method according to claim 5 or 6, wherein the effective image area is detected from the original card type file image to be identified by using a Yolo model.
10. The card type file image identification method according to claim 1, further comprising:
typesetting the field contents of the multiple categories according to a preset format to generate the file identification text.
11. A card type file image identification device, comprising:
a character recognition module configured to perform character recognition on a target card type file image to obtain a file character set;
an image feature extraction module configured to perform image processing on the target card type file image to obtain at least an image feature of each character, which specifically comprises:
extracting a character image area from the target card type file image;
segmenting the character image area to obtain a single character image area;
carrying out convolution, pooling and normalization operations on a single character image by using a CNN model so as to extract the image characteristics of the character, wherein the image characteristics of the character represent the font, color and height characteristics of the character;
a semantic extraction module, connected to the character recognition module and the image feature extraction module, configured to, when performing semantic extraction on the characters in the file character set, merge and/or split the characters in the character set at least according to the semantic features and the image features of the characters to obtain field contents of multiple categories, which specifically comprises the following steps:
pre-segmenting the characters in the file character set according to semantics to obtain a plurality of word segments;
comparing the image features of the characters of the plurality of word segments;
when the character image features of two adjacent but separate word segments are the same, merging the two adjacent separate word segments into a new word segment; when the character image features within two connected word segments are different, splitting the two connected word segments;
determining the category of each word segment based on the semantic features of the word segment; and
a layout module, connected to the semantic extraction module, configured to typeset the field contents of the multiple categories according to a preset format to generate a file identification text including the field contents of the multiple categories.
12. The card type file image identification device according to claim 11, wherein the image feature extraction module comprises:
an image extraction unit configured to extract, from the target card type file image, a character image area and a sign image area preceding a character;
a character image feature extraction unit, connected to the image extraction unit, configured to extract the image feature of each character from the character image area;
a sign image feature extraction unit, connected to the image extraction unit, configured to extract the image features of the sign image area; and
a sign image category determination unit, connected to the sign image feature extraction unit, configured to determine the category corresponding to a sign image when the sign image is recognized.
13. The card type file image identification device according to claim 12, wherein the semantic extraction module comprises:
a semantic pre-segmentation unit configured to pre-segment the characters in the file character set according to semantic features to obtain a plurality of word segments; and
a classification unit, connected to the semantic pre-segmentation unit, the character image feature extraction unit and the sign image category determination unit, configured to merge and/or split the plurality of word segments according to preset categories based on the semantic features and the character image features of the characters in the word segments to obtain the corresponding category field contents; or to merge and/or split the plurality of word segments based on the semantic features, the character image features and the sign images to obtain the corresponding category field contents.
14. The card type file image identification device according to claim 13, wherein the semantic extraction module further comprises:
a marker character recognition unit, connected to the semantic pre-segmentation unit, configured to recognize whether a marker character is included in the plurality of word segments; correspondingly, the classification unit is connected to the marker character recognition unit and merges and/or splits the plurality of word segments according to the semantic features, the character image features and the corresponding marker characters of the characters in the word segments to obtain the corresponding category field contents.
15. The card type file image identification device according to claim 11, further comprising an image rectification module configured to rectify an original card type file image to be identified into a forward target card type file image.
16. The card type file image identification device according to claim 15, wherein the image rectification module comprises:
an image preprocessing module configured to obtain an effective image area from the original card type file image to be identified;
a keypoint prediction module, connected to the image preprocessing module, configured to predict key points at preset positions in the effective image area;
a matrix calculation module, connected to the keypoint prediction module, configured to perform a mapping calculation from the predicted key points and the corresponding standard key points in a standard picture to construct a transformation matrix; and
a transformation module, connected to the image preprocessing module and the matrix calculation module respectively, configured to perform a perspective transformation on the effective image area of the original card type file to be identified according to the transformation matrix to obtain the forward target card type file image.
17. The card type file image identification device according to claim 16, wherein the keypoint prediction module comprises a plurality of CPN models that respectively predict the following key points: any four or more first key points among the vertices of the effective image area, the midpoints of its four edges, the intersection of its two diagonals and the midpoints between that intersection and the four vertices; and/or any four or more second key points corresponding to the positions of standard parameter items; and/or any four or more third key points on the rectangular frame where the portrait is located;
correspondingly, the transformation module obtains the transformation matrix from any one or more combinations of the first key points, the second key points and the third key points together with the corresponding standard key points.
CN202111219855.XA 2021-10-20 2021-10-20 Card type file image identification method and device Active CN113887484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111219855.XA CN113887484B (en) 2021-10-20 2021-10-20 Card type file image identification method and device

Publications (2)

Publication Number Publication Date
CN113887484A CN113887484A (en) 2022-01-04
CN113887484B (en) 2022-11-04

Family

ID=79003752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111219855.XA Active CN113887484B (en) 2021-10-20 2021-10-20 Card type file image identification method and device

Country Status (1)

Country Link
CN (1) CN113887484B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114626360B (en) * 2022-03-28 2025-01-28 阿里巴巴(中国)有限公司 Data processing method, device and electronic device
CN115035530A (en) * 2022-04-21 2022-09-09 阿里巴巴达摩院(杭州)科技有限公司 Image processing method, image text obtaining method, device and electronic equipment
US20240312173A1 (en) * 2023-03-13 2024-09-19 Capital One Services, Llc Automatic orientation correction for captured images

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751433A (en) * 2008-12-22 2010-06-23 汉王科技股份有限公司 Method for classifying business card character clauses and device thereof
CN109583449A (en) * 2018-10-29 2019-04-05 深圳市华尊科技股份有限公司 Character identifying method and Related product
CN110135427A (en) * 2019-04-11 2019-08-16 北京百度网讯科技有限公司 Method, apparatus, device and medium for recognizing characters in an image
CN112329779A (en) * 2020-11-02 2021-02-05 平安科技(深圳)有限公司 Method and related device for improving certificate identification accuracy based on mask

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825211B (en) * 2016-03-17 2019-05-31 世纪龙信息网络有限责任公司 Business card identification method, apparatus and system
CN106056114B (en) * 2016-05-24 2019-07-05 腾讯科技(深圳)有限公司 Contents of visiting cards recognition methods and device
CN110135411B (en) * 2019-04-30 2021-09-10 北京邮电大学 Business card recognition method and device
CN110647881B (en) * 2019-09-19 2023-09-05 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for determining card type corresponding to image
CN111126394A (en) * 2019-12-25 2020-05-08 上海肇观电子科技有限公司 Character recognition method, reading aid, circuit and medium
CN111814778B (en) * 2020-07-06 2024-10-22 北京中安未来科技有限公司 Text line region positioning method, layout analysis method and character recognition method
CN112612911A (en) * 2020-12-30 2021-04-06 华为技术有限公司 Image processing method, system, device and medium, and program product

Also Published As

Publication number Publication date
CN113887484A (en) 2022-01-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant