CN111881810B - Certificate identification method, device, terminal and storage medium based on OCR - Google Patents

Certificate identification method, device, terminal and storage medium based on OCR

Info

Publication number
CN111881810B
CN111881810B CN202010720829.4A CN202010720829A CN111881810B CN 111881810 B CN111881810 B CN 111881810B CN 202010720829 A CN202010720829 A CN 202010720829A CN 111881810 B CN111881810 B CN 111881810B
Authority
CN
China
Prior art keywords
preset
certificate
information
character array
version
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010720829.4A
Other languages
Chinese (zh)
Other versions
CN111881810A (en
Inventor
邓建泉
黄云晋
张智斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qianhai Life Insurance Co ltd
Original Assignee
Qianhai Life Insurance Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qianhai Life Insurance Co ltd
Priority to CN202010720829.4A
Publication of CN111881810A
Application granted
Publication of CN111881810B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a certificate recognition method, a device, a terminal and a readable storage medium based on OCR, wherein the certificate recognition method based on OCR obtains a picture of a certificate to be recognized, recognizes the picture through OCR, and stores characters obtained by recognition into a preset character array; judging the front and back information of the certificate to be identified based on a preset front and back distinguishing rule and the storage information of the preset character array; judging version information of the certificate to be identified based on the front and back information, the storage information of the preset character array and a preset version distinguishing rule; and obtaining the certificate content of the certificate to be identified based on the version information, the storage information of the preset character array and a preset attribute confirmation rule. The method and the device can identify the front and back information of the new certificate and the front and back information of the old certificate, so that the identification efficiency of the certificate to be identified is improved.

Description

Certificate identification method, device, terminal and storage medium based on OCR
Technical Field
The present application relates to the field of document identification technologies, and in particular, to an OCR-based document identification method, apparatus, terminal, and storage medium.
Background
With the popularization of the mobile internet, scenarios that require entering information from the Mainland Travel Permit for Hong Kong and Macao Residents and the Mainland Travel Permit for Taiwan Residents are increasingly common. Existing recognition equipment cannot recognize new-version and old-version certificates at the same time and can only recognize the content on the front of a certificate. When the state updates a certificate, the new version cannot be supported quickly, because current optical character recognition (OCR) services need to collect a large data set of the new version for retraining. It follows that current OCR-based certificate recognition is inefficient.
Disclosure of Invention
The main purpose of the application is to provide a certificate recognition method, device, terminal and computer storage medium based on OCR, aiming at solving the technical problem of low certificate recognition efficiency based on OCR in the prior art.
To achieve the above object, an embodiment of the present application provides an OCR-based certificate recognition method, including the steps of:
acquiring a picture of a certificate to be identified, identifying the picture through OCR, and storing characters obtained by identification into a preset character array;
judging the front and back information of the certificate to be identified based on a preset front and back distinguishing rule and the storage information of the preset character array;
Judging version information of the certificate to be identified based on the front and back information, the storage information of the preset character array and a preset version distinguishing rule;
and obtaining the certificate content of the certificate to be identified based on the version information, the storage information of the preset character array and a preset attribute confirmation rule.
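The four steps above can be read as a small pipeline: OCR output lines go in, and the side, the version, and the extracted fields come out. The following is a minimal sketch of that control flow only, not the applicant's implementation. The names `RecognitionResult`, `recognize_certificate`, and the three callables are hypothetical, and the OCR step itself is assumed to have already produced the list of text lines (the "preset character array").

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RecognitionResult:
    side: str               # "front" or "back"
    version: str            # "new" or "old"
    fields: Dict[str, str]  # extracted certificate content


def recognize_certificate(
    lines: List[str],
    judge_side: Callable[[List[str]], str],
    judge_version: Callable[[List[str], str], str],
    extract_fields: Callable[[List[str], str], Dict[str, str]],
) -> RecognitionResult:
    """Chain the rule-based steps over OCR text lines.

    `lines` stands in for the preset character array: one entry per
    recognized row, in recognition order.
    """
    side = judge_side(lines)                 # front/back distinguishing rule
    version = judge_version(lines, side)     # version distinguishing rule
    fields = extract_fields(lines, version)  # attribute confirmation rule
    return RecognitionResult(side=side, version=version, fields=fields)
```

Passing the rules in as callables keeps each rule replaceable, which matches the stated goal of supporting an updated certificate layout without retraining the OCR model.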
Optionally, the recognized characters are stored in the preset character array one by one, in the order in which they are recognized, to generate the storage information; the step of judging the front and back information of the certificate to be identified based on the preset front and back distinguishing rule and the storage information of the preset character array comprises the following steps:
sequentially detecting the number of non-Chinese characters in the last three rows in the storage information of the preset character array;
if the number of the non-Chinese characters in each row accords with a first preset range and any one of the three rows comprises the first preset characters, judging that the front and back information of the certificate to be identified is a back certificate;
and if the number of the non-Chinese characters in each row does not accord with a first preset range, judging that the front and back information of the certificate to be identified is the front certificate.
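The front/back rule above can be sketched as follows. The text does not give the first preset range or the first preset characters, so the range `(20, 40)` and the marker `"<"` below are placeholder assumptions, as is the choice to ignore whitespace when counting. The text also does not say what happens when all three rows are in range but no row contains the marker, so this sketch returns `None` in that case rather than guessing.

```python
from typing import List, Optional, Tuple


def count_non_chinese(line: str) -> int:
    """Count characters that are neither whitespace nor CJK ideographs."""
    return sum(
        1 for ch in line
        if not ch.isspace() and not ("\u4e00" <= ch <= "\u9fff")
    )


def judge_side(
    lines: List[str],
    count_range: Tuple[int, int] = (20, 40),  # assumed "first preset range"
    marker: str = "<",                        # assumed "first preset characters"
) -> Optional[str]:
    """Return "back", "front", or None when the rule does not decide."""
    last_three = lines[-3:]
    low, high = count_range
    all_in_range = len(last_three) == 3 and all(
        low <= count_non_chinese(row) <= high for row in last_three
    )
    if not all_in_range:
        return "front"
    if any(marker in row for row in last_three):
        return "back"
    return None  # in range but no marker: not specified in the text
```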
Optionally, the step of determining the version information of the certificate to be identified based on the front and back information, the stored information of the preset character array and a preset version distinguishing rule includes:
when the front and back information of the certificate to be identified is a back certificate, if the storage information of the preset character array has fewer than 10 and more than 5 rows and does not contain the preset characters, judging that the version information of the certificate to be identified is a new version certificate;
if the storage information of the preset character array has fewer than 6 rows and contains the preset characters, judging that the version information of the certificate to be identified is an old version certificate;
when the front and back information of the certificate to be identified is the front certificate, if the storage information of the preset character array accords with a first preset version distinguishing rule, judging that the version information of the certificate to be identified is a new version certificate;
and if the stored information of the preset character array accords with a second preset version distinguishing rule, judging that the version information of the certificate to be identified is an old version certificate.
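For the back side, the version rule depends only on the row count and on whether the preset characters occur. A minimal sketch, assuming the preset characters are a fixed string (the value `"注意事项"` is a placeholder, not taken from the text); the front-side rules are not spelled out above, so they are not covered here:

```python
from typing import List, Optional


def judge_back_version(
    lines: List[str],
    preset_text: str = "注意事项",  # placeholder for the "preset characters"
) -> Optional[str]:
    """Classify a back-side certificate as "new" or "old" by row count."""
    row_count = len(lines)
    has_preset_text = any(preset_text in row for row in lines)
    if 5 < row_count < 10 and not has_preset_text:
        return "new"
    if row_count < 6 and has_preset_text:
        return "old"
    return None  # combination not covered by the stated rule
```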
Optionally, the step of obtaining the certificate content of the certificate to be identified based on the version information, the stored information of the preset character array and a preset attribute confirmation rule includes:
detecting whether target Chinese characters with the number smaller than a first preset value exist in a second row or a third row in the preset character array, and if so, judging that the target Chinese characters are Chinese names;
detecting whether the two rows following the Chinese name in the preset character array contain target capital letters whose number is larger than a second preset value, and if so, judging that the target capital letters are the English name corresponding to the Chinese name, wherein the second preset value is larger than the first preset value.
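The name rule can be sketched as a scan of the second and third rows for a short run of Chinese characters, followed by a scan of the next two rows for a sufficiently long run of capital letters. The thresholds `5` and `6` are placeholder assumptions for the first and second preset values (chosen only so that the second is larger than the first), and rows are assumed to be numbered from 1.

```python
import re
from typing import List, Optional, Tuple

CHINESE = re.compile(r"[\u4e00-\u9fff]")
CAPITAL = re.compile(r"[A-Z]")


def extract_names(
    lines: List[str],
    first_preset: int = 5,   # assumed upper bound on Chinese name length
    second_preset: int = 6,  # assumed lower bound on capital-letter count
) -> Tuple[Optional[str], Optional[str]]:
    """Return (chinese_name, english_name); either may be None."""
    for idx in (1, 2):  # second and third rows, zero-based
        if idx >= len(lines):
            break
        chinese = CHINESE.findall(lines[idx])
        if 0 < len(chinese) < first_preset:
            english_name = None
            # Look at the two rows that follow the Chinese name.
            for follow in lines[idx + 1: idx + 3]:
                if len(CAPITAL.findall(follow)) > second_preset:
                    english_name = follow.strip()
                    break
            return "".join(chinese), english_name
    return None, None
```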
Optionally, the step of obtaining the certificate content of the certificate to be identified based on the version information, the stored information of the preset character array and a preset attribute confirmation rule further includes:
when the version information is a new version certificate, detecting whether the ninth to thirteenth rows in the preset character array contain Chinese characters whose number is larger than the first preset value, and if so, and the Chinese characters include second preset characters, judging that the second preset characters are the issuing authority;
detecting whether a third preset character or a fourth preset character with a character length smaller than a third preset value exists in the fifth to eighth rows in the preset character array, and if so, judging that the third preset character or the fourth preset character is gender information;
detecting whether a target number with fewer characters than a fourth preset value exists in the last two rows of the preset character array; if so, determining that the last two digits of the target number are the number of issuances; if not, and the number of characters in either of the last two rows is larger than a fifth preset value, determining that the last two digits of that target row are the number of issuances;
when the version information is an old version certificate, detecting whether the third preset character or the fourth preset character with a character length smaller than the third preset value exists in the fourth to sixth rows in the preset character array, and if so, judging that the third preset character or the fourth preset character is gender information.
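As an illustration of the gender rule, the sketch below checks rows five to eight for a new version certificate and rows four to six for an old version certificate. It assumes that the third and fourth preset characters are "男" and "女", that the third preset value is `3`, and that "character length" refers to the length of the row text; none of these are stated in the text.

```python
from typing import List, Optional, Tuple


def extract_gender(
    lines: List[str],
    version: str,                                # "new" or "old"
    gender_chars: Tuple[str, str] = ("男", "女"),  # assumed preset characters
    max_len: int = 3,                            # assumed "third preset value"
) -> Optional[str]:
    """Return the gender character found in the version-specific rows."""
    first_row, last_row = (5, 8) if version == "new" else (4, 6)
    # Rows are 1-based in the rule, so shift to a zero-based slice.
    for row in lines[first_row - 1: last_row]:
        text = row.strip()
        if len(text) < max_len:
            for candidate in gender_chars:
                if candidate in text:
                    return candidate
    return None
```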
Optionally, the step of obtaining the certificate content of the certificate to be identified based on the version information, the stored information of the preset character array and a preset attribute confirmation rule further includes:
when the version information is a new version certificate, extracting a first number of any one of a sixth row to an eighth row in the preset character array, judging whether a first target number with a character length of a sixth preset value exists in the first number, wherein the size of the first target number accords with a second preset range, and if the first target number exists, the first target number is the birth date;
extracting a second number of any one of a seventh line to a ninth line in the preset character array, judging whether a second target number with a character length of a seventh preset value exists in the second number, and if so, judging that the second target number is an effective date;
When the version information is the old version certificate, extracting a third number of any one of a fifth row to a seventh row in the preset character array, judging whether a third target number with the character length of the sixth preset value exists in the third number, wherein the size of the third target number accords with a second preset range, and if the third target number exists, the third target number is the birth date;
and extracting the numbers from any two rows after the eighth row in the preset character array, and judging whether the numbers include a fourth target number and a fifth target number whose character length is the sixth preset value and whose size conforms to the third preset range; if so, the fourth target number is the expiry date of the validity period and the fifth target number is the issue date, wherein the fourth target number is larger than the fifth target number.
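The birth-date rule above amounts to collecting the digits of each candidate row and accepting the first result that has the expected length and falls inside a plausible range. In this sketch the sixth preset value is assumed to be `8` (a `YYYYMMDD` date) and the second preset range is assumed to be `19000101` to `20991231`; both are illustrative values, not taken from the text.

```python
import re
from typing import List, Optional, Tuple


def extract_birth_date(
    lines: List[str],
    version: str,                                         # "new" or "old"
    length: int = 8,                                      # assumed sixth preset value
    value_range: Tuple[int, int] = (19000101, 20991231),  # assumed second preset range
) -> Optional[str]:
    """Return the birth date as a digit string, or None if no row matches."""
    first_row, last_row = (6, 8) if version == "new" else (5, 7)
    low, high = value_range
    for row in lines[first_row - 1: last_row]:
        digits = "".join(re.findall(r"\d", row))
        if len(digits) == length and low <= int(digits) <= high:
            return digits
    return None
```

A row that holds a date range, such as a validity period, yields sixteen digits and is skipped by the length check, which is one reason the rule tests length before value.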
Optionally, the step of obtaining the certificate content of the certificate to be identified based on the version information, the stored information of the preset character array and a preset attribute confirmation rule further includes:
when the version information is a new version certificate, detecting whether a first target number whose total number of non-Chinese characters conforms to a fourth preset range exists in either of the last two rows of the preset character array, and if so, the first eight digits of the first target number are the Taiwan certificate number, and the first nine digits of the first target number are the Hong Kong and Macao certificate number;
detecting whether a second target number whose total number of non-Chinese characters conforms to a fifth preset range exists in the preset character array, and if so, taking the first ten digits of the second target number as the identity card number;
when the version information is an old version certificate, detecting whether any one of the first three rows in the preset character array has a third target number whose total number of non-Chinese characters conforms to the fourth preset range, and if so, determining that the first eleven digits of the third target number are the Taiwan certificate number or the Hong Kong and Macao certificate number;
detecting whether a fourth target number whose total number of non-Chinese characters conforms to the fifth preset range exists in any one of the seventh to tenth rows in the preset character array, wherein the fourth target number does not belong to a time format, and if so, the first ten digits of the fourth target number are the Hong Kong and Macao identity card number.
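For the new version certificate, the certificate number rule can be sketched as below. The fourth preset range `(8, 12)` is a placeholder, "digits" is read as leading characters (so a leading letter is kept), and the caller is assumed to know which kind of certificate is being processed, since the text does not say how the eight-character and nine-character cases are told apart.

```python
from typing import List, Optional, Tuple

# Number of leading characters kept per certificate kind, per the rule above.
PREFIX_LENGTHS = {"taiwan": 8, "hk_macao": 9}


def strip_chinese(row: str) -> str:
    """Drop whitespace and CJK ideographs, keeping the remaining characters."""
    return "".join(
        ch for ch in row
        if not ch.isspace() and not ("\u4e00" <= ch <= "\u9fff")
    )


def extract_new_certificate_number(
    lines: List[str],
    region: str,                            # "taiwan" or "hk_macao" (hypothetical labels)
    count_range: Tuple[int, int] = (8, 12),  # assumed "fourth preset range"
) -> Optional[str]:
    """Return the certificate number taken from the last two rows."""
    take = PREFIX_LENGTHS[region]
    low, high = count_range
    for row in lines[-2:]:
        candidate = strip_chinese(row)
        if low <= len(candidate) <= high:
            return candidate[:take]
    return None
```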
The present application also provides an OCR-based document recognition device comprising:
the storage module is used for acquiring a picture of the certificate to be identified, identifying the picture through OCR, and storing characters obtained through identification into a preset character array;
the first judging module is used for judging the front and back information of the preset certificate to be identified based on a preset front and back distinguishing rule and the storage information of the preset character array;
The second judging module is used for judging the version information of the certificate to be identified based on the front and back information, the storage information of the preset character array and a preset version distinguishing rule;
and the identification module is used for obtaining the certificate content of the certificate to be identified based on the version information, the storage information of the preset character array and the preset attribute confirmation rule.
Optionally, the first determining module includes:
the first detection unit is used for sequentially detecting the number of the non-Chinese characters in each of the last three rows in the storage information of the preset character array;
the first judging unit is used for judging that the front and back information of the certificate to be identified is a back certificate if the number of the non-Chinese characters in each row accords with a first preset range and any one of the three last rows comprises the first preset characters;
and the second judging unit is used for judging that the front and back information of the certificate to be identified is the front certificate if the number of the non-Chinese characters in each row does not accord with the first preset range.
Optionally, the second determining module includes:
the third judging unit is used for judging that the version information of the certificate to be identified is a new version certificate if, when the front and back information of the certificate to be identified is a back certificate, the storage information of the preset character array has fewer than 10 and more than 5 rows and does not contain the preset characters;
a fourth judging unit, configured to judge that the version information of the certificate to be identified is an old version certificate if the storage information of the preset character array has fewer than 6 rows and contains the preset characters;
a fifth judging unit, configured to judge that the version information of the document to be recognized is a new version document if the stored information of the preset character array conforms to a first preset version distinguishing rule when the front and back information of the document to be recognized is a front document;
and the sixth judging unit is used for judging that the version information of the certificate to be identified is the old version certificate if the stored information of the preset character array accords with a second preset version distinguishing rule.
Optionally, the identification module includes:
the first recognition unit is used for detecting whether target Chinese characters with the number smaller than a first preset value exist in a second row or a third row in the preset character array, and if so, the target Chinese characters are Chinese names;
the second recognition unit is used for detecting whether the two rows following the Chinese name in the preset character array contain target capital letters whose number is larger than a second preset value, and if so, the target capital letters are the English name corresponding to the Chinese name, wherein the second preset value is larger than the first preset value.
Optionally, the identification module further includes:
the third recognition unit is used for detecting, when the version information is a new version certificate, whether the ninth to thirteenth rows in the preset character array contain Chinese characters whose number is larger than the first preset value, and if so, and the Chinese characters include second preset characters, the second preset characters are the issuing authority;
a fourth recognition unit, configured to detect whether a third preset character or a fourth preset character with a character length smaller than a third preset value exists in the fifth to eighth rows in the preset character array, and if so, the third preset character or the fourth preset character is gender information;
a fifth recognition unit, configured to detect whether a target number with fewer characters than a fourth preset value exists in the last two rows of the preset character array; if so, the last two digits of the target number are the number of issuances; if not, and the number of characters in either of the last two rows is larger than a fifth preset value, the last two digits of that target row are the number of issuances;
and the sixth recognition unit is used for detecting, when the version information is an old version certificate, whether the third preset character or the fourth preset character with a character length smaller than the third preset value exists in the fourth to sixth rows in the preset character array, and if so, the third preset character or the fourth preset character is gender information.
Optionally, the identification module further includes:
a seventh recognition unit, configured to extract a first number from any one of the sixth to eighth rows in the preset character array when the version information is a new version certificate, and determine whether a first target number exists in the first number whose character length is a sixth preset value and whose size conforms to a second preset range, and if so, the first target number is the birth date;
an eighth recognition unit, configured to extract a second number in any one of a seventh line to a ninth line in the preset character array, determine whether a second target number with a character length being a seventh preset value exists in the second number, and if so, the second target number is an effective date;
a ninth identifying unit, configured to extract a third number of any one of a fifth line to a seventh line in the preset character array when the version information is an old version certificate, and determine whether a third target number with a character length being the sixth preset value exists in the third number, where a size of the third target number conforms to a second preset range, and if so, the third target number is a birth date;
a tenth recognition unit, configured to extract the numbers from any two rows after the eighth row in the preset character array, and determine whether the numbers include a fourth target number and a fifth target number whose character length is the sixth preset value and whose size conforms to the third preset range; if so, the fourth target number is the expiry date of the validity period and the fifth target number is the issue date, wherein the fourth target number is larger than the fifth target number.
Optionally, the identification module further includes:
the eleventh recognition unit is used for detecting, when the version information is a new version certificate, whether a first target number whose total number of non-Chinese characters conforms to a fourth preset range exists in either of the last two rows of the preset character array, and if so, the first eight digits of the first target number are the Taiwan certificate number, and the first nine digits of the first target number are the Hong Kong and Macao certificate number;
a twelfth recognition unit, configured to detect whether a second target number whose total number of non-Chinese characters conforms to a fifth preset range exists in the preset character array, and if so, the first ten digits of the second target number are the identity card number;
a thirteenth recognition unit, configured to detect, when the version information is an old version certificate, whether any one of the first three rows in the preset character array has a third target number whose total number of non-Chinese characters conforms to the fourth preset range, and if so, the first eleven digits of the third target number are the Taiwan certificate number or the Hong Kong and Macao certificate number;
a fourteenth recognition unit, configured to detect whether any one of the seventh to tenth rows in the preset character array has a fourth target number whose total number of non-Chinese characters conforms to the fifth preset range, wherein the fourth target number does not belong to a time format, and if so, the first ten digits of the fourth target number are the Hong Kong and Macao identity card number.
The application also provides a terminal, the terminal includes: a memory, a processor, and an OCR based credential recognition program stored on the memory and executable on the processor, which when executed by the processor, performs the steps of the OCR based credential recognition method as described above.
The present application also provides a computer storage medium having stored thereon an OCR-based document recognition program which, when executed by a processor, implements the steps of the OCR-based document recognition method as described above.
The application discloses a certificate recognition method, a device, a terminal and a computer readable storage medium based on OCR, wherein the certificate recognition method based on OCR obtains a picture of a certificate to be recognized, recognizes the picture through OCR, and stores characters obtained by recognition into a preset character array; judging the front and back information of the certificate to be identified based on a preset front and back distinguishing rule and the storage information of the preset character array; judging version information of the certificate to be identified based on the front and back information, the storage information of the preset character array and a preset version distinguishing rule; and obtaining the certificate content of the certificate to be identified based on the version information, the storage information of the preset character array and a preset attribute confirmation rule. The method and the device can identify the front and back information of the new certificate and the front and back information of the old certificate, so that the identification efficiency of the certificate to be identified is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic hardware structure of an optional terminal according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a first embodiment of an OCR-based document recognition method of the present application;
FIG. 3 is a schematic diagram of functional modules of an OCR-based document recognition device of the present application.
The realization, functional characteristics and advantages of the present application will be further described with reference to the embodiments, referring to the attached drawings.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In the following description, suffixes such as "module", "component", or "unit" for representing elements are used only for facilitating the description of the present application, and are not of specific significance per se. Thus, "module," "component," or "unit" may be used in combination.
As shown in fig. 1, fig. 1 is a schematic diagram of a terminal structure of a hardware running environment according to an embodiment of the present application.
The terminal can be a fixed terminal, comprises a certificate recognition machine and the like, and can also be a mobile terminal, and comprises networking equipment such as a handheld certificate recognition instrument and the like.
As shown in fig. 1, the architecture design of the OCR-based certificate recognition terminal includes nodes and a server, and the device structure may include: a processor 1001, such as a CPU, memory 1005, and a communication bus 1002. Wherein a communication bus 1002 is used to enable connected communication between the processor 1001 and a memory 1005. The memory 1005 may be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
Optionally, the OCR-based document recognition terminal may further include a user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. The user interface may include a Display screen (Display), touch screen, camera (including AR/VR devices), etc., and the optional user interface may also include standard wired interfaces, wireless interfaces. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface, bluetooth interface, probe interface, 3G/4G/5G networking communication interface, etc.).
Those skilled in the art will appreciate that the OCR-based document recognition terminal structure shown in fig. 1 does not constitute a limitation of an OCR-based document recognition terminal and may include more or fewer components than shown, or may combine certain components, or may be a different arrangement of components.
As shown in FIG. 1, an operating system, a network communication module, and an OCR-based credential recognition program may be included in memory 1005, which is a computer storage medium. The operating system is a program that manages and controls the OCR-based credential recognition terminal hardware and software resources, supporting the execution of OCR-based credential recognition programs and other software and/or programs. The network communication module is used to enable communication between components within the memory 1005 and with other hardware and software in the OCR-based credential recognition terminal.
In the OCR-based document recognition terminal shown in fig. 1, the processor 1001 is configured to execute an OCR-based document recognition program stored in the memory 1005, performing the following operations:
acquiring a picture of a certificate to be identified, identifying the picture through OCR, and storing characters obtained by identification into a preset character array;
judging the front and back information of the certificate to be identified based on a preset front and back distinguishing rule and the storage information of the preset character array;
Judging version information of the certificate to be identified based on the front and back information, the storage information of the preset character array and a preset version distinguishing rule;
and obtaining the certificate content of the certificate to be identified based on the version information, the storage information of the preset character array and a preset attribute confirmation rule.
Further, the processor 1001 may call the OCR-based certificate recognition program stored in the memory 1005, and further perform the following operations:
sequentially detecting the number of non-Chinese characters in the last three rows in the storage information of the preset character array;
if the number of the non-Chinese characters in each row accords with a first preset range and any one of the three rows comprises the first preset characters, judging that the front and back information of the certificate to be identified is a back certificate;
and if the number of the non-Chinese characters in each row does not accord with a first preset range, judging that the front and back information of the certificate to be identified is the front certificate.
Further, the processor 1001 may call the OCR-based certificate recognition program stored in the memory 1005, and further perform the following operations:
when the front and back information of the certificate to be identified is a back certificate, if the storage information of the preset character array has fewer than 10 and more than 5 rows and does not contain the preset characters, judging that the version information of the certificate to be identified is a new version certificate;
if the storage information of the preset character array has fewer than 6 rows and contains the preset characters, judging that the version information of the certificate to be identified is an old version certificate;
when the front and back information of the certificate to be identified is the front certificate, if the storage information of the preset character array accords with a first preset version distinguishing rule, judging that the version information of the certificate to be identified is a new version certificate;
and if the stored information of the preset character array accords with a second preset version distinguishing rule, judging that the version information of the certificate to be identified is an old version certificate.
Further, the processor 1001 may call the OCR-based certificate recognition program stored in the memory 1005, and further perform the following operations:
detecting whether the second row or the third row of the preset character array contains target Chinese characters whose number is smaller than a first preset value, and if so, judging that the target Chinese characters are the Chinese name;
detecting the two rows of the preset character array that follow the Chinese name, and if those rows contain target capital letters whose number is larger than a second preset value, judging that the target capital letters are the English name corresponding to the Chinese name, wherein the second preset value is larger than the first preset value.
Further, the processor 1001 may call the OCR-based certificate recognition program stored in the memory 1005, and further perform the following operations:
when the version information is a new version certificate, detecting whether any of the ninth to thirteenth rows of the preset character array contains Chinese characters whose number is larger than the first preset value and which include second preset characters, and if so, judging that those Chinese characters are the issuing authority;
detecting whether a third preset character or a fourth preset character with a character length smaller than a third preset value exists in the fifth to eighth rows of the preset character array, and if so, judging that the third preset character or the fourth preset character is the sex information;
detecting whether a target number with fewer characters than a fourth preset value exists in the last two rows of the preset character array; if so, judging that the last two digits of the target number are the number of issuances; if not, and a target row among the last two rows has more characters than a fifth preset value, judging that the last two digits of the target row are the number of issuances;
when the version information is the old version certificate, detecting whether the third preset character or the fourth preset character with the character length smaller than the third preset value exists in the fourth line to the sixth line in the preset character array, and if so, judging that the third preset character or the fourth preset character is sex information.
Further, the processor 1001 may call the OCR-based certificate recognition program stored in the memory 1005, and further perform the following operations:
when the version information is a new version certificate, extracting a first number of any one of a sixth row to an eighth row in the preset character array, judging whether a first target number with a character length of a sixth preset value exists in the first number, wherein the size of the first target number accords with a second preset range, and if the first target number exists, the first target number is the birth date;
extracting a second number of any one of a seventh line to a ninth line in the preset character array, judging whether a second target number with a character length of a seventh preset value exists in the second number, and if so, judging that the second target number is an effective date;
when the version information is the old version certificate, extracting a third number from any one of the fifth to seventh rows of the preset character array, judging whether a third target number with a character length of the sixth preset value exists in the third number, wherein the size of the third target number accords with the second preset range, and if the third target number exists, the third target number is the birth date;
and extracting the numbers from any two rows after the eighth row of the preset character array, judging whether those numbers include a fourth target number and a fifth target number whose character length is the sixth preset value and whose size accords with a third preset range, and if so, judging that the fourth target number is the expiry date of the validity period and the fifth target number is the issue date, wherein the fourth target number is larger than the fifth target number.
Further, the processor 1001 may call the OCR-based certificate recognition program stored in the memory 1005, and further perform the following operations:
when the version information is a new version certificate, detecting whether a first target number whose total number of non-Chinese characters accords with a fourth preset range exists in any one of the last two rows of the preset character array, and if so, the first eight digits of the first target number are the Taiwan certificate number, or the first nine digits of the first target number are the Hong Kong and Macao certificate number; detecting whether a second target number whose total number of non-Chinese characters accords with a fifth preset range exists in the preset character array, and if so, taking the first ten digits of the second target number as the identity card number;
when the version information is the old version certificate, detecting whether any one of the first three rows of the preset character array contains a third target number whose total number of non-Chinese characters accords with the fourth preset range, and if so, determining that the first eleven digits of the third target number are the Taiwan certificate number or the Hong Kong and Macao certificate number; detecting whether a fourth target number whose total number of non-Chinese characters accords with the fifth preset range exists in any one of the seventh to tenth rows of the preset character array, wherein the fourth target number is not in a time format, and if so, the first ten digits of the fourth target number are the Hong Kong and Macao identity card number.
Based on the above hardware structure, various embodiments of the OCR-based certificate recognition method of the present application are presented.
Referring to fig. 2, a first embodiment of the present application provides an OCR-based certificate recognition method including:
step S10, obtaining a picture of a certificate to be identified, identifying the picture through OCR, and storing characters obtained through identification into a preset character array;
in the technical scheme disclosed by the application, the certificates to be identified include the Mainland Travel Permit for Hong Kong and Macao Residents and the Mainland Travel Permit for Taiwan Residents; the characters include Chinese characters, digits, punctuation marks and the like. When a certificate to be identified is detected, a picture of the certificate is first acquired and each row of the picture is scanned to recognize its contents; the recognized characters are then stored in the preset character array row by row, matching their positions on the picture. For example, when the first row of the picture is recognized as "Wang Xiaoming", "Wang Xiaoming" is stored in the first row of the preset character array; when the fourth row of the picture is recognized as "male", "male" is stored in the fourth row of the preset character array.
Specifically, the method uses the object detection library YOLO v3 and a convolutional recurrent neural network (CRNN) containing a long short-term memory (LSTM) network to build a text detection and text recognition service, and performs end-to-end variable-length text detection and OCR recognition on the picture of the certificate to be identified.
Step S20, judging the front and back information of the certificate to be identified based on a preset front and back distinguishing rule and the storage information of the preset character array;
in the technical scheme disclosed in the application, the preset front-back distinguishing rule refers to a preset relevant rule for distinguishing whether the certificate to be identified belongs to the front surface or the back surface; the stored information of the preset character array refers to information stored in the preset character array obtained by OCR recognition.
Step S30, based on the front and back information, the storage information of the preset character array and a preset version distinguishing rule, judging the version information of the certificate to be identified;
in the technical scheme disclosed in the application, the preset version distinguishing rule refers to a preset relevant rule for distinguishing whether the certificate to be identified belongs to a new version certificate or an old version certificate.
And step S40, obtaining the certificate content of the certificate to be identified based on the version information, the storage information of the preset character array and the preset attribute confirmation rule.
In the technical solution disclosed in the present application, the preset attribute confirmation rule refers to a preset rule for determining which type of information each item on the certificate to be identified belongs to; for example, the attribute of "Wang Xiaoming" is the Chinese name, and the attribute of "male" is the gender.
In this embodiment, the object detection library YOLO v3 and a convolutional recurrent neural network (CRNN) containing a long short-term memory (LSTM) network are used to build a text detection and text recognition service; end-to-end variable-length text detection and OCR recognition are performed on the picture of the certificate to be identified, and the recognized characters are stored in a preset character array; the front and back information of the certificate to be identified is judged based on a preset front and back distinguishing rule and the storage information of the preset character array; the version information of the certificate to be identified is judged based on the front and back information, the storage information of the preset character array and a preset version distinguishing rule; and the certificate content of the certificate to be identified is obtained based on the version information, the storage information of the preset character array and a preset attribute confirmation rule. The method can identify the front and back of both new version and old version certificates, and when the certificate is updated it can quickly support recognition of the certificate to be identified without collecting a large new-version data set to retrain the OCR recognition service, thereby improving the efficiency of certificate recognition.
Further, in a second embodiment of the OCR-based document recognition method of the present application, step S20 includes:
step S21, sequentially detecting the number of non-Chinese characters in each of the last three rows in the stored information of the preset character array;
in the technical scheme disclosed in the application, non-Chinese characters are characters of any type other than Chinese characters, such as the "<" symbol, digits and the like.
Step S22, if the number of the non-Chinese characters in each row accords with a first preset range, and any one of the three rows comprises the first preset characters, judging that the front and back information of the certificate to be identified is a back certificate;
in the technical scheme disclosed in the application, the first preset range refers to 28 to 32, and the first preset character refers to a "<" symbol. When the number of the non-Chinese characters in each row is between 28 and 32 and any one of the three rows comprises "<" symbols, judging that the front and back information of the certificate to be identified is the back certificate.
Step S23, if the number of the non-Chinese characters in each row does not accord with the first preset range, judging that the front and back information of the certificate to be identified is the front certificate.
In the technical scheme disclosed by the application, when the number of the non-Chinese characters in each row is detected to be not between 28 and 32, the front and back information of the certificate to be identified is judged to be the front certificate.
In this embodiment, the front and back information of the document to be identified is determined based on the preset front and back distinguishing rule and the stored information of the preset character array, so that the version information of the document to be identified can be determined based on the front and back information, the stored information of the preset character array and the preset version distinguishing rule, thereby realizing identification of the front and back information of the new version of document and the front and back information of the old version of document, and improving the identification efficiency of the document.
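The front and back distinguishing rule of steps S21 to S23 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function names, the use of the CJK Unified Ideographs range to decide what counts as a Chinese character, and the choice to ignore whitespace are all assumptions. The text also does not say what happens when the counts are in range but no "<" symbol is found; the sketch simply treats that case as not being a back certificate.

```python
import re

# Assumption: a "Chinese character" is any code point in the CJK Unified Ideographs block.
_CHINESE = re.compile(r"[\u4e00-\u9fff]")


def count_non_chinese(row: str) -> int:
    """Count the characters in a row that are not Chinese characters (whitespace ignored)."""
    return sum(1 for ch in row if not ch.isspace() and not _CHINESE.match(ch))


def is_back_side(rows: list, low: int = 28, high: int = 32) -> bool:
    """Return True when the last three rows satisfy the back-certificate rule."""
    last_three = rows[-3:]
    if len(last_three) < 3:
        return False
    # Every one of the last three rows must hold 28 to 32 non-Chinese characters ...
    counts_ok = all(low <= count_non_chinese(row) <= high for row in last_three)
    # ... and at least one of them must contain the first preset character "<".
    has_marker = any("<" in row for row in last_three)
    return counts_ok and has_marker
```

A caller would pass the preset character array as a list of strings, one string per recognized row, and treat a False result as a front certificate.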
Further, in a third embodiment of the OCR-based document recognition method of the present application, step S30 includes:
step S31, when the front and back information of the certificate to be identified is a back certificate, if the stored information of the preset character array has more than 5 and fewer than 10 rows and the preset characters are not present, judging that the version information of the certificate to be identified is a new version certificate;
in the technical scheme disclosed by the application, if the preset character array obtained by recognition has more than 5 and fewer than 10 rows and contains no preset characters such as "Hong Kong and Macao residents traveling to and from the Mainland" or "Taiwan residents traveling to and from the Mainland", the certificate is a new version certificate.
Step S32, if the stored information of the preset character array has fewer than 6 rows and the preset characters are present, judging that the version information of the certificate to be identified is an old version certificate;
In the technical scheme disclosed by the application, if the preset character array has fewer than 6 rows and contains preset characters such as "Hong Kong and Macao residents traveling to and from the Mainland" or "Taiwan residents traveling to and from the Mainland", the certificate is an old version certificate.
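As an illustration of steps S31 and S32, the back-certificate version rule might look like the sketch below. The function name is hypothetical, and the two Chinese keyword strings are assumed originals of the preset characters, back-translated from the description; returning None for inputs that match neither rule is also an assumption, since the text does not cover that case.

```python
from typing import Optional

# Assumed Chinese originals of the preset title characters (back-translated guesses).
TITLE_KEYWORDS = ("港澳居民来往内地", "台湾居民来往大陆")


def back_side_version(rows: list) -> Optional[str]:
    """Classify a back certificate as 'new' or 'old' from its row count and title text."""
    has_title = any(keyword in row for row in rows for keyword in TITLE_KEYWORDS)
    n = len(rows)
    if 5 < n < 10 and not has_title:
        return "new"  # more than 5 and fewer than 10 rows, no preset characters
    if n < 6 and has_title:
        return "old"  # fewer than 6 rows and the preset characters are present
    return None       # neither rule matched; the description leaves this open
```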
Step S33, when the front and back information of the certificate to be identified is a front certificate, if the stored information of the preset character array accords with a first preset version distinguishing rule, judging that the version information of the certificate to be identified is a new version certificate;
in the technical scheme disclosed in the application, the first preset version distinguishing rule means that any two or more of the following conditions are satisfied:
condition one: the first three rows of the preset character array contain characters such as "Hong Kong and Macao residents traveling to and from the Mainland", "Taiwan residents traveling to and from the Mainland" or "travel permit";
condition two: any one of the first three rows of the preset character array contains 9 to 12 characters;
condition three: any row of the preset character array contains a date range in the format yyyy.mm.dd-yyyy.mm.dd (for example, 2017.06.30-2022.06.30), or any one of the seventh to ninth rows contains 16 characters;
condition four: the preset character array contains characters such as "number of issuances";
condition five: any one of the last three rows of the preset character array contains 9 to 12 digits.
And step S34, if the stored information of the preset character array accords with a second preset version distinguishing rule, judging that the version information of the certificate to be identified is an old version certificate.
In the technical scheme disclosed in the application, the second preset version distinguishing rule means that the preset character array contains two rows of dates in yyyy-mm-dd format (e.g., 2022-06-30) and that none of the following conditions is satisfied:
condition one: the first three rows of the preset character array contain characters such as "Hong Kong and Macao residents traveling to and from the Mainland", "Taiwan residents traveling to and from the Mainland" or "travel permit";
condition two: any one of the first three rows of the preset character array contains 9 to 12 characters;
condition three: any one of the seventh to ninth rows of the preset character array contains a date range in the format yyyy.mm.dd-yyyy.mm.dd (for example, 2017.06.30-2022.06.30), or any one of the seventh to ninth rows contains 16 characters, wherein the first 8 digits and the last 8 digits of the 16 characters can each be formatted as a yyyy.mm.dd date and the converted dates fall within 10 years of the recognition date;
condition four: the preset character array contains characters such as "number of issuances";
condition five: any one of the last three rows of the preset character array contains 9 to 12 digits.
In this embodiment, the version information of the certificate to be identified is determined based on the front and back information, the stored information of the preset character array and a preset version distinguishing rule, so that the certificate content of the certificate to be identified can be obtained based on the version information, the stored information of the preset character array and a preset attribute confirming rule.
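The front-certificate rules of steps S33 and S34 can be sketched by counting how many of the five conditions hold. The sketch below is a simplified reading under several assumptions: the function names are hypothetical, the Chinese keyword strings are back-translated guesses at the preset characters, "16 characters" is read as 16 digits, the simpler form of condition three is used for both rules, and the 10-year check of the second rule is omitted.

```python
import re
from typing import Optional

# Assumed Chinese originals of the preset characters (back-translated guesses).
TITLE_KEYWORDS = ("港澳居民来往内地", "台湾居民来往大陆", "通行证")
ISSUE_COUNT_KEYWORD = "签发次数"
DATE_RANGE = re.compile(r"\d{4}\.\d{2}\.\d{2}-\d{4}\.\d{2}\.\d{2}")
SINGLE_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def digit_count(row: str) -> int:
    """Number of digits in a row after dropping every non-digit character."""
    return len(re.sub(r"\D", "", row))


def count_conditions(rows: list) -> int:
    """Count how many of conditions one to five are satisfied."""
    first3, last3, rows_7_to_9 = rows[:3], rows[-3:], rows[6:9]
    checks = [
        any(k in row for row in first3 for k in TITLE_KEYWORDS),        # condition one
        any(9 <= len(row) <= 12 for row in first3),                     # condition two
        any(DATE_RANGE.search(row) for row in rows)
        or any(digit_count(row) == 16 for row in rows_7_to_9),          # condition three
        any(ISSUE_COUNT_KEYWORD in row for row in rows),                # condition four
        any(9 <= digit_count(row) <= 12 for row in last3),              # condition five
    ]
    return sum(bool(c) for c in checks)


def front_side_version(rows: list) -> Optional[str]:
    """Classify a front certificate as 'new', 'old', or None when neither rule matches."""
    met = count_conditions(rows)
    if met >= 2:
        return "new"
    dated_rows = sum(1 for row in rows if SINGLE_DATE.search(row))
    if dated_rows >= 2 and met == 0:
        return "old"
    return None
```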
Further, in a fourth embodiment of the OCR-based document recognition method of the present application, step S40 includes:
step S41, detecting whether the second row or the third row of the preset character array contains target Chinese characters whose number is smaller than a first preset value, and if so, the target Chinese characters are the Chinese name;
in the technical solution disclosed in the present application, the first preset value is 5; for example, when the second row or the third row of the preset character array is detected to contain the target Chinese characters "Ouyang Xiaogong", whose number is smaller than 5, "Ouyang Xiaogong" is the Chinese name.
Step S42, detecting the two rows of the preset character array that follow the Chinese name, and if those rows contain target capital letters whose number is larger than a second preset value, the target capital letters are the English name corresponding to the Chinese name, wherein the second preset value is larger than the first preset value.
In the technical solution disclosed in the present application, the second preset value is 6; for example, if the two rows of the preset character array that follow the Chinese name "Ouyang Xiaogong" are detected to contain the target capital letters "outang. Xiaohong", whose number is larger than 6, then "outang. Xiaohong" is the English name corresponding to the Chinese name "Ouyang Xiaogong".
In this embodiment, the Chinese name and the English name on the certificate to be identified may be detected based on the stored information of the preset character array and the preset attribute confirmation rule.
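A possible sketch of steps S41 and S42 follows. The function name, the CJK range used to recognize Chinese characters, and the decision to return the whole matching row as the English name are assumptions; the sample names in the usage note are hypothetical data.

```python
import re

# Assumption: Chinese characters are code points in the CJK Unified Ideographs block.
CHINESE_RUN = re.compile(r"[\u4e00-\u9fff]+")


def find_names(rows: list, first_preset: int = 5, second_preset: int = 6):
    """Return (chinese_name, english_name); either may be None when not found."""
    chinese_name = english_name = None
    for idx in (1, 2):  # second and third rows, as 0-based indices
        if idx >= len(rows):
            break
        chars = "".join(CHINESE_RUN.findall(rows[idx]))
        if 0 < len(chars) < first_preset:
            chinese_name = chars
            # Look at the two rows that follow the Chinese name.
            for row in rows[idx + 1: idx + 3]:
                if sum(ch.isupper() for ch in row) > second_preset:
                    english_name = row.strip()
                    break
            break
    return chinese_name, english_name
```

For instance, calling `find_names(["港澳居民来往内地通行证", "欧阳小红", "OUYANG, XIAOHONG", "女"])` picks the four-character second row as the Chinese name and the following row, which has more than 6 capital letters, as the English name.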
Further, in a fifth embodiment of the OCR-based document recognition method of the present application, step S40 further includes:
step S43, when the version information is a new version certificate, detecting whether any of the ninth to thirteenth rows of the preset character array contains Chinese characters whose number is larger than the first preset value and which include second preset characters, and if so, those Chinese characters are the issuing authority;
in the technical solution disclosed in the present application, the first preset value is 5 and the second preset characters are characters such as "administration" and "inbound"; for example, when a row among the ninth to thirteenth rows of the preset character array is detected to contain more than 5 Chinese characters including "administration", that row is the issuing authority.
Step S44, detecting whether a third preset character or a fourth preset character with a character length smaller than a third preset value exists in the fifth to eighth rows of the preset character array, and if so, the third preset character or the fourth preset character is the sex information;
in the technical scheme disclosed by the application, this step applies when the version information is a new version certificate; the third preset value is 2, the third preset character is "male" and the fourth preset character is "female" (each a single Chinese character, so its length is smaller than 2); for example, if the character "female" is detected in the fifth to eighth rows of the preset character array, "female" is the sex information.
Step S45, detecting whether a target number with fewer characters than a fourth preset value exists in the last two rows of the preset character array; if so, the last two digits of the target number are the number of issuances; if not, and a target row among the last two rows has more characters than a fifth preset value, the last two digits of the target row are the number of issuances;
in the technical scheme disclosed by the application, this step applies when the version information is a new version certificate; the fourth preset value is 4 and the fifth preset value is 9; for example, if the last two rows of the preset character array are detected to contain the target number "103", which has fewer than 4 characters, the last two digits "03" of "103" give the number of issuances, namely 3.
Step S46, when the version information is the old version certificate, detecting whether the third preset character or the fourth preset character with a character length smaller than the third preset value exists in the fourth to sixth rows of the preset character array, and if so, the third preset character or the fourth preset character is the sex information.
In the technical scheme disclosed in the application, for example, when the version information is an old version certificate and the character "male" or "female" with a character length smaller than 2 is detected in the fourth to sixth rows of the preset character array, that character is the sex information.
In this embodiment, the issuing authority, the number of times of issuing, and the sex information on the certificate to be identified may be detected based on the version information, the stored information of the preset character array, and the preset attribute confirmation rule.
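The sex and issuance-count rules of steps S44 to S46 can be sketched as below; the issuing-authority rule of step S43 is left out. The function names and the "new"/"old" version labels are hypothetical, and the single Chinese characters used for "male" and "female" are the assumed originals of the third and fourth preset characters.

```python
from typing import Optional

# Assumed Chinese originals of the third and fourth preset characters.
SEX_CHARACTERS = ("男", "女")


def find_sex(rows: list, version: str) -> Optional[str]:
    """Look for a one-character sex row in rows 5-8 (new) or rows 4-6 (old)."""
    candidates = rows[4:8] if version == "new" else rows[3:6]
    for row in candidates:
        text = row.strip()
        if len(text) < 2 and text in SEX_CHARACTERS:  # length below the third preset value
            return text
    return None


def find_issue_count(rows: list, fourth_preset: int = 4, fifth_preset: int = 9) -> Optional[int]:
    """Read the number of issuances from the last two rows of a new version certificate."""
    last_two = [row.strip() for row in rows[-2:]]
    # First choice: a short all-digit row, e.g. "103" gives "03", i.e. 3.
    for text in last_two:
        if text.isdecimal() and len(text) < fourth_preset:
            return int(text[-2:])
    # Fallback: a long row whose last two characters are digits.
    for text in last_two:
        if len(text) > fifth_preset and text[-2:].isdecimal():
            return int(text[-2:])
    return None
```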
Further, in a sixth embodiment of the OCR-based document recognition method of the present application, step S40 further includes:
step S47, when the version information is a new version certificate, extracting a first number of any one of a sixth row to an eighth row in the preset character array, judging whether a first target number with a character length of a sixth preset value exists in the first number, wherein the size of the first target number accords with a second preset range, and if the first target number exists, the first target number is a birth date;
In the technical scheme disclosed in the application, the sixth preset value is 8, and the second preset range covers dates from 19200808 to the date on which the digits are extracted. For example, when the version information is a new version certificate and one of the sixth to eighth rows of the preset character array is "2020-01-01", extracting the digits gives the target number "20200101" with a length of 8; since 20200101 lies between 19200808 and the extraction date, "2020-01-01" is the birth date.
Step S48, extracting a second number of any one of a seventh row to a ninth row in the preset character array, judging whether a second target number with a character length of a seventh preset value exists in the second number, and if so, judging that the second target number is an effective date;
in the technical scheme disclosed by the application, this step applies when the version information is a new version certificate; the seventh preset value is 16; for example, if one of the seventh to ninth rows of the preset character array is "2011.01.01-2020.01.01", extracting the digits gives the target number "2011010120200101" with a length of 16, so "2011.01.01-2020.01.01" is the effective date.
Step S49, when the version information is the old version certificate, extracting a third number of any one of a fifth row to a seventh row in the preset character array, judging whether a third target number with the character length of the sixth preset value exists in the third number, wherein the size of the third target number accords with a second preset range, and if the third target number exists, the third target number is the birth date;
In the technical scheme disclosed in the application, for example, when the version information is an old version certificate and one of the fifth to seventh rows of the preset character array is "2020-01-01", extracting the digits gives the target number "20200101" with a length of 8; since 20200101 lies between 19200808 and the extraction date, "2020-01-01" is the birth date.
Step S410, extracting the numbers from each row after the eighth row of the preset character array, and judging whether two of those rows contain a fourth target number and a fifth target number whose character length is the sixth preset value and whose size accords with a third preset range; if so, the fourth target number is the expiry date of the validity period and the fifth target number is the issue date, wherein the fourth target number is larger than the fifth target number.
In the technical scheme disclosed in the application, this step applies when the version information is an old version certificate; the third preset range covers dates within 10 years before and after the date on which the digits are extracted; for example, when the current date is 20200615 and two of the rows after the eighth row of the preset character array contain the numbers "20150303" and "20250303", each with a character length of 8 and a date between 20100615 and 20300615, "20250303" is the expiry date and "20150303" is the issue date.
In this embodiment, the date of birth, the validity period, and the issue period on the certificate to be identified may be detected based on version information, stored information of the preset character array, and a preset attribute confirmation rule.
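The date rules of steps S47 to S410 reduce to stripping non-digits from a row and comparing the result as an integer. The sketch below illustrates this under stated assumptions: the function names and version labels are hypothetical, the current date is passed in explicitly so the example is reproducible, and when more than two candidate rows are found for an old version certificate only the first two are used.

```python
import re
from datetime import date


def digits_of(row: str) -> str:
    """Keep only the digits of a row, e.g. '2020-01-01' becomes '20200101'."""
    return re.sub(r"\D", "", row)


def find_birth_date(rows: list, version: str, today: date):
    """Rows 6-8 (new) or 5-7 (old): an 8-digit number between 19200808 and today."""
    candidates = rows[5:8] if version == "new" else rows[4:7]
    upper = today.year * 10000 + today.month * 100 + today.day
    for row in candidates:
        d = digits_of(row)
        if len(d) == 8 and 19200808 <= int(d) <= upper:
            return d
    return None


def find_validity_new(rows: list):
    """Rows 7-9 of a new version certificate: a 16-digit run holds start and end dates."""
    for row in rows[6:9]:
        d = digits_of(row)
        if len(d) == 16:
            return d[:8], d[8:]
    return None


def find_validity_old(rows: list, today: date):
    """Rows after the eighth: two 8-digit dates within 10 years before or after today."""
    month_day = today.month * 100 + today.day
    low = (today.year - 10) * 10000 + month_day
    high = (today.year + 10) * 10000 + month_day
    found = [int(d) for d in map(digits_of, rows[8:])
             if len(d) == 8 and low <= int(d) <= high]
    if len(found) < 2:
        return None
    issue, expiry = sorted(found[:2])  # the larger number is the expiry date
    return str(issue), str(expiry)
```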
Further, in a seventh embodiment of the OCR-based document recognition method of the present application, step S40 further includes:
step S411, when the version information is a new version certificate, detecting whether a first target number whose total number of non-Chinese characters accords with a fourth preset range exists in any one of the last two rows of the preset character array, and if so, the first eight digits of the first target number are the Taiwan certificate number, or the first nine digits of the first target number are the Hong Kong and Macao certificate number;
in the technical solution disclosed in the present application, the fourth preset range refers to 8 to 13.
Step S412, detecting whether a second target number whose total number of non-Chinese characters accords with a fifth preset range exists in the preset character array, and if so, the first ten digits of the second target number are the identity card number;
in the technical scheme disclosed by the application, this step applies when the version information is a new version certificate; the fifth preset range refers to 10 to 12.
Step S413, when the version information is an old version certificate, detecting whether any one of the first three rows of the preset character array contains a third target number whose total number of non-Chinese characters accords with the fourth preset range, and if so, determining that the first eleven digits of the third target number are the Taiwan certificate number or the Hong Kong and Macao certificate number;
Step S414, detecting whether a fourth target number whose total number of non-Chinese characters accords with the fifth preset range exists in any one of the seventh to tenth rows of the preset character array, wherein the fourth target number is not in a time format, and if so, the first ten digits of the fourth target number are the Hong Kong and Macao identity card number.
In this embodiment, the Taiwan certificate number, the Hong Kong and Macao certificate number and the identity card number on the certificate to be identified may be detected based on the version information, the stored information of the preset character array and the preset attribute confirmation rule.
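The certificate-number rules of steps S411 and S413 can be sketched as follows; the identity card number rules of steps S412 and S414 are omitted. The function names and version labels are hypothetical, and the `is_taiwan` flag is an assumption: the description does not state how the Taiwan and the Hong Kong and Macao certificates are told apart at this step, so the caller is expected to supply that information.

```python
import re
from typing import Optional

# Assumption: non-Chinese characters are anything outside the CJK Unified
# Ideographs block, with whitespace ignored.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff\s]")


def non_chinese_part(row: str) -> str:
    """Return the row with Chinese characters and whitespace removed."""
    return "".join(NON_CHINESE.findall(row))


def find_certificate_number(rows: list, version: str, is_taiwan: bool = False) -> Optional[str]:
    """Apply the fourth preset range (8 to 13 non-Chinese characters) to the candidate rows."""
    if version == "new":
        # Last two rows; keep 8 characters for Taiwan, 9 for Hong Kong and Macao.
        candidates, keep = rows[-2:], (8 if is_taiwan else 9)
    else:
        # First three rows; keep the first eleven characters.
        candidates, keep = rows[:3], 11
    for row in candidates:
        text = non_chinese_part(row)
        if 8 <= len(text) <= 13:
            return text[:keep]
    return None
```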
In addition, referring to fig. 3, an embodiment of the present invention further provides an OCR-based document recognition device, where the OCR-based document recognition device includes:
the storage module is used for acquiring a picture of the certificate to be identified, identifying the picture through OCR, and storing characters obtained through identification into a preset character array;
the first judging module is used for judging the front and back information of the certificate to be identified based on a preset front and back distinguishing rule and the storage information of the preset character array;
the second judging module is used for judging the version information of the certificate to be identified based on the front and back information, the storage information of the preset character array and a preset version distinguishing rule;
And the identification module is used for obtaining the certificate content of the certificate to be identified based on the version information, the storage information of the preset character array and the preset attribute confirmation rule.
The application also provides a terminal, which includes: a memory, a processor, and an OCR-based certificate recognition program stored on the memory and executable on the processor, wherein the OCR-based certificate recognition program, when executed by the processor, implements the steps of the OCR-based certificate recognition method described above.
The present application also provides a computer readable storage medium having stored thereon an OCR-based document recognition program which when executed by a processor implements the steps of the OCR-based document recognition method described above.
The embodiments of the OCR-based certificate recognition device, the terminal, and the readable storage medium of the present application include all technical features of the embodiments of the OCR-based certificate recognition method. Their expanded description is substantially the same as that of the method embodiments and is therefore not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or system that comprises the element.
The serial numbers of the foregoing embodiments of the present application are for description only and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is preferable. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disk) and comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The foregoing describes only preferred embodiments of the present application and is not intended to limit the scope of its claims. Any equivalent structure or equivalent process derived from the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of the claims of the present application.

Claims (9)

1. An OCR-based document recognition method, characterized in that the OCR-based document recognition method comprises the steps of:
acquiring a picture of a certificate to be identified, identifying the picture through OCR, and storing the characters obtained through identification into a preset character array one by one according to the identification sequence to generate storage information;
sequentially detecting the number of non-Chinese characters in each of the last three rows in the storage information of the preset character array;
if the number of non-Chinese characters in each row falls within a first preset range and any one of the last three rows comprises a first preset character, judging that the front and back information of the certificate to be identified is a back certificate;
if the number of non-Chinese characters in each row does not fall within the first preset range, judging that the front and back information of the certificate to be identified is a front certificate;
judging version information of the certificate to be identified based on the front and back information, the storage information of the preset character array and a preset version distinguishing rule;
and obtaining the certificate content of the certificate to be identified based on the version information, the storage information of the preset character array and a preset attribute confirmation rule.
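The front/back test in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not give values for the first preset range or the first preset character, so `FIRST_PRESET_RANGE` and `FIRST_PRESET_CHAR` are placeholders. The sketch also assumes that each OCR row is one string, that "non-Chinese" means any non-whitespace character outside the basic CJK block, and that the claim's front-side condition means no row falls in the range. The `"unknown"` result for mixed cases is likewise an addition.

```python
import re
from typing import List

# Placeholder values; the claim leaves both unspecified.
FIRST_PRESET_RANGE = range(25, 35)
FIRST_PRESET_CHAR = "<"

CJK = re.compile(r"[\u4e00-\u9fff]")


def count_non_chinese(row: str) -> int:
    """Count characters that are neither whitespace nor CJK ideographs."""
    return sum(1 for ch in row if not ch.isspace() and not CJK.match(ch))


def classify_side(rows: List[str]) -> str:
    """Return 'back', 'front', or 'unknown' based on the last three rows."""
    last_three = rows[-3:]
    in_range = [count_non_chinese(r) in FIRST_PRESET_RANGE for r in last_three]
    if all(in_range) and any(FIRST_PRESET_CHAR in r for r in last_three):
        return "back"
    if not any(in_range):
        return "front"
    return "unknown"  # mixed case, not covered by the claim
```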
2. The OCR-based document recognition method of claim 1, wherein the step of determining version information of the document to be recognized based on the front and back information, the stored information of the preset character array, and a preset version discrimination rule comprises:
when the front and back information of the certificate to be identified is a back certificate, if the stored information of the preset character array has more than 5 and fewer than 10 lines and does not contain the preset characters, judging that the version information of the certificate to be identified is a new version certificate;
if the stored information of the preset character array has fewer than 6 lines and contains the preset characters, judging that the version information of the certificate to be identified is an old version certificate;
when the front and back information of the certificate to be identified is the front certificate, if the storage information of the preset character array accords with a first preset version distinguishing rule, judging that the version information of the certificate to be identified is a new version certificate;
and if the stored information of the preset character array accords with a second preset version distinguishing rule, judging that the version information of the certificate to be identified is an old version certificate.
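A hedged Python sketch of the version decision in claim 2 follows. The function name `classify_version` and its parameters are hypothetical. The "preset characters" are passed in as `marker` because the claim does not name them, and the two front-side rules, which the claim leaves unspecified, are injected as optional callables. The sketch assumes that "line number" means the number of rows stored in the array.

```python
from typing import Callable, List, Optional

Rule = Callable[[List[str]], bool]


def classify_version(
    side: str,
    rows: List[str],
    marker: str,
    front_new_rule: Optional[Rule] = None,
    front_old_rule: Optional[Rule] = None,
) -> Optional[str]:
    """Return 'new', 'old', or None when no rule in the claim applies."""
    if side == "back":
        has_marker = any(marker in row for row in rows)
        if 5 < len(rows) < 10 and not has_marker:
            return "new"
        if len(rows) < 6 and has_marker:
            return "old"
        return None
    # Front side: the claim defers to two preset rules that it does not
    # spell out, so the caller supplies them.
    if front_new_rule is not None and front_new_rule(rows):
        return "new"
    if front_old_rule is not None and front_old_rule(rows):
        return "old"
    return None
```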
3. The OCR-based document recognition method of claim 2, wherein the step of obtaining the document content of the document to be recognized based on the version information, the stored information of the preset character array, and a preset attribute validation rule comprises:
detecting whether target Chinese characters numbering fewer than a first preset value exist in the second row or the third row of the preset character array, and if so, judging that the target Chinese characters are a Chinese name;
detecting the two rows following the Chinese name in the preset character array, and if target capital letters numbering more than a second preset value exist in those two rows, judging that the target capital letters are the English name corresponding to the Chinese name, wherein the second preset value is larger than the first preset value.
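The name lookup in claim 3 might look like the Python sketch below. The thresholds `FIRST_PRESET_VALUE` and `SECOND_PRESET_VALUE` are placeholders chosen only to satisfy the claim's requirement that the second exceed the first, and the helper name `find_names` is hypothetical. The sketch also assumes the name row carries no field label, which real OCR output may not guarantee.

```python
import re
from typing import List, Optional, Tuple

FIRST_PRESET_VALUE = 5    # placeholder upper bound on Chinese-name length
SECOND_PRESET_VALUE = 6   # placeholder; must exceed FIRST_PRESET_VALUE

HANZI_RUN = re.compile(r"[\u4e00-\u9fff]+")
UPPER_RUN = re.compile(r"[A-Z][A-Z ,]*[A-Z]")


def find_names(rows: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (chinese_name, english_name); either may be None."""
    for idx in (1, 2):  # second and third rows, zero-based
        if idx >= len(rows):
            break
        hanzi = "".join(HANZI_RUN.findall(rows[idx]))
        if 0 < len(hanzi) < FIRST_PRESET_VALUE:
            # Look at the two rows after the Chinese name for capital letters.
            for row in rows[idx + 1: idx + 3]:
                match = UPPER_RUN.search(row)
                if match and sum(c.isupper() for c in match.group()) > SECOND_PRESET_VALUE:
                    return hanzi, match.group()
            return hanzi, None
    return None, None
```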
4. The OCR-based document recognition method of claim 3, wherein the step of obtaining the document content of the document to be recognized based on the version information, the stored information of the preset character array, and a preset attribute validation rule further comprises:
when the version information is a new version certificate, detecting whether Chinese characters that number more than the first preset value and contain second preset characters exist in the ninth to thirteenth rows of the preset character array, and if so, judging that the Chinese characters are the issuing authority;
detecting whether third preset characters or fourth preset characters with the character length smaller than a third preset value exist in fifth to eighth rows in the preset character array, and if so, judging that the third preset characters or the fourth preset characters are sex information;
detecting whether a target number with fewer characters than a fourth preset value exists in the last two rows of the preset character array; if so, determining that the last two digits of the target number are the number of issuances; if not, and the number of characters in either of the last two rows is larger than a fifth preset value, determining that the last two digits of that target row are the number of issuances;
when the version information is the old version certificate, detecting whether the third preset character or the fourth preset character with the character length smaller than the third preset value exists in the fourth line to the sixth line in the preset character array, and if so, judging that the third preset character or the fourth preset character is sex information.
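The sex-information step of claim 4 differs between versions only in the row window that is scanned. The Python sketch below illustrates this under explicit assumptions: the third and fourth preset characters are taken to be "男" and "女", `THIRD_PRESET_VALUE` is a placeholder, and the length condition is read as applying to the whole row. None of these are stated in the claim.

```python
from typing import List, Optional

SEX_TOKENS = ("男", "女")   # assumed third and fourth preset characters
THIRD_PRESET_VALUE = 4      # placeholder upper bound on row length

# One-based, inclusive row windows taken from the claim text.
SEX_ROW_WINDOWS = {"new": (5, 8), "old": (4, 6)}


def find_sex(rows: List[str], version: str) -> Optional[str]:
    """Return the sex token found in the version-specific row window."""
    first, last = SEX_ROW_WINDOWS[version]
    for row in rows[first - 1:last]:
        text = row.strip()
        if len(text) < THIRD_PRESET_VALUE:
            for token in SEX_TOKENS:
                if token in text:
                    return token
    return None
```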
5. The OCR-based document recognition method of claim 2, wherein the step of obtaining the document content of the document to be recognized based on the version information, the stored information of the preset character array, and a preset attribute validation rule further comprises:
when the version information is a new version certificate, extracting a first number from any one of the sixth to eighth rows in the preset character array, and judging whether the first number contains a first target number whose character length is a sixth preset value and whose size falls within a second preset range; if so, the first target number is the birth date;
extracting a second number from any one of the seventh to ninth rows in the preset character array, and judging whether the second number contains a second target number whose character length is a seventh preset value; if so, the second target number is the validity date;
when the version information is the old version certificate, extracting a third number from any one of the fifth to seventh rows in the preset character array, and judging whether the third number contains a third target number whose character length is the sixth preset value and whose size falls within the second preset range; if so, the third target number is the birth date;
and extracting fourth digits from any two rows after the eighth row in the preset character array, and judging whether the fourth digits contain a fourth target number and a fifth target number whose character length is the sixth preset value and whose sizes fall within a third preset range; if so, the fourth target number is the expiry date of the validity period and the fifth target number is the issue date, wherein the fourth target number is greater than the fifth target number.
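The date checks in claim 5 reduce to stripping non-digits from a row and testing the length and magnitude of the result. The Python sketch below makes several assumptions that the claim does not state: dates are eight digits in YYYYMMDD order (`SIXTH_PRESET_VALUE = 8`), both preset ranges are the placeholder span 19000101 to 20991231, and each row carries at most one date. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

SIXTH_PRESET_VALUE = 8                        # assumed YYYYMMDD length
SECOND_PRESET_RANGE = (19000101, 20991231)    # placeholder
THIRD_PRESET_RANGE = (19000101, 20991231)     # placeholder
BIRTH_ROWS = {"new": (6, 8), "old": (5, 7)}   # one-based, inclusive


def digits_of(row: str) -> str:
    """Keep only the digit characters of a row."""
    return re.sub(r"\D", "", row)


def is_date_like(digits: str, bounds: Tuple[int, int]) -> bool:
    return len(digits) == SIXTH_PRESET_VALUE and bounds[0] <= int(digits) <= bounds[1]


def find_birth_date(rows: List[str], version: str) -> Optional[str]:
    first, last = BIRTH_ROWS[version]
    for row in rows[first - 1:last]:
        digits = digits_of(row)
        if is_date_like(digits, SECOND_PRESET_RANGE):
            return digits
    return None


def find_validity_old(rows: List[str]) -> Optional[Tuple[str, str]]:
    """Old version: return (expiry, issue) from rows after the eighth."""
    dates = [d for d in map(digits_of, rows[8:]) if is_date_like(d, THIRD_PRESET_RANGE)]
    if len(dates) < 2:
        return None
    expiry, issue = max(dates[:2], key=int), min(dates[:2], key=int)
    # The claim requires the expiry value to be strictly greater.
    return (expiry, issue) if int(expiry) > int(issue) else None
```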
6. The OCR-based document recognition method of claim 2, wherein the step of obtaining the document content of the document to be recognized based on the version information, the stored information of the preset character array, and a preset attribute validation rule further comprises:
when the version information is a new version certificate, detecting whether a first target number whose total number of non-Chinese characters falls within a fourth preset range exists in either of the last two rows of the preset character array, and if so, the first eight digits of the first target number are the Taiwan certificate number, and the first nine digits of the first target number are the Hong Kong and Macao certificate number;
detecting whether a second target number whose total number of non-Chinese characters falls within a fifth preset range exists in the preset character array, and if so, taking the first ten digits of the second target number as the identity card number;
when the version information is the old version certificate, detecting whether a third target number whose total number of non-Chinese characters falls within the fourth preset range exists in any one of the first three rows of the preset character array, and if so, determining that the first eleven digits of the third target number are the Taiwan certificate number or the Hong Kong and Macao certificate number;
detecting whether a fourth target number with the total number of non-Chinese characters conforming to the fifth preset range exists in any one of a seventh line to a tenth line in the preset character array, wherein the fourth target number does not belong to a time format, and if so, the first ten digits of the fourth target number are identification card numbers.
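For the new-version branch of claim 6, the certificate number is a fixed-length prefix of a row whose non-Chinese character count falls in a preset range. The Python sketch below is a rough illustration only: `FOURTH_PRESET_RANGE` is a placeholder, the `region` argument is a hypothetical way of selecting between the eight-character and nine-character prefixes (the claim does not say how the two cases are told apart), and the identity-card and old-version branches are omitted.

```python
from typing import List, Optional

FOURTH_PRESET_RANGE = range(8, 15)            # placeholder
PREFIX_LENGTH = {"taiwan": 8, "hk_macao": 9}  # prefix lengths from the claim


def non_chinese(row: str) -> str:
    """Drop whitespace and CJK ideographs, keeping everything else."""
    return "".join(
        ch for ch in row if not ch.isspace() and not ("\u4e00" <= ch <= "\u9fff")
    )


def find_permit_number(rows: List[str], region: str) -> Optional[str]:
    """New version: scan the last two rows for a candidate number."""
    for row in rows[-2:]:
        token = non_chinese(row)
        if len(token) in FOURTH_PRESET_RANGE:
            return token[:PREFIX_LENGTH[region]]
    return None
```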
7. An OCR-based document recognition device, the OCR-based document recognition device comprising:
the storage module is used for acquiring pictures of certificates to be identified, identifying the pictures through OCR, and storing the characters obtained through identification into a preset character array one by one according to the identification sequence to generate storage information;
a first decision module, the first decision module comprising:
the first detection unit is used for sequentially detecting the number of the non-Chinese characters in each of the last three rows in the storage information of the preset character array;
the first judging unit is used for judging that the front and back information of the certificate to be identified is a back certificate if the number of non-Chinese characters in each row falls within a first preset range and any one of the last three rows comprises a first preset character;
the second judging unit is used for judging that the front and back information of the certificate to be identified is a front certificate if the number of the non-Chinese characters in each row does not accord with a first preset range;
the second judging module is used for judging the version information of the certificate to be identified based on the front and back information, the storage information of the preset character array and a preset version distinguishing rule;
and the identification module is used for obtaining the certificate content of the certificate to be identified based on the version information, the storage information of the preset character array and the preset attribute confirmation rule.
8. A terminal, the terminal comprising: a memory, a processor, and an OCR-based document recognition program stored on the memory and executable on the processor, which, when executed by the processor, implements the steps of the OCR-based document recognition method of any one of claims 1 to 6.
9. A storage medium having stored thereon an OCR based document recognition program which when executed by a processor implements the steps of the OCR based document recognition method of any one of claims 1 to 6.
CN202010720829.4A 2020-07-23 2020-07-23 Certificate identification method, device, terminal and storage medium based on OCR Active CN111881810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010720829.4A CN111881810B (en) 2020-07-23 2020-07-23 Certificate identification method, device, terminal and storage medium based on OCR

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010720829.4A CN111881810B (en) 2020-07-23 2020-07-23 Certificate identification method, device, terminal and storage medium based on OCR

Publications (2)

Publication Number Publication Date
CN111881810A CN111881810A (en) 2020-11-03
CN111881810B true CN111881810B (en) 2024-03-29

Family

ID=73200149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010720829.4A Active CN111881810B (en) 2020-07-23 2020-07-23 Certificate identification method, device, terminal and storage medium based on OCR

Country Status (1)

Country Link
CN (1) CN111881810B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113869299B (en) * 2021-09-30 2024-06-11 中国平安人寿保险股份有限公司 Bank card identification method and device, computer equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008143087A1 (en) * 2007-05-14 2008-11-27 International Frontier Technology Laboratory, Inc. Authenticity validation subject, authenticity validation chip reader, and authenticity judging method
AU2016101599A4 (en) * 2016-09-09 2016-10-13 Auscertified Pty Ltd Method and System for an Integrated Verification and Certification System for Qualifications, Certificates, and Identification.
CN106886774A (en) * 2015-12-16 2017-06-23 腾讯科技(深圳)有限公司 The method and apparatus for recognizing ID card information
CA2954089A1 (en) * 2016-01-08 2017-07-08 Confirm, Inc. Systems and methods for authentication of physical features on identification documents
CN107194397A (en) * 2017-05-09 2017-09-22 珠海赛纳打印科技股份有限公司 Recognition methods, device and the image processing apparatus of card placement direction
CA2963113A1 (en) * 2016-03-31 2017-09-30 Confirm, Inc. Storing identification data as virtual personally identifiable information
CN109325414A (en) * 2018-08-20 2019-02-12 阿里巴巴集团控股有限公司 Extracting method, the extracting method of device and text information of certificate information
CN109446900A (en) * 2018-09-21 2019-03-08 平安科技(深圳)有限公司 Certificate authenticity verification method, apparatus, computer equipment and storage medium
CN109492643A (en) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 Certificate recognition methods, device, computer equipment and storage medium based on OCR
CN109657673A (en) * 2017-10-11 2019-04-19 阿里巴巴集团控股有限公司 Image-recognizing method and terminal
WO2019237549A1 (en) * 2018-06-11 2019-12-19 平安科技(深圳)有限公司 Verification code recognition method and apparatus, computer device, and storage medium
CN111353497A (en) * 2018-12-21 2020-06-30 顺丰科技有限公司 Identification method and device for identity card information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110312414A1 (en) * 2010-06-16 2011-12-22 Microsoft Corporation Automated certification of video game advertising using ocr

Also Published As

Publication number Publication date
CN111881810A (en) 2020-11-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant